This is a third public draft of a standard documenting the proposed usage of the FHISO Citation Elements standard in RDFa. This document is not an FHISO standard and is not endorsed by the FHISO membership. It may be updated, replaced or obsoleted by other documents at any time.
In particular, some examples in this draft use citation elements that are not yet included in the draft Citation Elements: Vocabulary, and source derivation types that may be standardised in a future Source Derivation Vocabulary. These are likely to change as these vocabularies progress.
The public tsc-public@fhiso.org mailing list is the preferred place for comments, discussion and other feedback on this draft.
Latest public version: | https://fhiso.org/TR/cev-rdfa-bindings |
This version: | https://fhiso.org/TR/cev-rdfa-bindings-20180316 |
Previous version: | https://fhiso.org/TR/cev-rdfa-bindings-20170911 |
FHISO’s suite of Citation Elements standards provides an extensible framework and vocabulary for encoding all the data about a genealogical source that might reasonably be included in a formatted citation to that source.
This information is represented as a sequence of citation elements, logically self-contained pieces of information about a source. This document defines a means by which citation elements may be identified and tagged within an XML or HTML formatted citation, allowing a computer to extract them in a systematic manner. The tagging of citation elements is done using a standard set of HTML attributes known as RDFa attributes, which can also be used in XML languages besides HTML.
Other documents in the suite of Citation Elements standards are as follows:
Citation Elements: General Concepts. This standard defines the general concepts used in FHISO’s suite of Citation Elements standards, and the basic framework and data model underpinning them.
Citation Elements: Vocabulary. This standard defines a collection of citation elements allowing the representation of information normally found in formatted citations to diverse types of source.
Citation Elements: Bindings for GEDCOM X. This standard defines extensions to the GEDCOM X data model and its JSON and XML serialisations to allow citation elements to be represented in GEDCOM X.
Citation Elements: Bindings for ELF. This standard defines how citation elements should be represented in FHISO’s Extensible Legacy Format (ELF), a format based on and compatible with GEDCOM 5.5.1, but with the addition of a new extensibility mechanism.
Where this standard gives a specific technical meaning to a word or phrase, that word or phrase is formatted in bold text in its initial definition, and in italics when used elsewhere. The key words must, must not, required, shall, shall not, should, should not, recommended, not recommended, may and optional in this standard are to be interpreted as described in [RFC 2119].
An application is conformant with this standard if and only if it follows all the requirements and prohibitions contained in this document, as indicated by use of the words must, must not, required, shall and shall not, and the relevant parts of its normative references. Standards referencing this standard must not loosen any of the requirements and prohibitions made by this standard, nor place additional requirements or prohibitions on the constructs defined herein.
This standard depends on the Citation Elements: General Concepts and Basic Concepts for Genealogical Standards standards. To be conformant with this standard, an application must also be conformant with [CEV Concepts] and [Basic Concepts]. Concepts defined in those standards are used here without further definition.
Indented text in coloured boxes, such as the preceding paragraph, does not form a normative part of this standard, and is labelled as either an example or a note.
The tagging of citation elements in formatted citations is done using a standard set of HTML attributes known as RDFa attributes which are defined in [RDFa Core]. Compliance with this FHISO standard does not require full RDFa compliance: support for the full [RDFa Core] is optional, and RDFa features other than those for which support is required by this standard should not be used when compatibility between implementations is desirable.
These attributes may be used in HTML or any XML-based markup language, but for the purpose of tagging citation elements in formatted citations it is recommended that they be used in XHTML. The language they are used in is referred to here as the host language.
In the simplest case, the citation element name (which is an IRI) can be put in a property
attribute on an XML or HTML element, and the citation element value is the text contents of the element. The particular type of element on which the attributes are placed is not relevant.
A simplified formatted citation to Settipani’s book Les ancêtres de Charlemagne might be marked up as the following HTML fragment:
<p>Settipani, Christian. <i>Les ancêtres de Charlemagne</i>.</p>
The title of the book can be tagged by adding a property
attribute to the existing <i>
element. As written above, no element contains just the author’s name, as the <p>
element also encloses the title; however, the author’s name can be wrapped in a <span>
element and a property
attribute added to that. HTML’s <span>
element has no defined meaning of its own, but exists to provide a place for attributes such as this.
<p><span property="https://terms.fhiso.org/sources/authorName"
>Settipani, Christian</span>.
<i property="https://terms.fhiso.org/sources/title">Les ancêtres
de Charlemagne</i>.</p>
An HTML renderer will correctly format this while ignoring the two property
attributes, but an application that conforms to this standard will extract these two citation elements from this HTML:
authorName | “Settipani, Christian” |
title | “Les ancêtres de Charlemagne” |
Note that the citation element value of the title
citation element contains no line break, despite the HTML being split across two lines. This is because [CEV Concepts] says applications should whitespace-normalise citation element values.
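The simplest case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing model of this standard: it uses the standard library's `html.parser`, handles only `property` attributes holding full IRIs, ignores source-type elements and the `content`, `href`, `src` and `datetime` attributes, and approximates whitespace-normalisation with `str.split()`. The names `PropertyExtractor` and `extract_properties` are hypothetical.

```python
from html.parser import HTMLParser

# HTML void elements never have an end tag, so they are kept off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PropertyExtractor(HTMLParser):
    """Collect the text content of every element that has a property attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.slots = []    # (names, text chunks), in document order
        self._stack = []   # one entry per open element; None if it has no property

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        names = (dict(attrs).get("property") or "").split()
        slot = (names, []) if names else None
        if slot:
            self.slots.append(slot)
        self._stack.append(slot)

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that has a property attribute.
        for slot in self._stack:
            if slot is not None:
                slot[1].append(data)


def extract_properties(html):
    """Return (citation element name, citation element value) pairs."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    pairs = []
    for names, chunks in parser.slots:
        # Approximation of whitespace-normalisation: collapse runs, trim ends.
        value = " ".join("".join(chunks).split())
        pairs.extend((name, value) for name in names)
    return pairs
```

Fed the marked-up Settipani fragment, `extract_properties` returns the `authorName` and `title` pairs in document order, with the line break in the title collapsed to a single space.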
This standard makes use of the following attributes:
The vocab
and prefix
attributes are used to allow the creation of shorthand IRIs per §2. Full support for their RDFa semantics is required by this standard, except that the use of an initial context to provide defaults is optional.
The typeof
attribute is used to locate formatted citations per §3.1. Support for any other use of this attribute is optional; any unsupported use of it shall be marked as a source-exclusion element per §3.2 and is not further processed by this standard.
The property
attribute contains a citation element name as per §4. Full support for its RDFa semantics is required, other than when it is used in constructs that define source-exclusion elements, and except for the special behaviour RDFa gives to an rdfa:copy
property for which support is optional.
The content
attribute can be used to represent a citation element value as per §4.2. Full support for its RDFa semantics is required.
The href
and src
attributes can be used to represent a citation element value as per §4.2. They are not formally considered RDFa attributes but are part of the host language. Full support for their RDFa semantics is required if the host language permits their use, as HTML does.
The datetime
attribute can also be used to represent a citation element value as per §4.2 if the host language is HTML.
The xml:lang
and lang
attributes are used to represent a language tag as per §4.4. Full support for their RDFa semantics is required.
The datatype
attribute is used to identify the datatype of strings in a citation element value. Full support for its RDFa semantics is required.
The rel
and rev
attributes are used to denote layer derivation links per §5.3. Support for any other use of these attributes is optional; any unsupported use of them shall be marked as a source-exclusion element per §3.2 and is not further processed by this standard.
The about
, inlist
and resource
attributes are not used by this standard. Support for their RDFa semantics is optional. Any unsupported use of them shall be marked as a source-exclusion element per §3.2 and is not further processed by this standard, except when the presence of one of these attributes (but not its particular value) prevents the recognition of a nested source-type element per §5.1.
In addition, when the host language is HTML, special meaning is attached to the <time>
element.
In this standard, unless otherwise stated, the term HTML refers to any backwards-compatible version of HTML, and XHTML refers to any version of HTML that is also well-formed XML.
The use of HTML, or a subset of HTML, is often permitted in genealogy applications to allow users to add formatting to text in various contexts. It is recommended that applications which allow users to edit or manually lay out formatted citations should permit the use of some HTML elements in them.
If an application automatically generates an HTML formatted citation from a citation element set, it should add RDFa attributes in such a manner that another application conformant with this standard will be able to extract the citation elements again. This should not be an application’s principal means of serialising a citation element set: applications should prefer a format that serialises the citation element set directly rather than after converting it to a formatted citation.
RDFa attributes are not the recommended way of serialising citation element sets primarily because doing so requires creating a formatted citation. Doing this to a reasonable standard is non-trivial, and results in a particular language and style being favoured. This standard is provided for situations when a formatted citation is desired or required anyway. For example, much genealogical research has been published online in HTML and includes formatted citations. If they are tagged according to this standard, these formatted citations can be copied into a genealogy application which can convert them back to a citation element set.
The process for generating a formatted citation, with or without RDFa attributes, is outside the scope of this standard, and this standard does not require applications to produce formatted citations.
An application parsing an HTML or XML file for citation elements in accordance with this standard shall follow the steps outlined in this section. Conformant applications may deviate from this processing sequence only if doing so has no effect on the observable behaviour of the application.
The application shall first parse the host language according to the applicable standards for the host language. The application may carry out any form of validation that is defined for the host language and reject input that fails. The application may also accept input that is not well-formed according to the rules of the host language, and parse it in some implementation-defined manner. It is recommended that XML that is not well-formed be rejected.
If the application is following the procedure described in this standard rather than using a full RDFa parser, the application shall process the document as follows:
datatype
, property
, rel
, rev
and typeof
attributes shall be expanded according to the rules in §2.
property
attributes that identify citation elements shall be located according to the process defined in §3, and the value of the property
attribute becomes the citation element name.
Alternatively, if a full RDFa parser is being used, the application shall process the document as follows:
The [CEV Concepts] standard makes heavy use of IRIs as identifiers, as does RDFa. In particular, the datatype
, property
, rel
, rev
and typeof
attributes contain IRIs.
The datatype
attribute shall contain a single IRI. The property
, rel
, rev
and typeof
attributes shall contain a list of IRIs separated by whitespace. Leading and trailing whitespace is discarded.
A common reason why multiple IRIs might be present is when two IRIs exist with similar meanings and the creator of the citation wishes to use both for compatibility.
<i property="https://terms.fhiso.org/sources/title
http://purl.org/dc/terms/title">Les ancêtres de
Charlemagne</i>
Here two alternative IRIs are used to tag the title, presumably because the citation’s creator anticipated it being processed by applications that support [Dublin Core] metadata as well as FHISO’s Citation Elements standards. A parser conforming to this standard will treat both IRIs as valid and create two citation elements, both with the same citation element value; however, if the Dublin Core IRI is not known to the application, it will likely be ignored.
In the uses described by this standard the property
attribute will always contain a citation element term, and the datatype
attribute will always contain a datatype name. The typeof
attribute will contain an IRI that allows this standard’s use of RDFa to be distinguished from any other uses also present in the document. The rev
and rel
attributes will contain a source derivation type to denote citation layer links.
RDFa provides two separate mechanisms for abbreviating the IRIs in these attributes: by setting a local default vocabulary, and by using prefixes to create compact URI expressions (CURIEs) as a form of prefix notation. Applications processing formatted citations in accordance with this standard must support both of these mechanisms. Expansion of terms using the local default vocabulary shall be done before the expansion of CURIEs. An application must behave as if all datatype
, property
, rel
, rev
and typeof
attributes have been expanded before continuing to process the data.
typeof
attribute is the only one whose value invariably needs expanding.
A term in RDFa is an XML NCName that also permits slash (U+002F) as a non-leading character. It matches the term
production given in §7.4.3 of [RDFa Core].
This production is as follows:
term ::= NCNameStartChar termChar*
termChar ::= ( NameChar - ':' ) | '/'
The definitions of NameChar
and NCNameStartChar
are found in [XML] and [XML Names] respectively.
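As an illustration, the term production can be approximated with a regular expression. The character ranges below are transcribed from the NameStartChar and NameChar productions of [XML], with the colon removed as [XML Names] requires; the helper name `is_term` is hypothetical, and the sketch should be checked against those specifications before being relied upon.

```python
import re

# NCNameStartChar: XML's NameStartChar without the colon.
NAME_START = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Characters that NameChar allows in addition to NameStartChar.
NAME_EXTRA = "\\-.0-9\u00B7\u0300-\u036F\u203F\u2040"

# term ::= NCNameStartChar termChar*, where termChar also permits '/'.
TERM = re.compile("[" + NAME_START + "][" + NAME_START + NAME_EXTRA + "/]*")


def is_term(token):
    """Return True if the token is syntactically an RDFa term."""
    return TERM.fullmatch(token) is not None
```

Under this sketch `title` and `sources/title` are terms, while `/title` (leading slash), `cev:title` (contains a colon) and `2nd` (leading digit) are not.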
When a datatype
, property
, rel
, rev
or typeof
attribute contains a term, it shall be converted to an IRI by prepending the local default vocabulary if one exists. The local default vocabulary is an IRI which is specified using a vocab
attribute. It applies to the element where it is specified and to all elements in its content unless overridden with another vocab
attribute.
Markup generators should ensure that a vocab
attribute is present if terms are being used when compatibility between implementations is desirable. When these attributes are used in a host language other than HTML, the definition of the host language may provide a default vocabulary that applies in the event that no vocab
attribute is found; HTML provides no such default.
If no local default vocabulary was found, a parser may use an initial context as described in §9 of [RDFa Core] to resolve the term to an IRI; if not, or if it was not found in the initial context, the term shall be ignored. When an initial context is used, it must be the standard one for the host language: implementations must not define their own initial context.
<p><span property="authorName">Settipani, Christian</span>.
<i vocab="https://terms.fhiso.org/sources/"
property="title">Les ancêtres de Charlemagne</i>.</p>
In this fragment, both property
attributes contain a term. The title
term is converted to the IRI of FHISO’s title
citation element:
https://terms.fhiso.org/sources/title
In considering the authorName
term, a parser looks for a vocab
attribute on the <span>
or the enclosing <p>
element. No such attribute exists, and the RDFa attributes are being used in HTML which provides no default vocabulary.
The parser may consider the standard initial context too, and if it is a full RDFa parser it must. As the host language is HTML, the initial context is defined in [HTML5+RDFa Context]. At the present time this only includes mappings for describedBy
, license
and role
. These are to be matched case-sensitively, or failing that case-insensitively, but the authorName
term used in this example clearly does not match.
Regardless of whether the application considered the initial context, the authorName
term cannot be resolved to an IRI and is therefore ignored.
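The term-resolution rules can be sketched as a small function. This is a hedged illustration only: the function name `expand_term`, the representation of inherited `vocab` attributes as a list ordered from the outermost element to the current one, and the treatment of an empty `vocab` value as "no local default vocabulary" are all assumptions of the sketch, not requirements of this standard.

```python
def expand_term(term, vocabs, initial_terms=None):
    """Resolve a term to an IRI, or return None if it must be ignored.

    vocabs        -- vocab attribute values from the outermost ancestor to the
                     current element, with None where the attribute is absent
    initial_terms -- optional term mappings from the host language's standard
                     initial context (its use is optional for this standard)
    """
    # The innermost vocab attribute in scope wins.
    for vocab in reversed(vocabs):
        if vocab is not None:
            if vocab:
                return vocab + term
            break  # assumption: an empty vocab means no local default vocabulary
    if initial_terms:
        # Match case-sensitively first, then case-insensitively.
        if term in initial_terms:
            return initial_terms[term]
        for name, iri in initial_terms.items():
            if name.lower() == term.lower():
                return iri
    return None  # unresolved terms are ignored


FHISO = "https://terms.fhiso.org/sources/"
print(expand_term("title", [None, FHISO]))      # vocab on the <i> element
print(expand_term("authorName", [None, None]))  # no vocab on <span> or <p>
```

For the fragment above, the first call yields the IRI of the title citation element, and the second yields `None` because no `vocab` attribute is in scope for the `authorName` term.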
A CURIE comprises two components, a prefix and a reference, separated by a colon (U+003A). It matches the curie
production given in §6 of [RDFa Core].
This production is defined as follows:
curie ::= ( prefix? ':' )? reference
prefix ::= NCName
reference ::= ( ipath-absolute | ipath-rootless | ipath-empty )
( '?' iquery )? ( '#' ifragment )?
The definition of NCName
is found in [XML Names]. The various productions referenced in the definition of reference
are defined in [RFC 3987]. None of these ipath
productions match a string beginning “//
”, therefore IRIs of the form http://
… never match the curie
syntax production. There is a conflict with certain other, less-used IRI schemes, and mailto:user@example.com
does match the syntax. However this only results in this IRI being treated as a CURIE if mailto
is defined as a CURIE prefix. The RDFa working group considered the risk of this to be minimal.
Although this syntax definition allows the omission of both prefix
and the colon, in practice there is no situation in RDFa where both can be omitted and the result still parsed as a CURIE. A parser conforming to this standard may safely treat the colon as mandatory.
When a datatype
, property
, rel
, rev
or typeof
attribute contains a whitespace separated token that is syntactically a CURIE, the parser should look up its prefix to see whether a prefix mapping (which is an IRI) has been defined. This look-up is done case-insensitively.
If the prefix has been omitted and the CURIE begins with a colon, parsers may ignore the CURIE and must not fall back to treating it as an IRI; if it is not ignored, the prefix mapping must be
http://www.w3.org/1999/xhtml/vocab#
When the prefix is present, a parser must try to look it up in the local prefix mappings. These are set using prefix
attributes. This attribute must contain an even number of whitespace separated tokens: the first and every subsequent odd token must be an NCName
followed by a colon; the second and every subsequent even token must be an IRI. The NCName
is the prefix and the IRI is its prefix mapping. The mapping applies to the element where it is specified and to all elements in its content unless overridden.
The following is an example of a well-formed prefix
attribute.
<div prefix="cev: https://terms.fhiso.org/sources/
dc: http://purl.org/dc/elements/1.1/">
<i prefix="dc: http://purl.org/dc/terms/"
property="cev:title dc:title">Les ancêtres de Charlemagne</i>
</div>
The prefix
attribute on the <div>
defines two local prefix mappings, one for the cev
prefix, the other for the dc
prefix. The dc
local prefix mapping is overridden by the prefix
attribute on the <i>
element; the cev
local prefix mapping has not been overridden and remains in operation.
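The way nested `prefix` attributes combine can be sketched as follows. The function names are hypothetical, prefixes are lower-cased because look-up is case-insensitive, and the handling of malformed input (a trailing unpaired token, or a prefix token without a colon, is simply skipped) is an assumption of the sketch rather than behaviour defined by this standard.

```python
def parse_prefix_attribute(value):
    """Parse one prefix attribute into a {prefix: prefix mapping} dictionary."""
    tokens = value.split()
    mappings = {}
    # Odd-numbered tokens are "NCName:", even-numbered tokens are IRIs.
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":") and len(name) > 1:
            mappings[name[:-1].lower()] = iri
    return mappings


def local_prefix_mappings(prefix_attributes):
    """Merge prefix attributes given from the outermost element inwards."""
    merged = {}
    for value in prefix_attributes:
        if value:
            merged.update(parse_prefix_attribute(value))  # inner overrides outer
    return merged


in_scope = local_prefix_mappings([
    "cev: https://terms.fhiso.org/sources/ dc: http://purl.org/dc/elements/1.1/",
    "dc: http://purl.org/dc/terms/",
])
print(in_scope["cev"], in_scope["dc"])
```

For the `<i>` element in the example this gives the `cev` mapping from the `<div>` and the overriding `dc` mapping from the `<i>` itself.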
The prefix consisting of a single underscore character (U+005F) has special meaning in §7.4.5 of [RDFa Core] for referencing blank nodes. It must not be used in CURIEs other than for that purpose. Support for blank nodes is not recommended in this standard. Applications that do not support blank nodes must ignore CURIEs with a prefix consisting of a single underscore.
In determining the local prefix mappings, a parser may also use XML namespace declarations as defined in §7.5, item 3 of [RDFa Core]. This is not required even in full RDFa parsers and is deprecated; it is not recommended by this standard.
If the prefix was not found in the local prefix mappings, a parser may use an initial context as described in §9 of [RDFa Core] to determine the prefix mapping. When an initial context is used, it must be the standard one for the language on which the RDFa tags are used: implementations must not define their own initial context.
If a prefix mapping is found, the CURIE is converted to an IRI by prepending the prefix mapping to the reference part of the CURIE.
The two CURIEs in the previous example expand to these IRIs:
https://terms.fhiso.org/sources/title
http://purl.org/dc/terms/title
If no prefix mapping is found, the CURIE shall be treated as an IRI if it is syntactically valid as one or ignored otherwise. If this results in an IRI with an unknown scheme, the parser may ignore it; parsers must not ignore the http
, https
or urn
schemes.
prefix:reference
is a valid IRI, despite having an unknown scheme. The option of ignoring IRIs with unknown schemes is introduced because this standard makes the use of an initial context optional. CURIEs with prefixes that would be resolved via the initial context in a full RDFa parser may therefore be left unresolved by a parser conforming to this standard. Almost invariably they will have an unknown scheme when reinterpreted as an IRI and can therefore be dropped. Full RDFa parsers must use initial contexts and therefore must not ignore IRIs with unknown schemes.
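The CURIE rules in this section can be drawn together in a short sketch. It is one possible reading under stated assumptions: the function name `expand_curie` is hypothetical, the initial context is not consulted, blank nodes are unsupported, and a token with no prefix mapping is kept as an IRI only when its scheme is in a set of known schemes, without any fuller IRI syntax check.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"
REQUIRED_SCHEMES = frozenset({"http", "https", "urn"})  # must never be ignored


def expand_curie(token, mappings, known_schemes=REQUIRED_SCHEMES):
    """Expand a CURIE or IRI token to an IRI, or return None to ignore it.

    mappings -- local prefix mappings in scope, keyed by lower-cased prefix
    """
    prefix, colon, reference = token.partition(":")
    if not colon:
        return None  # no colon: a term, which is expanded separately
    if prefix == "":
        return XHTML_VOCAB + reference  # the CURIE began with a colon
    if prefix == "_":
        return None  # blank node CURIE, unsupported in this sketch
    mapping = mappings.get(prefix.lower())  # look-up is case-insensitive
    # A reference starting "//" never matches the curie production.
    if mapping is not None and not reference.startswith("//"):
        return mapping + reference
    # No prefix mapping: treat the token as an IRI, ignoring unknown schemes.
    return token if prefix.lower() in known_schemes else None


scope = {"cev": "https://terms.fhiso.org/sources/",
         "dc": "http://purl.org/dc/terms/"}
print(expand_curie("cev:title", scope))
print(expand_curie("dc:title", scope))
print(expand_curie("foaf:name", scope))
```

With these mappings the first two calls produce the two expanded IRIs listed earlier, while `foaf:name` has no mapping and an unknown scheme, so this sketch ignores it.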
In general, a document will contain more than just a single citation element set, and parts of the document may also contain RDFa attributes for entirely different purposes; even if the only use of RDFa is for tagging citation elements it is important not to confuse the citation elements from one formatted citation or citation layer with those of another.
Citation elements are represented by property
attributes; however a property
attribute shall only be interpreted as representing a citation element if:
the property
attribute is on an element contained within a source-type element (as defined in §3.1) known as its associated source-type element, but is not located on the source-type element itself; and
the property
attribute is not located on a source-exclusion element (as defined in §3.2) within its associated source-type element, nor is it located on an element contained within a source-exclusion element within its associated source-type element.
The property
attributes matching the above criteria shall be used to generate citation elements as described in §4. The set of citation elements generated from property
attributes with a common associated source-type element shall form a citation element set, which represents a citation layer or a single-layered citation, as described in §5. The order of the citation elements in the citation element set shall be the order in which the property
attributes from which they were generated appear in the document.
Alternatively an application using a full RDFa parser may identify citation element triples per §3.3 and parse them according to §4.5.
A source-type element is any element that has a typeof
attribute whose value, once shorthand IRIs have been expanded, includes either of the following IRIs:
https://terms.fhiso.org/sources/Source
https://terms.fhiso.org/sources/CitedSource
Formally these terms are defined as follows:
Name | https://terms.fhiso.org/sources/Source |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | http://www.w3.org/2000/01/rdf-schema#Resource |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
Name | https://terms.fhiso.org/sources/CitedSource |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | https://terms.fhiso.org/sources/Source |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
HTML or XML content is only considered to be part of a formatted citation if it is a source-type element or is contained within one.
The following example contains two entirely unrelated uses of RDFa attributes:
<p vocab="https://terms.fhiso.org/sources/" typeof="Source">
<span property="authorName">Settipani</span>, <i>Ibid.</i></p>
<div vocab="http://creativecommons.org/ns#">Released under a
<a href="http://creativecommons.org/licenses/by/3.0/"
property="license">Creative Commons License</a>.</div>
The typeof
attribute of the <p>
element has a value that expands to the required IRI. This marks the <p>
element as a source-type element, and its contents as a formatted citation. This contains just one property
attribute, so a parser will find just one citation element: an authorName
one with value “Settipani”.
The license
property is not contained in a source-type element and therefore does not denote a citation element. It is a use of RDFa that is outside the scope of this standard. This is as well: Settipani’s book is not licensed under a Creative Commons License, though a page discussing it may well be.
An external mechanism may be used to designate the entirety of an HTML document or fragment as a source-type element.
typeof
attribute is optional.
resource
attribute on source-type elements to generate certain “meta” citation elements such as a UUID or a “citation authority IRI”.
property
attributes that are part of more complex RDFa constructs which this standard does not require to be supported. Future FHISO standards may make use of some of these RDFa constructs and this restriction also allows for forwards compatibility.
An application that supports only those RDFa features for which support is required by this standard must consider an element to be a source-exclusion element of a given source-type element if it is contained within the source-type element (but is not the source-type element itself) and has an attribute named about
, inlist
, rel
, resource
, rev
, or typeof
.
The following example includes a more complex use of RDFa attributes, beyond what this standard requires to be understood.
<p prefix="foaf: http://xmlns.com/foaf/0.1/"
vocab="https://terms.fhiso.org/sources/" typeof="CitedSource">
<span rel="foaf:maker">
<span property="foaf:name">Settipani</span></span>,
<i property="title">Les ancêtres de Charlemagne</i>.
</p>
The <p>
element is a source-type element due to the typeof="CitedSource"
attribute, and the formatted citation is the string “Settipani, Les ancêtres de Charlemagne.”
The <p>
element has one source-exclusion element: the outer <span>
element due to its rel
attribute. Parsers are not expected to understand the meaning of this rel
attribute, just to note its presence. As the inner <span>
element is contained within this source-exclusion element, the property="foaf:name"
attribute must not be treated as a citation element.
The property
attribute on the <i>
element is not located within a source-exclusion element, and therefore it does denote a citation element. This is the only citation element in this example.
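The interplay of source-type elements and source-exclusion elements can be illustrated with a small parser built on Python's `html.parser`. This is a sketch under several assumptions, not a conformant implementation: the names `CitationParser` and `citation_element_sets` are hypothetical; only `vocab` is used to expand terms, and any token containing a colon is passed through unexpanded; nested source-type elements are treated simply as source-exclusion elements; and only the text content of an element is used as the citation element value.

```python
from html.parser import HTMLParser

SOURCE_TYPES = {
    "https://terms.fhiso.org/sources/Source",
    "https://terms.fhiso.org/sources/CitedSource",
}
EXCLUDING = {"about", "inlist", "rel", "resource", "rev", "typeof"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class CitationParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sets = []     # one list of (name, text chunks) per source-type element
        self._stack = []   # one frame per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        parent = self._stack[-1] if self._stack else {
            "vocab": None, "set": None, "excluded": False}
        vocab = attrs.get("vocab") or parent["vocab"]

        def expand(token):
            if ":" in token:
                return token  # simplification: CURIEs and IRIs are not expanded
            return vocab + token if vocab else None

        frame = {"vocab": vocab, "set": parent["set"],
                 "excluded": parent["excluded"], "slots": []}
        if parent["set"] is None:
            # Outside any formatted citation: look for a source-type element.
            types = {expand(t) for t in (attrs.get("typeof") or "").split()}
            if types & SOURCE_TYPES:
                frame["set"] = []
                self.sets.append(frame["set"])
        else:
            if EXCLUDING.intersection(attrs):
                frame["excluded"] = True  # source-exclusion element
            if not frame["excluded"]:
                for token in (attrs.get("property") or "").split():
                    name = expand(token)
                    if name:
                        slot = (name, [])
                        frame["set"].append(slot)
                        frame["slots"].append(slot)
        self._stack.append(frame)

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        for frame in self._stack:
            for _, chunks in frame["slots"]:
                chunks.append(data)


def citation_element_sets(html):
    """Return one list of (name, value) pairs per source-type element found."""
    parser = CitationParser()
    parser.feed(html)
    parser.close()
    return [[(name, " ".join("".join(chunks).split())) for name, chunks in s]
            for s in parser.sets]
```

Applied to the example above, the sketch finds one citation element set containing only the title citation element, because the `foaf:name` property lies inside the `<span>` that carries a `rel` attribute.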
Applications which support a larger part of RDFa than this standard requires may treat fewer elements as source-exclusion elements. If so, they must ensure that RDFa constructs are only treated as citation elements when they produce relevant RDF triples as defined in §3.3.
Instead of identifying source-type elements and source-exclusion elements, as specified in §3.1 and §3.2, applications supporting more RDFa features than this standard requires may parse the document in accordance with [RDFa Core] to generate a sequence of RDF triples which must be in the order in which §7.5 of [RDFa Core] states that they are produced.
property
attributes are processed and used to generate RDF triples in document order. However the [RDFa Core] processing model requires these triples be added to an RDF graph, and RDF graphs are not required to preserve the order of triples; nevertheless, most current RDFa processors do output properties in document order. Implementations using an RDFa parser to implement this specification must verify that the document order of properties can be determined.
Triples whose predicate is the following IRI have a special role in RDF:
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
This IRI is referred to as the rdf:type
IRI, and such triples are referred to as rdf:type
triples. They are used to state that the declared type of the subject of the triple is the object of the triple.
Suppose the RDF graph extracted from a document contains a triple whose subject is a blank node _:1
, whose predicate is the rdf:type
IRI, and whose object is the following IRI:
https://terms.fhiso.org/sources/CitedSource
This means that the declared type of the blank node _:1
is
https://terms.fhiso.org/sources/CitedSource
If there is no triple stating the declared type of a particular entity, it has no declared type. An entity might have multiple declared types if an RDF graph has multiple rdf:type
triples with the same subject and different objects.
Not all the RDF triples extracted from a document will necessarily correspond to citation elements. RDF triples that do represent a citation element are known as citation element triples. Applications shall determine which RDF triples are citation element triples as follows.
If the object of the RDF triple is an RDF blank node, the triple shall not be considered a citation element triple.
Otherwise, if the predicate of the RDF triple is a term whose type is known to be the following class, the triple shall be considered a citation element triple:
https://terms.fhiso.org/sources/CitationElement
Otherwise, if the predicate of the RDF triple is the rdf:type
IRI or is known to be a source derivation type, as defined in §5.1 of [CEV Concepts], the triple shall not be considered a citation element triple.
If the IRI is a source derivation type, the triple represents a layer derivation link rather than a citation element. Because §5.1 of [CEV Concepts] leaves the mechanism for defining new source derivation types to a future FHISO standard, applications might not know whether the IRI is a source derivation type. The only IRI that a conformant application must recognise as a source derivation type is the one defined in [CEV Concepts]:
https://terms.fhiso.org/sources/derivedFrom
Otherwise, if the object of the RDF triple has a declared type which is or includes one of the following IRIs, the triple shall not be considered a citation element triple.
https://terms.fhiso.org/sources/Source
https://terms.fhiso.org/sources/CitedSource
Otherwise, if the subject of the RDF triple has a declared type which is or includes one of the following IRIs, the triple shall be considered a citation element triple.
https://terms.fhiso.org/sources/Source
https://terms.fhiso.org/sources/CitedSource
Otherwise, if the application can infer the RDF type of the subject of the RDF triple to be one of the two previous IRIs, the triple should be considered a citation element triple.
Otherwise, the RDF triple should be considered a citation element triple.
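The sequence of tests above can be read as a decision procedure, sketched below. The data model is an assumption made for illustration: IRIs are plain strings, blank nodes and literals are small dataclasses, and the names `declared_types` and `is_citation_element_triple` are hypothetical. Real RDF libraries provide their own node types.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CITATION_ELEMENT = "https://terms.fhiso.org/sources/CitationElement"
DERIVED_FROM = "https://terms.fhiso.org/sources/derivedFrom"
SOURCE_TYPES = {
    "https://terms.fhiso.org/sources/Source",
    "https://terms.fhiso.org/sources/CitedSource",
}


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    text: str


def declared_types(triples):
    """Map each subject to the set of types stated by its rdf:type triples."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and isinstance(obj, str):
            types.setdefault(subject, set()).add(obj)
    return types


def is_citation_element_triple(triple, types, term_types=None,
                               derivation_types=frozenset({DERIVED_FROM})):
    """Apply the tests of this section in order.

    types      -- result of declared_types() for the whole graph
    term_types -- optional {predicate IRI: set of known types of that term}
    """
    subject, predicate, obj = triple
    if isinstance(obj, BlankNode):
        return False
    if CITATION_ELEMENT in (term_types or {}).get(predicate, ()):
        return True
    if predicate == RDF_TYPE or predicate in derivation_types:
        return False
    if types.get(obj, set()) & SOURCE_TYPES:
        return False
    if types.get(subject, set()) & SOURCE_TYPES:
        return True
    # The two remaining tests are both "should" rules with the same outcome,
    # so this sketch does not attempt any type inference.
    return True
```

A graph would first be passed through `declared_types`, and each triple then tested in the order in which it was produced, so that the resulting citation elements keep their document order.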
As defined in the [CEV Concepts] standard, a citation element consists of two components:
Once an application has identified the property
attributes that represent citation elements according to the process given in §3, it shall determine each component of each citation element as follows.
The citation element name shall be the value of the property
attribute, once shorthand IRIs have been expanded. If the property
attribute contains more than one IRI, each shall be used as the citation element name of a separate citation element with a copy of the same citation element value.
To construct the citation element value, an application shall determine its current property value, as defined in §4.2 below. This is a string and is used to construct a new localisation set to be the citation element value. The application shall then determine the datatype of the string per §4.3, and if the result is a language-tagged datatype, shall also determine its language tag per §4.4. Alternatively, applications that opt to parse RDFa to RDF triples, as a full RDFa parser does, may determine the current property value, datatype and language tag per §4.5.
For the purpose of this section, the current element refers to the XML or HTML element that has the property
attribute which tags the current citation element.
RDFa, as used in this standard, is a list-flattening format. This means it does not naturally provide a means of keeping the localisation sets of each citation element separate because it has no means of distinguishing multi-valued citation elements from translated or localised versions of the same citation element. Applications must therefore assume every property attribute identifies a separate citation element.
The following RDFa markup is well-formed but will be misinterpreted by a parser conforming to this specification.
<p lang="en-GB" typeof="Source">
<span property="authorName"
content="Lansdowne, Marquess of">Lord Lansdowne</span> and
<span property="authorName" lang="ja-Latn">Hayashi Tadasu</span>
(<span property="authorName" lang="ja">林 董</span>),
<i property="title">The Anglo-Japanese Treaty</i>,
<span property="publicationDate">1902</span>.
</p>
The Anglo-Japanese Treaty was (at least nominally) authored by two people: the Marquess of Lansdowne and Count Hayashi Tadasu whose name is written in kanji as 林 董. A conformant application will see three authorNames and make each into a separate citation element, when in fact the desired behaviour is for “林 董” to be part of the same localisation set as “Hayashi Tadasu”.
Applications are required to use the localisedElement mechanism defined in §3.4.1 of [CEV Concepts] when multiple translations or localisations of a single citation element value are needed.
The RDFa markup from the previous example can be fixed by using a localisedElement to encode the second form of Hayashi’s name. At its simplest, this alters the two <span> elements referring to Hayashi to read:
<span property="authorName" lang="ja-Latn">Hayashi Tadasu</span>
(<span property="localisedElement" lang="ja">林 董</span>)
However, [CEV Concepts] recommends that the first string in the localisation set should be the untranslated, and ideally untransliterated form of the citation element. Undoubtedly it is the Latin form that is the transliteration, and therefore these elements are the wrong way round. While this is only a recommendation, applications should try to follow it; this can be achieved as follows:
<span property="authorName" lang="ja" content="林 董"></span>
<span property="localisedElement"
lang="ja-Latn">Hayashi Tadasu</span> (林 董)
This use of the content attribute is discussed below. It provides a value for the citation element while hiding the value from an HTML renderer.
The current property value is a string which will be used to create the citation element value. It is determined based on the RDFa attributes present on the current element as follows.
If the current element has a content attribute, and either has no datatype attribute, or its datatype attribute is empty or has a value (after expanding shorthand IRIs) other than either of the following IRIs, then the current property value shall be the value of the content attribute.
http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML
The purpose of the content attribute is to allow the citation element value to be something that is not rendered or otherwise used in HTML. This is particularly important when the citation element is required to have a value in a format that is different to how the element is formatted.
<span property="https://terms.fhiso.org/sources/publicationDate"
content="2017-05-22">May 22nd, 2017</span>
In this case, the use of a content attribute is necessary because the publicationDate citation element value must be a date in the prescribed date format based on [ISO 8601]: it must not be a date like “May 22nd, 2017”.
resource
attribute here. Before adding it, it is necessary to establish how safe it is to remove resource
from the list of attributes that make a source-exclusion element.
Otherwise, if the host language is HTML and the current element has a datetime attribute, the current property value shall be the value of the datetime attribute.
Otherwise, in HTML or in other XML languages that support an href attribute, if the current element has an href attribute and no datatype attribute, the current property value shall be the value of the href attribute, which shall be an IRI.
Otherwise, in HTML or in other XML languages that support a src attribute, if the current element has a src attribute and no datatype attribute, the current property value shall be the value of the src attribute, which shall be an IRI.
datetime
, href
or src
attribute. At present, the datetime
attribute is only permitted on a <time>
element; most href
attributes in HTML are found on <a>
elements; most src
attributes are on elements that display some form of media, particularly <img>
and in HTML5, <video>
and <audio>
.
When an href or src attribute links to an online source, it can be tagged as a citation element.
<div vocab="https://terms.fhiso.org/sources/" typeof="Source">
<a href="http://discovery.nationalarchives.gov.uk/"
property="accessURL"><span property="title">Discovery</span>
</a> (online catalogue)
</div>
This example has two citation elements:
accessURL: | http://discovery.nationalarchives.gov.uk/ |
title: | “Discovery” |
The fact that the second property attribute is on a child element of the element containing the first property attribute is irrelevant and does not signify any additional connection between the title and the accessURL over and above their usual relationship.
Otherwise, the current property value shall be formed by concatenating the text contained in each of the descendant text nodes of the current element in document order.
This definition allows citation elements to nest, which can be useful when tagging full titles and short versions of them.
<p vocab="https://terms.fhiso.org/sources/" typeof="Source">
<i property="title"><span property="shortTitle">The visitations
of Kent</span>, taken in the years 1530–1 by Thomas Benolte,
Clarenceux, and 1574 by Robert Cooke, Clarenceux.</i>
</p>
The shortTitle property takes the value “The visitations of Kent”, while the title property takes the value “The visitations of Kent, taken in the years …” by concatenating the text in the nested <span> element with the text directly in the <i> element.
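The order of the tests in this section can be sketched in Python using the standard library's xml.etree.ElementTree. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function name current_property_value and its parameters are hypothetical, the caller is assumed to have already expanded any shorthand IRI in the datatype attribute, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Datatypes for which the content attribute is NOT used.
LITERAL_DATATYPES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML",
}


def current_property_value(elem, datatype_iri=None, host_is_html=True):
    """Apply the tests in order: content, datetime, href, src, then text."""
    attrs = elem.attrib
    if "content" in attrs and datatype_iri not in LITERAL_DATATYPES:
        return attrs["content"]
    if host_is_html and "datetime" in attrs:
        return attrs["datetime"]
    for name in ("href", "src"):
        if name in attrs and "datatype" not in attrs:
            return attrs[name]
    # Concatenate all descendant text nodes in document order.
    return "".join(elem.itertext())


title = ET.fromstring(
    '<i property="title"><span property="shortTitle">The visitations '
    'of Kent</span>, taken in the years 1530</i>'
)
print(current_property_value(title))
print(current_property_value(title[0]))
```

Because itertext() walks into child elements, the nested shortTitle text is included in the outer title value, as in the example above.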
A conformant parser must determine the datatype which tags the string in the citation element value as follows.
If the current element has a non-empty datatype attribute, then the datatype shall be the value of the datatype attribute once shorthand IRIs have been expanded. The datatype attribute must not contain the name of a language-tagged datatype or the name of an abstract datatype. The use of a datatype attribute is recommended for citation elements that are not well known, provided the datatype is not one that is prohibited in a datatype attribute.
Suppose a vendor defines a citation element called reviewDate which contains an [ISO 8601] date. This third-party element may not be well known, so an RDFa author should mark up its use with a datatype attribute:
<span prefix="vendor: http://example.com/sources/
xsd: http://www.w3.org/2001/XMLSchema#"
property="vendor:reviewDate" datatype="xsd:date"
content="2000-10-08" />
By using a datatype attribute, the RDFa author is ensuring the application processing the data knows the citation element is a date and will display it to the user appropriately, even if it does not know exactly what the date signifies.
datatype
attribute because the RDFa parsing rules mean the language tag is discarded if a datatype
attribute is found.
Otherwise, if the host language is HTML and the current property value was found in a datetime attribute or was the contents of a <time> element, an application may examine the current property value and, if it is syntactically valid as one of the following structured non-language-tagged datatypes defined in [XSD Pt2], may determine that to be the datatype:
http://www.w3.org/2001/XMLSchema#date
http://www.w3.org/2001/XMLSchema#time
http://www.w3.org/2001/XMLSchema#dateTime
http://www.w3.org/2001/XMLSchema#duration
http://www.w3.org/2001/XMLSchema#gYear
http://www.w3.org/2001/XMLSchema#gYearMonth
datatype
attribute.
An application that implements this rule would read the markup below and generate a citation element value whose single string “2000-10-08” would be tagged with the xsd:date datatype.
<time property="vendor:reviewDate">2000-10-08</time>
Had a different HTML element been used, say a <span>, or if the parser does not support this rule, the datatype would fall back to rdf:langString. If this third-party citation element were unfamiliar to the application, it would not undergo datatype correction per §4.4 of [CEV Concepts], and would remain with the wrong datatype. For this reason, an explicit datatype attribute is recommended:
<time property="vendor:reviewDate"
datatype="xsd:date">2000-10-08</time>
Otherwise, if the current property value was found in a src or href attribute, then the datatype shall be:
http://www.w3.org/2001/XMLSchema#anyURI
Otherwise, the application shall attempt to determine whether a language tag is in scope per §4.4; if a language tag can be determined, the datatype shall be the rdf:langString type:
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
Otherwise, the application shall determine the datatype to be:
http://www.w3.org/2001/XMLSchema#string
Applications wishing not to handle the xsd:string datatype are allowed by §2.4.2 of [CEV Concepts] to change this datatype to
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
and tag the string with a language tag of und.
The datatypes selected in the last three cases are the three datatypes which are defined to participate in the datatype correction mechanism defined in §3.4 of [CEV Concepts]:
http://www.w3.org/2001/XMLSchema#anyURI
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
http://www.w3.org/2001/XMLSchema#string
Applications may opt to apply datatype correction while parsing RDFa for citation elements; if so, these datatypes will often be replaced by the default datatype of the citation element term.
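The sequence of datatype tests in this section can be sketched as follows. This is a hedged illustration only: the function determine_datatype, its parameters and the helper looks_like_xsd_date are hypothetical names, and the optional <time> rule is implemented for the plain YYYY-MM-DD form of xsd:date alone, ignoring time zones and the other five datatypes listed above.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def looks_like_xsd_date(value):
    """Simplified check: plain YYYY-MM-DD only (assumption, see lead-in)."""
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if not m:
        return False
    try:
        date(int(m[1]), int(m[2]), int(m[3]))
    except ValueError:
        return False
    return True


def determine_datatype(value, datatype_iri, value_source, language_tag,
                       from_time=False):
    """value_source is 'content', 'datetime', 'href', 'src' or 'text'."""
    if datatype_iri:                        # explicit, already expanded
        return datatype_iri
    if from_time and looks_like_xsd_date(value):
        return XSD + "date"                 # optional HTML <time> rule
    if value_source in ("href", "src"):
        return XSD + "anyURI"
    if language_tag:
        return RDF_LANGSTRING
    return XSD + "string"


print(determine_datatype("2000-10-08", None, "text", "en", from_time=True))
print(determine_datatype("2000-10-08", None, "text", "en"))
```

The second call models the <span> case discussed above: without the <time> rule or an explicit datatype attribute, the date string falls back to rdf:langString.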
The language tag of the citation element shall be the value of the xml:lang or lang attribute on the current element or, failing that, on the nearest ancestor element of the current element that has such an attribute. If both attributes are present on the same element, the xml:lang attribute takes precedence.
xml:lang
and lang
attributes may be used on an HTML element. In particular, the xml:lang
attribute is only allowed in XHTML documents.
<p vocab="https://terms.fhiso.org/sources/"
typeof="Source" lang="en">
<span property="authorName"
content="Settipani, Christian">Christian Settipani</span>,
<i property="title" lang="fr">Les ancêtres de Charlemagne</i>,
<span property="edition" content="2">2nd ed.</span>
</p>
This formatted citation is correctly tagged with the language tag en denoting English. This is because, even though the book’s title is French, the citation as a whole is in English. Had the citation been written in French, the edition would have been written “2ᵉ éd” rather than “2nd ed”.
This example contains three citation elements. The authorName and edition citation elements both inherit the en language tag. In the case of authorName this may or may not be what was intended: the author is French but his name would not normally be altered in translation to English. The explicit language tag is necessary on the title citation element, as the title is clearly French.
If no applicable xml:lang or lang attribute is found, an external mechanism may be used to supply the language tag.
Content-Language
header may provide the default language tag for the whole document.
xml:lang
attributes in the host XML will be inherited by the XHTML as defined in §2.12 of [XML].
When these attributes are used in host languages other than HTML, the definition of the host language may provide a default language tag that applies in the event that no such attribute is found.
und
(defined in [ISO 639-2] to represent an undetermined language) should be used.
If no applicable xml:lang or lang attribute was found, no value was supplied through an external mechanism and no default applies, or if the provided language tag is an empty string, the citation element has no language tag.
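The language tag lookup described in this section can be sketched in Python with xml.etree.ElementTree. This is a minimal sketch, assuming well-formed XML input; the function name language_tag and its default parameter (standing in for any externally supplied or host-language default) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_tag(root, elem, default=None):
    """Return the language tag in scope for elem, or None if there is none."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = elem
    while node is not None:
        for attr in (XML_LANG, "lang"):     # xml:lang takes precedence
            if attr in node.attrib:
                return node.attrib[attr] or None   # empty string: no tag
        node = parents.get(node)
    return default or None


doc = ET.fromstring(
    '<p lang="en"><span property="authorName">Christian Settipani</span>, '
    '<i property="title" lang="fr">Les ancêtres de Charlemagne</i></p>'
)
print(language_tag(doc, doc[0]))
print(language_tag(doc, doc[1]))
```

Here the authorName element inherits the tag from the enclosing <p>, while the title element uses its own lang attribute.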
Applications supporting more RDFa features than required by this standard may determine the current property value, its datatype and, where applicable, its language tag from the object of a citation element triple that was identified per §3.3 of this standard.
If the object of the RDF triple is a literal, then the current property value shall be the lexical form of the literal, as defined in §3.3 of [RDF Concepts]. Its datatype shall be the datatype IRI of the literal, and its language tag shall be the language tag of the literal if one is present.
Otherwise, if the object of the RDF triple is an IRI, then the current property value shall be that IRI, and its datatype shall be:
http://www.w3.org/2001/XMLSchema#anyURI
Once the citation elements in a document have been located, parsed and grouped into citation element sets, the application shall interpret each citation element set as a citation layer.
In [CEV Concepts], a citation is represented with three parts:
In these RDFa bindings, each citation layer is represented by a source-type element, and source-type elements are nested to form layered citations.
A nested source-type element is a source-type element that:
rev or rel (or has both), but does not also have an attribute named about, href, inlist, resource or src.
The citation layer represented by a nested source-type element shall be part of the same layered citation as the citation layer represented by its outer source-type element. Source-type elements may be nested arbitrarily deep, and multiple nested source-type elements may be present within the same outer source-type element: they all represent citation layers which are part of the same layered citation.
The following fragment of HTML represents a layered citation with three citation layers.
<p vocab="https://terms.fhiso.org/sources/" typeof="CitedSource">
<span property="authorName">Settipani</span>, citing
<span rel="cites" typeof="Source"><i property="title">Vita
Sancti Arnulfi</i></span> and
<span rel="cites" typeof="Source"><i property="title">Testamentum
Bertichramni</i></span>.</p>
The second <span> element is a source-type element by virtue of its typeof attribute, which also makes it a source-exclusion element of the <p> element. It has a rel attribute, and together these facts make it a nested source-type element. The <p> element is its outer source-type element. Exactly the same applies to the third <span> element, and as both are part of the same layered citation as their shared outer source-type element, both must be in the same layered citation as each other.
As the second and third <span> elements are source-exclusion elements of the outer source-type element, their title property is only a citation element of the nested source-type elements, and not also of the outer source-type element. The outer source-type element therefore only has one citation element: the authorName.
All but one of the source-type elements in a layered citation will be nested source-type elements. The one that is not is known as the outermost source-type element.
The collection of citation layers in a layered citation is an ordered list, and the citation layers should be given in document order.
The head citation layer may be indicated by a source-type element with a typeof attribute whose value, once shorthand IRIs have been expanded, includes the following IRI:
https://terms.fhiso.org/sources/CitedSource
If precisely one such element exists in the layered citation, the head citation layer shall be the citation layer represented by that element; otherwise the head citation layer shall be the citation layer represented by the outermost source-type element. There shall not be more than one source-type element in a layered citation with a typeof attribute whose value includes this IRI.
The CitedSource type is provided to facilitate the correct identification of the head citation layer, regardless of where it is placed.
Individual citation elements have not been tagged in this example for reasons of brevity.
<p vocab="https://terms.fhiso.org/sources/" typeof="Source">
1810 U.S. census, York County, Maine, town of York,
p. 435 (penned), line 9, Jabez Young;
<span rev="facsimileOf" typeof="CitedSource">NARA microfilm
publication M252, roll 12</span>.</p>
This formatted citation, based on an example in [Evidence Explained], places the head citation layer (the microfilm) at the end of the formatted citation, and marks it with a CitedSource type. In this case, the same effect could have been achieved by nesting the HTML elements differently:
<p vocab="https://terms.fhiso.org/sources/" typeof="Source">
<span rel="facsimileOf" typeof="Source">1810 U.S. census,
York County, Maine, town of York, p. 435 (penned),
line 9, Jabez Young</span>;
NARA microfilm publication M252, roll 12.</p>
In this second version, there is no need to use the CitedSource type as the head citation layer defaults to the outermost source-type element.
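The selection of the head citation layer can be sketched as below. This is an illustrative sketch only: the function name head_layer_index is hypothetical, and the input is assumed to be a list holding, for each source-type element in document order, the set of expanded IRIs from its typeof attribute, so that index 0 is the outermost source-type element.

```python
SOURCE = "https://terms.fhiso.org/sources/Source"
CITED_SOURCE = "https://terms.fhiso.org/sources/CitedSource"


def head_layer_index(layer_types):
    """Return the index of the head citation layer.

    layer_types: one set of expanded typeof IRIs per source-type element,
    in document order (so index 0 is the outermost source-type element).
    """
    cited = [i for i, types in enumerate(layer_types)
             if CITED_SOURCE in types]
    # Exactly one CitedSource: use it; otherwise fall back to the outermost.
    return cited[0] if len(cited) == 1 else 0


# First census example: the nested microfilm layer is typed CitedSource.
print(head_layer_index([{SOURCE}, {CITED_SOURCE}]))
# Second census example: no CitedSource, so the outermost layer is the head.
print(head_layer_index([{SOURCE}, {SOURCE}]))
```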
In the [CEV Concepts] data model, layer derivation links have components:
In this standard, layer derivation links are represented by rel and rev attributes on nested source-type elements.
Once shorthand IRIs have been expanded, each IRI in the rel and rev attributes shall be used as the source derivation type of a new layer derivation link. If the IRI was in a rel attribute, the derived source shall be the source represented by the outer source-type element, and the base source shall be the source represented by the nested source-type element. If the IRI was in a rev attribute, the derived source shall be the source represented by the nested source-type element, and the base source shall be the source represented by the outer source-type element.
The rel and rev attributes provide forwards and reverse versions of the same functionality: the difference being that the rel attribute is placed on the base source, while the rev attribute is placed on the derived source.
rev
attribute because the nested source-type element is the derived source, while the second version uses a rel
attribute because the nested source-type element is the base source.
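The mapping from rel and rev attributes to layer derivation links can be sketched as follows. This is a minimal sketch under stated assumptions: the function name derivation_links and the tuple layout (source derivation type, derived source, base source) are hypothetical, the IRIs are assumed to be already expanded, and the layers are represented by plain labels.

```python
FACSIMILE_OF = "https://terms.fhiso.org/sources/facsimileOf"


def derivation_links(outer, nested, rel=(), rev=()):
    """Return (type, derived, base) tuples for one nested source-type element."""
    links = []
    for iri in rel:   # rel: the outer layer is derived from the nested layer
        links.append((iri, outer, nested))
    for iri in rev:   # rev: the nested layer is derived from the outer layer
        links.append((iri, nested, outer))
    return links


# First census example: the nested microfilm carries a rev attribute.
first = derivation_links("census", "microfilm", rev=[FACSIMILE_OF])
# Second census example: the nested census carries a rel attribute.
second = derivation_links("microfilm", "census", rel=[FACSIMILE_OF])
print(first == second)
```

Both markups yield the same link, with the microfilm as the derived source and the census as the base source.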
Documents that use more RDFa features than this standard requires to be supported must not include any source-type elements, other than the head citation layer as determined by the above rules, whose RDF type can be inferred to be:
https://terms.fhiso.org/sources/CitedSource
Applications may utilise the fact that https://terms.fhiso.org/sources/CitedSource is an RDF subclass of https://terms.fhiso.org/sources/Source.
Applications which support a larger part of RDFa may find additional layer derivation links. If so, or if a full RDFa parser is being used, they must ensure that RDFa constructs are only treated as layer derivation links when they produce an RDF triple whose subject and object both have the following RDF type, or a subtype thereof:
https://terms.fhiso.org/sources/Source
In addition, the predicate of the RDF triple must be the following, or an RDF subproperty thereof:
https://terms.fhiso.org/sources/derivedFrom
The subject of the RDF triple corresponds to the derived source and its object is the base source; the predicate is the source derivation type. Such triples should not also be used to generate a citation element as would otherwise be permitted by §3.2.
In the following example, the layers have been shortened to contain just placeholder text for brevity.
<p vocab="https://terms.fhiso.org/sources/" typeof="Source">
Source A; derived from
<i resource="#B" rel="derivedFrom" typeof="Source">B</i> &amp;
<i rel="derivedFrom" typeof="Source">C
<span rel="derivedFrom" resource="#B"/>
</i>.
</p>
An application conforming only to this standard will parse this and find three citation layers, and two layer derivation links saying that A is derived from both B and C. The resource attribute on the first <i> element will be ignored, and the <span> element is a source-exclusion element and so will also be ignored.
However, a full RDFa parser will find three derivedFrom triples. In addition to the triples saying A is derived from B and C, there is a third triple saying that C is derived from B. An application may use this information to generate a third layer derivation link.
This arrangement of three layer derivation links is an example that cannot be represented in the subset of RDFa that this standard requires to be supported.
When an application has both a formatted citation tagged with RDFa attributes per this standard and a citation element set for the same citation, the two will typically have much content in common. This introduces the possibility that the data in the two places becomes unsynchronised. This section discusses ways of avoiding this.
In general, applications should consider information from the citation element set to have precedence over information extracted from a formatted citation.
If an application allows the manual editing of formatted citations tagged with RDFa attributes per this standard, it should take steps to prevent such edits from causing the citation element values that a conformant application would extract from the formatted citation to differ from the citation element values in the citation element set.
property
attribute so the changed data is no longer recognised as a citation element, or insert a content
attribute containing the correct data per §4.2.
Suppose an application generates the following formatted citation.
<p><span property="https://terms.fhiso.org/sources/authorName"
>Settipani, Christian</span>.
<i property="https://terms.fhiso.org/sources/title">Les ancêtres
de Charlemagne</i>.</p>
If a user edits this HTML to replace Les ancêtres de Charlemagne with Ibid., the application should then take steps to ensure a future parser does not believe the source literally has the title Ibid. In this case, clearly the change should not be propagated back to the citation element set as the source isn’t titled Ibid., and the user would presumably decline if offered this option. An application might delete the property attribute so Ibid. is not understood to be a title, or insert a content attribute containing the real title as follows:
<p><span property="https://terms.fhiso.org/sources/authorName"
>Settipani, Christian</span>.
<i property="https://terms.fhiso.org/sources/title"
content="Les ancêtres de Charlemagne">Ibid.</i></p>
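One way an application might detect such a discrepancy is sketched below. This is a hedged illustration, not a mechanism defined by this standard: the function name unsynchronised is hypothetical, and citation elements are modelled simply as (name, value) pairs with short names in place of full IRIs.

```python
from collections import Counter


def unsynchronised(extracted, element_set):
    """Return pairs found in the formatted citation but not in the element set.

    Both arguments are lists of (name, value) pairs; Counter subtraction
    keeps multi-valued citation elements distinct.
    """
    missing = Counter(extracted) - Counter(element_set)
    return sorted(missing.elements())


element_set = [
    ("authorName", "Settipani, Christian"),
    ("title", "Les ancêtres de Charlemagne"),
]
# Values a parser would extract after the user's edit, before any repair.
edited = [
    ("authorName", "Settipani, Christian"),
    ("title", "Ibid."),
]
print(unsynchronised(edited, element_set))
```

A non-empty result indicates that the application should either remove the property attribute or supply the stored value in a content attribute, as described above.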
If an application stores formatted citations tagged with RDFa attributes as per this standard, it should take steps to ensure that changes to the underlying citation element set propagate to the formatted citation.
This example gives a full HTML document of the sort a genealogist might publish online. In a paragraph of narrative text it gives some brief details of King Edward II’s birth and parents. Although brief, this information is properly sourced to three published books with the citations formatted according to the Chicago Manual of Style. Each of these formatted citations has been marked up with RDFa attributes as described in this standard. The document includes several other instances of RDFa attributes that will not be detected as citation elements by a compliant parser.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title property="dc:title">Edward II</title>
<meta property="dc:creator" content="FHISO, Inc." />
<style>
p { max-width: 720px; }
.notes p, .note { font-size: smaller; }
.fnref { vertical-align: super; font-size: smaller; }
.fnref::before { content: '['; }
.fnref::after { content: ']'; }
</style>
</head>
<body>
<h1>Edward II</h1>
<p>
Edward II was the fourth son of Edward I and his wife, Eleanor
of Castile.<a id="fnref1" class="fnref" href="#fn1">1</a>
He was born in Caernarfon Castle in North Wales on
25 April 1284, less than a year after Edward I had conquered
the region, and as a result is sometimes called Edward of
Caernarfon.<a id="fnref2" class="fnref" href="#fn2">2</a>
His father was the King of England, and had also inherited
Gascony in south-western France, which he held as the
feudal vassal of the King of France, and the Lordship
of Ireland.<a id="fnref3" class="fnref" href="#fn3">3</a>
</p>
<div vocab="https://terms.fhiso.org/sources/" class="notes">
<h2>References</h2>
<p typeof="Source" id="fn1"><a href="#fnref1">1</a>.
<span property="authorName">Roy Martin Haines</span>,
<i property="title">King Edward II: His Life, his Reign and
its Aftermath, 1284–1330</i>
(<span property="publicationPlace">Montreal, Canada
&amp; Kingston, Canada</span>:
<span property="publisher">McGill-Queen’s
University Press</span>,
<span property="publicationDate">2003</span>),
<span property="page" content="3">3</span>.
</p>
<p typeof="Source" id="fn2"><a href="#fnref2">2</a>.
<span property="authorName">Seymour Phillips</span>,
<i property="title">Edward II</i>
(<span property="publicationPlace">New Haven, US
&amp; London, UK</span>:
<span property="publisher">Yale University Press</span>,
<span property="publicationDate">2011</span>),
<span property="page" content="33, 36">33 &amp; 36</span>.
</p>
<p typeof="Source" id="fn3"><a href="#fnref3">3</a>.
<span property="authorName">Michael Prestwich</span>,
<i property="title">Edward I</i>
(<span property="publicationPlace">Berkeley, US
&amp; Los Angeles, US</span>:
<span property="publisher">University of California
Press</span>,
<span property="publicationDate">1988</span>),
<span property="page" content="13-14">13–14</span>.
</p>
</div>
<hr/>
<p class="note">This file is an example of an HTML document
containing formatted citations marked up with RDFa attributes
per the FHISO draft standard
<a href="http://tech.fhiso.org/drafts/cev-rdfa-bindings"
>Citation Elements: Bindings for RDFa</a>.</p>
<p vocab="http://creativecommons.org/ns#"
class="note">Content copied from
<a href="https://en.wikipedia.org/wiki/Edward_II_of_England"
property="dc:source">Wikipedia</a> and released under a
<a href="http://creativecommons.org/licenses/by-sa/3.0/"
property="license">Creative Commons License</a>.</p>
</body>
</html>
Copyright © 2017–18, Family History Information Standards Organisation, Inc. The text of this standard is available under the Creative Commons Attribution 4.0 International License.