
FHISO Citation Elements: Bindings for RDFa

This is a second public draft of a standard documenting the proposed usage of the FHISO Citation Elements standard in RDFa. This document is not an FHISO standard and is not endorsed by the FHISO membership. It may be updated, replaced or obsoleted by other documents at any time.

In particular, some examples in this draft use citation elements that are not yet included in the draft Citation Elements: Vocabulary, and source derivation types that may be standardised in a future Source Derivation Vocabulary. These are likely to change as these vocabularies progress.

The public mailing list is the preferred place for comments, discussion and other feedback on this draft.


FHISO’s suite of Citation Elements standards provides an extensible framework and vocabulary for encoding all the data about a genealogical source that might reasonably be included in a formatted citation to that source.

This information is represented as a sequence of citation elements, logically self-contained pieces of information about a source. This document defines a means by which citation elements may be identified and tagged within an XML or HTML formatted citation, allowing a computer to extract them in a systematic manner. The tagging of citation elements is done using a standard set of HTML attributes known as RDFa attributes, which can also be used in XML languages besides HTML.

Other documents in the suite of Citation Elements standards are as follows:

Not all of these documents are yet at the stage of having a first public draft.


Conventions used

Where this standard gives a specific technical meaning to a word or phrase, that word or phrase is formatted in bold text in its initial definition, and in italics when used elsewhere. The key words must, must not, required, shall, shall not, should, should not, recommended, not recommended, may and optional in this standard are to be interpreted as described in [RFC 2119].

An application is conformant with this standard if and only if it follows all the requirements and prohibitions contained in this document, as indicated by use of the words must, must not, required, shall and shall not, and the relevant parts of its normative references. Standards referencing this standard must not loosen any of the requirements and prohibitions made by this standard, nor place additional requirements or prohibitions on the constructs defined herein.

Adding requirements or prohibitions is disallowed so as to preserve interoperability between applications: data generated by one conformant application must always be acceptable to another conformant application, regardless of what additional standards each may conform to.

This standard depends on the Citation Elements: General Concepts standard [CEV Concepts]. To be conformant with this standard, an application must also be conformant with [CEV Concepts]. Some words and phrases defined in that standard are used here without further definition.

Readers are advised to read at least the introduction to [CEV Concepts] before reading this standard.

Indented text in coloured boxes, such as the preceding paragraph, does not form a normative part of this standard, and is labelled as either an example or a note.

Editorial notes, such as this, are used to record outstanding issues, or points where there is not yet consensus; they will be resolved and removed for the final standard. Examples and notes will be retained in the standard.

RDFa attributes

The tagging of citation elements in formatted citations is done using a standard set of HTML attributes known as RDFa attributes, which are defined in [RDFa Core]. Compliance with this FHISO standard does not require full RDFa compliance: support for the full [RDFa Core] is optional, and RDFa features other than those for which support is required by this standard should not be used when compatibility between implementations is desirable.

The specification of [RDFa Core] assumes a detailed working knowledge of the RDF graph model. A more accessible introduction to RDFa can be found in the [RDFa Primer], but FHISO’s use of RDFa attributes here is limited, and this standard is designed to be used without any knowledge of RDFa or RDF. An application parsing RDFa attributes according to this specification does not need a full RDFa parser, far less to support the full RDF graph model.

These attributes may be used in HTML or any XML-based markup language, but for the purpose of tagging citation elements in formatted citations it is recommended that they be used in XHTML. The language they are used in is referred to here as the host language.

Applications wishing to implement a fully-compliant RDFa parser for HTML will find the formal specification on the use of RDFa in HTML in two standards, [HTML+RDFa] and [XHTML+RDFa].

In the simplest case, the citation element name (which is an IRI) can be put in a property attribute on an XML or HTML element, and the citation element value is the text contents of the element. The particular type of element on which the attributes are placed is not relevant.

A simplified formatted citation to Settipani’s book Les ancêtres de Charlemagne might be marked up as the following HTML fragment:

<p>Settipani, Christian.  <i>Les ancêtres de Charlemagne</i>.</p>

The title of the book can be tagged by adding a property attribute to the existing <i> element. As written above, no element contains just the author’s name, as the <p> element also encloses the title; however, the author’s name can be wrapped in a <span> element and a property attribute added to that. HTML’s <span> element has no defined meaning of its own, but exists to provide a place for attributes such as this.

<p><span property=""
  >Settipani, Christian</span>. 
  <i property="">Les ancêtres 
    de Charlemagne</i>.</p>

An HTML renderer will correctly format this while ignoring the two property attributes, but an application that conforms to this standard will extract these two citation elements from this HTML:

authorName: Settipani, Christian
title: Les ancêtres de Charlemagne

Note that the citation element value of the title citation element contains no line break, despite the HTML being split across two lines. This is because [CEV Concepts] says applications should whitespace-normalise citation element values.
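As a non-normative illustration, whitespace normalisation might be sketched in Python as follows. The sketch assumes that normalisation means collapsing each run of whitespace to a single space and trimming both ends; the authoritative definition is the one given in [CEV Concepts], and the function name is hypothetical.

```python
def whitespace_normalise(value: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends.

    Assumption: this approximates the normalisation that [CEV Concepts]
    asks applications to apply to citation element values.
    """
    # str.split() with no argument splits on any run of whitespace and
    # discards leading and trailing whitespace.
    return " ".join(value.split())


# The title as it appears in the HTML source, split across two lines.
raw_title = "Les ancêtres \n    de Charlemagne"
print(whitespace_normalise(raw_title))
```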

In many examples in this standard, including the previous one, the list of citation elements is given as a list of name, value pairs with both presented as a string. In practice the citation element value is a localisation set containing one string which is additionally tagged with a datatype and possibly a language tag. This detail is frequently omitted from examples where it is not germane to the point being illustrated.

Index of attributes used

This standard makes use of the following attributes:

In addition, when the host language is HTML, special meaning is attached to the <time> element.

Motivation and limitations

In this standard, unless otherwise stated, the term HTML refers to any backwards-compatible version of HTML, and XHTML refers to any version of HTML that is also well-formed XML.

This definition of HTML includes HTML 4.01, XHTML 1.0, XHTML 1.1, HTML5 and HTML 5.1. For the last two, it includes both their XML and non-XML forms. It will include future editions of HTML5 too, assuming they retain backwards compatibility. This definition of XHTML includes not just the standards that are named XHTML, but also the XML forms of HTML5 and later.

The use of HTML, or a subset of HTML, is often permitted in genealogy applications to allow users to add formatting to text in various contexts. It is recommended that applications which allow users to edit or manually lay out formatted citations should permit the use of some HTML elements in them.

[CEV Concepts] recommends that if high quality formatted citations are required, users should be allowed to fine-tune the presentation by hand because it is not anticipated that an application will always do a perfect job. Many citation styles use italics and some use bold, underlining or other text-level formatting when formatting certain citation elements. In order to allow the user to fine-tune the use of such formatting, the user should be allowed to edit the formatted citation as HTML.

If an application automatically generates an HTML formatted citation from a citation element set, it should add RDFa attributes in such a manner that another application conformant with this standard will be able to extract the citation elements again. This should not be an application’s principal means of serialising a citation element set: applications should prefer a format that serialises the citation element set directly rather than after converting it to a formatted citation.

RDFa attributes are not the recommended way of serialising citation element sets, primarily because using them requires creating a formatted citation. Doing this to a reasonable standard is non-trivial, and results in a particular language and style being favoured. This standard is provided for situations when a formatted citation is desired or required anyway. For example, an enormous amount of genealogical research has been published online and includes formatted citations. If they are tagged according to this standard, these formatted citations can be copied and pasted into a genealogy application which can convert them back to a citation element set.

Shorthand IRIs

The [CEV Concepts] standard makes heavy use of IRIs as identifiers, as does RDFa. In particular, the datatype, property, rel, rev and typeof attributes contain IRIs.

The datatype attribute shall contain a single IRI. The property, rel, rev and typeof attributes shall contain a list of IRIs separated by whitespace. Leading and trailing whitespace is discarded.

A common reason why multiple IRIs might be present is when two IRIs exist with similar meanings and the creator of the citation wishes to use both for compatibility.

<i property="
   ">Les ancêtres de

Here two alternative IRIs are used to tag the title, presumably because the citation’s creator anticipated it being processed by applications that support [Dublin Core] metadata as well as FHISO’s Citation Elements standards. A parser conforming to this standard will treat both IRIs as valid and create two citation elements, both with the same citation element value; however, if the Dublin Core IRI is not known to the application, it will likely be ignored.

In the uses described by this standard the property attribute will always contain a citation element term, and the datatype attribute will always contain a datatype name. The typeof attribute will contain an IRI that allows this standard’s use of RDFa to be distinguished from any other uses also present in the document. The rev and rel attributes will contain a source derivation type to denote citation layer links.

RDFa provides two separate mechanisms for abbreviating the IRIs in these attributes: by setting a local default vocabulary, and by using prefixes to create compact URI expressions (CURIEs) as a form of prefix notation. Applications processing formatted citations in accordance with this standard must support both of these mechanisms. Expansion of terms using the local default vocabulary shall be done before the expansion of CURIEs. An application must behave as if all datatype, property, rel, rev and typeof attributes have been expanded before continuing to process the data.

Applications may opt to expand these attributes on demand, provided the effect is the same. The typeof attribute is the only one whose value invariably needs expanding.

Default vocabularies

A term in RDFa is an XML NCName that also permits slash (U+002F) as a non-leading character. It matches the term production given in §7.4.3 of [RDFa Core].

This production is as follows:

term     ::=  NCNameStartChar termChar*
termChar ::=  ( NameChar - ':' ) | '/'

The definitions of NameChar and NCNameStartChar are found in [XML] and [XML Names] respectively.
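The term production can be checked with a regular expression. The following Python sketch is non-normative: the character ranges are transcribed from the NameStartChar and NameChar productions of [XML] with the colon removed, and the names NCNAME_START, TERM_CHAR and is_term are hypothetical.

```python
import re

# NCNameStartChar: the NameStartChar ranges from [XML], minus the colon.
NCNAME_START = (
    r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)

# termChar: ( NameChar - ':' ) | '/'
TERM_CHAR = NCNAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040/"

TERM_RE = re.compile("[" + NCNAME_START + "][" + TERM_CHAR + "]*")


def is_term(token: str) -> bool:
    """Return True if the whole token matches the RDFa term production."""
    return TERM_RE.fullmatch(token) is not None


for token in ("title", "authorName", "a/b", "cev:title", "/a", "1abc"):
    print(token, is_term(token))
```

A token containing a colon never matches, which is what allows terms and CURIEs to be told apart syntactically.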

The [CEV Concepts] standard also uses the word “term”, and defines it to mean a vocabulary item identified by an IRI. To minimise confusion, this standard never uses the word “term” in that sense, and only uses it in the RDFa sense given above.

When a datatype, property, rel, rev or typeof attribute contains a term, it shall be converted to an IRI by prepending the local default vocabulary if one exists. The local default vocabulary is an IRI which is specified using a vocab attribute. It applies to the element where it is specified and to all elements in its content unless overridden with another vocab attribute.

Terms look similar to relative IRIs and this process is similar to resolving relative IRIs against a base IRI, but the process of applying a local default vocabulary is simpler as the two strings are simply concatenated without understanding the structure of the IRI.

Markup generators should ensure that a vocab attribute is present if terms are being used when compatibility between implementations is desirable. When these attributes are used in a host language other than HTML, the definition of the host language may provide a default vocabulary that applies in the event that no vocab attribute is found; HTML provides no such default.

If no local default vocabulary was found, a parser may use an initial context as described in §9 of [RDFa Core] to resolve the term to an IRI; if not, or if it was not found in the initial context, the term shall be ignored. When an initial context is used, it must be the standard one for the host language: implementations must not define their own initial context.

<p><span property="authorName">Settipani, Christian</span>. 
  <i vocab=""
     property="title">Les ancêtres de Charlemagne</i>.</p>

In this fragment, both property attributes contain a term. The title term is converted to the IRI of FHISO’s title citation element:

In considering the authorName term, a parser looks for a vocab attribute on the <span> or the enclosing <p> element. No such attribute exists, and the RDFa attributes are being used in HTML which provides no default vocabulary.

The parser may consider the standard initial context too, and if it is a full RDFa parser it must. As the host language is HTML, the initial context is defined in [HTML5+RDFa Context]. At the present time this only includes mappings for describedBy, license and role. These are to be matched case-sensitively, or failing that case-insensitively, but the authorName term used in this example clearly does not match.

Regardless of whether the application considered the initial context, the authorName term cannot be resolved to an IRI and is therefore ignored.
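The inheritance of the local default vocabulary can be sketched in Python as follows. This is a non-normative illustration under several assumptions: every token in a property attribute is a term, no initial context is consulted (which this standard permits), an empty vocab attribute is treated as providing no vocabulary, and the vocabulary IRI https://example.com/vocab/ and the function name expand_terms are hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_terms(root):
    """Return (term, expanded IRI or None) pairs in document order.

    None means the term could not be resolved and would be ignored.
    """
    results = []

    def walk(element, inherited_vocab):
        # A vocab attribute overrides the inherited vocabulary for this
        # element and for everything it contains.
        vocab = element.get("vocab", inherited_vocab)
        for term in element.get("property", "").split():
            # Expansion is plain string concatenation.
            results.append((term, vocab + term if vocab else None))
        for child in element:
            walk(child, vocab)

    walk(root, None)
    return results


fragment = ET.fromstring(
    '<p><span property="authorName">Settipani, Christian</span>. '
    '<i vocab="https://example.com/vocab/" '
    'property="title">Les ancêtres de Charlemagne</i>.</p>'
)
for term, iri in expand_terms(fragment):
    print(term, iri)
```

Only the title term is expanded; the authorName term has no vocab attribute in scope and is left unresolved.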

If use of the initial context is changed to be required for CURIEs, below, it should be changed here too.

Compact URI Expressions (CURIEs)

A CURIE comprises two components, a prefix and a reference, separated by a colon (U+003A). It matches the curie production given in §6 of [RDFa Core].

This production is defined as follows:

curie       ::=   ( prefix? ':' )? reference
prefix      ::=   NCName
reference   ::=   ( ipath-absolute | ipath-rootless | ipath-empty ) 
                       ( '?' iquery )?  ( '#' ifragment )?

The definition of NCName is found in [XML Names]. The various productions referenced in the definition of reference are defined in [RFC 3987]. None of these ipath productions match a string beginning “//”, therefore IRIs of the form http://… never match the curie syntax production. There is a conflict with certain other, less-used IRI schemes: a mailto IRI, for example, does match the syntax. However, this only results in such an IRI being treated as a CURIE if mailto is defined as a CURIE prefix. The RDFa working group considered the risk of this to be minimal.

Although this syntax definition allows the omission of both prefix and the colon, in practice there is no situation in RDFa where both can be omitted and the result still parsed as a CURIE. A parser conforming to this standard may safely treat the colon as mandatory.

When a datatype, property, rel, rev or typeof attribute contains a whitespace separated token that is syntactically a CURIE, the parser should look up its prefix to see whether a prefix mapping (which is an IRI) has been defined. This look-up is done case-insensitively.

If the prefix has been omitted and the CURIE begins with a colon, parsers may ignore the CURIE and must not fall back to treating it as an IRI; if it is not ignored, the prefix mapping must be
This vocabulary contains little of use in marking up formatted citations.

When the prefix is present, a parser must try to look it up in the local prefix mappings. These are set using prefix attributes. This attribute must contain an even number of whitespace separated tokens: the first and every subsequent odd token must be an NCName followed by a colon; the second and every subsequent even token must be an IRI. The NCName is the prefix and the IRI is its prefix mapping. The mapping applies to the element where it is specified and to all elements in its content unless overridden.

The following is an example of a well-formed prefix attribute.

<div prefix="cev:
  <i prefix="dc:"
     property="cev:title dc:title">Les ancêtres de Charlemagne</i>

The prefix attribute on the <div> defines two local prefix mappings, one for the cev prefix, the other for the dc prefix. The dc local prefix mapping is overridden by the prefix attribute on the <i> element; the cev local prefix mapping has not been overridden and remains in operation.
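Parsing the value of a prefix attribute into local prefix mappings might look like the following Python sketch. It is non-normative: this standard does not say how malformed values are to be handled, so raising ValueError is an assumption, as are the function name and the example.com IRIs. Prefixes are stored in lower case so that the later look-up can be done case-insensitively.

```python
def parse_prefix_attribute(value: str) -> dict:
    """Parse 'prefix: IRI prefix: IRI ...' into a {prefix: IRI} mapping."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("prefix attribute needs an even number of tokens")

    mappings = {}
    # Odd-numbered tokens are prefixes, even-numbered tokens are IRIs.
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        if len(name) < 2 or not name.endswith(":"):
            raise ValueError("expected an NCName followed by a colon: " + name)
        # Assumption: NCName validity of the prefix itself is not checked.
        mappings[name[:-1].lower()] = iri
    return mappings


outer = parse_prefix_attribute(
    "cev: https://example.com/cev/ dc: https://example.com/dc/"
)
# Mappings on an inner element override those inherited from outside.
inner = {**outer, **parse_prefix_attribute("dc: https://example.com/other-dc/")}
print(inner)
```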

The prefix consisting of a single underscore character (U+005F) has special meaning in §7.4.5 of [RDFa Core] for referencing blank nodes. It must not be used in CURIEs other than for that purpose. Support for blank nodes is not recommended in this standard. Applications that do not support blank nodes must ignore CURIEs with a prefix consisting of a single underscore.

In determining the local prefix mappings, a parser may also use XML namespace declarations as defined in §7.5, item 3 of [RDFa Core]. This is not required even in full RDFa parsers and is deprecated; it is not recommended by this standard.

If the prefix was not found in the local prefix mappings, a parser may use an initial context as described in §9 of [RDFa Core] to determine the prefix mapping. When an initial context is used, it must be the standard one for the language on which the RDFa tags are used: implementations must not define their own initial context.

It may be worth making this required rather than optional as the initial context for HTML contains prefix mappings for several potentially useful vocabularies including Dublin Core and PROV. It is unlikely to add much complexity to the parser or this specification.

If a prefix mapping is found, the CURIE is converted to an IRI by prepending the prefix mapping to the reference part of the CURIE.

The two CURIEs in the previous example expand to these IRIs:

If no prefix mapping is found, the CURIE shall be treated as an IRI if it is syntactically valid as one or ignored otherwise. If this results in an IRI with an unknown scheme, the parser may ignore it; parsers must not ignore the http, https or urn schemes.

Virtually all CURIEs are syntactically valid IRIs since prefix:reference is a valid IRI, despite having an unknown scheme. The option of ignoring IRIs with unknown schemes is introduced because this standard makes the use of an initial context optional. CURIEs with prefixes that would be resolved via the initial context in a full RDFa parser may therefore be left unresolved by a parser conforming to this standard. Almost invariably they will have an unknown scheme when reinterpreted as an IRI and can therefore be dropped. Full RDFa parsers must use initial contexts and therefore must not ignore IRIs with unknown schemes.
If support for initial contexts becomes required, the ability to ignore unknown schemes should probably be dropped.
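The CURIE rules of this section can be drawn together in a short Python sketch. It is non-normative and makes several choices that this standard leaves optional: no initial context is used, CURIEs with an omitted prefix are ignored, blank nodes are unsupported, and an unresolved token is kept as an IRI only when its scheme is http, https or urn. It does not validate IRI syntax. The function name and the example.com IRI are hypothetical, and the mapping keys are assumed to be lower case.

```python
ALWAYS_KEPT_SCHEMES = {"http", "https", "urn"}


def expand_curie(token: str, prefix_mappings: dict):
    """Expand one whitespace-separated token; return None if it is ignored."""
    prefix, colon, reference = token.partition(":")
    if not colon:
        return None  # no colon: not handled as a CURIE by this sketch
    if prefix == "":
        return None  # omitted prefix: this sketch opts to ignore it
    if prefix == "_":
        return None  # blank node: must be ignored when unsupported

    # Prefix look-up is case-insensitive.
    mapping = prefix_mappings.get(prefix.lower())
    if mapping is not None:
        return mapping + reference

    # No mapping found: treat the token as an IRI. IRIs with unknown
    # schemes may be ignored, but http, https and urn must be kept.
    if prefix.lower() in ALWAYS_KEPT_SCHEMES:
        return token
    return None


mappings = {"cev": "https://example.com/cev/"}
for token in ("cev:title", "CEV:title", "dc:title", "http://example.com/x"):
    print(token, "->", expand_curie(token, mappings))
```

Strictly, a token such as http://example.com/x never matches the curie production and is an IRI from the outset; the sketch reaches the same result by the fallback route.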

Locating citation elements

In general a document will contain more than just a single citation element set, and other parts of the document may also contain RDFa attributes for entirely different purposes; even if the only use of RDFa is for tagging citation elements it is important not to confuse the citation elements from one formatted citation or citation layer with those of another.

Citation elements are identified using property attributes. However a property attribute shall only be interpreted as representing a citation element if:

Any property attributes matching the above criteria shall be considered in the order they appear in the document and used to generate citation elements as described in §4.

The detailed specification in §7.5 of [RDFa Core] requires that property attributes are processed and used to generate RDF triples in document order. However, the [RDFa Core] processing model requires these triples be added to an RDF graph, and RDF graphs are not required to preserve the order of triples; nevertheless, most current RDFa processors do output properties in document order. Implementations using an RDFa parser to implement this specification should verify that the document order of properties can be determined.

The citation elements contained within a source-type element shall form a citation element set which represents a citation layer (or a single-layered citation) as described in §5.

Source-type elements

A source-type element is any element that has a typeof attribute whose value, once shorthand IRIs have been expanded, includes either of the following IRIs:

HTML or XML content is only considered to be part of a formatted citation if it is a source-type element or is contained within one.

The following example contains two entirely unrelated uses of RDFa attributes:

<p vocab="" typeof="Source">
  <span property="authorName">Settipani</span>, <i>Ibid.</i></p>
<div vocab="">Released under a 
  <a href=""
     property="license">Creative Commons License</a>.</div>

The typeof attribute of the <p> element has a value that expands to the required IRI. This marks the <p> element as a source-type element, and its contents as a formatted citation. This contains just one property attribute, so a parser will find just one citation element: an authorName one with value “Settipani”.

The license property is not contained in a source-type element and therefore does not denote a citation element. It is a use of RDFa that is outside the scope of this standard. This is as well: Settipani’s book is not licensed under a Creative Commons License, though a page discussing it may well be.

An external mechanism may be used to designate the entirety of an HTML document or fragment as a source-type element.

A non-HTML syntax might embed fragments of HTML to represent individual formatted citations. It would likely designate each fragment to be a source-type element, in which case the typeof attribute is optional.
There has been some discussion about the possibility of using the resource attribute on source-type elements to generate certain “meta” citation elements such as a UUID or a “citation authority IRI”.

Source-exclusion elements

The concept of a source-exclusion element is necessary to prevent a parser from misinterpreting property attributes that are part of more complex RDFa constructs which this standard does not require to be supported. Future FHISO standards may make use of some of these RDFa constructs and this restriction also allows for forwards compatibility.

An application that supports only those RDFa features for which support is required by this standard must consider an element to be a source-exclusion element of a given source-type element if it is contained within the source-type element (but is not the source-type element itself) and has an attribute named about, inlist, rel, resource, rev, or typeof.

The circumstances in which the source-type element is itself excluded need further consideration, giving particular attention to the processing sequence in §7.5 of [RDFa Core].

The following example includes a more complex use of RDFa attributes, beyond what this standard requires to be understood.

<p prefix="foaf:"
   vocab="" typeof="CitedSource">
  <span rel="foaf:maker">
    <span property="foaf:name">Settipani</span></span>,
  <i property="title">Les ancêtres de Charlemagne</i>.

The <p> element is a source-type element due to the typeof="CitedSource" attribute, and the formatted citation is the string “Settipani, Les ancêtres de Charlemagne.”

The <p> element has one source-exclusion element: the outer <span> element due to its rel attribute. Parsers are not expected to understand the meaning of this rel attribute, just to note its presence. As the inner <span> element is contained within this source-exclusion element, the property="foaf:name" attribute must not be treated as a citation element.

The property attribute on the <i> element is not located within a source-exclusion element, and therefore it does denote a citation element. This is the only citation element in this example.

These rules allow source-type elements to nest, with the inner source-type element being a source-exclusion element of the outer source-type element. This behaviour is used in the representation of layered citations, as discussed in §5.
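A parser supporting only the required RDFa features might locate the property attributes of one source-type element as in the following non-normative Python sketch. It assumes the source-type element has already been identified, considers only elements contained within it, and uses a hypothetical function name and vocabulary IRI.

```python
import xml.etree.ElementTree as ET

# Any of these attributes makes a contained element a source-exclusion element.
EXCLUSION_ATTRS = {"about", "inlist", "rel", "resource", "rev", "typeof"}


def citation_property_elements(source_type_element):
    """Return, in document order, the contained elements whose property
    attribute is not inside a source-exclusion element."""
    found = []

    def walk(element):
        for child in element:
            if EXCLUSION_ATTRS.intersection(child.attrib):
                continue  # skip the source-exclusion element and its contents
            if "property" in child.attrib:
                found.append(child)
            walk(child)

    walk(source_type_element)
    return found


fragment = ET.fromstring(
    '<p vocab="https://example.com/vocab/" typeof="CitedSource">'
    '<span rel="foaf:maker"><span property="foaf:name">Settipani</span></span>, '
    '<i property="title">Les ancêtres de Charlemagne</i>.</p>'
)
names = [el.get("property") for el in citation_property_elements(fragment)]
print(names)
```

Because a nested source-type element carries a typeof attribute, it is skipped in the same way, which keeps the citation elements of an inner citation layer out of the outer one.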

Applications which support a larger part of RDFa than this standard requires may treat fewer elements as source-exclusion elements. If so, they must ensure that RDFa constructs are only treated as citation elements when they produce RDF triples whose subject has one of the following RDF types, or a subtype thereof:

In addition, applications supporting a larger part of RDFa must discard triples where the object is an RDF blank node.

A future FHISO standard might extend this data model to include support for blank nodes, likely using them to represent objects with properties of their own.
This standard is designed to allow implementers to parse those RDFa constructs used without having to consider how they map to RDF. The preceding text is only relevant if an implementer wishes to make greater use of the RDF features underlying RDFa.

Parsing citation elements

As defined in the [CEV Concepts] standard, a citation element consists of two components:

Once a parser has identified the property attributes that are tagging citation elements, it shall determine each component of each citation element as described in the following sub-sections.

For the purpose of this section, the current element refers to the XML or HTML element that has the property attribute which tags the current citation element.

The citation element name shall be the value of the property attribute, once shorthand IRIs have been expanded. If the property attribute contains more than one IRI, each shall be used as the citation element name of a separate citation element with a copy of the same citation element value.

To construct the citation element value, an application shall determine its current property value, as defined in §4.2 below. This is a string and is used to construct a new localisation set to be the citation element value. The application shall then determine the datatype of the string per §4.3, and if the result is a language-tagged datatype, shall also determine its language tag per §4.4. Alternatively, applications that opt to parse RDFa to RDF triples, as a full RDFa parser does, may determine the current property value, datatype and language tag per §4.5.

These rules are illustrated by example in the sections below.

List flattening

RDFa, as used in this standard, is a list-flattening format. This means it does not naturally provide a means of keeping the localisation sets of each citation element separate because it has no means of distinguishing multi-valued citation elements from translated or localised versions of the same citation element. Applications must therefore assume every property attribute identifies a separate citation element.

It would have been possible for this standard to have defined a usage of RDFa that was not a list-flattening format. This was not done because it would make most straightforward uses unidiomatic, and likely compromise the uptake of this standard.

The following RDFa markup is well-formed but will be misinterpreted by a parser conforming to this specification.

<p lang="en-GB" typeof="Source">
  <span property="authorName" 
        content="Lansdowne, Marquess of">Lord Lansdowne</span> and
  <span property="authorName" lang="ja-Latn">Hayashi Tadasu</span>
  (<span property="authorName" lang="ja">林 董</span>),
  <i property="title">The Anglo-Japanese Treaty</i>,
  <span property="publicationDate">1902</span>.

The Anglo-Japanese Treaty was (at least nominally) authored by two people: the Marquess of Lansdowne and Count Hayashi Tadasu whose name is written in kanji as 林 董. A conformant application will see three authorNames and make each into a separate citation element, when in fact the desired behaviour is for “林 董” to be part of the same localisation set as “Hayashi Tadasu”.

Applications are required to use the localisedElement mechanism defined in §3.4.1 of [CEV Concepts] when multiple translations or localisations of a single citation element value are needed.

The RDFa markup from the previous example can be fixed by using a localisedElement to encode the second form of Hayashi’s name. At its simplest, this alters the two <span> elements referring to Hayashi to read:

  <span property="authorName" lang="ja-Latn">Hayashi Tadasu</span>
  (<span property="localisedElement" lang="ja">林 董</span>)

However, [CEV Concepts] recommends that the first string in the localisation set should be the untranslated, and ideally untransliterated form of the citation element. Undoubtedly it is the Latin form that is the transliteration, and therefore these elements are the wrong way round. While this is only a recommendation, applications should try to follow it; this can be achieved as follows:

  <span property="authorName" lang="ja" content="林 董" />
  <span property="localisedElement" 
        lang="ja-Latn">Hayashi Tadasu</span> (林 董)

This use of the content attribute is discussed below. It provides a value for the citation element while hiding the value from an HTML renderer.

Current property value

This section, together with the following section defining the datatype, derives from step 11 in the processing sequence given in §7.5 of [RDFa Core], as amended by §3.1 of [HTML+RDFa].

The current property value is a string which will be used to create the citation element value. It is determined based on the RDFa attributes present on the current element as follows.

The use of the term current property value in this standard coincides with its definition in [RDFa Core].

If the current element has a content attribute, and either has no datatype attribute, or its datatype attribute is empty or has a value (after expanding shorthand IRIs) other than either of the following IRIs, then the current property value shall be the value of the content attribute.
These two IRIs have special treatment in RDFa. This standard excludes them for complete compatibility with a full RDFa parser, but it is not anticipated that they will arise in practice.

The purpose of the content attribute is to allow the citation element value to be something that is not rendered or otherwise used in HTML. This is particularly important when the citation element is required to have a value in a format that is different to how the element is formatted.

<span property=""
      content="2017-05-22">May 22nd, 2017</span>

In this case, the use of a content attribute is necessary because the publicationDate citation element value must be a date in the prescribed date format based on [ISO 8601]: it must not be a date like “May 22nd, 2017”.

It would be desirable to add support for the resource attribute here. Before adding it, it is necessary to establish how safe it is to remove resource from the list of attributes that make a source-exclusion element.

Otherwise, if the host language is HTML and the current element has a datetime attribute, the current property value shall be the value of the datetime attribute.

Otherwise, in HTML or in other XML languages that support an href attribute, if the current element has an href attribute and no datatype attribute, the current property value shall be the value of the href attribute, which shall be an IRI.

Otherwise, in HTML or in other XML languages that support a src attribute, if the current element has a src attribute and no datatype attribute, the current property value shall be the value of the src attribute, which shall be an IRI.

The [HTML+RDFa] standard does not change which HTML elements can have a datetime, href or src attribute. At present, the datetime attribute is only permitted on a <time> element; most href attributes in HTML are found on <a> elements; most src attributes are on elements that display some form of media, particularly <img> and, in HTML5, <video> and <audio>.

When an href or src attribute links to an online source, it can be tagged as a citation element.

<div vocab="" typeof="Source">
  <a href=""
     property="accessURL"><span property="title">Discovery</span></a>
  (online catalogue)
</div>

This example has two citation elements:

accessURL: the IRI given in the href attribute of the <a> element
title: Discovery

The fact that the second property attribute is on a child element of the element containing the first property attribute is irrelevant and does not signify any additional connection between the title and the accessURL over and above their usual relationship.

Otherwise, the current property value shall be formed by concatenating the text contained in each of the descendant text nodes of the current element in document order.

This definition allows citation elements to nest, which can be useful when tagging full titles and short versions of them.

<p vocab="" typeof="Source">
  <i property="title"><span property="shortTitle">The visitations 
  of Kent</span>, taken in the years 1530–1 by Thomas Benolte, 
  Clarenceux, and 1574 by Robert Cooke, Clarenceux.</i></p>

The shortTitle property takes the value “The visitations of Kent”, while the title property takes the value “The visitations of Kent, taken in the years …” by concatenating the text in the nested <span> element with the text directly in the <i> element.
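
The order of the rules in this section can be sketched in Python as follows. This is only an illustrative sketch, not a conformant parser: the function name, the host_is_html flag and the excluded_datatypes parameter (standing for the two specially treated datatype IRIs) are assumptions made for the example, shorthand IRI expansion and whitespace handling are omitted, and the example.org address is a placeholder.

```python
import xml.etree.ElementTree as ET


def current_property_value(elem, host_is_html=True,
                           excluded_datatypes=frozenset()):
    """Return the current property value of elem, trying each rule in turn."""
    datatype = elem.get("datatype")
    content = elem.get("content")
    # Rule 1: a content attribute wins unless the datatype attribute names
    # one of the specially treated IRIs (supplied by the caller here).
    if content is not None and (not datatype
                                or datatype not in excluded_datatypes):
        return content
    # Rule 2: the datetime attribute, in HTML only.
    if host_is_html and elem.get("datetime") is not None:
        return elem.get("datetime")
    # Rules 3 and 4: href, then src, but only without a datatype attribute.
    for name in ("href", "src"):
        if elem.get(name) is not None and datatype is None:
            return elem.get(name)
    # Rule 5: the descendant text nodes concatenated in document order.
    return "".join(elem.itertext())


# Hypothetical markup modelled on the examples in this section.
date = ET.fromstring(
    '<span property="publicationDate" content="2017-05-22">'
    'May 22nd, 2017</span>')
link = ET.fromstring(
    '<a href="https://example.org/catalogue" property="accessURL">'
    '<span property="title">Discovery</span></a>')
title = ET.fromstring(
    '<i property="title"><span property="shortTitle">The visitations '
    'of Kent</span>, taken in the years 1530-1</i>')
```

Here current_property_value(date) yields the content attribute rather than the rendered text, current_property_value(link) yields the href value, and the nested title and shortTitle elements each take their own concatenated text.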


Datatype

A conformant parser must determine the datatype which tags the string in the citation element value as follows.

If the current element has a non-empty datatype attribute, then the datatype shall be the value of the datatype attribute once shorthand IRIs have been expanded. The datatype attribute must not contain the name of a language-tagged datatype or the built-in rdfs:Resource datatype. The use of a datatype attribute is recommended for citation elements that are not well-known if the datatype is known not to be one that is prohibited in a datatype attribute.

Suppose a vendor defines a citation element called reviewDate which contains an [ISO 8601] date. This third-party element might not be well known, so an RDFa author should mark up its use with a datatype attribute:

<span prefix="vendor:
      property="vendor:reviewDate" datatype="xsd:date" 
      content="2000-10-08" />

By using a datatype attribute, the RDFa author is ensuring the application processing the data knows the citation element is a date and will display it to the user appropriately, even if it does not know exactly what the date signifies.

Language-tagged datatypes must not be placed in a datatype attribute because the RDFa parsing rules mean the language tag is discarded if a datatype attribute is found.

Otherwise, if the host language is HTML and the current property value was found in a datetime attribute or was the contents of a <time> element, an application may examine the current property value and, if it is syntactically valid as one of the following structured non-language-tagged datatypes defined in [XSD Pt2], may determine that to be the datatype:
This rule exists for compatibility with a full HTML+RDFa parser where this behaviour is required; implementation of this rule is otherwise not recommended. Document authors should not rely on this behaviour, and should instead add a datatype attribute.

An application that implements this rule would read the markup below and generate a citation element value whose single string, “2000-10-08”, would be tagged with the xsd:date datatype.

<time property="vendor:reviewDate">2000-10-08</time>

Had a different HTML element been used, say a <span>, or if the parser does not support this rule, the datatype would fall back to rdf:langString. If this third-party citation element were unfamiliar to the application, it would not undergo datatype correction per §4.4 of [CEV Concepts], and would remain with the wrong datatype. For this reason, an explicit datatype attribute is recommended:

<time property="vendor:reviewDate" 
      datatype="xsd:date">2000-10-08</time>
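
For applications that do choose to implement the optional rule for <time> elements, the syntactic check might look something like the following Python sketch. It is an assumption-laden illustration only: it recognises just the xsd:date form used in the example above, handles four-digit years only, checks a time zone suffix for shape but not for range, and the function name is hypothetical.

```python
import re
from datetime import date

# Shape of a four-digit-year xsd:date with an optional time zone suffix.
_XSD_DATE = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")


def sniff_date_datatype(value):
    """Return "xsd:date" if value looks like a valid xsd:date, else None."""
    match = _XSD_DATE.fullmatch(value)
    if match is None:
        return None
    try:
        # Reject impossible calendar dates such as month 13.
        date.fromisoformat(match.group(1))
    except ValueError:
        return None
    return "xsd:date"
```

A value that fails the check simply falls through to the later rules, which is why document authors are better served by an explicit datatype attribute.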

Otherwise, if the current property value was found in a src or href attribute, then the datatype shall be:

This datatype IRI must not be given explicitly in a datatype attribute.

RDF’s notion of a datatype is narrower than the definition in these Citation Elements standards, and rdfs:Resource is not a datatype in the RDF sense, which is why it must not be given in an RDFa datatype attribute.
The handling of src and href attributes should be revisited as the [CEV Vocabulary] progresses. If there are no obvious use cases, support for them could be made optional, with them behaving as source-exclusion elements if not supported.

Otherwise, the application shall attempt to determine whether a language tag is in scope per §4.4; if a language tag can be determined, the datatype shall be the rdf:langString type:
This is so that the current language tag is not lost, as it would be if the default were a string.

Otherwise, the application shall determine the datatype to be:

Applications wishing not to handle the xsd:string datatype are allowed by §2.4.2 of [CEV Concepts] to change this datatype to

and tag the string with a language tag of und.

The datatypes selected in the last three cases are the three datatypes which are defined to participate in the datatype correction mechanism defined in §4.4 of [CEV Concepts]:

Applications may opt to apply datatype correction while parsing RDFa for citation elements; if so, these datatypes will often be replaced by the default datatype of the citation element term.
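
The mandatory cases of the procedure above can be summarised in a short Python sketch. It is illustrative only: the function and parameter names are hypothetical, datatypes are written as shorthand names (rdfs:Resource, rdf:langString, xsd:string) rather than expanded IRIs, and the optional rule for <time> elements and any datatype correction are left out.

```python
def determine_datatype(attrs, value_source, language_tag=None):
    """Pick the datatype for a citation element value.

    attrs        -- dict of the attributes on the current element
    value_source -- which rule supplied the current property value:
                    "content", "datetime", "href", "src" or "text"
    language_tag -- the language tag in scope, or None if there is none
    """
    datatype = attrs.get("datatype")
    if datatype:
        # An explicit, non-empty datatype attribute always wins.
        # (Shorthand IRI expansion is omitted in this sketch.)
        return datatype
    if value_source in ("href", "src"):
        # Values taken from href or src are IRIs.
        return "rdfs:Resource"
    if language_tag:
        # Keep the language tag by using a language-tagged datatype.
        return "rdf:langString"
    return "xsd:string"
```

For instance, determine_datatype({"datatype": "xsd:date"}, "content") keeps the explicit datatype, while an untagged piece of text with lang="en" in scope is given rdf:langString.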

Language tags

The language tag of the citation element shall be the value of the xml:lang or lang attribute on the current element or, failing that, on the nearest ancestor element of the current element that has one. If both attributes are present on the same element, the xml:lang attribute takes precedence.

This standard does not change when the xml:lang and lang attributes may be used on an HTML element. In particular, the xml:lang attribute is only allowed in XHTML documents.

<p vocab="" typeof="Source" lang="en">
  <span property="authorName"
        content="Settipani, Christian">Christian Settipani</span>, 
  <i property="title" lang="fr">Les ancêtres de Charlemagne</i>, 
  <span property="edition" content="2">2nd ed.</span> 

This formatted citation is correctly tagged with the language tag en denoting English. This is because, even though the book’s title is French, the citation as a whole is in English. Had the citation been written in French, the edition would have been written “2ᵉ éd” rather than “2nd ed”.

This example contains three citation elements. The authorName and edition citation elements both inherit the en language tag. In the case of authorName this may or may not be what was intended: the author is French but his name would not normally be altered in translation to English. The explicit language tag is necessary on the title citation element, as the title is clearly French.

If there is no applicable xml:lang or lang attribute, an external mechanism may be used to supply the language tag.

In a document fetched via HTTP, a Content-Language header may provide the default language tag for the whole document.
If the formatted citation is a fragment of XHTML in a different XML language, the value of any xml:lang attributes in the host XML will be inherited by the XHTML as defined in §2.12 of [XML].

When these attributes are used in host languages other than HTML, the definition of the host language may provide a default language tag that applies in the event that no such attribute is found.

FHISO does not recommend the use of a default language tag when it gives privileged status to one language. If technical considerations require a default language tag, a neutral language tag such as und (defined in [ISO 639-2] to represent an undetermined language) should be used.

If no applicable xml:lang or lang attribute was found, no value was supplied through an external mechanism and no default applies, or if the language tag provided is an empty string, the citation element has no language tag.
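
The lookup described in this section might be sketched in Python as below. The sketch assumes the citation has been parsed with the standard library's ElementTree, which has no parent pointers, so a child-to-parent map is built first; the function name and the external_default parameter (standing for a Content-Language header or host-language default) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_tag(elem, parents, external_default=None):
    """Return the language tag in scope for elem, or None if there is none."""
    node = elem
    while node is not None:
        # xml:lang takes precedence over lang on the same element.
        for name in (XML_LANG, "lang"):
            if name in node.attrib:
                # An empty string means "no language tag".
                return node.attrib[name] or None
        node = parents.get(node)
    return external_default or None


root = ET.fromstring(
    '<p typeof="Source" lang="en">'
    '<span property="authorName" content="Settipani, Christian">'
    'Christian Settipani</span>, '
    '<i property="title" lang="fr">Les ancêtres de Charlemagne</i>, '
    '<span property="edition" content="2">2nd ed.</span></p>')
parents = {child: parent for parent in root.iter() for child in parent}
tags = {e.get("property"): language_tag(e, parents)
        for e in root.iter() if e.get("property")}
```

With the markup from the example above, the title is tagged fr while authorName and edition inherit en from the enclosing <p> element.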

Parsing RDF triples

This section is only relevant if an implementation wishes to make greater use of the RDF features that underlie RDFa. Support for everything in this section is therefore optional.

Applications supporting more RDFa features than this standard requires may determine the current element value, its datatype and, where applicable, its language tag from the object of an RDF triple that was identified as representing a citation element per §3.2 of this standard.

If the object of the RDF triple is a literal, then the current element value shall be the lexical form of the literal, as defined in §3.3 of [RDF Concepts]. Its datatype shall be the datatype IRI of the literal, and its language tag shall be the language tag of the literal if one is present.

Otherwise, if the object of the RDF triple is an IRI, then the current element value shall be that IRI, and its datatype shall be:
The object of the RDF triple cannot be a blank node as RDF triples whose objects are blank nodes are discarded in §3.2.

Layered citations

Once the citation elements in a document have been located, parsed and grouped into citation element sets, the application shall interpret each citation element set as a citation layer.

In [CEV Concepts], a citation is represented with three parts:

In these RDFa bindings, citation layers are represented by source-type elements, which are nested to form layered citations.

Nested source-type elements

A nested source-type element is a source-type element that:

The citation layer represented by a nested source-type element shall be part of the same layered citation as the citation layer represented by its outer source-type element. Source-type elements may be nested arbitrarily deep, and multiple nested source-type elements may be present within the same outer source-type element: they all represent citation layers which are part of the same layered citation.

The following fragment of HTML represents a layered citation with three citation layers.

<p vocab="" typeof="CitedSource">
  <span property="authorName">Settipani</span>, citing  
  <span rel="cites" typeof="Source"><i property="title">Vita 
    Sancti Arnulfi</i></span> and 
  <span rel="cites" typeof="Source"><i property="title">Testamentum

The second <span> element is a source-type element by virtue of its typeof attribute, which also makes it a source-exclusion element of the <p> element. It has a rel attribute, and together these facts make it a nested source-type element. The <p> element is its outer source-type element. Exactly the same applies to the third <span> element, and as both are part of the same layered citation as their shared outer source-type element, both must be in the same layered citation as each other.

As the second and third <span> elements are source-exclusion elements of the outer source-type element, their title property is only a citation element of the nested source-type elements, and not also of the outer source-type element. The outer source-type element therefore only has one citation element: the authorName.
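
The way nested source-type elements keep their citation elements to themselves can be illustrated with a small Python sketch. It makes simplifying assumptions: only a typeof attribute is treated as starting a new layer (the full definition of a source-exclusion element is given elsewhere in this standard), values are taken from text content only, and the markup is a cut-down version of the example above with a single nested layer.

```python
import xml.etree.ElementTree as ET


def layer_elements(layer):
    """Collect (property, text) pairs that belong directly to layer."""
    found = []

    def walk(node):
        for child in node:
            if child.get("typeof") is not None:
                # A nested source-type element starts its own layer, so
                # nothing inside it belongs to the enclosing layer.
                continue
            if child.get("property") is not None:
                found.append((child.get("property"),
                              "".join(child.itertext())))
            walk(child)

    walk(layer)
    return found


outer = ET.fromstring(
    '<p typeof="CitedSource">'
    '<span property="authorName">Settipani</span>, citing '
    '<span rel="cites" typeof="Source">'
    '<i property="title">Vita Sancti Arnulfi</i></span></p>')
nested = outer.find("span[@typeof]")
```

Calling layer_elements(outer) finds only the authorName, while layer_elements(nested) finds only the title.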

All but one of the source-type elements in a layered citation will be nested source-type elements. The one that is not is known as the outermost source-type element.

The collection of citation layers in a layered citation is an ordered list, and the citation layers should be given in document order.

The head citation layer

The head citation layer may be indicated by a source-type element with a typeof attribute whose value, once shorthand IRIs have been expanded, includes the following IRI:

If precisely one such element exists in the layered citation, the head citation layer shall be the citation layer represented by that element; otherwise the head citation layer shall be the citation layer represented by the outermost source-type element. There shall not be more than one source-type element in a layered citation with a typeof attribute whose value includes this IRI.

The head citation layer is defined in [CEV Concepts] as the citation layer representing the source that was actually consulted, but this need not be presented first in a formatted citation. More generally, this suite of standards makes no recommendation on how citation layers should be ordered within a formatted citation. Different style guides make different recommendations, and the decision may depend on the precise circumstances and what the author wishes to emphasise. The CitedSource type is provided to facilitate the correct identification of the head citation layer, regardless of where it is placed.

Individual citation elements have not been tagged in this example for reasons of brevity.

<p vocab="" typeof="Source">
  1810 U.S. census, York County, Maine, town of York,  
  p.&nbsp;435 (penned), line 9, Jabez Young; 
  <span rev="facsimileOf" typeof="CitedSource">NARA microfilm 
    publication M252, roll 12</span>.</p>

This formatted citation, based on an example in [Evidence Explained], places the head citation layer (the microfilm) at the end of the formatted citation, and marks it with a CitedSource type. In this case, the same effect could have been achieved by nesting the HTML elements differently:

<p vocab="" typeof="Source">
  <span rel="facsimileOf" typeof="Source">1810 U.S. census, 
    York County, Maine, town of York, p.&nbsp;435 (penned), 
    line 9, Jabez Young</span>; 
  NARA microfilm publication M252, roll 12.</p>

In this second version, there is no need to use the CitedSource type as it defaults to the outermost source-type element.
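
The selection rule for the head citation layer can be sketched in Python as follows. The representation is an assumption made for the example: each citation layer is a (name, types) pair listed in document order, so that the first entry corresponds to the outermost source-type element, and the short name CitedSource stands in for the expanded type IRI.

```python
def head_layer(layers):
    """Return the name of the head citation layer.

    layers -- list of (name, set_of_types) pairs in document order; the
              first entry represents the outermost source-type element.
    """
    cited = [name for name, types in layers if "CitedSource" in types]
    if len(cited) == 1:
        # Precisely one layer is explicitly marked as the cited source.
        return cited[0]
    # Otherwise fall back to the outermost source-type element.
    return layers[0][0]


# The two versions of the census example above.
version_1 = [("census", {"Source"}), ("microfilm", {"CitedSource"})]
version_2 = [("microfilm", {"Source"}), ("census", {"Source"})]
```

Both head_layer(version_1) and head_layer(version_2) identify the microfilm as the head citation layer, matching the discussion of the two versions.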

Layer derivation links

In the [CEV Concepts] data model, layer derivation links have components:

In this standard, layer derivation links are represented by rel and rev attributes on nested source-type elements.

Once shorthand IRIs have been expanded, each IRI in the rel and rev attributes shall be used as the source derivation type of a new layer derivation link. If the IRI was in a rel attribute, the derived source shall be the source represented by the outer source-type element, and the base source shall be the source represented by the nested source-type element. If the IRI was in a rev attribute, the derived source shall be the source represented by the nested source-type element, and the base source shall be the source represented by the outer source-type element.

The rel and rev attributes provide forwards and reverse versions of the same functionality: the difference being that the rel attribute is placed on the base source, while the rev attribute is placed on the derived source.
In the previous example, the microfilm is derived from the 1810 census returns. The first version needs to use a rev attribute because the nested source-type element is the derived source, while the second version uses a rel attribute because the nested source-type element is the base source.
This representation of layer derivation links does not allow an arbitrary set of layer derivation links to be encoded as there is no way to reference a citation layer that is encoded elsewhere, but it does cope with any tree of derivations, which is the case that is anticipated to arise in practice. Applications supporting more RDFa functionality than this standard requires can express arbitrary collections of layer derivation links, and an example of this is given in §5.4.
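
The direction of the links created by rel and rev attributes can be made concrete with a short Python sketch. The names used here (DerivationLink, derivation_links and the layer labels) are hypothetical, and shorthand IRI expansion is omitted; the attribute values are treated as whitespace-separated lists.

```python
from typing import NamedTuple


class DerivationLink(NamedTuple):
    derived: str          # the derived source
    base: str             # the base source
    derivation_type: str  # the source derivation type


def derivation_links(outer, nested, rel="", rev=""):
    """Build the links implied by rel and rev on a nested source-type element."""
    # rel: the outer layer is derived from the nested layer.
    links = [DerivationLink(outer, nested, t) for t in rel.split()]
    # rev: the nested layer is derived from the outer layer.
    links += [DerivationLink(nested, outer, t) for t in rev.split()]
    return links


# The two versions of the census example yield the same link.
first = derivation_links("census", "microfilm", rev="facsimileOf")
second = derivation_links("microfilm", "census", rel="facsimileOf")
```

In both versions the microfilm is the derived source and the census return is the base source, so first and second are equal.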

Full RDFa considerations

This section is only relevant if an implementation wishes to make greater use of the RDF features that underlie RDFa. Support for everything in this section is therefore optional.

Documents that use more RDFa features than this standard requires to be supported must not include any source-type elements, other than the head citation layer as determined by the above rules, whose RDF type can be inferred to be:
The above restriction is to prevent a full RDFa parser from disagreeing with an application just implementing this standard over the identity of the head citation layer. The term “inferred” is meant broadly, and includes inferences made through entailment regimes, as defined in [RDF Semantics].

Applications may utilise the fact that is an RDF subclass of

Applications which support a larger part of RDFa may find additional layer derivation links. If so, they must ensure that RDFa constructs are only treated as layer derivation links when they produce an RDF triple whose subject and object both have the following RDF types, or a subtype thereof:

In addition, the predicate of the RDF triple must be the following, or an RDF subproperty thereof:

The subject of the RDF triple corresponds to the derived source and its object is the base source; the predicate is the source derivation type. Such triples should not also be used to generate a citation element as would otherwise be permitted by §3.2.

In the following example, the layers have been shortened to contain just placeholder text for brevity.

<p vocab="" typeof="Source">
  Source A; derived from
  <i resource="#B" rel="derivedFrom" typeof="Source">B</i> &amp;
  <i rel="derivedFrom" typeof="Source">C
    <span rel="derivedFrom" resource="#B"/>
  </i></p>

An application conforming only to this standard will parse this and find three citation layers, and two layer derivation links saying that A is derived from both B and C. The resource attribute on the first <i> element will be ignored, and the <span> element is a source-exclusion element and so will also be ignored.

However, a full RDFa parser will find three derivedFrom triples. In addition to the triples saying A is derived from B and C, there is a third triple saying that C is derived from B. An application may use this information to generate a third layer derivation link.

This arrangement of three layer derivation links is an example that cannot be represented in the subset of RDFa that this standard requires to be supported.

Synchronising citation elements

When an application has both a formatted citation tagged with RDFa attributes per this standard and a citation element set for the same citation, the two will typically have much content in common. This introduces the possibility that the data in the two places becomes unsynchronised. This section discusses ways of avoiding this.

In general, applications should consider information from the citation element set to have precedence over information extracted from a formatted citation.

If an application allows the manual editing of formatted citations tagged with RDFa attributes per this standard, it should take steps to prevent such edits from making the citation element values that a conformant application would extract from the formatted citation differ from the citation element values in the citation element set.

This document does not prescribe a particular mechanism for ensuring this, but most strategies will involve parsing the RDFa attributes before and after the edit and identifying any citation elements whose values have changed. An application might ask the user whether the change should be propagated back to the original citation element set. If the change is not to be propagated back to the citation element set, the application might delete the property attribute so the changed data is no longer recognised as a citation element, or insert a content attribute containing the correct data per §4.2.

Suppose an application generates the following formatted citation.

<p><span property=""
  >Settipani, Christian</span>. 
  <i property="">Les ancêtres 
    de Charlemagne</i>.</p>

If a user edits this HTML to replace Les ancêtres de Charlemagne with Ibid., the application should then take steps to ensure a future parser does not believe the source literally has the title Ibid. In this case, clearly the change should not be propagated back to the citation element set as the source isn’t titled Ibid., and the user would presumably decline if offered this option. An application might delete the property attribute so Ibid. is not understood to be a title, or insert a content attribute containing the real title as follows:

<p><span property=""
  >Settipani, Christian</span>. 
  <i property=""
     content="Les ancêtres de Charlemagne">Ibid.</i></p>
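
One possible way of detecting such changes is sketched below in Python. It is a deliberately naive illustration rather than a prescribed mechanism: extraction looks only at the content attribute and text content, the property names authorName and title are stand-ins for the full terms, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_elements(markup):
    """Return (property, value) pairs found in a formatted citation."""
    found = []
    for elem in ET.fromstring(markup).iter():
        name = elem.get("property")
        if name:
            value = elem.get("content")
            if value is None:
                value = "".join(elem.itertext())
            found.append((name, value))
    return found


def changed_elements(before, after):
    """Return (removed, added) citation elements caused by an edit."""
    old, new = extract_elements(before), extract_elements(after)
    removed = [pair for pair in old if pair not in new]
    added = [pair for pair in new if pair not in old]
    return removed, added


original = ('<p><span property="authorName">Settipani, Christian</span>. '
            '<i property="title">Les ancêtres de Charlemagne</i>.</p>')
edited = ('<p><span property="authorName">Settipani, Christian</span>. '
          '<i property="title">Ibid.</i></p>')
repaired = ('<p><span property="authorName">Settipani, Christian</span>. '
            '<i property="title" '
            'content="Les ancêtres de Charlemagne">Ibid.</i></p>')
```

Comparing original with edited reports that the title has changed to Ibid., whereas comparing original with repaired reports no change, because the content attribute preserves the real title.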

If an application stores formatted citations tagged with RDFa attributes as per this standard, it should take steps to ensure that changes to the underlying citation element set propagate to the formatted citation.

An application doing this would parse the formatted citation per this standard, locate the part of the HTML or XML that contains the old citation element value and overwrite it with the new value. For citation elements that are multi-valued elements, the application needs to know both the old and the new citation element value so that it knows which value is being updated; for other elements it is not necessary to know the old value.

Longer example

This example gives a full HTML document of the sort a genealogist might publish online. In a paragraph of narrative text it gives some brief details of King Edward II’s birth and parents. Although brief, this information is properly sourced to three published books with the citations formatted according to the Chicago Manual of Style. Each of these formatted citations has been marked up with RDFa attributes as described in this standard. The document includes several other instances of RDFa attributes that will not be detected as citation elements by a compliant parser.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title property="dc:title">Edward II</title>
    <meta property="dc:creator" content="FHISO, Inc." />
    <style>
      p { max-width: 720px; }
      .notes p, .note { font-size: smaller; }
      .fnref { vertical-align: super; font-size: smaller; }
      .fnref::before { content: '['; }
      .fnref::after { content: ']'; }
    </style>
  </head>
  <body>
    <h1>Edward II</h1>
    <p>
      Edward II was the fourth son of Edward I and his wife, Eleanor 
      of Castile.<a id="fnref1" class="fnref" href="#fn1">1</a>
      He was born in Caernarfon Castle in North Wales on 
      25 April 1284, less than a year after Edward I had conquered 
      the region, and as a result is sometimes called Edward of
      Caernarfon.<a id="fnref2" class="fnref" href="#fn2">2</a>
      His father was the King of England, and had also inherited 
      Gascony in south-western France, which he held as the
      feudal vassal of the King of France, and the Lordship 
      of Ireland.<a id="fnref3" class="fnref" href="#fn3">3</a>
    </p>

    <div vocab="" class="notes">

      <p typeof="Source" id="fn1"><a href="#fnref1">1</a>.
        <span property="authorName">Roy Martin Haines</span>, 
        <i property="title">King Edward II: His Life, his Reign and 
          its Aftermath, 1284–1330</i> 
        (<span property="publicationPlace">Montreal, Canada 
           &amp; Kingston, Canada</span>: 
         <span property="publisher">McGill-Queen’s 
           University Press</span>, 
         <span property="publicationDate">2003</span>), 
        <span property="page" content="3">3</span>.
      </p>

      <p typeof="Source" id="fn2"><a href="#fnref2">2</a>.
        <span property="authorName">Seymour Phillips</span>, 
        <i property="title">Edward II</i> 
        (<span property="publicationPlace">New Haven, US 
           &amp; London, UK</span>: 
         <span property="publisher">Yale University Press</span>, 
         <span property="publicationDate">2011</span>), 
        <span property="page" content="33, 36">33 &amp; 36</span>.
      </p>

      <p typeof="Source" id="fn3"><a href="#fnref3">3</a>.
        <span property="authorName">Michael Prestwich</span>, 
        <i property="title">Edward I</i> 
        (<span property="publicationPlace">Berkeley, US 
           &amp; Los Angeles, US</span>: 
         <span property="publisher">University of California 
           Press</span>, 
         <span property="publicationDate">1988</span>), 
        <span property="page" content="13-14">13–14</span>.
      </p>
    </div>

    <p class="note">This file is an example of an HTML document 
      containing formatted citations marked up with RDFa attributes
      per the FHISO draft standard 
      <a href=""
        >Citation Elements: Bindings for RDFa</a>.</p>

    <p vocab=""
       class="note">Content copied from 
      <a href=""
         property="dc:source">Wikipedia</a> and released under a 
      <a href=""
         property="license">Creative Commons License</a>.</p>
  </body>
</html>


Normative references

[CEV Concepts]
FHISO (Family History Information Standards Organisation). Citation Elements: General Concepts. Exploratory draft of standard. See
[RDF Concepts]
W3C (World Wide Web Consortium). RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation, 2014. See
[RDFa Core]
W3C (World Wide Web Consortium). RDFa Core 1.1. W3C Recommendation, 3rd ed., 2015. See
[RFC 2119]
IETF (Internet Engineering Task Force). RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. BCP 14. Scott Bradner, 1997. See
[XML]
W3C (World Wide Web Consortium). Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C Recommendation, 26 Nov 2008. See

Other references

FHISO (Family History Information Standards Organisation). Citation Elements: Bindings for ELF. Early draft of standard.
FHISO (Family History Information Standards Organisation). Citation Elements: Bindings for GEDCOM X. Early draft of standard.
[Dublin Core]
Dublin Core Metadata Initiative. Dublin Core metadata element set. Dublin Core recommendation, version 1.1, 1999. See
[Evidence Explained]
Elizabeth Shown Mills. Evidence Explained, 2nd ed. Baltimore: Genealogical Publishing Company, 2009.
[HTML+RDFa]
W3C (World Wide Web Consortium). HTML+RDFa 1.1. W3C Recommendation, 2nd ed., 2015. See
[HTML5+RDFa Context]
W3C (World Wide Web Consortium). HTML5+RDFa Initial Context. Last updated 9 Dec 2011. See
[ISO 639-2]
ISO (International Organization for Standardization). ISO 639-2:1998. Codes for the representation of names of languages — Part 2: Alpha-3 code. 1998. (See
[ISO 8601]
ISO (International Organization for Standardization). ISO 8601:2004. Data elements and interchange formats — Information interchange — Representation of dates and times. 2004.
[RDF Schema]
W3C (World Wide Web Consortium). RDF Schema 1.1. W3C Recommendation, 2014. See
[RDF Semantics]
W3C (World Wide Web Consortium). RDF 1.1 Semantics. W3C Recommendation, 2014. See
[RDFa Primer]
W3C (World Wide Web Consortium). RDFa 1.1 Primer. W3C Recommendation, 3rd ed., 2015. See
[XHTML+RDFa]
W3C (World Wide Web Consortium). XHTML+RDFa 1.1. W3C Recommendation, 3rd ed., 2015. See
[XML Names]
W3C (World Wide Web Consortium). Namespaces in XML 1.0 (Third Edition). W3C Recommendation, 8 Dec 2009. See

Copyright © 2017, Family History Information Standards Organisation, Inc. The text of this standard is available under the Creative Commons Attribution License.