This is a third public draft of the core part of FHISO’s proposed suite of standards on Citation Elements. This document is not endorsed by the FHISO membership, and may be updated, replaced or obsoleted by other documents at any time.
In particular, some examples in this draft use citation elements that are not even included in the draft Citation Elements: Vocabulary. These elements are very likely to be changed as the vocabulary progresses.
The public tsc-public@fhiso.org mailing list is the preferred place for comments, discussion and other feedback on this draft.
Latest public version: | https://fhiso.org/TR/cev-concepts |
This version: | https://fhiso.org/TR/cev-concepts-20180316 |
Previous version: | https://fhiso.org/TR/cev-concepts-20170911 |
FHISO’s suite of Citation Elements standards provides an extensible framework and vocabulary for encoding all the data about a genealogical source that might reasonably be included in a formatted citation to that source.
This document defines the general concepts used in FHISO’s suite of Citation Elements standards, and the basic framework and data model underpinning them. Other standards in the suite are as follows:
Citation Elements: Vocabulary. This standard defines a collection of citation elements allowing the representation of information normally found in formatted citations to diverse types of source.
Citation Elements: Bindings for RDFa. This standard defines a means by which citation elements may be identified and tagged using RDFa attributes within HTML and XML formatted citations, allowing a computer to extract them in a systematic manner.
Citation Elements: Bindings for GEDCOM X. This standard defines extensions to the GEDCOM X data model and its JSON and XML serialisations to allow citation elements to be represented in GEDCOM X.
Citation Elements: Bindings for ELF. This standard defines how citation elements should be represented in FHISO’s Extensible Legacy Format (ELF), a format based on and compatible with GEDCOM 5.5, but with the addition of a new extensibility mechanism.
Where this standard gives a specific technical meaning to a word or phrase, that word or phrase is formatted in bold text in its initial definition, and in italics when used elsewhere. The key words must, must not, required, shall, shall not, should, should not, recommended, not recommended, may and optional in this standard are to be interpreted as described in [RFC 2119].
An application is conformant with this standard if and only if it obeys all the requirements and prohibitions contained in this document, as indicated by use of the words must, must not, required, shall and shall not, and the relevant parts of its normative references. Standards referencing this standard must not loosen any of the requirements and prohibitions made by this standard, nor place additional requirements or prohibitions on the constructs defined herein.
If a conformant application encounters data that does not conform to this standard, it may issue a warning or error message, and may terminate processing of the document or data fragment.
This standard depends on FHISO’s Basic Concepts for Genealogical Standards standard. To be conformant with this standard, an application must also be conformant with [Basic Concepts]. Concepts defined in that standard are used here without further definition.
Indented text in grey or coloured boxes does not form a normative part of this standard, and is labelled as either an example or a note.
The grammar given here uses the form of EBNF notation defined in §6 of [XML], except that no significance is attached to the capitalisation of grammar symbols. Conforming applications must not generate data not conforming to the syntax given here, but non-conforming syntax may be accepted and processed by a conforming application in an implementation-defined manner.
This standard uses prefix notation when discussing specific terms. The following prefix bindings are assumed in this standard:
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs | http://www.w3.org/2000/01/rdf-schema# |
xsd | http://www.w3.org/2001/XMLSchema# |
types | https://terms.fhiso.org/types/ |
cev | https://terms.fhiso.org/sources/ |
When this standard discusses the xsd:string datatype, this means the datatype whose term name is:
http://www.w3.org/2001/XMLSchema#string
A source is any resource from which information is obtained during the genealogical research process. Sources come in many forms, including manuscripts, artefacts, books, films, people, recordings and websites. A full mechanism for describing sources is beyond the scope of this standard.
A source derivation is a directional link between two sources, indicating that the first source was derived from, cites or otherwise references the second source. The first source is referred to as the derived source, and the second the base source.
A citation is an abstract reference to a specific source from which information has been used in some context. It should include sufficient detail that a third party could readily locate the information themselves, assuming the source remains accessible.
A formatted citation is a citation that has been rendered into human-readable form, typically as a sentence or short paragraph that might be used as a footnote, endnote, tablenote or bibliography entry. There is no single standard on the correct form of formatted citations; many different style guides exist, each giving their own rules on how to construct a formatted citation.
A formatted citation produced for use in a footnote on the first use of the source, and conforming to [Chicago] might read:
1 Christian Settipani, Les ancêtres de Charlemagne, 2nd ed. (Oxford: Prosopographia et Genealogica, 2015), 129–31.
The 1 at the start of the citation is the hypothetical footnote number.
A layered citation is a citation that includes information about several sources between which source derivation links exist. The information in a layered citation about a specific source, whether the consulted source or one of the sources from which it was derived, is known as a citation layer. A citation with just a single citation layer is called a single-layer citation.
The citation layer containing the information about the specific source which was consulted is known as the head citation layer. For a single-layer citation, the sole citation layer is necessarily the head citation layer.
A citation to a census return that was consulted on microfilm might contain information about the microfilm as well as information about the census return, as in the following formatted citation from [Evidence Explained]:
1810 U.S. census, York County, Maine, town of York, p. 435 (penned), line 9, Jabez Young; NARA microfilm publication M252, roll 12.
In this example, the information before the semicolon pertains to the census return, while the information after it pertains to the microfilm. The microfilm and the census return are different sources, and a source derivation exists between them as the microfilm is derived from the census return. The information in the citation about the microfilm forms the head citation layer, while the information about the census return forms a separate citation layer. As the citation contains two citation layers, it is an example of a layered citation.
In this example, the head citation layer is not presented first in the formatted citation. Whether the head citation layer is presented first is a matter of style and emphasis, and it is common not to present the head citation layer first when it is a photographic or digital reproduction, as in this case.
A citation element is a logically self-contained piece of information in a citation layer that might reasonably be included in a formatted citation. As this standard does not aim to provide facilities for the exhaustive description of sources, information about sources that is not normally included in formatted citations is not considered to be a citation element. Citation elements are represented in a sufficiently structured and language-independent way that applications can parse and reformat them in different styles and languages as needed.
The accompanying Citation Elements: Vocabulary standard defines many citation elements, covering the information normally found in formatted citations to a wide range of common sources. Applications may define their own citation elements or use those defined by a third-party standard; such citation elements are known as extension citation elements.
Conforming applications must not discard citation elements, except on the instruction of the user or as explicitly permitted in this standard. This applies to unrecognised extension citation elements too, though an application may opt not to display any such citation elements.
Note that the definition of citation element limits it to information that might reasonably appear in a citation; thus, most items of metadata (such as who created the citation and when, or a globally-unique identifier for the citation or its layers) are not properly considered citation elements themselves.
It is anticipated that metadata will be addressed in a future FHISO standard. Initial brainstorming on metadata implementation suggests that this document may be edited slightly to support metadata, perhaps by adding an optional identifier or context pointer to each element. The exact nature of such an edit, or if it will even be necessary, will depend on future development of that metadata standard.
A citation element set is a collection of citation elements that completely encode the information about a source that is present in a particular citation layer.
The example formatted citation to Les ancêtres de Charlemagne is represented by a citation element set containing the following seven citation elements:
“Settipani, Christian”
“Les ancêtres de Charlemagne”
“2”
“Oxford”
“Prosopographia et Genealogica”
“2015”
“129-131”
The footnote number is not a citation element as it does not pertain to the source. The author and page range are not expressed here in quite the same form as the formatted citation, but an application can readily parse them to convert them to the required format because their format is defined by this standard.
When provided with the citation element set for each citation layer in the citation, knowledge of which is the head citation layer, information about the source derivations between sources referred to in each citation layer, and any necessary internal state, an application ought to be able to produce algorithmically a formatted citation in a reasonable approximation to any mainstream citation style. If higher quality formatted citations are desirable, applications should allow users to manually edit them to fine-tune their presentation, and should store the result for reuse. Formatted citations need not include all the information from a citation element set if the style dictates that certain information is omitted in the relevant context.
Citation element sets should not include citation elements for information that is not normally included in a formatted citation. They are not intended to provide a general mechanism for storing arbitrary information about sources.
In the data model defined by this standard, a citation element consists of two parts, both of which are required: a citation element name and a citation element value.
A citation element set is defined to be an ordered list of citation elements; conformant applications may reorder the list subject to the following constraints:
The relative order of citation elements must be preserved when they have the same ultimate super-element (as defined in §3.1 of this standard).
When a citation element set contains a citation element with the citation element name https://terms.fhiso.org/sources/localisedElement, the previous element in the citation element set with a different citation element name is referred to as its localisation base. The localisation base of any localisedElement citation element must not change if a citation element set is reordered.
localisedElements per §3.3.1 of this standard, and then removing them from the citation element set.
The citation element name identifies the nature of the information contained in a particular citation element. It shall be a term that has been defined to be used as a citation element name in the manner required by §3 of this standard; a term defined for this purpose is called a citation element term.
The [CEV Vocabulary] defines a citation element term for the title of a source. Its term name is:
https://terms.fhiso.org/sources/title
A dataset might contain many citation elements with this as their citation element name.
The citation element value is the content of the citation element, which shall be a localisation set. A localisation set is an ordered list of strings, which applications should whitespace-normalise. Each string in a localisation set should contain the same information, but translated, transliterated or otherwise localised.
Each string in a localisation set shall be tagged with a datatype, and shall additionally be tagged with a language tag if and only if the specified datatype is a language-tagged datatype.
The title citation element defined in the [CEV Vocabulary] would normally contain strings tagged with the rdf:langString datatype. An example title citation element might contain a localisation set with three rdf:langString strings in the following order:
“Η Γενεαλογία των Κομνηνών” with language tag el, the language code for Greek in [ISO 639-1];
“Hē Genealogia tōn Komnēnōn” and language tag el-Latn, Latn being the code for the Latin script in [ISO 15924]; and
“La généalogie des Comnènes”, tagged with the language code fr.
Language tags should contain a script subtag per §2.2.3 of [RFC 5646] when the string has been transliterated from the script in which it originally appeared.
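The data model described above can be sketched in code. The following is a minimal illustration, not a normative representation: the class names `TaggedString` and `CitationElement` are hypothetical, and the sketch assumes for simplicity that rdf:langString is the only language-tagged datatype an application knows about.

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Assumption for this sketch: rdf:langString is the only language-tagged datatype.
LANGUAGE_TAGGED_DATATYPES = {RDF_LANG_STRING}


@dataclass(frozen=True)
class TaggedString:
    """One string of a localisation set, tagged with a datatype and maybe a language."""
    value: str
    datatype: str
    lang: Optional[str] = None

    def __post_init__(self):
        # A language tag is present if and only if the datatype is language-tagged.
        needs_lang = self.datatype in LANGUAGE_TAGGED_DATATYPES
        if needs_lang != (self.lang is not None):
            raise ValueError("language tag must be present iff datatype is language-tagged")


@dataclass
class CitationElement:
    name: str    # citation element name: the IRI of a citation element term
    value: list  # citation element value: a localisation set (ordered TaggedStrings)


# The Greek title example: original first, then transliteration, then translation.
title = CitationElement(
    name="https://terms.fhiso.org/sources/title",
    value=[
        TaggedString("Η Γενεαλογία των Κομνηνών", RDF_LANG_STRING, "el"),
        TaggedString("Hē Genealogia tōn Komnēnōn", RDF_LANG_STRING, "el-Latn"),
        TaggedString("La généalogie des Comnènes", RDF_LANG_STRING, "fr"),
    ],
)
```

Keeping the localisation set as an ordered list preserves the convention that the untranslated form comes first.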
ar, while the Latin transliteration should be tagged ar-Latn. A layered citation should be used when citing a translation of al-Andalusī’s work, and al-Andalusī’s name would normally only appear in the citation layer pertaining to the original. If the particular translation used was the English translation by Sema‘an I. Salem and Alok Kumar, the names of these translators should be tagged en, the code for English, even though the first translator is a Lebanese man with an Arabic name. This is because these are the forms of their names the translators chose to use when writing in English.
Suppress-Script field in [IANA Lang Subtags]. If a source is written in an unorthodox script, there may be a need to transliterate back to the conventional script. Such cases are expected to be rare. When such a case arises, this standard recommends the use of a script subtag on the transliteration, while [RFC 5646] recommends against one because the transliteration is to the default script. Both are recommendations rather than requirements, meaning that after careful consideration they may be ignored in particular circumstances.
Although the language tag is required for language-tagged datatypes, it need not be explicit in the serialisation. A serialisation format may provide a mechanism for stating the document’s default language tag, and may provide a global default which should be a language-neutral choice such as und, defined in [ISO 639-2] to mean an undetermined language. In the absence of an explicit or implicit language tag, applications must not apply their own default, and must treat the string as if it had the language tag und.
HTML uses the lang attribute to provide a default language tag for the document or a part of the document. Thus, if the document begins <html lang="pt-BR">, it is not necessary to tag each string separately for them to be understood to be in Brazilian Portuguese. HTML does not define a default language tag that applies in the absence of a lang attribute, and applications must not apply one.
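The fallback order for language tags can be illustrated with a short sketch. The function name `effective_language_tag` and its parameters are hypothetical; it assumes the serialisation layer has already worked out which defaults, if any, are in scope for the string.

```python
from typing import Optional


def effective_language_tag(explicit: Optional[str],
                           document_default: Optional[str] = None,
                           format_default: Optional[str] = None) -> str:
    """Pick the language tag for a language-tagged string.

    Order of preference: a tag given on the string itself, then a default
    stated by the document (such as an HTML lang attribute on an ancestor),
    then a global default defined by the serialisation format. If none
    applies, the string is treated as "und" and no application-specific
    default is substituted.
    """
    for tag in (explicit, document_default, format_default):
        if tag:
            return tag
    return "und"
```

For example, `effective_language_tag(None, "pt-BR")` yields `"pt-BR"`, while `effective_language_tag(None)` yields `"und"`.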
If localisation sets are being serialised in XML, it is recommended that the special xml:lang attribute defined in §2.12 of [XML] is used to encode the language tag.
Similarly, a datatype is required, but it need not be explicit in the serialisation. A serialisation format may specify a format default datatype that applies when none is given explicitly. Ordinarily, if a format default datatype is specified, it should be the rdf:langString datatype described in §6.6.5 of [Basic Concepts].
rdf:langString is recommended. The datatype correction mechanism defined in §3.4 of this standard allows a conformant application to correct the datatype of strings that have incorrectly defaulted to rdf:langString. In practice it is anticipated that many applications will apply datatype correction during import, and therefore the format default datatype becomes a fallback that applies if the citation element term does not define its own default datatype, or if this is unknown.
The [CEV RDFa] standard makes rdf:langString the format default datatype in most circumstances. Thus the citation element extracted from the following HTML fragment is interpreted as an rdf:langString string, even though it is not explicitly tagged as such:
<i lang="en" property="title">The Complete Peerage</i>
Where possible, the first string in the localisation set should be the untranslated, and ideally untransliterated, form of the citation element value. If it is known that the only available values are translations, the first string in the localisation set should be an empty string tagged with the language tag und, and the translations listed afterwards. An empty string in a localisation set means that its value is unknown, rather than that this particular translation is literally an empty string.
Conformant applications may reorder the localisation set, but must leave the first string first, so that applications wishing to use the original, untranslated, untransliterated form can do so.
rdf:langString strings, except that JSON’s object notation, as given in §4 of [RFC 7159], does not preserve order. One possible solution is to append some private use subtag (per §2.2.7 of [RFC 5646]) to the first language tag.
In a localisation set which contains more than one string with the same datatype and language tag, or more than one string with the same datatype if it is a non-language-tagged datatype, any string other than the first non-empty string with that datatype and, if relevant, language tag is known as a duplicate string.
If an application encounters a localisation set with duplicate strings, it should ignore the value of any duplicate strings and may deduplicate the localisation set; where possible it should not deduplicate a localisation set that has been reordered from its serialised form.
To deduplicate a localisation set, the application first notes the datatype and, if present, the language tag of the first string in the localisation set. Next, all duplicate strings are deleted from the localisation set. Finally, if a string with the noted datatype and language tag remains after deduplication, the application shall reorder the localisation set to ensure it is the first string in the deduplicated localisation set; if there is not, the application shall insert an empty string with that datatype and language tag as the first string in the localisation set.
If an application needs to merge two or more localisation sets, the contents of each localisation set shall be combined in the order specified by this standard, and the application should deduplicate the resultant localisation set.
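The deduplication and merging rules for localisation sets can be sketched as follows. This is an illustrative reading of the rules above rather than a reference implementation; the names `TaggedString`, `tagging`, `deduplicate` and `merge` are hypothetical, and whitespace normalisation is left out.

```python
from typing import NamedTuple, Optional

RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class TaggedString(NamedTuple):
    value: str
    datatype: str
    lang: Optional[str] = None  # None for non-language-tagged datatypes


def tagging(s):
    """The (datatype, language tag) pair used to decide whether strings clash."""
    return (s.datatype, s.lang)


def deduplicate(loc_set):
    """Return a deduplicated copy of a localisation set (a list of TaggedString)."""
    if not loc_set:
        return []
    first_tagging = tagging(loc_set[0])  # step 1: note the tagging of the first string
    counts = {}
    for s in loc_set:
        counts[tagging(s)] = counts.get(tagging(s), 0) + 1

    # Step 2: drop duplicate strings. Where a tagging occurs more than once,
    # only the first non-empty string with that tagging survives.
    kept, claimed = [], set()
    for s in loc_set:
        t = tagging(s)
        if counts[t] == 1:
            kept.append(s)
        elif s.value and t not in claimed:
            kept.append(s)
            claimed.add(t)

    # Step 3: the string with the noted tagging goes first; if none survived,
    # an empty string with that tagging is inserted instead.
    head = [s for s in kept if tagging(s) == first_tagging]
    if not head:
        head = [TaggedString("", *first_tagging)]
    rest = [s for s in kept if tagging(s) != first_tagging]
    return head + rest


def merge(*loc_sets):
    """Combine localisation sets in the order given, then deduplicate the result."""
    combined = [s for loc_set in loc_sets for s in loc_set]
    return deduplicate(combined)
```

Because both functions return new lists, the serialised order of the inputs is left untouched, which helps an application avoid deduplicating a set that has already been reordered.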
If a citation element has a citation element value which is an empty localisation set, that citation element should be discarded.
A citation element term is a term which has been defined specifically for use as a citation element name in the following manner. The party defining the citation element term shall provide a description of the intended purpose of the citation element term which should be made freely available to all interested parties, preferably by an HTTP request as described in §4.2 of [Basic Concepts]. In addition, the definition shall state:
The class of citation element terms has the following class name and properties:
Name | https://terms.fhiso.org/sources/CitationElement |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2000/01/rdf-schema#range https://terms.fhiso.org/sources/isSingleValued |
The CitationElement class is defined as a subclass of the rdf:Property class defined in §5.2 of [Basic Concepts]. Logically this makes sense, as a citation element can be considered a property of a source, and it allows the concept of the range of a property to be reused.
xsd:anyAtomicType to mean there is no meaningful default datatype, and the citation element name itself or rdfs:Resource to mean there is no super-element.
A citation element term may be defined as a sub-element of another citation element term which is referred to as its super-element. This is used to provide a refinement of a general citation element term. If an application is unfamiliar with the sub-element it may process it as if it were the super-element, with its citation element value unchanged. The sub-element must be defined in such a way that this only results in some loss of meaning, and does not imply anything false about the cited source.
The [CEV Vocabulary] defines a citation element term with the name
https://terms.fhiso.org/sources/creatorName
which contains the name of a person, organisation or other entity who created or contributed to the creation of the source. Several sub-elements of it are defined, including
https://terms.fhiso.org/sources/interviewerName
which contains the name of an interviewer when the source is an interview. An interviewer can certainly be considered to have contributed to the creation of the interview.
The [CEV Vocabulary] also defines a citation element with the name
https://terms.fhiso.org/sources/recipientName
which contains the party to whom a source such as a letter is addressed. In many respects it is similar to the sub-elements of creatorName, but because a recipient of a letter cannot be said to have contributed to the creation of the letter, and might not even be aware of its existence if it were not delivered, the recipientName element cannot be defined as a sub-element of creatorName.
The range of a sub-element shall be the same as that of its super-element.
Any sub-element of a single-valued super-element must be single-valued.
The property representing the super-element of a citation element term is defined as follows:
Name | https://terms.fhiso.org/sources/subElementOf |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | https://terms.fhiso.org/sources/CitationElement |
The super-element list of a citation element term is an ordered list of IRIs defined inductively as follows. If the citation element term is not a sub-element, then its super-element list contains just that citation element term. Otherwise, its super-element list is the super-element list of its super-element to which its own citation element term is appended.
The ultimate super-element of a citation element term is defined as the first IRI in its super-element list.
The ultimate single-valued super-element of a single-valued citation element term is defined as the first IRI in its super-element list that is a single-valued citation element term.
The most-refined common super-element of a collection of citation element terms is defined as the last IRI that appears in the super-element list of every citation element term in the collection. It is only defined for citation element terms that share an ultimate super-element.
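The four definitions above can be sketched in code. This assumes the subElementOf relationships have already been gathered into a dictionary mapping each sub-element to its super-element; the function names and the `https://example.com/terms/` terms are hypothetical, and only creatorName, interviewerName and title come from this standard's examples.

```python
def super_element_list(term, super_element_of):
    """Super-element list of a term: ultimate super-element first, the term itself last."""
    chain = [term]
    while True:
        parent = super_element_of.get(chain[-1])
        if parent is None or parent == chain[-1]:
            break  # no super-element recorded: the top of the chain is reached
        if parent in chain:
            raise ValueError(f"cyclic super-element definition at {parent}")
        chain.append(parent)
    chain.reverse()
    return chain


def ultimate_super_element(term, super_element_of):
    return super_element_list(term, super_element_of)[0]


def ultimate_single_valued_super_element(term, super_element_of, single_valued):
    """First single-valued IRI in the super-element list, or None if there is none."""
    for iri in super_element_list(term, super_element_of):
        if iri in single_valued:
            return iri
    return None


def most_refined_common_super_element(terms, super_element_of):
    """Last IRI present in every term's super-element list, or None if none is shared."""
    lists = [super_element_list(t, super_element_of) for t in terms]
    if not lists:
        return None
    for iri in reversed(lists[0]):
        if all(iri in other for other in lists[1:]):
            return iri
    return None


CEV = "https://terms.fhiso.org/sources/"
EX = "https://example.com/terms/"  # hypothetical extension vocabulary

SUPER_ELEMENT_OF = {
    CEV + "interviewerName": CEV + "creatorName",
    EX + "leadInterviewerName": CEV + "interviewerName",  # hypothetical
    EX + "photographerName": CEV + "creatorName",         # hypothetical
    EX + "shortTitle": CEV + "title",                     # hypothetical
}
SINGLE_VALUED = {CEV + "title", EX + "shortTitle"}
```

With this data, the super-element list of the hypothetical leadInterviewerName term is creatorName, interviewerName, leadInterviewerName, and its most-refined common super-element with photographerName is creatorName.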
The range of a citation element term shall be a datatype, which describes what citation element values are valid in a citation element with this citation element name.
Citation element terms with non-textual citation element values such as numbers or dates should have ranges that are non-language-tagged datatypes.
FHISO defines an abstract datatype called AbstractDate which is used as the supertype of all structured datatypes for dates; it has the following term name:
https://terms.fhiso.org/dates/AbstractDate
Several citation element terms have a range consisting of a union of AbstractDate and rdf:langString. This union of datatypes is itself a non-language-tagged datatype because not all of its constituent datatypes are language-tagged datatypes, as specified in §6.5 of [Basic Concepts].
One such citation element term is:
https://terms.fhiso.org/sources/publicationDate
Because this citation element typically has non-textual values, frequently just a year, its range should be a non-language-tagged datatype, which the inclusion of AbstractDate in the union ensures.
The inclusion of rdf:langString is to allow dates that cannot readily be represented in any of the available structured formats. An example might be a termly university publication dated “Michaelmas term, 1997”.
The property representing the range of a citation element term is the rdfs:range property defined in §5.2.1 of [Basic Concepts].
A datatype is said to be compatible with the range if it is a subtype of the datatype identified as the range.
A string in a localisation set which is used as a citation element value is said to be invalid if, after datatype correction has occurred per §3.4 of this standard, either the string is tagged with a datatype that is not compatible with the range of the citation element term used as the citation element name, or the string is outside the lexical space of that datatype. Conformant applications should take steps to avoid creating localisation sets containing invalid strings.
Applications may use one or more discovery mechanisms to obtain the information needed to determine which strings are invalid.
If the range of the citation element term includes one of the following datatypes, applications should change the datatype of the invalid string to that datatype:
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
http://www.w3.org/2001/XMLSchema#string
If the range contains both of these datatypes, applications should change the datatype of an invalid language-tagged string to rdf:langString. If the range does not include either of these datatypes, applications may discard any strings that are found to be invalid. It is recommended that this should be done prior to deduplicating a localisation set, and it may be done at other times. A conformant application must not discard any string unless it is known to be invalid or as otherwise permitted by this standard.
Exceptionally, a conformant application may also discard any string which it has credible reason to believe contains malware or illegal content, or any string that is so long that the application cannot reasonably handle it.
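The handling of invalid strings can be sketched as below. Several simplifications are assumptions of this sketch and not statements of this standard: a range is modelled as a plain set of datatype IRIs (a union), compatibility is tested by set membership without any subtype reasoning, the lexical-space check is supplied by the caller, a string retagged as rdf:langString without a language tag receives `und`, and an untagged invalid string is retagged as xsd:string when both string datatypes are in the range. The function name and the `https://example.com/types/Year` datatype are hypothetical.

```python
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def repair_invalid_string(string, range_types,
                          in_lexical_space=lambda value, datatype: True):
    """Return a valid (value, datatype, lang) triple, or None if it may be discarded.

    `string` is a (value, datatype, lang) triple, assumed to have already been
    through datatype correction. `range_types` is the set of datatype IRIs
    making up the range of the citation element term.
    """
    value, datatype, lang = string
    if datatype in range_types and in_lexical_space(value, datatype):
        return string  # already valid: leave untouched

    has_lang_string = RDF_LANG_STRING in range_types
    has_plain_string = XSD_STRING in range_types
    if has_lang_string and (lang is not None or not has_plain_string):
        # Assumption: an untagged string becomes "und" when retagged as langString.
        return (value, RDF_LANG_STRING, lang if lang is not None else "und")
    if has_plain_string:
        # Assumption: xsd:string is not language-tagged, so any language tag is dropped.
        return (value, XSD_STRING, None)
    return None  # neither string datatype is in the range: the string may be discarded


# Hypothetical structured year datatype whose lexical space is four digits.
YEAR = "https://example.com/types/Year"


def year_check(value, datatype):
    return datatype != YEAR or (len(value) == 4 and value.isdigit())


date_range = {YEAR, RDF_LANG_STRING}
repaired = repair_invalid_string(("Michaelmas term, 1997", YEAR, None),
                                 date_range, year_check)
```

Returning `None` only signals that discarding is permitted; an application remains free to keep the string.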
The cardinality of a citation element term records how many semantically distinct values it can have. A multi-valued citation element term is one that can logically have multiple values in a single citation layer. It should be reserved for situations where the values genuinely contain different information, and not used to accommodate transliterations, translations, or variant forms of something that is logically a single value. Citation element terms that are not multi-valued are single-valued.
The https://terms.fhiso.org/sources/title citation element term is defined to be single-valued, as citations do not refer to the same sources by multiple titles (though they may translate or transliterate the title), so a citation element set must not contain more than one citation element with this citation element name; but it may contain several https://terms.fhiso.org/sources/authorName citation elements, as that is defined to be multi-valued to accommodate sources with several authors.
The cardinality of a citation element term is represented by a boolean property called isSingleValued, which shall have the value “true” for single-valued citation element terms and “false” otherwise.
Name | https://terms.fhiso.org/sources/isSingleValued |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2001/XMLSchema#boolean |
In a citation element set which contains more than one citation element whose citation element names have the same ultimate single-valued super-element, any citation element other than the first citation element with that ultimate single-valued super-element is known as a duplicate citation element.
Citation element sets should not contain duplicate citation elements, and an application should take steps to avoid creating duplicate citation elements.
When duplicate citation elements are present, they may be deduplicated. To deduplicate a citation element set, the application should replace all the citation elements with a common ultimate single-valued super-element with a single replacement citation element with the following properties:
Consider the following citation element set, written in a hypothetical JSON format:
[ "title": [ "fr": "Les ancêtres de Charlemagne",
"en": "The Ancestors of Charlemagne" ],
"title": [ "fr": "Les Ancêtres de Charlemagne",
"de": "Die Vorfahren von Karl dem Großen" ] ]
Assuming the title citation element term is single-valued, an application may deduplicate the citation element set by merging the two localisation sets in order to get the following:
[ "title": [ "fr": "Les ancêtres de Charlemagne",
"en": "The Ancestors of Charlemagne",
"fr": "Les Ancêtres de Charlemagne",
"de": "Die Vorfahren von Karl dem Großen" ] ]
After merging the localisation sets, §2.2.2 says the application should deduplicate the resultant localisation set. This removes the second French title to give the following:
[ "title": [ "fr": "Les ancêtres de Charlemagne",
"en": "The Ancestors of Charlemagne",
"de": "Die Vorfahren von Karl dem Großen" ] ]
These rules mean that single-valued citation elements with the same ultimate single-valued super-element (in this example, with the same citation element name) are assumed to be given in order of preference for the purpose of deduplicating the merged localisation set, with the most preferred value first.
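The title example above can be reproduced with a short sketch. It makes simplifying assumptions that are not part of this standard: every string is an rdf:langString represented as a `(language tag, text)` pair, and no sub-elements are involved, so each single-valued term is its own ultimate single-valued super-element. The function names are hypothetical.

```python
from collections import Counter


def dedupe_localisation_set(strings):
    """Deduplicate a list of (lang, text) pairs, keeping the first string's tag first."""
    if not strings:
        return []
    first_lang = strings[0][0]
    counts = Counter(lang for lang, _ in strings)
    kept, seen = [], set()
    for lang, text in strings:
        if counts[lang] == 1:
            kept.append((lang, text))      # sole string with this tag: never a duplicate
        elif text and lang not in seen:
            kept.append((lang, text))      # first non-empty string with this tag
            seen.add(lang)
    head = next((s for s in kept if s[0] == first_lang), (first_lang, ""))
    return [head] + [s for s in kept if s[0] != first_lang]


def deduplicate_elements(elements, single_valued):
    """Merge duplicate single-valued citation elements in a list of (name, strings)."""
    merged = {}   # name -> localisation set being accumulated
    result = []
    for name, strings in elements:
        if name in single_valued:
            if name in merged:
                merged[name].extend(strings)   # append to the first occurrence, in order
                continue
            strings = list(strings)
            merged[name] = strings
        result.append((name, strings))
    return [(name, dedupe_localisation_set(strings) if name in single_valued else strings)
            for name, strings in result]


TITLE = "https://terms.fhiso.org/sources/title"
elements = [
    (TITLE, [("fr", "Les ancêtres de Charlemagne"),
             ("en", "The Ancestors of Charlemagne")]),
    (TITLE, [("fr", "Les Ancêtres de Charlemagne"),
             ("de", "Die Vorfahren von Karl dem Großen")]),
]
deduplicated = deduplicate_elements(elements, single_valued={TITLE})
```

The result holds a single title element whose localisation set contains the first French title, then the English and German translations. Multi-valued elements pass through unchanged.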
This standard needs to define how to merge citation element sets. The following text is a start towards that.
If an application needs to merge two or more citation element sets, the contents of each citation element set shall be combined in order. The application shall identify any sets of duplicate citation elements in the combined citation element set and deduplicate them according to the rules above. An application may use one or more discovery mechanisms to attempt to obtain machine-readable definitions of any extension citation element used in the citation element set before identifying duplicate citation elements.
However the merger of multi-valued elements requires thought too. Even though the data model doesn’t require deduplication, it is still necessary to prevent duplication of, say, authors.
Conformant applications must ensure that in citation elements whose citation element names are multi-valued, the localisation set in each citation element value remains separate.
The authorName citation element term is defined to be multi-valued because a source may have multiple authors, and each of them may have names that have been transliterated into different scripts. Suppose a researcher wants to cite the Anglo-Japanese Treaty document of 1902 which was (at least nominally) authored by the Marquess of Lansdowne and Count Hayashi Tadasu whose name is written in kanji as 林 董.
The following hypothetical JSON serialisation is not allowed as it flattens localisation sets so it is no longer possible to determine how many authors there are, and which names are translations of which others.
[ { "name": "https://terms.fhiso.org/terms/title",
"lang": "en", "value": "The Anglo-Japanese Treaty" },
{ "name": "https://terms.fhiso.org/terms/authorName",
"lang": "en", "value": "Lord Lansdowne" },
{ "name": "https://terms.fhiso.org/terms/authorName",
"lang": "jp", "value": "林 董" },
{ "name": "https://terms.fhiso.org/terms/authorName",
"lang": "jp-Latn", "value": "Hayashi Tadasu" } ]
In this example, the datatype of each string has been omitted on the assumption that it defaults to rdf:langString and is corrected via the mechanism specified in §3.4 of this standard.
This is an example of a list-flattening format that does not conform to this specification; a list-flattening format that does conform to this specification is found in the next example.
A serialisation format that does not keep the localisation sets of each citation element value separate is called a list-flattening format, and this standard provides a facility to allow such formats to comply with this standard by introducing a special citation element term with the following properties:
Name | https://terms.fhiso.org/sources/localisedElement |
Type | https://terms.fhiso.org/sources/CitationElement |
Range | http://www.w3.org/2001/XMLSchema#anyAtomicType |
Cardinality | multi-valued |
Super-element | none |
Default datatype | none |
The range of localisedElement is given here as xsd:anyAtomicType, which is the ultimate supertype of all datatypes defined in §6.6.6 of [Basic Concepts]. This is an explicit statement of the fact that the citation element value of a localisedElement citation element may be tagged with an arbitrary datatype.
In a list-flattening format, an application must consider every value to be a separate citation element value, and therefore to be a localisation set with one element.
When a localisation set with two or more strings needs to be serialised in a list-flattening format, the first string must be serialised according to the normal rules of the format, and subsequent strings must be serialised as if they were separate citation elements, but with the localisedElement citation element term in place of the actual citation element name. This special citation element indicates that its value is not a distinct citation element and should instead be appended to the localisation set of its localisation base (i.e. the last preceding citation element which is not a localisedElement), and the localisedElement removed from the citation element set.
The hypothetical JSON serialisation in the last example can be fixed by using a localisedElement to serialise the transliterated version of Hayashi’s name:
[ { "name": "https://terms.fhiso.org/terms/title",
"lang": "en", "value": "The Anglo-Japanese Treaty" },
{ "name": "https://terms.fhiso.org/terms/authorName",
"lang": "en", "value": "Lord Lansdowne" },
{ "name": "https://terms.fhiso.org/terms/authorName",
"lang": "jp", "value": "林 董" },
{ "name": "https://terms.fhiso.org/sources/localisedElement",
"lang": "jp-Latn", "value": "Hayashi Tadasu" } ]
The two authorName elements are assumed to be separate citation elements and therefore to refer to different authors. The use of localisedElement signifies that this is not a different author. It immediately follows an authorName citation element with the value 林 董, and its value (“Hayashi Tadasu”, tagged as jp-Latn) should be appended to that localisation set.
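The serialisation rule for list-flattening formats can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory model (a list of `(name, localisation_set)` pairs, each localisation set being a list of `(lang, value)` strings), the function name `flatten` and the record keys are all hypothetical, chosen to mirror the hypothetical JSON format used in the examples, and datatypes are omitted on the assumption that every string is an rdf:langString.

```python
LOCALISED_ELEMENT = "https://terms.fhiso.org/sources/localisedElement"
AUTHOR = "https://terms.fhiso.org/terms/authorName"  # IRI as used in the examples


def flatten(elements):
    """Serialise (name, localisation_set) pairs as a flat list of records.

    Each localisation set is a list of (lang, value) strings. The first
    string keeps the real citation element name; every later string is
    written as a localisedElement record directly after its base.
    """
    records = []
    for name, localisation_set in elements:
        for index, (lang, value) in enumerate(localisation_set):
            record_name = name if index == 0 else LOCALISED_ELEMENT
            records.append({"name": record_name, "lang": lang, "value": value})
    return records


# Two authors; the second has a kanji name and a transliteration of it.
treaty_authors = [
    (AUTHOR, [("en", "Lord Lansdowne")]),
    (AUTHOR, [("jp", "林 董"), ("jp-Latn", "Hayashi Tadasu")]),
]
flat = flatten(treaty_authors)
```

Because each localisedElement record is emitted immediately after the other strings of its own localisation set, a reader can recover which names belong to which author.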
localisedElements occurs. Ideally an application should do it during the process of reading a list-flattening format, but may do it later or not at all. If the application subsequently serialises the data in a non-list-flattening format, the localisedElements may still be present. Therefore applications reading non-list-flattening formats should cope with the possibility of localisedElements being present.
If the localisation set in the localisation base already contains a string with the same datatype and language tag, an application must not overwrite or duplicate a language tag; the localisedElement should be ignored and may be removed from the citation element set.
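Reading a list-flattening format might look like the following Python sketch. It is not a definitive implementation: the function name `unflatten`, the record keys and the `(lang, value)` model are assumptions made for illustration. Datatypes are left out, so the duplicate check compares language tags only, and the tags are compared as exact strings. Dropping a localisedElement that has no preceding localisation base is a choice made by this sketch, not something stated in the text above.

```python
LOCALISED_ELEMENT = "https://terms.fhiso.org/sources/localisedElement"
TITLE = "https://terms.fhiso.org/terms/title"        # IRIs as used in the examples
AUTHOR = "https://terms.fhiso.org/terms/authorName"


def unflatten(records):
    """Rebuild (name, localisation_set) pairs from a flat list of records."""
    elements = []
    for record in records:
        string = (record["lang"], record["value"])
        if record["name"] != LOCALISED_ELEMENT:
            # An ordinary record starts a new citation element value.
            elements.append((record["name"], [string]))
        elif elements:
            # Append to the localisation base unless its language tag is taken.
            base_set = elements[-1][1]
            if all(lang != record["lang"] for lang, _ in base_set):
                base_set.append(string)
        # A localisedElement with no preceding base is dropped (sketch choice).
    return elements


records = [
    {"name": TITLE, "lang": "en", "value": "The Anglo-Japanese Treaty"},
    {"name": AUTHOR, "lang": "en", "value": "Lord Lansdowne"},
    {"name": AUTHOR, "lang": "jp", "value": "林 董"},
    {"name": LOCALISED_ELEMENT, "lang": "jp-Latn", "value": "Hayashi Tadasu"},
    # Same language tag as the previous record, so it is ignored.
    {"name": LOCALISED_ELEMENT, "lang": "jp-Latn", "value": "Hayasi Tadasu"},
]
elements = unflatten(records)
```

After processing, the citation element set holds one title and two authors, and no localisedElement remains in it.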
The use of list-flattening formats is not recommended except where there is a good technical reason. The use of localisedElements other than in list-flattening formats is not recommended.
A citation element term may have a default datatype defined. When a default datatype is defined, it is used to provide an optional datatype correction mechanism for correcting the datatype of a string in the localisation set of a citation element value in certain situations. The default datatype must be a datatype that is compatible with the range of the citation element term.
The property representing the default datatype of a citation element term is defined as follows:
Name | https://terms.fhiso.org/sources/defaultDatatype |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2000/01/rdf-schema#Datatype |
Datatype correction shall not be carried out unless the datatype of the string prior to datatype correction is one of the following datatypes, and not just a subtype of one of them:
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
http://www.w3.org/2001/XMLSchema#string
rdf:langString. Support for xsd:string is only included in this datatype correction mechanism to accommodate certain corner cases in RDF processing that could arise in the [CEV RDFa] bindings.
Datatype correction shall only be applied to a string if it appears in a citation element whose citation element name is a citation element term that has a default datatype, and if that default datatype is a datatype whose pattern is known to the application, and if the string matches that pattern.
At any time when an application encounters a string which is eligible for datatype correction according to the above criteria, it may replace its datatype with the default datatype. It is recommended that applications apply datatype correction during or shortly after the import of data in any serialisation format that defines a format default datatype of rdf:langString.
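The eligibility criteria above can be sketched as a small Python function. The function name, its parameters, and the `YEAR` datatype with its pattern are hypothetical and exist only for this illustration. The sketch also assumes that a pattern must match the whole string, and that Python's `re` dialect is close enough to the pattern language for simple patterns; neither assumption is confirmed by the text above.

```python
import re

RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
CORRECTABLE = {RDF_LANG_STRING, XSD_STRING}


def correct_datatype(value, datatype, default_datatype, known_patterns):
    """Return the datatype a string should carry after optional correction.

    known_patterns maps datatype IRIs to the patterns this application
    knows; default_datatype is None when the term does not define one.
    """
    if datatype not in CORRECTABLE:
        return datatype  # only exactly langString or string is eligible
    pattern = known_patterns.get(default_datatype)
    if pattern is None:
        return datatype  # no default datatype, or its pattern is unknown
    if re.fullmatch(pattern, value) is None:
        return datatype  # the string does not match the pattern
    return default_datatype


# Hypothetical datatype and pattern, used only to exercise the function.
YEAR = "https://example.com/Year"
patterns = {YEAR: r"[0-9]{4}"}

corrected = correct_datatype("1997", RDF_LANG_STRING, YEAR, patterns)
unchanged = correct_datatype("Michaelmas term, 1997", RDF_LANG_STRING, YEAR, patterns)
```

Since datatype correction is optional, an application could equally leave every datatype untouched; the sketch only shows when a replacement would be permitted.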
The hypothetical JSON format used in several earlier examples included the following citation element:
[ { "name": "https://terms.fhiso.org/terms/authorName",
"lang": "jp", "value": "林 董" } ]
This hypothetical format is supposed to default datatypes to rdf:langString, as recommended by this standard.
The authorName citation element is defined in the [CEV Vocabulary] to have the following default datatype:
https://terms.fhiso.org/sources/AgentName
This datatype in turn defines the following pattern:
([^!#$%&@{|}]+@)?[^!#$%&@{|}]+(\|[^!#$%&@{|}]*(\|[^!#$%&@{|}]+)?)?
The string “林 董” matches this pattern (specifically, it matches the second [^!#$%&@{|}]+ part of the pattern) and therefore the datatype correction will change the datatype to this AgentName datatype.
AgentName will almost certainly need changing as the AgentName datatype is properly specified.
The publicationDate citation element term defined in the [CEV Vocabulary] has a range which is the union of the AbstractDate and rdf:langString datatypes; its default datatype is GregorianDate, a subtype of AbstractDate with the following pattern:
-?[0-9]{4,}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?
A citation element set might contain a publicationDate citation element whose localisation set contains the following two strings, both tagged with the language tag en and datatype rdf:langString (presumably implicitly as the result of no datatype being given in the serialisation):
Michaelmas term, 1997
1997-10
The former string is not remotely close to matching the pattern for the GregorianDate datatype, so it is unaffected by datatype correction; however the latter string does match the pattern and so datatype correction may change its datatype to GregorianDate.
This is an example of where a localisation set might usefully contain both language-tagged datatypes and non-language-tagged datatypes. The former gives the date in the correct form for inclusion in a formatted citation, while the latter allows an application to parse the date, for example to highlight contemporary sources to a user.
GregorianDate datatype, let alone whether it is actually the default datatype of the publicationDate citation element term. If such a datatype is specified, it is unlikely to have precisely the pattern given above. Nevertheless, it is safe to assume that this citation element term will have a default datatype that is some structured datatype for dates.
Matching the pattern of a datatype does not guarantee that the string belongs to the lexical space of that datatype, so it is possible that datatype correction might turn a valid unstructured string into an invalid string. An application should not perform datatype correction when it knows the result would be an invalid string.
rdf:langString or xsd:string rather than being discarded.
Applications should try to ensure that no strings are entered which match the pattern of the default datatype but are outside its lexical space. One strategy for ensuring this is to suggest an alteration to the string that would prevent it from matching the pattern; however applications must not make such an alteration other than at the instruction of the user.
The string “1999-02-31” matches the pattern for a GregorianDate but is nonetheless outside the lexical space of that datatype as there was no such date. A conformant application might warn the user that this is not a valid Gregorian date; if the user confirms they really did mean to enter an unstructured string that looks like an invalid Gregorian date, the application may alter the string to make it not match the pattern. One way this could be done would be appending “(sic)” to the string; another option is to append an invisible Unicode character such as U+2060 (word joiner).
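The difference between matching a pattern and belonging to a lexical space can be illustrated with the GregorianDate pattern quoted earlier. The Python sketch below is an assumption-laden illustration: the function name `classify` and its three return values are hypothetical, and because the GregorianDate datatype is not yet specified, the calendar check simply applies the usual Gregorian leap-year arithmetic to whatever year number appears.

```python
import calendar
import re

# Pattern copied from the GregorianDate example above.
GREGORIAN_PATTERN = re.compile(
    r"-?[0-9]{4,}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?"
)
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def classify(value):
    """Return 'no-match', 'valid' or 'invalid' for a candidate date string."""
    if GREGORIAN_PATTERN.fullmatch(value) is None:
        return "no-match"
    negative = value.startswith("-")
    parts = value.lstrip("-").split("-")
    year = -int(parts[0]) if negative else int(parts[0])
    if len(parts) == 3:
        month, day = int(parts[1]), int(parts[2])
        limit = DAYS_IN_MONTH[month - 1]
        if month == 2 and calendar.isleap(year):
            limit = 29
        if day > limit:
            return "invalid"  # matches the pattern, but no such date exists
    return "valid"
```

An application following the recommendation above might warn the user when `classify` reports `"invalid"`, and skip datatype correction for that string unless the user changes it.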
If datatype correction would result in replacing a non-language-tagged datatype with a language-tagged datatype, then the application must tag the string with the language tag und.
xsd:string datatype, which this standard discourages when the data is indeed language-tagged.
In the data model defined in this standard, a citation layer is represented by a citation element set containing the information in the citation layer.
A citation is represented with the following three parts:
In the common case of a single-layer citation, the set of layer derivation links will be empty, and the sole citation layer present must be the head citation layer. This means that a single-layer citation can be represented using just a citation element set.
Applications should not reorder the list of citation layers, other than at the request of the user. The order of the citation layers is an indication of the preferred order for displaying the citation layers, and should begin with the one considered most important. This is not necessarily the head citation layer. Applications may ignore this order when displaying or formatting citation layers.
When the sources represented by two citation layers are linked by a source derivation, a layer derivation link is used to encode this. It has three parts, all of which are required:
The two references to citation layers in the layer derivation link shall refer to citation layers present in the current citation.
The source derivation type shall be either an IRI defined in accordance with a future FHISO standard on source derivation types, or the following cev:derivedFrom IRI which represents the most general case of derivation supported in this data model:
Name | https://terms.fhiso.org/sources/derivedFrom |
Type | https://terms.fhiso.org/sources/SourceDerivation |
prov:wasDerivedFrom or prov:wasInfluencedBy properties from [PROV-O] instead of inventing our own derivedFrom term?
Applications may discard any IRI that they know does not conform to the above requirement.
derivedFrom source derivation type. The Source Derivation Vocabulary standard will also provide a mechanism for third parties to provide their own extension source derivation types, and provide a means of determining whether a given IRI is a source derivation type. If that document is ready for standardisation at the same time as this document, the previous paragraph will be updated to reference it.
The class of source derivation types has the following class name and properties:
Name | https://terms.fhiso.org/sources/SourceDerivation |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
SourceDerivation class a subclass of rdf:Property.
A citation layer is directly derived from another citation layer if there exists a layer derivation link whose derived reference is to the former citation layer and whose base reference is to the latter citation layer. The direct base citation layer set of a citation layer is the set of citation layers from which the first citation layer is directly derived.
The complete base citation layer set of a citation layer is defined recursively as follows. The citation layer itself is part of its complete base citation layer set. It also contains every citation layer in the complete base citation layer set of every citation layer in its direct base citation layer set.
The complete base citation layer set of the head citation layer shall contain every citation layer in the citation. If an application encounters a citation for which this is not the case, it may discard any citation layers that are not in the complete base citation layer set of the head citation layer.
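The recursive definition of the complete base citation layer set amounts to a reachability computation over the layer derivation links. A minimal Python sketch follows, assuming each link is reduced to a hypothetical `(derived, base)` pair of layer identifiers and the links are held in a list; the function names and the layer names in the usage lines are invented for illustration. The sketch uses an iterative worklist rather than direct recursion so that it also terminates if the links happen to form a cycle.

```python
def complete_base_set(layer, links):
    """Return the complete base citation layer set of `layer`.

    `links` is a list of (derived, base) pairs, one per layer derivation
    link; the source derivation type is not needed for this computation.
    """
    seen = {layer}  # a layer is always part of its own complete base set
    pending = [layer]
    while pending:
        current = pending.pop()
        for derived, base in links:
            if derived == current and base not in seen:
                seen.add(base)
                pending.append(base)
    return seen


def prune(layers, head, links):
    """Keep only layers reachable from the head, preserving list order."""
    reachable = complete_base_set(head, links)
    return [layer for layer in layers if layer in reachable]


# Hypothetical three-layer citation plus one layer not linked to the head.
layers = ["transcript", "microfilm", "register", "stray"]
links = [("transcript", "microfilm"), ("microfilm", "register")]
kept = prune(layers, "transcript", links)
```

An application that discards layers in this way would presumably also need to drop any layer derivation link that refers to a discarded layer, so that every remaining link still refers to citation layers present in the citation.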
Copyright © 2017–18, Family History Information Standards Organisation, Inc. The text of this standard is available under the Creative Commons Attribution 4.0 International License.