This is a first public draft of a standard covering basic concepts that are expected to be used in multiple FHISO standards. This document is not endorsed by the FHISO membership, and may be updated, replaced or obsoleted by other documents at any time.
The public tsc-public@fhiso.org mailing list is the preferred place for comments, discussion and other feedback on this draft.
Latest public version: | https://fhiso.org/TR/basic-concepts |
This version: | https://fhiso.org/TR/basic-concepts-20180316 |
FHISO’s Basic Concepts for Genealogical Standards standard defines various low-level concepts that will be used in many FHISO standards, and whose definitions do not logically belong in any one particular higher-level standard.
The definition of a string which is used in multiple FHISO standards is given in §2 of this standard, together with various related concepts such as characters and whitespace, and §3 defines briefly how FHISO standards use language tags. Terms are defined in §4 as a form of extensible identifier using IRIs, and §4.1 discusses information that may be retrieved from these IRIs. The notion of a datatype is defined in §6, which also includes details on how to specify a new datatype.
The concepts of a classes and properties are defined in §5. They provide an infrastructure for defining extensions to FHISO standards and new, compatible standards in such a way that applications can use a discovery mechanism to find out about unknown components, allowing them to be processed. The facilities in these sections will primarily be of use to parties defining extensions or implementing discovery.
Where this standard gives a specific technical meaning to a word or phrase, that word or phrase is formatted in bold text in its initial definition, and in italics when used elsewhere. The key words must, must not, required, shall, shall not, should, should not, recommended, not recommended, may and optional in this standard are to be interpreted as described in [RFC 2119].
An application is conformant with this standard if and only if it obeys all the requirements and prohibitions contained in this document, as indicated by use of the words must, must not, required, shall and shall not, and the relevant parts of its normative references. Standards referencing this standard must not loosen any of the requirements and prohibitions made by this standard, nor place additional requirements or prohibitions on the constructs defined herein.
If a conformant application encounters data that does not conform to this standard, it may issue a warning or error message, and may terminate processing of the document or data fragment.
Indented text in grey or coloured boxes does not form a normative part of this standard, and is labelled as either an example or a note.
The grammar given here uses the form of EBNF notation defined in §6 of [XML], except that no significance is attached to the capitalisation of grammar symbols. Conforming applications must not generate data not conforming to the syntax given here, but non-conforming syntax may be accepted and processed by a conforming application in an implementation-defined manner.
This standard uses prefix notation, as defined in §4.3 of this standard, when discussing specific terms. The following prefix bindings are assumed in this standard:
rdf |
http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs |
http://www.w3.org/2000/01/rdf-schema# |
xsd |
http://www.w3.org/2001/XMLSchema# |
types |
https://terms.fhiso.org/types/ |
Characters are specified by reference to their code point number in [ISO 10646], without regard to any particular character encoding. In this standard, characters may be identified in this standard by their hexadecimal code point prefixed with “U+”.
Characters must match the Char
production from [XML].
Char ::= [#1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
A string is a sequence of zero or more characters, and should only be used to encode textual data.
This definition of a string is identical to the definition of the string
datatype defined in [XSD Pt2], used in many XML and Semantic Web technologies.
This definition of a string differs very slightly from JSON’s definition of a string, as defined in [RFC 7159], as a JSON string may include the null character (U+0000). This is the only difference between a JSON string and FHISO’s definition of a string. As a string should not be used to contain raw binary data, this difference is not anticipated to cause a problem. If an application needs to store binary data in string, it should encode it in a textual form, for example with the Base64 data encoding scheme defined in [RFC 4648].
Applications may convert any string into Unicode Normalization Form C, as defined in any version of Unicode Standard Annex #15 [UAX 15].
Characters matching the RestrictedChar
production from [XML] should not appear in strings, and applications may process such characters in an implementation-defined manner or reject strings containing them.
RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F]
| [#x7F-#x84] | [#x86-#x9F]
RestrictedChar
s.
Conformant applications must be able to store and process strings containing arbitrary characters other than those matching the RestrictedChar
. In particular, applications must be able to handle characters which correspond to unassigned Unicode code points as they may be assigned in future versions of [ISO 10646]. Applications must also be able to handle characters outside Unicode’s Basic Multilingual Plane — that is, characters with a code point of U+10000 or higher.
Whitespace is defined as a sequence of one or more space characters, carriage returns, line feeds, or tabs. It matches the production S
from [XML].
S ::= (#x20 | #x9 | #xD | #xA)+
Whitespace normalisation is the process of discarding any leading or trailing whitespace, and replacing other whitespace with a single space (U+0020) character.
In the event of a difference between the definitions of the Char
, RestrictedChar
and S
productions given here and those in [XML], the definitions in the latest edition of XML 1.1 specification are definitive.
A language tag is a string that is used to represent a human language, and where appropriate the script and regional variant or dialect used. They are commonly used to tag other strings to identify their language in a machine-readable manner.
The language tag shall match the Language-Tag
production from [RFC 5646], or from any future RFC published by the IEFT that obsoletes [RFC 5646] (hereinafter referred to as RFC 5646’s successor RFC), and should be valid, as defined in §2.2.9 of [RFC 5646].
Valid language tags have the meaning that is assigned to them by [RFC 5646] and any successor RFC. Applications may discard any language tag that is not well-formed and replace it with und
, meaning a undetermined language, but must not discard any language tag that is well-formed even if it is not valid.
[RFC 5646] says that to be valid, a language tag must consist of tags that have been registered in the [IANA Lang Subtags] registry. This is freely available online in a machine-readable form defined in §3.1.1 of [RFC 5646], and gives the meaning of every tag. Currently it includes:
The meanings of codes in the source ISO standards may change over time, but the procedure set out in §3.4 of [RFC 5646] governing the addition of tags to [IANA Lang Subtags] ensures the meanings there stable. This particularly affects [ISO 3166-1] country codes which historically have been reused, and may result in a gradual divergence between and [IANA Lang Subtags]. Applications should therefore avoid using [ISO 3166-1] codes that have not been registered in [IANA Lang Subtags].
A string tagged with the language tag hu-CS
must be interpreted by a conformant application as being in the Hungarian language localised for use in the former state of Serbia and Montenegro, because this is how hu
and CS
are listed in [IANA Lang Subtags]. The code CS
is perhaps better known as representing the former state of Czechoslovakia and appears in older lists of [ISO 3166-1] country codes as such, but neither IANA nor FHISO recognise this former meaning.
This is one of five country codes whose meaning has materially changed in [ISO 3166-1], the other four being AI
, BQ
, GE
and SK
. In each case, because the reuse occurred before the creation of [IANA Lang Subtags], it is the current meaning that is listed in [IANA Lang Subtags]. If there is further reuse of country codes in the future, [RFC 5646] requires that the current meaning of the tag be retained and a numeric code be given to the new country in [IANA Lang Subtags].
A conformant application may convert any language tag into its canonical form, as defined by §4.5 of [RFC 5646] or an equivalent section of a successor RFC.
Preferred-Value
field in [IANA Lang Subtags]. It never result in the removal of script subtag, even when they are the default script for the language as defined by a Suppress-Script
field.
iw
is listed in [IANA Lang Subtags] as a deprecated language code for Hebrew which has now been removed from [ISO 639-1]. Its Preferred-Value
field is he
, so an application may replace iw
with he
.
A conformant application may alter a language tag in any other way that leaves its canonical form unchanged when compared in a case-insensitive manner.
A string which is accompanied by a language tag which identifies the language in which the string is written is called a language-tagged string.
A term is a form of identifier used in FHISO standards to represent a concept which it is useful to be able to reference. A term consists of a unique, machine-readable identifier, known as the term name, paired with a clearly-defined meaning for the concept or idea that it represents. Term names shall take the form of an IRI matching the IRI
production in §2.2 of [RFC 3987].
This standard uses terms to name datatypes, as defined in §6 of this standard, and also to name classes and properties, defined in §5.1 and §5.2. For example, §6.6.3 of this standard defines a datatype for representing integers. This datatype is identified by a term whose term name in prefix notation is xsd:integer
. This is short for the following IRI:
http://www.w3.org/2001/XMLSchema#integer
Term names are compared using the “simple string comparison” algorithm given in §5.3.1 of [RFC 3987]. If a term name does not compare equal to an IRI known to the application, the application must not make any assumptions about the term, its meaning or intended use, based on the form of the IRI or any similarity to other IRIs.
The following IRIs are all distinct for the purpose of the “simple string comparison” algorithm given in §5.3.1 of [RFC 3987], , even though an HTTP request to them would fetch the same resource.
https://éléments.example.com/nationalité
HTTPS://ÉLÉMENTS.EXAMPLE.COM/nationalit%C3%A9
https://xn--lments-9uab.example.com/nationalit%c3%a9
An IRI must not be used as a term name unless it can be converted to a URI using the algorithm specified in §3.1 of [RFC 3987], and back to a IRI again using the algorithm specified in §3.2 of [RFC 3987], to yield the original IRI.
The terms defined in FHISO standards all have term names that begin https://terms.fhiso.org/
. Subject to the requirements in the applicable standards, third parties may also define additional terms. It is recommended that any such terms use either the http
or preferably the https
IRI scheme defined in §2.7.1 and §2.7.2 of [RFC 7230] respectively, and an authority component consisting of just a domain name or subdomain under the control of the party defining the term.
An http
or https
IRI scheme is recommended because the IRI is used to fetch a resource during discovery, and it is desirable that applications implementing discovery should only need to support a minimal number of transport protocols. URN schemes like the uuid
scheme of [RFC 4122] are not recommended as they do not have transport protocols that can be used during discovery.
The preference for a https
IRI is because of security considerations during discovery. A man-in-the-middle attack during discovery could insert malicious content into the response, which, if undetected, could cause an application to process user data incorrectly, potentially discarding parts of it or otherwise compromising its integrity. It is harder to stage a man-in-the-middle attack over TLS, especially if public key pinning is used per [RFC 7469].
It is recommended that an HTTP GET
request to a term name IRI with an http
or https
scheme (once converted to a URI per §4.1 of [RFC 3987]), should result in a 303 “See Other” redirect to a document containing a human-readable definition of the term if the request was made without an Accept
header or with an Accept
header matching the format of the human-readable definition. It is further recommended that this format should be HTML, and that documentation in alternative formats may be made available via HTTP content negotiation when the request includes a suitable Accept
header, per §5.3.2 of [RFC 7231].
Parties defining terms should arrange for their term name to support discovery. This when an HTTP GET
request to a term name IRI with an http
or https
scheme, made with an appropriate Accept
header, yields 303 redirect to a machine-readable definition of the term.
This standard does not define a discovery mechanism, but it is recommended that parties defining terms support FHISO’s [Triples Discovery] mechanism, and may additionally support other mechanisms. Support for discovery by applications is optional.
Suppose an application wants to perform discovery on the hypothetical https://example.com/events/Baptism
term used in several later examples in this standard. If the application supports FHISO’s [Triples Discovery] mechanism, which uses [N-Triples] as its serialisation format, together with some other hypothetical discovery mechanism using the application/x-discovery
MIME type, but prefers to use [Triples Discovery], it might make the following HTTP request:
GET /events/Baptism HTTP/1.1
Host: example.com
Accept: application/n-triples, application/x-discovery; q=0.9
In this example, the q=0.9
in the Accept
header is a quality value which, per §5.3 of [RFC 7231], indicates that the x-discovery
format is less preferred than n-triples
which by default has a quality value of 1.0.
If the server supports n-triples
, it must respond with a 303 redirect:
HTTP/1.1 303 See Other
Location: https://example.com/events/Baptism.n3
Vary: Accept
In this case the redirect is to the original IRI but with .n3
appended, however the actual choice of IRI is up to the party defining the term and running the example.com
web server. When a server’s response is dependent on the contents of an Accept
header, §7.1.4 of [RFC 7231] says that this should be recorded in a Vary
header, as it is in this example.
The application would normally then make a second HTTP request to follow the redirect:
GET /events/Baptism.n3 HTTP/1.1
Host: example.com
Accept: application/n-triples, application/x-discovery; q=0.9
This request uses the same Accept
header as the first, as HTTP redirects contain no information about the MIME type of the destination resource, so at this point the application does not know which discovery mechanism the server is using, or whether the server does not support discovery or HTTP content negotiation and is serving a human-readable definition.
The server’s response to this request should be an N-Triples file containing information about the Baptism
term.
A party defining a term may support discovery without using HTTP content negotiation on their web server by serving a machine-readable definition of the term unconditionally (which should be served via a 303 redirect), however it is recommended that such servers implement HTTP content negotiation respecting the Accept
header.
The namespace of a term is another term which identifies a collection of related terms defined by the same party. The term name of the namespace is also referred to as its namespace name. The namespace name of the namespace of some term is found as follows.
If the term name ends with a non-empty fragment identifier, then its namespace name is formed by removing the fragment identifier, leaving an IRI ending with a #
.
[Basic Concepts] uses a datatype identified by the following term name IRI:
http://www.w3.org/2001/XMLSchema#integer
This concludes with a fragment identifier, “integer
”, and therefore its namespace name is its term name with the fragment identifier removed:
http://www.w3.org/2001/XMLSchema#
Otherwise, if the term name ends with a non-empty path segment, then its namespace name is formed by removing the path segment, leaving an IRI ending with a /
.
[Basic Concepts] defines a property identified by the following term name IRI:
https://terms.fhiso.org/types/pattern
This concludes with a path segment, “pattern
”, and therefore its namespace name is its term name with the path segment removed:
https://terms.fhiso.org/types/
Otherwise, the namespace is undefined.
#
or /
, meaning they end with either an empty fragment identifier or an empty path segment.
Term names are sometimes referred using prefix notation. This is a system whereby prefixes are assigned to namespace names which occur frequently in term names. Then, instead of writing the term name in full, the leading portion of the term name equal to the namespace name is replaced by its prefix followed by a colon (U+003A) separator.
http://www.w3.org/2000/01/rdf-schema#Class
is used in several places in this standard. Instead of writing this in full, if the rdfs
prefix is bound to its namespace name http://www.w3.org/2000/01/rdf-schema#
, then this IRI can be written in prefix form as rdfs:Class
.
Terms are used in many contexts in FHISO standards and it can be useful to have a concise, machine-readable way of stating the use for which it was defined.
A class is a term used to denote a particular context or use for which other terms may be defined. Standards defining such contexts should define a class to represent that context, and must do so if the third parties are permitted to define their own terms for use in that context.
A hypothetical standard might defined various terms representing events of genealogical interest that might occur during a person’s lifetime. Examples might include:
https://example.com/events/Baptism
https://example.com/events/Ordination
https://example.com/events/Emigration
https://example.com/events/Death
The standard should provide a class to represent the abstract concept of an event type, and as the class is itself a term, it must have an IRI as its term name. Perhaps it might be:
https://example.com/events/EventType
This class might be referred to as the class of event types.
The term name of a class is also referred to as its class name.
When a term has been defined for use in the context denoted by some class, that class is referred to as the type of the term.
ex
bound to https://example.com/events/
, the type of ex:Baptism
from the previous example is ex:EventType
.
The type of a term is a piece of information which must be provided, perhaps implicitly, when defining a term. As such, the type is a property of the term, as defined in §5.2, and needs a property term to represent it. This standard uses the rdf:type
term for this purpose:
Name | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2000/01/rdf-schema#Class |
rdf:type
property. The first line of this definition states the term name of the rdf:type
property. As required above, the type of a term must be specified when a term is defined and the rdf:type
property is no exception. Its type is rdf:Property
which is defined in §5.2 of this standard. The meaning of the range is given in §5.2.1.
rdf:type
property term is defined §3.3 of [RDF Schema], however implementers may safely use this property term for the purposes of this standard without reading [RDF Schema]. The decision to use this RDF term in FHISO’s standards rather than invent a new term allows for greater compatibility with existing third-party vocabularies.
As a class is a term, defining a class is itself a context in which terms are defined, including by third parties. This means the general concept of a class needs a term defining to represent it. This standard uses the rdfs:Class
term for this purpose:
Name | http://www.w3.org/2000/01/rdf-schema#Class |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | http://www.w3.org/2000/01/rdf-schema#Resource |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
rdfs:Class
.
Although the rdfs:Class
class is defined in §2.2 of [RDF Schema], this standard does not require support for any of the facilities in [RDF Schema], nor are parties defining classes or terms required to do so in a manner compatible with RDF. An implementer may safely use the rdfs:Class
class for the purposes of this standard using just the information given in this section without reading [RDF Schema] or otherwise being familiar with RDF.
The decision to use rdfs:Class
and other terms from [RDF Schema] is due to FHISO’s practice of reusing facilities from existing standards when they are a good match for our requirements, rather than inventing our own versions with similar functionality. It also allows future standards and vendor extensions the option of reusing existing third-party vocabularies where appropriate, as most such vocabularies are also aligned with RDF.
The type of any class is therefore rdfs:Class
.
rdfs:Class
. As rdfs:Class
is just another class, albeit a fairly special one, the type of rdfs:Class
is rdfs:Class
.
A class may be defined as a subclass of another class. The latter class is referred to as the superclass of the former class. The subclass denotes a more specialised version of the context denoted by its superclass. A term whose type is subclass of some other class may be used wherever a term is required whose type is the superclass.
IndividualEventType
to represent individual events for those events that are principally about a single person. In such a scheme, a baptism would be considered an individual event, while a marriage would probably not as it involves two principal participants. In a context where a term of type EventType
is required, an IndividualEventType
like Baptism
may be used; but in a context where an IndividualEventType
is required, others sorts of event such as Marriage
must not be used.
The superclass of a class must be specified when defining a class to be the subclass of some other class. As such, the superclass of the class is a property of the class, as defined in §5.2 and needs a property term to represent it. This standard uses the rdfs:subClassOf
term for this purpose:
Name | http://www.w3.org/2000/01/rdf-schema#subClassOf |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2000/01/rdf-schema#Class |
rdfs:subClassOf
property term is defined §3.4 of [RDF Schema], however implementers may safely use this property term for the purposes of this standard without reading [RDF Schema]. The decision to use this RDF term in FHISO’s standards rather than invent a new term allows for greater compatibility with existing third-party vocabularies.
The notion of a subclass is transitive, meaning that if a class is a subclass of a second class, and that second class is a subclass of a third class, then the first class is a subclass of the third. The notion of a subclass is also reflexive, meaning that a class is by definition a subclass of itself. The notion of a superclass is similarly transitive and reflexive.
The rdfs:subClassOf
property is defined as a required property of rdfs:Class
, meaning its supertypes must be specified whenever a new class is defined. However this standard does not require every superclass to be identified explicitly. If a class has two or more superclasses, and one of the superclasses is itself a superclass of another of the superclasses, then the superclass of the superclass need not be identified explicitly.
IndividualEventType
class is a superclass of EventType
, but it is equally correct to say that it is a superclass of the rdfs:Resource
universal superclass defined in §5.1.4, below. The IndividualEventType
class therefore has two superclasses, one of which (EventType
) is a superclass of the other (rdfs:Resource
). Because of this, it is not necessary to state that IndividualEventType
is a subclass of rdfs:Resource
.
This standard uses rdfs:Resource
as the universal superclass defined to be the superclass of all classes.
Name | http://www.w3.org/2000/01/rdf-schema#Resource |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
rdfs:Resource
class is defined in §2.1 of [RDF Schema].
This class has no semantics of its own, other than to be class of all things that can be expressed in this data model.
rdfs:Resource
class is useful with the rdfs:subClassOf
property when defining a class which has no other superclass.
During discovery, and in other situations when a formal definition of a particular term is needed, it is necessary to have a formalism for providing information about that term.
A property is a particular piece of information that might be provided when defining some entity. The thing being defined is typically a term, and is called the subject of the property.
The property consists of two parts, both of which are required to be present:
The property name shall be a term that has been defined to be used as a property name in the manner required by this standard; a term defined for this purpose is called a property term.
The property value shall be a term, a string, or a language-tagged string. The property value may additionally be tagged with a datatype name, which is a term name defined in §6.
Properties shall not have default property values that applies when the property is absent, however standards may define how an conformant application handles the absence of a property.
Standards which introduce such pieces of information should define a property terms to represent them, and must do so if third parties are permitted to define their own terms and if it is recommended or required that these third parties document or otherwise make available the information represented by the property.
An earlier example introduced several hypothetical terms for events of genealogical interest, such as birth, baptism, ordination, emigration and death. Many events can occur multiple times during a person’s life: for example, a person might emigrate more than once. But other events cannot by definition occur more than once: birth and death are obvious examples. The number of times something is permitted to occur is sometimes called its cardinality, and if the authors of this hypothetical standard considered it a relevant concept, they should define a property term to represent the concept of cardinality:
https://example.com/events/cardinality
If the hypothetical standard allows third parties to define additional types of event, and either recommends or requires that they state the cardinality of the new events, then the standard must define a property term representing cardinality.
The term name of a property term is also referred to as its property term name.
The class of property terms has the following class name:
Name | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | http://www.w3.org/2000/01/rdf-schema#Resource |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2000/01/rdf-schema#range |
rdf:Property
term is defined in §2.8 of [RDF Schema]. As with the rdfs:Class
term, an implementer may safely use the rdf:Property
terms for the purposes of this standard without reading [RDF Schema].
The range of a property term is a formal specification of allowable property values for a property whose property name is that property term. The range shall be a class name or a datatype name.
When the range is a class, the property value shall be a term whose type is that class; when the range is a datatype, the value associated with the property shall be a string in the lexical space of that datatype.
An earlier example gave a hypothetical cardinality
property term that might be used when defining genealogical events. Most likely, the property value of this property would be a representation of “one” or “unbounded”, depending on whether the event is one that can occur just once, or whether it can occur multiple times. The party defining this property would need to consider how best to represent these two values.
One option is to define two terms to represent these options, say:
https://example.com/events/SinglyOccurring
https://example.com/events/MultiplyOccurring
The context in which these two terms can be used is when specifying a cardinality, so a Cardinality
class would be defined:
https://example.com/events/Cardinality
The type of SinglyOccuring
and MultiplyOccuring
would be Cardinality
, and the range of the cardinality
property would be the Cardinality
class. Having a property and the class that serves as its range only differing in capitalisation is a common idiom.
A second option is to use two strings to represent the possible cardinalities, perhaps “1
” and “unbounded
”. A datatype would then be defined whose lexical space consisted of just these two strings, and the datatype given a name like:
https://example.com/events/Cardinality
As in the first option, the range of the cardinality
property would be the Cardinality
class.
A third and likely preferable option would be to name the cardinality
property differently, say canOccurMultiply
, so that its range could be a standard boolean datatype like xsd:boolean
.
rdf:type
property term in §5.1.1. The type of a term is the class which denotes the context in which it can be used. Therefore the range of rdf:type
is rdfs:Class
, as shown in the property definition table in §5.1.1.
Standards which define property terms should specify their range, and must do so if third parties are permitted to define their own terms and if it is recommended or required that these third parties document or otherwise make available the information represented by the property term.
The range of a property term is itself a property which is defined as follows:
Name | http://www.w3.org/2000/01/rdf-schema#range |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2000/01/rdf-schema#Class |
rdfs:range
property is defined above to be rdfs:Class
, although the property value of an rdfs:range
property can be either a class name or a datatype name. This works because rdfs:Datatype
is defined as a subclass of rdfs:Class
, and therefore a datatype name can be used where a class name is required.
A property which must be provided when a third party defining a new term with some particular type is called a required property.
types:pattern
, types:nonTrivialSupertype
and types:isAbstract
. These three properties are therefore the required properties for datatypes. In fact, datatypes have a fourth required property which is their type: i.e. a statement that the term is a datatype.
The type of a new term being defined is a class, and therefore the list of the property names of the required properties of a term defined with that type is a property of that class. The property representing the required properties of a class is defined as follows:
Name | https://terms.fhiso.org/types/requiredProperty |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
requiredProperties
property whose value is a list of property names, classes will normally have multiple requiredProperty
properties each of whose value is a single property name.
The required properties of a class shall include all the required properties of each superclass of the class.
rdf:type
is a required property of rdfs:Resource
and all classes are a subclass of rdfs:Resource
, thus rdf:type
is a required property of every class.
A datatype is a term which serves as a formal description of the values that are permissible in a particular context. Being a term, a datatype is identified by a term name which is an IRI. The term name of a datatype is also referred to as its datatype name.
A datatype has a lexical space which is the set of strings which are interpreted as valid values of the datatype. The definition of a datatype shall state how each string in its lexical space maps to a logical value, and state the semantics associated with of those values.
The mapping from lexical representations to logical values need not be one-to-one. If a datatype has multiple lexical representations of the same logical value, a conformant application must treat these representations equivalently and may change a string of that datatype to be a different but equivalent lexical representation.
integer
datatype used in the previous example is one where the mapping from lexical representation to value is many-to-one rather than one-to-one. This is due to lexical space including strings with a leading +
sign as well as superfluous leading 0
s, and means that “00137
”, “+137
” and “137
” all represent the same underlying value: the number one hundred and thirty-seven. Because conformant applications may convert strings between equivalent lexical representations, they may store them in a database in an integer field and regenerate strings in a canonical representation.
Strings outside the lexical space of a datatype must not be used where a string of that datatype is required. If an application encounters any such strings, it may remove them from the dataset or may convert them to a valid value in an implementation-defined manner. Any such conversion that is applied automatically by an application must either be locale-neutral or respect any locale given in the dataset.
date
type in §3.3.9 of [XSD Pt2] which has a lexical space based on [ISO 8601] dates. If, in a dataset that is somehow identified as being written in German, an application encountering the string “8 Okt 2000
” in a context where an XML Schema date
is expected, it may convert this to “2000-10-08
”. However an application encountering the string “8/10/2000
” must not conclude this represents 8 October or 10 August unless the document includes a locale that uniquely determines the date format. In this case, information that the document is in English is not sufficient as different English-speaking countries have different conventions for formatting dates.
This standard uses the rdfs:Datatype
class as the class of datatypes, defined as follows:
Name | http://www.w3.org/2000/01/rdf-schema#Datatype |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | http://www.w3.org/2000/01/rdf-schema#Class |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://terms.fhiso.org/types/pattern https://terms.fhiso.org/types/nonTrivialSupertypeCount https://terms.fhiso.org/types/isAbstract |
rdfs:Datatype
term is defined in §2.4 of [RDF Schema].
The class of datatypes, rdfs:Datatype
, is defined here to be a subclass of the class of all classes, rdfs:Class
. This may appear counter-intuitive as new classes are normally defined to be a subclass only of rdfs:Resource
, the universal superclass. The reason for doing this is partly for compatibility with its definition in [RDF Schema], but the reasons [RDF Schema] took this unusual decision are also valid here.
Making rdfs:Datatype
a subclass of rdfs:Class
says that a datatype name may be used where a class name is expected. In many situations this is desirable. For example, the range of a property is, in general, a class name, but frequently a datatype name will be used: for example, the range of types:isAbstract
is the xsd:boolean
datatype. By making rdfs:Datatype
a subclass of rdfs:Class
, the range of rdfs:range
can be rdfs:Class
.
A party defining a datatype shall specify a pattern for that datatype. This is a regular expression which provides a constraint on the lexical space of the datatype. Matching the pattern might not be sufficient to validate a string as being in the lexical space of the datatype, but parties defining a datatype must ensure that all strings in the lexical space match the pattern, even if some strings outside the lexical space also match the pattern.
The XML Schema date
type mentioned in a previous example has the following pattern (here split onto two lines for readability — the second line is an optional timezone which the XML Schema date
type allows).
-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])
(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
This pattern matches strings like “1999-02-31
”. Despite matching the pattern, this string is not part of the lexical space of this date
type as 31 February is not a valid date.
The property term representing the pattern of a datatype is defined as follows:
Name | https://terms.fhiso.org/types/pattern |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | https://terms.fhiso.org/types/Pattern |
types:Pattern
datatype used as the range of this property is defined in a separate [FHISO Patterns] standard which defines the dialect of regular expressions which FHISO supports.
xsd:pattern
as the property term, even though it is used as a predicate in OWL 2. Its use would pose a difficulty because none of the relevant W3C specifications indicate what the rdfs:domain
of xsd:pattern
is supposed to be. Possibly it is an owl:Restriction
, which would be incompatible with this use. Using xsd:pattern
would also require us to use precisely the form of regular expression defined in Appendix G of [XSD Pt2].
A datatype with a pattern other than “.*
” is known as a structured datatype, while one with a pattern of “.*
” is known as an unstructured datatype.
A datatype may be defined as a subtype of one or more other datatype which are referred to as its supertypes. This is used to provide a more specific version of a more general datatype. If an application is unfamiliar with the subtype it may process it as if it were one of its supertypes. The subtype must be defined in such a way that at most this results in some loss of meaning but does not introduce any false implications about the dataset.
The lexical space of the subtype shall be a subset of the lexical space of the supertype.
1999-02-31
”.
xsd:anyAtomicType
. It is anticipated that dates will provide a good example, as we expect to need several subtypes of AbstractDate
, but FHISO has yet to specify how dates are handled in this data model.
All datatypes are implicitly a subtype of the xsd:anyAtomicType
abstract datatype defined to be the universal supertype in §6.6.6.
The notion of a subtype is transitive, meaning that if a datatype is a subtype of a second datatype, and that second datatype is a subtype of a third datatype, then the first datatype is a subtype of the third. The notion of a subtype is also reflexive, meaning that a datatype is by definition a subtype of itself. The notion of a supertype is similarly transitive and reflexive.
The trivial supertypes of a datatype are certain supertypes whose status as a supertype of the datatype is implied by the data model defined in this standard. The trivial supertypes of a datatype include the datatype itself and the universal supertype, xsd:anyAtomicType
. A supertype which not a trivial supertype is called a non-trivial supertype.
The property term representing a non-trivial supertype of a datatype is defined as follows:
Name | https://terms.fhiso.org/types/nonTrivialSupertype |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2000/01/rdf-schema#Datatype |
rdfs:subClassOf
property to represent the supertype of a datatype. This introduced a fairly obscure incompatibility with RDF. RDF only requires that the value space of a subtype is a subset of the value space of the supertype: it says nothing about their lexical spaces. Thus in RDF it would be possible for xsd:boolean
to be a subclass of xsd:integer
if the boolean values “true” and “false” are considered to be identical to the integer values 1 and 0, respectively (though in fact they’re not). This is despite the strings “true
” and “false
” being part of lexical space of xsd:boolean
but not of xsd:integer
. This means a stronger relationship is needed which constrains both the lexical space and the value space. This is what types:nonTrivialSupertype
provides. This standard explicitly does not state whether types:nonTrivialSupertype
is an rdfs:subPropertyOf
rdfs:subClassOf
.
The types:nonTrivialSupertype
property must not be used to record a trivial supertypes of the datatype.
A types:nonTrivialSupertype
property must be used to record every non-trivial supertype of a datatype which is not implied by the transitivity of types:nonTrivialSupertype
and the other types:nonTrivialSupertype
properties present.
DateTime
, Date
, and Year
. If the standard specifies that Year
has a types:nonTrivialSupertype
property with property value Date
, and that Date
has a types:nonTrivialSupertype
property with property value DateTime
, it is not necessary for the standard to record that Year
has a second types:nonTrivialSupertype
property with property value DateTime
as this is implied by the other two. Nevertheless, the standard may do so.
As a way of checking for data integrity during discovery, an additional property is provided representing the number of non-trivial supertypes of the datatype that are either recorded using types:nonTrivialSupertype
properties or are implied by them via transitivity:
Name | https://terms.fhiso.org/types/nonTrivialSupertypeCount |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2001/XMLSchema#integer |
xsd:nonNegativeInteger
instead?
This types:nonTrivialSupertypeCount
property is a required property of rdfs:Datatype
, and must be specified (with a value of “0
”) even if there are no non-trivial supertypes.
An application which finds out about a datatype through discovery must not assume it knows the supertypes of the datatype unless it has verified that the number of non-trivial supertypes specified with the types:nonTrivialSupertype
property or implied by the transitivity of that property is equal to the value of the types:nonTrivialSupertypeCount
property.
These two properties are likely to be changed in a future draft. A cleaner implementation would be to have a single types:supertypes
property which is a list of the non-trivial supertypes of the datatype, however at the moment the data model does not support list-valued properties. This is a recognised deficiency in the current data model that is likely to be changed, but which requires considerable work.
The reason why a single list-valued property is inherently safe whereas a collection of a properties is not is that the list-valued property can be made a required property which must be present exactly once. If it is not, an application knows that is missing and will not assume it properly understands the datatype. However if one of several types:nonTrivialSupertype
properties goes missing, this might go unnoticed. This is particular relevant if the properties have been processed by RDF applications, as the RDF philosophy is that RDF triples can be taken in isolation and that removing one or more RDF triples merely loses information rather than altering the meaning of something. It is therefore quite conceivable that an RDF triple encoding a property might go missing.
In [CEV Concepts], a missing types:nonTrivialSupertype
might result in a datatype being incorrectly thought not to conform to the range of some citation element, which might result in a valid citation element being discarded. The importance of avoiding this is the reason why the current draft includes a types:nonTrivialSupertypeCount
as a check.
In the datatype definition tables in this standard, a single supertype row is given which is understood to contain a complete list of all non-trivial supertypes and no trivial supertypes.
A datatype may be defined to be a abstract datatype. An abstract datatype is one that must only be used as a supertype of other types. A string must not be declared to have a datatype which is an abstract datatype. Abstract datatypes shall specify a pattern and shall have a lexical space.
The property that represents whether or not a datatype is an abstract datatype defined as follows:
Name | https://terms.fhiso.org/types/isAbstract |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2001/XMLSchema#boolean |
AbstractDate
datatype, but is it necessary for this datatype to be an abstract datatype?
A language-tagged datatype is a datatype whose values are language-tagged strings consisting of both a string from the lexical space of the datatype and a language tag to identify the language in which that particular string is written.
Language-tagged datatypes should be used whenever a datatype is needed to represent textual data that is in a particular language or script and which cannot automatically be translated or transliterated as required, and should not be used otherwise.
2015
”. Even though an application designed for Arabic researchers might need to render this year as “٢٠١٥” using Eastern Arabic numerals, this conversion can be done entirely in the application’s user interface, so a language-tagged datatype is not required and should not be used.
The [CEV Vocabulary] defines a datatype for representing the names of authors and other people, which has the following term name:
https://terms.fhiso.org/sources/AgentName
A person’s name is rarely translated in usual sense, but may be transliterated. For example, the name of Andalusian historian صاعد الأندلسي might be transliterated “Ṣā‘id al-Andalusī” in the Latin script. Because machine transliteration is far from perfect, a language-tagged datatype should be used to allow an application to store both names.
An author’s names may also be respelled to conform to the spelling and grammar rules of the reader’s language. An Englishman named Richard may be rendered “Rikardo” in Esperanto: the change of the “c” to a “k” being to conform to Esperanto orthography, while the final “o” marks it as a noun. The respelling would be tagged eo
, the language code for Esperanto.
Language-tagged datatypes shall define a pattern, just as other datatypes do.
A datatype that is not a language-tagged datatype is called a non-language-tagged datatype.
AgentName
datatype from the previous example is a microformat which is constrained by a pattern meaning it is a structured datatype, but it is also a language-tagged datatype as names can be translated and transliterated.
A language-tagged datatypes may be used as a supertype of a datatype. All subtypes of a language-tagged datatype shall also be language-tagged datatypes.
xsd:anyAtomicType
) were required to be non-language-tagged, with an exception for subtypes. This requirement has been dropped to allow unions to be defined which contain a mixture of language-tagged datatypes and non-language-tagged datatypes.
All language-tagged datatypes are implicitly a subtype of the rdf:langString
datatype defined in §6.6.5.
types:nonTrivialSupertype
property.
A union of datatypes is an abstract datatype which is defined in terms of a unordered list of two or more distinct datatypes called its constituent datatypes. The constituent datatypes must not themselves be unions of datatypes. The lexical space of a union of datatypes is the union of the lexical spaces of each constituent datatype.
Like any other datatype, a union of datatypes is a term with a term name. It must also specify a pattern.
FHISO plans to define a datatype for representing dates which has the following datatype name:
https://terms.fhiso.org/dates/Date
It is a union of datatypes with the following two constituent datatypes:
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
https://terms.fhiso.org/dates/AbstractDate
The former is the language-tagged datatype defined in §6.6.5 and is used to record dates that are described in a way that cannot be converted to a structured form without loosing information. The latter is an abstract datatype which serves as the supertype for various structured datatypes for dates.
Because the rdf:langString
constituent datatype is an unstructured datatype, every possible string is part of that of the lexical space of that datatype, and therefore also part of the lexical space of the union of datatypes. This means the pattern specified for the union of datatypes must allow every possible string, and so should be “.*
”.
A union of datatypes may contain language-tagged datatypes, non-language-tagged datatypes, or a mixture of both.
Each constituent datatype of a union of datatypes is a subtype of the union of datatypes. Whenever a union of datatypes is supertype of some other datatype it is defined to be a trivial datatype.
A datatype shall be a supertype of a union of datatypes if and only if it is a supertype of every constituent datatype of the union of datatypes.
rdf:langString
nor AbstractDate
has any non-trivial supertypes, and therefore neither does the Date
union of datatypes.
rdf:langString
and therefore rdf:langString
is a non-trivial supertype of the union of datatypes. This means the union of datatypes is classified as a language-tagged datatype.
The class of unions of datatypes is defined as follows:
Name | http://www.w3.org/2000/01/rdf-schema#Union |
Type | http://www.w3.org/2000/01/rdf-schema#Class |
Superclass | http://www.w3.org/2000/01/rdf-schema#Datatype |
Required properties | http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://terms.fhiso.org/types/pattern https://terms.fhiso.org/types/nonTrivialSupertypeCount https://terms.fhiso.org/types/isAbstract https://terms.fhiso.org/types/constituentDatatypeCount |
types:nonTrivialSupertypeCount
property. It also allows types:constituentDatatypeCount
to be defined as a required property.
The property which denotes a constituent datatype of a union of datatypes is defined as follows:
Name | https://terms.fhiso.org/types/constituentDatatype |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2000/01/rdf-schema#Datatype |
As a way of checking for data integrity during discovery, an additional property is provided representing the number of constituent datatypes of the union of datatype:
Name | https://terms.fhiso.org/types/constituentDatatypeCount |
Type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
Range | http://www.w3.org/2001/XMLSchema#integer |
xsd:nonNegativeInteger
instead?
These two properties are likely to be changed in a future draft. Much as with the two properties for recording supertypes given in §6.2, a cleaner implementation would be to have a single types:unionOf
property which is a list of the constituent dataptyes of the union of datatypes, however at the moment the data model does not support list-valued properties. This is a recognised deficiency in the current data model that is likely to be changed, but which requires considerable work.
If and when list-valued properties are added to the data model, it may be that the owl:unionOf
property defined in OWL should be reused instead of inventing our own property.
This standard recommends the use of the xsd:string
, xsd:boolean
, xsd:integer
and xsd:anyURI
datatypes defined in [XSD Pt2] to represent strings, booleans, integers and IRIs, respectively. They are described in the following subsections.
XML Schema does not give its types IRIs, but it does give them id
s, and following the best practice advice given in §2.3 of [SWBP XSD DT] gives them IRIs like this:
http://www.w3.org/2001/XMLSchema#integer
These types are also recommended for use in RDF by §5.1 of [RDF Concepts]. RDF requires all datatypes to be identified by an IRI, and IRIs such as the one above are used for XML Schema datatypes.
This section also contains a summary of the rdf:langString
datatype which is used heavily by FHISO technologies.
xsd:string
datatypeSome FHISO standards make limited use of the xsd:string
datatype defined in §3.3.1 of [XSD Pt2]. This is an unstructured non-language-tagged datatype which has the following properties:
Name | http://www.w3.org/2001/XMLSchema#string |
Type | http://www.w3.org/2000/01/rdf-schema#Datatype |
Pattern | .* |
Supertype | No non-trivial supertypes |
Abstract | false |
It is a general-purpose datatype whose lexical space is the space of all strings; however it is not a language-tagged datatype and therefore it should not be used to contain text in a human-readable natural language.
xsd:boolean
and xsd:integer
are not defined as subtypes of xsd:string
in XML Schema.
Use of this datatype is generally not recommended: data that is in a human-readable form should use a language-tagged datatype, while data that is not human-readable should use a structured datatype.
If an application encounters a string with the xsd:string
datatype in a context where a language-tagged string would be permitted, the application may change the datatype to rdf:langString
and assign the string a language tag of und
, meaning an undetermined language.
xsd:string
datatype is included in this standard in order to align this data model more closely with the RDF data model, and in particular the [CEV RDFa] bindings which use this datatype as the default when no language tag is present. The above rule allowing conversion to rdf:langString
means that applications may ignore the xsd:string
datatype.
xsd:boolean
datatypeA boolean is a datatype with precisely two logical values: true and false. FHISO standards represent booleans using the xsd:boolean
datatype defined in §3.3.2 of [XSD Pt2]. This is a structured non-language-tagged datatype which has the following properties:
Name | http://www.w3.org/2001/XMLSchema#boolean |
Type | http://www.w3.org/2000/01/rdf-schema#Datatype |
Pattern | true|false|1|0 |
Supertype | No non-trivial supertypes |
Abstract | false |
The lexical space of this datatype includes four different strings so that the two logical values of the datatype each have two alternative lexical representations. The value true may be represented by either “true
” or “1
”, while the value false may be represented by either “false
” or “0
”. Conformant applications shall not attach any significance to which of the alternative lexical representations is used, and may replace any instance of “1
” in a boolean string with “true
”, or “0
” with “false
”, but not vice versa. Where possible, the numeric representations, “0
” and “1
”, should not be used.
xsd:boolean
allows them, and alignment with the XML Schema datatype is desirable as it is widely used in third-party standards. Appendix E.4 of [XSD Pt2] defines the alphabetic representations, “true
” and “false
”, to be the canonical forms of the datatype, and this standard does similarly.
xsd:boolean
are “true
” and “false
”, which are in English, it is not a language-tagged datatype because the values must not be present in translated form. A Romanian dataset, for example, would still use the value “false
” rather than translating it as “adevărat
”.
xsd:integer
datatypeFHISO uses the xsd:integer
datatype defined in §3.4.13 of [XSD Pt2] to represent integers. It must not be used for values which are typically but not invariably integers.
This datatype can represent arbitrarily large integers, but unless otherwise stated, applications may opt not to support values greater than 2 147 483 647 or less than −2 147 483 648. In the event an unsupported value is encountered, an implementation may handle it in an implementation-defined manner, but must not convert it to a different integer.
xsd:integer
as a signed 32-bit integer except where otherwise noted.
The lexical space of this datatype is the space of all strings consisting of a finite-length sequence of one or more decimal digits (U+0030 to U+0039, inclusive), optionally preceded by a +
or -
sign (U+002B or U+002D, respectively).
137
” is within the lexical space of this datatype, but “20.000
” and “四十二
” are not, despite being normal ways of representing integers in certain cultures.
This datatype has several alternative representations of the same logical integer value. This arises because leading zeros are permitted, the +
sign is optional, and the value -0
is permitted. Applications may remove any leading +
sign and any leading zeros preceding a non-zero digit, and may rewrite -0
as 0
.
This is a structured non-language-tagged datatype which has the following properties:
Name | http://www.w3.org/2001/XMLSchema#integer |
Type | http://www.w3.org/2000/01/rdf-schema#Datatype |
Pattern | [+-]?[0-9]+ |
Supertype | No non-trivial supertypes |
Abstract | false |
xsd:decimal
, but this is not a supported datatype in this standard.
xsd:byte
, xsd:short
, xsd:int
, xsd:long
and their unsigned equivalents. FHISO discourage the use of all of these datatypes in conjunction with FHISO standards as there very few genealogical contexts where an integer is required but where the value can be guaranteed always to fit in these fixed sized datatypes.
xsd:positiveInteger
and xsd:nonNegativeInteger
.
xsd:anyURI
datatypeFHISO uses the xsd:anyURI
datatype defined in §3.3.17 of [XSD Pt2] to represent strings which are valid IRIs.
Formally this is an unstructured datatype with no restrictions imposed on its lexical space; nevertheless, this datatype should only be used with strings which match the IRI-reference
production in §2.2 of [RFC 3987] which matches both absolute and relative IRIs.
Name | http://www.w3.org/2001/XMLSchema#anyURI |
Type | http://www.w3.org/2000/01/rdf-schema#Datatype |
Pattern | .* |
Supertype | No non-trivial supertypes |
Abstract | false |
rdf:langString
datatypeThe rdf:langString
datatype defined in §2.5 of [RDFS] is used as the general-purpose unstructured language-tagged datatype. No constraints are placed on the lexical space of this datatype; the only restriction placed on the use or semantics of this datatype is that it should contain text in a human-readable form.
Any language-tagged datatype that is not defined to be a subtype of some other datatype shall implicitly be considered to be a subtype of the rdf:langString
datatype.
rdf:langString
is the ultimate supertype of all language-tagged datatypes.
This datatype has the following properties:
Name | http://www.w3.org/1999/02/22-rdf-syntax-ns#langString |
Type | http://www.w3.org/2000/01/rdf-schema#Datatype |
Pattern | .* |
Supertype | No non-trivial supertypes |
Abstract | false |
xsd:anyAtomicType
datatypeThe xsd:anyAtomicType
datatype defined in defined §3.2.2 of [XSD Pt2] is used as the universal supertype of all datatypes.
This datatype has the following properties:
Name | http://www.w3.org/2001/XMLSchema#anyAtomicType |
Type | http://www.w3.org/2000/01/rdf-schema#Datatype |
Pattern | .* |
Supertype | No non-trivial supertypes |
Abstract | true |
xsd:anyAtomicType
datatype is defined §3.2.2 of [XSD Pt2]. That standard does not define it as an abstract datatype as XML Schema’s notion of abstract types does not extend to simple types. Neverthless, xsd:anyAtomicType
is treated specially by XML Schema in a way that is similar to this standard’s definition of an abstract datatype. It is also not considered an “RDF-compatible XSD type” in §5.1 of [RDF Concepts] which means it should not be used as a datatype in RDF; again, this is similar to this standard’s notion of an abstract datatype.
Any non-language-tagged datatype not defined to be a subtype of any other datatype shall implicitly be considered to be a subtype of the xsd:anyAtomicType
datatype.
xsd:anyAtomicType
is a subclass of rdfs:Literal
. So is rdf:langString
. This standard does not explicitly say this as FHISO’s data model currently has no need for the rdfs:Literal
class.
Copyright © 2017–18, Family History Information Standards Organisation, Inc. The text of this standard is available under the Creative Commons Attribution 4.0 International License.