Simple Triples Discovery Mechanism

This is an exploratory draft of a standard defining a simple, general-purpose discovery mechanism. This document is not endorsed by the FHISO membership, and may be updated, replaced or obsoleted by other documents at any time.

FHISO's Simple Triples Discovery Mechanism (or Triples Discovery) provides a way for internet-connected applications to attempt to gain information on any unfamiliar terms they may encounter, allowing these terms to be better processed. Unknown terms can appear in data as a result of third-party extensions being used, when data conforming to a new standard is read by an older application, or if data conforming to other standards is present.

In Triples Discovery, an application makes an HTTP request to the term name IRI with an Accept header requesting a response in the N-Triples format. The details of these HTTP requests and their responses are given in §2. The N-Triples format is described in §3; it is extremely simple to parse, and is supported in various libraries by virtue of being a small subset of the more popular Turtle serialisation format.

Conventions used

Where this standard gives a specific technical meaning to a word or phrase, that word or phrase is formatted in bold text in its initial definition, and in italics when used elsewhere. The key words must, must not, required, shall, shall not, should, should not, recommended, not recommended, may and optional in this standard are to be interpreted as described in [RFC 2119].

An application is conformant with this standard if and only if it obeys all the requirements and prohibitions contained in this document, as indicated by use of the words must, must not, required, shall and shall not, and the relevant parts of its normative references. Standards referencing this standard must not loosen any of the requirements and prohibitions made by this standard, nor place additional requirements or prohibitions on the constructs defined herein.

Derived standards are not allowed to add or remove requirements or prohibitions on the facilities defined herein so as to preserve interoperability between applications. Data generated by one conformant application must always be acceptable to another conformant application, regardless of what additional standards each may conform to.

If a conformant application encounters data that does not conform to this standard, it may issue a warning or error message, and may terminate processing of the document or data fragment.

This standard depends on FHISO's Basic Concepts for Genealogical Standards standard. To be conformant with this standard, an application must also be conformant with [Basic Concepts]. Concepts defined in that standard are used here without further definition.

In particular, the precise meanings of string, language tag, term, discovery, class, class name, type, property, property name and datatype are given in [Basic Concepts].

Indented text in grey or coloured boxes does not form a normative part of this standard, and is labelled as either an example or a note.

Editorial notes, such as this, are used to record outstanding issues, or points where there is not yet consensus; they will be resolved and removed for the final standard. Examples and notes will be retained in the standard.

In some of the examples in this standard, long lines are broken across multiple lines to improve readability. Where this has occurred, the continuation lines are prefixed with a "" to mark the continuation. To get the actual text, this character needs to be removed, and the continuation line appended to the previous line with a single space character (U+0020) separating the previous line's content from the continuation line's.

The grammar given here uses the form of EBNF notation defined in §6 of [XML], except that no significance is attached to the capitalisation of grammar symbols. Conforming applications must not generate data not conforming to the syntax given here, but non-conforming syntax may be accepted and processed by a conforming application in an implementation-defined manner.

This standard uses prefix notation when discussing specific terms. The following prefix bindings are assumed in this standard:

rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
xsd http://www.w3.org/2001/XMLSchema#
types https://terms.fhiso.org/types/

The particular prefixes assigned above have no relevance outside this standard document, as prefix notation is not used in the formal data model defined by this standard. This notation is simply a notational convenience to make the standard easier to read.

HTTP requests and responses

Discovery is defined in §4.2 of [Basic Concepts] as being when an HTTP request to the term name IRI, made with an appropriate Accept header, results in a response in a particular machine-readable format. This section defines how those HTTP requests and responses are made in Triples Discovery.

When an application opts to carry out discovery using this Triples Discovery mechanism on a term whose term name IRI has an http or https scheme, it shall make an HTTP GET request to the URI that results from the conversion of the term name IRI to a URI per §3.1 of [RFC 3987].

This standard does not specify how Triples Discovery works with terms using other IRI schemes, and the use of other schemes is not recommended by §4 of [Basic Concepts].

The IRI to which the initial GET request is made is called the discovery IRI. It is the term name IRI with any fragment component removed.

The term name https://example.com/events#Birth contains a fragment component, therefore its discovery IRI is https://example.com/events.
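
As a minimal sketch of this rule, the discovery IRI can be computed by dropping the fragment component from the term name IRI. The function name discovery_iri is hypothetical and not part of this standard, and the sketch does not perform the IRI-to-URI conversion of [RFC 3987].

```python
from urllib.parse import urldefrag


def discovery_iri(term_name: str) -> str:
    """Return the term name IRI with any fragment component removed.

    Hypothetical helper: only the fragment is stripped; no other
    normalisation or IRI-to-URI conversion is attempted.
    """
    iri, _fragment = urldefrag(term_name)
    return iri


print(discovery_iri("https://example.com/events#Birth"))
```

A term name without a fragment component, such as https://example.com/events/Baptism, is returned unchanged.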

The GET request should have an Accept header that is well-formed according to §5.3.2 of [RFC 7231], and which references the N-Triples media type, "application/n-triples". The request's Accept header may alternatively or additionally reference the media types of one or more of the alternative RDF formats described in §4 of this standard, but conformant servers need not support those formats.

If the discovery IRI is not one of the cases listed in §5, and is not an IRI used for another purpose, it is recommended that servers issue a 404 "Not Found" response. Applications must not consider a 404 to mean the term is invalid.

As it is only recommended and not required that parties defining new terms make information available online at the term name IRI, a 404 response can also mean the provider has chosen not to follow this recommendation. This might occur when a term is no longer supported by the organisation which originally defined it, but is still in use.

After any initial redirections, a conformant server should use the algorithm in §5.3 of [RFC 7231] to consider each of those media types listed in the Accept header which the server supports, including any documentation formats or other discovery formats outside the scope of this standard, to select the media type of the resource that will be served. If the server supports none of the listed media types, it should send a 406 "Not Acceptable" response; otherwise, if the selected media type is the N-Triples media type or a supported alternative type from §4, and the discovery IRI is one of the cases listed in §5, the server should continue with Triples Discovery as outlined here. If the Accept header was precisely "application/n-triples" then a conformant server must continue with Triples Discovery.

This standard only recommends, and does not require, that parties defining new terms arrange for HTTP content negotiation to be performed properly as described above and in [RFC 7231]. The reason for this is that some popular web servers do not make the necessary configuration straightforward, and much of the published advice on the subject is to use basic pattern matching on Accept headers rather than proper content negotiation. A prominent example of such advice is [SWBP Vocab Pub], published as best practice by the World Wide Web Consortium. Recipes 3 and 4 from this can result in certain complex Accept headers being parsed contrary to [RFC 7231]; nevertheless, this standard allows server administrators to follow [SWBP Vocab Pub] while remaining conformant with this standard.

An application might send a GET request with the following Accept header:

Accept: application/x-discovery; q=0.9, application/n-triples

This is well-formed according to §5.3.2 of [RFC 7231]. The q=0.9 in the Accept header is a quality value attached to the preceding media type. It indicates that the hypothetical x-discovery format is less preferred than N-Triples which by default has a quality value of 1.0. Placing a less preferred format before the preferred format is unorthodox but not prohibited.

A server that supports both the x-discovery format and N-Triples should provide an N-Triples description of the term per this standard, but because the Accept header was not exactly "application/n-triples", this is not required. If the server uses some form of pattern matching on the Accept header and concludes that x-discovery must be preferred as it is listed first, this behaviour, while incorrect, is still conformant with this standard.
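
The difference between proper content negotiation and pattern matching can be illustrated with a simplified sketch of quality-value handling. The function names parse_accept and select_media_type are hypothetical. The sketch assumes exact media type matches only: wildcards such as */* and media type parameters other than q are ignored, so it is not a complete implementation of §5.3 of [RFC 7231].

```python
def parse_accept(header: str) -> dict:
    """Map each media type listed in an Accept header to its quality value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def select_media_type(header: str, supported: list):
    """Pick the supported media type with the highest quality value.

    Returns None when nothing is acceptable, in which case a server
    would be expected to send a 406 "Not Acceptable" response.
    Ties are resolved by the order of the server's `supported` list.
    """
    prefs = parse_accept(header)
    candidates = [(prefs[t], t) for t in supported if prefs.get(t, 0.0) > 0.0]
    if not candidates:
        return None
    best_q = max(q for q, _ in candidates)
    for q, media_type in candidates:
        if q == best_q:
            return media_type


header = "application/x-discovery; q=0.9, application/n-triples"
print(select_media_type(header, ["application/x-discovery", "application/n-triples"]))
```

With the Accept header from the example above, this selects N-Triples even though the x-discovery format is listed first, because the choice depends on the quality values rather than on the order of the media types.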

Except when the discovery IRI is a namespace name as defined in §5.XXX, a conformant server shall issue a redirect to a resource containing a description of the discovery term in the selected format. This redirect should be a 303 "See Other" redirect, and must not be a 301 "Moved Permanently". If the discovery IRI is a namespace name, a redirect may be produced but is not required.

A redirect is required when the discovery IRI is a term name to avoid confusing the term name with the document containing its definition, which is found at the post-redirect URL. Neither this standard nor [Basic Concepts] currently defines properties for use with documents, but future FHISO standards might, and servers conforming to this standard may include in their response RDF triples outside the scope of this standard, such as [Dublin Core] metadata about the document. Without this requirement, such metadata would be indistinguishable from properties about the term subject to discovery.

Suppose an application wants to perform discovery on a hypothetical https://example.com/events/Baptism term. An application wanting to maximise the likelihood of a response from any conformant server might make the following request:

GET /events/Baptism HTTP/1.1
Host: example.com
Accept: application/n-triples

This standard does not specifically require support for HTTP/1.1, but it is currently the most widely used version of HTTP and servers are strongly encouraged to support it. If the server does, as the Accept header is exactly "application/n-triples", a conformant server must conclude this is a request for an N-Triples representation of the term. And as the discovery IRI is the term name, it must respond with a redirect:

HTTP/1.1 303 See Other
Location: https://example.com/events/Baptism.n3
Vary: Accept

In this case the redirect is to the original IRI but with .n3 appended; however, the actual choice of IRI is up to the party defining the term and running the example.com web server. When a server's response is dependent on the contents of an Accept header, §7.1.4 of [RFC 7231] says that this should be recorded in a Vary header, as it is in this example. In practice other headers are likely to be present too, probably including a Date header containing the current date and time, and a Server header identifying the web server software; these have been omitted for brevity.

The application would normally then make a second HTTP request to follow the redirect:

GET /events/Baptism.n3 HTTP/1.1
Host: example.com
Accept: application/n-triples

This request uses the same Accept header as the first, as HTTP redirects contain no information about the MIME type of the destination resource, so at this point the application does not know whether the server has done HTTP content negotiation.

The server's response to this request should be an N-Triples file containing information about the Baptism term.

HTTP/1.1 200 OK
Content-Type: application/n-triples

<https://example.com/events/Baptism> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.com/types/Event> .

The meaning of the N-Triples in the response body is described in §3.
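
The client side of this exchange might be sketched as follows using Python's standard urllib.request module. The helper names build_discovery_request and is_usable_response are hypothetical, and the sketch assumes the discovery IRI has already been converted to an ASCII-only URI. No network request is made here; the comments describe how the request could be sent.

```python
import urllib.request

N_TRIPLES = "application/n-triples"


def build_discovery_request(discovery_uri: str) -> urllib.request.Request:
    """Build a GET request whose Accept header is exactly the N-Triples type."""
    return urllib.request.Request(
        discovery_uri, headers={"Accept": N_TRIPLES}, method="GET"
    )


def is_usable_response(status: int, content_type: str) -> bool:
    """Return True if a final (post-redirect) response carries N-Triples.

    Any parameters on the Content-Type header, such as a charset,
    are ignored when comparing media types.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status == 200 and media_type == N_TRIPLES


req = build_discovery_request("https://example.com/events/Baptism")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To send the request, pass it to urllib.request.urlopen(req), which
# follows redirects by default. A 404 is raised as urllib.error.HTTPError;
# an application should treat that only as "no description available",
# not as evidence that the term is invalid.
```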

Conformant servers may issue additional redirects during the Triples Discovery process, including 301 "Moved Permanently" redirects. Applications must not infer anything from the use of redirects: in particular, if one term name IRI permanently redirects to another term name IRI, applications must not assume the terms are synonymous.

Conformant servers may support any version of HTTP and any additional HTTP features.

Conditional HTTP requests per [RFC 7232] are an example of a feature that the operators of conformant servers may opt to support. Applications may choose to repeat discovery on certain terms after some time has elapsed, and include an If-Modified-Since header in the request. A server that has chosen to support conditional requests would respond with a 304 "Not Modified" if the results of discovery have not changed. If the results have changed, if the server cannot determine whether they have changed, or if the server does not support conditional requests of this form, it would produce a 200 "OK" response containing triples describing the term.
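
As an illustration only, the headers for such a repeated request might be built as below. The function name conditional_headers is hypothetical, and the sketch assumes the application has recorded the time of its previous successful discovery as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_fetched: datetime) -> dict:
    """Build request headers for repeating discovery conditionally.

    `last_fetched` must be timezone-aware; it is converted to UTC and
    formatted as an HTTP date ending in "GMT".
    """
    http_date = format_datetime(last_fetched.astimezone(timezone.utc), usegmt=True)
    return {
        "Accept": "application/n-triples",
        "If-Modified-Since": http_date,
    }


headers = conditional_headers(datetime(2017, 1, 1, tzinfo=timezone.utc))
print(headers["If-Modified-Since"])
```

An application receiving a 304 response can keep using its previously stored triples, while a 200 response replaces them.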

N-Triples syntax

N-Triples is a line-based format. Each non-empty line contains a triple, which is a sequence of three elements separated by whitespace, and ending with a "." (U+002E). In the simplest case, each of the three elements is a term written as an absolute IRI enclosed in "<" and ">" (U+003C and U+003E). These three elements are known as the subject, predicate and object of the triple, and are used in this discovery mechanism to record the properties of a term. The subject of the triple shall be the subject of the property: that is, the term being described. The predicate shall be the property name of the property, and the object shall be its property value.

The following is one triple in the N-Triples format:

<https://example.com/types/Date> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Datatype> .

To be valid N-Triples, the triple must be on a single line, and there should be exactly one space character (U+0020) separating each pair of IRIs. The "." (U+002E) at the end of the line is a required part of the N-Triples syntax to mark the end of a triple.

In this example, the subject is a hypothetical Date term, the predicate is rdf:type and the object is rdfs:Datatype. This triple is therefore describing the Date term and saying that the value of its rdf:type property is rdfs:Datatype: i.e. that this Date term is a datatype.
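
To show how little is needed to parse this simplest case, the following sketch recognises a single line whose three elements are all IRIs, separated by single spaces as in the example above. The name parse_iri_triple is hypothetical, and the pattern is a deliberate simplification: it does not validate the IRIs or decode escape sequences, and it does not handle literals or blank nodes.

```python
import re

# Three IRIs in angle brackets, each followed by one space, then the final ".".
IRI_TRIPLE = re.compile(r"<([^<>\s]*)> <([^<>\s]*)> <([^<>\s]*)> \.")


def parse_iri_triple(line: str):
    """Return (subject, predicate, object) for an IRI-only triple, else None."""
    match = IRI_TRIPLE.fullmatch(line.rstrip("\r\n"))
    return match.groups() if match else None


line = (
    "<https://example.com/types/Date> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2000/01/rdf-schema#Datatype> ."
)
subject, predicate, obj = parse_iri_triple(line)
print(subject, predicate, obj)
```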

The details of this syntax are defined in the [N-Triples] standard. A conformant application may choose only to support the canonical form of N-Triples defined in §4 of [N-Triples], but is recommended to support the full N-Triples syntax. Conformant servers must produce N-Triples in canonical form.

Canonical N-Triples is a form of N-Triples which does not allow arbitrary whitespace, comments or certain escape constructs. This results in a further simplification to the parsing of N-Triples by removing alternative ways of serialising the same triple.

This standard prefers Canonical N-Triples in part because the precise details of whitespace handling are underspecified in the full N-Triples syntax. This is the subject of erratum 24 in [RDF Errata] and is likely to be addressed in a future version of [N-Triples], most likely by allowing more liberal use of whitespace. To avoid potential incompatibilities when this is resolved, N-Triples producers must be conservative in the features they use, while consumers should be permissive in what they accept.

Literals

Instead of being a term, the object of a triple may alternatively be a literal, which has a string value instead of an IRI, and is serialised in double quotes (U+0022). If the predicate of the triple is a property term whose range is a datatype, then the object of the triple shall be a literal rather than a term; otherwise the object of the triple shall be a term.

If discovery is performed on a datatype name, the resulting N-Triples should include its type and pattern, as specified using the rdf:type and types:pattern property terms. For a hypothetical date type, this might be as follows:

<https://example.com/types/Date> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Datatype> .
<https://example.com/types/Date> <https://terms.fhiso.org/types/pattern> "[0-9]{4}-[0-9]{2}-[0-9]{2}" .

The range of the rdf:type property term is rdfs:Datatype which is a class, and therefore the property value is serialised in N-Triples as a term; however the range of the types:pattern property term is types:Pattern which is a datatype, and therefore its property value is serialised as a literal.

In N-Triples, a literal may optionally be followed by either a language tag or a datatype name, but not both. If either of these is present, it is placed after the quoted string in the serialisation: the language tag if present is preceded by an "@" (U+0040), and a datatype name is enclosed in "<" and ">" (U+003C and U+003E) and preceded by "^^" (U+005E twice).

<https://example.com/types/YearMonth> <https://terms.fhiso.org/types/pattern> "[0-9]{4}-[0-9]{2}"^^<https://terms.fhiso.org/types/Pattern> .
<https://example.com/types/YearMonth> <http://www.w3.org/2000/01/rdf-schema#label> "Jahr und Monat"@de .

The object of the first of these triples is a literal tagged with a datatype name, types:Pattern. The object of the second triple is a literal with a language tag, de representing German.
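
The three forms a literal can take might be recognised with a sketch such as the following. The name parse_literal is hypothetical. The pattern is a simplification of the grammar in [N-Triples]: it accepts backslash escapes inside the quoted string but does not decode them, and it uses a loose pattern for language tags.

```python
import re

LITERAL = re.compile(
    r'"(?P<value>(?:[^"\\]|\\.)*)"'                 # the quoted string
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)'    # optional language tag
    r'|\^\^<(?P<datatype>[^<>\s]*)>)?'              # or optional datatype name
)


def parse_literal(text: str):
    """Return (value, language_tag, datatype_name), or None if not a literal.

    At most one of language_tag and datatype_name is set, because the
    syntax does not allow both on the same literal.
    """
    match = LITERAL.fullmatch(text)
    if match is None:
        return None
    return match.group("value"), match.group("lang"), match.group("datatype")


print(parse_literal('"Jahr und Monat"@de'))
print(parse_literal('"[0-9]{4}-[0-9]{2}"^^<https://terms.fhiso.org/types/Pattern>'))
```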

A literal followed by a language tag is used to serialise a property value which is a language-tagged string. If the predicate of the triple is a property term whose range is a language-tagged datatype, then the object of the triple shall be a literal with a language tag.

N-Triples has no mechanism for stating a default language tag, so the language tag must not be omitted when serialising a language-tagged string.

The use of literals with datatype names is not recommended when N-Triples is used in Triples Discovery. Conformant applications must be able to parse literals containing them, but may ignore any datatype names encountered and parse the literal as if they were absent.

Applications are required to be able to parse literals containing datatype names so that they can parse N-Triples data that was not generated specifically for the purpose of FHISO's Triples Discovery mechanism.

Because Triples Discovery is only intended as a discovery mechanism for obtaining the definition of a term, it need only accommodate properties that might reasonably be defined on terms. At present this does not include properties with polymorphic datatypes, which is the situation in which a datatype name might be needed. For this reason their use is not recommended. If this mechanism is generalised in the future and used for genealogical data, rather than just metadata on terms, it will be necessary to support language tags and datatype names properly.

Conformant servers must not produce triples whose object is a literal with a datatype name unless the datatype is either the range of the predicate of the triple or is a subtype of the range of the predicate. Applications may discard any triple not conforming to this requirement.

The previous example included the following triple:

<https://example.com/types/YearMonth> <https://terms.fhiso.org/types/pattern> "[0-9]{4}-[0-9]{2}"^^<https://terms.fhiso.org/types/Pattern> .

A conformant server may generate this, even though the use of the datatype name is not recommended. This is because the range of the types:pattern property term is defined to be types:Pattern, which is the datatype name used.

Blank nodes

N-Triples also allows the subject or object of a triple to be a blank node, which has a serialisation in N-Triples beginning with "_:" (U+005F, U+003A). This Triples Discovery mechanism makes no use of blank nodes and conformant applications should ignore any triples containing them.

The following triple has a blank node as its object and should be ignored.

<https://example.com/types/YearMonth> <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> _:1 .

A future FHISO standard might use blank nodes to represent more complicated property values that cannot conveniently be represented by a term or a literal. For this reason, this standard does not prohibit conformant servers from generating triples using blank nodes.
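
One way an application might skip such triples is sketched below. The function names has_blank_node and drop_blank_node_triples are hypothetical, and the sketch assumes canonical N-Triples input, in which the subject and predicate are each followed by a single space, so that a "_:" sequence inside a literal is not mistaken for a blank node.

```python
def has_blank_node(line: str) -> bool:
    """Return True if the subject or object of a canonical triple is a blank node."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        return False  # not a complete triple; left for the real parser to reject
    subject, _predicate, rest = parts
    # `rest` is the object followed by the terminating " ."
    return subject.startswith("_:") or rest.startswith("_:")


def drop_blank_node_triples(lines):
    """Keep only those lines whose triples contain no blank nodes."""
    return [line for line in lines if not has_blank_node(line)]


lines = [
    "<https://example.com/types/YearMonth> "
    "<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> _:1 .",
    "<https://example.com/types/YearMonth> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "Jahr und Monat"@de .',
]
kept = drop_blank_node_triples(lines)
print(len(kept))
```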

Other formats

Discovery IRIs

References

Normative references

[Basic Concepts]
FHISO (Family History Information Standards Organisation). Basic Concepts for Genealogical Standards. First public draft. (See https://fhiso.org/TR/basic-concepts.)
[N-Triples]
W3C (World Wide Web Consortium). RDF 1.1 N-Triples. David Beckett, 2014. W3C Recommendation. (See https://www.w3.org/TR/n-triples/.)
[RFC 3987]
IETF (Internet Engineering Task Force). RFC 3987: Internationalized Resource Identifiers (IRIs). Martin Duerst and Michel Suignard, eds., 2005. (See https://tools.ietf.org/html/rfc3987.)
[RFC 7230]
IETF (Internet Engineering Task Force). RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. Roy Fielding and Julian Reschke, eds., 2014. (See https://tools.ietf.org/html/rfc7230.)
[RFC 7232]
IETF (Internet Engineering Task Force). RFC 7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests. Roy Fielding and Julian Reschke, eds., 2014. (See https://tools.ietf.org/html/rfc7232.)

Other references

[Dublin Core]
Dublin Core Metadata Initiative. Dublin Core metadata element set. Dublin Core recommendation, version 1.1, 1999. (See http://dublincore.org/documents/dcmi-terms/.)
[RDF Errata]
W3C (World Wide Web Consortium). RDF1.1 Errata. (See https://www.w3.org/2001/sw/wiki/RDF1.1_Errata.)
[RFC 7231]
IETF (Internet Engineering Task Force). RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. Roy Fielding and Julian Reschke, eds., 2014. (See https://tools.ietf.org/html/rfc7231.)
[SWBP Vocab Pub]
W3C (World Wide Web Consortium). Best Practice Recipes for Publishing RDF Vocabularies. Diego Berrueta and Jon Phipps, eds., 2008. W3C Working Group Note. (See https://www.w3.org/TR/swbp-vocab-pub/.)

Copyright © 2017, Family History Information Standards Organisation, Inc. The text of this standard is available under the Creative Commons Attribution 4.0 International License.