[TSC-public] Fwd: Re: [FHISO-TSC] {Spam?} Vocabularies Document

Richard Smith rsmith at fhiso.org
Tue Feb 16 20:57:16 CST 2016

[Sorry, dropped you from the To: list, Tony.]

-------- Forwarded Message --------
Subject: Re: [FHISO-TSC] {Spam?} Vocabularies Document
Date: Wed, 17 Feb 2016 02:52:48 +0000
From: Richard Smith <rsmith at fhiso.org>
To: tsc at fhiso.org

On 16/02/16 18:00, Luther Tychonievich wrote:
> I have no real opinion on the enumeration topic, either vocabulary or
> even its presence.  I'm content with whatever you agree upon.
> Can we remove the first cluster of comments by adding
>      The word *vocabulary* can refer to either a "partially-controlled
> vocabulary" (which may be extended to contain additional *term*s) or a
> "controlled vocabulary" (which may not be extended).
> to the end of the paragraph defining "vocabulary" and "term"?

My problem with this is that 'vocabulary' has a well-established meaning
in the linked data and semantic web world, which is subtly different to
what Tony is using it to mean.  It is not usually formally defined and
rarely used normatively in standards; nonetheless it is widely understood
to mean a loosely connected collection of terms (URIs), plus their
definitions.  Luther's current definition ("a set of *term*s paired with
their well-defined meanings") is entirely compatible with this and is
just a formal statement of the normal de facto definition.

Typically the word 'vocabulary' is used to mean all the terms in a
particular standard, or group of standards, so Dublin Core, FOAF or the
W3C's PROV are examples of vocabularies, and you'll find them referred to
as such throughout the literature.  In this sense 'vocabulary' is often
synonymous with 'namespace', though it needn't be: some vocabularies use
multiple namespaces (Dublin Core, for example).  Regardless of whether
or not we think it's politically expedient to play up the linked data
and semantic web side of things, and personally I think it might be best
down-played, there's no escaping that if we're using HTTP URIs as terms,
what we're doing (and what GEDCOM X has already done) is defining new
linked data vocabularies.
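
As a rough illustration of the vocabulary/namespace distinction above,
here is a minimal Python sketch.  It is not taken from any FHISO draft:
the dictionary layout and the helper names (namespace_of, namespaces) are
assumptions made up for this example, and the definitions are paraphrased
placeholders rather than official DCMI wording.  Only the two Dublin Core
namespace URIs are real.

```python
# Two real Dublin Core namespaces.
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"
DC_TERMS = "http://purl.org/dc/terms/"

# A vocabulary modelled as terms (URIs) paired with their meanings.
# The meanings here are paraphrased placeholders, not official text.
dublin_core = {
    DC_ELEMENTS + "title": "A name given to the resource.",
    DC_ELEMENTS + "creator": "An entity responsible for making the resource.",
    DC_TERMS + "created": "Date of creation of the resource.",
}


def namespace_of(term):
    """Hypothetical helper: the part of the URI up to and including
    the last '/' or '#', whichever comes later."""
    cut = max(term.rfind("/"), term.rfind("#"))
    return term[: cut + 1]


def namespaces(vocabulary):
    """Return the set of namespaces used by the terms of a vocabulary."""
    return {namespace_of(term) for term in vocabulary}


# One vocabulary, two namespaces.
print(sorted(namespaces(dublin_core)))
```

The point of the sketch is only that a single vocabulary can span more
than one namespace, so the two words are not interchangeable.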

Most standards define several types of term.  Even if the primary
objective is to define values for a single property or datum (such as
event types), it's common to find a handful of other terms being
defined: perhaps to classify the other terms, perhaps to label the class
of terms, or perhaps as some form of associated value.  And most
standards end up defining rather more than a single property's values.
It's perhaps not strictly wrong to refer to a subset of a vocabulary as
a vocabulary, so it's not necessarily wrong to refer to the set of terms
that can be used in a particular context as a vocabulary, but it's not
normal usage.  But in normal usage, it's really not meaningful to start
talking about vocabularies being extensible, except in the sense that a
future version of the defining standard might augment it; third parties
can never extend vocabularies, though they can sometimes build their own
vocabularies that are usable with them.

In any case, I hope the last paragraph is now sufficiently clear that
some areas may be extensible and others may not be, and that we will
document which applies to each case.

If there are still concerns about the paragraph defining 'vocabulary',
I'd rather fix it by removing entirely the sentence "These words are
intended to evoke the idea of controlled vocabulary without specifying a
closed or controlled character."

Richard Smith                    rsmith at fhiso.org
FHISO Technical Standing Committee Co-Coordinator
