[TSC-public] Progress on ELF

Richard Smith rsmith at fhiso.org
Tue Oct 24 16:13:25 CDT 2017


Back in the summer we announced that FHISO were going to develop
Extended Legacy Format (ELF), which will be a document model and file 
format that is fully compatible with current uses of GEDCOM 5.5.1, but 
with the addition of a structured extensibility mechanism.  We've been 
making good progress on this, and hope to have some first drafts 
available for wider review before too long.

We have decided to split ELF into two standards, with "ELF 
Serialisation" defining the low-level serialisation format and "ELF Data 
Model" defining a lineage-linked genealogical data model on top of the 
low-level serialisation model.  These correspond roughly to chapters 1 
and 2 of the GEDCOM spec.  In principle ELF Serialisation can be used 
with other data models, though we have no immediate plans to define any 
other ELF-based data models.  We're calling these standards together ELF 
1.0.

After a few recent discussions on whether some particular feature should 
be included or deferred to a later point, we decided we needed to 
clarify exactly what the scope for ELF was and how we see development 
progressing after the initial release.   We've now done that, and I've 
described our current thinking below.  We'd welcome any comments or 
questions on it.

ELF 1.0's two main goals are compatibility and extensibility.  It must 
be backwards compatible with GEDCOM 5.5 and 5.5.1, while providing a 
much more solid framework for future extensions, whether by FHISO or by 
third parties.

A high degree of compatibility with GEDCOM is essential if ELF is to be 
successful.  It is vital that ELF files produced by applications 
following the recommendations in ELF 1.0 will be readable by 
applications expecting GEDCOM, and that ELF applications can read 
current GEDCOM files, including files that deviate from the GEDCOM 
standard in various well-known ways.  However, we're not requiring 
backwards compatibility with every ancient version of GEDCOM: just 5.5 
and 5.5.1 (though in practice we anticipate being compatible with 5.4 
too).

Let me give three simple examples of decisions made as a result of these 
requirements for compatibility.

   * We considered allowing compliant applications not to support GEDCOM 
files in ANSEL, as it can be an awkward character set to support; 
however, we concluded we must require ANSEL support as there are still a 
significant number of GEDCOM files written in ANSEL, though we plan to 
deprecate it with a view to perhaps removing support in the future.

   * We also considered recommending that applications did not use CONC 
tags to split long lines when exporting ELF.  We decided against this, 
and will instead recommend that applications use CONC tags to avoid 
writing long lines, as we believe there are still applications that rely 
on the current 255-character maximum line length and would be unable to 
read files containing longer lines.  We will, however, require ELF 
applications to read lines of arbitrary length.

   * We reviewed how applications which support same-sex unions 
represented them in GEDCOM.  It turns out most use FAM records with one 
HUSB and one WIFE, but have both the HUSB and WIFE line pointing to 
individuals of the same sex.  As a result, ELF will prohibit 
applications from requiring that HUSB point to a man and WIFE point to a 
woman.  This is an instance where the requirement for compatibility has 
resulted in a change from GEDCOM, which is at best ambiguous on whether 
this is legal.
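
To make the CONC recommendation above concrete, here is a minimal Python 
sketch of how an exporter might split a long value.  It is an 
illustration only, not text from any ELF draft: the function names, the 
choice to count the limit in characters excluding the line terminator, 
and the assumption that the value contains no line breaks (which would 
need CONT) are all mine.  It avoids breaking next to whitespace, because 
some readers strip leading or trailing spaces from a line.

```python
from typing import List

MAX_LINE = 255  # assumption: limit counted in characters, terminator excluded


def split_with_conc(level: int, tag: str, value: str,
                    max_len: int = MAX_LINE) -> List[str]:
    """Write `value` as one line plus as many CONC lines as are needed."""
    if not value:
        return [f"{level} {tag}"]
    lines = []
    prefix = f"{level} {tag} "
    rest = value
    while True:
        room = max_len - len(prefix)
        if room < 1:
            raise ValueError("max_len is too small for the line prefix")
        if len(rest) <= room:
            lines.append(prefix + rest)
            return lines
        # Prefer the longest cut with no whitespace on either side of the
        # break; if there is none, fall back to cutting at the limit.
        cut = next((c for c in range(room, 0, -1)
                    if not rest[c - 1].isspace() and not rest[c].isspace()),
                   room)
        lines.append(prefix + rest[:cut])
        rest = rest[cut:]
        prefix = f"{level + 1} CONC "


def join_conc(lines: List[str]) -> str:
    """Reassemble a value; CONC payloads are concatenated with no separator."""
    return "".join(line.split(" ", 2)[2] if line.count(" ") >= 2 else ""
                   for line in lines)
```

A reader that accepts arbitrary-length lines needs no special handling 
beyond join_conc; the splitting only matters on export.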

The requirement for compatibility does not mean ELF will have no new 
functionality.  We believe that current applications are very tolerant 
of unknown tags, and to some extent of known tags appearing in unknown 
contexts, and new features will take advantage of this.  That said, we 
have decided to be very conservative on what we add in ELF 1.0, and only 
include extensions that fulfil a technical requirement rather than a 
genealogical requirement.  Some examples of extensions that are needed 
for technical reasons are as follows:

   * an extensibility mechanism which allows multiple future extensions, 
whether by third parties or by FHISO, to safely coexist;

   * an escape mechanism that ensures that any string can be encoded 
on a line (or series of lines) in ELF; and

   * a way of tagging lines that have natural language payloads with the 
language they are written in.
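
The tolerance of unknown tags mentioned above can be illustrated with a 
short Python sketch of a reader that keeps lines with unrecognised tags 
rather than rejecting the file.  This is not the ELF extensibility 
mechanism itself, which is not described here.  The regular expression 
is a simplified reading of the GEDCOM line form (level, optional 
cross-reference identifier, tag, optional payload), KNOWN_TAGS is a 
small illustrative subset, and all the names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple

# Simplified line form: level, optional @xref@, tag, optional payload.
LINE_RE = re.compile(r"^\s*(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")

# Illustrative subset only; a real reader would know many more tags.
KNOWN_TAGS = {"HEAD", "INDI", "NAME", "SEX", "FAM", "HUSB", "WIFE",
              "CHIL", "CONC", "CONT", "TRLR"}


class Line(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    payload: Optional[str]


def parse_line(text: str) -> Line:
    """Parse one line, raising ValueError if it does not fit the line form."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a valid line: {text!r}")
    level, xref, tag, payload = match.groups()
    return Line(int(level), xref, tag, payload)


def read_tolerantly(lines: Iterable[str]) -> Tuple[List[Line], Set[str]]:
    """Parse every line, keeping unknown tags and reporting which were seen."""
    parsed = [parse_line(text) for text in lines]
    unknown = {line.tag for line in parsed if line.tag not in KNOWN_TAGS}
    return parsed, unknown
```

Note that nothing here inspects the sex of the individuals that HUSB and 
WIFE point to, so a FAM record whose HUSB and WIFE lines both point to 
individuals of the same sex is read like any other.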

The result is that the ELF Serialisation standard will include quite a 
lot of new functionality, while the ELF Data Model standard will be 
little more than a reformulation of GEDCOM 5.5.1 in ELF terms.  The main 
reason for this is to keep the task of producing ELF 1.0 manageable with 
the very limited resources we have available.  We're currently minded to 
take a firm line on not including extensions on genealogical grounds to 
avoid a slow feature creep.  For example, we recently discussed whether 
to add "1 SEX X" to represent individuals who can't readily be described 
by the M and F values.  Although this would be a trivial change, and we 
have a way of stopping it from causing any compatibility problems, we 
are likely not to include it in ELF 1.0.

This does not mean there won't be new genealogical features in ELF, just 
that they won't be included in the two standards that will comprise ELF 
1.0.  Using the new extensibility features in ELF 1.0, we plan to 
produce a series of further standards extending and updating the GEDCOM 
5.5.1 data model that ELF 1.0 will have inherited.  The first such 
standard is expected to be on Citation Elements which will tie into our 
current work on the subject.  We currently anticipate it being released 
at the same time as the ELF 1.0 standards (though a first public draft 
of it is still some way away and it will not be in the next batch of 
public drafts).  Nevertheless, the ELF Citation Elements standard will 
not be part of ELF 1.0, and applications supporting ELF 1.0 will not be 
required to support it.

Further into the future we hope there will be further extensions covering 
a wider range of facilities.  We already have some ideas in mind, and 
I'm sure others will have suggestions too.  But for the time being we 
want to focus on this relatively limited scope.

As we haven't said anything about ELF since June and our comments then 
were very brief, I thought it was about time I posted an update and gave 
people a chance to say whether they're happy with the direction we've 
chosen and express any concerns they may have.

-- 
Richard Smith,                       FHISO   <http://fhiso.org/>
FHISO Technical Co-Coordinator       One Community, One Standard


