[TSC-public] Elements of Genealogical Analysis

Enno Borgsteede ennoborg at gmail.com
Fri May 15 08:08:08 CDT 2015

>> Re your other points, I admire DeadEnds and agree that "1 INDI @XXX@" is a good step, but do not find either that step or the full DeadEnds model to meet the full range of activities researchers ask me to support.  Clearly there are important tangential issues (such as those being discussed at sources-citations at fhiso.org), but there are also missing elements of reasoning.  One example from EGA is Anderson's discussion of the structure of inferences as internal knowledge plus external information.  And yes, I know that you can put that in anthropocentric textual notes (EGA itself appears to advocate such an approach insofar as it does not include a model of reasoning in its presentation of a data model), but notes are language-dependent and do not yield well to automated query.  I hesitate to put any common, structured, and important part of research in a data model as human language text.
> I understand your beliefs on this. You think that we can (and should) codify our reasons for making decisions into some formal, formulaic form. As I’m sure you have figured out, I believe that idea is ivory tower thinking, essentially impossible, and a waste of time.
> In this regard EGA suggests that every decision require a confidence rating, and EGA uses “almost certain”, “highly probable”, “probable”, and “possible” as the ways to rate the confidence in decisions. RCA goes on to say, “By the nature of genealogical reasoning, it is impossible to apply mathematical formulas by which you compute these probabilities. Rather, assign these confidence levels based on your developing experience as a genealogical analyst.”
> I believe that you believe EGA is wrong on this and that something formal is possible, or maybe even necessary. Can you confirm that?
I can't speak for Luther, but I do agree with him. Reasoning in text
form keeps us trapped in language issues, and it works against the
neutrality of an international standard. It's like source details
without persona: counterproductive.

Your quotes from EGA/RCA also show a clear contradiction: as soon as one
speaks of a confidence rating, or a scale, one can apply fuzzy logic to
it, so yes, I see room for formulas. They're fuzzy, but still formulas.
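As an illustration of what fuzzy formulas over EGA's verbal scale could
look like, here is a minimal Python sketch. The numeric values attached
to the four levels are assumptions of this sketch (EGA assigns none),
and min/max are the classic Zadeh fuzzy AND/OR operators, chosen for
simplicity rather than taken from any genealogical standard.

```python
from enum import Enum

# Hypothetical numeric anchors for EGA's four verbal confidence levels.
# EGA itself assigns no numbers; these values are illustrative assumptions.
class Confidence(Enum):
    POSSIBLE = 0.4
    PROBABLE = 0.6
    HIGHLY_PROBABLE = 0.8
    ALMOST_CERTAIN = 0.95

def fuzzy_and(*levels: Confidence) -> float:
    """Zadeh fuzzy AND: a conclusion that needs all of these inferences
    gets the minimum of their degrees (its weakest link)."""
    return min(level.value for level in levels)

def fuzzy_or(*levels: Confidence) -> float:
    """Zadeh fuzzy OR: alternative lines of evidence for the same
    conclusion give it the maximum of their degrees."""
    return max(level.value for level in levels)

def to_label(degree: float) -> Confidence:
    """Map a computed degree back to the nearest verbal level."""
    return min(Confidence, key=lambda c: abs(c.value - degree))

# A chain of two inferences: one almost certain, one merely probable.
chain = fuzzy_and(Confidence.ALMOST_CERTAIN, Confidence.PROBABLE)
print(chain, to_label(chain).name)  # 0.6 PROBABLE
```

The result is still reported in EGA's own words; the formula only makes
the step from several ratings to one explicit and repeatable.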

So, yes, I strongly believe that RCA is wrong. His reasoning is absurd.
>> EGA, DeadEnds, the GENTECH date model, the Genealogical Proof Standard, and most other approaches I have seen fail to even mention, let alone address, collaboration.  Collaboration is so prevalent as *the* complaint people have with existing data models that I almost never hear any other.
> This makes it clear that you believe 1) collaboration is critical to genealogical research, and 2) collaboration requires significant support by our genealogical data model. In contrast to you, I 1) don’t believe collaboration is such a primary concern, but even if it were, 2) I believe essentially all support for collaboration should be provided through software and that it has next to no impact on the data model. All you might have to do is record who is making each decision. What I think is critical is providing a genealogical data model that can support research as defined by processes such as those described in EGA.
I'm with Luther here. Collaboration is critical in many senses, one
being that I don't believe many genealogists will work with personae
just for their own fun. They will probably work in the classic way,
with a conclusion tree and old-school citations, not only because that's
what their (desktop) software supports, but also because it works for
them. Source mining as described by Tony Proctor in


looks like a typical example of collaboration between people, even if
it's between people of different kinds, like indexers and miners. I'm a
miner myself, primarily on the FamilySearch Family Tree, and since that
tree is used online and in a dozen desktop programs, I think it is an
important collaborative effort. It does risk becoming somewhat closed,
because its standard is not fully open, at least not in the area of
record contents or citation elements.

This EGA looks quite conservative to me, not least because it's sold as
a $25 book. What's the value in that if it only describes a process? I
can understand it for EE, but for a process? And relying on a paid book
would also defeat the purpose of FHISO, which is supposed to be open and
neutral.

>> The difficulty of collaboration was also the motivating concern that gave birth to BetterGEDCOM, the organisation that gave birth to FHISO.
> Not really. The problem that started BetterGEDCOM was the difficulty in sharing GEDCOM files by moving them from one person’s database to another’s. Solving that problem might be a prerequisite to full-scale collaboration, but systems in which multiple persons, on multiple machines, share a single database that they work on simultaneously (which is how I would define collaboration), which database instantly gets mirrored on their own machines, was way beyond what BetterGEDCOM considered. And for me still, this is all in software, not in the model.
H'm, right. BetterGEDCOM started about 5 years ago, and that file
format still isn't there, but people do collaborate in various forms:
through APIs, online on FamilySearch and Geni, etc. See also Louis
Kessler's blog on


I'm still pessimistic about the state of the APIs, because they don't
exchange the data that I'd like to see, but the vision itself is a very
realistic one, I think.

>> To my mind two of the main reasons to model the research process carefully are that doing so can (a) reduce the number of places where areas of disagreement and areas of agreement are both stored in a single data element, and (b) permit the simultaneous existence of several conflicting bits of reasoning without invalidating or duplicating the entire "tree".  I was disappointed, though not surprised, to discover that EGA did not address these issues at all.
> Don’t you think that the only two things needed to support ideas about recording conflicting reasoning and allowing collaboration are:
> 1. The ability to let the Records be simultaneously in different Linkage Bundles and Linkage Bundles be simultaneously in different Dossiers? [Using EGA terminology]. This allows you to record conflicting reasoning within the same trees. Note that whether you do or don’t do this has nothing to do with the model. DeadEnds, for example, is agnostic as to whether software using it as a model would either allow or disallow such simultaneous and conflicting linking.
> 2. The ability to identify which persons are responsible for the decisions that create each Linkage Bundle and Dossier? Isn’t this what collaboration is, allowing different persons to make decisions within the same overall structure that don’t unwind other persons’ decisions? Can’t recording who is making the decisions just be a sub-property of the reason property that every Linkage Bundle and Dossier should have?
Sure, but what is new about that? It's all in the GEDCOM X conceptual
model; look at subjects, attribution, etc. Why introduce new
terminology, again? It worked for persona, and it should work for
locations too, so why use a new French word like 'dossier'? What's it
for?
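To see how closely points 1 and 2 above map onto GEDCOM X-style
concepts, here is a minimal Python sketch. The classes are hand-written
simplifications that borrow GEDCOM X concept names (an `extracted`
persona, `evidence` references, an attribution with a contributor and a
change message); they are not the GEDCOM X library or its serialization,
resources are plain ids rather than URIs, and the people, ids and reasons
are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Attribution:
    """Who made a decision and why (cf. GEDCOM X contributor/changeMessage)."""
    contributor: str
    change_message: str

@dataclass(frozen=True)
class EvidenceReference:
    """A link from a conclusion to the subject it rests on, attributed separately."""
    resource: str  # simplified: a plain id instead of a URI
    attribution: Attribution

@dataclass
class Person:
    """In this sketch, extracted=True plays the role of a persona (an EGA
    Record); extracted=False is a conclusion, roughly a Linkage Bundle or
    Dossier."""
    id: str
    extracted: bool = False
    evidence: List[EvidenceReference] = field(default_factory=list)

def conclusions_using(persona_id: str, people: List[Person]) -> List[str]:
    """Return every conclusion that cites the given persona as evidence."""
    return [p.id for p in people
            if not p.extracted and any(e.resource == persona_id for e in p.evidence)]

def contributors(person: Person) -> List[str]:
    """Everyone responsible for at least one link in this conclusion."""
    return sorted({e.attribution.contributor for e in person.evidence})

# Hypothetical example: two researchers attach the same baptism persona
# to different conclusions, and both hypotheses coexist in one dataset.
baptism = Person("persona-baptism", extracted=True)
census = Person("persona-census", extracted=True)
alice_view = Person("I1", evidence=[
    EvidenceReference(baptism.id, Attribution("alice", "same parish and age")),
    EvidenceReference(census.id, Attribution("alice", "household matches")),
])
bob_view = Person("I2", evidence=[
    EvidenceReference(baptism.id, Attribution("bob", "census entry is a cousin")),
])
people = [baptism, census, alice_view, bob_view]

print(conclusions_using(baptism.id, people))  # ['I1', 'I2']
print(contributors(alice_view))               # ['alice']
```

Nothing here needs a new term: the same persona sits under two
competing conclusions, and each link carries its own "who and why".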


