[TSC-public] Format for Raw Source Content

Jan Murphy packrat74 at gmail.com
Thu Jan 8 21:50:23 EST 2015


On Thu, Jan 8, 2015 at 4:56 PM, Thomas Wetmore <ttw4 at verizon.net> wrote:

>
> > On Jan 8, 2015, at 7:14 PM, Louis Kessler <lkessler at LKESSLER.COM> wrote:
> >
> > Jan Murphy said:
> >> Are you defining census households by using the heads of households as
> >> delineators?  That is treacherous because I've seen many instances where
> >> people are not written neatly within their own family group, e.g.
> >> p1 = uncle p2 = head and so on.
> >>
> >> This is one of many reasons I would rather see the breakdown of raw
> >> data done by source, and instead of a p for person have an
> >> e = entry number for the line on the document.
> >
> > As you describe, Jan, is exactly the way I would do it. I have stated
> > that the source should contain "just the facts" and no interpretation.
> > Assuming the personas is interpretation.
>
> Does this mean that you and Jan both think that the concept of the
> household is too “interpreted” to use as a basis in extracting raw data
> from a census? If you check censuses that organize by household you will
> find that almost all of them give each household a specific index number,
> and keep them grouped by those numbers. Isn’t that data indexing part of
> the raw source data?
>
> You could treat that family index as another field/column for each person,
> but that’s just the same effect that I’m after with a slightly different
> organization. Are you saying the household is not a useful concept? If you
> think it is a useful concept, how would you handle it?
>
> When you are doing research and extracting data from a census, do you
> extract the data for every person on the page with your family of interest,
> or do you just extract the data for the family of interest? Or do you do
> something between the two, maybe extract nearby families that you think
> might prove of interest eventually?
>
> Just think about what it would mean to extract “just the facts” from a
> census as a source. Wow.
>
> If you have an event with a date, and the age of a person at that date,
> you can estimate the person’s birth to plus or minus a year or two. Would
> you call that interpreted data? Or would you call it just another form of
> the provided information? Would you include that estimated birth year
> anywhere? Or would you expect software to infer it when appropriate? If you
> were to include that estimated birth year anywhere, where would that be?
>
> > This small difference in thinking is the thing I don’t like about Tom's
> > ideas of personas, because I think of personas as mini-conclusions not
> > belonging with the source data.
>
> Given that you don’t like it at the persona level, do you like it anywhere
> else instead?
>

Bearing in mind that we are talking about RAW data here....

If I say

Archive reference     RG11
Piece number          2180
Folio                 67
Page                  3

and I say I am looking at line 9, then any genealogist in the UK knows
exactly which person is meant, without any ambiguity.
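To make that concrete, here is a minimal sketch of a source-keyed reference. The field names follow the labels above; the class name `EntryRef` and the `citation()` layout are my own assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryRef:
    """Identifies one line of a census page by where it sits in the source.

    Hypothetical structure: only the field labels come from the census
    reference itself; everything else is an illustrative assumption.
    """
    archive_reference: str
    piece: int
    folio: int
    page: int
    line: int

    def citation(self) -> str:
        # Assumed display layout, not an official citation style.
        return (f"{self.archive_reference}/{self.piece} "
                f"f.{self.folio} p.{self.page} line {self.line}")


# The reference from this message: RG11, piece 2180, folio 67, page 3, line 9.
ref = EntryRef("RG11", 2180, 67, 3, 9)

# Frozen dataclasses are hashable, so the reference can key the raw data
# directly. No person or persona is asserted anywhere in the key.
raw_entries = {ref: {"name": "(as written on the page)"}}
```

The point of the design is that the key says nothing about who the person is; it only says where the entry is.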

Tom said:

> If you check censuses that organize by household you will find that almost
> all of them give each household a specific index number, and keep them
> grouped by those numbers. Isn’t that data indexing part of the raw source
> data?


Yes indeed, but where the census does not include dwelling numbers and
family numbers, the only cross-check we have if a boundary goes awry is the
number of the entry on the page. If there is no printed line number on the
census form, then there should be some way of preserving the position of
the entry on the page.
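As a rough sketch of what preserving position could look like, each raw entry below carries its position on the page, and the family number is an optional extra column that stays empty when the form has none. All of the names here (`RawEntry`, `in_page_order`, `neighbours`) and the placeholder rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RawEntry:
    position: int                    # 1-based order of the entry on the page
    name_as_written: str
    relation_as_written: str
    family_no: Optional[int] = None  # only filled in when the form prints one


def in_page_order(entries):
    """Return the entries in the order they appear on the page."""
    return sorted(entries, key=lambda e: e.position)


def neighbours(entries, position, radius=1):
    """Entries within `radius` lines of the given position, in page order."""
    return [e for e in in_page_order(entries)
            if e.position != position and abs(e.position - position) <= radius]


# Placeholder rows, deliberately stored out of order, with the uncle written
# on the line before the head of household.
page = [
    RawEntry(2, "Entry B", "head"),
    RawEntry(1, "Entry A", "uncle"),
    RawEntry(3, "Entry C", "wife"),
]
```

Because the position is stored with the entry, any display (alphabetical or otherwise) can be undone, and who was written next to whom stays recoverable even when no household boundary is marked.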

I have seen cases on Ancestry where the choice to view "all people on this
page" results in a display which puts all the names in alphabetical order.

There have also been cases where published petitions disregarded the order
of the names on the sheets and published the names every which way
(Elizabeth, was that one of your presentations?), thus making the printed
form useless for figuring out which people were actually 'nearby'.

I hate to keep arguing this point over and over again, but we are looking
at documents and other source material.  We are not looking at people.  We
are looking at sources, most of which (but not all) contain names.

A lot of beginning researchers, including many of the people in the
Genealogy Do-Over group, struggle to learn how to cite their sources, and
why? Because if you work in a people-centric system the sources are always
an afterthought.

Jan Murphy
packrat74 at gmail.com