[TSC-public] Format for Raw Source Content

Thomas Wetmore ttw4 at verizon.net
Thu Jan 8 18:56:41 CST 2015


> On Jan 8, 2015, at 7:14 PM, Louis Kessler <lkessler at LKESSLER.COM> wrote:
> 
> Jan Murphy said:
>> Are you defining census households by using the heads of households as delineators?  That is treacherous
>> because I've seen many instances where people are not written neatly within their own family group, e.g. 
>> p1 = uncle p2 = head and so on.
>> 
>> This is one of many reasons I would rather see the breakdown of raw data done by source, and instead of 
>> a p for person have an e = entry number for the line on the document.
> 
> What you describe, Jan, is exactly the way I would do it. I have stated that the source should contain "just the facts" and no interpretation. Assuming the personas is itself interpretation.

Does this mean that you and Jan both think that the concept of the household is too “interpreted” to use as a basis in extracting raw data from a census? If you check censuses that organize by household you will find that almost all of them give each household a specific index number, and keep them grouped by those numbers. Isn’t that data indexing part of the raw source data?

You could treat that household index as another field/column for each person, but that achieves the same effect I’m after with a slightly different organization. Are you saying the household is not a useful concept? If you think it is a useful concept, how would you handle it?
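
To make the two organizations concrete, here is a minimal sketch of the "household number as a column" approach, with the grouping derived afterward. The field names (line, household, relation) and the sample rows are invented for illustration; they are not taken from any real census or from any proposed format. Note that the uncle appears before the head in the second household, as in Jan's example, and the grouping still works because it relies on the recorded household number rather than on who is listed first.

```python
from collections import defaultdict

# Hypothetical line-level records: one dict per line on the census page.
# Field names and sample values are invented for illustration only.
entries = [
    {"line": 1, "household": 12, "name": "John Smith", "relation": "head", "age": 40},
    {"line": 2, "household": 12, "name": "Mary Smith", "relation": "wife", "age": 38},
    {"line": 3, "household": 13, "name": "Amos Brown", "relation": "uncle", "age": 61},
    {"line": 4, "household": 13, "name": "Eli Brown", "relation": "head", "age": 35},
]


def group_by_household(rows):
    """Map each recorded household number to the line numbers that carry it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["household"]].append(row["line"])
    return dict(groups)


households = group_by_household(entries)
print(households)
```

Either way the household number is stored exactly as written on the page; the only difference is whether the grouping is part of the record layout or computed from it.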

When you are doing research and extracting data from a census, do you extract the data for every person on the page with your family of interest, or do you just extract the data for the family of interest? Or do you do something between the two, maybe extracting nearby families that you think might eventually prove of interest?

Just think about what it would mean to extract “just the facts” from a census as a source. Wow.

If you have an event with a date, and the age of a person at that date, you can estimate the person’s birth to plus or minus a year or two. Would you call that interpreted data? Or would you call it just another form of the provided information? Would you include that estimated birth year anywhere? Or would you expect software to infer it when appropriate? If you were to include that estimated birth year anywhere, where would that be?
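
As a rough sketch of what that inference could look like if software did it: the function name and the slack parameter below are my own assumptions, not part of any existing program. If the age is given in completed years, the person was born either in (event year - age) or in the year before; slack widens that range to allow for misreported ages.

```python
def estimate_birth_years(event_year, age, slack=0):
    """Return (earliest, latest) plausible birth years.

    Assumes age is in completed years at the time of the event, so the
    birth year is event_year - age or one year earlier. The hypothetical
    slack parameter widens the range by that many years on each side.
    """
    latest = event_year - age
    earliest = latest - 1
    return (earliest - slack, latest + slack)


# A person recorded as age 34 in an 1880 census.
print(estimate_birth_years(1880, 34))           # (1845, 1846)
print(estimate_birth_years(1880, 34, slack=1))  # (1844, 1847)
```

The result is fully determined by the recorded year and age, which is why I ask whether it counts as interpretation or just another form of the provided information.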

> This small difference in thinking is the thing I don’t like about Tom's ideas of personas, because I think of personas as mini-conclusions not belonging with the source data.

Given that you don’t like it at the persona level, do you like it anywhere else instead?
> 
> Louis
> 
Tom


