Linked Open Genealogy
Porting Genealogy Commons to the Web of Data
© Bernard Vatant, 2019

Genealogy: data about people
• People are entities with data properties
  – name, date of birth, date of death ...
• Linked together by object properties
  – mother, father, spouse, ...
• Linked to other entities
  – places, events, organisations, works ...
• Genealogy is naturally linked data!

Genealogy Linked Data © Bernard Vatant 2019

Genealogy as Big Data Business
• Ancestry.com
  – 10 billion records, 3 million paying subscribers
• MyHeritage
  – 9 billion records, 35 million family trees
• Geni.com
  – Over 100 million records, 11 million users
  – Owned by MyHeritage since 2012
• And many more ...
  – Genealogy is a trendy market
• Source: https://en.wikipedia.org/wiki/List_of_genealogy_databases

Big Business means Data Silos
• Proprietary data formats and APIs
  – As everywhere ...
• De facto standard: GEDCOM
  – Non-extensible data model
  – Strong cultural-religious bias (WASP)
  – No support for linked entities (places etc.)
  – No support for standard identifiers (ISNI, VIAF)

Tragedy of the Genealogy Commons
• Genealogy is about our common history
  – The oldest ancestors are the most common
  – Some shared today by millions of descendants!
• Genealogy Commons should be open data!
  – And of course, free standard open linked data!
• Some efforts to regain the Commons
  – WikiTree: 20 million records, 600,000 members
  – WeRelate: 2.5 million records
  – Open data, but still not part of the Web of Data

Genealogy in Wikidata
• Over 5 million "items" of type "human"
  – Based on (fuzzy) "notability criteria"
  – ~ 400,000 hold at least one parenthood property
  – ~ 200,000 hold "has father" relationship
  – ~ 44,000 hold "has mother" relationship
  – ~ 38,000 are linked to both parents
• External identifiers from genealogy databases
  – ~ 60,000 WikiTree identifiers
  – ~ 25,000 WeRelate identifiers
Figures as of April 2019

A real challenge
• Genealogy Linked Open Data is still small
  – 3 orders of magnitude smaller than business databases
• To scale from the million to the billion range ...
  – ... needs a strong business model
• Some tricky legal and ethical issues
  – Privacy of living people
  – Meaning of notability
  – Time limits of the Genealogy Commons

Way forward #1
• Adding more genealogy to Wikidata
  – Native linked data
  – Extensible vocabulary
  – Links to many databases and identifiers
• Issues
  – Notability policy
  – No support for privacy
  – Scalability

Way forward #2
• Support development of WikiTree
  – Open data
  – Good collaboration interface
  – Active community
• Issues
  – No linked data publication so far
  – Uses GEDCOM as interchange format
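
To make the gap between GEDCOM interchange and linked data publication more concrete, here is a minimal sketch of how a few GEDCOM records might be turned into subject-predicate-object statements. It is an illustration only, not WikiTree's or Wikidata's actual pipeline. The sample records, the `http://example.org/person/` namespace, and the `ex:` predicate names are all assumptions made up for this example; only the GEDCOM tags (`INDI`, `FAM`, `NAME`, `BIRT`, `DATE`, `HUSB`, `WIFE`, `CHIL`) come from the GEDCOM format itself.

```python
# Minimal sketch: turn a tiny GEDCOM fragment into (subject, predicate, object) triples.
# BASE and the "ex:" predicates are hypothetical placeholders, not a published vocabulary.

SAMPLE_GEDCOM = """\
0 @I1@ INDI
1 NAME Jean /Dupont/
1 BIRT
2 DATE 12 MAR 1850
0 @I2@ INDI
1 NAME Marie /Martin/
0 @I3@ INDI
1 NAME Paul /Dupont/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
"""

BASE = "http://example.org/person/"  # hypothetical namespace for person URIs


def gedcom_to_triples(text):
    """Return a list of triples for the INDI and FAM records found in text."""
    triples = []
    families = {}
    record_id = record_type = context = None
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) == 3 else ""
        if level == 0:
            # Record header, e.g. "0 @I1@ INDI": the id sits in the tag slot.
            context = None
            if tag.startswith("@") and value:
                record_id, record_type = tag.strip("@"), value
                if record_type == "FAM":
                    families[record_id] = {"HUSB": None, "WIFE": None, "CHIL": []}
            else:
                record_id = record_type = None
        elif record_type == "INDI":
            subject = BASE + record_id
            if level == 1:
                context = tag  # remember the enclosing event (e.g. BIRT)
                if tag == "NAME":
                    # GEDCOM wraps the surname in slashes; drop them here.
                    triples.append((subject, "ex:name", value.replace("/", "").strip()))
            elif level == 2 and tag == "DATE" and context == "BIRT":
                triples.append((subject, "ex:dateOfBirth", value))
        elif record_type == "FAM" and level == 1:
            ref = value.strip("@")
            if tag in ("HUSB", "WIFE"):
                families[record_id][tag] = ref
            elif tag == "CHIL":
                families[record_id]["CHIL"].append(ref)
    # Simplifying assumption: HUSB maps to father and WIFE to mother.
    for fam in families.values():
        father, mother = fam["HUSB"], fam["WIFE"]
        if father and mother:
            triples.append((BASE + father, "ex:spouse", BASE + mother))
        for child in fam["CHIL"]:
            if father:
                triples.append((BASE + child, "ex:father", BASE + father))
            if mother:
                triples.append((BASE + child, "ex:mother", BASE + mother))
    return triples


if __name__ == "__main__":
    for triple in gedcom_to_triples(SAMPLE_GEDCOM):
        print(triple)
```

The family record is the only place where GEDCOM expresses parenthood, so the sketch has to unfold it into direct person-to-person links. A real conversion would also need stable identifiers, a shared vocabulary, and a policy for living people, none of which is addressed here.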