Genealogy Linked Open Data

Porting Genealogy Commons to the Web of Data

© Bernard Vatant, 2019

Genealogy: data about people

• People are entities with data properties
  – name, date of birth, date of death ...
• Linked together by object properties
  – mother, father, spouse ...
• Linked to other entities
  – places, events, organisations, works ...

• Genealogy is naturally linked data!
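The contrast between data properties and object properties can be sketched with a handful of subject-predicate-object triples in plain Python. This is a minimal illustration, not a proposed vocabulary: the `ex:` identifiers, the property names and the people are all hypothetical, and entities are told apart from literal values simply by their `ex:` prefix.

```python
from collections import defaultdict

# Hypothetical identifiers and property names, for illustration only.
# Literal values (names, dates) are plain strings; entities use an "ex:" prefix.
TRIPLES = [
    ("ex:anne", "name", "Anne Martin"),
    ("ex:anne", "birthDate", "1901-03-12"),
    ("ex:anne", "birthPlace", "ex:lyon"),   # link to a place entity
    ("ex:paul", "name", "Paul Martin"),
    ("ex:paul", "spouse", "ex:anne"),       # link between people
    ("ex:marie", "name", "Marie Martin"),
    ("ex:marie", "mother", "ex:anne"),
    ("ex:marie", "father", "ex:paul"),
    ("ex:lyon", "name", "Lyon"),
]


def is_entity(value: str) -> bool:
    """Object properties point to other entities, marked here by 'ex:'."""
    return value.startswith("ex:")


def split_properties(triples):
    """Separate data properties (literal values) from object properties (links)."""
    data, links = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        (links if is_entity(o) else data)[s].append((p, o))
    return data, links


def ancestors(person, triples):
    """Follow 'mother' and 'father' links transitively, collecting every ancestor."""
    parents = defaultdict(list)
    for s, p, o in triples:
        if p in ("mother", "father"):
            parents[s].append(o)
    found, stack = set(), [person]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    data, links = split_properties(TRIPLES)
    print(links["ex:marie"])  # [('mother', 'ex:anne'), ('father', 'ex:paul')]
    print(sorted(ancestors("ex:marie", TRIPLES)))  # ['ex:anne', 'ex:paul']
```

In RDF, the same ancestor traversal could be written as a SPARQL 1.1 property path such as `(ex:mother|ex:father)+`, with IRIs in place of the `ex:` strings.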

Genealogy Linked Data © Bernard Vatant 2019 2

Genealogy as Big Data Business

• Ancestry.com
  – 10 billion records, 3 million paying subscribers
• MyHeritage
  – 9 billion records, 35 million family trees
• Geni.com
  – Over 100 million records, 11 million users
  – Owned by MyHeritage since 2012
• And many more ...
  – Genealogy is a trendy market

• Source: https://en.wikipedia.org/wiki/List_of_genealogy_databases

Big Business means Data Silos

• Proprietary data formats and APIs
  – As everywhere ...
• De facto standard: GEDCOM
  – Non-extensible data model
  – Strong cultural-religious bias (WASP)
  – No support for linked entities (places, etc.)
  – No support for standard identifiers (ISNI, VIAF)
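The point about linked entities can be illustrated with a small GEDCOM fragment. Assuming GEDCOM 5.5.1-style records, where a place is typically written as a free-text `PLAC` value rather than as a reference to a shared place entity, the simplified parser below (hypothetical, handling only the lines it needs; the records are made up) shows that two spellings of the same town remain two unrelated strings.

```python
# Hypothetical GEDCOM 5.5.1-style fragment: each line is "level [pointer] tag [value]".
GEDCOM = """\
0 @I1@ INDI
1 NAME Anne /Martin/
1 BIRT
2 DATE 12 MAR 1901
2 PLAC Lyon, Rhône, France
0 @I2@ INDI
1 NAME Marie /Martin/
1 BIRT
2 DATE 4 JUN 1925
2 PLAC Lyon, France
"""


def places_by_individual(text):
    """Collect the free-text PLAC values found under each INDI record."""
    places, current = {}, None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # Level-0 records look like "0 @I1@ INDI": pointer, then tag.
            current = parts[1] if len(parts) == 3 and parts[2] == "INDI" else None
        elif current and parts[1] == "PLAC":
            places.setdefault(current, []).append(parts[2])
    return places


result = places_by_individual(GEDCOM)
print(result)
# The same town, spelled two ways, yields two distinct strings:
print(len({p for ps in result.values() for p in ps}))  # 2
```

Nothing in the strings themselves says that both births happened in the same Lyon; a linked data version would point both events to a single place identifier.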

Tragedy of the Genealogy Commons

• Genealogy is about our common history
  – The oldest ancestors are the most common
  – Some are shared today by millions of descendants!
• Genealogy Commons should be open data!
  – And of course, free, standard, open linked data!
• Some efforts to regain the Commons
  – WikiTree: 20 million records, 600,000 members
  – WeRelate: 2.5 million records
  – Open data, but still not part of the Web of Data

Genealogy in Wikidata

• Over 5 million "items" of type "human"
  – Based on (fuzzy) "notability criteria"
  – ~400,000 hold at least one parenthood property
  – ~200,000 hold a "has father" relationship
  – ~44,000 hold a "has mother" relationship
  – ~38,000 are linked to both parents
• External identifiers from genealogy databases
  – ~60,000 WikiTree identifiers
  – ~25,000 WeRelate identifiers

Figures as of April 2019
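The "has father", "has mother" and "both parents" figures are essentially counts over statements. The sketch below runs such counts on a few made-up statements held in memory. `P31` (instance of), `Q5` (human), `P22` (father) and `P25` (mother) are the actual Wikidata identifiers, but the person IDs `p1` to `p4` are hypothetical, and nothing here queries Wikidata itself.

```python
# Real Wikidata property/item IDs; the people p1..p4 are hypothetical.
INSTANCE_OF, HUMAN = "P31", "Q5"
FATHER, MOTHER = "P22", "P25"

STATEMENTS = [
    ("p1", INSTANCE_OF, HUMAN),
    ("p2", INSTANCE_OF, HUMAN),
    ("p3", INSTANCE_OF, HUMAN),
    ("p4", INSTANCE_OF, HUMAN),
    ("p3", FATHER, "p1"),
    ("p3", MOTHER, "p2"),
    ("p4", FATHER, "p1"),
]


def parenthood_counts(statements):
    """Count humans with a father link, a mother link, and both."""
    humans = {s for s, p, o in statements if p == INSTANCE_OF and o == HUMAN}
    with_father = {s for s, p, _ in statements if p == FATHER and s in humans}
    with_mother = {s for s, p, _ in statements if p == MOTHER and s in humans}
    return {
        "humans": len(humans),
        "has father": len(with_father),
        "has mother": len(with_mother),
        "both parents": len(with_father & with_mother),
    }


print(parenthood_counts(STATEMENTS))
# {'humans': 4, 'has father': 2, 'has mother': 1, 'both parents': 1}
```

Sets make "both parents" a plain intersection, which is why that figure can never exceed the smaller of the two single-parent counts.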

A real challenge

• Genealogy Linked Open Data is still small
  – Three orders of magnitude below commercial databases
• Scaling from the million to the billion range ...
  – ... needs a strong business model
• Some tricky legal and ethical issues
  – Privacy of living people
  – Meaning of notability
  – Time limits of the Genealogy Commons

Way forward #1

• Add more genealogy to Wikidata
  – Native linked data
  – Extensible vocabulary
  – Links to many databases and identifiers
• Issues
  – Notability policy
  – No support for privacy
  – Scalability

Way forward #2

• Support development of WikiTree
  – Open data
  – Good collaboration interface
  – Active community
• Issues
  – No linked data publication so far
  – Uses GEDCOM as interchange format