<<

The YAGO

Fabian M. Suchanek Tel´ ecom´ ParisTech University Paris, France

See the original SVG slides

1 A bit of history...

Unknown painter 2 2 YAGO: First large KB from

Elvis Presley

Elvis Presley was one AmericanSinger of the best blah blah blub blah don’t read type this, listen to the speaker! blah blah blah ElvisPresley blubl blah you are still Born: 1935 born bornIn reading this! blah blah In: Tupelo blah blah blabbel blah ... 1935 Tupelo won Categories: locatedIn AcAward Rock&Roll, American Singers, USA Academy Award winners...

3 3 Type Hierarchy

Singer

from Wikipedia categories, American Singer Male Singer automatically selected Male American Singer

Shallow noun Male American Singer of German Origin phrase parsing

gender Regular male expressions livesIn USA + demonyms 4 4 Type Hierarchy continued

Entity

... from WordNet Organism

...

Person by linking Wikipedia & Singer WordNet ...

5 5 Type Hierarchy continued

Entity

...... from WordNet Organisation Organism Location

...... University Village Person by linking Wikipedia & Singer WordNet ... In total: 650k classes • every entity has 1 class • link precision of 97% • 6 6 Knowledge Representation

marriedTo

Fact Id Subject Predicate Object

”1967-05-01”8sd:date ”1973-08-18”8sd:date ”Infobox “spouse””

(exported to RDF w/out factIds)

(Example for illustration only) 7 7 Knowledge Representation

marriedTo

Fact Id Subject Predicate Object

”1967-05-01”8sd:date ”1973-08-18”8sd:date ”Infobox “spouse”” Typechecked, (exported to RDF w/out factIds) clean facts (Example for illustration only) 8 8 Creating a large knowledge base

Extractor Extractor Extractor

Extractor GeoNames Extractor Extractor WordNet Extractor

Intermediate extractors clean facts • deduplicate facts and entities ensuring high quality (95%) • check consistency • See it online See it locally 9 9 Creating a large knowledge base

Extractor Extractor Extractor

Extractor GeoNames Extractor Extractor WordNet Extractor

Intermediate extractors clean facts • deduplicate facts and entities ensuring high quality (95%) • check consistency • See it online See it locally 10 10 Integrating multilingual Wikis

Extractor was born Extractor

est ne´

e` nato

11 11 Integrating multilingual Wikis

Extractor was born Extractor

est ne´

map?

e` nato

12 12 Integrating multilingual Wikis

Extractor was born Extractor

est ne´

e` nato Extractor ?

13 13 Integrating multilingual Wikis

bornIn

est ne´

14 14 Integrating multilingual Wikis

bornIn

est ne´ = bornIn est ne´

P (S1 S2) >✓ ◆ ?

[CIDR 2015] 15 15 Integrating multilingual Wikis

bornIn

est ne´ = bornIn est ne´

P (S1 S2) >✓ ◆ ?

[CIDR 2015] 16 16 Example: YAGO about Elvis

17 17 YAGO: a large knowledge base

Wikipedia + WordNet time and space 10 languages 100 relations http://yago-knowledge.org 100m facts 10m entities now open-source! 95% accuracy used by DBpedia and IBM

[WWW’07,JWS’08,WWW’11 demo,AIJ’13,WWW’13 demo,CIDR’15,ISWC’16]

18 18 Rule Mining finds patterns

married

hasChild hasChild

married(x, y) hasChild(x, z) hasChild(y, z) ^ )

19 19 Rule Mining finds patterns

married

hasChild hasChild

married(x, y) hasChild(x, z) hasChild(y, z) ^ )

But: Rule mining needs counter examples and RDF ontologies are positive only 20 20 Partial Completeness Assumption

hasChild

Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false.

21 21 Partial Completeness Assumption

hasChild

Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false.

22 22 Partial Completeness Assumption

hasChild

Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false.

23 23 AMIE finds rules in ontologies

AMIE r(x, y) r (z, y) r”(x, z) ^ 0 ) (5min)

AMIE is based on an efficient in-memory implementation.

Caveat: rules cannot predict the unknown with high precision

24 24 AMIE finds rules in ontologies

type(x, pope) AMIE ) diedIn(x, Rome) (5min)

[WWW 2013, VLDB journal 2015]

25 25 Incompleteness

marriedTo

26 26 Incompleteness

marriedTo

c 27 27 Incompleteness

marriedTo

marriedTo ?

Given a subject s and a relation r, do we know all o with r(s, o) ?

c Girl Happy 28 28 Signals for Incompleteness

marriedTo

marriedTo ?

Closed World Assumption No-change oracle Partial Completeness Assumption Star-pattern oracle Popularity oracle Class-oracle

AMIE oracle: Learn rules such as

moreT han1(x, hasP arent) complete(x, hasP arent) ) 29 29 Signals for Incompleteness (F1)

* = biased training sample 30 30 Signals for Incompleteness AMIE can predict incompleteness marriedTo bornIn: 100% F1-measure • diedIn: 96% marriedTo • directed: 100% • graduatedFrom: 87% • hasChild: 78% • isMarriedTo: 46% • ... and more.

[WSDM 2017]

31 31 & YAGO

We are happy to find ways of collaboration can WikiData make use of YAGO’s type hierarchy? • can WikiData import facts from YAGO? • can YAGO be used to consolidate facts in WikiData? • can the source code of YAGO be of use? • add your idea here: •

32 32 Happy Birthday!!!

We are happy to find ways of collaboration can WikiData make use of YAGO’s type hierarchy? • can WikiData import facts from YAGO? • can YAGO be used to consolidate facts in WikiData? • can the source code of YAGO be of use? • add your idea here: •

33 33