The YAGO Knowledge Base
Fabian M. Suchanek Tel´ ecom´ ParisTech University Paris, France
See the original SVG slides
1 A bit of history...
Unknown painter 2 2 YAGO: First large KB from Wikipedia
Elvis Presley
Elvis Presley was one AmericanSinger of the best blah blah blub blah don’t read type this, listen to the speaker! blah blah blah ElvisPresley blubl blah you are still Born: 1935 born bornIn reading this! blah blah In: Tupelo blah blah blabbel blah ... 1935 Tupelo won Categories: locatedIn AcAward Rock&Roll, American Singers, USA Academy Award winners...
3 3 Type Hierarchy
Singer
from Wikipedia categories, American Singer Male Singer automatically selected Male American Singer
Shallow noun Male American Singer of German Origin phrase parsing
gender Regular male expressions livesIn USA + demonyms 4 4 Type Hierarchy continued
Entity
... from WordNet Organism
...
Person by linking Wikipedia & Singer WordNet ...
5 5 Type Hierarchy continued
Entity
...... from WordNet Organisation Organism Location
...... University Village Person by linking Wikipedia & Singer WordNet ... In total: 650k classes • every entity has 1 class • link precision of 97% • 6 6 Knowledge Representation
marriedTo
Fact Id Subject Predicate Object
(exported to RDF w/out factIds)
(Example for illustration only) 7 7 Knowledge Representation
marriedTo
Fact Id Subject Predicate Object
Extractor Extractor Extractor
Extractor GeoNames Extractor Extractor WordNet Extractor
Intermediate extractors clean facts • deduplicate facts and entities ensuring high quality (95%) • check consistency • See it online See it locally 9 9 Creating a large knowledge base
Extractor Extractor Extractor
Extractor GeoNames Extractor Extractor WordNet Extractor
Intermediate extractors clean facts • deduplicate facts and entities ensuring high quality (95%) • check consistency • See it online See it locally 10 10 Integrating multilingual Wikis
Extractor was born Extractor
est ne´
e` nato
11 11 Integrating multilingual Wikis
Extractor was born Extractor
est ne´
map?
e` nato
12 12 Integrating multilingual Wikis
Extractor was born Extractor
est ne´
e` nato Extractor ?
13 13 Integrating multilingual Wikis
bornIn
est ne´
14 14 Integrating multilingual Wikis
bornIn
est ne´ = bornIn est ne´
P (S1 S2) >✓ ◆ ?
[CIDR 2015] 15 15 Integrating multilingual Wikis
bornIn
est ne´ = bornIn est ne´
P (S1 S2) >✓ ◆ ?
[CIDR 2015] 16 16 Example: YAGO about Elvis
17 17 YAGO: a large knowledge base
Wikipedia + WordNet time and space 10 languages 100 relations http://yago-knowledge.org 100m facts 10m entities now open-source! 95% accuracy used by DBpedia and IBM Watson
[WWW’07,JWS’08,WWW’11 demo,AIJ’13,WWW’13 demo,CIDR’15,ISWC’16]
18 18 Rule Mining finds patterns
married
hasChild hasChild
married(x, y) hasChild(x, z) hasChild(y, z) ^ )
19 19 Rule Mining finds patterns
married
hasChild hasChild
married(x, y) hasChild(x, z) hasChild(y, z) ^ )
But: Rule mining needs counter examples and RDF ontologies are positive only 20 20 Partial Completeness Assumption
hasChild
Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false.
21 21 Partial Completeness Assumption
hasChild
Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false.
22 22 Partial Completeness Assumption
hasChild
Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false.
23 23 AMIE finds rules in ontologies
AMIE r(x, y) r (z, y) r”(x, z) ^ 0 ) (5min)
AMIE is based on an efficient in-memory database implementation.
Caveat: rules cannot predict the unknown with high precision
24 24 AMIE finds rules in ontologies
type(x, pope) AMIE ) diedIn(x, Rome) (5min)
[WWW 2013, VLDB journal 2015]
25 25 Incompleteness
marriedTo
26 26 Incompleteness
marriedTo
c Girl Happy 27 27 Incompleteness
marriedTo
marriedTo ?
Given a subject s and a relation r, do we know all o with r(s, o) ?
c Girl Happy 28 28 Signals for Incompleteness
marriedTo
marriedTo ?
Closed World Assumption No-change oracle Partial Completeness Assumption Star-pattern oracle Popularity oracle Class-oracle
AMIE oracle: Learn rules such as
moreT han1(x, hasP arent) complete(x, hasP arent) ) 29 29 Signals for Incompleteness (F1)
* = biased training sample 30 30 Signals for Incompleteness AMIE can predict incompleteness marriedTo bornIn: 100% F1-measure • diedIn: 96% marriedTo • directed: 100% • graduatedFrom: 87% • hasChild: 78% • isMarriedTo: 46% • ... and more.
[WSDM 2017]
31 31 WikiData & YAGO
We are happy to find ways of collaboration can WikiData make use of YAGO’s type hierarchy? • can WikiData import facts from YAGO? • can YAGO be used to consolidate facts in WikiData? • can the source code of YAGO be of use? • add your idea here: •
32 32 Happy Birthday!!!
We are happy to find ways of collaboration can WikiData make use of YAGO’s type hierarchy? • can WikiData import facts from YAGO? • can YAGO be used to consolidate facts in WikiData? • can the source code of YAGO be of use? • add your idea here: •
33 33