The YAGO Knowledge Base
Total Page:16
File Type:pdf, Size:1020Kb
The YAGO Knowledge Base Fabian M. Suchanek Tel´ ecom´ ParisTech University Paris, France See the original SVG slides 1 A bit of history... Unknown painter 2 2 YAGO: First large KB from Wikipedia Elvis Presley Elvis Presley was one AmericanSinger of the best blah blah blub blah don’t read type this, listen to the speaker! blah blah blah ElvisPresley blubl blah you are still Born: 1935 born bornIn reading this! blah blah In: Tupelo blah blah blabbel blah ... 1935 Tupelo won Categories: locatedIn AcAward Rock&Roll, American Singers, USA Academy Award winners... 3 3 Type Hierarchy Singer from Wikipedia categories, American Singer Male Singer automatically selected Male American Singer Shallow noun Male American Singer of German Origin phrase parsing gender Regular male expressions livesIn USA + demonyms 4 4 Type Hierarchy continued Entity ... from WordNet Organism ... Person by linking Wikipedia & Singer WordNet ... 5 5 Type Hierarchy continued Entity ... ... ... from WordNet Organisation Organism Location ... ... ... University Village Person by linking Wikipedia & Singer WordNet ... In total: 650k classes • every entity has 1 class • ≥ link precision of 97% • 6 6 Knowledge Representation marriedTo Fact Id Subject Predicate Object <fact 42> <Elvis> <marriedTo> <Priscilla> <fact 43> <fact 42> <since> ”1967-05-01”8sd:date <fact 44> <fact 42> <until> ”1973-08-18”8sd:date <fact 45> <fact 42> <extractedFrom> <http://wikipedia...> <fact 46> <fact 45> <byTechnique> ”Infobox “spouse”” (exported to RDF w/out factIds) (Example for illustration only) 7 7 Knowledge Representation marriedTo Fact Id Subject Predicate Object <fact 42> <Elvis> <marriedTo> <Priscilla> <fact 43> <fact 42> <since> ”1967-05-01”8sd:date <fact 44> <fact 42> <until> ”1973-08-18”8sd:date <fact 45> <fact 42> <extractedFrom> <http://wikipedia...> <fact 46> <fact 45> <byTechnique> ”Infobox “spouse”” Typechecked, (exported to RDF w/out factIds) clean facts (Example for illustration only) 8 8 Creating a large knowledge base Extractor Extractor Extractor Extractor GeoNames Extractor Extractor WordNet Extractor Intermediate extractors clean facts • deduplicate facts and entities ensuring high quality (95%) • check consistency • See it online See it locally 9 9 Creating a large knowledge base Extractor Extractor Extractor Extractor GeoNames Extractor Extractor WordNet Extractor Intermediate extractors clean facts • deduplicate facts and entities ensuring high quality (95%) • check consistency • See it online See it locally 10 10 Integrating multilingual Wikis Extractor was born Extractor est ne´ e` nato 11 11 Integrating multilingual Wikis Extractor was born Extractor est ne´ map? e` nato 12 12 Integrating multilingual Wikis Extractor was born Extractor est ne´ e` nato Extractor ? 13 13 Integrating multilingual Wikis bornIn <Elvis,USA> <Mao,China> est ne´ <Elvis,USA> <Mao,China> 14 14 Integrating multilingual Wikis bornIn <Elvis,USA> <Mao,China> est ne´ = bornIn est ne´ <Elvis,USA> <Mao,China> P (S1 S2) >✓ ◆ ? [CIDR 2015] 15 15 Integrating multilingual Wikis bornIn <Elvis,USA> <Mao,China> est ne´ = bornIn est ne´ <Elvis,USA> <Mao,China> P (S1 S2) >✓ ◆ ? [CIDR 2015] 16 16 Example: YAGO about Elvis 17 17 YAGO: a large knowledge base Wikipedia + WordNet time and space 10 languages 100 relations http://yago-knowledge.org 100m facts 10m entities now open-source! 95% accuracy used by DBpedia and IBM Watson [WWW’07,JWS’08,WWW’11 demo,AIJ’13,WWW’13 demo,CIDR’15,ISWC’16] 18 18 Rule Mining finds patterns married hasChild hasChild married(x, y) hasChild(x, z) hasChild(y, z) ^ ) 19 19 Rule Mining finds patterns married hasChild hasChild married(x, y) hasChild(x, z) hasChild(y, z) ^ ) But: Rule mining needs counter examples and RDF ontologies are positive only 20 20 Partial Completeness Assumption hasChild Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false. 21 21 Partial Completeness Assumption hasChild Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false. 22 22 Partial Completeness Assumption hasChild Assumption: If we know r(x,y1),..., r(x,yn), then all other r(x,z) are false. 23 23 AMIE finds rules in ontologies AMIE r(x, y) r (z, y) r”(x, z) ^ 0 ) (5min) AMIE is based on an efficient in-memory database implementation. Caveat: rules cannot predict the unknown with high precision 24 24 AMIE finds rules in ontologies type(x, pope) AMIE ) diedIn(x, Rome) (5min) [WWW 2013, VLDB journal 2015] 25 25 Incompleteness marriedTo 26 26 Incompleteness marriedTo c Girl Happy 27 27 Incompleteness marriedTo marriedTo ? Given a subject s and a relation r, do we know all o with r(s, o) ? c Girl Happy 28 28 Signals for Incompleteness marriedTo marriedTo ? Closed World Assumption No-change oracle Partial Completeness Assumption Star-pattern oracle Popularity oracle Class-oracle AMIE oracle: Learn rules such as moreT han1(x, hasP arent) complete(x, hasP arent) ) 29 29 Signals for Incompleteness (F1) * = biased training sample 30 30 Signals for Incompleteness AMIE can predict incompleteness marriedTo bornIn: 100% F1-measure • diedIn: 96% marriedTo • directed: 100% • graduatedFrom: 87% • hasChild: 78% • isMarriedTo: 46% • ... and more. [WSDM 2017] 31 31 WikiData & YAGO We are happy to find ways of collaboration can WikiData make use of YAGO’s type hierarchy? • can WikiData import facts from YAGO? • can YAGO be used to consolidate facts in WikiData? • can the source code of YAGO be of use? • add your idea here: • 32 32 Happy Birthday!!! We are happy to find ways of collaboration can WikiData make use of YAGO’s type hierarchy? • can WikiData import facts from YAGO? • can YAGO be used to consolidate facts in WikiData? • can the source code of YAGO be of use? • add your idea here: • 33 33.