AAAI 2017 JAY PUJARA, SAMEER SINGH, BHAVANA DALVI Tutorial Overview
Total Page:16
File Type:pdf, Size:1020Kb
Knowledge Graph Construction from Text AAAI 2017 JAY PUJARA, SAMEER SINGH, BHAVANA DALVI Tutorial Overview Part 1: Knowledge Graphs Part 2: Part 3: Knowledge Graph Extraction Construction Part 4: Critical Analysis 2 Tutorial Outline 1. Knowledge Graph Primer [Jay] 2. Knowledge Extraction from Text a. NLP Fundamentals [Sameer] b. Information Extraction [Bhavana] Coffee Break 3. Knowledge Graph Construction a. Probabilistic Models [Jay] b. Embedding Techniques [Sameer] 4. Critical Overview and Conclusion [Bhavana] 3 What is Knowledge Extraction? Text John was born in Liverpool, to Julia and Alfred Lennon. Literal Facts Alfred Lennon childOf birthplace Liverpool John Lennon childOf Julia Lennon 4 NLP Fundamentals EXTRACTING STRUCTURES FROM LANGUAGE What is NLP? NLP “Knowledge” Unstructured Structured Ambiguous Precise, Actionable Lots and lots of it! Specific to the task Humans can read them, but Can be used for downstream … very slowly applications, such as creating … can’t remember all Knowledge Graphs! … can’t answer questions 6 What is NLP? John was born in Liverpool, to Julia and Alfred Lennon. Natural Language Processing Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP 7 What is Information Extraction? Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP Information Extraction Alfred Lennon childOf birthplace spouse Liverpool John Lennon childOf Julia Lennon 8 Breaking it Down Alfred Lennon Entity resolution, childOf birthplace spouse Entity linking, Liverpool John Lennon Julia Relation extraction… childOf Extraction Lennon Information Lennon.. Mrs. Lennon.. his father John Lennon... the Pool Coreference Resolution... .. his mother .. he Alfred Person Location Person Person Document John was born in Liverpool, to Julia and Alfred Lennon. Dependency Parsing, Part of speech tagging, Named entity recognition… Sentence NNP VBD VBD IN NNP TO NNP CC NNP NNP John was born in Liverpool, to Julia and Alfred Lennon. 9 Tokenization & Sentence Splitting “Mr. Bob Dobolina is thinkin' of a master plan. Why doesn't he quit?” [Mr.] [Bob] [Dobolina] [is] [thinkin’] [of] [a] [master] [plan] [.] [Why] [doesn't] [he] [quit] [?] How it is done: Uses in KG Construction: • Regular expressions, but not trivial • Strictly constrains other NLP tasks • Mr., Yahoo!, lower-case • Parts of Speech • For non-English, incredibly difficult! • Dependency Parsing • Chinese: no “space” character • Directly effects KG nodes/edges • Non-trivial for some domains… • Mention boundaries • What is a “token” in BioNLP? • Relations within sentences 10 Tagging the Parts of Speech NNP VBD VBD IN NNP TO NNP CC NNP NNP John was born in Liverpool, to Julia and Alfred Lennon. How it is done: Uses in KG Construction: • Context is important! • Entities appear as nouns • run, table, bar, … • Verbs are very useful • Label whole sentence together • For identifying relations • “Structured prediction” • For identifying entity types • Conditional Random Fields, .. • Important for downstream NLP • Now: CNNs, LSTMs, … • NER, Dependency Parsing, … 11 Detecting Named Entities Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. How it is done: Uses in KG Construction: • Context is important! • Mentions describes the nodes • Georgia, Washington, … • Types are incredibly important! • John Deere, Thomas Cook, … • Often restrict relations • Princeton, Amazon, … • Fine-grained types are informative! • Label whole sentence together • Brooklyn: city • Structured prediction again • Sanders: politician, senator 12 NER: Entity Types 3 class: Location, Person, Organization Stanford CoreNLP 4 class: Location, Person, Organization, Misc 7 class: Location, Person, Organization, Money, Percent, Date, Time PERSON People, including fictional. NORP Nationalities or religious or political groups. FACILITY Buildings, airports, highways, bridges, etc. ORG Companies, agencies, institutions, etc. GPE Countries, cities, states. spaCy.io LOC Non-GPE locations, mountain ranges, bodies of water. PRODUCT Objects, vehicles, foods, etc. (Not services.) EVENT Named hurricanes, battles, wars, sports events, etc. WORK_OF_ART Titles of books, songs, etc. LANGUAGE Any named language. From Stanford CoreNLP (http://nlp.stanford.edu/software/CRF-NER.shtml) 13 NER: Entity Types Fine-grained Types • More on this later… From Ling & Weld. AAAI 2012 (http://aiweb.cs.washington.edu/ai/pubs/ling-aaai12.pdf) 14 Dependency Parsing How it is done: Uses in KG Construction: • Model: score trees using features • Incredibly useful for relations! • Lexical: words, POS, … • What verb is attached? • Structure: distance, … • Relation to which mention? • Prediction: Search over trees • Incredibly useful for attributes! • greedy, spanning tree, belief • Appositives: “X, the CEO, …” propagation, dynamic prog, … • Paths are used as surface relations Using http://nlp.stanford.edu:8080/corenlp/process 15 Dependency Paths nmod:to nmod:in John was born in Liverpool, to Julia and Alfred Lennon. case case Text Patterns Dependency Paths John, Liverpool “was born in” “was born in” John, Julia “was born in Liverpool, to” “was born to” John, Alfred Lennon “was born in Liverpool, to Julia and” “was born to” 16 Within-document Coreference Mrs. Lennon.. He… Alfred .. his mother .. Lennon.. the Pool his father John Lennon... he John was born in Liverpool, to Julia and Alfred Lennon. How it is done: Uses in KG Construction: • Mo`del: score pairwise links • More context for each entity! • dep path, similarity, types, … • Many relations occur on pronouns • “representative mention” • “He is married to her” • Prediction: Search over clusterings • Coref can be used for types • greedy (left to right), ILP, • Nominals: The president, … belief propagation, MCMC, … • Difficult, so often ignored 17 Information Extraction Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP Information Extraction Alfred Lennon childOf birthplace spouse Liverpool John Lennon childOf Julia Lennon 18 Surface Patterns Combine tokens, dependency paths, and entity types to define rules. appos nmod det case Argument 1 , DT CEO of Argument 2 Person Organization Bill Gates, the CEO of Microsoft, said … Mr. Jobs, the brilliant and charming CEO of Apple Inc., said … … announced by Steve Jobs, the CEO of Apple. … announced by Bill Gates, the director and CEO of Microsoft. … mused Bill, a former CEO of Microsoft. and many other possible instantiations… 19 Rule-Based Extraction appos nmod headOf det case Implies Argument 1 Argument 2 Argument 1 , DT CEO of Argument 2 Person Organization Use a collection of rules as the system itself Variations High precision: when it fires, it’s correct Source: Easy to explain predictions • Manually specified Easy to fix mistakes • Learned from Data Multiple Rules: However… • Attach priorities/precedence Only work when the rules fire • Attach probabilities (more later) Poor recall: Do not generalize! 20 Supervised Extraction P(birthplace) = 0.75 Machine Learning: hopefully, generalizes the labels in the right way Classifier Use all of NLP as features: words, POS, NER, dependencies, embeddings POS NER Dep Path Text in b/w embeddings … However Feature Engineering Usually, a lot of labeled data is needed, which is expensive & time consuming. John was born in Liverpool, to Julia and Alfred Lennon. Requires a lot of feature engineering! 21 Entity Resolution & Linking ...during the late 60's and early 70's, Kevin Smith worked with several local... ...the term hip-hop is attributed to Lovebug Starski. What does it actually mean... Like Back in 2008, the Lions drafted Kevin Smith, even though Smith was badly... ... backfield in the wake of Kevin Smith's knee injury, and the addition of Haynesworth... The filmmaker Kevin Smith returns to the role of Silent Bob... Nothing could be more irrelevant to Kevin Smith's audacious ''Dogma'' than ticking off... ... The Physiological Basis of Politics,” by Kevin Smith, Douglas Oxley, Matthew Hibbing... 22 Entity Names: Two Main Problems Entities with Same Name Different Names for Entities Same type of entities share names Nick Names Kevin Smith, John Smith, Bam Bam, Drumpf, … Springfield, … Things named after each other Typos/Misspellings Clinton, Washington, Paris, Baarak, Barak, Barrack, … Amazon, Princeton, Kingston, … Partial Reference Inconsistent References First names of people, Location MSFT, APPL, GOOG… instead of team name, Nick names 23 Entity Linking Approach Washington drops 10 points after game with UCLA Bruins. Washington DC, George Washington, Washington state, Candidate Generation Lake Washington, Washington Huskies, Denzel Washington, University of Washington, Washington High School, … Washington DC, George Washington, Washington state, Entity Types LOC/ORG Lake Washington, Washington Huskies, Denzel Washington, University of Washington, Washington High School, … Washington DC, George Washington, Washington state, Coreference UWashington, Lake Washington, Washington Huskies, Denzel Washington, Huskies University of Washington, Washington High School, … Washington DC, George Washington, Washington state, UCLA Bruins, Coherence Lake Washington, Washington Huskies, Denzel Washington, USC Trojans University of Washington, Washington High School, … Vinculum, Ling, Singh, Weld, TACL (2015) 24 Information Extraction Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP Information Extraction Alfred Lennon childOf birthplace spouse Liverpool John Lennon childOf Julia Lennon 25.