Knowledge Graph Construction from Text AAAI 2017 JAY PUJARA, SAMEER SINGH, BHAVANA DALVI Tutorial Overview
Part 1: Knowledge Graphs
Part 2: Part 3: Knowledge Graph Extraction Construction
Part 4: Critical Analysis
2 Tutorial Outline 1. Knowledge Graph Primer [Jay] 2. Knowledge Extraction from Text a. NLP Fundamentals [Sameer] b. Information Extraction [Bhavana] Coffee Break 3. Knowledge Graph Construction a. Probabilistic Models [Jay] b. Embedding Techniques [Sameer] 4. Critical Overview and Conclusion [Bhavana]
3 What is Knowledge Extraction?
Text John was born in Liverpool, to Julia and Alfred Lennon.
Literal Facts
Alfred Lennon childOf birthplace Liverpool John Lennon childOf Julia Lennon
4 NLP Fundamentals EXTRACTING STRUCTURES FROM LANGUAGE What is NLP?
NLP “Knowledge”
Unstructured Structured Ambiguous Precise, Actionable Lots and lots of it! Specific to the task
Humans can read them, but Can be used for downstream … very slowly applications, such as creating … can’t remember all Knowledge Graphs! … can’t answer questions
6 What is NLP?
John was born in Liverpool, to Julia and Alfred Lennon.
Natural Language Processing
Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP
7 What is Information Extraction?
Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP
Information Extraction
Alfred Lennon childOf birthplace spouse Liverpool John Lennon childOf Julia Lennon
8 Breaking it Down
Alfred Lennon Entity resolution, childOf birthplace spouse Entity linking, Liverpool John Lennon Julia Relation extraction… childOf
Extraction Lennon Information
Lennon.. Mrs. Lennon.. his father John Lennon... the Pool Coreference Resolution... .. his mother .. he Alfred Person Location Person Person Document John was born in Liverpool, to Julia and Alfred Lennon.
Dependency Parsing, Part of speech tagging, Named entity recognition… Sentence NNP VBD VBD IN NNP TO NNP CC NNP NNP John was born in Liverpool, to Julia and Alfred Lennon.
9 Tokenization & Sentence Splitting
“Mr. Bob Dobolina is thinkin' of a master plan. Why doesn't he quit?”
[Mr.] [Bob] [Dobolina] [is] [thinkin’] [of] [a] [master] [plan] [.] [Why] [doesn't] [he] [quit] [?]
How it is done: Uses in KG Construction: • Regular expressions, but not trivial • Strictly constrains other NLP tasks • Mr., Yahoo!, lower-case • Parts of Speech • For non-English, incredibly difficult! • Dependency Parsing • Chinese: no “space” character • Directly effects KG nodes/edges • Non-trivial for some domains… • Mention boundaries • What is a “token” in BioNLP? • Relations within sentences
10 Tagging the Parts of Speech
NNP VBD VBD IN NNP TO NNP CC NNP NNP John was born in Liverpool, to Julia and Alfred Lennon.
How it is done: Uses in KG Construction: • Context is important! • Entities appear as nouns • run, table, bar, … • Verbs are very useful • Label whole sentence together • For identifying relations • “Structured prediction” • For identifying entity types • Conditional Random Fields, .. • Important for downstream NLP • Now: CNNs, LSTMs, … • NER, Dependency Parsing, …
11 Detecting Named Entities
Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon.
How it is done: Uses in KG Construction: • Context is important! • Mentions describes the nodes • Georgia, Washington, … • Types are incredibly important! • John Deere, Thomas Cook, … • Often restrict relations • Princeton, Amazon, … • Fine-grained types are informative! • Label whole sentence together • Brooklyn: city • Structured prediction again • Sanders: politician, senator
12 NER: Entity Types
3 class: Location, Person, Organization
Stanford CoreNLP 4 class: Location, Person, Organization, Misc
7 class: Location, Person, Organization, Money, Percent, Date, Time
PERSON People, including fictional. NORP Nationalities or religious or political groups. FACILITY Buildings, airports, highways, bridges, etc. ORG Companies, agencies, institutions, etc. GPE Countries, cities, states. spaCy.io LOC Non-GPE locations, mountain ranges, bodies of water. PRODUCT Objects, vehicles, foods, etc. (Not services.) EVENT Named hurricanes, battles, wars, sports events, etc. WORK_OF_ART Titles of books, songs, etc. LANGUAGE Any named language.
From Stanford CoreNLP (http://nlp.stanford.edu/software/CRF-NER.shtml) 13 NER: Entity Types
Fine-grained Types
From Ling & Weld. AAAI 2012 (http://aiweb.cs.washington.edu/ai/pubs/ling-aaai12.pdf) 14 Dependency Parsing
How it is done: Uses in KG Construction: • Model: score trees using features • Incredibly useful for relations! • Lexical: words, POS, … • What verb is attached? • Structure: distance, … • Relation to which mention? • Prediction: Search over trees • Incredibly useful for attributes! • greedy, spanning tree, belief • Appositives: “X, the CEO, …” propagation, dynamic prog, … • Paths are used as surface relations
Using http://nlp.stanford.edu:8080/corenlp/process 15 Dependency Paths
nmod:to
nmod:in John was born in Liverpool, to Julia and Alfred Lennon. case case
Text Patterns Dependency Paths
John, Liverpool “was born in” “was born in”
John, Julia “was born in Liverpool, to” “was born to”
John, Alfred Lennon “was born in Liverpool, to Julia and” “was born to”
16 Within-document Coreference
Mrs. Lennon.. He… Alfred .. his mother .. Lennon.. the Pool his father John Lennon... he John was born in Liverpool, to Julia and Alfred Lennon.
How it is done: Uses in KG Construction: • Mo`del: score pairwise links • More context for each entity! • dep path, similarity, types, … • Many relations occur on pronouns • “representative mention” • “He is married to her” • Prediction: Search over clusterings • Coref can be used for types • greedy (left to right), ILP, • Nominals: The president, … belief propagation, MCMC, … • Difficult, so often ignored
17 Information Extraction
Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP
Information Extraction
Alfred Lennon childOf birthplace spouse Liverpool John Lennon childOf Julia Lennon
18 Surface Patterns
Combine tokens, dependency paths, and entity types to define rules.
appos nmod
det case
Argument 1 , DT CEO of Argument 2 Person Organization
Bill Gates, the CEO of Microsoft, said … Mr. Jobs, the brilliant and charming CEO of Apple Inc., said … … announced by Steve Jobs, the CEO of Apple. … announced by Bill Gates, the director and CEO of Microsoft. … mused Bill, a former CEO of Microsoft. and many other possible instantiations…
19 Rule-Based Extraction
appos nmod headOf det case Implies Argument 1 Argument 2
Argument 1 , DT CEO of Argument 2 Person Organization Use a collection of rules as the system itself
Variations High precision: when it fires, it’s correct Source: Easy to explain predictions • Manually specified Easy to fix mistakes • Learned from Data Multiple Rules: However… • Attach priorities/precedence Only work when the rules fire • Attach probabilities (more later) Poor recall: Do not generalize!
20 Supervised Extraction
P(birthplace) = 0.75 Machine Learning: hopefully, generalizes the labels in the right way Classifier Use all of NLP as features: words, POS, NER, dependencies, embeddings
POS NER Dep Path Text in b/w embeddings … However Feature Engineering Usually, a lot of labeled data is needed, which is expensive & time consuming. John was born in Liverpool, to Julia and Alfred Lennon. Requires a lot of feature engineering!
21 Entity Resolution & Linking
...during the late 60's and early 70's, Kevin Smith worked with several local...
...the term hip-hop is attributed to Lovebug Starski. What does it actually mean...
Like Back in 2008, the Lions drafted Kevin Smith, even though Smith was badly...
... backfield in the wake of Kevin Smith's knee injury, and the addition of Haynesworth...
The filmmaker Kevin Smith returns to the role of Silent Bob...
Nothing could be more irrelevant to Kevin Smith's audacious ''Dogma'' than ticking off...
... The Physiological Basis of Politics,” by Kevin Smith, Douglas Oxley, Matthew Hibbing...
22 Entity Names: Two Main Problems
Entities with Same Name Different Names for Entities
Same type of entities share names Nick Names Kevin Smith, John Smith, Bam Bam, Drumpf, … Springfield, …
Things named after each other Typos/Misspellings Clinton, Washington, Paris, Baarak, Barak, Barrack, … Amazon, Princeton, Kingston, …
Partial Reference Inconsistent References First names of people, Location MSFT, APPL, GOOG… instead of team name, Nick names
23 Entity Linking Approach
Washington drops 10 points after game with UCLA Bruins.
Washington DC, George Washington, Washington state, Candidate Generation Lake Washington, Washington Huskies, Denzel Washington, University of Washington, Washington High School, …
Washington DC, George Washington, Washington state, Entity Types LOC/ORG Lake Washington, Washington Huskies, Denzel Washington, University of Washington, Washington High School, …
Washington DC, George Washington, Washington state, Coreference UWashington, Lake Washington, Washington Huskies, Denzel Washington, Huskies University of Washington, Washington High School, …
Washington DC, George Washington, Washington state, UCLA Bruins, Coherence Lake Washington, Washington Huskies, Denzel Washington, USC Trojans University of Washington, Washington High School, …
Vinculum, Ling, Singh, Weld, TACL (2015) 24 Information Extraction
Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon. NNP VBD VBD IN NNP TO NNP CC NNP NNP
Information Extraction
Alfred Lennon childOf birthplace spouse Liverpool John Lennon childOf Julia Lennon
25