Implicit Entity Linking in Tweets

Sujan Perera, Pablo N. Mendes, Adarsh Alex, Amit P. Sheth, Krishnaprasad Thirunarayan

Kno.e.sis Center, Wright State University; IBM Research, Almaden

Extended Semantic Web Conference 2016, Heraklion, Crete, Greece

Motivation

• Linking explicitly mentioned entities in tweets is well-explored
• Tweets also contain implicit mentions of entities

Explicit entity (Sandra Bullock) and implicit entity (Gravity): “New Sandra Bullock astronaut lost in space movie looks absolutely terrifying.”

Implicit entity (Gravity): “ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space.”

Motivation

• Sentiment analysis: “New Sandra Bullock astronaut lost in space movie looks absolutely terrifying.” (Gravity: positive)
• Trend detection: “Kinda sad to hear about that South African runner kill his girlfriend” (Oscar Pistorius)
• Event monitoring: “Texas Town Pushes for Marijuana Legalization to Combat Cartel Traffic” (Marijuana legalization in El Paso)

Ignoring implicit entity mentions would adversely affect downstream analysis tasks

Implicit Entities

• Definition – An implicit entity is an entity that is mentioned in a text even though its name is not present and the mention is neither a synonym, alias, or abbreviation of the entity nor a co-reference to an explicitly mentioned entity in the text.

• Prevalence – 21% of movie mentions and 40% of book mentions are implicit in tweets.

• Implicit entity linking – Given a text with an implicit entity mention of a particular type (e.g., Movie, Book, Disorder), output the entity the text refers to with respect to a given knowledge base.

Characteristics

• Types of references-through-characteristics
  – “… Richard Linklater movie …”
  – “… Ellar Coltrane on his 12-year movie …”
  – “… 12-year long movie shoot …”

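• Dynamic context

[Figure: Dynamic context. The phrase “space movie” refers to different entities over time (Gravity, The Martian), and Furious 7 is referenced through changing characteristics (“Paul Walker's last movie”, “fastest movie to earn $1 billion”). Timeline labels: Fall 2013, April 2014, Fall 2015.]

Implicit Entity Linking in Tweets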

Twitter users often rely on sources of context outside the current post, assuming that perhaps there is some shared relationship between them and their audience, or temporal context in the form of recent events and recently mentioned entities (Derczynski et al., 2015)

• “New Sandra Bullock astronaut lost in space movie looks absolutely terrifying.” – Gravity
  – Factual knowledge: Sandra Bullock starred_in Gravity; Gravity is_a Space Adventure

• “ISRO sends probe to Mars for less money than it takes Hollywood to send a woman to space.” – Gravity
  – Contextual knowledge: the Indian Space Research Organization’s Mars orbiter mission cost less than the movie Gravity

Implicit Entity Linking in Tweets

[Figure: System overview. Tweets pass through implicit tweet detection. Entities are modeled as a property graph from knowledge gleaned from Wikipedia and knowledge gleaned from tweets, creating the Entity Model Network (entity nodes e1–e3, cue nodes c1–c5). Tweet processing extracts cues (Sandra Bullock, astronaut, space movie) from “New Sandra Bullock astronaut lost in space movie looks absolutely terrifying”, and learning to link ranks the candidates: Gravity 0.8, Space Odyssey 0.6, The Martian 0.58.]

Entity Modeling

Template for the entity Gravity (type: Movie)

• Factual knowledge: Sandra Bullock, George Clooney, Alfonso Cuarón
• Contextual knowledge: mars orbiter mission, astronaut, space shuttle, ISRO
• Time salience: 135673
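The template above can be sketched as a small record type. This is a minimal illustration under assumptions: the class name `EntityModel`, its field names, and the lowercasing of cues are hypothetical choices, not the authors' implementation; the values are the ones shown for Gravity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntityModel:
    """Hypothetical container for one entity's model: name, type, cues, salience."""
    name: str
    entity_type: str
    factual_cues: set[str] = field(default_factory=set)     # e.g., gleaned from DBpedia
    contextual_cues: set[str] = field(default_factory=set)  # e.g., gleaned from recent tweets
    time_salience: int = 0                                  # e.g., recent Wikipedia page views

    def all_cues(self) -> set[str]:
        """Union of factual and contextual cues."""
        return self.factual_cues | self.contextual_cues


# Values taken from the Gravity template; cues lowercased for string matching.
gravity = EntityModel(
    name="Gravity",
    entity_type="Movie",
    factual_cues={"sandra bullock", "george clooney", "alfonso cuarón"},
    contextual_cues={"mars orbiter mission", "astronaut", "space shuttle", "isro"},
    time_salience=135673,
)
```

Keeping factual and contextual cues in separate fields makes it easy to measure each knowledge source's contribution, as the evaluation does when it compares results with and without contextual knowledge.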

Knowledge Acquisition

• Acquiring factual knowledge
  – Source: DBpedia
  – Not all factual knowledge is important: a movie has ‘starring’ and ‘director’ as well as ‘billed’ and ‘license’
  – Rank the relationships based on their joint probability with the entity type:

$$P(r, T) = \frac{\text{number of triples of } r \text{ with instances of } T}{\text{total number of triples of } r}$$

• Acquiring contextual knowledge
  – Source: contemporary tweets
  – We collect 1000 tweets with explicit mentions of the entity

• Time salience
  – Collect the number of hits for the entity’s Wikipedia page within the last t days as its temporal salience
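A minimal sketch of the relationship ranking and of time salience, assuming triples are (subject, predicate, object) tuples, that an entity counts as an instance of T when it is the subject of the triple, and that page views are given as a list of daily counts. The function names and the toy data (generic IDs, not real DBpedia resources) are hypothetical.

```python
from collections import Counter


def rank_relationships(triples, instance_types, entity_type):
    """Rank predicates r by P(r, T): triples of r whose subject is an instance
    of T, divided by all triples of r. Subject-side typing is an assumption."""
    total = Counter()
    with_type = Counter()
    for subject, predicate, _obj in triples:
        total[predicate] += 1
        if entity_type in instance_types.get(subject, set()):
            with_type[predicate] += 1
    scores = {r: with_type[r] / total[r] for r in total}
    # Highest probability first; ties keep first-seen order (sort is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def time_salience(daily_page_views, t):
    """Sum of the entity's Wikipedia page views over the last t days."""
    if t <= 0:
        return 0
    return sum(daily_page_views[-t:])


# Hypothetical toy data: 'license' is shared with non-movie types, 'starring' is not.
instance_types = {
    "movie_1": {"Movie"}, "movie_2": {"Movie"},
    "software_1": {"Software"}, "software_2": {"Software"}, "software_3": {"Software"},
}
triples = [
    ("movie_1", "starring", "actor_1"),
    ("movie_2", "starring", "actor_2"),
    ("movie_1", "director", "director_1"),
    ("movie_1", "license", "license_1"),
    ("software_1", "license", "license_2"),
    ("software_2", "license", "license_2"),
    ("software_3", "license", "license_3"),
]
ranking = rank_relationships(triples, instance_types, "Movie")
```

Here `ranking` places ‘starring’ and ‘director’ (probability 1.0) ahead of ‘license’ (0.25), which is the kind of separation the ranking is meant to produce.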

Entity Modeling

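[Figure: Building the entity model for Gravity. Contemporary tweets are cleaned, split into n-grams, and turned into semantic cues with the help of Wikipedia page titles and anchor texts (e.g., Mars orbiter mission, Woman in space, Astronaut). These contextual cues are combined with factual knowledge (e.g., Sandra Bullock, Alfonso Cuarón) into the entity model, with cue frequencies (e.g., freq = 53) and a time salience (ts = 128262).]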
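The pipeline in the figure (clean tweets, generate n-grams, generate semantic cues) can be sketched as follows. The cleaning rules (dropping URLs, @mentions and the RT marker, lowercasing, stripping punctuation), the maximum n-gram length of 3, and the use of a plain set of phrases as a stand-in for Wikipedia page titles and anchor texts are all assumptions for illustration; the function names are hypothetical.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"\bRT\b")
NON_WORD_RE = re.compile(r"[^\w\s]")


def clean_tweet(text):
    """Strip URLs, @mentions and the RT marker; lowercase; drop punctuation; tokenize."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text.lower())
    return text.split()


def ngrams(tokens, max_n=3):
    """Yield every 1- to max_n-gram of a token list as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def semantic_cues(tweets, vocabulary, max_n=3):
    """Count n-grams across tweets, keeping only phrases found in `vocabulary`
    (a stand-in for Wikipedia page titles and anchor texts)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(g for g in ngrams(clean_tweet(tweet), max_n) if g in vocabulary)
    return counts
```

For example, `semantic_cues(["RT @user: Mars orbiter mission launched"], {"mars orbiter mission"})` counts the trigram once; summed over an entity's 1000 collected tweets, these counts play the role of the cue frequencies in the figure.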

Entity Model Network

• A property graph reflecting the topical relationships between entities
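A minimal sketch of such a property graph, assuming a bipartite layout in which entity nodes carry a time-salience property and entity-cue edges carry the cue's frequency. Specificity follows the definition $\text{specificity}(c_j) = |N| / |N_{c_j}|$, where |N| is the total number of entities and |N_{c_j}| is the number of entities adjacent to cue c_j. The class and method names and the toy numbers are hypothetical.

```python
from collections import defaultdict


class EntityModelNetwork:
    """Hypothetical minimal property graph: entity nodes with a time-salience
    property, cue nodes, and entity-cue edges with a frequency property."""

    def __init__(self):
        self.time_salience = {}         # entity -> time salience
        self.edges = defaultdict(dict)  # cue -> {entity: frequency}

    def add_entity(self, entity, time_salience, cue_frequencies):
        """Add an entity node and connect it to each of its cues."""
        self.time_salience[entity] = time_salience
        for cue, freq in cue_frequencies.items():
            self.edges[cue][entity] = freq

    def neighbors(self, cue):
        """Entities adjacent to a cue, i.e. N_{c_j}."""
        return set(self.edges.get(cue, {}))

    def frequency(self, cue, entity):
        """Edge property: how often the cue occurred for this entity (0 if no edge)."""
        return self.edges.get(cue, {}).get(entity, 0)

    def specificity(self, cue):
        """|N| / |N_{c_j}|; 0.0 for a cue that is not in the network."""
        adjacent = len(self.edges.get(cue, {}))
        return len(self.time_salience) / adjacent if adjacent else 0.0


# Hypothetical toy network with three entities.
emn = EntityModelNetwork()
emn.add_entity("Gravity", 300, {"astronaut": 5, "sandra bullock": 9})
emn.add_entity("The Martian", 100, {"astronaut": 3, "matt damon": 7})
emn.add_entity("Interstellar", 50, {"astronaut": 2, "christopher nolan": 4})
```

A cue shared by every entity (here "astronaut") gets the lowest specificity, 1.0, while a cue tied to a single entity (here "matt damon") gets |N|, so distinctive cues weigh more when linking.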

[Figure: Part of the Entity Model Network, with entity nodes (Gravity, Interstellar, The Martian), factual-knowledge cues (Sandra Bullock, Alfonso Cuarón, Christopher Nolan, Matt Damon), and contextual-knowledge cues (Mars orbiter mission, Woman in space, astronaut).]

• $\text{time salience}$ = total number of Wikipedia page views
• $\text{frequency}$ = total number of times the phrase appears in tweets
• $\text{specificity}(c_j) = \dfrac{|N|}{|N_{c_j}|}$, where $|N|$ is the total number of entities and $|N_{c_j}|$ is the number of entities adjacent to cue $c_j$

Entity Linking

• Two-step process

• Step 1: Candidate selection and filtering
  – Objective: prune the search space, reducing the number of entities from the Entity Model Network (EMN) that must be considered in the disambiguation step

• Step 2: Disambiguation
  – Objective: rank the selected candidate entities so that the implicitly mentioned entity is in the top position

Learning to Link - Candidate Selection and Filtering

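[Figure: Candidate selection for “ISRO sends probe to Mars for less money than it takes Hollywood to send a woman to space”. Cues (c1–c9) matched in the tweet lead to their adjacent entities (m1–m8) in the Entity Model Network, which are then scored and filtered.]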

$$\text{score}(m_i) = \sum_{c_j \in \mathbb{C}} \text{specificity}(c_j) \cdot \text{frequency}(c_j, m_i)$$

where $\mathbb{C}$ is the set of matching cues
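The candidate score can be sketched directly from the formula. This assumes the EMN is available as a mapping from each cue to its adjacent entities and their frequencies, that the tweet has already been turned into n-grams, and that each matching cue contributes once per tweet; the function name and toy data are hypothetical, and k = 25 mirrors the top-25 candidate set used in the evaluation.

```python
from collections import defaultdict


def select_candidates(tweet_ngrams, cue_index, num_entities, k=25):
    """Score entities reachable from the tweet's matching cues and keep the top k.

    score(m_i) = sum over matching cues c_j of specificity(c_j) * frequency(c_j, m_i),
    with specificity(c_j) = |N| / |N_{c_j}|.
    cue_index maps cue -> {entity: frequency} (hypothetical EMN layout);
    num_entities is |N| for the whole network.
    """
    scores = defaultdict(float)
    for cue in set(tweet_ngrams):          # each matching cue counted once
        adjacent = cue_index.get(cue)
        if not adjacent:                   # n-gram is not a cue in the network
            continue
        specificity = num_entities / len(adjacent)
        for entity, freq in adjacent.items():
            scores[entity] += specificity * freq
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical toy EMN with |N| = 3 entities (the third has no matching cue).
cue_index = {
    "astronaut": {"Gravity": 5, "The Martian": 3},
    "woman in space": {"Gravity": 4},
    "matt damon": {"The Martian": 7},
}
tweet_ngrams = ["send", "woman", "woman in space", "astronaut", "space"]
candidates = select_candidates(tweet_ngrams, cue_index, num_entities=3)
```

In this toy case Gravity scores 1.5·5 + 3·4 = 19.5 and The Martian 1.5·3 = 4.5; entities that share no cue with the tweet never enter the candidate set, which is what prunes the search space.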

Learning to Link - Disambiguation

• Formulated as a ranking problem

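• SVMrank is used to rank the candidates, with two kinds of features:
  – Similarity between the candidate entity $e_i$ and the tweet, as a vector $(x_1, x_2, x_3, \ldots, x_n)$ with $x_j = \text{specificity}(c_j) \cdot \text{frequency}(c_j, e_i)$
  – Time salience of the candidate entity, normalized over the candidate set:

$$\frac{\text{temporal salience}(e_i)}{\sum_{e \in E_c} \text{temporal salience}(e)}$$

where $E_c$ is the selected candidate set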
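One way to turn these features into SVMrank training data is sketched below, assuming each EMN cue gets a fixed feature id so that x_j always refers to the same cue across tweets, and that the normalized time salience is appended as the last feature. The lines follow SVMrank's `<target> qid:<qid> <feature>:<value>` input format, with one query id per tweet and a higher target for the correct entity; the helper names and numbers are hypothetical.

```python
def candidate_features(entity, tweet_cues, cue_index, num_entities,
                       time_salience, candidates, cue_ids):
    """Return {feature_id: value} for one (tweet, candidate) pair.

    cue_ids maps every EMN cue to a 1-based feature id (an assumption);
    the normalized time salience uses the id after the last cue.
    """
    feats = {}
    for cue in set(tweet_cues):
        adjacent = cue_index.get(cue, {})
        if entity in adjacent:
            specificity = num_entities / len(adjacent)
            feats[cue_ids[cue]] = specificity * adjacent[entity]  # x_j
    total = sum(time_salience[e] for e in candidates)
    feats[len(cue_ids) + 1] = time_salience[entity] / total if total else 0.0
    return feats


def svmrank_line(target, qid, feats):
    """Format one example as an SVMrank line; feature ids must be increasing."""
    body = " ".join(f"{fid}:{val:g}" for fid, val in sorted(feats.items()))
    return f"{target} qid:{qid} {body}"


# Hypothetical toy data for a single tweet (qid 1).
cue_index = {"astronaut": {"Gravity": 5, "The Martian": 3},
             "woman in space": {"Gravity": 4},
             "matt damon": {"The Martian": 7}}
cue_ids = {"astronaut": 1, "woman in space": 2, "matt damon": 3}
salience = {"Gravity": 300, "The Martian": 100}
candidates = ["Gravity", "The Martian"]
tweet_cues = ["woman in space", "astronaut"]

lines = [
    svmrank_line(2 if e == "Gravity" else 1, 1,
                 candidate_features(e, tweet_cues, cue_index, 3,
                                    salience, candidates, cue_ids))
    for e in candidates
]
```

SVMrank only compares targets within the same `qid`, so each tweet's candidates form one ranking problem, matching the per-tweet disambiguation step.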

Evaluation Dataset

• Manually annotated tweets with two entity types
• Tweets were collected in August 2014 using the keywords ‘movie’ and ‘film’ for movies and ‘book’ and ‘novel’ for books

Entity type   Annotation   Tweets   Entities
Movie         Explicit     391      107
              Implicit     207      54
              NIL          117      0
Book          Explicit     200      24
              Implicit     190      53
              NIL          70       0

• Tweets annotated with NIL contain neither an explicit nor an implicit mention of an entity

Entity Model Network for Evaluation

• 15,000 tweets about movies and books from July 2014
• 617 movies and 102 books
• The most recent 1000 tweets per entity are used to build its contextual knowledge
• The May 2014 version of DBpedia is used to extract factual knowledge
• Temporal salience is obtained for July 2014

Evaluation - Implicit Entity Linking

• How many tweets had the correct entity within the selected candidate set (top 25)?
• How many entities were correctly linked by our disambiguation approach?

Entity type   Candidate selection recall   Disambiguation accuracy
Movie         90.33%                       60.97%
Book          94.73%                       61.05%

• Importance of contextual knowledge

Step                         Entity type   Without context   With context
Candidate selection recall   Movie         77.29%            90.33%
                             Book          76.84%            94.73%
Disambiguation accuracy      Movie         51.7%             60.97%
                             Book          50.0%             61.05%
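The two evaluation questions correspond to the metrics below, sketched under the assumption that both are computed per tweet: candidate selection recall counts tweets whose gold entity is in the selected set, and disambiguation accuracy counts tweets whose top-ranked candidate is the gold entity. The function names and toy data are hypothetical.

```python
def candidate_recall(gold, candidate_lists):
    """Fraction of tweets whose gold entity appears in its selected candidates."""
    hits = sum(1 for g, cands in zip(gold, candidate_lists) if g in cands)
    return hits / len(gold)


def disambiguation_accuracy(gold, ranked_lists):
    """Fraction of tweets whose top-ranked candidate is the gold entity."""
    correct = sum(1 for g, ranked in zip(gold, ranked_lists) if ranked and ranked[0] == g)
    return correct / len(gold)


# Hypothetical toy run: three tweets, ranked candidate lists (already cut to top 25).
gold = ["Gravity", "Boyhood", "The Ice Dragon"]
ranked = [["The Martian", "Gravity"], ["Boyhood", "Gravity"], []]
recall = candidate_recall(gold, ranked)
accuracy = disambiguation_accuracy(gold, ranked)
```

Because disambiguation can only rank what candidate selection kept, accuracy is bounded by recall; in this toy run recall is 2/3 and accuracy 1/3.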

Qualitative Error Analysis

Error type: tweet (entity)

• Lack of contextual knowledge: “That Movie Where Shailene Woodley Has Her First Nude Scene? The Is RIGHT HERE!: No one can say Shailene Woodley isn't brave!” (White Bird in a Blizzard)
• Novel entities: “"hey, what's wrawng widdis goose?" RT @TIME: Mark Wahlberg could be starring in a movie about the BP oil spill http://ti.me/1oZh55V” (Deepwater Horizon)
• Cold start of entities: “: George R.R. Martin's Children's Book Gets Re-release http://bit.ly/1qNNH5r” (The Ice Dragon)
• Multiple implicit entity mentions: “That moment when you realize that hazel grace and Augustus are brother and sister in one movie and in love battling cancer in another” (Divergent, The Fault in Our Stars)

Conclusion and Future Work

• Introduced a novel task and studied its characteristics
• Developed a knowledge-driven solution

• Implement operators to capture evolving knowledge
  – A new entity becomes popular and people start to tweet about it, or the popularity of an existing entity fades away
  – A new topic of interest emerges for an existing entity or with the introduction of a new entity, or the popularity of an existing topic fades away
• Develop a technique to identify tweets with implicit entity mentions and their type (“recognition” as in NER)
• Expand the evaluation to other domains and use larger datasets

Thank You
http://www.knoesis.org

Dataset is available at: https://goo.gl/jrwpeo
