Discerning Intelligence from Text (at the UofA)
Denilson Barbosa [email protected]
Web search is changing
• … from IR-style document retrieval
• … to question-answering over entities extracted from the web
which team did Lou Saban coach last?
which was Lou Saban’s last team?
• The answer is the Chowan Braves, and it can be found in Lou Saban’s Wikipedia page (ranked #1), obituary (ranked #4), and so on…
in good company…
An explicit answer
it is not all bad news…
Structured knowledge (harnessed from the Web)
Surface-level relation extraction
Documents: After his departure from Buffalo, Saban returned to coach college football teams including Miami, Army, and UCF.
Pipeline: Recognize Entities → Resolve Coreferences → Split Sentences → Find Relations
Triple store:
<“Lou Saban”, departed from, “Buffalo Bills”>
<“Lou Saban”, coach, “Miami Hurricanes”>
<“Lou Saban”, coach, “Army Black Knights football”>
<“Lou Saban”, coach, “University of Central Florida”>
From triples to a KB…
?????
• There is a very, very, very long way…
§ Predicate disambiguation into semantic “relations”…
§ Named entity disambiguation…
§ Assigning entities to classes…
§ Grouping classes into a hierarchy…
§ Ordering facts temporally…
• It would have been virtually impossible without Wikipedia
In this talk…
• Work on entity linking with random walks … [CIKM’2014]
• A bit of the work on open relation extraction – less on disambiguation
§ SONEX (clustering-based) [TWEB ‘2012]
§ EXEMPLAR (dependency-based) [EMNLP’2013]
§ With Tree Kernels [NAACL’2013]
§ EFFICIENS (cost-constrained)
• A bit of our work on understanding disputes in Wikipedia [Hypertext2012] [ACM TIST’2015]
In this talk…
• Work on entity linking … [CIKM’2014]
Entity Linking
The entity graph
• We perform disambiguation over a graph whose nodes carry the ids of entities in the KB together with their respective context (i.e., text!)
• The Entity Graph has text about the entities ≠ the KB has facts and assertions
[Figure: the sentence “After his departure from Buffalo, Saban returned to coach college football teams including Miami, Army, and UCF.” with each mention connected to its candidate entities]
Buffalo: Buffalo Bills, Buffalo Bulls
Saban: Nick Saban, Lou Saban
Miami: Miami Heat, Miami Hurricanes, Miami Dolphins
Army: Army Black Knights football, US Army
UCF: University of Central Florida, UCF Knights football
The entity graph
• Typically, built from Wikipedia
• Nodes are Wikipedia articles
§ All known names
§ Context: whole article
§ Metadata:
• types, keyphrases,
• type compatibility…
• Edges: E1 – E2 iff:
§ There is a wikilink from E1 to E2
§ There is an article E3 that mentions E1 and E2 close to each other
• Alias dictionary:
§ Mapping from names to ids
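The alias dictionary described above can be pictured as an inverted index from surface names to entity ids. The sketch below is only an illustration: the function names, the lower-casing of names, and the tiny `ENTITY_NAMES` table are assumptions made for the example, not the structure used in the talk.

```python
from collections import defaultdict

def build_alias_dictionary(entity_names):
    """Invert a map of entity id -> known names into name -> set of entity ids."""
    alias = defaultdict(set)
    for entity_id, names in entity_names.items():
        for name in names:
            # Lower-casing is an assumption of this sketch, not a stated design choice.
            alias[name.lower()].add(entity_id)
    return alias

def candidates(alias, mention):
    """Return the candidate entity ids for a mention, in a stable order."""
    return sorted(alias.get(mention.lower(), set()))

# Hypothetical toy data; a real alias dictionary would be harvested from Wikipedia.
ENTITY_NAMES = {
    "Lou Saban": ["Lou Saban", "Saban"],
    "Nick Saban": ["Nick Saban", "Saban"],
    "Saban Capital Group": ["Saban Capital Group", "Saban"],
    "Miami Hurricanes": ["Miami Hurricanes", "Miami"],
    "Miami Dolphins": ["Miami Dolphins", "Miami"],
}

alias = build_alias_dictionary(ENTITY_NAMES)
saban_candidates = candidates(alias, "Saban")
unknown_candidates = candidates(alias, "Chowan")
```

A mention that is missing from the dictionary simply yields no candidates, which is where lookups against disambiguation pages or acronym expansion would come in.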
Entity linking – main steps
• Candidate Selection: find a small set of good candidates for each mention → using the alias dictionary
• Mention disambiguation: assign each mention to one of its candidates
Candidate Selection
• On the KB: alias-dictionary expansion
§ Saban : {Nick Saban, Lou Saban, Saban Capital Group, …}
• On the document:
§ Lookups: alias-dictionary/Wikipedia disambiguation pages
§ Co-reference resolution [Cucerzan’07]
§ Acronym expansion [Zhang et al.’10, Zhang et al.’11] (ABC -> Australian Broadcasting Corporation)
Local mention disambiguation, e.g., [Cucerzan’2007]
• Disambiguate each mention in isolation
$$\mathrm{ent}(m) = \arg\max_{e \in \mathrm{candidates}(m)} \big( \alpha \cdot \mathrm{prior}(m, e) + \beta \cdot \mathrm{sim}(m, e) \big)$$
• prior(m, e): freq(e|m), indegree(e), length(context(e))
• sim(m, e): cosine/Dice/KL(context(m), context(e))
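As a rough illustration of local disambiguation, the sketch below scores each candidate with a weighted sum of a prior and a cosine similarity between bag-of-words contexts. The weights, the priors, and the context strings are toy values invented for the example; `link_locally` is a hypothetical helper, not code from any of the cited systems.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def link_locally(mention_context, candidate_info, alpha=0.5, beta=0.5):
    """Pick the candidate maximizing alpha * prior + beta * sim for one mention.

    candidate_info maps entity -> (prior, entity context text).
    """
    m_vec = Counter(mention_context.lower().split())
    scores = {}
    for entity, (prior, entity_context) in candidate_info.items():
        e_vec = Counter(entity_context.lower().split())
        scores[entity] = alpha * prior + beta * cosine(m_vec, e_vec)
    best = max(scores, key=scores.get)
    return best, scores

# Toy candidates for the mention "Saban" (priors and contexts are made up).
toy_candidates = {
    "Lou Saban": (0.009, "american football coach college football teams"),
    "Nick Saban": (0.009, "american football coach alabama"),
    "Saban Capital Group": (0.545, "investment firm media entertainment"),
}
best, scores = link_locally("returned to coach college football teams", toy_candidates)
```

With these toy numbers the context similarity outweighs the high prior of the investment firm, but note that the mention is still scored without looking at any other mention in the document.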
Local mention disambiguation
• Problematic assumption: mentions are independent of each other
Saban = Nick Saban → Miami = Miami Dolphins
Saban = Lou Saban → Miami = Miami Hurricanes
Global mention disambiguation, e.g., [Hoffart’2011]
• Disambiguate all mentions at once
$$\mathrm{ent}(m) = \arg\max_{e \in \mathrm{candidates}(m)} \big( \alpha \cdot \mathrm{prior}(m, e) + \beta \cdot \mathrm{sim}(m, e) + \gamma \cdot \mathrm{coherence}(G_{\mathrm{ent}}) \big)$$
Global mention disambiguation
• Coherence captures the assumption that the input document has a single theme or topic
§ E.g., rock music, or the world cup final match
§ NP-hard optimization in general
Global mention disambiguation
• [Hoffart et al 2011] – dense sub-graph problem
• Greedy algorithm: remove non-taboo entities until a minimal subgraph with highest weight is found
§ entity-entity: overlap of anchor words, overlap of links, type similarity
§ mention-entity: sim(m,e), keyphraseness(m,e)
• post-processing
Global mention disambiguation
• Rel-RW: Robust entity linking with Random Walks [CIKM2014]
• Global notion of entity-entity similarity
• Greedy algorithm: iteratively disambiguate mentions; start with the easiest ones
Random Walks as context representation
• Random walks capture indirect relatedness between nodes in the graph
k candidates, n nodes in total
Random Walks as context representation
Relatedness between entities
Entity Semantic Signature
Document Semantic Signature
• One vector for each entity, and another for the whole document
• Similarity is measured using Zero-KL Divergence
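The slides do not spell out Zero-KL Divergence, so the sketch below assumes the commonly cited form: ordinary KL divergence, except that a coordinate where the second distribution is zero contributes a fixed penalty `gamma` instead of an infinite term. Both the formula and the default `gamma = 20.0` are assumptions for illustration.

```python
import math

def zero_kl(p, q, gamma=20.0):
    """Zero-KL divergence between two equal-length probability vectors.

    Assumed definition: sum_i p_i * log(p_i / q_i) where q_i > 0,
    and sum_i p_i * gamma where q_i == 0. Lower means more similar.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # zero-probability coordinates of p contribute nothing
        total += p_i * (math.log(p_i / q_i) if q_i > 0.0 else gamma)
    return total

# Toy signatures over three graph nodes (made-up numbers).
entity_signature = [0.5, 0.3, 0.2]
close_document = [0.4, 0.4, 0.2]
sparse_document = [0.5, 0.5, 0.0]

d_same = zero_kl(entity_signature, entity_signature)
d_close = zero_kl(entity_signature, close_document)
d_sparse = zero_kl(entity_signature, sparse_document)
```

The penalty keeps the divergence finite when a document signature gives no mass to a node that matters for the entity, which is frequent with sparse signatures.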
Semantic Signatures of Entities
• Restart from the entity with probability α (e.g. 0.15)
§ Until convergence
• Repeat for the candidate mentions only
[Figure: entity graph with nodes Buffalo Bills, Buffalo Bulls, Nick Saban, Lou Saban, Miami Heat, Miami Hurricanes, Miami Dolphins, Army Black Knights football, US Army, University of Central Florida, UCF Knights football, …]
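A semantic signature of this kind can be approximated with a random walk with restart, iterated until the distribution stops changing. The sketch below is a plain power iteration over an unweighted adjacency list; the graph, the uniform transition weights, and the handling of dangling nodes are assumptions of the example rather than details taken from the talk.

```python
def semantic_signature(graph, start, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary distribution of a random walk that restarts at `start`.

    graph maps each node to a list of neighbours; every neighbour must be a key.
    """
    nodes = list(graph)
    rank = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = dict.fromkeys(nodes, 0.0)
        nxt[start] += restart  # restart mass (the ranks always sum to 1)
        for n in nodes:
            mass = (1.0 - restart) * rank[n]
            neighbours = graph[n]
            if neighbours:
                for nb in neighbours:
                    nxt[nb] += mass / len(neighbours)
            else:
                nxt[start] += mass  # dangling node: send its mass back to the start
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank

# Hypothetical toy graph; the edges are invented for illustration.
toy_graph = {
    "Lou Saban": ["Buffalo Bills", "Miami Hurricanes"],
    "Buffalo Bills": ["Lou Saban", "Miami Dolphins"],
    "Miami Hurricanes": ["Lou Saban"],
    "Miami Dolphins": ["Buffalo Bills", "Nick Saban"],
    "Nick Saban": ["Miami Dolphins"],
}
signature = semantic_signature(toy_graph, "Lou Saban")
```

Nodes that are only reachable through several hops still receive some weight, which is how the signature captures indirect relatedness.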
Semantic Signatures of Documents
• (From [Milne & Witten 2008]): If there are unambiguous mentions, use only their entities to find the signature of the document
• Otherwise, use all candidate entities
[Figure: the example sentence and its candidate entities, with one mention marked as unambiguous]
Algorithm
• Find candidate entities for each mention
• Compute prior(m,e) and context(m,e)
• Sort mentions by ambiguity (i.e., number of candidates)
• Go through each mention m in ascending order:
§ SSd = semantic signature of document
§ Assign to m the candidate e with highest combined score prior(m,e) * context(m,e) + sim(SSe, SSd)
§ Update the set of entities for the document
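The loop listed above might be sketched as follows. This is an illustration under stated assumptions: `sem_sim` is a caller-supplied stand-in for sim(SSe, SSd), the document is represented by the entities resolved so far (or by all candidates before anything is resolved), and the combined score is copied literally from the bullet. The relatedness pairs are invented; the prior and context values are taken from the worked example.

```python
def link_document(candidates, prior, ctx, sem_sim):
    """Greedy iterative linking: resolve the least ambiguous mentions first.

    candidates: mention -> list of candidate entities
    prior, ctx: (mention, entity) -> float
    sem_sim:    function(entity, document_entities) -> float (hypothetical)
    """
    all_candidates = {e for cands in candidates.values() for e in cands}
    resolved = {}
    for mention in sorted(candidates, key=lambda m: len(candidates[m])):
        # Assumption: use resolved entities when available, else all candidates.
        doc_entities = set(resolved.values()) or all_candidates

        def score(entity):
            return prior[mention, entity] * ctx[mention, entity] + sem_sim(entity, doc_entities)

        resolved[mention] = max(candidates[mention], key=score)
    return resolved

# Invented relatedness pairs used by the toy similarity below.
RELATED = {
    frozenset({"Lou Saban", "UCF Knights football"}),
    frozenset({"UCF Knights football", "University of Central Florida"}),
}

def toy_sem_sim(entity, doc_entities):
    """Fraction of the other document entities that are related to `entity`."""
    others = [d for d in doc_entities if d != entity]
    if not others:
        return 0.0
    return sum(frozenset({entity, d}) in RELATED for d in others) / len(others)

toy_candidates = {
    "Saban": ["Lou Saban", "Nick Saban", "Saban Capital Group"],
    "UCF": ["UCF Knights football", "University of Central Florida"],
}
toy_prior = {
    ("Saban", "Lou Saban"): 0.009, ("Saban", "Nick Saban"): 0.009,
    ("Saban", "Saban Capital Group"): 0.545,
    ("UCF", "UCF Knights football"): 0.133, ("UCF", "University of Central Florida"): 0.167,
}
toy_ctx = {
    ("Saban", "Lou Saban"): 0.28, ("Saban", "Nick Saban"): 0.15,
    ("Saban", "Saban Capital Group"): 0.13,
    ("UCF", "UCF Knights football"): 0.18, ("UCF", "University of Central Florida"): 0.13,
}
links = link_document(toy_candidates, toy_prior, toy_ctx, toy_sem_sim)
```

Resolving "UCF" first changes the document representation, so "Saban" is then scored against a document that already contains a football team.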
Algorithm
Mentions [ambiguity], with candidates [PriorProb, CtxSim, SemSim]. Use all candidates for SSd.
• UCF [33]
§ UCF Knights football [0.133, 0.18, 0.50]
§ University of Central Florida [0.167, 0.13, 0.52]
§ UCF Knights basketball [0.041, 0.13, 0.34]
• Saban [45]
§ Lou Saban [0.009, 0.28, 0.41]
§ Nick Saban [0.009, 0.15, 0.54]
§ Saban Capital Group [0.545, 0.13, 0.20]
• Buffalo [317]
§ Buffalo, New York [0.467, 0.07, 0.54]
§ Buffalo Bulls football [0.024, 0.11, 0.50]
§ Buffalo Bills [0.021, 0.09, 0.58]
• Miami [343]
§ Miami [0.632, 0.07, 0.61]
§ Miami Hurricanes football [0.029, 0.12, 0.58]
§ Miami Dolphins [0.011, 0.10, 0.56]
• Army [402]
§ Army Black Knights football [0.062, 0.09, 0.52]
§ Maryland Terrapins football [0.001, 0.07, 0.56]
§ Army [0.155, 0.04, 0.34]
Algorithm
Ed = {UCF Knights football}
• UCF [33] → UCF Knights football [0.133, 0.18, 0.50]
• Saban, Buffalo, Miami, Army: candidates and scores unchanged
Algorithm
Ed = {UCF Knights football, Lou Saban}
• UCF [33] → UCF Knights football [0.133, 0.18, 0.50]
• Saban [45], re-scored → Lou Saban [0.009, 0.28, 0.51]
§ Lou Saban [0.009, 0.28, 0.41] → [0.009, 0.28, 0.51]
§ Nick Saban [0.009, 0.15, 0.54] → [0.009, 0.15, 0.58]
§ Saban Capital Group [0.545, 0.13, 0.20] → [0.545, 0.13, 0.18]
• Buffalo, Miami, Army: candidates and scores unchanged
Algorithm
Ed = {UCF Knights football, Lou Saban, Buffalo Bills}
• UCF [33] → UCF Knights football [0.133, 0.18, 0.50]
• Saban [45] → Lou Saban [0.009, 0.28, 0.51]
• Buffalo [317], re-scored → Buffalo Bills [0.021, 0.09, 0.95]
§ Buffalo Bills [0.021, 0.09, 0.58] → [0.021, 0.09, 0.95]
§ Buffalo Bulls football [0.024, 0.11, 0.50] → [0.024, 0.11, 0.71]
§ Buffalo, New York [0.467, 0.07, 0.54] → [0.467, 0.07, 0.42]
• Miami, Army: candidates and scores unchanged
Algorithm
Ed = {UCF Knights football, Lou Saban, Buffalo Bills, Miami Hurricanes football}
• UCF [33] → UCF Knights football [0.133, 0.18, 0.50]
• Saban [45] → Lou Saban [0.009, 0.28, 0.51]
• Buffalo [317] → Buffalo Bills [0.021, 0.09, 0.95]
• Miami [343], re-scored → Miami Hurricanes football [0.029, 0.12, 0.93]
§ Miami Hurricanes football [0.029, 0.12, 0.58] → [0.029, 0.12, 0.93]
§ Miami Dolphins [0.011, 0.10, 0.56] → [0.011, 0.10, 0.98]
§ Miami [0.632, 0.07, 0.61] → [0.632, 0.07, 0.48]
• Army: candidates and scores unchanged
Algorithm
• UCF [33] → UCF Knights football [0.133, 0.18, 0.50]
• Saban [45] → Lou Saban [0.009, 0.28, 0.51]
• Buffalo [317] → Buffalo Bills [0.021, 0.09, 0.95]
• Miami [343] → Miami Hurricanes football [0.029, 0.12, 0.93]
• Army [402], re-scored → Army Black Knights football [0.062, 0.09, 0.74]
§ Army Black Knights football [0.062, 0.09, 0.52] → [0.062, 0.09, 0.74]
§ Maryland Terrapins football [0.001, 0.07, 0.56] → [0.001, 0.07, 0.83]
§ Army [0.155, 0.04, 0.34] → [0.155, 0.04, 0.28]
“Paul, John, Ringo, and George”
Mentions [ambiguity], with candidates [PriorProb, CtxSim, SemSim]
• Ringo [35]
§ Ringo Starr [0.266, 0.08, 0.42]
§ Ringo (album) [0.297, 0.09, 0.30]
§ Ringo Rama [0.010, 0.14, 0.27]
§ Johnny Ringo [0.010, 0.18, 0.20]
• Paul [379]
§ Paul the Apostle [0.354, 0.06, 0.42]
§ Paul McCartney [0.055, 0.06, 0.51]
§ Paul Field [0.001, 0.12, 0.25]
§ Paul I of Russia [0.026, 0.04, 0.25]
• George [807]
§ George Scott [0.002, 0.10, 0.21]
§ George Moore [0.001, 0.10, 0.20]
§ George Costanza [0.07, 0.06, 0.24]
§ George Harrison [0.011, 0.03, 0.47]
• John [1699]
§ John Lennon [0.007, 0.05, 0.53]
§ Gospel of John [0.154, 0.03, 0.33]
§ John the Apostle [0.038, 0.04, 0.33]
§ John, King of England [0.066, 0.03, 0.28]
“Paul, John, Ringo, and George”
Ed = {Ringo Starr}
• Ringo [35] → Ringo Starr [0.266, 0.08, 0.42]
• Paul, George, John: candidates and scores unchanged
“Paul, John, Ringo, and George”
Ed = {Ringo Starr, Paul McCartney}
• Ringo [35] → Ringo Starr [0.266, 0.08, 0.42]
• Paul [379], re-scored → Paul McCartney [0.055, 0.06, 1.25]
§ Paul McCartney [0.055, 0.06, 0.51] → [0.055, 0.06, 1.25]
§ Paul the Apostle [0.354, 0.06, 0.42] → [0.354, 0.06, 0.24]
§ Paul Field [0.001, 0.12, 0.25] → [0.001, 0.12, 0.22]
§ Paul I of Russia [0.026, 0.04, 0.25] → [0.026, 0.04, 0.19]
• George, John: candidates and scores unchanged
“Paul, John, Ringo, and George”
Ed = {Ringo Starr, Paul McCartney, George Harrison}
• Ringo [35] → Ringo Starr [0.266, 0.08, 0.42]
• Paul [379] → Paul McCartney [0.055, 0.06, 1.25]
• George [807], re-scored → George Harrison [0.011, 0.03, 1.30]
§ George Harrison [0.011, 0.03, 0.47] → [0.011, 0.03, 1.30]
§ George Scott [0.002, 0.10, 0.21] → [0.002, 0.10, 0.20]
§ George Costanza [0.07, 0.06, 0.24] → [0.07, 0.06, 0.24]
§ George Moore [0.001, 0.10, 0.20] → [0.001, 0.10, 0.19]
• John: candidates and scores unchanged
“Paul, John, Ringo, and George”
• Ringo [35] → Ringo Starr [0.266, 0.08, 0.42]
• Paul [379] → Paul McCartney [0.055, 0.06, 1.25]
• George [807] → George Harrison [0.011, 0.03, 1.30]
• John [1699], re-scored → John Lennon [0.007, 0.05, 1.20]
§ John Lennon [0.007, 0.05, 0.53] → [0.007, 0.05, 1.20]
§ Gospel of John [0.154, 0.03, 0.33] → [0.154, 0.03, 0.22]
§ John the Apostle [0.038, 0.04, 0.33] → [0.038, 0.04, 0.23]
§ John, King of England [0.066, 0.03, 0.28] → [0.066, 0.03, 0.21]
Entity Linking – Evaluation
• Public benchmarks
Datasets    # of mentions    # of articles
MSNBC       739              20
AQUAINT     727              50
ACE2004     306              57
• Synthetic Wikipedia dataset
§ Generated based on the popularity of entities, e.g., dataset 0.3-0.4 means the accuracy of prior probability is between 0.3 and 0.4.
§ 8 datasets: 0.3-0.4, 0.4-0.5, … 0.9-1.0, 1.0-1.1
§ 40 documents, each document has 20-40 entities.
• Evaluation Metrics
§ Accuracy
§ Micro F1: average F-1 per mention
§ Macro F1: average F-1 per document
Entity Linking – Evaluation
• Results on MSNBC, AQUAINT, ACE2004
Systems      MSNBC                     AQUAINT                   ACE2004
             Accuracy  F1@MI  F1@MA    Accuracy  F1@MI  F1@MA    Accuracy  F1@MI  F1@MA
PriorProb    85.98     86.50  87.15    84.87     87.27  87.16    84.82     85.49  87.13
Prior-Type   81.86     82.81  83.84    83.22     85.57  85.08    80.93     84.04  86.08
Local        77.43     77.91  72.30    66.44     68.32  68.09    61.48     61.96  56.95
Cucerzan     87.80     88.34  87.76    76.62     78.67  78.22    78.99     79.30  78.22
M&W          68.45     78.43  80.37    79.92     85.13  84.84    75.54     81.29  84.25
Han’11       87.65     88.46  87.93    77.16     79.46  78.80    72.76     73.48  66.80
AIDA         76.83     78.81  76.26    52.54     56.47  56.46    77.04     80.49  84.13
GLOW         65.55     75.37  77.33    75.65     83.14  82.97    75.49     81.91  83.18
RI           88.57     90.22  90.87    85.01     87.72  87.74    82.35     86.60  87.13
REL-RW       91.62     92.18  92.10    88.45     90.82  90.51    84.43     87.68  89.23
• Prior probability is a strong baseline.
• Benchmarks are biased towards popular entities
• Representativeness? (e.g. long tails of the mentions in the Web)
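The F1@MI and F1@MA columns are micro- and macro-averaged F1. The sketch below follows the usual convention, which is an assumption here because the slides do not give the exact protocol: micro pools all mentions across documents before computing F1, macro computes F1 per document and averages. The input format and the toy documents are invented for the example.

```python
def f1(correct, predicted, gold):
    """F1 from counts of correct links, predicted links, and gold links."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def micro_macro_f1(documents):
    """documents: list of (gold, predicted) dicts mapping mention -> entity.

    A system may leave mentions unlinked, so `predicted` can be smaller than `gold`.
    """
    total_correct = total_predicted = total_gold = 0
    per_document = []
    for gold, predicted in documents:
        correct = sum(1 for m, e in predicted.items() if gold.get(m) == e)
        total_correct += correct
        total_predicted += len(predicted)
        total_gold += len(gold)
        per_document.append(f1(correct, len(predicted), len(gold)))
    micro = f1(total_correct, total_predicted, total_gold)
    macro = sum(per_document) / len(per_document) if per_document else 0.0
    return micro, macro

# Toy evaluation: one long document linked perfectly, one short document linked wrongly.
toy_documents = [
    ({"Paul": "Paul McCartney", "John": "John Lennon",
      "Ringo": "Ringo Starr", "George": "George Harrison"},
     {"Paul": "Paul McCartney", "John": "John Lennon",
      "Ringo": "Ringo Starr", "George": "George Harrison"}),
    ({"Saban": "Lou Saban"}, {"Saban": "Nick Saban"}),
]
micro, macro = micro_macro_f1(toy_documents)
```

In the toy data the two averages diverge because the short document counts as much as the long one under macro averaging, which is why the table reports both.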
Entity Linking – Evaluation
REL-RW, Han’11: graph-based measure
Cucerzan, AIDA: lexical measure
M&W, RI, GLOW: link-based measure
Entity Linking – Evaluation
• Different configurations
§ Iterative process performs best
§ Unambiguous mentions are more informative than candidates.
§ Robust performance with different weighting schemes.
Robust Entity Linking with Random Walks
• Intuition: less popular entities are more likely to be better linked than well described (i.e., have a lot of text)
• Our semantic similarity has a natural interpretation and relies more on the graph than on the document content
• Mention disambiguation without global notion of coherence
• Use a greedy iterative approach: disambiguate the “easiest” mention, re-compute everything, repeat
• Robust against bad parameter choice
• No learning! Previous state of the art [Hoffart’2011, Milne&Wi