
AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding

Xiang Ren+, Wenqi He+, Meng Qu+, Lifu Huang#, Heng Ji#, Jiawei Han+
+ University of Illinois at Urbana-Champaign, Urbana, IL
# Rensselaer Polytechnic Institute, Troy, NY

Context-Dependent Entity Typing

• Given token spans of entity mentions in text, the task is to classify them into types of interest

  [Barack Obama] (PERSON) arrived this afternoon in [Washington, D.C.] (LOCATION). [President Obama]'s wife [Michelle] accompanied him.

  [TNF alpha] (PROTEIN) is produced chiefly by activated [macrophages] (CELL).

Entity Types: From Coarse Labels to Fine-Grained Labels

ID | Sentence
S2 | The fourth movie in the Predator series, entitled 'The Predator', may see the return of action-movie star Arnold Schwarzenegger to the franchise.

[Figure: a set of a few common types (Person, Location, Organization, ...) versus a type hierarchy with 100+ types (root → person, location, organization, product, ...; person → politician, artist, businessman, ...; artist → author, actor, singer, ...)]

• Fine-grained entity type features help deeper NLP tasks
  – Relation extraction: a 93% improvement on NYT news articles, reported by (Ling and Weld, 2012)
  – Coreference resolution
• Assists downstream applications
  – Question answering systems
  – Knowledge base completion

*Ling and Weld, "Fine-Grained Entity Recognition", AAAI 2012

How to Get Labeled Data?

• Human annotation (for 100+ entity types)
  – Cost cannot scale up!
  – Error-prone
• Crowdsourcing? Hard for (non-expert) annotators to distinguish over 100 types consistently
• Distant supervision* — heuristically labels a large corpus with KB types

ID | Sentence
S1 | Governor Arnold Schwarzenegger gives a speech at Mission Serve's service project on Veterans Day 2010.
S2 | The fourth movie in the Predator series, entitled 'The Predator', may see the return of action-movie star Arnold Schwarzenegger to the franchise.
S3 | Schwarzenegger's first property investment was a block of six units, for which he scraped together $US27,000.

[Figure: target type hierarchy, with the candidate sub-tree under person highlighted]

Noisy training examples for entity Arnold Schwarzenegger (the candidate type set is the sub-tree of its types in the knowledge base):
  – S1: Mention "Arnold Schwarzenegger"; Context: S1; Candidate type set: {person, politician, artist, actor, author, businessman, athlete}
  – S2: Mention "Arnold Schwarzenegger"; Context: S2; Candidate type set: {person, politician, artist, actor, author, businessman, athlete}
  – S3: Mention "Schwarzenegger"; Context: S3; Candidate type set: {person, politician, artist, actor, author, businessman, athlete}

*Mintz et al., "Distant supervision for relation extraction without labeled data", ACL 2009
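To make the distant-supervision step concrete, here is a minimal sketch of how candidate type sets can be generated from KB facts; the toy KB, alias table, and helper names (`kb_types`, `alias_to_entity`, `label_mention`) are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch of distant supervision for entity typing.
# All data and helper names here are hypothetical illustrations.

# Entity-type facts from a knowledge base: entity -> set of KB types.
kb_types = {
    "Arnold_Schwarzenegger": {"person", "politician", "artist", "actor",
                              "author", "businessman", "athlete"},
}

# Toy "entity linker": map a mention string to a KB entity (if any).
alias_to_entity = {
    "Arnold Schwarzenegger": "Arnold_Schwarzenegger",
    "Schwarzenegger": "Arnold_Schwarzenegger",
}

def label_mention(mention, sentence_id):
    """Return a distantly supervised training example (or None if unlinkable)."""
    entity = alias_to_entity.get(mention)
    if entity is None:
        return None  # unlinkable mention: left for the typing model at test time
    # Every KB type of the entity becomes a candidate label for the mention,
    # regardless of the sentence context -- this is the source of label noise.
    return {"mention": mention, "context": sentence_id,
            "candidate_types": kb_types[entity]}

print(label_mention("Schwarzenegger", "S3"))
```

Note how the same candidate set is produced for every context, which is exactly the noise problem described next.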

Automatic Fine-Grained Entity Typing

• Problem: how to learn an effective model that predicts a single type-path for each unlinkable entity mention, using the automatically labeled training corpus

[Pipeline figure: text corpus → NER + distant supervision (with knowledge bases) → automatically labeled corpus → typing model → predictions for unlinkable mentions]

Challenges

• Noisy type labels
  – Context-agnostic type assignment to entity mentions: distant supervision gives every mention of an entity the same candidate type set, regardless of context (S1 suggests politician, S2 actor, S3 businessman, yet all three mentions of Arnold Schwarzenegger receive {person, politician, artist, actor, author, businessman, athlete})
  – How severe? A noisy mention is an entity mention that is assigned multiple sibling types in the given type hierarchy

  – What about just removing the noisy mentions*? A significant loss (>20%) of training instances!

*Gillick et al., "Context-Dependent Fine-Grained Entity Type Tagging", 2014

• Type correlation: from independent to correlated types
  [Figure: in the embedding space, actor should be more similar to singer and less similar to politician]
• How to deal with infrequent (fine-grained) entity types?

Our Solution: "AFET"

• Jointly embed entity mentions and type labels into a low-dimensional vector space
• Design a noise-robust loss function to model "false positive" labels in the training data
• Enforce adaptive margins on entity mentions, to encode type correlation
• Contributions
  – Minimally supervised: distant supervision only
  – Effective: consistent gains over the state of the art
  – Efficient: scalable to large training corpora; fast inference

AFET: Framework Overview

ID | Sentence
S1 | Governor [Arnold Schwarzenegger] gives a speech at Mission Serve's service project on Veterans Day 2010.

1. Extract text features*
2. Partition the training set into clean and noisy mentions
   – "S1_Arnold Schwarzenegger" — noisy mention; candidate types: {person, politician, artist, actor, author, businessman, athlete}
   – "S4_Ted Cruz" — clean mention; candidate types: {person, politician}
3. Jointly embed text features and type labels
4. Type inference

*Yogatama et al., "Embedding methods for fine grained entity type classification", ACL 2015.
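The partition in step 2 can be computed directly from each mention's candidate type set: a mention is clean when its candidates lie on a single type-path in the hierarchy, and noisy otherwise. A minimal sketch under a toy hierarchy; the parent map and helper names are my assumptions.

```python
# Sketch of the clean/noisy training-set partition.
# A mention is "clean" if its candidate types form a single type-path
# in the hierarchy, and "noisy" otherwise (e.g., it has sibling types).
# The toy hierarchy and helper names are illustrative assumptions.

PARENT = {  # child -> parent in the target type hierarchy
    "person": "root", "politician": "person", "artist": "person",
    "businessman": "person", "athlete": "person",
    "actor": "artist", "author": "artist", "singer": "artist",
}

def path_to_root(t):
    path = [t]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path

def is_clean(candidate_types):
    """True iff all candidate types lie on one root-to-node path."""
    deepest = max(candidate_types, key=lambda t: len(path_to_root(t)))
    return set(candidate_types) <= set(path_to_root(deepest))

print(is_clean({"person", "politician"}))           # True  -> clean mention
print(is_clean({"person", "politician", "actor"}))  # False -> noisy mention
```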

(Paper excerpt, Sections 2-3.) In specific domains (e.g., reviews, tweets) where public datasets are unavailable, one can utilize distant supervision (Ling and Weld, 2012) to automatically label the corpus: an entity linker (Shen et al., 2014) detects mentions m_i (in set M) and maps them to one or more entities e_i in E; the types of e_i in the KB are then associated with m_i to form its type set Y_i = {y | (e_i, y) ∈ T, y ∈ Y}.

Problem Description. Since Y_i is annotated for entity e_i, it includes all possible types of e_i and thus may contain types that are irrelevant to m_i's specific context c_i. Ideally, the type labels for m_i should form a type-path (not required to end at a leaf) in Y (Yogatama et al., 2015; Gillick et al., 2014; Yosef et al., 2012), which serves as a context-dependent type annotation for m_i. However, as discussed in (Gillick et al., 2014), Y_i may contain type-paths that are irrelevant to m_i in c_i; and even when Y_i is already a type-path, it may be overly specific for c_i. We denote the true type-path for mention m_i as Y_i*. This work focuses on estimating Y_i* from Y_i based on the mention m_i as well as its context c_i, where the candidate type set Y_i may contain (1) types that are irrelevant to c_i, and (2) types that are overly specific for c_i. Formally:

Definition 1 (Problem Definition). Given a training corpus D = {(m_i, c_i, Y_i)}_{i=1}^N, a KB with type schema Y_KB and entity-type facts T = {(e, y)}, and a target type hierarchy Y ⊆ Y_KB, the task of automatic fine-grained entity typing is to estimate a single type-path Y_i* ⊆ Y_i for each mention m_i ∈ M, based on m_i itself and its context c_i.

3. Hierarchical Partial-Label Embedding

This section follows the notations in Table 3 to formulate an optimization problem for joint embedding of entity mentions and type labels into a low-dimensional vector space.

Table 3: Notations.
D — text corpus
M = {m_i}_{i=1}^N — entity mentions in D (size N)
M_c, M_n — clean and noisy mentions in M
Y = {y_k}_{k=1}^K — target entity types (size K)
Y_i — candidate types of m_i
Ȳ_i = Y \ Y_i — non-candidate types of m_i
F = {f_j}_{j=1}^M — text features in D (size M)
m_i ∈ R^M — feature vector for m_i
y_k ∈ R^K — type label vector for y_k
U ∈ R^{d×M} — mapping for M to the d-dim space; u_j ∈ R^d is the embedding of f_j (the j-th column of U)
V ∈ R^{d×K} — mapping for Y to the d-dim space; v_k ∈ R^d is the embedding of y_k (the k-th column of V)

The Joint Mention-Type Model. We learn mappings into a low-dimensional vector space in which both entity mentions and type labels are represented, and in which two objects are embedded close to each other if and only if they share similar types. The mapping functions for mentions and for type labels differ, since the two have different representations in the raw feature space, but they are learned jointly by optimizing a single objective that handles the aforementioned challenges.

We start with the representation of entity mentions. To capture the shallow syntax and distributional semantics of a mention m_i ∈ M, we extract various features from both m_i itself (e.g., head token) and its context c_i (e.g., bigrams). Table 2 lists the text features used in this work, which are similar to those in (Yogatama et al., 2015; Ling and Weld, 2012). We denote the set of M unique features extracted from D as F = {f_j}_{j=1}^M. Each mention m_i ∈ M is represented by an M-dimensional feature vector m_i ∈ R_+^M, where m_{i,j} is the number of occurrences of feature f_j for m_i. Each type label y_k ∈ Y is represented by a K-dimensional indicator vector y_k ∈ {0,1}^K with y_{k,k} = 1 and zeros elsewhere.

We aim to learn a mapping from the mention feature space to a low-dimensional vector space, φ(m_i): R^M → R^d, and a mapping from the type label space to the same space, ψ(y_k): R^K → R^d. In this work we adopt linear maps:

  φ(m_i) = U m_i;  ψ(y_k) = V y_k,   (1)

where U ∈ R^{d×M} and V ∈ R^{d×K} are the projection matrices for mentions and type labels, respectively.

Mapping Mentions & Types into A Joint Space

[Diagram: the binary feature vector for "S4_Ted Cruz" (dim = M) times the feature matrix U (d-by-M) gives a point in the d-dim vector space; the indicator vector for politician (dim = K) times the type matrix V (d-by-K) gives a point in the same space]

Similarity is measured by the dot product:

  f_k(m_i) = m_i^T U^T V y_k
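To make Eq. (1) and the dot-product score concrete, a tiny NumPy sketch; the dimensions and random projection matrices are placeholders (in AFET, U and V are learned).

```python
import numpy as np

# Toy dimensions: M text features, K types, d-dimensional joint space.
M, K, d = 1000, 113, 50
rng = np.random.default_rng(0)

U = rng.normal(scale=0.1, size=(d, M))  # feature projection matrix (learned)
V = rng.normal(scale=0.1, size=(d, K))  # type projection matrix (learned)

m_i = np.zeros(M)          # sparse count vector of a mention's features
m_i[[3, 57, 402]] = 1.0    # e.g., head token, context bigram, word shape

def score(m, k):
    """f_k(m) = m^T U^T V y_k; y_k is an indicator, so V y_k = V[:, k]."""
    return (U @ m) @ V[:, k]

scores = np.array([score(m_i, k) for k in range(K)])
print("best-scoring type id:", scores.argmax())
```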

How to Learn Mapping Matrices?

[Diagram: in f_k(m_i) = m_i^T U^T V y_k, both the feature matrix U (d-by-M) and the type matrix V (d-by-K) are unknown and must be learned]

Feature | Description | Example
Head | Syntactic head token of the mention | "HEAD_Turing"
Token | Tokens in the mention | "Turing", "Machine"
POS | Part-of-speech tag of tokens in the mention | "NN"
Character | All character trigrams in the head of the mention | ":tu", "tur", ..., "ng:"
Word Shape | Word shape of the tokens in the mention | "Aa" for "Turing"
Length | Number of tokens in the mention | "2"
Context | Unigrams/bigrams before and after the mention | "CXT_B:Maserati ,", "CXT_A:and the"
Brown Cluster | Brown cluster ID for the head token (learned using D) | "4_1100", "8_1101111", "12_111011111111"
Dependency | Stanford syntactic dependency (Manning et al., 2014) associated with the head token | "GOV:nn", "GOV:turing"

Table 2: Text features used in this paper. "Turing Machine" is used as an example mention from "The band's former drummer Jerry Fuchs—who was also a member of Maserati, Turing Machine and The Juan MacLean—died after falling down an elevator shaft."
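A minimal sketch of how a few of the Table 2 features could be generated for one mention; the tokenization, head-token choice, and feature-name formats are illustrative guesses, not the authors' exact pipeline.

```python
# Illustrative extraction of a few Table 2 features for one mention.
# Feature-name formats mimic the table's examples but are assumptions.

def extract_features(tokens, start, end):
    """Features for the mention tokens[start:end] within a tokenized sentence."""
    mention = tokens[start:end]
    head = mention[-1]  # crude head choice: last token (a real pipeline uses a parser)
    feats = []
    feats.append(f"HEAD_{head}")
    feats += [f"TOKEN_{t}" for t in mention]
    padded = f":{head.lower()}:"
    feats += [padded[i:i + 3] for i in range(len(padded) - 2)]   # char trigrams
    feats.append("SHAPE_" + "".join("A" if c.isupper() else "a" for c in head))
    feats.append(f"LEN_{len(mention)}")
    if start >= 2:                       # context bigram before the mention
        feats.append("CXT_B:" + " ".join(tokens[start - 2:start]))
    if end + 2 <= len(tokens):           # context bigram after the mention
        feats.append("CXT_A:" + " ".join(tokens[end:end + 2]))
    return feats

sent = "a member of Maserati , Turing Machine and The Juan MacLean".split()
print(extract_features(sent, 5, 7))  # mention: ["Turing", "Machine"]
```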

Modeling Each Clean Mention

• For a clean mention, its "positive types" should be ranked higher than all its "negative types".

We formulate the hierarchy-induced WARP loss as follows:

  ℓ_c(m_i, Y_i, Ȳ_i) = Σ_{y_k ∈ Y_i} Σ_{y_k̄ ∈ Ȳ_i} L( rank_{y_k}(f(m_i)) ) · Θ_{i,k,k̄},   (2)

  Θ_{i,k,k̄} = max{ 0, γ_{k,k̄} − f_k(m_i) + f_k̄(m_i) },   (3)

  rank_{y_k}(f(m_i)) = Σ_{y_k̄ ∈ Ȳ_i} 1( γ_{k,k̄} + f_k̄(m_i) > f_k(m_i) ).   (4)

• The double sum runs over every pair of (positive type, negative type) of m_i.
• Θ_{i,k,k̄} is a margin-based indicator (a hinge): it is nonzero whenever negative type y_k̄ comes within the margin γ_{k,k̄} of positive type y_k.
• L(·) is a rank-based weight: the penalty is larger if the positive labels get ranked lower.
• γ_{k,k̄} is an adaptive margin that encodes type correlation (next slide).
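A NumPy sketch of Eqs. (2)-(4) for a single clean mention. The rank weight L(r) = Σ_{s≤r} 1/s is the usual WARP choice (the slides only call it a "rank-based weight", so the exact form is an assumption), and the constant margin stands in for the adaptive γ_{k,k̄}.

```python
import numpy as np

def warp_weight(r):
    """Rank-based weight L(r) = sum_{s=1..r} 1/s (a common WARP choice;
    the exact form of L is not spelled out in the slides)."""
    return sum(1.0 / s for s in range(1, r + 1))

def clean_mention_loss(scores, pos_idx, margins):
    """Eqs. (2)-(4): scores[k] = f_k(m_i); pos_idx = candidate (positive) types;
    margins[k, kbar] = margin gamma_{k,kbar}."""
    neg_idx = [k for k in range(len(scores)) if k not in pos_idx]
    loss = 0.0
    for k in pos_idx:
        # Eq. (4): margin-violating rank of positive type y_k among negatives.
        rank = sum(margins[k, kb] + scores[kb] > scores[k] for kb in neg_idx)
        for kb in neg_idx:
            # Eq. (3): margin-based hinge indicator.
            theta = max(0.0, margins[k, kb] - scores[k] + scores[kb])
            loss += warp_weight(rank) * theta   # Eq. (2)
    return loss

K = 5
scores = np.array([2.0, 1.5, 0.3, -0.2, 0.9])
margins = np.ones((K, K))                      # constant margin for the demo
print(clean_mention_loss(scores, pos_idx=[0, 1], margins=margins))
```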

Adaptive Margin to Incorporate Type Correlation

Recall the score f_k(m_i) = m_i^T U^T V y_k. The margin γ_{k,k̄} in Eqs. (3)-(4) adapts to how correlated the positive and negative types are:
• More correlated (positive type, negative type) pair → smaller margin
• Less correlated (positive type, negative type) pair → larger margin

Modeling Type-Path Correlation. In the target type hierarchy Y, types closer to each other (i.e., connected by a shorter path) tend to be more related (e.g., actor is more related to artist than to person). In the KB, types assigned to similar sets of entities should be more related to each other than types assigned to quite different entities (Jiang et al., 2015) (e.g., actor is more related to director than to author). We model type correlation based on the following hypothesis.

Hypothesis 1 (Type Correlation). If high correlation exists between two target types, based on either the type hierarchy or the KB, they should be embedded close to each other.

A simple way to measure the correlation between two types is their distance in the target type hierarchy (a tree). Specifically, a link (y_k, y_k') is formed if there exists a path between types y_k and y_k' in Y (paths passing through the root node are excluded), with weight w_{kk'} = 1 / (1 + ρ(y_k, y_k')), where ρ(y_k, y_k') is the length of the shortest path between y_k and y_k' in Y. Although shortest paths are efficient to compute, their accuracy is limited: it is not always true that a type (e.g., athlete) is more related to its parent type (person) than to its sibling types (e.g., coach), or that all sibling types are equally related to each other (e.g., actor is more related to director than to author).

An alternative approach that avoids this accuracy issue is to exploit the entity-type facts T in the KB: the correlation between two target types y_k, y_k' ∈ Y is proportional to the number of entities they share. Let E_k = {e | (e, y_k) ∈ T} denote the set of entities assigned type y_k in the KB. The weight w_{kk'} of link (y_k, y_k') ∈ G_YY is then defined as

  w_{kk'} = ( |E_k ∩ E_k'| / |E_k| + |E_k ∩ E_k'| / |E_k'| ) / 2,   (5)

where |E_k| denotes the size of set E_k.

Knowledge base example. For the context in Sn, "The effective end of Ted Cruz's presidential campaign came on a call ...", entity-type facts such as (Ben Affleck, actor), (Ben Affleck, director), (Woody Allen, actor), (Woody Allen, director), (J. K. Rowling, author), (Kobe Bryant, athlete) yield type-type correlation scores via Eq. (5) (e.g., Corr = (0.6 + 0.6)/2 = 0.6 for one type pair and Corr = (0.25 + 0.55)/2 = 0.4 for another), which translate into adaptive margins when scoring "Sn_Ted Cruz": Margin = 1 / sim(politician, athlete) = 3; Margin = 1 / sim(politician, businessman) = 1.5.

Finally, to account for whole type-path correlations, we propose a path edit distance that measures the semantic difference between two type-paths in the given hierarchy; Algorithm 1 derives it. We compare these three correlation measures in our experiments. Entity-entity facts of various relationships in the KB could also be used to model type correlation, as in KB embedding (Hu et al., 2015; Bordes et al., 2013); we leave this as future work.

Algorithm 1: Path Edit Distance
  Input: path 1 p1[1..m], path 2 p2[1..n], cost matrix W[1..T, 1..T]
  Output: edit distance between p1 and p2
  Edit[1, 1] ← 1(p1[1] ≠ p2[1]) × W[p1[1], p2[1]]
  for i ← 2 to m do
      Edit[i, 1] ← Edit[i−1, 1] + W[p1[i−1], p1[i]]
  for j ← 2 to n do
      Edit[1, j] ← Edit[1, j−1] + W[p2[j−1], p2[j]]
  for i ← 2 to m do
      for j ← 2 to n do
          if p1[i] = p2[j] then
              Edit[i, j] ← Edit[i−1, j−1]
          else
              Edit[i, j] ← min{ Edit[i−1, j−1] + W[p1[i], p2[j]],
                                Edit[i−1, j] + W[p1[i−1], p1[i]],
                                Edit[i, j−1] + W[p2[j−1], p2[j]] }
  return Edit[m, n]
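A sketch of the three correlation measures side by side, under a toy hierarchy and toy KB facts; the data and the uniform cost matrix in the edit-distance demo are assumptions for illustration.

```python
# Three type-correlation measures: shortest path in the hierarchy,
# shared KB entities (Eq. 5), and path edit distance (Algorithm 1).
# The toy hierarchy, KB facts, and uniform costs are assumptions.

PARENT = {"person": "root", "artist": "person", "politician": "person",
          "actor": "artist", "director": "artist", "author": "artist"}
KB_FACTS = {("BenAffleck", "actor"), ("BenAffleck", "director"),
            ("WoodyAllen", "actor"), ("WoodyAllen", "director"),
            ("JKRowling", "author")}

def path_to_root(t):
    path = [t]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path

def w_shortest_path(a, b):
    """Hierarchy measure: w = 1 / (1 + rho), rho = tree distance."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = len(set(pa) & set(pb))
    rho = (len(pa) - common) + (len(pb) - common)
    return 1.0 / (1.0 + rho)

def w_shared_entities(a, b):
    """KB measure, Eq. (5): average of the two entity-overlap ratios."""
    ea = {e for e, t in KB_FACTS if t == a}
    eb = {e for e, t in KB_FACTS if t == b}
    inter = len(ea & eb)
    return (inter / len(ea) + inter / len(eb)) / 2 if ea and eb else 0.0

def path_edit_distance(p1, p2, W):
    """Algorithm 1: edit distance between two type-paths with cost matrix W."""
    m, n = len(p1), len(p2)
    E = [[0.0] * n for _ in range(m)]
    E[0][0] = 0.0 if p1[0] == p2[0] else W[p1[0]][p2[0]]
    for i in range(1, m):
        E[i][0] = E[i - 1][0] + W[p1[i - 1]][p1[i]]
    for j in range(1, n):
        E[0][j] = E[0][j - 1] + W[p2[j - 1]][p2[j]]
    for i in range(1, m):
        for j in range(1, n):
            if p1[i] == p2[j]:
                E[i][j] = E[i - 1][j - 1]
            else:
                E[i][j] = min(E[i - 1][j - 1] + W[p1[i]][p2[j]],
                              E[i - 1][j] + W[p1[i - 1]][p1[i]],
                              E[i][j - 1] + W[p2[j - 1]][p2[j]])
    return E[m - 1][n - 1]

print(w_shortest_path("actor", "director"))    # siblings: rho = 2 -> 1/3
print(w_shared_entities("actor", "director"))  # (2/2 + 2/2)/2 = 1.0
types = ["root", "person", "artist", "actor", "director", "author", "politician"]
W = {a: {b: (0.0 if a == b else 1.0) for b in types} for a in types}  # uniform costs
p1 = list(reversed(path_to_root("actor")))      # root -> person -> artist -> actor
p2 = list(reversed(path_to_root("politician"))) # root -> person -> politician
print(path_edit_distance(p1, p2, W))
```

Note how the two measures disagree: shortest path treats the siblings actor and director as moderately related, while shared entities rates them as highly related.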

Modeling Each Noisy Mention

ID | Sentence
S1 | Governor Arnold Schwarzenegger gives a speech at Mission Serve's service project ...

Mention: "S1_Arnold Schwarzenegger"; candidate types: {person, politician, athlete, businessman, artist, actor, author}

• For a noisy mention, its "best candidate type" should be ranked higher than all its "negative types" (non-candidate types).
Modeling Noisy Type Labels. To effectively model the noisy mention-type links in subgraph G_MY, we extend the margin-based loss in (Nguyen and Caruana, 2008), used there to learn linear classifiers, to enforce the hypothesis above. The intuition is simple: for mention m_i, the maximum score over its candidate types Y_i should be greater than the maximum score over its non-candidate types Ȳ_i = Y \ Ȳ_i's complement, where scores are measured with the current embedding vectors. We define the partial-label loss ℓ_n for m_i ∈ M_n as

  ℓ_n(m_i, Y_i, Ȳ_i) = L( rank_{y_k*}(f(m_i)) ) · Ω_i,   (6)

  Ω_i = max{ 0, γ_{k*,k̄*} − f_{k*}(m_i) + f_{k̄*}(m_i) },   (7)

where y_k* and y_k̄* are the best candidate type and the best negative type, respectively:

  y_k* = argmax_{y_k ∈ Y_i} f_k(m_i);  y_k̄* = argmax_{y_k̄ ∈ Ȳ_i} f_k̄(m_i).

Minimizing ℓ_n encourages a large margin between the maximum scores max_{y ∈ Y_i} f(m_i, y) and max_{y' ∈ Ȳ_i} f(m_i, y'). This forces m_i to be embedded closer to its most "relevant" type in the noisy candidate set than to any non-candidate type. This contrasts sharply with multi-label learning (Yosef et al., 2012), where a large margin is enforced between all candidate types and non-candidate types without considering noisy types.

Note that one can also consider all the non-candidate types instead of just the "best non-candidate type", by modifying Eq. (6) as follows:

  ℓ_n(m_i, Y_i, Ȳ_i) = Σ_{y_k̄ ∈ Ȳ_i} L( rank_{y_k*}(f(m_i)) ) · Ω_{i,k̄},   (8)

  Ω_{i,k̄} = max{ 0, γ_{k*,k̄} − f_{k*}(m_i) + f_k̄(m_i) }.   (9)
graph graph G intointo a a d-- Specifically, we use vectorsscore ofu(im, iv,ykk) isR definedto rep- as the dot product of their dimensional vector space, following the three pro- Candidate types:score{personscore of (,mpolitician ofi,y(mki),yis, k) definedis defined as the as the dot2 dot product product of oftheir their dimensionaldimensional vector vector space, space, following following the the three three pro- pro- resent mention mi embeddings,and type yi.e.k , s(min,y the)=vTheT u . Joint We define Optimization the posed Problem. hypothesesOur in goal Sec. is??. Intuitively, one can athlete businessmanembeddings,, artist, actor, authori.e.2 M, s(} m ,y )=T v2T uY .i Wek definek i the posedposed hypotheses hypotheses in in Sec. Sec. ??.. Intuitively, Intuitively, one one can can embeddings,d-dimensional embeddingi.e., s(mpartial-labeli,y space,ki)=k respectively.v lossk uik`.for Wei m The definetoas the embed follows. the heterogeneouscollectively graphminimizeG ??into the a objectivesd- of the three sub- i i 2 M collectively partial-labelscorepartial-label of (mi,yk loss) is defined loss`i for`i asform thei m doti productas follows.as follows. of their dimensionalcollectively vector space,graphsminimizeminimize followingGMY the, the theG objectivesMF objectives threeand pro-GYY of of, theas the mentions three three sub- sub-and 2T M2 M M embeddings, i.e., s(mi,yk)=v ui. We define the posed hypothesesgraphsgraphsG inMYG Sec.MYtypes, G,??GMF.MFare Intuitively,and sharedandGGYY acrossYY one,, as ascan them. mentions mentions To achieveand theand goal, `n(mi, k i, i)=L rankyk f(mi) ⌦i; (6) partial-label loss ` for m Yas follows.Y ⇤ collectively· minimize the objectivesY of the three sub- MM `n(m`ni,(mii,, ii)=, i iL)=rankLi ranky y f(mfi()mi) ⌦i;⌦i; (6)(6) typestypes areare sharedwe shared formulate across across a them. joint them. optimization To To achieve achieve problem the the goal, goal, as fol- 2 Mk⇤ k⇤ j ⇣ ⌘k Y Y YY Y ⌦i = max 0, · ¯ · fk (mgraphsi)+f¯G(MYmi,) G, MF(7)Yandlows.GYY, as mentions and k⇤,k⇤ ⇤ k⇤ wewe formulate formulate a a joint joint optimization optimization problem problem as as fol- fol- j j ⇣ ⇣ ⌘k⌘k types are shared across them. To achieve theM goal, `n(mi, i, i)=L rankyk f(mi) ⌦i; (6) ⌦i =⌦i max= max0, k0,,k¯⇤ ¯ fk⇤f(mk ni()+mi)+fk¯ f(¯m(im) i,) (7), (7)Y lows.lows.o Y Y ⇤ k⇤⇤,k⇤ ⇤· ⇤ k⇤ min = + (10) where yk ,y¯ are defined aswe follows. formulate a joint optimizationc problemn as fol- j ⇣ ⇤⌘kk⇤ U, V O O O ⌦ = max 0, ¯ f ({m )+f¯}(m ) , i n k⇤n,k⇤ k⇤ i k⇤ i (7) olows.o y ,y¯ minmin == c +c + n=n ` (m , , )+ ` (m(10)(10), , ), wherewherek⇤ ykk⇤,y¯areare definedyk defined= argmax as follows. as follows.fk(mi); y¯ = argmax fUk,(mVi). c i i i n i i i best{candidaten⇤ } ktype⇤ ⇤ o k⇤ U, OV O OO OO Y Y Y Y { } yk i miny = + m (10) m where yk ,y¯ are defined as follows.2Y k i c n i c i n ⇤ k⇤ U, V O2Y O O X2M X2M y {= argmax} f (m ); y¯ = argmax f (m ). = `c(mi, i, i)+ `n(mi, i, i), k⇤ y = argmaxk f i(m );k⇤y¯ = argmaxk f (im ). = `c(mi, i, i)+ `n(mi, i, i), k⇤ k i k⇤ k i Y Y Y Y y best= argmaxnegativeyk fi (typem ); y¯ = argmaxyfk (mi ). = `c(mmi,i i, ci)+ Y`n(Ymi, mi, i i), n Y Y k⇤ 2Yykk i i Notek⇤ that onek cany i also consider all the non- m m y 2Y 2Yk i XY2iMY c Y YX2i M n k i yk i 2Y mi c X2M mi n X2M 2Y candidate types2Y instead of just the “bestX2M non- 3.1 ModelX2M Learning and Inference Note that one cancandidate also type”, consider by modifying all the Eq. non- (6) as follows. 
We propose an alternative minimization algo- NoteNote that one that can one also can consider also consider all the non- all the non- 3.1 Modelrithm Learning based and on block-wiseInference coordinate descent candidatecandidate types types instead instead of just of justthe “best the non- “best3.1 non- Model3.1 Learning Model and Learning Inference and Inference candidate types instead`n(mi, i, ofi)= just theL rank “bestyk f non-(mi) ⌦i,k¯; (8) candidate type”, by modifyingY Y Eq. (6) as follows.⇤ schema (Tseng, 2001) to jointly optimize the objec- candidatecandidate type”, type”, by modifying by modifying Eq. (6) as Eq. follows.k¯ (6) as follows.We proposeWe an propose alternative an minimization alternative algo-minimization algo- X2Yi j ⇣ ⌘k We proposetive anin Eq. alternative (10). minimization algo- rithm basedrithm on based block-wise onO coordinate block-wise descent coordinate descent ⌦i,k = max 0, k ,k¯ fk⇤ (mi)+fk¯(mi)rithm. (9) basedWe on first block-wise take the derivative coordinate of with descent respect to `nn((mmi,i, i,i, i)=i)= L rankL rankyk fy(mi)f(⌦mi,⇤ik¯); (8)⌦ ¯; (8) ⇤ k⇤ i,k O `nY(YmYi,Y i, i)= L ranky f(mi) ⌦schema¯; (8) (Tseng,schema 2001) (Tseng, toUjointlywhile 2001)optimize fixing to Vjointly. the The objec-optimize derivative the of `c objec-(mi, i, i) k¯ ¯ nk⇤ i,k o Y YX2Yki ji ⇣ ⌘k schema (Tseng, 2001) to jointly optimize the objec-Y Y X2Y¯ j ⇣ ⌘k tive in Eq. (10). (denoted as `c,i) with respect to U is computed as k i j ⇣ ⌘k O tive in Eq. (10). ⌦i,k = max 0, ¯ X2fkY (mi)+f¯(mi) . (9) We first taketiveO thein derivative Eq. (10). of with respect to ⌦ = max 0k,⇤,k ¯ ⇤ f (mk)+f¯(m ) . i,k k⇤,k k⇤ i k i (9) WeO first take the derivativeO of with respect to ⌦i,k = max 0, ¯ fk (mi)+f¯(mi) U. while(9) fixingWeV. first The derivative take the derivative of `c(mi, i, ofOi) with respect to n k⇤,k ⇤ o k U V Y Y ` (m , , ) n o (denoted as `while) with fixing respect to.U Theis computed derivative asO of c i i i Uc,iwhile fixing V. The derivative of `c(mi,Y i,Y i) n o (denoted as `c,i) with respect to U is computedY Y as (denoted as `c,i) with respect to U is computed as w of link (y ,y ) G is defined as follows. Text corpus kk0 k k0 YY D 2 = m N Entity mentions in (size N) M { i}i=1 D wkk0 = k k0 / k + k k0 / k0 /2, (5) c, n Clean and noisy mentions in E \ E E E \ E E M M K M wkk of link (⇣yk,yk ) GYY is defined as follows.⌘ = yk k=1Text corpusTarget entity types (size K) 0 0 2 D Y { N } where k denotes the size of set k. = mi i=1 Entity mentions in (sizemN) |E | E M i{ } Candidate typesD of i w = k / k + k / /2, (5) ,Y Clean and noisy mentions in In thiskk0 work, tok0 consider thek type-path0 k0 correla- c i =n i Non-candidate types of mi E \ E E E \ E E M M K M ⇣ ⌘ =Yyk Y\YM Target entity types (size K) tions, we propose a path edit distance to measure Y { =}k=1fj j=1 Text features in (size M) where k denotes the size of set k. i F { M} Candidate types of mi D the semantic|E | differences betweenE two type-paths in Y mi R Feature vector for mi In this work, to consider the type-path correla- i = 2 iK Non-candidate types of mi 2 M the given hierarchy. Algorithm 1 summarizes our Y ykY\YMR Type label vector for yk tions, we propose a path edit distance to measure = fj2 j=1d M Text features in (size M) 2 Y algorithm for deriving the path edit distance. F U{ M}R ⇥ MappingsD for to d-dim space the semantic differences between two type-paths in mi R2 Feature vector for miM 2 K d Embedding of 2fjM(j-th column of theWe given compare hierarchy. 
these Algorithm three methods 1 summarizes for measuring our yk URj R Type label vector for yk 2 d 2M U) 2 Y algorithmtype correlation for deriving in our the experiments. path edit distance. Entity-entity U R ⇥ d K Mappings for to d-dim space 2V ⇥ MappingsM for to d-dim space d R Embedding of fj (j-th column of factsWe of compare various these relationships three methods in the for KB measuring can also be Uj R2 Y 2 d U) Embedding of yk (k-th column of typeutilized correlation to model in type our experiments. correlation, as Entity-entity discussed in Vkd KR V R ⇥2 MappingsV) for to d-dim space factsKB embedding of various relationships (Hu et al., 2015; in the Bordes KB can et also al., 2013). be 2 Y d EmbeddingTable 3: ofNotations.yk (k-th column of utilized to model type correlation, as discussed in Vk R We leave this as future work. 2 V) KB embedding (Hu et al., 2015; Bordes et al., 2013). Modeling Noisy Type Labels. To effectively model Table 3: Notations. We leave this as future work. Minimizing `n,i encourages a large margin be- the noisy mention-type links in subgraph GMY , we Modeling Noisy Type Labels. To effectively model tween the maximum scores maxy i s(mi,y) and extend the margin-based loss in (Nguyen and Caru- Minimizing `n,i encourages a large margin2Y be- max s(mi,y0). This forces mi to be embed- the noisy mention-type links in subgraph GMY , we y0 i ana, 2008) (used to learn linear classifiers) to enforcetween the2Y maximum scores maxy s(mi,y) and extend the margin-based loss in (Nguyen and Caru- ded closer to the most “relevant”2Yi type in the noisy max s(mi,y0). This forces mi to be embed- Hypothesis ??. The intuition of the loss is simple: y0 i ana, 2008) (used to learn linear classifiers) to enforce candidate2Y type set, i.e., y⇤ = argmaxy s(mi,y), for mention m , the maximum score associated withded closer to the most “relevant” type in the2 noisyYi Hypothesis ??.i The intuition of the loss is simple: than to any other non-candidatey = argmax typess( (mi.e.,y,) Hypoth- its candidate types is greater than the maximumcandidate type set, i.e., ⇤ y i i , for mention m , the maximumi score associated with 2Y i Y thanesis to any??). other Thisnon-candidate constrasts types sharply (i.e. with, Hypoth- multi-label itsscore candidate associated types withis any greater other than non-candidate the maximum types Yi esislearning??). This (Yosef constrasts et al., sharply 2012), with where multi-label a large margin scorei = associatedi, where with any the other scores non-candidate are measured types using Y Y\Y learningis enforced (Yosef et between al., 2012),all wherecandidate a large types margin and non- currenti = embeddingi, where vectors. the scores are measured using Y Y\Y d is enforcedcandidate between types withoutall candidate considering types and noisy non- types. currentSpecifically, embedding we vectors. use vectors ui, vk R to rep- 2d candidate types without considering noisy types. resent mention mi andu , typev yk in the The Joint Optimization Problem. Our goal is Specifically, we use2 vectorsM i k R2 toY rep- resentd-dimensional mention m embeddingand space, type respectively.y 2 in the TheTheto Joint embed Optimization the heterogeneous Problem. graphOur goalG into is a d- i 2 M k 2 Y dscore-dimensional of (mi,y embeddingk) is defined space, as the respectively. dot product of The theirto embeddimensional the heterogeneous vector space, graph followingG into the a threed- pro- T scoreembeddings, of (mi,yki.e.) is, defineds(mi,yk as)= thev dotk u producti. 
We of define their thedimensionalposed hypotheses vector space, in Sec. following??. the Intuitively, three pro- one can T embeddings,partial-labeli.e. loss, s(`mfori,ykm)=v uasi. follows.We define the posedcollectively hypothesesminimize in Sec. ?? the. objectives Intuitively, of one the can three sub- i i 2 Mk partial-label loss `i for mi as follows. collectivelygraphs minimizeG , G the objectivesand G of, as the mentions three sub- and 2 M MY MF YY M graphstypesGMYare, GMF sharedand acrossGYY, them. as mentions To achieveand the goal, `n(mi, i, i)=L rankyk f(mi) ⌦i; (6) M Y Y ⇤ · types areY shared across them. To achieve the goal, `n(mi, i, i)=L rankyk f(mi) ⌦i; (6) we formulate a joint optimization problem as fol- Y Y j ⇤ ⇣ ⌘k· PuttingY Things Together: The Optimization Problem ⌦ = max 0, ¯ f (m )+f¯ (m ) , we formulate a joint optimization problem as fol- i j k⇤,k⇤ ⇣ k⇤ ⌘ki k⇤ i (7) lows. ⌦i = max 0, k ,k¯ fk (mi)+fk¯ (mi) , (7) lows. n ⇤ ⇤ ⇤ ⇤ o y ,y¯n o min = c + n (10) where k⇤ k⇤ are defined as follows. minU, V= + (10) where {yk ,y¯ }are defined as follows. O c O n O { ⇤ k⇤ } U, V O O O yk = argmax fk(mi); y¯ = argmax fk(mi). = `c(mi, i, i)+ `n(mi, i, i), y ⇤ = argmax f (m ); y¯ k=⇤ argmax f (m ). = `c(mi, i, i)+ `n(mi, i, i), k⇤ y k i k⇤ k i Y Y Y Y y k i yk i mi c Y Y mi n Y Y k 2Yi yk i2Y mi c 2M mi n 2M 2Y 2Y X2M X X2M X NoteNote that that one one can can also also consider consider all all the the non- non- Accounts for clean mentions M candidate types types instead instead of of just just the the “best “best non- non-3.13.1 Model Model Learning Learning andc Inference andAccounts Inferencefor noisy mentions Mn candidate type”, type”, by by modifying modifying Eq. Eq. (6) (6) as follows.as follows. qWeMinimizeWe propose proposethe anobjective alternative anà alternative minimization minimization algo- algo- rithm–rithmClean basedmentions: based onpositive block-wise on types block-wiseare ranked coordinatehigher coordinatethan descentnegative types descent ¯ ``nn((mmii,, i,i, i)=i)= LLrankrankyk yk f(mf(im) i)⌦i,k⌦; i,k¯(8); (8) YY YY ⇤ ⇤ schema–schemaNoisy (Tseng,mentions: (Tseng, 2001)best candidate 2001) to jointly totypejointlyisoptimizerankedoptimizehigher thethan objec-negative the objec- types k¯ ¯ k i i j ⇣ ⌘k X2XY2Y j ⇣ ⌘k tive in Eq. (10). tiveO in Eq. (10). ⌦ = max 0, ¯ f (m )+f¯(m ) . (9) We firstO take the derivative of with respect to ⌦i,ki,k = max 0,k⇤,k ¯ kf⇤k (imi)+k f¯(i mi) . (9) We first take the derivative of with respect to k⇤,k ⇤ k O U while fixing V. The derivative of `cO(mi, i, i) nn o o U while fixing V. The derivative of Y`c(mY i, i, i) (denoted as `c,i) with respect to U is computed as Y Y (denoted as `c,i) with respect to U is computed as Model Learning

Model Learning

q Alternating minimization (between U and V)
  – Block-wise coordinate descent algorithm
  – Converges to a local minimum

q Can also apply SGD for online update

q Easy to parallelize by partitioning the mention set

What we need for inference: embedding vectors for text features (U) & type labels (V)
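A minimal sketch of this training loop, assuming the per-mention (sub)gradients of the losses (Eqs. (12)–(16) in the backup slides) are wrapped in caller-supplied `grad_U`/`grad_V` and the objective of Eq. (10) in `objective`; `alpha` and `C` mirror the learning rate and norm constant of Algorithm 2. All names are illustrative.

```python
import numpy as np

def train(mentions, U, V, grad_U, grad_V, objective, alpha=0.01, C=1.0,
          tol=1e-4, max_iter=200):
    """Block-wise coordinate descent: step in U with V fixed, then in V."""
    prev = np.inf
    for _ in range(max_iter):
        U -= alpha * sum(grad_U(U, V, m) for m in mentions)   # U-block update
        V -= alpha * sum(grad_V(U, V, m) for m in mentions)   # V-block update
        # Keep column norms bounded by C, as in Algorithm 2 (step 14).
        U *= np.minimum(1.0, C / (np.linalg.norm(U, axis=0, keepdims=True) + 1e-12))
        V *= np.minimum(1.0, C / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-12))
        obj = objective(U, V, mentions)        # O = O_c + O_n, Eq. (10)
        if prev - obj < tol:                   # stop at a local minimum
            break
        prev = obj
    return U, V
```

Because each mention contributes an independent gradient term, the two sums parallelize naturally over partitions of the mention set, and replacing them with single-mention steps yields the SGD variant mentioned above.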

Tseng. “Convergence of a block coordinate descent method for non-differentiable minimization.” JOTA 2001.

Hierarchical Partial-label Embedding

Training mentions with extracted features:

Mention: “S1_Arnold Schwarzenegger”; Context: S1;
Candidate Type Set: {person, politician, artist, actor, author, businessman, athlete}
Text Features: {HEAD_Arnold, CXT1_B:Governor, CXT1_A:gives, POS:NN, TKN_arnold, TKN_schwarzenegger, SHAPE_Aa, ...}

Mention: “S2_Arnold Schwarzenegger”; Context: S2;
Candidate Type Set: {person, politician, artist, actor, author, businessman, athlete}
Text Features: {HEAD_Arnold, CXT1_B:star, CXT2_B:action-movie star, CXT3_A:to the franchise, POS:NN, SHAPE_Aa, ...}

Mention: “Ted Cruz”; Context: Sn;
Candidate Type Set: {person, politician}
Text Features: {HEAD_Ted, CXT1_B:senator, CXT1_B:told, CXT3_B:campaign of senator, POS:NN, SHAPE_Aa, ...}

q Joint embedding: partition training mentions into clean and noisy sets, then jointly embed mentions, text features, and type labels into the same space (learned embeddings for text features U and type labels V)

q Type inference: top-down nearest neighbor search in the given type hierarchy
  – Test mention: “S1_Arnold Schwarzenegger” (“Governor ... gives ... speech”)

Experiments

q Datasets:
  – Wiki (780k Wikipedia articles)
  – OntoNotes (13,109 news articles)
  – BBN (2,311 WSJ news articles)

q Compared Methods
  – Bootstrapping: ClusType
  – Classifier: FIGER, HYENA, Hybrid Neural Model
  – Embedding: DeepWalk, LINE, PTE, WSABIE
  – Partial-Label Learning: CLPL, PL-SVM
  – Variants of AFET: AFET-NoCo, AFET-NoPa, AFET-CoH

Example Output

AFET predicts fine-grained types more accurately

AFET avoids overly-specific predictions

Performance Comparison on Fine-Grained Typing

q AFET vs. supervised methods (classifiers & embedding models)
  – Partial-label loss for careful modeling of noisy type labels

Performance Comparison on Fine-Grained Typing

q AFET vs. partial-label learning methods
  – Adaptive margins for incorporating type correlation

Performance Comparison on Fine-Grained Typing

q AFET vs. AFET-NoCo → gain from incorporating type correlation
q AFET vs. AFET-NoPa → gain from noise-robust loss function

Comparing on Different Type Levels

q Type correlation signal helps fine-grained (long-tailed) types
q AFET achieves a 22.36% improvement in Accuracy on level-3 types, compared to the next best method, FIGER

Conclusion

q A challenging problem: Fine-Grained Entity Typing with Noisy Distant Supervision

q An effective and efficient solution: AFET
  – Noise-robust embedding of type labels
  – Adaptive margins for type correlation

q Improvements on three public entity typing datasets
  – https://github.com/shanzhenren/AFET
q Acknowledgement
  – Thanks to the Google PhD Fellowship for supporting my research

Backup: Performance on Pruned Training Data

q AFET vs. classifiers on pruned training data

Backup: Performance Study

Varying the training set size | Varying the dimensionality d

Backup: Performance on Frequent/Infrequent Types

Infrequent Types | Frequent Types

Backup: Model Learning

q Discussions:
  – Since our type hierarchy size is acceptable (~100 types), we don't have to do negative sampling to speed up
  – We don't need to approximate the rank function

Algorithm 2: Model Learning of AFET

Input: feature vectors {m_i}_{i=1}^N, type vectors {y_k}_{k=1}^K, learning rate α, normalization constant C
Output: feature embeddings U, type embeddings V

1   Initialize: U and V as random matrices; while O in Eq. (11) has not converged do
2       for m_i ∈ M_c do
3           Compute the margin-infused rank for y_k
4           Compute ∂ℓ_{c,i}/∂U using Eq. (12)
5           Compute ∂ℓ_{c,i}/∂V using Eq. (15)
6       end
7       for m_i ∈ M_n do
8           Compute the margin-infused rank for y_{k*}
9           Compute ∂ℓ_{n,i}/∂U using Eq. (13)
10          Compute ∂ℓ_{n,i}/∂V using Eq. (16)
11      end
12      U ← U − α ( Σ_{m_i ∈ M_c} ∂ℓ_{c,i}/∂U + Σ_{m_i ∈ M_n} ∂ℓ_{n,i}/∂U )
13      V ← V − α ( Σ_{m_i ∈ M_c} ∂ℓ_{c,i}/∂V + Σ_{m_i ∈ M_n} ∂ℓ_{n,i}/∂V )
14      Normalize the norms of U and V to C
15  end

The derivative of ℓ_{c,i} with respect to U is computed as follows.

  ∂ℓ_{c,i}/∂U = Σ_{y_k ∈ Y_i} L( rank_{y_k}(f(m_i)) ) · V ŷ_{i,k} m_i^T
              = V { Σ_{y_k ∈ Y_i} L( rank_{y_k}(f(m_i)) ) ŷ_{i,k} } m_i^T,   (12)

where we define 1(·) as the indicator function and the vector ŷ_{i,k} as follows.

  ŷ_{i,k} = Σ_{k̄ ∈ Ȳ_i} [ 1( γ(y_k, y_{k̄}) + f_{k̄}(m_i) > f_k(m_i) ) / rank_{y_k}(f(m_i)) ] · (y_{k̄} − y_k).

Note that if we follow the negative sampling process in (Weston et al., 2011), vector ŷ_{i,k} simply changes into (y_{k̄} − y_k), where y_{k̄} is sampled following the procedure in (Weston et al., 2011). While computing the rank, we can compute Eq. (??) at the same time, which is efficient as |Y| ≈ 100.

The derivative of ℓ_n(m_i, Y_i, Ȳ_i) (denoted as ℓ_{n,i}) with respect to U is computed as follows.

  ∂ℓ_{n,i}/∂U = [ L( rank_{y_{k*}}(f(m_i)) ) / rank_{y_{k*}}(f(m_i)) ] · V y_i* m_i^T,   (13)

where vector y_i* is defined as y_i* = 1( γ(y_{k*}, y_{k̄*}) + f_{k̄*}(m_i) > f_{k*}(m_i) ) · (y_{k̄*} − y_{k*}). If we use the definition of ℓ_{n,i} in Eq. (9), then y_i* takes the following form.

  y_i* = Σ_{k̄ ∈ Ȳ_i} 1( γ(y_{k*}, y_{k̄}) + f_{k̄}(m_i) > f_{k*}(m_i) ) · (y_{k̄} − y_{k*}).   (14)

Second, we can minimize O with respect to V while fixing U. The derivative of ℓ_{c,i} with respect to V is computed as follows.

  ∂ℓ_{c,i}/∂V = Σ_{y_k ∈ Y_i} L( rank_{y_k}(f(m_i)) ) · U m_i ŷ_{i,k}^T
              = U m_i { Σ_{y_k ∈ Y_i} L( rank_{y_k}(f(m_i)) ) ŷ_{i,k} }^T.   (15)

The derivative of ℓ_{n,i} in Eq. (6) with respect to V is computed as follows.

  ∂ℓ_{n,i}/∂V = [ L( rank_{y_{k*}}(f(m_i)) ) / rank_{y_{k*}}(f(m_i)) ] · U m_i (y_i*)^T.   (16)

Similarly, if we use the definition of ℓ_{n,i} in Eq. (9), then y_i* takes the form in Eq. (14).

Algorithm 2 summarizes our algorithm. Eq. (11) can also be solved by a mini-batch extension of the Pegasos algorithm (Shalev-Shwartz et al., 2011), which is a stochastic sub-gradient descent method and thus can efficiently handle massive text corpora. Due to lack of space, we do not include derivation details here.

Type Inference. With the learned mention embeddings {u_i} and type embeddings {v_k}, we perform top-down search in the candidate type sub-tree Y_i to estimate the correct type-path Y_i*. Starting from the tree's root (denoted as r), we recursively find the best type among the children types (denoted as C(r)) by measuring the dot product of the corresponding mention and type embeddings, i.e., s(u_i, v_k). The search process stops when we reach a leaf type, or the similarity score is below a pre-defined threshold η > 0. Algorithm ?? summarizes the proposed type inference process.
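To complement the type-inference paragraph above, here is a minimal sketch of the greedy top-down search (the `children` map, `root`, and `eta` are illustrative stand-ins; the referenced inference algorithm itself is not shown in this excerpt).

```python
import numpy as np

def infer_type_path(u_i, V, children, root, eta=0.1):
    """Greedy top-down search over the type hierarchy.

    u_i: learned mention embedding; V: d x K type embeddings;
    children: dict mapping a type id to its child type ids;
    eta: pre-defined similarity threshold (> 0).
    """
    path, node = [], root
    while True:
        kids = children.get(node, [])
        if not kids:                                       # reached a leaf type
            break
        scores = {k: float(V[:, k] @ u_i) for k in kids}   # s(u_i, v_k)
        best = max(scores, key=scores.get)
        if scores[best] < eta:                             # score below threshold
            break
        path.append(best)                                  # extend the type-path
        node = best
    return path
```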