Arxiv:1903.02591V1 [Cs.CL] 6 Mar 2019
Total Page:16
File Type:pdf, Size:1020Kb
Imposing Label-Relational Inductive Bias for Extremely Fine-Grained Entity Typing Wenhan Xiongy, Jiawei Wuy, Deren Leiy, Mo Yu∗, Shiyu Chang∗, Xiaoxiao Guo∗, William Yang Wangy y University of California, Santa Barbara ∗ IBM Research fxwhan, [email protected], [email protected], fshiyu.chang, [email protected] Abstract Context Types y Existing entity typing systems usually ex- person , televi- ? ploit the type hierarchy provided by knowl- Big Show then appeared at One Night sion program edge base (KB) schema to model label cor- Stand, attacking Tajiri, Super Crazy, and the Full Blooded Italians after person, athlete, relations and thus improve the overall perfor- their tag team match wrestler, mance. Such techniques, however, are not di- entertainer rectly applicable to more open and practical scenarios where the type set is not restricted The womens pole vault at the 2010 monthy, event? by KB schema and includes a vast number IAAF World Indoor Championships of free-form types. To model the underly- was held at the ASPIRE Dome on 12 ing label correlations without access to man- and 14 March. date, month ually annotated label structures, we introduce a novel label-relational inductive bias, repre- Table 1: Examples of inconsistent predictions pro- sented by a graph propagation layer that effec- duced by existing entity typing system that does not tively encodes both global label co-occurrence model label correlations. We use different subscript statistics and word-level similarities. On a symbols to indicate contradictory type pairs and show large dataset with over 10,000 free-form types, the ground-truth types in italics. the graph-enhanced model equipped with an attention-based matching module is able to achieve a much higher recall score while main- In practical scenarios, a key challenge of entity taining a high-level precision. Specifically, it typing is to correctly predict multiple ground-truth achieves a 15:3% relative F1 improvement and type labels from a large candidate set that covers also less inconsistency in the outputs. We fur- ther show that a simple modification of our a wide range of types in different granularities. In proposed graph layer can also improve the per- this sense, it is essential for models to effectively formance on a conventional and widely-tested capture the inter-label correlations. For instance, if dataset that only includes KB-schema types.1 an entity is identified as a “criminal”, then the en- tity must also be a “person”, but it is less likely for 1 Introduction this entity to be a “police officer” at the same time. When ignoring such correlations and considering arXiv:1903.02591v1 [cs.CL] 6 Mar 2019 Fine-grained entity typing is the task of identifying each type separately, models are often inferior in specific semantic types of entity mentions in given performance and prone to inconsistent predictions. contexts. In contrast to general entity types (e.g., As shown in Table1, an existing model that in- organization, event), fine-grained types (e.g., po- dependently predicts different types fails to reject litical party, natural disaster) are often more infor- predictions that include apparent contradictions. mative and can provide valuable prior knowledge Existing entity typing research often address for a wide range of NLP tasks, such as coreference this aspect by explicitly utilizing a given type resolution (Durrett and Klein, 2014), relation ex- hierarchy to design hierarchy-aware loss func- traction (Yaghoobzadeh et al., 2016) and question tions (Ren et al., 2016b; Xu and Barbosa, 2018) or answering (Lee et al., 2006; Yavuz et al., 2016). enhanced type label encodings (Shimaoka et al., 1https://github.com/xwhan/ 2017) that enable parameter sharing between re- Extremely-Fine-Grained-Entity-Typing lated types. These methods rely on the assump- tion that the underlying type structures are prede- • Empirically, our model is able to offer sig- fined in entity typing datasets. For benchmarks nificant improvements over previous models annotated with the knowledge base (KB) guided on the Ultra-Fine dataset and also reduces the distant supervision, this assumption is often valid cases of inconsistent type predictions. since all types are from KB ontologies and natu- rally follow tree-like structures. However, since 2 Related Work knowledge bases are inherently incomplete (Min Fine-Grained Entity Typing The task of fine- et al., 2013), existing KBs only include a limited grained entity typing was first thoroughly inves- set of entity types. Thus, models trained on these tigated in (Ling and Weld, 2012), which utilized datasets fail to generalize to lots of unseen types. Freebase-guided distant supervision (DS) (Mintz In this work, we investigate entity typing in a more et al., 2009) for entity typing and created one of open scenario where the type set is not restricted the early large-scale datasets. Although DS pro- by KB schema and includes over 10,000 free-form vides an efficient way to annotate training data, types (Choi et al., 2018). As most of the types do later work (Gillick et al., 2014) pointed out that en- not follow any predefined structures, methods that tity type labels induced by DS ignore entities’ lo- explicitly incorporate type hierarchies cannot be cal context and may have limited usage in context- straightforwardly applied here. aware applications. Most of the following research To effectively capture the underlying label cor- has since focused on testing in context-dependent relations without access to known type struc- scenarios. While early methods (Gillick et al., tures, we propose a novel label-relational induc- 2014; Yogatama et al., 2015) on this task rely on tive bias, represented by a graph propagation layer well-designed loss functions and a suite of hand- that operates in the latent label space. Specifi- craft features that represent both context and enti- cally, this layer learns to incorporate a label affin- ties, Shimaoka et al.(2016) proposed the first at- ity matrix derived from global type co-occurrence tentive neural model which outperformed feature- statistics and word-level type similarities. It can based methods with a simple cross-entropy loss. be seamlessly coupled with existing models and jointly updated with other model parameters. Em- Modeling Entity Type Correlations To bet- pirically, on the Ultra-Fine dataset (Choi et al., ter capture the underlying label correlations, Shi- 2018), the graph layer alone can provide a signif- maoka et al.(2017) employed a hierarchical label icant 11:9% relative F1 improvement over previ- encoding method and AFET (Ren et al., 2016a) ous models. Additionally, we show that the re- used the predefined label hierarchy to identify sults can be further improved (11:9% ! 15:3%) noisy annotations and proposed a partial-label loss with an attention-based mention-context matching to reduce such noise. A recent work (Xu and module that better handles pronouns entity men- Barbosa, 2018) proposed hierarchical loss nor- tions. With a simple modification, we demonstrate malization which alleviated the noise of too spe- that the proposed graph layer is also beneficial cific types. Our work differs from these works to the widely used OntoNotes dataset, despite the in that we do not rely on known label structures fact that samples in OntoNotes have lower label and aim to learn the underlying correlations from multiplicity (i.e., average number of ground-truth data. Rabinovich and Klein(2017) recently pro- types for each sample) and thus require less label- posed a structure-prediction approach which used dependency modeling than the Ultra-Fine dataset. type correlation features. The inference on their To summarize, our major contribution includes: learned factor graph is approximated by a greedy • We impose an effective label-relational bias decoding algorithm, which outperformed unstruc- on entity typing models with an easy-to- tured methods on their own dataset. Instead of us- implement graph propagation layer, which al- ing an explicit graphical model, we enforce a re- lows the model to implicitly capture type de- lational bias on model parameters, which does not pendencies; introduce extra burden on label decoding. • We augment our graph-enhanced model with 3 Task Definition an attention-based matching module, which constructs stronger interactions between the Specifically, the task we consider takes a raw sen- mention and context representations; tence C as well as an entity mention span M inside P(person|x) ! Interaction Dot Product Information Propagation Flow on Graph Entity Mention Linear Transform Attention Self-Attentive leader politician Context Graph Convolution person official Bi-LSTM author GloVe + Position Embedding athlete He has been trying to win international support for the Palestinian cause Neighbors of person Type Co-occurrence Graph a) Mention-Context Encoding b) GCN over Type Vectors Figure 1: Overview of the process to make predictions on the type “person”. a) Modules used to extract mention and context aware representations. b) An illustration of the graph layer operating over the type vector of “person”. lc×hc C as inputs, and aims to predict the correct type la- Ch 2 R , we then apply a self-attentive en- bels Tm of M from a candidate type set T , which coder (McCann et al., 2017) on the top to get the includes more than 10,000 free-form types. The final context representation C. For the entity men- entity span M here can be named entities, nom- tion span, we concatenate the features derived by inals and also pronouns. The ground-truth type a character-level CNN and a similar self-attentive set Tm here usually includes more than one types encoder. We denote the final mention representa- (approximately five types on average), making this tion as M.2 task a multi-label classification problem. 4.2 Mention-Context Interaction 4 Methodology Since most previous datasets only consider named In this section, we first briefly introduce the neural entities, a simple concatenation of the two features architecture to encode raw text inputs.