
Using Entity Information from a Knowledge Base to Improve Relation Extraction

Lan Du1,∗ Anish Kumar2, Mark Johnson2 and Massimiliano Ciaramita3
1Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
2Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
3Google Research, Zurich, Switzerland

Abstract

Relation extraction is the task of extracting predicate-argument relationships between entities from natural language text. This paper investigates whether background information about entities available in knowledge bases such as FreeBase can be used to improve the accuracy of a state-of-the-art relation extraction system. We describe a simple and effective way of incorporating FreeBase's notable types into a state-of-the-art relation extraction system (Riedel et al., 2013). Experimental results show that our notable type-based system achieves an average 7.5% weighted MAP score improvement. To understand where the notable type information contributes the most, we perform a series of ablation experiments. Results show that the notable type information improves relation extraction more than NER labels alone across a wide range of entity types and relations.

1 Introduction

The goal of relation extraction is to extract relational information about entities from a large text collection. For example, given the text "Michael Bay, the director of Transformers, visited Paris yesterday," a relation extraction system might extract the relationship film director(Michael Bay, Transformers). These tuples can then be used to extend a knowledge base. With the increase in the amount of textual data available on the web, relation extraction has gained wide applications in information extraction from both general newswire texts and specialised document collections such as biomedical texts (Liu et al., 2007).

A typical relation extraction system functions as a pipeline, first performing named entity recognition (NER) and entity disambiguation to link the entity mentions found in sentences to their database entries (e.g., "Michael Bay" and "Transformers" would both be linked to their respective database ids). Then the context in which these entity mentions co-occur is used to predict the relationship between the entities. For example, the path in a syntactic parse between two mentions in a sentence can be used as a feature to predict the relation holding between the two entities. Continuing our example, the text pattern feature X-the-director-of-Y (or a corresponding parse subtree fragment) might be used to predict the database relation film director(X,Y). In such a pipeline architecture, information about the entities from the database is available and can be used to help determine the most appropriate relationship between the entities. The goal of this paper is to identify whether that information is useful in a relation extraction task, and to study such information about the entities with a set of ablation experiments.

We hypothesise that information from database entries can play the role of background knowledge in human sentence comprehension. There is strong evidence that humans use world knowledge and contextual information in both syntactic and semantic interpretation (Spivey-Knowlton and Sedivy, 1995), so it is reasonable to expect a machine might benefit from it as well. Continuing with our example, if our database contained the information that one particular entity with the name Michael Bay had died a decade before the movie Transformers was released, then it might be reasonable to conclude that this particular individual was unlikely to have directed Transformers. Clearly, modelling all the ways in which such background information about entities might be used would be extremely complex. This paper explores a simple way of using some of the background information about entities available in FreeBase (Bollacker et al., 2008).

∗This work was partially done while Lan Du was at Macquarie University.

Lan Du, Anish Kumar, Mark Johnson and Massimiliano Ciaramita. 2015. Using Entity Information from a Knowledge Base to Improve Relation Extraction. In Proceedings of Australasian Language Technology Association Workshop, pages 31−38.

Here we focus on one particular kind of background information about entities: the information encoded in FreeBase's notable types. FreeBase's notable types are simple atomic labels given to entities that indicate what the entity is notable for, and so serve as a useful information source that should be relatively easy to exploit. For example, the search results for "Jim Jones" given by FreeBase contain several different entities. Although they all have the same named entity (NE) category PERSON, their notable types are different. The notable types for the top 4 "Jim Jones" results are organization/organization_founder, music/composer, baseball/baseball_player and government/politician. It is clear that the notable type information provides much finer-grained information about "Jim Jones" than just the NE category. It is reasonable to expect that notable types would be useful for relation extraction; e.g., the politician Jim Jones is likely to stand for election, while the baseball player is likely to be involved in sport activities.

We extend the state-of-the-art relation extraction system of Riedel et al. (2013) to exploit this notable type information. Our notable type extensions significantly improve the mean averaged precision (MAP) by 7.5% and the weighted MAP by 6% over a strong state-of-the-art baseline. With a set of ablation experiments we further evaluate how and where the notable type information contributes to relation extraction.

The rest of this paper is structured as follows. The next section describes related work on relation extraction. Section 3 describes how a state-of-the-art relation extraction system can be extended to exploit the notable type information available in FreeBase. Section 4 specifies the procedures used to identify the values of the model parameters, while Section 5 explains how we evaluate our models and presents a systematic experimental comparison of the models by ablating the notable type information in different ways based on entities' NE categories. Section 6 concludes the paper and discusses future work.

2 Related work

Most approaches to relation extraction are either supervised or semi-supervised. Supervised approaches require a large set of manually annotated text as training data (Culotta and Sorensen, 2004), but creating these annotations is both expensive and error-prone. Semi-supervised approaches, by contrast, rely on correlations between relations and other large data sources.

In relation extraction, most semi-supervised approaches use distant supervision, which aligns facts from a large database, e.g., Freebase, to unlabelled text by assuming some systematic relationship between the documents and the database (Bunescu and Mooney, 2007; Mintz et al., 2009; Riedel et al., 2010; Yao et al., 2010). Typically, we assume that (a) an entity linker can reliably identify entity mentions in the text and map them to the corresponding database entries, and (b) for all tuples of entities that appear in a relation in the database, if we observe that entity tuple co-occurring in a suitable linguistic construction (e.g., a sentence) then that construction expresses the database relationship about those entities. Previous work (Weston et al., 2013; Riedel et al., 2013; Bordes et al., 2013; Chang et al., 2014) has shown that models leveraging rich information from the database often yield improved performance.

In this work we are particularly interested in exploring entity type information in relation extraction, as semantic relations often have selectional preferences over entity types. Yao et al. (2010), Singh et al. (2013), Yao et al. (2013), Koch et al. (2014) and Chang et al. (2014) have shown that the use of type information, e.g., NE categories, significantly improves relation extraction. Our work here is similar except that we rely on Freebase's notable types, which provide much finer-grained information about entities. One of the challenges in relation extraction, particularly when attempting to extract a large number of relations, is to generalise appropriately over both entities and relations. Techniques for inducing distributed vector-space representations can learn embeddings of both entities and relations in a high-dimensional vector space, providing a natural notion of similarity (Socher et al., 2013) that can be exploited in the relation extraction task (Weston et al., 2013). Instead of treating notable types as features (Ling and Weld, 2012), here we learn distributed vector-space representations for notable types as well as entities, entity tuples and relations.

3 Relation extraction as matrix completion

Riedel et al. (2013) formulated the relation extraction task as a matrix completion problem. In this section we extend this formulation to exploit notable types in a simple and effective way. Specifically, we follow Riedel et al. (2013) in assuming that our data O consists of pairs ⟨r, t⟩, where r ∈ R is a relation and t ∈ T is a tuple of entities. The tuples are divided into training and test sets depending on which documents they are extracted from. In this paper, the tuples in T are always pairs of entities, but nothing depends on this. There are two kinds of relations in R: syntactic patterns found in the document collection, and those appearing in the database (including target relations for extraction). For our notable type extension we assume we have a function n that maps an entity e to its FreeBase notable type n(e).

For example, given the text "Michael Bay, the director of Transformers, visited Paris yesterday" we extract the pair ⟨r, t⟩ where t = ⟨Michael Bay, Transformers⟩ and r = X-the-director-of-Y (actually, the path in a dependency parse between the named entities). From FreeBase we extract the pair ⟨r′, t⟩ where r′ = film/director. FreeBase also tells us that n(Michael Bay) = Person and n(Transformers) = Film.

Our goal is to learn a matrix Θ whose rows are indexed by entity tuples in T and whose columns are indexed by relations in R. The entry θ_{r,t} is the log odds of relation r ∈ R holding of tuple t ∈ T; equivalently, the probability that relation r holds of tuple t is given by σ(θ_{r,t}), where σ is the logistic function: σ(x) = (1 + e^{−x})^{−1}.

Riedel et al. (2013) assume that Θ is the sum of three submodels: Θ = Θ^N + Θ^F + Θ^E, where Θ^N is the neighbourhood model, Θ^F is the latent feature model and Θ^E is the entity model (these will be defined below). Here we extend these submodels using FreeBase's notable types.

3.1 A notable type extension to the neighbourhood model

The neighbourhood model Θ^N captures dependencies between the syntactic relations extracted from the text documents and the database relations extracted from FreeBase. This is given by:

    θ^N_{r,t} = Σ_{⟨r′,t⟩ ∈ O \ {⟨r,t⟩}} w_{r,r′}

where O is the set of relation/tuple pairs in the data and O \ {⟨r,t⟩} is O with the pair ⟨r,t⟩ removed. w is a matrix of parameters, where w_{r,r′} is a real-valued weight with which relation r′ "primes" relation r, learnt from the training data. The neighbourhood model can be regarded as predicting an entry θ_{r,t} by using entries along the same row. It functions as a logistic regression classifier predicting the log odds of a FreeBase relation r applying to the entity tuple t, using as features the syntactic relations r′ that hold of t.

Our notable type extension to the neighbourhood model enriches the syntactic patterns in the training data O with notable type information. For example, if there is a syntactic pattern X-director-of-Y in our training data (say, as part of the pair ⟨X-director-of-Y, ⟨Michael Bay, Transformers⟩⟩), then we add a new syntactic pattern Person(X)-director-of-Film(Y), where Person and Film are notable types, and add the pair ⟨Person(X)-director-of-Film(Y), ⟨Michael Bay, Transformers⟩⟩ to our data O. Each new relation corresponds to a new column in our matrix completion formulation. More precisely, the new relations are members of the set N = {⟨r, n(t)⟩ : ⟨r, t⟩ ∈ O}, where n(t) is the tuple of notable types corresponding to the entity tuple t. For example, if t = ⟨Michael Bay, Transformers⟩ then n(t) = ⟨Person, Film⟩. Then the notable type extension of the neighbourhood model is:

    θ^{N′}_{r,t} = Σ_{⟨r′,t⟩ ∈ O \ {⟨r,t⟩}} (w_{r,r′} + w′_{r,⟨r′,n(t)⟩})

where w′ is a matrix of weights relating the relations in N to the target FreeBase relation r.

3.2 A notable type extension to the latent feature model

The latent feature model generalises over relations and entity tuples by associating each of them with a 100-dimensional real-valued vector. Intuitively, these vectors organise the relations and entity tuples into clusters where conceptually similar relations and entity tuples are "close," while those that are dissimilar are far apart. In more detail, each relation r ∈ R is associated with a latent feature vector a_r of size K = 100. Similarly, each entity tuple t ∈ T is also associated with a latent feature vector v_t of size K. Then the latent feature score for an entity tuple t and relation r is just the dot product of the corresponding relation and entity tuple vectors, i.e.: θ^F_{r,t} = a_r · v_t.
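To make the scoring functions concrete, the neighbourhood and latent feature scores can be sketched in a few lines of Python. This is a toy illustration only, not the released system: the function names and data structures are ours.

```python
import math

def neighbourhood_score(r, t, w, observed):
    """theta^N_{r,t}: sum of priming weights w[(r, r')] over the
    relations r' observed with the same tuple t, excluding (r, t) itself."""
    return sum(w.get((r, r2), 0.0)
               for (r2, t2) in observed
               if t2 == t and (r2, t2) != (r, t))

def latent_feature_score(a_r, v_t):
    """theta^F_{r,t}: dot product of the relation and tuple embeddings."""
    return sum(x * y for x, y in zip(a_r, v_t))

def sigma(theta):
    """Logistic function: maps the log odds theta_{r,t} to a probability."""
    return 1.0 / (1.0 + math.exp(-theta))
```

For instance, with observed = {("X-the-director-of-Y", ("Michael Bay", "Transformers"))} and a learnt weight w[("film/director", "X-the-director-of-Y")] = 2.0, the neighbourhood score of film/director for that tuple is 2.0, and sigma turns such log-odds scores into probabilities.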

We extend the latent feature model by associating a new latent feature vector with each notable type sequence observed in the training data, and use this vector to enrich the vector-space representations of the entity tuples. Specifically, let T′ = {n(t) : t ∈ T} be the set of notable type tuples for all of the tuples in T, where n(t) is the tuple of notable types corresponding to the tuple of entities t as before. We associate each tuple of notable types t′ ∈ T′ with a latent feature vector v′_{t′} of dimensionality K. Then we define the notable type extension to the latent feature model as:

    θ^{F′}_{r,t} = a_r · (v_t + v′_{n(t)})

This can be understood as associating each entity tuple t ∈ T with a pair of latent feature vectors v_t and v′_{n(t)}. The vector v′_{n(t)} is based on the notable types of the entities, so it can capture generalisations over those notable types. The L2 regularisation employed during inference prefers latent feature vectors in which v_t and v′_{n(t)} are small, thus encouraging generalisations which can be stated in terms of notable types to be captured by v′_{n(t)}.

3.3 A notable type extension of the entity model

The entity model represents an entity e with a K-dimensional (K = 100) feature vector u_e. Similarly, the ith argument position of a relation r is also represented by a K-dimensional feature vector d_{r,i}. The entity model associates a score θ^E_{r,t} with a relation r ∈ R and entity tuple t ∈ T as follows:

    θ^E_{r,t} = Σ_{i=1}^{|t|} d_{r,i} · u_{t_i}

where |t| is the arity of (i.e., number of elements in) the entity tuple t, t_i is the ith entity in the entity tuple t, and d_{r,i} and u_{t_i} are K-dimensional vectors associated with the ith argument slot of relation r and the entity t_i respectively. The intuition is that the latent feature vectors of co-occurring entities and argument slots should be close to each other in the K-dimensional latent feature space, while entities and argument slots that do not co-occur should be far apart.

Our notable type extension of the entity model is similar to our notable type extension of the latent feature model. We associate each notable type m with a K-dimensional feature vector u′_m, and use those vectors to define the entity model score. Specifically, the entity model score is defined as:

    θ^{E′}_{r,t} = Σ_{i=1}^{|t|} d_{r,i} · (u_{t_i} + u′_{n(t_i)})

where n(e) is the notable type for entity e and |t| is the length of tuple t. The L2 regularisation again should encourage generalisations that can be expressed in terms of notable types to be encoded in the u′_{n(t_i)} latent feature vectors.

4 Inference for model parameters

The goal of inference is to identify the values of the model's parameters, i.e., w, a, v, d and u in the case of the Riedel et al. (2013) model, and these plus w′, v′ and u′ in the case of the notable type extensions. The inference procedure is inspired by Bayesian Personalised Ranking (Rendle et al., 2009). Specifically, while the true value of θ_{r,t} is unknown, it is reasonable to assume that if ⟨r, t+⟩ ∈ O (i.e., is observed in the training data) then θ_{r,t+} > θ_{r,t−} for all ⟨r, t−⟩ ∉ O (i.e., not observed in the training data). Thus the training objective is to maximise

    ℓ = Σ_{⟨r,t+⟩ ∈ O} Σ_{⟨r,t−⟩ ∉ O} ℓ_{⟨r,t+⟩,⟨r,t−⟩}

where ℓ_{⟨r,t+⟩,⟨r,t−⟩} = log σ(θ_{r,t+} − θ_{r,t−}), and θ_{r,t} = θ^N_{r,t} + θ^F_{r,t} + θ^E_{r,t} or θ_{r,t} = θ^{N′}_{r,t} + θ^{F′}_{r,t} + θ^{E′}_{r,t}, depending on whether the submodels with notable type extensions are used. The objective function ℓ is then maximised by using stochastic gradient ascent. The stochastic gradient procedure sweeps through the training data, and, for each observed pair ⟨r, t+⟩ ∈ O, samples a negative evidence pair ⟨r, t−⟩ ∉ O not in the training data, adjusting weights to prefer the observed pair.

In our experiments below we ran stochastic gradient ascent with a step size of 0.05 and an L2 regulariser constant of 0.1 for the neighbourhood model and 0.01 for the latent feature and entity models (we used the same regulariser constants for models both with and without the notable type extensions). We ran 2,000 sweeps of stochastic gradient ascent.

5 Experimental evaluation

We used a set of controlled experiments to see to what extent the notable type information improves the state-of-the-art relation extraction system. We used the New York Times corpus (Sandhaus, 2008) in our experiments, assigning articles from the year 2000 as the training corpus and the articles from 1990 to 1999 for testing. The entity tuples T were extracted from the New York Times corpus (tuples that did not appear at least 10 times and also appear in one of the FreeBase relations were discarded). The relations R are either syntactic patterns found in the New York Times corpus, FreeBase relations, or (in our extension) notable types extracted from FreeBase. Our evaluation focuses on 19 FreeBase relations, as in Riedel et al. (2013).

Relation                        #    NF    NF^T  NFE   NFE^T
person/company                  131  0.83  0.89  0.83  0.86
location/containedby            88   0.68  0.69  0.68  0.69
person/nationality              51   0.11  0.55  0.15  0.45
author/works_written            38   0.51  0.53  0.57  0.53
person/parents                  34   0.14  0.31  0.11  0.28
parent/child                    31   0.48  0.58  0.49  0.58
person/place_of_birth           30   0.51  0.48  0.56  0.57
person/place_of_death           22   0.75  0.77  0.75  0.77
neighbourhood/neighbourhood_of  17   0.48  0.55  0.52  0.54
broadcast/area_served           8    0.21  0.41  0.26  0.30
company/founders                7    0.46  0.27  0.40  0.28
team_owner/teams_owned          6    0.21  0.25  0.25  0.25
team/arena_stadium              5    0.06  0.07  0.06  0.09
film/directed_by                5    0.21  0.25  0.24  0.35
person/religion                 5    0.20  0.28  0.21  0.23
composer/compositions           4    0.42  0.44  0.06  0.42
sports_team/league              4    0.70  0.62  0.63  0.64
film/produced_by                3    0.17  0.30  0.12  0.26
structure/architect             2    1.00  1.00  1.00  1.00
MAP                                  0.43  0.49  0.42  0.48
Weighted MAP                         0.55  0.64  0.56  0.62

Table 1: Averaged precision and mean average precision results. The rows correspond to FreeBase relations, and the columns indicate the combination of sub-models (N = neighbourhood model, F = latent feature model, E = entity model). The superscript "T" indicates the combined models that incorporate the notable type extensions, and the # column gives the number of true facts.

5.1 Notable type identification

Our extension requires a FreeBase notable type for every entity mention, which in turn requires a Freebase entity id because a notable type is a property associated with entities in FreeBase. We found the entity id for each named entity as follows. We used the FreeBase API to search for the notable type for each named entity mentioned in the training or test data. In cases where several entities were returned, we used the notable type of the first entity returned by the API. For example, the FreeBase API returns two entities for the string "Canada": a country and a wine (in that order), so we use the notable type "country" for "Canada" in our experiments. This heuristic is similar to the method of choosing the most likely entity id for a string, which provides a competitive baseline for entity linking (Hoffart et al., 2011).

5.2 Evaluation procedure

After the training procedure is complete and we have estimates for the model's parameters, we can use these to compute estimates for the log odds θ_{r,t} for the test data. These values quantify how likely it is that the FreeBase relation r holds of an entity tuple t from the test set, according to the trained model.

In evaluation we follow Riedel et al. (2013) and treat each of the 19 relations r as a query, and evaluate the ranking of the entity tuples t returned according to θ_{r,t}. For each relation r we pool the highest-ranked 100 tuples produced by each of the models and manually evaluate their accuracy (e.g., by inspecting the original document if necessary). This gives a set of results that can be used to calculate a precision-recall curve. Averaged precision (AP) is a measure of the area under that curve (higher is better), and mean average precision (MAP) is average precision averaged over all of the relations we evaluate on. Weighted MAP is a version of MAP that weights each relation by the true number of entity tuples for that relation (so more frequent relations count more).

An unusual property of this evaluation is that increasing the number of models being evaluated generally decreases their MAP scores: as we evaluate more models, the pool of "true" entity tuples for each relation grows in size and diversity (recall that this pool is constructed by manually annotating the highest-scoring tuples returned by each model). Thus in general the recall scores of the existing models are lowered as the number of models increases.

[Figure 1 (plot omitted): Averaged 11-point precision-recall curve for the four models shown in Table 1. Axes: Recall (x) vs. Precision (y); curves for NF, NF^T, NFE and NFE^T.]

5.3 Experiments with Notable Types

We obtained the best performance from the model that incorporates all submodels (which we call NFE^T) and from the model that only incorporates the neighbourhood and latent feature submodels (which we call NF^T), so we concentrate on them here. Table 1 presents the MAP and weighted MAP scores for these models on the 19 FreeBase relations in the testing set. The MAP scores are 6% higher for both NF^T and NFE^T, and the weighted MAP scores are 9% and 6% higher for NF^T and NFE^T respectively.
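The per-relation averaged precision and the weighted MAP described in Section 5.2 can be sketched as follows (a minimal illustration; the function names are ours):

```python
def average_precision(ranked_tuples, true_tuples):
    """Average precision for one relation treated as a query:
    the mean of precision@k at each rank k where a true tuple appears."""
    hits, precision_sum = 0, 0.0
    for k, t in enumerate(ranked_tuples, start=1):
        if t in true_tuples:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(true_tuples) if true_tuples else 0.0

def weighted_map(ap_by_relation, n_true_by_relation):
    """MAP weighted by the number of true facts per relation
    (the # column of Table 1), so frequent relations count more."""
    total = sum(n_true_by_relation.values())
    return sum(ap_by_relation[r] * n_true_by_relation[r]
               for r in ap_by_relation) / total
```

For example, a ranking ["a", "b", "c", "d"] against the true set {"a", "c"} gives AP = (1/1 + 2/3)/2 = 5/6, and weighting two relations' AP scores by their fact counts reproduces the "Weighted MAP" aggregation.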

Relation                        #    NE    NFE^T NE+P  NE+L  NE+O  NE+M
person/place_of_birth           30   0.52  0.57  0.54  0.50  0.50  0.54
author/works_written            38   0.57  0.53  0.61  0.56  0.57  0.49
team/arena_stadium              5    0.08  0.09  0.10  0.09  0.07  0.09
composer/compositions           4    0.35  0.42  0.51  0.37  0.35  0.45
person/company                  131  0.81  0.86  0.84  0.82  0.83  0.86
film/directed_by                5    0.30  0.35  0.41  0.27  0.27  0.41
neighbourhood/neighbourhood_of  17   0.59  0.54  0.59  0.49  0.59  0.62
film/produced_by                3    0.20  0.26  0.29  0.18  0.19  0.40
person/religion                 5    0.22  0.23  0.21  0.22  0.28  0.53
location/containedby            88   0.66  0.69  0.68  0.64  0.64  0.70
sports_team/league              4    0.53  0.64  0.54  0.52  0.75  0.75
person/parents                  34   0.33  0.28  0.30  0.32  0.35  0.34
parent/child                    31   0.55  0.58  0.56  0.55  0.59  0.56
person/place_of_death           22   0.71  0.77  0.74  0.74  0.78  0.72
company/founders                7    0.22  0.28  0.28  0.21  0.29  0.22
team_owner/teams_owned          6    0.34  0.25  0.27  0.34  0.36  0.35
person/nationality              51   0.19  0.45  0.23  0.50  0.20  0.21
broadcast/area_served           8    0.32  0.30  0.33  0.38  0.31  0.29
structure/architect             2    1.00  1.00  1.00  1.00  1.00  1.00
MAP                                  0.45  0.48  0.48  0.46  0.47  0.50
Weighted MAP                         0.57  0.62  0.59  0.60  0.58  0.60

Table 2: Results of ablation experiments on the NFE^T model. The columns correspond to experiments, and the column labels are explained in Table 3.

NE:    All entities are labelled with their NE class instead of their notable type.
NE+P:  Only PERSON entities have notable type information; the notable type of other entities is replaced with their NE class.
NE+L:  Only LOCATION entities have notable type information; the notable type of other entities is replaced with their NE class.
NE+O:  Only ORGANISATION entities have notable type information; the notable type of other entities is replaced with their NE class.
NE+M:  Only MISC entities have notable type information; the notable type of other entities is replaced with their NE class.

Table 3: Descriptions of the ablation experiments in Table 2.

A sign test shows that the difference between the models with notable types and those without the notable types is statistically significant (p < 0.05). Clearly, the notable type extensions significantly improve the accuracy of the existing relation extraction models. Figure 1 shows an averaged 11-point precision-recall curve for these four models. This makes clear that across the range of precision-recall trade-offs, the models with notable types offer the best performance.

5.4 Ablation Experiments

We performed a set of ablation experiments to determine exactly how and where the notable type information improves relation extraction. In these experiments entities are divided into 4 "named entity" (NE) classes, and we examine the effect of providing notable type information only for the entities of a single NE class. The 4 NE classes we used were PERSON, LOCATION, ORGANISATION, and MISC (miscellaneous). We classified all entities into these four categories using their FreeBase types, which provide a more coarse-grained classification than notable types. For example, if an entity has a FreeBase "people/person" type, then we assigned it to the NE class PERSON; if an entity has a "location/location" type, then its NE class is LOCATION; and if an entity has an "organisation/organisation" type, then its NE class is ORGANISATION. All entities not classified as PERSON, LOCATION, or ORGANISATION were labelled MISC.

We ran the ablation experiments as follows. For each NE class c in turn, we replaced the notable type information for entities not classified as c with their NE class. For example, when c = PERSON, only entities with the NE label PERSON had notable type information, and the notable types of all other entities were replaced with their NE labels. Table 3 lists the different ablation experiments, which are designed to study which NE classes the notable types help most on. The results are reported in Table 2, and clearly indicate that different relations benefit from different kinds of notable type information about entities.

Column "NE+P" shows that relations such as "author/works_written", "composer/compositions" and "film/directed_by" benefit the most from notable type information about PERSONs. We noticed that there are about 43K entities classified as PERSON, including 8,888 book authors, 802 music composers, 1,212 film directors, etc. These entities have 214 distinct notable types. Our results show that it is helpful to distinguish the PERSON entities with their notable types for relations involving professions. For example, not all people are authors, so knowing that a person is an author increases the accuracy of extracting "author/works_written".

Similarly, Column "NE+L" shows that "person/nationality" and "broadcast/area_served" gain the most from the notable type information about locations. There are about 8.5K entities classified as LOCATION, including 4,807 city towns, 301 countries, and so on. There are 170 distinct notable types for LOCATION entities.

Column "NE+O" shows that the notable type information about ORGANISATION entities improves the accuracy of extracting relations involving organisations.
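The coarse NE-class assignment described above can be sketched as follows. This is an illustration only: the FreeBase type strings are those quoted in the text, the function name is ours, and the check order (which gives PERSON precedence for entities carrying several of these types) is our assumption.

```python
def ne_class(freebase_types):
    """Map an entity's set of FreeBase types to one of the four coarse
    NE classes used in the ablation experiments (Section 5.4)."""
    if "people/person" in freebase_types:
        return "PERSON"
    if "location/location" in freebase_types:
        return "LOCATION"
    if "organisation/organisation" in freebase_types:
        return "ORGANISATION"
    # Entities with none of the three types above are labelled MISC,
    # e.g., films and books.
    return "MISC"
```

This also reproduces the sports-team observation discussed below: a team whose FreeBase entry lacks the "organisation/organisation" type falls through to MISC.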

Indeed, there are more than 3K business companies and 200 football teams. Notable type information about organisations improves extraction of the "parent/child" relation because this relation involves entities such as companies. For example, in our corpus the sentence "CNN, a unit of the Turner Broadcasting, says that 7000 schools have signed up for The Newsroom" expresses the parent/child(Turner Broadcasting, CNN) relation.

The ablation results in Column "NE+M" show that information about MISC entities is the most useful of all, as this ablation experiment yielded the highest overall MAP score. There are about 13.5K entities labelled MISC. The most frequent notable types for entities in the MISC NE class are "film/film" and "book/book". Therefore it is reasonable that notable type information for MISC entities would improve AP scores for relations such as "film/directed_by" and "person/religion". For example, "George Bush reached a turning point in his life and became a born-again Christian" is an example of the "person/religion" relation, and it is clearly useful to know that "born-again Christian" belongs to the religion notable type.

The "sports_team/league" relation is interesting because it performs best with notable type information for entities in the ORGANISATION or MISC NE classes. It turns out that roughly half the sports teams are classified as ORGANISATIONs and half are classified as MISC. The sports teams that are classified as MISC are missing the "organisation/organisation" type in their FreeBase entries, otherwise they would be classified as ORGANISATIONs.

In summary, the ablation results show that the contribution of notable type information depends on the relation being extracted. The results demonstrate that relations involving organisations benefit from the notable type information about those organisations, and that certain relations benefit more from notable type information than others. Further research is needed to understand some of the ablation experiment results (e.g., why does person/place_of_death perform best with notable type information about ORGANISATIONs?).

6 Conclusion and future work

In this paper we investigated the hypothesis that background information about entities present in a large database such as FreeBase can be useful for relation extraction. We modified a state-of-the-art relation extraction system (Riedel et al., 2013) by extending each of its submodels to exploit the "notable type" information about entities available in FreeBase. We demonstrated that these extensions improve the MAP score by 6% and the weighted MAP score by 7.5%, which is a significant improvement over a strong baseline. Our ablation experiments showed that the notable type information improves relation extraction more than NER tags across a wide range of entity types and relations.

In future work we would like to develop methods for exploiting other information available in FreeBase to improve a broad range of natural language processing and information extraction tasks. We would like to explore ways of exploiting entity information beyond (distant) supervision approaches, for example, in the direction of OpenIE (Wu and Weld, 2010; Fader et al., 2011; Mausam et al., 2012). The temporal information in a large database like FreeBase might be especially useful for named entity linking and relation extraction: e.g., someone that has died is less likely to release a hit single. In summary, we believe that there are a large number of ways in which the rich and diverse information present in FreeBase might be leveraged to improve natural language processing and information retrieval, and exploiting notable types is just one of many possible approaches.

Acknowledgments

This research was supported by a Google award through the Natural Language Understanding Focused Program, and under the Australian Research Council's Discovery Projects funding scheme (project numbers DP110102506 and DP110102593).

References

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1247–1250.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, pages 2787–2795.

Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 576–583.

Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek. 2014. Typed tensor decomposition of knowledge bases for relation extraction. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1568–1579.

Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04), pages 423–429.

Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’11), pages 1535–1545.

Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust disambiguation of named entities in text. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 782–792.

Mitchell Koch, John Gilmer, Stephen Soderland, and Daniel S. Weld. 2014. Type-aware distantly supervised relation extraction with linked arguments. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1891–1901.

Xiao Ling and Daniel S. Weld. 2012. Fine-grained entity recognition. In AAAI.

Yudong Liu, Zhongmin Shi, and Anoop Sarkar. 2007. Exploiting rich syntactic information for relation extraction from biomedical articles. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-Short’07, pages 97–100.

Mausam, Michael Schmitz, Stephen Soderland, Robert Bart, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 523–534.

Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011.

Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 452–461.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III, ECML PKDD’10, pages 148–163.

Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 74–84.

Evan Sandhaus. 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia.

Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCallum. 2013. Joint inference of entities, relations, and coreference. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, pages 1–6.

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems (NIPS 26), pages 926–934.

Michael Spivey-Knowlton and Julie Sedivy. 1995. Resolving attachment ambiguities with multiple constraints. Cognition, 55:227–267.

Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1366–1371.

Fei Wu and Daniel S. Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL’10, pages 118–127.

Limin Yao, Sebastian Riedel, and Andrew McCallum. 2010. Collective cross-document relation extraction without labelled data. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1013–1023.

Limin Yao, Sebastian Riedel, and Andrew McCallum. 2013. Universal schema for entity type prediction. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, pages 79–84.