Discovering Implicit Knowledge with Unary Relations

Michael Glass
IBM Research AI, Knowledge Induction and Reasoning
[email protected]

Alfio Gliozzo
IBM Research AI, Knowledge Induction and Reasoning
[email protected]

Abstract

State-of-the-art relation extraction approaches are only able to recognize relationships between mentions of entity arguments stated explicitly in the text and typically localized to the same sentence. However, the vast majority of relations are either implicit or not sententially localized. This is a major problem for Knowledge Base Population, severely limiting recall. In this paper we propose a new methodology to identify relations between two entities, consisting of detecting a very large number of unary relations and using them to infer the missing entities. We describe a deep learning architecture able to learn thousands of such relations very efficiently by using a common deep learning based representation. Our approach largely outperforms state-of-the-art relation extraction technology on a newly introduced web-scale knowledge base population benchmark, which we release to the research community.

1 Introduction

Knowledge Base Population (KBP) from text is the problem of extracting relations between entities with respect to a given schema, usually defined by a set of types and relations. The facts added to the KB are triples, consisting of two entities connected by a relation. Although providing explicit provenance for the triples is often a subgoal in KBP, we focus on the case where correct triples are gathered from text without necessarily annotating any particular text with a relation. Humans perform very well on the task of understanding relations in text. For example, if the target relation is presidentOf, anyone is able to detect an occurrence of this relation between the entities TRUMP and UNITED STATES in both the sentences "Trump issued a presidential memorandum for the United States" and "The Houston Astros will visit President Donald Trump and the White House on Monday". However, the first example expresses an explicit relation between the two entities, while the second states the same relation implicitly and requires some background knowledge and inference to identify it properly. In fact, the entity UNITED STATES is not even mentioned explicitly in the text, and it is up to the reader to recall that US presidents live in the White House, and therefore people visiting it are visiting the US president.

Very often, relations expressed in text are implicit. This is reflected in the low recall of current KBP relation extraction methods, which are mostly based on recognizing lexical-syntactic connections between two entities within the same sentence. State-of-the-art systems achieve very low performance, close to 16.6% F1, as shown in the latest TAC-KBP evaluation campaigns and in the open KBP evaluation benchmark¹. Existing approaches to dealing with implicit information, such as textual entailment, depend on unsolved problems like inducing entailment rules from text.

¹ https://kbpo.stanford.edu

In this paper, we address the problem of identifying implicit relations in text using a radically different approach: reducing the problem of identifying binary relations to a much larger set of simpler unary relation extraction problems. For example, to build a Knowledge Base (KB) about presidents in the G8 countries, the presidentOf relation can be expanded to presidentOf:UNITED STATES, presidentOf:GERMANY, presidentOf:JAPAN, and so on.
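To make the decomposition concrete, here is a minimal sketch (ours, not from the paper) of expanding binary triples into unary relation labels; the entities and relation names are illustrative.

```python
from collections import defaultdict

def to_unary(triples):
    """Expand binary triples into unary relation labels by fixing the
    second argument: (s, r, o) becomes the label 'r:o' on entity s."""
    labels = defaultdict(set)
    for subj, rel, obj in triples:
        labels[subj].add(f"{rel}:{obj}")   # e.g. 'presidentOf:GERMANY'
    return labels

# Illustrative triples, not taken from the paper's benchmark.
labels = to_unary([("ANGELA MERKEL", "presidentOf", "GERMANY"),
                   ("DONALD TRUMP", "presidentOf", "UNITED STATES")])
assert labels["ANGELA MERKEL"] == {"presidentOf:GERMANY"}
```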

For all these unary relations, we train a multi-class (and in other cases, multi-label) classifier from all the available training data. This classifier takes textual evidence where only one entity is identified (e.g. ANGELA MERKEL) and predicts a confidence score for each unary relation. In this way, ANGELA MERKEL will be assigned to the unary relation presidentOf:GERMANY, which in turn generates the triple ⟨ANGELA MERKEL presidentOf GERMANY⟩.

To implement this idea, we explore the use of knowledge-level supervision, sometimes called distant supervision, to train a deep learning based approach. The training data in this approach is a knowledge base and an unannotated corpus. A pre-existing Entity Detection and Linking system first identifies and links mentions of entities in the corpus. For each entity, the system gathers its context set: the contexts (e.g. sentences or token windows) where it is mentioned. The context set forms the textual evidence for a multi-class, multi-label deep network. The final layer of the network is a vector of unary relation predictions, and the intermediate layers are shared. This architecture allows us to efficiently train thousands of unary relations while reusing the feature representations in the intermediate layers across relations as a form of transfer learning. The predictions of this network represent the probability of the input entity belonging to each unary relation.
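The following sketch shows, under assumed input formats, how this distant supervision setup could be assembled: sentences from an entity-linked corpus are grouped into context sets and paired with multi-label targets. The function and argument names are ours, not the authors' data structures.

```python
def build_training_examples(linked_sentences, entity_labels, relations):
    """linked_sentences: (sentence_text, entities_mentioned) pairs, as an
    Entity Detection and Linking step might produce (format assumed).
    entity_labels: entity -> set of unary relation labels from the KB.
    Returns (entity, context set, multi-label target vector) triples."""
    rel_index = {r: i for i, r in enumerate(relations)}
    context_sets = {}
    for text, entities in linked_sentences:
        for entity in entities:
            context_sets.setdefault(entity, []).append(text)
    examples = []
    for entity, contexts in context_sets.items():
        target = [0.0] * len(relations)            # absent label = negative
        for label in entity_labels.get(entity, ()):
            if label in rel_index:
                target[rel_index[label]] = 1.0
        examples.append((entity, contexts, target))
    return examples
```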
To demonstrate the effectiveness of our approach, we developed a new KBP benchmark consisting of extracting unseen DBpedia triples from the text of a web crawl, using a portion of DBpedia to train the model. As part of the contributions of this paper, we release the benchmark to the research community, providing the software needed to generate it from Common Crawl and DBpedia as an open source project².

² https://github.com/IBM/cc-dbp

As a baseline, we adapt a state-of-the-art deep learning based approach for relation extraction (Lin et al., 2016). Our experiments clearly show that using unary relations to generate new triples greatly complements traditional binary approaches. An analysis of the data shows that our approach is able to capture implicit information from textual mentions and to highlight the reasons why the assignments have been made.

The paper is structured as follows. In section 2 we describe the state of the art in distantly supervised KBP methodologies, with a focus on knowledge induction applications. Section 3 introduces the use of unary relations for KBP, and section 4 outlines the process for producing and training them. Section 5 describes a deep learning architecture able to recognize unary relations from textual evidence. In section 6 we describe the benchmark for evaluation. Section 7 provides an extensive evaluation of unary relations and a saliency map exploration of what the deep learning model has learned. Section 8 concludes the paper, highlighting research directions for future work.

2 Related Work

Binary relation extraction using distant supervision has a long history (Surdeanu et al., 2012; Mintz et al., 2009). Mentions of entities from the knowledge base are located in text. When two entities are mentioned in the same sentence, that sentence becomes part of the evidence for the relation (if any) between those entities. The set of sentences mentioning an entity pair is used in a machine learning model to predict how the entities are related, if at all.

Deep learning has been applied to binary relation extraction. CNN-based (Zeng et al., 2014), LSTM-based (Xu et al., 2015), attention-based (Wang et al., 2016) and compositional embedding based (Gormley et al., 2015) models have been trained successfully using a sentence as the unit of context. Recently, cross-sentence approaches have been explored by building paths connecting the two identified arguments through related entities (Peng et al., 2017; Zeng et al., 2016). These approaches are limited by requiring both entities to be mentioned in a textual context. The context aggregation approaches of state-of-the-art neural models, max-pooling (Zeng et al., 2015) and attention (Lin et al., 2016), do not consider that different contexts may contribute to the prediction in different ways. Instead, the context pooling only determines the degree of a sentence's contribution to the relation prediction.

TAC-KBP is a long-running challenge for knowledge base population. Effective systems in these competitions combine many approaches, such as rule-based relation extraction, directly supervised linear and neural network extractors, distantly supervised neural network models (Zhang et al., 2016) and tensor factorization approaches to relation prediction.

Compositional Universal Schema is an approach based on combining the matrix factorization approach of universal schema (Riedel et al., 2013) with representations of textual relations produced by an LSTM (Chang et al., 2016). The rows of the universal schema matrix are entity pairs, and a row will only be supported by a textual relation if the pair occurs in a sentence together.

Other approaches to relational knowledge induction have used distributed representations for words or entities, with a model predicting the relation between two terms based on their semantic vectors (Drozd et al., 2016). This enables the discovery of relations between terms that do not co-occur in the same sentence. However, the distributed representation of the entities is developed from the corpus without any ability to focus on the relations of interest. One example of such work is LexNET, which developed a model using the distributional word vectors of two terms to predict lexical relations between them (DSh). The term vectors are concatenated and used as input to a single hidden layer neural network. Unlike our approach to unary relations, the term vectors are produced by a standard relation-independent model of the term's contexts, such as word2vec (Mikolov et al., 2013).

Unary relations can be considered similar to types. Work on ontology population has considered the general distribution of a term in text to predict its type (Cimiano and Völker, 2005). Like the method of DSh, this does not customize the representation of an entity to a set of target relations.

3 Unary vs Binary Relations

The basic idea presented in this paper is that, in many cases, relation extraction problems can be reduced to sets of simpler, inter-related unary relation extraction problems. This is possible by providing a specific value for one of the two arguments, transforming the relation into a set of categories. For example, the livesIn relation between persons and countries can be decomposed into 195 relations (one per country), including livesIn:UNITED STATES, livesIn:CANADA, and so on. The argument that is combined with the binary relation to produce the unary relation is called the fixed argument, while the other argument is the filler argument. The KB extension of a unary relation is the set of all filler arguments in the KB, and the corpus extension is the subset of the KB extension that occurs in the corpus.

A requisite for a unary relation is that in the training KB there should exist many triples that share a relation and a particular entity as one argument, thus providing enough training data for each unary classifier. Therefore, in the example above, we will likely not be able to generate predicates for all 195 countries, because some of them will either not occur at all in the training data or will be very infrequent. However, even in cases where arguments tend to follow a long tail distribution, it makes sense to generate unary predicates for the most frequent ones.

[Figure 1: Minimum Corpus Extension to Number of Unary Relations — a log-log plot of the number of unary relations (1 to 1,000,000) against the corpus extension threshold (10,000 down to 1)]

Figure 1 shows the relationship between the threshold on the size of the corpus extension of a unary relation and the number of different unary relations that can be found in our dataset. The relationship is approximately linear on a log-log scale. There are 26 unary relations with a corpus extension of at least 10,000. These relations include:

• hasLocation:UNITED STATES
• background:GROUP OR BAND
• kingdom:ANIMAL
• language:ENGLISH LANGUAGE

Lowering the threshold to 100 yields 8,711 unary relations, and there are close to 1M unary relations with more than 10 entities.

In a traditional binary KBP task, a triple has a relevant context set if the two entities occur together at least once in the corpus, where the notion of 'together' is typically intra-sentential (within a single sentence). In KBP based on unary relations, a triple ⟨FILLER rel FIXED⟩ has a relevant context set if the unary relation rel:FIXED has the filler argument in its corpus extension, i.e. the filler occurs in the corpus.
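A minimal sketch of how candidate unary relations could be enumerated by corpus extension, assuming the KB triples and the set of corpus-mentioned entities are available as plain Python collections; it treats each (filler, relation, fixed) triple as distinct.

```python
from collections import Counter

def unary_relations_by_corpus_extension(triples, mentioned, threshold=100):
    """For each candidate unary relation rel:FIXED, count how many of its
    KB filler arguments actually occur in the corpus (its corpus
    extension), and keep the candidates at or above the threshold."""
    extension = Counter()
    for filler, rel, fixed in triples:
        if filler in mentioned:               # filler occurs in the corpus
            extension[f"{rel}:{fixed}"] += 1
    return {name: n for name, n in extension.items() if n >= threshold}
```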

4 Training and Using Unary Relation Classifiers

A unary relation extraction system is a multi-class, multi-label classifier that takes an entity as input and returns its probability as a slot filler for each relation. In this paper, we represent each entity by the set of contexts (sentences, in our experiments) where its mentions have been located; we call these context sets.

The process of training and applying a KBP system using unary relations is outlined step by step below.

• Build a set of unary relations that have a corpus extension above some threshold.
• Locate the entities from the knowledge graph in text.
• Create a context set for each entity from all the sentences that mention the entity.
• Label the context set with the unary relations (if any) for the entity. The negatives for each unary relation are all the entities for which that unary relation is not true.
• Train a model to determine the unary relations for any given entity from its context set.
• Apply the model to all the entities in the corpus, including those that do not exist in the knowledge graph.
• Convert the extracted unary relations back to binary relations and add them to the knowledge graph as new edges (a sketch of this step follows below). Any new entities are added to the knowledge graph as new nodes.

Although unary relations could also be viewed as types, we argue that it is preferable to consider them as relations. For example, if the unary relation livesIn:UNITED STATES is represented as the type US-PERSON, it has no structured relationship to the type US-COMPANY (basedIn:UNITED STATES). So the inference rule that companies tend to employ people who live in the countries they are based in (⟨company employs person⟩ ∧ ⟨company based in country⟩ ⇒ ⟨person lives in country⟩) is not representable.

Both approaches are limited in different important respects. KBP with unary relations can only produce triples when fixing a relation and argument provides a relatively large corpus extension. Triples such as ⟨BARACK OBAMA spouse MICHELLE OBAMA⟩ cannot be extracted in this way, since neither Barack nor Michelle Obama has a large set of spouses. The limitation of binary relation extraction is that the arguments must occur together; but for many triples, such as those relating to a person's occupation, a film's genre or a company's product type, the second argument is often not given explicitly.

In both cases, a relevant context set is a necessary but not sufficient condition for extracting the triple from text, since the context set may not express (even implicitly) the relation. Figure 2 shows the number of triples in our dataset that have a relevant context set with unary relations exclusively, with binary relations exclusively, and with both. The corpus extension threshold for the unary relations is 100.

[Figure 2: Triples with Relevant Context Sets Per Relation Style — uniquely unary: 566,990; uniquely binary: 199,515; both: 2,783,357]
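As an illustration of the conversion step referenced in the last bullet above, a hypothetical sketch that turns one entity's unary predictions back into binary triples; the names and the confidence threshold are ours.

```python
def predictions_to_triples(entity, relation_names, scores, threshold=0.5):
    """Convert unary predictions for one entity back to binary triples.
    `scores` is the network's per-relation confidence vector."""
    triples = []
    for name, score in zip(relation_names, scores):
        if score >= threshold:
            rel, fixed = name.split(":", 1)    # e.g. 'presidentOf:GERMANY'
            triples.append((entity, rel, fixed, score))
    return triples

# predictions_to_triples("ANGELA MERKEL", ["presidentOf:GERMANY"], [0.97])
# -> [("ANGELA MERKEL", "presidentOf", "GERMANY", 0.97)]
```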

[Figure 3: Unary Relational Knowledge Induction Architecture Overview — a corpus and domain knowledge feed Entity Detection and Linking, entity context sets are assembled, and a unary deep network produces new knowledge triples]

A closer look at the generated training data provides insight into the value of unary relations for distant supervision. Below are example binary contexts relating an organization to a country. Some contexts where two entities occur together (relevant contexts) will imply a relation between them, while others will not. In the first context, Philippines and Eagle Cement are not textually related, while in the second, Dyna Management Services is explicitly stated to be located in Bermuda.

    The company competes with Holcim Philippines, the local unit of Swiss company LafargeHolcim, and Eagle Cement, a company backed by diversified local conglomerate San Miguel which is aggressively expanding into infrastructure.

    ... said Richmond, who is vice president of Dyna Management Services, a Bermuda-based insurance management company.

On the other hand, there are many triples that have no relevant context set using binary extraction, but can be supported with unary extraction. JB Hi-Fi is a company located in Australia (unary relation hasLocation:AUSTRALIA). Although "JB Hi-Fi" never occurs together with "Australia" in our corpus, we can gather implicit textual evidence for this relation from its unary relation context sets. Furthermore, even in cases where there is a relevant binary context set, the contexts may not provide enough (or any) textual support for the relation, while the unary context sets might.

    Woolworths, Coles owner Wesfarmers, JB Hi-Fi and Harvey Norman were also trading higher.

    JB Hi-Fi in talks to buy The Good Guys

    In equities news, protective glove and condom maker Ansell and JB Hi-Fi are slated to post half year results, while Bitcoin group is expected to list on ASX.

The key indicators are "ASX", which is an Australian stock exchange, and the other Australian businesses mentioned, such as Woolworths, The Good Guys, Ansell and Bitcoin group. There is no strict logical entailment indicating that JB Hi-Fi is located in Australia; instead there is textual evidence that makes it probable.

5 Architecture for Unary Relations

Figure 3 illustrates the overall architecture. First, an Entity Detection and Linking system identifies occurrences in text of entities that are or should be in the knowledge base. Second, the contexts (here we use a sentence as the unit of context) for each entity are gathered into an entity context set. This context set provides all the sentences that contain a mention of a particular entity and is the textual evidence for which triples are true for the entity. Third, the context set is fed into a deep neural network, shown in Figure 4. The output of the network is a set of predicted triples that can be added to the knowledge base.

Figure 4 shows the architecture of the deep learning model for unary relation based KBP. From an entity context set, each sentence is projected into a vector space using a piecewise convolutional neural network (Zeng et al., 2015). The sentence vectors are then aggregated using a Network-in-Network layer (NiN) (Lin et al., 2013).

The sentence-to-vector portion of the neural architecture begins by looking up the words in a word embedding table. The word embeddings are initialized with word2vec (Mikolov et al., 2013) and updated during training. The position of each word relative to the entity is also looked up in a position embedding table. Each word vector is concatenated with its position vector to produce the word representation vector. A piecewise max-pooled convolution (PCNN) is applied over the resulting sentence matrix, with the pieces defined by the position of the entity argument: before the entity, the entity, and after the entity. A fully connected layer then produces the sentence vector representation.
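A minimal PyTorch sketch of the sentence-to-vector stage just described; the paper does not name a framework, so the layer and argument names are ours, and the mask-based piecewise pooling is one plausible realization.

```python
import torch
import torch.nn as nn

class SentenceToVector(nn.Module):
    """Word and position embeddings are concatenated, a width-3 convolution
    is applied, and max-pooling is done piecewise over the three segments
    (before the entity, the entity, after the entity). Dimensions follow
    Table 1."""

    def __init__(self, vocab_size, n_positions, word_dim=50, pos_dim=5,
                 filters=1000, width=3, sent_dim=400):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)  # word2vec-initialized in the paper
        self.pos_emb = nn.Embedding(n_positions, pos_dim)   # relative positions, offset to be >= 0
        self.conv = nn.Conv1d(word_dim + pos_dim, filters, width, padding=width // 2)
        self.fc = nn.Linear(3 * filters, sent_dim)

    def forward(self, tokens, positions, piece_masks):
        # tokens, positions: (batch, seq); piece_masks: (batch, 3, seq) floats in {0, 1}
        x = torch.cat([self.word_emb(tokens), self.pos_emb(positions)], dim=-1)
        h = torch.tanh(self.conv(x.transpose(1, 2)))        # (batch, filters, seq)
        pooled = [(h + (piece_masks[:, p:p + 1] - 1.0) * 1e9).max(dim=2).values
                  for p in range(3)]                        # piecewise max-pooling
        return torch.tanh(self.fc(torch.cat(pooled, dim=1)))  # sentence vector
```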

[Figure 4: Deep Learning Architecture for Unary Relations — a sentence such as "…co-founded Allen & Shariff in 1993…" (relative positions −1 0 0 0 1 2) passes through the sentence-to-vector network; sentence vectors are then aggregated]

This is a refinement of the Neural Relation Extraction (NRE) (Lin et al., 2016) approach to sentence-to-vector mapping. The presence of only a single argument simply reduces the two position encoding vectors to one; the fully connected layer over the PCNN is an addition.

The sentence vector aggregation portion of the neural architecture uses a Network-in-Network over the sentence vectors. Network-in-Network (NiN) (Lin et al., 2013) is an approach applying 1x1 CNNs, originally in image processing; the width-1 CNN we use for mention aggregation is an adaptation to a set of sentence vectors. The result is max-pooled and put through a fully connected layer to produce the score for each unary relation. Unlike the maximum aggregation used in many previous works on binary relation extraction (Riedel et al., 2010; Zeng et al., 2015), the evidence from many contexts can be combined to produce a prediction. Unlike the attention-based pooling also used previously for binary relation extraction (Lin et al., 2016), the different contexts can contribute to different aspects, not just to different degrees. For example, a prediction that a city is in France might depend on the conjunction of several facets of textual evidence linking the city to the French language, the Euro, and Norman history.

In contrast, the common maximum aggregation approach is to move the final prediction layer into the sentence-to-vector modules and then aggregate by max-pooling the sentence-level predictions. This aggregation strategy means that only the sentence most strongly indicating the relation contributes to its prediction. We measured the impact of the Network-in-Network sentence vector aggregation approach on the validation set. Relative to Network-in-Network aggregation and using the same hyperparameters, a maximum aggregation strategy gets two percent lower precision at one thousand: 66.55% compared to 68.49%.

There are 790 unary relations with at least one thousand positives in our benchmark. To speed the training, we divided these into eight sets of approximately 100 relations each and trained the models for them in parallel. Unary relations based on the same binary relation were grouped together to share useful learned representations. The resulting split also put similar numbers of positive examples in the training set for each model.

Training continued until no improvement was found on the validation set; this occurred at between five and nine epochs. All eight models were trained with the hyperparameters in Table 1. Dropout was applied on the penultimate layer, the max-pooled NiN.

Based on validation set performance, we found that when larger numbers of relations are trained together, the NiN filters and sentence vector dimension must be increased.
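A companion PyTorch sketch of the NiN aggregation described above, again with names and details of our own choosing; it scores unary relations from the set of sentence vectors produced by the previous sketch.

```python
import torch
import torch.nn as nn

class NiNAggregator(nn.Module):
    """Width-1 convolution over the set of sentence vectors, max-pooled
    across the context set, then a fully connected layer scoring every
    unary relation. Unlike max-pooling per-sentence predictions, different
    sentences can supply different facets of evidence before the final
    decision."""

    def __init__(self, sent_dim=400, nin_filters=400, n_relations=100):
        super().__init__()
        self.nin = nn.Conv1d(sent_dim, nin_filters, kernel_size=1)
        self.drop = nn.Dropout(0.5)          # penultimate layer, as in the paper
        self.out = nn.Linear(nin_filters, n_relations)

    def forward(self, sent_vectors):
        # sent_vectors: (batch, n_sentences, sent_dim), from SentenceToVector
        h = torch.relu(self.nin(sent_vectors.transpose(1, 2)))
        pooled = h.max(dim=2).values         # combine evidence across contexts
        return self.out(self.drop(pooled))   # per-relation logits (sigmoid for probabilities)
```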

Hyperparameter       Value
word embedding       50
position embedding   5
PCNN filters         1000
PCNN filter width    3
sentence vector      400
NiN filters          400
dropout              0.5
learning rate        0.003
decay multiplier     0.95
batch size           16
optimizer            SGD

Table 1: Hyperparameters used

Of all the hyperparameters, training time is most sensitive to the number of PCNN filters, since these are applied to every sentence in a context set. We found major improvements moving from the 230 filters used for NRE to 1000 filters, but little or no improvement from increases beyond that.

6 Benchmark

Large KBs and corpora are needed to train KBP systems in order to collect enough mentions for each relation. However, most of the existing Knowledge Base Population tasks are either small in size (e.g. NYT-FB (Riedel et al., 2010) and TAC-KBP³) or focused on title-oriented documents, which are not available for most domains (e.g. WikiReading (Hewlett et al., 2016)). Therefore, we needed to create a new web-scale knowledge base population benchmark, which we call CC-DBP⁴. It combines the text of Common Crawl⁵ with the triples from 298 frequent relations in DBpedia (Auer et al., 2007). Mentions of DBpedia entities are located in text by gazetteer matching of the preferred label. We use the October 2017 Common Crawl and the most recent (2016-10) version of DBpedia, in both cases limited to English.

³ https://tac.nist.gov/
⁴ https://github.com/IBM/cc-dbp
⁵ http://commoncrawl.org

We divided the entity pairs into training, validation and test sets with an 80%, 10%, 10% split. All triples for a given entity pair are in one of the three splits. This split increases the challenge, since many relations could be used to predict others (such as birthPlace implying nationality). The task is to generate new triples for each relation and rank them according to their probability. We show the precision / recall curves and focus on the relative area under the curves to evaluate the quality of different systems.
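A sketch of how the 80%/10%/10% split by entity pair could be made deterministic; the hash-based assignment is our assumption, not the authors' stated procedure.

```python
import hashlib

def split_for(subject, obj):
    """Assign all triples of an entity pair to the same split, 80/10/10."""
    digest = hashlib.md5(f"{subject}|{obj}".encode("utf-8")).digest()
    bucket = digest[0] % 10               # 0-9, stable across runs
    if bucket < 8:
        return "train"
    return "validation" if bucket == 8 else "test"
```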
Figure 5 shows the distribution of triples with relevant unary context sets per relation type. The relations giving rise to the most triples are high-level relations such as hasLocation, a super-relation comprising the sub-relations country, state, city, headquarter, hometown, birthPlace, deathPlace, and others. Interestingly, there are 165 years with enough people born in them to produce unary relations. While these will all have at least 100 relevant context sets, the context sets typically do not contain textual evidence for any birth year. Perhaps most importantly, there are a large number of diverse relations that are suitable for a unary KBP approach. This indicates the broad applicability of our method.

To test what improvement can be found by incorporating unary relations into KBP, we combine the output of a state-of-the-art binary relation extraction system with our unary relation extraction system. For binary relation extraction, we use a slightly altered version of the PCNN model from NRE (Lin et al., 2016), with the addition of a fully connected layer for each sentence representation before the max-pooled aggregation over relation predictions. We found this refinement to perform slightly better on NYT-FB (Riedel et al., 2010), a standard dataset for distantly supervised relation extraction.

The binary and unary systems are trained from their relevant context sets to predict the triples in train. The validation set is used to tune hyperparameters and choose a stopping point for training. We combine the output of the two systems by, for each triple, taking the highest confidence from either system.
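The combination rule just described is simple enough to state directly; a sketch, with dict-based score tables assumed.

```python
def combine_systems(binary_scores, unary_scores):
    """For each candidate triple, keep the higher confidence of the two
    systems. Scores map (subject, relation, object) to a probability."""
    combined = dict(binary_scores)
    for triple, score in unary_scores.items():
        combined[triple] = max(score, combined.get(triple, 0.0))
    # Rank triples by descending confidence for precision/recall curves.
    return sorted(combined.items(), key=lambda kv: -kv[1])
```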

[Figure 5: Distribution of Unary Relation Counts — relation names sized by triple count, including hasLocation, isClassifiedBy, birthPlace, almaMater, formerTeam, deathPlace, publisher, activeYearsStartYear, activeYearsEndYear, industry, location, draftYear, areaCode, hometown, leaderTitle, battle, instrument, award, kingdom:ANIMAL, computingPlatform, literaryGenre]

7 Evaluation

Figure 6 shows the precision-recall curves for unary only, binary only, and the combined system. The unary and binary systems alone achieve similar performance, but they are effective on very different triples. This is shown by the large gains from combining these complementary approaches. For example, at 0.5 precision, the combined approach has more than double the recall of binary alone (15,750 vs 7,400), representing over 100% relative improvement.

The recall is given as a triple count rather than a percentage. Traditional attempts to measure the recall of KBP systems use the set of all triples explicitly stated in text as the denominator of recall. This is unsuitable for evaluating our approach, because the system is able to make probabilistic predictions based on implicit and partial textual evidence, thus producing correct triples outside the classic recall basis.

[Figure 6: Precision Recall Curves for KBP — precision (0 to 0.9) against recall count (0 to 50,000) for Unary, Binary, and Binary & Unary]

7.1 Saliency Maps

To gain some insight into how the unary KBP system is able to extract implicit knowledge, we turn to saliency maps (Simonyan et al., 2014). By finding the derivative of a particular prediction with respect to the inputs, we can discover a locally linear approximation of how much each part of the input contributed (Zeiler and Fergus, 2014).
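A sketch of this saliency computation, assuming a model composed like the earlier sketches; the forward_from_embeddings hook, which runs the network from precomputed word embeddings, is hypothetical.

```python
import torch

def token_saliency(model, tokens, positions, piece_masks, relation_idx):
    """Differentiate one unary relation's score with respect to the word
    embeddings and reduce it to one magnitude per token."""
    emb = model.word_emb(tokens)                   # (batch, seq, word_dim)
    emb.retain_grad()                              # keep the gradient of a non-leaf tensor
    scores = model.forward_from_embeddings(emb, positions, piece_masks)
    scores[:, relation_idx].sum().backward()       # derivative of one prediction
    return emb.grad.norm(dim=-1)                   # saliency per token
```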

Cold Lake Provincial Park (Alberta, Canada) is mentioned in two sentences in the Common Crawl English text. The unary relational knowledge induction system predicts hasLocation:CANADA with the highest confidence (over 90%). Both sentences contribute to the decision. We see high weight from words including "cold", "provincial" and "french". A handful of countries have "provincial parks", including Argentina, Belgium, South Africa and Canada. Belgium and Canada have substantial French speaking populations, and Canada has by far the coldest climate.

    • located within 10 minutes of cold lake with quick access to OOV ridge ski hill , cold lake provincial park and french bay .

    • welcome to cold lake provincial park on average 4.00 pages are viewed each , by the estimated 959 daily visitors .

Rock Kills Kid is a band mentioned twice in the corpus. From this context set, the relation background:GROUP OR BAND is predicted with high confidence. The fact that "Kid" occurs in the name of the entity seems to be important in identifying it as a musical group. The first sentence also draws focus to the band's connection to rock and pop, while the second sentence seems to recognize the band - song (year) pattern as well as the comparison to Duran Duran.

    • the latest stylish pop synth band is rock kills kid .

    • rock kills kid - are you nervous ? ( 2006 ) who ever thought duran duran would become so influential ?

The Japanese singer-songwriter Masaki Haruna, aka Klaha, is mentioned twice in the corpus. From this context set, the relation background:SOLO SINGER is predicted with high confidence. The first sentence clearly establishes the connection to music, while the second indicates that Klaha is a solo artist. The conjunction of these two facets, accomplished through the context vector aggregation using NiN, permits the conclusion of SOLO SINGER.

    • tvk music chat interview klaha parade .

    • klaha tvk music chat OOV red scarf interview the tv - k folks did after klaha went solo .

8 Conclusions

In this paper we presented a new methodology to identify relations between entities in text. Our approach, focusing on unary relations, can greatly improve the recall of automatic construction and updating of knowledge bases by making use of implicit and partial textual markers. Our method is extremely effective and complements existing binary relation extraction methods for KBP very nicely.

This is just the first step in our wider research program on KBP, whose goal is to improve recall by identifying implicit information from texts. First of all, we plan to explore the use of more advanced forms of entity detection and linking, including propagating features from the EDL system forward for both unary and binary deep models. In addition, we plan to exploit unary and binary relations as sources of evidence to bootstrap a probabilistic reasoning approach, with the goal of leveraging constraints from the KB schema such as domain, range and taxonomies. We will also integrate the new triples gathered from textual evidence with new triples predicted from existing KB relationships by knowledge base completion.

References

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In 6th International Semantic Web Conference, Busan, Korea. Springer, pages 11–15.

H Chang, M Abdurrahman, Ao Liu, J Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell, and Andrew McCallum. 2016. Extracting multilingual relations under limited resources: TAC 2016 cold-start KB construction and slot-filling using compositional universal schema. In Proceedings of TAC.

Philipp Cimiano and Johanna Völker. 2005. Towards large-scale, open-domain and ontology-based named entity classification. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).

Aleksandr Drozd, Anna Gladkova, and Satoshi Matsuoka. 2016. Word embeddings, analogies, and machine learning: Beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics. pages 3519–3530.

Matthew R Gormley, Mo Yu, and Mark Dredze. 2015. Improved relation extraction with feature-rich compositional embedding models. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pages 1774–1784.

Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, and David Berthelot. 2016. WikiReading: A novel large-scale language understanding task over Wikipedia. In Proceedings of the Conference on Association for Computational Linguistics.

Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network in network. arXiv preprint arXiv:1312.4400.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of ACL.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc., pages 3111–3119.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, Stroudsburg, PA, USA, ACL '09, pages 1003–1011.

Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics 5:101–115.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. Machine Learning and Knowledge Discovery in Databases, pages 148–163.

Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pages 74–84.

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop.

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, pages 455–465.

Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016. Relation classification via multi-level attention CNNs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). volume 1, pages 1298–1307.

Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pages 1785–1794.

Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, pages 818–833.

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In EMNLP. pages 1753–1762.

Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. pages 2335–2344.

Wenyuan Zeng, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2016. Incorporating relation paths in neural relation extraction. In CoRR.

Yuhao Zhang, Arun Chaganty, Ashwin Paranjape, Danqi Chen, Jason Bolton, Peng Qi, and Christopher D Manning. 2016. Stanford at TAC KBP 2016: Sealing pipeline leaks and understanding Chinese. In Proceedings of TAC.
