Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision

Yixin Cao 1,2, Lei Hou 2*, Juanzi Li 2, Zhiyuan Liu 2, Chengjiang Li 2, Xu Chen 2, Tiansi Dong 3
1 School of Computing, National University of Singapore, Singapore
2 Department of CST, Tsinghua University, Beijing, China
3 B-IT, University of Bonn, Bonn, Germany
{caoyixin2011,iamlockelightning,successcx}@gmail.com
{houlei,liuzy,lijuanzi}@tsinghua.edu.cn
[email protected]
* Corresponding author.

Abstract

Joint representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings. In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. It captures mutually complementary knowledge, and enables cross-lingual inference among knowledge bases and texts. Our method does not require parallel corpora, and automatically generates comparable data via distant supervision using multi-lingual knowledge bases. We utilize two types of regularizers to align cross-lingual words and entities, and design knowledge attention and cross-lingual attention to further reduce noise. We conducted a series of experiments on three tasks: word translation, entity relatedness, and cross-lingual entity linking. The results, both qualitative and quantitative, demonstrate the significance of our method.

1 Introduction

Multi-lingual knowledge bases (KBs) store millions of entities and facts in various languages, and provide rich background structural knowledge for understanding texts. On the other hand, text corpora contain a huge amount of statistical information complementary to KBs. Many researchers leverage both types of resources to improve various natural language processing (NLP) tasks, such as machine reading (Yang and Mitchell, 2017) and question answering (He et al., 2017; Hao et al., 2017).

Most existing work jointly models a KB and a text corpus to enhance each other by learning word and entity representations in a unified vector space. For example, Wang et al. (2014), Yamada et al. (2016), and Cao et al. (2017) utilize co-occurrence information to align similar words and entities with similar embedding vectors. Toutanova et al. (2015), Wu et al. (2016), Han et al. (2016), Weston et al. (2013a), and Wang and Li (2016) represent entities based on their textual descriptions together with the structured relations. These methods focus on mono-lingual settings. However, for cross-lingual tasks (e.g., cross-lingual entity linking), such approaches need additional tools for translation, which suffer from extra costs and inevitable errors (Ji et al., 2015, 2016).

In this paper, we carry out cross-lingual joint representation learning, which has not been fully researched in the literature. We aim at creating a unified space for words and entities in various languages, easing cross-lingual semantic comparison, which benefits from the complementary information in different languages. For instance, two different meanings of the English word center are expressed by two different words in Chinese: center as the activity-specific building is expressed by 中心, while center as the player role is 中锋.

Our main challenge is the limited availability of parallel corpora, which are usually either expensive to obtain or only available for certain narrow domains (Gouws et al., 2015). Much work has been done to alleviate this problem. One school of methods uses adversarial techniques or domain adaptation to match linguistic distributions (Zhang et al., 2017b; Barone, 2016; Cao et al., 2016). These methods do not require parallel corpora; their weakness is that the training process is unstable and that the high complexity restricts them to small-scale data. Another line of work uses pre-existing multi-lingual resources to automatically generate "pseudo bilingual documents" (Vulic and Moens, 2015, 2016). However, negative results have been observed due to the occasionally poor quality of the training data (Vulic and Moens, 2016). All the above methods focus only on words; we consider both words and entities, which makes the parallel data issue even more challenging.

In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. The basic idea is to capture mutually complementary knowledge in a shared semantic space, which enables joint inference among cross-lingual knowledge bases and texts without additional translation. We achieve this by (1) utilizing an existing multi-lingual knowledge base to automatically generate cross-lingual supervision data, (2) learning mono-lingual word and entity representations, and (3) applying a cross-lingual sentence regularizer and a cross-lingual entity regularizer to align similar words and entities with similar embeddings. The entire framework is trained using a unified objective function, which is efficient and applicable to arbitrary language pairs that exist in multi-lingual KBs.

Particularly, we build a bilingual entity network from inter-language links¹ in KBs for regularizing cross-lingual entities through a variant of the skip-gram model (Mikolov et al., 2013c). Thus, mono-lingual structured knowledge of entities is not only extended to cross-lingual settings, but also augmented from other languages. On the other hand, we utilize distant supervision to generate comparable sentences for a cross-lingual sentence regularizer that models co-occurrence information across languages. Compared with "pseudo bilingual documents", comparable sentences achieve higher quality, because they rely not only on shared semantics at the document level, but also on cross-lingual information at the sentence level. We further introduce two attention mechanisms, knowledge attention and cross-lingual attention, to select informative data from the comparable sentences.

¹ https://en.wikipedia.org/wiki/Help:Interlanguage_links

Our contributions can be summarized as follows:

• We propose a novel method that jointly learns representations of not only cross-lingual words but also cross-lingual entities in a unified vector space, aiming to enhance embedding quality through complementary semantics.

• Our model introduces distant supervision coupled with attention mechanisms to generate comparable data as cross-lingual supervision, which can benefit many other cross-lingual analyses.

• We conducted qualitative analysis to gain an intuitive impression of our embeddings, and quantitative analysis on three tasks: word translation, entity relatedness, and cross-lingual entity linking. Experimental results show that our method achieves significant improvements on all three tasks.

2 Related Work

Joint representation learning of words and entities has attracted much attention in fields such as entity linking (Zhang et al., 2017a; Cao et al., 2018) and relation extraction (Weston et al., 2013b), yet little work focuses on cross-lingual settings. We therefore survey cross-lingual word embedding models (Ruder et al., 2017), and classify them into three groups according to the parallel corpora used as supervision: (i) methods requiring a parallel corpus with aligned words as constraints for bilingual word embedding learning (Klementiev et al., 2012; Zou et al., 2013; Wu et al., 2014; Luong et al., 2015; Ammar et al., 2016; Soricut and Ding, 2016); (ii) methods using parallel sentences (i.e., translated sentence pairs) for the semantic composition of multi-lingual words (Gouws et al., 2015; Kociský et al., 2014; Hermann and Blunsom, 2014; Chandar et al., 2014; Shi et al., 2015; Mogadala and Rettinger, 2016); (iii) methods requiring a bilingual lexicon to map words from one language into the other (Mikolov et al., 2013b; Faruqui and Dyer, 2014; Xiao and Guo, 2014).

The major weakness of these methods is the limited availability of parallel corpora. One remedy is to use existing multi-lingual resources (i.e., multi-lingual KBs).
Camacho-Collados et al. (2015) combine several KBs (Wikipedia, WordNet, and BabelNet) and leverage multi-lingual synsets to learn word embeddings at the sense level through an extra post-processing step. Artetxe et al. (2017) start from a small bilingual lexicon and use a self-learning approach to induce the structural similarity of embedding spaces. Vulic and Moens (2015, 2016) collect comparable documents on the same themes from multi-lingual Wikipedia, then shuffle and merge them to build "pseudo bilingual documents" as training corpora. However, the quality of "pseudo bilingual documents" is difficult to control, resulting in poor performance on several cross-lingual tasks (Vulic and Moens, 2016).

Another remedy matches linguistic distributions via adversarial training (Barone, 2016; Zhang et al., 2017b; Lample et al., 2018) or domain adaptation (Cao et al., 2016). However, these methods suffer from the instability of the training process and high complexity, which either limits the scalability of the vocabulary or relies on a strong distributional assumption.

Inspired by Vulic and Moens (2016), we generate high-quality comparable sentences via distant supervision, one of the most promising approaches to addressing the issue of sparse training data, which performs well in relation extraction (Lin et al., 2017a; Mintz et al., 2009; Zeng et al., 2015; Hoffmann et al., 2011; Surdeanu et al., 2012). Our comparable sentences may further benefit many other cross-lingual analyses, such as information retrieval (Dong et al., 2014).


[Figure 1: The overview framework of our method. The inputs and outputs of each step are listed in the three levels; there are three main components of joint representation learning. Red texts with brackets are anchors, dashed lines denote entity relations, and solid lines are cross-lingual links.]

3 Preliminaries and Framework

3.1 Preliminaries

Given a multi-lingual KB, we take (i) a text corpus, (ii) entities and their relations, and (iii) a set of anchors as inputs, and learn embeddings for each word and each entity in the various languages. For clarity, we use English and Chinese as sample languages in the rest of the paper, and use the superscript $y \in \{en, zh\}$ to denote language-specific parameters.²

² We choose English and Chinese as example languages because they are top-ranked by total number of speakers; the full list can be found at https://en.wikipedia.org/wiki/Lists_of_languages_by_number_of_speakers.

We use multi-lingual Wikipedia as the KB, including a set of entities $\mathcal{E}^y = \{e_i^y\}$ and their articles. We concatenate these articles together to form the text corpus $\mathcal{D}^y = \langle w_1^y, \dots, w_i^y, \dots, w_{|\mathcal{D}|}^y \rangle$. Hyperlinks in articles are denoted by anchors $\mathcal{A}^y = \{\langle w_i^y, e_j^y \rangle\}$, indicating that word $w_i^y$ refers to entity $e_j^y$. $\mathcal{G}^y = (\mathcal{E}^y, \mathcal{R}^y)$ is the mono-lingual Entity Network (EN), where $\langle e_i^y, e_j^y \rangle \in \mathcal{R}^y$ if there is a link between $e_i^y$ and $e_j^y$. We use inter-language links in Wikipedia as cross-lingual links $\mathcal{R}^{en\text{-}zh} = \{\langle e_i^{en}, e_{i'}^{zh} \rangle\}$, indicating that $e_i^{en}$ and $e_{i'}^{zh}$ refer to the same thing in English and Chinese. Cross-lingual word and entity representation learning maps words and entities in different languages into a unified semantic space, where each word and entity obtains embedding vectors $\mathbf{w}_i^y$ and $\mathbf{e}_j^y$.³

³ For cross-lingual linked entities sharing the same string (e.g., NBA and NBA (zh)), an infrequent situation between languages, we use separate representations to keep the training objective consistent and avoid confusion.
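To make these inputs concrete, the following is a minimal sketch of one possible in-memory layout; the container name MonoLingualKB, its field names, and the sample links are our own illustration, not taken from the authors' implementation.

    from dataclasses import dataclass

    @dataclass
    class MonoLingualKB:
        """Hypothetical container for the mono-lingual inputs of one language y."""
        lang: str         # y, e.g. "en" or "zh"
        corpus: list      # D^y: sentences, each a list of tokens
        anchors: list     # A^y: pairs (word w_i^y, entity e_j^y)
        entity_net: dict  # G^y: entity -> set of neighbor entities (R^y as adjacency)

    # R^{en-zh}: inter-language links between equivalent entities.
    cross_links = [
        ("NBA", "NBA (zh)"),
        ("Lawrence Michael Foust", "福斯特"),
    ]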

3.2 Framework

To alleviate the heavy burden of limited parallel corpora and additional translation efforts, we utilize existing multi-lingual resources to distantly supervise cross-lingual word and entity representation learning, so that the shared embedding space supports joint inference among KBs and texts across languages. As shown in Figure 1, our framework has two steps: (1) Cross-lingual Supervision Data Generation builds a bilingual entity network and generates comparable sentences based on cross-lingual links; (2) Joint Representation Learning learns cross-lingual word and entity embeddings using a unified objective function. Our assumption throughout the entire framework is as follows: the more words/entities two contexts share, the more similar they are.

As shown in Figure 1, we build a bilingual EN $\mathcal{G}^{en\text{-}zh}$ from $\mathcal{G}^{en}$, $\mathcal{G}^{zh}$, and the cross-lingual links $\mathcal{R}^{en\text{-}zh}$, so that entities in different languages are connected in a unified network, facilitating cross-lingual entity alignment. Meanwhile, from the KB articles we extract comparable sentences $\mathcal{S}^{en\text{-}zh} = \{\langle s_k^{en}, s_{k'}^{zh} \rangle\}$ as high-quality parallel data to align similar words in different languages.

Based on the generated cross-lingual data $\mathcal{G}^{en\text{-}zh}$, $\mathcal{S}^{en\text{-}zh}$ and the mono-lingual data $\mathcal{D}^y$, $\mathcal{A}^y$, where $y \in \{en, zh\}$, we jointly learn cross-lingual word and entity embeddings through three components. (1) Mono-lingual Representation Learning learns mono-lingual word and entity embeddings for each language by modeling co-occurrence information through a variant of the skip-gram model (Mikolov et al., 2013c). (2) The Cross-lingual Entity Regularizer aligns entities that refer to the same thing in different languages by extending the mono-lingual model to the bilingual EN. For example, the entity Foust in English and the entity 福斯特 (Foust) in Chinese are closely embedded in the semantic space because they share common neighbors in the two languages, e.g., All-star and NBA 选秀 (draft). (3) The Cross-lingual Sentence Regularizer models cross-lingual co-occurrence at the sentence level so that translated words obtain similar embeddings. For example, the English word basketball and the translated Chinese word 篮球 frequently co-occur in pairs of comparable sentences; therefore, their vector representations shall be close in the semantic space. The three components are trained jointly under a unified objective function.

4 Cross-lingual Supervision Data Generation

This section introduces how to build the bilingual entity network $\mathcal{G}^{en\text{-}zh}$ and the comparable sentences $\mathcal{S}^{en\text{-}zh}$ from a multi-lingual KB.

4.1 Bilingual Entity Network Construction

Entities with cross-lingual links refer to the same thing, which implies they are equivalent across languages. Conventional knowledge representation methods only add an edge between $e_i^{en}$ and $e_{i'}^{zh}$ indicating a special "equivalent" relation (Zhu et al., 2017). Instead, we build $\mathcal{G}^{en\text{-}zh} = (\mathcal{E}^{en} \cup \mathcal{E}^{zh}, \mathcal{R}^{en} \cup \mathcal{R}^{zh} \cup \tilde{\mathcal{R}}^{en\text{-}zh})$ by enriching the neighbors of cross-lingually linked entities. That is, we add edges $\tilde{\mathcal{R}}^{en\text{-}zh}$ between the two mono-lingual ENs by letting all neighbors of $e_i^{en}$ be neighbors of $e_{i'}^{zh}$, and vice versa, if $\langle e_i^{en}, e_{i'}^{zh} \rangle \in \mathcal{R}^{en\text{-}zh}$.

$\mathcal{G}^{en\text{-}zh}$ extends $\mathcal{G}^{en}$ and $\mathcal{G}^{zh}$ to bilingual settings in a natural way. It not only keeps the objective consistent with the mono-lingual ENs (entities, no matter in which language, are embedded closely if they share common neighbors), but also enriches each mono-lingual EN with more neighbors from the foreign language. For instance, following the method in Zhu et al. (2017), there would be no edge between the Chinese entity 福斯特 (Foust) and the English entity Pistons, which implies the wrong fact that 福斯特 (Foust) does not belong to the Pistons. Our method instead recovers the missing relation between the entities 福斯特 (Foust) and 活塞队 (Pistons) in the incomplete Chinese KB through their common English neighbors, All-star, NBA, etc., as illustrated in Figure 1.
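The neighbor enrichment follows directly from the definition of $\mathcal{G}^{en\text{-}zh}$. Below is a minimal sketch, assuming each entity network is an undirected adjacency dict from entities to neighbor sets; the function name and data layout are our own, not the authors' code.

    def build_bilingual_en(g_en, g_zh, cross_links):
        """Merge the two mono-lingual ENs and add the enriched edges
        R~^{en-zh}: for each pair <e^en, e^zh> in R^{en-zh}, every neighbor
        of e^en also becomes a neighbor of e^zh, and vice versa (kept
        symmetric, since the ENs are undirected)."""
        g = {}
        for mono in (g_en, g_zh):
            for e, nbrs in mono.items():
                g.setdefault(e, set()).update(nbrs)
        for e_en, e_zh in cross_links:
            for n in g_en.get(e_en, set()):
                g.setdefault(e_zh, set()).add(n)
                g.setdefault(n, set()).add(e_zh)
            for n in g_zh.get(e_zh, set()):
                g.setdefault(e_en, set()).add(n)
                g.setdefault(n, set()).add(e_en)
        return g

With the Figure 1 example, 福斯特 (Foust) would gain NBA and All-star as neighbors through its link to the English entity Foust, recovering edges missing from the incomplete Chinese KB.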
4.2 Comparable Sentence Generation

To supervise the cross-lingual representation learning of words, we automatically generate comparable sentences as cross-lingual training data. Comparable sentences are not translated sentence pairs, but sentences on the same topic in different languages. As shown in the middle layer of Figure 1, the following pair of sentences is comparable: (1) "Lawrence Michael Foust was an American basketball player who spent 12 seasons in NBA"; (2) "拉里·福斯特 (Lawrence Foust) 是 (was) 美国 (American) NBA 联盟 (association) 的 (of) 前 (former) 职业 (professional) 篮球 (basketball) 运动员 (player)".

Inspired by the distant supervision technique in relation extraction, we assume that a sentence $s_k^{en}$ in the Wikipedia article of entity $e_i^{en}$ explicitly or implicitly describes $e_i^{en}$ (Yamada et al., 2017), and that $s_k^{en}$ expresses a relation between $e_i^{en}$ and $e_j^{en}$ if another entity $e_j^{en}$ occurs in $s_k^{en}$. Meanwhile, we find a comparable sentence $s_{k'}^{zh}$ in the other language, namely a sentence in the Wikipedia article of the Chinese entity $e_{i'}^{zh}$ that contains $e_{j'}^{zh}$, where $\langle e_i^{en}, e_{i'}^{zh} \rangle, \langle e_j^{en}, e_{j'}^{zh} \rangle \in \mathcal{R}^{en\text{-}zh}$.
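Under this assumption, candidate pairs can be collected by scanning the two articles of each cross-lingually linked entity pair for anchors that are themselves cross-lingually linked. The following is a simplified sketch under a hypothetical data layout (each article is a list of (sentence, set-of-anchor-entities) pairs); it is an illustration of the procedure, not the authors' pipeline.

    def generate_comparable_sentences(articles_en, articles_zh, cross_links):
        """Distantly supervised pairing: for each <e_i^en, e_i'^zh> in
        R^{en-zh}, pair an English sentence from e_i^en's article with a
        Chinese sentence from e_i'^zh's article whenever the two sentences
        mention cross-lingually linked anchors e_j^en and e_j'^zh."""
        zh_of = dict(cross_links)  # e_j^en -> e_j'^zh
        pairs = []
        for e_en, e_zh in cross_links:
            for s_en, anchors_en in articles_en.get(e_en, []):
                for s_zh, anchors_zh in articles_zh.get(e_zh, []):
                    if any(zh_of.get(a) in anchors_zh for a in anchors_en):
                        pairs.append((s_en, s_zh))
        return pairs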

As shown in Figure 1, the sentences in the second level are comparable due to their shared theme, the relation between the entities Foust and NBA. To find this type of sentence, we search the anchors in the English article and the Chinese article of the cross-lingual entity Foust, respectively, and extract the sentences that include another cross-lingual entity, NBA. Comparable sentences can thus be regarded as cross-lingual contexts.

Unfortunately, comparable sentences suffer from two issues caused by distant supervision:

Wrong labelling. Taking English as an example, there may be several sentences $\{s_{k,l}^{en}\}_{l=1}^{L}$ containing the same entity $e_j^{en}$ in the article of $e_i^{en}$. A straightforward solution is to concatenate them into one longer sentence $s_k^{en}$, but this increases the chance of including unrelated sentences.

Unbalanced information. Sometimes the pair of sentences conveys unbalanced information; e.g., the English sentence in the middle layer of Figure 1 states that Foust spent 12 seasons in the NBA, while the comparable Chinese sentence does not.

To address these issues, we propose knowledge attention and cross-lingual attention to filter out unrelated information at the sentence level and at the word level, respectively.

5 Joint Representation Learning

As shown in Figure 2, there are three components in learning cross-lingual word and entity representations, which are trained jointly. In this section, we describe them in detail.

5.1 Mono-lingual Representation Learning

Following Yamada et al. (2016) and Cao et al. (2017), we learn mono-lingual word/entity embeddings based on the corpus $\mathcal{D}^y$, the anchors $\mathcal{A}^y$, and the entity network $\mathcal{G}^y$. Capturing the co-occurrence information among words and entities, these embeddings serve as the foundation and are further extended to bilingual settings using the proposed cross-lingual regularizers, which are detailed in the following subsections. Monolingually, we utilize a variant of the skip-gram model (Mikolov et al., 2013c) to predict the contexts of the current word/entity:

$$\mathcal{L}_m = \sum_{y \in \{en, zh\}} \sum_{x_i^y \in \{\mathcal{D}^y, \mathcal{A}^y, \mathcal{G}^y\}} \log P(C(x_i^y) \mid x_i^y)$$

where $x_i^y$ is either a word or an entity, and $C(x_i^y)$ denotes: (i) the contextual words in a pre-defined window around $x_i^y$ if $x_i^y \in \mathcal{D}^y$; (ii) the neighbor entities linked to $x_i^y$ if $x_i^y \in \mathcal{G}^y$; (iii) the contextual words of $w_j^y$ if $x_i^y$ is the entity $e_i^y$ in an anchor $\langle w_j^y, e_i^y \rangle \in \mathcal{A}^y$.
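In skip-gram-style models, $\log P(C(x_i^y) \mid x_i^y)$ is typically approximated with negative sampling, as in word2vec; the paper does not spell out the approximation here, so the following plain-NumPy sketch is our own simplification for one target and its contexts, not the authors' implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def skipgram_ns_loss(x_vec, context_vecs, negative_vecs):
        """Negative-sampling surrogate for -log P(C(x) | x) of one target x
        (a word or an entity): its true contexts (window words or neighbor
        entities) are pushed toward x, sampled negatives are pushed away."""
        pos = sum(np.log(sigmoid(x_vec @ c)) for c in context_vecs)
        neg = sum(np.log(sigmoid(-x_vec @ n)) for n in negative_vecs)
        return -(pos + neg)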

5.2 Cross-lingual Entity Regularizer

The bilingual EN $\mathcal{G}^{en\text{-}zh}$ merges entities in different languages into a unified network, making it possible to use the same objective as in the mono-lingual ENs. Thus, we naturally extend the mono-lingual objective to cross-lingual settings:

$$\mathcal{L}_e = \sum_{e_i^y \in \mathcal{G}^{en\text{-}zh}} \log P(C'(e_i^y) \mid e_i^y)$$

where $C'(e_i^y)$ denotes cross-lingual contexts, i.e., neighbor entities in different languages that are linked to $e_i^y$. Thus, by jointly learning mono-lingual representations with the cross-lingual entity regularizer, words and entities share more common contexts and obtain similar embeddings. As shown in Figure 1, the English entity NBA co-occurs with the words basketball and player in texts, so they are embedded closely in the semantic space. Meanwhile, the cross-lingually linked entities NBA and NBA (zh) have similar representations because they share most of their neighbor entities, e.g., Foust.

5.3 Cross-lingual Sentence Regularizer

Comparable sentences provide cross-lingual co-occurrence of words; thus, we can use them to learn similar embeddings for words that frequently co-occur, by minimizing the Euclidean distance:

$$\mathcal{L}_s = \sum_{\langle s_k^{en}, s_{k'}^{zh} \rangle \in \mathcal{S}^{en\text{-}zh}} \left\| \mathbf{s}_k^{en} - \mathbf{s}_{k'}^{zh} \right\|^2$$

where $\mathbf{s}_k^{en}$ and $\mathbf{s}_{k'}^{zh}$ are sentence embeddings. Taking English as the sample language, we define the sentence embedding as the average sum of word vectors weighted by the combination of the two types of attention:

$$\mathbf{s}_k^{en} = \sum_{l=1}^{L} \psi(e_m^{en}, s_{k,l}^{en}) \sum_{w_i^{en} \in s_{k,l}^{en}} \psi'(w_i^{en}, w_j^{zh}) \, \mathbf{w}_i^{en}$$

where $\{s_{k,l}^{en}\}_{l=1}^{L}$ are the sentences containing the same entity (as mentioned in Section 4.2), and $\psi(e_m^{en}, s_{k,l}^{en})$ is the knowledge attention that aims to select the most informative sentences.
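The regularizer can be computed directly from the two equations above. A minimal sketch follows, assuming the attention weights $\psi$ and $\psi'$ have already been computed and normalized (their exact definitions are lost in this extraction, so they are treated as given inputs here):

    import numpy as np

    def sentence_embedding(sent_word_vecs, knowledge_att, crossling_att):
        """s_k^en = sum_l psi(e_m, s_{k,l}) * sum_i psi'(w_i, w_j) * w_i over
        the L sentences that mention the same entity. `sent_word_vecs[l]` is
        the list of word vectors of sentence l; both attention weight arrays
        are assumed precomputed and normalized."""
        s = np.zeros_like(sent_word_vecs[0][0])
        for l, vecs in enumerate(sent_word_vecs):
            weighted = sum(crossling_att[l][i] * w for i, w in enumerate(vecs))
            s = s + knowledge_att[l] * weighted
        return s

    def sentence_regularizer_term(s_en, s_zh):
        """One summand of L_s: squared Euclidean distance between the
        embeddings of a comparable sentence pair."""
        return float(np.sum((s_en - s_zh) ** 2))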

[Figure 2: the three components of joint representation learning; the figure content, along with the remainder of Section 5, is not recoverable from this extraction.]

6 Experiments
In this paper, we propose a novel method that integrates cross-lingual word and entity representation learning, enabling joint inference among knowledge bases and texts across languages without any additional translation mechanism, which is usually expensive and may introduce inevitable errors. Instead of relying on parallel data, we automatically generate cross-lingual training data via distant supervision over multi-lingual knowledge bases, and we propose knowledge attention and cross-lingual attention to select the most informative words and filter out noise, which further improves performance. In experiments, the separate tasks of word translation and entity relatedness demonstrate the effectiveness of our method, with average gains of 20% and 3% over baselines, respectively; using cross-lingual entity linking as a case study, the results on a benchmark dataset further verify the quality of our embeddings.

The learned embeddings help break down language gaps in many tasks, such as cross-lingual entity linking, in which the major challenge lies in measuring the similarity between entities and the corresponding mentioned words in different languages. The intuition is that words and entities in various languages share some common semantics, but there are also ways in which they differ. On one hand, we utilize their shared semantics to align similar words and entities across languages with similar embedding vectors.
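For concreteness, the following Python sketch shows how candidates could be ranked in such a unified space; it is an illustration rather than the exact model, and the lookup tables word_vec and entity_vec, the cosine scoring, the candidate identifiers, and the averaging of context words are all assumptions:

import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def rank_candidates(context_words, candidates, word_vec, entity_vec):
    # Represent the mention by the average of its context-word vectors,
    # then score each candidate entity by cosine similarity; both kinds
    # of vectors live in the same cross-lingual space.
    ctx = np.mean([word_vec[w] for w in context_words if w in word_vec], axis=0)
    scored = [(e, cosine(entity_vec[e], ctx)) for e in candidates]
    return sorted(scored, key=lambda pair: -pair[1])

# Chinese context words can be compared directly with English entities, e.g.:
# rank_candidates(["篮球", "运动员", "NBA"],
#                 ["Lawrence_Foust", "Lawrence_(given_name)"],
#                 word_vec, entity_vec)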
[Figure: A bilingual (English-Chinese) example of the unified semantic space. The English description "[[Lawrence Michael Foust]] was an American basketball player ... NBA All-Star ..." and a comparable Chinese sentence about the same NBA player are aligned via cross-lingual attention together with the sentence and entity regularizers, so that the words and entities of both languages share one space.]
5.3 Cross-lingual Sentence Regularizer

Comparable sentences provide cross-lingual co-occurrences of words. Through distant supervision over the multi-lingual KB, each entity e_i yields bilingual sentence pairs s_{k,l} = (s_k, s_l), drawn from its textual descriptions in the two languages, for which we set ψ(e_i, s_{k,l}) = 1.
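The generation step itself is easy to sketch. In the snippet below, kb.entities(), kb.sentences(e, lang), and mentions(e, s) are assumed interfaces to a multi-lingual KB dump, not the paper's actual code; the pairing simply enumerates description sentences from the two languages that mention the same entity:

from itertools import product

def generate_comparable_pairs(kb, mentions, lang_a="en", lang_b="zh"):
    # Distant supervision: for every entity described in both languages,
    # pair the description sentences that mention it. Every generated
    # pair (s_k, s_l) receives the label psi(e, s_{k,l}) = 1.
    pairs = []
    for e in kb.entities():
        sents_a = [s for s in kb.sentences(e, lang_a) if mentions(e, s)]
        sents_b = [s for s in kb.sentences(e, lang_b) if mentions(e, s)]
        for s_k, s_l in product(sents_a, sents_b):
            pairs.append((e, s_k, s_l))
    return pairs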
A straightforward solution is to concatenate the two sentences of a pair into a longer sentence s, but this increases the chance to include unrelated sentences. Instead, we keep the sentences separate and align them softly through cross-lingual attention: the attention between word w_ki in s_k and word w_lj in s_l is proportional to the similarity of their vectors,

α_{ki,lj} ∝ sim(w_ki, w_lj),

and the weight of each word within its sentence aggregates the attention it receives from the paired sentence. Each sentence is then embedded as the attention-weighted average of its word vectors, and the regularizer minimizes the distance between the embeddings of comparable sentences,

L = Σ_{ψ(e_i, s_{k,l}) = 1} ||s_k − s_l||²,

so that words occurring in comparable contexts across languages obtain close representations.
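A minimal numpy sketch of the regularizer as reconstructed above follows; the dot-product similarity, the exponential normalization, and the summation that turns pairwise attentions into per-word weights are assumptions, not necessarily the exact formulation:

import numpy as np

def sentence_regularizer(W_k, W_l):
    # W_k: (n_k, d) word vectors of sentence s_k; W_l: (n_l, d) of s_l.
    # alpha[i, j] is proportional to sim(w_ki, w_lj), realized here as a
    # dot product passed through an exponential.
    sim = W_k @ W_l.T                        # pairwise similarities (n_k, n_l)
    att = np.exp(sim - sim.max())            # unnormalized attention weights
    a_k = att.sum(axis=1)                    # mass each word of s_k receives
    a_l = att.sum(axis=0)                    # mass each word of s_l receives
    s_k = (a_k / a_k.sum()) @ W_k            # attention-weighted sentence
    s_l = (a_l / a_l.sum()) @ W_l            # embeddings of the pair
    return float(np.sum((s_k - s_l) ** 2))   # ||s_k - s_l||^2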
in we themeasuringSupervision expensive on087 structuredpropose087 jointinevitable063 to071 and re- inference learn the similar-errors. cross-lingual063 454070 among Our089 embeddings KB074 and450 076079 084 357 565 037 037 534 007 027032021037024 408wordstant supervision andki028 filter5051Introductionout 010 overki003 noise,002jointtasks,012text multi-lingualJoint505 2generatewhich inference across such= will as cross-lingualJointly languages,amongRepresentation012 cross-lingual knowl- fur-=Abstract learning knowledge lationACL training capturingαJointly 2018 word entity baseSubmission mechanism, Abstract anddata learningP and( linking, α mutuallyx entitytant ***.viaki,lj ACLtasks, Confidentialfectivenessx dis- 2018 wordsupervision represen- ) whichSubmission in etIn such andwhichReviewAnonymous al. Learning this, entity is***.Copy.2016 as Confidential usuallypaper, overMost DO cross-lingual; represen- NOTACLToutanova of multi-lingual DISTRIBUTE.textexisting Review we584 submissionour expensiveacross propose Copy. methodetMostwork DO al.of entity NOT, languages,053and2015 knowl-DISTRIBUTE.existing tojointly Cross-lingual learn linking,; withWu models work555 cross-lingualet060 capturing071 al. anin, jointly KB which077 average and mutuallymodels555 text KB Wordsand074 062458 text and062 078 Entities via082057 087 052 ߵᇐỿےtasks, such as cross-lingual entity 000 linking, lations. in 000 which However, tasks, these such methods as cross-lingual only focus562 on entity linking,461 in which 050 050 ᜡ 512 411 031 spectively.035028 Using528035 fectiveness024and031 entity entity009 relatedness032017 linking ofjoint006 our inferencedemonstrate020019 as method023 method a amongC case thetant029thatef- knowledge with supervision integratestasks, an base such averagelation overcross-lingual and006 asstudy, multi-lingualtext mechanism, cross-lingualJointAnonymous across word thekmlation languages, ACL which knowl- Representation entity resultsIn mechanism,1 submission thiso is linking,020 paper,usuallyi capturing2016 on inwhich wecomplementary)learnstorepresententitiesbasedontheir059 expensive benchmarkwhichbeen propose mutuallytextual ismay doneentity usuallycorpus and descriptionstoin representations056081 introduce to cross-lingual learn expensiveenhance knowledge.Learning datasetlation cross-lingualentitycorpus each together mechanism, representationsin andscenarios.to other ainevitable enhance unified Instead by[[ withLawrence learningACL vectorRepresentation078 each ofwhich 2018 theof in other aspace.word Submission structuredre- unifiedCross-lingualMichael074 iserrors. by067 and056069 usually402 Forlearning vector ***.FoustSentence 578 re- Confidential expensive space.word081]] Our 073was and For Regularizeran Review embeddings American and070 Copy. DO Words basketball NOTNBA085 DISTRIBUTE.085 and 082NBA Entities (zh) via079070 452 031 028 006 019 031s = 007 028020tant 009 supervision016iments, over024 multi-lingual 016joint separatetings. inference knowl- In this tasks among paper,tings. knowledgewe In of this propose081 word paper, base a noveltranslation we and propose057 a novel078 0690591 In this paper, we081 proposetextualchance to descriptions learn to cross-lingual066 include078 together070 unrelated with066 the056 structured074 sentences. re- 028515e lions verify010021 001009024 ofverify entities007 thewords024012 the qualitycomplementary006 and quality gain and filter004 gain of012 out of their of knowledge.our noise, of 20% ourther 008tasks, 20%embeddings.019 factswhich embeddings. 
improvewordscomplementaryand Instead 303 suchliance and will in as3% ofcnfur- on cross-lingual themeanings3%re- in various parallel 2016edge over differentperformance. over021c)learnstorepresententitiesbasedontheir knowledge. lations.= data,Joint bases.n baselines, entity lan-baselines, ,buttherearealsowaysinwhichthey we Representation ilar However, linking, automaticallylanguages. We embedding InsteadIn Joint also exper-tics in these In re- whichIn re- RepresentationthisP of proposeLearningthis methodsto( vectors. re-565 paper,(060x align paper,057KNi059) x onlywe Anothertwoi of) proposewe Cross-lingualsimilar focus Learning081 types propose approachtextual on054 to056tion2016 learn ofL words to to descriptions)learnstorepresententitiesbasedontheir WordsCross-lingual inen align cross-lingual learn071 (LHan similarzh and cross-lingualtion and Entitiesen078 togethertoL words Words entitiesalignzh058 viaand similar and with074 entities Entitieslations. thewith words structured062 with074 viaandthe simi- sim- However, entities major re-062 with these challenge sim-069 methodsbasketball lies only079 in focus measuring on theᬩۖާ071051 similar- 353 029 031 fectiveness of our method with an averageedge011001 bases.022 joint We inference alsopropose amongs knowledges two typesw bases andthe e major challengemono-lingual lies settings, in measuring and061 few researches= the havesimilar-072 occurrence 051 of words, thus, we learn similar em- ጱ 054 ڹ 532fectiveness031 of ourfectiveness method withkm of our anther methodaverage513 improve525L 007 with003fectiveness an theliance019506 average performance.|| onfectiveness021 parallelof014− our000attention013506|| L data,method Inof 025our weexper- 014 automatically methodwith000013 to 401an iments, selectkm with001meanings average km an average separate ki thelation km001,buttherearealsowaysinwhichthey mechanism,004most Lawrence| tasks informative which of word is usually translation057081 expensive563 and 556word050582are and helpfulentitye556069050 representations,w to break063051064,w down in the063051064 same language575071 seman- gaps in075 many451 053 ther improve010 022 the performance. In exper- text across languages,tations032 Jointly007ther benefits capturing learning improve manytationsJointly word mutually the NLP benefits and learning performance. tasks, entity many while word represen-2016 NLP has and)learnstorepresententitiesbasedontheir060 tasks, In entity exper- while represen- has itylations. betweenmeanings However, entities072 ,buttherearealsowaysinwhichthey these andmethods057 corresponding only focus on mentionedᘶፑ NBAᭌᐹ 082 fectiveness 028 of025 ourther methodedge improve eS! anthekik,l We average performance.405e alsolj thecomplementary 010 propose types knowledge.Abstract over lies"∈ may multi-lingual in measuringintroduce mono-lingual Instead"∈ edge knowl- the inevitable of the bases.re-word majorMostsettings, similar-existing andThe challengeerrors.Wecorpus and entitywork also fewintuition jointly058 toOur complementary representationsresearchespropose enhance liesmodelsmay embeddingscorpusity in KBintroduce eachtwobetween measuringis058 haveand to text otherthat, enhance types inknowledge. S the byinevitable entitiessame thelearningwords each060S similar- otherseman- e078 Insteadplayer word errors. 
[Figure 1: Comparable sentences generated via distant supervision for the entity Lawrence Foust. The English sentence "[[Lawrence Michael Foust]] was an American basketball player who spent 12 seasons in [[NBA]]" is paired with a comparable Chinese sentence stating that 劳伦斯·福斯特 (Lawrence Foust) was an American basketball player in the NBA league. Both are mapped into one bi-lingual semantic space, in which related words and entities from either language (e.g., basketball, player, All-star, NBA and their Chinese counterparts) lie close together, aligned by the entity regularizer and the sentence regularizer.]
However, the comparable data generated via distant supervision is inevitably noisy. First, all sentences mentioning an entity can be merged into a longer sentence s, but this increases the chance to include unrelated sentences. Second, the information is sometimes unbalanced: the paired sentences convey different information, e.g., the English sentence in layer 2 (Figure 1) contains facts that the comparable Chinese sentence does not. To address the issues, we propose knowledge attention and cross-lingual attention to select the most informative training data and filter out noise, which will further improve the performance.
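As a rough illustration of the two attentions, the sketch below gives one plausible reading of the mechanism, with assumed dot-product similarities and softmax normalization; all names and the exact scoring functions are our assumptions, not the paper's formulation. Knowledge attention weights each sentence by its similarity to the entity it mentions; cross-lingual attention weights each (English, Chinese) sentence pair by the similarity of its two sides.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_attention(entity_vec, sent_vecs):
    """Weight each sentence mentioning an entity by entity-sentence
    similarity, so loosely related sentences are down-weighted."""
    return softmax([entity_vec @ s for s in sent_vecs])

def cross_lingual_attention(sents_en, sents_zh):
    """Weight each (English, Chinese) sentence pair by cross-lingual
    similarity, penalizing pairs that convey unbalanced information."""
    scores = np.array([[s_en @ s_zh for s_zh in sents_zh] for s_en in sents_en])
    return softmax(scores.ravel()).reshape(scores.shape)

# Toy demo with random 50-d vectors standing in for learned embeddings.
entity = rng.normal(size=50)
sents_en = [rng.normal(size=50) for _ in range(3)]
sents_zh = [rng.normal(size=50) for _ in range(2)]
print(knowledge_attention(entity, sents_en))
print(cross_lingual_attention(sents_en, sents_zh))
```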
which joint061 space.…, 20160602016 any Forinference as; )learnstorepresententitiesbasedontheir cross-lingual additional 559074 training among0522016 trans- data)learnstorepresententitiesbasedontheir KB entity559via and… dis-052 linking, 076 in which 577069457087 069 077 089 078 gether054 by minimizing 357 the Euclidean distance: 534 009lions 414031 ofspectively. entities022519 derstanding and 309027020411 Using their 023 andtations facts027 naturaledge entity022 entity benefits003 bases.study, in relatedness manystudy, languagelinking various025 We"NLP 016 tasks, also the 003 demonstrate while the propose 012wordsAbstract000 asresults022 haslan-beyond tasks, results016 a in tings.two casedifferentexample,generate suchthe typesfectivenessentity on In ef-tics000texts. on as this( representationsWangmono-lingual benchmarklanguages. cross-lingual cross-lingual paper, tobenchmarktasks, ettings. al.text align,Mean-2014Mostx we in such Ini ofacross; a existingYamada unifiedpropose thistextualwords settings,our similar as entity 023 trainingpaper, vector work et cross-linguallanguages dataset al. amethodmono-lingual dataset, descriptions jointlyspace. novel2016textin linking, we and different; data words Foracrossmodels propose fewbeen without via569tasks, inwithKB languagesentity settings,researches together501 whichadis-and1 doneand languages.novelword texto anyan linking, such iniand entities withwithout average additional cross-lingual havetasks, entityfew the in as anyresearchesmono-lingual structuredwhich representations such053 withcross-lingual additional trans- as scenarios.072 cross-lingualhave simi- 584 re-050464081 trans-settings,text in053 the across same072070 entityand062 entity050seman- languages077 few linking, 077 researches[[ linking,Lawrence066461 withoutbeen inwords which075Anonymous done 073haveMichael any in in066in additional which cross-lingualFoust different]]072 was trans- an ACLscenarios. American languages. submissionbasketball059 551 073 359 034 034 034 031 034 008 031027 014 025 034003027tations011 benefits012 015031 many NLPlions015 tasks,009017 while031004015023generate of 014not has entities beennot cross-lingual been017 wellmethod well004tations 011 explored explorednot and benefits027 training thatbeen005tics in manyintegrates cross-lingual theirin wellmethod data NLPcross-lingual to 003 explored tasks,word via aligntasks, factscross-lingual set-that dis- while005 and suchintegrates has in entity similar set- cross-lingual in003 as representations cross-lingual word varioussame cross-lingualtexttant 064 words084 across supervision set-corpus languagetion entity062 in languageslan-word toandthe to enhance061084 samelinking, align 084 entities over084 each seman- similar withoutorbeention in059 other multi-lingual which not.081 to wordsby done alignlearning with any and in On similar054 additional065064075 word cross-lingual entities simi-the andthe knowl- wordsvarious081 with major other trans-077 and sim-054061Dscenarios. entities challengelanguages hand, with084065055 sim-053067077A cross- lies share in065055053067 measuring some081081073 common the058 similar-077 semanticAll-star 053 player 537 study, thestudy, results031 study,531 the on resultsbenchmark thestudy, 034 results 031 theon fectiveness benchmarkresults dataseton015517 benchmark014 on benchmark of023 datasetnot oure012 beenkm510tationsedge dataset011 methodwords well benefitsstudy, exploredbases. 
dataset 003 and many with inther We theAbstractwords cross-lingual510filter NLP003attention an also tasks,results fectiveness average improveandout propose while set-example, filternoise,tantto hason034305 The selecttwoout003 (Wang benchmark supervisionJoint which ofnoise, types et the ourcorpustheal. Representation intuition,Most2014words whichwillLmostto methodtext performance. enhance;024 existingYamada dataset over fur- across willJoint informative eachand work et multi-lingualwords withfur-Learning other al.RepresentationAbstract languages,,filter jointly2016 byis wtext learningCao an; modelsbeen ofoutthat, etaverageexample, acrossmeanings in Cross-lingualword al. KBcapturing, noise,done 2017 andLearningknowl-different andInCao ( languages,Wang)utilizethecoherenceinforma-c inwordstext= exper-et cross-lingualwhich065wilaral. mutually ofWords064 al., 2014 Cross-lingual,buttherearealsowaysinwhichthey embedding587Most, capturing 2017|; willYamadaand existinglanguages. Cao and )utilizethecoherenceinforma- Entitiese scenarios.567e fur- etwork et062 ticsilar vectors. al. mutually Words al.,0612016 entitiesjointly embedding via,P,w,w2017; to( and models Another (x align560)utilizethecoherenceinforma- Entitiesi) KB Joint,w vectors.x053 andi approach in)the via text081 similar Another majorRepresentation in560 (073Han053 approach words challengeen581084In inzh andthis(Hanen paper, lies entitieszh in we Learning measuring propose with simi- to learn the of similar- cross-lingual Cross-lingual081 074 WordsLawrence and 053 Entities084 via 355 fectiveness of our025fectivenesswords method005study,013 and with of filtercomplementary theour004 an methodaverage outresults noise, generate with on004 knowledge. which 013 benchmark001 an cross-lingual020Abstract average willof fectiveness knowledge404Cao Instead fur-Jointlyexample, et training dataset001al.,0202017 learningof (Wang datare- attention)utilizethecoherenceinforma- ofcorpusword et word viaour al., dis-2014Most to and methodwords andenhance 006; entity existingentityYamada cross-lingual each represen-basketballand withrepresentations worketmeanings other al. 
filter,jointly 2016 an bytext learning; average modelsoutAnonymous acrossZh wordinnoise,063 KB player the,buttherearealsowaysinwhichthey and languages same text which ACL518 seman-Anonymous ೉᯾ submission೉᯾will without054 fur- ACLᓯቖ any 075 additionalsubmission051 ᬩۖާ054 trans- 063051405 090 English sentence070 in070 layer 2454 (Figure 1)contains 055 568056 455 040 ߵᇐỿ 051ےgeneratenot 012 been wellfectiveness cross-lingual028 exploredof027023 knowledgee032037 inJointly cross-lingual024408016 ofwords010 learning our training005 attention017,wtant method and015eset- worde supervision ,w 028 dataand filtertings.e,w and entity withe via,w1Introduction In cross-lingualrepresen-out 005017dis- thisan,wmethod012 noise,,w average paper,tant over ,w 006 ,wthat we supervision multi-lingualthewhich,w propose integratesgenerate,w major004,wmethod will a novel challenge006 cross-lingual overfur- that knowl- integrates multi-lingual 004lationbeen lieslation word traininge cross-lingual065 done mechanism,inL mechanism, measuring in knowl-570entity data cross-lingual,w502tant representationsviatic word ,wthe001 which062dis- space, whichsupervision the major,w scenarios.Attentive similar-Inthe to is is in this060 enable usually a usually majortextual unified challenge paper, over C jointvector expensive challenge descriptions expensive multi-linguallation055 space.inference066we065| lies textual propose ForDistant and mechanism, lies in together and among descriptionsmeasuring in knowl-073 to071055 measuring062 learnKB with 078S and whichthe Supervision cross-lingualtogether the structuredS077 the067056054 similar- is similar-e with usually player re- thee structuredwho067 expensive056 054 spent458074 re-12 seasons and in [[NBA078]] 552 082 087 ᜡ 021 015 520 040 042 038 023 035028 035412528015fectiveness 013 012 032of 026 ourtant 004 supervision methoden,Kobe 023 the andnot over major been entity004 with multi-lingual well representation explored challengeentity been representationsanand inthe (s cross-lingual entitydonexaverage knowl-i major)ACL lies learning inrepresentation a 2018 unified incross-lingual set-lations. challengestudy, measuringtionSubmission vector tos enable to space. align However, learning lies Forthe similarscenarios. the ***. into wordsConfidentialresults similar- measuring theseenable065 andmethods entities Reviewon the with063 similar-062 only sim-benchmarkbeen Copy. focus done DO073 NOTin on054 cross-lingual DISTRIBUTE. dataset054 scenarios.078090462 076 578088073 092NBA (zh) NBA085085 NBA082 (zh) word usually embedding DOamong, 2016 to SupervisionCross-lingualpaper,andWeNOT enhanceฎ085 which; KB the568 DISTRIBUTE.alsoexpensiveToutanova theyᗦࢵ eachvectors.et weilar andre- similar-In al. other isembeddingpropose propose, this usually2016 are bymay Another learninget and paper, 561;al. inToutanova076 585to052 vectors.,two expensiveword465 approachintroduce2015 the082 learn welation and types; propose AnotherWu cross-lingual et in078 mechanism,561 (al. andetHan052, al. approach2015 to, inevitable learn1; 086WuRepresentation inwhich cross-lingual066 (et068Han al.InNBA is, this usually074 errors. paper, 066068 expensive we084 Our082088 propose 074 and embeddings to learn cross-lingual078 090 074 052 ڹ002ofSubmissioncorpus Inet weฎ al. 
thisฎ embedding, 20%propose2017 to enhanceᗦࢵkmtic paper,lation)utilizethecoherenceinforma-ᗦࢵAnonymousthe space,xcorpus ***.i majorAttentive eachand a mechanism,Confidentialwe novelto other to enhancex ACL024 proposechallenge enableeo3% by2NBA vectors,Distantkm learningAnonymous Caosubmission each (lation jointx etedgeoverReviewNBA otherAttentivei aSupervision al.word) which lies, novel inferenceby2017ᭌᐹ andmechanism, learningACL bases. innoCopy.)utilizethecoherenceinforma-066In baselines,et Distantilar measuring iscorpus submission this al.matterڹCao)utilizethecoherenceinforma- larڹ-೉᯾ cross-lingual foraNBAฎ002 novelword andInCaoᓯቖ018 NBA un- and thisᭌᐹ set- et cross-lingualᗦࢵACLal. entitygainᭌᐹtings., paper,ᬩۖާ 2017028 2018 represenڹ035guages,038415032 028 provide rich026040 016518 structured 024 tings.016032 511notof In beenknowledge thisNBA well paper,ᭌᐹ explored018 Jointly we016024 002 attention511propose intings. learning 034535 036 guages, provide032 richwhile,029 structured035028 abundant013 014024 knowledge 005 textverifytations308011verify benefits corpus005032 thertant manythewords005theimprove NLPsupervisionfor quality014 tasks,contains and quality un- while thefilter hasovertion performance. of outgaintations multi-lingualto of largealignour noise, our benefits similar ofcomplementary embeddings.005 which embeddings.amountentity20%manyIn words knowl-≈ exper- representations NLP andwill303 entitiesand tasks, fur-complementary005 knowledge. while with 3%inIn a≈ sim- has unified this overtion085 vector paper, InsteadEnAnonymous toknowledge. space.x064 baselines,aligni we of For propose063085 re-, similartion , 085 Instead to to 061 words learnalign re- of082055 re-cross-lingualKNACL similar and entities words submission055 with082 and sim- entities064074All-star with085 sim-055078 055 082 basketball079 055 ᬩۖާ 358 353 032 035 035 035 538 verify the qualityverify ofthe ourstudy, qualityverify embeddings.521035 of016 the032 ourthe quality embeddings.026 results oftings. our In embeddings. this paper, onJointly weverify benchmark propose017 learning 006 atheword novel quality andmethod entity of represen- thatdatasetedge006attention our013 integrates embeddings.bases.example, cross-lingual tother select We ( Wang improvetic also etword the space,al.!, 2014 most propose; tocYamada the enable informative ! et performance. al. two joint, 2016 066 ; types inference571the e588 among major In exper- KB and challenge076056067 lies082 in056063 measuring 085 thee similar-,w ,w ᘳӱᓯቖ in071 the mentioned sameᬩۖާ060 seman-ᘶፑ 553 075 360ڹverify532 gain the quality of 20% gain310 ofMulti-lingual and ourgain of016tant embeddings. 3% 20% supervision of verify over 20% and014attention024tings. baselines,013liancether the 3%and knowledgeguages, overIn quality this overimprove 3%to on multi-lingualpaper,w select overparallel re- baselines,+Jointly we016005 ofe provide thekm propose the baselines,our= learninggain ( basesw mostdata, performance.)tionembeddings. a knowl-021w novel word re-of totings. 
informative we+and align(KB), andrich20%Abstract005gain re- automatically In entity similar this(007wlar andstructured ofrepresen- paper,entity representation) wordsAnonymous storing In021( 20%tant embeddingeJointly 3%werepresentationsC exper-i andand025) propose supervision entities andlearning entityover ACL007 mil- a learning∈( 3%with novel representationwordeMostAnonymousD inknowledgesubmission baselines,j a sim-) unified overexisting andilar over tovectors,∈ entity embedding vectorCenable workbaselines,may multi-lingualInlingual ACL represen- thisjointlyspace. learning re-introducelation submission vectors. paper, models For forexample, noto mechanism, re-503embeddings AnotherKB we066 enable knowl-un- inevitable matterandLawrence propose (Wang text approach et al. towhich errors.064, in they2014learn063 (lations.Han; isYamada cross-lingualwillOur usually are embeddings However, et benefit066 al. in, expensive0552016lations.ity the; these between from However, and074 methods055 different582[[ entitiesword೉᯾ these only·ᐰේᇙ057 focusmethods and lan- entity]] and onฎᗦࢵ only corresponding representations[[057 focusNBA071]] onᘶፑጱ 010 Multi-lingual017 004ther improvegainmethod029028 knowledge of that 20% the integrates025ther performance. andwtationskian018 cross-lingualimproveedge 3% benefitsw bases overbasketball bases. manyki003e word the baselines, NLP Inki (KB),018 performance. We tasks, exper-e 405 w while alsotion was re-ki mil- words two etherAllstar and types entities improve tiondiffer. with e toNBA sim- align the similar performance.On067 wordsentityedge∈Anonymous onetext {D !representations and A across bases.hand,word entitiesKN ACL } languages Inlar with and in submissionexper-WeAnonymous we sim-a unified entityembedding also utilizewithout vector may representationsACL propose053 space.submission any introducetheirity Formeanings additional twobetween vectors, shared types053 in inevitable079 trans- the seman- entitiessame,buttherearealsowaysinwhichthey no078068seman-errors. matter and೉᯾ Our corresponding they068 embeddings ᓯቖ075 are inᬩۖާ the mentioned059455 054 ᓯቖ 009 ߵᇐỿ in 082 NBA079 Supervisionwhile the compa- NBAᭌᐹᘳӱ 085 569 456ےk 032 024 ezh,Kobe 022519 015 512attentionThe 006 not409 been 012 intuition019 to well027edge select explored512The bases.006 the029 inis The015 cross-lingualintuition024 most that,kiThe019ity Wejoint intuition035 informative between intuition set-also words inferenceljilaredge is embedding propose that, isity bases.and entities is among that,jointki betweenIn vectors. that,wordsity twoentities thistextljexample, inference between knowledgeWe typeswordsand Another wordsentity paper, acrossAnonymous and entities also ( corresponding representationsWangmono-lingual approachin among languagesand weand entities entities et propose basemay al.ACL propose anddiffer. in,entities2014 knowledge ( andHanThe inintroduceAnonymous submission ; correspondinganda withoutYamada unifiedin two tosettings,intuition mentioned correspondingin Onin learnvector et( basetypes anyx al.et065 ACLi inevitable,) space.one2016 al. additionalcross-lingual and andsubmission,ity; is2016 For mentionedsupervision hand, betweenfewthat,519;The 569TheityToutanovamentioned trans-errors. researcheset062 between words al. we intuition056,In entities2016 Our thisutilize et and entities562; al.074 haveembeddingsToutanova paper,, 2015 and entities their056isand; we correspondingWu etproposethat, correspondingthat,en in et562 al. 
shared072065 al.,4062015,zh to words words; learn091Wuen seman- mentioned et cross-lingualmentionedLearning069zh al. and, and077Foust entities entitiesAttentive spent069 459 12074in in seasons Distantᜡ 041 416 027029 413 014 006025 017 004017iments,method C ACL that 2018various separate Submission integrates014methodCnot ***.Cao been Confidential thatlanguages cross-lingualet welltasks al.006, integrates2017 exploredmay Reviewity)utilizethecoherenceinforma- between Copy.of inintroduce cross-lingualword DOcross-linguale word NOT share006 entities DISTRIBUTE.word inevitableset-,w andtranslation and some wordentityword corresponding2016,w representationserrors. and common)learnstorepresententitiesbasedontheir064 entity Our mentioned representations2016 embeddingsAnonymous inatt the)learnstorepresententitiesbasedontheir semantic same seman-077466 inmay the sameintroduce ACL064 seman- submission inevitable079463067056word errors. and075 entity Our067056 representationsembeddings in the same seman- 056 054 036529017method that025 integratesmethod 014tations033 cross-lingual that 018 benefits033 integrates033007025 manyiments, wordedge017 L006 cross-lingualther NLPand bases. improvetasks, separate entityilar word while embedding007method306 Werepresentation has006 tasksthe also029 that vectors. performance.008integrates||C proposeofexample, learning word Anothertationsliance cross-lingual two (translationtoWangtext− benefitsapproachenable types onC008etacross Incorpus al. many parallelword exper-, in0072014 (Han to languages NLPliance025೉᯾|| enhanceThe; Yamada tasks, data, each onmayintuition while et067 without otherwe al. parallelᓯቖ has, 2016 introduce by automatically572 learningCao; any data, et067is additional word al.ᬩۖާ, that,002inevitable2017and we086)utilizethecoherenceinforma- automatically trans- wordsword064 errors.083 and and entity Our057068067 entities586Cross-lingual056083 representations embeddings in079057075056 in the086 same058att seman- 058 579 083075All-star 079 083075083 356 052 057 017 029522 033 036 033 536 036 041036 036 derstanding039033 natural036027 language029 015033 beyond 038 not texts. been tations well benefits explored Mean-004 many in wordscross-lingual NLP tasks, and004 while set- filter has out noise,∈ R whichverifyet al.,ilar will2016 embeddingof the fur-; Toutanovailar086 knowledge qualityembedding vectors.504example, et Another al.086, 2015ilar (attention ofWang approachvectors.086; embeddingWu ouret065 al.et in, 2014 (al.Han Another embeddings., and; Yamada vectors. cross-lingual077 approach et054 al., 2016 Another;083 in (Han approach054 in091 (Han079 089083 086 554 088 1Introduction 043 036 033 036spectively.018 gain016 Usingand ofwords025 entity iments, 20%entity representation andtings. 013 filterlinking In separateand this019 learning003out paper, noise, as we3% to tasksenable propose a whichcase022of over a019 of noveljoint knowledge will word inferenceilar fur- baselines,same embedding translation022iments, among jointattention languagevectors.Cao inferenceexample, knowledge et separate Anotheral., re-2017 and (Wang among approach)utilizethecoherenceinforma- baseorcross-lingualword ettasks al.are knowledgenot. 
inand, 2014 (andHan helpfulmay of ; entityYamadaC wordOn introduce068 to base066 representations et break al.the translation and,2016 down otherinevitable; language in063mono-lingual the hand, errors.ψ same(areei gaps,s seman-k,l cross- Our helpful) insettings, manymono-lingualsim embeddings 083(eiS, and075 toS few breaksettings,wm)e researches 086069 ande down few have researches language069Comparable072 have 093 gaps Sentences072 in many086 053 en zh 2 ߵᇐỿ among KB and091 076 056ےspectively. of Using041 potential 520spectively.edge and iments, entity015 entity bases.spectively. representation linking030knowledge separate029513 Usinggenerate Wenot 007 been learningas also entityUsing tasks well a cross-lingualcase006 topropose exploredof of enable513 entity oflinkingcomplementary knowledge knowledge word007 inspectively.Jointly cross-lingual two016 linking(e as learning translationtraining types,text attentione a case as015)spectively.set- wordACLattention acrossspectively. et aE al.spectively.data and Usingtings. case 2018, and2016 entitytion languages, via Submission In;to cross-lingualrepresen-toedgetextToutanova andaligndis- thisentity007026 Usingexisting similaracross paper, bases.cross-lingual Using capturing etlinking words ***.we Usingal. entityiments, languages,, propose2015 Confidential and We entities;007 mutuallyentityWu linking alsoas a separateentity novelet with acapturing al. propose case sim-Review, e as linkingAllstarlinking tasksa Copy.mutually case two2016 oflation589 DOtypes)learnstorepresententitiesbasedontheir065 asworde NOT asNBA mechanism, a570 DISTRIBUTE.a translation case2016 case )learnstorepresententitiesbasedontheir057 which563 is usually057 expensive563066065 080 and tic087079 space,057 to enable057 joint 085 089 inference NBAᭌᐹᜡ 039 035 037539 derstanding533verify025 natural523 018 the028lions030 language023 quality 018 of entities of beyondand words our015 entity 026 representationand019iments, embeddings. and020 texts. filter028008not018007 theirout! beenseparate learningvarious noise,Mean-joint wellvarious005025 to exploredet inference020 factsMulti-lingual enableiwordswhich al.008 taskslanguagesand, j2016 languagesentityin007! among406 willcross-lingualof; in ofToutanovac representation knowledgesamedifferentfur-word knowledge005 variousCaowords shareword set-share etnot translation al. learningbeen language base,andinlation somelanguages.20172015 knowledge some well different andattention entity)utilizethecoherenceinforma-; to lan- exploredWuentity enablemechanism,304 common commonbeen et representations representations al. in,are languages.cross-lingual doneand ortic variousetACLhelpful space,guagesal. whichsemantic not.cross-lingualbases, in068 2016 a 2018 unifiedcross-lingual set-; to islanguages Toutanova573intotion usuallyenableSubmission Onvectortextual the(KB), breakdue to068 Caoof space. alignsame the et joint expensiveetDataknowledge al. descriptions For al.to similardownscenarios. share, ,tic2015seman- inference***.2017 other storing Generationthewords space,;)utilizethecoherenceinforma- wordsConfidentialWutextual languagesome and065 etcomplementaryword togetherandattention al.hand,among in to, descriptions common entities mil-different enable and Reviewgaps KBare with075058069entity068078 with cross-055 and∝ and057helpful joint thein sim- semanticlanguages. Copy.together representationscross-lingual structuredmanyatt inference DO to with073058 knowledge.NOT break055 re-057 the DISTRIBUTE. 
among structured583 in down the same070 KB language re- and078seman- For gaps070 076 in075 many 456 080 = s s 354 414 017026joint018 016various inference derstanding method 309 among410 languages thattings. knowledge018various integratesof Inand knowledgethis entity basecross-lingual paper, share languages natural and we representation someattention proposewordand shareet entitylanguage a009 common novelal. and , 2016 representationsome learning1 cross-lingual;words Toutanovaare lation semanticwords common Cao009beyond helpfultoin mechanism, et et enable differentin al. al.x,, learningdifferent i201520172016 tosemantic ;)utilizethecoherenceinforma-)learnstorepresententitiesbasedontheirticsWu breaktexts. whichlanguages. et to languages. al. enable to,down istic usuallyity alignMean- space,x067 language between expensivewords similar to enable gaps066in and different words entities joint in many inference andlanguages. and wordsare entities among helpful correspondingLawrence in076 KB towith differentbreak and Foust simi-464068 down059 languages. rable languageThe mentioned Chinese068059 intuition gaps460 inLawrence sentence many is Foust that, not. words and entitiesSemantic inSpace s 359 037 417 1Introductionlions311030019028 of entities026 and014 theirandiments, entity030tations facts benefits relatednessseparate! many in NLP tasks various demonstrate tasks,ilar embedding of whilegenerate word hastion vectors. thetranslation lan-tocross-lingual align Another ef- similargenerate026 approach wordsare intraining ( andcross-lingualHan helpful entities data069 with505i via to sim- training breakdis-various087 datadowntic064 via space, language dis-languages to078Entity enable467 gapswords Regularizerdiffer. joint insharew manym inference insk,l407 someOn different one080 among commontic hand, languages. space, KB076 and we to semantic enable utilize joint their inference shared among080 seman- KB೉᯾ and·ᐰේᇙ555 076Bi-lingual 361 EN k k 457 037 0370421Introduction 037 0341Introduction0111Introduction010 034 034030 034 521037 005530030joint016 inference ther among026514 1Introductiontings. improve008 knowledge In this 034the base026020 paper,514 performance. and we008 propose017 023 ajoint020ther noveltext In016 inference∈12016 exper-030 across improvemethod )learnstorepresententitiesbasedontheir among languages,023 thattings. the knowledge008textintegrates performance. In across this capturing baseexample, paper,cross-lingualAnonymous and languages,various we (008Wang mutuallypropose Intictics etword ACLtasks, exper-space,al. languagescapturingattention, a2014 submissionnovel toet087Anonymous such; toYamada al. align enable, as mutually 2016 et cross-lingual to al.ACL share joint, 2016066087; similarselect submissionvariousToutanovaet; inference520087 some al.571,same entitythe2016been words amongcommon084 058 mostetlanguages linking,; doneToutanova al. language KB,5642015 informativein andin and cross-lingualsemantic which084been; 084058Wu et entities084 doneshare al. oret080,564 0762015 inal.067066 scenarios. not. cross-lingual, some ; withWu092 On087 et070 simi-common058080 al. scenarios.the, other070 semantic058 hand,580073084076 cross-073 060080061 055 ′ 570 537 n 033 study,019 the study, results 019of1Introduction the on knowledge007 resultsbenchmark 031034jointther016tantand 034 on attentioninference improveand supervisionentity020 benchmark005 entity dataset among1 representation009 attentionand andthetings. relatedness019 knowledge008 performance. 
cross-lingual overIn entitytext learning datasetthis to006 base across2016 multi-lingualpaper, selectmeaningscomplementary toattention and009)learnstorepresententitiesbasedontheir036demonstrate enable1 languages, we008 relatednessInstudy, the propose exper- most 006 capturing totion a knowl- the noveland ofcomplementary toselect informativeknowledge. the align,buttherearealsowaysinwhichthey resultsknowledge mutually entity ef- similar the demonstrate on words relatedness Instead most attention knowledge.benchmark andCross-lingual entities informativeof2016 re-)learnstorepresententitiesbasedontheir069 and withare demonstrate Instead datasetsim-n cross-lingual helpful1ilarthe textual embedding 069oftion may ef-re- to to003 descriptionsalign breakintroducethe vectors.n similar ef- downtextual Another words066inevitable togetherlanguage and descriptions approach entities with059070069 errors.587056 with in058 thegaps (Han togethersim- structured Our in many embeddings with!∈059 re-056058 the081 structured re-೉᯾ ·ᐰේᇙ 084 083 084057 L055 086 || − 053|| (3) 042 while,040 abundant 524037 text029 037024020 study, corpusand 000 entity018 thestudy,text 017030 contains1Introduction400 resultsacross relatedness the0391Introduction languages, resultsand034015 on021 method029 large300entitybenchmark capturingattention demonstrate on that benchmarkstudy,notintegratesamountrelatedness mutually meanings been1021 to well dataset cross-lingual307 exploredtheselect the dataset,buttherearealsowaysinwhichthey demonstrate results ef-in2016010 cross-lingual the wordet)learnstorepresententitiesbasedontheir al.tic, most2016 space,on2 set-; mayToutanovailar benchmark the informative embeddingtion010 to introduceand ef-to enable et align al.The,In entityvectors.textual2015 similar2 this jointinevitabletext=; Wu dataset descriptions Another wordsrelatedness paper,intuition across et inference al., and approachα errors.km languages we entities574 together070lations.068 propose= among inattentiondemonstrateOur with (Han withαiski,ljwithout sim-… embeddings the However, KBα to structuredkm that, learn andto067450 anylations.065 select thetic additionalre-cross-lingual theseα space,ki,lj ef-words However, the methodsZh 079 to trans- most enable these only and informative joint focus methods074 entities inference on only text 087092 amongfocus080 across050060071 in on KB079 languages and…060071 090350 without any additional087 trans- 084 089 357 540 044026 037 027 009027 attentionjointmeanings009 inference026 to select,buttherearealsowaysinwhichthey joint among407textualattention the inferencelingual descriptionsACL most knowledge 2018= Submissioninformative to027c + together amongmay selectγ embeddingss ***.introduce base withCao Confidential008 knowledge etthe and al. structured, tasks,2017 most inevitable Reviewmeanings )utilizethecoherenceinforma- re- Copy. base such will1 informative DOerrors. and asNOT benefit,buttherearealsowaysinwhichthey cross-lingual DISTRIBUTE. 
Our 590 embeddings fromtext across entity different059 languages linking,tasks,076 lan- in… without such which059 … ase any077 cross-lingual additionalz trans-entity linking, 077 in076 which094e z 087457 077 en zh en zh 058 α sim040( 534 , 020 042 )029e522en,Joespectively.text 017 across languages,019meanings515e zh,JoemethodE capturing411 Usingthat Abstract integrates,buttherearealsowaysinwhichthey mutually019meanings004fectiveness 515and s = cross-lingual entity entity018 ,buttherearealsowaysinwhichthey oftext relatedness e our wordcomplementary017study, acrosslinking methods study,=and languages,x entity,wi demonstratewith theemethod representationtante capturing009complementary asknowledge.,w the anetasks, ,w supervision resultsthat average ae,w mutually integratesresults case suchthelearning,w Instead,w ef-tant cross-lingual009 overknowledge. as,w on to supervisionenable cross-lingual,w on of multi-lingual ,wbenchmark re-070= ,wword benchmark,w( Insteadtextx over) Most506 across knowl- entity of multi-lingual re-067 languagesdataset linking, existing dataset 572 e knowl- inP without whichtasks,(1x work,w565079x any,w) suchtasks, additional jointly,w such565 as068067 cross-lingual as trans-models cross-lingual584 069059 KB entity entity and linking,069059 461 text linking,090 in which in which 092NBA556 (zh) 054 057i submission468059et in (al.Hanฎ, in without whichᗦࢵ077060057059 any additional081465text across trans-077 languages[[Lawrence086 without MichaelNBA any Foust additional]] was an trans- American basketball077 s ,s SڹRegularizerntasks, acrossฎ descriptionsฎ 2016 in Anonymous ( ᗦࢵHan suchlanguagesᗦࢵeti)learnstorepresententitiesbasedontheir al. together070,ilar asn2016n embedding cross-lingual without with; Toutanova2016088e the2 structured vectors.)learnstorepresententitiesbasedontheir anytext NBA067 et additionaln Another acrossACL al. re- entity ,ᭌᐹ20151 approacho; languages linking,trans-Wu060070 ڹtext approachtextualڹ೉᯾ withen,Kobe tasks for theNBAฎ oftypesvectors. structuredre-ᓯቖNBA of un-ᭌᐹᗦࢵ word AnotherᭌᐹSentence re-027ᬩۖާ translationڹkm 038while,418 abundant textKBs.031guages, corpus 415020 Therefore,027 provide contains027 text 017 across joint researchersrich languages, inference large010 structured method among020009 capturing amount knowledge thatcomplementary 007 mutuallyintegratestextual leverage base010iments, descriptionsknowledgeNBA and009 cross-lingual knowledge.ᭌᐹ separate007 togetherilar wordboth embedding Instead 036 038 031 035 019 complementary018iments, c separate 021 knowledge.016 and007027021 entity tasks Instead representation031tings. of of word re-In024 this021 learning translation paper, 031 we totextual011lingual enable propose2016L descriptions024)learnstorepresententitiesbasedontheir a L novel embeddings togetherLilar011 embedding withmeanings lations. the vectors. structuredlarsthe instance, However, Anothers majorwill embedding re-,buttherearealsowaysinwhichthey approach these challengew benefitlations.069s methodss textualin (sHan lies However, only fromNBA vectors, in w focus068lations. 
measuring066 = ambiguitys different ontheseIn this noHowever, methods the paper, 071 matter similar- lan-Cross-lingual onlywe these085 inψIn propose this( focus onew methodsthey081 paper, P on,w( to languagex are onlylearnwe x) propose088) infocus cross-lingual071061 the on to may learn ToD cross-lingual071 address061 sim 074 (the 077Aw issues,,w 074)we 081 081 propose knowledge at- 057 k − 038 035 035 035 038031 035 words and filterlionsliance out noise, on of parallel which entitiesliance data, will on fur-= we parallel and automatically data, their we kmtasks, automatically facts suchki are in as helpfulkm cross-lingual various088 tokiAnonymous breakψ085(e entity,s lan- down) linking, language 085 085 in′ which gapsACL in manyo submissioni 088l081various languages085 share some common semantic ጱ · ]]enAll-starzh[[NBAen]] zh 081 k 458]] ڹ( attention to401iments, selectedgewhile, bases. separate the most abundantL tasksWe informative also of word propose translationtextL || two corpus− typesattention|| areet|| al.contains helpful, 2016 totion− ; selecttoToutanova align to|| break similar the largekm et words down al.most,0882015 andamount language575 entitiesinformative; and071Wuet withkmkm filter al. gaps088 sim-, out in many451 noise,km= iEn which[[k,lLawrence will Michael fur-P (Foust408(x]]) wasx 031021 525 038 038 538 ∝ Multi-lingualverify theguages, quality knowledgeverify verify531 ofcomplementarythe018 provide ourbases quality theverify embeddings.032 quality035031 (KB), knowledge. offectiveness theand rich010 our entitystoring quality310 of Instead embeddings. 022 representationourJoint structured 030 mil-of ofembeddings.words of re- our oure010Representation learningkm embeddings. enable On 018S outverify lations. withone noise, andjoint qualityweLearningD010 entity=forSout these among utilizewillare Instead representationhelpfulnoise, methodsun- fur-of knowledge305 of their our re- of only to whichof our010 sharedembeddings. learning break base focuslationAbstractlar Cross-lingualL method and " toon downwill∈ seman-( enablemechanism, embeddingeC)fur- languagewords withmono-lingual"Abstract∈ ( whichean " gaps∈068w)meanings average Wordsmeanings inbasketball in521 isvectors, many usually differentMost settings,lingualmono-lingual"∈c existing and060 expensive and,buttherearealsowaysinwhichthey nowwork Entitiesembeddingsplayer| fewMost settings,languages.jointly588 matterNBA and researches existingᭌᐹ models060ticsee and via work KB theym tofew068 haveand jointly,w,w will aligntext082 researchesin aremodelsi benefit,w inKB081 similar060072 haveand the೉᯾ text080eᐰේᇙ from words060072ฎᗦࢵz different581085 andmᘶ entities lan-n with simi- ⟨ 360!′ ⟩∈ 571 355 Multi-lingual knowledge038n021 312030025 bases523021fectiveness (KB),001 020028 516complementary storingMulti-lingual 018 of text our fectiveness acrossmil- method knowledge.011301 languages,wordsandgain 516021010 entityknowledge withofverify Insteadcapturingliance of representationand019008 20%km027lations. our an onoflmfilter mutually011 re-the andparallel methodaveragebases However,010k,l learning quality 3%out data,w over(KB),noise,008 withtokm theseet we enable text al.lm of automatically,baselines, methodsw2016 anstoring acrosswhich our k,len average; Toutanova only1Introduction embeddings. languageszhfectivenesswillmil- focusre-word2 et on fur- al. w, and2015lations. without entity; of071Wu However,w ouri et anyrepresentations al.2016 method, additional070507 these)learnstorepresententitiesbasedontheir071etwordse al. 
methods,j 0042016 with and; trans-Toutanova only 573 inane filterthe focus average068text sameet on out al. across , 2015 seman- noise,566061;071080Wu058 languages060 etwhich al., ೉᯾೉᯾will withoutᗦࢵ566075061069078058 fur-060 ᓯቖ any093lation088 additional 051ᬩۖާe mechanism, trans-z which 351077 is usually expensive and 557 362 054 043 ߵᇐỿےߵᇐỿ]] 078056 087 ᜡےMulti-lingualMulti-lingual Multi-lingual012 knowledge of011 knowledge potential knowledgebases 027 k (KB),knowledge bases bases storing030006 (KB), (KB), mil- complementary storingliance019 storingdiffer. on mil- parallel022 On017 data,wjoint one mil- we inference totext hand,fectiveness automaticallydiffer. existingmethod across among we that On" knowledge utilize037 integrateslanguages,and ofliance∈ onetextγ our408words entitye hand, basecross-linguallations. their onmethodacross textual andan parallel relatedness andcapturing" sharedwe However, descriptionsedge languages, wordliance with028∈ utilizefilterbasketballx data, bases. theseseman- anetThe together on mutuallydemonstrateoutal. we average their methods, parallel e2016 capturing intuitionwithautomaticallynoise, Weedgemono-lingual;shared the Toutanovathe only also structuredlation data,differ.was whichbases. majorfocus the proposeis mutually seman- we mechanism, re-et ef- on that,settings,On al. automaticallywillan challengeWe, 2015 one two words also; andfur-Wu hand, whichAllstar types few proposeetal. liesand researcheslation, we is usuallyL in entities utilize069 two067 measuringNBA mechanism, have types expensive their in 080 sharedthe077072 the and which major similar- seman- is challenge usually| expensive liesฎw in measuring ands ,w S s theS similar-e playere who spent061 12062458 seasons in [[NBAᜡ (ᘳӱᓯቖᬩۖާ070 liesentities062 have re-lationm seman- in in measuringk mechanism,the in and n same072070062k seman-075091 the̶՜ࣁ462 Representation078 which similar- is usually075 088 expensive084 and NBA085085058 NBA056 (zhڹwki526s 038022 416 008 020and 028402e entity035zh,Kobe028412006035 relatedness 020differ.fectiveness022American demonstrate On one025 differ.022The hand, mono-lingual theof intuitionLawrence 012 Abstractef-weguages ourOn utilize025sThe settings,=one methodS is The2016thei intuition theirdue and that, )learnstorepresententitiesbasedontheir012hand, majorSAbstractvariousilar fewintuition shared to embedding researches wordsLawrence withthe challengeis weseman- vectors.Mostthat, ity iscomplementaryhavethe andlanguages an existingutilize between that, Anothertextual majorwords liesx entitiesaveragelation work576 approach words072 inmono-lingual entities descriptionschallengejointly Most andtheirx measuring mechanism,591 in in (existingHan models and andtextualentities089 share ( x sharedworkKB correspondingknowledge. 
The intuition is that words and entities in different languages share some common semantic meanings, but there are also ways in which they differ. On one hand, we utilize their shared semantics to align similar words and entities with similar embedding vectors, no matter whether they are in the same language or not; indeed, pioneering work observes that word embeddings trained separately on monolingual corpora exhibit isomorphic structure across languages (Mikolov et al., 2013). On the other hand, the differences supply complementary knowledge: an ambiguity in one language may disappear in another language, where the two meanings are expressed by two distinct words.
[Figure 1: The framework of our method. The inputs and outputs of each step are listed on the right side, and on the left side there are the three main components of the joint representation learning. Recovered panel labels include Data Generation, Bi-lingual Comparable Sentences (EN/Zh), Cross-lingual Semantic Space, Cross-lingual Sentence Regularizer, and Entity Regularizer. The example inputs are the comparable sentences "[[Lawrence Michael Foust]] was an American basketball player who spent 12 seasons in [[NBA]] ... an 8-time [[All-star]] ..." and the corresponding Chinese sentence describing [[Lawrence Foust]] as a professional basketball player of the American [[NBA]] league, selected in the 1950 [[NBA draft]].]
To relieve the reliance on parallel data, we automatically generate cross-lingual training data via distant supervision over multi-lingual knowledge bases: cross-lingual links in the KB provide aligned entity pairs ⟨e_k, e_l⟩ ∈ A, and the sentences that mention the two aligned entities in their respective languages are paired as comparable sentences, forming the training set D. Such comparable data is cheap to obtain at scale, but it is only loosely aligned and therefore noisy. A minimal sketch of the pairing step is given below.
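The following toy sketch illustrates the pairing step. The data structures and names here are illustrative assumptions, not the actual pipeline; in practice the mentions would come from anchor links in multi-lingual Wikipedia.

import itertools

# Toy sketch of distant-supervision data generation (all names and
# structures are illustrative assumptions).

# cross-lingual links A: pairs of aligned entity ids
alignments = [("en:Lawrence_Foust", "zh:Lawrence_Foust")]

# sentences indexed by the entity they mention (e.g., via anchor links)
mentions_en = {"en:Lawrence_Foust": [
    "Lawrence Michael Foust was an American basketball player ..."]}
mentions_zh = {"zh:Lawrence_Foust": [
    "<corresponding Chinese sentence about the 1950 NBA draft> ..."]}

def comparable_sentences(alignments, mentions_en, mentions_zh):
    """Pair every sentence mentioning e_k with every sentence mentioning
    its cross-lingual counterpart e_l. The pairs are comparable rather
    than parallel, so they can be noisy."""
    pairs = []
    for e_en, e_zh in alignments:
        for s_en, s_zh in itertools.product(mentions_en.get(e_en, []),
                                            mentions_zh.get(e_zh, [])):
            pairs.append((s_en, s_zh, e_en, e_zh))
    return pairs

print(comparable_sentences(alignments, mentions_en, mentions_zh))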
To further reduce the noise in the comparable data, we propose two types of attention, knowledge attention and cross-lingual attention, which select the related information at sentence level and at word level, respectively: they pick out the most informative words and filter out noise, which further improves the performance. In experiments, the separate tasks of word translation and entity relatedness demonstrate the effectiveness of our method, with an average gain of 20% and 3% over baselines, respectively. Using cross-lingual entity linking as a case study, the results on the benchmark dataset further verify the quality of our embeddings.
5.1 Mono-lingual Representation Learning

Within each language, we learn word and entity representations in the same semantic space following the skip-gram architecture: each target item x_i (a word or an entity) is trained to predict the items x_o in its context,

    P(x_o \mid x_i) \propto \exp(x_o^\top x_i),

where the probability is normalized over the joint vocabulary of words and entities. A toy illustration of this probability follows.
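The sketch below scores one context item against a target item over a tiny joint vocabulary. Real training would use separate input/output vectors and negative sampling; those details are assumptions here rather than text recovered from this section.

import numpy as np

# Toy P(x_o | x_i) ∝ exp(x_o · x_i) over a joint vocabulary of 6 items;
# words and entities share one embedding table V in this sketch.
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(6, 8))

def p_context(o: int, i: int) -> float:
    """Softmax probability of observing context item o given target i."""
    scores = V @ V[i]            # dot product with every vocabulary item
    scores -= scores.max()       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return float(probs[o])

print(p_context(2, 0))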
5.2 Cross-lingual Sentence Regularizer

Comparable sentences provide cross-lingual co-occurrence of words, thus we can learn embeddings for the words that frequently co-occur together, no matter which language they are in. A straightforward solution is to concatenate the two sentences into a longer sentence s, but this increases the chance to include unrelated sentences and suffers from unbalanced information: sometimes the pair of comparable sentences describes the shared entity at very different levels of detail. Instead, we regularize the embeddings of each pair of comparable sentences toward each other:

    L_S = \sum_{\langle s_m, s_n \rangle \in D} \| s_m - s_n \|^2,

where s_m, s_n are sentence embeddings. Take English as sample language: we define the sentence embedding as the average sum of word vectors weighted by the combination of the two types of attentions,

    s_m = \sum_{w_i \in s_m} \alpha_i w_i, \qquad \alpha_i \propto \psi(e_k, s_m)\,\psi'(w_i, w_j),

where the knowledge attention \psi(e_k, s_m) scores, at sentence level, how likely s_m truly describes the aligned entity e_k, and the cross-lingual attention \psi'(w_i, w_j) scores, at word level, how strongly word w_i matches the words w_j of the paired sentence. A sketch of these computations is given below.
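The sketch below fleshes out one way to compute the two attentions and the weighted sentence embedding. The cosine scoring and the normalization are assumptions standing in for the exact definitions, which did not survive extraction.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def knowledge_attention(e_k, words):
    """Sentence-level score psi(e_k, s_m): entity vs. mean word vector."""
    return max(cosine(e_k, words.mean(axis=0)), 0.0)

def cross_lingual_attention(words, other_words):
    """Word-level scores psi'(w_i, w_j): best match in the paired sentence."""
    return np.array([max(max(cosine(w, v) for v in other_words), 0.0)
                     for w in words])

def sentence_embedding(words, e_k, other_words):
    """Average of word vectors weighted by the combined attentions."""
    alpha = knowledge_attention(e_k, words) \
        * cross_lingual_attention(words, other_words) + 1e-9
    alpha /= alpha.sum()
    return (alpha[:, None] * words).sum(axis=0)

rng = np.random.default_rng(1)
en, zh, e_k = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=8)
s_m = sentence_embedding(en, e_k, zh)
s_n = sentence_embedding(zh, e_k, en)
print(np.sum((s_m - s_n) ** 2))   # one term of the regularizer L_S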
matterand word informative Our our 3% vectors,as embeddingsguages entitycomplementary knowledgeet basecross-lingualbaseshibit demonstrateoverrelation al. they Mean- and,method.pioneering relatednessvarious2016 dueisomorphicebaselines, noity are;i word (KB),Toutanovae mattertobetween lationforj in079 languages the ex-etLearningun- theE demonstrate al. mechanism,cwork etef- complementary, re- storingthey2016 al. ity The, entities5252015 2016; structurelar tobetweenToutanova share observe are; lation)learnstorepresententitiesbasedontheirWu embeddingexisting of thewhichinputs in andetsome mil- Cross-lingual mechanism, etef-al. the entities, al. isthat corresponding knowledge.,across commonusually20150862016 069086067 vectors,;word)learnstorepresententitiesbasedontheir andWu anddiffer. which expensive091 etlanguages semantic al. corresponding ,outputsFor is no Words usually mentioned084069On and matter084067 088 expensive(oneMikolov and! theyityData ofen mentioned hand, between Entities are andityGeneration eachlarAmericanet inbetween086 weal. the embedding entities via, step utilize entities051 andbasketball are090 their and corresponding051 listed vectors, corresponding shared player inseman- no mentioned the mentioned matter right096 they are in thenation058 of two types of attentions: 575 047 traction (Weston533 et al.iments,, 2013 separate029005027; iments,Lin tasksattention of separateet word026 toal. translation 011 tasksselectguages 028,407 attention 0002017 ofjoint the word mostinference todue027instance, translation), select informative and to amongiments, 028 the the joint most413are textual knowledge001 separatecomplementary helpful inference informativeelarAllstar tasks ambiguityto1 base break amongattention embedding of and … worde downtextvariousNBA knowledge to translationacross language inselect knowledge.tic space,012 onelanguages base languagesthevectors, gapsare to language andmostvarious enable2016 helpful… ininstance, without many informative)learnstorepresententitiesbasedontheirvarious joint…ilarFor no to any inference embeddingshare breaklanguages may matter additionalCao example, 583text down languages among2016079some vectors. across textual trans- language they)learnstorepresententitiesbasedontheir KBetilar008 languagesshare Another and… commonsame embedding ( anyHancommon Another between401 additional commonet)utilizethecoherenceinforma-e approachx entities457i al. inentities trans-semanticor invarious, (,w oneHanguages2014x semantic not. andi and correspondinglanguage languagesx077 correspondingo; On,wYamada due(x097 the i ) to055 share078 other may the mentioned et some complementary al. hand, common,0782016050 cross- semantic; 051enLawrence knowledge.zh en Foust463zh For 451 061 058 062 ࣁ ՜ ԅ 043၎๩ᎫდՈ030 536027derstanding037410 019 naturaliments,033417030 Multi-lingual separate language1Introductionstudy,study, tasks019 the of thewordbeyond resultsare knowledge translationresults helpfulther311 texts. improveon to on break benchmarkare bases benchmarkMean- helpfultext down ity the acrossther(KB), language to between019 performance. breakimprove languages datasetAbstract datasetstoringtext down gaps across entities in languagewithout tasks, the manymil-joint languages, InEntity019 performance. 
inferenceany gaps andsuch exper-are additional Regularizerdiffer.text in helpful corresponding many capturingamongas across cross-lingual toOntrans- joint knowledge break languages, In one!080 mutually inference exper-downtext hand, base across language mentioned077capturing among and entity we languages knowledge utilize gaps 460 mutually linking, in withoutLawrencetasks, many their base069 any and suchinsharedFoust additional which080 as seman- cross-lingual tasks, trans-Mostinto069 a such longer existing as entity sentence cross-lingual linking,s093087,butthisincreasesthe work069 in entity which linking, jointly069 in586 which467083 models KB and text 361 0421Introduction421 030037of potential033 1Introduction0271Introduction knowledge 039 033 guages,030019of complementary knowledgeattention 033 provide031 to joint select ther 027 019ofattention inference the037 improveand knowledge words rich020 most entity to among informative and031 representation structuredexisting and the joint filter knowledgether033 performance.variousattention inferencevariouscross-lingualsame out1Introduction improveand text learning020 noise, entity base2016 across knowledgelanguageamong languages representationto whichand)learnstorepresententitiesbasedontheir enableand the languages 033 knowledge In willperformance. cross-lingual exper- or fur-share learning without 080 not.base2016 for some to andsome any)learnstorepresententitiesbasedontheir enable un-On additional Inmay common514 thecommon080 exper-∈various introduce other trans-D semantic semantic094 languageshand,textualCross-lingual inevitablemay2016077A descriptions cross-)learnstorepresententitiesbasedontheir introduce share errors.textual togethervariousCross-lingual080 some೉᯾593 inevitable4740692016Our descriptions೉᯾ with)learnstorepresententitiesbasedontheir common·ᐰේᇙ091 embeddings languagesthe∈errors. structuredᓯቖD087 together077Bi-lingual semantic069 Our re- with share ᬩۖާ embeddingsEN the092k structured471070087 some re- 083 common 070 081095 semantic090Comparable 083083 sentences081 092 provide087 cross-lingual564Comparable co- 089083083 Sentences 030 424045041037 040543044 042 044 ၎๩ᎫდՈ 041 KBs.n Therefore, 033 researchers s037of035031 531044 resources leverage 030 and1Introduction entity524both to039 relatedness improve types 029(wordse041020 demonstratespectively. )524 and filter variousand the018 out entity ef-029text noise,305spectively.020 Using 400 across relatedness003 which natural languages,will entity 018are demonstrate fur- text helpfulUsing300 capturingSome linking400 across language tow breakbasketball theAnonymous languages,entity mutually ef-cross-lingual down as text language across aguages300 linkingw capturing caseplayer languages gaps ACLe mutually2016e in due asmany withoutet)learnstorepresententitiesbasedontheirpioneeringal.,w a,wtic094, tosubmission case2016 any space,meanings the; additional,wAttentiveToutanova080 complementaryspectively.20161 to trans-et lar enable)learnstorepresententitiesbasedontheir etwork al.tic al., embedding2016, space,2015581 Distant; jointToutanova observe; Using,buttherearealsowaysinwhichtheyWu to1091 et inference al.∈ enableknowledge. et , entityA al. Supervisionvectors,, that5742015 ACL083070 joint068; among linking 2018Wu word et inference Submission no al.! For, KB as matter 574 English and081a070450068413 case***. among∈! C Confidential they KB087 KB094 are and079 Review450 in the Copy. 
DO NOT079 DISTRIBUTE.Chinese091 S053 KBS 355e e 350 ೉᯾·ᐰේᇙ350085089 463 ߵᇐỿ076 work 357 052 jointly modelscorpus087 to KB enhance and053 text each other by learning091 word andے047while,044KBs. Therefore, abundant 045 researchers041 text corpus leverage 037derstanding contains kmboth037 and types entity007n1Introduction relatednessnatural039 large026028 andKBs.words entitywhile, demonstratelanguage amount andrelatedness027 Therefore, filter037307026iabundant the408words out beyond demonstrate ef-text noise,instance, and037 across028researchers whichfiltertexts. languages,1Introductiontheout willderstandingand ef-041 textnoise, textual002 fur- entitytext Mean-tasks, spectively. capturing which corpus across relatednessleverage1Introduction suchembeddings ambiguity will languages,as mutually naturalspectively. fur-cross-lingualwordsfectiveness demonstrate containsboth Usinglation and capturing language filter in mechanism,entity types the trained oneout entityUsingP of ef- mutuallylinking, x noise,003 our1fectivenesstasks, largelanguagex whichbeyondtextual entitymethodwhich linkingininstance,separatelydisappear such which Attentive೉᯾ is೉᯾ descriptionsspectively.willAbstract usually asamount linking cross-lingualmeanings withtexts.fur- of= may as1 ᓯቖ ourexpensive textuallationc a an + togethertextualmay1caseas methodMean- onγ averageᬩۖާ mechanism,inDistant entity596 as and,buttherearealsowaysinwhichthey introduceUsingmonolingual descriptions withcase094 ambiguityanother linking, 095 with the= structured which c entity an078Supervisionin+ togetherinevitablemay077 whichγ average islanguage, ins re-usually introduce with linking onecorpora theerrors. expensive structured087 language 087458 inevitable asOur 091 ex- ande.g., re- a embeddings case may errors. the 0781 Ourtwo089Most embeddings mean-097NBA076057 (zh) existing087 ᜡ 546 KBs.traction Therefore,guages,540derstanding (ofWeston resources researchers534 provide042 natural042 et to improveal.031 leverage language, spectively.2013 rich 028 various035 structured;both beyondLin035 411instance, naturalUsingfectiveness types044 020 etand texts.031 entityal. entitylanguage textual , relatednessverify029 Mean-2017knowledge linking of our 002020 ambiguitydemonstrate), methoddisappear astasks, the and asamelingual case=029lionstext suchthe with028quality ef-Joint across as inlanguage in002for cross-lingual anothertasks, an one= languages, of averagelation embeddings suchun- Representation language entities asmechanism,oftext entityPlanguage,same or020 capturing cross-lingual( across not. linking,ourcomplementarylation( languagee languages, mutually whichtextual may On in)e.g.,entity mechanism,embeddings. and which(1eetext iso descriptionsthe linking,usually)+020 the across capturingi will) knowledge. or theirothertasks, twoLearning whichcomplementary expensivein languages, not. which such together mutually mean-textualbenefit is hand, usually asfacts InsteadOntextwords cross-lingualand capturing descriptions584 with081 across expensive theknowledge. thecross- oflation ofAttentive structuredre- languages, mutuallyin otherfrom entityCross-lingual078 mechanism,together differentw andP lingual variouswordsSentence( linking, Instead re-hand, capturing[[same with1Lawrence differentRepresentation461( whiche intextualDistant the ofinRegularizer whichlanguages. 
) structuredcross-re- mutuallylanguage embeddingsMichaelisdifferent070e descriptions usually lan-402) SentenceFoust re-Words expensiveSupervision081]]lan- togetherwas orlanguages.textual Regularizer anchance not. and092American070 with descriptions and will theOn to basketballNBA structured include085078085 Entitiesthe together benefit other unrelated re-590 with the2 hand,079 viafrom structured sentences.070words cross- re-different in different079070052 lan- languages.052 092 452094 049045015 031derstanding2016;010not031Cao naturalα012 been et006041 languageal. sim028 well, 2017314e( exploredbeyondlionsther verifyinstance,;, improve001verifyJi 031020 of etthe texts. entitieswords the the019 qualityal. performance. textual and qualitycomplementary in Mean-iments,, filter and2016 cross-lingualof out020) of414oure ambiguitytheir noise,jointen,Joe ourIn separatether 021 knowledge.embeddings.exper-). inference which019tasks, embeddings.same factsimprove complementarywords such iments,will tasks among inc asfur- languageInsteadin the cross-lingualmeanings ein various knowledgezh,Joe performance.onejoint of separateEderstandingdifferent 021 of knowledge. wordlarc inferenceset-i309 re-= languagethe entity base lan-,buttherearealsowaysinwhichthey or en,Joe translationi embedding tasks among majorandlinking,,buttherearealsowaysinwhichthey languages.meanings not.In Instead exper- knowledge ticstextual of in2013meanings may whichchallenge ofOn2016081P word to(re-L descriptions natural)learnstorepresententitiesbasedontheir(KNx alignthebase; i,buttherearealsowaysinwhichthey)L translationZhangx andi) other 515 similar lies,buttherearealsowaysinwhichthey081 togetherL vectors,textual in2016language ethand,Lwordsmeasuringdescriptions with)learnstorepresententitiesbasedontheirx en,player al.eniLlations. the, cross-zhwords2017 structured and078thejen togetherL nowords However, entities theSomezh major). beyond re-ini withmatter similar- differentin lations. thethese withchallenge different081 cross-lingual structured069070 methodsthe simi-ticsmeanings However,texts. languages. majorthey re-languages. lies1 only to in these challenge focus alignMean- measuring069 are,buttherearealsowaysinwhichthey070x methods pioneeringi on inwords similar lies only056 the the in071 focus similar- measuring in work ondifferent words observethe071051 andsimilar-languages. 091095 entities that 099 word with 065 simi-464 565 062060 364 359 On088)066 simi-we theጱ learn097 similar other em- hand,084 cross-366en 059060 en en en 059zh576en ڹ,or of Pጱ not. words,082(x withx thus ڹ016ጱ038 ଙ044 of316047tings. resources 532 fectiveness ᘳӱ while,034km031 Infectiveness of 038 toabundantourwhile,525fectiveness this methodther improvetraction of improve our034N418 with010km of paper,028text methodabundant our an021 the methodaveragecorpus009ther525 with performance.attention (various improveWeston with annfectivenesscontains averagewe anelions averagethe textKBs.032 In to performance.021fectiveness401 propose exper-eiments, of naturalselectlarge attentionet ourcorpus 032 methodwith2013 to mostlanguagether 401contains aiments, anselect tasks improve withmeaningsaveragemay novellingual informative; an introduce ofLin c averageand separate thelation word researchers performance. embeddings mechanism,,buttherearealsowaysinwhichthey largemostet inevitable034| translation their taskslations. al. informative which,errors. 
In this paper, we propose to learn cross-lingual word and entity representations in the same semantic space, so as to enable joint inference among KBs and texts across languages without any additional translation mechanism, which is usually expensive and may introduce inevitable errors. Our embeddings are helpful to break down language gaps in many cross-lingual tasks, such as cross-lingual entity linking, in which the major challenge lies in measuring the similarity between entities and the corresponding mentioned words in different languages.
The intuition is that words and entities in different languages share some common semantics, but there are also ways in which they differ. On one hand, we utilize the shared semantics to align similar words and entities with similar embedding vectors, no matter whether they are in the same language or not. On the other hand, cross-lingual embeddings benefit from the complementary knowledge among different languages: for instance, textual ambiguity in one language may disappear in another language.
Pioneering cross-lingual work observes that word embeddings trained separately on monolingual corpora exhibit isomorphic structure across languages (Mikolov et al., 2013; Zhang et al., 2017), which suggests that similar words and entities can be aligned without parallel supervision. Instead of relying on parallel data, we automatically generate cross-lingual training data via distant supervision over multi-lingual knowledge bases, and propose two types of regularizers, at the sentence level and at the entity level, to align cross-lingual words and entities. Since distant supervision has a chance to include unrelated sentences and words, we further design knowledge attention and cross-lingual attention to select the most informative sentences and words and to filter out the noise, which further improves the performance.
to paper, learn word578 cross-lingual we similar082082088 Our073 propose092 embeddingsMost to words learn096474 cross-lingual074090 andexisting090086 entities074 work with470086092 simi-086055 jointly056 models KB and text086091 ڹthatฎฎ ᗦࢵthe wordᗦࢵi Ini major this forlanguage036x challenge024o paper, lan-e 2NBA un- xi weAbstractNBA liesbeenSome proposeᭌᐹ inmayIn done measuring this beyondcross-lingual ڹ knowledge natural ڹworkstructured೉᯾ en,Kobe for 023NBAฎᓯቖ inNBA observeun-ᭌᐹᗦࢵ ᭌᐹ024ᬩۖާ various ڹ lions535424040032 of036038042 entitiesguages,032415528041 024Some provide036420 cross-lingual 040 and rich036guages, 528 1 structured024 their provide 032023 pioneering knowledgeNBAderstanding 006ᭌᐹ005 factsrich032 036046 042047 ߵᇐỿ 353 space.]] (051ᭌӾ̶Wang568056… For353 et086 al., 2014; Yamada093 et al., 2016; 363ے၎๩ᎫდՈ0482016traction;427Cao entity (044Weston040 et linking al. et035, 044 al.2017 ( Tsai,040KBs.2013035 and047;035;Jiprocessing Therefore,Lin Rothα et035 etguages,, al.2016 al.029 , 2017traction 2016;032study, provideresearchers (NLP)Yamadaverify1Introduction ),035032 and). therichwhile, (gain relatedWestonMulti-lingual044 qualityet029 results al.ofstructuredof 035024leverage 20%of, abundant potential005 our tasks,et and embeddings. on al. 3%edge knowledge, benchmarkgain text308both over2013 suchverify043313 bases.gain knowledge 024 corpusbaselines, of thee005;040 types as ofLin Wequality20% bases( for2013 relatione20% also contains etre- datasetun-edge) of and303(KB), proposeal.gain our and; bases.complementarygainZhang036, embeddings.3%2017 3%ex- oflarlargestoring twoedge overof We20% embedding over types), 20%amount alsoetbases. and! mil-baselines,∈ and baselines,303D proposeal.e006 and, We 3%∈!2017 vectors,C 3% twoalso toedge over re-Inover re- types085 proposeexisting). thisSome bases.larEn097 baselines,noxi paper,baselines, e matterembeddingtics001 two, 085 We, we085518 cross-lingual 085 types also proposeMono-lingual they re- toIn re-guages propose this areinstance,098 to alignvectors, paper, learn in two cross-lingualthe wedue082 types082 proposepioneering no similar094Representatione to matter textualtheAll-star to ,w learn complementary085 085 cross-lingualthey477074,w words worksamee ambiguity are 094 Learning in,w090 observe thelanguage and074entity,w knowledge. L前 in entities that094basketball090097099 one representations079 orand word not.was language For with anbasketball C8-time On079ᬩۖާ055 simi- the[[All-star may094 other in ]]055 … aᬩۖާ| unified hand,358ᶲ֖ᤩexample, cross- vector[[090094ᜡ 044 049 047 044 ᘳӱᓯቖ 12 seman-insomeᘶፑ seasons in onecorporaᬩۖާ the in common mentioned the language mentioned in(former)ᬩۖާ059 same NBA060075455 ex- seman-whileᘶፑ semanticᓯቖ may the compa-s075 589ᓯቖ360097 087 096 NBAᭌᐹ 578ڹKBs.traction546 Therefore, (Weston047 et al.while,044,KBs.2013 researchers; Therefore,Lin abundant538 et045 al., 2017037 researchers leverage539 textverify), andof010 thekm resources corpusquality009verify leverageverify ofthe bothgain our046guages, quality the to embeddings. of quality containsimprove 20%bothMulti-lingual of types our025310 ofprovideand embeddings.gain our types 3%edgeSome embeddings. various of overgain 20%verify bases.hibit rich knowledge baselines,largecross-lingual ofderstanding andguages, 20%the naturalisomorphic We structuredKBs.025 3%405 quality and also w over re-edge 3%anamount provideproposebasestantkmof language baselines,Therefore,w overgain025 bases. 
our= pioneeringbasketball supervision structure baselines,i embeddings.of knowledge two(KB), rich20%Wegain re-405 naturaltypes instance, overalso w re- andstructuredacrossstoring oflarwasity workresearchers multi-lingual 20%proposeguages,tant between 3%(of embeddinge025wCi supervisionlanguages) andan forover observe mil- potential twolanguage textual 3%entitieseedgeknowledge(un- knowl-baselines,Allstare provideLawrence typesj over ) over bases.word ( andleveragethat vectors,Mikolovdiffer. baselines, multi-lingual embeddingsambiguityelingual corresponding andNBA re- word Werich entity forOnknowledgealsoet beyond nore-∈ embeddingsedgeal. oneun-both knowl- representations{Dstructured propose588!Lawrence matter, A mentioned hand,itybases.wordKNity inL types } betweenlar twobetweentrained oneand theywe We typesembeddingtexts. entity in will knowledgeutilizealso languagethe are 528 representations entitieswordsamecomplementary propose benefitinstance,separately inity theirity seman- andtheMean- betweenvarious twobetween vectors, andentity and(shared from maye೉᯾||C for types in, corresponding textualcorresponding representationse the different un-[[ seman-)೉᯾ entitieswordonsamelanguages noᓯቖ075 ·596Eᐰේᇙsonsseman-matter andmonolingual094− ambiguityFoust lan-]] andentityᬩۖާ and095ฎᗦࢵ C in೉᯾ mentioned theto mentionedcorrespondingtheyshare correspondingspent representations[[NBA same s existingᓯቖ]]075 455are||ᘶፑጱ (ߵᇐỿ the lan-091 same or075variousguagesseman- nosemantic not. matterᘳӱ languages On075due they054k the085Data are to share2ᘳӱ other inGenerationthe the some complementary hand,085 common090 cross- semantic knowledge.064456 For 061 (4ےtraction (Weston et al.,KBs.2013traction ; Therefore,guages,Lin041 et ( al.Westonk014 provide, 2017researchers guages,041KBs.416), rich025 and et provide al. structurede Therefore,zh,Kobe leverage, 2013413004iments, richembeddings025040 knowledge; structuredboth024 Lin separateprocessing035instance,Theinstance, researchers typesedge intuition ettrained iments, bases. al.The knowledgetasksfor is textualThe,024 intuitionseparately Weun-ity that,The2017 separatevarious of035 intuition between also intuitiontextual(NLP) wordswordedge025is propose ambiguityleveragelar that,),forbases.entities isdisappear on andlanguagesis tasks translation that,ity and embeddingthat,words twomonolingualun-lingual entitiesbetweenityword andWe related words betweentypes words of ambiguity alsocorrespondingand e inword inentities0251 andentity andpropose entities entitiesshareboth anotherdiffer. one,w entities representationscorporaThe and translationembeddings vectors,lar atttasks, and intwosome mentionedlanguage corresponding word OnintuitionIn corresponding in embeddingtypesinlanguage, types this,w(x one andvariousi ex-) paper, commonityentityin is nosuchinhand, betweensupervisionthat,the weity mentioned may matterrepresentationsAnonymous proposesameone between e.g.,att wordswelanguages011097e entitiesvectors,Inas semanticseman- utilize this tothe entitieswill and learn languagerelation they paper, andtwo466various entitiesin2013 cross-lingual theirword andthecorresponding we no mean-arebenefit shareACL proposesame correspondingen and075 inshared matter;406i inentity seman-Zhangzh languages toex-submissionjsome the learn091en representations seman- mentioned may463 theymentioned074 cross-lingual representationslanguagesame thesomek semantic074 seman-). 
As shown in Figure 1, our method has three main components: (i) comparable data generation, which uses the cross-lingual entity links of a multi-lingual KB to collect, for each pair of linked entities, the sentences mentioning them in each language; (ii) mono-lingual representation learning, which jointly embeds the words and entities of each language; and (iii) cross-lingual regularizers, a sentence regularizer and an entity regularizer, which align the mono-lingual semantic spaces into a unified one.
4.2 Comparable Sentences

Given a pair of cross-lingually linked entities, we collect the sentences that mention each of them (via anchors) and pair the two sets as comparable sentences. For example:

En: [[Lawrence Michael Foust]] was an American basketball player who spent 12 seasons in the [[NBA]] and was an 8-time [[All-Star]].
Zh (translated): [[Lawrence Foust]] is a professional basketball player of the American [[NBA]] league. He was selected with the fifth overall pick in the first round of the 1950 [[NBA Draft]].

The two sentences mention the same entity, yet convey different information; such unbalanced pairs are the main source of noise that our attention mechanisms are designed to handle.
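To make this step concrete, the following is a minimal, self-contained sketch of the distant-supervision pairing over cross-lingual entity links. The data structures, identifiers, and helper names are our own illustration (the Chinese sentence is glossed in English), not code or naming from the paper.

```python
from itertools import product

# Toy multi-lingual KB: each cross-lingual link pairs an English entity
# with its counterpart in the other language (hypothetical identifiers).
cross_lingual_links = [("en:Lawrence_Foust", "zh:Lawrence_Foust")]

# Toy corpora: each sentence carries the set of entities it anchors.
corpus = {
    "en": [("Lawrence Michael Foust was an American basketball player "
            "who spent 12 seasons in the NBA .",
            {"en:Lawrence_Foust", "en:NBA"})],
    "zh": [("Lawrence Foust is a professional basketball player of the "
            "American NBA league .",  # glossed translation of the zh text
            {"zh:Lawrence_Foust", "zh:NBA"})],
}

def sentences_mentioning(lang, entity):
    """All sentences in one language whose anchors contain the entity."""
    return [sent for sent, anchors in corpus[lang] if entity in anchors]

def generate_comparable_sentences(links):
    """Pair every cross-language sentence combination that mentions two
    linked entities; being distant supervision, pairs may be unbalanced."""
    pairs = []
    for e_en, e_zh in links:
        pairs.extend(product(sentences_mentioning("en", e_en),
                             sentences_mentioning("zh", e_zh)))
    return pairs

for s_en, s_zh in generate_comparable_sentences(cross_lingual_links):
    print("EN:", s_en)
    print("ZH:", s_zh)
```

Pairing at the sentence level rather than the document level keeps the supervision fine-grained, which is what makes the attention-based filtering described later feasible.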
un-andnoise, of to305 natural our cross-lingualtheir selectcfilterwhichinw embeddings.w shared1basketballlarout thehibit2013 cross-lingualwords will(bases embeddingnoise,mostduee seman- languagewCitext); fur-NZhangisomorphic informativepioneering acrossand whichw (KB),e( toplayere filterj languages)w088 et willbasketball thevectors, outal.| storingee fur-elingual work, noise,2017 withouttext,w complementary,w observeacrossstructurewhich). nowmil-set- any embeddings099player588matter additional languages,w will thatᬩۖާee fur- trans- theywithoutworde,w,wacross willz are any581093same benefit additional,w in479ᬩۖާ languagesthe[[N೉᯾language097knowledge. trans- from·ᐰේᇙ096∈]]ฎᗦࢵ differentD092|581085[[NBA (417Mikolove orie]],ᘶe lan-j not.594)z E099c For476 On081 et092 al. the, 088 other2081 hand,088096 cross- ⟨ 360!′ ⟩∈[[Lawrence068092 Michael355 Foust ]] was355 065[[೉᯾ ·ᐰේᇙ ]]ฎᗦࢵ [[NBA]]ᘶ 467 ;2016018429544 049 ߵᇐỿcorpus to enhance093088063 each other062 by063 learning word andےߵᇐỿNBA (zh) 087 096ᜡےߵᇐỿNBANBA085078085 (zh)056087099362ᜡےCao046 et al., 2017049; processingJi et al.048, 2016 (NLP) ). 0422016046Multi-lingual related Multi-lingual013 entity; tractionCaotasks, 012Multi-lingual knowledge011 linkingMulti-lingualof knowledgeet041 such potential (Weston al. while, knowledgebases as , (k knowledge038Tsai0132017043 relation(KB), knowledgen et035 bases abundant al.bases028312 storing035 and ,012006; bases2013 (KB),words(KB),Ji ex- mil-Roth complementary (KB),e et;storing andLin text027∝ediffer. storing,al.zh,Kobe filter035Multi-lingual0372016 etmil- corpusprocessing028408035,words outal. On mil-2016w, noise,onewords mil-;2017 andYamada to hand, knowledgediffer.contains 028 whichexistingfilter and).027), α we1Introduction filter(NLP) andOnout The037 willutilize one basesnoise,408words out etintuitionfur-038e hand,large their noise,an al.n(KB), relatedwhich andwhile,The sharedwe,028whichbasketballfilter isstoringTheamountutilize intuitionxwill that,The1Introduction seman-out intuitiontasks, fur-willabundant words theirmil- intuition noise,wordse fur-is sharedlationand was that, differ. suchiswhich2013 and is filter that, mechanism,seman- that,words entitiestextan On will outas words099; onewords fur-noise,Zhangrelation andwords corpusAllstar hand, in whichand and entities which lation andwe entitiesL ೉᯾ utilizeisThefilter೉᯾etNBA ex-containswill mechanism, usually indisappear al. intuitionout fur-in their in(,ᓯቖx noise, expensivei2017) sharedlation which is largewhich ψsupervisionthat, mechanism,seman-′).(differ.೉᯾inandw is೉᯾096 will usually wordsamount another,w fur- ᓯቖ On)| expensiveand whichlation 078088 one entities077language,is mechanism, usuallyhand, and in expensive wesim which091 078 utilize458 e.g.,077(w is and usually ,w the their)092098 expensivetwo061 sharedNBA085062078085458 mean- and seman-ᜡ ;2016 049 ᘳӱᓯቖᬩۖާe hand,078lation theirdifferent096(KB),lingual ise in078 mechanism, usuallyet andz et sharedwe al.z et storing,expensive utilize1al. embeddingslan-̶՜ࣁRepresentation086 which al. 
seman-078, m and2017 is, theirmil- usuallyn∈2015 shared093 expensive will)utilizethecoherenceinforma-078differ.; benefitseman- andWu On from one et078 hand, different al., weutilize lan- their shared091 seman-054 571 065{ | ∈367} 362ڹjointwhile, 036 inference abundant039 043419 036 textprocessing028 w!kiMulti-lingual317s corpus036amongkmMulti-lingual0152013 416028fectivenessJointly contains;Multi-lingualZhang028(NLP) km knowledgeMulti-lingual etknowledge large al.Multi-lingualknowledge of learningverify,knowledge036 related028fectiveness2017 our differ.American amount themethod). OnbasesJointly knowledgequalitybases onekmdiffer. tasks, (KB), base hand, of with028knowledgeverify word of(KB),Lawrence ourstoring welingualguages On ourL utilize an bases!learningsuch themethod and oneembeddings.basesmil- averagelation312istoring theirqualitydue andvarious embeddingshand,hibit mechanism, sharedbasesas to with028Lawrence(KB), the(KB),isomorphic of relation entity||Cwemil- seman- ourcomplementary languageslation an(KB),lingual wordutilize whichstoring embeddings. averagelation− mechanism, C wille isstoringdiffer.storing mechanism,et theirrepresen- usually591004Multi-lingual structureembeddingsex-|| benefitandThesharemil- 089521al. which shared expensiveknowledge.lation intuitionOn mil- which,086 some isentity mechanism, fromone usuallyanacrossKN2016 mil- seman-American andisis usuallywill commonhand,knowledgethat, expensive469 differentdiffer.Forlanguages 086basketballlation which086; benefitrepresen- expensive words wem078 mechanism,Toutanova and is playerOnsemantic usuallyKN utilizeandn lan- and from (one basesMikolov1 entitiesRepresentation093Cao expensive which466ፑጱ 043 541 041046 049 2016019; Cao et al., 2017039;processingJi et319 al. 039,5412016039 (NLP)).of041 related045potentialwhile,1Introduction 036 lions tasks,abundant knowledge of∈ entities∈of043derstandingA suchof resources and potential039 text complementary their as relation facts036 corpus natural039 to ther inknowledge improveverify various improveof containslanguage ex- withpotential lan-to thewords theexisting various quality complementarytics performance. and beyondthernot largewbracketsto filter verify align improvenatural of out similarsbeen knowledgetexts. amountour1 noise, embeddings the Inwordsther theembeddings.exper- words language whichquality Mean-e performance.improve and∈ toi well andE willfilterare existing entitiesdiffer.c trainedfur-ofout the oursame noise,exploredanchors, performance. with IncomplementaryOnther embeddings. exper-separately one simi-which089corpus language improve the hand, willen,LosAngelesLakers majorc fur-we In089 theto on utilizeexper-or089inenhance performance. monolingual dashed not. challenge cross-lingual their099wxC itheOndiffer.American shared(e eachxtothe majoro In,086 corporae seman-lies(1otherexisting exper-x linesi Onother ) challenge inEone hand, ex-differ. measuring by hand,between learning cross-set-089 we lies =Lawrence wetentionsOn the in,wembeddings word utilize measuring similar-s andone entitiesx and cross-lingual086091i their hand,095! 
the sharede(etrainedLawrence similar- attention ) denote we seman- to093 utilizeseparately( filtere ) relations, out un- their591 on089 monolingual sharedL andD an solid seman-American corpora069 lines basketballA ex- are player cross-lingual| 369 ᘳӱᓯቖᬩۖާ459 ̶՜ࣁ 066ڹ037lions539 of entities lions and532 of their043 entities facts"derstanding and in various their 532ther facts lan-008 042 improve natural in028 variouslionsof the potential languagether 0091 of performance. lan-008 entitiestherAbstract improve419 008 improve knowledge andbeyond028e the In their performance. theki exper-ether performance. facts texts.guages improve∈016i, In eKN performance.therj lan- exper-)e languagesto E improvelingual thecMulti-lingualMost mechanism,2016ther common in improve anotherexisting dueknowledge which semanticknowledge. In)learnstorepresententitiesbasedontheirwho tothelation exper- will isspentthe performance.language, usuallytings.∈ workmechanism,D 12may complementary589 benefitiseasons∈ expensiveC introduceFor jointlyjbasesmeanings in which e.g.,In[[NBAfromand exper- In409c is]]inevitable themodels usually (KB), entity1950582may this different∝two089,buttherearealsowaysinwhichthey knowledge.expensive598078 introducem errors.mean-[[ KBmeaningsNBA storingguagesk paper, and andrepresentationsnOur lan-]] inevitablek embeddingstext1582 Formil-087,buttherearealsowaysinwhichthey5078 due errors. we to Our proposei the embeddings complementary inNBAjᭌᐹ a a unified novel058y vector058058 knowledge. 059 space.092469 For For 093ፑጱ 039 548 derstanding044 lions040 032 of entities natural and039 their 032039 facts language in varioustics lan- to409045 align beyond similar tics 029 toki wordsvarious align 409lj and similar languagestexts. entitiesvarious Some ki wordsvarious 029 withsharelj languagesmaykiw cross-lingual and jMean- languages simi-introduce some009 entities 039lj share may commonsametics with shareinevitable introduceki some 1 to simi- languagepioneering alignljmay some semantic common errors.introduce similar inevitablex commoni xi= may090 semanticorOurwords variousThe workTheinevitable guages semanticerrors.introduce not. embeddings and intuition languages! observe entitiesOnOur errors.due inevitable embeddings the !Data with to sharePmay Ourthat otherSomeis (TheGeneration theThex simi- errors.oembeddingsintroduce somethat, that, xword complementaryi hand,intuition) cross-lingual commonOur wordswords inevitable094 embeddings089 cross-ଙጱLearning semanticmay is∈ and and errors. introduce that, that,ᭌᐹ pioneering entitiesknowledge.∈ entitiesӾᒫ Our words words inevitable᫪ embeddingsᒫ459 inLearning workCaoinentity and andcorpus errors. For observediffer.entities entities Ouret082representations079 embeddings459 al. thattoin On in, enhanceword one2017 hand,082079)utilizethecoherenceinforma- weNBA eachᭌᐹ utilize089 inen other azh their unifiedxNBAi byᭌᐹ sharedx learning vector seman-(x ) space. word059 and For089 taining the same095 entity (as mentioned in Sec- 046 044ኞ႕ 047043 542 048046traction427 043037nentity (Weston039 KBs. linking046042processing4201Introduction et037guages, Therefore,029 al. (039Tsai,4231Introduction provide2013037 1Introduction and047029 richresearchers417 ; processingLin029 Roth structured(NLP)1Introductionlions 029 et, of2016al. 
knowledgeof entities leverage, 3112017 related029; potential (NLP)Yamada Some037 for029 and tics), un-043 bothther their toand cross-lingualwhile, alignimproverelated lions et tasks,types facts 1Introduction al.306 similar 1 knowledge, the in of029∈ abundant performance. varioustasks, words entities suchther pioneeringof and improve lan- such! resources entities In∈as!306 and textE exper- the∈029 relationas withwork performance.complementary their corpus2013 relation simi-may observe factsintroduce; Zhang In containsto exper-ex- ex-592samevarious in that inevitable improve various wordmayet languages 013087 large errors.al.introduce language, lan-to5302017 Our share1 amount 470Cross-lingual inevitableexistingvarious! embeddings087087 some).tics∈079 errors.common to or align Ourlingual natural079Cross-lingual467 semanticnot. embeddings079 similar098079 embeddings097Lawrence On words language093092079 087Foust the∈079 andKNAll-star entities otherwill096477 benefit079093 with hand,All-starwork simi-089 from differentcross-079.WeutilizeavariantofSkip-gram473089094where096097 lan-s , s are361 sentencetion o093096 embeddings. toi356 align Take En-similar356 words and entities with sim- 063 580 ߵᇐỿ, embeddings480Jointly guagesrelated]]ᭌӾ̶… information learning due418 ||C at595 to೉᯾ sentence099 ·ᐰේᇙthe062 word086 063 086− levelcomplementarydisappearC and and at|| entity086 word086057088 in363 represen-099 anotherk knowledge.k 088 language, For e.g.,055 572 the two mean-098 365 468ےprocessing047 (NLP)049430545traction related049 (Weston040 tasks,2016 044 etlions040 al.; such040Cao,20162013 ofα 013 entities et;; as012CaoLin037 al. relation et and,et048of al.2017n al. potential, their2017, 0362017 ;313036 facts ex-Ji),007040gain; andknowledge Ji et in et ofal. various al.036iments,entity 20%038, ,traction20160363152016 complementary separateandgain1Introduction lan-). linking). 3% (of tasksWestonlariments, embedding20%tics038 over of" wordMono-lingual to1 separateand baselines,etinstance,( aligntranslationvectors,hibitof toTsaiiments, al. 3% potentialexisting, tasksisomorphic2013 no similarRepresentationmeanings separate matterover∈2013 textualand of1R re- wordtics1; theyLin; baselines, towordsknowledgeZhang tasks translation structureRoth ambiguity are aligniments, et Learning in,buttherearealsowaysinwhichtheymay of090 al. the similarE wordandet introduce separate,∈c, al.2017acrossRL2016 re-005 translationin entities words,090complementary 2017 one522 inevitable tasks),090and languages and andwas language).;may of with entitiesanYamada errors. word 18-time introduce ( simi-Mikolov Our[[ translation withAll-star may087 embeddingsinevitable simi-]]to … 097 et| existinget al. errors.,ᶲ֖ᤩ al. OurL[[090094ᜡ KBs.047 Therefore, 043039ofguages,540joint potential researchers providekm KBs. 
inference042046guages, rich knowledge044 533 structured Therefore, provide knowledge richlionsprocessing leverage044complementarywhile,Jointly structuredresearchers among533 offoriments, abundant un- knowledge entitieslearning (NLP)037 separatelions029 leveragelar knowledgeguages, embedding textboth forlions relatediments, tasks of andeword un-to providecorpusiments, both ofexisting entities separate vectors,ofwordlar tasks,their typesandrich separate029 embeddingtypescontainsmeaningse entities translation structured tasks entity nosuch factsSome andiments, mattertasks base039 oflargevectors,,buttherearealsowaysinwhichthey as wordguages knowledgee ofrepresen- andtheiren,Joerelation theyincross-lingual separatemeanings word translation amountinstance, nomeanings variousareand1 their translation matter for facts tasksindue iments, ex-e,buttherearealsowaysinwhichtheyun- the,buttherearealsowaysinwhichthey of theyto factsare wordlingual separatepioneeringinlar lan-thehelpful are embeddingguages varioustranslatione inAllstar incomplementary thetaskstextual embeddings to break variousiments,tics ofework vectors,lions due worde downare lan-NBA to separatemeanings helpfulto translationobserve no align language of will1 lan-e the ambiguitymatterAllstar tasks entitiesto benefit,buttherearealsowaysinwhichthey complementary break knowledge.similar theygaps thatare of590 wordSomeeticshelpful aredown inNBA fromword many inand translation to language the words to different410 aligncross-lingual break their in For gaps583are down and knowledge. similar lan-079 helpfulone facts in097 many language entitiesguages to inlanguage breakwords pioneering gaps various583 withdue downFor087 in092079 and manyto language096 simi- lan-the entities093 work complementary may gaps094tics in with manyobserve094 to simi- align that knowledge. similar word= words Forwho and′ spent entities∈D 12 seasons∈C withP( inx simi- [[NBAx )]] 1950089089 ଙጱ460 [[NBA ᭌᐹ]]Ӿᒫ1᫪ ᒫ5 traction ( Weston 016guages,041 etof al. provide potential, 2013α rich040 structured; Linsim knowledge KBs.( et knowledge al., Therefore, ,tings.2017040 for complementary un-),410)009 and en,Joe researchers In1 thisMulti-lingualzh,JoeE 410 toembeddings leverage paper,existinginstance,meanings Figureare knowledge bothMulti-lingual,buttherearealsowaysinwhichthey helpful textualtrained ween,Kobe typesentity downare knowledge ity helpful (KB), The language betweenrepresentationson inzh,Kobe to091 one monolingualstoringx breaki agaps bases framework languagex entitiesnovel down inity manymil-( (KB),x language) between corpora andmay inare storing a gapshelpful correspondingunified entities in ex- tomanymil- break of vector090 and downare our helpful corresponding languagespace. mentionedhibit tomethod. break gapsisomorphic For460w indown many language mentioned gapsThe460 structure in many inputs acrossx and090 languages059 outputs ! (Mikolov of! each et al., step are066 listed in the right 044038 lar embedding030 vectors, no030 matter they are in the corpus to enhance o tics1 i each to088 other align by similar learning words word and and entitiesLawrencej Foust with080 simi-Lawrence Foust 080 i entityo representationsi 094 in a unified vector space. 
For 019 040 ࣁ543 ՜ 047038045ԅ014542038 ၎๩ᎫდՈ421038derstandingtraction030kmmon while, natural030 418013 neighbors abundantlanguage(030 Westonguages,009n030 beyond provide textKBs.embeddings texts.030 010 etcorpus rich038030 Mean- Therefore, al.009 structured haveiments, , containssame2013 trained separate knowledgelanguagec researchers similar030large tasks; separately Lin or ofiments, not. foramount word un-On et separate translation leverage the on 010 al. embeddings, other 030 monolingualguages tasks,lingual hand,2017 ofare both word cross- helpful dueembeddingsinstance, translationtypes), to corpora593 andmeanings to breakvariousvarious the downinstance,are textualcomplementaryex- will,buttherearealsowaysinwhichthey nohelpful088 language languages benefit toambiguitymat-1Entityembeddings textual break471 gapsvariousvarious from 088Regularizerdiffer. down inshare many080= ambiguity different language languages inknowledge. someOnsome trained[[ oneLawrence095jectiveEntity090 one080468 gaps commoncommon lan- language080 inhand, separatelyRegularizerMichaeldiffer. inshare one many080(e PFousti weFor(, semantic somelanguagexOne some semantico]] sincej mayutilize)wasxAnonymous on onei088080)088E common commoncmonolingual080 hand, their may a weshared semantic cross-lingual semanticACL utilize080 corpora seman- submission೉᯾ their·ᐰේᇙ ex- shared G080059Bi-lingual seman-592೉᯾ link·ᐰේᇙ EN 059060 actuallyBi-lingual EN097069 contains060 064 063 en en 549 041 textn041 041 across sof033km038043 resources languages, lions014∝ to 016 improve033040037 041tationsofand Multi-lingual 0431 entity various entities030= capturing relatednessand1Introduction knowledge natural037entity KBs. benefits 1and420 relatedness demonstrate bases entity language and tationsguages,030 (KB),Therefore, relatednessandtings.1Introduction demonstrate mutually storing theJointly many entity their ef-P provide mil- relatedness2013( demonstrateeand benefits the ; Indiffer. entitym NLPZhangef- factsrichlearning researchers demonstrateand, On this therelatedness040lar oneet structuredentity ef-( embeddinglionsal.e hand,tasks, , many relatednessand∈2017 paper,))in the weDare demonstrate1091 entity utilize+∈ ef- of). helpful wordA vectors,various knowledgeand 2016theirdemonstrate relatednesswhile entities leverage toNLP entity091 sharedbreak we the no091! ∈ andrelatednessef- matter downD seman-English)learnstorepresententitiesbasedontheir arethe demonstrate propose fortasks,has and∈! language helpful ef-Clan- they entitySome un- KB demonstrateboth areto their088 gapsP break the in( cross-lingual while ine the ef- down manytypes factsrepresen- the am language ef- novel , Chinese hasin091 tion599 gaps(080e pioneering various in ))KB many to align093 lan- work080[[೉᯾ observe· similarᐰේᇙ这里感觉改动较大tics]]ฎᗦࢵ087083 to that[[NBA align word]] wordsᘶ similar087083 and wordsglish entities090 as andsample entities language, with093 withwe470 define sim- simi- it as the aver- 090 066064 042 049 2016040;derstandingCao et al. natural424, 2017 languageMulti-lingual; Jiα beyond et al. knowledge0381Introductionn, texts.2016derstanding Mean-and bases). 
The intuition is that words and entities in various languages share some common semantic meanings, but there are also ways in which they differ. On one hand, we utilize the shared semantics to align similar words and entities with similar embedding vectors, no matter whether they are in the same language or not; for instance, textual ambiguity in one language may disappear in another language. On the other hand, for cross-lingual tasks such as cross-lingual entity linking, the major challenge lies in measuring the similarity between entities and the corresponding mentioned words in different languages.
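To illustrate what the unified space buys us, here is a toy similarity check, assuming embeddings have already been learned; the vectors and key names are invented for illustration only:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dim vectors standing in for learned embeddings: one Chinese entity
# and two English words. Values are illustrative, not trained.
vec = {
    "zh:entity:Foust":     np.array([0.9, 0.1, 0.0, 0.2]),
    "en:word:basketball":  np.array([0.8, 0.2, 0.1, 0.1]),
    "en:word:banana":      np.array([0.0, 0.9, 0.7, 0.1]),
}

# In a unified space, a Chinese entity can be scored against English words
# directly, with no intermediate translation step.
for w in ("en:word:basketball", "en:word:banana"):
    print(w, cosine(vec["zh:entity:Foust"], vec[w]))
```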
Some cross-lingual pioneering work observes that word embeddings trained separately on monolingual corpora exhibit isomorphic structure across languages (Mikolov et al., 2013), and we observe a similar regularity for entities: items sharing common neighbors obtain similar embeddings, no matter in which language they occur. Our method exploits this regularity through two types of regularizers that align cross-lingual words and entities, together with knowledge attention and cross-lingual attention that reduce the noise introduced by distant supervision. In experiments, word translation and entity relatedness demonstrate the effectiveness of our method with an average gain of 20% and 3% over baselines, respectively; using entity linking as a case study, the results on the benchmark dataset further verify the quality of our embeddings.
[Figure 1 occupies this region. Recoverable panel titles: Data Generation, Mono-lingual Representation Learning (En, Zh), Comparable Sentences, Entity Regularizer, Knowledge Attention. The English example sentence reads: "[[Lawrence Foust]] was an American basketball player who spent 12 seasons in [[NBA]] ... was an 8-time [[All-star]] ...". The Chinese example, translated from the garbled embedded font, reads: "[[Lawrence Foust]] is a retired professional basketball player of the American [[NBA]]; he was selected in the first round of the 1950 [[NBA Draft]] (5th pick) by the [[Fort Wayne Pistons]]."]

Figure 1: The framework of our method. The inputs and outputs of each step are listed in the left and right side; there are three main components of joint representation learning. Red texts with brackets are anchors, dashed lines between entities denote relations, and solid lines are cross-lingual links.
5.1 Mono-lingual Representation Learning

For each language, we learn word and entity embeddings on the anchor-annotated corpus. We utilize a variant of Skip-gram (Mikolov et al., 2013c) to predict the context given the current word/entity; taking English as the sample language, the objective is recovered as:

L_EN = \sum_{x_i \in D} \sum_{x_o \in C(x_i)} \log P(x_o \mid x_i), \quad P(x_o \mid x_i) \propto \exp(x_o \cdot x_i)

where D is the training corpus over the merged word/entity vocabulary and C(x_i) denotes the context of x_i.
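A minimal sketch of this objective, assuming a toy id-encoded corpus and a full softmax for clarity (negative sampling would be used in practice; all names are hypothetical):

```python
import numpy as np

def log_p(x_o, x_i, all_vecs):
    """log P(x_o | x_i) with P(x_o | x_i) proportional to exp(x_o . x_i)."""
    return float(x_o @ x_i - np.log(np.exp(all_vecs @ x_i).sum()))

def skipgram_loss(corpus, vecs, window=2):
    """-L_EN: for each token x_i, sum -log P(x_o | x_i) over its context
    C(x_i). `corpus` is a list of ids over the merged word/entity
    vocabulary, so anchors yield entity-word co-occurrences like words."""
    loss = 0.0
    for i, x in enumerate(corpus):
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                loss -= log_p(vecs[corpus[j]], vecs[x], vecs)
    return loss

# Toy corpus over a 5-item vocabulary with 8-dim vectors.
V = np.random.default_rng(1).random((5, 8))
print(skipgram_loss([0, 3, 1, 4, 2], V))
```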
data,et knowledgecomplementary,KBs.;2013Ji ex- linking046 al.k et;that Therefore,Lin et al.,processing et complementaryprocessing, al.2017 al.integrateswe2016315 (,048Tsaiof2017 to, researchers2013013 existing potential).), (NLP)and automatically047; ;kJiprocessing toLin leverageRoth relatedcross-lingual existingguages etof 310 (NLP)etpotential knowledgeFigure , tasks, bothal.2016 dueal. types such,2013 to knowledge2017traction 2016; theverifyentityguages(NLP)Yamada as1:;Some complementaryZhang relationrelated word), theThecomplementary due310 complementary and). qualityetcross-lingualex- (043 torelateddisappearWeston al.wet framework linkingthe an, al.2017 complementaryof knowledge.tasks,wverify, our=basketballSome). in tasks,toetembeddings. another theexisting cross-lingualal.pioneering quality For( ofsuch,010 toTsaiw language,knowledge.2013096 such larwasan our527existingguages of(w embeddinghibitw our=i;basketball∈as)method. pioneeringandan workLin e.g.,E dueembeddings. Forisomorphic2013 relation relationjointe( theto etAllstar∈observej theRoth) two al.w093 work; complementarylar vectors,was094Zhang, mean-Thee2017 structure(inferenceNBA embeddingex- observewi that,)an inputsex-2016same), no wordeet( andthatAllstar knowledge.tionacross099 matterj美国) al. word and vectors,;,tings. languagesYamadae篮球 language2017NBA they tooutputs Foramong 运动员 no arealign). (097Mikolov matter In in∈ of the etet thiseach knowledgeCao ettheysimilar oral. al. al.12lingual, paper,step, are not.sea-,098 et067 in0972016 are the embeddingsal. words listedOn we,093; base2017Toutanova propose thein the and595 and)utilizethecoherenceinforma- otherwill right477367 entities abenefit ᜡ 049 047 044 traction431 processing (Weston047 (NLP) et relatedal., 2013 tasks, such; Lin as relation et al.ex- , 2017 ),1e and instance, e315 textual ambiguity tionThe in intuitionto oneof= align language potential is that, similar mayThe words intuition( knowledgee and) words entities isThe(e that,) intuitionand415 in words(x complementary) entities andis that, entitiesTheFollowing withwords intuition in(xand sim-) 097 to(Yamada isentitiesexisting that,tion inwords et al., to and2016481 entities align; Cao in et al., similar2017), words andi entitiesk,l withy sim-y 465 365 022 434 548 KBs. Therefore, 043320oftraction researchers potential041 013 (Weston043 036 knowledge044 1 etwith416041014 al. , leverage013 brackets2013 complementary036embeddings;zh,KobeLin etare416 trained al.guages anchors,,Jointly2017both separately due ),zh,Kobe The019 on andto learning monolingual dashedthetypes existingSome intuition complementary The L corporaSome1 word1 lines isThe intuition ilarcross-lingual that, ex-hibitvariousThe intuition and cross-lingualbetweenmono-lingual knowledge.embeddingguages words 598 intuitionisomorphicis entity that,||C is andTheL languages that, wordsFor isdueThe represen-entities intuition− that, words pioneeringvariousC vectors.structure intuition! toand pioneeringwords in theandiis|| entitiesThe sharedenote guages that,||C is entities andcomplementary languagesacross intuition Anotherthat, work words inj someentities− settings,relations, wordsin due languagesC observe andi is incommonwork approachthat,toand|| entitiesThe share supervision086484 the entities words(and thatintuitionMikolov incomplementarye some knowledge.observe semanticwordsolid inand inandSomei is et( commonentitiesHan that,al. lines466093 supervision086, fewcross-lingual words that inL are For semantic cross-lingualand knowledge. 
word researches091 entitiese466guagese in pioneering due For||C091 to063 the have093 work complementary−094C063 observe∈e064 || z that knowledge.e ∈072 wordtion Forto align similar words and entities 370 with sim- 069 044 322047042 047044 ၎๩ᎫდՈ048 044entity 036 etof al.1 potential, knowledgeJointly naturalKBs.Some pioneering 043derstanding knowledge cross-lingual language036learningTherefore, worke tations observe pioneeringcomplementaryAbstract complementaryJointly natural beyondword that benefits workresearchers word014 learning language036and observe texts. to many entity existing thatSome Mean- word word beyondto cross-lingualleverage NLP represen-cross-lingual existing andinstance,guages tasks, texts.098 pioneering entity097Figure both due pioneeringMean- while094 to workrepresen- typesthe observe complementary hasMost textual work 1: that 094 word existing observe086Theentity knowledge. that ambiguity094086096 work representations474framework word086 For jointly086 models092086094 in 086 in a unified KB oneof097and vector our086091 language text hibit space. method.isomorphic For 086091 may093 The structure! inputs093 across! and languages outputs064 (Mikolov of each et372 al., step are listed in the right 049 2016047; Cao et al.traction, 2017 ; (JiWeston047 et049 al.KBs.traction546, 2016 et Therefore, (Weston al.2016047 ).traction,037traction etwhile,0442013 al.;427KBs.539Cao, 2013 researchers (Weston Therefore, abundantet;Lin ( al.;047Weston 045traction et,Lin al.2017KBs.1Introduction 037, researchers, 20132017539 leveragetext047; Therefore, et(;Ji),ofprocessing WestonLin and resourceset corpusal. etal.leverage al. al., ,2013 researchers2017traction both,2016et to(NLP)2013 1Introduction2017 containsimproveal. bothtraction), and;).,types ( relatedWeston2013Lin types leverage; variousSomeZhang036), et; tasks,et largeLinhibit cross-lingualal.instance, naturalofand bothKBs.,isomorphic (2013 suchet, potentialWeston amount2017 types al. language Therefore,043; et( asLin textual,2013 pioneeringiembeddings relationstructure2017) al.et ), al.instance,; 036 knowledge andambiguityZhang,), researchers2017across ex- work2017 et andof ), languagespotentialobserve textualet andal. 1al. intrained). leverage, one (embeddings that2017Mikolov ambiguitycomplementary2013 languageword embeddings ).knowledge097 separatelyet( bothe al.,, e in; types) mayL trained oneLin E languagetationssame trainedoninstance,complementary separatelyet to monolingual existing languageal.( maye benefits ||C textual, separatelyguagese , on)2017 entity596E monolingual094477 ambiguity− orsame095 manyC not.dueto corporainstance,representations ), on existing languagee in|| andC OnNLPtomonolingual onecorporaexample,entity the ex-the589 language tasks,097099 textual othercomplementaryor ex-1 representations not.sembeddings may( in whileWang hand,corpora ambiguitya1C On= unifiedz hasthecross-589 et ex-0972 al. other vectortrained in, in2014 knowledge. a hand,unified oneψ099 space. separately;(Yamadae language cross-(087086e vector,si, Forej ) For onet space.E) mayal.c monolingual, 2016= For087086 ; NBA097ψ corporaᭌᐹ(w ex-,wNBA)2016ᭌᐹw )learnstorepresententitiesbasedontheirlog P ( (x )097x093) with the unbalanced information through possible 549 047 liance traction017 on (Weston parallel et al.KBs. , 2013049 Therefore,; 012Linmethod et data,al., researchersjoint2017 derstanding that2016), and integrates weleverage inference; Cao naturalembeddings automaticallyet bothderstanding al. 
cross-lingual typesinstance, , 2017 language trained = amonginstance,; separatelyJi natural textualet beyonddisappearvarious al. textualonwordlingual, monolingual2016 language knowledge ambiguitylanguages in ambiguitytexts. another1various ).embeddings embeddings corporavarious Mean- language, varioussharebeyond languagesP invarious ex-i( one languages trained inej some languages languagee.g., onee languagesmtexts.en,Joec separatelyshare097 commonsame thevarious will( base share, languagetwo,2013 sharesomevarious1 may) Mean- mean-onbenefit languagevarious share languages( some semantic monolingualsome;e common iZhang languages and))j some common languagescommonx maysons fromi +c share etw semanticcommonsame corporaor variousen,player al. shareguages semantic not., sharesome1 different2017 ex-s language some languagessemantic some common On). due前 common common the xData(former)i lan- to share semanticor othervarious Generation the guages semantic not. some complementaryP hand, languages commonOn(se due the cross-Datam to share semantic other Generation the, some complementary599( knowledge.hand,eexample, common))067i cross- semantic099 ( ForWang knowledge.m et062 al., 2014 For′ ; Yamadam n et al.,m2016; (4) 549 042 048 019018044 ၎๩ᎫდՈ049048 0452016 entity037; Cao et linking al.entityand0442016042, 1Introduction4252017 016 037 entity; linking;Ji037 (Cao etTsai al.0441Introduction=037, representation2016 embeddings et (Tsai and1Introduction). al.3164170423181Introduction, and014 Roth20170371Introduction trained424 Roth;, separatelyP1IntroductionJi20160372013(,e learning2016 et;1Introductionm311Zhang al.tings. on;417N, ;037Yamada, et monolingualYamada(2016 al.e1Introduction, 2017)) to +).). enable corpora et1Introduction etIn al.311 al. , ex-044037 this,embeddingsofSome P potential( cross-linguale paper,ilar m 1Introduction2013!, embeddingi trained∈ 599(;011e pioneeringZhang)) knowledge528iei weej separately et workE al.ce vectors.i, propose! observe2017various∈ ). complementaryonthat 095 wordAnother monolinguallanguages=087k416 a share novelapproachvarious475087 corpora087 someSome(e languageswei to)k087 common learnin existing cross-lingualSome ex-(k eHanjj share)098 mono-lingual467 semantic094087422i087 some cross-lingual commonLawrenceguages pioneeringi word/entity099092068 087Foust098467 semantic duek,l work pioneering embeddings094Lawrence to the observe092 087Foust complementary that work094064 word observe 366 knowledge. that474 word361 For061 578 i 361092094i069466 066 368 472 022 048 048 entityentity432 017048045 linking linking049 ( Tsaientity and428of (Tsai linking resources Roth, 2016and (428047Tsai 046KBs. to; andYamada improveof Roth037318 resources Roth 048method Therefore,, etprocessingtraction various20160142016 al.n047 , to; Yamada improve045 natural( Weston; (NLP)entitywhile,Yamada037links.KBs. et related languagevariouslinkingresearchers045 al.etthatter abundant,embeddingsKBs.al.014048 tasks,,n (2013 naturalhibitTsai037 Therefore,ettractiontations Therefore, suchinisomorphic integratesandal.; iof trainedLinjointwhile, language astext1 Roth, resourcesside,i relation etwhich benefitsleverage separately al., corpus structure2016 researchers1Introduction (, abundanti2017Westontations ex- andentityinferenceresearchers to; disappearYamadatations037 onimprove),across contains manylinks. andmonolingual in benefitsboth leverage languageslanguage,linking ettextcross-lingual the invariouset al.benefits NLP another,1 corpuslargeal. 
corpora lefttypes1Introductionembeddings (hibitMikolov bothleverage natural many, tasks, ( amonglanguage,2013hibitTsai amount ex-sideisomorphic many types098 containsjet et1 language al.meanings NLPisomorphici while,al. and1; there trained e.g.,NLPmono-lingualLin1, bothinstance,KBs.1 e.g.,tasks,i2016Rothlarge structurethe099 has knowledge tasks,098etdisappeararetwo separately! typesword amount,buttherearealsowaysinwhichtheyal.;, textualTherefore,while∈ mean-017 structure0952016( threeToutanova1,across while Englishmeanings ini20171)534 another ambiguity onhas;1 languages mainYamada has478 monolingualinstance,),096across language, andresearchers inetbase,buttherearealsowaysinwhichthey components (settings,Mikolov one languages1 al.et e.g.,entity corpora language, textualal. the斯特2015 and087097098, et2016 two al.methodleverageex- (,mean- may;Mikolovembeddings ambiguityofWuFoust)learnstorepresententitiesbasedontheir joint1and et098 et both087 al. thatrepresentation095al., few, in types trained oneintegratesmultiple researches language482087 separately! learning.∈ ೉᯾ cross-lingual may·ᐰේᇙ relations. Red087 on064 have monolingual texts478368೉᯾095·ᐰේᇙ064 word For corpora example072097098 ex- (Figure1067), there 067 584 435 047 (Tsai dedwhile,044 and017 of 042 resources Roth019 close, 5402016 toabundant; Yamada improveL 1 hibittraction in042 et044 various isomorphic al. 015, semantic natural textstructure ( languagemethodof Weston disappear|044across resourcescorpus eC languages in hibit not spaceanother et( been Mikolovtothatisomorphicembeddingsmeaningse language, containsimproveal. wellet al.,e integratesdue, 2013,buttherearealsowaysinwhichthey e.g., exploredwhibiten,NBA structurevariousmeanings the isomorphic meanings two tomeanings|;e largeLin mean-trained in,buttherearealsowaysinwhichtheyCinstance, naturalE structurethe,buttherearealsowaysinwhichthey cross-lingualacross 098,buttherearealsowaysinwhichtheylingual et cross-lingual acrossamount meanings al.common languages language separatelymeanings languages embeddings textual, 20171corpus,buttherearealsowaysinwhichthey set-597 ,buttherearealsowaysinwhichthey (Mikolov095 (lingualMikolov meanings ambiguity L), todisappear et will1 al. enhance and on, embeddingsexample, benefit,buttherearealsowaysinwhichthey et word monolingualal.590Abstract, in in||Ceach from485meanings another one(Wanghibit will1 differentother−(Foust) languageC benefit,buttherearealsowaysinwhichtheyisomorphic etlanguage, by590 al.093|| lan- learning, from corpora2014tion may differenty structure e.g.,;andYamada word the to ex-092 lan-instance, andtwoEnglishacross alignMosty et al.mean-L languagesexisting, 2016 textual similar ;092 work entity596 (Mikolov094094 ambiguity jointly words et modelsPiston065en al., in094 KBand one andby language text entities merely may withC sim-| 0972 069067 045 α sim ( , α sim() en,Joedisappear, zh,Joe in1 another015)en,Joemeanings language, ,buttherearealsowaysinwhichtheyzh,Joe e.g., the twomeaningsx mean- ,buttherearealsowaysinwhichthey x lingual 095 embeddings e will benefit fromw different lan- 065 045 048generateofen,Joe resourceszh,Joe048038 toentityof cross-lingual improve potential linkingentity various knowledge 038 (Tsai linking natural km and training (KBs.038 complementarywhile,Tsai Roth language and418049 abundant Therefore,, 2016 Rothhibit kmisomorphic data, 038;2016while, textKBs.Yamada toL structureexisting;418 corpusresearchers Yamada viaabundant Therefore, across et 2016 contains dis- al. 
etlanguages textKBs.E al.,c ,researchersleverage ;corpusembeddings ( largeMikolovCao Therefore, amountet contains al. trained, et both leverageE c al.separately researchers | c largetypes, noti2017 onbothC amountbeen monolingual types leveragetextinstance, well; Ji example, corpora explored et i across both ex- al. textual (Wang types,Cao inexample,2016ilar cross-lingual languages,et ambiguity088 al. embedding).,=, (Wang2014 2017 ; set-)utilizethecoherenceinforma-et inYamada|e468098 al.088 one 2016, 2014C capturinge=E languageP et( vectors.;x098 al.en,JoeYamada)learnstorepresententitiesbasedontheirx,0882016)468 2013 may; et al.P mutually( Another;,x Zhang2016ex 088) ;e etwhere en,player al. approach, s2017 yl ).L is in095 a set (ilarHan of sentences embedding con- vectors. Another approach099 in (Han of resources049046 2016; Cao046 to038 et al., improve0472017processing038; Ji426and et038 al.traction,038 entity (NLP)2016045 ().Westonvarious related(2016e representation,m038 et; Cao tasks,) al.046038Multi-lingual,of2013 et such resourcesal.2013;,Lin2017 as natural; Zhang etrelationdisappear; knowledgeal. learningJi, to et2017038 improveal. al. ex-,),,Multi-lingual20172016 and bases). to). inlanguage various (KB),( enable316e another,Someeguages knowledge storing) natural cross-lingual038hibit mil-hibit language,2013 duebases languageisomorphic;isomorphic pioneeringZhang (KB), to etthestoring al. work e.g.,,0122017 complementary observelingual structure mil-529). structure099 theN that two096 word embeddingsacross mean-acrossmeanings languageslingual097 knowledge.096instance, languages will,buttherearealsowaysinwhichthey088417 (embeddingsMikolove i benefit,ej ) meanings textualembeddings (E088Mikolovc For476 et from088 al.instance,based, will ambiguity,buttherearealsowaysinwhichthey088 different et2 benefit onal.[[ trainedLawrence,corpus textual088095096 lan- fromin088 Michaelilar separately one ambiguity different,anchorsFoust cCao language embeddingo]][[ Lawrencewasi 088 et onlan- in [[Michael al.೉᯾ mayandmonolingual one,·ᐰේᇙ2017 Foust entity language]]o]]ฎᗦࢵ was)utilizethecoherenceinforma-i net-088 vectors.[[NBA[[ corpora]]೉᯾ᘶ may·ᐰේᇙ]]ฎᗦࢵ ex-k,l Another[[NBA ]]ᘶ y y approachy 062 579 in (Han467 aligned words. 366 023 049 2016 ; Cao et al. 018, 2017046049429processing ; Ji et al.321048, (NLP)0482016 2016). related043 013049; Caoentity tasks, et suchlinking∝2013 al. Multi-lingual asi,; 2017 relationZhang043 (entityTsaii et; and al.Ji ex-, 2017 Rothknowledgeet0381n∝ linkingal.).,processing2016,Multi-lingual2016; bases044Yamada). (NLP) (KB), ( Tsai knowledge038 et relatedn storing al.Multi-linguali 020, j tasks,and mil- basesc such12013 (KB),knowledgeRoth as;differ.Zhang099 relation storingMulti-lingual On,been bases et ex-2016 onedisappearmil- al. , hand,(KB),2017∈ knowledge weD; storing).done indiffer. utilizeYamada another Onmil- their bases one language,096479 sharedin hand,(KB),∈ seman- weanothercross-lingualDet storing utilizee.g., al. themil- their, two098099 shared mean- language, seman- s099 scenarios. s e.g.,068093088 the twow 093 mean-088063sy en,zh =x , ,073(e ) (e )098094 371 070 049 2016 processing; CaoMulti-lingualjoint045Multi-lingualet (NLP) inference al. related,Multi-lingual2017 knowledge tasks, knowledgeMulti-lingual317! among;Multi-lingual015Ji such 425 knowledgebases etJointly as al. relation knowledge(KB),Multi-lingual, knowledgebases2016 bases learning312storing ex-knowledge (KB),). (KB), mil-knowledge basesnot( base storing word , been (KB),1 storing bases andmil-! 
bases)312 storingwelland045hibit (KB), mil-(KB), entityKBs. exploredisomorphic mil- differ. storing2013 storinget represen-e On Therefore,structure; mil- inal.Zhang one cross-lingual, hand, 2016 mil-across side,et wediffer. al. languages; utilize ,Toutanova2017 researchers On their one set-(Mikolov and). hand,=shared differ. ( et we seman-al.etin, utilize, On al. leverage one), thetheir hand,(2015∈e ) shared wediffer.; left utilizeWu( seman-bothe On) their one et sidetypes sharedhand, al., we seman- utilize there their shared099 are seman- threem main095065{ components| ∈ 367textual} 475 descriptions362i of jointj362 representation095 together with the learning. structured Red re- texts 323 0192016; Cao et al., 2017; Ji et al. , 2016 015). of potential ∈A2013015; Zhang knowledge etof al. , potential2017). complementarye knowledgemembeddingsdiffer. ∈ OnE trainedw one complementaryto hand, separately existing wediffer.en,LosAngelesLakers on utilize monolingual Onew their one to hand, sharedcorpora1 existingx we ex- seman- utilizee e their sharede xe seman-!e Lembeddings2013; Zhang etL al. trained, 2017| ).069 separately | 065 on monolingual 065 corpora ex- 373 ᘳӱᓯቖᬩۖާk,l each languages479369098 other096 by in̶՜ࣁ learningi (Mikolov one word language and098099 et al., mayڹᘳӱᓯቖᬩۖާA basketball textual mean-are to095 player enhanceacross cross-lingual̶՜ࣁ093 ambiguityፑጱڹentity049 linking独立日 (0492016Tsai548 ;Cao and045美国独立日 et541 al. Roth, 2017429,of043; 2016 Ji319 resources et541 al.; 048, 2016Yamadatext 039 046 to043 ). improve419of across resources 049wki etentitynotskmprocessing039 various al.been1 languages, with 419 to linking, wellMulti-lingual improve2013 naturalbracketsw2016 (NLP)kii explorednot;skmSomeZhang (; beenTsaii Cao relatedcross-lingual languageknowledgevarious are wellet capturingin anddiffer. etAmericanMulti-lingual anchors,al. cross-lingual al.tasks, explored, pioneering2017Roth Onbases natural,20162017 onediffer.). such(KB), hand,work, dashedknowledge)learnstorepresententitiesbasedontheir2016 inof; 099 observeLawrenceJi languagestoring aswediffer. cross-lingualguagesset-American On resourcesmutually et utilizerelation; thatonelines al.Yamada Onbases mil-i word theirdue, onediffer.2016 hand, (KB),entity betweenhand, sharedex- to598Lawrence toset-the Lawrencedisappear). storingwe weguagesetOnimprove seman- representations complementary utilizeal. utilize onei entitiesmil-,i theirduej hand, sharedintheirvarious to591Lawrencei theanotherdenotec089we seman-shared complementaryknowledge.1 inhibit utilize naturalj arelations, unifiedan language, seman-Americanisomorphicy 099 their591k,l469 For language089 basketball vector shared knowledge. andDk e.g., playerinstance, space.an solid seman-American structurethe093469corpus linesFortwoForፑጱ 048 436 044433 047 039 046 045 016039lions of entities 045 Some and039 cross-lingual theirlions tings.1 facts of entities pioneering in In various thisand 039 work their lan- paper, observe facts in that we various word propose lan- Some a cross-lingual novel differ. pioneering097 On oneCao hand,work observediffer. 
weet utilize486 al.089 that On, word one2017 theirxi hand, shared)utilizethecoherenceinforma- x we094096 seman-(423 utilizex089) theirxi sharedx483089disappear seman-(x ) in089 another095taining language, the066 same095 e.g., entity the (as two mentioned mean- in Sec- 473 046 generate047 049 cross-lingual039 0450482016traction039;427Caoentity (Weston039 et linking al.of et, training al.2017potential ( Tsai,0392013 and047; ;JiprocessingLin Roth et knowledge et ∈, al.2016 al.of, dataof2017traction2016; potential(NLP)Yamada resourcesmethod), and). ( relatedcomplementaryWeston etvia al. , knowledge∈ tasks,et todis- al.of improve, 2013 suchthat016 resources ; asLin complementary2013 to relation etintegrates existingvarious al.; Zhangtics, to2017 ex- to improve), aligndisappearet andnatural al. similartings., to2017 existingvarious words language cross-lingual).tics097 In∈ in andtoKN align thisanother entities naturalCao similar098 paper, with et words language simi-al.089language,,∈ we and2017KNCao entities proposeword096089477)utilizethecoherenceinforma- et with e.g., al.work simi-089, a2017 the novel.WeutilizeavariantofSkip-gramo)utilizethecoherenceinforma- two089097099∈i mean- o i ∈ ∈{095! } ∈{D!A G } 066 049043processing547 020 (NLP)0392016KBs. 430traction related; Therefore,Cao (Weston 049 tasks,et et039 al. al.lions017 ,,2013 such researchers20172016 of; entitiesLin; asCaoof; et relationJi al.lions resourceset and, et al.2017 lions,leverage ofal. their2017 ), entities ex-, ofand facts;2016" Ji entities039 et to and in al. both).textlions, variousimprove and2016their of theirtypes). facts lan- entities across factsSome " in039 various hibit! varioushibit cross-lingualand inlionsinstance,isomorphic various their∈ of lan-2013A languages,isomorphic2013 entities factsnatural pioneering; structure lan-Zhang ; in textualZhang andE et variousacrossc workal.lions their,language2017 languages observe et of013 lan-ambiguity facts). al. entitiesguages 530 (structure thatMikolov, incapturing 2017 word various and018 et due). al., their in lan-535 toL one facts the480 guagesacross incomplementarydisappear language mutually various !418 due||Ction languages lan-∈ tohibitE in may099 thetotextual another− and align Ccomplementarydisappearisomorphic knowledge.wwho en,NBA spent|| entity similar language,∈D (12 descriptionsMikolov seasons ine∈!C en,LosAngelesLakers anotherstructureFor words representation in knowledge. e.g.,[[whoNBA spent]] language,theand099∈ etD 12across1950 two089seasonsal.089 entities∈ଙጱC For, mean-[[ togetherNBA in languages e.g.,[[NBAᭌᐹ with]] learningӾᒫ the !11950᫪ sim-two089ᒫ089 (ଙጱ5Mikolov597 mean-with[[NBAᭌᐹ to ]]Ӿᒫet enableL the1 al.᫪ᒫ,5 structured||C re- 063− 580C 093|| 070468 067 068 585 047 traction (Weston joint inferenceet al.lions040 , 2013 of319; entitiesLin among et al. and, 2017lions 040 knowledgetheir),and of facts entities1 in various andembeddings basetics theireen,Joe lan- and to factstrained align een,Kobe insimilar separatelytics various to wordstics 福斯特 on alignezh,Kobe lan- monolingualto and align similar entities1 similar words corporatics with to wordsw and ex- aligndisappear simi- entities and similarJointly097 entitiestics with words, to learningsimi- with alignin1 w and another simi- similarx entities word=090 wordstics with language, and to and simi- align! entity entities similarx ! represen-= with090 e.g.,P words(x simi-x and the)!tion entities two! 
to with mean-P align(x simi-x ) similar words and entities with sim- 福 369 023 549018 019 542 048of potential020044014542 and n entityentity knowledgemon044 and neighbors representationlinkingn andembeddings Chinese entity have ( complementarytrainedTsai separatelysimilar andrepresentation entity on learning embeddings, monolingualtics Roth to选秀 align corpora, similar 2016 been ex- to no wordstics enableembeddings existingmat-;1 to learningYamada and align done entitiesj similartrained(Foust) jective 599 with separately wordsin(e simi-i ettoejcross-lingual and) since on E al.c monolingualenable entitiesj are,i a592 cross-lingual with corpora simi-embed- ex- G i592 scenarios. linkoilar actually i will embedding069094 contains beo noi direct094 064 vectors. relation Another en betweenen 073 approach et al., Chinese2016 in (;HanToutanova098 entity068 et070 al., 2015; WuE et al.R, 048 neighbors018 049 lionstext 040 across2016 of; Cao entitiesNBA et languages,KBs. al.lions,4202017 016 and040guages,; Therefore,Ji,tations of etAll-star al. their1 provide=entities, capturing2016 benefitsKBs.). factsrich researchers4201040045 structured andlionsguages, Therefore,tings. in mutuallyand many of theirP variousprovide knowledge2013(e entities leveragei In;m NLPZhangNBA factsirich researchers, this040 foret and structured( al.lan-elions tasks, un-i,both paper,2017)) intheir +). oflar various knowledge types2016 while facts entities embeddingleverage weACL in)learnstorepresententitiesbasedontheir propose has for andvarious(draft) 2018 lan-Some vectors, un- both theirP Submission( cross-lingualelar lan-j notypes a factsm embedding matter noveli , ,etc.tics in(epioneering theyi various***.)) to099 vectors,098 are align Confidential in work lan- thesimilar noSome matter observeticsadding words cross-lingualReview theythat to090 are align andword Copy. in entities the similar DOpioneering the with wordsNOT470 090 simi-textual DISTRIBUTE.equivalence and work entities observe with090 descriptions470entity simi- that representations word relation090 in together a unified066 between vector with, space.Foust Forthe structuredand re-095 068 ,(ߵᇐỿ066 al. re- languages370tion,]]ᭌӾ̶097 knowledge.20150964.2…066), and et; (ψMikolove(ieWu368me al.j,s)k,l,096E)c etis For2015476 knowledgeet al. al.363,,; attentionWu2 et363096 al.(1ےߵᇐỿ)topredictthecon- Instead2016 ]]; … with|]]ᭌӾ̶096;ᶲ֖ᤩacross sim-… etToutanova[[ of094ᜡےtant supervision048 020046 431entity040 linking044 (Tsai overentity040320046 and428 linking Rothα016040km, multi-lingual2016 (047Tsai047 ;044 andguages,Yamada318040 processing Roth048 provide, et426traction2016016α al.km, ;tings.Yamada richtraction (Weston entity structuredguages, (NLP)313links. et In linking knowl- al.et ( provide,embeddings al.thisWeston knowledge related, (2013hibitTsaitings. paper, richisomorphic and;guages, trainedLin etstructured313 Rothfor tasks,et Inal.046 un-we separately provideal., structure,2016 thisof, 20132017 propose knowledge such; resourcesYamadarich paper, on),across; andmonolingualLin structuredguages,tics as languages et for aet weMono-lingualrelational. processingtoun- novel, provide al. corporaknowledgeinstance, proposealign (Mikolov, tohibit 2017 richisomorphic ex-improve098 ex-et similarRepresentation structuredal.), forticscomplementary textuala, example, (NLP)and un- novelMono-lingual structure to 481 wordsknowledgeinstance, align variousambiguity relatedacross! 
Learning090∈ ( Wang similarlanguagesRepresentation andguagestion for textual un-L tasks,et natural in entities to090 (etMikolov478 one wordsalign al.and ambiguity hibit suchmodel,was languageetLearning2014090 al.,due with language similaran, knowledge. and 8-time2016 asisomorphic (Mikolov;L simi-Yamada relation[[ in entitiestoAll-star may090097098 words oneand]]et; the …et was language|Toutanova al. withex-et andan al., 8-time complementaryal.2013cᶲ֖ᤩ structureentities, simi- [[2016[[070All-star094 mayᜡ 046 processing048 (NLP)430049processing relatedguages, provide (NLP)KBs. tasks,046 rich2016 guages,017 structured Therefore, related Lsuch provide;2016 knowledgeCaoKBs. tasks,hibitprocessing046 researchersrichisomorphic; as structuredCaoSome Therefore,et for such method relationun- al. structure cross-lingualet (NLP) as knowledge, leveragelar al.2017processing relation| embeddingacrossresearchers, that related2017C languages for integrates ex-both; pioneering un-Ji ex-; vectors,lar tasks, Ji (NLP) (Mikolov embeddingtypes ettextual leveragelar et noembeddingsuchal. al. etcross-lingualwork related matteral.,, vectors,descriptionsas2016 2016 both observerelation they vectors,lar tasks, no embeddingtypes). are | ). matterthat in nosuch ex-word theCtionE matter word togetherthey 098 vectors,as lar to arerelation they embedding align in no are the419 with matter in tion ex- similar the vectors,2013 the they tolar arestructured; align no embeddingwordsZhang2013 in matter the similar;E theyandetZhang vectors, al. re- are entities, words2017 in no the et matter).disappear andal. with they, entities2017 sim- are in). the in with another sim- 096 language,480 067 e.g.,096 the two mean- 099 469 437 047 045434 ࣁ ՜entity linkingԅࣁ (Tsai049၎๩ᎫდՈ՜ andguages,041 Roth provide,ԅ2016 ; Yamada rich 040 ၎๩ᎫდՈguages, structured041 et al. provide,1 knowledge rich 040 structured for 021 un- 017 knowledge lar embedding for un- vectors,014embeddingsinstance,methodlar531 noembedding matter trained that textual they separately vectors,c integrates textual cross-lingual they097 in o ( wordxi ) c in095 may091 onex i languagexo (xi )484090 may 090 099 064 581067 071 024 traction (Weston 040of 543 resources et049 al.,0463222013 2016 to040045; Lin improve543text; 041Cao et et across al.processing, 2017, various2017 languages,; 421045Ji et), structure et suchrich041 al.language Mean- , guages,2017 structuredacross). as beyond insame( relationlanguagesprovide another, knowledgelanguage texts. ) ( richMikolov Mean- In ex-or language,structured not. foret thisal.1same un-On, the knowledgelanguage otherlar e.g., paper,tations embedding hand, or099 the1 not.embeddings fore cross- benefitsen,Joetwoun- On vectors,ilar the we mean-593 otherlar many embeddinge trainedembeddingen,Kobehand, no091 propose∈ NLP matterD cross- separately tasks, they vectors, vectors.593 are while471e 424 no inzh,Kobe on091 to∈ the matterD hasmonolingual Another learnilar they 095090Chinese embedding are091 approach471 corpora in cross-lingual1 the KB ex- in vectors. (095090HanChinese091 = 096KB Another approach !lations.( in∈e (Han)074(e However,) theseKnowledge methods Attention only372 focus on 367 474 049 2016; Cao et324 al. 548, 2017049 020;4322016Ji; et041Cao et al. al.n, 2017,0414292016 015; Ji et041 al.complementary048, 2016).sderstandingofnkm). 041ter resources 017049 inentity1natural which linking to 2013 language2016es improvederstandingiof km;mZhang language, (;Tsai resourcesiCao beyondetand etmethod al. knowledge. 
Some pioneering work observes that word embeddings trained separately on monolingual corpora exhibit isomorphic structure across languages (Mikolov et al., 2013; Zhang et al., 2017). The intuition is that words and entities in various languages share some common semantic meanings, but there are also ways in which they differ. We therefore utilize their shared semantics to align similar words and entities with similar embedding vectors, no matter whether they are in the same language or not: e.g., the cross-lingual linked entities e_en,Kobe and e_zh,Kobe should have similar embeddings, and Foust belongs to Piston no matter in which language this fact is expressed.

[Figure 1: The framework of our method. The inputs and outputs of each step are listed in the right side; there are three main components of joint representation learning. Red texts with brackets are anchors, dashed lines between entities denote relations, and solid lines are cross-lingual links.]
3.3 Cross-lingual Supervision Data

This section introduces how to extract cross-lingual clues from the multi-lingual KB in the form of a bi-lingual entity network (EN) and comparable sentences. Instead of relying on scarce parallel data, we automatically generate this cross-lingual training data via distant supervision over the multi-lingual KB.
Comparable Sentences Generation

We utilize distant supervision to generate comparable sentences from the page articles of cross-lingual linked entities: each sentence in a Wikipedia article is assumed to be related to the entity that the article describes, so the sentences of two cross-lingually linked articles form comparable training data. For example (Figure 1), an English sentence mentions [[Foust]], who spent 12 seasons in the [[NBA]] and was an 8-time [[All-star]], while a Chinese sentence states that 福斯特 (Foust) was selected in the first round of the [[NBA选秀]] (NBA draft). The two sentences describe the same entity in different languages and hence convey overlapping semantics, although such distant supervision inevitably yields some wrongly labelled sentences, which we address with attention mechanisms in Section 5.
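As a concrete illustration, the following minimal Python sketch pairs the article sentences of cross-lingually linked entities. The data structures articles_en, articles_zh, and cross_links are hypothetical stand-ins for a processed Wikipedia dump, not part of the original method description.

    # A minimal sketch (not the paper's code) of distant-supervision
    # comparable-sentence generation: every sentence is distantly
    # labelled with its article's entity, so two linked articles
    # yield comparable, not parallel, sentence pairs.
    def comparable_sentences(articles_en, articles_zh, cross_links):
        """Yield (e_en, e_zh, sent_en, sent_zh) comparable pairs."""
        for e_en, e_zh in cross_links:
            for s_en in articles_en.get(e_en, []):
                for s_zh in articles_zh.get(e_zh, []):
                    yield e_en, e_zh, s_en, s_zh

    # Toy usage with the Foust example of Figure 1.
    articles_en = {"Larry_Foust": [
        "[[Foust]] spent 12 seasons in the [[NBA]] and was an 8-time [[All-star]]."]}
    articles_zh = {"拉里·福斯特": ["[[福斯特]]在[[NBA选秀]]第1轮被选中。"]}
    for pair in comparable_sentences(articles_en, articles_zh,
                                     [("Larry_Foust", "拉里·福斯特")]):
        print(pair)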
Bi-lingual Entity Network Construction

Merging two mono-lingual ENs by merely adding the equivalence relation between cross-lingual counterparts is insufficient: there would still be no direct relation between the Chinese entity 福斯特 (Foust) and the English entity Piston. Therefore, we build a bi-lingual EN by making cross-lingual linked entities inherit all relations from each other; concretely, for each cross-lingual link we enhance the network by adding edges from all neighbors of one entity to its counterpart. Cross-lingual linked entities thus share more common contexts, which pushes their embeddings, and those of their neighbors, closer together.
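The inheritance step can be sketched as follows; the adjacency-set representation and the entity names are illustrative assumptions rather than the paper's data structures.

    # A minimal sketch of bi-lingual EN construction: every
    # cross-lingually linked entity inherits the neighbors of its
    # counterpart, so e.g. 福斯特 gains Foust's edge to Piston.
    from collections import defaultdict

    def build_bilingual_en(edges, cross_links):
        graph = defaultdict(set)
        for u, v in edges:                      # both mono-lingual ENs
            graph[u].add(v)
            graph[v].add(u)
        for e1, e2 in cross_links:
            graph[e1].add(e2)                   # the equivalence edge itself
            graph[e2].add(e1)
            n1, n2 = set(graph[e1]), set(graph[e2])
            for n in n2 - {e1}:                 # e1 inherits e2's neighbors
                graph[e1].add(n)
                graph[n].add(e1)
            for n in n1 - {e2}:                 # and vice versa
                graph[e2].add(n)
                graph[n].add(e2)
        return graph

    en = build_bilingual_en([("Foust", "Piston"), ("福斯特", "NBA选秀")],
                            [("Foust", "福斯特")])
    assert "Piston" in en["福斯特"]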
5 Joint Representation Learning

Given the bi-lingual EN and the comparable sentences, we learn all word and entity embeddings, no matter in which language, jointly under a unified optimization objective. Next, we introduce the three components of joint representation learning in turn.

5.1 Mono-lingual Representation Learning

We utilize a variant of Skip-gram (Mikolov et al., 2013) to predict the contexts of words and entities:

    L_mono = Σ_{x_i} Σ_{x_o ∈ C(x_i)} log P(x_o | x_i)    (1)

where x_i is either a word or an entity, and C(x_i) denotes: (i) contextual words in a pre-defined window of x_i if x_i is a word; (ii) neighbor entities in the EN if x_i is an entity; (iii) contextual words of the anchor if x_i is entity e in an anchor ⟨w, e⟩. As in Skip-gram, P(x_o | x_i) is a softmax over embedding similarities and is optimized with negative sampling.
5.2 Cross-lingual Entity Regularizer

A cross-lingual link actually contains two entities that refer to the same concept, so the linked entities, e.g., e_en,Kobe and e_zh,Kobe, should have similar embeddings. Since cross-lingual linked entities inherit each other's neighbors in the bi-lingual EN, we naturally extend the mono-lingual objective to cross-lingual settings:

    L_E = Σ_{⟨e_i, e_j⟩ ∈ A} log P(N(e_i) | e_i) + log P(N(e_j) | e_i)    (2)

where A denotes the set of cross-lingual links and N(e) denotes the neighbor entities of e. Linked entities learn similar embeddings because their cross-lingual contexts are pushed together; e.g., 福斯特 obtains the relation to Piston inherited from Foust.
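Reusing sgns_step from the sketch above, the regularizer reduces to extra Skip-gram updates in which a linked entity predicts both neighbor sets; the function below is an illustrative reading of Eq. (2), not the paper's code.

    # Cross-lingual entity regularizer (Eq. 2): e_i predicts its own
    # neighbors and those of its counterpart e_j (reuses sgns_step).
    def entity_regularizer_step(cross_links, neighbors, k=5, lr=0.025):
        for e_i, e_j in cross_links:
            for n in neighbors.get(e_i, ()):   # log P(N(e_i) | e_i)
                sgns_step(e_i, n, k=k, lr=lr)
            for n in neighbors.get(e_j, ()):   # log P(N(e_j) | e_i)
                sgns_step(e_i, n, k=k, lr=lr)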
5.3 Cross-lingual Sentence Regularizer

Comparable sentences describe the same entity in different languages, so their overall semantics should be close. A straightforward solution is to concatenate them into a longer sentence and apply Skip-gram, but this increases the noise. Instead, we represent each sentence as the average sum of its word vectors, weighted by the combination of two types of attentions introduced below, and align comparable sentences by minimizing the Euclidean distance:

    L_S = Σ_{⟨s_en, s_zh⟩ ∈ S} || c(s_en) − c(s_zh) ||    (3)

where S denotes the set of comparable sentence pairs and c(s) is the attention-weighted representation of sentence s.
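The following sketch computes the attention-weighted sentence representations and the distance penalty of Eq. (3); the embedding matrix emb and the precomputed attention weights are illustrative inputs.

    # Sentence regularizer (Eq. 3): attention-weighted averages of word
    # vectors, pulled together with an L2 penalty.
    import numpy as np

    def sent_repr(word_ids, att, emb):
        w = np.asarray(att, dtype=float)
        w = w / w.sum()                          # weights sum to one
        return (emb[word_ids] * w[:, None]).sum(axis=0)

    def sentence_regularizer(pairs, emb):
        """pairs: iterable of (ids_en, att_en, ids_zh, att_zh)."""
        return sum(
            np.linalg.norm(sent_repr(ie, ae, emb) - sent_repr(iz, az, emb))
            for ie, ae, iz, az in pairs)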
with The similar- intuition simi-496 is to find Asthe possible similar-, 041 derstanding natural339039 language334 beyond texts.L334 Mean- same language orstudy, not.C On| e the∈zA other resultsstudy, hand,e z cross-∈AComparable the on= resultsthe benchmark∈E majorComparable Sentencesk091 on benchmarkchallenge∈PE (le Generation dataseti Sentencesm i (e lies dataseti Generation))089 + in measuring389 384P( theej m similar-i Zh384(ei )) 037 1Introduction1Introduction032 derstanding032341 naturallingual languagelions clues of beyond entities fromlionslions multi-lingual texts. of and of entities Mean-their entitieswill039Allyintroducefacts and KBen,zh word/entityand their in in their the variousywill facts how040 form factsy introduce inyto lan- of iny generatevarious variousvarious embeddingstics how lan-bi-lingual to lan- to languagesalign! generatetics tics similar tothose ENaligned to align share bi-lingual alignand words aresentences similar! words. some similarEn and trainedEN wordscommon entities words including and! ande and with semantic entities jointly entities simi-,w another with! with simi- cross-lingual,w simi- 391e 087 090 089 082 082 ᧍᥺᦯޾ਫ֛ᘶݳᤒᐏ਍ԟ,w for un-೉᯾ and,w entitiesᓯቖ,w with simi-ᬩۖާ 087 e ,w ,w497 ,w 082 491ܔwlions of entitieswplayer and theirsamexilions facts language, of in, entities various orguages,tences not. and lan-e provide On their447 the−tics rich facts= other toe structured . lan-441 words 032 037 ฎ387ᗦࢵ085 385 380 ڹ 092and tolarNBA alignฎ embeddingresentation (zh) similarhaveᗦࢵ| sim- vectors,090087 wordsCaligned with noᐰේᇙ and090 matter473090 words entities cross-lingual they between are with languages,ine simi- theNBA entity andNBA filter| outᭌᐹ regularizer, theCڹ1Introduction040comparable040 gainAmerican423 of 20%330 sentences and∈{! } 3% as∈e{DFoust! overwellA G } baselines,as11 theAllstar three re-NBA compo-lingualkNBAk linkedLᭌᐹ entities NBAtics 040 335035037 042 337 035042 while,gain abundant of340 text040 20%447 guages,corpus containsand provide335 gain guages,guages, 3% rich large of provide structured provide amountover335comparable 20%040 rich rich knowledgestructuredbaselines, and structuredlingual sentencescomparable 3%041 embeddingsknowledge forparable knowledge overun- as035 re- well sentences for willbaselines, for as un- sentences,theun- benefitS1 as three(1) well from compo-{ as differentre- the threeand lan-} compo- aree ( ei,m closei ) z092 in the semantic 090 (ei390,ej )mono-ENc 497 385 091 090 by385 adding085 edges from all neighbors of 085 while,342 abundant verify textbi-lingual corpus the qualitycontains ENverifyThisguages, andsection comparablelarge theof provide amount035 our introduces quality rich sentences.embeddings. structuredlingual how ofmeaningsmeanings ourto embeddings knowledge extract embeddings. morederstandingmeaningslar for will,buttherearealsowaysinwhichthey,buttherearealsowaysinwhichthey embeddingcross- un- benefit natural,buttherearealsowaysinwhichthey448larlarity embeddingfromlinked languagevectors, embeddingverify between differentilar entities beyondrepresentations no vectors, mattervectors, the lan- texts.e qualityity entitiesno they Mean-no dueand matterbetween matter toaree the inWeof most they they the andas our common are comparableutilize areembeddings. in entities corresponding in neigh-the the distant392 sen-words and without corresponding supervision alignments. 
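To make the construction concrete, here is a minimal Python sketch (our own illustration, not the authors' released code) that merges two mono-lingual neighbor sets through a list of cross-lingual links; all function and variable names are hypothetical.

from collections import defaultdict

def build_bilingual_en(en_edges, zh_edges, cross_links):
    """Merge two mono-lingual entity networks, given as (head, tail)
    edge lists, into one bi-lingual EN in which cross-lingual linked
    entities inherit each other's neighbors."""
    neighbors = defaultdict(set)
    for head, tail in en_edges + zh_edges:  # mono-lingual edges
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    for e_i, e_j in cross_links:  # each <e_i, e_j> in A
        # both linked entities inherit the union of their neighbor sets
        union = neighbors[e_i] | neighbors[e_j]
        neighbors[e_i] |= union
        neighbors[e_j] |= union
    return neighbors

# Toy example: the Chinese NBA entity inherits the English neighbor Foust.
bi_en = build_bilingual_en([("NBA", "Foust")], [], [("NBA", "NBA_zh")])
print(bi_en["NBA_zh"])  # {'Foust'}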
Comparable Sentences Generation We utilize distant supervision to generate comparable sentences from Wikipedia articles. Each Wikipedia article has a pseudo mention of the page entity (talking something about the entity) (Yamada et al., 2017); thus, if a sentence also mentions another entity, it implicitly expresses their relation. Therefore, we make a similar assumption as in relation extraction: if two entities participate in a relation, the sentences mentioning both of them express that relation. As shown in Figure 1, from the page articles of the cross-lingual linked entities e_i and e_j, we extract the sentences that mention the same linked entities as comparable sentence pairs.
The intuition behind our method is that words and entities in various languages share some common semantics. On one hand, we utilize their shared semantics to align similar words and entities with similar embedding vectors, no matter whether they are in the same language or not. On the other hand, cross-lingual embeddings will benefit from different languages due to their complementary knowledge; for instance, textual ambiguity in one language may disappear in another language. All word/entity embeddings are trained jointly under a unified optimization objective. Next, we introduce the three components for joint representation learning in turn.

Figure 2: The neural model for joint representation learning.

5.1 Mono-lingual Representation Learning

Within each language, we learn the representations of words and entities jointly by maximizing the probability of each item's contexts:

L_m = Σ_{x_i ∈ D∪E} log P(C(x_i)|x_i)

where x_i is either a word or an entity, and C(x_i) denotes: (i) contextual words in a pre-defined window of x_i if x_i ∈ D, (ii) neighbor entities that are linked to x_i if x_i ∈ E, and (iii) contextual words of w_j if x_i is the entity e_i in an anchor ⟨w_j, e_i⟩.

5.2 Cross-lingual Entity Regularizer

The bi-lingual EN merges entities of different languages into a unified network, resulting in the possibility of using the same objective as in mono-lingual ENs. Thus, we naturally extend the mono-lingual objective to cross-lingual settings:

L_e = Σ_{e_i ∈ E} log P(N′(e_i)|e_i)

where N′(e_i) denotes the neighbors of e_i in the bi-lingual EN. Thus, by jointly learning mono-lingual representations with the cross-lingual entity regularizer, cross-lingual linked entities, e.g., NBA and NBA (zh), obtain similar representations due to their most common neighbor entities, e.g., Foust.

5.3 Cross-lingual Sentence Regularizer

The major challenge lies in measuring the similarity between entities and the corresponding mentioned words in different languages. Given the generated comparable sentences, we minimize the distance between the embeddings of cross-lingual words within each sentence pair, where knowledge attention ψ aims at filtering out wrongly labelled sentences, and cross-lingual attention ψ′ deals with the unbalanced information between the two languages. Next, we will introduce the two types of attentions in detail.
We set θ = 0 in exper-098 098 395 049 Knowledge2016; Cao Attention et al., 2017; Ji etConventional al., 2016).4 hibit5 knowledgeisomorphichibit2013 structureisomorphic representation; Zhangacross et al. structure, 2017 languages). meth-across (Mikolov languagestity) et al.(Yamada (,Mikolov et al. al., , 2017).099 Thus, if a sentence also 348 tities (Zhu et al., 2017). However, we1 argue that sumption as in relation extraction: If two enti- 398 049 0492016; Cao et2016 al., 2017; Cao; etJi346 al.et实体相关性效果分析, al.2017, 2016; Ji). et al., 2016). 12013; Zhang etiments.2013 al., 2017; Zhang). Thus, et al. unbalanced, 2017). information is trimmed099 099 396 5.6.4 enodsL normally regard cross-lingual links as a spe- mentionsen anotherzh entity, it implicitly expresses 349 this may misleads | to an inconsistent training ob- ties participate ins a relation,s ′ and both of them 399 Suppose that347 sentences k,l l=1 contain the same to the common meanings between1 k and k . For 397 5.6.5 跨语言实体链接效果分析encial equivalence type of relation4 between two en- their relation. Therefore, we make a similar as- entities in articles of entity em , the wrong labelling example (Figure 1), words American, basketball, 5.7 348本章小结 titiesen (Zhu et al., 2017).1 However, we argue that sumption as in relation extraction: If two enti- 398 errors increase, because some sk,l is almost irrele- 1 player are selected due to their aligned Chinese en 349 this may mislead1 to an inconsistent training ob- ties participate in a relation, and both of them 399 vant to em . Knowledge attention assigns smaller words 美国, 4篮球, 运动员, while 12 seasons in en zh weights to wrong labelled sentences, and higher sk or 前 (former) in sk′ are discarded due to low weights to related sentences. Thus, we70 define it attentions. en en 4 proportional to the similarity between sk,l and em : The reason of using such regularizer lies in two points: (1) the embeddings of cross-lingual aligned words become closer within the pair of ∑ en en ∝ en en comparable sentences, and meanwhile (2) the dis- ψ(em , sk,l) sim(em , wi ) wen∈sen tance between their contexts is also minimized, i k,l which keeps the same way as used in mono-lingual word embeddings training—the words sharing sim where is similarity measurement. We more contexts have similar embeddings. In this use cosine similarity in the presented work. way, our regularizer follows a similar assumption ∑Knowledge attention is normalized to satisfy L en en with (Gouws et al., 2015): The more frequently l ψ(em , sk,l) = 1. two words occur in parallel/comparable sentence Cross-lingual Attention pairs, the closer their representation will be.

Cross-lingual Attention

Inspired by the self-attention mechanism (Lin et al., 2017b), we motivate cross-lingual attention focusing on potential information from the comparable sentences themselves. The intuition is to find possible aligned words between languages, and filter out the words without alignments. We define it according to the maximum similarity computed by our cross-lingual word embeddings:

ψ′(w_i^en, w_j^zh) ∝ max sim(w_i^en, w_j^zh)

We set a threshold for discarding non-aligned words if ψ′(w_i^en, w_j^zh) < θ, and make a normalization for the selected words. We set θ = 0 in experiments. Thus, unbalanced information is trimmed to the common meanings between s_k^en and s_k'^zh. For example (Figure 1), the words American, basketball, player are selected due to their aligned Chinese words 美国, 篮球, 运动员, while 12 seasons in s_k^en or 前 (former) in s_k'^zh are discarded due to low attentions.

The reason for using such a regularizer lies in two points: (1) the embeddings of cross-lingual aligned words become closer within the pair of comparable sentences, and meanwhile (2) the distance between their contexts is also minimized, which keeps the same way as used in mono-lingual word embedding training: the words sharing more contexts have similar embeddings. In this way, our regularizer follows a similar assumption with (Gouws et al., 2015): the more frequently two words occur in parallel/comparable sentence pairs, the closer their representations will be.
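The same idea in code, reusing cosine from the previous sketch; it keeps, for each English token, its maximum similarity against the Chinese sentence, applies the threshold θ, and renormalizes (all names hypothetical).

def cross_lingual_attention(sent_en, sent_zh, vecs, theta=0.0):
    """Select likely aligned English tokens by their best cosine
    similarity to any token of the comparable Chinese sentence."""
    raw = {}
    for w_en in sent_en:
        if w_en not in vecs:
            continue
        sims = [cosine(vecs[w_en], vecs[w_zh])
                for w_zh in sent_zh if w_zh in vecs]
        if sims:
            raw[w_en] = max(sims)  # maximum cross-lingual similarity
    kept = {w: s for w, s in raw.items() if s >= theta}  # discard below theta
    total = sum(kept.values()) or 1.0
    return {w: s / total for w, s in kept.items()}  # normalization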

5.4 Training

All above components are jointly trained using the overall objective function as follows:

L = L_m + L_e + γL_s

where γ is a hyper-parameter to tune the effect of the cross-lingual sentence regularizer, and is set to 1 in experiments. We use Softmax as the probability function, and negative sampling and SGD for efficient optimization (Mikolov et al., 2013a).
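As a rough sketch of how the three losses can be combined in training (a simplified stand-in for the skip-gram negative-sampling optimization the paper describes; the pair-list format is our assumption):

import numpy as np

def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -np.log(sigmoid(center @ context))
    for neg in negatives:  # 5 sampled negatives per positive in the paper
        loss -= np.log(sigmoid(-center @ neg))
    return loss

def overall_loss(mono_pairs, entity_pairs, sent_pairs, gamma=1.0):
    """L = L_m + L_e + gamma * L_s, each term summed over its
    (center_vec, context_vec, [negative_vecs]) training triples."""
    term = lambda pairs: sum(sgns_loss(c, ctx, negs) for c, ctx, negs in pairs)
    return term(mono_pairs) + term(entity_pairs) + gamma * term(sent_pairs)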

6 Experiments

In this section, we describe a qualitative analysis with nearest neighbors and quantitative experiments on the tasks of word translation, entity relatedness and cross-lingual entity linking, to verify the quality of the cross-lingual word embeddings, the entity embeddings and the joint inference among them, respectively. The code of our proposed model can be found at https://github.com/TaoMiner/MultiLingualEmbedding.

6.1 Experiment Settings

           Word                  Entity
     vocab (m)  token (b)  vocab (m)  token (b)
En     1.99       1.90       3.94       0.41
Zh     0.55       0.17       0.58       0.06
Es     0.70       0.48       0.70       0.04
Ja     0.46       0.45       0.88       0.08
It     0.67       0.40       1.09       0.12
Tr     0.33       0.05       0.22       0.01

Table 1: Multi-lingual KB Statistics.

We choose Wikipedia, the April 2017 dump, as the multi-lingual KB and six popular languages for evaluation. The preprocessing consists of the following steps: converting texts into lower case, filtering out symbols and low-frequency words and entities (less than 5), and tokenizing the Chinese corpus using Jieba (https://github.com/fxsjy/jieba) and the Japanese corpus using mecab (http://taku910.github.io/mecab/). The statistics are listed in Table 1. For brevity, we adopt two-letter abbreviations: 'En', 'Zh', 'Es', 'Ja', 'It' and 'Tr' for English, Chinese, Spanish, Japanese, Italian and Turkish, respectively. The token sub-column denotes the total number of words/entities in the entire training corpus, and we use 'm' to denote million and 'b' for billion.

For cross-lingual settings, we choose five language pairs to compare with state-of-the-art methods, whose statistics are listed in Table 2.

        Cross-lingual  Comparable      Bilingual EN
        Links (m)      Sentences (m)   E (m)   R (b)
Es-En      0.82           4.66          4.64    0.58
Zh-En      0.51           2.02          4.52    0.57
Ja-Zh      0.26           1.04          1.46    0.19
It-En      0.74           3.83          5.03    0.68
Tr-En      0.15           0.75          4.16    0.44

Table 2: Cross-lingual Data Statistics.

We trained our method using the suggested parameters of the Skip-gram model (Mikolov et al., 2013c) and evaluate the embeddings shared by all tasks for fair comparison. We set the number of training epochs to 2 to ensure convergence, which costs nearly 20 hours on a server with a 64-core CPU and 188GB memory. The embedding dimension is set to 200 and the context window size is 5. For each positive example, we sample 5 negative examples.
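A sketch of that preprocessing for the Chinese corpus; jieba is the tokenizer named in the paper, while the function itself and the regular expression are our own illustration.

import re
from collections import Counter
import jieba  # https://github.com/fxsjy/jieba

def preprocess_zh(documents, min_freq=5):
    """Lowercase, strip symbols, tokenize with Jieba, and drop tokens
    occurring fewer than min_freq times (the threshold used above)."""
    tokenized = []
    for doc in documents:
        doc = re.sub(r"[^\w\s]", " ", doc.lower())  # filter out symbols
        tokenized.append([t for t in jieba.cut(doc) if t.strip()])
    counts = Counter(tok for doc in tokenized for tok in doc)
    return [[t for t in doc if counts[t] >= min_freq] for doc in tokenized]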

6.2 Qualitative Analysis

We manually checked nearest neighbors to get a straightforward impression of the quality of our embeddings. The nearest neighbors of the English word basketball are listed in Table 3.

Translation words (Chinese)
篮球 (+), 篮球队 (basketball team), 湖人 (lakers), 男子篮球 (men's basketball), 湖人队 (the lakers), 国王队 (the Kings), 美式足球 (American football), 中锋 (center)

Nearest entities (Chinese)
NBA, 篮球 (Basketball), 控球后卫 (point guard), NBA选秀 (draft), 香港男子甲一组男子篮球联赛 (Hong Kong men's top basketball league), 橄榄球 (American football), 东方篮球队 (Eastern basketball team)

Nearest words
nba, wnba, player, twyman, professional, pick, 76ers

Nearest entities
Professional sports, Varsity letter, Sports agent, All-America, Final four, All-star, College basketball

Table 3: Cross-lingual nearest words and entities of the English word basketball.

As Table 3 shows, we find the correct translation ranked at top 1 (marked by +), and the listed words as well as the English nearest words are all basketball related, indicating a high quality of our cross-lingual word embeddings. Interestingly, we found that although all nearest entities are sports related, e.g., NBA or Professional sports, there is an obvious culture divergence between the Chinese entities and the English entities, such as Hong Kong basketball league vs. All-America.

6.3 Word Translation

Following (Zhang et al., 2017b), we test our cross-lingual word embeddings on a benchmark dataset including over 2,000 bilingual word pairs on average. The ground truth is obtained from Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw) or Google translation.
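The inspection above is a cosine ranking in the shared space; a minimal version (the vocabulary dictionaries are assumed to map items to numpy vectors):

import numpy as np

def nearest_neighbors(query_vec, vocab_vecs, k=8):
    """Rank a vocabulary {item: vector} by cosine similarity to a query."""
    items, vecs = zip(*vocab_vecs.items())
    mat = np.stack(vecs)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    order = np.argsort(-(mat @ query))
    return [items[i] for i in order[:k]]

# e.g., nearest_neighbors(word_vecs["basketball"], zh_word_vecs)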

              Es-En          It-En          Ja-Zh          Tr-En          Zh-En
            large  small   large  small   large  small   large  small   large  small
TM            -    48.61     -    37.95     -    26.67     -    11.15    4.79  21.79
IA            -    60.41     -    46.52     -    36.35     -    17.11    7.08  32.29
Bilbowa      53    65.96     -      -       -      -       -      -       -      -
BWESG       48.88  66.38   36.84  51.29   30.93  37.80   21.36  35.59   20.57  29.17
Adversarial   -    71.97     -    58.60     -    43.02     -    17.18    7.92  43.31
Ours-noatt  68.34  77.1    62.22  65.90   37.00  42.30   57.47  60.51   35.90  42.80
Ours        70.41  78.50   63.07  67.85   41.30  46.70   54.40  59.31   35.66  44.67

Table 4: Word Translation.
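Under that protocol, the reported numbers correspond to top-1 retrieval accuracy against the gold lexicon; a compact harness reusing nearest_neighbors from the previous sketch (dataset loading left abstract):

def translation_accuracy(lexicon, src_vecs, tgt_vecs):
    """lexicon: list of (source_word, gold_translation) pairs; a pair
    whose source word is out of vocabulary counts as an error."""
    hits = 0
    for src, gold in lexicon:
        if src in src_vecs:
            pred = nearest_neighbors(src_vecs[src], tgt_vecs, k=1)[0]
            hits += int(pred == gold)
    return 100.0 * hits / len(lexicon)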

We compare all methods using the same vocabulary, and analyze the vocabulary size's impact by setting a nearly 5k small scale and a 50k large scale.

We choose several state-of-the-art methods as baselines, using different levels of parallel data: (1) TM (Mikolov et al., 2013b) and IA (Zhang et al., 2016) are pioneering and popular transformation-based methods using a bilingual lexicon. (2) Bilbowa (Gouws et al., 2015) is typical work using parallel sentences and performs quite well. (3) BWESG (Vulic and Moens, 2016) is similar to our method and achieves the best performance in the literature using comparable data. (4) The Adversarial model (Zhang et al., 2017b) is the state-of-the-art without parallel data. Besides, we remove attention from our method to investigate the impact of the attention mechanisms, marked as Ours-noatt.

For fair comparison, we report the results from the original paper (Zhang et al., 2017b) except for Bilbowa and BWESG, which did not report their results on the same benchmark datasets. So, we carefully implement them using the released codes on the same training corpus as ours with the suggested parameters. Nevertheless, we do not have performance reports of Zh-En, It-En, Tr-En and Ja-Zh with Bilbowa due to the lack of the parallel data used in the original paper. As shown in Table 4, we can see:

• Our proposed method significantly outperforms all the baseline methods, with average gains of 21% and 9.1% on the large and small vocabulary. This proves the high quality of our generated cross-lingual data and the effectiveness of our joint framework.

• Pairs of languages with similar cultures achieve better performance (Es-En, It-En, Tr-En, Ja-Zh) than those with different cultural origins, e.g., Zh-En.

• Languages with richer corpora have better translations, because adequate training data helps to capture more accurate cross-lingual semantics (Es-En, It-En, Tr-En vs. Ja-Zh).

• Our method has less performance reduction between the small and large vocabulary than the methods based on parallel word pairs, because we adopt a consistent objective function which aligns cross-lingual semantics and simultaneously keeps their own mono-lingual semantics.

• Attention mechanisms further improve the performance, mainly because they help to select the most informative words and sentences, filtering out unrelated data.

6.4 Entity Relatedness

With respect to our entity embeddings, we have conducted experiments to evaluate English entity relatedness following (Ganea and Hofmann, 2017; Hoffart et al., 2011), in which the dataset contains 3,314 entities, and each entity has 91 candidate entities labeled with 1 or 0, indicating whether they are semantically related. Given an entity, we rank its candidate entities according to their similarity based on our embeddings, and evaluate the ranking quality through two standard metrics: normalized discounted cumulative gain (NDCG) (Järvelin and Kekäläinen, 2002) and mean average precision (MAP) (Manning et al., 2008).

To give a comprehensive fair comparison, we choose several widely used and state-of-the-art methods as our baselines, and compare with the results in the original papers: (1) WLM (Milne and Witten, 2008), the popular semantic similarity measurement based on Wikipedia anchor links. (2) ALIGN (Yamada et al., 2016) and MPME (Cao et al., 2017), state-of-the-art methods that jointly learn word and entity embeddings using a mono-lingual EN. (3) The Deep Joint (DJ) model (Ganea and Hofmann, 2017), a deep neural model that achieves the best performance on entity relatedness.
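The ranking protocol above in code form, reusing cosine from the earlier sketch; the binary-gain NDCG follows the standard definition, and all names are our own.

import numpy as np

def rank_candidates(entity_vec, candidates, entity_vecs):
    """Order candidate entities by cosine similarity to the query entity."""
    return sorted(candidates, key=lambda c: -cosine(entity_vec, entity_vecs[c]))

def ndcg_at_k(ranked_labels, k):
    """Binary-gain NDCG@k for one ranked list of 0/1 relevance labels."""
    gains = np.asarray(ranked_labels[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float((gains * discounts).sum())
    ideal = np.sort(np.asarray(ranked_labels, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0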

                         NDCG               MAP
                   @1     @5     @10
WLM               .54    .52    .55     .48
ALIGN (d=500)     .59    .56    .59     .52
MPME              .61    .61    .65     .58
DJ (d=300)        .63    .61    .64     .58
Ours (Zh-En)      .62    .62    .66     .59
Ours (Es-En)      .61    .61    .65     .59
Ours (Tr-En)      .62    .62    .65     .59
Ours (It-En)      .61    .61    .65     .58
Ours-e (Es-En)    .62    .62    .67     .61
Ours-e (Es-En, epoch=5)  .64  .64  .68  .62

Table 5: Entity Relatedness.

Table 5 shows the results of the baseline methods as well as our method based on different languages. We also test the case of our method without training cross-lingual words, marked as Ours-e. We can see that our method outperforms all baseline methods by introducing cross-lingual information, and all bilingual ENs lead to similar results. Strangely, ALIGN and DJ with more embedding dimensions seemingly fail to capture overall relatedness (performance reduction from top@1 to top@5). The best performance of Ours-e implies that training cross-lingual words slightly harms the performance of the entity embeddings. We may introduce additional sense embeddings in the future (Cao et al., 2017).

Although favorable improvements have been achieved by using our English entity embeddings, they shall be smaller than those for other languages, because the resources of English are already quite rich, and even richer than those of many other languages; thus the contributions from other languages will be less significant than vice versa. Due to the limitation of the publication, we do not report experiment results in the vice versa direction.

6.5 Cross-lingual Entity Linking

Entity linking, the task of identifying the language-specific reference entity for mentions in texts, raises the key challenge of comparing the relevance between entities and the contextual words around the mentions (Cao et al., 2015; Nguyen et al., 2016). Recently, the surge of cross-lingual analysis has pushed the entity linking task to cross-lingual settings (Ji et al., 2015). Therefore, we comprehensively measure our joint inference ability among words and entities using the tri-lingual EL benchmark dataset KBP2015, which consists of 944 documents and 38,831 mentions, divided into 444 and 500 documents for training and evaluation. Note that the main purpose is not to beat other EL models but to evaluate the quality of our embeddings, so we adopt a simple GBRT (Gradient Boosted Regression Tree) based classifier as in (Cao et al., 2017; Yamada et al., 2016), replace its embeddings with our cross-lingual embeddings, and filter out mentions that are out of our vocabulary.

                English  Spanish  Chinese
Top system        73.7     80.4     83.1
Second system     66.2     71.5     78.1
Ours              73.9     79.1     81.3

Table 6: Tri-lingual Entity Linking.

Table 6 shows the top-1 linking accuracy (%). We can see that our method performs much better than the second-ranked system, and is competitive with the top-ranked system. Considering that these systems utilize additional translation tools (Ji et al., 2015), we conclude that our embeddings are highly qualified for joint inference among entities and words in different languages.

7 Conclusions

In this paper, we propose a novel method to jointly learn cross-lingual word and entity representations, which enables effective inference among cross-lingual knowledge bases and texts. Instead of parallel data, we use distant supervision over a multi-lingual KB to generate high-quality comparable data as cross-lingual supervision signals for two types of regularizer. We introduce attention mechanisms to further improve the training quality. A series of experiments on several tasks verifies the effectiveness of our method as well as the quality of the cross-lingual word and entity embeddings.

In the future, we will enrich the semantics of low-resourced languages by cross-lingual linking to rich-resourced languages, and extend more cross-lingual words and entities to multi-lingual settings.

8 Acknowledgments

The work is supported by the NSFC key project (No. 61533018, U1736204, 61661146007), the Ministry of Education and China Mobile Research Fund (No. 20181770250), and the THUNUS NExT++ Co-Lab. Partial financial support from the P3ML project funded by the BMBF of Germany under grant number 01/S17064 is gratefully acknowledged.

References

Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016. Massively multilingual word embeddings. CoRR.

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In ACL.

Antonio Valerio Miceli Barone. 2016. Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders. In Rep4NLP@ACL.

José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. Nasari: a novel approach to a semantically-aware representation of items. In HLT-NAACL.

Hailong Cao, Tiejun Zhao, Shu Zhang, and Yao Meng. 2016. A distribution-based model to learn bilingual word embeddings. In COLING.

Yixin Cao, Lei Hou, Juanzi Li, and Zhiyuan Liu. 2018. Neural collective entity linking. In COLING.

Yixin Cao, Lifu Huang, Heng Ji, Xu Chen, and Juan-Zi Li. 2017. Bridge text and knowledge by learning multi-prototype entity mention embedding. In ACL.

Yixin Cao, Juanzi Li, Xiaofei Guo, Shuanhu Bai, Heng Ji, and Jie Tang. 2015. Name list only? target entity disambiguation in short texts. In EMNLP.

A. P. Sarath Chandar, Stanislas Lauly, Hugo Larochelle, Mitesh M. Khapra, Balaraman Ravindran, Vikas C. Raykar, and Amrita Saha. 2014. An autoencoder approach to learning bilingual word representations. In NIPS.

Meiping Dong, Yong Cheng, Yang Liu, Jia Xu, Maosong Sun, Tatsuya Izuha, and Jie Hao. 2014. Query lattice for translation retrieval. In COLING.

Manaal Faruqui and Chris Dyer. 2014. Improving vector space word representations using multilingual correlation. In EACL.

Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep joint entity disambiguation with local neural attention. In EMNLP.

Stephan Gouws, Yoshua Bengio, and Gregory S. Corrado. 2015. Bilbowa: Fast bilingual distributed representations without word alignments. In ICML.

Xu Han, Zhiyuan Liu, and Maosong Sun. 2016. Joint representation learning of text and knowledge for knowledge graph completion. CoRR.

Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, Zhanyi Liu, Hua Wu, and Jun Zhao. 2017. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In ACL.

Shizhu He, Cao Liu, Kang Liu, and Jun Zhao. 2017. Generating natural answers by incorporating copying and retrieving mechanisms in sequence-to-sequence learning. In ACL.

Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. In ACL.

Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust disambiguation of named entities in text. In EMNLP.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke S. Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In ACL.

Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of ir techniques. TOIS.

Heng Ji, Joel Nothman, Ben Hachey, and Radu Florian. 2015. Overview of tac-kbp2015 tri-lingual entity discovery and linking. In TAC.

Heng Ji, Joel Nothman, Hoa Trang Dang, and Sydney Informatics Hub. 2016. Overview of tac-kbp2016 tri-lingual edl and its impact on end-to-end cold-start kbp. In TAC.

Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In COLING.

Tomás Kociský, Karl Moritz Hermann, and Phil Blunsom. 2014. Learning bilingual word representations by marginalizing alignments. In ACL.

Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Hervé Jégou, et al. 2018. Word translation without parallel data. In ICLR.

Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2017a. Neural relation extraction with multi-lingual attention. In ACL.

Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017b. A structured self-attentive sentence embedding. CoRR.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Bilingual word representations with monolingual quality in mind. In VS@HLT-NAACL.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval.

Tomas Mikolov, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. CoRR.

Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013b. Exploiting similarities among languages for machine translation. CoRR.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013c. Distributed representations of words and phrases and their compositionality. In NIPS.

David Milne and Ian H. Witten. 2008. An effective, low-cost measure of semantic relatedness obtained from wikipedia links. In AAAI.

Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In ACL/IJCNLP.

Aditya Mogadala and Achim Rettinger. 2016. Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification. In HLT-NAACL.

Thien Huu Nguyen, Nicolas Fauceglia, Mariano Rodriguez Muro, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, and Mohammad Sadoghi. 2016. Joint learning of local and global features for entity linking via neural networks. In COLING.

Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2017. A survey of cross-lingual word embedding models. arXiv preprint arXiv:1706.04902.

Tianze Shi, Zhiyuan Liu, Yang Liu, and Maosong Sun. 2015. Learning cross-lingual word embeddings via matrix co-factorization. In ACL.

Radu Soricut and Nan Ding. 2016. Multilingual word embeddings using multigraphs. CoRR.

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In EMNLP-CoNLL.

Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In EMNLP.

Ivan Vulic and Marie-Francine Moens. 2015. Bilingual word embeddings from non-parallel document-aligned data applied to bilingual lexicon induction. In ACL.

Ivan Vulic and Marie-Francine Moens. 2016. Bilingual distributed word representations from document-aligned comparable data. JAIR.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph and text jointly embedding. In EMNLP.

Zhigang Wang and Juan-Zi Li. 2016. Text-enhanced representation learning for knowledge graph. In IJCAI.

Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013a. Connecting language and knowledge bases with embedding models for relation extraction. In ACL.

Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013b. Connecting language and knowledge bases with embedding models for relation extraction. In EMNLP.

Haiyang Wu, Daxiang Dong, Xiaoguang Hu, Dianhai Yu, Wei He, Hua Wu, Haifeng Wang, and Ting Liu. 2014. Improve statistical machine translation with context-sensitive bilingual semantic embedding model. In EMNLP.

Jiawei Wu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2016. Knowledge representation via joint learning of sequential text and knowledge graphs. CoRR.

Min Xiao and Yuhong Guo. 2014. Distributed word representation learning for cross-lingual dependency parsing. In CoNLL.

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016. Joint learning of the embedding of words and entities for named entity disambiguation. In CoNLL.

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2017. Learning distributed representations of texts and entities from knowledge base. TACL.

Bishan Yang and Tom M. Mitchell. 2017. Leveraging knowledge bases in lstms for improving machine reading. In ACL.

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In EMNLP.

Jing Zhang, Yixin Cao, Lei Hou, Juanzi Li, and Hai-Tao Zheng. 2017a. Xlink: an unsupervised bilingual entity linking system. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data.

Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017b. Adversarial training for unsupervised bilingual lexicon induction. In ACL.

Yuan Zhang, David Gaddy, Regina Barzilay, and Tommi S. Jaakkola. 2016. Ten pairs to tag - multilingual pos tagging via coarse mapping between embeddings. In HLT-NAACL.

Hao Zhu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Iterative entity alignment via joint knowledge embeddings. In IJCAI.

Will Y. Zou, Richard Socher, Daniel M. Cer, and Christopher D. Manning. 2013. Bilingual word embeddings for phrase-based machine translation. In EMNLP.
