Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision

Yixin Cao1,2, Lei Hou2*, Juanzi Li2, Zhiyuan Liu2, Chengjiang Li2, Xu Chen2, Tiansi Dong3
1 School of Computing, National University of Singapore, Singapore
2 Department of CST, Tsinghua University, Beijing, China
3 B-IT, University of Bonn, Bonn, Germany
{caoyixin2011,iamlockelightning,successcx}@gmail.com
{houlei,liuzy,lijuanzi}@tsinghua.edu.cn
[email protected]
* Corresponding author.
Abstract

Joint representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings. In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. It captures mutually complementary knowledge, and enables cross-lingual inferences among knowledge bases and texts. Our method does not require parallel corpora, and automatically generates comparable data via distant supervision using multi-lingual knowledge bases. We utilize two types of regularizers to align cross-lingual words and entities, and design knowledge attention and cross-lingual attention to further reduce noise. We conducted a series of experiments on three tasks: word translation, entity relatedness, and cross-lingual entity linking. The results, both qualitative and quantitative, demonstrate the significance of our method.

1 Introduction

Multi-lingual knowledge bases (KBs) store millions of entities and facts in various languages, and provide rich background structural knowledge for understanding texts. On the other hand, text corpora contain a huge amount of statistical information complementary to KBs. Many researchers leverage both types of resources to improve various natural language processing (NLP) tasks, such as machine reading (Yang and Mitchell, 2017) and question answering (He et al., 2017; Hao et al., 2017).

Most existing work jointly models a KB and a text corpus so that each enhances the other, by learning word and entity representations in a unified vector space. For example, Wang et al. (2014), Yamada et al. (2016), and Cao et al. (2017) utilize co-occurrence information to align similar words and entities with similar embedding vectors. Toutanova et al. (2015), Wu et al. (2016), Han et al. (2016), Weston et al. (2013a), and Wang and Li (2016) represent entities based on their textual descriptions together with the structured relations. These methods focus on mono-lingual settings. For cross-lingual tasks (e.g., cross-lingual entity linking), however, they need to introduce additional translation tools, which bring extra costs and inevitable errors (Ji et al., 2015, 2016).

In this paper, we carry out cross-lingual joint representation learning, which has not been fully researched in the literature. We aim at creating a unified space for words and entities in various languages, easing cross-lingual semantic comparison, which benefits from the complementary information in different languages. For instance, two different meanings of the English word center are expressed by two different words in Chinese: center as an activity-specific building is expressed by 中心, while center as the basketball player role is 中锋. Our main challenge is the limited availability of parallel corpora, which are usually either expensive to obtain or only available for certain narrow domains (Gouws et al., 2015). Much work has been done to alleviate this problem. One school of methods uses adversarial techniques or domain adaptation to match linguistic distributions (Zhang et al., 2017b; Barone, 2016; Cao et al., 2016). These methods do not require parallel corpora, but their training process is unstable, and their high complexity restricts them to small-scale data. Another line of work uses pre-existing multi-lingual resources to automatically generate "pseudo bilingual documents" (Vulic and Moens, 2015, 2016). However, negative results have been observed due to the occasionally poor quality of the training data (Vulic and Moens, 2016). All the above methods focus only on words. We consider both words and entities, which makes the parallel data issue even more challenging.
In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. The basic idea is to capture mutually complementary knowledge in a shared semantic space, which enables joint inference among cross-lingual knowledge bases and texts without additional translation. We achieve this by (1) utilizing an existing multi-lingual knowledge base to automatically generate cross-lingual supervision data, (2) learning mono-lingual word and entity representations, and (3) applying a cross-lingual sentence regularizer and a cross-lingual entity regularizer to align similar words and entities with similar embeddings. The entire framework is trained using a unified objective function, which is efficient and applicable to arbitrary language pairs that exist in multi-lingual KBs.

Particularly, we build a bilingual entity network from inter-language links1 in KBs for regularizing cross-lingual entities through a variant of the skip-gram model (Mikolov et al., 2013c). Thus, the mono-lingual structured knowledge of entities is not only extended to cross-lingual settings, but also augmented from other languages. On the other hand, we utilize distant supervision to generate comparable sentences for the cross-lingual sentence regularizer, which models co-occurrence information across languages. Compared with "pseudo bilingual documents", comparable sentences achieve higher quality, because they rely not only on shared semantics at the document level, but also on cross-lingual information at the sentence level. We further introduce two attention mechanisms, knowledge attention and cross-lingual attention, to select informative data in comparable sentences.

Our contributions can be summarized as follows:

• We propose a novel method that jointly learns representations of not only cross-lingual words but also cross-lingual entities in a unified vector space, aiming to enhance embedding quality through their complementary semantics.

• Our model introduces distant supervision coupled with attention mechanisms to generate comparable data as cross-lingual supervision, which can benefit many other cross-lingual analyses.

• We conducted qualitative analysis to give an intuitive impression of our embeddings, and quantitative analysis on three tasks: word translation, entity relatedness, and cross-lingual entity linking. Experimental results show significant improvements on all three tasks.

1 https://en.wikipedia.org/wiki/Help:Interlanguage_links

2 Related Work

Joint representation learning of words and entities attracts much attention in the fields of Entity Linking (Zhang et al., 2017a; Cao et al., 2018), Relation Extraction (Weston et al., 2013b), and so on, yet little of this work addresses cross-lingual settings. We therefore investigate cross-lingual word embedding models (Ruder et al., 2017), and classify them into three groups according to the parallel corpora used as supervision: (i) methods requiring a parallel corpus with aligned words as a constraint for bilingual word embedding learning (Klementiev et al., 2012; Zou et al., 2013; Wu et al., 2014; Luong et al., 2015; Ammar et al., 2016; Soricut and Ding, 2016); (ii) methods using parallel sentences (i.e., translated sentence pairs) as the semantic composition of multi-lingual words (Gouws et al., 2015; Kociský et al., 2014; Hermann and Blunsom, 2014; Chandar et al., 2014; Shi et al., 2015; Mogadala and Rettinger, 2016); and (iii) methods requiring a bilingual lexicon to map words from one language into the other (Mikolov et al., 2013b; Faruqui and Dyer, 2014; Xiao and Guo, 2014).

The major weakness of these methods is the limited availability of parallel corpora. One remedy is to use existing multi-lingual resources (i.e., multi-lingual KBs). Camacho-Collados et al. (2015) combine several KBs (Wikipedia, WordNet and BabelNet) and leverage multi-lingual synsets to learn word embeddings at the sense level through an extra post-processing step. Artetxe et al. (2017) start from a small bilingual lexicon and use a self-learning approach to induce the structural similarity of embedding spaces. Vulic and Moens (2015, 2016) collect comparable documents on the same themes from multi-lingual Wikipedia, then shuffle and merge them to build "pseudo bilingual documents" as training corpora. However, the quality of "pseudo bilingual documents" is difficult to control, resulting in poor performance on several cross-lingual tasks (Vulic and Moens, 2016).
Another remedy matches linguistic distributions via adversarial training (Barone, 2016; Zhang et al., 2017b; Lample et al., 2018) or domain adaptation (Cao et al., 2016). However, these methods suffer from unstable training and high complexity, which either limits the scalability of the vocabulary or relies on a strong distribution assumption.

Inspired by Vulic and Moens (2016), we generate high-quality comparable sentences via distant supervision, which is one of the most promising approaches to addressing the issue of sparse training data and performs well in relation extraction (Lin et al., 2017a; Mintz et al., 2009; Zeng et al., 2015; Hoffmann et al., 2011; Surdeanu et al., 2012). Our comparable sentences may further benefit many other cross-lingual tasks, such as information retrieval (Dong et al., 2014).

Figure 1: The overview framework of our method. The inputs and outputs of each step are listed in three levels. There are three main components of joint representation learning. Red texts with brackets are anchors, dashed lines denote entity relations, and solid lines are cross-lingual links.

3 Preliminaries and Framework

3.1 Preliminaries

Given a multi-lingual KB, we take (i) a text corpus, (ii) entities and their relations, and (iii) a set of anchors as inputs, and learn embeddings for each word and each entity in various languages. For clarity, we use English and Chinese as sample languages in the rest of the paper, and use the superscript y ∈ {en, zh} to denote language-specific parameters.2

We use multi-lingual Wikipedia as the KB, including a set of entities E^y = {e_i^y} and their articles. We concatenate these articles together to form the text corpus D^y = ⟨w_1^y, ..., w_i^y, ..., w_|D|^y⟩. Hyperlinks in articles are denoted by anchors A^y = {⟨w_i^y, e_j^y⟩}, indicating that word w_i^y refers to entity e_j^y. G^y = (E^y, R^y) is the mono-lingual Entity Network (EN), where ⟨e_i^y, e_j^y⟩ ∈ R^y if there is a link between e_i^y and e_j^y. We use inter-language links in Wikipedia as cross-lingual links R^{en-zh} = {⟨e_i^en, e_i'^zh⟩}, indicating that e_i^en and e_i'^zh refer to the same thing in English and Chinese. Cross-lingual word and entity representation learning maps words and entities in different languages into a unified semantic space, in which each word and entity obtains an embedding vector w_i^y or e_j^y.3

2 We choose English and Chinese as example languages because they are top-ranked by total number of speakers; the full list can be found at https://en.wikipedia.org/wiki/Lists_of_languages_by_number_of_speakers.
3 For cross-lingual linked entities sharing the same string (e.g., NBA and NBA (zh)), an infrequent situation between languages, we use separate representations to keep the training objective consistent and avoid confusion.
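To make the notation concrete, the following sketch (ours, not the authors') renders the inputs of Section 3.1 as plain Python containers; the class name, field names, and toy values are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MonolingualKB:
    corpus: list                                 # D^y: concatenated article text as a token sequence
    anchors: set = field(default_factory=set)    # A^y: (word, entity) pairs from article hyperlinks
    relations: set = field(default_factory=set)  # R^y: (e_i, e_j) pairs linked in the KB

en = MonolingualKB(
    corpus="Lawrence Michael Foust was an American basketball player".split(),
    anchors={("Foust", "Lawrence_Michael_Foust"), ("NBA", "NBA")},
    relations={("Lawrence_Michael_Foust", "NBA")},
)

# R^{en-zh}: inter-language links pairing entities that refer to the same thing.
cross_links = {("NBA", "NBA_(zh)"), ("Lawrence_Michael_Foust", "福斯特")}
```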
3.2 Framework

To alleviate the heavy burden of limited parallel corpora and additional translation efforts, we utilize existing multi-lingual resources to distantly supervise cross-lingual word and entity representation learning, so that the shared embedding space supports joint inference among KBs and texts across languages. As shown in Figure 1, our framework has two steps: (1) Cross-lingual Supervision Data Generation builds a bilingual entity network and generates comparable sentences based on cross-lingual links; (2) Joint Representation Learning learns cross-lingual word and entity embeddings using a unified objective function. Our assumption throughout the entire framework is as follows: the more words/entities two contexts share, the more similar they are.

As shown in Figure 1, we build a bilingual EN G^{en-zh} using G^en, G^zh and the cross-lingual links R^{en-zh}, so that entities in different languages are connected in a unified network, which facilitates cross-lingual entity alignment. Meanwhile, from KB articles, we extract comparable sentences S^{en-zh} = {⟨s_k^en, s_k^zh⟩} as high-quality parallel data to align similar words in different languages.

Based on the generated cross-lingual data G^{en-zh}, S^{en-zh} and the mono-lingual data D^y, A^y, where y ∈ {en, zh}, we jointly learn cross-lingual word and entity embeddings through three components: (1) Mono-lingual Representation Learning, which learns mono-lingual word and entity embeddings for each language by modeling co-occurrence information through a variant of the skip-gram model (Mikolov et al., 2013c). (2) Cross-lingual Entity Regularizer, which aligns entities that refer to the same thing in different languages by extending the mono-lingual model to the bilingual EN. For example, the entity Foust in English and the entity 福斯特 (Foust) in Chinese are embedded closely in the semantic space because they share common neighbors across the two languages, such as All-star and NBA 选秀 (draft). (3) Cross-lingual Sentence Regularizer, which models cross-lingual co-occurrence at the sentence level so that translated words obtain similar embeddings. For example, the English word basketball and the translated Chinese word 篮球 frequently co-occur in pairs of comparable sentences; therefore, their vector representations shall be close in the semantic space. The above components are trained jointly under a unified objective function.
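This section states that the three components share one objective but defers the formulas to Section 5. A natural reading of the combined objective, with the trade-off weight γ being our assumption, is

$$\max\ \mathcal{L}=\mathcal{L}_m+\mathcal{L}_e-\gamma\,\mathcal{L}_s$$

where L_m and L_e (Sections 5.1 and 5.2) are log-likelihoods to be maximized, while L_s (Section 5.3) is a distance penalty to be minimized, hence the negative sign.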
4 Cross-lingual Supervision Data Generation

This section introduces how to build the bilingual entity network G^{en-zh} and the comparable sentences S^{en-zh} from a multi-lingual KB.

4.1 Bilingual Entity Network Construction

Entities with cross-lingual links refer to the same thing, which implies they are equivalent across languages. Conventional knowledge representation methods only add an edge between e_i^en and e_i'^zh indicating a special "equivalent" relation (Zhu et al., 2017). Instead, we build G^{en-zh} = (E^en ∪ E^zh, R^en ∪ R^zh ∪ R~^{en-zh}) by enriching the neighbors of cross-lingual linked entities. That is, we add edges R~^{en-zh} between the two mono-lingual ENs by letting all neighbors of e_i^en be neighbors of e_i'^zh, and vice versa, if ⟨e_i^en, e_i'^zh⟩ ∈ R^{en-zh}.

G^{en-zh} extends G^en and G^zh to bilingual settings in a natural way. It not only keeps the objective consistent with the mono-lingual ENs (entities, no matter in which language, will be embedded closely if they share common neighbors), but also enhances each EN with more neighbors from the foreign language. Following the method in Zhu et al. (2017), there would be no edge between the Chinese entity 福斯特 (Foust) and the English entity Pistons, which implies the wrong fact that 福斯特 (Foust) does not belong to the Pistons. Our method recovers the missing relation between the entities 福斯特 (Foust) and 活塞队 (Pistons) in the incomplete Chinese KB through the corresponding English common neighbors, such as All-star and NBA, as illustrated in Figure 1.
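A minimal sketch of this neighbor enrichment, assuming simple edge sets as input; this is our illustration of the rule, not the authors' code:

```python
from collections import defaultdict

def build_bilingual_en(edges_en, edges_zh, cross_links):
    """Merge two mono-lingual ENs; for each cross-lingual pair <e_en, e_zh>,
    every neighbor of e_en also becomes a neighbor of e_zh, and vice versa."""
    neighbors = defaultdict(set)
    for u, v in list(edges_en) + list(edges_zh):
        neighbors[u].add(v)
        neighbors[v].add(u)
    for e_en, e_zh in cross_links:  # add the edges of R~^{en-zh}
        shared = neighbors[e_en] | neighbors[e_zh]
        neighbors[e_en] |= shared
        neighbors[e_zh] |= shared
    return neighbors

g = build_bilingual_en(
    edges_en={("Foust", "NBA"), ("Foust", "Pistons")},
    edges_zh={("福斯特", "NBA_(zh)")},
    cross_links={("Foust", "福斯特")},
)
assert "Pistons" in g["福斯特"]  # the missing 福斯特-Pistons relation is recovered
```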
4.2 Comparable Sentences Generation

To supervise the cross-lingual representation learning of words, we automatically generate comparable sentences as cross-lingual training data. Comparable sentences are not translated sentence pairs, but sentences on the same topic in different languages. As shown in the middle layer of Figure 1, the following pair of sentences is comparable: (1) "Lawrence Michael Foust was an American basketball player who spent 12 seasons in NBA"; (2) "拉里·福斯特 (Lawrence Foust) 是 (was) 美国 (American) NBA 联盟 (association) 的 (of) 前 (former) 职业 (professional) 篮球 (basketball) 运动员 (player)".

Inspired by the distant supervision technique in relation extraction, we assume that a sentence s_k^en in the Wikipedia article of entity e_i^en explicitly or implicitly describes e_i^en (Yamada et al., 2017), and that s_k^en expresses a relation between e_i^en and e_j^en if another entity e_j^en occurs in s_k^en. Meanwhile, we find a comparable sentence s_k'^zh in the other language, which contains e_j'^zh and appears in the Wikipedia article of the Chinese entity e_i'^zh, where ⟨e_i^en, e_i'^zh⟩, ⟨e_j^en, e_j'^zh⟩ ∈ R^{en-zh}. As shown in Figure 1, the sentences in the second level are comparable due to the similar theme of the relation between the entities Foust and NBA. To find this type of sentence, we search the anchors in the English article and the Chinese article of the cross-lingual entity Foust, respectively, and extract the sentences that include another cross-lingual entity, NBA. Comparable sentences can thus be regarded as cross-lingual contexts.

Unfortunately, comparable sentences suffer from two issues caused by distant supervision:

Wrong labelling. Taking English as the sample language, there may be several sentences s_{k,l}^en|_{l=1}^L containing the same entity e_j^en in the article of e_i^en. A straightforward solution is to concatenate them into a longer sentence s_k^en, but this increases the chance of including unrelated sentences.

Unbalanced information. Sometimes the pair of sentences conveys unbalanced information; e.g., the English sentence in the middle layer of Figure 1 states that Foust spent 12 seasons in NBA, while the comparable Chinese sentence does not.

To address these issues, we propose knowledge attention and cross-lingual attention to filter out unrelated information at the sentence level and at the word level, respectively.
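The generation rule amounts to a nested scan over cross-linked article pairs. The sketch below is a simplified rendering under our assumed data layout (sentences pre-split, anchors pre-extracted); it implements only the matching rule, not the attention-based filtering:

```python
def comparable_sentences(articles_en, articles_zh, cross):
    """articles_*: {entity e_i: [(sentence, set of anchored entities)]};
    cross: dict mapping an English entity to its cross-linked Chinese entity."""
    pairs = []
    for e_i_en, sents_en in articles_en.items():
        e_i_zh = cross.get(e_i_en)
        if e_i_zh not in articles_zh:
            continue
        for s_en, ents_en in sents_en:
            for s_zh, ents_zh in articles_zh[e_i_zh]:
                # comparable if some anchored e_j^en is cross-linked to an e_j^zh
                if any(cross.get(e_j) in ents_zh for e_j in ents_en):
                    pairs.append((s_en, s_zh))
    return pairs

pairs = comparable_sentences(
    {"Foust": [("Lawrence Michael Foust was an American basketball player "
                "who spent 12 seasons in NBA", {"NBA"})]},
    {"福斯特": [("拉里·福斯特是美国NBA联盟的前职业篮球运动员", {"NBA_(zh)"})]},
    {"Foust": "福斯特", "NBA": "NBA_(zh)"},
)  # -> one comparable pair about Foust and the NBA
```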
5 Joint Representation Learning

As shown in Figure 2, there are three components in learning cross-lingual word and entity representations, which are trained jointly. In this section, we describe them in detail.

5.1 Mono-lingual Representation Learning

Following Yamada et al. (2016) and Cao et al. (2017), we learn mono-lingual word/entity embeddings based on the corpus D^y, the anchors A^y and the entity network G^y. Capturing the co-occurrence information among words and entities, these embeddings serve as the foundation and will be further extended to bilingual settings using the proposed cross-lingual regularizers, which are detailed in the following subsections. Mono-lingually, we utilize a variant of the skip-gram model (Mikolov et al., 2013c) to predict the contexts of the current word/entity:

$$\mathcal{L}_m=\sum_{y\in\{en,zh\}}\ \sum_{x_i^y\in\{\mathcal{D}^y,\mathcal{A}^y,\mathcal{G}^y\}}\log P\big(C(x_i^y)\mid x_i^y\big)$$

where x_i^y is either a word or an entity, and C(x_i^y) denotes: (i) the contextual words in a pre-defined window around x_i^y if x_i^y ∈ D^y; (ii) the neighbor entities linked to x_i^y if x_i^y ∈ G^y; (iii) the contextual words of w_j^y if x_i^y is an entity e_i^y in an anchor ⟨w_j^y, e_i^y⟩ ∈ A^y.
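The three context types translate directly into (target, context) training pairs. The sketch below is our simplified reading; the window size, the data formats, and the negative-sampling training loop are assumptions or are omitted:

```python
def skipgram_pairs(corpus, anchors, neighbors, window=5):
    """Yield (target, context) pairs for the three context types C(x)."""
    for i, w in enumerate(corpus):
        span = corpus[max(0, i - window):i] + corpus[i + 1:i + 1 + window]
        for c in span:
            yield w, c                    # (i) word -> window words
            for e in anchors.get(w, ()):  # (iii) anchored entity -> the anchor
                yield e, c                #       word's window words
    for e, ns in neighbors.items():       # (ii) entity -> neighbor entities
        for n in ns:
            yield e, n
```

The same generator also fits the cross-lingual entity regularizer of Section 5.2: feeding it the bilingual EN G^{en-zh} instead of G^y makes the type (ii) pairs span both languages.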
5.2 Cross-lingual Entity Regularizer

The bilingual EN G^{en-zh} merges entities in different languages into a unified network, which makes it possible to use the same objective as in the mono-lingual ENs. Thus, we naturally extend the mono-lingual function to cross-lingual settings:

$$\mathcal{L}_e=\sum_{e_i^y\in\mathcal{G}^{en-zh}}\log P\big(C'(e_i^y)\mid e_i^y\big)$$

where C′(e_i^y) denotes cross-lingual contexts, i.e., neighbor entities in different languages that are linked to e_i^y. Thus, by jointly learning mono-lingual representations with the cross-lingual entity regularizer, words and entities share more common contexts and obtain similar embeddings. As shown in Figure 1, the English entity NBA co-occurs with the words basketball and player in texts, so they are embedded closely in the semantic space. Meanwhile, the cross-lingual linked entities NBA and NBA (zh) have similar representations because they share most of their neighbor entities, e.g., Foust.

5.3 Cross-lingual Sentence Regularizer

Comparable sentences provide cross-lingual co-occurrence of words; thus, we can use them to learn similar embeddings for words that frequently co-occur by minimizing the Euclidean distance:

$$\mathcal{L}_s=\sum_{\langle s_k^{en},\,s_{k'}^{zh}\rangle\in\mathcal{S}^{en-zh}}\big\|\mathbf{s}_k^{en}-\mathbf{s}_{k'}^{zh}\big\|^2$$

where s_k^en and s_k'^zh are sentence embeddings. Taking English as the sample language, we define the sentence embedding as a weighted average of word vectors, where the weights combine two types of attention:

$$\mathbf{s}_k^{en}=\sum_{l=1}^{L}\ \sum_{w_i^{en}\in s_{k,l}^{en}}\psi(e_m^{en},s_{k,l}^{en})\,\psi'(w_i^{en},w_j^{zh})\,\mathbf{w}_i^{en}$$

where s_{k,l}^en|_{l=1}^L are the sentences containing the same entity (as mentioned in Section 4.2), ψ(e_m^en, s_{k,l}^en) is the knowledge attention that aims to select the sentences most informative about entity e_m^en, and ψ′(w_i^en, w_j^zh) is the cross-lingual attention, which weights each word in proportion to its similarity to the words of the comparable sentence.
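A numpy sketch of the regularizer, under explicit assumptions of ours: the knowledge attention ψ over the L candidate sentences is dropped for brevity, and the cross-lingual attention is parameterized as a softmax over dot-product word similarities, which the text only constrains to be proportional to similarity:

```python
import numpy as np

def softmax(x):
    x = np.exp(x - x.max())
    return x / x.sum()

def sentence_embedding(words_src, words_tgt):
    """Attention-weighted average of the source sentence's word vectors,
    weighting each word by its best match in the paired sentence."""
    sim = words_src @ words_tgt.T  # alpha_ij ~ sim(w_i, w_j)
    attn = softmax(sim.max(axis=1))
    return attn @ words_src

def sentence_loss(pairs):
    """L_s: squared Euclidean distance between paired sentence embeddings."""
    return sum(np.sum((sentence_embedding(en, zh) -
                       sentence_embedding(zh, en)) ** 2)
               for en, zh in pairs)
```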
231 ACL 2018 Submission ***. Confidential Review Copy. DO NOT DISTRIBUTE.
ACL 2018 Submission ***. Confidential Review Copy. DO NOT DISTRIBUTE. 000 050 ACL 2018 SubmissionACL 2018 Submission ***. Confidential ***. Confidential Review Review Copy. Copy.DO NOT DO DISTRIBUTE. NOT DISTRIBUTE. 001 Joint Representation Learning of Cross-lingual Words and Entities via 051 ACL 2018 Submission ***. Confidential Review Copy. DO NOT DISTRIBUTE. 002 000 Attentive Distant Supervision 052 050 ACL 2018 SubmissionACL 2018 Submission ***. Confidential ***. Confidential ReviewACL Review Copy.2018 Submission Copy.DO NOT DO DISTRIBUTE. NOT ***. DISTRIBUTE. Confidential Review Copy. DO NOT DISTRIBUTE. 000 Joint Representation Learning of Cross-lingual Words050 and Entities via 000 Joint003 Representation001 Learning of Cross-lingual Words and Entities via 050 053 051 001 004 ACL 2018 Submission ***. ConfidentialAttentive Review Copy. Distant DO NOT DISTRIBUTE. Supervision 051 054 001 000Joint Representation002 Learning of Cross-lingual Words and Entities via 050051 052 002 Attentive Distant Supervision 052 005000Joint Representation003 000 Learning of Cross-lingual Words andACL Entities 2018 Submission via ***. Confidential Review Copy. DO050055 NOT DISTRIBUTE. 053 050 002 001000 Attentive DistantACL 2018 SubmissionACL Supervision 2018 Submission ***. Confidential ***. Confidential Review Review Copy. Copy.DO NOT DO DISTRIBUTE. NOT DISTRIBUTE.051052 050 003 Joint Representation LearningJointAnonymous Representation of Cross-lingual ACL submission Learning Words of and Cross-lingual Entities053 via Words and Entities via 006001 004 001Attentive Distant Supervision 051056 054 051 002001 000Joint Representation Learning of Cross-lingual Words and Entities via052 050051 003 004 AttentiveACL 2018 Submission Distant ***. Supervision Confidential Review Copy. DO NOT DISTRIBUTE.053054 007002 005 002 Attentive Distant Supervision 052057 055 052 003 001 Joint Representation Learning of Cross-lingual Words and Entities053 via 051 004 002 Attentive000 Distant SupervisionAnonymous ACL submission 054 052 050 005 008003 000 003 ACL 2018 Submission ***. Confidential Review Copy. DO055 NOT DISTRIBUTE.053058 050 053 004 000002 006 AnonymousAttentive ACLACL submission 2018 Distant Submission Supervision ***. Confidential Review Copy. DO NOT054 DISTRIBUTE. 052 050 056 005 006 003 001 Joint Representation 001 LearningJoint Representation of Cross-lingual Learning Words and of055056 Cross-lingual Entities via053 Words051 and Entities via 051 009004 004 054059 054 005 001003 007000JointAnonymous Representation ACL submission Learning of Cross-lingual Words and055 Entities via053 050051 057 007 ACL 2018 Submission ***. Confidential ReviewAnonymous Copy. DO NOT DISTRIBUTE.002 ACL submissionAttentiveACL 2018 Distant Submission ***.Supervision ConfidentialAttentive Review Copy. Distant DO057 NOT DISTRIBUTE. Supervision 052 006 006004 010005 002 005Joint Representation Learning of Cross-lingual Words and056056 Entities via054055060 052 055 004 008001 Attentive Distant Supervision 054 051 058 002 000 Anonymous ACL submissionAnonymous ACL submission 052 050 007 008 007005 006 003 003 057057058 055056 053 053 011 005 009000002 006 Attentive Distant Supervision 055 061 052 059 050 056 003 001 Abstract AnonymousJoint Representation ACL submission Most Learning existing work of jointly Cross-lingual models KB and Words text and Entities053 via 051 500 009 008 007 004 ACL 2018 Submission ***. Confidential Review004 Anonymous Copy. DO NOT DISTRIBUTE. 
ACL submission550 058059 057 054 054 008 006 012 006 010001003 007000Joint Representation Learning of Cross-lingual Words058 and Entities056056 062 via053 060 050051 057 002 corpusAttentive to enhance each Distant other by Supervision learning word and 052 501 010 004 005 005 551 060 054055 055 009007 008 008 Joint Representation Learning of Cross-lingual059 Words and057058 Entities via 058 009 013 007 011002004Jointly learning001 word and entity represen- AnonymousAttentive ACL Distant submission SupervisionAnonymous059 ACL submission057 063 054 061 051052 502 003 entity representations552 in a unified vector space. For 053 011 010 009005 006 ACL 2018 Submission006 Abstract ***. Confidential Review Copy. DO NOT DISTRIBUTE. Most existing work060 jointly061 models KB059 and text 055056 056 010 008500 014 008 005Abstracttations benefits 009002 many NLP tasks, while Most has existing work jointlyAttentive models KBDistant and550 text Supervision 060 058058 064 055 052 059 503 012 003 Anonymous example, ACL submission553 (Wang et al., 2014; Yamada et al., 2016; 062 053 012 ACL 2018007 Submission ***. Confidential004ACL 2018 Review Submission Copy. DO ***. NOT Confidential DISTRIBUTE. ReviewAnonymous Copy. DO NOT DISTRIBUTE. ACL submissioncorpus to enhance each other062 by learning word and 057 054 011501 010006 009 006 003 007 551 061 059060 056056 053 057 011504 009 015 013notAbstract been well 010 explored in cross-lingualcorpus Most set- to existing enhance work each jointly other554 bymodels learning KB wordand text and 061 059 065 063 060 502 ACL 2018 Submission ***.004 Confidential Review005 Copy.Jointly DO NOT learningDISTRIBUTE.ACL 2018 Submission word ***. and Confidential entity Review represen-Cao Copy. et DO al. NOT, 2017 DISTRIBUTE.)utilizethecoherenceinforma- 552 054055 505 013 012 011 Jointly Abstract learning 008ACL 2018 word Submission and entity ***. Confidential represen- Review Copy. 008Most DO NOT DISTRIBUTE.existing work jointly models555 entity KB and representations text in062 a063 unified vector space.061 For 058 058 016007010 000 007tings. In this011004 paper, we proposeentitycorpusACL a novel 2018 representations to Submission enhance ***. each Confidential in other a unifiedAnonymous byReview learning vector Copy. DO space.word NOT ACL DISTRIBUTE. and For submission 050 060 066 057057 054 061 012 010503 014 Abstracttations benefits many NLP tasks, whiletionMost to has existing align similar work jointly words553models and entities KB and062 with text sim- 060 064 014 000013 001 005 Joint Representation006 Learning of Cross-lingualAbstract Words 050 andexample, Entities (Wang via Most et al. existing,0632014051064 ; Yamada work jointly et al., models2016; KB and text 055056 506 012008tations500011Jointly benefits learning009 many008 word NLP andACL tasks, entity 2018005 Submission while represen- has ***. Confidentialcorpus009 example,entity Review to enhance Copy. representations ( DOWang NOT each DISTRIBUTE. etother al. in, 2014 a unifiedby 556 ; learningYamada vector wordet space. al., and2016 For ; 550 061062 058058059 055 059 504 Joint017 Representation method Learning that012 of integrates Cross-lingual cross-lingual Words and word Entities corpus via Anonymous to enhance each ACL other submission554 by learning word and 067 062 013507 000 001011 002 015000 notAbstract been wellAttentiveACL explored 2018 Submission Distant in cross-lingual ***. Supervision ConfidentialilarMost050 embedding set- Review existing557Anonymous051 Copy. work DOvectors. 
NOT jointly DISTRIBUTE. AnotherACL models submission approach KB and063052050text in (Han 061 065 015 014000Jointly learning not501tations been word well benefits010 and explored006 manyACL entity 2018 in NLP Submission Submission represen- cross-lingual tasks,007 ***. ***. Confidential while Confidential set- has Review Review Copy. Copy. DO NOT DO DISTRIBUTE. NOT DISTRIBUTE. ACL 2018 Submission ***. Confidential 050 ReviewCao Copy. et DO al. NOT , 2017 DISTRIBUTE.corpus)utilizethecoherenceinforma- to551064 enhance065 each other by learning 060 word and 056057 Joint Representation013 012 Learning009Jointly ofAbstract Cross-lingual learningACLJointα006 2018ki,lj word Representation Submission Words and entityentity ***.010 and Confidential represen-Cao representationsEntitiesexample, Learningsim Review et(wki al. via,w Copy. (,Wang oflj2017)Most DO Cross-lingual NOT et)utilizethecoherenceinforma- in DISTRIBUTE.existing al. a unified, 2014; Words workYamada vector jointly and space. et al.Entities, models2016 For; via KB and text 062063 059 056 060 001 002 505 018009 Joint Representation 003 016 and001Attentive entity Learning000 representation Distant013 of Cross-lingual Supervision learning Jointly Words to enable learning and Entities wordentity corpus and via051 representations toentity enhance052 represen- each in other a unified555 by learning vector space.word053051 and For 050 068 059 066 063 508 001 502 tings.∝ wki Inskm this,wlj paper,slm we proposeet a novelal., 2016558 ; 051Toutanova et al.entity, 2015 representations; 552Wu et al., in a unified vector space. For 014 016 015012tations benefitstings.not many been InAttentive this well NLP011 paper, explored007 tasks, Distant we in while propose cross-lingual Supervision008 has a novelJoint set-∈ Representation ∈ Learning of Cross-lingual tion to Words align similar and Entities words065064066 viaand entities062 with sim- 061 057058 002 003 506 014 000013 004 010tations002Jointly benefits001 learning 007 many word NLP and tasks,example, entity011 whiletionAttentiveCao represen- ( toWang ethas align al.corpus Distant, 2017 et similar al.ACL)utilizethecoherenceinforma-, to Supervision 20182014 enhance words052 Submission; Yamada and053 each ***. entities Confidential et other al. with, by2016 Review556 learning sim-050; Copy. DO NOTword DISTRIBUTE.054052 and 063051064 060 057 061 509 002 019010503 joint Attentive inference014Distant among knowledge SupervisionAbstracttations base benefits and manyexample,entity NLP representations tasks, (Wang559 while052Most et al. has in, existing2014 a unified; Yamada work vector jointly et space. al.553, models2016 For ; KB and text069 060 064 000 Joint 017 Representation method Learning that of integrates Cross-lingual cross-lingual Words2016 and word)learnstorepresententitiesbasedontheir Entities Abstract050 via Most existing work jointly models067 KB and text 015 003 017004016013507 000 method001tings. that In this integrates paper, 003500 cross-lingual we002 propose009 000 a word novel ACL 2018 Submission ***. ConfidentialAttentive Review Copy.053 Distant DO NOT054 DISTRIBUTE. Supervisionilar 050 embedding557051 example, vectors.066 (065Wang Another053067 etal. approach, 2014550063052050; inYamada (Han et al., 2016; 059 003not been well 015 014 explored 000Jointly005 012 learningin008011 cross-lingualnottations been word well benefits and explored set-008 manyACL entity 2018 in NLP Submission represen- cross-lingual 012 tasks, ***. 
Confidentialilartion while embedding toset- has Review align Copy. similar DO vectors. NOT DISTRIBUTE. words Another and entities053 approach with in sim- (Han050 055 064065 061062 058058 062 510 001 504JointJoint Representation Representation text across Learning Learning languages,015 of of Cross-lingualCross-lingual capturing Caoα et mutually Words al. Words, 2017 and entity )utilizethecoherenceinforma- EntitiesandCao representationsEntitiesexample, sim via et( w al. via,w (,Wang5602017)corpus051 et)utilizethecoherenceinforma- in al. a to unified , 2014 enhance; Yamada vector each space. otheret al.554, by2016 For learning; word and 065 004 001 020011002 018004 003 and001Attentive entity representation DistantJointnotAnonymousAbstractki,lj been Representation Supervision well learning ACL explored submission to Learningtextual enable in054 cross-lingualki descriptions oflj Cross-lingualMost set-051 together existing Words052 work with and the jointly structured Entities models054 corpus via re- KB and to enhance text053051 070 each other061 by068 learning word and 005017508004 method001 006 that integratesJoint501s Representation cross-lingual010 ACL 2018 Submission word Learning ***. Confidential ofCross-lingual ∝ w Review s Copy.,w DOs Words NOT DISTRIBUTE.ACL and 2018 SubmissionEntities055 054 et ***. via al. Confidential, 2016558 Review; Cao051Toutanova Copy. et DO al. NOT067,056 2017 DISTRIBUTE.et al.)utilizethecoherenceinforma- , 2015;551Wu et al., 060 016 018 014 tings. 002 In and this015 entity paper, representation 013 we012 proposenotkmAnonymous been learningAttentive a well novel 009 ACLexplored to Distantenable submission in SupervisionACL cross-lingual 2018 Submission ilar embedding set-ki ***. km Confidential lj vectors.lm Review Copy. Another DO NOT052 approach DISTRIBUTE. in (Han 066068 065064 062063 059 511 005 002 016 505 tations benefits 009tings. many InAttentive this NLP Jointly paper, tasks,Abstract Distant learning we000 while propose 013 Supervision word has et a and al. novel, entity2016∈ ; represen-Toutanova∈ AttentiveCao 055 et al. etMost Distant,561 al.2017, 2015 existing)utilizethecoherenceinforma- Supervision052; Wu work et al. jointly, models555 KB and text066 050 059 063 006 021 003 complementary005502 ≈ 004 016 joint knowledge.002 Attentive inferencetion Insteadtings.Distant amongto align Inof knowledge Supervisionre- this similarexample, paper,Jointly wordstion base we learning ( toWang and propose andalignet entities056 word similarentitycorpus al. a, novel2014 and representations withwordsto enhanceentity; Yamada sim- and053 represen- each entities etin other aal. unified with, 2016 by055 learning sim- vector; space.word552 and054052 For071 066 wki018+509005 (wki003) 012and 002 entity 007 representationAnonymous019 ACL learning submission to enable Anonymous ACL lations. submission However, 053055 these 559 methods tion052 to only align068057 focus similar entity on words representations and entities in a062 with unified sim-069 vector space. For 512 019 joint016 inference000014 among000013 tings. knowledge Anonymous In 011 this010 paper, base ACL and001 submissionwe propose etJoint a al. novel, 2016 Representation; Toutanova et Learning562 al., ACL2015 20182016 of; SubmissionWu Cross-lingual)learnstorepresententitiesbasedontheir et al. ***.050, Confidential Words Review050069 and Copy. 
EntitiesDO NOT DISTRIBUTE.066 via 063051064 060061 017 006 007015methodC003 that017004506 integratesnot been cross-lingual wellmethod006 explored that005 integrateswordtations inJointly cross-lingual003 benefits cross-linguallearning014 many set- word2016 NLP word and)learnstorepresententitiesbasedontheir tasks, entity while represen-tion has056 tocorpus align057 similar to enhance words053 each and054 entities other by with556 learning067056 sim- word and065055053067 064 510006 004 003 008 010liance 503 Joint on parallel Representation017 data, we Learningilar automatically method embedding of Cross-lingual that integratesCao vectors. Abstracttations et ilar Words al. Another benefits embedding cross-lingual, 2017 and )utilizethecoherenceinforma- manyEntities approachexample,entity054 056 vectors. NLPword via representations (in tasks, AnotherWang (Han560 while etMost053 approachal. in, has2014 a existing unified ; in 058Yamada ( vectorHan work et space.jointly al.,5532016 For models; KB and text 060 067 019 022 joint000 inference001 020 among knowledgeJoint Representationtext base across and languages, Learning capturing of Cross-lingualAnonymous mutuallymono-lingualAttentive ACL Words submission settings, and Distant Entities and Supervision via051 fewilar050 embedding researches069 vectors.example, have Another (Wang072 et approach al., 2014 in; Yamada (Han070 et al., 2016; 513 007 020008 ! 004 013text005507017 across languages, 001014 007Jointlymethod capturing thatlearning012 integrates011 mutually004 word002 cross-lingual and000 entity 2016 word represen-)learnstorepresententitiesbasedontheirACL 2018 Submission ***.057 Confidential563058 Reviewtextual Copy.054 DO descriptionsNOT DISTRIBUTE.055 together557051057070 with the structured067054 re- 064063052050 061062 007and entity005 018 representation 004 009 015 learning and504000 entity to006 representation enable nottations beenskm well benefits learningAttentive explored manyACL totextual 2018 Distantenable in NLP Submission cross-lingual descriptions tasks, Supervision ***. Confidential while ilar set- together has Review embedding Copy. with DO055 NOT vectors.057 the DISTRIBUTE. structured Another re-054 approach in059 (Han050 554056068 065 018 016511 tings.001 002 In011generate thisJoint paper, Representation cross-lingualAnonymous we propose Anonymous ACL training Learning submission a015 novel data Attentive ACL of via submission Cross-lingual dis- Distant Jointαnotki,ljAbstract Representation beenetSupervision Words al. well , entity2016 and explored; Cao representationsEntitiesToutanovaexample, Learningsim et(w inki al. cross-lingual via,w (,Wang oflj2017 et) Cross-lingual561 al.corpus et)utilizethecoherenceinforma-052 in, Most2015 al. a051 unified, set-2014 to ; existing enhanceWu; Words068Yamada vector etACL work al. each and , 2018space. et jointly otheral.Entities Submission, 2016 For by066models; vialearning ***. KB Confidential wordand text and Review Copy.061 DO NOT DISTRIBUTE. 065 514 008 020 005 023006text across languages,002 008 capturing 018Jointcomplementary 005 mutually Representation003≈ et ACLn al.andACL001 2018,knowledge. 2018 entity2016 Submission Learning Submission; representation ***.Toutanovation Confidential ***. Insteadof Confidential to Cross-lingual Reviewbeen align of058et Copy. learningReview donere- similaral. DO NOT,Copy.5642015 in WordsDISTRIBUTE. to DO cross-lingual words enable NOT; Wu DISTRIBUTE.and055 and etEntities al. entities056 scenarios. 
, via with070052058 sim-Cao et al., 0552017073)utilizethecoherenceinforma- 053051 068 009 008 006 wcomplementary508018+005 (w ) 021 015 knowledge.001and entity 007 013 InsteadrepresentationAnonymous012 of re- ACL learning submissiontextual to enable descriptions∝ w togetherki skm,wlj with059slm056058 thelations. structuredACL 2018 SubmissionHowever, re-et055 al. ***., Confidential these2016558 ; methods051 ReviewToutanova Copy. DOonly NOT et068057DISTRIBUTE. focus al., 2015 on ; 065Wu et071 al., 062063 021 512 019014ki 010 ki003016 joint5052tations inference benefits tings. among not = many been In knowledgeAttentive thisAnonymous well NLP α paper, exploredJointlykm tasks, lations. DistantbaseACL weAbstract learning in while and000α propose submission cross-lingualki,lj SupervisionACL However, has 2018 word a Submission novel et and Anonymous set-al. these∈, entity ***.2016 Confidential methods∈; represen- ACLToutanova Review submission only Copy. focus etMost DO562 al. NOT,053 on2015 DISTRIBUTE.existing; Wu work et060071 al. jointly, models555 069 KB and064066 text 050 019515 009 017joints =006 inference007 amongmethod002 C knowledge that003tant 009 integrates supervision base cross-lingualand over 006 multi-lingual004 016 word002 knowl- tings.2016 In059)learnstorepresententitiesbasedontheir thisexample,565 paper,tionAttentiveCao ( toWang we et056 align al. propose Distant, 2017 et similar al.057entitycorpus)utilizethecoherenceinforma-, aSupervision2014 novel words052 representations to; Yamada enhance and069053059 entities each et inal. other a with, unified2016 by067 sim-056 learning; vector space.word054052 and For 066 010021 009 007 024 509complementary006 004 012 knowledge.002 008 019 Instead liance ofn onre- 2016 paralleljoint)learnstorepresententitiesbasedontheir Attentive inference data,n weilarDistant among automatically embedding knowledge Supervision vectors.060057 base059 Anotherand approach054056 in559071 (Han052tion to align058 similar074 words and entities062 with069 sim- L liance|| on 011 parallel− 022|| data,joint inference we000 automatically013 amongs knowledgeskm wkilations. bases and However,Joint theseIn this Representation methods paper,mono-lingual onlywe propose focus Learning on to2016 settings, learn of)learnstorepresententitiesbasedontheir Cross-lingual cross-lingual and061050 few researches Words and have Entities via072 063 010 022 513 007 015019 003 017004016text506 across languages,014 method000tings.007 thatkm capturingIn∈ this integratestations paper,003Jointly mutuallymono-lingual∈ benefitskmcross-lingual weACL001 learning propose 2018 Submission many wordsettings, a 2016 word NLP***. novel Confidentialand060)learnstorepresententitiesbasedontheir tasks, entityand Review fewwhile represen- Copy. researches DO has NOT057 DISTRIBUTE.corpus563 have to enhance053 054 each072 other by556069 learning050057 word066065053067 and 051064 516 011 010 S tasks014006k,l has that vectors. 
integrates of learning000 word Another tations cross-lingual translationtotext approachenable benefits000 across wordcorpus many in (Han languages to NLPliance enhanceAnonymousAbstract tasks,word each " on while without and other parallel ACL has entity by 572 learningsubmissionCao any"067 data, representations etadditional word al., we2017 and automatically)utilizethecoherenceinforma-054 trans-055Most in existing the 053 same work061 seman-068067 jointly058060∝ models KB050057 and064075056 text 050 027 and entity015516 014 representation not learning004 beentationstextedge well to benefits across enable exploredbases.008 many in languages, Weliance∈ wordscross-lingual NLP006 also tasks, ilar on and embedding propose004 while parallel set-example, capturing filter022 hasedge ACL vectors. twoout 2018( data,Wang bases. SubmissionACL noise, types008 mutuallyet Another2018 weACL al. Submission, 2018 automatically2014 ***.Wewhich approachet SubmissionConfidentialACL ; al.Yamada also ***.2018,ilar Confidential2016 will Submission embeddingin ***. Review propose ( etHan; Confidentialfur-Toutanova al. Review, Copy.***.2016 Confidential vectors. Copy. DO; two Review NOT DOet example, AnotherDISTRIBUTE. ReviewNOT typesCopy.al. , DISTRIBUTE.2015 Copy.DO ilar( approach NOTWang DO;054Wu DISTRIBUTE.embedding NOT et065 DISTRIBUTE. etal. in064, (al.Han2014, ; Yamada058 vectors.566077 et al., 2016056 Another; approach054 in (Han 058 072 018 edge bases.016 020words025 We andtings. also filter013 In thispropose 022 out paper,513 noise, we two propose002 which" types a019 novel willet al. fur-,012Abstract2016 ; AnonymousToutanova joint CaoAttentive inference et ACLexample, et al. al., submission 2015, 2017 Distant (Wang; among)utilizethecoherenceinforma-Wu Mostword et Supervisionare al.al. knowledge, existing,2014 helpful andtextual ; entityYamada work 068 tomono-lingual066 base descriptions jointly505 representations breaket al. and, models2016 down; KB settings, together language and in063 textthe052 same and with gaps seman-few the in researches manystructuredmono-lingual w563m have075 re-s k,l settings,072070 and062 few researches069 have 555 018 520and entity015 representation030 generatenot011 005 learning been well to006 cross-lingual enableexplored003of knowledge 007 inJointly cross-lingual016009 025 learning traininggenerateand008 attention entity 015set- wordACL! representation data and cross-lingualtings. 2018 tion and entity001via to Submission In aligncross-lingual represen-textdis- this011 learning similar lation across paper, training001Joint to words enablemechanism,***. we languages, Representation and propose Confidential data entitiesJoint007 via a with Representation noveldis- which capturing068 sim- ReviewLearning is usually mutually Copy. of Learning Cross-linguallation expensiveDO065 NOT ofmechanism,055 Cross-lingual056570 DISTRIBUTE. Words and2016word061)learnstorepresententitiesbasedontheir053 and andwhich Words Entities entity059 is and usually via representations Entities057 expensive via051058066065 080 in and the051 same061 seman- 057 075 025 028 023 018 and words entity representationand019 filter028018 out011 learning noise,joint004 toet inference enable020which al., 2016 among007 will; Toutanova fur- knowledge005word etnot al. base,and been2015 andwell entity; Wu exploredentity etbeen representations representationsal., in cross-lingual doneACL in a 2018 unifiedcross-lingual set- intion Submissionvector the068 to space. same align For similarscenarios. 
seman- ***.corpus wordsConfidentialtextual to enhance and 054 descriptions entities each Review075069068078 with other061 sim- by togetherCopy. learning DO with!word073∈ NOT055057 and the DISTRIBUTE. structured re- 078 070 523 017 joint026 016 inference517 015method among023 that knowledgetings.012018 integratesand005 notof In been entity knowledgethis basecross-lingual well009 paper,003 and relatedness explored we Abstract word attentionpropose007 inand cross-lingualet entitydemonstrateAnonymous a al. novel, andCao2016 representation set- et; cross-lingualToutanova al. ACL lation, 2017 the submission009)utilizethecoherenceinforma- mechanism, etef- Most al., learning20152016 existing;)learnstorepresententitiesbasedontheirticetWu al.which work space,, et to2016Abstract al. jointly enable, is; toToutanova usually573 modelsenablebeen067 Cao KBexpensiveet done joint et al. and al., 2015, intext inference2017055 cross-lingual; andWu)utilizethecoherenceinforma-066 etMost065 al. among053, existing059 scenarios.567 KBwork062 andjointly057 models KB076 and text 073 068059 019028 014 ACL 2018 Submission generate2016 ***. Confidential )learnstorepresententitiesbasedontheir013ilar cross-lingual Review embeddingAbstractAnonymous Copy.of DO knowledge NOT vectors.tion DISTRIBUTE. training ACLCao to Anotheralign 2018 submissionet Submission al. similar data attention,generate approach 2017ACLcorpus via***. 2018 words)utilizethecoherenceinforma- Confidential in Submissiondis-Most to (Han cross-lingual andenhanceAnonymous existing Review entities ***. cross-lingual Confidential each Copy.069 work with DOother ACLNOT Reviewjointly sim- training by DISTRIBUTE. submissionCopy. learning models DO NOTdata word KB DISTRIBUTE.064 via and dis-text 078 063 joint inference ther among improve knowledge012006 the base007004 performance. and514complementary 008tations005 benefits Inther many exper- knowledge.improve NLP tasks,002 023 while thetext has Instead performance. 002 acrossJointly learningof languages, re- 008 word Intictasks, exper-space, and capturingAttentive entity such to represen-enable as Distantmutually Attentive506 cross-lingual joint Supervisionet Distant inference056 al. 057 , entity Supervision2016 among062 linking,054;055Toutanova KB in and whichbeen 058 et done al.,564 0522015 in cross-lingual; Wu052 et al. scenarios., 058 073 556 019 521of knowledge016 021 joint026 attentioninference tings. In among this 023 and006 paper, 019 knowledge cross-lingual wetext 010 propose017010 base across2016 and)learnstorepresententitiesbasedontheirjoint 009 a020 languages, novel008 inference 016008 method capturingamong006 thatcomplementary knowledgetings.012 mutually integrates In this base example, paper,cross-lingual and knowledge. we ( Wang propose2016 etword)learnstorepresententitiesbasedontheir al.069, a2014 novel Instead; Yamadabeen ofet al. re- done, 2016066 ; in056nentity cross-lingual571textual representationstic space, descriptions060 scenarios. to in069060 enable a unified058 together jointvectorspace.inference with076059067066056058 the For structured among073071062 KB re- and 070 524 019 018 text031 acrossther 016tant languages, improveand020 entity supervision representation capturing013 theattentiontings. 012 performance. 
mutually overIn learning this004 toJointly021026 multi-lingualpaper, toselecttant enable learning Inwe supervision the exper-propose2016 word most)learnstorepresententitiesbasedontheirtion a knowl- and novel overto informative entity alignmay multi-lingual represen- similar Anonymous introduce010corpus words totextualCross-lingual ACL andenhance inevitableknowl- entities submission descriptions each withlations. other errors. sim- 574 together byilar learning068= embedding069 tion Our However,may with to word embeddings alignthe introduceα vectors.and structured similar thesecorpus Another words066 re- inevitableα to054 methods and enhance approach entities063070 each errors.062 with inonly other (Han sim- by focus Our learning embeddings on word081 and 060071 076 029 020 400017518 015024 method300029 thatnot integrates been well cross-lingual explored000 in cross-lingualet word000 al.tic,=2016 space,; + set-Toutanovaγ totion enable et to al. align, 20152 similar joint; Wucorpus wordset inference al. to, and enhance entities 070 among each with other sim- KBby learningkm and 450067lations.065 wordψ and(ki,ljei However,,s k,l568 )079 050 these050 methods only focusl 074 on079 350 026 024text across000 languages,027 013 capturing007 mutually008005fectiveness007 attention 009006011 ofto text our select acrosstextualjoint method = the languages,014 descriptions inferenceACL most003 with 2018 capturingc Submissioninformative an togethermay amongilar average003tationss embedding mutually introduce with ***.Cao benefitsConfidential knowledge the et structured al.tantIn vectors.009, entityinevitable many2017 thistext Review supervision)utilizethecoherenceinforma- Another representations re- NLP paper, across baseCopy. tasks, errors. approach DO and languages we NOT while over in DISTRIBUTE. propose Our In ain has unified (multi-lingualHan this embeddings without vector paper, to057057058 learnspace. any we knowl- Forpropose additionalcross-lingual063055056 076 to061 trans- learn 059 cross-lingual053074077 053 064050 059 020 029 522 017 text across method languages, that integrates019020515 capturingJointly complementary cross-lingual 011000 learning018 mutually005textual 010 word descriptions tant iments, word009017000009 knowledge. and s supervision entityand together separate007 entity represen- Insteadattentionmethod representation with013 overcomplementary tasks the of that multi-lingual structuredre- integrates of to learning word Sentence selectJointly re- cross-lingual knowledge. translationto learning knowl-textualtext theenable Regularizer 070 across descriptionsmost word word Instead and languages informative entity together of represen-n re- with067 without theexample, structured572 any n additional ( re-Wang055 061 et al.079 trans-,0702014050 ; 059 Yamada050 et565 al.060068067,0572016059 ; 063 069 020 019 complementary iments,027 017joint separate 021 knowledge. inferenceand014 024 tasks among entitymethod013liance Insteadrepresentation knowledge of that of word on re- integratestations base parallel translationJointly021 learning001 and benefits cross-lingual learningtextual to manydata,2016 enable001LJointilar descriptions word NLP)learnstorepresententitiesbasedontheir 024 word embeddingweAbstractL tasks,Representation andAbstractAnonymous automatically entity whileJoint togetherL vectors.ilar 011 represen- entity has Representationembedding withACL Another representations Learninglations. the submission structured vectors. 
approachthe Most However, of Learning major AnotherCross-lingualMost inre-existinga in unified existing (Han thesework challenge approachet of 069 vectoral.507 jointly070Cross-lingualwork methodsIn,ilars2016 Words this embeddinginjointly space. models (sHan; lies2016Toutanova paper, only modelsand For KB inandWords vectors. focusEntities)learnstorepresententitiesbasedontheirw measuringKB welations.entity text067s andet onpropose andAnother representationsal. textvia, Entities2015 However, approach the to;064071Wu learn similar-via051063 in et in a unified ( theseal.Han cross-lingualIn, this vector methods051 paper,077 space. For onlywe propose focus 074 on to learn cross-lingual071061 074 557 ጱ trans- 077 072 ڹ attention to022401018519iments, select 500 separate the016 009 mostwords tasks informative andtings.007 of word filter In027edge this translationout paper,ACLL bases. 2018 noise, we Submission propose004 which We ***.liance Confidential aalso novelare will Review004|| on propose helpful= fur-Anonymous Copy. parallel− DO NOT to DISTRIBUTE.two||ACL break data, types submissionentity down we representations550mono-lingual automatically language575 071 inarekm a gaps unified helpfulkm settings, in vector059 many451068ki to066 space.text breakkm and Foracross057 few down569 languages researches language without gaps have054 in any many additional054 021 525 complementary 032 knowledge. edgeand014008 entity Instead bases. representation 030 006 of008 re- We 010 learning 012001 also 012006 022complementary topropose enable 010001 lations.000Joint 015