Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision

Yixin Cao1,2, Lei Hou2*, Juanzi Li2, Zhiyuan Liu2, Chengjiang Li2, Xu Chen2, Tiansi Dong3
1 School of Computing, National University of Singapore, Singapore
2 Department of CST, Tsinghua University, Beijing, China
3 B-IT, University of Bonn, Bonn, Germany
{caoyixin2011,iamlockelightning,successcx}@gmail.com
{houlei,liuzy,lijuanzi}@tsinghua.edu.cn
[email protected]
* Corresponding author.
Abstract

Joint representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings. In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. It captures mutually complementary knowledge, and enables cross-lingual inferences among knowledge bases and texts. Our method does not require parallel corpora, and automatically generates comparable data via distant supervision using multi-lingual knowledge bases. We utilize two types of regularizers to align cross-lingual words and entities, and design knowledge attention and cross-lingual attention to further reduce noise. We conducted a series of experiments on three tasks: word translation, entity relatedness, and cross-lingual entity linking. The results, both qualitative and quantitative, demonstrate the significance of our method.

1 Introduction

Multi-lingual knowledge bases (KBs) store millions of entities and facts in various languages, and provide rich background structural knowledge for understanding texts. On the other hand, text corpora contain a huge amount of statistical information complementary to KBs. Many researchers leverage both types of resources to improve various natural language processing (NLP) tasks, such as machine reading (Yang and Mitchell, 2017) and question answering (He et al., 2017; Hao et al., 2017).

Most existing work jointly models a KB and a text corpus so that each enhances the other, by learning word and entity representations in a unified vector space. For example, Wang et al. (2014), Yamada et al. (2016), and Cao et al. (2017) utilize co-occurrence information to align similar words and entities with similar embedding vectors. Toutanova et al. (2015), Wu et al. (2016), Han et al. (2016), Weston et al. (2013a), and Wang and Li (2016) represent entities based on their textual descriptions together with the structured relations. These methods focus on mono-lingual settings. For cross-lingual tasks (e.g., cross-lingual entity linking), however, they need to introduce additional translation tools, which bring extra costs and inevitable errors (Ji et al., 2015, 2016).

In this paper, we carry out cross-lingual joint representation learning, which has not been fully researched in the literature. We aim at creating a unified space for words and entities in various languages, easing cross-lingual semantic comparison, which benefits from the complementary information in different languages. For instance, two different meanings of the English word center are expressed by two different words in Chinese: center as an activity-specific building is expressed by 中心, while center as the basketball player role is 中锋. Our main challenge is the limited availability of parallel corpora, which are usually either expensive to obtain or only available for certain narrow domains (Gouws et al., 2015). Much work has been done to alleviate this problem. One school of methods uses adversarial techniques or domain adaptation to match linguistic distributions (Zhang et al., 2017b; Barone, 2016; Cao et al., 2016). These methods do not require parallel corpora, but their training process is unstable, and their high complexity restricts them to small-scale data. Another line of work uses pre-existing multi-lingual resources to automatically generate "pseudo bilingual documents" (Vulic and Moens, 2015, 2016). However, negative results have been observed due to the occasionally poor quality of the training data (Vulic and Moens, 2016). All the above methods focus only on words. We consider both words and entities, which makes the parallel data issue even more challenging.
In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. The basic idea is to capture mutually complementary knowledge in a shared semantic space, which enables joint inference among cross-lingual knowledge bases and texts without additional translation. We achieve this by (1) utilizing an existing multi-lingual knowledge base to automatically generate cross-lingual supervision data, (2) learning mono-lingual word and entity representations, and (3) applying a cross-lingual sentence regularizer and a cross-lingual entity regularizer to align similar words and entities with similar embeddings. The entire framework is trained using a unified objective function, which is efficient and applicable to arbitrary language pairs that exist in multi-lingual KBs.

Particularly, we build a bilingual entity network from inter-language links1 in KBs for regularizing cross-lingual entities through a variant of the skip-gram model (Mikolov et al., 2013c). Thus, the mono-lingual structured knowledge of entities is not only extended to cross-lingual settings, but also augmented from other languages. On the other hand, we utilize distant supervision to generate comparable sentences for the cross-lingual sentence regularizer, which models co-occurrence information across languages. Compared with "pseudo bilingual documents", comparable sentences achieve higher quality, because they rely not only on shared semantics at the document level, but also on cross-lingual information at the sentence level. We further introduce two attention mechanisms, knowledge attention and cross-lingual attention, to select informative data in comparable sentences.

Our contributions can be summarized as follows:

• We propose a novel method that jointly learns representations of not only cross-lingual words but also cross-lingual entities in a unified vector space, aiming to enhance embedding quality through their complementary semantics.

• Our model introduces distant supervision coupled with attention mechanisms to generate comparable data as cross-lingual supervision, which can benefit many other cross-lingual analyses.

• We conducted qualitative analysis to give an intuitive impression of our embeddings, and quantitative analysis on three tasks: word translation, entity relatedness, and cross-lingual entity linking. Experimental results show significant improvements on all three tasks.

1 https://en.wikipedia.org/wiki/Help:Interlanguage_links

2 Related Work

Joint representation learning of words and entities attracts much attention in the fields of Entity Linking (Zhang et al., 2017a; Cao et al., 2018), Relation Extraction (Weston et al., 2013b), and so on, yet little of this work addresses cross-lingual settings. We therefore investigate cross-lingual word embedding models (Ruder et al., 2017), and classify them into three groups according to the parallel corpora used as supervision: (i) methods requiring a parallel corpus with aligned words as a constraint for bilingual word embedding learning (Klementiev et al., 2012; Zou et al., 2013; Wu et al., 2014; Luong et al., 2015; Ammar et al., 2016; Soricut and Ding, 2016); (ii) methods using parallel sentences (i.e., translated sentence pairs) as the semantic composition of multi-lingual words (Gouws et al., 2015; Kociský et al., 2014; Hermann and Blunsom, 2014; Chandar et al., 2014; Shi et al., 2015; Mogadala and Rettinger, 2016); and (iii) methods requiring a bilingual lexicon to map words from one language into the other (Mikolov et al., 2013b; Faruqui and Dyer, 2014; Xiao and Guo, 2014).

The major weakness of these methods is the limited availability of parallel corpora. One remedy is to use existing multi-lingual resources (i.e., multi-lingual KBs). Camacho-Collados et al. (2015) combine several KBs (Wikipedia, WordNet and BabelNet) and leverage multi-lingual synsets to learn word embeddings at the sense level through an extra post-processing step. Artetxe et al. (2017) start from a small bilingual lexicon and use a self-learning approach to induce the structural similarity of embedding spaces. Vulic and Moens (2015, 2016) collect comparable documents on the same themes from multi-lingual Wikipedia, then shuffle and merge them to build "pseudo bilingual documents" as training corpora. However, the quality of "pseudo bilingual documents" is difficult to control, resulting in poor performance on several cross-lingual tasks (Vulic and Moens, 2016).
Another remedy matches linguistic distributions via adversarial training (Barone, 2016; Zhang et al., 2017b; Lample et al., 2018) or domain adaptation (Cao et al., 2016). However, these methods suffer from unstable training and high complexity, which either limits the scalability of the vocabulary or relies on a strong distribution assumption.

Inspired by Vulic and Moens (2016), we generate high-quality comparable sentences via distant supervision, which is one of the most promising approaches to addressing the issue of sparse training data and performs well in relation extraction (Lin et al., 2017a; Mintz et al., 2009; Zeng et al., 2015; Hoffmann et al., 2011; Surdeanu et al., 2012). Our comparable sentences may further benefit many other cross-lingual tasks, such as information retrieval (Dong et al., 2014).

Figure 1: The overview framework of our method. The inputs and outputs of each step are listed in three levels. There are three main components of joint representation learning. Red texts with brackets are anchors, dashed lines denote entity relations, and solid lines are cross-lingual links.

3 Preliminaries and Framework

3.1 Preliminaries

Given a multi-lingual KB, we take (i) a text corpus, (ii) entities and their relations, and (iii) a set of anchors as inputs, and learn embeddings for each word and each entity in various languages. For clarity, we use English and Chinese as sample languages in the rest of the paper, and use the superscript y ∈ {en, zh} to denote language-specific parameters.2

We use multi-lingual Wikipedia as the KB, including a set of entities E^y = {e_i^y} and their articles. We concatenate these articles together to form the text corpus D^y = ⟨w_1^y, ..., w_i^y, ..., w_|D|^y⟩. Hyperlinks in articles are denoted by anchors A^y = {⟨w_i^y, e_j^y⟩}, indicating that word w_i^y refers to entity e_j^y. G^y = (E^y, R^y) is the mono-lingual Entity Network (EN), where ⟨e_i^y, e_j^y⟩ ∈ R^y if there is a link between e_i^y and e_j^y. We use inter-language links in Wikipedia as cross-lingual links R^{en-zh} = {⟨e_i^en, e_i'^zh⟩}, indicating that e_i^en and e_i'^zh refer to the same thing in English and Chinese. Cross-lingual word and entity representation learning maps words and entities in different languages into a unified semantic space, in which each word and entity obtains an embedding vector w_i^y or e_j^y.3

2 We choose English and Chinese as example languages because they are top-ranked by total number of speakers; the full list can be found at https://en.wikipedia.org/wiki/Lists_of_languages_by_number_of_speakers.
3 For cross-lingual linked entities sharing the same string (e.g., NBA and NBA (zh)), an infrequent situation between languages, we use separate representations to keep the training objective consistent and avoid confusion.
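To make the notation concrete, the following sketch (ours, not the authors') renders the inputs of Section 3.1 as plain Python containers; the class name, field names, and toy values are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MonolingualKB:
    corpus: list                                 # D^y: concatenated article text as a token sequence
    anchors: set = field(default_factory=set)    # A^y: (word, entity) pairs from article hyperlinks
    relations: set = field(default_factory=set)  # R^y: (e_i, e_j) pairs linked in the KB

en = MonolingualKB(
    corpus="Lawrence Michael Foust was an American basketball player".split(),
    anchors={("Foust", "Lawrence_Michael_Foust"), ("NBA", "NBA")},
    relations={("Lawrence_Michael_Foust", "NBA")},
)

# R^{en-zh}: inter-language links pairing entities that refer to the same thing.
cross_links = {("NBA", "NBA_(zh)"), ("Lawrence_Michael_Foust", "福斯特")}
```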
3.2 Framework

To alleviate the heavy burden of limited parallel corpora and additional translation efforts, we utilize existing multi-lingual resources to distantly supervise cross-lingual word and entity representation learning, so that the shared embedding space supports joint inference among KBs and texts across languages. As shown in Figure 1, our framework has two steps: (1) Cross-lingual Supervision Data Generation builds a bilingual entity network and generates comparable sentences based on cross-lingual links; (2) Joint Representation Learning learns cross-lingual word and entity embeddings using a unified objective function. Our assumption throughout the entire framework is as follows: the more words/entities two contexts share, the more similar they are.

As shown in Figure 1, we build a bilingual EN G^{en-zh} using G^en, G^zh and the cross-lingual links R^{en-zh}, so that entities in different languages are connected in a unified network, which facilitates cross-lingual entity alignment. Meanwhile, from KB articles, we extract comparable sentences S^{en-zh} = {⟨s_k^en, s_k^zh⟩} as high-quality parallel data to align similar words in different languages.

Based on the generated cross-lingual data G^{en-zh}, S^{en-zh} and the mono-lingual data D^y, A^y, where y ∈ {en, zh}, we jointly learn cross-lingual word and entity embeddings through three components: (1) Mono-lingual Representation Learning, which learns mono-lingual word and entity embeddings for each language by modeling co-occurrence information through a variant of the skip-gram model (Mikolov et al., 2013c). (2) Cross-lingual Entity Regularizer, which aligns entities that refer to the same thing in different languages by extending the mono-lingual model to the bilingual EN. For example, the entity Foust in English and the entity 福斯特 (Foust) in Chinese are embedded closely in the semantic space because they share common neighbors across the two languages, such as All-star and NBA 选秀 (draft). (3) Cross-lingual Sentence Regularizer, which models cross-lingual co-occurrence at the sentence level so that translated words obtain similar embeddings. For example, the English word basketball and the translated Chinese word 篮球 frequently co-occur in pairs of comparable sentences; therefore, their vector representations shall be close in the semantic space. The above components are trained jointly under a unified objective function.
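This section states that the three components share one objective but defers the formulas to Section 5. A natural reading of the combined objective, with the trade-off weight γ being our assumption, is

$$\max\ \mathcal{L}=\mathcal{L}_m+\mathcal{L}_e-\gamma\,\mathcal{L}_s$$

where L_m and L_e (Sections 5.1 and 5.2) are log-likelihoods to be maximized, while L_s (Section 5.3) is a distance penalty to be minimized, hence the negative sign.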
4 Cross-lingual Supervision Data Generation

This section introduces how to build the bilingual entity network G^{en-zh} and the comparable sentences S^{en-zh} from a multi-lingual KB.

4.1 Bilingual Entity Network Construction

Entities with cross-lingual links refer to the same thing, which implies they are equivalent across languages. Conventional knowledge representation methods only add an edge between e_i^en and e_i'^zh indicating a special "equivalent" relation (Zhu et al., 2017). Instead, we build G^{en-zh} = (E^en ∪ E^zh, R^en ∪ R^zh ∪ R~^{en-zh}) by enriching the neighbors of cross-lingual linked entities. That is, we add edges R~^{en-zh} between the two mono-lingual ENs by letting all neighbors of e_i^en be neighbors of e_i'^zh, and vice versa, if ⟨e_i^en, e_i'^zh⟩ ∈ R^{en-zh}.

G^{en-zh} extends G^en and G^zh to bilingual settings in a natural way. It not only keeps the objective consistent with the mono-lingual ENs (entities, no matter in which language, will be embedded closely if they share common neighbors), but also enhances each EN with more neighbors from the foreign language. Following the method in Zhu et al. (2017), there would be no edge between the Chinese entity 福斯特 (Foust) and the English entity Pistons, which implies the wrong fact that 福斯特 (Foust) does not belong to the Pistons. Our method recovers the missing relation between the entities 福斯特 (Foust) and 活塞队 (Pistons) in the incomplete Chinese KB through the corresponding English common neighbors, such as All-star and NBA, as illustrated in Figure 1.
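A minimal sketch of this neighbor enrichment, assuming simple edge sets as input; this is our illustration of the rule, not the authors' code:

```python
from collections import defaultdict

def build_bilingual_en(edges_en, edges_zh, cross_links):
    """Merge two mono-lingual ENs; for each cross-lingual pair <e_en, e_zh>,
    every neighbor of e_en also becomes a neighbor of e_zh, and vice versa."""
    neighbors = defaultdict(set)
    for u, v in list(edges_en) + list(edges_zh):
        neighbors[u].add(v)
        neighbors[v].add(u)
    for e_en, e_zh in cross_links:  # add the edges of R~^{en-zh}
        shared = neighbors[e_en] | neighbors[e_zh]
        neighbors[e_en] |= shared
        neighbors[e_zh] |= shared
    return neighbors

g = build_bilingual_en(
    edges_en={("Foust", "NBA"), ("Foust", "Pistons")},
    edges_zh={("福斯特", "NBA_(zh)")},
    cross_links={("Foust", "福斯特")},
)
assert "Pistons" in g["福斯特"]  # the missing 福斯特-Pistons relation is recovered
```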
4.2 Comparable Sentences Generation

To supervise the cross-lingual representation learning of words, we automatically generate comparable sentences as cross-lingual training data. Comparable sentences are not translated sentence pairs, but sentences on the same topic in different languages. As shown in the middle layer of Figure 1, the following pair of sentences is comparable: (1) "Lawrence Michael Foust was an American basketball player who spent 12 seasons in NBA"; (2) "拉里·福斯特 (Lawrence Foust) 是 (was) 美国 (American) NBA 联盟 (association) 的 (of) 前 (former) 职业 (professional) 篮球 (basketball) 运动员 (player)".

Inspired by the distant supervision technique in relation extraction, we assume that a sentence s_k^en in the Wikipedia article of entity e_i^en explicitly or implicitly describes e_i^en (Yamada et al., 2017), and that s_k^en expresses a relation between e_i^en and e_j^en if another entity e_j^en occurs in s_k^en. Meanwhile, we find a comparable sentence s_k'^zh in the other language, which contains e_j'^zh and appears in the Wikipedia article of the Chinese entity e_i'^zh, where ⟨e_i^en, e_i'^zh⟩, ⟨e_j^en, e_j'^zh⟩ ∈ R^{en-zh}. As shown in Figure 1, the sentences in the second level are comparable due to the similar theme of the relation between the entities Foust and NBA. To find this type of sentence, we search the anchors in the English article and the Chinese article of the cross-lingual entity Foust, respectively, and extract the sentences that include another cross-lingual entity, NBA. Comparable sentences can thus be regarded as cross-lingual contexts.

Unfortunately, comparable sentences suffer from two issues caused by distant supervision:

Wrong labelling. Taking English as the sample language, there may be several sentences s_{k,l}^en|_{l=1}^L containing the same entity e_j^en in the article of e_i^en. A straightforward solution is to concatenate them into a longer sentence s_k^en, but this increases the chance of including unrelated sentences.

Unbalanced information. Sometimes the pair of sentences conveys unbalanced information; e.g., the English sentence in the middle layer of Figure 1 states that Foust spent 12 seasons in NBA, while the comparable Chinese sentence does not.

To address these issues, we propose knowledge attention and cross-lingual attention to filter out unrelated information at the sentence level and at the word level, respectively.
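The generation rule amounts to a nested scan over cross-linked article pairs. The sketch below is a simplified rendering under our assumed data layout (sentences pre-split, anchors pre-extracted); it implements only the matching rule, not the attention-based filtering:

```python
def comparable_sentences(articles_en, articles_zh, cross):
    """articles_*: {entity e_i: [(sentence, set of anchored entities)]};
    cross: dict mapping an English entity to its cross-linked Chinese entity."""
    pairs = []
    for e_i_en, sents_en in articles_en.items():
        e_i_zh = cross.get(e_i_en)
        if e_i_zh not in articles_zh:
            continue
        for s_en, ents_en in sents_en:
            for s_zh, ents_zh in articles_zh[e_i_zh]:
                # comparable if some anchored e_j^en is cross-linked to an e_j^zh
                if any(cross.get(e_j) in ents_zh for e_j in ents_en):
                    pairs.append((s_en, s_zh))
    return pairs

pairs = comparable_sentences(
    {"Foust": [("Lawrence Michael Foust was an American basketball player "
                "who spent 12 seasons in NBA", {"NBA"})]},
    {"福斯特": [("拉里·福斯特是美国NBA联盟的前职业篮球运动员", {"NBA_(zh)"})]},
    {"Foust": "福斯特", "NBA": "NBA_(zh)"},
)  # -> one comparable pair about Foust and the NBA
```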
5 Joint Representation Learning

As shown in Figure 2, there are three components in learning cross-lingual word and entity representations, which are trained jointly. In this section, we describe them in detail.

5.1 Mono-lingual Representation Learning

Following Yamada et al. (2016) and Cao et al. (2017), we learn mono-lingual word/entity embeddings based on the corpus D^y, the anchors A^y and the entity network G^y. Capturing the co-occurrence information among words and entities, these embeddings serve as the foundation and will be further extended to bilingual settings using the proposed cross-lingual regularizers, which are detailed in the following subsections. Mono-lingually, we utilize a variant of the skip-gram model (Mikolov et al., 2013c) to predict the contexts of the current word/entity:

$$\mathcal{L}_m=\sum_{y\in\{en,zh\}}\ \sum_{x_i^y\in\{\mathcal{D}^y,\mathcal{A}^y,\mathcal{G}^y\}}\log P\big(C(x_i^y)\mid x_i^y\big)$$

where x_i^y is either a word or an entity, and C(x_i^y) denotes: (i) the contextual words in a pre-defined window around x_i^y if x_i^y ∈ D^y; (ii) the neighbor entities linked to x_i^y if x_i^y ∈ G^y; (iii) the contextual words of w_j^y if x_i^y is an entity e_i^y in an anchor ⟨w_j^y, e_i^y⟩ ∈ A^y.
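The three context types translate directly into (target, context) training pairs. The sketch below is our simplified reading; the window size, the data formats, and the negative-sampling training loop are assumptions or are omitted:

```python
def skipgram_pairs(corpus, anchors, neighbors, window=5):
    """Yield (target, context) pairs for the three context types C(x)."""
    for i, w in enumerate(corpus):
        span = corpus[max(0, i - window):i] + corpus[i + 1:i + 1 + window]
        for c in span:
            yield w, c                    # (i) word -> window words
            for e in anchors.get(w, ()):  # (iii) anchored entity -> the anchor
                yield e, c                #       word's window words
    for e, ns in neighbors.items():       # (ii) entity -> neighbor entities
        for n in ns:
            yield e, n
```

The same generator also fits the cross-lingual entity regularizer of Section 5.2: feeding it the bilingual EN G^{en-zh} instead of G^y makes the type (ii) pairs span both languages.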
5.2 Cross-lingual Entity Regularizer

The bilingual EN G^{en-zh} merges entities in different languages into a unified network, which makes it possible to use the same objective as in the mono-lingual ENs. Thus, we naturally extend the mono-lingual function to cross-lingual settings:

$$\mathcal{L}_e=\sum_{e_i^y\in\mathcal{G}^{en-zh}}\log P\big(C'(e_i^y)\mid e_i^y\big)$$

where C′(e_i^y) denotes cross-lingual contexts, i.e., neighbor entities in different languages that are linked to e_i^y. Thus, by jointly learning mono-lingual representations with the cross-lingual entity regularizer, words and entities share more common contexts and obtain similar embeddings. As shown in Figure 1, the English entity NBA co-occurs with the words basketball and player in texts, so they are embedded closely in the semantic space. Meanwhile, the cross-lingual linked entities NBA and NBA (zh) have similar representations because they share most of their neighbor entities, e.g., Foust.

5.3 Cross-lingual Sentence Regularizer

Comparable sentences provide cross-lingual co-occurrence of words; thus, we can use them to learn similar embeddings for words that frequently co-occur by minimizing the Euclidean distance:

$$\mathcal{L}_s=\sum_{\langle s_k^{en},\,s_{k'}^{zh}\rangle\in\mathcal{S}^{en-zh}}\big\|\mathbf{s}_k^{en}-\mathbf{s}_{k'}^{zh}\big\|^2$$

where s_k^en and s_k'^zh are sentence embeddings. Taking English as the sample language, we define the sentence embedding as a weighted average of word vectors, where the weights combine two types of attention:

$$\mathbf{s}_k^{en}=\sum_{l=1}^{L}\ \sum_{w_i^{en}\in s_{k,l}^{en}}\psi(e_m^{en},s_{k,l}^{en})\,\psi'(w_i^{en},w_j^{zh})\,\mathbf{w}_i^{en}$$

where s_{k,l}^en|_{l=1}^L are the sentences containing the same entity (as mentioned in Section 4.2), ψ(e_m^en, s_{k,l}^en) is the knowledge attention that aims to select the sentences most informative about entity e_m^en, and ψ′(w_i^en, w_j^zh) is the cross-lingual attention, which weights each word in proportion to its similarity to the words of the comparable sentence.
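A numpy sketch of the regularizer, under explicit assumptions of ours: the knowledge attention ψ over the L candidate sentences is dropped for brevity, and the cross-lingual attention is parameterized as a softmax over dot-product word similarities, which the text only constrains to be proportional to similarity:

```python
import numpy as np

def softmax(x):
    x = np.exp(x - x.max())
    return x / x.sum()

def sentence_embedding(words_src, words_tgt):
    """Attention-weighted average of the source sentence's word vectors,
    weighting each word by its best match in the paired sentence."""
    sim = words_src @ words_tgt.T  # alpha_ij ~ sim(w_i, w_j)
    attn = softmax(sim.max(axis=1))
    return attn @ words_src

def sentence_loss(pairs):
    """L_s: squared Euclidean distance between paired sentence embeddings."""
    return sum(np.sum((sentence_embedding(en, zh) -
                       sentence_embedding(zh, en)) ** 2)
               for en, zh in pairs)
```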
231 ACL 2018 Submission ***. Confidential Review Copy. DO NOT DISTRIBUTE.
ACL 2018 Submission ***. Confidential Review Copy. DO NOT DISTRIBUTE. 000 050 ACL 2018 SubmissionACL 2018 Submission ***. Confidential ***. Confidential Review Review Copy. Copy.DO NOT DO DISTRIBUTE. NOT DISTRIBUTE. 001 Joint Representation Learning of Cross-lingual Words and Entities via 051 ACL 2018 Submission ***. Confidential Review Copy. DO NOT DISTRIBUTE. 002 000 Attentive Distant Supervision 052 050 ACL 2018 SubmissionACL 2018 Submission ***. Confidential ***. Confidential ReviewACL Review Copy.2018 Submission Copy.DO NOT DO DISTRIBUTE. NOT ***. DISTRIBUTE. Confidential Review Copy. DO NOT DISTRIBUTE. 000 Joint Representation Learning of Cross-lingual Words050 and Entities via 000 Joint003 Representation001 Learning of Cross-lingual Words and Entities via 050 053 051 001 004 ACL 2018 Submission ***. ConfidentialAttentive Review Copy. Distant DO NOT DISTRIBUTE. Supervision 051 054 001 000Joint Representation002 Learning of Cross-lingual Words and Entities via 050051 052 002 Attentive Distant Supervision 052 005000Joint Representation003 000 Learning of Cross-lingual Words andACL Entities 2018 Submission via ***. Confidential Review Copy. DO050055 NOT DISTRIBUTE. 053 050 002 001000 Attentive DistantACL 2018 SubmissionACL Supervision 2018 Submission ***. Confidential ***. Confidential Review Review Copy. Copy.DO NOT DO DISTRIBUTE. NOT DISTRIBUTE.051052 050 003 Joint Representation LearningJointAnonymous Representation of Cross-lingual ACL submission Learning Words of and Cross-lingual Entities053 via Words and Entities via 006001 004 001Attentive Distant Supervision 051056 054 051 002001 000Joint Representation Learning of Cross-lingual Words and Entities via052 050051 003 004 AttentiveACL 2018 Submission Distant ***. Supervision Confidential Review Copy. DO NOT DISTRIBUTE.053054 007002 005 002 Attentive Distant Supervision 052057 055 052 003 001 Joint Representation Learning of Cross-lingual Words and Entities053 via 051 004 002 Attentive000 Distant SupervisionAnonymous ACL submission 054 052 050 005 008003 000 003 ACL 2018 Submission ***. Confidential Review Copy. DO055 NOT DISTRIBUTE.053058 050 053 004 000002 006 AnonymousAttentive ACLACL submission 2018 Distant Submission Supervision ***. Confidential Review Copy. DO NOT054 DISTRIBUTE. 052 050 056 005 006 003 001 Joint Representation 001 LearningJoint Representation of Cross-lingual Learning Words and of055056 Cross-lingual Entities via053 Words051 and Entities via 051 009004 004 054059 054 005 001003 007000JointAnonymous Representation ACL submission Learning of Cross-lingual Words and055 Entities via053 050051 057 007 ACL 2018 Submission ***. Confidential ReviewAnonymous Copy. DO NOT DISTRIBUTE.002 ACL submissionAttentiveACL 2018 Distant Submission ***.Supervision ConfidentialAttentive Review Copy. Distant DO057 NOT DISTRIBUTE. Supervision 052 006 006004 010005 002 005Joint Representation Learning of Cross-lingual Words and056056 Entities via054055060 052 055 004 008001 Attentive Distant Supervision 054 051 058 002 000 Anonymous ACL submissionAnonymous ACL submission 052 050 007 008 007005 006 003 003 057057058 055056 053 053 011 005 009000002 006 Attentive Distant Supervision 055 061 052 059 050 056 003 001 Abstract AnonymousJoint Representation ACL submission Most Learning existing work of jointly Cross-lingual models KB and Words text and Entities053 via 051 500 009 008 007 004 ACL 2018 Submission ***. Confidential Review004 Anonymous Copy. DO NOT DISTRIBUTE. 
ACL submission550 058059 057 054 054 008 006 012 006 010001003 007000Joint Representation Learning of Cross-lingual Words058 and Entities056056 062 via053 060 050051 057 002 corpusAttentive to enhance each Distant other by Supervision learning word and 052 501 010 004 005 005 551 060 054055 055 009007 008 008 Joint Representation Learning of Cross-lingual059 Words and057058 Entities via 058 009 013 007 011002004Jointly learning001 word and entity represen- AnonymousAttentive ACL Distant submission SupervisionAnonymous059 ACL submission057 063 054 061 051052 502 003 entity representations552 in a unified vector space. For 053 011 010 009005 006 ACL 2018 Submission006 Abstract ***. Confidential Review Copy. DO NOT DISTRIBUTE. Most existing work060 jointly061 models KB059 and text 055056 056 010 008500 014 008 005Abstracttations benefits 009002 many NLP tasks, while Most has existing work jointlyAttentive models KBDistant and550 text Supervision 060 058058 064 055 052 059 503 012 003 Anonymous example, ACL submission553 (Wang et al., 2014; Yamada et al., 2016; 062 053 012 ACL 2018007 Submission ***. Confidential004ACL 2018 Review Submission Copy. DO ***. NOT Confidential DISTRIBUTE. ReviewAnonymous Copy. DO NOT DISTRIBUTE. ACL submissioncorpus to enhance each other062 by learning word and 057 054 011501 010006 009 006 003 007 551 061 059060 056056 053 057 011504 009 015 013notAbstract been well 010 explored in cross-lingualcorpus Most set- to existing enhance work each jointly other554 bymodels learning KB wordand text and 061 059 065 063 060 502 ACL 2018 Submission ***.004 Confidential Review005 Copy.Jointly DO NOT learningDISTRIBUTE.ACL 2018 Submission word ***. and Confidential entity Review represen-Cao Copy. et DO al. NOT, 2017 DISTRIBUTE.)utilizethecoherenceinforma- 552 054055 505 013 012 011 Jointly Abstract learning 008ACL 2018 word Submission and entity ***. Confidential represen- Review Copy. 008Most DO NOT DISTRIBUTE.existing work jointly models555 entity KB and representations text in062 a063 unified vector space.061 For 058 058 016007010 000 007tings. In this011004 paper, we proposeentitycorpusACL a novel 2018 representations to Submission enhance ***. each Confidential in other a unifiedAnonymous byReview learning vector Copy. DO space.word NOT ACL DISTRIBUTE. and For submission 050 060 066 057057 054 061 012 010503 014 Abstracttations benefits many NLP tasks, whiletionMost to has existing align similar work jointly words553models and entities KB and062 with text sim- 060 064 014 000013 001 005 Joint Representation006 Learning of Cross-lingualAbstract Words 050 andexample, Entities (Wang via Most et al. existing,0632014051064 ; Yamada work jointly et al., models2016; KB and text 055056 506 012008tations500011Jointly benefits learning009 many008 word NLP andACL tasks, entity 2018005 Submission while represen- has ***. Confidentialcorpus009 example,entity Review to enhance Copy. representations ( DOWang NOT each DISTRIBUTE. etother al. in, 2014 a unifiedby 556 ; learningYamada vector wordet space. al., and2016 For ; 550 061062 058058059 055 059 504 Joint017 Representation method Learning that012 of integrates Cross-lingual cross-lingual Words and word Entities corpus via Anonymous to enhance each ACL other submission554 by learning word and 067 062 013507 000 001011 002 015000 notAbstract been wellAttentiveACL explored 2018 Submission Distant in cross-lingual ***. Supervision ConfidentialilarMost050 embedding set- Review existing557Anonymous051 Copy. work DOvectors. 
NOT jointly DISTRIBUTE. AnotherACL models submission approach KB and063052050text in (Han 061 065 015 014000Jointly learning not501tations been word well benefits010 and explored006 manyACL entity 2018 in NLP Submission Submission represen- cross-lingual tasks,007 ***. ***. Confidential while Confidential set- has Review Review Copy. Copy. DO NOT DO DISTRIBUTE. NOT DISTRIBUTE. ACL 2018 Submission ***. Confidential 050 ReviewCao Copy. et DO al. NOT , 2017 DISTRIBUTE.corpus)utilizethecoherenceinforma- to551064 enhance065 each other by learning 060 word and 056057 Joint Representation013 012 Learning009Jointly ofAbstract Cross-lingual learningACLJointα006 2018ki,lj word Representation Submission Words and entityentity ***.010 and Confidential represen-Cao representationsEntitiesexample, Learningsim Review et(wki al. via,w Copy. (,Wang oflj2017)Most DO Cross-lingual NOT et)utilizethecoherenceinforma- in DISTRIBUTE.existing al. a unified, 2014; Words workYamada vector jointly and space. et al.Entities, models2016 For; via KB and text 062063 059 056 060 001 002 505 018009 Joint Representation 003 016 and001Attentive entity Learning000 representation Distant013 of Cross-lingual Supervision learning Jointly Words to enable learning and Entities wordentity corpus and via051 representations toentity enhance052 represen- each in other a unified555 by learning vector space.word053051 and For 050 068 059 066 063 508 001 502 tings.∝ wki Inskm this,wlj paper,slm we proposeet a novelal., 2016558 ; 051Toutanova et al.entity, 2015 representations; 552Wu et al., in a unified vector space. For 014 016 015012tations benefitstings.not many been InAttentive this well NLP011 paper, explored007 tasks, Distant we in while propose cross-lingual Supervision008 has a novelJoint set-∈ Representation ∈ Learning of Cross-lingual tion to Words align similar and Entities words065064066 viaand entities062 with sim- 061 057058 002 003 506 014 000013 004 010tations002Jointly benefits001 learning 007 many word NLP and tasks,example, entity011 whiletionAttentiveCao represen- ( toWang ethas align al.corpus Distant, 2017 et similar al.ACL)utilizethecoherenceinforma-, to Supervision 20182014 enhance words052 Submission; Yamada and053 each ***. entities Confidential et other al. with, by2016 Review556 learning sim-050; Copy. DO NOTword DISTRIBUTE.054052 and 063051064 060 057 061 509 002 019010503 joint Attentive inference014Distant among knowledge SupervisionAbstracttations base benefits and manyexample,entity NLP representations tasks, (Wang559 while052Most et al. has in, existing2014 a unified; Yamada work vector jointly et space. al.553, models2016 For ; KB and text069 060 064 000 Joint 017 Representation method Learning that of integrates Cross-lingual cross-lingual Words2016 and word)learnstorepresententitiesbasedontheir Entities Abstract050 via Most existing work jointly models067 KB and text 015 003 017004016013507 000 method001tings. that In this integrates paper, 003500 cross-lingual we002 propose009 000 a word novel ACL 2018 Submission ***. ConfidentialAttentive Review Copy.053 Distant DO NOT054 DISTRIBUTE. Supervisionilar 050 embedding557051 example, vectors.066 (065Wang Another053067 etal. approach, 2014550063052050; inYamada (Han et al., 2016; 059 003not been well 015 014 explored 000Jointly005 012 learningin008011 cross-lingualnottations been word well benefits and explored set-008 manyACL entity 2018 in NLP Submission represen- cross-lingual 012 tasks, ***. 
Confidentialilartion while embedding toset- has Review align Copy. similar DO vectors. NOT DISTRIBUTE. words Another and entities053 approach with in sim- (Han050 055 064065 061062 058058 062 510 001 504JointJoint Representation Representation text across Learning Learning languages,015 of of Cross-lingualCross-lingual capturing Caoα et mutually Words al. Words, 2017 and entity )utilizethecoherenceinforma- EntitiesandCao representationsEntitiesexample, sim via et( w al. via,w (,Wang5602017)corpus051 et)utilizethecoherenceinforma- in al. a to unified , 2014 enhance; Yamada vector each space. otheret al.554, by2016 For learning; word and 065 004 001 020011002 018004 003 and001Attentive entity representation DistantJointnotAnonymousAbstractki,lj been Representation Supervision well learning ACL explored submission to Learningtextual enable in054 cross-lingualki descriptions oflj Cross-lingualMost set-051 together existing Words052 work with and the jointly structured Entities models054 corpus via re- KB and to enhance text053051 070 each other061 by068 learning word and 005017508004 method001 006 that integratesJoint501s Representation cross-lingual010 ACL 2018 Submission word Learning ***. Confidential ofCross-lingual ∝ w Review s Copy.,w DOs Words NOT DISTRIBUTE.ACL and 2018 SubmissionEntities055 054 et ***. via al. Confidential, 2016558 Review; Cao051Toutanova Copy. et DO al. NOT067,056 2017 DISTRIBUTE.et al.)utilizethecoherenceinforma- , 2015;551Wu et al., 060 016 018 014 tings. 002 In and this015 entity paper, representation 013 we012 proposenotkmAnonymous been learningAttentive a well novel 009 ACLexplored to Distantenable submission in SupervisionACL cross-lingual 2018 Submission ilar embedding set-ki ***. km Confidential lj vectors.lm Review Copy. Another DO NOT052 approach DISTRIBUTE. in (Han 066068 065064 062063 059 511 005 002 016 505 tations benefits 009tings. many InAttentive this NLP Jointly paper, tasks,Abstract Distant learning we000 while propose 013 Supervision word has et a and al. novel, entity2016∈ ; represen-Toutanova∈ AttentiveCao 055 et al. etMost Distant,561 al.2017, 2015 existing)utilizethecoherenceinforma- Supervision052; Wu work et al. jointly, models555 KB and text066 050 059 063 006 021 003 complementary005502 ≈ 004 016 joint knowledge.002 Attentive inferencetion Insteadtings.Distant amongto align Inof knowledge Supervisionre- this similarexample, paper,Jointly wordstion base we learning ( toWang and propose andalignet entities056 word similarentitycorpus al. a, novel2014 and representations withwordsto enhanceentity; Yamada sim- and053 represen- each entities etin other aal. unified with, 2016 by055 learning sim- vector; space.word552 and054052 For071 066 wki018+509005 (wki003) 012and 002 entity 007 representationAnonymous019 ACL learning submission to enable Anonymous ACL lations. submission However, 053055 these 559 methods tion052 to only align068057 focus similar entity on words representations and entities in a062 with unified sim-069 vector space. For 512 019 joint016 inference000014 among000013 tings. knowledge Anonymous In 011 this010 paper, base ACL and001 submissionwe propose etJoint a al. novel, 2016 Representation; Toutanova et Learning562 al., ACL2015 20182016 of; SubmissionWu Cross-lingual)learnstorepresententitiesbasedontheir et al. ***.050, Confidential Words Review050069 and Copy. 
EntitiesDO NOT DISTRIBUTE.066 via 063051064 060061 017 006 007015methodC003 that017004506 integratesnot been cross-lingual wellmethod006 explored that005 integrateswordtations inJointly cross-lingual003 benefits cross-linguallearning014 many set- word2016 NLP word and)learnstorepresententitiesbasedontheir tasks, entity while represen-tion has056 tocorpus align057 similar to enhance words053 each and054 entities other by with556 learning067056 sim- word and065055053067 064 510006 004 003 008 010liance 503 Joint on parallel Representation017 data, we Learningilar automatically method embedding of Cross-lingual that integratesCao vectors. Abstracttations et ilar Words al. Another benefits embedding cross-lingual, 2017 and )utilizethecoherenceinforma- manyEntities approachexample,entity054 056 vectors. NLPword via representations (in tasks, AnotherWang (Han560 while etMost053 approachal. in, has2014 a existing unified ; in 058Yamada ( vectorHan work et space.jointly al.,5532016 For models; KB and text 060 067 019 022 joint000 inference001 020 among knowledgeJoint Representationtext base across and languages, Learning capturing of Cross-lingualAnonymous mutuallymono-lingualAttentive ACL Words submission settings, and Distant Entities and Supervision via051 fewilar050 embedding researches069 vectors.example, have Another (Wang072 et approach al., 2014 in; Yamada (Han070 et al., 2016; 513 007 020008 ! 004 013text005507017 across languages, 001014 007Jointlymethod capturing thatlearning012 integrates011 mutually004 word002 cross-lingual and000 entity 2016 word represen-)learnstorepresententitiesbasedontheirACL 2018 Submission ***.057 Confidential563058 Reviewtextual Copy.054 DO descriptionsNOT DISTRIBUTE.055 together557051057070 with the structured067054 re- 064063052050 061062 007and entity005 018 representation 004 009 015 learning and504000 entity to006 representation enable nottations beenskm well benefits learningAttentive explored manyACL totextual 2018 Distantenable in NLP Submission cross-lingual descriptions tasks, Supervision ***. Confidential while ilar set- together has Review embedding Copy. with DO055 NOT vectors.057 the DISTRIBUTE. structured Another re-054 approach in059 (Han050 554056068 065 018 016511 tings.001 002 In011generate thisJoint paper, Representation cross-lingualAnonymous we propose Anonymous ACL training Learning submission a015 novel data Attentive ACL of via submission Cross-lingual dis- Distant Jointαnotki,ljAbstract Representation beenetSupervision Words al. well , entity2016 and explored; Cao representationsEntitiesToutanovaexample, Learningsim et(w inki al. cross-lingual via,w (,Wang oflj2017 et) Cross-lingual561 al.corpus et)utilizethecoherenceinforma-052 in, Most2015 al. a051 unified, set-2014 to ; existing enhanceWu; Words068Yamada vector etACL work al. each and , 2018space. et jointly otheral.Entities Submission, 2016 For by066models; vialearning ***. KB Confidential wordand text and Review Copy.061 DO NOT DISTRIBUTE. 065 514 008 020 005 023006text across languages,002 008 capturing 018Jointcomplementary 005 mutually Representation003≈ et ACLn al.andACL001 2018,knowledge. 2018 entity2016 Submission Learning Submission; representation ***.Toutanovation Confidential ***. Insteadof Confidential to Cross-lingual Reviewbeen align of058et Copy. learningReview donere- similaral. DO NOT,Copy.5642015 in WordsDISTRIBUTE. to DO cross-lingual words enable NOT; Wu DISTRIBUTE.and055 and etEntities al. entities056 scenarios. 
, via with070052058 sim-Cao et al., 0552017073)utilizethecoherenceinforma- 053051 068 009 008 006 wcomplementary508018+005 (w ) 021 015 knowledge.001and entity 007 013 InsteadrepresentationAnonymous012 of re- ACL learning submissiontextual to enable descriptions∝ w togetherki skm,wlj with059slm056058 thelations. structuredACL 2018 SubmissionHowever, re-et055 al. ***., Confidential these2016558 ; methods051 ReviewToutanova Copy. DOonly NOT et068057DISTRIBUTE. focus al., 2015 on ; 065Wu et071 al., 062063 021 512 019014ki 010 ki003016 joint5052tations inference benefits tings. among not = many been In knowledgeAttentive thisAnonymous well NLP α paper, exploredJointlykm tasks, lations. DistantbaseACL weAbstract learning in while and000α propose submission cross-lingualki,lj SupervisionACL However, has 2018 word a Submission novel et and Anonymous set-al. these∈, entity ***.2016 Confidential methods∈; represen- ACLToutanova Review submission only Copy. focus etMost DO562 al. NOT,053 on2015 DISTRIBUTE.existing; Wu work et060071 al. jointly, models555 069 KB and064066 text 050 019515 009 017joints =006 inference007 amongmethod002 C knowledge that003tant 009 integrates supervision base cross-lingualand over 006 multi-lingual004 016 word002 knowl- tings.2016 In059)learnstorepresententitiesbasedontheir thisexample,565 paper,tionAttentiveCao ( toWang we et056 align al. propose Distant, 2017 et similar al.057entitycorpus)utilizethecoherenceinforma-, aSupervision2014 novel words052 representations to; Yamada enhance and069053059 entities each et inal. other a with, unified2016 by067 sim-056 learning; vector space.word054052 and For 066 010021 009 007 024 509complementary006 004 012 knowledge.002 008 019 Instead liance ofn onre- 2016 paralleljoint)learnstorepresententitiesbasedontheir Attentive inference data,n weilarDistant among automatically embedding knowledge Supervision vectors.060057 base059 Anotherand approach054056 in559071 (Han052tion to align058 similar074 words and entities062 with069 sim- L liance|| on 011 parallel− 022|| data,joint inference we000 automatically013 amongs knowledgeskm wkilations. bases and However,Joint theseIn this Representation methods paper,mono-lingual onlywe propose focus Learning on to2016 settings, learn of)learnstorepresententitiesbasedontheir Cross-lingual cross-lingual and061050 few researches Words and have Entities via072 063 010 022 513 007 015019 003 017004016text506 across languages,014 method000tings.007 thatkm capturingIn∈ this integratestations paper,003Jointly mutuallymono-lingual∈ benefitskmcross-lingual weACL001 learning propose 2018 Submission many wordsettings, a 2016 word NLP***. novel Confidentialand060)learnstorepresententitiesbasedontheir tasks, entityand Review fewwhile represen- Copy. researches DO has NOT057 DISTRIBUTE.corpus563 have to enhance053 054 each072 other by556069 learning050057 word066065053067 and 051064 516 011 010 S tasks014006k,l has that vectors. 
integrates of learning000 word Another tations cross-lingual translationtotext approachenable benefits000 across wordcorpus many in (Han languages to NLPliance enhanceAnonymousAbstract tasks,word each " on while without and other parallel ACL has entity by 572 learningsubmissionCao any"067 data, representations etadditional word al., we2017 and automatically)utilizethecoherenceinforma-054 trans-055Most in existing the 053 same work061 seman-068067 jointly058060∝ models KB050057 and064075056 text 050 027 and entity015516 014 representation not learning004 beentationstextedge well to benefits across enable exploredbases.008 many in languages, Weliance∈ wordscross-lingual NLP006 also tasks, ilar on and embedding propose004 while parallel set-example, capturing filter022 hasedge ACL vectors. twoout 2018( data,Wang bases. SubmissionACL noise, types008 mutuallyet Another2018 weACL al. Submission, 2018 automatically2014 ***.Wewhich approachet SubmissionConfidentialACL ; al.Yamada also ***.2018,ilar Confidential2016 will Submission embeddingin ***. Review propose ( etHan; Confidentialfur-Toutanova al. Review, Copy.***.2016 Confidential vectors. Copy. DO; two Review NOT DOet example, AnotherDISTRIBUTE. ReviewNOT typesCopy.al. , DISTRIBUTE.2015 Copy.DO ilar( approach NOTWang DO;054Wu DISTRIBUTE.embedding NOT et065 DISTRIBUTE. etal. in064, (al.Han2014, ; Yamada058 vectors.566077 et al., 2016056 Another; approach054 in (Han 058 072 018 edge bases.016 020words025 We andtings. also filter013 In thispropose 022 out paper,513 noise, we two propose002 which" types a019 novel willet al. fur-,012Abstract2016 ; AnonymousToutanova joint CaoAttentive inference et ACLexample, et al. al., submission 2015, 2017 Distant (Wang; among)utilizethecoherenceinforma-Wu Mostword et Supervisionare al.al. knowledge, existing,2014 helpful andtextual ; entityYamada work 068 tomono-lingual066 base descriptions jointly505 representations breaket al. and, models2016 down; KB settings, together language and in063 textthe052 same and with gaps seman-few the in researches manystructuredmono-lingual w563m have075 re-s k,l settings,072070 and062 few researches069 have 555 018 520and entity015 representation030 generatenot011 005 learning been well to006 cross-lingual enableexplored003of knowledge 007 inJointly cross-lingual016009 025 learning traininggenerateand008 attention entity 015set- wordACL! representation data and cross-lingualtings. 2018 tion and entity001via to Submission In aligncross-lingual represen-textdis- this011 learning similar lation across paper, training001Joint to words enablemechanism,***. we languages, Representation and propose Confidential data entitiesJoint007 via a with Representation noveldis- which capturing068 sim- ReviewLearning is usually mutually Copy. of Learning Cross-linguallation expensiveDO065 NOT ofmechanism,055 Cross-lingual056570 DISTRIBUTE. Words and2016word061)learnstorepresententitiesbasedontheir053 and andwhich Words Entities entity059 is and usually via representations Entities057 expensive via051058066065 080 in and the051 same061 seman- 057 075 025 028 023 018 and words entity representationand019 filter028018 out011 learning noise,joint004 toet inference enable020which al., 2016 among007 will; Toutanova fur- knowledge005word etnot al. base,and been2015 andwell entity; Wu exploredentity etbeen representations representationsal., in cross-lingual doneACL in a 2018 unifiedcross-lingual set- intion Submissionvector the068 to space. same align For similarscenarios. 
seman- ***.corpus wordsConfidentialtextual to enhance and 054 descriptions entities each Review075069068078 with other061 sim- by togetherCopy. learning DO with!word073∈ NOT055057 and the DISTRIBUTE. structured re- 078 070 523 017 joint026 016 inference517 015method among023 that knowledgetings.012018 integratesand005 notof In been entity knowledgethis basecross-lingual well009 paper,003 and relatedness explored we Abstract word attentionpropose007 inand cross-lingualet entitydemonstrateAnonymous a al. novel, andCao2016 representation set- et; cross-lingualToutanova al. ACL lation, 2017 the submission009)utilizethecoherenceinforma- mechanism, etef- Most al., learning20152016 existing;)learnstorepresententitiesbasedontheirticetWu al.which work space,, et to2016Abstract al. jointly enable, is; toToutanova usually573 modelsenablebeen067 Cao KBexpensiveet done joint et al. and al., 2015, intext inference2017055 cross-lingual; andWu)utilizethecoherenceinforma-066 etMost065 al. among053, existing059 scenarios.567 KBwork062 andjointly057 models KB076 and text 073 068059 019028 014 ACL 2018 Submission generate2016 ***. Confidential )learnstorepresententitiesbasedontheir013ilar cross-lingual Review embeddingAbstractAnonymous Copy.of DO knowledge NOT vectors.tion DISTRIBUTE. training ACLCao to Anotheralign 2018 submissionet Submission al. similar data attention,generate approach 2017ACLcorpus via***. 2018 words)utilizethecoherenceinforma- Confidential in Submissiondis-Most to (Han cross-lingual andenhanceAnonymous existing Review entities ***. cross-lingual Confidential each Copy.069 work with DOother ACLNOT Reviewjointly sim- training by DISTRIBUTE. submissionCopy. learning models DO NOTdata word KB DISTRIBUTE.064 via and dis-text 078 063 joint inference ther among improve knowledge012006 the base007004 performance. and514complementary 008tations005 benefits Inther many exper- knowledge.improve NLP tasks,002 023 while thetext has Instead performance. 002 acrossJointly learningof languages, re- 008 word Intictasks, exper-space, and capturingAttentive entity such to represen-enable as Distantmutually Attentive506 cross-lingual joint Supervisionet Distant inference056 al. 057 , entity Supervision2016 among062 linking,054;055Toutanova KB in and whichbeen 058 et done al.,564 0522015 in cross-lingual; Wu052 et al. scenarios., 058 073 556 019 521of knowledge016 021 joint026 attentioninference tings. In among this 023 and006 paper, 019 knowledge cross-lingual wetext 010 propose017010 base across2016 and)learnstorepresententitiesbasedontheirjoint 009 a020 languages, novel008 inference 016008 method capturingamong006 thatcomplementary knowledgetings.012 mutually integrates In this base example, paper,cross-lingual and knowledge. we ( Wang propose2016 etword)learnstorepresententitiesbasedontheir al.069, a2014 novel Instead; Yamadabeen ofet al. re- done, 2016066 ; in056nentity cross-lingual571textual representationstic space, descriptions060 scenarios. to in069060 enable a unified058 together jointvectorspace.inference with076059067066056058 the For structured among073071062 KB re- and 070 524 019 018 text031 acrossther 016tant languages, improveand020 entity supervision representation capturing013 theattentiontings. 012 performance. 
mutually overIn learning this004 toJointly021026 multi-lingualpaper, toselecttant enable learning Inwe supervision the exper-propose2016 word most)learnstorepresententitiesbasedontheirtion a knowl- and novel overto informative entity alignmay multi-lingual represen- similar Anonymous introduce010corpus words totextualCross-lingual ACL andenhance inevitableknowl- entities submission descriptions each withlations. other errors. sim- 574 together byilar learning068= embedding069 tion Our However,may with to word embeddings alignthe introduceα vectors.and structured similar thesecorpus Another words066 re- inevitableα to054 methods and enhance approach entities063070 each errors.062 with inonly other (Han sim- by focus Our learning embeddings on word081 and 060071 076 029 020 400017518 015024 method300029 thatnot integrates been well cross-lingual explored000 in cross-lingualet word000 al.tic,=2016 space,; + set-Toutanovaγ totion enable et to al. align, 20152 similar joint; Wucorpus wordset inference al. to, and enhance entities 070 among each with other sim- KBby learningkm and 450067lations.065 wordψ and(ki,ljei However,,s k,l568 )079 050 these050 methods only focusl 074 on079 350 026 024text across000 languages,027 013 capturing007 mutually008005fectiveness007 attention 009006011 ofto text our select acrosstextualjoint method = the languages,014 descriptions inferenceACL most003 with 2018 capturingc Submissioninformative an togethermay amongilar average003tationss embedding mutually introduce with ***.Cao benefitsConfidential knowledge the et structured al.tantIn vectors.009, entityinevitable many2017 thistext Review supervision)utilizethecoherenceinforma- Another representations re- NLP paper, across baseCopy. tasks, errors. approach DO and languages we NOT while over in DISTRIBUTE. propose Our In ain has unified (multi-lingualHan this embeddings without vector paper, to057057058 learnspace. any we knowl- Forpropose additionalcross-lingual063055056 076 to061 trans- learn 059 cross-lingual053074077 053 064050 059 020 029 522 017 text across method languages, that integrates019020515 capturingJointly complementary cross-lingual 011000 learning018 mutually005textual 010 word descriptions tant iments, word009017000009 knowledge. and s supervision entityand together separate007 entity represen- Insteadattentionmethod representation with013 overcomplementary tasks the of that multi-lingual structuredre- integrates of to learning word Sentence selectJointly re- cross-lingual knowledge. translationto learning knowl-textualtext theenable Regularizer 070 across descriptionsmost word word Instead and languages informative entity together of represen-n re- with067 without theexample, structured572 any n additional ( re-Wang055 061 et al.079 trans-,0702014050 ; 059 Yamada050 et565 al.060068067,0572016059 ; 063 069 020 019 complementary iments,027 017joint separate 021 knowledge. inferenceand014 024 tasks among entitymethod013liance Insteadrepresentation knowledge of that of word on re- integratestations base parallel translationJointly021 learning001 and benefits cross-lingual learningtextual to manydata,2016 enable001LJointilar descriptions word NLP)learnstorepresententitiesbasedontheir 024 word embeddingweAbstractL tasks,Representation andAbstractAnonymous automatically entity whileJoint togetherL vectors.ilar 011 represen- entity has Representationembedding withACL Another representations Learninglations. the submission structured vectors. 
approachthe Most However, of Learning major AnotherCross-lingualMost inre-existinga in unified existing (Han thesework challenge approachet of 069 vectoral.507 jointly070Cross-lingualwork methodsIn,ilars2016 Words this embeddinginjointly space. models (sHan; lies2016Toutanova paper, only modelsand For KB inandWords vectors. focusEntities)learnstorepresententitiesbasedontheirw measuringKB welations.entity text067s andet onpropose andAnother representationsal. textvia, Entities2015 However, approach the to;064071Wu learn similar-via051063 in et in a unified ( theseal.Han cross-lingualIn, this vector methods051 paper,077 space. For onlywe propose focus 074 on to learn cross-lingual071061 074 557 ጱ trans- 077 072 ڹ attention to022401018519iments, select 500 separate the016 009 mostwords tasks informative andtings.007 of word filter In027edge this translationout paper,ACLL bases. 2018 noise, we Submission propose004 which We ***.liance Confidential aalso novelare will Review004|| on propose helpful= fur-Anonymous Copy. parallel− DO NOT to DISTRIBUTE.two||ACL break data, types submissionentity down we representations550mono-lingual automatically language575 071 inarekm a gaps unified helpfulkm settings, in vector059 many451068ki to066 space.text breakkm and Foracross057 few down569 languages researches language without gaps have054 in any many additional054 021 525 complementary 032 knowledge. edgeand014008 entity Instead bases. representation 030 006 of008 re- We 010 learning 012001 also 012006 022complementary topropose enable 010001 lations.000Joint 015