Learning to Represent Bilingual Dictionaries
Muhao Chen1∗, Yingtao Tian2∗, Haochen Chen2, Kai-Wei Chang1, Steven Skiena2 & Carlo Zaniolo1
1University of California, Los Angeles, CA, USA
2The State University of New York, Stony Brook, NY, USA
∗ Both authors contributed equally to this work.

Abstract

Bilingual word embeddings have been widely used to capture the correspondence of lexical semantics in different human languages. However, the cross-lingual correspondence between sentences and words is less studied, even though this correspondence can significantly benefit many applications such as cross-lingual semantic search and textual inference. To bridge this gap, we propose a neural embedding model that leverages bilingual dictionaries.1 The proposed model is trained to map lexical definitions to the cross-lingual target words, for which we explore different sentence encoding techniques. To enhance the learning process on limited resources, our model adopts several critical learning strategies, including multi-task learning on different bridges of languages and joint learning of the dictionary model with a bilingual word embedding model. We conduct experiments on two new tasks. In the cross-lingual reverse dictionary retrieval task, we demonstrate that our model is capable of comprehending bilingual concepts based on descriptions, and that the proposed learning strategies are effective. In the bilingual paraphrase identification task, we show that our model effectively associates sentences in different languages via a shared embedding space, and outperforms existing approaches in identifying bilingual paraphrases.

1 We use the term dictionary in its common meaning, i.e. lexical definitions of words. Note that this differs from some work on bilingual settings in which a dictionary refers to a seed lexicon of one-to-one word mappings.

1 Introduction

Cross-lingual semantic representation learning has attracted significant attention recently. Various approaches have been proposed to align words of different languages in a shared embedding space (Ruder et al., 2017). By offering task-invariant semantic transfer, these approaches critically support many cross-lingual NLP tasks, including neural machine translation (NMT) (Devlin et al., 2014), bilingual document classification (Zhou et al., 2016), knowledge alignment (Chen et al., 2018b) and entity linking (Upadhyay et al., 2018).

While many existing approaches associate lexical semantics between languages (Chandar et al., 2014; Gouws et al., 2015; Luong et al., 2015a), modeling the correspondence between lexical and sentential semantics across languages remains an unresolved challenge. We argue that learning to represent such cross-lingual and multi-granular correspondence is both desirable and natural, for multiple reasons. One reason is that word-to-word correspondence has an inherent limitation: many words do not have direct translations in another language. For example, schadenfreude in German, which means a feeling of joy that comes from knowing the troubles of other people, has no proper English counterpart word. To appropriately learn the representations of such words in bilingual embeddings, we need to capture their meanings from their definitions.

Besides, modeling such correspondence is highly beneficial in many application scenarios. One example is cross-lingual semantic search of concepts (Hill et al., 2016), where lexemes or concepts are retrieved based on sentential descriptions (see Fig. 1). Others include discourse relation detection in bilingual dialogue utterances (Jiang et al., 2018), multilingual text summarization (Nenkova et al., 2012), and educational applications for foreign language learners. Finally, it is natural in foreign language learning for a human to learn foreign words by looking up their meanings in the native language (Hulstijn et al., 1996); learning such correspondence therefore essentially mimics human learning behavior.

However, realizing such a representation learning model is a non-trivial task, inasmuch as it requires a comprehensive learning process to effectively compose the semantics of arbitrary-length sentences in one language and associate them with single words in another language. Consequently, this objective also demands high-quality cross-lingual alignment that bridges between single words and sequences of words. Such alignment information is generally not available in the parallel corpora and seed lexicons utilized by bilingual word embeddings (Ruder et al., 2017).

To incorporate the representations of bilingual lexical and sentential semantics, we propose an approach that captures the mapping from definitions to the corresponding foreign words by leveraging bilingual dictionaries. The proposed model, BilDRL (Bilingual Dictionary Representation Learning), first constructs a word embedding space with pre-trained bilingual word embeddings. Based on cross-lingual word definitions, a sentence encoder is trained to realize the mapping from literal descriptions to target words in the bilingual word embedding space, for which we investigate multiple encoding techniques. To enhance cross-lingual learning on limited resources, BilDRL conducts multi-task learning on both directions of a language pair. Moreover, BilDRL enforces a joint learning strategy for the bilingual word embeddings and the sentence encoder, which gradually adjusts the embedding space to better suit the representation of cross-lingual word definitions.
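To make this training objective concrete, the following is a minimal sketch of the definition-to-word mapping, assuming a GRU encoder and an L2 regression loss as stand-ins for the multiple encoding techniques the paper investigates; all names and sizes here (EMB_DIM, DefinitionEncoder, mapping_loss) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

EMB_DIM = 300  # assumed dimensionality of the pre-trained bilingual embeddings

class DefinitionEncoder(nn.Module):
    """Encodes a cross-lingual definition into the bilingual word embedding space."""

    def __init__(self, vocab_size: int, emb_dim: int = EMB_DIM):
        super().__init__()
        # This table would be initialized from pre-trained bilingual embeddings;
        # under the joint learning strategy it is updated during training rather
        # than kept fixed.
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, emb_dim, batch_first=True)

    def forward(self, definition_ids: torch.Tensor) -> torch.Tensor:
        # definition_ids: (batch, seq_len) token ids of a definition sentence.
        states, _ = self.gru(self.word_emb(definition_ids))
        return states[:, -1, :]  # last hidden state as the sentence encoding

def mapping_loss(encoder: DefinitionEncoder,
                 definition_ids: torch.Tensor,
                 target_word_vecs: torch.Tensor) -> torch.Tensor:
    # Push the encoded definition toward the embedding of the cross-lingual
    # target word it defines.
    pred = encoder(definition_ids)
    return ((pred - target_word_vecs) ** 2).sum(dim=1).mean()

# Multi-task learning on both directions of a language pair amounts to summing
# this loss over batches drawn from D(Fr, En) and from D(En, Fr), with the
# encoder (and, when learned jointly, the embedding table) shared between them.
```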
To show the applicability of BilDRL, we conduct experiments on two useful cross-lingual tasks (see Fig. 1). (i) Cross-lingual reverse dictionary retrieval seeks to retrieve words or concepts given descriptions in another language. This task helps users find foreign words based on notions or descriptions, and is especially beneficial to users such as translators, foreign-language learners and technical writers working in non-native languages. We show that BilDRL achieves promising results on this task, and that bilingual multi-task learning and joint learning dramatically enhance the performance. (ii) Bilingual paraphrase identification asks whether two sentences in different languages essentially express the same meaning, which is critical to question answering or dialogue systems that apprehend multilingual utterances (Bannard and Callison-Burch, 2005). This task is challenging, as it requires a model to comprehend cross-lingual paraphrases that are inconsistent in grammar, content details and word order. BilDRL maps sentences into the lexicon embedding space, which reduces the problem to evaluating the similarity of lexicon embeddings; that, in turn, can be solved by a simple classifier. BilDRL performs well even with a small amount of data, and significantly outperforms previous approaches.

[Figure 1 shows the English description "A male descendant in relation to his parents" retrieving the French word "Fils" (cross-lingual reverse dictionary retrieval), and the French description "Tout être humain du sexe masculin considéré par rapport à son père et à sa mère, ou à un des deux seulement" retrieving the English word "Son"; the two descriptions are linked as cross-lingual paraphrases.]

Figure 1: An example illustrating the two cross-lingual tasks. Cross-lingual reverse dictionary retrieval finds cross-lingual target words based on descriptions. In terms of cross-lingual paraphrases, the French sentence (which means "any male being considered in relation to his father and mother, or only one of them") describes the same meaning as the English sentence, but has much more content detail.

2 Related Work

We discuss two lines of relevant work.

Bilingual word embeddings. Various approaches have been proposed for training bilingual word embeddings. These approaches fall into two families: off-line mappings and joint training.

The off-line mapping approach fixes the structures of pre-trained monolingual embeddings and induces bilingual projections based on seed lexicons (Mikolov et al., 2013a). Some variants of this approach improve the quality of the projections by adding constraints such as orthogonality of the transform, normalization, and mean centering of embeddings (Xing et al., 2015; Artetxe et al., 2016; Vulić et al., 2016). Others adopt canonical correlation analysis to map separate monolingual embeddings into a shared embedding space (Faruqui and Dyer, 2014; Doval et al., 2018).
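As a concrete illustration of the off-line family, the orthogonality-constrained projection in the spirit of Xing et al. (2015) and Artetxe et al. (2016) reduces to an orthogonal Procrustes problem with a closed-form SVD solution. The sketch below is illustrative rather than any cited paper's exact method; X and Y are assumed to hold row-aligned monolingual embeddings of seed-lexicon translation pairs, and the function names are made up for this example.

```python
import numpy as np

def fit_orthogonal_mapping(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Solve W* = argmin_{W orthogonal} ||XW - Y||_F via the SVD of X^T Y.

    X, Y: (n_pairs, dim) source- and target-language embeddings of the seed
    lexicon, one translation pair per row.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def nearest_target(word_vec: np.ndarray, W: np.ndarray,
                   target_embs: np.ndarray) -> int:
    # Project a source-language vector through W and return the index of the
    # most cosine-similar target-language embedding.
    mapped = word_vec @ W
    sims = (target_embs @ mapped) / (
        np.linalg.norm(target_embs, axis=1) * np.linalg.norm(mapped) + 1e-9)
    return int(np.argmax(sims))
```

Because W is constrained to be orthogonal, it preserves distances and angles in the pre-trained monolingual space, which is exactly why this family leaves the monolingual structure fixed.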
Unlike off-line mappings, joint training models simultaneously update word embeddings and the cross-lingual alignment. In doing so, such approaches generally capture more precise cross-lingual semantic transfer (Ruder et al., 2017; Upadhyay et al., 2018). While a few such models still maintain a separate embedding space for each language (Artetxe et al., 2017), most maintain a unified space for both languages. The cross-lingual semantic transfer of these models is captured from parallel corpora with sentential or document-level alignment, using techniques such as bilingual bag-of-words distances (BilBOWA) (Gouws et al., 2015), Skip-Gram (Coulmance et al., 2015) and sparse tensor factorization (Vyas and Carpuat, 2016).

[...] ∈ V_{l_j}) is a cross-lingual definition that describes the word w_i with a sequence of words in language l_j. For example, a French-English dictionary D(Fr, En) could include the French word "appétit" accompanied by its English definition "desire for, or relish of food or drink". Note that, for [...]
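To ground the notation in the truncated passage above, a bilingual dictionary can be stored as a plain list of (headword, definition-token) pairs. This is a minimal sketch under that reading; the second entry is adapted from Fig. 1, and the variable names are illustrative.

```python
# D(Fr, En): each entry pairs a French word w_i with its English definition,
# a sequence of words drawn from the English vocabulary V_{l_j}.
D_fr_en = [
    ("appétit", "desire for, or relish of food or drink".split()),
    ("fils", "a male descendant in relation to his parents".split()),
]

# Entries of this form supply exactly the word-to-sentence alignment BilDRL
# trains on: the definition tokens feed the sentence encoder, and the
# embedding of the French headword serves as the cross-lingual target.
for word, definition in D_fr_en:
    print(f"{word} <- {' '.join(definition)}")
```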