Multilingual lexical resources to detect cognates in non-aligned texts

Haoxing Wang, Queensland University of Technology, 2 George St, Brisbane, 4000
Laurianne Sitbon, Queensland University of Technology, 2 George St, Brisbane, 4000
Abstract

The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word shape similarity approaches with word sense disambiguation techniques in order to account for context. Our implementation is based on BabelNet, a semantic network that incorporates a multilingual encyclopedic dictionary.

ers. As word frequency can be partially estimated by word length, it remains the principal feature for estimation second language learners with the assumption that less frequent words are less likely to have been encountered and therefore accessible in the memory of the learnt language by readers. However, such features are not entirely adapted to existing reading abilities of learners with a different language background. In particular, for a number of reasons, many languages have identical words in their vocabulary, which opens a secondary access to the meaning of such words as an alternative to memory in the learnt language. For example, English and French languages belong to different branches of the Indo-European family of languages, and additionally European history of mixed cultures has led their vocabularies to share a great number of similar