TANGO: Bilingual Collocational Concordancer Jia-Yan Jian Yu-Chia Chang Jason S. Chang Department of Computer Inst. of Information Department of Computer Science System and Applictaion Science National Tsing Hua National Tsing Hua National Tsing Hua University University University 101, Kuangfu Road, 101, Kuangfu Road, 101, Kuangfu Road, Hsinchu, Taiwan Hsinchu, Taiwan Hsinchu, Taiwan
[email protected] [email protected] [email protected] du.tw on elaborated statistical calculation. Moreover, log Abstract likelihood ratios are regarded as a more effective In this paper, we describe TANGO as a method to identify collocations especially when the collocational concordancer for looking up occurrence count is very low (Dunning, 1993). collocations. The system was designed to Smadja’s XTRACT is the pioneering work on answer user’s query of bilingual collocational extracting collocation types. XTRACT employed usage for nouns, verbs and adjectives. We first three different statistical measures related to how obtained collocations from the large associated a pair to be collocation type. It is monolingual British National Corpus (BNC). complicated to set different thresholds for each Subsequently, we identified collocation statistical measure. We decided to research and instances and translation counterparts in the develop a new and simple method to extract bilingual corpus such as Sinorama Parallel monolingual collocations. Corpus (SPC) by exploiting the word- We also provide a web-based user interface alignment technique. The main goal of the capable of searching those collocations and its concordancer is to provide the user with a usage. The concordancer supports language reference tools for correct collocation use so learners to acquire the usage of collocation.