
BanglaNet: Towards a WordNet for Bengali

K.M. Tahsin Hassan Rahit, American International University-Bangladesh, [email protected]
Khandaker Tabin Hasan, American International University-Bangladesh, [email protected]
Md. Al-Amin, American International University-Bangladesh, [email protected]
Zahiduddin Ahmed, American International University-Bangladesh, [email protected]

Abstract

Despite being one of the most popular languages in the world, Bengali lacks a good wordnet, which restricts NLP research on the language. Most of today's wordnets are developed by following the expand approach. One of the key challenges of this approach is cross-lingual word-sense disambiguation. In this work, we build semantic relations between a Bengali wordnet and the Princeton WordNet, based on well-established research on other languages. The algorithm derives relations between concepts as well. One of our key objectives is to provide a panel for lexicographers so that they can validate and contribute to the wordnet.

1 Introduction

The Princeton WordNet (PWN) (Miller, 1995; Fellbaum, 1998) is one of the most semantically rich English lexical databases and is widely used as a resource in research and development. It is an important resource not only for NLP applications in each language, but also for inter-linking the wordnets of different languages to develop multilingual applications that overcome the language barrier. At present, there are roughly 6,500 spoken languages¹. Among those, Bengali is the 7th most popular language² in the world. Yet there is a lack of work on a Bengali wordnet. The Global WordNet Association (GWA) has enlisted almost all wordnets in several levels, depending on their availability and richness. At the first level, there are 34 Open Multilingual WordNets³ that are merged into the Global WordNet Grid. In spite of being a popular language, Bengali is not one of them. GWA also lists other available wordnets. Among those 80 wordnets, two are Bengali wordnets, both developed in India.

In this work, a baseline for BanglaNet, a wordnet for the Bengali language, has been developed. To link it with the Princeton WordNet, a semi-automatic cross-lingual sense mapping approach is used. We align the Princeton WordNet synsets with a bilingual dictionary through the English equivalent and its part of speech (POS). Manual translation and link-up can also be employed after the alignment.

This paper covers previous work on other wordnets, including previous attempts at developing a Bengali WordNet, and describes the initiative taken for BanglaNet and our design and execution process in depth. Lastly, an analysis of the resulting lexical database is presented. We aim to include BanglaNet in the Global WordNet in future. To that end, the relation with the Princeton WordNet is maintained as far as possible, as per the convention. Additionally, a web-based collaborative tool called Oikotan, the BanglaNet Lexicography Development Panel (LDP), has been developed for revising the result of synset assignment and for providing a framework to create BanglaNet via the linkage with synsets.

¹ How many spoken languages are there in the world, http://www.infoplease.com/askeds/many-spoken-languages.html (Accessed 2016-10-22)
² Most widely spoken languages in the world, http://www.infoplease.com/ipa/A0775272.html (Accessed 2016-10-22)
³ Open Multilingual WordNet, http://compling.hss.ntu.edu.sg/omw/ (Accessed 2016-10-23)

2 Background Study

2.1 WordNet Development Techniques

To date, there are two ways to develop a wordnet for a particular language.

Merge Approach is used to build the wordnet from scratch. The Princeton WordNet is built with this approach. The taxonomies of the language, the synsets and the relations among synsets are developed first. Experienced manpower, lexicographers and time are needed for this approach (Taghizadeh and Faili, 2016). Mapping the resulting wordnet to the Princeton WordNet also requires extensive work and cross-language expertise.

Expand Approach is used to map or translate local words directly to the Princeton WordNet's synsets by using existing bilingual dictionaries. Most of the currently available wordnets are developed by following this approach. The process can be eased by performing many tasks semi-automatically and then refining the result through further proofing.

2.2 Related Works

2.2.1 International Languages

The first attempt at developing a wordnet in a language other than English started in 1996. EuroWordNet (Vossen, 2002) began as an EU project with the goal of developing wordnets for Dutch, Spanish and Italian and linking them to the English WordNet in a multilingual database. Later, in 1997, it was extended to include German, French, Czech and Estonian. Balkan WordNet (Tufis et al., 2004), developed in the BalkaNet project, aimed to build a multilingual semantic network for Balkan languages such as Greek, Turkish, Romanian, Bulgarian, Czech and Serbian. In developing BalkaNet, semantic relations are classified in the independent wordnets according to a shared ontology. BalkaNet was integrated with EuroWordNet through a WordNet Management System. Relations among synsets have been built mostly automatically (Pala and Smrz, 2004), and these relations are based on the Princeton WordNet. However, to achieve a high accuracy rate, the developer needs to pay special attention to the problem of translation equivalents.

Automating the development of semantic resources remains an open challenge in NLP research. In the development of WOLF (Wordnet Libre du Français, Free French Wordnet) (Apidianaki and Sagot, 2012), a free wordnet for the French language, multiple NLP algorithms including cross-lingual word sense disambiguation are used. In the Asian region, Japanese WordNet (Isahara et al., 2008) was developed using the expand approach. Korean WordNet (Lee et al., 2002) was developed with the expand approach by extracting a semantic hierarchy from a monolingual machine-readable dictionary (MRD) and an existing thesaurus. Thai WordNet (Sathapornrungkij and Pluempitiwiriyawej, 2005) was also developed by following the same approach. Another large work in the Asian region is IndoWordNet (Prabhu et al., 2012), developed to incorporate the languages used in the Indian subcontinent. IndoWordNet was also developed using existing wordnets.

Word-sense disambiguation (WSD) techniques have played a major role in most wordnet development. Lefever and Hoste have presented reviews of cross-lingual disambiguation (Lefever and Hoste, 2010; Lefever and Hoste, 2013). They found that for languages where the ratio of words to senses is low, it becomes hard to extract translations, since the number of translations of a particular word in another language becomes greater. Hence, a particular word has multiple translations in the counterpart language.

French faced a problem similar to ours: it had no corpus with predicate-argument annotations, which help in building up semantic relations. To overcome this issue, van der Plas and Apidianaki researched predicate labelling in French using word sense disambiguation (van der Plas and Apidianaki, 2014).

Two modes are used in cross-lingual WSD: best match and out-of-five. In best mode, the word or sense with the highest probability score is tagged with its counterpart word or sense. In out-of-five mode, if multiple senses or words belong to the candidate conceptualization, the five candidates with the highest probability are considered for further analysis. Further analysis can be done manually, automatically or semi-automatically.

The performance of the WSD process can be improved by using the Direct Semantic Transfer (DST) technique developed by Van der Plas et al. (Van der Plas et al., 2011). It states that senses can be directly transferred to another language if and only if both share the same semantic property. Surtani et al. developed a system that can predict paraphrases based on a corpus (Surtani et al., 2013). Their system includes a semantic relation prediction model.

Recently, BabelNet⁴ (Navigli and Ponzetto, 2012a) has become a good example of a multilingual language resource. BabelNet simplified the WSD process by providing a programming API (Navigli and Ponzetto, 2012b). Primarily, it uses open-source resources such as Wikipedia. However, BabelNet does not create a wordnet for a particular language. It is a huge standalone network of multilingual resources which utilizes the Princeton WordNet along with other resources to build relations.

⁴ BabelNet can be found on http://babelnet.org (Accessed 2016-12-07)

2.2.2 Bengali

Of the two Bengali wordnets listed by GWA, one is developed by the Indian Institute of Statistics under the Indradhanush Project⁵. It has an online browser which does not provide the semantic relations between synsets and only provides the different concepts available for a word. The other Bengali wordnet is developed as part of IndoWordNet by the Center for Indian Language Technology (CILT) and the Indian Institute of Technology (IIT-Bombay) (Prabhu et al., 2012). Notably, this wordnet is built by following the expand approach. It does have semantic relations between synsets to some extent. It is the most mature and contextually rich Bengali wordnet to date. Both wordnets are browsable but closed source. They are not publicly available for development, use or extension, nor do they provide any API.

There was also an effort to develop a Bengali WordNet at BRAC University's Center for Research on Bengali Language Processing. Its development followed the merge approach (Faruqe and Khan, 2010).

⁵ Indradhanush Project, http://indradhanush.unigoa.ac.in (Accessed 2016-10-22)

3 Architecture

As discussed above, the expand approach is followed to construct BanglaNet by translating the synsets in the Princeton WordNet to the Bengali language. Both automatic and manual methods are applied in the process. Ambiguity is one of the concerns for automatic concept mapping. This cross-lingual ambiguity can come in different forms: one-to-one, one-to-many, many-to-one and many-to-many. In this work, the uni-directional ambiguity in one-to-one and one-to-many mappings has been addressed.

Based on our research on other languages' wordnets and past work on Bengali wordnets, this paper proposes to follow the methodology described in Fig. 1 for BanglaNet development.

i) Extract monosemous literals w from the Bengali lexicon.
ii) Translate each Bengali literal to English literals e using a bilingual dictionary.
iii) For each English literal, extract the concept(s) p available in the Princeton WordNet.
iv) Run the similarity score calculation algorithm using the e and p found for two different Bengali senses. We take the different synsets available for sense w and compare their English counterparts.
v) Based on the similarity score, map the Bengali concept to the PWN concept.
vi) Lexicographer validation of the resulting mapping.

Figure 1: Proposed method for BanglaNet

3.1 Similarity Metrics

In step iv, a similarity algorithm is used. It calculates the similarity between the senses of two words in the Princeton WordNet. Similarity can be calculated in several ways. There are well-established algorithms (Pedersen et al., 2004; Meng et al., 2013) for calculating a similarity score. A few of them are:

i) Path Similarity (Meng et al., 2013) calculates a score in the range 0 to 1 based on the shortest path that connects the senses in the "is-a" (hypernym/hyponym) relation.
ii) Leacock-Chodorow Similarity (Bruce and Wiebe, 1994) scores based on the shortest path that connects the senses (as in Path Similarity) and the maximum depth of the taxonomy in which the senses occur.
iii) Wu-Palmer Similarity (Wu and Palmer, 1994) uses the depths of the two senses in the taxonomy and the depth of their most specific ancestor node to calculate the score.

There are other algorithms, such as Resnik Similarity (Resnik, 1995), Jiang-Conrath Similarity (Jiang and Conrath, 1997) and Lin Similarity (Lin, 1998). To calculate the similarity between two concepts, we use Wu and Palmer's similarity algorithm, as it takes into account the hierarchical position of concepts $c_1$ and $c_2$ in the taxonomy tree relative to the position of their most specific common concept $\mathrm{lso}(c_1, c_2)$. It assumes that the similarity between two concepts is a function of path length and depth in path-based measures (Wu and Palmer, 1994).

$$\mathrm{sim}_{WP}(c_1, c_2) = \frac{2 \cdot \mathrm{depth}(\mathrm{lso}(c_1, c_2))}{\mathrm{len}(c_1, c_2) + 2 \cdot \mathrm{depth}(\mathrm{lso}(c_1, c_2))} \tag{1}$$

4 BanglaNet Development

The primary task in wordnet development using the expand approach is to generate base lexicons and concepts. The full system, including the database of the Princeton WordNet, is downloadable from its official website. The lexical database files can also be downloaded separately, without the system. For base concepts, a dataset which is available on GitHub⁶ has been used. It provides a conceptual gloss in Bengali for words along with their synonyms. This dataset let our work focus on cross-lingual mapping rather than local synset construction. This work concentrates on building relations with PWN concepts rather than on producing concepts. After analyzing the list of concepts retrieved from the dataset, synsets are first generated for each concept. A concept can be represented using multiple words, which ensures that we have synonyms for every concept. Moreover, there is a POS tag available for each concept representing the word.

⁶ Bengali Synsets Data available on GitHub, Soumenganguly. https://github.com/soumenganguly/Bangla-Wordnet/ (Accessed 2016-10-22)

4.1 Word to Word Translation

At this point, a list of concepts with their glosses and synsets is available. Now, the English translation of each word needs to be determined. A word in one language can be represented by multiple words in another language. This is true for concepts also. But for now, only the English translation of the listed words is needed. Nevertheless, a Bengali word can have multiple English meanings. For example, "বল" means 'ball' in English. It means 'force' as well. A bilingual dictionary is needed to collect these translations. In this step, candidate translations from a Bengali-to-English bilingual dictionary are stored. The reason for collecting English translations using a dictionary is to get the proper concept from WordNet. This is achieved through the WordNet concept selection algorithm, which is explained later in this paper. For now, let us see how dictionary translations are processed.

At first, every possible English translation of each word in the lexicon is needed. These are obtained by iterating through each Bengali word in our lexicon. Bilingual (Bengali-to-English) dictionaries are used to get the translations of each word. A translation can belong to multiple parts of speech. The POS of each translation is recorded as well, so that it can be used to identify the correct translation in later steps. However, not all words have English counterparts. Such a word can be a concept which is available only in Bengali. It can also be a proper noun, for instance the name of a place, location, river or person, or a scientific term. Although it is possible to collect this information at run time, the translations along with their POS are temporarily stored to reduce latency and run-time processing.

4.2 Linking with Princeton WordNet using Probabilistic Model

As mentioned earlier, automated and semi-automated wordnet construction mostly depends on well-crafted algorithms of Natural Language Processing (NLP) and data processing. These statistical and probabilistic heuristic algorithms are good enough to create relations between words and senses. Naturally, the results are not always 100% accurate. Hence, lexical post-verification steps come into play to fine-tune the results.

With the candidate translations in hand, it is possible to calculate the score of each probable concept from the Princeton WordNet for a BanglaNet concept. Let $S_c$ be the synset for a Bengali concept $c$. We have a set of candidate translations $CT_w$ for a particular Bengali word $w$, where $w$ belongs to the concept $c$. The POS tag associated with $w$ is $a$.

$$S_c = \{\, s \mid s \in \text{Bengali word} \,\} \tag{2}$$

Now, the translations of each Bengali word $s_i$ in $S_c$ are taken:

$$ST_{s_i} = \{\, st_i \mid s_i \in S_c,\ st_i \in CT_{s_i} \,\} \tag{3}$$

Combining $ST_{s_i}$ for all of $S_c$:

$$ST_c = \bigcup_{i=0}^{n} \{\, st \mid \forall st \in ST_{s_i} \rightarrow s_i \in S_c \,\} \tag{4}$$

According to set theory, $ST_c$ will contain all unique English translations of the words in synset $S_c$. The Princeton WordNet synsets for each word in the sets $CT_w$ and $ST_c$ are retrieved. The POS tag of the synsets should match $a$. Taking $u$ as an English word:

$$\mathrm{syn}_{u,a} = \{\, x \mid x \in \text{PWN synset for } u \text{ and } x \in a \,\} \tag{5}$$

$$P_1 = \bigcup_{u \in CT_w} \{\, x \mid \forall x \in \mathrm{syn}_{u,a} \,\} \tag{6}$$

$$P_2 = \bigcup_{u \in ST_c} \{\, x \mid \forall x \in \mathrm{syn}_{u,a} \,\} \tag{7}$$

We take the cross product of the elements of $P_1$ with the elements of $P_2$:

$$P = \{\, (m, n) \mid m \in P_1 \text{ and } n \in P_2 \,\} \tag{8}$$

After the cross product is formed, a similarity algorithm is run on each tuple. To calculate the similarity score, Equation (1) is applied to each tuple of $P$. The synsets of $P_1$ are sorted according to the sum of each synset's scores, which serves as the probability score of the synset, and the entry with the maximum score is chosen. This task is transcribed in Algorithm 4.1. The probability scores of all probable Princeton WordNet synsets for the Bengali concept $c$ are now available. The Bengali synset is linked with a Princeton WordNet synset using Algorithm 4.2.

Algorithm 4.1: Algorithm for calculating probability score

    Function CalculateProbabilityScore(P)
        Input: P
        Output: Sorted scores of P based on probability score
        scores[] := ∅;
        foreach (m, n) ∈ P do
            if scores[m] ≠ ∅ then
                scores[m] ← scores[m] + sim_wp(m, n);
            else
                scores[m] ← sim_wp(m, n);
            end
        end
        return sort(scores);

To link a Bengali concept with the Princeton WordNet, multiple procedures have been used to ensure correctness as far as possible. First of all, a Princeton WordNet concept is assigned to those concepts in BanglaNet which have only one possible item in $P_1$. Secondly, if and only if there is only one concept available for the word $w$, the Princeton WordNet concept which scored the highest probability in the probability calculation algorithm is chosen. Note that if any of the synonyms (words) in the synset of a concept has only one concept tagged to it, the concept can be linked using this method. By running this first pass over all the concepts, Princeton WordNet concepts are assigned.

Algorithm 4.2: Algorithm for linking concept, first pass

    Function LinkSynset(w)
        Input: w
        concept_count := number of concepts for the word w;
        P := Generate synset cross product;
        sorted_scores[] := CalculateProbabilityScore(P);
        if length of sorted_scores = 1 or concept_count = 1 then
            C := concepts for the word w;
            foreach c ∈ C do
                c.pwn ← sorted_scores.top().key();
            end
        end

5 Results and Analysis

In the initial dataset, there were 27239 unique concepts. These concepts are represented using 40158 unique words tagged with different parts of speech. Table 1 shows the statistics of our initial data. In total, about 67% of all concepts (18311 of 27239) are tagged as nouns.

                     Noun    Adj    Verb   Adv    Total
    Initial synsets  18311   5713   2777   438    27239
    Initial words    28311   8136   2923   788    40158
    Linked synsets    3174   1352     73    66     4665
    Linked words      7477   2971    130   170    10748

Table 1: Status of linked Synset and Words from initial dataset

English translations for 13029 words have been retrieved. After applying concept linking, 4665 concepts are linked with the Princeton WordNet. In total, 10748 words are linked with the Princeton WordNet. To link these 4665 concepts, 3729 Princeton WordNet concepts are used. This means that some concepts are linked many-to-one across the two wordnets.

Cross-lingual word-sense disambiguation can be shown using another example. For the Bengali word for cauliflower, there are two concepts available in Bengali. In English it has two concepts too.

    cauliflower.n.01   a plant having a large edible head of crowded white flower buds
    cauliflower.n.02   compact head of undeveloped white flowers

The algorithm predicted both English concepts for each of the two Bengali concepts. For the first Bengali concept (n.01), the probability scores for the two English concepts are 4.4419589754 and 4.4419589754 respectively. For the second Bengali concept (n.02), the scores are 6.84959684439 and 6.20774295822. In both cases the scores are too close to prioritize one concept by probability.

Although the algorithm used in BanglaNet is directed from Bengali to English synset matching, it can also be applied the other way around. In that case, a Bengali word which represents a particular concept in the Princeton WordNet can be used to verify the concept linking and add more confidence to it. As a result, more links can be achieved.

Our initial synsets contain glosses, but our approach does not take the gloss into consideration. As a consequence, BanglaNet can be expanded using the same approach in future even if the gloss for a synset is not available.

5.1 Future Works

There is a big opportunity to work on BanglaNet expansion and development. In this algorithm, the gloss is not taken into consideration. The accuracy of the algorithm can be noticeably improved by incorporating the gloss. However, a bilingual corpus will be required to achieve this, and we found that there is a lack of good corpora for Bengali. A good corpus is one of the key components of Natural Language Processing. Our literature review discussed BabelNet; its data sources and approach can be useful for mapping concepts. In this work, the first pass, or first-level linking, is done. In the second pass, a new algorithm is needed to connect concepts which have multiple synsets at either end (BanglaNet or Princeton WordNet). We propose to use Variable Neighborhood Search (VNS) (Hansen et al., 2010).

6 Conclusion

Developing a wordnet is an immense task. It is our distinct pleasure that in this work a basic layer of the system has been laid down for a Bengali wordnet, from which further development can be made. Suggestion generation for validation can be achieved through the result of this work. Our result analysis shows that 4665 concepts, covering 10748 words from the initially collected data, are automatically linked with the Princeton WordNet. Although there is a long way to go in the development of a Bengali wordnet, this work is a starting stage for further development.

Acknowledgments

This research is funded by the ICT Division, Government of the People's Republic of Bangladesh, under its fellowship program.

References

[Apidianaki and Sagot2012] Marianna Apidianaki and Benoît Sagot. 2012. Applying cross-lingual wsd to wordnet development. In LREC 2012 - Eighth International Conference on Language Resources and Evaluation.

[Bruce and Wiebe1994] Rebecca Bruce and Janyce Wiebe. 1994. A new approach to word sense disambiguation. In Proceedings of the Workshop on Human Language Technology, HLT 94, pages 244–249.

[Faruqe and Khan2010] Farhana Faruqe and Mumit Khan. 2010. Bwn-a software platform for developing bengali wordnet. In Innovations and Advances in Computer Sciences and Engineering, pages 337–342. Springer.

[Fellbaum1998] Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. Language, Speech, and Communication. MIT Press, Cambridge, MA.

[Hansen et al.2010] Pierre Hansen, Nenad Mladenović, and José A. Moreno Pérez. 2010. Variable neighbourhood search: methods and applications. Annals of Operations Research, 175(1):367–407.

[Isahara et al.2008] Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama, and Kyoko Kanzaki. 2008. Development of the japanese wordnet. In LREC.

[Jiang and Conrath1997] Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. CoRR, cmp-lg/9709008.

[Lee et al.2002] Juho Lee, Koaunghi Un, Hee-Sook Bae, and Key-Sun Choi. 2002. A korean noun semantic hierarchy (wordnet) construction. In GLOBAL WORDNET CONFERENCE.

[Lefever and Hoste2010] Els Lefever and Veronique Hoste. 2010. Semeval-2010 task 3: Cross-lingual word sense disambiguation. In Proceedings of the 5th international workshop on semantic evaluation, pages 15–20. Association for Computational Linguistics.

[Lefever and Hoste2013] Els Lefever and Veronique Hoste. 2013. Semeval-2013 task 10: Cross-lingual word sense disambiguation. Proc. of SemEval, pages 158–166.

[Lin1998] Dekang Lin. 1998. An information-theoretic definition of similarity. Proceedings of ICML, pages 296–304.

[Meng et al.2013] Lingling Meng, Runqing Huang, and Junzhong Gu. 2013. A review of semantic similarity measures in wordnet. International Journal of Hybrid Information Technology, 6(1):1–12.

[Miller1995] George A Miller. 1995. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41.

[Navigli and Ponzetto2012a] Roberto Navigli and Simone Paolo Ponzetto. 2012a. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.

[Navigli and Ponzetto2012b] Roberto Navigli and Simone Paolo Ponzetto. 2012b. Multilingual wsd with just a few lines of code: the babelnet api. In Proceedings of the ACL 2012 System Demonstrations, pages 67–72. Association for Computational Linguistics.

[Pala and Smrz2004] Karel Pala and Pavel Smrz. 2004. Building czech wordnet. Romanian Journal of Information Science and Technology, 7(2-3):79–88.

[Pedersen et al.2004] T Pedersen, S Patwardhan, and J Michelizzi. 2004. Wordnet similarity - measuring the relatedness of concepts. Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-2004): Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004), pages 1024–1025.

[Prabhu et al.2012] Venkatesh Prabhu, Shilpa Desai, Hanumant Redkar, NR Prabhugaonkar, Apurva Nagvenkar, and Ramdas Karmali. 2012. An efficient database design for IndoWordNet development using hybrid approach. 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP).

[Resnik1995] Philip Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'95, pages 448–453, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

[Sathapornrungkij and Pluempitiwiriyawej2005] Patanakul Sathapornrungkij and Charnyote Pluempitiwiriyawej. 2005. Construction of thai wordnet lexical database from machine readable dictionaries. Proc. 10th Machine Translation Summit, Phuket, Thailand.

[Surtani et al.2013] Nitesh Surtani, Arpita Batra, Urmi Ghosh, and Soma Paul. 2013. Iiith: A corpus-driven co-occurrence based probabilistic model for noun compound paraphrasing. Atlanta, Georgia, USA, page 153.

[Taghizadeh and Faili2016] Nasrin Taghizadeh and Hesham Faili. 2016. Automatic wordnet development for low-resource languages using cross-lingual wsd. Journal of Artificial Intelligence Research, 56:61–87.

[Tufis et al.2004] Dan Tufis, Dan Cristea, and Sofia Stamou. 2004. Balkanet: Aims, methods, results and perspectives. a general overview. Romanian Journal of Information science and technology, 7(1-2):9–43.

[van der Plas and Apidianaki2014] Lonneke van der Plas and Marianna Apidianaki. 2014. Cross-lingual word sense disambiguation for predicate labelling of french. In Proceedings of the 21st TALN (Traitement Automatique des Langues Naturelles) conference, pages 46–55.

[Van der Plas et al.2011] Lonneke Van der Plas, Paola Merlo, and James Henderson. 2011. Scaling up automatic cross-lingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 299–304. Association for Computational Linguistics.

[Vossen2002] Piek Vossen. 2002. Wordnet, eurowordnet and global wordnet. Revue française de linguistique appliquée, 7(1):27–38.

[Wu and Palmer1994] Zhibiao Wu and Martha Palmer. 1994. Verbs semantics and lexical selection. In Proceedings of the 32Nd Annual Meeting on Association for Computational Linguistics, ACL '94, pages 133–138, Stroudsburg, PA, USA. Association for Computational Linguistics.
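
As an illustration of the Wu-Palmer measure of Equation (1) and of the scoring and first-pass linking described in Algorithm 4.1 and Algorithm 4.2, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions: the taxonomy `PARENT` is a hypothetical toy "is-a" tree standing in for the Princeton WordNet, all node and function names are invented for the example, depth is counted in nodes so that the root has depth 1, and len(c1, c2) is taken as the number of edges on the path through the most specific common ancestor.

```python
from collections import defaultdict

# Hypothetical toy "is-a" taxonomy (child -> parent), standing in for
# the Princeton WordNet hierarchy. All node names are made up.
PARENT = {
    "entity": None,
    "plant": "entity",
    "crucifer": "plant",
    "cauliflower_plant": "crucifer",
    "food": "entity",
    "vegetable": "food",
    "cauliflower_head": "vegetable",
    "cabbage": "vegetable",
}


def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def depth(c):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(c))


def lso(c1, c2):
    """Most specific common ancestor of c1 and c2."""
    seen = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in seen)


def sim_wp(c1, c2):
    """Equation (1): 2*depth(lso) / (len(c1, c2) + 2*depth(lso))."""
    d = depth(lso(c1, c2))
    path_len = depth(c1) + depth(c2) - 2 * d  # edges between c1 and c2
    return 2 * d / (path_len + 2 * d)


def calculate_probability_score(pairs):
    """Sketch of Algorithm 4.1: sum sim_wp per candidate m, best first."""
    scores = defaultdict(float)
    for m, n in pairs:
        scores[m] += sim_wp(m, n)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def link_synset(p1, p2, concept_count):
    """Sketch of Algorithm 4.2 (first pass): link only unambiguous cases."""
    pairs = [(m, n) for m in p1 for n in p2]  # cross product of P1 and P2
    ranked = calculate_probability_score(pairs)
    if ranked and (len(ranked) == 1 or concept_count == 1):
        return ranked[0][0]
    return None  # left for lexicographers or a later pass


p1 = ["cauliflower_plant", "cauliflower_head"]  # PWN candidates for word w
p2 = ["cabbage", "vegetable"]                   # candidates from synonyms
print(calculate_probability_score([(m, n) for m in p1 for n in p2]))
print(link_synset(p1, p2, concept_count=1))
print(link_synset(p1, p2, concept_count=2))
```

With this toy data the first `link_synset` call returns `cauliflower_head`, because the word is assumed to have a single Bengali concept, while the second returns `None`: two candidates remain and the word has two concepts, so the case is deferred, mirroring the condition in Algorithm 4.2. Summing scores per candidate, rather than taking the single best pair, follows the description of Algorithm 4.1.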