
Building a WordNet for Sinhala

Indeewari Wijesiri, University of Moratuwa, Moratuwa, Sri Lanka
Malaka Gallage, University of Moratuwa, Moratuwa, Sri Lanka
Buddhika Gunathilaka, University of Moratuwa, Moratuwa, Sri Lanka
Madhuranga Lakjeewa, University of Moratuwa, Moratuwa, Sri Lanka
Daya C. Wimalasuriya, University of Moratuwa, Moratuwa, Sri Lanka
Gihan Dias, University of Moratuwa, Moratuwa, Sri Lanka
Rohini Paranavithana, University of Colombo, Colombo, Sri Lanka
Nisansa de Silva, University of Moratuwa, Moratuwa, Sri Lanka

Abstract

Sinhala is one of the official languages of Sri Lanka and is used by over 19 million people. It belongs to the Indo-Aryan branch of the Indo-European languages and its origins date back at least 2000 years. It has developed into its current form over a long period of time with influences from a wide variety of languages including Tamil, Portuguese and English. As for any other language, a WordNet is extremely important for Sinhala to take it into the digital era. This paper is based on the project to develop a WordNet for Sinhala based on the English (Princeton) WordNet. It describes how we overcame the challenges in adding Sinhala-specific characteristics, which were deemed important by Sinhala language experts, to the WordNet while keeping the structure of the original English WordNet. It also presents the details of the crowdsourcing system we developed as a part of the project, consisting of a NoSQL database in the backend and a web-based frontend. We conclude by discussing the possibility of adapting this architecture for other languages and the road ahead for the Sinhala WordNet and Sinhala NLP.

1 Introduction

Despite being used by over 19 million people and being one of the official languages of Sri Lanka, there has not been much progress in developing natural language processing (NLP) applications for the Sinhala language. This is partly due to the lack of commercial interest in developing Sinhala NLP applications on a global scale. For instance, as of now, neither Google Translate¹ nor Google News² is available for Sinhala while both are available in Hindi and Tamil – two other regional languages spoken by a much larger population and thus with a higher business value.

¹ http://translate.google.com/
² https://support.google.com/news/answer/40237

Within this backdrop, we believe that developing a fully functional WordNet for Sinhala would provide a much needed boost for Sinhala NLP work. This is because it is well recognized that a WordNet is a very important tool in performing natural language processing tasks for any language. A WordNet will be helpful to Sinhala NLP application developers in tasks ranging from word sense disambiguation and information retrieval to translation. Moreover, a Sinhala WordNet will be a valuable resource to linguists studying the Sinhala language. We paid special attention to the interests and concerns of the latter group as described later in the paper.

The project team, mainly consisting of personnel from the Knowledge and Language Engineering Lab of University of Moratuwa, started the task of developing a WordNet for Sinhala with several brainstorming sessions which involved Sinhala language experts, computer science specialists and people who had previously made some contributions in digitizing the Sinhala language (for example in developing Sinhala characters). Although we were biased towards using the expansion approach, which develops a WordNet based on an existing WordNet for another language, we discussed the possibility of adopting the merge approach, which develops a WordNet from first principles by leveraging existing dictionaries and other resources (Bhattacharyya, 2010). We settled on the expansion approach because it was evident that we do not have the resources to successfully pursue the merge approach.

We came up with a basic design for the WordNet through the above mentioned brainstorming sessions and then proceeded to develop the technical infrastructure needed. This consists of developing Sinhala WordNet APIs and a web interface as well as a crowdsourcing system to add synsets and relationships. The latter is needed because coming up with Sinhala synsets and relationships based on the synsets of another language requires a lot of manual work. Initially we were planning to use the Hindi WordNet as the source WordNet but switched to the English WordNet a couple of months into the project. The reasons for this change are discussed in Section 2.2. Apart from this the development effort proceeded fairly smoothly and we have completed the implementation of the WordNet API and the crowdsourcing system. Currently we are in the process of adding synsets using this system.

The rest of the paper is organized as follows. In Section 2, we present the details of the discussions we had with Sinhala language experts and the effects these discussions had on the structure of the Sinhala WordNet. In Section 3 we discuss the technical details of the project. Here, we describe the use of a NoSQL database to facilitate modification to a WordNet, which has not been done before to the best of our knowledge. In Section 4, we describe how the crowdsourcing system works including how it gives suggestions to the contributors, simplifying their task. We reflect on some important aspects of the project, including the possibility of adapting the entire system to other languages, in Section 5. We present the details of some related work in Section 6 and provide concluding remarks in Section 7.

2 Developing the Linguistic Infrastructure

Development of linguistic infrastructure was carried out as the first phase of the project. Several discussions with Sinhala language experts were conducted to better understand the key features of the Sinhala language.

2.1 Discussions with Sinhala Linguists

From the beginning of the project the development team collaborated with some prominent experts on the Sinhala language. The basic idea of this collaboration was to acquire the knowledge of the Sinhala language needed to understand the linguistic requirements of a Sinhala WordNet, and to form an expert evaluator panel to help with the crowdsourcing effort in developing the WordNet.

One important topic discussed with the experts was that Sinhala has a significant difference in written and spoken usage. These differences include differences in word usage and differences in grammar. We were particularly interested in differences in word usage in spoken and written forms as grammar rules fall outside the scope of a WordNet. It was observed that words with subtle but important differences are used in the written and spoken forms of Sinhala. For instance, for the sense “man”, මිනිසා (minisa) is the most frequent word used in written Sinhalese while මිනිහා (miniha) is the most frequent word used in spoken Sinhalese. While the difference is subtle (a single phoneme in this case) its implications are significant for a native speaker of Sinhala. In this case, using මිනිසා in normal conversations appears extremely odd. Moreover, such differences are very common and combining words used in spoken and written Sinhala results in very odd phrases.

The problem faced by us was whether to include this difference in the Sinhala WordNet. Doing so would go against the main objective of a WordNet, which is organizing words by their meanings; clearly there is no difference in the meanings of මිනිසා and මිනිහා as it is simply a matter of language usage. Despite this concern, we decided to include this difference as a flag for each word due to the following reasons.

1. Not including these in the WordNet would result in the loss of a valuable opportunity to encode these differences in a machine readable manner; the contributors of the crowdsourcing system can do this with little extra effort but doing it as a separate project would require a lot more effort. The importance of this factor is magnified by the lack of commercial interest in Sinhala NLP.
2. Since one of the primary reasons for developing a Sinhala WordNet was to serve the needs of Sinhala linguists we wanted to accommodate their requirements. We suspected that eliminating this type of information would make the WordNet less useful to them. Janssen (2002) has made a similar argument with regard to eliminating gender information from WordNets. Hence, adding this information to the WordNet was seen as a pragmatic move.
3. Different words being used in spoken and written Sinhala is an extremely common phenomenon that cannot simply be ignored or left for later consideration.

By the same reasoning, we decided to add a few more features of the Sinhala language to the WordNet. One of them is the gender difference. The genders in Sinhala are masculine and feminine but none is specified for some words (typically for things that are not alive). The gender of a noun is important as it decides which morphological form of a verb is used with it. Thus the Sinhala WordNet will contain the gender of each noun, if one exists.

Sinhala words can be divided into three main categories: native words, words directly borrowed from another language and used without any change (තත්සම - tatsama), and words borrowed from another language and modified (තත්භව - tatbawa). The words have been mainly borrowed from Sanskrit, Pali, Hindi, Portuguese, English, Tamil and Dutch. In constructing phrases in Sinhala, the origin of the word should be considered, similar to how the spoken/written differentiation is used. As an example, ‘mathru’ (මාතෘ) and ‘maw’ (මව්) are two forms that express the meaning “mother’s” in Sinhala, but ‘mathru’ is a tatsama while ‘maw’ is a tatbawa. ‘snehaya’ (ස්නේහය) and ‘senehasa’ (සෙනෙහස) mean “affection” and are again a tatsama and a tatbawa respectively. To express “mother’s affection”, people use either ‘mathru snehaya’ (මාතෘ ස්නේහය) or ‘maw senehasa’ (මව් සෙනෙහස) while the other two combinations appear odd. This is despite the fact that all four words are acceptable in written Sinhala. Thus details of the origin of a word are also included in the Sinhala WordNet. Both the source language and the derivation type (tatsama/tatbawa) are kept in this regard.

Each noun in Sinhala can be in 9 morphological forms called ‘vibhakthi’ (විභක්ති). Furthermore, there are fairly complicated rules for forming words called ‘sandi’ (සන්ධි) and ‘samasa’ (සමාස). The formation of these forms and rules, as well as inflection, is based on the root of the word, which may not be the most commonly used form of the word. Therefore, it was decided to keep the word root as well as the most common morphological form in storing a word in the WordNet.

In summary, we decided to include the following features for each word.

• Written/spoken usage
• Gender
• Origin of the word
• Word root
• The most common morphological form

It is interesting to relate these features, which are deemed important in representing Sinhala words in a machine-processable format, to a standard lexical-encoding framework. Our discussion in this regard is based on the lemon (Lexicon Model for Ontologies) framework (McCrae et al., 2012). Our view is that the written/spoken usage and the origin of the word are properties under the linguistic description module of lemon outside its core. These will be used by the phrase-structure module in identifying well-formed phrases. The word root is related to the morphology module, while the most common morphological form is the main lexical entry in the core for the word concerned. The gender information is useful for inflection in the morphology module and in recognizing words that do not have certain morphological forms (e.g., රැජින - rajina - the queen does not have a masculine form).

2.2 Selecting the Source WordNet

As mentioned earlier we decided to develop the Sinhala WordNet following the expansion approach due to practical considerations. Then the question was which WordNet to use as the source WordNet. We first decided to use the Hindi WordNet (Jha et al., 2001) for this purpose due to the following reasons.

1. The Sinhala language belongs to the Indo-Aryan branch of the Indo-European languages and is heavily influenced by the classical Indian languages of Sanskrit and Pali. Since Hindi is close to Sanskrit and the Hindi WordNet is fairly sophisticated - it serves as the hub of the Indo WordNet initiative (Bhattacharyya, 2010) - we assumed that the Hindi WordNet would provide a good basis for developing the Sinhala WordNet. We even considered using the Sanskrit WordNet as the source WordNet but realized that it is still in an early stage.
2. The success of the Indo WordNet initiative in creating WordNets for many languages in India (Bhattacharyya, 2010) was one of the main motivations for us in embarking on this project. It was assumed that using the Hindi WordNet as the source WordNet would help us leverage the success of the Indo WordNet.

However, as we proceeded with the development work, it became apparent that using the Hindi WordNet as the source WordNet was not a viable option. The following are the main reasons for this.

1. Despite the perceived similarity in the origins of the languages, Hindi and Sinhala are very different languages in many aspects related to WordNet construction. One difficulty associated with this is that Hindi is written in Devanagari script, which is not familiar to most Sinhala speakers (Sinhala has its own alphabet). Moreover, for many Hindi words it was difficult to identify Sinhala words with the same meaning, even after knowing how the word is pronounced. It was thought that translating Hindi words to Sinhala would be easier once the pronunciation is known because words of the two languages are often pronounced similarly – e.g., Sinhala බෑයා (baaya) vs. Hindi भाई (bhai) meaning brother. It was seen that such similarities are not very common. As a result, we found ourselves frequently translating words from Hindi to English to understand the relevant Sinhala words.
2. It was seen that adopting the technical infrastructure of the Indo WordNet project to develop the Sinhala WordNet was difficult. Part of this is due to communication difficulties – all other WordNets of the Indo WordNet have been developed within India itself. In addition, our requirement to add flags to words in addition to flags for synsets as described in Section 2.1 created additional complexities and we found that accommodating these changes in the Indo WordNet text database structure was very difficult. The Princeton English WordNet (Fellbaum, 1998), with its extensive documentation and support network, was seen as a much better alternative in this context.
3. A significant percentage of native Sinhala speakers have a working knowledge of English and it was seen that this will be very useful for a crowdsourcing system. In contrast, familiarity with the Hindi language is not widespread and this, combined with the fact that most Hindi words are apparently unfamiliar to Sinhala speakers as described in (1), means that it is very difficult to use the Hindi WordNet in a crowdsourcing system.

Based mainly on the above factors, we switched the source WordNet from Hindi to English early in the development stage. The fact that the WordNets for Arabic (Rodríguez et al., 2008) and Japanese (Isahara et al., 2008), which have very little in common with English, have also been developed with the English WordNet as the source also weighed in on our decision.

We were mindful of the consequences of using the English WordNet as the source WordNet in developing the Sinhala WordNet. It has been stated that the source WordNet can have a distracting influence on the new WordNet being created, especially when the two languages exist in different regions and cultural settings (Bhattacharyya, 2010). It is clear that this concern is applicable here. As such we decided to aggressively remove existing synsets in the English WordNet and add new synsets as necessary when developing the Sinhala WordNet.

3 Developing the Technical Infrastructure

After developing the linguistic infrastructure, we focused on developing the technical infrastructure according to the requirements identified. The main challenges we faced here were resolving the complications arising when extending the Princeton WordNet API, dealing with different data structures, and selecting tools and technologies. In this section, we describe the salient features of the architecture of the system and how we approached the above mentioned challenges.

3.1 The WordNet API

The Sinhala WordNet API is implemented on the Java platform, extending the English WordNet API (JWNL)³. The basic idea of developing this API is to provide general WordNet functionalities as well as the specific functionalities of the Sinhala WordNet discussed above. We defined new classes for synset, word, noun, verb, adjective and adverb extending the JWNL classes. The JWNL documentation and mailing lists were extremely helpful to us in this exercise. Incorporating Sinhala characters in the API was based on the Sinhala Unicode characters.

³ http://jwordnet.sourceforge.net/handbook.html

3.2 System Architecture

Figure 1 shows the architecture of the entire system, consisting of the API and the crowdsourcing system. For non-technical users, the main outputs of the system are the online and offline Sinhala WordNet browsers and the web-based interface for the crowdsourcing system. Developers will have access to these components as well as the source code of the Sinhala WordNet API, the WordNet Constructor Core (which governs how the crowdsourcing system operates), the MongoDBToTextDB Transformer and the schemata of the underlying databases.

Figure 1: System Architecture

The components in the presentation layer get the data they need from three sources.

1. The English WordNet: The data contained in the English WordNet text database in terms of synsets and relationships are used.
2. The NoSQL Database: The modifications made by contributors of the crowdsourcing system to the data of the English WordNet are stored in this database.
3. Linguistic Resources: Several linguistic resources such as available machine readable dictionaries for Sinhala are used in providing suggestions for the collaborators.

Components in the Data Access Layer are used by the two components in the Process Layer to access the necessary data.

The MongoDBToTextDB Transformer gets the data from the NoSQL database as well as the text database of the English WordNet because the NoSQL database only contains the modifications made by collaborators. It combines the data from the two sources into the text database of the Sinhala WordNet API. This step is carried out when releasing a new version of the Sinhala WordNet.

3.3 Use of a NoSQL Database

According to the system architecture described above, we need a database to store the modifications performed by the contributors of the crowdsourcing system. The modifications include adding Sinhala words to a synset, adding features to words and synsets, adding relationships between words/synsets and adding and removing synsets.

Until recently, the standard solution for this type of data storage need has been to use a relational database system. However, the use of NoSQL databases has increased in the recent past, partly due to the flexibility they offer to the schema designer. Instead of being restricted to a relational schema, which often requires multiple tuples spread across several relations for the same logical data unit, NoSQL databases allow the designers to store data according to the semantics behind them. We realized that these advantages will be important in our system since a synset consists of an unlimited number of words, each with several distinct features.

Another advantage of using NoSQL databases is that they provide better scalability than relational database systems, especially in setting up multiple servers connected to a web-based frontend. This too will be helpful in using a crowdsourcing approach for WordNet creation as the system will provide better performance for the contributors.

However, it was noted that NoSQL solutions do not guarantee consistency of the database although they provide eventual consistency. Therefore, it is possible, in rare conditions, for two contributors to make contradictory updates in the database. In the context of our system, these inconsistencies can be resolved later, generally in evaluation. Moreover, any inconsistencies do not affect the releases of the Sinhala WordNet as they use the text database, assuming that any contradictions are resolved before a release.

We concluded that the advantages of NoSQL databases outweigh their disadvantages and decided to use one. We selected the MongoDB NoSQL system (Plugge et al., 2010). Table 1 shows the schema we used for nouns. To the best of our knowledge, this is the first time a NoSQL database has been used in developing a WordNet.

Noun
  _id
  _class
  userName
  EWNID
  Words
    _id
    Lemma
    wordID
    wordPointerList
      pointerType
      synsetType
      synsetId
      wordId
  sensePointers
    pointerType
    synsetType
    synsetId
  gloss

Table 1: Schema for Nouns

Currently, the source repository is maintained as a private GitHub project. We will make it public in the near future.

4 The Crowdsourcing System

4.1 Overview

As mentioned earlier, a crowdsourcing system to facilitate the development of the Sinhala WordNet was designed and implemented as a part of the project. As illustrated in Figure 1, the WordNet Constructor Core component contains the major functionalities of this system. It obtains different types of data through the components of the Data Access Layer and provides an interface to be used by the web-based interface of the crowdsourcing system. The following are the different types of data used by this component through the Data Access Layer.

1. Information contained in the English WordNet through the EWN API (JWNL).
2. Information obtained from several linguistic resources for the Sinhala language including machine readable dictionaries and thesauri. These are used to specify suggestions to contributors to simplify their task as described in Section 4.2.
3. Information contained in the MongoDB database, which contains the modifications made by the contributors as mentioned earlier.

The web-based user interface allows contributors to browse through the English WordNet hierarchy and perform modifications as necessary. If no work has been done on a particular synset of the English WordNet, they will be shown the data contained in the English WordNet and are expected to replace them with Sinhala words. These changes include adding words to synsets, specifying flags for the words (e.g., whether the word is used in written/spoken Sinhala) and adding relationships. All the modifications are saved in the MongoDB database.

Figure 2: The UI of the Crowdsourcing System

Figure 2 shows the web interface when adding Sinhala words/relationships for the English synset for one sense of the word “phenomenon”. Since Sinhala words have not been added to this synset, it shows the available information in the English WordNet. In addition, it shows suggested Sinhala words obtained from linguistic resources as described in Section 4.2.

The web-based user interface is operational and can be accessed from http://www.wordnet.lk. The modifications made by the contributors have to be approved by an evaluator before being included in a release.

How to effectively use a crowdsourcing technique to get a particular task done with acceptable quality is an open research question. Dow et al. (2012) have found that assessment of work produced, whether it is external assessment or self-assessment, is very helpful in this regard. As such, we expect the feedback provided by evaluators to help our effort.

4.2 Providing Suggestions

The purpose of providing suggestions for contributors is simplifying their task so that they do not have to rely entirely on their knowledge and available printed material. Currently, we provide suggestions for English words based on machine readable English to Sinhala and Sinhala to Sinhala dictionaries and thesauri. Out of the available resources, we found the Madura English-Sinhala dictionary (Kulatunga, n.d.) particularly helpful. We are currently in the process of improving this component by incorporating the thesauri developed by the Department of Official Languages of Sri Lanka and a text corpus compiled by ourselves.

5 Discussion

5.1 The Morphology of the Language

Sinhala is an inflectional language where many verbs and nouns have a fairly large number of morphological forms. Verbs and nouns frequently have more than 10 morphological forms when considering both spoken and written forms. This has implications for the WordNet as a person or a software system searching for a word may use a different morphological form from what is contained in the WordNet. We decided against storing all morphological forms of a word in the WordNet since that increases the number of words for a synset to an unmanageable level. As such, a good morphological analyzer, which is external to the WordNet, is necessary to obtain the full benefits of the WordNet. There have been previous attempts to develop a morphological analyzer for Sinhala which have produced satisfactory results (Hettige, 2006; Fernando and Weerasinghe, 2013).

5.2 Extending to Other Languages

While we did not develop our system with the objective of developing WordNets for languages other than Sinhala, we recognize that it has the potential to be used in this manner. The architecture of the system has to be changed in some places, for example in using linguistic resources of other languages for providing suggestions for contributors. But the overall design of displaying the information of the English WordNet, allowing the contributors to modify it with words from the target language and storing the modifications in the NoSQL database can be easily applied in developing a WordNet for another language based on the English WordNet following the expansion approach. It is possible to reuse the schema of the MongoDB database and the source code of the crowdsourcing interface, the WordNet Constructor Core and the MongoDBToTextDB Transformer in such an exercise. We plan to separate out these parts from our codebase as future work.

5.3 Current Status

The crowdsourcing system is currently operational and the number of synsets in the Sinhala WordNet is approaching 2000. This number is significant since it has been used as a marker by the Indo WordNet project in developing WordNets for languages in India (Bhattacharyya, 2010). Our goal is to release the first complete version early next year.

The Knowledge and Language Engineering Lab of the Department of Computer Science and Engineering at University of Moratuwa is coordinating this effort.

6 Related Work

The Hindi WordNet and the Indo WordNet initiative provided a lot of inspiration to us in attempting to develop a WordNet for Sinhala following the expansion approach. We followed their work in several aspects of the project such as the use of crowdsourcing to generate synsets.

There has been previous work on developing a WordNet for Sinhala by Welgama et al. (2011), which is basically an exploration of developing a WordNet for Sinhala by extracting some common words from a corpus and getting the help of Sinhala language experts to come up with synsets based on them. It can be seen that this work is related to the merge approach. Our work differs from this effort in our use of the expansion approach and the objective of developing a complete WordNet.

7 Conclusion

Developing a fully functional Sinhala WordNet can be considered a landmark in NLP for Sinhala and we believe that we are well set to achieve this in the near future. This will provide a tremendous boost for developing Sinhala NLP applications such as information retrieval systems, text classifiers, summarizers and translators. The availability of a platform in terms of a WordNet may even attract some commercial interest for Sinhala NLP.

It should also be recognized that our work has the potential to be generalized into a system that can be used to bootstrap WordNet creation for a language. If this goal can be achieved, it will be extremely helpful in developing WordNets for minority languages such as Sinhala.

Acknowledgements

We thank Prof. J.B. Disanayaka, Dr. Sandagomi Coperahewa and Mr. Achinthya Bandara of the Department of Sinhala of University of Colombo and Mr. Anushke Guneratne of the LK Domain Registry for their help in this project.

References

Pushpak Bhattacharyya. 2010. IndoWordNet. Proceedings of the Lexical Resources Engineering Conference.

Steven P. Dow, Anand Kulkarni, Scott R. Klemmer and Björn Hartmann. 2012. Shepherding the Crowd Yields Better Work. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work: 1013-1022.

Christiane Fellbaum (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press.

Niroshinie Fernando and Ruwan Weerasinghe. 2013. A Morphological Parser for Sinhala Verbs. Proceedings of the International Conference on Advances in ICT for Emerging Regions.

Buddhita Hettige. 2006. A Morphological Analyzer to Enable English to Sinhala Machine Translation. Proceedings of the 2nd International Conference on Information and Automation: 21-26.

Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki. 2008. Development of the Japanese WordNet. Proceedings of the Sixth International Conference on Language Resources and Evaluation.

Maarten Janssen. 2002. Differentiae Specificae in EuroWordNet and SIMuLLDA. Proceedings of the Ontologies and Lexical Knowledge Bases Workshop.

Madura Kulatunga. n.d. Madura English-Sinhala Dictionary. Retrieved September 6, 2013, from http://maduraonline.com/.

John McCrae, Guadalupe Aguado-de-Cea, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Asunción Gómez-Pérez, Jorge Gracia, Laura A Hollink, Elena Montiel-Ponsoda, Dennis Spohr and Tobias Wunner. 2012. Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation, 46(4): 701-719.

S. Jha, Dipak Narayan, Prabhakar Pande and Pushpak Bhattacharyya. 2001. A WordNet for Hindi. Proceedings of the International Workshop on Lexical Resources in Natural Language Processing.

Eelco Plugge, Tim Hawkins and Peter Membrey. 2010. The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing. Apress.

Horacio Rodríguez, David Farwell, Javi Farreres, Manuel Bertran, Antonia Martí, William Black, Sabri Elkateb, James Kirk, Piek Vossen and Christiane Fellbaum. 2008. Arabic WordNet: Current state and future extensions. Proceedings of the Fourth Global WordNet Conference.

Viraj Welgama, Dulip Lakmal Herath, Chamila Liyanage, Namal Udalamatta, Ruvan Weerasinghe and Tissa Jayawardana. 2011. Towards a Sinhala WordNet. Proceedings of the Conference on Human Language Technology for Development.
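
As an illustration of the storage design in Section 3.3 and the release step in Section 3.2, the following is a minimal sketch of how stored modifications might be overlaid on the English WordNet data. It is not the project's code: the class, record and method names (`SinhalaOverlaySketch`, `WordEntry`, `NounDoc`, `overlay`) are hypothetical, the field types are assumptions, and only the field names are taken from Table 1. It uses plain in-memory collections rather than any MongoDB or JWNL calls, and it assumes that a later document for the same EWNID replaces an earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class, record, and method names are hypothetical,
// and the field types are assumptions; only the field names come from Table 1.
final class SinhalaOverlaySketch {

    /** One entry of the "Words" list in a noun document. */
    record WordEntry(String lemma, String wordId) {}

    /** A stored modification for one English synset (identified by EWNID). */
    record NounDoc(String id, String userName, String ewnId,
                   List<WordEntry> words, String gloss) {}

    /**
     * Overlays stored modifications on the English synsets. Synsets without a
     * modification keep their English lemmas; when several documents share an
     * EWNID, the last one in the list wins (an assumption of this sketch).
     */
    static Map<String, List<String>> overlay(Map<String, List<String>> english,
                                             List<NounDoc> modifications) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        english.forEach((id, lemmas) -> merged.put(id, List.copyOf(lemmas)));
        for (NounDoc doc : modifications) {
            List<String> lemmas = new ArrayList<>();
            for (WordEntry word : doc.words()) {
                lemmas.add(word.lemma());
            }
            merged.put(doc.ewnId(), List.copyOf(lemmas));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> english = new LinkedHashMap<>();
        english.put("n-man", List.of("man"));       // made-up synset identifiers
        english.put("n-queen", List.of("queen"));

        // Romanized lemmas are used here; real data would hold Sinhala Unicode text.
        NounDoc man = new NounDoc("1", "contributor1", "n-man",
                List.of(new WordEntry("minisa", "w1"), new WordEntry("miniha", "w2")),
                "an adult male person");

        System.out.println(overlay(english, List.of(man)));
    }
}
```

Keeping only the modifications in the document store, as the paper describes, means that untouched English synsets never need to be copied; the overlay is computed once per release. A fuller version would also carry the per-word flags from Section 2.1 (written/spoken usage, gender, origin, root), which are omitted here because Table 1 does not list them.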