Sam āsa-Kart ā: An Online Tool for Producing Compound using IndoWordNet

Hanumant Redkar, Nilesh Joshi, Sandhya Singh, Irawati Kulkarni, Malhar Kulkarni, Pushpak Bhattacharyya

Center for Indian Language Technology, Indian Institute of Technology Bombay, India

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

of the compound 1. A compound word is a Abstract that consists of more than one stem. An is a word composed of more Sam āsa or compounds are a regular feature of than one free morpheme. However, in , a Indian Languages. They are also found in compound, also known as (sam āsa ) is de- other languages like German, Italian, French, समास fined as (p thagarth Russian, Spanish, etc. Compound word is पृथगथाϕनामेकाथĖभावः समासः ṛ ā constructed from two or more words to form nāmek ārth ībh āva ḥ sam āsa ḥ, placing together two a single word. The meaning of this word is or more words so as to express a composite sense, derived from each of the individual words of which is a compound composition) 2. Example, the compound. To develop a system to gener- िशवपŨी (śivapatn ī, wife of śiva and a benevolent ate, identify and interpret compounds, is an aspect of dev ī) is a sam āsa or a compound formed important task in Natural Language Pro- from two words (śiva, a major divinity in the cessing. This paper introduces a web based िशव tool – Sam āsa-Kart ā for producing compound later Hindu pantheon) and पŨी (patn ī, a married words. Here, the focus is on Sanskrit lan- woman) which are formed from paraphrase िशवय guage due to its richness in usage of com- पŨी (śivasya patn ī, wife of śiva and a benevolent pounds; however, this approach can be aspect of dev ī). Sanskrit language has high usage applied to any Indian language as well as oth- of compounds in literature and is rich in producing er languages. IndoWordNet is used as a re- compound words. Pāṇini , the most referred San- source for words to be compounded. The skrit grammarian, mentioned various types of motivation behind creating compound words

is to create, to improve the vocabulary, to re- sam āsa and compounding system stated in the duce sense ambiguity, etc. in order to enrich form of 110 sutras (rules) in his book the WordNet. The Sam āsa-Kart ā can be used Aṣṭādhy āyī (Mishra, 2010). for various applications viz., compound cate- gorization, creation, morphological 1.1 Types of Sam āsa in Sanskrit analysis, paraphrasing, synset creation, etc. In Sanskrit, there are four major types of Sam āsa : • अƆयीभाव (Avyay ībh āva ) - In avyay ībh āva 1 Introduction sam āsa, first member has primacy 3

Word compounding is an essential feature of any 1 http://grammar.ccc.commnet.edu/grammar/compounds.htm language. In literature, there are various definitions 2http://lukashevichus.info/knigi/abhyankar_shukla_sans_gram _dic.pdf 3 primacy – the fact of being pre-eminent or most important. 6 (पूवϕपदाथϕOधान , pūrva-pad ārtha-pradh āna ). comes part of the language and gets lexicalized . Here, the first member of this type of nom- The grammatical features of Sanskrit are pre- inal compounds is indeclinable, to which scribed for use and compounding is an important another word is added so that the newly feature of Sanskrit. formed compound also becomes indeclin- The paper is organized as follows: Section 2 in- able ( i.e., अƆय , avaya ). Example, यथाशिŎ troduces Sam āsa-Kart ā and its components in de- (yath āś akti , in accordance with one’s tail. Section 3 lists the salient features of Sam āsa- strength). Kart ā. Section 4 gives the limitation of Sam āsa- • Kart . Section 5 describes the related work. Final- तपुतपुďषďषďषďष (Tatpuru ṣa) - In tatpuru ṣa sam āsa , ā ly, we conclude the paper with the mention of second member has primacy ( उēरपदाथϕ- scope and enhancements to this tool and its useful- Oधान , uttara-pad ārtha-pradh āna ) and the first component is in a case relationship ness in the entire WordNet community. with another. Example, सयाकालः (sandh- 2 Sam āsa-Kart ā: The Compound Word yākāla ḥ, evening time). • Producer ůůůůůů ( ) – In dvandva sam āsa , both members have primacy ( उभयपदाथϕOधान , ubhaya-pad ārtha -pradh āna ). Here, the 2.1 What is Sam āsa-Kart ā? members are usually noun stems, connect- The Sam āsa-Kart ā7, also known as Compound ed in sense with 'and'. Example, Word Producer is an online tool developed to pro- रामलमणभरतशJुőाः (rāmalak ṣma ṇabharata duce compound words. The produced words are śatrughn āḥ, Ram and Laxman and Bharat formed using rule based system which takes two and Shatrughn). words from IndoWordNet database (Prabhu et. al, • बćWीिह (Bahuvr īhi ) - In bahuvr īhi sam āsa , 2012) with the help of IndoWordNet APIs both members refers to a thing which in it- (Prabhugaonkar et. al, 2012). The new word which self is not part of the compound is produced, is another word, which falls under any (अयपदाथϕOधान , anya-pad ārtha -pradh āna ). of the four types of sam āsas mentioned above. Example, गजाननः (gaj ānana ḥ, one whose There are two types of users for this tool – the face is that of an elephant). lexicographer and the validator. The basic job of lexicographer is to enter words, generate com- 1.2 IndoWordNet as a Resource pound words and temporarily add these compound WordNet is a composed of synsets words to the synset in WordNet database. The and their semantic and lexical relations. Synsets main task of validator is to validate if the com- are sets of synonyms or synonymous words (Miller pound words are properly produced and added to et al., 1990). IndoWordNet 4 is a linked structure of the WordNet database. of major Indian languages from Indo- Sam āsa-Kart ā basically produces compounds between Noun-Noun (NN-NN), Noun-Adjective Aryan, Dravidian and Sino-Tibetan families (NN-JJ), Noun-Verb (NN-VM), Adjective-Noun (Bhattacharyya, 2010). 5 (JJ-NN), Adverb-Noun (RB-NN) pairs. However, In this paper, we have taken Sanskrit WordNet it does not deal with the word combinations such as a resource. Sanskrit is an Indo-Aryan language as Noun-Adverb (NN-RB), Verb-Verb (VM-VM) and is one of the ancient languages. It has vast lit- and Verb-Noun (VM-NN) as they cannot be com- erature and a rich tradition of creating lexica pounded. (Kulkarni et al., 2010(a)). The roots of most of the languages in the Indo European family in India can 2.2 Components of Sam āsa-Kart ā be traced to Sanskrit (Kulkarni et al., 2010(b)). Sam āsa-Kart ā, the tool, has multiple components which Also, as stated in the article 351 of the constitution follows the pipeline architecture. Figure 1 shows the of India, the need arises for coining new words block diagram and figure 2 shows the basic interface of when the new object or an action related to it be- the Sam āsa -Kart ā.

4 http://www.cfilt.iitb.ac.in/indowordnet/ 6 http://www.constitution.org/cons/india/p17351.html 5 http://www.cfilt.iitb.ac.in/wordnet/webswn/wn.php 7 http://www.cfilt.iitb.ac.in/wordnet/samaaskarta/

For example, if a lexicographer inputs two

words, मदः (manda ḥ, disinclined to work or exer- tion) as first word and मितः (mati ḥ, knowledge and intellectual ability) as second word; we get the fol- lowing synset information:

For the word मदः (manda ḥ) Sense 1 Synonyms: मदः (manda ḥ), तुदपƗरमृजः (tundaparim ṛja ḥ), आलयः (ālasya ḥ), शीतकः (śī taka ḥ), अनुणः (anu ṣṇ aḥ), शीतलः (śī tala ḥ), कुठः (ku ṇṭ ha ḥ), अनाशुः (an āś uḥ) । Gloss: अवयकतϕƆेषु अOवृिēशीलः (ava śyakartavye ṣu aprav ṛtti śī la ḥ) । Example(s): "मदः Ƙकमिप न Oाƀोित " (manda ḥ kimapi na pr āpnoti )

For the word मितः (mati ḥ) Figure 1. Block diagram of Sam āsa-Kart ā Sense 1 Synonyms: मतम् (matam ), दृिƍः (dṛṣṭ iḥ), मितः (mati ḥ), धीः (dh īḥ) Gloss: Ƙकमिप वतु कमिप िवषयं वा अिधकृय कृतं । िचतनम् (kimapi vastu kamapi vi ṣaya ṃ vā adhik ṛtya k ṛta ṃ cintanam ) Example(s): "अमाकं मतेन भवताम् इदं कायİ न । समीिचनम् " ( asm āka ṃ matena bhavat ām ida ṃ kārya ṃ na sam īcinam ) Sense 2 Synonyms: मितः (mati ḥ), बुिŵः (buddhi ḥ), धीः (dh īḥ), Oाϔता (pr ājñat ā) Gloss: िनƇयािमकातःकरणवृिēः ययाः बलेन । िचतियतुं शϝयते (ni ścay ātmik ānta ḥkara ṇa- vṛtti ḥ yasy āḥ balena cintayitu ṃ śakyate)

Example(s): "धनलाभाथğ अयय मया जीवनाद् । Figure 2. Interface of Sam āsa-Kart ā िभϓाटनं वरम् " ( dhanal ābh ārthe anyasya maty ā j īvan ād bhik ṣāṭana ṃ varam ) The components of Sam āsa-Kart ā are described in detail as follows: Sense 3 Synonyms: (matam ), 2.2.1 Language and Words Selector मतम् अिभOायः (abhipr āya ḥ), स मितः (sammati ḥ), दृिƍः In this module, user selects input language, in our (dṛṣṭ iḥ), बुिŵः (buddhi ḥ), पϓः (pak ṣaḥ), भावः case, Sanskrit and the words are taken from (bh āva ḥ), मनः (mana ḥ), धी (dh ī), मितः IndoWordNet database to form a compound. Here, (mati ḥ), आकुतम् (ākutam ), आशयः (āś aya ḥ), the lexicographer types-in any character in the se- छदः (chanda ḥ) lected input language and all the words in database । Gloss: केषुिचत् िवषयाƘदषु Oकटीकृतः विवचारः starting with typed character appear in the drop (ke ṣucit vi ṣay ādi ṣu praka ṭīkṛta ḥ down list. Once words are selected, their corre- svavic āra ḥ) sponding synset information is displayed accord- ingly.

Example(s): "सवğषां मतेन इदं कायİ स यक् ।" ( sarve ṣāṃ matena ida ṃ kārya ṃ ववव रात -श दाः Ɔśनात -श दाः Oचलित (svar ānta-śabd āḥ) (vyañjan ānta-śabd āḥ) samyak pracalati ) (vowel-ending words) (consonant-ending words) अकारात मदः → मद चकारात वाक् → वाच् Here, the lexicographer chooses sense 1 of the (ak ārānta ) (manda ḥ → manda ) (cak ārānta ) (vāk → v āc) word (manda ḥ) and sense 2 of the word आकारात िवźा → िवźा जकारात िभषक् → िभषज् मदः मितः (ākārānta ) (vidy ā → vidy ā) (jak ārānta ) (bhi ṣak → bhi ṣaj ) (mati ḥ) to form a compound word मदमित → इकारात → तकारात भगवान् भगवत् (mandamati , lacking intelligence). He/she also has मितः मित (bhagav ān → (ik ārānta ) (mati ḥ → mati ) (tak ārānta ) freedom to select/deselect the synonymous words bhagavat ) ईकारात नदी → नदी दकारात शर द् → शरद् of these selected synsets. Also, in this module, the (īkārānta ) (nad ī → nad ī) (dak ārānta ) (śarad → śarad ) proper care has been taken to avoid words which उकारात भानुः → भानु नकारात आमा → आमन् cannot form sam āsa . There are some words having (uk ārānta ) (bh ānu ḥ → bh ānu ) (nak ārānta ) (ātm ā → ātman ) specific case endings which cannot be compound- ऋकारात माता → मातृ सकारात तेजः → तेजस् (ṛkārānta ) (mātā → m ātṛ) (sak ārānta ) (teja ḥ → tejas ) ed, e.g., a word यथा (yath ā, in which manner) can be compounded; however its synonyms यOकारेण Table 1. Words processed through Morph-Kāraka (yatprak āre ṇa), येन _Oकारेण (yena_prak āre ṇa) can- not be compounded, as they are specific case end- Once the morphological analysis is done on input ing adverbs. words, they are given to Sam āsa-Kāraka for fur- After selecting the appropriate synset and its ther processing. synonyms, lexicographer finally proceeds to gen- erate sam āsas or compound words. The compound Sam āsa-Kāraka words are then processed using the following The Sam āsa-Kāraka takes the processed words modules. from Morph-Kāraka and applies standard sam āsa 2.2.2 Sam sa Preprocessor rules based on grammar. The Sam āsa-Kāraka ā works at the semantic as well as syntactic level. At The Sam āsa Preprocessor performs a check wheth- semantic level, meanings of the words are consid- er the input words are valid to form a sam āsa or ered from the gloss to form the compounded word. not. Here, it will check part-of-speech (POS) of At syntactic level, the are appended/not each input word and validates if the combinations appended to the morphed words. The processed of POS like NN-NN, NN-JJ, JJ-NN, RB-NN, etc. words along with its Sam āsa type are passed to the can be formed. Sam āsa Categorizer as an input.

2.2.3 Word Generator For example, The Word Generator internally processes each in- 1) आमन् + शिŎ (ātman + śakti ) – Here, Sam āsa- put word by using Morph-Kāraka, Sam āsa- Kāraka identifies that both the words आमन् Kāraka , Sam āsa Categorizer and Sandhi-Kart ā to (ātman ) and शिŎ (śakti ) follows 2.2.8 rule षƏी form a compound word. The details of these sub (ṣaṣṭ hī) of Pāṇinian grammar. Hence, आमन् modules are as follows: (ātman ) is eligible to form Sam āsa with the (śakti ). However, the rule number 8.2.7 Morph Kāraka शिŎ नलोपः OाितपƘदकातय (nalopa ḥ pr ātipadik ān Morph-Kāraka or Morph Analyzer is executed tasya ) of P āṇinian grammar says that the न् (n) once the Sam āsa Preprocessor provides it the vali- should be removed from the word dated input words. In this module, each input word आमन् (ātman ). Hence, words (ātma ) and is taken and converted to its form by applying आम शिŎ (śakti ) is sent to Sam āsa Categorizer for fur- standard morphological rules. This is required, as ther processing. in Sanskrit WordNet, all nouns are stored in nomi- native singular form. In order to make compound 2) देव + ईश (deva + īś a) – Here, there is no infec- of these words, we need to bring these nouns to tion, hence these words are directly passed to their root form. Table 1 illustrates some of the the Sam āsa Categorizer. words processed through Morph-Kāraka .

Sam āsa Categorizer (words ending with ī) उकारात Sam āsa Categorizer identifies category of a (uk ārānta ) भानु + उदय → भानूदय (bh ānu + udaya → bh ānūdaya ) sam āsa like Avyay ībh āva , Tatpuru ṣa, Dvandva & (words ending with u)

Bahuvr īhi as per the sam āsa rules. Further, it identi- ऋकारात (ṛkārānta ) मातृ + ऋण → मातॄण fies its sub categories. It generates paraphrased (words ending with ṛ) (mātṛ + ṛṇ a → m ātṝṇ a) information using gloss of input words. This para- phrased information is stored here, which is further Ɔśनात -श दाः (vyañjan ānta-śabd āḥ) used in the WordNet Adder for paraphrasing of (consonant-ending words) compound words. तकारात (tak ārānta ) भगवत् + गीता → भगवŬी ता (bhagavat + g ītā → bhagavadg ītā) Sandhi-Kart ā (words ending with ta) दकारात Sandhi-Kart ā or Sandhi Joiner helps in joining two (dak ārānta ) शरद् + हिवष् → शरŵ िवष् (śarad + havi ṣ → śaraddhavi ṣ) words together which are passed through Sam āsa (words ending with da) नकारात Categorizer. The words are joined together by fol- (nak ārānta ) आमन् + शिŎ → आमशिŎ lowing sandhi rules of the language into considera- (words ending with na) (ātman + śakti → ātma śakti ) tion. The Sandhi-Kart ā performs on all the सकारात (sak ārānta ) मनस् + रथ → मनोरथ combinations of the selected synset words and (words ending with sa) (manas + ratha → manoratha ) produces list of joined words. All these joined words are given to the Sam āsa Ranker & Accumu- Table 2. Words processed through Sandhi-Kart ā lator module. Some of the examples of Sandhi-Kart ā usage for Following are the sub modules of WordNet Adder. words in Sanskrit are illustrated in table 2. Synset Finder 2.2.4 Sam āsa Ranker and Accumulator Here, the lexicographer checks if the intended In this module, all the combinations of words are synset already exists in the WordNet. If it exists ranked and accumulated together as per the most then the words are directly appended to the intend- frequent usage of words in the original WordNet ed synset’s vocabulary. If the synset does not exist, synsets. Here, Sam āsa Ranker Algorithm is used to then it passes through the Paraphraser to create rank the accumulated sam āsas. Once the ranking gloss of the compound word which will help in and accumulating of words are done, the sam āsas creating new synset . will be passed through the WordNet Adder module Paraphraser where its validity is checked and added to the WordNet. The Paraphraser automatically generates most like- ly gloss of the intended synset on the basis of input 2.2.5 WordNet Adder words. This gloss or a concept definition of a WordNet Adder is a semi-automatic process where synset is given to Paraphrase Validator for further newly formed sam āsas are passed through se- processing. quence of steps before adding to the synset in the Paraphrase Validator WordNet. ववव रात -श दाः Here, the lexicographer checks if the paraphrased (svar ānta-śabd āḥ) gloss is properly generated. If not, it is created / (vowel-ending words) edited manually by using the three principles of अका रात (ak ārānta ) देव + ईश → देवेश synset creation, viz., principle of minimality, cov- (words ending with a) (deva + īś a → deve śa) erage and replaceability (Bhattacharyya, 2010). आकारात This is given to the Word Adder module. (ākārānta ) िवźा + आलय → िवźा लय (vidy ā + ālaya → vidy ālaya ) (words ending with ā) Word Adder इकारात (ik ārānta ) Oित + उēर → Oयुēर (prati + uttara → pratyuttara ) The lexicographer finally fills-in other synset in- (words ending with i) formation like examples, gender, etc. and adds to ईकारात नदी + ईश → नदीश the WordNet using an online synset creation tool - (īkārānta ) (nad ī + īś a → nad īś a)

Synskarta (Redkar et al., 2014). The resultant and paraphrasing. This tool can identify the type of Sam āsas will either be the member of an existing compound and suggest its component’s root word. synset or it can be a new synset altogether. Jha et.al. (2009) proposed an Inflectional Mor- phology Analyzer for Sanskrit that identifies and 3 Salient Features of Sam āsa-Kart ā analyzes the inflected noun-forms and verb-forms in any sandhi-free text. The tool checks and labels Some of the salient features of Sam sa-Kart are ā ā each word as three basic POS categories - subanta , as follows: • ti ṅanta, and avyaya . It is based on a reverse Sam āsa or compounds are created on the flow. p inian approach to analyze ti anta verb forms • āṇ ṅ Sam āsa in WordNet helps in identifying mean- into their verbal base and verbal . The ing or concept of a compound occurring in the methodology used to create database tables to store literature. various morphological components of Sanskrit • Sam āsa-Kart ā helps in enriching the standard verb forms is based on the well defined and struc- of the language and to simplify the case-ending tured process of Sanskrit described by words in language under consideration. Pāṇini in his Aṣṭādhy āyī. This analyzer also in- • It assists in developing vocabulary, which in cludes the analysis of derived verb roots. turn, helps in improving the word count in a Gupta et al. (2009) proposed a Rule Based Algo- language. rithm for Sandhi-Viched ā of compound • It helps in automatic generation of paraphrases. words where one letter (whether single or con- • It helps in compound type identification. joined) is broken to form two words. Part of the • The compound words produced can be helpful broken letter remains as the last letter of the first to understand the multi-words. word and later part of the broken word forms the first letter of the next letter. A Sandhi-Viched ā 4 Limitation of Sam āsa-Kart ā module breaks the compound word in a sentence into constituent words, which enables to under- Some of the limitations of Sam āsa-Kart ā are: stand the meaning of the words better. This work • Used only for words in WordNet. aids in learning about the language grammar in an • Possibility of over generation of compounds. easy way. • In Sanskrit, verbs are in its root form; hence Satuluri et al. (2013) studied the generation of word pairs such as VM-VM and RB-VM are Sanskrit compounds and rewrote the grammar as a not implemented. combination of phrase structure rules and the regu- • The word combination NN-RB is not possible lar grammar. It listed various semantic features as as adverbs cannot come as a second word in constraints governing the formation of compounds the compound. in Sanskrit. The rules taken from Pāṇini for com- pound formation are classified into two sets – the 5 Related Work ones which designate a technical term to the input string or a part thereof termed as sa ṃjñ āsūtra , and In past, many researchers have worked on com- the others which transform the input string into pound words, more particularly for Sanskrit Lan- another termed as vidhis ūtra. Also, the various guage. To understand the need of the tool semantic information needed by the compound presented here, a study is done on different kinds formation rules is stated through ontological ap- of tools available for usage. Some tools and work proach. which were reviewed in the domain of compound Sanskrit being a highly inflected language in na- word and its related fields are presented here. ture, each of its word is inflected. If the words are Kumar et al. (2010) presented a Sanskrit com- not used in the correct case-endings, it may lead to pound processor tool, which automatically seg- a different meaning altogether, giving different ments and identifies the type of a compound using context. To simplify the usage of these case- the manually annotated data. To understand the endings, compound words are used. Also, if these compound; their approach involved segmentation, compound words are added to the WordNet, it may constituency , compound type identification help in identifying meaning of a compound occur- ring in the literature.

Hence, we have developed a web based tool Acknowledgments called Sam āsa-Kart ā. The approach used in this We graciously thank all the members of CFILT 8 tool follows the rule-based system which takes two lab, IIT Bombay for providing necessary help and words from IndoWordNet database as an input and ā produces a compound word. This resultant com- guidance needed for the development of Sam sa - ā pound word or sam āsa can be included as a synset Kart . Further, we sincerely thank the members member along with its paraphrase as a gloss in the IndoWordNet and Global WordNet community. WordNet. Work done by Kumar et al. is about identifica- References tion of compound word, whereas, our tool deals Amba Kulkarni, Soma Paul, Malhar Kulkarni, Anil with creation of new compound words. Jha et al. Kumar , Nitesh Surtani : Semantic Processing of created morphological analyzer; similarly, we have Compounds in Indian Languages, Proceedings of implemented Morph Kart ā which is created, spe- COLING 2012, Mumbai, December 2012. cifically for WordNet words. Gupta et al., created Sandhi Splitter, however, we have created Sandhi Anil Kumar, Vipul Mittal and Amba Kulkarni: Sanskrit Joiner, which is also specific for sam āsa of Compound Processor , Sanskrit Computational Lin- WordNet words. Hence, Sam sa -Kart can be guistics - 4th International Symposium, New Delhi, ā ā India, 2010. considered as a complete tool of producing com- pound words related to words in IndoWordNet da- George Miller, R., Fellbaum, C., Gross, D., Miller, K. J. tabase. 1990. Introduction to wordnet: An on-line lexical da- tabase . International journal of lexicography, OUP. 6 Conclusion (pp. 3.4: 235-244).

Sam āsa is a significant part of most of the lan- Girish Nath Jha, Muktanand Agrawal, Subash, Sudhir guages which is used to express meaning using less K. Mishra, Diwakar Mani, Diwakar Mishra, Manji number of words. The tool Sam āsa-Kart ā, dis- Bhadra, Surjit K. Singh, Inflectional Morphology cussed in this paper, is an attempt to improve upon Analyzer for Sanskrit , Sanskrit computational lin- the richness and coverage of a language using a guistics. Springer Berlin Heidelberg, 2009. semi-automated approach. It takes words from IndoWordNet and creates Sam āsa or compound Hanumant Redkar, Jai Paranjape, Nilesh Joshi, Irawati word(s). Sam āsa-Kart ā uses rule based system to Kulkarni, Malhar Kulkarni, and Pushpak form the compounds by passing through various Bhattacharyya. 2014. Introduction to Synskarta: An Online Interface for Synset Creation with Special rules of grammar at each sub module. This tool is Reference to Sanskrit. ICON 2014, Goa, India. able to create new compound words along with its paraphrase which can be added to the WordNet. Malhar Kulkarni, Chaitali Dangarikar, Irawati Kulkarni, Abhishek Nanda and Pushpak Bhattacharyya. 7 Future Scope and Enhancements 2010(a). Introducing Sanskrit Wordnet . In Principles, Construction and Application of Multilingual In future, the tool can be extended to other Indi- WordNets, Proceedings of the 5th GWC, edited by an languages belonging to Indo-Aryan, Dravidian Pushpak Bhattacharyya, Christiane Fellbaum and and Sino-Tibetan families viz ., Hindi, Marathi, Gu- Piek Vossen, Narosa Publishing House, New Delhi, jarati, Bengali, Konkani, Kannada, etc . It can also 2010, pp 257 – 294. be extended to other non-Indian languages like English, German, Italian, etc. This tool can have Malhar Kulkarni, Irawati Kulkarni, Chaitali Dangarikar additional features such as non-WordNet words. and Pushpak Bhattacharyya. 2010(b). Gloss in San- This will be useful in the light of development of skrit Wordnet . In Proceedings of Sanskrit Computa- improving the vocabulary of the language, thus tional Linguistics. Jha. G. Berlin: Springer-Verlag / Heidelberg. pp 190-197. enhancing the richness of the language. Some of the major modules of this tool such as Morph Neha R. Prabhugaonkar, Apurva S. Nagvenkar, and Kāraka, Sandhi Kart ā, Sam āsa Categorizer, Para- Ramdas N. Karmali. 2012. IndoWordNet Application phraser, etc. can be made available independently.

8 http://www.cfilt.iitb.ac.in/

Programming Interfaces. In 24th International Con- ference on Computational Linguistics (COLING 2012), p. 237.

Pavankumar Satuluri, Amba Kulkarni, Generation of Sanskrit Compounds , Proceedings of ICON, 2013 .

Priyanka Gupta, Vishal Goyal. 2009. Implementation of Rule Based Algorithm for Sandhi-Vicheda of Com- pound Hindi Words . JCSI International Journal of Computer Science Issues, Vol. 3, 2009.

Pushpak Bhattacharyya. 2010. IndoWordNet . In the Proceedings of Lexical Resources Engineering Con- ference (LREC), Malta.

Ramashankar Mishra. 2010. अƍायायीसूJपाठः . Motilal Banarasidas publishers pvt. ltd, New Delhi (ISBN 978-81-208-2748-6).

Venkatesh Prabhu, Shilpa Desai, Hanumant Redkar, Neha Prabhugaonkar, Apurva Nagvenkar, Ramdas, Karmali. 2012. An Efficient Database Design for IndoWordNet Development Using Hybrid Approach . COLING 2012, Mumbai, India. p 229.