Experiences in Building the Konkani WordNet Using the Expansion Approach
Shantaram Walawalikar, ILCI - Konkani Team, Goa University
Shilpa Desai, Dept. of Computer Science & Tech., Goa University
Ramdas Karmali, Dept. of Computer Science & Tech., Goa University
Sushant Naik, ILCI - Konkani Team, Goa University
Damodar Ghanekar, ILCI - Konkani Team, Goa University
Chandralekha D'Souza, Dept. of Konkani, Goa University
Jyoti Pawar, Dept. of Computer Science & Tech., Goa University

Abstract

WordNet can be described as an electronic lexical database available on-line as a powerful resource to researchers in the area of computational linguistics, text processing and many other related areas. Currently, the necessity of building WordNets has been felt for all the Indian languages to aid in multilingual machine translation and cross-lingual information retrieval to promote tourism, farming, education and other related areas for overall growth and development of the nation. IIT Bombay, India has developed a number of tools, resources and facilities by which a WordNet of any language can be constructed through what is known as the expansion approach. Projects to create WordNets in most of the Indian languages using this approach, with the Hindi WordNet as the base, are currently in progress.

In this paper we report our experiences of creating a WordNet for the Konkani language using the expansion approach, with Hindi as the source language and Konkani as the target language. The Konkani WordNet is in the initial stage of development. The 1969 Hindi core synsets have been incorporated in the Konkani WordNet. The Offline Synset Linking Tool developed by IIT Bombay is being used for this task.

1. Introduction

WordNet can be described in short as a massive structure of words in a graph-like form. It is an electronic lexical database available as a powerful resource to researchers in the area of computational linguistics, text processing and many other related areas. Since 1987, when WordNet first appeared globally, it has come a long way, getting itself moulded as per the ongoing requirements of the users and making use of the advancement of technology, viz. Computer Science and Communications. Indo WordNet is India's contribution to this global effort, and the steps towards the development of the Konkani WordNet शब्दमालें shabdamAleM are a part of this initiative.

The layout of this paper is as follows: Section 2 discusses the characteristics of the Konkani language. A brief description of the Hindi WordNet, the expansion approach used to create the Konkani WordNet, observations made during the WordNet creation process and challenges faced are given in Section 3. Section 4 concludes the paper with a discussion on the future work plan.

2. Characteristics of Konkani Language

Konkani is one of the twenty-two languages included in the Eighth Schedule of the Constitution of India. It is also the official language of the State of Goa. Konkani is an Indo-European (Indo-Aryan) language derived from Sanskrit through Prakrit, and is influenced and enriched by various other languages like Marathi, Kannada, Malayalam, Hindi, Portuguese and English. Though Devanagari is recognised as the official script of Konkani, it is also written in the Roman and Kannada scripts. Old Konkani literature is also found written in the Malayalam and Urdu scripts.

The first edition of the Konkani grammar titled 'Arte da Lingua Canarin' was written sometime in 1617 A.D. by Fr. Thomas Stephens (Asmitai, 2008; Cunha, 1958). It was enlarged by Fr. Diogo Ribeiro, revised by four priests of the Society of Jesus, and printed in 1640. This is considered to be the first published grammar not only of Konkani but of any Indian language. Monsignor Sebastiao Rudolpho Dalgado was the first known Indian lexicographer of Konkani, as those who preceded him were all European missionaries. He contributed to the development of Konkani with his three important works: 'Konkani-Portuguese Dictionary' (1893), 'Portuguese-Konkani Dictionary' (1905), and 'Bouquet of Konkani Proverbs', consisting of 2177 proverbs.

2.1 Pronunciation

Shennoi Goembab (1949), in his book 'Konkanichi Vyakarani Bandavol', discusses pronunciation in detail.

In Konkani, अ, ए, ओ and औ have additional pronunciations besides the original Sanskrit pronunciations. अ in पणस paNasa 'jackfruit' is known as स्वरित svarita in Vedic Sanskrit. ए and ओ also have open pronunciations. These open pronunciations must have been influenced by the Pali language. They are found in other Indian languages like Bengali, Bihari, Gujarati, Kannada, Telugu, Tamil, Malayalam, etc., but not in Marathi.

In Konkani, the meaning of a word changes according to the pronunciation of a vowel in that word, e.g. पेर pera 'guava fruit or guava tree', मोर mora 'peacock, sg. or peacocks, pl.', वोंवळ voMvaLa 'kind of flower - Mimusops elengi flower or its tree'.

2.2 Number

Konkani has two numbers - singular and plural (Sardesai, 1986). The derivation of the plural form from the singular form is dependent on the gender and the phonetic characteristics of the singular form.

In some cases a change in the pronunciation of the vowel denotes the change in number, e.g. दोतोर dotora 'doctor or doctors', फातर phAtara 'stone or stones', देर dera 'brother-in-law or brothers-in-law', ओंठ oMTha 'lip or lips'.

2.3 Gender

Konkani has three genders - masculine, feminine and neuter. However, in some cases feminine nouns are also addressed as neuter, e.g., कमला आंगणांत खेळतालें kamalA AMgaNAMta kheLatAleM 'Kamala was playing in the courtyard'. Here, the verb खेळतालें refers to the neuter gender, whereas Kamala is otherwise a feminine noun.

It is also interesting to note that two synonymous nouns may have two different genders, e.g., रूख rUkha 'tree, masculine' and झाड jhADa 'tree, neuter'.

2.4 Word Structure

Konkani is a highly inflected language (Almeida, 1989). Nouns and pronouns are inflected for number and case. Verbs are inflected for person, number, gender, tense and aspect. Adjectives are inflected for gender and number.

The structure of a Konkani word (Goembab, 1949; Borkar, 1986) can be depicted as under:

Nominal Base (N.B.) + Nominal Inflection (N.I.)
    पुस्तकाचें pustakAcheM 'of the book'
(N.B.) + postposition
    रामाकडल्यान rAmAkaDalyAna 'from Ram'
(N.B.) + (N.I.) + (N.I.)
    पुस्तकांतलें pustakAMtaleM 'from the book'
(N.B.) + (N.I.) + postposition
    पुस्तकांतल्यान pustakAMtalyAna 'from inside the book'
(N.B.) + (N.I.) + postposition + (N.I.)
    पुस्तकापेल्यानचें pustakApelyAnacheM 'from beyond the book'
(N.B.) + (N.I.) + clitic
    पुस्तकाचेंच pustakAcheMcha 'of the book itself'
(N.B.) + postposition + clitic
    रामाकडल्यानय rAmAkaDalyAnaya 'also from Ram'
(N.B.) + (N.I.) + (N.I.) + clitic
    पुस्तकांतलेंच pustakAMtaleMcha 'from the book itself'
(N.B.) + (N.I.) + postposition + clitic
    जेवचेपासतच jevachepAsatacha 'only for meals'
(N.B.) + (N.I.) + postposition + (N.I.) + clitic
    पुस्तकापेल्यानचेंय pustakApelyAnacheMya 'also from beyond the book'

2.5 Verb Base

The verbal base of Konkani has three sources (Goembab, 1949): the present active base, the present passive base and the past passive participles. The roots are either active or passive in sense, the passive being intransitive and the active being transitive. The following is a sample of these forms, shown with the base form of the verb:

Non Perfective

Intransitive verb: धांवप dhAMvapa 'to run'

Present
              Singular    Plural
1st person    धांवतां      धांवतात
2nd person    धांवता       धांवतात
3rd person    धांवता       धांवतात

In the present tense, gender has no effect. The verb endings change in all other cases, and are differentiated below with the respective affixes in the sequence of masculine, feminine and neuter.

Imperfect
              Singular                  Plural
1st person    धांवतालों, -लीं, -लें       धांवताले, -ल्यों, -लीं
2nd person    धांवतालो, -ली, -लें        धांवताले, -ल्यो, -लीं
3rd person    धांवतालो, -ली, -लें        धांवताले, -ल्यो, -लीं

Future
              Singular                  Plural
1st person    धांवतलों, -लीं, -लें        धांवतले, -ल्यों, -लीं
2nd person    धांवतलो, -ली, -लें         धांवतले, -ल्यो, -लीं
3rd person    धांवतलो, -ली, -लें         धांवतले, -ल्यो, -लीं

Transitive verb: खावप khAvapa 'to eat'

Present
              Singular    Plural
1st person    खातां        खातात
2nd person    खाता         खातात
3rd person    खाता         खातात

Imperfect
              Singular                  Plural
1st person    खातालों, -लीं, -लें         खाताले, -ल्यों, -लीं
2nd person    खातालो, -ली, -लें          खाताले, -ल्यो, -लीं
3rd person    खातालो, -ली, -लें          खाताले, -ल्यो, -लीं

Future
              Singular                  Plural
1st person    खातलों, -लीं, -लें          खातले, -ल्यों, -लीं
2nd person    खातलो, -ली, -लें           खातले, -ल्यो, -लीं
3rd person    खातलो, -ली, -लें           खातले, -ल्यो, -लीं

Perfective

Intransitive

Present Perfect
              Singular                  Plural
1st person    धांवलां, -ल्यां, -लां        धांवल्यात, -ल्यांत, -ल्यांत
2nd person    धांवला, -ल्या, -ला         धांवल्यात, -ल्यांत, -ल्यांत
3rd person    धांवला, -ल्या, -ला         धांवल्यात, -ल्यांत, -ल्यांत

Past
              Singular                  Plural
1st person    धांवलों, -लीं, -लें          धांवले, -ल्यों, -लीं
2nd person    धांवलो, -ली, -लें           धांवले, -ल्यो, -लीं
3rd person    धांवलो, -ली, -लें           धांवले, -ल्यो, -लीं

Past Perfect
              Singular                  Plural
1st person    धांविल्लों, -ल्लीं, -ल्लें     धांविल्ले, -ल्ल्यों, -ल्लीं
2nd person    धांविल्लो, -ल्ली, -ल्लें      धांविल्ले, -ल्ल्यो, -ल्लीं
3rd person    धांविल्लो, -ल्ली, -ल्लें      धांविल्ले, -ល्ल्यो, -ल्लीं

Transitive

Present Perfect
              Singular    Plural
1st person    खाला         खाल्यात
2nd person    खाला         खाल्यात
3rd person    खाला         खाल्यात

Past
              Singular    Plural
1st person    खालो         खाले
2nd person    खालो         खाले
3rd person    खालो         खाले

Past Perfect
              Singular    Plural
1st person    खाल्लो        खाल्ले
2nd person    खाल्लो        खाल्ले
3rd person    खाल्लो        खाल्ले

माड्डप, चिड्डप mAD.hDapa, chiD.hDapa 'to beat someone by putting under one's feet'.

2.7 Homographic Words:

In Konkani, we also come across homographic words i.e.