Development of a morphological analyser for Bengali
Abu Zaher Md. Faridee
Dept. of Comp. Science and Eng., Bangladesh Univ. of Eng. and Tech., Dhaka 1000
[email protected]

Francis M. Tyers
Dept. Llenguatges i Sist. Informàtics, Universitat d'Alacant, E-03070 Alacant
[email protected]
J.A. Pérez-Ortiz, F. Sánchez-Martínez, F.M. Tyers (eds.) Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, p. 43–50. Alacant, Spain, November 2009

Abstract

This article describes the development of an open-source morphological analyser for the Bengali language using finite-state technology. First we discuss the challenges of creating a morphological analyser for a highly inflectional language like Bengali, and then propose a solution using lttoolbox, an open-source finite-state toolkit. We then evaluate the performance of the system and propose ways of improving it further.

1 Introduction

Bengali (also referred to as Bangla by its native speakers) is an Indo-Aryan language spoken mainly in the region of eastern South Asia known as Bengal, comprising Bangladesh, the Indian state of West Bengal, southern Assam and part of Tripura. A language with a rich literature, its evolution can be traced back to Magadhi Prakrit and Sanskrit. There are an estimated 230 million speakers worldwide, making it the sixth most spoken language in the world.

Building a morphological analyser for Bengali using finite-state technology requires taking into account the highly inflectional properties of the language and its diverse vocabulary. This diversity can be attributed to the influence of different cultures and languages, ranging from European languages like English, French, Dutch and Portuguese, to Middle Eastern languages like Arabic and Persian, as well as its own kin Hindi, Urdu and Sanskrit, resulting in a rich set of inflections. Negatives and adverbs sometimes also take the form of enclitics, which is another reason for the high degree of inflection.

Bengali exhibits diglossia:¹ one of the two major forms is called Shadhubhasha and the other is called Chalitabhasha or SCB (Standard Colloquial Bengali). The main differences between these two forms are seen in verbal inflections and in the higher usage of tatshama words in Shadhubhasha. Though a lot of Bengali literature and formal or legal documents have historically been written in Shadhubhasha, in modern times SCB has almost replaced it in day-to-day usage; therefore, we chose to base our morphological analyser on SCB.

Bengali is a classifier language (Samit Bhattacharya and Basu, 2005), meaning that the verb does not change its form based on the gender and number of the subject or object, but rather on tense, aspect, modality and person. In our work we found that Bengali has three tenses (present, past and future), two moods (indicative and imperative) and four aspects (simple, continuous, perfect and habitual), which differs a little from the grammar books. We identified 59 separate inflections for each verb. A sample suffix list for the Bengali verb কর [kar]² `do' is shown in table 1. It should be noted that the suffix added to the verb root is highly dependent on its syllabic structure, leading to six basic inflection rules for verbs (the actual number of inflectional paradigms for verbs is, however, much larger than that).

For creating this morphological analyser/generator we took a slightly different approach from more traditional Bengali grammars. For example, there are seven grammatical cases in traditional Bengali grammars (Chowdhury et al., 2000), but morphologically there are only four cases.⁴ Animacy also plays a major role in the inflectional pattern of nouns. We identified four levels of animacy encountered in Bengali text: inanimate, animate, human and elite. The animacy level of a noun governs which case and plurality suffixes are added to it. For example, in the nominative case, the plural is formed by adding the suffixes রা [rā] and গণ [gan] after human and elite nouns respectively, but গুলো [gulō] after inanimate and animate nouns. The details are given in table 2. The inflection rules for the gender of nouns vary radically; they are actually more derivational than inflectional. Since for now we only deal with inflectional morphology, we decided to treat genders as separate words.

Pronouns are inflectionally similar to nouns and do not have gender. Proper nouns are harder to deal with in Bengali: unlike in English, there is no special way to readily identify them. We have several sub-categories for them, such as anthroponym, toponym, cognomen and organization.

A small number of adjectives inflect for degree of comparison. They can also inflect for gender, but these cases can be considered sanskritisms rather than a general phenomenon (Islam et al., 2007) and do not have much use in SCB. Comparative and superlative forms are normally generated by applying the suffixes তর [tara] and তম [tama]. Adverbs do not inflect for degree.

As in most other languages, there is a small number of exceptions to the general rules, and these words are treated separately.

2 Design

The morphological analyser/generator was developed as part of a new language pair for Apertium, hence the analyser data conforms to Apertium's dix format (Ortiz-Rojas et al., 2005). The data for the analyser/generator is contained in an XML file, the Bengali monolingual dictionary (monodix). This dictionary has two principal parts: the first is the pardef section, which contains the inflection rules for a particular type of word; the other is the entry section, which contains the list of words, stating which pardef each belongs to. Part of a pardef definition for a verb is given in figure 1.

When the transducer encounters the surface form `ফিরছিল' [phirchila], it outputs the lemma `ফের' [phēr] along with the tags vblex, past, cnt, p3, infml. Thus the morphological analyser recognises `ফিরছিল' as the 3rd person, informal, past continuous form of the verb `ফের'.

The benefit of using Apertium (particularly lttoolbox, the toolset responsible for managing the monodix) is its robust architecture. The primary advantage is that the analyser can also double as a morphological generator. In addition, given an XML-based monodix, it can create a compiled dictionary which is generally faster than a normal text-based or database-based dictionary, making it well suited to running on devices with a small memory footprint.

Bengali is distinctive in that certain clitics are used to convey several adverbial meanings. For example, আপনি [āpni] - you, but আপনিও [āpniō] - you too. Here the clitic ও is used to convey the meaning of the adverb `also'. We therefore create a new pardef for this. A sample pardef for an enclitic can be seen in figure 2.

As can be seen in table 1, the lemma form that we chose for the verb is a bit different from traditional dictionary formats, where verbs are generally represented in their gerund form. We chose to use the present indefinite, second person, informal inflected form as the lemma. The justification for this decision lies in the fact that this inflection is the shortest (in length) for every verb. Also, according to Sanskrit grammar, this form is analogous to the original stem (i.e. dhatu) to which affixes are added to create the real verb. Anubadok,⁵ an open-source English to Bengali machine translation system from which a lot of linguistic data was derived, also uses this format for verb lemmas. A potential disadvantage …

¹ http://lrc.cornell.edu/asian/courses/bengali
² NLK transliteration scheme, details at http://en.wikipedia.org/wiki/Bengali_script#Romanization_Reference.
³ Although no suffix is added, the pronunciation here is different [kara]
⁴ http://en.wikipedia.org/wiki/Bengali_grammar#Case
⁵ http://anubadok.sf.net
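The analysis described above (surface form, lemma and a tag sequence) can be illustrated with a small parser. This is a minimal sketch, assuming the analyser output is written in the Apertium stream format `^surface/lemma<tag>...$`, with `*` marking an unknown word; `parse_lexical_unit` and `Analysis` are hypothetical helper names introduced here for illustration and are not part of lttoolbox.

```python
import re
from typing import List, NamedTuple, Tuple


class Analysis(NamedTuple):
    lemma: str
    tags: List[str]


def parse_lexical_unit(unit: str) -> Tuple[str, List[Analysis]]:
    """Split '^surface/lemma<tag>...$' into (surface, [Analysis, ...]).

    Assumption: no escaped '/' characters occur inside the unit.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    surface, *readings = unit[1:-1].split("/")
    analyses = []
    for reading in readings:
        if reading.startswith("*"):
            # '*' marks a word the analyser does not know: no analysis.
            continue
        lemma = reading.split("<", 1)[0]
        tags = re.findall(r"<([^>]+)>", reading)
        analyses.append(Analysis(lemma, tags))
    return surface, analyses


# The example from the text: surface form, lemma and tags as given there.
example = "^ফিরছিল/ফের<vblex><past><cnt><p3><infml>$"
surface, analyses = parse_lexical_unit(example)
print(surface, analyses[0].lemma, analyses[0].tags)
```

A word counts as recognised when the returned list of analyses is non-empty, which is the notion used for naïve coverage in the evaluation.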
Form | – | First | Second, Polite | Second, Familiar | Second, Informal | Third and Impersonal, Polite | Third and Impersonal, Familiar
Ger. | ◌া [ā] | n/a | n/a | n/a | n/a | n/a | n/a
Inf. | তে [tē] | n/a | n/a | n/a | n/a | n/a | n/a
Gen. | ◌ার [ār] | n/a | n/a | n/a | n/a | n/a | n/a
Pres. Spl. | n/a | ◌ি [i] | ◌েন [ēn] | ø³ | ◌িস [is] | ◌েন [ēn] | ◌ে [ē]
Pres. Cont. | n/a | ছি [chi] | ছেন [chēn] | ছ [cha] | ছিস [chis] | ছেন [chēn] | ছে [chē]
Fut. Spl. | n/a | ব [ba] | বেন [bēn] | বে [bē] | বি [bi] | বেন [bēn] | বে [bē]
Past Spl. | n/a | লাম [lām] | লেন [lēn] | লে [lē] | লি [li] | লেন [lēn] | ল [la]
Past Hbl. | n/a | তাম [tām] | তেন [tēn] | তে [tē] | তি [ti] | তেন [tēn] | ত [ta]
Past Cont. | n/a | ছিলাম [chilām] | ছিলেন [chilēn] | ছিলে [chilē] | ছিলি [chili] | ছিলেন [chilēn] | ছিল [chila]
Perf. | n/a | ◌েছি [ēchi] | ◌েছেন [ēchēn] | ◌েছ [ēcha] | ◌েছিস [ēchis] | ◌েছেন [ēchēn] | ◌েছে [ēchē]
Plu Perf. | n/a | ◌েছিলাম [ēchilām] | ◌েছিলেন [ēchilēn] | ◌েছিলে [ēchilē] | ◌েছিলি [ēchili] | ◌েছিলেন [ēchilēn] | ◌েছিল [ēchila]
Part. Past. | ◌ে [ē] | n/a | n/a | n/a | n/a | n/a | n/a
Part. Cond. | লে [lē] | n/a | n/a | n/a | n/a | n/a | n/a
Imp. Pres. | n/a | n/a | ◌ুন [un] | ø | ø | n/a | n/a
Imp. Fut. | n/a | n/a | ◌েন [ēn] | ◌ো [ō] | ◌িস [is] | n/a | n/a

Legends: Ger. - Gerund, Inf. - Infinitive, Gen. - Genitive, Pres. - Present, Spl. - Simple, Cont. - Continuous, Hbl. - Habitual, Perf. - Perfect, Part. - Participle, Imp. - Imperative, Fut. - Future

Table 1: Suffix table for the Bengali verb কর [kar] `do'
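For this verb the rows of Table 1 amount to concatenating a suffix onto the root. The following is a minimal sketch of that idea, assuming plain string concatenation is sufficient for কর and using only four rows of the table; the names `SUFFIXES`, `PERSONS` and `conjugate` are hypothetical and chosen for illustration, and the sketch ignores the syllable-dependent rules needed for other verb classes.

```python
ROOT = "কর"  # [kar] `do'

# Person columns in the order used in Table 1.
PERSONS = ["1", "2.polite", "2.familiar", "2.informal", "3.polite", "3.familiar"]

# A few rows copied from Table 1 (suffixes only).
SUFFIXES = {
    "pres.cont": ["ছি", "ছেন", "ছ", "ছিস", "ছেন", "ছে"],
    "fut.spl": ["ব", "বেন", "বে", "বি", "বেন", "বে"],
    "past.spl": ["লাম", "লেন", "লে", "লি", "লেন", "ল"],
    "past.cont": ["ছিলাম", "ছিলেন", "ছিলে", "ছিলি", "ছিলেন", "ছিল"],
}


def conjugate(root: str, tense: str) -> dict:
    """Return {person: surface form} by appending each suffix to the root."""
    return {person: root + suffix
            for person, suffix in zip(PERSONS, SUFFIXES[tense])}


for person, form in conjugate(ROOT, "past.cont").items():
    print(person, form)
```

A paradigm definition in the monodix plays the same role as the `SUFFIXES` table here: the suffixes are stored once and shared by every verb that inflects in the same way.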
Nominative
  Inanimate: singular ø / টা [ṭā] / খানা [khānā] / টুকু [ṭuku] / খানি [khāni]; plural গুলো [gulō]
  Animate: singular ø / টা [ṭā]; plural গুলো [gulō]
  Human: singular ø / টা [ṭā]; plural রা [rā]
  Elite: singular ø; plural রা [rā] / গণ [gaṇa] / বৃন্দ [brinda]

Objective
  Inanimate: singular ø / টা [ṭā] / খানা [khānā] / টুকু [ṭuku] / খানি [khāni]; plural গুলো [gulō]
  Animate: singular কে [kē] / টাকে [ṭākē]; plural গুলোকে [gulōkē]
  Human: singular কে [kē] / টাকে [ṭākē]; plural দেরকে [dērkē]
  Elite: singular কে [kē]; plural কে [kē] / গণকে [gaṇkē] / বৃন্দকে [brindakē]

Genitive
  Inanimate: singular র [ra] / ◌ের [ēr] / য়ের [yēr] / টুকুর [ṭukur] / টার [ṭār] / খানার [khānār] / খানির [khānir]; plural গুলোর [gulōr]
  Animate: singular র [ra] / ◌ের [ēr] / য়ের [yēr] / টার [ṭār]; plural গুলোর [gulōr]
  Human: singular র [ra] / ◌ের [ēr] / য়ের [yēr] / টার [ṭār]; plural দের [dēr]
  Elite: singular র [ra] / ◌ের [ēr] / য়ের [yēr]; plural দের [dēr] / গণের [gaṇēr] / বৃন্দের [brindēr]

Locative
  Inanimate: singular য়ে [yē] / য় [y] / তে [tē] / টায় [ṭāy] / টুকুতে [ṭukutē] / খানায় [khānāy] / খানিতে [khānitē]; plural গুলোয় [gulōy]
  Animate, Human, Elite: n/a

Table 2: Suffix table for Bengali noun inflections
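The dependence of the plural suffix on animacy can be sketched as a simple lookup. This is an illustrative sketch only, assuming one nominative plural suffix per animacy level as described in the Introduction (Table 2 lists further alternatives); the function name and the two example nouns, বই `book' and ছাত্র `student', are assumptions added here and do not come from the lexicon described in the paper.

```python
# One nominative plural suffix per animacy level, following the text:
# inanimate/animate -> গুলো, human -> রা, elite -> গণ.
NOMINATIVE_PLURAL = {
    "inanimate": "গুলো",
    "animate": "গুলো",
    "human": "রা",
    "elite": "গণ",
}


def nominative_plural(noun: str, animacy: str) -> str:
    """Append the nominative plural suffix selected by the animacy level."""
    try:
        return noun + NOMINATIVE_PLURAL[animacy]
    except KeyError:
        raise ValueError("unknown animacy level: %r" % animacy)


# Illustrative nouns (assumed examples, not taken from the paper).
print(nominative_plural("বই", "inanimate"))
print(nominative_plural("ছাত্র", "human"))
```

In the monodix the same effect is obtained by assigning each noun entry to the paradigm that matches its animacy level.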
Figure 1: Part of paradigm definition for ফ/ের__vblex
… wanted to remain faithful to Zipf's Law,⁹ which … BLP.¹⁰ The list was created at CRBLP as per their … get a list of most frequently used 20,000 Bengali …
… we are generating the inflected forms. First, in lines 1 and 2, we generate two types of umlauts for this particular kind of verb root. Normally this is a regressive vowel harmony, a phonological phenomenon characteristic of SCB. But for some verbs like যা [yā] and আছ [ācha], it is rather a morphological phenomenon (e.g. যা [yā] - গেল [gēla], আছ [ācha] - থাকত [thākta]). In the following lines we concatenate the root or the umlauts with the suffixes. It should be noted that the actual code for these two procedures involves some complex regular expression matching (e.g. detection of hashant and chandrabindu, Unicode normalisation, handling of irregular verbs), which was removed from the pseudo-code for simplicity and clarity.

A preliminary version of the verb conjugator, which generates all the forms for a particular verb, can be found online.¹³ The current morphological analyser can also be accessed online.¹⁴

4 Evaluation

As table 3 shows, the lexicon currently contains 4,523 tagged entries. This is not a very high number. However, as we shall see later, our dictionary puts emphasis on the most frequently used words, and therefore achieves a higher effective coverage.

Part of speech | Number of entries
Noun | 2,035
Adjective | 866
Proper noun | 800
Adverb | 432
Verb | 136
Numeral | 123
Post-position | 46
Determiner | 45
Pronoun | 32
Conjunction | 8
Total | 4,523

Table 3: Number of entries in the lexicon for the main parts of speech

Evaluation poses some major challenges for us. As stated before, this implementation of a Bengali morphological analyser was created with only SCB (Standard Colloquial Bengali) in mind. However, there are few digital SCB text resources available on the Internet apart from Wikipedia and the Prothom-alo website. Therefore, our evaluation was confined to a corpus developed from Wikipedia and Prothom-alo. The data was collected by running a web crawler on these two web sites.

First we calculate the naïve coverage according to the following formula:

\[ \text{Coverage} = \frac{\text{No. of words with at least one analysis}}{\text{No. of words}} \]

Table 4 shows the naïve coverage for Wikipedia and Prothom-alo. The statistics clearly show that our analyser is optimised towards the frequency list it was built from: since the frequency list was derived from Prothom-alo, we get a higher coverage of Prothom-alo (80.35%) than of Wikipedia (68.21%).

Site | File size (MB) | Total words | Recognised words | Naïve coverage
Wikipedia | 27.1 | 1,730,745 | 1,180,542 | 68.21%
Prothom-alo | 23.4 | 1,572,601 | 1,263,661 | 80.35%

Table 4: Naïve coverage for Wikipedia and Prothom-alo

For calculating recall and precision we also took two sets of data. The first is the list of the 1000 most frequently used words (in their surface form) from CRBLP (Prothom-alo corpus). The second is a randomly selected 1000-word text excerpt from the web-crawler data of Prothom-alo. The formulas for calculating precision and recall are as follows:

\[ \text{Precision} = \frac{\text{Number of correct analyses}}{\text{Number of analyses retrieved}} \]

\[ \text{Recall} = \frac{\text{Number of correct analyses}}{\text{Total number of analyses}} \]

Table 5 shows the recall and precision values for the two sets of data. Precision is high in both cases. The recall value falls in the latter case (random excerpt), but it is still a high value.

¹³ http://xixona.dlsi.ua.es/~fran/bengali/conj/
¹⁴ http://xixona.dlsi.ua.es/~fran/bengali/analysis.php
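The evaluation arithmetic can be reproduced from the counts reported in Tables 4 and 5. The following is a minimal sketch; the function name `pct` and the dictionary layout are assumptions made for illustration. Note that, when computed from the Table 5 counts, Correct/Retrieved gives 94.37% and 88.30%, while Correct/Total gives 99.69% and 99.85%, so the two ratios appear to correspond to the Recall and Precision columns of Table 5 respectively; the sketch therefore reports both ratios under neutral names rather than deciding which label applies.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to 2 decimals."""
    return round(100.0 * numerator / denominator, 2)


# Table 4: (total words, recognised words) per site.
table4 = {
    "Wikipedia": (1_730_745, 1_180_542),
    "Prothom-alo": (1_572_601, 1_263_661),
}
coverage = {site: pct(recognised, total)
            for site, (total, recognised) in table4.items()}

# Table 5: (correct, retrieved, total) per test set.
table5 = {
    "Top 1000": (1_626, 1_723, 1_631),
    "Random 1000": (1_328, 1_504, 1_330),
}
ratios = {
    name: {
        "correct/retrieved": pct(correct, retrieved),
        "correct/total": pct(correct, total),
    }
    for name, (correct, retrieved, total) in table5.items()
}

print(coverage)
print(ratios)
```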
PROCESS-VERB(verbList)
 1  for each verb in verbList
 2      do length ← length of verb discarding `◌ঁ' [chandrabindu] and `◌্' [hashant]
 3         if length > 2                      ▷ Rule for verbs that are more than 2 characters long
 4            then if verb ends with a marker preceded by a consonant
 5                    then GET-INFLECTION-DO(verb)       ▷ করা, পড়া etc.
 6                 if verb ends with a consonant preceded by a `◌ে' or `◌ো'
 7                    then GET-INFLECTION-WRITE(verb)    ▷ লেখ, খেল etc.
 8                 if verb ends with a consonant preceded by a marker
 9                    then GET-INFLECTION-SHAKE(verb)    ▷ পাড়, নাড়, ছুট etc.
10         elseif length = 2                  ▷ Process 2 letter verbs
11            then if verb ends with a consonant preceded by an ও
12                    then GET-INFLECTION-WRITE(verb)    ▷ ওঠ etc.
13                 if verb ends with a consonant preceded by a vowel or a consonant
14                    then GET-INFLECTION-SHAKE(verb)    ▷ আস, আন, আঁক etc.
15                 if verb is `যা'
16                    then GET-INFLECTION-GO(verb)
17                 if verb ends with ◌ে or ◌ো preceded by a consonant
18                    then GET-INFLECTION-TAKE(verb)     ▷ নে, দে etc.
19                 if verb ends with a marker preceded by a consonant
20                    then GET-INFLECTION-EAT(verb)      ▷ খা, পা, নি etc.
21         else                               ▷ Process single letter verbs
22            if verb ends with a consonant
23               then GET-INFLECTION-EAT(verb)           ▷ ক, হ etc.

Figure 3: Pseudo code for the procedure PROCESS-VERB
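Line 2 of PROCESS-VERB and the three-way split on length can be sketched as follows. This is a minimal sketch, assuming that length is counted in Unicode code points (which is consistent with the example verbs given in Figure 3) and that the input is already normalised; `effective_length` and `length_class` are hypothetical names, and the further tests on the final characters of the verb are not reproduced.

```python
CHANDRABINDU = "\u0981"  # ◌ঁ
HASHANT = "\u09cd"       # ◌্


def effective_length(verb: str) -> int:
    """Count code points, discarding chandrabindu and hashant."""
    return sum(1 for ch in verb if ch not in (CHANDRABINDU, HASHANT))


def length_class(verb: str) -> str:
    """Return which top-level branch of PROCESS-VERB the verb falls into."""
    n = effective_length(verb)
    if n > 2:
        return "long"
    if n == 2:
        return "two-letter"
    return "single-letter"


# Example verbs taken from the comments in Figure 3.
for verb in ["করা", "আঁক", "নে", "হ"]:
    print(verb, effective_length(verb), length_class(verb))
```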
GEN-INFLECTION-TAKE(verb)
 1  uml ← replace `◌ে' with `◌ি' and `◌ো' with `◌ু' in verb
 2  uml2 ← replace `◌ে' with `◌া' in verb
 3  I[Ger.] ← verb + ওয়া
 4  I[Inf.] ← uml + তে
 5  I[Gen.] ← verb + বার
 6  I[Pres.Spl.] ← verb + ই, verb + ন, uml2 + ও, uml + স, verb + ন, verb + য়
 7  I[Pres.Cont.] ← uml + চ্ছি, uml + চ্ছেন, uml + চ্ছ, uml + চ্ছিস, uml + চ্ছেন, uml + চ্ছে
 8  I[Fut.Spl.] ← verb + ব, verb + বেন, verb + বে, uml + বি, verb + বেন, verb + বে
 9  I[Past.Spl.] ← uml + লাম, uml + লেন, uml + লে, uml + লি, uml + লেন, uml + ল
10  I[Past.Hbl.] ← uml + তাম, uml + তেন, uml + তে, uml + তি, uml + তেন, uml + ত
11  I[Past.Cont.] ← uml + চ্ছিলাম, uml + চ্ছিলেন, uml + চ্ছিলে, uml + চ্ছিলি, uml + চ্ছিলেন, uml + চ্ছিল
12  I[Perf.] ← uml + য়েছি, uml + য়েছেন, uml + য়েছ, uml + য়েছিস, uml + য়েছেন, uml + য়েছে
13  I[PluPerf.] ← uml + য়েছিলাম, uml + য়েছিলেন, uml + য়েছিলে, uml + য়েছিলি, uml + য়েছিলেন, uml + য়েছিল
14  I[Part.Past] ← uml + য়ে
15  I[Part.Cond.] ← uml + লে
16  I[Imp.Pres.] ← verb + ন, uml2 + ও, verb, verb + ন, uml + ক
17  I[Imp.Fut.] ← verb + বেন, uml + ও, uml + স, verb + বেন, uml + বে

Figure 4: Pseudo code for procedure GEN-INFLECTION-TAKE
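A few lines of the procedure in Figure 4 can be written out in Python as follows. This is a minimal sketch under stated assumptions, not the authors' script: it covers only lines 1 to 6, 8 and 9, assumes normalised input in which ◌ো is a single code point, and leaves out the regular-expression checks and irregular verbs mentioned in the text. The function name `gen_inflection_take` and the dictionary keys are chosen here for illustration.

```python
# Bengali dependent vowel signs, written as escapes so the code points are explicit.
E_KAR = "\u09c7"   # ◌ে
I_KAR = "\u09bf"   # ◌ি
O_KAR = "\u09cb"   # ◌ো
U_KAR = "\u09c1"   # ◌ু
AA_KAR = "\u09be"  # ◌া


def gen_inflection_take(verb: str) -> dict:
    """Generate some inflected forms for verbs of the `take' class (নে, দে)."""
    # Line 1: first umlauted stem (e -> i, o -> u).
    uml = verb.replace(E_KAR, I_KAR).replace(O_KAR, U_KAR)
    # Line 2: second umlauted stem (e -> aa).
    uml2 = verb.replace(E_KAR, AA_KAR)

    past_spl = ["লাম", "লেন", "লে", "লি", "লেন", "ল"]
    return {
        "Ger.": [verb + "ওয়া"],                                   # line 3
        "Inf.": [uml + "তে"],                                     # line 4
        "Gen.": [verb + "বার"],                                   # line 5
        "Pres.Spl.": [verb + "ই", verb + "ন", uml2 + "ও",
                      uml + "স", verb + "ন", verb + "য়"],         # line 6
        "Fut.Spl.": [verb + "ব", verb + "বেন", verb + "বে",
                     uml + "বি", verb + "বেন", verb + "বে"],      # line 8
        "Past.Spl.": [uml + suffix for suffix in past_spl],       # line 9
    }


forms = gen_inflection_take("\u09a8\u09c7")  # নে `take'
for name, values in forms.items():
    print(name, ", ".join(values))
```

Keeping the two stem variants as separate variables mirrors the pseudo-code and makes it easy to see which suffix attaches to which stem.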
Type | Correct | Retrieved | Total | Precision | Recall
Top 1000 | 1,626 | 1,723 | 1,631 | 99.6% | 94.37%
Random 1000 | 1,328 | 1,504 | 1,330 | 99.8% | 88.29%

Table 5: Precision and Recall

5 Discussion

Currently the analyser is at a preliminary stage and there is a lot of room for improvement. Firstly, the coverage needs to be expanded. This will require manual tagging of many words. Secondly, the verb section has difficulty treating multi-word verbs, and negative forms are not well recognised. This is because several forms of the verb, such as infinitives and participles, demand a negative particle before the verb, while finite forms require the particle to follow the verb, in some cases as an enclitic. Matters get complicated in the former case, because multi-word verbs require the negative particle to be in the middle. Table 6 shows the negative forms of the multi-word verb কাজ কর `work', which clearly illustrate the problem. This can be partially taken care of by a nested paradigm, but this makes the analyser slow; we need to come up with a better solution.

One possibility is porting this analyser (the python scripts) to a more robust architecture such as foma (Huldén, 2009), an open-source implementation of the Xerox finite-state tools (Beesley and Karttunen, 2003). It should be noted that there have been similar attempts to create finite-state morphological analysers for Bengali with PC-KIMMO (Dasgupta and Khan, 2004) and JKimmo (Islam and Khan, 2006). However, PC-KIMMO is not Unicode compliant, so it cannot be directly used for Bengali morphological analysis. JKimmo, on the other hand, requires a transliteration scheme. The solution we present here is Unicode compliant and requires no transliteration scheme. Furthermore, foma is fully able to handle Unicode characters, so we would maintain Unicode compliance should we choose to port to it.

As mentioned earlier, this morphological analyser/generator was created as part of a new language pair, bn-en, for Apertium. We are on our way to creating a functional English to Bengali translation system. The morphological analyser could also be used as a stemmer for any search engine for the Bengali language. It could also double as a spell checker. In fact, work is under way at Ankur¹⁵ to create a new Bengali dictionary for Firefox and OpenOffice.org using this morphological analyser.

To our knowledge, this is the first open-source attempt at creating a fully functional, wide-coverage morphological analyser and generator for Bengali that is publicly available to everyone. All the tools that were used in this project are open source, and the output, both the linguistic data and the toolset, is available under the GNU GPL license.

Acknowledgments

Development was funded as part of the Google Summer of Code programme.¹⁶ We are also very grateful to the personnel from CRBLP, especially Dr. Mumit Khan, for allowing us to use their list of most frequently used words. Special thanks go to co-mentor Kevin Donnelly, who helped us a lot during the initial phase of the project, and to Dr. G. M. Hossain, the author of the Anubadok system, which was used during the development.

¹⁵ http://ankur.org.bd/
¹⁶ http://socghop.appspot.com/student_project/show/google/gsoc2009/apertium/t124021625239

References

Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.

Chowdhury, S. M., Chowdhury, S. M. H., Khalil, M. I., Muhammad, D. K. D., and Lahiri, S. (2000). বাংলা ভাষার ব্যাকরণ (A grammar of Bengali Language). National Curriculum and Textbook Board (NCTB).

Dasgupta, S. and Khan, M. (2004). Morphological parsing of Bangla words using PC-KIMMO. Proceedings of the 7th International Conference on Computer and Information Technology (ICCIT2004, Dhaka, Bangladesh).

Huldén, M. (2009). Foma: a finite-state compiler and library. EACL 2009, pages 29–32.

Islam, M. Z. and Khan, M. (December 2006). Jkimmo: A multilingual computational morphology framework for pc-kimmo. Proc. of
Form | – | First person | Second person, Polite | Second person, Familiar | Second person, Informal | Third person / Impersonal, Polite | Third person / Impersonal, Familiar
Ger. | কাজ না করা [kāz nā karā] | n/a | n/a | n/a | n/a | n/a | n/a
Inf. | কাজ না করতে [kāz nā kartē] | n/a | n/a | n/a | n/a | n/a | n/a
Gen. | কাজ না করার [kāz nā karār] | n/a | n/a | n/a | n/a | n/a | n/a
Pres. Spl. | n/a | কাজ করি না [kāz kari nā] | কাজ করেন না [kāz karēn nā] | কাজ কর না [kāz kara nā] | কাজ করিস না [kāz karis nā] | কাজ করেন না [kāz karēn nā] | কাজ করে না [kāz karē nā]
Pres. Cont. | n/a | কাজ করছি না [kāz karchi nā] | কাজ করছেন না [kāz karchēn nā] | কাজ করছ না [kāz karcha nā] | কাজ করছিস না [kāz karchis nā] | কাজ করছেন না [kāz karchēn nā] | কাজ করছে না [kāz karchē nā]
Fut. Spl. | n/a | কাজ করব না [kāz karba nā] | কাজ করবেন না [kāz karbēn nā] | কাজ করবে না [kāz karbē nā] | কাজ করবি না [kāz karbi nā] | কাজ করবেন না [kāz karbēn nā] | কাজ করবে না [kāz karbē nā]
Past. Spl. | n/a | কাজ করলাম না [kāz karlām nā] | কাজ করলেন না [kāz karlēn nā] | কাজ করলে না [kāz karlē nā] | কাজ করলি না [kāz karli nā] | কাজ করলেন না [kāz karlēn nā] | কাজ করল না [kāz karla nā]
Past. Hbl. | n/a | কাজ করতাম না [kāz kartām nā] | কাজ করতেন না [kāz kartēn nā] | কাজ করতে না [kāz kartē nā] | কাজ করতি না [kāz karti nā] | কাজ করতেন না [kāz kartēn nā] | কাজ করত না [kāz karta nā]
Past. Cont. | n/a | কাজ করছিলাম না [kāz karchilām nā] | কাজ করছিলেন না [kāz karchilēn nā] | কাজ করছিলে না [kāz karchilē nā] | কাজ করছিলি না [kāz karchili nā] | কাজ করছিলেন না [kāz karchilēn nā] | কাজ করছিল না [kāz karchila nā]
Perf. | n/a | কাজ করিনি [kāz karini] | কাজ করেননি [kāz karēnni] | কাজ করনি [kāz karani] | কাজ করিসনি [kāz karisni] | কাজ করেননি [kāz karēnni] | কাজ করেনি [kāz karēni]
Plu Perf. | n/a | কাজ করিনি [kāz karini] | কাজ করেননি [kāz karēnni] | কাজ করনি [kāz karani] | কাজ করিসনি [kāz karisni] | কাজ করেননি [kāz karēnni] | কাজ করেনি [kāz karēni]
Part. Past. | কাজ না করে [kāz nā karē] | n/a | n/a | n/a | n/a | n/a | n/a
Part. Cond. | কাজ না করলে [kāz nā karlē] | n/a | n/a | n/a | n/a | n/a | n/a
Imp. Pres. | n/a | n/a | কাজ করবেন না [kāz karbēn nā] | কাজ করবে না [kāz karbē nā] | কাজ করবি না [kāz karbi nā] / কাজ করিস না [kāz karis nā] | কাজ করবেন না [kāz karbēn nā] | কাজ করবে না [kāz karbē nā]
Imp. Fut. | n/a | n/a | কাজ করবেন না [kāz karbēn nā] | কাজ করো না [kāz karō nā] | [kāz kar nā] / কাজ করিসনে [kāz karisnē] | কাজ করবেন না | কাজ করবে না [kāz karbē nā]
Table 6: Example of negative inflections of a compound verb
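The placement pattern visible in Table 6 can be summarised in a few lines of code. This is an illustrative sketch only, assuming the three patterns seen in the table: the particle না sits between the noun and the verb for non-finite forms, follows the verb for finite forms, and is replaced by the enclitic নি attached to the present simple form in the perfect and pluperfect rows. The function `negate` and the category labels are hypothetical names, and the imperative alternatives with the enclitic ending নে are not covered.

```python
NEG_PARTICLE = "না"
NEG_ENCLITIC = "নি"

NON_FINITE = {"ger", "inf", "gen", "part.past", "part.cond"}
PERFECT = {"perf", "pluperf"}


def negate(noun: str, verb_form: str, category: str) -> str:
    """Build the negative of a noun + verb compound such as কাজ কর `work'.

    For PERFECT categories, verb_form is expected to be the present simple
    form, to which the enclitic is attached (as in the Perf. row of Table 6).
    """
    if category in NON_FINITE:
        # Particle goes in the middle of the multi-word verb.
        return f"{noun} {NEG_PARTICLE} {verb_form}"
    if category in PERFECT:
        # Enclitic attaches directly to the verb form.
        return f"{noun} {verb_form}{NEG_ENCLITIC}"
    # Other finite forms: particle follows the verb.
    return f"{noun} {verb_form} {NEG_PARTICLE}"


print(negate("কাজ", "করা", "ger"))
print(negate("কাজ", "করি", "pres.spl"))
print(negate("কাজ", "করি", "perf"))
```

The three branches make the difficulty described in the Discussion concrete: a single paradigm attached to the verb alone cannot place the particle between the two words of the compound.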
9th International Conference on Computer and Information Technology (ICCIT 2006), Dhaka, Bangladesh.

Islam, M. Z., Uddin, M. N., and Khan, M. (2007). A light weight stemmer for Bengali and its use in a spelling checker. Proceedings of the 1st International Conference on Digital Communications and Computer Applications (DCCA2007, Irbid, Jordan).

Klaiman, M. H. (2009). The World's Major Languages, volume 23. Routledge, 2nd edition.

Ortiz-Rojas, S., Forcada, M. L., and Ramírez-Sánchez, G. (2005). Construcción y minimización eficiente de transductores de letras a partir de diccionarios con paradigmas. Procesamiento del lenguaje natural, (35):51–57.

Ray, P. S., Chakravarti, P. N., Chatterjee, S., Jahan, R., and Ahmed, M. (1966). A Reference Grammar of Bengali. The University of Chicago.

Samit Bhattacharya, Monojit Choudhury, S. S. and Basu, A. (2005). Inflectional morphology synthesis for bengali noun, pronoun and verb systems. In Proc. of the National Conference on Computer Processing of Bangla (NCCPB 05), Dhaka, Bangladesh.