View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Repositorio Institucional de la Universidad de Alicante

Development of a morphological analyser for Bengali

Abu Zaher Md. Faridee Francis M. Tyers Dept. of Comp. Science and Eng., Dept. Llenguatges i Sist. Informàtics, Univ. of Eng. and Tech., Universitat d'Alacant, Dhaka 1000 Alacant. E-03070 [email protected] [email protected]

Abstract Persian as well as its own kin Hindi, Urdu, and language, resulting in a rich set of in- This article describes the development ections. Negatives and adverbs sometimes also of an open-source morphological anal- take the form of enclitic, being another reason for yser for using nite- higher the inection. state technology. First we discuss the Bengali exhibits diglossia,1 one of the two ma- challenges of creating a morphologi- jor forms is called Shadhubhasha and the other cal analyser for a highly inectional is called Chalitabhasha or SCB (Standard Col- language like Bengali and then pro- loquial Bengali). The main difference between pose a solution to that using lttoolbox, these two forms are seen in verbal inections an open-source nite-state toolkit. We and higher usage of tatshama words in Shadhub- then evaluate the performance of our hasha. Though a lot of Bengali literature and developed system and propose ways of formal or legal documents have been historically improving it further. written in Shadhubhasha, in modern times, SCB has almost replaced it in day-to-day usage; there- 1 Introduction fore, we chose to create our morphological anal- yser based on SCB. Bengali language (also referred as Bangla by Bengali is a classier language (Samit Bhat- its native speakers) is an Indo-Aryan language tacharya and Basu, 2005), meaning the verb does which is mainly spoken in the region of east- not change its form based on the gender and num- ern South Asia (also known as ) compris- ber of the subject or object, rather on the tense, ing Bangladesh, the Indian State of West Ben- aspect, modality and person. Our work found gal, southern Assam and part of Tripura. A lit- Bengali having three tenses, present, past and fu- erally rich language, its evolution can be traced ture; two moods, indicative and imperative; four back to Magadhi and Sanskrit languages. aspects, simple, continuous, perfect and habitual, There are an estimated 230 million speakers of the which differs a little from the grammar books. We language world wide, making it sixth among the identied 59 separate inections for each of the most spoken languages of the world. verbs. A sample sufx list for the Bengali verb কর Building a morphological analyser for Bengali [kar]2 `do' is shown in table 1. It should be noted using nite-state technology requires taking into that the sufx added to the verb root is highly de- account the highly inectional properties of the pendent on its syllabic structure leading to six ba- language and the diverse vocabulary. This di- versity can be attributed to the inuence of dif- 1http://lrc.cornell.edu/asian/courses/ ferent cultures and languages ranging from Euro- bengali 2NLK transliteration scheme, details at http: pean languages like English, French, Dutch, Por- //en.wikipedia.org/wiki/Bengali_script# tuguese to Middle Eastern languages like Arabic, Romanization_Reference.

J.A. P´erez-Ortiz,F. S´anchez-Mart´ınez,F.M. Tyers (eds.) Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, p. 43–50 Alacant, Spain, November 2009 sic inection rule for verbs (the actual number of tium, hence the analyser data conforms to Aper- inectional paradigms for the verbs are however tium's dix format (Ortiz-Rojas et al., 2005). The much more than that). data for the analyser/generator is contained in For creating this morphological analyser/gen- an XML le, the Bengali monolingual dictio- erator we took a slightly different approach from nary (monodix). This dictionary has two principal more traditional Bengali grammars. For example, parts, the rst one is the pardef section, which there are seven grammatical cases in Bengali tra- contains the inection rules for a particular type ditional grammars (Chowdhury et al., 2000), but of word, the other is the entry section, which con- morphologically there are only four cases.4 An- tains the list of words stating which pardef they imacy also plays a major role in the inectional belong to. A part of pardef denition for a verb is pattern for nouns. We identied that there are four given in gure 1. levels of animacy encountered in Bengali text, When the transducer encounters the surface inanimate, animate, human and elite. The ani- form `িফরিছল' [phirchila], it outputs lemma `েফর' macy level of the nouns govern which case and [pher]¯ along with tag vblex, past, cnt, p3, plurality sufx are added to them. For example, infml. Thus the morphological analyser realises in nominative case, to construct plural form the 3rd person, informal, past continuous form of sufx রা [ra]¯ and গণ [gan] are added after human, verb `েফর' from `িফরিছল'. elite nouns respectively, but েলা [gulo]¯ is added The benet of using Apertium (particularly lt- after inanimate and animate nouns. The details toolbox, the toolset responsible for managing the are given in table 2. The inection rules for gen- monodix) is its robust architecture. The primary der of nouns vary radically, they are actually more advantage is that the analyser can also double as derivational than inectional. Since right now we a morphological generator. On the other hand, only deal with inectional morphology, we de- given an XML based monodix it can create a com- cided to treat genders as separate words. piled dictionary which is generally faster than a Inectionally similar to nouns, the pronouns do normal text or database based dictionary, optimal not have have genders. Proper nouns are harder for running small memory footprint devices. to deal with in Bengali, there is no special way to Bengali is distinct in nature as certain clitics are readily identify proper nouns, unlike in English. used to convey several adverbial meaning. For

We have several sub-categories like anthroponym, example, আপিন [apni]¯ - you, but আপিনও [apnio]¯ - toponym, cognomen, organization etc. for them. you too. Here the clitic ও is used to convey the A small number of Adjectives inect on degree of meaning of adverb `also'. So we create a new comparison. They can also inect on gender, but pardef for this. A sample pardef for enclitic can these can be considered sankskritism rather than be seen in gure 2. general phenomena (Islam et al., 2007) and do not As can be seen in table 1, the lemma form that have much use in SCB. Comparative and superla- we chose for the verb is a bit different from tra- tive forms are normally generated by applying তর ditional dictionary formats where the verbs are [tara] and তম [tama] sufxes. Adverbs do not in- generally represented in their gerund form. We ect on degree. chose to use the present indenite, second per- Like most other languages, there are small son, informal inected form as the lemma. The number of exceptions to all the general set of justication for this decision lies in the fact that rules, and these words are treated separately. this inection is the shortest (in length) for ev- ery verb. Also, according to Sanskrit grammar, 2 Design this form is analogous to the original stem (i.e. dhatu) after which afxes are added to create the The morphological analyser/generator was devel- real verb. Anubadok,5 an open-source English to oped as a part of a new language pair for Aper- Bengali machine translation system from which a 3Although no sufx is added, the pronunciation here is lot of linguistic data was derived, also uses this different [kara]¯ format for verb lemma. A potential disadvantage 4http://en.wikipedia.org/wiki/Bengali_ grammar#Case 5http://anubadok.sf.net

44 First Second Third and Impersonal কর [kar] Polite Familiar Informal Polite Familiar

Ger. ◌া [a]¯ n/a n/a n/a n/a n/a n/a Inf. েত [te]¯ n/a n/a n/a n/a n/a n/a Gen. ◌ার [ar]¯ n/a n/a n/a n/a n/a n/a 3 Pres. Spl. n/a ি◌ [i] ে◌ন [en]¯ ø ি◌স [is] ে◌ন [en]¯ ে◌ [e]¯ Pres. Cont. n/a িছ [chi] েছন [chen]¯ ছ [cha] িছস [chis] েছন [chen]¯ েছ [che]¯ Fut. Spl. n/a ব [ba] েবন [ben]¯ েব [be]¯ িব [bi] েবন [ben]¯ েব [be]¯ Past Spl. n/a লাম [lam]¯ েলন [len]¯ েল [le]¯ িল [li] েলন [len]¯ ল [la] Past Hbl. n/a তাম [tam]¯ েতন [ten]¯ েত [te]¯ িত [ti] েতন [ten]¯ ত [ta] Past Cont. n/a িছলাম [chilam]¯ িছেলন [chilen]¯ িছেল [chile]¯ িছিল [chili] িছেলন [chilen]¯ িছল [chila] Perf. n/a ে◌িছ [echi]¯ ে◌েছন [ech¯ en]¯ ে◌ছ [echa]¯ ে◌িছস [echis]¯ ে◌েছন [ech¯ en]¯ ে◌েছ [ech¯ e]¯ Plu Perf. n/a ে◌িছলাম [echil¯ am]¯ ে◌িছেলন [echil¯ en]¯ ে◌িছেল [echil¯ e]¯ ে◌িছিল [echili]¯ ে◌িছেলন [echil¯ en]¯ ে◌িছল [echila]¯ Part. Past. ে◌ [e]¯ n/a n/a n/a n/a n/a n/a Part. Cond. েল [le]¯ n/a n/a n/a n/a n/a n/a Imp. Pres. n/a n/a ◌ুন [un] ø ø n/a n/a Imp. Fut. n/a n/a ে◌ন [en]¯ ে◌া [o]¯ ি◌স [is] n/a n/a

Legends: Ger. - Gerund, Inf. - Innitive, Gen. - Genitive, Pres. - Present, Spl. - Simple, Cont. - Continuous, Hbl. - Habitual, Perf. - Perfect, Part. - Participle, Imp. - Imperative, Fut. - Future

Table 1: Sufx table for the Bengali verb কর [kar] `do'

Inanimate Animate Human Elite Singular Plural Singular Plural Singular Plural Singular Plural

Nominative ø/টা [t.a]¯ েলা [gulo]¯ ø/টা [t.a]¯ েলা [gulo]¯ ø/টা [t.a]¯ রা [ra]¯ ø রা [ra]¯ /গণ [gan. a] / [khan¯ a]¯ / [brinda] খানা /টু কু [t.uku] বৃ . /খািন [khani]¯ Objective ø/টা [t.a]¯ েলা [gulo]¯ েক [ke]¯ েলােক [gulok¯ e]¯ েক [ke]¯ েদরেক [derk¯ e]¯ েক [ke]¯ েক [ke]¯ / [khan¯ a]¯ / [tak¯ e]¯ / [tak¯ e]¯ / [ganke]¯ খানা /টু কু [t.uku] টােক . টােক . গণেক . / [brindake]¯ /খািন [khani]¯ বৃেক . Genitive র [ra] /ে◌র [er]¯ েলার [gulor]¯ র [ra] েলার [gulor]¯ র [ra] েদর [der]¯ র [ra] েদর [der]¯ / [yer]¯ / [er]¯ / [er]¯ / [er]¯ / [ganer]¯ েয়র /টু কু র [t.ukur] ে◌র ে◌র ে◌র গেণর . / [yer]¯ / [yer]¯ / [yer]¯ / [brinder]¯ /টার [t.ar]¯ েয়র েয়র েয়র বৃের . /খানার [khan¯ ar]¯ /টার [t.ar]¯ /টার [t.ar]¯ /খািনর [khanir]¯ Locative েয় [ye]¯ /য় [y] েলায় [guloy]¯ n/a n/a n/a n/a n/a n/a /েত [te]¯ /টায় [tay]¯ /টু কু েত [tukute]¯ /খানায় [khan¯ ay]¯ /খািনেত [khanit¯ e]¯

Table 2: Sufx table for Bengali noun inections

......

ি◌রিছল ে◌র

ি◌ের ে◌র

......

Figure 1: Part of paradigm denition for ফ/ে◌র__vblex

45 data such as gender, animacy of nouns, mood and aspect of tense; we had to manually tag them. We

wanted to remain faithful to Zipf's Law,9 which Anubadok does not follow. So a lot of high fre-

quency words that were not in Anubadok's dic- tionary had to be manually tagged. The fre- quency list of the words were obtained from CR-

BLP.10 The list was created at CRBLP as per their ই Prothom-alo lexicon project , the analysis done

on the frequency of Bengali words by crawling
Prothom-alo,11 one of the most circulated Ben- ও gali newspapers in Bangladesh. We were able to

get a list of most frequently used 20,000 Bengali words from the project.

3 Development
The development of the XML dictionary was done using open source tools like Python, PHP, Figure 2: A paradigm denition for Bengali enclitics MySQL and shell-scripting. At rst we focused on the open category words starting with verbs of this approach is we cannot directly use existing and then moving onto nouns, pronouns and ad- dictionary from other resources, but the benets jectives etc. We rst populated our database with outweighs this in most cases. lemmas, then made additional changes to the ta- As mentioned earlier, initially we took ad- bles with proper animacy, gender, number tags. vantage of the open-source English to Ben- Then we put down the inection rules in our gali machine translation tool called Anubadok. python scripts. In some cases, we took the help Anubadok served as a starting point for creating of another intermediate format called Speling for- 12 the rules section for this project, but we even- mat for ease of use. The scripts with the in- tually realised that more extensive rules would ection rules generate the semi-colon delimited be needed to build a high quality morphological les which are then used to create the nal Aper- analyser and generator. Although Anubadok is tium dictionary. We then use lttoolbox to compile a functional English to Bengali machine transla- the text dictionary to binary format. The Speling tion system, morphological analysis was never its format does not currently support enclitics, so it main focus. So we took the most basic approach, cannot be used when the words might take on ad- that is consulting the grammar books. Most of ditional enclitic, e.g. verbs and nouns. In these the inection rules come from Klaiman (2009), cases, the monodix is directly generated by the Chowdhury et al. (2000) and Ray et al. (1966) as script. well as Wikipedia.6 Figures 3 and 4 show the pseudo code for gen- erating the inections of verbs. In line 2 of g- Most of the part-of-speech tagged data came ure 3 we calculate the effective length of the from Anubadok, but Anubadok uses the Penn verb root by removing ` '(chandrabindu) and Treebank tagset7, so a converter was needed to ◌ঁ ` '(hashant). Then depending on the length of convert the tags to our proposed tagset which is a ◌্ the verb, we choose appropriate function to gen- superset8 of Apertium's normal tagset. Also, each erate the actual inection. Figure 4 shows how of our each lexical category carries more lexical 9http://en.wikipedia.org/wiki/Zipf's_ 6http://en.wikipedia.org/wiki/Bengali_ law grammar 10http://crblp.bracu.ac.bd 7http://www.cis.upenn.edu/~treebank/ 11http://www.prothom-alo.com 8http://wiki.apertium.org/wiki/ 12http://wiki.apertium.org/wiki/ Bengali_and_English/TagSets Speling_format

46 Part of speech Number of entries are few digital SCB text resources available in the Noun 2,035 Internet, apart form Wikipedia and the Prothom- Adjective 866 Proper noun 800 alo website. Therefore, our evaluation was also Adverb 432 conned to corpus developed from Wikipedia and Verb 136 Prothom-alo. The data was collected by running Numeral 123 Post-position 46 a web crawler on these two web sites. Determiner 45 Pronoun 32 First we calculate the naïve coverage according Conjunction 8 to following formula: 4,523 No. of words with at least one analysis Coverage = Table 3: Number of entries in the lexicon for the main parts No. of words of speech

Table 4 shows the naïve coverage for wikipedia we are generating the inected forms. First in and prothom-alo. The statistics clearly show op- line 1 and 2, we generate two type of umlauts for timization nature of our analyser. Since the fre- this particular kind of verb root. Normally this is quency list was derived from prothom-alo, we get a regressive vowel harmony, a phonological phe- a higher coverage (80.35%) of prothom-alo than nomenon attributive to SCB. But for some verbs of wikipedia (68.21%). like যা [ya],¯ আছ [acha],¯ it is rather a morphological phenomenon (e.g. যা [ya]¯ - েগল [gela],¯ আছ [acha]¯ - Site File size Total Recognised Naïve থাকত [thakta]).¯ In the following lines we concate- (MB) words words coverage nate the root or the umlauts with the sufxes. It is Wkipedia 27.1 MB 1,730,745 1,180,542 68.21% to be noted that the actual code for these two pro- Prothom- 23.4 MB 1,572,601 1,263,661 80.35% cedure involves some complex regular expression alo matching (e.g. detection of hashant and chandra- bindu, unicode normalisation, handling irregular Table 4: Naïve coverage for wikipedia and prothom-alo verbs), something which was removed from the pseudo-code for simplicity and clarity. For calculating recall and precision we also A preliminary version of the verb conjugator took two sets of data. First one is the list of 13 can be found online which generates all the most frequently used 1000 words (in their surface forms for a particular verb. The current morpho- form) from CRBPL (Prothom-alo corpus). The 14 logical analyser can also be accessed online. second one is randomly selected 1000 word text excerpt from the web-crawler data of Prothom- 4 Evaluation alo. The formulas for calculating recall and pre- As table 3 shows, the number of currently tagged cision are as follows: parts of speech is 4,523. This is not a very high number. However, as we shall later see, our dic- Number of correct analyses Precision tionary puts emphasis on the most frequently used = Number of analyses retrieved words, therefore achieving a higher effective cov- erage. Doing an evaluation poses some major chal- lenges for us. As we have stated before, this Number of correct analyses Recall implementation of Bengali morphological anal- = Total number of analyses yser was created keeping in mind only the SCB (Standard Colloquial Bengali). However there Table 5 shows the recall and precision values 13http://xixona.dlsi.ua.es/~fran/ for the two sets of data. Precision is high in both bengali/conj/ 14http://xixona.dlsi.ua.es/~fran/ cases. Recall value falls in the latter case (random bengali/analysis.php excerpt), but this is still a high value.

47 PROCESS-VERB(verbList) 1 for each verb in verbList 2 do length length of verb discarding ` ' [chandrabindu] and ` ' [hashant] ← ◌ঁ ◌্ 3 if length > 2 ✄ Rule for verbs that are more than 2 characters long 4 then if verb ends with a marker preceded by a consonant 5 then GET-INFLECTION-DO(verb) ✄ করা,পড়া etc. 6 if verb ends with a consonant preceded by a `ে◌' or `ে◌া' 7 then GET-INFLECTION-WRITE(verb) ✄ েলখ, েখল etc. 8 if verb ends with a consonant preceded by a marker 9 then GET-INFLECTION-SHAKE(verb) ✄ পাড়, নাড়, ছুট etc. 10 elseif length = 2 ✄ Process 2 letter verbs 11 then if verb ends with a consonant preceded by an ও 12 then GET-INFLECTION-WRITE(verb) ✄ ওঠ etc. 13 if verb ends with a consonant preceded by an vowel or a consonant 14 then GET-INFLECTION-SHAKE(verb) ✄ আস, আন, আঁক etc. 15 if verb is `যা' 16 then GET-INFLECTION-GO(verb) 17 if verb ends with ে◌ or ে◌া preceded by a consonant 18 then GET-INFLECTION-TAKE(verb) ✄ েন, েদ etc. 19 if verb ends with a marker preceded by a consonant 20 then GET-INFLECTION-EAT(verb) ✄ খা, পা, , িন etc. 21 else ✄ Process single letter verbs 22 if verb ends with a consonant 23 then GET-INFLECTION-EAT(verb) ✄ ক, হ etc.

Figure 3: Pseudo code for the procedure PROCESS-VERB

GEN-INFLECTION-TAKE(verb) 1 uml replace ` ' with ` ' and ` ' with ` ' in verb ← ে◌ ি◌ ে◌া ◌ু 2 uml2 replace ` ' with ` ' in verb ← ে◌ ◌া 3 I[Ger.] verb + ← ওয়া 4 I[Inf.] uml + ← েত 5 I[Gen.] verb + ← বার 6 I[P res.Spl.] verb + , verb + , uml2 + , uml + , verb + , verb + ← ই ন ও স ন য় 7 I[P res.Cont.] uml + , uml + , uml + , uml + , uml + , uml + ← ি েন িস েন ে 8 I[F ut.Spl.] verb + , verb + , verb + , uml + , verb + , verb + ← ব েবন েব িব েবন েব 9 I[P ast.Spl.] uml + , uml + , uml + , uml + , uml + , uml + ← লাম েলন েল িল েলন ল 10 I[P ast.Hbl.] uml + , uml + , uml + , uml + , uml + , uml + ← তাম েতন েত িত েতন ত 11 I[P ast.Cont.] uml + , uml + , uml + , uml + , uml + , uml + ← িলাম িেলন িেল িিল িেলন িল 12 I[P erf.] uml + , uml + , uml + , uml + , uml + , uml + ← েয়িছ েয়েছন েয়ছ েয়িছস েয়েছন েয়েছ 13 I[P luP erf.] uml + , uml + , uml + , uml + , uml + , uml + ← েয়িছলাম েয়িছেলন েয়িছেল েয়িছিল েয়িছেলন েয়িছল 14 I[P art.P ast] uml + ← েয় 15 I[P art.Cond.] uml + ← েল 16 I[Imp.P res.] verb + , uml2 + , verb, verb+ , uml + ← ন ও ন ক 17 I[Imp.F ut.] verb + , uml + , uml + , verb+ , uml + ← েবন ও স েবন েব

Figure 4: Pseudo code for procedure GEN-INFLECTION-TAKE

48 Type Correct Retrieved Total Precision Recall could also be used as a stemmer for any search Top 1000 1,626 1,723 1,631 99.6% 94.37% engine for Bengali language. It could also dou- Random 1000 1,328 1,504 1,330 99.8% 88.29% ble as a spell checker. In fact works are in the way to create a new Bengali dictionary for Fire- Table 5: Precision and Recall fox and OpenOfce.Org using this morphological analyser by Ankur.15 5 Discussion To our knowledge, this is the rst open- source attempt in creating a fully-functional wide- Currently the analyser is in preliminary stage and coverage morphological analyser and generator there are lots of room for improvement. Firstly, for Bengali that is publicly available to every- the coverage needs to be expanded. This will re- one. All the tools that were used in this project quire manual tagging of many words. Secondly, are open source and the output, both the linguistic the verb section faces difculty in treating multi- data and the toolset is available under the GNU word verbs and the negative form are not well GPL license. recognised. This is because several forms of the verb like innitives and participles demand a neg- Acknowledgments ative particle before the verb while nite forms Development was funded as part of the Google require the particle to follow the verb and in some Summer of Code programme16. We are also very cases as enclitic. Matters get complicated in the grateful to the personnel from CRBLP, specially former case, multi-word verbs require the nega- Dr. Mumit Khan for allowing us to use their tive particle to be in the middle. Table 6 shows the the most frequently used word-list. And special negative forms of a multi-word verb কাজ কর `work' thanks to co-mentor Kevin Donnelly, who helped clearly identifying the problem. This can be par- us a lot during the initial phase of the project and tially taken care of by a nested paradigm, but this to Dr. G. M. Hossain, the author of the Anubadok makes the analyser slow, we need to come up with system which was used during the development. a better solution to solve this. One possibility is porting this analyser (the References python scripts) to a more robust architecture Beesley, K. R. and Karttunen, L. (2003). Finite such as foma (Huldén, 2009), an open-source State Morphology. CSLI Publications. implementation of the Xerox nite-state tools Chowdhury, S. M., Chowdhury, S. M. H., Khalil, (Beesley and Karttunen, 2003). It should be M. I., Muhammad, D. K. D., and Lahiri, S. noted that there has been similar attempts to cre- (2000). (A grammar of Bengali ate nite-state technology based morphological বাংলা ভাষার বাকরণ Language). National Curriculam and Textbook analyser for Bengali with PC-KIMMO (Dasgupta Board (NCTB). and Khan, 2004) and JKimmo (Islam and Khan, 2006). However, PC-KIMMO is not Unicode Dasgupta, S. and Khan, M. (2004). Morpho- compliant, it cannot be directly used for Ben- logical parsing of Bangla words using PC- gali morphological analysis. On the other hand KIMMO. Proceedings of the 7th International JKimmo requires a transliteration scheme. The Conference on Computer and Information solution we present here is Unicode compliant Technology (ICCIT2004, Dhaka, Bangladesh). and requires no transliteration scheme. Further- Huldén, M. (2009). Foma: a nite-state compiler more, foma is fully able to handle Unicode char- and library. EACL 2009, pages 29–32. acters, thus maintaining our Unicode compliance Islam, M. Z. and Khan, M. (December 2006). should we choose to port to it. Jkimmo: A multilingual computational mor- As mentioned earlier, this morphological anal- phology framework for pc-kimmo. Proc. of yser/generator was created as a part of a new lan- 15 guage pair bn-en for Apertium. We are on our http://ankur.org.bd/ 16http://socghop.appspot.com/student_ way of creating a functional English to Bengali project/show/google/gsoc2009/apertium/ translation system. The morphological analyser t124021625239

49 First person Second person Third person / Impersonal কাজ কর [kaz¯ kar] Polite Familiar Informal Polite Familiar

Ger. কাজ না করা n/a n/a n/a n/a n/a n/a [kaz¯ na¯ kara]¯ Inf. কাজ না করেত n/a n/a n/a n/a n/a n/a [kaz¯ na¯ karte]¯ Gen. কাজ না করার n/a n/a n/a n/a n/a n/a [kaz¯ na¯ karar]¯ Pres. Spl. n/a কাজ কির না কাজ কেরন না কাজ কর না কাজ কিরস না কাজ কেরন না কাজ কের না [kaz¯ kari na]¯ [kaz¯ karen¯ na]¯ [kaz¯ kara na]¯ [kaz¯ karis na]¯ [kaz¯ karen¯ na]¯ [kaz¯ kare¯ na]¯ Pre. Cont. n/a কাজ করিছ না কাজ করেছন না কাজ করছ না কাজ করিছস না কাজ করেছন না কাজ করেছ না [kaz¯ karchi na]¯ [kaz¯ karchen¯ na]¯ [kaz¯ karcha na]¯ [kaz¯ karchis na]¯ [kaz¯ karchen¯ na]¯ [kaz¯ karche¯ na]¯ Fut. Spl. n/a কাজ করব না কাজ করেবন না কাজ করেব না কাজ করিব না কাজ করেবন না কাজ করেব না [kaz¯ karba na]¯ [kaz¯ karben¯ na]¯ [kaz¯ karbe¯ na]¯ [kaz¯ karbi na]¯ [kaz¯ karben¯ na]¯ [kaz¯ karbe¯ na]¯ Past. Spl. n/a কাজ করলাম না কাজ করেলন না কাজ করেল না কাজ করিল না কাজ করেলন না কাজ করল না [kaz¯ karlam¯ na]¯ [kaz¯ karlen¯ na]¯ [kaz¯ karle¯ na]¯ [kaz¯ karli na]¯ [kaz¯ karlen¯ na]¯ [kaz¯ karla na]¯ Past. Hbl. n/a কাজ করতাম না কাজ করেতন না কাজ করেত না কাজ করিত না কাজ করেতন না কাজ করত না [kaz¯ kartam¯ na]¯ [kaz¯ karten¯ na]¯ [kaz¯ karte¯ na]¯ [kaz¯ karti na]¯ [kaz¯ karten¯ na]¯ [kaz¯ karta na]¯ Past. Cont. n/a কাজ করিছলাম না কাজ করিছেলন না কাজ করিছেল না কাজ করিছিল না কাজ করিছেলন না কাজ করিছল না [kaz¯ karchilam¯ na]¯ [kaz¯ karchilen¯ na]¯ [kaz¯ karchile¯ na]¯ [kaz¯ karchili na]¯ [kaz¯ karchilen¯ na]¯ [kaz¯ karchila na]¯ Perf. n/a কাজ কিরিন কাজ কেরনিন কাজ করিন কাজ কিরসিন কাজ কেরনিন কাজ কেরিন [kaz¯ karini] [kaz¯ karenni]¯ [kaz¯ karani] [kaz¯ karisni] [kaz¯ karenni]¯ [kaz¯ kareni]¯ Plu Perf. n/a কাজ কিরিন কাজ কেরনিন কাজ করিন কাজ কিরসিন কাজ কেরনিন কাজ কেরিন [kaz¯ karini] [kaz¯ karenni]¯ [kaz¯ karani] [kaz¯ karisni] [kaz¯ karenni]¯ [kaz¯ kareni]¯ Part. Past. কাজ না কের n/a n/a n/a n/a n/a n/a [kaz¯ na¯ kare]¯ Part. Cond. কাজ না করেল n/a n/a n/a n/a n/a n/a [kaz¯ na¯ karle]¯ Imp. Pres. n/a n/a কাজ করেবন না কাজ করেব না কাজ করিব না কাজ করেবন না কাজ করেব না [kaz¯ karben¯ na]¯ [kaz¯ karbe¯ na]¯ [kaz¯ karbi na]¯ [kaz¯ karben¯ na]¯ [kaz¯ karbe¯ na]¯ কাজ কিরস না [kaz¯ karis na]¯ Imp. Fut. n/a n/a কাজ করেবন না কাজ কেরা না কাজ করেবন না কাজ করেব না [kaz¯ karben¯ na]¯ [kaz¯ karo¯ na]¯ [kaz¯ kar na]¯ [kaz¯ karbe¯ na]¯ /কাজ কিরসেন [kaz¯ karisne]¯

Table 6: Example of negative inections of a compound verb

9th International Conference on Computer and systems. In Proc. of the National Conference Information Technology (ICCIT 2006), Dhaka, on Computer Processing of Bangla (NCCPB Bangladesh. 05), Dhaka, Bangladesh. Islam, M. Z., Uddin, M. N., and Khan, M. (2007). A light weight stemmer for Bengali and its use in a spelling checker. Proceedings of the 1st International Conference on Digital Communications and Computer Applications (DCCA2007, Irdbid, Jordan). Klaiman, M. H. (2009). The World's Major Lan- guages, volume 23. Routledge, 2nd edition. Ortiz-Rojas, S., Forcada, M. L., and Ramírez- Sánchez, G. (2005). Construcción y mini- mización eciente de transductores de letras a partir de diccionarios con paradigmas. Proce- samiento del lenguaje natural, (35):51–57. Ray, P. S., Chakravarti, P. N., Chatterjee, S., Ja- han, R., and Ahmed, M. (1966). A Refer- ence Grammer of Bengali. The University of Chicago. Samit Bhattacharya, Monojit Choudhury, S. S. and Basu, A. (2005). Inectional morphology synthesis for bengali noun, pronoun and verb

50