Evolution, Optimization, and Language Change: the Case of Bengali Verb Inflections
Monojit Choudhury¹, Vaibhav Jalan², Sudeshna Sarkar¹, Anupam Basu¹
¹ Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
{monojit,sudeshna,anupam}@cse.iitkgp.ernet.in
² Department of Computer Engineering, Malaviya National Institute of Technology, Jaipur, India
[email protected]

Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, pages 65–74, Prague, June 2007. © 2007 Association for Computational Linguistics

Abstract

The verb inflections of Bengali underwent a series of phonological changes between the 10th and 18th centuries, which gave rise to several modern dialects of the language. In this paper, we offer a functional explanation for this change by quantifying the functional pressures of ease of articulation, perceptual contrast and learnability through objective functions or constraints, or both. The multi-objective and multi-constraint optimization problem has been solved through a genetic algorithm, whereby we have observed the emergence of Pareto-optimal dialects in the system that closely resemble some of the real ones.

1 Introduction

Numerous theories have been proposed to explain the phenomenon of linguistic change, which, of late, are also being supported by allied mathematical or computational models. See (Steels, 1997; Perfors, 2002) for surveys on computational models of language evolution, and (Wang et al., 2005; Niyogi, 2006) for reviews of works on language change. The aim of these models is to explain why and how languages change under specific socio-cognitive assumptions. Although computational modeling is a useful tool in exploring linguistic change (Cangelosi and Parisi, 2002), due to the inherent complexities of our linguistic and social structures, modeling of real language change turns out to be extremely hard. Consequently, with the exception of a few (e.g., Hare and Elman (1995); Dras et al. (2003); Ke et al. (2003); Choudhury et al. (2006b)), all the mathematical and computational models developed for explaining language change are built for artificial toy languages. This has led several researchers to cast doubt on the validity of the current computational models as well as the general applicability of computational techniques in diachronic explanations (Hauser et al., 2002; Poibeau, 2006).

In this paper, we offer a functional explanation¹ of a real-world language change: the morpho-phonological change affecting the Bengali verb inflections (BVI). We model the problem as a multi-objective and multi-constraint optimization and solve it using a Multi-Objective Genetic Algorithm² (MOGA). We show that the different forms of the BVIs, as found in the several modern dialects, automatically emerge in the MOGA framework under suitable modeling of the objective and constraint functions. The model also predicts several other possible dialectal forms of Bengali that seem linguistically plausible and might exist or have existed in the past, present or future. Note that the evolutionary algorithm (i.e., MOGA) has been used here as a tool for optimization, and has no relevance to the evolution of the dialects as such.

Previously, Redford et al. (2001) have modeled the emergence of syllable systems in a multi-constraint and multi-objective framework using genetic algorithms. Since that model fuses the individual objectives into a single objective function through a weighted linear combination, it is not a multi-objective optimization in the true sense, nor does it use MOGA for the optimization process. Nevertheless, the present work draws heavily from the quantitative formulation of the objectives and constraints described in (Redford, 1999; Redford and Diehl, 1999; Redford et al., 2001). Ke et al. (2003) have demonstrated the applicability and advantages of MOGA in the context of vowel and tonal systems, but their model is not explicit about the process of change that could give rise to the optimal vowel systems. As we shall see, the conception of the genotype, which is arguably the most important part of any MOGA model, is a novel and significant contribution of this work. The present formulation of the genotype not only captures a snapshot of the linguistic system, but also explicitly models the course of change that has given rise to the particular system. Thus, we believe that the current model is more suitable for explaining a case of linguistic change.

The paper is organized as follows: Sec. 2 introduces the problem of historical change affecting the BVIs and presents a mathematical formulation of the same; Sec. 3 describes the MOGA model; Sec. 4 reports the experiments, observations and their interpretations; Sec. 5 concludes the paper by summarizing the contributions. In this paper, Bengali graphemes are represented in Roman script following the ITRANS notation (Chopde, 2001). Since Bengali uses a phonemic orthography, the phonemes are also transcribed using ITRANS between two slashes.

¹ Functionalist accounts of language change invoke the basic function of language, i.e. communication, as the driving force behind linguistic change (Boersma, 1998). Stated differently, languages change in a way to optimize their function, such that speakers can communicate maximum information with minimum effort (ease of articulation) and ambiguity (perceptual contrast). Often, ease of learnability is also considered a functional benefit. For an overview of different explanations in diachronic linguistics see (Kroch, 2001) and Ch. 3 of (Blevins, 2004).

² Genetic algorithms were initially proposed by Holland (1975) as a self-organizing adaptation process mimicking biological evolution. They are also used for optimization and machine learning purposes, especially when the nature of the solution space is unknown or there is more than one objective function. See Goldberg (1989) for an accessible introduction to single and multi-objective genetic algorithms. Note that in the case of a multi-objective optimization problem, MOGA gives a set of Pareto-optimal solutions rather than a single optimum. The concept of Pareto-optimality is defined later.

2 The Problem

Bengali is an agglutinative language. There are more than 150 different inflected forms of a single verb root in Bengali, which are obtained through affixation of one of the 52 inflectional suffixes, optionally followed by the emphasizers. The suffixes mark tense, aspect, modality, person and polarity information (Bhattacharya et al., 2005). The origin of modern Bengali can be traced back to Vedic Sanskrit (circa 1500 BC to 600 BC), which during the middle Indo-Aryan period gave rise to dialects like Māgadhī and Ardhamāgadhī (circa 600 BC to 200 AD), followed by the Māgadhī-apabhramsha, and finally crystallizing to Bengali (circa 10th century AD) (Chatterji, 1926). The verbal inflections underwent a series of phonological changes during the middle Bengali period (1200 to 1800 AD), which gave rise to the several dialectal forms of Bengali, including the standard form, Standard Colloquial Bengali (SCB).

The Bengali literature of the 19th century was written in the Classical Bengali dialect, or sādhubhāshā, which used the older verb forms and drew heavily from the Sanskrit vocabulary, even though the forms had disappeared from the spoken dialects by the 17th century. Here, we shall take the liberty to use the terms "classical forms" and "Classical Bengali" to refer to the dialectal forms of middle Bengali and not the Classical Bengali of the 19th century literature. Table 1 lists some of the corresponding verb forms of classical Bengali and SCB. Table 3 shows the derivation of some of the current verb inflections of SCB from their classical counterparts as reported in (Chatterji, 1926).

| Attributes | Classical (Λ0) | SCB | ACB | Sylheti |
|---|---|---|---|---|
| PrS1 | kari | kori | kori | kori |
| PrS2 | kara | karo | kara | kara |
| PrS3 | kare | kare | kare | kare |
| PrSF | karen | karen | karen | karoin |
| PrC1 | kariteChi | korChi | kartAsi | koirtAsi |
| PrC2 | kariteCha | korCho | kartAsa | koirtAsae |
| PrC3 | kariteChe | korChe | kartAse | koirtAse |
| PrCF | kariteChen | korChen | kartAsen | kortAsoin |
| PrP1 | kariAChi | koreChi | korsi | koirsi |
| PrP2 | kariACha | koreCho | karsa | koirsae |
| PrP3 | kariAChe | koreChe | karse | koirse |
| PrPF | kariAChen | koreChen | karsen | korsoin |

Table 1: The different inflected verb forms of Classical Bengali and three other modern dialects. All the forms are given phonetically and for the verb root kar. Legend: (tense) Pr: present; (aspects) S: simple, C: continuous, P: perfect; (person) 1: first, 2: second normal, 3: third, F: formal in second and third persons. See (Bhattacharya et al., 2005) for a list of all the forms.

2.1 Dialect Data

Presently, there are several dialects of Bengali that vary mainly in terms of the verb inflections and intonation, but rarely over syntax or semantics. We do not know of any previous study in which the different dialectal forms for BVI were collected and systematically listed. Therefore, we have collected dialectal data for the following three modern dialects of Bengali by consulting naive informants.

• Standard Colloquial Bengali (SCB) spoken in a region around Kolkata, the capital of West Bengal,

| APO | Semantics |
|---|---|
| Del(p, LC, RC) | p → φ / LC __ RC |
| Met(p_i p_j, LC, RC) | p_i p_j → p_j p_i / LC __ RC |
| Asm(p, LC, RC) | p → p′ / LC __ RC |
| Mut(p, p′, LC, RC) | p → p′ / LC __ RC |

Table 2: Semantics of the basic APOs in terms of rewrite rules. LC and RC are regular expressions specifying the left and right contexts respectively. p, p′, p_i and p_j represent phonemes.

| Rule No. | APO | kar-iteChe | kar-iten | kar-iAChi |
|---|---|---|---|---|
| 1 | Del(e, φ, Ch) | kar-itChe | NA | NA |
| 2 | Del(t, φ, Ch) | kar-iChe | NA | NA |
| 3 | Met(ri, φ, φ) | kair-Che | kair-ten | kair-AChi |
| 5 | Mut(A, e, φ, Ch) | NA | NA | kair-eChi |
| 6 | Asm(a, i, φ, φ) | koir-Che | koir-ten | koir-eChi |
| 7 | Del(i, o, φ) | kor-Che | kor-ten | kor-eChi |

Table 3: Derivations of the verb forms of SCB from classical Bengali using APOs. The last three columns give example derivations, headed by the classical form from which each starts.
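To make the rewrite-rule reading of Table 2 concrete, the following is a minimal Python sketch of how the APOs could be applied in sequence to reproduce the derivations in Table 3. It is an illustration under stated assumptions, not the authors' implementation: contexts are simplified to a single neighbouring phoneme (with None standing for φ) instead of regular expressions, the tokenizer handles only the ITRANS digraph Ch, and the outcome of assimilation (a becoming o before i) is read off the derived forms in Table 3 and stored in a hypothetical lookup table. All function and variable names are invented for this sketch.

```python
from typing import List, Optional

Phonemes = List[str]

def tokenize(word: str) -> Phonemes:
    """Split an ITRANS string into phonemes (assumption: 'Ch' is the only digraph)."""
    out, i = [], 0
    while i < len(word):
        if word.startswith("Ch", i):
            out.append("Ch")
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def ctx_ok(w: Phonemes, lo: int, hi: int,
           lc: Optional[str], rc: Optional[str]) -> bool:
    """Check the phoneme left of index lo and the one at index hi; None matches anything."""
    left = w[lo - 1] if lo > 0 else None
    right = w[hi] if hi < len(w) else None
    return (lc is None or left == lc) and (rc is None or right == rc)

def delete(w, p, lc=None, rc=None):
    """Del(p, LC, RC): drop p in the given context."""
    return [x for i, x in enumerate(w)
            if not (x == p and ctx_ok(w, i, i + 1, lc, rc))]

def metathesize(w, pi, pj, lc=None, rc=None):
    """Met(pi pj, LC, RC): swap the adjacent pair pi pj in the given context."""
    out, i = [], 0
    while i < len(w):
        if w[i:i + 2] == [pi, pj] and ctx_ok(w, i, i + 2, lc, rc):
            out += [pj, pi]
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out

def mutate(w, p, p2, lc=None, rc=None):
    """Mut(p, p', LC, RC): replace p by p' in the given context."""
    return [p2 if x == p and ctx_ok(w, i, i + 1, lc, rc) else x
            for i, x in enumerate(w)]

# Assumption: the result of assimilating a towards a following i is o,
# as suggested by kair-Che -> koir-Che in Table 3.
ASSIMILATED = {("a", "i"): "o"}

def assimilate(w, p, q, lc=None, rc=None):
    """Asm as used in Table 3: p directly followed by q takes its assimilated value."""
    out = list(w)
    for i in range(len(w) - 1):
        if w[i] == p and w[i + 1] == q and ctx_ok(w, i, i + 1, lc, rc):
            out[i] = ASSIMILATED[(p, q)]
    return out

# Rules in the order and numbering given in Table 3 (no rule 4 is listed there).
SCB_RULES = [
    lambda w: delete(w, "e", rc="Ch"),       # 1: Del(e, φ, Ch)
    lambda w: delete(w, "t", rc="Ch"),       # 2: Del(t, φ, Ch)
    lambda w: metathesize(w, "r", "i"),      # 3: Met(ri, φ, φ)
    lambda w: mutate(w, "A", "e", rc="Ch"),  # 5: Mut(A, e, φ, Ch)
    lambda w: assimilate(w, "a", "i"),       # 6: Asm(a, i, φ, φ)
    lambda w: delete(w, "i", lc="o"),        # 7: Del(i, o, φ)
]

def derive(word: str) -> str:
    """Apply the SCB rule sequence to a classical form and return the derived form."""
    w = tokenize(word)
    for rule in SCB_RULES:
        w = rule(w)
    return "".join(w)
```

In this sketch a rule that finds no matching context leaves the word unchanged, which corresponds to the NA cells in Table 3. Calling derive on kariteChe, kariten and kariAChi walks through the same intermediate forms as the three example columns of the table.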