The Bangla-Asamiya Script and Its Representation in Unicode
Probal Dasgupta, University of Hyderabad
Gautam Sengupta, Jadavpur University, Kolkata

0. Introductory Remarks

The script described here is one that the Eastern Indo-Aryan languages Bangla (or Bengali) and Asamiya (or Assamese), the established and traditional loci of this script, share with Santali (which is also written in other scripts), Manipuri and several other languages. As the linguistics and graphology of those languages come to be better understood, the behaviour of this shared script in those settings will play a crucial role, as all newly opened domains do, in redrawing the overall picture of this script. Until that happens, we are confined to the study of the major languages that have traditionally used this and only this script. Furthermore, the background of the authors compels an emphasis on the way the script functions in the Bangla setting, treating the Asamiya material as a brief supplement to an overall Bangla-focused story. The present authors regret but cannot avoid this bias, which should be easy for Asamiya scholars to correct. For some current Asamiya-based remarks on this shared script, see Goswami and Tamuli (2003). The present paper seeks to cover the synchronic realities at the levels of linguistic and graphological description, including the computational dimension that the study of scripts in our era must take on board for languages that have made the jump to computerized writing. Some rudimentary remarks about historical origins also appear, but these will need to be supplemented by specialists in the history of script and language. While eschewing phonological description for its own sake, we must provide a reasonably full picture of the segmental phonology of standard Bangla as the basic material that the graphology hooks on to.
For convenience of reading, we use a basic phonemic transcription as our standard method of mentioning Bangla or Asamiya words, except where the specifics of a written form are under scrutiny. For the same reason, our discussion of the script is built around a romanization whose conventions are consistent with the phonemic transcription we employ.

We should ideally distinguish between the properties of the script proper, which can underlie several different orthographies for one or more languages, and those of an orthography that applies it to a particular language. But common parlance does not make this distinction. For immediate practical purposes, given the relatively marginal structural differences between written Bangla and written Asamiya, we will therefore not make that distinction, and will speak of script somewhat loosely. The distinctions we do need for our practical work are expressed in the transcription systems used. In our practice here, graphological transcriptions are enclosed in « » to distinguish them from transcriptions at the graphic level, which are enclosed in single angle brackets < >.
We employ a i u for the short vowel graphemes, ā ī ū for their long counterparts, r̥ for the functionally vocalic /ri/ originating historically from syllabic /r/, ai au for the diphthong graphemes, a tilde for vowel nasalization (as in ã), ṅ for the velar nasal, ś for the palato-alveolar sibilant, n and s for the dentals, ṭ ḍ for the retroflex plosives, ṛ for the retroflex flap, ṣ for the retroflex sibilant, ñ for the palatal nasal, ṃ for the anusvara and ḥ for the visarga to the extent that these may need separate representation, ẏ for the palatal grapheme with semivowel value (marked in the actual script by placing a dot under the independent allograph of the palatal grapheme), y for the palatal grapheme with non-semivowel value (this covers both the undotted independent allograph of the palatal grapheme and the postconsonantal dependent palatal marking known as ya-phala), and a backslash for the silencer that cancels the default vowel «a» on a consonant. The phonemic / / and phonetic [ ] transcription systems use æ ɔ for the low vowels, c j for the palato-alveolar affricates, and other symbols (including /a/ for the low back unrounded vowel) as in IPA and other standard transcriptions. This stock may need to be eked out by drawing on IPA itself and other resources if one expands the scope of such an investigation. When Salomon (2003), in a direct reference to Bangla, claims that 'the written vowel e actually serves to represent three separate sounds', he lists /ɛ/ in addition to our /e/ and /æ/; /ɛ/ corresponds to nothing known in standard Bangla. The reference is perhaps to dialects of Bangla in which standard /æ/ does not appear, for we are not aware of any variant of Bangla in which lower-mid front and low front vowels contrast with mid front vowels, and Salomon does not clarify what he has in mind. We doubt that graphological studies are prepared to take on all the dialects at this stage of the enterprise. The present exposition is a working draft.
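As a hedged illustration of how the split treatment of the palatal grapheme (semivowel value versus non-semivowel value) surfaces in Unicode, the sketch below uses Python's standard `unicodedata` module. It assumes that the dotted independent allograph corresponds to U+09DF BENGALI LETTER YYA, whose canonical decomposition is U+09AF BENGALI LETTER YA plus U+09BC BENGALI SIGN NUKTA (the dot), and that ya-phala is spelled as a consonant followed by U+09CD BENGALI SIGN VIRAMA and U+09AF. The helper name `palatal_forms` is hypothetical.

```python
import unicodedata

YA = "\u09AF"        # BENGALI LETTER YA: undotted palatal grapheme
NUKTA = "\u09BC"     # BENGALI SIGN NUKTA: the dot placed under a letter
HASANTA = "\u09CD"   # BENGALI SIGN VIRAMA, i.e. the hasanta/silencer
YYA = "\u09DF"       # BENGALI LETTER YYA: precomposed dotted form


def palatal_forms(consonant: str) -> dict[str, str]:
    """Return the three Unicode spellings of the palatal grapheme.

    `consonant` is a Bengali consonant letter that hosts the ya-phala.
    (Hypothetical helper, for illustration only.)
    """
    return {
        # Semivowel value: dotted independent allograph. U+09DF is a
        # composition exclusion, so NFC leaves it as YA + NUKTA.
        "dotted_independent": unicodedata.normalize("NFC", YYA),
        # Non-semivowel value, independent allograph: plain YA.
        "undotted_independent": YA,
        # Non-semivowel value, dependent allograph (ya-phala):
        # host consonant + hasanta + YA.
        "ya_phala": consonant + HASANTA + YA,
    }


forms = palatal_forms("\u09A6")  # BENGALI LETTER DA as the host consonant
for label, text in forms.items():
    print(label, [unicodedata.name(ch) for ch in text])
```

Because the precomposed dotted letter normalizes to YA plus NUKTA, software that needs to tell the two values apart should compare normalized strings rather than raw code points.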
Section 1 provides a general overview of the script and some considerations involving phonology and orthography. Section 2 takes up more concrete issues on the computational front that require decisions.

1. Considerations of Script, Phonology, and Orthography

1.1. Diachrony and Description

We distinguish the content of descriptive linguistics, which needs to take on board whatever realities govern a particular state of a language, from the principles of linguistic synchrony, which interact with those of diachrony and other systems to yield these realities. The serious study of linguistic synchrony deploys scientific idealizations modelling the most transparent operations in, say, phonology or syntax. In contrast, the descriptive analysis of the phenomena of a language has to hug these phenomena closely and suspend judgment about just how various factors interact to give rise to the appearances, even if these are later scientifically judged to be as deceptive as all appearances. The description of morphology, for instance, has to take into account both synchronic and diachronic principles. In the other direction, a study of diachrony must divide its attention between the tracking of phenomena through time, a historical matter, and the effects of diachronic phenomena as they show up within a particular state of a given language, where the archaic and the innovative sectors of the lexicon are layered differently. In the study of language in recent decades, there has been some neglect both of diachronic issues and of the interface between spoken and written language. We therefore face a certain problem of equipment, and of preparedness in the surrounding discourse, when we approach the diachrony of the Bangla-Asamiya writing system. For our purposes, let the following points suffice, inviting supplementation by specialists.
To begin with a familiar first approximation (which we radically revise in section 2), the Bangla-Asamiya script is alpha-syllabic, or an akṣara script in the sense of Salomon (2003). Its remote ancestor was the Brāhmī script. Its classic shape was finalized over several decades when the sociology of print encountered the handwriting of Ishwarchandra Vidyasagar (1820-1891, the initiator of Bangla print literacy and the author of primers that have stood the test of time) and his associates in nineteenth-century Bengal. There was one major revision of these shapes beginning in the nineteen-fifties, with the rise of linotype printing and the need to reduce the number of distinct allographs. The current rise of computer technology has occasioned much activity behind the scenes as far as the replication of older printing standards is concerned, but little by way of script reform comparable to the earlier impact of print technology on this script. At the level of orthography, Bangla-Asamiya writing is considerably more conservative than the phonologies of these languages. Many innovations in their phonology are not matched in the writing of words taken directly from Sanskrit and preserving Sanskrit orthography, standardly called tatsama words. As a result, there is a deep cleavage between speech and writing in these languages.

1.2. Generalities and Vowels

The basic graphological molecule of an akṣara script such as the Bangla-Asamiya script is an akṣara. It consists of a series of graphic atoms. The series must begin with zero or more consonants and end with either a marked vowel, the unmarked default vowel «a» (Bangla and Asamiya pronounce this as /ɔ/), or the silencer element, which we transcribe as the backslash \ and which is called «hasanta» /hɔʃonto/ in Bangla (the Nagari term is «halanta», Hindi /hələnt/). Thus, in «ud\bigna» /udbigno/ 'anxious', there are four akṣaras. The vocalic akṣara «u» consists of zero consonant plus graphological «u».
The zero consonant triggers the graphic-level choice of the independent vowel allograph of «u». The next akṣara «d\» consists of graphic <d> plus the silencer, without which graphic <d> would be understood to carry a default «a» (modern Bangla strongly tends to omit all silencers at the graphic level, leaving it an open question whether they are even graphologically absent). A consonant «b» initiates the following akṣara in the word under consideration and triggers, for graphological «i», the graph-level choice of the dependent allograph <i>, which appears to the left of the graph <b> and is called the «hrasba-i-kara» /hrɔʃoikar/, the 'short «i» supplement'. All vowels distinguish the dependent (or supplementary) allograph from the independent one. The last akṣara of «ud\bigna» begins with the conjunct consonant «gn» and leaves the default vowel «a» understood. Some conjunct consonants like «gn» combine the two graphs, in this case vertically (while «mph» combines m with ph horizontally). One consonant, «t», chooses in conjunct-initial position a prefixal allograph called «khanḍa ta» /khɔnɖo tɔ/ 'broken t', whose use preempts that of the unattested «t\» combination throughout the language. In the other direction, several consonants in conjunct-final position choose a dependent allograph known as a «phala» /phɔla/.
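The walkthrough of «ud\bigna» can be sketched in code: first segment our romanization into akṣaras (zero or more consonants closed by a vowel or by the silencer), then encode each akṣara in Unicode. In Unicode the independent/dependent distinction is carried by distinct characters (for example U+0989 BENGALI LETTER U versus U+09C1 BENGALI VOWEL SIGN U), the silencer is U+09CD BENGALI SIGN VIRAMA, and a conjunct such as «gn» is spelled consonant + virama + consonant, leaving the visual fusion to the font. This is a minimal sketch, not a full converter: the tables and function names are hypothetical and cover only the letters of this one word. A fuller table would also need, among much else, U+09CE BENGALI LETTER KHANDA TA for the 'broken t'.

```python
from itertools import takewhile
import unicodedata

SILENCER = "\\"     # the hasanta, transcribed in our romanization as a backslash
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA

# Hypothetical minimal tables covering only the letters of the example word.
CONSONANTS = {"b": "\u09AC", "d": "\u09A6", "g": "\u0997", "n": "\u09A8"}
INDEPENDENT = {"a": "\u0985", "i": "\u0987", "u": "\u0989"}  # vowel letters
DEPENDENT = {"a": "", "i": "\u09BF", "u": "\u09C1"}  # signs; default a unwritten
VOWELS = ("a", "i", "u")


def split_aksaras(word: str) -> list[str]:
    """Split a romanized word into akṣaras: consonants closed by a vowel or silencer."""
    aksaras: list[str] = []
    onset = ""  # consonant letters collected for the current akṣara
    for ch in word:
        if ch in VOWELS or ch == SILENCER:
            aksaras.append(onset + ch)  # a vowel or the silencer closes the akṣara
            onset = ""
        elif ch in CONSONANTS:
            onset += ch
        else:
            raise ValueError(f"symbol not in this sketch's tables: {ch!r}")
    if onset:
        raise ValueError(f"word ends in a bare consonant cluster: {onset!r}")
    return aksaras


def encode_aksara(aksara: str) -> str:
    """Encode one akṣara, choosing the independent or dependent vowel allograph."""
    onset = list(takewhile(lambda c: c in CONSONANTS, aksara))
    coda = aksara[len(onset):]  # the vowel grapheme or the silencer
    if not onset:
        return INDEPENDENT[coda]  # zero consonant: independent vowel letter
    # A conjunct is spelled consonant + virama + consonant.
    cluster = HASANTA.join(CONSONANTS[c] for c in onset)
    if coda == SILENCER:
        return cluster + HASANTA  # explicit silencer on the last consonant
    return cluster + DEPENDENT[coda]  # dependent vowel sign (empty for default a)


def encode_word(word: str) -> str:
    return "".join(encode_aksara(a) for a in split_aksaras(word))


aksaras = split_aksaras("ud\\bigna")
print(aksaras)  # ['u', 'd\\', 'bi', 'gna']
encoded = encode_word("ud\\bigna")
print([unicodedata.name(ch) for ch in encoded])
```

Note that U+09BF, the dependent allograph of «i», is stored after <b> in the encoded string even though it is displayed to the left of <b>: Unicode encodes in logical order and leaves the visual reordering to the renderer.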