Punjabi Tonemics and the Gurmukhi Script: A Preliminary Study

Total Page:16

File Type:pdf, Size:1020Kb

Andrea Lynn Bowden
Brigham Young University - Provo
BYU ScholarsArchive, Theses and Dissertations, 2012-03-07

Follow this and additional works at: https://scholarsarchive.byu.edu/etd
Part of the Linguistics Commons

BYU ScholarsArchive Citation
Bowden, Andrea Lynn, "Punjabi Tonemics and the Gurmukhi Script: A Preliminary Study" (2012). Theses and Dissertations. 2983. https://scholarsarchive.byu.edu/etd/2983

This Thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected].

Table of Contents
1. Introduction
  1.1. Punjabi and Lexical Tone
  1.2. Previous Research
  1.3. Proposal of this Research
  1.4. Proposed Tonogenesis
  1.5. Thesis Outline
2. Ethnography
  2.1. The Punjab Region
  2.2. Language Information and History
3.0. Critique
  3.1. Tones
  3.2. Data from Previous Research
  3.3. Vowel Length and Previous Research
  3.4. Orthographic Considerations
  3.5. Proposed Solution
4. Method/Results
  4.1. Methodology
  4.2. Results
  4.3. Relationship of Phonemes to Gurmukhi
  4.4. Tones in Punjabi
5.0. Analysis
  5.1. Word-Medial Environments
  5.2. Word-Initial Environments
6. Conclusion
References
Appendix A: Informed Consent
Appendix B: Punjabi Language Study Information
  Scholarship/Critical Language
  Program
  Discovery of Tone Pattern
Appendix C: Gurmukhi Script
Recommended publications
  • Cross-Language Framework for Word Recognition and Spotting of Indic Scripts
    Cross-language Framework for Word Recognition and Spotting of Indic Scripts. Ayan Kumar Bhunia (a), Partha Pratim Roy* (b), Akash Mohta (a), Umapada Pal (c). (a) Dept. of ECE, Institute of Engineering & Management, Kolkata, India; (b) Dept. of CSE, Indian Institute of Technology Roorkee, India; (c) CVPR Unit, Indian Statistical Institute, Kolkata, India. (b) email: [email protected], TEL: +91-1332-284816. Abstract: Handwritten word recognition and spotting of low-resource scripts are difficult because sufficient training data is not available and collecting data for such scripts is often expensive. This paper presents a novel cross-language platform for handwritten word recognition and spotting for such low-resource scripts, where training is performed with a sufficiently large dataset of an available script (considered as the source script) and testing is done on other scripts (considered as target scripts). Training with one source script and testing with another script to obtain a reasonable result is not easy in the handwriting domain because of the complex nature of handwriting variability among scripts. It is also difficult to map between source and target characters when they appear in cursive word images. The proposed Indic cross-language framework exploits a large dataset for training and uses it for recognizing and spotting text of other target scripts where a sufficient amount of training data is not available. Since Indic scripts are mostly written in 3 zones, namely upper, middle and lower, we employ zone-wise character (or component) mapping for efficient learning. The performance of our cross-language framework depends on the extent of similarity between the source and target scripts.
  • SC22/WG20 N896 L2/01-476 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale De Normalisation
    SC22/WG20 N896, L2/01-476. Universal multiple-octet coded character set. International organization for standardization / Organisation internationale de normalisation. Title: Ordering rules for Khmer. Source: Kent Karlsson. Date: 2001-12-19. Status: Expert contribution. Document type: Working group document. Action: For consideration by the UTC and JTC 1/SC 22/WG 20. 1 Introduction. The Khmer script in Unicode/10646 uses conjoining characters, just like the Indic scripts and Hangul. An alternative that is suitable (and much more elegant) for Khmer, but not for Indic scripts, would have been to have combining-below (and sometimes a bit to the side) consonants and combining-below independent vowels, similar to the combining-above Latin letters recently encoded, as well as how Tibetan is handled. However, that is not the chosen solution, which instead uses a combining character (COENG) that makes characters conjoin (glue), as is done for Indic (Brahmic) scripts and for Hangul Jamo, though the latter does not have a separate gluer character. In the Khmer script, using the COENG-based approach, the words are formed from orthographic syllables, where an orthographic syllable has the following structure [add ligation control?]: Khmer-syllable ::= (K H)* K M* where K is a Khmer consonant (most with an inherent vowel that is pronounced only if there is no consonant, independent vowel, or dependent vowel following it in the orthographic syllable) or a Khmer independent vowel, H is the invisible Khmer conjoint former COENG, M is a combining character (including COENG, though that would be a misspelling), in particular a combining Khmer vowel (noted A below) or modifier sign.
  • A Practical Sanskrit Introductory
    A Practical Sanskrit Introductory. This print file is available from ftpftpnacaczawiknersktintropsjan. Preface: This course of fifteen lessons is intended to lift the English-speaking student who knows nothing of Sanskrit to the level where he can intelligently apply Monier-Williams' dictionary and the Dhatu-Patha to the study of the scriptures. The first five lessons cover the pronunciation of the basic Sanskrit alphabet, together with its written form in both Devanagari and transliterated Roman; flash cards are included as an aid. The notes on pronunciation are largely descriptive, based on mouth position and effort, with similar English (Received Pronunciation) sounds offered where possible. The next four lessons describe vowel embellishments to the consonants, the principles of conjunct consonants, and additions to and variations in the Devanagari alphabet. Lessons ten and eleven present sandhi in grid form and explain their principles in sound. The next three lessons penetrate Monier-Williams' dictionary through its four levels of alphabetical order, and suggest strategies for finding difficult words. The last lesson shows the extraction of the artha from the Dhatu-Patha, and the application of this and the dictionary to the study of the scriptures. In addition to the primary course, the first eleven lessons include a B section which introduces the student to the principles of sentence structure in this fully inflected language. Six declension paradigms and class conjugation in the present tense are used with a minimal vocabulary of nineteen words. In the B part of
  • The Festvox Indic Frontend for Grapheme-To-Phoneme Conversion
    The Festvox Indic Frontend for Grapheme-to-Phoneme Conversion. Alok Parlikar, Sunayana Sitaram, Andrew Wilkinson and Alan W Black, Carnegie Mellon University, Pittsburgh, USA. aup, ssitaram, aewilkin, [email protected]. Abstract: Text-to-Speech (TTS) systems convert text into phonetic pronunciations which are then processed by Acoustic Models. TTS frontends typically include text processing, lexical lookup and Grapheme-to-Phoneme (g2p) conversion stages. This paper describes the design and implementation of the Indic frontend, which provides explicit support for many major Indian languages, along with a unified framework with easy extensibility for other Indian languages. The Indic frontend handles many phenomena common to Indian languages such as schwa deletion, contextual nasalization, and voicing. It also handles multi-script synthesis between various Indian-language scripts and English. We describe experiments comparing the quality of TTS systems built using the Indic frontend to grapheme-based systems. While this frontend was designed keeping TTS in mind, it can also be used as a general g2p system for Automatic Speech Recognition. Keywords: speech synthesis, Indian language resources, pronunciation. 1. Introduction. Intelligible and natural-sounding Text-to-Speech (TTS) systems exist for a number of languages of the world today. However, for low-resource, high-population languages, such as languages of the Indian subcontinent, there are very few high-quality TTS systems available. ... in models of the spectrum and the prosody. Another problem with this approach is that since each grapheme maps to a single "phoneme" in all contexts, this technique does not work well in the case of languages that have pronunciation ambiguities. We refer to this technique as "Raw Graphemes."
  • Schwa Deletion: Investigating Improved Approach for Text-To-IPA System for Shiri Guru Granth Sahib
    ISSN (Online) 2278-1021, ISSN (Print) 2319-5940. International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4, Issue 4, April 2015. Schwa Deletion: Investigating Improved Approach for Text-to-IPA System for Shiri Guru Granth Sahib. Sandeep Kaur (1), Dr. Amitoj Singh (2). (1) Pursuing M.E, CSE, Chitkara University, India; (2) Associate Director, CSE, Chitkara University, India. Abstract: Punjabi (Omniglot) is an interesting language for more than one reason. It is the only living Indo-European language which is a fully tonal language. Punjabi is written in an abugida writing system, with each consonant having an inherent vowel, the schwa sound. This sound is modifiable using vowel symbols attached to the consonant bearing the vowel. Shri Guru Granth Sahib is a voluminous text of 1430 pages with 511,874 words, 1,720,345 characters, and 28,534 lines, and contains hymns of 36 composers written in twenty-two languages in Gurmukhi script (Lal). In addition to the text being in the form of hymns and coming from so many composers belonging to different languages, the language of Shri Guru Granth Sahib differs from contemporary Punjabi. The task of developing an accurate Letter-to-Sound system is made difficult for two further reasons: 1. Punjabi being the only tonal language; 2. the historical and cultural circumstances of the writings, in terms of the historical and religious nature of the text and the use of words from multiple languages and non-native phonemes. The handling of schwa deletion is of great concern for the development of an accurate, near-perfect system; the presented work intends to report the state of the art in schwa deletion for Indian languages in general and for Gurmukhi Punjabi in particular.
  • LAST FIRST EXP Updated As of 8/10/19 Abano Lu 3/1/2020 Abuhadba Iz 1/28/2022 If Athlete's Name Is Not on List Acevedo Jr
    LAST FIRST EXP Updated as of 8/10/19 Abano Lu 3/1/2020 Abuhadba Iz 1/28/2022 If athlete's name is not on list Acevedo Jr. Ma 2/27/2020 they will need a medical packet Adams Br 1/17/2021 completed before they can Aguilar Br 12/6/2020 participate in any event. Aguilar-Soto Al 8/7/2020 Alka Ja 9/27/2021 Allgire Ra 6/20/2022 Almeida Br 12/27/2021 Amason Ba 5/19/2022 Amy De 11/8/2019 Anderson Ca 4/17/2021 Anderson Mi 5/1/2021 Ardizone Ga 7/16/2021 Arellano Da 2/8/2021 Arevalo Ju 12/2/2020 Argueta-Reyes Al 3/19/2022 Arnett Be 9/4/2021 Autry Ja 6/24/2021 Badeaux Ra 7/9/2021 Balinski Lu 12/10/2020 Barham Ev 12/6/2019 Barnes Ca 7/16/2020 Battle Is 9/10/2021 Bergen Co 10/11/2021 Bermudez Da 10/16/2020 Biggs Al 2/28/2020 Blanchard-Perez Ke 12/4/2020 Bland Ma 6/3/2020 Blethen An 2/1/2021 Blood Na 11/7/2020 Blue Am 10/10/2021 Bontempo Lo 2/12/2021 Bowman Sk 2/26/2022 Boyd Ka 5/9/2021 Boyd Ty 11/29/2021 Boyzo Mi 8/8/2020 Brach Sa 3/7/2021 Brassard Ce 9/24/2021 Braunstein Ja 10/24/2021 Bright Ca 9/3/2021 Brookins Tr 3/4/2022 Brooks Ju 1/24/2020 Brooks Fa 9/23/2021 Brooks Mc 8/8/2022 Brown Lu 11/25/2021 Browne Em 10/9/2020 Brunson Jo 7/16/2021 Buchanan Tr 6/11/2020 Bullerdick Mi 8/2/2021 Bumpus Ha 1/31/2021 LAST FIRST EXP Updated as of 8/10/19 Burch Co 11/7/2020 Burch Ma 9/9/2021 Butler Ga 5/14/2022 Byers Je 6/14/2021 Cain Me 6/20/2021 Cao Tr 11/19/2020 Carlson Be 5/29/2021 Cerda Da 3/9/2021 Ceruto Ri 2/14/2022 Chang Ia 2/19/2021 Channapati Di 10/31/2021 Chao Et 8/20/2021 Chase Em 8/26/2020 Chavez Fr 6/13/2020 Chavez Vi 11/14/2021 Chidambaram Ga 10/13/2019
  • 2020 Language Need and Interpreter Use Study, As Required Under Government Code Section 68563
    JUDICIAL COUNCIL OF CALIFORNIA, 455 Golden Gate Avenue, San Francisco, CA 94102-3688. Tel 415-865-4200, TDD 415-865-4272, Fax 415-865-4205, www.courts.ca.gov. May 15, 2020. Hon. Gavin Newsom, Governor of California, State Capitol, First Floor, Sacramento, California 95814. Re: 2020 Language Need and Interpreter Use Study, as required under Government Code section 68563. Dear Governor Newsom: Attached is the Judicial Council report required under Government Code section 68563, which requires the Judicial Council to conduct a study every five years on language need and interpreter use in the California trial courts. The study was conducted by the Judicial Council’s Language Access Services and covers the period from fiscal years 2014–15 through 2017–18. If you have any questions related to this report, please contact Mr. Douglas Denton, Principal Manager, Language Access Services, at 415-865-7870 or [email protected]. Sincerely, Letterhead: HON. TANI G. CANTIL-SAKAUYE, Chief Justice of California, Chair of the Judicial Council; HON. MARSHA G. SLOUGH, Chair, Executive and Planning Committee; HON. DAVID M. RUBIN, Chair, Judicial Branch Budget Committee, Chair, Litigation Management Committee; HON. MARLA O. ANDERSON, Chair, Legislation Committee; HON. HARRY E. HULL, JR., Chair, Rules Committee; HON. KYLE S. BRODIE, Chair, Technology Committee; Hon. Richard Bloom; Hon. C. Todd Bottke; Hon. Stacy Boulware Eurie; Hon. Ming W. Chin; Hon. Jonathan B. Conklin; Hon. Samuel K. Feng; Hon. Brad R. Hill; Ms. Rachel W. Hill; Hon. Harold W. Hopp; Hon. Hannah-Beth Jackson; Mr. Patrick M. Kelly; Hon. Dalila C. Lyons; Ms. Gretchen Nelson; Mr.
  • The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes
    Portland State University, PDXScholar. Mathematics and Statistics Faculty Publications and Presentations, Fariborz Maseeh Department of Mathematics and Statistics. 3-2018. The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes. Xu Hu Sun, University of Macau; Christine Chambris, Université de Cergy-Pontoise; Judy Sayers, Stockholm University; Man Keung Siu, University of Hong Kong; Jason Cooper, Weizmann Institute of Science. See next page for additional authors. Follow this and additional works at: https://pdxscholar.library.pdx.edu/mth_fac. Part of the Science and Mathematics Education Commons. Let us know how access to this document benefits you. Citation Details: Sun X.H. et al. (2018) The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes. In: Bartolini Bussi M., Sun X. (eds) Building the Foundation: Whole Numbers in the Primary Grades. New ICMI Study Series. Springer, Cham. This Book Chapter is brought to you for free and open access. It has been accepted for inclusion in Mathematics and Statistics Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected].
    Authors: Xu Hu Sun, Christine Chambris, Judy Sayers, Man Keung Siu, Jason Cooper, Jean-Luc Dorier, Sarah Inés González de Lora Sued, Eva Thanheiser, Nadia Azrou, Lynn McGarvey, Catherine Houdement, and Lisser Rye Ejersbo. This book chapter is available at PDXScholar: https://pdxscholar.library.pdx.edu/mth_fac/253. Chapter 5. The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes. Xu Hua Sun, Christine Chambris, Judy Sayers, Man Keung Siu, Jason Cooper, Jean-Luc Dorier, Sarah Inés González de Lora Sued, Eva Thanheiser, Nadia Azrou, Lynn McGarvey, Catherine Houdement, and Lisser Rye Ejersbo. 5.1 Introduction. Mathematics learning and teaching are deeply embedded in history, language and culture (e.g.
  • Tai Lü / ᦺᦑᦟᦹᧉ Tai Lùe Romanization: KNAB 2012
    Institute of the Estonian Language KNAB: Place Names Database 2012-10-11 Tai Lü / ᦺᦑᦟᦹᧉ Tai Lùe romanization: KNAB 2012 I. Consonant characters 1 ᦀ ’a 13 ᦌ sa 25 ᦘ pha 37 ᦤ da A 2 ᦁ a 14 ᦍ ya 26 ᦙ ma 38 ᦥ ba A 3 ᦂ k’a 15 ᦎ t’a 27 ᦚ f’a 39 ᦦ kw’a 4 ᦃ kh’a 16 ᦏ th’a 28 ᦛ v’a 40 ᦧ khw’a 5 ᦄ ng’a 17 ᦐ n’a 29 ᦜ l’a 41 ᦨ kwa 6 ᦅ ka 18 ᦑ ta 30 ᦝ fa 42 ᦩ khwa A 7 ᦆ kha 19 ᦒ tha 31 ᦞ va 43 ᦪ sw’a A A 8 ᦇ nga 20 ᦓ na 32 ᦟ la 44 ᦫ swa 9 ᦈ ts’a 21 ᦔ p’a 33 ᦠ h’a 45 ᧞ lae A 10 ᦉ s’a 22 ᦕ ph’a 34 ᦡ d’a 46 ᧟ laew A 11 ᦊ y’a 23 ᦖ m’a 35 ᦢ b’a 12 ᦋ tsa 24 ᦗ pa 36 ᦣ ha A Syllable-final forms of these characters: ᧅ -k, ᧂ -ng, ᧃ -n, ᧄ -m, ᧁ -u, ᧆ -d, ᧇ -b. See also Note D to Table II. II. Vowel characters (ᦀ stands for any consonant character) C 1 ᦀ a 6 ᦀᦴ u 11 ᦀᦹ ue 16 ᦀᦽ oi A 2 ᦰ ( ) 7 ᦵᦀ e 12 ᦵᦀᦲ oe 17 ᦀᦾ awy 3 ᦀᦱ aa 8 ᦶᦀ ae 13 ᦺᦀ ai 18 ᦀᦿ uei 4 ᦀᦲ i 9 ᦷᦀ o 14 ᦀᦻ aai 19 ᦀᧀ oei B D 5 ᦀᦳ ŭ,u 10 ᦀᦸ aw 15 ᦀᦼ ui A Indicates vowel shortness in the following cases: ᦀᦲᦰ ĭ [i], ᦵᦀᦰ ĕ [e], ᦶᦀᦰ ăe [ ∎ ], ᦷᦀᦰ ŏ [o], ᦀᦸᦰ ăw [ ], ᦀᦹᦰ ŭe [ ɯ ], ᦵᦀᦲᦰ ŏe [ ].
  • Shahmukhi to Gurmukhi Transliteration System: a Corpus Based Approach
    Shahmukhi to Gurmukhi Transliteration System: A Corpus based Approach. Tejinder Singh Saini (1) and Gurpreet Singh Lehal (2). (1) Advanced Centre for Technical Development of Punjabi Language, Literature & Culture, Punjabi University, Patiala 147 002, Punjab, India, [email protected], http://www.advancedcentrepunjabi.org. (2) Department of Computer Science, Punjabi University, Patiala 147 002, Punjab, India, [email protected]. Abstract. This research paper describes a corpus-based transliteration system for the Punjabi language. The existence of two scripts for Punjabi has created a script barrier between the Punjabi literature written in India and in Pakistan. This research project has developed the first system of its kind for the Shahmukhi script of the Punjabi language. The proposed system for Shahmukhi to Gurmukhi transliteration has been implemented with various research techniques based on language corpora. The corpus analysis program has been run on both Shahmukhi and Gurmukhi corpora to generate statistical data such as character, word and n-gram frequencies. This statistical analysis is used in different phases of transliteration. Potentially, all members of the substantial Punjabi community will benefit vastly from this transliteration system. 1 Introduction. One of the great challenges before Information Technology is to overcome the language barriers dividing mankind so that everyone can communicate with everyone else on the planet in real time. South Asia is one of those unique parts of the world where a single language is written in different scripts. This is the case, for example, with the Punjabi language, spoken by tens of millions of people but written in Indian East Punjab (20 million) in the Gurmukhi script (a left-to-right script based on Devanagari), in Pakistani West Punjab (80 million) in the Shahmukhi script (a right-to-left script based on Arabic), and by a growing number of Punjabis (2 million) in the EU and the US in the Roman script.
  • Indic Loanwords In Tocharian B, Local Markedness, And The Animacy
    Indic Loanwords in Tocharian B, Local Markedness, and the Animacy Hierarchy. Francesco Burroni and Michael Weiss (Department of Linguistics, Cornell University). A question that is rarely addressed in the literature devoted to Language Contact is: how are nominal forms borrowed when the donor and the recipient language both possess rich inflectional morphology? Can nominal forms be borrowed from and in different cases? What are the decisive factors shaping the borrowing scenario? In this paper, we frame this question from the angle of a case study involving two ancient Indo-European languages: Tocharian and Indic (Sanskrit, Prakrit(s)). Most studies dedicated to the topic of loanwords in Tocharian B (henceforth TB) have focused on borrowings from Iranian (e.g. Tremblay 2005), but little attention has been so far devoted to forms borrowed from Indic, perhaps because they are considered uninteresting. We argue that such forms, however, are of interest for the study of Language Contact. A remarkable feature of Indic borrowings into TB is that a-stems are borrowed in TB as e-stems when denoting animate referents, but as consonant (C-)stems when denoting inanimate referents, a distribution that was noticed long ago by Mironov (1928, following Staël-Holstein 1910:117 on Uyghur). In the literature, however, one finds no reaction to Mironov’s idea. By means of a systematic study of all the a-stems borrowed from Indic into TB, we argue that the trait [+/- animate] of the referent is, in fact, a very good predictor of the TB shape of the borrowing, e.g. male personal names from Skt.
  • A Comparative Study of Shan and Standard Thai Morphology
    A COMPARATIVE STUDY OF SHAN AND STANDARD THAI MORPHOLOGY. Kittisara. A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of Master of Arts (Linguistics), Graduate School, Mahachulalongkornrajavidyalaya University, C.E. 2018. (Copyright by Mahachulalongkornrajavidyalaya University) Thesis Title: A Comparative Study of Shan and Standard Thai Morphology. Researcher: Kittisara. Degree: Master of Arts in Linguistics. Thesis Supervisory Committee: Assoc. Prof. Nilratana Klinchan, B.A. (English), M.A. (Political Science); Asst. Prof. Dr. Phramaha Suriya Varamedhi, B.A. (Philosophy), M.A. (Linguistics), Ph.D. (Linguistics). Date of Graduation: March 19, 2019. Abstract: The purpose of this research is to compare Shan and standard Thai morphology. The study has three objectives: (1) to study the morphemes of Shan and standard Thai, (2) to study word formation in Shan and standard Thai, and (3) to compare the morphemes and word classes of Shan and standard Thai. This is qualitative research. The informants were six Shan people born at Tachileik in Shan State, Union of Myanmar, which is also the research area. The research tools were interviews and documentary research. The main part of the study is a content analysis of primary sources selected from books, academic books, a Shan dictionary, a Thai dictionary, library and online research, together with data from the six native-speaker informants.
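The "Ordering rules for Khmer" entry in the list quotes the orthographic syllable grammar Khmer-syllable ::= (K H)* K M*. A minimal Python sketch of that grammar as a regular expression is given here for illustration. The code point ranges used for K, H and M are assumptions read off the Unicode Khmer block, not taken from the excerpt, and the names SYLLABLE and split_syllables are hypothetical.

```python
import re

# Assumed code point ranges (Unicode Khmer block), not stated in the excerpt:
K = "[\u1780-\u17B3]"        # consonants (U+1780..U+17A2) and independent vowels (U+17A3..U+17B3)
H = "\u17D2"                 # COENG, the invisible conjoint former
M = "[\u17B6-\u17D3\u17DD]"  # dependent vowels and signs; this range includes COENG itself

# Khmer-syllable ::= (K H)* K M*
SYLLABLE = re.compile(f"(?:{K}{H})*{K}{M}*")


def split_syllables(text: str) -> list[str]:
    """Return the orthographic syllables found in text, in order.

    Characters that do not fit the grammar (spaces, digits, stray marks
    at the start of the text) are skipped rather than reported.
    """
    return SYLLABLE.findall(text)


if __name__ == "__main__":
    # KHA, COENG, MO, vowel sign AE, RO: the word "Khmer" in Khmer script
    word = "\u1781\u17D2\u1798\u17C2\u179A"
    for syllable in split_syllables(word):
        print([f"U+{ord(ch):04X}" for ch in syllable])
```

Because M is allowed to contain COENG, a trailing consonant plus COENG with nothing after it still matches as a single syllable, which mirrors the remark in the excerpt that such a sequence would be a misspelling rather than a parse failure.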