Recognition of Online Handwritten Gurmukhi Strokes Using Convolutional Neural Networks

Total Page:16

File Type:pdf, Size:1020Kb

Recognition of Online Handwritten Gurmukhi Strokes Using Convolutional Neural Networks Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks Rishabh Budhouliya1, Rajendra Kumar Sharma1 and Harjeet Singh2 1Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Punjab, India 2Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India frbudhouliya bemba15, [email protected], [email protected] Keywords: Convolutional Neural Networks, Data Augmentation, Stroke Warping, Gurmukhi Strokes, Online Handwritten Character Recognition. Abstract: In this paper, we attempt to explore and experiment multiple variations of Convolutional Neural Networks on the basis of their distributions of trainable parameters between convolution and fully connected layers, so as to achieve a state-of-the-art recognition accuracy on a primary dataset which contains isolated stroke samples of Gurmukhi script characters produced by 190 native writers. Furthermore, we investigate the benefit of data augmentation with synthetically generated samples using an approach called stroke warping on the aforemen- tioned dataset with three variants of a Convolutional Neural Network classifier. It has been found that this approach improves classification performance and reduces over-fitting. We extend this finding by suggesting that stroke warping helps in estimating the inherent variances induced in the original data distribution due to different writing styles and thus, increases the generalisation capacity of the classifier. 1 INTRODUCTION the strokes to achieve a state-of-the-art accuracy. • Explore and evaluate multiple CNN variants on Research in Online Handwritten Recognition Sys- the basis of their distribution of trainable param- tems seems to have peaked since the late 1990s, con- eters between convolution layer and fully con- 0 sidering the fact that Google s Multilingual Online nected layer on datasets with varying data samples Handwritten Recognition System is able to support per class. 22 scripts and 97 languages, which is being currently used by certain commercial products like Google • Study the effect of stroke warping, a technique to Translate successfully (Keysers et al., 2017). Despite produce random variances within the stroke sam- this success in Google0s system, we want to bring cer- ples. The warped augmented dataset is created by tain factors into light which are limited to the scope of applying a combination of affine transformation Indo-Aryan Languages - (rotation) and elastic distortions to images of the existing stroke samples. • Each written language has a strong variability as- sociated with the writing style depending upon • Experimentally evaluate the effect of data aug- certain demographic factors like region, age and mentation on classification accuracy of the clas- culture. sifier. • In the case of such languages, major issues tackled This paper is structured as follows. In section II, there by any researcher include the shape complexity of is a discussion about the development of character the characters, features for recognition of charac- recognition and the progression in Gurmukhi script ters and stroke level recognition. recognition. In section III, we introduce the Gur- mukhi script, covering its features and the different Motivated by these factors, our work aims to: styles of writing the script. The generation of syn- • Use a primary dataset called OHWR-Gurmukhi, thetic data is covered in section IV, where we use described in section IV, to sub sample a dataset the technique called stroke warping, to augment the which includes 79 stroke classes with 100 sam- dataset 100. After that, we delve into our main idea ples each, referred as dataset 100. A Convolu- in section V, an experiment which tests three classi- tional Neural Network has been used to classify fiers to validate the benefit of data augmentation and to achieve an optimal classification accuracy. In sec- nition was built to detect and aggregate strokes us- tion VI, the framework of the experiment is laid out ing set theory to recognize characters with an accu- and results of the performed experiments are shown. racy of 95.60% for single character stroke sequencing We discuss the result and their consequences on the (Kumar and Sharma, 2013). This work was done on objectives of the paper in section VII and we conclude a dataset of 27,231 samples categorized on the ba- the findings in section VIII. sis of the proficiency of the writers. A significant shift came after Hidden Markov Models(HMM) and Support Vector Machines(SVM) were used for clas- 2 RELATED WORK sification while employing the features extracted on the basis of region and cursiveness. This experiment resulted in a 96.70% recognition rate of Gurmukhi LeNet was the very first Convolutional Neural Net- characters (Verma and Sharma, 2016). Their experi- work used for visual detection tasks including char- ments consisted of the methods to extract features and acter recognition and document analysis (Jackel et al., then classify them using SVM or HMM for classifica- 1995). This neural network was used to extract local tion. Our aim in this work is to use the Deep Learn- geometric features from the input image in a way that ing concept of learning features and then performing preserved approximate relative locations of these fea- extensive experimentation using CNNs to obtain bet- tures. By convolving the input image with a trainable ter recognition accuracy. One of the main advantage kernel, the network was able to produce high level of using a CNN is that it is able to extract features feature maps which were then fed to a linear classifi- automatically and is invariant to shift and distortion cation layer. For the system they created, an overall (Wong et al., 2016). OCR accuracy exceeding 99.00% was achieved. Owing to the continuous academic research in the Online Handwritten Chinese Character Recogni- tion, it has been demonstrated (Xiao et al., 2017) that 3 GURMUKHI SCRIPT methods based on CNNs can learn more discrimi- native features from source data, which may lead to Punjabi language is spoken by about 130 million peo- a better end-to-end solution for Online Handwritten ple, mainly in West Punjab in Pakistan and in East Recognition problems. The authors of this paper con- Punjab in India. Indian Punjabi is written using the tinued to design a compact CNN classifier for On- Gurmukhi script, which has a fairly complex system line Handwritten Chinese Character Recognition us- of tonal variance. ing DropWeight for pruning redundant connections Some notable features of Gurmukhi script are : in a CNN architecture maintaining an accuracy of • Gurmukhi script is cursive and written in left to 96.88%. Handwritten Bangla Digit Recognition us- right direction with top down approach. ing a CNN with Gaussian and Gabor Filters, achieved • A horizontal line, called a “shirorekha” is found 98.78% recognition accuracy (Alom et al., 2017). on the upper part of almost all the characters. Along with working on improving the recognition accuracy of the classifiers, researchers also worked • Any Gurmukhi word can be divided into three upon writer adaptation for online handwritten recog- sections viz. Upper Zone, Middle Zone and the nition where they used lexemes to identify the styles Lower Zone. All the strokes are classified into present in a particular writer’s sample data which re- one of the three zone. The upper zone consists of sulted in the reduction of average error rate on hand- the region above the head line where some of the written words (Connell and Jain, 2002). vowels reside. The middle zone is the most popu- In the present paper, we demonstrate a method lated zone, consisting of consonants and some of to capture the variability induced by different writ- the vowels. The lower zone contains some vow- ing styles, thus enhancing the generalization accu- els and half characters that lie below the foot of racy of the classifier. It is pertinent here to discuss consonants (Verma and Sharma, 2017). the progression in Gurmukhi script recognition. A recognizer using pre-processing algorithms (Normal- 3.1 Different Styles of Writing ization, Interpolation and Slant Correction) has been Gurmukhi Script proposed to recognize loops, headline, straight line and dot features from online handwritten Gurmukhi The critical area of research in Character recognition strokes collected on a pen-tablet interface by 60 writ- is to capture and detect the complex nature of any ers (Sharma et al., 2007). After this, a post proces- script that results in the variation of writing styles. sor for improving the accuracy of character recog- The documented reasons for variation in handwriting Figure 1: Stroke classes for Gurmukhi character set. style can be attributed to a distinct way of writing for each person, and the ways can be: Figure 2: The distribution of dataset 100 into training and • Speed of writing test set. • Style of holding the pen validation set was created due to the small size of the • Formation of a character can be influenced by dataset 100. the amount of strokes used to create a character. An important question to be answered is why does Some users use a single stroke while others may an increase in data size by augmentation of sample use multiple strokes to write the same thing. images increases the generalization accuracy? Also, if there is an increase, could it be attributed to the fact 3.2 Data Collection that stroke warping is able to capture the variability in the empirical population due to the writer’s personal The source population of our dataset is created by 190 characteristics as discussed above. To answer these writers of different age group to bring maximum vari- questions, we have set-up an experiment where the ability in the population. A touch based, Tablet PC correlation between change in nature of the dataset has been used as input interface. The collected data due to addition of synthetic data samples and increase was annotated at stroke level with respective stroke in recognition accuracy can be seen and verified by classes characterized by three zones; Upper, Middle comparing performances of three different CNN clas- and the Lower zone as shown in Figure 1.
Recommended publications
  • The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes
    Portland State University PDXScholar Mathematics and Statistics Faculty Fariborz Maseeh Department of Mathematics Publications and Presentations and Statistics 3-2018 The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes Xu Hu Sun University of Macau Christine Chambris Université de Cergy-Pontoise Judy Sayers Stockholm University Man Keung Siu University of Hong Kong Jason Cooper Weizmann Institute of Science SeeFollow next this page and for additional additional works authors at: https:/ /pdxscholar.library.pdx.edu/mth_fac Part of the Science and Mathematics Education Commons Let us know how access to this document benefits ou.y Citation Details Sun X.H. et al. (2018) The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes. In: Bartolini Bussi M., Sun X. (eds) Building the Foundation: Whole Numbers in the Primary Grades. New ICMI Study Series. Springer, Cham This Book Chapter is brought to you for free and open access. It has been accepted for inclusion in Mathematics and Statistics Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Authors Xu Hu Sun, Christine Chambris, Judy Sayers, Man Keung Siu, Jason Cooper, Jean-Luc Dorier, Sarah Inés González de Lora Sued, Eva Thanheiser, Nadia Azrou, Lynn McGarvey, Catherine Houdement, and Lisser Rye Ejersbo This book chapter is available at PDXScholar: https://pdxscholar.library.pdx.edu/mth_fac/253 Chapter 5 The What and Why of Whole Number Arithmetic: Foundational Ideas from History, Language and Societal Changes Xu Hua Sun , Christine Chambris Judy Sayers, Man Keung Siu, Jason Cooper , Jean-Luc Dorier , Sarah Inés González de Lora Sued , Eva Thanheiser , Nadia Azrou , Lynn McGarvey , Catherine Houdement , and Lisser Rye Ejersbo 5.1 Introduction Mathematics learning and teaching are deeply embedded in history, language and culture (e.g.
    [Show full text]
  • Tai Lü / ᦺᦑᦟᦹᧉ Tai Lùe Romanization: KNAB 2012
    Institute of the Estonian Language KNAB: Place Names Database 2012-10-11 Tai Lü / ᦺᦑᦟᦹᧉ Tai Lùe romanization: KNAB 2012 I. Consonant characters 1 ᦀ ’a 13 ᦌ sa 25 ᦘ pha 37 ᦤ da A 2 ᦁ a 14 ᦍ ya 26 ᦙ ma 38 ᦥ ba A 3 ᦂ k’a 15 ᦎ t’a 27 ᦚ f’a 39 ᦦ kw’a 4 ᦃ kh’a 16 ᦏ th’a 28 ᦛ v’a 40 ᦧ khw’a 5 ᦄ ng’a 17 ᦐ n’a 29 ᦜ l’a 41 ᦨ kwa 6 ᦅ ka 18 ᦑ ta 30 ᦝ fa 42 ᦩ khwa A 7 ᦆ kha 19 ᦒ tha 31 ᦞ va 43 ᦪ sw’a A A 8 ᦇ nga 20 ᦓ na 32 ᦟ la 44 ᦫ swa 9 ᦈ ts’a 21 ᦔ p’a 33 ᦠ h’a 45 ᧞ lae A 10 ᦉ s’a 22 ᦕ ph’a 34 ᦡ d’a 46 ᧟ laew A 11 ᦊ y’a 23 ᦖ m’a 35 ᦢ b’a 12 ᦋ tsa 24 ᦗ pa 36 ᦣ ha A Syllable-final forms of these characters: ᧅ -k, ᧂ -ng, ᧃ -n, ᧄ -m, ᧁ -u, ᧆ -d, ᧇ -b. See also Note D to Table II. II. Vowel characters (ᦀ stands for any consonant character) C 1 ᦀ a 6 ᦀᦴ u 11 ᦀᦹ ue 16 ᦀᦽ oi A 2 ᦰ ( ) 7 ᦵᦀ e 12 ᦵᦀᦲ oe 17 ᦀᦾ awy 3 ᦀᦱ aa 8 ᦶᦀ ae 13 ᦺᦀ ai 18 ᦀᦿ uei 4 ᦀᦲ i 9 ᦷᦀ o 14 ᦀᦻ aai 19 ᦀᧀ oei B D 5 ᦀᦳ ŭ,u 10 ᦀᦸ aw 15 ᦀᦼ ui A Indicates vowel shortness in the following cases: ᦀᦲᦰ ĭ [i], ᦵᦀᦰ ĕ [e], ᦶᦀᦰ ăe [ ∎ ], ᦷᦀᦰ ŏ [o], ᦀᦸᦰ ăw [ ], ᦀᦹᦰ ŭe [ ɯ ], ᦵᦀᦲᦰ ŏe [ ].
    [Show full text]
  • The Kharoṣṭhī Documents from Niya and Their Contribution to Gāndhārī Studies
    The Kharoṣṭhī Documents from Niya and Their Contribution to Gāndhārī Studies Stefan Baums University of Munich [email protected] Niya Document 511 recto 1. viśu͚dha‐cakṣ̄u bhavati tathāgatānaṃ bhavatu prabhasvara hiterṣina viśu͚dha‐gātra sukhumāla jināna pūjā suchavi paramārtha‐darśana 4 ciraṃ ca āyu labhati anālpakaṃ 5. pratyeka‐budha ca karoṃti yo s̄ātravivegam āśṛta ganuktamasya 1 ekābhirāma giri‐kaṃtarālaya 2. na tasya gaṃḍa piṭakā svakartha‐yukta śamathe bhavaṃti gune rata śilipataṃ tatra vicārcikaṃ teṣaṃ pi pūjā bhavatu [v]ā svayaṃbhu[na] 4 1 suci sugaṃdha labhati sa āśraya 6. koḍinya‐gotra prathamana karoṃti yo s̄ātraśrāvaka {?} ganuktamasya 2 teṣaṃ ca yo āsi subha͚dra pac̄ima 3. viśāla‐netra bhavati etasmi abhyaṃdare ye prabhasvara atīta suvarna‐gātra abhirūpa jinorasa te pi bhavaṃtu darśani pujita 4 2 samaṃ ca pādo utarā7. imasmi dāna gana‐rāya prasaṃṭ́hita u͚tama karoṃti yo s̄ātra sthaira c̄a madhya navaka ganuktamasya 3 c̄a bhikṣ̄u m It might be going to far to say that Torwali is the direct lineal descendant of the Niya Prakrit, but there is no doubt that out of all the modern languages it shows the closest resemblance to it. [...] that area around Peshawar, where [...] there is most reason to believe was the original home of Niya Prakrit. That conclusion, which was reached for other reasons, is thus confirmed by the distribution of the modern dialects. (Burrow 1936) Under this name I propose to include those inscriptions of Aśoka which are recorded at Shahbazgaṛhi and Mansehra in the Kharoṣṭhī script, the vehicle for the remains of much of this dialect. To be included also are the following sources: the Buddhist literary text, the Dharmapada found in Khotan, written likewise in Kharoṣṭhī [...]; the Kharoṣṭhī documents on wood, leather, and silk from Caḍ́ota (the Niya site) on the border of the ancient kingdom of Khotan, which represented the official language of the capital Krorayina [...].
    [Show full text]
  • Devan¯Agar¯I for TEX Version 2.17.1
    Devanagar¯ ¯ı for TEX Version 2.17.1 Anshuman Pandey 6 March 2019 Contents 1 Introduction 2 2 Project Information 3 3 Producing Devan¯agar¯ıText with TEX 3 3.1 Macros and Font Definition Files . 3 3.2 Text Delimiters . 4 3.3 Example Input Files . 4 4 Input Encoding 4 4.1 Supplemental Notes . 4 5 The Preprocessor 5 5.1 Preprocessor Directives . 7 5.2 Protecting Text from Conversion . 9 5.3 Embedding Roman Text within Devan¯agar¯ıText . 9 5.4 Breaking Pre-Defined Conjuncts . 9 5.5 Supported LATEX Commands . 9 5.6 Using Custom LATEX Commands . 10 6 Devan¯agar¯ıFonts 10 6.1 Bombay-Style Fonts . 11 6.2 Calcutta-Style Fonts . 11 6.3 Nepali-Style Fonts . 11 6.4 Devan¯agar¯ıPen Fonts . 11 6.5 Default Devan¯agar¯ıFont (LATEX Only) . 12 6.6 PostScript Type 1 . 12 7 Special Topics 12 7.1 Delimiter Scope . 12 7.2 Line Spacing . 13 7.3 Hyphenation . 13 7.4 Captions and Date Formats (LATEX only) . 13 7.5 Customizing the date and captions (LATEX only) . 14 7.6 Using dvnAgrF in Sections and References (LATEX only) . 15 7.7 Devan¯agar¯ıand Arabic Numerals . 15 7.8 Devan¯agar¯ıPage Numbers and Other Counters (LATEX only) . 15 1 7.9 Category Codes . 16 8 Using Devan¯agar¯ıin X E LATEXand luaLATEX 16 8.1 Using Hindi with Polyglossia . 17 9 Using Hindi with babel 18 9.1 Installation . 18 9.2 Usage . 18 9.3 Language attributes . 19 9.3.1 Attribute modernhindi .
    [Show full text]
  • Numeration Systems Reminder : If a and M Are Whole Numbers Then Exponential Expression Amor a to the Mth Power Is M = ⋅⋅⋅⋅⋅ Defined by A Aaa
    MA 118 – Section 3.1: Numeration Systems Reminder : If a and m are whole numbers then exponential expression amor a to the mth power is m = ⋅⋅⋅⋅⋅ defined by a aaa... aa . m times Number a is the base ; m is the exponent or power . Example : 53 = Definition: A number is an abstract idea that indicates “how many”. A numeral is a symbol used to represent a number. A System of Numeration consists of a set of basic numerals and rules for combining them to represent numbers. 1 Section 3.1: Numeration Systems The timeline indicates recorded writing about 10,000 years ago between 4,000–3,000 B.C. The Ishango Bone provides the first evidence of human counting, dated 20,000 years ago. I. Tally Numeration System Example: Write the first 12 counting numbers with tally marks. In a tally system, there is a one-to-one correspondence between marks and items being counted. What are the advantages and drawbacks of a Tally Numeration System? 2 Section 3.1: Numeration Systems (continued) II. Egyptian Numeration System Staff Yoke Scroll Lotus Pointing Fish Amazed (Heel Blossom Finger (Polywog) (Astonished) Bone) Person Example : Page 159 1 b, 2b Example : Write 3248 as an Egyptian numeral. What are the advantages and drawbacks of the Egyptian System? 3 Section 3.1: Numeration Systems (continued) III. Roman Numeral System Roman Numerals I V X L C D M Indo-Arabic 1 5 10 50 100 500 1000 If a symbol for a lesser number is written to the left of a symbol for a greater number, then the lesser number is subtracted.
    [Show full text]
  • Course: M.A. Sanskrit Total Seats: 509 Seats Distribution
    Course: M.A. Sanskrit Total Seats: 509 Seats Distribution Exam Type General SC ST OBC EWS Total Entrance 116 38 19 69 13 255 Merit 117 38 19 68 12 254 Eligibility in Entrance Category Category Course Requirements Marks Requirements Id 1 B.A. (Hons.) Sanskrit from recognized Minimum 45% marks for General Category and 40.5% for OBC University. and admission of SC/ST/PwD category candidates will be done as per University rules or equivalent grade 2 B.A. (Pass)/B.A. (Programme) or BA (Hons.) 50% marks in aggregate for General category and 45% for OBC in Hindi of the University of Delhi or any & admission of SC/ST/PwD category candidates will be done as recognized university with at least two papers per University rules or equivalent grade of Sanskrit 3 Acharya/Shastri (Sanskrit) examination of 50% in aggregate for General Category and 45% for OBC and recognized University/Deemed University admission of SC/ST/PwD category candidates will be done as and Rashtriya Sanskrit Sansthan. per University rules or equivalent grade 5 B.A. (H) Examination in any subject of the Minimum 50% for General Category and 45% for OBC in Delhi University of an examination Hons. And minimum 60% for general 54% for OBC in Sanskrit recognized as equivalent there to with Subsidiary/Concurrent Subject & admission of SC/ST/PwD Sanskrit as subsidiary/concurrent subject. category candidates will be done as University rules or equivalent grade 6 M.A. examination in any arts and Humanities Minimum 50% for General & 45% for OBC marks in aggregate (except Sanskrit) from University of Delhi.
    [Show full text]
  • An Introduction to Indic Scripts
    An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set.
    [Show full text]
  • Origin of the Numerals Al-Biruni's Testimony
    Origin of the numerals Al-Biruni’s testimony Ahmed Boucenna Laboratoire DAC, Department of Physics, Faculty of Sciences, Ferhat Abbas University 19000 Stif, Algeria [email protected] [email protected] Abstract The origin of the numerals that we inherited from the arabo-Islamic civilization remained one enigma. The hypothesis of the Indian origin remained, with controversies, without serious rival. It was the dominant hypothesis since more of one century. Its partisans found to it and constructed a lot of arguments. The testimonies of the medieval authors have been interpreted to its advantage. The opposite opinions have been dismissed and ignored. An amalgam between the history of our modern numerals and the Indian mathematics history is made. Rational contradictions often passed under silence. A meticulous observation of the numerals permits to affirm that our numerals are in fact more or less modified Arabic letters. The "Ghubari" shape of the numerals shows that the symbol of a numeral corresponds to the Arabic letter whose numerical value is equal to this numeral. The numerals don't have a simple resemblance with some Arabic letters, but every number looks like the Arabic letter whose numerical value is equal to this numeral. The elements of the ''Abjadi'' calculation gives us a theoretical support, independent of the letters and numerals, witch explains our observation. Besides a re-lecture of the testimonies of the medieval authors, particularly the testimony of Al-Biruni, that is probably at the origin of all others testimonies speaking of the Indian origin of the numerals, is in agreement with the fact that our numerals are Arabic letters.
    [Show full text]
  • Tibetan Romanization Table
    Tibetan Comment [LRH1]: Transliteration revisions are highlighted below in light-gray or otherwise noted in a comment. ALA-LC romanization of Tibetan letters follows the principles of the Wylie transliteration system, as described by Turrell Wylie (1959). Diacritical marks are used for those letters representing Indic or other non-Tibetan languages, and parallel the use of these marks when transcribing their counterpart letters in Sanskrit. These are applied for the sake of consistency, and to reflect international publishing standards. Accordingly, romanize words of non-Tibetan origin systematically (following this table) in all cases, even though the word may derive from Sanskrit or another language. When Tibetan is written in another script (e.g., ʼPhags-pa) the corresponding letters in that script are also romanized according to this table. Consonants (see Notes 1-3) Vernacular Romanization Vernacular Romanization Vernacular Romanization ka da zha ཀ་ ད་ ཞ་ kha na za ཁ་ ན་ ཟ་ Comment [LH2]: While the current ALA-LC ga pa ’a table stipulates that an apostrophe should be used, ག་ པ་ འ་ this revision proposal recommends that the long- nga pha ya standing defacto LC practice of using an alif (U+02BC) be continued and explicitly stipulated in ང་ ཕ་ ཡ་ the Table. See accompanying Narrative for details. ca ba ra ཅ་ བ་ ར་ cha ma la ཆ་ མ་ ལ་ ja tsa sha ཇ་ ཙ་ ཤ་ nya tsha sa ཉ་ ཚ་ ས་ ta dza ha ཏ་ ཛ་ ཧ་ tha wa a ཐ་ ཝ་ ཨ་ Vowels and Diphthongs (see Notes 4 and 5) ཨི་ i ཨཱི་ ī རྀ་ r̥ ཨུ་ u ཨཱུ་ ū རཱྀ་ r̥̄ ཨེ་ e ཨཻ་ ai ལྀ་ ḷ ཨོ་ o ཨཽ་ au ལཱྀ ḹ ā ཨཱ་ Other Letters or Diacritical Marks Used in Words of Non-Tibetan Origin (see Notes 6 and 7) ṭa gha ḍha ཊ་ གྷ་ ཌྷ་ ṭha jha anusvāra ṃ Comment [LH3]: This letter combination does ཋ་ ཇྷ་ ◌ ཾ not occur in Tibetan texts, and has been deprecated from the Unicode Standard.
    [Show full text]
  • Transliteration of <Script Name>
    Transliteration of Bengali, Assamese & Manipuri 1/5 BENGALI, ASSAMESE & MANIPURI Script: Bengali* ISO 15919 UN ALA-LC 2001(1.0) 1977(2.0) 1997(3.0) Vowels অ ◌ a a a আ ◌া ā ā ā ই ি◌ i i i ঈ ◌ী ī ī ī উ ◌ু u u u ঊ ◌ূ ū ū ū ঋ ◌ৃ r̥ ṛ r̥ ৠ ◌ৄ r̥̄ — r̥̄ ঌ ◌ৢ l̥ — l̥ ৡ ◌ৣ A l̥ ̄ — — এ ে◌ e(1.1) e e ঐ ৈ◌ ai ai ai ও ে◌া o(1.1) o o ঔ ে◌ৗ au au au অ�া a:yā — — Nasalizations ◌ং anunāsika ṁ(1.2) ṁ ṃ ◌ঁ candrabindu m̐ (1.2) m̐ n̐, m̐ (3.1) Miscellaneous ◌ঃ bisarga ḥ ḥ ḥ ◌্ hasanta vowelless vowelless vowelless (1.3) ঽ abagraha Ⓑ :’ — ’ ৺ isshara — — — Consonants ক ka ka ka খ kha kha kha গ ga ga ga ঘ gha gha gha ঙ ṅa ṅa ṅa চ ca cha ca ছ cha chha cha জ ja ja ja ঝ jha jha jha ঞ ña ña ña ট ṭa ṭa ṭa ঠ ṭha ṭha ṭha ড ḍa ḍa ḍa ঢ ḍha ḍha ḍha ণ ṇa ṇa ṇa ত ta ta ta Thomas T. Pedersen – transliteration.eki.ee Rev. 2, 2005-07-21 Transliteration of Bengali, Assamese & Manipuri 2/5 ISO 15919 UN ALA-LC 2001(1.0) 1977(2.0) 1997(3.0) থ tha tha tha দ da da da ধ dha dha dha ন na na na প pa pa pa ফ pha pha pha ব ba ba ba(3.2) ভ bha bha bha ম ma ma ma য ya ja̱ ya র ⒷⓂ ra ra ra ৰ Ⓐ ra ra ra ল la la la ৱ ⒶⓂ va va wa শ śa sha śa ষ ṣa ṣha sha স sa sa sa হ ha ha ha ড় ṛa ṙa ṛa ঢ় ṛha ṙha ṛha য় ẏa ya ẏa জ় za — — ব় wa — — ক় Ⓑ qa — — খ় Ⓑ k̲h̲a — — গ় Ⓑ ġa — — ফ় Ⓑ fa — — Adscript consonants ◌� ya-phala -ya(1.4) -ya ẏa ৎ khanda-ta -t -t ṯa ◌� repha r- r- r- ◌� baphala -b -b -b ◌� raphala -r -r -r Vowel ligatures (conjuncts)C � gu � ru � rū � Ⓐ ru � rū � śu � hr̥ � hu � tru � trū � ntu � lgu Thomas T.
    [Show full text]
  • Brahminet-ITRANS Transliteration Scheme
    BrahmiNet-ITRANS Transliteration Scheme Anoop Kunchukuttan August 2020 The BrahmiNet-ITRANS notation (Kunchukuttan et al., 2015) provides a scheme for transcription of major Indian scripts in Roman, using ASCII characters. It extends the ITRANS1 transliteration scheme to cover characters not covered in the original scheme. Tables 1a shows the ITRANS mappings for vowels and diacritics (matras). Table 1b shows the ITRANS mappings for consonants. These tables also show the Unicode offset for each character. By Unicode offset, we mean the offset of the character in the Unicode range assigned to the script. For Indic scripts, logically equivalent characters are assigned the same offset in their respective Unicode codepoint ranges. For illustration, we also show the Devanagari characters corresponding to the transliteration. ka kha ga gha ∼Na, N^a ITRANS Unicode Offset Devanagari 15 16 17 18 19 a 05 अ क ख ग घ ङ aa, A 06, 3E आ, ◌ा cha Cha ja jha ∼na, JNa i 07, 3F इ, ि◌ 1A 1B 1C 1D 1E ii, I 08, 40 ई, ◌ी च छ ज झ ञ u 09, 41 उ, ◌ु Ta Tha Da Dha Na uu, U 0A, 42 ऊ, ◌ू 1F 20 21 22 23 RRi, R^i 0B, 43 ऋ, ◌ृ ट ठ ड ढ ण RRI, R^I 60, 44 ॠ, ◌ॄ ta tha da dha na LLi, L^i 0C, 62 ऌ, ◌ॢ 24 25 26 27 28 LLI, L^I 61, 63 ॡ, ◌ॣ त थ द ध न pa pha ba bha ma .e 0E, 46 ऎ, ◌ॆ 2A 2B 2C 2D 2E e 0F, 47 ए, ◌े प फ ब भ म ai 10,48 ऐ, ◌ै ya ra la va, wa .o 12, 4A ऒ, ◌ॊ 2F 30 32 35 o 13, 4B ओ, ◌ो य र ल व au 14, 4C औ, ◌ौ sha Sha sa ha aM 05 02, 02 अं 36 37 38 39 aH 05 03, 03 अः श ष स ह .m 02 ◌ं Ra lda, La zha .h 03 ◌ः 31 33 34 (a) Vowels ऱ ळ ऴ (b) Consonants Version: v1.0 (9 August 2020) NOTES: 1.
    [Show full text]
  • Prof. P. Bhaskar Reddy Sri Venkateswara University, Tirupati
    Component-I (A) – Personal details: Prof. P. Bhaskar Reddy Sri Venkateswara University, Tirupati. Prof. P. Bhaskar Reddy Sri Venkateswara University, Tirupati. & Dr. K. Muniratnam Director i/c, Epigraphy, ASI, Mysore Dr. Sayantani Pal Dept. of AIHC, University of Calcutta. Prof. P. Bhaskar Reddy Sri Venkateswara University, Tirupati. Component-I (B) – Description of module: Subject Name Indian Culture Paper Name Indian Epigraphy Module Name/Title Kharosthi Script Module Id IC / IEP / 15 Pre requisites Kharosthi Script – Characteristics – Origin – Objectives Different Theories – Distribution and its End Keywords E-text (Quadrant-I) : 1. Introduction Kharosthi was one of the major scripts of the Indian subcontinent in the early period. In the list of 64 scripts occurring in the Lalitavistara (3rd century CE), a text in Buddhist Hybrid Sanskrit, Kharosthi comes second after Brahmi. Thus both of them were considered to be two major scripts of the Indian subcontinent. Both Kharosthi and Brahmi are first encountered in the edicts of Asoka in the 3rd century BCE. 2. Discovery of the script and its Decipherment The script was first discovered on one side of a large number of coins bearing Greek legends on the other side from the north western part of the Indian subcontinent in the first quarter of the 19th century. Later in 1830 to 1834 two full inscriptions of the time of Kanishka bearing the same script were found at Manikiyala in Pakistan. After this discovery James Prinsep named the script as ‘Bactrian Pehelevi’ since it occurred on a number of so called ‘Bactrian’ coins. To James Prinsep the characters first looked similar to Pahlavi (Semitic) characters.
    [Show full text]