Overview of Unicode and Indian Scripts


CHAPTER 2: Overview of Unicode and Indian Scripts

Contents:
Introduction
History and Development of Human Languages
History and Development of Scripts
Character Representation in Computers
Brief History of Character Representation
ASCII (American Standard Code for Information Interchange)
Unicode
Principles of Unicode Standard
Observations of Mapping Table
Transliteration Algorithm
Conclusion
References

2.1 INTRODUCTION

The Internet is being populated with resources in many of the world's languages. Technology has enabled even the smallest community groups to work in their own language and literature. The Internet can deliver information about any domain, which library professionals can collect, organize and provide to users. Librarians should aim to make information easily accessible to the user community. The delivery of organized information services over networks, together with the application of IT, has brought about the concept of the digital library. With information available in different languages and scripts, the digital library demands multilingual access to information. Representing information in more than 7150 languages (1) in their respective scripts is a real challenge for the IT industry as well as for the library community. A system should be able to recognize different scripts and display them correctly. One of the major problems here is the diversity of human languages and scripts.

To some extent transliteration comes to the rescue, at least in helping the user understand what a document is about. In India, many people know at least two or three languages; this is especially common in South India, where people often know two or three South Indian languages but not their scripts. In such situations transliteration can cut across the script barrier to some extent.
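As a rough illustration of how transliteration between Indian scripts might be automated, the sketch below relies on the fact that the Unicode blocks for Devanagari (used for Hindi), Bengali, Telugu and Kannada each span 128 code points and place corresponding letters at largely parallel positions. This is an assumed, simplified approach and not necessarily the algorithm developed in this work. The names BLOCK_START, BLOCK_SIZE and transliterate are hypothetical. A practical system would also need script-specific rules for letters that have no counterpart in the target script.

```python
import unicodedata

# First code point of each script's Unicode block (each block is 128 code points wide).
BLOCK_START = {
    "devanagari": 0x0900,  # script used for Hindi
    "bengali": 0x0980,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
}
BLOCK_SIZE = 0x80


def transliterate(text: str, source: str, target: str) -> str:
    """Shift each character of the source block to the same offset in the target block.

    Characters outside the source block are copied unchanged. If the target
    block has no assigned character at the shifted position, the original
    character is kept (an assumption made for this sketch).
    """
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # unicodedata.name returns the default ("") for unassigned code points.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)
```

For example, transliterate("कमल", "devanagari", "kannada") returns "ಕಮಲ", because the three letters sit at the same offsets in both blocks. Text outside the source block, such as Latin letters or digits, passes through untouched.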
Also, one can read fields such as author name, publisher and place, which are pronounced the same way in any language. Libraries can hold documents in many languages and scripts, and should ensure that the user community has access to these documents irrespective of the language in which they are published. A translation service is one approach to solving the problem. Before that, however, users must be able to discover that a document of interest exists so that they can ask for a translation. To solve this problem the library would have to generate a catalogue entry for each document in all languages, which is a difficult task. The only ray of hope is transliteration algorithms and codification of data for controlled translation of catalogue entries to the extent possible. Transliteration and data codification of the catalogue may give the user an idea of the content of the document.

In computers, character representation has mostly been done with ASCII (American Standard Code for Information Interchange). ASCII is a 7-bit character encoding system, so it can represent only 128 characters. With the development of the Internet, multilingual communication has grown to a great extent, and this has shown that ASCII is inadequate for handling multilingual data. When these problems were realized, a consortium of Xerox, IBM and others was formed to develop a character representation code that could handle all the scripts of the world. The outcome of these efforts was the Unicode standard, which promises to handle all the scripts of the world (2).

There are two parts to multilingual communication. The first is the spoken part, which we understand as language. In general, language is understood to be gesticulation or phonetic representation of one's feelings. The second part is the recording of such representation for future use or communication, which is done by making permanent or temporary marks on a surface; this is known as script.
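The 7-bit limit of ASCII mentioned above can be made concrete with a small sketch. It checks whether a string fits within the 128 ASCII code points and lists the Unicode code point of each character together with its length in bytes under UTF-8. The choice of UTF-8 is an assumption made for illustration, since the text does not name a particular Unicode encoding form, and the function names fits_in_ascii and describe are hypothetical.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128 (7 bits)."""
    return all(ord(ch) < 128 for ch in text)


def describe(text: str) -> list[tuple[str, int]]:
    """Pair each character's code point (as U+XXXX) with its UTF-8 byte length."""
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    for sample in ("Library", "ಕನ್ನಡ"):
        print(sample, fits_in_ascii(sample), describe(sample))
```

A Latin-script word such as "Library" fits in ASCII, while a word in Kannada script does not, which is why a catalogue that mixes Indian scripts needs a wider character set such as Unicode.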
Scripts are thus signs or symbols that carry meaning. It is evident that language was born first and scripts came later. One of the objectives of the present work is to transliterate data across Indian languages. For practical purposes and for demonstration, Hindi, Bengali, Telugu and Kannada are taken into consideration. The present work does not attempt to transliterate data from English to Indian language scripts or vice versa.

2.2 HISTORY AND DEVELOPMENT OF HUMAN LANGUAGES

Language began with the genesis of life. The very way of expressing joy and sorrow is itself a form of language that nature has given to humans. Animals use different forms of language, such as body language, contact language or gesture language, to express their feelings (3). According to Webster's Dictionary, a language is (4) "a systematic means of communicating ideas or feelings by the use of conventionalized signs, sounds, gestures, or marks having understood meanings". That means a language is not necessarily a vocal phenomenon; it also includes gestures, visual signs and so on. A very good definition of language is given on the Indiana University website. It describes language in different senses (5):

> For a person, language exists in his brain.
> For a pair or a number of persons, language can only be described in terms of the interaction between a speaker (writer, signer) and a hearer (reader, sign observer).
> For a community, language is a system that is shared by the members of a community, evolving over time as the community evolves.

A number of research projects trace the cultural evolution and divergence of humans through linguistic development. It is an established fact that mankind originated in Africa, which also implies that the first ever language, whatever it may have been, was born in Africa.
The question arises: 'Why are there so many languages in the world?' The mythological story of the 'Tower of Babel' tells how God felt envious of Man's collaborative endeavor and 'blessed' him with different languages so that people could not communicate among themselves and could not carry out further collaborative work that might one day make them equal to God (6). Though the story is not literally true, it does suggest that geographical separation brought about linguistic divergence. This separation happened because of Man's migration in search of food and shelter.

Besides geographical diversity as the prime cause of language divergence, small variations are found in language usage among people of the same community. Each of us commands a definite sphere of words that keeps growing throughout life, but in actual usage we do not use all the words in our lingual sphere, which gives rise to a style, often noticeable in renowned authors. Not only are we often identified by our style of writing; a person also often makes the same kind of mistake while talking. This could be because of ignorance or an inability to pronounce a word. In India there is a caste called AHIR (used for North Indian Yadavas), but the actual pronunciation is ABHIR, meaning a fearless man. It is evident that ABHIR became AHIR in due course of time because of erroneous pronunciation and gradual adoption by the community. Such influence of one person on others within a region brings about a regional dialect. Similarly, in due course of time words are softened, new idioms and pronunciations come into use, and thus the language of the community goes through a kind of internal divergence.

Besides this, the impact of other communities also plays a part in lingual divergence, which is known as external divergence. External divergence basically takes place when two or more communities come into contact.
This contact could happen because of cross-communal trade, marriages, and so on. A typical example can be found in the Dravidian language family. This family has borrowed a lot of words from Sanskrit (also known as Samskrit), which belongs to the Indo-Aryan language family. The generation of different forms of Prakrits like Hindi, Bengali, Oriya, and so on, could have taken place because of internal divergence, but the effect of external divergence cannot be ruled out.

According to one study, a total of 7150 spoken languages exist, of which 110 form broad categories (1). The following table gives an idea of the 10 most spoken languages in the world.

Rank. Language (population in millions): Countries

1. Chinese, Mandarin (885): Brunei, Cambodia, China, Indonesia, Malaysia, Mongolia, Philippines, Singapore, S. Africa, Taiwan, Thailand

2. Spanish (332): Algeria, Andorra, Argentina, Belize, Benin, Bolivia, Cambodia, Chad, Chile, Colombia, Costa Rica, Cuba, Dominican Rep., Ecuador, El Salvador, Eq. Guinea, Guatemala, Honduras, Ivory Coast, Laos, Madagascar, Mali, Mexico, Morocco, Nicaragua, Niger, Panama, Paraguay, Peru, Spain, Togo, Tunisia, Uruguay, U.S., Venezuela, Vietnam

3. English (322): Australia, Botswana, Brunei, Cameroon, Canada, Eritrea, Ethiopia, Fiji, The Gambia, Guyana, India, Ireland, Israel, Lesotho, Liberia, Malaysia, Micronesia, Namibia, Nauru, New Zealand, Palau, Papua New Guinea, Samoa, Seychelles, Sierra Leone, Singapore, Solomon Islands, Somalia, S. Africa, Suriname, Swaziland, Tonga, U.K., U.S., Vanuatu, Zimbabwe, many Caribbean states

4. Arabic (235): Egypt, Sudan, Algeria, Morocco, Tunisia, Libya, Saudi Arabia, Syria, Jordan, Yemen, UAE, Oman, Iraq, Lebanon

5. Bengali (189): Bangladesh, India, Singapore

6. Hindi: India,