A Gurmukhi to Shahmukhi Transliteration System

Gurpreet Singh Lehal
Department of Computer Science, Punjabi University, Patiala, India-147002


Proceedings of ICON-2009: 7th International Conference on Natural Language Processing, Macmillan Publishers, India. Also accessible from http://ltrc.iiit.ac.in/proceedings/ICON-2009

Abstract

Transliteration is a process wherein an input string in some alphabet is converted to a string in another alphabet, usually based on the phonetics of the original word. It is useful in machine translation (to create possible equivalents of unknown words), in cross-lingual information retrieval systems and in the development of multilingual resources. Punjabi is one of the few languages written in more than one script. In India, Punjabi is written in Gurmukhi script, while in Pakistan it is written in Shahmukhi (Urdu) script. This has created a script wedge, as the majority of Punjabi-speaking people in Pakistan cannot read Gurmukhi script, and similarly the majority of Punjabi-speaking people in India cannot comprehend Shahmukhi script. In this paper, we present a high accuracy Gurmukhi to Shahmukhi transliteration system. We have tried to overcome the shortcomings of the existing Gurmukhi to Shahmukhi transliteration systems and developed a system which can transliterate any Gurmukhi text to Shahmukhi at more than 98.6% accuracy at word level.

Introduction

Punjabi is one of the most widely spoken languages in the Indian subcontinent, with more than 100 million speakers in India and Pakistan. A unique feature of Punjabi is that it is written in two mutually incomprehensible scripts. In India the Punjabi language is written in Gurmukhi script, while in Pakistan it is written in Shahmukhi (Urdu) script. This has created a script wedge, as the majority of Punjabi-speaking people in Pakistan cannot read Gurmukhi script, and similarly the majority of Punjabi-speaking people in India cannot comprehend Shahmukhi script. To break this script barrier, we have developed a transliteration system for transliterating Punjabi text written in Gurmukhi script to Shahmukhi.

Transliteration is a process wherein an input string in some alphabet is converted to a string in another alphabet, usually based on the phonetics of the original word. Transliteration is frequently used in machine translation (to create possible equivalents of unknown words), in cross-lingual information retrieval systems and in the development of multilingual resources. Transliteration is usually classified into two directions. Given a pair (s, t), where s is the original word in the source language and t is the transliterated word in the target language, forward transliteration is the process of phonetically converting s into t, and backward transliteration is the process of correctly generating s given t (Lin and Chen, 2002). Backward transliteration is more challenging than forward transliteration. While forward transliteration can accomplish the mapping through table lookup, backward transliteration is required to disambiguate the noise produced in the forward transliteration and estimate the original word as closely as possible.

Gurmukhi to Shahmukhi transliteration differs from standard transliteration schemes in that the Gurmukhi word has to be converted to Shahmukhi with exact spellings, since the language is written in both Gurmukhi and Shahmukhi. This fact has largely been ignored by existing systems, which have treated the problem as forward transliteration and used character mappings and dependency rules to convert a Gurmukhi word to Shahmukhi without any consideration of spellings.

In this paper we present a Gurmukhi-Shahmukhi transliteration system with fairly good accuracy, where we have taken special care to retain the correct spellings in the transliterated Shahmukhi text.

Gurmukhi and Shahmukhi scripts: a brief overview

Shahmukhi is a right-to-left script, and the shape assumed by a character in a word is context sensitive. The Nastalique script, a cursive, right-to-left, context-sensitive and highly complex writing system, is normally used for Shahmukhi orthography. It has 35 simple consonants, 15 aspirated consonants, one character for the nasal sound, 15 diacritical marks, 10 digits and other symbols. Gurmukhi derives its character set from old scripts of the Indian subcontinent. It is a left-to-right syllabic script. It has 38 consonants, 10 vowel characters, 9 vowel symbols, 2 symbols for nasal sounds and 1 symbol that duplicates the sound of a consonant (Malik, 2006; Malik, 2005; Jawaid and Ahmed, 2009). Below is the mapping chart for Gurmukhi and its respective Shahmukhi character(s).

Table 1. Gurmukhi to Shahmukhi Mapping

Gurmukhi    Shahmukhi
ਅ           ا
ਆ           آ
ਇ           اِ
ਈ           ای
ਉ           اُ
ਊ           اُو
ਏ           اے
ਐ           اَے
ਓ           اَو
ਔ           اَو
ਕ           ک / ق
ਖ           کھ
ਗ           گ
ਘ           گھ
ਙ           نگ
ਚ           چ
ਛ           چھ
ਜ           ج
ਝ           جھ
ਞ           نج
ਟ           ٹ
ਠ           ٹھ
ਡ           ڈ
ਢ           ڈھ
ਣ           ن
ਤ           ت / ط
ਥ           تھ
ਦ           د
ਧ           دھ
ਨ           ن
ਪ           پ
ਫ           پھ
ਬ           ب
ਭ           بھ
ਮ           م
ਯ           ے
ਰ           ر
ਲ           ل
ਲ਼           ل
ਵ           و
ਸ਼           ش
ਸ           س / ص / ث
ਹ           ه / ح
◌ਾ          ا / ه / ◌ٰ
◌ਿ          ◌ِ
◌ੀ          ی
◌ੁ          ◌ُ
◌ੂ          ◌ُو
◌ੇ          ے
◌ੈ          ◌َے
◌ੋ          و
◌ੌ          ◌َو
ਖ਼           خ
ਗ਼           غ
ਜ਼           ز / ذ / ظ / ض
ੜ           ڑ
ਫ਼           ف
◌ੰ          ں / ن
◌ੱ          ◌ّ
◌ਂ          ں / ن

As can be seen from Table 1, certain Gurmukhi characters have multiple equivalent Shahmukhi characters. For example, the character ਤ is mapped to ت and ط. Similarly, the character ਜ਼ is mapped to four Shahmukhi characters (ز / ذ / ظ / ض). A statistical analysis of the Shahmukhi corpus was performed to determine the percentage of times each Gurmukhi character was mapped to each of its similar-sounding Shahmukhi characters (Table 2). The Shahmukhi character with the highest frequency is selected as the default mapping.

Table 2. Frequency of occurrence of multiple-mapping characters

Gurmukhi character    Equivalent Shahmukhi character    Frequency (%)    Default
ਹ                     ه                                 92.12            ه
                      ح                                 7.88
ਸ                     س                                 92.58            س
                      ص                                 7.41
                      ث                                 1.39
ਕ                     ک                                 91.29            ک
                      ق                                 8.71
ਤ                     ت                                 95.49            ت
                      ط                                 4.51
ਜ਼                     ز                                 62.12            ز
                      ض                                 14.87
                      ظ                                 13.91
                      ذ                                 9.10

Issues in Gurmukhi-Shahmukhi Transliteration

As already discussed above, the main issue in Gurmukhi-Shahmukhi transliteration is to convert the Gurmukhi word to Shahmukhi with correct spellings. The existing systems for Gurmukhi-Shahmukhi transliteration are mostly rule based and thus do not have good transliteration accuracy, as they do not check for correct Shahmukhi spellings. Two transliteration systems available on the internet have been taken for comparison with our system (http://www.puran.info/PMT/PMT.aspx; http://parc.cdac.in/utrans.htm). The main challenges involved in the development of a system with good accuracy are:

1. Multiple mappings: As discussed in the section above, certain Gurmukhi characters are mapped to multiple Shahmukhi characters. The five such Gurmukhi characters are shown in Table 2. Two more characters, ◌ੰ and ◌ਂ, are mapped to both ں and ن. These characters do not pose any problem, since the character ں always comes at the end of the word. For the other cases we choose the character with the higher frequency of occurrence. One obvious problem is that the less frequent characters never get mapped. A statistical analysis of the corpus reveals that these less frequent, similar-sounding Shahmukhi characters have a 1.45% frequency of occurrence. It was also found that the average length of a Shahmukhi word is 3.55 characters, which means we can roughly predict that any rule-based system that does not take care of these characters will have a 1.45 * 3.55 = 5.15% error rate at word level due to multiple mappings. Thus, to have a high accuracy system, it is necessary to solve this problem.

2. No exact equivalent mappings in Gurmukhi for some Shahmukhi characters: There are certain Shahmukhi characters, such as ع (ain) and ء (hamza), for which there are no exact equivalent Gurmukhi characters. Though hamza can be generated using character combination rules, there are no specific rules for ain. For example, consider the following cases:

ਔਰਤ -> عورت
ਸ਼ੁਰੂ -> شروع
ਰਫ਼ੀ -> رفيع
ਅਲਵਿਦਾ -> الوداع

It is not possible to specify rules for the generation of ain in the Shahmukhi words in the above examples; it varies from case to case. From the statistics, we find that ain has a 0.42% frequency of occurrence, and thus any pure rule-based system which cannot properly generate ain in Shahmukhi will have a further 0.42 * 3.55 = 1.49% error rate at word level.

3. Missing nukta symbols in Gurmukhi text: The Punjabi language has borrowed many words from Arabic, Persian etc. Five consonants (ਸ਼ ਖ਼ ਗ਼ ਜ਼ ਫ਼) were added to the original 35 characters in Gurmukhi to accommodate the sounds from these languages. As can be seen, these characters were created by adding the nukta (dot) symbol at the feet of the existing symbols (ਸ ਖ ਗ ਜ ਫ). But over the years, the usage of these characters, particularly ਖ਼ ਗ਼ ਜ਼ ਫ਼, has been on the decline, as many Punjabi speakers do not make a distinction between ਖ and ਖ਼, ਗ and ਗ਼, and ਫ and ਫ਼. In fact, from the analysis of the Gurmukhi and Shahmukhi corpora, we found that these four characters had a combined character frequency of occurrence of only 0.17% as compared to [...]

[...] respectively. For transliteration of such words, we have to devise special methods to handle these cases.

5. Transliteration of proper nouns: The transliteration of Urdu proper nouns, such as names of persons and places, from Gurmukhi to Shahmukhi poses another challenge. Many times the spellings of such words in Shahmukhi are atypical, and it is not possible to formulate transliteration rules for the generation of such spellings.
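
The default-mapping rule and the word-level error estimates discussed above can be illustrated with a small sketch. This is not the system described in the paper; it only restates the Table 2 frequencies and the two estimates (1.45 * 3.55 and 0.42 * 3.55) in code. The dictionary layout and the function names `default_candidate`, `never_generated` and `word_error_estimate` are hypothetical, and the error figure is the paper's rough product of character frequency and average word length, not an exact probability.

```python
import unicodedata

# Candidate Shahmukhi characters with corpus frequencies (%), as listed in Table 2.
# The dictionary layout and all function names are illustrative assumptions.
_RAW_CANDIDATES = {
    "\u0A39": {"ه": 92.12, "ح": 7.88},                        # ha
    "\u0A38": {"س": 92.58, "ص": 7.41, "ث": 1.39},             # sa
    "\u0A15": {"ک": 91.29, "ق": 8.71},                        # ka
    "\u0A24": {"ت": 95.49, "ط": 4.51},                        # ta
    "\u0A1C\u0A3C": {"ز": 62.12, "ض": 14.87, "ظ": 13.91, "ذ": 9.10},  # ja + nukta
}

# Normalise keys so precomposed and decomposed nukta forms hit the same entry.
CANDIDATES = {
    unicodedata.normalize("NFC", key): value
    for key, value in _RAW_CANDIDATES.items()
}


def default_candidate(gurmukhi_char):
    """Return the most frequent Shahmukhi candidate for an ambiguous character."""
    options = CANDIDATES[unicodedata.normalize("NFC", gurmukhi_char)]
    return max(options, key=options.get)


def never_generated(gurmukhi_char):
    """Candidates that a frequency-only default mapping can never produce."""
    options = CANDIDATES[unicodedata.normalize("NFC", gurmukhi_char)]
    best = default_candidate(gurmukhi_char)
    return sorted(c for c in options if c != best)


def word_error_estimate(char_freq_pct, avg_word_len=3.55):
    """Rough word-level error (%): character frequency times average word length."""
    return char_freq_pct * avg_word_len


if __name__ == "__main__":
    print(default_candidate("\u0A24"))           # default for ta
    print(never_generated("\u0A38"))             # candidates lost for sa
    print(round(word_error_estimate(1.45), 2))   # multiple-mapping estimate
    print(round(word_error_estimate(0.42), 2))   # ain estimate
```

The `never_generated` helper makes the limitation concrete: whatever the input word, a table-lookup system built this way only ever emits the default column of Table 2, which is why the less frequent spellings need a separate mechanism.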