Telu L2/20-105 TO: UTC FROM
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Section 14.4, Phags-Pa
The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1. -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
Bibliography
Bibliography Many books were read and researched in the compilation of Binford, L. R, 1983, Working at Archaeology. Academic Press, The Encyclopedic Dictionary of Archaeology: New York. Binford, L. R, and Binford, S. R (eds.), 1968, New Perspectives in American Museum of Natural History, 1993, The First Humans. Archaeology. Aldine, Chicago. HarperSanFrancisco, San Francisco. Braidwood, R 1.,1960, Archaeologists and What They Do. Franklin American Museum of Natural History, 1993, People of the Stone Watts, New York. Age. HarperSanFrancisco, San Francisco. Branigan, Keith (ed.), 1982, The Atlas ofArchaeology. St. Martin's, American Museum of Natural History, 1994, New World and Pacific New York. Civilizations. HarperSanFrancisco, San Francisco. Bray, w., and Tump, D., 1972, Penguin Dictionary ofArchaeology. American Museum of Natural History, 1994, Old World Civiliza Penguin, New York. tions. HarperSanFrancisco, San Francisco. Brennan, L., 1973, Beginner's Guide to Archaeology. Stackpole Ashmore, w., and Sharer, R. J., 1988, Discovering Our Past: A Brief Books, Harrisburg, PA. Introduction to Archaeology. Mayfield, Mountain View, CA. Broderick, M., and Morton, A. A., 1924, A Concise Dictionary of Atkinson, R J. C., 1985, Field Archaeology, 2d ed. Hyperion, New Egyptian Archaeology. Ares Publishers, Chicago. York. Brothwell, D., 1963, Digging Up Bones: The Excavation, Treatment Bacon, E. (ed.), 1976, The Great Archaeologists. Bobbs-Merrill, and Study ofHuman Skeletal Remains. British Museum, London. New York. Brothwell, D., and Higgs, E. (eds.), 1969, Science in Archaeology, Bahn, P., 1993, Collins Dictionary of Archaeology. ABC-CLIO, 2d ed. Thames and Hudson, London. Santa Barbara, CA. Budge, E. A. Wallis, 1929, The Rosetta Stone. Dover, New York. Bahn, P. -
Automatic Labeling of Voiced Consonants for Morphological Analysis of Modern Japanese Literature
Automatic Labeling of Voiced Consonants for Morphological Analysis of Modern Japanese Literature Teruaki Oka† Mamoru Komachi† [email protected] [email protected] Toshinobu Ogiso‡ Yuji Matsumoto† [email protected] [email protected] Nara Institute of Science and Technology National† Institute for Japanese Language and Linguistics ‡ Abstract literary text,2 which achieves high performance on analysis for existing electronic text (e.g. Aozora- Since the present-day Japanese use of bunko, an online digital library of freely available voiced consonant mark had established books and work mainly from out-of-copyright ma- in the Meiji Era, modern Japanese lit- terials). erary text written in the Meiji Era of- However, the performance of morphological an- ten lacks compulsory voiced consonant alyzers using the dictionary deteriorates if the text marks. This deteriorates the performance is not normalized, because these dictionaries often of morphological analyzers using ordi- lack orthographic variations such as Okuri-gana,3 nary dictionary. In this paper, we pro- accompanying characters following Kanji stems pose an approach for automatic labeling of in Japanese written words. This is problematic voiced consonant marks for modern liter- because not all historical texts are manually cor- ary Japanese. We formulate the task into a rected with orthography, and it is time-consuming binary classification problem. Our point- to annotate by hand. It is one of the major issues wise prediction method uses as its feature in applying NLP tools to Japanese Linguistics be- set only surface information about the sur- cause ancient materials often contain a wide vari- rounding character strings. -
Shahmukhi to Gurmukhi Transliteration System: a Corpus Based Approach
Shahmukhi to Gurmukhi Transliteration System: A Corpus based Approach Tejinder Singh Saini1 and Gurpreet Singh Lehal2 1 Advanced Centre for Technical Development of Punjabi Language, Literature & Culture, Punjabi University, Patiala 147 002, Punjab, India [email protected] http://www.advancedcentrepunjabi.org 2 Department of Computer Science, Punjabi University, Patiala 147 002, Punjab, India [email protected] Abstract. This research paper describes a corpus based transliteration system for Punjabi language. The existence of two scripts for Punjabi language has created a script barrier between the Punjabi literature written in India and in Pakistan. This research project has developed a new system for the first time of its kind for Shahmukhi script of Punjabi language. The proposed system for Shahmukhi to Gurmukhi transliteration has been implemented with various research techniques based on language corpus. The corpus analysis program has been run on both Shahmukhi and Gurmukhi corpora for generating statistical data for different types like character, word and n-gram frequencies. This statistical analysis is used in different phases of transliteration. Potentially, all members of the substantial Punjabi community will benefit vastly from this transliteration system. 1 Introduction One of the great challenges before Information Technology is to overcome language barriers dividing the mankind so that everyone can communicate with everyone else on the planet in real time. South Asia is one of those unique parts of the world where a single language is written in different scripts. This is the case, for example, with Punjabi language spoken by tens of millions of people but written in Indian East Punjab (20 million) in Gurmukhi script (a left to right script based on Devanagari) and in Pakistani West Punjab (80 million), written in Shahmukhi script (a right to left script based on Arabic), and by a growing number of Punjabis (2 million) in the EU and the US in the Roman script. -
5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. -
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 2.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1. General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Kannada LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-kannada-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: kannada-test-labels-06mar19-en.txt 2. Script for which the LGR is Proposed ISO 15924 Code: Knda ISO 15924 N°: 345 ISO 15924 English Name: Kannada Latin transliteration of the native script name: Native name of the script: ಕನ#ಡ Maximal Starting Repertoire (MSR) version: MSR-4 Some languages using the script and their ISO 639-3 codes: Kannada (kan), Tulu (tcy), Beary, Konkani (kok), Havyaka, Kodava (kfa) 1 Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) 3. Background on Script and Principal Languages Using It 3.1 Kannada language Kannada is one of the scheduled languages of India. It is spoken predominantly by the people of Karnataka State of India. It is one of the major languages among the Dravidian languages. Kannada is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad. -
Writing Systems: Their Properties and Implications for Reading
Writing Systems: Their Properties and Implications for Reading Brett Kessler and Rebecca Treiman doi:10.1093/oxfordhb/9780199324576.013.1 Draft of a chapter to appear in: The Oxford Handbook of Reading, ed. by Alexander Pollatsek and Rebecca Treiman. ISBN 9780199324576. Abstract An understanding of the nature of writing is an important foundation for studies of how people read and how they learn to read. This chapter discusses the characteristics of modern writing systems with a view toward providing that foundation. We consider both the appearance of writing systems and how they function. All writing represents the words of a language according to a set of rules. However, important properties of a language often go unrepresented in writing. Change and variation in the spoken language result in complex links to speech. Redundancies in language and writing mean that readers can often get by without taking in all of the visual information. These redundancies also mean that readers must often supplement the visual information that they do take in with knowledge about the language and about the world. Keywords: writing systems, script, alphabet, syllabary, logography, semasiography, glottography, underrepresentation, conservatism, graphotactics The goal of this chapter is to examine the characteristics of writing systems that are in use today and to consider the implications of these characteristics for how people read. As we will see, a broad understanding of writing systems and how they work can place some important constraints on our conceptualization of the nature of the reading process. It can also constrain our theories about how children learn to read and about how they should be taught to do so. -
Register in Eastern Cham: Phonological, Phonetic and Sociolinguistic Approaches
REGISTER IN EASTERN CHAM: PHONOLOGICAL, PHONETIC AND SOCIOLINGUISTIC APPROACHES A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy by Marc Brunelle August 2005 © 2005 Marc Brunelle REGISTER IN EASTERN CHAM: PHONOLOGICAL, PHONETIC AND SOCIOLINGUISTIC APPROACHES Marc Brunelle, Ph.D. Cornell University, 2005 The Chamic language family is often cited as a test case for contact linguistics. Although Chamic languages are Austronesian, they are claimed to have converged with Mon-Khmer languages and adopted features from their closest neighbors. A good example of such a convergence is the realization of phonological register in Cham dialects. In many Southeast Asian languages, the loss of the voicing contrast in onsets has led to the development of two registers, bundles of features that initially included pitch, voice quality, vowel quality and durational differences and that are typically realized on rimes. While Cambodian Cham realizes register mainly through vowel quality, just like Khmer, the registers of the Cham dialect spoken in south- central Vietnam (Eastern Cham) are claimed to have evolved into tone, a property that plays a central role in Vietnamese phonology. This dissertation evaluates the hypothesis that contact with Vietnamese is responsible for the recent evolution of Eastern Cham register by exploring the nature of the sound system of Eastern Cham from phonetic, phonological and sociolinguistic perspectives. Proponents of the view that Eastern Cham has a complex tone system claim that tones arose from the phonemicization of register allophones conditioned by codas after the weakening or deletion of coda stops and laryngeals. -
The Gentics of Civilization: an Empirical Classification of Civilizations Based on Writing Systems
Comparative Civilizations Review Volume 49 Number 49 Fall 2003 Article 3 10-1-2003 The Gentics of Civilization: An Empirical Classification of Civilizations Based on Writing Systems Bosworth, Andrew Bosworth Universidad Jose Vasconcelos, Oaxaca, Mexico Follow this and additional works at: https://scholarsarchive.byu.edu/ccr Recommended Citation Bosworth, Bosworth, Andrew (2003) "The Gentics of Civilization: An Empirical Classification of Civilizations Based on Writing Systems," Comparative Civilizations Review: Vol. 49 : No. 49 , Article 3. Available at: https://scholarsarchive.byu.edu/ccr/vol49/iss49/3 This Article is brought to you for free and open access by the Journals at BYU ScholarsArchive. It has been accepted for inclusion in Comparative Civilizations Review by an authorized editor of BYU ScholarsArchive. For more information, please contact [email protected], [email protected]. Bosworth: The Gentics of Civilization: An Empirical Classification of Civil 9 THE GENETICS OF CIVILIZATION: AN EMPIRICAL CLASSIFICATION OF CIVILIZATIONS BASED ON WRITING SYSTEMS ANDREW BOSWORTH UNIVERSIDAD JOSE VASCONCELOS OAXACA, MEXICO Part I: Cultural DNA Introduction Writing is the DNA of civilization. Writing permits for the organi- zation of large populations, professional armies, and the passing of complex information across generations. Just as DNA transmits biolog- ical memory, so does writing transmit cultural memory. DNA and writ- ing project information into the future and contain, in their physical structure, imprinted knowledge. -
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consor- tium and published by Addison-Wesley. The material has been modified slightly for this online edi- tion, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten.