'Links Between Scripts, Writing Systems, Orthographies, Fonts And
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Armenian Secret and Invented Languages and Argots
Armenian Secret and Invented Languages and Argots The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Russell, James R. Forthcoming. Armenian secret and invented languages and argots. Proceedings of the Institute of Linguistics of the Russian Academy of Sciences. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:9938150 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Open Access Policy Articles, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#OAP 1 ARMENIAN SECRET AND INVENTED LANGUAGES AND ARGOTS. By James R. Russell, Harvard University. Светлой памяти Карена Никитича Юзбашяна посвящается это исследование. CONTENTS: Preface 1. Secret languages and argots 2. Philosophical and hypothetical languages 3. The St. Petersburg Manuscript 4. The Argot of the Felt-Beaters 5. Appendices: 1. Description of St. Petersburg MS A 29 2. Glossary of the Ṙuštuni language 3. Glossary of the argot of the Felt-Beaters of Moks 4. Texts in the “Third Script” of MS A 29 List of Plates Bibliography PREFACE Much of the research for this article was undertaken in Armenia and Russia in June and July 2011 and was funded by a generous O’Neill grant through the Davis Center for Russian and Eurasian Studies at Harvard. For their eager assistance and boundless hospitality I am grateful to numerous friends and colleagues who made my visit pleasant and successful. For their generous assistance in Erevan and St. -
A Survey of Orthographic Information in Machine Translation 3
Machine Translation Journal manuscript No. (will be inserted by the editor) A Survey of Orthographic Information in Machine Translation Bharathi Raja Chakravarthi1 ⋅ Priya Rani 1 ⋅ Mihael Arcan2 ⋅ John P. McCrae 1 the date of receipt and acceptance should be inserted later Abstract Machine translation is one of the applications of natural language process- ing which has been explored in different languages. Recently researchers started pay- ing attention towards machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine transla- tion systems is the variation in orthographic conventions which causes many issues to traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve the machine translation system. This article offers a survey of research regarding orthog- raphy’s influence on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and how orthographic in- formation can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under- resourced languages. We discuss different types of machine translation and demon- strate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts of cog- nates information at different levels of machine translation and the lessons that can Bharathi Raja Chakravarthi [email protected] Priya Rani [email protected] Mihael Arcan [email protected] John P. -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
Methods for Estonian Large Vocabulary Speech Recognition
THESIS ON INFORMATICS AND SYSTEM ENGINEERING C Methods for Estonian Large Vocabulary Speech Recognition TANEL ALUMAE¨ Faculty of Information Technology Department of Informatics TALLINN UNIVERSITY OF TECHNOLOGY Dissertation was accepted for the commencement of the degree of Doctor of Philosophy in Engineering on November 1, 2006. Supervisors: Prof. Emer. Leo Vohandu,˜ Faculty of Information Technology Einar Meister, Ph.D., Institute of Cybernetics at Tallinn University of Technology Opponents: Mikko Kurimo, Dr. Tech., Helsinki University of Technology Heiki-Jaan Kaalep, Ph.D., University of Tartu Commencement: December 5, 2006 Declaration: Hereby I declare that this doctoral thesis, my original investigation and achievement, submitted for the doctoral degree at Tallinn University of Technology has not been submitted for any degree or examination. / Tanel Alumae¨ / Copyright Tanel Alumae,¨ 2006 ISSN 1406-4731 ISBN 9985-59-661-7 ii Contents 1 Introduction 1 1.1 The speech recognition problem . 1 1.2 Language specific aspects of speech recognition . 3 1.3 Related work . 5 1.4 Scope of the thesis . 6 1.5 Outline of the thesis . 7 1.6 Acknowledgements . 7 2 Basic concepts of speech recognition 9 2.1 Probabilistic decoding problem . 10 2.2 Feature extraction . 10 2.2.1 Signal acquisition . 11 2.2.2 Short-term analysis . 11 2.3 Acoustic modelling . 14 2.3.1 Hidden Markov models . 15 2.3.2 Selection of basic units . 20 2.3.3 Clustered context-dependent acoustic units . 20 2.4 Language Modelling . 22 2.4.1 N-gram language models . 23 2.4.2 Language model evaluation . 29 3 Properties of the Estonian language 33 3.1 Phonology . -
VALTS ERNÇSTREITS (Riga) LIVONIAN ORTHOGRAPHY Introduction the Livonian Language Has Been Extensively Written for About
Linguistica Uralica XLIII 2007 1 VALTS ERNÇSTREITS (Riga) LIVONIAN ORTHOGRAPHY Abstract. This article deals with the development of Livonian written language and the related matters starting from the publication of first Livonian books until present day. In total four different spelling systems have been used in Livonian publications. The first books in Livonian appeared in 1863 using phonetic transcription. In 1880, the Gospel of Matthew was published in Eastern and Western Livonian dialects and used Gothic script and a spelling system similar to old Latvian orthography. In 1920, an East Livonian written standard was established by the simplification of the Finno-Ugric phonetic transcription. Later, elements of Latvian orthography, and after 1931 also West Livonian characteristics, were added. Starting from the 1970s and due to a considerable decrease in the number of Livonian mother tongue speakers in the second half of the 20th century the orthography was modified to be even more phonetic in the interest of those who did not speak the language. Additionally, in the 1930s, a spelling system which was better suited for conveying certain phonetic phenomena than the usual standard was used in two books but did not find any wider usage. Keywords: Livonian, standard language, orthography. Introduction The Livonian language has been extensively written for about 150 years by linguists who have been noting down examples of the language as well as the Livonians themselves when publishing different materials. The prime consideration for both of these groups has been how to best represent the Livonian language in the written form. The area of interest for the linguists has been the written representation of the language with maximal phonetic precision. -
Petit Manuel Unix®
Août 2010 Petit Manuel Unix® Jacques MADELAINE Département d’informatique Université de CAEN 14032 CAEN CEDEX La première édition de ce manuel décrivait SMX un Unix développé à l’INRIA pour la machine française SM90, la deuxième édition une adaptation pour SPIX, un Unix pour SM90 basé sur System V et développé par Bull. Il a été ensuite modifié et corrigé pour SunOS l’Unix de Sun Microsystems, puis pour Solaris. La cinquième version a été adaptée pour tenir compte des particularités du système GNU-Linux. Un chapitre supplémentaire dédié aux accès réseau a été ensuite ajouté. Rappelons que presque toutes les commandes décrites vont fonctionner comme indiqué sur tout système Unix commercial (Solaris, HP-UX, AIX, ...) ou libre (Linux, OpenBSD, FreeBSD, NetBSD, ...). Mes remerciements à Sara Aubry pour sa relecture attentive etàFrançois Girault pour avoir fourni la mise en tableau des commandes d’emacs. Mes remerciements à Davy Gigan pour m’avoir poussé à publier la version html en octobre 2003. 1 INTRODUCTION() INTRODUCTION() 2Petit manuel Unix 2002 INTRODUCTION NOM intro − introduction to the mini manual − introduction au petit manuel DESCRIPTION Ce manuel donne les principales commandes de Unix. Unix est une famille de systèmes d’exploitation ; les commandes décrites existent, sauf précision contraire, sous Linux et Solaris, les deux systèmes disponibles au département. Seules les principales options sont données, reportez-vous au manuel en ligne pour une liste exhaustive.Chaque commande est décrite par trois sections : NOM qui donne le nom de la commande, son nom en anglais (le nom Unix étant un mnémonique anglais ne correspondant pas toujours bien aveclefrançais) et en français. -
Guide to the Use of Character Set Standards in Europe
CEN TECHNICAL REPORT Draft 3 for CEN Trnnnn:1999 1999-07-23 Descriptors: Data processing, information interchange, text processing, text communication, graphic characters, character sets, representation of characters, coded character sets, architecture Information Technology - Guide to the use of character set standards in Europe This CEN Technical Report has been drawn up by CEN/TC 304 This CEN Technical Report was established by TC 304 in one official version (English). A version in any other language made by translation under the responsibility of a CEN member into its own lan- guage and notified to the Central Secretariat has the same status as the official version. CEN members are the national bodies of Austria, Belgium, the Czech Republic, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, and the United Kingdom. CEN European Committee for Standardization Comité Européen de Normalisation Europäisches Komitee für Normung Central Secretariat: rue de Stassart 36, B-1050 Brussels © CEN 1999 Copyright reserved to all CEN members Ref.No. TR xxxx:1999 E CEN TR nnnn : Draft 2 Guide to the use of character set standards in Europe ii Guide to the use of character set standards in Europe CEN TR nnnn : Draft 2 FOREWORD This report was produced by a CEN/TC 304 Project Team, set up in June, 1998, as one of several to carry out the funded work program of TC 304 (documented in CEN/TC 304 N 666 R2). A first draft was discussed at the TC meeting in Brussels in November, 1998. A revised draft was circulated for comments within the TC and thereafter discussed at the TC plenary meeting in April, 1999. -
Choosing Inscriptions Making Font for 'Armazuli' Aramaic Objectives Mark up of the Texts and Linked Data New Photo Document
@EAGLE 2016, Rome Epigraphic Corpus of Georgia Inscriptions found in Georgia are diverse in their typology, content and language. From among this range perhaps the most compelling examples are those inscriptions The Institute of Linguistic Research, in Aramaic and Old Greek (numbering more than 1000), and dating from V AC to XIX (T. Kaukhchishvili 2009.) None of these have been published online according to the EpiDoc guidelines. Thanks to a year’s funding from The Institute of Linguistic Research of the Ilia State University (ISU), the “Epigraphic Corpus of Georgia Project”, Ilia State University led by Prof. Nino Doborjginidze, began on March 1st 2015. It aims to make the first, key 30 inscriptions available to both a scholarly audience and to the general public. It presents an opportunity to question the strict practice of the “print-only” publishing of epigraphic materials among Georgian epigraphists. Objectives New photo documentation and web page - epigraphy.iliauni.edu.ge A desired outcome of the digital publishing of the inscriptions of Georgia is, on the one hand, to preserve those inscriptions and likewise to preserve editions of these inscriptions that were made by Georgian experts from 1930s onwards, although only a few of these were published in international scientific journals because of the So- viet restrictions. Thus the aims of the project can be summarized thus: • to protect these inscriptions as an element of our cultural heritage • to document the printed critical editions of the inscriptions, which likewise consti- tute a part of that cultural heritage • to demonstrate and illustrate common, historical-cultural contexts(e.g. -
Proposal for the Georgian Script Root Zone LGR
Proposal for the Georgian Script Root Zone LGR LGR Version 2 Date: 2016-11-24 Document version: 1.1 Authors: GEORGIAN SCRIPT GENERATION PANEL 1 General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Georgian LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: • Proposed-LGR-Georgian-20160915.xml Labels for testing can be found in the accompanying text document: • Labels-GeorgianScript-20160915.txt 2 Script for which the LGR is proposed ISO 15924 Code: Geor ISO 15924 Key N°: 240 ISO 15924 English Name: Georgian Mkhedruli Latin transliteration of native script name: Mkhedruli Native name of the script: მხედრული Maximal Starting Repertoire (MSR) version: MSR-2 Proposal for a Georgian Script Root Zone LGR Georgian Script GP 3 Background on Script and Principal Languages Using It The Georgian scripts are the three writing systems used to write the Georgian language: Asomtavruli, Nuskhuri and Mkhedruli. Mkhedruli (Georgian: მხედრული) is the current Georgian script and is therefore the standard script for modern Georgian and its related Kartvelian languages, whereas Asomtavruli and Nuskhuri are used only in ceremonial religious texts and iconography. In the following, the term Georgian script is used synonymously with Mkhedruli. Like the two other scripts, Mkhedruli is purely unicameral. Mkhedruli first appears in the 10th century - the oldest Mkhedruli inscription found is dated back to 982 AD. -
Sylfaen : Foundations of Multiscript Typography
Sylfaen : Foundations of Multiscript Typography John Hudson, 2000 © INTRODUCTION In the autumn of 1997, I was hired by the Microsoft Typography Group to consult For a general introduction to OpenType, see David Lemon’s on the production of support materials for OpenType font developers working with article in the first issue of Type. For more detailed information, multiple scripts and languages. The project was given the working title Web Resource including the current version of the OpenType specification, for International Typography (WRIT). A year later, I had the pleasure of showing this see the Microsoft Typography work in progress at the 1998 ATypI Conference in Lyon, where it was well received website at: www.microsoft.com/typography by my colleagues in the type industry. Shortly after my return to Canada, Microsoft decided to put further development of this work on hold, so this paper, inevitably, lacks some of the enthusiasm I had for the project in 1998. It is difficult to remain excited about something which has been effectively cancelled, but at the same time I am happy to respond to the editors’ invitation to write something about the project for the Type journal, because I think the small team at Microsoft who worked on this project achieved something important. Perhaps I still hope that some day the project may be revived, even if in a different or more limited form. Certainly, I still believe that the ever increasing emphasis on internationalisation in software development, business and, of course, the Internet, requires a better level of support from font developers, and this in turn requires something like WRIT. -
17Th Century Estonian Orthography Reform, the Teaching of Reading and the History of Ideas
TRAMES, 2011, 15(65/60), 4, 365–384 17TH CENTURY ESTONIAN ORTHOGRAPHY REFORM, THE TEACHING OF READING AND THE HISTORY OF IDEAS Aivar Põldvee Institute of the Estonian Language, Tallinn, and Tallinn University Abstract. Literary languages can be divided into those which are more transparent or less transparent, based on phoneme-grapheme correspondence. The Estonian language falls in the category of more transparent languages; however, its development could have pro- ceeded in another direction. The standards of the Estonian literary language were set in the first half of the 17th century by German clergymen, following the example of German orthography, resulting in a gap between the ‘language of the church’ and the vernacular, as well as a discrepancy between writing and pronunciation. The German-type orthography was suited for Germans to read, but was not transparent for Estonians and created difficulties with the teaching of reading, which arose to the agenda in the 1680s. As a solution, Bengt Gottfried Forselius offered phonics instead of an alphabetic method, as well as a more phonetic and regular orthography. The old European written languages faced a similar problem in the 16th–17th centuries; for instance, Valentin Ickelsamer in Germany, John Hart in England, the grammarians of Port-Royal in France, and Comenius and others suggested using the phonic method and a more phonetic orthography. This article explores 17th century Estonian orthography reform and the reasons why it was realized as opposed to European analogues. Keywords: 17th century, spelling systems, literacy, phonics, Estonian language DOI: 10.3176/tr.2011.4.03 1. The problem In the Early Modern era, several universal processes took place in the history of European languages. -
Download the Complete Issue
ISSN 0976-0962 IJCLA International Journal of Computational Linguistics and Applicati ons Vol. 3 No. 2 Jul-Dec 2012 Guest Editor Yasunari Harada Editor-in-Chief Alexander Gelbukh © BAHRI PUBLICATIONS (2012) ISSN 0976-0962 International Journal of Computational Linguistics and App lications Vol. 3 No. 2 Jul-Dec 2012 International Journal of Computational Linguistics and Applications – IJCLA (started in 2010) is a peer-reviewed international journal published twice a year, in June and December. It publishes original research papers related to computational linguistics, natural language processing, human language technologies and their applications. The views expressed herein are those of the authors. The journal reserves the right to edit the material. © BAHRI PUBLICATIONS (2013). All rights reserved. No part of this publication may be reproduced by any means, transmitted or translated into another language without the written permission of the publisher. Indexing: Cabell's Directory of Publishing Opportunities. Editor-in-Chief: Alexander Gelbukh Subscription: India: Rs. 2699 Rest of the world: US$ 249 Payments can be made by Cheques/Bank Drafts/International Money Orders drawn in the name of BAHRI PUBLICATIONS, NEW DELHI and sent to: BAHRI PUBLICATIONS 1749A/5, 1st Floor, Gobindpuri Extension, P. O. Box 4453, Kalkaji, New Delhi 110019 Telephones: 011-65810766, (0) 9811204673, (0) 9212794543 E-mail: [email protected]; [email protected] Website: http://www.bahripublications.com Printed & Published by Deepinder Singh Bahri, for and