UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Armenian Secret and Invented Languages and Argots
Armenian Secret and Invented Languages and Argots The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Russell, James R. Forthcoming. Armenian secret and invented languages and argots. Proceedings of the Institute of Linguistics of the Russian Academy of Sciences. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:9938150 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Open Access Policy Articles, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#OAP 1 ARMENIAN SECRET AND INVENTED LANGUAGES AND ARGOTS. By James R. Russell, Harvard University. Светлой памяти Карена Никитича Юзбашяна посвящается это исследование. CONTENTS: Preface 1. Secret languages and argots 2. Philosophical and hypothetical languages 3. The St. Petersburg Manuscript 4. The Argot of the Felt-Beaters 5. Appendices: 1. Description of St. Petersburg MS A 29 2. Glossary of the Ṙuštuni language 3. Glossary of the argot of the Felt-Beaters of Moks 4. Texts in the “Third Script” of MS A 29 List of Plates Bibliography PREFACE Much of the research for this article was undertaken in Armenia and Russia in June and July 2011 and was funded by a generous O’Neill grant through the Davis Center for Russian and Eurasian Studies at Harvard. For their eager assistance and boundless hospitality I am grateful to numerous friends and colleagues who made my visit pleasant and successful. For their generous assistance in Erevan and St. -
File Formats
man pages section 4: File Formats Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817–3945–10 September 2004 Copyright 2004 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements. -
Unicode and Code Page Support
Natural for Mainframes Unicode and Code Page Support Version 4.2.6 for Mainframes October 2009 This document applies to Natural Version 4.2.6 for Mainframes and to all subsequent releases. Specifications contained herein are subject to change and these changes will be reported in subsequent release notes or new editions. Copyright © Software AG 1979-2009. All rights reserved. The name Software AG, webMethods and all Software AG product names are either trademarks or registered trademarks of Software AG and/or Software AG USA, Inc. Other company and product names mentioned herein may be trademarks of their respective owners. Table of Contents 1 Unicode and Code Page Support .................................................................................... 1 2 Introduction ..................................................................................................................... 3 About Code Pages and Unicode ................................................................................ 4 About Unicode and Code Page Support in Natural .................................................. 5 ICU on Mainframe Platforms ..................................................................................... 6 3 Unicode and Code Page Support in the Natural Programming Language .................... 7 Natural Data Format U for Unicode-Based Data ....................................................... 8 Statements .................................................................................................................. 9 Logical -
Legacy Character Sets & Encodings
Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text. -
Implementing Cross-Locale CJKV Code Conversion
Implementing Cross-Locale CJKV Code Conversion Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf Code Conversion Basics dc • Algorithmic code conversion — Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP — A purely mathematical process • Table-driven code conversion — Required across locales: Chinese ↔ Japanese — Required when dealing with Unicode — Mapping tables are required — Can sometimes be faster than algorithmic code conversion— depends on the implementation September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Code Conversion Basics (Cont’d) dc • CJKV character set differences — Different number of characters — Different ordering of characters — Different characters September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Character Sets Versus Encodings dc • Common CJKV character set standards — China: GB 1988-89, GB 2312-80; GB 1988-89, GBK — Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992 — Hong Kong: ASCII, Big Five with Hong Kong extension — Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990 — South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991 — North Korea: ASCII (?), KPS 9566-97 — Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995 • Common CJKV encodings — Locale-independent: EUC-*, ISO-2022-* — Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code — Other: UCS-2, UCS-4, UTF-7, UTF-8, -
Petit Manuel Unix®
Août 2010 Petit Manuel Unix® Jacques MADELAINE Département d’informatique Université de CAEN 14032 CAEN CEDEX La première édition de ce manuel décrivait SMX un Unix développé à l’INRIA pour la machine française SM90, la deuxième édition une adaptation pour SPIX, un Unix pour SM90 basé sur System V et développé par Bull. Il a été ensuite modifié et corrigé pour SunOS l’Unix de Sun Microsystems, puis pour Solaris. La cinquième version a été adaptée pour tenir compte des particularités du système GNU-Linux. Un chapitre supplémentaire dédié aux accès réseau a été ensuite ajouté. Rappelons que presque toutes les commandes décrites vont fonctionner comme indiqué sur tout système Unix commercial (Solaris, HP-UX, AIX, ...) ou libre (Linux, OpenBSD, FreeBSD, NetBSD, ...). Mes remerciements à Sara Aubry pour sa relecture attentive etàFrançois Girault pour avoir fourni la mise en tableau des commandes d’emacs. Mes remerciements à Davy Gigan pour m’avoir poussé à publier la version html en octobre 2003. 1 INTRODUCTION() INTRODUCTION() 2Petit manuel Unix 2002 INTRODUCTION NOM intro − introduction to the mini manual − introduction au petit manuel DESCRIPTION Ce manuel donne les principales commandes de Unix. Unix est une famille de systèmes d’exploitation ; les commandes décrites existent, sauf précision contraire, sous Linux et Solaris, les deux systèmes disponibles au département. Seules les principales options sont données, reportez-vous au manuel en ligne pour une liste exhaustive.Chaque commande est décrite par trois sections : NOM qui donne le nom de la commande, son nom en anglais (le nom Unix étant un mnémonique anglais ne correspondant pas toujours bien aveclefrançais) et en français. -
Introduction to Unicode
Introduction to Unicode Presented by developerWorks, your source for great tutorials ibm.com/developerWorks Table of Contents If you're viewing this document online, you can click any of the topics below to link directly to that section. 1. Tutorial basics 2 2. Introduction 3 3. Unicode-based multilingual form development 5 4. Going interactive: Adding the Perl CGI script 10 5. Conclusions and resources 13 6. Feedback 15 Introduction to Unicode Page 1 Presented by developerWorks, your source for great tutorials ibm.com/developerWorks Section 1. Tutorial basics Is this tutorial right for you? This tutorial is for anyone who wants to understand the basics of Unicode-based multilingual Web page development. It is an introduction to how multilingual characters, the Unicode encoding standard, XML, and Perl CGI scripts can be integrated to produce a truly multilingual Web page. The tutorial explains the concepts behind this process and lays the groundwork for future development. Navigation Navigating through the tutorial is easy: * Use the Next and Previous buttons to move forward and backward. * Use the Menu button to return to the tutorial menu. * If you'd like to tell us what you think, use the Feedback button. * If you need help with the tutorial, use the Help button. What is required: Hardware and software needed No additional hardware is needed for this tutorial; however, in order to view some of the files online, Unicode fonts must be loaded. See Resources for information on Unicode fonts if you don't already have one loaded. (Note: Many Unicode fonts are still incomplete or in development, so one particular font may not contain every character; however, for the purposes of this tutorial, the Unicode font downloaded by the user will ideally contain all or most of the language characters used in our examples. -
Guide to the Use of Character Set Standards in Europe
CEN TECHNICAL REPORT Draft 3 for CEN Trnnnn:1999 1999-07-23 Descriptors: Data processing, information interchange, text processing, text communication, graphic characters, character sets, representation of characters, coded character sets, architecture Information Technology - Guide to the use of character set standards in Europe This CEN Technical Report has been drawn up by CEN/TC 304 This CEN Technical Report was established by TC 304 in one official version (English). A version in any other language made by translation under the responsibility of a CEN member into its own lan- guage and notified to the Central Secretariat has the same status as the official version. CEN members are the national bodies of Austria, Belgium, the Czech Republic, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, and the United Kingdom. CEN European Committee for Standardization Comité Européen de Normalisation Europäisches Komitee für Normung Central Secretariat: rue de Stassart 36, B-1050 Brussels © CEN 1999 Copyright reserved to all CEN members Ref.No. TR xxxx:1999 E CEN TR nnnn : Draft 2 Guide to the use of character set standards in Europe ii Guide to the use of character set standards in Europe CEN TR nnnn : Draft 2 FOREWORD This report was produced by a CEN/TC 304 Project Team, set up in June, 1998, as one of several to carry out the funded work program of TC 304 (documented in CEN/TC 304 N 666 R2). A first draft was discussed at the TC meeting in Brussels in November, 1998. A revised draft was circulated for comments within the TC and thereafter discussed at the TC plenary meeting in April, 1999. -
JFP Reference Manual 5 : Standards, Environments, and Macros
JFP Reference Manual 5 : Standards, Environments, and Macros Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817–0648–10 December 2002 Copyright 2002 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements. -
Choosing Inscriptions Making Font for 'Armazuli' Aramaic Objectives Mark up of the Texts and Linked Data New Photo Document
@EAGLE 2016, Rome Epigraphic Corpus of Georgia Inscriptions found in Georgia are diverse in their typology, content and language. From among this range perhaps the most compelling examples are those inscriptions The Institute of Linguistic Research, in Aramaic and Old Greek (numbering more than 1000), and dating from V AC to XIX (T. Kaukhchishvili 2009.) None of these have been published online according to the EpiDoc guidelines. Thanks to a year’s funding from The Institute of Linguistic Research of the Ilia State University (ISU), the “Epigraphic Corpus of Georgia Project”, Ilia State University led by Prof. Nino Doborjginidze, began on March 1st 2015. It aims to make the first, key 30 inscriptions available to both a scholarly audience and to the general public. It presents an opportunity to question the strict practice of the “print-only” publishing of epigraphic materials among Georgian epigraphists. Objectives New photo documentation and web page - epigraphy.iliauni.edu.ge A desired outcome of the digital publishing of the inscriptions of Georgia is, on the one hand, to preserve those inscriptions and likewise to preserve editions of these inscriptions that were made by Georgian experts from 1930s onwards, although only a few of these were published in international scientific journals because of the So- viet restrictions. Thus the aims of the project can be summarized thus: • to protect these inscriptions as an element of our cultural heritage • to document the printed critical editions of the inscriptions, which likewise consti- tute a part of that cultural heritage • to demonstrate and illustrate common, historical-cultural contexts(e.g. -
Proposal for the Georgian Script Root Zone LGR
Proposal for the Georgian Script Root Zone LGR LGR Version 2 Date: 2016-11-24 Document version: 1.1 Authors: GEORGIAN SCRIPT GENERATION PANEL 1 General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Georgian LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: • Proposed-LGR-Georgian-20160915.xml Labels for testing can be found in the accompanying text document: • Labels-GeorgianScript-20160915.txt 2 Script for which the LGR is proposed ISO 15924 Code: Geor ISO 15924 Key N°: 240 ISO 15924 English Name: Georgian Mkhedruli Latin transliteration of native script name: Mkhedruli Native name of the script: მხედრული Maximal Starting Repertoire (MSR) version: MSR-2 Proposal for a Georgian Script Root Zone LGR Georgian Script GP 3 Background on Script and Principal Languages Using It The Georgian scripts are the three writing systems used to write the Georgian language: Asomtavruli, Nuskhuri and Mkhedruli. Mkhedruli (Georgian: მხედრული) is the current Georgian script and is therefore the standard script for modern Georgian and its related Kartvelian languages, whereas Asomtavruli and Nuskhuri are used only in ceremonial religious texts and iconography. In the following, the term Georgian script is used synonymously with Mkhedruli. Like the two other scripts, Mkhedruli is purely unicameral. Mkhedruli first appears in the 10th century - the oldest Mkhedruli inscription found is dated back to 982 AD. -
Unicode Cree Syllabics for Windows and Macintosh
Unicode Cree Syllabics for Windows and Macintosh 37th Algonquian Conference, Ottawa 2005 Bill Jancewicz SIL International and Naskapi Development Corporation ABSTRACT Submitted as an update to a presentation made at the 34th Algonquian Conference (Kingston). The ongoing development of the operating systems has included increased support for cross-platform use of Unicode syllabic script. Key improvements that were included in Macintosh's OS X.3 (Panther) operating system now allow direct keyboarding of Unicode characters by means of a user-defined input method. A summary and comparison of the available tools for handling Unicode on both Windows and Macintosh will be discussed. INTRODUCTION Since 1988 the author has been working in the Naskapi language at Kawawachikamach with the primary purpose of linguistic analysis and Bible translation, sponsored by Wycliffe Bible Translators and SIL International. Related work includes mother tongue translator training, Naskapi literature and curriculum development. Along with the language work the author also developed methods of production for Naskapi language materials in syllabic script by means of the computer. With the advent of high quality publishing capabilities in newer computers, procedures were updated to keep pace with the improving technology. With resources available from SIL International computer services department, a very satisfactory system of keyboarding syllabic texts in Naskapi was developed. At the urging of colleagues working in related languages, the system was expanded to include the wider inventory of standard Eastern and Western Cree syllabics. While this has pushed the limit of what is possible with the current technology, Unicode makes this practical. Note that the system originally developed for keyboarding Naskapi syllabics is similar but not identical to the Cree system, because of the unique local orthography in use at Kawawachikamach.