UTF-8 Encoding and SAS® Frank Poppe, PW Consulting
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Hieroglyphs for the Information Age: Images As a Replacement for Characters for Languages Not Written in the Latin-1 Alphabet Akira Hasegawa
Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 5-1-1999 Hieroglyphs for the information age: Images as a replacement for characters for languages not written in the Latin-1 alphabet Akira Hasegawa Follow this and additional works at: http://scholarworks.rit.edu/theses Recommended Citation Hasegawa, Akira, "Hieroglyphs for the information age: Images as a replacement for characters for languages not written in the Latin-1 alphabet" (1999). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Hieroglyphs for the Information Age: Images as a Replacement for Characters for Languages not Written in the Latin- 1 Alphabet by Akira Hasegawa A thesis project submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Printing Management and Sciences in the College of Imaging Arts and Sciences of the Rochester Institute ofTechnology May, 1999 Thesis Advisor: Professor Frank Romano School of Printing Management and Sciences Rochester Institute ofTechnology Rochester, New York Certificate ofApproval Master's Thesis This is to certify that the Master's Thesis of Akira Hasegawa With a major in Graphic Arts Publishing has been approved by the Thesis Committee as satisfactory for the thesis requirement for the Master ofScience degree at the convocation of May 1999 Thesis Committee: Frank Romano Thesis Advisor Marie Freckleton Gr:lduate Program Coordinator C. -
IPDS Technical Reference 1
IPDS Technical Reference 1 TABLE OF CONTENTS Manuals for the IPDS card.................................................................................................................................4 Notice..................................................................................................................................................................5 Important.........................................................................................................................................................5 How to Read This Manual................................................................................................................................. 6 Symbols...........................................................................................................................................................6 About This Book..................................................................................................................................................7 Audience.........................................................................................................................................................7 Terminology.................................................................................................................................................... 7 About IPDS.......................................................................................................................................................... 8 Capabilities of IPDS............................................................................................................................................9 -
ISO Basic Latin Alphabet
ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin". -
Unicode and Code Page Support
Natural for Mainframes Unicode and Code Page Support Version 4.2.6 for Mainframes October 2009 This document applies to Natural Version 4.2.6 for Mainframes and to all subsequent releases. Specifications contained herein are subject to change and these changes will be reported in subsequent release notes or new editions. Copyright © Software AG 1979-2009. All rights reserved. The name Software AG, webMethods and all Software AG product names are either trademarks or registered trademarks of Software AG and/or Software AG USA, Inc. Other company and product names mentioned herein may be trademarks of their respective owners. Table of Contents 1 Unicode and Code Page Support .................................................................................... 1 2 Introduction ..................................................................................................................... 3 About Code Pages and Unicode ................................................................................ 4 About Unicode and Code Page Support in Natural .................................................. 5 ICU on Mainframe Platforms ..................................................................................... 6 3 Unicode and Code Page Support in the Natural Programming Language .................... 7 Natural Data Format U for Unicode-Based Data ....................................................... 8 Statements .................................................................................................................. 9 Logical -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
United States Patent (19) 11 Patent Number: 5,689,723 Lim Et Al
US005689723A United States Patent (19) 11 Patent Number: 5,689,723 Lim et al. 45) Date of Patent: Nov. 18, 1997 (54) METHOD FOR ALLOWINGSINGLE-BYTE 5,091,878 2/1992 Nagasawa et al. ..................... 364/419 CHARACTER SET AND DOUBLE-BYTE 5,257,351 10/1993 Leonard et al. ... ... 395/150 CHARACTER SET FONTS IN ADOUBLE 5,287,094 2/1994 Yi....................... ... 345/143 BYTE CHARACTER SET CODE PAGE 5,309,358 5/1994 Andrews et al. ... 364/419.01 5,317,509 5/1994 Caldwell ............................ 364/419.08 75 Inventors: Chan S. Lim, Potomac; Gregg A. OTHER PUBLICATIONS Salsi, Germantown, both of Md.; Isao Nozaki, Yamato, Japan Japanese PUPA number 1-261774, Oct. 18, 1989, pp. 1-2. Inside Macintosh, vol. VI, Apple Computer, Inc., Cupertino, (73) Assignee: International Business Machines CA, Second printing, Jun. 1991, pp. 15-4 through 15-39. Corp, Armonk, N.Y. Karew Acerson, WordPerfect: The Complete Reference, Eds., p. 177-179, 1988. 21) Appl. No.: 13,271 IBM Manual, "DOSBunsho (Language) Program II Opera 22 Filed: Feb. 3, 1993 tion Guide” (N:SH 18-2131-2) (Partial Translation of p. 79). 51 Int. Cl. ... G09G 1/00 Primary Examiner-Phu K. Nguyen 52) U.S. Cl. .................. 395/805; 395/798 Assistant Examiner-Cliff N. Vo (58) Field of Search ..................................... 395/144-151, Attorney, Agent, or Firm-Edward H. Duffield 395/792, 793, 798, 805, 774; 34.5/171, 127-130, 23-26, 143, 116, 192-195: 364/419 57 ABSTRACT The method of the invention allows both single-byte char 56) References Cited acter set (SBCS) and double-byte character set (DBCS) U.S. -
Verity Locale Configuration Guide V5.0 for Peoplesoft
Verity® Locale Configuration Guide V 5.0 for PeopleSoft® November 15, 2003 Original Part Number DM0619 Verity, Incorporated 894 Ross Drive Sunnyvale, California 94089 (408) 541-1500 Verity Benelux BV Coltbaan 31 3439 NG Nieuwegein The Netherlands Copyright 2003 Verity, Inc. All rights reserved. No part of this publication may be reproduced, transmitted, stored in a retrieval system, nor translated into any human or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual or otherwise, without the prior written permission of the copyright owner, Verity, Inc., 894 Ross Drive, Sunnyvale, California 94089. The copyrighted software that accompanies this manual is licensed to the End User for use only in strict accordance with the End User License Agreement, which the Licensee should read carefully before commencing use of the software. Verity®, Ultraseek®, TOPIC®, KeyView®, and Knowledge Organizer® are registered trademarks of Verity, Inc. in the United States and other countries. The Verity logo, Verity Portal One™, and Verity® Profiler™ are trademarks of Verity, Inc. Sun, Sun Microsystems, the Sun logo, Sun Workstation, Sun Operating Environment, and Java are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Xerces XML Parser Copyright 1999-2000 The Apache Software Foundation. All rights reserved. Microsoft is a registered trademark, and MS-DOS, Windows, Windows 95, Windows NT, and other Microsoft products referenced herein are trademarks of Microsoft Corporation. IBM is a registered trademark of International Business Machines Corporation. The American Heritage® Concise Dictionary, Third Edition Copyright 1994 by Houghton Mifflin Company. Electronic version licensed from Lernout & Hauspie Speech Products N.V. -
Configuration Control Document
Configuration Control Document CR8200 Firmware Version 1.12.2 CR950 Firmware Version 2.1.2 CR1500 Firmware Version 1.4.1 CR1100 Firmware Version 1.2.0 CR5200 Firmware Version 1.0.4 CR2700 Firmware Version 1.0.6 A271 Firmware Version 1.0.3 D027153 CR8200 CR950 CR1500 CR1100 CR2700 CRA-A271 Configuration Control Document CCD.Docx Page 1 of 89 © 2013-2019 The Code Corporation 12393 South Gateway Park Place Suite 600, Draper, UT 84020 (801) 495-2200 FAX (801) 495-0280 Configuration Control Document Table of Contents Keyword Table .................................................................................................................. 4 Scope ................................................................................................................................ 6 Notations .......................................................................................................................... 6 Reader Command Overview ............................................................................................. 6 4.1 Configuration Command Architecture ........................................................................................ 6 4.2 Command Format ....................................................................................................................... 7 4.3 Supported Commands ................................................................................................................. 8 4.3.1 <CF> – Configuration Manager ...................................................................................................... -
Legacy Character Sets & Encodings
Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text. -
Database Globalization Support Guide
Oracle® Database Database Globalization Support Guide 19c E96349-05 May 2021 Oracle Database Database Globalization Support Guide, 19c E96349-05 Copyright © 2007, 2021, Oracle and/or its affiliates. Primary Author: Rajesh Bhatiya Contributors: Dan Chiba, Winson Chu, Claire Ho, Gary Hua, Simon Law, Geoff Lee, Peter Linsley, Qianrong Ma, Keni Matsuda, Meghna Mehta, Valarie Moore, Cathy Shea, Shige Takeda, Linus Tanaka, Makoto Tozawa, Barry Trute, Ying Wu, Peter Wallack, Chao Wang, Huaqing Wang, Sergiusz Wolicki, Simon Wong, Michael Yau, Jianping Yang, Qin Yu, Tim Yu, Weiran Zhang, Yan Zhu This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. -
AIX Globalization
AIX Version 7.1 AIX globalization IBM Note Before using this information and the product it supports, read the information in “Notices” on page 233 . This edition applies to AIX Version 7.1 and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright International Business Machines Corporation 2010, 2018. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this document............................................................................................vii Highlighting.................................................................................................................................................vii Case-sensitivity in AIX................................................................................................................................vii ISO 9000.....................................................................................................................................................vii AIX globalization...................................................................................................1 What's new...................................................................................................................................................1 Separation of messages from programs..................................................................................................... 1 Conversion between code sets............................................................................................................. -
Unicode Encoding the TITUS Project
ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Unicode Encoding and Online Data Access Ralf Gehrke / Jost Gippert The TITUS Project („Thesaurus indogermanischer Text- und Sprachmaterialien“) (since 1987/1993) www.ala.org/alcts 1 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Scope of the TITUS project: • Electronic retrieval engine covering the textual heritage of all ancient Indo-European languages • Present retrieval task: – Documentation of the usage of all word forms occurring in the texts, in their resp. contexts • Survey of the parts of the text database: – http://titus.uni-frankfurt.de/texte/texte2.htm Data formats (since 1995): • Text formats: – WordCruncher Text format (8-Bit) – HTML (UTF-8 Unicode 4.0) – (Plain 7-bit ASCII format) • Database format: – MS Access (relational, Unicode-based) – Retrieval via SQL www.ala.org/alcts 2 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Original Scripts Covered: • Latin (with all kinds of diacritics), incl. variants* • Greek • Slavic (Cyrillic and Glagolitic*) • Armenian • Georgian • Devangar • Other Brhm scripts (Tocharian, Khotanese)* • Avestan* • Middle Persian (Pahlav)* • Manichean* • Arabic (incl. Persian) • Runic • Ogham • and many more * not yet encodable (as such) in Unicode Example 1a: Donelaitis (Lithuanian: formatted text incl. diacritics: 8-bit version) www.ala.org/alcts 3 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Example 1b: Donelaitis (Lithuanian: formatted text incl. diacritics: Unicode version) Example 2a: Catechism (Old Prussian: formatted text incl. diacritics: 8-bit version, special TITUS font) www.ala.org/alcts 4 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Example 2b: Catechism (Old Prussian: formatted text incl.