Programming with Unicode Documentation Release 2011

Total Page:16

File Type:pdf, Size:1020Kb

Programming with Unicode Documentation Release 2011 Programming with Unicode Documentation Release 2011 Victor Stinner Oct 01, 2019 Contents 1 About this book 1 1.1 License..................................................1 1.2 Thanks to.................................................1 1.3 Notations.................................................1 2 Unicode nightmare 3 3 Definitions 5 3.1 Character.................................................5 3.2 Glyph...................................................5 3.3 Code point................................................5 3.4 Character set (charset)..........................................5 3.5 Character string.............................................6 3.6 Byte string................................................6 3.7 UTF-8 encoded strings and UTF-16 character strings..........................7 3.8 Encoding.................................................7 3.9 Encode a character string.........................................7 3.10 Decode a byte string...........................................8 3.11 Mojibake.................................................8 3.12 Unicode: an Universal Character Set (UCS)...............................9 4 Unicode 11 4.1 Unicode Character Set.......................................... 11 4.2 Categories................................................ 11 4.3 Statistics................................................. 12 4.4 Normalization.............................................. 12 5 Charsets and encodings 15 5.1 Encodings................................................ 15 5.2 Popularity................................................ 15 5.3 Encodings performances......................................... 16 5.4 Examples................................................. 16 5.5 Handle undecodable bytes and unencodable characters......................... 16 5.5.1 Undecodable byte sequences.................................. 16 5.5.2 Unencodable characters..................................... 16 5.5.3 Error handlers.......................................... 17 5.5.4 Replace unencodable characters by a similar glyph...................... 17 i 5.5.5 Escape the character...................................... 17 5.6 Other charsets and encodings...................................... 17 6 Historical charsets and encodings 19 6.1 ASCII................................................... 19 6.2 ISO 8859 family............................................. 20 6.2.1 ISO 8859-1........................................... 21 6.2.2 cp1252............................................. 21 6.2.3 ISO 8859-15.......................................... 22 6.3 CJK: asian encodings.......................................... 22 6.3.1 Chinese encodings....................................... 22 6.3.2 Japanese encodings....................................... 22 6.3.3 ISO 2022............................................ 23 6.3.4 Extended Unix Code (EUC).................................. 23 6.4 Cyrillic.................................................. 24 7 Unicode encodings 25 7.1 UTF-8.................................................. 25 7.2 UCS-2, UCS-4, UTF-16 and UTF-32.................................. 25 7.3 UTF-7.................................................. 26 7.4 Byte order marks (BOM)......................................... 26 7.5 UTF-16 surrogate pairs.......................................... 26 8 How to guess the encoding of a document? 29 8.1 Is ASCII?................................................. 29 8.2 Check for BOM markers......................................... 30 8.3 Is UTF-8?................................................. 31 8.4 Libraries................................................. 33 9 Good practices 35 9.1 Rules................................................... 35 9.2 Unicode support levels.......................................... 35 9.3 Test the Unicode support of a program................................. 36 9.4 Get the encoding of your inputs..................................... 36 9.5 Switch from byte strings to character strings.............................. 37 10 Operating systems 39 10.1 Windows................................................. 39 10.1.1 Code pages........................................... 39 10.1.2 Encode and decode functions.................................. 40 10.1.3 Windows API: ANSI and wide versions............................ 42 10.1.4 Windows string types...................................... 42 10.1.5 Filenames............................................ 42 10.1.6 Windows console........................................ 42 10.1.7 File mode............................................ 43 10.2 Mac OS X................................................ 43 10.3 Locales.................................................. 43 10.3.1 Locale categories........................................ 44 10.3.2 The C locale........................................... 44 10.3.3 Locale encoding......................................... 44 10.3.4 Locale functions........................................ 44 10.4 Filesystems (filenames)......................................... 45 10.4.1 CD-ROM and DVD....................................... 45 10.4.2 Microsoft: FAT and NTFS filesystems............................. 45 10.4.3 Apple: HFS and HFS+ filesystems............................... 45 ii 10.4.4 Others.............................................. 46 11 Programming languages 47 11.1 C language................................................ 47 11.1.1 Byte API (char)......................................... 47 11.1.2 Byte string API (char*)..................................... 48 11.1.3 Character API (wchar_t).................................... 48 11.1.4 Character string API (wchar_t*)................................ 48 11.1.5 printf functions family..................................... 48 11.2 C++.................................................... 49 11.3 Python.................................................. 50 11.3.1 Python 2............................................. 50 11.3.2 Python 3............................................. 50 11.3.3 Differences between Python 2 and Python 3.......................... 51 11.3.4 Codecs............................................. 51 11.3.5 String methods......................................... 52 11.3.6 Filesystem............................................ 52 11.3.7 Windows............................................ 52 11.3.8 Modules............................................. 52 11.4 PHP.................................................... 53 11.5 Perl.................................................... 54 11.6 Java.................................................... 55 11.7 Go and D................................................. 55 12 Database systems 57 12.1 MySQL.................................................. 57 12.2 PostgreSQL................................................ 57 12.3 SQLite.................................................. 57 13 Libraries 59 13.1 Qt library................................................. 59 13.1.1 Character and string classes................................... 59 13.1.2 Codec.............................................. 60 13.1.3 Filesystem............................................ 60 13.2 The glib library.............................................. 60 13.2.1 Character strings........................................ 60 13.2.2 Codec functions......................................... 60 13.2.3 Filename functions....................................... 61 13.3 iconv library............................................... 61 13.4 ICU libraries............................................... 61 13.5 libunistring................................................ 61 14 Unicode issues 63 14.1 Security vulnerabilities.......................................... 63 14.1.1 Special characters........................................ 63 14.1.2 Non-strict UTF-8 decoder: overlong byte sequences and surrogates.............. 63 14.1.3 Check byte strings before decoding them to character strings................. 64 15 See also 65 Index 67 iii iv CHAPTER 1 About this book The book is written in reStructuredText (reST) syntax and compiled by Sphinx. I started to write in the 25th September 2010. 1.1 License This book is distributed under the CC BY-SA 3.0 license. 1.2 Thanks to Reviewers: Alexander Belopolsky, Antoine Pitrou, Feth Arezki and Nelle Varoquaux, Natal Ngétal. 1.3 Notations • 0bBBBBBBBB: 8 bit unsigned number written in binary, first digit is the most significant. For example, 0b10000000 is 128. • 0xHHHH: number written in hexadecimal, e.g. 0xFFFF is 65535. • 0xHH 0xHH ...: byte sequence with bytes written in hexadecimal, e.g. 0xC3 0xA9 (2 bytes) is the char- acter é (U+00E9) encoded to UTF-8. • U+HHHH: Unicode character with its code point written in hexadecimal. For example, U+20AC is the “euro sign” character, code point 8,364. Big code point are written with more than 4 hexadecimal digits, e.g. U+10FFFF is the biggest (unallocated) code point of Unicode Character Set 6.0: 1,114,111. • A—B: range including start and end. Examples: – 0x00—0x7F is the range 0 through 127 (128 bytes) – U+0000—U+00FF is the range 0 through 255 (256 characters) 1 Programming with Unicode Documentation, Release 2011 • {U+HHHH, U+HHHH, . }: a character string. For example, {U+0041, U+0042, U+0043} is the string “abc” (3 characters). 2 Chapter 1. About this book CHAPTER 2 Unicode nightmare Unicode is the nightmare of many developers (and users) for different, and sometimes good reasons. In the 1980s, only few people read documents
Recommended publications
  • Cumberland Tech Ref.Book
    Forms Printer 258x/259x Technical Reference DRAFT document - Monday, August 11, 2008 1:59 pm Please note that this is a DRAFT document. More information will be added and a final version will be released at a later date. August 2008 www.lexmark.com Lexmark and Lexmark with diamond design are trademarks of Lexmark International, Inc., registered in the United States and/or other countries. © 2008 Lexmark International, Inc. All rights reserved. 740 West New Circle Road Lexington, Kentucky 40550 Draft document Edition: August 2008 The following paragraph does not apply to any country where such provisions are inconsistent with local law: LEXMARK INTERNATIONAL, INC., PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you. This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in later editions. Improvements or changes in the products or the programs described may be made at any time. Comments about this publication may be addressed to Lexmark International, Inc., Department F95/032-2, 740 West New Circle Road, Lexington, Kentucky 40550, U.S.A. In the United Kingdom and Eire, send to Lexmark International Ltd., Marketing and Services Department, Westhorpe House, Westhorpe, Marlow Bucks SL7 3RQ. Lexmark may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
    [Show full text]
  • Tilde-Arrow-Out (~→O)
    Chapter 5: Derivations in Sentential Logic 181 atomic). In the next example, the disjunction is complex (its disjuncts are not atomic). Example 2 (1) (P ´ Q) → (P & Q) Pr (2) •: (P & Q) ∨ (~P & ~Q) ID (3) |~[(P & Q) ∨ (~P & ~Q)] As (4) |•: ¸ DD (5) ||~(P & Q) 3,~∨O (6) ||~(~P & ~Q) 3,~∨O (7) ||~(P ∨ Q) 1,5,→O (8) ||~P 7,~∨O (9) ||~Q 7,~∨O (10) ||~P & ~Q 8,9,&I (11) ||¸ 6,10,¸I The basic strategy is exactly like the previous problem. The only difference is that the formulas are more complex. 13. FURTHER RULES In the previous section, we added the rule ~∨O to our list of inference rules. Although it is not strictly required, it does make a number of derivations much easier. In the present section, for the sake of symmetry, we add corresponding rules for the remaining two-place connectives; specifically, we add ~&O, ~→O, and ~↔O. That way, we have a rule for handling any negated molecular formula. Also, we add one more rule that is sometimes useful, the Rule of Repetition. The additional negation rules are given as follows. Tilde-Ampersand-Out (~&O) ~(d & e) ––––––––– d → ~e Tilde-Arrow-Out (~→O) ~(d → f) –––––––––– d & ~f 182 Hardegree, Symbolic Logic Tilde-Double-Arrow-Out (~±O) ~(d ± e) –––––––––– ~d ± e The reader is urged to verify that these are all valid argument forms of sentential logic. There are other valid forms that could serve equally well as the rules in question. The choice is to a certain arbitrary. The advantage of the particular choice becomes more apparent in a later chapter on predicate logic.
    [Show full text]
  • SLP-5 Setup for Uni-3/5/7 & WM-Nano Chinese BIG5 Characters
    SLP-5 Configuration for Uni-3, Uni-5, Uni-7 & WM-Nano Chinese BIG5 Characters The Uni-3, Uni-5, Uni-7 scales and WM-Nano wrapper can display and print PLU information in Chinese BIG5, also known as Traditional, characters but all menus, etc. remain in English. Chinese characters are entered in SLP-5 and sent to the scales. Chinese characters cannot be entered at the scales. Refer to the following procedure to configure the scales and wrapper and SLP-5. SCALE SETUP Update the firmware for the Uni-5, Uni-7 and WM-Nano to the following versions. Uni-5: C1830x Uni-7: C1828x (PK-260A CF CPU), C2072x (PK-260B SD CPU) WM-Nano: C1832x (PK-260A CF CPU), C2074x (PK-260B SD CPU) Uni-3: C1937U or later, C2271x The Language setting in the Uni-3 must be set to print and display Chinese characters. Note: Only the Uni-3L2 models display Chinese characters. KEY IN ITEM No. Enter 6000, press MODE 0.000 0.000 0.00 0.00 <B00 SETUP> Enter 495344, press PLU <B00 SETUP> <B00 SETUP> Enter 26, press Down Arrow <B00 SETUP> B26 COUNTRY Press ENTER B26 COUNTRY *LANGUAGE 1:ENGLISH Enter 15, press ENTER B26-01-02 LANGUAGE 1 *LANGUAGE 15:FORMOSAN After the long beep press MODE B26-01-02 LANGUAGE 15 REBOOT Shut off the scale for this setting change to 19004-0000 take effect. SLP-5 SETUP SLP-5 must be configured to use Chinese fonts. Refer to the instructions below. The following Chinese character sizes are available.
    [Show full text]
  • Edit Bibliographic Records
    OCLC Connexion Browser Guides Edit Bibliographic Records Last updated: May 2014 6565 Kilgour Place, Dublin, OH 43017-3395 www.oclc.org Revision History Date Section title Description of changes May 2014 All Updated information on how to open the diacritic window. The shortcut key is no longer available. May 2006 1. Edit record: basics Minor updates. 5. Insert diacritics Revised to update list of bar syntax character codes to reflect and special changes in character names and to add newly supported characters characters. November 2006 1. Edit record: basics Minor updates. 2. Editing Added information on guided editing for fields 541 and 583, techniques, template commonly used when cataloging archival materials. view December 2006 1. Edit record: basics Updated to add information about display of WorldCat records that contain non-Latin scripts.. May 2007 4. Validate record Revised to document change in default validation level from None to Structure. February 2012 2 Editing techniques, Series added entry fields 800, 810, 811, 830 can now be used to template view insert data from a “cited” record for a related series item. Removed “and DDC” from Control All commands. DDC numbers are no longer controlled in Connexion. April 2012 2. Editing New section on how to use the prototype OCLC Classify service. techniques, template view September 2012 All Removed all references to Pathfinder. February 2013 All Removed all references to Heritage Printed Book. April 2013 All Removed all references to Chinese Name Authority © 2014 OCLC Online Computer Library Center, Inc. 6565 Kilgour Place Dublin, OH 43017-3395 USA The following OCLC product, service and business names are trademarks or service marks of OCLC, Inc.: CatExpress, Connexion, DDC, Dewey, Dewey Decimal Classification, OCLC, WorldCat, WorldCat Resource Sharing and “The world’s libraries.
    [Show full text]
  • About:Config .Init About:Me About:Presentation Web 2.0 User
    about:config .init about:me about:presentation Web 2.0 User View Technical View Simple Overview Picture Complex Overview Picture Main Problems Statistics I Statistics II .next Targets Of Attack Targets Methods Kinds Of Session Hijacking SQL Injection Introduction Examples SQL Injection Picture Analysis SQL Escaping SQL Escaping #2 SQL Parameter Binding XSS Introduction What Can It Do? Main Problem Types Of XSS Components Involved In XSS Reflected XSS Picture Reflected XSS Analysis (Server Side) Stored XSS Picture Server Side Stored XSS (Local) DOM XSS Example Picture local DOM XSS CSRF Introduction Example Picture CSRF Session Riding Analysis Complex Example Hijack Via DNS + XSS Picture DNS+XSS Combo Cookie Policy Analysis Variant Components .next Misplaced Trust 3rd Party Script Picture Trust 3rd Party Script Analysis Misplaced Trust In Middleware Misplaced Trust In Server­Local Data Picture Local Scripts Analysis Same Origin Policy Frame Policy UI Redressing Introduction Clickjacking Picture Clickjacking Analysis BREAK .next Summary of Defense Strategies "Best Effort" vs. "Best Security" Protection against Hijacking Session Theft Riding, Fixation, Prediction Separate by Trust Validation Why Input Validation at Server Check Origin and Target of Request Validation of Form Fields Validation of File Upload Validation Before Forwarding Validation of Server Output Validation of Target in Client Validation of Origin in Client Validation of Input in Client Normalization What's That? Normalizing HTML Normalizing XHTML Normalizing Image, Audio,
    [Show full text]
  • Consonant Characters and Inherent Vowels
    Global Design: Characters, Language, and More Richard Ishida W3C Internationalization Activity Lead Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 1 Getting more information W3C Internationalization Activity http://www.w3.org/International/ Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 2 Outline Character encoding: What's that all about? Characters: What do I need to do? Characters: Using escapes Language: Two types of declaration Language: The new language tag values Text size Navigating to localized pages Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 3 Character encoding Character encoding: What's that all about? Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 4 Character encoding The Enigma Photo by David Blaikie Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 5 Character encoding Berber 4,000 BC Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 6 Character encoding Tifinagh http://www.dailymotion.com/video/x1rh6m_tifinagh_creation Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 7 Character encoding Character set Character set ⴰ ⴱ ⴲ ⴳ ⴴ ⴵ ⴶ ⴷ ⴸ ⴹ ⴺ ⴻ ⴼ ⴽ ⴾ ⴿ ⵀ ⵁ ⵂ ⵃ ⵄ ⵅ ⵆ ⵇ ⵈ ⵉ ⵊ ⵋ ⵌ ⵍ ⵎ ⵏ ⵐ ⵑ ⵒ ⵓ ⵔ ⵕ ⵖ ⵗ ⵘ ⵙ ⵚ ⵛ ⵜ ⵝ ⵞ ⵟ ⵠ ⵢ ⵣ ⵤ ⵥ ⵯ Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 8 Character encoding Coded character set 0 1 2 3 0 1 Coded character set 2 3 4 5 6 7 8 9 33 (hexadecimal) A B 52 (decimal) C D E F Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 9 Character encoding Code pages ASCII Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 10 Character encoding Code pages ISO 8859-1 (Latin 1) Western Europe ç (E7) Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 11 Character encoding Code pages ISO 8859-7 Greek η (E7) Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 12 Character encoding Double-byte characters Standard Country No.
    [Show full text]
  • Proposal to Add U+2B95 Rightwards Black Arrow to Unicode Emoji
    Proposal to add U+2B95 Rightwards Black Arrow to Unicode Emoji J. S. Choi, 2015‐12‐12 Abstract In the Unicode Standard 7.0 from 2014, ⮕ U+2B95 was added with the intent to complete the family of black arrows encoded by ⬅⬆⬇ U+2B05–U+2B07. However, due to historical timing, ⮕ U+2B95 was not yet encoded when the Unicode Emoji were frst encoded in 2009–2010, and thus the family of four emoji black arrows were mapped not only to ⬅⬆⬇ U+2B05–U+2B07 but also to ➡ U+27A1—a compatibility character for ITC Zapf Dingbats—instead of ⮕ U+2B95. It is thus proposed that ⮕ U+2B95 be added to the set of Unicode emoji characters and be given emoji‐ and text‐style standardized variants, in order to match the properties of its siblings ⬅⬆⬇ U+2B05–U+2B07, with which it is explicitly unifed. 1 Introduction Tis document primarily discusses fve encoded characters, already in Unicode as of 2015: ⮕ U+2B95 Rightwards Black Arrow: Te main encoded character being discussed. Located in the Miscellaneous Symbols and Arrows block. ⬅⬆⬇ U+2B05–U+2B07 Leftwards, Upwards, and Downwards Black Arrow: Te three black arrows that ⮕ U+2B95 completes. Also located in the Miscellaneous Symbols and Arrows block. ➡ U+27A1 Black Rightwards Arrow: A compatibility character for ITC Zapf Dingbats. Located in the Dingbats block. Tis document proposes the addition of ⮕ U+2B95 to the set of emoji characters as defned by Unicode Technical Report (UTR) #51: “Unicode Emoji”. In other words, it proposes: 1. A property change: ⮕ U+2B95 should be given the Emoji property defned in UTR #51.
    [Show full text]
  • SUPPORTING the CHINESE, JAPANESE, and KOREAN LANGUAGES in the OPENVMS OPERATING SYSTEM by Michael M. T. Yau ABSTRACT the Asian L
    SUPPORTING THE CHINESE, JAPANESE, AND KOREAN LANGUAGES IN THE OPENVMS OPERATING SYSTEM By Michael M. T. Yau ABSTRACT The Asian language versions of the OpenVMS operating system allow Asian-speaking users to interact with the OpenVMS system in their native languages and provide a platform for developing Asian applications. Since the OpenVMS variants must be able to handle multibyte character sets, the requirements for the internal representation, input, and output differ considerably from those for the standard English version. A review of the Japanese, Chinese, and Korean writing systems and character set standards provides the context for a discussion of the features of the Asian OpenVMS variants. The localization approach adopted in developing these Asian variants was shaped by business and engineering constraints; issues related to this approach are presented. INTRODUCTION The OpenVMS operating system was designed in an era when English was the only language supported in computer systems. The Digital Command Language (DCL) commands and utilities, system help and message texts, run-time libraries and system services, and names of system objects such as file names and user names all assume English text encoded in the 7-bit American Standard Code for Information Interchange (ASCII) character set. As Digital's business began to expand into markets where common end users are non-English speaking, the requirement for the OpenVMS system to support languages other than English became inevitable. In contrast to the migration to support single-byte, 8-bit European characters, OpenVMS localization efforts to support the Asian languages, namely Japanese, Chinese, and Korean, must deal with a more complex issue, i.e., the handling of multibyte character sets.
    [Show full text]
  • ISO Basic Latin Alphabet
    ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin".
    [Show full text]
  • PCL PC-8, Code Page 437 Page 1 of 5 PCL PC-8, Code Page 437
    PCL PC-8, Code Page 437 Page 1 of 5 PCL PC-8, Code Page 437 PCL Symbol Set: 10U Unicode glyph correspondence tables. Contact:[email protected] http://pcl.to -- -- -- -- $90 U00C9 Ê Uppercase e acute $21 U0021 Ë Exclamation $91 U00E6 Ì Lowercase ae diphthong $22 U0022 Í Neutral double quote $92 U00C6 Î Uppercase ae diphthong $23 U0023 Ï Number $93 U00F4 & Lowercase o circumflex $24 U0024 ' Dollar $94 U00F6 ( Lowercase o dieresis $25 U0025 ) Per cent $95 U00F2 * Lowercase o grave $26 U0026 + Ampersand $96 U00FB , Lowercase u circumflex $27 U0027 - Neutral single quote $97 U00F9 . Lowercase u grave $28 U0028 / Left parenthesis $98 U00FF 0 Lowercase y dieresis $29 U0029 1 Right parenthesis $99 U00D6 2 Uppercase o dieresis $2A U002A 3 Asterisk $9A U00DC 4 Uppercase u dieresis $2B U002B 5 Plus $9B U00A2 6 Cent sign $2C U002C 7 Comma, decimal separator $9C U00A3 8 Pound sterling $2D U002D 9 Hyphen $9D U00A5 : Yen sign $2E U002E ; Period, full stop $9E U20A7 < Pesetas $2F U002F = Solidus, slash $9F U0192 > Florin sign $30 U0030 ? Numeral zero $A0 U00E1 ê Lowercase a acute $31 U0031 A Numeral one $A1 U00ED B Lowercase i acute $32 U0032 C Numeral two $A2 U00F3 D Lowercase o acute $33 U0033 E Numeral three $A3 U00FA F Lowercase u acute $34 U0034 G Numeral four $A4 U00F1 H Lowercase n tilde $35 U0035 I Numeral five $A5 U00D1 J Uppercase n tilde $36 U0036 K Numeral six $A6 U00AA L Female ordinal (a) http://www.pclviewer.com (c) RedTitan Technology 2005 PCL PC-8, Code Page 437 Page 2 of 5 $37 U0037 M Numeral seven $A7 U00BA N Male ordinal (o) $38 U0038
    [Show full text]
  • Unicode and Code Page Support
    Natural for Mainframes Unicode and Code Page Support Version 4.2.6 for Mainframes October 2009 This document applies to Natural Version 4.2.6 for Mainframes and to all subsequent releases. Specifications contained herein are subject to change and these changes will be reported in subsequent release notes or new editions. Copyright © Software AG 1979-2009. All rights reserved. The name Software AG, webMethods and all Software AG product names are either trademarks or registered trademarks of Software AG and/or Software AG USA, Inc. Other company and product names mentioned herein may be trademarks of their respective owners. Table of Contents 1 Unicode and Code Page Support .................................................................................... 1 2 Introduction ..................................................................................................................... 3 About Code Pages and Unicode ................................................................................ 4 About Unicode and Code Page Support in Natural .................................................. 5 ICU on Mainframe Platforms ..................................................................................... 6 3 Unicode and Code Page Support in the Natural Programming Language .................... 7 Natural Data Format U for Unicode-Based Data ....................................................... 8 Statements .................................................................................................................. 9 Logical
    [Show full text]
  • NCSA Telnet for the Macintosh User's Guide
    NCSA Telnet for the Macintosh User’s Guide Version 2.6 • October 1994 National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Contents Introduction Features of NCSA Telnet v Differences between Version 2.5 and Version 2.6 v New Features in Version 2.6 v Discontinued Features vi Bugs Fixed from Version 2.5 vi System Requirements vi Notational Conventions vi 1 Getting Started Installation Note 1-1 Beginning an NCSA Telnet Session 1-1 Opening and Closing a Connection 1-2 Opening a Connection 1-2 Logging on to Your Host 1-3 Setting the BACKSPACE/DELETE Key 1-3 Setting a VT Terminal Type 1-3 Emulating the VT Terminal Keyboard 1-4 Closing a Connection 1-4 Copying, Pasting, and Printing 1-5 Copy and Paste from the Edit Menu 1-5 Print from the File Menu 1-5 Ending an NCSA Telnet Session 1-6 2 Configuration Global Preferences 2-1 New Configuration System in Version 2.6 2-3 Default Configuration Records 2-3 Editing Configuration Records 2-3 Editing Terminal Configuration Records 2-4 Editing Session Configuration Records 2-5 Changing Configuration after Session Connected 2-9 Saved Sets 2-13 Saving a Set 2-14 Using a Saved Set 2-14 Loading a Saved Set 2-15 Macro Definitions 2-15 Reverting to Previous Macro Definitions 2-16 Saving Macros 2-16 3 Advanced Features Cursor Positioning with the Mouse 3-1 Multiple Connections 3-1 Opening More Than One Connection 3-1 Moving between Connections 3-1 Rules for Session Names 3-2 The Connections Menu 3-2 Naming Windows 3-2 Checking Session Status 3-2 Aborting Connection Attempts
    [Show full text]