The Unicode Standard 5.0 Code Charts
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
Assessment of Options for Handling Full Unicode Character Encodings in MARC21: A Study for the Library of Congress
Assessment of Options for Handling Full Unicode Character Encodings in MARC21
A Study for the Library of Congress
Part 1: New Scripts
Jack Cain, Senior Consultant, Trylus Computing, Toronto

1 Purpose
This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format.

1.1 “Encoding Scheme” vs. “Repertoire”
Characters are represented in computer memory by codes, and these codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding scheme. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B” and “C” are three characters in the repertoire of this encoding scheme. They are assigned the codes 41, 42 and 43 in ASCII (expressed here in hexadecimal).

1.2 MARC8
"MARC8" is the term commonly used to refer both to the encoding scheme and to its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode, which is a multi-byte-per-character code set, the MARC8 encoding scheme is principally made up of multiple one-byte tables in which each character is encoded using a single 8-bit byte. (It also includes the EACC set, which actually uses a fixed length of 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited essentially to the Latin script.
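The Python snippet below is a minimal sketch of the difference between a repertoire and an encoding scheme. It encodes the three example characters with the standard `ascii` codec and checks whether a character belongs to the ASCII repertoire at all. Python's standard library has no MARC8 codec, so UTF-8 stands in here as the example of a multi-byte Unicode encoding. The helper name `in_ascii_repertoire` is illustrative only.

```python
# "A", "B" and "C" are in the ASCII repertoire; their codes are 41, 42, 43 (hex).
for ch in "ABC":
    print(ch, ch.encode("ascii").hex().upper())


def in_ascii_repertoire(ch: str) -> bool:
    """Hypothetical helper: True if ch has a code in the ASCII encoding scheme."""
    try:
        ch.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# GREEK CAPITAL LETTER OMEGA (U+03A9) is outside the ASCII repertoire.
# In UTF-8, one of Unicode's encoding forms, it takes two bytes.
omega = "\u03a9"
print(in_ascii_repertoire(omega), omega.encode("utf-8").hex().upper())
```

The same character can therefore have a code in one scheme, none in another, and a different number of bytes in a third. This is why the repertoire question and the encoding question are treated separately above.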
Presentation of the Possibilities Offered by IFAO-Grec Unicode in Greek and Coptic, Especially in the PUA
DESCRIPTION OF THE IFAO-GREC UNICODE FONT
This font is first of all a Greek and Coptic font. It contains the most important critical and diacritical signs, and the sigla and symbols used in editing papyrological and epigraphical texts, as well as Greek texts of specialized content such as mathematics, astronomy, magic, music, and poetry. The font is naturally compatible with other Greek fonts in standard Unicode format (Main Plane 0). It also tries to be as compatible as possible with, e.g., NewAthenaUnicode in the Private Use Area (PUA) and the new Plane 1 area. However, it offers several possibilities that do not exist in other fonts. The font is designed to harmonise with Times New Roman, in both style and dimensions. It was conceived by Jean-Luc Fournet, and the Unicode version is the work of Ralph Hancock. Adam Bülow-Jacobsen helped in various ways. IFAO-Grec Unicode is issued free of all rights. Since no font is perfect or complete, please notify Jean-Luc Fournet of any errors or omissions so that we can correct them in future versions.

Below you will find a brief presentation of the possibilities offered by IFAO-Grec Unicode in Greek and Coptic, especially in the PUA. Characters are designated by their Unicode number, e.g. ‘0353’, ‘E504’, or ‘1F00’.

The official area, Main Plane 0
1) 0300–0385: mostly diacritics, both normally spaced and zero-width (combining). Accents, breathings, iota subscript, diaeresis, macron and/or breve already exist in combination with letters (1F00 sqq. and EAF3 sqq.). They can also be typed separately after the letters, as combining marks from this series or from the series E501–E50B in the PUA.
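Python's standard `unicodedata` module can show the difference between a precomposed letter and a letter typed with a separate combining mark. This sketch relies only on the standard Unicode ranges. It treats U+E000–U+F8FF as the Private Use Area of the Basic Multilingual Plane (what the text calls Main Plane 0), and `is_bmp_private_use` is a hypothetical helper. The sketch knows nothing about which glyph IFAO-Grec assigns to U+E504, because that assignment is specific to the font.

```python
import unicodedata


def is_bmp_private_use(ch: str) -> bool:
    """True if ch lies in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


# Alpha (U+03B1) followed by the combining psili (U+0313) normalizes (NFC)
# to the precomposed character in the Greek Extended block.
decomposed = "\u03b1\u0313"
composed = unicodedata.normalize("NFC", decomposed)
print(f"U+{ord(composed):04X}", unicodedata.name(composed))

# A PUA code point has general category "Co" and no decomposition, so
# normalization leaves a letter + PUA mark sequence exactly as it was.
pua = "\u03b1\ue504"
print(is_bmp_private_use("\ue504"), unicodedata.category("\ue504"))
print(unicodedata.normalize("NFC", pua) == pua)
```

PUA characters carry no standard properties. Text that uses them therefore displays as intended only with a font that shares the same PUA assignments.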
Information, Characters, Unicode
Information, Characters, Unicode
Unicode © 24 August 2021 1 / 107

Hidden Moral
Small mistakes can be catastrophic!

Style
Care about every character of your program.

Tip: printf
Care about every character in the program’s output. (Be reasonably tolerant and defensive about the input. “Fail early” and clearly.)

Imperative
Thou shalt care about every character in your program.

Imperative
Thou shalt know every character in the input. Thou shalt care about every character in your output.

Information – Characters
In modern computing, natural-language text is very important information (“number-crunching” is less important). Characters of text are represented in several different ways, and a known character encoding is necessary to exchange text information. For many years an important encoding standard for characters has been US ASCII, a 7-bit encoding. Since 7 does not divide 32, the ubiquitous word size of computers, 8-bit encodings are more common. Very common is ISO 8859-1, aka “Latin-1,” along with other 8-bit encodings of character sets for languages other than English. Currently, a very large multilingual character repertoire known as Unicode is gaining importance.

US ASCII (7-bit), or bottom half of Latin-1
NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
@   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
`   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL

Control Characters
Notice that the first two rows are filled with so-called control characters.
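A short Python sketch can check the points about control characters, 7-bit ASCII and 8-bit Latin-1. It uses the standard `unicodedata` module and the built-in `ascii`, `latin-1` and `utf-8` codecs. UTF-8 is included only for comparison, since the slides above do not discuss it.

```python
import unicodedata

# Control characters in US ASCII: the first two rows (0x00-0x1F) plus DEL (0x7F).
controls = [cp for cp in range(128) if unicodedata.category(chr(cp)) == "Cc"]
print(len(controls), hex(controls[0]), hex(controls[-1]))  # 33 0x0 0x7f

# The ASCII table is the bottom half of Latin-1: both encodings give the
# same single byte for every code point 0x00-0x7F.
assert all(chr(cp).encode("latin-1") == chr(cp).encode("ascii") for cp in range(128))

# An 8-bit Latin-1 character such as "é" (U+00E9) takes one byte in Latin-1,
# has no code in 7-bit ASCII, and needs two bytes in UTF-8.
e_acute = "\u00e9"
print(e_acute.encode("latin-1"))  # b'\xe9'
print(e_acute.encode("utf-8"))    # b'\xc3\xa9'
try:
    e_acute.encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable in ASCII:", err.reason)
```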
Greek Polytonic – Easy Accent Overview
Greek Polytonic – Easy Accent
This document contains four sections:
1 Overview
2 Layout of the Easy Accent Keyboard
3 Shift State Graphics
4 Installation Instructions

Overview
Easy Accent is a custom Greek Polytonic keyboard that John Schwandt (Senior Fellow, New Saint Andrews College) created to facilitate typing in ancient Greek while using the modern national Greek keyboard layout.

Microsoft Windows comes with a Greek Polytonic keyboard that anyone can add to their system. Once it is added, all you need to do is press Alt+Shift to change from typing in English to typing in Greek. Since this is done in the operating system, it works in any program, even in system programs such as naming files and directories. That keyboard has all of the letters in the same locations as the modern national layout, but this doesn't leave much room for all of the possible combinations of diacritics. If you have tried to access these markings on that keyboard, you understand how difficult and unintuitive it is. There had to be an easier way.

Since Greek is generally written either in uncials (upper case) or minuscules (lower case), without frequent shifting between the two, the Shift key seemed better used for one of the far more frequent accent marks. This led to the design in which every accent is a shift state rather than a dead key. The trick was to find three shift states that would still allow for all case forms and not conflict with the “Control” or “Alt” shift states, which many programs use for hotkeys. The Easy Accent keyboard uses Shift, AltGr (right Alt), and AltGr+Shift for these shift states.
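The shift-state idea can be modelled as a lookup from a letter and a shift state to that letter plus a combining accent, normalized to a precomposed character. This is a hypothetical sketch. The three shift states come from the text, but which accent each state produces is an assumption made for illustration, not the actual Easy Accent layout.

```python
import unicodedata
from typing import Optional

# Hypothetical accent assignment: the three shift states are named in the
# text, but which accent each one produces is assumed here for illustration.
SHIFT_STATE_ACCENT = {
    "Shift": "\u0301",        # combining acute accent (oxia/tonos)
    "AltGr": "\u0300",        # combining grave accent (varia)
    "AltGr+Shift": "\u0342",  # combining Greek perispomeni (circumflex)
}


def type_key(letter: str, state: Optional[str] = None) -> str:
    """Return what one keystroke produces in the given shift state.

    With no shift state the bare letter comes out. Otherwise the state's
    accent is appended and NFC composes the pair into one character.
    """
    if state is None:
        return letter
    return unicodedata.normalize("NFC", letter + SHIFT_STATE_ACCENT[state])


for state in (None, "Shift", "AltGr", "AltGr+Shift"):
    ch = type_key("\u03b1", state)  # GREEK SMALL LETTER ALPHA
    print(f"{str(state):12} {ch} U+{ord(ch):04X}")
```

In this model every accented letter takes a single keystroke, so there is no pending dead-key state to track.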
Part Two: The First Problem with Word-Processing in Ancient Greek
Word Processing in Greek - Part Two
The first problem with word-processing in ancient Greek is finding a font with Greek characters (see Part 1). When you have a Unicode font with the polytonic Greek characters, the next problem is how to access them from the keyboard.

Problem #2 - Accessing the Greek characters
In order to do word-processing in ancient Greek, one needs not only a font with Greek characters (as well as the Latin characters used for English) but also a way to access them easily from the keyboard. There are programs available, some as free downloads, which provide a "virtual keyboard": one uses a single physical keyboard but switches between character sets. Some can switch between Hebrew, Russian, even simple Chinese. However, I have not found one which is completely reliable. One of the best, which I used for some time, works very well up to a point. Then it seems to run out of memory and freezes the word-processor or even crashes the computer.

However, I have found a simpler method using MS Word. It takes a bit of time to set up, but less time than installing and getting used to a virtual keyboard program. If you don't have MS Word, the word-processing program you do have will probably have a similar capability. RTLM (Read the lovely manual) and find out how to "insert symbol" and "allocate short cuts". Using MS Word and a Unicode font, it is possible to set up a series of "short cuts": keystrokes which access characters outside the "Basic Latin" range associated with the normal keys.
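A shortcut table of this kind is essentially a map from keystrokes to code points outside Basic Latin (U+0000–U+007F). The Python sketch below models one. The key combinations and the helpers `check_shortcuts` and `type_sequence` are hypothetical. They do not reflect MS Word's own shortcut assignments or any MS Word API.

```python
import unicodedata

# Hypothetical shortcut assignments, for illustration only.
SHORTCUTS = {
    "Alt+A": "\u03b1",        # GREEK SMALL LETTER ALPHA
    "Alt+B": "\u03b2",        # GREEK SMALL LETTER BETA
    "Alt+Shift+A": "\u1f00",  # GREEK SMALL LETTER ALPHA WITH PSILI
    "Alt+Ctrl+A": "\u1fb3",   # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
}

BASIC_LATIN = range(0x0000, 0x0080)  # U+0000..U+007F, reachable from normal keys


def check_shortcuts(table: dict[str, str]) -> list[str]:
    """Return one report line per shortcut.

    Raise ValueError for any target inside Basic Latin, since such a
    character can already be typed without a shortcut.
    """
    report = []
    for keys, ch in table.items():
        if ord(ch) in BASIC_LATIN:
            raise ValueError(f"{keys} maps to Basic Latin {ch!r}; no shortcut needed")
        report.append(f"{keys:12} U+{ord(ch):04X} {unicodedata.name(ch)}")
    return report


def type_sequence(keys: list[str], table: dict[str, str]) -> str:
    """Expand a sequence of shortcut keystrokes into text."""
    return "".join(table[k] for k in keys)


for line in check_shortcuts(SHORTCUTS):
    print(line)
```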
The Unicode Standard, Version 3.0, Issued by the Unicode Consortium and Published by Addison-Wesley
The Unicode Standard
Version 3.0
The Unicode Consortium
ADDISON–WESLEY
An Imprint of Addison Wesley Longman, Inc.
Reading, Massachusetts · Harlow, England · Menlo Park, California · Berkeley, California · Don Mills, Ontario · Sydney · Bonn · Amsterdam · Tokyo · Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations.

The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If these files have been purchased on computer-readable media, the sole remedy for any claim will be exchange of defective media within ninety days of receipt.

Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.

ISBN 0-201-61633-5
Copyright © 1991-2000 by Unicode, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher or Unicode, Inc.
Hellenic Polytonic HOWTO Table of Contents
Hellenic Polytonic HOWTO
Copyright (c) 2005-2011 Dimitri Marinakis. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Table of Contents
Introduction
History
Acknowledgements
References
Disclaimer
Accents and “Breathings”
Output
Input
Internal representation
GNU/Linux keyboard translation
Hellenic (polytonic) under X
Hellenic polytonic fonts
Font installation
Hellenic polytonic keyboard under X
Hellenic polytonic keyboard under KDE and GNOME
Locales
Hellenic keyboard at the console (CLI)
No polytonic keyboard
Word-processing and display applications
GNU Free Documentation License

Hellenic_polytonic_HOWTO-111210 Page 1 of 28

Introduction
The current state of handling Hellenic language characters on GNU/Linux systems is examined. Emphasis is placed on the use of Hellenic polytonic (accented) fonts and keyboard drivers. Try the pdf version of this document if the html version is not rendered correctly.

History
20111210 – Major update – tlgu.carmen.gr
20060505 – Information update – more free fonts
20060422 – Information update – X information
20051011 – Information update – gnome information
20050830 – Date correction – information request
20050722 – Amendments, addition of keyboard layouts
20050717 – First release – cap8.gr/tlgu

Acknowledgements
Greek Unicode Character Chart-Extended
Greek Extended
Range: 1F00–1FFF
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 12.1.
This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata.
See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts.
See http://www.unicode.org/charts/PDF/Unicode-12.1/ for charts showing only the characters added in Unicode 12.1.
See http://www.unicode.org/Public/12.1.0/charts/ for a complete archived file of character code charts for Unicode 12.1.

Disclaimer
These charts are provided as the online reference to the character contents of the Unicode Standard, Version 12.1, but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 12.1, online at http://www.unicode.org/versions/Unicode12.1.0/. Also consult Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/
A thorough understanding of the information contained in these additional sources is required for a successful implementation.
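Python's `unicodedata` module can check membership in the Greek Extended block and look up the character names listed in the chart, as in the sketch below. `unicodedata` reflects the Unicode version the interpreter was built with (reported by `unicodedata.unidata_version`), which may differ from 12.1. `in_greek_extended` and `describe` are illustrative helpers.

```python
import unicodedata

GREEK_EXTENDED = range(0x1F00, 0x2000)  # U+1F00..U+1FFF inclusive


def in_greek_extended(ch: str) -> bool:
    """True if the single character ch lies in the Greek Extended block."""
    return ord(ch) in GREEK_EXTENDED


def describe(cp: int) -> str:
    """Format a code point chart-style as 'U+XXXX NAME'."""
    name = unicodedata.name(chr(cp), "<unassigned>")
    return f"U+{cp:04X} {name}"


print("UCD version:", unicodedata.unidata_version)
for cp in (0x1F00, 0x1F04, 0x1F16, 0x1F84, 0x1FB6):
    print(describe(cp))
```

The block contains reserved gaps, such as U+1F16, so code that walks the range should expect unnamed code points.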
GreekKeys Unicode 2008 User’s Guide
GreekKeys Unicode 2008
USER’S GUIDE
by Donald Mastronarde
©2008 American Philological Association

NEITHER THE AMERICAN PHILOLOGICAL ASSOCIATION NOR APPLE INC. NOR MICROSOFT CORPORATION MAKES ANY WARRANTIES, EITHER EXPRESS OR IMPLIED, REGARDING THE ENCLOSED COMPUTER SOFTWARE PACKAGE, ITS MERCHANTABILITY OR ITS FITNESS FOR ANY PARTICULAR PURPOSE. THE EXCLUSION OF IMPLIED WARRANTIES IS NOT PERMITTED BY SOME STATES. THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. THIS WARRANTY PROVIDES YOU WITH SPECIFIC LEGAL RIGHTS. THERE MAY BE OTHER RIGHTS THAT YOU MAY HAVE WHICH VARY FROM STATE TO STATE. IN NO EVENT WILL APPLE INC. OR MICROSOFT CORPORATION OR THE AMERICAN PHILOLOGICAL ASSOCIATION BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL OR INDIRECT DAMAGES (INCLUDING DAMAGES FOR DOWNTIME, COSTS OF RECOVERING OR REPRODUCING DATA AND THE LIKE) ARISING OUT OF THE USE OR INABILITY TO USE GREEKKEYS, EVEN IF THEY HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. BECAUSE SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATIONS MAY NOT APPLY TO YOU.

GreekKeys 2008 Licenses
See the document GreekKeysLicense.pdf, which is included in the download and is also available at https://webfiles.berkeley.edu/~pinax/greekkeys/pdfs/GreekKeysLicense.pdf

Macintosh and OS X are registered trademarks of Apple Inc. Microsoft Word is a registered trademark of Microsoft Corporation. PostScript is a registered trademark of Adobe Systems, Inc. Other brand or product names are trademarks of their respective holders.

Online sales
GreekKeys 2008 is sold via a web store hosted by eSellerate at: http://store.esellerate.net/apa/apagreekkeys

Technical support
Technical support for this product is limited.
GreekKeys 2015
GreekKeys 2015
USER GUIDE
by Donald Mastronarde
©2015 Society for Classical Studies

NEITHER THE SOCIETY FOR CLASSICAL STUDIES NOR APPLE INC. NOR MICROSOFT CORPORATION MAKES ANY WARRANTIES, EITHER EXPRESS OR IMPLIED, REGARDING THE ENCLOSED COMPUTER SOFTWARE PACKAGE, ITS MERCHANTABILITY OR ITS FITNESS FOR ANY PARTICULAR PURPOSE. THE EXCLUSION OF IMPLIED WARRANTIES IS NOT PERMITTED BY SOME STATES. THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. THIS WARRANTY PROVIDES YOU WITH SPECIFIC LEGAL RIGHTS. THERE MAY BE OTHER RIGHTS THAT YOU MAY HAVE WHICH VARY FROM STATE TO STATE. IN NO EVENT WILL APPLE INC. OR MICROSOFT CORPORATION OR THE SOCIETY FOR CLASSICAL STUDIES BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL OR INDIRECT DAMAGES (INCLUDING DAMAGES FOR DOWNTIME, COSTS OF RECOVERING OR REPRODUCING DATA AND THE LIKE) ARISING OUT OF THE USE OR INABILITY TO USE GREEKKEYS, EVEN IF THEY HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. BECAUSE SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATIONS MAY NOT APPLY TO YOU.

GreekKeys 2015 Licenses
The licenses for the keyboards and fonts are in the GK2015 Licenses folder in the GK2015 Documentation folder included in the download. They are also available at the SCS web site.

Macintosh and OS X are registered trademarks of Apple Inc. Microsoft Word is a registered trademark of Microsoft Corporation. PostScript is a registered trademark of Adobe Systems, Inc. Other brand or product names are trademarks of their respective holders.

Distribution
GreekKeys 2015 is distributed by the Society for Classical Studies at the association’s web site.
Before and After Unicode: Working with Polytonic Greek, Donald Mastronarde, U. of California, Berkeley
Before and After Unicode: Working with Polytonic Greek¹
Donald Mastronarde, U. of California, Berkeley

After the age of the punch card, when input and output for computing turned to the human-readable form of numbers and letters, these processes were very much English-centered. Computer engineers and programmers had little interest in the needs of multilingual and multiscript texts. When character sets other than plain-vanilla US English did become available, each set was limited in size to 256 items, and the real limit was more like 220. For many purposes, only 128 items could be handled more or less gracefully, while the items in the upper half of a larger set might be ignored or misinterpreted by some programs. When the TLG originally began digitizing polytonic Greek texts, the betacode transcription and an elaborate collection of beta escapes were needed to cover the multitude of characters and symbols in specialized texts.

The introduction of the Macintosh in 1984 inaugurated a new era for those who wanted to use fonts for specialized scripts. Users could create and edit bitmap fonts and print them without difficulty on the ImageWriter, Apple’s dot-matrix printer, and they could hack the Mac’s system resources to adapt keyboard input for the customized font. The late George Walsh recognized the potential benefit to Hellenists and released SMK GreekKeys soon after the appearance of the first Mac. The same capability attracted users of other non-Roman scripts.

¹ This is a slight revision (Feb. 2008) of the presentation made at the panel “Fonts, Encodings, Word-Processing and Publication: a tutorial for classicists on fonts and Unicode” at the Annual Meeting of the American Philological Association in Montreal, January 2006.
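The simplified Python sketch below shows what a betacode transcription involves by converting lower-case Beta Code to Unicode. It covers only the 24 letters and the common diacritic symbols ( `)` `(` `/` `\` `=` `+` `|` ). Capitals (marked with `*`), the beta escapes mentioned above, and the TLG's other conventions are deliberately left out. Check the mapping against the TLG's own Beta Code documentation before relying on it.

```python
import re
import unicodedata

# Simplified Beta Code letters (lower case only; input is lower-cased first).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritic symbols mapped to Unicode combining marks.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing (psili)
    "(": "\u0314",   # rough breathing (dasia)
    "/": "\u0301",   # acute (oxia)
    "\\": "\u0300",  # grave (varia)
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript (ypogegrammeni)
}


def beta_to_unicode(beta: str) -> str:
    """Convert simplified Beta Code (no capitals, no escapes) to NFC Greek."""
    out = []
    for ch in beta.lower():
        # Letters and diacritic symbols are mapped; spaces and other
        # punctuation pass through unchanged.
        out.append(LETTERS.get(ch) or DIACRITICS.get(ch) or ch)
    text = "".join(out)
    # A sigma not followed by a letter or a combining mark ends a word,
    # so it becomes final sigma.
    text = re.sub(r"σ(?![\w\u0300-\u036f])", "ς", text)
    # NFC composes each letter with its combining marks where a
    # precomposed character exists.
    return unicodedata.normalize("NFC", text)


print(beta_to_unicode("lo/gos"), beta_to_unicode("a)/|"))
```

Composing the marks into precomposed characters relies on Unicode normalization, something that was not available when the transcription was designed.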
Latin-1 Supplement
Supported .eu IDN characters
Basic Latin
U+002D - HYPHEN-MINUS
U+0030 0 DIGIT ZERO
U+0031 1 DIGIT ONE
U+0032 2 DIGIT TWO
U+0033 3 DIGIT THREE
U+0034 4 DIGIT FOUR
U+0035 5 DIGIT FIVE
U+0036 6 DIGIT SIX
U+0037 7 DIGIT SEVEN
U+0038 8 DIGIT EIGHT
U+0039 9 DIGIT NINE
U+0041 A LATIN CAPITAL LETTER A
U+0042 B LATIN CAPITAL LETTER B
U+0043 C LATIN CAPITAL LETTER C
U+0044 D LATIN CAPITAL LETTER D
U+0045 E LATIN CAPITAL LETTER E
U+0046 F LATIN CAPITAL LETTER F
U+0047 G LATIN CAPITAL LETTER G
U+0048 H LATIN CAPITAL LETTER H
U+0049 I LATIN CAPITAL LETTER I
U+004A J LATIN CAPITAL LETTER J
U+004B K LATIN CAPITAL LETTER K
U+004C L LATIN CAPITAL LETTER L
U+004D M LATIN CAPITAL LETTER M
U+004E N LATIN CAPITAL LETTER N
U+004F O LATIN CAPITAL LETTER O
U+0050 P LATIN CAPITAL LETTER P
U+0051 Q LATIN CAPITAL LETTER Q
U+0052 R LATIN CAPITAL LETTER R
U+0053 S LATIN CAPITAL LETTER S
U+0054 T LATIN CAPITAL LETTER T
U+0055 U LATIN CAPITAL LETTER U
U+0056 V LATIN CAPITAL LETTER V
U+0057 W LATIN CAPITAL LETTER W
U+0058 X LATIN CAPITAL LETTER X
U+0059 Y LATIN CAPITAL LETTER Y
U+005A Z LATIN CAPITAL LETTER Z
U+0061 a LATIN SMALL LETTER A
U+0062 b LATIN SMALL LETTER B
U+0063 c LATIN SMALL LETTER C
U+0064 d LATIN SMALL LETTER D
U+0065 e LATIN SMALL LETTER E
U+0066 f LATIN SMALL LETTER F
U+0067 g LATIN SMALL LETTER G
U+0068 h LATIN SMALL LETTER H
U+0069 i LATIN SMALL LETTER I
U+006A j LATIN SMALL LETTER J
U+006B k LATIN SMALL LETTER K
U+006C l LATIN SMALL LETTER L
U+006D m LATIN SMALL LETTER M
U+006E n LATIN SMALL LETTER N
U+006F o LATIN
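The Python sketch below checks a label against a list like this one and converts a domain to the ASCII-compatible form used in DNS. It rests on two assumptions. First, the allowed set covers only the Basic Latin entries: the excerpt breaks off at U+006F, and the remaining lower-case letters are assumed to follow. Second, Python's built-in `idna` codec (IDNA 2003) stands in for whatever conversion rules the .eu registry actually applies. `label_uses_listed_chars` and `to_ace` are hypothetical helpers.

```python
import string

# Basic Latin entries from the list above: hyphen-minus, digits and letters.
# Assumption: the lower-case run, cut off at U+006F here, continues to "z".
LISTED_BASIC_LATIN = frozenset("-" + string.digits + string.ascii_letters)


def label_uses_listed_chars(label: str, allowed: frozenset[str] = LISTED_BASIC_LATIN) -> bool:
    """True if every character of a domain label is in the allowed set."""
    return all(ch in allowed for ch in label)


def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (Punycode) form
    using Python's built-in "idna" codec, which implements IDNA 2003."""
    return domain.encode("idna").decode("ascii")


print(label_uses_listed_chars("abc-123"))      # True
print(label_uses_listed_chars("b\u00fccher"))  # False: ü is outside this partial set
print(to_ace("b\u00fccher.eu"))                # xn--bcher-kva.eu
```

Here `b\u00fccher` is rejected only because the sketch's set stops at Basic Latin.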