Edital 310520-0859

Recommended publications
Legacy Character Sets & Encodings
Legacy & Not-So-Legacy Character Sets & Encodings
Ken Lunde, CJKV Type Development, Adobe Systems Incorporated
ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf
15th International Unicode Conference. Copyright © 1999 Adobe Systems Incorporated

Tutorial Overview
• What is a character set? What is an encoding?
• How are character sets and encodings different?
• Legacy character sets.
• Non-legacy character sets.
• Legacy encodings.
• How does Unicode fit in?
• Code conversion issues.
• Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint.

Terminology & Abbreviations
• GB (China)
  - Stands for "Guo Biao" (国标 guóbiāo).
  - Short for "Guojia Biaozhun" (国家标准 guójiā biāozhǔn).
  - Means "National Standard."
• GB/T (China)
  - "T" stands for "Tui" (推 tuī).
  - Short for "Tuijian" (推荐 tuījiàn).
  - "T" means "Recommended."
• CNS (Taiwan)
  - 中國國家標準 (zhōngguó guójiā biāozhǔn) in Chinese.
  - Abbreviation for "Chinese National Standard."
• GCCS (Hong Kong)
  - Abbreviation for "Government Chinese Character Set."
• JIS (Japan)
  - 日本工業規格 (nihon kōgyō kikaku) in Japanese.
  - Abbreviation for "Japanese Industrial Standard."
  - 〄
• KS (Korea)
  - 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean.
  - Abbreviation for "Korean Standard."
  - ㉿
  - Designation changed from "C" to "X" on August 20, 1997.
• TCVN (Vietnam)
  - Tiêu Chuẩn Việt Nam in Vietnamese.
  - Means "Vietnamese Standard."
• CJKV
  - Chinese, Japanese, Korean, and Vietnamese.

What Is A Character Set?
• A collection of characters that are intended to be used together to create meaningful text.
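The difference between a character set and an encoding can be seen by encoding a single ideograph several ways. Below is a minimal Python sketch, not taken from the tutorial, that encodes 中 (U+4E2D). It assumes Python's standard-library codecs as stand-ins for the legacy encodings: `gb2312` (GB 2312 in EUC-CN form), `big5`, `euc_jp` and `shift_jis` (two encodings of the same JIS X 0208 character set), and `euc_kr` (KS X 1001). The `ENCODINGS` table and the `encode_all` helper are illustrative names.

```python
# Sketch: one character, several coded character sets and encodings.
# Codec names are Python standard-library codec aliases (an assumption of this
# example, standing in for the legacy encodings discussed in the tutorial).
CHAR = "中"  # U+4E2D, present in GB 2312, Big Five, JIS X 0208, and KS X 1001

ENCODINGS = {
    "GB 2312 (EUC-CN)": "gb2312",
    "Big Five": "big5",
    "JIS X 0208 (EUC-JP)": "euc_jp",
    "JIS X 0208 (Shift-JIS)": "shift_jis",
    "KS X 1001 (EUC-KR)": "euc_kr",
    "Unicode (UTF-8)": "utf-8",
    "Unicode (UTF-16BE)": "utf-16-be",
}


def encode_all(char: str) -> dict[str, bytes]:
    """Return the byte sequence for `char` in each encoding listed above."""
    return {label: char.encode(codec) for label, codec in ENCODINGS.items()}


if __name__ == "__main__":
    print(f"U+{ord(CHAR):04X}")
    for label, raw in encode_all(CHAR).items():
        # bytes.hex(sep) prints the bytes as space-separated hex pairs
        print(f"{label:24} {raw.hex(' ').upper()}")
```

In the output, GB 2312 gives D6 D0, Big Five gives A4 A4, and JIS X 0208 gives C3 E6 in EUC-JP and 92 86 in Shift-JIS. The JIS pair shows that one character set can be serialized by more than one encoding.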
Implementing Cross-Locale CJKV Code Conversion
Implementing Cross-Locale CJKV Code Conversion
Ken Lunde, CJKV Type Development, Adobe Systems Incorporated
ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf
ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf
September 10, 1998. Copyright © 1998 Adobe Systems Incorporated

Code Conversion Basics
• Algorithmic code conversion
  - Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP
  - A purely mathematical process
• Table-driven code conversion
  - Required across locales: Chinese ↔ Japanese
  - Required when dealing with Unicode
  - Mapping tables are required
  - Can sometimes be faster than algorithmic code conversion, depending on the implementation
• CJKV character set differences
  - Different number of characters
  - Different ordering of characters
  - Different characters

Character Sets Versus Encodings
• Common CJKV character set standards
  - China: GB 1988-89, GB 2312-80; GB 1988-89, GBK
  - Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992
  - Hong Kong: ASCII, Big Five with Hong Kong extension
  - Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990
  - South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991
  - North Korea: ASCII (?), KPS 9566-97
  - Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995
• Common CJKV encodings
  - Locale-independent: EUC-*, ISO-2022-*
  - Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code
  - Other: UCS-2, UCS-4, UTF-7, UTF-8,
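The slides call conversion between Shift-JIS and EUC-JP "a purely mathematical process." The following minimal Python sketch of that arithmetic works under stated assumptions:
• Only JIS X 0208 and half-width katakana are handled.
• JIS X 0212 (EUC-JP's SS3 code set) has no Shift-JIS form and is rejected.
• Bytes below 0x80 are treated as ASCII.
• Shift-JIS user-defined lead bytes are rejected.
• Trail bytes are not validated.

The function names are hypothetical and do not come from the paper.

```python
def _jis_to_sjis(j1: int, j2: int) -> tuple[int, int]:
    """Map a JIS X 0208 row/cell byte pair (each 0x21-0x7E) to a Shift-JIS pair."""
    s1 = ((j1 - 0x21) >> 1) + 0x81        # two JIS rows share one Shift-JIS lead byte
    if s1 > 0x9F:                         # jump over the 0xA0-0xDF single-byte katakana range
        s1 += 0x40
    if j1 & 1:                            # odd JIS row: trail bytes 0x40-0x7E, 0x80-0x9E
        s2 = j2 + 0x1F
        if s2 >= 0x7F:                    # 0x7F is never used as a trail byte
            s2 += 1
    else:                                 # even JIS row: trail bytes 0x9F-0xFC
        s2 = j2 + 0x7E
    return s1, s2


def _sjis_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Inverse of _jis_to_sjis."""
    odd_row = s2 < 0x9F
    j1 = ((s1 - (0x70 if s1 < 0xA0 else 0xB0)) << 1) - (1 if odd_row else 0)
    if odd_row:
        j2 = s2 - (0x1F if s2 < 0x7F else 0x20)
    else:
        j2 = s2 - 0x7E
    return j1, j2


def euc_jp_to_sjis(data: bytes) -> bytes:
    """Convert EUC-JP bytes to Shift-JIS (a truncated final character raises IndexError)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII passes through unchanged
            out.append(b)
            i += 1
        elif b == 0x8E:                   # SS2: half-width katakana, drop the prefix
            out.append(data[i + 1])
            i += 2
        elif 0xA1 <= b <= 0xFE:           # JIS X 0208 with the high bit set on both bytes
            out += bytes(_jis_to_sjis(b - 0x80, data[i + 1] - 0x80))
            i += 2
        else:                             # includes 0x8F (SS3, JIS X 0212)
            raise ValueError(f"cannot convert byte 0x{b:02X} at offset {i}")
    return bytes(out)


def sjis_to_euc_jp(data: bytes) -> bytes:
    """Convert Shift-JIS bytes to EUC-JP (a truncated final character raises IndexError)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII passes through unchanged
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xDF:           # half-width katakana gets the SS2 prefix
            out += bytes((0x8E, b))
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            j1, j2 = _sjis_to_jis(b, data[i + 1])
            out += bytes((j1 | 0x80, j2 | 0x80))
            i += 2
        else:                             # includes user-defined lead bytes 0xF0-0xFC
            raise ValueError(f"cannot convert byte 0x{b:02X} at offset {i}")
    return bytes(out)
```

For example, `euc_jp_to_sjis(b"\xc3\xe6")` (中) yields `b"\x92\x86"`. No mapping table is involved. That contrasts with cross-locale or Unicode conversion, which the slides note is table-driven.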
N2403 Date: 2002-04-22
ISO/IEC JTC 1/SC 2/WG 2 N2403
Date: 2002-04-22
ISO/IEC JTC 1/SC 2/WG 2, Universal Multiple-Octet Coded Character Set (UCS) - ISO/IEC 10646
Secretariat: ANSI
Doc type: Meeting Minutes
Title: Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19
Source: V.S. Umamaheswaran, Recording Secretary, and Mike Ksar, Convener
Project: JTC 1.02.18 – ISO/IEC 10646
Status: SC 2/WG 2 participants are requested to review the attached unconfirmed minutes, act on appropriate noted action items, and send any comments or corrections to the convener as soon as possible but no later than 2002-05-15.
Action ID: ACT
Due date: 2002-05-15
Action: WG 2 members and Liaison organizations
Distribution: SC 2/WG 2 members and Liaison organizations
Medium: Paper
No. of pages: 45 (including cover sheet)

Mike Ksar, Convener – ISO/IEC JTC 1/SC 2/WG 2
Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, U.S.A.
Phone: +1 425 707-6973

1 Opening and roll call
Input document: N2367 2nd Call and updated preliminary agenda – WG 2 meeting 41; Ksar; 2001-08-10
The convener Mr.
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition; however, the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations.
The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions.
The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided.
The Dai Kan-Wa Jiten, used as the source of reference Kanji codes, was written by Tetsuji Morohashi and published by Taishukan Shoten.
Fontes & Codages
Fontes & codages
Yannis Haralambous
To cite this version: Yannis Haralambous. Fontes & codages. O'Reilly France, 2004, 2-84177-273-X. hal-02112931
HAL Id: hal-02112931
https://hal.archives-ouvertes.fr/hal-02112931
Submitted on 27 Apr 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Fontes & codages, Yannis Haralambous
Éditions O'REILLY, 18 rue Séguier, 75006 PARIS, http://www.oreilly.fr
Cambridge • Cologne • Farnham • Paris • Beijing • Sebastopol • Taipei • Tokyo
Cover designed by Emma Colby and Hanna Dyer. Editing: Xavier Cazin.
The programs in this book are intended to illustrate the topics covered. No guarantee is given as to how they will function once compiled, assembled, or interpreted for professional or commercial use.
© Éditions O'Reilly, Paris, 2004
ISBN 2-84177-273-X
Any representation or reproduction, in whole or in part, made without the consent of the author or of the author's successors in title or assigns is unlawful (law of 11 March 1957, Article 40, paragraph 1). Such representation or reproduction, by any process whatsoever, would constitute an infringement punishable under Articles 425 et seq. of the Penal Code.
Fonts & Encodings
Fonts & Encodings
Yannis Haralambous
To cite this version: Yannis Haralambous. Fonts & Encodings. O'Reilly, 2007, 978-0-596-10242-5. hal-02112942
HAL Id: hal-02112942
https://hal.archives-ouvertes.fr/hal-02112942
Submitted on 27 Apr 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Fonts & Encodings, Yannis Haralambous, translated by P. Scott Horne
Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo
Fonts & Encodings by Yannis Haralambous
Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938.
Printing History: September 2007: First Edition.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Fonts & Encodings, the image of an axis deer, and related trade dress are trademarks of O'Reilly Media, Inc.
Korean LGP Status Update, ICANN #54 DUB (Dublin), 2015.10
Korean LGP Status Update, ICANN #54 DUB (Dublin), 2015.10

Agenda
• Introduction and a list of Hangul syllables for K-LGR v0.3
• A list of Hangul syllables and Hanja characters for K-LGR v0.3
• Review of C (Chinese) and K (Korean) variant groups
• Timeline of KLGP activities

1. Introduction
Characters to be included in "kore" (Korean Label): both Hangeul (Hangul) and Hanja are included.
K-LGR v0.2 was revised to K-LGR v0.3 (2015.08.13.).

2. K-LGR v0.3
A list of Hangul syllables for K-LGR v0.3 (2015.08.13.): 11,172 Hangul syllables (U+AC00 to U+D7A3)
A list of Hanja characters for K-LGR v0.3 (2015.08.13.):
Source of Hanja character set: number of characters
1) KS X 1001 (268 compatibility characters excluded): 4,620
2) KPS 9566: 4,653
3) IICORE, K column marked: 4,743
4) IICORE, KP column marked (= KPS 9566): 4,653
5) Qualifying Test of Korean Hanja Proficiency (한국 한자 능력 검정 시험): 4,641
K-LGR v0.3 (2015.08.13.) Hanja list: 4,819

3. Review of C (Chinese) and K (Korean) Variant Groups
C-LGR (2015.04.30.): 3,093 variant groups (a variant group is composed of two or more variants)
K-LGR v0.3 (2015.08.13.): 37 variant groups
Analysis of the 3,093 C (Chinese) variant groups extracted 303 variant groups that contain two or more K characters.
• A K character is a character belonging to K-LGR v0.3 (2015.08.13.).
Korea classified the 303 variant groups into three categories.
3.
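The 11,172 Hangul syllables in the list are not an arbitrary count. They equal 19 leading consonants × 21 vowels × 28 trailing options (including "no trailing consonant"), laid out arithmetically from U+AC00. Below is a minimal Python sketch of that composition arithmetic as described in the Unicode Standard. The constant and function names are illustrative, and the result is checked against Unicode NFD.

```python
import unicodedata

# Hangul syllable composition arithmetic (Unicode Standard, conjoining jamo behavior).
S_BASE = 0xAC00                 # first precomposed syllable, U+AC00
L_COUNT = 19                    # leading consonants (choseong)
V_COUNT = 21                    # vowels (jungseong)
T_COUNT = 28                    # trailing consonants (jongseong); index 0 means none
N_COUNT = V_COUNT * T_COUNT     # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT     # 11,172 syllables in total


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Build a precomposed syllable from jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return the (leading, vowel, trailing) indices of a precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{ord(syllable):04X} is not a precomposed Hangul syllable")
    return s_index // N_COUNT, (s_index % N_COUNT) // T_COUNT, s_index % T_COUNT


def to_jamo(syllable: str) -> str:
    """Spell a syllable with conjoining jamo (bases U+1100, U+1161, U+11A7)."""
    l, v, t = decompose(syllable)
    return chr(0x1100 + l) + chr(0x1161 + v) + (chr(0x11A7 + t) if t else "")


if __name__ == "__main__":
    print(S_COUNT, f"U+{S_BASE:04X}..U+{S_BASE + S_COUNT - 1:04X}")  # 11172 U+AC00..U+D7A3
    print(decompose("한"))                                           # (18, 0, 4)
    print(to_jamo("한") == unicodedata.normalize("NFD", "한"))       # True
```

The last syllable, U+D7A3, is S_BASE + 11,171. This explains why the K-LGR range ends there.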
Interoperability Landscaping Report December 28, 2015
Ref. Ares(2016)2259176 - 13/05/2016
Interoperability Landscaping Report, December 28, 2015
Deliverable Code: D5.1
Version: 3 – Intermediary
Dissemination level: PUBLIC
H2020-EINFRA-2014-2015 / H2020-EINFRA-2014-2
Topic: EINFRA-1-2014 Managing, preserving and computing with big research data
Research & Innovation action
Grant Agreement 654021

Document Description
D5.1 – Interoperability Landscaping Report
WP5 – Interoperability Framework
WP participating organizations: ARC, UNIMAN, UKP-TUDA, INRA, EMBL-EBI, AK, LIBER, UvA, OU, EPFL, CNIO, USFD, GESIS, GRNET, Frontiers, UoS
Contractual Delivery Date: 12/2015
Actual Delivery Date: 12/2015
Nature: Report
Version: 1.0 (Draft)
Public Deliverable

Preparation slip (name, organization, date)
• Authors: listed in the document
• Edited by: Piotr Przybyła (UNIMAN), 20/12/2015; Matthew Shardlow (UNIMAN)
• Reviewed by: John McNaught (UNIMAN), 20/12/2015; Natalia Manola (ARC), 22/12/2015
• Approved by: Natalia Manola (ARC), 16/1/2016
• For delivery: Mike Hatzopoulos (ARC)

Document change record (issue, item, reason for change, author, organization)
• V0.1, Draft version: Initial document structure. Piotr Przybyła, Matthew Shardlow (UNIMAN)
• V0.2, Draft version: Included initial content. Piotr Przybyła, Matthew Shardlow (UNIMAN)
• V1.0, First delivery: Applied corrections and added discussion. Piotr Przybyła, Matthew Shardlow (UNIMAN)

Table of Contents
1. INTRODUCTION
2. REPOSITORIES
2.1 DESCRIPTION OF RESOURCES
2.1.1 METADATA SCHEMAS & PROFILES
2.1.2 VOCABULARIES AND ONTOLOGIES FOR DESCRIBING SPECIFIC INFORMATION TYPES
2.1.3 MECHANISMS USED FOR THE IDENTIFICATION OF RESOURCES
2.1.4 SUMMARY
2.2 REGISTRIES/REPOSITORIES OF RESOURCES
3.
IRG Principles and Procedures
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 2/WG 2/IRG
Universal Coded Character Set (UCS)
ISO/IEC JTC 1/SC 2/WG 2/IRGN2424, WG2N5161, SC2N4752
(Revision of IRG N1503/N1772/N1823/N1920/N1942/N1975/N2016/N2092/N2153/N2222/N2275/N2310/N2345/N2408)
Confirmed 2021-04-05
Title: IRG Principles and Procedures (IRG PnP) Version 13
Source: IRG Convenor
Action: For review by IRG and WG2
Distribution: IRG Reviewers and Ideographic Experts
Editor in chief: Lu Qin, IRG Convenor
References: Recommendations from IRG #53 (IRGN2410 & IRG2408, IRGN2412), IRG #52 (IRGN2360), IRGN2345 drafts, and feedback from Ken Lunde and HKSAR; IRG #51 (IRGN2329); IRG #49 (IRGN2275); IRG #48 (IRGN2220); IRG #47 (IRGN2180); IRG #45 (IRGN2150); IRG #44 (IRGN2080), IRGN2016 and IRGN1975; IRG #42 (IRGN1952 and feedback from HKSAR, Japan, ROK and TCA), IRG1920 Draft (2012-11-15), Draft 2 (2013-05-04) and Draft 3 (2013-05-22), feedback from Japan (2013-04-23) and ROK (2013-05-16 and 2013-05-21); IRG #40 discussions, IRG1823 Draft 3 and feedback from HKSAR, Korea; IRG #39 discussions, IRGN1823 Draft 2 feedback from HKSAR and Japan, from KIM Kyongsok, IRGN1781 and N1782 feedback from KIM Kyongsok, IRGN1772 (P&P Version 5), IRGN1646 (P&P Version 4 draft), IRGN1602 (P&P Draft 4) and IRGN1633 (P&P Editorial Report), IRGN1601 (P&P Draft 3 feedback from HKSAR), IRGN1590 and IRGN1601 (P&P V2 and V3 draft and all feedback), IRGN1562 (P&P V3 Draft 1 and feedback from HKSAR), IRGN1561 (P&P V2 and all feedback), IRGN1559 (P&P V2 Draft and all feedback), IRGN1516 (P&P V1 feedback from HKSAR), IRGN1489 (P&P V1 feedback from Taichi Kawabata), IRGN1487 (P&P V1 feedback from HKSAR), IRGN1465, IRGN1498 and IRGN1503 (P&P V1 drafts)
Table of Contents
1.
ISO/IEC International Standard ISO/IEC 10646
ISO/IEC International Standard ISO/IEC 10646 Final Committee Draft
Information technology – Universal Coded Character Set (UCS)
Technologie de l'information – Jeu universel de caractères codés (JUC)
Second edition, 2010
ISO/IEC 10646:2010 (E) Final Committee Draft (FCD)

PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

© ISO/IEC 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
Web www.iso.ch
Printed in Switzerland

CONTENTS
Foreword
DIGITAL BREAD CRUMBS: Seven Clues to Identifying Who's Behind Advanced Cyber Attacks
REPORT
DIGITAL BREAD CRUMBS: Seven Clues To Identifying Who's Behind Advanced Cyber Attacks

CONTENTS
Executive Summary
Introduction
1. Keyboard Layout
2. Malware Metadata
3. Embedded Fonts
4. DNS Registration
5. Language
Crash Course on Character Encodings
Crash Course on Character Encodings
Yusuke Shinyama, NYCNLP, Oct. 27, 2006

Introduction
Are they the same?
• Unicode
• UTF

Two Mappings
Character → character code (Unicode, a "character set") → byte sequence (UTF-8, an "encoding scheme")
• A: 65 → 65
• ض: 1590 → 216 182
• 美: 32654 → 231 190 142

Terminology
• Character Set
  - Mapping from abstract characters to numbers.
• Encoding Scheme
  - Way to represent (encode) a number as a byte sequence in a decodable way.
  - Only necessary for character sets that have more than 256 characters.

In ASCII...
Character: character code → byte sequence
• 5: 53 → 53
• A: 65 → 65
• m: 109 → 109

Character Sets
• ≤ 256 characters:
  - ASCII (English)
  - ISO 8859-1 (English & Western European languages)
  - KOI8 (Cyrillic)
  - ISO 8859-6 (Arabic)
• > 256 characters:
  - Unicode
  - GB 2312 (Simplified Chinese)
  - Big5 (Traditional Chinese)
  - JIS X 0208 (Japanese)
  - KPS 9566 (North Korean)

Character codes compared (ASCII / ISO 8859-1 / ISO 8859-6 / GB 2312 / Unicode; "-" means not encoded)
• A: 65 / 65 / 65 / 65 / 65
• ë: - / 235 / - / - / 235
• ض: - / - / 214 / - / 1590
• 美: - / - / - / 50112 / 32654
• ♥: - / - / - / - / 9829

Unicode Standard
• History
  - ISO Universal Character Set (1989)
  - Unicode 1.0 (1991): 16-bit fixed-length codes.
  - Unicode 2.0 (1996): "Oops, we've got many more." Extended beyond 16 bits via surrogate pairs, giving code points up to U+10FFFF.
  - Unicode 5.0 (2006): keeps growing...
• Hexadecimal notation (U+XXXX).
• ISO 8859-1 is preserved as the first 256 characters:
  - U+0041 A: ISO 8859-1 65 (Unicode 65)
  - U+00EB ë: ISO 8859-1 235 (Unicode 235)

Problems in Unicode
• Politics (Microsoft, Apple, Sun, ...)
• Lots of application-specific characters.
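The "Two Mappings" slide can be reproduced in code. The first mapping (character to code point) is `ord`. The second mapping (code point to bytes) is the UTF-8 bit layout. Below is a minimal Python sketch that writes the UTF-8 step out by hand for illustration. `utf8_encode` is a hypothetical helper; in practice `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:                          # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),       # 11110xxx plus three continuation bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    # A, ARABIC LETTER DAD, 美, and a heart, written as escapes to avoid RTL display issues
    for ch in "A\u0636\u7f8e\u2665":
        cp = ord(ch)                               # first mapping: character -> code point
        print(ch, cp, list(utf8_encode(cp)))       # second mapping: code point -> bytes
```

This prints `[65]`, `[216, 182]`, `[231, 190, 142]`, and `[226, 153, 165]`. The first three match the byte sequences in the "Two Mappings" slide. Characters at or above U+0080 always take two or more bytes, which is why UTF-8 stays ASCII-compatible.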