Administration: Globalization Table of Contents

Total Page:16

File Type:pdf, Size:1020Kb

Administration: Globalization Table of Contents PUBLIC SAP IQ 16.0 SP08 Document Version: 2.0 - 2014-06-20 Administration: Globalization Table of Contents 1 About International Language Data.................................................4 1.1 International Languages and Character Sets............................................4 1.1.1 What is ICU, and when is it needed?........................................... 4 1.2 Character sets................................................................. 6 1.2.1 Code Pages in Windows....................................................6 1.2.2 How the Collation Sequence Sorts Characters....................................9 1.3 Collations....................................................................10 1.3.1 SQL Anywhere Collation Algorithm (SACA)......................................11 1.3.2 Unicode Collation Algorithm (UCA)...........................................12 1.3.3 Collations in a database...................................................14 1.3.4 Alternate collations...................................................... 15 1.3.5 Turkish character sets and collations..........................................18 1.4 Locales......................................................................21 1.4.1 Locale language.........................................................21 1.4.2 Locale character set......................................................21 1.5 Understanding Character Set Translation............................................. 22 1.5.1 Character Translation for Database Messages...................................22 1.5.2 Connection Strings and Character Sets........................................23 1.5.3 Avoiding Character Set Translation...........................................23 2 ANSI and OEM Code Pages......................................................25 2.1 ANSI ISO_1 Collation............................................................25 2.2 ANSI 1252LATIN1 Collation........................................................25 2.3 ANSI ISO1LATIN1 Collation........................................................25 2.4 ANSI ISO9LATIN1 Collation....................................................... 26 3 Displaying Collations...........................................................27 4 Multibyte Collations...........................................................28 4.1 Japanese Language Support...................................................... 28 4.2 Thai Language Support..........................................................28 5 Locale information............................................................ 29 6 Selecting a Collation to Support a Specific Locale.................................... 30 7 Choosing Character Sets and Collation Sequences to Maximize Performance................31 8 Setting a Locale.............................................................. 32 8.1 List of Language Label Values......................................................32 8.2 Setting the Locale for an INSERT...LOCATION Statement..................................33 PUBLIC Administration: Globalization 2 © 2014 SAP AG or an SAP affiliate company. All rights reserved. Table of Contents 9 Disabling Character Set Translation on a Database Server..............................35 10 Creating a Database with the Default Collation.......................................36 11 Changing the database collation..................................................37 12 Important Disclaimers on Legal Aspects............................................39 Administration: Globalization PUBLIC Table of Contents © 2014 SAP AG or an SAP affiliate company. All rights reserved. 3 1 About International Language Data When you create a database, you specify a collating sequence, or collation, to be used by the database. A collation is a combination of a character set and a sort order for characters in the database. 1.1 International Languages and Character Sets Configure your SAP® IQ installation to handle international language issues. The database collation sorts and compares all character data types in the database, including object names, such as table and column names. SAP IQ takes advantage of the space efficiency and speed of the SAP® SQL Anywhere® Collation Algorithm. ● The database option SORT_COLLATION allows implicit use of the SORTKEY function on ORDER BY expressions. When the value of this option is set to a valid collation name or collation ID, any string expression in the ORDER BY clause is automatically treated as if the SORTKEY function has been invoked. ● The SORTKEY function uses the International Components for Unicode (ICU) library, instead of the Sybase ® Unicode Infrastructure Library (Unilib ). Sort-key values created using a version of SAP IQ earlier than 15 do not contain the same values created using later versions. Regenerate any sort key values in your database that were generated using a version of SAP IQ earlier than 15. ● The CREATE DATABASE parameter COLLATION supports specification of a collation for a database. The collation of the database must match the collation used by the operating system. The default character set for CREATE DATABASE is ISO_BINENG. ● The CP874toUTF8 utility converts data in the CP874 character set into UTF8 collation, supported by SAP IQ for the Thai language. The CP874toUTF8 utility calls the ICU library to perform data conversion. You can also use this utility to load data in the CP874 character set without converting the data to UTF8. SAP IQ no longer supports custom collations. If you rebuild a database with a custom collation, rebuild in a single step to preserve the custom collation. If you unload the database and then load the schema and data into a database you create, you must use one of the supplied collations. For more information about changes to database collations, and a list of collations deprecated in SAP IQ 15, see New Features in SAP IQ 15.0. Use the iqunload utility to migrate to the current version of SAP IQ from an existing 12.7 database that was created with a deprecated collation. 1.1.1 What is ICU, and when is it needed? ICU, or International Components for Unicode, is an open source library developed and maintained by IBM. ICU facilitates software internationalization by providing Unicode support. Certain character set conversions and collation operations are implemented using ICU. PUBLIC Administration: Globalization 4 © 2014 SAP AG or an SAP affiliate company. All rights reserved. About International Language Data When is ICU needed on the database server? Ideally, ICU should always be available for use by the database server. The following table specifies when and why ICU is needed: ICU is needed when... Notes Unicode Collation Algorithm (UCA) is used as the colla­ UCA requires ICU. tion for the NCHAR or CHAR character set. The database character set is not UTF-8, but is a multi- For password conversion from the database character byte character set. set to UTF-8 (database passwords are stored in UTF-8, internally). The client and database character sets are different, Proper conversion to and from a multi-byte character and when either of them is multi-byte (including set requires ICU. UTF-8). This includes Unicode ODBC, OLE DB, ADO.NET, and SQL Anywhere JDBC applications, re­ gardless of the database character set where at least one of these clients does not have ICU. The database character set is not UTF-8 and conver­ The database server requires ICU to convert UTF-8 to sion between CHAR and NCHAR values is required. another character set. An Embedded SQL client uses an NCHAR character set The database server requires ICU to convert UTF-8 to other than UTF-8. another character set. The default Embedded SQL cli­ ent NCHAR character set is the same as the initial cli­ ent CHAR character set. This can be changed using the db_change_nchar_charset function. The CSCONVERT or SORTKEY functions are used. The Character set conversion o and from a multi-byte char­ CSCONVERT function is called to convert between acter set requires ICU. Sortkey generation for many character sets that conform to the requirements of the sortkey labels requires UCA, which, in turn, requires third point above. ICU. When can I get correct character set conversion on the database server without ICU? You can get correct character set conversion without ICU when both the database character set and client character set are single-byte and sqlany.cvf is available (all platforms), or if the operating system supports the conversion (Windows only). This is because single-byte to single-byte conversions can be processed without ICU if the sqlany.cvf file is available, or the host operating system has the appropriate converters installed. When is ICU needed on the client? For Unicode client applications, you are likely to get better combined client and database server performance when all clients have ICU installed, regardless of the database character set. This is because some of the required Administration: Globalization PUBLIC About International Language Data © 2014 SAP AG or an SAP affiliate company. All rights reserved. 5 conversion activity may be offloaded from the database server to the client, and because fewer conversions are required. Also, if you are using ODBC on Windows platforms, you must have ICU installed on the client, even for ANSI applications. This is because the driver manager converts ANSI ODBC calls to Unicode ODBC calls. 1.2 Character sets A character set is a set of symbols, including letters, digits, spaces, and other symbols. Each piece of software
Recommended publications
  • Man'yogana.Pdf (574.0Kb)
    Bulletin of the School of Oriental and African Studies http://journals.cambridge.org/BSO Additional services for Bulletin of the School of Oriental and African Studies: Email alerts: Click here Subscriptions: Click here Commercial reprints: Click here Terms of use : Click here The origin of man'yogana John R. BENTLEY Bulletin of the School of Oriental and African Studies / Volume 64 / Issue 01 / February 2001, pp 59 ­ 73 DOI: 10.1017/S0041977X01000040, Published online: 18 April 2001 Link to this article: http://journals.cambridge.org/abstract_S0041977X01000040 How to cite this article: John R. BENTLEY (2001). The origin of man'yogana. Bulletin of the School of Oriental and African Studies, 64, pp 59­73 doi:10.1017/S0041977X01000040 Request Permissions : Click here Downloaded from http://journals.cambridge.org/BSO, IP address: 131.156.159.213 on 05 Mar 2013 The origin of man'yo:gana1 . Northern Illinois University 1. Introduction2 The origin of man'yo:gana, the phonetic writing system used by the Japanese who originally had no script, is shrouded in mystery and myth. There is even a tradition that prior to the importation of Chinese script, the Japanese had a native script of their own, known as jindai moji ( , age of the gods script). Christopher Seeley (1991: 3) suggests that by the late thirteenth century, Shoku nihongi, a compilation of various earlier commentaries on Nihon shoki (Japan's first official historical record, 720 ..), circulated the idea that Yamato3 had written script from the age of the gods, a mythical period when the deity Susanoo was believed by the Japanese court to have composed Japan's first poem, and the Sun goddess declared her son would rule the land below.
    [Show full text]
  • Consonant Characters and Inherent Vowels
    Global Design: Characters, Language, and More Richard Ishida W3C Internationalization Activity Lead Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 1 Getting more information W3C Internationalization Activity http://www.w3.org/International/ Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 2 Outline Character encoding: What's that all about? Characters: What do I need to do? Characters: Using escapes Language: Two types of declaration Language: The new language tag values Text size Navigating to localized pages Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 3 Character encoding Character encoding: What's that all about? Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 4 Character encoding The Enigma Photo by David Blaikie Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 5 Character encoding Berber 4,000 BC Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 6 Character encoding Tifinagh http://www.dailymotion.com/video/x1rh6m_tifinagh_creation Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 7 Character encoding Character set Character set ⴰ ⴱ ⴲ ⴳ ⴴ ⴵ ⴶ ⴷ ⴸ ⴹ ⴺ ⴻ ⴼ ⴽ ⴾ ⴿ ⵀ ⵁ ⵂ ⵃ ⵄ ⵅ ⵆ ⵇ ⵈ ⵉ ⵊ ⵋ ⵌ ⵍ ⵎ ⵏ ⵐ ⵑ ⵒ ⵓ ⵔ ⵕ ⵖ ⵗ ⵘ ⵙ ⵚ ⵛ ⵜ ⵝ ⵞ ⵟ ⵠ ⵢ ⵣ ⵤ ⵥ ⵯ Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 8 Character encoding Coded character set 0 1 2 3 0 1 Coded character set 2 3 4 5 6 7 8 9 33 (hexadecimal) A B 52 (decimal) C D E F Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 9 Character encoding Code pages ASCII Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 10 Character encoding Code pages ISO 8859-1 (Latin 1) Western Europe ç (E7) Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 11 Character encoding Code pages ISO 8859-7 Greek η (E7) Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 12 Character encoding Double-byte characters Standard Country No.
    [Show full text]
  • Unicode Ate My Brain
    UNICODE ATE MY BRAIN John Cowan Reuters Health Information Copyright 2001-04 John Cowan under GNU GPL 1 Copyright • Copyright © 2001 John Cowan • Licensed under the GNU General Public License • ABSOLUTELY NO WARRANTIES; USE AT YOUR OWN RISK • Portions written by Tim Bray; used by permission • Title devised by Smarasderagd; used by permission • Black and white for readability Copyright 2001-04 John Cowan under GNU GPL 2 Abstract Unicode, the universal character set, is one of the foundation technologies of XML. However, it is not as widely understood as it should be, because of the unavoidable complexity of handling all of the world's writing systems, even in a fairly uniform way. This tutorial will provide the basics about using Unicode and XML to save lots of money and achieve world domination at the same time. Copyright 2001-04 John Cowan under GNU GPL 3 Roadmap • Brief introduction (4 slides) • Before Unicode (16 slides) • The Unicode Standard (25 slides) • Encodings (11 slides) • XML (10 slides) • The Programmer's View (27 slides) • Points to Remember (1 slide) Copyright 2001-04 John Cowan under GNU GPL 4 How Many Different Characters? a A à á â ã ä å ā ă ą a a a a a a a a a a a Copyright 2001-04 John Cowan under GNU GPL 5 How Computers Do Text • Characters in computer storage are represented by “small” numbers • The numbers use a small number of bits: from 6 (BCD) to 21 (Unicode) to 32 (wchar_t on some Unix boxes) • Design choices: – Which numbers encode which characters – How to pack the numbers into bytes Copyright 2001-04 John Cowan under GNU GPL 6 Where Does XML Come In? • XML is a textual data format • XML software is required to handle all commercially important characters in the world; a promise to “handle XML” implies a promise to be international • Applications can do what they want; monolingual applications can mostly ignore internationalization Copyright 2001-04 John Cowan under GNU GPL 7 $$$ £££ ¥¥¥ • Extra cost of building-in internationalization to a new computer application: about 20% (assuming XML and Unicode).
    [Show full text]
  • SUPPORTING the CHINESE, JAPANESE, and KOREAN LANGUAGES in the OPENVMS OPERATING SYSTEM by Michael M. T. Yau ABSTRACT the Asian L
    SUPPORTING THE CHINESE, JAPANESE, AND KOREAN LANGUAGES IN THE OPENVMS OPERATING SYSTEM By Michael M. T. Yau ABSTRACT The Asian language versions of the OpenVMS operating system allow Asian-speaking users to interact with the OpenVMS system in their native languages and provide a platform for developing Asian applications. Since the OpenVMS variants must be able to handle multibyte character sets, the requirements for the internal representation, input, and output differ considerably from those for the standard English version. A review of the Japanese, Chinese, and Korean writing systems and character set standards provides the context for a discussion of the features of the Asian OpenVMS variants. The localization approach adopted in developing these Asian variants was shaped by business and engineering constraints; issues related to this approach are presented. INTRODUCTION The OpenVMS operating system was designed in an era when English was the only language supported in computer systems. The Digital Command Language (DCL) commands and utilities, system help and message texts, run-time libraries and system services, and names of system objects such as file names and user names all assume English text encoded in the 7-bit American Standard Code for Information Interchange (ASCII) character set. As Digital's business began to expand into markets where common end users are non-English speaking, the requirement for the OpenVMS system to support languages other than English became inevitable. In contrast to the migration to support single-byte, 8-bit European characters, OpenVMS localization efforts to support the Asian languages, namely Japanese, Chinese, and Korean, must deal with a more complex issue, i.e., the handling of multibyte character sets.
    [Show full text]
  • 7$0,/³(1*/,6+ ',&7,21$5< COMMON SPOKEN TAMIL MADE EASY
    7$0,/³(1*/,6+ ',&7,21$5< )RU8VH:LWK &20021632.(17$0,/ 0$'(($6< 7KLUG(GLWLRQ E\ 79$',.(6$9$/8 'LJLWDO9HUVLRQ &+5,67,$10(',&$/&2//(*(9(//25( Adi’s Book, Tamil-English Dictionary. KEY adv. adverb n. noun pron. pronoun v. verb nom. takes nominatve case dat. takes dative case adj. adjective acc. takes accusative case d.b. taked declensional base intr. intransitive tr. transitive neut. neuter imp. impersonal pers. personal incl inclusive excl. exclusive m. masculine f. feminine sing. singular pl. plural w. weak s. strong TAMIL ENGLISH TAMIL ENGLISH KX ZQ become, happen DNN sister (older) KX ZQ happen, become DNNXº arm pit DE\VDP exercise (n.) DºD VQGK measure DE\VDPVHL ZGK exercise (v.) alamaram banyan tree adangu (w.in) obey DºDYXHGX VWKWK measurements (take m.) adangu (w.in) subside alladhu or adhan its P yes DGKDQOHKDL\OH therefore DPYVH no moon DGK same ambadhu fifty DGKLKUDP authority DPP madam DGKLNODLOH early morning DPPWK\U mother adhisayam wonder (n.) ammi grindstone adhu it ¼ male adhu (pron. neut.) that Q but DGKXNNXWKKXQGKSÀOD accordingly anai dam adhunga their (neut.) DQVLDQFKL pine apple adhunga they (things) DQEXQVDP love, kindness adhunga those (things)(are) QGDYDU Lord DGKXQJDºH them (things) andha (adj) that adi (s.thth) strike andha (pron.neut.) those adikkadi often DQKD many GX goat, sheep DQKDP almost, mostly aduppu oven, stove ange there aduththa next DQJHGKQ there only ahalam width anju five ahappadu (w.tt) caught, (to be) found D¼¼DQ brother (elder) DK\DYLPQDP aeroplane anumadhi permission (n.) K\DPYQXP sky anumadhi permit (n.) LSÀFKFKX FRO finished anumadhi kodu (s.thth) permit (v.) DLQQÌUX five hundred anuppu (w.in) send LSÀ ZQ exhaust, finish apdi that way DL\ sir DSGLGKQ that is it DL\À oh dear DSGL\ indeed?, is that so? DL\ÀSYDP poor soul! DSGL\" really?, right? DNN elder sister DSSWKDKDSSDQ father Adi’s Book, Tamil-English Dictionary.
    [Show full text]
  • Character Set Migration Best Practices For
    Character Set Migration Best Practices $Q2UDFOH:KLWH3DSHU October 2002 Server Globalization Technology Oracle Corporation Introduction - Database Character Set Migration Migrating from one database character set to another requires proper strategy and tools. This paper outlines the best practices for database character set migration that has been utilized on behalf of hundreds of customers successfully. Following these methods will help determine what strategies are best suited for your environment and will help minimize risk and downtime. This paper also highlights migration to Unicode. Many customers today are finding Unicode to be essential to supporting their global businesses. Oracle provides consulting services for very large or complex environments to help minimize the downtime while maximizing the safe migration of business critical data. Why migrate? Database character set migration often occurs from a requirement to support new languages. As companies internationalize their operations and expand services to customers all around the world, they find the need to support data storage of more World languages than are available within their existing database character set. Historically, many legacy systems required support for only one or possibly a few languages; therefore, the original character set chosen had a limited repertoire of characters that could be supported. For example, in America a 7-bit character set called ASCII is satisfactory for supporting English data exclusively. While in Europe a variety of 8 bit European character sets can support specific subsets of European languages together with English. In Asia, multi byte character sets that could support a given Asian language and English were chosen. These were reasonable choices that fulfilled the initial requirements and provided the best combination of economy and performance.
    [Show full text]
  • Handy Katakana Workbook.Pdf
    First Edition HANDY KATAKANA WORKBOOK An Introduction to Japanese Writing: KANA THIS IS A SUPPLEMENT FOR BEGINNING LEVEL JAPANESE LANGUAGE INSTRUCTION. \ FrF!' '---~---- , - Y. M. Shimazu, Ed.D. -----~---- TABLE OF CONTENTS Page Introduction vi ACKNOWLEDGEMENlS vii STUDYSHEET#l 1 A,I,U,E, 0, KA,I<I, KU,KE, KO, GA,GI,GU,GE,GO, N WORKSHEET #1 2 PRACTICE: A, I,U, E, 0, KA,KI, KU,KE, KO, GA,GI,GU, GE,GO, N WORKSHEET #2 3 MORE PRACTICE: A, I, U, E,0, KA,KI,KU, KE, KO, GA,GI,GU,GE,GO, N WORKSHEET #~3 4 ADDmONAL PRACTICE: A,I,U, E,0, KA,KI, KU,KE, KO, GA,GI,GU,GE,GO, N STUDYSHEET #2 5 SA,SHI,SU,SE, SO, ZA,JI,ZU,ZE,ZO, TA, CHI, TSU, TE,TO, DA, DE,DO WORI<SHEEI' #4 6 PRACTICE: SA,SHI,SU,SE, SO, ZA,II, ZU,ZE,ZO, TA, CHI, 'lSU,TE,TO, OA, DE,DO WORI<SHEEI' #5 7 MORE PRACTICE: SA,SHI,SU,SE,SO, ZA,II, ZU,ZE, W, TA, CHI, TSU, TE,TO, DA, DE,DO WORKSHEET #6 8 ADDmONAL PRACI'ICE: SA,SHI,SU,SE, SO, ZA,JI, ZU,ZE,ZO, TA, CHI,TSU,TE,TO, DA, DE,DO STUDYSHEET #3 9 NA,NI, NU,NE,NO, HA, HI,FU,HE, HO, BA, BI,BU,BE,BO, PA, PI,PU,PE,PO WORKSHEET #7 10 PRACTICE: NA,NI, NU, NE,NO, HA, HI,FU,HE,HO, BA,BI, BU,BE, BO, PA, PI,PU,PE,PO WORKSHEET #8 11 MORE PRACTICE: NA,NI, NU,NE,NO, HA,HI, FU,HE, HO, BA,BI,BU,BE, BO, PA,PI,PU,PE,PO WORKSHEET #9 12 ADDmONAL PRACTICE: NA,NI, NU, NE,NO, HA, HI, FU,HE, HO, BA,BI,3U, BE, BO, PA, PI,PU,PE,PO STUDYSHEET #4 13 MA, MI,MU, ME, MO, YA, W, YO WORKSHEET#10 14 PRACTICE: MA,MI, MU,ME, MO, YA, W, YO WORKSHEET #11 15 MORE PRACTICE: MA, MI,MU,ME,MO, YA, W, YO WORKSHEET #12 16 ADDmONAL PRACTICE: MA,MI,MU, ME, MO, YA, W, YO STUDYSHEET #5 17
    [Show full text]
  • Writing As Aesthetic in Modern and Contemporary Japanese-Language Literature
    At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2021 Reading Committee: Edward Mack, Chair Davinder Bhowmik Zev Handel Jeffrey Todd Knight Program Authorized to Offer Degree: Asian Languages and Literature ©Copyright 2021 Christopher J Lowy University of Washington Abstract At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy Chair of the Supervisory Committee: Edward Mack Department of Asian Languages and Literature This dissertation examines the dynamic relationship between written language and literary fiction in modern and contemporary Japanese-language literature. I analyze how script and narration come together to function as a site of expression, and how they connect to questions of visuality, textuality, and materiality. Informed by work from the field of textual humanities, my project brings together new philological approaches to visual aspects of text in literature written in the Japanese script. Because research in English on the visual textuality of Japanese-language literature is scant, my work serves as a fundamental first-step in creating a new area of critical interest by establishing key terms and a general theoretical framework from which to approach the topic. Chapter One establishes the scope of my project and the vocabulary necessary for an analysis of script relative to narrative content; Chapter Two looks at one author’s relationship with written language; and Chapters Three and Four apply the concepts explored in Chapter One to a variety of modern and contemporary literary texts where script plays a central role.
    [Show full text]
  • Legacy Character Sets & Encodings
    Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text.
    [Show full text]
  • Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding A
    Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding Adobe-Symbol-Encoding csHPPSMath Adobe-Zapf-Dingbats-Encoding csZapfDingbats Arabic ISO-8859-6, csISOLatinArabic, iso-ir-127, ECMA-114, ASMO-708 ASCII US-ASCII, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO646-US, us, IBM367, csASCI big-endian ISO-10646-UCS-2, BigEndian, 68k, PowerPC, Mac, Macintosh Big5 csBig5, cn-big5, x-x-big5 Big5Plus Big5+, csBig5Plus BMP ISO-10646-UCS-2, BMPstring CCSID-1027 csCCSID1027, IBM1027 CCSID-1047 csCCSID1047, IBM1047 CCSID-290 csCCSID290, CCSID290, IBM290 CCSID-300 csCCSID300, CCSID300, IBM300 CCSID-930 csCCSID930, CCSID930, IBM930 CCSID-935 csCCSID935, CCSID935, IBM935 CCSID-937 csCCSID937, CCSID937, IBM937 CCSID-939 csCCSID939, CCSID939, IBM939 CCSID-942 csCCSID942, CCSID942, IBM942 ChineseAutoDetect csChineseAutoDetect: Candidate encodings: GB2312, Big5, GB18030, UTF32:UTF8, UCS2, UTF32 EUC-H, csCNS11643EUC, EUC-TW, TW-EUC, H-EUC, CNS-11643-1992, EUC-H-1992, csCNS11643-1992-EUC, EUC-TW-1992, CNS-11643 TW-EUC-1992, H-EUC-1992 CNS-11643-1986 EUC-H-1986, csCNS11643_1986_EUC, EUC-TW-1986, TW-EUC-1986, H-EUC-1986 CP10000 csCP10000, windows-10000 CP10001 csCP10001, windows-10001 CP10002 csCP10002, windows-10002 CP10003 csCP10003, windows-10003 CP10004 csCP10004, windows-10004 CP10005 csCP10005, windows-10005 CP10006 csCP10006, windows-10006 CP10007 csCP10007, windows-10007 CP10008 csCP10008, windows-10008 CP10010 csCP10010, windows-10010 CP10017 csCP10017, windows-10017 CP10029 csCP10029, windows-10029 CP10079 csCP10079, windows-10079
    [Show full text]
  • Implementing Cross-Locale CJKV Code Conversion
    Implementing Cross-Locale CJKV Code Conversion Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf Code Conversion Basics dc • Algorithmic code conversion — Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP — A purely mathematical process • Table-driven code conversion — Required across locales: Chinese ↔ Japanese — Required when dealing with Unicode — Mapping tables are required — Can sometimes be faster than algorithmic code conversion— depends on the implementation September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Code Conversion Basics (Cont’d) dc • CJKV character set differences — Different number of characters — Different ordering of characters — Different characters September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Character Sets Versus Encodings dc • Common CJKV character set standards — China: GB 1988-89, GB 2312-80; GB 1988-89, GBK — Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992 — Hong Kong: ASCII, Big Five with Hong Kong extension — Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990 — South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991 — North Korea: ASCII (?), KPS 9566-97 — Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995 • Common CJKV encodings — Locale-independent: EUC-*, ISO-2022-* — Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code — Other: UCS-2, UCS-4, UTF-7, UTF-8,
    [Show full text]
  • D Igital Text
    Digital text The joys of character sets Contents Storing text ¡ General problems ¡ Legacy character encodings ¡ Unicode ¡ Markup languages Using text ¡ Processing and display ¡ Programming languages A little bit about writing systems Overview Latin Cyrillic Devanagari − − − − − − Tibetan \ / / Gujarati | \ / − Armenian / Bengali SOGDIAN − Mongolian \ / / Gurumukhi SCRIPT Greek − Georgian / Oriya Chinese | / / | / Telugu / PHOENICIAN BRAHMI − − Kannada SINITIC − Japanese SCRIPT \ SCRIPT Malayalam SCRIPT \ / | \ \ Tamil \ Hebrew | Arabic \ Korean | \ \ − − Sinhala | \ \ | \ \ _ _ Burmese | \ \ Khmer | \ \ Ethiopic Thaana \ _ _ Thai Lao The easy ones Latin is the alphabet and writing system used in the West and some other places Greek and Cyrillic (Russian) are very similar, they just use different characters Armenian and Georgian are also relatively similar More difficult Hebrew is written from right−to−left, but numbers go left−to−right... Arabic has the same rules, but also requires variant selection depending on context and ligature forming The far east Chinese uses two ’alphabets’: hanzi ideographs and zhuyin syllables Japanese mixes four alphabets: kanji ideographs, katakana and hiragana syllables and romaji (latin) letters and numbers Korean uses hangul ideographs, combined from jamo components Vietnamese uses latin letters... The Indic languages Based on syllabic alphabets Require complex ligature forming Letters are not written in logical order, but require a strange ’circular’ ordering In addition, a single line consists of separate
    [Show full text]