(12) United States Patent (10) Patent N0.: US 7,900,143 B2 Xu (45) Date of Patent: Mar

Total Page:16

File Type:pdf, Size:1020Kb

(12) United States Patent (10) Patent N0.: US 7,900,143 B2 Xu (45) Date of Patent: Mar US007900143B2 (12) United States Patent (10) Patent N0.: US 7,900,143 B2 Xu (45) Date of Patent: Mar. 1, 2011 (54) LARGE CHARACTER SET BROWSER 6,718,519 B1* 4/2004 Taieb ....................... .. 715/542 (75) Inventor: Yueheng Xu, Beaverton, OR (US) OTHER PUBLICATIONS Roger Molesworth, The Open Interchange of Electronic Documents, (73) Assignee: Intel Corporation, Santa Clara, CA Standards & Practices in Electronic Data Interchange, IEEE Col (Us) loquium on May 21, 1991, pp. 1-6.* Apple Computer, Inc., Programming With The Text Encoding Con Notice: Subject to any disclaimer, the term of this version Manager, Technical Publication, Apr. 1999, pp. 1-286. patent is extended or adjusted under 35 K. Lunde, CJKJNF, CJKINF Version 1.3, Jun. 16, 1995, pp. 1-50. USC 154(b) by 2155 days. Zhu, et al., Network Working Group, Request for Comments 1922, “Chinese Character Encoding For Internet Messages,” Mar. 1996, pp. 1-27. (21) Appl. No.: 09/748,895 Searle, “A Brief History of Character Codes in North America, Europe, and East Asia,” Sakamura Laboratory, University Museum, (22) Filed: Dec. 27, 2000 University of Tokyo, 1999, pp. 1-19. Goldsmith, et al., Network Working Group, Request for Comments (65) Prior Publication Data 2152, “A Mail-Safe Transmission Format of Unicode,” May 1997, US 2002/0120654 A1 Aug. 29, 2002 pp. 1-15. Japanese Patent Of?ce, English Language Translation of Inquiry Int. Cl. Letter Dated Jul. 11, 2008, pp. 1-3. (51) Japan Standards Association, “7-bit and 8-bit double byte coded G06F 17/22 (2006.01) extended Kanji sets for information interchange,” JIS X 0213:2000, (52) US. Cl. .................................................... .. 715/269 Jan. 20, 2000, pp. 50-52 and 60-64. (58) Field of Classi?cation Search ....... .. 7l5/5l3i5l7, 7l5/523i524, 530, 536, 542, 239, 255, 269; * cited by examiner 707/10, 1004101, 203; 709/246 Primary ExamineriLaurie Ries See application ?le for complete search history. Assistant Examiner4Chau Nguyen (56) References Cited (74) Attorney, Agent, or FirmiTrop, Pruner & Hu, RC. U.S. PATENT DOCUMENTS (57) ABSTRACT 5,870,084 A 2/1999 Kanungo et a1. A large number of characters may be represented by character 6,157,905 A * 12/2000 Powell ..................... .. 715/536 codes without undue complication. For example, characters 6,204,782 B1 * 3/2001 Gonzalez et al. ............ .. 341/90 that are already represented by Unicode may be handled 6,397,259 B1* 5/2002 Lincke et al. 715/501.1 pursuant to the Unicode system within one 16-bit code. Char 6,425,123 B1* 7/2002 Rojas et a1. .. 717/136 6,446,133 B1* 9/2002 Tan et al. 709/245 acters that are not represented by Unicode may be handled in 6,512,448 B1* 1/2003 Rincon et al. .. 340/7.51 a second fashion. The characters that are not represented by 6,601,108 B1* 7/2003 Marmor .... .. .. 709/246 Unicode may be represented by two 16-bit codes. 6,631,500 B1* 10/2003 .. 715/531 6,658,625 B1* 12/2003 715/523 25 Claims, 3 Drawing Sheets II'ISGIl New 086 To lmlicalu Wind Change Ute Surrogate Value ‘in Find Fonl Fill Lncatinn 8| mm: ti General: Glyph Image @ US. Patent Mar. 1, 2011 Sheet 1 013 US 7,900,143 B2 Source HTML File 10 Encoded In ISO-2022 CNllSO-2022-CN- ‘J 12 EXT Format f / 18 l 14 16 / lSO-2022-CN/ISO- _/ J 2022-CN-EXT To . Web Content Unrcode/Untcode Wrth Web Content Parser 7 Rendering Engine Surrogate Support Converter l l ‘ 26 20 ASCII And Other J L CJK String Charset String D . s Ia M d I Display Module I p y o u e 24 l ll / 28 w ll 1 f 22 ASCII And Other GB2312/HZGBK/ lSO-2022-CN/ISO Charset Font BIG5 Font 2022-CN-EXT Font Display Drivers Display Drivers Display Driver l i ll 30 Text Content On _/ Screen (Glyphs Of Text) FIG. 1 US. Patent Mar. 1, 2011 Sheet 2 013 US 7,900,143 B2 32 f 34 / Processor f 40 f 36 f 38 Dlsplay. Brldge. MemorySystem f 42 F 44 46 f 12 Bridge HDD --— f 48 SIO Modem FIG. 2 US. Patent Mar. 1, 2011 Sheet 3 013 US 7,900,143 B2 WK Browser ‘ Receive HTML Web [Page Iln PRC [Formal f. 66 llnsertt New 68G Tl'o llnldicate Mod Change /. 701) Use Table Wrapping 5] F 72 murder Map To Surrogalte Areas 1 f 76 Use Surrogale Manure To r0110 [Font lFile Location & @?sei Generate 75 Glyph —/ Ilmage V @ Fm US 7,900,143 B2 1 2 LARGE CHARACTER SET BROWSER BRIEF DESCRIPTION OF THE DRAWINGS BACKGROUND FIG. 1 is a ?ow chart for software in accordance with one embodiment of the present invention; This invention relates generally to Internet browsers and 5 FIG. 2 is a block diagram of hardware for implementing particularly to browsers which support a large number of one embodiment of the present invention; and characters. FIG. 3 is a ?ow chart for software in accordance with one Textual data, displayed as printed material or on a com embodiment of the present invention. puter monitor, is encoded in the form of binary numerical codes. When a given key on a keyboard is stricken, a character DETAILED DESCRIPTION code for that key is generated. A computer then uses the character code to select an appropriate character shape from a A web browser 12, shown in FIG. 1, provides a platform stored font ?le listing with the same character code. level solution to the need for larger character sets. The web English language personal computer systems generally browser 12 may serve as a universal, cross-platform user employ a seven bit character code. This code is codi?ed interface for any web-based application. The browser 12 according to the American Standard Code for Information effectively enables a large character set to support all web Interchange (ASCII) (ANSIx3.4-1996) that allows for char based applications. acter sets of about 128 items of upper and lower case Latin A source hypertext markup language (HTML) ?le is letters, Arabic numerals, signs and control characters. encoded in accordance with the ISO-2022 CN and ISO-2022 The Chinese, Japanese and Korean (CJ K) languages have a 20 CN-extension formats as indicated in block 1 0. In the browser relatively large number of characters, on the order of hun 12, that ?le is converted into a Unicode format extended to dreds of thousands of characters. These characters far exceed include support of a surrogate mechanism of Unicode 2.0 the capacity, in and of themselves, of the 7-bit ASCII charac ter codes. (ISO/IEC 10646 and RFC 2152 (1996)). Using the surrogate mechanism of Unicode extends Unicode to a capacity su?i For example, personal computers in Japan presently utilize 25 the Japanese Industrial Standard (J IS) X0208-1990 that cient to recognize a character set of one million unique char accommodates only 6,879 characters. While this is adequate acters. Since existing Unicode standards use a sixteen bit character length, the sixteen bit value can provide no more for many basic functions, it may be insuf?cient for writing than 65,536 characters. people’s names, place names, historical data and other such Thus, the ISO-2022-CN and CN extension formats are information. 30 The existing CJ K character sets are not suf?cient to provide converted to Unicode with surrogate support as indicated at a wide variety of important information given the available block 14. Those CJK characters that are already de?ned in character sets. For example, with the GB-2312 and Big-5 Unicode standard are converted into their sixteen bit Unicode character sets used by the Netscape Communicator web value. The remaining characters that are not de?ned by the Unicode standard are converted into two sixteen bit values. In browser, only about 16,000 characters are available. 35 As a result, the International Organization for Standardiza one embodiment of the present invention, each of the two tion has created a standard called ISO-2022 that outlines how sixteen bit values is in the range from 0xD800 to 0xDB00 and seven bit and eight bit character codes may be structured. The from 0xDC00 to 0xDFFF respectively. This conforms to the Unicode de?nition of a surrogate mechanism without ambi Chinese language version is ISO-2022-CN, set forth in guity. Request for Comments (RFC) 1922 (Network Working 40 Group, 1996). The browser 12 does the web content parsing, as indicated The so-called Uni?cation Code or Unicode was developed in block 1 6, such as HTML parsing. Thereafter, the remaining by a number of US. software ?rms to unify all the world’s rendering steps are completed as indicated in block 18. character sets into one large character set. See International When the browser 12 is ready to render a particular text Organization for Standardization ISO/IEC 10646-1 (1993) 45 string, the browser 12 loops for each of the characters in the Geneva, Switzerland. Unicode seeks to limit the character set string. For CJK characters, a ?rst search seeks fonts in known space to sixteen bits or a maximum of 65,536 characters. This CJK font libraries as indicated in block 20. The known CJ K character space means each character must be represented by font libraries include Guo-jiu Biao-zhun (“Coding of Chinese a ?xed length code of sixteen bits or two bytes. However, even Ideogram Set for Information Interchange Basic Set”, with Unicode, all of the world’s characters, including the 50 GB-2312-80), Big-5 (Institute for Information Industry, hundreds of thousands of CJ K characters, can not be “Chinese Coded Character Set in Computer,” March 1984), expressed using a character set that only allows for 65,536 GBK (“Information Technology” Universal Multiple-Octet total characters.
Recommended publications
  • Consonant Characters and Inherent Vowels
    Global Design: Characters, Language, and More Richard Ishida W3C Internationalization Activity Lead Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 1 Getting more information W3C Internationalization Activity http://www.w3.org/International/ Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 2 Outline Character encoding: What's that all about? Characters: What do I need to do? Characters: Using escapes Language: Two types of declaration Language: The new language tag values Text size Navigating to localized pages Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 3 Character encoding Character encoding: What's that all about? Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 4 Character encoding The Enigma Photo by David Blaikie Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 5 Character encoding Berber 4,000 BC Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 6 Character encoding Tifinagh http://www.dailymotion.com/video/x1rh6m_tifinagh_creation Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 7 Character encoding Character set Character set ⴰ ⴱ ⴲ ⴳ ⴴ ⴵ ⴶ ⴷ ⴸ ⴹ ⴺ ⴻ ⴼ ⴽ ⴾ ⴿ ⵀ ⵁ ⵂ ⵃ ⵄ ⵅ ⵆ ⵇ ⵈ ⵉ ⵊ ⵋ ⵌ ⵍ ⵎ ⵏ ⵐ ⵑ ⵒ ⵓ ⵔ ⵕ ⵖ ⵗ ⵘ ⵙ ⵚ ⵛ ⵜ ⵝ ⵞ ⵟ ⵠ ⵢ ⵣ ⵤ ⵥ ⵯ Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 8 Character encoding Coded character set 0 1 2 3 0 1 Coded character set 2 3 4 5 6 7 8 9 33 (hexadecimal) A B 52 (decimal) C D E F Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 9 Character encoding Code pages ASCII Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 10 Character encoding Code pages ISO 8859-1 (Latin 1) Western Europe ç (E7) Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 11 Character encoding Code pages ISO 8859-7 Greek η (E7) Copyright © 2005 W3C (MIT, ERCIM, Keio) slide 12 Character encoding Double-byte characters Standard Country No.
    [Show full text]
  • SUPPORTING the CHINESE, JAPANESE, and KOREAN LANGUAGES in the OPENVMS OPERATING SYSTEM by Michael M. T. Yau ABSTRACT the Asian L
    SUPPORTING THE CHINESE, JAPANESE, AND KOREAN LANGUAGES IN THE OPENVMS OPERATING SYSTEM By Michael M. T. Yau ABSTRACT The Asian language versions of the OpenVMS operating system allow Asian-speaking users to interact with the OpenVMS system in their native languages and provide a platform for developing Asian applications. Since the OpenVMS variants must be able to handle multibyte character sets, the requirements for the internal representation, input, and output differ considerably from those for the standard English version. A review of the Japanese, Chinese, and Korean writing systems and character set standards provides the context for a discussion of the features of the Asian OpenVMS variants. The localization approach adopted in developing these Asian variants was shaped by business and engineering constraints; issues related to this approach are presented. INTRODUCTION The OpenVMS operating system was designed in an era when English was the only language supported in computer systems. The Digital Command Language (DCL) commands and utilities, system help and message texts, run-time libraries and system services, and names of system objects such as file names and user names all assume English text encoded in the 7-bit American Standard Code for Information Interchange (ASCII) character set. As Digital's business began to expand into markets where common end users are non-English speaking, the requirement for the OpenVMS system to support languages other than English became inevitable. In contrast to the migration to support single-byte, 8-bit European characters, OpenVMS localization efforts to support the Asian languages, namely Japanese, Chinese, and Korean, must deal with a more complex issue, i.e., the handling of multibyte character sets.
    [Show full text]
  • Legacy Character Sets & Encodings
    Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text.
    [Show full text]
  • Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding A
    Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding Adobe-Symbol-Encoding csHPPSMath Adobe-Zapf-Dingbats-Encoding csZapfDingbats Arabic ISO-8859-6, csISOLatinArabic, iso-ir-127, ECMA-114, ASMO-708 ASCII US-ASCII, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO646-US, us, IBM367, csASCI big-endian ISO-10646-UCS-2, BigEndian, 68k, PowerPC, Mac, Macintosh Big5 csBig5, cn-big5, x-x-big5 Big5Plus Big5+, csBig5Plus BMP ISO-10646-UCS-2, BMPstring CCSID-1027 csCCSID1027, IBM1027 CCSID-1047 csCCSID1047, IBM1047 CCSID-290 csCCSID290, CCSID290, IBM290 CCSID-300 csCCSID300, CCSID300, IBM300 CCSID-930 csCCSID930, CCSID930, IBM930 CCSID-935 csCCSID935, CCSID935, IBM935 CCSID-937 csCCSID937, CCSID937, IBM937 CCSID-939 csCCSID939, CCSID939, IBM939 CCSID-942 csCCSID942, CCSID942, IBM942 ChineseAutoDetect csChineseAutoDetect: Candidate encodings: GB2312, Big5, GB18030, UTF32:UTF8, UCS2, UTF32 EUC-H, csCNS11643EUC, EUC-TW, TW-EUC, H-EUC, CNS-11643-1992, EUC-H-1992, csCNS11643-1992-EUC, EUC-TW-1992, CNS-11643 TW-EUC-1992, H-EUC-1992 CNS-11643-1986 EUC-H-1986, csCNS11643_1986_EUC, EUC-TW-1986, TW-EUC-1986, H-EUC-1986 CP10000 csCP10000, windows-10000 CP10001 csCP10001, windows-10001 CP10002 csCP10002, windows-10002 CP10003 csCP10003, windows-10003 CP10004 csCP10004, windows-10004 CP10005 csCP10005, windows-10005 CP10006 csCP10006, windows-10006 CP10007 csCP10007, windows-10007 CP10008 csCP10008, windows-10008 CP10010 csCP10010, windows-10010 CP10017 csCP10017, windows-10017 CP10029 csCP10029, windows-10029 CP10079 csCP10079, windows-10079
    [Show full text]
  • Implementing Cross-Locale CJKV Code Conversion
    Implementing Cross-Locale CJKV Code Conversion Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf Code Conversion Basics dc • Algorithmic code conversion — Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP — A purely mathematical process • Table-driven code conversion — Required across locales: Chinese ↔ Japanese — Required when dealing with Unicode — Mapping tables are required — Can sometimes be faster than algorithmic code conversion— depends on the implementation September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Code Conversion Basics (Cont’d) dc • CJKV character set differences — Different number of characters — Different ordering of characters — Different characters September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Character Sets Versus Encodings dc • Common CJKV character set standards — China: GB 1988-89, GB 2312-80; GB 1988-89, GBK — Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992 — Hong Kong: ASCII, Big Five with Hong Kong extension — Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990 — South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991 — North Korea: ASCII (?), KPS 9566-97 — Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995 • Common CJKV encodings — Locale-independent: EUC-*, ISO-2022-* — Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code — Other: UCS-2, UCS-4, UTF-7, UTF-8,
    [Show full text]
  • Japanese Bibliographic Records and CJK Cataloging in U.S
    San Jose State University SJSU ScholarWorks Master's Theses Master's Theses and Graduate Research Fall 2009 Japanese bibliographic records and CJK cataloging in U.S. university libraries. Mie Onnagawa San Jose State University Follow this and additional works at: https://scholarworks.sjsu.edu/etd_theses Recommended Citation Onnagawa, Mie, "Japanese bibliographic records and CJK cataloging in U.S. university libraries." (2009). Master's Theses. 4010. DOI: https://doi.org/10.31979/etd.pcb8-mryq https://scholarworks.sjsu.edu/etd_theses/4010 This Thesis is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Theses by an authorized administrator of SJSU ScholarWorks. For more information, please contact [email protected]. JAPANESE BIBLIOGRAPHIC RECORDS AND CJK CATALOGING IN U.S. UNIVERSITY LIBRARIES A Thesis Presented to The Faculty of the School of Library and Information Science San Jose State University In Partial Fulfillment of the Requirements for the Degree Master of Library and Information Science by Mie Onnagawa December 2009 UMI Number: 1484368 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMT Dissertation Publishing UM! 1484368 Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code.
    [Show full text]
  • AIX Globalization
    AIX Version 7.1 AIX globalization IBM Note Before using this information and the product it supports, read the information in “Notices” on page 233 . This edition applies to AIX Version 7.1 and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright International Business Machines Corporation 2010, 2018. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this document............................................................................................vii Highlighting.................................................................................................................................................vii Case-sensitivity in AIX................................................................................................................................vii ISO 9000.....................................................................................................................................................vii AIX globalization...................................................................................................1 What's new...................................................................................................................................................1 Separation of messages from programs..................................................................................................... 1 Conversion between code sets.............................................................................................................
    [Show full text]
  • International Language Environments Guide
    International Language Environments Guide Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 806–6642–10 May, 2002 Copyright 2002 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, Java, XView, ToolTalk, Solstice AdminTools, SunVideo and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. SunOS, Solaris, X11, SPARC, UNIX, PostScript, OpenWindows, AnswerBook, SunExpress, SPARCprinter, JumpStart, Xlib The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry.
    [Show full text]
  • Traditional Chinese Solaris User's Guide
    Traditional Chinese Solaris User’s Guide Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 816–0669–10 May 2002 Copyright 2002 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
    [Show full text]
  • International Language Environments Guide for Oracle® Solaris 11.4
    International Language Environments ® Guide for Oracle Solaris 11.4 Part No: E61001 November 2020 International Language Environments Guide for Oracle Solaris 11.4 Part No: E61001 Copyright © 2011, 2020, Oracle and/or its affiliates. License Restrictions Warranty/Consequential Damages Disclaimer This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. Warranty Disclaimer The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. Restricted Rights Notice If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are
    [Show full text]
  • International Register of Coded Character Sets to Be Used with Escape Sequences for Information Interchange in Data Processing
    INTERNATIONAL REGISTER OF CODED CHARACTER SETS TO BE USED WITH ESCAPE SEQUENCES 1 Introduction 1.1 General This document is the ISO International Register of Coded Character Sets To Be Used With Escape Sequences for information interchange in data processing. It is compiled in accordance with the provisions of ISO/IEC 2022, "Code Extension Technique" and of ISO 2375 "Procedure for Registration of Escape Sequences". This International Register contains coded character sets which have been registered in accordance with procedures given in ISO 2375. Its purpose is to identify widely used coded character sets and associate with each a unique escape sequence by means of which it can be designated according to ISO/IEC 2022 and ISO/IEC 4873. The publication of this International Register should promote compatibility in international information interchange and avoid duplication of effort in developing application-oriented coded character sets. Registration provides an identification for a coded character set but implies nothing about its status; it may or may not be part of a standard of an international, national or a corporate body. However, if such a standard is published subsequently to the registration, it would be appropriate for the escape sequence identifying the character set to be specified in the standard. If it is desired to register a set, application should be made to the Registration Authority through an appropriate Sponsoring Authority as specified in ISO 2375. Any character set can be a candidate for registration if it meets the requirements of ISO 2375. The Registration Authority ascertains that the proposals received are formally in accordance with this International Standard, technically in accordance with ISO/IEC 2022, and, where applicable, with ISO/IEC 646 and ISO/IEC 4873, and meet the presentation practice of the Registration Authority.
    [Show full text]
  • HZ – CN-GB – EUC-CN – CP936 • Traditional – EUC-TW – Big Five Et Al
    The Hitchhiker’s Guide to Chinese Encodings Tom Emerson (譨聢·蒨翳芖) Senior Computational Linguist 20th International Unicode Conference Washington, D.C., USA strategy • process • technology • results www.basistech.com Overview • Who am I and why am I here? • What do we mean by “Chinese?” – Simplified vs. Traditional •ChineseCharacter Sets • Introduce Chinese Encodings • Driving Forces • Reality vs. Idealism • Transcoding Issues Who Am I? • “Sinostringologst” at Basis Technology • Lead developer for our Chinese Morphological Analyzer and our Chinese Script Converter • Background in both Computer Science and Linguistics Who Are You? Don’t Panic! The Blowfish Book is the reference for anyone working with CJK character sets. Get it. What is “Chinese?” • For our purposes we are interested in the written language. • In general this means Mandarin. • Topolects (ᮍ㿔) sometimes define their own hanzi for local words, usually for names. • Hence “Written Cantonese” doesn't make a lot of sense. Simplified vs. Traditional • “Simple” and “Full” Form • Mainland China and Singapore use “Simplified Chinese” • Hong Kong, Taiwan, and Macao use “Traditional Chinese” Simplification • Fewer strokes – Easier to learn – Easier to remember – Easier to write • Compare: ৄ vs. 繟 • Simplification is not recent – Some simplified characters in current use date to the pre-Qin period (pre 246 B.C.E.) Simplification 1956: Scheme for Simplifying Chinese Characters 1964: The Complete List of Simplified Characters (2236 characters) 1986: The Complete List of Simplified Characters, 2nd Simplification ऱഏᎾᆖᛎ᝟ႨΕല۟֟׌سഏᎾՕ᝜ࠓࢬข ৫ၲࡨऱ᝜ڣװᆖᛎऱ࿇୶ΖൕڣࠐԼآᖄ ࠓᑪխΕ׌ߡຟਢ۩ᄐհଈΖ ೑䰙໻䌁ᑊ᠔ѻ⫳ⱘ೑䰙㒣⌢䍟࢓ǃᇚ㟇ᇥЏ ᇐ᳾ᴹकᑈ㒣⌢ⱘথሩDŽҢএᑈᑺᓔྟⱘ䌁 ᑊ╂ЁǃЏ㾦䛑ᰃ㸠ϮП佪DŽ Character Sets vs. Encodings • Non-Coded Character Sets – A non-coded character set represents a list of characters that are defined by an organization as the standard set that one is expected to know.
    [Show full text]