Distinguishing 8-Bit Characters and Japanese Professional Quality [9], Including Japanese Line Break- Characters in (U)Ptex Ing Rules and Vertical Typesetting

Total Page:16

File Type:pdf, Size:1020Kb

Distinguishing 8-Bit Characters and Japanese Professional Quality [9], Including Japanese Line Break- Characters in (U)Ptex Ing Rules and Vertical Typesetting TUGboat, Volume 41 (2020), No. 3 329 Distinguishing 8-bit characters and Japanese professional quality [9], including Japanese line break- characters in (u)pTEX ing rules and vertical typesetting. pTEX and pLATEX were originally developed by Hironori Kitagawa the ASCII Corporation2 [1]. However, pTEX and Abstract pLATEX in TEX Live, which are our concern, are community editions. These are currently maintained pTEX (an extension of TEX for Japanese typesetting) 3 by the Japanese TEX Development Community. For uses a legacy encoding as the internal Japanese en- more detail, please see the English guide for pTEX [3]. coding, while accepting UTF-8 input. This means pTEX itself does not have 휀-TEX features, but that pTEX does code conversion in input and output. there is 휀-pTEX [7], which merges pTEX, 휀-TEX and Also, pT X (and its Unicode extension upT X) dis- E E additional primitives. Anything discussed about tinguishes 8-bit character tokens and Japanese char- pTEX in this paper (besides this paragraph) also acter tokens, while this distinction disappears when applies to 휀-pTEX, so I simply write “pTEX” instead tokens are processed with \string and \meaning, of “pTEX and 휀-pTEX”. Note that the pLATEX format or printed to a file or the terminal. in TEX Live is produced by 휀-pTEX, because recent These facts cause several unnatural behaviors versions of LATEX require 휀-TEX features. with (u)pTEX. For example, pTEX garbles “ſ ” (long s) to “顛” on some occasions. This paper explains these 2.1 Input code conversion by ptexenc unnatural behaviors, and discusses an experiment in improvement by the author. Although pTEX in TEX Live accepts UTF-8 inputs, the internal Japanese character set is limited to 1 Introduction JIS X 0208 (JIS level 1 and 2 kanjis), which is a Since TEX Live 2018, UTF-8 has been the new default legacy character set before Unicode. pTEX uses input encoding in LATEX [8]. However, with pLATEX, Shift_JIS (Windows) or EUC-JP (other) as the in- which is a modified version ofA LTEX for the pTEX ternal encoding of JIS X 0208. engine, the source In pTEX and related programs, the ptexenc library [12] converts an input line to the internal %#!platex encoding. pTEX’s input processor actually reads \documentclass{minimal} the converted result by ptexenc. A valid UTF-8 \begin{document}ſ\end{document} % long s sequence which does not represent a JIS X 0208 char- gives an inconsistent error message [4] (edited to fit acter — such as <C5><BF> (“ſ”) or <C3><9F> (“ß”) — TUGboat’s narrow columns): is converted to ^^-notation, such as ^^ab. On the other hand, an invalid UTF-8 sequence ! Package inputenc Error: Unicode character 顛 (U+C4CF) not set up for use with LaTeX. is converted into <A2><AF> (an undefined code point in EUC-JP) sometimes, in TEX Live 2019 or prior. Here “顛”, “ſ” and U+C4CF are all different characters. In TEX Live 2020, the sequence is always converted The purpose of this paper is to investigate the into ^^-notation. background of this message and propose patches to resolve this issue. This paper is based on a cancelled 2.2 Japanese character tokens talk [6] in T XConf 2019.1 E pT X divides character tokens into two groups: ordi- In this paper, the following are assumed: E nary 8-bit character tokens and Japanese character • All inputs and outputs are encoded in UTF-8. tokens. The former are not different from tokens in • pTEX uses EUC-JP as the internal Japanese en- 8-bit engines, say, TEX82 and pdfTEX.A ^^-notation coding (see Section 2.1). sequence is always treated as an 8-bit character. • Sources are typeset in plain pTEX(ptex), unless A Japanese character token is represented by stated otherwise by %#!. its character code. In other words, although there • The notation <AB> describes a byte 0xab, or a is a \kcatcode primitive, which is the counterpart character token whose code is 0xab. of \catcode, its information is not stored in tokens. Hence, changing \kcatcode by users is not recom- 2 Overview of pTEX mended. pTEX is an engine extension of TEX82 for Japanese typesetting. It can typeset Japanese documents of 2 Currently ASCII DWANGO in DWANGO Co. Ltd. 3 texjp.org/. Several GitHub repositories: 1 TEXConf 2019 (the annual meeting of Japanese TEX users, github.com/texjporg/tex-jp-build ((u)pTEX), texconf2019.tumblr.com) was canceled due to a typhoon. github.com/texjporg/platex (pLATEX). Distinguishing 8-bit characters and Japanese characters in (u)pTEX 330 TUGboat, Volume 41 (2020), No. 3 2.3 An example input This is because all of Now we look at an example. Our input line is \csname u^^c5^^bf\endcsname a<C3><9F><E6><BC><A2><C5><BF><C2><A7> (aß漢ſ§) \csname uſ\endcsname % ſ: <C5><BF> in UTF-8 \csname u顛\endcsname % 顛: <C5><BF> in EUC-JP First, ptexenc converts this line into have the same name u<C5><BF> in pTEX, hence they a^^c3^^9f<B4><C1>^^c5^^bf<A1><F8> are treated as the same control sequence. Applying which is fed to pTEX’s input processor. The final \string to them, we get the same token list character “§” is included in JIS X 0208. \12 u12 顛 From the result above, pTEX produces tokens This explains the error message in the introduc- a <C3> <9F> 漢 <C5> <BF> § 11 12 12 12 12 tion. “顛 (U+C4CF)” in the message is generated where 漢 and § are Japanese character tokens. From from this example, we can see that we cannot write “§” \expandafter\string directly to output this character in a Latin font (use \csname u8:\string<C5>\string<BF>\endcsname commands or ^^c2^^a7). The inputenc package expects that applying \string 3 Stringization in pTEX to the above control sequence produces 3.1 Overview \12 u12 812 :12 <C5>12 <BF>12 Names of multiletter control sequences, which in- A clude control sequences with single Japanese charac- but the result in pLTEX is ter name, such as \あ, are stringized, that is to say, \12 u12 812 :12 顛 they are stored into the string pool. Similarly, some primitives, such as \string, \jobname, \meaning 3.3 \meaning and \the (almost always the case), first stringize The result of their intermediate results into the string pool, and \font\Z=ec-lmr10 \Z % T1 encoding then retokenize these intermediate results. \def\fuga{^^c5^^bf顛ſ}\meaning\fuga Stringization of pTEX has two crucial points. • The origin of a byte is lost in stringization. A differs between plainE T X and plain pTEX: byte sequence, for example <C5><BF>, in the plain TEX macro:->Å£éąŻÅ£ string pool may be the result of stringization of plain pTEX macro:->顛顛顛 a Japanese character “顛”, or that of two 8-bit Now we look at what happened with pTEX. The characters <C5> and <BF>. definition of \fuga is represented by the token list • In retokenization, a byte sequence which repre- sents a Japanese character in the internal encod- <C5>12 <BF>12 顛 <C5>12 <BF>12 ing is always converted to a Japanese character This gives the following string as the intermediate token. For example, <C5><BF> is always con- result of \meaning. verted to a Japanese token 顛. macro:-><C5><BF><C5><BF><C5><BF> These points cause unnatural behavior, namely bytes from 8-bit characters becoming garbled to Japanese Retokenizing this string gives the final result character tokens. We look into several examples. macro:->顛顛顛 3.2 Control sequence name which we have already seen. Let’s begin with the following source: 3.4 A tricky application \font\Z=ec-lmr10 \Z % T1 encoding The behavior described in Section 3.2 has a tricky \expandafter\def\csname uſ\endcsname{AA} application: generating a Japanese character token \expandafter\def\csname u顛\endcsname{BB} from its code number, even in an expansion-only \def\ZZ#1{#1 (\string#1) } context. This can be constructed as follows: \expandafter\ZZ\csname u^^c5^^bf\endcsname% (1) \expandafter\ZZ\csname uſ\endcsname % (2) %#!eptex \expandafter\ZZ\csname u顛\endcsname % (3) \font\Z=ec-lmr10 \Z % T1 encoding \input expl3-generic % for \char_generate:nn With pTEX, (1)–(3) produces the same result \ExplSyntaxOn BB (\u 顛) \cs_generate_variant:Nn \cs_to_str:N { c } Hironori Kitagawa TUGboat, Volume 41 (2020), No. 3 331 \cs_new:Npn \tkchar #1 { Then, pTEX prints this token list. Since <A1>–<FE> \cs_to_str:c { are printable and <9F> is not, the putc2 function \char_generate:nn % upper byte receives the following string, one byte per call. { \int_div_truncate:nn { #1 } { 256 } } { 12 } <C5><BF><C5><BF><C3>^^9f \char_generate:nn % lower byte { \int_mod:nn { #1 } { 256 } } { 12 } Each <C5><BF> is converted to “顛” by putc2, } while the single <C3> remains unchanged. Hence the } final result is “顛顛<C3>^^9f”, as shown. \ExplSyntaxOff \edef\A{\tkchar{`漢}\tkchar{`字}} 4.3 \message \meaning\A % ==> macro:->漢字 \message is similar to \write, but differs in that it stringizes its argument. Now consider an input line This \tkchar will be unnecessary as of TEX Live 2020, since the \Uchar and \Ucharcat primitives \message{^^fe^^f3:ꚲ:} were added into 휀-pTEX at that time. Here ꚲ (<F0><AA><9A><B2> in UTF-8) is a character 4 Output to file or terminal included in JIS X 0213, but not in JIS X 0208. 4.1 Output code conversion The argument of \message is (expanded to) the following token list. As with input, pTEX does a code conversion from the internal Japanese encoding to UTF-8 in outputting <FE>12 <F3>12 :12 <F0>12 <AA>12 <9A>12 <B2>12 :12 to a file or the terminal.
Recommended publications
  • C Strings and Pointers
    Software Design Lecture Notes Prof. Stewart Weiss C Strings and Pointers C Strings and Pointers Motivation The C++ string class makes it easy to create and manipulate string data, and is a good thing to learn when rst starting to program in C++ because it allows you to work with string data without understanding much about why it works or what goes on behind the scenes. You can declare and initialize strings, read data into them, append to them, get their size, and do other kinds of useful things with them. However, it is at least as important to know how to work with another type of string, the C string. The C string has its detractors, some of whom have well-founded criticism of it. But much of the negative image of the maligned C string comes from its abuse by lazy programmers and hackers. Because C strings are found in so much legacy code, you cannot call yourself a C++ programmer unless you understand them. Even more important, C++'s input/output library is harder to use when it comes to input validation, whereas the C input/output library, which works entirely with C strings, is easy to use, robust, and powerful. In addition, the C++ main() function has, in addition to the prototype int main() the more important prototype int main ( int argc, char* argv[] ) and this latter form is in fact, a prototype whose second argument is an array of C strings. If you ever want to write a program that obtains the command line arguments entered by its user, you need to know how to use C strings.
    [Show full text]
  • Migration 13 CHS
    Tudun mun tsira: Shiga ciyawa DEUTSCHE WELLE JI KA ƘARU Tudun mun tsira – Mai bayani a kan ’yan Afirka da ke yin ƙaura zuwa Turai Kashi na goma sha uku: Shiga ciyawa Wadda ta rubuta: Chrispin Mwakideu Wanda ya fassara: Ɗanlami Bala Gwammaja Wadda ta tace: Halima C. Schmaling ’Yan Wasa: Magaji: ɗan shekara 22 Jami’in shige da fice: ɗan shekara 40 Lami: ’yar shekara 20 Sa’adatu: ’yar shekara 35 Charles: ɗan shekara 45 Ɗan sanda: ɗan shekara 35 Faisal: ɗan shekara 19 Roda: ’yar shekara 25 1/9 Tudun mun tsira: Shiga ciyawa Mai gabatarwa: Masu saurarenmu, barkanmu da sake saduwa a cikin shirinmu na “Ji Ka Ƙaru”. Wannan shi ne kashi na goma sha uku a shirinmu na wasan kwaikwayo a kan ƙaura da matasa daga Afirka ke yi zuwa Turai, mai taken “Tudun mun tsira”. Idan ba mu manta ba, wannan wasa ne a kan rayuwar waɗansu matasa, wato Lami, da Faisal, da kuma Magaji. Mun ji kuma cewa ko da yake akwai damammakin karatu da kuma ci gaban ɗan Adam a Turai, akwai kuma waɗansu abubuwan ƙi, musamman ma dai idan mutum ba bisa ƙa’ida ya shiga Turai ɗin ba. Sai ku biyo mu cikin shirin namu na yau mai taken “Shiga ciyawa” don jin yadda za ta kasance. Yanzu dai ga mu a ofishin jami’an shige da fice a lokacin da suke ganawa da Magaji. Fitowa ta 1: Magaji yana yi wa jami’in shige da fice bayani Jami’in shige da fice: Wato dai kai ne sabon akarambanar da muka samu ko? Magaji: Akarambana? Ban gane ba, Yallaɓai, me kake nufi da haka? Jami’i: To, ba ga shi ka kurɗaɗo ta cikin jirgin da muke ta fama jigilar kai abinci Afirka ba don ya kawo ka Turai? Ni na kwana biyu ma ban ga abin mamaki irin haka ba, sai kai.
    [Show full text]
  • Technical Study Desktop Internationalization
    Technical Study Desktop Internationalization NIC CH A E L T S T U D Y [This page intentionally left blank] X/Open Technical Study Desktop Internationalisation X/Open Company Ltd. December 1995, X/Open Company Limited All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owners. X/Open Technical Study Desktop Internationalisation X/Open Document Number: E501 Published by X/Open Company Ltd., U.K. Any comments relating to the material contained in this document may be submitted to X/Open at: X/Open Company Limited Apex Plaza Forbury Road Reading Berkshire, RG1 1AX United Kingdom or by Electronic Mail to: [email protected] ii X/Open Technical Study (1995) Contents Chapter 1 Internationalisation.............................................................................. 1 1.1 Introduction ................................................................................................. 1 1.2 Character Sets and Encodings.................................................................. 2 1.3 The C Programming Language................................................................ 5 1.4 Internationalisation Support in POSIX .................................................. 6 1.5 Internationalisation Support in the X/Open CAE............................... 7 1.5.1 XPG4 Facilities.........................................................................................
    [Show full text]
  • Hiragana Chart
    ひらがな Hiragana Chart W R Y M H N T S K VOWEL ん わ ら や ま は な た さ か あ A り み ひ に ち し き い I る ゆ む ふ ぬ つ す く う U れ め へ ね て せ け え E を ろ よ も ほ の と そ こ お O © 2010 Michael L. Kluemper et al. Beginning Japanese, Tuttle Publishing, an imprint of Periplus Editions (HK) Ltd. All rights reserved. www.TimeForJapanese.com. 1 Beginning Japanese 名前: ________________________ 1-1 Hiragana Activity Book 日付: ___月 ___日 一、 Practice: あいうえお かきくけこ がぎぐげご O E U I A お え う い あ あ お え う い あ お う あ え い あ お え う い お う い あ お え あ KO KE KU KI KA こ け く き か か こ け く き か こ け く く き か か こ き き か こ こ け か け く く き き こ け か © 2010 Michael L. Kluemper et al. Beginning Japanese, Tuttle Publishing, an imprint of Periplus Editions (HK) Ltd. All rights reserved. www.TimeForJapanese.com. 2 GO GE GU GI GA ご げ ぐ ぎ が が ご げ ぐ ぎ が ご ご げ ぐ ぐ ぎ ぎ が が ご げ ぎ が ご ご げ が げ ぐ ぐ ぎ ぎ ご げ が 二、 Fill in each blank with the correct HIRAGANA. SE N SE I KI A RA NA MA E 1.
    [Show full text]
  • Como Digitar Em Japonês 1
    Como digitar em japonês 1 Passo 1: Mudar para o modo de digitação em japonês Abra o Office Word, Word Pad ou Bloco de notas para testar a digitação em japonês. Com o cursor colocado em um novo documento em algum lugar em sua tela você vai notar uma barra de idiomas. Clique no botão "PT Português" e selecione "JP Japonês (Japão)". Isso vai mudar a aparência da barra de idiomas. * Se uma barra longa aparecer, como na figura abaixo, clique com o botão direito na parte mais à esquerda e desmarque a opção "Legendas". ficará assim → Além disso, você pode clicar no "_" no canto superior direito da barra de idiomas, que a janela se fechará no canto inferior direito da tela (minimizar). ficará assim → © 2017 Fundação Japão em São Paulo Passo 2: Alterar a barra de idiomas para exibir em japonês Se você não consegue ler em japonês, pode mudar a exibição da barra de idioma para inglês. Clique em ツール e depois na opção プロパティ. Opção: Alterar a barra de idiomas para exibir em inglês Esta janela é toda em japonês, mas não se preocupe, pois da próxima vez que abrí-la estará em Inglês. Haverá um menu de seleção de idiomas no menu de "全般", escolha "英語 " e clique em "OK". © 2017 Fundação Japão em São Paulo Passo 3: Digitando em japonês Certifique-se de que tenha selecionado japonês na barra de idiomas. Após isso, selecione “hiragana”, como indica a seta. Passo 4: Digitando em japonês com letras romanas Uma vez que estiver no modo de entrada correto no documento, vamos digitar uma palavra prática.
    [Show full text]
  • Handy Katakana Workbook.Pdf
    First Edition HANDY KATAKANA WORKBOOK An Introduction to Japanese Writing: KANA THIS IS A SUPPLEMENT FOR BEGINNING LEVEL JAPANESE LANGUAGE INSTRUCTION. \ FrF!' '---~---- , - Y. M. Shimazu, Ed.D. -----~---- TABLE OF CONTENTS Page Introduction vi ACKNOWLEDGEMENlS vii STUDYSHEET#l 1 A,I,U,E, 0, KA,I<I, KU,KE, KO, GA,GI,GU,GE,GO, N WORKSHEET #1 2 PRACTICE: A, I,U, E, 0, KA,KI, KU,KE, KO, GA,GI,GU, GE,GO, N WORKSHEET #2 3 MORE PRACTICE: A, I, U, E,0, KA,KI,KU, KE, KO, GA,GI,GU,GE,GO, N WORKSHEET #~3 4 ADDmONAL PRACTICE: A,I,U, E,0, KA,KI, KU,KE, KO, GA,GI,GU,GE,GO, N STUDYSHEET #2 5 SA,SHI,SU,SE, SO, ZA,JI,ZU,ZE,ZO, TA, CHI, TSU, TE,TO, DA, DE,DO WORI<SHEEI' #4 6 PRACTICE: SA,SHI,SU,SE, SO, ZA,II, ZU,ZE,ZO, TA, CHI, 'lSU,TE,TO, OA, DE,DO WORI<SHEEI' #5 7 MORE PRACTICE: SA,SHI,SU,SE,SO, ZA,II, ZU,ZE, W, TA, CHI, TSU, TE,TO, DA, DE,DO WORKSHEET #6 8 ADDmONAL PRACI'ICE: SA,SHI,SU,SE, SO, ZA,JI, ZU,ZE,ZO, TA, CHI,TSU,TE,TO, DA, DE,DO STUDYSHEET #3 9 NA,NI, NU,NE,NO, HA, HI,FU,HE, HO, BA, BI,BU,BE,BO, PA, PI,PU,PE,PO WORKSHEET #7 10 PRACTICE: NA,NI, NU, NE,NO, HA, HI,FU,HE,HO, BA,BI, BU,BE, BO, PA, PI,PU,PE,PO WORKSHEET #8 11 MORE PRACTICE: NA,NI, NU,NE,NO, HA,HI, FU,HE, HO, BA,BI,BU,BE, BO, PA,PI,PU,PE,PO WORKSHEET #9 12 ADDmONAL PRACTICE: NA,NI, NU, NE,NO, HA, HI, FU,HE, HO, BA,BI,3U, BE, BO, PA, PI,PU,PE,PO STUDYSHEET #4 13 MA, MI,MU, ME, MO, YA, W, YO WORKSHEET#10 14 PRACTICE: MA,MI, MU,ME, MO, YA, W, YO WORKSHEET #11 15 MORE PRACTICE: MA, MI,MU,ME,MO, YA, W, YO WORKSHEET #12 16 ADDmONAL PRACTICE: MA,MI,MU, ME, MO, YA, W, YO STUDYSHEET #5 17
    [Show full text]
  • The Japanese Writing Systems, Script Reforms and the Eradication of the Kanji Writing System: Native Speakers’ Views Lovisa Österman
    The Japanese writing systems, script reforms and the eradication of the Kanji writing system: native speakers’ views Lovisa Österman Lund University, Centre for Languages and Literature Bachelor’s Thesis Japanese B.A. Course (JAPK11 Spring term 2018) Supervisor: Shinichiro Ishihara Abstract This study aims to deduce what Japanese native speakers think of the Japanese writing systems, and in particular what native speakers’ opinions are concerning Kanji, the logographic writing system which consists of Chinese characters. The Japanese written language has something that most languages do not; namely a total of ​ ​ three writing systems. First, there is the Kana writing system, which consists of the two syllabaries: Hiragana and Katakana. The two syllabaries essentially figure the same way, but are used for different purposes. Secondly, there is the Rōmaji writing system, which is Japanese written using latin letters. And finally, there is the Kanji writing system. Learning this is often at first an exhausting task, because not only must one learn the two phonematic writing systems (Hiragana and Katakana), but to be able to properly read and write in Japanese, one should also learn how to read and write a great amount of logographic signs; namely the Kanji. For example, to be able to read and understand books or newspaper without using any aiding tools such as dictionaries, one would need to have learned the 2136 Jōyō Kanji (regular-use Chinese characters). With the twentieth century’s progress in technology, comparing with twenty years ago, in this day and age one could probably theoretically get by alright without knowing how to write Kanji by hand, seeing as we are writing less and less by hand and more by technological devices.
    [Show full text]
  • Writing As Aesthetic in Modern and Contemporary Japanese-Language Literature
    At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2021 Reading Committee: Edward Mack, Chair Davinder Bhowmik Zev Handel Jeffrey Todd Knight Program Authorized to Offer Degree: Asian Languages and Literature ©Copyright 2021 Christopher J Lowy University of Washington Abstract At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy Chair of the Supervisory Committee: Edward Mack Department of Asian Languages and Literature This dissertation examines the dynamic relationship between written language and literary fiction in modern and contemporary Japanese-language literature. I analyze how script and narration come together to function as a site of expression, and how they connect to questions of visuality, textuality, and materiality. Informed by work from the field of textual humanities, my project brings together new philological approaches to visual aspects of text in literature written in the Japanese script. Because research in English on the visual textuality of Japanese-language literature is scant, my work serves as a fundamental first-step in creating a new area of critical interest by establishing key terms and a general theoretical framework from which to approach the topic. Chapter One establishes the scope of my project and the vocabulary necessary for an analysis of script relative to narrative content; Chapter Two looks at one author’s relationship with written language; and Chapters Three and Four apply the concepts explored in Chapter One to a variety of modern and contemporary literary texts where script plays a central role.
    [Show full text]
  • Legacy Character Sets & Encodings
    Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text.
    [Show full text]
  • Implementing Cross-Locale CJKV Code Conversion
    Implementing Cross-Locale CJKV Code Conversion Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf Code Conversion Basics dc • Algorithmic code conversion — Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP — A purely mathematical process • Table-driven code conversion — Required across locales: Chinese ↔ Japanese — Required when dealing with Unicode — Mapping tables are required — Can sometimes be faster than algorithmic code conversion— depends on the implementation September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Code Conversion Basics (Cont’d) dc • CJKV character set differences — Different number of characters — Different ordering of characters — Different characters September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Character Sets Versus Encodings dc • Common CJKV character set standards — China: GB 1988-89, GB 2312-80; GB 1988-89, GBK — Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992 — Hong Kong: ASCII, Big Five with Hong Kong extension — Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990 — South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991 — North Korea: ASCII (?), KPS 9566-97 — Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995 • Common CJKV encodings — Locale-independent: EUC-*, ISO-2022-* — Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code — Other: UCS-2, UCS-4, UTF-7, UTF-8,
    [Show full text]
  • The Selected Poems of Yosa Buson, a Translation Allan Persinger University of Wisconsin-Milwaukee
    University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 2013 Foxfire: the Selected Poems of Yosa Buson, a Translation Allan Persinger University of Wisconsin-Milwaukee Follow this and additional works at: https://dc.uwm.edu/etd Part of the American Literature Commons, and the Asian Studies Commons Recommended Citation Persinger, Allan, "Foxfire: the Selected Poems of Yosa Buson, a Translation" (2013). Theses and Dissertations. 748. https://dc.uwm.edu/etd/748 This Dissertation is brought to you for free and open access by UWM Digital Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of UWM Digital Commons. For more information, please contact [email protected]. FOXFIRE: THE SELECTED POEMS OF YOSA BUSON A TRANSLATION By Allan Persinger A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in English at The University of Wisconsin-Milwaukee May 2013 ABSTRACT FOXFIRE: THE SELECTED POEMS OF YOSA BUSON A TRANSLATION By Allan Persinger The University of Wisconsin-Milwaukee, 2013 Under the Supervision of Professor Kimberly M. Blaeser My dissertation is a creative translation from Japanese into English of the poetry of Yosa Buson, an 18th century (1716 – 1783) poet. Buson is considered to be one of the most important of the Edo Era poets and is still influential in modern Japanese literature. By taking account of Japanese culture, identity and aesthetics the dissertation project bridges the gap between American and Japanese poetics, while at the same time revealing the complexity of thought in Buson's poetry and bringing the target audience closer to the text of a powerful and mov- ing writer.
    [Show full text]
  • AIX Globalization
    AIX Version 7.1 AIX globalization IBM Note Before using this information and the product it supports, read the information in “Notices” on page 233 . This edition applies to AIX Version 7.1 and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright International Business Machines Corporation 2010, 2018. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this document............................................................................................vii Highlighting.................................................................................................................................................vii Case-sensitivity in AIX................................................................................................................................vii ISO 9000.....................................................................................................................................................vii AIX globalization...................................................................................................1 What's new...................................................................................................................................................1 Separation of messages from programs..................................................................................................... 1 Conversion between code sets.............................................................................................................
    [Show full text]