Keynote Speech: Internationalizing Web Content

Richard Ishida, W3C Internationalization Lead
Copyright © 2005 W3C (MIT, ERCIM, Keio)

Objectives
• Explore the dimensions of internationalization (i18n)
• Tease apart some basic contexts where internationalization is necessary
• Show examples of how the W3C is making local access to the Web easier, or possible at all
• Show how internationalization is a prerequisite for good local content

Overview
• L10n or i18n?
• Getting the character basics right
• Extending technology to support local needs
• Removing barriers to international use
• Assessing cultural influences
• Improving the process
• Summary

L10n or i18n?
Localization: the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market.
Internationalization: the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.
http://www.w3.org/International/questions/qa-i18n

Getting the character basics right
"Making the World Wide Web world wide!" in many languages, among them Amharic, Hindi, Nepali and Punjabi, and:
缔造真正全球通行的万维网 (Chinese, Simplified)
締造真正全球通行的萬維網 (Chinese, Traditional)
Κάνοντας τον Παγκόσμιο Ιστό πραγματικά Παγκόσμιο (Greek)
ליצור מהרשת רשת כלל עולמית באמת (Hebrew)
ᑖᑦᓱᒪ ᐃᑭᐊᖅᑭᕕᒃ ᓯᓚᕐᔪᐊᓕᒫᒥᒃ ᓈᕆᑎᑉᐹ. (Inuktitut)
ワールド・ワイド・ウェッブを世界中に広げましょう (Japanese)
Hogy a Világháló valóban az egész világé lehessen! (Hungarian)
"Дүниежүзілік торды" нағыз дүниежүзілік етеміз! (Kazakh)
전세계의 월드 와이드 웹으로 만들기! (Korean)
Сделаем "Всемирную паутину" действительно всемирной! (Russian)
U ita uri Webu Nyangaredzi ya Dzhango i vhe nyangaredzi ngangoho! (Venda)

Getting the character basics right
European alphabetic scripts: Latin, Greek, Cyrillic, Armenian, Georgian, Runic, Ogham
Middle East scripts: Hebrew, Arabic, Syriac, Thaana
South & South East Asian scripts: Devanagari, Bengali, Gurmukhi, Gujarati, Panjabi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Khmer
East Asian scripts: Han, Hiragana, Katakana, Hangul, Bopomofo, Yi
Additional scripts: Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Mongolian
Symbols: currency symbols, modifier letters, letter-like symbols, combining characters, mathematical operators, numeric forms, technical symbols, geometric symbols, miscellaneous symbols & dingbats, enclosed & square characters, Braille
Etc.

Getting the character basics right
Character vs. glyph: a single character, such as "a" or 雪, can be displayed using different glyphs.

Getting the character basics right
Encodings
Character    A             א             好            𣎴
Code point   41            5D0           597D          233B4
UTF-8        41            D7 90         E5 A5 BD      F0 A3 8E B4
UTF-16       00 41         05 D0         59 7D         D8 4C DF B4
UTF-32       00 00 00 41   00 00 05 D0   00 00 59 7D   00 02 33 B4

Getting the character basics right
Characters in memory: ह + ◌ि + न + ◌् + द + ◌ी, displayed as हिन्दी ("Hindi"). The vowel sign ◌ि is stored after ह but drawn to its left.

Getting the character basics right
Arabic sample text: an announcement of the Unicode Conference, 10-12 March 1997.
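The encodings table can be checked with a short Python 3 sketch. The helper names `hex_bytes` and `encodings` are hypothetical; the code only uses Python's built-in codecs. The big-endian codecs (`utf-16-be`, `utf-32-be`) are chosen so that no byte-order mark is emitted and the byte order matches the table.

```python
# Sketch: reproduce the encodings table with Python's built-in codecs.
# Big-endian codecs are used so no byte-order mark is added.
SAMPLES = ["A", "\u05D0", "\u597D", "\U000233B4"]  # A, Hebrew alef, 好, U+233B4


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def encodings(ch: str) -> dict:
    """Return the code point and the UTF-8/16/32 byte sequences for one character."""
    return {
        "code point": f"{ord(ch):X}",
        "UTF-8": hex_bytes(ch.encode("utf-8")),
        "UTF-16": hex_bytes(ch.encode("utf-16-be")),  # surrogate pair for U+233B4
        "UTF-32": hex_bytes(ch.encode("utf-32-be")),
    }


if __name__ == "__main__":
    for ch in SAMPLES:
        row = encodings(ch)
        print(row["code point"], "|", row["UTF-8"], "|", row["UTF-16"], "|", row["UTF-32"])
```

The last character shows why UTF-16 is not a fixed-width encoding: U+233B4 lies outside the Basic Multilingual Plane, so it needs four bytes (a surrogate pair) in UTF-16 as well as in UTF-8.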
Getting the character basics right
Participating stakeholders:
• W3C adopted, and monitors the use of, Unicode as the document character set for its specifications, and liaises with the Unicode Consortium
• The Unicode Consortium defines Unicode and the expected character-level behaviors
• Platform developers need to provide Unicode support, such as rendering algorithms
• Application developers should ensure that the expected Unicode behaviors are implemented
• Content developers and managers should use Unicode whenever possible

Getting the character basics right
Character   Bytes (UTF-8)
A           41
á           C3 A1
あ          E3 81 82
𣎴          F0 A3 8E B4

Getting the character basics right
The string aאあ stored in UTF-8: 61 D7 90 E3 81 82 (a = 61, א = D7 90, あ = E3 81 82)

Getting the character basics right
Normalization:
NFC: Ízelítőül
NFD: I + ◌́, z, e, l, i + ◌́, t, o + ◌̋, u + ◌̈, l (each accented letter decomposed into a base letter plus a combining mark)

Sample text (Hungarian, in English translation): If the world wanted to talk, it would speak Unicode. Register now for the Tenth International Unicode Conference, to be held on 10-12 March 1997 in Mainz, Germany. Several well-known experts from the industry will take part in the conference. A taste ("ízelítőül") of the topics: internationalization and localization of the Web and of Unicode, and the use of Unicode in operating systems and applications, in text layout, and in multilingual computing.
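The NFC and NFD forms of "Ízelítőül" can be compared with Python's standard `unicodedata` module. This is a minimal sketch; the helper `code_points` is a hypothetical name. The word is written with escapes so that the source is unambiguously in NFC (precomposed Í, í, ő, ü).

```python
import unicodedata

# "Ízelítőül" written with precomposed characters (NFC).
WORD = "\u00CDzel\u00EDt\u0151\u00FCl"

nfc = unicodedata.normalize("NFC", WORD)  # composed: one code point per letter
nfd = unicodedata.normalize("NFD", WORD)  # decomposed: base letter + combining mark


def code_points(s: str) -> list:
    """List the code points of a string as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]


if __name__ == "__main__":
    print(len(nfc), code_points(nfc))
    print(len(nfd), code_points(nfd))
    # The two forms are canonically equivalent, but a naive comparison
    # only succeeds after both sides are normalized to the same form.
    print(nfc == nfd, unicodedata.normalize("NFC", nfd) == nfc)
```

The NFC string has 9 code points and the NFD string 13, because Í, í, ő and ü each split into a base letter plus a combining mark (U+0301, U+0301, U+030B, U+0308). This is why text should be normalized before it is compared or searched.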
Getting the character basics right
Declaring the character encoding:
HTTP header:        Content-Type: text/html; charset=utf-8
XML declaration:    <?xml version="1.0" encoding="UTF-8"?>
HTML meta element:  <meta http-equiv="Content-type" content="text/html;charset=UTF-8" />

                     HTTP   <?xml ..   <meta ..
HTML                 (✓)    ✗          ✓
XHTML (text/html)    (✓)    (✓)        ✓
XHTML (XML)          (✓)    ✓          ✗

http://www.w3.org/International/tutorials/tutorial-char-enc/

Extending technology to support local needs
Characters as ordered in memory:
The title says "<span dir="rtl">פעילות הבינאום, W3C</span>" in Hebrew.
(The Hebrew title means "W3C Internationalization Activity".)
✗ Using the bidi algorithm only, without the span, "W3C" is displayed to the right of the Hebrew text, as though it followed the title in the English sentence.
✓ With dir="rtl" on the span, the whole title is laid out right to left, so "W3C" appears at its left end, after the comma.

Extending technology to support local needs
Ruby annotation: the reading かみしばい is shown above 紙芝居 in これは紙芝居です。 ("This is a kamishibai", a picture-story show.)
<ruby>
  <rb>紙芝居</rb>
  <rt>かみしばい</rt>
</ruby>

Extending technology to support local needs
Word boundaries: 這一晚會如常舉行 is written without spaces and can be segmented in different ways, with different meanings:
這一|晚會|如常|舉行    This banquet is held as usual.
這一|晚會|如|常|舉行   If this banquet is held frequently.
這一晚|會|如常|舉行    (An event) will be held tonight as usual.
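The segmentation ambiguity can be made concrete with an exhaustive dictionary-based search. This is a toy sketch, not how any particular segmenter works: the dictionary is a hypothetical one holding only the words from the example, and the function simply enumerates every split. Practical segmenters typically add statistical scoring to pick one candidate.

```python
# Hypothetical toy dictionary: only the words needed for the example sentence.
DICTIONARY = {"這一", "這一晚", "晚會", "會", "如常", "如", "常", "舉行"}


def segmentations(text: str, words: set) -> list:
    """Return every way to split `text` into words from `words` (exhaustive search)."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            # Segment the remainder recursively and prefix each result with `head`.
            for rest in segmentations(text[end:], words):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    for seg in segmentations("這一晚會如常舉行", DICTIONARY):
        print("|".join(seg))
```

With this dictionary the search finds four candidates: the three from the slide plus 這一晚|會|如|常|舉行. A dictionary alone cannot choose between them; the correct reading depends on context.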
The ruby markup in inline form:
これは<ruby><rb>紙芝居</rb><rt>かみしばい</rt></ruby>です。

Extending technology to support local needs
• Punctuation trim: the example 经验分 (万维 shown before and after trimming
• Auto-space: 第10回のUnicode会議 shown as 第 10 回の Unicode 会議 ("The 10th Unicode Conference"), with extra space between Japanese characters and Latin letters or digits
• Emphasis: これは日本語の文章です。 ("This is a Japanese sentence."), with emphasis marks over the characters

Extending technology to support local needs
The Unicode conference announcement in Chinese, set vertically ("When the world needs to communicate, use Unicode!"), and in Hebrew, set right to left ("When the world wants to talk, it speaks Unicode.").

Extending technology to support local needs
Line wrapping samples: http://people.w3.org/rishida/scripts/samples/wrapping.html

Removing barriers to international use
You are speaking to her from my new house.
Están hablándole desde mi casa nueva. (Spanish)
私の新しい家から彼女と話しています。 (Japanese)

Removing barriers to international use
There were %d spelling mistakes in file: %s.
Datei %s enthält %d Rechtschreibfehler. (German: "File %s contains %d spelling mistakes.")
The < > has been disabled.
printf( "There were %d spelling
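The German message puts the file name before the count, so a fixed positional order of arguments (as in the English printf call) cannot serve both languages. One common remedy is to give placeholders names, sketched here with Python's `str.format`; the `MESSAGES` catalogue and `spelling_report` function are hypothetical, and this is a swapped-in illustration rather than the technique the slides prescribe. (In C, POSIX printf offers positional specifiers such as `%1$d` for the same purpose.)

```python
# Hypothetical message catalogue keyed by locale. Named fields let each
# translation place the values in whatever order its grammar needs.
MESSAGES = {
    "en": "There were {count} spelling mistakes in file: {file}.",
    "de": "Datei {file} enthält {count} Rechtschreibfehler.",
}


def spelling_report(locale: str, count: int, file: str) -> str:
    """Fill the localized message; the call site never depends on word order."""
    return MESSAGES[locale].format(count=count, file=file)


if __name__ == "__main__":
    print(spelling_report("en", 3, "notes.txt"))
    print(spelling_report("de", 3, "notes.txt"))
```

The calling code passes the same named values for every locale, so translators can reorder the sentence without any change to the program.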
