
Unicode in Action
Cummings, McKenna, Texin
Internationalization and Unicode Conference 42
Sept. 10, 2018

Presenters
• Tex Texin
  Globalization Architect, XenCraft
• Craig R. Cummings
  Staff Consultant/Globalization Evangelist, VMware
• Michael McKenna
  World-Ready Architect, PayPal, Inc.

Abstract
• The Unicode in Action tutorial is a 90-minute session that demonstrates programming with Unicode and related best practices.
• This tutorial will build a simple application and demonstrate the code and resulting behavior as internationalization functions are added. Attendees will be able to relate these prototype examples to the requirements of their own applications and map them to code solutions.
• The program will show sorting at different strengths, regular expressions, Unicode normalization, bidirectional languages, and other features of the Unicode standard. The tutorial will highlight why each of these functions is needed, so you can determine when to use them in your applications.

Code
• The demo code will be available on I18nGuy.com shortly after the conference.

Objectives
• Be introductory level
• Simple examples
• The program will show
  – sorting at different strengths,
  – regular expressions,
  – Unicode normalization,
  – bidirectional languages,
  – and other features.
• Highlight the need for these features.

Base Program – Movie Catalog
• Our first example is a simple movie catalog.
• It could be any business application, listing products, customers, etc.
• It demonstrates typical data requirements:
  – text, dates, numbers, currencies, taxonomies, images.
• It is written in HTML5 and JavaScript
  – For simplicity and portability

Internationalization and Unicode Conference | Tex Texin, XenCraft | Unicode in Action

Base Program – Movie Catalog

Simple Code: HTML5 and JavaScript
HTML Excerpts

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>Unicode in Action Movie Catalog</title>
    <link href="css/styles.css" rel="stylesheet">
    </head>
    <body>
    …
    <h1>Options</h1>
    <form id="options" name="settings"
          onsubmit="return myControls();" >
    <p>Search: <input type="text" name="search"
          size="40"
          placeholder="search term or regular expression"></p>
    </td></tr>
    </table>
    <div class="controlbuttons">
    <input type="submit" value="Go">
    </div>
    </form>

    <div id="datalist">
    <table class='products-list'>
    <caption>Movie Catalog</caption>
    <tr id="prodheading">
    <th>Title</th><th>Release Date</th><th>Genre</th><th>Units<br>(Thousands)</th><th>Price</th><th>Cover</th></tr>
    <tbody id="id01">
    </tbody>
    </table>
    </div>

Simple Code: HTML5 and JavaScript
JavaScript Excerpts

    <script type="text/javascript">
    function showProducts(data) {
      var i;
      var out = "";
      for (i = 0; i < data.length; i++) {
        // Skip records whose title does not match the search pattern
        if (searchFilter(data[i].title, UIApattern)) {
          continue;
        }
        // One table row per movie; each my*() helper formats one field
        out += "<tr><td>" + data[i].title + "</td><td>" +
          mydate(data[i].specs.year) + "</td><td>" +
          mygenre(data[i].specs.genre) + "</td><td>" +
          mynumber(data[i].specs.duration) + "</td><td>" +
          mycurrency(data[i].price) + "</td><td>" +
          myimage(data[i].image.small) + "</td></tr>\n";
      }
      document.getElementById("id01").innerHTML = out;
    }

    function myimage(value) {
      var intlvalue = "<img alt='movie cover photo' src='" + value + "'>";
      return (intlvalue);
    }
    </script>

Simple Code: HTML5 and JavaScript
JavaScript Excerpts

    function getProducts() {
      var products = readjson("", "products.json");
      showProducts(products);
    }

    function myControls() {
      UIApattern = document.forms["settings"]["search"].value;
      getProducts();
      return false;   // cancel the form submission so the page does not reload
    }

    /* return true for records that do not match */
    function searchFilter(testValue, matchPattern) {
      var exclude = false;
      if (matchPattern == "") {
        return (exclude);
      }
      var REpattern = new RegExp(matchPattern, "i");
      exclude = (testValue.search(REpattern) == -1);   /* if not found, exclude = true */
      return (exclude);
    }

Base Program – Movie Catalog

Base Program – Movie Catalog
What do we need to make this program global?
• Locale
• Search
• Sort
• Normalization
• Bidi, LTR, RTL
• Encoding (UTF-8, UTF-16, Supplementary Characters)

Internationalized Movie Catalog – Chinese Locale

Internationalized Movie Catalog – Arabic Locale, RTL Direction

Internationalized Movie Catalog
Features
• Uses locales
  – (en-US, de-DE, zh-CN, sv, ar)
• Localized headings, taxonomy
• Formatted data (date, number, price)
• Normalization of input
• Localized sort
• Bidi

Normalization
Tex Texin, Internationalization Architect

Canonical & Compatibility Normalization
• Unicode characters can have more than one representation
• Canonical equivalence
  – Indistinguishable, fundamental equivalence
  – E.g. combining sequences, singletons
  – "Å" U+00C5 (A-ring precomposed)
  – "A+˚" U+0041 + U+030A (A + combining ring above)
  – "Å" U+212B (Angstrom sign)
• Compatibility equivalence
  – E.g. formatting differences, ligatures
  – "ｶ" U+FF76, "カ" U+30AB (KA, half width and full width)
  – "ﬁ" U+FB01 (ligature fi)

Unicode Normalization Forms
• The Unicode Consortium has defined canonical and compatibility decomposition formats and 4 different sets of rules for normalization:
  "Unicode Normalization Forms", http://www.unicode.org/unicode/reports/tr15/

                            Composed   Decomposed
Canonical                   NFC        NFD
Canonical + Kompatibility   NFKC       NFKD

Sorting

Collation Dependencies
• Language
• Application
  – Dictionary
  – Phonebook
• "Strength"
  – Accent
  – Case
  – Ignorables

Example Collation Differences
Language         Swedish: z < ö
                 German: ö < z
Usage            Dictionary: of < öf
                 Telephone: öf < of
Customizations   Upper-first: A < a
                 Lower-first: a < A

Comparison Levels
Level   Description       Examples
L1      Base characters   role < roles < rule
L2      Accents           role < rôle < roles
L3      Case              role < Role < rôle
L4      Punctuation       role < "role" < Role
Ln      Tie-Breaker       role < ro□le < "role"
The box represents a format character.
Purple characters are more significant than differences indicated by underscores.

Accent Ordering
Forward accent ordering:   cote < coté < côte < côté
French accent ordering:    cote < côte < coté < côté
French gives more weight to accents at the end of the string than at the beginning. In forward ordering, cote and coté sort next to each other, but in French ordering, côte sorts between them.

Language Identifiers

Language Identification
• HTTP: Content-Language header
• HTML: LANG attribute
  e.g. <html lang="fr">
• XML: xml:lang attribute
• XHTML 1.0: Both lang and xml:lang
  <p xml:lang="la" lang="la">Verba.</p>
• XHTML 1.1, 2: xml:lang attribute

BCP47 Language Identifiers
language-extlang-script-region-variants-extensions-privateuse

Subtag       Standard    Syntax                        Examples
Language     ISO 639     2 or 3 letter code            en, yue
Extlang      ISO 639-3   3 letter code (legacy only)   zh-yue
Script       ISO 15924   4 letter code                 Latn, Cyrl, Hans, Hant
Region       ISO 3166    2 letter code                 US, GB
             UN M49      3 digit code                  419
Variants
Extensions
Privateuse

Registry: http://www.iana.org/assignments/language-subtag-registry

Example Language Identifiers
Tag           Language
en            English
en-US         American English
es-US         Spanish as spoken in the U.S.
en-CA         Canadian English
fr-CA         Canadian French
fr-FR         French as spoken in France
es-ES         Iberian Spanish
es-419        Latin American Spanish
es-MX         Mexican Spanish
zh-CN         Chinese as spoken in China
zh            Chinese
zh-Hant       Traditional Chinese
zh-Hans       Simplified Chinese
cmn           Mandarin
yue           Cantonese
cmn-Hans-CN   Mandarin for China in Simplified Chinese
cmn-Hant      Mandarin in Traditional Chinese
pt-BR         Brazilian Portuguese
zh-yue        Retired, use yue instead

Language Identification – CSS
Two methods use the language attribute in CSS:
• The lang pseudo-class.
  *:lang(zh) { font-family:SimSun }
• The attribute selector.
  *[lang|=fr] { font-weight:bold }
• Both use the same tag matching as the lang() function in XPath (exact tag, or tag followed by a hyphen).
  – :lang() also matches elements that inherit their language from an ancestor; the attribute selector matches only elements that carry a lang attribute.
Example: LanguagesCSS.htm

Text Layout Standards
More content and example code are available at: www.xencraft.com/training/webstandards.html

Feature
• Lang()
• Lang pseudo-class
• Lang attr selector
• Quote:qo
• Text-transform
• Css list-style-type
• Xsl number
• Xsl format-number
• Html bi-directional text
• Css bi-directional text
• Vertical text (SVG losing ground)
• Ruby annotation
• Css3 combined sort
• Xsl:sort

Bidirectional Support

Bidirectional (Bidi) Language Support
• HTML 4 DIR attribute
  dir="ltr" | dir="rtl"
  – Sets base direction
  – Direction is inherited
• Direction affects alignment and flow
  – Ordering of text and table columns
  – Text alignment, alignment of overflowing blocks
• Control characters
  – Right-to-Left and Left-to-Right Marks: &rlm; / &lrm;
  – Useful for correct positioning of neutral characters such as punctuation

Bidirectional (Bidi) Language Support
• HTML5
  – Isolates: <bdi dir=rtl> </bdi>
  – Flow doesn't change when the container changes!
• dir=auto
  – Detects direction based on the first strong character
• CSS selectors
  – :dir(rtl) for rtl elements
  – :dir(ltr) for ltr elements
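The `searchFilter()` excerpt builds its regular expression with only the `"i"` flag and compares raw strings, so a title stored with a precomposed "é" (U+00E9) is not found by a query typed as "e" plus a combining acute accent (U+0301). Below is a minimal sketch of one possible fix, assuming NFC is an acceptable common form for both titles and queries. `searchFilterNFC` is a hypothetical name, and the fallback to a literal search for invalid patterns is an added design choice, not part of the demo.

```javascript
// Hypothetical variant of the demo's searchFilter(): returns true when the
// record should be excluded (no match), like the original.
function searchFilterNFC(testValue, matchPattern) {
  if (matchPattern === "") {
    return false; // empty search keeps every record
  }
  // Normalize the pattern so precomposed and decomposed input match alike.
  const pattern = matchPattern.normalize("NFC");
  let re;
  try {
    // "u" makes the pattern operate on code points, e.g. "." matches one emoji.
    re = new RegExp(pattern, "iu");
  } catch (err) {
    // Not a valid regular expression: escape it and search for the literal text.
    re = new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "iu");
  }
  return testValue.normalize("NFC").search(re) === -1;
}

const title = "Am\u00E9lie";   // é stored as one code point
const query = "ame\u0301lie";  // e + combining acute accent
console.log(title.search(new RegExp(query, "i")) === -1); // true: the original approach misses it
console.log(searchFilterNFC(title, query));               // false: the record is kept
```

Normalizing at comparison time leaves the stored catalog data untouched; an application that controls its data could instead normalize once on input, as the Features slide suggests.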
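The JavaScript excerpt calls `mydate()`, `mynumber()` and `mycurrency()` without showing their bodies, and the Features slide lists formatted dates, numbers and prices for locales such as en-US, de-DE, zh-CN, sv and ar. One way such helpers might be written with the standard ECMAScript `Intl` API is sketched below. The extra `locale` parameter and the assumption that catalog prices are US dollar amounts are illustrative, not taken from the demo code.

```javascript
// Hypothetical locale-aware versions of the demo's formatting helpers.
// Assumption: prices in the catalog data are US dollar amounts.
function mydate(year, locale) {
  // Format a release year only; UTC keeps Jan 1 from shifting into the prior year.
  const fmt = new Intl.DateTimeFormat(locale, { year: "numeric", timeZone: "UTC" });
  return fmt.format(new Date(Date.UTC(year, 0, 1)));
}

function mynumber(value, locale) {
  // Grouping and decimal separators come from the locale.
  return new Intl.NumberFormat(locale).format(value);
}

function mycurrency(value, locale, currency = "USD") {
  // Symbol placement and spacing come from the locale; the currency does not change.
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(value);
}

for (const locale of ["en-US", "de-DE", "zh-CN", "sv", "ar"]) {
  console.log(locale, mydate(2001, locale), mynumber(1234.5, locale), mycurrency(19.99, locale));
}
```

Note that the locale controls only the presentation of the price. Converting to another currency is a business decision that formatting code should not make.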
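The "What do we need" slide lists encoding concerns: UTF-8, UTF-16 and supplementary characters. JavaScript strings are sequences of UTF-16 code units, so `length` counts a supplementary character such as U+1F3AC (clapper board) as 2. The sketch below compares the three counts and shows a truncation that does not split a surrogate pair. The helper names are hypothetical, and `TextEncoder` (which always produces UTF-8) is assumed to be available, as it is in current browsers and Node.js.

```javascript
// Report a string's size in UTF-16 code units, code points, and UTF-8 bytes.
function sizes(text) {
  return {
    utf16Units: text.length,                          // what String.length counts
    codePoints: Array.from(text).length,              // string iteration is by code point
    utf8Bytes: new TextEncoder().encode(text).length, // TextEncoder always emits UTF-8
  };
}

// Truncate to at most maxCodePoints without cutting a surrogate pair in half.
// (It can still separate a base letter from a following combining mark.)
function truncateCodePoints(text, maxCodePoints) {
  return Array.from(text).slice(0, maxCodePoints).join("");
}

const clapper = "\u{1F3AC}"; // supplementary character, stored as a surrogate pair
console.log(sizes("e"), sizes("\u00E9"), sizes(clapper));
console.log(clapper.slice(0, 1) === "\uD83C");                // true: a lone high surrogate
console.log(truncateCodePoints(clapper + "!", 1) === clapper); // true
```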
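The examples on the Canonical & Compatibility Normalization slide can be checked directly with `String.prototype.normalize()`, which implements the four forms defined in "Unicode Normalization Forms" (UAX #15). This sketch prints each example under NFC, NFD, NFKC and NFKD. `showForms` and `hex` are hypothetical helper names.

```javascript
// Show a string as space-separated U+XXXX code points.
function hex(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")).join(" ");
}

// Return the four normalization forms of text, as code point lists.
function showForms(text) {
  const result = {};
  for (const form of ["NFC", "NFD", "NFKC", "NFKD"]) {
    result[form] = hex(text.normalize(form));
  }
  return result;
}

const examples = {
  "A-ring precomposed": "\u00C5",
  "A + combining ring": "A\u030A",
  "Angstrom sign": "\u212B",
  "halfwidth KA": "\uFF76",
  "fi ligature": "\uFB01",
};
for (const [label, text] of Object.entries(examples)) {
  console.log(label, showForms(text));
}

// Canonically equivalent strings compare equal only after normalization.
console.log("\u00C5" === "A\u030A");                                   // false
console.log("\u00C5".normalize("NFC") === "A\u030A".normalize("NFC")); // true
```

The canonical forms keep the halfwidth KA and the ﬁ ligature, while the compatibility forms fold them to カ and "fi". Because that folding discards formatting distinctions, compatibility forms are usually reserved for matching rather than for storing display text.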
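The Comparison Levels slide corresponds roughly to the `sensitivity` option of `Intl.Collator`. `"base"` compares only level 1, `"accent"` adds level 2, and `"variant"` (the default for sorting) also considers case at level 3. The sketch below assumes an English collator and reproduces the L1 to L3 rows. `sortAt` is a hypothetical helper. The L4 punctuation row depends on ICU's "shifted" handling of punctuation, which `Intl.Collator` exposes only indirectly through `ignorePunctuation`, so it is left out here.

```javascript
// Sort a copy of words with an English collator at the given sensitivity.
function sortAt(words, sensitivity) {
  const collator = new Intl.Collator("en", { sensitivity });
  return [...words].sort(collator.compare);
}

// L1: only base letters count.
console.log(sortAt(["rule", "roles", "role"], "base"));   // [ 'role', 'roles', 'rule' ]
// L2: accents break ties, but the extra "s" (an L1 difference) still wins.
console.log(sortAt(["roles", "rôle", "role"], "accent")); // [ 'role', 'rôle', 'roles' ]
// L3: case breaks ties, but an accent (L2) outweighs case.
console.log(sortAt(["rôle", "Role", "role"], "variant")); // [ 'role', 'Role', 'rôle' ]

// Equality at each strength: what counts as "the same" word for searching.
console.log(new Intl.Collator("en", { sensitivity: "base" }).compare("role", "Rôle"));   // 0
console.log(new Intl.Collator("en", { sensitivity: "accent" }).compare("role", "Role")); // 0
```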
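The Example Collation Differences slide can be reproduced with locale tags and collator options. This sketch assumes an engine with full ICU locale data, which current browsers and Node.js builds include. Swedish and German order "ö" differently. The `-u-co-phonebk` Unicode extension selects German phonebook collation, which treats "ö" as "oe". The `caseFirst` option sets upper-first or lower-first. `orderOf` and `sortTitles` are hypothetical helpers; `sortTitles` hints at how the catalog's "Localized sort" feature might use the same collators.

```javascript
// Sort a copy of the strings with a collator for the given locale and options.
function orderOf(locale, strings, options = {}) {
  return [...strings].sort(new Intl.Collator(locale, options).compare);
}

// Language: Swedish sorts ö after z; German sorts it with o.
console.log(orderOf("sv", ["ö", "z"])); // [ 'z', 'ö' ]
console.log(orderOf("de", ["z", "ö"])); // [ 'ö', 'z' ]

// Usage: German dictionary order vs. phonebook order (ö treated as "oe").
console.log(orderOf("de", ["öf", "of"]));              // [ 'of', 'öf' ]
console.log(orderOf("de-u-co-phonebk", ["of", "öf"])); // [ 'öf', 'of' ]

// Customization: upper-first vs. lower-first.
console.log(orderOf("en", ["a", "A"], { caseFirst: "upper" })); // [ 'A', 'a' ]
console.log(orderOf("en", ["A", "a"], { caseFirst: "lower" })); // [ 'a', 'A' ]

// Hypothetical catalog use: sort movie records by title for the user's locale.
function sortTitles(products, locale) {
  const collator = new Intl.Collator(locale); // reuse one collator for the whole sort
  return [...products].sort((a, b) => collator.compare(a.title, b.title));
}
```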
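`Intl.Collator` has no option for the French backward accent ordering shown on the Accent Ordering slide, but locale data can supply it. In CLDR, as far as we know, Canadian French (`fr-CA`) keeps backward secondary (accent) ordering, while the default French collation now uses forward ordering. If that holds for your engine's ICU data, the sketch below reproduces both rows of the slide. Treat the expected orders as an assumption to verify rather than a guarantee.

```javascript
const words = ["côté", "coté", "côte", "cote"];

// Forward accent ordering: accents are compared from the start of the word.
// (Assumption: the default "fr" collation in current CLDR data is forward.)
const forward = [...words].sort(new Intl.Collator("fr").compare);

// Backward (French) accent ordering: accents are compared from the end of the word.
// (Assumption: CLDR's fr-CA collation enables backward secondary ordering.)
const backward = [...words].sort(new Intl.Collator("fr-CA").compare);

console.log(forward.join(" < "));
console.log(backward.join(" < "));
```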
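Tags like those in the BCP47 tables can be parsed and canonicalized with `Intl.getCanonicalLocales()` and `Intl.Locale`. In addition, `Intl.DisplayNames` returns CLDR display names that match several rows of the Example Language Identifiers slide. `describeTag` is a hypothetical helper, and the sketch assumes an engine with `Intl.DisplayNames` (Node.js 14 or later, current browsers). `Intl.Locale` applies CLDR canonicalization, which can rewrite some language aliases, so keep the original tag if its exact spelling matters.

```javascript
// Break a BCP 47 tag into the subtags shown on the slide.
function describeTag(tag) {
  const loc = new Intl.Locale(tag);
  return {
    canonical: Intl.getCanonicalLocales(tag)[0], // case-normalized, e.g. "zh-Hant-TW"
    language: loc.language,
    script: loc.script,  // undefined when the tag has no script subtag
    region: loc.region,  // ISO 3166 code or UN M49 number, e.g. "419"
  };
}

// English display names for language tags, from CLDR data.
const names = new Intl.DisplayNames(["en"], { type: "language" });

for (const tag of ["en-US", "fr-CA", "es-419", "zh-hant-tw", "pt-BR"]) {
  console.log(tag, describeTag(tag), names.of(tag));
}
```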
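The Bidi slides note that `dir=auto` takes its direction from the first strong character, and that `<bdi>` isolates embedded text. When text is assembled as plain strings (for example, a movie title inside a message), the same ideas can be sketched in JavaScript. This is an approximation. JavaScript regular expressions do not expose the Bidi_Class property used by the Unicode Bidirectional Algorithm (UAX #9), so the hypothetical `firstStrongDirection` treats letters in the Hebrew, Arabic, Syriac, Thaana and NKo scripts as right-to-left and every other letter as left-to-right. The hypothetical `isolate` wraps text in U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE, the plain-text counterpart of `<bdi>`.

```javascript
// Approximate right-to-left "strong" letters by script (assumption, see above).
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}\p{Script=Nko}]/u;
const LETTER = /\p{L}/u;

// Rough analogue of dir="auto": the direction of the first strong character,
// or null when there is none (the caller keeps the inherited direction).
function firstStrongDirection(text) {
  for (const ch of text) {          // iterates by code point
    if (!LETTER.test(ch)) continue; // digits, spaces, punctuation are not strong
    return RTL_LETTER.test(ch) ? "rtl" : "ltr";
  }
  return null;
}

// Isolate embedded text so it neither affects nor is affected by its surroundings.
function isolate(text) {
  return "\u2068" + text + "\u2069"; // FSI ... PDI
}

console.log(firstStrongDirection("2018 שלום"));  // rtl
console.log(firstStrongDirection("Movie فيلم")); // ltr
console.log("Now showing: " + isolate("فيلم") + "!");
```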