Internationalizing JavaScript Applications
Norbert Lindenberg

© Norbert Lindenberg 2013. All rights reserved.

Agenda
  • Unicode support
  • Collation
  • Number and date/time formatting
  • Localizable resources
  • Message construction

JavaScript is…
  • ECMAScript Language
  • ECMAScript Internationalization API
  • Browser: DOM, Navigator, XMLHttpRequest
  • Server: Node.js
  • Platforms: Firefox OS, Windows 8, PhoneGap
  • Libraries: jQuery, Dojo, YUI, GWT, Node modules, etc.

ECMAScript
  • Language Specification
  • Developed by Ecma TC 39
  • Language syntax and semantics
  • Core API: Object, String, Array, RegExp, ...
  • Edition 5.1 current
  • Edition 6 expected December 2014

ECMAScript
  • Internationalization API Specification
  • Developed by Ecma TC 39 + experts
  • API: Collator, NumberFormat, DateTimeFormat
  • Edition 1 approved December 2012
  • Chrome, Opera, Internet Explorer, Windows shipped; Firefox, Node.js coming
  • Edition 2 expected December 2014

Unicode

Unicode support
  • All text in UTF-16 internally
  • UTF-8 well supported for transport
  • Need to identify charset in <script> tags, Content-Type headers
  • Need to use encodeURIComponent for path and query string components

(Image: “Occupy Wall Street.” By @tanlines.)

Supplementary characters
  • Characters above U+FFFF
  • Emoji, rare CJK, ancient scripts, musical symbols, ...
  • 2 code units in UTF-16
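This is a short sketch of the code-unit problem; it is not taken from the slides. `length` counts UTF-16 code units, while `for...of` and `codePointAt` (both ECMAScript 6) work with code points. The sketch also shows `encodeURIComponent` percent-encoding the UTF-8 bytes of a query-string component. The helper `describeString` is a hypothetical name, and the code assumes an ES2015 or later engine such as Node.js.

```javascript
// Hypothetical helper: compare UTF-16 code units with Unicode code points.
function describeString(s) {
  const codePoints = [];
  // for...of iterates by code point (ES2015), so a surrogate pair stays together.
  for (const ch of s) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    codePoints.push("U+" + hex);
  }
  return {
    codeUnits: s.length,            // length counts UTF-16 code units
    codePoints: codePoints.length,  // number of Unicode code points
    hex: codePoints,
  };
}

// U+1F600 (an emoji) is a supplementary character: one code point, two code units.
const emoji = "\u{1F600}";
console.log(describeString(emoji)); // codeUnits: 2, codePoints: 1, hex: ["U+1F600"]

// The two halves of the surrogate pair, as seen by the ES5-era accessors.
console.log(emoji.charCodeAt(0).toString(16), emoji.charCodeAt(1).toString(16)); // d83d de00

// encodeURIComponent percent-encodes the UTF-8 bytes of each code point.
const query = "q=" + encodeURIComponent("北京 " + emoji);
console.log(query); // q=%E5%8C%97%E4%BA%AC%20%F0%9F%98%80
```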
Today: UCS-2 or UTF-16?
  • UCS-2:
    • Regular expressions
    • String comparison
    • Case conversion
  • UTF-16:
    • Source text conversion
    • URI handling
    • DOM, text input, text rendering, XMLHttpRequest

ECMAScript 6: UTF-16
  • Case conversion for full Unicode
  • Full Unicode in identifiers
  • String accessors for code points
  • But: no change to low-level string comparison
  • Planned: New Unicode mode in regular expressions

Regular expressions
  • RegExp in ES5 doesn’t have much Unicode support
  • No support for Unicode character properties
  • No support for supplementary characters

Regular expressions
  • CSet (inimino): Character classes with supplementary characters
  • XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters

Unicode normalization
  • Makes strings that users perceive as equal actually compare equal (more or less)
  • ä = a + ◌̈ (U+00E4 = U+0061 U+0308)
  • 김 = ᄀ + ᅵ + ᆷ (U+AE40 = U+1100 U+1175 U+11B7)

Unicode normalization
  • ECMAScript 5 “assumes” normalization happens where needed
  • Reality: applications have to do it
  • ECMAScript 6: String.prototype.normalize
    "김".normalize("NFD") → "\u1100\u1175\u11B7"
  • Libraries available, but not up to date:
    • unorm (Matsuza)
    • Richard Ishida’s normalizer

(Image: 北京大学.中国)

Internationalized domain names
  • Unicode at user interface
  • ASCII under the hood
  • 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s
  • Main steps:
    • normalization (as discussed)
    • punycode (Mathias Bynens has latest)

Collation

Collation (sorting)
  • Old: String.prototype.localeCompare
    • Only string argument
  • New: Intl.Collator
    • locales
    • options
  • Fixed: String.prototype.localeCompare
    • With locales and options arguments

Locales
  • BCP 47 language tags
    • Language, script, country codes
    • “es”, “en-AU”, “zh-Hans-CN”
  • Unicode locale extension
    • “de-u-co-phonebk”
  • Preference lists
    • [“mr”, “hi”, “en-IN”]

Locale negotiation
  • BCP 47 Lookup
    • [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”
  • Best fit
    • implementation defined
    • [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”
  • Unicode extension handled separately

Collator extensions
  • co: collation – phonebook, pinyin, ...
  • kf: case first – upper, lower
  • kn: numeric sorting

Collator options
  • localeMatcher: lookup, best fit
  • usage: sort, search
  • sensitivity: base, accent, case, variant
  • ignorePunctuation
  • numeric, caseFirst

Non-ECMAScript
  • Nothing good found (some for Latin only)
  • Collation is hard
    • Knowledge of full Unicode character set
    • Big tables
  • Send lists that need alphabetic sorting to server

Number formatting
  • Old: Number.prototype.toLocaleString
    • No arguments
  • New: Intl.NumberFormat
    • locales
    • options
  • Fixed: Number.prototype.toLocaleString
    • With locales and options arguments

NumberFormat extensions
  • nu: numbering system

NumberFormat options
  • localeMatcher: lookup, best fit
  • style: decimal, currency, percent
  • currency: ISO 4217 currency code
  • currencyDisplay: symbol, code, name
  • minimum/maximum digits
  • useGrouping

Non-ECMAScript
                ¤   %   ๙   #   ,   ⚑
  Globalize     +   +   -   +   -   250+
  Dojo          +   +   -   +   -   30+
  Closure       +   +   +   +   +   300+
  Windows 8     +   +   +   +   +   100s
  iLib          +   +   -   +   -   10+
  ¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.

Date and time formatting
  • Old: Date.prototype.toLocaleString, toLocaleDateString, toLocaleTimeString
    • No arguments
  • New: Intl.DateTimeFormat
    • locales
    • options
  • Fixed: Date.prototype.toLocaleString, toLocaleDateString, toLocaleTimeString
    • With locales and options arguments

DateTimeFormat extensions
  • ca: calendar
  • nu: numbering system

DateTimeFormat options
  • localeMatcher: lookup, best fit
  • timeZone: UTC
  • hour12
  • weekday, era, year, month, day, hour, minute, second, timeZoneName: components
  • formatMatcher: basic, best fit

Non-ECMAScript
                ca   tz   ๙   ⚑
  Globalize     5+   +    -   250+
  Dojo          4    -    -   30+
  Closure       +    +    +   300+
  Windows 8     ?    -    ?   ?
  Moment        -    -    -   50
  YUI           -    -    -   50+
  ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.
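The normalization and internationalized domain name slides can be checked with a few lines of code. This assumes an engine with ES2015 `String.prototype.normalize` and the WHATWG `URL` class, which current browsers and Node.js provide. `URL` is a host API, not part of ECMAScript. Its `hostname` getter applies IDNA processing, which includes normalization, and converts each label to Punycode. `canonicallyEqual` is a hypothetical helper.

```javascript
// "ä" as one precomposed code point vs. "a" followed by a combining diaeresis.
const composed = "\u00E4";
const decomposed = "a\u0308";
console.log(composed === decomposed); // false: different code units

// Hypothetical helper: compare strings after canonical normalization (NFC).
function canonicallyEqual(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}
console.log(canonicallyEqual(composed, decomposed)); // true

// Hangul syllable 김 (U+AE40) decomposes into three conjoining jamo under NFD...
const kim = "\uAE40";
const kimNFD = kim.normalize("NFD");
console.log(kimNFD === "\u1100\u1175\u11B7"); // true
// ...and NFC recomposes them into the single syllable.
console.log(kimNFD.normalize("NFC") === kim); // true

// Internationalized domain name: Unicode for users, ASCII (Punycode) under the hood.
// URL is a host API (browsers, Node.js), not part of ECMAScript itself.
const url = new URL("http://北京大学.中国/");
console.log(url.hostname); // xn--1lq90ic7fzpc.xn--fiqs8s
```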
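This sketch applies the `Intl.NumberFormat` options listed in the slides, under the same full-ICU assumption. Exact output comes from the engine's CLDR data, so only results that are stable for en-US and the Thai digits are annotated. The German currency result places a space, typically a non-breaking one, before "€". The Thai example uses the `nu` extension from the extensions slide.

```javascript
const amount = 1234.5;

// style: "currency" requires an ISO 4217 code in the currency option.
const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });
console.log(usd.format(amount)); // $1,234.50

// Same number with German conventions: "." groups digits, "," is the decimal separator.
const eur = new Intl.NumberFormat("de-DE", { style: "currency", currency: "EUR" });
console.log(eur.format(amount)); // starts with "1.234,50" and ends with "€"

// currencyDisplay: "name" (or "code") instead of the default "symbol".
console.log(new Intl.NumberFormat("en-US", {
  style: "currency", currency: "USD", currencyDisplay: "name",
}).format(amount));

// style: "percent" multiplies by 100; its default maximumFractionDigits is 0.
console.log(new Intl.NumberFormat("en-US", { style: "percent" }).format(0.256)); // 26%

// Digit settings and useGrouping.
const plain = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
  useGrouping: false,
});
console.log(plain.format(1234567.891)); // 1234567.89

// nu extension: Thai digits selected through the Unicode locale extension.
console.log(new Intl.NumberFormat("th-TH-u-nu-thai").format(123)); // ๑๒๓

// The fixed toLocaleString accepts the same locales and options arguments.
console.log(amount.toLocaleString("en-US", { style: "currency", currency: "USD" })); // $1,234.50
```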
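This sketch shows `Intl.DateTimeFormat` with the component options from the slides. A fixed date and an explicit `timeZone` keep the output independent of the machine running the code. The slides list only `UTC` for `timeZone`, but current engines also accept IANA names such as "Asia/Tokyo". The `ca` example assumes the engine supports the Japanese calendar. Outputs other than the en-US dates are left unannotated because they vary with CLDR versions; for example, newer data places a narrow no-break space before "PM".

```javascript
// 15 October 2013, 14:30 UTC (months are zero-based in Date.UTC).
const date = new Date(Date.UTC(2013, 9, 15, 14, 30));

// Components are requested individually; the locale decides order and punctuation.
const longDate = { timeZone: "UTC", year: "numeric", month: "long", day: "numeric" };
console.log(new Intl.DateTimeFormat("en-US", longDate).format(date)); // October 15, 2013
console.log(new Intl.DateTimeFormat("de-DE", { ...longDate, weekday: "long" }).format(date));

// hour12 switches between 12-hour and 24-hour clocks.
const time = { timeZone: "UTC", hour: "numeric", minute: "2-digit" };
console.log(new Intl.DateTimeFormat("en-US", { ...time, hour12: true }).format(date));
console.log(new Intl.DateTimeFormat("en-US", { ...time, hour12: false }).format(date));

// ca extension: Japanese imperial calendar, including the era component.
console.log(new Intl.DateTimeFormat("ja-JP-u-ca-japanese", {
  timeZone: "UTC", era: "long", year: "numeric", month: "long", day: "numeric",
}).format(date));

// An IANA time zone plus timeZoneName (assumes the engine has time zone data).
console.log(new Intl.DateTimeFormat("en-US", {
  ...time, timeZone: "Asia/Tokyo", timeZoneName: "short",
}).format(date));

// The fixed toLocaleDateString accepts the same locales and options arguments.
console.log(date.toLocaleDateString("en-US", longDate)); // October 15, 2013

// resolvedOptions() shows the negotiated locale, calendar and numbering system.
const { locale, calendar, numberingSystem } =
  new Intl.DateTimeFormat("ja-JP-u-ca-japanese").resolvedOptions();
console.log(locale, calendar, numberingSystem);
```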
Resources

Localizable resources
  • ECMAScript doesn’t have an I/O system, and therefore no standard resource loading
  • Developers have invented many different mechanisms

Representation
  • At runtime, use an object as a key-value map
  • JSON is good for transfer, but not as a source format
  • Localizers need comments
  • Existing source formats have problems:
    • Java properties files are not UTF-8
    • gettext .po file encoding is unspecified
    • YUI uses its own YRB format

How dynamic would you like it?
  • Making resources available to the application:
    • Inject strings/objects into JavaScript code
    • Bundle resources with JavaScript code
    • Load resource bundles at runtime

Injecting strings into JavaScript
  • Server knows locale when generating JavaScript and HTML
  • Inject strings or objects directly where needed
  • E.g., JavaServer Pages Standard Tag Library:

        <fmt:bundle basename="Messages">
          <script>
            alert("<fmt:message key='HELLO'/>");
          </script>
        </fmt:bundle>

Injecting strings into JavaScript
  • Problems:
    • Mixes multiple programming languages
    • Can introduce syntax errors through localization (e.g., a translation containing a double quote ends the string literal in the alert call early)

Bundling resources with JavaScript
  • Server knows locale when serving JS
  • Bundles resources with JavaScript
    • Convert to JavaScript/JSON
    • Concatenate with other JavaScript
  • Resources: var MyResources = {HELLO: "안녕하세요"};
  • Code: alert(MyResources.HELLO);

Loading resources at runtime
  • Locale not known until runtime
  • Request resources at runtime
    • Using XMLHttpRequest
    • By creating script tag
  • Resources in JSON or module format

Loading resources at runtime
  • Cross-domain support?
    • Not with XMLHttpRequest
    • Possible with script tag
  • Synchronous access?
    • More convenient programming model
    • Can lock up browser
  • BCP 47 support?
    • Many loaders assume aa-AA format

Access to resources in libraries
  • Dojo
    • Loading at runtime, synchronous
  • GWT
    • Injecting resources (Constants)
    • Bundling resources (with HTML, Dictionary)
  • YUI
    • Bundling resources via module loader

Message construction
(Photo © Den Widhana)

Message construction
  • Substitution
    • {user} went to {city}.
    • {user}さんは{city}へ行きました。

Message construction
  • Plurals
    • {user} est allé à {city}.
    • {user1} et {user2} sont allés à {city}.
  • 1-6 forms depending on language
  • {number, plural {one {...} few {...} many {...}}}

Message construction
  • Gender
    • {user} est allé à {city}.
    • {user} est allée à {city}.
  • 1-4 forms depending on language
  • {gender, select {female {...} male {...} unknown {...}}}

Message construction

    {gender, select {
        female {num, plural {
            one {{user1} est allée à {city}.}
            other {{user1} et {user2} sont allées à {city}.}}}
        male {num, plural {
            one {{user1} est allé à {city}.}
            other {{user1} et {user2} sont allés à {city}.}}}
    }}

Message construction
  • Google has MessageFormat for Closure environment
  • Alex Sexton provided standalone version
  • Mozilla has even more ambitious L20n library

Summary
  • ECMAScript Internationalization API provides core functionality
    • http://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/
  • Libraries provide more internationalization support than you may think
    • http://norbertlindenberg.com/2013/10/javascript-internationalization/
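Several of the resource slides come down to one question: which bundle should be used for a requested locale? A loader that assumes the "aa-AA" format fails on tags such as "zh-Hans-CN". This sketch applies Lookup-style fallback, as described on the locale negotiation slide, to an in-memory table. It truncates subtags from the end and strips the Unicode extension first. `bundles`, `fallbackChain`, `lookupLocale` and `getMessages` are hypothetical names. A real loader would fetch each candidate through XMLHttpRequest or a script tag instead of reading from an object.

```javascript
// Hypothetical resource bundles, keyed by BCP 47 language tag.
const bundles = {
  "es": { HELLO: "Hola" },
  "zh-Hans": { HELLO: "你好" },
  "ko": { HELLO: "안녕하세요" },
  "en": { HELLO: "Hello" },
};

// Yield the tag, then successively shorter prefixes (RFC 4647 Lookup),
// also dropping a single-letter singleton left dangling at the end.
function* fallbackChain(tag) {
  let candidate = tag;
  while (candidate) {
    yield candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos < 0) return;
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.slice(0, pos);
  }
}

// Return the first available tag matched by the requested locales, in order.
function lookupLocale(available, requested, defaultLocale) {
  for (const tag of requested) {
    // The Unicode extension ("-u-...") is handled separately, so strip it here.
    const base = tag.replace(/-u(?:-[a-z0-9]{2,8})+/i, "");
    for (const candidate of fallbackChain(base)) {
      const match = available.find(a => a.toLowerCase() === candidate.toLowerCase());
      if (match) return match;
    }
  }
  return defaultLocale;
}

function getMessages(requested) {
  const locale = lookupLocale(Object.keys(bundles), requested, "en");
  return { locale, messages: bundles[locale] };
}

console.log(getMessages(["zh-Hans-CN"]).locale);               // zh-Hans
console.log(getMessages(["es-GT", "es-MX"]).locale);           // es
console.log(getMessages(["de-u-co-phonebk", "ko-KR"]).locale); // ko
console.log(getMessages(["mr", "hi"]).locale);                 // en (default)
```

Lookup exhausts each requested tag and its truncations before moving to the next requested tag. It therefore tries "es-GT", then "es", then "es-MX", in the order shown on the locale negotiation slide.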
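The plural and gender slides can be approximated with `Intl.PluralRules`. That API was added to the ECMAScript Internationalization API after this talk, and it is not a MessageFormat implementation. This sketch hand-codes the French select/plural example from the slides as nested objects instead of parsing the `{gender, select ...}` syntax. `frenchMessages`, `substitute` and `formatWent` are hypothetical names.

```javascript
// Hypothetical message table mirroring the French select/plural example.
const frenchMessages = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}.",
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}.",
  },
};

// Replace {name} placeholders; unknown placeholders are left unchanged.
function substitute(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match);
}

const frenchPlurals = new Intl.PluralRules("fr");

function formatWent(gender, num, args) {
  // Hypothetical choice: fall back to the masculine forms for other gender values.
  const byGender = frenchMessages[gender] || frenchMessages.male;
  // French uses "one" for 0 and 1; depending on the CLDR version it may also
  // report "many", so any category without its own pattern falls back to "other".
  const category = frenchPlurals.select(num);
  const pattern = byGender[category] || byGender.other;
  return substitute(pattern, args);
}

console.log(formatWent("female", 1, { user1: "Marie", city: "Paris" }));
// Marie est allée à Paris.
console.log(formatWent("male", 2, { user1: "Paul", user2: "Luc", city: "Lyon" }));
// Paul et Luc sont allés à Lyon.
```

A production system would instead parse the message syntax with a MessageFormat library, such as those named in the slides.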
