Internationalizing JavaScript Applications
Norbert Lindenberg
© Norbert Lindenberg 2013. All rights reserved.

Agenda
• Unicode support
• Collation
• Number and date/time formatting
• Localizable resources
• Message construction

JavaScript is…
• ECMAScript Language
• ECMAScript Internationalization API
• Browser: DOM, Navigator, XMLHttpRequest
• Server: Node.js
• Platforms: Firefox OS, Windows 8, PhoneGap
• Libraries: jQuery, Dojo, YUI, GWT, Node modules, etc.

ECMAScript
• Language Specification
  • Developed by Ecma TC 39
  • Language syntax and semantics
  • Core API: Object, String, Array, RegExp, ...
  • Edition 5.1 current
  • Edition 6 expected December 2014

ECMAScript
• Internationalization API Specification
  • Developed by Ecma TC 39 + experts
  • API: Collator, NumberFormat, DateTimeFormat
  • Edition 1 approved December 2012
  • Chrome, Opera, Explorer, Windows shipped; Firefox, Node.js coming
  • Edition 2 expected December 2014

Unicode

Unicode support
• All text in UTF-16 internally
• UTF-8 well supported for transport
  • Need to identify charset in <script> tags, Content-Type headers
  • Need to use encodeURIComponent for path and query string components

Occupy Wall Street. By @tanlines.

Supplementary characters
• Characters above U+FFFF
• Emoji, rare CJK, ancient scripts, musical symbols, ...
• 2 code units in UTF-16

Today: UCS-2 or UTF-16?
UCS-2:
• Regular expressions
• String comparison
• Case conversion
UTF-16:
• Source text conversion
• URI handling
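The split between UCS-2 and UTF-16 behavior can be seen with a single supplementary character. The following is a minimal sketch; the helper name inspectString is hypothetical and only collects results from standard language features (String length, a RegExp without any Unicode mode, and encodeURIComponent).

```javascript
// Hypothetical helper: reports how the language sees a string.
function inspectString(text) {
  return {
    // length counts UTF-16 code units, not characters.
    codeUnits: text.length,
    // Without a Unicode mode, "." matches exactly one code unit.
    matchesSingleDot: /^.$/.test(text),
    // URI handling combines surrogate pairs and emits UTF-8 bytes.
    uriComponent: encodeURIComponent(text)
  };
}

// U+1F4A9 is above U+FFFF, so it is stored as the surrogate pair D83D DCA9.
var supplementary = inspectString("\uD83D\uDCA9");
// supplementary.codeUnits        → 2
// supplementary.matchesSingleDot → false
// supplementary.uriComponent     → "%F0%9F%92%A9"

// U+00E4 (ä) fits in one code unit.
var basic = inspectString("\u00E4");
// basic.codeUnits        → 1
// basic.matchesSingleDot → true
// basic.uriComponent     → "%C3%A4"
```

The regular expression treats the string as two separate units (UCS-2 behavior), while encodeURIComponent decodes the pair into one code point (UTF-16 behavior).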
Today: UCS-2 or UTF-16?
UCS-2:
• Regular expressions
• String comparison
• Case conversion
UTF-16:
• Source text conversion
• URI handling
• DOM, text input, text rendering, XMLHttpRequest

ECMAScript 6: UTF-16
• Case conversion for full Unicode
• Full Unicode in identifiers
• String accessors for code points
• But: no change to low-level string comparison
• Planned: New Unicode mode in regular expressions

Regular expressions
• RegExp in ES5 doesn’t have much Unicode support
  • No support for Unicode character properties
  • No support for supplementary characters

Regular expressions
• CSet (inimino): Character classes with supplementary characters
• XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters

Unicode normalization
• Makes strings equal that users perceive as equal (more or less)
• ä = a + ¨: U+00E4 = U+0061 U+0308
• 김 = ᄀ + ᅵ + ᆷ: U+AE40 = U+1100 U+1175 U+11B7

Unicode normalization
• ECMAScript 5 “assumes” normalization happens where needed
• Reality: applications have to do it
• ECMAScript 6: String.prototype.normalize
    "김".normalize("NFD") → "\u1100\u1175\u11B7"
• Libraries available, but not up to date:
  • unorm (Matsuza)
  • Richard Ishida’s normalizer

北京大学.中国

Internationalized domain names
• Unicode at user interface
• ASCII under the hood
• 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s
• Main steps:
  • normalization (as discussed)
  • punycode (Mathias Bynens has latest)

Collation

Collation (sorting)
• Old: String.prototype.localeCompare
  • Only string argument
• New: Intl.Collator
  • locales
  • options
• Fixed: String.prototype.localeCompare
  • With locales and options arguments

Locales
• BCP 47 language tags
  • Language, script, country codes
  • “es”, “en-AU”, “zh-Hans-CN”
• Unicode locale extension
  • “de-u-co-phonebk”
• Preference lists
  • [“mr”, “hi”, “en-IN”]

Locale negotiation
• BCP 47 Lookup
  • [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”
• Best fit
  • implementation defined
  • [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”
• Unicode extension handled separately

Collator extensions
• co: collation – phonebook, pinyin, ...
• kf: case first – upper, lower
• kn: numeric sorting

Collator options
• localeMatcher: lookup, best fit
• usage: sort, search
• sensitivity: base, accent, case, variant
• ignorePunctuation
• numeric, caseFirst

Non-ECMAScript
• Nothing good found (some for Latin only)
• Collation is hard
  • Knowledge of full Unicode character set
  • Big tables
• Send lists that need alphabetic sorting to server

Number formatting

Number formatting
• Old: Number.prototype.toLocaleString
  • No arguments
• New: Intl.NumberFormat
  • locales
  • options
• Fixed: Number.prototype.toLocaleString
  • With locales and options arguments

NumberFormat extensions
• nu: numbering system

NumberFormat options
• localeMatcher: lookup, best fit
• style: decimal, currency, percent
• currency: ISO 4217 currency code
• currencyDisplay: symbol, code, name
• minimum/maximum digits
• useGrouping

Non-ECMAScript
             ¤    %    ๙    #    ,    ⚑
Globalize    +    +    -    +    -    250+
Dojo         +    +    -    +    -    30+
Closure      +    +    +    +    +    300+
Windows 8    +    +    +    +    +    100s
iLib         +    +    -    +    -    10+
¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.

Date and time formatting

Date and time formatting
• Old: Date.prototype.toLocale[|Date|Time]String
  • No arguments
• New: Intl.DateTimeFormat
  • locales
  • options
• Fixed: Date.prototype.toLocale[|Date|Time]String
  • With locales and options arguments

DateTimeFormat extensions
• ca: calendar
• nu: numbering system

DateTimeFormat options
• localeMatcher: lookup, best fit
• timeZone: UTC
• hour12
• weekday, era, year, month, day, hour, minute, second, timeZoneName: components
• formatMatcher: basic, best fit

Non-ECMAScript
             ca   tz   ๙    ⚑
Globalize    5+   +    -    250+
Dojo         4    -    -    30+
Closure      +    +    +    300+
Windows 8    ?    -    ?    ?
Moment       -    -    -    50
YUI          -    -    -    50+
ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.
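The three constructors of the ECMAScript Internationalization API all take the same locales and options arguments. The following is a minimal sketch of each; the helper names (sortForLocale, formatCurrency, formatUtcDate) are hypothetical, and the exact output depends on the locale data that the JavaScript engine ships.

```javascript
// Sort a copy of the list using Intl.Collator.
// locales: a BCP 47 tag or preference list; options: e.g. {numeric: true}.
function sortForLocale(strings, locales, options) {
  var collator = new Intl.Collator(locales, options);
  // collator.compare is already bound to the collator.
  return strings.slice().sort(collator.compare);
}

// Format an amount with Intl.NumberFormat in currency style.
// currency: an ISO 4217 code such as "USD".
function formatCurrency(amount, locales, currency) {
  var format = new Intl.NumberFormat(locales, {
    style: "currency",
    currency: currency
  });
  return format.format(amount);
}

// Format the date part of a Date in UTC with Intl.DateTimeFormat.
function formatUtcDate(date, locales) {
  var format = new Intl.DateTimeFormat(locales, {
    timeZone: "UTC",
    year: "numeric",
    month: "long",
    day: "numeric"
  });
  return format.format(date);
}

// Usage with English locales:
var sorted = sortForLocale(["item10", "item2", "item1"], "en", {numeric: true});
// sorted → ["item1", "item2", "item10"]
var price = formatCurrency(1234.5, "en-US", "USD");
// price → "$1,234.50"
var day = formatUtcDate(new Date(Date.UTC(2013, 9, 30)), "en-US");
// day → "October 30, 2013"
```

Passing a tag with a Unicode extension, such as "de-u-co-phonebk", as the locales argument requests the corresponding collation where the implementation supports it.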
Resources

Localizable resources
• ECMAScript doesn’t have an I/O system, therefore no standard resource loading
• Developers have invented many different mechanisms

Representation
• At runtime, use object as key-value map
• JSON good for transfer; not as source
  • Localizers need comments
• Existing source formats have problems
  • Java properties files are not UTF-8
  • Encoding of gettext .po files is unspecified
  • YUI uses its own YRB format

How dynamic would you like it?
• Making resources available to application
  • Inject strings/objects into JavaScript code
  • Bundle resources with JavaScript code
  • Load resource bundles at runtime

Injecting strings into JavaScript
• Server knows locale when generating JavaScript and HTML
• Inject strings or objects directly where needed
• E.g., JavaServer Pages Standard Tag Library:
    <fmt:bundle basename="Messages">
      <script>
        alert("<fmt:message key='HELLO'/>");
      </script>
    </fmt:bundle>

Injecting strings into JavaScript
• Problems:
  • Mixes multiple programming languages
  • Can introduce syntax errors through localization

Bundling resources with JavaScript
• Server knows locale when serving JS
• Bundles resources with JavaScript
  • Convert to JavaScript/JSON
  • Concatenate with other JavaScript
• Resources:
    var MyResources = {HELLO: "안녕하세요"};
• Code:
    alert(MyResources.HELLO);

Loading resources at runtime
• Locale not known until runtime
• Request resources at runtime
  • Using XMLHttpRequest
  • By creating script tag
• Resources in JSON or module format

Loading resources at runtime
• Cross-domain support?
  • Not with XMLHttpRequest
  • Possible with script tag
• Synchronous access?
  • More convenient programming model
  • Can lock up browser
• BCP 47 support?
  • Many loaders assume aa-AA format

Access to resources in libraries
• Dojo
  • Loading at runtime, synchronous
• GWT
  • Injecting resources (Constant)
  • Bundling resources (with HTML, Dictionary)
• YUI
  • Bundling resources via module loader

Message construction

Photo © Den Widhana

Message construction
• Substitution
  • {user} went to {city}.
  • {user}さんは{city}へ行きました。

Message construction
• Plurals
  • {user} est allé à {city}.
  • {user1} et {user2} sont allés à {city}.
  • 1-6 forms depending on language
  • {number, plural {one {...} few {...} many {...}}}

Message construction
• Gender
  • {user} est allé à {city}.
  • {user} est allée à {city}.
  • 1-4 forms depending on language
  • {gender, select {female {...} male {...} unknown {...}}}

Message construction
    {gender, select {
      female {num, plural {
        one {{user1} est allée à {city}.}
        other {{user1} et {user2} sont allées à {city}.}}}
      male {num, plural {
        one {{user1} est allé à {city}.}
        other {{user1} et {user2} sont allés à {city}.}}}
    }}

Message construction
• Google has MessageFormat for Closure environment
• Alex Sexton provided standalone version
• Mozilla has even more ambitious L20n library

Summary
• ECMAScript Internationalization API provides core functionality
  • http://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/
• Libraries provide more internationalization support than you may think
  • http://norbertlindenberg.com/2013/10/javascript-internationalization/
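The message construction cases (substitution, plural, gender) can be sketched without any library. This is an illustration only, not the API of MessageFormat or L20n: the names wentTo, pluralCategory, substitute and formatWentTo are hypothetical, the plural rule is simplified to this one French example, and falling back to the masculine forms for an unknown gender is an assumption.

```javascript
// Hypothetical French messages, keyed by gender, then by plural category.
var wentTo = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}."
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}."
  }
};

// Simplified plural rule for this example only: exactly one user is "one".
function pluralCategory(count) {
  return count === 1 ? "one" : "other";
}

// Replace each {name} placeholder with the matching property of args.
// Placeholders without a matching property are left unchanged.
function substitute(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(args, name)
      ? String(args[name])
      : match;
  });
}

// Select the form by gender and number of users, then fill in the names.
function formatWentTo(gender, users, city) {
  // Assumption: unknown genders fall back to the masculine forms.
  var forms = wentTo[gender] || wentTo.male;
  var pattern = forms[pluralCategory(users.length)];
  return substitute(pattern, {user1: users[0], user2: users[1], city: city});
}

var single = formatWentTo("female", ["Marie"], "Paris");
// single → "Marie est allée à Paris."
var pair = formatWentTo("male", ["Jean", "Luc"], "Lyon");
// pair → "Jean et Luc sont allés à Lyon."
```

Keeping whole sentences per form, rather than concatenating fragments, lets translators adjust word order and agreement for each language.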