Complex Text on Simple Devices

Pedro Navarro
Sr. Software Engineer / Streaming Client Technologies

Background

Gibbon

● Codename for our JavaScriptCore-based application framework. It provides objects to create UI elements, access the device and perform video playback.
● Written in portable C++ targeted to the lowest common denominator.
● Runs on consumer electronics devices (TVs, Blu-ray players, Roku devices), Android TV and game consoles (from the Wii to the PS4 Pro).

Netflix UI: powered by Gibbon since 2013

Constraints

● We are targeting devices with very low capabilities (128 MB total RAM) and, in many cases, with a read-only file system.
● We don't control the host we are running on, so we have to be ready to work with old compilers and with different versions, and combinations, of all the third-party libraries we use.
● Very long release cycles: except for game consoles, it takes 12 months from the time we provide our SDK until there are devices in the market with it.
● No upgrades! We are part of the device's firmware.
● Small footprint. Our binaries and fonts have to be as small as possible, as flash storage is scarce.
● Different graphics platforms. We need to run on DirectFB, OpenGL and game console graphics APIs.

OLD DEVICE: Luckily, we didn't have to support this one.

2013's Text Engine

● Our first iteration of Gibbon's Text Layout Engine was very simple and provided just a 1:1 mapping between characters in a text string and glyph indices in a font.
● The character set was WGL-4, which we extended later to add additional Latin glyphs.
● Support for CJK was added by introducing font fallbacks.

2013 Supported Writing Systems

■ Latin (425 languages)
■ Cyrillic (106 languages)
■ Greek (1 language)
■ Chinese (5 languages)
■ Japanese (1 language)
■ Hangul (1 language)

Global Launch

● For the global launch we had to be ready well in advance because of the long release times.
● We defined our own Character Set (NGL-2) to standardize our fonts and our content.
● Supporting Complex Scripts meant that we had to integrate text shaping and BiDi processing.
● Research showed that the vast majority of devices we needed to support, besides game consoles, would be low-performance set-top boxes.
● It's important for us to get pixel fidelity between platforms, so the UI doesn't have to account for differences.

Global launch candidates:

■ Arabic (38 languages)
■ Hebrew (9 languages)
■ Ge'ez
■ Georgian
■ Armenian
■ Tibetan (4 languages)
■ Khmer
■ Lao
■ Thai
■ Burmese

Indic writing systems:

■ Bengali (5 languages)
■ Devanagari (19 languages)
■ Gurmukhi
■ Gujarati (2 languages)
■ Kannada (4 languages)
■ Malayalam (2 languages)
■ Oriya (2 languages)
■ Tamil
■ Telugu

Global Text Layout Engine Features

● Font Handling
  ○ Modular fonts
  ○ Aliasing, fallbacks and substitution
  ○ Synthetic bold and italic support
● Text Shaping
  ○ Context-based glyphs
  ○ Ligatures (substitution)
  ○ Positioning
  ○ Reordering
● Text Layout
  ○ Rich text
  ○ Bidirectional support
  ○ Line breaking (word wrapping)

Font fallbacks

Font fallbacks / Font linking

● When a glyph is not present in the active font, font linking automatically picks it from another font that covers the Unicode range the missing glyph belongs to.
● Font linking lets us ship fonts for each script as needed, making the deployments modular.
● Design points:
  ○ A writable file system is not guaranteed, so Fontconfig's cache solution would not work for us. Fontconfig might not be available on the system, so we would have to supply our own.
  ○ We are not generic: we control the fonts that are in our system and the content we are going to display.
  ○ We know the font we want to use for every writing system.
Font fallbacks

● … for that particular writing system (no Latin in CJK fonts, for example) plus the space (U+0020).
● Scan the fonts at build time and write to a configuration file which Unicode Blocks the font has glyphs in.
● Run time:
  ○ When a glyph is not found in the current font, search for fonts that can supply the needed Unicode Block, sorted by language and priority. Keep going down the list until a match is found.
  ○ Once a match is found, keep using the same font until a new Unicode Block is needed.
  ○ Spaces are always considered to be part of the current Unicode Block, so we keep spacing consistent by using the space glyph for each script's font.

Font definition file (excerpt of our fonts.xml configuration file):

    <settings>
      <aliases>Helvetica, Sans, serif</aliases>
    </settings>
    <regular>
      <file>fonts/Arial_for_Netflix-R.ttf</file>
      <settings>
        <bbox>-136,-621+143x1864</bbox>
        <default_bbox>-1006,-665+2222x1864</default_bbox>
      </settings>
      <blocks>
        <block1>000000-00007F</block1> <!-- Basic Latin (95 characters) -->
        <block2>000080-0000FF</block2> <!-- Latin-1 Supplement (96 characters) -->
        <block3>000100-00017F</block3> <!-- Latin Extended-A (128 characters) -->
        <block4>000180-00024F</block4> <!-- Latin Extended-B (24 characters) -->
        <block5>000250-0002AF</block5> <!-- IPA Extensions (9 characters) -->
        <block6>0002B0-0002FF</block6> <!-- Spacing Modifier Letters (9 characters) -->
        <block7>000300-00036F</block7> <!-- Combining Diacritical Marks (10 characters) -->
        <block8>000370-0003FF</block8> <!-- Greek and Coptic (73 characters) -->
        <block9>000400-0004FF</block9> <!-- Cyrillic (122 characters) -->
        …
        <block26>00FE70-00FEFF</block26> <!-- Arabic Presentation Forms-B (1 character) -->
      </blocks>
      <languages>*-Latn,*-Grek,*-Cyrl</languages>
      <priority>200</priority>
    </regular>
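The run-time rules above can be sketched roughly as follows. This is a minimal Python illustration, not Gibbon's C++ implementation: `Font`, `block_of`, `pick_fonts`, the small block table and the font names are all hypothetical. It also assumes that language matching is an exact tag comparison, that a higher priority wins, that block-level coverage is enough to accept a font, and that the base font is tried first whenever a new block is needed.

```python
from dataclasses import dataclass

# Hypothetical subset of Unicode blocks as (first, last) code point pairs.
BLOCKS = [(0x0000, 0x007F), (0x0400, 0x04FF), (0x3040, 0x309F), (0x4E00, 0x9FFF)]


@dataclass
class Font:
    name: str
    blocks: list     # blocks the build-time scan found glyphs in
    languages: list  # language tags this font is meant for
    priority: int    # assumption: higher value wins


def block_of(code_point):
    """Return the block containing code_point, or None if it is not in the table."""
    for block in BLOCKS:
        if block[0] <= code_point <= block[1]:
            return block
    return None


def pick_fonts(text, base, fonts, language):
    """Return one font name per character of text."""
    # Fallback candidates: language matches first, then by descending priority.
    ordered = sorted(fonts, key=lambda f: (language not in f.languages, -f.priority))
    current, current_block = base, None
    result = []
    for ch in text:
        if ch == " ":
            # Spaces belong to the current block, so the current font is kept.
            result.append(current.name)
            continue
        block = block_of(ord(ch))
        if block != current_block:
            # A new block is needed: try the base font, then walk the list.
            current_block = block
            if block in base.blocks:
                current = base
            else:
                current = next((f for f in ordered if block in f.blocks), base)
        result.append(current.name)
    return result
```

Because spaces never trigger a lookup, a space between two Japanese words is taken from the Japanese font rather than from the base Latin font, which is the spacing consistency the slide describes.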
Text Layout

驩檤 サ捯ひろ驚

● Infrastructure:
  ○ ICU for BiDi, script categorization and line breaking.
  ○ FreeType for rasterization.
  ○ HarfBuzz for text shaping.
● Itemization:
  ○ White space is collapsed according to the HTML5 rules.
  ○ Fonts are resolved before shaping, so we shape the longest possible run of the same font. We don't fall back to the base font for spaces.
  ○ We add synthesized bold and oblique styles to the list of available fonts.
  ○ We try to find locales, either specified in a […] or inferred from the script, so we can use ICU's dictionary-based line breaking when available.

Sample text layout: debug information our itemizer provides about a text object

    Attributes
    [00] 0 - 0: [00:000-000] Japanese 20 [LTR]
    [01] 1 - 2: [00:000-000] Traditional_Chinese 20 [LTR]
    [02] 3 - 3: [00:000-000] Japanese 20 [LTR]
    [03] 4 - 4: [00:000-000] Traditional_Chinese 20 [LTR]
    [04] 5 - 7: [00:000-000] Japanese 20 [LTR]
    Text direction runs
    [00] 0 - 7: LTR (0:0-0)
    Embedding levels: 0 0 0 0 0 0 0 0
    Visual map: 0 1 2 3 4 5 6 7
    Visual embeddings: 0 0 0 0 0 0 0 0
    Text script runs
    [00] 0 - 2: Hani
    [01] 3 - 3: Kana
    [02] 4 - 4: Hani
    [03] 5 - 6: Hira
    [04] 7 - 7: Hani
    Text locale runs
    [00] 0 - 7: ja
    Word breaks
    [00] 0 - 0: |驩|
    [01] 1 - 2: |檤 |
    [02] 3 - 3: |サ|
    [03] 4 - 4: |捯|
    [04] 5 - 5: |ひ|
    [05] 6 - 6: |ろ|
    [06] 7 - 7: |驚|
    Line breaks
    Attributes
    [00] 0 - 0: [00:000-000] Japanese 20 [LTR] Hani
    [01] 1 - 2: [00:000-000] Traditional_Chinese 20 [LTR] Hani
    [02] 3 - 3: [00:000-000] Japanese 20 [LTR] Kana
    [03] 4 - 4: [00:000-000] Traditional_Chinese 20 [LTR] Hani
    [04] 5 - 6: [00:000-000] Japanese 20 [LTR] Hira
    [05] 7 - 7: [00:000-000] Japanese 20 [LTR] Hani
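To make the "Text script runs" part of the dump concrete, here is a small sketch of splitting a string into script runs. Gibbon relies on ICU for script categorization; the range table and the `script_of` and `script_runs` helpers below are hypothetical stand-ins that only know the scripts in the sample text. The sketch assumes that characters without a script of their own (such as the space) simply join the run that precedes them.

```python
# Hypothetical script table covering only the sample text; ICU's data is far larger.
SCRIPT_RANGES = [
    (0x3040, 0x309F, "Hira"),  # Hiragana
    (0x30A0, 0x30FF, "Kana"),  # Katakana
    (0x3400, 0x4DBF, "Hani"),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF, "Hani"),  # CJK Unified Ideographs
]


def script_of(ch):
    """Return the script tag for ch, or None for anything not in the table."""
    code_point = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= code_point <= last:
            return script
    return None


def script_runs(text):
    """Split text into (start, end, script) runs; end is inclusive."""
    runs = []
    for index, ch in enumerate(text):
        script = script_of(ch)
        previous = runs[-1] if runs else None
        if previous and (script is None or previous[2] is None or script == previous[2]):
            # Same script, or a neutral character: extend the previous run.
            previous[1] = index
            if previous[2] is None:
                previous[2] = script
        else:
            runs.append([index, index, script])
    return [tuple(run) for run in runs]


for start, end, script in script_runs("驩檤 サ捯ひろ驚"):
    print(f"{start} - {end}: {script}")
```

For the sample string, the space at offset 2 is absorbed into the first Hani run, which is why the dump reports the run as 0 - 2 rather than 0 - 1.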
Text Layout

驩檤 サ捯ひろ驚

● Text layout:
  ○ Emphasis on being one-pass. We forget the text string as soon as we have itemized it.
  ○ HarfBuzz buffers are referenced by multiple "items". Each item has a HarfBuzz buffer starting and ending offset.
  ○ We never shape text again. If we need to left/right trim, we operate directly on the items by modifying the offsets. For each font, we keep an in-memory codepoint-to-glyph index for all spacing characters.
  ○ We don't support hyphenation or justification.
  ○ For BiDi reordering we operate directly on the runs, as each run has an embedding level property.
  ○ A run can have any number of sub-runs associated with it, for emphasis marks or rubies.
  ○ Layouts are cached, as they are expensive, and a change in container attributes can trigger a relayout or reitemize.

Sample text layout: debug information our itemizer provides about a text object

    Itemizer layout - Bounds: [0,0+146x22] - Desired: [0,0+300x200]
    Padding: [0x0] - Indent: 0
    Mirror: false
    [00] Line: [0,0+146x22] | Dir: RTL | Padding: 0+0
      [00] Text run: Bounds: [0,0+20x21] | Ascent: -18
           Direction: LTR: [00:000-000]
           Buffer offsets: 0 - 0
           Buffer contents: gid6521=0
      [01] Text run: Bounds: [20,1+26x21] | Ascent: -17
           Direction: LTR: [00:000-000]
           Buffer offsets: 0 - 1
           Buffer contents: gid6606=1|gid3=2
      [02] Text run: Bounds: [46,0+20x21] | Ascent: -18
           Direction: LTR: [00:000-000]
           Buffer offsets: 0 - 0
           Buffer contents: gid134=3
      [03] Text run: Bounds: [66,1+20x21] | Ascent: -17
           Direction: LTR: [00:000-000]
           Buffer offsets: 0 - 0
           Buffer contents: gid5179=4
      [04] Text run: Bounds: [86,0+40x21] | Ascent: -18
           Direction: LTR: [00:000-000]
           Buffer offsets: 0 - 1
           Buffer contents: gid77=5|gid104=6
      [05] Text run: Bounds: [126,0+20x21] | Ascent: -18
           Last word mark present: 0
           Direction: LTR: [00:000-000]
           Buffer offsets: 0 - 0
           Buffer contents: gid6515=7
    Cache Reuse: 1[1]/0 (0/0)
    DisplayList(0xdcdb2ee0) pixels=2,780 size=300x200:
      Text: txt:'驩檤 サ捯ひろ驚'
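The "never shape again" idea can be illustrated with items that are only windows into an already shaped buffer. The sketch below is an assumption-laden Python model, not the Gibbon data structures: `Glyph`, `Item`, `trim_left` and `trim_right` are hypothetical names, the shaped buffer is modeled as a plain list, and the `is_space` flag stands in for the per-font table of spacing glyphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    gid: int        # glyph index produced by the shaper
    advance: int    # horizontal advance in pixels
    is_space: bool  # assumption: looked up in a per-font table of spacing glyphs


@dataclass(frozen=True)
class Item:
    buffer: list  # shared shaped buffer; items never modify it
    start: int    # first glyph offset, inclusive
    end: int      # last glyph offset, inclusive

    def width(self):
        return sum(g.advance for g in self.buffer[self.start:self.end + 1])


def trim_right(item):
    """Drop trailing spacing glyphs by moving the end offset only."""
    end = item.end
    while end >= item.start and item.buffer[end].is_space:
        end -= 1
    return Item(item.buffer, item.start, end)


def trim_left(item):
    """Drop leading spacing glyphs by moving the start offset only."""
    start = item.start
    while start <= item.end and item.buffer[start].is_space:
        start += 1
    return Item(item.buffer, start, item.end)


# Glyph ids taken from run [01] of the dump; the 20 + 6 split of its width is illustrative.
shaped = [Glyph(6606, 20, False), Glyph(3, 6, True)]
run = Item(shaped, 0, 1)
trimmed = trim_right(run)
print(run.width(), trimmed.width())
```

Trimming returns a new item that still points at the same buffer, so no glyph data is copied and the shaper is not called again.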
Text Layout Facts

● We were able to fit in 128 MB devices, where we have only 20-30 MB available for our app.
● Text layout is, by far, the most expensive operation. Smart caching of text layouts helped us reach 30-45 fps when scrolling movie titles:
  ○ Try to never itemize text a second time.
  ○ When changing the container dimensions or alignment, adjust the layout lines.
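One way to read the two caching rules is sketched below. This is a simplified Python model under stated assumptions, not the actual Gibbon cache: `itemize`, `CachedLayout` and the fixed 10-pixel advances are hypothetical, and the expensive itemization and shaping step is replaced by a stand-in that just counts how often it is called.

```python
SPACE = 10  # hypothetical advance of a space, in pixels


def itemize(text):
    """Stand-in for the expensive step (itemization and shaping)."""
    itemize.calls += 1
    return [(word, 10 * len(word)) for word in text.split()]


itemize.calls = 0


class CachedLayout:
    def __init__(self, text, width, align="left"):
        self.items = itemize(text)  # done once, then reused
        self.width = width
        self.align = align
        self.lines = self._break_lines()

    def _break_lines(self):
        """Greedy word wrap over the cached items; returns (words, line_width) pairs."""
        lines, current, used = [], [], 0
        for word, word_width in self.items:
            needed = word_width if not current else used + SPACE + word_width
            if current and needed > self.width:
                lines.append((current, used))
                current, used = [word], word_width
            else:
                current.append(word)
                used = needed
        if current:
            lines.append((current, used))
        return lines

    def resize(self, width):
        """New container width: re-break the lines, but do not itemize again."""
        self.width = width
        self.lines = self._break_lines()

    def line_x(self, index):
        """Alignment only shifts a line horizontally; nothing is recomputed."""
        free = self.width - self.lines[index][1]
        return {"left": 0, "center": free // 2, "right": free}[self.align]
```

In this model a resize costs one pass over the cached items, and an alignment change costs nothing until a line position is requested.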
