Direction: Rtl;)

Get Your Head Around Bidirectionality!
Behnam Esfahbod, Software Engineer
42nd Internationalization & Unicode Conference, September 2018, Santa Clara, CA, USA

Abstract
We know when software is broken for right-to-left languages like Arabic, Persian, or Hebrew, but often the solution is either not clear, or fixing it with out-of-place patches is not worth the cost down the road. Like other areas of i18n, bidirectional layout and right-to-left language support need deliberate design in the user-interface stack; without good architecture, the result won't be useful for developers or users.
In this tutorial, we first learn how to think in right-to-left and how it mirrors into left-to-right directionality. We then look at the common problems in bidirectional applications and how to address them with generic solutions and standard algorithms.
This tutorial is suitable for anyone not familiar with right-to-left languages or bidirectional design, or interested in learning how to develop solutions for this area.

About me
• Software Engineer @ Quora, Inc.
• Co-Chair of Arabic Layout Task Force @ W3C i18n Activity
• Virgule Typeworks
• Facebook, Inc.
• IRNIC Domain Registry
• Sharif FarsiWeb, Inc.

This talk
• Bidirectional Writing Systems
• Bidirectional Text
• Bidirectional Layout
• Bidirectional Web Application
• Bidirectionality Techniques

Bidirectional Writing Systems

History
Boustrophedon, from Greek “boustrophēdón” meaning “ox-turning”
Line direction alternates. No paragraph direction.
Q: Why’s this useful?
Image: Fragmentary boustrophedon inscription in the agora of Gortyn (Crete) - code of law | by PRA [CC BY-SA 3.0]

History: Early Writing Systems
• Most scripts chose one way or another
• Small set of writing symbols
  - Letters, e.g. Greek Alpha or Arabic Alef
  - Limited punctuation
  - No numerals: roman and abjad numbers
• Later, Hindu-Arabic numerals
  - Not (normally) read digit-by-digit
  - Spelled out as a (whole) number
  - Therefore: no direction in reading a number!

Today
Writing systems at national level
Image: Writing systems worldwide | By JWB [CC-BY-SA-3.0]

Today: Digital encoding
• Unicode ≈ unique, unified, universal encoding
• About 150 scripts encoded in Unicode:
  - ~110 left-to-right (LTR) (some could also be top-to-bottom)
  - ~30 right-to-left (RTL) (some are bidi…)
  - the rest are top-to-bottom, or mixed directions
• Major unified scripts
  - CJK: Chinese, Japanese, Korean
  - Arabic: Standard/Maghrebi Arabic, Persian, Urdu, Jawi, Uyghur, …
• Major non-unified scripts
  - Latin/Greek/Cyrillic

Bidirectional Text

Manuscript text & layout

Semantic encoding in Unicode
Store text in memory in the same order as it is read/processed in mind
• Encode concepts, not various shapes of them
  - One Arabic Letter Alef (U+0627)
  - Most Arabic letters take at least 4 shapes depending on context
  - But, two Latin Letter A (oops!)
  - LATIN CAPITAL LETTER A (U+0041) / LATIN SMALL LETTER A (U+0061)
• Some punctuation marks are shared, some are not
  - Single Period/Full Stop symbol for most scripts (“.” U+002E)
  - A pair of Question Marks (“?” U+003F, “؟” U+061F)
• Some numerals are LTR and some RTL
  - Until 2006 (encoding of N’Ko), all numerals were LTR
  - European (ASCII): 0123456789 / Eastern Hindu-Arabic (Persian): ۰۱۲۳۴۵۶۷۸۹
  - Recently-developed African systems use RTL numerals
  - N’Ko: ߉߈߇߆߅߄߃߂߁߀

Direction in text block
Sample text used on the slides:
  What will be the biggest internet trends between 2016-2020?
  بزرگترین روندهای اینترنتی در بین سال‌های ۲۰۱۶-۲۰۲۰ چه خواهد بود؟
LTR paragraphs are usually aligned “flush left”, a.k.a. “left-aligned” or “ragged right”.
RTL paragraphs are usually aligned “flush right”, a.k.a. “right-aligned” or “ragged left”.
Reading direction is usually perceived implicitly from the writing system, allowing reading “end-aligned” text with no problems.
Setting the wrong direction results in poor readability, and sometimes even close to gibberish (on the slide, the English line becomes “?trends between 2016-2020”).
Let’s now look at how sequences of shapes are perceived.
LTR runs ⇒ orange, RTL runs ⇒ green
On the line level, the runs are read in order, in the direction of the paragraph (base direction).

Unicode Bidirectional Algorithm (UBA)
Annex #9 to the Unicode Standard (UAX #9)
• Converts a semantic in-memory string of chars into an ordering suitable for presentation (visual output)
• Every Unicode Character has a Bidi Class
  - Strong, such as letters
  - Weak, such as numbers
  - Neutral, such as whitespace, punctuation and symbols
• Some characters are Mirrored if in an RTL run
  - Parentheses are mirrored: “(” is an open paren in both LTR & RTL
  - Question Marks do not mirror: “?” is always closed on the right.
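The Bidi Class and Mirrored properties described above can be inspected directly. The following is a minimal Python sketch, assuming only the standard-library `unicodedata` module (its Unicode data version depends on the Python build). The helper name `bidi_category` and the grouping table are illustrative assumptions, not something from the talk; the grouping follows the strong/weak/neutral split of UAX #9.

```python
import unicodedata

# Grouping of UAX #9 bidi classes into the three families named on the slide.
# Explicit formatting characters (LRE, RLI, PDI, ...) get their own bucket.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def bidi_category(ch: str) -> str:
    """Return 'strong', 'weak', 'neutral', or 'explicit' for one character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    return "explicit"


if __name__ == "__main__":
    samples = ["A", "\u0627", "5", "\u06F5", " ", "?", "\u061F", "(", "\u07C1"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.bidirectional(ch),   # raw bidi class, e.g. L, AL, EN
            bidi_category(ch),               # family from the table above
            bool(unicodedata.mirrored(ch)),  # True if mirrored in RTL runs
        )
```

Running it shows, for example, that the parenthesis carries the Mirrored property while neither question mark does, which matches the slide.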
Unicode Bidirectional Algorithm (UBA)
High-level steps of the algorithm
• Input: string of characters & base direction
  - Both inputs should be set correctly to achieve the correct presentation
• Output: chars’ levels (evens are LTR, odds are RTL) & position
• First, explicit direction levels are calculated
  - Based on special directional formatting characters
    - Embedding (LRE, RLE), Isolate (LRI, RLI, FSI), Override (LRO, RLO)
  - Higher-level protocol
    - HTML (dir="rtl")
    - CSS (direction: rtl;)
• Then, implicit direction levels are calculated using chars’ Bidi Class
  - Implicit formatting characters (LRM, RLM, ALM) take effect here
• Finally, having the bidi levels, reordering can be done, when needed

Directional embeddings
They translated the question into “بزرگترین روندهای اینترنتی در بین سال‌های ۲۰۱۶-۲۰۲۰ چه خواهد بود؟” on Quora!
How directions are mixed when
