Keynote Speech: Internationalizing Web Content

Internationalizing Web Content
Richard Ishida
W3C Internationalization Lead
Copyright © 2005 W3C (MIT, ERCIM, Keio)

Objectives
• Explore the dimensions of internationalization (i18n)
• Tease apart some basic contexts where internationalization is necessary
• Show examples of how the W3C is making local access to the Web easier/possible
• Show how internationalization is a prerequisite for good local content

Overview
• L10n or i18n?
• Getting the character basics right
• Extending technology to support local needs
• Removing barriers to international use
• Assessing cultural influences
• Improving the process
• Summary

L10n or i18n?

Localization
The adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market.

Internationalization
The design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.

http://www.w3.org/International/questions/qa-i18n

Getting the character basics right

Making the World Wide Web world wide!
缔造真正全球通行的万维网
締造真正全球通行的萬維網
[Amharic version: not recoverable from the extraction]
Κάνοντας τον Παγκόσμιο Ιστό πραγματικά Παγκόσμιο
ליצור מהרשת רשת כלל עולמית באמת
वर्ल्ड वाईड वेब को सचमुच विश्वव्यापी बना रहे हैं !
ᑖᑦᓱᒪ ᐃᑭᐊᖅᑭᕕᒃ ᓯᓚᕐᔪᐊᓕᒫᒥᒃ ᓈᕆᑎᑉᐹ.
ワールド・ワイド・ウェッブを世界中に広げましょう
Hogy a Világháló valóban az egész világé lehessen!
वर्ल्ड वाईड वेबलाई यथार्थमै विश्वव्यापी बनाउने !
"Дүниежүзілік торды" нағыз дүниежүзілік етеміз!
전세계의 월드 와이드 웹으로 만들기!
ਵਰਲਡ ਵਾਈਡ ਵੈਬ ਨੂੰ ਵਾਕਈ ਵਿਸ਼ਵ-ਵਿਆਪੀ ਬਨਾਉਣਾ !
Сделаем "Всемирную паутину" действительно всемирной!
World Wide Web U ita uri Webu Nyangaredzi ya Dzhango i vhe nyangaredzi ngangoho!

Getting the character basics right

Scripts and symbols covered by Unicode:
• European alphabetic scripts: Latin, Greek, Cyrillic, Armenian, Georgian, Runic, Ogham, Modifier letters, Combining characters
• Middle East scripts: Hebrew, Arabic, Syriac, Thaana
• South & South East Asian scripts: Devanagari, Bengali, Gurmukhi, Gujarati, Panjabi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Khmer
• East Asian scripts: Han, Hiragana, Katakana, Hangul, Bopomofo, Yi
• Additional scripts: Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Mongolian
• Symbols: Currency symbols, Letterlike symbols, Mathematical operators, Numeric forms, Technical symbols, Geometrical symbols, Miscellaneous symbols & dingbats, Enclosed & square, Braille, etc.

Characters, code points and encodings (all values in hexadecimal):

Character    A            א            好           𣎴
Code point   41           5D0          597D         233B4
UTF-8        41           D7 90        E5 A5 BD     F0 A3 8E B4
UTF-16       00 41        05 D0        59 7D        D8 4C DF B4
UTF-32       00 00 00 41  00 00 05 D0  00 00 59 7D  00 02 33 B4

Character vs. glyph
a a
雪 雪
(the same character can be drawn with different glyphs)

Combining characters
ह + ◌ि + न + ◌् + द + ◌ी  →  हिन्दी

Bidirectional text
[Arabic sample paragraph with embedded Latin text and numbers ("Unicode Conference", 10-12, 1997); the Arabic text did not survive extraction]

Participating stakeholders:
• W3C adopted and monitors the use of Unicode as the document character set for its specifications, and liaises with the Unicode Consortium
• Unicode Consortium defines Unicode and expected character-level behaviors
• platform developers need to provide for Unicode support, such as rendering algorithms
• application developers should ensure that expected Unicode behaviors are implemented
• content developers and managers should use Unicode whenever possible

Characters and their bytes in UTF-8:

Character    Bytes
A            41
á            C3 A1
あ           E3 81 82
𣎴           F0 A3 8E B4

A UTF-8 byte stream mixes one-, two- and three-byte characters:
aאあaאあ  →  61 D7 90 E3 81 82 61 D7 90 E3 81 82

Normalization
NFC: Ízelítőül
NFD: I + ◌́ + z + e + l + i + ◌́ + t + o + ◌̋ + u + ◌̈ + l

Ha a világ beszélni akarna, Unicode-ul szólalna meg. Regisztráljon már most a Tizedik Nemzetközi Unicode Konferenciára, melyet 1997. március 10-12-én rendeznek Meinz-ban, Németországban. Ezen a konferencián az iparág több neves szakértője is részt vesz. Ízelítőül a témákból: a világháló és a Unicode nemzetköziesítése és lokalizálása, a Unicode alkalmazása működő rendszerekben és alkalmazásokban, szövegelrendezésnél, és többnyelvű számítógépeken.
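
The encoding table and the NFC/NFD example above can be reproduced with a few lines of Python. This is only an illustrative sketch and not part of the talk: the helper name `hex_bytes` and the variables `samples`, `rows`, `nfc` and `nfd` are made up for this example, and it relies solely on Python's built-in codecs and the standard `unicodedata` module.

```python
import unicodedata


def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# The four characters from the slides: U+0041, U+05D0, U+597D, U+233B4.
samples = ["A", "\u05D0", "\u597D", "\U000233B4"]

# Big-endian variants are used so that no byte order mark is emitted.
encodings = ("utf-8", "utf-16-be", "utf-32-be")
rows = {ch: {enc: hex_bytes(ch, enc) for enc in encodings} for ch in samples}

for ch, encoded in rows.items():
    print(f"U+{ord(ch):04X}", encoded)

# Normalization: the Hungarian word from the slide, written with precomposed letters.
word = "\u00CDzel\u00EDt\u0151\u00FCl"  # Ízelítőül
nfc = unicodedata.normalize("NFC", word)  # accents stay inside single code points
nfd = unicodedata.normalize("NFD", word)  # base letters followed by combining marks

print(len(nfc), len(nfd))  # 9 code points vs. 13 code points
print(" ".join(f"U+{ord(c):04X}" for c in nfd))
```

The two normalized strings look identical on screen but compare unequal, which is why text is usually normalized to one form before it is compared or searched.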
Getting the character basics right

Declaring the character encoding:

<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<?xml version="1.0" encoding="UTF-8"?>
Content-Type: text/html; charset=utf-8

                     HTTP   <?xml ..   <meta ..
HTML                 (✓)    ✗          ✓
XHTML (text/html)    (✓)    (✓)        ✓
XHTML (XML)          (✓)    ✓          ✗

http://www.w3.org/International/tutorials/tutorial-char-enc/

Extending technology to support local needs

Bidirectional text

Characters as ordered in memory:
The title says "<span dir="rtl">פעילות הבינאום, W3C</span>" in Hebrew.

✓ Expected display: inside the quotation marks the title runs right to left, so "W3C" appears to the left of the Hebrew words.
✗ Using the bidi algorithm only (no dir attribute): "W3C" is displayed to the right of the Hebrew words, outside the right-to-left run.

Ruby annotation

これは紙芝居です。 (with かみしばい displayed above 紙芝居)

<ruby>
<rb>紙芝居</rb>
<rt>かみしばい</rt>
</ruby>

<p>これは<ruby><rb>紙芝居</rb><rt>かみしばい</rt></ruby>です。</p>

Word boundaries

這一晚會如常舉行
這一|晚會|如常|舉行    This banquet is held as usual.
這一|晚會|如|常|舉行    If this banquet is held frequently.
這一晚|會|如常|舉行    (An event) will be held tonight as usual.

East Asian typography

punctuation trim: 经验分 (万维
auto-space: 第10回のUnicode会議  →  第 10 回の Unicode 会議
emphasis: これは日本語の文章です。 (shown with emphasis marks such as 、、、 beside the characters)

Vertical and right-to-left text

Chinese sample (set vertically on the slide, columns read from right to left):
当世界需要沟通时,请用 Unicode。将于 3 月 10-12 日在德国 Mainz 市举行的第十届统一码国际研讨会现在开始注册。本次会议将汇集各方面的专家。涉及的领域包括:国际互联网和统一码,国际化和本地化,统一码在操作系统和应用软件中的实现,字型,文本格式以及多文种计算等。

Hebrew sample (right to left, with embedded Latin text and numbers):
כאשר העולם רוצה לדבר, הוא מדבר בUnicode. הירשמו כעת לכנס Unicode הבינלאומי העשירי, שייערך בין התאריכים 10-12 במרץ, במיינץ שבגרמניה. בכנס ישתתפו מומחים מכל ענפי התעשייה בנושא האינטרנט העולמי וה Unicode, בהתאמה לשוק הבינלאומי והמקומי, ביישום Unicode במערכות הפעלה וביישומים, בגופנים, בפריסת טקסט ובמחשוב רב-לשוני.

Line wrapping samples:
http://people.w3.org/rishida/scripts/samples/wrapping.html

Removing barriers to international use

The same sentence needs different word order in different languages:
You are speaking to her from my new house.
Están hablándole desde mi casa nueva.
私の新しい家から彼女と話しています。

Composite messages:
There were %d spelling mistakes in file: %s.
Datei %s enthält %d Rechtschreibfehler.
The < > has been disabled.

printf( "There were %d spelling
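
The English and German messages above take their two values in opposite order, which is the problem with purely positional `printf`-style arguments. One common way around it is to give each placeholder a name, so that every translation can place the values wherever its grammar requires. The following Python sketch only illustrates that idea and is not taken from the talk: `MESSAGES`, `spelling_report` and the sample file name are hypothetical, and plural forms are deliberately ignored.

```python
# Message templates keyed by language. The placeholder names are illustrative.
MESSAGES = {
    "en": "There were {count} spelling mistakes in file: {filename}.",
    "de": "Datei {filename} enthält {count} Rechtschreibfehler.",
}


def spelling_report(language: str, count: int, filename: str) -> str:
    """Fill the template for `language`; argument order no longer matters."""
    return MESSAGES[language].format(count=count, filename=filename)


if __name__ == "__main__":
    for language in ("en", "de"):
        print(spelling_report(language, 3, "notes.txt"))
```

Because the calling code passes values by name, a translator can reorder the placeholders in a new template without any change to the program.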