Computational Development of Lesser Resourced Languages

Martin Hosken
WSTech, SIL International
© 2019, SIL International

Modern Technical Capability
  • Grammar checking
  • Wikipedia
  • OCR
  • Localisation
  • Text to speech
  • Speech to text
  • Machine Translation

Digital Language Vitality
  • 0.2% doing well
    − 43% world population
  • 78% score nothing!
    − ~10% population
(Simons and Thomas, 2019)

Climbing from the Bottom
  • Language Tag
  • Unicode encoding
  • Font
  • Keyboard
    − physical
    − phone
  • Linebreaking
  • Locale Information
    − Character Lists
    − Sort order
  • Content

Language Tag (BCP 47)
  • Unique orthography
  • Structure:
    − lng-Scrp-RE-variants
    − ahk = ahk-Latn-MM
    − https://ldml.api.sil.org/langtags.json
  • lng – ISO 639 identifier
  • Scrp – ISO 15924
  • RE – ISO 3166-1

Language Tags
  • Variants
    − dialect/language
    − orthography/script
    − registration/private use
  • Policy Issues
    − ISO 639 is linguistic
    − Language tags are sociolinguistic

Unicode Encoding
  • Engineering detail
  • Almost all scripts in
  • Find a char
    − Sequences are good
  • Implies an orthography
  • Policy Issues
    − Use Unicode
    − Publish Orthography Descriptions

Fonts
  • Lots of fonts!
  • SIL Fonts
    − Full script coverage
  • Problems
    − adding fonts to phones
    − Noto styling
  • Policy Issues
    − Ensure industry support
    − Encourage free fonts

Keyboards
  • Keyman
    − All platforms
    − Predictive text
    − Open Source
    − IDE
  • Wider industry
    − More capable standard
    − More industry interest

Keyboards
  • Policy Issues
    − Agreed layout
      • Per language
      • Physical & Mobile

Linebreaking
  • Unsolved problem
    − Integration
    − Description
  • Resources
  • Word frequencies
    − open access
    − same as for predictive text

Locale Information
  • A deep well!
  • Unicode CLDR
    − Industry base data
  • SLDR
  • Key terms
  • Sorting
  • Dates, Times, etc.

Content
  • Literacy
  • Crowd sourcing
    − For what purpose?
    − Wikipedia
  • Local Writers
    − Comedy
    − Translation

Bibliography
  • 2019 Simons and Thomas, “Assessing Digital Language Support as a Factor in Language Vitality”, given at the 6th International Conference on Language Documentation and Conservation, University of Hawaii, Manoa
  • 2019 Internet World Users by Language, https://www.internetworldstats.com/stats7.htm
  • 2009 Phillips, Davis (Ed.), BCP 47: Tags for Identifying Languages (RFC5646, RFC4647)
  • Unicode Technical Standard 35: Unicode Locale Data Markup Language (LDML), https://unicode.org/reports/tr35/tr35.html#Contents
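
As an illustration of the Language Tag slide, the lng-Scrp-RE-variants structure can be sketched in Python. This is a simplified sketch, not a full BCP 47 parser: it ignores extensions, private use subtags and extended language subtags. The names `LangTag`, `parse_tag`, `expand` and `FULL_TAGS` are hypothetical, and the only equivalence in the table is the `ahk = ahk-Latn-MM` pair given on the slide. Real mapping data is published at the langtags.json URL on that slide; its schema is not assumed here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified pattern for lng-Scrp-RE-variants. NOT the full BCP 47 grammar.
TAG_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"               # lng: ISO 639 identifier
    r"(?:-(?P<script>[A-Za-z]{4}))?"          # Scrp: ISO 15924 script code
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?"    # RE: ISO 3166-1 (BCP 47 also allows 3 digits)
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))*)$"
)


@dataclass
class LangTag:
    lang: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)

    def __str__(self) -> str:
        parts = [self.lang, self.script, self.region, *self.variants]
        return "-".join(p for p in parts if p)


def parse_tag(tag: str) -> LangTag:
    """Split a tag into its subtags and normalise their casing."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a simple lng-Scrp-RE-variants tag: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    return LangTag(
        lang=m.group("lang").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=[v.lower() for v in m.group("variants").split("-") if v],
    )


# Hypothetical in-memory table; only this one pair comes from the slides.
FULL_TAGS = {"ahk": "ahk-Latn-MM"}


def expand(tag: str) -> LangTag:
    """Return the full tag for a minimal tag, if the table knows one."""
    short = str(parse_tag(tag))
    return parse_tag(FULL_TAGS.get(short, short))


if __name__ == "__main__":
    print(expand("ahk"))             # minimal tag expanded via the table
    print(parse_tag("AHK-latn-mm"))  # casing normalised
```

Keeping the equivalence table separate from the parser reflects the slide's point that a minimal tag and its full form name the same orthography; a production tool would load that table from published data rather than hard-code it.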
