The Tatoeba Translation Challenge--Realistic Data Sets For
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Computational Development of Lesser Resourced Languages
Computational Development of Lesser Resourced Languages Martin Hosken WSTech, SIL International © 2019, SIL International Modern Technical Capability l Grammar checking l Wikipedia l OCR l Localisation l Text to speech l Speech to text l Machine Translation © 2019, SIL International Digital Language Vitality l 0.2% doing well − 43% world population l 78% score nothing! − ~10% population © 2019, SIL International Simons and Thomas, 2019 Climbing from the Bottom l Language Tag l Linebreaking l Unicode encoding l Locale Information l Font − Character Lists − Sort order l Keyboard − physical l Content − phone © 2019, SIL International Language Tag l Unique orthography l lng – ISO639 identifier l Scrp – ISO 15924 l Structure: l RE – ISO 3166-1 − lng-Scrp-RE-variants − ahk = ahk-Latn-MM − https://ldml.api.sil.org/langtags.json BCP 47 © 2019, SIL International Language Tags l Variants l Policy Issues − dialect/language − ISO 639 is linguistic − orthography/script − Language tags are sociolinguistic − registration/private use © 2019, SIL International Unicode Encoding l Engineering detail l Policy Issues l Almost all scripts in − Use Unicode − Publish Orthography l Find a char Descriptions − Sequences are good l Implies an orthography © 2019, SIL International Fonts l Lots of fonts! l Policy Issues l SIL Fonts − Ensure industry support − Full script coverage − Encourage free fonts l Problems − adding fonts to phones © 2019, SIL− InternationalNoto styling Keyboards l Keyman l Wider industry − All platforms − More capable standard − Predictive text − More industry interest − Open Source − IDE © 2019, SIL International Keyboards l Policy Issues − Agreed layout l Per language l Physical & Mobile © 2019, SIL International Linebreaking l Unsolved problem l Word frequencies − Integration − open access − Description − same as for predictive text l Resources © 2019, SIL International Locale Information l A deep well! l Key terms l Unicode CLDR l Sorting − Industry base data l Dates, Times, etc. -
Ethnicity, Education and Equality in Nepal
HIMALAYA, the Journal of the Association for Nepal and Himalayan Studies Volume 36 Number 2 Article 6 December 2016 New Languages of Schooling: Ethnicity, Education and Equality in Nepal Uma Pradhan University of Oxford, [email protected] Follow this and additional works at: https://digitalcommons.macalester.edu/himalaya Recommended Citation Pradhan, Uma. 2016. New Languages of Schooling: Ethnicity, Education and Equality in Nepal. HIMALAYA 36(2). Available at: https://digitalcommons.macalester.edu/himalaya/vol36/iss2/6 This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License This Research Article is brought to you for free and open access by the DigitalCommons@Macalester College at DigitalCommons@Macalester College. It has been accepted for inclusion in HIMALAYA, the Journal of the Association for Nepal and Himalayan Studies by an authorized administrator of DigitalCommons@Macalester College. For more information, please contact [email protected]. New Languages of Schooling: Ethnicity, Education, and Equality in Nepal Uma Pradhan Mother tongue education has remained this attempt to seek membership into a controversial issue in Nepal. Scholars, multiple groups and display of apparently activists, and policy-makers have favored contradictory dynamics. On the one hand, the mother tongue education from the standpoint practices in these schools display inward- of social justice. Against these views, others looking characteristics through the everyday have identified this effort as predominantly use of mother tongue, the construction of groupist in its orientation and not helpful unified ethnic identity, and cultural practices. in imagining a unified national community. On the other hand, outward-looking dynamics Taking this contention as a point of inquiry, of making claims in the universal spaces of this paper explores the contested space of national education and public places could mother tongue education to understand the also be seen. -
Multilingual Content Management and Standards with a View on AI Developments Laurent Romary
Multilingual content management and standards with a view on AI developments Laurent Romary To cite this version: Laurent Romary. Multilingual content management and standards with a view on AI developments. AI4EI - Conference Artificial Intelligence for European Integration, Oct 2020, Turin / Virtual, Italy. hal-02961857 HAL Id: hal-02961857 https://hal.inria.fr/hal-02961857 Submitted on 8 Oct 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Multilingual content management and standards with a view on AI developments Laurent Romary Directeur de Recherche, Inria, team ALMAnaCH ISO TC 37, chair Language and AI • Central role of language in the revival of AI (machine-learning based models) • Applications: document management and understanding, chatbots, machine translation • Information sources: public (web, cultural heritage repositories) and private (Siri, Amazon Alexa) linguistic information • European context: cf. Europe's Languages in the Digital Age, META-NET White Paper Series • Variety of linguistic forms • Spoken, written, chats and forums • Multilingualism, accents, dialects, technical domains, registers, language learners • General notion of language variety • Classifying and referencing the relevant features • Role of standards and standards developing organization (SDO) A concrete example for a start Large scale corpus Language model BERT Devlin, J., Chang, M. -
Tags for Identifying Languages File:///C:/W3/International/Draft-Langtags/Draft-Phillips-Lan
Tags for Identifying Languages file:///C:/w3/International/draft-langtags/draft-phillips-lan... Network Working Group A. Phillips, Ed. TOC Internet-Draft webMethods, Inc. Expires: October 7, 2004 M. Davis IBM April 8, 2004 Tags for Identifying Languages draft-phillips-langtags-02 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on October 7, 2004. Copyright Notice Copyright (C) The Internet Society (2004). All Rights Reserved. Abstract This document describes a language tag for use in cases where it is desired to indicate the language used in an information object, how to register values for use in this language tag, and a construct for matching such language tags, including user defined extensions for private interchange. Table of Contents 1. Introduction 2. The Language Tag 2.1 Syntax 2.2 Language Tag Sources 2.2.1 Pre-Existing RFC3066 Registrations 1 of 20 08/04/2004 11:03 Tags for Identifying Languages file:///C:/w3/International/draft-langtags/draft-phillips-lan.. -
A Könyvtárüggyel Kapcsolatos Nemzetközi Szabványok
A könyvtárüggyel kapcsolatos nemzetközi szabványok 1. Állomány-nyilvántartás ISO 20775:2009 Information and documentation. Schema for holdings information 2. Bibliográfiai feldolgozás és adatcsere, transzliteráció ISO 10754:1996 Information and documentation. Extension of the Cyrillic alphabet coded character set for non-Slavic languages for bibliographic information interchange ISO 11940:1998 Information and documentation. Transliteration of Thai ISO 11940-2:2007 Information and documentation. Transliteration of Thai characters into Latin characters. Part 2: Simplified transcription of Thai language ISO 15919:2001 Information and documentation. Transliteration of Devanagari and related Indic scripts into Latin characters ISO 15924:2004 Information and documentation. Codes for the representation of names of scripts ISO 21127:2014 Information and documentation. A reference ontology for the interchange of cultural heritage information ISO 233:1984 Documentation. Transliteration of Arabic characters into Latin characters ISO 233-2:1993 Information and documentation. Transliteration of Arabic characters into Latin characters. Part 2: Arabic language. Simplified transliteration ISO 233-3:1999 Information and documentation. Transliteration of Arabic characters into Latin characters. Part 3: Persian language. Simplified transliteration ISO 25577:2013 Information and documentation. MarcXchange ISO 259:1984 Documentation. Transliteration of Hebrew characters into Latin characters ISO 259-2:1994 Information and documentation. Transliteration of Hebrew characters into Latin characters. Part 2. Simplified transliteration ISO 3602:1989 Documentation. Romanization of Japanese (kana script) ISO 5963:1985 Documentation. Methods for examining documents, determining their subjects, and selecting indexing terms ISO 639-2:1998 Codes for the representation of names of languages. Part 2. Alpha-3 code ISO 6630:1986 Documentation. Bibliographic control characters ISO 7098:1991 Information and documentation. -
Addressing — Digital Interchange Models
© The Calendaring and Scheduling Consortium, Inc. 2019 – All rights reserved CC/WD 19160-6:2019 CalConnect TC VCARD Addressing — Digital interchange models Working Dra Standard Warning for dras This document is not a CalConnect Standard. It is distributed for review and comment, and is subject to change without notice and may not be referred to as a Standard. Recipients of this dra are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation. Recipients of this dra are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation. The Calendaring and Scheduling Consortium, Inc. 2019 CC/WD 19160-6:2019:2019 © 2019 The Calendaring and Scheduling Consortium, Inc. All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from the address below. The Calendaring and Scheduling Consortium, Inc. 4390 Chaffin Lane McKinleyville California 95519 United States of America [email protected] www.calconnect.org ii © The Calendaring and Scheduling Consortium, Inc. 2019 – All rights reserved CC/WD 19160-6:2019:2019 Contents .Foreword...................................................................................................................................... -
Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone
Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Proposal for Generation Panel for Latin Script Label Generation Ruleset for the Root Zone Table of Contents 1. General Information 2 1.1 Use of Latin Script characters in domain names 3 1.2 Target Script for the Proposed Generation Panel 4 1.2.1 Diacritics 5 1.3 Countries with significant user communities using Latin script 6 2. Proposed Initial Composition of the Panel and Relationship with Past Work or Working Groups 7 3. Work Plan 13 3.1 Suggested Timeline with Significant Milestones 13 3.2 Sources for funding travel and logistics 16 3.3 Need for ICANN provided advisors 17 4. References 17 1 Generation Panel for Latin Script Label Generation Ruleset for the Root Zone 1. General Information The Latin script1 or Roman script is a major writing system of the world today, and the most widely used in terms of number of languages and number of speakers, with circa 70% of the world’s readers and writers making use of this script2 (Wikipedia). Historically, it is derived from the Greek alphabet, as is the Cyrillic script. The Greek alphabet is in turn derived from the Phoenician alphabet which dates to the mid-11th century BC and is itself based on older scripts. This explains why Latin, Cyrillic and Greek share some letters, which may become relevant to the ruleset in the form of cross-script variants. The Latin alphabet itself originated in Italy in the 7th Century BC. The original alphabet contained 21 upper case only letters: A, B, C, D, E, F, Z, H, I, K, L, M, N, O, P, Q, R, S, T, V and X. -
Map by Steve Huffman; Data from World Language Mapping System
Svalbard Greenland Jan Mayen Norwegian Norwegian Icelandic Iceland Finland Norway Swedish Sweden Swedish Faroese FaroeseFaroese Faroese Faroese Norwegian Russia Swedish Swedish Swedish Estonia Scottish Gaelic Russian Scottish Gaelic Scottish Gaelic Latvia Latvian Scots Denmark Scottish Gaelic Danish Scottish Gaelic Scottish Gaelic Danish Danish Lithuania Lithuanian Standard German Swedish Irish Gaelic Northern Frisian English Danish Isle of Man Northern FrisianNorthern Frisian Irish Gaelic English United Kingdom Kashubian Irish Gaelic English Belarusan Irish Gaelic Belarus Welsh English Western FrisianGronings Ireland DrentsEastern Frisian Dutch Sallands Irish Gaelic VeluwsTwents Poland Polish Irish Gaelic Welsh Achterhoeks Irish Gaelic Zeeuws Dutch Upper Sorbian Russian Zeeuws Netherlands Vlaams Upper Sorbian Vlaams Dutch Germany Standard German Vlaams Limburgish Limburgish PicardBelgium Standard German Standard German WalloonFrench Standard German Picard Picard Polish FrenchLuxembourgeois Russian French Czech Republic Czech Ukrainian Polish French Luxembourgeois Polish Polish Luxembourgeois Polish Ukrainian French Rusyn Ukraine Swiss German Czech Slovakia Slovak Ukrainian Slovak Rusyn Breton Croatian Romanian Carpathian Romani Kazakhstan Balkan Romani Ukrainian Croatian Moldova Standard German Hungary Switzerland Standard German Romanian Austria Greek Swiss GermanWalser CroatianStandard German Mongolia RomanschWalser Standard German Bulgarian Russian France French Slovene Bulgarian Russian French LombardRomansch Ladin Slovene Standard -
The Maulana Who Loved Krishna
SPECIAL ARTICLE The Maulana Who Loved Krishna C M Naim This article reproduces, with English translations, the e was a true maverick. In 1908, when he was 20, he devotional poems written to the god Krishna by a published an anonymous article in his modest Urdu journal Urd -i-Mu’all (Aligarh) – circulation 500 – maulana who was an active participant in the cultural, H ū ā which severely criticised the British colonial policy in Egypt political and theological life of late colonial north India. regarding public education. The Indian authorities promptly Through this, the article gives a glimpse of an Islamicate charged him with “sedition”, and demanded the disclosure of literary and spiritual world which revelled in syncretism the author’s name. He, however, took sole responsibility for what appeared in his journal and, consequently, spent a little with its surrounding Hindu worlds; and which is under over one year in rigorous imprisonment – held as a “C” class threat of obliteration, even as a memory, in the singular prisoner he had to hand-grind, jointly with another prisoner, world of globalised Islam of the 21st century. one maund (37.3 kgs) of corn every day. The authorities also confi scated his printing press and his lovingly put together library that contained many precious manuscripts. In 1920, when the fi rst Indian Communist Conference was held at Kanpur, he was one of the organising hosts and pre- sented the welcome address. Some believe that it was on that occasion he gave India the slogan Inqilāb Zindabād as the equivalent to the international war cry of radicals: “Vive la Revolution” (Long Live The Revolution). -
The Forms of Seeking Accepting and Denying Permissions in English and Awadhi Language
THE FORMS OF SEEKING ACCEPTING AND DENYING PERMISSIONS IN ENGLISH AND AWADHI LANGUAGE A Thesis Submitted to the Department of English Education In Partial Fulfilment for the Masters of Education in English Submitted by Jyoti Kaushal Faculty of Education Tribhuwan University, Kirtipur Kathmandu, Nepal 2018 THE FORMS OF SEEKING ACCEPTING AND DENYING PERMISSIONS IN ENGLISH AND AWADHI LANGUAGE A Thesis Submitted to the Department of English Education In Partial Fulfilment for the Masters of Education in English Submitted by Jyoti Kaushal Faculty of Education Tribhuwan University, Kirtipur Kathmandu, Nepal 2018 T.U. Reg. No.:9-2-540-164-2010 Date of Approval Thesis Fourth Semester Examination Proposal: 18/12/2017 Exam Roll No.: 28710094/072 Date of Submission: 30/05/2018 DECLARATION I hereby declare that to the best of my knowledge this thesis is original; no part of it was earlier submitted for the candidate of research degree to any university. Date: ..…………………… Jyoti Kaushal i RECOMMENDATION FOR ACCEPTANCE This is to certify that Miss Jyoti Kaushal has prepared this thesis entitled The Forms of Seeking, Accepting and Denying Permissions in English and Awadhi Language under my guidance and supervision I recommend this thesis for acceptance Date: ………………………… Mr. Raj Narayan Yadav Reader Department of English Education Faculty of Education TU, Kirtipur, Kathmandu, Nepal ii APPROVAL FOR THE RESEARCH This thesis has been recommended for evaluation from the following Research Guidance Committee: Signature Dr. Prem Phyak _______________ Lecturer & Head Chairperson Department of English Education University Campus T.U., Kirtipur, Mr. Raj Narayan Yadav (Supervisor) _______________ Reader Member Department of English Education University Campus T.U., Kirtipur, Mr. -
INCLUSIVE GOVERNANCE a Study on Participation and Representation After Federalization in Nepal
STATE OF INCLUSIVE GOVERNANCE A Study on Participation and Representation after Federalization in Nepal Central Department of Anthropology Tribhuvan University Kirtipur, Kathmandu, Nepal B I STATE OF INCLUSIVE GOVERNANCE STATE OF INCLUSIVE GOVERNANCE STATE OF INCLUSIVE GOVERNANCE A Study on Participation and Representation after Federalization in Nepal Binod Pokharel Meeta S. Pradhan SOSIN Research Team Project Coordinator Dr. Dambar Chemjong Research Director Dr. Mukta S. Tamang Team Leaders Dr. Yogendra B. Gurung Dr. Binod Pokharel Dr. Meeta S. Pradhan Dr. Mukta S. Tamang Team Members Dr. Dhanendra V. Shakya Mr. Mohan Khajum Advisors/Reviewers Dr. Manju Thapa Tuladhar Mr. Prakash Gnyawali B I STATE OF INCLUSIVE GOVERNANCE STATE OF INCLUSIVE GOVERNANCE STATE OF INCLUSIVE GOVERNANCE A Study on Participation and Representation after Federalization in Nepal Copyright@2020 Central Department of Anthropology Tribhuvan University This study is made possible by the support of the American People through the United States Agency for International Development (USAID). The contents of this report are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government or Tribhuvan University. Published By Central Department of Anthropology (CDA) Tribhuvan University (TU) Kirtipur, Kathmandu, Nepal Tel: + 977-0-4334832 Email: [email protected] Website: anthropologytu.edu.np First Published: October 2020 400 Copies Cataloguing in Publication Data Pokharel, Binod. State of inclusive governance: a study on participation and representation after federalization in Nepal / Binod Pokharel, Meeta S. Pradhan.- Kirtipur : Central Department of Anthropology, Tribhuvan University, 2020. Xx, 164p. : Col. ill. ; Cm. ISBN 978-9937-0-7864-1 1. -
1-1873-71868 for Nameshop
Application number: 1-1873-71868 for Nameshop Generated on 30 May 2012 Applied-for gTLD string 13. Provide the applied-for gTLD string. If an IDN, provide the U-label. IDN 14(a). If an IDN, provide the A-label (beginning with "xn--"). 14(b). If an IDN, provide the meaning or restatement of the string in English, a description of the literal meaning of the string in the opinion of the applicant. 14(c). If an IDN, provide the language of the label (in English). 14(c). If an IDN, provide the language of the label (as referenced by ISO-639-1). 14(d). If an IDN, provide the script of the label (in English). 14(d). If an IDN, provide the script of the label (as referenced by ISO 15924). 14(e). If an IDN, list all code points contained in the U-label according to Unicode form. 15(a). If an IDN, Attach IDN Tables for the proposed registry. 15(b). Describe the process used for development of the IDN tables submitted, including consultations and sources used. 15(c). List any variant strings to the applied-for gTLD string according to the relevant IDN tables. 16. If an IDN, describe the applicant's efforts to ensure that there are no known operational or rendering problems. If such issues are known, describe steps that will be taken to mitigate these issues in software and other applications. Nameshop anticipates the introduction of .IDN without operational or rendering problems. Based on a decade of experience launching and operating new TLDs, Afilias, the back-end provider of registry services for this TLD, is confident the launch and operation of this TLD presents no known challenges.