Recent Progress in Developing Grapheme-Based Speech Recognition for Indonesian Ethnic Languages: Javanese, Sundanese, Balinese and Bataks
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Suspicious Identity of U+A9B5 JAVANESE VOWEL SIGN TOLONG
L2/19-003 Suspicious identity of U+A9B5 JAVANESE VOWEL SIGN TOLONG Liang Hai / 梁海 <[email protected]> Aditya Bayu Perdana / <[email protected]> ꦄꦢꦶꦠꦾ ꦧꦪꦸꦥꦢꦤ 4 January 2019 1 Acknowledgements The authors would like to thank Ilham Nurwansah and the Script Ad Hoc group for their feedback. Ilham Nurwansah also kindly provided the Sundanese samples (Figure 2, 3, 4, and 5). 2 Background In the original Unicode Javanese proposal L2/08-015R Proposal for encoding the Javanese script in the UCS, the character tolong (U+A9B5 JAVANESE VOWEL SIGN TOLONG) was described as a vowel sign that is used exclusively in the Sundanese writing system with three major use cases: 1. Used alone as the vowel sign o 2. As a part of the vowel sign eu: <vowel sign ĕ, tolong> 3. As a part of the letters and conjoined forms of reu/leu: <letter / conjoined form rĕ/lĕ, tolong> Table 1. Sundanese tolong usage according to the original proposal Written form ◌ ◌ꦵ ◌ꦼ ◌ꦼꦵ ◌� ◌�ꦵ A9C0 PANGKON A9BC PEPET Encoding (A9B5 TOLONG) A989 PA CEREK (A9B5 TOLONG) (A9B5 TOLONG) Transcription a o ĕ eu rĕ reu Pronunciation [a] [o] [ə] [ɤ] [rə] [rɤ] See also the note under Table 2. However, tolong appears to be merely a stylistic variant of tarung (U+A9B4 JAVANESE VOWEL SIGN TARUNG), therefore the disunification of tolong from tarung is likely a mistake. 3 Proposal The Unicode Standard needs to recommend how the inappropriately disunified character U+A9B5 JAVANESE VOWEL SIGN TOLONG should be handled. 1 In particular, clarification in the names list and the Core Specification is necessary for explaining the background of the mis-disunification and recommending how both the tarung and tolong forms for both the Javanese and Sundanese languages should be implemented. -
Cqmejj · -Uhhrersity
$9uth¢a$t Mia JTogtam -1986-:13.ulletin CQmeJJ · -Uhhrersity ' - SEAP ARCHIVE COPY DO NOT REMOVE This publication has been made possible by the generosity of Robert and Ruth Polson. Southeast Asia Program 1986 Bulletin Cornell University Contents From the Director . 2 Badgley Appointed Curator of the Echols Collection . .. .. .. .. 3 Filming Javanese Manuscript Collections in Surakarta . 4 Microcomputers and the Study of Southeast Asia. .. 6 Celebrating Our Founder's Birthday.............. .... ... 7 Interview with Dr. Hendrik M. J. Maier..................... ... .. 9 Retirements. .. .. .. .. .. .. .. .. .. I 2 Program Publications . 13 About Program People . 14 Thursday Luncheon Speakers .. .. .. .. I 4 Faculty and Staff Publications. .... ... .. .. .. 14 Lauriston Sharp Prize. 14 Social Science Research Council Fellowships . 15 Resident Faculty . .. .. .. 15 Visiting Faculty .. .. .. .. .. .. .. 15 Visiting Fellows. 15 Graduate Students in Field Published by the Southeast Asia Program, Research . 15 Cornell University, 1987 Graduate Students in Residence, Edited by Stanley J. O'Connor Spring 1986................ 15 Full-Year Asian Language Designed by Deena Wickstrom Concentration . I 6 Produced by the Office of Publications Services, Advanced Indonesian Abroad Cornell University Program. .... .......... 16 Recent Doctoral Dissertations The photograph of John H. Badgley was taken by Helen Kelley and of Hendrik M. J . Maier, by Margaret Fabrizzio. by SEAP Students........... 16 Recent Dissertations and Cover design after a woodcut of cloves from 1ratado das drogas e Theses on Southeast Asia by medicinas das indias Orientais, by Crist6vao da Costa Other Students at Cornell.. 16 from the Director Dear Friends, year we were fortunate to have Professor Charnvit Kasetsiri, vice rector of Thammasat University, come to Last year I noted that the Southeast Asia Program was teach the Thailand Seminar. -
Introduction to Old Javanese Language and Literature: a Kawi Prose Anthology
THE UNIVERSITY OF MICHIGAN CENTER FOR SOUTH AND SOUTHEAST ASIAN STUDIES THE MICHIGAN SERIES IN SOUTH AND SOUTHEAST ASIAN LANGUAGES AND LINGUISTICS Editorial Board Alton L. Becker John K. Musgrave George B. Simmons Thomas R. Trautmann, chm. Ann Arbor, Michigan INTRODUCTION TO OLD JAVANESE LANGUAGE AND LITERATURE: A KAWI PROSE ANTHOLOGY Mary S. Zurbuchen Ann Arbor Center for South and Southeast Asian Studies The University of Michigan 1976 The Michigan Series in South and Southeast Asian Languages and Linguistics, 3 Open access edition funded by the National Endowment for the Humanities/ Andrew W. Mellon Foundation Humanities Open Book Program. Library of Congress Catalog Card Number: 76-16235 International Standard Book Number: 0-89148-053-6 Copyright 1976 by Center for South and Southeast Asian Studies The University of Michigan Printed in the United States of America ISBN 978-0-89148-053-2 (paper) ISBN 978-0-472-12818-1 (ebook) ISBN 978-0-472-90218-7 (open access) The text of this book is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License: https://creativecommons.org/licenses/by-nc-nd/4.0/ I made my song a coat Covered with embroideries Out of old mythologies.... "A Coat" W. B. Yeats Languages are more to us than systems of thought transference. They are invisible garments that drape themselves about our spirit and give a predetermined form to all its symbolic expression. When the expression is of unusual significance, we call it literature. "Language and Literature" Edward Sapir Contents Preface IX Pronounciation Guide X Vowel Sandhi xi Illustration of Scripts xii Kawi--an Introduction Language ancf History 1 Language and Its Forms 3 Language and Systems of Meaning 6 The Texts 10 Short Readings 13 Sentences 14 Paragraphs.. -
Languages of Southeast Asia
Jiarong Horpa Zhaba Amdo Tibetan Guiqiong Queyu Horpa Wu Chinese Central Tibetan Khams Tibetan Muya Huizhou Chinese Eastern Xiangxi Miao Yidu LuobaLanguages of Southeast Asia Northern Tujia Bogaer Luoba Ersu Yidu Luoba Tibetan Mandarin Chinese Digaro-Mishmi Northern Pumi Yidu LuobaDarang Deng Namuyi Bogaer Luoba Geman Deng Shixing Hmong Njua Eastern Xiangxi Miao Tibetan Idu-Mishmi Idu-Mishmi Nuosu Tibetan Tshangla Hmong Njua Miju-Mishmi Drung Tawan Monba Wunai Bunu Adi Khamti Southern Pumi Large Flowery Miao Dzongkha Kurtokha Dzalakha Phake Wunai Bunu Ta w an g M o np a Gelao Wunai Bunu Gan Chinese Bumthangkha Lama Nung Wusa Nasu Wunai Bunu Norra Wusa Nasu Xiang Chinese Chug Nung Wunai Bunu Chocangacakha Dakpakha Khamti Min Bei Chinese Nupbikha Lish Kachari Ta se N a ga Naxi Hmong Njua Brokpake Nisi Khamti Nung Large Flowery Miao Nyenkha Chalikha Sartang Lisu Nung Lisu Southern Pumi Kalaktang Monpa Apatani Khamti Ta se N a ga Wusa Nasu Adap Tshangla Nocte Naga Ayi Nung Khengkha Rawang Gongduk Tshangla Sherdukpen Nocte Naga Lisu Large Flowery Miao Northern Dong Khamti Lipo Wusa NasuWhite Miao Nepali Nepali Lhao Vo Deori Luopohe Miao Ge Southern Pumi White Miao Nepali Konyak Naga Nusu Gelao GelaoNorthern Guiyang MiaoLuopohe Miao Bodo Kachari White Miao Khamti Lipo Lipo Northern Qiandong Miao White Miao Gelao Hmong Njua Eastern Qiandong Miao Phom Naga Khamti Zauzou Lipo Large Flowery Miao Ge Northern Rengma Naga Chang Naga Wusa Nasu Wunai Bunu Assamese Southern Guiyang Miao Southern Rengma Naga Khamti Ta i N u a Wusa Nasu Northern Huishui -
Framework of Jawatex
IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.4, April 2010 219 Framework of JawaTeX Ema Utami1, Jazi Eko Istiyanto2, Sri Hartati3, Marsono4, Ahmad Ashari5 1Information System Major of STMIK AMIKOM Yogyakarta Ring Road Utara ST, Condong Catur, Depok Sleman Yogyakarta 2,3,5Doctoral Program in Computer Science of Postgraduate School Gadjah Mada University Graha Student Internet Center (SIC) 3rd floor Faculty of Mathematic and Natural Sciences Gadjah Mada University Sekip Utara Bulaksumur Yogyakarta. 55281 4 Sastra Nusantara Major of Culture Sciences Gadjah Mada University Humaniora ST No.1, Bulaksumur, Yogyakarta Summary Transliteration is a substitution letter by letter from one alphabet to another, free from how to actually speak those characters or it can 1. Introduction be called a letter substitution or transliteration. Currently there are already two Javanese characters of true type font. To use the fonts This paper is influenced by the research of Free/Open in writing Javanese characters, users must have knowledge about Source Software Localization (FOSS) [11] to develop how to read and write Javanese. No researcher has already developed algorithm to handle writing Javanese character for x software based on where the software is built. According to and q Latin characters, aritmetic operand, special symbols (except The Localization Industry Standards Association (LISA), period, coma and double quote), multiple consonant (more than Localization encloses product building that is appropriate two sequence consonants), also cannot handle roman numbering to target culture (region and language) where the poducts system. are sold [7]. Research in China, Japan dan Korea (CJK) This paper explains how JawaTeX is designed. -
Universal Scripts Project: Statement of Significance and Impact
Universal Scripts Project: Statement of Significance and Impact The Universal Scripts Project expands the capabilities of the Internet by providing digital access to text materials from a variety of modern and historical cultures whose writing systems are not currently included in the international standard for electronic representation of scripts, known as Unicode. People who write in these scripts find it difficult to use email, compose and send documents electronically, and post documents on the World Wide Web, without relying on nonstandard fonts or other cumbersome workarounds, and are therefore left out of the “technological revolution.” About 66 scripts are currently included in the Unicode standard, but over 80 are not. Some 40 of these missing scripts belong to modern linguistic minorities in Africa, the Indian subcontinent, China, and other countries in Southeast Asia; about 40 are scripts of historical importance. The project’s goal for 2007–2008 is to provide the standards bodies overseeing character sets with proposals for 15 scripts to be included in the Unicode standard. The scripts selected for inclusion include 9 modern minority scripts and 6 historical scripts. The need is urgent, because the entire process, from first proposal to acceptance, typically takes from 2 to 5 years, and support among corporations and national bodies for adding more scripts to Unicode is uncertain. If the proposals are not submitted soon, these user communities will not be able to use their scripts in the near future. The scripts selected for this grant have established scholarly and user-community connections, which will help guarantee that the proposals meet the users' needs. -
JTC1/SC2/WG2 N3405R Date: 2008-04-21
JTC1/SC2/WG2 N3405R Date: 2008-04-21 Updated Draft Agenda – Meeting # 52 Topic (Document No.) Proposed Outcome 1. Opening and roll call (N3401) Update WG2Distribution List 2. Approval of the agenda (N3405) Approved agenda 3. Approval of minutes of meeting 51 (N3353) Approved Minutes 4. Review action items from previous meeting Updated Action Item List (N3353-AI) 5. JTC1 and ITTF matters: FYI 5.1. Ballot Results & Publication – Amendment 3 (N3375, N3391) 6. SC2 matters: FYI 6.1. Program of Work 6.2. Submittals to ITTF – Amendment 4 (N3381) 6.3. Ballot results: Amendment 5 (N3409), Amendment 6 (N3406) 6.4. SC2 Secretariat Report – SC2 15 Plenary 15 (N3415) 6.5. Draft agenda (N3439) 6.6. Request for periodic review (N3416) 6.7. Recommendations - SWG Directives Meeting (N3417) 6.8. Stabilized Standards recommendation (N3440) 7. WG2 matters: 7.1. Tai Tham ad hoc meeting January 2008 FYI (N3374, N3379, N3384) moved N3379, N3384 to 10.1.4 7.2. WG2 Convener’s Draft Report to SC2 (N3399) FYI 7.3. Snapshot of Pictorial view of Roadmaps (N3398) 7.4. Character Count spreadsheet (N3368) FYI 7.5. Request to modify principles and procedures (N3441) Review 7.6. CJK Multicolumn presentation (N3408) Review & choose format 7.7. Subdivision of work – New edition 10646 FCD Status Update (N3360, N3362, N3364) 7.8. Criteria for encoding script specific danda in Meitei Mayek Review (N3457) 8. IRG status and reports Review & approve 8.1. IRG Resolutions – Meeting 29 (N3371) 8.2. IRG Summary Report – Meeting 29 (N3372) 8.3. Proposal to encode 6 additional HKSCS (N3445) 9. -
Word-Prosodic Systems of Raja Ampat Languages
Word-prosodic systems of Raja Ampat languages PROEFSCHRIFT ter verkrijging van de graad van Doctor aan de Universiteit Leiden, op gezag van de Rector Magnificus Dr. D.D. Breimer, hoogleraar in de faculteit der Wiskunde en Natuurwetenschappen en die der Geneeskunde, volgens besluit van het College voor Promoties te verdedigen op woensdag 9 januari 2002 te klokke 15.15 uur door ALBERT CLEMENTINA LUDOVICUS REMIJSEN geboren te Merksem (België) in 1974 Promotiecommissie promotores: Prof. Dr. V.J.J.P. van Heuven Prof. Dr. W.A.L. Stokhof referent: Dr. A.C. Cohn, Cornell University overige leden: Prof. Dr. T.C. Schadeberg Prof. Dr. H. Steinhauer Published by LOT phone: +31 30 253 6006 Trans 10 fax: +31 30 253 6000 3512 JK Utrecht e-mail: [email protected] The Netherlands http://www.let.uu.nl/LOT/ Cover illustration: Part of the village Fafanlap (Misool, Raja Ampat archipelago, Indonesia) in the evening light. Photo by Bert Remijsen (February 2000). ISBN 90-76864-09-8 NUGI 941 Copyright © 2001 by Albert C.L. Remijsen. All rights reserved. This book is dedicated to Lex van der Leeden (1922-2001), with friendship and admiration Table of contents Acknowledgements vii Transcription and abbreviations ix 1 Introduction 1 2 The languages of the Raja Ampat archipelago 5 2.1. About this chapter 5 2.2. Background 6 2.2.1. The Austronesian and the Papuan languages, and their origins 6 2.2.2. The South Halmahera-West New Guinea subgroup of Austronesian 8 2.2.2.1. In general 8 2.2.2.2. Within the South Halmahera-West New Guinea (SHWNG) subgroup 9 2.2.2.3. -
Line Segmentation of Javanese Image of Manuscripts in Javanese Scripts
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Repository Universitas Sanata Dharma International Journal of Engineering Innovation & Research Volume 2, Issue 3, ISSN: 2277 – 5668 Line Segmentation of Javanese Image of Manuscripts in Javanese Scripts Anastasia Rita Widiarti Agus Harjoko Marsono Sri Hartati Email: [email protected] Email: [email protected] Email: [email protected] Email: [email protected] Abstract – Segmentation is an important stage in automatic II. RELEVANT WORKS transliteration process of a manuscript image. One of the segmentation approaches generally used to get an image of Palakollu, et al. [2] investigate line segmentation the scripts of a scrip image is performing line segmentation and then performing segmentation of script images on the method on Hindi manuscript image using projection-based result of the line segmentation. approach. They build an algorithm to detect header line Line segmentation of image of manuscripts in Javanese and base line, based on several initial assumptions such as scripts is often difficult because there are lines of images average line height of 30 pixel, to make an estimation of which shouldn’t be on the same line, but are in one line area real average line height. From 500 documents of the image, and there are even images in different but investigated, the average accuracy of line segmentation is overlapping lines. This paper offers an idea to use moving 93.6%. Lehal and Singh [3] study segmentation on average to smooth the curve of vertical projection result of Gurmukhi text using a combination of statistical analysis image of manuscripts in Javanese scripts as an initial guide from the text, projection profile, and analysis on connected of line separation. -
LSP 402 Performer08192015.Pdf
Language Specific Peculiarities Document for JAVANESE as Spoken in INDONESIA Javanese is an Austronesian language (Malayo-Polynesian) spoken primarily in the central and eastern parts of Java, an island of Indonesia. It is spoken by approximately 84 million people. Bahasa Indonesia is the official language of Indonesia, and while Javanese does not have official language status, it is the most widely spoken regional language in Indonesia (Nothofer, 2006). Javanese is also spoken by approximately 500,000 people in Suriname, New Caledonia, Malaysia and other countries (Lewis et al., 2014), but these were not collected for the current project. 1. Special handling of dialects There are three main dialects of Javanese corresponding to geographical areas within Java (Andarini et al., 2007). The standard dialect is based on Central Javanese as spoken in Surakarta and Yogyakarta. It is representative of the Javanese taught in schools and spoken in the palaces. There are phonetic and lexical differences between the three dialects. But in general, there is a high degree of mutual intelligibility. The phonetic differences are largely made up of different vowel pronunciations (Ras, 1985). Region Districts or Cities Central Javanese Pekalongan, Kedu, Bagelen, Semarang, Eastern North-Coast, Blora, Surakarta, Yogyakarta, Madiun Western Javanese North Banten, Cirebon, Tegal, Banyumas Eastern Javanese Surabaya, Malang, Jombang, Banyuwangi Speakers of all 3 dialects are included in this speech database. In addition to the dialects, Javanese has three speech levels used depending on the social relationship between the conversation partners (Poedjosoedarmo, 1968; Quinn, 2011): • ngoko – non-polite or informal speech level, used between people who have a high degree of familiarity, e.g., close family and friends. -
Language Shift Among Javanese Youth and Their Perception of Local and National Identities
GEMA Online® Journal of Language Studies 109 Volume 19(3), August 2019 http://doi.org/10.17576/gema-2019-1903-07 Language Shift among Javanese Youth and Their Perception of Local and National Identities Erna Andriyanti [email protected] English Education Department, Universitas Negeri Yogyakarta ABSTRACT It is not uncommon for language to play an important role in identity issues in multilingual countries. Declaring one of the important community languages as the official language in such a country can pose a threat to the survival of the other languages. Bahasa Indonesia is an example of this phenomenon. Its successful establishment as the national language has altered the local language situation throughout the country. Relevant to this study, it has had an important effect on young people’s use of Javanese, the dominant local language of Yogyakarta. This study analyses the extent of language shift among the young multilinguals in the city and investigates the youth’s search for authentic local and national identities. A questionnaire was used to elicit the youth’s mother tongue as well as their attitudes and perceptions towards Javanese and Bahasa Indonesia and local and national identities. Their real use of languages was obtained through non-participative observations. A sample group of 1,039 students from 10 junior and senior high schools was surveyed. The findings reveal the current status of Javanese and Bahasa Indonesia as mother tongues and the identity- language choice links. Most young people with Javanese parents claimed that Bahasa Indonesia is their first language. This signals a weakened intergenerational transmission of Javanese. -
Remijsen Phd (2002) Word-Prosodic Systems of the Raja
Word-prosodic systems of Raja Ampat languages PROEFSCHRIFT ter verkrijging van de graad van Doctor aan de Universiteit Leiden, op gezag van de Rector Magnificus Dr. D.D. Breimer, hoogleraar in de faculteit der Wiskunde en Natuurwetenschappen en die der Geneeskunde, volgens besluit van het College voor Promoties te verdedigen op woensdag 9 januari 2002 te klokke 15.15 uur door ALBERT CLEMENTINA LUDOVICUS REMIJSEN geboren te Merksem (België) in 1974 Promotiecommissie promotores: Prof. Dr. V.J.J.P. van Heuven Prof. Dr. W.A.L. Stokhof referent: Dr. A.C. Cohn, Cornell University overige leden: Prof. Dr. T.C. Schadeberg Prof. Dr. H. Steinhauer Published by LOT phone: +31 30 253 6006 Trans 10 fax: +31 30 253 6000 3512 JK Utrecht e-mail: [email protected] The Netherlands http://www.let.uu.nl/LOT/ Cover illustration: Part of the village Fafanlap (Misool, Raja Ampat archipelago, Indonesia) in the evening light. Photo by Bert Remijsen (February 2000). ISBN 90-76864-09-8 NUGI 941 Copyright © 2001 by Albert C.L. Remijsen. All rights reserved. This book is dedicated to Lex van der Leeden (1922-2001), with friendship and admiration Table of contents Acknowledgements vii Transcription and abbreviations ix 1 Introduction 1 2 The languages of the Raja Ampat archipelago 5 2.1. About this chapter 5 2.2. Background 6 2.2.1. The Austronesian and the Papuan languages, and their origins 6 2.2.2. The South Halmahera-West New Guinea subgroup of Austronesian 8 2.2.2.1. In general 8 2.2.2.2. Within the South Halmahera-West New Guinea (SHWNG) subgroup 9 2.2.2.3.