Jejueo Talking Dictionary: a Collaborative Online Database for Language Revitalization

Moira Saltzman
University of Michigan

Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 122–129, Honolulu, Hawai‘i, USA, March 6–7, 2017. © 2017 Association for Computational Linguistics

Abstract

This paper describes the ongoing development of the Jejueo Talking Dictionary, a free online multimedia database and Android application. Jejueo is a critically endangered language spoken by 5,000-10,000 people throughout Jeju Province, South Korea, and in a diasporic enclave in Osaka, Japan. Under contact pressure from Standard Korean, Jejueo is undergoing rapid attrition (Kang, 2005; Kang, 2007), and most fluent speakers of Jejueo are now over 75 years old (UNESCO, 2010). In recent years, talking dictionaries have proven to be valuable tools in language revitalization programs worldwide (Nathan, 2006; Harrison and Anderson, 2006). As a collaborative team including linguists from Jeju National University, members of the Jejueo Preservation Society, Jeju community members and outside linguists, we are currently building a web-based talking dictionary of Jejueo along with an application for Android devices. The Jejueo Talking Dictionary will compile existing annotated video corpora of Jejueo songs, conversational genres and regional mythology into a multimedia database, to be supplemented by original annotated video recordings of natural language use. Lexemes and definitions will be accompanied by audio files of their pronunciation and, in the case of items native to Jeju, occasional photos. The audio and video data will be tagged in Jejueo, Korean, Japanese and English so that users may search or browse the dictionary in any of these languages. Videos showing a range of discourse types will have interlinear glossing, so that users may search Jejueo particles as well as lexemes and grammatical topics, and find the tools to construct original Jejueo speech. The Jejueo Talking Dictionary will serve as a tool for language acquisition in Jejueo immersion programs in schools, as well as a repository for oral history and ceremonial speech. The aim of this paper is to discuss how the interests of diverse user communities may be addressed by the methodology, organization and scope of talking dictionaries.

1 Introduction

The purpose of this paper is to present the ongoing development of the Jejueo Talking Dictionary as an example of applying interdisciplinary methodology to create an enduring, multipurpose record of an endangered language. In this paper I examine strategies for gathering extensive data to create a multimodal online platform aimed at a wide variety of uses and user groups. The Jejueo Talking Dictionary project is tailored to diverse user communities on Jeju Island, South Korea, where Jejueo, the indigenous language, is critically endangered and underdocumented, but where the population’s smartphone penetration rate is 75% (Lee, 2014) and semi-speakers are highly proficient users of technology (Song, 2012). The Jejueo Talking Dictionary is also intended for Jejueo speakers of varying degrees of fluency in Osaka, Japan, where up to 126,511 diasporic Jejuans reside (Southcott, 2013). A third aim of the Jejueo Talking Dictionary is to create extensive linguistic documentation of Jejueo that will be available to the wider scientific community, as the vast majority of existing documentary materials on Jejueo are published in Korean. The Jejueo Talking Dictionary will serve as an online open-access repository of over 200 hours of natural and ceremonial language use, with interlinear glossing in Jejueo, Korean, Japanese and English.

2 Background

2.1 Language context

Very closely related to Korean, Jejueo is the indigenous language of Jeju Island, South Korea. Jejueo has 5,000-10,000 speakers located throughout the islands of Jeju Province and in a diasporic enclave in Osaka, Japan. With most fluent speakers over 75 years old, Jejueo was classified as critically endangered by UNESCO in 2010. The Koreanic language family consists of at least two languages, Jejueo and Korean. Several regional varieties of Korean are spoken across the Korean peninsula, divided loosely along provincial lines. Jejueo and Korean are not mutually intelligible, owing to Jejueo’s distinct lexicon and grammatical morphemes. Pilot research (Yang, 2013) estimates that 20-25% of the lexicons of Jejueo and Korean overlap, and a recent study (O’Grady, 2015) found that Jejueo is at most 12% intelligible to speakers of Korean on Korea’s mainland.¹

Jejueo conserves many Middle Korean phonological and lexical features lost to Modern Standard Korean (MSK), including the Middle Korean phoneme /ɔ/ and terms such as pɨzʌp : Jejueo pusʌp ‘charcoal burner’ (Stonham, 2011: 97). Extensive lexical and morphological borrowing from Japanese, Mongolian and Manchurian is evident in Jejueo, owing to the Mongolian colonization of Jeju in the 13th and 14th centuries, Japan’s annexation of Korea and occupation of Jeju between 1910 and 1945, and centuries of trade with Manchuria and Japan (Martin, 1993; Lee and Ramsey, 2000). Several place names in Jeju are arguably Japonic in origin, e.g. Tamna, the first known name of Jeju Island (Kwen, 1994:167; Vovin, 2013). Moreover, several names for indigenous fruits and vegetables on Jeju are borrowed from Japanese, e.g. mik͈aŋ ‘orange’. Mongolic speakers left the lexical imprint of a robust inventory of terms describing horses and cows, e.g. mɔl ‘horse’. Jejueo borrowed grammatical morphemes from the Tungusic language Manchurian, e.g. the dative suffixal particle *de < ti ‘to’ (Kang, 2005).

¹ In a 2015 study O’Grady and Yang found that speakers of Korean from four provinces on the mainland had rates of 8-12% intelligibility for Jejueo based on a comprehension task of a one-minute recording of Jejueo connected speech.

2.2 Current status of Jejueo

The present situation in Jeju is one of language shift, where fewer than 10,000 people out of a population of 600,000 are fluent in Jejueo, and features of Jejueo’s lexicon, morphosyntax and phonology are rapidly assimilating to Korean (Kang, 2005; Saltzman, 2014). Recent surveys on language ideologies of Jejueo speakers (Kim, 2011; Kim, 2013) show that a roughly diglossic situation is maintained by present-day language ideologies. In a series of qualitative interviews on language ideologies, Kim (2013:33) finds common themes suggesting that Korean is used as a means of showing respect to unfamiliar interlocutors, as Korean “...is perceived as the language of distance and rationality”. Likewise, Jejueo is considered appropriate whenever interpersonal boundaries, such as distinctions within social hierarchies, are perceived as less salient than the intimacy and mutual trust two or more people share (Kim, 2013).

Yang’s (2013) pilot survey on language attitudes finds that while community members recognize Jejueo as a marker of Jeju identity worth transmitting to future generations, few speakers feel empowered to reverse the pattern of language shift to Korean. There are no longer any monolingual speakers of Jejueo on Jeju or in Osaka. The examples below are samples of the same declarative construction produced by a fluent Jejueo speaker in (1), a typical younger Jejueo semi-speaker in (2), and the Korean translation (3). Jejueo morphemes in (2) are in boldface.

(1) harmang-jʌŋ sontɕi-jʌŋ mik͈aŋ-ɯl tʰa-m-su-ta
    grandmother-CONJ grandchild-CONJ orange-ACC pick-PRS[PROG]-FO-DECL
    “The grandmother and grandchild are picking oranges.”

(2) harmang-koa sontɕa-oa kjul-ɯl t͈a-ko i-su-ta
    grandmother-CONJ grandchild-CONJ orange-ACC pick-PROG-EXIST[PRS]-FO.DECL
    “The grandmother and grandchild are picking oranges.”

(3) harmʌni-oa sontɕa-oa kjul-ɯl t͈a-ko is͈-ʌjo
    grandmother-CONJ grandchild-CONJ orange-ACC pick-PROG EXIST[PRS]-FO.DECL
    “The grandmother and grandchild are picking oranges.”

While examples (1) and (3) have several cognate forms, the majority of grammatical particles are genetically unrelated. The accusative particle -ɯl is shared by Korean and Jejueo, although in Jejueo the nominative and accusative markers are most commonly dropped. In example (2) the Jejueo morphemes have been replaced by Korean morphemes, save ‘grandmother’ and the verbal ending, a pattern typical of non-fluent speakers of Jejueo (Saltzman, 2014).

3 Jejueo lexicography and sustainability

Because most Korean linguists view Jejueo as a conservative dialect of Korean (Sohn, 1999; Song, 2012), lexical documentation of Jejueo has not been a scientific priority. The few Jejueo lexicographic projects have been [...]

[...] publication of lexicographic materials may even help indigenous languages be perceived as ‘real languages’ in the sociolinguistic marketplace. The lexicographic materials alone, however, do not engender sufficient motivation for a speech community to maintain the use of their heritage language. Fishman (1991) warns against dictionary projects that become ‘monuments’ to a language rather than stimulating language use and intergenerational transmission.

A recent study by O’Grady (2015) found that the level of Jejueo transmission between generations shows a drastic decline. Given the task of answering content questions based on a one-minute recording of Jejueo connected speech, heritage speakers in the 50-60 age bracket demonstrated a comprehension level of 89%, while heritage speakers between 20 and 29 showed just 12% comprehension, equal to that of citizens