Directl: a Language Independent Approach to Transliteration

Total Page:16

File Type:pdf, Size:1020Kb

Directl: a Language Independent Approach to Transliteration DIRECTL: a Language-Independent Approach to Transliteration Sittichai Jiampojamarn, Aditya Bhargava, Qing Dou, Kenneth Dwyer, Grzegorz Kondrak Department of Computing Science University of Alberta Edmonton, AB, T6G 2E8, Canada {sj,abhargava,qdou,dwyer,kondrak}@cs.ualberta.ca Abstract results of DIRECTL with its variants that incor- porate language-specific pre-processing, phonetic We present DIRECTL: an online discrimi- alignment, and manual data correction. native sequence prediction model that em- 2 Transliteration alignment ploys a many-to-many alignment between target and source. Our system incorpo- In the transliteration task, training data consist of rates input segmentation, target charac- word pairs that map source language words to ter prediction, and sequence modeling in words in the target language. The matching be- a unified dynamic programming frame- tween character substrings in the source word and work. Experimental results suggest that target word is not explicitly provided. These hid- DIRECTL is able to independently dis- den relationships are generally known as align- cover many of the language-specific reg- ments. In this section, we describe an EM-based ularities in the training data. many-to-many alignment algorithm employed by DIRECTL. In Section 4, we discuss an alternative 1 Introduction phonetic alignment method. In the transliteration task, it seems intuitively im- We apply an unsupervised many-to-many align- portant to take into consideration the specifics of ment algorithm (Jiampojamarn et al., 2007) to the the languages in question. Of particular impor- transliteration task. The algorithm follows the ex- tance is the relative character length of the source pectation maximization (EM) paradigm. In the and target names, which vary widely depending on expectation step shown in Algorithm 1, partial counts γ of the possible substring alignments are whether languages employ alphabetic, syllabic, or T V ideographic scripts. On the other hand, faced with collected from each word pair (x ,y ) in the the reality of thousands of potential language pairs training data; T and V represent the lengths of that involve transliteration, the idea of a language- words x and y, respectively. The forward prob- independent approach is highly attractive. ability α is estimated by summing the probabili- ties of all possible sequences of substring pairings In this paper, we present DIRECTL: a translit- eration system that, in principle, can be applied to from left to right. The FORWARD-M2M procedure is similar to lines 5 through 12 of Algorithm 1, ex- any language pair. DIRECTL treats the transliter- ation task as a sequence prediction problem: given cept that it uses Equation 1 on line 8, Equation 2 on line 12, and initializes α . Likewise, the an input sequence of characters in the source lan- 0,0 := 1 backward probability β is estimated by summing guage, it produces the most likely sequence of the probabilities from right to left. characters in the target language. In Section 2, we discuss the alignment of character substrings α += δ(xt ,ǫ)α (1) in the source and target languages. Our transcrip- t,v t−i+1 t−i,v tion model, described in Section 3, is based on t v α += δ(x ,y )α − − (2) an online discriminative training algorithm that t,v t−i+1 v−j+1 t i,v j makes it possible to efficiently learn the weights The maxX and maxY variables specify the of a large number of features. In Section 4, we maximum length of substrings that are permitted provide details of alternative approaches that in- when creating alignments. Also, for flexibility, we corporate language-specific information. Finally, allow a substring in the source word to be aligned in Section 5 and 6, we compare the experimental with a “null” letter (ǫ) in the target word. 28 Proceedings of the 2009 Named Entities Workshop, ACL-IJCNLP 2009, pages 28–31, Suntec, Singapore, 7 August 2009. c 2009 ACL and AFNLP Algorithm 1: Expectation-M2M alignment Algorithm 2: Online discriminative training T V Input: x ,y ,maxX,maxY,γ Input: Data {(x1, y1), (x2, y2),..., (xm, ym)}, Output: γ number of iterations k, size of n-best list n T V Output: Learned weights ψ 1 α := FORWARD-M2M (x ,y ,maxX,maxY ) T V ~ 2 β := BACKWARD-M2M (x ,y ,maxX,maxY ) 1 ψ := 0 3 if (αT,V = 0) then 2 for k iterations do 4 return 3 for j = 1 ...m do 5 for t = 0 ...T , v = 0 ...V do 4 Yˆj = {yˆj1,..., yˆjn} = arg maxy[ψ · Φ(xj , y)] 6 if (t> 0) then 5 update ψ according to Yˆj and yj 7 for i = 1 ...maxX st t − i ≥ 0 do t 6 return ψ t αt i,v δ(xt i+1,ǫ)βt,v − 8 γ(xt i+1, ǫ)+= α − − T,V 9 if (v > 0 ∧ t> 0) then 10 for i = 1 ...maxX st t − i ≥ 0 do rameters ψ. The values of k and n are deter- 11 for j = 1 ...maxY st v − j ≥ 0 do t v mined using a development set. The model param- t v αt i,v j δ(xt i+1,yv j+1)βt,v − − 12 γ(xt i+1,yv j+1)+= α− − − − T,V eters are updated according to the correct output yj and the predicted n-best outputs Yˆj, to make the model prefer the correct output over the in- In the maximization step, we normalize the par- correct ones. Specifically, the feature weight vec- tial counts γ to the alignment probability δ using tor ψ is updated by using MIRA, the Margin In- the conditional probability distribution. The EM fused Relaxed Algorithm (Crammer and Singer, steps are repeated until the alignment probability 2003). MIRA modifies the current weight vector δ converges. Finally, the most likely alignment for ψo by finding the smallest changes such that the each word pair in the training data is computed new weight vector ψn separates the correct and in- with the standard Viterbi algorithm. correct outputs by a margin of at least ℓ(y, yˆ), the loss for a wrong prediction. We define this loss to 3 Discriminative training be 0 if yˆ = y; otherwise it is 1+ d, where d is We adapt the online discriminative training frame- the Levenshtein distance between y and yˆ. The work described in (Jiampojamarn et al., 2008) to update operation is stated as a quadratic program- the transliteration task. Once the training data has ming problem in Equation 3. We utilize a function been aligned, we can hypothesize that the ith let- from the SVMlight package (Joachims, 1999) to ter substring xi ∈ x in a source language word solve this optimization problem. is transliterated into the ith substring y ∈ y in i min k ψ − ψ k the target language word. Each word pair is rep- ψn n o subject to ∀yˆ ∈ Yˆ : (3) resented as a feature vector Φ(x, y). Our feature ψ x, y x, y ℓ y, y vector consists of (1) n-gram context features, (2) n · (Φ( ) − Φ( ˆ)) ≥ ( ˆ) HMM-like transition features, and (3) linear-chain The arg max operation is performed by an exact features. The n-gram context features relate the search algorithm based on a phrasal decoder (Zens letter evidence that surrounds each letter xi to its and Ney, 2004). This decoder simultaneously output yi. We include all n-grams that fit within finds the l most likely substrings of letters x that a context window of size c. The c value is deter- generate the most probable output y, given the mined using a development set. The HMM-like feature weight vector ψ and the input word xT . transition features express the cohesion of the out- The search algorithm is based on the following dy- put y in the target language. We make a first order namic programming recurrence: Markov assumption, so that these features are bi- Q(0, $) = 0 grams of the form (yi−1,yi). The linear-chain fea- Q(t,p) = max{ψ · φ(xt ,p′,p)+ Q(t′,p′)} tures are identical to the context features, except t′+1 p′,p, that yi is replaced with a bi-gram (yi−1,yi). t−maxX≤t′<t Algorithm 2 trains a linear model in this fea- Q(T +1, $) = max{ψ · φ($,p′, $) + Q(T,p′)} ture space. The procedure makes k passes over p′ the aligned training data. During each iteration, To find the n-best predicted outputs, the table the model produces the n most likely output words Q records the top n scores for each output sub- Yˆj in the target language for each input word xj string that has the suffix p substring and is gen- t ′ in the source language, based on the current pa- erated by the input letter substring x1; here, p is 29 a sub-output generated during the previous step. In Japanese, we replace each Katakana character The notation φ(xt ,p′,p) is a convenient way with one or two phonemes using standard tran- t′+1 to describe the components of our feature vector scription tables. For the Latin script, we simply Φ(x, y). The n-best predicted outputs Yˆ can be treat every letter as an IPA symbol (International discovered by backtracking from the end of the ta- Phonetic Association, 1999). The IPA contains a ble, which is denoted by Q(T + 1, $). subset of 26 letter symbols that tend to correspond to the usual phonetic value that the letter repre- 4 Beyond DIRECTL sents in the Latin script. The Chinese characters 4.1 Intermediate phonetic representation are first converted to Pinyin, which is then handled We experimented with converting the original Chi- in the same way as the Latin script.
Recommended publications
  • Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
    1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
    [Show full text]
  • Phonetics and Phonology in Russian Unstressed Vowel Reduction: a Study in Hyperarticulation
    Phonetics and Phonology in Russian Unstressed Vowel Reduction: A Study in Hyperarticulation Jonathan Barnes Boston University (Short title: Hyperarticulating Russian Unstressed Vowels) Jonathan Barnes Department of Modern Foreign Languages and Literatures Boston University 621 Commonwealth Avenue, Room 119 Boston, MA 02215 Tel: (617) 353-6222 Fax: (617) 353-4641 [email protected] Abstract: Unstressed vowel reduction figures centrally in recent literature on the phonetics-phonology interface, in part owing to the possibility of a causal relationship between a phonetic process, duration-dependent undershoot, and the phonological neutralizations observed in systems of unstressed vocalism. Of particular interest in this light has been Russian, traditionally described as exhibiting two distinct phonological reduction patterns, differing both in degree and distribution. This study uses hyperarticulation to investigate the relationship between phonetic duration and reduction in Russian, concluding that these two reduction patterns differ not in degree, but in the level of representation at which they apply. These results are shown to have important consequences not just for theories of vowel reduction, but for other problems in the phonetics-phonology interface as well, incomplete neutralization in particular. Introduction Unstressed vowel reduction has been a subject of intense interest in recent debate concerning the nature of the phonetics-phonology interface. This is the case at least in part due to the existence of two seemingly analogous processes bearing this name, one typically called phonetic, and the other phonological. Phonological unstressed vowel reduction is a phenomenon whereby a given language's full vowel inventory can be realized only in lexically stressed syllables, while in unstressed syllables some number of neutralizations of contrast take place, with the result that only a subset of the inventory is realized on the surface.
    [Show full text]
  • Galio Uma, Maƙerin Azurfa Maƙeri (M): Ni Ina Da Shekara Yanzu…
    LANGUAGE PROGRAM, 232 BAY STATE ROAD, BOSTON, MA 02215 [email protected] www.bu.edu/Africa/alp 617.353.3673 Interview 2: Galio Uma, Maƙerin Azurfa Maƙeri (M): Ni ina da shekara yanzu…, ina da vers …alal misali vers mille neuf cent quatre vingt et quatre (1984) gare mu parce que wajenmu babu hopital lokacin da anka haihe ni ban san shekaruna ba, gaskiya. Ihehehe! Makaranta, ban yi makaranta ba sabo da an sa ni makaranta sai daga baya wajenmu babana bai da hali, mammana bai da hali sai mun biya. Akwai nisa tsakaninmu da inda muna karatu. Kilometir kama sha biyar kullum ina tahiya da ƙahwa. To, ha na gaji wani lokaci mu hau jaki, wani lokaci mu hau raƙumi, to wani lokaci ma babu hali, babu karatu. Amman na yi shekara shidda ina karatu ko biyat. Amman ban ji daɗi ba da ban yi karatu ba. Sabo da shi ne daɗin duniya. Mm. To mu Buzaye ne muna yin buzanci. Ana ce muna Tuareg. Mais Tuareg ɗin ma ba guda ba ne. Mu artisans ne zan ce miki. Can ainahinmu ma ba mu da wani aiki sai ƙira. Duka mutanena suna can Abala. Nan ga ni dai na nan ga. Ka gani ina da mata, ina da yaro. Akwai babana, akwai mammana. Ina da ƙanne huɗu, macce biyu da namiji biyu. Suna duka cikin gida. Duka kuma kama ni ne chef de famille. Duka ni ne ina aider ɗinsu. Sabo da babana yana da shekara saba’in da biyar. Vers. Mammana yana da shekara shi ma hamsin da biyar.
    [Show full text]
  • Writing As Aesthetic in Modern and Contemporary Japanese-Language Literature
    At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2021 Reading Committee: Edward Mack, Chair Davinder Bhowmik Zev Handel Jeffrey Todd Knight Program Authorized to Offer Degree: Asian Languages and Literature ©Copyright 2021 Christopher J Lowy University of Washington Abstract At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy Chair of the Supervisory Committee: Edward Mack Department of Asian Languages and Literature This dissertation examines the dynamic relationship between written language and literary fiction in modern and contemporary Japanese-language literature. I analyze how script and narration come together to function as a site of expression, and how they connect to questions of visuality, textuality, and materiality. Informed by work from the field of textual humanities, my project brings together new philological approaches to visual aspects of text in literature written in the Japanese script. Because research in English on the visual textuality of Japanese-language literature is scant, my work serves as a fundamental first-step in creating a new area of critical interest by establishing key terms and a general theoretical framework from which to approach the topic. Chapter One establishes the scope of my project and the vocabulary necessary for an analysis of script relative to narrative content; Chapter Two looks at one author’s relationship with written language; and Chapters Three and Four apply the concepts explored in Chapter One to a variety of modern and contemporary literary texts where script plays a central role.
    [Show full text]
  • Japanese Language and Culture
    NIHONGO History of Japanese Language Many linguistic experts have found that there is no specific evidence linking Japanese to a single family of language. The most prominent theory says that it stems from the Altaic family(Korean, Mongolian, Tungusic, Turkish) The transition from old Japanese to Modern Japanese took place from about the 12th century to the 16th century. Sentence Structure Japanese: Tanaka-san ga piza o tabemasu. (Subject) (Object) (Verb) 田中さんが ピザを 食べます。 English: Mr. Tanaka eats a pizza. (Subject) (Verb) (Object) Where is the subject? I go to Tokyo. Japanese translation: (私が)東京に行きます。 [Watashi ga] Toukyou ni ikimasu. (Lit. Going to Tokyo.) “I” or “We” are often omitted. Hiragana, Katakana & Kanji Three types of characters are used in Japanese: Hiragana, Katakana & Kanji(Chinese characters). Mr. Tanaka goes to Canada: 田中さんはカナダに行きます [kanji][hiragana][kataka na][hiragana][kanji] [hiragana]b Two Speech Styles Distal-Style: Semi-Polite style, can be used to anyone other than family members/close friends. Direct-Style: Casual & blunt, can be used among family members and friends. In-Group/Out-Group Semi-Polite Style for Out-Group/Strangers I/We Direct-Style for Me/Us Polite Expressions Distal-Style: 1. Regular Speech 2. Ikimasu(he/I go) Honorific Speech 3. Irasshaimasu(he goes) Humble Speech Mairimasu(I/We go) Siblings: Age Matters Older Brother & Older Sister Ani & Ane 兄 と 姉 Younger Brother & Younger Sister Otooto & Imooto 弟 と 妹 My Family/Your Family My father: chichi父 Your father: otoosan My mother: haha母 お父さん My older brother: ani Your mother: okaasan お母さん Your older brother: oniisanお兄 兄 さん My older sister: ane姉 Your older sister: oneesan My younger brother: お姉さ otooto弟 ん Your younger brother: My younger sister: otootosan弟さん imooto妹 Your younger sister: imootosan 妹さん Boy Speech & Girl Speech blunt polite I/Me = watashi, boku, ore, I/Me = watashi, washi watakushi I am going = Boku iku.僕行 I am going = Watashi iku く。 wa.
    [Show full text]
  • Directions Active Employees (At 31.12
    Publishing details At a glance Sales and net income for the year in € m Level of internationality Visitors: 47.9 % Exhibitors: 73.4 % 500 50 400 40 Editors-in-chief Print production 300 30 Dominique Ewert Messe Frankfurt Medien Klaus Münster-Müller und Service GmbH 200 20 Publishing Services Annual Report 2014 100 10 Editors 2010 2011 2012 2013 2014 Markus Quint (production editor) Print l from Germany attending Messe Frankfurt events at the Frankfurt venue l Sales l Net income for the year l from outside Germany attending Messe Frankfurt events at the Frankfurt venue Antje Breuer-Seifi Druckhaus Becker GmbH Claudia Lehning-Berge Dieselstraße 9 2014 The Messe Frankfurt corporate group conceives, plans and hosts trade fairs and exhibitions in Germany and abroad. Sarah Stanzel 64372 Ober-Ramstadt Messe Frankfurt The parent company and its subsidiaries offer a well-coordinated service package for national and international Gabriele Wehrl Germany Annual Report customers, exhibitors and visitors. Responsibility for content in accordance Paper Corporate group 2 with the German press laws Cover: Hello Fat Matt 1.1 350 g/m in € m* 2010 2011 2012 2013 2014 Iris Jeglitza-Moshage Inside pages: Arctic the Volume 150 g/m2 Sales 448 467 537 545 554 Photographs Print run Personnel expenses 102 106 120 123 131 Pietro Sutera Photography (p. 3) 3,000 in two editions Depreciation, amortisation and write-downs 59 59 61 56 52 Rüdiger Nehmzow (p. 6–11, 36–39) (German and English) Earnings before taxes on income 42 34 36 49 47 Iwan Baan (p. 19) EBITDA 109 99 102 108 102 BRCK (p.
    [Show full text]
  • Seize the Ъ: Linguistic and Social Change in Russian Orthographic Reform Eugenia Sokolskaya
    Seize the Ъ: Linguistic and Social Change in Russian Orthographic Reform Eugenia Sokolskaya It would be convenient for linguists and language students if the written form of any language were a neutral and direct representation of the sounds emitted during speech. Instead, writing systems tend to lag behind linguistic change, retaining old spellings or morphological features instead of faithfully representing a language's phonetics. For better or worse, sometimes the writing can even cause a feedback loop, causing speakers to hypercorrect in imitation of an archaic spelling. The written and spoken forms of a language exist in a complex and fluid relationship, influenced to a large extent by history, accident, misconception, and even politics.1 While in many language communities these two forms – if writing exists – are allowed to develop and influence each other in relative freedom, with some informal commentary, in some cases a willful political leader or group may step in to attempt to intentionally reform the writing system. Such reforms are a risky venture, liable to anger proponents of historic accuracy and adherence to tradition, as well as to render most if not all of the population temporarily illiterate. With such high stakes, it is no wonder that orthographic reforms are not often attempted in the course of a language's history. The Russian language has undergone two sharply defined orthographic reforms, formulated as official government policy. The first, which we will from here on call the Petrine reform, was initiated in 1710 by Peter the Great. It defined a print alphabet for secular use, distancing the writing from the Church with its South Slavic lithurgical language.
    [Show full text]
  • Introduction to Japanese Computational Linguistics Francis Bond and Timothy Baldwin
    1 Introduction to Japanese Computational Linguistics Francis Bond and Timothy Baldwin The purpose of this chapter is to provide a brief introduction to the Japanese language, and natural language processing (NLP) research on Japanese. For a more complete but accessible description of the Japanese language, we refer the reader to Shibatani (1990), Backhouse (1993), Tsujimura (2006), Yamaguchi (2007), and Iwasaki (2013). 1 A Basic Introduction to the Japanese Language Japanese is the official language of Japan, and belongs to the Japanese language family (Gordon, Jr., 2005).1 The first-language speaker pop- ulation of Japanese is around 120 million, based almost exclusively in Japan. The official version of Japanese, e.g. used in official settings andby the media, is called hyōjuNgo “standard language”, but Japanese also has a large number of distinctive regional dialects. Other than lexical distinctions, common features distinguishing Japanese dialects are case markers, discourse connectives and verb endings (Kokuritsu Kokugo Kenkyujyo, 1989–2006). 1There are a number of other languages in the Japanese language family of Ryukyuan type, spoken in the islands of Okinawa. Other languages native to Japan are Ainu (an isolated language spoken in northern Japan, and now almost extinct: Shibatani (1990)) and Japanese Sign Language. Readings in Japanese Natural Language Processing. Francis Bond, Timothy Baldwin, Kentaro Inui, Shun Ishizaki, Hiroshi Nakagawa and Akira Shimazu (eds.). Copyright © 2016, CSLI Publications. 1 Preview 2 / Francis Bond and Timothy Baldwin 2 The Sound System Japanese has a relatively simple sound system, made up of 5 vowel phonemes (/a/,2 /i/, /u/, /e/ and /o/), 9 unvoiced consonant phonemes (/k/, /s/,3 /t/,4 /n/, /h/,5 /m/, /j/, /ó/ and /w/), 4 voiced conso- nants (/g/, /z/,6 /d/ 7 and /b/), and one semi-voiced consonant (/p/).
    [Show full text]
  • Zive Naj Vsi Narodi K I Hrepene Docakat Dan D E Koder Sonce Hodi P Repir I Z Sveta Bo Pregnan D E Rojak P Rost Bo Vsak N E Vrag
    12. letno srečanje Združenja za slovansko jezikoslovje Povzetki prispevkov 12th Slavic Linguistics Society Annual Meeting Book of Abstracts XII ежегодная конференция Общества славянского языкознания Тезисы докладов Ljubljana, 21.–24. 9. 2017 v / / Zive naj vsi narodi / / / K i hrepene docakatv dan , / / / D e koder sonce hodi , / / P repir/ iz sveta bo/ pregnan , / D e rojak / P rost bo vsak , / / / N e vrag, le sosed bo mejak. 12. letno srečanje Združenja za slovansko jezikoslovje: Povzetki prispevkov 12th Slavic Linguistics Society Annual Meeting: Book of Abstracts XII ежегодная конференция Общества славянского языкознания: Тезисы докладов ISBN: 978-961-05-0027-8 Urednika / Editors / Редакторы: Luka Repanšek, Matej Šekli Recenzenti / Peer-reviewers / Рецензенты: Aleksandra Derganc, Marko Hladnik, Gašper Ilc, Zenaida Karavdić, Simona Kranjc, Domen Krvina, Nina Ledinek, Frančiška Lipovšek, Franc Marušič, Tatjana Marvin, Petra Mišmaš, Matic Pavlič, Анастасия Ильинична Плотникова / Anastasiia Plotnikova, Luka Repanšek, Михаил Николаевич Саенко / Mikhail Saenko, Дмитрий Владимирович Сичинава / Dmitri Sitchinava, Vera Smole, Mojca Smolej, Marko Snoj, Petra Stankovska, Andrej Stopar, Saška Štumberger, Hotimir Tivadar, Mitja Trojar, Mladen Uhlik, Mojca Žagar Karer, Rok Žaucer, Andreja Žele, Sašo Živanović Vodja konference / Conference leadership / Руководитель конференции: Matej Šekli Organizacijski odbor konference / Conference board / Оргкомитет конференции: Branka Kalenić Ramšak, Domen Krvina, Alenka Lap, Oto Luthar, Tatjana Marvin, Mojca
    [Show full text]
  • Title: Exploring Japanese Language: from East Asia to Eurasia
    Eurasia Foundation International Lectures, Fall 2020 Semester “The Construction and Transformation of East Asiaology” Lecture Series (12) Title: Exploring Japanese Language: From East Asia to Eurasia For the 12th Eurasia Foundation International Lectures, we invite Professor Shun-Yi Chen, from the Center for Japanese Studies at Chinese Culture University, to share his research results on exploring Japanese language. Professor Chen’s speech consist of two parts: the first part explores the pronunciation of Chinese characters in Japanese (Japanese Kanji) and its relations with the composition of words and the second part discusses the similarity between Japanese and Hebrew. Professor Chen indicates that the pronunciation of Japanese Kanji is different depending on when they were introduced to Japan. There are different types including Go-on (呉音, “sounds from the Wu region”), Kan-on (漢音, “Han sound”), and Tō-on (唐音, “Tang sound”). Kan-on refers to sounds of Kanji from Chang’an in Tang dynasty which were introduced with Japanese missions to Tang and Sui China. Go-on refers to sounds of Kanji from Southern China which were introduced via the Korean Peninsula in the 5th and 6th century. Tō-on refers to sounds of Kanji introduced after Song dynasty, while So-on were introduced in Yuan, Ming Dynasty. Most Tō-on and So-on were Buddhist terms. Here are some examples of different sounds: 行政 Gyōsei (ぎょうせい) is Go-on; 銀行 Ginkō (ぎんこう) is Kan-on. Professor Chen also points out that since the pronunciation of Kanji were introduced from China, they must follow the rule of the pronunciation in Chinese.
    [Show full text]
  • Abin Yi Idan Kana Da COVID-19
    Abin yi idan kana da COVID-19 KOYI YADDA ZAKA KULA DA KANKA DA KUMA WADANSU A GIDA. Menene Alamomin COVID-19? • Akwai alamomi iri-iri masu yawa, fara daga masu sauki zuwa masu tsanani. Wadansu mutane ba su da wadansu alamomin cutar. • Mafi yawan alamomi sun haɗa da zazzaɓi ko sanyi, tari, gajeruwar numfashi ko wahalar numfashi, gajiya, ciwon jijiyoyi ko jiki, ciwon kai, rashin dandano ko jin wari, ciwon makogwaro ko yoyon hanci, tashin zuciya ko amai, da gudawa. Wanene ke da Hadari don Me ya kamata in yi idan ina da alamomin COVID-19? Tsananin ciwo daga COVID-19? • Tsaya a gida! Kada ka bar gida sai dai don • Tsakanin manya, haɗarin yin gwaji don COVID-19 da sauran muhimman mummunan rashin lafiya yana abubuwan kula da lafiya ko bukatun ƙaruwa tare da shekaru, tsofaffi yau-da-kullum, kamar su kayan masarufi, na cikin haɗari mafi girma. idan wani ba zai iya samar maka su ba. Kada ka je wurin aiki, ko da kuwa kana • Mutane daga wadansu launin ma’aikaci mai muhimmanci. fatar da kabilu (hade da Bakake, ’yan Latin Amurka da ’Yan asali) • Yi magana da mai baka kulawar lafiya! saboda tsarin lafiya da rashin Yi amfani da tarho ko telemedicine idan zai yiwu. daidaiton zamantakewa. • Yi gwaji! Idan mai baka kulawa baya bayar • Mutanen na kowane shekaru yin gwaji, ziyarci nyc.gov/covidtest ko kira waɗanda suke da yanayin 311 don neman wurin gwaji kusa da kai. boyayyen rashin lafiya, kamar: Wurare dayawa Na bayar da gwajin kyauta. Ciwon daji • Kira 911 a yanayin gaggawa! Idan kana Daɗɗaɗen Ciwon koda da matsalar numfashi, zafi ko matsin lamba Daɗɗaɗen ciwon huhu a kirjinka, ka rikice ko baka iya zama farke, Ciwon mantuwa da sauran kana da leɓuna masu ruwan bula ko fuska, cututtukan jijiyoyin jiki ko wadansu yanayin gaggawa, jeka asibiti Ciwon suga ko kira 911 nan take.
    [Show full text]
  • How to Deal with the Elements of Russian Origin
    研究論文 How to Deal with the Elements of Russian Origin: Developments of Orthographic Reforms for the Kyrgyz Language1) ロシア語的要素をどのように扱うか ― キルギス語正書法改革の展開 ― 小田桐 奈 美 Nami Odagiri 本研究は、ロシア語的要素(ロシア語起源の借用語、キリル文字、ロシア語起源の音)に 着目し、ソ連時代およびソ連崩壊以降のキルギス語の正書法改革において、このロシア語的 要素が「外来要素」として認識され排除されてきたのか、それとも何らかの形で受容されて きたのかという問題を検討するものである。本研究によって、特に独立以降のキルギス語の 正書法は、単純にロシア語的要素を排除するわけでも積極的に受け入れるわけでもなく、非 常に複雑に入り組んだものであり、かつ曖昧さを包含するものであったことが明らかになっ た。 キーワード キルギス語、ロシア語、正書法改革、旧ソ連地域 1 . Introduction After the dissolution of the Soviet Union, the titular languages of each ex-Soviet state were promoted as the “state language” 2) and positioned as symbols of national integration. On the basis of the results from previous studies regarding this topic (e.g., Landau and Kellner- Heinkele 2001; Pavlenko 2008), the author emphasizes the general tendency of each states’ language influence to grow, while the role of the Russian language, which enjoyed the highest prestige during the Soviet era, is shrinking, although its influence has not been completely excluded 3). However, many previous studies are general or comparative involving two or more states of the ex-Soviet region, and details of the relational dynamics between the state and Russian language in each state have not yet been entirely explored. Moreover, as previous studies were 69 外国語学部紀要 第 12 号(2015 年 3 月) mainly concerned with the legal and social status of language, such linguistic issues as ortho- graphic and alphabet reforms in the post-Soviet era have not been entirely discussed4). This study, therefore, focuses on the elements of Russian origin in the Kyrgyz language (e.g., the Cyrillic alphabet, loanwords, and sounds of Russian origin). Additionally, it addresses issues concerning orthographic reforms for the Kyrgyz language.
    [Show full text]