Back Transliteration from Japanese to English Using Target English Context

Total Page:16

File Type:pdf, Size:1020Kb

Back Transliteration from Japanese to English Using Target English Context Back Transliteration from Japanese to English Using Target English Context Isao Goto†, Naoto Kato††, Terumasa Ehara†††, and Hideki Tanaka† †NHK Science and Technical ††ATR Spoken Language Trans- †††Tokyo University of Research Laboratories lation Research Laboratories Science, Suwa 1-11-10 Kinuta, Setagaya, 2-2-2 Hikaridai, Keihanna 5000-1, Toyohira, Chino, Tokyo, 157-8510, Japan Science City, Kyoto, 619-0288, Nagano, 391-0292, Japan [email protected] Japan [email protected]. [email protected] [email protected] ac.jp cross-language information retrieval if these Abstract words could be automatically restored to the original English words. This paper proposes a method of automatic Back transliteration is the process of restoring back transliteration of proper nouns, in which transliterated words to the original English words. a Japanese transliterated-word is restored to Here is a problem of back transliteration. the original English word. The English words are created from a sequence of letters; thus [Back transliteration] our method can create new English words that ? クラッチフィールド are not registered in dictionaries or English (English word) (ku ra cchi fi – ru do) word lists. When a katakana character is con- verted into English letters, there are various There are many ambiguities to restoring a candidates of alphabetic characters. To ensure transliterated katakana word to its original Eng- adequate conversion, the proposed method lish word. For example, should "a" in "ku ra cchi uses a target English context to calculate the fi – ru do" be converted into the English letter of probability of an English character or string "a" or "u" or some other letter or string? Trying corresponding to a Japanese katakana charac- to resolve the ambiguity is a difficult problem, ter or string. We confirmed the effectiveness which means that back transliteration to the cor- of using the target English context by an ex- rect English word is also difficult. periment of personal-name back translitera- Using the pronunciation of a dictionary or lim- tion. iting output English words to a particular English word list prepared in advance can simplify the problem of back transliteration. However, these 1 Introduction methods cannot produce a new English word that In transliteration, a word in one language is con- is not registered in a dictionary or an English verted into a character string of another language word list. Transliterated words are mainly proper expressing how it is pronounced. In the case of nouns and technical terms, and such words are transliteration into Japanese, special characters often not registered. Thus, a back transliteration called katakana are used to show how a word is framework for creating new words would be very pronounced. For example, a personal name and useful. its transliterated word are shown below. A number of back transliteration methods for selecting English words from an English pronun- [Transliteration] ciation dictionary have been proposed. They in- Cunningham カニンガム (ka ni n ga mu) clude Japanese-to-English (Knight and Graehl, 1998) 1 , Arabic-to-English (Stalls and Knight, Here, the italic alphabets are romanized Japanese katakana characters. New transliterated words such as personal names or technical terms in katakana are not al- 1 Their English letter-to-sound WFST does not convert Eng- ways listed in dictionaries. It would be useful for lish words that are not registered in a pronunciation diction- ary. 1998), and Korean-to-English (Lin and Chen, the English letter "u." The correspondence of an 2002). English letter or string to a katakana character or There are also methods that select English string varies depending on the surrounding char- words from an English word list, e.g., Japanese- acters, i.e., on its English context. to-English (Fujii and Ishikawa, 2001) and Chi- Thus, our back transliteration method uses the nese-to-English (Chen et al., 1998). target English context to calculate the probability Moreover, there are back transliteration meth- of English letters corresponding to a katakana ods capable of generating new words, there are character or string. some methods for back transliteration from Ko- rean to English (Jeong et al., 1999; Kang and 2.2 Notation and conversion-candidate Choi, 2000). lattice These previous works did not take the target We formulate the word conversion process as a English context into account for calculating the unit conversion process for treating new words. plausibility of matching target characters with the Here, the unit is one or more characters that form source characters. a part of characters of the word. This paper presents a method of taking the tar- A katakana word, K, is expressed by equation get English context into account to generate an 2.1 with "^" and "$" added to its start and end, English word from a Japanese katakana word. respectively. Our character-based method can produce new K ==kkkkm+1 ... (2.1) English words that are not listed in the learning 0011m+ k = ^ k = $ (2.2) corpus. 0 , m+1 This paper is organized as follows. Section 2 where k j is the j-th character in the katakana describes our method. Section 3 describes the word, and m is the number of characters except experimental set-up and results. Section 4 dis- for "^" and "$" and k m+1 is a character string cusses the performance of our method based on 0 the experimental results. Section 5 concludes our from k0 to km+1 . research. We use katakana units constructed of one or more katakana characters. We denote a katakana 2 Proposed Method unit as ku. For any ku, many English units, eu, could be corresponded as conversion-candidates. The ku's and eu's are generated using a learning 2.1 Advantage of using English context corpus in which bilingual words are separated First we explain the difficulty of back translitera- into units and every ku unit is related an eu unit. tion without a pronunciation dictionary. Next, we L{}E denotes the lattice of all eu's correspond- clarify the reason for the difficulty. Finally, we ing to ku's covering a Japanese word. Every eu is clarify the effect using English context in back a node of the lattice and each node is connected transliteration. with next nodes. L{}E has a lattice structure start- In back transliteration, an English letter or ing from "^" and ending at "$." Figure 1 shows an string is chosen to correspond to a katakana char- example of L{}E corresponding to a katakana acter or string. However, this decision is difficult. For example, there are cases that an English letter word "キルシュシュタイン(ki ru shu shu ta i "u" corresponds to "a" of katakana, and there are n)." In the figure, each circle represents one eu. cases that the same English letter "u" does not A character string linking individual character correspond to the same "a" of katakana. "u" in units in the paths ppppdq∈(12 , ,.., ) between "^" Cunningham corresponds to "a" in katakana and and "$" in L{}E becomes a conversion candidate, "u" in Bush does not correspond to "a" in kata- where q is the number of paths between "^" and kana. It is difficult to resolve this ambiguity "$" in L{}E . without the pronunciation registered in a diction- We get English word candidates by joining eu's ary. from "^" to "$" in L{}E . We select a certain path, The difference in correspondence mainly comes from the difference of the letters around pd, in L{}E . The number of character units キ(ki) ル(ru) シュ(shu) シュ(shu) タ(ta) イ(i) ン(n) ^ cchi l ch ch ta e m $ chi ld che che tad hi mon ci le chou chou tag i mp cki les chu chu te ji n cy lew s s ter y ne k ll sc sc tha yeh ng ke lle sch sch ti yi・・・ngh khi llu schu schu to nin ki lou sh sh tta nn kie lu she she tu・・・ nne kii r shu shu nt ky rc su su nw・・・ タイ qui・・・rd sy sy (tai) re sz・・・sz・・・ t rg taj roo tay rou tey rr ti rre tie rt ty lu・・・ tye・・・ Figure 1: Example of lattice L{}E of conversion candidates units. ˆ except for "^" and "$" in pd is expressed as np()d . E =EKarg maxP ( | ) . (2.6) The character units in p are numbered from start E d ˆ to end. Here, E represents an output result. The English word, E, resulting from the con- To use the English context for calculating the matching of an English unit with a katakana unit, version of a katakana word, K, for pd is expressed as follows: the above equation is transformed into Equation m+1 2.7 by using Bayes’ theorem. K ==kkkk0011.. m+ Eˆ = arg maxPP (EKE ) ( | ) (2.7) np()1d + = ku = ku ku.. ku , (2.3) E 001()1npd + lp()1+ Equation 2.7 contains a translation model in E ==eeeed .. 001()1lpd + which an English word is a condition and kata- = eunp()1d + = eu eu.. eu , (2.4) 001()1npd + kana is a result. The word in the translation model P(|)K E in ke00==ku 0 = eu 0 =^ , Equation 2.7 is broken down into character units ke==ku = eu =$ (2.5) mlpnpnp++1 (ddd )1 ( )1 + ( )1 +, by using equations 2.3 and 2.4. where ej is the j-th character in the English word. EEˆ = arg maxP ( ) E lp()d is the number of characters except for "^" np()1dd++ np ()1 np()1d + × P(,Kku00 , eu | E ) and "$" in the English word. eu0 for each pd ∑∑ eunp()1()1dd++ ku np in L{}E in equation 2.4 becomes the candidate 00 = arg maxP (E ) np()1d + English word.
Recommended publications
  • 7$0,/³(1*/,6+ ',&7,21$5< COMMON SPOKEN TAMIL MADE EASY
    7$0,/³(1*/,6+ ',&7,21$5< )RU8VH:LWK &20021632.(17$0,/ 0$'(($6< 7KLUG(GLWLRQ E\ 79$',.(6$9$/8 'LJLWDO9HUVLRQ &+5,67,$10(',&$/&2//(*(9(//25( Adi’s Book, Tamil-English Dictionary. KEY adv. adverb n. noun pron. pronoun v. verb nom. takes nominatve case dat. takes dative case adj. adjective acc. takes accusative case d.b. taked declensional base intr. intransitive tr. transitive neut. neuter imp. impersonal pers. personal incl inclusive excl. exclusive m. masculine f. feminine sing. singular pl. plural w. weak s. strong TAMIL ENGLISH TAMIL ENGLISH KX ZQ become, happen DNN sister (older) KX ZQ happen, become DNNXº arm pit DE\VDP exercise (n.) DºD VQGK measure DE\VDPVHL ZGK exercise (v.) alamaram banyan tree adangu (w.in) obey DºDYXHGX VWKWK measurements (take m.) adangu (w.in) subside alladhu or adhan its P yes DGKDQOHKDL\OH therefore DPYVH no moon DGK same ambadhu fifty DGKLKUDP authority DPP madam DGKLNODLOH early morning DPPWK\U mother adhisayam wonder (n.) ammi grindstone adhu it ¼ male adhu (pron. neut.) that Q but DGKXNNXWKKXQGKSÀOD accordingly anai dam adhunga their (neut.) DQVLDQFKL pine apple adhunga they (things) DQEXQVDP love, kindness adhunga those (things)(are) QGDYDU Lord DGKXQJDºH them (things) andha (adj) that adi (s.thth) strike andha (pron.neut.) those adikkadi often DQKD many GX goat, sheep DQKDP almost, mostly aduppu oven, stove ange there aduththa next DQJHGKQ there only ahalam width anju five ahappadu (w.tt) caught, (to be) found D¼¼DQ brother (elder) DK\DYLPQDP aeroplane anumadhi permission (n.) K\DPYQXP sky anumadhi permit (n.) LSÀFKFKX FRO finished anumadhi kodu (s.thth) permit (v.) DLQQÌUX five hundred anuppu (w.in) send LSÀ ZQ exhaust, finish apdi that way DL\ sir DSGLGKQ that is it DL\À oh dear DSGL\ indeed?, is that so? DL\ÀSYDP poor soul! DSGL\" really?, right? DNN elder sister DSSWKDKDSSDQ father Adi’s Book, Tamil-English Dictionary.
    [Show full text]
  • Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
    1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
    [Show full text]
  • Handy Katakana Workbook.Pdf
    First Edition HANDY KATAKANA WORKBOOK An Introduction to Japanese Writing: KANA THIS IS A SUPPLEMENT FOR BEGINNING LEVEL JAPANESE LANGUAGE INSTRUCTION. \ FrF!' '---~---- , - Y. M. Shimazu, Ed.D. -----~---- TABLE OF CONTENTS Page Introduction vi ACKNOWLEDGEMENlS vii STUDYSHEET#l 1 A,I,U,E, 0, KA,I<I, KU,KE, KO, GA,GI,GU,GE,GO, N WORKSHEET #1 2 PRACTICE: A, I,U, E, 0, KA,KI, KU,KE, KO, GA,GI,GU, GE,GO, N WORKSHEET #2 3 MORE PRACTICE: A, I, U, E,0, KA,KI,KU, KE, KO, GA,GI,GU,GE,GO, N WORKSHEET #~3 4 ADDmONAL PRACTICE: A,I,U, E,0, KA,KI, KU,KE, KO, GA,GI,GU,GE,GO, N STUDYSHEET #2 5 SA,SHI,SU,SE, SO, ZA,JI,ZU,ZE,ZO, TA, CHI, TSU, TE,TO, DA, DE,DO WORI<SHEEI' #4 6 PRACTICE: SA,SHI,SU,SE, SO, ZA,II, ZU,ZE,ZO, TA, CHI, 'lSU,TE,TO, OA, DE,DO WORI<SHEEI' #5 7 MORE PRACTICE: SA,SHI,SU,SE,SO, ZA,II, ZU,ZE, W, TA, CHI, TSU, TE,TO, DA, DE,DO WORKSHEET #6 8 ADDmONAL PRACI'ICE: SA,SHI,SU,SE, SO, ZA,JI, ZU,ZE,ZO, TA, CHI,TSU,TE,TO, DA, DE,DO STUDYSHEET #3 9 NA,NI, NU,NE,NO, HA, HI,FU,HE, HO, BA, BI,BU,BE,BO, PA, PI,PU,PE,PO WORKSHEET #7 10 PRACTICE: NA,NI, NU, NE,NO, HA, HI,FU,HE,HO, BA,BI, BU,BE, BO, PA, PI,PU,PE,PO WORKSHEET #8 11 MORE PRACTICE: NA,NI, NU,NE,NO, HA,HI, FU,HE, HO, BA,BI,BU,BE, BO, PA,PI,PU,PE,PO WORKSHEET #9 12 ADDmONAL PRACTICE: NA,NI, NU, NE,NO, HA, HI, FU,HE, HO, BA,BI,3U, BE, BO, PA, PI,PU,PE,PO STUDYSHEET #4 13 MA, MI,MU, ME, MO, YA, W, YO WORKSHEET#10 14 PRACTICE: MA,MI, MU,ME, MO, YA, W, YO WORKSHEET #11 15 MORE PRACTICE: MA, MI,MU,ME,MO, YA, W, YO WORKSHEET #12 16 ADDmONAL PRACTICE: MA,MI,MU, ME, MO, YA, W, YO STUDYSHEET #5 17
    [Show full text]
  • Galio Uma, Maƙerin Azurfa Maƙeri (M): Ni Ina Da Shekara Yanzu…
    LANGUAGE PROGRAM, 232 BAY STATE ROAD, BOSTON, MA 02215 [email protected] www.bu.edu/Africa/alp 617.353.3673 Interview 2: Galio Uma, Maƙerin Azurfa Maƙeri (M): Ni ina da shekara yanzu…, ina da vers …alal misali vers mille neuf cent quatre vingt et quatre (1984) gare mu parce que wajenmu babu hopital lokacin da anka haihe ni ban san shekaruna ba, gaskiya. Ihehehe! Makaranta, ban yi makaranta ba sabo da an sa ni makaranta sai daga baya wajenmu babana bai da hali, mammana bai da hali sai mun biya. Akwai nisa tsakaninmu da inda muna karatu. Kilometir kama sha biyar kullum ina tahiya da ƙahwa. To, ha na gaji wani lokaci mu hau jaki, wani lokaci mu hau raƙumi, to wani lokaci ma babu hali, babu karatu. Amman na yi shekara shidda ina karatu ko biyat. Amman ban ji daɗi ba da ban yi karatu ba. Sabo da shi ne daɗin duniya. Mm. To mu Buzaye ne muna yin buzanci. Ana ce muna Tuareg. Mais Tuareg ɗin ma ba guda ba ne. Mu artisans ne zan ce miki. Can ainahinmu ma ba mu da wani aiki sai ƙira. Duka mutanena suna can Abala. Nan ga ni dai na nan ga. Ka gani ina da mata, ina da yaro. Akwai babana, akwai mammana. Ina da ƙanne huɗu, macce biyu da namiji biyu. Suna duka cikin gida. Duka kuma kama ni ne chef de famille. Duka ni ne ina aider ɗinsu. Sabo da babana yana da shekara saba’in da biyar. Vers. Mammana yana da shekara shi ma hamsin da biyar.
    [Show full text]
  • Writing As Aesthetic in Modern and Contemporary Japanese-Language Literature
    At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2021 Reading Committee: Edward Mack, Chair Davinder Bhowmik Zev Handel Jeffrey Todd Knight Program Authorized to Offer Degree: Asian Languages and Literature ©Copyright 2021 Christopher J Lowy University of Washington Abstract At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy Chair of the Supervisory Committee: Edward Mack Department of Asian Languages and Literature This dissertation examines the dynamic relationship between written language and literary fiction in modern and contemporary Japanese-language literature. I analyze how script and narration come together to function as a site of expression, and how they connect to questions of visuality, textuality, and materiality. Informed by work from the field of textual humanities, my project brings together new philological approaches to visual aspects of text in literature written in the Japanese script. Because research in English on the visual textuality of Japanese-language literature is scant, my work serves as a fundamental first-step in creating a new area of critical interest by establishing key terms and a general theoretical framework from which to approach the topic. Chapter One establishes the scope of my project and the vocabulary necessary for an analysis of script relative to narrative content; Chapter Two looks at one author’s relationship with written language; and Chapters Three and Four apply the concepts explored in Chapter One to a variety of modern and contemporary literary texts where script plays a central role.
    [Show full text]
  • Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding A
    Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding Adobe-Symbol-Encoding csHPPSMath Adobe-Zapf-Dingbats-Encoding csZapfDingbats Arabic ISO-8859-6, csISOLatinArabic, iso-ir-127, ECMA-114, ASMO-708 ASCII US-ASCII, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO646-US, us, IBM367, csASCI big-endian ISO-10646-UCS-2, BigEndian, 68k, PowerPC, Mac, Macintosh Big5 csBig5, cn-big5, x-x-big5 Big5Plus Big5+, csBig5Plus BMP ISO-10646-UCS-2, BMPstring CCSID-1027 csCCSID1027, IBM1027 CCSID-1047 csCCSID1047, IBM1047 CCSID-290 csCCSID290, CCSID290, IBM290 CCSID-300 csCCSID300, CCSID300, IBM300 CCSID-930 csCCSID930, CCSID930, IBM930 CCSID-935 csCCSID935, CCSID935, IBM935 CCSID-937 csCCSID937, CCSID937, IBM937 CCSID-939 csCCSID939, CCSID939, IBM939 CCSID-942 csCCSID942, CCSID942, IBM942 ChineseAutoDetect csChineseAutoDetect: Candidate encodings: GB2312, Big5, GB18030, UTF32:UTF8, UCS2, UTF32 EUC-H, csCNS11643EUC, EUC-TW, TW-EUC, H-EUC, CNS-11643-1992, EUC-H-1992, csCNS11643-1992-EUC, EUC-TW-1992, CNS-11643 TW-EUC-1992, H-EUC-1992 CNS-11643-1986 EUC-H-1986, csCNS11643_1986_EUC, EUC-TW-1986, TW-EUC-1986, H-EUC-1986 CP10000 csCP10000, windows-10000 CP10001 csCP10001, windows-10001 CP10002 csCP10002, windows-10002 CP10003 csCP10003, windows-10003 CP10004 csCP10004, windows-10004 CP10005 csCP10005, windows-10005 CP10006 csCP10006, windows-10006 CP10007 csCP10007, windows-10007 CP10008 csCP10008, windows-10008 CP10010 csCP10010, windows-10010 CP10017 csCP10017, windows-10017 CP10029 csCP10029, windows-10029 CP10079 csCP10079, windows-10079
    [Show full text]
  • Japanese Language and Culture
    NIHONGO History of Japanese Language Many linguistic experts have found that there is no specific evidence linking Japanese to a single family of language. The most prominent theory says that it stems from the Altaic family(Korean, Mongolian, Tungusic, Turkish) The transition from old Japanese to Modern Japanese took place from about the 12th century to the 16th century. Sentence Structure Japanese: Tanaka-san ga piza o tabemasu. (Subject) (Object) (Verb) 田中さんが ピザを 食べます。 English: Mr. Tanaka eats a pizza. (Subject) (Verb) (Object) Where is the subject? I go to Tokyo. Japanese translation: (私が)東京に行きます。 [Watashi ga] Toukyou ni ikimasu. (Lit. Going to Tokyo.) “I” or “We” are often omitted. Hiragana, Katakana & Kanji Three types of characters are used in Japanese: Hiragana, Katakana & Kanji(Chinese characters). Mr. Tanaka goes to Canada: 田中さんはカナダに行きます [kanji][hiragana][kataka na][hiragana][kanji] [hiragana]b Two Speech Styles Distal-Style: Semi-Polite style, can be used to anyone other than family members/close friends. Direct-Style: Casual & blunt, can be used among family members and friends. In-Group/Out-Group Semi-Polite Style for Out-Group/Strangers I/We Direct-Style for Me/Us Polite Expressions Distal-Style: 1. Regular Speech 2. Ikimasu(he/I go) Honorific Speech 3. Irasshaimasu(he goes) Humble Speech Mairimasu(I/We go) Siblings: Age Matters Older Brother & Older Sister Ani & Ane 兄 と 姉 Younger Brother & Younger Sister Otooto & Imooto 弟 と 妹 My Family/Your Family My father: chichi父 Your father: otoosan My mother: haha母 お父さん My older brother: ani Your mother: okaasan お母さん Your older brother: oniisanお兄 兄 さん My older sister: ane姉 Your older sister: oneesan My younger brother: お姉さ otooto弟 ん Your younger brother: My younger sister: otootosan弟さん imooto妹 Your younger sister: imootosan 妹さん Boy Speech & Girl Speech blunt polite I/Me = watashi, boku, ore, I/Me = watashi, washi watakushi I am going = Boku iku.僕行 I am going = Watashi iku く。 wa.
    [Show full text]
  • Hi Ilok Ru H \Yi~>? I
    t'/I h( C 13~ /') I ( J /~ j into hi ilok ru h \Yi~>? I / EsCUELAS SIN FRONTERAS Guatemala 1995 xintow chik ilok ru hu ESCUELAS SIN FRONTERAS Rokslnkll II Ilok ru hu Reheb' II kok'ol OJ q'eqchl' wankeb' SOl xb'een noloJ tosol hu xmolam ESF ColeCClOn Matenales educallvos nO 2 Idtoma Qeqclu Sene Manual del alumno Nlvel Pnmergrado Area Lecto escntura Drrector de colecclOn DrrecclOn de Programas Autores Jorge Seb Choc Carlos Qwm Xol Sofi Lamy PIerre Lancelot DlagramaClon PIerre Lancelot DlbuJos Mayra Fong URL Portada Marganta Ramrrez URL Pnmera relmpreslon 1999 IIIDII ~ III1ft9 Uruversldad Rafael Landivar USAID ESCUELAS SIN FRONTERAS © Escuelas Sm Fronteras Guatemala Penrullda la reproducclOn con preVIa autonzaclon xintow chik ilok ru hu Xyoob'ankll Sal xyollajlk chi junajwa II rokslnkll II tasal hu xmtaw chlk lIok ru hu, oXlb' ru h xsal naq xk'ojlaman - Jun ch'uut chi eetahl wankeb' chi ru t'lkr, 26 (waqlb' xka'k'aal) chi junll - Jun xk'utb'al xb'e h xk'anjellaj tzolonel - Jun tasal hu b'ar WII natawman naab'al kok' k'anjel yal re xtenq'ankll II tzolom A oXlb' paay chi na'leb' ha'ln Ink'a' naru nake'xjachl nb', ya jo'kan aj WII naq maajunwa naru rokslnklleb' a tasal hu ha'ln WI tOj majll natzolman II xb'ereslnkll Jo'kan bill, WI Ink'a' nab'aanuman chi jolalln, h xtzolb'al II aatlnob'aal moko jwal yaal to no Ie!, xb'aan naq Ink'a' xlajman xk'eeb'al xloqlal II xk'a'uxl ut II xtusulal h xyo'lajlk chaq a k'anJel ha'ln, ut chi Jo'ka'an nake'kana xb'aanunkll chi junll h xb'ehll ut h xtusulal II tzolok, WI maak'a' chi junll ha'ln maak'a' chlk xyaalal
    [Show full text]
  • Introduction to the Special Issue on Japanese Geminate Obstruents
    J East Asian Linguist (2013) 22:303-306 DOI 10.1007/s10831-013-9109-z Introduction to the special issue on Japanese geminate obstruents Haruo Kubozono Received: 8 January 2013 / Accepted: 22 January 2013 / Published online: 6 July 2013 © Springer Science+Business Media Dordrecht 2013 Geminate obstruents (GOs) and so-called unaccented words are the two properties most characteristic of Japanese phonology and the two features that are most difficult to learn for foreign learners of Japanese, regardless of their native language. This special issue deals with the first of these features, discussing what makes GOs so difficult to master, what is so special about them, and what makes the research thereon so interesting. GOs are one of the two types of geminate consonant in Japanese1 which roughly corresponds to what is called ‘sokuon’ (促音). ‘Sokon’ is defined as a ‘one-mora- long silence’ (Sanseido Daijirin Dictionary), often symbolized as /Q/ in Japanese linguistics, and is transcribed with a small letter corresponding to /tu/ (っ or ッ)in Japanese orthography. Its presence or absence is distinctive in Japanese phonology as exemplified by many pairs of words, including the following (dots /. / indicate syllable boundaries). (1) sa.ki ‘point’ vs. sak.ki ‘a short time ago’ ka.ko ‘past’ vs. kak.ko ‘paranthesis’ ba.gu ‘bug (in computer)’ vs. bag.gu ‘bag’ ka.ta ‘type’ vs. kat.ta ‘bought (past tense of ‘buy’)’ to.sa ‘Tosa (place name)’ vs. tos.sa ‘in an instant’ More importantly, ‘sokuon’ is an important characteristic of Japanese speech rhythm known as mora-timing. It is one of the four elements that can form a mora 1 The other type of geminate consonant is geminate nasals, which phonologically consist of a coda nasal and the nasal onset of the following syllable, e.g., /am.
    [Show full text]
  • Some Principles of Cryptographic Security
    . > :Non - Re sponsi ve DOCID· 38~8697 TOP SEEAET !l&tiilU Ji Some Principles of Cryptographic Security BY BJUGADIER JOHN H. TlLTMAN rpaµ Beer et b'iabrs The author derives .<tome general principles of cryptographic security as seen from the two opposite points of view of the de.iig~r of a cipher s_ystem and of the cryptanalyst. He illustrates the principles by exam­ ples frorn his own experience, emphasizing the weaknesses of design and usage which have led to tlu! so/utiun of a number of $)'1Jtems. I have attempted in the following paper to derive soirie general prin­ ciples of cryptographic security, looking at it from the two opposite points of view of the designer of a cipher system and the cryptanalyst. I have found it impossible to arrange these principles in any logical order and have decided t.o express them in the form of disconnected aphorisms and to illustrate them from my own experience. Much of what I have to say wi11 seem rather obvious, and I do not imagine that any of it will be or value to the designer of a cipher system today, but I hope that some of it may serve to stimulate the imagination of crypt­ anelysts facing complex but potentially vulnerable systems (such as are stiJI used in many countries). tn any case, the principles should not be taken loo seriously. They should be regarded as a loose frame­ work for the classification of weaknesses of design and usage which have led t.o the solution of a number of systems in the anelysiR of which I have ta.ken part.
    [Show full text]
  • Introduction to Japanese Computational Linguistics Francis Bond and Timothy Baldwin
    1 Introduction to Japanese Computational Linguistics Francis Bond and Timothy Baldwin The purpose of this chapter is to provide a brief introduction to the Japanese language, and natural language processing (NLP) research on Japanese. For a more complete but accessible description of the Japanese language, we refer the reader to Shibatani (1990), Backhouse (1993), Tsujimura (2006), Yamaguchi (2007), and Iwasaki (2013). 1 A Basic Introduction to the Japanese Language Japanese is the official language of Japan, and belongs to the Japanese language family (Gordon, Jr., 2005).1 The first-language speaker pop- ulation of Japanese is around 120 million, based almost exclusively in Japan. The official version of Japanese, e.g. used in official settings andby the media, is called hyōjuNgo “standard language”, but Japanese also has a large number of distinctive regional dialects. Other than lexical distinctions, common features distinguishing Japanese dialects are case markers, discourse connectives and verb endings (Kokuritsu Kokugo Kenkyujyo, 1989–2006). 1There are a number of other languages in the Japanese language family of Ryukyuan type, spoken in the islands of Okinawa. Other languages native to Japan are Ainu (an isolated language spoken in northern Japan, and now almost extinct: Shibatani (1990)) and Japanese Sign Language. Readings in Japanese Natural Language Processing. Francis Bond, Timothy Baldwin, Kentaro Inui, Shun Ishizaki, Hiroshi Nakagawa and Akira Shimazu (eds.). Copyright © 2016, CSLI Publications. 1 Preview 2 / Francis Bond and Timothy Baldwin 2 The Sound System Japanese has a relatively simple sound system, made up of 5 vowel phonemes (/a/,2 /i/, /u/, /e/ and /o/), 9 unvoiced consonant phonemes (/k/, /s/,3 /t/,4 /n/, /h/,5 /m/, /j/, /ó/ and /w/), 4 voiced conso- nants (/g/, /z/,6 /d/ 7 and /b/), and one semi-voiced consonant (/p/).
    [Show full text]
  • Sveučilište Josipa Jurja Strossmayera U Osijeku Filozofski Fakultet U Osijeku Odsjek Za Engleski Jezik I Književnost Uroš Ba
    CORE Metadata, citation and similar papers at core.ac.uk Provided by Croatian Digital Thesis Repository Sveučilište Josipa Jurja Strossmayera u Osijeku Filozofski fakultet u Osijeku Odsjek za engleski jezik i književnost Uroš Barjaktarević Japanese-English Language Contact / Japansko-engleski jezični kontakt Diplomski rad Kolegij: Engleski jezik u kontaktu Mentor: doc. dr. sc. Dubravka Vidaković Erdeljić Osijek, 2015. 1 Summary JAPANESE-ENGLISH LANGUAGE CONTACT The paper examines the language contact between Japanese and English. The first section of the paper defines language contact and the most common contact-induced language phenomena with an emphasis on linguistic borrowing as the dominant contact-induced phenomenon. The classification of linguistic borrowing thereby follows Haugen's distinction between morphemic importation and substitution. The second section of the paper presents the features of the Japanese language in terms of origin, phonology, syntax, morphology, and writing. The third section looks at the history of language contact of the Japanese with the Europeans, starting with the Portuguese and Spaniards, followed by the Dutch, and finally the English. The same section examines three different borrowing routes from English, and contact-induced language phenomena other than linguistic borrowing – bilingualism , code alternation, code-switching, negotiation, and language shift – present in Japanese-English language contact to varying degrees. This section also includes a survey of the motivation and reasons for borrowing from English, as well as the attitudes of native Japanese speakers to these borrowings. The fourth and the central section of the paper looks at the phenomenon of linguistic borrowing, its scope and the various adaptations that occur upon morphemic importation on the phonological, morphological, orthographic, semantic and syntactic levels.
    [Show full text]