Korean-To-Chinese Machine Translation Using Chinese Character As Pivot Clue

Jeonghyeok Park1,2,3 and Hai Zhao1,2,3,∗
1 Department of Computer Science and Engineering, Shanghai Jiao Tong University
2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, China
3 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
[email protected], [email protected]

∗ Corresponding author. This paper was partially supported by National Key Research and Development Program of China (No. 2017YFB0304100) and Key Projects of National Natural Science Foundation of China (U1836222 and 61733011).

Pacific Asia Conference on Language, Information and Computation (PACLIC 33), pages 522-530, Hakodate, Japan, September 13-15, 2019. Copyright © 2019 Jeonghyeok Park and Hai Zhao

Abstract

Korean-Chinese is a low-resource language pair, but Korean and Chinese have a lot in common in terms of vocabulary. Sino-Korean words, which can be converted into corresponding Chinese characters, account for more than fifty percent of the entire Korean vocabulary. Motivated by this, we propose a simple, linguistically motivated solution to improve the performance of a Korean-to-Chinese neural machine translation model by using their common vocabulary. We adopt Chinese characters as a translation pivot by converting Sino-Korean words in Korean sentences to Chinese characters and then train the machine translation model with the converted Korean sentences as source sentences. The experimental results on Korean-to-Chinese translation demonstrate that the models with the proposed method improve translation quality by up to 1.5 BLEU points in comparison to the baseline models.

1 Introduction

Neural machine translation (NMT) using a sequence-to-sequence structure has achieved remarkable performance for most language pairs (Bahdanau et al., 2014; Cho et al., 2014; Sutskever et al., 2014; Luong and Manning, 2015). Many studies on NMT have tried to improve translation performance by changing the structure of the network model or adding new strategies (Wu and Zhao, 2018; Zhang et al., 2018; Xiao et al., 2019). Meanwhile, there are few attempts to improve the performance of the NMT model using linguistic characteristics for several language pairs (Sennrich and Haddow, 2016). On the other hand, most of the recently proposed statistical machine translation (SMT) systems have attempted to improve translation performance by using linguistic features including part-of-speech (POS) tags (Ueffing and Ney, 2013), syntax (Zhang et al., 2007), semantics (Rafael and Marta, 2011), reordering information (Zang et al., 2015; Zhang et al., 2016) and so on.

In this work, we focus on machine translation between Korean and Chinese, which have few parallel corpora but share a well-known cultural heritage, the Sino-Korean words. Chinese loanwords used in Korean are called Sino-Korean words, and they can also be written in Chinese characters, which are still used by modern Chinese people. Such a shared vocabulary makes the two languages closer despite their huge linguistic difference and provides the possibility for better machine translation.

Because of Korea's long history of contact with China, Koreans used Chinese characters as their writing system, and even after Hangul (한글 in Korean) was adopted as the standard script, Chinese characters have kept a considerable influence on Korean vocabulary. Currently, the writing system adopted by modern Korean is Hangul, but Chinese characters continue to be used in Korean, and Chinese characters used in Korean are called "Hanja". Korean vocabulary can be categorized into native Korean words, Sino-Korean words, and loanwords from other languages. The Sino-Korean vocabulary refers to Korean words of Chinese origin that can be converted into corresponding Chinese characters, and it accounts for about 57% of Korean vocabulary. Table 1 shows some sentence pairs of Korean and Chinese with the converted Sino-Korean words. In Table 1, some Chinese words are shared between the converted Korean sentence and the Chinese sentence.

Systems      Sentences
Korean       명령은 아래와 같이 반포되었다.
HH-Convert   命令은 아래와 같이 颁布되었다.
Chinese      命令颁布如下。
English      The command was promulgated as follows.

Korean       양국은 광범한 영역에서의 공동 이익을 확인했다.
HH-Convert   两国은 广范한 领域에서의 共同 利益을 确认했다.
Chinese      两国在广泛的领域确认了共同利益。
English      The two countries have confirmed common interests in a wide range of areas.

Table 1: HH-Convert is the Korean sentence converted by the Hangul-Hanja conversion of Hanjaro. The underline denotes a Sino-Korean word and its corresponding Chinese characters in the Korean sentence and the HH-Convert sentence, respectively.

In this paper, we present a novel yet straightforward method for better Korean-to-Chinese MT by exploiting the connection of Sino-Korean vocabulary. We convert all Sino-Korean words in Korean sentences into Chinese characters and take the converted Korean sentences as the updated source data for later MT model training. Our method is applied to two types of NMT models, the recurrent neural network (RNN) and the Transformer, and shows significant translation performance improvement.

2 Related Work

There have been studies of linguistic annotation, such as dependency labels (Wu et al., 2018; Li et al., 2018a; Li et al., 2018b), semantic role labels (Guan et al., 2019; Li et al., 2019) and so on. Sennrich and Haddow (2016) proved that various linguistic features can be valuable for NMT. In this work, we focus on the linguistic connection between Korean and Chinese to improve Korean-to-Chinese NMT.

There are several studies on Korean-Chinese machine translation. For example, Kim et al. (2002) proposed a verb-pattern-based Korean-to-Chinese MT system that uses pattern-based knowledge and consistently manages linguistic peculiarities between language pairs to improve MT performance. Li et al. (2009) improved the translation quality for Chinese-to-Korean SMT by using Chinese syntactic reordering for an adequate generation of Korean verbal phrases.

Since Chinese and Korean belong to entirely different language families in terms of typology and genealogy, many studies also tried to analyze the sentence structure and word alignment of the two languages and then proposed specific methods for their concerns (Huang and Choi, 2000; Kim et al., 2002; Li et al., 2008). Lu et al. (2015) proposed a method of translating Korean words into Chinese using Chinese character knowledge.

There are several attempts to exploit the connection between the source language and the target language in machine translation. Kuang et al. (2018) proposed methods to shorten the distance between the source and target words in the NMT model, and thus strengthen their association, through a technique bridging source and target word embeddings. For other low-resource language pairs, using a pivot language to overcome the limitation of an insufficient parallel corpus has been a choice (Habash and Hu, 2009; Zahabi et al., 2013; Ahmadnia et al., 2017). Chu et al. (2013) built a Chinese character mapping table for Japanese, Traditional Chinese, and Simplified Chinese and verified the effectiveness of shared Chinese characters for Chinese-Japanese MT. Zhao et al. (2013) used the Chinese character, a common form of both languages, as a translation bridge in the Vietnamese-Chinese SMT model, and improved the translation quality by converting Vietnamese syllables into Chinese characters with a pre-specified dictionary. Partially motivated by this work, we turn to Korean in terms of NMT models by fully exploiting the shared Sino-Korean vocabulary between Korean and Chinese.

Table 2: News headlines with Chinese characters. The underline denotes Chinese characters.

3 Sino-Korean Words and Chinese Characters

Korea belongs to the Chinese cultural sphere, the regions and countries of East Asia that China has historically influenced. Before the creation of Hangul (the Korean alphabet), all documents were written in Chinese characters, and Chinese characters were used continuously even after the creation of Hangul.

Today, the standard writing system in Korea is Hangul, and the use of Chinese characters in Korean sentences is rare, but Chinese characters have left a significant influence on Korean vocabulary. About 290,000 (57%) out of the 510,000 words in the Standard Korean Language Dictionary published by the National Institute of Korean Language belong to [...]

[...] Chinese, many homophones were created in their vocabulary in the process of translating the Chinese words into their language. Around 35% of the Sino-Korean words registered in the Standard Korean Language Dictionary are homophones. Thus converting Sino-Korean words into (usually different) Chinese characters will have a similar impact as semantic disambiguation. For example, the Korean word uisa (의사 in Korean) has many homophones and can have several meanings. To clarify the meaning of the word uisa in a Korean context, these words are occasionally written in Chinese characters as follows: 医师 (doctor), 意思 (mind), 义士 (martyr), 议事 (proceedings).

In addition, there is a difference between the Chinese characters (Hanja) used in Korea and the Chinese characters used in China. Chinese can be divided into two categories: Traditional Chinese and Simplified Chinese. Chinese characters used in China and Korea are Simplified Chinese and Traditional Chinese, respectively.

4 The Proposed Approach

The proposed approach for Korean-to-Chinese MT has two phases: Hangul-Hanja conversion and NMT model training. We first convert the Sino-Korean words of the Korean input sentences into Chinese characters, and then convert the Traditional Chinese characters of the converted Korean input sentences into Simplified Chinese characters so that the source and target vocabularies share common units.