Korean-To-Japanese Neural Machine Translation System Using Hanja Information

Total Page:16

File Type:pdf, Size:1020Kb

Korean-To-Japanese Neural Machine Translation System Using Hanja Information Korean-to-Japanese Neural Machine Translation System using Hanja Information Hwichan Kim Tosho Hirasawa Mamoru Komachi Tokyo Metropolitan University 6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan fkim-hwichan, [email protected], [email protected] Abstract Korean (Hangul) 자연 ¸´ 처¬ Korean (Hanja) rP 言語 処理 In this paper, we describe our TMU neural ma- Japanese (Kanji) rP 言語 処理 chine translation (NMT) system submitted for English Natural Language Processing the Patent task (Korean→Japanese) of the 7th Workshop on Asian Translation (WAT 2020, Table 1: Example of conversion of SK words in Hangul Nakazawa et al., 2020). We propose a novel into Hanja and Japanese translation into Kanji. method to train a Korean-to-Japanese transla- tion model. Specifically, we focus on the vo- cabulary overlap of Korean Hanja words and embeddings are the same. Conneau et al.(2018) Japanese Kanji words, and propose strategies and Lample and Conneau(2019) improved trans- to leverage Hanja information. Our experi- lation accuracy by making embeddings between ment shows that Hanja information is effec- tive within a specific domain, leading to an im- source words and their corresponding target words provement in the BLEU scores by +1.09 points closer. From this fact, we hypothesize that if the compared to the baseline. embeddings of each corresponding word are closer, the translation accuracy will improve. 1 Introduction Based on this hypothesis, we propose two ap- The Japanese and Korean languages have a strong proaches to train a translation model. First, we connection with Chinese owing to cultural and his- follow Park and Zhao(2019)’s method to increase torical reasons (Lee and Ramsey, 2011). Many the vocabulary overlap to improve the Korean-to- words in Japanese are composed of Chinese char- Japanese translation accuracy. Therefore, we per- acters called Kanji. By contrast, Korean uses the form Hangul to Hanja conversion pre-processing Korean alphabet called Hangul to write sentences before training the translation model. Second, we in almost all cases. However, Sino-Korean1 (SK) propose another approach to obtain Korean and words, which can be converted into Hanja words, Japanese embeddings that are closer to Korean account for 65 percent of the Korean lexicon (Sohn, SK words and their corresponding Japanese Kanji 2006). Table1 presents an example of conversions words. SK words written in Hangul and their coun- of SK words into Hanja, which are compatible with terparts in Japanese Kanji are superficially differ- Japanese Kanji words. ent, but we make both embeddings close by using a In addition, several studies have suggested that loss function when training the translation model. overlapping tokens between the source and target In addition, in this study, we used the Japan languages can improve the translation accuracy Patent Office Patent Corpus 2.0, which consists of (Sennrich et al., 2016; Zhang and Komachi, 2019). four domains, namely chemistry (Ch), electricity Park and Zhao(2019) trained a Korean-to-Chinese (El), mechanical engineering (Me), and physics translation model by converting Korean SK words (Ph), whose training, development, and test-n12 from Hangul into Hanja to increase the vocabulary data have domain information. Our methods are overlap. more effective when the terms are derived from In other words, the meaning of a vocabulary Chinese characters; therefore, we expect that the overlap on NMT is that each corresponding word’s effect will be different per domain. This is because 1Sino-Korean (SK) refers to Korean words of Chinese 2Here, test-n, test-n1, test-n2, and test-n3 data, and test-n origin. consist of test-n1, test-n2, and test-n3. 127 Proceedings of the 7th Workshop on Asian Translation, pages 127–134 December 4, 2020. c 2020 Association for Computational Linguistics Korean (Hangul) ø래서 NX h 량@ 0.01 %tX\ \정\다. Korean (Hanja) ø래서 NX 含有2@ 0.01 %以下\ ¿a\다. Japanese そのため Nn 含有2o 0.01 %以下k ¿aする。 English Therefore, N content 0.01 % or less limit. Table 2: Korean sentence and sentence in which SK words are converted into Hanja, together with their Japanese and English translations. there are domains in which there are many terms from Hangul into Hanja to increase the vocabulary derived from Chinese characters. Therefore, to ex- overlap. amine which Hanja information is the most useful Conneau et al.(2018) proposed a method to build in each domain, we perform a domain adaptation a bilingual dictionary by aligning the source and tar- by fine-tuning the model pre-trained by all training get word embedding spaces using a rotation matrix. data using domain-specific data. They showed that word-by-word translation using In this study, we examine the effect of Hanja the bilingual dictionary can improve the translation information and domain adaptation in a Korean- accuracy in low-resource language pairs. Lample to-Japanese translation. The main contributions of and Conneau(2019) improved the translation ac- this study are as follows: curacy by fine-tuning a pre-trained cross-lingual language model (XLM). The authors observed that • We demonstrate that Hanja information is ef- the bilingual word embeddings in XLM are similar. fective for Korean to Japanese translations Based on these facts, we hypothesize that if the within a specific domain. embeddings of each corresponding word become • In addition, our experiment shows that the closer, the translation accuracy will improve. In translation model using Hanja information this study, we use Hanja information to make the tends to translate literally. embeddings of each corresponding word closer to each other. 2 Related Work Some studies have focused on exploiting Hanja Several studies have been conducted on Korean– information. Park and Zhao(2019) focused on the Japanese neural machine translation (NMT). Park linguistic connection between Korean and Chinese. et al.(2019) trained a Korean-to-Japanese trans- They used the parallel data in which the Korean lation model using a transformer-based NMT sys- sentences of SK words were converted into Hanja tem with relative positioning, back-translation, and to train a translation model and demonstrated that multi-source methods. There have been other their method improves the translation quality. In attempts that combine statistical machine trans- addition, they showed that conversion into Hanja lation (SMT) and NMT (Ehara, 2018; Hyoung- helped translate the homophone of SK words. Yoo Gyu Lee and Lee, 2015). Previous studies on et al.(2019) proposed an approach of training Ko- Korean–Japanese NMT did not use Hanja infor- rean word representations using the data in which mation, whereas we train a Korean-to-Japanese SK words were converted into Hanja words. They translation model using data in which SK words demonstrated the effectiveness of the representa- were converted into Hanja words. tion learning method on several downstream tasks, Sennrich et al.(2016) proposed byte-pair encod- such as a news headline generation and sentiment ing (BPE), i.e., a sub-word segmentation method, analysis. To train the Korean-to-Japanese trans- and suggested that overlapping tokens by joint lation model, we combine these two approaches BPE is more effective for training the translation using Hanja information for training the transla- model between European language pairs. Zhang tion model and word embeddings to improve the and Komachi(2019) increased the overlap of to- translation quality. kens between Japanese and Chinese by decompos- In domain adaptation approaches for NMT, ing Japanese Kanji and Chinese characters into an Bapna and Firat(2019); Gu et al.(2019) trained ideograph or stroke level to improve the accuracy an NMT model pre-trained with massive parallel of Chinese–Japanese unsupervised NMT. Follow- data and retrained it with small parallel data within ing previous studies, we convert Korean SK words the target domain. In addition, Hu et al.(2019) 128 Korean Japanese Model Partition Sent. Tokens Types Tokens Types train 999,758 31,151,846 21,936 31,065,360 24,178 dev 2,000 106,433 6,475 104,307 6,098 test-n 5,230 272,975 9,530 269,876 9,018 Baseline test-n1 2,000 108,327 6,425 106,947 6,137 test-n2 3,000 153,195 7,522 151,253 6,656 test-n3 230 11,453 1,666 11,676 1,644 train 999,755 32,066,032 26,046 30,474,136 27,541 dev 2,000 109,460 6,944 103,994 6,119 test-n 5,230 298,404 9,832 269,146 9,166 Hanja-conversion test-n1 2,000 111,543 6,941 106,653 6,162 test-n2 3,000 175,131 6,410 150,844 6,717 test-n3 230 11,730 1,759 11,649 1,634 Table 3: Statistics of parallel data after each pre-processing. proposed an unsupervised adaptation method that Kanji word closer to each other. We use a loss retrains a pre-trained NMT model using pseudo- function to achieve this goal as follows: in-domain data. In this study, we perform domain N adaptation by fine-tuning a pre-trained NMT model X L = L + (1 − Sim(E(S ); E(K )) (1) with domain-specific data to examine whether T n n n=1 Hanja information is useful, and if so, in which domains. where L is the loss function per-batch and LT is the loss of the transformer architecture. In addi- 3 NMT with Hanja Information tion, S and K are the lists of SK words and its Our model is based on the transformer architecture corresponding Japanese Kanji words in the batch, (Vaswani et al., 2017), and we share the embed- respectively, and N is the length of the S and K ding weights between the encoder input, decoder lists (e.g., when the sentence is the example of Ta- input, and output to make better use of the vocab- ble2 and the batch size is one, S = (h 량, t ulary overlap between Korean and Japanese.
Recommended publications
  • Neural Substrates of Hanja (Logogram) and Hangul (Phonogram) Character Readings by Functional Magnetic Resonance Imaging
    ORIGINAL ARTICLE Neuroscience http://dx.doi.org/10.3346/jkms.2014.29.10.1416 • J Korean Med Sci 2014; 29: 1416-1424 Neural Substrates of Hanja (Logogram) and Hangul (Phonogram) Character Readings by Functional Magnetic Resonance Imaging Zang-Hee Cho,1 Nambeom Kim,1 The two basic scripts of the Korean writing system, Hanja (the logography of the traditional Sungbong Bae,2 Je-Geun Chi,1 Korean character) and Hangul (the more newer Korean alphabet), have been used together Chan-Woong Park,1 Seiji Ogawa,1,3 since the 14th century. While Hanja character has its own morphemic base, Hangul being and Young-Bo Kim1 purely phonemic without morphemic base. These two, therefore, have substantially different outcomes as a language as well as different neural responses. Based on these 1Neuroscience Research Institute, Gachon University, Incheon, Korea; 2Department of linguistic differences between Hanja and Hangul, we have launched two studies; first was Psychology, Yeungnam University, Kyongsan, Korea; to find differences in cortical activation when it is stimulated by Hanja and Hangul reading 3Kansei Fukushi Research Institute, Tohoku Fukushi to support the much discussed dual-route hypothesis of logographic and phonological University, Sendai, Japan routes in the brain by fMRI (Experiment 1). The second objective was to evaluate how Received: 14 February 2014 Hanja and Hangul affect comprehension, therefore, recognition memory, specifically the Accepted: 5 July 2014 effects of semantic transparency and morphemic clarity on memory consolidation and then related cortical activations, using functional magnetic resonance imaging (fMRI) Address for Correspondence: (Experiment 2). The first fMRI experiment indicated relatively large areas of the brain are Young-Bo Kim, MD Department of Neuroscience and Neurosurgery, Gachon activated by Hanja reading compared to Hangul reading.
    [Show full text]
  • Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
    1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
    [Show full text]
  • Recognition of Online Handwritten Gurmukhi Strokes Using Support Vector Machine a Thesis
    Recognition of Online Handwritten Gurmukhi Strokes using Support Vector Machine A Thesis Submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology Submitted by Rahul Agrawal (Roll No. 601003022) Under the supervision of Dr. R. K. Sharma Professor School of Mathematics and Computer Applications Thapar University Patiala School of Mathematics and Computer Applications Thapar University Patiala – 147004 (Punjab), INDIA June 2012 (i) ABSTRACT Pen-based interfaces are becoming more and more popular and play an important role in human-computer interaction. This popularity of such interfaces has created interest of lot of researchers in online handwriting recognition. Online handwriting recognition contains both temporal stroke information and spatial shape information. Online handwriting recognition systems are expected to exhibit better performance than offline handwriting recognition systems. Our research work presented in this thesis is to recognize strokes written in Gurmukhi script using Support Vector Machine (SVM). The system developed here is a writer independent system. First chapter of this thesis report consist of a brief introduction to handwriting recognition system and some basic differences between offline and online handwriting systems. It also includes various issues that one can face during development during online handwriting recognition systems. A brief introduction about Gurmukhi script has also been given in this chapter In the last section detailed literature survey starting from the 1979 has also been given. Second chapter gives detailed information about stroke capturing, preprocessing of stroke and feature extraction. These phases are considered to be backbone of any online handwriting recognition system. Recognition techniques that have been used in this study are discussed in chapter three.
    [Show full text]
  • Sinitic Language and Script in East Asia: Past and Present
    SINO-PLATONIC PAPERS Number 264 December, 2016 Sinitic Language and Script in East Asia: Past and Present edited by Victor H. Mair Victor H. Mair, Editor Sino-Platonic Papers Department of East Asian Languages and Civilizations University of Pennsylvania Philadelphia, PA 19104-6305 USA [email protected] www.sino-platonic.org SINO-PLATONIC PAPERS FOUNDED 1986 Editor-in-Chief VICTOR H. MAIR Associate Editors PAULA ROBERTS MARK SWOFFORD ISSN 2157-9679 (print) 2157-9687 (online) SINO-PLATONIC PAPERS is an occasional series dedicated to making available to specialists and the interested public the results of research that, because of its unconventional or controversial nature, might otherwise go unpublished. The editor-in-chief actively encourages younger, not yet well established, scholars and independent authors to submit manuscripts for consideration. Contributions in any of the major scholarly languages of the world, including romanized modern standard Mandarin (MSM) and Japanese, are acceptable. In special circumstances, papers written in one of the Sinitic topolects (fangyan) may be considered for publication. Although the chief focus of Sino-Platonic Papers is on the intercultural relations of China with other peoples, challenging and creative studies on a wide variety of philological subjects will be entertained. This series is not the place for safe, sober, and stodgy presentations. Sino- Platonic Papers prefers lively work that, while taking reasonable risks to advance the field, capitalizes on brilliant new insights into the development of civilization. Submissions are regularly sent out to be refereed, and extensive editorial suggestions for revision may be offered. Sino-Platonic Papers emphasizes substance over form.
    [Show full text]
  • + Natali A, Professor of Cartqraphy, the Hebreu Uhiversity of -Msalem, Israel DICTIONARY of Toponymfc TERLMINO~OGY Wtaibynafiail~
    United Nations Group of E%perts OR Working Paper 4eographicalNames No. 61 Eighteenth Session Geneva, u-23 August1996 Item7 of the E%ovisfonal Agenda REPORTSOF THE WORKINGGROUPS + Natali a, Professor of Cartqraphy, The Hebreu UhiVersity of -msalem, Israel DICTIONARY OF TOPONYMfC TERLMINO~OGY WtaIbyNafiaIl~- . PART I:RaLsx vbim 3.0 upi8elfuiyl9!J6 . 001 . 002 003 004 oo!l 006 007 . ooa 009 010 . ol3 014 015 sequala~esfocJphabedcsaipt. 016 putting into dphabetic order. see dso Kqucna ruIt!% Qphabctk 017 Rtlpreat8Ii00, e.g. ia 8 computer, wflich employs ooc only numm ds but also fetters. Ia a wider sense. aIso anploying punauatiocl tnarksmd-SymboIs. 018 Persod name. Esamples: Alfredi ‘Ali. 019 022 023 biliaw 024 02s seecIass.f- 026 GrqbicsymboIusedurunitiawrIdu~morespedficaty,r ppbic symbol in 1 non-dphabedc writiog ryste.n& Exmlptes: Chinese ct, , thong; Ambaric u , ha: Japaoese Hiragana Q) , no. 027 -.modiGed Wnprehauive term for cheater. simplified aad character, varIaoL 031 CbmJnyol 032 CISS, featm? 033 cQdedrepfwltatiul 034 035 036 037' 038 039 040 041 042 047 caavasion alphabet 048 ConMQo table* 049 0nevahte0frpointinlhisgr8ti~ . -.- w%idofplaaecoordiaarurnm;aingoftwosetsofsnpight~ -* rtcight8ngfIertoeachotkrodwithap8ltKliuofl8qthonbo&. rupenmposedonr(chieflytopogtaphtc)map.see8lsouTM gz 051 see axxdimtes. rectangufar. 052 A stahle form of speech, deriyed from a pbfgin, which has became the sole a ptincipal language of 8 qxech comtnunity. Example: Haitian awle (derived from Fresh). ‘053 adllRaIfeatlue see feature, allhlral. 054 055 * 056 057 Ac&uioaofsoftwamrcqkdfocusingrdgRaIdatabmem rstoauMe~osctlto~thisdatabase. 058 ckalog of defItitioas of lbe contmuofadigitaldatabase.~ud- hlg data element cefw labels. f0mw.s. internal refm codMndtextemty,~well~their-p,. 059 see&tadichlq. 060 DeMptioa of 8 basic unit of -Lkatifiile md defiile informatioa tooccqyrspecEcdataf!eldinrcomputernxaxtLExampk Pateofmtifii~ofluwtby~namaturhority’.
    [Show full text]
  • Chinese Script Generation Panel Document
    Chinese Script Generation Panel Document Proposal for the Generation Panel for the Chinese Script Label Generation Ruleset for the Root Zone 1. General Information Chinese script is the logograms used in the writing of Chinese and some other Asian languages. They are called Hanzi in Chinese, Kanji in Japanese and Hanja in Korean. Since the Hanzi unification in the Qin dynasty (221-207 B.C.), the most important change in the Chinese Hanzi occurred in the middle of the 20th century when more than two thousand Simplified characters were introduced as official forms in Mainland China. As a result, the Chinese language has two writing systems: Simplified Chinese (SC) and Traditional Chinese (TC). Both systems are expressed using different subsets under the Unicode definition of the same Han script. The two writing systems use SC and TC respectively while sharing a large common “unchanged” Hanzi subset that occupies around 60% in contemporary use. The common “unchanged” Hanzi subset enables a simplified Chinese user to understand texts written in traditional Chinese with little difficulty and vice versa. The Hanzi in SC and TC have the same meaning and the same pronunciation and are typical variants. The Japanese kanji were adopted for recording the Japanese language from the 5th century AD. Chinese words borrowed into Japanese could be written with Chinese characters, while Japanese words could be written using the character for a Chinese word of similar meaning. Finally, in Japanese, all three scripts (kanji, and the hiragana and katakana syllabaries) are used as main scripts. The Chinese script spread to Korea together with Buddhism from the 2nd century BC to the 5th century AD.
    [Show full text]
  • Proposal for a Korean Script Root Zone LGR 1 General Information
    (internal doc. #: klgp220_101f_proposal_korean_lgr-25jan18-en_v103.doc) Proposal for a Korean Script Root Zone LGR LGR Version 1.0 Date: 2018-01-25 Document version: 1.03 Authors: Korean Script Generation Panel 1 General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Korean Script LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document below: • proposal-korean-lgr-25jan18-en.xml Labels for testing can be found in the accompanying text document below: • korean-test-labels-25jan18-en.txt In Section 3, we will see the background on Korean script (Hangul + Hanja) and principal language using it, i.e., Korean language. The overall development process and methodology will be reviewed in Section 4. The repertoire and variant groups in K-LGR will be discussed in Sections 5 and 6, respectively. In Section 7, Whole Label Evaluation Rules (WLE) will be described and then contributors for K-LGR are shown in Section 8. Several appendices are included with separate files. proposal-korean-lgr-25jan18-en 1 / 73 1/17 2 Script for which the LGR is proposed ISO 15924 Code: Kore ISO 15924 Key Number: 287 (= 286 + 500) ISO 15924 English Name: Korean (alias for Hangul + Han) Native name of the script: 한글 + 한자 Maximal Starting Repertoire (MSR) version: MSR-2 [241] Note.
    [Show full text]
  • The Japanese Writing Systems, Script Reforms and the Eradication of the Kanji Writing System: Native Speakers’ Views Lovisa Österman
    The Japanese writing systems, script reforms and the eradication of the Kanji writing system: native speakers’ views Lovisa Österman Lund University, Centre for Languages and Literature Bachelor’s Thesis Japanese B.A. Course (JAPK11 Spring term 2018) Supervisor: Shinichiro Ishihara Abstract This study aims to deduce what Japanese native speakers think of the Japanese writing systems, and in particular what native speakers’ opinions are concerning Kanji, the logographic writing system which consists of Chinese characters. The Japanese written language has something that most languages do not; namely a total of ​ ​ three writing systems. First, there is the Kana writing system, which consists of the two syllabaries: Hiragana and Katakana. The two syllabaries essentially figure the same way, but are used for different purposes. Secondly, there is the Rōmaji writing system, which is Japanese written using latin letters. And finally, there is the Kanji writing system. Learning this is often at first an exhausting task, because not only must one learn the two phonematic writing systems (Hiragana and Katakana), but to be able to properly read and write in Japanese, one should also learn how to read and write a great amount of logographic signs; namely the Kanji. For example, to be able to read and understand books or newspaper without using any aiding tools such as dictionaries, one would need to have learned the 2136 Jōyō Kanji (regular-use Chinese characters). With the twentieth century’s progress in technology, comparing with twenty years ago, in this day and age one could probably theoretically get by alright without knowing how to write Kanji by hand, seeing as we are writing less and less by hand and more by technological devices.
    [Show full text]
  • Negotiation Philosophy in Chinese Characters
    6 Ancient Wisdom for the Modern Negotiator: What Chinese Characters Have to Offer Negotiation Pedagogy Andrew Wei-Min Lee* Editors’ Note: In a project that from its inception has been devoted to second generation updates, it is instructive nonetheless to realize how much we have to learn from the past. We believe Lee’s chapter on Chinese characters and their implications for negotiation is groundbreaking. With luck, it will prove to be a harbinger of a whole variety of new ways of looking at our field that will emerge from our next round of discussion. Introduction To the non-Chinese speaker, Chinese characters can look like a cha- otic mess of dots, lines and circles. It is said that Chinese is the most difficult language in the world to learn, and since there is no alpha- bet, the struggling student has no choice but to learn every single Chinese character by sheer force of memory – and there are tens of thousands! I suggest a different perspective. While Chinese is perhaps not the easiest language to learn, there is a very definite logic and sys- tem to the formation of Chinese characters. Some of these characters date back almost eight thousand years – and embedded in their make-up is an extraordinary amount of cultural history and wisdom. * Andrew Wei-Min Lee is founder and president of the Leading Negotiation Institute, whose mission is to promote negotiation pedagogy in China. He also teaches negotiation at Peking University Law School. His email address is an- [email protected]. This article draws primarily upon the work of Feng Ying Yu, who has spent over three hundred hours poring over ancient Chinese texts to analyze and decipher the make-up of modern Chinese characters.
    [Show full text]
  • Hangeul As a Tool of Resistance Aganst Forced Assimiliation: Making Sense of the Framework Act on Korean Language
    Washington International Law Journal Volume 27 Number 3 6-1-2018 Hangeul as a Tool of Resistance Aganst Forced Assimiliation: Making Sense of the Framework Act on Korean Language Minjung (Michelle) Hur Follow this and additional works at: https://digitalcommons.law.uw.edu/wilj Part of the Comparative and Foreign Law Commons Recommended Citation Minjung (Michelle) Hur, Comment, Hangeul as a Tool of Resistance Aganst Forced Assimiliation: Making Sense of the Framework Act on Korean Language, 27 Wash. L. Rev. 715 (2018). Available at: https://digitalcommons.law.uw.edu/wilj/vol27/iss3/6 This Comment is brought to you for free and open access by the Law Reviews and Journals at UW Law Digital Commons. It has been accepted for inclusion in Washington International Law Journal by an authorized editor of UW Law Digital Commons. For more information, please contact [email protected]. Compilation © 2018 Washington International Law Journal Association HANGEUL AS A TOOL OF RESISTANCE AGAINST FORCED ASSIMILATION: MAKING SENSE OF THE FRAMEWORK ACT ON KOREAN LANGUAGE Minjung (Michelle) Hur† Abstract: Language policies that mandate a government use a single language may seem controversial and unconstitutional. English-only policies are often seen as xenophobic and discriminatory. However, that may not be the case for South Korea’s Framework Act on Korean Language, which mandates the use of the Korean alphabet, Hangeul, for official documents by government institutions. Despite the resemblance between the Framework Act on Korean Language and English-only policies, the Framework Act should be understood differently than English-only policies because the Hangeul-only movement has an inverse history to English-only movements.
    [Show full text]
  • The Writings of Henry Cu
    P~per No. 13 The Writings of Henry Cu Kim The Center for Korean Studies was established in 1972 to coordinate and develop the resources for the study of Korea at the University of Hawaii. Its goals are to enhance the quality and performance of Uni­ versity faculty with interests in Korean studies; develop compre­ hensive and balanced academic programs relating to Korea; stimulate research and pub­ lications on Korea; and coordinate the resources of the University with those of the Hawaii community and other institutions, organizations, and individual scholars engaged in the study of Korea. Reflecting the diversity of academic disciplines represented by its affiliated faculty and staff, the Center especially seeks to further interdisciplinary and intercultural studies. The Writings of Henry Cu Killl: Autobiography with Commentaries on Syngman Rhee, Pak Yong-man, and Chong Sun-man Edited and Translated, with an Introduction, by Dae-Sook Suh Paper No. 13 University of Hawaii Press Center for Korean Studies University of Hawaii ©Copyright 1987 by the University of Hawaii Press All rights reserved. Printed in the United States of America Honolulu, Hawaii 96822 Library of Congress Cataloging-in-Publication Data Kim, Henry Cu, 1889-1967. The Writings of Henry Cu Kim. (Paper; no. 13) Translated from holographs written in Korean. Includes index. 1. Kim, Henry Cu, 1889-1967. 2. Kim, Henry Cu, 1889-1967-Friends and associates. 3. Rhee, Syngman, 1875-1965. 4. Pak, Yong-man, 1881-1928. 5. Chong, Sun-man. 6. Koreans-Hawaii-Biography. 7. Nationalists -Korea-Biography. I. Suh, Dae-Sook, 1931- . II. Title. III. Series: Paper (University of Hawaii at Manoa.
    [Show full text]
  • Korean Studies Program
    korean studies program university of michigan fall 2006 director’s note koryo saram-the unreliable people, premiers in the fall I am happy to report that Koryo Saram-The Unreliable People, every Korean who was connected to the Comintern. One cannot a one-hour documentary created by the Korean Studies Program help wondering what would have happened to modern Korean (KSP) will make its international premier at the Smithsonian history if one or more factor had changed. For instance, if the Institution’s Sackler Gallery in Washington DC on October 29. Korean communist leadership in the Soviet Union had not This documentary, which has taken two years to make, is the perished in the Great Terror, would it not have been parachuted result of an extensive collaboration between units in the into North Korea by the Soviets in the aftermath of World War II? International Institute and within the university. We were fortu- Would someone from the ranks of the Comintern-connected nate to have access to some of the lowest paid but highest skilled Koreans have ruled the roost in North Korea and not Kim Il workers to be found in the country: the UROP (Undergraduate Sung? If so, is it possible that North Korea might have ended Research and Opportunity Program) and work-study students up looking more like a garden-variety Soviet satellite, rather than who streamed in and out of the KSP offices on the third floor what it became, a beleaguered nationalist regime held together of the International Institute. These individuals worked on by a cult of personality? a multitude of tasks: sorting through primary documents and The answers, it turned out, were not only unanswerable, film footage in Russian, Korean and English; translating oral but un-researchable.
    [Show full text]