Korean Language Generation in an Interlingua-Based Speech

Total Page:16

File Type:pdf, Size:1020Kb

Korean Language Generation in an Interlingua-Based Speech Korean Language Generation in an Interlingua-based Speech Translation System by Dennis Woojun Yang Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY May 1995 ( Massachusetts Institute of Technology 1995. All rights reserved. AAuthor uthor....... ................................ Department of Electrical E ineering and Computer Science May 26, 1995 CerCer::" -- -- - 'U./ ....... ....... :'"''..................'''- Stephanie Seneff Principal Research Scientist mhesis Supervisor Certifiec .................. Clifford Weinstein Leader of Lincoln/p ech Systems Technology Group \.- . i 1 1. B Thesis Supervisor Acceptedby .--........ ".....Ac e F.R........Morgenthaler Chairmanepartmental Com ittee on Graduate Theses ;:;ASSACHUSETTS INSTITU'E OF TECHNOLOGY AUG 10 1995 LIBRARIES arker EMn Korean Language Generation in an Interlingua-based Speech Translation System by Dennis Woojun Yang Submitted to the Department of Electrical Engineering and Computer Science on May 26, 1995, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering Abstract Group 24 at MIT Lincoln Laboratory has been developing an automatic speech-to- speech translation system for the English-Korean pair. For the machine translation module, an interlingua system has been adopted. This system analyzes the source language text and represents the results of the analysis in a semantic frame, an un- ambiguous textual-meaning propositional representation language, from which the text in the target language is generated. For the language generation component, GENESIS, a language generation system developed at the Spoken Language Sys- tems Group at the Laboratory for Computer Science of Massachusetts Institute of Technology, has been utilized. GENESIS has been used for European languages for general purposes and for Japanese in limited domains. It has also been found to be capable of handling some of the linguistic phenomena that are needed for Korean. However, there exist areas in which GENESIS cannot currently handle Korean gen- eration. This thesis explores the degree to which GENESIS is able to manage Korean language generation. Thesis Supervisor: Stephanie Seneff Title: Principal Research Scientist Thesis Supervisor: Clifford Weinstein Title: Leader of Lincoln Speech Systems Technology Group Acknowledgments More than 8 months of research and a bucket of sweat resulted in this document. Without substantial help and support from the people cited below, the completion of this thesis would have not been possible. I would like to take this opportunity to thank the people who have contributed much to the completion of my Master of Engineering thesis. I wish to express my sincere thanks to my MIT campus supervisor, Dr. Stephanie Seneff. Only with her expertise in natural language processing and her watchful guidance has it been possible for me to start, progress, and finish my thesis. I truely appreciate her technical support and time spent in proofreading my thesis. I gratefully acknowledge the help of my MIT Lincoln Laboratory supervisor, Dr. Cliff Weinstein. His support ranged from providing all the necessary equipment to proofreading my thesis. I would also like to recognize MIT Lincoln Lab for allowing me to use its facilities. A lot of thanks to the Group 24 members of MIT Lincoln Lab for not only helping me do my research, but also for showing me a glimpse of the real world. I would especially like to thank Dinesh Tummala whom I could always count on for guidance on various issues, both technical and personal, and Dr. Youngsuk Lee who shared her Korean expertise with me. Special thanks to my friends, Shinsuk, Hyunwoo, Dongwhan, and Soojung who scored the performance of the output, and to Jinmo, John and professor Victor Zue who proofread my thesis. A lot of thanks and best wishes for Sunyong who showed me her thesis to look at the format, and for Mira who sent over the material about the Korean grammar. And, to my parents who immigrated to America to give me the best education they can, I would like to express my deepest gratitude. Most importantly, I would like to thank God for being with me and keeping me strong in faith throughout the whole process. The work for this thesis was carried out in the fall of 1994 and in the spring of 1995 at MIT Lincoln Laboratory, Lexington, Massachusetts. This document was written and revised in the spring term of 1995 at MIT Lincoln Laboratory. Contents 1 Introduction 10 1.1 Machine translation systems . ...................... 10 1.2 CCLINC............ ...................... 12 1.3 Korean language in CCLINC . ...................... 14 1.4 TINA and GENESIS ..... ...................... 15 1.5 Evaluation procedure ..... ...................... 16 2 Korean language phenomena 17 2.1 Word order rules ................. ............ ..17 2.2 Conjunctive relations. .. .... .... 22 2.3 Verb suffixes . ..... ....... 22 2.3.1 Honorific/polite suffixes ......... ........... ..22 2.3.2 Tense suffixes. .. .. .. .. .. .. 24 2.3.3 Type-defining suffixes . ........... ..24 2.3.4 Conjunctive suffixes . .. ... .. ... .. 25 2.3.5 Gerund suffixes .... ...... .. .. ... .. ... .. 26 3 GENESIS 28 3.1 Semantic frames ........ .................. 28 3.2 Mechanism of GENESIS ... .................. 30 3.2.1 Lexicon ........ .................. 30 3.2.2 Messages ........ .................. 31 3.2.3 Rewrite rules. .................. 32 5 3.3 GENESIS for Korean ........................... 32 3.3.1 Lexicon . ........................... 32 3.3.2 Messages... ........................... 38 3.3.3 Rewrite rules ........................... 41 4 Proposed improvements to GENESIS 44 4.1 Negations and passive voice sentences 44 ........... ....... 4.2 Articles ................ ........... ....... 49 4.3 Styles of speech. ........... ....... 50 4.4 Preposition vs. postposition. ........... ....... 51 4.5 Mapping approach. ........... ....... 52 4.6 Lexical incompatibility. ........... 53 5 Evaluation 54 5.1 Evaluation Procedure ............ .. ... .. .54 5.1.1 Data .................... ..... ....... .54 5.1.2 Method .................. ... .. .. .54 5.1.3 Scores ................... ... .... .. .55 5.2 Analysis ...................... ........... ..55 5.2.1 Insufficient analysis of TINA ....... ........... ..58 5.2.2 Fixable by changing rules of GENESIS .. .. ... .. .59 5.2.3 Require code modification for GENESIS ........... ..60 5.2.4 Other ................... ........... ..60 5.3 Conclusion ..................... .. .. ... .. 61 6 Discussion and future plans 63 A Lexicon for GENESIS 65 B Messages for GENESIS 73 C Rewrite-rules for GENESIS 81 6 D Inflection patterns 89 7 List of Figures 1-1 A typical SSTS ................... 11 1-2 Translation using ILT approach [1] ........ 11 1-3 System structure for multilingual SSTS [6] .... 13 3-1 Parse tree for a sample sentence .......... 29 4-1 Inflections for "Hada" verb ............. 48 5-1 Adequacy score histogram (O indicates unparsed) 56 5-2 Fluency score histogram (O indicates unparsed) 57 D-1 Inflection patterns for V1 "HaDa" verbs . .......... .91 D-2 Inflection patterns for V2 verbs ........... ..93 D-3 Inflection patterns for V3 verbs .......... ..95 D-4 Inflection patterns for V4 verbs .......... ..97 D-5 Inflection patterns for V5 verbs 99 8 List of Tables 3.1 Example lexicon entries for English .............. 31 5.1 Evaluation scores. 55 5.2 Occurrences of each error source ................ 55 A.1 Lexicon file for GENESIS ................... 66 B.1 Messages file for GENESIS ................... 74 C.1 Data files used for rewrite.c................. .. 82 C.2 Program automatically generating korean-rewrite-rules.text . .. 83 C.3 Special.eng. ........................... .. 88 D.1 Words beloging to V1 "HaDa" verbs ............ ..... ..90 D.2 Words beloging to V2 verbs ................. ..... ..92 D.3 Words beloging to V3 verbs ................. ..... ..94 D.4 Words beloging to V4 verbs ................. ..... ..96 D.5 Words beloging to V5 verbs ................. ..... ..98 9 Chapter 1 Introduction 1.1 Machine translation systems Group 24 at MIT Lincoln Laboratory has been developing an automatic speech-to- speech translation system (SSTS) for English and Korean. It has been proposed that the English-Korean automatic SSTS be used by military coalition forces in Korea where the need for communication among Korean and American soldiers has been recognized. However, this proposal is difficult to fulfill due to the large differences between the two languages. The purpose of building the English-Korean automatic SSTS is to help the soldiers communicate in their own respective languages. A typical SSTS works in three phases: speech recognition, language translation, and speech synthesis [1]. The first phase recognizes the speech in the source language (SL) then produces the utterance in text form. The second phase analyzes this utterance and translates it into the target language (TL) in text form. The last phase converts this translation into sound. See figure 1-1. Most machine translation systems developed to date fall into two categories de- pending on how the language translation is approached - transfer and interlingua [1]. Transfer systems involve finding the target language correlates for lexical units and syntactic constructions of the source language, whereas in interlingua systems the SL and TL are
Recommended publications
  • Towards a Practical Phonology of Korean
    Towards a practical phonology of Korean Research Master programme in Linguistics Leiden University Graduation thesis Lorenzo Oechies Supervisor: Dr. J.M. Wiedenhof Second reader: Dr. A.R. Nam June 1, 2020 The blue silhouette of the Korean peninsula featured on the front page of this thesis is taken from the Korean Unification Flag (Wikimedia 2009), which is used to represent both North and South Korea. Contents Introduction ..................................................................................................................................................... iii 0. Conventions ............................................................................................................................................... vii 0.1 Romanisation ........................................................................................................................................................ vii 0.2 Glosses .................................................................................................................................................................... viii 0.3 Symbols .................................................................................................................................................................. viii 0.4 Phonetic transcription ........................................................................................................................................ ix 0.5 Phonemic transcription.....................................................................................................................................
    [Show full text]
  • Downloaded for Personal Non‐Commercial Research Or Study, Without Prior Permission Or Charge
    Barnes‐Sadler, Simon George (2016) Central Asian and Yanbian Korean in comparative perspective. PhD thesis. SOAS University of London. http://eprints.soas.ac.uk/26672 Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non‐commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. When referring to this thesis, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g. AUTHOR (year of submission) "Full thesis title", name of the School or Department, PhD Thesis, pagination. Central Asian and Vernacular Yanbian Korean in Comparative Perspective Simon George Barnes-Sadler Thesis submitted for the degree of PhD 2016 Department of the Languages and Cultures of Japan and Korea 1 SOAS, University of London Declaration for SOAS PhD thesis I have read and understood regulation 17.9 of the Regulations for students of the SOAS, University of London concerning plagiarism. I undertake that all the material presented for examination is my own work and has not been written for me, in whole or in part, by any other person. I also undertake that any quotation or paraphrase from the published or unpublished work of another person has been duly acknowledged in the work which I present for examination.
    [Show full text]
  • Korean Residents in Japan and Their Korean Language in Multiple Language Contacts
    Korean Residents in Japan and their Korean Language in Multiple Language Contacts by Jeonghye Son M.A., The University of British Columbia, 2008 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF MASTER OF ARTS in The Faculty of Graduate Studies (Asian Studies) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) October 2008 ©Jeonghye Son, 2008 Abstract In present-day Japan, there are about 500,000 Korean residents (henceforth, Zairiichi Koreans) and most of them are individuals who were forced to cross over to Japan as low- wage labourers and for miiitary service during the colonial period and their descendants. The language contact which Zainichi Koreans have undergone is interesting for a number of reasons. The majority of the first generations are southern dialect speakers; due to geographical proximity to Japan, it was easier for Koreans in the southern areas on the peninsula to cross over to Japan. However, following liberation from Japan in 1945, younger generations have been exposed to the standard languages of North or South Korea in schools that were established for children remaining in Japan whereas, at home, to the dialects spoken by older generations in their families or communities. Moreover, in their day-to-day activities they primarily use the dominant language of Japanese. It is the purpose of this study to characterize the Korean language used by Zainichi Koreans through an in-depth analysis of orthography, lexicon and grammar compared with the original Korean language used on the peninsula, and to suggest the socio-linguistic typology. This study is based mainly on data from three volumes of comic books which were titled ‘Flutter Toward the Sky’ (Ch anggonge narae ch ‘Jra) and published by a Chongryun run publisher, ChasJn Sinbo and on audio-recorded data from classes in a Chongryun-run primary school.
    [Show full text]
  • Studies in Korean Linguistics and Language Pedagogy
    STUDIES IN KOREAN LINGUISTICS AND LANGUAGE PEDAGOGY KLEAR-RILI Studies in Korean Language and Linguistics KLEAR-RILI Studies in Korean Language and Linguistics STUDIES IN KOREAN STUDIES IN KOREAN LINGUISTICS AND LINGUISTICS AND LANGUAGE PEDAGOGY LANGUAGE PEDAGOGY Festschrift for Ho-min Sohn Festschrift for Ho-min Sohn Edited by Sung-Ock Sohn, Sungdai Cho & Seok-Hoon You Korea University Press 82-2-3290-4234 (Tel.) 82-2-923-6311 (Fax) http://www.kupress.com [email protected] 145, Anam-Ro, Seongbuk-Gu, Seoul, 136-701 Korea ⓒ 2013 Korea University Press All rights reserved. Published October 2013 ISBN 978-89-7641-830-2 93710 KRW 31,000 5 CONTENTS Preface ..................................................................................................................................................................................................................... 9 Contributors ................................................................................................................................................................................................ 14 Introduction ................................................................................................................................................................................................. 15 PART I KOREAN LINGUISTICS 1. Korean Honorific Sentence Ending,-pnita : Indexing of Knowledgeable Expert • Sumi Chang ............................................................................... 25 2. Semantic and Pragmatic Functions of Sentence Enders -kwun and
    [Show full text]
  • An Introduction to the Korean Spoken Language
    •a: & a <§ ^ $ AN INTRODUCTION TO THE KOREAN SPOKEN LANGUAGE BY HORACE GRANT UNDERWOOD IN TWO PARTS : PART I. GRAMMATICAL NOTES PART II. ENGLISH INTO KOREAN SECOND EDITION REVISED AND ENLARGED WITH THE ASSISTANCE OF HORACE HORTON UNDERWOOD, A.B. EUROPE AND AMERICA The MacMillan Company, New York THE FAR EAST Kelly & Walsh, I/td., Yokohama, Shanghai The Korean Religious Tract Society, Seoul, Korea (all eight reserved) 39fb ■- 7?5. 5 PRINTED BY THE FUKUIN PRINTING Co., L'id., YOKOHAMA, JAPAN. K ■ PREFACE. It was hardly expected when this volume saw the light of day in 1889 that so many years would pass before it was supplemented by something more elaborate and better and it is only the fact that nothing else has been prepared to take its place and that the author has been so beseeched for a new edition that has led us to issue this second edition. "We have sought advice and help and suggestions for changes on every hand and regret very much that the press of work has hindered others from giving to us the assistance that would have made this book of much more value to the studenj of Korean. In the present edition the author i3 glad to say that he has had the assistance of his son who went over the revision of the book with the enthusiasm of a new student of the language. We regret that more changes have not been made because we feel that the imperfections of the book would have warranted a more thorough revision of the book, but ■ a careful review of all the parts with the assistance of some of the best Korean scholars available did not result ■in more than what is seen in this new edition.
    [Show full text]
  • Preparing Camera-Ready Copy
    ISSN 2449-7444 Volume 1/2015 International Journal of Korean Humanities and Social Sciences Institute of Linguistics Faculty of Modern Languages and Literature Adam Mickiewicz University Poznań, Poland International Journal of Korean Humanities and Social Sciences, vol. 1/2015 INSTITUTE OF LINGUISTICS ADAM MICKIEWICZ UNIVERSITY www.korea.amu.edu.pl http://legiling-korean1.home.amu.edu.pl/ [email protected] EDITORIAL BOARD Editors-in-chief: Kyong-geun Oh & Aleksandra Matulewska External Members of the Editorial Board: Jerzy Bańczerowski (Poznań College of Modern Languages, Poland) Mansu Kim (Inha University in Incheon, Korea) Marek Matulewski (Poznań School of Logistics, Poland) Jong-seong Park (Korea National Open University, Korea) Section editors for humanities: Jerzy Bańczerowski (linguistics), Jong-seong Park and Kyong-geun Oh (literature) for social sciences: Marek Matulewski Linguistic editors: Kyong-geun Oh for Korean, Andrew Pietrzak for English, Piotr Wierzchoń for Polish Technical editor: Aleksandra Matulewska Editorial Office International Journal of Korean Humanities and Social Sciences Institute of Linguistics Adam Mickiewicz University al. Niepodległości 4, pok. 218B 61-874 Poznań, Poland [email protected] Copyright by Institute of Linguistics Printed in Poland ISSN 2449-7444 Circulation 100 copies Editing and typesetting: AM Printing: Zakład Graficzny Uniwersytetu im. A. Mickiewicza Printed version serves referential purposes. 2 Table of Contents 5 3 Foreword in English 7 Foreword in Korean 9 11 LITERATURE
    [Show full text]