A Student Model of Katakana Reading Proficiency for a Japanese Language Intelligent Tutoring System
Anthony A. Maciejewski, Yun-Sun Kang
School of Electrical Engineering, Purdue University, West Lafayette, Indiana 47907

Manuscript received July 29, 1991. This material is based upon work supported by the National Science Foundation under Grant No. INT 8818039 and in part by a grant from NEC Corporation.
ISSN# 0-7803-0233-8/91 $1.00 ©1991 IEEE

Abstract--This work describes the development of a student model that is used in a Japanese language intelligent tutoring system to assess a pupil's proficiency at reading one of the distinct orthographies of Japanese, known as katakana. While the effort required to memorize the relatively few katakana symbols and their associated pronunciations is not prohibitive, a major difficulty in reading katakana is associated with the phonetic modifications which occur when English words which are transliterated into katakana are made to conform to the more restrictive rules of Japanese phonology. The algorithm described here is able to automatically acquire a knowledge base of these phonological transformation rules, use them to assess a student's proficiency, and then appropriately individualize the students' instruction.

I. INTRODUCTION

Interest in Japanese language instruction has risen dramatically in recent years, particularly for those Americans engaged in technical disciplines. However, the Japanese language is generally regarded as one of the most difficult languages for English-speaking people to learn. While the number of individuals studying Japanese is increasing, there remains an extremely high attrition rate, estimated by some to be as high as 80% [7]. Much of this difficulty can be associated with the Japanese writing system. Japanese text consists of two distinct orthographies, a phonetic syllabary known as kana and a set of logographic characters, originally derived from the Chinese, known as kanji. The kana are divided into two phonetically equivalent but graphically distinct sets, katakana and hiragana, both consisting of 46 symbols and two diacritic marks denoting changes in pronunciation. The katakana are used primarily for writing words of foreign origin that have been adapted to the Japanese phonetic system, although they are also used for onomatopoeia, colloquialisms, and emphasis. The hiragana are used to write all inflectional endings and some types of native Japanese words that are not currently represented by kanji. Due to the limited number of kana, their relatively low visual complexity, and their systematic arrangement, they do not represent a significant barrier to the student of Japanese. In fact, the relatively small effort required to learn katakana yields significant returns to readers of technical Japanese due to the high incidence of terms derived from English and transliterated into katakana.

This work describes the development of a system that is used to automatically acquire knowledge about how English words are transliterated into katakana and then to use that information in developing a model of a student's proficiency in reading katakana. This model is used to guide the instruction of the student using an intelligent tutoring system developed previously [6,8]. The remainder of this paper is organized as follows: Section II provides a brief introduction to the katakana writing system and to the rules of Japanese phonology. This is followed by the description of an algorithm to automatically generate a katakana to English dictionary from an arbitrary Japanese text and its English translation. In Section IV, the katakana to English dictionary is used to develop a set of phonological rules that govern the inverse transformation from Japanese phonetics, represented by the katakana orthography, back to the original English pronunciation. This set of rules is treated as the knowledge base that the student must acquire in order to become proficient at reading katakana. The failure to recognize words that contain specific phonological rules is then used to build the student model, which is described in Section V. Finally, the conclusions of this work are presented in the last section.

II. KATAKANA AND JAPANESE PHONOLOGY

The Japanese lexicon contains an extremely large number of words originating from foreign languages. Traditionally, these loan words are characterized as either kango, literally "Chinese word," which are of Chinese origin (also referred to as Sino-Japanese words) or gairaigo, literally "foreign word," for words from any other country of origin. While the proportion of Sino-Japanese words in the lexicon is extremely large due to the profound cultural influence of China, words of English origin have dominated the class of loan words since the late 19th century. In a study of Japanese publications performed between 1956 and 1964, over 80% of the gairaigo originated from English [4]. This process of adopting English into the lexicon is particularly common for relatively new or specialized terms arising in technical literature.

When adopting a word of foreign origin into Japanese, the original pronunciation of that word is typically transliterated into katakana, which graphically represents all of the possible phonetic sequences in the Japanese language. It is this process of modifying English phonetic sequences to conform to the rules of Japanese phonology which presents English-speaking readers of katakana with difficulty in identifying a word's meaning. This is due to the fact that the rules of Japanese phonology are quite different from those of English.

TABLE I
KATAKANA CHARACTERS AND THEIR PHONETIC REPRESENTATION IN IPA SYMBOLS: BASIC SYLLABLES

ア /a/    イ /i/    ウ /u/    エ /e/    オ /o/
カ /ka/   キ /ki/   ク /ku/   ケ /ke/   コ /ko/
サ /sa/   シ /ʃi/   ス /su/   セ /se/   ソ /so/
タ /ta/   チ /tʃi/  ツ /tsu/  テ /te/   ト /to/
ナ /na/   ニ /ni/   ヌ /nu/   ネ /ne/   ノ /no/
ハ /ha/   ヒ /hi/   フ /fu/   ヘ /he/   ホ /ho/
マ /ma/   ミ /mi/   ム /mu/   メ /me/   モ /mo/
ヤ /ja/             ユ /ju/             ヨ /jo/
ラ /ra/   リ /ri/   ル /ru/   レ /re/   ロ /ro/
ワ /wa/
In particular, Japanese has only five single vowels /a i u e o/ in contrast to the large number of vowels in English. These vowels, when combined with the nine Japanese consonants /k s t n h m j r w/, constitute 44 of the 46 basic sounds in Japanese, which are traditionally organized as shown in Table I. This organization is referred to as gozyuuonzu, which literally translates as "table of fifty sounds," since all fifty sounds were used at one point in time. The pronunciation of each katakana character in the table is represented using the International Phonetic Alphabet (IPA) symbols. In addition to these basic pronunciations, certain katakana may be written with one of two diacritic marks that modify their pronunciation (see Table II). In particular, the four rows in Table I that correspond to the consonants /k s t h/ may be written with a symbol consisting of two short parallel lines, known as nigori, which results in a voiced version of these consonants, i.e. /g z d b/. The second diacritic mark is a small circle referred to as maru, which when combined with the katakana in the /h/ row results in pronunciations which include the bilabial stop /p/. The katakana in the second column, i.e. those associated with the vowel /i/, may also be combined with any of the katakana in the row associated with the consonant /j/ to produce additional single syllable pronunciations. The second katakana in this case is written slightly smaller in order to differentiate from a two syllable sequence, as illustrated in Table III.

TABLE II
KATAKANA CHARACTERS AND THEIR PHONETIC REPRESENTATION IN IPA SYMBOLS: MODIFIED SYLLABLES

ガ /ga/   ギ /gi/   グ /gu/   ゲ /ge/   ゴ /go/
ザ /za/   ジ /dʒi/  ズ /zu/   ゼ /ze/   ゾ /zo/
ダ /da/   ヂ /dʒi/  ヅ /zu/   デ /de/   ド /do/
バ /ba/   ビ /bi/   ブ /bu/   ベ /be/   ボ /bo/
パ /pa/   ピ /pi/   プ /pu/   ペ /pe/   ポ /po/
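The way the modified syllables are derived from the basic rows can be sketched in a few lines of Python. This is only an illustration of the orthographic pattern described above, not part of the system described in this paper: the function names are hypothetical, the row keys are broad consonant labels rather than cell-level IPA, and modeling the nigori and maru with Unicode combining marks is an assumption of the sketch.

```python
import unicodedata

# Basic rows from Table I that accept a diacritic, keyed by a broad
# consonant label (cell-level IPA, e.g. for シ or チ, is given in the tables).
BASIC_ROWS = {
    "k": "カキクケコ",
    "s": "サシスセソ",
    "t": "タチツテト",
    "h": "ハヒフヘホ",
}

# Voiced counterpart of each row when it is written with nigori.
VOICED_LABEL = {"k": "g", "s": "z", "t": "d", "h": "b"}

# Assumption of this sketch: the diacritics are modeled with Unicode
# combining marks and composed into single characters by NFC normalization.
NIGORI = "\u3099"  # combining voiced sound mark
MARU = "\u309A"    # combining semi-voiced sound mark

SMALL_Y_KANA = "ャュョ"  # small versions of ヤ, ユ, ヨ


def with_mark(row, mark):
    """Attach a diacritic to every katakana in a row."""
    return "".join(unicodedata.normalize("NFC", kana + mark) for kana in row)


def modified_rows():
    """Build the voiced rows (nigori) and the /p/ row (maru on the /h/ row)."""
    rows = {VOICED_LABEL[c]: with_mark(row, NIGORI) for c, row in BASIC_ROWS.items()}
    rows["p"] = with_mark(BASIC_ROWS["h"], MARU)
    return rows


def glide_syllables(i_column_kana):
    """Combine an /i/-column katakana with small ヤ, ユ, ヨ."""
    return [i_column_kana + small for small in SMALL_Y_KANA]


if __name__ == "__main__":
    for label, row in modified_rows().items():
        print(label, row)
    print(glide_syllables("キ"))
```

Calling `modified_rows()` yields the five rows of Table II, and `glide_syllables` applied to any /i/-column character yields the corresponding three entries of Table III.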
Finally, Japanese includes two moraic consonants, /Q/ the mora obstruent and /N/ the mora nasal, which are presented in Table IV.

TABLE III
KATAKANA CHARACTERS AND THEIR PHONETIC REPRESENTATION IN IPA SYMBOLS: CONSONANT PLUS /ja/, /ju/, OR /jo/

キャ /kja/   キュ /kju/   キョ /kjo/
ギャ /gja/   ギュ /gju/   ギョ /gjo/
シャ /ʃa/    シュ /ʃu/    ショ /ʃo/
ジャ /dʒa/   ジュ /dʒu/   ジョ /dʒo/
チャ /tʃa/   チュ /tʃu/   チョ /tʃo/
ニャ /nja/   ニュ /nju/   ニョ /njo/
ヒャ /hja/   ヒュ /hju/   ヒョ /hjo/
ビャ /bja/   ビュ /bju/   ビョ /bjo/
ピャ /pja/   ピュ /pju/   ピョ /pjo/
ミャ /mja/   ミュ /mju/   ミョ /mjo/
リャ /rja/   リュ /rju/   リョ /rjo/

From the above description of the katakana orthography, it is clear that certain inherent limitations are imposed on phonetic sequences in the Japanese language. In particular, Japanese does not allow any consonant clusters except when a consonant is followed by a glide or preceded by a moraic consonant [10]. In addition, consonants may not appear at the end of a sequence. These restrictions, together with the limited number of Japanese vowels, result in the vast majority of phonological modifications which occur when transliterating an English word into katakana. By the same token, these resulting modifications are the source of difficulty for English-speaking readers of katakana.

TABLE IV
KATAKANA CHARACTERS AND THEIR PHONETIC REPRESENTATION IN IPA SYMBOLS: MORA CONSONANTS

It should also be noted that there are additional difficulties to comprehending katakana unrelated to the phonological processes involved. In particular, while it is true that the vast majority of loan words are created by the phonological process, foreign borrowings may also be modified by changes in form due to simplification, semantics,