Transfer and Generation Stage of Mu Machine Translation Project
Total Page:16
File Type:pdf, Size:1020Kb
DEALING WITH INCOMPLETENESS OF LINGUISTIC KNOWLEDGE IN LANGUAGE TRANSLATION TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT Makoto Nagao, Toyoaki Nishida and Jun-ichi Tsujii Department of Electrical Engineering Kyoto University Sakyo-ku, Kyoto 606, JAPAN I. INTRODUCTION 2.3 Multiple Relation Structure Linguistic knowledge usable for machine trans- In principle, we use deep case dependency lation is always imperfect. We cannot be free from structure as a semantic representation. Theoreti- the uncertainty of knowledge we have for machine cally we can assign a unique case dependency struc- translation. Especially at the transfer stage of ture to each input sentence. In practice, however, machine translation, the selection of target lan- analysis phase may fail or may assign a wrong guage expression is rather subjective and optional. structure. Therefore we use as an intermediate representation a structure which makes it possible Therefore the linguistic contents of machine to annotate multiple possibilities as well as mul- translation system always fluctuate, and make tiple level representation. An example is shown in gradual progress. The system should be designed to Fig. 2. Properties at a node is represented as a allow such constant change and improvements. This vector, so that this complex dependency structure paper explains the details of the transfer and gen- is flexible in the sense that different interpreta- eration stages of Japanese-to-English system of the tion rules can be applied to the structure. machine translation project by the Japanese Govern- ment, with the emphasis on the ideas to deal with 2.4 Lexicon Driven Feature the incompleteness of linguistic knowledge for machine translation. Besides the transfer and generation rules which involve semantic checking functions, the grammar allows the reference to a lexical item in 2. DESIGN STRATEGIES the dictionary. A lexical item contains its spe- 2.1 Annotated Dependency Structure cial grammatical usages and idiomatic expressions. During the transfer and generation stages, the~e rules are activated with the highest priority. The intermediate representation we adopted as This feature makes the system very flexible for the result of analysis in our machine translation dealing with exceptional cases. The improvement of is the annotated dependency structure. Each node translation quality can be achieved progressively has arbitrary number of features as shown in Fig. i. by adding linguistic information and word usages in This makes it possible to access the constituents the dictionary entries. by more than one linguistic cues. This representa- tion is therefore powerful and flexible for the 2.5 Format-Oriented Description of Dictionary sophisticated grammatical and semantic checking, Entries especially when the completeness of semantic analy- sis is not assured and trial-and-error improvements are required at the transfer and generation stages. The quality of a machine translation system heavily depends on the quality of the dictionary. 2.2 Multiple L~aver Grammar In order to build a machine translation dictionary, we collaborate with expert translators. We develop- We have three conceptual levels for grammar ed a format-oriented language to allow computer- rules. naive human translators to encode their expertise lowest level: default grammar which guarantees the without any conscious effort on programming. Although the format-oriented language we developed output of the translation process. The quality of the translation is not assured. Rules of lacks full expressive power for highly sophisticat- this level apply to those inputs for which no ed linguistic phenomena, it can cover most of the higher layer grammar rules are applicable. common lexical information translators may want to kernel level: main grammar which chooses and gener- describe. The formatted description is automati- ates target language structure according to cally converted into statements in GRADE, a pro- semantic relations among constituents which are gramming language developed by the Mu-Project. We determined in the analysis stage. prepared a manual according to which a man can fill topmost level: heuristic grammar which attempts to in the dictionary format with linguistic data of get elegant translation for the input. Each items. The manual guarantees a certain level of rule bears heuristic nature in the sense that it quality of the dictionary, which is important when is word specific and it is applicable only to many people have to work in parallel. some restricted classes of inputs. 420 (Due %0 the advance of electronic instrumentation, auwmsted ship increases in number.) J-CAT=Verb J-LEX ffi It~ "f ~ (increase) J-DEEP-CASE = MAIN J-GAPffi'(SOUrce GOAl)' J-SEI~WENCE -CONNECTOR = DECLARATIVE J-SENTENCE-RELATION = NIL J.SEh~I'ENCE-END © NIL J-DEEP.TENSE = PRESENT J-DEEP.ASPECT= BeyondTime J.DEEP.MODE = NIL J.VERB.ASPECT ffiTRANSITIVE J-VERB.INT = NO J-VERB-PAT='(~: .~."~' .:~ I "C" :: )' J-VERB-SD ~'(~~ -SUBject T-CAUse ...)' J-NEG = NIL I J-CAT = No'tin J-CAT = Noun J-CAT = Noun J-LEX = ;~ ~(advance ) J-£ZX ffi NIL J-LEX = NIL I J.DEEP-CASE ffi CAUse J-DEEI'-CASE = SOUrce J-DEEP-CASE,ffi GOAl I J.SUT'.FACE-CASE ffi ~'- J-SURFACE-CASE = ~,. ,% J -SURFACE-CASE ='(( ". T" ) (:=))" I -CAT= Noun J-CATfNoun dummy nodes J.LEX ,= g~ Ib~(['.~(antomatad ship) (electronic instrument.stion) J-DEEP-CASE ffi SUBject .DEEP.CASE ffi SUBject J-SURFACE-CASE = ~' :SURFACE-CASE = -'9 J-BKK-LEX = ~: J-NFffiNIL J-DEEP-BFKI-3 = NIL I J-SURFACE-BFKI-3 ,= NIL J-BFK-LEX1-3 ,, NIL J-N,ffi C.,ommonNotm J.SEM,, OM(urthSeinl object) J-NUMBER = NIL .o. Fig. i. Representation of analysis result by features. 3. ORGANIZATION OF GRAMMAR RULES FOR TRANSFER AND GENERATION STAGES 3.1 Heuristic Rule First his work o work .... ] Grammar rules are organized along the princi- ple that "if better rule exists then the system work work uses it; otherwise the system attempts to use a I I [J-LEX = he standard rule: if it fails, the system will use a agent OR possess ~ |J-DEEP-CASE l default rule." The grammar rule involves a number L I -- I = agent OR posessj of stages for applying heuristic rules. Fig. 3 he he shows a processing flow for the transfer and gener- ation stages. Fig. 2. An example of complex dependency structure. Heuristic rules are word specific. GRADE makes it possible to define word specific rules. Such rules can be invoked in many ways. For example, we can associate a word selection rule for an ordinary verb in a dictionary entry for a noun, as shown in Fig. 4. 421 P re-transfer ~ post-transfer 3.2 Pre-transfer Rules loop loop Some heuristic rules are activated just after the in TRANSFER internal standard analysis of a representationterna•l -- ~ representation for Japanese for English Japanese sentence is finish- ed, to obtain a more neutral (or target language oriente~ ++,/ \ +..,,o, analyzed structure. We call such invocation the pre- phrase transfer loop. Semantic and structure,s/ pragmatic interpretation are tree ~ structure transformation done in the pre-transfer loop. The more heuristic MORPHOLOGICAL rules are applied in this SYNTHESIS loop, the better result will be obtained. Figs. 5 and 6 Fig. 3. Processing flow for the transfer and generation stages. show some examples. (a) Activating a Lexical Rule for a Noun "~J~'(effect) from a Governing Verb "+~. + "(give) 3.3 Word Selection in Target Language by Using Semantic Markers J-CAT= Verb J-CAT = Verb J-LEX= ~- ~ ~ (five) TRANSFER Word selection in the ... b. J-/~X = affect target language is a big problem in machine transla- // \ /---\ tion. There are varieties of choices of translation J-CAT=Noum J for a word in the source J-LEX= P~ ~(effect) J language. Main principles J-DEEP-CASE =OBJect, I adopted in our system are, J-N-V-PROG = ~ ~-V-TRANSFER ......... (i) Area restriction by J-N- KOUSETSU = ~ ~-KOUSETSU-TRANSFE R I::2 using field code, such as electrical Engineer- .... -" "''. P'-~- SUBGRAMMAR:~ ~- V-TRANSFER ing, nuclear science, /J /; dealing with c~ses like: medicine, and so on. (2) Semantic code attached *'~ ... /" <VERB>:A~, ~£~ .... to a word in the analy- t I C~ve), (l~ive) ~ ~ected, a~ec~._ sis phase is used for I the selection ofaproper other sub~'amrnars ~ J~(efl'eet ) target language word or a phrase. (3) Sentential structure of the vicinity of a word (b) Form-Oriented Description of a Transfer Rule for a Noun "~J~m~'(effect) to be translated is sometimes effective for /ze( the determination of a I proper word or a phrase -,'~ it! .- in the target language. ~- EFFECT +-~>~ IPE ¢. [ ftl&+t | I'[ I++.~ I) X' Table i shows examples of a part of the verb trans- !~! ~ua3 08; fer dictionary. Selection a~T ooJ t i I = ~ of English verb is done by I~FF~CT)TE "tOO ~0 I the semantic categories of +'+ ..+ +'~,i IFtPptCTITE f./ ~ 2 ) nouns related to the verb. The number i attached to I verbs like form-l, produce- • = ~ ^.c • I ~U=G ~ l AnG + 2 is the i-th usage of the +~.+ i + I verb. When the semantic s. I It 6 information of nouns is not I available, the column indi- cated by ~ is applied to 3} i: -! Fig. 4. Lexicon-oriented |! invocation of grammar ;J. ! rules. 422 I J-CAT= N0un { J-CAT=Noun produce a default translation. J-LEX = ~ in~.¢expression) In most cases, we can use a fixed format for describing a translation rule I for lexical items. We developed a num- J.CAT=Verb "~ J-CAT = Al)Jamtive ber of dictionary formats specially designed for the ease of dictionary in- J-LEX = ~" ~,Ido not have) {J-LEX = ~ ~"~ ~ Cmeaning{ess) put by computer-naive expert translators. The expressive power of format- oriented description is, however, insuf- ficient for a number of common verbs J-CAT = Nouz 1 such as "~ ~ " (make, do, perform ...