DEALING WITH INCOMPLETENESS OF LINGUISTIC KNOWLEDGE IN LANGUAGE TRANSLATION TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT

Makoto Nagao, Toyoaki Nishida and Jun-ichi Tsujii Department of Electrical Engineering Kyoto University Sakyo-ku, Kyoto 606, JAPAN

I. INTRODUCTION 2.3 Multiple Relation Structure

Linguistic knowledge usable for machine trans- In principle, we use deep case dependency lation is always imperfect. We cannot be free from structure as a semantic representation. Theoreti- the uncertainty of knowledge we have for machine cally we can assign a unique case dependency struc- translation. Especially at the transfer stage of ture to each input sentence. In practice, however, machine translation, the selection of target lan- analysis phase may fail or may assign a wrong guage expression is rather subjective and optional. structure. Therefore we use as an intermediate representation a structure which makes it possible Therefore the linguistic contents of machine to annotate multiple possibilities as well as mul- translation system always fluctuate, and make tiple level representation. An example is shown in gradual progress. The system should be designed to Fig. 2. Properties at a node is represented as a allow such constant change and improvements. This vector, so that this complex dependency structure paper explains the details of the transfer and gen- is flexible in the sense that different interpreta- eration stages of Japanese-to-English system of the tion rules can be applied to the structure. machine translation project by the Japanese Govern- ment, with the emphasis on the ideas to deal with 2.4 Lexicon Driven Feature the incompleteness of linguistic knowledge for machine translation. Besides the transfer and generation rules which involve semantic checking functions, the grammar allows the reference to a lexical item in 2. DESIGN STRATEGIES the dictionary. A lexical item contains its spe- 2.1 Annotated Dependency Structure cial grammatical usages and idiomatic expressions. During the transfer and generation stages, the~e rules are activated with the highest priority. The intermediate representation we adopted as This feature makes the system very flexible for the result of analysis in our machine translation dealing with exceptional cases. The improvement of is the annotated dependency structure. Each node translation quality can be achieved progressively has arbitrary number of features as shown in Fig. i. by adding linguistic information and word usages in This makes it possible to access the constituents the dictionary entries. by more than one linguistic cues. This representa- tion is therefore powerful and flexible for the 2.5 Format-Oriented Description of Dictionary sophisticated grammatical and semantic checking, Entries especially when the completeness of semantic analy- sis is not assured and trial-and-error improvements are required at the transfer and generation stages. The quality of a machine translation system heavily depends on the quality of the dictionary. 2.2 Multiple L~aver Grammar In order to build a machine translation dictionary, we collaborate with expert translators. We develop- We have three conceptual levels for grammar ed a format-oriented language to allow computer- rules. naive human translators to encode their expertise lowest level: default grammar which guarantees the without any conscious effort on programming. Although the format-oriented language we developed output of the translation process. The quality of the translation is not assured. Rules of lacks full expressive power for highly sophisticat- this level apply to those inputs for which no ed linguistic phenomena, it can cover most of the higher layer grammar rules are applicable. common lexical information translators may want to kernel level: main grammar which chooses and gener- describe. The formatted description is automati- ates target language structure according to cally converted into statements in GRADE, a pro- semantic relations among constituents which are gramming language developed by the Mu-Project. We determined in the analysis stage. prepared a manual according to which a man can fill topmost level: heuristic grammar which attempts to in the dictionary format with linguistic data of get elegant translation for the input. Each items. The manual guarantees a certain level of rule bears heuristic nature in the sense that it quality of the dictionary, which is important when is word specific and it is applicable only to many people have to work in parallel. some restricted classes of inputs.

420 (Due %0 the advance of electronic instrumentation, auwmsted ship increases in number.)

J-CAT= J-LEX ffi It~ "f ~ (increase) J-DEEP-CASE = MAIN J-GAPffi'(SOUrce GOAl)' J-SEI~WENCE -CONNECTOR = DECLARATIVE J-SENTENCE-RELATION = NIL J.SEh~I'ENCE-END © NIL J-DEEP.TENSE = PRESENT J-DEEP.ASPECT= BeyondTime J.DEEP.MODE = NIL J.VERB.ASPECT ffiTRANSITIVE J-VERB.INT = NO J-VERB-PAT='(~: .~."~' .:~ I "C" :: )' J-VERB-SD ~'(~~ -SUBject T-CAUse ...)' J-NEG = NIL

I J-CAT = No'tin J-CAT = J-CAT = Noun J-LEX = ;~ ~(advance ) J-£ZX ffi NIL J-LEX = NIL I J.DEEP-CASE ffi CAUse J-DEEI'-CASE = SOUrce J-DEEP-CASE,ffi GOAl I J.SUT'.FACE-CASE ffi ~'- J-SURFACE-CASE = ~,. ,% J -SURFACE-CASE ='(( ". T" ) (:=))"

I -CAT= Noun J-CATfNoun dummy nodes J.LEX ,= g~ Ib~(['.~(antomatad ship) (electronic instrument.stion) J-DEEP-CASE ffi SUBject .DEEP.CASE ffi SUBject J-SURFACE-CASE = ~' :SURFACE-CASE = -'9 J-BKK-LEX = ~: J-NFffiNIL J-DEEP-BFKI-3 = NIL I J-SURFACE-BFKI-3 ,= NIL J-BFK-LEX1-3 ,, NIL J-N,ffi C.,ommonNotm J.SEM,, OM(urthSeinl object) J-NUMBER = NIL .o.

Fig. i. Representation of analysis result by features.

3. ORGANIZATION OF GRAMMAR RULES FOR TRANSFER AND GENERATION STAGES

3.1 Heuristic Rule First

his work o work .... ] Grammar rules are organized along the princi- ple that "if better rule exists then the system work work uses it; otherwise the system attempts to use a I I [J-LEX = he standard rule: if it fails, the system will use a agent OR possess ~ |J-DEEP-CASE l default rule." The grammar rule involves a number L I -- I = agent OR posessj of stages for applying heuristic rules. Fig. 3 he he shows a processing flow for the transfer and gener- ation stages.

Fig. 2. An example of complex dependency structure. Heuristic rules are word specific. GRADE makes it possible to define word specific rules. Such rules can be invoked in many ways. For example, we can associate a word selection rule for an ordinary verb in a dictionary entry for a noun, as shown in Fig. 4.

421 P re-transfer ~ post-transfer 3.2 Pre-transfer Rules loop loop Some heuristic rules are activated just after the in TRANSFER internal standard analysis of a representationterna•l -- ~ representation for Japanese for English Japanese sentence is finish- ed, to obtain a more neutral (or target language oriente~ ++,/ \ +..,,o, analyzed structure. We call such invocation the pre- phrase transfer loop. Semantic and structure,s/ pragmatic interpretation are tree ~ structure transformation done in the pre-transfer loop. The more heuristic MORPHOLOGICAL rules are applied in this SYNTHESIS loop, the better result will be obtained. Figs. 5 and 6 Fig. 3. Processing flow for the transfer and generation stages. show some examples.

(a) Activating a Lexical Rule for a Noun "~J~'(effect) from a Governing Verb "+~. + "(give) 3.3 Word Selection in Target Language by Using Semantic Markers J-CAT= Verb J-CAT = Verb J-LEX= ~- ~ ~ (five) TRANSFER Word selection in the ... b. J-/~X = affect target language is a big problem in machine transla- // \ /---\ tion. There are varieties of choices of translation J-CAT=Noum J for a word in the source J-LEX= P~ ~(effect) J language. Main principles J-DEEP-CASE =OBJect, I adopted in our system are, J-N-V-PROG = ~ ~-V-TRANSFER ...... (i) Area restriction by J-N- KOUSETSU = ~ ~-KOUSETSU-TRANSFE R I::2 using field code, such as electrical Engineer- .... -" "''. P'-~- SUBGRAMMAR:~ ~- V-TRANSFER ing, nuclear science, /J /; dealing with c~ses like: medicine, and so on. (2) Semantic code attached *'~ ... /" :A~, ~£~ .... to a word in the analy- t I C~ve), (l~ive) ~ ~ected, a~ec~._ sis phase is used for I the selection ofaproper other sub~'amrnars ~ J~(efl'eet ) target language word or a phrase. (3) Sentential structure of the vicinity of a word (b) Form-Oriented Description of a Transfer Rule for a Noun "~J~m~'(effect) to be translated is sometimes effective for /ze( the determination of a I proper word or a phrase -,'~ it! .- in the target language. ~- EFFECT +-~>~ IPE ¢. [ ftl&+t | I'[ I++.~ I) X' Table i shows examples of a part of the verb trans- !~! ~ua3 08; fer dictionary. Selection a~T ooJ t i I = ~ of English verb is done by I~FF~CT)TE "tOO ~0 I the semantic categories of +'+ ..+ +'~,i IFtPptCTITE f./ ~ 2 ) related to the verb. The number i attached to I like form-l, produce- • = ~ ^.c • I ~U=G ~ l AnG + 2 is the i-th usage of the +~.+ i + I verb. When the semantic s. I It 6 information of nouns is not I available, the column indi- cated by ~ is applied to

3} i:

-! Fig. 4. Lexicon-oriented |! invocation of grammar ;J. ! rules.

422 I J-CAT= N0un { J-CAT=Noun produce a default translation. J-LEX = ~ in~.¢expression) In most cases, we can use a fixed format for describing a translation rule I for lexical items. We developed a num- J.CAT=Verb "~ J-CAT = Al)Jamtive ber of dictionary formats specially designed for the ease of dictionary in- J-LEX = ~" ~,Ido not have) {J-LEX = ~ ~"~ ~ Cmeaning{ess) put by computer-naive expert translators.

The expressive power of format- oriented description is, however, insuf- ficient for a number of common verbs J-CAT = Nouz 1 such as "~ ~ " (make, do, perform .... ) J-LEX = ~'~(sense) and "~ ~ " (become, consist of, provide, J.-DEEP-CASE = SUBject ...) etc. In such cases, we can encode transfer rules directly by GRADE. An { example is shown in Fig. 7. Varieties J-CAT= Noun of usages are to be listed up with their corresponding English sentential struc- --J J-LEX = NIL tures and semantic conditions.

3.4 Post-Transfer Rules

=expre~ion which d~s not have sere" ~ "meaningle~ expre~ion" The transfer stage bridges the gap between Japanese and English expressions. Fig. 5. An example of a heuristic rule used in the There are still many odd structures pre-transfer loop. after this stage, and we have to adjust further more the English internal repre- sentation into more natural ones. We call this part as post-transfer loop. An example is given in Fig. 8, where a logarithmic have integral Japanese factitive verb is first trans- characteristics equation ferred to English "make", and then a integral integral structural change is made to eliminate equation equation it, and to have a more direct expression. l { have \~ with 4. GENERATION PROCESS

integral logarithmic logarithmic 4.1 Translation of Japanese equation characteristics characteristics Postpositions

Postpositions in Japanese general- c0nductivity give effect ly express the case slots for verbs. A postposition, however, has different effect effect usages, and the determination of English prepositions for each postposition is quite difficult. It also depends on the give / verb which governs the noun phrase hav- ing that postposition.

? effect conductivity (REC: recipient) Table 2 illustrates a part of a default table for determining deep and (3) ADJ [~ { Sl surface case labels when no higher level rule applies. This sort of tables are ~> :many defined for all case combination. In Xl ~ X2 ~i X2 ~>-~:few this way, we confirm at least one trans- ! lation to be assigned to an input. A ADJ ~ :be, exist,.. particular usage of preposition for a (to be determined SUB at transfer step) particular English verb is written in I the lexical entry of the verb. X 1 4.2 Determination of Global Sentential 14) ~DSI (~,~) Structures in Target Language - . ~(+tend tO)

I z /A ~ ~ :there exist ~/~ ~ :tendency Fig. 6. Examples of pre-transfer rules.

423 non-living substance form-I structure form X(obj) social phenomena take place X take place action,deed,movement occur-i reaction X occur standard,property state,condition arise-i X arise relation produce-2 produce X non-living substance structure form-I X form Y phenomena,action cause-i X cause Y produce-2 x produce Y property improve-i X improve Y measure increase-2 X increase Y raise-i X raise Y Semantic marker for X/Y

Table i. Word selection in target language by using semantic markers.

Grobal sentential structures of Japanese and English are quite different, and correspondingly ~ (NARO) NARU[J-VP=VI] consist of the internal structure of a Japanese sentence is (1) A ~rS ~'~ ~. /\ ,. /\ not the same as that of English. Fundamental A B A B difference from Japanese internal representation (SUB) (COM) (OBJ) (COM) to that of English is absorbed at the (pre-, post -) transfer stages. But at the stage of English (2) A,~'[3 I:- .~u~. NARU [J-VP=V2 ] generation, some structural transformations are /\ still required in such oases as (a) embedded sentential structure, (b) complex sentential A B (suB) (GOAL) structure. provide : : __._.> [B. J-SEM=CE] )" // X We classified four kinds of embedded senten- tial structures. CE:means, equipment A B (AGT) (OBJ) (i) a case slot of an embedded sentence is vacant, reach and the noun modified by the embedded sentence == % ~r comes to fill the slot. MU :unit A B (~)The form like "NI~" V ~ N2" m " (N 2 ~ NI~'V ) (OBJ) (STO) N2". In this case the noun N I must have the semantic properties like parts, attributes, "B. J-CAT=ADJ ~ bzec°~e and action. := }, J-LEX= %'~ (easy) |~ / k I=<~(diffi- I A B (~i~)The third and the fourth classes are particular cult) J (OBJ) (GOAL) embedded expressions in Japanese, which have the connecting expressions like "~ " (in turn the case of), " ~9~ " (in the way that, [B. J-SEM=IT, IC] ~ / "g~,P " (in that), and so on. IT :theory ,method A B IC :conceptual (OBJ) (GOAL) object get An example of the structural transformation B :complement marke I is shown in Fig. 9. The "vhy..." l: i' is generated after the structural transformation. (OBJ) (GOAL) Connection of two sentences in the compound become and complex sentences is done according to Table ,default] .." / X 3. An example is given in Fig. i0. A B (OBJ) (GOAL) 4.3 The Process of Sentence Generation in English (3) dictionary rules After the transfer is done from the Japanese help become b give deep dependency structure to the English one, conversion is done to a phrase structure tree with double become :. double all the surface words attached to the tree. The processes explained in 4.1 and 4.2 are involved at cause become ~. A causes B this generation stage. The conversion is perform- ed top-down from the root node of the dependency tree to the leaf. Therefore when a governing verb Fig. 7. An example of dictionary transfer rules demands a noun phrase expression or a to- of popular verbs. expression to its dependent phrase, the structural change of the phrase must be performed. Noun to verb transformation, and noun to

424 (transfer) (post-transfer) ~ ~ ~ make > C'

! A B C A B C A C 1SUB I /1 B B / I (C': derived (C:intransitive (consultation to derived from C) verb) lexieal item C)

A~I"8tI~ ---~A make B rotate > A rotate B

Fig. 8. An example of post-transfer rule application.

J-SURFACE-CASE J-DEEP-CASE E-DEEP-CASE Default Preposition

~: (ni) RECipient REG. BENeficiary to (REC- to, BEN -- for) ORigin ORI from PARticipant PAR with

TIMe Time-AT in ROLe ROL as GOAl GOA ~o

Table 2. Default rule for assigning a case label of English to a Japanese postposition " l~ " (ni).

JAPANESE ENGLISH he school resign reason SENTENTIAL SENTENTIAL N 1 N 2 V N 3 CONNECTIVE DEEP-CASE CONNECTIVE [ANALYSIS] reason(N3), \ RENYO TOOL BY -ING . . resign(V) (-SHI) TE TOOL BY -ING .. / RENYO CAUSE BECAUSE . . -~ / m I "~'~O e ¢! /o I ~e, (-SHI) TE l school(N 2 ) reason - TAME s! -NODE hei -KARA -TO TIME WHEN . . [TRANSFER] -TOKI 73 ,PROP. CAUSE i -TE -TAME PURPOSE SO-THAT-MAY -NONI II -YOU to N 1 N 2 (N 3) -YOU MANNER AS-IF -KOTONAKU WITHOUT -ING .. [GENERATION] NP -NACAP~, ACCOMPANY WHILE -ING .. -BA CIRCUMSTANCE WHEN . . N 3 RELCL ,°., REL/~V S , Table 3. Correspondence of sentential connectives. why

Fig. 9. Structural transformation of an embedded sentence of type 3.

425 (a) (b) earthquake building collapse

,ANALYSIS] ~i [i collapse destroy TAMENI --->V2 YOUNI --~V. (PURPOSE) (PURPOSE)[ z X building earthquake earthquake building = The buildings collapsed [CPO:causal potency] [TRANSFER] yl [I due to the earthquake. = The earthquake destroyed the ~-- > V 2 ~ V? SO-THAT-~AY SO-THAT-MAY " buildings. (PURPOSE) (PURPOSE) X

Fig. ii An example of structural transformation [GENERATION] S S in the generation phase.

V 1 INF V 1 SUB 5. SUMMARY TO V 2 CONJ S T This paper described a number of strategies IN-ORDER-TO X AUX V 1 we employed in the transfer and generation stages of our Mu system to make the system both powerful MAY and fault-tolerant. As is mentioned above, our system has many advantages such as the flexibility Fig. i0. Structural transformation of of the generation process, the utilization of an embedded sentence. strong lexical information. The system is in the course of development in collaboration with a num- ber of computer scientists from computer industries transformation are often required due to the differ- and expert translators. Some of the translation ence of expressions in Japanese and English. This results are attached in the last, which show the process goes down from the root node to all the present level of the translation system. Progres- leaf nodes. sive improvement is expected in the next two years.

After this process of phrase structure genera- tion, some sentential transformations are performed ACKNOWLEDGEMENTS such as follows. We acknowledge the members of the Mu-Project, ( i ) When an agent is absent, passive transforma- especially, Mr. S. Takai(JCS), Mr. Y. Fukumochi tion is applied. (Sharp Co.), Mr. T. Ishioka(JCS), Miss M. Kume ( ii ) When the agent and object are both missing, (JCS), Mr. H. Sakamoto(Oki Co.), Mr. A. Kosaka the predicative verb is nominalized and (NEC Co.), Mr. H. Adachi(Toshiba Co.), Miss A. placed as the subject, and such verb phrases Okumura(Intergroup), and Miss A. Okuda(Intergroup) as "is made", and "is performed" are supple- who contributed greatly for the implementation of mented. the system. (iii) When a subject phrase is a big tree, the anticipatory subject "it" is introduced. ( iv ) Pronominalization of the same subject nouns REFERENCES is done in compound and complex sentences. ( v ) Duplication of a head noun in the conjunctive [i] M. Nagao: Machine Translation Project of the noun phrase is eliminated, such as, "uniform Japanese Government, a paper presented at the component and non-uniform component" > workshop between EUROTRA and Japanese machine "uniform and non-uniform components". translation experts, held in Brussels on (vi) Others. November 24-25, 1983. [2] J. Nakamura, et al.: Grammar Writing System Another big structural transformation required (GRADE) of Mu-Machine Translation Project and comes from the essential difference between DO- its Charactersitics, Proc. of COLING 84, 1984. language (English) and BE-language (Japanese). In [3] J. Tsujii, et al.: Analysis Grammar of English the case slots such as tools, cause/reason, Japanese in the Mu-Project -- A Procedural and some others come to the subject position very Approach to Analysis Grammar --, ibid. often, while in Japanese such expressions are never [4] Y. Sakamoto, et al.: Lexicon Features for used. The transformation of this kind is incorpo- Japanese Syntactic Analysis in Mu-Project- rated in the generation grannnar such as shown in JE, ibid. Fig. ii, and produces more English-like expressions. [5] J. Tsujii: The transfer Phase in an English- This stylistic transformation part is still very Japanese Translation System, Proc. of COLING primitive. We have to accumulate much more linguis- 82, 1982. tic knowledge and lexical data to have more satis- fiable English expressions. Sample outputs as of April, 1984 are attached in the next page.

426 o

0

U

e~

"o

¢

x

v,l 1)

E o ~.o 4 ® ~ g ¢ N g~

':'~ i/i ~-, ~': ~ ~'

"o

- g

0 ".'.'I ® T ~ °

o) 0 0

• Q. oJ tU .4.4 0 o

c v,. -E--- _ U "="+ i ~.~ m m

e:~ N o

X .~ ~ .~ 0 --- ¢1

+.-~ o o o. -~$ o., ~" ~ -~'- .Q o [] ~;.~ .~ ~: °.~ ~3 o o

°~- -o~ "- " [ .--,=--+ ~ ~' ~ ~= ~ .j~ .+ .C

o*,.> ~.~ o o ~,L, ~.:- ~ ~ o~ o~

o ~:~ ~ ~ u ~o

uJ,~

427