Full-text processing: improving a practical NLP system based on surface information within the context

Tetsuya Nasukawa, IBM Research, Tokyo Research Laboratory, 1623-14, Shimotsuruma, Yamato-shi

Abstract

Rich information for resolving ambiguities in sentence analysis, including various context-dependent problems, can be obtained by analyzing a simple set of parsed trees of each sentence in a text, without constructing a precise model of the context through deep semantic analysis. Thus, processing a group of sentences together makes it possible to improve the accuracy of a [...] machine translation [...]. In this paper, we describe a simple context model consisting of parsed trees of each sentence in a text, and its effectiveness for handling various problems in NLP, such as the resolution of structural ambiguities, pronoun referents, and the focus of focusing subjuncts (e.g. also and only), as well as for adding supplementary phrases to some elliptical sentences.

1 Introduction

[...] key technology for improving the accuracy of text [...] world knowledge, and deep inference mechanisms. It is true that we can always find examples of problems that require common sense and inference mechanisms, such as the classic problems mentioned in (Charniak, 1973), in which the referents of pronouns are not [...]. [...] observe that many context-dependent problems [...] are solvable without the use of a deep inference mechanism or carefully hand-coded data such as scripts (Schank and Riesbeck, 1981). We therefore tried to develop a practical method that would solve most context-dependent problems and improve the accuracy of text analysis by using a simple mechanism and existing machine-readable data.

To begin with, we developed a framework for processing all sentences in a text simultaneously, so that each sentence can be disambiguated by using information extracted from other sentences within the same text. Without constructing a precise model of the context through deep semantic analysis, our framework refers to a set of parsed trees (results of syntactic analysis) of each sentence in the text as context information. Thus, our context model consists of parsed trees that are obtained by using an existing general syntactic parser. Except for information on the sequence of sentences, our framework does not consider any discourse structure such as the discourse segments, focus space stack, or dominant hierarchy [...] (Grosz and Sidner, 1986). Therefore, our [...] approaches to context processing, and [...] obtaining a perfect analysis. However, by extending the unit of the processing object from one sentence to multiple sentences in a source text, and by using syntactic information on all the other words in the whole text, such as modifiee-modifier relationships and their positions in the text, our framework improves the overall accuracy of a natural language processing system.

We implemented this framework on an English-to-[...] computer manuals [...] result of sense disambiguation in one sentence [...] with all the words in a discourse that share the [...]. We can obtain clues for determining the modifiees of structurally ambiguous phrases from structural information on all [...] with the same lemma within the discourse. Moreover, processing a whole text at one time makes it possible to refer to other information, such as word frequency and the position of each word, which can be used for resolving pronoun reference and the focus of focusing subjuncts such as also and only.

In this paper, we describe a robust context-processing method, namely, full-text processing, focusing on its effects on the output of a machine translation system. In the next section, we briefly describe the framework of our method, which uses a simple context model; then, in the following sections, we illustrate its effectiveness with some actual outputs of our English-to-Japanese machine translation system.

2 Framework

Full-text processing consists of three steps:

1. Generating a context model that consists of parsed trees of each sentence in a source text
2. Refining the context model by assigning a single unified parse tree to each sentence in the text
3. Resolving the problems in each sentence in the context model

[...] of data on multiple sentences in a text, it is essential to construct some context model; the first step of the full-text processing method is therefore to construct a context model by analyzing each sentence in an input text. To avoid any errors that may occur during transformation into any other representations, such as a logical representation, we stayed with surface structures, and to preserve the robustness of this framework, we used only a set of parsed trees as a context model. Thus, each sentence of an input text is processed by a syntactic parser in the first step, and the position of each instance of every lemma, its morphological information, and its modifiee-modifier relationships with other content words are extracted from the parser output and stored to construct a context model, as shown in Figure 1. In addition, if any on-line knowledge resources are available, information extracted from the resources is also stored in the context model. For example, information on synonyms extracted from an on-line thesaurus dictionary and information on word sense and structural disambiguation extracted from an example base, such as [...]

[...] such a sentence. Information extracted from complete parses of well-formed sentences in a context model can be used to complete incomplete parses, in the form of partially parsed chunks that a bottom-up parser outputs for ill-formed sentences, by using a previously described method (Nasukawa, 1995).

On the other hand, for some sentences in a text, such as Time flies like an arrow, a syntactic parser may generate more than one parse tree, owing to the presence of words that can be assigned to more than one part of speech, or to the presence of complicated coordinate structures, or for various other reasons. In attempting to select the correct parse of such a sentence, one can use the types of the previous and subsequent [...]

  Context = {Sentence 1, Sentence 2, ..., Sentence n}
  Sentence i = {Word i-1, Word i-2, ..., Word i-j}

  Sentence 1: John likes apples.
    Word1-1 [John]     POS: N   BASE: John ...
    Word1-2 [likes]    POS: V   BASE: like ...
    Word1-3 [apples]   POS: N   BASE: apple ...
  Sentence 2: Tom also likes apples.
    Word2-1 [Tom]      POS: N   BASE: Tom ...
    Word2-2 [also]     ...
    [...]
    Word3-3 [likes]    POS: V   BASE: like ...
    Word3-4 [oranges]  POS: N   BASE: orange ...

Figure 1: Example of a context model

[...] the context model is as follows:

1. In each candidate parse of a sentence with multiple candidate parses, assign a score for each modifier-modifiee relationship that is found in the context model, and add up the scores to assign a preference value to the candidate parse.
2. Select the parse or parses with the highest preference value. If more than one parse has the highest preference value, go to the next step with those parses; otherwise, leave this procedure.
3. Assign a preference value to each remaining candidate parse that has the same type of root node (such as noun phrase, verb phrase, or sentence) as the parse of the preceding sentence or the next sentence.
4. Select the parse or parses with the highest preference value. If more than one parse has the highest preference value, go to the next step with those parses; otherwise, leave this procedure.
5. Assign a preference value to each remaining candidate parse based on heuristic rules that assign scores to structures according to their grammatical preferability.
6. Select the parse or parses with the highest preference value. If more than one parse has the highest preference value, select the first parse in the list of the remaining candidate parses.

The procedure of completing partial parses of an ill-formed sentence consists of two steps:

1. Inspecting and restructuring of each partial parse
   The part of speech and the modifiee-modifier relationships with other words are inspected for each word in a partial parse. If the part of speech and the modifiee-modifier relationships with other words are different from those in the context model, the partial parse is restructured according to the information in the context model.
2. Joining of partial parses
   If the partial parses were not unified into a single structure in the previous step, they are joined together on the basis of modifier-modifiee relationship patterns in the context model so that a unified parse is obtained.

2.3 Problem resolution for each sentence in the context model

Finally, in the third step, each sentence in the context model is analyzed individually, and its ambiguities and context-dependent problems are resolved by referring to information on other sentences in the context model. The next section describes the procedures for problem resolution, and explains their effectiveness in improving machine translation output.

3 Effectiveness

The accuracy of syntactic analysis may be improved by refinement of the context model in the second step of the procedure. For example, in an experiment on 244 sentences from a chapter of a computer manual, in which we attempted to select the correct parse of a sentence from multiple candidate parses, correct parses were selected for 89.1% of 110 multiple parsed sentences by using information in the context model, whereas the success rate obtained when the context model contained no information was 74.5%. In our experiment on ill-formed sentences in technical documents, in more than half of the incompletely parsed sentences, the partial parses were joined into a single structure by using information in the context model.

However, after the second step, ambiguities in each sentence are kept unresolved in the context model. Thus, we need to resolve problems in each sentence in the context model individually. In this section, we describe how the accuracy of sentence analysis in other problems is improved by referring to the simple context model, and how the results are reflected in improved outputs.

3.1 Resolving the focus of focusing subjuncts

Resolving the focus of focusing subjuncts such as also and only is a typical context-dependent problem that requires information on the previous context. Focusing subjuncts draw attention to a part of a sentence that often represents new information. Consider the second sentence, Tom also likes apples, in Figures 1 and 2. In this sentence, the scope of also can be Tom, likes, the entire predicate (the whole sentence except the subject Tom), or apples, according to the previous context. In this case, the preceding sentence, John likes apples, has the structure A likes B, whereas sentence (2) has the structure X also likes B, where B and the predicate likes are identical. The comparison of these two structures indicates that the new information X (Tom) is the scope of also in sentence (2).

The focus of focusing subjuncts is resolved by means of the following algorithm:

1. Find among the previous sentences in the context model one that contains expressions morphologically identical with those in the sentence containing the focusing subjunct.
2. Compare each candidate focus word or phrase in the sentence containing the focusing subjunct with words or phrases in the sentence extracted in step 1.
3. Drop any morphologically identical words or phrases as candidates for the focus, and select the remainder as the focus of the focusing subjunct. If more than one candidate remains, take the default interpretation that would be used if there were no context information.

Figure 2 shows the translation outputs of our system with and without information provided by context processing. As shown in this figure, without the context information, also modifies the predicate like by default in both sentences (2) and (3). In contrast, when context processing is applied, the focus of also is determined to be Tom in sentence (2) and orange in sentence (3).

In our analysis of computer manuals, most nouns were repeated with the same expressions unless they were replaced by pronouns or definite expressions such as this, that, and the. On the other hand, predicates were sometimes repeated with different expressions. For example:

  A has B. → A also includes C.
  A contains B. → C is also included in A.
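The three-step algorithm for the focus of a focusing subjunct can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our system: sentences are reduced to flat lists of lemmas, only single words (not phrases or the whole predicate) are candidate foci, the nearest matching previous sentence is the one used in step 1, and a pronoun is assumed to have been replaced by its referent beforehand. The function name `resolve_focus` and its parameters are hypothetical.

```python
def resolve_focus(previous, current, subjunct="also", default="like"):
    """Return the focus of `subjunct` in `current`.

    previous: earlier sentences, each a list of lemmas, oldest first
    current:  lemmas of the sentence containing the focusing subjunct
    default:  interpretation used when the context does not decide
    """
    candidates = [lemma for lemma in current if lemma != subjunct]

    # Step 1: find the nearest previous sentence that shares lemmas
    # with the current one (an assumption; any matching sentence could be used).
    for sentence in reversed(previous):
        shared = set(sentence) & set(candidates)
        if not shared:
            continue
        # Steps 2 and 3: drop the identical lemmas; the rest is new information.
        remaining = [lemma for lemma in candidates if lemma not in shared]
        if len(remaining) == 1:
            return remaining[0]
        break  # none or several candidates remain: use the default
    return default


s1 = ["John", "like", "apple"]
s2 = ["Tom", "also", "like", "apple"]
# Sentence (3), "He also likes oranges", with He already resolved to Tom.
s3 = ["Tom", "also", "like", "orange"]

print(resolve_focus([s1], s2))      # focus of "also" in sentence (2)
print(resolve_focus([s1, s2], s3))  # focus of "also" in sentence (3)
print(resolve_focus([], s2))        # no context: default interpretation
```

With the example sentences of Figure 1, the sketch selects Tom for sentence (2) and orange for sentence (3), and falls back to the predicate like when no context is available, which matches the behaviour described above.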

  (1) John likes apples. [With and without context]
  (2) Tom also likes apples. [Without context] [With context]
  (3) He also likes oranges. [Without context] [With context]
  [Dependency structures and Japanese translations are not recoverable from the source.]

Figure 2: Example of translation (I)

For predicates that are repeated with different expressions in this way, information on synonyms and derivatives extracted from on-line dictionaries can be used to examine the correspondence between two words.

3.2 Resolving pronoun referents

Pronoun resolution is another typical context-dependent problem, since the referent of a pronoun is not always included in the same sentence. Our context model is used to select candidate noun phrases for a pronoun referent. Furthermore, information on word frequency and modifier-modifiee relationships extracted from the context model improves the accuracy with which the correct referent is selected from the candidate noun phrases, as shown in a previous paper (Nasukawa, 1994). By applying heuristic rules according to which a candidate that has been frequently repeated in the preceding sentences and a candidate that modifies the morphologically identical predicates as the pronoun in the same context are preferred, we obtained a success rate of [...] in pronoun resolution.

However, the results of pronoun resolution may not be explicitly reflected in the output of a machine translation system, since most languages have corresponding anaphoric expressions, and use of the corresponding anaphoric expression in the translation output has the advantage of avoiding misinterpretations caused by misresolution of pronoun referents, even if the probability of misinterpretation is less than 10%. Thus, in Figure 2, He in sentence (3) is translated as the Japanese pronoun kare, although its referent is correctly resolved as Tom. Even so, correct resolution of a pronoun referent is important for disambiguating the word sense of a predicate modified by the pronoun.(3) In addition, if the positions of a pronoun and its referent noun phrase are reversed in the translation of a complex sentence where an initial main clause in a source-language sentence comes after the subordinate clause in the target language, the referent noun phrase should be exchanged with the pronoun, to avoid cataphoric reference. For example, the English sentence

  The dog will eat your cake, if you don't have it quickly,

should be translated as

  Kimi [you] ga sono keiki [the cake] wo [...] nara, sono inu [the dog] ga [...]

Since in the translated Japanese sentence the subordinate clause, if you don't have it quickly, comes before the main clause, The dog will eat your cake, the pronoun it in the subordinate clause must be resolved in order to generate a natural Japanese sentence. Moreover, the word sense of have in the subordinate clause cannot be selected without information on the referent of the pronoun it.

(3) In fact, the result of pronoun resolution for sentence (3) of Figure 2, in which Tom is selected as the referent of He, is reflected in the translation of the predicate like. Because of the lack of a semantic feature human for the lexical entries Tom and John in our dictionary at the time of this translation, different word senses for animate subjects and non-animate subjects were selected for the verb like, and the verb like was rendered differently in the translations with and without context.

3.3 Lexical and structural disambiguation

In a consistent text, polysemous words within a discourse tend to have the same word sense (Gale et al., 1992; Nasukawa, 1993). Thus, by applying discourse constraint in such a manner that polysemous words with the same lemma within a context have the same

word sense, a result of word sense disambiguation applied in one sentence can be shared with all other words in the context that have the same lemma. Furthermore, by assuming discourse preference, namely, a tendency for each word to modify or be modified by similar words within a discourse, structural information on all other words with the same lemma within the discourse provides a clue for determining the modifiees of structurally ambiguous phrases (Nasukawa and Uramoto, 1995). This method can be used to solve context-dependent problems such as the well-known example shown in Figure 3.

  (1) John saw a girl with a telescope.
      [Without context] Translation: John ha bouenkyou niyotte shoujo wo mimashita.
      [With context]    Translation: John ha bouenkyou wo motsu shoujo wo mimashita.
  (2) The girl with a telescope was walking on the street.
      [With and without context] Translation: Bouenkyou wo motsu shoujo ha toori de aruite imashita.
  [Dependency structures and Japanese-script translations are not recoverable from the source.]

Figure 3: Translation with context (II)

In sentence (1) of the figure, the modifiee of the prepositional phrase with a telescope can be either saw or girl, depending on its context. In this case, information in sentence (2), where the identical prepositional phrase modifies girl, provides a clue that with a telescope in sentence (1) is likely to modify girl. In this way, modifier-modifiee relationships extracted from a context model provide clues for disambiguating structurally ambiguous phrases. Needless to say, the effectiveness of this method is highly dependent on the [...] the ambiguous prepositional phrase of a job in [...] using the information provided by the unambiguous prepositional phrase in The flow of a job in sentence (7). Similarly, the information on the unambiguous prepositional phrase in placed on an output queue in sentence (11) disambiguates the ambiguous prepositional phrase on a job queue in sentence (9), allowing it to be attached to places.

3.4 Supplementing phrases for elliptical sentences

Supplementation of elliptical phrases is another typical context-dependent problem. In spite of the simplicity of our context model, some elliptical phrases can be supplemented by using information extracted from the context model. For example, if a group of words ending with a colon is not a complete sentence, as in the case of (3) in Figure 4, This allows you to:, our system adds either do the following or the following by referring to the type of the next sentence or phrase in the context model. If verb phrases follow, do the following is added, and if noun phrases follow, the following is added. Thus, in (3) in Figure 4, do the following is added because a verb phrase follows this sentence.

3.5 Resolving modality

The modality of itemized sentences or phrases is often ambiguous as a result of the presence of ellipses. For example, (4), (5), and (6) in Figure 4 could be imperative sentences in certain contexts. In this case, however, they are itemized phrases, and by reference to (3), they can be identified as supplementary verb phrases to be attached to (3). Thus our system analyzes them as verb phrases and nominalizes them in the translation.

4 Discussion

We have described how a simple context model that consists merely of a set of parsed trees of each sentence in a text provides rich information for resolving ambiguities in sentence analysis and various context-dependent problems. The greatest advantage of our context-processing method is its robustness. Storing information on a large number of sentences requires a relatively large memory space, which has become available as a result of progress in hardware technology. Our framework is highly practical, since it does not require any knowledge resources that have been specially hand-coded for context processing, or [...] following assumptions:

• Polysemous words within a discourse tend to have the same word sense.
• Words with the same lemma tend to modify or be modified by similar words.
• Topical words tend to be repeated frequently.

Therefore, the effectiveness of this method is highly dependent on the source text. However, at least in most technical documents such as computer manuals, the above assumptions hold true, and we have had encouraging results.

  (1) Tracking Your Job
  (2) It is important to know the flow of a job so that you can track it through the system and display or change its status.
  (3) This allows you to:
      [Kore ha, user ni totte, ika wo okonau koto wo kanou ni shimasu.]
  (4) End or hold a batch job.
      [Batch job wo shuuryousuru koto aruiha hojisuru koto]
  (5) Answer messages sent by the system.
      [System ni yotte okurareru message ni kotaeru koto]
  (6) Control printer output.
      [Insatsusouchi no shutsuryoku wo seigyosuru koto]
  (7) The flow of a job can have up to five steps:
  (8) 1. A user or program submits a job to be run.
  (9) 2. The system places the job on a job queue.
  (10) 3. The system takes the job from the job queue and runs it.
  (11) 4. If this job creates some information (output) that needs to be printed, the printer output is placed on an output queue.
  (12) 5. The system takes printer output from the output queue and sends it to the desired printer to be printed.
  [Japanese-script translations, and the romanized translations of the remaining sentences, are not recoverable from the source.]

Figure 4: Translation with context (III)

Acknowledgements

I would like to thank Michael McDonald for his invaluable help in proofreading this paper. I would also like to thank Taijiro Tsutsumi, Masayuki Morohashi, Koichi Takeda, Hiroshi Maruyama, Hiroshi Nomiyama, Hideo Watanabe, Shiho Ogino, Naohiko Uramoto, and the anonymous reviewers for their comments and suggestions.

References

Eugene Charniak. 1973. Jack and Janet in Search of a Theory of Knowledge. In Proceedings of IJCAI-73, pages 337-343.

William A. Gale, Kenneth W. Church, and David Yarowsky. 1992. One Sense per Discourse. In Proceedings of the 4th DARPA Speech and Natural Language Workshop.

Barbara J. Grosz and Candace L. Sidner. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175-204.

Daniel Lyons and Graeme Hirst. 1990. A Compositional Semantics for Focusing Subjuncts. In Proceedings of ACL-90, pages 54-61.

Katashi Nagao. 1990. Dependency Analyzer: A Knowledge-Based Approach to Structural Disambiguation. In Proceedings of COLING-90, pages 282-287.

Tetsuya Nasukawa. 1993. Discourse Constraint in Computer Manuals. In Proceedings of TMI-93, pages 183-194.

Tetsuya Nasukawa. 1994. Robust Method of Pronoun Resolution Using Full-Text Information. In Proceedings of COLING-94, pages 1157-1163.

Tetsuya Nasukawa. 1995. Robust Parsing Based on Discourse Information: Completing Partial Parses of Ill-Formed Sentences on the Basis of Discourse Information. In Proceedings of ACL-95.

Tetsuya Nasukawa and Naohiko Uramoto. 1995. Discourse as a Knowledge Resource for Sentence Disambiguation. In Proceedings of IJCAI-95.

Roger C. Schank and Christopher K. Riesbeck. 1981. Inside Computer Understanding: Five Programs plus Miniatures. Lawrence Erlbaum Associates, Hillsdale, New Jersey.

Koichi Takeda, Naohiko Uramoto, Tetsuya Nasukawa, and Taijiro Tsutsumi. 1992. Shalt2: Symmetric Machine Translation System with Conceptual Transfer. In Proceedings of COLING-92, pages 1034-1038.

Naohiko Uramoto. 1992. Lexical and Structural Disambiguation Using an Example-Base. In Proceedings of the 2nd Japan-Australia Joint Symposium on Natural Language Processing, pages 150-160.
