Full-Text Processing: Improving a Practical NLP System Based on Surface Information Within the Context
Tetsuya Nasukawa
IBM Research, Tokyo Research Laboratory
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan
nasukawa@trl.vnet.ibm.com

Abstract

Rich information for resolving ambiguities in sentence analysis, including various context-dependent problems, can be obtained by analyzing a simple set of parsed trees of each sentence in a text without constructing a precise model of the context through deep semantic analysis. Thus, processing a group of sentences together makes it possible to improve the accuracy of a machine translation system. In this paper, we describe a simple context model consisting of parsed trees of each sentence in a text, and its effectiveness for handling various problems in NLP, such as the resolution of structural ambiguities, pronoun referents, and the focus of focusing subjuncts (e.g. also and only), as well as for adding supplementary phrases to some elliptical sentences.

1 Introduction

[…] key technology for improving the accuracy of text […] background knowledge and deep inference mechanisms […]. It is true that we can always find examples of problems that require common sense and inference mechanisms, such as the classic problems mentioned in (Charniak, 1973), in which the referents of pronouns are not explicitly stated in the text; however, in a […]. Furthermore, assuming that the source text is within a restricted domain, particularly […] context-dependent problems that are […]. We can obtain […] for determining the meaning of […] without the use of a deep inference mechanism or carefully hand-coded data such as scripts (Schank and Riesbeck, 1981).

We therefore tried to develop a practical method that would solve most context-dependent problems and improve the accuracy of text analysis by using a simple mechanism and existing machine-readable […]. To this end, we developed a framework for processing all sentences in a text simultaneously, so that each sentence could be disambiguated by using information extracted from other sentences within the same text. Without constructing a precise model of the context through deep semantic analysis, our framework refers to a set of parsed trees (results of syntactic analysis) of each sentence in the text as context information. Thus, our context model consists of parsed trees that are obtained by using an existing general syntactic parser. Except for information on the sequence of sentences, our framework does not consider any discourse structure, such as the discourse segments, focus space stack, or dominant hierarchy defined in (Grosz and Sidner, 1986). Therefore, […] approaches to context processing […] obtaining a perfect analysis. However, by extending the processing object from one sentence to multiple sentences in a source text, and by using syntactic information on all the other words in the whole text, such as modifiee-modifier relationships and their positions in the text, our framework improves the overall accuracy of a natural language processing system.

We implemented this framework on an English-to-Japanese machine translation system […] computer manuals […]. […] the result of word sense disambiguation in one sentence […] with all the words […] the same […]. […] ambiguous phrases […] from structural information on all words with the same lemma within the discourse. Moreover, processing a whole text at a time makes it possible to refer to other information, such as word frequency and the position of each word, which can be used for resolving pronoun referents and the focus of focusing subjuncts such as also and only.

In this paper, we describe a robust context processing method, namely "full-text processing", focusing on its effects on the output of a machine translation system. In the next section, we briefly describe the framework of our method, which uses a simple context model; then, in the following sections, we illustrate its effectiveness with some actual outputs of our English-to-Japanese machine translation system.

2 Framework

Full-text processing consists of three steps:

1. Generating a context model that consists of parsed trees of each sentence in a source text
2. Refining the context model by assigning a single unified parse tree to each sentence in the text
3. Resolving the problems in each sentence in the context model and generating a final analysis for each sentence in the text

The respective procedures for these steps are described in the following three subsections.

2.1 Generation of a simple context model

In order to refer to context information that consists of data on multiple sentences in a text, it is essential to construct some context model; the first step of the full-text processing method is therefore to construct a context model by analyzing each sentence in an input text. To avoid any errors that may occur during transformation into other representations, such as a logical representation, we stayed with surface structures, and to preserve the robustness of this framework, we used only a set of parsed trees as a context model. Thus, each sentence of an input text is processed by a syntactic parser in the first step, and the position of each instance of every lemma, its morphological information, and its modifiee-modifier relationships with other content words are extracted from the parser output and stored to construct a context model, as shown in Figure 1. In addition, if any on-line knowledge resources are available, information extracted from the resources is also stored in the context model. For example, information on synonyms extracted from an on-line thesaurus dictionary […]

Context = {Sentence 1, Sentence 2, ..., Sentence n}
Sentence i = {Word i-1, Word i-2, ..., Word i-j}

Sentence 1: John likes apples.
  Word1-1 [John]     POS: N    BASE: John ...
  Word1-2 [likes]    POS: V    BASE: like ...    subj: John ...
  Word1-3 [apples]   POS: N    BASE: apple ...
Sentence 2: Tom also likes apples.
  Word2-1 [Tom]      POS: N    BASE: Tom ...
  Word2-2 [also]     POS: ADV  BASE: also ...
  Word2-3 [likes]    POS: V    BASE: like ...
  Word2-4 [apples]   POS: N    BASE: apple ...
Sentence 3: He also likes oranges.
  Word3-1 [He]       POS: PN   BASE: he ...
  Word3-2 [also]     POS: ADV  BASE: also ...
  Word3-3 [likes]    POS: V    BASE: like ...
  Word3-4 [oranges]  POS: N    BASE: orange ...

Figure 1: Example of a context model

[…] such a sentence, information extracted from complete parses of well-formed sentences² in a context model can be used to complete incomplete parses, in the form of partially parsed chunks that a bottom-up parser outputs for ill-formed sentences, by using a previously described method (Nasukawa, 1995). On the other hand, for some sentences in a text, such as "Time flies like an arrow", a syntactic parser may generate more than one parse tree, owing to the presence of words that can be assigned to more than one part of speech, or to the presence of complicated coordinate structures, or for various other reasons.
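The context model of Section 2.1 can be pictured as a small data structure that keeps surface information only. The Python sketch below is an illustration under stated assumptions, not the paper's implementation: the syntactic parser is replaced by hand-written (surface, POS, base) tokens and (head, relation, dependent) links, the relation labels subj, obj, and mod are hypothetical, and the class and method names (ContextModel, add_sentence, instances, dependents) are illustrative. For every instance of every lemma it stores the position, morphological information, and modifiee-modifier links, and it indexes instances by lemma so that information can be looked up across the whole text, as in Figure 1.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Word:
    sentence: int   # 1-based index of the sentence in the text
    position: int   # 1-based index of the word in its sentence
    surface: str    # word as it appears in the text
    pos: str        # part of speech, e.g. "N", "V", "ADV", "PN"
    base: str       # lemma (BASE in Figure 1)
    # Links from this word (the modifiee) to its modifiers:
    # (relation, (sentence, position)); relation labels are hypothetical.
    links: list = field(default_factory=list)


@dataclass
class ContextModel:
    sentences: list = field(default_factory=list)  # one list of Word per sentence
    by_lemma: dict = field(default_factory=lambda: defaultdict(list))

    def add_sentence(self, tokens, links):
        """Store one parsed sentence.

        tokens: [(surface, pos, base), ...] in text order
        links:  [(head_index, relation, dependent_index), ...], 1-based,
                standing in for the output of a real syntactic parser
        """
        s = len(self.sentences) + 1
        words = [Word(s, i, surf, pos, base)
                 for i, (surf, pos, base) in enumerate(tokens, start=1)]
        for head, relation, dep in links:
            words[head - 1].links.append((relation, (s, dep)))
        for w in words:
            self.by_lemma[w.base].append(w)
        self.sentences.append(words)

    def instances(self, lemma):
        """Positions (sentence, word) of every instance of a lemma in the text."""
        return [(w.sentence, w.position) for w in self.by_lemma.get(lemma, [])]

    def dependents(self, lemma, relation):
        """Lemmas linked by `relation` to any instance of `lemma` in the text."""
        result = []
        for w in self.by_lemma.get(lemma, []):
            for rel, (s, p) in w.links:
                if rel == relation:
                    result.append(self.sentences[s - 1][p - 1].base)
        return result


# The three sentences of Figure 1, with hand-written parses.
cm = ContextModel()
cm.add_sentence([("John", "N", "John"), ("likes", "V", "like"), ("apples", "N", "apple")],
                [(2, "subj", 1), (2, "obj", 3)])
cm.add_sentence([("Tom", "N", "Tom"), ("also", "ADV", "also"),
                 ("likes", "V", "like"), ("apples", "N", "apple")],
                [(3, "subj", 1), (3, "mod", 2), (3, "obj", 4)])
cm.add_sentence([("He", "PN", "he"), ("also", "ADV", "also"),
                 ("likes", "V", "like"), ("oranges", "N", "orange")],
                [(3, "subj", 1), (3, "mod", 2), (3, "obj", 4)])

print(cm.instances("like"))          # [(1, 2), (2, 3), (3, 3)]
print(cm.dependents("like", "obj"))  # ['apple', 'apple', 'orange']
```

Because instances are indexed by lemma, what is known about "like" in one sentence (for example, its objects) is available while processing any other sentence, without converting anything into a logical form. The lemma index is one possible layout; the paper does not specify how the context model is stored.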
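Step 2 assigns a single parse tree to each sentence, and the introduction points to structural information on all words with the same lemma within the discourse as a source of evidence. One hypothetical way to use that evidence, which is not claimed to be the paper's procedure, is to represent each candidate parse as a set of (head lemma, relation, dependent lemma) triples and to prefer the candidate whose triples are most often attested in sentences of the same text that received only one parse. In the sketch below, the function names, the triple representation, and the context sentence "Time flies quickly." are all invented for illustration.

```python
from collections import Counter

# Hypothetical representation: a parse is a frozenset of
# (head_lemma, relation, dependent_lemma) triples.


def attested_triples(unambiguous_parses):
    """Count triples from sentences that received exactly one parse."""
    counts = Counter()
    for parse in unambiguous_parses:
        counts.update(parse)
    return counts


def choose_parse(candidates, counts):
    """Return the candidate whose triples are most often attested elsewhere.

    max() keeps the first of several equally scored candidates, so ties
    fall back to the parser's own ordering.
    """
    return max(candidates, key=lambda parse: sum(counts[t] for t in parse))


# Invented context: one unambiguous parse of "Time flies quickly."
context = [frozenset({("fly", "subj", "time"), ("fly", "mod", "quickly")})]

# Two candidate parses of "Time flies like an arrow."
candidates = [
    frozenset({("like", "subj", "fly"), ("fly", "mod", "time"),
               ("like", "obj", "arrow")}),  # "time flies" as insects that like arrows
    frozenset({("fly", "subj", "time"), ("fly", "mod", "like"),
               ("like", "obj", "arrow")}),  # time passes the way an arrow does
]

counts = attested_triples(context)
best = choose_parse(candidates, counts)
print(candidates.index(best))  # 1
```

Counting exact lemma triples is the simplest possible score; a real system would also need to weigh partial matches, synonyms from a thesaurus, and sentence distance, none of which this sketch attempts.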