Full-Text Processing: Improving a Practical NLP System Based on Surface Information Within the Context

Tetsuya Nasukawa
IBM Research, Tokyo Research Laboratory
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan
nasukawa@trl.vnet.ibm.com

Abstract

Rich information for resolving ambiguities in sentence analysis, including various context-dependent problems, can be obtained by analyzing a simple set of parsed trees of each sentence in a text without constructing a precise model of the context through deep semantic analysis. Thus, processing a group of sentences together makes it possible to improve the accuracy of a [...] machine translation [...]. In this paper, we describe a simple context model consisting of parsed trees of each sentence in a text, and its effectiveness for handling various problems in NLP such as the resolution of structural ambiguities, pronoun referents, and the focus of focusing subjuncts (e.g. also and only), as well as for adding supplementary phrases to some elliptical sentences.

1 Introduction

[...] It is true that we can always find examples of problems that require common sense and inference mechanisms, such as the classic problems mentioned in (Charniak, 1973), in which the referents of pronouns are not explicitly [...] in the text. However, [...] a discourse text within a restricted domain [...] we observe that [...] context-dependent problems [...] solvable without the use of a deep inference mechanism or carefully hand-coded data such as scripts (Schank and Riesbeck, 1981). We therefore tried to develop a practical method that would solve most context-dependent problems and improve the accuracy of text analysis by using a simple mechanism and existing machine-readable data.

To this end, we developed a framework for processing all sentences in a text simultaneously, so that each sentence can be disambiguated by using information extracted from other sentences within the same text. Without constructing a precise model of the context through deep semantic analysis, our framework refers to a set of parsed trees (results of syntactic analysis) of each sentence in the text as context information. Thus, our context model consists of parsed trees that are obtained by using an existing general syntactic parser. Except for information on the sequence of sentences, our framework does not consider any discourse structure such as the discourse segments, focus space stack, or dominant hierarchy [...] (Grosz and Sidner, 1986). [...] However, by extending the unit of the processing object from one sentence to multiple sentences in a source text and by using syntactic information on all the other words in the whole text, such as modifiee-modifier relationships and their positions in the text, our framework improves the overall accuracy of a natural language processing system.

We implemented this framework on an English-to-Japanese machine translation system [...] result of word sense disambiguation in one sentence with all the words in the discourse that share the same lemma. Furthermore, we can obtain clues to determining the modifiees of structurally ambiguous phrases from structural information on all words with the same lemma within the discourse. Moreover, processing a whole text at a time makes it possible to refer to other information such as word frequency and the position of each word, which can be used for resolving pronoun reference and the focus of focusing subjuncts such as also and only.

In this paper, we describe this robust context processing method, namely "full-text processing", focusing on its effects on the output of a machine translation system. In the next section, we briefly describe the framework of our method, which uses a simple context model; then, in the following sections, we illustrate its effectiveness with some actual outputs of our English-to-Japanese machine translation system.

2 Framework

Full-text processing consists of three steps:

1. Generating a context model that consists of parsed trees of each sentence in a source text
2. Refining the context model by assigning a single unified parse tree to each sentence in the text
3. Resolving the problems in each sentence in the context model and generating a final analysis for each sentence in the text

The respective procedures for these steps are described in the following three subsections.

2.1 Generation of a simple context model

In order to refer to context information that consists of data on multiple sentences in a text, it is essential to construct some context model; the first step of the full-text processing method is therefore to construct a context model by analyzing each sentence in an input text. To avoid any errors that may occur during transformation into any other representations, such as a logical representation, we stayed with surface structures, and to preserve the robustness of this framework, we used only a set of parsed trees as a context model. Thus, each sentence of an input text is processed by a syntactic parser in the first step, and the position of each instance of every lemma, its morphological information, and its modifiee-modifier relationships with other content words are extracted from the parser output and stored to construct a context model, as shown in Figure 1. In addition, if any on-line knowledge resources are available, information extracted from the resources is also stored in the context model. For example, information on synonyms extracted from an on-line thesaurus dictionary [...]

Context = {Sentence 1, Sentence 2, ..., Sentence n}
Sentence i = {Word i-1, Word i-2, ..., Word i-j}

Sentence 1: John likes apples.
  Word1-1 [John]     POS: N    BASE: John ...
  Word1-2 [likes]    POS: V    BASE: like ...
  Word1-3 [apples]   POS: N    BASE: apple ...

Sentence 2: Tom also likes apples.
  Word2-1 [Tom]      POS: N    BASE: Tom ...
  Word2-2 [also]     POS: ADV  BASE: also ...
  Word2-3 [likes]    POS: V    BASE: like ...
  Word2-4 [apples]   POS: N    BASE: apple ...

Sentence 3: He also likes oranges.
  Word3-1 [He]       POS: PN   BASE: he ...
  Word3-2 [also]     POS: ADV  BASE: also ...
  Word3-3 [likes]    POS: V    BASE: like ...
  Word3-4 [oranges]  POS: N    BASE: orange ...

Figure 1: Example of a context model

[...] such a sentence, information extracted from complete parses of well-formed sentences in a context model can be used to complete incomplete parses, in the form of partially parsed chunks that a bottom-up parser outputs for ill-formed sentences, by using a previously described method (Nasukawa, 1995).

On the other hand, for some sentences in a text, such as Time flies like an arrow, a syntactic parser may generate more than one parse tree, owing to the presence of words that can be assigned to more than one part of speech, or to the presence of complicated coordinate structures, or for various other reasons.
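
The context model of Figure 1 can be pictured as a small data structure that records, for every lemma, the positions of all its instances in the text. The Python sketch below is illustrative only and makes several assumptions: the names Word, ContextModel, add_sentence, positions, and build_figure1_model are introduced here and do not come from the system described in the paper, and the part-of-speech and lemma values are entered by hand in place of real parser output. Modifiee-modifier relationships and thesaurus information are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Word:
    surface: str  # word form as it appears in the text
    pos: str      # part of speech, e.g. "N", "V", "ADV", "PN"
    base: str     # lemma (the BASE field in Figure 1)


@dataclass
class ContextModel:
    # Hypothetical container: one list of Word objects per sentence.
    sentences: list = field(default_factory=list)
    # Maps a lemma to the (sentence number, word number) of each instance.
    lemma_index: dict = field(default_factory=lambda: defaultdict(list))

    def add_sentence(self, words):
        """Store one analysed sentence and record where each lemma occurs."""
        sent_no = len(self.sentences) + 1
        self.sentences.append(words)
        for word_no, word in enumerate(words, start=1):
            self.lemma_index[word.base].append((sent_no, word_no))

    def positions(self, lemma):
        """Return the positions of every instance of a lemma in the text."""
        return list(self.lemma_index.get(lemma, []))


def build_figure1_model():
    # Hand-written stand-in for parser output, following Figure 1.
    model = ContextModel()
    model.add_sentence([
        Word("John", "N", "John"),
        Word("likes", "V", "like"),
        Word("apples", "N", "apple"),
    ])
    model.add_sentence([
        Word("Tom", "N", "Tom"),
        Word("also", "ADV", "also"),
        Word("likes", "V", "like"),
        Word("apples", "N", "apple"),
    ])
    model.add_sentence([
        Word("He", "PN", "he"),
        Word("also", "ADV", "also"),
        Word("likes", "V", "like"),
        Word("oranges", "N", "orange"),
    ])
    return model


model = build_figure1_model()
print(model.positions("like"))   # [(1, 2), (2, 3), (3, 3)]
print(model.positions("apple"))  # [(1, 3), (2, 4)]
```

Indexing by lemma rather than by surface form is what allows an analysis of one sentence to consult every other instance of the same word in the text, which is the kind of lookup the framework relies on.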
