Modeling Linguistic Theory on a Computer: from GB to Minimalism
Sandiway Fong
Dept. of Linguistics
Dept. of Computer Science
MIT IAP Computational Linguistics Fest, 1/14/2005

Outline
• Mature system: PAPPI
  – parser in the principles-and-parameters framework
  – principles are formalized and declaratively stated in Prolog (logic)
  – principles are mapped onto general computational mechanisms
  – recovers all possible parses
  – (free software, recently ported to MacOS X and Linux)
  – (see http://dingo.sbs.arizona.edu/~sandiway/)
• Current work
  – introduce a left-to-right parser based on the probe-goal model from the Minimalist Program (MP)
  – take a look at modeling some data from SOV languages
    • relativization in Turkish and Japanese
    • psycholinguistics (parsing preferences)
  – (software yet to be released...)

PAPPI: Overview
• user’s viewpoint
  [Figure labels: sentence; parser operations corresponding to linguistic principles (= theory); syntactic representations]
• parser operations can be
  – turned on or off
  – metered
• syntactic representations can be
  – displayed
  – examined
    • in the context of a parser operation
  – dissected
    • features displayed

PAPPI: Coverage
• supplied with a basic set of principles
  – X’-based phrase structure, Case, Binding, ECP, Theta, head movement, phrasal movement, LF movement, QR, operator-variable, WCO
  – handles a couple hundred English examples from Lasnik and Uriagereka’s (1988) A Course in GB Syntax
• more modules and principles can be added or borrowed
  – VP-internal subjects, NPIs, double objects, Zero Syntax (Pesetsky, 1995)
  – Japanese (some Korean): head-final, pro-drop, scrambling
  – Dutch (some German): V2, verb raising
  – French (some Spanish): verb movement, pronominal clitics
  – Turkish, Hungarian: complex morphology
  – Arabic: VSO, SVO word orders
PAPPI: Architecture
• software layers
  – GUI
  – parser
  – prolog
  – os
  [Figure labels: Word Order, pro-drop, Wh-in-Syntax, Scrambling; Lexicon, Parameters, Periphery, PS Rules, Principles; Programming Language; Compilation Stage: LR(1), Type Inf., Chain, Tree]
  – competing parses can be run in parallel across multiple machines

PAPPI: Machinery
• morphology
  – simple morpheme concatenation
  – morphemes may project or be rendered as features
• (example from the Hungarian implementation)
  EXAMPLE:
  a szerző-k megnéz-et-het-∅-né-nek-∅ két cikk-et
  the author-Agr3Pl look_at-Caus-Possib-tns(prs)-Cond-Agr3Pl-Obj(indef) two article-Acc

  a munkatárs-a-ik-kal
  the colleague-Poss3Sg-Agr3Pl+Poss3Pl-LengdFC+Com

PAPPI: LR Machinery
• phrase structure
  – parameterized X’-rules
  – head movement rules
  – rules are not used directly during parsing for computational efficiency
  – mapped at compile-time onto LR machinery
• specification
  – rule XP -> [XB|spec(XB)] ordered specFinal st max(XP), proj(XB,XP).
  – rule XB -> [X|compl(X)] ordered headInitial(X) st bar(XB), proj(X,XB), head(X).
  – rule v(V) moves_to i provided agr(strong), finite(V).
  – rule v(V) moves_to i provided agr(weak), V has_feature aux.
• implementation
  – bottom-up, shift-reduce parser
  – push-down automaton (PDA)
  – stack-based merge
    • shift
    • reduce
  – canonical LR(1)
    • disambiguate through one word lookahead
  [Figure: LR item sets]
  State 0: S -> . NP VP; NP -> . D N; NP -> . N; NP -> . NP PP
  State 1: NP -> D . N
  State 2: NP -> N .
  State 3: NP -> D N .
  State 4: S -> NP . VP; NP -> NP . PP; VP -> . V NP; VP -> . V; VP -> . VP PP; PP -> . P NP

PAPPI: Machine Parameters
• selected parser operations may be integrated with phrase structure recovery or chain formation
  – machine parameter
  – however, not always efficient to do so
• specification
  – coindexSubjAndINFL in_all_configurations CF where specIP(CF,Subject) then coindexSI(Subject,CF).
  – subjacency in_all_configurations CF where isTrace(CF), upPath(CF,Path) then lessThan2BoundingNodes(Path).
• implementation
  – use type inferencing defined over category labels
    • figure out which LR reduce actions should place an outcall to a parser operation
  – subjacency can be called during chain aggregation

PAPPI: Chain Formation
• recovery of chains
  – compute all possible combinations
    • each empty category optionally participates in a chain
    • each overt constituent optionally heads a chain
• specification
  – assignment of a chain feature to constituents
• combinatorics
  – exponential growth
• implementation
  – possible chains compositionally defined
  – incrementally computed
  – bottom-up
  – allows parser operation merge
• merge constraints on chain paths
  – loweringFilter in_all_configurations CF where isTrace(CF), downPath(CF,Path) then Path=[].
  – subjacency in_all_configurations CF where isTrace(CF), upPath(CF,Path) then lessThan2BoundingNodes(Path).

PAPPI: Domain Computation
• minimal domain
  – incremental
  – bottom-up
• specification
  – gc(X) smallest_configuration CF st cat(CF,C), member(C,[np,i2])
  – with_components
  – X,
  – G given_by governs(G,X,CF),
  – S given_by accSubj(S,X,CF).
• implementing
  – Governing Category (GC):
  – GC(α) is the smallest NP or IP containing:
  – (A) α, and
  – (B) a governor of α, and
  – (C) an accessible SUBJECT for α.
• used in
  – Binding Condition A
    • An anaphor must be A-bound in its GC
  – conditionA in_all_configurations CF where anaphor(CF) then gc(CF,GC), aBound(CF,GC).
  – anaphor(NP) :- NP has_feature apos, NP has_feature a(+).
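The governing-category definition and Condition A above can be pictured with a small Python sketch. It is not PAPPI's Prolog implementation: the `Node` class, the helper names, and the example tree are all hypothetical. The governor and the accessible SUBJECT are simply passed in, whereas PAPPI computes them via `governs` and `accSubj`. C-command is simplified to "the parent dominates the target", and the A-position requirement is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    label: str                                   # category label, e.g. "np", "i2", "vp"
    children: List["Node"] = field(default_factory=list)
    index: Optional[int] = None                  # referential index (hypothetical encoding)

    def dominates(self, other):
        # Reflexive dominance: a node dominates itself and everything below it.
        return self is other or any(c.dominates(other) for c in self.children)


def nodes(n):
    # Yield n and every node below it.
    yield n
    for c in n.children:
        yield from nodes(c)


def path_to(root, target):
    # Nodes from root down to target (inclusive), or [] if target is absent.
    if root is target:
        return [root]
    for child in root.children:
        sub = path_to(child, target)
        if sub:
            return [root] + sub
    return []


def governing_category(root, alpha, governor, subject):
    # Smallest NP or IP (i2) containing alpha, a governor of alpha,
    # and an accessible SUBJECT for alpha (both supplied by the caller).
    for node in reversed(path_to(root, alpha)):  # walk upward from alpha
        if (node.label in ("np", "i2")
                and node.dominates(governor)
                and node.dominates(subject)):
            return node
    return None


def c_commands(root, a, b):
    # Simplified c-command: a's parent dominates b, and a does not dominate b.
    path = path_to(root, a)
    if len(path) < 2 or a.dominates(b):
        return False
    return path[-2].dominates(b)


def condition_a(root, anaphor, gc):
    # The anaphor needs a coindexed, c-commanding antecedent inside its GC.
    if gc is None:
        return False
    return any(n is not anaphor
               and n.index is not None
               and n.index == anaphor.index
               and c_commands(root, n, anaphor)
               for n in nodes(gc))


def example(anaphor_index):
    # "John_1 thinks [Mary_2 likes herself_i]"
    john = Node("np", index=1)
    mary = Node("np", index=2)
    herself = Node("np", index=anaphor_index)
    likes = Node("v")
    emb = Node("i2", [mary, Node("vp", [likes, herself])])
    root = Node("i2", [john, Node("vp", [Node("v"), emb])])
    gc = governing_category(root, herself, governor=likes, subject=mary)
    return gc is emb, condition_a(root, herself, gc)


print(example(2))  # (True, True): bound by Mary inside the embedded clause
print(example(1))  # (True, False): John is outside the governing category
```

The search walks upward from the anaphor, which mirrors the bottom-up, incremental flavor of the `smallest_configuration` specification.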
Probe-Goal Parser: Overview
• strictly incremental
  – left-to-right
  – uses elementary tree (eT) composition
    • guided by selection
    • open positions filled from input
  – epp
  – no bottom-up merge/move
• probe-goal agreement
  – uninterpretable/interpretable feature system

Probe-Goal Parser: Selection
• select drives derivation
  – left-to-right
• memory elements
  – MoveBox (M)
    • emptied in accordance with theta theory
    • filled from input
  – ProbeBox (P)
    • current probe
• recipe
  start(c)
  pick eT headed by c
  from input (or M)
    fill Spec, run agree(P,M)
    fill Head, update P
    fill Comp (c select c’, recurse)
• example
  [Figure: elementary tree with Spec, head C and Comp, positions numbered 1 to 3, alongside Move box M and Probe box P]
  agree
    φ-features → probe
    case → goal
• note
  – extends derivation to the right
    • similar to Phillips (1995)
  – no merge/move
    • cf. Minimalist Grammar. Stabler (1997)

Probe-Goal Parser: Lexicon

lexical item         properties           uninterpretable features       interpretable features
v* (transitive)      select(V)            per(P) num(N) gen(G) (epp)
                     spec(select(N))
                     value(case(acc))
v (unaccusative)     select(V)
v# (unergative)      select(V)
                     spec(select(N))
PRT. (participle)    select(V)            num(N) gen(G) case(C)
V (trans/unacc)      select(N)
V (unergative)
V (raising/ecm)      select(T(def))
T                    select(v)            per(P) num(N) gen(G) epp
                     value(case(nom))
T(def) (ϕ-incomplete)
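
As a rough illustration of the agree step from the Selection slide (φ-features → probe, case → goal), the following Python sketch encodes the T and v* entries from the lexicon table as dictionaries. The dictionary layout, the `new_probe` helper, and the noun entry used as the goal are assumptions made for this example; they are not taken from the parser itself. Conditions on when agree may apply are left out.

```python
import copy

# Hypothetical encoding of two lexicon rows: unvalued phi-features are None,
# and value(case(...)) is stored under "value_case".
LEXICON = {
    "T":  {"select": "v", "value_case": "nom",
           "phi": {"per": None, "num": None, "gen": None}},
    "v*": {"select": "V", "value_case": "acc",
           "phi": {"per": None, "num": None, "gen": None}},
}


def new_probe(item):
    # Fresh copy, so that valuing a probe never alters the lexicon entry.
    return copy.deepcopy(LEXICON[item])


def agree(probe, goal):
    # phi-features -> probe: the goal's values fill the probe's unvalued slots.
    for feature in list(probe["phi"]):
        if probe["phi"][feature] is None:
            probe["phi"][feature] = goal["phi"][feature]
    # case -> goal: the probe's case value is assigned to the goal.
    goal["case"] = probe["value_case"]


# Hypothetical goal: a noun with valued phi-features and unvalued case.
goal = {"phi": {"per": 3, "num": "sg", "gen": "f"}, "case": None}
probe = new_probe("T")
agree(probe, goal)

print(probe["phi"])   # {'per': 3, 'num': 'sg', 'gen': 'f'}
print(goal["case"])   # nom
```

Using `"v*"` as the probe instead would leave the goal with `acc`, matching the `value(case(acc))` property in the table.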