
XNL-Soar, Incremental Parsing, and the Minimalist Program
Deryle Lonsdale, LaReina Hingson, Jamison Cooper-Leavitt, Warren Casbeer, Rebecca Madsen
BYU Department of Linguistics

Abstract

• The object: a new incremental parser for cognitive modeling
  – Syntactic representation based on the Minimalist Program (Chomsky 1995)
  – Updates the prior GB (Principles & Parameters) representation
• The framework: Soar
  – Symbolic, rule-based, goal-directed cognitive modeling approach (Newell 1990)
  – Machine learning architecture (Laird et al. 1987)
  – Models on-line human language processing
• The goal: explore MP correlations with prior psycholinguistic findings in human language processing (Lewis 1993)
• The approach: process incoming lexical items incrementally via operators
  – Lexical access operator for each incoming word
  – Build MP syntactic structure via projection, merge, and movement operators
  – Constrain operators using lexical and thematic information
  – Allow strategies for limited, local structural reanalysis for unproblematic ambiguities
• Related issues:
  – Semantic interpretation incrementally mapped from syntactic structure
  – Machine learning: chunking up prior decisions and reusing them later
  – Performance issues: memory usage across time, parsing difficulties

Incremental parsing

• Human language processing is incremental
• Processes are largely lexically driven
• Word-by-word processing: words ...
  – enter the system's perceptual buffer
  – are attended to via a lexical access operator
  – disappear if unattended to after a specified duration
• Structure (syntactic and semantic) is constructed piecemeal
• Open question: is parsing within the MP incrementally feasible in a cognitively plausible way?

Syntax in the Minimalist Program (MP)

• Minimalist Principles (Chomsky 1995)
  – Build structure based on uninterpretable categorial features
  – First: head + complement
  – Second: spec + head (X-bar)
  – Move: generate grammaticality in the surface representation
    · Head–head movement
    · Phrasal movement
• Hierarchy of Projections (Adger 2003)
  – Inherent knowledge for specifying structural projections
  – Nominal: D > (Poss) > n > N
  – Clausal: C > T > (Neg) > (Perf) > (Prog) > (Pass) > v > V
• Features play a central role
  – Strong (movement) and weak (merge)
  – Feature checking
  – NP, VP symmetry, including shells

Operator trace (partial)

Example sentence: Their investigation should have exonerated the defendant.

 1: lexaccess(their)
 2: lexaccess(investigation)
 3: project(NP)
 4: hop(n)
 5: merge1(n')
 6: merge2(nP)
 7: movehead(N)
 8: hop(D)
 9: merge1(D')
10: merge2(DP)
11: lexaccess(should)
12: lexaccess(have)
13: lexaccess(exonerated)
14: lexaccess(the)
15: lexaccess(defendant)
16: project(NP)
17: hop(n)
18: merge1(n')
19: merge2(nP)
20: movehead(N)
21: hop(D)
22: merge1(D')
23: merge2(DP)
24: merge1(V')
25: merge2(VP)
26: hop(v)
27: merge1(v')
28: merge2(vP)
29: movehead(V)
30: hop(Perf)
31: merge1(Perf')
32: merge2(PerfP)
33: hop(T)
34: merge1(T')
35: merge2(TP)

(Tree snapshots of the structure after steps 1, 2–3, 5, 6, 7, 9–10, 11–15, 16–24, 25–29, 30–34, and 35 are omitted here.)

References:
  Chomsky, N. 1995. The Minimalist Program. MIT Press.
  Laird, J., A. Newell, and P. Rosenbloom. 1987. Soar: An architecture for general intelligence. Artificial Intelligence 33:1-64.
  Lewis, R. 1993. An architecturally-based theory of human sentence comprehension. PhD thesis, Carnegie Mellon University, School of Computer Science.
  Newell, A. 1990. Unified Theories of Cognition. Harvard University Press, Cambridge, MA.
  Pritchett, B. 1992. Grammatical Competence and Parsing Performance. University of Chicago Press, Chicago, IL.

Background: Soar

• General theory of human problem solving
  – Cognition: language, action, performance (in all their varieties)
  – Operator: basic unit of cognitive processing
  – Stepwise progress toward a specified goal
  – Observable mechanisms, time course of behaviors, deliberation
  – Knowledge levels and their use
• Cognitive modeling architecture instantiated as an intelligent agent
• Instantiate the model as a computational system
  – Symbolic rule-based architecture
  – Subgoal-directed problem specification
  – Operator-based problem solving
• Machine learning
• Applications: robotics, video games and simulation, tutorial dialogue, etc.
  – NL-Soar: natural language processing engine built on Soar

Background: NL-Soar

• Soar extension for modeling language use
  – Unified theory of cognition + Soar cognitive modeling system + NL components
  – Unified cognitive architecture for overall cognition, including NL
• Used specifically to model language tasks: acquisition, language use, language/task integration, etc.
• Different modalities supported
  – Parsing/comprehension: derive semantics of incoming utterances
  – Generation/production: output utterances expressing semantic content
  – Mapping: convert between semantic representations
  – Discourse/dialogue: learn and execute dialogue plans

Operators and operator types

• Various types and functions in XNL-Soar:
  – Lexical access: retrieve and store lexically-related information
  – Merge: construct structure via MP-specified merge operations
  – Movehead: perform head-to-head movement (via adjunction)
  – HoP: consult the hierarchy of projections, return the next possible target level
  – Project: create a bare-structure maximal projection from an accessed lexical item

External knowledge sources

• WordNet 2.0 (wordnet.princeton.edu)
  – Lexical information: part-of-speech, word senses, subcategorization
  – Inflectional and derivational morphology
• English LCS lexicon (www.umiacs.umd.edu/~bonnie/verbs-English.lcs)
  – Thematic information: θ-grids, θ-roles
  – Used to derive uninterpretable features
  – Triggers syntactic construction
  – Aligned with WordNet information

WordNet data

Overview of exonerate:

  The verb exonerate has 1 sense (first 1 from tagged texts)

  (1) acquit, assoil, clear, discharge, exonerate, exculpate -- (pronounce not guilty of criminal charges; "The suspect was cleared of the murder charges")

  Semantic class: verb.communication
  Verb frames:
    Somebody ---s somebody.
    Somebody ---s somebody of something.

English LCS data

  10.6.a#1#_ag_th,mod-poss(of)#exonerate#exonerate#exonerate#exonerate+ed#(2.0,00874318_exonerate%2:32:00::)

• "10.6.a" = "Verbs of Possessional Deprivation: Cheat Verbs/-of"
• WORDS (absolve acquit balk bereave bilk bleed burgle cheat cleanse con cull cure defraud denude deplete depopulate deprive despoil disabuse disarm disencumber dispossess divest drain ease exonerate fleece free gull milk mulct pardon plunder purge purify ransack relieve render rid rifle rob sap strip swindle unburden void wean)
• θ-grid options: ((1 "_ag_th,mod-poss()") (1 "_ag_th,mod-poss(from)") (1 "_ag_th,mod-poss(of)"))
• Example frames: "He !!+ed the people (of their rights); He !!+ed him of his sins"
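To make the LCS record format concrete, here is a minimal Python sketch that splits a '#'-delimited English LCS entry like the exonerate record above and expands its θ-grid codes into role labels. The field interpretations and the ROLE_NAMES mapping are assumptions for illustration only; the actual XNL-Soar knowledge-source interface is written in Tcl/Perl.

```python
# Illustrative sketch, not XNL-Soar's actual Tcl/Perl interface code.
# Field positions and the role-code mapping are assumed from the example entry.
ROLE_NAMES = {"_ag": "agent", "_th": "theme"}  # hypothetical code -> label map

def parse_lcs_entry(entry: str) -> dict:
    """Split a '#'-delimited LCS record into named parts."""
    fields = entry.split("#")
    theta_grid = fields[2]                       # e.g. "_ag_th,mod-poss(of)"
    obligatory, _, modifier = theta_grid.partition(",")
    roles = [ROLE_NAMES.get("_" + code, "_" + code)
             for code in obligatory.split("_") if code]
    return {
        "verb_class": fields[0],                 # e.g. "10.6.a"
        "theta_roles": roles,                    # e.g. ["agent", "theme"]
        "modifier": modifier or None,            # e.g. "mod-poss(of)"
        "headword": fields[3],                   # e.g. "exonerate"
    }

entry = ("10.6.a#1#_ag_th,mod-poss(of)#exonerate#exonerate#exonerate"
         "#exonerate+ed#(2.0,00874318_exonerate%2:32:00::)")
print(parse_lcs_entry(entry))
```

The θ-grid is what drives the parser's uninterpretable features, so splitting it out cleanly is the main point of the sketch.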

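The operator trace above can be sketched in a few lines of Python. This toy reproduces the operator sequence for a determiner–noun nominal (steps 14–23 of the trace) by walking the nominal Hierarchy of Projections; the control logic is a deliberate simplification standing in for XNL-Soar's Soar productions, not a transcription of them.

```python
# Toy sketch of the operator cycle for a nominal, following the poster's
# Hierarchy of Projections (Adger 2003). Operator names mirror the trace.
NOMINAL_HOP = ["N", "n", "D"]   # bottom-up slice of D > (Poss) > n > N

def parse_nominal(det: str, noun: str) -> list:
    """Emit the operator sequence the trace shows for a 'det noun' input."""
    trace = [f"lexaccess({det})", f"lexaccess({noun})", "project(NP)"]
    for target in NOMINAL_HOP[1:]:           # climb the hierarchy above N
        trace.append(f"hop({target})")        # consult HoP for the next level
        trace.append(f"merge1({target}')")    # first merge: head + complement
        trace.append(f"merge2({target}P)")    # second merge: specifier level
        if target == "n":
            trace.append("movehead(N)")       # N-to-n head movement
    return trace

print(parse_nominal("the", "defendant"))
```

Running this on "the defendant" yields exactly the hop/merge1/merge2/movehead pattern of steps 14–23, which is the regularity the HoP operator encodes.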
Rules for XNL-Soar

• XNL-Soar updates the syntactic component of NL-Soar to use the MP
• IF→THEN rules (productions)
  – If certain conditions are met, then the agent performs some action
  – Represent a priori knowledge
  – Several rule/production firings can be bundled together as operators
• Current XNL-Soar system: about 60 productions (cf. the NL-Soar system: 3,500 productions)
• External knowledge sources: interfaced via 1000+ lines of Tcl/Perl
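A minimal sketch of how IF→THEN productions fire and bundle into an operator, assuming a toy working memory of (attribute, value) pairs. Real XNL-Soar rules are written in Soar's own production syntax; the rule names and conditions below are hypothetical.

```python
# Match-fire cycle in miniature: a production fires when all of its
# condition pairs are present in working memory, then adds its action pairs.
PRODUCTIONS = [
    # hypothetical rule: a bare noun in the buffer proposes an NP projection
    {"name": "propose*project-NP",
     "if": {("buffer", "noun")},
     "then": {("goal", "project-NP")}},
    # hypothetical rule: apply the proposed projection
    {"name": "apply*project-NP",
     "if": {("goal", "project-NP")},
     "then": {("structure", "NP")}},
]

def run(working_memory: set) -> list:
    """Fire matching productions until quiescence; return the firing order."""
    fired = []
    changed = True
    while changed:
        changed = False
        for p in PRODUCTIONS:
            if p["name"] not in fired and p["if"] <= working_memory:
                working_memory |= p["then"]   # actions add new memory elements
                fired.append(p["name"])
                changed = True
    return fired

print(run({("buffer", "noun")}))
```

The propose/apply pair firing back-to-back illustrates how several production firings bundle into a single operator, as the rules panel describes.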

Our goals

• Integrate the MP into a cognitive modeling engine
• Explore language/task integrations using the MP
• Test cross-linguistic implementation possibilities with the MP
• Ultimately, determine whether the MP supports incremental, operator-based processing

Our approach

• Map the syntactic parsing task onto an operator-based framework
  – Specify goals, subgoals, etc. for parsing
  – Develop operator types for the various MP syntactic operations
  – Implement constraints, preconditions, precedence hierarchies
  – Integrate necessary and relevant external knowledge sources
• Strengths:
  – We have already done this for a prior syntactic model.
  – The MP has an operator-like feel to it.
  – The Soar operator-based framework is versatile and flexible.
• Weaknesses:
  – The MP literature does not address incremental parsing in depth.
  – The external knowledge sources are somewhat incommensurable.
  – Our knowledge of human performance data is sketchy.
• Issue: find a balance between generation and parsing
  – Most MP descriptions are generative, not recognitional, in focus.
  – Is it advisable and well motivated to "undo" or "reverse" movements?
  – If not, is generate-and-test the right mechanism for parsing input?
  – What are the implications for learning and bootstrapping language capabilities (e.g. parsing in the service of generation)?

Current status

• Proof of concept for fundamental syntactic structures
• Basic transitive sentences work; ditransitives soon
• Unergatives and unaccusatives work
• All functional and lexical projections in syntactic structure
• Feature percolation and feature checking mostly work
• Some constraints derived from thematic information

Future functionality

• XP adjunction
• More/deeper semantics
  – Quantifier raising
  – Scopal relationships
  – C-command and other interpretive mechanisms
  – More detailed LCS structures
• Web-based interactive Minimalist Parser grapher
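A hedged sketch of the feature-checking idea reported as mostly working in the current status: an uninterpretable feature on a probe is deleted when it finds a matching interpretable feature on a goal, with strong features triggering movement and weak features checked under merge. The data layout and feature names here are illustrative assumptions, not XNL-Soar's internal representation.

```python
# Feature checking in miniature: uninterpretable features map to a strength
# flag (True = strong -> move, False = weak -> merge); matched features are
# deleted, per the checking mechanism the poster describes.
def check_features(probe: dict, goal: dict) -> list:
    """Delete matched uninterpretable features; report triggered operations."""
    ops = []
    for feat, strong in list(probe["uninterpretable"].items()):
        if feat in goal["interpretable"]:
            del probe["uninterpretable"][feat]   # feature checked and deleted
            ops.append("move" if strong else "merge")
    return ops

# little v probing a D-featured object: a weak [uD] is checked by merge
v = {"uninterpretable": {"D": False}}
dp = {"interpretable": {"D"}}
print(check_features(v, dp))   # v's [uD] is deleted in the process
```

A strong feature (e.g. a hypothetical {"wh": True}) would instead report "move", which is how strong features drive the movement operators.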

Similar Work

• Incremental parsing in general (Phillips 2003)
• Other linguistic theories for incremental parsing
  – GB (Kolb 1991)
  – Dependency (Milward 1994, Ait-Mokhtar et al. 2002)
  – Categorial Grammar (Izuo 2004)
• Finite-state methods (Ait-Mokhtar & Chanod 1997)
• Minimalist parsing in other frameworks (Stabler 1997, Harkema 2001)
• Thematic information and parsing (Schlesewsky & Bornkessel 2004)
• Crosslinguistic considerations in incremental parsing (Schneider 2000)
• Human studies on ambiguity, reanalysis
  – Eye tracking (Kamide, Altmann, & Haywood 2003)
  – ERP (Bornkessel, Schlesewsky, & Friederici 2003)

Future applications

• Integrate syntax/semantics into a discourse/conversation component
• Develop human-agent and agent-agent communication
• Parameterize XNL-Soar for processing of languages other than English
• Model cognition in reading
• Model real-time language/task integrations
• Note: all of the above have been implemented in NL-Soar, the predecessor

For more information ...

• on Soar: http://sitemaker.umich.edu/soar
• on NL-Soar: http://linguistics.byu.edu/nlsoar