
Phase-based Minimalist Parsing and complexity in non-local dependencies

Cristiano Chesi
NETS - IUSS, P.zza Vittoria 15, I-27100 Pavia (Italy)
[email protected]

Abstract

English. A cognitively plausible parsing algorithm should perform like the human parser in critical contexts. Here I propose an adaptation of Earley's parsing algorithm, suitable for Phase-based Minimalist Grammars (PMG, Chesi 2012), that is able to predict complexity effects in performance. Focusing on self-paced reading experiments on object cleft sentences (Warren & Gibson 2005), I will associate to parsing a complexity metric based on the cued features to be retrieved at the verb segment (Feature Retrieval & Encoding Cost, FREC). FREC is crucially based on the memory usage predicted by the discussed parsing algorithm, and it correctly fits the reading times revealed.

Italiano. A cognitively plausible parsing algorithm should perform comparably to humans in critical contexts. In this work I propose an adaptation of Earley's algorithm that uses Phase-based Minimalist Grammars (PMG, Chesi 2012). Together with the algorithm, I discuss a cost function (Feature Retrieval & Encoding Cost, FREC) able to measure the difficulty of retrieving the referents involved in long-distance dependencies. The function is based on the morphosyntactic features stored in the memory buffer used by the parser. Focusing on object cleft structures, I show that FREC is predictive of the experimental data obtained in classic self-paced reading studies (Warren & Gibson 2005).

1 Introduction

The last twenty years of formal linguistic research have been deeply influenced by Chomsky's minimalist intuitions (Chomsky 1995, 2013). In a nutshell, the core Minimalist proposal is to reduce phrase structure formation to the recursive application of a binary, bottom-up, structure-building operation dubbed Merge. Merge creates hierarchical structures by combining two lexical items (1.a), one lexical item and an already built phrase (built by previous applications of Merge) (1.b), or two already built phrases (1.c).

(1) a. [x y]   b. [x YP]   c. [XP YP]

Phrases are not linearly ordered by Merge. Only when they are spelled out (i.e. sent to the Sensory-Motor interface, aka Phonetic Form, PF) is linearization required: assuming that x and y are terminal nodes (i.e. words), either ⟨x y⟩ or ⟨y x⟩ can be a proper linearization of (1.a). Hierarchical structure (and linearization) is also determined by another structure-building operation: Move (or Internal Merge, Chomsky 1995). Move re-arranges phrases in the structure by re-merging an item (already merged in the structure) at the edge of the current, top-most phrase: for instance, [XP [YP [ZP]]] can lead to [ZP [XP [YP (ZP)]]] if XP (the probe) has a feature triggering movement (e.g. +f) and ZP (the goal) has the relevant feature qualifying it as a plausible target for movement (e.g. -f). In the end, the displaced element (ZP) occupies the edge of the structure. When the items within an already built phrase, for instance XP, are delivered to PF, they get properly linearized according to their hierarchical structure (e.g. the Linear Correspondence Axiom, Kayne 1994), intrinsic phonetic properties (e.g. cliticization), as well as economy conditions (e.g. an item should not be pronounced twice). Such a (cyclic) spell-out happens at phases: XP will be delivered to PF only if it qualifies as a phase (Chomsky 2013). In this sense, a phase should be a constituent/phrase with some degree of completeness with respect to semantic interpretation (Logical Form, aka LF). Most minimalist linguists agree on the fact that a full-fledged clause (aka Complementizer Phrase, CP) is a phase, that the highest argumental shell of a verb (aka little-v Phrase, vP) qualifies as a phase, and that a full argument (aka Determiner Phrase, DP) is a phase as well. Such a simple (and computationally appealing) model has been fully formalized (Stabler 1997, Collins & Stabler 2016), and parsing algorithms implementing the main minimalist insights have been discussed in the literature (e.g. Harkema 2001, Chesi 2012 a.o.).

In these pages, I will present some of the advantages of retaining such a simplified computational approach to syntactic derivation. Crucially, I will try to overcome some clear disadvantages of assuming the standard, bottom-up, structure-building operations just presented, while obtaining, at the same time, a better empirical fit: on the one hand, I will avoid any non-efficient deductive-parsing perspective (a consequence of the assumed bottom-up nature of the Merge and Move operations); on the other, I will promote a more transparent relation between formal competence, parsing and psycholinguistic performance by presenting a simple adaptation of Earley's Top-Down parsing algorithm (Earley 1970) and a complexity metric that refers directly to parsing memory usage: this metric will be able to account for the complexity of retrieving the correct item while processing specific non-local dependencies. By "non-local" dependencies I refer to those relations involving movement, namely constructions where the very same item occurs in two distinct, non-adjacent positions: for instance, wh-dependencies in English require the wh- item (who, in (2)) to be interpreted both in the left-peripheral (focalized) position (the Criterial position, in the sense of Rizzi 2007) and in the lower thematic position (right next to the verb meet in (2))¹:

(2) Who1 do you think Mary will meet _1?

¹ Coreference in non-local dependencies will be indicated by the same subscript placed both on the "displaced" item and on the thematic position (the non-pronounced item in the thematic position is indicated by a co-indexed underscore).
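Before turning to object clefts, the two structure-building operations and cyclic spell-out can be made concrete with a toy Python sketch; the tuple encoding, the silent-copy marking and all function names are my own illustrative assumptions, not part of the formalizations cited above.

def merge(a, b):
    """External Merge: combine two items/phrases into a binary structure."""
    return (a, b)

def move(tree, goal):
    """Internal Merge: re-merge the goal at the edge, leaving a silent copy."""
    def silence(t):
        if t == goal:
            return "_" + goal              # the copy is not pronounced again
        if isinstance(t, tuple):
            return tuple(silence(x) for x in t)
        return t
    return (goal, silence(tree))

def spell_out(tree):
    """Linearization at spell-out: left-to-right, skipping silent copies."""
    if isinstance(tree, str):
        return [] if tree.startswith("_") else [tree]
    return [w for branch in tree for w in spell_out(branch)]

xp = merge("x", "y")                        # (1.a): two lexical items
zp = merge("z", merge("w", "v"))            # an already built phrase
print(spell_out(move(merge(xp, zp), "z")))  # ['z', 'x', 'y', 'w', 'v']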
The critical derivation I will discuss in this paper is that of object clefts (Gordon et al. 2001), which share with wh-questions a similar non-local dependency formation:

(3) It is [DP1 the banker|John|me] that [DP2 the lawyer|Dan|you] will meet _DP1

In short, the head of the dependency (DP1) should be interpreted both as a focalized item and as the direct object of the embedded verb (this is where the name of the construction, "object cleft", comes from). The difficulty of parsing this structure has been deeply discussed in the literature (Gordon et al. 2004). What is considered a crucial factor is the role of the similarity between DP1 and DP2 (the two DPs of the cleft, Belletti and Rizzi 2013; §2). To capture this fact, I will re-adapt Earley's algorithm (§3.1) to operate on a specific version of Minimalist Grammars (§3). This will allow us to subsume the similarity effect by predicting the reading differences revealed in self-paced reading experiments (e.g. Warren & Gibson 2005; §4).

2 Parsing with Minimalist Grammars

Since Merge and Move strictly operate "from bottom to top", we expect the sentence structure in (3) to be built in 9 steps (and 5 phases: ph1, ph2 …):

1. [ph1 the banker]
2. [ph3 meet [ph1 …]]
3. [ph3 will [meet [ph1 …]]]
4. [ph2 the lawyer] (independently built)
5. [ph3 [ph2 …] will [meet [ph1 …]]]
6. [ph4 that [ph3 [ph2 …] will [meet [ph1 …]]]]
7. [[ph1 …] [ph4 that [ph3 [ph2 …] will [meet (ph1 …)]]]] (ph1 moves to the ph4 edge)
8. [ph5 is [[ph1 …] [ph4 that [ph3 [ph2 …] will [meet [ph1 …]]]]]]
9. [ph5 it [is [[ph1 …] [ph4 that [ph3 [ph2 …] will [meet [ph1 …]]]]]]]

With the exception of step 4, all the other steps must be strictly ordered. As a consequence, moving the direct object to the relevant position forces the linearization to place ph1 first at the edge of ph3, and then at the edge of ph4.
This is how Minimalism derives the relevant non-local dependencies in (3). Obviously, this is not transparent at all with respect to parsing (e.g. Fong 2011), where the processing order is expected to be completely reversed:

1. [ph5 ] is initiated
2. [ph1 ] is fully processed while [ph5 ] is still open

3. [ph4 ] is initiated (a Relative Clause)
4. [ph3 ] is initiated as well (a Verbal Phrase)
5. [ph2 …] is fully processed while [ph5 ], [ph4 ] and [ph3 ] are open (here ignored)
6. [ph1 ] finally receives a thematic role, hence [ph5 ], [ph4 ] and [ph3 ] can be closed.

Unless we deeply revise Minimalist Grammars (both with respect to movement, Fong 2005, and to thematic role assignment, Niyogi & Berwick 2005), we are left with an asymmetry that cannot be explained simply in terms of structure-building operations, as discussed in the next section.

2.1 The "similarity" problem

Warren & Gibson (2005) show that in cleft constructions like the one discussed in (3), varying the two DPs [ph1 ] and [ph2 ] produces differences in reading time at the verb segment in self-paced reading experiments, with the full-DP matching condition ([ph1 the barber] that [ph2 the banker] praised …) and the proper-noun matching condition ([ph1 John] that [ph2 Dan] praised …) ranking higher in terms of difficulty (greatest slow-down at the verb segment), while pronouns ([ph1 you] that [ph2 we] praised …) are easier (fastest reading times). Neither CFG-based parsing algorithms (in fact, no classic algorithm implements the non-local dependencies in (3) as presented in §2) nor Minimalist deductive parsing (whose strategies exploit the weak equivalence of MGs with multiple Context-Free Grammars, Michaelis 1998) have a chance to compare these cases.

3 A processing-friendly proposal

Phase-based Minimalist Grammars (PMGs, Chesi 2012) suitable for parsing sentences like the ones in (3) can be formalized as follows:

(4) PMG able to parse cleft sentences

Lexicon
[[+D +Sg Johni] [N _i]], [[+D +Sg Dani] [N _i]], [N +Sg banker], [N +Sg lawyer], [+D the], [+D +P1 +Pl +case_acc me [N Ø]], [+D +P2 +Sg +case_nom you [N Ø]], [+T will], [+FIN that], [=[DP (+case_nom)] =[DP (+case_acc)] V meet], [+exp it], [=rCP BE is]

Phases
DP → [DP ([+F Ø]/[+S Ø]) +D N]
Cleft → [CP +Exp BE]
rCP → [CP +F +FIN (+S) +T V]

Operations
Merge([phH +f (+fn) (H)], [+f L]) = [phH [+f L (+fn) (H)]]
Phase Projection([phH =phX H]) = [phH =phX H [phX ]]
Move: if [phX +f X] is expected and [phX [phY +f +g Y] X] is found → MEM([phY +g])
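For concreteness, the grammar in (4) can also be read as plain data; the Python encoding below is an assumption of mine (flat feature lists, with phase templates keyed by label), not the paper's actual implementation.

# An illustrative encoding of grammar (4): lexical items as feature/word
# sequences, phases as the feature sequences they make the parser expect.
LEXICON = [
    ["+D", "+Sg", "John", ["N", "_"]],   # proper noun: N-to-D trace
    ["+D", "+Sg", "Dan", ["N", "_"]],
    ["N", "+Sg", "banker"],
    ["N", "+Sg", "lawyer"],
    ["+D", "the"],
    ["+D", "+P1", "+Pl", "+case_acc", "me", ["N", "Ø"]],
    ["+D", "+P2", "+Sg", "+case_nom", "you", ["N", "Ø"]],
    ["+T", "will"],
    ["+FIN", "that"],
    ["=DP(+case_nom)", "=DP(+case_acc)", "V", "meet"],
    ["+exp", "it"],
    ["=rCP", "BE", "is"],
]

PHASES = {
    "DP":    ["(+F/+S Ø)", "+D", "N"],           # optional left-edge slot
    "Cleft": ["+Exp", "BE"],                     # the default (root) phase
    "rCP":   ["+F", "+FIN", "(+S)", "+T", "V"],  # the clefted (reduced) CP
}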

As in MGs (Stabler 1997), the Lexicon is a finite set of lexical items storing phonetic, semantic and syntactic features (functional +F, selectional =S, categorial C); an item bearing a selection feature, e.g. [=XP A], requires an XP ph(r)ase right afterwards: [=XP A [XP ]] (once the features are projected in the structure, i.e. [XP ], the selection features are deleted, i.e. =XP); functional features, e.g. +X, express a functional specification like determiner +D, tense +T or topic +S (when placed within parentheses, e.g. (+f), functional features are optional; Ø indicates phonetically null items).

Merge simply unifies the structure expected/built so far with a new incoming item if, and only if, this item bears (at least) the first relevant feature expected (the Merge operation is greedy: an item bearing more features in the correct expected order will lexicalize them all):

1. Merge([+X +Y +Z W], [+X +Y A]) = [[+X +Y A] +Z W]
2. Merge([[+X +Y A] +Z W], [+Z B]) = [[+X +Y A] [+Z B] W]
3. Merge([[+X +Y A] [+Z B] W], [W C]) = [[+X +Y A] [+Z B] [W C]]

Move uses a Last-In-First-Out (LIFO) memory buffer (M) to create non-local dependencies: M is used to store the unexpected bundles of features merged in the derivation (below, [+W U] is the unexpected bundle triggering Move):

1'. Merge([+X +Y +Z W], [+X +Y +W U A]) = [[+X +Y +W U A] +Z W]
2'. Move([+X +Y +W U A]) = M[[+W U]]

Items in the memory buffer M will be re-merged in the structure, before any other item taken from the lexicon, as soon as a coherent selection is introduced by another merged item:

3'. Merge([… [W =[+W U] C [+W U ]]], M[[+W U]]) = [… [W =[+W U] C [+W U]]], M[empty]

(the expected [+W U ] position is lexicalized by the bundle popped from M, and M is emptied). Notice that phonetic features (items under angled brackets, i.e. ⟨…⟩) are not re-merged in the structure (that is, they are not expected to be found in the input), since they have already been pronounced/parsed in the higher position. When the M(emory) buffer is empty and no more selection features must be expanded, the procedure ends.
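Under the same flat-list simplification, the interaction of greedy Merge and LIFO Move can be sketched in Python as follows (the function names and the data layout are mine; a sketch, not the PMG definition itself):

def pmg_merge(expected, item, memory):
    """Unify an item with the longest matching prefix of the expectations;
    the leftover (unexpected) features are clustered and pushed onto M."""
    n = 0
    while n < min(len(expected), len(item)) and expected[n] == item[n]:
        n += 1
    if n == 0:
        return None                      # no feature checked: Merge fails
    if item[n:]:
        memory.append(item[n:])          # 2': Move stores the extra bundle
    return expected[n:]                  # expectations still to be expanded

def pmg_remerge(selection, memory):
    """3': a selection like =[+W U] re-merges the last compatible bundle."""
    return memory.pop() if memory and memory[-1] == selection else None

M = []
print(pmg_merge(["+X", "+Y", "+Z", "W"], ["+X", "+Y"], M))             # ['+Z', 'W'], cf. 1.
print(pmg_merge(["+X", "+Y", "+Z", "W"], ["+X", "+Y", "+W", "U"], M))  # ['+Z', 'W'], M = [['+W', 'U']], cf. 1'-2'.
print(pmg_remerge(["+W", "U"], M))                                     # ['+W', 'U'], M emptied, cf. 3'.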

3.1 Parsing cleft structures with PMGs

The parsing algorithm using the minimalist grammar described in (4) implements an Earley-like procedure composed of three sub-routines:

1. Ph(ase)P(rojection) (Earley Prediction procedure): the most prominent (i.e. first/left-most) select feature is expanded (sentence parsing starts with a default PhP using one of the phases in grammar (4));
2. Merge (Earley Scanning procedure): if Memory is empty, the first available feature F in the expected phase is searched for in the input/lexicon and the possible items are retrieved (search(F) = [F lex1], [F lex2] … [F lexn]), then unified with the expected structure (e.g. Merge([F … X], [F lex1]) = [[F lex1] … X]); items stored in Memory are checked before the sentence input for Merge;²
3. Move: if more features than the one expected are introduced, those features are clustered and moved into the LIFO Memory buffer: M[[slot 1][slot 2] … [slot n]].

² For reasons of space, I will discuss here neither lexical and syntactic ambiguity nor reanalysis (i.e. recovery from wrong expectations); the algorithm proposed here is meant to be a Top-Down complete procedure, that is, all the possible alternatives will be taken into consideration and stored in the parsing "chart" as in the classic Earley parser. For a ranking of the alternatives see Hale (2001).
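To see how the three sub-routines interact, here is a deliberately reduced, runnable Python trace of the embedded rCP phase only (the flat token encoding and the obligatory +S are my simplifications; ambiguity and the chart of footnote 2 are omitted); compare the numbered steps below:

def parse_rcp(tokens):
    expected = ["+F", "+FIN", "+S", "+T", "V"]   # the rCP phase of (4)
    memory, trace = [], []                       # LIFO buffer M and a log
    for feature, extra, word in tokens:          # Merge/scan, left to right
        assert expected.pop(0) == feature, f"unexpected {word}"
        if extra:                                # Move: extra features to M
            memory.append((extra, word))
            trace.append(f"store [{' '.join(extra)} ({word})] in M")
    while memory:                                # PhP at the verb: the two
        _feats, word = memory.pop()              # =DP selections re-merge
        trace.append(f"re-merge [DP {word}]")    # from M, in LIFO order
    return trace

print("\n".join(parse_rcp([
    ("+F",   ["+D", "N"], "the banker"),         # cf. steps 7-13 below
    ("+FIN", [],          "that"),
    ("+S",   ["+D", "N"], "the lawyer"),         # cf. steps 16-18 below
    ("+T",   [],          "will"),
    ("V",    [],          "meet"),               # cf. steps 21-26 below
])))
# store [+D N (the banker)] in M
# store [+D N (the lawyer)] in M
# re-merge [DP the lawyer]
# re-merge [DP the banker]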

Given the recursive, cyclic application of the three sub-routines above, this is the sequence of steps needed for parsing a cleft sentence like (3):

1. Default PhP (in this case, Cleft): [CP +Exp BE]
2. Search(+Exp): M[empty], Lex[[+exp it]]
3. Merge([CP +Exp BE], [+exp it]) = [CP [+exp it] BE]
4. Search(BE): M[empty], Lex[[=rCP BE is]]
5. Merge([CP [+exp it] BE], [=rCP BE is]) = [CP [+exp it] [=rCP BE is]]
6. PhP([CP [+exp it] [=rCP BE is]]) = [CP [+exp it] [=rCP BE is [CP +F +FIN +S +T V]]]
7. Search(+F): M[empty], Lex[[DP [+F Ø] +D N]]
8. Merge([… [CP +F +FIN +S +T V]], [DP [+F Ø] +D N]) = [CP [DP [+F Ø] +D N] +FIN +S +T V]
9. Search(+D): M[empty], Lex[[+D the]]
10. Merge([DP [+F Ø] +D N], [+D the]) = [CP [DP [+F Ø] [+D the] N] +FIN +S +T V]
11. Search(N): M[empty], Lex[[N banker]]
12. Merge([DP [+F Ø] [+D the] N], [N banker]) = [CP [DP [+F Ø] [+D the] [N banker]] +FIN +S +T V]
13. Move([DP [+F Ø] [+D the] [N banker]]) = M[[DP +D N (the banker)]] (Move is triggered because at step 8 the features +D N were unexpected; only after full lexicalization is [DP [+F Ø] +D N] stored in M, namely at step 13)
14. Search(+FIN): M[[DP +D N (the banker)]], Lex[[+FIN that]]
15. Merge([CP [DP [+F Ø] [+D the] [N banker]] +FIN +S +T V], [+FIN that]) = [CP [DP [+F Ø] [+D the] [N banker]] [+FIN that] +S +T V]
16. Search(+S): M[[DP +D N (the banker)]], Lex[[DP [+S Ø] +D N]]
17. Merge([CP [DP [+F Ø] [+D the] [N banker]] [+FIN that] +S +T V], [DP [+S Ø] +D N]) = [CP [DP [+F Ø] [+D the] [N banker]] [+FIN that] [DP [+S Ø] +D N] +T V]
18. (repeat steps 9-13, mutatis mutandis, for "the lawyer")
19. Search(+T): M[[DP +D N (the lawyer)], [DP +D N (the banker)]], Lex[[+T will]]
20. Merge([CP [DP [+F Ø] [+D the] [N banker]] [+FIN that] [DP [+S Ø] [+D the] [N lawyer]] +T V], [+T will]) = [CP [DP [+F Ø] [+D the] [N banker]] [+FIN that] [DP [+S Ø] [+D the] [N lawyer]] [+T will] V]
21. Search(V): M[[DP +D N (the lawyer)], [DP +D N (the banker)]], Lex[[=[DP (+case_nom)] =[DP (+case_acc)] V meet]]
22. Merge([CP … [+T will] V], [=DP =DP V meet]) = [CP … [+T will] [=DP =DP V meet]]
23. PhP([CP … [=DP =DP V meet]]) = [CP … [=DP =DP V meet [DP +D N]]]
24. Merge([CP … [=DP =DP V meet [DP +D N]]], M[[DP +D N (the lawyer)]]) = [CP … [=DP =DP V meet [DP +D N (the lawyer)]]]
25. PhP([CP … [=DP =DP V meet [DP +D N (the lawyer)]]]) = [CP … [=DP =DP V meet [DP +D N (the lawyer)] [DP +D N]]]
26. Merge([CP … [=DP =DP V meet … [DP +D N]]], M[[DP +D N (the banker)]]) = [CP … [=DP =DP V meet … [DP +D N (the banker)]]]

According to the lexicon and the phase expectations, steps 10 and 19 could also have found in the input [+D N John], [+D N Dan], [+D +P1 +Pl +case_acc me [N Ø]] or [+D +P2 +Sg +case_nom you [N Ø]], capturing all the possible combinations of definite descriptions, correct pronominal DPs and proper nouns: exactly the possibilities we want to test.
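Since three lexical choices are available for each of the two DP slots, the full experimental design can be enumerated mechanically; the short Python sketch below uses the condition labels of section 4 (D, N, P) and the words of example (3).

from itertools import product

DP1 = {"D": "the banker", "N": "John", "P": "me"}   # clefted object
DP2 = {"D": "the lawyer", "N": "Dan",  "P": "you"}  # embedded subject

for (c1, w1), (c2, w2) in product(DP1.items(), DP2.items()):
    print(f"{c1}-{c2}: it is {w1} that {w2} will meet ...")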

4 Explaining the "similarity" problem in terms of cue-based feature retrieval

According to the reading times revealed by Warren & Gibson (2005) (see also Gordon et al. 2004 for very similar results), we can roughly rank all the (3x3) tested conditions on a difficulty scale (D = definite description condition, e.g. "the banker"; N = nominal condition, e.g. "Dan"; P = pronoun condition, e.g. "we"; for instance, D-D stands for "it is the banker that the lawyer will meet …", vs. the D-P condition "it is the banker that we will meet …"):

(5) D-D ≥ N-D ≈ N-N ≈ P-D > D-N ≥ P-N > D-P ≥ N-P ≈ P-P

Building on Gillund & Shiffrin's (1984) Search of Associative Memory (SAM) model, and assuming a cue-based retrieval mechanism for items in memory (Van Dyke & McElree 2006), we can define a complexity (C) function associated to the features to be retrieved from M (Feature Retrieval Cost, FRC, Chesi 2016) for each item to be re-merged after the phase projection at the verb (V):

(6) CFRC(V) = ∏(i=1..m) (1 + nFi) / (1 + dFi)

In the formula above, m is the number of items stored in memory at retrieval, nFi is the number of features characterizing the item to be retrieved that are non-distinct in memory (i.e. also present in other items in memory), and dFi is the number of distinct cued features (e.g. case features explicitly probed by the verb selection). CFRC expresses, in numerical terms, the cost that should fit the revealed reading times (the higher the differences in reading times, the higher the differences in CFRC). According to the lexicon in (4), the cost of retrieving the correct items in the D-D condition, for instance, is calculated as follows:

1. [=[DP (+case_nom)] =[DP (+case_acc)] V meet] will trigger the retrieval of the first item (the last one inserted in the buffer), which is (step 24) the DP [+D +Sg N (the lawyer)];
2. no cued features are present (the verb selection only asks for an optional nominative case) and the 3 features to be retrieved are in fact shared with the other item in memory ([+D +Sg N (the banker)]);
3. hence: CFRC = (1+3)/(1+0) x (1+3)/(1+0) = 16.

Notice that retrieving the object once the subject has been removed from memory has a minimal cost, since no confounding features are present in memory anymore. As for the other relevant conditions: N-N shares the same features as the D-D condition, hence we expect a similar cost, except for the fact that the N feature is not fully lexicalized: it is the trace of an N-to-D movement (Longobardi 1994). Counting this as 0.5 (further investigation is needed to correctly assign a cost to an emptied lexical position), we obtain 12.25. The complexity is the same in the N-D condition (since the [N] lexical feature of D is compared to the trace [N _i] feature of N, counting 0.5). While we would expect a slightly smaller cost in the P-D condition (P does have an empty [N Ø] feature), that is 9, we correctly predict a lower complexity for retrieving pronouns in subject position, since they always bear person features (which are distinct from the default 3rd person of D and N) and they are marked for case (which is cued by the verb), producing the minimal cost in the P-P condition (CFRC = 1) and similar costs in the D-P and N-P conditions (both CFRC = 4). Predictions can be further differentiated by adding a cost for encoding the features in the structure (eF), which is (to keep the calculation as simple as possible) proportional to the number of lexical features to be encoded once an item is retrieved from memory (the numerator of the CFRC cost function becomes 1 + nFi + eFi). This corresponds to an increase of +1 for D and +0.5 for N at retrieval. The new CFREC(V) in the different conditions becomes:

CFREC(V) D-D = 50
CFREC(V) N-D = 37.5
CFREC(V) N-N = 30.37
CFREC(V) P-D = 25
CFREC(V) D-N = 24.5
CFREC(V) P-N = 12.25
CFREC(V) D-P = 8
CFREC(V) N-P = 6
CFREC(V) P-P = 2

Though in some cases FREC predicts slightly larger differences than observed (e.g. D-D vs. the N-D/N-N conditions), it correctly ranks all the conditions revealed by the discussed experiment, and it is coherent with specific predictions (e.g. related to feature matching) discussed in the literature (Belletti & Rizzi 2013).
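As a worked check, the reconstructed product in (6) can be computed directly; the Python transcription below reproduces only the values spelled out in the prose above, and the per-condition (nF, dF) pairs are my reading of that prose, not data from the experiment.

def c_frc(items):
    """items: one (nF, dF) pair per item in memory at retrieval."""
    cost = 1.0
    for nf, df in items:
        cost *= (1 + nf) / (1 + df)
    return cost

print(c_frc([(3, 0), (3, 0)]))      # D-D: 4 x 4 = 16.0
print(c_frc([(2.5, 0), (2.5, 0)]))  # N-N: the N trace counts 0.5 -> 12.25
print(c_frc([(2, 0), (2, 0)]))      # P-D: 3 x 3 = 9.0
print(c_frc([(1, 1), (1, 1)]))      # P-P: person distinct, case cued -> 1.0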
5 Conclusion

In this paper I presented an adaptation of Earley's Top-Down parsing algorithm to be used with a simple implementation of a Minimalist Grammar (PMG). The advantages of this approach lie both in its cognitive plausibility and in its parsing/performance transparency. From the cognitive plausibility perspective, I showed how a re-orientation of the minimalist structure-building operations Merge and Move is sufficient to include such operations directly within a parsing procedure. This is a step toward the "Parser Is the Grammar" (PIG) default hypothesis (Phillips 1996) and a welcome simplification of the description of linguistic competence: such a grammar description (i.e. our linguistic competence) is shared both in production (generation) and in comprehension (parsing); this seems trivial from a cognitive perspective (a unique Broca's area is activated in syntactic processing both in parsing and in generation), but it is far from trivial in computational terms. On the other hand, from the parsing/performance transparency perspective, I presented a complexity metric (FREC), based on the cued features stored in memory, which characterizes performance in object cleft constructions better than alternative models: for instance, the Dependency Locality Theory (DLT), based on an accessibility hierarchy (Gibson 2000), is unable to predict a high complexity in the N-N condition comparable to the N-D or D-D conditions, since N should be uniformly more accessible than D, contrary to the facts. The proposed model should obviously be extended in many respects to capture other critical phenomena (see Lewis & Vasishth 2005), but the first results on specific well-studied constructions, like object clefts, seem very promising.

References

Belletti, A., & Rizzi, L. 2013. Intervention in grammar and processing. In From grammar to meaning: The spontaneous logicality of language, 293-311.

Chesi, C. 2012. Competence and Computation: toward a processing friendly minimalist Grammar. Padova: Unipress.

Chesi, C. 2016. Il processamento in tempo reale delle frasi complesse. In E. M. Ponti & M. Budassi (eds.), 21-38.

Chomsky, N. 1995. The Minimalist Program (Current Studies in Linguistics 28). Cambridge (MA): MIT Press.

Chomsky, N. 2008. On phases. In R. Freidin, C. P. Otero, and M. L. Zubizarreta (eds.), Foundational issues in linguistic theory, 133-166.

Chomsky, N. 2013. Problems of projection. Lingua, 130, 33-49.

Collins, C., & Stabler, E. 2016. A formalization of minimalist syntax. Syntax, 19(1), 43-78.

Earley, J. 1970. An efficient context-free parsing algorithm. Communications of the Association for Computing Machinery, 13(2), 94-102.

Fong, S. 2005. Computation with probes and goals. In A. M. Di Sciullo and R. Delmonte (eds.), UG and external systems: Language, brain and computation, 75, 311.

Fong, S. 2011. Minimalist parsing: Simplicity and feature unification. In Proceedings of the Workshop on Language and Recursion. Mons, Belgium: University of Mons, March.

Gibson, E. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In Image, language, brain, 95-126.

Gillund, G., & Shiffrin, R. M. 1984. A retrieval model for both recognition and recall. Psychological Review, 91(1), 1-67.

Gordon, P. C., Hendrick, R., & Johnson, M. 2001. Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6), 1411-1423.

Gordon, P. C., Hendrick, R., & Johnson, M. 2004. Effects of noun phrase type on sentence complexity. Journal of Memory and Language, 51, 97-114.

Hale, J. T. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, 1-8.

Hale, J. T. 2011. What a rational parser would do. Cognitive Science, 35(3), 399-443.

Harkema, H. 2001. Parsing minimalist languages. Doctoral dissertation, University of California, Los Angeles.

Kayne, R. S. 1994. The antisymmetry of syntax. Cambridge (MA): MIT Press.

Lewis, R. L., & Vasishth, S. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375-419.

Longobardi, G. 1994. Reference and proper names: a theory of N-movement in syntax and logical form. Linguistic Inquiry, 25, 609-665.

Michaelis, J. 1998. Derivational minimalism is mildly context-sensitive. In M. Moortgat (ed.), Logical Aspects of Computational Linguistics (LACL '98), Lecture Notes in Artificial Intelligence. Springer Verlag.

Niyogi, S., & Berwick, R. C. 2005. A minimalist implementation of Hale-Keyser incorporation theory. In A. M. Di Sciullo (ed.), UG and external systems: language, brain and computation, Linguistik Aktuell/Linguistics Today, 75, 269-288.

Phillips, C. 1996. Order and structure. Doctoral dissertation, Massachusetts Institute of Technology.

Rizzi, L. 2007. On some properties of criterial freezing. Studies in Linguistics, 1, 145-158.

Stabler, E. 1997. Derivational minimalism. In International Conference on Logical Aspects of Computational Linguistics, 68-95. Springer, Berlin, Heidelberg.

Van Dyke, J. A., & McElree, B. 2006. Retrieval interference in sentence comprehension. Journal of Memory and Language, 55(2), 157-166.

Warren, T., & Gibson, E. 2005. Effects of NP type in reading cleft sentences in English. Language and Cognitive Processes, 20(6), 751-767.