<<

CMY CY MY CM K Y M C Centre deLingüística Teòrica delaUniversitat Autònoma deBarcelona 45 7 Index 89 27 165 153 139 203 229 263 287 Generative Syntax.Questions,Crossroads, andChallenges Matrix Syntax. approach. Syntax: atimechartandsomereflections. Breaking aconceptualtie. theory ofgrammar? Components. Architecture of Grammar and the Role of Interface Interfaces. Challenges. and Questions, Insights, Language: of Faculty the and Grammar Reading Program. Questions, Crossroads,andChallenges. Biberauer, Theresa.Factors2and3:Towardsaprincipled D’Alessandro, Roberta. Hunter, Tim. Martin, Roger Irurtzun, Aritz.De-syntacticisingSyntax?Concernsonthe Siddiqi, Daniel.OnMorpho-Syntax. Richards, Marc. Cerrudo, Alba. Chomsky, Noam;Gallego,ÁngelJ.;Ott,Dennis.Generative Chomsky, Noam. Gallego, ÁngelJ.;Ott,Dennis.Epilogue.GenerativeSyntax: Institut Interuniversitari deFilologia Valenciana ISSN 1695-6885(inpress);2014-9719(online) Catalan Journal ofLinguistics Catalan Journal What sort ofcognitive hypothesis derivational a is https://revistes.uab.cat/catJL † ; Orús,Román;Uriagereka,Juan.Towards Discourse Phenomena as a Window to the Problems of ‘Problems of Projection’: Some Puzzling Foundational Issues: The Special Issue,2019 The achievements of Generative Servei dePublicacions

Catalan Journal of Linguistics
Special Issue, 2019

Generative Syntax. Questions, Crossroads, and Challenges

Edited by Ángel J. Gallego & Dennis Ott

Centre de Lingüística Teòrica de la Universitat Autònoma de Barcelona
Institut Interuniversitari de Filologia Valenciana
ISSN 1695-6885 (in press); 2014-9719 (online)
https://revistes.uab.cat/catJL


Summary

Catalan Journal of Linguistics = CatJL
Special Issue, pp. 1-288, 2019
ISSN 1695-6885 (in press); ISSN 2014-9719 (online)
https://revistes.uab.cat/catJL

Articles

7-26 D'Alessandro, Roberta (Utrecht University/UiL-OTS)
The achievements of Generative Syntax: a time chart and some reflections. Catalan Journal of Linguistics, 2019, Special Issue, pp. 7-26.

In May 2015, a group of eminent linguists met in Athens to debate the road ahead for generative grammar. There was a lot of discussion, and the linguists expressed the intention to draw a list of achievements of generative grammar, for the benefit of other linguists and of the field in general. The list has been sketched, and it is rather interesting, as it presents a general picture of the results that is very 'past-heavy'. In this paper I reproduce the list and discuss the reasons why it looks the way it does.
Keywords: generative grammar; syntax; linguistics; results

27-44 Martin, Roger† (Yokohama National University); Orús, Román (Johannes Gutenberg-Universität; Donostia International Physics Center / Ikerbasque Foundation for Science); Uriagereka, Juan (University of Maryland)
Towards Matrix Syntax. Catalan Journal of Linguistics, 2019, Special Issue, pp. 27-44.

Matrix syntax is a model of syntactic relations in language, which grew out of a desire to understand chains. The purpose of this paper is to explain its basic ideas to a linguistics audience, without entering into too many formal details (for which cf. Orús et al. 2017). The resulting mathematical structure resembles some aspects of quantum mechanics and is well-suited to describe linguistic chains. In particular, sentences are naturally modeled as vectors in a Hilbert space with a tensor product structure, built from 2x2 matrices belonging to some specific group. Curiously, the matrices the system employs are simple extensions of customary representations of the major parts of speech, as [±N, ±V] objects.
Keywords: syntax; chains; ; Hilbert space; matrix

45-88 Biberauer, Theresa (University of Cambridge; Stellenbosch University; University of the Western Cape)
Factors 2 and 3: Towards a principled approach. Catalan Journal of Linguistics, 2019, Special Issue, pp. 45-88.

This paper seeks to make progress in our understanding of the non-UG components of Chomsky's (2005) Three Factors model. In relation to the input (Factor 2), I argue for the need to formulate a suitably precise hypothesis about which aspects of the input will qualify as 'intake' and, hence, serve as the basis for grammar construction. In relation to Factor 3, I highlight a specific cognitive bias that appears well motivated outside of language, while also having wide-ranging consequences for our understanding of how I-language grammars are constructed, and why they should have the crosslinguistically comparable form that generativists have always argued human languages have. This is Maximise Minimal Means (MMM). I demonstrate how its incorporation into our model of grammar acquisition facilitates understanding of diverse facts about natural language typology, acquisition (both in "stable" and "unstable" contexts), and also the ways in which linguistic systems may change over time.
Keywords: three factors; Universal Grammar; acquisition; crosslinguistic variation; poverty of the stimulus

89-138 Hunter, Tim (University of California)
What sort of cognitive hypothesis is a derivational theory of grammar? Catalan Journal of Linguistics, 2019, Special Issue, pp. 89-138.

This paper has two closely related aims. The main aim is to lay out one specific way in which the derivational aspects of a grammatical theory can contribute to the cognitive claims made by that theory, to demonstrate that it is not only a theory's posited representations that testable cognitive hypotheses derive from. This requires, however, an understanding of grammatical derivations that initially appears somewhat unnatural in the context of modern generative syntax. The second aim is to argue that this impression is misleading: certain accidents of the way our theories developed over the decades have led to a situation that makes it artificially difficult to apply the understanding of derivations that I adopt to modern generative grammar. Comparisons with other derivational formalisms and with earlier generative grammars serve to clarify the question of how derivational systems can, in general, constitute hypotheses about mental phenomena.
Keywords: syntax; minimalist grammars; transformational grammars; derivations; representations; derivation trees; probabilistic grammars

139-152 Richards, Marc (Queen's University Belfast)
Problems of 'Problems of Projection': Breaking a conceptual tie. Catalan Journal of Linguistics, 2019, Special Issue, pp. 139-152.

The exocentric labelling model of Chomsky's (2013, 2015) Problems of Projection renders projection rather more problematic than it was previously, giving rise to numerous technical and conceptual complications, redundancies and inconsistencies. Of particular concern is the reversibility of the assumptions that are made with respect to the relation between labelling and Search, such that the opposite theory is equally coherent and delivers the same empirical results.

After reviewing these concerns, a simpler conception of exocentric labelling is sketched in which all labels are uniformly added via external Merge of categorizing phase heads, turning unlabelled (uninterpreted) nonphase syntactic objects into labelled (interpreted) phases. Some conceptual and empirical advantages of the simpler system are finally considered.
Keywords: phases; labels; categories; transfer; islands

153-163 Siddiqi, Daniel (Carleton University)
On Morpho-Syntax. Catalan Journal of Linguistics, 2019, Special Issue, pp. 153-163.

This short paper offers a moment of reflection on the state of the Generative Grammar enterprise, especially in light of the fact that Minimalist syntax has so completely returned to a mission that includes (rather than explicitly excludes) a model of word-formation. I focus here on a discussion of crucial ways the move from "syntactic" theory to "morphosyntactic" theory has changed the mission of generative grammar and to what extent practitioners have kept pace. I hope to provide both a broad and a long view of the metatheoretic concerns we now find ourselves at the nexus of, and to suggest best practices in light of those views.
Keywords: distributed morphology; metatheory; readjustment; allosemia; psycholinguistics

165-202 Irurtzun, Aritz (CNRS-IKER)
De-syntacticising Syntax? Concerns on the Architecture of Grammar and the Role of Interface Components. Catalan Journal of Linguistics, 2019, Special Issue, pp. 165-202.

This article discusses different ways in which interface components could potentially affect syntax (or what have traditionally been analysed as syntactic phenomena). I will distinguish four types of potential effects that the interface components could have on syntax: (i) no real interaction, since almost nothing pertains to syntax: everything (beyond Merge) is externalization; (ii) computations at interface components actively affect the syntactic computation; (iii) properties of interface representations function to inform biases for language acquisition; (iv) interface components impose Bare Output Conditions (legibility conditions) that constrain the range of possible syntactic representations at the interface. I argue that the first two are problematic, whereas the latter two may help us understand a range of universal and variable phenomena.
Keywords: architecture of grammar; syntax; interfaces; bare output conditions; modularity

203-227 Cerrudo, Alba
Discourse Phenomena as a Window to the Interfaces. Catalan Journal of Linguistics, 2019, Special Issue, pp. 203-227.

This paper examines the two lines of analysis that are generally pursued when dealing with discourse phenomena in the generative tradition: syntactico-centric and interface-based approaches. Syntactico-centric analyses are criticized because they need construction-specific mechanisms, while interface-based analyses sometimes challenge standard assumptions about the architecture of grammar.

The discussion is mainly theoretical, but three case studies serve as exemplification: focalization, ellipsis and parentheticals. The second part of the paper is focused on parentheticals; a brief proposal is presented regarding the distinction between free and anchored parentheticals from a syntax-phonology interface perspective. The general conclusion is that following an interface-based perspective to approach discourse phenomena can help us gain new insights about the nature of the interfaces and their role in grammar.
Keywords: syntax-phonology interface; cartography; ellipsis; focalization; parentheticals

229-261 Chomsky, Noam (University of Arizona & M.I.T.); Gallego, Ángel J. (Universitat Autònoma de Barcelona); Ott, Dennis (University of Ottawa)
Generative Grammar and the Faculty of Language: Insights, Questions, and Challenges. Catalan Journal of Linguistics, 2019, Special Issue, pp. 229-261.

This paper provides an overview of what we take to be the key current issues in the field of Generative Grammar, the study of the human Faculty of Language. We discuss some of the insights this approach to language has produced, including substantial achievements in the understanding of basic properties of language and its interactions with interfacing systems. This progress in turn gives rise to new research questions, many of which could not even be coherently formulated until recently. We highlight some of the most pressing outstanding challenges, in the hope of inspiring future research.
Keywords: Generative Grammar; faculty of language; basic properties; operations; interfaces; syntax

263-285 Chomsky, Noam (University of Arizona & M.I.T.)
Some Puzzling Foundational Issues: The Reading Program. Catalan Journal of Linguistics, 2019, Special Issue, pp. 263-285.

This is an annotated transcription of Noam Chomsky's keynote presentation at the University of Reading, in May 2017. Here, Chomsky reviews some foundational aspects of the theory of structure building: essentially, Merge and Label. The aim is to eliminate what he refers to as extensions of Merge which are seemingly incompatible with the Strong Minimalist Thesis while still accounting for recursive structure, displacement, and reconstruction (as the main empirical goals of the Minimalist Program). These include sidewards movement, multi-dominance, and late-Merge, all of which have been developed throughout the life cycle of transformational generative grammar. Furthermore, Chomsky formulates a series of conditions that an adequate formulation of Merge must meet, and sketches how the aforementioned extensions may violate these conditions. Chomsky arrives at a formulation of an operation MERGE, which maintains the core properties of Merge but is further restricted by limitations over what MERGE can do to the workspaces where syntactic operations apply.
Keywords: Strong Minimalist Thesis; workspaces; MERGE; recursion

287-288 Gallego, Ángel J. (Universitat Autònoma de Barcelona); Ott, Dennis (University of Ottawa)
Epilogue. Generative Syntax: Questions, Crossroads, and Challenges. Catalan Journal of Linguistics, 2019, Special Issue, pp. 287-288.

Catalan Journal of Linguistics Special Issue, 2019 7-26

The achievements of Generative Syntax: a time chart and some reflections*

Roberta D’Alessandro Utrecht University/UiL-OTS [email protected]

Received: April 10, 2018 Accepted: September 23, 2019

Abstract

In May 2015, a group of eminent linguists met in Athens to debate the road ahead for generative grammar. There was a lot of discussion, and the linguists expressed the intention to draw a list of achievements of generative grammar, for the benefit of other linguists and of the field in general. The list has been sketched, and it is rather interesting, as it presents a general picture of the results that is very 'past-heavy'. In this paper I reproduce the list and discuss the reasons why it looks the way it does.

Keywords: generative grammar; syntax; linguistics; results

Resum. Els assoliments de la sintaxi generativa: una gràfica temporal

El maig de 2015, un grup d'eminents lingüistes es van reunir a Atenes per debatre el camí que cal seguir per a la gramàtica generativa. Hi va haver molta discussió i els lingüistes van manifestar la intenció de confeccionar una llista d'èxits de la gramàtica generativa en benefici d'altres lingüistes i de l'àmbit en general. La llista ha estat esbossada i és força interessant, ja que presenta una imatge general dels resultats molt «passada». En aquest treball reprodueixo la llista i comento els motius pels quals es veu d'aquesta manera.

Paraules clau: gramàtica generativa; sintaxi; lingüística; resultats

Table of Contents

1. Introduction
2. "Results" in generative grammar
3. Suggestions for additions
4. Some comments on the time charts
5. Communicating results and learning about them
6. Some final remarks
References

* The author acknowledges funding support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 681959_Microcontact).

ISSN 1695-6885 (in press); 2014-9718 (online) https://doi.org/10.5565/rev/catjl.232

1. Introduction

In May 2015, a group of eminent linguists met in Athens to debate the road ahead for generative grammar.1 There was a lot of discussion, and those assembled expressed the intention to draw up a list of achievements of generative grammar, for the benefit of other linguists and of the field in general. This list, to the best of my knowledge, has not been published yet. However, Peter Svenonius did publish a tentative list on his blog after the conference. The list was edited and compiled by Mark Baker, Rose-Marie Déchaine, Amy Rose Deal, Winfried Lechner, Julie Legate, Ian Roberts, Ivy Sichel, and Peter Svenonius himself. A group of people is now working on an encyclopedia based on the list.2

I decided to carry out an exercise: I put a (tentative) date next to every entry on the list, just to map these important results on a time chart. Many of these are shared results, so I tried to select the paper/dissertation in which these ideas were first formulated, not necessarily with the name we use for them today. I then put the list on Lingbuzz and Facebook, and had an overwhelming response from the community,3 such that this time chart has now become a collective exercise.

Since this draft has received much more attention than I had expected, let me add some disclaimers, which I had initially overlooked and which were only added to the reviewed version. First, as will be obvious to whoever reads this, this paper is not scientific at all. It can be seen as an attempt to reconstruct some of the key stages of generative syntax, followed by some very personal reflections on the status of the discipline. Again, this paper should not be taken as the truth, nor as a piece of scientific research. Given that the paper received a lot more attention than I had foreseen, some interesting points were raised by many scholars during these months. Because of the nature of what I wrote, the three reviews I received looked more like follow-up discussion than reviews. I will try to include the reviewers' viewpoints and rebuttals as much as possible, as I think this could really get the discussion going.

Then, as Svenonius points out,4 the original list concerns mid-level-coverage results in generative grammar (or rather: syntax) for which there is a broad consensus. According to Gillian Ramchand's blog,5 "'mid level generalizations' […] refer to the concrete results of bread and butter generative syntax (whether GB, LFG or HPSG) which would not have been discovered without the explicit goals and methodologies of generative grammar (MLGs)".

The list, as one reviewer points out, is subjective. Had other linguists been invited to the meeting, the list would probably look quite different. Furthermore, the methodology according to which these results were selected is not obvious.

The list will look very unbalanced towards the early days of generative grammar. In the rest of this paper, I will discuss the possible reasons why.

One thing that was very interesting for me is that I got many of these dates wrong when I drafted the chart for the first time. Now, of course, this might be entirely due to my own ignorance, and to some extent it certainly is. I grew up in the Minimalist era, and had very little exposure to GB and the early years of generative grammar. Because of this, when I started looking for dates I proceeded as I usually do when I start working on a new project: Google, handbooks, introductory chapters, introductory syntax books, and encyclopedia articles of all sorts. Then, I started reading (not in great detail, admittedly, as this was just for fun in the beginning, and I had limited time) some more specialized articles. I tried to track down the first time something was observed or an idea was proposed, and selected as the locus of "first formulation" those references on which everyone seemed to agree.6 Despite my efforts, I made a number of mistakes. I put the draft online as I was sure that many of these "standard references" were wrong, and there had been a lot of reappropriation:7 very often the people/papers who are cited as "the standard reference" or "the first to observe" are not the ones who actually first discovered/observed/reported something.

Some years ago, I taught a course on agreement which I called Die Ewige Wiederkehr des Gleichen ('The eternal recurrence of the same'), where I tried to show that most of what we think we are discovering or inventing today had already been discovered or observed, in different terms of course, in the '60s. I've had very much the same impression when putting together this timeline.

Many people also had interesting suggestions for additions to the list, so I will include those after the original list. Finally, as many observed, not all of these results can be attributed to generative grammar. We will assume for the time being that they can be, as this is not directly relevant for the exercise I wish to do, but we should be aware of this.

1. Unfortunately, the conference website no longer works. The conference was called Generative Syntax in the Twenty-First Century: The Road Ahead.
2. .
3. I wish to thank Avery Andrews, Tista Bagchi, Theresa Biberauer, Jonathan Bobaljik, Hagit Borer, Stanley Dubinsky, Dan Everett, Berit Gehrke, Alessandra Giorgi, Vera Gribanova, Heidi Harley, Martin Haspelmath, Monica Irimia, Pauline Jacobson, Dalina Kallulli, Alec Marantz, Jason Merchant, Gereon Müller, Francisco Ordoñez, Dennis Ott, Diego Pescarini, David Pesetsky, Omer Preminger, Craig Sailor, Peter Svenonius, Tonjes Veenstra, and Xavier Villalba Nicolas, and three anonymous reviewers (I hope I'm not forgetting anyone) for their feedback. I hope I'm reproducing their suggestions correctly. All mistakes you'll find remain entirely mine.
4. You can find part of the conversation here: .
5. .
I have not modified the list, but I have copied it entirely from Peter Svenonius's blog, including the explanations, as I think they make things a bit clearer. I also added the references.

The aim of this paper is to reflect on the actual status of generative grammar, on its achievements, and on the shortcomings that seem to emerge from this list. Again, this is more of an opinion paper than a scientific one, and it should be taken as such. Science is not really arguable, but opinions are. This paper was presented at a conference held in Barcelona entitled Generative Syntax 2017. Questions, crossroads, and challenges. In the spirit of the conference, I tried to reflect on these issues, and this paper is just that: an attempt at reflection.

6. Admittedly, this does not make too much sense, because results are always due to more than one person, but I did it, so here it is.
7. A term due to Pauline Jacobson, who sent me lots of interesting feedback, and to whom I wish to give special thanks.

2. "Results" in generative grammar

It often happens, during general conferences or in linguistic venues, that generative grammarians are asked what results generative grammar has achieved, what discoveries it has brought to light, and what contribution it has made to the linguistic debate. The follow-up question is almost always: what have you discovered in the last 20 years? The choice of this time span is not random, but refers to the Minimalist Program period, which started more or less around 1993-1995, i.e. around 20 years ago.

The general feeling is that generative grammar, or syntax to be more precise, has not moved ahead too much during the minimalist period. While the charts that you will see seem to point in that direction, this is in fact not true. The field has evolved a great deal; it has expanded. Many new languages have been studied and analyzed with generative tools. New generalizations have been drawn up, and new theoretical questions have been asked. This, I feel, is the normal way for a field to go forward.

The way generalizations were expressed in the Government and Binding (GB henceforth) era, starting with Chomsky's 1981 Pisa lectures, was radically different from the way they are expressed now. Learnability was already in the picture, and there was a consistent strand of generative L1 acquisition and modeling work, but it had rather limited success. The most successful part of research during GB was on grammatical (parameterized) principles, which were crucial for understanding, for instance, learnability. Research on learnability is much more prominent in the Minimalist Program. What has stayed the same is the understanding that languages do not vary indefinitely, and that constraints that are discovered in one language could be used to describe a different language. This, I think, is the key difference between generative grammar and other linguistic enterprises, such as typology: while typologists assume that, say, the existence of wh-movement in English cannot tell us anything about Chinese, generativists assume that this isn't the case. The common core has moved from principles to features, from structural constraints to the tools we use to build grammar. This evolution should not confuse us: we should not use old parameters to measure new discoveries.

With this disclaimer in mind, let us move on to examine the original list created by the linguists in Athens.

2.1. Mid-level coverage results in generative grammar

Mid-coverage results are generalizations, observations that would not have been made without the tools and approaches of generative grammar. These results are just an indication; they simply reflect a discussion, and have not been officially approved by anyone. I have tried to attribute a date to them, the date on which they were first formulated. I will list the results as they are given in Peter Svenonius's blog entry, with the explanations that he provides. The dates, however, are the result of my own research, so any errors in this regard are down to me.

According to the list, mid-coverage results of generative grammar include the following:

1. Unaccusativity [There are two classes of monovalent verbs such that the argument in the unaccusative class is predicate-internal, while the argument in the unergative class is predicate-external (in derivational terms, the unaccusative argument originates predicate-internally)]: Hall (1965).
2. The Agent asymmetry [NPs bearing Agent roles are higher than NPs bearing other roles in the unmarked structure of the clause]: Keenan & Comrie (1972).
3. Passive valence reduction [Agents are the easiest arguments to suppress in valency reduction]: Keenan (1975).

X-bar theory, categories, and headedness

1. Extended projections [Clauses and nominals consist of a (respectively) verbal/nominal head, dominated by zero or more members of an ordered sequence of functional elements]: Grimshaw (1991).
2. Cinque hierarchy [There are semantically defined classes of TAM functors that appear in the same hierarchical order in all languages in which they exist overtly]: Cinque (1999).
3. Cinque hierarchy for adverbs [There are semantically defined classes of adverbs that appear in the same hierarchical order in all languages in which they exist overtly (related to or identical to the TAM hierarchy)]: Cinque (1999).
4. Morphology Mirrors Syntax [The hierarchy of projections as reflected in free words is the same one that is reflected in morphological structure when morphemes express the same notions as the free words]: Chomsky (1957) / Muysken (1979, 1981).
5. CP-DP parallelism [There are substantive parallels in structure between noun phrases and clauses, most obviously in the case of nominalizations but also detectable in other kinds of nominals (e.g. similarities between subjects and possessors, subject to cross-linguistic variation)]: Jackendoff (1977).
6. The Final-over-Final constraint8 [It is relatively difficult to embed head-final projections in head-initial ones, compared to the opposite (132 but not *231, where 1 takes 2 as a complement and 2 takes 3)]: Biberauer, Holmberg & Roberts (2007).
7. Cinque's version of Greenberg's U20 [Only one unmarked order is found prenominally for Dem, Num, and Adj, namely Dem > Num > Adj > N; ordering possibilities increase as N is further to the left in the sequence. The facts suggest (i) a universal hierarchy Dem > Num > Adj > N, where these categories exist, (ii) the possibility of leftward but not rightward movement of projections of N to derive some other orders, and (iii) the absence of such movement of adnominal modifiers alone (e.g. no information-neutral movement of Adj across Num and/or Dem unless it is in a projection containing N) (may generalize to other categories)]: Cinque (1996).

8. Functional Material Doesn't Incorporate [Higher functional structure such as determiners and complementizers doesn't incorporate into superordinate lexical heads]: Li (1990).
9. SOV scrambling [All SOV languages allow a degree of word order freedom (scrambling); VO languages may not]: Grewendorf & Sternefeld (1990)?

8. Now called the Final-over-Final Condition. I reproduce the list as it is originally formulated by Svenonius.

Movement in general (not restricted to A-bar or A)

1. Coordinate Structure Constraint [Extraction from a Coordinate Structure is not possible unless it is by Across-the-Board movement (the phenomenon of pseudocoordination has to be distinguished, e.g. "What did you go (to the store) and buy?"; pseudocoordination shows characteristic properties, for example a restricted class of possible left-hand categories (cf. *"What did you walk and buy?") and extraction only from the open-class right-hand member (cf. *"Which store did you go to and buy shrimp?"))]: Ross (1967).
2. Head Movement Constraint [Head movement doesn't cross heads. This cannot be escaped by excorporation: if X moves to Y by head-movement, then X cannot move on, stranding Y. (Clitic movement crosses heads and must be distinguished from head movement proper, i.e. head movement of complements in extended projections to their selecting projections, and of incorporees to their selecting predicates)]: Travis (1984).
3. Movement is upward [Movement is upward, landing in higher syntactic positions]: Ross (1967).
4. Right Roof Constraint [Rightward movement is clause bounded ("the right roof constraint")]: Ross (1967).
5. Second position [There are second position effects which are category-insensitive, i.e. not sensitive to the category of the element in first position, but no second-to-last effects which are similarly category-insensitive. (This allows for immediately pre-verbal positions in V-final structures)]: Kayne (1994).
6. Syntactic clitic placement [A major class of clitics (phonologically dependent items) have their location in the surface string determined by purely syntactic principles of the language (i.e. ignoring the phonological dependency)]: Steele (1977).

Binding Theory

1. Principle B [Pronouns, in the unmarked case, can't be locally bound (under the same A-position class of locality as for Principle A), but can be bound nonlocally]: Chomsky (1973) / Lasnik (1976) / Chomsky (1981).
2. Principle C [An R-expression can't be bound by (systematically corefer with) a c-commanding pronoun]: Chomsky (1973) / Lasnik (1976) / Chomsky (1981).
3. Structure relevant to binding [The conditions on pronominal reference cannot be stated purely with linear order. The subject-nonsubject distinction plays an important role, especially for Principle A (and B to the extent that it is complementary)]: Langacker (1966).

4. Strong crossover [Coreference is impossible between a pronoun in an argument position and a c-commanding antecedent when the antecedent has moved across the pronoun, i.e. is the head of a filler-gap dependency where the gap is c-commanded by the pronoun. Example: "Who did he say was hungry?" Coreference impossible]: Postal (1971) / Wasow (1972).
5. Weak crossover [Coreference is degraded between a pronoun and a c-commanding antecedent when the antecedent has moved across the pronoun, i.e. is the head of a filler-gap dependency where the gap is lower than the pronoun. Example: "Who did his mother say was hungry?" Coreference degraded]: Postal (1971) / Wasow (1972).

Arguments

1. Improper movement [A-positions (as diagnosed by case, agreement, and binding) feed unbounded dependencies (e.g. the tail of a wh-movement, relative clause formation, or topicalization chain is in an A-position). Unbounded dependencies preserve case, agreement, and binding configurations, and do not (normally) feed A-positions (i.e. they do not normally increase the possibilities for an element to enter case-agreement-relevant relations, unlike passive, raising, etc.)]: Chomsky (1977)?
2. Control versus raising [Obligatory control is a subject-to-subject relation (or, in some cases, object-to-subject relation) in which one referent gets thematic roles from two predicates, related to each other by nonfinite complementation; in raising, the shared argument gets only one thematic role, from the embedded predicate]: Rosenbaum (1965).
3. Structural agreement [There is a structural bias affecting agreement such that nominals higher in the clause are agreed with in preference to lower nominals, except where marked case on a higher nominal may disqualify it (reflected in subject agreement over object agreement)]: Aissen (1989).
4. Grammatical Subject [There is a distinction between grammatical subject and thematically highest argument (though traditional subject diagnostics may decompose even further)]: Chomsky (1965).
5. Diesing's Generalization [If uniquely referring DPs (definites and/or specifics; Milsark's "strong" noun phrases) and weak indefinites with the same grammatical function occupy different positions, then the uniquely referring DPs are structurally higher]: Diesing (1992).
6. Person-Case Constraint (PCC) [Languages place strong restrictions on the use of local direct objects when a goal NP is present (NP, or DP, as opposed to PP); for example, a direct object may not be first or second person in the presence of an indirect object]: Perlmutter (1971).
7. No NCC [There is no number case constraint; languages do not restrict the grammatical number of the direct object when a goal NP is present]: Nevins (2011).
8. Ergative subjects [Asymmetries between arguments for purposes of unmarked word order, binding, and control work the same way in nominative and ergative languages. Clause structure in ergative and accusative languages is homomorphic]: Mahajan (1997).

9. Null subjects [Many languages allow pronouns to be unpronounced in certain positions under certain conditions. Where possible, these pronouns act much like overt pronouns for e.g. Binding Conditions]: Perlmutter (1971).
10. High causatives [In a morphological causative, the new causee will be higher than any argument of the base verb]: Baker (1988).
11. Marantz' Generalization [In benefactive applicative constructions, the new argument will be structurally higher than the base internal argument]: Marantz (1984).
12. Erg agreement is dependent on Erg case [No language has a nominative-accusative case system and an ergative-absolutive agreement system, although matched systems are possible, and the opposite mismatch is possible (Bobaljik 2008, and typological sources)]: Anderson (1977).
13. No Active Case [No language has an active system of case marking, whereas active systems of agreement marking are possible. (Baker & Bobaljik in press/in progress, but well documented)]: Mithun (1991)?

Quantifier Raising

1. QR [The logical scope of natural language quantifiers (over individuals, times or situations/worlds) does not have to match their surface position. Quantifier scope is co-determined by structural factors (islands, clausal boundaries), logical properties of the quantifier (universal vs. existential) and the form of the quantificational expression (simple vs. complex indefinites)]: Bach (1968), May (1977).
2. QR is clause bound [The scope of (expressions corresponding to) universal quantifiers is limited by conditions identical or very similar to the conditions on A-movement (clause bounded, except in restructuring contexts)]: May (1985).
3. Widest scope indefinites [In many languages, morphologically simple indefinites (some books, at least one book) may take unbounded scope, even across islands]: Fodor & Sag (1982).
4. Reconstruction [Dislocated quantificational expressions can take scope below their surface position, but no lower than their base position]: Chomsky (1976).

A-bar phenomena

1. A-bar Unity [A class of A-bar (filler-gap) constructions (including interrogatives, relative clauses, focus movement constructions, and operator-variable chains) show unified behavior with respect to locality and configuration]: Chomsky (1977), Chomsky (1981).
2. Successive Cyclicity [Unbounded dependencies are successive-cyclic, as diagnosed by locality effects]: Fillmore (1963) / Chomsky (1973).
3. Covert A-bar dependencies [There are operator-variable relations where the operator is low on the surface that are restricted by the same laws as A-bar dependencies, where the A-bar element is high on the surface. For example, the interpretation of wh-in-situ for selection and scope parallels overt wh-movement in a significant and fairly well-defined class of cases]: Huang (1982).

4. Subject-object asymmetry for A-bar [High (preverbal) subjects are more difficult to extract than low (often postverbal) subjects in a class of cases]: Ross (1967).
5. Freezing [It's harder to subextract from subjects and objects that have moved; no language will permit movement out of a moved subject or object but not out of a nonmoved one, under otherwise identical conditions]: Ross (1967).
6. Specifier bias in pied-piping [If you can pied-pipe from a complement then you can pied-pipe from a specifier]: Ross (1967)?
7. Adjunct extraction is hard [If a phrase is an island for argument extraction, then it is also an island for adjunct extraction]: Huang (1982).
8. Parasitic gaps [An A-bar chain can license an otherwise illicit gap in an adjunct]: Ross (1967).
9. Resumptive pronouns [Resumption is by pronouns (not by dedicated resumptive particles)]: Ross (1967).
10. Resumptive pronoun island alleviation [Resumptive pronouns tend to alleviate island effects]: Ross (1967).
11. Local subject condition on resumption [There is a class of resumption which is incompatible with local subject position]: McCloskey (1990).
12. Left-dislocation [Many languages allow one or more kinds of left dislocation, with systematic similarities to and differences from A-bar movement (e.g. lack of case connectivity)]: Lambrecht (1994).
13. Intervention Effects (Beck Effects) [Covert A-bar chains (i.e. in-situ wide-scope-bearing elements) cannot cross (take scope over) scope-bearing interveners]: Beck (1996).
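Several of the Binding Theory and crossover entries above hinge on c-command rather than linear order. As a purely illustrative aside (my own sketch, not part of the reproduced list), the relation can be stated over node paths in a tree, a path being the tuple of daughter indices leading from the root to a node:

```python
# Illustrative sketch of c-command (Reinhart 1976, simplified): A c-commands B
# iff neither dominates the other and A's mother (assumed branching) dominates B.
# Nodes are identified by their paths: tuples of daughter indices from the root.

def dominates(a, b):
    """a dominates b iff a's path is a proper prefix of b's path."""
    return len(a) < len(b) and b[:len(a)] == a

def c_commands(a, b):
    if dominates(a, b) or dominates(b, a):
        return False
    # b is c-commanded by a iff b sits inside a's mother, i.e. a's sister subtree.
    return b[:len(a) - 1] == a[:-1]

# [S [NP he] [VP [V saw] [NP John]]]
he, john = (0,), (1, 1)          # paths of the two NPs
print(c_commands(he, john))      # True: "he" c-commands "John"
print(c_commands(john, he))      # False: "John" does not c-command "he"
```

On this simplified definition, "he" c-commands "John" but not vice versa, which is exactly the asymmetry that Principle C and strong crossover exploit.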

[Figure 1. Discoveries by year. A bar chart tallying the listed results per two-year period, from 1955-62 through 2015-16, broken down by category: General, X-bar, Movement, Binding, Arguments, QR.]
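The tallying behind Figure 1 (and Figure 2 below) is mechanical enough to reproduce: date each entry, bin the dates into two-year periods, and count per category. A minimal sketch of that procedure, assuming the list is encoded as (year, category) pairs; the entries shown are an illustrative excerpt, not the full list, and the published charts additionally merge 1955-62 into a single opening bin:

```python
from collections import Counter

# Hypothetical excerpt of the dated list as (year of first formulation, category)
# pairs; the real chart uses every entry from sections 2.1 and 3.
entries = [
    (1965, "General"),    # Unaccusativity: Hall (1965)
    (1967, "Movement"),   # Coordinate Structure Constraint: Ross (1967)
    (1973, "Binding"),    # Principle B: Chomsky (1973)
    (1977, "QR"),         # QR: May (1977)
    (1991, "X-bar"),      # Extended projections: Grimshaw (1991)
    (1999, "X-bar"),      # Cinque hierarchy: Cinque (1999)
    (2007, "X-bar"),      # Final-over-Final: Biberauer, Holmberg & Roberts (2007)
]

def two_year_bin(year):
    """Label the two-year period a date falls into (bins start on odd years)."""
    start = year if year % 2 == 1 else year - 1
    return f"{start}-{start + 1}"

counts = Counter((two_year_bin(year), category) for year, category in entries)
for (period, category), n in sorted(counts.items()):
    print(f"{period}  {category:10} {n}")
```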

3. Suggestions for additions

Many linguists sent me suggestions for items to be added to the list.9 Most of these items were mentioned by more than one linguist. I will just list them here, in random order.

1. Root vs non-root transformations [Some transformations can only take place in root environments]: Emonds (1969).
2. Accessibility hierarchy for relativization [see also the Agent asymmetry above]: Keenan & Comrie (1972).
3. Raising: Postal (1974).

X-bar theory, categories, and headedness

1. C-command: Klima (1964) / Reinhart (1976).
2. COMP/C: Bresnan (1972).

Movement in general (not restricted to A-bar or A)

1. Remnant movement: Thiersch (1985).
2. Principle of Minimal Compliance: Richards (1998).
3. Minimality/Relativized Minimality: Chomsky (1986) / Rizzi (1990).
4. Clitic doubling voids A-minimality effects: Anagnostopoulou (2003).

Arguments

1. VP-shells: Chomsky (1955) / Larson (1988).
2. VP-shells/double object constructions: Barss & Lasnik (1986).
3. Non-nominative subjects [Non-nominative subjects behave like structural subjects]: Andrews (1976).
4. Split subject position/two subject positions: Schachter (1976) / Guilfoyle, Hung & Travis (1992).
5. Exceptional Case Marking: Chomsky (1981).

QR

1. Quantifier lowering is subject to island constraints: Lakoff (1965/1970).

A-bar phenomena

1. Some kinds of sluicing ameliorate islands: Ross (1969).

9. As one reviewer points out, results should also include those ideas that proved wrong but opened the path for the discovery of many important generalizations. One such idea was Hale's Configurationality Parameter (Hale 1978), which was "a successful contribution in its failure", given that it opened the path to the conceptualization of binary branching, Merge, and the polysynthesis parameter (Baker 1996).

[Figure 2. Suggested additions. A bar chart tallying the suggested additional results per two-year period, from 1955-62 through 2015-16, broken down by category: General, X-bar, Movement, Binding, Arguments, QR, A-/A'.]

4. Some comments on the time charts

Figure 1 (as well as Figure 2) shows that the most important and generally accepted discoveries or observations, according to "the list", were made between 1955 and 1992. After 1990, we see a steady discovery rate of 1 or 2 items every two years. From 2001 onwards, there is almost nothing. The obvious explanation for this decrease might be that discoveries or observations need to be tested, and it can take some time before they are accepted as true. Time is the main factor here: a generalization discovered in 2014 has not yet come to be accepted as a definite result, despite its publication. The impression that I have, however, is that this isn't the whole story.

I would like to share some thoughts on these charts, and on the status of generative grammar, without trying to be too negative, and as a simple worker in the field. Before going into that, one disclaimer is essential. In what follows, I try to give a plausible explanation for the time curve we see in the figures above. In other words, I am trying to understand why some of the most prominent linguists in the generative field thought that the best results were obtained early on, and did not think of mentioning more recent ones. While I do believe that a paradigm shift, as well as a focus shift, is at work, I do not mean to claim that all contemporary generativists are concerned with problems of Merge and cognition. Most of us, in fact, including myself, are concerned with the description and understanding of languages, grammars, and linguistic phenomena. Many generalizations are coming out of this kind of work, though they do not spread as fast and as widely as they used to. Again, in what follows I am trying to discuss "the list", not "my personal list".

4.1. Paradigm shift

The wealth of discoveries in the 60s and 70s, in the early days of generative grammar, is not repeated in any later period. During the Government and Binding era, we still have a stable 'discovery rate'. After the publication of the Minimalist Program, we seem to see much less happening. We could ask ourselves why this is the case.

One possible answer could be that the advent of the Minimalist Program has shifted the object of investigation from languages to language. This does not mean that GB was not interested in language: language was always the core of the investigation. During GB, the task was to identify the principles and parameters that constitute UG. This was carried out by looking at languages quite intensively. Unlike during the Phrase Structure Rule period, in which not many languages were taken into account and English was the primary language of investigation, during GB many studies of languages other than English were produced, and many cross-linguistic generalizations were drawn. The methodology consisted mostly of observing what happened in a language (or in two, comparatively) and trying to draw generalizations about how UG must look given these observations. When something could not be explained on the basis of principles and parameters already formulated, a new parameter was formulated (and, in the most extreme case, a new principle), resulting in an explosion of basic assumptions and parameters, most of which were language-specific. In other words, there was a risk of simply repeating the Phrase Structure Rule enterprise, having filters or even parameters which would be able to account for one phenomenon in one language only, with little predictive power.

One of the key features of generative grammar, often contested by, for instance, typologists, is the assumption that one can use discoveries in one language to try to explain a different language. The existence of UG implies that there are shared features (not in the technical sense) across languages. This assumption informed much of GB work.

While studying languages to try to understand the language faculty is still a worthwhile enterprise, the narrative seems to me to have changed radically with the MP, at least for one group of syntacticians. In GB, linguists (syntacticians) were busy trying to find similarities across languages, to identify parameterized principles that could account for the limits of syntactic variation. These parameterized principles were considered part of UG; X-bar was considered genetically provided, something underlying every grammar, the essence of our computational system. While X-bar has been largely abandoned, in practice most syntacticians still use it for their daily language description.

With the advent of the MP, quite a large branch of it, so-called biolinguistics, has been concerned with understanding whether the computational basis of language is common to other cognitive faculties or not. Biolinguistics (Jenkins 2000) has focused on the Faculty of Language, which is defined in cognitive terms much more than Universal Grammar was. Of course, in both cases we are trying to understand how our grammatical/computational system looks, under the assumption that we have one. The image of this language acquisition device has, however, changed radically from the GB years to the biolinguistics years, and so has the methodology used to investigate it.
The discussion (at least as far as this subgroup is concerned) revolves around Merge, other components of human cognition, or animal language, but not so much around wh-movement or head directionality in different languages. Comparative syntax is still the main occupation of most generativists, but this is perhaps seen as a "reductionist problem": reducing one issue to a wider one, rather than an explanation. Going beyond explanatory adequacy means that we not only wish to understand how languages are acquired, but why language looks the way it does. The understanding is that there might be something guiding humans to fix head directionality, but that is most likely an interface requirement (for instance, ease of computing a dependency between heads that are harmonically aligned vs heads that are not); there is certainly something allowing some languages to drop the subject, but that too is only partially relevant to understanding the nature of the faculty of language, and it might ultimately be an interface condition (or a set of interacting conditions, see recent work by Jiménez-Fernández or Miyagawa).

As Chomsky remarked from the very early days (Chomsky 1965), generative grammar does not have much to say about language universals in the Greenbergian sense10 (though Chomsky's attitude with respect to typological generalization shifted around 1982).11 Implicational universals of the sort "if in a language the determiner precedes the noun, then the auxiliary will precede the verb", which amount to the head parameter, are not as central as they used to be for the generative enterprise. What we need to understand is what constitutes the Faculty of Language, and observing languages cannot provide much beyond a handful of clues.

As an example, we can mention the recursion issue, debated by Daniel Everett and Nevins, Pesetsky & Rodrigues (2009) and many following works (most notably, the Faculty of Language blog, by Norbert Hornstein). The argument goes more or less like this: Pirahã does have recursion, despite what Everett maintains. But, and this is the important bit, even if it didn't, this would not tell us anything about the Faculty of Language. Recursion is a characteristic of FL, which need not be present in all languages. In other words, what we see in languages may or may not give us an indication of what constitutes FL. What does help is learnability, i.e. how a language can come about given the initial conditions. Of course, one should first know what 'a language' actually is in formal terms, before one can think about how it might come about.

What I wish to say here is that the direction from which we tackle the whole Faculty of Language/UG issue has changed quite radically. What concerns us is the mechanism whereby sentences are "assembled"; each language has a set of conditions, which are mainly linked to its features (the Borer-Chomsky conjecture), but they are not as central to linguistic investigation as they could be.

10. "Insofar as attention is restricted to surface structures, the most that can be expected is the discovery of statistical tendencies, such as those presented by Greenberg" (Chomsky 1965: 118).
11. "Greenbergian universals […] are very suggestive" (Chomsky 1982: 111). They are "important, […] yielding many generalizations that require explanation […]" (Chomsky 1986: 21). These quotes are taken from Newmeyer (2017: 550).

10. “Insofar as attention is restricted to surface structures, the most that can be expected is the discovery of statistical tendencies, such as those presented by Greenberg” (Chomsky 1965: 118). 11. “Greenbergian universals […] are very suggestive” (Chomsky 1982: 111). They are “important, […] yielding many generalizations that require explanation […]” (Chomsky 1986: 21). These quotes are taken from Newmeyer (2017: 550). 20 CatJL Special Issue, 2019 Roberta D’Alessandro could be. What is a well-formed feature? How exactly does one define features? These questions are rarely addressed nowadays. What is addressed to a larger extent is the relation between features, their geometry. The core mechanism for forming a syntactic object is Merge; syntax assembles all sorts of objects: those that are fit meet the interface conditions and “converge”, while the others crash. The size of the set of interface conditions is not so clear, and nor is the exact definition of a condition. Be that as it may, this sort of methodological and paradigmatic shift has had consequences for the generalizations made about languages, which we have called “achievements” here. The “macroparametric”/classical approach has been replaced by the Borer-Chomsky conjecture (Baker 2008): we need to learn the lexicon, and we also learn the “parameters” attached to each functional head. Why these microparameters cluster in given ways is almost always still a mystery (but see, for instance, Biberauer & Roberts 2015), this isn’t particularly of interest as it isn’t relevant for our understanding of language. There is, now more than ever, a divide between those syntacticians who occu- py themselves with generalizations about languages, and biolinguists, who occupy themselves with generalizations about language. There’s nothing bad about this divide, other than that we are leaving generalizations about languages aside as if we already understand everything there is to know, and we don’t. In general, “interface conditions” are a shorthand for “outsourcing”: by pushing things into other modules, most notably PF, we leave it to morphologists and pho- nologists to solve puzzles (like, for instance, linearization, heavy NP-shift, or even null subjects) that were once the syntacticians’ core object of inquiry. Phonologists and morphologists mostly do not recognize these as issues they need to deal with, so very few linguists work on these issues nowadays12. This, I think, is the main reason for the decrease in results that we see in our time charts.

4.2. The specialization of the field

Another issue that might have influenced this decrease is that linguistics is specializing more and more, as is natural for any discipline or science. This means that while 50 years ago syntacticians had to read, across the board, everything that was written about any topic in syntax, nowadays they tend to focus on their own narrower research topic.

Research topics have in turn changed, because of this evolution and because of the many discoveries that have been made in the meantime. For example, complementizers were treated as one thing in their first formulations, but since Rizzi (1997) there has been a fragmentation; there have been studies on each type of sentence introduced by each complementizer type; more details have been added, and more language-specific aspects have been brought up. Nowadays, it is perfectly normal to spend all of one's career on the study of one complementizer. This in turn means that very specialized papers will be read by other specialists only, and the average syntactician will ignore most of what is said on topics that they are not working on. I don't think this is too substantial a problem, nor that it is a problem of generative syntax only (in fact, if anything, specialization shows that the field is growing), but it certainly has an impact on our time charts. Most discoveries are so specialized, so narrow and detailed, that other syntacticians don't even know about them.

12. One reviewer argues that I should provide empirical proof for my claim. One piece of evidence could be that there are very few recent articles on these issues. There is to date no replacement for Kayne's LCA (Kayne 1994).

4.2.1. Neighbouring fields

A connected issue is that we deem most of the empirical discoveries that come from other fields as not relevant. Psycholinguistics has evolved into a very strong field, but we rarely take the observations coming from it seriously. Big data and statistical analysis can bring to light many interesting generalizations. The general attitude towards these neighbouring fields is one of polite neglect: many studies, the reasoning goes, lack theoretical depth, and should therefore be ignored. The results are before our very eyes: generativists are often accused of being snobbish, and are becoming more and more isolated. While a healthy exchange would be fruitful for everyone, we see mutual disregard, or overt personal fights, more often than not.

Discarding everything as irrelevant is dangerous, and is drawing us towards isolation. Theoretical approaches to language are less and less prominent, and linguistics is moving towards a scarily vacuous empiricism. There is only one solution to this: we need to reach out, listen to what other fields and linguistic subfields have to say, and try to incorporate their insights (when possible), while at the same time proposing integrated methodologies that can help our enterprise.

5. Communicating results and learning about them

Going back to the list, despite the many factors we have discussed that have led to a decrease in results in recent years, we can still claim that the achievements of generative grammar are many. One of the biggest issues, however, is that these achievements are unknown to most people, even linguists from other subfields. I have already mentioned the limited cooperation with neighbouring fields. This is not the only problem we face: while achievements in many fields have at least been heard about by most people, achievements in generative grammar have not. Recently, Ángel Gallego of the Universitat Autònoma de Barcelona circulated a query, asking everyone whether they had heard at least the names of some of the main discoveries of the last centuries. There were terms like 'gravitational waves' or 'relativity theory'. And there were also terms like 'Universal Grammar' or 'structure dependency'. Sadly, most people had never heard terms like UG or structure dependency. This is certainly due to the fact that we have spent way too little time sharing results and communicating our views to the world.

Observe that this might not be true of generative grammar only: this might be a problem for linguistics in general, if one of the greatest young intellectuals I know, a neuroscientist, had never heard the word 'Indo-European'. This is somewhat worrisome, especially given that if we do not communicate what we know to the general public, someone else will: more and more anthropologists, psychologists, or historians claim discoveries that have been known to linguists for decades, and are even praised for them (see for instance 'Brain scientists discover composition', by Angelika Kratzer13). Learning to communicate results and viewpoints is one of the challenges of generative grammar and linguistics in general. Learning to explain to people why this enterprise is worthwhile is also crucial for funding purposes. "The list" is therefore essential, even if we do not completely agree on what should be in it.

5.1. Lack of funds

One of the challenges that linguists, and in particular theoretical linguists, are facing these days is the chronic lack of funds for research. This has much to do with the new world order, which gives precedence to quantitative rather than qualitative research. The current state of affairs can also be attributed to the perceived uselessness of generative grammar, which is sometimes considered vacuous even by linguists themselves.

This is reflected quite clearly in grants, which are almost never awarded for theoretical research. Given that most programs do not fund PhDs or post-docs directly but only through grants, it is becoming very difficult for the field to survive. Here too, spreading the word, explaining the results, bringing this list to the layman could be extremely helpful, both for internal growth and for obtaining grants. With more research being done in generative grammar, it is very likely that more results will be achieved.

5.2. Reading about results

Like other scientific fields, linguistics suffers from a publish-or-perish curse, which is closely tied to obtaining funds or promotions. Funds are scarce, as I have just said; to try to obtain at least some minimal funding for one's research, a large number of publications is required. This, together with the fact that the average academic must comply with all sorts of requirements and attend to all sorts of tasks, means that linguists, and generative grammarians among them, struggle to find time for research. The little research time there is must be maximally productive: these days, researchers write almost more than they read. There is a direct connection between the scarcity of time devoted to reading and the lack of acknowledgment of results: to be a "result", something has to be recognized as such by the entire field. The problem is that the entire field will not know about the discovery, because the entire field no longer has the time to read what researchers working on different topics are producing, and has got out of the habit of doing so. Legitimation will be lost, and it will be much more difficult to talk about "results".

13. .

6. Some final remarks

The generative enterprise has brought to light many interesting data generalizations, and has improved our understanding of how language works. While many of the achievements or discoveries date back to the early days of GG, many interesting generalizations and explanations for linguistic data are also emerging today as an output of this research enterprise.

From Government and Binding to the Minimalist Program there has been a paradigm shift resulting in a new research focus. While research on comparative syntax, or on syntactic variation, is still being carried out by several linguists, including myself, the attention has shifted to the Faculty of Language as a cognitive function rather than to the observation of languages. This means that different sorts of results are being presented nowadays, which find no place in this list. Furthermore, many results and generalizations still need to be digested by the whole community. This will prove more difficult given that there is no longer a "community" sharing all results as there used to be in the past: linguists are very specialized and rarely read research on topics that are of no immediate interest to them. How can a result be recognized as such, if three quarters of the people working within the same framework are unaware of its existence?

Generative grammar is a worthwhile enterprise. It has brought to light and helped to draw up many generalizations that would otherwise have been impossible to formulate. It has brought theoretical depth to many intuitions, and has provided the tools to make these intuitions explicit. Let us start by reading and spreading the results in this list, and let us bear in mind that those who are supposed to read these results have a very different profile from 20 years ago.

References

Aissen, Judith. 1989. Agreement Controllers and Tzotzil Comitatives. Language 65: 518-536.
Anagnostopoulou, Elena. 2003. The Syntax of Ditransitives. Evidence from Clitics. Berlin / New York: Mouton de Gruyter.
Anderson, Stephen. 1977. On the mechanisms by which languages become ergative. In Charles Li (ed.). Mechanisms of Syntactic Change. Austin: University of Texas Press.
Andrews, Avery. 1976. The VP complement analysis in Modern Icelandic. NELS 6: 1-21. Reprinted in 1990 in Joan Maling & Annie Zaenen (eds.). Modern Icelandic Syntax. San Diego: Academic Press, 165-185.
Bach, Emmon. 1968. Nouns and noun phrases. In E. Bach & R. Harms (eds.). Universals in Linguistic Theory. New York: Holt, Rinehart and Winston, 90-122.
Baker, Mark. 1985. The Mirror Principle and Morphosyntactic Explanation. Linguistic Inquiry 16: 373-415.
Baker, Mark. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press.
Baker, Mark. 1996. The Polysynthesis Parameter. Oxford: Oxford University Press.

Baker, Mark. 2008. The macroparameter in a microparametric world. In Theresa Biberauer (ed.). The Limits of Syntactic Variation, 351-373. Amsterdam: John Benjamins.
Barss, Andrew & Howard Lasnik. 1986. A Note on Anaphora and Double Objects. Linguistic Inquiry 17: 347-354.
Beck, Sigrid. 1996. Wh-constructions and Transparent Logical Form. Doctoral dissertation, Universität Tübingen.
Biberauer, Theresa, Holmberg, Anders & Ian Roberts. 2007. Disharmonic word-order systems and the Final-over-Final-Constraint (FOFC). In A. Bisetto & F. Barbieri (eds.). Proceedings of XXXIII Incontro di Grammatica Generativa, 86-105.
Biberauer, Theresa & Ian Roberts. 2015. Rethinking formal hierarchies: a proposed unification. Cambridge Occasional Papers in Linguistics 7: 1-31.
Bresnan, Joan. 1972. Theory of complementation in English syntax. Doctoral dissertation, Massachusetts Institute of Technology.
Chomsky, Noam. 1955. The logical structure of linguistic theory. Manuscript. Published in 1975 by Plenum Press, New York.
Chomsky, Noam. 1957. Syntactic Structures. The Hague/Paris: Mouton.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1973. Conditions on transformations. In S. Anderson & P. Kiparsky (eds.). A Festschrift for Morris Halle. New York: Holt, Rinehart and Winston.
Chomsky, Noam. 1976. Conditions on rules of grammar. Linguistic Analysis 2: 303-351.
Chomsky, Noam. 1977. On Wh-movement. In P. Culicover, T. Wasow & A. Akmajian (eds.). Formal Syntax. New York: Academic Press, 71-132.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam. 1986. Barriers. Cambridge, MA: MIT Press.
Cinque, Guglielmo. 1996. The antisymmetric programme: theoretical and typological implications. Journal of Linguistics 32: 447-464.
Cinque, Guglielmo. 1999. Adverbs and Functional Heads. Oxford: Oxford University Press.
Diesing, Molly. 1992. Indefinites. Cambridge, MA: MIT Press.
Emonds, Joseph. 1969. Root and structure-preserving transformations. Doctoral dissertation, MIT.
Fillmore, Charles. 1963. The position of embedding transformations in grammar. Word 19: 208-231.
Grimshaw, Jane. 1991. Extended projection. Unpublished manuscript, Brandeis University. (Also appeared in J. Grimshaw 2005, Words and Structure, Stanford: CSLI.)
Guilfoyle, Eithne, Hung, Henrietta & Lisa Travis. 1992. Spec of IP and spec of VP: Two subjects in Austronesian languages. Natural Language and Linguistic Theory 10: 375-414.
Hale, Ken. 1978. On the position of Walbiri in a typology of the base. Ms., MIT.
Hall, Barbara C. 1965. Subject and Object in Modern English. Doctoral dissertation, MIT.
Horrocks, Geoffrey & Melita Stavrou. 1987. Bounding theory and Greek syntax: evidence for wh movement in NP. Journal of Linguistics 23: 79-108.

Huang, C.-T. James. 1982. Logical relations in Chinese and the theory of grammar. Doctoral dissertation, MIT.
Jackendoff, Ray. 1977. X-Bar Syntax: A Study of Phrase Structure. Cambridge, MA: MIT Press.
Jenkins, Lyle. 2000. Biolinguistics: Exploring the Biology of Language. Cambridge: Cambridge University Press.
Kayne, Richard. 1994. The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Keenan, Edward L. 1975. Some Universals of Passive in Relational Grammar. In R. E. Grossman, L. J. San & T. J. Vance (eds.). Papers from the Twelfth Annual Meeting of the Chicago Linguistic Society, 340-352.
Keenan, Edward L. & Bernard Comrie. 1972. Noun Phrase Accessibility and Universal Grammar. Winter Meeting, LSA.
Klima, Edward S. 1964. Negation in English. In J. A. Fodor & J. J. Katz (eds.). The Structure of Language: Readings in the Philosophy of Language. New Jersey: Prentice Hall, 246-323.
Lakoff, George. [1965] 1970. Irregularity in Syntax. New York: Holt, Rinehart (Ph.D. dissertation 1965).
Langacker, Ronald. 1966. On Pronominalization and the Chain of Command. In Reibel & Schane (eds.). Modern Studies in English. New Jersey: Prentice Hall.
Larson, Richard. 1988. On the double object construction. Linguistic Inquiry 19: 335-391.
Lasnik, Howard. 1976. Remarks on Coreference. Linguistic Analysis 2: 1-22.
Li, Yafei. 1990. Conditions on X0-Movement. Doctoral dissertation, MIT.
Mahajan, Anoop. 1997. Universal grammar and the typology of ergative languages. Studies on Universal Grammar and Typological Variation: 35-57.
Marantz, Alec. 1984. On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
May, Robert. 1977. The Grammar of Quantification. Doctoral dissertation, MIT.
May, Robert. 1985. Logical Form: Its Structure and Derivation. Cambridge, MA: MIT Press.
Mithun, Marianne. 1991. Active/agentive Case Marking and Its Motivations. Language 67: 510-546.
Muysken, Pieter. 1979. Quechua Causatives and Logical Form: A Case Study in Markedness. In A. Belletti, L. Brandi & L. Rizzi (eds.). Theory of Markedness in Generative Grammar. Pisa: Scuola Normale Superiore.
Muysken, Pieter. 1981. Quechua Word Structure. In F. Heny (ed.). Binding and Filtering. Cambridge, MA: MIT Press.
Nevins, Andrew I. 2011. Multiple Agree with clitics: person complementarity vs. omnivorous number. Natural Language & Linguistic Theory 29: 939-971.
Nevins, Andrew, David Pesetsky & Cilene Rodrigues. 2009. Pirahã Exceptionality: A Reassessment. Language 85(2): 355-404.
Newmeyer, Frederick. 2017. Where, if anywhere, are parameters? A critical historical overview of parametric theory. In Bowern, Horn & Zanuttini (eds.). On Looking into Words (and Beyond): Structures, Relations, Analyses. Language Science Press.
Perlmutter, David. 1971. Deep and Surface Structure Constraints in Syntax. New York: Holt, Rinehart and Winston.

Postal, Paul. 1971. Cross-over Phenomena. New York: Holt, Rinehart and Winston.
Postal, Paul. 1974. On Raising: One Rule of English Grammar and Its Theoretical Implications. Cambridge, MA: MIT Press.
Reinhart, Tanya. 1976. The syntactic domain of anaphora. Doctoral dissertation, MIT.
Richards, Norvin. 1998. The Principle of Minimal Compliance. Linguistic Inquiry 29: 599-629.
Rizzi, Luigi. 1990. Relativized Minimality. Cambridge, MA: MIT Press.
Rodman, Robert. 1974/1977. On Left Dislocation. Papers in Linguistics 7: 437-466. (Reprinted in Anagnostopoulou et al. (eds.). Materials on Left Dislocation. Amsterdam: John Benjamins, 31-54.)
Rosenbaum, Peter S. 1965. The grammar of English predicate complement constructions. Doctoral dissertation, MIT.
Ross, John R. 1967. Constraints on Variables in Syntax. Doctoral dissertation, MIT.
Ross, John R. 1969. Guess who? In Robert I. Binnick, Alice Davison, Georgia M. Green & Jerry L. Morgan (eds.). Papers from the Fifth Regional Meeting of the Chicago Linguistic Society. Chicago: University of Chicago Press, 252-286.
Schachter, Paul. 1976. The Subject in Philippine Languages: Topic, Actor, Actor-topic, or None of the Above. In Li, Charles (ed.). Subject and Topic. New York: Academic Press, 491-518.
Steele, Susan. 1977. Clisis and diachrony. In Charles N. Li (ed.). Mechanisms of Syntactic Change. Austin: University of Texas Press, 539-579.
Thiersch, Craig. 1985. VP and Scrambling in the German Mittelfeld. Ms., University of Tilburg.
Travis, Lisa. 1984. Parameters and Effects of Word Order Variation. Doctoral dissertation, MIT.
Wasow, Thomas. 1972. Anaphoric relations in English. Doctoral dissertation, MIT.

Towards Matrix Syntax*

Roger Martin†
Yokohama National University
[email protected]

Román Orús
Johannes Gutenberg-Universität
Donostia International Physics Center / Ikerbasque Foundation for Science
[email protected]

Juan Uriagereka
University of Maryland
[email protected]

Received: December 31, 2017 Accepted: September 23, 2019

Abstract

Matrix syntax is a model of syntactic relations in language, which grew out of a desire to understand chains. The purpose of this paper is to explain its basic ideas to a linguistics audience, without entering into too many formal details (for which cf. Orús et al. 2017). The resulting mathematical structure resembles some aspects of quantum mechanics and is well-suited to describe linguistic chains. In particular, sentences are naturally modeled as vectors in a Hilbert space with a tensor product structure, built from 2×2 matrices belonging to some specific group. Curiously, the matrices the system employs are simple extensions of customary representations of the major parts of speech, as [±N, ±V] objects.

Keywords: syntax; chains; minimalist program; Hilbert space; matrix

* It is with a heavy heart that we send this paper to print, since our dear friend and co-author Roger Martin is no longer with us to grace us with his insight, rigor, and sheer decency. We have no doubt that he would have caught issues with the final version that are not apparent to us. We can at least assure the readers that Roger was with our effort till the very end, making it all worthwhile for us to simply pass the torch. We thank Ángel Gallego and Dennis Ott for their efforts in editing this volume and organizing GYNSYN17, the conference from which it stems. We appreciate comments we received from participants, especially Tim Hunter, Marc Richards, and Masaya Yoshida. In addition, we are grateful to Sergio Balari, David Berlinski, John Colarusso, Ángel Gallego, Bill Idsardi, Michael Jarret, Diego Krivochen, Dan Lathrop, Steve Marcus, Dave Medeiros, Doug Saddy, and Zach Stone. Some of the research leading to this paper received financial support from a Grant-in-aid for Scientific Research from the Japan Society for the Promotion of Science (KAKEN 26370446; PI: Martin). Finally, our appreciation goes to an anonymous reviewer for insightful suggestions that helped us clarify parts of the paper.


Resum. Cap a la sintaxi de matrius

La sintaxi de matrius és un model formal de relacions sintàctiques en el llenguatge que va sorgir del desig de modelar les cadenes. L'objectiu d'aquest treball és explicar les idees bàsiques d'aquest model a un públic lingüístic, sense entrar en gaires detalls formals (vegeu Orús et al. 2017). L'estructura matemàtica resultant s'assembla a alguns aspectes de la mecànica quàntica i s'adapta bé per descriure les cadenes lingüístiques. En particular, les oracions es modelen naturalment com a vectors en un espai de Hilbert amb una estructura de producte tensorial, construïdes a partir de matrius 2 × 2 que pertanyen a un grup específic. Curiosament, les matrius que utilitza el sistema són extensions simples de representacions habituals de les parts principals del discurs com a objectes [± N, ± V].

Paraules clau: sintaxi; cadenes; programa minimalista; espai de Hilbert; matriu

Table of Contents

1. Preliminaries focusing on the Trouble with Chains
2. The Fundamental Assumption and Anti-Symmetrical Merge
3. Projecting from the Bottom and Selection Restrictions
4. The Explosion Problem with Specifiers and the Need for Matrix Compression
5. Chains and Beyond…
References

1. Preliminaries focusing on the Trouble with Chains

While it may not be necessary for an analytical science to be quantitative, gaining quantitative traction—if natural within a discipline's subject matter—can be an advantage. This is because of the rigor one can associate with calculations, but more generally because the level of predictions and the accuracy of testing can gain a different scope. In practical terms, our project may be seen as a way to implement that desideratum within well-known presuppositions.

We assume the basic tenets of generative grammar, such as lexical categories, phrases, Merge, Agree, displacement, chains, control, ellipsis, rules of construal and other such notions that have arisen from a long tradition of theoretical investigations into the structure of the human language faculty. All of the familiar machinery that theoretical syntacticians commonly use constitutes our basic repository as well.

Within such a framework, one of the primary concerns is how to account for displacement phenomena, and other long-range correlations, particularly working within a so-called minimalist approach to grammar. Paramount among the issues is the fact that the interpretation of displaced objects is distributed (in phonetic and semantic terms). Although there has been much discussion about the reason for this over the years, little has been achieved in the way of understanding. In short, we know of no analysis that can account for the kinds of facts that we review below in classical computational terms (see Collins & Stabler 2016 for essentially the same admission).

Consider, for instance, the sentence involving multiple raising in (1), which can have either of the possible interpretations in A or B:

(1) Friends of each other seemed to the Obamas to appear to the Bushes to have shown up unannounced at the White House.
a. Friends of Barack seemed to Michelle and friends of Michelle seemed to Barack to appear to the Bushes to have shown up unannounced at the White House.
b. It seemed to the Obamas that friends of George appeared to Laura, and friends of Laura appeared to George to have shown up unannounced at the White House.

The availability of these two interpretations is commonly attributed to the assumption that displacement of syntactic phrases creates copies, as in (2):

(2) Friends of each other seemed to the Obamas friends of each other to appear to the Bushes friends of each other to have shown up unannounced.

If we focus on just one of the copies as the locus of interpretation, we can consider the three possibilities in (3), where, to illustrate, the copy that is interpreted is highlighted (rendered here between square brackets) and the others are left unmarked.

(3) a. Friends of each other seemed to the Obamas [friends of each other] to appear to the Bushes friends of each other to have shown up unannounced.
b. Friends of each other seemed to the Obamas friends of each other to appear to the Bushes [friends of each other] to have shown up unannounced.
c. [Friends of each other] seemed to the Obamas friends of each other to appear to the Bushes friends of each other to have shown up unannounced.

If the interpretive component utilizes the highlighted copy in (3a), this allows for binding of each other by the Obamas, yielding interpretation A, whereas interpreting the highlighted copy in (3b) allows each other to be bound by the Bushes, yielding interpretation B. Accessing the highlighted copy in (3c) presumably does not yield any possible interpretation, assuming each other is not bound in that position.

However, many questions arise. First, as should be obvious, only one copy survives at the phonetic interface (PF), but the reason for this is unclear (why are not all copies pronounced/interpreted?). Furthermore, the copy that gets pronounced at PF necessarily corresponds to the highlighted one in (3c), but we have seen how the highlighted copies in (3a) or (3b) can also be accessed for interpretation at the semantic interface (LF). Yet, although all of the choices in (3) may be possible at LF, it seems that there too only one of the copies can be interpreted.1 If that were not the case, arguably the unacceptable (4a) should be possible, with an LF like (4b).

(4) a. *Friends of each other seemed to themselves to have shown up unannounced.
b. [IP [friends of [each other]i]i seemed to [themselves]i [IP [friends of [each other]i]i to have shown up unannounced]]

With both copies of friends of each other available at LF, the higher copy could bind themselves and, at the same time, each other in the lower copy could be bound by themselves.2

Why it should be the case that no more than one copy is available for interpretation is no more obvious in the case of LF than is the question of why only one of the copies can be pronounced at PF. One can of course stipulate as much—but the question is why the objects behave that way, and not in other equally rational ways (all copies are interpreted, some copies are interpreted…). No formalism we know of yields that as a straightforward consequence.

We argue that chains (a set of occurrences of the same syntactic token created by a grammatical transformation) are non-classical objects, of the sort commonly assumed in physics, exhibiting conditions that have been described as "spooky" (see Martin & Uriagereka 2008, 2014 for the explicit statement of this idea). We are not the first to bring such notions into the discussion of language. For example, Paul Smolensky (Smolensky 1990; Smolensky & Legendre 2006) has argued for something along these lines for phonology and other parts of language—although within connectionist presuppositions that we do not find necessary. Researchers of various orientations have suggested similar "spooky" connections (e.g. Aerts & Aerts 1994; Atmanspacher et al. 2002; Bruza 2009; Coecke et al. 2013; Heunen et al. 2013; Gerth & beim Graben 2009; Khrennikov 2006; Piattelli-Palmarini & Vitiello 2015; Witteck et al. 2013, etc.).

More generally, we will be taking syntax to act on some Hilbert space, by way of linear operations. Moreover, projecting syntactic stuff into interface observables is what "collapses it" into a classical reality, in which entities present reference and quantification, truth values, or, for that matter, the very signals of speech (or writing, in most systems), linearized one right after the other—which we also take to be a form of "collapse". That is our project in a nutshell.

None of this really makes sense without quantitative, or at least elaborate logical, assumptions. We think there is a simple way of proceeding, stemming from a fact that is familiar to linguists: we operate on feature matrices. We show below that it is easy to translate between familiar syntactic categories and matrices, and moreover that connatural to the latter, if their values are numerical, are interesting quantities that turn out to be central to a project attempting to construct relevant Hilbert spaces and, more generally, turn relevantly quantitative.

1. It might seem that more than one copy is needed at LF when the copy used for scope/binding is not the same as the one involved in theta-role assignment. However, we assume theta-roles to be determined in a separate component of the grammar, as in Uriagereka (2008) and Martin and Uriagereka (2014). Hornstein (1998, 2001, etc.) advocates yet another approach to theta-roles that divorces them from intentional (scope/binding) semantics, treating them as features that get checked in the course of a derivation.
2. The indexation in (4b) might be said to violate the so-called i-within-i prohibition. But the status of that condition, which was originally stipulated for theory-internal reasons that are not obviously relevant today (and which furthermore incorrectly rules out grammatical expressions such as Escher drew a picture of itself), is far from clear. There are also other sorts of examples that, while more complex for presentational purposes, demonstrate the same point and do not involve i-within-i situations. See, for example, Hornstein (1998).

2. The Fundamental Assumption and Anti-Symmetrical Merge

Consider familiar objects as in (5), from Chomsky (1974).

(5) a. noun: [+N, -V] b. verb: [-N, +V] c. adjective: [+N, +V] d. adposition: [-N, -V]

(5) capitalizes on the intuition that "nouniness" is conceptually orthogonal to "verbiness", and those two separate lexical dimensions articulate all of the conceptual space the lexicon needs. N and V features were postulated by Chomsky so as to rationalize the distribution of lexical categories. He could, of course, have called those features A and B, or 1 and i, and retained the system we customarily teach our students. Note, however, that in the latter instance, i.e. 1 and i, there would be a greater level of precision: we could state the intuitive orthogonality of N and V in precise terms, inasmuch as 1 is mathematically orthogonal (maximally different) from i = √-1. We make the Fundamental Assumption in (6), which leads to the reformulation of (5) as (7).

(6) Fundamental Assumption: N = 1 and V = i = √-1.

(7) a. noun: [1, -i] b. verb: [-1, i] c. adjective: [1, i] d. adposition: [-1, -i]

The representations in (7) may be seen as row vectors, in the sense that they are 1d arrays of numbers. While we could develop most of our formalism with such objects, for operational reasons we will “translate” the vectors in (7) to diagonal square matrices as in (8). (For an introduction to basic facts on linear algebra, see, e.g., .)

(8) a. noun: $\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$   b. verb: $\begin{pmatrix} -1 & 0 \\ 0 & i \end{pmatrix}$
    c. adjective: $\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$   d. adposition: $\begin{pmatrix} -1 & 0 \\ 0 & -i \end{pmatrix}$

Notice that the numerical values from the vectors in (7) are placed in the matrix diagonal in (8). As a result, (8) presents "diagonal" and "unitary" matrices, which means they have the property of their inverse being identical to what is called their adjoint. Explaining that technically would take us too far afield, but simply to fix some notation: for, e.g., the noun matrix $N = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$, its transpose is $N^{T} = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$ (where we just exchange the antidiagonal elements, which in this case are both 0), and its adjoint (or conjugate transpose) is $N^{+} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$, where we just took the transpose matrix $N^{T}$ and took the complex conjugate of its entries—i.e., replaced $-i$ by $i$. Suffice it to say in this context that these are extremely elegant matrices, with well-known mathematical properties. The formal objects in (8), which we call Chomsky matrices, in fact integrate into a curious mathematical group that will be discussed below. We may also add that these matrices form a non-standard basis for the Hilbert space $\mathbb{C}^2$, which will also be discussed later.
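The algebra just described is easy to check mechanically. The following Python/numpy sketch (added purely as an illustration; the variable names are ours) encodes the matrices in (8) and verifies that each is diagonal and unitary, with its adjoint equal to its inverse.

```python
import numpy as np

# The Chomsky matrices of (8), under the Fundamental Assumption N = 1, V = i.
N = np.diag([1, -1j])    # noun:       [+N, -V] -> diag(1, -i)
V = np.diag([-1, 1j])    # verb:       [-N, +V] -> diag(-1, i)
A = np.diag([1, 1j])     # adjective:  [+N, +V] -> diag(1, i)
P = np.diag([-1, -1j])   # adposition: [-N, -V] -> diag(-1, -i)

for name, M in [("noun", N), ("verb", V), ("adjective", A), ("adposition", P)]:
    adjoint = M.conj().T                        # conjugate transpose, M+
    assert np.allclose(M.T, M)                  # diagonal, so M^T = M
    assert np.allclose(M @ adjoint, np.eye(2))  # unitary: M M+ = I
    print(name, "is unitary; adjoint diagonal:", np.diag(adjoint))
```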

To illustrate one of the merits of treating lexical categories in terms of the Chomsky matrices, consider the highly limited combinatorial possibilities for lexical heads and their complements:

(9) a. Nouns select PPs.
b. Verbs select NPs.
c. Adjectives select PPs.
d. Prepositions select NPs.

As generic statements, (9c) and (9d) are virtual universals, while (9a) and (9b) are at least statistically overwhelming. Certainly, verbs also select other categories, which may force us to complicate the system—and also to invoke functional categories—but in science one typically begins by trying to predict the most basic interactions.3 In any event, while the facts in (9) are commonly stipulated in many ways, we have never seen them explained.

To provide an account for (9), we begin with the assumption that First-Merge—the combination of a head and its complement—is matrix multiplication. Indeed, once we postulate the Fundamental Assumption in (6), and represent categories as in (8), it is natural to ask whether those moves lead to more than just the formalization of the orthogonality of N and V features, or whether standard operations among them are also possible. Multiplication is, in a sense, a deformation of a given (conceptual) space by way of a linear operator. One should then ask why matrix multiplication should model First-Merge (or any other process). While there is no a priori answer to such a question, we can attempt to show the predictive results of taking that step. We further assume the following:

(10) First-Merge is antisymmetrical.

Typically, First-Merge creates an asymmetrical relation, in the sense that one element, the head, is necessarily atomic (selected from the lexicon), whereas the other, the complement, is a complex phrasal element that has been previously assembled in the derivation. However, there is one notable exception to this situation, corresponding to the initial combinatorial step of every derivation (or the initial step in the sub-derivation of, say, a left-branch constituent, etc.). Obviously, at the very start of a derivation there cannot be any previously assembled syntactic objects. Thus, the only option then is to combine two lexical items. The idea behind (10) is that we can allow for this sort of situation—while still maintaining the general asymmetry of the head-complement relation—if we restrict the initial combination of two lexical items to instances of self-merge, an idea first proposed by Guimarães (2000) and later adopted by Kayne (2009).4

When considered from the point of view of the Chomsky matrices in (8) and first-merge as matrix multiplication, the result of self-merging any of the Chomsky categories is the same:5

(11) $\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & i \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 0 & i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -i \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 0 & -i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = Z$

Z is one of the famous Pauli matrices, which has been put to use to predict properties of an electron's angular momentum. The reasons for that are not important now, but they boil down to the fact that Z is what mathematicians call a Hermitian matrix. Hermitian matrices are to matrices what real numbers are to numbers, in that the eigenvalues (roots) of Hermitian matrices are real. Both can be measured. We will get a feel for that as we get our hands into computations, but we can point out the obvious already: the elements in the diagonal of Z are both real numbers. These are key in understanding the essence of a matrix: its eigenvalues. The eigenvalues of the Chomsky matrices are combinations of ±1 and ±i. It is different for Z, as a result of which the matrix has other elegant properties.

We can think of Z as a welcome encounter arising from the self-merger of the Chomsky objects. At the same time, it is also interesting to ponder what we should make of that, especially within an ultimately "semiotic" system that in some sense carries thought, and even allows us to communicate it. Basically, a linguistic system that is trying to start in a self-merger with the math in (11) has to resolve that "multiguity", so that instead of all possible self-mergers leading to Z, the system chooses one—any one—to the exclusion of the others. One may think of this choice as the core Saussurean arbitrariness in the system, as the choice of any such mapping is in principle as good as any other. The choice made by language seems to be the following:

(12) N (understood as Chomsky's $\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$) self-merges as $\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = Z$.

(12) can be viewed as a cognitive anchor, which we will not seek to explain, at least not mathematically. We assume (12), as opposed to any of the other logically possible mappings, for empirical reasons: derivations bottom out as nouns.6 That Guimarães proposed self-merger for nouns is not surprising; the insight was in the self-merger—not its involving nouns.

3. It could also be that (9) is in some sense cognitively prior, in that learners get it from Universal Grammar in the absence of experience. From that perspective, further complications would be learnt or emerge only later, based on more complex interactions with environmental stimuli.
4. A relation that is asymmetrical except when holding with itself is called antisymmetrical. Here we state that First-Merge is antisymmetric, though precisely speaking the property holds of the relation established between the two elements that undergo First-Merge (not the assembling operation).
5. To multiply such matrices as in (11), readers may multiply entries in entry-wise fashion. But note that this is possible only because the matrices are diagonal (it is not meant as a general comment about matrix products).
6. Also, of the lexical categories, only nouns appear in bare form, without dependents, as names, pronouns, etc.
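Incidentally, the computation in (11) and (12) can be replicated in a few lines. The sketch below (again a mere numpy illustration, using the category-to-matrix assignments of (8)) confirms that every self-merger, computed as matrix multiplication, collapses to the Pauli matrix Z.

```python
import numpy as np

N = np.diag([1, -1j]); V = np.diag([-1, 1j])
A = np.diag([1, 1j]);  P = np.diag([-1, -1j])
Z = np.diag([1, -1])             # the Pauli matrix Z

# (11): every self-merger yields Z, regardless of the category chosen.
for M in (N, V, A, P):
    assert np.allclose(M @ M, Z)
print("all four self-mergers yield Z; det(Z) =", np.linalg.det(Z))
```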

3. Projecting from the Bottom and Selection Restrictions

Once the "anchor" for human language in (12) is assumed, things start falling into place, in a form that can be summarized in terms of a diagram proposed to us by Michael Jarret, which we refer to as the Jarret graph, presented abstractly in (13). In this graph, we need to distinguish operational edges (Chomsky matrices corresponding to the four major lexical heads, which are presented in the graph with a "hat" ^ to signal their operator status) and nodes. Both edges and nodes are matrices, but the emphasis in each instance is entirely different: while the edges are unique linear operators, and hence are presented with fully specified values, the matrices they operate on as vectors can be of any kind that presents, in matrix representation, a given determinant, signaled in parentheses, ranging over ±1 and ±i.

(13)

A matrix determinant is an invariant scalar obtained, for simple square matrices, by multiplying the items in the diagonal and subtracting from that the product of the items in the off-diagonal. We propose, for matrices understood as vectors, that the matrix determinant determines what linguists call the "label" of a category, which for the projections we will be operating with are the fundamental orthogonal features ±1 and ±i only. Importantly, we assume that the interpretation of determinants as labels is only relevant for the outputs of operations (namely, the nodes in the graph, which correspond to phrasal projections consisting of a lexical head and complement, except for the initial step of self-merge), and not for the operators (or lexical heads) themselves.7 The specific labeling system we propose for projected categories is the following:

(14) a. N projections: label/determinant -1
b. V projections: label/determinant i
c. A projections: label/determinant 1
d. P projections: label/determinant -i

The Jarret graph has N heads select (multiply with) matrices of type -i (prepositional projections) to yield -1 projections, while P heads select matrices of type -1 (the nominal projections) to yield -i projections—that being the recursive core of the system. In addition, the graph also says that V heads select matrices of type -1 to yield i projections, while A heads select matrices of type -i to yield 1 projections—that being the non-recursive periphery of the first-merge system.

Finally, the graph has a START point, explicitly signaled in (13). This boils down to the anchoring assumption we have argued for. It would be silly for a graph as in (13) to start at the periphery, since then the computation has nowhere to go; the core is a more useful place to start. But the core itself has two different sites: one labeled -i and the other labeled -1. On formal grounds alone, it is natural for the system to start at a state that carries the computation to the very elegant Pauli matrix Z, with determinant/label -1. We have already shown above how all instances of self-merge, for any of the Chomsky categories, yield this result. That being the case, the only matrix that carries the system to the Z configuration with determinant -1 is precisely $\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$, which we call Chomsky1 (or C1) for this very reason. (It is still a substantive claim to postulate that C1 corresponds specifically to nouns, which we are adapting from Chomsky 1974 via our Fundamental Assumption; the formal system could have just as naturally started in C1 with us having assigned that matrix to verbs, prepositions or adjectives…)

The other formal properties of the Jarret graph—such as why the -1 and -i projections are at the core, others at the periphery—follow from the results of matrix multiplication over the Chomsky matrices. Specifically, only the following eight results are mathematically possible, via multiplication. We have mentioned how, starting on the self-merge of Chomsky's C1 (15e), we obtain Pauli's Z (15a), with label/determinant -1. We can then proceed with the specific options in the Jarret graph. Z can multiply into -C1 (15g) with label/determinant -i by -C2 (15h) (staying at the core of the graph), or multiply into -C2 (15h) with label/determinant i by -C1 (15g) (going into the left periphery of the graph).

7. We emphasize again that the matrices understood as linear operators are as distinct from what they operate on as, say, the operator "+" is from the number pair it takes—"+" is not, itself, even a number, let alone a pair. In that representation, the label for our matrices-as-operators is not their determinant. In contrast, when a given matrix is interpreted as a vector that linear operators operate on, then its determinant is the object's label.

(15) a. $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$   b. $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$   c. $-Z = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$   d. $-I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$
e. $C_1 = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$   f. $C_2 = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$   g. $-C_1 = \begin{pmatrix} -1 & 0 \\ 0 & i \end{pmatrix}$   h. $-C_2 = \begin{pmatrix} -1 & 0 \\ 0 & -i \end{pmatrix}$

The same reasoning obtains for the ensuing matrices. For example, the -C1 obtained in the previous instance can multiply into -Z (15c) with label/determinant -1 by C1 (15e) (staying at the core of the graph), or it can multiply into -I (15d) with label/determinant 1 by C2 (15f) (going into the right periphery of the graph). Readers can try this as an exercise for other states in the graph, and it will become apparent that all the results fall within the "first-merge" Abelian group in (15), which is commutative for multiplication.
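That exercise can also be automated. The following sketch (illustrative only; the names follow (15)) checks the sample products just walked through and then confirms that the eight matrices are closed and commutative under multiplication.

```python
import numpy as np

C1 = np.diag([1, -1j]); C2 = np.diag([1, 1j])
Z = np.diag([1, -1]);   I2 = np.eye(2)
group = {"Z": Z, "-Z": -Z, "I": I2, "-I": -I2,
         "C1": C1, "-C1": -C1, "C2": C2, "-C2": -C2}

def name_of(M):
    """Return the group name of M, or None if M falls outside the set."""
    for name, G in group.items():
        if np.allclose(M, G):
            return name
    return None

# The sample products discussed in the text:
assert name_of(Z @ (-C2)) == "-C1"     # label/determinant -i
assert name_of(Z @ (-C1)) == "-C2"     # label/determinant i
assert name_of((-C1) @ C1) == "-Z"     # label/determinant -1
assert name_of((-C1) @ C2) == "-I"     # label/determinant 1

# Closure and commutativity of the "first-merge" group in (15):
for a in group.values():
    for b in group.values():
        assert name_of(a @ b) is not None    # closed under multiplication
        assert np.allclose(a @ b, b @ a)     # Abelian
print("(15) is closed and commutative under matrix multiplication")
```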

Note that, for each of the lexical projections—matrices that the system treats as vector arguments of other matrix operators—there are two equivalent matrix variants with the same label/determinant, which we refer to as "twin" projections. The matrices-as-vectors are equivalent in that they share the same determinant, which we have proposed can be understood syntactically as a label. For example, Z and -Z have label/determinant -1 because the determinant is the product of the items in the diagonal minus the product of those in the off-diagonal—so -1 in both instances. Readers can easily verify that this is true for all other twin categories in (16).

(16) a. NP, label -1: $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, $-Z = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$
b. AP, label 1: $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $-I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$
c. VP, label i: $C_2 = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$, $-C_2 = \begin{pmatrix} -1 & 0 \\ 0 & -i \end{pmatrix}$
d. PP, label -i: $C_1 = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$, $-C_1 = \begin{pmatrix} -1 & 0 \\ 0 & i \end{pmatrix}$

Other matrix multiplications are possible among the eight items in (16); but only those expressed in the Jarret graph present this kind of symmetry. For example, we could have multiplied, say, a prepositional projection -C1 (15g), used as an operator, times a verb phrase understood as matrix-as-vector C2 (15f), one of the VP "twins"; the result is -I (15d) with label/determinant 1. While that is mathematically fine, it simply cannot be a projection from a prepositional head -C2, whose projections bear label -i. Readers can try similar multiplications off the edges of the Jarret graph, to see how only the connections made explicit within it preserve endocentricity/selection in the sense described.

That is what predicts the facts in (9), together with the formal fact that multiplication only allows certain results. Had we asked whether we could obtain a projected Z (16a) from the last matrix multiplication mentioned in the previous paragraph (-C1 (15g) times C2 (15f)), the answer would be no. That is not for substantive reasons as presupposed in (14); it follows from assuming a numerical base and elementary multiplications—one cannot obtain i from 1 × 1. Thus there is an important consequence of the numerical assumptions we made to substantiate Chomsky's intuition about the cognitive orthogonality of the N and V attributes, as well as the general approach to treating categories as feature matrices (in linear-operator interpretation for heads and vector interpretation with "twin" variants for projections), together with interpreting these and their hypothesized elements in a mathematical sense: we are now able to predict certain elementary combinations in syntax without having to invoke external considerations.
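A minimal sketch of the failed case (merely illustrative, reading labels off determinants as proposed in (14)):

```python
import numpy as np

C1 = np.diag([1, -1j]); C2 = np.diag([1, 1j])

def label(M):
    """The label of a projection, read off as its determinant."""
    return complex(np.round(np.linalg.det(M), 10))

out = (-C1) @ C2                       # an off-graph multiplication
print("output label:", label(out))     # 1, an AP label by (14)
# A projection of a prepositional head must bear label -i; since the
# output bears label 1 instead, it cannot count as such a projection.
assert label(out) == 1 and label(out) != -1j
```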

4. The Explosion Problem with Specifiers and the Need for Matrix Compression

Just as we have proposed matrix multiplication for first-merge, we now propose another kind of product, the tensor product, for Elsewhere Merge. We refer that way to those forms of Merge in which both of the merged elements are complex, having a derivational history (instead of one of them, at least, coming from the lexicon). The rhetoric in the literature routinely equates these two forms of merge, but we will keep them separate, to begin with because they are plainly distinct in that First Merge must involve a projecting head, whereas Elsewhere Merge does not. There are also many empirical differences between firstly merged complements and specifiers merged in elsewhere fashion—but we will not review them now. In any case, just as there is no a priori reason to treat a given form of merge as a type of product, there isn't one to treat another as a different type of product. The argument is ultimately based on how well the decision may lead to modeling relevant facts.

In this regard, there are two broad themes to keep in mind from the perspective of modeling chains, which is our driving force. First, that a (displacement) chain by definition must involve at least one specifier—in that the displaced site cannot involve a form of First Merge. Second, that the payoff of the sort of math we are pursuing arises when considering tensorized networks, for formal reasons we review shortly.

Tensor products have the effect of basically concatenating two matrices into a larger one, which is useful in "building structure". Whereas regular matrix multiplications do not preserve structure (once modified, a linearly altered structure could have come from different multiplications), tensor products are structure-preserving: by looking at a tensor product, we know what went into it. For this reason, while matrix multiplication retains the dimensionality of its factors, tensor products generally have a dimensionality that grows as they take place, as a function of the dimensionality of the factors. The dimensionality of a matrix is its number of rows and columns—or the information it takes to specify it—determining what sorts of operations are allowed.8 For the objects in the Abelian group in (15), multiplying its members by any other does not change the dimensionality of the factors: the result is of the same dimensionality as each of the factors—otherwise (15) would not be a group. But this does not happen when we invoke a tensor product concatenating two matrices. The dimensionality of the resulting matrix is the product of the dimensionalities of the factors.

8. For example, only matrices of identical inner dimensionalities can be added/subtracted, and only a matrix A with the same number of columns as the number of rows of a matrix B can enter into a multiplication AB.

(17) $I \otimes M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} & 0 & 0 \\ m_{21} & m_{22} & 0 & 0 \\ 0 & 0 & m_{11} & m_{12} \\ 0 & 0 & m_{21} & m_{22} \end{pmatrix}$

Had (17) been matrix multiplication involving the identity matrix I, the result would be identical to the second factor (in dimensionality and everything else). Because this is a tensor product, even though it involves the identity matrix, the result preserves the shape of the second factor, but the output is obviously a 4×4 matrix. Moreover, by looking at the output we know that it must have originated in the product to the left—the tensor product being, in this sense, structure-preserving.

We use structure-preserving tensor products to generate phrase-to-phrase mergers (as opposed to head-to-phrase conditions). This has vast consequences.
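For readers who want to see the contrast concretely, numpy's kron implements the tensor product for matrices (the choice of Z as second factor below is arbitrary and purely illustrative):

```python
import numpy as np

I2 = np.eye(2)
M = np.diag([1, -1])        # any 2x2 second factor will do; here, Pauli Z

T = np.kron(I2, M)          # the tensor (Kronecker) product, as in (17)
print(T.shape)              # (4, 4): dimensionalities multiply
print(T)                    # M appears twice along the block diagonal

print((I2 @ M).shape)       # (2, 2): plain multiplication keeps the size
```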

We generate (18a) by the tensor product of children's and pictures of NYC. This is possible regardless of whether the genitive is complex, as in relatives of children's (the situation in (18b))—a phrase like that falling under the characterization provided for (18a). But the situation in (18c) is more interesting:

(18) a. Children's pictures of NYC.
b. Relatives of children's pictures of NYC.
c. Women's children's pictures of NYC.
d. London's women's children's pictures of NYC.

In (18c) we have a specifier (women's) within another (women's children's). So if each elsewhere merger, going beyond the initial head-complement relations, invokes a tensor product, and a tensor product's dimensionality is the product of its factors' dimensionalities, then the dimensionality of women's children's pictures… should be equal to that of children's pictures times that of women's. This can then go on into the even higher dimensionality of London's women's children's pictures… as in (18d), and so on—indefinitely. We call this the Explosion Problem, which tells us something about the nature of specifiers.

The general approach to such problems is matrix compression, based on dimensional reduction. What we seek for that purpose are matrix results where entire rows or entire columns reduce to zero, and thus can be eliminated. These, it turns out, correspond to matrices with one or more zero eigenvalues.
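Both halves of the problem can be illustrated in a few lines (the particular matrices below are arbitrary choices of ours): repeated tensor products double the dimensionality at every step, while zero eigenvalues diagnose dimensions that compression can eliminate.

```python
import numpy as np

Z = np.diag([1, -1])

# Explosion: each stacked specifier tensors in another factor.
phrase = Z
for k in range(1, 4):
    phrase = np.kron(Z, phrase)
    print("specifier", k, "-> dimensionality", phrase.shape)   # 4x4, 8x8, 16x16

# Compression criterion: zero eigenvalues mark removable dimensions.
S = np.diag([2, 0, 0, -2])               # a 4x4 with two zero eigenvalues
zeros = np.sum(np.isclose(np.linalg.eigvals(S), 0))
print("zero eigenvalues:", int(zeros))   # 2: half the dimensions can go
```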

Consider next some generalities to be drawn about our "Magnificent Eight" objects in (16) (the Chomsky matrices, the Pauli matrix ±Z, and the identity matrix and its negative ±I) in particular:

Table 1. Algebraic properties of the Pauli matrices within the Magnificent Eight

| Properties | $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ | $-Z = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ | $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ | $-I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ |
|---|---|---|---|---|
| Char. polynomial | $x^2 - 1$ | $x^2 - 1$ | $x^2 - 2x + 1$ | $x^2 + 2x + 1$ |
| Eigenvalues | $1, -1$ | $-1, 1$ | $1, 1$ | $-1, -1$ |
| Determinant | $-1$ | $-1$ | $1$ | $1$ |
| Trace | $0$ | $0$ | $2$ | $-2$ |

Table 2. Algebraic properties of the Chomsky matrices within the Magnificent Eight

| Properties | $C_1 = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$ | $-C_1 = \begin{pmatrix} -1 & 0 \\ 0 & i \end{pmatrix}$ | $C_2 = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$ | $-C_2 = \begin{pmatrix} -1 & 0 \\ 0 & -i \end{pmatrix}$ |
|---|---|---|---|---|
| Char. polynomial | $x^2 - (1-i)x - i$ | $x^2 - (-1+i)x - i$ | $x^2 - (1+i)x + i$ | $x^2 - (-1-i)x + i$ |
| Eigenvalues | $1, -i$ | $-1, i$ | $1, i$ | $-1, -i$ |
| Determinant | $-i$ | $-i$ | $i$ | $i$ |
| Trace | $1-i$ | $-1+i$ | $1+i$ | $-1-i$ |

For these matrices, the following statement is always formally true:

(19) The diagonal elements are the polynomial roots and matrix eigenvalues.
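The values in Tables 1 and 2, and the statement in (19), can be recomputed mechanically; the following sketch (an illustration, not part of the formal system) does so for all eight matrices.

```python
import numpy as np

mats = {
    "Z":  np.diag([1, -1]),   "-Z":  np.diag([-1, 1]),
    "I":  np.diag([1, 1]),    "-I":  np.diag([-1, -1]),
    "C1": np.diag([1, -1j]),  "-C1": np.diag([-1, 1j]),
    "C2": np.diag([1, 1j]),   "-C2": np.diag([-1, -1j]),
}

sort_key = lambda z: (z.real, z.imag)
for name, M in mats.items():
    eig = sorted(np.linalg.eigvals(M), key=sort_key)
    diag = sorted(np.diag(M), key=sort_key)
    assert np.allclose(eig, diag)   # (19): diagonal elements = eigenvalues
    print(name, "det =", np.round(np.linalg.det(M), 6),
          "tr =", np.round(np.trace(M), 6))
```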

Note also that the Pauli matrices (Table 1) are different from the Chomsky matrices (Table 2) in that all their eigenvalues are real. Hermitian matrices have real eigenvalues, so we can easily see that none of the Chomsky matrices are Hermitian. Observe, also, putative unifications across categories. We have already observed how the positive and negative versions of the "twin" categories, which we use in vector interpretation for category projections, share the same determinant. But there are more generalizations of interest. Note that only ±Z presents the same characteristic polynomial $x^2 - 1$ (all other matrices in the Magnificent Eight have different characteristic polynomials). Now that is a specific sense in which ±Z is the most elegant among the Magnificent Eight: aside from being Hermitian, it has a unified characteristic polynomial, a unified trace,9 and a unified determinant—which no other matrix pair in the group does.

9. The notion of matrix trace in these tables—which of course has nothing to do with syntactic traces—is just the sum of the elements in the matrix diagonal. To avoid a confusing notation, we write tr for the matrix trace.

Needless to say, considerations about characteristic polynomials, eigenvalues, and so on obtain for all square matrices, not just our Magnificent Eight. This is important when considering these architectural issues from a broader perspective, including an extension of our system to functional categories. To begin with, Pauli's ±Z is only one of three Hermitian matrices within the Pauli Group, which also includes ±I and imaginary versions of all these matrices. Two other fundamental Pauli matrices, $\pm X = \pm\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\pm Y = \pm\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, are non-diagonal. It turns out that when we multiply any of the non-diagonal matrices in the Pauli Group by any of the Chomsky matrices, we end up in another group of 32 matrices that we call the Pauli/Chomsky Group (a brute-force check of its size appears in the sketch after (20) below). This group is extremely interesting in its own right, and furthermore it allows us to systematically construct and explore a corresponding vector space where tensor products (specifier relations) can be superposed (forming chains). Moreover, the larger group allows us to postulate a richer "periodic table" of syntactic elements: the Magnificent Eight (lexical projections) along with corresponding functional projections, which also have well-defined characteristics of the sort just studied (some will have unified polynomials, not all; some will be Hermitian, not all; some will be unitary, etc.—all of which has consequences for a generalized semantic anchoring). Within those parameters, we take the research program to be as follows:

(20) a. To find out how the functional categories (Infl, Comp, v, etc.) relate to lexical categories and to one another in a principled fashion.
     b. To determine how the Pauli/Chomsky Group and its 16 twin projections constitute the basis for standard syntax, in terms of their multiplications and tensor products.
     c. To understand which of the tensor products among the categories in the periodic table lead to compressible results.
     d. To figure out how the tensor products sum with one another into chain dependencies, and which among those present separable contexts.

Just as the formal tools in Tables 1 and 2 above show us in what sense the Pauli/Chomsky matrices are more or less elegant—which arguably relates to their syntactic distribution—they also allow us to understand how dimensions can be reduced after they have grown due to a tensor product. In this regard, it is useful to emphasize that our matrices can be said to have a dimensionality equal to their number of non-zero eigenvalues. In other words, a non-compressible 4×4 matrix has four substantive (non-zero) eigenvalues, whereas a compressible 4×4 matrix has as many zero eigenvalues as it has dimensions that are irrelevant to it. It will not be possible for us to present, in this context, anything but a "teaser" of how the dimensional reduction works—and we refer interested readers to Orús et al. (2017) for relevant details. The basic idea, however, works as follows. We can find situations in which literally adding a tensor product (arising from a projection taking a specifier) to another such product results in the elimination of some of the eigenvalues in the sum of the matrices. For example, with the sum shown in (21), the specifications of which are given in (22), we clearly have a dimensional reduction, since the ensuing matrix has two zero eigenvalues.10

(21) [displayed matrix sum; its invariants are given in (22)]

(22) det.: 0, tr.: 0; char. pol.: x⁴ + 2ix²; eigenvalues: (-1+i), (1-i), 0, 0.

10. For diagonal square matrices, the determinant amounts to the product of the eigenvalues, while the trace equals the sum of the eigenvalues. The matrix determinant and trace of the sum in (21) are both zero, as a consequence of which the characteristic polynomial simplifies to x⁴ + 2ix².

So the system will basically reduce the specifier dimensionality by integrating it into a sum of the sort in (21)—which we take to be a chain if certain structural conditions are met. Another way of putting this more intuitively is that the general rationale behind chains, from the present perspective, is to reduce specifier dimensionality.
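A minimal numerical sketch of both the Explosion Problem and this kind of compression may help. The 2×2 factors below are hypothetical stand-ins (they are not the matrices summed in (21)), so only the shape growth and the appearance of zero eigenvalues carry over:

```python
# Illustrative sketch of the Explosion Problem and of compression.
# The 2x2 factors are hypothetical stand-ins, not the authors' (21);
# numpy's kron() plays the role of the tensor product.
import numpy as np

alpha = np.diag([1, -1j])               # stand-in "specifier" matrix
K = np.diag([1, -1]).astype(complex)    # stand-in context matrix
L = np.eye(2, dtype=complex)            # stand-in context matrix

# Explosion: each further specifier multiplies dimensionality, as in (18c-d)
one_spec = np.kron(alpha, K)            # 2x2 (x) 2x2 -> 4x4
two_specs = np.kron(alpha, one_spec)    # 2x2 (x) 4x4 -> 8x8
print(one_spec.shape, two_specs.shape)  # (4, 4) (8, 8)

# Compression: the sum alpha(x)K + alpha(x)L can acquire zero eigenvalues,
# i.e. dimensions that the chain-forming sum renders eliminable
S = np.kron(alpha, K) + np.kron(alpha, L)
eigs = np.linalg.eigvals(S)
print("eigenvalues:", np.round(eigs, 10))
print("zero eigenvalues:", int(np.sum(np.isclose(eigs, 0))))  # 2 here
```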

5. Chains and Beyond…

The importance of the foregoing exercise is to prepare the ground for the operations that, in conditions of superposition (sums) as in (21)/(22), may lead not just to dimensionality reduction in specifiers—but also to different chain collapses. This is the crux of the idea: chains exist, prior to being observed, in superposed states. At the observation point, if at all possible, they materialize, with some probability, in one of those states, which thus becomes observable. There are well-understood properties of superposed states that, in principle, allow for their separability; for instance, when they are orthogonal to start with (with regard to some orthonormal basis). The situation is all or nothing: if the states are orthogonal, the separation, in the right conditions, is inevitable; if they are not orthogonal, the separation is impossible. Moreover, there is no such thing as being observable in multiple states at the same time, much as there is meaning to the states all existing simultaneously. With Chomsky (1995), we take a chain to be an object of the sort {{α, Κ}, {α, Λ}}, where a specifier α moves from context Λ to context Κ. Since we are modeling specifiers with tensor products, we can take the chain to be the sum:

(23) [α ⊗ Κ] + [α ⊗ Λ] = α ⊗ [Κ + Λ]
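The equality in (23) is just the distributivity of the tensor product over matrix sums, which a quick check with arbitrary matrices confirms (a sketch, using numpy's kron for the tensor product):

```python
# Numerical check of the factorisation in (23): the Kronecker (tensor)
# product distributes over matrix sums, so both sides agree exactly.
import numpy as np

rng = np.random.default_rng(0)
def rand_c():  # random complex 2x2 matrix
    return rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

alpha, K, L = rand_c(), rand_c(), rand_c()
lhs = np.kron(alpha, K) + np.kron(alpha, L)
rhs = np.kron(alpha, K + L)
print(np.allclose(lhs, rhs))  # True
```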

To say α separates from these superpositions is to say one can "factor out specifier α" from the relevant tensor products, as in the right-hand side of the equation in (23). So the chain, in a deep sense, links the contexts of each of its occurrences, Κ and Λ. After "factoring out" the separable element α, what remains is the superposition [Κ + Λ]. Now here is the key: if the superposed contexts are mutually orthogonal, we can apply to such complementary conditions the standard logic of quantum mechanics. Basically, when the relevant system is measured, it has a 50% probability of being observed in the Κ configuration and a 50% probability of being observed in the Λ configuration. If we suppose that we observe abstract linguistic representations by sending them to the relevant interfaces, within those representations we can say that chain {{α, Κ}, {α, Λ}} collapses at either configuration Κ or configuration Λ, with equal probability.

Of course, we have to make precise what we mean by "orthogonal", or "maximally different" within an orthonormal basis. The following is the standard approach:

(24) Two vectors x and y in vector space V are orthogonal if their inner (scalar) product is zero.

A convenient way to define the scalar product between two matrices is as in (25), where tr represents a matrix trace (and see fn. 9):

(25) ⟨A|B⟩ = tr(A†B), where, for ket |A⟩, A's conjugate adjoint A† is the bra ⟨A|.
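As a concrete illustration of (25): under this trace-based (Hilbert-Schmidt) inner product, the standard Pauli matrices come out mutually orthogonal. The following sketch assumes nothing beyond their textbook forms:

```python
# A sketch of the inner product in (25), <A|B> = tr(A†B). Under this
# (Hilbert-Schmidt) product the standard Pauli matrices X, Y, Z are
# mutually orthogonal: the all-or-nothing separability at stake here.
import numpy as np

def inner(A, B):
    return np.trace(A.conj().T @ B)  # tr(A† B)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

print(inner(X, Y), inner(X, Z), inner(Y, Z))  # all 0: orthogonal
print(inner(Z, Z))                            # 2: non-zero self-overlap
```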

Here we are using a vector notation introduced by Paul Dirac for notions discussed above already. What (24)/(25) boil down to is that we take two matrices A and B, understood as vectors, to be orthogonal if and only if the trace of multiplying A's adjoint A† times B is zero. Because we have the Pauli/Chomsky group to work with, determining this, which in Dirac's shorthand is ⟨A|B⟩, is relatively simple: we just need to churn the calculations.

The points to take home are straightforward. First, this is supposed to work with the very same types of conditions and reasoning as it does in quantum physics. The issue is not really whether the computations are wrong (they aren't), but rather whether they are meaningful. Deciding that depends on whether we have alternative theories of "reconstruction effects" and the like, and if so, whether such alternatives fare better on empirical grounds. Our attempt here is simply to show how things work in our terms.

Second, the formalism, as such, allows for little or no wiggle room. If "collapses" are meant seriously, they take place in a vector (Hilbert) space along the lines of what is guaranteed by (24)/(25) in the context of something like the Pauli/Chomsky group. In particular, if two matrices come out as orthogonal by the definitions we are introducing, they cannot be "quasi-orthogonal" or "orthogonal up to speaker intuitions", or whatever. One could, of course, change the definition of the inner/scalar product in (25), and then different things would be orthogonal. Or reject the Pauli/Chomsky group as the locus for all of this, and then perhaps in a different realm other things would be orthogonal. But within the scenario we are presenting there are no alternatives.

A third broad point to bear in mind is that we are attempting several things at the same time. At the very least we need to address the Compression Problem for specifiers. This is to say that we are not just after "reconstruction effects" for chain occurrences. While that is what has motivated the program, once we invoke matrices, groups, Hilbert spaces, and so forth, one hopes that all of that doesn't amount to mere paraphernalia to address the technical problem of occurrences. For us, chain occurrences are interesting inasmuch as they touch on all these other issues, taking us from humble phrases to complex long-range correlations. To be sure, chains are not the only long-range correlations that grammars present: there is obligatory and non-obligatory control, ellipsis of various kinds, binding and obviation effects, preferences, and much more. We have the sense that treating these matters within a Hilbert space of relations is promising, particularly when, beyond the superpositions just discussed, such a system a fortiori involves rampant entanglements. Basically, whatever doesn't separate is entangled, so there is plenty of room to explore what happens beyond the core situation discussed above. Present space limitations aside, the issue of entanglements is one that we are currently working on and do not yet fully understand, to be honest.

One last point is worth emphasizing: much of what we have said above would not make (non-metaphorical) sense without the use we have made of scalars of different kinds. We have shown the role played by determinant scalars. We have just alluded to the important role of matrix traces—another scalar—in determining the inner product of our Hilbert space.
(We could also show how traces in the Pauli/Chomsky matrices help us separate substantive categories from grammatical ones.) Moreover, the logic of chain collapses as sketched ultimately follows the usual logic that is also applied in quantum mechanics. That very logic requires a "lower boundary", usually expressed in terms of Planck's famous constant in the case of quantum physics—at any rate, a non-zero real number. That apparatus has to be numerical, indeed real in the technical sense. No real numbers, no syntax as we have examined it. We could certainly be wrong in our analyses, but if we are not, they provide bona fide arguments that "mind phenomena" require real quantities as they materialize, enough at least to show up with the coherent patterns examined here.

As we noted already above, we are not the first to have argued that human language utilizes a Hilbert space (or some extension thereof), or that it is best to treat some aspects of grammar in terms of vector spaces more generally. We believe, however, that we are the first to make such a "quantum leap" while taking totally seriously the fundamentals of linguistic theory (the division into nouns, verbs, adjectives, and adpositions; the role of structure, selection, and endocentricity within phrases; standard cartographies; etc.). This is a sense in which our approach is actually as conservative as it is admittedly radical. We have shown how a Hilbert space can be constructed from assumptions that many linguists teach in their undergraduate classes. The only twist we have added is to interpret familiar conceptual orthogonalities in mathematical terms, which we have found worth studying.

References

Aerts, D. & Aerts, S. 1994. Applications of quantum statistics in psychological studies of decision processes. Foundations of Science 1: 85-97.
Atmanspacher, H., Römer, H. & Walach, H. 2002. Weak quantum theory: Complementarity and entanglement in physics and beyond. Foundations of Physics 32: 379-406.
Bruza, P., Kitto, K., Nelson, D. & McEvoy, C. 2009. Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology 53: 363-377.
Chomsky, N. 1974. The Amherst Lectures. Delivered at the 1974 Linguistic Institute, University of Massachusetts, Amherst. Université de Paris VII.
Chomsky, N. 1995. The Minimalist Program. Cambridge: MIT Press.
Collins, C. & Stabler, E. 2016. A Formalization of Minimalist Syntax. Syntax 19(1): 43-78.
Gerth, S. & beim Graben, P. 2009. Unifying syntactic theory and sentence processing difficulty through a connectionist minimalist parser. Cognitive Neurodynamics 3: 297-316.
Guimarães, M. 2000. In Defense of Vacuous Projections in Bare Phrase Structure. In Guimarães, M., Meroni, L., Rodrigues, C. & San Martin, I. (eds.). University of Maryland Working Papers in Linguistics 9: 90-115.
Heunen, C., Sadrzadeh, M. & Grefenstette, E. (eds.). 2013. Quantum Physics and Linguistics. Oxford: Oxford University Press.
Hornstein, N. 1998. Movement and Chains. Syntax 1(2): 99-127.
Hornstein, N. 2001. Move! A Minimalist Theory of Construal. Oxford: Blackwell.
Kayne, R. 2009. Antisymmetry and the Lexicon. Linguistic Variation Yearbook 2008: 1-32.
Khrennikov, A. 2006. Quantum-like brain: 'Interference of minds'. Biosystems 84(3): 225-241.
Martin, R. & Uriagereka, J. 2008. Competence for preferences. In Artiagoitia, X. & Lakarra, J. A. (eds.). Festschrift for Patxi Goenaga. University of the Basque Country.
Martin, R. & Uriagereka, J. 2014. Chains in Minimalism. In Kosta, P., Franks, S., Radeva-Bork, T. & Schürcks, L. (eds.). Minimalism and Beyond: Radicalizing the Interfaces. Amsterdam: John Benjamins.
Orús, R., Martin, R. & Uriagereka, J. 2017. Mathematical foundations of matrix syntax. Retrieved from .
Smolensky, P. 1990. Tensor product variable binding and the representation of symbolic structures in connectionist networks. Artificial Intelligence 46: 159-216.
Smolensky, P. & Legendre, G. 2006. The harmonic mind: From neural computation to Optimality-Theoretic grammar (vols. 1-2). Cambridge: MIT Press.
Uriagereka, J. 2008. Syntactic Anchors: On Semantic Structuring. Cambridge: Cambridge University Press.

Factors 2 and 3: Towards a principled approach*

Theresa Biberauer University of Cambridge & Stellenbosch University & University of the Western Cape [email protected]

Received: December 31, 2017 Accepted: September 23, 2019

Abstract

This paper seeks to make progress in our understanding of the non-UG components of Chomsky's (2005) Three Factors model. In relation to the input (Factor 2), I argue for the need to formulate a suitably precise hypothesis about which aspects of the input will qualify as 'intake' and, hence, serve as the basis for grammar construction. In relation to Factor 3, I highlight a specific cognitive bias that appears well motivated outside of language, while also having wide-ranging consequences for our understanding of how I-language grammars are constructed, and why they should have the crosslinguistically comparable form that generativists have always argued human languages have. This is Maximise Minimal Means (MMM). I demonstrate how its incorporation into our model of grammar acquisition facilitates understanding of diverse facts about natural-language typology and acquisition, both in "stable" and "unstable" contexts, and also the ways in which linguistic systems may change over time.

Keywords: three factors; Universal Grammar; acquisition; crosslinguistic variation; poverty of the stimulus

Resum. Factors 2 i 3: cap a un enfocament fonamentat

Aquest treball pretén fer progressos en la comprensió dels components que no són UG del model de tres factors de Chomsky (2005). En relació amb l'entrada (factor 2), argumento la necessitat de formular una hipòtesi adequada i precisa sobre quins aspectes de l'entrada es qualificaran com a «ingesta» i, per tant, seran la base de la construcció gramatical. En relació amb el factor 3, destaco un biaix cognitiu específic que apareix força motivat fora del llenguatge, alhora que té àmplies conseqüències per a la nostra comprensió de com es construeixen les gramàtiques del llenguatge I, i per què haurien de tenir la forma interlingüísticament comparable que els generativistes sempre han defensat que tenen les llengües humanes. Es tracta de maximitzar els mitjans mínims (MMM). Demostro que la seva incorporació al nostre model d'adquisició gramatical facilita la comprensió de fets diversos sobre tipologia de llenguatge natural i adquisició, tant en contextos «estables» com «inestables», i també de les maneres en què els sistemes lingüístics poden canviar amb el pas del temps.

Paraules clau: tres factors; gramàtica universal; adquisició; variació interlingüística; pobresa de l'estímul

* This paper, which partially reflects the content of a talk given at the 'Generative Syntax 2017: Questions, Crossroads, and Challenges' meeting organized by Ángel Gallego and Dennis Ott, is an expanded version of a working paper that appeared in the Cambridge Occasional Papers in Linguistics (COPiL) in August 2017 (Biberauer 2017e in the references). I thank the audience at the above-mentioned Barcelona meeting for their questions and comments; Ian Roberts for comments on the COPiL paper; Jamie Douglas, Julio Song, Erin Pretorius, Craig Sailor, Paula Buttery, Frances Blanchette, Jeroen van Craenenbroeck, Aritz Irurtzun, Daniel Harbour, Ángel Gallego, Peter Msaka, Valentina Colasanti, and Hedde Zeijlstra for valuable discussions of diverse kinds; an anonymous reviewer for comments on an earlier draft of the present paper; Ángel and Dennis for the invitation to write this up; and Ángel yet again for crucial support at key points. The research reported here was initially funded by the European Research Council Advanced Grant No. 269752 'Rethinking Comparative Syntax' (ReCoS), and a Cambridge Humanities Research Grant 'Learning from questions and commands: probing the nature and origins of native-speaker knowledge' (Biberauer 2015). Crucially, it also reflects unfunded work that has been undertaken on a 'Maximise Minimal Means' basis since the completion of the ReCoS project in May 2017. Usual disclaimers apply.

Table of Contents
1. Introduction
2. A neo-emergentist approach to linguistic variation: the Maximise Minimal Means (MMM) model
3. Novel predictions of the model
4. Conclusion
References

1. Introduction

The "traditional" generative perspective on the question of how adult speakers come to have the native-language knowledge that they do famously highlights the two ingredients given in (1):

(1) Universal Grammar (UG) + Primary Linguistic Data (PLD) → Adult Grammar (= an I-language)

Here, the nature component – UG – is thought to be "rich in structure" (Chomsky 1981: 3), with the key consequence that the nurture component – the PLD – does not need to be so elaborate. The connection between UG and the PLD in the context of the classic Principles & Parameters era of the 1980s and 1990s was in fact assumed to be much closer than is often appreciated in current discussion, with UG fulfilling a "steering" function in relation to the PLD. Chomsky (1981: 10), for example, characterises the UG specification as entailing

concepts that can plausibly be assumed to provide a preliminary, prelinguistic analysis of a reasonable selection of presented data, that is, to provide the primary linguistic data that are mapped by the language faculty to a grammar…

In other words, the PLD, as initially conceived, was not assumed to be "everything the acquirer hears", but, instead, that part of the input that UG facilitated initial access to. On this model, all the PLD had to provide was:

limited evidence, just sufficient to fix the parameters of UG [which – TB] … determine a grammar that may be very intricate and … in general lack grounding in experience in the sense of an inductive bias. (Chomsky 1981: 3)

The PLD, then, was expected to be readily accessible and quite simple in structure, with the "rich deductive structure" of parameters1 accounting for the fact that our linguistic knowledge ultimately seems to vastly outstrip the input. In view of the inescapability of Plato's Problem2, the minimal grounding point raised above has always been of particular significance: acquirers demonstrably go beyond the finite input to which they are exposed in a range of, for the most part, surprisingly consistent ways; similarly, the nature and content of individual exposure also varies greatly, once again seemingly mostly not to the detriment of the essential uniformity of adult grammars. During the Minimalist era, the rich UG assumption and, thus, its potential as a solution to Plato's Problem has, however, been drawn into question: the objective in this context is to populate UG with only the grammar-shaping content that cannot be ascribed to more general cognitive principles. More specifically, Chomsky (2005) proposes the so-called Three Factors Model, represented in (2):

(2) UG + PLD + general cognitive factors → Adult Grammar (= an I-language)

Here, the additional factor – the "general cognitive factors" in (2) – may, for example, include language acquisition biases ('principles of data analysis … used in language acquisition and other domains'; Chomsky 2005: 6), and constraints on the make-up and workings of the computational system underpinning human language ('principles of structural architecture' and 'principles of efficient computation'; ibid.).

To my mind, this Three Factors model has not received the serious and systematic attention that it deserves. In part, this follows from the vastness of the questions about its individual components – the Three Factors – on which there is currently very little, if any, real consensus. Consider, for example, the question of what a minimal UG should contain. Researchers who would today describe themselves as "generative"/"Chomskyan" range from those, on the one hand, who would identify only (feature-blind) Merge (the basic combinatorial operation which produces Recursion; cf. Hauser, Chomsky & Fitch 2002 and many subsequent researchers3) to those, on the other, who assume richly specified cartographic or even nanosyntactic structures (see i.a. Shlonsky 2010, Cinque 2013, Rizzi & Cinque 2016 on the former, and i.a. Caha 2009, Starke 2009, 2014, and Baunaz, De Clercq, Haegeman & Lander 2018 on the latter). An informal survey of generative colleagues of all ages also suggests that a great many remain committed to the necessary correctness of Chomsky's (2001: 10) proposal that UG 'specifies the features F that are available to fix each particular language L'. This would, however, entail a much richer UG than the Merge-only entity assumed in Hauser, Chomsky & Fitch (2002), and work following that line of thinking.

To the extent that parameters are still assumed to be a useful way of thinking about (the limits on) crosslinguistic variation4 both synchronically and diachronically, we also see significant unclarity regarding the nature and origins of minimalist parameters, with some researchers assuming a high number of innately specified choice-points (cf. i.a. Westergaard 2009, and the work of Richie Kayne more generally), and others assuming these to be (in part) emergent in different ways (cf. i.a. Dresher 2009, 2014 in the domain of phonology; Gianollo, Guardiano & Longobardi 2008; Guardiano & Longobardi 2017, and Longobardi 2018 for the proposal that specific parameters in fact reflect a limited number of innately specified parameter schemata, and Rizzi 2014, 2015 for a proposal in the same spirit; see also i.a. Zeijlstra 2008; Biberauer 2018 et seq.; Roberts 2012, 2019; Wiltschko 2014; Ramchand & Svenonius 2014; and Biberauer & Roberts 2015, 2017 on different types of specifically emergent parameters), and perhaps the majority leaving aside explicit consideration of this "bigger picture" question. In relation to third factors, the picture is more rather than less opaque; see Mobbs (2015) for overview discussion. Finally, systematic consideration of the form that the 'triggering' input takes has barely advanced beyond the by now long-standing recognition that 'PLD' cannot be taken to mean "everything the child hears". Thus discussions like Evers & van Kampen (2008), Gagliardi (2012), and Lidz & Gagliardi (2015) highlight the difference between 'input' and 'intake',5 while Fodor & Sakas (2017) provide a useful overview of work to date on so-called 'triggering input'.

Agreement – even in quite general terms – on what our conception of Factors 1, 2 and 3 should be thus remains to be reached. A positive perspective on this state of affairs would interpret it as following from the fact that a more explicitly articulated version of the Three Factors model and its components is precisely what current generative theory is, at this point, in the process of striving for.

1. This "rich deductive structure" refers to the assumption that parameters, by hypothesis, determined not just the phenomenon associated with their triggering input, but additionally also a cluster of at first sight unrelated, and, in part, very complex properties.
2. '[T]he problem of explaining how we can know so much given that we have such limited evidence' (Chomsky 1986: xxv).
3. This basic, feature-blind combinatorial operation is known by many names, including Core Merge (Fujita 2009), Set-Merge or Simplest Merge (Epstein, Kitahara & Seely 2012, 2013; Chomsky, Gallego & Ott this volume), Bare Merge (Boeckx 2015), and Concatenate (Hornstein & Nunes 2008; Hornstein & Pietroski 2009). See i.a. Mobbs (2015), and Freeman (2016) for discussion of the nature of syntactic Merge, and of the extent to which Merge as employed in syntactic derivations can be equated with the combination operation seen outside language.
Granting this positive interpretation, however, one would want to see explicit discussion of how progress towards this goal might be made; and it is my sense that we are not engaging in discussion of this kind – or at least, not systematically so. More specifically, we are not taking seriously enough the possibility of making new progress on the Big Question regarding the likely contents of UG – and on many other matters of generative concern, long-standing and otherwise. What I would like to suggest here is that such progress can rather readily be made by probing the second and third factors via routes that generative and more general linguistic research to date puts 21st-century generativists – and researchers more generally – in an excellent position to exploit. Accordingly, this paper will seek to outline a model within which I believe productive investigation of all three factors might proceed (section 2). As my purpose here is to attempt a demonstration of how systematic investigation of Factors 2 and 3, and their interaction with Factor 1, might be undertaken, most of the discussion will focus on the former Factors (sections 2.2 and 2.3 respectively). Section 3 then considers some of the novel predictions the model makes, i.a. also considering its implications for our understanding of UG. Section 4 concludes.

4. See i.a. Newmeyer (2004, 2005), Biberauer (2008, 2011, 2016, 2017b, c, d), Gallego (2011), many of the contributions in Picallo (2014), Eguren, Fernandez-Soriano & Mendikoetxea (2016), and also Biberauer & Roberts (2017).
5. See also Gass (1997) on this distinction in the L2 context.

2. A neo-emergentist approach to linguistic variation: the Maximise Minimal Means (MMM) model

The neo-emergentist model to be outlined here can be schematized as follows (Biberauer 2011 et seq.):

(3) UG + PLD + Maximise Minimal Means (MMM) → Adult Grammar
    F1   F2    F3

The nature and assumed role of each factor will be discussed in the following sub-sections, but first a word on the "new" ingredient: Maximise Minimal Means. On the sense in which this model is 'neo-emergentist', see section 2.2 below. As already noted, I am assuming MMM to be a general cognitive bias. Importantly, it is conceived as both (i) a generally applicable learning bias harnessed by the acquirer during acquisition, and (ii) a principle of structure building, facilitating the kind of efficient computation and also, crucially, the self-diversifying property that allows human language to be the powerful tool that it is. On this latter point, I follow Abler (1989), an early proponent of the idea that at first sight very different-seeming complex systems – like those underlying chemical interactions, biological inheritance, and human language – may be constructed on the basis of common principles. More particularly, Abler argued that chemistry, genetics, and human language all share a hierarchical organisation centred on "particulate" – i.e. discrete – units, which combine in such a way that the systems in question are able to self-diversify. In other words, they are able to behave in the manner of what Abler designates Humboldt systems, namely those:

(4) a. which 'make[ ] infinite use of finite means' (Humboldt 1836: 70), and, no less importantly,
    b. whose 'synthesis creates something that is not present per se in any of the associated constituents' (Humboldt 1836: 67)

The component in (4a) is much-cited in generative work, with (4b) typically going unmentioned. Here, I would, however, like to suggest that the novel "more-than-the-sum-of-the-parts" (henceforth: more-than) products emerging from the synthesis of simpler elements are no less fundamental to our understanding of the make-up of language structure – and, in fact, also that structure's use – than the oft-mentioned infinity-generating finite means: that one would get more than just the sum of the (finite) parts is precisely what MMM would lead us to expect, as the following discussion will show.

2.1. Factor 1: Universal Grammar

Our starting hypothesis in respect of UG is that it will contribute the following to the I-language creation process:

(5) a. the basic operations: (i) feature-sensitive – as opposed to 'blind' or Simplest6 – Merge, and (ii) likewise feature-sensitive Agree,
    b. a formal feature template of some kind (e.g. [iF]/[uF]), or possibly just the notion 'formal feature, distinct from phonological and semantic feature' (i.e. [F]), to be fleshed out in ways appropriate to the substantive content of the formal features in the system.7
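Purely by way of illustration (this toy is not the paper's formalism; the Item class, feature sets, and agree helper are invented for the example), feature-sensitive Agree with an [iF]/[uF] template can be sketched as a matching-and-valuation step:

```python
# Toy illustration only (not the paper's formalism): feature-sensitive
# Agree as in (5), with [iF]/[uF] as a minimal feature template.
from dataclasses import dataclass, field

@dataclass
class Item:
    label: str
    iF: set = field(default_factory=set)   # interpretable features
    uF: set = field(default_factory=set)   # uninterpretable features

def agree(probe: Item, goal: Item) -> set:
    # Agree checks the probe's uF against the goal's matching iF
    matched = probe.uF & goal.iF
    probe.uF -= matched
    return matched

T = Item("T", uF={"phi"})
DP = Item("DP", iF={"phi"})
print(agree(T, DP), T.uF)   # {'phi'} set()
```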

There may, additionally, be a very small set of universally specified formal features (=[F]s) not derivable from the input (see section 2.2), and/or a set of universal spine-defining categorisers of the kind assumed in the work of Wiltschko (2014), Ramchand & Svenonius (2014), and Song (2019); but not the full inventory from which acquirers make a one-time selection postulated in Chomsky (2001: 10): one of this model's objectives is precisely to try to make progress on the question of what kinds of [F]s are required to characterize natural-language syntax, and also to what extent those [F]s need to derive from UG. The working hypothesis is that [F]s which cannot be acquired on the basis of (i) cues that can credibly be ascribed to the input (see section 2.2 below for discussion) and/or (ii) the manner in which these input cues are interpreted as a consequence of the interaction of Factors 1 and 3 (see section 2.3) must constitute part of the 'UG residue' in the sense of Chomsky (2007: 19).8

Importantly, the perspective on formal features here elaborates in a particular way on Chomsky's (1995) distinction between phonological ([P]), semantic ([S]), and formal features ([F]). In particular, we take [P]-[S]-based mappings to give the essence of the Saussurean arbitrariness that is familiar from the literature (see (6a) below). Human language, however, (uniquely?) goes beyond this level of arbitrariness; it additionally involves a "higher" level of arbitrariness defined by Formal ([F]-) features. As we will see, these [F]s map onto [P]- and [S]-features in systematic ways (see (7) below, and also section 2.2 for more detailed discussion). The proposal, then, is that there are degrees of arbitrariness in human language:

6. See note 3, and also i.a. Chomsky, Gallego & Ott (this volume), Richards (2017), and Preminger (2017) for discussion of Simplest Merge. See section 3.1.1 for the suggestion that Simplest Merge might not in fact be the obvious default in the context of a system that makes maximal use of minimal means.
7. Thanks to Jeroen van Craenenbroeck for discussion of this point.
8. That is, 'UG is the residue when third factor effects are abstracted. The richer the residue, the harder it will be to account for the evolution of UG, evidently.'

(6) a. lexically stored, idiosyncratic conventionalized sound-meaning mappings involving just [P]- and [S]-features, and
    b. grammatically regulated and thus more systematically conventionalised sound-meaning mappings, involving [P]-, [S]- and [F]-features.

(7) gives a rough schematization of the proposed interaction between the universally uncontroversial ('virtually conceptually necessary'; Chomsky 1993 et seq.) form ([P]) and meaning ([S]) components of language, and Chomsky's (1995) formal features ([F]). As this diagram indicates, the [F]s are assumed to piggy-back on the in part more directly accessible [P]- and/or [S]-features, a point to which we return in more detail below:

(7)

In the absence of a UG-given inventory of [F]s, and, further, of innately given parametric specifications, the question is, of course, where the seemingly recurring systematic patterns in natural-language syntax come from. In this model, the answer is: from the interaction of (i) the minimal UG outlined in this section with (ii) specific aspects of the input to be introduced in the following section and (iii) MMM, which is the focus of section 2.3. That is, natural language syntax is the more-than outcome of the interaction of Chomsky's three factors (see again (4) above).

2.2. Factor 2: PLD (the intake)

As is clear from (1), PLD has been part of the generative model of language acquisition from the outset: without exposure to specific linguistic input, no grammar will develop (cf. i.a. Crain & Pietroski 2001; Lidz & Gagliardi 2015). There has, however, never been a systematic attempt to specify precisely what the PLD actually entails in concrete terms, or why it should be credible that the child is able to draw on it. The 'limited evidence' orientation of the classic P&P era (see p. 1 above) is partly to blame here, as the 'deductive richness' expectation of classic parameters was precisely concerned with alleviating the need for acquirers to notice every regularity in their target systems. This alleviation, it is important to note, remains a goal that needs to be pursued in the current context, given the clear existence in both "stable" and developing grammars of regularities for which the input is either rare or non-existent (see section 3.1.3 for discussion of a specific case).

Insofar as the relationship between UG and the PLD is concerned, there was also, during the classic P&P era, a challenge that was quite widely acknowledged, namely the so-called Linking Problem (cf. i.a. Pinker 1984; Gervain & Mehler 2010; Ambridge, Pine & Lieven 2013; Fasanella 2014; Fasanella & Fortuny 2016; and Pearl & Sprouse in press for discussion). This revolves around the question of how the contents of UG, rich or otherwise, are to be linked up to the actual linguistic input that acquirers are exposed to. From the classic P&P perspective, the question is how acquirers actually 'recognize' the empirical facts that will allow them to set pre-specified parameters in the appropriate way (see Fodor & Sakas 2017 for overview discussion, and i.a. the work of Lightfoot, Fodor, and Westergaard for some phenomenon-specific attempts to pinpoint the nature of the input strings that would "cue" parametric settings/I-language specifications of different kinds). The same question naturally arises in the context of an impoverished UG model of the kind under consideration here. Regardless of one's assumptions about UG, then, better understanding of the notion 'acquisitionally significant input' (= 'PLD' = 'intake') is required.

In the absence of an overarching theory of why certain data matter, while other data do not (as much), generativists have left themselves open to (not entirely unjustified) accusations about the seriousness with which they approach the empirical side of their linguistic theorizing. What I would like to do in this connection is introduce and motivate what I believe to be a principled approach, which builds, on the one hand, on what we have learned about acquisition in the last four decades or so, and, on the other, on both classic structuralist and more recent Chomskyan ideas, thereby allowing us to formulate a suitably precise hypothesis about which aspects of the input seem likely to qualify as credible 'intake' and, hence, to serve as the basis for grammar construction. What follows is a highly simplified version of an approach I have been developing since 2011 in the context of the research projects and subsequent research listed in the first note. In the absence of a rich UG for an appropriately articulated learning theory to link to the input acquirers receive, we clearly have to let Factors 2 and 3 work harder than was previously the case.
And key insights from the past 30 years' language acquisition research suggest that this may indeed be feasible. Consider, for example, the research demonstrating in utero and very early post-birth sensitivity to aspects of prosody (see Gervain & Werker 2008 for an overview). In brief, it is known that the fetal auditory system is functional from around 6 months' gestation (Mehler & Dupoux 1994; Moore 2002). While fine details of speech are filtered out, less fine-grained prosodic properties, like intonational contours (e.g. a language's characteristic "tune", which is closely tied to its basic headedness properties; see below) and rhythmic properties, are detectable in utero. This fact appears to underlie newborn infants' repeatedly demonstrated ability to distinguish the maternal language from a prosodically distinct – and oppositely headed – language-type, e.g. English vs Japanese (cf. i.a. Mehler et al. 1988; Nazzi et al. 1998; Gervain et al. 2007), and also, subsequently, their strikingly early ability to establish the "basic" (i.e. lexical/bottom-of-extended-projection) head-directionality of the system they are acquiring: simplifying greatly, OV has a basic 'strong-weak' prosodic contour, while VO has a basic 'weak-strong' contour (cf. i.a. Wexler 1998 and Tsimpli 2014 on basic word order as a very early acquired property, a Very Early Parameter or VEP).

Further, various 'edge'-oriented cues allow acquirers to begin to "chunk" the input-strings in accordance with the grammar of their input-language(s) long before they have any lexical knowledge. Function items consistently differ from content items in respect of their phonological properties (they are, in general, shorter, with individual syllables being less complex, with less diphthongisation, shorter vowel duration, and diminished amplitude), their frequency (individual function words are much more frequent than individual content items), and, particularly crucially in the current context, their distribution (functional items tend to occupy the edges of syntactic domains). These properties appear to alert pre-lexical infants to the distinction between content and functional items, leaving 6-month-olds with a preference for the former (see Shi, Werker & Morgan 1999; Shi & Werker 2001; and the overview in Gervain & Werker 2008). Thereafter, more fine-grained details become available, with, for example, the distribution of consonants and vowels within already-identified linguistic chunks contributing specifically to the articulation of acquirers' knowledge of, respectively, vocabulary and associated inflectional morphology (Nespor, Peña & Mehler 2003). Importantly, then, the picture that emerges is of acquirers making the most of the cues that are accessible to them at every stage of the acquisition process, as one would expect on an MMM view (see section 2.3 below). More specifically, we see that acquirers seem initially to focus just on the linguistic systematicities that do not require any mapping between form and meaning: salient and typically recurring (and thus high-frequency) phrase-level prosodic regularities. Prosody, in other words, seems to be the minimal means which serves as the stepping-stone into grammar.
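Purely by way of illustration (this toy is not part of the proposal being outlined, and its corpus, threshold, and helper function are invented for the example), even two of the cues just mentioned, token frequency and a crude length proxy for phonological size, already separate function items from content items:

```python
# Toy illustration only (not Biberauer's model): token frequency plus a
# crude length proxy for phonological size already separate function
# items from content items in a tiny invented corpus.
from collections import Counter

corpus = ("the kitten sat on the carpet and the puppy slept on the sofa "
          "and the kitten saw the puppy").split()
freq = Counter(corpus)
size = len(corpus)

def looks_functional(word):
    # function items: short and individually very frequent
    return len(word) <= 3 and freq[word] / size > 0.08

for w in sorted(freq, key=freq.get, reverse=True):
    print(f"{w:7s} freq={freq[w]}  functional={looks_functional(w)}")
```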
Once accustomed to the initially registered patterns, acquirers appear to become "bored" by them, and we see a shift in interest to more fine-grained, high-frequency aspects of prosodic encoding – such as those underlying the difference between content and functional items – which the now "boring" initial prosodic regularity has rendered accessible to the acquirer. And so the process continues, with the acquirer's attention to linguistic properties becoming successively more finely tuned as their linguistic knowledge at each stage of the acquisition process facilitates ever more detailed access to the regularities in the input. On the MMM view, then, the acquirer's attention to the input is at least partly "steered" by what the grammatical specification of their grammar makes accessible to them. Initially having access to only a limited component of what is in the input – i.e. to a highly restricted intake – appears to allow acquirers to make efficient headway in fleshing out the complex formal system to which they are exposed on a "Less is More" basis (see Newport 1990; Elman 1993; much recent work by Charles Yang, and the discussion to follow in section 2.3).

Crucially, acquirers' hypothesised initial "sound-side" focus provides them with various kinds of distributional knowledge, which can then be fleshed out on the basis of input requiring sensitivity to both sound and meaning.9 In this connection, the distinction between the fully arbitrary form-meaning mappings that define classic Saussurean arbitrariness ((6a) above), and the still arbitrary, but more systematic form-meaning mappings that constitute grammar ((6b) above), is argued to be particularly important. More specifically, Biberauer (2017e) highlights the key relevance of so-called systematic departures from Saussurean arbitrariness – i.e. consistent departures from the arbitrary one-to-one form-meaning ([P]-[S]-based) mappings that underlie the core content lexicon – in alerting the acquirer to a domain in which the postulation of (grammatical) formal features ([F]s) would facilitate more economical – in our terms, MMM-driven – learning and knowledge representation (see also Schuler et al. 2016, and Pearl in press; and Fasanella 2014 and Fasanella & Fortuny 2016 on the so-called Chunking Procedure). The proposed [F]-signalling mappings include:

9. In the case of sign languages, this initial sensitivity would, of course, be expected to centre on relevant aspects of sign-language prosody, which has been said to involve body posture and various manual cues (timing, size; see Sandler 2010, 2012 for an overview).

(8) a. Doubling/Agreement and expletives/dummy elements, i.e. cases where there is, in a relevant sense, "too much form". In the doubling/agreement case, for example, we have two/multiple forms, the prosodically weaker one of which "echoes" (part of) the meaning of the other (cf. also Zeijlstra 2008).10 In the expletive/dummy case, we have a form with no (non-relational) meaning.11 Instead of just postulating the relevant semantic ([S]) feature in cases like these, an appropriate [F] also needs to be postulated (see (9) below for an illustration relating to the postulation of a formal [negation] feature).

    b. Systematic silence, e.g. null exponence, null arguments, null complementisers, ellipsis, etc. These are cases where there appears to be meaning which arises systematically despite the absence of form. If acquirers, as a result of their encounters with the content lexicon in particular, operate on the default assumption that meaning is paired with overtly realised form, we might expect them to "notice" circumstances where they systematically interpret meanings that don't correlate with overt form. The evidence from child acquisition suggests that certain types of nullness – notably, null arguments – are correctly produced and understood very early, by the age of 2 (Tsimpli 2014). Other types, like VP ellipsis, are likewise produced and understood surprisingly early, by the age of 3-4 (see i.a. Foley et al. 2003, and Santos 2009), although this may not be full acquisition of all aspects of the relevant phenomena (cf. i.a. Göksun et al. 2011 for discussion). The fact that null elements alternate with overt counterparts undoubtedly plays a key role in the identification of nullness-related [F]s, with cases where the overt form is necessarily emphatic – null subjects are a case in point – being acquired particularly readily.

    c. Multifunctionality, or cases where there appears to be what we might think of as system-defining homophony, i.e. a pattern in terms of which single forms can contribute multiple meanings, depending on their placement/distribution (cf. also Wiltschko 2014). Importantly, for the acquirer to diagnose a systematic departure from Saussurean arbitrariness, the grammar being acquired must feature multiple apparently homophonous forms whose distribution is key to their interpretation; isolated homophonies (as in that centring on English bank) are not predicted to trigger [F]-postulation. Systematic homophony is a striking property of many East Asian languages, for example (see Duffield 2013, 2017, Biberauer 2017a). Thus the Vietnamese modal system discussed in Duffield (op. cit.) comprises three distinct lexical items – or, more accurately, units of language in Wiltschko's (2014) sense – whose immediately preverbal, postverbal or clause-final placement determines their modal force (deontic, abilitative, epistemic, respectively). In cases of this sort, acquirers postulate an underspecified 'homophone' (or unit of language) lacking the [F](s) that determine the distribution of the element in question; these [F](s), instead, the acquirer assigns to phonologically null functional heads, which serve as Merge-sites for the relevant underspecified forms. Distributional cues, then, are key to capitalizing on this [F]-cue.

10. The fact that agreement/doubling "echoes" part of the meaning of its controller does not rule out the possibility that it may, in the context of particular structures, serve to signal meaning that might not otherwise be (so) evident. In the German (i)-(ii), suggested by an anonymous reviewer, for example, the verbal agreement serves (potentially alongside intonation in speech) to distinguish two quite different meanings:
(i) Peter hat Frauen einen Brot gebacken.
    Peter have.sg women a.acc bread bake.part
    'Peter baked a loaf of bread for women.'
(ii) Peter haben Frauen einen Brot gebacken.
    Peter have.pl women a.acc bread bake.part
    'For Peter women baked a loaf of bread.'
The claim about agreement/doubling as an [F]-cue is simply that its systematically dependent, "echoing" nature will be salient to child acquirers in a context where they are trying to establish generalisations for systematically recurring patterns. As units, agreement/doubling markers carry "derivative" meanings, but this "derivative" meaning may serve disambiguating, emphasizing, or other interpretively significant functions in certain structural configurations. That individual elements will be able to serve both neutral/unmarked and non-neutral/marked functions, depending on their structural environment, is, in fact, precisely the kind of more-than effect that the MMM model would predict.

11. The idea that expletives add "no meaning" to structures of which they are part and are, consequently, LF-replaceable (cf. Chomsky 1995) is widespread in Chomskyan syntax (see i.a. Vikner 1995; Svenonius 2002 for discussion). That even the most familiar English-type "pure" expletives (Lasnik 1995) have interpretive consequences is, however, also clear: English there, for example, consistently blocks wide-scope readings (Milsark 1974; Bobaljik 2002). To the extent that they are primarily grammatical rather than content elements which contribute to interpretation by blocking otherwise available, movement-derived meanings, expletives may thus better be classified as instantiations of (8d)-type departures from Saussurean arbitrariness. If one considers expletives beyond English – e.g. Icelandic topic expletives, and Basque, Korean and Sardinian verbal expletives, all of which play a role in information-structurally marked structures – this latter classification in fact seems more appropriate. It is also worth noting that expletive elements generally seem to contribute to meaning principally as a consequence of the relations they enter into with the contentful components of the structures they feature in, i.e. their interpretive contributions depend less on the independent content they contribute to the wholes of which they become a part, and more on the interpretive contrasts they facilitate with otherwise required, but, in the structures in which they occur, unrealised derivational operations – obligatory substantive subject-, topic-, or verb-raising in the cases mentioned here.

    d. Movement, i.e. assuming Chomsky's (2000) notion of 'duality of semantics' – roughly, that human language expresses both thematic and discourse/scopal meaning – we can see that movement will often result in "extra" meaning. This would, for example, be true in topicalization- and focus-fronting cases.12 Also relevant here, however, is what we might think of as 'higher-level duality of patterning', deriving from the contrast between "neutral/basic" and "marked" orders. Just like Hockettian duality of patterning (Hockett 1958) assumes two levels of structuring – meaningless phonemes which combine to create meaningful phoneme combinations13 – we might think of syntax as involving "meaningless" structuring that contrasts with meaningful structuring (see also Fortuny 2010). More specifically, consider on the one hand meaningless "basic" word-order choices like OV vs VO – which are, significantly, known to be acquired early (cf. Tsimpli 2014 for overview discussion) – and meaningless obligatory filling choices like V's spellout position or the need to fill Spec-TP or Spec-CP; on the other hand, we would have meaningful optional movements like T-to-C in English, or the nature of the XP that raises to Spec-CP. Here, the meaningless conventions require fixing – just like the content of the phoneme inventory does – whereafter they can serve as the basis or reference point for further, potentially meaningful ordering patterns, which contrast with the "basic" one.14

    e. A particular kind of recursion, namely that which produces the structured repetition patterns that underlie productive compounding(-like) patterns (e.g. noun-noun or verb-verb compounding, verb-serializing, and verb clustering; cf. much work by Tom Roeper, William Snyder and Ana Pérez-Leroux, i.a. Roeper 2011; Roeper & Snyder 2004, 2005; Pérez-Leroux, Castilla-Earls, Bejar & Massam 2012; Pérez-Leroux, Peterson, Bejar, Castilla-Earls, Massam & Roberge 2018). Acquirers can be expected to "notice" this kind of recursion – thus rendering it a credible [F]-trigger – on account of their keenness to postulate memory-saving generalizations, i.e. formal rules (see Roeper & Snyder 2005: 158; and also Yang 2016; Schuler et al. 2016). More fundamentally, the requirement that van Riemsdijk (2008) and Leivada (2017) label Identity Avoidance and Richards (2010) Distinctness drives acquirers in the direction of [F]-postulation wherever apparently identical elements surface adjacent to each other within the same domain. This drive plausibly reflects a very basic heuristic that children are more famously known to employ in word learning, namely the Principle of Contrast (Clark 1993). From the current perspective, the recursion at stake here is just MMM driving the acquirer to make use of the Principle of Contrast not just in the core lexical domain (i.e. in relation to (6a) above), but also in grammar structuring (i.e. in relation to (6b) above too).15,16

12. Importantly, though, the fact that a particular movement operation is interpreted as topicalisation or focalisation does not automatically result in the postulation of a [topic]- or [focus]-feature. The third-factor bias to maximise minimal means, to be discussed in the next section, will, in the first instance, drive the acquirer to seek to recycle an already-postulated [F]. In this connection, the growing number of analyses of topicalisation and focalisation phenomena that diagnose [F]s like [person] (see i.a. Richards 2008; Leffel, Simik & Wierzba 2013) and [case] (see i.a. Pesetsky 2014; Levin 2016, and (18b) in the main text) as the syntactically active [F]s being manipulated by Merge and Agree is precisely what the MMM approach would predict (see also section 3.1.1).

13. Duality of patterning rather clearly seems to instantiate the second aspect of language's Humboldtian character (cf. (4b) above); and the same is true for the "higher-level" variety proposed here.

14. Having both levels of duality of patterning allows the system to maximise the contribution of both the Lexical Items – i.e. the elements (containing the features) that are manipulated by the computational system – and that system's structure-building operations, (External and Internal) Merge, as MMM would lead us to expect.

A word on high-frequency recurring collocation, i.e. unduly frequent forms with a consistent, relatively minimal meaning, and a consistent position relative to contentful lexical items, is also in order here. This case boils down to the distinction between content/lexical and function words, which we know acquirers to be sensitive to from the very earliest stages of acquisition (see again the discussion of Shi, Werker & Morgan 1999; Shi & Werker 2001 above). As noted above, function words are edge-elements, located at the left or right boundary of their XP. [F]s are assigned directly to these elements in cases where they exhibit regular, non-homophony-type departures from Saussurean arbitrariness, e.g. where they trigger agreement, or movement, or ellipsis or nullness of some other kind, or recursion, or where their presence is obligatory wherever a substantive element of some kind is present (French determiners would be a case in point). As discussed above, [F]s are not assigned directly to (8c)-type homophonous elements: doing so would create an unwieldy, homophone-rich lexicon which fails to register many systematic generalisations. Importantly, then, functional elements per se are not necessarily ascribed [F]s, leaving open the possibility of (largely) [F]-less auxiliaries, determiners, etc., in some languages, i.e. of less grammaticalised functional elements.

15. In fact, in emergentist approaches to phonology such as that of the so-called Toronto School (see i.a. Hall 2007; Dresher 2009, 2014), the Principle of Contrast is also assumed to be operative in the structuring of phonological systems: in accordance with Hall's Contrastivist Hypothesis, phonological features are only postulated if they account for a phonological contrast in the system being acquired. To the extent that all Identity Avoidance phenomena can be ascribed to the workings of the basic Principle of Contrast heuristic, the diverse Obligatory Contour Principle (OCP)/haplology phenomena that have been identified in phonology and morphosyntax can all be understood as a reflex of this same heuristic. Formally identical elements may not Merge with each other and thus surface adjacent to each other in the same domain any more than identical phonological units may do so.

16. Cf. also D'Alessandro & van Oostendorp (2018) on so-called Magnetic Grammar. That we would see the kinds of repulsion and attraction effects highlighted in this work – and also properties like Relativized Minimality – follows quite directly from the approach outlined here: in systems that maximize minimal means, we expect the number of features and the composite objects constructed from them to be limited in such a way that complete or partial similarity- and difference-based relations like attraction, repulsion, and intervention effects become calculable and, thus, play a role in regulating language structure. In a system with too many distinct [F]s, the observed interactions could not be modelled as falling out from simple similarity and difference "calculations".

This seems useful when we compare "particle"-type auxiliaries and determiners with "full" counterparts, either crosslinguistically or within a single language (see Biberauer 2017a for extensive discussion), and also when we think about the process via which functional elements become grammaticalised (an [F] not previously associated with a content item needs to be ascribed to it). Taking (8a-e) together, then, the driving intuition is that [F]s are postulated if they can be seen to regulate some form of systematic contrast which cannot be explained by appealing only to semantic or phonological considerations. Consider the case of negation. (9)-(11) illustrate three types of systematic departure from Saussurean arbitrariness that the approach outlined here predicts to cue the presence of a formal feature ([F]); here, [negation]:

(9) Ons is nie laat nie. [Afrikaans]
    us is not late neg
    'We are not late.'

(10) a. [With no job] would she be happy. [English]
        (neutral order: She would be happy with no job.)
     b. [Never in my life] did I expect that to happen!
        (neutral order: I never in my life expected that to happen.)

(11) a. a gʊa atɨ. [Mbili, Grassfields Bantu, Niger-Congo; Cameroon]
        3sg fell tree
        'He fells a tree.' (affirmative: VO)
     b. a ka atɨ gʊa.
        3sg not tree fell
        'He does not fell a tree.' (Ayuninjam 1998: 339, via Dryer 2009)

In (9), two negative markers are required to express a single negation, a regular pattern in Afrikaans, which acquirers are thus expected to pick up on;17 since the doubling is specifically keyed to negation, the formal feature [negation] is postulated. Property-type (8a) thus cues the presence of [negation] here. (10), in turn, presents two structures in which a negative phrase has been fronted, triggering Verb Second, a non-neutral word-order pattern in modern English. The contrast between the neutral SVO-structures and these V2-fronting structures requires reference to the formal feature [negation] – and possibly also [focus], given the more general nature of modern English's V2 profile, a point we leave aside here. Interpretively significant optional movement – one instantiation of property-type (8d) – thus cues the presence of [negation] in this case. Finally, (11) demonstrates the consistent word-order difference between affirmative and negative clauses in Mbili, a case of "basic" word order facts pointing to the grammatical relevance of negation, i.e. the other instantiation of property-type (8d) signalling the need to postulate [negation].

17. Since this negative doubling is necessarily expressed in every negative imperative structure (see (i)), the child will receive considerable amounts of input signalling the formal (i.e. grammaticalised) nature of negation.
(i) Moenie jou tas vergeet nie!
    must.not your case forget neg
    'Don't forget your suitcase!'
More generally, the formal features cued in imperatives seem to us good candidates for 'early' acquisition in the sense of Wexler (1998) and Tsimpli (2014); see also main text.

As already noted, it appears to be the case that [P]-features alone – notably prosodic properties – serve as the initial stepping-stone into grammar. With basic, purely P-mediated regularities in place, the child can then proceed to draw on the cues provided by (8a-e)-type phenomena. Worth noting in the latter connection is the seeming significance of the cues provided by certain high-frequency, relatively simple, but strikingly syntax-rich structures, notably questions and imperatives (Biberauer 2015, 2017c; Biberauer, Bockmühl, Herrmann & Shah 2017). The current hypothesis is that [F]s cued in these structures will play a key role in structuring the earliest child grammars. As we will see in section 3.1.1 below, this also leads to the prediction that these [F]s will be the target of different kinds of 'recycling'. For present purposes, the key point is that the approach outlined here does suggest both an initial 'way in' for the postulation of [F]s – the P(honological)-route – and also a potential basis on which acquirers may initially move beyond purely [P]-mediated [F]s to those cued by systematic departures of the kind in (8). Evidently, the systematic morphosyntactic and morphosemantic contrasts that an acquirer encounters will vary by language; hence both the language-specific 'content' of what it means to "be" a category of a given type and the question of which features are grammaticalised (i.e. [F]s) are, on the account proposed here, expected to vary (cf. also i.a. Haspelmath 2010; Ritter & Wiltschko 2009, 2014; Wiltschko 2014; and Chung 2012 on this). That grammars will always be characterized in terms of the distribution of formal features (cf. Baker's so-called Borer-Chomsky Conjecture) and the way in which these regulate the operations of Merge and Agree, however, crucially distinguishes the present approach from "standard" emergentist approaches, e.g. those in the Construction Grammar tradition. We therefore designate the current approach neo-emergentist. Since both the [F]s and the categories they define will be emergent, we do need to understand how it is that the current proposal does not just predict rampant and unconstrained variation. Having considered the respective contributions of Factors 1 and 2, it is time to turn to Factor 3: Maximise Minimal Means (MMM).

2.3. Factor 3: MMM
MMM is, as noted at the outset, a general cognitive bias, which I assume to play a key role in steering acquisition. In the linguistic context, I assume it to have – possibly among others – the language-specific manifestations in (12-13):

(12) Feature Economy (FE): postulate as few formal features as possible to account for the input (=intake) [generalised from Roberts & Roussou 2003]

(13)  Input Generalisation (IG): maximise already-postulated features [generalised from Roberts 2007]

Together, FE and IG result in a learning pattern/path (hierarchy) with the following general "shape" (cf. also Biberauer & Roberts 2016, 2017):

(14) The NONE>ALL>SOME learning path

Here, the idea is that (14) models the interaction between the three factors in (3) as follows: the initial NO represents an acquirer who does not pick up on a systematic departure from Saussurean arbitrariness in the input; they will therefore not pose the 'F present?' question. The initial NO thus needs to be interpreted as a default which the comparatively oriented linguist can juxtapose with the initial YES, the answer that necessarily results when some form of triggering data (see again (8) above) leads to this question being posed. The initial NO (or the NONE-system), then, respects both FE and IG; it literally requires the acquirer to do nothing. The initial YES (or the ALL-system), by contrast, necessarily violates FE – as all [F]-postulation, and thus (further) grammar construction, will – but it respects IG, as the newly identified [F] is assumed to be present on all heads in the relevant domain (all heads in the case of headedness; all argument-licensing heads in the case of null-argument phenomena; all verbal heads in the case of finiteness marking, etc.). Should it emerge that the postulated [F] is not sufficient to delineate the domain over which the property in question is distributed, a further [F] will be postulated, thus producing a SOME-system (at later acquisition stages, this [F] may already be part of the system; see section 3.1.3 for some discussion illustrating this case). If the relevant regularity is still not suitably demarcated, a further [F] is postulated, as before, producing another SOME-system. And so on until the relevant regularity has been appropriately characterized.18

18. The proposed learning path thus progresses from super- to subset, which might at first sight suggest a 'superset trap' problem. Since the supersets in play here plausibly follow from the acquirer's initial 'ignorance', however, with subsets being postulated precisely because it is clear that the existing superset grammar is deficient, the classic Subset Principle reasoning does not apply (see also Branigan 2012 on this). The superset 'grammars' postulated on the basis of (14) are always defeasible by the input. Independently of this, see i.a. Fodor & Sakas (2005, 2017) and Biberauer & Roberts (2009) for critical discussion of the extent to which 'grammar size' can in fact be meaningfully translated into super- and subset relations: implementation of something like a Subset Principle in the acquisition context poses numerous non-trivial problems.
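By way of illustration only, the learning dynamics behind (14) can be rendered as a simple procedure in which FE and IG jointly drive [F]-postulation. Everything concrete in the sketch below – heads as labels, a boolean record of a systematic departure, the way the class is narrowed – is an expository assumption, not part of the formal proposal:

```python
# A minimal sketch of the NONE>ALL>SOME learning path in (14), driven by
# Feature Economy (FE) and Input Generalisation (IG). Representations are
# toy assumptions: heads are labels, and 'shows_property' records whether
# a systematic departure from Saussurean arbitrariness is detected.

def learn(heads, shows_property):
    target = {h for h in heads if shows_property[h]}
    # NONE: the default - no departure detected, so no [F] is postulated
    # (respects both FE and IG; the acquirer literally does nothing).
    if not target:
        return {"system": "NONE", "features": []}
    # ALL: a departure was detected, so one [F] is postulated and, per IG,
    # generalised to every head in the currently relevant domain.
    features, domain = ["F1"], set(heads)
    # SOME: postulate further [F]s (violating FE only as far as the input
    # forces) until the domain is exactly the class showing the property.
    while domain != target:
        features.append(f"F{len(features) + 1}")
        domain &= target  # each newly postulated [F] narrows the class
    label = "ALL" if target == set(heads) else "SOME"
    return {"system": label, "features": features}

heads = ["C", "T", "v", "V"]
print(learn(heads, {"C": True, "T": True, "v": False, "V": False}))
# -> {'system': 'SOME', 'features': ['F1', 'F2']}
```

The point of the sketch is simply that each [F] is postulated only under pressure from the input (FE), and each newly postulated [F] is generalised as far as possible (IG).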

Very importantly, the assumption that [F]-postulation by acquirers is regulated by MMM means that [F]s already in the system will always, where possible, serve as the point of departure for further refinements of the existing grammar (see section 3.1.1 below on [F]-'recycling'). MMM will also tend to produce "nested" natural classes, with different (linguistic) phenomena being sensitive to more or less specific [F]-combinations (see the immediately following discussion, and also section 3.1.2 below). From an acquisition perspective, this also has the consequence that the ALL>SOME component of the NONE>ALL>SOME-defined learning path must be understood in relative terms. More specifically, as soon as an [F] is postulated to constrain the distribution of a grammatical regularity, it effectively becomes, for the acquirer, an ALL-option in relation to the class of heads under consideration at that point in the acquisition process; what "counts" as ALL vs SOME thus needs to be interpreted dynamically from the perspective of the language-acquiring child. From a typological perspective – i.e. the type of perspective a comparatively oriented linguist might hold – given the existence of languages employing both more and less featurally constrained versions of "the same" phenomenon (head directionality, null subjects, verb-raising, etc.), the NONE>ALL>SOME perspective remains useful in more fixed form (though see section 3.1.2 below for further discussion). And the same is true for the acquisitionist, who may find it useful to think of earlier and later stages of an acquirer's grammar in NONE>ALL>SOME terms.

If MMM and, more specifically, the NONE>ALL>SOME learning path it gives rise to are to be credibly conceived of as third-factor-related, there would need to be non-syntactic evidence favouring their postulation. Significantly, there does appear to be evidence of precisely this kind. Dresher (2009), for example, postulates the Successive Division Algorithm (SDA), which approaches the acquisition of phonology, and thus, by extension, phonological typology in NONE>ALL>SOME terms. The SDA is given in (15):

(15) a. Begin with no feature specifications: assume all sounds are allophones of a single undifferentiated phoneme.
     b. If the set is found to consist of more than one contrasting member, select a feature and divide the set into as many subsets as the feature allows for.
     c. Repeat step (b) in each subset: keep dividing up the inventory into sets, applying successive features in turn, until every set has only one member.
(Dresher 2009: 16)
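Since the SDA is stated procedurally, it translates directly into a short recursive sketch. The toy inventory, the feature names, and the fixed division order below are assumptions made for illustration; the SDA itself leaves the choice and order of dividing features open:

```python
# Illustrative sketch of Dresher's (2009) Successive Division Algorithm
# over a toy three-vowel inventory. The features and their order are
# assumptions for this example only.

def sda(inventory, feature_order, has_feature):
    """Recursively divide `inventory` until every set has one member.
    has_feature(sound, feature) -> bool decides set membership."""
    if len(inventory) <= 1 or not feature_order:
        return inventory  # a fully individuated (or undividable) set
    feature, rest = feature_order[0], feature_order[1:]
    marked = [s for s in inventory if has_feature(s, feature)]
    unmarked = [s for s in inventory if not has_feature(s, feature)]
    # Only keep the division if it is actually contrastive in this set.
    if marked and unmarked:
        return {f"[{feature}]": sda(marked, rest, has_feature),
                f"(non-{feature})": sda(unmarked, rest, has_feature)}
    return sda(inventory, rest, has_feature)

# A (16a)-style order: height before rounding.
vowels = ["i", "u", "a"]
specs = {("i", "high"): True, ("u", "high"): True, ("a", "high"): False,
         ("u", "round"): True, ("i", "round"): False, ("a", "round"): False}
tree_a = sda(vowels, ["high", "round"], lambda s, f: specs.get((s, f), False))
# {'[high]': {'[round]': ['u'], '(non-round)': ['i']}, '(non-high)': ['a']}
```

Running the same function with the order ["round", "high"] yields the (16b)-style structuring, in which /i/ and /a/ form the (non-round) class to the exclusion of /u/.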

Importantly, the basis for the successive divisions is not dictated by UG; these divisions may therefore target different features in different systems, producing phonological systems with natural classes that are not structured in the same way. Consider (16) by way of example:

(16) NONE>ALL>SOME in phonology (diagram from Dresher 2014: 167)

(Marked values are indicated as [F] and unmarked values as (non-F); for expository purposes, we abstract away from the details of Dresher's markedness assumptions.)

Here we see that the three vowels /a/, /i/, and /u/ fall into different natural classes, depending on the way in which the vowel space that they occupy is divided. In both cases, the feature [syllabic] must initially be postulated to distinguish vowels from consonants: this is the basis for the ALL-division, which is universal, given that the sound spectrum does not have alternative "natural joints" (see Martí 2015). A range of SOME-division options follow, however. In the case of (16a), a first further distinction is drawn on the basis of the distinctive feature of vowel height ([high] vs (non-high) for Dresher), resulting in a natural-class distinction between high and non-high vowels. Phonological processes in this system (e.g. vowel harmony) will thus reference this high/non-high distinction, with /i/ and /u/ systematically exhibiting behaviour not shown by /a/. The rounding feature then serves to individuate the [high] vowels. In (16b), by contrast, the vowel space is initially sub-divided on the basis of roundness, with the height division being secondary, i.e. the basis for ultimate full individuation. In this case, phonological processes will therefore target /i/ together with /a/, excluding /u/. In each case, the vowel's systematically contrastive behaviour will alert acquirers to the nature of the successive divisions that are required – or, in our terms, to the form that the full NONE>ALL>SOME learning pathway should take. Strikingly, existing phoneme acquisition studies focusing on English and Dutch would appear to support the kind of learning pathways predicted by this approach (see i.a. Fikkert 1994; Stokes, Klee, Carson & Carson 2005; and also Dresher 2014 and Mobbs 2015 for discussion).

The work of Dany Jaspers (cf. i.a. Jaspers 2013; Seuren & Jaspers 2014) independently postulates a NONE>ALL>SOME algorithm in the domain of logico-cognitive concept formation. Consider (17) below in this connection:

(17) NONE>ALL>SOME in the domain of the propositional calculus operators (following Jaspers 2013)

Here we see that successive divisions of the logical truth space necessarily begin with a separation of truth from falsity, i.e. Step 1 in (17b). As in the case of the vowel-space, further sub-division is then open to alternative possibilities: either we distinguish the case where something, possibly everything, is true from that where everything is true – Step 2 in (17c) – or we distinguish the case where everything is true from that where something, but not everything, is true – Step 2' in (17d). As we will see in section 3.1.2 below, non-initial (i.e. SOME-) divisions more generally seem to open up a number of alternative possibilities at the same level of division ('subcategorisation').

Strikingly, Jaspers (2012) also shows how (the development of) human colour perception appears to follow the kind of successive division path MMM-driven development would predict. More generally, there is evidence from (developmental) cognitive psychology showing that object classification seems to develop on the basis of 'hierarchical inclusiveness', with superordinate/more inclusive/less specified categories being acquired before subordinate/less inclusive/more specified categories (cf. i.a. Bornstein & Arterberry 2010). Various child language acquisition phenomena also point in this direction. The "shadow" noun-class markers that have been said to precede fully specified noun-class markers in the acquisition of Bantu languages (Demuth 1994, 2003), the way in which free anaphors develop in French (van Kampen 2004; cf. also Lleó 1998, 2001; and Lleó & Demuth 1999 for Spanish), and the 'root infinitive' phenomenon (cf. Guasti 2017 for an overview) are all cases in point. And in the parsing domain, good enough parsing, in terms of which humans preferentially operate with a shallow parse until it becomes clear that deeper parsing is required (Ferreira & Patson 2007), also looks like a reflex of MMM. The same is true for the evidence pointing to the use of fast and frugal heuristics in decision-making, i.e. Daniel Kahneman's (2001) fast thinking (see Gigerenzer & Todd 2000 for the seminal fast and frugal heuristics paper), and the picture that seems to be emerging from the study of writing systems: the majority of characters in writing systems are made of three strokes or less (Dehaene 2007), with cardinal orientations (horizontal and vertical) being vastly over-represented in the world's writing systems, compared to oblique ones, as one might expect, given humans' superior ability to compute the former (orientational anisotropy; Morin 2018).19 We will discuss further linguistic domains in which NONE>ALL>SOME seems to emerge in section 3 below. With the main components of the model in place, we are now in a position to consider some of its predictions.

3. Novel predictions of the model We will consider predictions of two types here: those relating to the general formal properties that we expect to find in natural-language systems, on the one hand, and those relating to predicted patterns in what I will call ‘Going beyond the input’ scenarios on the other (see i.a. Biberauer 2016, 2017b for more detailed discussion of a wider range of predictions).

3.1. General formal properties

3.1.1. Recycling
Given MMM, we expect what we might generally think of as 'recycling' effects to be a distinctive property of natural-language systems. This does indeed appear to be correct. Consider, for example:

(18) a. the pervasiveness of grammaticalisation phenomena in natural language, and the way in which 'pragmaticalisation' (broadly, speaker-hearer-oriented grammaticalization) also draws on existing elements and features in the system;
     b. the way in which certain features serve multiple functions in the same grammar (e.g. case stacking, where case-marking marks not just thematic and/or grammatical relations, but also discourse prominence; or the numerous uses to which agreement can be put, sometimes within the same language, Archi seemingly being the extreme case here; see Bond, Corbett, Chumakina & Brown 2016);
     c. the "specialised" use of C(onsonant) and V(owel), stress, and basic linearization in acquiring the lexicon and morphosyntactic regularities (see i.a. Nespor, Peña & Mehler 2003; and Gervain & Mehler for overview discussion); and

19. Thanks to Daniel Harbour for discussion.

     d. the various ways in which the earliest-acquired categories – centring on a basic predicate-/"archi"-V versus argument-/"archi"-N-type category (cf. also Bouchard 2013; Douglas 2018; and Song 2019)20 – are put to "extended" use in grammar structuring. Consider, for example, the varied evidence pointing to the existence of extended projections (Grimshaw 1991 et seq.), which are typically thought to be defined with reference to basic lexical categorial features (e.g. V, N, P, etc.); on the present account, these basic features may usefully be thought of in the kind of not fully fleshed-out "archi" terms discussed in Douglas (2018) and Song (2019). As we will see in section 3.1.2, extended-projection membership imposes structural constraints of different kinds. Another case in point is the ubiquity of verbalization and nominalization phenomena, where the latter seems to serve both a general "subordinating" function (e.g. in subordination and embedding structures; cf. Franco 2012 for discussion and references, and Huddleston 1984: 379-380 for the distinction between these two), and – the opposite – a foregrounding purpose (as in VP topicalization/focus). Among finiteness-marking languages, we also see many languages which harness the distribution and inflectional marking of the verb to signal notions that can be lexically expressed too, e.g. declarative vs interrogative marking, and main- vs subordinate-clause status, as in (non-English) Germanic; or realis vs irrealis, as in some Romance. V also often acts as a reference point for focus (see recent work by Kriszta Szendrői and Fatima Hamlaoui, and Vieri Samek-Lodovici, and the more general existence of immediately-before- and immediately-after-verb focus systems – see Gibson, Kombarou, Marten & van der Wal 2017), or for the A'-domain (as in V2 systems, and Hungarian – cf. Kiss 2008, who distinguishes a "nonconfigurational" post-V zone from a configurational pre-V zone; a similar, apparently "configurationality"-distinguishing pre- and post-V zone is found in Kiowa – Adger, Harbour & Watkins 2009).

Importantly, the MMM logic also suggests a perspective in terms of which Simplest Merge, conceived of as an [F]-blind operation, may not in fact be the simplest or 'most minimal' option (see note 3). In a system which maximizes minimal means, in which [F]s already serve as the basis on which the UG-given Agree operation operates, one might expect [F]s also to regulate Merge: if the computational system can "see" these entities for the purposes of one operation, it requires a stipulation to render them "invisible" for the purposes of the other putatively universally given computational operation. If that is correct, the problems associated with 'free generation' can be eliminated (see also Preminger 2018 on this).

20. Douglas (2018: 28, note 22), working within an MMM perspective, helpfully characterises the notions 'archi-V' and 'archi-N' as follows: 'We must think of the N/V distinction as distinguishing nominal features and verbal features (or nominal features and non-nominal features), which will eventually be successively subdivided into the finer-grained categories of the adult grammar (including [N] and [V]). The N/V distinction thus involves archi-features (by analogy with archi-phonemes): archi-N (N) and archi-V (V).'

3.1.2. The shape of grammatical (parametric) variation and its connection to the course of acquisition
The NONE>ALL>SOME learning path also leads us to expect "the same" phenomenon to surface across languages in different-sized versions. (19) schematises one way of thinking about this, with (20) attempting a rough characterization of what is at stake (cf. also Biberauer & Roberts 2016, 2017; Biberauer 2018; Roberts 2019):21

(19)

(20) For a given value vi of a parametrically variant feature F:
     a. Macroparameters: all functional heads of the relevant type share vi;
     b. Mesoparameters: all functional heads of a given naturally definable class, e.g. [+V], share vi;
     c. Microparameters: a small subclass of functional heads (e.g. modal auxiliaries) shows vi;
     d. Nanoparameters: one or more individual lexical items is/are specified for vi.
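A toy classifier can make the size distinctions in (20) concrete. The head labels, the set-based rendering of natural classes, and the idea that size can simply be read off the extension of the class sharing vi are all expository assumptions; as note 21 stresses, the sizes themselves are relative notions:

```python
# Toy classifier for the parameter "sizes" in (20). All representations
# are invented for illustration; in the model itself, what counts as a
# class is relative to the acquirer's current featural system.

def parameter_size(heads_with_v, all_heads, natural_classes):
    """heads_with_v: the set of heads sharing value v_i of feature F."""
    if heads_with_v == all_heads:
        return "macroparameter"     # (20a): all heads of the relevant type
    if heads_with_v in natural_classes:
        return "mesoparameter"      # (20b): a naturally definable class
    if len(heads_with_v) > 1:
        return "microparameter"     # (20c): a small subclass of heads
    return "nanoparameter"          # (20d): individual lexical items

all_heads = frozenset({"C", "T", "v", "V", "D", "N"})
verbal = frozenset({"C", "T", "v", "V"})            # e.g. the [+V] class
print(parameter_size(verbal, all_heads, {verbal}))  # -> mesoparameter
```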

Taking a specific example, the fact that the types of head-final systems that can be identified crosslinguistically can be (partially) distinguished along the lines in (21) thus fits with the expectations of the model (see i.a. Cinque 2005, 2017; Biberauer 2008; Biberauer & Sheehan 2013; Biberauer 2017d, 2018; Roberts 2019 for discussion):

(21) a. "rigid" head-finality: Japanese, Malayalam, etc.
     b. clausal head-finality, nominal head-initiality, and vice versa: Chinese, Thai, Gungbe, etc.
     c. "leaking" OV of different kinds, e.g. West Germanic

21. Importantly, the proposed parameter types must be thought of in relative rather than absolute terms, i.e. a different approach to that assumed during the classic P&P era, where the Head Parameter, for example, constituted a macroparameter; the null-subject parameter a mesoparameter, and so on.

d. OVX, where O is the direct object (Hawkins 2009)

     e. O[F]VX, where O[F] is a restricted object-type (e.g. Neg, Focused, Specific, etc.)

Here it is worth highlighting the SOME-options reflected in (21), i.e. the systems for which the original head-initial/-final decision did not go all in one or other direction (see Biberauer & Roberts 2017 for simplified discussion, and Biberauer 2017b for more detailed consideration). That uniformly head-initial/-final clausal or nominal structures should occur once again reflects the expectation that early-acquired "archi"-V and N will play a key structuring role in natural-language grammars (cf. (18d) above). Importantly, we can, from a typological perspective, think of "archi"-V and N as fulfilling parallel roles in structuring different grammars (just as [high] and [round] did in (16a) and (16b) above; cf. also Wiltschko 2014 on the distinct, but formally parallel choice of one of [tense], [person] and [location] as the designated substantive content for INFL). More specialised SOME-systems will require the postulation of more [F]s in order to constrain the domain of head-finality. Here again, different [F]s may serve parallel structuring roles, with [aspect] potentially defining a domain of head-finality in one system, and [tense] in another, for example. As [F]-postulation is assumed to be driven by regularities in the input (section 2.2), and as there is no innately specified learning path, there is no expectation that these [F]s will be "tested" in a fixed sequence of any kind (pace the parameter hierarchies in i.a. Biberauer, Holmberg, Roberts & Sheehan 2014, and Roberts 2019). Instead, a linguist's (typologically oriented) amalgamated representation of the potential learning paths would indicate that these SOME-options are typologically equivalent, i.e. choices made at the same stage of the learning path. Typologically equivalent SOME-choices, which are not successively considered in the acquisition context, are thus not typically in a featural superset/subset relationship; let us call these SOMEEquivalent choices. By contrast, SOME-choices that are successively considered during acquisition are in a featural superset/subset relationship; let us call these SOMESubset choices. (22) illustrates the difference with reference to the typology of head-final systems presented in (21) above:

(22) a. "rigid" head-finality: Japanese, Malayalam, etc. [ALL]
     b. clausal head-finality, nominal head-initiality, and vice versa: Chinese, Thai, Gungbe, etc. [SOMEEquivalent]
     c. "leaking" OV of different kinds, e.g. West Germanic [SOMESubset]
     d. OVX, where O is the direct object [SOMESubset]
     e. O[F]VX, where O[F] is a restricted object-type (e.g. Neg, Focused, Specific, etc.) [SOMESubset]

The possibility of thinking about typological equivalence in this in part acquisition-oriented way is a new one, which arises directly from the way the present model is constructed.

A further new possibility is highlighted in Biberauer & Roberts (2012, 2016, 2017). These works point out that the "size"-based parametric approach set out in (19-20) leads to novel diachronic predictions. The expectation would, for example, be that "larger" (more macro) choices which require fewer [F]s exhibit greater stability over time. And this seems to be true: rigid head-finality, for example, seems very stable, whereas West-Germanic-style OV is far less so. Furthermore, we predict that change in the direction of "smaller" (more micro) choices will exhibit a particular character, namely one which references [F]s that are already present in the system. Again, this seems to be correct. If we consider the case of OV-loss/restriction, it seems that what we observe is a process along the lines of (23) (Biberauer & Roberts 2008 show that OV-loss in the history of English appears to have followed the kind of "cascading" pathway sketched out in (23b,c)):

(23) (simplified) schema of potential changes in the nature of the preverbal position in an initially "rigidly" head-final OV system:22
     a. all Os > all non-clausal complements (DP, PP, etc.)
     b. all non-clausal complements (DP, PP, etc.) > all DPs (nominal objects only)
     c. all DPs (nominal objects only) > specific sub-types of DP (e.g. DP[negative], DP[focus], DP[topic]) > pronominal object > clitic pronominal object, etc.

Alternatively, it could also be that the OV-constraining factor is not nominal-oriented, as in (23), but clause-oriented, with the restriction referencing [tense], [aspect], [finiteness], etc. In this case, we would expect different diachronic possibilities, which need also not all go in the same direction (i.e. towards OV loss and VO gain; VO>OV is also diachronically attested, and the MMM system allows for changes in both directions, depending on how key aspects of the rest of the system are configured).

A key feature of the NONE>ALL>SOME learning paths is that they lead us to expect natural classes constructed on the basis of "nested" featural specifications. Thinking of the acquisition of syntactic categories, for example, we might expect something like (24) rather than the kind of bottom-up approach to the acquisition of syntactic structure that was popular in the classic P&P era (cf. i.a. Radford's 1990 Small Clause Hypothesis; Rizzi's 1993/1994 Truncation model; the ATOM model of Schütze & Wexler 1996; see Biberauer & Roberts 2015 for discussion of (24)):

22. Intensive contact seems to be necessary to trigger a change from a rigidly head-final system to something less head-final; and it also seems necessary to introduce a head-initial nominal/D so that CPs can begin to undergo extraposition (see Biberauer & Sheehan 2012 on this).

(24)

In terms of (24), we expect acquirers to want to utilize the (in part prosodically mediated; see section 2.2) [F] facilitating the initial "archi"-V vs N distinction (here: [±V]) as the basis for further category distinctions. Taking seriously the significance of interrogative and imperative structures in the input (see again Biberauer 2015, 2017c), and also the observed fact that English-acquiring children appear to be confident about "basic" interrogative properties like wh-movement before they have grasped the workings of the auxiliary system or, indeed, all the specifics of the C-system (cf. i.a. Thornton 1995 for discussion and references), there seems to be good motivation for proposing that the (clause-typing-related) category C may define the second "archi"-V-based ([+V]) category-type acquired by children. In phase-based systems (Chomsky 2001 et seq.), this head instantiates a phase-head, whose properties further determine the properties of T (cf. again Chomsky 2001); in the present approach, T's properties are expected to build on and further elaborate – by means of newly postulated/harnessed [F]s – those already present on C. In other words, the connection between C and T is entirely expected. Similar reasoning can be applied in relation to v and one or more associated non-phase heads, and, likewise, to the corresponding heads in the nominal domain.

What is important for our purposes here is that the NONE>ALL>SOME learning path in (14) assumes an acquirer keen to generalize over as large a domain as possible to create formally defined domains sharing a particular property. This works against the kind of incremental upwards learning (e.g. V>v>Asp>T>C) often assumed, suggesting instead that acquirers will successively postulate initially underspecified elements which can then be fleshed out to create sub-types of different kinds, each building upon the [F]s of the initially underspecified category, which, in turn, builds on that of earlier underspecified categories. This leads to the creation of monotonic natural classes, meaning that we expect to find considerable evidence of monotonicity in crosslinguistic variation. And this expectation does appear to be borne out. Consider, for example, the Final-over-Final Condition23 (FOFC; see i.a. Biberauer, Holmberg & Roberts 2014; Sheehan 2013; Sheehan, Biberauer, Roberts & Holmberg 2017). FOFC is stated in (25):

23. Note that, as of 2017, the C in FOFC stands for Condition. Final-over-Final Condition is still not as transparent a name for the word-order constraint as we would like, but the revised form at least does not misstate the nature of the constraint in play: Final-over-Final is precisely what is required, and not what is ruled out, as the initial, constraint-oriented acronym seemingly suggested; Final-over-Initial is what is barred.

(25) The Final-over-Final Condition (FOFC)
     A head-final phrase αP cannot dominate a head-initial phrase βP where α and β are heads in the same Extended Projection.
     (cf. Biberauer, Holmberg & Roberts/BHR 2008 et seq., notably BHR 2014)

What (25) requires is that head-finality start at the bottom of an Extended Projection, i.e. with a lexical V or N (see Grimshaw 1991 et seq.), and that once a head-final sequence has “stopped”, it cannot restart within the same EP. Contrast the structures in (25) and (26) in this respect (^ signifies head-finality in each case):

(25) Three very basic FOFC-respecting patterns:

a. [CP C^ [TP T^ [VP V^]]]

b. [CP C [TP T^ [VP V^]]]

c. [CP C [TP T [VP V^]]] > monotonicity: structurally adjacent heads consistently bear ^

(26) Three basic FOFC-violating patterns:

a. *[CP C^ [TP T [VP V^]]]

b. *[CP C^ [TP T [VP V]]]

c. *[CP C^ [TP T^ [VP V]]] > non-monotonicity: structurally adjacent heads vary in their ^-specification; an “on-off” pattern
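Because FOFC is a monotonicity condition on ^-marking within an extended projection, it can be checked mechanically, as in the sketch below; the list encoding (lowest head first, True = head-final) is an assumption of this illustration only:

```python
# A minimal FOFC checker over toy extended projections, per (25):
# going bottom-up, head-finality (^) may stop, but may never restart.

def respects_fofc(spec):
    """spec: booleans, lowest head first; True = head-final (^)."""
    seen_initial = False
    for head_final in spec:
        if not head_final:
            seen_initial = True
        elif seen_initial:   # head-final above head-initial: FOFC violation
            return False
    return True

# (25a-c), encoded as [V, T, C]: all FOFC-respecting.
assert all(respects_fofc(s) for s in
           [[True, True, True], [True, True, False], [True, False, False]])
# (26a-c): all FOFC-violating.
assert not any(respects_fofc(s) for s in
               [[True, False, True], [False, False, True], [False, True, True]])
```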

As noted elsewhere (Biberauer, Holmberg & Roberts 2008; Biberauer, Newton & Sheehan 2009; Biberauer, Sheehan & Newton 2010; BHR 2014; Sheehan et al. 2017), this requirement has diachronic implications: OV>VO changes must proceed top-down, and VO>OV changes bottom-up, which seems to be correct. Very significantly for our current purposes, however, FOFC-style monotonicity effects are not restricted to the domain of word order. Something strikingly similar emerges in relation to categorization: see Panagiotidis (2014) and references therein on so-called Phrasal Coherence, which is illustrated in (27):

(27) Phrasal Coherence: an initially verbal structure may subsequently be nominalized (see (a)); once it has been nominalized, there can be no return to verbalization. Further, initially nominal structures cannot be verbalized (i.e. verbal = the equivalent of head-final in the word-order domain).24

Similarly, in the domain of Agreement, we see (non)-agreement "cut-off" effects exhibiting the same profile (see Biberauer 2017b for discussion). Additionally, the various hierarchies proposed by typologists and others, and the recently much-discussed *ABA syncretism constraint (cf. i.a. Caha 2009; Bobaljik & Sauerland 2018 for discussion and references) instantiate further examples of monotonicity effects in grammar – precisely what we would expect if grammars are structured on the basis of the kind of featurally regulated acquisition pathways outlined above. The same is true for the "extended FOFC effects" discussed in Biberauer (2017b).

What seems to be at stake here, then, are higher-level generalizations about recurring patterns of grammar structuring that could not readily have been ascribed to parameters – or even been readily identified, to begin with! – during the classic P&P era. These, we contend, are precisely the kinds of newly discovered patterns that generativists can now investigate seriously. From our perspective, they also appear to be the kinds of generalizations that are best understood as the product of the kind of three-way interaction between UG, the input and MMM proposed here.
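The *ABA constraint just mentioned is a contiguity condition of much the same shape as FOFC, and can be sketched analogously. The hierarchy order and the toy adjectival forms below are illustrative assumptions:

```python
# Sketch of the *ABA generalisation: along a grammatical hierarchy
# (here, by assumption, positive < comparative < superlative), identical
# exponents must be contiguous – an A-B-A pattern of syncretism is out.

def violates_aba(forms):
    """forms: exponents listed in hierarchy order."""
    seen = []
    for f in forms:
        if seen and f != seen[-1] and f in seen:
            return True  # f recurs after an intervening distinct form
        if not seen or f != seen[-1]:
            seen.append(f)
    return False

assert not violates_aba(["good", "bett", "bett"])     # ABB: attested
assert not violates_aba(["smart", "smart", "smart"])  # AAA: attested
assert violates_aba(["good", "bett", "good"])         # *ABA: excluded
```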

3.1.3. Going Beyond the Input scenarios
For Chomskyans, there has, as noted in the introduction, always been a clear sense in which all acquisition requires the acquirer to go beyond the input: children end up with knowledge of systematicities that simply aren't available to them via the input. That-trace effects in languages that have them constitute one striking example (Rizzi 1982, 1986). Here, we will briefly consider three further scenarios that uncontroversially involve going beyond the input. One relates to artificial language learning, and the other two to real-life learning. Experimental work by i.a. Hudson Kam & Newport (2005) has revealed that 'children learn unpredictable variation differently than adults. They have a stronger tendency to impose systematicity on inconsistent input … (my emphasis; TB)' (Hudson Kam & Newport 2005: 184; see Mobbs 2015 for overview discussion).

24. Derivational forms like anti-disestablishmentarianism and recategorisability famously do not exhibit this coherence, of course. Thanks to Jeroen van Craenenbroeck for reminding me of this matter, which has been on my 'Future research' list for rather too long already, but necessarily remains there at this point.

In particular, while adults demonstrate frequency-matching, approximately replicating the variability in the original input, child acquirers employ regularization strategies. The nature of these strategies is of particular interest here. Consider (28) in this connection:

(28) The types of regularization that children impose on the input:
     a. minimization: use the variable form none of the time (NONE)
     b. maximization: use the variable form all of the time (ALL)
     c. linguistically governed selection: use the variable form in a grammatically defined subset of contexts, e.g. only with transitive Vs (SOME)
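For concreteness, the three strategies in (28) can be rendered as a toy procedure over variable input; the verb list and the transitivity test below are invented for the example, and only the NONE/ALL/SOME logic is at issue:

```python
# Toy rendering of the three child regularization strategies in (28).
# The grammatical subset used for the SOME case is an assumption
# (here, by stipulation, the transitive verbs).

def regularize(verbs, strategy, in_subset=lambda v: False):
    """Return, per verb, whether the variable form is used."""
    if strategy == "minimization":   # (28a): NONE - never use the form
        return {v: False for v in verbs}
    if strategy == "maximization":   # (28b): ALL - always use the form
        return {v: True for v in verbs}
    # (28c): SOME - only in a grammatically defined subset of contexts.
    return {v: in_subset(v) for v in verbs}

verbs = ["sing", "see", "sleep"]
transitive = {"see"}
print(regularize(verbs, "some", lambda v: v in transitive))
# -> {'sing': False, 'see': True, 'sleep': False}
```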

It is worth noting that (28c) was the most rarely used strategy; nevertheless, the picture that emerges from this (and other studies) is that child acquirers appear to appeal to MMM-driven regularization strategies of precisely the kind assumed in this model.

Our real-life examples both come from English. The first concerns number-marking in modern British English vernaculars (see Willis 2016 for more detailed discussion of this data). Let us first consider the present tense. Here standard English number-marking is restricted to first and third person on BE (i.e. am/are, is/are), and 3rd person singular on lexical verbs and (non-modal) auxiliaries. In vernacular varieties, the following patterns emerge:

(29) a. generalization throughout the paradigm, either (i) to s-forms throughout (she sings, they sings) (ALL), or (ii) to s-less forms (she sing, they sing) throughout (NONE).
     b. use with specific sub-types of subjects, as in the Northern Subject Rule, which takes a number of different forms, picking up on the form of the subject (e.g. full DP, pronoun) and potentially the position of the subject (pre-/post-auxiliary), and so on (SOME).

As indicated, then, NONE>ALL>SOME patterns once again emerge. Before we move on to consider the patterns observed in the past tense, it is worth briefly considering why all three of the NONE>ALL>SOME patterns emerge in the present tense. To the extent that the ALL-choice rests on the postulation of featurally more complex phase-heads than the NONE-choice, we might, after all, expect there to have to be a further grammatical signal that this increased featural complication relative to the evidently available NONE-option is warranted. Importantly, however, NONE- and ALL-options can also be equally complex. Where an [F] is already part of a system, generalising it over a (novel) class of heads will, for example, conform to both IG and FE (cf. (12) and (13) above). Where the decision is simply a matter of spellout – consistently do/don't spell out a specified feature – there need also not be any complexity difference in play. Both considerations seem to hold for the dialects that opt for ALL/NONE reanalyses of the verbal [number] marking. These reanalyses render a non-[number] analysis of some kind necessary. In the case of the ALL-s-realisation systems, -s arguably spells out only [present], which is simpler than the standard [3psg, present] specification; as we know that [tense] is already present in the verbal system at the stage at which [number] is extended to it from the nominal system (cf. i.a. Miller 2007; Miller & Schmitt 2012a,b), this [F]-attribution would not seem to entail the postulation of any new [F]s. In the case of the NO-s-realisation systems, the unmarked verbs once again need to be specified for [present], even in the absence of an overt spellout, to accommodate speakers' awareness of the [tense] specification, which is very evident in do-support contexts (interrogatives, tag structures, etc.). Whether -s is realised or not, then, an already-present feature [tense] will be ascribed to the consistently (un)inflected verb-forms in both the ALL- and the NONE-systems. And the same is true in the case of the Northern Subject Rule SOME-systems: here -s realisation always appears to be regulated by an already-present lexical-functional distinction (between full nominals and pronouns), potentially further mediated by "shallow"-seeming linear (i.e. PF-based) considerations. The NONE>ALL>SOME options in this case therefore seem to be comparable in "cost" terms. Since the regularity in question – what to make of -s – is known to be later-acquired (see again Miller 2008, Miller & Schmitt 2012a,b, and also Brown's classic (1973) Morpheme Order Study), this cost-equivalence is in fact unsurprising in the context of the present model: as already noted above, non-initial (i.e. later) choices do not necessarily take the form of featurally more or less complex options, or SOMESubset choices; instead, they may simply be alternative SOME or SOMEEquivalent choices (cf. the discussion around (22)).

Turning to the past tense, we see that number marking in this domain in standard English is even more restricted than in the present, surfacing only on BE (i.e. was/were). In the vernacular varieties, we again see a number of different patterns emerging, namely:

(30) a. generalization throughout the paradigm, either to all was or all were (ALL/NONE)
     b. specialization relative to polarity: were (i.e. weren't) in negative clauses, regardless of person and number, with was occurring in affirmative clauses, regardless of number (see (31)) (SOME)

(31) a. They was writing a lot of tests that time.
     b. He weren't doing much else.

As in the case of the present tense, the NONE>ALL>SOME options given in (30-31) can all be shown to be cost-equivalent. Thus the generalization options parallel the -s/-∅-generalisation options discussed for the present tense: both require the postulation of the feature [past], i.e. an instantiation of the [tense]-feature, which is demonstrably part of the English verbal system prior to verbal number marking. The grammatically defined SOME-choice that emerges in the past tense, likewise, piggybacks on an [F] already present in the system, namely [polarity]. What determines this specific choice of [F]? One highly plausible conditioning factor here would be the evidence that acquirers get from interrogative structures that auxiliaries are fundamentally concerned with polarity. Consider (32) in this regard:

(32) a. They were all picnicking in the sunshine.
     b. Were they all picnicking in the sunshine?
     c. They ate a lot of cake.
     d. Did they eat a lot of cake?

Here we see a very fundamental declarative-interrogative contrast in respect of auxiliary positioning (cf. (8d) above) and realization (cf. (8b) above). That English-acquiring children initially relate auxiliaries to interrogativity – i.e. open polarity – and, more generally, non-neutral affirmative polarity rather than tense-marking is strongly suggested by child data (see again Thornton 1995, and notably also Woods & Roeper in press for recent discussion and references; note also that this fits with the discussion surrounding (24) above).25 [Polarity] then seems to be an early-acquired [F], at least in English, which, in the context of our model, would therefore be expected to serve as the basis for input structuring in cases where the input is in some way compromised. Like [tense], this feature is already part of the grammar at the point where the acquirer is seeking a featural rationale for the singular-plural distinction on BE, meaning that this SOMESubset option is as "economical" as the options that, at first sight, appear to be "simpler" NONE- or ALL-options (the generalisation options in (30a)).

Our second real-life example comes from West Ulster English. As previously discussed in McCloskey (2000, 2016) and also Henry (2012, 2015), this variety of English permits unusually extensive quantifier-float options in A-bar contexts. Consider (33) in this regard; parentheses indicate the various all-placement options:

(33) What (all) did he (all) say (all) that he (all) bought (all)?

Henry (2015), however, shows that these options are not necessarily available to all West Ulster speakers; instead, it appears to be the case that different "floating" grammars exist, as illustrated in (34):

(34) a. What all did he say that he bought?
     b. What (all) did he (all) say (all) that he (all) bought (all)?
     c. What (all) did he say (all) that he bought?
     d. What (all) did he say (all) that he bought (all)?
     e. What (all) did he (all) say that he (all) bought?
     f. What (all) did he (all) say that he (all) bought (all)?

25. The strong connection to non-affirmative polarity is also evident in the history of the rise of do-support (see i.a. Kroch 1989, and Wallage 2017 for discussion and references).

(34a) is the standard English, no-floating grammar, while (34b) instantiates the grammar which permits stranding in all possible positions. (34c-f), in turn, represent grammars in which some natural-class subset of these options is available. The picture as a whole can be characterised as in (35):

(35) a. What all did he say that he bought? NONE
     b. What (all) did he (all) say (all) that he (all) bought (all)? ALL (vP- & CP-edge plus base position)

c. What (all) did he say (all) that he bought? SOMEEquivalent (CP-edge only)

d. What (all) did he say (all) that he bought (all)? SOMEEquivalent (CP-edge plus base position)

e. What (all) did he (all) say that he (all) bought? SOMEEquivalent (vP-edge only)

f. What (all) did he (all) say that he (all) bought (all)? SOMEEquivalent (vP-edge plus base position)

The pattern that we see, then, involves an across-the-board licensing or ban of stranding possibilities (35a,b), or the licensing of stranding options targeting one or other phase-edge with or without the quantifier's base position being a further possibility (35c-f). Crucially, McCloskey (2016) observes that the input for these structures will be very scarce indeed, raising the question of how the variant stranding grammars are acquired: they will clearly fall beyond the prescriptive radar, and it is also not the case that the A- and A-bar stranding patterns in a given system necessarily overlap in any way. Here, then, we undoubtedly face another "going beyond the input" scenario, where acquirers are converging on grammars that conform to the NONE>ALL>SOME expectations that an MMM-mediated model would predict for input-poor scenarios generally.

What I would like to suggest – in advance of fieldwork to establish the actual facts – is that input from other components of the grammar that are already in place will enable the acquirer to converge on an appropriate grammar. Data alerting the child to the need or not to distinguish between different clausal phase heads (C, v) could, for example, (help to) determine the size and composition of the class of stranding-permitting heads. One type of data that might be relevant in this regard – particularly also if we bear in mind the need to pinpoint structures that could plausibly be salient enough to supply the acquirer with the relevant input at a suitably early stage – is the inverted-subject imperative. The examples in (36) demonstrate the fact that these are not equally readily available in all varieties of Ulster English (data from Henry 1995, 2015):

(36) a. Sit you down! [Dialect A: ✓; Dialect B: ✓]
     b. Go you away! [Dialect A: ✓; Dialect B: ✓]
     c. Run you! [Dialect A: ✓; Dialect B: ✗]
     d. Read you that book! [Dialect A: ✓; Dialect B: ✗]

As (36) shows, some varieties permit inverted-subject imperatives, regardless of verb-type (Henry's Dialect A), while others exhibit argument-structure-based constraints on the availability of this imperative-type. Henry's Dialect B, for example, only permits inverted-subject imperatives with telic intransitives; thus transitive (36d) and the atelic intransitive in (36c) are both ruled out. In terms of a fairly standard minimalist view, v is the phase-head that regulates argument-structure and so-called first-phase syntax more generally (cf. i.a. Ramchand 2008, and D'Alessandro, Franco & Gallego 2017), while C is the phasal locus of clause-typing and (at least some – see Heim & Wiltschko 2017) discourse-related properties. Accepting this view, we see that acquirers of Dialect A-type systems will receive evidence from a high-frequency – and presumably also highly salient – input structure that discourse-marked (i.e. non-neutral/non-declarative) v and C phase-heads can be generalised across, i.e. IG as in (13) can apply. In this case, then, we might expect NONE or ALL stranding grammars to be postulated, as there is another well-attested non-neutral A-bar structure where the relevant clausal phase heads can all be treated identically: all vs are compatible with the inverted-subject-associated imperative C, i.e. any v can match up with the relevant type of C, and so we might also expect all vs and Cs to behave identically in relation to quantifier stranding. Acquirers exposed to Dialect B-type systems, by contrast, will receive imperative evidence that the v and C phase-heads cannot simply be treated as a natural class in the context of discourse-marked (i.e. non-neutral/non-declarative) structures: transitive and atelic vPs need to be distinguished to capture the constraint on the distribution of inverted-subject imperatives. In these grammars, then, we might expect acquirers not to generalise across v and C to produce either a NONE or ALL grammar; instead, postulation of one of the SOME grammars presumably allows them to exploit the already-present featural discrepancies between phase heads in their target variety. If this kind of approach to the quantifier-stranding possibilities depicted in (34/35) is on the right track, we again, as in the case of verbal number-marking, see that apparent NONE>ALL>SOME options in fact constitute SOMEEquivalent options, with the result that acquirers have a number of equally MMM-compatible options for resolving a poverty-of-the-stimulus-type indeterminacy.
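The inference just described can be restated schematically. The sketch below is purely hypothetical – the v-flavour labels and the decision procedure are invented to summarise the reasoning, not to model Henry's or McCloskey's actual data:

```python
# Hypothetical sketch: can the acquirer treat the v and C phase-heads
# as one natural class for quantifier stranding, given imperative input?

def stranding_hypotheses(v_flavours, imperative_compatible):
    """v_flavours: v types in the system; imperative_compatible: those
    attested with inverted-subject imperatives in the input."""
    if set(v_flavours) == set(imperative_compatible):
        # Dialect A: every v patterns with imperative C, so Input
        # Generalisation can apply across phase heads.
        return ["NONE", "ALL"]
    # Dialect B: v-class distinctions are already featurally encoded,
    # so SOME grammars exploiting those [F]s are expected instead.
    return ["SOME (vP-edge)", "SOME (CP-edge)", "SOME (edge + base)"]

flavours = {"telic intransitive", "atelic intransitive", "transitive"}
print(stranding_hypotheses(flavours, flavours))                # Dialect A
print(stranding_hypotheses(flavours, {"telic intransitive"}))  # Dialect B
```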

4. Conclusion
Our objective here has been to try to show why it is both productive and important for generativists to take the Three Factors model seriously, and also to flesh out how we might want to approach its empirical and general cognitive components, and their interaction with each other, and with whatever is left in UG. I introduce a neo-emergentist model of language acquisition, variation, and change that, like its classic P&P predecessor, seeks to understand language variation (and change) as a reflex of the way in which language is acquired. Where the explanatory burden previously rested largely on UG and its hypothetically rich parametric content, we have instead considered how parametrically shaped adult grammars might arise in the absence of a UG-given parametric endowment. Each of the three factors in Chomsky's (2005) model was ascribed a role in the context of the model presented here, with the general cognitive factor, Maximise Minimal Means, being argued to be particularly significant in facilitating new understanding of crosslinguistically recurring patterns that would not – had they been noticed during the classic P&P era – have received a satisfactory "two-factors" explanation. At the same time, we have emphasised the importance of engaging seriously with the input, and, more specifically, those aspects of it which serve as the basis for UG-mediated, MMM-driven generalisation. The current minimalist perspective on crosslinguistic variation and language typology, then, would seem to be both more complex and more interesting than that expressed in Chomsky (1995: 6):

Within the P&P approach the problem of typology and language variation arises in a somewhat different form than before. Language differences and typology should be reducible to choice of values of parameters.

In fact, it may be that we are, finally, starting to reach the point where we can make progress on matters like those initially highlighted in Chomsky’s review of Skinner (emphasis mine, TB):26

As far as acquisition of language is concerned, it seems clear that reinforcement, casual observation, and natural inquisitiveness (coupled with a strong tendency to imitate) are important factors, as is the remarkable capacity of the child to generalize, hypothesize, and "process information" in a variety of very special and apparently highly complex ways which we cannot yet describe or begin to understand, and which may be largely innate, or may develop through some sort of learning or through maturation of the nervous system. The manner in which such factors operate and interact in language acquisition is completely unknown. (Chomsky 1959: 43)

References

Abler, William. 1989. On the particulate principle of self-diversifying systems. Journal of Social Biological Structure 12: 1-13.
Adger, David, Harbour, Daniel & Watkins, Laurel. 2009. Mirrors and Microparameters: Phrase Structure beyond Free Word Order. Cambridge: Cambridge University Press.
Ambridge, Ben, Pine, Jonathan & Lieven, Elena. 2013. Child language acquisition: why Universal Grammar doesn't help. Language 90(3): e53-e90.

26. Thanks to Itziar Laka for drawing attention to this important extract during a generative linguistics event in Reading (May 2017).

Baunaz, Lena, De Clercq, Karen, Haegeman, Liliane & Lander, Eric. 2018. Exploring Nanosyntax. Oxford: Oxford University Press.
Biberauer, Theresa. 2008. Introduction. In Biberauer, Theresa (ed.). The Limits of Syntactic Variation, 1-72. Amsterdam: Benjamins.
Biberauer, Theresa. 2011. In defence of lexico-centric parametric variation: two 3rd factor-constrained case studies. Paper presented at the Workshop on Formal Grammar and Syntactic Variation: Rethinking Parameters (Madrid).
Biberauer, Theresa. 2015. Learning from questions and commands: probing the nature and origins of native-speaker knowledge. Cambridge Humanities Research Grant proposal.
Biberauer, Theresa. 2016. Going beyond the input (and UG): an emergentist generative perspective on syntactic variation, stability and change. Invited talk at the Language Contact, Continuity and Change in the Emergence of Modern Hebrew conference (Jerusalem).
Biberauer, Theresa. 2017a. Particles and the Final-over-Final Constraint. In Sheehan, Michelle, Biberauer, Theresa, Holmberg, Anders & Roberts, Ian (eds.). The Final-over-Final Condition, 187-296. Cambridge, MA: MIT Press.
Biberauer, Theresa. 2017b. Maximising Minimal Means: typological, acquisitional and diachronic perspectives. Three-day invited Lecture Series at the Center for Research in Syntax, Semantics and Phonology (CRiSSP) (Brussels).
Biberauer, Theresa. 2017c. Learning from commands in contact situations: some Southern African case studies. Talk given at the Stellenbosch Linguistic Research Seminar (Stellenbosch, 5 April).
Biberauer, Theresa. 2017d. Word-order variation and change in systems that maximize minimal means. Paper presented at the Variation and Change in the Verb Phrase Workshop (Oslo, 15 May).
Biberauer, Theresa. 2017e. Factors 2 and 3: a principled approach. Cambridge Occasional Papers in Linguistics 10: 38-65.
Biberauer, Theresa. 2018. Pro-drop and emergent parameter hierarchies. In Cognola, Federica & Casalicchio, Jan (eds.). Null Subjects in Generative Grammar: A Synchronic and Diachronic Perspective, 94-135. Oxford: Oxford University Press.
Biberauer, Theresa, Bockmühl, Juliane, Herrmann, Erika & Shah, Sheena. 2017. Imperative variation: the case of Afrikaans, Namibian German and Kroondal German. Paper presented at the 4th Formal Ways of Analyzing Variation (FWAV4) conference (York, 29 June).
Biberauer, Theresa, Holmberg, Anders & Roberts, Ian. 2008. Disharmonic word-order systems and the Final-over-Final-Constraint (FOFC). In Bisetto, Antonetta & Barbieri, Francisco (eds.). Proceedings of XXXIII Incontro di Grammatica Generativa.
Biberauer, Theresa, Holmberg, Anders & Roberts, Ian. 2014. A syntactic universal and its consequences. Linguistic Inquiry 45(2): 169-225.
Biberauer, Theresa, Holmberg, Anders, Roberts, Ian & Sheehan, Michelle. 2014. Complexity in comparative syntax: the view from modern parametric theory. In Newmeyer, Frederick J. & Preston, Laurel (eds.). Measuring Linguistic Complexity, 103-127. Oxford: Oxford University Press.

Biberauer, Theresa, Newton, Glenda & Sheehan, Michelle. 2009. Limiting synchronic and diachronic variation and change: the Final-over-Final Constraint. Language and Linguistics 10(4): 699-741.
Biberauer, Theresa & Roberts, Ian. 2008. Cascading parameter changes: internally driven change in Middle and Early Modern English. In Eythórsson, Þórhallur (ed.). Grammatical Change and Linguistic Theory: The Rosendal Papers, 79-113. Amsterdam: Benjamins.
Biberauer, Theresa & Roberts, Ian. 2009. The Return of the Subset Principle. In Crisma, Paola & Longobardi, Giuseppe (eds.). Historical Syntax and Linguistic Theory, 58-74. Oxford: Oxford University Press.
Biberauer, Theresa & Roberts, Ian. 2012. On the significance of what hasn't happened. Paper presented at the DiGS 14 Conference (Lisbon).
Biberauer, Theresa & Roberts, Ian. 2015. Rethinking formal hierarchies: a proposed unification. Cambridge Occasional Papers in Linguistics 7: 1-31.
Biberauer, Theresa & Roberts, Ian. 2016. Parameter typology from a diachronic perspective: the case of Conditional Inversion. In Bidese, Ermenegildo, Cognola, Federica & Moroni, Manuela C. (eds.). Theoretical Approaches to Linguistic Variation, 259-291. Amsterdam: Benjamins.
Biberauer, Theresa & Roberts, Ian. 2017. Parameter setting. In Ledgeway, Adam & Roberts, Ian (eds.). The Cambridge Handbook of Historical Syntax, 134-162. Cambridge: Cambridge University Press.
Biberauer, Theresa & Sheehan, Michelle. 2012. Disharmony, antisymmetry, and the Final-over-Final Constraint. In Etxebarria, Myriam & Valmala, Vidal (eds.). Ways of Structure Building, 206-244. Oxford: Oxford University Press.
Biberauer, Theresa & Sheehan, Michelle. 2013. Introduction. In Biberauer, Theresa & Sheehan, Michelle (eds.). Theoretical Approaches to Disharmonic Word Order, 1-46. Oxford: Oxford University Press.
Biberauer, Theresa, Sheehan, Michelle & Newton, Glenda. 2010. Impossible changes and impossible borrowings: the Final-over-Final Constraint. In Breitbarth, Anne, Lucas, Christopher, Watts, Sheila & Willis, David (eds.). Continuity and Change in Grammar, 35-60. Amsterdam: John Benjamins.
Bobaljik, Jonathan. 2002. A-Chains at the PF Interface: copies and covert movement. Natural Language and Linguistic Theory 20(2): 197-267.
Bobaljik, Jonathan & Sauerland, Uli. 2018. *ABA and the combinatorics of morphological rules. Glossa: A Journal of General Linguistics 3(1): 15.
Boeckx, Cedric. 2015. Elementary Syntactic Structures: Prospects of a Feature-Free Syntax. Cambridge: Cambridge University Press.
Bond, Oliver, Corbett, Greville, Chumakina, Maria & Brown, Dunstan. 2016. Archi: Complexities of Agreement in Cross-theoretical Perspective. Oxford: Oxford University Press.
Bornstein, Marc & Arterberry, Martha. 2010. The development of object categorization in young children: Hierarchical inclusiveness, age, perceptual attribute and group versus individual analyses. Developmental Psychology 46: 350-365.
Bouchard, Denis. 2013. The Nature and Origin of Language. Oxford: Oxford University Press.
Branigan, Phil. 2012. Macroparameter learnability: an Algonquian Case Study. Unpublished ms: Memorial University of Newfoundland.

Brown, Roger. 1973. A First Language: The Early Stages. Cambridge, MA: Harvard University Press. Caha, Pavel. 2009. The Nanosyntax of Case. PhD dissertation, CASTL Tromsø. Chomsky, Noam. 1959. Review of Verbal Behaviour by B.F. Skinner. Language 35: 26-58. Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris. Chomsky, Noam. 1986. Knowledge of Language. Its Nature, Origin, and Use. Westport, CT: Praeger. Chomsky, Noam. 1993. A minimalist program for linguistic theory. In Hale, Ken & Keyser, S. Jay (eds.). The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, 1-52. Cambridge, MA: MIT Press. Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press. Chomsky, Noam. 2000. Minimalist inquiries: the framework. In Martin, Roger, Michaels, David & Uriagereka, Juan (eds.). Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, 89-156. Cambridge, MA: MIT Press. Chomsky, Noam. 2001. Derivation by phase. In Kenstowicz, Michael (ed.). Ken Hale: A Life in Language, 1-50. Cambridge, MA.: MIT Press. Chomsky, Noam. 2005. Three factors in Language Design. Linguistic Inquiry 36: 1-22. Chomsky, Noam. 2007. Of minds and language. Biolinguistics 1: 9-27. Chung, Sandy. 2012. Are lexical categories universal? The view from Chamorro. Theoretical Linguistics 38: 1-56. Cinque, Guglielmo. 2005. A note on Verb/Object order and Head/Relative clause order. University of Venice Working Papers in Linguistics 15: 49-104. Cinque, Guglielmo. 2013. Cognition, Universal Grammar, and typological generalizations. Lingua 130: 50-65. Cinque, Guglielmo. 2018. A microparametric approach to the head-initial/head-final parameter. Linguistic Analysis 41(3/4): 309-366. Clark, Eve. 1993. The Lexicon in Acquisition. Cambridge: Cambridge University Press. Crain, Stephen & Pietroski, Paul. 2001. Nature, nurture and Universal Grammar. Linguistics and Philosophy 24(2): 139-186. D’Alessandro, Roberta, Franco, Irene & Gallego, Ángel. 2017. The Verbal Domain. Oxford: Oxford University Press. D’Alessandro Roberta & van Oostendorp, Marc (2018). Magnetic Grammar. Unpublished ms: Utrecht and Radboud Universities. (; last accessed 5 December 2018) Dehaene, Stanislas. 2007. Reading in the Brain: The New Science of How We Read. London: Penguin. Demuth, Katherine. 1994. On the ‘underspecification’ of functional categories in early grammars. In Lust, Barbara, Suñer, Margarita & Whitman, John (eds.). Syntactic Theory and First Language Acquisition: Cross-Linguistic Perspectives, 119-134. Hillsdale, N.J.: Lawrence Erlbaum Associates. Demuth, Katherine. 2003. The acquisition of Bantu languages (eds). In The Bantu Languages. Nurse, Derek & Philippson, Gerard (eds.), 209-222. Surrey: Curzon Press. Douglas, Jamie. 2018. Maori subject extraction. Glossa 3(1): 110. Dresher, Elan. 2009. The Contrastive Hierarchy in Phonology. Cambridge: Cambridge University Press. Factors 2 and 3: Towards a principled approach CatJL Special Issue, 2019 81

Dresher, Elan. 2014. The arch not the stones: universal feature theory without universal features. Nordlyd 41(2): 165-181. Dryer, Matthew. 2009. Verb-object-negative order in Central Africa. In Cyffer, Norbert, Ebermann, Ewald & Ziegelmeyer, Georg (eds.). Negation Patterns in West African Languages and Beyond, 307-362. Amsterdam: Benjamins. Duffield, Nigel. 2013. Minimalism and semantic syntax: interpreting multifunctionality in Vietnamese. Unpublished ms: Konan University (LingBuzz/001919; last accessed 21 September 2015) Duffield, Nigel. 2017. On what projects in Vietnamese. Journal of East Asian Linguistics 26(4): 351-387. Eguren, Luis, Fernandez-Soriano, Olga & Mendikoetxea, Amaya. 2016. Rethinking Parameters. Oxford: Oxford University Press. Elman, Jeffrey. 1993. Learning and development in neural networks: the importance of starting small. Cognition 48(1): 71-99. Epstein, Samuel D., Kitahara, Hisatsugu & Seely, T. Daniel. 2012. Structure building that can’t be! In Uribe-Etxebarria, Myriam & Valmala, Vidal (eds.). Ways of Structure Building, 253-270. Oxford: Oxford University Press. Reprinted in: Epstein, Samuel D., Kitahara, Hisatsugu & Seely, T. Daniel. 2015. Explorations in Maximizing Syntactic Minimization. London: Routledge, 156-174. Epstein, Samuel D., Kitahara, Hisatsugu & Seely, T. Daniel. 2013. Simplest Merge generates set intersection: Implications for complementizer-trace explanation. In Goto, Nobu, Otaki, Koichi, Sato, Atsushi & Takita, Kensuke (eds.). Proceedings of GLOW in Asia IX., 77-92. Mie University. Reprinted in: Epstein, Samuel D., Kitahara, Hisatsugu & Seely, T. Daniel. 2015. Explorations in Maximizing Syntactic Minimization. London: Routledge, 175-94. Evers, Arnold & van Kampen, Jacqueline. 2008. Parameter setting and input reduction. In Biberauer, Theresa (ed.). The Limits of Syntactic Variation, 483-515. Amsterdam: Benjamins. Fasanella, Adriana. 2014. On how Learning Mechanisms shape Natural Languages. Ph.D. dissertation, UAB. () Fasanella, Adriana & Fortuny, Fortuny. 2016. Deriving linguistic variation from learnability conditions: the Chunking Procedure. In Eguren, Luis, Fernandez- Soriano, Olga & Mendikoetxea, Amaya. Rethinking Parameters, 105-132. Oxford: Oxford University Press. Ferreira, Fernanda & Patson, Nikole. 2007. The good enough approach to language comprehension. Language and Linguistics Compass 1: 71-83. Fikkert, Paula. 1994. On the Acquisition of Prosodic Structure. Dordrecht: ICG Printing. Fodor, Janet D. & Sakas, William G. 2005. The Subset Principle in syntax: costs of compliance. Journal of Linguistics 41: 513-569. Foley, Claire, Núñez del Prado, Zelmira, Barbiers, Isabella & Lust, Barbara. 2003. Knowledge of variable binding in VP-ellipsis: language acquisition research and theory converge. Syntax 6: 1-52. Fodor, Janet D. & Sakas, William G. 2017. Learnability. In Roberts, Ian (ed.). The Oxford Handbook of Universal Grammar, 249-269. Oxford: Oxford University Press. Fortuny, Jordi. 2010. On the duality of patterning. In Zwart, Jan-Wouter (ed.). Structure Preserved: Studies in Syntax for Jan Koster, 131-140. Amsterdam: Benjamins. 82 CatJL Special Issue, 2019 Theresa Biberauer

Franco, Ludovico. 2012. Against the identity of complementizers and (demonstrative) pronouns. Poznan Studies in Contemporary Linguistics 48: 565-596. Freeman, Thomas. 2016. Metaphysical Syntax. MPhil dissertation, University of Cambridge. Fujita, Kensuke. 2009. A prospect for evolutionary adequacy: Merge and the evolution and development of human language. Biolinguistics 3: 128-153. Gagliardi, Annie. 2012. Input and Intake in Language Acquisition. PhD dissertation, Maryland. Gallego, Ángel. 2011. Parameters. In Boeckx, Cedric (ed.). Oxford Handbook of Linguistic Minimalism, 523-550. Oxford: Oxford University Press. Gass, Susan. 1997. Input, Interaction, and the Second Language Learner. Mahwah, NJ: Erlbaum. Gervain, Judit, Nespor, Marina, Mazuka, Reiko, Horie, Ryota & Mehler, Jacques (2007). Bootstrapping word order in prelexical infants: a Japanese-Italian cross- linguistic study. Cognitive Psychology 57: 56-74. Gervain, Judit & Mehler, Jacques. 2010. Speech perception and language acquisition in the first year of life. Annual Review of Psychology 61: 191-218. Gervain, Judit & Werker, Janet. 2008. How infant speech perception contributes to language acquisition. Language and Linguistics Compass 2(6): 1149-1170. Gianollo, Chiara, Guardiano, Cristina & Longobardi, Giuseppe. 2008. Three fundamental issues in parametric linguistics. In Biberauer, Theresa (ed.). The Limits of Syntactic Variation, 109-142. Amsterdam: Benjamins. Gibson, Hannah, Koumbarou, Andriana, Marten, Lutz & van der Wal, Jenneke. 2017. Locating the Bantu conjoint/disjoint alteration in a typology of focus marking. In Van der Wal, Jenneke & Hyman, Larry (eds.). The Conjoint/Disjoint Alternation in Bantu, 61-100. Berlin: Mouton de Gruyter. Gigerenzer, Gerd & Todd, Peter. 2000. Fast and frugal heuristics: the adaptive toolbox. In Gigerenzer, Gerd, Todd, Peter & The ABC Research Group (eds.). Evolution and Cognition. Simple Heuristics that make us Smart, 3-34. Oxford: Oxford University Press. Goksun, Tilbe, Roeper, Tom, Hirsh-Pasek, Kathy & Golinkoff, Roberta. 2011. From nounphrase ellipsis to verbphrase ellipsis: The acquisition path from context to abstract reconstruction. In Harris, Jesse & Grant, Margaret (eds.). Occasional Working Papers in Linguistics 38: Processing Linguistic Structure, 53-74. Amherst, MA: GLSA. Grimshaw, Jane. 1991. Extended Projection. Unpublished ms: Brandeis. Guasti, Maria-Teresa. 2017. Language Acquisition. The Growth of Grammar. Cambridge, MA: MIT Press. Guardiano, Cristina & Longobardi, Giuseppe. 2017. Parameter theory and parametric comparison. In Roberts, Ian (ed.). The Oxford Handbook of Universal Grammar, 377-398. Oxford: Oxford University Press. Hall, Daniel C. 2007. The Role and Representation of Contrast in Phonological Theory. PhD dissertation, University of Toronto. Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language 86(3): 663-687. Hauser, Marc, Chomsky, Noam & Fitch, Tecumseh. 2002. The faculty of language: what is it, who has it, and how did it evolve? Science 298(5598): 1569-1579. Factors 2 and 3: Towards a principled approach CatJL Special Issue, 2019 83

Hawkins, John A. 2009. An asymmetry between VO and OV languages: the ordering of obliques. In Corbett, Greville & Noonan, Michael (eds). Case and Grammatical Relations: Studies in Honor of Bernard Comrie, 167-190. Amsterdam: Benjamins. Heim, Johannes & Wiltschko, Martina. 2017. The complexity of speech acts. Evidence from speech act modifiers. Unpublished ms: UBC. Henry, Alison. 1995. Belfast English and Standard English: Dialect Variation and Parameter Setting. Oxford: Oxford University Press. Henry, Alison. 2012. Phase edges, quantifier float and the nature of (micro-) variation. IBERIA 4(1): 23-39. Henry, Alison. 2015. Morphosyntactic variation and finding the open values in syntax. Paper presented at the Formal Approaches to Morphosyntactic Variation conference (Vitoria). Hockett, Charles. 1958. A Course in Modern Linguistics. New York: Macmillan Company. Hornstein, Norbert & Nunes, Jairo. 2008. Adjunction, labeling, and bare phrase structure. Biolinguistics 2: 57-86. Hornstein, Norbert & Pietroski, Paul. 2009. Basic operations: Minimal syntax- semantics. Catalan Journal of Linguistics 8: 113-39. Huddleston, Geoffrey. 1984. Introduction to the Grammar of English. Cambridge: Cambridge University Press. Hudson Kam, Carla & Newport, Elissa. 2005. Regularizing unpredictable variation: the roles of adult and child learners in language formation and change. Language Learning and Development 1: 151-95. Humboldt, Wilhelm von. 1836. Über die Verschiedenheit des menschlichen Sprachbaues. Paderborn: Verlag Ferdinanc Schöningh. Jaspers, Dany. 2012. Logic and colour. Logica Universalis 6: 227-248. Jaspers, Dany. 2013. Constraints on Concept Formation. Poster presented at GLOW 36 (Lund). Kahnemann, Daniel. 2011. Thinking, Fast and Slow. London: Penguin. Kampen, Jacqueline van. 2004. Learnability order in the French pronominal system. In Bok-Bennema, Reineke, Hollebrandse, Bart, Kampers-Manhe, Brigitte & Sleeman, Petra (eds.). Romance Languages and Linguistic Theory 2002: Selected Papers from ‘Going Romance’, Groningen 28-30 November 2002, 163-182. Amsterdam: Benjamins. Kiss, Katalin É. 2008. Free word order, (non)configurationality, and phases. Linguistic Inquiry 39(3): 441-475. Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1: 199-244. Lasnik, Howard 1995. Case and expletives revisited: on Greed and other human failings. Linguistic Inquiry 26: 615-633. Leffel, Timothy, Šimik, Radek & Wierzba, Marta. 2013. Pronominal F-markers in Basaa. Proceedings of NELS 43: 265-276. GLSA Publications. Leivada, Evelina. 2017. What’s in (a) Label? Neural origins and behavioral manifestations of Identity Avoidance in language and cognition. Biolinguistics 11: 1-30. 84 CatJL Special Issue, 2019 Theresa Biberauer

Levin, Theodore. 2016. Successive-cyclic case assignment: Korean nominative- nominative case-stacking. Natural Language and Linguistic Theory 35(2): 447-498. Lidz, Jeff & Gagliardi, Annie. 2015. How nature meets nurture: Universal Grammar and statistical learning. Annual Review of Linguistics 1: 333-353. Lleó, Conxita. 1998. Proto-articles in the acquisition of Spanish: Interface between Phonology and Morphology. In Fabri, Ray, Ortmann, Albert & Parodi, Teresa (eds.). Modelle der Flexion: 18. Jahresetagung der Deutschen Gesellschaft für Sprachwissenschaft. Tübingen: Niemeyer. Lleó, Conxita. 2001. Early fillers: undoubtedly more than phonological stuffing. Journal of Child Language 28: 262-265. Lleó, Conxita & Demuth, Katherine. 1999. Prosodic constraints on the emergence of grammatical morphemes: crosslinguistic evidence from Germanic and Romance languages. In: Greenhill, Annabel, Littlefield, Heather & Tano, Cheryl (eds.). Proceedings of the 23rd Annual Boston University Conference on Language Development (BUCLD 23), volume 2: 407-418. Somerville, MA: Cascadilla Press. Longobardi, Giuseppe. 2018. Principles, Parameters, and Schemata: a radically underspecified UG. Linguistic Analysis 41(3-4): 517-558. Martí, Luisa. 2015. Grammar versus Pragmatics: Carving Nature at the Joints. Mind and Language 30(4): 437-473. McCloskey, Jim. 2000. Quantifier float and wh-movement in an Irish English. Linguistic Inquiry 31: 57-84. McCloskey, Jim. 2006. Questions and questioning in a local English. In Zanuttini, Raffaella, Campos, Hector, Herburger, Elena & Portner, Paul (eds.). Crosslinguistic Research in Syntax and Semantics: Negation, Tense, and Clausal Architecture, 87-126. Washington, DC: Georgetown University Press. McCloskey, Jim. 2016. Micro-parameters in a tiny space: stranding at the edge. Paper presented at CamCoS 5 (Cambridge, UK). Mehler, Jacques, Jusczyk, Peter, Lambertz, Ghislane, Halsted, Nilofar, Bertoncini, Josiane & Amiel-Tison, Claudine. 1988. How infant speech perception contributes to language acquisition. A precursor of language acquisition in young infants. Cognition 29: 143-178. Mehler, Jacques & Dupoux, Emile. 1994. What Infants Know: The New Cognitive Science of Early Development. Cambridge, MA: Blackwell. Miller, Karen. 2007. Variable Input and the Acquisition of Plurality in Two Varieties of Spanish. PhD dissertation, Michigan State University. Miller, Karen & Schmitt, Christina. 2012a. Not all children agree: acquisition of agreement when the input is variable. Language Learning and Development 8(3): 255-277. Miller, Karen & Schmitt, Christina. 2012b. Variable input and the acquisition of plural morphology. Language Acquisition 19(3): 223-261. Milsark, Gary. 1974. Existential sentences in English. PhD dissertation, MIT. Mobbs, Iain. 2015. Minimalism and the Design of the Language Faculty. PhD dissertation, University of Cambridge. Moore, David. 2002. Auditory development and the role of experience. British Medical Bulletin 63: 171-181. Morin, Olivier. 2018. Spontaneous emergence of legibility in writing systems: the case of orientation anisotropy. Cognitive Science 42(2): 664-677. Factors 2 and 3: Towards a principled approach CatJL Special Issue, 2019 85

Nazzi, Thierry, Bertoncini, Josiane & Mehler, Jacques. 1998. Language discrimination by newborns: toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance 24: 756-766. Nespor, Marina, Peña, Marina & Mehler, Jacques. 2003. On the different roles of vowels and consonants in speech processing and language acquisition. Lingue e Linguaggio 2: 203-229. Newmeyer, Frederick J. 2004. Against a parameter-setting approach to language variation. In Pica, Pierre, Rooryck, Johan & van Craenenbroeck, Jeroen (eds.). Language Variation Yearbook Volume 4, 181-234. Amsterdam: Benjamins. Newmeyer, Frederick J. 2005. Possible and Probable Languages. Oxford: Oxford University Press. Newport, Elissa. 1990. Maturational constraints on language learning. Cognitive Science 14: 11-28. Panagiotidis, Phoevos. 2014. Categorial Features: A Generative Theory of Word Class Categories. Cambridge: Cambridge University Press. Pearl, Lisa. in press. Modelling syntactic acquisition. In J. Sprouse (ed.). Oxford Handbook of Experimental Syntax. Oxford: Oxford University Press. Pearl, Lisa & Sprouse, Jon. in press. Comparing solutions to the Linking Problem using an integrated quantitative framework of language acquisition. To appear in Language. (; last accessed 5 December 2018). Pérez-Leroux, Ana T., Castilla-Earls, Anny P., Bejar, Susana & Massam, Diane. 2012. Elmo’s sister’s ball. The development of nominal recursion in children. Language Acquisition 19(4), 301-311. Pérez-Leroux, Ana T., Peterson, Tyler, Bejar, Susana, Castilla-Earls, Anny P., Massam, Diane & Roberge, Yves. 2018. The acquisition of recursive modification in NPs. Language 94(2): 332-359. Pesetsky, David. 2014. Russian Case Morphology and the Syntactic Categories. Cambridge, MA: MIT Press. Picallo, Carme (ed.). 2014. Linguistic Variation in the Minimalist Framework. Oxford: Oxford University Press. Pinker, Steven. 1984. Language Learnability and Language Development. Cambridge, MA: Harvard University Press. Preminger, Omer. 2018. Back to the Future: non-generation, filtration, and the heartbreak of interface-driven minimalism. In Hornstein, Norbert, Lasnik, Howard, Patel-Grosz, Pritty & Yang, Charles (eds.). Syntactic Structures after 60 Years: The Impact of the Chomskyan Revolution in Linguistics, 355‑380. Berlin: de Gruyter. Radford, Andrew. 1990. Syntactic Theory and the Acquisition of English Syntax: The Nature of Early Child Grammars of English. Oxford: Blackwell. Ramchand, Gillian. 2008. Verb Meaning and the Lexicon. A First Phase Syntax. Cambridge: Cambridge University Press. Ramchand, Gilllan & Svenonius, Peter. 2014. Deriving the functional hierarchy. Language Sciences 46: 152-174. Richards, Marc. 2014. Defective Agree, Case alternations, and the prominence of Person. In Bornkessel-Schlesewsky, Ina, Malchukov, Andrej & Richards, Marc (eds.). Scales and Hierarchies: A Cross-Disciplinary Perspective, 173-194. Berlin: Mouton de Gruyter. 86 CatJL Special Issue, 2019 Theresa Biberauer

Richards, Marc. 2017. Problems of ‘Problems of Projection’: breaking a conceptual tie. Paper presented at the Generative Syntax 2016: Questions, Crossroads, and Challenges workshop (Barcelona 23 June 2017) Richards, Norvin. 2010. Uttering Trees. Cambridge: MIT Press. Riemsdijk, Henk van. 2008. Identity avoidance: OCP effects in Swiss relatives. In Freidin, Robert, Otero, Carlos P. & Zubizarreta, Maria Luisa (eds.). Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud, 227-250. Cambridge, MA: MIT Press. Ritter, Elisabeth & Wiltschko, Martina. 2009. Varieties of INFL: TENSE, LOCATION and PERSON. In van Craenenbroeck, Jeroen (ed.). Alternatives to Cartography, 153-201. Berlin: Mouton de Gruyter. Ritter, Elisabeth & Wiltschko, Martina. 2014. The composition of INFL. Natural Language and Linguistic Theory 32(4): 1331-1386. Rizzi, Luigi. 1982. Issues in Italian Syntax. Dordrecht: Foris. Rizzi, Luigi. 1986. Null objects in Italian and the theory of pro. Linguistic Inquiry 17: 501-557. Rizzi, Luigi. 1993/4. Some notes on linguistic theory and language development: the case of Root Infinitives. Language Acquisition 3: 341-393. Rizzi, Luigi. 2014. On the elements of syntactic variation. In Picallo, Carme (ed.). Linguistic Variation in the Minimalist Framework, 13-45. Oxford: Oxford University Press. Rizzi, Luigi. 2018. On the format and locus of parameters: the role of morphosyntactic features. The Linguistic Review 41(3-4): 159-190. Rizzi, Luigi & Cinque, Guglielmo. 2016. Functional categories and syntactic theory. Annual Review of Linguistics 2: 139-163. Roberts, Ian. 2007. Diachronic Syntax. Oxford: Oxford University Press. Roberts, Ian. 2012. On the nature of syntactic parameters: a programme for research. In Galves, Charlotte, Cyrino, Sonia, Lopes, Ruth, Sandalo, Filomena & Avelar, Juanito (eds.). Parameter Theory and Linguistic Change, 319-334. Oxford: Oxford University Press. Roberts, Ian. 2019. Parameter Hierarchies and Universal Grammar. Oxford: Oxford University Press. Roberts, Ian & Roussou, Anna. 2003. Syntactic Change. A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press. Roeper, Tom. 2011. The acquisition of recursion: how formalism articulates the acquisition path. Biolinguistics 5: 57-86. Roeper, Tom & Snyder, William. 2004. Recursion as an analytic device in acquisition. In van Kampen, Jacqueline & Coopmans, Peter (eds.). Proceedings of GALA 2003, 401-408. Utrecht: LOT. Roeper, Tom & Snyder, William. 2005. Language learnability and the forms of recursion. In Di Sciullo, Anna-Maria (ed.). UG and External Systems: Language, Brain and Computation, 155-169. Amsterdam: Benjamins. Sandler, Wendy. 2010. Prosody and syntax in sign language. Transactions of the Philological Society 108(3): 298-328. Sandler, Wendy. 2012. The phonological organization of sign languages. Language and Linguistics Compass 6(3): 162-182. Factors 2 and 3: Towards a principled approach CatJL Special Issue, 2019 87

Santos, Ana. 2009. Minimal Answers: Ellipsis, Syntax, and Discourse in the Acquisition of European Portuguese. Amsterdam: Benjamins. Schutze, Carson T. & Wexler, Kenneth. 1996. Subject case licensing and English root infinitives. In Stringfellow, Andy, Cahana-Amitay, Dalia, Hughes, Elizabeth & Zukowski, Andrea (eds.). BUCLD 20 Proceedings, 670-681. Cambridge, MA: Cascadilla Press. Schuler, Katherine, Yang, Charles & Newport, Elissa. 2016. Testing the Tolerance Principle: children form productive rules when it is more computationally efficient to do so. Proceedings of the 38th Annual Conference of the Cognitive Science Society: 2321-2326. Seuren, Pieter & Jaspers, Dany. 2014. Logico-cognitive structure in the lexicon. Language 90: 607-643. Sheehan, Michelle. 2013. Explaining the Final-over-Final Constraint: formal and functional approaches. In Theoretical Approaches to Disharmonic Word Order. Biberauer, Theresa & Sheehan, Michelle (eds), 407-444. Oxford: Oxford University Press. Sheehan, Michelle, Biberauer, Theresa, Holmberg, Anders & Roberts, Ian (2017). The Final-over-Final Condition. A Syntactic Universal. Cambridge, MA: MIT Press. Shi, Rushen & Werker, Janet F. 2001. Six-month-old infants’ preference for lexical words. Psychological Science 12(1): 70-75. Shi, Rushen, Werker, Janet F. & Morgan, James L. 1999. Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition 72(2): B11-B21. Shlonsky, Ur. 2010. The cartographic enterprise in syntax. Language and Linguistics Compass 4(6): 417-429. Song, C. (2019). On the formal flexibility of syntactic categories. PhD dissertation, University of Cambridge. Starke, Michal. 2009. Nanosytax. A short primer to a new approach to language. Nordlyd 36(1): 1-6. Starke, Michal. 2014. Towards elegant parameters: language variation reduces to the size of lexicaly-stored trees. In Picallo, Carme (ed.). Linguistic Variation in the Minimalist Framework, 140-154. Oxford: Oxford University Press. Stokes, Stephanie, Klee, Thomas, Perry Carson, Cecyle & Carson, David. 2005. A phonemic implicational feature hierarchy of phonological contrasts for English- speaking children. Journal of Speech, Language, and Hearing Research 48: 817-833. Svenonius, Peter. 2002. Introduction. In Svenonius, Peter (ed.). Subjects, Expletives and the EPP, 3-28. Oxford: Oxford University Press. Thornton, Rosalind. 1995. Referentiality and wh-movement in child English: Juvenile D-Linkuency. Language Acquisition 4: 139-175. Tsimpli, Ianthi. 2014. Early, late or very late? Timing acquisition and bilingualism. Linguistic Approaches to Bilingualism 4(3): 283-313. Vikner, Sten. 1995. Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press. Wallage, Phillip. 2017. Negation in Early English: Grammatical and Functional Change. Cambridge: Cambridge University Press. Westergaard, Marit. 2009. The Acquisition of word order: Micro-cues, Information Structure and Economy. Amsterdam: Benjamins. 88 CatJL Special Issue, 2019 Theresa Biberauer

Wexler, Kenneth. 1998.Very early parameter setting and the unique checking constraint: a new explanation of the optional infinitive stage. Lingua 106: 23-79. Willis, David. 2016. Exaptation and degrammaticalization within an acquisition-based model of abductive reanalysis. In Norde, Muriel & van de Velde, Freek (eds.). Exaptation in Language Change, 227-260. Amsterdam: Benjamins. Wiltschko, Martina. 2014. The Universal Structure of Categories. Towards a Formal Typology. Cambridge: Cambridge University Press. Woods, Rebecca & Roeper, Tom. in press. Rethinking auxiliary doubling in adult and child language. To appear in: Woods, Rebecca & Wolfe, Sam (eds.). Rethinking Verb Second. Oxford: Oxford University Press. Yang, Charles. 2016. Price of Productivity. How Children Learn and Break Rules of Language. Cambridge, MA: MIT Press. Zeijlstra, Hedde. 2008. On the syntactic flexibility of formal features. In Biberauer, Theresa (ed.). The Limits of Syntactic Variation, 143-174. Amsterdam: Benjamins. Catalan Journal of Linguistics Special Issue, 2019 89-138

What sort of cognitive hypothesis is a derivational theory of grammar?*

Tim Hunter
University of California
[email protected]

Received: February 15, 2018 Accepted: September 23, 2019

Abstract

This paper has two closely related aims. The main aim is to lay out one specific way in which the derivational aspects of a grammatical theory can contribute to the cognitive claims made by that theory, and so to demonstrate that testable cognitive hypotheses derive not only from a theory's posited representations. This requires, however, an understanding of grammatical derivations that initially appears somewhat unnatural in the context of modern generative syntax. The second aim is to argue that this impression is misleading: certain accidents of the way our theories developed over the decades have led to a situation that makes it artificially difficult to apply the understanding of derivations that I adopt to modern generative grammar. Comparisons with other derivational formalisms and with earlier generative grammars serve to clarify the question of how derivational systems can, in general, constitute hypotheses about mental phenomena.

Keywords: syntax; minimalist grammars; transformational grammars; derivations; representations; derivation trees; probabilistic grammars

Resum. Quin tipus d’hipòtesi cognitiva és una teoria derivativa de la gramàtica?

Aquest article té dos objectius estretament relacionats. L'objectiu principal és exposar una forma específica en la qual els aspectes derivatius d'una teoria gramatical poden contribuir a les afirmacions cognitives realitzades per aquesta teoria, per demostrar que no són només les representacions plantejades d'una teoria de les que es deriven hipòtesis cognitives testables. Això requereix, però, una comprensió de les derivacions gramaticals que, inicialment, sembla poc natural en el context de la sintaxi generativa moderna. El segon objectiu és argumentar que aquesta impressió és enganyosa: certs accidents de la manera com les nostres teories es van desenvolupar al llarg de les dècades han donat lloc a una situació que fa artificialment difícil aplicar la comprensió de les derivacions que adopto a la gramàtica generativa moderna. Les comparacions amb altres formalismes derivatius i amb gramàtiques generatives anteriors serveixen per aclarir la qüestió de com els sistemes derivacionals poden, en general, constituir hipòtesis sobre fenòmens mentals.

Paraules clau: sintaxi; gramàtiques minimalistes; gramàtiques transformacionals; derivacions; representacions; arbres de derivació; gramàtiques probabilístiques

* Thanks to Bob Frank, Norbert Hornstein, Ellen Lau, Jeff Lidz, Colin Phillips, Jon Sprouse, Tim Stowell, Alexander Williams, Masaya Yoshida and two anonymous reviewers for helpful comments and discussions; and to audiences at the University of Connecticut, Harvard, Michigan State University, Northwestern University, UCLA, the 2017 IGG workshop on “Order and direction of grammatical operations” in Pavia, and the “Generative Syntax 2017” workshop in Barcelona.

ISSN 1695-6885 (in press); 2014-9718 (online) https://doi.org/10.5565/rev/catjl.224

Table of Contents

1. Representations, derivations and derived expressions
2. A derivational theory of minimalist syntax (or two)
3. The empirical reflexes of derivational processes
4. Conclusion
References

Two kinds of theories of natural language syntax can be distinguished: representational theories and derivational theories. A representational theory posits some set of constraints, and defines a well-formed syntactic object to be one that satisfies all of the constraints. A derivational theory instead takes the form of a nondeterministic mechanical procedure, for example a symbol-rewriting procedure or a procedure that builds larger objects out of smaller ones, and defines a well-formed syntactic object to be one that is generated by this procedure.

The mentalistic claims of a representational theory are relatively clear: it is generally understood that when a speaker comprehends or produces a sentence, a representational theory predicts that a corresponding well-formed syntactic object (say, a tree structure with the sentence’s words at its leaves) is grasped in the speaker’s mind. With a representational theory, nothing is said about how a speaker might go about constructing (a representation of) this syntactic object, and the linguist’s everyday use of the theory also does not involve any descriptions of procedures that construct syntactic objects.

The situation for a derivational theory, however, is slightly less straightforward. Consider for example the mainstream contemporary derivational theories deriving from Chomsky (1995) and subsequent work. It is natural to assume that a speaker grasps the syntactic object that is the end product of the derivational process corresponding to the sentence being comprehended/produced, i.e. the tree structures that are routinely used to illustrate proposals in this literature. But if that is the extent of a derivational theory’s mental commitments, what is the scientific role of the derivational procedure? If we have an existing derivational theory T1, and an alternative theory T2 proposes a derivational procedure that differs from that of T1 but yields the same set of well-formed syntactic objects, then is there any clear sense in which we should understand the two theories to be different? If they are not different — i.e. if the procedural component of a derivational theory does not contribute to its empirical bottom line — then why bother with the derivational procedures at all? If they are different, then how are they different, i.e. how does the procedural component of a derivational theory contribute to the theory’s empirical bottom line?

Answering this last question is the main goal of this paper: I will lay out a way of understanding derivational theories according to which the derivational process itself, in addition to the end result of this process, plays a part in determining the empirical predictions of a theory. For concreteness, I will illustrate by showing at the end of the paper — in entirely artificial, and artificially small-scale, case studies — how derivational operations play a part in determining predictions about sentence comprehension difficulty, and predictions about which grammar a learner will choose in response to some collection of input. The main point I want to stress, however, is not the (fairly arbitrary) particulars of either of these case studies, but rather the general understanding of derivational frameworks which makes a derivational process a first-class theoretical object which can underpin empirical predictions just as naturally as the static objects in a representational theory can.
The key idea is that we can identify an atemporal structured object — typically, a derivation tree — that encapsulates the derivational process and yet is static in the same sense that syntactic objects in representational theories are.

This point is an attempt to address the issues raised by some who have questioned the role of derivational processes in modern generative syntax (Sag & Wasow 2011; Jackendoff 2011; Ferreira 2005; Phillips & Lewis 2013). This criticism appears to stem largely from the fact that, in practice, descriptions of how a particular theory accounts for some relevant data rarely require making reference to the derivational operations1 posited by the theory; very often, the final constructed syntactic object is all we need to consider when working out the empirical predictions of a theory, and it seems that any number of different ways of describing how that object is constructed would leave the account intact.

I will suggest that this impression is due to a perhaps unfortunate quirk in modern generative syntax: the fact that the end product of a derivational process very often encodes a large amount (or all) of the derivational process itself, for example in the form of co-indexed traces or copies. A clearer understanding of how derivational grammars in general can constitute hypotheses about mental phenomena can be achieved by considering other derivational systems that do not have this quirk, and where it is therefore easy to see the role of the derivational process itself (because this role is not duplicated by representational devices). With more light thus shed on the kind of mental significance a derivational process can in principle take on, we will be better placed to ask (i) what it would mean to ascribe this same kind of mental significance to the derivational processes typically invoked in modern generative syntax, and (ii) whether doing so is consistent with the standard ways in which linguists already work with these theories. In answer to the second question, I will argue that it is not only consistent with standard practices but furthermore is, given the way our theories have developed over the decades, a very natural understanding of syntactic derivations.

In Section 1 I will review in more detail the distinguishing features of representational and derivational theories of grammar, and the questions that are sometimes raised about the mental significance of derivational processes. In doing so I will discuss the abovementioned quirk of modern generative syntax, and the way the questions are clarified by considering other systems that do not share this quirk. I will then turn to minimalist syntax more specifically in Section 2. The goal here will be to identify the static representations (namely, derivation trees) that encapsulate this kind of theory’s derivational processes. I will do this for two subtly different variants of minimalist theory: one which takes merge and move to be distinct primitive operations, and one which unifies them into a single operation. These two variants that I will introduce differ only in their derivational processes, not in the syntactic objects that they construct. In Section 3 I will then present case studies where the two variants nonetheless make distinct predictions in two empirical domains: sentence comprehension difficulty, and grammar selection by a learner. Since the two variants’ differences concern only their derivational processes, this serves to demonstrate that the procedural component of a derivational system contributes to a theory’s empirical bottom line. Section 4 summarizes and concludes.

1. This is not to deny that we sometimes talk about, for example, a certain sentence being unacceptable “because this movement step violates such-and-such constraint”. But due to the presence of devices like traces or copies, this appeal to the processes themselves is often dispensable. Much more on this point in Section 1.3 below.

1. Representations, derivations and derived expressions

1.1. Purely representational and purely derivational systems

Caricaturing at least slightly, Figure 1 illustrates one possible conception of the relationship between a representational system and a derivational system. On the left is the static syntactic structure assigned to the sentence ‘Kim gives Sandy Fido’ in HPSG, one of the more widely-known representational theories of grammar (Pollard & Sag 1994: 33). This syntactic object is well-formed by virtue of satisfying the relevant array of constraints. As mentioned above, the mental commitments of this kind of theory are relatively clear: in comprehending or producing this sentence, a representation of this syntactic object is grasped2 in the speaker’s mind. By virtue of the fact that this grasped syntactic object is well-formed, the theory predicts that this sentence will be judged to be acceptable. And perhaps there are other predictions that one might make on the basis of other properties of this syntactic object: to take an overly simplistic example, one might predict that the time taken to comprehend this sentence will be some function of the size of this object.

On the right of Figure 1, for comparison, is a sketch of how a derivation in a modern minimalist grammar might be thought of. There is a final derived expression of the familiar sort, the tree with yield ‘the dog will chase it’ shown at the top. One possible thought — although I will argue against this — is that this tree is the thing in this derivational system that best corresponds, as indicated by the horizontal dashed lines, to HPSG’s static syntactic object on the left. Since this is a derivational framework, however, there is more to the picture than just this: there is also a derivational process which is taken to have given rise to this derived expression, as shown underneath. (For reasons that should become clear, I am writing them underneath the derived expression, despite the usual idea that these other pieces of the picture precede the derived expression. This usual notion of precedence is reflected in the arrows.) The layout of the diagram is intended to emphasize the way this perception of a derivational system makes the derivational process seem like something “extra”, and perhaps even something superfluous, in comparison with a representational system: there is nothing in the illustration of the representational system on the left which corresponds to this derivational process. So what is it there for?

Figure 1. A view that I will argue against: only the end product of a derivational process is given the easily-understandable empirical status corresponding to that of a static representation in a non-derivational theory.

I will argue that instead of this view, we should consider the derivational process as a whole (including, but not limited to, the final derived expression) to be the analog of the static representation in a representational system. This shift in perspective is reflected in the shift from Figure 1 to Figure 2. The arrows that are usually thought of (and can still be, harmlessly) as indicating a kind of precedence are now simply part of the object that a speaker must grasp; the formal relationships amongst expressions that they express are part of the information that a speaker must recover.3

2. I will assume that the intended meaning of this term, while difficult to spell out explicitly, is sufficiently clear. Since the questions I aim to address here largely centre on the difficulties that come with adopting derivational as opposed to representational grammars, I am taking as my concrete goal to show that there are no such additional difficulties. Fleshing out the notion that I am calling “grasping” is a difficulty that will affect derivational and representational theories equally.

3. The difference between Figure 1 and Figure 2 perhaps corresponds to the difference between what Phillips & Lewis (2013) call the “extensionalist” view of derivations and the “formalist” view, respectively. The formalist view can be seen as an intermediate position between two extremes: the extensionalist view, according to which individual derivational steps are not understood to be making any mentalistic commitments at all, and the “literalist” view, according to which individual derivational steps are interpreted very directly as hypothesized real-time mental operations. From the perspective in Figure 2 that I aim to elucidate here, linking hypotheses can be formulated that expose derivational operations to empirical scrutiny (unlike the extensionalist view), but these linking hypotheses do not include the straightforward one that makes immediate and direct predictions about real-time mental operations (as is the case on the literalist view). Phillips and Lewis mention the intermediate formalist option only relatively briefly, and focus mainly on the literalist and extensionalist extremes, without going into much detail about what a fleshed-out formalist position would look like. But the notion of a static atemporal derivation tree, mentioned above, corresponds closely to the collection of formally related structures that Phillips and Lewis mention.

Figure 2. The view that I will argue for: the derivational process itself, in its entirety, is the relevant object.

It will be useful to establish some terminology for what follows. I will use the term expression for an object of the sort that might be manipulated or inspected by a grammar: either checked for consistency with some representational constraint, or used as input to or produced as output from some derivational operation. I will show expressions inside thick, rounded boxes throughout. I will use object as a much more general term for any kind of structured representation that a mind might grasp. Expressions are objects, but not all objects are expressions. In a representational setting, there are no relevant objects to consider besides expressions themselves, and so the object to be grasped upon encountering the sentence ‘Kim gives Sandy Fido’ is simply the expression itself that appears on the left of Figure 1 and Figure 2. The difference between these two figures is that Figure 1 expresses a view where, in the derivational system, the object to be grasped is the single expression shown within the horizontal dashed lines; whereas Figure 2 expresses the view that the object to be grasped is an object of a different sort, an object encoding certain relations among expressions. This object is a derivation (and can be represented on paper by a derivation tree).4

4. Some readers, especially formally-minded ones, might object at this point that the tension I am setting up can be straightforwardly resolved by eliminating (or just neglecting to make) this distinction between expressions and objects; derivation trees can just as well be defined model-theoretically (i.e. “any derivational system can be converted to an equivalent representational one”), and then the choice of Figure 2 over Figure 1 is obvious. I agree wholeheartedly: my aim in this paper is in large part to lay out exactly this line of reasoning in an accessible and contextualized manner, since it seems to gain little traction in the linguistics literature. I adopt the expression/object terminology as a way to try to engage with the intuitions that make derivational and representational systems appear starkly different. It is worth pointing out, however, that — as this acknowledgement might suggest — the perspective on derivations that I end up arguing for in this paper is entirely standard within the theoretical computational linguistics community (and entirely mundane formally). This includes the work on formalized minimalist grammars that I draw on in Sections 2 and 3; see for example work by Kobele (2010, 2011, 2012), Graf (2011, 2013, 2017) and Stabler (2011, 2013), all of which takes derivation trees (virtually without comment) as the central object of interest. The same can be said for the formal work on tree-adjoining grammars (e.g. Joshi & Schabes 1997), where an analogous relationship between derivation structure and derived expressions arose earlier. It is no coincidence that these formalisms, where this relationship is kept clear, do not share with mainstream minimalism the unfortunate quirk of duplicating derivational history in derived expressions, as I discuss in Section 1.3.

A clear illustration of the perspective presented in Figure 2 is provided by the various kinds of categorial grammar. In this framework, the categories into which lexical items are classified can be complex, and a small number of very general combinatory rules apply in a manner that is guided by these potentially complex categories. For example, using the lexical items shown in (1), the two general rules of forwards and backwards function application can be applied recursively to construct the sentence ‘the dog chased the cat’. This is typically illustrated using a format like (2), but an equivalent representation that follows the conventions I adopt throughout this paper is the one in (3).

(1) the :: NP/N
    dog :: N
    cat :: N
    chased :: (S\NP)/NP

(2) [derivation of ‘the dog chased the cat’ in the standard proof-tree format; figure not reproduced]

(3) [the same derivation drawn as a derivation tree, in the convention used throughout this paper; figure not reproduced]

A distinctive feature of this kind of grammar is that the expressions being manipulated are essentially unstructured: they are things like ‘the dog :: NP’, i.e. a string5 paired with a category, where the category dictates how the expression can be used by any subsequent operations. So the derivational process indicated in (2) and (3) is one which works with the “ingredients” shown in (1), and produces as a result the expression ‘the dog chased the cat :: S’. Notice that the final derived expression is an object of the same sort as the ingredient expressions in (1), i.e. a string with a category. As Jacobson (2007) describes this kind of system, “there is no room to state constraints on structured representations. For ‘structure’ is not something that the grammar ever gets to see”. In the terminology introduced above, this is to say that the expressions here, the things that the grammar can “see” — inspect, manipulate, whatever — have no structure; the only structured object is the derivation. The crucial point here is that it would make little sense to suppose that the object that is “grasped” by a speaker upon encountering the sentence ‘the dog chased the cat’ is simply the one constructed by this derivational process, namely the expression ‘the dog chased the cat :: S’. For the theory to be doing any work at all, there must at the very least be some difference between what the speaker does upon encountering ‘the dog chased the cat’ and what he/she does upon encountering ‘cat dog the the chased’. But this is not a difference between ‘the dog chased the cat :: S’ being well-formed and ‘cat dog the the chased :: S’ being ill-formed relative to some constraints on representations — there are no such constraints. Rather, the difference is that in the case of ‘the dog chased the cat’, there is some derivational process that produces the expression ‘the dog chased the cat :: S’, whereas in the case of ‘cat dog the the chased’ there is no derivational process that produces the expression ‘cat dog the the chased :: S’. So what is grasped by a speaker encountering ‘the dog chased the cat’ is some representation like (3): something that encodes the relationships between the ingredients like ‘the :: NP/N’ and ‘dog :: N’ and the things that are built from them like ‘the dog :: NP’.

5. Of course there is also a semantic representation that accompanies each such expression, so they are really triples comprising a string, a meaning and a category. I will leave out the meaning component only for simplicity (an omission that is perhaps particularly egregious given that a distinguishing property of categorial grammars is the manner in which the composition of strings and the composition of meanings take place in sync with each other). The crucial point remains that these objects do not have any syntactic structure, in contrast to the case of transformational grammars.
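To make this concrete, the following is a minimal sketch in Python of the kind of system just described. It is my own illustration, not part of the paper’s formalism, and all names in it are invented for the example: expressions are bare string/category pairs, the two application rules are the only ways of combining them, and the derivation is recorded separately as a nested tuple, the analog of the derivation tree in (3).

```python
# Categories: atomic categories are strings; complex ones are triples
# (slash, result, argument), e.g. ("/", "NP", "N") stands for NP/N
# and ("/", ("\\", "S", "NP"), "NP") for (S\NP)/NP.

LEXICON = [
    ("the",    ("/", "NP", "N")),
    ("dog",    "N"),
    ("cat",    "N"),
    ("chased", ("/", ("\\", "S", "NP"), "NP")),
]

def forward(left, right):
    """Forwards application: X/Y followed by Y yields X."""
    (s1, c1), (s2, c2) = left, right
    if isinstance(c1, tuple) and c1[0] == "/" and c1[2] == c2:
        return (s1 + " " + s2, c1[1])
    return None  # the step is simply underivable

def backward(left, right):
    """Backwards application: Y followed by X\\Y yields X."""
    (s1, c1), (s2, c2) = left, right
    if isinstance(c2, tuple) and c2[0] == "\\" and c2[2] == c1:
        return (s1 + " " + s2, c2[1])
    return None

the, dog, cat, chased = LEXICON
the_dog = forward(the, dog)                   # ('the dog', 'NP')
vp = forward(chased, forward(the, cat))       # ('chased the cat', ('\\', 'S', 'NP'))
print(backward(the_dog, vp))                  # ('the dog chased the cat', 'S')

# The grasped object is not that final pair but the record of how it
# arose, e.g. a nested tuple mirroring (3):
derivation = ("BWD", ("FWD", "the", "dog"),
                     ("FWD", "chased", ("FWD", "the", "cat")))
```

Note that no sequence of forward/backward steps produces ‘cat dog the the chased :: S’, and that is the only sense in which that string is ill-formed here: there are no constraints on structures for it to violate.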

It is clear, then, that in this kind of system the derivational process is doing some real work, in such a way that it makes sense to construe the derivational process itself as the object that corresponds to the representations to be grasped in the setting of a representational theory; see Figure 3.

Figure 3. Taking categorial grammar as our derivational theory, it is very natural to adopt the view in Figure 2 rather than that in Figure 1.

What makes the importance of the derivation so clear in categorial grammars is the fact that, as emphasized above, the expressions constructed by these derivations are just strings (with categories) that have no significant structure. Thus there is, roughly speaking, “nothing but the derivation”, and so when it comes to asking what the theory says about (what a speaker will do upon encountering) a particular sentence, the derivation itself is the only thing to look to. But the general point can be carried over to systems where the derived expressions have more structure, for example, if they are trees rather than strings: in such systems, it is less obvious that it is necessary to treat the derivation with the significance indicated in Figure 2 and Figure 3, but there is no obstacle to doing so if it is useful. My goal in this paper is roughly to show that doing so in the context of modern generative syntax is both useful and, implicitly at least, even standard.

As another example, note that a familiar context-free grammar (CFG) can be understood as a device that generates unstructured objects much like the way categorial grammars do. Specifically, the CFG in (4) can be understood as a collection of statements that allow the expression ‘the dog chased the cat :: S’ to be generated by the derivational process illustrated in (5). As above, this is to be understood as a record of the fact that ‘the :: D’ combined with ‘cat :: N’ to produce the expression ‘the cat :: NP’, which in turn combined with ‘chased :: V’ and so on.

(4) S → NP VP
    NP → D N
    VP → V NP
    D → the
    N → dog
    N → cat
    V → chased

(5) [derivation tree recording how ‘the dog chased the cat :: S’ is built up from the expressions licensed by (4); figure not reproduced]
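As with the categorial example above, this construal is easy to state as a procedure. The following Python sketch (again my own illustration, with invented names) reads each branching rule of (4) as a licence to combine two string/category pairs into a new one.

```python
# Derivational construal of the CFG in (4): a rule 'A -> B C' is read
# as "an expression of category B may combine with a following
# expression of category C to produce an expression of category A".
# The lexical rules (D -> the, etc.) supply the starting expressions.

BRANCHING = {("D", "N"): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}

def combine(left, right):
    """Combine two expressions if some rule licenses the step."""
    (s1, c1), (s2, c2) = left, right
    parent = BRANCHING.get((c1, c2))
    return (s1 + " " + s2, parent) if parent else None

the_cat = combine(("the", "D"), ("cat", "N"))   # ('the cat', 'NP')
vp = combine(("chased", "V"), the_cat)          # ('chased the cat', 'VP')
np = combine(("the", "D"), ("dog", "N"))        # ('the dog', 'NP')
print(combine(np, vp))                          # ('the dog chased the cat', 'S')
```

On this construal nothing tree-shaped is ever derived; the record of which expressions combined with which, i.e. the derivation in (5), is the only structured object in sight.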

From this point of view, a rule such as ‘NP → D N’ is a statement about what can be combined with what (to produce what), and a CFG does not derive structured expressions any more than a categorial grammar does. Accordingly, in order for a CFG understood this way to serve as a model of linguistic competence, it is natural to take the derivation itself to be what is “grasped” by speakers, just as it is with categorial grammars.

This entirely derivational approach is not the only construal of CFGs, however, and perhaps is not even the most common one. Moving to the other extreme, we can instead consider an entirely representational construal. On this view the rules in (4) are understood not as statements specifying allowable derivational operations, but as well-formedness conditions on static expressions (McCawley 1968), just like in HPSG. Expressions of the sort dealt with in (3) and (5) — namely things like ‘the dog :: NP’ and ‘chased the cat :: VP’ — don’t contain enough information for these well-formedness conditions to take their intended effect, and so on this construal we must take the expressions that the grammar works with to be trees. One such tree that is well-formed according to the grammar in (4) is shown in (6).

(6) [phrase-structure tree for ‘the dog chased the cat’, with S dominating NP and VP, and so on down to the words; figure not reproduced]

On this view, the rule ‘NP → D N’ is not a statement about what can be combined with what, or about anything that an abstract derivational procedure can or cannot do. It is a statement that says, of a static tree-shaped expression, “If a node is labeled NP and has two daughters, the left of which is labeled D and the right of which is labeled N, then that node is well-formed”. If all the parts of a tree are well-formed according to this interpretation of the grammar, then the tree is well-formed. A representation of this static tree is what is taken to be grasped by a speaker upon encountering the sentence ‘the dog chased the cat’, in just the same way that the static HPSG representation on the left-hand side of Figure 1 and Figure 2 is.
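The representational reading can likewise be made concrete. In the Python sketch below (my own illustration; the names are invented), the same rules are consulted only to check the nodes of a static tree, and no combining procedure figures anywhere.

```python
# Representational construal of (4): each rule is a well-formedness
# condition on a node of a static tree. A tree is a pair
# (label, children); a preterminal node's single child is a word.

NODE_OK = {("NP", ("D", "N")), ("VP", ("V", "NP")), ("S", ("NP", "VP"))}
LEX_OK = {("D", "the"), ("N", "dog"), ("N", "cat"), ("V", "chased")}

def well_formed(tree):
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0]) in LEX_OK   # preterminal node
    daughters = tuple(child[0] for child in children)
    return (label, daughters) in NODE_OK and all(well_formed(c) for c in children)

tree6 = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
               ("VP", [("V", ["chased"]),
                       ("NP", [("D", ["the"]), ("N", ["cat"])])])])
print(well_formed(tree6))   # True: every node satisfies some condition
```

Here the tree itself is the object to be grasped; the checker merely confirms that each of its nodes satisfies one of the static conditions.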

Figure 4. The equivalence of the representational and derivational construals of CFGs rests on allowing the entire derivation to “count”, as in Figure 2.

Figure 5. Restricting ourselves to understanding derivational theories as in Figure 1 would entail that the representational and derivational construals of CFGs were not equivalent.

Of course, these two construals of the CFG in (4) are barely distinguishable (if at all) as cognitive hypotheses. This feeling that they are one and the same is based on the assumption that the derivational process in (5) can serve as an object to be grasped, so that the two construals stand in the trivial relationship to each other illustrated in Figure 4. Denying this role to derivations themselves would force us to conclude that the construal illustrated in (5) differed significantly from the construal illustrated in (6): we would end up with the relationship between the two illustrated in Figure 5, where only the representational construal in (6) is sensible because, as discussed in relation to the categorial grammar example, it makes no sense to suppose that the object to be grasped is simply ‘the dog chased the cat :: S’. I will argue that we should reject the view of modern minimalist derivations illustrated in Figure 1 for essentially the same reason that we reject the view of CFGs illustrated on the right of Figure 5.

1.2. Mixed systems

Recall from above that when a grammar deals with expressions that have more structure than strings — for example, trees — it is less obviously necessary that the derivation must take on the significance indicated in Figure 2 and Figure 3, but nonetheless still possible. The question is whether the possibility of doing so is useful. As an illustration of what doing so looks like when it is not useful, we can consider (somewhat perversely) a construal of the CFG in (4) according to which it specifies a derivational process (like in (5), but unlike in (6)) that works with structured expressions (like in (6), but unlike in (5)). A derivation in this unwieldy and redundant system is shown in (7).

(7) [derivation of ‘the dog chased the cat’ in which the expressions being combined are themselves trees, so that each derived tree duplicates the derivational steps that produced it; figure not reproduced]
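A Python sketch of this redundant construal (my own illustration, with invented names) makes the duplication plain: the derivational step builds a tree, and the very same rule could then be re-applied, representationally, to check the tree it has just built.

```python
# Mixed, redundant construal of (4): rules both drive tree-building
# and re-describe the trees that result.

def build(parent, left, right):
    """Derivational reading of 'parent -> left right': combine two
    trees into one whose new root is labeled 'parent'."""
    return (parent, [left, right])

np = build("NP", ("D", ["the"]), ("N", ["dog"]))

# Representational reading of the same rule, checking what was built:
assert np[0] == "NP" and [child[0] for child in np[1]] == ["D", "N"]
# The check cannot fail: the derived tree already encodes the very
# derivational step that produced it. Grasping the whole derivation
# therefore adds nothing beyond grasping the final tree.
```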

Now the rule ‘NP → D N’, for example, says two mutually redundant things. First, with regard to the derivational process, it says that it is possible to put together a tree with a root node labeled D and a tree with a root node labeled N, to form a tree with a new root node labeled NP. Second, with regard to the structured expressions that are derived, it says that a node labeled NP is well-formed if it has two daughters, a left daughter labeled D and a right daughter labeled N. These “two things” that the rule says are redundant since, of course, they are really just the one thing said in two different ways. So the redundancy stems from the fact that a CFG really only has one thing to say, and while that one thing can be expressed and enforced either derivationally as in (5) or representationally as in (6), having the rule enforce it in both ways is redundant.

The crucial point to note is that because of this redundancy, it is plausible to take the final expression derived by the entire derivational process, namely the tree at the root node in (7), as the object that is grasped by a speaker, since — in contrast to the situation illustrated earlier in (5) and Figure 5 — grasping the entire derivational process provides no additional information beyond what is provided by grasping the final derived expression. If all derivational systems that worked with structured expressions were redundant in this way, then the conception I began with in Figure 1 would be reasonable. But my aim here is to show that this is not the case.

I will use the term purely derivational for systems like (3) and (5), where “everything the grammar says” is expressed derivationally; and purely representational for systems like (6), where everything is expressed representationally. The system illustrated in (7) is neither purely derivational nor purely representational — it is what I will call a mixed system, since the grammar makes both representational and derivational statements.6 So to repeat the crucial point, although this first example of a mixed system has the property that the representational and derivational aspects are mutually redundant, there are other mixed systems that are not redundant in this way — instead, some parts of the important work are accomplished derivationally, and other parts are accomplished representationally.

One example of a mixed but non-redundant system is the framework of early transformational grammars in Miller and Chomsky (1963) and Chomsky (1965). A clear illustration of this comes from the famous comparison between the two sentences in (8) and (9) (see Miller & Chomsky 1963: 476-480).

(8) John is easy to please.

(9) John is eager to please.

6. Note that it does not make sense to ask whether the set of rules in (4) itself is purely derivational or purely representational (or mixed). It depends on whether those rules are interpreted as well-formedness conditions on static representations, or as statements about what can be derived from what.
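Footnote 6's point, and the redundancy of the mixed construal in (7), can be made concrete by extending the toy sketch from Section 1.1 (again an illustrative assumption, not the paper's notation): in a (7)-style system the derivational history is recoverable from the final derived tree alone, so grasping that tree loses nothing.

    def steps(tree):
        """Read a (7)-style derivational history off a derived tree:
        one 'put-together' step per internal node, bottom-up."""
        cat, *kids = tree
        if len(kids) == 1 and isinstance(kids[0], str):
            return []                      # lexical leaf: no step to recover
        return ([s for k in kids for s in steps(k)]
                + ["combine " + "+".join(k[0] for k in kids) + " into " + cat])

Applied to the tree for ‘the dog chased the cat’ defined above, steps returns the complete list of derivational steps, which is exactly why the final expression can safely be taken as the grasped object in this redundant system.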

Each of these sentences is derived by base-generating two monoclausal “underlying P-markers”, and then manipulating and combining these P-markers (these are the expressions that this system works with) to arrive at a single “derived P-marker”, as illustrated in (10) and (11).

(10) (11) [derivation diagrams not reproduced: two underlying P-markers at the bottom, a derived P-marker at the top, for (8) and (9) respectively]

Like the understanding of CFGs illustrated in (7), this is a derivational system that works with structured expressions (specifically, trees, rather than strings), so this is a mixed system. The grammar licenses certain derivational steps that relate P-markers to one another — specifically, certain transformations, such as the transformation that combines two S-rooted trees and the transformation that fronts an NP from an embedded object position to overwrite ‘it’ — and also imposes certain representational constraints (“surface filters”7) on the eventual derived expression.

7. I am taking some liberties with the historical details here: transformational grammars generally did not include surface filters until after the 1960s, but the fact that they were soon introduced leaves the main point unaffected.

But unlike the first example of a mixed system in (7), this is not redundant: in this case, the work that is performed derivationally and the work that is performed representationally are separate, and accordingly grasping the entire derivational process provides more information than does grasping the final derived expression alone.

Furthermore, it is clear that the intended interpretation of these early transformational grammars did involve the idea that a speaker encountering (8) or (9) grasped the entire derivational process illustrated in (10) or (11). This pair of sentences provides a dramatic illustration of this: the interesting point about this pair is that speakers understand them to have different structures in some important sense — as evidenced by the fact that speakers understand ‘John’ to be the pleasee in (8) but the pleaser in (9), and the fact that speakers know there is an expletive-‘it’ variant of (8) but not (9), etc. The crucial point to note is that the theory would not provide any account of these differences if one supposed that the object grasped by speakers were simply the eventual derived structures, because these two structures are identical (modulo the alternation of ‘easy’/‘eager’ itself), as (10) and (11) make clear. In order to provide any explanation for the different ways in which speakers treat these two sentences, the derivational processes posited by the theory, i.e. the entire tree structures shown in (10) and (11), must be the objects thought to be grasped by speakers. This point is not only logically necessary in hindsight, but was clearly the intended interpretation at the time:

[…] we see that the grammatical relations of ‘John’ and ‘please’ in [(8)] and [(9)] are represented in the intuitively correct way in the structural descriptions provided by a transformational grammar. The structural description of [(8)] consists of the two underlying P-markers [at the bottom of (10)] and the derived P-marker [at the top of (10)] (as well as a record of the transformational history T1, T4, T5). The structural description of [(9)] consists of the two underlying P-markers [at the bottom of (11)] and the derived P-marker [at the top of (11)] (along with the transformational history T1, T2, T3). Thus the structural description of [(8)] contains the information that ‘John’ in [(8)] is the object of ‘please’ in the underlying P-marker [at the bottom right of (10)]; and the structural description of [(9)] contains the information that ‘John’ in [(9)] is the subject of ‘please’ in the underlying P-marker [at the bottom right of (11)].

[one component of the perceptual model] will utilize the full resources of the transformational grammar to provide a structural description, consisting of a set of P-markers and a transformational history, in which deeper grammatical relations and other structural information are represented.

Miller & Chomsky (1963: 479-480)

This kind of transformational grammar is therefore a mixed system where the final derived expression does not provide all of the grammatically relevant information — in essentially the same way as was noted earlier with respect to the purely derivational systems in (3) and (5). So having trees instead of strings as the derived expressions does not automatically make the derivational process redundant.

1.3. What kind of system is modern transformational syntax?

Against the backdrop of these distinctions — between purely derivational/representational and mixed systems, and between redundant and non-redundant mixed systems — we can now ask what kind of system contemporary versions of transformational grammar are. Clearly they are mixed systems of some sort, since they are specified derivationally and work with structured expressions, so the question is whether the derivational process that derives a structured expression provides additional information that is not encoded in the derived expression itself, like in (10) and (11), or is redundant, like in (7). To the extent that the derivational process provides additional information that theories appeal to, the background assumption that researchers are working with must be the view outlined in Figure 2 (because the view in Figure 1 would put this information “out of bounds”).

It is at this point that we must contend with the “unfortunate quirk” of modern transformational grammars mentioned in the introduction. Over the decades a number of representational devices have been introduced that encode in the final derived expression information that previously was encoded only in underlying phrase markers, such as traces/copies and co-indexed silent elements like PRO. This has created a situation where, in very many cases, the eventual derived expression does uniquely identify the history of transformational operations (essentially, merge and move steps) that derived it. In such cases, recovering the derivational process itself is redundant, in much the same way as it is in (7); and the prevalence of such cases might create the impression that researchers are working with the view in Figure 1.

But I will argue that this does not seem to be the case in general: even in the minimalist era, there are clear instances of proposals that only “make sense” under the view that the entire derivation is relevant (Figure 2), in ways that are formally analogous to the ‘easy to please’/‘eager to please’ analysis discussed above. I discuss some of these in Section 1.3.1. A natural and important question to ask, admittedly, is why such cases have become so rare, i.e. why derivational information so frequently ends up “duplicated” via representational devices. I will argue in Section 1.3.2 that this is simply the result of historical accidents that have led to a theoretical architecture that makes thinking about these questions unnecessarily difficult.

1.3.1. Sensitivity to derivational history

The most obviously relevant development since the system illustrated in (10)/(11) is the introduction of traces. One possibility is that the introduction of traces coincided with a wholesale adjustment away from the perspective where complete derivations are the relevant objects, towards a view where only the derived structure matters. Two indications that this was not the case can be seen in arguments motivating the Strict Cycle Condition (Freidin 1978, 1999) and in the proposal by Lebeaux (1988, 2000) to account for anti-reconstruction effects by allowing late adjunction. More recently still, movement has generally been taken to leave behind not just a trace of the moved constituent, but rather a full copy of it. This has the consequence of duplicating still more information between the derivational process and its output, but even this development does not seem to have coincided with a switch to a view where only the derived structure matters.
Appeals to information that cannot be gleaned from derived structure can be seen in, for example, Lasnik (1999) and McCloskey (2002). As a first example, consider the argument for the Strict Cycle Condition based on the unacceptability of (12) (Chomsky 1973). Freidin (1978: 524) (see also Freidin 1999: 100) points out that in the context of the assumption that intermediate traces of successive cyclic movement can be deleted, one needs both subjacency and the Strict Cycle Condition to rule out such a sentence.

(12) *What did he wonder who ate?

There are two relevant derivations to consider:

(13) he wondered [who ate what]
     he wondered [whoi ti ate what]
     whatj he wondered [whoi ti ate tj]

(14) he wondered [who ate what]
     he wondered [whatj who ate tj]
     whatj he wondered [tj who ate tj]
     whatj he wondered [whoi ti ate tj]

In (13), first ‘who’ moves to the embedded SpecCP position, and then ‘what’ is forced to move in a single step to the matrix SpecCP position, violating subjacency. But on the assumption that intermediate A-bar traces can be deleted/overwritten, the fact that the derivation in (13) violates subjacency is not sufficient to rule out the sentence, because the derivation in (14) provides a way around subja- cency: move ‘what’ to the matrix SpecCP in two subjacency-obeying steps, and then move ‘who’ into the embedded SpecCP position (overwriting the trace of ‘what’). The additional constraint that is needed is the Strict Cyclic Condition, which prevents the order of operations in (14). The important point for our purposes is that the two derivations in (13) and (14) produce the same final derived expression, as the last lines of each make clear. Thus it would make no sense to point out that, without the Strict Cycle Condition, (13) would be ruled out as desired but (14) would not, unless the things being ruled out and ruled in were derivations. Put differently: if we suppose that the expression on the last line of (13) is what the theory says is grasped by a speaker upon encounter- ing the string in (12), then it would make no sense to say that although this expres- sion is correctly classified as ungrammatical, something more must be added to our theory to rule out (the expression on the last line of) (14). As in the ‘easy’/‘eager’ example, the final derived expression underdetermines the entities that the theory is evidently taken to actually care about — namely, the derivations themselves.8

8. Freidin (1978) notes that a ban on the deletion of traces achieves the same result. But this does not change the fact that the argument for the Strict Cycle sketched here, which relies on the view that derivations are the objects being ruled in and ruled out, was taken as a valid pattern of reasoning regarding the consequences of not having such a ban. Freidin in fact argues for the approach that disallows trace deletion rather than the one that enforces the Strict Cycle Condition. The shift from deletable traces to non-deletable traces can be seen as part of the broader trend towards more and more “substantive” residues of movement, ensuring that more and more derivational history is encoded in the final derived object, culminating with the full-fledged copy theory of movement.
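The logic here can be stated very plainly in code: model a derivation as the sequence of expressions it passes through (a toy string encoding; the variable names are this sketch's assumptions), and (13) and (14) come out as distinct objects with identical final members.

    deriv_13 = [
        "he wondered [who ate what]",
        "he wondered [who_i t_i ate what]",
        "what_j he wondered [who_i t_i ate t_j]",
    ]
    deriv_14 = [
        "he wondered [who ate what]",
        "he wondered [what_j who ate t_j]",
        "what_j he wondered [t_j who ate t_j]",
        "what_j he wondered [who_i t_i ate t_j]",
    ]
    assert deriv_13 != deriv_14           # distinct derivations ...
    assert deriv_13[-1] == deriv_14[-1]   # ... same final derived expression

A constraint like the Strict Cycle Condition is a predicate on the sequences themselves; a theory whose objects were only the final expressions could not even state it.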

As another example of this kind of situation, consider the analysis of the contrast in (15) proposed by Lebeaux (1988, 2000).

(15) a. Which pictures [that Johni took] did hei like?

b. *Whose claim [that Johni took pictures] did hei deny?

Lebeaux’s influential account of this contrast involved supposing that the relative clause in (15a) could be added after the wh-movement transformation that fronts ‘which pictures’, since the relative clause is not required to be present in d-structure. The bracketed clause in (15b), however, being a complement rather than an adjunct, does not have this flexibility, and therefore has no way to avoid the Condition C violation induced by the co-indexed matrix subject ‘he’ at d-structure. The crucial point for our purposes here is that this distinction between the derivation of (15a) that circumvents Condition C and the derivation of (15b) that violates it was not encoded in the S-structure phrase markers that were assumed at the time. The two trees shown in (16) do not themselves differ in any respect that is relevant to compliance with Condition C.9

(16) a. [S-structure tree for (15a), not reproduced]

9. Lebeaux (2000: 107-108) is quite explicit about this: “There are two possible derivations for [(15a)]. In one, Adjoin-α applies prior to Move-α. […] In this derivation … Condition C will apply to the intermediate structure, ruling it out. […] There is, however, another derivation [in which] Move-α […] applies before Adjoin-α. This derivation gives rise to the appropriate s-structure as well.” (emphasis added).

b. [S-structure tree for (15b), not reproduced]

The theory would not provide any account for the fact that speakers’ judgements of (15a) differ from their judgements of (15b) if it were assumed that speakers grasped only the trees in (16). This is analogous to the way the early transformational grammars would not provide any account for speakers’ differing judgements regarding the ‘easy’/‘eager’ contrast if it were assumed that speakers grasped only the derived P-markers shown at the top of (10) and (11). Instead, the theory must be interpreted as claiming that speakers grasp the entire derivational process, including specifications of what the D-structure phrase marker looked like and whether or not the relative clause was adjoined via a transformation that followed the wh-movement of ‘which pictures’: picture representations along the lines of (10) and (11) with D-structures at the bottom and the trees in (16) at the top, where one of the crucial transformations involves adding the relative clause to a post-wh-movement structure to produce (16a).

Note that the important point here is independent of whether Lebeaux’s analysis is correct. What is significant is that there does not appear to have been any objection to the analysis based on the idea that since the crucial distinction is not encoded in the trees in (16), the theory cannot account for the contrast in judgements in (15). If it were standardly assumed that final derived expressions10 were the objects that were grasped by speakers, then one would expect this objection to be raised.

10. Perhaps the point becomes even clearer in light of the fact that there is no single “final derived expression” in the GB system assumed by Lebeaux (1988). Grasping only the s-structure phrase marker would provide no encoding of, for example, the scope of covertly-moving expressions; and grasping only the LF phrase marker would provide no distinction between, for example, the wh-phrases that are pronounced in fronted positions and the wh-phrases that are pronounced in their base positions in English multiple wh-questions. The only reasonable interpretation of GB-style theories is to suppose that speakers grasp representations at all four levels — d-structure, s-structure, PF and LF — along with a history of the transformations that relate these four to each other.

This kind of situation, where a derived expression underdetermines its derivational history, becomes less likely in the context of the recent shift to adopt copies rather than traces as the residues of movement. In particular, the two examples just discussed would no longer provide clear evidence that derivations are the relevant objects if we update them in accord with these more modern assumptions: the derived structure for (15a) would encode the crucial distinction between early and late adjunction by having a copy of the wh-phrase with or without the adjunct in the low position; and the assumption that residues of movement can be deleted/overwritten is (at best) difficult to reconcile with the idea that these residues are full-fledged copies, undermining the argument based on cyclicity.

So we might ask whether, even if the derivational mindset that was clear in Miller and Chomsky (1963) was not abandoned with the introduction of traces in the 1970s, perhaps it was abandoned later with the introduction of copies. But this also does not seem to be the case. For example, Lasnik (1999) raises the possibility that A-movement may not leave a copy, as an explanation for the fact that A-movement does not show reconstruction effects. As a consequence, the final derived expression would not encode the base positions of A-moved elements, and therefore could not be used to determine which theta roles they are assigned (or even whether they had been assigned theta roles, as the Theta Criterion or equivalent would require). Lasnik suggests instead that “θ-roles are ‘checked’ in the course of a derivation” (p. 207) — for this to amount to any account of why speakers interpret sentences involving A-movement in the ways that they do, the background assumption must be that they grasp a complete derivational history. This is certainly not an uncontroversial proposal, and Lasnik notes that it departs from Chomsky’s (1995) assumption that θ-roles are “determined specifically at the LF level” (i.e. in the final derived expression), but there is no sign that this departure requires an adjustment to the fundamental question of which objects are grasped by speakers. In fact Lasnik notes (p. 208) that his proposal can be seen as a direct descendent of the way the Standard Theory takes θ-assignment to be a “base property”, as illustrated with the ‘easy’/‘eager’ example above.

Another example can be seen in McCloskey’s (2002) account of certain facts involving complementizers in Irish. This proposal’s “core claim is that the morphosyntactic make-up of a head is influenced not by the syntactic material with which it is in a local relation, but rather by the mode of introduction of that material” (p. 202).
Specifically, a C head is pronounced as ‘aL’ if its specifier was filled by an application of move, and as ‘aN’ if its specifier was filled by an application of merge (and as ‘go’ if its specifier is not filled). So inspecting the contents of the C head’s projection in the final derived expression will not suffice to determine which of these pronunciations is applicable for a given structure.11

11. This is perhaps not as clear a case as some of the earlier examples, because even though inspecting the C head’s projection will not provide the relevant information, inspecting the entire derived phrase marker will: if other copies of the phrase that fills the SpecCP position are present lower in the structure, then it will follow that the SpecCP position was filled by move. But McCloskey makes no mention of this and states the relevant criterion in terms of the derivational operations themselves. This would be surprising if he was working under the assumption that this information could be recovered only via inspection of copies.

McCloskey notes that the proposal is unusual in this respect, but again there is no sign that he takes this novelty to involve an adjustment to our understanding of the fundamentals of what a syntactic derivation is.

A striking recent example of “nonmonotonic derivations” is Müller’s (2017) proposal of a derivational system that includes a removal operation, which deletes a projection from a phrase marker. The motivation for this is cases of apparent “conflicting representations”, where some diagnostics indicate that a certain element is present and others indicate that it is absent; Müller proposes that in such situations, some diagnostics are revealing properties of the pre-removal structure and others are revealing properties of the post-removal structure. The question of whether passives include a syntactic representation of an external argument, for example, is treated this way. It is not immediately obvious whether this will create situations where two importantly distinct derivations lead to a single derived structure, as in some of the cases discussed above — but it seems natural to assume that Müller intends for a derivation where a certain element is merged and then removed to be meaningfully different from any derivation where that element is never merged in the first place.12 This fits with the view that entire derivational histories are the central theoretical objects (Figure 2), but not with the view that final derived expressions suffice (Figure 1).

1.3.2. The rise of representational devices

Let us suppose, then, that the implicit assumption in contemporary minimalist syntax is still that speakers grasp entire derivations, in the manner that is straightforwardly necessary for the ‘easy’/‘eager’ contrast in (10) and (11). Why then have we seen such an increase in representational devices, which make this implicit assumption easier to overlook? If the reason for these developments was not a shift to a point of view more in line with Figure 1, where the final derived object encodes all grammatically relevant information by design, then what was the reason? To the extent that no other reason can be identified, the argument I made in the previous subsection would be weakened. To be concrete: why is it that contemporary theories would assign derived structures something like those in (17) to the ‘easy’/‘eager’ sentences, rather than those shown at the top of (10) and (11)?

(17) a. Johni is easy [ti PROarb to please ti]

b. Johni is eager [PROi to please]

12. It is interesting, in light of the discussion in the next subsection, that Müller acknowledges (p. 27) that “structure removal may indeed lead to incompatibilities with the standard concept of transparent logical forms as laid out, e.g., in Heim and Kratzer (1998)”, i.e. incompatibilities with the idea that semantics is computed from derived expressions, and suggests instead that the derivation tree (or T-marker) serves this purpose; cf. the discussion around (20) and (21) below. So Müller acknowledges that his proposal requires a departure from one dominant assumption about the object that serves as the basis of semantic interpretation (to which there are viable alternatives), but makes no mention of any required shift regarding the issue reflected in the choice between Figure 1 and Figure 2.

Note that it is no answer to simply say that “We need a co-indexed PRO there in order to represent the fact that ‘John’ receives the subject theta role from ‘please’” — this begs the question, since we have seen that there are alternative, derivational ways of representing this information.

A problem that soon arose for the “purely deep structure” encoding of semantic information as in (10) and (11) was the fact that some transformations can affect semantic interpretation. For example, the passive transformation affects the relative scope of the two quantifiers in (18), and the raising of ‘every boy’ in (19a) affects the ability of this quantifier to bind the variable in ‘his’.13

(18) a. Everyone in this room knows two languages.
     b. Two languages are known by everyone in this room.

(19) a. [Every boy]i seems to hisi mother to be intelligent.

b. *It seems to hisi mother that [every boy]i is intelligent.

Since such pairs of sentences are derived from pairs of D-structures that are equivalent in all relevant respects, it is not possible to maintain the view of the Standard Theory (Katz & Postal 1964; Chomsky 1965) that deep structures were the only objects relevant to semantic interpretation. Somehow the theory needed to allow semantic interpretation to be dependent on both D-structure, where thematic relations were encoded, and S-structure, where the scope of quantifiers and other operators was encoded; see e.g. van Riemsdijk & Williams (1986: 80-87) for discussion. The model of grammar based on this assumption that both D-structure and S-structure contributed to semantic interpretation became known as the Extended Standard Theory.

While the basic point that semantic interpretation depended on both D-structure and S-structure configurations is clear enough, it is less clear exactly how a system of compositional semantic rules might operate if it is to take two distinct trees as input. Logically speaking, there are two different ways in which things could be re-envisaged so as to provide a single object that encodes all semantically-relevant information “in one place”.

The first possibility is to take the input to semantics to be derivational histories rather than D-structures. The idea here would be to take each derivational operation to be associated with some particular compositional semantic rule, in the style of Montague (1974) and much subsequent work; in modern terminology this kind of approach is sometimes described as “directly compositional” (Barker & Jacobson 2007). The raising transformation that applies in the derivation of (19a), for example, would affect the syntactic and semantic computations in parallel: it would displace the phrase ‘every boy’ into its matrix clause position on the syntactic side, and widen the scope of this phrase’s interpretation on the semantic side. Rather than

13. See e.g. van Riemsdijk & Williams (1986: 83); Chomsky (1975: 97-98); Lasnik & Lohndal (2013: 37). The contrast in (18) was in fact noted in Chomsky (1957: 100-101).

a phrase marker such as D-structure or S-structure being the structure on which compositional rules would operate, a hierarchical description of the derivational history would serve this role. This kind of object is essentially what was depicted earlier in (10) and (11), but it suffices to only identify the transformations applied at the internal nodes, as in (20) and (21).

(20) (21) [derivation trees not reproduced: like (10) and (11), but with internal nodes identified only by the transformations applied]
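As a toy illustration of the directly compositional option (the operation names and string "denotations" below are invented for this sketch, in the spirit of, but much cruder than, Montague-style rule-by-rule interpretation), each derivational operation is paired with a semantic rule, so interpretation is read off a T-marker-like derivation tree rather than off any derived phrase marker:

    # Each derivational operation is paired with a compositional semantic rule.
    SEMANTIC_RULES = {
        "merge_pred_arg": lambda pred, arg: f"{pred}({arg})",
        "raise_subject":  lambda clause:    f"WIDEST_SCOPE({clause})",
    }

    def interpret(node):
        """Interpret a derivation tree directly: leaves are word meanings,
        internal nodes apply the rule paired with the operation they name."""
        if isinstance(node, str):
            return node
        op, *children = node
        return SEMANTIC_RULES[op](*map(interpret, children))

    t_marker = ("raise_subject", ("merge_pred_arg", "intelligent", "every_boy"))
    print(interpret(t_marker))    # WIDEST_SCOPE(intelligent(every_boy))

The point is architectural: since each step contributes its own piece of meaning, no single phrase marker ever needs to carry all semantically relevant information.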

Such hierarchical objects were known as T-markers in early transformational grammar (see e.g. Chomsky 1965: 130). Given a semantic rule associated with each transformation, it would be possible to retrieve from these objects both the additional syntactic information that is represented explicitly in (10) and (11) and a semantic interpretation for the completed sentences.14 Kobele (2006, 2012) provides examples of what it would look like to apply this idea to a minimalist-style transformational grammar.

As it happened, however, this is not the way things proceeded. A second option was made possible by the introduction of traces: the input to semantic interpretation was taken to be not D-structure as in the Standard Theory, nor the derivation

14. In fact there was some work pursuing exactly this idea before the Standard Theory assumption that only deep structure was relevant came to be adopted: “Suppose S has been constructed from a certain set of source sentences by the optional transformation T. A type 2 rule is a rule which operates on the semantic interpretations of these source sentences and on either the derived constituent structure characterization of S or on the transformation T in order to produce a semantic interpretation of S” (Katz & Fodor 1963: 206). This was a way to flesh out the assumption in Chomsky (1957) that the complete derivation was relevant to semantic interpretation: “In the earliest generative model, the interface is the T-marker, which includes all of the syntactic structures created in the course of the derivation. Subsequent models had the following interfaces with semantics: The Standard Theory had D-structure, the Extended Standard Theory had D-structure and S-structure” (Lasnik & Lohndal 2013: 39). The adoption of the Standard Theory assumption meant that meaning-changing transformations such as question-formation and negation, which were optional in the pre-Aspects framework, had to instead become obligatory transformations triggered by question and negation morphemes that were present at D-structure. These morphemes might be thought of as representational encodings of the “derivational future”, made necessary by the D-structure assumption, analogous to the way traces are representational encodings of the “derivational past”.

as a whole as in the first option just outlined, but rather S-structure. The thematic information that was not available at S-structure in the Standard Theory could now be retrieved at that level via the traces left by transformations that moved things out of their thematic positions.15 This in turn subsequently developed into the idea that the input to semantic interpretation is an “even later” level of representation, namely LF, with traces (or copies) still encoding all positions that constituents had moved through in earlier stages of the derivation. But importantly, the possibility of this approach to semantic interpretation was a by-product of the presence of traces, not a motivation for introducing traces:

The motivation for [traces] was that in important respects, movement gaps behave like positions that are lexically filled, an argument first made in Wasow (1972) and Chomsky (1973). (Lasnik & Lohndal 2013: 38)

The argument from Wasow (1972: 138-142) is that strong crossover violations such as (22a) can be accounted for by supposing that wh-movement leaves behind a trace that is an R-expression, such that the trace in (22a) violates Condition C in just the same way that ‘John’ does in (22b).

(22) a. *Whoi did hei say Mary kissed ti?

b. *Hei said Mary kissed Johni

The argument from Chomsky (1973: 265-267) is based on the need to rule out (23). Without a trace left behind by the raising of ‘John’, the theory under consideration would have no way to prevent application of the rule that relates ‘the men’ to ‘each other’; but if a trace is left, then this is blocked by the Specified Subject Condition (SSC).

(23) *Johni seems to the men ti to like each other.

In slightly more neutral terms, the key observation here is that a clause does not become “subjectless” when its subject is moved out of it. (Lightfoot 1976: 560 also presents this particular case as “the earliest motivation for introducing the notion of a trace”.)16

These two arguments are based on the idea that there must be “something there” in vacated positions: something to be constrained by Condition C (or equivalent) in (22a), and something to serve as a specified subject in (23). A slightly different kind of argument for traces concerned the relationship between a moved element and its trace:

15. See van Riemsdijk & Williams (1986: 186), Chomsky (1975: 96), Lasnik & Lohndal (2013: 38). Actually, even the introduction of traces did not immediately make it entirely straightforward that thematic information could be recovered at S-structure because of the possibility that traces could be deleted or overwritten by subsequent transformations; see e.g. Fiengo (1977: 58-60) and Lightfoot (1976: 560, note 2) for some discussion.

16. Omer Preminger (p.c.) points out an additional roughly contemporaneous introduction of the idea: Baker & Brame (1972: 56) propose that transformational rules “may be restated in such a way that they leave a special feature or boundary symbol behind in the place formerly occupied by the moved constituent”, as a possible explanation for the much-discussed ‘wanna’-contraction paradigm.

The principal motivation for traces comes from the parallelism between movement structures and antecedent-anaphor relations. […] Essentially, movement must always be to a c-commanding position and an anaphor must always be c-commanded by its antecedent. […] we might say that a trace has anaphor properties and that the moved phrase has antecedent properties. (van Riemsdijk & Williams 1986: 141-142)

And these two kinds of dependencies share not only the c-command requirement but also locality constraints: a single constraint is apparently blocking both movement and an anaphor-antecedent relation in (24), and a single constraint (perhaps the same one) is apparently blocking both in (25) (van Riemsdijk & Williams 1986: 143).

(24) a. *Johni was expected that ti would win

b. *Johni expected that himselfi would win

(25) a. *Johni was expected Bill to kill ti

b. *Johni expected Bill to kill himselfi

The core idea here is that conditions on the application of movement rules could be recast as conditions on the distribution of traces, and that “the distribution of trace at the level of surface structure follows from some quite natural conditions on bound anaphora” (Fiengo 1977: 53). The posited connections between movement dependencies and antecedent-anaphor dependencies were developed further, to the point where the distinctive distributions of PRO, A-traces and Ā-traces were all accounted for to some large extent by the Binding Principles (Chomsky 1981, 1986). And it was apparently very natural to understand the Binding Principles as representational constraints, since their canonical purpose was to constrain the distribution of various kinds of NPs which were taken at the time to be base-generated, rather than by-products of certain transformations. So when similarities were noted between, for example, the configurations in which reflexives could appear and the configurations in which raising was licit, the natural way to bring them under the same umbrella was to suppose that there is an A-trace at S-structure that is subject to the existing, representational constraint on the distribution of reflexives (i.e. some analog of Principle A).17

17. If analyses of reflexives as transformationally-derived, in the style of Lees and Klima (1963), had remained dominant throughout the intervening years, then it would perhaps have been less obvious that the unification had to proceed by bringing everything under a representational umbrella. The other natural alternative would have been to suppose that there is something important shared by the derivational operation that establishes reflexive-antecedent pairs and the derivational operation that implements raising. (Indeed the Chomsky 1973 argument based on (23) still assumed a derivationally-established relationship between ‘each other’ and its antecedent.) While these two kinds of dependencies were in fact unified under a representational umbrella in the GB era, minimalist theories arguably tend more towards unifying them derivationally; see discussion surrounding (26) below.

To the extent that this kind of logic drove the shift towards enriched representations, the shift had nothing to do with a preference for representational encodings of locality conditions, nor a preference for having a particular representational level as the input to semantic interpretation,18 nor with a preference for avoiding situations of the sort presented in Section 1.3.1 where it must be assumed that speakers grasp derivational histories; rather it was simply an attempt to unify various primitives that had been empirically discovered to pattern together. If two things (reflexives and raising) behave alike and one (reflexives) is taken for granted to be constrained representationally, then it is natural to create a representational reflex of the other in order to bring the two into line.

Furthermore, the half of this scenario that was taken for granted to be representational in the 1980s is arguably no longer thought to be so. The general trend in minimalist syntax has been to try to derive the effects of earlier representational constraints, such as Principle A, from derivational constraints on merge and/or move, such as some version of a Shortest Move condition (e.g. Hornstein 1999, 2001; Kayne 2002). Very broadly speaking: a derivational explanation of the ungrammaticality of (26a), based on the fact that it involves a movement step that goes too far, would most likely be more in keeping with contemporary thinking than would a representational explanation of the ungrammaticality of (26b), based on the fact that it involves a trace/copy that is not appropriately bound/licensed.

(26) a. *Johni thinks that Mary likes himselfi
     b. *John is likely that it seems to be tall

If indeed the locality constraints on raising, for example, are nowadays to be explained via minimality-style limits on the applicability of movement transformations, nothing would be lost by reverting to a system which, like 1960s transformational grammars, has no traces or copies left by raising: the traces/copies are no longer of any relevance to Principle A. (And recall that “we need a co-indexed silent element there in order to encode the theta role that ‘John’ received in its

base position” only begs the question — even if semantic interpretation cannot be computed from D-structure alone, directly compositional interpretation, discussed around (20) and (21) above, remains an option.) In other words, if this line of reasoning is correct, then we have roughly reverted to using derivational mechanisms to constrain both antecedent-anaphor dependencies and movement dependencies, as was the case before the introduction of traces; but despite the fact that traces were introduced with the aim of representationalizing those constraints (a purpose that they no longer serve), we have maintained the assumption that movement leaves some kind of representational residue. This representational residue is now redundant, like the tree structure in (7).19

So not only was the representational shift driven by empirical practicalities rather than architectural preferences, but the assumptions that made the shift practical in the 1970s and 1980s arguably no longer hold. If this analysis is correct, then it suggests that many of the representational encodings of derivational history in modern syntax — for example, the unpronounced copy left in the lower clause of a raising construction — are unnecessary. Put differently, we have a mixed system where there is a certain amount of redundancy, but it is arguably the representational aspects that are redundant, not the derivational ones.20

18. Admittedly, the idea that traces would allow for interpretation to depend only on surface structure quickly became appealing in itself. Lightfoot (1976: 560) writes that “The earliest motivation for introducing the notion of a trace was the desire to employ the Specified Subject Condition [in (23)]. But much of the subsequent appeal of the theory seems to lie in the claim that it yields exactly the right information to support semantic interpretation at the level of surface structure”; Lightfoot goes on to argue for the “pluralist” view of traces that takes them (as I do here) to be independently syntactically motivated with the semantic consequences as a by-product, over the “exclusively semantic” view that takes them to be motivated only by the requirement of semantic interpretation. Chomsky (1975: 97) goes as far as to say that “The original motivation for the trace theory was in part that it facilitated semantic interpretation”, but still with the qualifier “in part”, and follows immediately with “But there were also independent considerations that led to the same theory”.

1.4. Interim summary

This section has had two aims. The first aim was to establish what it looks like for a theory of grammar to suppose that the objects being grasped by a speaker are derivations. This comes out most clearly in the case of systems like categorial grammar, but importantly there are also mixed systems that derive structured expressions (for example, trees) and yet also require this same derivational interpretation. The second aim was to argue that although it is no longer as clearly the case as it was in the early days of the 1960s, generative grammar has never ceased to be a system of the mixed kind that takes derivations themselves to be

19. General requirements such as the Projection Principle (Chomsky 1981) and the No Tampering Condition (Chomsky 2005, 2007) can be seen as wide-ranging expressions of the trend towards a stronger and stronger commitment to the idea that derivational operations must preserve information from earlier stages. Notice that these are virtually names for the requirement that information is preserved, not arguments for the adoption of systems that satisfy this requirement. For example, discussions of the Projection Principle often point out that it forces or derives the presence of traces in positions vacated by movement (e.g. Chomsky 1981: 30; van Riemsdijk & Williams 1986: 252; Lasnik & Uriagereka 1988: 28), but this says nothing about whether the presence of traces is something that we should want in the first place. Requiring that information is preserved seems to inevitably lead in exactly the direction of redundant mixed theories, i.e. towards theories where earlier derivational stages are genuinely dispensable as in (7), inviting the view in Figure 1. The motivation for moving in this direction remains unclear to me.

20. Brody (2002) observes the same redundancy, but argues to eliminate it in the opposite way: switching to a completely representational theory, where the idea would be that both (26a) and (26b) should be ruled out by representational constraints in roughly the manner of GB systems. The purpose of this paper is only to explore the option that retains derivations, since this seems like a less drastic departure from contemporary mainstream thinking, so I leave aside proper consideration of the relative merits of this option versus the alternative that Brody proposes.

the objects to be grasped by speakers. The instances where this can be seen have become rarer over the years for unrelated empirical reasons (which arguably are less relevant now than they were in the GB era), but certain specific well-known points in the literature make it clear that this is still the intended interpretation.

If we accept this conclusion about the cognitive commitments of modern generative grammar, then we should expect that in principle there will be ways to empirically distinguish theories on the basis of their derivational claims — not the claims they make about derived expressions (for example, which ones are well-formed and which ones have which particular interpretations), but the claims they make about the derivational processes of which those expressions are the end result. My goal in the rest of this paper is to show one way of cashing out these claims. To sharpen the issue, I will consider two versions of the theory that differ only in their derivational processes: the set of expressions derivable by the two are identical (as are their classifications of which expressions are grammatical, and which have certain interpretations, etc.).

2. A derivational theory of minimalist syntax (or two)

In this section I will present two minimally-different versions of minimalist syntax. The two versions agree entirely on the set of derivable final expressions, and differ only in the derivational processes that are taken to construct those expressions. Specifically, in one version merge and move are two distinct primitive structure-building operations, and in the other the structure-building functionality of merge and move is abstracted out into a single primitive operation.

If the final derived expressions are all that play a part in the cognitive claims of a grammatical theory, then these two versions of the theory will obviously be empirically indistinguishable. I have shown in the previous section that, in occasional cases, the derivational properties of a theory (i.e. the fact that speakers grasp complete derivations) are relied upon in an account of acceptability facts of the sort standardly used in syntactic research — and therefore, that it is reasonable (and indeed necessary, if existing arguments in the syntactic literature are to be taken seriously) to suppose that speakers grasp complete derivations. Here I hope to show that other empirical measures can also be sensitive to the derivational properties of a theory. In other words, the contributions of derivational processes to accounts of acceptability facts are not an artifact of some peculiarities of the ways in which grammars relate to acceptability judgements; they are a part of the quite general claims that are made by positing a generative grammar as a component of a speaker’s mind. As I will show in Section 3, the two versions of minimalist syntax that I introduce here can (in combination with reasonable linking hypotheses) make distinct empirical predictions about sentence comprehension difficulty phenomena and about the choices a learner will make between candidate grammars.

2.1. Merge and move as distinct primitives

I will start by presenting a relatively standard version of minimalist syntax in this section (essentially following Stabler 1997, 2011), from a perspective that emphasizes the place of derivations in the sense outlined in Section 1. In Section 2.2 I will present the derivationally-distinct alternative by highlighting the ways it differs from the system introduced here.

As an example, consider the derivational process underlying (27). Here and throughout this section I will ignore head movement: for simplicity, I will suppose that this is a simple wh-question in a language much like English, but lacking auxiliary inversion (or alternatively, an embedded question in English). I will also make a number of simplifying assumptions about the particulars of clause structure (e.g. ignoring the TP layer).

(27) what John eats

In particular, consider the expression that has been derived immediately preceding the final wh-movement step: the wh-phrase is in the direct object position, but has an unchecked feature indicating that it must move to another position for the derivation to be valid. I will represent this by labeling the phrase DP[-wh] (as opposed to simply DP). In addition, to highlight the way phrases with unchecked features have “unfinished business” that needs to be completed by some subsequent derivational step (in contrast to the way the DP ‘John’, for example, has done everything it needs to do), I will adopt a notation for tree structures where phrases with these unfulfilled requirements stand out visually, as shown on the right in (28). It bears emphasizing that this unusual graphical convention says nothing more than what was already said by annotating the ‘what’ node with an unchecked -wh feature in the more conventional diagram on the left in (28). It is no departure from standard minimalist assumptions. I adopt it here only because it will help to clarify the relationship between the two subtly different derivational implementations of movement that I am introducing in this section.

(28) [tree diagrams not reproduced: on the left, a conventional tree with ‘what’ labeled DP[-wh]; on the right, the same structure with the unfinished wh-phrase visually set apart]

Adopting the notational conventions from Section 1, I will represent the full derivation of ‘what John eats’ as shown in Figure 6. The tree in Figure 6 should be thought of as an encapsulation of a derivational history in much the same way as (10) and (11) above.21 A few specific points are worth noting.

First, notice that as soon as ‘eats’ and ‘what’ are merged, we have a structure where one component has “unfinished business” in the sense introduced above, and therefore the tree structure containing only these two words already shows ‘what’ set aside in the manner introduced in (28).

Second, notice that the step that combines ‘eats what’ with ‘John’ is shown with the former on the left and the subject on the right in the derivation structure. This has nothing to do with the eventual linear order of these two constituents, nor with the order in which they appear in the resulting derived VP structure, shown immediately above. It simply records the fact that ‘eats what’ is the “selector” (here, more specifically, theta-assigner) and ‘John’ is the “selectee” (here, more specifically, theta-assignee), and as the rest of the tree makes clear, merge steps are recorded with the selector, or the element which projects, on the left (e.g. ‘eats’ and the null C head) and the selectee on the right (e.g. ‘what’ and the completed VP). Although no ambiguity would arise if this convention were not maintained, I will do things this way in order to bring out the distinction between the structure of the derivation (what combines with what, indicated by the thick arrows) and the structure of the derived expressions.

Third, the final step of the derivation is a move step. This is a unary operation, which takes one derived expression as its input to produce a new derived expression — in contrast to a binary operation such as merge, which takes two derived expressions as input — much like the unary transformation that fronts ‘John’ in the final step of the derivation shown in (10). After this move step, ‘what’ has no remaining unchecked features and therefore is shown having settled fully into its final position in the derived tree. For expository purposes, I am assuming, somewhat unconventionally, that no copy or trace is left in object position. Nothing significant would change if a copy were shown in the final derived tree, but my choice here is intended as a reminder that the final derived expression does not in general uniquely determine a history of derivational operations (even if it would in most theories’ analyses of this particular simple sentence). If we do things this way, then the fact that ‘what’ bears the object theta role is encoded by the earlier derivational steps rather than in the final derived expression, just as was the case for the fact that ‘John’ is the underlying object in (10); and in addition, to take but one example, strong crossover will need to be stated as a derivational constraint on wh-movement rather than by supposing that the residue of wh-movement is an R-expression constrained by Condition C. Whatever the explanatory virtues of

21. One difference is that the trees in (10) and (11) showed only the transformational part of the derivations of the relevant sentences, ignoring the base component’s construction of the underlying P-markers, and therefore show tree structures at the leaves of the derivation. The tree in Figure 6, in contrast, shows primitive lexical items at the leaves, since all structure-building is performed by generalized transformations in this system.

Figure 6. The full derivation of ‘what John eats’. [diagram not reproduced]

assimilating strong crossover to Condition C by including some residue of ‘what’ in the final derived expression in Figure 6, however, the motivation behind my choice is, to repeat, simply to provide a reminder that derivational histories are sometimes relied upon as the sole record of certain pieces of information. Recall, for example, Lasnik’s (1999) suggestion that A-movement leaves no copies, which, whatever its pros and cons, is not taken to make it impossible to enforce the requirement that all DPs receive theta roles. My choice to not show copies is simply a reminder of those sorts of possibilities.
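For readers who find executable notation helpful, here is a drastically simplified sketch of the MG system just described, in Python. The encoding (feature lists like '=d', '+wh', '-wh'; dictionaries for expressions; the crude linearization convention in the output trees) is this sketch's assumption, far coarser than Stabler's (1997, 2011) definitions; it is meant only to show merge as a binary operation and move as a unary one, with phrases that have unfinished business waiting in a set-aside list, echoing (28).

    def lex(word, feats):
        """A lexical item: a pronounced form plus an ordered feature list."""
        return {"tree": word, "feats": feats, "movers": []}

    def merge(a, b):
        """Binary structure building: a's selection feature '=x' checks b's 'x'.
        A selectee with leftover features is set aside with its unfinished
        business, as in (28)."""
        assert a["feats"][0] == "=" + b["feats"][0]
        movers = a["movers"] + b["movers"]
        if b["feats"][1:]:                       # b must move again later
            movers = movers + [(b["tree"], b["feats"][1:])]
            tree = a["tree"]
        else:                                    # toy linearization: b to the left
            tree = (b["tree"], a["tree"])
        return {"tree": tree, "feats": a["feats"][1:], "movers": movers}

    def move(a):
        """Unary structure building: a '+f' licensor checks a waiting '-f' phrase."""
        f = a["feats"][0]
        assert f.startswith("+")
        (m_tree, m_feats), = [m for m in a["movers"] if m[1][0] == "-" + f[1:]]
        rest = [m for m in a["movers"] if m[1][0] != "-" + f[1:]]
        if m_feats[1:]:                          # still unfinished: keep waiting
            rest = rest + [(m_tree, m_feats[1:])]
            tree = a["tree"]
        else:                                    # settled into final position
            tree = (m_tree, a["tree"])
        return {"tree": tree, "feats": a["feats"][1:], "movers": rest}

    what, john = lex("what", ["d", "-wh"]), lex("John", ["d"])
    eats, c = lex("eats", ["=d", "=d", "v"]), lex("", ["=v", "+wh", "c"])

    mg_result = move(merge(c, merge(merge(eats, what), john)))
    # mg_result["tree"] is ("what", (("John", "eats"), "")): 'what John eats'

The left-attachment convention happens to yield the right word order for this one example; real MGs define linearization more carefully, but nothing in what follows depends on that detail.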

2.2. A single structure-building operation

I will turn now to the alternative derivational procedure that generates the same range of derived expressions as the system just outlined in Section 2.1. Recall that when subconstituents of a tree have unchecked features that require future movement operations, I have drawn these subconstituents below the rest of the tree, as illustrated in (28) — intuitively, one can think of them as waiting in a kind of buffer or “holding zone” for the opportunity to fulfill their remaining combinatory requirements. It is this holding zone, naturally enough, that movement operations draw on when, for example, a C′ constituent has been constructed whose head bears a +wh feature and can therefore check the -wh feature of a waiting phrase, as shown in the last step of Figure 6.

The idea behind the unification of merge and move into a single structure-building operation, as I will implement it here, is to suppose that not only move but also merge draws on this same “holding zone”. So it will not only hold phrases that are waiting to move into certain structural positions, but also phrases that are waiting to merge into certain structural positions. It is the shared use of this holding zone that unifies all instances of structure-building in this system, which departs somewhat from standard intuitions regarding the unification of merge and move (the latter being an instance of the former, perhaps in combination with copy); see Hunter (2011) for much discussion, drawing on Stabler (2006). But for present purposes all that is important is that it provides a minimally-different conception of the derivational processes that produce the same range of derived expressions as the more standard system introduced above — and one that has been formulated explicitly enough to (i) allow us to be certain that the two systems do indeed generate the same range of derived expressions, and (ii) allow us to integrate both systems into models of parsing and learning to conduct the kind of tests that will follow in Section 3.

To illustrate, consider the derivational steps that combine the verb ‘eats’ with its two arguments. For each argument, the effects of what was previously the merge step that introduced it and combined it with (a projection of) ‘eats’ are now achieved by two distinct derivational operations in succession: the first of which I will call insert, and the second of which I will call build. The build operation is the one that is a generalized version of both merge and move from above; the insert operation takes up the slack of the extra book-keeping that is created by this unification. What insert does is introduce new material into an expression without fulfilling any of this new material’s requirements; for example, (29) shows a small part of a derivation, the first step of which is an insert step that introduces ‘what’ into the derivation without putting it into a position that fulfills its requirement of a theta role. Instead, it is simply placed in the “holding zone” shown at the bottom of these boxed expressions. This corresponds to the fact that at this point ‘what’ certainly has unfinished business, in fact two kinds of unfinished business (because it has not even “started business”): both the establishment of a theta role and the requirement to move into an operator position remain to be completed. The second step shown in (29) is a build step.
The build operation is essentially identical to the move operation as presented in Section 2.1: it draws on material waiting in the holding zone, to establish dependencies required by the “main part” of the expression, shown at the top of the thick boxes. The sense in which this system unifies merge and move is that both “first merge” and “re-merge” involve the build operation, drawing something from this holding zone. After the build step establishes its thematic dependency with ‘eat’, ‘what’ still has unfinished business in just the same sense that was discussed earlier, namely the requirement encoded by the -wh feature, so it remains held out, waiting for an opportunity to fulfill this final requirement. The holding zone contains elements that have one or more as-yet-unfulfilled requirements.

The next two derivational steps are shown in (30). These two steps are another insert-build “pair” that together have the effect of what was a single merge step in the system of Section 2.1. Here it is ‘John’ that is first added, without having any dependencies established, by an insert step, and then subsequently drawn on by build to establish the necessary external theta role dependency. In the intermediate derived expression in (30), ‘John’ is shown in the holding zone just as ‘what’ was in the intermediate derived expression in (29). But unlike ‘what’, ‘John’ has no further business to conduct beyond thematic requirements, and so after the build step in (30) it is shown fully settled into its final position.

(29) [boxed derivation: an insert step places ‘what’ in the holding zone; a build step then establishes its theta dependency with ‘eat’]
(30) [boxed derivation: an insert step places ‘John’ in the holding zone; a build step establishes the external theta role dependency]
(31) [boxed derivation: the final build step checks the -wh feature of ‘what’]

After a third insert-build pair of steps (corresponding to the third merge step of the derivation shown in Figure 6) that combines the C head with its completed VP complement, the derivation of ‘what John eats’ ends with the build step shown in (31). Note that although this build step corresponds to what was previously a move step, it is not formally different from any of the previous build steps that corresponded to (parts of) merge steps: it establishes a dependency between an element waiting in the holding zone and the head of the main tree. If the waiting element has further requirements that remain unfulfilled after a build step, then it remains in the holding zone to await a future build step that will satisfy the next of its further requirements, as in the build step at the top of (29); if it does not, then it will not need to participate in any further derivational operations and is fully integrated into the tree, as in the build steps at the top of (30) and (31).

I will call the system introduced here Insertion Minimalist Grammars (IMGs), in contrast to the more standard system in Section 2.1, which I will call Minimalist Grammars (MGs). This terminology follows the technical literature, where more details of these two systems and the relationship between them can be found; see for example Stabler (2006, 2011); Hunter (2011).
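What follows is a minimal illustrative sketch in Python of the insert/build architecture just described. It is my own rendering, not Hunter's formal definition (for that, see Hunter 2011 and Stabler 2011): an expression pairs a main tree with a holding zone; insert parks an item there without fulfilling any of its requirements, and build draws a waiting item into the tree, keeping it in the holding zone only if it still has unfinished business. The Item and Expression types and the feature names are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    phon: str          # e.g. 'what'
    needs: List[str]   # unfulfilled requirements, e.g. ['theta', 'wh']

@dataclass
class Expression:
    tree: object                                  # the "main part" of the expression
    holding: List[Item] = field(default_factory=list)

def insert(expr: Expression, item: Item) -> Expression:
    # Introduce new material without fulfilling any of its requirements:
    # the item simply waits in the holding zone.
    return Expression(expr.tree, expr.holding + [item])

def build(expr: Expression, feature: str) -> Expression:
    # Generalized merge/move: draw on a waiting item whose next requirement
    # matches `feature`, attach it to the main tree, and keep it in the
    # holding zone only if further requirements remain.
    for i, item in enumerate(expr.holding):
        if item.needs and item.needs[0] == feature:
            rest = expr.holding[:i] + expr.holding[i + 1:]
            if item.needs[1:]:                    # e.g. 'what' still bears -wh
                rest.append(Item(item.phon, item.needs[1:]))
            return Expression([item.phon, expr.tree], rest)
    raise ValueError(f"no waiting item can check {feature!r}")

# The two steps in (29): insert 'what', then build its theta dependency.
e = Expression('eats')
e = insert(e, Item('what', ['theta', 'wh']))
e = build(e, 'theta')   # 'what' attaches but remains held for its -wh feature
```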

2.3. Interim summary

Notice that the final expression derived by the IMG in (31) is identical to the final expression derived by the MG in Figure 6. It should be intuitively clear that any expression that can be derived by one of these systems can be derived by the other, but also that the structure of the two corresponding derivations will be slightly different. Given the conclusions reached in Section 1, this means that the two theories make distinct claims about the objects that are grasped by speakers — they are therefore just as distinct as two representational theories that posit structurally different representations.

Full derivation tree structures of the sort shown in Figure 6 for an MG are unwieldy, and even more so for IMGs. An alternative notation that allows for a more direct comparison between the two systems labels the internal nodes of derivation trees simply with the name of the operation that applies at the corresponding derivational step, as in (20) and (21) above. This loses no information, because all of the derivational operations we are considering here are functions: if we know that merge applied to ‘eat’ and ‘what’, for example, then we have all the information we need to work out what the resulting derived expression is. So to save space, we can represent the MG derivation from Figure 6 much more compactly as shown in (32); for comparison, the corresponding IMG derivation (previously shown only partially and piecemeal in (29), (30) and (31)) can be represented as shown in (33). To repeat, these are analogous to the “T-markers” of early transformation theory; see e.g. Chomsky (1965: 130).

(32) [compact MG derivation tree for ‘what John eats’, internal nodes labelled with merge and move]

(33) [compact IMG derivation tree for ‘what John eats’, internal nodes labelled with insert and build]
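Since (32) and (33) appear as tree diagrams in the original, the following gives only a rough schematic of their shape, rendered as nested applications in Python; the feature annotations of the actual figures are suppressed, so this is an illustrative approximation rather than a faithful transcription.

```python
# Schematic only: operation-labelled derivation trees as nested applications.
mg_derivation = ('move',
                 ('merge', 'C',
                  ('merge', 'John',
                   ('merge', 'eats', 'what'))))            # cf. (32)

img_derivation = ('build',                                 # final build checks -wh
                  ('build',
                   ('insert', 'C',
                    ('build',
                     ('insert', 'John',
                      ('build',
                       ('insert', 'what', 'eats')))))))    # cf. (33)
```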

3. The empirical reflexes of derivational processes

I now turn to the task of demonstrating that, in two simple case studies, there are empirical consequences to the choice between (i) supposing that the expression shown in (34) is derived by the combination of merge and move steps shown in (32), as the MG theory would have it, and (ii) supposing that it is derived by the combination of insert and build steps shown in (33), as the IMG theory would have it.

(34) [the derived tree structure for ‘what John eats’]

Specifically, I will show that it is possible to put together a chain of linking hypotheses to produce the result that these two proposals make distinct predictions — holding all other factors fixed — with regard to sentence comprehension difficulty and with regard to the choices a learner will make among a given range of grammars.

If it is shown that these two theories can make empirically-distinguishable cognitive claims, then it should be clear that the “empirical payload” of a grammatical theory should not in general exclude the derivational properties of that theory: in other words, Figure 2, rather than Figure 1, more accurately characterizes the cognitive claims of a derivational theory. This general point, of course, has nothing specifically to do with, for example, the two particular derivational systems I am considering here, or with the virtues of the idea of unifying merge and move, or with the degree to which the approach I have taken to this unification is in line with other proposals in the syntax literature. These two systems just provide a simple setting for tackling the abstract problem of relating derivational processes to various kinds of empirical predictions. Similarly, the details of the two case studies that follow — for example, the use of surprisal as a complexity metric, or the use of maximum likelihood estimation on the part of a learner — are also independent of the main concerns here. Some such assumptions, and some collection of linking hypotheses, must be chosen in order to make the questions concrete; replacing the choices I have made with others would no doubt change the empirical predictions that I derive, but would leave unaffected the broader point that such empirical predictions can be derived.

3.1. A probability model

In many of the cases where grammars play a part in some cognitive model, they do so by being supplemented with probabilities. The two empirical domains that I will deal with in the case studies below are both instances of this: information-theoretic complexity metrics such as surprisal are computed as some function of a distribution over sentences, and probabilistic learning models often involve calculating the likelihood of the observed input relative to a certain hypothesized grammar in order to assess the fit of the grammar to these observations. One way for the predictions of such models to be sensitive to the distinction between MGs and IMGs, then, is for the probability distributions definable over the common set of generated expressions to be sensitive to this distinction. This is the approach that I adopt.22

In this section I will give a very brief overview of how to supplement grammars of the sort introduced in Section 2 with probabilities, in a way that produces different results depending on whether one adopts an MG or an IMG that produces the same set of derived expressions. Readers who are ready to assume that this can be done can safely skip to Section 3.2; readers who would like more information on the technical details than is provided here should consult Hunter & Dyer (2013). But I should note that there are many possible ways to do this, all of which would produce different patterns of results in the case studies below, and no single “right way”. Attempting to justify taking any particular one of these as a valid linking hypothesis would run into the usual problems of many simultaneous unknowns, and is well beyond our current understanding (let alone the scope of this paper). Here I adopt one relatively natural option as a proof of concept.

22. See Hunter (to appear) for more discussion of the relationship between probability distributions and grammatical structure.

A conventional, non-probabilistic grammar can be thought of as defining a space of probabilistic grammars, each of which defines a particular probability distribution over the objects generated by the original grammar. To add probabilities to a grammar — whether a simple context-free phrase structure grammar, or an MG, or an IMG — is therefore to choose from the space of associated probabilistic grammars, and in particular this is usually done by choosing values for some collection of real-valued parameters. In the case of a CFG such as the one in (35), we can think of there being one parameter for each rule; these parameters are the λ1, λ2, etc. shown in (35). Then the task of choosing one of the many probabilistic versions of this grammar is the task of choosing values for the parameters λ1, λ2, etc.

(35) λ1  S → NP VP
     λ2  NP → John
     λ3  NP → Mary
     λ4  NP → D N
     λ5  VP → ran
     λ6  VP → walked
     λ7  D → the
     λ8  N → dog
     λ9  N → cat

How does fixing values for these parameters have the effect of attaching probabilities to the grammar’s rules (and, as a result, to its derivations)? There are many possibilities, but one standard way of doing things is the following: the probability of NP rewriting as ‘John’, for example, is the value λ2 / (λ2 + λ3 + λ4); the probability of NP rewriting as ‘Mary’ is λ3 / (λ2 + λ3 + λ4); the probability of VP rewriting as ‘ran’ is λ5 / (λ5 + λ6); and so on. And the probability of a particular derivation in this grammar is the product of the probabilities of the rules that are used in the derivation.

Typically what we would like to do is to supplement an existing grammar G with probabilities (i.e. choose values for the parameters) in the manner that maximizes its degree of fit with some body of training data D. What does it mean to maximize degree of fit? Again there are many possibilities, but a common and simple answer is that we would like to choose values for the vector of parameters λ that maximize the likelihood of the data D according to the probabilistic grammar Gλ (i.e. the grammar supplemented with probabilities according to λ). This is known as maximum-likelihood training since the quantity it maximizes is the likelihood P(D|Gλ). In the case of a CFG, this is a relatively simple process: if one sets the parameter λ2 to be the number of times that NP rewrites as ‘John’ in the training corpus, and sets the parameter λ5 to be the number of times that VP rewrites as ‘ran’ in the training corpus, etc., then one arrives at the values of these parameters that maximize this likelihood.23

23. And this has the effect, of course, that the probability of VP rewriting as ‘ran’ will be the number of times that VP rewrote as ‘ran’ in the training corpus (i.e. λ5), divided by the number of times VP rewrote at all in the training corpus (i.e. λ5 + λ6).
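As a small proof-of-concept sketch of this scheme (my own illustration, with made-up rule counts standing in for a training corpus), the normalization of the λ parameters and the product rule for derivation probabilities can be written as follows:

```python
from collections import Counter

# Hypothetical λ values; under maximum-likelihood training these are just
# the observed rule counts (cf. footnote 23).
lam = Counter({('S', ('NP', 'VP')): 8,
               ('NP', ('John',)): 4, ('NP', ('Mary',)): 2, ('NP', ('D', 'N')): 2,
               ('VP', ('ran',)): 5, ('VP', ('walked',)): 3,
               ('D', ('the',)): 2, ('N', ('dog',)): 1, ('N', ('cat',)): 1})

def rule_prob(lhs, rhs):
    # Normalize each λ against all rules sharing the same left-hand side.
    total = sum(v for (l, _), v in lam.items() if l == lhs)
    return lam[(lhs, rhs)] / total

def derivation_prob(rules):
    # The probability of a derivation is the product of its rules' probabilities.
    p = 1.0
    for lhs, rhs in rules:
        p *= rule_prob(lhs, rhs)
    return p

# P(S -> NP VP) * P(NP -> John) * P(VP -> ran) = 1.0 * 4/8 * 5/8 = 0.3125
print(derivation_prob([('S', ('NP', 'VP')), ('NP', ('John',)), ('VP', ('ran',))]))
```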

Due to a formal property of MGs established by Michaelis (2001), it turns out that a broadly similar strategy to the one just outlined can be adopted for supplementing MGs (and also IMGs) with probabilities. The range of possible derivations in these grammars can be characterized via a branching process that has exactly the same structure as a context-free grammar; the relationship between this branching process and the surface strings (and indeed the surface tree structures) generated by the grammar is more complex and less transparent in the case of MGs than it is for CFGs, but this difference is irrelevant to the task of defining a probability distribution over the objects that a grammar generates. Hale (2006) made use of this fact to supplement MGs with probabilities.

What this underlyingly context-free branching process provides is a characterization of what the “choice points” are that a machine would encounter when carrying out possible derivations licensed by the grammar, and what the competing candidate options are at each point: this corresponds to knowing, in the case of the CFG above, that there are points where you need to decide between ‘John’ and ‘Mary’ and ‘D N’ as the expansion of NP, and there are points where you need to decide between ‘ran’ and ‘walked’ as the expansion of VP, etc. In a CFG, any particular grammatical rule only ever enters into consideration at one such choice point: if a rule’s left-hand side is NP, then it enters into consideration at the choice point corresponding to deciding how to rewrite the symbol NP, and no others. But in an MG or an IMG, the relationship between the choice points and the grammatical rules is more complex. There are various ways in which one might flesh out the notion of a “grammatical rule” in these systems. Following Hunter & Dyer (2013), I will suppose that grammatical rules are roughly things like “merge to assign a theta role” or “move to check a -wh feature” (in an MG) or “build to check a -wh feature” (in an IMG) — for other reasonable choices, the fact remains that the relationship between choice points and grammatical rules is complex and many-to-many.

Given this assumption about what we take to be the rules that the grammar is trafficking in, it is natural — although, as noted above, there is no single “right way” to do this — to design a probability model where the parameters have interpretations that relate to notions like merge steps, move steps, theta roles, wh-features, build steps and insert steps. Broadly speaking, the parameter λ2 in (35) is a measure of “how much NP gets rewritten as ‘John’”, λ5 is a measure of “how much VP gets rewritten as ‘ran’”, etc.; and accordingly training data is taken to provide information about “how much NP gets rewritten as ‘John’”, etc. So the model proposed for MGs by Hunter & Dyer (2013) includes parameters that are measures of “how much merge happens”, “how much wh features get checked”, “how much move happens”, etc.; and for IMGs, there are measures of “how much build happens”, “how much insert happens”, etc.

The relationship between these parameters and the probabilities that are multiplied together to determine the probability of an entire derivation is complex (more complex than it is for CFGs), for precisely the reason that the relationship between the grammatical rules and the choice points is overlapping and complex. These complications aside, the model has the same form as the one explicated for CFGs above in that (i) it involves a choice of parameter values, which in turn determine probabilities, and (ii) one can use a training corpus to choose the parameter values λ that maximize the value P(D|Gλ). What differs is that in the case of an MG, choosing parameter values can be interpreted as answering questions about “how much merge happens” (for the parameter λmerge) and “how much move happens” (for the parameter λmove), whereas in the CFG the questions being answered include “how much NP rewrites as ‘John’” (for the parameter λ2, which could also have been called λNP→John). And, crucially, in the case of an IMG the parameter values being chosen during training are different again: they correspond to answering questions about “how much build happens” (λbuild), “how much insert happens” (λinsert), etc.24 The answers that a given body of training data provides to the MG-based questions about merge and move steps will in general be different from the answers that this same training data provides to the IMG-based questions about build and insert steps.

3.2. Case study: Sentence comprehension difficulty and surprisal

Surprisal is an information-theoretic complexity metric that has been hypothesized to predict human sentence comprehension difficulty (Hale 2001, 2016; Levy 2008). Given a probability distribution over sentences, and a particular sentence whose processing we are interested in, a surprisal value is defined for each word in the sentence. This sequence of values is taken to represent the difficulty of integrating the information provided by each word as the sentence is read or heard incrementally.

Specifically, given the sentence w1w2 … wn, the surprisal at word wi is

− log P(Wi = wi|W1 = w1,W2 = w2, … ,Wi−1 = wi−1)

The probability here is simply the probability of encountering the word wi in that position, given all the preceding context. The negative logarithm is a monotonically decreasing function, and therefore has the effect of converting high probabilities into low surprisal values, and low probabilities into high surprisal values.

24. So I am making an assumption here that hypothesized derivational operations correspond in this direct way to parameters of the relevant probability models, which is of course not necessary. In particular, if one takes the differences between MGs and IMGs to be so small that they are notational variants, then the two probabilistic systems I am setting up will look like distinct parameterizations of a single derivational system, rather than the results of applying a single parameterization choice to two distinct systems. But the more general point I want to make is that, whatever derivational distinctions one takes to be big enough that they “should matter”, tying parameters of probability distributions to those distinctions is one natural way to make them matter.

In the concrete case that I am presenting here, I will work with grammars generating all and only the sentences shown in (36):

(36) boys will shave
     boys will shave themselves
     who will shave
     who will shave themselves
     some boys will shave
     some boys will shave themselves

I make relatively obvious (i.e. English-like) assumptions about the structures of these sentences. The one somewhat unusual aspect of the analyses I adopt is that reflexives are generated via a doubling-style movement theory: in ‘boys will shave themselves’, for example, it is ‘boys themselves’ that combines as the object of ‘shave’, and ‘boys’ then moves up to the SpecTP position. (This is in order to maximize the number of “merge versus move” choices while keeping the derivations as small as possible overall.)

Given a common lexicon where the words that appear in these sentences are annotated with appropriate features, I will consider the relationship between the MG that generates (36) and the IMG that generates this very same set of sentences. Notice that in addition to generating the same set of strings, these grammars implement the same analyses of these strings. By this I mean that the two grammars make all the same assumptions about “what goes where” in the course of the derivation — they differ only in whether these same interactions among words and phrases are effected by merge and move steps or by insert and build steps.

Suppose we adopt the following (artificial, and entirely arbitrary) “corpus” as the training data that will provide the basis for choosing probabilistic versions of our two grammars. The number at the beginning of each line is the frequency of the sentence in the training data.

(37) 10  boys will shave
      2  boys will shave themselves
      3  who will shave
      1  who will shave themselves
      5  some boys will shave

For the reasons outlined above, this training data will be “interpreted” differently depending on whether one is using it to train the MG or the IMG. As a training corpus for adding probabilities to the MG that generates the sentences in (36), it is a collection of merge and move events that provide a basis for estimating the parameters λmerge and λmove (as well as the others that relate to specific features). Choosing a value for each of these MG-based parameters picks out a particular probabilistic MG, which in turn defines a particular probability distribution over the set of sentences in (36). The values of these MG-based parameters that this training corpus leads to pick out a probabilistic MG that defines the following distribution:

(38) 0.35478  boys will shave
     0.35478  some boys will shave
     0.14801  who will shave
     0.05022  boys will shave themselves
     0.05022  some boys will shave themselves
     0.04199  who will shave themselves

From the perspective of the corresponding IMG, however, the training corpus provides a basis for estimating the parameters λinsert and λbuild (as well as the others that relate to specific features). The probabilistic IMG that is picked out by using the same training corpus to estimate the values of these parameters defines the fol- lowing, distinct, distribution over the same set of sentences:

(39) 0.35721  boys will shave
     0.35721  some boys will shave
     0.095    who will shave
     0.095    who will shave themselves
     0.04779  boys will shave themselves
     0.04779  some boys will shave themselves

Note that even the one sentence that was not in the training corpus, ‘some boys will shave themselves’, is assigned different probabilities by the two grammars. Although the two grammars assign the same analyses to all six sentences, the information provided by the common training corpus bears on the probability of this unseen sentence differently depending on whether one adopts the MG-based or IMG-based probability model.

From here it is a simple final step to complete the picture: calculations of surprisal values for ‘who will shave themselves’ derived from the MG-based distribution are shown in (40), and the corresponding calculations using the IMG-based distribution are shown in (41). (I have chosen this sentence because it shows a relatively striking difference, but the same point could be made with any of the other sentences.) The surprisal values, and therefore the predicted degrees of sentence comprehension difficulty, differ.

(40) surprisal at ‘who’ = − log P(W1 = who) = − log(0.15 + 0.04) = − log 0.19 = 2.4

surprisal at ‘themselves’ = − log P(W4 = themselves | W1 = who, …) = − log (0.04 / (0.15 + 0.04)) = − log 0.21 = 2.2

(41) surprisal at ‘who’ = − log P(W1 = who) = − log(0.10 + 0.10) = − log 0.2 = 2.3

surprisal at ‘themselves’ = − log P(W4 = themselves | W1 = who, …) = − log (0.10 / (0.10 + 0.10)) = − log 0.5 = 1

To recap: I took it as given that the language of the speaker(s) of interest consists of precisely the set of sentences in (36), and moreover held fixed a particular analysis of each of those sentences (e.g. the assumption that reflexives are created by movement, that ‘who’ moves to SpecCP, etc.). Against the backdrop of these fixed assumptions, we would like to know whether the mental grammar of the speaker(s) of interest is the MG that expresses those analyses in terms of merge and move steps, or the corresponding IMG that expresses those same analyses in insert and build steps. What the calculations above show is that an experiment where we measure the difficulty that our speaker of interest encounters in incrementally reading a sentence can provide data that — via linking assumptions which, as usual, would need to be independently justified — bears on this question. Concretely, if we suppose that the speaker’s probabilistic knowledge of language is informed by the pattern in (37) — based on, for example, the fact that this is data we collected from newspaper articles that are representative of the speaker’s linguistic experience — then our setup would predict roughly equal comprehension difficulty at the first and last words of ‘who will shave themselves’ if the speaker’s mental grammar is the MG, but significantly less comprehension difficulty at the last word than at the first if the speaker’s mental grammar is the corresponding IMG.
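The calculations in (40) and (41) can be checked mechanically against the distributions in (38) and (39). The sketch below is my own; it uses base-2 logarithms, which match the reported values, and computes each conditional probability by summing over the sentences that share the relevant prefix (straightforward here because the language is finite).

```python
from math import log2

mg = {('boys','will','shave'): 0.35478,
      ('some','boys','will','shave'): 0.35478,
      ('who','will','shave'): 0.14801,
      ('boys','will','shave','themselves'): 0.05022,
      ('some','boys','will','shave','themselves'): 0.05022,
      ('who','will','shave','themselves'): 0.04199}           # (38)

img = {('boys','will','shave'): 0.35721,
       ('some','boys','will','shave'): 0.35721,
       ('who','will','shave'): 0.095,
       ('who','will','shave','themselves'): 0.095,
       ('boys','will','shave','themselves'): 0.04779,
       ('some','boys','will','shave','themselves'): 0.04779}  # (39)

def surprisal(dist, sentence, i):
    # -log2 P(w_i | w_1 ... w_{i-1}), via prefix probabilities.
    def prefix(k):
        return sum(p for s, p in dist.items() if s[:k] == sentence[:k])
    return -log2(prefix(i + 1) / prefix(i))

s = ('who', 'will', 'shave', 'themselves')
print(surprisal(mg, s, 0), surprisal(mg, s, 3))    # ~2.4 and ~2.2, cf. (40)
print(surprisal(img, s, 0), surprisal(img, s, 3))  # ~2.4 and 1.0, cf. (41); the
# text's 2.3 reflects rounding 0.095 + 0.095 to 0.2 before taking the logarithm.
```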

3.3. Case study: Grammar selection

In this section, I will consider a very simple model of grammar selection by a learner. It will be useful to begin with the specifics of the (artificial) learning problem that the model learner will confront, before returning to the details of the forms of particular grammars and the differences between MGs and IMGs.

The learner I will consider must choose between two grammars, Gdet and Gwh. Both grammars generate the same set of surface strings, although they assign different structures to some of these strings. The common set of surface strings is shown in (42).

(42) boys will shave
     boys will shave themselves
     who will shave
     who will shave themselves
     foo boys will shave
     foo boys will shave themselves

Where the two grammars differ is in their treatment of the word ‘foo’. In Gdet, this word is a determiner, as shown in the tree on the left in (43). (Gdet corresponds to the grammar that was used in the previous case study, with the string ‘foo’ in place of ‘some’.) In Gwh, this word is a wh-phrase base-generated in SpecCP (in line with certain proposals about words like ‘why’ and ‘how’), as shown on the right in (43).

(43) [trees: ‘foo’ as a determiner in Gdet (left) vs. ‘foo’ as a wh-phrase base-generated in SpecCP in Gwh (right)]

The learner will be provided with some training data, on the basis of which to decide between these two analyses. The training data will be some collection of tokens of the sentences in (42), all of which are generated by both grammars. In order to decide whether Gdet or Gwh best fits the data, the learner will have to consider which grammar can best capture the statistical properties of the training corpus.

One way to tackle this problem builds directly on the kind of training that was used in the previous case study. Let us suppose that supplementing grammar Gdet with probabilities requires choosing values for parameters λ, and that supplementing grammar Gwh with probabilities requires choosing values for parameters µ. We know from above that the learner can discover which of the various probabilistic versions of Gdet best fits the data by choosing λ so as to maximize P(D|Gdet_λ); and similarly, the learner can choose a probabilistic version of Gwh by choosing µ so as to maximize P(D|Gwh_µ). Having thus identified the “winner” Gdet_λ amongst all the versions of Gdet and the “winner” Gwh_µ amongst all the versions of Gwh, the learner can pit these two winners against each other in a grand final by comparing P(D|Gdet_λ) with P(D|Gwh_µ): this is comparing the best that any version of Gdet can do with the best that any version of Gwh can do. If P(D|Gdet_λ) > P(D|Gwh_µ), then our learner will choose Gdet over Gwh.

Notice now that I have not said anything so far about whether Gdet and Gwh are MGs or IMGs. So we can consider two different instantiations of the learning scenario that has just been introduced: one where a learner must choose between two MGs, MGdet and MGwh, and one where a learner must choose between two IMGs, IMGdet and IMGwh. These two learners are “doing the same thing” — deciding whether to analyze ‘foo’ as a determiner or as a wh-phrase — but one is using the MG system to do this and the other is using the IMG system instead. But I will show that these two learners can reach different conclusions about how to analyze this unknown word, even while the training data is held constant.
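Schematically, the learner's procedure looks as follows. This is a hypothetical sketch: the fit and prob methods are stand-ins for maximum-likelihood training and for computing sentence probabilities, not an existing API.

```python
from math import log

def best_log_likelihood(grammar, corpus):
    # `grammar.fit` returns maximum-likelihood parameter values and
    # `grammar.prob` computes P(sentence | grammar, params); both hypothetical.
    params = grammar.fit(corpus)
    return sum(freq * log(grammar.prob(sentence, params))
               for sentence, freq in corpus.items())

def grand_final(g_det, g_wh, corpus):
    # Pit the best version of each grammar against the best version of the other.
    ll_det = best_log_likelihood(g_det, corpus)
    ll_wh = best_log_likelihood(g_wh, corpus)
    return 'Gdet' if ll_det > ll_wh else 'Gwh'
```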

As a first concrete example, consider the (artificial and arbitrary) “training corpus” in (44). As before, the numbers at the beginning of each line are token frequencies.

(44) 5  boys will shave
     5  boys will shave themselves
     5  who will shave
     5  who will shave themselves
     5  foo boys will shave

We have seen in the previous case study that the best-fitting probabilistic version of MGdet can differ from the best-fitting probabilistic version of IMGdet, since the former is determined by setting values of parameters including λmerge and λmove whereas the latter has parameters including λbuild and λinsert. For just the same reasons, the best-fitting probabilistic version of MGwh can differ from the best-fitting probabilistic version of IMGwh — where the former has parameters µmerge and µmove, the latter has µbuild and µinsert. These two divergences mean that the determiner-versus-wh competition taking place in the MG setting may look very different from the determiner-versus-wh competition taking place in the IMG setting. Specifically, it turns out that for the MG-based learner, the best likelihood attainable by some probabilistic version of MGdet is 75 times higher than the best likelihood attainable by some probabilistic version of MGwh; whereas for the IMG-based learner confronted with the same choice, the best likelihood attainable under the determiner analysis is only 13.7 times higher than the best likelihood attainable under the wh-phrase analysis.

preference factor for determiner analysis with MGs = P(D|MGdet_λ) / P(D|MGwh_µ) ≈ 75

preference factor for determiner analysis with IMGs = P(D|IMGdet_λ) / P(D|IMGwh_µ) ≈ 13.7

The training corpus in (44) therefore provides much stronger evidence for the determiner analysis if the two competing analyses are seen through the lens of the MG framework than it does if the competition is seen through the lens of the IMG framework. In the context of a more elaborate learning model (for example in combination with certain Bayesian priors), this means that it is possible that the training corpus in (44) could provide evidence that tips the scale in favour of the determiner analysis for an MG-based learner, but not for an IMG-based learner. This despite the fact that the decision in each case is the decision between the two tree structures in (43) — all that differs is whether these trees are taken to be constructed by the derivational operations merge and move, or the derivational operations build and insert.

A more dramatic result is provided by the (equally arbitrary and artificial) training corpus in (45).

(45)  8  boys will shave
      1  boys will shave themselves
     12  who will shave
      1  who will shave themselves
      4  foo boys will shave

In this case, the consequences for the MG-based learner and the IMG-based learner differ not in degree (of preference for the determiner analysis), but in direction: MGdet beats MGwh in the MG-based determiner-versus-wh competition, but IMGwh beats IMGdet in the IMG-based determiner-versus-wh competition. In the MG-based scenario, the best likelihood attainable under the determiner analysis is 64900 times higher than the best attainable under the wh-phrase analysis; in the IMG-based scenario, however, this ratio is only 0.749.

preference factor for determiner analysis with MGs = P(D|MGdet_λ) / P(D|MGwh_µ) ≈ 64900

preference factor for determiner analysis with IMGs = P(D|IMGdet_λ) / P(D|IMGwh_µ) ≈ 0.749

So even in the absence of other surrounding assumptions (e.g. Bayesian priors) to interact with, this training corpus will favour the determiner analysis for the MG-based learner, but favour the wh-phrase analysis for the IMG-based learner.

To recap: what has been demonstrated is that the choice between the two analyses of ‘foo’ shown in (43) can have a different outcome, depending on whether those analyses are expressed in MGs, with merge and move as distinct primitive operations, or in IMGs, with build as the single unified structure-building operation — all while holding fixed the training corpus and all other linking hypotheses.
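For concreteness, the decision rule applied to the preference factors reported in this section can be sketched as follows; only the numbers quoted above are used.

```python
# Reported best-likelihood ratios P(D|G_det) / P(D|G_wh) from the text above.
preference = {('MG', 'corpus (44)'): 75.0, ('IMG', 'corpus (44)'): 13.7,
              ('MG', 'corpus (45)'): 64900.0, ('IMG', 'corpus (45)'): 0.749}

for (system, corpus), ratio in preference.items():
    winner = 'determiner' if ratio > 1 else 'wh-phrase'
    print(f'{system} learner, {corpus}: {winner} analysis wins (factor {ratio})')
# Only the IMG learner on corpus (45) falls below 1, flipping the outcome.
```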

4. Conclusion

My aim here has been to answer a question posed in the introduction: how (if at all) does the procedural component of a derivational theory contribute to the theory’s empirical bottom line? The central idea is that in derivational systems we can identify a procedure that constructs a complex expression with a static, structured representation of the relations among certain expressions — much as we can identify the procedure “add x to y and then multiply the result by z” with the structured representation z × (x + y). This idea lets us formulate the hypothesis that this structured object, the derivation tree, is the object that is grasped by a speaker when using the corresponding sentence. This eliminates what sometimes appears to be almost a kind of category mismatch that arises when considering how derivational theories are to be cashed out as cognitive hypotheses, particularly as compared to representational theories (recall Figure 1 and Figure 2).

It is not immediately obvious, however, that it is sensible to interpret modern generative grammar in such a way that a structured representation of the derivation itself has this primary status: it is tempting to suspect that the derivational operations that lead up to a particular derived expression are redundant extra trimmings, and therefore that debates over these derivational operations themselves are debates without empirical grounding. I have argued that this is an illusion based on a historical trend towards representational encodings of syntactic generalizations, which (i) was not motivated by, and therefore should be taken independently of, the discussions about the mental status of derivational operations, and (ii) is arguably beginning to reverse anyway. In support of the claim that this effect is illusory, I highlighted cases where standard syntactic practice is clearly incompatible with assuming that only final derived expressions are grasped.

It therefore follows that the choice of derivational operations posited by a theory has an effect on the structured object that a speaker is taken to grasp or retrieve upon using a sentence. This in turn means that two theories that differ in their choices of derivational operations — even if the two systems generate the same set of grammatical derived structures — will make distinct claims about speakers’ mental representations that can lead to distinct empirical predictions for models of speaker behaviour that include grammatical systems as one of their components. There are many ways that this could be done. I have illustrated the effect by taking the probabilistic enrichment of a grammar to be one locus of sensitivity to the entire object grasped (i.e. the entire derivation tree), since probabilities are a part of many common models of behavioural tasks. Specifically, in the context of surprisal-based models of incremental sentence comprehension difficulty and of a simple maximum-likelihood-based learner, I showed that two grammars differing only in the derivational operations taken to be responsible for constructing a common set of grammatical structures make distinct empirical predictions.
In narrow terms, this serves as a demonstration that if we are confronted with two theories that differ only in their derivational claims, there are ways to go beyond the standard methodology of acceptability judgements in order to gather evidence that will distinguish them empirically. But in principle we need not wait until we are confronted by the need for such a tie-breaker before attempting to flesh out the empirical consequences of the derivational components of theories: the derivational aspects of a theory can be treated as a first-class component of its empirical payload just as much as all aspects of the representation on the left of Figure 2 are. For psycholinguists, this perspective has the potential to promote more direct engagement with syntactic theory; for syntacticians, it promotes clarified ways of understanding the relationship between derivational and representational ways to express generalizations.

References

Baker, C. L. & Brame, M. K. 1972. ‘Global rules’: A rejoinder. Language 48(1): 51-75.
Barker, C. & Jacobson, P. (eds.). 2007. Direct Compositionality. Oxford: Oxford University Press.
Brody, M. 2002. On the status of representations and derivations. In Epstein, S. D. & Seely, T. D. (eds.). Derivation and Explanation in the Minimalist Program, 19-41. Oxford: Blackwell.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N. 1973. Conditions on transformations. In Anderson, S. R. & Kiparsky, P. (eds.). A Festschrift for Morris Halle, 232-286. New York: Holt, Rinehart and Winston.
Chomsky, N. 1975. Reflections on Language. New York: Pantheon Books.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1986. Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.
Chomsky, N. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. 2005. Three factors in language design. Linguistic Inquiry 36(1): 1-22.
Chomsky, N. 2007. Approaching UG from below. In Sauerland, U. & Gärtner, H.-M. (eds.). Interfaces + Recursion = Language? Berlin: Mouton de Gruyter.
Ferreira, F. 2005. Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review 22: 365-380.
Fiengo, R. 1977. On trace theory. Linguistic Inquiry 8(1): 35-61.
Freidin, R. 1978. Cyclicity and the theory of grammar. Linguistic Inquiry 9(4): 519-549.
Freidin, R. 1999. Cyclicity and minimalism. In Epstein, S. D. & Hornstein, N. (eds.). Working Minimalism, 95-126. Cambridge, MA: MIT Press.
Graf, T. 2011. Closure properties of Minimalist derivation tree languages. In Pogodalla, S. & Prost, J.-P. (eds.). LACL 2011, vol. 6736 of Lecture Notes in Artificial Intelligence, 96-111. Heidelberg: Springer.
Graf, T. 2013. Local and transderivational constraints in syntax and semantics. PhD thesis, UCLA.
Graf, T. 2017. Derivations as representations: News from the computational frontier. In Mayr, C. & Williams, E. (eds.). Festschrift for Martin Prinzhorn, vol. 82 of Wiener Linguistische Gazette, 61-69.
Hale, J. T. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics.
Hale, J. T. 2006. Uncertainty about the rest of the sentence. Cognitive Science 30: 643-672.

Hale, J. T. 2016. Information-theoretical complexity metrics. Language and Linguistics Compass 10(9): 397-412.
Heim, I. & Kratzer, A. 1998. Semantics in Generative Grammar. Oxford: Blackwell.
Hornstein, N. 1999. Movement and control. Linguistic Inquiry 30(1): 69-96.
Hornstein, N. 2001. Move! A minimalist theory of construal. Oxford: Blackwell.
Hunter, T. 2011. Insertion Minimalist Grammars: Eliminating redundancies between merge and move. In Kanazawa, M., Kornai, A., Kracht, M. & Seki, H. (eds.). The Mathematics of Language (MOL 12 Proceedings), vol. 6878 of LNCS, 90-107. Berlin/Heidelberg: Springer.
Hunter, T. to appear. Formal methods in experimental syntax. In Sprouse, J. (ed.). The Oxford Handbook of Experimental Syntax.
Hunter, T. & Dyer, C. 2013. Distributions on Minimalist Grammar derivations. In Proceedings of the 13th Meeting on the Mathematics of Language.
Jackendoff, R. 2011. What is the human language faculty?: Two views. Language 87(3): 586-624.
Jacobson, P. 2007. Direct compositionality and variable-free semantics. In Barker, C. & Jacobson, P. (eds.). Direct Compositionality, 191-236. Oxford: Oxford University Press.
Joshi, A. K. & Schabes, Y. 1997. Tree-adjoining grammars. In Rozenberg, G. & Salomaa, A. (eds.). Handbook of Formal Languages, vol. 3, 69-124. New York: Springer.
Katz, J. J. & Fodor, J. A. 1963. The structure of a semantic theory. Language 39(2): 170-210.
Katz, J. J. & Postal, P. M. 1964. An Integrated Theory of Linguistic Description. Cambridge, MA: MIT Press.
Kayne, R. 2002. Pronouns and their antecedents. In Epstein, S. D. & Seely, T. D. (eds.). Derivation and Explanation in the Minimalist Program, 133-166. Oxford: Blackwell.
Kobele, G. M. 2006. Generating Copies: An Investigation into Structural Identity in Language and Grammar. PhD thesis, UCLA.
Kobele, G. M. 2010. Without remnant movement, MGs are context-free. In Ebert, C., Jäger, G. & Michaelis, J. (eds.). Proceedings of Mathematics of Language 10/11, vol. 6149 of LNCS, 160-173. Berlin/Heidelberg: Springer.
Kobele, G. M. 2011. Minimalist tree languages are closed under intersection with recognizable tree languages. In Pogodalla, S. & Prost, J.-P. (eds.). LACL 2011, vol. 6736 of Lecture Notes in Artificial Intelligence, 129-144.
Kobele, G. M. 2012. Importing Montagovian dynamics into minimalism. In Bechet, D. & Dikovsky, A. (eds.). Logical Aspects of Computational Linguistics, 103-118. Berlin: Springer.
Lasnik, H. 1999. Chains of arguments. In Epstein, S. D. & Hornstein, N. (eds.). Working Minimalism, 189-215. Cambridge, MA: MIT Press.
Lasnik, H. & Lohndal, T. 2013. Brief overview of the history of generative syntax. In den Dikken, M. (ed.). The Cambridge Handbook of Generative Syntax, 26-60. Cambridge: Cambridge University Press.
Lasnik, H. & Uriagereka, J. 1988. A Course in GB Syntax. Cambridge, MA: MIT Press.
Lebeaux, D. 1988. Language acquisition and the form of the grammar. PhD thesis, University of Massachusetts, Amherst.

Lebeaux, D. 2000. Language acquisition and the form of the grammar. Philadelphia: John Benjamins.
Lees, R. B. & Klima, E. S. 1963. Rules for English pronominalization. Language 39(1): 17-28.
Levy, R. 2008. Expectation-based syntactic comprehension. Cognition 106(3): 1126-1177.
Lightfoot, D. 1976. Trace theory and twice-moved NPs. Linguistic Inquiry 7(4): 559-582.
McCawley, J. D. 1968. Concerning the base component of a transformational grammar. Foundations of Language 4: 243-269.
McCloskey, J. 2002. Resumption, successive cyclicity, and the locality of operations. In Epstein, S. D. & Seely, T. D. (eds.). Derivation and Explanation in the Minimalist Program, 184-226. Oxford: Blackwell.
Michaelis, J. 2001. Derivational minimalism is mildly context-sensitive. In Moortgat, M. (ed.). Logical Aspects of Computational Linguistics, vol. 2014 of LNCS, 179-198. Berlin/Heidelberg: Springer.
Miller, G. A. & Chomsky, N. 1963. Finitary models of language users. In Luce, R. D., Bush, R. R. & Galanter, E. (eds.). Handbook of Mathematical Psychology, vol. 2. New York: Wiley and Sons.
Montague, R. 1974. Formal Philosophy: Selected Papers of Richard Montague. New Haven, CT: Yale University Press. Edited and with an introduction by Richmond H. Thomason.
Müller, G. 2017. Structure removal: an argument for feature-driven Merge. Glossa: A journal of general linguistics 2(1): 1-35.
Phillips, C. & Lewis, S. 2013. Derivational order in syntax: evidence and architectural consequences. Studies in Linguistics 6: 11-47.
Pollard, C. & Sag, I. A. 1994. Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Sag, I. A. & Wasow, T. 2011. Performance-compatible competence grammar. In Borsley, R. & Börjars, K. (eds.). Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell.
Stabler, E. P. 1997. Derivational minimalism. In Retoré, C. (ed.). Logical Aspects of Computational Linguistics, vol. 1328 of LNCS, 68-95. Berlin/Heidelberg: Springer.
Stabler, E. P. 2006. Sidewards without copying. In Wintner, S. (ed.). Proceedings of The 11th Conference on Formal Grammar, 157-170. Stanford, CA: CSLI Publications.
Stabler, E. P. 2011. Computational perspectives on minimalism. In Boeckx, C. (ed.). The Oxford Handbook of Linguistic Minimalism. Oxford: Oxford University Press.
Stabler, E. P. 2013. Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5(3): 611-633.
van Riemsdijk, H. & Williams, E. 1986. Introduction to the Theory of Grammar. Cambridge, MA: The MIT Press.
Wasow, T. 1972. Anaphoric Relations in English. PhD thesis, MIT.

Problems of ‘Problems of Projection’: Breaking a conceptual tie*

Marc Richards Queen’s University Belfast [email protected]

Received: December 31, 2017 Accepted: September 23, 2019

Abstract

The exocentric labelling model of Chomsky’s (2013, 2015) Problems of Projection renders projection rather more problematic than it was previously, giving rise to numerous technical and conceptual complications, redundancies and inconsistencies. Of particular concern is the reversibility of the assumptions that are made with respect to the relation between labelling and Search, such that the opposite theory is equally coherent and delivers the same empirical results. After reviewing these concerns, a simpler conception of exocentric labelling is sketched in which all labels are uniformly added via external Merge of categorizing phase heads, turning unlabelled (uninterpreted) nonphase syntactic objects into labelled (interpreted) phases. Some conceptual and empirical advantages of the simpler system are finally considered.

Keywords: phases; labels; categories; transfer; islands

Resum. Problemes de Problemes de projecció. Trencar un empat conceptual

El model d’etiquetatge exocèntric de Chomsky (2013, 2015), Problemes de projecció, fa que la projecció sigui més problemàtica que abans i que doni lloc a nombroses complicacions tècniques i conceptuals, redundàncies i incoherències. És particularment preocupant la reversibilitat de les hipòtesis que es fan respecte de la relació entre l’etiquetatge i la cerca, de manera que la teoria contrària és igualment coherent i proporciona els mateixos resultats empírics. Després de revisar aquestes preocupacions, es dibuixa una concepció més senzilla de l’etiquetatge exocèntric en què totes les etiquetes s’afegeixen uniformement mitjançant la fusió externa de nuclis de fase categoritzadors, que converteixen objectes sintàctics no marcats (sense interpretar) en fases etiquetades (interpretades). Es consideren finalment alguns avantatges conceptuals i empírics del sistema més senzill.

Paraules clau: fases; etiquetes; categories; transferència; illes

* I would like to thank two anonymous reviewers for their thoughtful comments and suggestions, as well as the audience at GenSyn 2017, Universitat Autònoma de Barcelona, especially its organizers and the editors of the present volume, Ángel Gallego and Dennis Ott. All remaining errors and oversights are my own.

ISSN 1695-6885 (in press); 2014-9718 (online) https://doi.org/10.5565/rev/catjl.220 140 CatJL Special Issue, 2019 Marc Richards

Table of Contents

1. Introduction
2. Problematic projection
3. Searching down the rabbit hole
4. Another phase–label fable
5. Conclusion
References

1. Introduction

This paper raises the methodological question of how we should proceed in the case of a conceptual tie, as can sometimes arise when pursuing the minimalist research program. In cases where there appear to be multiple principled solutions to a theoretical problem that conform equally well to Chomsky’s Strong Minimalist Thesis (SMT), and which make identical empirical predictions and account for all the same facts, how can we decide which solution, if any, is to be preferred? I argue that the exocentric, projection-free syntactic model of Problems of Projection (POP; Chomsky 2013, 2015) has led to exactly such an impasse, in which two opposing but equally plausible and coherent sets of assumptions about the relation between labelling and Search have been claimed by different researchers to yield the same results (namely, opacity/freezing effects). In such cases, where we have contradictory assumptions that cannot both be right, it may well be that both are wrong, indicating perhaps a deeper conceptual flaw common to the general approach. We therefore need to identify and clarify where this ‘wrong turn’ might lie, in order to make progress again. To this end, I suggest in this paper a minor course correction which charts an alternative path towards a syntax-external, phase-level labelling algorithm (LA) of the POP kind, i.e. one which shares the same objectives of eliminating theory-internal notions of projection and endocentricity under the SMT. This alternative approach accounts for the same freezing/opacity facts as the POP-LA whilst avoiding and resolving the aforesaid conceptual stalemate.

We proceed, briefly, as follows: Section 2 reviews some of the conceptual and technical problems inherent in the POP approach; the resulting impasse is outlined in section 3. Section 4 then sketches a possible way to break the tie by pursuing a Merge-based approach to phase-level labelling that equates labels with phase heads and allows us to sidestep and possibly even reconcile the conflicting viewpoints of the Search-based POP-LA.

2. Problematic projection

Taking labelling to be necessary (only) at the interface in order to determine the appropriate interpretation of a syntactic object (SO), Chomsky 2013 (POP), 2015 (POP+) proposes a Search-based algorithm which operates at the phase level and identifies the “designated element” that provides the relevant information to the interface. In the simplest case, i.e. SO = {LI, XP}, minimal search immediately detects and identifies the head (LI) as the label. However, as is well known, this labelling algorithm (LA) breaks down when a symmetrical structure is encountered, such as {XP, YP}. In such cases, the symmetrical SO is made interpretable (labellable) in one of two ways: either (i) by creating an asymmetry through internal Merge (IM) of XP or YP, or (ii) by seeking a tolerable symmetry via a shared label common to both X(P) and Y(P).

Let us accept the premises and the kind of LA they entail (i.e. syntax-external labelling, divorced from the operation Merge itself, with the latter producing exocentric structures).1 Setting aside some minor technical questions that arise when this system is pursued in greater detail,2 it seems to me at a more fundamental level that both (i) and (ii) rely on additional assumptions that require nontrivial departures from the SMT (i.e. departures from minimal expectations) and are thus on rather shaky ground, conceptually speaking.

Firstly, route (i) necessitates the assumption that IM creates “discontinuous elements” (DEs) that are “invisible to LA” (POP: 44). The idea that lower copies do not ‘count’ or are invisible to syntactic operations is, of course, not a new one, and goes back at least to the trace invisibility of Chomsky (2001), in which lower copies would not induce intervention effects, i.e. they were invisible to Agree. Insofar as LA is itself a kind of Search procedure, like Agree, this invisibility might seem perfectly consistent and plausible – Search of any kind (Agree or LA) only seems to detect the head of a chain. Although we can of course define chains, terms and occurrences in the way we need to ensure this result (as POP: 44 does), it seems counter to the minimalist spirit, as any such legalistic definitions are surely exactly the kind of “descriptive technology” (Chomsky 2008) that add to the “first factor” (in terms of Chomsky 2005) and thereby take us further from the SMT. As far as the syntax is concerned, the minimal assumption is surely that copies are just that: identical copies of the self-same element; what holds of one copy should hold of them all. Each copy contains the same set of features, and since intervention and labelling both operate on features, then if a higher copy can intervene or value a probe or return a label to LA then a lower copy should equally be able to do so. Anything else is a stipulation and a departure from the SMT, requiring careful justification (and in terms of empirical justification, there are well-known cases of Agree with lower copies in the literature, most notably in Holmberg & Hróarsdóttir 2003).3
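For concreteness, the search procedure at issue can be rendered schematically as below. This is my own toy reconstruction for expository purposes only: SOs are strings (heads) or pairs (phrases), the invisible argument models the DE stipulation, and the shared argument models route (ii). It makes no claim about POP's formal details.

```python
def label(so, invisible=(), shared=None):
    # so: a lexical item (str) or a pair of SOs; minimal search for a label.
    if isinstance(so, str):
        return so
    x, y = so
    visible = [z for z in (x, y) if z not in invisible]
    if len(visible) == 1:                    # route (i): IM has desymmetrized
        return label(visible[0], invisible, shared)
    heads = [z for z in visible if isinstance(z, str)]
    if len(heads) == 1:                      # {H, XP}: search finds the head
        return heads[0]
    if shared:                               # route (ii): a shared prominent feature
        return f'<{shared}, {shared}>'
    raise ValueError('labelling failure on {XP, YP}')

vP = ('v*', ('V', ('D', 'apples')))          # {H, XP}: labelled by its head
EA = ('D', 'boys')
print(label(vP))                             # -> 'v*'
print(label((EA, vP), invisible=(EA,)))      # route (i): lower copy ignored -> 'v*'
print(label((EA, ('T', vP)), shared='phi'))  # route (ii): -> '<phi, phi>'
```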

1. I’m not entirely convinced that this is the only possibility under the “simplest” conception of Merge as bare set-formation. Given the undeletable Edge Features (EFs) of Chomsky 2008 as a property of LIs, we could still retain an inherent asymmetry to Merge itself (i.e. Merge is always to something) without Merge(α,β) changing any properties on either α or β, in line with the No Tampering Condition: EF on α (or β) does not delete and remains unchanged. The ‘selector’ is thus always identifiable as the currently accessible head, i.e. the LI whose EF is currently driving the computation – essentially the ‘locus’ of Collins’s (2002) label-free system. Indeed, this is claimed to be the “simplest assumption” for detecting the current probe in Chomsky (2007: 23). In this way, EFs could plausibly act as labels (e.g. by identifying the “designated element” that provides the relevant information to LA); see Cecchetto & Donati (2010, 2015) for such an EF-based, ‘internal’ approach to labelling. Since the label only changes when a new LI (‘selector’) is merged, it is clear that cases like {XP, YP} pose no particular problem for such approaches.
2. For example, the question of which features belong to the set of “prominent features” (POP: 45), i.e. potential labellers, for the purposes of the LA, as well as other attendant complications (such as “weak” variants of these features, as claimed for English φ on T in POP+, requiring ‘strengthening’ via a shared label for labelling to succeed; see Goto 2017 for a critical discussion of the strong/weak-labels hypothesis).
3. See also Takita, Goto & Shibata (2016) and Stockwell (2016) on this point. Several of the arguments and criticisms made in this section are also made by these authors.

Even at a conceptual level, the DE stipulation is surely to be excluded on the grounds that it seems to require the properties of an embedded SO to be altered (to render the lower copy invisible), in violation of the No Tampering Condition (NTC). In effect, the DE assumption exceptionally allows IM to turn (1a) into (1b) – in this case, we have raising of IA out of v*P as one of the ways of desymmetrizing {EA, v*P} for LA purposes, discussed in POP: 44 (17), footnote 34.

(1) a. IA … {EA, {v*, {V, IA}}} → IA … {EA, {v*, {V, IA}}} (lower copy of IA rendered invisible to LA) →

    b. IA … {EA, {v*}} → label = v*

Whilst other operations, such as Transfer, might well be able to remove structure in this way and thus render it invisible to the syntax (as captured, for example, under the Phase Impenetrability Condition (PIC) of Chomsky 2000, 2001),4 simplest Merge conforms to the NTC. In tampering with structure in this way, IM in (1) seems to blatantly flout the NTC. Following Chomsky (2008, 2013 and elsewhere), I take the NTC to be a deep, third-factor computational principle. As such, it takes precedence over an FL-specific stipulation like DE – that is, it would be preferable to do without DE than renounce or weaken the NTC.5 An alternative LA that does not rely on DE would therefore seem desirable.

Secondly, route (ii) does not resemble other instances of Search, such as those involving Probe-Goal Agree. The latter – minimal search under Probe-Goal – is not confounded by {XP, YP} structures, unlike Label-Search, and there is no similar condition on Probe-Goal Agree such that ‘deep search’ into an XP is only possible if a parallel search takes place into YP, its sister. Rather, the Agree-probe just keeps searching deeper and deeper until it finds the kind of thing it’s looking for (up to PIC), whether that goal be located inside XP or YP, with no equivalent arbitrary requirement that searching into non-heads yield two goals that share the same feature. Quite why minimal search has these unexpected and anomalous additional properties just in the case of labelling but not in the case of Probe-Goal needs some kind of justification instead of its current stipulation; otherwise, this again seems to take us further from the SMT, not closer.6

4. Indeed, the use of Transfer in the service of labelling/LA to achieve precisely this effect has been proposed by Narita (2014). See also the more general point discussed below.
5. Alternatively, we could keep DE and seek to eliminate the NTC, which is what Gallego (2017) opts for.
6. Let’s also not forget that the starting point in POP – the initial ‘problem of projection’ from which the labelling discussion and the development of LA proceeds – is the question of how and why [NP, TP] is labelled T and not N, i.e. why NP is interpreted as the specifier of TP and not vice versa (cf. POP: 42 (16)). The shared label for this SO which the LA ultimately arrives at, viz. <φ, φ>, does not solve this problem. It implies that this SO is interpreted as a nominal after all (interpretable φ being a property of NPs/DPs) – surely the wrong result. Worse, T’s φ is uninterpretable, being a φ-probe inherited from C, and thus will never reach the interface (indeed, the only reason it is inherited from C at all is to enable its immediate deletion, if the rationale in Richards 2007 is correct). If labels are needed at the interface for coherent interpretation, then <φ, φ> seems a pretty dysfunctional label for TP.

Thirdly, and more generally, the motivation for POP’s external LA (i.e. the need for SOs to receive the proper interpretation at the interface in a projection-free syntax) sits awkwardly with the use to which it has most widely been put in the literature – viz. as a local, syntax-internal trigger for (successive-cyclic) movement, via route (i). Thus labelling symmetries of the {XP, YP} kind necessitate, and trigger, an immediate resolution in the syntax (even though the labelling issue only arises and should only be detectable at the phase level, when the LA finally applies7). Interestingly, whereas POP: 44 makes a virtue of how successive-cyclic movement is now “forced” by the need to resolve a labelling ambiguity at each intermediate step (i.e. each phase edge, all having the symmetrical form {α XP, YP}), this is no longer the case under the revised perspective of POP+, which makes it clear that there is in fact no need to “force” (i.e. trigger) any movement or instance of Merge under the simplest system of free Merge.8 As such, the intermediate movement steps are just one derivational option (the one that happens to yield labellable structures by LA, perhaps, but still, they are not forced). Therefore, even without the LA-based assumptions underlying route (i) that provide a need to create asymmetric structures via IM (cf. above), these intermediate movements would still be possible, given that “Merge applies freely, including IM” (POP+: 10).

Conceptually, however, it seems odd to create an artificial problem just in order to force or ensure its resolution,9 as there is then always an even simpler way to resolve it (and thus one that comes closer to the SMT), which is not to create the problem in the first place. In this case, the labelling ambiguity that arises at each phase edge would be equally well resolved by not moving to each phase edge in the first place (and thus not creating each symmetrical {α XP, YP} structure at all). That is, one-fell-swoop movements (i.e. non-successive-cyclic derivations), skipping the intermediate positions entirely, would be equally good (i.e. labellable) under the LA (indeed, they’d conform even more straightforwardly to the LA, as neither route (i) nor route (ii) would be required: we’d just have the simplest case of {H, XP} at each phase edge instead) – so really, it turns out that successive-cyclicity isn’t forced at all by LA. The latter wants to label each phase edge as if the raised XP was not there, so why put it there at all?

7. The implied lookahead here is unfortunate, though not real, as we can simply view the choice of not moving further as a derivational option that is filtered out at the interface (by LA). See also the following discussion and footnote 8.
8. Chomsky (2015: 10-11) explicitly rejects “the lingering idea, carried over from earlier work, that each operation has to be motivated by satisfying some demand. But there is no reason to retain this condition. Operations can be free, with the outcome evaluated at the phase level for transfer and interpretation at the interfaces.”
9. Such moves were of course commonplace in earlier minimalism, in which imperfections such as ‘viral’ uFs would be introduced into a derivation in order to trigger particular operations that would check or delete them. As with all such movement triggers, including the use of LA-based symmetries at issue above, the derivation would converge equally well if these imperfections were not introduced to begin with.

There must, then, be some independent factor that forces these intermediate movements to just these positions (i.e. phase edges), and indeed we already have such a thing: cyclic Transfer (/PIC), which not only necessitates these periodic stop-overs at the positions where we find them (which the LA does not do) but also accounts for the initial movement step too (e.g. of a wh-object from its base position in the complement of a verbal head), something which the LA does not do. The latter thus seems doubly redundant as a way of motivating movement, as it is at most just a part of the picture (the ‘movement from’ part, but not the ‘movement to’ part), and an unnecessary part at that.

In fact, it’s triply redundant. As noted above, if we’re to assume that IM can violate the NTC in the manner implied by the DE stipulation, as in (1), then we should certainly also assume that Transfer can do this too (not least as Transfer stands outside the purview of the NTC, which holds only of Merge). In that case, the very points at which the unlabellable intermediate steps of the form {α XP, YP} arise, i.e. phase edges, are precisely those which Transfer will alter through the spelling-out of the phase head’s complement, turning YP into, effectively, Y (no less plausibly than IM/DE would). That is, Transfer will always turn (2a) into (2b), which is labellable at the phase level via minimal search, with LA identifying Y (the phase head) as the label without any need for XP to move.

(2) a. {α XP, YP} = {α XP, {Y, ZP}} → Transfer ZP →

b. {α XP, {Y}} → label = Y

Essentially, what this means is that a Narita (2014)-style labelling through Transfer will always arise at precisely the points in the derivation where the movement-triggering symmetry is meant to arise. This has not gone unnoticed in the literature; indeed, Takita, Goto & Shibata (2016) gamely exploit (2) as a possible alternative way to label these structures, with some interesting consequences for the analysis of existential constructions. However, there comes a point when you have to ask if the game is still worth the candle. The numerous redundancies, inconsistencies and other conceptual concerns raised above lead one to suspect that there might be an even simpler way of going about phase-level labelling under the "simplest conception of Merge" (POP: 42) and a projection-free UG.
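For concreteness, the effect of (2) can be rendered as a small executable sketch (a toy of my own devising, not part of any of the proposals under discussion): SOs are modelled as nested tuples, heads as strings drawn from a toy inventory, and Transfer of the phase-head complement turns the offending {XP, YP} configuration into a trivially labellable {XP, Y}.

    PHASE_HEADS = {"v", "C", "D", "n"}

    def is_head(so):
        return isinstance(so, str)

    def closest_head(so):
        # Toy minimal search: breadth-first, returning the shallowest head found
        frontier = [so]
        while frontier:
            for x in frontier:
                if is_head(x):
                    return x
            frontier = [y for x in frontier for y in x]

    def transfer(so):
        # (2a): spell out the phase-head complement ZP, turning {XP, {Y, ZP}}
        # into, effectively, {XP, Y}
        xp, (y, zp) = so
        assert y in PHASE_HEADS
        return (xp, y)

    def label(so):
        # Toy LA: {H, XP} is labelled by H; {XP, YP} is symmetric (ambiguous)
        a, b = so
        if is_head(a) != is_head(b):
            return a if is_head(a) else b
        return (closest_head(a), closest_head(b))

    edge = (("which", ("n", "book")), ("v", ("read", "it")))  # {a XP, {Y, ZP}}
    print(label(edge))             # ('which', 'v'): no unique label
    print(label(transfer(edge)))   # 'v': labellable with XP left in place

On this toy rendering, the symmetric phase edge never needs to be repaired by movement: Transfer alone yields a unique label, which is precisely the redundancy noted above.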

3. Searching down the rabbit hole

This suspicion is reinforced in light of a conceptual stand-off that arises when LA failures are not repaired (or reparable) by either route (i) or route (ii) and are thus claimed to underlie illegitimate or nonconvergent derivations, i.e. to be a source of deviance or uninterpretability at the interface. In such cases, mutually incompatible sets of assumptions have been brought to bear on the same empirical problem(s), yet they seem to offer equally coherent and plausible theoretical accounts of the same phenomena. More specifically, when it comes to deciding on the relation between labelling and islands (opaque domains, freezing effects, etc.), it seems that no matter which way we turn, we get the same answer. In striving for the SMT, we've lost our conceptual footing.

Of specific concern here is the existence of two compelling, but competing, lines of research into LA-derived islands. On the one hand, a lack of label has been claimed to underlie opacity, most notably by Goto (2015, 2016); relevant here are also Hornstein & Nunes's (2008) claim that adjuncts may go unlabelled and Blümel's (2017a) treatment of root/V2 clauses as labelless, as the island status of both of these (adjuncts and V2 clauses) can then be derived from the LA on the assumption that unlabelled structures are opaque – for Goto, they are "invisible to Search". Goto (2015) makes the case that all the familiar kinds of islands, including CED domains, coordinate structures, CNPC, etc., involve unlabelled SOs. Once labelled, an SO becomes visible to Search.10

By contrast, the opposite state of affairs has also been proposed, i.e. that it is labelling that freezes an SO and renders it (internally) opaque for (sub)extraction. An SO is then transparent until it is labelled, at which point it becomes, effectively, invisible to Search. On such approaches, symmetrical structures – those with a shared label, by route (ii) above – are inherently "stable" and resist any further manipulation: this is the stance defended by Narita (2015) (based on his and Naoki Fukui's "symmetry-driven" model of the syntax), and it is the one which POP+ comes closest to embracing in its approach to criterial freezing and Rizzi's "halting problem" (i.e., movement beyond the shared-label position would result in the wrong label at CI, and thus the wrong interpretation, though this relies on the problematic DE supposition reviewed above). Blümel's (2012, 2017b) system likewise derives freezing effects from symmetrical (shared) labelling and thus falls within this camp.

When it comes to labelling and its relation to (sub)extraction, then, it seems we're damned if we do (Narita), and damned if we don't (Goto). Both conceptions seem equally plausible. Goto's contention that unlabelled SOs are 'invisible', perhaps not (just) at the interface but within the narrow syntax too for certain operations, and that they are opened up to such operations through labelling, has intuitive appeal, and there are credible precursors in the cyclic expansion of search space (Rezac 2003 comes to mind; Rackowski & Richards 2005 is another clear antecedent in terms of rendering phases internally accessible via prior Agree/Search).
Likewise, the opposite contention that labels 'seal off' an SO and mark it as complete and inaccessible for further manipulation, possibly as part of the general packaging of SOs that goes on at the phase level (Transfer, LA, etc.), makes plenty of sense from the phase-cyclic computational perspective. Neither approach is without its conceptual problems, either (whether it be DE or those raised in footnote 10). When faced with such an empirical, theoretical and conceptual tie, with no obvious arguments to tip the balance one way or another, we run the risk of stalemate and theoretical stagnation. How do we decide on the road ahead?

10. Conceptually, this sits uneasily with POP's claim that LA is itself a Search-based procedure, as then it is unclear how LA could ever label anything – in order to be visible to LA/Search and thus receive a label, an SO would already have to have a label. The Goto approach also has to allow certain SOs to remain unlabelled at the interface, departing from POP. I see the former issue as more problematic than the latter; indeed, the latter is potentially quite desirable (see section 4 below), as many of the problems with the POP-LA system stem from it trying to label too much. For an 'internal' approach to labelling that likewise reduces islands to lack of label, see Cecchetto & Donati (2012, 2015).

Given the other issues surrounding the POP-LA framework touched on in section 2, it seems to me that the best way to get our bearings back and reset our conceptual compass is to retrace our steps a little and venture down a slightly different path.

4. Another phase–label fable

One way to break the deadlock and dig ourselves out of the apparent hole is to go back and ask ourselves if the POP-LA really is the simplest approach to labelling that we could imagine under the SMT. The best possible scenario – i.e., the LA that comes closest to the SMT – would be not to need an LA at all. From the minimalist perspective, this is perhaps where we should have started, to first see how far we could get without assuming a special LA of any kind, only opting for more complicated solutions when this simplest system, without an LA, breaks down or proves inadequate. After all, why do we need an LA? Why search for labels at all? If the reason is to render SOs interpretable at the interface (in terms of their categorial type), then a Search-based algorithm (with all its attendant complications) looks like overkill. It's doing too much; it's labelling too much. As Hornstein & Pietroski (2009) and others have argued, it is not clear that every SO needs to be labelled at the interface for the purposes of interpretation. This is especially true from the perspective of phase theory (Chomsky 2000 et seq.). Phases just are the units of interpretation. They are transferred as units and show semantic, phonological and syntactic integrity as interpretive units (see especially Chomsky 2001 on this). It is unclear that anything smaller than a phase is interpreted at the interface. Labelling anything smaller than the phase is then redundant. Part of why phases exist, then, might be precisely to provide labels to the SOs that do get interpreted.

If we minimally (and maximally) assume that we just label what we need to label, i.e. just those SOs which are actually interpreted, then labels can simply be added, uniformly, by external merge of categorizing phase heads.11 This is already widely assumed for lexical/event structure. As POP+ also notes, in the Distributed Morphology tradition of roots and categorizers, a root is inherently unspecified for its label, and receives this information externally, from the structural context, via heads like v – i.e., via phase heads (see, e.g., Embick 2010; Marantz 2013). If at least some phase heads act as labels (i.e. the 'categorizers' of DM and other exocentric approaches, such as Borer 2003, 2014), then maybe we should try just equating the two – i.e. all labels are phase heads, and vice versa. A Merge-based approach to labelling thus emerges, based on the simplest conception of Merge (i.e. without projection), in which nonphase heads and their SOs are labelled by phase heads (hence the alternating P-N-P-N-… sequence of phase heads and nonphase heads which seems to characterize the clausal spine; cf. Richards 2007). As long as the phase head is detectable at the phase level (as presumably it must be, quite independently of labelling, as the trigger of Transfer and the driver of phase-level operations), then identifying the label at the phase level is trivial – it's just the phase head.

The idea, then, is that we simply generalize the DM roots-and-labels approach to all cases, i.e. from roots and heads to all labelless XPs. Then, just as a categoryless root (R) receives its label externally, structurally, by merging with a categorizing functional head (K) such as v, n, etc., so an unlabelled (nonphase) XP receives its label externally, structurally, by merging with a phase head. The basic cases12 seem easy enough to capture, and are given in (3).

11. As Juan Uriagereka points out (p.c.), this is as expected from a physical perspective: not everything in physics is visible, detectable or measurable. Unlabelled SOs would then count among the class of theoretical entities that are in principle unobservable; they would be inaccessible to the conscious mind. (This would also seem to chime with Chomsky's 2017 recent speculations on related matters regarding externalization, inner speech and the language of thought.)

(3) {R, DP} (= nonphase, not labelled/interpreted)
    {v, {R, DP}} (= phase, interpreted with label v)
    {EA, {v, {R, DP}}} (= same phase, same label)
    {T, vP} (= nonphase, not labelled/interpreted)
    {DP, TP} (= nonphase, not labelled/interpreted)
    {C, {EA, TP}} (= phase, interpreted with label C)
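Purely as bookkeeping, (3) can be restated as a small executable table (a toy illustration of my own, not part of the proposal): the only rule it encodes is that an SO is interpreted, and hence labelled, exactly when a phase head (here v or C) heads it.

    # (3) as data: labels record where a phase head heads the SO;
    # sub-phasal SOs receive no label and are never interpreted as such
    derivation = [
        ("{R, DP}",            None),  # nonphase: not labelled/interpreted
        ("{v, {R, DP}}",       "v"),   # phase: interpreted with label v
        ("{EA, {v, {R, DP}}}", "v"),   # same phase, same label
        ("{T, vP}",            None),  # T is root-like: no label of its own
        ("{DP, TP}",           None),  # nonphase: not labelled/interpreted
        ("{C, {EA, TP}}",      "C"),   # phase: interpreted with label C
    ]
    for so, lab in derivation:
        print(f"{so:22} -> {lab if lab else 'no label (sub-phasal)'}")

Since the phase head must be detectable anyway as the trigger of Transfer, nothing beyond this lookup is required to identify the label at the phase level.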

Following POP: 47, the base pair of every tree, involving merger of two heads (LIs), consists of a root and its categorizer: only one of these two items thus provides a label. For Merge{X, Y}, with X the phase head (categorizer) and Y the root, X is therefore the label. Suppose the root (R, a nonphase head) first combines with an internal argument, yielding e.g. {R, DP}. This will then be labelled, externally, by merger of the phase head (e.g. v), so that {v, {R, DP}} is the minimal labelled (and thus interpreted) SO. Since {R, DP} is smaller than a phase, it is not interpreted anyway, and so it does not need a label. For {EA, vP}, the detectable phase head (i.e. the one triggering Transfer and other phase-level operations) is v, hence v is also detectable as the label (at least in the usual case; see below). Following POP+, the head T is essentially like a root (it is too "weak" to label on its own); it is feature-less, inheriting its properties from C (cf. Chomsky 2007, 2008). Thus {T, vP} is the same as {R, DP}: it is labelled externally, by the phase head (C).13 Assuming a cyclic construction of the CP, with IM of the subject preceding Merge-C under Free Merge (following POP+: 10), the SO {SPEC, TP} is likewise labelled by the phase head C via merge of the latter, yielding the minimal labelled SO = {C, {SPEC, TP}}. Again, the smaller SO ({SPEC, TP}) does not need its own label, as it is sub-phasal and thus never interpreted as such. Finally, DP (or nP) is likewise a labelled root, labelled by a phase head (D or n*), following suggestions for the treatment of nominal phases in POP and Chomsky (2007: 25).

12. Following Chomsky (2007 et seq.), I take phase heads to be the locus of uninterpretable (unvalued) features. Valuation of these features (via Agree) renders the phase head inactive and triggers Transfer.
13. Feature inheritance might equally well provide the SOs {T, vP} and {R, DP} with a label; they could simply inherit the label of the phase head that selects them (i.e. C or v, etc., respectively). It is unclear that 'TP' is categorially distinct from CP, any more than 'RootP' is categorially distinct from vP, etc. Note a possible prediction here: if T is essentially a root, categorized and labelled by C, then lexical roots might themselves be directly categorized by C, a potential source for prepositions and the well-known parallels between C and P.

Insofar as we can get away with just labelling what we actually need to label (i.e. the minimal phase-label story sketched above), no additional LA is required. Furthermore, this 'simplest' approach to labelling under the SMT has the further advantage of deriving the same island/opacity effects that led to the conceptual impasse under POP-LA that we saw in section 3 above. Freezing effects will arise as 'wrong label' effects, just as POP+ proposes for these, but without the inherent uncertainties (reversible assumptions) of the latter approach. Essentially, the problem of Merge{X, Y} under POP-LA, where both heads would yield a label (resulting in a conflict or ambiguity at the interface), now obtains in the specific case of Merge{XP, YP} where both XP and YP are phases (and in particular, phases with active edges). That is, the only place where a labelling symmetry will arise under the alternative phase-label approach outlined above is where two phase heads come together at the same time, in a single SO, with both phase heads then offering a label for that SO at the interface. Islands, then, are not due to a lack of label (the Goto approach) or to a shared label (the Narita/POP approach); rather, they are due to there being two labels (leading to an anomalous, ambiguous or gibberish interpretation at CI). The technical implementation of this could be achieved by means of undeletable Edge Features on phase heads, with each such EF providing a label (cf. footnote 1; see Richards 2014 for an implementation along these lines). It might also be possible to reduce it to the integration of separate workspaces (thus reinventing Uriagereka 1999 yet again, with left-branch compression and its resultant CED effects now reconceived in terms of labelling rather than the LCA). Left branches (such as the EA DP), as separate phases of the derivation, would be constructed in parallel and then integrated into the clausal spine. In the normal case (with no subextraction), the DP (EA) phase is constructed, transferred and labelled by its phase head (D/n). This labelled SO can then be added to the workspace of the v* phase for merger with v*P. However, in order to extract something out of the DP/EA, the latter's workspace must be kept open: there are then two active or open workspaces – the phase we're moving out of (the DP/EA) and the phase we're moving into (v*P). The result is an SO at the v*P phase level, {DP, v*P}, which contains two active phase heads, and therefore two potential labels. Whatever the technical implementation, the essential idea is that island SOs result from a nonuniform composite label, such as <n, v*> for subject islands, which confuses the interface (and/or leads to a deviant interpretation).

Islands are thus predicted to emerge just where two phases (i.e. two phase-labelled XPs), of different categories,14 are merged together and must both remain 'active' (i.e. both the source phase and the target phase for subextraction), and indeed this configuration is implicated in at least the following familiar island types:

14. The Coordinate Structure Constraint seems harder to capture, but its resolution via Across-The-Board movement follows naturally as, in such cases, both active phase heads would provide the same label, yielding a shared, uniform composite label equivalent to that obtained by route (ii) under the POP-LA.


(4) a. Subject islands (CED): *{nP, v*P}
    b. Adjunct islands (CED): *
    c. Free-relative islands: *{nP, CP}
    d. CNPC: *{nP, CP}
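For illustration, the two-label idea behind (4) can be put in executable form (again a toy of my own; the *<…> notation for nonuniform composite labels is reconstructed from the surrounding discussion):

    def composite_label(active_phase_heads):
        # One active phase head (or two matching ones, as in ATB contexts,
        # cf. footnote 14) yields a uniform label; two distinct active phase
        # heads in a single SO yield a nonuniform composite label -- an island
        labels = list(dict.fromkeys(active_phase_heads))
        if len(labels) == 1:
            return labels[0]
        return "*<" + ", ".join(labels) + ">"

    print(composite_label(["v*"]))        # v*        : ordinary v*P
    print(composite_label(["n", "v*"]))   # *<n, v*>  : subject island, (4a)
    print(composite_label(["n", "C"]))    # *<n, C>   : free relatives/CNPC, (4c-d)
    print(composite_label(["C", "C"]))    # C         : ATB coordination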

The case of (4d) warrants further comment. The CNPC is arguably much more general than usually thought, as persuasively argued by Bošković (2015). The exceptionality of verbs (with respect to other lexical categories) in allowing extraction out of their complements now follows if the categorizing phase head in the case of n, a merges directly with the root, severing the root from its internal argument, so that the dependent (complement XP) of a noun is sister to {n, R}, as in (5a), whereas the root merges directly with the dependent/complement in the case of verbalization, as in (5b).15

(5) a. {{n, R}, XP}
    b. {v, {R, XP}}

This structural difference in turn follows independently from the presence of a φ-probe on v (which enters into φ-Agree and Case-valuation with the dependent object), thus requiring a Probe-Goal configuration (i.e., Search-sister) to obtain between the phase head/categorizer and the object just in the case of verbs, which is not possible if the phase head merges directly with the root, as in (5a).16 If the nominalized root {n, R} merges with the noun's complement (e.g. CP), we have the configuration in (4d), hence the island effect qua labelling conflict. Thus CNPC effects obtain where (5a) instantiates (4d), with XP = CP, yielding the nonuniform composite label *<n, C>. The exceptional extractability out of verbal complements is then the direct, structural result of the categorizing phase head v merging higher, above the extraction site, yielding an SO of the form in (5b). As such, verbs do not instantiate an SO of the (4d) kind, involving the merger of two phases, unlike the other categories. For the same reasons, we can immediately see why phase sliding (and similar ideas) will have a 'melting' effect on islands (cf. Gallego 2006, 2010): raising of the phase head (such as obtains under v-to-T movement) places the categorizing head, and thus a singleton/uniform label, above the extraction site, re-establishing (5b), so that the lower labelling failure is rescued at the phase level; cf. (6).

15. POP: 46, footnote 43, proposes (5a) for v + Root combinations. I suspect on the above grounds that it holds for all categories apart from v. See Alexiadou (2014) for relevant discussion and a different take on the severing of arguments from roots.
16. Gallego (2014) exploits the presence of this φ-probe on v (versus its absence on n, a) in order to account for another highly salient difference between verbs and other categories, namely why arguments are obligatorily present only with verbs.

(6) ‘Melting’ via head movement:

P1 … {{P2, XP}, {P1, YP}}

More radically, and more tentatively, the basic configurational difference between (5a) and (5b) reveals a further redundancy that might now be eliminated. The categorial distinction between verbs and nouns (or non-verbs) is duplicated in (5): it is specified both on the categorial type of the phase head (v, n) and via the structural difference itself. The fact that human language involves two basic categorial types (nouns and verbs; or rather, [+N] and [-N]) may reduce to the two logical possibilities for combining roots with their categorizers and complements, allowing us to speak of a generalized lexical categorizer, K:

(7) a. {{K, R}, XP} = "[+N]"
    b. {K, {R, XP}} = "[-N]"

This further derives the fact that derivations terminate in nouns (or rather, in a [+N] category): this is what emerges from base-Merge of two heads/terminals, i.e. {K, R}, as in (7a).
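As a quick sanity check, the classification in (7) can be stated mechanically (a toy illustration of my own, with K, R and XP as plain symbols):

    def categorize(so):
        # Classify an SO built from a categorizer K, a root R and an optional
        # complement XP, per (7): the position of K relative to R determines
        # the major category, with no categorial feature on K itself
        a, b = so
        if a == "K" and b == "R":
            return "[+N]"                 # base-Merge of two terminals: {K, R}
        if a == ("K", "R"):
            return "[+N]"                 # (7a): {{K, R}, XP}
        if a == "K" and isinstance(b, tuple) and b[0] == "R":
            return "[-N]"                 # (7b): {K, {R, XP}}
        return None

    print(categorize(("K", "R")))          # [+N]: why derivations end in nouns
    print(categorize((("K", "R"), "XP")))  # [+N]
    print(categorize(("K", ("R", "XP"))))  # [-N]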

5. Conclusion

I leave a more extensive elaboration of the details of this proposal to further research. My intention here has simply been to articulate the argument that recent developments in minimalist generative syntax, in particular the POP-LA, might be leading us down something of a conceptual blind alley (albeit an undoubtedly productive and inspiring one), and that the simplest LA under the SMT – the ideal scenario in which there is no LA per se, with phase heads providing the external labels for nonphasal SOs (as already widely assumed for roots in DM and other constructionist/exocentric approaches to categorization) – is at least worth exploring before we abandon it in favour of more complex, Search-based solutions.

References

Alexiadou, A. (2014). Roots don't take complements. Theoretical Linguistics 40: 287-298.
Blümel, A. (2012). Successive-cyclic movement as recursive symmetry-breaking. Proceedings of WCCFL 30: 87-97.
Blümel, A. (2017a). Exocentric Root Declaratives. Evidence from V2. To appear in L. Bauke & A. Blümel (eds.). Labels and Roots. Berlin: De Gruyter.

Blümel, A. (2017b). Symmetry, Shared Labels and Movement in Syntax. Berlin: De Gruyter.
Borer, H. (2003). Exo-skeletal vs. endo-skeletal explanations: syntactic projections and the lexicon. In J. Moore & M. Polinsky (eds.). The Nature of Explanation in Linguistic Theory, 31-67. Chicago: CSLI/University of Chicago Press.
Borer, H. (2014). The category of roots. In A. Alexiadou, H. Borer & F. Schäfer (eds.). The Syntax of Roots and the Roots of Syntax. Oxford: OUP.
Bošković, Ž. (2015). From the Complex NP Constraint to everything: On deep extractions across categories. The Linguistic Review 32: 603-669.
Cecchetto, C. & C. Donati (2010). On labeling: Principle C and head movement. Syntax 13: 241-278.
Cecchetto, C. & C. Donati (2012). Relative Structures (and other strong islands) reduced to relabelling. Ms., Universities of Milan and Rome.
Cecchetto, C. & C. Donati (2015). (Re)Labeling. Cambridge, MA: MIT Press.
Chomsky, N. (2000). Minimalist inquiries: the framework. In R. Martin, D. Michaels & J. Uriagereka (eds.). Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89-156. Cambridge, MA: MIT Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz (ed.). Ken Hale: a life in language, 1-50. Cambridge, MA: MIT Press.
Chomsky, N. (2005). Three Factors in Language Design. Linguistic Inquiry 36: 1-22.
Chomsky, N. (2007). Approaching UG from below. In U. Sauerland & H.-M. Gärtner (eds.). Interfaces + Recursion = Language? Chomsky's Minimalism and the View from Syntax-Semantics, 1-30. Berlin: De Gruyter.
Chomsky, N. (2008). On phases. In R. Freidin, C. P. Otero & M.-L. Zubizarreta (eds.). Foundational Issues in Linguistic Theory, 133-166. Cambridge, MA: MIT Press.
Chomsky, N. (2013). Problems of Projection. Lingua 130: 33-49.
Chomsky, N. (2015). Problems of Projection: Extensions. In E. Di Domenico, C. Hamann & S. Matteini (eds.). Structures, strategies and beyond: Studies in honour of Adriana Belletti, 1-16. Amsterdam: Benjamins.
Chomsky, N. (2017). Untitled talk. Generative Syntax: Questions, Crossroads and Challenges. Barcelona, June 23, 2017.
Collins, C. (2002). Eliminating labels. In S. D. Epstein & T. D. Seely (eds.). Derivation and Explanation in the Minimalist Program, 42-64. Oxford: Blackwell.
Embick, D. (2010). Localism and globalism in morphology and phonology. Cambridge, MA: MIT Press.
Gallego, Á. (2006). Phase Sliding. Ms., UAB and UMD.
Gallego, Á. (2010). Phase Theory. Amsterdam: Benjamins.
Gallego, Á. (2014). Roots and phases. In A. Alexiadou, H. Borer & F. Schäfer (eds.). The Syntax of Roots and the Roots of Syntax. Oxford: Oxford University Press.
Gallego, Á. (2017). Strong and weak "strict cyclicity" in phase theory. Ms., Universitat Autònoma de Barcelona.
Goto, N. (2015). On Labeling: In Search of Unconstrained Merge. Paper presented at the English Linguistic Society of Japan, Workshop on Unconstrained Merge, Osaka, November 2015.
Goto, N. (2016). Labelability = extractability: Its theoretical implications for the Free Merge hypothesis. Proceedings of NELS 46, vol. 1: 335-348.

Goto, N. (2017). Eliminating strong/weak parameter on T. Proceedings of GLOW in Asia XI.
Holmberg, A. & H. Hróarsdóttir (2003). Agreement and movement in Icelandic raising constructions. Lingua 113: 997-1019.
Hornstein, N. & J. Nunes (2008). Adjunction, labeling, and bare phrase structure. Biolinguistics 2: 57-86.
Hornstein, N. & P. Pietroski (2009). Basic Operations: Minimal Syntax-Semantics. Catalan Journal of Linguistics 8: 113-139.
Marantz, A. (2013). Locality Domains for Contextual Allomorphy across the Interfaces. In O. Matushansky & A. Marantz (eds.). Distributed Morphology Today: Morphemes for Morris Halle, 95-115. Cambridge, MA: MIT Press.
Narita, H. (2014). Endocentric structuring of projection-free syntax: Phasing in Full Interpretation. Amsterdam: Benjamins.
Narita, H. (2015). Conditions on Symmetry-Breaking in Syntax. Paper presented at the English Linguistic Society of Japan, Workshop on Unconstrained Merge, Osaka, November 2015.
Rackowski, A. & N. Richards (2005). Phase Edge and Extraction: A Tagalog Case Study. Linguistic Inquiry 36: 565-599.
Rezac, M. (2003). The Fine Structure of Cyclic Agree. Syntax 6: 156-182.
Richards, M. (2007). On feature inheritance: An argument from the phase impenetrability condition. Linguistic Inquiry 38: 563-572.
Richards, M. (2014). Wrong path, wrong label. On the relation between labelling and left-branch opacity. Paper presented at DGfS Marburg, March 2014.
Stockwell, R. (2016). Labelling in Syntax. Cambridge Occasional Papers in Linguistics 9: 130-155.
Takita, K., N. Goto & Y. Shibata (2016). Labeling through Spell-Out. The Linguistic Review 33: 177-198.
Uriagereka, J. (1999). Multiple Spell-Out. In S. D. Epstein & N. Hornstein (eds.). Working Minimalism, 251-282. Cambridge, MA: MIT Press.

Catalan Journal of Linguistics Special Issue, 2019 153-163

On Morpho-Syntax*

Daniel Siddiqi Carleton University [email protected]

Received: February 15, 2018 Accepted: September 23, 2019

Abstract

This short paper offers a moment of reflection on the state of the Generative Grammar enterprise, especially in light of the fact that Minimalist syntax has so completely returned to a mission that includes (rather than explicitly excludes) a model of word-formation. I focus here on a discussion of crucial ways in which the move from "syntactic" theory to "morphosyntactic" theory has changed the mission of generative grammar and to what extent practitioners have kept pace. I hope to provide both a broad and a long view of the metatheoretic concerns we now find ourselves at the nexus of, and to suggest best practices in light of those views.

Keywords: distributed morphology; metatheory; readjustment; allosemia; psycholinguistics

Resum. Sobre la morfosintaxi

Aquest article breu ofereix un moment de reflexió sobre l’estat de l’empresa gramatical generativa, sobretot a la vista que la sintaxi minimalista ha tornat de manera tan completa a una missió que inclou (més que no pas exclou explícitament) un model de formació de paraules. Em centro aquí en una discussió de maneres crucials de passar de la teoria «sintàctica» a la teoria «morfosintàc- tica» que ha canviat la missió de la gramàtica generativa i fins a quin punt els practicants han mantingut el ritme. Espero oferir una visió àmplia i llarga de les preocupacions metateorètiques que ara ens trobem en el punt de mira i suggerir bones pràctiques a la vista d’aquestes opinions. Paraules clau: morfologia distribuïda; metateoria; reajustament; al·losèmia; psicolingüística

Table of Contents

1. Introduction
2. On exceptions and generalizations
3. On "conceptual" arguments
4. On readjustment and allosemy
5. On words and morphemes
6. On psycholinguistic evidence
7. Conclusions
References

* I’d like to acknowledge the feedback of Itamar Kastner, Ángel Gallego, and Dennis Ott on this paper. As well, revisions to this paper result from discussion at GenSys17 in Barcelona, especially from Brandon Fry. Additionally, this discussion largely comes out of a conversation with Gregory Stump. Finally, I’d like to thank two anonymous reviewers for helpful feedback.

ISSN 1695-6885 (in press); 2014-9718 (online) https://doi.org/10.5565/rev/catjl.222

1. Introduction

While overshadowed a bit by the advent of the Minimalist Program, Bare Phrase Structure, and Merge (Chomsky 1995), the rise of a realizational morphology-syntax interface (Anderson 1992; Beard 1995; Halle & Marantz 1993, 1994; Stump 2001, etc.) during that same period (the early 1990s) has had similarly profound effects on the enterprise of generative grammar. Of course, realizational morphology-syntax interfaces are in no way limited to the Chomskyan generative tradition. For example, a fruitful contemporary research program maps Paradigm Function Morphology (PFM; Stump 2001, 2016) to Lexical Functional Grammar (LFG; Kaplan & Bresnan 1982). However, in the context of this workshop, the main relevant expression of this in Minimalism is Distributed Morphology (DM; Halle & Marantz 1993, 1994), which has risen over the last quarter century to be one of the dominant Minimalist frameworks. Its addition to Minimalism has paid huge dividends to the model, not the least of which is opening access to a large swath of North American languages, as DM has equipped Minimalism with ample tools required for investigation into highly synthetic languages.

An important aspect of the rise of DM is the blurring of lines in generative grammar. Within the Chomskyan tradition, the division of morphology and syntax has largely been erased, meaning Chomskyan syntacticians are often morphologists and DM morphologists are often syntacticians. This is a massive growth in the scope of the enterprise (or, more precisely, a return to a previously large scope that had been narrowed circa Chomsky 1970—the default position in the literature prior to Chomsky 1970 assumed morphemes as the building blocks of syntax: see for example Chomsky & Halle 1968; Chomsky 1957; Chomsky 1965). For example: Minimalism, by way of DM, is now crucially concerned with the nature of the morphology-phonology interface. Stem allomorphy, productivity, and blocking, which were once the exclusive domain of morphology, are now concerns for syntax. Whole word storage, word processing, and frequency effects are now within Minimalism's domain. Templatic morphology, defective paradigms, and cross-class syncretisms are things Minimalism ought to have explanations for. Similarly, DM has become a model of lexical semantics, especially recently. And again, Minimalism now inherits the onus to provide explanations to complement the decades of lexical semantics research, such as argument structure, polysemy, and event structure.

In this type of venue, it is tempting to bemoan the specialization and lack of awareness of the literature of researchers in this type of sprawling interface research program. Certainly, I will do some of that here, but my main contribution to this workshop is a discussion of crucial ways in which the move from "syntactic" theory to "morphosyntactic" theory has changed the mission of generative grammar and to what extent practitioners have kept pace. I focus here on a brief survey of five things that seem especially pressing to me.

2. On exceptions and generalizations

One of the key markers of morphology is that it is the main domain for exceptional behaviour. For example, the generalization about morphology is that it is largely concatenative, showing headed hierarchical structure similar to syntax. However, stem allomorphy and other non-affixal morphological processes that modify the base make up a significant portion of studied phenomena, especially among more frequent forms. Similarly, most (affixal) morphology is semantically transparent. But the complex word is famously the domain for most non-compositional meaning (often unironically called lexicalization). Even in the domain of phonology, morphophonemic alternations such as tri-syllabic shortening or velar softening stand apart from the otherwise fully regular and fully productive phonology.

The relative size of the class of exceptional phenomena within the domain of morphology is what makes it significant. Non-concatenative morphology comprises such a large class of data that it is not an exaggeration to claim that the field of morphology is effectively divided over the metatheoretic concerns involved in dealing with it. Item-and-arrangement models of morphology seek to treat non-affixal morphology as truly exceptional and limit their scope to primarily account for concatenative morphology. This limits the formal mechanism they require to a simple concatenative mechanism not unlike Merge, thus increasing restriction (and decreasing power). The flip side of this, of course, is that all morphology has to be treated as affixal (thus the need for transfixes and superfixes and the like) or it has to be treated as exceptional (and essentially listed). This trade-off reduces the empirical coverage of item-and-arrangement models. However, one significant gain for morpheme-based models is a claim to parsimony. In this view, syntax and morphology can both be reduced to headed hierarchical structures.

Item-and-process and word-and-paradigm models of morphology treat the preponderance of concatenative morphology as epiphenomenal and seek to treat non-affixal morphology as the baseline for the required power of the model. In these models, morphological rules or paradigm functions are powerful enough to generate non-concatenative morphology. Thus, the power of these models is generally greater than needed to account for the attested patterns of morphology in the world and is certainly greater than that needed to account for concatenative morphology. Indeed, in these models, what appears to be an affix is almost never treated as one. Rather, it is a stem change that happens to be at the periphery of the word form. In these cases, the fact that most morphology appears to be concatenative is treated as an artifact of history (see Stump 2001 for discussion; see also chapter 3 of Haspelmath & Sims 2010 for a textbook discussion of this point).

In essence, item-and-arrangement models sacrifice empirical coverage by treating non-affixal morphology as exceptional but gain restriction and the corresponding falsifiability. The other two gain empirical coverage but at the price of increased power.

Of course, this discussion should be very familiar. Chomsky (1970) in no small part argued to eliminate these types of exceptional data from the mission of syntactic generative grammar. Lieber (1992) made similar arguments to eliminate them from a model of the morphosyntax as well.
Since Chomsky (1995), Minimalism has put a primary focus on the metatheoretic principles of restriction and elegance. As it has done so, it has systematically sacrificed empirical coverage to limit its power, eventually discarding adjuncts and head movement, for example. The restriction it has gained is frequently touted by its practitioners as its main appeal, especially as its competitors such as HPSG, LFG, and CxG have sacrificed restriction for empirical coverage and computational/cognitive power.

So it is remarkable that DM undoes a lot of those moves. DM spends an extraordinary amount of research on exceptions and creates increasing numbers of mechanisms to account for this exceptional behaviour. It is not without significant irony that I point out that root suppletion and stem allomorphy are one such topic where practitioners of DM invest significant effort (see for example Siddiqi 2009; Haugen & Siddiqi 2013, 2016; Harley 2014; Harley & Tubino Blanco 2013, among many, many others) and create powerful mechanisms (such as default fusion or readjustment rules) to account for phenomena that are clearly exceptional and would be removed from the enterprise of generative grammar by the standards of Chomsky (1970) and Lieber (1992). Similarly, recent accounts of level ordering data (see for example Newell 2016; cf. Kiparsky 1982) again capture data with very low productivity (potentially none) yet suggest significant changes to the power of the productive grammar.

This monster of power creep is two-headed. Besides accounting for phenomena that are clearly exceptional, DM eagerly accepts mechanisms that were cast aside from the pre-spell-out branch of syntactic theory, such as head movement and lowering, while also adding powerful transformational mechanisms such as rebracketing (see for example Radkevich 2010) and local dislocation (see for example Embick 2007). The sheer number of these exception-capturing mechanisms would drastically reduce restriction and increase the power of the model by itself, but this is further magnified by the fact that these mechanisms themselves are also frequently powerful. Impoverishment and readjustment have existed for as long as the theory has. Recently, allosemy of roots, which is essentially vocabulary insertion except for lexical semantics instead of phonology, has been increasingly en vogue (on allosemy see Marantz 2013, 2014; Wood & Marantz 2017; Wood 2015; Harley 2014). Readjustment and allosemy are especially powerful, so I will give them a section of their own below.

Minimalism (and the Chomskyan tradition more generally since at least 1970) has taken as its driving principle that restriction, elegance, parsimony, and other such concerns are prioritized over empirical coverage, especially when it comes to phenomena that are clearly exceptional. Minimalism stands nearly alone among formal models of syntax in prioritizing these concerns. DM inherits this mission from Minimalism, but while Minimalism continues to emphasize this restriction, it seems that DM, taken as a whole, loses sight of this mission. To some extent, this is just the natural effect of power creep. Each theorist who proposes a small increase in power has not compromised the whole system, but when all of us are doing this, we quickly arrive where we are now: with a PF path that is congested with powerful devices that threaten the claim to restriction.
But the other reason that this increase in complexity is happening is the types of questions we are asking, specifically about morphology. The field of generative morphology has known since its inception that its generalizations license a simple, elegant model of concatenative morphology and syntax—a model that DM aims to be. But its exceptions are a large enough class that they license a more robust word-based approach, such as the approach taken by PFM. The word-based hypothesis (Aronoff 1976) treats explanations for the exceptional, such as stem allomorphy, as the baseline for the minimum necessary power of the model. Word-based models are very good at this. DM needs to remember that its appeal, and the appeal of any morpheme approach, is its limited power. This entails treating exceptional morphology as truly exceptional (i.e. listed), rather than accounted for by an ad hoc mechanism of the grammar. Otherwise, it will just be worse at doing what word-based models are already doing, but without any claims to restriction and elegance.

3. On "conceptual" arguments

What I hope I did in Section 2 was make a nuanced, metatheoretical argument couched in a solid history of philosophy of science—i.e., a "conceptual argument". Generative grammar, like all sciences, relies on metatheory as the backbone of the entire enterprise. Indeed, Minimalism is essentially a mandate to put metatheoretic concerns at the forefront of linguistic theory. It is odd then that even a cursory glance at the literature in DM will reveal "setting aside conceptual considerations" as a refrain. To be sure, given the relatively very small amount of data we have, there is definitely a place for arguments that the field of linguistics should be focusing more on developing models with sweeping empirical coverage than on developing models that forefront metatheoretical concerns. Kaplan (1987) very convincingly made this argument in defense of very complicated and powerful modular models such as LFG. A similar argument is made by Haspelmath (2013) in favor of non-generative, "nonaprioristic" approaches to comparative syntax. These types of arguments make sense when made on behalf of a model or framework or program that has aimed at putting empirical coverage ahead of metatheoretic concerns. These arguments are somewhat counterintuitive when used within DM, since DM appears at the nexus of a history of morpheme approaches to morphology (which by their very nature prioritize parsimony and restriction over empirical coverage) and Minimalist syntax (which explicitly prioritizes parsimony and restriction). Metatheoretic concerns are fundamental to DM. They are in its very blood.

The most compelling case against conceptual arguments that I have seen within DM comes from Embick (2014), where he argues against conceptual arguments being used to favor "insertion into non-terminals" models over "morpheme-insertion only" models, so I quote a bit of it here:

Since it appears to allow for a grammar with fewer mechanisms, INT might have a conceptual advantage over MIO; at least, to the extent that this kind of accounting is taken at face value as a valid assessment of parsimony…. In any case, conceptual arguments about which of INT and MIO has more or less machinery provide guidelines for research, but are not decisive, and must play a secondary [role] to questions about where the two theories differ empirically. …MIO is superior to INT on empirical grounds, in a way that trumps (potential) conceptual concerns. (Embick 2014)

In the case of Embick (2014), I happen to have disagreed with the relevant empirical arguments (Haugen & Siddiqi 2016), but the argument for prioritizing empirical concerns is a compelling one…except that, if you follow it to its logical conclusion, it means adopting a word model of morphology that shares many of the underlying assumptions of DM but has much greater empirical coverage (and the corresponding power) (one obvious candidate I mentioned before is PFM). DM and Minimalism have explicitly stated, repeatedly, by their very design, that conceptual concerns must play a primary role, not a secondary one.

An ancillary concern I have here is the effect that a potential dismissal of "conceptual considerations" can have on the review process. Having a propensity for the conceptual argument myself and having been an editor on several occasions, I have seen more than my share of reviews that actively diminished the value of metatheoretic concerns. While it is certainly true that conceptual arguments should be disregarded in the face of overwhelming counter-evidence (even the simplest model has no value if it doesn't describe the world), conceptual arguments have a significant importance to Minimalism and DM. We risk losing our identity if we are too quick to disregard them.

4. On readjustment and allosemy

I have argued against readjustment rules for most of my (very short) professional career. But these arguments have usually come from the point of view of a morphologist and a syntactician. It should be lost on nobody, though, that nearly every contemporary incarnation of readjustment rules assumes they are phonological—that they properly belong to the phonological component of the grammar. Similarly, the recent hypothesis that morphemes are subject to conditioned listed allosemic alternations is intended to capture polysemy, which has always been one of the most researched areas of lexical semantics.

Both readjustment and allosemy have in common that they are aggressively powerful. Both have the power to map one root to vastly different LF and PF realizations given a particular environment. As Haugen & Siddiqi (2013, 2016) point out, readjustment is assumed to account for alternations such as think-thought, seek-sought, and bring-brought, which involve a phonological rule powerful enough to take /ɪŋk/, /ɪk/, or /ɪŋ/ as its input and then output /ɔ/. This is an extraordinarily powerful phonological rule: the type of which would never appear in the phonological literature. Similarly, the current way we approach allosemy, as seen in Harley (2014), employs radically different and independent mappings from a root to its semantics. For example, Harley (2014) proposes that the root realized by -ceive/-cept- maps to "think" in the context of con- while mapping to "fake" in the context of de-. While Harley's (2014) discussion regards allosemy of roots, most applications of allosemic alternation are applied to functional heads (see for example Wood & Marantz 2017).

Put succinctly, these are examples of morpho-syntacticians imposing brute-force, radical replacement operations on a different component of grammar—rules of a variety that would never be proposed by practitioners in those particular components. It's not surprising that there has been pushback about this: Bermúdez-Otero (2013) is in part a scathing rebuke of readjustment, and Ramchand (2015) forcefully rejects allosemy from the view of a constructivist (a position on lexical semantics that DM assumes). Ramchand's (2015) arguments are as compelling as they are obvious: the gains seem minimal (syntactic independence from lexical semantics) while the metatheoretic costs are enormous (adding another listing frame to the grammar, destroying learnability through syntactic-semantic bootstrapping, and obliterating any and all generalizations about the nature of polysemy). Why are we doing this with the syntax? Is it not weird that we are ignoring decades of generalizations and best practices in these other components so that we can account for their data in ways they never would (via brute force and stipulation) and in ways we never would for our own data?
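To make the power objection vivid, here is a toy rendering (my own, not anyone's proposed grammar) of the brute-force rewrite such a readjustment rule would have to perform; note that the mapping is exactly as arbitrary as simply listing the suppletive pairs:

    # A brute-force readjustment "rule": rewrite /ink/, /ik/ or /in/ to /o/
    # before the past-tense /t/ -- no less stipulative than outright listing
    READJUST = {"ɪŋk": "ɔ", "ɪk": "ɔ", "ɪŋ": "ɔ"}   # order matters: longest match first

    def past_tense(stem):
        for src, tgt in READJUST.items():
            if src in stem:
                return stem.replace(src, tgt) + "t"   # thought, sought, brought
        return stem + "d"                             # default regular past

    for stem in ("θɪŋk", "sɪk", "brɪŋ", "pleɪ"):
        print(stem, "->", past_tense(stem))           # θɔt, sɔt, brɔt, pleɪd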

5. On words and morphemes

I promised to "bemoan the specialization and lack of awareness of the literature of researchers in this type of sprawling interface research program". I'll do that here. When grad students ask me for reading on how to become a morphologist in DM, I hand them three books: Aronoff (1976), Di Sciullo & Williams (1987), and Stump (2001). This is because, in my experience, grad students entering into the enterprise of being a DM morphologist have almost no awareness of the morphological literature outside of DM. Hearing that the existence of morphemes is a hotly debated topic in the field of morphology seems to them almost analogous to hearing tales of the boogeyman. This is not the fault of grad students or their supervisors. This is the fault of the literature in DM.

As we mentioned in our editors' note to Siddiqi & Harley (2016), the morphological literature in DM is strikingly, almost mind-bogglingly, insular. There are some standout counter-examples (see for example many papers and books from David Embick), but those usually argue that DM is globally preferable to word-based approaches. Seldom does the DM literature draw on insights from the word-based literature about particular phenomena. This tendency might be true of every model of every field of linguistics. It is certainly true of Minimalism more generally. Though, in Minimalism, there is at least a claim to predominance in syntactic research. How to go about delineating DM syntacticians from DM morphologists is outside of my ability set, and thus so is counting morphologists and guessing at percentages, but it doesn't strike me as true that DM has the same claim to predominance in the field of morphology, especially given the immense literature in word-based theory. This is to say nothing of the lion's share morphology and word processing have in the psycholinguistics literature.

My discussion above about the metatheoretic concerns of power and restriction does not occur in a vacuum. When DM does explicitly engage the greater literature on morphological theory, it is to make strong claims about restriction, power, economy, and parsimony. It inherits most of these claims by way of the morpheme hypothesis. It also inherits its weaknesses (stem allomorphy, defective paradigms, cran-morphs, bound stems, etc.). DM practitioners should always be aware of this. Increasing the power of DM jeopardizes the strength of morpheme models. Eventually, the increased power will undermine the restriction of DM, and restriction will no longer differentiate it from paradigmatic realizational models such as Anderson (1992) and Stump (2001).

This all stops well short of engaging the most important part about ignoring such a large cross-section of the literature: there are very good linguists with very good insights about the same phenomena that we work on. We shouldn't be ignoring their insight any more than they should ignore ours because we are also syntacticians.

6. On psycholinguistic evidence

Almost as a throwaway remark above, I commented that the study of word-formation makes up a disproportionate share of psycholinguistic research. In DM, we almost never make reference to this research (though Marantz 2005 and Pfau 2009 are solid counter-examples). There are good reasons for this, of course. Foremost of these is the claim that DM is a model of competence. It's not super easy to see what bearing studies of word processing effects or frequency or productivity have on a competence model of morphology, but it is certainly not nothing. For example, we regularly make aggressive claims of morphological decomposition. It seems like psycholinguistic evidence can inform these claims. DM is indeed couched in a literature of syntactic competence, and in that domain it is much easier to conclude that psycholinguistic evidence doesn't inform the theory to a great degree. But a competence model is meant to be a model of linguistic knowledge, and psycholinguistic evidence certainly tells us about speaker knowledge.

It is certainly very weird that DM shows less reliance on experimental evidence than the rest of morphology. Phonetics and phonology conferences are increasingly dominated by experimental evidence. Correspondingly, morpho-phonology increasingly relies on such evidence. Word-based models and adaptive discriminative models (see for example Blevins et al. 2016) are happy to incorporate psycholinguistic evidence. There are several claims within DM that can be tested experimentally. Decomposition and parsing are certainly the most prominent phenomena begging for experimental confirmation in a theory that increasingly proposes inflated numbers of morpheme boundaries and heavily articulated functional structure. Furthermore, it seems that DM predicts increased processing time for forms that involve readjustment. This seems like a testable claim, though one that is certainly confounded by the fact that readjusted forms are usually high-frequency.

Psycholinguistic research methods are certainly not without their risks. For example, psycholinguistic evidence, especially in morphology, relies heavily on controlling for frequency effects. Frequency controls are dependent on the corpora the frequencies are drawn from, which have in turn made design choices and employed data collection techniques that have significant effects. The net result is that corpora can have butterfly-effect-style consequences for morphological theory. See Swanson (2016 and following) for discussion.

7. Conclusions

More than once, I've heard the refrain at a morphology conference that "DMers are not morphologists but syntacticians." This is clearly not true. DMers are morphologists in every way that matters. But this claim is also not without merit. DM has a tendency not to engage the rest of the morphological literature. It also has a tendency to deal with typical morphological concerns (i.e. exceptions) in ways that disregard chief morphological metatheoretical concerns. These are pretty significant objections. Indeed, you see those objections manifest themselves in the Nanosyntax literature (see for example Caha 2009 et seq.), where Nanosyntax is presented as a response to some of the issues described here. Since this volume is intended to be a "state of the field" reflection on generative grammar (the Chomskyan enterprise), which has returned to the pre-Remarks state of including morphology, the way Minimalism interfaces with the morphological literature and other morphological theories seems to be of chief concern and worth a moment's reflection.

References

Anderson, Stephen. 1992. A-morphous Morphology. Cambridge: Cambridge University Press.
Aronoff, Mark. 1976. Word Formation in Generative Grammar. Cambridge, MA: MIT Press.
Beard, Robert. 1995. Lexeme-Morpheme Base Morphology: a General Theory of Inflection and Word Formation. Albany, NY: SUNY Press.
Bermúdez-Otero, Ricardo. 2013. The Spanish lexicon stores stems with theme vowels, not roots with inflectional class features. Probus 25(1): 3-103.
Blevins, James, Farrell Ackerman, Robert Malouf & Michael Ramscar. 2016. Morphology as an adaptive discriminative system. In Daniel Siddiqi & Heidi Harley (eds.). Morphological Metatheory. Amsterdam: John Benjamins.
Caha, Pavel. 2009. The nanosyntax of case. PhD dissertation, University of Tromsø.
Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1970. Remarks on Nominalization. In R. Jacobs & P. Rosenbaum (eds.). Readings in English Transformational Grammar, 184-221. Waltham, MA: Ginn.
Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, Noam & Morris Halle. 1968. The Sound Pattern of English. New York: Harper & Row.
Di Sciullo, Anna Maria & Edwin Williams. 1987. On the Definition of Word. Cambridge, MA: MIT Press.
Embick, David. 2007. Linearization and Local Dislocation: Derivational mechanics and interactions. Linguistic Analysis 33(3-4): 303-336.
Embick, David. 2014. On the targets of phonological realization. Proceedings of the MSPI Workshop at Stanford University.
Halle, Morris & Alec Marantz. 1993. Distributed morphology and the pieces of inflection. In Kenneth Hale & Samuel Jay Keyser (eds.). The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, 111-176. Cambridge, MA: MIT Press.
Halle, Morris & Alec Marantz. 1994. Some key features of Distributed Morphology. In Andrew Carnie & Heidi Harley (eds.). Papers on phonology and morphology, 275-288. Cambridge, MA: MIT Working Papers in Linguistics 21.
Harley, Heidi. 2014. On the identity of roots. Theoretical Linguistics 40: 225-275.
Harley, Heidi & Mercedes Tubino Blanco. 2013. Cycles, vocabulary items, and stem forms in Hiaki. In Ora Matushansky & Alec Marantz (eds.). Distributed Morphology Today: Morphemes for Morris Halle, 117-134. Cambridge, MA: MIT Press.
Haspelmath, Martin. 2013. Comparative Syntax. In Andrew Carnie, Yosuke Sato & Daniel Siddiqi (eds.). The Routledge Handbook of Syntax. London: Routledge.
Haspelmath, Martin & Andrea Sims. 2010. Understanding Morphology. London: Hodder Education.
Haugen, Jason D. & Daniel Siddiqi. 2013. Roots and the derivation. Linguistic Inquiry 44(3): 493-517.
Haugen, Jason D. & Daniel Siddiqi. 2016. Restricted Realization Theory. In Daniel Siddiqi & Heidi Harley (eds.). Morphological Metatheory. Amsterdam: John Benjamins.
Kaplan, Ronald. 1987. Three seductions of computational psycholinguistics. In P. Whitelock et al. (eds.). Linguistic Theory and Computer Applications, 149-188. London: Academic Press.
Kaplan, Ronald & Joan Bresnan. 1982. Lexical-Functional Grammar: A formal system for grammatical representation. In Joan Bresnan (ed.). The Mental Representation of Grammatical Relations, 173-281. Cambridge, MA: MIT Press.
Lieber, Rochelle. 1992. Deconstructing morphology: Word formation in syntactic theory. Chicago: University of Chicago Press.
Marantz, Alec. 2005. Generative linguistics within the cognitive neuroscience of language. The Linguistic Review 22: 429-455.
Marantz, Alec. 2013. Locality Domains for Contextual Allomorphy across the Interfaces. In Ora Matushansky & Alec Marantz (eds.). Distributed Morphology Today: Morphemes for Morris Halle, 95-115. Cambridge, MA: MIT Press.
Newell, Heather. 2016. English Lexical Levels are not Lexical, but Phonological. Ms.
Pfau, Roland. 2009. Grammar as a Processor. Amsterdam: John Benjamins.
Radkevich, Nina. 2010. On Location: the structure of case and adpositions. PhD dissertation.
Ramchand, Gillian. 2015. Allosemy—No thanks. Language blog. 14 Sept. 2015.
Siddiqi, Daniel. 2009. Syntax within the Word. Amsterdam: John Benjamins.
Siddiqi, Daniel & Heidi Harley (eds.). 2016. Morphological Metatheory. Amsterdam: John Benjamins.
Stump, Gregory. 2001. Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press.
Stump, Gregory. 2016. Paradigms at the interface of a lexeme's syntax and semantics with its inflectional morphology. In Daniel Siddiqi & Heidi Harley (eds.). Morphological Metatheory. Amsterdam: John Benjamins.
Swanson, Heather. 2016. Problems in replicating studies that rely on lexical frequencies. Talk given at Mo-MOT 1. Ottawa, ON.

Wood, Jim. 2015. Icelandic Morphosyntax and Argument Structure. Dordrecht: Springer. Wood, Jim & Alec Marantz. 2017. The interpretation of external arguments. In Roberta D’Alessandro, Irene Franco & Ángel J. Gallego (eds.). The Verbal Domain, 255- 278. Oxford: Oxford University Press.

Catalan Journal of Linguistics Special Issue, 2019 165-202

De-syntacticising Syntax? Concerns on the Architecture of Grammar and the Role of Interface Components*

Aritz Irurtzun CNRS-IKER [email protected]

Received: April 3, 2018 Accepted: September 23, 2019

Abstract

This article discusses different ways in which interface components could potentially affect syntax (or what have traditionally been analysed as syntactic phenomena). I will distinguish four types of potential effects that the interface components could have on syntax: (i) no real interaction, since almost nothing pertains to syntax: everything (beyond Merge) is externalization; (ii) computations at interface components actively affect the syntactic computation; (iii) properties of interface representations function to inform biases for language acquisition; (iv) interface components impose Bare Output Conditions (legibility conditions) that constrain the range of possible syntactic representations at the interface. I argue that the first two are problematic, whereas the latter two may help us understand a range of universal and variable phenomena.

Keywords: architecture of grammar; syntax; interfaces; bare output conditions; modularity

Resum. Dessintactitzar la sintaxi? Preocupacions sobre l’arquitectura de la gramàtica i el paper dels components d’interfície

Aquest article tracta diferents maneres en què els components de la interfície poden afectar potencialment la sintaxi (o el que tradicionalment s'ha analitzat com a fenòmens sintàctics). Distingiré quatre tipus d'efectes potencials que els components de la interfície poden tenir sobre la sintaxi: (i) no hi ha interacció real, ja que gairebé res no pertoca a la sintaxi: tot (més enllà de combinar) és externalització; (ii) els càlculs dels components de la interfície afecten activament la computació sintàctica; (iii) les propietats de les representacions d'interfície funcionen per informar els biaixos per a l'adquisició d'idiomes; (iv) els components de la interfície imposen condicions de sortida nua (condicions de llegibilitat) que restringeixen el rang de representacions sintàctiques possibles a la interfície. Argumento que els dos primers són problemàtics, mentre que els dos últims poden ajudar-nos a comprendre una gamma de fenòmens universals i variables.

Paraules clau: arquitectura de la gramàtica; sintaxi; interfícies; condicions de sortida nua; modularitat

* My deepest thanks to the organizers as well as to the audience of the Workshop on Generative Syntax: Questions, Crossroads, and Challenges held at the UAB. This research was funded by the following projects: PGC2018-096870-B-I00 (MICINN & AEI); FFI2017-87140-C4-1-P (MINECO); IT769-13 (Eusko Jaurlaritza); BIM (ANR) and UV2 (ANR–DFG).

ISSN 1695-6885 (in press); 2014-9718 (online) https://doi.org/10.5565/rev/catjl.231

Table of Contents

1. Introduction
2. Radical externalization
3. Actively affecting the syntactic computation
4. Reflecting syntax and biasing acquisition
5. Legibility conditions at the interfaces
6. General conclusions
References

1. Introduction

This article discusses different ways in which interface components could potentially affect syntax (or what have traditionally been analysed as syntactic phenomena). I will distinguish four types of potential effects that the interface components could have on syntax:

1. No real interaction, since almost nothing pertains to syntax: everything (beyond Merge) is externalization (section 2).
2. Computations at interface components actively affect the syntactic computation (section 3).
3. Properties of interface representations function to inform biases for language acquisition (section 4).
4. Interface components impose Bare Output Conditions (legibility conditions) that constrain the range of possible syntactic representations at the interface (section 5).

The first two conceptions advocate for a de-syntactization of phenomena previously thought to pertain to the syntactic component, since they take processes and constructions that were classically thought to pertain to the syntactic component to (i) merely pertain to externalization phenomena, or (ii) be derivative of phonological (or semantic) computations. The latter two, on the contrary, are compatible

Figure 1. The inverted-Y model of the architecture of grammar (Chomsky 1995).

Figure 2. The inverted-Y model of the architecture of grammar with phasal spell-out (Chomsky 2001).

with the classical (inverted-Y) model of the architecture of grammar (Chomsky 1995) where syntax generates structures that at some point will be spelled out to the linguistic components (PF, LF) that serve as interfaces with the language-external A(rticulatory)-P(erceptual) and C(onceptual)-I(ntentional) systems (Figure 1). Over the last twenty years a range of works adapted this general architecture of grammar to a more dynamic one where, rather than a single point of Spell-Out (transfer), the syntactic derivation unfolds in phases (computational cycles) leading to a spell-out of chunks of structure at various points (see i.a. Uriagereka 1999; Chomsky 2000, 2001; Kratzer & Selkirk 2007) (Figure 2).1

Even though the syntax of phase-based structure building has been quite extensively explored, there are still very few studies devoted to the nature of phases at the interfaces, and the concepts and primitives employed in each work can be very different and even incompatible with each other (see e.g. Marvin 2003; Dobashi 2003; Kratzer & Selkirk 2007; Samuels 2011; D'Alessandro & Scheer 2015; or the works in Gallego 2012). In any event, there is no substantial architectural difference between the model depicted in Figure 1 and the one in Figure 2 with respect to the derivational relationship between the syntactic computation and the interfaces.

Regarding the four types of potential relationships between syntax and the interfaces, the first conception of the interface constitutes a theoretically attractive program, but I will argue that it requires a large number of specifications and additional theoretical primitives if every single point of cross-linguistic variation is to be conceived as a PF phenomenon (cf. section 2). Furthermore, it also faces non-trivial empirical challenges (namely, the existence of cross-linguistic variation in the available semantic representations).

Then, section 3 argues that the second type of interaction requires a radical change in our conception of the architecture of grammar, as has been extensively

claimed in the literature (see e.g. Jackendoff 1997, 2002; Zubizarreta 1998; van der Hulst 2006). However, I believe that there is no genuine evidence requiring such a radical change (i.e. that syntax is 'phonology-free' (Zwicky & Pullum 1983; Miller et al. 1997) or 'melody-free' (Scheer 2011)). I will discuss what probably seems to be the best case for such an interactive architecture (the focus-to-stress correspondence, which is argued to underlie focalization movements and wh-constructions (Reglero & Ticio 2013, i.a.)). I will argue that there is no basis for such a position and that it incurs a number of conceptual and empirical problems (section 3; see also Irurtzun 2007, 2009). I will also discuss a more recent proposal by Richards (2010) which, rather than building on the Nuclear Stress Rule, takes prosodic phrasing to be at the origin of the different interrogative strategies attested cross-linguistically. Based on recent work with M. Duguine (Duguine & Irurtzun 2019), I will raise a number of empirical problems that cast doubt on this vision too.

Now, the vision that interface components reflect to a certain degree the structures generated by the syntactic component seems to be a sensible one; I will discuss frameworks of early language acquisition that illuminate the acquisition of syntactic patterns via prosodic and semantic bootstrapping hypotheses (section 4).

Last, I will discuss how legibility conditions imposed by language-external components (the Articulatory-Perceptual systems at the interface with PF and the Conceptual-Intentional systems at the interface with LF) may affect the design of our syntactic ability. I will argue that alongside other effects, investigation into such 'legibility conditions' could help us understand intriguing linguistic phenomena such as the cross-linguistic lack of verbal wh-words (section 5). A final section with general conclusions closes the article.

1. See also Marušič (2005, 2009) for a model with non-simultaneous transfer to PF and LF.

2. Radical externalization

There is a sort of tension in contemporary syntactic theorizing between approaches that seek to explain phenomena at the interface between discourse and syntax as being eminently syntactic (e.g. the cartographic enterprise of Rizzi 1997 and others, or the rich articulation of the discourse-syntax interface in Haegeman & Hill 2014) on the one hand, and more programmatic proposals such as Berwick & Chomsky (2011) or Boeckx (2011, 2014) that argue that syntax is basically just Merge (structure building), and all cross-linguistic variability is restricted to the externalization component, on the other hand. For instance, Boeckx (2014) defends the Strong Uniformity Thesis, with the consequence that "all of cross-linguistic variation reduces to realizational options available in the externalization component ('PF')" (Boeckx 2014: 139):

(1) Strong Uniformity Thesis: Principles of narrow syntax are not subject to parametrization; nor are they affected by lexical parameters.

According to this hypothesis, phenomena that show variable patterns that were previously thought to derive from syntactic parameters are better understood as differences in the realization/externalization of a cross-linguistically homogeneous underlying syntax. This is, in a nutshell, what Tokizaki & Dobashi (2013) and Tokizaki (2016) call the 'Universal Syntax and Parametric Phonology' thesis. This ambitious research program is nonetheless virtually unexplored. The most complete such proposals may be Tokizaki's (2013) analysis of the compounding parameter as deriving from word-prosodic restrictions, or Mathieu's (2016) analysis of the variability in the realization of wh-questions (previously analysed under the "wh-parameter"). What follows discusses these proposals (sections 2.1 and 2.2 respectively), and section 2.3 overviews some of the general empirical problems that radical externalization theories face.

2.1. A radical externalization approach to the compounding parameter

Tokizaki (2013) proposes that the cross-linguistic availability of recursive N+N compounding derives from word-prosodic restrictions:

(2) A complement moves to the specifier position to make a compound if the resulting structure has an acceptable prosody of a word in the language (Tokizaki 2013: 284).

For instance, he argues that the fact that English has productive N+N compounds such as (3) depends on the stress pattern of the resulting compound (i.e. that the resulting structure has the same stress location as a word in that language (represented in the following examples with the stressed syllable underlined)):

(3) banana-box

In contrast, its Spanish variant in (4) would be ungrammatical because it would have ante-antepenultimate stress ([-4]) in a language that normally has stress on the penultimate [-2] syllable (but that can have stress in any of the ultimate [-1], penultimate [-2] or antepenultimate [-3] syllables):

(4) *banana-caja [Spanish]
     banana-box
     'banana box'

The spirit of the idea is notably minimalist, but I think that this analysis may be too powerful, for it predicts that Spanish should allow for compounds, provided that they keep stress in the penultimate position (the 'default' stress position in that language). Test cases would be examples such as the ones in (5), which are ungrammatical (a toy sketch of the prosodic check, and of how it wrongly admits them, follows the examples):

(5) a. *sol-luz [Spanish]
        sun-light
        'sunlight'
    b. *terror-rey
        terror-king
        'king of terror'
    c. *cristal-cruz
        crystal-cross
        'cross of crystal'
    d. *champán-bar
        champagne-bar
        'champagne bar'
    e. *jazmín-té
        jasmine-tea
        'jasmine tea'
    f. *maíz-pan
        corn-bread
        'cornbread'
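For concreteness, here is a minimal sketch of the word-prosody check in (2), under assumptions of my own (the stress window and the helper names are hypothetical, purely for illustration). It correctly rules out (4), whose stress is [-4], but it wrongly licenses (5a), whose stress is penultimate:

# Toy sketch (not Tokizaki's implementation): does a compound's main-stress
# position fall within the language's acceptable word-stress window?
# Positions count syllables from the right: [-1] ultimate, [-2] penultimate.
# The window below is an assumption for illustration only.
ALLOWED_STRESS = {
    "Spanish": {-1, -2, -3},   # ultimate/penultimate/antepenultimate
}

def licenses_compound(language, stress_from_right):
    """True iff the compound's stress matches an acceptable word-stress
    position of the language, in the spirit of condition (2)."""
    return stress_from_right in ALLOWED_STRESS[language]

print(licenses_compound("Spanish", -4))  # (4) banana-caja -> False (correct)
print(licenses_compound("Spanish", -2))  # (5a) sol-luz -> True (overgenerates)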

Furthermore, it is not clear how such a proposal could capture the clustering phenomena typically associated with the compounding parameter, such as the (un)availability of resultatives (Snyder 1995, 2001). The original observation of Snyder is that transitive resultative constructions such as English (6) are possible only in languages with productive N+N compounding (cf. the typology in Table 1).

(6) John hammered the metal flat.

Thus, I believe that something else should be said about these patterns if we are going to accept that they are due to patterns of externalization. In the next section I review one of the most detailed and most ambitious proposals for a radical externalization approach to a phenomenon that has generally been thought of as syntactic: Mathieu's (2016) analysis of the cross-linguistic distribution of different wh-question strategies.

2.2. A radical externalization approach to wh movement vs. wh in situ

Mathieu's (2016) analysis is concerned with the licensing of wh in situ. It is conceived as a 'radical externalization' approach where "the wh parameter is completely relegated to PF" (Mathieu 2016: 252). More precisely, the interrogative strategy/ies that can be used in any particular language depend(s) on its prosodic properties (with respect to the expression of prominence). In this regard, the analysis distinguishes between two types of languages:

Table 1. The Compounding Parameter and cross-linguistic variation (from Snyder 2001: 329)

                                      Resultatives   N+N Compounds
American Sign Language                yes            yes
Austroasiatic (Khmer)                 yes            yes
Finno-Ugric (Hungarian)               yes            yes
Germanic (English, German)            yes            yes
Japanese                              yes            yes
Korean                                yes            yes
Sino-Tibetan (Mandarin)               yes            yes
Tai (Thai)                            yes            yes
Basque                                no             yes
Afroasiatic (E. Arabic, Hebrew)       no             no (?)
Austronesian (Javanese)               no             no
Bantu (Lingala)                       no             no
Romance (French, Spanish)             no             no
Slavic (Russian, Serbo-Croatian)      no             no

1. Culminative languages (e.g. Germanic and (most) Romance): these languages "have lexical stress and always link the prominence of the focused constituent to a stressed syllable" (Mathieu 2016: 264).
2. Demarcative languages (e.g. Korean and Japanese): these languages "resort to the insertion of boundaries either to the left or right (or both) of the intonational phrase to mark focus without any pitch accent on a particular syllable" (Mathieu 2016: 264).

Importantly, such a typological division of languages is taken to be a highly consequential one: Mathieu (2016) argues that wh-in-situ languages are languages that use the demarcative strategy only. Thus, "French is a wh-in-situ language because of its inherent prosodic properties and in particular because of the way focus is realized in the language. More generally, [Mathieu argues] that, whereas wh movement languages tend to use pitch accents followed by deaccenting to express focus, wh-in-situ languages tend to use prosodic phrasing. Languages in the first group usually have lexical stress, whereas those in the second one do not. In other words, the option to move or not to move in a given language is constrained by the limits imposed by the phonology of the language. Variation is thus not part of syntax but completely external to it" (Mathieu 2016: 281).

I believe that this approach is mistaken. In recent work (Duguine & Irurtzun 2019) we have argued that such a proposal is problematic; two main types of problems could be mentioned. On the one hand, there is no clear ground for the typological distinction between "culminative" vs. "demarcative" languages. Mathieu's (2016) analysis says that "while many languages that use the culminative strategy also make use of the demarcative strategy, the reverse is not true", but the claim that languages classified as demarcative do not employ pitch accents seems to be unwarranted. In fact, languages classified as demarcative do not restrict their expression of focus to phonological phrasing, but amply employ pitch accents and other local prosodic events to mark focus: higher F0 excursion in pitch accents and tone bearing units, elongated moraic/syllabic duration, higher intensity values, and gestural hyperarticulation are all attested in "demarcative" languages such as Japanese (see e.g. Beckman & Pierrehumbert 1986; Pierrehumbert & Beckman 1988; Fujisaki & Kawai 1988; Maekawa 1999; Kubozono 2007; Venditti et al. 2008; Ishihara 2011, 2015), Korean (Hwang 2006; Lee 2007; Hwang 2011; Kim & Jun 2009), or Mandarin (Xu 1999; Gu et al. 2003; Liu & Xu 2005; Chen & Gussenhoven 2008; Lee et al. 2016). Besides, the existence of the consequential cross-linguistic tendencies with respect to wh-questions is not obvious either: for instance, Amharic has stress-accent (Haile 1987) but also wh-in-situ (Eilam 2008), as do Pashto (Tegey & Robson 1996; David 2014), Uyghur (Yakup & Sereno 2016; Major 2014), Marathi (Wali 2005; Rao et al. 2017; Dhongde & Wali 2009), or Ancash Quechua (Hintz 2006; Cole & Hermon 1994).

Actually, Basque is illuminating in this respect. In general, this language shows syntactic homogeneity across its dialects, but a wide prosodic variability. So it appears problematic for any approach derivationally tying syntactic patterns to phonological patterns; the theory would predict that they should co-vary, but often they do not (see also section 3). For instance, research into cross-linguistic prosodic typology over the last twenty years has underlined a range of similarities between the word-prosodic patterns of Northern Bizkaian Basque (and just those varieties of Basque) and Tokyo Japanese. Hualde et al.
(2002: 578) even argue that "the striking coincidence between some Basque varieties (NB) and Tokyo Japanese in a number of important prosodic properties suggests that this set of common properties can be used to characterize a prosodic prototype: T-type pitch-accent" (see also Elordieta 1998; Ito 2002; Gussenhoven 2004 for discussion). However, Northern Bizkaian Basque is an obligatory wh-movement variety (cf. Hualde et al. 1994), unlike Japanese. On the other hand, one of the few syntactic differences across Basque dialects is to be found in wh-constructions (and focalizations): as a matter of fact, Labourdin Basque is a stress-accent variety (cf. Gaminde & Salaberria 1997; Hualde 1999, 2003), hence a culminative language under Mathieu's (2016) typology, but it is a variety that has recently developed optional wh in situ (Duguine & Irurtzun 2014), unlike the rest of the stress-accent varieties of Basque, which have obligatory wh-movement (see Irurtzun 2016 for an overview). In conclusion, I think that we cannot maintain that syntactic operations such as wh-movement are dependent on phonological computations of stress, since such a theory predicts a co-variation not observed in the cross-linguistic comparison.

2.3. General problems for the radical externalization thesis

Beyond specific proposals, I would like to discuss how the thesis that all cross-linguistic variation is restricted to externalization faces nontrivial problems with patterns in which syntactic ("word order") variation seems to be correlated with semantic variation. In principle, the prediction of the radical externalization thesis is that we should not observe any cross-linguistic variability in semantics (no matter whether it is genuinely semantic in essence or derived from syntax). What follows provides a glimpse into some types of phenomena that are problematic for the radical externalization thesis.

I would like to underline from the outset that even if we discovered that the typological generalizations at the base of the following proposals were not that strong, the existence itself of variation in semantics (the availability or not of a certain reading in some languages/idiolects) casts doubts on the viability of a radical externalization thesis. In the following I briefly review three proposals of variation with interesting ties to syntax, but it should be noted that pure idiosyncratic semantic variation itself would also constitute evidence against the radical externalization hypothesis (see for instance the references at the end of this section).

2.3.1. Differences in the interpretation of interrogatives

The first example of nontrivial semantic differences that I would like to discuss concerns the interpretation of multiple wh-interrogatives. Bošković (2003) analyses the patterns of interpretation of wh-movement and wh in situ languages, and his observation is that wh in situ languages allow for both Pair-List and Single-Pair interpretations of multiple wh-question sentences. As an illustration, consider example (7) from Japanese:

(7) Dare-ga nani-o katta no? [Japanese]
     who-nom what-acc bought q
     'Who bought what?'

Question (7) can be felicitously used in a situation where we have a shopping list for a party and each guest ought to buy something. In such a situation, (7) could be answered with a list of buyers and their corresponding items (say, as in (8)). This is the so-called Pair-List interpretation:

(8) Hanako-ga wain-o katta, Miki-ga biru-o katta… [Japanese]
     Hanako-nom wine-acc bought Miki-nom beer-acc bought…
     'Hanako bought wine, Miki bought beer…'

A similar thing happens in the English counterpart to (7) in (9A), which can be naturally answered by a list such as (9B):

(9) A. Who bought what?
     B. Mary bought wine, Susan bought beer…

However, Bošković (2003) argues, the difference between wh in situ languages like Japanese and obligatory wh-movement languages like English is that multiple questions like (7) in wh in situ languages also allow for Single-Pair answers, whereas their counterparts in wh-movement languages do not. As such, (7) can be felicitously uttered in a context such as the one in (10), which requires a Single-Pair reading:

(10) Context: John is in a store and in the distance sees somebody buying a piece of clothing, but does not see who it is and does not see exactly what the person is buying. He goes to the sales clerk and asks the question in (7).

Languages displaying obligatory wh-movement like English, on the other hand, do not have the Single-Pair reading, and as a consequence (9A) cannot be uttered in the context of (10). What is more, optional wh-movement languages like French provide strong evidence for such a typological claim, with each type of construction patterning as expected. The wh in situ construction of (11a) allows for both Pair-List and Single-Pair readings, whereas the wh-movement construction of (11b) only allows for the Pair-List reading:

(11) a. Il a donné quoi à qui? [French]
         he aux given what to whom
         'What did he give to whom?'
      b. Qu'a-t-il donné à qui?
         what.has.he given to whom
         'What did he give to whom?'

The analysis in Bošković (2003) is that Single-Pair multiple wh-questions denote sets of propositions (type ⟨⟨s,t⟩,t⟩), but Pair-List multiple wh-questions denote sets of questions (i.e., sets of sets of propositions; type ⟨⟨⟨s,t⟩,t⟩,t⟩), and that it is the overt movement of a wh-phrase to Spec-CP that generates the loss of the Single-Pair interpretation. Assuming, following Hagstrom (1998), that the interrogative Q morpheme is an existential quantifier over choice functions, its merger outscoping several wh-phrases will derive a Single-Pair reading (when the wh-phrases are left in situ, (12)), whereas movement of a wh-phrase to C (and crossing Q) generates a relativized minimality effect, which results in the loss of the Single-Pair reading (13):

(12) C Q [wh1 wh2 V]

(13) wh1 C Q [t wh2 V]

Questions with Pair-List interpretations can be generated with no problems, for the movement does not affect the scope of the Q particle, which is always attached to the lowest wh-phrase:

(14) wh1 C [t wh2+Q]
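For concreteness, the two denotation types can be sketched as follows. This is a schematic rendering only, assuming Hamblin-style question denotations and Hagstrom's (1998) choice-functional Q (f ranges over choice functions); these are not Bošković's (2003) own formulas:

% Schematic only; assumes Hamblin-style question semantics with choice
% functions, not the formulas of Bošković (2003).
\[
\text{Single-Pair: } \lambda p\,.\,\exists f\,\big[\,p = \lambda w\,.\,\text{bought}_w\big(f(\text{person}), f(\text{thing})\big)\big]
\qquad \text{type } \langle\langle s,t\rangle,t\rangle
\]
\[
\text{Pair-List: } \lambda Q\,.\,\exists x\,\big[\text{person}(x) \wedge Q = \lambda p\,.\,\exists y\,\big[\text{thing}(y) \wedge p = \lambda w\,.\,\text{bought}_w(x,y)\big]\big]
\qquad \text{type } \langle\langle\langle s,t\rangle,t\rangle,t\rangle
\]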

See Bošković (2003) for more details on this theory of the syntax-semantics interface, but importantly, whatever our analysis of these facts, they seem to provide strong evidence against the idea that all cross-linguistic variation is restricted to interface components. Here, it seems that there is a categorical semantic variation and that syntax (word order) and semantics are observed to go hand in hand. In the following, I will briefly mention a couple of similar cases that in my view provide reasons for scepticism with respect to the radical externalization approach.

2.3.2. Differences in the availability of 'telic pairs'

In a series of works, Higginbotham (2009a, 2009b) discusses the idea that accomplishments like resultative constructions are syntactically represented by ordered pairs of eventualities, and that "the 'accomplishment' interpretation of a predicate may stem from the complex thematic structure of a preposition, a syntactic adjunct, rather than from the head verb" (Higginbotham 2009a: 116). His claim is that the structures are telic pairs, holding that the formation of telic pairs is a compositional, rather than a lexical, process. There are languages with the possibility of generating telic pairs like English, where example (15) is ambiguous between a stative and a motion reading (i.e., 'the boat stays floating under the vertical projection of the bridge' vs. 'the boat went to some space under the bridge, floating the while'). However, other languages such as Italian, with constructions like (16), only have the stative reading:

(15) The boat is floating under the bridge.

(16) La barca galleggia sotto il ponte. [Italian]
      the boat floats under the bridge
      'The boat is floating under the bridge.' (stative)

In order to account for this type of data, Higginbotham (2009a, 2009b) proposed a (de-)compositional analysis whereby the essential difference between languages like English and Italian is that English allows for a combinatorial operation that generates telic pairs of events, whereas Italian does not. Interestingly, the idea in Higginbotham (2009a, 2009b) is that this feature is not idiosyncratic to V-P constructions, and he proposes that the same mechanism underlies complex constructions such as resultatives (17), which are naturally available in languages allowing the motion directional reading like English (or Chinese), but totally absent in languages lacking it such as Italian.

(17) I wiped the table clean.

The proposal in Higginbotham (2009a, 2009b) may raise skepticism, for it proposes a semantic parameter distinguishing languages allowing a specific semantic combinatorial operation and languages disallowing it (see also Table 1 and Snyder 1995, 2001, 2005 for a proposal on related constructions), but the range of phenomena discussed cannot easily be reduced to a mere externalization parameter: they involve complex syntax-semantics pairings which apparently can be generated in some languages but not in others.

2.3.3. Interpretive consequences of V raising

In recent work, Han et al. (2016) have analysed the variability with respect to verb raising observed across Korean idiolects. Korean being a verb-final language, its basic word order (18) is compatible with both verb-raising (19) and tense-lowering (20) constructions:

(18) Kim-i cacwu Lee-lul piphanha-n-ta [Korean]
      Kim-nom often Lee-acc criticize-pres-decl
      'Kim often criticizes Lee.'

(19) [tree: verb-raising derivation of (18); diagram not reproduced]
(20) [tree: tense-lowering derivation of (18); diagram not reproduced]

Thus, an important part of the input that Korean-learning children are exposed to is critically underspecified as to whether it was generated with a verb-raising grammar or a tense-lowering grammar. However, as argued by Han et al. (2007, 2016), the relative scope between negation and object QPs provides an appropriate diagnostic for the position of the verb in a Korean speaker's I-language: if there is verb raising, negation (a clitic) moves with it, and as a consequence it outscopes the object QP. On the contrary, if there is no verb raising, the object QP takes scope over negation. The preceding literature on the topic provided mixed judgments on these issues and a blurred theoretical image, but Han et al. (2016) show that rather than a stochastic procedure, the option of V raising vs. T lowering is grammaticalized in each Korean idiolect, and that there are actually two varieties of Korean grammar coexisting: one with verb raising (19), the other one without it (20). Remarkably, the participants in Han et al.'s (2016) study show stable judgments across test items and experimental sessions.2
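The logic of the diagnostic can be illustrated with a hypothetical English-like example (the sentence is mine, purely for illustration; Han et al.'s Korean test items differ). For something like 'Kim did not criticize every student', the two grammars predict distinct scopal readings:

% Illustration only; hypothetical example, not one of Han et al.'s (2016) items.
\[
\textit{V-raising (Neg raises with V and outscopes the object QP):}\quad
\neg\,\forall x\,[\text{student}(x) \rightarrow \text{criticize}(\text{kim}, x)]
\]
\[
\textit{T-lowering (the object QP outscopes Neg):}\quad
\forall x\,[\text{student}(x) \rightarrow \neg\,\text{criticize}(\text{kim}, x)]
\]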

2. Furthermore, the rarity and scarcity of the evidence that signals whether a chain was generated with a verb-raising or a tense-lowering grammar means that this grammatical choice is left underspecified, so that children do not necessarily converge on the same grammatical options as their parents. As a consequence, children's grammar is not necessarily the same as their parents', as illustrated by the lack of correlation between their judgments. Han et al. (2016) suggest that this is a case of 'endogenous' linguistic variation.

Again, concerning our discussion in this paper, I take it that the fact that this virtually invisible movement has predictable and stable semantic consequences argues against the conception that all variation is restricted to the externalization component.

2.4. Conclusion

The radical externalization thesis is an elegant programmatic position that seeks to understand the commonality of human languages (actually, of human language) on the basis of the idea that syntax (structure-building computation) is homogeneous across the species and that it is inherently directed towards the construction of complex thought. Externalization would just be secondary to this process, and all the cross-linguistic variability would arise during externalization (it would amount to different ways of expressing the same structure/thought). I believe that the evidence that I have discussed casts doubts on such a position.

I would like to underline again that one does not have to believe in the reality and generality of proposed 'semantic parameters' such as Chierchia's (1998) parameter for the differential denotation of nominals across languages, or the aforementioned 'semantic composition' parameter by Higginbotham (2009a); see for instance Duguine et al. (2017) for discussion. If there are non-trivial semantic differences across languages, given the inverted-Y model of the architecture of grammar they can only derive either from syntax (whereby different positions determine differences in interpretation (say, different landing sites determine different scopal interpretations)) or from semantics itself (the use of different semantic combination operations, different domains, etc.). They cannot derive from externalization if externalization has nothing to do with the path to LF/SEM. The discussion above briefly commented on a couple of cases that cannot be captured in terms of externalization in any obvious way, but the literature on the syntax-semantics interface is full of similarly problematic phenomena of cross-linguistic variation (see i.a. Bach et al. 1995; Chung & Ladusaw 2004; Matthewson 2010; Arregui et al. 2014; Matthewson 2014; Etxeberria & Giannakidou 2014; Holmberg 2016; Keenan & Paperno 2017; Scontras et al. 2017). It is unlikely that this type of phenomena can be accounted for as differences in externalization.

The next section discusses another type of conceivable relationship between interface components and syntax; namely, the hypothesis that interface components may interact with syntax during derivations.

3. Actively affecting the syntactic computation

A more active way in which interface components may affect 'syntactic' computations is by having parallel computations in, say, phonology and syntax, where structures generated in the former serve as the structural description for the operations taking place in the latter.

However, this type of proposal requires a radical change in our conception of the architecture of grammar, for in the 'inverted-Y' model interface components cannot directly interact with syntax (Zwicky & Pullum 1983; Miller et al. 1997; Irurtzun 2007, 2009). Proponents of this type of interaction have thus proposed alternative conceptions of the architecture of grammar allowing such interactions. For instance, the one in Figure 3, from Vallduví (1995), presents a direct link between SS and LF, and an indirect link with PF, with an additional submodule of Information Structure (IS) which is somehow parallel to the syntactic computation (the dashed line means that further strata may be needed to represent other relations). More famously, Zubizarreta (1998) proposes a different type of architecture with the level of LF at the center stage of the derivation (Figure 4). As can be seen, this conception is quite in line with the 'radical externalization' hypothesis presented in the previous section. According to Zubizarreta's (1998) model, the derivation unfolds creating sets of phrase markers until one single phrase

Figure 3. A model of grammar that incorporates a separate level of IS (from Vallduví 1995: 147).

Figure 4. A model of grammar with a post-LF level of Assertion Structure (from Zubizarreta 1998: 32).

marker is obtained at the level of Σ-Structure. At this point, operations such as Focus Marking, the Nuclear Stress Rule (NSR) and prosodic movements take place until we reach the level of LF. There the derivation splits into two branches, one that derives a PF representation and the other one the "Assertion Structure", which is the information structure of the sentence, where the focus-presupposition partition is encoded.

Last, works like Jackendoff (1997, 2002) have proposed an even more powerful model with fully parallel phonological, syntactic and semantic components with independent generative power that generate structures that are then linked via structure interface (or correspondence) rules (Figure 5).

Figure 5. A tripartite parallel architecture (from Jackendoff 2002: 125).

Probably the best candidate for phonology affecting syntax may be the purported correspondence between focus and stress, which is taken to drive movement operations with semantic consequences in some languages. In the next section, I briefly overview the major tenets and some shortcomings of such approaches. Next, in section 3.2 I analyse a novel take on wh-questions that is based on the same type of conception of the architecture of grammar, since it builds on p-phrasing for explaining the wh-question construals available cross-linguistically. I argue that this type of proposal has important shortcomings.

3.1. The Nuclear Stress Rule and focus/wh-questions

The Nuclear Stress Rule (NSR) governs the assignment of nuclear stress in the clause. The classical theory of Halle & Vergnaud (1987) was a variable stress assignment rule with different parameters such as "head terminal" ("whether or not the head of the constituent is adjacent to one of the constituent boundaries" (Halle & Vergnaud 1987: 9)) and "BND" for "boundedness" ("whether or not the head of the constituent is separated from its constituent boundaries by no more than one intervening element" (Halle & Vergnaud 1987: 10)). For English, the NSR would have the parameter setting in (21):

(21) The Nuclear Stress Rule (Halle & Vergnaud 1987: 264):
     — The parameter settings on line N (N≥3) of the Metrical Grid are [-BND, +HT, right].
     — Interpret boundaries of syntactic constituents composed of two or more stressed words as metrical boundaries.
     — Locate the heads of line N constituents on line N+1.

With this setting, the nuclear stress assignment to Judea in (22) is explained as a simple bottom-up composition of the metrical grid (ex. 83, p. 265 of Halle & Vergnaud 1987):

(22) Jesus preached to the people of Judea.
     . . . *)     Line 6
     ( . . . *)   Line 5
     . ( . . *)   Line 4
     * * * *)     Line 3
     [Jesus [preached to the [people of Judea]]]

However, in one of the most influential articles on the syntax-phonology interface, Cinque (1993) argued that the phonological parametrization of the NSR was superfluous, for it missed a generalization: nuclear stress is cross-linguistically assigned to the most deeply embedded element within the syntactic structure, so at the interface it suffices to turn syntactic phrase structure into phonological metrical grids, and the more embedded an element is in the syntax, the more embedded it will get in the phonology:

(23) [metrical grid derived from syntactic embedding; diagram not reproduced]
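Cinque's generalization is procedural enough to state as a toy sketch (the tree encoding and the rightmost tie-breaking are assumptions of mine, purely for illustration); it recovers the nuclear stress assignment of (22):

# Toy sketch of Cinque's (1993) generalization: nuclear stress falls on the
# most deeply embedded element. Trees are nested tuples (an encoding assumed
# here for illustration); ties are broken rightwards, as in right-branching
# structures.
def deepest(node, depth=0):
    """Return (word, depth) for the most deeply embedded terminal."""
    if isinstance(node, str):
        return node, depth
    best = None
    for child in node:              # later (rightmost) children win ties
        candidate = deepest(child, depth + 1)
        if best is None or candidate[1] >= best[1]:
            best = candidate
    return best

# Bracketing of (22): [Jesus [preached to the [people of Judea]]]
tree = ("Jesus", ("preached", "to", "the", ("people", "of", "Judea")))
print(deepest(tree))                # ('Judea', 3): nuclear stress on 'Judea'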

As a consequence, the positional variability observed in head-first (SVÓ) vs. head-last (SÓV) languages is illusory: it is not the case that in SVO languages such as English or Spanish nuclear stress is assigned to the last element vs. in SOV languages such as Japanese or Basque it is assigned to the central element. In fact, in both types of languages it is assigned to the most deeply embedded element (which happens to be the O in the unmarked case). Then marked operations such as stress shift or prosodically motivated movements will take place in order to guarantee that the element to be focused receives nuclear stress in PF.

Built upon these observations, a whole line of analysis developed seeking to account for the patterns observed in focus (see in particular Zubizarreta 1998; Reinhart 2006) and interrogative (Reglero & Ticio 2013) constructions as deriving from a purported PF constraint requiring the element to be interpreted as focus to get nuclear stress. Thus, the movements observed in these constructions in some languages are taken to take place in order to guarantee that the focus/interrogative is placed in the most embedded position (the position where it will get nuclear stress by the NSR).

I think that this type of approach is misleading: in previous work (Irurtzun 2007, 2009) I have argued that such a position is not tenable given that it faces a range of conceptual and empirical problems. Here I will not repeat those arguments, but I would like to stress a couple of points, based on recent discussions in the literature. One of the key assumptions of the NSR-based theory of focus is that nuclear stress is not just a correlate of focus; nuclear stress is taken to be not just one of the possibly many manifestations of an underlying focus representation, it is rather, according to this theory, an essential part of the nature of focus, so much so that the whole derivation is affected so that nuclear stress ends up being assigned to a specific item. This alleged intimate relationship between nuclear stress and focus could be understood in an embodied cognition approach as a grammaticalization of the 'effort code' (Gussenhoven 2004): more articulatory effort amounts to more vibration of the vocal folds (articulatory phonetics), which in turn amounts to higher excursion in f0 frequencies (acoustic phonetics), which in turn corresponds to a categorical distinction in terms of pitch accent (phonology), and which finally is associated with a contrastive or emphatic interpretation (semantics), that is, focus.

However, there are many languages that do not behave as suggested by this vision. One such case is Mandarin, where items lexically associated with Tone 3 (a falling tone) show lower values when pronounced under focalization (Lee et al. 2016). That is, even if Tones 1, 2, and 4, which involve f0 rises, display higher f0 values when pronounced in focus, Tone 3 does the opposite and reaches lower f0 values when contrastively focused. This casts doubt on the assumption that PF demands focal elements to be associated with higher f0 values. A potential way to circumvent the problem posed by this type of evidence would be to say that the Mandarin data could be taken to indicate that the PF demand is really to somehow 'hyperarticulate' the focal element, so that when its tone involves phonological rises, higher f0 values are obtained, and when it involves phonological falls, lower f0 values are obtained.
In a nutshell, it would be a matter of exaggerating the tonal events so that the signal involves a larger f0 excursion and the overall acoustic pattern is clearer/easier to discriminate. Intuitive as this line of thought may be, I think it is not correct. To begin with, other languages such as Akan (Kügler & Genzel 2011) employ pitch register lowering to signal focus, which argues directly against the purported condition on hyperarticulation. In Akan, as in Mandarin, L tones are pronounced with lower F0 values when in focus, but the same strategy is employed with H tones too; and the more emphatic the interpretation of the focus, the lower the pitch register both for L tones and H tones. Last, other languages do not employ any prosodic means for marking focus. Such is, for instance, Malay, as reported by R. Maskikit-Essed and C. Gussenhoven in a paper illustratively entitled "No stress, no pitch accent, no prosodic focus: the case of Ambonese Malay" (Maskikit-Essed & Gussenhoven 2016); but see also among others Zerbian (2007) on Sotho, Downing (2007) on Chichewa, Chitumbuka and Durban Zulu, Kügler & Skopeteas (2007) and Gussenhoven & Teeuw (2008) on Yucatec Maya, Gut et al. (2013) on Malaysian English, Wang et al. (2011) on Wa, Daeng (Mon-Khmer) and Yi (Sino-Tibetan), or Xu et al. (2012) on Taiwanese Mandarin.3

3.2. P-Phrasing and the interrogative strategies

In recent work, Richards (2010) (see also Richards 2016) proposes a theory according to which the interrogative strategies used by specific languages are (in part) determined by their prosodic properties, but instead of basing his analysis on nuclear stress placement, he builds it on the idea that at PF there is a constraint requiring the wh-word and the interrogative complementizer to be contained within the same prosodic phrase:

(24) “Given a wh-phrase α and a complementizer C where α takes scope, α and C must be separated by as few Minor Phrase boundaries as possible, for some level of Minor Phrasing” (Richards 2010: 151).

Regarding cross-linguistic variability, different languages are said to satisfy this constraint by appealing to different strategies:

— Changes in the prosodic phrasing (some sort of "Prosodic rephrasing").
— 'Wh-movement' to the C domain.

In line with minimalist desiderata, the idea is to derive the question-formation strategy that a language will employ from parametric choices which are independent of question formation. These would be (i) the relative order of heads and their complements (locus of C°), and, crucially, (ii) the alignment pattern of prosodic phrase boundaries. Within this system, Richards (2010) analyses (Northern Bizkaian) Basque as a variety with final complementizers and Minor Phrase boundaries to the right of certain XPs such as wh-phrases, studying the patterns of wh-questions in Basque as

3. Beyond the range of problems discussed in Etxepare & Uribe-Etxebarria (2005, 2012), the analysis of Reglero & Ticio (2013) linking wh-phrases to nuclear stress faces a further problem in that the purported cross-linguistic association between wh-words and nuclear stress is not cross-linguistically stable (the case of Italian is famous, for instance (Ladd 1996; Bocci et al. 2017); cf. also the observation that during language acquisition children may tend to directly drop wh-phrases (De Lisser et al. 2015)).

4. Reflecting syntax and biasing acquisition Contrasting with the previous conceptions, a different line (or lines) of investigation hypothesize that interface components have an important impact not in syntactic derivations but on the development of the syntactic hypothesis space that a child will consider during early language acquisition. The idea is that the child uses perceptual input (visual/acoustic signals and situations) to hypothesize grammati- cal structures during language acquisition, assuming some degree of homomor- phy between syntactic structures and the representations of the input at interface components.5 Thus, a relative homomorphism between syntax and the interfaces, combined with the sensorial experiences of the children serve to bias the process of language acquisition. Within this general sets of ideas, two main areas of research have been developed:

— Semantic bootstrapping theories for predicate argument structure. — Prosodic bootstrapping theories for head-complement orders.

In the following, I briefly present the major tenets of each of these approaches. The argument will be that these approaches uncover processes where semantic and

4. See Duguine & Irurtzun (2019) for further criticisms. 5. One may conjecture that the relative homomorphy derives from economy/simplicity metrics in inter-modular interface transductions (cf. Reiss 2007; Graf 2013). 184 CatJL Special Issue, 2019 Aritz Irurtzun phonological information serve to inform the determination of what traditionally have been taken to be syntactic phenomena (patterns of phrase structure and word order, and argument structure). Note, however, that this is radically different from having interface components actively affecting syntactic derivations.

4.1. On the LF side: semantic bootstrapping A major early contribution of the generative enterprise is the idea of the autonomy of syntax (Chomsky 1955: 1957). This hypothesis has an important implication regarding language acquisition, as has been emphasized by Grimshaw (1981) among others: given that there is no deterministic co-variation between syntactic types and semantic types, a child cannot directly deduce a syntactic analysis from an analysis of the semantics of a phrase. As a consequence of this, she must learn the two kinds of information separately. But contrary to what may appear at first sight, this has as a consequence the virtue of easing acquisition. In Grimshaw’s (1981: 169) terms “if there are n bits of syntactic information to be acquired, and m bits of semantic information, n + m bits of evidence are needed for learning in the autonomous theory, nm in the nonautonomous theory”. The semantic bootstrapping hypothesis can be defined as built on the idea that “the child can access a structural representation of the intended semantics or conceptual content of the utterance, and that such representations are sufficiently homomorphic to the syntax of the adult language for a mapping from sentences to meanings to be determined” (Abend et al. 2017: 117). For instance, “if children know that a word refers to a thing, they can infer that it is a noun; if they know that X is a predicate and Y is its argument, they can infer that X is the head of a phrase that includes Y; if they know that a phrase is playing the role of agent, they can infer that it is the subject of the clause” (Pinker 1989: 425).6,7 Thus, authors like Pinker (1989) have proposed models of language acquisition where the ‘linking problem’ is partially solved via a range of semantically informed hypotheses about the syntax of the elements in the in the input. If this hypothesis is correct, semantic information would have an effect on syntax, not directly in the derivational computation, but in biasing development (Figure 6).

4.2. On the PF side: prosodic bootstrapping The inverted-Y architecture of grammar (Figure 1) makes the claim that syntactic representations are somehow mapped onto prosodic representations. An oneiric image of the syntax-phonology interface would give the image of a perfect map- ping, such as the one in (25):

6. See also Grimshaw (1981); Bates & MacWhinney (1989); Clahsen et al. (1994) and Gleitman (1990) for discussion.
7. See also Markman (1992) for related issues on the problem of induction in word learning for objects.

Figure 6. An idealization of the start of language acquisition according to the semantic bootstrapping hypothesis (from Pinker 1989: 426).

(25) [idealized one-to-one mapping between syntactic and prosodic structure; diagram not reproduced]


But even if the empirical reality differs substantially from such a picture, this is the starting point of virtually all analyses of the syntax-phonology interface: the assumption is that there is an interface procedure so that syntactic representations are mapped to (wrapped in, aligned/matched with, etc.) prosodic units (see Nespor & Vogel 1986; Selkirk 1986; Truckenbrodt 1995, 1999; Seidl 2001; Dobashi 2003; Wagner 2005; Tokizaki 2008; Elordieta 2008; Selkirk 2011; Selkirk & Lee 2015, for discussion and a range of different views). In the unmarked (most faithful) case, it will be a direct mapping (XP→φ), but very often purely phonological constraints concerning p-phrase uniformity, symmetry, or minimum and maximum size also come into play, and the result of the interface transduction deviates from the perfectly homomorphic pairing.

With respect to our main discussion here, a number of works have identified interesting patterns of correspondence between PF and syntax with respect to rhythm and word order. In particular, several authors propose that the rhythmic pattern of a language is not an idiosyncratic and isolated property, but rather that it is strongly correlated with word order (i.e., that there are correlations between rhythmic patterns and syntactic patterns in that languages tend to cluster with the same rhythmic and syntactic properties, conforming cross-modular linguistic typologies). Furthermore, the explanation of this typological clustering is proposed to derive from the fact that rhythmic patterns serve to bootstrap the acquisition of the specific syntactic patterns of each language (cf. i.a. Mehler et al. 1988; Christophe et al. 2003; Bernard & Gervain 2012; Gervain & Werker 2013; Langus & Nespor 2013).8

In a nutshell, the basic idea of the prosodic bootstrapping hypothesis is that the relative order between heads and their complements is strongly correlated with the rhythmic type of the language, and that infants use their accumulated knowledge about the prosody of their target language(s) to build informed guesses about their corresponding syntactic pattern. This theory builds on a number of experiments that have shown that languages whose correlates of phrasal accent are increases in duration and intensity tend to be head-initial (with a Verb-Object word order), whereas languages that realize stress through a combination of higher pitch and intensity (and possibly also duration) tend to be head-final (with an Object-Verb word order). This generalization is known as the 'iambic-trochaic law' (cf. i.a. Hayes 1995; Nespor et al. 2008; Shukla & Nespor 2010), and is taken to be a basic law of grouping based on general auditory perception (i.e. not specific to language). This law states that units (language or music) that differ in intensity tend to be grouped as constituents in which the most prominent element comes first, whereas units that differ in duration are grouped as constituents in which the most prominent element comes last. As Nespor et al. (2008) put it, "if [their] proposal is on the right track, one of the basic properties of syntax can be learned through a general mechanism of perception". Summarizing then, the prosodic

bootstrapping hypothesis claims that beyond the observed typological correlation between prosodic and syntactic patterns, there is a causal developmental connection between them: babies use prosody to inform their guesses about the syntactic pattern of their target language.9

In favor of this hypothesis, recent studies such as Gordon et al. (2015) suggest that there is a correlation between rhythm perception skills and morphosyntactic production in children with typical language development; others such as Flaugnacco et al. (2014) and Leong & Goswami (2014) also argue for a strong association between reading skills and meter perception and rhythm processing; and yet other studies such as Zumbansen et al. (2014) report the beneficial effects of both pitch and rhythm in the clinical therapy for patients with Broca's aphasia. So, in a nutshell, if the prosodic bootstrapping hypothesis is correct, it would be a case where PF influences syntax, not in derivational terms, but in developmental ones.

8. See also Donegan & Stampe (1983, 2004) who, on independent grounds, propose a 'holistic typology' based on rhythmic grounds in order to account for the polarized structural divergence of languages such as Munda and Mon-Khmer.
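The grouping law and the claimed correlation can be compressed into a toy decision rule (a sketch of my own; the cue labels and the mapping are simplifications of the generalization described above, not a model from the cited literature):

# Toy rendering of the iambic-trochaic grouping bias and the claimed
# typological correlation (illustrative only; not from the cited works).
def grouping_and_order(prominence_cue):
    """Map the main phonetic cue to phrasal prominence onto the predicted
    grouping (iambic-trochaic law) and head-complement order."""
    if prominence_cue == "duration":    # prominence groups last (iambic)
        return ("prominence-final grouping", "head-initial (VO)")
    if prominence_cue == "pitch":       # prominence groups first (trochaic)
        return ("prominence-initial grouping", "head-final (OV)")
    raise ValueError("unknown cue: " + prominence_cue)

print(grouping_and_order("duration"))   # e.g. an English-like profile
print(grouping_and_order("pitch"))      # e.g. a Japanese-like profile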

5. Legibility conditions at the interfaces

Last, I would like to discuss another possible relation of the interface components to syntax: that of bare output conditions as restrictions on the types of representations that the interfaces can handle (and, as a consequence, on the types of representations that syntax can provide as its output). In what follows, I will briefly discuss a couple of restrictions that the interface components may impose on the output of syntax. They are based on minimal requirements that derive from the architecture of the language-external systems that interface with the linguistic levels of PF (the Articulatory-Perceptual apparatus) and LF (the Conceptual-Intentional apparatus), cf. Figure 1. The general idea is that legibility conditions imposed by the language-external apparatuses constrain the types of representations that may derivationally arrive there, and that this is reflected in restricted cross-linguistic variability.

5.1. On the PF side

The nature of the human Articulatory-Perceptual apparatus dictates a range of legibility constraints on the representations it can handle. Arguably, one such case could be the existence of maximum size constraints in prosodic phonology (see i.a. Delais-Roussarie 1995; Selkirk 2000, 2011; Elordieta et al. 2005; Jun 2005), with the result that prosodic phrases tend to be contained within the limits of breath groups (that is, even if in principle computable by UG, phonological p-phrases larger than n syllables would be difficult to produce, and difficult to process and acquire as well).
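As a toy illustration of such a size condition, the following sketch greedily wraps the words of an XP into φ-phrases that never exceed a hypothetical syllable limit (the syllable counts, the limit and the splitting procedure are invented for exposition; the proposals in the works just cited are considerably more articulated):

```python
MAX_SYLLABLES = 8   # hypothetical breath-group limit, chosen for illustration

def wrap_xp(words):
    """Map the words of an XP to one or more φ-phrases (XP → φ),
    splitting whenever the running syllable count would exceed the limit."""
    phrases, current, size = [], [], 0
    for word, syllables in words:
        if size + syllables > MAX_SYLLABLES and current:
            phrases.append(current)      # close the current φ at the limit
            current, size = [], 0
        current.append(word)
        size += syllables
    if current:
        phrases.append(current)
    return phrases

# An XP whose words total more syllables than one breath group allows:
xp = [("the", 1), ("extraordinarily", 6), ("uncooperative", 6), ("witness", 2)]
print(wrap_xp(xp))   # [['the', 'extraordinarily'], ['uncooperative', 'witness']]
```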

9. Developmentally, such a theory is reinforced by the fact that a large part of the neurocognitive machinery required for processing and learning prosodic patterns is developed before the syntactic abilities mature (potentially, after the postnatal development of a globular brain; Boeckx & Benítez-Burraco 2014; Irurtzun 2015).

A syntactically10 more interesting case could be the linearity requirement at the Articulatory-Perceptual interface, which would conceivably derive from the nature of the human articulators, which must externalize terminal elements sequentially. As a consequence, syntactic trees (which are characterized by phrase structural relations such as dominance, sisterhood, c-command, etc.) have to be linearized for externalization. Here, Kayne’s (1994) Linear Correspondence Axiom (LCA) is a well-known procedure for linearizing structures: asymmetric c-command is mapped onto linear precedence (but see the work of Biberauer et al. (2014) referred to in the previous section). Thus, the tree in (26) is mapped onto a linear string of its terminals:

(26)

The LCA has the following three properties: (i) it is transitive (if xLy & yLz, then xLz); (ii) it is total (for all distinct x, y, either xLy or yLx); and (iii) it is antisymmetric (not (xLy & yLx)). Therefore, asymmetric c-command among the non-terminal elements of (26) determines a set of relative precedence pairs over the corresponding terminal elements, and given the properties just mentioned, these pairs are mapped into a single linearization of the terminal string. But interestingly, the last property (the antisymmetry requirement) has important consequences for what has traditionally been analysed as syntactic displacement (cf. i.a. Chomsky 2016). Movement (internal merge) is taken to generate a copy of an element in a higher position in the tree, creating a new c-command relationship that will generate a conflicting representation at the A-P interface, as represented in (27):

(27)

10. Assuming that the ‘head parameter’ reflects some underlying syntactic difference across languages.

The tree structure in (27) contains two copies of the element XP: the one within YP is c-commanded by Z and by the highest copy of XP, which in turn c-commands Z. Thus, at linearization such a representation gives rise to conflicting word-order requirements, since XP should both precede and follow Z, as well as precede and follow itself: ⟨XP, Z⟩, ⟨Z, XP⟩, ⟨XP, XP⟩. The solution natural language has for resolving such paradoxes is chain reduction, the deletion at PF of all but one copy (in general, the highest one), such that the structure can be properly linearized without ordering conflicts (see Nunes’s 2004 elegant work on this). However, the important thing here is that the satisfaction of this formal requirement is anti-functional with regard to communication: it generates filler-gap dependencies. And this seems to be the general case; as Chomsky (2016: 22) puts it, “[t]he interesting cases are those in which there is a direct conflict between computational and communicative efficiency. In every known case, the former prevails; ease of communication is sacrificed”.
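The following minimal sketch illustrates the mechanics with a toy encoding (invented for exposition; it is not Kayne’s or Nunes’s actual formalism): precedence pairs derived from asymmetric c-command are checked for antisymmetry, and chain reduction removes the pairs contributed by the lower copy so that a total order becomes computable.

```python
from itertools import combinations

def linearize(precedence, terminals):
    """Given precedence pairs (x, y) meaning 'x precedes y', derived from
    asymmetric c-command, return a total order over the terminals, or None
    if the relation is not antisymmetric (both (x, y) and (y, x) hold).
    The sort below assumes the relation is total and transitive, as the
    LCA requires."""
    for x, y in combinations(terminals, 2):
        if (x, y) in precedence and (y, x) in precedence:
            return None                  # ordering conflict: not linearizable
    # with a total, transitive relation, out-degree gives the linear position
    return sorted(terminals,
                  key=lambda t: sum((t, u) in precedence for u in terminals),
                  reverse=True)

# Movement copies XP above Z: the higher copy precedes Z, but Z precedes
# the lower copy, yielding the conflicting pairs (XP, Z) and (Z, XP).
conflicting = {("XP", "Z"), ("Z", "XP")}
print(linearize(conflicting, ["XP", "Z"]))   # None – ordering conflict

# Chain reduction: delete all but the highest copy, removing the pairs
# contributed by the lower copy, so the structure linearizes as XP < Z.
reduced = {("XP", "Z")}
print(linearize(reduced, ["XP", "Z"]))       # ['XP', 'Z']
```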

5.2. On the LF side

We can build informed conjectures about the constraints and expectable legibility conditions of the A-P interface (insofar as they derive from physical properties of our articulators), but the interface between language and the CI systems is much more obscure. In fact, we know much less about the general architecture of our cognition and the properties we may expect it to demand of its inputs, so any proposal with regard to this area is highly speculative. Nonetheless, I believe that by exploring this area too we can advance our understanding of a range of puzzling phenomena.

For instance, Hurford (2007) proposes that the fact that natural language predicates are restricted to taking (at most) four arguments may be a reflex of human constraints on the representation of a single thought, which in turn derive from our ancient visual-attentional system, which only allows us to keep track of a very limited set of objects in a given scene and gives rise to our limits on ‘subitization’ (the capacity for recognizing at a glance how many objects are in a group, without verbal counting (Kaufman et al. 1949)) or on visual object tracking (Pylyshyn 2000), among others.

Another possible case for which we could hypothesize an extra-linguistic origin is the restriction on vacuous quantification (Chomsky 1982; Kratzer 1995), which rather than an essentially syntactic constraint (Potts 2002) could be conceived as deriving from the logical properties of our language of thought. If anything like this is on the right track, we could say that this restriction has a reflex in the type rigidity imposed on quantifier expressions, so that if they fail to bind a variable, the sentence is ungrammatical.

The last example that I would like to discuss concerns a puzzling typological gap. In Irurtzun (2019) I have explored the possibility that a consequential constraint on the logic of predication may help us better understand the prima facie puzzling lack of genuine verbal interrogative words. The observation is that, cross-linguistically, we can ask questions about different participants in the event (subjects, direct objects, or indirect objects (28)), or about modifiers of different kinds (29):

(28) a. Who kissed Mary? b. Whom did John kiss? c. Who did John give a kiss to?

(29) a. Where did John kiss Mary? b. When did John kiss Mary? c. How did John kiss Mary? d. Why did John kiss Mary?

However, we cannot directly ask questions about the nature of the eventuality itself. That is, there is simply no interrogative pro-verb that would allow us to ask questions such as (30):

(30) *Whxyzed John Mary?
     ‘What type of event happened such that it has John as external argument and Mary as internal argument?’

The ban on interrogative pro-verbs has seldom been discussed in linguistics. Hagège (2008) classifies only 28 languages as displaying interrogative pro-verbs (see also Idiatov & van der Auwera 2004), but many of them are not pro-verbs questioning eventuality types, and those that are are syntactically and semantically very restricted (see Irurtzun 2019 for discussion). My argument in that work is that the lack of verbal wh-words derives from a legibility constraint at the interface between the linguistic computation and the language-external Conceptual-Intentional systems. I start from the assumption that at LF sentences are Neo-Davidsonian descriptions of eventualities (cf. i.a. Parsons 1990; Hornstein 2002; Pietroski 2005), whereby example (31a) gets the logical form representation in (31b):

(31) a. Brutus stabbed Cæsar.
     b. ∃e [Agent(e, Brutus) & Stabbing(e) & Patient(e, Cæsar)]

My argument is that the lack of verbal wh-words derives from a general constraint on the logic of predication: predication is characterized by a logical assertoric force whereby a property is ascribed/attributed/applied to an object (cf. i.a. McGinn 2000; Burge 2007; Liebesman 2015), and this is incompatible with querying that very same property (just as asserting and questioning are different speech acts). In other words, predicates predicate, and this is why predication qua interrogation is incongruous: the logical act of predication cannot be identical to the logical act of querying, and as a consequence natural language allows for questions such as (32a) or (32b), but not for questions such as (32c):

(32) a. ∃e [Agent(e, ?) & Stabbing(e) & Patient(e, Cæsar)]
        ‘Who stabbed Cæsar?’
     b. ∃e [Theme(e, Cæsar) & Dying(e) & Location(e, ?)]
        ‘Where did Cæsar die?’
     c. *∃e [Agent(e, Brutus) & ?(e) & Patient(e, Cæsar)]

Besides, an LF along the lines of (32c) would still be unwarranted, since an interrogative predicate like ?(e) crucially leaves the eventuality devoid of any nature (it is completely undetermined), and as a consequence the DPs get no θ-role (as represented in (33)), given that θ-roles directly depend on the structure of the eventuality (cf. Pietroski 2005; Borer 2005; Ramchand 2008). And failure to assign θ-roles violates the θ-criterion (i.a. Chomsky 1981):

(33) *∃e [_____(e, Brutus) & ?(e) & Past(e) & _____(e, Cæsar)]

As can be seen, the logical form in (33) is critically underdetermined, in that the relations (e, Brutus) and (e, Cæsar) may correspond to any θ-role (agent, experiencer, possessor…). In a nutshell then, based on the minimal assumption that predication is logically incompatible with interrogation, the lack of verbal question-words standing for eventuality types derives directly from the LF illegibility they would generate: their semantics would require predicates predicating and interrogating at the same time, as well as a failure to assign θ-roles to eventuality participants (which, by hypothesis, corresponds to an illegible representation for the CI interface).11
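The legibility condition at stake can be made concrete with a small sketch (a toy encoding of Neo-Davidsonian LFs invented for exposition; the function and structure names are not from Irurtzun 2019):

```python
def ci_legible(lf):
    """An LF is CI-legible only if its eventuality predicate is specified:
    an interrogative event predicate '?' leaves the eventuality undetermined,
    so no θ-roles can be assigned to the participants (here, an unassigned
    role is marked '_')."""
    if lf["event"] == "?":
        return False          # predication cannot simultaneously be a query
    # with a specified eventuality, each participant can receive a θ-role
    return all(role != "_" for role, _ in lf["participants"])

# (31b): ∃e [Agent(e, Brutus) & Stabbing(e) & Patient(e, Cæsar)]
stab = {"event": "Stabbing",
        "participants": [("Agent", "Brutus"), ("Patient", "Cæsar")]}
# (33): ∃e [_____(e, Brutus) & ?(e) & _____(e, Cæsar)]
query = {"event": "?",
         "participants": [("_", "Brutus"), ("_", "Cæsar")]}

print(ci_legible(stab))   # True
print(ci_legible(query))  # False – illegible at the CI interface
```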

6. General conclusions

This article sought to discuss possible ways of interaction between the syntactic component and the interfaces. These are essential issues that are seldom explored in and of themselves, even if they have been at the core of theoretical discussions over the last half century. The complexity of many relevant phenomena implies that simplification in one area requires complexity in another if we are to attain even descriptive adequacy.

In this regard, the recent ‘radical externalization’ thesis seems to me an interesting hypothesis worth exploring, but I am afraid that for a number of the relevant phenomena a radical externalization implementation would require exponential complexity in the externalization path (involving a range of operations that do not look externally motivated). Furthermore, I believe that one of the most pressing problems this thesis faces is the fact that cross-linguistic variability does not seem to be restricted to PF; a variety of cross-linguistic semantic differences has been attested in the literature, which casts doubt on the main premise of the radical externalization thesis.

On the other hand, the more interactive approach that makes interface computations actively affect syntactic operations is too weak and too powerful at the same time. It is too weak in that it is unable to account for many of the phenomena that we observe at the interfaces (for instance, it purports to generate metrical grids or p-phrases before or independently of syntax, but such approaches are never explicit as to how this is to be done). But it is also too powerful, for it predicts the possibility that syntactic computations may depend on phonological processes, which is unattested cross-linguistically. This is a fact that, in my view, strengthens the validity of the restrictive inverted Y-model of grammar.

However, such an architecture does indeed allow for the interface components to affect syntax in some ways: on the one hand, during early acquisition infants may ground a range of syntactic hypotheses in already established phonological and semantic knowledge (as previously argued by the prosodic and semantic bootstrapping hypotheses). On the other hand, legibility conditions imposed by the extra-linguistic systems that language interfaces with restrict the range of possible outputs of the syntactic computation. This type of analysis may help us better understand grammatical patterns without necessarily conceiving ad hoc constraints to those effects. I believe that investigation into these conditions is a promising avenue of research that deserves further exploration.

11. The analysis in Irurtzun (2019) makes a further prediction: the impossibility should extend to other analogous elements whose semantic contribution is the introduction of a predicate of events. In fact, this seems to be the case, as shown by the apparent cross-linguistic lack of interrogative adpositions or tense markers.

References

Abend, Omri, Tom Kwiatkowski, Nathaniel J. Smith, Sharon Goldwater & Mark Steedman. 2017. Bootstrapping language acquisition. Cognition 164: 116-143.
Arregui, Ana, María Luisa Rivero & Andrés Salanova. 2014. Cross-linguistic variation in imperfectivity. Natural Language and Linguistic Theory 32: 307-362.
Bach, Emmon, Eloise Jelinek, Angelika Kratzer & Barbara H. Partee (eds.). 1995. Quantification in Natural Languages. Dordrecht: Springer.
Bates, Elizabeth & Brian MacWhinney. 1989. Functionalism and the competition model. In Brian MacWhinney & Elizabeth Bates (eds.). The Crosslinguistic Study of Sentence Processing, 3-76. New York: Cambridge University Press.
Beckman, Mary E. & Janet B. Pierrehumbert. 1986. Intonational structure in Japanese and English. Phonology Yearbook 3: 255-309.
Bernard, Carline & Judit Gervain. 2012. Prosodic cues to word order: what level of representation? Frontiers in Psychology 3: 451.
Berwick, Robert C. & Noam Chomsky. 2011. The biolinguistic program: The current state of its development. In Cedric Boeckx & Anna Maria di Sciullo (eds.). The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty, 19-41. Oxford: Oxford University Press.

Biberauer, Theresa, Ian Roberts & Michelle Sheehan. 2014. No-choice parameters and the limits of syntactic variation. In Robert E. Santana-LaBarge (ed.). Proceedings of the 31st West Coast Conference on Formal Linguistics, 46-55. Sommerville: Cascadilla Proceedings Project.
Bocci, Giuliano, Valentina Bianchi & Silvio Cruschina. 2017. When prosodic distribution meets syntactic derivation: wh-questions in Italian. Talk delivered at the 27th Colloquium on Generative Grammar, Universidad de Alcalá, Alcalá de Henares.
Boeckx, Cedric. 2011. Approaching Parameters from Below. In Anna Maria Di Sciullo & Cedric Boeckx (eds.). The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty, 205-222. Oxford: Oxford University Press.
Boeckx, Cedric. 2014. Elementary Syntactic Structures: Prospects of a Feature-free Syntax. Cambridge: Cambridge University Press.
Boeckx, Cedric & Antonio Benítez-Burraco. 2014. The shape of the language-ready brain. Frontiers in Psychology 5: 282.
Borer, Hagit. 2005. Structuring Sense. Volume II: The Normal Course of Events. Oxford: Oxford University Press.
Bošković, Željko. 2003. On the interpretation of multiple questions. Linguistic Variation Yearbook 1(1): 1-15.
Burge, Tyler. 2007. Predication and Truth. The Journal of Philosophy 104: 580-608.
Chen, Yiya & Carlos Gussenhoven. 2008. Emphasis and tonal implementation in Standard Chinese. Journal of Phonetics 36(4): 724-746.
Chierchia, Gennaro. 1998. Reference to Kinds across language. Natural Language Semantics 6(4): 339-405.
Chomsky, Noam. 1955. The Logical Structure of Linguistic Theory. Philadelphia: University of Pennsylvania dissertation.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton de Gruyter.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam. 1982. Some Concepts and Consequences of the Theory of Government and Binding. Cambridge: MIT Press.
Chomsky, Noam. 1995. The Minimalist Program. Cambridge (MA): MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries: the framework. In Roger Martin, David Michaels & Juan Uriagereka (eds.). Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, 89-155. Cambridge (MA): MIT Press.
Chomsky, Noam. 2001. Derivation by phase. In Michael Kenstowicz (ed.). Ken Hale: A Life in Language, 1-52. Cambridge (MA): MIT Press.
Chomsky, Noam. 2016. What Kind of Creatures are We? New York: Columbia University Press.
Christophe, Anne, Marina Nespor, Maria Teresa Guasti & Brit Van Ooyen. 2003. Prosodic structure and syntactic acquisition: the case of the head-direction parameter. Developmental Science 6(2): 211-220.

Chung, Sandra & William A. Ladusaw. 2004. Restriction and Saturation. Cambridge: MIT Press.
Cinque, Guglielmo. 1993. A null theory of phrase and compound stress. Linguistic Inquiry 24: 239-298.
Clahsen, Harald, Sonja Eisenbeiss & Anne Vainikka. 1994. The seeds of structure: A syntactic analysis of the acquisition of case marking. In Teun Hoekstra & Bonnie D. Schwartz (eds.). Language Acquisition Studies in Generative Grammar: Papers in Honor of Kenneth Wexler from the 1991 GLOW workshops, 85-119. Amsterdam/Philadelphia: John Benjamins.
Cole, Peter & Gabriela Hermon. 1994. Is There LF wh-Movement? Linguistic Inquiry 25(2): 239-262.
D’Alessandro, Roberta & Tobias Scheer. 2015. Modular PIC. Linguistic Inquiry 46: 593-624.
David, Anne Boyle. 2014. A Descriptive Grammar of Pashto and its Dialects. Berlin: Mouton de Gruyter.
De Lisser, Tamirand Nnena, Stephanie Durrleman, Luigi Rizzi & Ur Shlonsky. 2015. The acquisition of Jamaican Creole: Null subject phenomenon. Language Acquisition 23(3): 261-292.
Delais-Roussarie, Elisabeth. 1995. Pour une approche parallèle de la structure prosodique: Etude de l’organisation prosodique et rythmique de la phrase française. Université de Toulouse-Le Mirail dissertation.
Dhongde, Ramesh Vaman & Kashi Wali. 2009. Marathi. Amsterdam & Philadelphia: John Benjamins.
Dobashi, Yoshihito. 2003. Phonological Phrasing and Syntactic Derivation. Ithaca (NY): Cornell University dissertation.
Donegan, Patricia J. & David Stampe. 1983. Rhythm and the holistic organization of language structure. In John F. Richardson, Mitchell Marks & Amy Chukerman (eds.). Papers from the Parasession on the Interplay of Phonology, Morphology and Syntax, 337-353. Chicago: University of Chicago Press.
Donegan, Patricia J. & David Stampe. 2004. Rhythm and the synthetic drift of Munda. The Yearbook of South Asian Languages and Linguistics 2004, 3-36.
Downing, Laura. 2007. Focus prosody divorced from stress and intonation in Chichewa, Chitumbuka and Durban Zulu. Paper presented at the ICPhS Workshop on “Intonational phonology: Understudied or fieldwork languages”, Saarbrücken.
Duguine, Maia & Aritz Irurtzun. 2014. From obligatory wh-movement to optional wh-in-situ in Labourdin Basque. Language 90(1): e1-e30.
Duguine, Maia & Aritz Irurtzun. 2019. On the role of prosody in wh-in-situ: Cross-linguistic comparison and experimental evidence from Basque. In Ángel J. Gallego & Francesc Roca (eds.). Syntactic Geolectal Variation: Traditional Approaches, Current Challenges and New Tools. Amsterdam and Philadelphia: John Benjamins. To appear.
Duguine, Maia, Aritz Irurtzun & Cedric Boeckx. 2017. Linguistic diversity and granularity: two case-studies against parametric approaches. Linguistic Analysis 41: 445-473.

Eilam, Aviad. 2008. Intervention effects: Why Amharic patterns differently. In Natasha Abner & Jason Bishop (eds.). Proceedings of the 27th West Coast Conference on Formal Linguistics, 141-149. Somerville (MA): Cascadilla Proceedings Project.
Elordieta, Gorka. 1998. Intonation in a pitch accent variety of Basque. ASJU: International Journal of Basque Linguistics and Philology XXXII(2): 511-569.
Elordieta, Gorka. 2008. Constraints on Intonational Prominence of Focalized Constituents. In Chungmin Lee, Matthew Gordon & Daniel Büring (eds.). Topic and Focus: Cross-Linguistic Perspectives on Meaning and Intonation, 1-22. Dordrecht: Springer.
Elordieta, Gorka, Sonia Frota & Marina Vigario. 2005. Subjects, objects and intonational phrasing in Spanish and Portuguese. Studia Linguistica 59(2-3): 110-143.
Etxeberria, Urtzi & Anastasia Giannakidou. 2014. D-heads, domain restriction, and variation: From Greek and Basque to Salish. In Lilia Schürcks, Anastasia Giannakidou & Urtzi Etxeberria (eds.). The Nominal Structure in Slavic and Beyond, 413-440. Berlin: Mouton de Gruyter.
Etxepare, Ricardo & Myriam Uribe-Etxebarria. 2005. In-situ wh-phrases in Spanish: Locality and quantification. Recherches Linguistiques de Vincennes 33: 9-34.
Etxepare, Ricardo & Myriam Uribe-Etxebarria. 2012. Las preguntas de qu-in situ en español: un análisis derivacional. In José María Brucart & Ángel J. Gallego (eds.). El movimiento de constituyentes, 251-271. Madrid: Visor Libros.
Flaugnacco, Elena, Luisa Lopez, Chiara Terribili, Stefania Zoia, Sonia Buda, Sara Tilli, Lorenzo Monasta, Marcella Montico, Alessandra Sila, Luca Ronfani & Daniele Schein. 2014. Rhythm perception and production predict reading abilities in developmental dyslexia. Frontiers in Human Neuroscience 8: 392.
Fujisaki, Hiroya & Hisashi Kawai. 1988. Realization of linguistic information in the voice fundamental frequency contour of the spoken Japanese. Acoustics, Speech Signal and Processing 88: 663-666.
Gallego, Ángel J. (ed.). 2012. Phases: Developing the Framework. Berlin & Boston: Walter de Gruyter.
Gaminde, Iñaki & Jasone Salaberria. 1997. Ezpeleta, Lekorne eta Makeako azentu-ereduez. Uztaro 20: 93-103.
Gervain, Judit & Janet F. Werker. 2013. Prosody cues word order in 7-month-old bilingual infants. Nature Communications 4: 1490.
Gleitman, Lila. 1990. The structural sources of verb meanings. Language Acquisition 1(1): 3-55.
Gordon, Reyna L., Carolyn M. Shivers, Elizabeth A. Wieland, Sonja A. Kotz, Paul J. Yoder & J. Devin McAuley. 2015. Musical rhythm discrimination explains individual differences in grammar skills in children. Developmental Science 18(4): 635-644.

Graf, Thomas. 2013. Local and Transderivational Constraints in Syntax and Semantics. Los Angeles: UCLA dissertation.
Grimshaw, Jane. 1981. Form, function, and the Language Acquisition Device. In Carl Lee Baker & John J. McCarthy (eds.). The Logical Problem of Language Acquisition, 165-182. Cambridge: MIT Press.
Gu, Zhenglai, Hiroki Mori & Hideki Kasuya. 2003. Analysis of vowel formant frequency variations between focus and neutral speech in Mandarin Chinese. Acoustical Science and Technology 24(4): 192-193.
Gussenhoven, Carlos. 2004. The Phonology of Tone and Intonation. Cambridge & New York: Cambridge University Press.
Gussenhoven, Carlos & Renske Teeuw. 2008. A moraic and a syllabic H-tone in Yucatec Maya. In Esther Herrera & Pedro Martín Butrage (eds.). Fonología instrumental: Patrones fónicos y variación, 49-71. Mexico: El Colegio de México.
Gut, Ulrike, Stefanie Pillai & Zuraidah Mohd Don. 2013. The prosodic marking of information status in Malaysian English. World Englishes 32(2): 185-197.
Haegeman, Liliane & Virginia Hill. 2014. The syntactization of discourse. In Raffaella R. Folli, Christina C. Sevdali & Robert Truswell (eds.). Syntax and its Limits, 370-390. Oxford: Oxford University Press.
Hagège, Claude. 2008. Towards a typology of interrogative verbs. Linguistic Typology 12: 1-44.
Hagstrom, Paul M. 1998. Decomposing Questions. Cambridge (MA): MIT dissertation.
Haile, Alemayehu. 1987. Lexical stress in Amharic. Journal of Ethiopian Studies 20: 19-43.
Halle, Morris & Jean-Roger Vergnaud. 1987. An Essay on Stress. Cambridge (MA): MIT Press.
Han, Chung-hye, Jeffrey Lidz & Julien Musolino. 2007. V-raising and grammar competition in Korean: Evidence from negation and quantifier scope. Linguistic Inquiry 38(1): 1-47.
Han, Chung-hye, Julien Musolino & Jeffrey Lidz. 2016. Endogenous sources of variation in language acquisition. Proceedings of the National Academy of Sciences 113(4): 942-947.
Hayes, Bruce. 1995. Metrical Stress Theory: Principles and Case Studies. Chicago: University of Chicago Press.
Higginbotham, James. 2009a. Tense, Aspect, and Indexicality. Oxford: Oxford University Press.
Higginbotham, James. 2009b. Two Interfaces. In Massimo Piattelli-Palmarini, Juan Uriagereka & Pello Salaburu (eds.). Of Minds and Language: A Dialogue with Noam Chomsky in the Basque Country, 142-154. Oxford: Oxford University Press.
Hintz, Diane M. 2006. Stress in South Conchucos Quechua: A Phonetic and Phonological Study. International Journal of American Linguistics 72(4): 477-521.
Holmberg, Anders. 2016. The syntax of yes and no. Oxford: Oxford University Press.

Hornstein, Norbert. 2002. A grammatical argument for a Neo-Davidsonian semantics. In G. Preyer & G. Peters (eds.). Logical Form and Language, 345-364. Oxford: Oxford University Press.
Hualde, José Ignacio. 1999. Basque accentuation. In Harry van der Hulst (ed.). Word Prosodic Systems in the Languages of Europe, 947-993. Berlin: Mouton de Gruyter.
Hualde, José Ignacio. 2003. Accent. In José Ignacio Hualde & Jon Ortiz de Urbina (eds.). A Grammar of Basque, 65-72. Berlin: Mouton de Gruyter.
Hualde, José Ignacio, Gorka Elordieta & Arantzazu Elordieta. 1994. The Basque Dialect of Lekeitio, vol. XXXIV, Supplements of ASJU. Gipuzkoako Foru Aldundia & University of the Basque Country UPV/EHU.
Hualde, José Ignacio, Gorka Elordieta, Iñaki Gaminde & Rajka Smiljanić. 2002. From pitch accent to stress-accent in Basque. In Carlos Gussenhoven & Natasha Warner (eds.). Laboratory Phonology 7, 547-584.
van der Hulst, Harry. 2006. On the parallel organization of linguistic components. Lingua 116(5): 657-688.
Hurford, James R. 2007. The Origins of Meaning. Oxford: Oxford University Press.
Hwang, Hyun Kyung. 2006. Intonation patterns of wh-interrogatives in South Kyungsang Korean and Fukuoka Japanese. Eoneohak 45: 39-59.
Hwang, Hyun Kyung. 2011. Distinct types of focus and wh-question intonation. In Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII), 922-925.
Idiatov, Dmitry & Johan van der Auwera. 2004. On interrogative pro-verbs. In Proceedings of the Workshop on the Syntax, Semantics and Pragmatics of Questions, 17-23. Nancy: ESSLLI 16.
Irurtzun, Aritz. 2007. The Grammar of Focus at the Interfaces. Vitoria-Gasteiz: University of the Basque Country UPV/EHU dissertation.
Irurtzun, Aritz. 2009. Why Y: On the Centrality of Syntax in the Architecture of Grammar. Catalan Journal of Linguistics 8: 141-160.
Irurtzun, Aritz. 2015. The ‘globularization hypothesis’ of the language-ready brain as a developmental frame for prosodic bootstrapping theories of language acquisition. Frontiers in Psychology 6.
Irurtzun, Aritz. 2016. Strategies for argument and adjunct focalization in Basque. In Beatriz Fernández & Jon Ortiz de Urbina (eds.). Microparameters in the Grammar of Basque, 243-263. Amsterdam and Philadelphia: John Benjamins.
Irurtzun, Aritz. 2019. Revisiting the lack of verbal wh-words. In András Bárány, Theresa Biberauer, Jamie A. Douglas & Sten Vikner (eds.). Clausal Architecture and Its Consequences: Synchronic and Diachronic Perspectives. Berlin: Language Science Press. To appear.
Ishihara, Shinichiro. 2011. Japanese focus prosody revisited: Freeing focus from prosodic phrasing. Lingua 121(13): 1870-1889.

Ishihara, Shinichiro. 2015. Syntax-phonology interface. In Haruo Kubozono (ed.). Handbook of Japanese Phonetics and Phonology. Berlin: Mouton de Gruyter.
Ito, Kiwako. 2002. The Interaction of Focus and Lexical Pitch Accent in Speech Production and Dialogue Comprehension: Evidence from Japanese and Basque. Urbana-Champaign (IL): University of Illinois dissertation.
Jackendoff, Ray. 1997. The Architecture of the Language Faculty. Cambridge: MIT Press.
Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
Jun, Sun-Ah. 2005. Prosodic typology. In Sun-Ah Jun (ed.). Prosodic Typology: The Phonology of Intonation and Phrasing, 430-458. Oxford: Oxford University Press.
Kaufman, Edna L., Miles W. Lord, Thomas W. Reese & John Volkmann. 1949. The discrimination of visual number. The American Journal of Psychology 62(4): 498-525.
Kayne, Richard S. 1994. The Antisymmetry of Syntax. Cambridge: The MIT Press.
Keenan, Edward L. & Denis Paperno. 2017. Overview. In Handbook of Quantifiers in Natural Language, vol. II, 995-1004. Cham: Springer.
Kim, Jieun & Sun-Ah Jun. 2009. Prosodic structure and focus prosody of South Kyungsang Korean. Language Research 45(1): 43-66.
Kratzer, Angelika. 1995. Stage-level and individual-level predicates. In Gregory N. Carlson & Francis J. Pelletier (eds.). The Generic Book, 125-175. Chicago & London: The University of Chicago Press.
Kratzer, Angelika & Elisabeth Selkirk. 2007. Phase theory and prosodic spellout: The case of verbs. The Linguistic Review 24(2-3): 93-135.
Kubozono, Haruo. 2007. Focus and intonation in Japanese: Does focus trigger pitch reset? Interdisciplinary Studies on Information Structure 9: 1-27.
Kügler, Frank & Susanne Genzel. 2011. On the prosodic expression of pragmatic prominence: The case of pitch register lowering in Akan. Language and Speech 55(3): 331-359.
Kügler, Frank & Stavros Skopeteas. 2007. On the universality of prosodic reflexes of contrast: the case of Yucatec Maya. In Jürgen Trouvain & William J. Barry (eds.). Proceedings of the 16th International Congress of Phonetic Sciences, 1025-1028. Saarbrücken: Saarland University.
Ladd, D. Robert. 1996. Intonational Phonology. Cambridge: Cambridge University Press.
Langus, Alan & Marina Nespor. 2013. Language development in infants: What do humans hear in the first months of life? Hearing, Balance and Communication 11(3): 121-129.
Lee, Hye-Sook. 2007. Interrogative intonation in North Kyungsang Korean: Language-specificity and universality of acoustic and perceptual cues. Working Papers of the Cornell Phonetics Laboratory 16: 57-100.

Lee, Yong-Cheol, Ting Wang & Mark Liberman. 2016. Production and perception of Tone 3 focus in Mandarin Chinese. Frontiers in Psychology 7.
Leong, Victoria & Usha Goswami. 2014. Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia. Frontiers in Human Neuroscience 8: 96.
Liebesman, David. 2015. Predication as ascription. Mind 124(494): 517-569.
Liu, Fang & Yi Xu. 2005. Parallel encoding of focus and interrogative meaning in Mandarin Intonation. Phonetica 62(2-4): 70-87.
Maekawa, Kikuo. 1999. Effects of focus on duration and vowel formant frequency in Japanese. In Yoshinori Sagisaka, Nick Campbell & Norio Higuchi (eds.). Computing Prosody: Computational Models for Processing Spontaneous Speech. New York: Springer-Verlag.
Major, Travis. 2014. Syntactic Islands in Uyghur. Lawrence (KS): University of Kansas MA thesis.
Markman, Ellen M. 1992. Constraints on word learning: Speculations about their nature, origins, and domain specificity. In Megan R. Gunnar & Michael Maratsos (eds.). Modularity and Constraints in Language and Cognition: The Minnesota Symposia on Child Psychology, 59-101. Hillsdale: Erlbaum.
Marušič, Franc. 2005. On Non-simultaneous Phases. Stony Brook (NY): Stony Brook University dissertation.
Marušič, Franc. 2009. Non-simultaneous spell-out in clausal and nominal domain. In Kleanthes K. Grohmann (ed.). Interphases, 151-181. Oxford: Oxford University Press.
Marvin, Tatjana. 2003. Topics in the Stress and Syntax of Words. Cambridge: MIT dissertation.
Maskikit-Essed, Raechel & Carlos Gussenhoven. 2016. No stress, no pitch accent, no prosodic focus: the case of Ambonese Malay. Phonology 33(2): 353-389.
Mathieu, Eric. 2016. The wh-parameter and radical externalization. In Luis Eguren, Olga Fernández Soriano & Amaya Mendikoetxea (eds.). Rethinking Parameters, 252-290. Oxford: Oxford University Press.
Matthewson, Lisa. 2010. Cross-linguistic variation in modality systems: the role of mood. Semantics and Pragmatics 3: 1-74.
Matthewson, Lisa. 2014. The measurement of semantic complexity: How to get by if your language lacks generalized quantifiers. In Frederick J. Newmeyer & Laurel B. Preston (eds.). Measuring Grammatical Complexity, 241-263. Oxford: Oxford University Press.
McGinn, Colin. 2000. Logical Properties: Identity, Existence, Predication, Necessity, Truth. Oxford: Oxford University Press.
Mehler, Jacques, Peter Jusczyk, Ghislaine Lambertz, Nilofar Halsted, Josiane Bertoncini & Claudine Amiel-Tison. 1988. A precursor of language acquisition in young infants. Cognition 29: 143-178.

Miller, Philip H., Geoffrey K. Pullum & Arnold M. Zwicky. 1997. The principle of phonology-free syntax: Four apparent counterexamples in French. Journal of Linguistics 33(1): 67-90.
Nespor, Marina, Mohinish Shukla, Ruben van de Vijver, Cinzia Avesani, Hanna Schrauldolf & Caterina Donati. 2008. Different phrasal prominence realizations in VO and OV languages. Lingue e Linguaggio VII(2): 1-29.
Nespor, Marina & Irene Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.
Nunes, Jairo. 2004. Linearization of Chains and Sideward Movement. Cambridge: MIT Press.
Parsons, Terry. 1990. Events in the Semantics of English: A Study of Subatomic Semantics. Cambridge (MA): MIT Press.
Pierrehumbert, Janet B. & Mary Beckman. 1988. Japanese Tone Structure. Cambridge: MIT Press.
Pietroski, Paul M. 2005. Events and Semantic Architecture. Oxford: Oxford University Press.
Pinker, Steven. 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cambridge: MIT Press. (New edition, 2013).
Potts, Christopher. 2002. No vacuous quantification constraints in syntax. In Masako Hirotani (ed.). Proceedings of the North East Linguistic Society 32, 451-470. Amherst: GLSA.
Pylyshyn, Zenon W. 2000. Situating vision in the world. Trends in Cognitive Sciences 5(4): 197-207.
Ramchand, Gillian Catriona. 2008. Verb Meaning and the Lexicon: A First-Phase Syntax. Cambridge: Cambridge University Press.
Rao, Preeti, Niramay Sanghvi, Hansjörg Mixdorff & Kamini Sabu. 2017. Acoustic correlates of focus in Marathi: Production and perception. Journal of Phonetics 65: 110-125.
Reglero, Lara & Emma Ticio. 2013. A unified analysis of wh-in-situ in Spanish. The Linguistic Review 30(4): 501-546.
Reinhart, Tanya. 2006. Interface Strategies: Optimal and Costly Computations. Cambridge: MIT Press.
Reiss, Charles. 2007. Modularity in the ‘sound’ domain: Implications for the purview of Universal Grammar. In Gillian Ramchand & Charles Reiss (eds.). The Oxford Handbook of Linguistic Interfaces, 53-80. Oxford: Oxford University Press.
Richards, Norvin. 2010. Uttering Trees. Cambridge: MIT Press.
Richards, Norvin. 2016. Contiguity Theory. Cambridge: MIT Press.
Rizzi, Luigi. 1997. The Fine Structure of the Left Periphery. In Liliane Haegeman (ed.). Elements of Grammar, 281-337. Dordrecht: Kluwer.
Samuels, Bridget D. 2011. Phonological Architecture: A Biolinguistic Approach. Oxford: Oxford University Press.
Scheer, Tobias. 2011. A Guide to Morphosyntax-Phonology Interface Theories. How Extra-Phonological Information Is Treated in Phonology since Trubetzkoy’s Grenzsignale. Berlin: Mouton de Gruyter.

Scontras, Gregory, Maria Polinsky, C.-Y. Edwin Tsai & Kenneth Mai. 2017. Crosslinguistic scope ambiguity: When two systems meet. Glossa 2(1): 36.
Seidl, Amanda. 2001. Minimal Indirect Reference: A Theory of the Syntax-Phonology Interface. New York: Routledge.
Selkirk, Elisabeth. 1986. On derived domains in sentence phonology. Phonology Yearbook 3: 371-405.
Selkirk, Elisabeth. 2000. The interaction of constraints in prosodic phrasing. In Merle Horne (ed.). Prosody: Theory and Experiment. Studies Presented to Gösta Bruce, 231-261. Dordrecht: Kluwer.
Selkirk, Elisabeth. 2011. The syntax-phonology interface. In John Goldsmith, Jason Riggle & Alan C. L. Yu (eds.). The Handbook of Phonological Theory, 435-484. Oxford: Wiley-Blackwell, 2nd edn.
Selkirk, Elisabeth & Seunghun J. Lee. 2015. Constituency in sentence phonology: an introduction. Phonology 32(1).
Shukla, Mohinish & Marina Nespor. 2010. Rhythmic patterns cue word order. In Nomi Erteschik-Shir & Lisa Rochman (eds.). The Sound Patterns of Syntax, 174-188. Oxford: Oxford University Press.
Snyder, William. 1995. Language Acquisition and Language Variation: The Role of Morphology. MIT dissertation.
Snyder, William. 2001. On the nature of syntactic variation: Evidence from complex predicates and complex word-formation. Language 77(2): 324-342.
Snyder, William. 2005. Motion Predicates and the Compounding Parameter: A New Approach. Paper presented in the Linguistics Colloquium Series, University of Maryland, College Park.
Tegey, Habibullah & Barbara Robson. 1996. A Reference Grammar of Pashto. Washington (DC): Center for Applied Linguistics.
Tokizaki, Hisao. 2008. Syntactic Structure and Silence: A Minimalist Theory of Syntax-Phonology Interface. Tokyo: Hytuzi Syobo.
Tokizaki, Hisao. 2013. Deriving the compounding parameter from phonology. Linguistic Analysis 38(3-4): 275-304.
Tokizaki, Hisao. 2016. Phonological externalization of morphosyntactic structure: Universals and variables. Phonological Externalization 1: 1-10.
Tokizaki, Hisao & Yoshihito Dobashi. 2013. Introduction to universal syntax and parametric phonology. Linguistic Analysis 38(3-4): 147-151.
Truckenbrodt, Hubert. 1995. Phonological Phrases: Their Relation to Syntax, Focus, and Prominence. MIT dissertation.
Truckenbrodt, Hubert. 1999. On the relation between syntactic phrases and phonological phrases. Linguistic Inquiry 30(2): 219-255.
Uriagereka, Juan. 1999. Multiple Spell-Out. In Samuel David Epstein & Norbert Hornstein (eds.). Working Minimalism, 251-282. Cambridge (MA): MIT Press.

Vallduví, Enric. 1995. Structural properties of information packaging in Catalan. In Katalin É. Kiss (ed.). Discourse Configurational Languages, 122-152. Oxford: Oxford University Press.
Venditti, Jennifer J., Kikuo Maekawa & Mary E. Beckman. 2008. Prominence marking in the Japanese intonation system. In Shigeru Miyagawa & Mamoru Saito (eds.). Handbook of Japanese Linguistics, 456-512. Oxford: Oxford University Press.
Wagner, Michael. 2005. Prosody and Recursion. MIT dissertation.
Wali, Kashi. 2005. Marathi. Muenchen: Lincom Europa.
Wang, Bei, Ling Wang & Tursun Qadir. 2011. Prosodic realization of focus in six languages/dialects in China. In Wai-Sum Lee & Eric Zee (eds.). Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong 2011, 144-147. Hong Kong: University of Hong Kong.
Xu, Yi. 1999. Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27: 55-105.
Xu, Yi, Szu-wei Chen & Bei Wang. 2012. Prosodic focus with and without postfocus compression: A typological divide within the same language family? The Linguistic Review 29(1): 131-147.
Yakup, Mahire & Joan A. Sereno. 2016. Acoustic correlates of lexical stress in Uyghur. Journal of the International Phonetic Association 46(1): 61-77.
Zerbian, Sabine. 2007. Investigating prosodic focus marking in Northern Sotho. In Enoch Oladé Aboh, Katharina Hartmann & Malthe Zimmermann (eds.). Focus Strategies in African Languages: The Interaction of Focus and Grammar in Niger-Congo and Afro-Asiatic, 55-79. Berlin & New York: Mouton de Gruyter.
Zubizarreta, María Luisa. 1998. Prosody, Focus and Word Order. Cambridge: MIT Press.
Zumbansen, Anna, Isabelle Peretz & Sylvie Hébert. 2014. The combination of rhythm and pitch can account for the beneficial effect of melodic intonation therapy on connected speech improvements in Broca’s aphasia. Frontiers in Human Neuroscience 8: 592.
Zwicky, Arnold M. & Geoffrey K. Pullum. 1983. Phonology in syntax: The Somali optional agreement rule. Natural Language and Linguistic Theory 1(3): 385-402.

Discourse Phenomena as a Window to the Interfaces*

Alba Cerrudo [email protected]

Received: March 30, 2018 Accepted: September 23, 2019

Abstract

This paper examines the two lines of analysis that are generally pursued when dealing with discourse phenomena in the generative tradition: syntactico-centric and interface-based approaches. Syntactico-centric analyses are criticized because they need construction-specific mechanisms, while interface-based analyses sometimes challenge standard assumptions about the architecture of grammar. The discussion is mainly theoretical, but three case studies serve as exemplification: focalization, ellipsis and parentheticals. The second part of the paper is focused on parentheticals; a brief proposal is presented regarding the distinction between free and anchored parentheticals from a syntax-phonology interface perspective. The general conclusion is that following an interface-based perspective to approach discourse phenomena can help us gain new insights about the nature of the interfaces and their role in grammar.

Keywords: syntax-phonology interface; cartography; ellipsis; focalization; parentheticals

Resum. Els fenòmens discursius com una finestra a les interfícies

Aquest article examina les dues línies de recerca que se segueixen generalment quan s’estudien fenòmens discursius en la tradició generativista: enfocaments sintàctico-cèntrics i d’interfície. Critiquem les propostes sintàctico-cèntriques perquè necessiten fer ús de mecanismes específics per analitzar cada construcció en qüestió, mentre que les propostes d’interfície de vegades posen en dubte l’arquitectura gramatical estàndard. La discussió és fonamentalment teòrica, però s’utilitzen tres casos d’estudi: la focalització, l’el·lipsi i els parentètics. La segona part de l’article se centra en els parentètics; es presenta una proposta bàsica per formalitzar les diferències entre parentètics lliures i ancorats (ang. free i anchored) des d’una perspectiva d’interfície sintaxi-fonologia. La conclusió general és que seguir una línia d’anàlisi d’interfície per tractar els fenòmens discursius pot ajudar-nos a comprendre millor la naturalesa de les interfícies i el seu paper a la gramàtica.

Paraules clau: interfície sintaxi-fonologia; cartografia; el·lipsi; focalització; parentètics

* I want to thank José María Brucart, Ángel J. Gallego and the audience of Generative Syntax 2017: Questions, Crossroads and Challenges for useful discussion about the issues presented in this paper. I also thank two anonymous reviewers for their comments on previous versions of this paper. Needless to say, all possible remaining errors or shortcomings are my own. This research has been partially supported by grants from the Ministerio de Educación, Cultura y Deporte (FPU14/06694), the Ministerio de Economía y Competitividad (FFI2014-56968-C4-2-P), and AGAUR from the Generalitat de Catalunya (2017SGR634).

ISSN 1695-6885 (in press); 2014-9718 (online) https://doi.org/10.5565/rev/catjl.230

Table of Contents
1. Introduction
2. Two kinds of approaches to discourse phenomena
3. The case of parentheticals
4. Conclusion
References

1. Introduction

If syntax is an optimal solution to satisfy interface conditions (Chomsky 2000), the study of the interfaces should have a central role in our research agenda. On the empirical side, discourse phenomena seem the perfect research field in which to study syntax and the interfaces at the same time, since these phenomena are characterized by showing special properties at every level of grammatical analysis (including pragmatics). Nonetheless, there is a very influential trend in the generative literature that pursues the syntactization of discourse (Haegeman & Hill 2013), that is, the total encoding of discourse phenomena in narrow syntax. Not all authors agree when evaluating the explanatory power of this kind of proposal, and most of the discourse phenomena that cartographic analyses explore have received an alternative account. What all these alternatives have in common is that they take the burden off syntax and put it somewhere else in the grammar. This second kind of approach to discourse goes beyond the specific phenomena under study and makes predictions about the architecture of grammar and the role of the interfaces in it. Both types of proposals have their own problems, but I believe that those that try to encode everything in syntax always need construction-specific mechanisms, while the same does not necessarily hold for those that also look at the interfaces. I will focus specifically on the syntax-phonology interface to illustrate this claim, looking at three different constructions: ellipsis, focalization and parentheticals.

The paper is structured in two parts. The first one (section 2) is a general theoretical discussion of these two types of approaches to discourse phenomena, based on two examples: focalization and ellipsis. The second part (section 3) focuses on parentheticals as a case study, including both a critical evaluation of previous proposals (according to the two categories presented at the beginning) and a sketch of a syntax-phonology interface proposal to analyze parentheticals, with implications for the theory of linearization and Spell-Out.

2. Two kinds of approaches to discourse phenomena

‘Discourse phenomena’ is a broad label that might include very different types of linguistic structures and processes (see Rigau 1981), ranging from any phenomenon beyond the sentence to language-use-related notions (Schiffrin, Tannen & Hamilton 2011). These two key aspects are found in focalization and ellipsis, the phenomena chosen in this paper to discuss the kinds of approaches to discourse that can be found in the generative literature. On the one hand, their analysis requires resorting to pragmatic concepts, such as given/new or background/focus (see Vallduví & Engdahl 1996 for a review).

On the other hand, these notions need context (more than one sentence) to be evaluated: the information structure of a given sentence depends heavily on its linguistic context, and ellipsis always involves more than one sentence (in the sense that there must be an antecedent) and could be a process that operates across utterances.1 Other classical examples of discourse phenomena could be the study of connective elements (conjunctions, interjections, particles, discourse markers) and the investigation of reference across utterances (deixis, anaphora, cataphora, etc.). This section deals only with focalization and ellipsis because both have received an analysis from the syntax-phonology interface side and, under this perspective, they can be seen as two sides of the same coin: ellipsis as radical deaccentuation of background information and focalization as stress on new/contrastive information.2

There is a growing body of literature on both ellipsis and focalization, so the goal of this section is not to offer a comprehensive overview, but rather to review some representative analyses in order to discuss the existing approaches to discourse phenomena. Specifically, I will focus on the theoretical positions of these approaches with respect to the role of syntax and the interfaces. A distinction can be established between those proposals that put all the burden on syntax and those that resort to the interfaces. I will refer to the first kind of analysis as syntactico-centric and to the second kind as interface-based. Syntactico-centric analyses presuppose that everything must be encoded in syntax because there is a strictly direct mapping between syntax and both interfaces, following the traditional Y-model. On the other hand, interface-based approaches tend to simplify syntax and propose interface conditions or operations instead, some of them at the cost of challenging the Y-model. This second type of proposal varies more in its assumptions than syntactico-centric analyses do, and this is one of the reasons for choosing two phenomena that have received an analysis based on the phonological component. The case study in the second part of the paper, parentheticals, will also receive an analysis based on the syntax-phonology interface.

In the remainder of this section, I consider each of the two opposite views in turn, syntactico-centric and interface-based, and their application to the analysis of focalization and ellipsis.

2.1. Syntactico-centric approaches

The most influential syntactico-centric approach to discourse is the Cartographic project (Rizzi 1997 et seq.; Cinque 1999, 2002; Belletti 2004; Cinque & Rizzi 2008, a.o.). According to Rizzi’s (1997) seminal work, the C node can be decomposed into a series of heads, each of them projecting its own phrase in a fixed functional sequence, as in (1).

1. If one adopts the analysis of fragment answers as remnants of ellipsis (Merchant 2004), the antecedent in these cases would be a previous utterance (a question asked by another speaker).
2. In addition, ellipsis seems to be an important phenomenon in parenthetical constructions, as recent studies have shown (Döring 2015; Ott & Onea 2015; Ott 2016b; Stowell 2017).

(1) ForceP > TopicP > FocP > FinP3

These four heads compose the left periphery of the clause: ForceP contains illocutionary features, TopicP hosts topicalized constituents in its specifier (CLLD in Romance), FocP does the same with foci (and wh-elements)4 and FinP, being the head closest to T, is related to tense and mood (it is supposed to host complementizers lower than those in ForceP). This framework can be defined by two principles: firstly, the so-called “One feature one head” principle, the assumption, rooted in Pollock’s (1989) work, that each morphosyntactic feature corresponds to an independent syntactic head with a specific slot in the functional hierarchy (Cinque & Rizzi 2008: 50); secondly, the Criterial approach (Rizzi 1997, 2004), which explains topicalization and focalization as movement processes triggered by criterial features. The topicalized/focalized constituent is argued to be endowed with topic or focus features that must be checked against the relevant head in a Spec-Head configuration. Thus, topics and foci are assumed to move from their base position in the clause to the dedicated specifiers in the left periphery. Rizzi (2006) explicitly argues for a uniform treatment of A and A’-movement, the former being triggered by formal (uninterpretable or unvalued) features (as proposed in Chomsky 1995, 2001, 2004) and the latter by semantic or criterial features.

A number of empirical challenges for the cartographic project have been raised in the literature (see van Craenenbroeck 2009 for a review). For instance, Neeleman and Van de Koot (2008) show that topics and foci can occur in a wide variety of structural positions in Dutch (see also Neeleman et al. 2009; Wagner 2009; Nielsen 2003), and Bañeras (2016) offers some data from the combinatorial possibilities in the Spanish left periphery that challenge the fixed functional sequence in (1).5 However, what I want to discuss here are the conceptual shortcomings of the cartographic project (cf. López 2014; Reinhart 2006; Gallego 2013, 2014; van Craenenbroeck 2009, a.o.). I will focus on two points: the nature of criterial features (as opposed to formal ones) and the status of criterial heads.

Criterial features are essentially different from formal (uninterpretable or unvalued) ones. To begin with, topic and focus features are clearly interpretable, even in a technical sense: the feature [focus] can be read at both interfaces, as a chosen alternative from a given set at LF and as a specific prosodic pattern at PF, which varies crosslinguistically. The whole Agree system (since Chomsky 1995) relies on the assumption that unvalued or uninterpretable features must be deleted because they are not readable at the interfaces (they are purely formal, syntactic), and their survival would cause the derivation to crash. If [topic] and [focus] cannot be valued and are indeed interpretable, it makes no sense for them to be necessarily deleted. In fact, this is empirically correct: a sentence without focalization or topicalization can be infelicitous in a given context, but not ungrammatical.

3. Rizzi widens this cartography in subsequent work (2001, 2004), and the Topic head is argued to be recursive.
4. Rooth (1985) was the first to propose a syntactico-centric approach to foci, postulating a functional head, FOC, which carries the semantic content of focusing its sister and, at the same time, has phonological content.
5. See Demonte & Fernández-Soriano (2009), Hernanz (2010), Villa-García (2015) and references therein for cartographic approaches to the Spanish left periphery.

In the same vein, criterial features are not primitives, nor can they be decomposed into attributes with values (Gallego 2013, 2014; López 2014). These features can be defined as relational notions, whose meaning depends on the linguistic context, contrary to formal features, which are lexically determined. Considering this difference, it is problematic to assume that criterial features are assigned in the lexicon (as Aboh 2010 explicitly claims). Reinhart (2006) acknowledges this problem and argues that discourse features like [topic] or [focus] are not associated with a lexical item which enters into the numeration, because they are actually properties of an entire constituent, related to the informational status of the whole sentence. Therefore, this information cannot be present in the numeration, since it arises from the context. A non-trivial question then arises: when and how are those criterial features assigned to the relevant constituents?

The second point of criticism has to do with the category/feature distinction (see Adger & Svenonius 2011), which is eliminated in this framework: topic and focus are features assigned to constituents in the derivation and, at the same time, they qualify as functional heads in the left periphery, which attract those features. The redundancy is obvious and, given that these features are interpretable, seems unnecessary. In addition, the use of criterial heads obscures the distinction between the paradigmatic and the syntagmatic axes, as Gallego (2013) points out. Both lexical and traditional functional categories (Asp, T, C) are paradigmatic, while Topic or Focus are syntagmatic. This was precisely Chomsky’s (1995) argument for dispensing with agreement projections: AgrP has a theory-internal status, given that it is inherently relational, in the sense that it just creates a landing site for case assignment, as TopP or FocP do for topics and foci. In Gallego’s (2013: 13) words, criterial heads are emergent categories: “they only appear in a syntactic environment, so recycling them as lexical items blurs the paradigmatic/syntagmatic cut and raises non-trivial questions about the interaction between the syntactic and semantic components.” To sum up, the nature of both criterial features and heads poses non-trivial questions for the cartographic framework.

When one talks about the syntactization of discourse, one tends to think of the cartographic project above or of similar approaches that propose, for instance, speaker and hearer projections (Haegeman & Hill 2013; Speas & Tenny 2003; Wiltschko & Heim 2016). However, there are other proposals that can be considered syntactico-centric due to their use of non-formal features in the derivation, even if they do not propose that these features project.
This is the case of the so-called move-and-delete approaches to ellipsis (Merchant 2001, 2004; Brunetti 2003; Ortega-Santos et al. 2014; Weir 2014, a.o.). Merchant (2001, 2004), building on Lobeck (1995), claims that ellipsis is a syntactic operation, triggered by a feature. Different heads (C, in the case of sluicing; Foc, in the case of fragment answers) can be endowed with an [E]-feature (see also Bošković 2014), which serves two functions: (i) it instructs PF not to pronounce the complement of the head, and (ii) it ensures that both CPs are mutually entailing, by means of a partial identity function over propositions (to capture the well-known empirical parallelism/identity condition on the antecedent in ellipsis). [E] is not responsible for movement, but displacement is another important ingredient of the analysis, since remnants of ellipsis are assumed to "survive" it by moving out of the ellipsis site. Consider the following example of sluicing, analyzed according to this framework:

(2) Someone has eaten my lunch, but I don’t know [CP whoi [C[E] [TP ti has eaten my lunch]]]
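To make the mechanics concrete, here is a minimal sketch (in Python, under a toy encoding of my own; it is not Merchant's formalism) of [E] read as a PF instruction: linearization simply skips the complement of an [E]-marked head, so only material that has moved out of the ellipsis site, such as the wh-remnant in (2), is pronounced.

```python
# A minimal sketch (toy encoding, not Merchant's formalism) of [E] as a
# PF instruction: spell-out skips the complement (sister) of any head
# bearing E, so only material that has moved out of the ellipsis site
# gets pronounced.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str                      # empty label = phonologically silent
    features: set = field(default_factory=set)
    children: List["Node"] = field(default_factory=list)

def spell_out(node: Node) -> List[str]:
    """Linearize terminals left to right, silencing [E]-complements."""
    if not node.children:
        return [node.label] if node.label else []
    out, skip = [], False
    for child in node.children:
        if skip:                    # sister of an [E]-marked head
            skip = False
            continue
        out.extend(spell_out(child))
        if "E" in child.features:
            skip = True
    return out

# (2): ... but I don't know [CP who [C[E] [TP t has eaten my lunch]]]
tp = Node("TP", children=[Node(""), Node("has"), Node("eaten"), Node("my"), Node("lunch")])
sluice = Node("CP", children=[Node("who"), Node("C'", children=[Node("", features={"E"}), tp])])
full = Node("CP", children=[Node("who"), Node("C'", children=[Node(""), tp])])

print(spell_out(sluice))  # ['who']: the TP is silenced, the remnant survives
print(spell_out(full))    # ['who', 'has', 'eaten', 'my', 'lunch']: no [E], no ellipsis
```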

Merchant (2001) argues for further subspecification of the [E] feature to account for the fact that sluicing is only possible in interrogatives, as in (2), and not in relative clauses. In interrogatives, [E] is specified with [wh] and [Q] features, which are strong and uninterpretable and hence trigger overt movement, while relative operators lack the [Q] feature. The same argument has been developed to account for linguistic variation: English is supposed to allow VP-ellipsis, contrary to German, because it has an [E_VP] feature (Merchant 2013; Aelbrecht 2010).

Some authors believe that this is not an explanatory theory of ellipsis licensing (Valmala 2007; Ott & Struckmeier 2016; Fernández-Sánchez 2017, a.o.).6 The [E] feature is a descriptive device and carries some of the problems posed by [topic] or [focus] features. It is true that the [E] feature is not involved in feature-checking per se (only its subspecified features are) and it does not project, so the syntagmatic vs. paradigmatic problem does not arise. In any case, the question of when this feature is assigned (and why it is assigned only to some heads) remains open. As noticed by Ott & Struckmeier (2016), ellipsis is an optional operation, and this means that the [E] feature is optionally assigned during the derivation, in violation of the Inclusiveness Condition (Chomsky 1995). The same happens with topicalization or focalization: lack of movement does not imply ungrammaticality; the result would only be an infelicitous sentence in a given context. I believe that this is a compelling argument for thinking that none of these operations belongs to narrow syntax alone.

Neither of the two analyses deals with the issue of how the syntax-phonology mapping occurs, in spite of the great impact that both phenomena have on prosody. According to the theory of ellipsis that we have presented, the [E] feature is part of narrow syntax, but its primary function is to instruct PF to silence the complement of some heads. As for the cartographic project, something has to be said regarding the special intonational pattern of both topics and foci.7 Interface-based approaches have tackled these issues in detail.

6. In addition, some empirical problems have been pointed out. In English, focal constituents are not fronted if they are answers to questions, and some elements that cannot be fronted (bare quantifiers or NPIs) can nonetheless surface as ellipsis remnants (Abe 2014; Valmala 2007; Weir 2014; Villa-García 2016).
7. In this paper, I only deal with foci, but topics present intonational phrase boundaries (Astruc 2004; Feldhausen 2010, a.o.). See Ott (2015) for a non-cartographic analysis of topics that is consistent with their prosodic behaviour.

2.2. Interface-based approaches

Syntactico-centric analyses, whatever their details for particular constructions, are relatively easy to identify by their postulation of non-formal/interpretable features in the syntactic derivation. I believe that a general conceptual criticism of this kind of analysis can target that assumption, so such analyses can be evaluated in general terms. The same does not hold for interface-based approaches, because they vary greatly among themselves, even in their assumptions about the architecture of grammar. Therefore, giving an overview of them would exceed the purposes of this paper, even if we focus only on two phenomena, as we have been doing until now. What I want to show in this section is that interface-based approaches are not always conceptually better than cartographic ones. In fact, some of them give rise to other complications: paying attention to the interfaces usually implies a challenge to the Y-model of the architecture of grammar. I will present an important line of research on focalization that has done so, and then I will show that, by contrast, the interface-based analysis of ellipsis does not challenge the Y-model. The analysis of parentheticals presented in the second part of the paper follows this second line of action: the syntax-phonology mapping has an important role, but there is no need to put forward a radical change in the grammatical architecture.

Most of the interface-based approaches to information structure are based on a distinction between truth-conditional, propositional structure (semantics) and discourse structure (pragmatics), to the extent that they propose an additional level of representation for pragmatic operations: an Information Structure component (Vallduví 1992; Lekakou 2000; Neeleman & van de Koot 2008; Espinal & Villalba 2015), Σ-Structure (Zubizarreta 1998) or Focus Structure (Erteschik-Shir 1997, 2007). In the particular case of focus, there is an influential line of research (going back to Chomsky 1971 and Cinque 1993, who viewed focus as a property defined at PF) that sees the focus of an utterance as a product of the syntax-prosody mapping and the nuclear stress rule, as stated by the stress-focus correspondence (Reinhart 1995):

(3) The focus of an utterance always contains the prosodically most prominent element of the utterance.
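The principle in (3) lends itself to a worked illustration. The sketch below (a toy Python rendering on my own assumptions, in the spirit of Reinhart's notion of a focus set) computes the possible foci of an utterance as exactly those constituents that contain the main-stressed word:

```python
# A toy rendering (my own assumptions) of the focus-set idea under (3):
# the possible foci are the constituents containing the main stress.
# Trees are nested tuples (label, child, ...); leaves are plain strings.

def terminals(t):
    """Collect the words dominated by t, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in terminals(child)]

def focus_set(tree, stressed):
    """Labels of all constituents that dominate the stressed word."""
    result = []
    def collect(t):
        if isinstance(t, str) or stressed not in terminals(t):
            return
        result.append(t[0])
        for child in t[1:]:
            collect(child)
    collect(tree)
    return result

# "John read the BOOK", with nuclear stress on the object noun:
ip = ("IP", "John", ("VP", "read", ("DP", "the", "book")))
print(focus_set(ip, "book"))  # ['IP', 'VP', 'DP']: focus projects up from "book"
```

Narrow focus on the stressed word itself is of course also available; the point is that one and the same stress placement leaves DP-, VP- and IP-focus all open, whereas stress on the subject would restrict the set.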

One of the consequences of this principle is that there can be different options for assigning focus to a given syntactic element, which may vary among languages but crucially always depend on interface conditions. For instance, some languages apply a purely phonological operation, stress strengthening, to assign focus to a particular element. Neeleman & Reinhart (1998) argued that this is the case in English, whereas the opposite happens in Dutch: discourse-anaphoric constituents scramble in order to facilitate anaphoric destressing (see Samek-Lodovici 2005 for the same proposal for right-peripheral focus in Italian). These kinds of proposals (see also Szendrői 2001; Hamlaoui & Szendrői 2015; Samek-Lodovici 2015, a.o.) free syntax from pragmatic features, although they claim that it is sensitive to very specific interface conditions, which can also be debatable. Moreover, they have to assume (at least partially) a parallel architecture (Jackendoff 1997).

Figure 1. Parallel architecture according to Szendrői (2001).

Szendrői (2001), following Nespor & Vogel (1986), assumes that syntax and prosody are two independent levels of representation, connected only by principles of the syntax-prosody mapping, which means that the prosodic representation is not derived from the syntactic one. The resulting grammatical architecture can be represented as in Figure 1 (Szendrői 2001: 26). Both syntax and prosodic phonology are supposed to interface with the C-I system, allowing focus assignment to be a purely phonological process in some cases. The idea is that the possible foci of an utterance are determined by both the syntactic and the prosodic representation.

This view of focus as part of the syntax-prosody mapping is very interesting, because it provides an explanation for its specific prosodic properties. However, the parallel architecture model has a number of theoretical downsides (Irurtzun 2009; Scheer 2011). Despite the descriptive power of such a model, which permits a more flexible interaction among modules, its explanatory power is reduced when compared with the classical Y-model. According to Irurtzun (2009), there are two shortcomings: there is no clear linguistic evidence of such extensive interaction across modules, so the system overgenerates; and it is better to restrict the property of recursion to the syntactic module of grammar alone (Hauser, Chomsky & Fitch 2002). Scheer (2011: 679) further elaborates on this second point. He claims that allowing for Merge and for trees in phonology or semantics wrongly predicts the existence of recursive structure in these modules, and concludes that this fact constitutes a strong argument against a parallel architecture, where all modules access concatenation. The empirical facts are irrefutable: there are no recursive phenomena in phonology. As Irurtzun puts it: "the idea of independent derivations is extremely dubious: what type of phonological representations are we going to build independently of syntax? And how are they going to affect syntactic structure building? […] How does the computational system build phonological structures if it does not take as input the output of syntax?" (2009: 153). Jackendoff (1997), the major defender of the parallel architecture model, does not specify how to build phonological phrases independently of syntax; he only posits some correspondence rules that link the two independently built structures in each module. He wants to avoid syntactico-centrism, but his system does not provide an explanation of how the relevant representations that are able to interact are produced. Therefore, the interface-based analysis of focalization that we have briefly reviewed is conceptually problematic because it assumes a parallel architecture of grammar.

By contrast, the alternative analysis of ellipsis does not challenge the Y-model. It consists in viewing ellipsis as a purely phonological operation (Chomsky & Lasnik 1993; Tancredi 1992; Rooth 1992; Hartmann 2000; Ott & Struckmeier 2016; Fernández-Sánchez 2017, a.o.). This kind of proposal goes back to Chomsky & Lasnik (1993), who claimed that ellipsis is independent from narrow syntax and can be defined as the optional silencing of deaccented material. Deaccentuation applies to backgrounded material, which explains why ellipsis requires a proper antecedent (the elided material has to be recoverable from context).
Therefore, the domain of deletion is not a syntactic constituent (contra move-and-delete approaches), but a sort of pragmatic domain, the sentential background. Discourse-new and contrastive elements cannot be part of the background, and neither can extra-propositional elements, as Ott & Struckmeier (2016) show.8 One could say that this proposal implies that phonology needs access to pragmatic information, but this does not have to be the case, according to Ott & Struckmeier (2016: 231, fn. 9): "BG [Background] (and only BG) is what can be deleted in the phonological component while ensuring felicitous use of the resulting fragmentary expression; but the mechanisms of phonological reduction are blind to these conditions of use."

8. Modal particles in the German middle field survive clausal ellipsis.

There are still questions to answer, but this view of ellipsis seems to be a promising line of research, because it takes the burden off syntax and, at the same time, avoids changing the canonical Y-model of the architecture of grammar. I will propose something along these lines for the analysis of parentheticals.

3. The case of parentheticals

Parentheticals are a perfect case study for exploring the limits between syntax and discourse, because even their status as syntactic constructions has been called into question. They are probably the construction that, despite being a syntactic object (in purely descriptive terms), has received the most non-syntactic analyses in generative grammar. The special linguistic properties of parentheticals have led some authors to propose that the relationship they establish with the clause that contains them (the host clause) is purely discursive (Haegeman 1988; Peterson 1999; Burton-Roberts 2005; Ott 2016b, a.o.). In fact, perhaps the core property of parentheticals is the fact that, despite being linearly interpolated inside another clause, they seem to be structurally independent from it (Burton-Roberts 2005).

They also show special features at the two interfaces: they interrupt the prosodic flow of the utterance, giving rise to the so-called comma intonation (Bolinger 1989; Taglicht 1998; Astruc-Aguilera 2005)9, and they do not contribute to truth-conditional semantics; instead, they introduce speaker-oriented content (see especially Potts 2005).

Nevertheless, it is not easy to generalize when one looks at this kind of construction, because parentheticals are a "motley crew" (Dehé & Kavalova 2007; cf. Kaltenböck 2007 for a comprehensive classification). One of their most salient properties, positional flexibility – which could be related to their syntactic independence – is not found in all the constructions that have been considered to belong to the class. In fact, a distinction must be established between free and anchored parentheticals (Kaltenböck 2007; Kluck 2012). Free parentheticals can appear in many positions inside their hosts (generally between major constituents, cf. Emonds 1973; McCawley 1998; Espinal 1991) and they tend to take semantic scope over the whole proposition.10 By contrast, anchored parentheticals have a fixed position in the host and, consequently, a determined semantic scope over one particular constituent. Compare the two groups of examples in (4) and (5).

(4) Free parentheticals
a. Newton's Principia – take a seat – was finally published in 1687. [interjection]
b. Einstein's theory of special relativity, I think, was presented in his 1905 paper. [reduced parenthetical clause/RPC]
c. The professor made out with – and we all knew that – lots of students at the party. [and-parenthetical]

(5) Anchored parentheticals
a. Bea kissed Bob, who she has known since high school, at the party. [nominal appositive relative clause/NARC]
b. Bea kissed Bob, her high school sweetheart, at the party. [nominal apposition/NA]
c. Bea kissed someone, I think it was Bob, at the party. [sluiced parenthetical]
d. Bea kissed [I think it was Bob] at the party.11 [amalgam]
[Adapted from Kluck 2012: 1]

9. Although recently some experimental work has challenged this intuition (see Wichmann 2001; Dehé 2014 and references therein).
10. RPCs can have scope over one constituent of the host if some prosodic and information-structure conditions are met (Hedberg & Elouazizi 2015; Kaltenböck 2007).
11. Kluck (2011, 2012) argues that amalgams should be analyzed as sluiced parentheticals, the only difference being the overt/null status of the anchor (but see Cerrudo 2017 for a different proposal).

The examples in (5) could be problematic for those analyses that propose that there is no syntactic relation between parenthetical and host, unless we assume that the adjacency requirement is related to discourse conditions, not syntactic ones (see Ott & Onea 2015 for nominal appositions). I believe that at least the case of amalgams calls for a syntactic integration analysis, given that these constructions fill a gap inside the host (in (5d) the amalgam introduces the direct object of the main verb kiss). Amalgams are, in fact, the only parenthetical construction that cannot be omitted without affecting the grammaticality of the host clause.

I will sketch an analysis that captures the differences between the two kinds of parentheticals (free and anchored) and, at the same time, solves the conflict between hierarchy and linearity that these constructions give rise to in general. Let me phrase it as a question: if linear order is a product of syntactic structure (given Kayne's LCA), how can parentheticals be linearized with respect to their hosts and, at the same time, be syntactically independent? Before presenting my answer to this syntax-phonology interface question, I will discuss the previous literature on the topic, according to the two categories of approaches to discourse phenomena that I established in the first part of the paper: syntactico-centric and interface-based.

3.1. Syntactico-centric and interface-based approaches to parentheticals

Analyses of the relationship between parentheticals and their hosts are usually divided into two groups, namely integrated and unintegrated approaches (in Dehé & Kavalova's 2007 terms). Integrated approaches advocate that there is some kind of syntactic attachment of the parenthetical to its host, while unintegrated ones claim that there is no syntactic relation whatsoever between them. Generally, these two lines of analysis correlate with the two categories that we have been using in this paper. Integrated approaches are syntactico-centric, in the sense that they want to capture this special relationship in syntax and, to do so, they propose new functional projections or even special operations. By contrast, unintegrated approaches tend to be interface-based, in the sense that they appeal to other components of the grammar. As we saw for focalization, both kinds of proposals can have theoretical shortcomings.

Ross (1973) proposed the first formal analysis of RPCs and represents one clear example of a syntactic integration approach to parentheticals (see also Emonds 1973 and McCawley 1982). He claimed that RPCs (6a) are derived transformationally from a construction where the parenthetical verb selects the host clause as its complement, as in (6b). According to him, a transformational rule, Slifting (Sentence-Lifting), is responsible for fronting the embedded clause and for deleting the complementizer.

(6) a. John, I think, is going to pass the exam without trouble.
    b. I think [that John is going to pass the exam without trouble].
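As a purely string-level illustration (my own toy rendering; Ross's rule was defined over phrase markers, not strings), Slifting can be thought of as fronting the complement clause of (6b) and deleting that:

```python
# A string-level toy rendering of Slifting (illustration only; Ross's
# rule operated on phrase markers): front the embedded clause of (6b)
# and delete the complementizer "that".

import re

def slift(sentence: str) -> str:
    """'MATRIX that CLAUSE.' -> 'CLAUSE, MATRIX.'"""
    m = re.match(r"(?P<matrix>.+?)\s+that\s+(?P<clause>.+)\.$", sentence)
    if not m:
        return sentence             # no complement clause: nothing to slift
    clause, matrix = m.group("clause"), m.group("matrix")
    return clause[0].upper() + clause[1:] + ", " + matrix + "."

print(slift("I think that John is going to pass the exam without trouble."))
# John is going to pass the exam without trouble, I think.
```

Note that this derives only the final-position parenthetical; deriving the medial order of (6a) is precisely where transformational accounts run into trouble, as discussed below.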

This proposal fails empirically, since it predicts the existence of connectivity effects between the parenthetical and the host (which do not exist, cf. Emonds 1973; Fabb 1990; Espinal 1991; Haegeman 1988; Burton-Roberts 2005; De Vries 2007, a.o.) and also that RPCs and fronted embedded clauses should behave similarly, which is not the case (see Cerrudo 2015, 2016 for some data and discussion).

The transformational analysis was abandoned early in the literature on parentheticals, but Rooryck (2001a, b) goes back to it, adapting the proposal to Cinque's (1999) cartography. He argues that RPCs are evidential modifiers of the host clause and proposes a derivation where the host is base-generated as the complement of the verb (as in the Slifting analysis). The parenthetical verb is supposed to move to the MoodEvidentiality head (to derive the linear order of stylistic inversion) and, finally, the embedded CP moves to the specifier of the same projection, as represented in (7).

(7) [MoodEvidP CPi (he is going to pass the exam) [MoodEvid thinks] … [TP John thinks CPi]]

As Rooryck acknowledges, this analysis only works for the cases where the parenthetical appears in final position. It is tricky to derive the interpolation of parentheticals in medial position starting from a transformational analysis, which, in any case, has empirical problems.

The remaining analyses that advocate a syntactically integrated account propose that the parenthetical is base-generated directly in the position where it is pronounced and that it is related to the host clause by some kind of adjunction procedure (Corver & Thiersch 2002; Matos 2013; Potts 2005). For instance, Potts (2005) claims that parentheticals are adjoined to different syntactic projections but are crucially teased apart from other types of adjuncts by a comma feature. This feature is interpreted literally at PF, as in the analyses of ellipsis discussed in section 2.1, but, contrary to the case of ellipsis, the comma feature is supposed to project. According to Potts, parentheticals are represented in syntax as CommaPs. At LF, the CommaP is interpreted as not-at-issue content, giving rise to conventional implicatures in Potts' terms.

Giorgi (2012) pursues a similar analysis, in the sense that she also represents pauses in syntax, but her proposal is even more syntactico-centric. She postulates the existence of two K heads in the left periphery, which represent the pauses at the two edges of parentheticals. Besides, she claims that parentheticals have a fixed position in the clause, as complements of the first KP, and derives the different orders by movement of (part of) the host to the specifiers of both KPs, as represented schematically in (8) for the simplest case (where the parenthetical is in final position).

(8) a. [KP K parenthetical [KP K [IP HOST]]]

b. [KP [IP HOST] K parenthetical [KP K e]]

So far, I have reviewed some of the syntactico-centric approaches to parentheticals, those based on the postulation of ad hoc features or projections. In the same vein, there are some analyses that propose the existence of special syntactic operations to introduce parentheticals into the host clause while ensuring their syntactic independence (Ackema & Neeleman's 2004 Insertion, De Vries's 2012 par-merge, or the innovations in syntagmatic structure suggested by Espinal 1991).

It is interesting to discuss De Vries' proposal, because it has been very influential recently (see Kluck 2011; Griffiths 2013; Griffiths & Günes 2015; Griffiths & de Vries 2013; Dehé 2014).

De Vries (2007, 2012) claims that parataxis constitutes a primitive in the grammar and, as such, must be represented in syntax. He proposes a new type of merge to introduce paratactic dependents into the syntactic derivation. According to this author, besides the standard Merge operation, which establishes an inclusion relation and derives c-command, there is another operation available in syntax called Par(enthetical)-merge. Par-merge only concatenates two elements, without establishing any kind of hierarchy between them; that is, elements introduced by par-merge are immune to c-command, which would explain why parentheticals seem to be syntactically independent. The fact that some parentheticals can be introduced by conjunctions is taken as proof of the existence of a functional head Par, which is argued to be silent in the remaining cases. Crucially, there are well-established classes of parentheticals that can never be introduced by a conjunction, like reduced parenthetical clauses, non-restrictive relative clauses, sluiced parentheticals and amalgams.12 Aside from this potential empirical challenge, notice that De Vries' analysis is anti-economical, since he proposes both a new type of merge and a new functional head to account for the special relationship between parenthetical clauses and their hosts.

Kluck (2012) uses this framework to analyze anchored parentheticals in terms of parallel construal (Koster 1999), as opposed to free parentheticals. In anchored parentheticals, the anchor, that is, the element that the parenthetical modifies, is analyzed as the specifier of ParP, which is merged regularly (only the head Par and its complement are par-merged). As a result of analyzing this bivalent ParP as a non-restrictive version of Koster's colon phrase, she proposes that ParP inherits its category from its specifier (as shown in (10) for the example in (9)), as in the case of coordination (Munn 1993; Johannessen 1998).

(9) I kissed Bill, (who used to be) Bea’s husband.

(10) [tree diagram of the bivalent ParP for (9), not reproduced; see Kluck 2012: 21]

12. See Griffiths & Günes (2015) for the existence of a potential parenthetical marker in some constructions in Turkish.

On the other hand, free parentheticals (11) are represented as ParPs without a specifier. They are adjoined directly in the position where they are pronounced, as in (12), given that there is no evidence of their being attached to any constituent of the host.

(11) He was, and this is quite typical for Bill, dating several women at the same time.

(12) [tree diagram of the adjoined ParP, not reproduced; see Kluck 2011: 278]

This analysis has much descriptive power, but it is conceptually very problematic. The argument for the postulation of par-merge is circular: parentheticals are invisible to c-command because they are introduced by par-merge, and par-merge does not imply c-command because this is the core property of parentheticals (Ott 2016a). Following minimalist desiderata, I believe that we should avoid positing new primitives that significantly enrich UG, like par-merge, but also construction-specific functional projections such as ParP, CommaP or KP.

The need to posit some special feature or operation in syntax is a constant in the majority of approaches to parentheticals, as I have shown. Nonetheless, there are some proposals that can be considered interface-based, namely those called radical orphanage analyses (Heringa 2011). Radical orphanage proposals (Haegeman 1988; Fabb 1990; Safir 1986; Peterson 1999; Burton-Roberts 2005; Ott 2016a, 2016b) assume that there is no syntactic relation of any kind between parentheticals and their hosts. Haegeman treats parentheticals as orphan constituents, never adjoined to the host, whose semantic interpretation comes from general discursive principles. Specifically, she proposes that the conjunction in if-clauses, which is supposed to be the head of the parenthetical clause, is coindexed with the host clause. However, the full interpretation of the parenthetical is only possible when it is integrated in a pragmatic representation with other propositions, in accordance with general principles of interpretation (see Ott & Onea 2015 for a more specific proposal along these lines).

The remaining proposals in this group were designed for non-restrictive relative clauses and appositions. Peterson (1999) claims that the relationship between this kind of parenthetical and its host is special and must be distinguished from both subordination and adjunction. According to him, the difference lies in the fact that it is a purely discursive relationship, which he calls attachment (the analysis is developed in the framework of Dik's Functional Grammar). On the other hand, Fabb (1990) and Safir (1986) argue that the relationship between the main clause and a non-restrictive relative clause can only be captured at a special level of representation: X' for Fabb and LF-prime for Safir. The parallelism between these analyses and the ones that propose new grammatical levels to account for focalization (mentioned in section 2.2) is evident, and I believe that the problems they encounter are also similar.

As we have seen, both syntactico-centric and interface-based approaches to parentheticals encounter the same conceptual problem, due to their postulation of construction-specific mechanisms to account for the special relationship between parentheticals and their hosts. I will follow the idea that parentheticals are truly syntactic orphans and that free parentheticals, at least, are only interpreted with respect to their hosts at the discourse level (since they constitute independent root clauses).

3.2. An interface-based approach to parentheticals: Multiple Spell-Out

I assume that parentheticals are syntactic orphans, which technically should mean that they are derived in their own derivational workspace, separately from the host. This conception of parentheticals as truly paratactic dependents poses a linearization problem: if parentheticals are unattached to their hosts, how can they appear linearly inside them? I believe that, following the reasoning above, the interpolation of parentheticals can only occur after Spell-Out (cf. Cerrudo 2015, 2016). Given current assumptions, this seems to me to be the only way to explain their paratactic nature if one wants to avoid unmotivated categories or operations. This hypothesis can be formalized under some version of cyclic Transfer (Uriagereka 1999 et seq.; Chomsky 2000, 2001, 2004). When the operation Transfer applies to a chunk of syntactic structure, the chunk is sent to the two interfaces and eliminated from the derivational workspace. Interestingly for the hypothesis that I am pursuing here, cyclic Transfer has been related to linearization in previous work (Uriagereka 1999; Fox & Pesetsky 2005). In fact, Uriagereka's model can also accommodate the distinction between free and anchored parentheticals. To be more precise, my idea is that there is a correlation between the operation Transfer and the possibility of changing from one derivational workspace to another in syntax (an idea put forth in Uriagereka 1999). Let me introduce the basics of the model before presenting my specific proposal.

Uriagereka (1999, 2002a, 2002b, 2004) develops the Multiple Spell-Out model (MSO), whose major claim is that specifiers and adjuncts must be spelled out separately in order to be linearized. By contrast, head-complement units can be linearized without resorting to Spell-Out (until the end of the derivation), because the problem begins when we leave a command unit (the space where c-command relations hold without further stipulation, see (13)) in order to derive a complex specifier (or adjunct).

(13) a. Command unit: formed by continuous application of Merge to the same object
        {α, {δ, {α, {α, {β…}}}}}
        δ →↑← {α, {α, {β…}}}
        α →↑← {β…}

     b. Not a command unit: formed by discontinuous application of Merge to two separately assembled objects
        {α, {{γ, {γ, {δ…}}}, {α, {α, {β…}}}}}
        {γ, {γ, {δ…}}} →↑← {α, {α, {β…}}}
        γ →↑← {δ…}   α →↑← {β…}

[Uriagereka 2002a: 46]
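The notion in (13) can be restated procedurally. The following sketch (a toy encoding of my own, not Uriagereka's formalism) treats a structure as a command unit just in case no node combines two separately assembled objects, i.e. just in case it was built by continuously merging simplex items to the same object:

```python
# A procedural restatement of (13) (toy encoding, not Uriagereka's
# formalism): phrases are (label, left, right) triples, words are strings.
# A command unit is built by *continuous* Merge, so no node ever
# combines two separately assembled (complex) objects.

def is_complex(x) -> bool:
    return isinstance(x, tuple)

def command_unit(x) -> bool:
    """True iff no node in x has two complex daughters."""
    if not is_complex(x):
        return True
    _label, left, right = x
    if is_complex(left) and is_complex(right):
        return False  # a complex specifier/adjunct was merged in: (13b)
    return command_unit(left) and command_unit(right)

# (13a): beta merged with alpha, then delta merged on top, in one workspace
cu = ("alphaP", "delta", ("alphaP", "alpha", "beta"))
# (13b): the complex specifier {gamma, {gamma, delta}} meets {alpha, {alpha, beta}}
non_cu = ("alphaP", ("gammaP", "gamma", "delta"), ("alphaP", "alpha", "beta"))

print(command_unit(cu))      # True: linearizable without early Spell-Out
print(command_unit(non_cu))  # False: the specifier must be spelled out first
```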

Uriagereka proposes two options for specifiers to be related to the main derivation once transferred. The first option, conservative Spell-Out, is the simpler one: specifiers are attached normally to the left of the relevant head, but without hierarchical structure, flattened (their terminal nodes having been previously linearized), like some giant lexical compound. The second option, radical Spell-Out, holds that linearized specifiers (or adjuncts) are never attached to the main derivation. What lies behind this claim is the intuition that parallel workspaces may not be intertwined in syntax in some cases.

I believe that these two linearization options can be applied straightforwardly to the analysis of parentheticals and capture the distinction between free and anchored parentheticals. Anchored parentheticals can be analyzed through conservative Spell-Out: they are derived separately and transferred, but then they are adjoined to their anchor in the main derivation.13 Under this analysis, we can ensure their opacity and, at the same time, guarantee their semantic scope: the fact that they only modify the constituent of the host to which they are attached (the anchor). By contrast, the radical Spell-Out option is suitable for free parentheticals, those that can appear in many positions of the host and nonetheless tend to modify the whole proposition (at the discourse level). Free parentheticals are transferred separately, like anchored ones, but they do not merge into the main derivation; their linear integration occurs only at PF instead.

There is one potential caveat for this analysis, the so-called Assembly Problem (Dobashi 2009): how can the system establish the order between the transferred chunks? In fact, this is a problem for any version of cyclic Transfer (including Chomsky's phase theory).

13. Anchored parentheticals always appear to the right of their anchor (the constituent that they modify), which seems to force us to use right-adjunction. However, notice that regular adjuncts of the VP tend to appear linearly in final position too. This general issue deserves further research.

One possible solution is the postulation of a memory buffer mediating between syntax and the interfaces (Uriagereka 1999). The idea is that transferred material does not go to the two interfaces right away; instead, it is stored in a computational space whose only function is to hold all the transferred chunks until the whole derivation (of every workspace) has come to an end, as depicted in (14).

(14) [diagram of the memory buffer mediating between syntax and the interfaces, not reproduced]
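On my own assumptions, the workings of the buffer in (14) can be sketched as follows: the two workspaces transfer already-linearized chunks into a shared store, and PF unloads that store last-in, first-out (the assumption made immediately below), so what is stored first is pronounced last.

```python
# A sketch of the buffer in (14) (my own toy implementation, not a claim
# about the actual model): host and parenthetical workspaces transfer
# already-linearized chunks to a shared buffer; PF later unloads it
# last-in, first-out, so the chunk stored first is pronounced last.

transfer_buffer = []

def transfer(chunk: str) -> None:
    """Cyclic Transfer: store a linearized chunk in the buffer."""
    transfer_buffer.append(chunk)

# Bottom-up derivation of (16c) below, with the parenthetical workspace
# transferring its (independently derived) chunk in between host cycles:
transfer("for the birthday party")       # host: lowest chunk, stored first
transfer("– and this is good news –")    # parenthetical workspace cuts in
transfer("Peter made a cake")            # host: highest chunk, stored last

# PF unloads the buffer in reverse order of storage (last in, first out):
print(" ".join(reversed(transfer_buffer)))
# Peter made a cake – and this is good news – for the birthday party
```

Where the parenthetical chunk ends up in the output depends only on when its workspace transfers relative to the host's cycles, which is what derives positional flexibility.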

I assume that the order of arrival at the buffer matters for linearization. In a standard bottom-up model, this would mean that we have a "last-in, first-out" situation; that is, what is stored first is pronounced last. The need for some stipulation to maintain the order after Transfer (a mirror image of the derivational history) is crucial for any theory of cyclic Transfer, but even more so for the cases we are dealing with, where the system must keep track of two different derivational workspaces, each with its own cycles. Under this framework, the linear order between parenthetical and host is a timing issue: it depends on the order of Transfer of the relevant chunks of each workspace (cf. Cerrudo 2016 for a complete derivation step by step).

Let me sum up the key features of the proposal. The basic idea is that the parenthetical and the host clause are derived in different workspaces and can get intertwined through Transfer. Under the MSO framework, parallel workspaces can cross their paths after Spell-Out of both workspaces (radical Spell-Out), which is the case of free parentheticals, or after Spell-Out of the parenthetical clause (conservative Spell-Out), which is then adjoined to a particular constituent of the host, in the case of anchored parentheticals. The analysis straightforwardly explains two salient properties of these constructions: syntactic opacity and positional flexibility. Syntactic opacity is caused by the parenthetical being linearized prior to any interaction with the host, and positional flexibility is expected if the two derivations are generated separately and can cross their paths at any time during the cyclic process of Spell-Out (even though there could be some restrictions, see Cerrudo 2016). This framework could explain the general restrictions on the niches for parentheticals in the clause, but a complete theory should take into account all kinds of pragmatic or contextual factors. For instance, compare (15) and (16), two different examples of parentheticals introduced by conjunctions, one with more positional flexibility than the other.

(15) a. Peter made a cake – or cookies (I don't remember) – for the birthday party.
     b. Peter made a cake for the birthday party – or cookies (I don't remember).
     c. #Peter – or cookies (I don't remember) – made a cake for the birthday party.
     d. #Peter made – or cookies (I don't remember) – a cake for the birthday party.

(16) a. Peter – and this is good news – made a cake for the birthday party.
     b. Peter made – and this is good news – a cake for the birthday party.
     c. Peter made a cake – and this is good news – for the birthday party.
     d. Peter made a cake for the birthday party – and this is good news.

There are some important descriptive differences between the parentheticals in (15) and (16). Firstly, in (15) the parenthetical is a non-sentential fragment and can be analyzed as an elliptical version of a sentence parallel to the host clause (Peter made cookies for the birthday party). Secondly, it introduces a disjunction with respect to one specific lexical item of the host clause (cake), contrary to (16), which is a mere addition without any contrast. It seems that this second difference is responsible for the positional restrictions shown in (15): the disjunctive parenthetical can only appear after the first element of the disjunction has been introduced. I believe that this restriction has to do with the pragmatic properties of disjunction and, thus, does not belong to the domain of syntax. This is only one example of how different principles of discourse organization affect the distribution of parentheticals (see Ott & Onea 2015 for nominal appositions), but I believe that a careful examination of this kind of data should be one of the priorities on the research agenda if we want to build a complete theory of parentheticals and, at the same time, gain new insights into the syntax-discourse interface.

On the other hand, one could also view the contrast above as a requirement on the antecedent of the elliptical parenthetical; under this perspective, however, the antecedent would be the string of the host clause preceding the parenthetical and not the entire host clause, as is usually assumed (Ott 2016b; Stowell 2017; Ott & Onea 2015). In any case, the study of ellipsis in parentheticals is another fruitful research field (see especially Döring 2015) and deserves further attention. For instance, assuming that all parentheticals have a hidden clausal structure (as proposed in Döring 2015) has one welcome consequence: their prosodic isolation receives a natural explanation, since it is standard to assume that (root) CPs are mapped onto intonational phrases at the syntax-phonology interface (Nespor & Vogel 1986; Selkirk 1981 et seq.; Truckenbrodt 2015).

The analysis pursued here is also consistent with this conclusion if we believe that there is a correlation between Spell-Out and the mapping to phonological structure. Some authors have proposed that Chomsky's phase theory and Uriagereka's MSO should be combined if one wants to derive domains of morphophonological application directly from transferred chunks (Dobashi 2003; Newell 2008; Samuels 2009; Sato 2012). There is a lively debate about these matters in the field of the syntax-phonology interface, in which parentheticals have been an important cornerstone. I believe that the hypothesis for parentheticals presented above is a good line of research for checking the predictions of different theories about important issues such as linearization, the nature of Transfer and the existence of cycles, coming both from the phonologically-oriented and from the syntactically-oriented literature.

4. Conclusion

This paper started with the observation that the study of discourse phenomena could be a promising research field for investigating the limits between syntax and the interfaces. In the first part of the paper, it was made clear that, depending on the theoretical position that one adopts about the role of syntax in grammar, the enterprise mentioned above may not even be entertained. Assuming that syntax has to encode every single aspect of linguistic phenomena leads us to a syntactico-centric trend, which provides a framework to analyze all kinds of constructions in a unified manner (following the logic of the cartographic project), but poses non-trivial questions about the nature of features and functional heads, as we saw especially for the case of focalization, and also for ellipsis and parentheticals. On the other hand, interface-based proposals are not exempt from problems either, since some of them challenge the Y-model of the architecture of grammar – as was shown for focalization – which is undesirable on both theoretical and empirical grounds. One important conclusion, then, is that interface-based approaches are not always superior to syntactico-centric ones. However, they are always interesting because they force us to think about what the role of the different components of grammar is.

It is difficult to determine a priori when it is theoretically sound to add a new feature or functional head, or even to propose a new operation, to accommodate some empirical facts (like par-merge for the case of parentheticals). Following minimalist desiderata, the best option would be to keep new features and operations to a minimum and to try to solve the empirical puzzles with the pieces that we already have. I tried to follow this line of action by pursuing an interface-based approach to parentheticals, rooted in the intuition that these structures are truly syntactic orphans. The analysis does not need special mechanisms; it is only necessary to accept that syntactic derivations are transferred cyclically to the interfaces. Although it was only a sketchy hypothesis, I believe that it is a good example of the potential of looking at the interfaces when dealing with discourse phenomena.

I have focused only on the syntax-phonology interface, but of course the semantic/pragmatic side should also be taken seriously if we want to have a comprehensive view of the phenomena under study. In any case, the purpose of this paper was quite different from that. Beyond comprehending the linguistic phenomena per se, I wanted to show that looking at discourse phenomena is exciting because it can help us better understand the nature of the interfaces.

References

Abe, Jun. 2014. Make short answers shorter: support for the in-situ approach. Ms., Tohoku Gakuin University.
Aboh, Enoch O. 2010. Information structuring begins with the numeration. Iberia 2(1): 12-42.
Ackema, Peter & Neeleman, Ad. 2004. Beyond morphology: Interface conditions on word formation, vol. 6. Oxford: Oxford University Press.
Adger, David & Svenonius, Peter. 2011. Features in minimalist syntax. In Boeckx, Cedric (ed.). The Oxford handbook of linguistic minimalism, 27-51. Oxford: Oxford University Press.
Aelbrecht, Lobke. 2010. The syntactic licensing of ellipsis, vol. 149. Amsterdam: John Benjamins.
Astruc-Aguilera, Maria Lluïsa. 2005. The intonation of extra-sentential elements in Catalan and English. Doctoral dissertation, University of Cambridge.
Bañeras Carrió, María. 2016. La periferia izquierda de la oración. Bases de una propuesta configuracional. Ms., UAB.
Belletti, Adriana. 2004. Aspects of the low IP area. In Rizzi, Luigi (ed.). The structure of CP and IP. The cartography of syntactic structures, vol. 2, 16-51. Oxford: Oxford University Press.
Bolinger, Dwight. 1986. Intonation and Its Parts: Melody in Spoken English. London: Arnold.
Bošković, Željko. 2014. Now I'm a phase, now I'm not a phase: On the variability of phases with extraction and ellipsis. Linguistic Inquiry 45(1): 27-89.
Burton-Roberts, Noel. 2005. Parentheticals. In Brown, E. K. (ed.). Encyclopaedia of Language and Linguistics. Amsterdam: Elsevier.
Brunetti, Lisa. 2003. "Information" focus movement in Italian and contextual constraints on ellipsis. In Tsujimura, L. M. & Garding, G. (eds.). WCCFL 22 Proceedings, 95-108. Somerville, MA: Cascadilla Press.
Cerrudo, Alba. 2015. Los parentéticos, entre la sintaxis y la parataxis. Un análisis derivacional de las construcciones parentéticas reducidas y con clítico. MA thesis, UAB.
Cerrudo, Alba. 2016. Cyclic transfer in the derivation of complete parenthetical clauses. Borealis: An International Journal of Hispanic Linguistics 5(1): 59-85.
Cerrudo, Alba. 2017. Reprojection in fragment amalgams. Evidence from Spanish. Paper presented at the 47th Linguistic Symposium on Romance Languages, University of Delaware, Newark, 20/04/2017.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries: The framework. In Martin, R., Michaels, D. & Uriagereka, J. (eds.). Step by Step. Essays on Minimalist Syntax in Honour of Howard Lasnik, 89-155. Cambridge, MA: MIT Press.

Chomsky, Noam. 2001. Derivation by phase. In Kenstowicz, Michael (ed.). Ken Hale: A Life in Language, 1-52. Cambridge, MA: MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In Belletti, Adriana (ed.). Structures and Beyond. The cartography of syntactic structures, vol. 3, 104-131. Oxford: Oxford University Press.
Chomsky, Noam & Lasnik, Howard. 1993. A minimalist program for linguistic theory. In Hale, Kenneth & Keyser, Samuel J. (eds.). The View from Building 20, 1-52. Cambridge, MA: MIT Press.
Cinque, Guglielmo. 1993. A null theory of phrase and compound stress. Linguistic Inquiry 24(2): 239-297.
Cinque, Guglielmo. 1999. Adverbs and functional heads: A cross-linguistic perspective. Oxford: Oxford University Press.
Cinque, Guglielmo (ed.). 2002. Functional structure in DP and IP: The cartography of syntactic structures, vol. 1. Oxford: Oxford University Press.
Cinque, Guglielmo & Rizzi, Luigi. 2008. The cartography of syntactic structures. Studies in Linguistics 2: 42-58.
Corver, Norbert & Thiersch, Craig. 2002. Remarks on parentheticals. In van Oostendorp, M. & Anagnostopoulou, E. (eds.). Progress in grammar: articles at the 20th anniversary of the comparison of grammatical models in Tilburg. Utrecht: Roquade.
Dehé, Nicole & Kavalova, Yordanka. 2007. Parentheticals: An introduction. In Parentheticals, 1-24. Amsterdam/Philadelphia: John Benjamins.
Demonte, Violeta & Fernández-Soriano, Olga. 2009. Force and finiteness in the Spanish complementizer system. Probus 21(1): 23-49.
Dobashi, Yoshihito. 2003. Phonological phrasing and syntactic derivation. PhD diss., Cornell University.
Dobashi, Yoshihito. 2009. Multiple spell-out, assembly problem, and syntax-phonology mapping. In Phonological Domains: Universals and Deviations, 195-220. Berlin: Mouton de Gruyter.
Döring, Sandra. 2015. Parentheticals are – presumably – CPs. In Parenthesis and Ellipsis, 109-145. Berlin: De Gruyter.
Emonds, Joseph. 1973. Parenthetical clauses. In You take the high node and I'll take the low node, 333-347.
Erteschik-Shir, Nomi. 1997. The dynamics of focus structure. Cambridge: Cambridge University Press.
Erteschik-Shir, Nomi. 2007. Information structure: The syntax-discourse interface, vol. 3. Oxford: Oxford University Press.
Espinal, Maria Teresa. 1991. The representation of disjunct constituents. Language 67: 726-762.
Espinal, Maria Teresa & Villalba, Xavier. 2015. Ambiguity resolution and information structure. The Linguistic Review 32(1): 61-85.
Feldhausen, Ingo. 2010. Sentential form and prosodic structure of Catalan, vol. 168. Amsterdam: John Benjamins.
Fabb, Nigel. 1990. The difference between English restrictive and nonrestrictive relative clauses. Journal of Linguistics 26(1): 57-77.
Fox, Danny & Pesetsky, David. 2005. Cyclic linearization of syntactic structure. Theoretical Linguistics 31(1-2): 1-45.
Gallego, Ángel. 2013. The Basic Elements of the Left Periphery. Ms., UAB.

Gallego, Ángel. 2014. Cartografía sintáctica. Revista Española de Lingüística 41(2): 25-56.
Giorgi, Alessandra. 2012. Prosodic signals as syntactic formatives in the left periphery. Manuscript.
Griffiths, James. 2013. Parenthetical verb constructions, fragment answers, and constituent modification. Natural Language and Linguistic Theory 33(1): 191-229.
Griffiths, James & Günes, Güliz. 2015. Ki issues in Turkish. In Kluck, Marlies, Ott, Dennis & de Vries, Mark (eds.). Parenthesis and ellipsis: cross-linguistic and theoretical perspectives, vol. 121. Berlin: Walter de Gruyter.
Griffiths, James & De Vries, Mark. 2013. The syntactic integration of appositives: evidence from fragments and ellipsis. Linguistic Inquiry 44(2): 332-344.
Haegeman, Liliane. 1988. Parenthetical adverbials: the radical orphanage approach. In Chiba, S. (ed.). Aspects of Modern English Linguistics, 232-254. Tokyo: Kaitakushi.
Haegeman, Liliane & Hill, Virginia. 2013. The syntacticization of discourse. Syntax and its Limits 48: 370-390.
Hamlaoui, Fatima & Szendrői, Kriszta. 2015. A flexible approach to the mapping of intonational phrases. Phonology 32(1): 79-110.
Hartmann, Katharina. 2000. Right node raising and gapping. Interface conditions on prosodic deletion. Philadelphia/Amsterdam: John Benjamins.
Hedberg, Nancy & Elouazizi, Noureddine. 2015. Epistemic parenthetical verb phrases: C-command, semantic scope and prosodic phrasing. In Schneider, S., Glikman, J. & Avanzi, M. (eds.). Parenthetical Verbs, 225-257. Berlin/Boston/Munich: Walter de Gruyter.
Heringa, Herman. 2011. Appositional Constructions. PhD diss., Groningen: LOT, Netherlands Graduate School of Linguistics.
Hernanz, Maria Lluïsa. 2010. Assertive bien in Spanish and the left periphery. In Mapping the Left Periphery. The cartography of syntactic structures, vol. 5, 19-62.
Irurtzun, Aritz. 2009. Why Y: on the centrality of syntax in the architecture of grammar. Catalan Journal of Linguistics 8: 141-160.
Jackendoff, Ray. 1997. The architecture of the language faculty. Cambridge, MA: MIT Press.
Johannessen, Janne Bondi. 1998. Coordination. Oxford: Oxford University Press.
Kaltenböck, Gunther. 2007. Spoken parenthetical clauses in English. A taxonomy. In Dehé, Nicole & Kavalova, Yordanka (eds.). Parentheticals, 25-52. Amsterdam/Philadelphia: John Benjamins.
Kayne, Richard. 1994. The antisymmetry of syntax. Cambridge, MA: MIT Press.
Kluck, Marlies. 2011. Sentence amalgamation. PhD diss., Utrecht: LOT Dissertation Series.
Kluck, Marlies. 2015. On representing anchored parentheses in syntax. In Bayer, Josef & Trotzke, Andreas (eds.). Syntactic Complexity across Interfaces, 107-136. Berlin: Mouton de Gruyter.
Koster, Jan. 1999. De primaire structuur. TABU 29: 131-140.
Lekakou, Marika. 2000. Focus in Modern Greek. Unpublished MA dissertation, University College London.
Lobeck, Anne C. 1995. Ellipsis: Functional heads, licensing, and identification. Oxford: Oxford University Press.

López, Luis. 2014. A derivational syntax for information structure, vol. 23. Oxford: Oxford University Press.
Matos, Gabriela. 2013. Quotative inversion in Peninsular Portuguese and Spanish, and in English. Catalan Journal of Linguistics 12: 111-130.
McCawley, James D. 1982. Parentheticals and discontinuous constituent structure. Linguistic Inquiry 13: 91-106.
Merchant, Jason. 2001. The Syntax of Silence. Oxford: Oxford University Press.
Merchant, Jason. 2004. Fragments and ellipsis. Linguistics and Philosophy 27: 661-738.
Munn, Alan Boag. 1993. Topics in the syntax and semantics of coordinate structures. PhD diss., University of Maryland.
Neeleman, Ad & Van de Koot, Hans. 2008. Dutch scrambling and the nature of discourse templates. The Journal of Comparative Germanic Linguistics 11(2): 137-189.
Neeleman, Ad, Titov, E., Van de Koot, Hans & Vermeulen, R. 2009. A syntactic typology of topic, focus and contrast. In van Craenenbroeck, Jeroen (ed.). Alternatives to Cartography, 15-52. Berlin: Mouton de Gruyter.
Nespor, Marina & Vogel, Irene. 1986. Prosodic Phonology. Dordrecht: Foris.
Neeleman, Ad & Reinhart, Tanya. 1998. Scrambling and the PF-interface. In Butt, Miriam & Geuder, Wilhelm (eds.). The projection of arguments: lexical and compositional factors, 309-353. Chicago: CSLI Publications.
Newell, Heather. 2008. Aspects of the morphology and phonology of phases. Doctoral dissertation, McGill University.
Nilsen, Øystein. 2003. Eliminating positions: Syntax and semantics of sentence modification. PhD diss., Utrecht University.
Ortega-Santos, Iván, Yoshida, Masaya & Nakao, C. 2014. On ellipsis structures involving a wh-remnant and a non-wh-remnant simultaneously. Lingua 138: 55-85.
Ott, Dennis. 2015. Connectivity in left-dislocation and the composition of the left periphery. Linguistic Variation 15(2): 225-290.
Ott, Dennis. 2016a. Fragment anchors do not support the syntactic integration of appositive relative clauses: Reply to Griffiths and De Vries 2013. Linguistic Inquiry 47(3): 580-590.
Ott, Dennis. 2016b. Ellipsis in appositives. Glossa: A Journal of General Linguistics 1(1): 34.
Ott, Dennis & Struckmeier, Volker. 2016. Deletion in clausal ellipsis: remnants in the middle field. In Proceedings of the 39th Annual Penn Linguistics Conference 22, 225-234.
Ott, Dennis & Onea, Edgar. 2015. On the form and meaning of appositives. In Bui, Thuy & Ozyildiz, Deniz (eds.). Proceedings of NELS 45, vol. 2, 203-212. Amherst, MA: GLSA.
Payà, Marta. 2003. Prosody and pragmatics in parenthetical insertions in Catalan. Catalan Journal of Linguistics 2: 207-227.
Peterson, Peter. 1999. On the boundaries of syntax. In Collins, P. & Lee, D. (eds.). The Clause in English. In Honour of Rodney Huddleston, 229-250. Amsterdam/Philadelphia: John Benjamins.
Pollock, Jean-Yves. 1989. Verb movement, universal grammar, and the structure of IP. Linguistic Inquiry 20(3): 365-424.

Potts, Christopher. 2005. The Logic of Conventional Implicatures. Oxford: Oxford University Press.
Reinhart, Tanya. 1995. Interface Strategies. OTS Working Papers in Linguistics.
Reinhart, Tanya. 2006. Interface Strategies. Cambridge, MA: MIT Press.
Rigau Oliver, Gemma. 1981. Gramàtica del discurs. PhD diss., Universitat Autònoma de Barcelona.
Rizzi, Luigi. 1997. The fine structure of the left periphery. In Elements of Grammar, 281-337. Dordrecht: Springer.
Rizzi, Luigi. 2004. Locality and left periphery. In Structures and Beyond: The cartography of syntactic structures, vol. 3, 223-251.
Rizzi, Luigi. 2006. On the form of chains: Criterial positions and ECP effects. Current Studies in Linguistics Series 42: 97.
Rooryck, Johan. 2001a. Evidentiality, Part I. Glot International 5(4): 125-133.
Rooryck, Johan. 2001b. Evidentiality, Part II. Glot International 5(5): 161-168.
Rooth, Mats. 1992. A theory of focus interpretation. Natural Language Semantics 1: 75-116.
Ross, John Robert. 1973. Slifting. In Gross, Maurice, Halle, Morris & Schützenberger, M.-P. (eds.). The formal analysis of natural languages. Proceedings of the first international conference, 133-169. The Hague: Mouton.
Safir, Ken. 1986. Relative clauses in a theory of binding and levels. Linguistic Inquiry 17(4): 663-689.
Samek-Lodovici, Vieri. 2005. Prosody-syntax interaction in the expression of focus. Natural Language & Linguistic Theory 23(3): 687-755.
Samek-Lodovici, Vieri. 2015. The Interaction of Focus, Givenness, and Prosody. A Study of Italian Clause Structure. Oxford: Oxford University Press.
Samuels, Bridget D. 2009. The structure of phonological theory. PhD diss., Harvard University.
Sato, Yosuke. 2012. Multiple spell-out and contraction at the syntax-phonology interface. Syntax 15(3): 287-314.
Scheer, Tobias. 2011. A guide to morphosyntax-phonology interface theories: how extra-phonological information is treated in phonology since Trubetzkoy's Grenzsignale. Berlin: Walter de Gruyter.
Selkirk, Elisabeth. 1981. On prosodic structure and its relation to syntactic structure. In Fretheim, Thorstein (ed.). Nordic Prosody II, 111-140. Trondheim: TAPIR.
Stowell, Timothy. 2017. Qualified parenthetical adjuncts. In Mayr, Clemens & Williams, E. (eds.). Festschrift für Martin Prinzhorn. Universität Wien.
Szendrői, Kriszta. 2001. Focus and the syntax-phonology interface. PhD diss., University College London.
Speas, Peggy & Tenny, Carol. 2003. Configurational properties of point of view roles. Asymmetry in Grammar 1: 315-345.
Taglicht, Josef. 1998. Constraints on intonational phrasing in English. Journal of Linguistics 34(1): 181-211.
Tancredi, Christopher. 1992. Deletion, deaccenting and presupposition. PhD diss., Cambridge, MA: MIT.
Tannen, Deborah, Hamilton, Heidi E. & Schiffrin, Deborah. 2015. The handbook of discourse analysis. Chichester: John Wiley & Sons.

Truckenbrodt, Hubert. 2015. Intonation phrases and speech acts. In Kluck, Marlies, Ott, Dennis & de Vries, Mark (eds.). Parenthesis and ellipsis: Cross-linguistic and theoretical perspectives, 301-349. Berlin: De Gruyter Mouton.
Uriagereka, Juan. 1999. Multiple spell-out. In Epstein, S. & Hornstein, N. (eds.). Working Minimalism, 251-282. Cambridge, MA: MIT Press.
Uriagereka, Juan. 2002a. Multiple spell-out. In Uriagereka, J. Derivations. Exploring the Dynamics of Syntax, 45-65. London/New York: Routledge.
Uriagereka, Juan. 2002b. Evidential contexts. Ms., UMD.
Uriagereka, Juan. 2004. Multiple spell-out consequences. Ms., UMD.
Valmala, Vidal. 2007. The syntax of little things. Ms., UPV/EHU.
Valmala, Vidal. 2014. Island repair by ellipsis: handle with care. In XI Workshop on Syntax, Semantics and Phonology.
Vallduví, Enric. 1992. The Informational Component. New York: Garland.
Vallduví, Enric & Engdahl, Elisabet. 1996. The linguistic realization of information packaging. Linguistics 34(3): 459-520.
Van Craenenbroeck, Jeroen (ed.). 2009. Alternatives to cartography, vol. 100. Berlin: Walter de Gruyter.
Villa-García, Julio. 2015. The syntax of multiple-que sentences in Spanish: Along the left periphery, vol. 2. Amsterdam/Philadelphia: John Benjamins.
Villa-García, Julio. 2016. TP-ellipsis with a polarity particle in multiple-complementizer contexts in Spanish: on topical remnants and focal licensors. Borealis: An International Journal of Hispanic Linguistics 5(2): 135-172.
Vries, Mark de. 2007. Invisible constituents? Parentheses as B-merged adverbial phrases. In Dehé, Nicole & Kavalova, Yordanka (eds.). Parentheticals, 203-234. Amsterdam: John Benjamins.
Wagner, Michael. 2009. Focus, topic and word order: A compositional view. In van Craenenbroeck, Jeroen (ed.). Alternatives to Cartography, 53-86. Berlin/New York: Mouton de Gruyter.
Weir, Andrew. 2014. Fragments and clausal ellipsis. PhD thesis, University of Massachusetts, Amherst.
Wichmann, Anne. 2001. Spoken parentheticals. In Aijmer, Karin (ed.). A Wealth of English, 177-193. Göteborg: Acta Universitatis Gothoburgensis.
Wiltschko, Martina & Heim, Johannes. 2016. The syntax of confirmationals. Outside the Clause: Form and Function of Extra-clausal Constituents 178: 305.
Zubizarreta, María Luisa. 1998. Prosody, focus, and word order. Cambridge, MA: MIT Press.


Generative Grammar and the Faculty of Language: Insights, Questions, and Challenges*

Noam Chomsky
University of Arizona & M.I.T.
[email protected]

Ángel J. Gallego
Universitat Autònoma de Barcelona
[email protected]

Dennis Ott
University of Ottawa
[email protected]

Received: November 4, 2017 Accepted: September 23, 2019

Abstract

This paper provides an overview of what we take to be the key current issues in the field of Generative Grammar, the study of the human Faculty of Language. We discuss some of the insights this approach to language has produced, including substantial achievements in the understanding of basic properties of language and its interactions with interfacing systems. This progress in turn gives rise to new research questions, many of which could not even be coherently formulated until recently. We highlight some of the most pressing outstanding challenges, in the hope of inspiring future research.

Keywords: Generative Grammar; faculty of language; basic properties; operations; interfaces; syntax

Resum. La gramàtica generativa i la facultat del llenguatge: descobriments, preguntes i desafiaments

Aquest treball proporciona una visió general dels aspectes clau actuals en el camp de la gramàtica generativa: l'estudi de la facultat del llenguatge humà. Es tractaran algunes de les visions a què aquest enfocament del llenguatge ha donat lloc, incloent-hi èxits importants en la comprensió de les propietats bàsiques del llenguatge i les seves interaccions amb els sistemes d'interfície. Aquest progrés dona lloc a noves preguntes de recerca, moltes de les quals fins i tot no es podien formular de manera coherent fins fa poc. Destaquem alguns dels reptes més destacats amb l'esperança d'inspirar futures investigacions.

Paraules clau: gramàtica generativa; facultat de llenguatge; propietats bàsiques; operacions; interfícies; sintaxi

* For feedback and suggestions, we are indebted to Josef Bayer, Chris Collins, Erich Groat, Luigi Rizzi, and Juan Uriagereka. Parts of this paper are based on a Question & Answer session with Noam Chomsky that took place at the Residència d'Investigadors (Barcelona) on November 6, 2016. We would like to thank the students who helped with the transcription of that session: Alba Cerrudo, Elena Ciutescu, Natalia Jardón, Pablo Rico, and Laura Vela. Ángel J. Gallego acknowledges support from the Ministerio de Economía y Competitividad (FFI2014-56968-C4-2-P and FFI2017-87140-C4-1-P), the Generalitat de Catalunya (2014SGR-1013 and 2017SGR-634), and the Institució Catalana de Recerca i Estudis Avançats (ICREA Acadèmia 2015). Dennis Ott acknowledges support from the Social Sciences and Humanities Research Council (430-2018-00305). We would like to dedicate this paper to the late Sam Epstein, whose work on syntactic theory has been a constant source of inspiration over the years.


Table of Contents

1. Introduction
2. Basic Properties of I-language
3. Operations and Constraints
4. Interfaces
5. Open Questions and Future Directions
6. Conclusion
References

1. Introduction

Generative Grammar (GG) is the study of linguistic capacity as a component of human cognition. Its point of departure is Descartes' observation that "there are no men so dull-witted or stupid […] that they are incapable of arranging various words together and forming an utterance from them in order to make their thoughts understood; whereas there is no other animal, however perfect and well endowed it may be, that can do the same" (Discours de la méthode, 1662). Studies in comparative cognition over the last decades vindicate Descartes' insight: only humans appear to possess a mental grammar—an "I-language," or internal-individual language system—that permits the composition of infinitely many meaningful expressions from a finite stock of discrete units (Hauser et al. 2002; Anderson 2004; Chomsky 2012a, 2017).

The term Universal Grammar (UG) is a label for this striking difference in cognitive capacity between "us and them." As such, UG is the research topic of GG: what is it, and how did it evolve in our species? While we may never find a satisfying answer to the latter question, any theory of UG must meet a criterion of evolvability: the mechanisms and primitives ascribed to UG (as opposed to deriving from independent factors) must be sufficiently sparse to plausibly have emerged as a result of what appears to have been a unique, recent, and relatively sudden event on the evolutionary timescale (Bolhuis et al. 2014; Berwick & Chomsky 2016).

GG's objectives open up many avenues for interdisciplinary research into the nature of UG. Fifty years ago, Eric Lenneberg published his now-classic work that founded the study of the biology of language, sometimes called "biolinguistics" (Lenneberg 1967). In conjunction with the then-nascent generative-internalist perspective on language (Chomsky 1956[1975], 1957, 1965), this major contribution inspired a wealth of research, and much has been learned about language as a result. The techniques of psychological experimentation have become far more sophisticated in recent years, and work in neurolinguistics is beginning to connect in interesting ways with the concerns of GG (Berwick et al. 2013; Nelson et al. 2017; Friederici et al. 2017). Important results have emerged from the study of language acquisition, which is concerned with the interaction of UG and learning mechanisms in the development of an I-language (Yang 2002, 2016; Yang et al. 2017). Work by Rosalind Thornton and others shows that children spontaneously produce expressions conforming to UG-compliant options realized in languages other than the local "target" language, without any relevant evidence; but they do not systematically produce innovative sentences that violate UG principles. This continuity between children's seemingly imperfect knowledge and the range of variation in adult grammars suggests that children are following a developmental pathway carved out by UG, exploring the range of possible languages and ultimately converging on a steady state (for review and references, see Crain & Thornton 1998, 2012; Crain et al. 2016; for a theory of the steady state as a probability distribution over I-languages, see Yang 2016). Converging conclusions are strongly suggested by the spontaneous creation of sign languages by deaf children without linguistic input (Feldman et al. 1978; Kegl et al. 1999; Sandler & Lillo-Martin 2006).
On the whole, we believe that GG has made significant progress in identifying some of the computational mechanisms distinguishing man from animal in the way recognized by Descartes. In this paper, we offer our view of the current state of the field, highlighting some of its central achievements and the many remaining challenges, in the hope of inspiring future research. Section 2 discusses the fundamental, "non-negotiable" properties of human language that any theory of UG has to account for. Section 3 focuses on core computational operations and their properties. Section 4 turns to the interfaces of I-language and systems entering into language use, and how conditions imposed by these systems constrain syntactic computation. Section 5 reviews a number of challenges emerging from recent work, which call for resolution in order to meet minimalist desiderata. Section 6 concludes.

2. Basic Properties of I-language

A traditional characterization of language, going back to Aristotle, defines it as "sound with meaning." Building on this definition, we can conceive of an I-language as a system that links meaning and sound/sign in a systematic fashion, equipping the speaker with knowledge of these correlations. What kind of system is an I-language? We consider two empirical properties non-negotiable, in the sense that any theory that shares GG's goal of providing an explanatory model of human linguistic capacity must provide formal means of capturing them: discrete infinity and displacement.1 Atomic units—lexical items, whose basic nature remains a subject of debate2—are assembled into syntactic objects, and such objects can occupy more than one position within a larger structure. The first property is the technical statement of the traditional observation that "there is no longest sentence," the informal notion "sentence" now abandoned in favor of hierarchically structured objects. The second property is illustrated by a plethora of facts across the world's languages. To pick one random illustration, consider the familiar active/passive alternation:

1. The latter notion is non-negotiable in its abstract sense: there can be multiple determinants of interpretation for some syntactic object. The mechanisms implementing this basic fact vary across theoretical frameworks, of course.

(1) a. Sensei-ga John-o sikar-ta. (Japanese)
       teacher-nom John-acc scold-pst
       'The teacher scolded John.'

    b. John-ga sensei-ni sikar-are-ta.
       John-nom teacher-by scold-pass-pst
       'John was scolded by the teacher.'

The noun phrase John bears the same thematic relation to the verb sikar in both (1a) and (1b), but appears sentence-initially in the latter. On the assumption that thematic relations are established in a strictly local fashion—a guiding idea of GG since its inception—this entails that the nominal is displaced from its original position in (1b).

To account for these elementary properties, any theory of GG must assume the existence of a computational system that constructs hierarchically structured expressions with displacement. The optimal course to follow, we think, is to assume a basic compositional operation MERGE, which applies to two objects X and Y, yielding a new one, K = {X,Y}. If X, Y are distinct (taken directly from the lexicon or independently assembled), K is constructed by External MERGE (EM); if Y is a term of X, by Internal MERGE (IM). If K is formed by IM, Y will occur twice in K, otherwise once; but the object generated is {X,Y} in either case. IM thus turns Y into a discontinuous object (or chain), which can be understood as a sequence of occurrences of Y in K.3 (2) illustrates for (1b) above (abstracting away from irrelevant details), where MERGE combines K and the internal NP John-ga:

(2) a. {sensei-ni, {sikarareta, John-ga}} = K → MERGE(K, John-ga)
    b. {John-ga, {sensei-ni, {sikarareta, John-ga}}} = K′

2. For a sample, see Hale & Keyser (1993, 1999); Borer (2005); Marantz (2001, 2013); Mateu (2005); Ramchand (2008); Starke (2014).
3. We assume that each syntactic object is a (possibly singleton) set of occurrences, where occurrences are individuated by their context (structural sister). This is the definition assumed in Chomsky (2000a: 115), going back to Quine (1940: 297). See also Nunes (2004: 50ff.) and Collins & Stabler (2016: sect. 4) for critical discussion and alternative conceptions.
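As an informal illustration (a toy sketch of ours, not the formalization of Collins & Stabler 2016 or any other proposal in the literature), the set-theoretic behavior of simplest MERGE in (2) can be rendered in a few lines of Python; the function names and the use of frozensets are expository choices:

    # Toy model: syntactic objects are lexical items (strings) or
    # frozensets of syntactic objects. MERGE(X, Y) = {X, Y}, nothing else.

    def terms(so):
        """A syntactic object's terms: the object itself plus,
        recursively, the terms of its members."""
        result = {so}
        if isinstance(so, frozenset):
            for member in so:
                result |= terms(member)
        return result

    def merge(x, y):
        """Simplest MERGE: no trigger, no label, no order. The same
        operation covers External MERGE (Y distinct from X) and
        Internal MERGE (Y a term of X)."""
        return frozenset({x, y})

    # (2a): K = {sensei-ni, {sikarareta, John-ga}}
    k = merge('sensei-ni', merge('sikarareta', 'John-ga'))

    # MERGE(K, John-ga) is Internal MERGE, since John-ga is a term of K:
    assert 'John-ga' in terms(k)
    k_prime = merge(k, 'John-ga')   # (2b): two occurrences of John-ga

    # The higher and lower occurrences are the same object in different
    # contexts: copies come for free, as the text notes.

Note that nothing in this sketch encodes linear order: frozenset({x, y}) equals frozenset({y, x}), mirroring {X,Y} = {Y,X}.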

MERGE, applying recursively so that any generated object is accessible to further operations,4 thus suffices to account for the basic properties of discrete infinity and displacement. Furthermore, it is the computationally simplest operation (as opposed to, say, concatenation, which adds order) that implements the basic properties of an I-language, and as such a conceptually necessary, irreducible component of UG.

MERGE(X,Y), yielding K = {X,Y}, imposes hierarchical structure (X, Y are terms of K, but not vice versa) but no order ({X,Y} = {Y,X}). Languages differ in how they ultimately linearize objects constructed by MERGE, an important research topic for the study of the interaction between core syntax and the sensorimotor systems involved in perception and articulation. In (1a) above, the VP is linearized with OV order (John-o sikarta), whereas a corresponding English VP would surface with VO order (scolded John). Interpretation is not affected by this difference, suggesting that the relevant parameter should be a matter of externalization of internally generated expressions alone (see Travis 1984 for original ideas along these lines).

A corollary of restricting composition to MERGE is the structure-dependence of syntactic operations: if order is only established in the morphophonological component, no syntactic operation can make reference to it. This excludes a large class of logically possible languages as not humanly acquirable, namely languages whose rules and operations are defined in linear terms (e.g., "reverse the order of words in the sentence to yield a question"). There is evidence that hypothetical languages of this sort are indeed outside of the range of variation permitted by UG. Neurolinguistic studies conducted by Andrea Moro and colleagues suggest that invented "languages" whose rules operate over linear order are treated by speakers as a puzzle rather than linguistic data, as indicated by diffuse activity in many parts of the brain as opposed to the pattern of activity observed in ordinary language use (Musso et al. 2003). Similar results were found in the study of a linguistically gifted but cognitively impaired subject (see section 4 below).

There are many illustrations of structure-dependence from syntax-semantics and morpho-phonology (Rizzi 2013a; Everaert et al. 2015). AUX-raising was used in the earliest days of GG as a straightforward illustration of the Poverty of the Stimulus: the fact that the input (linguistic data) vastly underdetermines the I-language eventually attained. The argument then and now is that the language-learning child never entertains the hypothesis that yes/no questions are formed by moving the linearly first auxiliary in the clause—a hypothesis that would receive ample support from cases such as (3) and requires complex examples of the kind in (4) to be refuted. (The symbol '_' marks the gap left behind by the displaced auxiliary.)

(3) Is the tall man from Italy _ happy?

(4) Is the tall man [who is from Italy] _ happy?

4. Recursion is a "deep" property of the generative procedure; to what extent constructions exhibiting category recursion are used in some particular language (e.g., English but not German permits recursive possessors) is an orthogonal issue. For related discussion, see Arsenijević & Hinzen (2012); Chomsky (2014).

The computation chooses the structurally first (highest) auxiliary for inversion, not the one that happens to be embedded in the subject (at arbitrary depth), despite the fact that identification of the linearly first auxiliary is computationally straightforward. No other hypothesis is ever considered by the child, and consequently cases such as (5) are not attested in children's production (Crain & Nakayama 1987; Crain et al. 2017):

(5) *Is the tall man [who _ from Italy] is happy?

The formally innocuous linearity-based "first auxiliary" hypothesis would furthermore mislead children acquiring verb-final German into postulating questions such as (7), deriving from the verb-final structure underlying (6).

(6) dass der dicke Mann [der aus Italien gekommen war] glücklich war
    that the fat man who from Italy come was happy was
    '…that the fat man who had come from Italy was happy.'

(7) *War der dicke Mann [der aus Italien gekommen _ ] glücklich war?
     was the fat man who from Italy come happy was

Instead, structure-dependence dictates that the structurally closest auxiliary raise, exactly as in English and, crucially, irrespective of linear order:

(8) War der dicke Mann [der aus Italien gekommen war] glücklich _?
    was the fat man who from Italy come was happy
    'Was the fat man who had come from Italy happy?'

Children acquiring German do not simply adopt an alternative “last auxiliary” hypothesis, which would falsely produce the result in (9), where the relative clause has undergone optional rightward extraposition. Instead, learners instinctively know that the correct form is (10)—the only form possible if AUX-raising operates over hierarchical structure.

(9) *War der dicke Mann glücklich war [der aus Italien gekommen _ ]?
     was the fat man happy was who from Italy come

(10) War der dicke Mann glücklich _ [der aus Italien gekommen war]?
     was the fat man happy who from Italy come was
     'Was the fat man happy who had come from Italy?'

As before (and always, it seems), structure trumps linear order. The conclusion is as obvious to the language-learning child as it is to the theorist if linearity-based rules are simply not part of the hypothesis space, i.e. not permitted by UG. Children acquiring German have the same understanding of structure-dependence as children acquiring any other grammatical system, since it follows from the hierarchical organization of linguistic objects constructed by MERGE.
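The contrast can be made concrete in a small Python sketch; this is purely expository (the tuple encoding and rule names are ours), but it shows how the structure-dependent rule derives (4) where the linear rule derives the unattested (5):

    # A clause is modeled as a triple (subject, aux, predicate); the
    # subject of (4) properly contains a relative clause with its own AUX.

    def words(x):
        """Flatten a structure into its linear string of words."""
        if isinstance(x, tuple):
            return [w for part in x for w in words(part)]
        return [x]

    rel_clause = ('who', 'is', 'from Italy')
    clause = (('the tall man', rel_clause), 'is', 'happy')

    def question_linear(clause):
        """Linearity-based rule: front the linearly first auxiliary.
        Computationally trivial, but it yields the unattested (5)."""
        ws = words(clause)
        i = ws.index('is')          # the first 'is' sits inside the subject
        return ['is'] + ws[:i] + ws[i + 1:]

    def question_structural(clause):
        """Structure-dependent rule: front the structurally highest AUX,
        the auxiliary of the root clause. This yields (4), never (5)."""
        subject, aux, predicate = clause
        return [aux] + words(subject) + words(predicate)

    print(question_linear(clause))      # *Is the tall man who _ from Italy is happy?
    print(question_structural(clause))  #  Is the tall man who is from Italy _ happy?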

The phenomenon of AUX-raising illustrated above, alongside other classical illustrations of structure-dependence, has been the focus of attention of so-called "usage-based" approaches, which assume that basic facts of language are not rooted in UG but rather the emergent result of statistical analysis over vast amounts of data. Approaches of this kind assume that language acquisition is essentially a matter of memorization and minimal generalizations over a large database. We will not evaluate the specific claims made by these proposals here, as this task has been undertaken elsewhere (Berwick et al. 2011; Crain et al. 2017). The approaches fail invariably both at adequately capturing the phenomena they focus on and, more fundamentally, at addressing the only theoretically relevant question: why do languages universally adopt structure-dependent operations while avoiding, in all relevant cases, far simpler computational operations based on linear order? An approach that restricts generation to MERGE provides a principled solution to this long-standing puzzle; in fact, it provides the optimal solution, a straightforward consequence of the simplest computational operation.

In line with a long tradition in linguistics, we take the I-language to derive sound/sign-meaning pairs: objects constructed by MERGE are mapped onto a semantic representation SEM, accessed by conceptual-interpretive systems, and a phonetic representation PHON, accessed by sensorimotor systems, the latter providing instructions to the vocal or gestural articulators. Each derivation thus yields a pair ⟨SEM, PHON⟩, whose properties enter into complex thought and intentional planning (e.g., discourse organization) and perception/articulation (internal in self-talk, external in oral or gestural production). We return to these interfaces below.

Displacement as illustrated in (1b) above often has effects on both SEM and PHON: displaced objects are interpreted as chains of occurrences, and derived positions are typically privileged in production. Consider a standard example of wh-movement (from Sportiche 2013):

(11) Je me demande de quel livre sur elle-même_i [cette loi]_i a entraîné la publication (α).
     I wonder of which book about she-self this law has triggered the publication
     'I wonder which book about itself this law triggered the publication of.' (French)

The wh-phrase de quel livre 'of which book' is displaced by IM from its original position (α) as the complement of the noun publication to the left edge of the embedded clause, where it surfaces in the externalized form. At SEM, the resulting chain of occurrences is interpreted as an operator-variable dependency: (I wonder) which book x about y is such that this law y has triggered the publication of x. SEM provides access to the original copy of the wh-phrase that externally merged in the position marked (α) above, as evidenced by the fact that this is where the reflexive pronoun elle-même is interpreted: in the scope of its antecedent cette loi. Once again, a state of affairs that would otherwise be highly puzzling can be given a principled rationale in terms of MERGE and its effects at the interfaces.

The structural distance spanned by dependencies of this sort is not clause-bounded but of arbitrary depth. Some well-known evidence suggests that movement leaves intermediate copies, so that "long" dependencies are in effect composed of "shorter" sub-dependencies (see Boeckx 2007 for a review). All copies are available at SEM, rendering reconstruction operations of earlier theories obsolete. By contrast, mapping to PHON forces a choice about the realization of the discontinuous object created by IM. The typical choice is the highest position, with all lower copies remaining silent. If, when, and how this preference can be overridden by parametric and other factors remains an important research question (cf. Nunes 2004; Trinh 2011).

Whether other types of rearrangements commonly found in the world's languages, such as semantically vacuous scrambling, extraposition, clitic movement etc., likewise reflect narrow-syntactic computations or are part of the mapping to PHON (prior to the introduction of linear order, hence with displacement-like properties) is an open question. It is commonly assumed that effects on meaning pertaining to topic/comment and focus/background articulation necessarily indicate core-syntactic displacements, but the relevant notion of "meaning" encompasses pragmatic as well as externalization-related (e.g., prosodic) properties of expressions. "Meaning" properties in this broad sense plausibly emerge from holistic interpretation of ⟨SEM, PHON⟩ pairs, rather than narrow-compositional interpretation of SEM itself. We briefly return to related matters in section 5.

Does the basic operation MERGE meet the criterion of evolvability? Any answer to this question is necessarily preliminary, given our ignorance about the evolution of UG. Bolhuis et al. (2014) and Berwick & Chomsky (2016) suggest that MERGE plausibly arose as a cognitive innovation in an individual, which ultimately spread to a group. Whether or not this speculation is on the right track, given that MERGE is the minimal computational operation required to generate a discrete infinity of syntactic objects, its emergence is a necessary prerequisite for our species-specific linguistic mind. The evolutionary origins of the other central component of I-language—the lexicon and its atoms with all their semantic intricacies (Chomsky 2000b)—remain deeply mysterious.

3. Operations and Constraints

We assume that MERGE(X,Y) forms {X,Y}, and nothing else. We will occasionally refer to this operation as simplest MERGE, in order to distinguish it from proposals in the literature adopting a more complex operation (cf. Epstein et al. 2014; Fukui & Narita 2014; Collins 2017).

A computational system comprising a lexicon and MERGE applying freely will automatically satisfy some fundamental desiderata, such as recursive generation of infinitely many structures with internal constituency and discontinuous (displaced) objects. MERGE operates over syntactic objects placed in a workspace: the MERGE-mates X and Y are either taken from the lexicon or were assembled previously within the same workspace (for some relevant formal definitions, see Collins & Stabler 2016). There is no motivation for additional representations, such as numerations or lexical arrays, as employed in earlier approaches that assumed trans-derivational comparisons (Chomsky 1993, 1995; cf. Collins 1997: sect. 4.6 on this point).

We assume that MERGE is strictly binary: given that this is what is minimally necessary to create hierarchical structure, we assume that it is the only operation defined by UG (although adjunction structures may necessitate a separate operation, a point to which we return in section 5). Generation by simplest MERGE thus entails a restrictive class of recursively defined, binary-branching and discrete-hierarchical structures. Anachronistically speaking, early work on "non-configurational" languages by Ken Hale (1983) suggested that there are languages without the binarity restriction, but subsequent work showed this postulation of additional, non-binary combination operations to be unjustified; see, e.g., Webelhuth (1992) on German, Legate (2002) on Warlpiri, and Kayne (1984, 1994) for additional arguments. While challenges remain, we take binarity and the absence of "flat" structures to be a theoretically desirable and empirically feasible property of MERGE-based generation.

Restriction to simplest MERGE entails an Inclusiveness Condition (IC) that precludes the introduction of extraneous objects—for instance, traces and the bar-levels of X-bar Theory and other labels, but not copies and the detection of headedness via search (more on this below). Unlike the production rules of phrase-structure grammars, simplest MERGE thus incorporates no notion of "projection" (Chomsky 2013, 2015). IC also bars introduction of features that are not inherent to lexical items, such as the discourse-related features (topic, focus, etc.) assumed in the cartographic tradition and other approaches (e.g. Rizzi 1997; López 2009). We suggest below that MERGE is generally not triggered but applies freely.

Importantly, IC need not be stipulated as part of UG: it is a corollary of simplest MERGE. Suppose having constructed K = {X,Y}, we proceed to merge K and some object W. W is either internal to K or external to it. If W is external, then it is taken from the lexicon or has been assembled independently; this is EM. If W is internal to K, then it is a term of K; this is IM (displacement). If W = Y, MERGE(K,Y) yields K′ = {Y,{X,Y}}, with two copies (occurrences) of Y in K′.
Note that there is still only one, discontinuous object Y in K′, not two distinct objects; for instance, a semantically ambiguous phrase such as Mary's book will not be interpreted differently in the multiple positions it occupies after IM (as in, e.g., Mary's book arrived/was published last month).

A widely-held but, we believe, unjustified assumption is that MERGE is a "Last Resort" operation, licensed by featural requirements of the MERGE-mates (cf. Chomsky 2000a and most current literature, e.g. Pesetsky & Torrego's 2006 Vehicle Requirement on Merge). Note that a trigger condition cannot be restricted to either EM or IM: the operation MERGE(X,Y) is the same in both cases, the only difference being that one of X, Y is a term of the other in one case, while X and Y are distinct in the other. Simplest MERGE is not triggered; featurally-constrained structure-building requires a distinct, more complicated operation (defined as Triggered Merge in Collins & Stabler 2016; see Collins 2017 for additional discussion). The features invoked in the technical literature to license applications of MERGE are typically ad hoc and without independent justification, "EPP-features" and equivalent devices being only the most obvious case.5 The same holds for selectional and discourse-related features; the latter in addition violate IC, as noted above (cf. Fanselow 2006). Featural diacritics typically amount to no more than a statement that "displacement happens"; they are thus dispensable without empirical loss and with theoretical gain, in that Triggered Merge or equivalent complications become unnecessary (cf. Chomsky 2001: 32, 2008: 151; Richards 2016; Ott 2017b).6

MERGE thus applies freely, generating expressions that receive whatever interpretation they are assigned by interfacing systems.7 Surface stimuli deriving from the objects constructed by I-language can have any degree of perceived "acceptability" or "deviance," from perfect naturalness to complete unintelligibility. Since Chomsky 1955[1975] it has been recognized that no independently given notion of "well-formedness" exists for natural language in the way it is stipulated for artificial symbolic systems (Chomsky & Lasnik 1993: 508). Consequently, concerns about "overgeneration" in core syntax are unfounded; the only empirical criterion is that the grammar associate each syntactic object generated to a ⟨SEM, PHON⟩ pair in a way that corresponds to the knowledge of the native speaker.8 In fact, "overgeneration" must be permitted on purely empirical grounds, since "deviant" expressions are systematically used in all kinds of ways. To pick a random illustration, the expression John will ever agree involving NPI ever must be generated to be usable in contexts such as I doubt that [John will ever agree]. Constructions such as Right-node Raising may have similar properties (see Larson 2018).

Do we need operations other than MERGE for the construction of syntactic objects? Agreement phenomena indicate that there is an operation AGREE that relates features of syntactic objects (Chomsky 2000a, 2001). The assumption of much current work is that AGREE is asymmetric, relating initially unvalued φ-features on a Probe to matching, inherent φ-features of a Goal within the Probe's search space (structural sister).

5. The "edge features" of Chomsky (2008) are equally dispensable while not technically equivalent, and were originally introduced to distinguish elements that enter into computation from those that do not, such as interjections and response particles (which Holmberg 2016 argues to be elliptical in many cases).
6. A trigger-free approach to MERGE also eliminates the motivation for counter-cyclic MERGE in subject/object raising, an extremely complex operation (Epstein et al. 2012); see Chomsky (in press).
7. We should be careful to distinguish "interpretive systems" from "performance systems." The interpretive sensorimotor and conceptual-intentional systems are systems of cognitive competence, involved in the determination of entailment and rhyme relations among expressions, for instance. Actual performance introduces all sorts of other complicating factors, such as memory constraints, irrationality, etc.
8. By contrast, the conception of syntactic computation as "crash-proof" (Frampton & Gutmann 2002, among others) is based on the dubious assumption that an I-language defines a set of well-formed, intuitively acceptable/natural expressions. But there is no basis for this assumption, and the informal notion of "acceptability" involves a host of factors that under no rational conception are part of I-language.

These dependencies find their expression in morphological inflection in highly variable, language-specific ways. AGREE is structure-dependent: in (12) and (13) below, the verbal morphology indicates agreement with the in situ object regardless of whether the linear order is VO or OV (examples from Tallerman 2005).

(12) ni-k-te:moa šo:citl. (Nahuatl)
     1sg-3sg-seek flower
     'I seek a flower.'

(13) Uqa jo ceh-ade-ia. (Amele)
     he houses build-3pl-3sg.pst
     'He built houses.'

AGREE furthermore obeys structurally-conditioned minimality: regardless of the eventual surface order of constituents in (14) and (15), upon entering the derivation the inflectional Probe above the verb phrase locates the hierarchically closest Goal (underlined below) in each case—the singular subject in (14) vs. the plural one in (15), the latter subsequently displaced to the left.

(14) Die Kinder hat / *haben [vP die Lehrerin erschreckt].
     the children has have the teacher startled
     'The teacher startled the children.' (German)

(15) Die Kinder haben / *hat [vP die Lehrerin erschreckt].
     the children have has the teacher startled
     'The children startled the teacher.'

Embedding the plural subject NP of (15) within a larger singular NP expectedly gives rise to singular agreement, despite identical adjacency relations at the surface.

(16) [Die Geschichte über [die Kinder]] hat / *haben [vP die Lehrerin erschreckt].
     the story about the children has have the teacher startled
     'The story about the children startled the teacher.'
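Schematically, the minimality seen in (14)-(16) amounts to breadth-first (minimal) search from the probe into its sister. The following toy Python sketch is ours and deliberately simplified: only the two competing NPs of (16) carry φ-features, so that the closest-goal logic stands out:

    from collections import deque

    def agree(sister, has_phi):
        """Minimal search: walk the probe's sister breadth-first and
        return the hierarchically closest node bearing φ-features."""
        queue = deque([sister])
        while queue:
            node = queue.popleft()
            if has_phi(node):
                return node
            if isinstance(node, tuple):
                queue.extend(node)      # descend one layer deeper
        return None

    def has_phi(node):
        """Only nodes of the form (noun, φ) count as goals in this toy."""
        return (isinstance(node, tuple) and len(node) == 2
                and node[1] in ('3sg', '3pl'))

    # (16): [die Geschichte über [die Kinder]] ... die Lehrerin erschreckt
    die_kinder = ('die', ('Kinder', '3pl'))
    subject_np = (('die', ('Geschichte', '3sg')), ('über', die_kinder))
    sister = (subject_np, ('die Lehrerin', 'erschreckt'))

    print(agree(sister, has_phi))   # ('Geschichte', '3sg'): the more
                                    # deeply embedded plural never wins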

Empirically, AGREE or some equivalent operation is clearly required; we set aside here many intricacies of agreement phenomena uncovered in much detailed work on the topic (e.g. Bobaljik 2008; Harbour et al. 2008; Legate 2008). It is commonly assumed that IM is parasitic on AGREE, but this, like the assumption that applications of MERGE are licensed by formal features, requires a more complicated, separate movement operation. It is also empirically unfounded, since the effects of AGREE can be observed in the absence of IM and vice versa. Consider (17), where the matrix verb parecen 'seem' agrees with the in situ NP varios sobornos a políticos 'many bribes to politicians' (as well as with the participle descubiertos 'discovered').

(17) Parecen haber sido descubiertos varios sobornos a políticos.
     seem.3pl have.inf been discovered.3pl many bribes to politicians
     'Many bribes to politicians seem to have been discovered.' (Spanish)

The NP can raise into the matrix clause but it need not, unlike in languages such as English. Cases of this sort show that IM and AGREE are independent operations.9 IM without AGREE is illustrated by cases such as (14) above.

Objects constructed in core syntax must be mapped onto representations that can be accessed by C-I and SM systems: SEM and PHON, respectively. Consequently, there must be an operation TRANSFER that hands constructed objects over to the mapping components. The mapping to PHON is complex, involving the computation of stress and a prosodic contour, "flattening" of the hierarchical structure, etc. (see Collins 2017 for a partial theory of this mapping, Idsardi & Raimy 2013 for general discussion, and Arregi & Nevins 2012 for a detailed case study in 'post-syntax'). The mapping to SEM is more direct, given that hierarchical structure is the input to semantic interpretation; just how complex it is depends on the obscure question of where the boundary between the generative procedure and C-I systems is to be drawn.

A further open question is what the effects of TRANSFER are on the syntactic derivation. Ideally, TRANSFER should impose some degree of cyclicity on the system, such that for a given syntactic object K assembled in the course of the derivation, further computation cannot modify K. This is achieved if TRANSFER renders the objects to which it applies impenetrable to later operations, thereby providing an upper bound to the internal complexity of syntactic objects operated on at any given stage of the derivation. In Chomsky 2000a and subsequent works it is suggested that the derivational phases subject to TRANSFER correspond to the thematic domain (the verb phrase, vP) and the propositional domain (the clause, CP).

A common assumption in the literature is that TRANSFER to PHON (or Spell-Out) eliminates structure, such as the interior of a phase, from the derivation. This cannot be literally correct, however: transferred phases are not spelled out in their original position but can be realized elsewhere, such as when a larger object containing the phase is displaced (Obata 2010). To illustrate, in (18) the NP α contains the clausal phase β:

(18) [α the verdict [β that Tom Jones is guilty]]

Suppose that subsequent to TRANSFER of β, α raises to a higher position, as in (19):

(19) [α the verdict [β that Tom Jones is guilty]] seems to have been reached (α) by the jury

9. Further arguments are needed to establish the absence of covert raising in such cases (with English-style IM but pronunciation of the original copy); see Wurmbrand (2006) on German and Icelandic. But such vacuous covert displacements are highly dubious on grounds of learnability alone.

The clausal phase β is pronounced in its derived position internal to displaced α; it is not pronounced in its original position (or omitted from the final string). This means that there is no Spell-Out, and no structure is eliminated: there is only TRANSFER, which renders β inaccessible to subsequent manipulation.10

At the C-I interface, global principles of interpretation such as Condition C of the Binding Theory and the unboundedness of operator-variable dependencies (including "reconstruction" effects, as in (11) above) suggest the same conclusion: transferred phases remain accessible, but they cannot be modified at later cycles. This is a version of the Phase Impenetrability Condition (PIC) that permits Probe-Goal relations across phase boundaries, as long as these only manipulate the Probe. Examples include the well-known quirky-subject configurations in which C-T agrees (at least optionally) with an internal argument in situ and cases of long-distance agreement across finite-clause boundaries (D'Alessandro et al. 2008; Richards 2012).11

While permitting Probe-Goal relations and interpretive dependencies, PIC blocks IM of X "out of" a phase P on the plausible assumption that the resulting discontinuity of X alters P's internal structure.12 Suppose X is raised from within P by IM. If syntactic objects are defined as sets of occurrences, it follows that P subsequently no longer contains X, since it does not contain the set of X's occurrences. Consequently, inter-phasal IM is barred by the PIC, as it affects the internal constitution of previously-transferred P. PIC thus requires raising of X to the edge of P before or at TRANSFER, as well as the assumption that the edge remains accessible at the next phase. In this way, the PIC gives rise to successive-cyclic movement and its reflexes in externalization.

If smaller units such as NPs, PPs, etc. are also phases (as argued in Uriagereka 1999, Abels 2003, Den Dikken 2007, Marantz 2007, Bošković 2014, and various other works), PIC enforces cyclic movement of any internal element that will undergo modification at a later stage of the derivation. While technically coherent, this inflation of phasal categories creates significant additional complexity and threatens to render the notion of phase-based derivation vacuous. The fact that the effects associated with successive-cyclic movement seem to be absent from these categories (Gallego 2012; Van Urk 2016) supports the hypothesis that vP and CP are the only phases.

The verbal and clausal phases in essence capture the "duality of interpretation" stated in terms of the D-structure/S-structure distinction of earlier theories. EM within the vP phase gives rise to configurations expressing generalized argument structure, whereas IM at the CP cycle yields chains that enter into the determination of scope/discourse properties (Chomsky 2004, 2007; Gallego 2013a, in progress). While this is a reasonable approximation of the effects of EM and IM at the C-I interface, apparent exceptions (such as semantically vacuous displacements) pose interesting research questions.

To be sure, the basic operations MERGE, AGREE, and TRANSFER require much further formal explication; we will address some relevant issues in the following two sections.13 Despite many remaining questions, we think that it is important to appreciate the fact that an austere system as outlined so far can accommodate a significant range of facts about natural language that are equally fundamental and surprising from a naïve point of view, such as hierarchical structure and structure-dependence, the cross-linguistically variable externalization of head-complement structures, the ubiquity of displacement and "reconstruction," and the duality of interpretation.

10. We thus avoid the assembly problem of Collins & Stabler (2016), first discussed in Uriagereka (1999).
11. See Epstein et al. (2016a) for a theory of "phase cancellation" that may permit a stronger formulation of the PIC, with no access to what has already been transferred. For alternative ways to cancel, extend, or parametrize phases, see Gallego (2010a), den Dikken (2007), Alexiadou et al. (2014), and Chomsky (2015).
12. The No-Tampering Condition (NTC) sometimes assumed in the literature is a general desideratum of computational efficiency, but the case of IM shows that it cannot hold in its strictest form: if X is a term of Y contained in W, MERGE(X,W) affects both X (now a discontinuous object) and W (now no longer containing X), but doesn't change X or Y, e.g. by replacing either with a distinct object. This suggests that the NTC is reducible to the PIC (Gallego 2020).
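The conception of TRANSFER just outlined can be caricatured in a few lines of Python (an expository sketch of ours, not a formalization from the literature): transferred material is marked inaccessible rather than deleted, so β in (18) travels along inside displaced α in (19) and is pronounced there.

    transferred = []       # phases already handed over to the interfaces

    def transfer(phase):
        """TRANSFER: render the phase's interior inaccessible to later
        operations; crucially, nothing is deleted from the structure."""
        transferred.append(phase)

    def properly_contains(so, target):
        """True if target is a term of so other than so itself."""
        if not isinstance(so, tuple):
            return False
        return any(part == target or properly_contains(part, target)
                   for part in so)

    def accessible(target):
        """A crude PIC: nothing properly contained in a transferred
        phase can be manipulated, though the phase itself may still be
        carried along inside a larger displaced object."""
        return not any(properly_contains(p, target) for p in transferred)

    beta = ('that', ('Tom Jones', 'is guilty'))   # the clausal phase in (18)
    alpha = ('the verdict', beta)                 # the containing NP

    transfer(beta)
    print(accessible(('Tom Jones', 'is guilty')))  # False: frozen by the PIC
    print(accessible(alpha))                       # True: alpha, with beta
                                                   # inside it, can still raise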

4. Interfaces

At the completion of each derivational cycle, the object W constructed in narrow syntax is subject to TRANSFER to the interfaces, mapping W onto SEM and PHON, accessed by C-I and SM systems, respectively. Let us refer to the mapping from narrow syntax to PHON as externalization (EXT). How and when does EXT take place? There are several possibilities. It could be that EXT takes place "all at once," applying to the final output of the narrow-syntactic derivation. Or it could be that the units rendered inaccessible by PIC are spelled out partially, while not being eliminated from the syntactic representation (permitting phasal objects to be moved as part of larger objects, as discussed above).

The interpretive and perceptual/articulatory systems accessing PHON and SEM impose constraints on the expressions freely constructed by MERGE that map onto these representations. For instance, the C-I system imposes a general requirement of Full Interpretation: all terms of a syntactic object must be interpreted, none can be ignored.14 As a result, (20) cannot be interpreted at C-I as either "Who did John see?" or "John saw Mary," ignoring the theta-less object Mary or the vacuous operator who, respectively.

(20) {who,{John,{T,{see,Mary}}}}

13. We will not discuss here the operation of FEATURE INHERITANCE (F-I), introduced in Chomsky 2008 in order to account for the deletion of φ-features of phase heads. Ouali (2008) explores three possible manifestations of this operation, whereas Gallego (2014) argues that F-I can be eliminated under the Copy Theory of Movement. For reasons given in Richards (2007), F-I, like AGREE, must apply at the phase level, avoiding countercyclicity (Chomsky 2007: 19 fn. 26).
14. Sportiche (2015) argues that Full Interpretation permits "neglect" of elements that are meaningless or multiply represented. On this view, agreement features valued in the course of the derivation remain without consequence at C-I; no additional mechanism that removes these features is required.

So-called "crash-proof" models seek to bar generation of structures such as (20), given the intuitive "ill-formedness" of the derivative string (Frampton & Gutmann 2002). We think this is a mistake, for both conceptual and empirical reasons (see note 8 and related discussion above). On methodological grounds, constraints imposed on MERGE are typically redundant with more general interface conditions, such as Full Interpretation in the case of (20) (Chomsky 1986). The same is true for theta-theoretic violations, e.g. when the derivation fails to supply a strongly transitive verb with an object: the incompleteness is independently detected at the C-I interface, and there is no need to block generation of the "deviant" object, e.g. by complicating MERGE.15 Furthermore, "deviant" expressions typically do have some interpretation, however inexpedient it may be in real-life usage.

More specific constraints are imposed by C-I on particular elements within SEM, such as those governed by the principles of Binding Theory. Thus, different types of pronouns receive interpretations that relate them to c-commanding antecedents in specific ways, accounting for the fact that Himself likes John does not mean "John likes himself," the impossibility of a coreferent interpretation of "John" and "him" in John likes him, etc. While many aspects of Binding Theory remain to be addressed for a system obeying IC, principled explanations of core cases in terms of C-I principles appear to be within reach (Chomsky 2008; Reuland 2011).16

What about the other interface, which relates the core computational system to articulatory and perceptual systems involved in EXT? As noted above, EXT is necessarily much more complex than the mapping to SEM, in that hierarchical objects must be translated into an altogether distinct, sequential format: while linear order plausibly plays no role in the syntactic and semantic processes yielding expressions and their interpretations, it is plainly required for vocal or gestural articulation. This is not the only complication: EXT violates just about every natural computational principle and carries out extensive modifications (e.g. by introducing boundary tones, prosodic contours and stress placement, etc., all in violation of IC), in ways that are furthermore highly variable across languages. What is more, the mapping must be sufficiently flexible to accommodate the contingencies of all possible modalities. For instance, speech requires strict temporal ordering, while gestural articulation permits a degree of simultaneity between manual and non-manual signs as well as within manual signs (Sandler & Lillo-Martin 2006; Vermeerbergen et al. 2007). The morphophonological properties superimposed as part of EXT also seem to be the locus of much, perhaps all cross-linguistic variation (in accordance with Chomsky's 2001 Uniformity Principle).17

15. An important remaining question is how to handle apparent idiosyncrasies in selection. Some of these may well turn out upon closer scrutiny to be less idiosyncratic than standardly assumed, as argued recently by Melchin (2018) for eat/devour-type contrasts. Idiosyncratically-selected functional prepositions plausibly fall under a general theory of morphological case realized as part of externalization.
16. Chomsky (2007, 2008) suggests that reflexive binding might reduce to AGREE of one Probe with multiple Goals (cf. Hiraiwa 2005; López 2007). For more on this idea, see Hasegawa (2005); Gallego (2010b).
17. For related discussion and developments in the study of parametric variation, see Biberauer et al. (2014); Eguren et al. (2016); Kayne (2013); Picallo (2014).

Psycholinguistic and neurolinguistic inquiries have the potential to shed light on the status of EXT. One example is Smith & Tsimpli's (1995) work on a subject they call Chris, whose cognitive capacities are extremely limited but who has extraordinary linguistic capacities that allow him to pick up languages very quickly (at least superficially, without significant understanding). Smith and Tsimpli investigated Chris's reactions to invented languages of two types, one that conformed to UG principles and another that used principles that are not available to UG, such as linearity-based operations. It turned out that Chris was completely unable to deal with the language based on simple computational procedures using linear order, but would master easily an invented language that conformed to UG principles in employing structure-dependent rules. Subsequent studies by Smith and Tsimpli (corroborated by Musso et al.'s 2003 findings mentioned above) suggest that normals can likewise deal relatively easily with languages conforming to UG principles, but can handle the non-UG-conforming systems relying on linear order only if they were expressly presented as a puzzle rather than a language. While preliminary, these findings strike us as suggestive.

These observations support the speculation that those properties of language that pertain exclusively to perception and articulation are ancillary, perhaps altogether external to I-language, whereas the core computational system may be close to uniform (Berwick & Chomsky 2016; but see Irurtzun this volume).18 EXT relates very different systems, a computational system constructing hierarchical expressions on the one hand and sequential production/perception systems on the other. While the computational system appears to have evolved recently and suddenly, the SM systems had at that point been in place for hundreds of thousands of years (see, e.g., Fitch 2010: chapter 8).19 Given that the linkage between these two systems is an inherently "messy" affair, EXT is a plausible source of linguistic variation—perhaps the only one.

Where does all of this leave us with regard to the question of evolvability? MERGE and the inventory of lexical atoms it operates over must be part of UG and as such represent evolutionary innovations specific to the human linguistic mind. What about AGREE and TRANSFER? We believe that while no firm conclusions can be drawn at this point, it is plausible that these operations are rooted in principles of efficient computation. Chomsky (2013, 2015) suggests that AGREE instantiates minimal search within the syntactic object, in which case its core properties (structure-dependence, minimality) would reduce to general properties of computation. With regard to TRANSFER and the interface mappings, we noted above that the mapping to PHON is necessarily complex, while the mapping to SEM may be near-trivial. A plausible speculation is that EXT and its variable properties reflect not UG specifications but rather the absence thereof, if the linkage established between the computational system proper and externalization systems was a problem that had to be solved subsequently to the evolution of I-language.

18. We say "close" because even a computationally minimal core syntax might permit a degree of variation when multiple derivational options are consistent with efficiency of computation. See Richards (2008) and Obata et al. (2015) for proposals along these lines.
19. See also Huybregts (2017) for relevant recent discussion (expanding on observations in Uriagereka 2012: 254) of the evolutionary relevance of areally isolated click phonemes.
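As a deliberately crude illustration of the head-directionality aspect of EXT discussed in sections 2 and 4 (a sketch of ours: it cheats by feeding ordered head-complement pairs, whereas genuine MERGE output is unordered, and real externalization additionally involves prosody, morphology, and much more):

    # One parameter of externalization: the same hierarchical object is
    # linearized head-first (English VO) or head-last (Japanese OV),
    # with no effect on interpretation.

    def linearize(so, head_first):
        if isinstance(so, tuple):             # (head, complement)
            head, comp = so
            h = linearize(head, head_first)
            c = linearize(comp, head_first)
            return h + c if head_first else c + h
        return [so]

    vp = ('scold', 'John')                    # cf. (1a) vs. its English twin
    print(linearize(vp, head_first=True))     # ['scold', 'John']   (VO)
    print(linearize(vp, head_first=False))    # ['John', 'scold']   (OV)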

5. Open Questions and Future Directions

In this section, we turn to a number of theoretical issues and outstanding questions that have emerged in recent work. While we will outline what seem to us to be plausible steps towards resolving these questions, our primary intention here is to highlight their relevance to future research in GG.

We begin by returning to the operation MERGE, which, despite its apparent simplicity, raises many questions. A narrow conception of MERGE permits only two logical options: binary EM and IM. Various further options have been proposed in the literature, such as Parallel Merge/Sideward Movement, a species of "multidominance" structures (Nunes 2004; Citko 2005), and countercyclic Late Merge (Lebeaux 1988; Fox 2002), which replaces a displaced object with a larger one. Are these options corollaries of the availability of simplest MERGE, as has sometimes been claimed, or do they require additional mechanisms, raising new evolvability problems? We believe that there are reasons for skepticism towards these extensions beyond a narrow conception of MERGE, which warrant further scrutiny in future research.

All syntactic objects in the lexicon and in the workspace WS are accessible to MERGE; there is no need for a SELECT operation (as in, e.g., Chomsky 1995). WS represents the stage of the derivation at any given point. The basic property of recursive generation requires that any object already generated be accessible to further operations. WS can contain multiple objects at a given stage, so as to permit formation of {XP,YP} structures (subject-predicate constructions) by EM. A derivation may, but need not, terminate whenever WS contains a single object; if it terminates in any other situation, no coherent interpretation can be assigned.

Beyond these fundamentals, many questions arise. For instance, does MERGE(X,Y) add {X,Y} to WS = [X,Y] (where X, Y are LIs or complex elements), yielding WS′ = [X,Y,{X,Y}]? Or does it rather replace X and Y in WS with {X,Y}, yielding WS′ = [{X,Y}] (as assumed in Chomsky 1995: 243)? The latter view is more restrictive, and arguably more in line with basic desiderata for optimal generation: the generative procedure constructs a single object to be mapped onto PHON and SEM, not a multiplicity of objects; and considerations of computational efficiency suggest that WS should be kept minimal throughout a derivation.20 The same conclusion is suggested by the fact that a workspace WS′ = [X,Y,{X,Y}] derived by MERGE(X,Y) would not ensure that subsequent operations can apply in a determinate fashion: any rule applying to X or Y would ambiguously refer to the individual objects X, Y or to the terms of K = {X,Y}.

20. A strong hypothesis about the generative procedure would be that operations never extend WS (i.e. increase the cardinality of elements contained in it). Except for the case where two elements taken from the lexicon are combined, EM and IM keep WS constant or reduce it. For related considerations (but very different conclusions), see De Vries (2009).
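The replacement option can be made concrete in a toy Python sketch (ours, purely expository): MERGE maps a workspace onto a new workspace, removing its root-level inputs and adding {X,Y}, so the cardinality of WS never grows.

    # Toy sketch of MERGE as a function from workspaces to workspaces:
    # WS = [X, Y] is mapped onto WS' = [{X, Y}], not [X, Y, {X, Y}].

    def ws_merge(ws, x, y):
        """Replace the MERGE-mates present in WS with {X, Y}. An item
        taken from the lexicon, or a term buried inside another object
        (Internal MERGE), is not a root object of WS and is kept."""
        new = list(ws)
        for so in (x, y):
            if so in new:
                new.remove(so)
        new.append(frozenset({x, y}))
        return new

    ws = ['read', 'books']
    ws = ws_merge(ws, ws[0], ws[1])    # EM: WS' = [{read, books}]
    ws = ws_merge(ws, ws[0], 'books')  # IM: 'books' is a term of WS's
                                       # single object, not a root item
    assert len(ws) == 1                # the workspace never grows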

Indeterminacy of rules in this sense is formally unproblematic and in fact a familiar property of phrase-structure grammars; but a sensible question to ask is whether it should be permitted in an optimal I-language at all, given that it raises various technical complications (for instance with regard to the distinction between copies and repetitions, to which we return below). If the answer is negative, we are led to a view of simplest MERGE as mapping WS = [X,Y] onto WS′ = [{X,Y}], reducing its complexity and avoiding indeterminate rule application. For further elaboration on this conception of MERGE as a function mapping workspaces onto workspaces, going back to Bobaljik (1995), see Collins & Stabler (2016) and Chomsky (this volume); for an alternative conception of derivations that does away with workspaces, see Collins (2017).

This restrictive view of MERGE, which seeks to curtail the complexity of WS, bars operations such as Parallel Merge (which establishes a ternary relation between the shared element X, its MERGE-mate Y, and the object Z containing Y) and Late Merge (which requires substitution of X by some more complex object; see Epstein et al. 2012).21 This leaves EM and IM as the only possible instantiations of simplest MERGE. We believe that future work should address these and other questions raised by the above considerations, in order to establish a restrictive "null theory" of the generative procedure that adheres to plausible—yet at present necessarily tentative—desiderata of computational efficiency.

Regardless of which implementation of recursive generation we adopt, a further central question is how a MERGE-based system can distinguish copies (created by IM) from repetitions of identical elements (created by EM), so that we correctly distinguish the two instances of the noun phrase the man in The man saw the man from those in the unaccusative construction The man arrived ⟨the man⟩. Suppose MERGE(K,W), where W is a term of K, creates Z. Z now contains two (or more) copies of W. But upon accessing Z, how do the external interpretive systems know whether multiple instances of W are copies of a single object or independent objects (repetitions of W)?22

Different answers to this question have been pursued, e.g. in terms of multidominance structures (Gärtner 2002) or an operation COPY that duplicates W prior to IM (Chomsky 1993; Nunes 2004). But complex graph-theoretic objects are not defined by simplest MERGE, and no COPY operation is necessary given that copies are simply a by-product of IM (on standard set-theoretic assumptions). Another possibility is that the system keeps track of how often the relevant object was assembled (or accessed in the lexicon) and communicates this information to the interfaces as part of TRANSFER (see Kobele 2006 and Hunter 2011 for related proposals). Along these lines, Chomsky (2007, 2012b) proposes that the distinction is established by the phasal nature of syntactic computation. At TRANSFER, phase-level memory suffices to determine whether a given pair of identical terms Y, Y′ was formed by IM.23

21. See Sportiche (2015) for an alternative treatment of the facts motivating Late Merge analyses in terms of "neglect" at the interface.
22. Earlier theories sidestepped the problem by assuming a rewriting of lower copies as distinct symbols (traces), linking these to their antecedent via coindexing, in radical violation of IC.

At TRANSFER, phase-level memory suffices to determine whether a given pair of identical terms Y, Y′ was formed by IM.23 If it was, then Y and Y′ are copies; if it was not (i.e., it was formed by EM), Y and Y′ are independent repetitions. This captures the basic intuition that if some syntactic object is introduced into the derivation “from the outside,” it is a distinct object; if it is added “from within,” it is a copy. Phases would then play the crucial role of limiting memory to the current cyclic domain (the principal desideratum of phase theory), preventing unbounded search and thus rendering the detection of repetitions vs. copies computationally feasible. For critical discussion of this approach, which remains to be formalized, as well as related proposals, see Collins & Groat (2018).

A further important question is whether objects constructed by MERGE are necessarily endocentric and identified by a determinate label, as in earlier phrase-structural models incorporating X-bar Theory. The assumption of universal endocentricity carried over to the Bare Phrase Structure model of Chomsky (1995), where MERGE(X,Y) is taken to yield a labeled object {L,{X,Y}}, L ∈ {X,Y}. But this is a departure from simplest MERGE, rooted in the intuitive appeal and pedagogical convenience of tree notation. In its simplest form, MERGE has no “built-in” projection mechanism, hence does not yield labeled objects (Chomsky 2013, 2015; Collins 2017). Unlike displacement and linear order, projection is not an empirically detectable property of linguistic expressions but a theory-internal concept. Encoding a label as part of the object constructed by MERGE raises various non-trivial questions (Seely 2006)—for instance, why can the label not undergo head movement on its own, or be pronounced? These problems vanish if labels qua syntactic objects do not exist, but the question of endocentricity remains in a different form: is it relevant to the syntactic derivation and/or to the interfacing systems?

Chomsky (2013) argues that the answer to this question is positive, and that an algorithm LABEL is required to supplement MERGE. For some syntactic object K, LABEL(K) locates within K the first element where search “bottoms out”: the structurally most prominent lexical item. LABEL is thus not an entirely new operation, but, like AGREE, an instantiation of minimal search. For K = {H,XP}, where H is an LI and XP a complex object, H will be chosen as the label. The first step in a derivation necessarily relates two atomic objects, yielding K = {H,R}. What is the label of K in this case? If R is a feature-less root, as assumed by many contemporary approaches, it is plausibly ignored by LABEL, and H will be correctly chosen as the label of K. On this conception, LABEL locates a feature of H, which renders the traditional notion of “head” irrelevant for labeling purposes. This approach to labeling raises intricate questions about the nature of lexical items (and the distribution of their properties across components, as assumed by models such as Distributed Morphology), which we set aside here.
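As a toy rendering of LABEL qua minimal search (again our own sketch over the frozenset representation used above, not a formalization endorsed by the text), the algorithm inspects the members of K and returns the lexical item that minimal search finds first, failing precisely on the symmetric configurations discussed below:

```python
# Sketch (illustrative only) of LABEL as minimal search: for K = {H, XP}
# with H a lexical item, search finds H at depth one; for {XP, YP} or
# {H, H'} no unique label is detected and labeling fails.

def LABEL(k):
    if not isinstance(k, frozenset):
        return k                      # a lexical item labels itself
    heads = [so for so in k if not isinstance(so, frozenset)]
    if len(heads) == 1:
        return heads[0]               # {H, XP}: H detected at depth one
    raise ValueError("symmetric {XP, YP} or {H, H'}: no unique label; "
                     "symmetry must be broken, e.g. by raising one member")

K = frozenset({"saw", frozenset({"the", "man"})})
LABEL(K)                              # -> 'saw'
```

The failure case corresponds to the {XP,YP} configurations discussed immediately below, where displacement of one member is argued to resolve the symmetry.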

23. Identity must take features into account, so that, for instance, in a double-object construction with two identical objects (The king sold a slave a slave), an object NP raised to the phase edge can be correctly associated with its lower copy. The distinction is trivial if the NPs are distinguished by structural vs. inherent case-marking.

X-bar-theoretic universal endocentricity has conceptually and empirically questionable consequences. To begin with, it is trivially falsified by every case of IM, which yields an unlabelable {XP,YP} configuration (putting aside head movement). Another case in point is the DP hypothesis, a corollary of X-bar Theory. Bruening (2009) shows that while selection by a higher verb clearly targets C (the head of the clause), there is no selection for D (only for properties of N, such as number); and unlike C, D is not universal. The challenge, then, is to accommodate D-type elements while retaining the nominal character of the overall phrase. One possibility, suggested in Chomsky (2007) and developed by Oishi (2015), is that nominals are headed by a nominalizer n, analogous to v as the head of the verb phrase, with D, where present, occupying some lower position. Another is that determiners are in fact internally complex elements, as suggested by their morphology in many languages; see, e.g., Leu (2015).

If K = {X,Y} and neither X nor Y is a lexical item (e.g., when X is a “specifier” in earlier terminology), no head is detected by LABEL. Building on Moro (2000), Chomsky (2013) argues that this situation can motivate displacement of X: if X merges (internally) to some object W containing K, K will no longer contain X (X being the set of its occurrences), and consequently Y will act as the label of K. Chomsky suggests that W and X must share a feature if the resulting configuration is to be “stable,” an idea that Chomsky (2015) extends to EPP and ECP effects (see also Rizzi 2015). Such feature sharing is involved in subject/object raising, for instance, where the raising XP enters into an AGREE relation with the head it raises to (T/v*, respectively; see Gallego 2017 for an alternative, and Epstein et al. 2016b for further discussion).

Again building on Moro’s work, Ott (2012), Chomsky (2013, 2015), and Blümel (2017) argue that the need to break the symmetry of {XP,YP} configurations (motivated by LABEL) can drive displacement of XP, yielding phenomena such as successive-cyclic movement, raising to object, and others. Such proposals assume that MERGE applies freely; but derivations in which relevant applications fail to apply will not yield the required outcome. Plausibly, efficiency of computation precludes “superfluous” applications of MERGE that have no effect on the eventual output (such as string-vacuous IM with no effect on interpretation, which would entail massive structural ambiguity of any given sentence). For proposals along these lines and relevant evidence, see e.g. Fox (2000), Chomsky (2001, 2008), Reinhart (2006), Struckmeier (2016). Note that unlike classical X-bar Theory, a LABEL-based system allows for the possibility that a constructed object K remains unlabeled (exocentric), e.g. when K is a root clause or created by operations that are not head-oriented in any plausible sense, such as syntactic scrambling.
Further illumination of these issues will require a theory that answers the question of where detectable endocentricity is required: in the syntactic derivation (e.g., for purposes of interpreting local selectional relations), at the interfaces (e.g., for the computation of prosody), both, or not at all (Collins 2017)? These questions remain open for now and are in urgent need of clarification.

A further important research question is whether structure-building mechanisms beyond simplest MERGE are necessary, such as Chomsky’s (2004) PAIR-MERGE for adjuncts and De Vries’s (2012) PAR-MERGE for parenthetical expressions.

Adjuncts and parentheticals have distinct properties, among them strong opacity for extraction. Thus, while (21) is ambiguous between a complementation and an adjunction structure, (22) is unambiguous, since only the former permits IM of the wh-phrase. And while an NP such as a book about NP readily permits wh-extraction of NP (23), an analogous extraction from a corresponding parenthetical appositive NP yields no coherent interpretation (24).

(21) John decided on the boat.

(22) What did John decide on _?

(23) What did John read a book about _?

(24) *What did John read something, a book about _?

Chomsky (2004) proposes that adjunction is the result of an operation PAIR-MERGE, which yields asymmetric (ordered) pairs rather than symmetric (unordered) sets, permitting the identification of an adjunct in a phrase-modifier configuration. PAIR-MERGE may also be required for unstructured coordination (as in John is tall, happy, hungry, bored with TV, etc.), a construction that was recognized as problematic in the earliest work in GG, due to the apparent absence of internal hierarchical organization: even unrestricted rewriting systems cannot generate these expressions, nor can transformations (see Lasnik & Uriagereka 2012 for a critical review of some proposals in Chomsky & Miller 1963).24 PAIR-MERGE is a formally distinct operation from simplest MERGE, hence raises problems of evolvability. Ideally, it could be shown to be dispensable. We do not take up the challenge here; for some suggestive work on adjunction that does not invoke special operations (but at the cost of introducing other stipulations), see Hunter (2015).

As for parenthesis, it seems to us that the only principled approach consistent with evolvability considerations relegates the phenomenon entirely to discourse pragmatics, obviating the need to enrich UG with special operations. On this view, parenthetical expressions (which are frequently elliptical) are generated independently and interpolated or juxtaposed only in production (see Ott & Onea 2015; Ott 2016a,b).

Traditionally, adjunction is also assumed to be involved in head movement (HM),25 but such an approach has several unwelcome consequences (Chomsky 2015: 12ff.; also Carstens et al. 2016). HM violates principles of minimal computation and cannot be implemented by simplest MERGE, given its countercyclic character. It also typically lacks semantic effects, at least for the core cases of verb raising.
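Returning to the formal contrast drawn above, the difference between simplest MERGE and PAIR-MERGE can be displayed in two lines (a schematic illustration of our own; the names and representation are assumptions): set formation is symmetric, pair formation is not, and it is exactly this asymmetry that identifies the adjunct.

```python
# Sketch (illustrative only): simplest MERGE builds symmetric (unordered)
# sets, PAIR-MERGE builds asymmetric (ordered) pairs, marking one member
# as the adjunct of the other.

def MERGE(x, y):
    return frozenset({x, y})          # {X, Y} == {Y, X}

def PAIR_MERGE(adjunct, host):
    return (adjunct, host)            # <alpha, beta> != <beta, alpha>

assert MERGE("read", "quickly") == MERGE("quickly", "read")
assert PAIR_MERGE("quickly", "read") != PAIR_MERGE("read", "quickly")
```

Nothing in the symmetric set by itself distinguishes host from modifier, which is why an extra, formally distinct operation is invoked, and why the text flags its evolvability cost.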

24. A possible analysis of unstructured coordination that avoids these problems could take each AP in the above example to be an elliptical ‘afterthought’ expression in the sense of Ott & De Vries (2016), Ott (2016). This would capture the central properties of the construction: infinite iterability and individual predication of each AP of the subject. For reasons of space, we cannot explore this idea further here.
25. See Epstein et al. (2016a) on PAIR-MERGE as a mechanism for affixation.

This vacuity and the fact that the configurations standardly described in terms of HM are highly variable across languages suggest that at least some instances of HM might fall within the mapping to PHON (as suggested in Chomsky 2001 and supported by specific arguments in Zwart 2017 and elsewhere), although there are interesting arguments to the contrary (e.g., Roberts 2010).26,27 Other cases might reduce to core-syntactic IM, in line with proposals in Toyoshima (2000) and Matushansky (2006). We believe that a fresh take on the relevant phenomena is needed, based on the recognition that traditional implementations of HM are in fact problems restated in technical terms rather than solutions.

We noted above that simplest MERGE applies freely, and that features which are not introduced into the derivation by LIs, such as those pertaining to informational functions of XPs, violate IC. “Cartographic” analyses, where such features take center stage as the driving force behind displacements to the peripheries, are essentially construction-based approaches, with the notion “construction” recast in terms of features and phrase-structure rules generating cascades of projections. But informational notions such as “topic” or “focus,” like grammatical functions or thematic roles, are properties of configurations and their syntactic/discursive context, not of individual syntactic objects (Chomsky 1965; Hale & Keyser 1993); consequently, they should neither be represented in the lexicon, nor in the narrow syntactic derivation (cf. Uriagereka 2003; Fortuny 2008; López 2009; Gallego 2013a, in progress).

The Cartographic Program pursued by Cinque, Rizzi and many others has revealed remarkable facts and generalizations, such as Cinque’s (1999) hierarchy of adverbial positions and Rizzi’s (1997) structure for the left periphery. But the postulated structures raise serious problems, as acknowledged by Cinque & Rizzi (2010: 63). As we observed above, any linguistic theory must minimally meet the conditions of acquirability and evolvability. UG must permit acquisition of I-language, and it must have evolved in the human lineage—and if current best guesses are correct, it must have evolved recently.

26. For a different, syntactic approach to HM, see Chomsky (2015). Core-syntactic HM is presupposed by many approaches to diverse phenomena, such as Donati’s (2006) analysis of free relatives, where the wh-element is analyzed as a D head that determines the label of the embedded clause after IM. See Ott (2011) for an alternative that is consistent with a non-syntactic conception of HM, but relies on specific assumptions concerning the interaction of TRANSFER and LABEL. 27. An interesting challenge to the idea that HM could be relegated to EXT is provided by Spanish VOS constructions, which suggest that verb movement can resolve minimality conflicts (see discussion around (14)-(16) above). Consider (i) below, where the internal argument cada coche ‘each car’ has moved to a position at the vP edge, from where it c-commands the vP-internal external argument su propietario ‘its owner,’ enabling a bound-variable interpretation of the subject-internal pronoun.

(i) Recogió [vP cada coche [vP su propietario (v) [ ]]] (Spanish)
    picked-up each car its owner
    ‘Each car was picked up by its owner.’ (lit.: ‘Its owner picked up each car.’)
What is surprising is that this configuration does not preclude AGREE between C-T and the external argument (as it should under a conception of minimality without the notion of equidistance: Chomsky 1993, 2000). The facts are discussed in Gallego (2010, 2013b), where it is argued that nominative Case assignment to the in situ subject in such cases is parasitic on verb movement. If HM were merely a phonological operation, its apparent role in licensing Probe-Goal dependencies would be unexpected.

The cascades of projections postulated for various areas of clause structure cannot possibly be learned: there is no conceivable evidence that a child could rely on to infer these hierarchical sequences from experience. But attributing complex functional hierarchies to UG raises an evolutionary puzzle: it seems virtually unimaginable that the complex cartographic templates could have evolved as irreducible properties of UG. The conclusion is that cartographic sequences of positions are problems, not solutions. As aptly discussed by Rizzi (2013b), the challenge is to derive the descriptive generalizations from more elementary principles that are motivated independently.

There is some promising work in this direction, such as Ernst’s (2002) non-templatic analysis of adjunct ordering that derives Cinque’s universal template from interpretive properties of adverbial expressions, rendering a “hard-coded” functional sequence obsolete. Developing alternatives to templatic approaches to the clausal peripheries will require, we believe, a re-evaluation of the extent to which the superficial complexity of “sentences” in fact reflects amalgamation of independent expressions in discourse, rather than syntactic composition. In contrast to Cinque’s (1983) early work on “topic constructions,” the cartographic tradition assumes that all sorts of peripheral elements, including left- and right-dislocated constituents, are structurally integrated into the clause structure. As a result, the puzzling properties of dislocated elements that distinguish them from displaced constituents (such as wh-phrases) are merely restated, not explained, including their universal extra-peripheral ordering. An alternative, developed in Ott (2014, 2016b, 2017a), denies the reality of structurally complex peripheries by analyzing dislocated elements, unlike fronted or extraposed XPs, as structurally independent elliptical expressions that are interpretively related to their host clauses by principles of discourse organization and cross-sentential anaphora. On this alternative approach, cartography’s peripheral functional sequence remains only as an artifact of description.

We adumbrated above the idea that the core computation yields hierarchically structured, language-invariant expressions (entering into “thought” processes of various kinds at the interface with C-I systems) whereas the mapping that feeds externalization-related SM systems is necessarily more involved and indirect. This asymmetry between the two interfaces leads Chomsky (2014) to adopt the following hypothesis:

(25) I-language is optimized relative to the C-I interface alone, with EXT ancillary.

“Optimized” here refers to the kinds of considerations introduced above: relying only on simplest MERGE and no more complex operations. As we pointed out, this strong thesis is consistent with the general fact that operations of I-language operate over structures, not strings (with concomitant beneficial implications for language acquisition), and that structured objects provide the input to compositional interpretation. At the same time, challenges for (25) emerge from recent work suggesting a rather direct involvement of morphophonological factors in the syntactic computation. Richards (2016) develops an elaborate theoretical framework in which the articulation systems impose universal constraints that, in conjunction with independent language-specific differences, can account for central aspects of cross-linguistic variation (see also Mathieu 2016 for a related proposal). In this model, metrical requirements of affixes and other conditions imposed by PHON can effect the application of MERGE and other operations.28 Given the impressive results achieved by Richards’ system, his work poses an interesting challenge to the hypothesis that EXT is an ancillary process. The same is true for recent work arguing for the relevance of linear order to various syntactic and semantic processes (Kayne 2011, 2018; Barker 2012; Bruening 2014; Willer Gold et al. 2018), contrary to our suggestions above. If and how these challenges can be reconciled with (25) is an important topic for future research.29

As noted above, a related open question pertaining to the overall organization of the system is whether the narrow-syntactic computation includes an operation AGREE in addition to MERGE, or whether featural interactions are restricted to EXT. The former view is based on the assumption that AGREE mediates assignment of structural Case and serves to eliminate semantically redundant φ-features from the syntactic object, as required by a particularly strong version of the Full Interpretation principle (Chomsky 2000a et seq., building on observations of Vergnaud 1977[2006] and George & Kornfilt 1981). Another possibility is that case is a purely morphological phenomenon (Marantz 1991; McFadden 2004), and that uninterpretable features are simply neglected at the C-I interface (in the spirit of Sportiche 2015). The latter scenario is consistent with relegating AGREE to EXT, where it would then serve the sole purpose of determining the morphological form of initially underspecified inflectional elements (cf. Bobaljik 2008, and Preminger 2014 for an opposing view; also Landau 2016 for an argument from Control). Also in view of the cross-linguistically highly variable expression of inflection, AGREE seems to fit rather naturally with other operations pertaining to EXT. We believe that there are interesting arguments in either direction and leave the matter here as an important topic for future research.

These and many other issues concerning the overall architecture of the computational system(s) underlying human linguistic capacity remain to be adequately addressed and explored. The mere fact that they can be coherently stated testifies to the progress GG has made over the years, providing ample fertile ground for further stimulating research.
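To fix ideas about the AGREE operation whose locus (narrow syntax vs. EXT) is left open above, here is a schematic probe-goal valuation (our own sketch; the dictionary-based feature representation and all names are illustrative assumptions, neutral between the two views discussed):

```python
# Sketch (illustrative only): AGREE as minimal search from a probe into
# its sister. Lexical items are dicts of features; complex objects are
# lists of syntactic objects.
from collections import deque

def find_goal(so, feature):
    """Breadth-first (minimal) search for the closest lexical item
    bearing a valued instance of `feature`."""
    queue = deque([so])
    while queue:
        current = queue.popleft()
        if isinstance(current, dict):
            if current.get(feature) is not None:
                return current
        else:
            queue.extend(current)
    return None

def AGREE(probe, sister, feature="phi"):
    goal = find_goal(sister, feature)
    if goal is not None:
        probe[feature] = goal[feature]   # valuation of the unvalued probe
    return probe

T = {"cat": "T", "phi": None}                                  # unvalued probe
vP = [{"cat": "D", "phi": "3sg"}, [{"cat": "v"}, {"cat": "V"}]]
AGREE(T, vP)                                                   # T's phi -> '3sg'
```

Whether valuation of this sort applies in the narrow syntax or only in the mapping to morphological form is precisely the question the text leaves for future research.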

28. Richards explicitly discusses instances of derivational opacity, where phonological factors trigger movements whose effects are later undone by subsequent operations. This entails that the morphophonology in his model cannot simply act as an output filter, but must be directly involved in the narrow-syntactic derivation.
29. Kayne (2018) presents a series of arguments for the inclusion of linear order in core-syntactic operations, proposing an operation ip-merge that yields an ordered pair expressing a relation of immediate precedence. Kayne argues furthermore that the operation is constrained such as to only construct LCA-compliant syntactic objects (in the sense of Kayne 1994). This logic strikes us as inconsistent: where LCA or some similar principle determines order, it is wholly redundant to impose order independently in narrow syntax. Kayne’s empirical arguments also strike us as unconvincing, as they appear to pertain primarily to pragmatics/discourse organization and production/processing, hence EXT. For reasons of space, however, we have to leave a proper discussion of Kayne’s arguments to another occasion.

6. Conclusion

Even within the expressly narrow focus of GG on linguistic competence, virtually every aspect of (I-)language remains a problem. Nevertheless, significant progress has been made since the 1950s, and in recent years the establishment of a minimal formal toolkit meeting basic desiderata of explanatory and evolutionary adequacy has become a feasible goal. As always, it remains to be seen to what extent such a toolkit can be reconciled with the empirical challenges and puzzles that inevitably arise wherever we look. As documented above, an approach based on the operation MERGE raises new problems on its own, both empirical and conceptual. In fact, in many cases it remains to be determined where to even look for solutions, e.g. when we ask whether heavy-NP shift falls within the MERGE-based system of core computation or is part of externalization. In our view, this conclusion makes the challenges ahead no less exciting, but should rather fuel our appreciation of the fascinating research questions that present themselves once we approach human language as an object of the natural world.

References

Abels, K. 2003. Successive Cyclicity, Anti-locality, and Adposition Stranding. PhD dissertation, University of Connecticut.
Alexiadou, A., E. Anagnostopoulou & S. Wurmbrand. 2014. Movement vs. Long-distance Agree in Raising: Disappearing Phases and Feature Valuation. In H.-L. Huang, E. Poole & A. Rysling (eds.). Proceedings of NELS 43, 1-12. Amherst, MA: GLSA.
Anderson, S.R. 2004. Doctor Dolittle’s Delusion. New Haven, CT: Yale University Press.
Arregi, K. & A. Nevins. 2012. Morphotactics. Dordrecht: Springer.
Arsenijević, B. & W. Hinzen. 2012. On the Absence of X-within-X Recursion in Human Grammar. Linguistic Inquiry 43: 423-440.
Barker, C. 2012. Quantificational Binding does not Require C-command. Linguistic Inquiry 43: 614-633.
Berwick, R.C. & N. Chomsky. 2011. The Biolinguistic Program: The Current State of its Development. In A.M. Di Sciullo & C. Boeckx (eds.). Biolinguistic Investigations, 19-41. Oxford: Oxford University Press.
Berwick, R.C. & N. Chomsky. 2016. Why Only Us. Cambridge, MA: MIT Press.
Berwick, R.C., P. Pietroski, B. Yankama & N. Chomsky. 2011. Poverty of the Stimulus Revisited. Cognitive Science 35: 1207-1242.
Berwick, R., A. Friederici, N. Chomsky & J. Bolhuis. 2013. Evolution, Brain, and the Nature of Language. Trends in Cognitive Sciences 17: 89-98.
Biberauer, T., A. Holmberg, I. Roberts & M. Sheehan. 2014. Complexity in Comparative Syntax: The View from Modern Parametric Theory. In F. Newmeyer & L. Preston (eds.). Measuring Grammatical Complexity, 103-127. Oxford: Oxford University Press.
Bobaljik, J.D. 1995. In Terms of Merge: Copy and Head Movement. In R. Pensalfini & H. Ura (eds.). Papers on Minimalist Syntax (= MIT Working Papers in Linguistics 27). Cambridge, MA: MITWPL.

Bobaljik, J.D. 2008. Where’s Phi? Agreement as a Postsyntactic Operation. In D. Adger, D. Harbour & S. Béjar (eds.). Phi Theory: Phi-Features Across Interfaces and Modules, 295-328. Oxford: Oxford University Press.
Bolhuis, J., I. Tattersall, N. Chomsky & R.C. Berwick. 2014. How Could Language Have Evolved? PLoS Biology 12: e1001934.
Blümel, A. 2017. Symmetry, Shared Labels and Movement in Syntax. Berlin: De Gruyter.
Boeckx, C. 2007. Understanding Minimalist Syntax: Lessons from Locality in Long-Distance Dependencies. Oxford: Blackwell.
Borer, H. 2005. Structuring Sense (2 volumes). Oxford: Oxford University Press.
Bošković, Ž. 2014. Now I’m a Phase, Now I’m Not a Phase: On the Variability of Phases with Extraction and Ellipsis. Linguistic Inquiry 45: 27-89.
Bruening, B. 2009. Selectional Asymmetries between CP and DP Suggest that the DP Hypothesis is Wrong. U. Penn Working Papers in Linguistics 15.1: 26-35.
Bruening, B. 2014. Precede-and-Command Revisited. Language 90: 342-388.
Carstens, V., N. Hornstein & T.D. Seely. 2016. Head-Head Relations in ‘Problems of Projection’. The Linguistic Review 33: 67-86.
Chomsky, N. 1956[1975]. The Logical Structure of Linguistic Theory. Mimeograph, Harvard University and MIT. Partial revised version published by Plenum Press, 1975.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N. 1986. Knowledge of Language. New York: Praeger.
Chomsky, N. 1993. A Minimalist Program for Linguistic Theory. In K. Hale & S.J. Keyser (eds.). The View from Building 20, 1-52. Cambridge, MA: MIT Press.
Chomsky, N. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. 2000a. Minimalist Inquiries: The Framework. In R. Martin, D. Michaels, J. Uriagereka & S.J. Keyser (eds.). Step by Step, 89-155. Cambridge, MA: MIT Press.
Chomsky, N. 2000b. New Horizons in the Study of Language and Mind. Cambridge: Cambridge University Press.
Chomsky, N. 2001. Derivation by Phase. In M. Kenstowicz (ed.). Ken Hale: A Life in Language, 1-52. Cambridge, MA: MIT Press.
Chomsky, N. 2004. Beyond Explanatory Adequacy. In A. Belletti (ed.). Structures and Beyond, 104-131. Oxford: Oxford University Press.
Chomsky, N. 2007. Approaching UG from Below. In U. Sauerland & H.-M. Gärtner (eds.). Interfaces + Recursion = Language?, 1-30. Berlin: De Gruyter.
Chomsky, N. 2008. On Phases. In R. Freidin, C.P. Otero & M.L. Zubizarreta (eds.). Foundational Issues in Linguistic Theory, 134-166. Cambridge, MA: MIT Press.
Chomsky, N. 2012a. Some Simple Evo-devo Theses: How True Might They Be for Language? In R.K. Larson, V. Déprez & H. Yamakido (eds.). The Evolution of Human Language: Biolinguistic Perspectives, 45-62. Cambridge: Cambridge University Press.
Chomsky, N. 2012b. Foreword. In Á.J. Gallego (ed.). Phases, 1-7. Berlin: De Gruyter.
Chomsky, N. 2013. Problems of Projection. Lingua 130: 33-49.

Chomsky, N. 2014. Minimal Recursion: Exploring the Prospects. In T. Roeper & M. Speas (eds.). Recursion: Complexity in Cognition, 1-15. Berlin: Springer.
Chomsky, N. 2015. Problems of Projection: Extensions. In E. Di Domenico, C. Hamann & S. Matteini (eds.). Structures, Strategies and Beyond, 1-16. Amsterdam: John Benjamins.
Chomsky, N. 2017. The Language Capacity: Architecture and Evolution. Psychonomic Bulletin and Review 24: 200-203.
Chomsky, N. 2019. Some Puzzling Foundational Issues: The Reading Program. Catalan Journal of Linguistics Special Issue: 263-285.
Chomsky, N. In press. Puzzles About Phases. In L. Franco & P. Lorusso (eds.). Linguistic Variation: Structure and Interpretation. Berlin: De Gruyter.
Chomsky, N. & G. Miller. 1963. Introduction to the Formal Analysis of Natural Languages. In R.D. Luce, R.R. Bush & E. Galanter (eds.). Handbook of Mathematical Psychology, vol. II, 269-321. New York: John Wiley.
Chomsky, N. & H. Lasnik. 1993. The Theory of Principles and Parameters. In J. Jacobs, A. von Stechow, W. Sternefeld & T. Vennemann (eds.). Syntax: An International Handbook of Contemporary Research, 506-569. Berlin: De Gruyter.
Cinque, G. 1983. ‘Topic’ Constructions in Some European Languages and ‘Connectedness’. In K. Ehlich & H. van Riemsdijk (eds.). Connectedness in Sentence, Discourse and Text. Tilburg: Katholieke Hogeschool.
Cinque, G. 1999. Adverbs and Functional Heads. Oxford: Oxford University Press.
Cinque, G. & L. Rizzi. 2010. The Cartography of Syntactic Structures. In B. Heine & H. Narrog (eds.). The Oxford Handbook of Linguistic Analysis, 51-65. Oxford: Oxford University Press.
Citko, B. 2005. On the Nature of Merge: External Merge, Internal Merge, and Parallel Merge. Linguistic Inquiry 36: 475-497.
Collins, C. 1997. Local Economy. Cambridge, MA: MIT Press.
Collins, C. 2017. Merge(X,Y) = {X,Y}. In L.S. Bauke & A. Blümel (eds.). Labels and Roots, 47-68. Berlin: De Gruyter.
Collins, C. & E. Groat. 2018. Copies and Repetitions. Ms., NYU.
Collins, C. & E. Stabler. 2016. A Formalization of Minimalist Syntax. Syntax 19: 43-78.
Crain, S. & M. Nakayama. 1987. Structure-dependence in Grammar Formation. Language 63: 522-543.
Crain, S. & R. Thornton. 1998. Investigations in Universal Grammar. Cambridge, MA: MIT Press.
Crain, S. & R. Thornton. 2012. Syntax Acquisition. WIREs Cognitive Science 3: 185-203.
Crain, S., L. Koring & R. Thornton. 2017. Language Acquisition from a Biolinguistic Perspective. Neuroscience and Biobehavioral Reviews 81(B): 120-149.
D’Alessandro, R., S. Fischer & G.H. Hrafnbjargarson. 2008. Agreement Restrictions. Berlin: De Gruyter.
Den Dikken, M. 2007. Phase Extension: Contours of a Theory of the Role of Head Movement in Phrasal Extraction. Theoretical Linguistics 33: 1-41.
Donati, C. 2006. On Wh-head Movement. In L. Cheng & N. Corver (eds.). Wh-Movement: Moving On, 21-46. Cambridge, MA: MIT Press.

Eguren, L., O. Fernández-Soriano & A. Mendikoetxea (eds.). 2016. Rethinking Parameters. Oxford: Oxford University Press.
Epstein, S.D., H. Kitahara & T.D. Seely. 2012. Structure Building That Can’t Be. In M. Uribe-Etxebarria & V. Valmala (eds.). Ways of Structure Building, 253-270. Oxford: Oxford University Press.
Epstein, S.D., H. Kitahara & T.D. Seely. 2014. Labeling by Minimal Search: Implications for Successive-cyclic A-Movement and the Conception of the Postulate ‘Phase’. Linguistic Inquiry 45: 463-481.
Epstein, S.D., H. Kitahara & T.D. Seely. 2016a. Phase Cancellation by External Pair-Merge of Heads. The Linguistic Review 33: 87-102.
Epstein, S.D., H. Kitahara & T.D. Seely. 2016b. What Do We Wonder is Not Syntactic? In S. Epstein, H. Kitahara & T.D. Seely (eds.). Explorations in Maximizing Syntactic Minimization, 222-239. New York: Routledge.
Ernst, T. 2002. The Syntax of Adjuncts. Cambridge: Cambridge University Press.
Everaert, M., M. Huybregts, N. Chomsky, R. Berwick & J. Bolhuis. 2015. Structures, Not Strings: Linguistics as Part of the Cognitive Sciences. Trends in Cognitive Sciences 19: 729-743.
Fanselow, G. 2006. On Pure Syntax (Uncontaminated by Information Structure). In P. Brandt & E. Fuß (eds.). Form, Structure, and Grammar, 137-157. Berlin: Akademie Verlag.
Feldman, H., S. Goldin-Meadow & L. Gleitman. 1978. Beyond Herodotus: The Creation of Language by Linguistically Deprived Deaf Children. In A. Lock (ed.). Action, Gesture, and Symbol. London: Academic Press.
Fitch, W.T. 2010. The Evolution of Language. Cambridge: Cambridge University Press.
Fortuny, J. 2008. The Emergence of Order in Syntax. Amsterdam: John Benjamins.
Fox, D. 2000. Economy and Semantic Interpretation. Cambridge, MA: MIT Press.
Fox, D. 2002. Antecedent-contained Deletion and the Copy Theory of Movement. Linguistic Inquiry 33: 63-96.
Frampton, J. & S. Gutmann. 2002. Crash-Proof Syntax. In S.D. Epstein & T.D. Seely (eds.). Explanation and Derivation in the Minimalist Program, 90-105. Oxford: Blackwell.
Friederici, A.D., N. Chomsky, R.C. Berwick, A. Moro & J.J. Bolhuis. 2017. Language, Mind and Brain. Nature Human Behaviour 1: 713-722.
Fukui, N. & H. Narita. 2014. Merge, Labeling, and Projection. In A. Carnie, D. Siddiqi & Y. Sato (eds.). The Routledge Handbook of Syntax, 3-23. New York: Routledge.
Gallego, Á.J. 2010a. Phase Theory. Amsterdam: John Benjamins.
Gallego, Á.J. 2010b. Binding Through Agree. Linguistic Analysis 34: 163-192.
Gallego, Á.J. (ed.). 2012. Phases. Berlin: De Gruyter.
Gallego, Á.J. 2013a. A Configurational Approach to the Left Periphery. Paper presented at the 23rd Colloquium on Generative Grammar, Universidad Complutense de Madrid, May 9-11.
Gallego, Á.J. 2013b. Object Shift in Romance. Natural Language and Linguistic Theory 31: 409-451.
Gallego, Á.J. 2014. Deriving Feature Inheritance from the Copy Theory of Movement. The Linguistic Review 31: 41-71.
Gallego, Á.J. 2017. The EPP in Labeling Theory: Evidence from Romance. Syntax 20: 384-399.

Gallego, Á.J. 2020. Strong and Weak ‘Strict Cyclicity’ in Phase Theory. In A. Bárány et al. (eds.). Syntactic Architecture and its Consequences. Language Science Press.
Gallego, Á.J. In progress. A -based approach to Information Structure. Ms., Universitat Autònoma de Barcelona.
Gärtner, H.-M. 2002. Generalized Transformations and Beyond. Berlin: Akademie Verlag.
George, L. & J. Kornfilt. 1981. Finiteness and Boundedness in Turkish. In F. Heny (ed.). Binding and Filtering, 105-127. Cambridge, MA: MIT Press.
Hale, K. 1983. Warlpiri and the Grammar of Non-configurational Languages. Natural Language and Linguistic Theory 1: 5-47.
Hale, K. & S.J. Keyser. 1993. On Argument Structure and the Lexical Expression of Syntactic Relations. In K. Hale & S.J. Keyser (eds.). The View from Building 20, 53-109. Cambridge, MA: MIT Press.
Hale, K. & S.J. Keyser. 1999. A Response to Fodor and Lepore, ‘Impossible Words’. Linguistic Inquiry 30: 453-466.
Harbour, D., D. Adger & S. Béjar (eds.). 2008. Phi Theory. Oxford: Oxford University Press.
Hasegawa, H. 2005. Reflexive Binding as Agreement and its Interaction With the Phase System. In N. Imanashi (ed.). The World of Linguistic Research, 53-69. Tokyo: Kaitakusha.
Hauser, M.D., N. Chomsky & W.T. Fitch. 2002. The Faculty of Language: What is It, Who Has It, and How Did It Evolve? Science 298: 1569-1579.
Hiraiwa, K. 2005. Dimensions of Symmetry in Syntax: Agreement and Clausal Architecture. PhD dissertation, MIT.
Holmberg, A. 2016. The Syntax of Yes and No. Oxford: Oxford University Press.
Hunter, T. 2011. Insertion Minimalist Grammars: Eliminating Redundancies Between Merge and Move. In M. Kanazawa, A. Kornai, M. Kracht & H. Seki (eds.). The Mathematics of Language, 90-107. Berlin: Springer.
Hunter, T. 2015. Deconstructing Merge and Move to Make Room for Adjunction. Syntax 18: 266-319.
Huybregts, R. 2017. Phonemic Clicks and the Mapping Asymmetry: How Language Emerged and Speech Developed. Neuroscience and Biobehavioral Reviews 81(B): 279-294.
Idsardi, W. & E. Raimy. 2013. Three Types of Linearization and the Temporal Aspects of Speech. In I. Roberts & M.T. Biberauer (eds.). Challenges to Linearization, 31-56. Berlin: De Gruyter.
Kayne, R.S. 1984. Connectedness and Binary Branching. Dordrecht: Foris.
Kayne, R.S. 1994. The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kayne, R.S. 2011. Why Are There No Directionality Parameters? In Proceedings of WCCFL 28, 1-23. Somerville, MA: Cascadilla Proceedings Project.
Kayne, R.S. 2013. Comparative Syntax. Lingua 130: 132-151.
Kayne, R.S. 2018. The Place of Linear Order in the Language Faculty. Ms., NYU.
Kegl, J., A. Senghas & M. Coppola. 1999. Creation Through Contact: Sign Language Emergence and Sign Language Change in Nicaragua. In M. De Graff (ed.). Language Creation and Language Change. Cambridge, MA: MIT Press.

Kobele, G. 2006. Generating Copies. PhD dissertation, University of California, Los Angeles.
Landau, I. 2016. Agreement at PF: An Argument from Partial Control. Syntax 19: 79-109.
Larson, B. 2018. Right-node Raising and Ungrammaticality. Studia Linguistica 72: 214-260.
Lasnik, H. & J. Uriagereka. 2012. Structure. In R. Kempson, T. Fernando & N. Asher (eds.). Handbook of Philosophy of Science vol. 14: Philosophy of Linguistics, 33-61. Oxford: Elsevier.
Lebeaux, D. 1988. Language Acquisition and the Form of the Grammar. PhD dissertation, University of Massachusetts, Amherst.
Legate, J. 2002. Warlpiri: Theoretical Implications. PhD dissertation, MIT.
Legate, J. 2008. Morphological and Abstract Case. Linguistic Inquiry 39: 55-101.
Lenneberg, E. 1967. Biological Foundations of Language. New York: John Wiley.
Leu, T. 2015. The Architecture of Determiners. Oxford: Oxford University Press.
López, L. 2007. Locality and the Architecture of Syntactic Dependencies. New York: Palgrave.
López, L. 2009. A Derivational Syntax for Information Structure. Oxford: Oxford University Press.
Marantz, A. 1991. Case and Licensing. In G.F. Westphal, B. Ao & H.-R. Chae (eds.). Proceedings of ESCOL ’91, 234-253. Ithaca, NY: Cornell Linguistics Club.
Marantz, A. 2001. Words. Ms., MIT.
Marantz, A. 2007. Phases and Words. In S.-H. Choe (ed.). Phases in the Theory of Grammar, 191-222. Seoul: Dong In.
Marantz, A. 2013. Locality Domains for Contextual Allomorphy Across the Interfaces. In O. Matushansky & A. Marantz (eds.). Distributed Morphology Today, 95-116. Cambridge, MA: MIT Press.
Mateu, J. 2005. Impossible Primitives. In M. Werning, E. Machery & G. Schurz (eds.). The Compositionality of Meaning and Content: Foundational Issues, 213-225. Frankfurt: Ontos.
Mathieu, E. 2016. The Wh-parameter and Radical Externalization. In L. Eguren, O. Fernández-Soriano & A. Mendikoetxea (eds.). Rethinking Parameters, 252-290. Oxford: Oxford University Press.
McFadden, T. 2004. The Position of Morphological Case in the Derivation. PhD dissertation, University of Pennsylvania.
Melchin, P. 2018. The Semantic Basis for Selectional Restrictions. PhD dissertation, University of Ottawa.
Moro, A. 2000. Dynamic Antisymmetry. Cambridge, MA: MIT Press.
Musso, M., A. Moro, V. Glauche, M. Rijntjes, J. Reichenbach, C. Büchel & C. Weiller. 2003. Broca’s Area and the Language Instinct. Nature Neuroscience 6: 774-781.
Nelson, M.J., I. El Karoui, K. Giber, X. Yang, L. Cohen, H. Koopman, S.S. Cash, L. Naccache, J.T. Hale, C. Pallier & S. Dehaene. 2017. Neurophysiological Dynamics of Phrase-structure Building During Sentence Processing. PNAS 114: E3669-E3678.
Nunes, J. 2004. Linearization of Chains and Sideward Movement. Cambridge, MA: MIT Press.

Obata, M. 2010. Root, Successive-Cyclic and Feature-Splitting Internal Merge: Implications for Feature-Inheritance and Transfer. PhD dissertation, University of Michigan, Ann Arbor.
Obata, M., S. Epstein & M. Baptista. 2015. Can Crosslinguistically Variant Grammars be Formally Identical? Third-factor Underspecification and the Possible Elimination of Parameters of UG. Lingua 156: 1-16.
Oishi, M. 2015. The Hunt For a Label. In H. Egashira, H. Kitahara, K. Nakazawa, A. Nishimae, T. Nomura, M. Oishi & I. Suzuki (eds.). In Untiring Pursuit of Better Alternatives, 322-334. Tokyo: Kaitakusha.
Ott, D. 2011. A Note on Free Relative Clauses in the Theory of Phases. Linguistic Inquiry 42: 183-192.
Ott, D. 2012. Local Instability. Berlin: De Gruyter.
Ott, D. 2014. An Ellipsis Approach to Contrastive Left-dislocation. Linguistic Inquiry 45: 269-303.
Ott, D. 2016a. Fragment Anchors Do Not Support the Syntactic Integration of Appositive Relative Clauses: Reply to Griffiths & de Vries 2013. Linguistic Inquiry 47: 580-590.
Ott, D. 2016b. Ellipsis in Appositives. Glossa 1: article 34.
Ott, D. 2017a. The Syntax and Pragmatics of Dislocation: A Non-templatic Approach. In A. Monti (ed.). Proceedings of the 2017 Annual Conference of the Canadian Linguistic Association.
Ott, D. 2017b. Strong Generative Capacity and the Empirical Base of Linguistic Theory. Frontiers in Psychology 8: 1617.
Ott, D. & E. Onea. 2015. On the Form and Meaning of Appositives. In Proceedings of NELS 45, 203-212.
Ouali, H. 2008. On C-to-T φ-feature Transfer. In R. D’Alessandro, S. Fischer & G.H. Hrafnbjargarson (eds.). Agreement Restrictions, 159-180. Berlin: Mouton de Gruyter.
Pesetsky, D. & E. Torrego. 2006. Probes, Goals, and the Nature of Syntactic Categories. In Y. Otsu (ed.). Proceedings of the Seventh Tokyo Conference on Psycholinguistics, 25-60. Tokyo: Hituzi Syobo Publishing Company.
Picallo, M.C. (ed.). 2014. Linguistic Variation in the Minimalist Framework. Cambridge: Cambridge University Press.
Preminger, O. 2014. Agreement and Its Failures. Cambridge, MA: MIT Press.
Quine, W.V. 1940. Mathematical Logic. New York: W.W. Norton & Co.
Ramchand, G. 2008. Verb Meaning and the Lexicon. Cambridge: Cambridge University Press.
Reinhart, T. 2006. Interface Strategies: Optimal and Costly Computations. Cambridge, MA: MIT Press.
Reuland, E. 2011. Anaphora and Language Design. Cambridge, MA: MIT Press.
Richards, M. 2007. On Feature Inheritance: An Argument from the Phase Impenetrability Condition. Linguistic Inquiry 38: 563-572.
Richards, M. 2008. Two Kinds of Variation in a Minimalist System. In F. Heck, G. Müller & J. Trommer (eds.). Varieties of Competition, 133-162. Linguistische Arbeitsberichte 87, Universität Leipzig.
Richards, M. 2012. On Feature Inheritance, Defective Phases, and the Movement–Morphology Connection. In Á.J. Gallego (ed.). Phases, 195-232. Berlin: De Gruyter.

Richards, N. 2016. Contiguity Theory. Cambridge, MA: MIT Press.
Rizzi, L. 1997. The Fine Structure of the Left Periphery. In L. Haegeman (ed.). Elements of Grammar, 281-337. Dordrecht: Kluwer.
Rizzi, L. 2013a. Introduction: Core Computational Principles in Natural-language Syntax. Lingua 130: 1-13.
Rizzi, L. 2013b. Notes on Cartography and Further Explanation. Probus 25: 197-226.
Rizzi, L. 2015. Cartography, Criteria, and Labeling. In U. Shlonsky (ed.). Beyond Functional Sequence, 314-338. Oxford: Oxford University Press.
Roberts, I. 2010. Agreement and Head Movement. Cambridge, MA: MIT Press.
Sandler, W. & D. Lillo-Martin. 2006. Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.
Seely, T.D. 2006. Merge, Derivational C-command, and Subcategorization in a Label-Free Syntax. In C. Boeckx (ed.). Minimalist Essays, 182-217. Amsterdam: John Benjamins.
Smith, N. & I.-M. Tsimpli. 1995. The Mind of a Savant: Language Learning and Modularity. Oxford: Blackwell.
Sportiche, D. 2013. Binding Theory: Structure Sensitivity of Referential Dependencies. Lingua 130: 187-208.
Sportiche, D. 2015. Neglect. Ms., UCLA.
Starke, M. 2014. Cleaning Up the Lexicon. Linguistic Analysis 39: 245-256.
Struckmeier, V. 2016. Against Information Structure Heads: A Relational Analysis of German Scrambling. Glossa 2: Article 1, 1-29.
Tallerman, M. 2005. Understanding Syntax. New York: Routledge.
Toyoshima, T. 2000. Head-to-Spec Movement and Dynamic Economy. PhD dissertation, Cornell University.
Travis, L. 1984. Parameters and Effects of Word Order Variation. PhD dissertation, MIT.
Trinh, T. 2011. Edges and Linearization. PhD dissertation, MIT.
Uriagereka, J. 1999. Multiple Spell-Out. In S. Epstein & N. Hornstein (eds.). Working Minimalism, 251-282. Cambridge, MA: MIT Press.
Uriagereka, J. 2003. Evidential Contexts. In J. Guéron & L. Tasmowski (eds.). Tense and Point of View, 367-394. Paris: Université Paris X.
Uriagereka, J. 2012. Spell-out and the Minimalist Program. Oxford: Oxford University Press.
Van Urk, C. 2016. On the Distribution of Reflexes of Successive Cyclicity. Paper presented at the Brussels Conference on Generative Linguistics 9, Brussels, December 13-14.
Vergnaud, J.R. 1977[2006]. Letter to Noam Chomsky and Howard Lasnik. Reprinted in R. Freidin & H. Lasnik (eds.). Syntax: Critical Concepts in Linguistics, vol. 5: 21-34. New York: Routledge.
Vermeerbergen, M., L. Leeson & O. Crasborn. 2007. Simultaneity in Signed Languages: A String of Sequentially Organised Issues. In M. Vermeerbergen, L. Leeson & O. Crasborn (eds.). Simultaneity in Signed Languages, 1-26. Amsterdam: John Benjamins.
De Vries, M. 2009. On Multidominance and Linearization. Biolinguistics 3: 344-403.
De Vries, M. 2012. Unconventional Mergers. In M. Uribe-Etxebarria & V. Valmala (eds.). Ways of Structure Building, 143-166. Oxford: Oxford University Press.

Webelhuth, G. 1992. Principles and Parameters of Syntactic Saturation. Oxford: Oxford University Press.
Willer Gold, J. et al. 2018. When Linearity Prevails over Hierarchy in Syntax. PNAS 115: 495-500.
Wurmbrand, S. 2006. Licensing Case. Journal of Germanic Linguistics 18: 175-236.
Yang, C. 2002. Knowledge and Learning in Natural Language. Oxford: Oxford University Press.
Yang, C. 2016. The Price of Linguistic Productivity. Cambridge, MA: MIT Press.
Yang, C., S. Crain, R.C. Berwick, N. Chomsky & J. Bolhuis. 2017. The Growth of Language: Universal Grammar, Experience, and Principles of Computation. Neuroscience & Biobehavioral Reviews 81(B): 103-119.
Zwart, J.W. 2017. An Argument Against the Syntactic Nature of Verb Movement. In L.R. Bailey & M. Sheehan (eds.). Order and Structure in Syntax I: Word Order and Syntactic Structure, 29-47. Berlin: Language Science Press.


Some Puzzling Foundational Issues: The Reading Program*

Noam Chomsky
University of Arizona & M.I.T.
[email protected]

Received: May 23, 2019 Accepted: September 23, 2019

Abstract

This is an annotated transcription of Noam Chomsky’s keynote presentation at the University of Reading, in May 2017. Here, Chomsky reviews some foundational aspects of the theory of structure building: essentially, Merge and Label. The aim is to eliminate what he refers to as extensions of Merge which are seemingly incompatible with the Strong Minimalist Thesis while still accounting for recursive structure, displacement, and reconstruction (as the main empirical goals of the Minimalist Program). These include sidewards movement, multi-dominance, and Late Merge, all of which have been developed throughout the life cycle of transformational generative grammar. Furthermore, Chomsky formulates a series of conditions that an adequate formulation of Merge must meet, and sketches how the aforementioned extensions may violate these conditions. Chomsky arrives at a formulation of an operation MERGE, which maintains the core properties of Merge but is further restricted by limitations over what MERGE can do to the workspaces where syntactic operations apply.

Keywords: Strong Minimalist Thesis; workspaces; MERGE; recursion

Resum. Alguns problemes fonamentals desconcertants: el programa de Reading

Aquesta és una transcripció anotada de la presentació principal de Noam Chomsky a la Universitat de Reading, el maig de 2017. Aquí Chomsky revisa alguns aspectes fundacionals de la teoria de la construcció d’estructures: fonamentalment, fusió i etiquetatge. L’objectiu és eliminar allò que aquest autor anomena extensions de fusió, que aparentment són incompatibles amb la tesi minimalista forta, i continuar donant compte de l’estructura recursiva, el desplaçament i la reconstrucció (com a principals objectius empírics del programa minimalista). Aquests inclouen el moviment lateral, la multidominància i la fusió tardana, que s’han desenvolupat al llarg del cicle de vida de la gramàtica generativa transformacional. A més a més, Chomsky formula una sèrie de condicions que ha de complir una formulació adequada de fusió i indica que les extensions esmentades poden violar aquestes condicions. Chomsky arriba a la formulació d’una operació fusió que manté les propietats bàsiques de fusió però que queda restringida per limitacions sobre el que la fusió pot fer en els espais de treball on s’apliquen les operacions sintàctiques.

* This talk took place at the University of Reading, on May 11th, 2017, as part of an international colloquium: Generative Grammar in the 21st Century: The Evidence and the Rhetoric. The video recording of the talk is available at . The talk was transcribed and annotated by Diego Gabriel Krivochen and Douglas Saddy, who take responsibility for any mistakes.

Paraules clau: tesi minimalista forta; espais de treball; fusió; recursió

When I hear about the work of the 1950s I always try to put myself in the position of a 17 year old kid in my first linguistics class in the 1940s. I imagine how it would have felt if someone got up and started talking about a book he wrote in 1885. Why bother? But that’s what this [talk] is about. I’d like to discuss some foundational issues which are unsettled and I think are rather troublesome, and that bear directly on a number of very important issues in current work and I think raise questions about the legitimacy of problems and challenges we have faced.

So just a brief comment on background assumptions. The basic question we face is ‘what is the Language Faculty?’, ‘what is Universal Grammar (UG)?’; there is good reason to seek the simplest possible answers to this. One reason is just general methodology: simplicity is approximately the same as explanatory depth, we’re looking for explanation – topics that were explored extensively in the 1950s by Nelson Goodman.1 Another interesting comment that bears on our own history, I think, is by Richard Feynman, the Nobel laureate. When he received the Nobel Prize, he reviewed a number of developments in Physics and pointed out that, in every case, the interesting results were reached from several different points of view, and suggested that the characterization of explanatory depth – simplicity – is reaching the same conclusion by independent paths.2 He pointed out that although these approaches turned out to be physically equivalent, they were psychologically different in that the different approaches suggested different ways to approach the innumerable unanswered questions that arise whenever a result is reached, and it opens up new puzzles. I think we find things like that too [in Linguistics]. So I think that’s one reason for seeking the simplest solution.

The second reason is a dictum of Galileo’s, which has served sciences pretty well for 500 years – therefore, worth taking seriously –, namely, that Nature is in fact simple and it’s the task of the scientists to show how that’s the case,3 be it the motion of planets, the tides, the flight of birds or, in our case, the nature of Language.

1. See, for example, Goodman (1968: Chapter 4), which summarizes much of his previous work on the syntax of symbolic systems.
2. The relevant fragment is the following (Feynman, Nobel lecture, Dec. 11th, 1965): ‘It always seems odd to me that the fundamental laws of physics, when discovered, can appear in so many different forms that are not apparently identical at first, but, with a little mathematical fiddling you can show the relationship. An example of that is the Schrödinger equation and the Heisenberg formulation of quantum mechanics. I don’t know why this is - it remains a mystery, but it was something I learned from experience. There is always another way to say the same thing that doesn’t look at all like the way you said it before. I don’t know what the reason for this is. I think it is somehow a representation of the simplicity of nature. A thing like the inverse square law is just right to be represented by the solution of Poisson’s equation, which, therefore, is a very different way to say the same thing that doesn’t look at all like the way you said it before. I don’t know what it means, that nature chooses these curious forms, but maybe that is a way of defining simplicity. Perhaps a thing is simple if you can describe it fully in several different ways without immediately knowing that you are describing the same thing.’

And there’s a third reason, which is specific to the study of Language and fairly recent in our understanding of its import, and that has to do with human evolution and the evolution of language (by which I mean the evolution of the Language Faculty). We don’t know a lot about it, but we know something, and there are recent discoveries that strongly suggest that language emerged along with anatomically modern humans or maybe slightly later and has remained stable ever since.4 Certainly, it arose before the separation of humans, which by a number of genome analyses has been shown to be not long after humans emerged, roughly 200,000 years ago. There’s a very interesting paper on this by Riny Huybregts.5 Well, if that’s true, and it seems to be, then whatever emerged just has to be very simple: there were no selectional factors involved, there was just resort to some natural principle. Since it [Language] emerged suddenly and never changed, then it has to be a simple object, and that’s what we should be looking for.

As a side comment, I think there are by now some reasons (I am not going to review them) to suppose that the core I-language (internal language) generates solely representations on one interface: C-I (Conceptual-Intentional interface), essentially a kind of language of thought.6 And that’s probably close to, or probably we will discover, totally invariant among human beings. It seems that the complexity, the variety of language arise overwhelmingly if not completely from the ancillary operations which lead to externalization, which we know draws upon our sensory-motor system. And it’s pretty natural that that should be complex and vary, because you have to match two systems that essentially have nothing to do with one another.
The internal system seems to have arisen pretty suddenly along with modern humans, while the SM (sensory-motor) systems have been around for hundreds of thousands, in some cases millions, of years, and have absolutely nothing to do with language. So when we try to connect these two things, it’s necessarily going to be a complex operation, and in fact the externalization operations, although they certainly follow principles and rules of a restricted variety, nevertheless violate just about any principle of computational complexity one can imagine, and they do vary a lot, change a lot, generation to generation and so on. So I’ll just assume that, admittedly without any arguments – it’s been discussed elsewhere – and take a look at the generative mechanisms for the core I-Language mapping to C-I.

3. This idea can be found, for instance, in Dialogo sopra i due massimi sistemi del mondo (1632).
4. See Berwick & Chomsky (2016) for discussion.
5. See, for instance, Huybregts (2017).
6. See Fodor (1975, 2008). It may be worth pointing out that:
‘the language-of-thought hypothesis endorsed in LOT 1 wasn’t just any old hyper-realism about the mental; it was, in particular, a species of RTM (that is, of the representational theory of mind). Roughly, but near enough for present purposes, RTM is a claim about the metaphysics of cognitive mental states and processes: Tokens of cognitive mental states are tokens of relations between creatures and their mental representations. Tokens of mental processes are “computations”; that is, causal chains of (typically inferential) operations on mental representations. There is no tokening of a (cognitive) mental state or process (by a creature, at a time) unless there is a corresponding tokening of a mental representation (by that creature, at that time).’ (Fodor 2008: 5-6)

7. In Rosenbaum (1965: 12), raising to subject was known as Pronoun Substitution. Raising to object is a case of NP complementation with for-to COMP deletion. Perlmutter (1968: 36) proposes a general raising rule which 'takes an NP out of the embedded sentence and moves it up into the higher sentence'. Postal (1974) describes both processes, raising to object and raising to subject, as a single rule of movement (1974: 267), and defines the rule Raising, crucially, as a cyclic rule.
8. It is useful in this context to first review the so-called 'Cyclic principle': essentially, when one domain to which transformations can apply is contained in another, relevant transformations apply to the smaller domain first, then proceeding to the wider domain. To quote Halle & Chomsky (1960: 275):
The modifications [i.e., transformations] are introduced in a stepwise fashion, successive steps reflecting the influence of successively higher constituents. Note also that the same modifications apply to all constituents regardless of their place in the constituent hierarchy; the same rules are reapplied to each constituent in a repeating cycle until the highest constituent is reached. The final result of such a cyclical reapplication of the same rules reflects to a certain extent the stress distribution of the morphemes as parts of lower constituents.
Chomsky (1973) formulates the Strict Cycle Condition (SCC) as follows:
No rule can apply to a domain dominated by a cyclic node A [at the time, NP, S'] in such a way as to affect solely a proper subdomain of A dominated by a node B which is also a cyclic node.
Thus, a rule that targets a proper subdomain of a cyclic node is referred to as counter-cyclic. For example, in Chomsky (1995: 190), the ungrammaticality of *How did John wonder what Mary

fixed t_how is blamed on the counter-cyclicity of the operation that raises how to the matrix Spec-CP, later raising what to the embedded Spec-CP. The 'counter-cyclicity' that Chomsky mentions here can be exemplified as follows: if operations

at T are triggered by C, then [C [T seem [α Bill to have left]]] must be derivationally prior to the structure [C [T Bill seems [α Bill to have left]]]. But because raising Bill affects, precisely, a proper subdomain of the cycle CP, this application of raising-to-subject is counter-cyclic. See Epstein, Kitahara & Seely (2014).
9. Formulated as follows:
Suppose we restrict substitution operations still further, requiring that Ø be external to the targeted phrase marker K. Thus, GT and Move-α extend K to K*, which includes K as a proper part. (Chomsky 1995: 190)
An even stricter version of the SCC, as Chomsky (1995: 190) observes, includes the effects of the EC.
10. Chomsky (1995: 248) notes that substitution forms L = {H(K), {α, K}}, where H(K) is the head (= the label) of the projected element K. Terms are defined as follows: for any structure K, K is a term of K; if L is a term of K, then the members of the members of L are terms of K (1995: 247). Chomsky adds that

appeared, which is quite a complex operation. Counter-cyclicity is about the same as Late Merge,11 so this critique holds for everything that is done with what's called Late Merge: it's completely unacceptable, because it involves operations that are complex and unmotivated; they have nothing to do with the goal we think we ought to obtain, something like the Strong Minimalist Thesis (SMT).12 These considerations become far more significant in the case of what are sometimes called exotic constructions, those which have virtually no evidence, maybe none, available for the child; things like Antecedent-Contained Deletion13 or Across-the-Board movement, parasitic gaps… It's simply impossible to propose a new principle for those, it can't be: the child has no evidence for them, yet has to understand them. It must be the application of principles that are available for simple, easy, normal cases. So in fact every kind of construction is pretty exotic in the sense that Charles (Yang) was talking about, but some are extremely so, thus leading to the invocation of operations like counter-cyclicity, Late Merge… completely unacceptable. We will count these as part of the problems that we have to deal with. A lot of these proposals about Late Merge give very interesting descriptive results, bring all sorts of interesting ideas, but without any basis, so what are apparently solutions are in fact problems, problems that now have to be addressed. And there's quite a lot of work like that, counter-cyclicity, Late Merge… One example is called Parallel Merge,14

For the case of substitution, terms correspond to nodes of the informal representations, where each node is understood to stand for the subtree of which it is the root (1995: 247),
thus respecting the Extension Condition (and therefore the SCC). See fn. 33 below for the original (1995) formulation of Merge in terms of substitution as a Generalized Transformation.
11. See Stepanov (2001), in turn based on Lebeaux (1988): their proposal concerns the post-cyclic adjunction of XPs given (i) certain anti-reconstruction effects with respect to Condition C of Binding Theory, (ii) asymmetries on opacity in wh-extraction between arguments and adjuncts, (iii) Nissenbaum's (1998) analysis of parasitic gaps, (iv) the timing of Affix Hopping and adjunct intervention effects, (v) multiple wh-fronting in Slavic and its interaction with Superiority (and, incidentally, the SCC), (vi) Principle A effects in relative clauses (which are taken to be adjoined structures). It is essential to bear in mind that Chomsky-adjunction must not be confused with Joshi-style adjunction in the framework of Tree Adjoining Grammars (Joshi 1985 and much related work).
12. In the words of Chomsky (2000: 96):
Language is an optimal solution to legibility conditions.
These legibility conditions are imposed by the external systems, Conceptual-Intentional and Sensory-Motor, over the outputs of the Narrow Syntax (Merge + Agree + Move). See the discussion in Chomsky (2000: 112 ff.) for more details.
13. Cases like John likes every boy Mary does [like t]. See Fox (2002) for an analysis along the lines Chomsky is referring to here.
14. Citko (2005: 476):
The existence of External Merge and Internal Merge predicts the existence of a third type, combining the properties of both. This third type, which I will refer to as Parallel Merge, is like External Merge in that it involves two distinct rooted objects (α and β), but it is like Internal Merge in that it combines the two by taking a subpart of one of them.
We can diagram the situation as follows, where α is a complex object {α, {α, γ}}: [diagram omitted in the source: β merges with a subpart of α, which thus comes to be dominated by both α and β]

which relies on work on multidimensionality, Sidewards Merge,15 a lot of things in the literature. I'll come back to this. But all of them are equally problematic for these reasons, particularly when they are used for more or less exotic constructions. In this particular case, raising to subject and object, there seems to be an easy answer. The easy answer, which is in my recent papers, is simply to drop the condition that Internal Merge (Movement) has to be triggered, so it's free, like External Merge.16 In fact, that's an improvement; we should never have had that condition. So dropping that condition, there's quite a straightforward cyclic analysis, so we're in good shape in this particular case. But that's not true generally: if you look at the uses of Late Merge in the literature, a lot of them have interesting descriptive consequences, but don't have easy answers. There might be some answers, but they have to be worked on; these are challenges. Well, that's the first kind of problem, but I am concerned with an extension of it, and there are many extensions in the literature of what is called Merge but is not really Merge, which raise very serious questions of legitimacy. Some of them in fact yield direct violations of quite sound principles. So they raise several questions.

15. As per Nunes (2004: 93 ff.) we can illustrate Sidewards Movement (which assumes the Copy + Merge theory of movement) as follows: i) a given constituent α is copied from K and merges with the independent syntactic object L.

[K …αi…] → Copy αi → Merge(αi, L), where L = [L …], yielding [K …αi…], [M αi [L …]]
ii) At this point, the two copies of α cannot form a chain because they are not in a c-command relation. Thus, K and M are put together forming a new syntactic object XP, in which the copies of α can be in a c-command relation.

[XP [K …αi…] [X' [X] [M αi [L …]]]]
Note that K c-commands M, but neither copy c-commands the other (being embedded within K and M respectively). iii) If there is a higher head Y which selects a copy of α, such that YP transitively dominates XP and there is a copy of α in Spec-Y, this higher copy can form two distinct chains with the instances of α in both K and M.
16. An idea that goes all the way back to Lasnik & Saito's (1984) Affect-α. For triggered Internal Merge, see Chomsky (1995) and the argument about displacement, uninterpretable features, and language perfection in Chomsky (2000). Chomsky (2004: 110) defends Free Merge in the following terms:
NS [Narrow Syntax] is based on the free operation Merge. SMT entails that Merge of α, β is unconstrained, therefore either external or internal.
And, in footnote 29 (2004: 125):
For over forty years, there have been efforts to motivate displacement. That seems to have been a mistake. Recourse to any device to account for the displacement phenomena also is mistaken unless it is independently motivated (as is Internal Merge). If this is correct, then the radically simplified form of transformational grammar ('Move-α' and its variants) is a kind of conceptual necessity, given the undeniable existence of displacement phenomena.
Chomsky (2008: 139) adopts a different position, though, imposing a feature requirement on External Merge:
For an LI [Lexical Item] to be able to enter into a computation, merging with some SO [Syntactic Object], it must have some property permitting this operation. A property of an LI is called a feature, so an LI has a feature that permits it to be merged. Call this the edge feature (EF) of the LI. If an LI lacks EF, it can only be a full expression in itself; an interjection.
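To make the Copy + Merge schema in footnote 15 concrete, here is a minimal sketch in Python (my own illustration, not from the text; the tuple encoding and the copy-as-equal-value convention are expository assumptions):

```python
# Sidewards movement, after Nunes (2004) as summarized in fn. 15.
# Syntactic objects are nested tuples; copies of alpha are equal values.

def dominates(x, y):
    """x dominates y iff x == y or some member of x dominates y."""
    return x == y or (isinstance(x, tuple) and any(dominates(m, y) for m in x))

def copy_c_commands_copy(x, root):
    """Does some occurrence of x have a sister dominating an occurrence of x?"""
    if not isinstance(root, tuple):
        return False
    if x in root and any(dominates(s, x) for s in root if s != x):
        return True
    return any(copy_c_commands_copy(x, m) for m in root)

alpha = "alpha"
K = (alpha, "restK")   # step (i): alpha is copied out of K ...
M = (alpha, ("L",))    # ... and merged with L, yielding M
XP = (K, M)            # step (ii): K and M merged; neither copy c-commands the other
YP = (alpha, XP)       # step (iii): a higher copy of alpha c-commands both

print(copy_c_commands_copy(alpha, XP))  # False: no chain can be formed yet
print(copy_c_commands_copy(alpha, YP))  # True: the higher copy heads the chains
```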

The first question is 'how can they be ruled out?', 'what's a proper definition of Merge that rules them out?', and the second question is 'how do we deal with the descriptive results that are presented and described as solutions but we should think of as problems of the analyses?' A third kind of problem is raised by a very important part of what Charles Yang just talked about, his work on language acquisition,17 which has some pretty remarkable results on the basis of very simple assumptions, basically the Elsewhere Condition18 and the assumption that listing something carries a cost for acquisition (which is almost tautological). On the basis of just those two assumptions he, along with Sam Gutmann, has managed to reach a very specific result, a definite tipping point that tells you – as much as possible – exactly where you should set a productive rule instead of listing.19 And there are some pretty remarkable empirical results, also for the first time, on principles determining the core versus periphery distinction, quite valuable. Now, the relevance for my concerns here, however, is different. One is that the work highlights something we should keep in mind: an important condition on language design is that languages should be learnable, it should be possible to acquire them. In fact, if language couldn't be acquired, it wouldn't survive. And that highlights the fact that we are not doing general recursion, we are not studying proof theory or metamathematics, we are studying a particular organic system which has its own natural conditions that it must meet, and that turns out to be important.20

17. Specifically, Yang (2016).
18. First formulated with this name in Kiparsky (1973: 94); see also Anderson (1969: 139-144), who calls it the 'principle of disjunctive ordering' and points towards antecedents in Pāṇini. The general formulation in Kiparsky (1982: 8) suffices for present purposes:
Rules A and B in the same component apply disjunctively to a form ϕ if and only if:
(i) the structural description of A (the special rule) properly includes the structural description of B (the general rule)
(ii) the result of applying A to ϕ is distinct from the result of applying B to ϕ.
In that case, A is applied first, and if it takes effect, then B is not applied.
19. Chomsky is referring here to the Tolerance Principle, proposed in Yang (2016: 64):
Let R be a rule applicable to N items, of which e are exceptions. R is productive if and only if e ≤ θN, where θN = N / ln N.
20. See Turing (1952). The combination of purely formal and biological considerations is evident in the following passage:
The model [a mathematical model of the growing embryo] takes two slightly different forms. In one of them the cell theory is recognized but the cells are idealized into geometrical points. In the other the matter of the organism is imagined as continuously distributed. The cells are not, however, completely ignored, for various physical and physico-chemical characteristics of the matter as a whole are assumed to have values appropriate to the cellular matter. (Turing 1952: 37)
Given the enormous complexity of the matter, Turing makes the following methodological choice, which echoes much biolinguistic work:
The interdependence of the chemical and mechanical data adds enormously to the difficulty, and attention will therefore be confined, so far as is possible, to cases where these can be separated. The mathematics of elastic solids is a well-developed subject, and has often been applied to biological systems. In this paper it is proposed to give attention rather to cases where the mechanical aspect can be ignored and the chemical aspect is the most significant. (Turing 1952: 38)
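As a quick illustration of footnote 19, here is a minimal sketch in Python (my own illustration; the vocabulary figures are made up for exposition, not Yang's data):

```python
from math import log

def tolerance_threshold(n: int) -> float:
    """Yang's (2016) threshold: theta_N = N / ln N."""
    return n / log(n)

def is_productive(n: int, e: int) -> bool:
    """A rule over n items with e exceptions is productive iff e <= theta_N."""
    return e <= tolerance_threshold(n)

# Hypothetical learner: 1,000 verbs, of which 120 are irregular past tenses.
print(round(tolerance_threshold(1000), 1))  # 144.8
print(is_productive(1000, 120))             # True: list the 120, generalize -ed
```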

Beyond that, there’s a specific consequence of Charles’ work, namely, if you take a look at it, his results depend on the assumption that rules are determinate [i.e., deterministic]: that means, if the structural conditions for a rule are met, the structural change has to take place in a fixed and determinate manner, and if you don’t have that property, his results don’t follow. And it’s a pretty natural property, except it is violated all over the place. So take Phrase Structure Grammar (PSG):21 if in a PSG you generate the structure, say:

1) NP, V, NP

And you have the rule

2) NP → Det, N

Then it's not deterministic. It could be either of the two NPs: the structural description is met, but you don't know what the result is. So that kind of rule really ought to be ruled out by the principle of determinacy. And it turns out that for these extensions of Merge that I've mentioned it's almost always violated, and that turns out to be one of the many considerations as to why they are unacceptable. Again, all these problems are particularly acute with the exotic constructions; the methodological principle is violated all over the place in descriptive practice, and it should be kept in mind. Over the years, whenever some descriptive device has been introduced, whatever it is (PSG, transformations, X-bar theory, parameters, phases, whatever it might be), almost always it tends to be used pretty extravagantly, well beyond any solid foundation for the rule. And that's partly because it's not characterized explicitly enough, so there's a lot of vagueness in the periphery which is exploited for descriptive purposes. And that's not necessarily a criticism. A good example, in fact, is Generative Semantics:22
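To make the non-determinism in (1)-(2) concrete, here is a minimal sketch in Python (my own illustration; the list encoding of the phrase marker is an expository assumption):

```python
# The rewrite rule NP -> Det N can target either NP in the string NP V NP:
# the structural description is met, but the outcome is not unique.

def applications(seq, lhs, rhs):
    """All distinct results of applying the rule lhs -> rhs exactly once."""
    return [seq[:i] + rhs + seq[i + 1:]
            for i, symbol in enumerate(seq) if symbol == lhs]

structure = ["NP", "V", "NP"]
for result in applications(structure, "NP", ["Det", "N"]):
    print(result)
# ['Det', 'N', 'V', 'NP']
# ['NP', 'V', 'Det', 'N']   -- two outcomes: the rule is not determinate
```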

21. See, e.g., Post (1943: 203 ff.), who presents what are known as 'Post canonical systems'. Also Chomsky (1959); Postal (1964); Kuroda (1964) and, for a perspective closer to automata theory, Hopcroft and Ullman (1969). Greibach (1965: 43, Definition 1.1) defines PSGs as follows:
By a psg (I, T, X, ℘) we mean a context-free phrase structure grammar where
(1) I is a finite vocabulary of intermediate symbols,
(2) T is a finite vocabulary of terminal symbols and I ∩ T = ∅,
(3) X is the designated initial symbol [i.e., the root in terms of trees, the axiom in proof-theoretic terms] and X ∈ I,

(4) The rules of ℘ are of the forms Z → AY1…Yn, n ≥ 1, Z ∈ I, A, Yi ∈ I ∪ T, and Z → α, Z ∈ I, α ∈ T.
Linguistically, they neatly capture Immediate Constituent analyses, and are opposed to Dependency Grammars.
22. See McCawley (1968 [1973]); Lakoff (1971); Lakoff & Ross (1967 [1976]) (and also Dowty 1979). The basic tenets of a 'vanilla' GS (the common aspects to all variants) are presented clearly in McCawley (1968 [1973]: 155-156):
(1) Semantic structures are claimed to be of the same formal nature as syntactic structures, namely labeled trees whose non-terminal node-labels are the same set of labels that appear in surface structure.

transformations were around, the Katz-Postal hypothesis23 was around, Deep Structure interpretation was around, and that led to really quite extravagant use of these devices, which had both a positive and a negative aspect, and finally collapsed from its own weight because it was so extensive that it didn't make sense. But the advantages were that it led to a lot of discoveries; there were lots of insights about language that came out of it. They're not solutions, they're problems, and it's good to have problems, and it led to explorations of new domains that hadn't been looked at. All of that's positive, and that's commonly true for the promiscuous use of devices that are invented. The negative aspect is that it doesn't lead us to the goal of trying to understand UG and the language faculty, and it's also misleading in that it tends to present problems, which are interesting problems,

(2) The notions of a 'deep structure' which separates syntax from semantics and a distinction between 'transformations' and 'semantic interpretation rules' are given up in favor of a single system of rules which relates semantic structure and surface structure via intermediate stages which deserve the name 'syntactic' no more and no less than 'semantic'.
(3) It is held that the rules needed to determine what a grammatical sentence may mean are also needed to determine what is grammatical at all.
(4) A grammar is taken not to generate a set of surface structures but a set of derivations, and to consist of a set of derivational constraints: constraints on what combinations of elements may occur in semantic structure, on what combinations of elements may occur in surface structure, and on how different stages of a derivation may differ from each other.
23. Katz & Postal (1964). Two aspects of that proposal are relevant here:
The semantic interpretation of the sentence The man hit the ball must represent the meanings of the constituents of this sentence, i.e., the, man, hit, the, ball, the man, hit the ball, and the man hit the ball. But it must not provide any meaning for such substrings […] as the man hit or hit the. Obviously, this condition of adequacy can be fulfilled only if the syntactic component provides, for each sentence it generates, an enumeration of all, and only, its constituents (Katz & Postal 1964: 20).
The fragment above explicitly adopts the definition of 'generative' used in Post (1943) and requires determinacy to hold. Also, and perhaps more importantly for Chomsky's point,
there are also many cases in the literature of syntactic facts characterized by optional singulary transformations where the output P-marker must have a semantic interpretation quite different from that of the input P-marker. Among these are the question transformation, the imperative transformation, the wh-attachment transformation, etc. Thus there are three possibilities: first, that no correctly formulated singulary transformation has an output with a semantic interpretation distinct from its input and that those transformations in the literature which violate this claim are incorrect; second, that all singulary transformations affect meaning and those in the literature which do not are incorrect; third, that some do and some do not affect semantic interpretation and it is some specific feature of the particular transformations that determines which do and which do not. The first two alternatives are clearly preferable, even though what at present appear to be the facts throw more doubt upon them than upon the third, because they make no reference to specific features of a class of transformations. […] on a priori methodological grounds, the first of the three alternatives is the one which deserves to be provisionally accepted. This alternative claims that P2 [Type 2 Projection rules, a set of rules that apply to strings to which transformations have been applied] play no role in the semantic interpretation of any sentoid [a sequence of grammatical formants with an associated semantic interpretation] without a generalized transformation in its T-marker. (Katz & Postal 1964: 32. Underline in the original)

as if they were solutions, and they are not solutions: they are ways of stating a problem that we have to look at. Well, I think that a lot of this is now happening with Merge-style systems.
These systems have had quite a lot of real and, I think, significant contributions. So, there are accounts for the existence and ubiquity of displacement and reconstruction, which have always been regarded as an oddity of language, but turn out to follow from the null hypothesis: if you pick the simplest combinatorial operation you get displacement and reconstruction, so that's a pretty significant result, I think. It accounts for the deep and quite puzzling property of structural dependence, which has been worried about for 50 years: why do languages have this strange property, which increases the computational complexity of language use, since dealing with linear order is much more trivial computationally? But it's nevertheless ubiquitous. Why is that the case? Well, it turns out again that it follows from the null hypothesis: if you pick the simplest combinatorial operation, that's what you get. Incidentally, this alone has many consequences for the future and raises difficult problems: it follows from this that anything involving linear order or any other arrangement cannot feed C-I. But there's overwhelming evidence to the contrary: in fact, the whole history of linguistics assumes the opposite, right up to the present. It assumes that things like linear order and arrangement are what yield semantic interpretation and interact all over the place with syntactic operations. And there's plenty of interesting contemporary work that seems to suggest the same thing. But if this is correct, and there's good reason to think it is, these considerations indeed give the only explanation for structural dependence, in fact the best possible explanation. We have a real problem, like the problem of the extravagant use of devices: major areas of descriptive wealth have to be completely rethought. We have to show how the descriptive consequences of using linear order or arrangement can be settled in some other way; they have to be assigned to the externalization system. There are also interesting neurolinguistic and psycholinguistic experiments that suggest exactly the same thing, including work that Ianthi [Tsimpli] talked about before. These results that I mention, if you take a look at them, were all achieved within a narrow version of Merge, not using the eccentric versions I've been raising questions about. The narrow version is pretty well defined in itself: it involves simple combinatorial operations and relies on the observation that there are two logical possibilities, External and Internal Merge. But the narrow version has been used within a framework that has been left kind of vague and unspecific. And that's a problem. It's the vagueness that has been exploited for the extensions of Merge like counter-cyclic Late Merge, Parallel Merge to yield Multidominance,24 and so on. Well, there are the usual advantages and problems that I've mentioned, but I

24. The various theories grouped under the term 'Multidominance' can be said to have in common the rejection of the so-called Single Mother Condition (Sampson 1975), such that a node in a structural description can be dominated by more than a single node (i.e., can have more than one 'mother'). See Citko (2005); Peters and Ritchie (1981); McCawley (1982); Levine (1985). The locally multi-rooted graph proposal in Morin & O'Malley (1969) (who refer to these structures as vines) is also relevant.

think that these extensions are illegitimate, and we have to somehow show that, and show that the narrow version, which yields the interesting results, is the only legitimate one. That's the question I will look at. Well, there are two ways to proceed. One is the boring way, so I'll ignore it: that's to stipulate the narrow version explicitly: say 'this is what it is, the others don't work'. We don't want to do that. The interesting way is to take a look at the computational operations of language from a completely different point of view, to start by asking 'what are the general desiderata that any computational operations for language should meet?' Notice there are two considerations here: first, what should any computational operation be like, simply on grounds of computational complexity and third factor conditions?25 And second, it's got to be specific to language, an organic system which has its own properties. So there are those two conditions or desiderata. And the idea will be that the program will be constructed in a general framework that accommodates a wide range of alternatives, including all the extensions and other things that we can think of that haven't been used yet. And then ask: what survives careful analysis in terms of these conditions? The hope is to show that only the narrow version passes muster under these conditions, which then leaves us with the challenge of facing the wealth of descriptive results that have been reached by what I hope to show are illegitimate means and methods. The better approach is the one that provides insight into the nature of language and explanatory adequacy, so let's proceed with it, first formulating some general principles that any operation for language ought to meet. The first and most obvious one is simply descriptive adequacy. It ought to get the facts right. And of course we all know that's not innocuous, as we don't know what the facts are a priori. Maybe something we think is a fact about language turns out to be a performance fact. What counts as a fact depends on theoretical understanding and empirical discoveries. Nevertheless, it's a pretty good guideline to proceed with, and I should say that, now that we have – thanks to Charles [Yang] – a sharp core-periphery distinction, that helps; independent study of what is a performance property also helps. And we have reasonably good guidelines; I think it's safe to start from that. But it poses a problem. One problem is that Merge violates it. Merge does not satisfy descriptive adequacy. This is a fact that's pretty crucial. So if you take a look at the simplest case of Merge, that is Internal Merge, that's the one that involves least search; External Merge involves huge search, it's a complicated operation.26 Internal Merge only involves search within the syntactic

25. In the words of Chomsky (2005: 6), the third factor in language design includes:
iii) Principles not specific to the faculty of language.
The third factor falls into several subtypes: (a) principles of data analysis that might be used in language acquisition and other domains; (b) principles of structural architecture and developmental constraints that enter into canalization, organic form, and action over a wide range, including principles of efficient computation, which would be expected to be of particular significance for computational systems such as language. It is the second of these subcategories that should be of particular significance in determining the nature of attainable languages.
26. The 'search space' in IM is limited to the local phrase marker under consideration; in EM the search space is potentially the entire Lexicon.

object that we are looking at. So suppose we apply just Internal Merge, then what do we get? Well, basically we get the successor function. So suppose you have a single element lexicon and we apply Merge to it; we have a single element, call it 0, and we apply Merge to it and you get the set containing 0, call that 1. Apply it again, you get the set containing 0 and 1, it's 2. And basically you get the successor function.27 But language isn't the successor function, maybe arithmetic is. And in fact that's part of the argument that maybe that's why humans know arithmetic, because it's the simplest case of language. But that's not language. Well, suppose you apply External Merge. Then, if you think about it, what you get is (if you look at it in terms of trees) a tree in which the leaves are lexical items, whatever lexical items are; you get a tree with lexical items coming off. Well, that's not language. So, these two operations, incidentally, are appropriate for standard formal systems.28 So if you are making standard formal systems you can use these methods, but not language, because language has a different property: language has exocentric constructions, it has things of the form {XP, YP} like, say, {NP, VP}, and you can't get them this way, so there's a problem. The operation Merge violates the simplest condition: descriptive adequacy. Well, that has (kind of) been overcome by a tacit assumption, and this tacit assumption has to be made precise, as it has consequences. And the tacit assumption is that you can construct syntactic objects in parallel and then bring them together somewhere. Now, that presupposes that you have a workspace in which operations are being carried out. And, what's the workspace? Well, that hasn't been properly answered. And fixing it has consequences. The one immediate consequence is that operations, say, the right version of Merge, should be operations on the workspace, not on a particular syntactic object, because they can change the workspace. And in fact, if you look at these – what I think are – illegitimate operations like, say, Parallel Merge, they in fact involve separate elements of the workspace, so they are modifying the entire workspace. And since the program I am suggesting is to present desiderata that include everything, these fall within it, just like much else.

27. The recursive set-theoretic construction of the naturals is due to von Neumann, e.g., (1923) (within the more general framework of Zermelo-Fraenkel set theory), and it is equivalent to Peano's (which was however not based on set theory). First, let s(a) = a ∪ {a}, and call s the successor function. And 0 = Ø. Then, define
1 = s(0) = s(Ø) = Ø ∪ {Ø} = {0}
2 = s(1) = s({0}) = {0} ∪ {{0}}
But we know that {0} = 1, so
2 = {0} ∪ {1} = {0, 1}
Which means that 2 is defined as the ordered set containing 0 and 1. In the same way, 3 is defined recursively as s({0, 1}) = {0, 1} ∪ {2} = {0, 1, 2} (or, equivalently, {0, 1, {0, 1}}).
28. Standard here just means the kind usually discussed in general (including linguistic) contexts: propositional calculus, quantification theory, arithmetic… Not, say, category theory. (Chomsky p.c.)
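As a quick illustration of footnote 27, here is a minimal sketch in Python (my own illustration; the frozenset encoding is an expository assumption, and the identification of this construction with iterated Merge is only the informal one made in the text):

```python
def successor(a: frozenset) -> frozenset:
    """Von Neumann successor: s(a) = a U {a}."""
    return a | {a}

zero = frozenset()         # 0 = Ø
one = successor(zero)      # {0}
two = successor(one)       # {0, 1}
three = successor(two)     # {0, 1, 2}

# Each numeral is the set of its predecessors:
print(len(one), len(two), len(three))   # 1 2 3
print(two == frozenset({zero, one}))    # True
```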

So the operations, including the right version of Merge, have to be operations on the workspace. That raises all kinds of questions, to which I'll turn directly. The first condition is descriptive adequacy. The second condition is some version of the Strong Minimalist Thesis; I mentioned reasons why we ought to be able to approach that. There are specific consequences like, for example, Inclusiveness:29 the operations should not add anything: they should not add order, they should not add new features, or anything like that. Of course, externalization violates all these conditions, it violates just about everything, so that's not surprising. A third condition, which is specific to language and not true of, say, formal proof theory or something like that, is that we should be restricting the computational resources. We are dealing with an organic system with limited computational resources. In fact, quite limited, if you think about the speed of neural transmission and so on and so forth. So a third principle is: restrict computational resources, and in the best case (the case that we ought to try to achieve, if possible), the operations should never extend the workspace; they should maybe contract it, but not expand it. The fourth principle, I already mentioned, is determinism, the principle that was required for Charles' results. So, if the structural conditions for a rule hold for some workspace, then the structural change must be unique, it must be determinate, unlike, say, phrase structure grammar. The fifth condition is, centrally, a condition of coherence or stability that says that the properties of a syntactic object can't change in the course of the derivation,30 so something that refers to Mary on line 1 cannot refer to John on line 3. There's an interesting history about this, that there's no time to go into, but in the history of science it turns out that in classical physics and mathematics this condition

29. Chomsky (1995) defines the Inclusiveness Condition as follows:
A "perfect language" should meet the condition of inclusiveness: any structure formed by the computation (in particular, π and λ) is constituted of elements already present in the lexical items selected for N; no new objects are added in the course of computation apart from rearrangements of lexical properties (in particular, no indices, bar levels in the sense of X-bar theory, etc.). (Chomsky 1995: 228)
Also, footnote 7 of Chapter 4, p. 381:
Note that considerations of this nature can be invoked only within a fairly disciplined minimalist approach. Thus, with sufficiently rich formal devices (say, set theory), counterparts to any object (nodes, bars, indices, etc.) can readily be constructed from features. There is no essential difference, then, between admitting new kinds of objects and allowing richer use of formal devices; we assume that these (basically equivalent) options are permitted only when forced by empirical properties of language.
30. See, e.g., Lasnik & Uriagereka's (2005: 53, 112) Conservation Laws:
1st Conservation Law: Conservation of Lexical Information. All information in a syntactic derivation comes from the lexicon and interpretable lexical information cannot be destroyed.
2nd Conservation Law: Conservation of Structural Information. Interpretable structural units created in a syntactic derivation cannot be altered.
The notion of faithfulness constraint in Optimality Theory is also consistent with this dictum.

was crucially violated, as in Newton's work, with quite major consequences. But anyway, this condition has to be satisfied. And the sixth condition is the fact that language crucially involves recursion. That's a universal property of the human faculty of language (there's a lot of confusion about this in the popular literature, which I will not go into). It's an invariant property of humans that the language faculty involves recursion. Well, what's the basic idea of recursion? It's that every object that's generated must be available for later computations. So, for example, if you're doing formal proof theory, and you prove a theorem, you have to be able anywhere later in the proof to go back to that theorem and follow its consequences. That's formal proof theory, but remember, we are not doing formal proof theory; we have an organic object which has to meet other conditions. But in our case, given condition 2, the SMT, we want to formulate this sixth condition, recursion, in a way that stipulates no specific properties, so we don't put any extra conditions on it: recursion ought to be free. So what we'll say is that a syntactic object is accessible (that's a technical term) to further operations if it's been generated, period. So that's general recursion without further stipulations. Well, I'll stop with these conditions for time reasons. The basic ones are the first and second (descriptive adequacy and SMT); if we think it through, the others pretty much follow. There are consequences right away from the definition of recursion. One consequence shows that what's been called the Extension Condition is a mistake, because the Extension Condition simply stipulated that the only accessible syntactic objects are the whole syntactic objects; that's the Extension Condition. But the general definition of recursion tells us that anything inside should be accessible.
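To make the free definition of accessibility concrete, here is a minimal sketch in Python (my own illustration; the frozenset encoding of syntactic objects is an expository assumption):

```python
def terms(so) -> set:
    """Every term of a generated object is accessible: the object itself
    plus, recursively, the terms of its members."""
    result = {so}
    if isinstance(so, frozenset):
        for member in so:
            result |= terms(member)
    return result

ab = frozenset({"a", "b"})
obj = frozenset({ab, "c"})   # {{a, b}, c}
print(terms(obj))
# the root, {a, b}, and a, b, c are all accessible -- not just the root,
# as the Extension Condition would have it
```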
And that means that one of the arguments against Late, counter-cyclic Merge doesn't work, the one that said 'it violates the Extension Condition'. It doesn't matter in this case, because the major argument, the EKS [Epstein, Kitahara, Seely] argument against Late, counter-cyclic Merge, is the substitution operation, which is unacceptable. So the conclusion holds, but not the entire argument; the Extension Condition argument has to be withdrawn. Much more interesting is what happens if we take the simplest cases. Suppose the workspace consists of two elements. One is the set {a, b} (whatever a and b are), the result of Merging a and b. The second thing in the workspace, call it c, is a syntactic object. So that's the simplest case, and it's worth a very close look, because it turns out that the problems that undermine, I think, all of the extensions of Merge already show up with this simple case, so it's worth looking at this simple case carefully:

3) W = {{a, b}, c}

So let's take a look at that and the problems that it raises. Notice first that a and b have to be accessible to further operations by the definition of recursion. And in fact they are, because they are part of the first element, the set {a, b}. It has already been generated, so it's accessible. However, if you look further, the condition of recursion does suggest that the workspace that results from the operation that yields (3) should be as follows. If we Merge:

4) [tree diagram omitted in the source: {{a, b}, c}, the result of merging {a, b} with c]

The definition of recursion suggests that the workspace should be:

5) {{{a, b}, c}, a, b}

Because a and b are both accessible. And, as I mentioned, that's not a strong argument. But the general notion of recursion works like this. So if you're doing, say, proof theory, the axioms and every line you've generated are still there, they are part of the workspace (we don't need a workspace for proof theory, but the set created by the operation would be (5)). So that raises a question: is that the right answer? Should the result of Merge, or the improvement of Merge, be an operation on the workspace which yields (5)? Well, suppose we were to do this. Notice that first of all we violate the condition of descriptive adequacy [condition 1], and the reason is quite simple: we have (4) in the workspace and it has to be available for further operations, which can yield (6)

6) [tree diagram omitted in the source: the object in (4) extended by further operations to include a substructure X of arbitrary complexity]

X is a structure of arbitrary complexity. Since a and b are in the workspace we could merge a to X

7) [tree diagram omitted in the source: a, taken from the workspace, merged with X]

And it means that we have what amounts to a movement operation which violates every possible rule on movement, so we can't accept that. Furthermore, it violates condition 3, the condition on not expanding the workspace [compare the size of the workspace in (3) and (5)]. Another reason is that it violates determinism [condition 4]: suppose that there's some other operation going on that targets, say, a: we don't know which a it applies to. So already there are three conditions that are violated. And this goes back to Feynman's argument: we are getting somewhere because we have several independent lines of argument (each reasonable in itself) which yield the same result.
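Here is a minimal sketch in Python of how the workspace in (5) misbehaves (my own illustration; the encodings are expository assumptions):

```python
def contains(so, x) -> bool:
    """True if x occurs as a (sub)term of so."""
    return so == x or (isinstance(so, frozenset) and any(contains(m, x) for m in so))

ab = frozenset({"a", "b"})
merged = frozenset({ab, "c"})    # (4): {{a, b}, c}

ws3 = [ab, "c"]                  # (3): the workspace, two elements
ws5 = [merged, "a", "b"]         # (5): three elements

# Condition 3 violated: the workspace has expanded.
print(len(ws3), "->", len(ws5))  # 2 -> 3

# Condition 4 (determinism) violated: "a" occurs both as a workspace
# element and inside the merged object, so a rule targeting "a" is ambiguous.
print("a" in ws5, contains(merged, "a"))   # True True
```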

Well, there is a proposal, very widely used in the literature that we are talking about, which says we can overcome this by developing a new theory of movement, one which doesn't involve Internal Merge. So suppose we bar Internal Merge and develop a representational theory of movement, which just involves External Merge. That's very widely used in the multidimensionality literature. That's a terrible idea: it violates every condition you can think of, and it has its own problems. So take for example Topicalization or Left Dislocation:

8) [Mary’s book]1, John read [Mary’s book]2

This is the underlying structure. If you interpret it as Topicalization, then

[Mary's book]1 has to be identical in all respects to [Mary's book]2. So, if Mary owns the book, that's the topic, the book that Mary owns. But then what John read has to be the book that she owns, not the one that she wrote, and it's the same Mary. On the other hand, in the Left Dislocation case this isn't true; they are totally unrelated. So we can say 'as for Mary's book', that's what we're talking about, John read Mary's book; maybe it's a different Mary, maybe it's the book she owns…

In the Topicalization case, [Mary's book]1 and [Mary's book]2 need to be copies. In the Left Dislocation case, they are repetitions. Well, that's a critical distinction. And in this new representational theory, there's no way of describing it. And in the old theory, the narrow theory, there are trivial ways of describing it: simply define the concept of copy as something formed by Internal Merge, everything else is a repetition. And that captures exactly the intuition behind this, namely, that if there's something new that comes into the derivation from the outside it's a repetition, that has nothing to do with what's inside. So if I say

9) John saw John

They are different guys. If it's a copy, it's something inside the derivation. It's not adding anything new from the outside. And that's trivially computed at the phase level. So within the narrow version a simple answer can be made to work, that's all. But in the new theory there's no answer at all, other than abandoning the hope of distinguishing copies from repetitions. And that's incidentally only the beginning of a lot of other difficulties. Another difficulty with this approach is that you are barring Internal Merge. But that's the simplest possible operation. So in order to follow this approach you are saying 'the simplest operation is ruled out' for no reason. And incidentally, barring Internal Merge means losing the explanation for the ubiquity of displacement and reconstruction, which is a pretty big result. Another consequence is that the elements manipulated by External Merge can be of arbitrary complexity, they could be anything at all, which means a huge amount of extra computation. And to amplify that, this new object, as we know, has to be inserted at every point of the successive cyclic operation, because as you know there are consequences at the various points of insertion in movement: at the vP phase and the CP phase, there are both semantic and phonological output conditions (as Doug [Saddy] showed many years ago in Indonesian).31 So you have massive new computation which has to be introduced at every point of the successive cyclic operation, and it goes on like that. The result is, there's lots of loss and no gain whatsoever, because these constructions are still ruled out by other conditions. So I think that's a non-starter. But how do we approach the matter? We have to redefine Merge as an operation of replacement, so that Merge would say:32

10) Replace(a, b) by {a, b}

That’s the way Merge was defined, back in 1995,33 in the initial publications in the Minimalist Program. Incidentally, it was defined as follows

10') a. Replace(a, b) by {a, b}
b. Eliminate(a, b)

Some of the recent formalizations call it Remove. But where do you eliminate it from? Well, that was not answered. But now we have an answer: we know where it's eliminated from, it's the workspace. So we have an operation which replaces a and b by the set {a, b} and eliminates a and b from the workspace; that will overcome the problems that we discussed with the earlier example (5). But we want to do this without a new rule Eliminate. We're trying to keep to the SMT; we don't want new rules or anything like that. The simplest way to do this is the following: Suppose we have a workspace with a set of objects. From that workspace we can determine a sequence Σ:

31. Saddy (1990, 1991).
32. See also Collins (2017); Epstein, Kitahara & Seely (2015).
33. The relevant definitions are the following:
We now adopt (more or less) the assumptions of LSLT, with a single generalized transformation GT that takes a phrase marker K1 and inserts it in a designated empty position Ø in a phrase marker K, forming the new phrase marker K*, which satisfies X-bar theory. Computation proceeds in parallel, selecting from the lexicon freely at any point. At each point in the derivation, then, we have a structure Σ, which we may think of as a set of phrase markers. […] GT is a substitution operation. It targets K and substitutes K1 for Ø in K. But Ø is not drawn from the lexicon; therefore, it must have been inserted by GT itself. GT, then, targets K, adds Ø, and substitutes K1 for Ø, forming K*, which must satisfy X-bar theory. Note that this is a description of the inner workings of a single operation, GT. (Chomsky 1995: 189)
In this context, GT (Merge) is defined as a binary substitution operation, and Move-α is its singulary version. In 'Categories and Transformations' Chomsky defines GT as follows:

C(HL) must include a second procedure [other than Select] which combines syntactic objects already formed. A derivation converges only if this operation has applied often enough to leave us with just a single object, also exhausting the initial Numeration. The simplest such operation

takes a pair of syntactic objects (SOi, SOj) and replaces them by a new combined syntactic object SOij. Call this operation Merge. (Chomsky 1995: 226)

11) Σ = (X1, X2, … Xn)

That has the following properties: it's the shortest sequence such that

(i) each Xi is accessible (that's the definition of recursion)
(ii) Σ exhausts the workspace

And we can define a new operation called MERGE:

12) MERGE(Σ) = {{X1, X2}, X3, …Xn}
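Here is a minimal sketch in Python of (12) as an operation on the whole workspace (my own illustration; for simplicity it covers only the External Merge case, where X1 and X2 are root objects of the workspace, and the tuple encoding is an expository assumption):

```python
def MERGE(ws: tuple, x, y) -> tuple:
    """Map a workspace into a new one: replace x and y by {x, y}.
    Nothing is removed by a separate Eliminate/Remove rule; replacement
    does all the work, and the workspace never expands."""
    assert x in ws and y in ws, "sketch covers only EM of root objects"
    merged = frozenset({x, y})
    return (merged,) + tuple(so for so in ws if so not in (x, y))

ws = ("the", "book")            # two items taken from the Lexicon
ws = MERGE(ws, "the", "book")
print(ws, len(ws))              # ({'book', 'the'},) 1 -- EM shrinks the workspace by one
```

Internal Merge would instead select X2 from among the accessible terms of X1; on this definition it leaves the workspace the same size, as the text notes.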

(12) is a replace operation. It replaces X1 and X2, the first members of the sequence, by a set, and it doesn't have any Remove operation. If you look at this, a couple of things follow. First of all, this says that you can take any two accessible elements in the workspace, any at all, and you can MERGE them in the 'capitals' sense [i.e., in the sense of (12)], and you map the workspace into a new workspace. This happens to accommodate all of the extensions that are around, plus more than you can think of. And the next step you have to take is to make sure that the legitimate operations External Merge and Internal Merge are included. And they are: EM, in fact, yields exactly the same results; if you think it through, you'll see that EM in the old definition (now extended to a definition over the workspace) yields (12), and IM yields exactly the same thing, which makes sense, because they are the same operation; they are just two possible cases of the same operation. So, naturally, they yield the same results: Internal Merge is, again, the simplest operation, with the least search. MERGE, in this new definition, satisfies condition 2, SMT: it's the simplest computational operation you can think of on the workspace, and it excludes the bad cases. It keeps the workspace from expanding: notice that the new result is not larger than the original one. And in fact External Merge always reduces the workspace by one; Internal Merge keeps it the same. So we are not violating condition 3, as the workspace is not expanding indefinitely. Now, there's an obvious qualification here: sentences can be longer and longer, so there's got to be some way of building up the workspace. The minimal way to extend the workspace, therefore the one we want to keep to, is to take two lexical items a, b out of the Lexicon (it doesn't matter what the Lexicon is; certainly not words, but that's been recognized since LSLT) and form from them the set {a, b}. If the Borer-Marantz idea about roots and categorization of roots is correct,34 then

34. Marantz (1997: 215) defines this position as follows:
Roots like √DESTROY and √GROW […] are category neutral, neutral between N and V. When the roots are placed in a nominal environment, the result is a "nominalization"; when the roots are placed in a verbal environment, they become verbs.
These 'environments' are defined shortly after:
Among the functional heads in whose environments roots become verbs (these may be "aspectual" in some sense), one, call it "v-1," projects an agent while another, call it "v-2," does not. These little "v's" could be different flavors of a single head, or perhaps there is some unified

these will always be composed of a root and a categorizer, for example n, v, that says what category it is. In any event, two things: notice that it's not enough to take just one of them out of the Lexicon, because that will just External Merge with whatever we have and it won't add anything. But if you pick these two things out of the Lexicon, then you can go on to build a new syntactic object. So that's the minimal way to allow the workspace to grow, if you can get away with it, of course, and still keep to the SMT. Other than that, the operations themselves never expand the workspace, so that's satisfying the third condition. I'll mention some things that have to be done. You have to show that the two legitimate operations, EM and IM (which are the same operation), satisfy determinacy [condition 4]. That's not trivial, but if we look at it, there's an answer as to how they satisfy determinacy. Then, you have to look at other cases. I'll just mention a couple I won't go through. So, for example, one case that follows from the general framework looks like this:

13) [tree diagram omitted in the source: a single syntactic object containing X and Y as subparts]

And we Merge {X, Y}. You take two things inside the syntactic object, and you Merge them. I don't think anybody has suggested that, but it falls within the general framework. And if you think it through, you'll see that it violates all the conditions that the simplest case violates. So now let's take one that is in the literature, for which there has been quite a lot of descriptive work, and that is Parallel Merge. So we have an object

14) [tree diagram omitted in the source: a syntactic object containing Y as a subpart]

That's one. And we have some other one, call it Z, and that's the whole workspace. And then we Merge:

account that could have a single head optionally project an agent and thus cover both v-1 and v-2. (Marantz 1997: 217)
Conversely, the 'nominalizing environment' is taken in Marantz (1997: 218) to be the domain of D. More recent work has proposed a 'little x' for each environment, such that v is a verbalizer, n is a nominalizer, etc. (see Borer 2005a, b). The root (pun intended) of the transformational approach to nominalizations, which is essential to understand these proposals, is to be found in Lees (1960).

15) [tree diagram omitted in the source: Y, a term of the first object, merged with Z, forming {Y, Z}]

Forming {Y, Z}. That’s what’s called Parallel Merge. And it’s usually described as Internal Merge, so that Y is internally merging to Z. But that’s not Internal Merge. It’s some new kind of Merge which doesn’t fit the narrow version. It’s usually written like this:

16) [multidominance diagram omitted in the source: Y with two mothers, dominated both within the original object and by the new node formed with Z]

That's the notation used in the Multidominance literature. But that notation doesn't mean anything; there's no such object. In fact, in general, trees are an extremely misleading notation: the root may have nothing there, there may not be a root. And this temptation to draw all sorts of lines, that's all over the place, and much of it can't be reduced to Merge. I won't go through this today; I'll leave it as an exercise. But (16) has exactly the problems of the trivial case; it violates all the same conditions. That, incidentally, eliminates all the work on merge-based Multidominance, which has led to very interesting results, ATB, all these things. But they are left as problems. And again, if you think about Richard Feynman's observation, that's what we are finding, case by case. Well, pretty much the same is true of the other cases; Sidewards Merge is even more complicated, and there are lots of other problems. But as far as I know, every case that's in the literature aside from the narrow version, and other cases that you can dream up that aren't in the literature, violates all the conditions that I've mentioned. Again, I'll leave it as an exercise. But in what was intuitively the old version of EM and IM (intuitively, because it was never stated precisely), with the notions Replace and Eliminate, the question 'Eliminate from what?' was never answered. And then when you answer it, you conclude that the operation Merge has to be redefined as an operation on the workspace, one which doesn't have this word eliminate and meets the conditions we listed. And I think that converges down to what pretty much was assumed intuitively. Which means, in fact, that for practical purposes (not being precise) you can use the old definition, which you know is wrong, because the new one is the only definition, I think, which satisfies a range of legitimate general desiderata on what every operation ought to be. So all of those background conditions are a framework within which any computational operation for language must be selected. And it happens, I think, to work out that that was what we were using intuitively in the narrow version. Well, that leaves a couple of big problems, first of all showing that what I just said is correct. Which would mean solving a lot of problems which I am leaving for you to solve.

But what remains, and is clearly a big problem, is taking everything that's been described in the extensions of Merge, like everything in the Multidominance theory and all the rest, and showing that there is a legitimate way to pay off those promissory notes – I think if you look at that there may be some interesting ways. So there's lots of work to do. I'll stop there.

References

Anderson, Stephen R. 1969. West Scandinavian Vowel Systems and the Ordering of Phonological Rules. PhD dissertation, MIT.
Berwick, Robert. 1985. The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
Berwick, Robert C. & Noam Chomsky. 2016. Why Only Us: Language and Evolution. Cambridge, Mass.: MIT Press.
Borer, Hagit. 2005a. In Name Only. Structuring Sense, Volume I. Oxford: Oxford University Press.
Borer, Hagit. 2005b. The Normal Course of Events. Structuring Sense, Volume II. Oxford: Oxford University Press.
Borer, Hagit. 2017. The Generative Word. In James McGilvray (ed.). The Cambridge Companion to Chomsky. 2nd Ed. Cambridge: CUP, 110-133.
Chomsky, Noam. 1955. The Logical Structure of Linguistic Theory. Mimeographed, MIT.
Chomsky, Noam. 1959. On Certain Formal Properties of Grammars. Information and Control 2: 137-167.
Chomsky, Noam. 1973. Conditions on Transformations. In Stephen R. Anderson & Paul Kiparsky (eds.). A Festschrift for Morris Halle. New York: Holt, Rinehart and Winston, 232-286.
Chomsky, Noam. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 2000. Minimalist Inquiries: The Framework. In Roger Martin, David Michaels & Juan Uriagereka (eds.). Step by Step – Essays in Minimalist Syntax in Honor of Howard Lasnik. Cambridge, Mass.: MIT Press, 89-155.
Chomsky, Noam. 2004. Beyond Explanatory Adequacy. In Adriana Belletti (ed.). Structures and Beyond: The Cartography of Syntactic Structures, Volume 3. Oxford: Oxford University Press, 104-131.
Chomsky, Noam. 2005. Three Factors in Language Design. Linguistic Inquiry 36(1): 1-22.
Chomsky, Noam. 2008. On Phases. In Robert Freidin, Carlos P. Otero & Maria Luisa Zubizarreta (eds.). Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud. Cambridge, Mass.: MIT Press, 133-166.
Chomsky, Noam. 2013. Problems of Projection. Lingua 130: 33-49.
Chomsky, Noam. 2015a. Problems of Projection: Extensions. In Elisa Di Domenico, Cornelia Hamann & Simona Matteini (eds.). Structures, Strategies and Beyond: Studies in honour of Adriana Belletti. Amsterdam: John Benjamins, 1-16.
Chomsky, Noam. 2015b. The Sophia Lectures. Sophia Linguistica 64. Sophia University, Tokyo.
Citko, Barbara. 2005. On the Nature of Merge: External Merge, Internal Merge, and Parallel Merge. Linguistic Inquiry 36(4): 475-496.
Collins, Chris. 2017. Merge(X, Y) = {X, Y}. In Leah Bauke & Andreas Blühmel (eds.). Labels and Roots. Berlin: Walter de Gruyter, 47-68.
Dowty, David. 1979. Word Meaning and Montague Grammar. Dordrecht: Reidel.
Epstein, Samuel D., Hisatsugu Kitahara & T. Daniel Seely. 2014. Labeling by Minimal Search: Implications for Successive-Cyclic A-Movement and the Conception of the Postulate "Phase". Linguistic Inquiry 45(3): 463-481.
Epstein, Samuel D., Hisatsugu Kitahara & T. Daniel Seely. 2015. Simplest Merge Generates Set Intersection: Implications for Complementizer-Trace Explanation. In Samuel Epstein, Hisatsugu Kitahara & T. Daniel Seely. Explorations in Maximizing Syntactic Minimization. New York: Routledge, 175-194.

Fodor, Jerry A. 1975. The Language of Thought. Cambridge, Mass.: Harvard University Press.
Fodor, Jerry A. 2008. LOT 2: The Language of Thought Revisited. Oxford: Oxford University Press.
Fox, Danny. 2002. Antecedent-Contained Deletion and the Copy Theory of Movement. Linguistic Inquiry 33(1): 63-96.
Goodman, Nelson. 1968. Languages of Art: An Approach to a Theory of Symbols. New York: The Bobbs-Merrill Company.
Greibach, Sheila. 1965. A New Normal-Form Theorem for Context-Free Phrase Structure Grammars. Journal of the ACM 12(1): 42-52.
Halle, Morris & Noam Chomsky. 1960. The morphophonemics of English. MIT RLE Quarterly Progress Report 58: 275-281.
Huybregts, M. A. C. (Riny). 2017. Phonemic clicks and the mapping asymmetry: How language emerged and speech developed. Neuroscience & Biobehavioral Reviews 81: 279-294.
Katz, Jerrold J. & Paul Postal. 1964. An Integrated Theory of Linguistic Descriptions. Cambridge, Mass.: MIT Press.
Kiparsky, Paul. 1973. ‘Elsewhere’ in Phonology. In Stephen R. Anderson & Paul Kiparsky (eds.). A Festschrift for Morris Halle. New York: Holt, Rinehart and Winston, 93-106.
Kuroda, Sige-Yuki. 1964. Classes of languages and Linear-Bounded Automata. Information and Control 7: 207-223.
Lakoff, George. 1965. On the Nature of Syntactic Irregularity. PhD dissertation, Indiana University.
Lakoff, George. 1968. Instrumental Adverbs and the Concept of Deep Structure. Foundations of Language 4(1): 4-29.
Lakoff, George. 1971. On Generative Semantics. In Danny D. Steinberg & Leon A. Jakobovits (eds.). Semantics: An Interdisciplinary Reader in Philosophy, Linguistics, Anthropology and Psychology. Cambridge: Cambridge University Press.
Lakoff, George & John Robert Ross. 1967 [1976]. Is Deep Structure Necessary? Personal letter to Arnold Zwicky, printed in James D. McCawley (ed.). Syntax and Semantics 7: Notes from the Linguistic Underground. New York: Academic Press, 159-164.
Lasnik, Howard & Mamoru Saito. 1984. On the nature of proper government. Linguistic Inquiry 15: 235-289.
Lasnik, Howard & Juan Uriagereka. 2005. A Course in Minimalist Syntax. London: Wiley-Blackwell.
Lebeaux, David. 1988. Language acquisition and the form of the grammar. PhD dissertation, University of Massachusetts, Amherst.
Lees, Robert. 1960. The grammar of English nominalizations. Bloomington: Indiana University Press.
Levine, Robert D. 1985. Right node (non-)raising. Linguistic Inquiry 16(3): 492-497.
Marantz, Alec. 1997. No Escape from Syntax: Don’t Try Morphological Analysis in the Privacy of Your Own Lexicon. In A. Dimitriadis, L. Siegel, C. Surek-Clark & A. Williams (eds.). Proceedings of the 21st Penn Linguistics Colloquium, UPenn Working Papers in Linguistics, 201-225.

McCawley, James D. 1968. Lexical insertion in a transformational grammar without deep structure. CLS 4: 71-80. Cited from the reprint in McCawley, James D. 1973. Grammar and Meaning. Tokyo: Taishukan, 155-166.
McCawley, James D. 1982. Parentheticals and Discontinuous Constituent Structure. Linguistic Inquiry 13(1): 91-106.
Morin, Yves-Charles & Michael O’Malley. 1969. Multi-rooted vines in semantic representation. In Robert Binnick et al. (eds.). Papers from the Fifth Regional Meeting of the Chicago Linguistic Society. Chicago: University of Chicago, 178-185.
Nunes, Jairo. 2004. Linearization of Chains and Sideward Movement. Cambridge, Mass.: MIT Press.
Peters, Stanley & R. W. Ritchie. 1981. Phrase Linking Grammars. Ms., Stanford University and Palo Alto, California.
Post, Emil. 1943. Formal Reductions of the General Combinatorial Decision Problem. American Journal of Mathematics 65(2): 197-215.
Post, Emil. 1944. Recursively enumerable sets of positive integers and their decision problems. Bulletin of the American Mathematical Society 50: 284-316.
Postal, Paul M. 1964. Constituent Structure. Bloomington: Indiana University.
Postal, Paul M. 1974. On Raising. Cambridge, Mass.: MIT Press.
Rosenbaum, Peter. 1965. The Grammar of English Predicate Complement Constructions. PhD dissertation, MIT.
Saddy, Douglas. 1990. Investigations into Grammatical Knowledge. PhD dissertation, MIT.
Saddy, Douglas. 1991. Wh-scope mechanisms in Bahasa Indonesia. In Lisa Lai-Shen Cheng & Hamida Demirdache (eds.). MIT Working Papers in Linguistics 15, 183-218.
Sampson, Geoffrey. 1975. The Single Mother Condition. Journal of Linguistics 11(1): 1-11.
Stepanov, Arthur. 2001. Late adjunction and Minimalist Phrase structure. Syntax 4(2): 94-125.
Turing, Alan M. 1952. The Chemical Basis of Morphogenesis. Philosophical Transactions of the Royal Society of London B 237(641): 37-72.
von Neumann, Johann. 1923. Zur Einführung der transfiniten Zahlen. Acta litterarum ac scientiarum Regiae Universitatis Hungaricae Francisco-Josephinae, Sectio scientiarum mathematicarum 1: 199-208.
Yang, Charles. 2016. The Price of Linguistic Productivity: How children learn and break rules of language. Cambridge, Mass.: MIT Press.


Epilogue*
Generative Syntax: Questions, Crossroads, and Challenges

Ángel J. Gallego
Universitat Autònoma de Barcelona
[email protected]

Dennis Ott
University of Ottawa
[email protected]

Borges said somewhere that an epilogue is easier to write than an introduction. If nothing else, a coda lets readers judge the volume for themselves, without raising any expectations as to what (not) to expect, while allowing for post-hoc reflections.

This special issue of the Catalan Journal of Linguistics grew out of a workshop titled Generative Syntax: Questions, Crossroads, and Challenges (GenSyn), which took place at the Universitat Autònoma de Barcelona in the summer of 2017. The declared goal of the event was to reconsider the questions that motivate Generative Grammar (GG) and the answers given thus far, the crossroads that its practitioners must navigate, and the challenges they face given the current state of the field. These are the issues addressed by the papers collected here.

As readers of the volume will no doubt have noticed, the contributions are highly heterogeneous in character. This is a direct reflection of the wide range of perspectives and questions voiced during the workshop, which was loosely defined as an open forum for the exchange of ideas, free from the strictures of traditional formats. We thank all of our contributors for accepting our invitation to Barcelona and for their willingness to be thrown in at the deep end.

With two exceptions (the contribution by Chomsky et al. and the transcription of Chomsky’s lecture at the University of Reading), the papers in this volume were presented as talks at the GenSyn workshop, an attempt to go beyond what a similarly motivated, earlier event (Generative Syntax in the Twenty-first Century: The Road Ahead, Athens 2015) had covered.

* ÁJG acknowledges support from the Ministerio de Economía y Competitividad (FFI2014-56968-C4-2-P and FFI2017-87140-C4-1-P), the Generalitat de Catalunya (2014SGR-1013 and 2017SGR-634), and the Institució Catalana de Recerca i Estudis Avançats (ICREA Acadèmia 2015). DO acknowledges support from the Social Sciences and Humanities Research Council (430-2018-00305). We would like to dedicate this volume to the memory of Roger Martin, who left us before it was completed. We believe Roger embodied, better than most of us, the kinds of meta-theoretical concerns and interdisciplinary openness that the workshop aimed to foster.


The goal of both events was to evaluate and reconsider the insights and prospects of GG.

For the last sixty-odd years, GG has been the dominant approach in the formal study of human language within the cognitive sciences. Drawing in equal parts on classical ideas (e.g. the infinite character of natural language) and then-novel developments (e.g. formal-language theory), the generative revolution of the 1950s provided the basis for a new wave of investigations that gave rise to significant theoretical and empirical discoveries, establishing fertile ground for synergies with other disciplines such as computer science, psychology, biology, and mathematics. While the interest in GG from some of these disciplines has waned over the decades as the field became more and more impenetrable from the outside, other, novel connections have emerged, such as the recent surge of neurolinguistic research addressing issues germane to GG.

Ultimately, it seems to us that further advancement of the enterprise will only be possible if it can find a common vocabulary with related fields (as well as with different schools of thought within our own field), ask questions that connect meaningfully with those raised by other disciplines, and accept that a complex phenomenon such as language can, and must, be studied from a variety of perspectives. For the most part, however, this vision of cross-disciplinary collaboration remains wishful thinking.

The above considerations highlight but one of the many challenges and conceptual crossroads that GG is facing today. The GenSyn workshop was intended as an invitation, especially to young researchers, to reflect on the current state of the field several decades after the revolution. In an increasingly specialized field, where many of the questions and insights at the core of the discipline form the conceptual backdrop of the work carried out but are seldom addressed explicitly, we thought it beneficial to step back and evaluate where actual progress has been made and what new questions this progress has raised in turn.

We are not sure that we succeeded in realizing this vision; but we hope that this issue conveys to the reader a sense of how much has been achieved and how much remains to be done, and that achievements and questions alike should strengthen our appreciation of the enterprise as a whole.
