Compositionality and Syntactic Generalizations

Jan Odijk

Dissertation submitted to obtain the degree of Doctor at the Katholieke Universiteit Brabant, on the authority of the Rector Magnificus, Prof. Dr. L.F.W. de Klerk, to be defended in public before a committee appointed by the College of Deans, in the auditorium of the University on Friday 12 November 1993 at 14.15 hours, by Johannes Engelbertus Josephus Maria Odijk, born in Schiedam.

Supervisors (promotores): Prof. Dr. H.C. van Riemsdijk, Prof. Ir. S.P.J. Landsbergen

The work described in this thesis has been carried out at the Philips Research Laboratories in Eindhoven, the Netherlands, as part of the Philips Research program.

Acknowledgements

This thesis is one of the results of research conducted in the Rosetta machine translation project, which was carried out at Philips Research Laboratories. I am grateful to Jan Landsbergen and Henk van Riemsdijk for their supervision. Furthermore, I would like to thank Jan for giving me the opportunity to work at Philips and for supplying the opportunities to write theses in general and this one in particular, and Henk for waiting so long for my original thesis, finally receiving something completely different from what he must have expected.

I would also like to thank all members of the Rosetta team, and the students who visited us. Some of them deserve special mention. Agnes Mijnhout initiated the research on predicate-argument relations, on which we report in chapter 4 of this book, and Angeliek van Hout made an initial study of the problems which R-pronouns pose in the Rosetta framework. My room mates André Schenk and later Joep Rous had a very positive influence on me and my work through the stimulating discussions we had on our work and other topics. And I would like to thank Lisette Appelo. The often intense discussions with her and her comments on earlier versions of this book improved it considerably.
I am obliged to Lex Augusteijn and Wijbrand Siedenburg for their invaluable TeXnical and PostScript support in preparing the final document.

I would also like to thank my parents, in particular my father, who sadly cannot be here to witness this doctoral defence. They always gave me every opportunity to study and encouraged me to do so. Finally, I thank Margriet. She always kept believing (and insisting) that I would obtain my doctorate. And so it is going to happen after all.

Jan Odijk

Contents

1 Introduction
    1.1 Formulation of the Problem
    1.2 Syntax and Semantics
    1.3 Theoretical and Application-Oriented Grammars
        1.3.1 Artifacts
        1.3.2 Descriptive Adequacy
        1.3.3 Broad Coverage
        1.3.4 Formal Differences
    1.4 Organization of this Book
2 The Grammatical Framework
    2.1 Compositional Grammars
    2.2 S-trees
    2.3 Free M-grammars
        2.3.1 Description
        2.3.2 Example M-grammar and Example Derivation
    2.4 Controlled M-grammars
    2.5 Translation Relation
    2.6 Translation System
        2.6.1 Description of the System
        2.6.2 Example Translation
    2.7 Some Specific Assumptions
    2.8 Concluding Remarks
3 Compositionality and Syntactic Generalizations
    3.1 Introduction
    3.2 Auxiliaries and Inversion in English
        3.2.1 The Problem
        3.2.2 A Solution
    3.3 Mood in Dutch
    3.4 Order variants
    3.5 Wh-movement
    3.6 Generic Sentences
    3.7 A Different Kind of Modularization
    3.8 Concluding Remarks
4 Predicate-Argument Relations
    4.1 Introduction
    4.2 Predicate-Argument Relations and Compositionality
    4.3 Arguments
        4.3.1 Argument-Ordering Convention
        4.3.2 External and Internal Arguments
        4.3.3 Attribute-Value Pairs to Specify Arguments
    4.4 Covert Arguments
    4.5 Bound Adjuncts and Adverbials
    4.6 Small Clauses
    4.7 Systematic Relations between Predicates
    4.8 Concluding Remarks
5 Some Constructions
    5.1 Introduction
    5.2 Passivization
    5.3 Verb Second in Dutch
    5.4 Unbounded Dependencies
    5.5 Crossing Dependencies in Dutch
        5.5.1 Outline of the Analysis
        5.5.2 Verb-Raising Transformations
        5.5.3 Pruning
        5.5.4 Surface Grammar
        5.5.5 Concluding Remarks
6 R-pronouns
    6.1 Introduction
    6.2 Some Relevant Facts
    6.3 The Functions of R-pronouns
        6.3.1 Expletive Function
        6.3.2 Prepositional R-pronouns
        6.3.3 Locative R-pronouns
        6.3.4 Quantificational R-pronouns
    6.4 The Distribution of R-pronouns
        6.4.1 General Discussion
        6.4.2 Global Characterization of the Results
    6.5 The Assumptions in More Detail
    6.6 Illustration
    6.7 Concluding Remarks
7 Concluding Remarks
    7.1 Conclusions
    7.2 Topics for Further Research
Samenvatting (summary in Dutch)
Curriculum Vitae

Chapter 1 Introduction

1.1 Formulation of the Problem

The central problem of this dissertation is the question of how syntactic generalizations can be adequately captured in a compositional framework. This problem will be investigated within the controlled M-grammar formalism. I will describe how a number of complex syntactic constructions have been dealt with in this formalism, which has been used in developing the Rosetta machine translation system. In particular, I will show that these syntactic constructions have been dealt with in a syntactically adequate manner in a framework which is compositional in nature, and where consequently the grammar has a strong semantic bias.

The syntactic generalizations that I am mainly interested in here relate to the fact that many constructions can be described most adequately by a conglomerate of construction-independent rules. This construction-independence of syntactic rules, and the relation between syntax and semantics, will be discussed in more detail in section 1.2.
In addition, the research has been carried out in the context of research into machine translation, i.e. application-oriented research. Application-oriented research differs from purely theoretical research in a number of respects. Some of the differences will be presented in more detail in section 1.3.

The general conclusions of this study are (1) that the grammatical formalism used, controlled M-grammar, supplies, due to its compositional nature, a firm framework to deal with certain phenomena, especially when they relate fairly directly to semantics (e.g. predicate-argument relations); (2) that the framework makes it possible to incorporate analyses in which constructions are created by a conglomerate of construction-independent rules; (3) that it is possible to incorporate and extend syntactically adequate descriptions based on insights from theoretical linguistics into this compositional framework in a fairly direct manner. It will, however, also become clear that many improvements of the framework, or of specific linguistic analyses within it, are still possible.

In the Rosetta translation system, grammars have been developed for the languages Dutch, English and Spanish. The general conclusions presented will therefore be established on the basis of examples from these languages which have actually been implemented, with an emphasis on Dutch.

1.2 Syntax and Semantics

The aim of this book is to show that a number of complex syntactic phenomena can be (and have been) dealt with in a compositional grammar by means of rules which are not construction-specific. The most adequate syntactic description of a construction often turns out to be a description in which the construction arises as a consequence of the interaction of a number of different syntactic rules, most of which play a role in the formation of a wide variety of other constructions as well and are therefore not specific to one construction.
I have attempted to achieve such descriptions in the Rosetta machine translation system as often as possible.

The attempt to make rules of grammars construction-independent is one aspect of the so-called Move α program as pursued by Chomsky and other theoretical linguists. Another aspect of this program is the attempt to replace language-specific rules by language-independent principles. In fact, within this program, the thesis that there are no construction-specific and no language-specific rules, for a substantial core of the grammar, is pushed to its limits. As will become clear later, many analyses of constructions within the Rosetta system have been inspired by analyses of these constructions proposed within the Move α program, though the emphasis was on developing analyses which use construction-independent but language-specific rules. The issue of language-specificity will be briefly mentioned, and the conclusion is that improvements are possible. A way of dealing with this aspect is suggested, but the analyses to be presented are all formulated in terms of language-specific rules. A comparison with other possible analyses of the same facts from computationally oriented frameworks is made incidentally.

It is certainly not the case that I attempt to describe a direct implementation of a grammar constructed in accordance with the Move α program. On the contrary, an attempt was usually made to extract a number of key features of a specific analysis and to express these in the analysis developed. There are several reasons for not directly implementing such a system. First, the aim is to develop adequate syntactic analyses in a compositional framework. The compositionality of the framework imposes restrictions which do not hold in a pure Move α framework.
