Serbo-Croatian Word Order: a Logical Approach
Total Page:16
File Type:pdf, Size:1020Kb
Serbo-Croatian Word Order: A Logical Approach Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Vedrana Mihaliˇcek,B.A., M.A. Graduate Program in Department of Linguistics The Ohio State University 2012 Dissertation Committee: Carl Pollard, Advisor Brian Joseph Michael White Table of Contents Page List of Tables . iv 1. Introduction . 1 2. Framework . 2 2.1 Background . 2 2.2 Phenogrammar . 5 2.2.1 Terms and Types . 5 2.2.2 Functions . 7 2.3 Tectogrammar . 14 2.3.1 Preliminaries . 14 2.3.2 Representing Inflectional Features . 16 2.3.3 N and NP type families . 17 2.3.4 S type family . 17 2.3.5 ( types . 18 2.3.6 ∏ types . 19 2.4 Semantics . 21 2.4.1 Preliminaries . 21 2.4.2 Entailment . 21 2.4.3 Types . 22 2.5 Putting it all together . 24 2.5.1 Signs . 24 2.5.2 Rules . 26 i 3. Basic Word Order . 28 3.1 Introduction . 28 3.2 Data . 29 3.2.1 Lexical Noun Phrases . 29 3.2.2 Phrasal Noun Phrases . 36 3.2.3 Adverbial Modifiers . 42 3.3 Analysis . 45 3.3.1 Lexical Noun Phrases . 45 3.3.2 Phrasal Noun Phrases . 53 3.3.3 Adverbial Modifiers . 69 3.4 Conclusion . 75 4. Embedding, Predicative and Control . 79 4.1 Introduction . 79 4.2 Embedded Declarative Clauses . 80 4.2.1 Data . 80 4.2.2 Analysis . 83 4.3 Predicatives . 86 4.3.1 Data . 86 4.3.2 Analysis . 90 4.4 Subject and Object Control . 102 4.4.1 Data . 102 4.4.2 Analysis . 111 4.5 Conclusion . 120 5. Enclitics . 122 5.1 Introduction . 122 5.2 Data . 123 5.2.1 Order . 123 5.2.2 Placement . 125 5.3 Analysis . 129 5.3.1 Preliminaries . 129 5.3.2 Pronominal Clitics . 132 5.3.3 The Inherent Reflexive se . 136 ii 5.3.4 The True Reflexive se . 137 5.3.5 Auxiliaries . 146 5.3.6 1C and 1W placement . 163 5.4 Conclusion . 168 6. Interrogatives . 170 7. Conclusion . 171 Bibliography . 172 iii List of Tables Table Page 2.1 Phenogrammatical Types and Non-Logical Constants. 8 2.2 Phenogrammatical Logical Constants. 14 2.3 Phenogrammatical Functions. 15 2.4 Typesetting conventions. 25 3.1 Examples of inflectional feature combinations on lexical noun phrases. 30 3.2 Summary of noun and noun phrase types. 76 4.1 The verb biti ‘be’ paradigm. 86 4.2 The verb htjeti ‘want, will’ paradigm. 103 5.1 The order of enclitics in the cluster. 130 5.2 Tectogrammatical types of enclitics. 131 5.3 Interim summary evaluation of grammar with respect to enclitic cluster placement. 164 iv Chapter 1: Introduction 1 1 Chapter 2: Framework 2 2.1 Background 3 In this chapter, we describe the framework that our theory of Serbo-Croatian 4 grammar is expressed in. Our framework is closely related to Lambda Grammar 5 (Muskens , 2003, 2007b) and ACG (de Groote , 2001). Variants of this framework, 6 which we view as a descendant of Higher-Order Grammar (Pollard , 2004) and 7 CVG (Pollard , 2011), have previously been called Linear Logic Based Grammar 8 (Mihaliˇcek, 2010b) and Pheno-Tecto Distinguished Categorial Grammar (origi- 9 nally in Smith (2010); also Mihaliˇcekand Pollard (to appear)). 10 While obviously natural language expressions have both purely combinatorial 11 syntactic properties (what sorts of arguments do they require? what sorts of ex- 12 pressions can they be arguments of?), and ordering properties (do they have to 13 occur in some specific place in a clause or not? do they have to occur immediately 14 to the left or right of some other expressions or not?), in many logical frameworks 15 these two sets of properties are represented jointly, by a single component of the 16 framework. 2 17 For example, in many mainstream versions of Categorial Grammar (Multi- 18 modal Combinatory Categorial Grammar (MMCCG; Baldridge (2002)), Multi- 19 modal Categorial Type Logics (MMCTL; Moortgat (1997); Bernardi (2002); Ver- 20 maat (2005) ) and Multimodal Type Logical Grammar (MMTL; Morrill and Solias 21 (1993); Morrill (1994); Morrill et al. (2007)), and non-multimodal versions of these 22 frameworks), both combinatorial syntax and word order of expressions are repre- 23 sented by a single component of the framework which is formalized in Lambek 24 Calculus. This results in inflexibility when it comes to dealing with freer word 25 order. We see the development of multimodal versions of these frameworks as 26 an attempt to compensate for the inflexibility that stems from representing com- 27 binatorial syntax and word order jointly, in the same part of the theory (see also 28 (Muskens , 2003)). 29 Another approach to dealing with the same problem, exemplified by our and 30 related frameworks, is simply to give up the assumption that word order and 31 combinatorial syntax should be represented by a single grammatical component. 32 We will ambiguously refer to these two sets of properties of linguistic expressions, 33 but also the components of the theory that represent these sets of properties, as 34 phenogrammar and tectogrammar respectively. This terminology originates to 35 the best of our knowledge with Curry (1961). 36 Since Curry’s original paper, many attempts have been made to dissociate syn- 37 tactic combinatorics from linear order (e.g. Dowty (1996); Reape (1993, 1996); 3 38 Kathol (2000); Muskens (2003, 2007b); see also linear precedence rules of Pollard 39 and Sag (1987) and GPSG (Gazdar et al. , 1985)), so this approach is not new. The 40 separation of phenogrammar and tectogrammar has proven useful and important 41 for elegantly analyzing phenomena such as quantifier scoping and medial extrac- 42 tion in English (Oehrle , 1994), and German word order (Reape , 1993; Kathol , 43 2000). Mihaliˇcekand Pollard (to appear) argue that the separation of phenogram- 44 mar and tectogrammar considerably simplifies the analysis of interrogatives in 45 English and Chinese, bringing out the underlying tectogrammatical similarities 46 between the two languages and identifying phenogrammar as the locus of cross- 47 linguistic variation with respect to different question forming strategies. 48 Essentially, the division of labor between phenogrammar and tectogrammar 49 allows complex word order facts to be described somewhat independently in 50 phenogrammar, without unnecessarily complicating the tectogrammar. This is 51 important for describing a language such as Serbo-Croatian, which often allows 52 semantically and syntactically insignificant reordering of expressions and discon- 53 tinuities of various constituents. 54 So, our framework consists of three term/type calculi, which independently 55 represent phenogrammar, tectogrammar and semantics. Each linguistic expres- 56 sion is then represented in our theory as a triple of term/type pairs, corresponding 57 to the representation of that expression’s word order and combinatorial syntactic 58 properties, and its meaning. 4 59 In the remainder of this chapter, we describe each component separately and 60 then sketch how the three components work together. 61 2.2 Phenogrammar 62 2.2.1 Terms and Types 63 To distinguish between different phenogrammatical (ordering) properties of 64 expression, we assign them to different types. At the outset, we must distinguish 65 between phonological words, and clitics. We assign all phonological words to 66 type p and clitics to type c. For example, a phonological word such as the name 67 Marko, and a clitic, such as the auxiliary sam ‘am’ are represented in the grammar 68 as follows: 69 (1) a. ` marko : p 70 read as: ‘marko is a term of type p’, 71 i.e. the expression Marko is a phonological word 72 b. ` sam : c 73 read as: ‘sam is a term of type c’, 74 i.e. the expression sam is a clitic 75 Later we will define functions that represent cliticization, i.e. the construction of 76 a larger phonological word out of a clitic and a phonological word. 77 However, simply distinguishing between clitics and phonological words is 78 not sufficient to adequately describe the empirical domain. Phonological words 79 and sequences of phonological words in Serbo-Croatian differ in terms of how 80 freely they can order with respect to other constituents. So we introduce other 81 phenogrammatical types which help us encode these distinctions. 5 82 First, every phonological word can be viewed as a length one string of phono- 83 logical words. We introduce a type s for all strings of phonological words, i.e. p- 84 strings. So, the expression Marko, in addition to being represented as ` marko : p, 85 also corresponds to the string ` markos : s. While we can build strings out of 86 phonological words, we cannot do so with the clitics. 87 Second, for any string of phonological words, we can define a set of strings 88 whose member is exactly that string. Note that sets of strings are typically called 89 languages in the formal language theory setting. So a set of p-strings is called a 90 p-language. The type of sets of p-strings is s ! t, abbreviated as S. Here t is just 91 the type of truth values. So starting with ` markos : s, we can define: 92 (2) ` ls.s = markos : S 93 This term denotes a set of strings which contains the string markos, i.e. {markos}. 94 We will abbreviate such terms of type S as follows: 95 (3) ` MARKO : S 96 where MARKO =de f ls.s = markos 97 Now, we don’t stop here. In order to give an adequate description of the complex- 98 ities of Serbo-Croatian word order, we build up even more complex types so we 99 can make more fine grained distinctions in the grammar with respect to ordering 100 possibilities.