Proceedings of the Society for Computation in Linguistics
Volume 2 Article 6
2019 Modeling Clausal Complementation for a Grammar Engineering Resource Olga Zamaraeva University of Washington, [email protected]
Kristen Howell University of Washington, [email protected]
Emily M. Bender University of Washington, [email protected]
Follow this and additional works at: https://scholarworks.umass.edu/scil Part of the Computational Linguistics Commons
Recommended Citation Zamaraeva, Olga; Howell, Kristen; and Bender, Emily M. (2019) "Modeling Clausal Complementation for a Grammar Engineering Resource," Proceedings of the Society for Computation in Linguistics: Vol. 2 , Article 6. DOI: https://doi.org/10.7275/dygn-c796 Available at: https://scholarworks.umass.edu/scil/vol2/iss1/6
This Paper is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Proceedings of the Society for Computation in Linguistics by an authorized editor of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. Modeling Clausal Complementation for a Grammar Engineering Resource
Olga Zamaraeva Kristen Howell Emily M. Bender University of Washington University of Washington University of Washington [email protected] [email protected] [email protected]
Abstract because of their interaction with other phenomena. A grammar that cannot support subordinate clauses We present a grammar engineering library will fail to provide analyses for a big portion of a for modeling objectival declarative clausal typical corpus and will not support the grammar complementation patterns attested cross- engineer in exploring how well the analyses used linguistically. Our primary contribution is positing a set of syntactico-semantic analyses for modeling simple clauses generalize. couched within a variant of the HPSG syntactic To fill this gap, we incorporate a cross-linguistic formalism and integrating them with a variety account of clausal complements into a grammar of phenomena already implemented in a gram- engineering questionnaire and customization sys- mar engineering toolkit. We evaluate the addi- tem. The questionnaire elicits typological infor- tion to the system on testsuites from genetically mation about hypothetically any language and the diverse languages that were not considered dur- customization system outputs a starter precision ing development. grammar to specifications. In this way, the toolkit 1 Introduction supports the rapid development of precision gram- mars, giving both novice and experienced grammar Grammar engineering is the modeling of language developers a means to create a grammar fragment rules in a machine-readable fashion, such that the customized to their language of interest and ready resulting system (the grammar) can parse gram- for extension to broader coverage. matical strings while not overgenerating (licensing In the meta-grammar engineering context, an incorrect or spurious analyses). Linguistic gram- analysis is a set of theoretically-grounded elements mar engineering prioritizes precision (how many which comply with the requirements of the spe- parses are syntactically and semantically correct cific implementation framework and are used by with respect to the linguistic formalism) over re- the grammar engineering system so as to produce call (how many input strings get parsed). functioning grammars. In our case, the analysis A precision grammar documents and models is couched within Head-Driven Phrase Structure linguistic phenomena on the one hand, and is a pro- Grammar (HPSG; Pollard and Sag, 1994) and gram that parses and generates, on the other. This Minimal Recursion Semantics (MRS; Copestake makes precision grammars a resource for rigorous et al., 2005) and is implemented as an extension to linguistic hypothesis testing as well as for NLP the LinGO Grammar Matrix (Bender et al., 2002, tasks. While precision grammars are expensive— 2010). Our main challenges lie in accounting for broad-coverage grammars like Flickinger 2000, a large space of typological possibilities while at 2011 or Siegel et al. 2016 take years to build— the same time integrating our analysis of clausal there are e orts to automate this process. We complements into a complex system which out- present a contribution to one such initiative, adding puts streamlined grammars. We start by motivat- a clausal complements library to a grammar engi- ing the choice of the grammar engineering system neering starter toolkit. and briefly summarizing the formal underpinnings Clausal complements are a kind of subordinate associated with this choice (§2). We then present clause. Modeling subordinate clauses in general a concise summary of the typological literature on and clausal complements in particular is important clausal complementation (§3), and describe our not only because of their corpus frequency, but also analysis and implementation in §4. The evaluation
39 Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 39-49. New York City, New York, January 3-6, 2019 results featuring held-out languages from di erent clause embedding itself. On the other hand, we are families are given in §5. All of the resources men- able to explore the interaction of our analyses with tioned in the paper are freely available.1 the existing analyses of other phenomena in or- der to validate both the existing and new libraries. 2 Background Finally, because we contribute our library to the Grammar Matrix code base, it can in turn serve 2.1 The Grammar Matrix as part of the background infrastructure for future We develop a cross-linguistic analysis of clausal library development. complements as part of the LinGO Grammar Ma- 2 trix (Bender et al., 2002, 2010). Other examples 2.2 DELPH-IN Joint Reference Formalism, of multilingual grammar engineering projects in- HPSG and MRS clude ParGram (Butt and King, 2002) and Core- The Grammar Matrix uses the DELPH-IN Joint Gram (Müller, 2015). What is unique about the Reference Formalism (Copestake, 2000), a version Grammar Matrix, however, is that it allows the of the HPSG syntactic formalism (Pollard and Sag, user to obtain a starter grammar customized from 1994), developed to balance expressive power with typological choices, making it possible to easily computational e ciency. Every part of the gram- test the analysis on multiple languages, including mar (lexical items, lexical rules, and phrase struc- systematic testing of each part of the analysis using ture rules) is encoded with typed feature struc- artificially constructed languages (see §4.3). tures.4 Phrase structure rules can have any fixed The Grammar Matrix consists of a web-based number of daughters in a fixed order,5 but are typ- questionnaire and a back-end customization sys- ically either binary branching or unary construc- tem. The input to the system is user answers to tions. Lexical rules are always unary projections the typological questions (elicited in the question- and are furthermore restricted to the lower parts of naire and serialized as a choices file) and the out- the tree, i.e. below any phrase structure rules. put is an implemented starter precision grammar. A customized grammar consists of a type hier- There are currently multiple libraries, including archy where typed feature structures (types) inherit word order (Bender and Flickinger, 2005; Fokkens, constraints from other types and add their own con- 2014); person, number, gender, and case systems straints. The typed feature structures defined by the (Drellishak, 2009); tense, aspect, and mood (Poul- grammar are combined using the operation of uni- son, 2011); argument optionality (Saleem and Ben- fication in order to build analyses of sentences. If der, 2010); matrix yes/no questions, lexicon, (Ben- no analysis can be found for a string using a gram- der and Flickinger, 2005); morphotactics (O’Hara, mar, then the string is deemed ungrammatical by 2008; Goodman and Bender, 2010); nominalized that grammar. Because the feature structures en- clauses (Howell et al., 2018), and clausal modifiers code both syntactic and semantic constraints, any (Howell and Zamaraeva, 2018).3 The contribution derivation tree produced by the grammar also in- of this paper is the clausal complements library. cludes an MRS semantic representation (Copes- In the context of the Grammar Matrix, our analysis take et al., 2005). When evaluating a grammar consists of a new web questionnaire page with a set created via the Grammar Matrix customization sys- of choices describing clausal complements cross- tem, the correctness of both syntactic and semantic linguistically, a set of new types available for the representations of strings is taken into account. back-end customization logic, and revisions to the types already existing in the system. 2.3 Clausal Complements in HPSG Building on the existing resource of the Gram- mar Matrix to develop our library has dual bene- To our knowledge, there is no previous fits: On the one hand, we can reuse existing im- cross-linguistic account of clausal complements plementations of phenomena that occur in embed- in HPSG. Language-specific analyses include ded clauses and focus our attention primarily on Ginzburg and Sag 2000 for English and Crysmann 2013 for German, among others. We could not di- 1svn://lemur.ling.washington.edu/shared/ matrix/trunk, revision 42067 (code and testsuites). 4These are exemplified in §4.2 in (4)-(10). 2http://matrix.delph-in.net/customize/ 5This contrasts with other versions of HPSG formalisms matrix.cgi which separate immediate dominance from linear precedence 3The nonexhaustive list of cited libraries includes the ones (e.g. Engelkamp et al., 1992) as well as those that allow with which we tested the interaction of our library. variable-arity rules, like Sag et al. 2003.
40 rectly incorporate them for two reasons. First, most its nominal object. Complementizers can be seen of the existing theoretical analyses are not within in both (3) and (1). the DELPH-IN JRF. Second, the literature mostly (2) [senin sinema-ya gel-me-n-i] concerns itself with the level of clause complex- [2 . cinema- come- -2 - ] ity which is beyond the Grammar Matrix’s cur- isti-yor-um rent scope. The analyses which informed us the want- -1 ‘I want you to come to the movies.’ [tur] (adapted most are actual grammar implementations within from Kornfilt 2013, p. 48) the DELPH-IN JRF and include Siegel et al. 2016 for Japanese and Flickinger 2000, 2011 for English. (3) Men bilamen [ki bu Odam ˘ o˘ a-ni I know-1 [ this man chicken- o irladi] 3 The Typology of Clausal Complements stole-3 ] ‘I know that the man stole the chicken.’ [uzb] (Noo- Clausal arguments are clauses which serve as core nan, 2007) arguments to verbs (Noonan, 2007). They can be subjects or complements (objects) and typically oc- 4 Developing the Library cur with verbs of thought, perception, knowledge, etc. An example of an objectival clausal argument Development starts with mapping typological de- is given in (1). scriptions of the phenomenon to a set of choices to be presented to the user via the web questionnaire (1) Kim thinks [that Sandy left]. [eng] (§4.1). This characterizes our analysis in the broad sense. In particular, upon reviewing the typologi- Clausal complements can be finite clauses that cal literature and restricting the scope of the library can occur on their own (1) or they can be reduced. as outlined in §3, we provide an analysis for lan- Reduced clauses can share the subject with the guages that mark clausal complements in a set of matrix clause, have the verb in a dependent form certain ways, e.g. with complementizers, nominal- or have case marking on the arguments that dif- ization, aspect or other marking on the embedded fers from what is normal for finite clauses in the verb.7 We posit HPSG analyses for all combina- language (ibid.). Complementation strategies in- tions of the choices we target, without claiming clude parataxis (several clauses joined without sub- to have identified or modeled all dimensions of ordination); complementizer + finite complement; variation. We start with basic analyses for com- nominalization; infinitival and participial comple- plementizers and clause-embedding verbs (§4.2) ments. and implement a holistic analysis in a test-driven Here we focus on the complementation phe- fashion (§4.3). nomena which involve embedded clauses that are declarative, objectival and clearly marked as subor- 4.1 Web Questionnaire dinate, and that contain a full proposition. Embed- As summarized in §3, clausal complements can be ded clauses that are subjects, interrogative clauses, marked with a complementizer, special morphol- subject sharing, paratactic, infinitival or participial ogy on the embedded verb, or word order changes complements are not covered at this point. Our in either the matrix or the embedded clause. Com- library supports clausal complements marked by binations of these phenomena are also possible. special morphology on the verb, special word or- There may be several distinct strategies in a lan- der, and/or a complementizer attaching to one of guage and clausal complement-taking verbs may the edges of the embedded clause. Special mor- select for specific ones. We add a web page to the phology on the embedded verb is illustrated in (2), questionnaire which elicits all the relevant choices a Turkish example with nominalization. Special 6 from the user. word order in the matrix clause is illustrated by A language may have multiple complementizers (3): Uzbek is a SOV language, but clausal com- which belong to distinct complementation strate- plements can be extraposed to the end of the sen- gies (i.e. with di erent word order consequences, tence. Note that the order of the complementizer di erent selecting verbs, or di erent morphologi- and the embedded clause here also does not follow cal requirements on the embedded verb). To ac- the same rule as, for example, a transitive verb and 7While our analysis draws on a comprehensive review of 6See §4.2.4 regarding the word order in subordinate typological literature, we do not make any typological claims clauses in German. as part of this work.
41 count for this, we generalize the feature, pre- Grammar Matrix provides us with a theoretical as viously available for verbs to distinguish di erent well as an engineering basis. As is consistent with syntactically relevant inflected forms, to comple- the Grammar Matrix’s overarching goal, the main mentizers, and reflect this in the questionnaire. goal of our analysis is to provide the user-linguist To handle complementation strategies that re- with a range of typologically motivated possibil- quire specific morphology on the embedded verb, ities rather than to focus on specific predictions we leverage several existing libraries: The mor- of what is possible vs. what is not possible in the photactics library (Goodman, 2013) provides a world’s languages. In this section, we describe the means of associating morphological features with building blocks of our analysis. inflected forms of verbs; the tense/aspect/mood li- brary (Poulson, 2011) allows users to define mor- 4.2.1 Lexical Types phology associated with those features (as well as Complementizers can be treated as semantically additional arbitrary features on the ‘Other Features’ empty elements that take a proposition as a com- page) and the nominalization library (Howell et al., plement (e.g. Siegel et al. 2016). The Grammar 2018) accounts for nominalized clauses. In all Matrix already had a complementizer type which cases, the lexical rules defined by these libraries did not contribute its own predication but raised include morphosyntactic or morphosemantic fea- its complement’s semantic identifier via the tures which allow an embedding verb or comple- identity, as shown in (4).9 mentizer to select for a clausal complement headed by a verb with appropriate morphology. This lex- (4) comp-lex-item ically introduced information is available at the comp 2HEAD 3 6 MOD 7 clause level thanks to constraints in the shared core 6 " hi# 7 6 7 grammar that pass the information up the tree. 6SUBJ 7 6 hi 7 6 HEAD verb 7 We also provide an analysis of clause-bounded 6 7 6 7 extraposition of clausal complements. One of the 6 SUBJ 7 6COMPS 1 2 hi 3 7 6 h 6COMPS 7i7 salient syntactic characteristics of clausal com- 6 6 hi 7 7 6 6HOOK 2 7 7 plements is that they tend to be dispreferred in 6 6 7 7 6 6 7 7 6ARG-ST 1 6 7 7 sentence-medial position, as seen in (3). We allow 6 h 6i 7 7 6RELS !!4 5 7 the user to choose this type of extraposition and 6 h i 7 6HOOK 2 7 8 6 7 indicate whether it is obligatory or optional. 6 7 6 7 6 7 Finally, the user must add lexical types for While4 posited originally for question5 parti- clause-embedding verbs, at least one per comple- cles in the yes/no questions library (Bender and mentation strategy, as we operate within a lexicalist Flickinger, 2005), it was intended to generalize view of syntax and the lexical type of the clause- to clausal complementizers. In order to have embedding verb is therefore responsible for choos- clausal complementizers inherit from this super- ing which type of clausal complement to take. type we move the (‘main clause’) value from Here, we extended an already existing part of the the supertype (4) where it was posited originally questionnaire (the Lexicon) so that verb types can to the clausal complementizer subtypes (5). This select for a complementation strategy. way question particles (not shown) can take main clauses as their complements but clausal comple- 4.2 Syntactic Analysis mentizers cannot. Another change implemented to As discussed in §2.3, there appears to be little in support our analysis is that complementizers now the theoretical HPSG literature focused on a cross- have the feature (with user-specified array of linguistic account of basic complementation that values). This is used to license certain comple- can be directly incorporated into our library. How- mentizers only in combination with certain verb ever, we can leverage the existing Grammar Matrix types. 10 analyses of various phenomena as well as those from other implemented grammars and extend or 9Empty semantic relations ( ) lists and identities adapt them to cover clausal complementation; the as in (4) are characteristic of semantically empty elements. 10AVMs are abbreviated to focus on the primary points of 8Displacement to sentence-initial position is less men- interest. Note also that all such feature structures are arranged tioned in the typological literature, though it appears to be into a type hierarchy, such that more specific types, like (5), possible. We leave it to future work. inherit all of the constraints of their supertypes, like (4).
42 (5) ccomp-lex-item clausal complement strategies as argument struc- HEAD FORM form ture choices and allowed the case frames that are 2 3 6 7 available for non-clause-embedding verbs to be 6 h i7 6COMPS MC 7 6 h i 7 available for clausal complement strategies as well. 6 h i 7 6 7 11 The option which specifies a case constraint on the To model6 clausal complement-taking7 verbs, we introduce4 new lexical types. Lexical5 types in verb’s object is not always available: it only makes the Grammar Matrix typically specify both syntac- sense for nominalized clausal objects. tic and semantic requirements on their dependents. 4.2.2 Lexical Rules, Features, Nominalization Because cross-linguistically clausal complement- As noted in §4.1 above, we leverage existing li- taking verbs can take both verbal and nominal- braries to account for constraints on the morphol- ized complements, our supertype in this space (6) ogy of embedded verbs imposed by complemen- leaves both underspecified. Otherwise, it is simi- tizers or clausal-complement verbs. The result is lar to transitive-verb-lex, which also specifies two that users can specify, for example, that the clausal arguments.12 complement be nominalized, at which point the re- (6) cl-verb-lex sulting clausal-complement verb type will include SUBJ 1 the constraint shown in (7). We also extended the 2 h i 3 6 SUBJ 7 Morphotactics library to allow nominalized verbs 6COMPS 2 hi 7 6 COMPS 7 to bear case inflections. 6 h " #i 7 6 hi 7 6 7 6ARG-ST 1 HEAD noun , 2 7 (7) 6 h i7 cl-verb-lex 6 h i 7 6 7 COMPS NMZ + Further6 subtypes inheriting from cl-verb-lex7 are 2 3 4 5 6 h i7 6 h i 7 based on the specific choices made by the user and 6 7 For a6 specific example7 of how we accommo- make use of one of two already existing types: ei- 4 5 date lexical variation in complementation, consider ther transitive-lex-item, for nominalized comple- Turkish, which has several complementation strate- ments, or clausal-second-arg-trans-lex-item, for gies. The verb isti (‘want’) can not only take nom- all other types of clausal complements. This al- inalized complements like in (2) but also clauses lows us to model the primary semantic e ect of headed by verbs in the optative form, as in (8). nominalization: the introduction of an individual- type variable corresponding to the nominalized (8) herkes [yarin ben-im-le sinema-ya everybody [tomorrow I- - cinema- clause, which in turn allows for e.g. adjectival gel-esin] isti-yor modifiers of that clause. In the semantic com- come-2 . ] want- . position, transitive-lex-item takes the individual- ‘Everybody wants you to come along to the movies type index of its complement as a semantic argu- with me tomorrow.’ [tur] (from Kornfilt 2013, p. 48) ment. Clausal-second-arg-trans-lex-item, on the To model this, the user can add separate comple- other hand, expects an argument with an event-type mentation strategies in the questionnaire for nom- index and combines with it semantically via a han- inalization and for optative on the embedded verb, dle constraint to accommodate the MRS analysis and a corresponding clausal complement-taking of quantifier scope (Copestake et al., 2005). verb type to go with each strategy. The resulting For proper interaction with the case library grammar will include two lexical entries for isti, (Drellishak, 2009), clausal complement-taking each belonging to a di erent type. One will only verbs require a hierarchy which is aware of various take complements with [ +, nonfinite] case frames. In modeling a language with case, and the other [ , opt]. the user must specify a case frame for each verb type. For example, a verb can have nominative- 4.2.3 Phrase Structure Rules, and accusative argument structure, meaning the type The most intricate part of this library is its in- will constrain its syntactic subject to bear nomi- teraction with the existing analysis of word order. native case and its object accusative. We added To account for basic word order, any Grammar 11Often called the , complement-taking predicate, in Matrix-generated grammar will have some basic syntactic and typological literature. head-subject and head-complement rules, one of 12In future work, we will extend the lexical types available to provide for more valence patterns with clausal comple- each when the word order is strict (e.g. SOV) and ments, such as Kim told me that Sandy left. more if the order is flexible.
43 When the complementizer or the main verb par- mentizer the order is verb final. Fokkens (2014) ticipates in a word order that is di erent from other implemented this type of word order variation in verbs, we posit an additional head-complement the Grammar Matrix framework but did not fully rule (HCR) and, in some cases, an additional head- integrate it into the customization system. We in- subject rule (HSR). Then we constrain them (as corporate part of her analysis so that the user can well as all lexical types that use the HCRs and choose this type of word order variation directly HSRs) for one or both Boolean features, and via the questionnaire. . While similar features have been used be- fore (Keller 1995; Crysmann 2013; see also Siegel 4.3 Implementation et al. 2016, 59), we incorporate them into a new Encoding any particular structure is straightfor- customization logic so that sets of correct con- ward; the challenge is in emitting the right sets straints are emitted automatically based on user of structures with the right constraints given user choices. Examples (9)–(10) illustrate the general choices, where the space of possible combinations HCR and the additional one for extraposition that is large. Furthermore, any additions need to be are emitted for one of many possible user-defined well integrated so that the constraints the new code languages: an OVS language with extraposition. emits should not interfere with what other libraries create. At the same time, it is not desirable to (9) Top HCR1 HCR1 add unique structures where an existing one can COMPS 1 2 3 be adapted, since this leads to unnecessarily large, 6 INIT 7 O VHSR 6 7 complex and potentially overgenerating grammars. 6 7 6H-DTR SUBJ 7 6 2 hi 37 The goal is to implement an analysis that is su - 6 6COMPS 3 1 77 V S 6 6 h i 77 ciently general to handle any typologically plausi- 6 6 77 6NH-DTR 63 77 6 6 77 ble language while at the same time emitting rea- 6ARGS 4 3 , 2 57 6 h i 7 sonably streamlined grammars. 6 7 6 7 6 7 (10) 4HCR2 5 TopHCR2 4.3.1 Test-driven Development COMPS 1 2 3 We begin by describing the procedure we use for 6 INIT + 7 VHSR O 6 7 creating test cases, both in development and in 6 7 6H-DTR SUBJ 7 6 2 hi 37 evaluation (§5). Test-driven development in the 6 6COMPS 3 1 77 V S 6 6 h i 77 context of the Grammar Matrix relies on two com- 6 6 77 6NH-DTR 63 77 6 6 77 ponents: the testsuites and the test choices files 6ARGS 4 2 , 3 57 6 h i 7 (i.e. grammar specifications). There is a testsuite 6 7 6 7 We use6 (a feature) on lexical7 types and and a choices file for each language that is used 4 5 on the head daughters of phrase structure rules to in development or evaluation. While the testsuite account for word order variations associated with and the choices file are created separately, both subordinator attachment and with extraposition of are based on the descriptive grammar for the lan- objects from OV languages. [ +] is associ- guage in question (with the exception of artificial ated with head-initial rules and [ ] with head- pseudolanguages discussed below which function final rules. We need a separate feature, more like ‘unit tests’ for the bits of our analysis). (not shown), to handle extraposition in VOS and Specifically, we first collect the sentences illustrat- V-initial languages, because the order in these lan- ing clausal complementation from the descriptive guages remains head-initial despite extraposition grammar to compile the testsuites. Then, in a sep- to the end of sentence. is constrained on arate iteration, we read the descriptive grammar to list elements of clausal-complement select- the extent necessary to fill out the Grammar Matrix ing heads and on the non-head daughter of head- questionnaire for this language. initial HCRs.13 We build testsuites and choices files for pseu- dolanguages (defined by combinations of choices) 4.2.4 German-like V2/V-final Variation and for 5 development languages (see §5 for the In German, the word order is verb second (V2), details). The number of possible pseudolanguages but in clausal complements marked by a comple- is large: a conservative estimate which treats some 13For an example of how is used, see Zamaraeva bundles of choices as single dimensions is 2300, et al. 2018. assuming one strategy per language. In testing we
44 work with a 50 language sample which includes out languages, described in §5 below. several languages with more than one complemen- tation strategy. A testsuite consists of grammatical 4.3.2 Adding Types and ungrammatical sentences illustrating what is We update the customization system as follows. possible/impossible with respect to clausal com- A supertype for clausal-complement verbs (de- plements in a given language. scribed in §4.2) is now added in all cases when For pseudolanguages testsuites, we construct all any clausal complement strategy is specified. A possible nonrecursive sentences14 illustrating the supertype for a complementizer is added whenever clausal complementation strategies that this lan- the user says there is a complementizer associated guage has. For development languages, the sen- with any of the strategies. Appropriate subtypes tences either come directly or are adapted from the are added based on the user choices, one per strat- 16 sections in descriptive grammars explaining how egy, and constrained as described in §4.3.3. clausal complements work in this language. Here Additional phrase structure rules are added ac- the sentences feature interacting phenomena such cording to the user choices. An additional HSR as tense and case.15 Crucially, we include cor- is added for VOS orders with extraposition. An responding impossible (ungrammatical) sentences additional HCR is added whenever the comple- to make sure they are not parsed. Consider a mentizer or the clause-embedding verb cannot use language with one strict strategy: SOV word or- the basic HCR. In general, this means all situations der, obligatory complementizer attaching before when the basic order is e.g. head-final but there is the embedded clause, and obligatory extraposition either extraposition or the complementizer can at- of clausal complements. The testsuite will con- tach clause-initially (or symmetrically, if the basic sist of 2 grammatical sentences, one for a simple order is head-initial but the complementizer can at- SOV sentence and another that has an extraposed tach clause-finally). In practice, it means checking 17 clausal complement with the complementizer. The combinations of choices: 8 for the word orders; ungrammatical sentences will include: a complex 3 for complementizer (obligatory, no, optional); 3 sentence with no complementizer, one with a com- for complementizer attachment (before the clause, plementizer attaching after the clause, one with a after, or both); 3 for extraposition (obligatory, no, non-extraposed clausal complement, and a simple optional). clause with SVO order. The more flexible the lan- 4.3.3 Adding Constraints guage, the more grammatical and fewer ungram- matical examples the testsuite will have. After adding the types, we constrain them for the grammar to generate only grammatical strings (e.g. From the choices files which are created inde- for features , , , ). We need to pendently from the testsuites, grammar fragments add constraints so that they do not clash with those are created with the original customization sys- placed by other libraries while not positing new tem. The grammars are loaded into the LKB soft- types unless necessary. ware(Copestake, 2002) which can parse strings. Then we edit these grammars by hand until they be- The information about whether or have correctly with respect to the yet unsupported (nominalization) constraints are needed on the rel- clausal complements choices (as reflected by the evant types comes directly from the choices files: grammars’ coverage and overgeneration over the the user would have specified a value or a corresponding testsuites). Then we generalize the nominalization strategy associated with the com- solutions in the resulting grammars and incorpo- plementation strategy. The treatment of and rate them in the customization system, continually however must be inferred from a fairly large testing with the regression tests. The result of this space of choices combinations. development is frozen before evaluation on held- Figure 1 shows the logic that we add to the cus- tomization system that lets it decide whether to 14The vocabulary is minimal. use the feature.18 The decision depends on 15In developing these testsuites, we sometimes must adapt the sentences to exclude phenomena that are not supposed to 16A type can be instantiated by multiple lexical entries. be supported by the system or to capture the full spectrum 17Excluding V2 and free. described by the grammar’s author in prose but not fully il- 18Abbreviations for Figure and Tables: comp (comple- lustrated by examples. This, along with the initial selection mentizer); extrap (extraposition); fam (family); morph (mor- of which sentences to include in the grammar, introduces a pheme); neg (ungrammatical sentences); nmz (nominaliza- certain amount of bias to the testsuites. tion); oblig (obligatory); opt (optional); OV (object pre-
45 Complementizer (comp)
oblig. opt. no
OV VO OV VO extrap no extrap
S comp comp S both S comp comp S both extrap no extrap comp S S comp both OV VO no extrap no extrap strict extrap flex. extrap no extrap no comp S S comp both no no