Modeling Clausal Complementation for a Grammar Engineering Resource Olga Zamaraeva University of Washington, [email protected]
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Society for Computation in Linguistics Volume 2 Article 6 2019 Modeling Clausal Complementation for a Grammar Engineering Resource Olga Zamaraeva University of Washington, [email protected] Kristen Howell University of Washington, [email protected] Emily M. Bender University of Washington, [email protected] Follow this and additional works at: https://scholarworks.umass.edu/scil Part of the Computational Linguistics Commons Recommended Citation Zamaraeva, Olga; Howell, Kristen; and Bender, Emily M. (2019) "Modeling Clausal Complementation for a Grammar Engineering Resource," Proceedings of the Society for Computation in Linguistics: Vol. 2 , Article 6. DOI: https://doi.org/10.7275/dygn-c796 Available at: https://scholarworks.umass.edu/scil/vol2/iss1/6 This Paper is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Proceedings of the Society for Computation in Linguistics by an authorized editor of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. Modeling Clausal Complementation for a Grammar Engineering Resource Olga Zamaraeva Kristen Howell Emily M. Bender University of Washington University of Washington University of Washington [email protected] [email protected] [email protected] Abstract because of their interaction with other phenomena. A grammar that cannot support subordinate clauses We present a grammar engineering library will fail to provide analyses for a big portion of a for modeling objectival declarative clausal typical corpus and will not support the grammar complementation patterns attested cross- engineer in exploring how well the analyses used linguistically. Our primary contribution is positing a set of syntactico-semantic analyses for modeling simple clauses generalize. couched within a variant of the HPSG syntactic To fill this gap, we incorporate a cross-linguistic formalism and integrating them with a variety account of clausal complements into a grammar of phenomena already implemented in a gram- engineering questionnaire and customization sys- mar engineering toolkit. We evaluate the addi- tem. The questionnaire elicits typological infor- tion to the system on testsuites from genetically mation about hypothetically any language and the diverse languages that were not considered dur- customization system outputs a starter precision ing development. grammar to specifications. In this way, the toolkit 1 Introduction supports the rapid development of precision gram- mars, giving both novice and experienced grammar Grammar engineering is the modeling of language developers a means to create a grammar fragment rules in a machine-readable fashion, such that the customized to their language of interest and ready resulting system (the grammar) can parse gram- for extension to broader coverage. matical strings while not overgenerating (licensing In the meta-grammar engineering context, an incorrect or spurious analyses). Linguistic gram- analysis is a set of theoretically-grounded elements mar engineering prioritizes precision (how many which comply with the requirements of the spe- parses are syntactically and semantically correct cific implementation framework and are used by with respect to the linguistic formalism) over re- the grammar engineering system so as to produce call (how many input strings get parsed). functioning grammars. In our case, the analysis A precision grammar documents and models is couched within Head-Driven Phrase Structure linguistic phenomena on the one hand, and is a pro- Grammar (HPSG; Pollard and Sag, 1994) and gram that parses and generates, on the other. This Minimal Recursion Semantics (MRS; Copestake makes precision grammars a resource for rigorous et al., 2005) and is implemented as an extension to linguistic hypothesis testing as well as for NLP the LinGO Grammar Matrix (Bender et al., 2002, tasks. While precision grammars are expensive— 2010). Our main challenges lie in accounting for broad-coverage grammars like Flickinger 2000, a large space of typological possibilities while at 2011 or Siegel et al. 2016 take years to build— the same time integrating our analysis of clausal there are efforts to automate this process. We complements into a complex system which out- present a contribution to one such initiative, adding puts streamlined grammars. We start by motivat- a clausal complements library to a grammar engi- ing the choice of the grammar engineering system neering starter toolkit. and briefly summarizing the formal underpinnings Clausal complements are a kind of subordinate associated with this choice (§2). We then present clause. Modeling subordinate clauses in general a concise summary of the typological literature on and clausal complements in particular is important clausal complementation (§3), and describe our not only because of their corpus frequency, but also analysis and implementation in §4. The evaluation 39 Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 39-49. New York City, New York, January 3-6, 2019 results featuring held-out languages from different clause embedding itself. On the other hand, we are families are given in §5. All of the resources men- able to explore the interaction of our analyses with tioned in the paper are freely available.1 the existing analyses of other phenomena in or- der to validate both the existing and new libraries. 2 Background Finally, because we contribute our library to the Grammar Matrix code base, it can in turn serve 2.1 The Grammar Matrix as part of the background infrastructure for future We develop a cross-linguistic analysis of clausal library development. complements as part of the LinGO Grammar Ma- 2 trix (Bender et al., 2002, 2010). Other examples 2.2 DELPH-IN Joint Reference Formalism, of multilingual grammar engineering projects in- HPSG and MRS clude ParGram (Butt and King, 2002) and Core- The Grammar Matrix uses the DELPH-IN Joint Gram (Müller, 2015). What is unique about the Reference Formalism (Copestake, 2000), a version Grammar Matrix, however, is that it allows the of the HPSG syntactic formalism (Pollard and Sag, user to obtain a starter grammar customized from 1994), developed to balance expressive power with typological choices, making it possible to easily computational efficiency. Every part of the gram- test the analysis on multiple languages, including mar (lexical items, lexical rules, and phrase struc- systematic testing of each part of the analysis using ture rules) is encoded with typed feature struc- artificially constructed languages (see §4.3). tures.4 Phrase structure rules can have any fixed The Grammar Matrix consists of a web-based number of daughters in a fixed order,5 but are typ- questionnaire and a back-end customization sys- ically either binary branching or unary construc- tem. The input to the system is user answers to tions. Lexical rules are always unary projections the typological questions (elicited in the question- and are furthermore restricted to the lower parts of naire and serialized as a choices file) and the out- the tree, i.e. below any phrase structure rules. put is an implemented starter precision grammar. A customized grammar consists of a type hier- There are currently multiple libraries, including archy where typed feature structures (types) inherit word order (Bender and Flickinger, 2005; Fokkens, constraints from other types and add their own con- 2014); person, number, gender, and case systems straints. The typed feature structures defined by the (Drellishak, 2009); tense, aspect, and mood (Poul- grammar are combined using the operation of uni- son, 2011); argument optionality (Saleem and Ben- fication in order to build analyses of sentences. If der, 2010); matrix yes/no questions, lexicon, (Ben- no analysis can be found for a string using a gram- der and Flickinger, 2005); morphotactics (O’Hara, mar, then the string is deemed ungrammatical by 2008; Goodman and Bender, 2010); nominalized that grammar. Because the feature structures en- clauses (Howell et al., 2018), and clausal modifiers code both syntactic and semantic constraints, any (Howell and Zamaraeva, 2018).3 The contribution derivation tree produced by the grammar also in- of this paper is the clausal complements library. cludes an MRS semantic representation (Copes- In the context of the Grammar Matrix, our analysis take et al., 2005). When evaluating a grammar consists of a new web questionnaire page with a set created via the Grammar Matrix customization sys- of choices describing clausal complements cross- tem, the correctness of both syntactic and semantic linguistically, a set of new types available for the representations of strings is taken into account. back-end customization logic, and revisions to the types already existing in the system. 2.3 Clausal Complements in HPSG Building on the existing resource of the Gram- mar Matrix to develop our library has dual bene- To our knowledge, there is no previous fits: On the one hand, we can reuse existing im- cross-linguistic account of clausal complements plementations of phenomena that occur in embed- in HPSG. Language-specific analyses include ded clauses and focus our attention primarily on Ginzburg and Sag 2000 for English and Crysmann 2013 for German, among others. We could not di- 1svn://lemur.ling.washington.edu/shared/ matrix/trunk, revision 42067 (code and testsuites). 4These are exemplified in §4.2 in (4)-(10). 2http://matrix.delph-in.net/customize/ 5This contrasts with other versions of HPSG formalisms matrix.cgi which separate immediate dominance from linear precedence 3The nonexhaustive list of cited libraries includes the ones (e.g. Engelkamp et al., 1992) as well as those that allow with which we tested the interaction of our library. variable-arity rules, like Sag et al. 2003. 40 rectly incorporate them for two reasons. First, most its nominal object. Complementizers can be seen of the existing theoretical analyses are not within in both (3) and (1). the DELPH-IN JRF. Second, the literature mostly (2) [senin sinema-ya gel-me-n-i] concerns itself with the level of clause complex- [2. cinema- come--2-] ity which is beyond the Grammar Matrix’s cur- isti-yor-um rent scope.