Valency-Augmented Dependency Parsing
Tianze Shi (Cornell University, [email protected])
Lillian Lee (Cornell University, [email protected])

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Abstract

We present a complete, automated, and efficient approach for utilizing valency analysis in making dependency parsing decisions. It includes extraction of valency patterns, a probabilistic model for tagging these patterns, and a joint decoding process that explicitly considers the number and types of each token's syntactic dependents. On 53 treebanks representing 41 languages in the Universal Dependencies data, we find that incorporating valency information yields higher precision and F1 scores on the core arguments (subjects and complements) and functional relations (e.g., auxiliaries) that we employ for valency analysis. Precision on core arguments improves from 80.87 to 85.43. We further show that our approach can be applied to an ostensibly different formalism and dataset, Tree Adjoining Grammar as extracted from the Penn Treebank; there, we outperform the previous state-of-the-art labeled attachment score by 0.7. Finally, we explore the potential of extending valency patterns beyond their traditional domain by confirming their helpfulness in improving PP attachment decisions.[1]

[1] Our implementation is available at https://github.com/tzshi/valency-parser-emnlp18

[Figure 1: Sample annotation in UD over the sentence "He says that you like to swim .", encoding the core valency pattern nsubj ⋄ ccomp for "says", nsubj ⋄ xcomp for "like", and so on (see §2-4). The arcs are nsubj(says, He), ccomp(says, like), mark(like, that), nsubj(like, you), xcomp(like, swim), and mark(swim, to).]

1 Introduction

Many dependency parsers treat attachment decisions and syntactic relation labeling as two independent tasks, despite the fact that relation labels carry important subcategorization information. For example, the number and types of the syntactic arguments that a predicate may take are rather restricted in natural languages: it is not common for an English verb to have more than one syntactic subject or more than two objects.

In this work, we present a parsing approach that explicitly models subcategorization of (some) syntactic dependents as valency patterns (see Fig. 1 for examples), and operationalize this notion as extracted supertags. An important distinction from prior work is that our definition of valency-pattern supertags is relativized to a user-specified subset of all possible syntactic relations (see §3). We train supertaggers that assign probabilities of potential valency patterns to each token, and we leverage these probabilities during decoding to guide our parsers so that they favor more linguistically plausible output structures.

We mainly focus on two subsets of relations in our analysis, those involving core arguments and those that represent functional relations, and perform experiments over a collection of 53 treebanks in 41 languages from the Universal Dependencies dataset (UD; Nivre et al., 2017). Our valency-aware parsers improve upon strong baseline systems in terms of output linguistic validity, measured as the accuracy of the assigned valency patterns. They also achieve higher precision and F1 scores on the subsets of relations under analysis, suggesting a potentially controlled way to balance precision-recall trade-offs.

We further show that our approach is not limited to a particular treebank annotation style. We apply our method to parsing another grammar formalism, Tree Adjoining Grammar, where dependency and valency also play an important role in both theory and parser evaluation. Our parser reaches a new state-of-the-art LAS score of 92.59, with more than 0.6 core-argument F1-score improvement over our strong baseline parser.
Finally, we demonstrate the applicability of our valency analysis approach to other syntactic phenomena less associated with valency in its traditional linguistic sense. In a case study of PP attachment, we analyze the patterns of two syntactic relations commonly used in PP attachment, and include them in the joint decoding process. Precision of the parsers improves by an absolute 3.30% on these two relation types.

Dataset | Subset  | Syntactic Relations
UD      | Core    | nsubj, obj, iobj, csubj, ccomp, xcomp
UD      | Func.   | aux, cop, mark, det, clf, case
UD      | PP (§8) | nmod, obl
TAG     | Core    | 0 (subject), 1 (object), 2 (indirect object)
TAG     | Co-head | CO

Table 1: Sets of syntactic relations we used for valency analysis. UD subsets come from the official categorization in the annotation guidelines.

2 Syntactic Dependencies and Valencies

According to Nivre (2005), modern dependency grammar can be traced back to Tesnière (1959), with its roots reaching back several centuries before the Common Era. The theory is centered on the notion of dependency, an asymmetrical relation between words of a sentence. Tesnière distinguishes three node types when analyzing simple predicates: verb equivalents that describe actions and events, noun equivalents as the arguments of the events, and adverb equivalents for detailing the (temporal, spatial, etc.) circumstances. There are two types of relations: (1) verbs dominate nouns and adverbs through a dependency relation; (2) verbs and nouns are linked through a valency relation. Tesnière compares a verb to an atom: a verb can attract a certain number of arguments, just as the valency of an atom determines the number of bonds it can engage in (Ágel and Fischer, 2015). In many descriptive lexicographic works (Helbig and Schenkel, 1959; Herbst et al., 2004), valency is not limited to verbs, but also includes nouns and adjectives. For more on the linguistic theory, see Ágel et al. (2003, 2006).

Strictly following the original notion of valency requires distinguishing between arguments and adjuncts, as well as between obligatory and optional dependents. However, there is a lack of consensus as to how these categorizations may be distinguished (Tutunjian and Boland, 2008), and thus we adopt a more practical definition in this paper.

3 Computational Representation

Formally, we fix a set of syntactic relations R, and define the valency pattern of a token w_i with respect to R as the linearly-ordered[2] sequence a_{-j} ⋯ a_{-1} ⋄ a_1 ⋯ a_k: the ⋄ symbol denotes the center word w_i, and each a_l asserts the existence of a word w dominated by w_i via relation a_l ∈ R (an arc w_i → w labeled a_l). For a_l and a_m with l < m, the syntactic dependent for a_l linearly precedes the syntactic dependent for a_m. As an example, consider the UD-annotated sentence in Fig. 1. The token "says" has a core-relation[3] valency pattern nsubj ⋄ ccomp, and "like" has the pattern nsubj ⋄ xcomp. If we consider only functional relations, both "like" and "swim" have the pattern mark ⋄.[4] We sometimes employ the abbreviated notation α_L ⋄ α_R, where α indicates a sequence and the letters L and R distinguish left dependencies from right dependencies.

We make our definition of valency patterns dependent on the choice of R not only because some dependency relations are more often obligatory and closer to the original theoretical definition of valency, but also because the utility of different types of syntactic relations can depend on the downstream task. For example, purely functional dependency labels are semantically vacuous, so they are often omitted in the semantic representations extracted from dependency trees for question answering (Reddy et al., 2016, 2017). There are also recent proposals for parser evaluation that downplay the importance of functional syntactic relations (Nivre and Fang, 2017).

[2] Our approach, whose full description is in §5, can be adapted to cases where linear ordering is de-emphasized; the algorithm merely requires a distinction between left and right dependents. We choose to encode linearity since it appears that most languages empirically exhibit word order preferences even if they allow for relatively free word order.
[3] UD core and functional relations are listed in Table 1.
[4] The (possibly counterintuitive) direction for "that" and "to" is a result of UD's choice of a content-word-oriented design.
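To make the formal definition concrete, the following minimal sketch extracts valency patterns from a dependency-annotated sentence relative to a chosen relation set R. It is written for this description rather than taken from the authors' released implementation; the tuple-based token encoding, the function name valency_pattern, and the use of '*' in place of the ⋄ head marker are assumptions made here for illustration.

```python
# Sketch: extract the valency pattern of a token relative to a relation set R.
# Tokens are (id, form, head, deprel) tuples with 1-indexed heads (0 = root).

CORE = {"nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp"}  # UD core relations (Table 1)
FUNC = {"aux", "cop", "mark", "det", "clf", "case"}         # UD functional relations (Table 1)

def valency_pattern(tokens, head_id, relations):
    """Valency pattern of token `head_id` w.r.t. `relations`: the labels of its
    dependents whose relation is in `relations`, in linear order, with '*'
    standing in for the head position (the paper's ⋄ symbol)."""
    left, right = [], []
    for tok_id, _form, head, deprel in tokens:
        if head == head_id and deprel in relations:
            (left if tok_id < head_id else right).append((tok_id, deprel))
    return tuple(r for _, r in sorted(left)) + ("*",) + tuple(r for _, r in sorted(right))

# Fig. 1: "He says that you like to swim ."
sent = [(1, "He", 2, "nsubj"), (2, "says", 0, "root"), (3, "that", 5, "mark"),
        (4, "you", 5, "nsubj"), (5, "like", 2, "ccomp"), (6, "to", 7, "mark"),
        (7, "swim", 5, "xcomp"), (8, ".", 2, "punct")]

print(valency_pattern(sent, 2, CORE))  # ('nsubj', '*', 'ccomp')  for "says"
print(valency_pattern(sent, 5, CORE))  # ('nsubj', '*', 'xcomp')  for "like"
print(valency_pattern(sent, 5, FUNC))  # ('mark', '*')            for "like"
```

Applied to the Fig. 1 sentence, the sketch recovers the patterns discussed above; roughly speaking, the inventory of patterns extracted this way from training treebanks is what the supertagger described in §1 assigns probabilities over.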
4 Pilot Study: Sanity Checks

We consider two questions that need to be addressed at the outset:[5]

1. How well do the extracted patterns generalize to unseen data?

2. Do state-of-the-art parsers already capture the notion of valency implicitly, even though they are not explicitly optimized for it?

The first question checks the feasibility of learning valency patterns from a limited amount of data; the second probes the potential for any valency-informed parsing approach to improve over current state-of-the-art systems.

To answer these questions, we use the UD 2.0 dataset for the CoNLL 2017 shared task (Zeman et al., 2017) and the system outputs[6] of the top five performing submissions (Dozat et al., 2017; Shi et al., 2017b; Björkelund et al., 2017; Che et al., 2017; Lim and Poibeau, 2017). Selection of treebanks is the same as in §6.

To address the second question, we count the valency patterns appearing in the system outputs and the number of those not observed in training.[7] All five systems are remarkably "hallucinatory" in inventing valency relations, introducing 16.8 to 35.5 new valency patterns, significantly larger than the actual number of unseen patterns. Below we show an error committed by the state-of-the-art Dozat et al. (2017) parser (upper half) as compared to the gold-standard annotation (lower half), highlighting the core-argument valency relations of the verb "bothers". The system incorrectly predicts "how come" to be a clausal subject.

[Example: predicted (upper half) vs. gold-standard (lower half) dependency arcs over "How come no one bothers to ask ...". The system assigns "bothers" the core arcs csubj and ccomp (alongside advmod and nsubj arcs elsewhere), whereas the gold annotation gives it nsubj and xcomp (alongside fixed and advmod arcs).]

Each such non-existent new pattern implies at least some (potentially small) parsing error that can contribute to the degradation of downstream task performance.
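For readers who want to reproduce this sanity check, here is a hedged sketch of the bookkeeping involved. It reuses the valency_pattern helper and CORE set from the sketch in §3; the read_conllu reader named in the commented usage is a hypothetical stand-in for whatever CoNLL-U loading code is available, not a function from the paper's codebase.

```python
# Sketch: count "hallucinated" valency patterns, i.e., patterns a parser outputs
# that never occur in the training data. Reuses valency_pattern() and CORE from
# the extraction sketch above. Sentences are lists of (id, form, head, deprel).

def pattern_inventory(sentences, relations):
    """Set of valency patterns over every token of every sentence."""
    inventory = set()
    for sent in sentences:
        for tok_id, _form, _head, _deprel in sent:
            inventory.add(valency_pattern(sent, tok_id, relations))
    return inventory

def hallucinated_patterns(train_sents, system_sents, relations):
    """Patterns produced in the system output but unseen in training."""
    return pattern_inventory(system_sents, relations) - pattern_inventory(train_sents, relations)

# Hypothetical usage (read_conllu is assumed, not provided here):
# train  = list(read_conllu("xx-ud-train.conllu"))
# system = list(read_conllu("system-output.conllu"))
# gold   = list(read_conllu("xx-ud-test.conllu"))
# print(len(hallucinated_patterns(train, system, CORE)))  # patterns invented by the parser
# print(len(hallucinated_patterns(train, gold, CORE)))    # genuinely unseen gold patterns
```

Comparing the first count with the second corresponds to the gap described above: the systems introduce considerably more novel core-argument patterns than actually appear unseen in the gold data.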