Down-stream effects of tree-to-dependency conversions

Jakob Elming, Anders Johannsen, Sigrid Klerke, Emanuele Lapponi†, Hector Martinez, Anders Søgaard
Center for Language Technology, University of Copenhagen
†Institute for Informatics, University of Oslo

Abstract

Dependency analysis relies on morphosyntactic evidence, as well as semantic evidence. In some cases, however, morphosyntactic evidence seems to be in conflict with semantic evidence. For this reason dependency grammar theories, annotation guidelines and tree-to-dependency conversion schemes often differ in how they analyze various syntactic constructions. Most experiments for which constituent-based treebanks such as the Penn Treebank are converted into dependency treebanks rely blindly on one of four or five widely used tree-to-dependency conversion schemes. This paper evaluates the down-stream effect of the choice of conversion scheme, showing that it has dramatic impact on end results.

1 Introduction

Annotation guidelines used in modern dependency treebanks and tree-to-dependency conversion schemes for converting constituent-based treebanks into dependency treebanks are typically based on a specific dependency grammar theory, such as the Prague School's Functional Generative Description, Meaning-Text Theory, or Hudson's Word Grammar. In practice most parsers constrain dependency structures to be tree-like structures such that each word has a single syntactic head, limiting diversity between annotations a bit; but while many dependency treebanks taking this format agree on how to analyze many syntactic constructions, there are still many constructions these treebanks analyze differently. See Figure 1 for a standard overview of clear and more difficult cases.

The difficult cases in Figure 1 are difficult for the following reason. In the easy cases morphosyntactic and semantic evidence cohere. Verbs govern subjects morpho-syntactically and seem semantically more important. In the difficult cases, however, morpho-syntactic evidence is in conflict with the semantic evidence. While auxiliary verbs have the same distribution as finite verbs in head position, share morpho-syntactic properties with them, and govern the infinite main verbs, main verbs seem semantically superior, expressing the main predicate. There may be distributional evidence that complementizers head verbs syntactically, but the verbs seem more important from a semantic point of view.

Tree-to-dependency conversion schemes used to convert constituent-based treebanks into dependency-based ones also take different stands on the difficult cases. In this paper we consider four different conversion schemes: the Yamada-Matsumoto conversion scheme yamada,1 the CoNLL 2007 format conll07,2 the conversion scheme ewt used in the English Web Treebank (Petrov and McDonald, 2012),3 and the lth conversion scheme (Johansson and Nugues, 2007).4

1 The Yamada-Matsumoto scheme can be replicated by running penn2malt.jar available at http://w3.msi.vxu.se/~nivre/research/Penn2Malt.html. We used Malt dependency labels (see website). The Yamada-Matsumoto scheme is an elaboration of the Collins scheme (Collins, 1999), which is not included in our experiments.
2 The CoNLL 2007 conversion scheme can be obtained by running pennconverter.jar available at http://nlp.cs.lth.se/software/treebank_converter/ with the 'conll07' flag set.
3 The EWT conversion scheme can be replicated using the Stanford converter available at http://nlp.stanford.edu/software/stanford-dependencies.shtml.
4 The LTH conversion scheme can be obtained by running pennconverter.jar available at http://nlp.cs.lth.se/software/treebank_converter/ with the 'oldLTH' flag set.
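As a rough illustration of how the four conversions are produced in practice, the following Python sketch scripts the tools named in footnotes 1–4. Only the tool names and the scheme flags ('conll07', 'oldLTH') come from the footnotes; the file names, the remaining command-line arguments, and the Stanford converter invocation are assumptions and should be checked against each tool's own documentation.

```python
# Sketch of producing the four dependency conversions of the PTB used in this
# paper. Scheme flags follow footnotes 1-4; all other arguments are assumed.
import subprocess

PTB_TREES = "wsj_02-21.mrg"          # hypothetical path to constituent trees

commands = {
    # Yamada-Matsumoto conversion via Penn2Malt (footnote 1); Penn2Malt also
    # expects head rules and label options described on its website.
    "yamada": ["java", "-jar", "penn2malt.jar", PTB_TREES],
    # CoNLL 2007 and LTH conversions via pennconverter (footnotes 2 and 4).
    "conll07": ["java", "-jar", "pennconverter.jar", "-conll07",
                "-f", PTB_TREES, "-o", "wsj.conll07"],
    "lth": ["java", "-jar", "pennconverter.jar", "-oldLTH",
            "-f", PTB_TREES, "-o", "wsj.lth"],
    # EWT-style (Stanford basic) dependencies via the Stanford converter
    # (footnote 3).
    "ewt": ["java", "-cp", "stanford-parser.jar",
            "edu.stanford.nlp.trees.EnglishGrammaticalStructure",
            "-treeFile", PTB_TREES, "-basic", "-conllx"],
}

for scheme, cmd in commands.items():
    print(scheme, " ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to run the actual tools
```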


    Clear cases             Difficult cases
    Head    Dependent       ?               ?
    Verb    Subject         Auxiliary       Main verb
    Verb    Object          Complementizer  Verb
    Noun    Attribute       Coordinator     Conjuncts
    Verb    Adverbial       Preposition     Nominal
                            Punctuation

Figure 1: Clear and difficult cases in dependency annotation.

We list the differences in Figure 2. An example of differences in analysis is presented in Figure 3.

In order to assess the impact of these conversion schemes on down-stream performance, we need extrinsic rather than intrinsic evaluation. In general it is important to remember that while researchers developing learning algorithms for part-of-speech (POS) tagging and dependency parsing seem obsessed with accuracies, POS sequences or dependency structures have no interest on their own. The accuracies reported in the literature are only interesting insofar as they correlate with the usefulness of the structures predicted by our systems. Fortunately, POS sequences and dependency structures are useful in many applications. When we consider tree-to-dependency conversion schemes, down-stream evaluation becomes particularly important since some schemes are more fine-grained than others, leading to lower performance as measured by intrinsic evaluation metrics.

Approach in this work

In our experiments below we apply a state-of-the-art parser to five different natural language processing (NLP) tasks where syntactic features are known to be effective: negation resolution, semantic role labeling (SRL), statistical machine translation (SMT), sentence compression and perspective classification. In all five tasks we use the four tree-to-dependency conversion schemes mentioned above and evaluate them in terms of down-stream performance. We also compare our systems to baseline systems not relying on syntactic features, when possible, and to results in the literature, when comparable results exist. Note that negation resolution and SRL are not end applications. It is not easy to generalize across five very different tasks, but the tasks will serve to show that the choice of conversion scheme has significant impact on down-stream performance.

We used the most recent release of the Mate parser first described in Bohnet (2010),5 trained on Sections 2–21 of the Wall Street Journal section of the English Treebank (Marcus et al., 1993). The graph-based parser is similar to, except much faster, and performs slightly better than the MSTParser (McDonald et al., 2005), which is known to perform well on long-distance dependencies often important for down-stream applications (McDonald and Nivre, 2007; Galley and Manning, 2009; Bender et al., 2011). This choice may of course have an effect on what conversion schemes seem superior (Johansson and Nugues, 2007). Sentence splitting was done using splitta,6 and the sentences were then tokenized using PTB-style tokenization7 and tagged using the in-built Mate POS tagger.

Previous work

There has been considerable work on down-stream evaluation of syntactic parsers in the literature, but most previous work has focused on evaluating parsing models rather than linguistic theories. No one has, to the best of our knowledge, compared the impact of choice of tree-to-dependency conversion scheme across several NLP tasks.

5 http://code.google.com/p/mate-tools/
6 http://code.google.com/p/splitta/
7 http://www.cis.upenn.edu/~treebank/tokenizer.sed

    FORM1           FORM2       yamada   conll07   ewt   lth
    Auxiliary       Main verb   1        1         2     2
    Complementizer  Verb        1        2         2     2
    Coordinator     Conjuncts   2        1         2     2
    Preposition     Nominal     1        1         1     2

Figure 2: Head decisions in conversions. Note: yamada also differs from CoNLL 2007 in proper names.

Figure 3: CoNLL 2007 (blue) and LTH (red) dependency conversions.

Johansson and Nugues (2007) compare the impact of yamada and lth on semantic role labeling performance, showing that lth leads to superior performance.

Miyao et al. (2008) measure the impact of syntactic parsers in an information extraction system identifying protein-protein interactions in biomedical research articles. They evaluate dependency parsers, constituent-based parsers and deep parsers.

Miwa et al. (2010) evaluate down-stream performance of linguistic representations and parsing models in biomedical event extraction, but do not evaluate linguistic representations directly, evaluating representations and models jointly.

Bender et al. (2011) compare several parsers across linguistic representations on a carefully designed evaluation set of hard, but relatively frequent syntactic constructions. They compare dependency parsers, constituent-based parsers and deep parsers. The authors argue in favor of evaluating parsers on diverse and richly annotated data. Others have discussed various ways of evaluating across annotation guidelines or translating structures to a common format (Schwartz et al., 2011; Tsarfaty et al., 2012).

Hall et al. (2011) discuss optimizing parsers for specific down-stream applications, but consider only a single annotation scheme.

Yuret et al. (2012) present an overview of the SemEval-2010 Evaluation Exercises on Semantic Evaluation track on recognition of textual entailments using dependency parsing. They also compare several parsers using the heuristics of the winning system for inference. While the shared task is an example of down-stream evaluation of dependency parsers, the evaluation examples only cover a subset of the textual entailments relevant for practical applications, and the heuristics used in the experiments assume a fixed set of dependency labels (ewt labels).

Finally, Schwartz et al. (2012) compare the above conversion schemes and several combinations thereof in terms of learnability. This is very different from what is done here. While learnability may be a theoretically motivated parameter, our results indicate that learnability and downstream performance do not correlate well.

2 Applications

Dependency parsing has proven useful for a wide range of NLP applications, including statistical machine translation (Galley and Manning, 2009; Xu et al., 2009; Elming and Haulrich, 2011) and sentiment analysis (Joshi and Penstein-Rose, 2009; Johansson and Moschitti, 2010). This section describes the applications and experimental set-ups included in this study.

In the five applications considered below we use syntactic features in slightly different ways. While our statistical machine translation and sentence compression systems use dependency relations as additional information about words, on a par with POS, our negation resolution system uses dependency paths, conditioning decisions on both dependency arcs and labels. In perspective classification, we use dependency triples (e.g. SUBJ(John, snore)) as features, while the semantic role labeling system conditions on a lot of information, including the word form of the head, the dependent and the argument candidates, the concatenation of the dependency labels of the predicate, and the labeled dependency relations between the predicate and its head, its arguments, dependents or siblings.

2.1 Negation resolution

Negation resolution (NR) is the task of finding negation cues, e.g. the word not, and determining their scope, i.e. the tokens they affect. NR has recently seen considerable interest in the NLP community (Morante and Sporleder, 2012; Velldal et al., 2012) and was the topic of the 2012 *SEM shared task (Morante and Blanco, 2012).

The data set used in this work, the Conan Doyle corpus (CD),8 was released in conjunction with the *SEM shared task. The annotations in CD extend on cues and scopes by introducing annotations for in-scope events that are negated in factual contexts. The following is an example from the corpus showing the annotations for cues (bold), scopes (underlined) and negated events (italicized):

(1) Since we have been so unfortunate as to miss him [. . . ]

CD-style scopes can be discontinuous and overlapping. Events are a portion of the scope that is semantically negated, with its truth value reversed by the negation cue.

The NR system used in this work (Lapponi et al., 2012), one of the best performing systems in the *SEM shared task, is a CRF model for scope resolution that relies heavily on features extracted from dependency graphs. The feature model contains token distance, direction, n-grams of word forms, lemmas, POS and combinations thereof, as well as the syntactic features presented in Figure 4. The results in our experiments are obtained from configurations that differ only in terms of tree-to-dependency conversions, and are trained on the training set and tested on the development set of CD. Since the negation cue classification component of the system does not rely on dependency features at all, the models are tested using gold cues.

    Syntactic       constituent
                    dependency relation
                    parent head POS
                    grand parent head POS
                    word form+dependency relation
                    POS+dependency relation
    Cue-dependent   directed dependency distance
                    bidirectional dependency distance
                    dependency path
                    lexicalized dependency path

Figure 4: Features used to train the conditional random field models.

Table 1 shows F1 scores for scopes, events and full negations, where a true positive correctly assigns both scope tokens and events to the rightful cue. The scores are produced using the evaluation script provided by the *SEM organizers.

2.2 Semantic role labeling

Semantic role labeling (SRL) is the attempt to determine semantic predicates in running text and label their arguments with semantic roles. In our experiments we have reproduced the second best-performing system in the CoNLL 2008 shared task in syntactic and semantic parsing (Johansson and Nugues, 2008).9

The English training data for the CoNLL 2008 shared task were obtained from PropBank and NomBank. For licensing reasons, we used OntoNotes 4.0, which includes PropBank, but not NomBank. This means that our system is only trained to classify verbal predicates. We used the Clearparser conversion tool10 to convert the OntoNotes 4.0 data and subsequently supplied syntactic dependency trees using our different conversion schemes. We rely on gold standard argument identification and focus solely on the performance metric semantic labeled F1.

8 http://www.clips.ua.ac.be/sem2012-st-neg/data.html
9 http://nlp.cs.lth.se/software/semantic_parsing:_nombank_frames
10 http://code.google.com/p/clearparser/
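To make the cue-dependent features in Figure 4 concrete, the sketch below computes a directed dependency distance and an unlexicalized dependency path between a negation cue and a candidate scope token. It is not the authors' implementation; the data layout (parallel dicts of head indices and labels) and the toy sentence are assumptions for illustration only.

```python
# Sketch of two cue-dependent features from Figure 4 (not the authors' code):
# the dependency path and a directed dependency distance between a negation
# cue and a candidate scope token. Input uses 1-based token ids, head id 0
# for the root, mirroring CoNLL-style output of any conversion scheme.

def chain_to_root(i, heads, labels):
    """(token id, dependency label) pairs from token i up to the root."""
    chain = []
    while i != 0:
        chain.append((i, labels[i]))
        i = heads[i]
    return chain

def dependency_path(cue, tok, heads, labels):
    """Labelled path cue -> lowest common ancestor -> candidate token."""
    up = chain_to_root(cue, heads, labels)       # cue upwards
    down = chain_to_root(tok, heads, labels)     # candidate upwards
    up_ids = [i for i, _ in up]
    shared = [k for k, (i, _) in enumerate(down) if i in up_ids]
    if not shared:
        return None                              # no common ancestor
    lca_down = shared[0]
    lca_up = up_ids.index(down[lca_down][0])
    ups = [lab + "^" for _, lab in up[:lca_up]]                  # arcs walked up
    downs = [lab + "v" for _, lab in reversed(down[:lca_down])]  # then down
    return "/".join(ups + downs)

def directed_distance(cue, tok, heads, labels):
    """Number of arcs on the path, signed by the linear direction of the token."""
    path = dependency_path(cue, tok, heads, labels)
    if not path:
        return 0
    length = path.count("/") + 1
    return length if tok > cue else -length

# Toy sentence "We have not seen him" with made-up heads and labels.
heads = {1: 4, 2: 4, 3: 4, 4: 0, 5: 4}
labels = {1: "SBJ", 2: "VC", 3: "ADV", 4: "ROOT", 5: "OBJ"}
print(dependency_path(3, 5, heads, labels))      # ADV^/OBJv
print(directed_distance(3, 5, heads, labels))    # 2
```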

2.3 Statistical machine translation

The effect of the different conversion schemes was also evaluated on SMT. We used the reordering by parsing framework described by Elming and Haulrich (2011). This approach integrates a syntactically informed reordering model into a phrase-based SMT system. The model learns to predict the word order of the translation based on source sentence information such as syntactic dependency relations. Syntax-informed SMT is known to be useful for translating between languages with different word orders (Galley and Manning, 2009; Xu et al., 2009), e.g. English and German.

The baseline SMT system is created as described in the guidelines from the original shared task.11 The only modifications are that we use truecasing instead of lowercasing and recasing, and allow training sentences of up to 80 words. We used data from the English-German restricted task: ∼3M parallel words of news, ∼46M parallel words of Europarl, and ∼309M words of monolingual Europarl and news. We use newstest2008 for tuning, newstest2009 for development, and newstest2010 for testing. The distortion limit was set to 10, which is also where the baseline system performed best. The phrase table and the lexical reordering model are trained on the union of all parallel data with a maximum phrase length of 7, and the 5-gram language model is trained on the entire monolingual data set.

We test four different experimental systems that differ from the baseline only in the addition of a syntactically informed reordering model. The baseline system was one of the tied best performing systems in the WMT 2011 shared task on this dataset. The four experimental systems have reordering models that are trained on the first 25,000 sentences of the parallel news data, which have been parsed with each of the tree-to-dependency conversion schemes. The reordering models condition reordering on the word forms, POS, and syntactic dependency relations of the words to be reordered, as described in Elming and Haulrich (2011). The paper shows that while reordering by parsing leads to significant improvements in standard metrics such as BLEU (Papineni et al., 2002) and METEOR (Lavie and Agarwal, 2007), the improvements are even more pronounced in human judgements. All SMT results reported below are averages based on 5 MERT runs, following Clark et al. (2011).

2.4 Sentence compression

Sentence compression is a restricted form of sentence simplification with numerous usages, including text simplification, summarization and recognizing textual entailment. The most commonly used dataset in the literature is the Ziff-Davis corpus.12 A widely used baseline for sentence compression experiments is Knight and Marcu (2002), who introduce two models: the noisy-channel model and a decision tree-based model. Both are tree-based methods that find the most likely compressed syntactic tree and output the yield of this tree. McDonald (2006) instead uses syntactic features to directly find the most likely compressed sentence.

Here we learn a discriminative HMM model (Collins, 2002) of sentence compression using MIRA (Crammer and Singer, 2003), comparable to previously explored models of noun phrase chunking. Our model is thus neither tree-based nor sentence-based. Instead we think of sentence compression as a sequence labeling problem. We compare a model informed by word forms and predicted POS with models also informed by predicted dependency labels. The baseline feature model conditions emission probabilities on word forms and POS using a ±2 window and combinations thereof. The augmented syntactic feature model simply adds dependency labels within the same window.

11 http://www.statmt.org/wmt11/translation-task.html
12 LDC Catalog No. LDC93T3A.
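As an illustration of this sequence-labeling view of compression, the sketch below builds emission features from word forms and POS in a ±2 window and optionally adds dependency labels in the same window. The function names and the toy annotations are invented for illustration; this is not the authors' implementation.

```python
# Sketch (assumed feature extraction, not the authors' code) for the
# keep/drop sequence-labeling model of Section 2.4: word forms and POS in a
# +/-2 window, optionally augmented with dependency labels.

def window(seq, i, k=2, pad="<PAD>"):
    """Items of seq in a +/-k window around position i, padded at the edges."""
    padded = [pad] * k + list(seq) + [pad] * k
    return padded[i:i + 2 * k + 1]

def token_features(words, pos, deprels, i, use_deprels=True):
    feats = {}
    for offset, (w, p) in enumerate(zip(window(words, i), window(pos, i)),
                                    start=-2):
        feats[f"word[{offset}]"] = w
        feats[f"pos[{offset}]"] = p
        feats[f"word+pos[{offset}]"] = f"{w}/{p}"       # combination feature
    if use_deprels:
        for offset, d in enumerate(window(deprels, i), start=-2):
            feats[f"deprel[{offset}]"] = d              # augmented model only
    return feats

# Toy usage with made-up annotations.
words   = ["68000", "sweden", "ab", "introduced", "the", "teleserve"]
pos     = ["CD", "NNP", "NNP", "VBD", "DT", "NN"]
deprels = ["NMOD", "NMOD", "SBJ", "ROOT", "NMOD", "OBJ"]
print(token_features(words, pos, deprels, 3))
```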
2.5 Perspective classification

Finally, we include a document classification dataset from Lin and Hauptmann (2006).13 The dataset consists of blog posts posted at bitterlemons.org by Israelis and Palestinians. The bitterlemons.org website is set up to "contribute to mutual understanding through the open exchange of ideas." In the dataset, each blog post is labeled as either Israeli or Palestinian. Our baseline model is just a standard bag-of-words model, and the system adds dependency triplets to the bag-of-words model in a way similar to Joshi and Penstein-Rose (2009). We do not remove stop words, since perspective classification is similar to authorship attribution, where stop words are known to be informative. We evaluate performance doing cross-validation over the official training data, setting the parameters of our learning algorithm for each fold doing cross-validation over the actual training data. We used soft-margin support vector machine learning (Cortes and Vapnik, 1995), tuning the kernel (linear or polynomial with degree 3) and C = {0.1, 1, 5, 10}.

13 https://sites.google.com/site/weihaolinatcmu/data
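A minimal sketch of this setup is given below, assuming scikit-learn as the learning toolkit (the paper does not name one), hypothetical function names, and 5-fold cross-validation; it adds dependency triplets to the bag of words as extra "tokens" and tunes the kernel and C as described above.

```python
# Sketch of the perspective classification experiment (assumed setup, not the
# authors' code): bag of words, optionally extended with dependency triplets
# such as SUBJ(John,snore), classified with a soft-margin SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def with_triplets(tokens, triples):
    """Append dependency triplets, e.g. SUBJ(John,snore), as extra tokens."""
    return list(tokens) + ["%s(%s,%s)" % t for t in triples]

def evaluate(documents, triplets, labels, use_triplets=True):
    """Mean cross-validated accuracy of the bag-of-words (+ triplets) SVM."""
    texts = [" ".join(with_triplets(d, t) if use_triplets else d)
             for d, t in zip(documents, triplets)]
    model = GridSearchCV(
        make_pipeline(
            CountVectorizer(tokenizer=str.split, token_pattern=None,
                            lowercase=False),
            SVC()),
        param_grid={"svc__kernel": ["linear", "poly"],
                    "svc__degree": [3],
                    "svc__C": [0.1, 1, 5, 10]},
        cv=5)                       # inner CV tunes kernel and C per fold
    return cross_val_score(model, texts, labels, cv=5).mean()
```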

                            bl      yamada   conll07   ewt      lth
    DEPRELS                 -       12       21        47       41
    PTB-23 (LAS)            -       88.99    88.52     81.36*   87.52
    PTB-23 (UAS)            -       90.21    90.12     84.22*   90.29
    Neg: scope F1           -       81.27    80.43     78.70    79.57
    Neg: event F1           -       76.19    72.90     73.15    76.24
    Neg: full negation F1   -       67.94    63.24     61.60    64.31
    SentComp F1             68.47   72.07    64.29     71.56    71.56
    SMT-dev-Meteor          35.80   36.06    36.06     36.16    36.08
    SMT-test-Meteor         37.25   37.48    37.50     37.58    37.51
    SMT-dev-BLEU            13.66   14.14    14.09     14.04    14.06
    SMT-test-BLEU           14.67   15.04    15.04     14.96    15.11
    SRL-22-gold             -       81.35    83.22     84.72    84.01
    SRL-23-gold             -       79.09    80.85     80.39    82.01
    SRL-22-pred             -       74.41    76.22     78.29    66.32
    SRL-23-pred             -       73.42    74.34     75.80    64.06
    bitterlemons.org        96.08   97.06    95.58     96.08    96.57

Table 1: Results. *: Low parsing results on PTB-23 using ewt are explained by changes between the PTB-III and the OntoNotes 4.0 release of the English Treebank.

3 Results and discussion

Our results are presented in Table 1. The parsing results are obtained relying on predicted POS rather than, as often done in the dependency parsing literature, relying on gold-standard POS. Note that they comply with the result in Schwartz et al. (2012) that Yamada-Matsumoto-style annotation is more easily learnable.

The negation resolution results are significantly better using syntactic features in yamada annotation. It is not surprising that a syntactically oriented conversion scheme performs well in this task. Since Lapponi et al. (2012) used Maltparser (Nivre et al., 2007) with the freely available pre-trained parsing model for English,14 we decided to also run that parser with the gold-standard cues, in addition to Mate. The pre-trained model was trained on Sections 2–21 of the Wall Street Journal section of the English Treebank (Marcus et al., 1993), augmented with 4,000 sentences from the QuestionBank,15 which was converted using the Stanford converter and is thus similar to the ewt annotations used here. The results were better than using ewt with Mate trained on Sections 2–21 alone, but worse than the results obtained here with the yamada conversion scheme. The F1 score on full negation was 66.92%.

The case-sensitive BLEU evaluation of the SMT systems indicates that the choice of conversion scheme has no significant impact on overall performance. The difference to the baseline system is significant (p < 0.01), showing that the reordering model leads to improvement using any of the schemes. However, the conversion schemes lead to very different translations. This can be seen, for example, by the fact that the relative tree edit distance between translations of different syntactically informed SMT systems is 12% higher than within each system (across different MERT optimizations). The reordering approach puts a lot of weight on the syntactic dependency relations. As a consequence, the number of relation types used in the conversion schemes proves important.

14 http://www.maltparser.org/mco/english_parser/engmalt.html
15 http://www.computing.dcu.ie/~jjudge/qtreebank/

REFERENCE: Zum Glück kam ich beim Strassenbahnfahren an die richtige Stelle .
SOURCE: Luckily , on the way to the tram , I found the right place .
yamada: Glücklicherweise hat auf dem Weg zur S-Bahn , stellte ich fest , dass der richtige Ort .
conll07: Glücklicherweise hat auf dem Weg zur S-Bahn , stellte ich fest , dass der richtige Ort .
ewt: Zum Glück fand ich auf dem Weg zur S-Bahn , am richtigen Platz .
lth: Zum Glück fand ich auf dem Weg zur S-Bahn , am richtigen Platz .
BASELINE: Zum Glück hat auf dem Weg zur S-Bahn , ich fand den richtigen Platz .

Figure 5: Examples of SMT output.

ORIGINAL: * 68000 sweden ab of uppsala , sweden , introduced the teleserve , an integrated answering machine and voice-message handler that links a macintosh to touch-tone phones .
BASELINE: 68000 sweden ab introduced the teleserve an integrated answering machine and voice-message handler .
yamada: 68000 sweden ab introduced the teleserve integrated answering machine and voice-message handler .
conll07: 68000 sweden ab sweden introduced the teleserve integrated answering machine and voice-message handler .
ewt: 68000 sweden ab introduced the teleserve integrated answering machine and voice-message handler .
lth: 68000 sweden ab introduced the teleserve an integrated answering machine and voice-message handler .
HUMAN: 68000 sweden ab introduced the teleserve integrated answering machine and voice-message handler .

Figure 6: Examples of sentence compression output.

Consider the example in Figure 5. German requires the verb in second position, which is obeyed in the much better translations produced by the ewt and lth systems. Interestingly, the four schemes produce virtually identical structures for the source sentence, but they differ in their labeling. Where conll07 and yamada use the same relation for the first two constituents (ADV and vMOD, respectively), ewt and lth distinguish between them (ADVMOD/PREP and ADV/LOC). This distinction may be what enables the better translation, since the model may learn to move the verb after the sentence adverbial. In the other schemes, sentence adverbials are not distinguished from locational adverbials. Generally, ewt and lth have more than twice as many relation types as the other schemes.

The schemes ewt and lth lead to better SRL performance than conll07 and yamada when relying on gold-standard syntactic dependency trees. This supports the claims put forward in Johansson and Nugues (2007). These annotations also happen to use a larger set of dependency labels, however, and syntactic structures may be harder to reconstruct, as reflected by labeled attachment scores (LAS) in syntactic parsing. The biggest drop in SRL performance going from gold-standard to predicted syntactic trees is clearly for the lth scheme, at an average 17.8% absolute loss (yamada 5.8%; conll07 6.8%; ewt 5.5%; lth 17.8%).

The ewt scheme resembles lth in most respects, but in preposition-noun dependencies it marks the preposition as the head rather than the noun. This is an important difference for SRL, because semantic arguments are often nouns embedded in prepositional phrases, like agents in passive constructions. It may also be that the difference in performance is simply explained by the syntactic analysis of prepositional phrases being easier to reconstruct.

The sentence compression results are generally much better than the models proposed in Knight and Marcu (2002). Their noisy channel model obtains an F1 compression score of 14.58%, whereas the decision tree-based model obtains an F1 compression score of 31.71%. While F1 scores should be complemented by human judgements, as there are typically many good sentence compressions of any source sentence, we believe that error reductions of more than 50% indicate that the models used here (though previously unexplored in the literature) are fully competitive with state-of-the-art models.

We also see that the models using syntactic features perform better than our baseline model, except for the model using conll07 dependency annotation. This may be surprising to some, since distributional information is often considered important in sentence compression (Knight and Marcu, 2002). Some output examples are presented in Figure 6. Unsurprisingly, it is seen that the baseline model produces grammatically incorrect output, and that most of our syntactic models correct the error leading to ungrammaticality. The model using ewt annotation is an exception. We also see that conll07 introduces another error. We believe that this is due to the way the conll07 tree-to-dependency conversion scheme handles coordination. While the word Sweden is not coordinated, it occurs in a context, surrounded by commas, that is very similar to coordinated items.

In perspective classification we see that syntactic features based on yamada and lth annotations lead to improvements, with yamada leading to slightly better results than lth. The fact that a syntactically oriented conversion scheme leads to the best results may reflect that perspective classification, like authorship attribution, is less about content than stylistics.

While lth seems to lead to the overall best results, we stress the fact that the five tasks considered here are incommensurable. What is more interesting is that, task to task, results are so different. The semantically oriented conversion schemes, ewt and lth, lead to the best results in SRL, but with a significant drop for lth when relying on predicted parses, while the yamada scheme is competitive in the other four tasks. This may be because distributional information is more important in these tasks than in SRL.

The distribution of dependency labels seems relatively stable across applications, but differences in data may of course also affect the usefulness of different annotations. Note that conll07 leads to very good results for negation resolution, but bad results for SRL. See Figure 7 for the distribution of labels in the conll07 conversion scheme on the SRL and negation scope resolution data. Many differences relate to differences in sentence length. The negation resolution data is literary text with shorter sentences, which therefore uses more punctuation and has more root dependencies than newspaper articles. On the other hand we do see very few predicate dependencies in the SRL data. This may affect down-stream results when classifying verbal predicates in SRL. We also note that the number of dependency labels has less impact on results in general than we would have expected. The number of dependency labels and the lack of support for some of them may explain the drop with predicted syntactic parses in our SRL results, but generally we obtain our best results with yamada and lth annotations, which have 12 and 41 dependency labels, respectively.

Figure 7: Distributions of dependency labels in the Yamada-Matsumoto scheme. [Bar chart; legend: srl, neg; y-axis: proportion of labels, 0.00–0.40; x-axis: ADV, AMOD, CC, COORD, DEP, EXP, GAP, IOBJ, LGS, NMOD, OBJ, P, PMOD, PRD, PRN, PRT, ROOT, SBJ, VC, VMOD.]

4 Conclusions

We evaluated four different tree-to-dependency conversion schemes, putting more or less emphasis on syntactic or semantic evidence, in five down-stream applications, including SMT and negation resolution. Our results show why it is important to be precise about exactly what tree-to-dependency conversion scheme is used. Tools like pennconverter.jar give us a wide range of options when converting constituent-based treebanks, and even small differences may have significant impact on down-stream performance. The small differences are also important for more linguistic comparisons that also tend to gloss over exactly what conversion scheme is used, e.g. Ivanova et al. (2012).

Acknowledgements

Hector Martinez is funded by the ERC grant CLARA No. 238405, and Anders Søgaard is funded by the ERC Starting Grant LOWLANDS No. 313695.

References

Emily Bender, Dan Flickinger, Stephan Oepen, and Yi Zhang. 2011. Parser evaluation over local and non-local dependencies in a large corpus. In EMNLP.
Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In COLING.
Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better hypothesis testing for statistical machine translation: controlling for optimizer instability. In ACL.
Mike Collins. 1999. Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania.
Michael Collins. 2002. Discriminative training methods for Hidden Markov Models. In EMNLP.
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20(3):273–297.
Koby Crammer and Yoram Singer. 2003. Ultraconservative algorithms for multiclass problems. In JMLR.
Jakob Elming and Martin Haulrich. 2011. Reordering by parsing. In Proceedings of the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011).
Michel Galley and Christopher Manning. 2009. Quadratic-time dependency parsing for machine translation. In ACL.
Keith Hall, Ryan McDonald, Jason Katz-Brown, and Michael Ringgaard. 2011. Training dependency parsers by jointly optimizing multiple objectives. In EMNLP.
Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger. 2012. Who did what to whom? A contrastive study of syntactico-semantic dependencies. In LAW.
Richard Johansson and Alessandro Moschitti. 2010. Syntactic and semantic structure for opinion expression detection. In CoNLL.
Richard Johansson and Pierre Nugues. 2007. Extended constituent-to-dependency conversion for English. In NODALIDA.
Richard Johansson and Pierre Nugues. 2008. Dependency-based syntactic-semantic analysis with PropBank and NomBank. In CoNLL.
Mahesh Joshi and Carolyn Penstein-Rose. 2009. Generalizing dependency features for opinion mining. In ACL.
Kevin Knight and Daniel Marcu. 2002. Summarization beyond sentence extraction: a probabilistic approach to sentence compression. Artificial Intelligence, 139:91–107.
Emanuele Lapponi, Erik Velldal, Lilja Øvrelid, and Jonathon Read. 2012. UiO2: Sequence-labeling negation using dependency features. In *SEM.
Alon Lavie and Abhaya Agarwal. 2007. Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. In WMT.
Wei-Hao Lin and Alexander Hauptmann. 2006. Are these documents written from different perspectives? In COLING-ACL.
Mitchell Marcus, Mary Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.
Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsers. In EMNLP-CoNLL.
Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing 2005, pages 523–530, Vancouver, British Columbia.
Ryan McDonald. 2006. Discriminative sentence compression with soft syntactic evidence. In EACL.
Makoto Miwa, Sampo Pyysalo, Tadayoshi Hara, and Jun'ichi Tsujii. 2010. Evaluating dependency representation for event extraction. In COLING.
Yusuke Miyao, Rune Sætre, Kenji Sagae, Takuya Matsuzaki, and Jun'ichi Tsujii. 2008. Task-oriented evaluation of syntactic parsers and their representations. In ACL.
Roser Morante and Eduardo Blanco. 2012. *SEM 2012 shared task: Resolving the scope and focus of negation. In *SEM.
Roser Morante and Caroline Sporleder. 2012. Modality and negation: An introduction to the special issue. Computational Linguistics, 38(2):223–260.
Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: a language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL.
Slav Petrov and Ryan McDonald. 2012. Overview of the 2012 Shared Task on Parsing the Web. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).
Roy Schwartz, Omri Abend, Roi Reichart, and Ari Rappoport. 2011. Neutralizing linguistically problematic annotations in unsupervised dependency parsing evaluation. In ACL.
Roy Schwartz, Omri Abend, and Ari Rappoport. 2012. Learnability-based syntactic annotation design. In COLING.

Reut Tsarfaty, Joakim Nivre, and Evelina Andersson. 2012. Cross-framework evaluation for statistical parsing. In EACL.
Erik Velldal, Lilja Øvrelid, Jonathon Read, and Stephan Oepen. 2012. Speculation and negation: Rules, rankers, and the role of syntax. Computational Linguistics, 38(2):369–410.
Peng Xu, Jaeho Kang, Michael Ringgaard, and Franz Och. 2009. Using a dependency parser to improve SMT for subject-object-verb languages. In NAACL-HLT, Boulder, Colorado.
Deniz Yuret, Laura Rimell, and Aydin Han. 2012. Parser evaluation using textual entailments. Language Resources and Evaluation, published online 31 October 2012.
