Discourse Relation Sense Classification with Two-Step Classifiers

Yusuke Kido
The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
[email protected]

Akiko Aizawa
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan
[email protected]

Abstract

Discourse Relation Sense Classification is the classification task of assigning a sense to discourse relations, and is a part of the series of tasks in discourse parsing. This paper analyzes the characteristics of the data we work with and describes the system we submitted to the CoNLL-2016 Shared Task. Our system uses two sets of two-step classifiers for Explicit and AltLex relations and for Implicit and EntRel relations, respectively. Despite the simplicity of the implementation, it achieves competitive performance using minimalistic features.

The submitted version of our system ranked 8th with an overall F1 score of 0.5188. The evaluation on the test dataset achieved the best performance for Explicit relations, with an F1 score of 0.9022.

1 Introduction

In the CoNLL-2015 Shared Task on Shallow Discourse Parsing (Xue et al., 2015), all the participants adopted some variation of the pipeline architecture proposed by Lin et al. (2014). Among the components of the architecture, the main challenges are exact argument extraction and Non-Explicit sense classification (Lin et al., 2014).

Argument extraction is the task of identifying the two argument spans of a given discourse relation. Although the reported scores were relatively low for these components, this is partially because of the "quite harsh" evaluation [1]. This led to the introduction of a new evaluation criterion based on partial argument matching in the CoNLL-2016 Shared Task. On the other hand, the sense classification components, which assign a sense to each discourse relation, continue to perform poorly. In particular, Non-Explicit sense classification is a difficult task, and even the best system achieved an F1 score of only 0.42 given the gold standard argument pairs without error propagation (Wang and Lan, 2015).

In response to this situation, Discourse Relation Sense Classification has become a separate task in the CoNLL-2016 Shared Task (Xue et al., 2016). In this task, participants implement a system that takes gold standard argument pairs and assigns a sense to each of them. To tackle this task, we first analyzed the characteristics of the discourse relation data. We then implemented a classification system based on the analysis. One of the distinctive points of our system is that, compared to existing systems, it uses a smaller number of features, which keeps the source code short and clear and the training time fast. The performance is nonetheless competitive, and its potential for improvement is also promising owing to the short program.

This paper aims to reorganize the ideas about what this task actually involves and to show the future direction for improvement. It is organized as follows: Section 2 presents the data analysis. The implementation of the system we submitted is then described in Section 3. The experimental results and the conclusion are provided in Sections 4 and 5.

[1] CoNLL 2016 Shared Task Official Blog: http://conll16st.blogspot.com/2016/04/partial-scoring-and-other-evaluation.html

2 Data Analysis

There are four types of discourse relations, i.e., Explicit, Implicit, AltLex, and EntRel. In the official scorer, these discourse relations are divided into two groups, namely, Explicit and Non-Explicit relations, and they are evaluated separately. AltLex relations are classified as Non-Explicit relations, but they share some characteristics with Explicit relations in that they have words that explicitly serve as connective words in the text. These connective words are one of the most important features in sense classification, as explained later; therefore, we divide the relation types into (i) Explicit and AltLex and (ii) Implicit and EntRel in this analysis. Throughout this paper, we do not distinguish between Explicit connectives and the words that work as connectives in AltLex relations; both are simply referred to as connectives.
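As a rough illustration of this grouping (not part of the submitted system), the following Python sketch counts relations per group. It assumes the CoNLL-2016 Shared Task relations.json format, in which each line is a JSON object with a "Type" field taking one of the four values above; the file path is hypothetical.

    import json
    from collections import Counter

    # Illustrative sketch: split relations into the two groups used in this analysis.
    # Assumes the CoNLL-2016 relations.json format (one JSON object per line,
    # with a "Type" field in {"Explicit", "Implicit", "AltLex", "EntRel"}).
    EXPLICIT_LIKE = {"Explicit", "AltLex"}  # relations with connective words in the text

    def load_relations(path):
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    def group_counts(relations):
        counts = Counter()
        for rel in relations:
            group = "Explicit/AltLex" if rel["Type"] in EXPLICIT_LIKE else "Implicit/EntRel"
            counts[group] += 1
        return counts

    if __name__ == "__main__":
        # "relations.json" is a hypothetical path to the official training data.
        print(group_counts(load_relations("relations.json")))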
2.1 Explicit and AltLex Discourse Relations

In the sense classification of Explicit and AltLex relations, connective words serve as important features.

Figure 1 shows the distribution of senses per connective word over the Explicit relations. For example, 91.8% of the relations with the connective word and are labeled as Expansion.Conjunction, and 5.8% as Contingency.Cause.Result. As can be seen, each connective word is mostly covered by only a few senses. Some words, such as also and if, have more than 98.8% coverage by a single sense.

Based on this observation, it is easy to build a reasonably accurate sense classifier simply by taking connective words as a feature. For example, one obvious method is a majority classifier that assigns the most frequent sense among the training relations that share the same connective word. Figure 2 shows the per-sense accuracy of such a classifier on the training dataset. The method is rather simple, but it achieves more than 80% accuracy for most of the senses.
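A minimal sketch of such a connective-based majority classifier is given below. It is illustrative only (not the submitted system) and assumes the CoNLL-2016 relations.json format, where rel["Connective"]["RawText"] holds the connective string and rel["Sense"] lists the gold senses.

    from collections import Counter, defaultdict

    # Illustrative majority-sense classifier keyed by the connective string.
    def train_majority_classifier(relations):
        sense_counts = defaultdict(Counter)
        for rel in relations:
            if rel["Type"] not in ("Explicit", "AltLex"):
                continue
            connective = rel["Connective"]["RawText"].lower()
            for sense in rel["Sense"]:
                sense_counts[connective][sense] += 1
        # Most frequent sense per connective, plus a global fallback for unseen connectives.
        global_counts = Counter()
        for counter in sense_counts.values():
            global_counts.update(counter)
        fallback = global_counts.most_common(1)[0][0] if global_counts else None
        majority = {c: counter.most_common(1)[0][0] for c, counter in sense_counts.items()}
        return majority, fallback

    def predict(majority, fallback, connective_text):
        return majority.get(connective_text.lower(), fallback)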
One exception is Comparison.Concession, which had only a 17.4% accuracy. This sense is derived from Comparison.Concession and Comparison.Pragmatic concession in the original PDTB, and applies "when the connective indicates that one of the arguments describes a situation A which causes C, while the other asserts (or implies) ¬C" (Prasad et al., 2007). Discourse relations with connective words such as although, but, and however are assigned this sense. In the evaluation using the development data, the system assigned Comparison.Contrast to most discourse relations labeled as Comparison.Concession in the golden data. Table 1 shows the senses the system assigned. For example, some of the discourse relations that have the connective word while are labeled as Comparison.Concession in the golden data, but the system assigned them Comparison.Contrast.

Table 1: System output for discourse relations that are labeled as Comparison.Concession in the golden data. The left and right columns show the connective words and the sense assigned by the system, respectively.

    Connective      Assigned Sense
    while           Comparison.Contrast
    even though     Comparison.Concession
    still           Comparison.Contrast
    nevertheless    Comparison.Contrast
    but             Comparison.Contrast
    yet             Comparison.Contrast
    though          Comparison.Contrast
    nonetheless     Comparison.Concession
    even if         Contingency.Condition
    although        Comparison.Contrast

According to the annotation manual, Contrast and Concession differ in that only Concession has directionality in the interpretation of the arguments. Distinguishing these two senses is, however, ambiguous and difficult, even for human annotators.

2.2 Implicit and EntRel Discourse Relations

By definition, Implicit and EntRel relations have no connective words in the text, which complicates the sense classification task considerably. Other researchers have overcome this problem by applying machine-learning techniques such as a Naive Bayes classifier (Wang and Lan, 2015) or AdaBoost (Stepanov et al., 2015). They use various features, including those obtained from parses of the argument texts.

As a baseline, we first implemented a support vector machine (SVM) classifier taking a bag-of-words of the tokens in the argument texts as features. In the evaluation, this classifier was found to assign EntRel to a large part of the input data. This trend is particularly noticeable for relatively infrequent senses. The problem is partially attributable to the unbalanced data. In fact, there are more EntRel instances included in the training data than

Figure 1: Distribution of the sense assigned to each connective word. All Explicit relations with the ten most frequent connective words are extracted from the official training data for the CoNLL-2016 Shared Task.
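For reference, a bag-of-words SVM baseline of the kind described in Section 2.2 could be sketched as follows with scikit-learn. This is an approximation under the same relations.json assumptions (Arg1 and Arg2 carry a "RawText" field, "Sense" is a list of gold senses), not the authors' actual implementation.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def arg_text(rel):
        # Concatenate the raw text of both arguments of a relation.
        return rel["Arg1"]["RawText"] + " " + rel["Arg2"]["RawText"]

    def train_bow_svm(train_relations):
        # Train only on the relation types without connective words.
        data = [r for r in train_relations if r["Type"] in ("Implicit", "EntRel")]
        texts = [arg_text(r) for r in data]
        labels = [r["Sense"][0] for r in data]  # first gold sense as the target label
        model = make_pipeline(CountVectorizer(lowercase=True), LinearSVC())
        model.fit(texts, labels)
        return model

    # Usage (hypothetical): model = train_bow_svm(train_relations)
    #                       predicted = model.predict([arg_text(rel)])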