Discovering Implicit Discourse Relations Through Brown Cluster Pair Representation and Coreference Patterns

Attapol T. Rutherford
Department of Computer Science
Brandeis University
Waltham, MA 02453, USA
[email protected]

Nianwen Xue
Department of Computer Science
Brandeis University
Waltham, MA 02453, USA
[email protected]

Abstract

Sentences form coherent relations in a discourse without discourse connectives more frequently than with connectives. The senses of these implicit discourse relations that hold between a sentence pair, however, are challenging to infer. Here, we employ Brown cluster pairs to represent discourse relations and incorporate coreference patterns to identify senses of implicit discourse relations in naturally occurring text. Our system improves the baseline performance by as much as 25%. Feature analyses suggest that Brown cluster pairs and coreference patterns can reveal many key linguistic characteristics of each type of discourse relation.

1 Introduction

Sentences must be pieced together logically in a discourse to form coherent text. Many discourse relations in the text are signaled explicitly through a closed set of discourse connectives. Simply disambiguating the meaning of discourse connectives can determine whether adjacent clauses are temporally or causally related (Pitler et al., 2008; Wellner et al., 2009). Discourse relations and their senses, however, can also be inferred by the reader even without discourse connectives. These implicit discourse relations in fact outnumber explicit discourse relations in naturally occurring text. Inferring the types or senses of implicit discourse relations remains a key challenge in automatic discourse analysis.

A discourse parser requires many subcomponents which form a long pipeline. Implicit discourse relation discovery has been shown to be the main performance bottleneck of an end-to-end parser (Lin et al., 2010). It is also central to many applications such as automatic summarization and question-answering systems.

Existing systems, which make heavy use of word pairs, suffer from the data sparsity problem, as a word pair in the training data may not appear in the test data. A better representation of two adjacent sentences beyond word pairs could have a significant impact on predicting the sense of the discourse relation that holds between them. Data-driven, theory-independent word classification such as Brown clustering should be able to provide a more compact word representation (Brown et al., 1992). The Brown clustering algorithm induces a hierarchy of words in a large unannotated corpus based on word co-occurrences within a window. The induced hierarchy might give rise to features that we would otherwise miss. In this paper, we propose to use the Cartesian product of the Brown cluster assignments of the sentence pair as an alternative abstract word representation for building an implicit discourse relation classifier.

Through word-level semantic commonalities revealed by Brown clusters and entity-level relations revealed by coreference resolution, we might be able to paint a more complete picture of the discourse relation in question. Coreference resolution unveils the patterns of entity realization within the discourse, which might provide clues for the types of the discourse relations. The information about certain entities or mentions in one sentence should be carried over to the next sentence to form a coherent relation. It is possible that coreference chains and semantically-related predicates in the local context might show some patterns that characterize types of discourse relations. We hypothesize that coreferential rates and coreference patterns created by Brown clusters should help characterize different types of discourse relations.

Here, we introduce two novel sets of features for implicit discourse relation classification. Further, we investigate the effects of using Brown clusters as an alternative word representation and analyze the impactful features that arise from Brown cluster pairs. We also study coreferential patterns in different types of discourse relations in addition to using them to boost the performance of our classifier. These two sets of features, along with previously used features, outperform the baseline systems by approximately 5% absolute across all categories and reveal many important characteristics of implicit discourse relations.

2 Sense annotation in Penn Discourse Treebank

The Penn Discourse Treebank (PDTB) is the largest corpus richly annotated with explicit and implicit discourse relations and their senses (Prasad et al., 2008). PDTB is drawn from Wall Street Journal articles with overlapping annotations with the Penn Treebank (Marcus et al., 1993). Each discourse relation contains the information about the extent of the arguments, which can be a sentence, a constituent, or an incontiguous span of text. Each discourse relation is also annotated with the sense of the relation that holds between the two arguments. In the case of implicit discourse relations, where the discourse connectives are absent, the most appropriate connective is annotated.

The senses are organized hierarchically. Our focus is on the top level senses because they are the four fundamental discourse relations that various discourse analytic theories seem to converge on (Mann and Thompson, 1988). The top level senses are COMPARISON, CONTINGENCY, EXPANSION, and TEMPORAL.

The explicit and implicit discourse relations differ almost orthogonally in their distributions of senses (Table 1). This difference has a few implications for studying implicit discourse relations and the uses of discourse connectives (Patterson and Kehler, 2013). For example, TEMPORAL relations constitute only 5% of the implicit relations but 33% of the explicit relations because they might not be as natural to create without discourse connectives. On the other hand, EXPANSION relations might be more cleanly achieved without connectives, as indicated by their dominance among the implicit discourse relations. This imbalance in class distribution requires greater care in building statistical classifiers (Wang et al., 2012).

                      Number of instances
                Implicit           Explicit
COMPARISON       2503 (15.11%)      5589 (33.73%)
CONTINGENCY      4255 (25.68%)      3741 (22.58%)
EXPANSION        8861 (53.48%)        72 (0.43%)
TEMPORAL          950 (5.73%)       3684 (33.73%)
Total           16569 (100%)       13086 (100%)

Table 1: The distribution of senses of implicit discourse relations is imbalanced.

3 Experiment setup

We followed the setup of the previous studies for a fair comparison with the two baseline systems by Pitler et al. (2009) and Park and Cardie (2012). The task is formulated as four separate one-against-all binary classification problems: one for each top level sense of implicit discourse relations. In addition, we add one more classification task with which to test the system. We merge ENTREL with EXPANSION relations to follow the setup used by the two baseline systems. An argument pair is annotated with ENTREL in PDTB if an entity-based coherence relation and no other type of relation can be identified between the two arguments in the pair. In this study, we assume that the gold standard argument pairs are provided for each relation. Most argument pairs for implicit discourse relations are a pair of adjacent sentences or adjacent clauses separated by a semicolon and should be easily extracted.

The PDTB corpus is split into a training set, development set, and test set the same way as in the baseline systems. Sections 2 to 20 are used to train classifiers. Sections 0–1 are used for developing feature sets and tuning models. Sections 21–22 are used for testing the systems.

The statistical models in the following experiments are from the MALLET implementation (McCallum, 2002) and libSVM (Chang and Lin, 2011). For all five binary classification tasks, we try Balanced Winnow (Littlestone, 1988), Maximum Entropy, Naive Bayes, and Support Vector Machine. The parameters and the hyperparameters of each classifier are set to their default values. The code for our model along with the data matrices is available at github.com/attapol/brown_coref_implicit.
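To make the task formulation concrete, the sketch below shows how the five one-against-all binary label sets could be derived from top-level sense annotations. It is only an illustration of the setup described above, not the released code; the `relations` list of (arg1, arg2, sense) tuples and the exact sense strings are assumed placeholders for whatever the PDTB preprocessing produces.

```python
# Illustrative sketch of the five one-against-all binary tasks (hypothetical
# helper, not the released code).  `relations` is assumed to be a list of
# (arg1_tokens, arg2_tokens, sense) tuples with PDTB top-level senses;
# EntRel instances are assumed to carry the placeholder sense "EntRel".

TASKS = {
    "COMPARISON":  lambda s: s == "Comparison",
    "CONTINGENCY": lambda s: s == "Contingency",
    "EXPANSION":   lambda s: s == "Expansion",
    "EXP+ENTREL":  lambda s: s in ("Expansion", "EntRel"),
    "TEMPORAL":    lambda s: s == "Temporal",
}

def make_binary_labels(relations, task):
    """Return a 0/1 label for each relation instance under one one-vs-all task."""
    is_positive = TASKS[task]
    return [1 if is_positive(sense) else 0 for _, _, sense in relations]
```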

4 Features

Unlike the baseline systems, all of the features in the experiments use the output from automatic natural language processing tools. We use the Stanford CoreNLP suite to lemmatize and part-of-speech tag each word (Toutanova et al., 2003; Toutanova and Manning, 2000), obtain the phrase structure and dependency parses for each sentence (De Marneffe et al., 2006; Klein and Manning, 2003), identify all named entities (Finkel et al., 2005), and resolve coreference (Raghunathan et al., 2010; Lee et al., 2011; Lee et al., 2013).

4.1 Features used in previous work

The baseline features consist of the following: first, last, and first 3 words, numerical expressions, time expressions, average verb phrase length, modality, General Inquirer tags, polarity, Levin verb classes, and production rules. These features are described in greater detail by Pitler et al. (2009).

4.2 Brown cluster pair features

To generate Brown cluster assignment pair features, we replace each word with its hard Brown cluster assignment. We used the Brown word clusters provided by MetaOptimize (Turian et al., 2010). 3,200 clusters were induced from the RCV1 corpus, which contains about 63 million tokens from Reuters English newswire. Then we take the Cartesian product of the Brown cluster assignments of the words in Arg1 and the ones of the words in Arg2. For example, suppose Arg1 has two words w_{1,1}, w_{1,2} and Arg2 has three words w_{2,1}, w_{2,2}, w_{2,3}, and let B(.) map a word to its Brown cluster assignment. A word w_{i,j} is replaced by its corresponding Brown cluster assignment b_{i,j} = B(w_{i,j}). The resulting word pair features are (b_{1,1}, b_{2,1}), (b_{1,1}, b_{2,2}), (b_{1,1}, b_{2,3}), (b_{1,2}, b_{2,1}), (b_{1,2}, b_{2,2}), and (b_{1,2}, b_{2,3}). Therefore, this feature set can generate O(3200^2) binary features. The feature set size is orders of magnitude smaller than using the actual words, which can generate O(V^2) distinct binary features, where V is the size of the vocabulary.
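The pair construction just described can be sketched in a few lines. This is a minimal sketch under the assumption that `brown_map` is a dictionary from word to hard cluster id (for example, loaded from the MetaOptimize cluster file); it is not the authors' released implementation.

```python
from itertools import product

def brown_cluster_pair_features(arg1_tokens, arg2_tokens, brown_map):
    """Cartesian product of the Brown cluster assignments of Arg1 and Arg2 words.

    `brown_map` maps a word to its hard cluster id; words without a cluster
    are skipped here.  Because the features are binary, duplicates collapse.
    """
    clusters1 = {brown_map[w] for w in arg1_tokens if w in brown_map}
    clusters2 = {brown_map[w] for w in arg2_tokens if w in brown_map}
    # One binary feature per (b1, b2) pair with b1 from Arg1 and b2 from Arg2.
    return {f"BROWN_PAIR:{b1}_{b2}" for b1, b2 in product(clusters1, clusters2)}
```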
4.3 Coreference-based features

We want to take further advantage of the sentence pairs by considering how coreferential entities play out in them. We consider various inter-sentential coreference patterns to include as features and also to better describe each type of discourse relation with respect to its place in the coreference chain. For compactness in explaining the following features, we define similar words to be the words assigned to the same Brown cluster.

Number of coreferential pairs: We count the number of inter-sentential coreferential pairs. We expect that EXPANSION relations should be more likely to have coreferential pairs because the detail or information about an entity mentioned in Arg1 should be expanded in Arg2. Therefore, entity sharing might be difficult to avoid.

Similar nouns and verbs: A binary feature indicating whether similar or coreferential nouns are the arguments of similar predicates. Predicates and arguments are identified by dependency parses. We notice that sometimes the author uses synonyms while trying to expand on the previous predicates or entities. The words that indicate the common topics might be paraphrased, so exact string matching cannot detect whether the two arguments are still on the same topic. This might be useful for identifying CONTINGENCY relations, as they usually discuss two causally-related events that involve two seemingly unrelated agents and/or predicates.

Similar subject or main predicates: A binary feature indicating whether the main verbs of the two arguments have the same subjects or not, and another binary feature indicating whether the main verbs are similar or not. For our purposes, the two subjects are said to be the same if they are coreferential or assigned to the same Brown cluster. We notice that COMPARISON relations usually have different subjects for the same main verbs and that TEMPORAL relations usually have the same subjects but different main verbs.

4.4 Feature selection and training sample reweighting

The nature of the task and the dataset poses at least two problems in creating a classifier. First, the classification task requires a large number of features, some of which are too rare and inconducive to parameter estimation. Second, the label distribution is highly imbalanced (Table 1), and this might degrade the performance of the classifiers (Japkowicz, 2000). Recently, Park and Cardie (2012) and Wang et al. (2012) addressed these problems directly by optimally selecting a subset of features and training samples. Unlike previous work, we do not discard any of the data in the training set to balance the label distribution. Instead, we reweight the training samples in each class during parameter estimation such that the performance on the development set is maximized. In addition, the number of occurrences of each feature must be greater than a cut-off, which is also tuned to yield the highest performance on the development set.
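The frequency cut-off and sample reweighting can be illustrated as follows. This is a sketch of the idea only, not the MALLET configuration actually used; `prune_rare_features`, `sample_weights`, the cut-off, and the positive-class weight are hypothetical names and hyperparameters assumed to be tuned on the development set.

```python
from collections import Counter

def prune_rare_features(feature_lists, cutoff):
    """Drop features whose document frequency does not exceed the cut-off."""
    counts = Counter(f for feats in feature_lists for f in set(feats))
    keep = {f for f, c in counts.items() if c > cutoff}
    return [[f for f in feats if f in keep] for feats in feature_lists]

def sample_weights(labels, positive_weight):
    """Reweight training samples instead of discarding any of them."""
    return [positive_weight if y == 1 else 1.0 for y in labels]
```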

                          Current                   Park and Cardie (2012)   Pitler et al. (2009)
                          P       R       F1        F1                       F1
COMPARISON vs others      27.34   72.41   39.70     31.32                    21.96
CONTINGENCY vs others     44.52   69.96   54.42     49.82                    47.13
EXPANSION vs others       59.59   85.50   70.23     -                        -
EXP+ENTREL vs others      69.26   95.92   80.44     79.22                    76.42
TEMPORAL vs others        18.52   63.64   28.69     26.57                    16.76

Table 2: Our classifier outperforms the previous systems across all four tasks without the use of gold-standard parses and coreference resolution.

5 Results

Our experiments show that the Brown cluster and coreference features, along with the features from the baseline systems, improve the performance for all discourse relations (Table 2). Consistent with the results from previous work, the Naive Bayes classifier outperforms MaxEnt, Balanced Winnow, and Support Vector Machine across all tasks regardless of feature pruning criteria and training sample reweighting. A possible explanation is that the small dataset size in comparison with the large number of features might favor a generative model like Naive Bayes (Jordan and Ng, 2002). So we only report the performance from the Naive Bayes classifiers.

It is noteworthy that the baseline systems use the gold standard parses provided by the Penn Treebank, but ours does not, because we would like to see how our system performs realistically in conjunction with other pre-processing tasks such as lemmatization, parsing, and coreference resolution. Nevertheless, our system still manages to outperform the baseline systems in all relations by a sizable margin.

Our preliminary results on implicit sense classification suggest that the Brown cluster word representation and coreference patterns might be indicative of the senses of the discourse relations, but we would like to know the extent of the impact of these novel feature sets when used in conjunction with other features. To this aim, we conduct an ablation study, where we exclude one of the feature sets at a time and then test the resulting classifier on the test set. We then rank each feature set by the relative percentage change in F1 score when excluded from the classifier. The data split and experimental setup are identical to the ones described in the previous section but only with Naive Bayes classifiers.

COMPARISON
Feature set                               F1      % change
All features                              39.70   -
All excluding Brown cluster pairs         35.71   -10.05%
All excluding Production rules            37.27   -6.80%
All excluding First, last, and First 3    39.18   -1.40%
All excluding Polarity                    39.39   -0.79%

CONTINGENCY
Feature set                               F1      % change
All                                       54.42   -
All excluding Brown cluster pairs         51.50   -5.37%
All excluding First, last, and First 3    53.56   -1.58%
All excluding Polarity                    53.82   -1.10%
All excluding Coreference                 53.92   -0.92%

EXPANSION
Feature set                               F1      % change
All                                       70.23   -
All excluding Brown cluster pairs         67.48   -3.92%
All excluding First, last, and First 3    69.43   -1.14%
All excluding Inquirer tags               69.73   -0.71%
All excluding Polarity                    69.92   -0.44%

TEMPORAL
Feature set                               F1      % change
All                                       28.69   -
All excluding Brown cluster pairs         24.53   -14.50%
All excluding Production rules            26.51   -7.60%
All excluding First, last, and First 3    26.56   -7.42%
All excluding Polarity                    27.42   -4.43%

Table 3: Ablation study: the four most impactful feature classes and their relative percentage changes are shown. Brown cluster pair features are the most impactful across all relation types.

The ablation study results imply that Brown cluster features are the most impactful feature set across all four types of implicit discourse relations. When ablated, Brown cluster features degrade the performance by the largest percentage compared to the other feature sets regardless of the relation type (Table 3). TEMPORAL relations benefit the most from Brown cluster features. Without them, the F1 score drops by 4.12 absolute or 14.50% relative to the system that uses all of the features.

6 Feature analysis

6.1 Brown cluster features

This feature set is inspired by the word pair features, which are known for their effectiveness in predicting senses of discourse relations between the two arguments. Marcu and Echihabi (2002), for instance, artificially generated implicit discourse relations and used word pair features to perform the classification tasks. Those word pair features work well in this case because their artificially generated dataset is an order of magnitude larger than PDTB. Ideally, we would want to use the word pair features instead of word cluster features if we have enough data to fit the parameters. Consequently, other less sparse handcrafted features prove to be more effective than word pair features for the PDTB data (Pitler et al., 2009). We remedy the sparsity problem by clustering the words that are distributionally similar together, which greatly reduces the number of features.

Since the ablation study is not fine-grained enough to spotlight the effectiveness of the individual features, we quantify the predictiveness of each feature by its mutual information. Under the Naive Bayes conditional independence assumption, the mutual information between the features and the labels can be efficiently computed in a pairwise fashion. The mutual information between a binary feature X_i and class label Y is defined as:

    I(X_i, Y) = \sum_{y} \sum_{x=0,1} \hat{p}(x, y) \log \frac{\hat{p}(x, y)}{\hat{p}(x)\,\hat{p}(y)}

where \hat{p}(\cdot) is the probability distribution function whose parameters are maximum likelihood estimates from the training set. We compute mutual information for all four one-vs-all classification tasks. The computation is done as part of the training pipeline in MALLET to ensure consistency in parameter estimation and smoothing techniques. We then rank the cluster pair features by mutual information. The results are compactly summarized in the bipartite graphs shown in Figure 1, where each edge represents a cluster pair. Since mutual information itself does not indicate whether a feature is favored by one or the other label, we also verify the direction of the effects of each of the features included in the following analysis by comparing the class conditional parameters in the Naive Bayes model.
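A minimal sketch of this computation for a single binary feature and a single one-vs-all label is given below. It uses plain maximum likelihood estimates and does not reproduce the smoothing applied inside MALLET's training pipeline.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) for a binary feature X and a binary task label Y (MLE estimates)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))          # counts of (x, y) pairs
    marg_x, marg_y = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():       # unobserved cells contribute 0
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((marg_x[x] / n) * (marg_y[y] / n)))
    return mi
```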
The most dominant features for COMPARISON classification are the pairs whose members are from the same Brown clusters. We can distinctly see this pattern from the bipartite graph because the nodes on each side are sorted alphabetically. The graph shows many parallel short edges, which suggests that many informative pairs consist of the same clusters. Some of the clusters that participate in such pairs consist of named entities from various categories such as airlines (King, Bell, Virgin, Continental, ...) and companies (Thomson, Volkswagen, Telstra, Siemens). Some of the pairs form a broad category such as political agents (citizens, pilots, nationals, taxpayers) and industries (power, insurance, mining). These parallel patterns in the graph demonstrate that implicit COMPARISON relations might be mainly characterized by juxtaposing and explicitly contrasting two different entities in two adjacent sentences.

Without the use of a named-entity recognition system, these Brown cluster pair features effectively act as features that detect whether the two arguments in the relation contain named entities or nouns from the same categories or not. These more subtle named-entity-related features are cleanly discovered by replacing words with their data-driven Brown clusters without the need for additional layers of pre-processing.

If the words in one cluster semantically relate to the words in another cluster, the two clusters are more likely to become informative features for CONTINGENCY classification. For instance, technical terms in stock and trading (weighted, Nikkei, composite, diffusion) pair up with economic terms (Trading, Interest, Demand, Production). The cluster with analysts and pundits pairs up with the one that predominantly contains quantifiers (actual, exact, ultimate, aggregate). In addition to this pattern, we observed the same parallel pair pattern we found in COMPARISON classification. These results suggest that in establishing a CONTINGENCY relation implicitly, the author might shape the sentences such that they have semantically related words if they do not mention named entities of the same category.

[Figure 1 appears here: four bipartite graphs of the top Brown cluster pair features, one panel per task (Arg 1 – Arg 2 for COMPARISON, CONTINGENCY, EXPANSION, and TEMPORAL), with nodes labeled by representative members of each cluster.]

Figure 1: The bipartite graphs show the top 40 non-stopword Brown cluster pair features for all four classification tasks. Each node on the left and on the right represents a word cluster from Arg1 and Arg2, respectively. We only show the clusters that appear fewer than six times in the top 3,000 pairs to exclude stopwords. Although the four tasks are interrelated, some of the highest mutual information features vary substantially across tasks.

Through Brown cluster pairs, we obtain features that detect a shift between generality and specificity within the relation. For example, a cluster with industrial categories (Electric, Motor, Life, Chemical, Automotive) couples with specific brands or companies (GM, Ford, Barrick, Anglo). Or such a pair might simply reflect a shift in plurality, e.g. businesses - business and Analysts - analyst. EXPANSION relations capture relations in which one argument provides a specification of the previous one and relations in which one argument provides a generalization of the other. Thus, these shift detection features could help distinguish EXPANSION relations.

We found a few common coreference patterns of names in written English to be useful. The first and last name are used in the first sentence to refer to a person who just enters the discourse. That person is referred to just by his/her title and last name in the following sentence. This pattern is found to be informative for EXPANSION relations. For example, the edges (not shown in the graph due to lack of space) from the first name clusters to the title cluster (Mr, Mother, Judge, Dr).

Time expressions constitute the majority of the nodes in the bipartite graph for TEMPORAL relations. More strikingly, specific dates (e.g. clusters that contain positive integers smaller than 31) are more frequently found in Arg2 than Arg1 in implicit TEMPORAL relations. It is possible that TEMPORAL relations are more naturally expressed without a discourse connective if a time point is clearly specified in Arg2 but not in Arg1.

TEMPORAL relations might also be implicitly inferred through detecting a shift in quantities. We notice that clusters whose words indicate changes, e.g. increase, rose, loss, pair with number clusters. Sentences in which such pairs participate might be part of a narrative or a report where one expects a change over time. These changes conveyed by the sentences constitute a natural sequence of events that are temporally related but might not need explicit temporal expressions.

6.2 Coreference features

Coreference features are very effective given that they constitute a very small set compared to the other feature sets. In particular, excluding them from the model reduces F1 scores for TEMPORAL and CONTINGENCY relations by approximately 1% relative to the system that uses all of the features. We found that the sentence pairs in these two types of relations have distinctive coreference patterns.

We count the number of pairs of arguments that are linked by a coreference chain for each type of relation. The coreference chains used in this study are detected automatically from the training set through the Stanford CoreNLP suite (Raghunathan et al., 2010; Lee et al., 2011; Lee et al., 2013). TEMPORAL relations have a significantly higher coreferential rate than the other three relations (p < 0.05, pair-wise t-test corrected for multiple comparisons). The differences between COMPARISON, CONTINGENCY, and EXPANSION, however, are not statistically significant (Figure 2).

[Figure 2 appears here: a bar chart of the coreferential rate (all coreference and subject coreference) for COMPARISON, CONTINGENCY, EXPANSION, and TEMPORAL relations.]

Figure 2: The coreferential rate for TEMPORAL relations is significantly higher than the other three relations (p < 0.05, corrected for multiple comparison).

The choice to use or not to use a discourse connective is strongly motivated by linguistic features at the discourse level (Patterson and Kehler, 2013). Additionally, it is very uncommon to have temporally-related sentences without using explicit discourse connectives. The difference in coreference patterns might be one of the factors that influence the choice of using a discourse connective to signal a TEMPORAL relation. If sentences are coreferentially linked, then it might be more natural to drop a discourse connective because the temporal ordering can be easily inferred without it. For example,

(1) Her story is partly one of personal downfall. [previously] She was an unstinting teacher who won laurels and inspired students... (WSJ0044)

The coreference chain between the two temporally-related sentences in (1) can easily be detected. Inserting previously as suggested by the annotation from the PDTB corpus does not add to the temporal coherence of the sentences and may be deemed unnecessary. But the presence of the coreferential link alone might bias the inference toward a TEMPORAL relation while CONTINGENCY might also be inferred.

Additionally, we count the number of pairs of arguments whose grammatical subjects are linked by a coreference chain to reveal the syntactic-coreferential patterns in different relation types. Although this specific pattern seems rare, more than eight percent of all relations have coreferential grammatical subjects. We observe the same statistically significant differences between TEMPORAL relations and the other three types of relations. More interestingly, the subject coreferential rate for CONTINGENCY relations is the lowest among the three categories (p < 0.05, pair-wise t-test corrected for multiple comparisons).
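The coreferential rates reported in this section can be sketched as follows; this is an illustrative computation, not the authors' exact code. It assumes each argument is represented by the ids of the mentions it contains and that the automatic coreference output is a list of chains, each a set of mention ids.

```python
from collections import defaultdict

def coreferential_rate(relations, coref_chains):
    """Fraction of argument pairs linked by at least one coreference chain,
    computed separately for each relation sense.

    `relations`: list of (arg1_mention_ids, arg2_mention_ids, sense) tuples.
    `coref_chains`: list of sets of mention ids from the coreference output.
    """
    linked, total = defaultdict(int), defaultdict(int)
    for arg1_ids, arg2_ids, sense in relations:
        total[sense] += 1
        if any(chain & set(arg1_ids) and chain & set(arg2_ids)
               for chain in coref_chains):
            linked[sense] += 1
    return {sense: linked[sense] / total[sense] for sense in total}
```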

It is possible that coreferential subject patterns suggest temporal coherence between the two sentences without using an explicit discourse connective. CONTINGENCY relations, which can only indicate causal relationships when realized implicitly, impose a temporal ordering on the events in the arguments; i.e. if Arg1 is causally related to Arg2, then the event described in Arg1 must temporally precede the one in Arg2. Therefore, CONTINGENCY and TEMPORAL can be highly confusable. To understand why this pattern might help distinguish these two types of relations, consider these examples:

(2) He also asserted that exact questions weren't replicated. [Then] When referred to the questions that match, he said it was coincidental. (WSJ0045)

(3) He also asserted that exact questions weren't replicated. When referred to the questions that match, she said it was coincidental.

When we switch out the coreferential subject for an arbitrary uncoreferential pronoun as we do in (3), we are more inclined to classify the relation as CONTINGENCY.

7 Related work

Word-pair features are known to work very well in predicting senses of discourse relations in an artificially generated corpus (Marcu and Echihabi, 2002). But when used with a realistic corpus, model parameter estimation suffers from the data sparsity problem due to the small dataset size. Biran and McKeown (2013) attempt to solve this problem by aggregating word pairs and estimating weights from an unannotated corpus, but only with limited success.

Recent efforts have focused on introducing meaning abstraction and semantic representation between the words in the sentence pair. Pitler et al. (2009) use external lexicons to replace the one-hot word representation with semantic information such as word polarity and various verb classifications based on specific theories (Stone et al., 1968; Levin, 1993). Park and Cardie (2012) select an optimal subset of these features and establish the strongest baseline to the best of our knowledge.

Brown word clusters are hierarchical clusters induced by frequency of co-occurrences with other words (Brown et al., 1992). The strength of this word class induction method is that the words that are classified to the same clusters usually make an interpretable lexical class by virtue of their distributional properties. This word representation has been used successfully to augment the performance of many NLP systems (Ritter et al., 2011; Turian et al., 2010).

Louis et al. (2010) use multiple aspects of coreference as features to classify implicit discourse relations without much success while suggesting many aspects that are worth exploring. In a corpus study by Louis and Nenkova (2010), coreferential rates alone cannot explain all of the relations, and more complex coreference patterns have to be considered.

8 Conclusions

We present statistical classifiers for identifying senses of implicit discourse relations and introduce novel feature sets that exploit distributional similarity and coreference information. Our classifiers outperform the classifiers from previous work in all types of implicit discourse relations. Altogether these results present a stronger baseline for future research endeavors in implicit discourse relations.

In addition to enhancing the performance of the classifier, Brown word cluster pair features disclose some new aspects of implicit discourse relations. The feature analysis confirms our hypothesis that cluster pair features work well because they encapsulate relevant word classes, which constitute more complex informative features such as named-entity pairs of the same categories, semantically-related pairs, and pairs that indicate a specificity-generality shift. At the discourse level, Brown clustering is superior to a one-hot word representation for identifying inter-sentential patterns and the interactions between words.

Coreference chains that traverse through the discourse shed light on different types of relations. The preliminary analysis shows that TEMPORAL relations have much higher inter-argument coreferential rates than the other three senses of relations. Focusing on only subject-coreferential rates, we observe that CONTINGENCY relations show the lowest coreferential rate. The coreference patterns differ substantially and meaningfully across discourse relations and deserve further exploration.

References

Or Biran and Kathleen McKeown. 2013. Aggregated word pair features for implicit discourse relation disambiguation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 69–73. The Association for Computational Linguistics.

Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479, December.

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Marie-Catherine De Marneffe, Bill MacCartney, Christopher D. Manning, et al. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, volume 6, pages 449–454.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 363–370. Association for Computational Linguistics.

Nathalie Japkowicz. 2000. Learning from imbalanced data sets: a comparison of various strategies. In AAAI Workshop on Learning from Imbalanced Data Sets, volume 68.

Michael Jordan and Andrew Ng. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems, 14:841.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 423–430, Morristown, NJ, USA. Association for Computational Linguistics.

Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2011. Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pages 28–34. Association for Computational Linguistics.

Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics.

Beth Levin. 1993. English verb classes and alternations: A preliminary investigation, volume 348. University of Chicago Press, Chicago.

Ziheng Lin, Hwee Tou Ng, and Min-Yen Kan. 2010. A PDTB-styled end-to-end discourse parser. arXiv.org, November.

Nick Littlestone. 1988. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318.

Annie Louis and Ani Nenkova. 2010. Creating local coherence: An empirical assessment. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 313–316. Association for Computational Linguistics.

Annie Louis, Aravind Joshi, Rashmi Prasad, and Ani Nenkova. 2010. Using entity features to classify implicit discourse relations. In Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 59–62. Association for Computational Linguistics.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.

Daniel Marcu and Abdessamad Echihabi. 2002. An unsupervised approach to recognizing discourse relations. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 368–375. Association for Computational Linguistics.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Andrew Kachites McCallum. 2002. MALLET: A machine learning for language toolkit. http://www.cs.umass.edu/~mccallum/mallet.

Joonsuk Park and Claire Cardie. 2012. Improving implicit discourse relation recognition through feature set optimization. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 108–112. Association for Computational Linguistics.

Gary Patterson and Andrew Kehler. 2013. Predicting the presence of discourse connectives. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

Emily Pitler, Mridhula Raghupathy, Hena Mehta, Ani Nenkova, Alan Lee, and Aravind K. Joshi. 2008. Easily identifiable discourse relations. Technical Reports (CIS), page 884.

Emily Pitler, Annie Louis, and Ani Nenkova. 2009. Automatic sense prediction for implicit discourse relations in text. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 683–691. Association for Computational Linguistics.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind K. Joshi, and Bonnie L. Webber. 2008. The Penn Discourse Treebank 2.0. In LREC. Citeseer.

Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, and Christopher Manning. 2010. A multi-pass sieve for coreference resolution. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, pages 492–501, Stroudsburg, PA, USA. Association for Computational Linguistics.

Alan Ritter, Sam Clark, Oren Etzioni, et al. 2011. Named entity recognition in tweets: an experimental study. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1524–1534. Association for Computational Linguistics.

Philip Stone, Dexter C. Dunphy, Marshall S. Smith, and D. M. Ogilvie. 1968. The General Inquirer: A computer approach to content analysis. Journal of Regional Science, 8(1).

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 63–70. Association for Computational Linguistics.

Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 173–180. Association for Computational Linguistics.

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics.

Xun Wang, Sujian Li, Jiwei Li, and Wenjie Li. 2012. Implicit discourse relation recognition by selecting typical training examples. In Proceedings of COLING 2012, pages 2757–2772, Mumbai, India, December. The COLING 2012 Organizing Committee.

Ben Wellner, James Pustejovsky, Catherine Havasi, Anna Rumshisky, and Roser Sauri. 2009. Classification of discourse coherence relations: An exploratory study using multiple knowledge sources. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, pages 117–125. Association for Computational Linguistics.
