The Role of Semantics in Automatic Summarization: A Feasibility Study

Tal Baumel Michael Elhadad Computer Science Department Computer Science Department Ben-Gurion University Ben-Gurion University Beer-Sheva, Israel Beer-Sheva, Israel [email protected] [email protected]

appropriate pronoun resolution would match the pronouns to the correct entities;  High lexical coverage does not mean co- State-of-the-art methods in automatic summarization herent summaries: the focus on current rely almost exclusively on extracting salient sentences summarization research has been to optimize from input texts. Such extractive methods succeed in ROUGE results. ROUGE measures lexical producing summaries which capture salient infor- overlap between manual summaries and the mation but fail to produce fluent and coherent sum- maries. Recent progress in robust semantic analysis candidate one. We observe, however, that makes the application of semantic techniques to summarizers that achieve the highest summarization relevant. We review in this paper areas ROUGE scores often obtain low manual in the field of summarization that can benefit from the scores because they produce non-cohesive introduction of the type of semantic analysis that has and sometimes non-coherent summaries. become available. The main pain points that semantic information can alleviate are: aiming for more fluent We address those limitations of existing methods summaries by exploiting logical form representation based exclusively on by using of the source text; identifying salient information and intermediate semantic annotations for both sum- avoiding redundancy by relying on marization generation and evaluation. and paraphrase identification; and generating a coher- ent summary while relying on rhetorical structure and discourse structure information extracted from the Summarization Generation source documents. In addition, we review the possi- bility to perform automatic Pyramid evaluation of For generating automatic summarizations, we summarization quality that relies on robust semantic propose the following scheme: similarity measures.  Annotate the input documents with SRL: we annotated large summarization datasets Introduction (DUC and TAC) using SEMAFOR (Das et al., 2014) for FrameNet annotations and Current state-of-the-art automatic multi- SENNA (Collobert et al., 2011) for Prop- document summarization methods (Erkan and Bank annotations. We parse both the source Radev, 2004; Haghighi and Vanderwende, 2009) documents and the manual summaries of and automatic evaluation methods (Lin, 2004) each document cluster. We are producing a rely exclusively on lexical features. Those meth- descriptive analysis of the distribution of ods cannot attend the following problems: Frames and Entities in the datasets.  Non-lexical similarity cannot be recog-  Apply salience identifier: both PropBank nized: pairs such as “Danny and his friends and FrameNet annotations help with identi- bought a chocolate bar from Yossi” and “He fying paraphrases - syntactic alternations or sold them a snack” are quasi-paraphrases lexical variants of the same relations (i.e., although they don’t share any lexical . “add up” and “total” are both different lexi- Correct SRL annotation can identify that cal units of the frame “Adding_up”). “buy” and “sell” refer to the same frame and FrameNet annotations include frames rela- tions such as: Inherits from, Perspective on, Uses that can be further exploited by our  Document structure: Knowing what infor- system to align predicates across documents mation should be included in the summary is and with manual summaries. not sufficient to construct coherent text. In  Generate summarization given salient order to convey a narrative in the summary frames: after identifying the salient frames we need to determine which information we generate coherent text using text genera- should appear where, and what fragments tions tools such as FUF (Elhadad, 1991) need to be in the same paragraphs.  Aggregation: merging similar fragments Semantic Salience Text Text IR Filtered IR Summary Identifier Generataion across frames improves readability and avoids redundancy. Aggregation appears at

Figure 1: Proposed Summarization Scheme various levels of language: lexical, syntactic and in discourse structure (i.e., “Jacob is the Summarization Evaluation son of Isaac” and “Esau is the son of Isaac” aggregate into “Jacob and Esau are sons of For automatic summarization evaluation, we Isaac”). The evaluation system should rec- adapt the MEANT approach developed in Ma- ognize that frame-elements from different chine Translation (Wu, 2011). MEANT sentences are aggregated into a single propo- measures translation quality as follows: First, sition. is performed (either man- We are measuring the frequency of each of these ually or automatically) on both reference transla- points on the automatically SRL annotated da- tion and . The annotated tasets of DUC and TAC. On the basis of this translations are compared by matching frame by data analysis, we are assessing the potential of frame, argument by argument. The final score is developing better automatic evaluation of sum- the F-measure score of the aligned graph. When marization algorithms, and to improve each of using manual SRL, MEANT can achieve 0.4324 the candidate aspects of the process. Kendall 흉 correlation agreement with human evaluation, same as HTER. When using an au- tomatic SRL tool (Pradhan et al., 2004), it References achieves 0.3423 agreement with manual annota- [1] Collobert, Ronan, et al. "Natural language processing tors, while BLEU only achieves 0.1982. Adapt- (almost) from scratch." The Journal of Machine Learn- ing a similar system to measure the semantic ing Research 12 (2011): 2493-2537. Baron, Jason R., quality and coherence of automatic summary David D. Lewis, and Douglas W. Oard. "TREC 2006 Legal Track Overview." TREC. 2006. requires modification in the alignment procedure [2] Das, Dipanjan, et al. "Frame-semantic parsing." Com- and weighting predicates according to the num- putational Linguistics 40.1 (2014): 9-56. ber of times they appear in manual summaries (in [3] Elhadad, Michael. "FUF: The universal unifier user the manner similar to the Pyramid evaluation manual version 5.0." (1991). method (Nenkova et al., 2007), but taking into [4] Erkan, Günes, and Dragomir R. Radev. "LexRank: account the confidence in the semantic match graph-based lexical centrality as salience in text sum- across content units. marization." Journal of Artificial Intelligence Research (2004): 457-479. While MEANT is an important tool for text simi- [5] Haghighi, Aria, and Lucy Vanderwende. "Exploring larity, it has only been tested on automatic trans- content models for multi-document summarization." lation datasets, those datasets tested translations Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chap- of single sentences and not 250 text as is ter of the Association for Computational Linguistics. common in automatic summarization datasets. Association for Computational Linguistics, 2009. There are new challenges that must be addressed [6] Lin, Chin-Yew. "Rouge: A package for automatic when evaluating large texts: evaluation of summaries." Text Summarization Branches Out: Proceedings of the ACL-04 Workshop.  Referring expression: natural text usually con- 2004. tains anaphors (e.g., “The boy went to school. He had fun there”). Anaphors are frequent and must [7] Lo, Chi-kiu, and Dekai Wu. "MEANT: An inexpen- sive, high-accuracy, semi-automatic metric for evaluat- be resolved to match content units in the text and ing translation utility via semantic frames." Proceed- to verify that the generated text does not intro- ings of the 49th Annual Meeting of the Association for duce unwanted ambiguity. Computational Linguistics: Human Language Tech- nologies-Volume 1. Association for Computational Linguistics, 2011. [8] Ani Nenkova, Rebecca Passonneau and Kathy McKe- own. The Pyramid Method: Incorporating human con- tent selection variation in summarization evaluation, Journal ACM Transactions on Speech and Language Processing (TSLP) Volume 4 Issue 2, May 2007 [9] Sameer Pradhan, Wayne Ward, Kadri Hacioglu, James H. Martin, and Dan Jurafsky. Shallow Semantic Pars- ing Using Support Vector Machines. In Proceedings of the 2004 Conference on Human Language Technology and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL-04), 2004.