Distributional Semantics for Robust Automatic Summarization
DISTRIBUTIONAL SEMANTICS FOR ROBUST AUTOMATIC SUMMARIZATION

by

Jackie Chi Kit Cheung

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto

Copyright © 2014 by Jackie Chi Kit Cheung

Abstract

Distributional Semantics for Robust Automatic Summarization
Jackie Chi Kit Cheung
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2014

Large text collections are an important source of information about the world, containing everything from movie reviews and research papers to news articles about current events. Yet the sheer size of such collections makes it challenging for applications to make sense of this data and present it to users. Automatic summarization is one potential solution, which aims to shorten one or more source documents while retaining the important information. Summarization is a complex task that requires inferences about the form and content of the summary using a semantic model.

This dissertation examines the feasibility of distributional semantics as the core semantic representation for automatic summarization. In distributional semantics, the meanings of words and phrases are modelled by the contexts in which they appear. These models are easy to train and have found successful applications, but until recently they have not been seriously considered as contenders to support semantic inference for complex NLP tasks such as summarization, because of a lack of evaluation methods that would demonstrate their benefit. I argue that current automatic summarization systems avoid relying on semantic analysis by focusing instead on replicating the source text to be summarized, but that substantial progress will not be possible without semantic analysis and domain knowledge acquisition. To overcome these problems, I propose an evaluation framework for distributional semantics based on first principles about the role of a semantic formalism in supporting inference. My experiments show that current distributional semantic approaches can support semantic inference at a phrasal level, invariant to the constituent syntactic constructions, better than a word-overlap baseline.

Then, I present a novel technique to embed distributional semantic vectors into a generative probabilistic model for domain modelling. This model achieves state-of-the-art results in slot induction, which also translates into better summarization performance. Finally, I introduce a text-to-text generation technique called sentence enhancement, which combines parts of heterogeneous source text sentences into a novel sentence, resulting in more informative and grammatical summary sentences than a previous sentence fusion approach. The success of this approach relies crucially on distributional semantics to determine which parts may be combined.

These results lay the groundwork for the development of future distributional semantic models, and demonstrate their utility in determining the form and content of automatic summaries.
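As a minimal illustration of the contextual modelling described in the abstract, the following Python sketch builds count-based co-occurrence vectors from a toy corpus and compares words by cosine similarity. This is not an implementation from the dissertation; the corpus, the window size, and the probe words are all invented for illustration.

```python
# A minimal sketch of distributional semantics: each word is represented
# by the counts of the words that co-occur with it inside a fixed context
# window, and similarity of meaning is approximated by cosine similarity
# of those count vectors. The corpus and window size below are toy
# assumptions, not data from the dissertation.
from collections import Counter, defaultdict
from math import sqrt

corpus = [
    "the court convicted the defendant of fraud",
    "the judge sentenced the defendant to prison",
    "the jury acquitted the accused of all charges",
]

window = 2
vectors = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        # Count every word within `window` positions, excluding the word itself.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[word][tokens[j]] += 1

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(u[w] * v[w] for w in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

print(cosine(vectors["defendant"], vectors["accused"]))  # higher
print(cosine(vectors["defendant"], vectors["fraud"]))    # lower
```

In this sketch, "defendant" and "accused" receive more similar vectors than "defendant" and "fraud" because they occur in more similar contexts, which is the core intuition that the dissertation's models build on.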
Acknowledgements

Graduate school has been a memorable journey, and I am very fortunate to have met some wonderful people along the way. First of all, I'd like to thank my advisor, Gerald Penn, who has been extremely patient and supportive. He let me pursue the research topics that I fancied while providing guidance for my research and career. After all this time, I am still amazed by the depth and breadth of his research vision, and his acumen in selecting meaningful research problems. I can only hope that enough of his taste in research and his sense of humour has rubbed off on me.

I'd also like to thank the other members of my committee. Graeme Hirst has been an influential mentor; his insightful questions and comments have caused me to reflect on the nature of computational linguistics research. Hector Levesque brought in a much-needed outside perspective from knowledge representation. I'd like to thank Suzanne Stevenson for interesting discussions in seminars, meetings, and at my defense. My external examiner, Mirella Lapata, has been a great inspiration for my research, and provided valuable feedback on this dissertation.

During my degree, I spent two highly productive terms at Microsoft Research Redmond in the Speech and Natural Language Processing groups. I'd like to thank my internship mentors Xiao Li, Hoifung Poon, and Lucy Vanderwende for fruitful discussions and collaborations, and the other members of the Speech and NLP groups for lively conversations over lunch. I am grateful to have been supported by the Natural Sciences and Engineering Research Council of Canada and by a Facebook PhD Fellowship.

Graduate student life would have been drab without the enrichment of colleagues and friends. The Department of Computer Science, and the Computational Linguistics group in particular, is full of smart and friendly people who filled my days with joy. Jorge Aranda got me hooked on board games. Giovanna Thron and Siavosh Benabbas threw awesome parties. Jono Lung and Aditya Bhargava taught me about finance. Elizabeth Patitsas is a good friend who owns two cats and is very fit. Andrew Perrault introduced me to climbing. Erin Delisle will do a great job organizing games. Aida Nematzadeh and Amin Tootoonchian are the kindest people that I know, and put up with my attempts to learn Farsi. George Dahl keeps me (and the CL community) honest. The lunch crew (Afsaneh Fazly, Libby Barak, Katie Fraser, Varada Kolhatkar, and Nona Naderi) accepted me into their fold. Michael Guerzhoy is a conspirator in further bouts of not-working. Yuval Filmus is brilliant and has a good perspective on life. Timothy Fowler, Eric Corlett, and Patricia Araujo Thaine are great meeting and dinner buddies. Dai Tri Le Man and Sergey Gorbunov kept my appreciation of music alive. Dan Lister, Sam Kim, and Mischa Auerbach-Ziogas made for formidable Agricola opponents. There are many others, both inside and outside of the department, whom I am lucky to have met. I am also lucky to have “old friends” from Vancouver like Hung Yao Tjia, Muneori Otaka, Joseph Leung, Rory Harris, and Sandra Yuen to catch up with.

Finally, my family is my rock. My brother Vincent, my sister Joyce, and her new family help me put things in perspective. My parents, Cecilia and David, are always there to support me through all of my hopes and dreams, whining and boasting, failures and successes. They remind me that life is about more than just research, and are the foundation that gives me the confidence to lead my life.

To my parents.

Contents

1 Introduction
  1.1 Two Approaches to Semantics
  1.2 Semantics in Automatic Summarization
  1.3 Thesis Statement
  1.4 Dissertation Objectives
  1.5 Structure of the Dissertation
    1.5.1 Peer-Reviewed Publications

2 Centrality in Automatic Summarization
  2.1 The Design and Evaluation of Summarization Systems
    2.1.1 Classification of Summaries
    2.1.2 Steps in Summarization
    2.1.3 Summarization Evaluation
  2.2 The Assumption of Centrality
    2.2.1 Text Properties Important in Summarization
    2.2.2 Cognitive Determinants of Interest, Surprise, and Memorability
    2.2.3 Centrality in Content Selection
    2.2.4 Abstractive Summarization
  2.3 Summarizing Remarks on Summarization

3 A Case for Domain Knowledge in Automatic Summarization
  3.1 Related Work
  3.2 Theoretical Basis of the Analysis
    3.2.1 Caseframe Similarity
    3.2.2 An Example
  3.3 Experiments
    3.3.1 Study 1: Sentence Aggregation
    3.3.2 Study 2: Signature Caseframe Density
    3.3.3 Study 3: Summary Reconstruction
  3.4 Why Source-External Elements?
    3.4.1 Study 4: Provenance Study
    3.4.2 Study 5: Domain Study
  3.5 Summary and Discussion

4 Compositional Distributional Semantics
  4.1 Compositionality and Co-Compositionality in Distributional Semantics
    4.1.1 Several Distributional Semantic Models
    4.1.2 Other Distributional Models
  4.2 Evaluating Distributional Semantics for Inference
    4.2.1 Existing Evaluations
    4.2.2 An Evaluation Framework
    4.2.3 Task 1: Relation Classification
    4.2.4 Task 2: Restricted QA
  4.3 Experiments
    4.3.1 Task 1
    4.3.2 Task 2
  4.4 Conclusions

5 Distributional Semantic Hidden Markov Models
  5.1 Related Work
  5.2 Distributional Semantic Hidden Markov Models
    5.2.1 Contextualization
    5.2.2 Training and Inference
    5.2.3 Summary and Generative Process
  5.3 Experiments
  5.4 Guided Summarization Slot Induction
  5.5 Multi-document Summarization: An Extrinsic Evaluation
    5.5.1 A KL-based Criterion
    5.5.2 Supervised Learning
    5.5.3 Method and Results
  5.6 Discussion

6 Sentence Enhancement for Automatic Summarization
  6.1 Sentence Revision for Abstractive Summarization
  6.2 Related Work
  6.3 A Sentence Enhancement Algorithm
    6.3.1 Sentence Graph Creation
    6.3.2 Sentence Graph Expansion
    6.3.3 Tree Generation
    6.3.4 Linearization
  6.4 Experiments
    6.4.1 Method
    6.4.2 Results and Discussion
  6.5 Discussion
  6.6 Appendix: ILP Encoding
    6.6.1 Informativeness Score
    6.6.2 Objective Function
    6.6.3 Syntactic Constraints
    6.6.4 Semantic Constraints

7 Conclusion
  7.1 Summary of Contributions
    7.1.1 Distributional Semantics
    7.1.2 Abstractive Summarization
  7.2 Limitations and Future Work
    7.2.1 Distributional Semantics in Probabilistic Models
    7.2.2 Abstractive Summarization

Bibliography

List of Tables

3.1 A sentence decomposed into its dependency edges, and the caseframes derived from those edges that are considered.