“Al. I. Cuza” University of Iasi

Faculty of Computer Science

Predication Driven Textual Entailment

2011

PhD Student Mihai-Alex Moruz

Supervisor Prof. Dr. Dan Cristea

Thesis version updated and modified according to the recommendations of the reviewers.

Acknowledgements

First of all, I would like to thank my supervisor, Prof. Dan Cristea for guiding my steps as a researcher, for helping me expand my horizons and for always pushing me to better myself. I would also like to thank him for the confidence he had in my abilities, which helped me through the highs and the lows of these years, and for the insights he provided with regards to my research.

Special thanks go out to my colleagues from the Natural Language Processing group within the Faculty of Computer Science, with whom I have shared these past years of researching, attending conferences, talking and laughter, and who made these years such a joy: Adi Iftene (whom I would like to thank again for all the talks and collaboration), Maria Husarciuc (who is now Maria Moruz, my loving wife), Diana Trandabăţ, Ionuţ Pistol, Lucian Gâdioi, Iustin Dornescu and Marius Răschip. Thank you all for your help, for your advice and for all the things that I managed to learn from you all.

I am also very grateful to my colleagues at the Institute for Computer Science within the Romanian Academy, especially to Neculai Curteanu, who helped me become a better researcher by teaching me how to read and write scientific papers, how to be thorough and how to follow through with my ideas. Thank you for the years of patience and the wonderful discussions.

Many thanks to Prof. Dan Tufiş, for giving me such valuable advice throughout my PhD and for always taking the time to answer all of my questions, to Prof. Bernardo Magnini, for helping me shape some of the fundamental ideas of my thesis, and to Prof. Dorel Lucanu and Prof. Cornelius Croitoru, for their insightful comments and advice.

I would also like to offer special thanks to my family, for their patience and understanding, for putting up with the long days and nights of work, and for the help they so graciously provided me, both scientifically and otherwise.

I would also like to acknowledge the financial support received from the research grants "Dezvoltarea oportunităţilor oferite doctoranzilor pentru traiectorii flexibile în cariera de cercetare" POSDRU-6/1.5/S/25, PNCDI II: "SIR – RESDEC Sistem de Întrebare-Răspuns în limbile Română şi Engleză cu Spaţii Deschise de Căutare", PNCDI II: "eDTLR – Dicţionarul Tezaur al Limbii Române în format electronic" and PNCDI II: "INTELCHIM – Modelare şi conducere automată utilizând instrumente ale inteligenţei artificiale pentru aplicaţii în chimie şi inginerie de proces".


Abstract

One of the most relevant phenomena in natural language is that of variability, which means expressing the same thing using different surface representations. In order to address this issue, the notion of textual entailment was introduced: the question of whether the meaning of one text can be deduced from another. This thesis describes our contribution to the field of textual entailment.

Our approach is based on the notion of predicational semantics, as we believe that deep semantic understanding of natural language utterances can be more effectively deduced from the analysis of predicates and their arguments, and that deep semantic understanding is one of the best solutions for the problem of textual entailment. We have given a novel interpretation for the definition of textual entailment that builds upon this intuition. We have also described an algorithm for solving textual entailment on the basis of predicational semantics and argument structure unification. The approach described in this thesis is both novel (there are, to our knowledge, no entailment systems based on predicational semantics) and effective, as it solves a large majority of entailment pairs (based on manual analysis of corpora), and the implementation based on it obtained good results in evaluation campaigns.

The approach to solving textual entailment proposed in this thesis is language independent, provided that appropriate lexical-semantic resources are available for the target language. In order to extend existing resources, and to create new ones, we make use of our results in dictionary entry parsing (mainly eDTLR) to propose ways of extending existing lexical-semantic resources for Romanian and of creating new ones. The creation of these resources allows for the adaptation of our system to the Romanian language, without changing the TE algorithm we proposed.

The thesis is structured as follows: chapter 1 presents the general framework of textual entailment and describes the RTE challenges; chapter 2 gives the current state of the art for textual entailment; chapter 3 describes the theoretical basis of our approach; chapter 4 describes the implementation and results of our entailment system; chapter 5 discusses possible extensions to Romanian lexical-semantic resources based on eDTLR, together with the description of the parsing of the DLR, and chapter 6 discusses conclusions and future work, and provides an outline of the personal contributions of the thesis.


Table of Contents

Acknowledgements
Abstract
Table of Contents
1. Introduction
1.1. Operational Definitions for Textual Entailment
1.2. Recognising Textual Entailment Challenge (RTE)
1.2.1. First Recognising Textual Entailment Challenge
1.2.2. The Second PASCAL Recognising Textual Entailment Challenge
1.2.3. The Third PASCAL Recognising Textual Entailment Challenge
1.2.4. TAC 2008 Recognizing Textual Entailment (RTE) Track
1.2.5. PASCAL Recognizing Textual Entailment Challenge (RTE-5) at TAC 2009
1.2.6. PASCAL Recognizing Textual Entailment Challenge (RTE-6) at TAC 2010
1.3. Conclusions
2. State of the Art in Textual Entailment
2.1. Lexical Representation
2.2. Syntactic Graph Distance
2.3. Tree Edit Distance Algorithms
2.4. Logical Inference
2.5. Atomic Propositions
2.6. Machine Learning
2.7. Entailment Rules
2.8. Cross-Lingual Textual Entailment
2.9. Discourse Knowledge
2.10. Uses of Textual Entailment
3. Solving Textual Entailment by Semantic Means
3.1. Levin's Classes and VerbNet
3.1.1. Syntactic Frames in VN
3.1.2. Semantic Predicates
3.1.3. Statistics about VerbNet
3.2. A Predication Based Algorithm for Solving Textual Entailment
3.3. Corpus Analysis
3.3.1. Pairs Solved Using VN
3.3.2. Pairs Solved Using NE and NA
3.3.3. Pairs Solved Using OS
3.4. Conclusions
3.5. Comparison with Current Entailment Methods
4. System Description, Experimental Results and Analysis
4.1. General System Architecture
4.1.1. Pre-Processing
4.1.2. Main Module
4.1.3. Mapping Rules for Entailment Cases
4.1.4. Mapping Rules for Negative Entailment Cases
4.2. Results in RTE-5
4.2.1. Ablation Tests
4.2.2. Pilot Task
4.3. Results in RTE-6
4.3.1. Ablation Tests
4.4. Error Analysis
4.5. Conclusions
5. Romanian Lexical-Semantic Resources for Entailment
5.1. The Romanian WordNet
5.2. The Electronic Format of the Romanian Thesaurus Dictionary (eDTLR)
5.2.1. Dictionary Sense Segmentation and Dependency (DSSD)
5.2.2. Advantages of DSSD over Standard Dictionary Entry Parsing (DEP)
5.2.3. DTLR Marker Classes, their Dependency Structure, and the DSSD Parsing Algorithm
5.2.4. DSSD Parser Applied on DTLR Entries
5.2.5. Result Analysis
5.3. Extending the Romanian WordNet with eDTLR
5.4. Aligning DEX and eDTLR
5.5. Conclusions
6. Conclusions
7. Bibliography
8. Annexes
8.1. Annex 1: Example of a Parsed DLR Entry
8.2. Annex 2: VerbNet Class "eat-39.1"
8.3. Annex 3: RoWN Entries for Noun "casă"
8.4. Annex 4: VerbNet – FrameNet Mapping

1. Introduction

One of the most relevant phenomena in natural language is that of variability, which can be loosely defined as stating the same ideas in different ways. The issue of natural language variability needs to be addressed by various robust natural language processing applications, such as machine translation systems, summarization systems, question answering (QA) systems, information retrieval (IR) systems or information extraction (IE) systems, as they need to recognize the various forms in which the information in the input and output texts is expressed. Most of the applications given above address natural language variability, to a limited extent, by incorporating some form of shallow semantic parsing, but they do not employ a general framework for modelling variability in an independent manner (Dagan and Glickman, 2004).

The general solution for the issue of natural language variability is deep semantic understanding of natural language utterances, which would be used as an independent module that brings the input and output of the tools mentioned above to a common form (either as surface representations or as semantic descriptions). Obtaining deep semantic understanding of natural language utterances is one of the most difficult tasks in natural language processing (its difficulty has been compared to that of the Turing test (Bos and Markert, 2005)), and, as such, the common approach of extracting semantic representations for texts within each application is infeasible. (Dagan and Glickman, 2004) describe a novel approach to addressing the issue of natural language variability by defining the notion of textual entailment. According to (Dagan and Glickman, 2004), “textual entailment (entailment, in short) is defined as a relationship between a coherent text T and a language expression, which is considered as a hypothesis, H. We say that T entails H (H is a consequent of T), denoted by T⇒H, if the meaning of H, as interpreted in the context of T, can be inferred from the meaning of T.” The entailment relation defined above is directional, since even if the meaning of one text implies the meaning of another, the opposite does not always hold true.

The motivation of our thesis is to give a novel solution for the problem of textual entailment, based on the notion of predicational semantics and argument unification, which would lead to a deeper semantic understanding of natural language texts, and to provide a generalized, language independent framework for solving textual entailment, thus providing a robust entailment solver that can be attached to various natural language applications.


Since we consider that one of the goals of textual entailment is to provide machine understanding of natural language at a level as close as possible to human understanding, we have also briefly examined (as an extension of the approach described in this thesis) a novel way of representing the semantic knowledge extracted from the text and the hypothesis, which would lead to deep semantic understanding of text, and which is based on the idea of semantic triples represented in a graph format.

For a better understanding of the notion of textual entailment, we will refer to the Recognizing Textual Entailment Challenges (RTE), which will be described in detail in this section. Table 1 gives a sample of text – hypothesis pairs, taken from the Fifth Recognizing Textual Entailment (RTE-5) Challenge of 2009:

ID: 1 | Task: QA | Solution: Contradiction
Text: LOSAIL, Qatar (AFP) - Torrential rain caused the season-opening Qatar MotoGP to be cancelled on Sunday, leaving officials and teams in a frenzy before deciding to race on Monday instead at this floodlit desert venue. Monsoon-like conditions, accompanied by swirling winds, arrived just moments before Australia's Casey Stoner, on pole position, was due to lead defending world champion Valentino Rossi and the other riders away on the warm-up lap. "It's just unlucky with the weather," said Australian Ducati rider Stoner, the 2007 world champion, who was bidding for a third successive win here.
Hypothesis: Valentino Rossi won the season-opening Qatar MotoGP.

ID: 2 | Task: QA | Solution: Contradiction
Text: Euro MPs have voted overwhelmingly to cut the cost of texting and using the internet on mobiles abroad. The cap for a "roaming" text will fall to 11 euro cents (10p; 14 US cents), from about 29 cents on average today. The EU-wide caps, excluding VAT, will take effect in July. They cover text messages and data roaming services, such as checking e-mails while abroad. The current price cap of 46 euro cents per minute for an outgoing voice call will also fall to 43 cents in July.
Hypothesis: A roaming text cost 46 euro cents.

ID: 82 | Task: QA | Solution: Unknown
Text: Currently, there is no specific treatment available against dengue fever, which is the most widespread tropical disease after malaria. Sanofi Pasteur is collaborating with the Communicable Disease Center in Singapore and the Pasteur Institute in Vietnam to conduct these clinical studies in children and adults. "Controlling the mosquitoes that transmit dengue is necessary but not sufficient to fight against the disease. A safe and effective vaccine has been long awaited to prevent dengue epidemics," said Professor Leo Yee Sin, director of the Communicable Disease Center in Singapore. "Clinical studies in Singapore are critical steps to advance the development of a vaccine for the prevention of dengue in Asia. We are happy to contribute to scientific research that would benefit the entire region."
Hypothesis: Malaria is the most widespread disease transmitted by mosquitoes.

ID: 207 | Task: IE | Solution: Entailment
Text: The explosion happened at about 9:45pm local time, at the 'Rizk Plaza' shopping mall. It is understood from local media that a bomb was placed in the underground parking area below the complex, although there has been no official confirmation from security services. Sources on the ground say that the actual commercial center occupied the ground floor, with upper stories occupied by inhabited apartments. Located in the Metn range of mountains east of Beirut, Broummana is about half-hour's drive from the capital and is popular with tourists, particularly those from the Gulf states.
Hypothesis: Beirut is near Broummana.

ID: 403 | Task: IR | Solution: Unknown
Text: It is possible that it could help McCain earn the support of conservatives who have not always viewed him as aligning with the party on certain issues. At the same time, it could help to align McCain with former President Ronald Reagan, who attracted Republican and Democratic voters.
Hypothesis: Ronald Regan endorsed McCain.

Table 1: Examples of text – hypothesis pairs from RTE-5

The table above shows text – hypothesis pairs taken from the test set of RTE-5 (http://www.nist.gov/tac/2009/RTE/). For each pair we give its ID in the original corpus, the pair itself (text and hypothesis), the type of natural language task from which the pair was derived, and the solution for the entailment problem of the given pair. There are three possible solutions for entailment pairs: ENTAILMENT, which means that the text entails the hypothesis, CONTRADICTION, denoting that the hypothesis contradicts some part of the text, and UNKNOWN, which means that no pertinent judgment can be drawn for the truth value of the hypothesis on the basis of the information in the text.

The following section gives a series of operational definitions for textual entailment, which are needed for formalizing the original definition given above to the point where it can be used in computational approaches, and section 1.2 gives a detailed description of the RTE challenges, which we consider relevant because of the significant impact they had on the textual entailment problem.

1.1. Operational Definitions for Textual Entailment

Even though the definition for textual entailment given above is complete and correct, it is too abstract to be used directly in practical applications of Natural Language Processing. Therefore, there have been a number of adaptations of the initial definition for textual entailment, in order to make it more practical for real life implementations, as can be seen in the list of definition variants given below:

- We say that a text T entails a hypothesis H if, typically, a human reading T would infer that H is most likely true (Dagan et al., 2005). This version of the TE definition is aimed at human annotators, and is used for creating gold corpora for the task of textual entailment.

- T entails H if there exists a subgraph of XDG_T (the syntactic graph of T) that is in an isomorphism relation with XDG_H.

- T entails H if there exists a sequence of transformations applied to T such that H can be obtained with an overall cost below a certain threshold, empirically estimated on the training data.

- T entails H if H does not introduce new information compared to T, or if the combined model T+H is not more informative than T.

- T entails H if all the atomic propositions in H are supported by at least one atomic proposition in T.

Our intuition for solving textual entailment was also modelled within an operational definition. Given a text T and a hypothesis H, we say that T entails H, denoted by T → H, if and only if:

- Predicate matching: each of the predicates in H is entailed by at least one predicate in T. We say that a predicate p ∈ T entails a predicate q ∈ H, denoted p → q, if q is a consequence of p, or p and q are synonyms, or q is a subevent of p;

- Argument matching: given alignments in T for all of the predicates in H (we consider a predicate p in T aligned to a predicate q in H if there is an entailment relation between p and q), entailment holds if and only if the arguments of the aligned predicates are in an entailment relation. Two arguments are in an entailment relation if they both refer to similar entities (e.g., the heads of the arguments are synonyms, or the arguments are in a part-of relation, etc.), and if the unification of their feature sets is successful and is equal to the feature set of that argument in the text. Formally, if p ∈ T, q ∈ H, p → q, arg(p) = (a, b) and arg(q) = (a′, b′), then entailment holds if and only if a → a′ and b → b′. Argument entailment is defined as the result of the unification between the argument feature structures in H and T (as described in greater detail in chapter 3). A minimal code sketch of this matching procedure is given below.
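To make this definition concrete, the following is a minimal sketch of the matching procedure in Python, under strong simplifying assumptions: predicates are reduced to a lemma plus an ordered tuple of argument feature structures, and a toy synonym table stands in for the VerbNet/WordNet lookups used by the actual system (the consequence and subevent checks are omitted). All names are illustrative; this is not our implementation.

```python
from dataclasses import dataclass

# Toy stand-in for the lexical-resource lookups (VerbNet/WordNet).
SYNONYMS = {("buy", "purchase"), ("purchase", "buy")}

@dataclass
class Predicate:
    lemma: str
    args: tuple  # ordered argument feature structures (dicts)

def entails_predicate(p, q):
    # p -> q holds if q is (here) identical to or a synonym of p.
    return p.lemma == q.lemma or (p.lemma, q.lemma) in SYNONYMS

def unify(a_t, a_h):
    # Feature-structure unification: fails on a conflicting value,
    # otherwise returns the merged feature set.
    merged = dict(a_t)
    for key, value in a_h.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged

def entails_argument(a_t, a_h):
    # Arguments entail iff unification succeeds and the result equals
    # the feature set of the argument in the text.
    return unify(a_t, a_h) == a_t

def entails(T, H):
    # T -> H iff every predicate in H is aligned with some predicate
    # in T whose arguments pairwise entail the hypothesis arguments.
    return all(
        any(entails_predicate(p, q) and len(p.args) == len(q.args)
            and all(entails_argument(a, b) for a, b in zip(p.args, q.args))
            for p in T)
        for q in H)

t = [Predicate("buy", ({"head": "John"}, {"head": "car", "colour": "red"}))]
h = [Predicate("purchase", ({"head": "John"}, {"head": "car"}))]
print(entails(t, h))  # True: H's less specific argument unifies into T's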


The advantage of our operational definition of textual entailment is that it is more semantically focused than previous versions, which leads to a deeper semantic understanding of the text and the hypothesis. Because of this, we consider that our operational definition is closer to the original definition than many previous versions.

1.2. Recognising Textual Entailment Challenge (RTE)

The main proving ground for the various systems and methods of determining textual entailment is the Recognising Textual Entailment Challenge (RTE), which was initially put forth in 2005 by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning - http://www.pascal-network.org/), the European Commission's IST-funded Network of Excellence for Multimodal Interfaces. Starting with 2008 (RTE-4), the RTE challenges were held within the Text Analysis Conference (TAC - http://www.nist.gov/tac/). TAC is a series of evaluation workshops which encourage research in Natural Language Processing by providing large test collections, common evaluation procedures and tools, and opportunities for researchers to compare and share results.

The RTE Challenge has grown over time, increasing in complexity with each edition. First, the size of the text in the entailment pairs grew steadily, from just one or two sentences in RTE-1 to an entire paragraph in RTE-4; RTE-5 went one step further, using text which was not guaranteed to be grammatically correct.

With regards to the type of entailment pairs, RTE-1 had datasets created using such applications as Information Retrieval (IR), Comparable Documents (CD), Reading Comprehension (RC), Question Answering (QA), Information Extraction (IE), Machine Translation (MT), and Paraphrase Acquisition (PP). Starting with RTE-2, datasets consisted only of pairs extracted using Information Retrieval (IR), Information Extraction (IE), Question Answering (QA), and multi-document summarization (SUM), so as to create more realistic test data. By RTE-5, only Information Retrieval (IR), Information Extraction (IE) and Question Answering (QA) techniques were being used to create test pairs for the textual entailment engines.

Another innovation of the RTE Challenge was the introduction of pilot tasks, in RTE-3, RTE-5 and RTE-6. The RTE-3 pilot task consisted of classifying the test pairs into three clusters: positive entailments, contradictions and unknown entailments. The main task of RTE-4 and RTE-5 became twofold: the initial 2-way task, first defined in RTE-1, and the new 3-way task, as described in the RTE-3 pilot task. For RTE-5, the organizers created a new pilot task, focusing on extracting, from a collection of texts, those sentences that entail a given hypothesis. In RTE-6, the pilot task of the previous challenge became the new main task; in addition to this, a new pilot task was defined, the knowledge base population task.

1.2.1. First Recognising Textual Entailment Challenge (http://pascallin.ecs.soton.ac.uk/Challenges/RTE/)

According to (Dagan et al., 2005), “the RTE task is defined as recognizing, given two text fragments, whether the meaning of one text can be inferred from (entailed by) the other. This application independent task is suggested as capturing major inferences about the variability of semantic expression which are commonly needed across multiple applications.” Based on the work of (Dagan and Glickman, 2004), the task of recognizing textual entailment is thus defined as deciding, given two fragments of text, whether the meaning of one can be deduced from the meaning of the other.

In order to create a framework for solving textual entailment, the organizers created a data set consisting of text-hypothesis pairs using snippets of text extracted from news corpora. The entailment value of these pairs was manually annotated, and the pairs were then separated into training and test data sets. The goal of the participating entailment systems was to automatically decide whether entailment holds for the given pairs, and the results were then compared to the manually obtained gold standard. The data set was populated with text-hypothesis pairs corresponding to different textual processing applications, such as information retrieval (IR), comparable documents (CD – sentence alignment over similar documents), reading comprehension (RC), question answering (QA), information extraction (IE), machine translation (MT) and paraphrase acquisition (PP). The text-hypothesis pairs were selected in such a way as to highlight possible cases of success or failure for each of these applications, in order to better judge the effect an entailment system might have on their output. Also, the phenomena modelled by the entailment pairs range from simple lexical overlap to syntactic variability, logical deduction and world knowledge. The pairs have also been specifically chosen so that the distribution of the True and False examples is balanced (50%-50%), which does not apply to real world situations.

Since this was the first instance of the RTE challenge, the task definition, the evaluation methods and the corpus generation procedures were not yet mature, and much of the organizers' effort went into defining the methodologies that underlie the challenge. Because of this, the first challenge was aimed at a more exploratory setting, rather than a competitive one, and attempted to set a series of baselines, in terms of system performance and corpus generation. The table below gives the results of the top scoring systems in RTE-1 (based on the results reported in (Dagan, Glickman and Magnini, 2005)).

Submission Accuracy Method

(Pérez and Alfonseca, 2005) 0.7 Word overlap

(Delmonte et al., 2005) 0.606 WordNet, syntactic matching, logical inference

(Bayer et al., 2005) 0.586 Statistical lexical relations

(Glickman et al., 2005) 0.586 Statistical lexical relations

Table 2: Top scoring systems from RTE-1

The most basic method used for solving textual entailment in this setting was lexical overlap over words, lemmas, stems or parts of speech, using some sort of weighting (most commonly, inverse document frequency). More complex approaches used various relations between words, such as statistically derived relations or the WordNet taxonomy. Some systems used even more advanced methods, such as the degree of syntactic match, world knowledge or logical provers over semantically enriched representations. The decision mechanisms are also varied, and include probabilistic models, probabilistic machine translation models, supervised learning methods or specific scoring mechanisms, such as rules. Interestingly, system complexity does not fully correlate with performance, as some of the best results are obtained by rather naïve lexical-based systems. The RTE-1 challenge was quite successful, with 17 teams taking part.


1.2.2. The Second PASCAL Recognising Textual Entailment Challenge (http://pascallin.ecs.soton.ac.uk/Challenges/RTE2/)

“The main goal of the second challenge was to support the continuation of research on textual entailment” (Bar-Haim et al., 2006). The general outline of the RTE-2 task was similar to that of RTE-1, in the sense that the purpose of the participating systems was to determine whether a hypothesis H is entailed by a text T. The data set was very different, as the organizers wanted to provide more realistic T-H pairs, based on outputs extracted from existing systems. The data collection and annotation process was also improved, with refined guidelines and with the data cross-annotated by three different annotators.

The RTE-2 data set consists of 1600 entailment pairs, built using the basic setting of RTE-1. The data set was split into two 800 pair sets, for training and testing. In terms of application setting, the organizers chose only four of the original seven applications from RTE-1: IR, IE, QA and multi-document summarization (CD in RTE-1). Within each application setting, annotators selected both positive and negative examples, in equal numbers; the various application settings are equally represented in the corpus.

The main task of RTE-2 was classification of entailment pairs as either positive or negative cases; the evaluation measure used was accuracy (percentage of correctly classified pairs). A secondary task consisted of ranking pairs according to their entailment confidence; this means that the first ranked pair would be the one for which entailment is most certain, while the last pair is the one for which the “no entailment” judgement was most certain. The top scoring systems are given below (in the table below, and in subsequent tables, BK denotes Background Knowledge):

Submission Accuracy Method

(Hickl et al., 2006) 0.7538 Lexical relations, subsequence overlap, syntactic matching, SRL, web statistics, ML classification

(Tatu et al., 2006) 0.7375 Lexical relations, logical inference, BK

(Zanzotto et al., 2006) 0.6388 Lexical relations, syntactic matching, web statistics, ML classification

(Adams, 2006) 0.586 Lexical relations, ML classification

Table 3: Top scoring systems from RTE-2

The results for RTE-2 are considerably higher than those for RTE-1, with top accuracies for RTE-2 more than 10% above those for RTE-1. The important thing about the results of RTE-2 is that they correlate with the complexity of the method, which is to say that more complex systems perform better on this data set. However, simple lexical systems can still achieve an accuracy of over 60% (Bar-Haim et al., 2006). Also, the average system complexity grew, as did the number of participants, proving the appeal of the task.

1.2.3. The Third PASCAL Recognising Textual Entailment Challenge (http://pascallin.ecs.soton.ac.uk/Challenges/RTE3)

“RTE-3 followed the same basic structure of the previous campaigns, in order to facilitate the participation of newcomers and to allow "veterans" to assess the improvements of their systems in a comparable test exercise. Nevertheless, some innovations were introduced, on the one hand to make the challenge more stimulating and, on the other, to encourage collaboration between system developers” (Giampiccolo et al., 2007). Among the innovations was a limited number of entailment pairs for which the text was longer (up to a paragraph), in order to make the scenario more similar to real life applications; it is worth noting, however, that the majority of the entailment pairs still had relatively short texts. Another innovation was the introduction of a resource pool, where participants could share the resources used. This was motivated by the conclusion drawn at the end of RTE-2, that modelling entailment requires large knowledge sources corresponding to different types of entailment reasoning. Also, entailment systems use various NLP tools, and thus the portal serves as a place for publicising and tracking resources, and for reporting on their usefulness.

As in the RTE-2 challenge, the RTE-3 corpus consisted of 1600 entailment pairs, equally divided into training and test sets. The pairs were drawn from the same application settings (IE, IR, QA, SUM) and the numbers of positive and negative examples were similar.


RTE-3 also saw the introduction of a pilot task (http://nlp.stanford.edu/RTE3-pilot/) set up by the US National Institute of Standards and Technology (NIST). The purpose of the pilot task was to explore two notions that are closely related to textual entailment: separating unknown cases from contradictions and providing justifications for system decisions. The first subtask requires that the entailment systems make more precise distinctions by making a three-way decision between YES, NO and UNKNOWN, thus distinguishing between a hypothesis over which no significant judgement can be made and a hypothesis that explicitly contradicts the text. The second subtask was aimed at exploring ways in which users of entailment systems can receive justification for the entailment decision, as end users are unlikely to trust a result they cannot test. The pilot task used the existing infrastructure of the RTE-3 challenge, by expanding the data sets with annotation for the three-way task. The results for both the main and pilot tasks are given below (BK is the acronym for Background Knowledge).

Submission Accuracy Method

(Hickl and Bensley, 2007) 0.8 Lexical relations, subsequence overlap, logical inference, ML classification, anaphora resolution, BK

(Tatu and Moldovan, 2007) 0.7225 Lexical relations, logical inference, anaphora resolution, BK

(Iftene and Balahur-Dobrescu, 2007) 0.6913 Lexical relations, syntactic matching, BK

(Adams et al., 2007) 0.67 Lexical relations, subsequence overlap, web-based statistics, ML classification

Table 4: Top scoring systems from RTE-3 main task

Submission Accuracy Method

(Hickl and Bensley, 2007) 0.73 Lexical relations, subsequence overlap, logical inference, ML classification, anaphora resolution, BK

(Tatu and Moldovan, 2007) 0.71 Lexical relations, logical inference, anaphora resolution, BK

(Chambers et al., 2007) 0.59 Lexical relations, syntactic matching, logical inference, ML classification, anaphora resolution

(Iftene and Balahur-Dobrescu, 2007) 0.57 Lexical relations, syntactic matching, BK

Table 5: Top scoring systems from RTE-3 pilot task

1.2.4. TAC 2008 Recognizing Textual Entailment (RTE) Track (http://www.nist.gov/tac/2008/rte/)

On the basis of the positive feedback received from researchers interested in the RTE challenge, the organizers of the RTE challenges introduced a series of new elements, in order to keep the task affordable and stimulating (Giampiccolo et al., 2008).

The first major change in the RTE challenge was the decision to join efforts with the National Institute of Standards and Technology (http://www.nist.gov), which proposed the pilot task in RTE-3, and with CELCT, which took part in all the previous RTE challenges; thus, the RTE challenge became a track of the Text Analysis Conference (http://www.nist.gov/tac/). The other major innovation was the introduction of the RTE-3 pilot task as a main task for RTE-4.

For the RTE-4 challenge, participants were given 1000 text pairs, called Text and Hypothesis, for which they were required to determine whether T entailed H. Textual entailment is defined as a directional relation between two text fragments – T, the entailing text, and H, the entailed text – such that a human being, with common understanding of language and common background knowledge, can infer that H is most likely true on the basis of the content of T (Giampiccolo et al., 2008). Unlike previous challenges, the entailment systems had to make a three-way decision, separating the cases in which there was no entailment according to whether the truth of H was contradicted by T or remained unknown on the basis of T. In short, the three-way task requires the system to decide whether:


- T entailed H, in which case the pair was marked as ENTAILMENT;
- T contradicted H, in which case the pair was marked as CONTRADICTION;
- the truth of H could not be determined on the basis of T, in which case the pair was marked as UNKNOWN.

The test set contained 1000 entailment pairs (300 each for IE and IR, as they were considered more difficult on the basis of previous evaluations, and 200 each for SUM and QA). 50% of the pairs were in an entailment relation, 35% were unknown and 15% were contradictions. On the basis of the three-way challenge, results for the two-way challenge can also be extracted. The results for the RTE-4 challenge are given below.

Submission Accuracy for the 2-way task Accuracy for the 3-way task

(Bensley and Hickl, 2008) 0.746 -

(Iftene, 2008) 0.721 0.685

(Wang and Neumann, 2008) 0.706 0.614

(Li et al., 2008) 0.659 0.588

Table 6: Top scoring systems from RTE-4

Compared to the RTE-3 results, the top systems in RTE-4 had lower results. This may be due to the fact that the training and test sets were not developed concurrently.

1.2.5. PASCAL Recognizing Textual Entailment Challenge (RTE-5) at TAC 2009

The RTE-5 track at TAC 2009 (http://www.nist.gov/tac/2009/RTE/; Bentivogli et al., 2010) continues the previous RTE Challenges, which have aimed to focus research and evaluation on the underlying semantic inference task. The structure of the RTE-5 challenge is largely the same as that used in the RTE-4 campaign, but the data sets used had two significant changes: 1) the average length of the Texts was greater, and 2) texts came from a variety of sources, without any additional corrections or simplifications as compared to the source documents. These changes significantly increased the difficulty of the entailment task, but the reason behind the decision was that such texts are closer to real life applications of entailment.


The RTE-5 data set consists of 1200 pairs, of which half are used for training and the other half for testing. The pairs were drawn from the same application settings as those for previous challenges, with the exclusion of the summarization setting (entailment in the summarization setting was defined as a new pilot task). Also, the participants were required to submit ablation tests (runs of the entailment systems with components deliberately excluded) in order to judge the relevance of the knowledge sources employed by various systems.

The RTE-5 challenge also brought a new pilot task, defined by the organizers as solving textual entailment in a summarization setting and concerning the extraction of texts from a series of newspaper articles that yield positive entailment for a given set of hypotheses. The difficulty of the task is twofold: first, the texts are not modified in any way as compared to the original source, so they may contain spelling errors, sentences with grammar errors, abbreviations and contractions, etc. The second problem is that there is a large number of candidate pairs: for every one of the nine topics there are approximately ten hypotheses, and for every hypothesis in a topic the number of candidate pairs is equal to the number of sentences. This leads to a very large search space, and the problem of reducing it becomes very important.

Submission Accuracy for the 2-way task Accuracy for the 3-way task

(Iftene and Moruz, 2010) 0.6833 0.735

(Wang et al., 2010) 0.6367 0.685

(Ferrandez et al., 2010) 0.6 -

(Malakasiotis, 2010) 0.575 -

(Mehdad et al., 2010b) - 0.6617

Table 7: Results for the RTE-5 main task

Submission Precision Recall F-measure

(Mirkin et al., 2010, RTE) 0.4098 0.5138 0.4559

(MacKinlay and Baldwin, 2010) 0.4294 0.38 0.4032

(Mehdad et al., 2010a) 0.2254 0.6475 0.3344


(Iftene and Moruz, 2010) 0.5112 0.2288 0.3161

Table 8: Results for the RTE-5 pilot task

Even though the data sets for the RTE-5 challenge are significantly more complex than those of the previous challenges, the average score of the systems improved, and the top scores are similar, thus showing that entailment systems had reached maturity. Also, the introduction of the new pilot task offered a new perspective on the idea of textual entailment in a real life application, and highlighted the difficulties that arise when adapting systems built for entailment in an artificial setting.

1.2.6. PASCAL Recognizing Textual Entailment Challenge (RTE-6) at TAC 2010

The feedback provided through the RTE challenges by the NLP community has created opportunities for applying RTE systems to various settings and has pointed the RTE community towards more realistic scenarios. In particular, the RTE-5 Pilot Search Task represented a step forward, as for the first time textual entailment recognition was performed on a real text corpus. Furthermore, it was set up in the Summarization setting, attempting to analyze the potential impact of textual entailment on a real NLP application.

According to the RTE-6 guidelines (http://www.nist.gov/tac/2010/RTE/), the goals of the challenge are:

- to advance the state of the art in RTE, by proposing a data set which reflects the natural distribution of entailment in a corpus and presents all the problems that can arise while detecting textual entailment in a natural setting, such as the interpretation of sentences in their discourse context;

- to further explore the contribution that RTE engines can make to Summarization applications. In a general summarization setting, correctly extracting all the sentences entailing a given candidate statement for the summary (similar to Hypotheses in RTE) corresponds to identifying all its mentions in the text, which is useful to assess the importance of that candidate statement for the summary and, at the same time, to detect those sentences which contain redundant information and should probably not be included in the summary. Furthermore, if automatic summarization is performed in the Update scenario (where systems are required to write a short summary of a set of newswire articles, under the assumption that the user has already read a given set of earlier articles), it is important to distinguish between novel and non-novel information. In such a setting, RTE engines which are able to detect the novelty of hypotheses can help Summarization systems filter out non-novel sentences from their summaries.

In the new setting of RTE-6, the challenge of identifying textual entailment changes in the following way: given a corpus, a hypothesis H and a set of candidate entailing sentences retrieved from the corpus using Lucene, the systems need to identify all the sentences among the candidates that entail the hypothesis. This setting for the entailment problem is very different from the one originally proposed in the previous RTE challenges; it is concerned with entailment within a corpus, with both the text and the hypothesis needing to be interpreted in the larger setting of the available knowledge, and it reflects the natural distribution of entailment (which is to say there are many more negative examples than positive ones).

In the RTE-6 main task scenario (Bentivogli et al., 2011), the test set is defined as a set of topics, each containing document clusters. For each of the topics, a set of up to 30 standalone sentences is extracted, and those sentences serve as hypotheses for the topic. For each H in the topic, a Lucene query is run (all the words in the hypothesis connected by the OR logical operator), and the top 100 ranked sentences are selected as candidates (it has been empirically determined that 80% of all the entailing sentences in the corpus are found in this manner); a sketch of this retrieval step is given below. Based on the Main Task, a subtask is also defined. It is focused on Novelty Detection, which means that RTE systems are required to judge whether the information contained in each H is novel with respect to (i.e., not entailed by) the information contained in the corpus. If entailing sentences are found for a given H, the content of the H is not new; in contrast, if no entailing sentences are detected, the information contained in the H is novel. Although the Novelty Detection Task has the same structure as the Main Task, it is separated out as a subtask to allow participants to optimize their RTE engines differently (i.e., for novelty detection).
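To illustrate this retrieval step, the sketch below ranks corpus sentences against a hypothesis with a simple IDF-weighted OR-query score and keeps the top 100 candidates. It is a rough stand-in for the Lucene retrieval used by the organizers, not a reproduction of it; all names are illustrative.

```python
import math
from collections import Counter

def top_candidates(hypothesis, sentences, k=100):
    # OR-query scoring: a sentence scores the summed IDF of the
    # hypothesis words it contains (a rough stand-in for Lucene).
    docs = [set(s.lower().split()) for s in sentences]
    df = Counter(w for d in docs for w in d)
    idf = {w: math.log(len(docs) / df[w]) for w in df}
    query = set(hypothesis.lower().split())
    scored = [(sum(idf.get(w, 0.0) for w in query & d), s)
              for d, s in zip(docs, sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]
```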

Submission Precision Recall F-measure

(Jia et al., 2011) 0.6857 0.3693 0.4801

(Majumdar and Bhattacharyya, 2011) 0.5343 0.4286 0.4756

(Tateishi and Ishikawa, 2011, IKOMA) 0.3971 0.5143 0.4481

(Kouylekov et al., 2011) 0.4346 0.4603 0.4471

Table 9: Results for the RTE-6 main task

Submission Precision Recall F-measure

(Jia et al., 2011) 0.7239 0.97 0.8291

(Tateishi and Ishikawa, 2011) 0.7944 0.85 0.8213

(Pakray et al., 2011) 0.8058 0.83 0.8177

(Volokh et al., 2011) 0.735 0.86 0.7926

(Iftene and Moruz, 2011) 0.7328 0.85 0.787

Table 10: Results for the RTE-6 novelty detection subtask

1.3. Conclusions

In this section we have provided an overview of the RTE challenges, starting with the first, introduced in 2005, up to the latest, in 2010. Through the evolution of the data sets, the tasks and the results from year to year, the evolution of textual entailment systems can be followed. Because of this, and because the RTE challenge provides a unified testing ground for textual entailment engines, its importance cannot be overstated.

As can be seen from the large number of participants (the number of competitors has grown steadily until RTE-5), the interest of the NLP community for this problem is significant, which leads us to believe that improved textual entailment engines are still needed.

The rest of the thesis is structured as follows: chapter 2 describes the state of the art in textual entailment, chapter 3 describes the method we propose for solving textual entailment, chapter 4 gives evaluation results for the entailment system we have built, along with a brief description of it, chapter 5 discusses the adaptation of our entailment system to the Romanian language, and conclusions are given in chapter 6.

2. State of the Art in Textual Entailment

The very first approaches to solving textual entailment were based on word overlap (Herrera, 2005), statistical lexical relations (Bayer et al., 2005), WordNet similarities (Herrera, 2005), syntactic matching (Delmonte et al., 2005), world knowledge (Bayer et al., 2005), logical inference (Akhmatova, 2005), inversion transduction grammars (Wu, 2005), (Jijkoun and de Rijke, 2005) or edit distance between parsing trees (Kouylekov and Magnini, 2005). Eventually, systems evolved in a series of new directions, turning away from the initial shallow approaches, with systems making use of semantic role labelling (Hickl et al., 2006), Machine Learning classification (Inkpen et al., 2006 and Kozareva, 2006), background knowledge (Tatu et al., 2006), acquisition of entailment corpora (Hickl et al., 2006), and rule based entailment (Iftene, 2009).

2.1. Lexical Representation

One of the first successful systems for recognizing textual entailment was an application of the BLEU algorithm, described in (Pérez and Alfonseca, 2005). BLEU (BiLingual Evaluation Understudy) was first described in (Papineni et al., 2001), and was initially used for ranking machine translation systems. It provides a score on the validity of a given translation candidate (usually obtained automatically, by means of machine translation) on the basis of its similarity to a manually obtained translation of the same source text. In order to cope with variability, the candidate is compared to a set of manually obtained translations, and the top score is chosen; BLEU boosts the similarity score for a given candidate in the case of matching n-grams. It is worth noting that, after computing the score of the co-occurring n-grams, high scores of very short candidates are penalized on the basis of their length, as it is considered that a translation variant should not be significantly different in length from the source text. This drawback, coupled with the fact that the BLEU algorithm is inherently symmetrical, makes the adaptation to the task of recognizing textual entailment non-trivial.

The application of the BLEU algorithm to textual entailment consists of comparing the entailing text (T) and the hypothesis (H); according to the score provided by BLEU, the entailment is then judged to be true or false. According to (Pérez and Alfonseca, 2005), using T as the reference and H as the candidate is best, as T is usually longer and therefore contains more information, which helps the BLEU processing, as the quality of the references is very important. This adaptation, which essentially consists of only comparing part of T against H, circumvents both problems outlined above with the usage of the BLEU algorithm.
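As an illustration of this adaptation, the sketch below uses NLTK's BLEU implementation with T as the single reference and H as the candidate, and judges entailment by comparing the score against a threshold; the threshold value is purely illustrative and would, in practice, be estimated on training data.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_entails(text, hypothesis, threshold=0.3):
    # T as reference, H as candidate, in the spirit of
    # (Pérez and Alfonseca, 2005); the threshold is illustrative.
    reference = [text.lower().split()]
    candidate = hypothesis.lower().split()
    smooth = SmoothingFunction().method1  # avoid zero scores on short H
    score = sentence_bleu(reference, candidate, smoothing_function=smooth)
    return score >= threshold
```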

The very good result obtained by the system (it was ranked first in RTE-1, with an accuracy of over 70%) proves the potential of the approach. However, this method is not without drawbacks: it does not make any use of syntactic information, and in the case of hypotheses that have many words in common with the texts, the answer will always be positive entailment. Another significant drawback is that this approach does not work well on long texts (such as those used in the RTE-5 challenge), as it becomes very difficult to extract significant references that are both relevant and valid for determining entailment (not extracted from different sentences for example).

With the growing complexity of the task of recognizing textual entailment, lexical based approaches have grown significantly more complex, by making use of wider and wider lexical- semantic resources. A brief overview of some of the most significant approaches in recent years is given below.

BLUE-Lite (Clark and Harrison, 2011) is a derivative of the RTE-5 system BLUE (Clark and Harrison, 2010), and is characterized by the following features:

1. The first step taken to compare T and H is to parse them with SAPIR, a broad coverage chart parser (Harrison and Maxwell, 1986), and to transform both T and H into bags of words which also include multiwords present in WordNet; all the functional words are ignored, as well as some light semantic verbs, given by the authors.

2. The approach could thus be described as "knowledge-based lexical entailment" that makes use of a series of relations from WordNet, such as the WN morphosemantic database, which relates nouns to verbs (build#v1 -agent→ maker#n1), the "pertains-to" relation (rapidly#r1 ←pertains-to→ quick#a1), the "similar-to" relation for adjectives (nice#a1 ←similar-to→ pleasant#s2), the "part-of" relation, and the "substance-of" relation. The approach also uses the DIRT paraphrase database.

3. In order to account for coreference and discourse context, the system attempts to match those words in H that cannot be matched to words in T to words in the previous sentence.

4. Some topics were intrinsically harder to find entailments for than others. To account for this, the system gradually reduces the restrictions on the matching of the words in H, by allowing at most 2 words to be matched with sentences preceding T.

The most important thing to note is that, unlike BLUE, BLUE-Lite does not use any structural (parse-based) information in entailment decisions, but still obtains reasonably good results (in relative terms), with an f-score of more than 44% on the RTE-6 test set (note that results from different RTE competitions cannot be compared, due to differences in the datasets).

(Majumdar and Bhattacharyya, 2011) describes a lexical based system for solving entailment that was used for the RTE-6 competition, which is similar to the one proposed by (Clark and Harrison, 2011), yielding good results (47%). The system draws on lexical knowledge from WordNet (for measuring the length of the path between words) and VerbOcean (for determining entailment between verbs), and uses the Stanford named entity recognition system, enhanced with acronym detection. The system is built to take into account various types of coreference, but the reported tests do not include the coreference recognition module.

(Perini, 2011) is a lexical based entailment system used in RTE-6, an evolution of the system described in (Perini, 2010) that uses directional text relatedness conditions for detecting textual entailment between sentences. The system is centred on computing the text relatedness score between T and H, with respect to both T and H. In general, if T entails H, the directional relation of entailment can be described as sim(T, H | H) > sim(T, H | T), that is, the similarity between T and H with regard to H is greater than the similarity between T and H with regard to T, where sim is the measure of similarity between H and T. The approach is based on the method proposed by (Tătar et al., 2009), which describes a directional approach to solving textual entailment, based on the notion of lexical-semantic similarity. The system described in (Perini, 2011) uses WordNet to compute similarity between words, by assigning scores on the basis of the relation detected using WN (if two words are in the same synset, the score is 0.7). This approach yielded middle range results in RTE-5, and comparatively high results in RTE-6 (40% f-measure).
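The directional condition can be illustrated with a crude lexical overlap measure, normalizing the same word overlap once by the size of the hypothesis and once by the size of the text; the WordNet-based word-pair scores of the actual system are reduced here to exact word matches, so the sketch only conveys the idea of directionality.

```python
def directional_entails(text, hypothesis):
    # Shared-word count, normalized two ways; a real system would
    # score word pairs via WordNet relations instead of equality.
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    shared = len(t & h)
    sim_given_h = shared / len(h)   # sim(T, H | H)
    sim_given_t = shared / len(t)   # sim(T, H | T)
    return sim_given_h > sim_given_t  # T is more informative than H

print(directional_entails("John bought a shiny new red car yesterday",
                          "John bought a car"))  # True
```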

(Tateishi and Ishikawa, 2011) proposes a new method for identifying textual entailment by making use of local-novelty detection. The method is based on determining whether a hypothesis H is local-novel. 'H being local-novel' means that the information described by H first appeared in T_H, where T_H denotes the text (T) that entails H.


Based on this description, it follows that, if H is local-novel, the first possible entailing text has to be T_H, and therefore any T that appeared before it has to be non-entailing. The authors define six rules for determining the local-novel feature; on the basis of this feature, they provide a similarity threshold, which describes the minimum required similarity of T and H in order to obtain entailment.

(Blake et al., 2011) explores the impact that semantics alone has on the accurate detection of entailment, in order to better judge the impact of this type of component on an entailment system. The authors considered various lexical-semantic knowledge bases and resources, and eventually used WordNet and YAGO. The reason for using YAGO, developed as part of the YAGO-NAGA system, is that it seems to be more comprehensive in terms of names than WordNet. YAGO is a bootstrapped ontology, which uses information given in WordNet to infer further information from the web. The disadvantage of this approach as compared to manually constructed resources such as WordNet is the fact that automatic bootstrapping can lead to errors. Names of persons and organizations, locations and numbers are compared on the basis of the results obtained by the lexical knowledge bases, and a similarity score is computed. The rest of the words (those not recognized as NEs or co-reference, for example) are matched lexically, on the basis of the lemma form, or semantically, by means of synonymy in WordNet.

(Pakray et al., 2010) describes a system based on the composition of six lexical matching methods: WordNet based unigram match, bigram match, longest common sub-sequence, skip-gram (any combination of n words in the order in which they appear in the text, allowing for arbitrary gaps), stemming and named entity matching. The authors trained a number of classifiers on the features described above and used them as a voting system for determining entailment. Since the classifiers had to be in complete agreement in order for the system to make a positive decision, the best results are obtained when only two classification methods are used; a sketch of this unanimous voting scheme is given below.
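The following is a minimal sketch of the unanimity rule, assuming only two of the six lexical features (unigram overlap and a bounded-gap skip-bigram overlap) and a list of hypothetical pre-trained classifiers with a scikit-learn-style predict method; the feature inventory and all names are illustrative only.

```python
def unigram_match(t, h):
    # Fraction of hypothesis words found in the text.
    return len(set(t) & set(h)) / len(set(h))

def skip_bigram_match(t, h, gap=4):
    # Skip-grams: ordered word pairs allowing a bounded gap.
    def skips(words):
        return {(a, b) for i, a in enumerate(words)
                       for b in words[i + 1:i + 1 + gap]}
    sb_h = skips(h)
    return len(skips(t) & sb_h) / len(sb_h) if sb_h else 0.0

def vote_entails(text, hypothesis, classifiers):
    # Unanimity rule: every (pre-trained) classifier must vote YES
    # for the system to make a positive entailment decision.
    t, h = text.lower().split(), hypothesis.lower().split()
    features = [unigram_match(t, h), skip_bigram_match(t, h)]
    return all(clf.predict([features])[0] == 1 for clf in classifiers)
```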

(Breck, 2010) proposes a novel approach to detecting entailment, on the basis of the observation that if T entails H, it follows that all of the information in H must be contained within T. Therefore, the author proposes a system that can determine whether all the units of information in H can be found in T; if there exists at least one unit of information in H that cannot be found in T, the result for that entailment pair is Unknown. If all the units of information in H can be found in T, it is also possible that the result is Contradiction, and so a second, simpler, system is used for determining contradictions. The units of information taken into account are words, but the author only considers open-class words such as common nouns, proper nouns, verbs, adjectives, adverbs, and numbers. Experiments carried out by the author showed that the system gains most from using common nouns, proper nouns, and numbers. The matching methods employed are exact string match, edit distance, acronyms, and lexicon based matching (WordNet and Dekang Lin's automatically derived thesaurus, available at http://www.cs.ualberta.ca/~lindek/downloads.htm, which experiments showed did not have high enough accuracy).

(MacKinlay and Baldwin, 2010) proposes a system specifically designed for solving the task of textual entailment in a summarization setting, on the basis of variants of standard document comparison techniques. The system computes the lexical overlap of candidate texts for a given hypothesis by means of cosine similarity over bag of words features, weighted by inverse document frequency; a sketch of this comparison is given below. The system also uses WordNet for determining synonymy and derivationally related forms, and the results obtained are above the median for the RTE-5 pilot task.
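The sketch below illustrates this document-comparison view: bag-of-words vectors weighted by inverse document frequency and compared by cosine similarity, with the candidate sentences themselves serving as the collection for the IDF estimates. The WordNet expansion step is omitted, and all names are illustrative.

```python
import math
from collections import Counter

def idf_cosine(hypothesis, candidate, collection):
    # IDF computed over the candidate sentence collection.
    docs = [set(d.lower().split()) for d in collection]
    idf = lambda w: math.log(len(docs) / (1 + sum(w in d for d in docs)))

    def vec(text):
        # IDF-weighted bag-of-words vector.
        counts = Counter(text.lower().split())
        return {w: c * idf(w) for w, c in counts.items()}

    v1, v2 = vec(hypothesis), vec(candidate)
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    norm = (math.sqrt(sum(x * x for x in v1.values())) *
            math.sqrt(sum(x * x for x in v2.values())))
    return dot / norm if norm else 0.0
```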

This subsection described the most successful approaches to solving textual entailment by using lexical and lexical-semantic features. Even though this type of approach is hindered by the lack of more complex analyses, such as syntactic parsing for example, the results described are quite good, especially for the summarization setting of textual entailment. This proves that, in a significant number of entailment cases, the measure of lexical similarity is still significant.

2.2. Syntactic Graph Distance

Graph distance is widely recognized as a powerful tool for solving computer vision problems and for performing pattern matching (Bunke and Shearer, 1998), and it was first used for building a textual entailment system by (Pazienza et al., 2005). Syntactic graph distance based approaches work by transforming the objects that need comparing (in the case of textual entailment, the text and the hypothesis) into graphs, thus turning the entailment problem into a graph matching problem.


Since any natural language sentence can be represented by a syntactic graph, textual entailment can be reduced to a graph similarity problem, albeit with some particular properties (Pazienza et al., 2005): the problem of textual entailment is not symmetric, node similarity cannot be reduced to similarity between node labels (we cannot compare pre-terminal nodes in the syntactic trees), and linguistic transformations need to be taken into account. Therefore, on the basis of graph similarity, entailment can be characterized as a non-symmetric, transitive relation, as follows (Pazienza et al., 2005):

- T is syntactically subsumed by H (e.g., in H: [John is walking.] and T: [John is walking fast.], T contains an adverb, making the text more specific than the hypothesis).
- T is semantically subsumed by H (e.g., in H: [John runs home.] and T: [John sprints home.], where the verb "run" is more general than the verb "sprint").
- The information in H is directly implied by the information in T (e.g., T: [John is snoring.] and H: [John is sleeping.]; the fact that John is snoring cannot take place unless John is sleeping).

The system described in (Pazienza et al., 2005) attempts to find the maximal subgraph within the syntactic representation of T that is isomorphic with the syntactic representation of H. This method is more advanced than the lexical similarity based approaches, and has the advantage of taking into account both the syntactic and the semantic levels. Graph similarity has been used by many systems, including those of (Katrenko and Adriaans, 2006), (Zanzotto et al., 2006), (Burchardt et al., 2007), (Ferrés and Rodríguez, 2007) and (Padó et al., 2008). More recent graph distance based approaches are given below.
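The subgraph formulation can be illustrated with the networkx library. The sketch below (a toy illustration under the assumption that nodes match when their lemmas match; the actual system uses much richer node similarity and linguistic transformations) tests whether the dependency graph of H is isomorphic to a subgraph of the dependency graph of T:

import networkx as nx
from networkx.algorithms.isomorphism import DiGraphMatcher

def dep_graph(edges):
    # build a directed graph from (head, relation, dependent) triples
    g = nx.DiGraph()
    for head, rel, dep in edges:
        g.add_node(head, lemma=head)
        g.add_node(dep, lemma=dep)
        g.add_edge(head, dep, rel=rel)
    return g

t = dep_graph([("walk", "nsubj", "john"), ("walk", "advmod", "fast")])
h = dep_graph([("walk", "nsubj", "john")])

matcher = DiGraphMatcher(t, h, node_match=lambda a, b: a["lemma"] == b["lemma"])
print(matcher.subgraph_is_isomorphic())  # True: H's graph is contained in T's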

(Mehdad et al., 2010b) proposes a textual entailment engine based on the notion of syntactic/semantic kernels. Syntactic kernels were first explored by the authors in (Zanzotto and Moschitti, 2006), where entailment was solved by aligning syntactic subtrees from T and H by means of cross-pair syntactic kernels, which were then used to train an SVM classifier. Standard tree kernel functions measure the similarity of two trees in terms of the number of tree fragments (subtrees or substructures) that they have in common, with the addition of so-called "placeholders", which are used to track the movement of aligned words in the text and hypothesis. However, this first approach, even though it performed quite well on the RTE-1 dataset, is limited by the fact that two syntactically identical subtrees with different leaves do not match even if the leaves, i.e. the words, are synonyms.

(Mehdad et al., 2010b) proposes the augmentation of the previous syntactic kernels with semantic knowledge drawn from Wikipedia, thus creating an entailment system based on syntactic/semantic kernels. More specifically, Wikipedia was used to enrich the similarity measure between pairs of text and hypothesis (i.e. the tree kernel for text and hypothesis pairs) with a lexical similarity (i.e. the similarity between the leaves of the trees). The lexical similarity is defined as a proximity matrix, which is filled in by considering that words are semantically related if they co-occur frequently in Wikipedia articles. The Syntactic Semantic Tree Kernel (SSTK) is the augmentation of the Syntactic Tree Kernel defined in (Zanzotto and Moschitti, 2006) with the similarity measure derived from Wikipedia. In order to reduce the search space, only the 200,000 most visited Wikipedia articles were used for extracting similarity between terms, and similarity was computed only for those words that actually appeared in the text-hypothesis pairs. The kernel used for the system is the Max Similarity Kernel, though the authors also propose the Placeholder Kernel as an alternative. The MSK-SSTK combination was used in the system for RTE-5, which achieved 66% accuracy in the two-way recognition task, well above the median and 6% below the top scoring system in the competition. A more detailed description of the tree kernels employed is given in (Mehdad et al., 2010c). The coverage and usefulness of WordNet and of the British National Corpus (BNC, a balanced synchronic text corpus containing 100 million words with morpho-syntactic annotation) are also explored, and shown to be far lower than those of Wikipedia, whose coverage is almost double that of either resource.

2.3. Tree Edit Distance Algorithms

One of the first uses of tree edit distance algorithms for textual entailment is the one proposed by (Kouylekov and Magnini, 2005), which describes a system based on a tree edit algorithm applied on the dependency trees of both the text and the hypothesis. According to (Kouylekov and Magnini, 2005), "T entails H if there exists a sequence of transformations applied to T such that we can obtain H with an overall cost below a certain threshold". This is based on the intuition that pairs where the entailment relation holds have low transformation costs, in a similar fashion to the approach presented in subsection 2.2. The difference in this type of approach is that instead of matching subtrees from the hypothesis to ones in the text, the mapping is a sequence of transformations applied to a syntactic tree, where each operation has an associated cost; as such, an entailment relation is confirmed if the overall cost of the transformation is below a certain corpus-dependent threshold.

(Kouylekov and Magnini, 2005) proposes the following operations for transforming syntactic trees:

- Insertion: insert a node from the dependency tree of H into the dependency tree of T. When a node is inserted, it is attached to the dependency relation of the source label.
- Deletion: delete a node N from the dependency tree of T. When N is deleted, all its children are attached to the parent of N. It is not required to explicitly delete the children of N, as they are going to be either deleted or substituted in a subsequent step.
- Substitution: change the label of a node N1 in the source tree into a label of a node N2 of the target tree. Substitution is allowed only if the two nodes share the same part-of-speech. In case of substitution, the relation attached to the substituted node is changed with the relation of the new node.

Other approaches that used tree edit distance algorithms are those of (Katrenko and Adriaans, 2006), (Kozareva and Montoyo, 2006), (Harmeling, 2007), (Iftene and Balahur-Dobrescu, 2007) and (Bar-Haim et al., 2008). More recent approaches that employ tree transformations are given below.

A novel approach to textual entailment is given in (Cabrio and Magnini, 2010) and is based on the notion of specialized entailment engines that rely on monothematic entailment pairs. A monothematic pair is defined as a text-hypothesis pair in which a certain phenomenon relevant to entailment is highlighted and isolated. Monothematic pairs can be created on the basis of phenomena that are present in available entailment pairs, in order to identify their distribution within the entailment relation. The intuition of the authors is that if a system can correctly solve the underlying phenomena of entailment, it can correctly solve entailment in complex cases, and, therefore, a high accuracy on the monothematic pairs should imply a high accuracy in the general case. To test this, the authors used 60 entailment pairs from the RTE-5 test set and a dataset composed of all the monothematic pairs derived from the first one. Two systems used in the RTE-5 competition were run on the separate datasets, in order to test the correlation between the proposed method over singular linguistic phenomena and the general case. In the case of EDITS (Mehdad et al., 2010a), it was found that the system does better on the monothematic corpus, while VENSES (Delmonte et al., 2009) performed worse. The relevance of this approach lies in the fact that it is able to identify the strengths and weaknesses of different entailment systems by means of the correlation between results obtained on a standard corpus and those obtained on a monothematic corpus.

An extension to the classical tree edit distance approach is given in (Heilman and Smith, 2010). This extension is motivated by the fact that, in some instances, the same information can be represented in significantly different ways, for which the operations of insertion, deletion and substitution yield a very high cost. Standard tree edit distance is thus extended with the addition of some new operations and the modification of some existing ones. The editing operations used are insert-child (as the last child of the target node), insert-parent (add a node between the target node and its parent), delete-leaf (delete a leaf node), delete-and-merge (delete an internal node in the tree and add all of its children to its parent), relabel-node (change the lemma and part of speech of a node), relabel-edge (change the edge label), move-subtree (remove a given subtree from its former position and add it as the last child of a node), new-root (change the root of a tree to a given node, and add the old root as the last child of the new root) and move-sibling (move a node to be either the first or last child of its parent). In order to reduce computational costs, the search space is reduced by means of restrictions on the application of the rules given above, and the search heuristic is based on tree kernels. To prove the viability of the approach, it has been tested on three tasks: recognizing textual entailment, paraphrase identification and answer selection. The training for the recognizing textual entailment setting was carried out over the RTE-1, RTE-2 and RTE-3 datasets, and testing over the RTE-3 dataset. The results obtained were competitive, and the method proposed shows how a tree edit based system can be improved by adding more types of editing operations.

The most successful application of tree edit distance to the task of recognizing textual entailment is that described by (Mehdad et al., 2010a), which is a development of the approach proposed in (Kouylekov and Magnini, 2006). The system which took part in the RTE Challenge is the EDITS package (Edit Distance Textual Entailment Suite), developed by the HLT group at FBK. EDITS implements a distance-based approach for recognizing textual entailment, according to the assumption that the distance between T and H is a characteristic that separates positive entailment pairs from negative ones (EDITS is built for the two-way entailment task). The distance is computed as the overall cost of the edit operations (i.e. insertion, deletion and substitution) needed to transform T into H. The edit distance approach used in EDITS has three main components: an edit distance algorithm (either string edit distance, token edit distance or tree edit distance), a cost scheme for the operations and an optional set of rules, which can be either manually added or extracted from external resources. The main sources of knowledge used for the RTE-5 submissions are: i) a list of stopwords (the 572 most common English words), ii) WordNet (hyponymy and synonymy relations), iii) VerbOcean (the "stronger-than" relation), and iv) Wikipedia (similarity scores computed via Latent Semantic Analysis (LSA) over Wikipedia, as a huge knowledge resource, between all node pairs, i.e. terms or lemmas, that appear in the dataset).
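The decision criterion at the core of this family of systems can be sketched as follows (token edit distance variant; the cost values and threshold below are illustrative assumptions, whereas in EDITS they come from a configurable, trainable cost scheme):

def edit_distance(t_tokens, h_tokens, ins=1.0, dele=1.0, sub=1.5):
    # standard dynamic programming over the two token sequences
    m, n = len(t_tokens), len(h_tokens)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if t_tokens[i - 1] == h_tokens[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete from T
                          d[i][j - 1] + ins,       # insert from H
                          d[i - 1][j - 1] + cost)  # substitute
    return d[m][n]

def entails(text, hypothesis, threshold=3.0):
    # positive entailment iff the transformation cost is low enough
    return edit_distance(text.split(), hypothesis.split()) <= threshold

print(entails("John bought a shiny new car", "John bought a car"))  # True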

For the RTE-6 challenge, where the task was significantly different from previous ones, the EDITS system was again used with good results, described in (Kouylekov et al. 2011). Compared to previous years, the authors experimented with: i) the use of lexical knowledge (in the form of entailment rules mined from different resources, including Wikipedia, VerbOcean, WordNet, and Lin's Proximity and Dependency thesauri), ii) different performance optimization criteria, iii) the use of algorithms working at the level of syntactic representations of T and H, and iv) the combination of different algorithms.

2.4. Logical Inference

The use of logical inference has also been successful in solving textual entailment. One of the first such approaches is that proposed by (Bos and Markert, 2005), which employs two methods. The first uses "several shallow surface features to model the text, hypothesis and their relation to each other", as the authors expect "some dependency between surface string similarity of text and hypothesis and the existence of entailment". Similarity is computed by word overlap between the text and hypothesis, taking into account synonyms and derivations extracted from WordNet (Fellbaum, 1998).

The second approach uses deep semantic analysis, relying on state-of-the-art off-the-shelf inference tools, namely a theorem prover and a model builder. The system creates a fine-grained semantic representation for each text-hypothesis pair, and entailment is checked using these two kinds of automated reasoning tools (both of which work on inference problems stated in first order logic form). The entailment problem is addressed by having the theorem prover attempt to prove two conjectures: that T implies H, and that T and H are inconsistent. Robustness comes from the idea that if H is entailed by T, the model for T+H is not informative compared to the one for T, and hence does not introduce new entities. Also, differences in domain sizes are relevant in determining entailment, as small differences denote positive entailment, while large differences denote negative entailment (the domain of T is defined as the transitive and reflexive closure of the model of T).
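The "no new entities" intuition can be illustrated with a toy sketch (ours, not Bos and Markert's pipeline, which uses genuine first-order theorem provers and model builders): here, models are approximated by sets of ground atoms, and the growth of the domain when H is added to T serves as the entailment signal.

def domain(atoms):
    """Entities occurring as arguments in a set of ground atoms."""
    return {arg for _, args in atoms for arg in args}

def entailment_signal(t_atoms, h_atoms, max_new_entities=0):
    # small domain growth when adding H to T -> H is likely entailed
    growth = len(domain(t_atoms | h_atoms)) - len(domain(t_atoms))
    return growth <= max_new_entities

t = {("walk", ("john",)), ("fast", ("john",))}
h1 = {("walk", ("john",))}
h2 = {("drive", ("mary",))}
print(entailment_signal(t, h1))  # True: H adds no new entities
print(entailment_signal(t, h2))  # False: "mary" enlarges the domain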

In the course of the RTE challenges, there were a number of approaches that were either based on logical inference or that included some form of logical inference, such as those of (Raina et al., 2005), (Tatu et al., 2006), (Hickl and Bensley, 2007), (Tatu and Moldovan, 2007), (Clark and Harrison, 2008) and (Bergmair, 2008).

(Clark and Harrison, 2010) describes one of the most successful textual entailment engines based on logical inference. The idea behind BLUE (Boeing's Language Understanding Engine) is to convert the sentences from the text and the hypothesis into a logic-based representation, and then search to see if T implies (or contradicts) H. Compared to the previous versions of entailment systems proposed by the authors, the current system consists of two pipelined modules: the logic representation module and a new bag-of-words representation module. The basic components of the logic representation module of BLUE are a parser, a logical form (LF) generator, and a final logic generator. Parsing is performed using SAPIR, a bottom-up, broad coverage chart parser. During parsing, the system also generates a logical form (LF), a semi-formal structure between a parse and full logic. The LF is a simplified and normalized tree structure with logic-type elements, generated by rules parallel to the grammar rules, that contains variables for noun phrases and additional expressions for other sentence constituents. The LF is then used to generate ground logical assertions of the form r(x,y), by applying a set of syntactic rewrite rules recursively to it. Entailment is recognized either by checking for subsumption between (logical) predicates and their arguments, or by using inference rules deduced from WordNet (on the basis of synonymy, hypernymy, and the similarity, pertainym and derivational links) and the DIRT database. The second module, the bag-of-words module, is only used in those cases where the logic representation module is unable to give a definite answer for a given entailment pair. This module searches for words in the text such that each word in the hypothesis subsumes a text word. Subsumption is computed on the basis of WordNet and DIRT, in a similar manner to that used in the logic representation module.
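The WordNet-based subsumption test can be sketched as follows (using NLTK's WordNet interface; BLUE uses its own machinery, so this is only an approximation of the idea): a hypothesis word subsumes a text word if it occurs among the hypernyms of one of the text word's senses.

# requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def subsumes(h_word, t_word):
    h_synsets = set(wn.synsets(h_word))
    for t_syn in wn.synsets(t_word):
        # closure() walks the hypernym chain up to the root
        ancestors = set(t_syn.closure(lambda s: s.hypernyms()))
        if h_synsets & (ancestors | {t_syn}):
            return True
    return False

print(subsumes("animal", "dog"))  # True: dog IS-A animal
print(subsumes("dog", "animal"))  # False: the relation is directional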

The system scored above the median in the RTE-5 main task, but was still more than 10% below the top system; this shows that this type of approach has significant drawbacks, a fact noted by the authors, who decided to switch to a more robust approach, based on lexical semantics, for the RTE-6 competition.

2.5. Atomic Propositions

The atomic proposition based approach works on the idea of decomposing the text and the hypothesis into atoms and attempting to support atoms in the hypothesis with atoms in the text. The atoms in this approach are atomic propositions: minimal declarative statements that are either true or false, and whose truth values do not depend on any other atomic proposition.

The first application of the idea of atomic propositions is that of (Akhmatova, 2005), which uses an automated deduction system to compare atomic propositions. The proposed system is capable of recognizing basic entailments using syntactic and semantic information, and shows great promise given the addition of more knowledge, which would allow it to perform better on complex entailment pairs. (Akhmatova, 2005) provides the following example: the sentence “Coffee boosts energy and provides health benefits.” can be split into two atomic propositions, namely “Coffee boosts energy.” and “Coffee provides health benefits.” In order to extract atomic propositions, the author uses an algorithm described in (Jurafsky and Martin, 2000); also, the author believes that deep syntactic and semantic analyses are vital to solving the problem.
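The decomposition step can be approximated with a modern dependency parser. The sketch below (our illustration with spaCy, not Akhmatova's algorithm, which follows (Jurafsky and Martin, 2000)) splits coordinated verb phrases into atomic propositions:

import spacy

nlp = spacy.load("en_core_web_sm")

def atomic_propositions(sentence):
    doc = nlp(sentence)
    props = []
    for tok in doc:
        if tok.pos_ != "VERB":
            continue
        # take the verb's own subject, or inherit it from the head verb
        subj = [c for c in tok.children if c.dep_ == "nsubj"]
        if not subj and tok.dep_ == "conj":
            subj = [c for c in tok.head.children if c.dep_ == "nsubj"]
        objs = [c for c in tok.children if c.dep_ == "dobj"]
        for s in subj:
            for o in objs:
                props.append(" ".join([s.text, tok.text] +
                                      [w.text for w in o.subtree]))
    return props

print(atomic_propositions("Coffee boosts energy and provides health benefits."))
# ['Coffee boosts energy', 'Coffee provides health benefits']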

(Hickl and Bensley, 2007), on the other hand, proposes an approach based on extracting discourse commitments from the text and the hypothesis. The methods used for extracting discourse commitments include sentence segmentation, syntactic decomposition, supplemental expressions, relation extraction, and coreference resolution. Once the sets of discourse commitments are identified, entailment can be deduced if each commitment extracted from the hypothesis is supported by at least one commitment in the text.


A more recent approach based on the notion of atomic propositions is the system proposed by (Ofoghi and Yearwood, 2010), UB.dmirg. To identify entailment relationships between texts and hypotheses, the system employs a term-based approach that analyzes the text and hypothesis at the lexical level and then produces relations at the text level (atomic propositions). Each sentence in the hypothesis and the text is parsed by means of a Link Grammar parser, and propositions are extracted on the basis of the links obtained and a series of syntactic rules. Given the atomic propositions, the system reports entailment when every proposition in the hypothesis is supported by at least one proposition in the text. Propositions are compared using WordNet and FrameNet to estimate how well the text lexically covers the hypothesis, and in the case of non-grammatical sentences (where the Link Grammar parser does not return any parses), Levenshtein distance is used as a fallback.

For the 2010 version of the UB.dmirg system, described in (Ofoghi and Yearwood, 2011), the authors based their system on the idea of machine learning over lexical-semantic features, a completely different approach from that of (Ofoghi and Yearwood, 2010). For this, the system employs FrameNet (making use of a number of inter-frame relationships) and WordNet (using synonymy, antonymy and hypernymy) to extract event-based and semantic features from both hypothesis and text. The system also uses the longest common substring of lemmas when learning the entailment relationships. The features were then used to train three classifiers implemented in the WEKA library: i) a K-Nearest Neighbour classifier, ii) a Random Forest classifier, and iii) a Bayesian Network classifier. The results obtained were not promising, as they fell well below the median.

Currently, one of the most successful approaches to solving textual entailment using atomic propositions (or similar structures) remains that of (Bensley and Hickl, 2008), further described in (Hickl, 2008), which is by far the best performing in this category and one of the most successful in general, with the top score in the 2008 RTE challenge.

2.6. Machine Learning

Machine learning techniques have been used to solve textual entailment since the introduction of the challenge, in RTE-1. The range of features used is wide, and includes lexical match, lexical-semantic match, semantic matching, syntactic matching, etc.; systems draw on various resources and theories, such as the WordNet taxonomy (Miller, 1995), the VerbOcean semantic network (Chklovsky and Pantel, 2004), or Discourse Representation Theory (Kamp and Reyle, 1993) for determining negation. The learning algorithms used by the various systems were also numerous, including SVM (Joachims, 2002), C5.0 (Quinlan, 2000), binary classifiers like Bayesian Logistic Regression (BBR), etc. More recent approaches use machine learning algorithm repositories, such as Weka¹², in order to better judge the effect of the training method on system performance. Machine learning techniques were also employed for automatically extracting entailment rules from existing knowledge bases, such as Wikipedia, or for extending existing rule repositories (Szpektor and Dagan, 2008).

The number of systems using machine learning techniques grew steadily as the amount of available training data increased (most of the machine learning based systems use all of the available data from the RTE challenges). The approaches and results vary, depending on the manner in which authors combine and use attributes; also, since the data sets vary from challenge to challenge, the gain from using more than one at a time is not significant. Major machine learning based systems include those of (Inkpen et al., 2006), (Li et al., 2007), (Kozareva and Montoyo, 2006) and (Montejo-Ráez et al., 2007).

Some of the more significant approaches to solving textual entailment by means of machine learning are given below.

(Castillo, 2010a) describes a system that uses machine learning algorithms and a combination of datasets for the task of recognizing textual entailment, initially used for the RTE-4 evaluation campaign. The author attempts to solve entailment in two ways: by directly using a series of lexical-semantic features (the Levenshtein distance between each pair, a semantic distance based on WordNet and the longest common substring), or by using a NER pre-processing module to determine whether non-entailment holds between the text and the hypothesis, on the basis of the absence of named entities. The system was trained using the RTE-1, RTE-2, RTE-3 and RTE-5 datasets, taken separately and in various combinations. Testing was carried out over the RTE-4 test set, and the best results were obtained with the RTE-3 dataset for training and the Multi Layer Perceptron as the training algorithm.
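The general shape of such a feature-based system can be sketched as follows (toy features and toy training pairs; difflib's similarity ratio stands in for a true Levenshtein distance, and nothing below reproduces Castillo's actual pipeline):

from difflib import SequenceMatcher
from sklearn.neural_network import MLPClassifier

def features(text, hypothesis):
    sm = SequenceMatcher(None, text, hypothesis)
    # longest common substring, as a fraction of the hypothesis length
    lcs = sm.find_longest_match(0, len(text), 0, len(hypothesis)).size
    return [1 - sm.ratio(), lcs / max(len(hypothesis), 1)]

pairs = [("John bought a car", "John bought a car", 1),
         ("John bought a car", "Mary sold a house", 0),
         ("The cat sleeps on the mat", "The cat sleeps", 1),
         ("The cat sleeps", "Dogs bark loudly", 0)]

X = [features(t, h) for t, h, _ in pairs]
y = [label for _, _, label in pairs]
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)
print(clf.predict([features("John bought a red car", "John bought a car")]))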

¹² http://www.cs.waikato.ac.nz/~ml/weka/


The initial approach proposed in (Castillo, 2010a) is further explored in (Castillo, 2010b), which describes the system used for the RTE-5 evaluation campaign. It uses a supervised machine learning approach to train an SVM classifier over a variety of lexical and semantic metrics (most commonly, distances and similarities). The training is carried out over combinations of the RTE-3, RTE-4 and RTE-5 training sets, and the results obtained are slightly above the median for both the two- and three-way tasks.

(Castillo, 2011) further attempts to improve the previously described system by training a Support Vector Machine classifier using features derived from lexical-semantic similarity metrics at the word and sentence level, with co-reference analysis as a means of pre-processing. The various semantic similarity measures used as features in training the SVM engine are computed using a series of metrics described in the literature on the basis of the WordNet taxonomy. Training was carried out over the RTE-3 dataset and the RTE-3-4C and RTE-4-4C datasets¹³. The reported results are well below the median.

The system described in (Castillo, 2010b) was also used for solving textual entailment in Spanish, as described in (Castillo, 2010c). The features and training method were the same as those described previously, the major difference being the language taken into account. For training, the author used a Spanish dataset, the SPARTE Corpus, which contains approximately 3000 labelled entailment pairs, and translated versions of the RTE-3, RTE-4 and RTE-5 datasets. The results obtained are promising, but the syntactic simplicity of the SPARTE corpus somewhat reduces the performance of the system.

(Montejo-Raez et al., 2011) describes an approach to solving the textual entailment problem in the summarization setting (RTE-6), and is interesting because of the machine learning method employed, which consists of combining the results of two SVM learners into a Naive Bayes learner. The first SVM is trained on Personalized PageRank vectors (a PPV consists of a ranked sequence of WordNet synsets, weighted according to a random walk algorithm originally inspired by the Google PageRank algorithm). The second SVM is trained using three features based on named entity match counts between the text and hypothesis (the number of common NEs, for example). The results obtained are low, below the median, and this is mainly due to the simplicity of the system (particularly when considering named entities). However, the use of stacking for merging different features results in a relevant increase in performance, which makes this approach noteworthy.

¹³ The -4C extensions are available at http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagantest-suite/
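The stacking scheme itself is easy to illustrate with scikit-learn (toy feature vectors below; in the actual system the two SVMs are trained on PPV and named entity features respectively, whereas here both see the same data):

from sklearn.ensemble import StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.9], [0.2, 0.1], [0.8, 0.6], [0.3, 0.3]]
y = [1, 0, 1, 0, 1, 0]  # 1 = entailment, 0 = no entailment

# two SVM base learners, combined by a Naive Bayes meta-learner
stack = StackingClassifier(
    estimators=[("svm_a", SVC()), ("svm_b", SVC())],
    final_estimator=GaussianNB(),
    cv=3)  # small cv because of the tiny toy training set
stack.fit(X, y)
print(stack.predict([[0.85, 0.75]]))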

(Malakasiotis, 2010) describes a machine learning approach that uses a Maximum Entropy (ME) classifier to solve the three-way textual entailment challenge. The system uses a series of string similarity measures as features for determining entailment; because of the significant difference in length between the text and the hypothesis, most of the standard similarity measures are actually computed between the hypothesis and parts of the text. Syntactic and semantic features are also used to a small extent, in the form of the synonymy relation in WordNet and a series of measures derived from syntactic parsing (text dependency recall, text dependency precision and their F-measure). The system's result for the RTE-5 test set is very close to the median for both the two- and three-way tasks.

Another Maximum Entropy based approach is described in (Ren et al., 2010). It makes use of lexical features (Jaccard distance, bag of words, longest common substring, Levenshtein distance), syntax-based features (bag-of-dependencies, descendent relation features, combined verb descendent relations, combined subject descendent relations, combined subject-to-verb relations and object relations), lexical-semantic similarity using WordNet, and some special features extracted from Wikipedia for named entities. The results obtained are reasonably good, slightly below the median.

One of the more successful machine learning based approaches is that of (Ferrandez et al., 2010), which is based on Weka's Support Vector Machine implementation and uses a large number of features. The features vary significantly in type: various string-based similarities (Levenshtein distance, Jaro distance, cosine similarity, etc.); WordNet-based similarities weighted with IDF scores (such as the Lin similarity measure); antonymy, based on WordNet for nouns and VerbOcean for verbs; the number of named entities from H that appear in T; verb relations on the basis of WordNet (synonymy), VerbNet (membership in the same class) and VerbOcean (connection); and FrameNet frame element overlap, which shows how many frame elements from the frames detected in both T and H share similar or lexically related instantiations. The complexity of the implementation, particularly the use of many semantic features, means that the results obtained over the RTE-5 test set are good, well above the median for both the two- and three-way tasks.

Another successful machine learning based system is that of (Wang et al. 2010). It is based on the idea of textual relatedness measurement: in order for the text to entail or contradict the hypothesis, it has to be semantically related to the hypothesis. If this is not the case, no significant relation can be drawn between the text and the hypothesis, and, therefore, the result for the entailment pair is Unknown. The system breaks down the three-way classification into a two-stage binary classification and focuses on the first stage as a subtask of the main task, which is to determine whether H is related to T. The most related sentences from T and H are extracted on the basis of a joint syntactic-semantic representation, and the most related sentence pairs are divided into dependency triples (for both semantic and syntactic dependencies). Various high-precision modules assign them scores on the basis of lexical matching, performed using WordNet and VerbOcean. A voting system is then used to make the final decision, and the system also includes some fallback strategies for those cases where the main, high-precision strategies fail to give a result. The system scored good results in RTE-5, less than 5% below the top scoring system.

(Volokh et al., 2011) is an adaptation of the system described in (Wang et al., 2010) for the RTE-6 challenge. The new system has a single component (no main and fallback strategies as before), which incorporates all knowledge sources needed for determining entailment. This design allows for high robustness and for adding or removing knowledge sources in order to measure their contribution. Since the system only uses one model, the voting mechanism is no longer a factor. Broadly, the features used for predicting entailment are based on the dependency structure similarity, the word similarity and the named entity similarity between the text and the hypothesis. In order to compute the dependency structure similarity, the system transforms the parse tree of a sentence into a "triple representation". Based on the triple representation, NE recognition, and lexical similarity, a large set of features is defined, and then used for training a linear classification engine. The reported results are good, above the median for the RTE-6 main task. Considering the results obtained for both the RTE-5 and the RTE-6 tasks, we are confident in saying that the system is one of the best machine learning based approaches in the literature.

As we have shown in this section, machine learning techniques are used extensively for solving textual entailment. This, coupled with the good results obtained over RTE datasets (many machine learning systems scored above the median in RTE-5 and RTE-6) and the high potential for adaptation to different datasets and languages, makes this type of approach one of the most effective employed for solving textual entailment.

2.7. Entailment Rules

One of the most intuitive and successful approaches to solving textual entailment is the use of rules or heuristics. The rules employed vary widely from system to system, both in terms of the particular rules used and in terms of the variables of the rules.

One of the best rule based systems was the winning entry for the three-way task of the RTE-4 competition, described in (Iftene, 2008) and (Iftene, 2009); the same system came in second for the two-way task of RTE-4. The addition of supplementary semantic tools improved the basic system even further, so that it scored highest in both the two- and three-way tasks of RTE-5 (Iftene and Moruz, 2010).

This textual entailment recognition system was initially introduced by (Iftene and Balahur-Dobrescu, 2007) and further developed in (Iftene, 2008), (Iftene, 2009), (Iftene and Moruz, 2010) and (Iftene and Moruz, 2011). It works by mapping every entity from the hypothesis to a node from the text, using extensive semantic resources such as DIRT, Wikipedia, WordNet, VerbOcean, VerbNet and an acronym database. The result of the mapping process is a fitness value associated to every word in the hypothesis; the local fitness values are then used to compute a global fitness value, which is used to classify the current entailment pair as positive, contradiction or unknown entailment. The positive, contradiction and unknown clusters are defined as disjoint intervals between -1 and 1, determined from training data.
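The fitness-based decision can be sketched as follows (the aggregation formula, thresholds and fitness values below are illustrative assumptions; the actual formulas are given in (Iftene, 2009)):

def global_fitness(local_fitness_scores):
    # simple average of the per-word local fitness values
    return sum(local_fitness_scores) / len(local_fitness_scores)

def classify(score, pos_threshold=0.5, contra_threshold=-0.3):
    # three disjoint intervals of the global fitness in [-1, 1]
    if score >= pos_threshold:
        return "Entailment"
    if score <= contra_threshold:
        return "Contradiction"
    return "Unknown"

# e.g. direct match 1.0, synonym 0.8, unmapped word -1.0 (all illustrative)
print(classify(global_fitness([1.0, 0.8, 1.0, 0.9])))    # Entailment
print(classify(global_fitness([1.0, -1.0, -1.0, 0.2])))  # Unknown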

The pre-processing required for running the entailment system is extensive, and includes parsing of both the text and the hypothesis for dependency trees with MINIPAR (Lin, 1998), correction of the POS (part of speech) tagging with TreeTagger (http://www.cele.nottingham.ac.uk/~ccztk/treetagger.php), and named entity recognition using LingPipe (http://www.alias-i.com/lingpipe/) and GATE (Cunningham et al., 2001). Further preprocessing determines the types of numerals (whether they are percentages, measure values, time periods or ordinals) (Iftene, 2008).

The objective of the system described in (Iftene, 2009) is to map every node in the hypothesis to a node in the text, and, in the process, to calculate a local fitness value for the mapping. The author distinguishes between two possible cases: direct mapping, where the entities in the hypothesis are found in the text, and indirect mapping, in which case the entities from the hypothesis need to undergo certain transformations in order to be successfully mapped to entities in the text. According to the quality of the mapping, a local fitness score is awarded to each entity, and the local scores are then combined and normalized in order to obtain a global score, which determines the entailment decision. Indirect mapping is carried out using a series of rules, specifically written for each of the possible solutions.

In the case of positive rules, any sort of mapping will increase the global score by a given amount, as the basic idea is to find a way of transforming the hypothesis into a form which is, more or less, similar to the text. For example, the DIRT resource considers the verbs "claim" and "declare" similar, and, therefore, the sentence "Obama claims victory" is transformed into "Obama declares victory". In the case of named entities, the acronym "EU", for example, will be mapped to "European Union", the word "Ireland" will be correlated with "Irish", etc. A number of rules are also used for transforming natural language numerical values into intervals or numbers, such as "at least 80%", which entails "at least 70%", and "killing 109 passengers on board and 4 ground crew", which is equivalent to "killing 113 people" (Iftene, 2008), (Iftene, 2009).

Negation rules first penalize nodes in the hypothesis that could not be mapped to the text. Next, the verbs that could be matched are checked for negation, whether antonymy or negated synonymy; contradictions between numerical values are also treated as antonymy. In addition, words such as "different", "distinct", "separate", etc., are considered cues for non-entailment.

Unknown cases are cued by the use of words such as "may", "maybe", "possible" and "rumour", and the system introduces minor penalties whenever it encounters them. The appearance of a new named entity in the hypothesis (by new the author understands a named entity that does not appear in the text) sets the final result for the pair to unknown. In the case of numeric values, the unit of measure is also taken into account, as differences in units most often lead to unknown cases.

In 2009, the system described above was further improved, mainly with the addition of more detailed semantic knowledge and more refined pre-processing, as described in (Iftene and Moruz, 2010). The system performed very well, with an accuracy of 73.5% for the two-way task (5% higher than the second best scoring system) and an accuracy of 68.33% on the three-way task (again 5% better than the second best scoring system). Compared to previous versions, the largest increase in performance came from better use of named entities, as shown by ablation tests. In order to compensate for possible misspellings of named entities, the system also employs the Google API. The system also performed well when applied to the pilot task, coming in fourth out of 8 participants, with an f-measure that was very close to the median (31.61%). The lower performance for the pilot task is mainly due to the fact that, because of time constraints, only a lightweight version of the system could be applied to the test data.

The UAIC participation at RTE-6, described in (Iftene and Moruz, 2011), is an adaptation of the RTE-5 system to the newly introduced main task in a summarization setting. The main improvement to the system was the addition of VerbNet as a resource for verb matching. Smaller adaptations were also carried out, mostly addressing the differences brought about by the change in the type of data sets. Another significant change was the use of three sets of thresholds for deciding entailment, as opposed to just one in previous versions, which allowed for varying the system's precision and recall; this variance was needed for adapting to the novelty detection subtask. Since the system was originally developed for the three-way entailment task, it could easily be adapted to the two-way entailment task. The results obtained for the main task fell slightly below the median, with the best result coming from the configuration of entailment thresholds designed to balance recall and precision (22.89% precision, 27.20% recall and 24.85% f-measure). The below-median results are due to the fact that the summarization based entailment task has significantly different data sets compared to previous RTE challenges, to which the system was not sufficiently adapted; also, the system did not make use of any discourse derived information, such as coreference resolution, which further lowered the results. For the novelty detection subtask, however, the results were very good, with one of the runs (the run favouring recall over precision) scoring fourth overall, 4% below the top system. The results obtained for the novelty detection subtask prove that the system works well, the lower results on the main task being mainly due to a certain incompatibility between the system's initial setting and the problem at hand, a fact also shown by the ablation tests.

A more recent heuristics based approach to solving textual entailment is Sangyan, described in (Shivhare et al., 2011). The authors try to solve entailment using a two-step strategy: they attempt a syntactic match between the text and the hypothesis, and, in the cases where this does not confirm entailment, they attempt to detect the semantic relatedness between the two. To determine syntactic similarity between text-hypothesis pairs, Sangyan uses two sub-modules: a dependency tree match module and a heuristics module. The dependency tree match module attempts to match subtrees from the text and the hypothesis, with the aid of lexical-semantic tools such as coreference resolution, named entity recognition and lemmatization; the system also makes use of resources such as WordNet and VerbOcean for matching the different entities. Since syntactic matching techniques cannot map semantically similar but syntactically different constructs, the heuristics module attempts to transform different syntactic relations to similar representations, such as relations induced by copulative verbs and appositions. In order to handle syntactic variability, instead of a typical rule based approach, Sangyan attempts to perform semantic matching by using FrameNet frames and the Shalmaneser semantic parser for 'semantic clustering' of words. The semantic match between text and hypothesis is defined in terms of frame overlap, weighted by the lexical match coefficient. This approach is hindered, however, by two important drawbacks: the limited number of heuristics used to perform syntactic matching, and the limited coverage of FrameNet, which contains approximately 6500 annotated predicates, insufficient for training real world applications. These drawbacks may be the reason for the middling performance of this system in the RTE-6 main challenge, where it scored an accuracy of 29.64%.

The Bar-Ilan University Textual Entailment Engine, BiuTee (Mirkin et al., 2010), (Stern et al., 2011), is a transformation-based entailment system making use of various types of entailment knowledge. The knowledge is represented uniformly in the system in the form of entailment rules (generally represented as LHS => RHS), which permits the system to apply the same kinds of transformations to the text regardless of the source of the knowledge. By applying the transformation rules to the text, the system generates a set of consequents, which are stored in the form of parse trees in a packed representation, named Compact Forest (Bar-Haim et al., 2009). The basic idea employed by the system is to gradually transform the text, in order to make it more similar to the hypothesis, while ensuring that entailment holds between the original text and its consequents. As a final step, the entailment decision is taken depending on the result of an approximate syntactic match between the final consequent and the hypothesis, in order to compensate for possible knowledge gaps in the rules.

The version of the BiuTee system used for the RTE-6 main task underwent major improvements: the approximate matching component was replaced by a syntax-based matching algorithm, and the lexical entailment rules were chained via a novel entailment graph algorithm. The compact forest was obtained by applying three types of rules: a set of manually created generic syntactic rules, a set of rules obtained as described in (Aharon et al., 2010), and an extension of these rules on the basis of the entailment graph algorithm. The syntax based matching algorithm rests on the assumption that the absence of some syntactic dependencies means that the hypothesis cannot be entailed by the text, while the absence of others is not relevant. Therefore, the algorithm consists of matching verbs from the text and the hypothesis and then matching their subject and object dependants in a bag-of-words manner; if the number of differing dependencies is below a certain threshold, or if the bags of words are covered to more than a given threshold, the algorithm returns entailment. The system performed quite well on the RTE-6 test data, with an f-measure of 37.50%, well above the median, making BiuTee one of the best systems developed for solving the task of textual entailment in a summarization setting.
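The bag-of-words coverage check at the heart of this matching step can be sketched as follows (the threshold and the representation of predications are our assumptions, not BiuTee's actual data structures):

def bag_coverage(h_bag, t_bag):
    # fraction of the hypothesis bag covered by the text bag
    return len(set(h_bag) & set(t_bag)) / max(len(set(h_bag)), 1)

def syntax_match(t_preds, h_preds, coverage_threshold=0.75):
    """Each predication is (verb, subject_words, object_words)."""
    for h_verb, h_subj, h_obj in h_preds:
        supported = any(
            h_verb == t_verb
            and bag_coverage(h_subj, t_subj) >= coverage_threshold
            and bag_coverage(h_obj, t_obj) >= coverage_threshold
            for t_verb, t_subj, t_obj in t_preds)
        if not supported:
            return False  # a hypothesis predication is unsupported
    return True

t = [("buy", ["john"], ["a", "new", "car"])]
h = [("buy", ["john"], ["a", "car"])]
print(syntax_match(t, h))  # True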

The compact forest described in (Bar-Haim et al., 2009) is an attempt at decreasing the number of derivable sentences obtained by applying transformation rules to a text. A compact forest is a novel data structure proposed by the authors, which represents a large set of trees in a compact manner, for reasons of computational complexity. When using this structure, each rule application generates only those nodes on the right-hand side of the rule, while the rest of the consequent tree is shared with the source; in addition to the reduction in required space, this type of representation also reduces the number of redundant rule applications. The representation of a compact forest is based on the notion of disjunction edges, an extension of dependency edges that specifies a set of alternative edges of multiple trees. The authors also describe an algorithm for determining inference over compact forests. Testing of the method showed that it significantly improves running times, reduces the number of rule applications and reduces the number of generated nodes, proving the value of the approach.

(Krestel et al., 2010) proposes the idea of an artificial believer, which is designed to acquire knowledge by processing texts and to maintain a consistent belief base by detecting inconsistencies and rejecting information according to a belief revision strategy. In order to construct and maintain the belief base, the system employs a Fuzzy Believer intended to detect conflicting information; the task solved by the Fuzzy Believer is isomorphic to recognizing textual entailment. Entailment is computed over predicate-argument structures (PAS), typically triples of the form (subject, predicate, object), extracted from the text and the hypothesis. The system uses heuristics to compare these predicate-argument structures, and if the similarity score of the merged structures is above a threshold, the result is entailment. The results obtained using this method are mid-range, very close to the median of the RTE-5 main task, but the value of the system lies in the manner in which it creates the belief base for detecting entailment, which is very well suited to the task of solving textual entailment in a summarization setting, such as the RTE-5 pilot task.

Many rule based systems make use of automatically extracted rules, and as such, the problem of automatically extracting rules becomes relevant for the field of textual entailment. Even though most authors focus on unsupervised acquisition of binary entailment rules (rules that apply to templates with two variables) and ignore unary rules (rules that apply to templates with a single variable), there is a necessity for such rules. (Szpektor and Dagan, 2008) investigate two approaches for unsupervised learning of unary entailment rules and subsequently compare them with a binary rule learning method. The main approach consists of learning rules by measuring the similarity between the variable instantiations of two templates in a corpus. Learning is performed by adapting state-of-the-art similarity measures to unary rule learning, but the authors also propose a new measure, named Balanced-Inclusion, which takes the notion of symmetric semantic similarity and adapts it to the directionality needed in entailment: Balanced-Inclusion identifies entailing templates on the basis of a directional measure and penalizes infrequent templates using a symmetric measure. The second approach consists of deriving unary rules from binary rules learned by state-of-the-art binary rule learning methods. After comparing the two, the authors have determined that unary rules perform better, and that learning unary rules directly from a corpus yields better results than deriving them from binary rules.

(Aharon et al., 2010) explores how FrameNet (Baker et al., 1998) can be effectively used for generating entailment rules between predicates. In the general case, in order to generate entailment rules, one needs to address two issues: a) identifying the lexical entailment relations between predicates, e.g. 'cure → recover'; b) mapping argument positions, e.g. 'cure X → X recover'. The main approach for generating highly accurate rule sets is to use manually constructed resources, WordNet being the most popular one. WordNet, however, is limited for the task of extracting entailment rules, mainly because many relations, such as "cause", are not fully mapped (e.g. 'elect → vote'), and many of the relations that are actually present do not preserve argument position (such as the relation 'kill → die'). FrameNet is more suited to the task of automatically extracting entailment rules because it has more relations suited to solving textual entailment, and every frame has explicitly stated arguments. The authors describe an algorithm for extracting entailment rules from FrameNet, which works in three steps: a) extracting templates for each lexical unit (LU); b) detecting lexical entailment relations between pairs of LUs; c) generating entailment rules by mapping the arguments between the two LUs in each entailing pair.

For each sentence of any given LU, the algorithm extracts unary templates, by decomposing templates with multiple arguments into templates with only one argument. Templates are obtained by first replacing the frame element phrases with the frame element names, and then generating a rule by extracting a path in the parse tree between the LU and the frame element. Relations between LUs are extracted according to a series of heuristics (morphological derivations are marked as paraphrases, dominant LUs are entailed by non-dominant LUs in the same frame, relations between frames, etc.). Lastly, the algorithm detects common frame elements for each identified relation between LUs and generates all possible rules from that relation. The authors manually checked 250 entailment rules obtained by the algorithm, and judged approximately 69% of them as correct, a significant improvement over previous rule extraction systems.

(Berant et al., 2010) discusses a wider view of the issue of learning entailment rules. Most previous work focuses on learning entailment rules in isolation, but the authors observe that entailment rules interact with each other (for example, some entailment rules are in a transitive relation), and they take advantage of this fact to improve entailment rule learning. The authors define a new type of data structure, which they call an Entailment Graph, which has propositional templates for nodes and entailment relations for edges. A propositional template is defined as a path in a dependency tree from one argument to another, so every template contains a predicate; each sense of a polysemous predicate must correspond to a separate template. Since the entailment relation is transitive, the entailment graph itself is transitive, and entailment paths can be created between two nodes. The authors then present an algorithm for learning the edges of an entailment graph given its set of nodes. It begins by preprocessing the nodes, using a large corpus and WordNet to train an entailment classifier that estimates the likelihood that one propositional template entails another. Given the graph nodes and the entailment likelihoods estimated during preprocessing, a global optimization approach is employed for determining the set of edges that maximizes the score of the entire graph, given the edge scores supplied by the entailment classifier and the graph constraints (transitivity and others). The optimization problem is formulated as an Integer Linear Programming problem. The results clearly demonstrate that a global approach improves performance on the entailment graph learning task, and show the overall advantage of employing an ILP solver rather than a greedy algorithm.
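Schematically, and with notation assumed rather than copied from the paper, the global step amounts to an Integer Linear Program of the following form, where w_uv is the classifier score for template u entailing template v and the binary variable x_uv marks the presence of that edge:

\max_x \sum_{u \neq v} w_{uv} x_{uv} \quad \text{s.t.} \quad x_{uv} + x_{vw} - x_{uw} \leq 1 \;\; \forall u, v, w, \qquad x_{uv} \in \{0, 1\}

The transitivity constraints guarantee that whenever an edge from u to v and an edge from v to w are selected, the edge from u to w must be selected as well.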

(Dinu and Wang, 2009), on the other hand, explores ways of improving existing inference rule collections. The approach is based on two observations: some rules required for solving entailment are not present even in the largest of collections, such as DIRT, and some systematic errors in rule collections can be excluded. The problem of missing rules is addressed by combining DIRT with a hand-annotated lexical resource in order to expand the existing rule set: the authors extended the DIRT rule set by expanding existing rules with synonyms from WordNet (all the lexical items involved in rules are replaced with each of their synonyms, in any combination allowed by the synsets). The procedure is successful in adding missing rules, but, because no word sense disambiguation is performed prior to the expansion, many incorrect or nonsensical rules are also produced. This, however, is not relevant for the current approach, as the purpose of the authors is to evaluate the extended rule set in an application setting and not as an independent collection of entailment rules. The authors also remove some of the most systematic errors in the DIRT rule database. These errors are due to the nature of the Distributional Hypothesis algorithm: DH algorithms extract not only phrases that have similar meanings, but also phrases that have opposite meanings. To compensate for this problem, rules that contain WordNet antonyms are eliminated from the rule set. The coverage of the new, extended rule set is evaluated and found to be low, but the improvement in precision for the pairs to which the new rules apply is significant. Even though, on the whole, the improvement brought about by the extension of the DIRT rule set is not great, mostly because of the low coverage of the rules, this paper does prove the advantage of improving automatically extracted rules by using manually annotated resources.
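Both refinements are straightforward to sketch with NLTK's WordNet interface (rule templates are simplified here to single predicate words, which glosses over DIRT's two-slot templates):

from nltk.corpus import wordnet as wn

def synonyms(word):
    return {l.name() for s in wn.synsets(word) for l in s.lemmas()}

def antonyms(word):
    return {a.name() for s in wn.synsets(word)
            for l in s.lemmas() for a in l.antonyms()}

def expand(rules):
    # every synonym combination of both sides, without disambiguation,
    # which is exactly what produces the noisy rules discussed above
    return {(l, r) for lhs, rhs in rules
            for l in synonyms(lhs) for r in synonyms(rhs)}

def filter_antonymous(rules):
    # drop rules whose two sides are WordNet antonyms
    return {(lhs, rhs) for lhs, rhs in rules if rhs not in antonyms(lhs)}

rules = {("buy", "acquire")}
print(sorted(filter_antonymous(expand(rules)))[:5])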

2.8. Cross Lingual Textual Entailment

Even though the appeal of textual entailment has increased steadily since its definition in (Dagan and Glickman, 2004), as also proven by the six editions of the Recognizing Textual Entailment Challenge, the main focus of research in the field has been on TE for the English language, and extensions of the task, such as cross-lingual textual entailment, have been largely ignored. Despite this, the interest in cross-lingual applications for NLP does exist, and has grown steadily, a fact proven by the successful evaluation campaigns of the Cross Lingual Evaluation Forum (CLEF).

(Mehdad et al., 2010d) proposes the idea of cross-lingual textual entailment (CLTE) as a means of performing semantic inference across different languages. The authors adapt the definition of textual entailment to the cross-lingual setting as follows: cross-lingual textual entailment is a relation between two portions of natural language text in different languages, called the text and the hypothesis and denoted T and H, that holds if a human would infer that H is most likely true solely on the basis of the information in T, or, in other words, if the meaning of H can be deduced from the meaning of T. In order to solve the problem of CLTE, the authors propose two methods: either reducing the cross-lingual entailment decision to the monolingual case by translating the text, the hypothesis or both, or embedding cross-lingual processing techniques in the textual entailment engine, thus circumventing the need for translation.

The simplest method for solving cross-lingual textual entailment is to attach a machine translation system as a front end to a monolingual textual entailment engine. Considering the case of an English text and a French hypothesis, the purpose of the MT component is to translate the hypothesis into English, and then run a TE system on T and the translation of H. The advantage of this approach is that it is modular, which allows for easier development, debugging and maintenance of the system. Also, a modular system can be adapted to any language pair by attaching the appropriate translation systems; the approach can be further refined by using English as a pivot language, thus allowing for the use of the same TE system for solving CLTE on any language pair, provided a translation module is available for each language. The main drawback of this approach is the performance of state-of-the-art machine translation systems, which may introduce and propagate errors. To compensate for this, the authors propose passing the top n-best translations through the entailment system, and then choosing the appropriate result based on some combination of criteria.
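The pivoting architecture reduces to a small amount of glue code. In the sketch below, the translation and entailment components are placeholders (any MT system and monolingual TE engine could fill these roles), and the majority vote over the n-best list is one possible combination criterion:

def cross_lingual_entailment(text_en, hypothesis_fr,
                             translate, te_engine, n_best=5):
    # translate the hypothesis into the pivot language (English), then
    # run the monolingual engine on each candidate translation
    candidates = translate(hypothesis_fr, n_best)
    votes = [te_engine(text_en, h_en) for h_en in candidates]
    return max(set(votes), key=votes.count)  # majority vote

# toy stand-ins so the sketch runs end to end
toy_translate = lambda h, n: ["John has a laptop"] * n
toy_te = lambda t, h: all(w in t.lower().split() for w in h.lower().split())
print(cross_lingual_entailment("john has a laptop",
                               "John a un ordinateur portable",
                               toy_translate, toy_te))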

The more advanced method for solving CLTE is to integrate MT and TE techniques, thus creating a system capable of solving the cross-lingual textual entailment problem without the need for translation, and with lower complexity. A possible approach is to extract information from the phrase tables used by phrase-based machine translation systems and thereby enrich entailment rules (for example, with the relation between the French phrase "ordinateur portable" and the English phrase "laptop").

A study of the feasibility of the basic method of CLTE was carried out. The corpus used was the RTE-3 development set, with English texts and French hypotheses (the French translations were obtained manually). The hypotheses were automatically translated into English using the Google translator and another state-of-the-art machine translation system, Moses. The entailment system used was a version of the EDITS system (Kouylekov and Magnini, 2006), trained on the RTE-3 test set. The results show that using Google to translate the French hypotheses into English yields the same result as that obtained by running the system on the original RTE-3 development set.

2.9. Discourse Knowledge

With the introduction of the RTE-5 pilot task, which then became the main task of RTE-6 and will be the main task of RTE-7, the necessity of detecting discourse references, such as coreference and bridging anaphora, has become more apparent. This section describes some of the approaches to solving textual entailment by using such discourse knowledge.


(Litkowski, 2010) describes a system that relies solely on routines examining the overlap of discourse entities between the text and the hypothesis. Discourse entities (represented at the text level by noun phrases, in most cases) are extracted by means of the CL Research Linguistic Task Analyzer and are accompanied by a set of attributes such as WordNet sense number, anaphoric references to discourse entities earlier in the text, number, and type. Broadly, the algorithm attempts to match discourse entities from the text to the hypothesis on the basis of the attributes attached to them. If at least half of the discourse entities in the hypothesis are matched to discourse entities in the text, a second routine is invoked that checks for verb and subject consistency. On the basis of this match, an entailment decision is made. The system described in this paper was tested on the RTE-5 data sets; the results obtained were middle range, very close to the median. The results prove the value of the approach, but the system is hindered by the fact that much of the information generated at the preprocessing stage is not used in the entailment decision.
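The core of this matching step admits a compact sketch; reducing each discourse entity to a flat attribute dictionary, and the exact attribute names, are our simplifications of the system described above.

    # A simplified sketch of discourse-entity overlap matching; entities are
    # flat attribute dictionaries, a simplification of the described system.
    def entities_match(e_text, e_hyp):
        # two discourse entities match if their recorded attributes agree
        return all(e_text.get(k) == e_hyp.get(k)
                   for k in ("wn_sense", "number", "type"))

    def enough_overlap(text_entities, hyp_entities):
        matched = sum(1 for eh in hyp_entities
                      if any(entities_match(et, eh) for et in text_entities))
        # only if at least half of H's entities are covered does the second
        # routine (verb and subject consistency) get invoked
        return matched >= len(hyp_entities) / 2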

(Mirkin et al., 2010a) analyzes the roles of discourse references for textual entailment inference, as a means of highlighting promising directions for the incorporation of discourse phenomena into inference. An important consideration is that the authors only wish to examine the effect of discourse references on the process of entailment and, as such, manually annotate only those references that are relevant to solving TE on the RTE-5 search data set. These choices are supported by the fact that they solve a number of problems faced by previous researchers that attempted to integrate discourse references in the entailment process: discourse phenomena are well represented in the RTE-5 search sets, the references are manually annotated and cover all syntactic categories, rather than just nominal coreference. Since only the discourse references that are relevant to the entailment process are annotated, the authors define three elements that make up the reference: the target component in H (the subtree in H that cannot be supported by local information in T), the focus term in T (the expression in T that does not cover the target component by itself, but participates in a reference relation that can cover it), and the reference term which stands in a reference relation to the focus term.

In the authors' opinion, the integration of discourse references in previous approaches was overly simple, and thus contributed to the low improvement brought by the component. To address this issue, and on the basis of the manually annotated corpus described above, they propose three distinct cases for integration, which correspond to different relations between the target component, the focus term and the reference term: substitution (where the focus term is replaced with the reference term), merge (in cases where a match for the entire target component cannot be found in a single location in the text, but is scattered among multiple locations) and insertion (in the case of bridging references).

The analysis carried out by the authors shows that in 44% of the entailment pairs examined, the correct result could not be extracted without the use of discourse references. They have also shown that, although the large majority of reference cases are nominal, there are also a significant number of verbal references. Finally, this study shows that textual entailment can profit substantially from better discourse handling; it also demonstrates that improvements are required both in discourse reference resolution systems, so that they can cover a larger number of reference types, and in the integration of the discourse information supplied by the reference systems, for both the new types of references and those relations already covered by existing resolvers.

(Mirkin et al., 2010b) also discusses the use of discourse information for solving textual entailment. The authors start off with a state-of-the-art textual entailment engine, BiuTee (Bar-Haim et al., 2009), which they enhanced by addressing prominent discourse phenomena significant for entailment, discussed below.

To compensate for coreference relations that are not identified by off-the-shelf resolvers, the authors define the notion of augmented coreference set. This is based on the observation that a large number of coreference relations share some of their lexical elements, and most of these are missed by resolvers. To identify these additional coreferences, the authors consider that two noun phrases in the same document are coreferring if their heads are identical and no semantic incompatibility exists between their modifiers (no mismatching numbers, no antonymy and no co-hyponymy). The system also makes use of global information, in the sense that some key pieces of information, such as the document title or information given in the first few sentences, are considered known for the whole document. Key words are extracted by means of tf-idf scores, and then added to each candidate sentence to compensate for the phenomenon of bridging. Information from the sentence preceding the candidate sentence is also extracted, as another way to address the problem of missing coreference and bridging. Based on the observation that entailing sentences are usually clustered, the authors also employ a meta-classifier that uses the output of another classifier as input in order to take the final entailment decision. The meta-classifier makes use of some other meta-features, such as title entailment (whether the title or the first sentence of the document entails the hypothesis), second-closest entailment, smoothed entailment (which smoothes the classification of a sentence with the scores of neighbouring sentences) and first sentence entailment (whether the first sentence of the document entails the title). Tests carried out with the BiuTee system enhanced as described above prove that the use of discourse information does indeed improve the performance of textual entailment in a summarization setting.
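The augmented-coreference heuristic in particular lends itself to a short sketch; the incompatibility test below is reduced to a mismatching-number check, while the antonymy and co-hyponymy tests mentioned above would require a WordNet lookup.

    # A sketch of the augmented coreference heuristic: two noun phrases corefer
    # if their heads are identical and their modifiers show no semantic
    # incompatibility (here only mismatching numbers are checked).
    def augmented_corefer(np1, np2):
        if np1["head"].lower() != np2["head"].lower():
            return False
        nums1 = {m for m in np1["modifiers"] if m.isdigit()}
        nums2 = {m for m in np2["modifiers"] if m.isdigit()}
        # mismatching numeric modifiers signal semantic incompatibility
        return not (nums1 and nums2 and nums1 != nums2)

    print(augmented_corefer(
        {"head": "officers", "modifiers": ["nine", "former", "police"]},
        {"head": "officers", "modifiers": ["nine", "police"]}))   # True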

2.10. Uses of Textual Entailment

When it was first defined in (Dagan and Glickman, 2004), textual entailment was described as a means of reducing natural language variability (semantically similar texts that are significantly different at the lexical and syntactic levels). Since the problem of language variability greatly reduces the performance of many NLP applications, it stands to reason that the use of a state-of-the-art textual entailment engine brings a significant improvement. This section provides a brief overview of some of the most recent uses of textual entailment in the field of Natural Language Processing.

A common problem faced by machine translation systems is the need to translate terms that have no correspondence in their lexicon or paraphrase database. (Mirkin et al., 2009) proposes a novel approach that uses paraphrase rules together with textual entailment rules to fill in these knowledge gaps and thus improve machine translation. Previous approaches to translating unknown terms (unknown in the sense that they have no translation equivalent) typically extend the phrase table of the statistical machine translation system with paraphrases extracted from multilingual corpora. The idea is to use monolingual paraphrasing methods and resources in order to obtain a larger set of rules for paraphrasing the source instead of the translation. The rules are then applied to the source sentence before the translation step, thus eliminating the problem of lack of knowledge at the moment of translation. The rule set can be expanded even further by means of directional entailment rules, applied in those cases where paraphrases cannot be used, that generate more general versions of the source text. This approach, based on the framework of textual entailment, considers the newly generated texts (produced by means of asymmetric rules) as being entailed by the original one. An algorithm for the transformation of the source sentence into a translatable version is proposed; roughly, it first attempts to find paraphrases for unknown terms and, in case that fails, it attempts to apply asymmetric entailment rules. The score for each rule application is computed, in order to rank the set of candidates obtained through these transformations, and the algorithm can then return either the translation of the best scoring candidate or the top n candidates, which are translated and then ranked using target language information. The approach was tested by training a SMT system on a corpus consisting of approximately one million English-French sentence pairs from the Europarl corpus, which was then used to translate a set of almost 6000 sentences from English to French. The results were manually evaluated by native speakers, and it was found that, even though the application of entailment rules slightly reduced precision, coverage increased dramatically (precision is the proportion of correctly translated terms out of all translated terms, and coverage is the total number of translations divided by the number of terms in the source language).

(Pado et al., 2009) discusses the use of textual entailment as a means of automatically evaluating statistical machine translation. The fundamental problem in the automatic evaluation of SMT is that it is difficult to account for linguistic variability between the human translation and the machine translation. State-of-the-art measures attempt to compensate for this fact by computing n-gram overlap (BLEU, NIST), by making the matching process more intelligent (TER), or by using linguistic evidence such as lexical similarity (METEOR); these metrics, however, usually concentrate on one type of linguistic phenomenon, which does not correlate well with human annotation. The authors explore two main ideas: a combination of all the classic metrics into a more robust measure, and the use of semantic equivalence by means of symmetric textual entailment. The textual entailment engine used was the Stanford Entailment Recognizer. Experiments were carried out over a well-known corpus, and proved that evaluating statistical machine translation using textual entailment yields better results than using current state-of-the-art metrics; however, the combination of classic metrics and textual entailment performs best, which suggests that each approach addresses somewhat different linguistic phenomena.

(Agerri, 2008) proposes to address the problem of understanding the figurative use of language, particularly metaphors, by adapting the Recognizing Textual Entailment (RTE) framework for metaphor interpretation. By analyzing results reported over the RTE-1 and RTE-2 datasets, the author draws two significant conclusions: the ability to process metaphors may improve the performance of textual inference systems, and the RTE setting could provide a general semantic framework for evaluating and testing theories that attempt to explain the usage of metaphor in text.

(Sacaleanu, 2008) describes a Question Answering system which retrieves answers from structured data regarding cinemas and movies using a multilingual ontology and textual entailment. The QA problem can be treated as an entailment problem by assuming that the question itself (in its affirmative version) is the text and the hypothesis is a relational answer pattern, associated with instructions for retrieving the answer to the input question. Therefore, the problem of question answering is reduced to determining entailment between the question and a set of relational answer patterns. The idea is applicable because of the relatively low coverage of the system, which permits only a small number of question types, and, therefore, a small number of search patterns that need to be tested for entailment. Since the database is in the form of a multilingual ontology, the only language-dependent component is the entailment engine, which means that the system can be easily adapted to new languages. The results obtained were promising, but vary widely from language to language, mostly because of the quality of the entailment engine used.

(Wang and Callison-Burch, 2010) describes an experiment aimed at creating a set of more natural hypotheses for a set of texts by using Amazon's Mechanical Turk to generate facts and counter-facts about certain named entities. This experiment is prompted by questions regarding the relevance of artificially created hypotheses, particularly in the case of negative examples. Human annotators receive a paragraph of text and a highlighted named entity; their task is to compose facts and counter-facts about the named entity in that context. The analysis of the results is performed by comparing the acquired data against the RTE-5 dataset, and the findings show that, on average, facts have two more words than the hypotheses provided by the organizers, and counter-facts are significantly longer than negative hypotheses (by 5 words on average).

(Sammons et al., 2010) performs a very insightful analysis of the results reported by the systems participating in the RTE-5 evaluation campaign, and suggests a series of potential improvement points on the basis of common errors observed. The analysis consists of a series of questions whose answers provide potential directions for improvement. The intuitive first question is “What does a system need to do to improve its performance?”; according to the reported results, among the most common causes of error for top systems were failure to perform numeric reasoning and failure to match named entities or numerical quantities. It is also shown that there are a number of “hard examples” (pairs on which at least 4 of the 5 top systems chose the incorrect solution), which are strongly correlated with deeper lexical relations between words and the need for external knowledge sources. The authors also show that accurate identification of the linguistic phenomenon involved in the relation between text and hypothesis significantly increases the accuracy of textual entailment systems.


3. Solving Textual Entailment by Semantic Means

The notion of textual entailment was first described by (Dagan and Glickman, 2004) as an attempt to generalize the issue of natural language variability across NLP tasks, such as question answering, automatic summarization or information extraction. According to (Dagan and Glickman, 2004), “textual entailment (entailment, in short) is defined as a relationship between a coherent text, T, and a language expression, which is considered as a hypothesis, H. We say that T entails H (H is a consequent of T), denoted by T⇒H, if the meaning of H, as interpreted in the context of T, can be inferred from the meaning of T.” The entailment relation defined above is directional since, while the meaning of one text may imply the meaning of another, the opposite does not always hold true.

The definition given above, while complete and correct, is too abstract to be directly used in practical Natural Language Processing applications. The Recognising Textual Entailment Challenge (RTE) is the main series of evaluation campaigns for textual entailment engines; it was initially put forth in 2005 by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning – http://www.pascal-network.org/) – the European Commission's IST-funded Network of Excellence. Consequently, there have been a number of adaptations of the initial definition of textual entailment, in order to make it more practical for the purpose of real-life implementations.

In the course of analyzing various RTE datasets, we have come up with the intuition that entailment pairs can be solved, in the majority of cases, by examining two types of information, which lead to a semantic understanding of the text and the hypothesis:

 The relation of the verbs in the hypothesis to the ones in the text. In this context, verbs are taken to mean the verb itself, together with its arguments and adjuncts; thus, the comparison of two verbs is a comparison of complex structures, which are, in essence, atomic propositions (clauses). If the verbs, together with their arguments and adjuncts, match over T and H, we have ENTAILMENT (by matching we understand that ∀ verb q ∈ H, ∃ verb p ∈ T so that p → q; we say that a predicate p entails a predicate q, p → q, if q is a consequence of p, or p and q are synonyms, or q is a subevent of p).


 Each argument or adjunct is an entity, with a set of defined properties. It may happen that, despite agreement at the level of the verb, there are differences in terms of entity properties for similar arguments or adjuncts. In order to solve such cases correctly, each argument and adjunct is considered an entity with an attached set of attributes, and only if these match do we have ENTAILMENT. By matching at the argument level we understand that, given the feature sets for the arguments argT and argH, the unification of these feature structures (as defined in unification grammars) is successful and is equal to the feature structure of argT.

The intuition given above is based on the idea that if a text T entails a hypothesis H, it can be stated that H can be deduced from T, T ⊨ H. Since natural language sentences are generated by predicates, the previous relation can be decomposed into:

∀ Q(arg1, arg2, … argn) predicate in H, ∃ P(arg'1, arg'2, … arg'm) predicate in T so that P(arg'1, arg'2, … arg'm) ⊨ Q(arg1, arg2, … argn).

This formula can be further decomposed into P ⇒ Q and ∀ argi ∈ {arg1, … argn}, ∃ arg'j ∈ {arg'1, … arg'm} so that arg'j ⇒ argi. The first part of the decomposition, namely the one referring to predicates, is covered by the first part of the definition. The second part of the decomposition deals with predicate arguments, which are, in general, entities with property sets; therefore, the implication relation holds if and only if argi ⊆ arg'j, which is equivalent to a successful unification of the structures describing the respective arguments, the result of which is equal to the structure of the argument in the text. The reason we have chosen to use unification rather than the inclusion relation or the implication relation is that it is more intuitive and fits the available knowledge base structures (such as the WordNet taxonomy).
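A minimal sketch of such a unification check is given below; representing feature structures as flat dictionaries, and treating any conflicting value as failure, are simplifying assumptions rather than the implementation used in our system.

    # A minimal sketch of feature-structure unification over argument
    # attributes; flat dictionaries stand in for full feature structures.
    def unify(f_t, f_h):
        """Return the unified structure, or None if unification fails."""
        result = dict(f_t)
        for key, value in f_h.items():
            if key in result and result[key] != value:
                return None          # conflicting values: unification fails
            result[key] = value
        return result

    arg_t = {"head": "mouse", "size": "large", "number": "singular"}
    arg_h = {"head": "mouse", "size": "small"}
    # None here: unification fails on "size", so the entailment condition
    # (a result equal to the text argument's structure) cannot hold
    print(unify(arg_t, arg_h))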

This section focuses on the relevance of verbs for textual entailment, as described in the following sections. In this respect, we try to capitalize on our past experience with the verbal group and the predicate; the results of this research are thoroughly described in (Curteanu et al. 2006a, 2006b, 2006c, 2007). The relation between argument property sets and entailment is also discussed. We also provide a general algorithm for solving textual entailment on the basis of the intuition given above, as well as a detailed algorithm for automatically matching VerbNet descriptions of verbs.


The idea of using verbs to solve textual entailment is not new, and has been used by (Burchardt and Frank, 2006), (Hickl and Bensley, 2007) and (Hickl, 2008), to name just a few. (Burchardt and Frank, 2006) approximate entailment in terms of structural and semantic overlap of text and hypothesis, combining wide-coverage LFG parsing with frame semantics to project a lexical semantic representation with semantic roles, and then computing an overlap between T and H. (Hickl, 2008) extracts a set of commonly held beliefs, called discourse commitments, and once these are extracted, the task of solving entailment is reduced to determining whether the discourse commitments in H are supported by those in T.

The key difference between previous approaches and the solution described in this chapter is the fact that we try to exploit the notion of verbal alternations and Beth Levin's work on verb classes (also known as Levin classes, first described in (Levin, 1993)). To our knowledge, only one other textual entailment system has made use of the notion of Levin classes and VerbNet, namely that of (Ferrandez et al., 2010); they do not make use of the full potential of the resource, however, and only check to see whether two verbs are in the same Levin class.

In more detail, the Levin class for each of the predicates in H is extracted, and then matched to predicates in T that have the same Levin class; if this fails, the verbs are matched on the basis of the semantic description attached to Levin classes. This procedure was tested by analyzing, in detail, 200 entailment pairs from the RTE-5 test set, and it was found that more than 38% of them can be solved by means of Levin class match. For identifying the Levin class of a verb we have used VerbNet, which is described in section 3.1.

Another novel feature of this approach is the grouping of entity properties into property sets; for two similar entities in T and H, entailment is defined as the result of the unification of these attribute sets. Out of the 200 analyzed pairs, over 29% can be solved using this method.

The rest of the chapter is structured as follows: section 3.1 is a description of VerbNet and the composition of classes in VerbNet; section 3.2 gives an algorithm for solving textual entailment on the basis of predicational semantics and argument structure matching; section 3.3 describes the analysis carried out over the set of 200 entailment pairs; and conclusions are given in section 3.4.


3.1. Levin’s classes and VerbNet

The largest and most widely used classification of English verbs is that of Levin (1993), most commonly referred to as Levin classes. VerbNet (VN) (Kipper et al., 2000; Kipper-Schuler, 2005) is an online verb lexicon for English that provides detailed syntactic and semantic descriptions for Levin classes organized into a more refined taxonomy. According to (Kipper-Schuler et al., 2006), VN is a hierarchical, domain-independent, broad-coverage verb lexicon, and it has mappings to a number of widely used verb resources, such as FrameNet (see annex 4 for an alignment example) (Baker et al., 1998) and WordNet (Fellbaum, 1998).

A VerbNet class is completely described by a set of member verbs, the thematic roles for the predicate–argument structures of these verbs, selectional restrictions for the arguments and frames that contain a syntactic description and semantic predicates that also have a temporal function, in a manner similar to event decomposition (Moens and Steedman, 1988). VerbNet extends the original Levin classes by refining them into subclasses with a higher degree of syntactic and semantic coherence between members. It has been further extended with the addition of the resource described by (Korhonen and Briscoe, 2004), which includes 57 new classes and 106 new diathesis alternations. An example of a VerbNet class is given in annex 2.

3.1.1. Syntactic Frames in VN

Each VN class contains a set of syntactic descriptions (also known as frames) that represent potential surface realizations for the argument structure, such as transitive, intransitive, prepositional phrases, etc., and a number of diathesis alternations, given by Levin as part of each class (Kipper-Schuler, 2006). Each syntactic frame is described using thematic roles (Agent, Patient, etc.), the verb, and any other lexical items needed for a construction or alternation. Thematic roles are sometimes subject to selectional restrictions (such as animate, organization, human, etc.), and the syntactic frames themselves can be constrained with regard to which prepositions are allowed. Additionally, further restrictions can be imposed on the thematic roles, in order to give the syntactic nature of the constituent that is most likely to be associated with the thematic role. Levin classes are mostly characterized by NP and PP complements, although some classes, like tell-37.2, allow sentential complements, while the classes described by (Korhonen and Briscoe, 2004) mainly focus on sentential arguments.


3.1.2. Semantic predicates

In addition to the syntactic description attached to each Levin class in VN, there is also a semantic description, expressed as a conjunction of Boolean semantic predicates such as “cause”, “motion” or “contact”. Each of these predicates is associated with an event variable E that is used for specifying the moment at which the predicate is true (start(E) for the stage before the action takes place, during(E) for the culmination stage and end(E) for the stage after the action takes place). Relations between verbs or classes of verbs, such as antonymy and entailment from WordNet, and relations such as those found in FrameNet, can be extracted by means of these semantic predicates. In VN, aspect is described by the event variable argument in predicates (Kipper-Schuler et al., 2006).
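To make the structure of a class concrete, the sketch below encodes the own-100 class used later in example 2; the field names are our own illustration and do not follow VerbNet's actual XML schema.

    # An illustrative encoding of a VN class; field names are our own, not
    # VerbNet's XML schema. The content mirrors the own-100 class cited later.
    from dataclasses import dataclass, field

    @dataclass
    class VNFrame:
        syntax: str          # e.g. "Theme1 V Theme2"
        semantics: list      # semantic predicates over the event variable E

    @dataclass
    class VNClass:
        class_id: str
        members: list
        thematic_roles: list
        frames: list = field(default_factory=list)

    own_100 = VNClass(
        class_id="own-100",
        members=["have", "own", "possess"],
        thematic_roles=["Theme1", "Theme2"],
        frames=[VNFrame(syntax="Theme1 V Theme2",
                        semantics=["has_possession(E, Theme1, Theme2)"])])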

3.1.3. Statistics about VerbNet

At the moment, VN version 3.1, the latest version of the English VN, has descriptions for over 5800 verbs, distributed in 270 first-level classes and 200 subclasses. The descriptions for these verbs use 33 thematic roles, 36 selectional restrictions, 296 primary frames (which can be extended by specifying the thematic role type or the complement type) and 145 semantic predicates. The lexicon also relies on a shallow hierarchy of prepositions, with 66 entries. VN coverage of PropBank tokens (Palmer et al., 2005) has increased in the current version to above 90%. Also, in VN 3.1, Levin's taxonomy has grown both in depth, with the refinement of several of the classes defined in (Levin, 1993), and in width, with the addition of the extension proposed by (Korhonen and Briscoe, 2004).

3.2. A Predication-Based Algorithm for Solving Textual Entailment

The purpose of the feasibility study described in section 3.3 was to test whether predication-driven textual entailment, based on Levin classes, is possible; the results have shown that not only is it possible, but that such an approach solves 38% of the entailment pairs taken into consideration; also, a further 29.5% of the pairs are solved by the use of argument structure matching. Because of this, we propose an algorithm that solves verb-based textual entailment, and that has also been extended to perform argument matching; analysis of the algorithm carried out over the 200 entailment pairs mentioned above showed that it can correctly solve entailment in most cases (difficult cases, which are not correctly solved by our algorithm, are discussed in section 3.4).

The algorithm receives as input syntactic trees for the text and the hypothesis (functional dependency parses) and returns as output the entailment value for that pair. When we make references to the sense of a word, we understand the most common sense of the word (in terms of WN frequency); also, when we refer to two words as being synonyms, we understand that there exists at least one synset in WordNet to which senses of both words belong. In terms of argument entailment, we say that an argument argT in T entails an argument argH in H if argT and argH are synonyms, they refer to the same entity (by means of coreference or acronym, for example), argH is a part of argT, argH is a hypernym of argT or argH is an ancestor of argT in the WN taxonomy. We say that argH is aligned to argT if argH is entailed by argT.
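The WordNet side of these checks can be sketched with NLTK's interface (the corpus must be fetched once with nltk.download('wordnet')); taking the first synset follows the most-common-sense convention above, while reducing an argument to a single noun is our simplification.

    # A sketch of the WordNet checks above using NLTK; arguments are reduced
    # to single nouns, a simplification of the full argument structures.
    from nltk.corpus import wordnet as wn

    def wn_arg_entails(arg_t, arg_h):
        syns_t, syns_h = wn.synsets(arg_t), wn.synsets(arg_h)
        if not syns_t or not syns_h:
            return False
        s_t, s_h = syns_t[0], syns_h[0]    # most common sense, per WN frequency
        if s_t == s_h:
            return True                    # synonyms: a shared synset
        if s_h in set(s_t.closure(lambda s: s.hypernyms())):
            return True                    # argH is an ancestor/hypernym of argT
        # argH is a part of argT: search argT's meronym (part) closure
        return s_h in set(s_t.closure(lambda s: s.part_meronyms()))

    print(wn_arg_entails('dog', 'animal'))   # True, via the hypernym closure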

1. Extract the Levin class for all the verbs in T and H and attach the appropriate semantic description, on the basis of the Levin class and syntactic analysis;

2. For every verb q in H, determine whether at least one verb p in T has the same Levin class as q

a. For all candidates p in T, if the arguments and adjuncts match over p and q, and the verbs are not semantic opposites (e.g. antonyms or negations of one another), return ENTAILMENT

b. Else, (i) if the arguments and adjuncts match, but the verbs are semantic opposites (e.g. antonyms or negations of one another), or the arguments are related but do not match, return CONTRADICTION; (ii) else, if the arguments are not related, return UNKNOWN

c. Else, return UNKNOWN

3. For every verb q in H, if there is no verb p in T that has the same Levin class as q, extract relations between q and p on the basis of Levin semantic descriptions

a. If the verbs in H are not semantic opposites (e.g. antonyms or negations of one another) of verbs in T, and the arguments match, return ENTAILMENT


b. Else, (i) if q is semantically opposite to p and the arguments match, or the arguments do not match, return CONTRADICTION; (ii) else, if the arguments are not related, return UNKNOWN

c. Else, return UNKNOWN

4. Return UNKNOWN

The algorithm described above addresses the issues involved in using Levin classes for calculating entailment. It starts by attaching the appropriate Levin class to each verb in T and H. The next step is to attempt to match every verb in the hypothesis to at least one verb in the text on the basis of their Levin class value in VerbNet. Given the above verb matching, the algorithm then attempts to perform unification over the argument feature structures. If the arguments match, as defined above, then the result of the analysis is ENTAILMENT; in the case of argument contradiction, the result is CONTRADICTION. An example of argument contradiction is given below:

T: The cat ate a large mouse.
H: The cat ate a small mouse.

If the verb in the hypothesis is entailed by the verb in the text, but their arguments are unrelated, the result is UNKNOWN. An example of such a case is given below:

T: The cat ate a mouse.
H: The cat ate in the garden.

In all other cases of argument distribution in the case of Levin class similarity, the result is UNKNOWN. In case there is no match on the basis of the Levin classes, the algorithm attempts to extract entailment on the basis of the semantic description of verbs and then goes through the same steps. If none of the verbs match, either based on Levin classes or semantic descriptions, the algorithm returns UNKNOWN.
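One possible reading of this control flow is sketched below; the five helper callables are placeholders for the VerbNet lookups, the opposite-meaning checks and the feature-structure unification described above, and the aggregation over all verbs in H is our interpretation of the per-verb steps.

    # A schematic rendering of steps 1-4; same_class, related, opposites,
    # args_match and args_related are hypothetical helpers supplied by the
    # caller (VerbNet lookup, antonymy/negation check, unification, etc.).
    def entailment_decision(hyp_verbs, text_verbs, same_class, related,
                            opposites, args_match, args_related):
        for q in hyp_verbs:
            # step 2: candidates sharing q's Levin class; step 3: fall back on
            # verbs related through the semantic descriptions of the classes
            cands = ([p for p in text_verbs if same_class(p, q)]
                     or [p for p in text_verbs if related(p, q)])
            verdict = "UNKNOWN"                       # steps 2.c / 3.c / 4
            for p in cands:
                if args_match(p, q) and not opposites(p, q):
                    verdict = "ENTAILMENT"            # steps 2.a / 3.a
                    break
                if (args_match(p, q) and opposites(p, q)) or \
                   (args_related(p, q) and not args_match(p, q)):
                    verdict = "CONTRADICTION"         # steps 2.b / 3.b
            if verdict != "ENTAILMENT":
                return verdict
        return "ENTAILMENT"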

The algorithm described in this section has been implemented in part (the implemented part refers to argument entailment) in the system used for the RTE-5 challenge; the 2010 version of the system, which was used for the RTE-6 evaluation campaign, implemented the predicate entailment section of the algorithm. The part that we have implemented concerns verb matching on the basis of VerbNet, as can be seen in Figure 2 given in chapter 4, which was used in conjunction with DIRT for coverage reasons. In the case of argument matching, the algorithm has gone through several versions, with the one given in this thesis being a more recent development. Because of this, even though an early version of argument matching was included in the textual entailment system used in the RTE-5 challenge (see figure 1 in chapter 4, the extensions to the named entity recognizer and the WordNet module), the current version of the argument matching procedure, based on feature structure unification, is still in the process of being implemented for the next version of the textual entailment engine. Therefore, the results obtained in RTE-5 and in the novelty detection task of RTE-6 prove the effectiveness of the algorithm.

3.3. Corpus analysis

As stated in the introduction, one of the key means of determining textual entailment, according to our intuition, is the analysis of the verbs. The goal of this section is to examine the usefulness of Levin classes, particularly in the form described in section 3.1, for the task of textual entailment, as a means of performing this comparison.

In order to determine the importance of using VerbNet for solving textual entailment, we have carried out a feasibility test on a corpus of 200 entailment pairs extracted from the RTE-5 test set, by manually examining each pair and determining the manner in which the correct solution can be deduced. The dataset used is typical for RTE data in terms of distribution of types of pairs; out of the 200 pairs taken into consideration, 100 (50%) are ENTAILMENT, 31 (15.5%) are CONTRADICTION, and 69 (34.5%) are UNKNOWN.

As a result of the analysis, every pair was classified into one of four categories:

 VN (VerbNet), which means that the application of the algorithm defined in section 3.2 leads to the correct result for the pair, on the basis of verbs in T and H sharing Levin classes, having similar semantic descriptions or verbs in H being semantic consequences of verbs in T, and having compatible argument structures;

 NA (Not Attached), which means that the algorithm defined in section 3.2 finds no verb in T that has the same argument structure as the verbs in H, or there are no verbs in T that are synonyms, consequences of or share a Levin class with the verbs in H;


 NE (Not Exists), which means that one of the key concepts in H is not found in T;

 OS (Other Sources), which means that the correct result can be deduced by other means than that of verbal comparison (e.g. coreference resolution or ontological knowledge). It is worth noting that, according to our analysis, most of the pairs that fall into this category are solved by extracting property sets for arguments and adjuncts and performing comparison over them.

Table 11 gives the distribution of the 200 pairs in these categories, in order of frequency:

Category    Number    Percent
VN          76        38%
OS          59        29.5%
NE          43        21.5%
NA          22        11%

Table 11: Distribution of solutions for feasibility study

As can be seen from the distribution of the solutions for the pairs taken into account in the feasibility study, the highest gain comes from the use of VerbNet, followed by other sources of entailment (almost all such cases are instances of property set comparison). The distribution also supports the intuition described in the introduction, as all the cases of NA can be reduced to mismatching predicates and all the cases of NE can be reduced to cases of mismatching property sets. For the rest of this section we give a series of examples from the analyzed pairs that support the proposed methods.

3.3.1. Pairs solved using VN

This subsection contains a number of examples that support the use of VN as a tool in solving textual entailment. The most common use of VN is the alignment of verbs in the same Levin classes, as shown in the example below (pair 54 from the RTE-5 test set):

Example 1: Exact match over VN classes


T: MADAGASCAR'S constitutional court declared Andry Rajoelina as the new president of the vast Indian Ocean island today, a day after his arch rival was swept from office by the army. ...

H: Andry Rajoelina was proclaimed president of Madagascar.

The verb proclaim in the hypothesis is in class say-37.7-1 in VerbNet, which also contains the verb declare in the text. These two verbs match in terms of arguments (Andry Rajoelina is the Theme as defined by VerbNet) and also in terms of adjuncts (president of Madagascar). It is also worth mentioning that this analysis requires successful coreference resolution between “Madagascar” and “vast Indian Ocean island”. The fact that the two verbs matched over the text and the hypothesis are in the same Levin class, and that their argument structures match, leads to ENTAILMENT, according to step 2.a. of the algorithm in section 3.2.
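This Levin-class test can be illustrated with NLTK's VerbNet reader (fetched once with nltk.download('verbnet')); note that class identifiers such as say-37.7-1 may differ slightly between VerbNet versions, so the expected output is indicative only.

    # A small illustration of checking whether two verbs share a Levin class;
    # exact class ids depend on the VerbNet version shipped with NLTK.
    from nltk.corpus import verbnet as vn

    def share_levin_class(verb_t, verb_h):
        # classids() returns identifiers such as 'say-37.7-1' for a lemma
        return bool(set(vn.classids(verb_t)) & set(vn.classids(verb_h)))

    print(share_levin_class('declare', 'proclaim'))   # expected: True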

The use of a frame's syntactic description and of semantic decomposition is illustrated in example 2 (pair 66 from the RTE-5 test set).

Example 2: Syntactic description and semantic decomposition

T: A court in Venezuela has jailed nine former police officers for their role in the deaths of 19 people during demonstrations in 2002. ...

H: Nine police officers have had a role in the death of 19 people.

The verb have is part of the class own-100, which is described syntactically as:

Theme1 V Theme2

and semantically as:

has_possession(E, Theme1, Theme2)

In this particular example, “nine police officers” is Theme1 and Theme2 is “a role in the death of 19 people”; therefore, the situation described in the text (“their role”, which can be expanded, by means of coreference, to “the police officers' role”) supports the hypothesis, and leads to ENTAILMENT, according to step 3.a. of the algorithm in section 3.2.


Because of the manner in which the VN classes are defined, it is possible that the verbs in the same class are not synonyms; furthermore, there are cases of antonyms in the same Levin class, as is the case in pair 92 given below.

Example 3: Antonymy in the same VN class

T: BEIJING — China has rejected Coca-Cola Co.'s $2.3 billion bid to buy a major Chinese juice producer, the Ministry of Commerce (MOC) announced Wednesday. The ruling is the first of its kind since China promulgated its anti-monopoly law last August. The proposed purchase was rejected on anti-monopoly grounds, MOC said on its website. "The bid may harm competition...in China's beverage market." The purchase of China Huiyuan Juice Group, the nation's largest juice maker, would have been the biggest foreign acquisition of a Chinese company to date.

H: Coca-Cola buys Huiyuan Juice Group.

In order to correctly solve this entailment pair, the verbs reject and buy in the text need to be linked, so that the contradiction can be deduced. The verb reject is part of the class approve-77; the case encountered in the text is syntactically described by the frame

Agent V Proposition

and semantically by

approve(during(E), Agent, Proposition)

Since the verb in question is opposed to the concept described in the semantic frame, the description becomes

not(approve(during(E), Agent, Proposition))

The second step of determining the correct solution is the analysis of the proposition, which is Coca-Cola Co.'s $2.3 billion bid to buy a major Chinese juice producer; the class attached to the verb buy matches the verb from the hypothesis. The combination of the two predicates gives the correct result for this pair, which is CONTRADICTION, according to step 3.b. of the algorithm in section 3.2. In order to correctly solve this pair, the anaphoric relation between “major Chinese juice producer” and “Huiyuan Juice Group” must also be determined.


In the cases where there is no link via Levin classes between verbs in H and T, the semantic description of the verbs can be used in order to determine synonymy, as seen in pair 126.

Example 4: Match by semantic decomposition

T: ..., and Fiat, the Italian car company that wants to acquire a stake in Chrysler. ...

H: Fiat wants to gain possession of a stake in Chrysler.

Even though there is no verb in T that is in the same class (get-13.5.1) as gain, the semantic description for this class,

has_possession(start(E), ?Source, Theme)
transfer(during(E), Theme)
has_possession(end(E), Agent, Theme)
cause(Agent, E)

is identical to the semantic description for the class obtain-13.5.2-1, which contains the verb acquire from T. This match, coupled with a match over argument structures, leads to ENTAILMENT, according to step 3.a. of the algorithm in section 3.2.
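Matching classes through their semantic descriptions rather than their identifiers can be illustrated with a toy comparison of the two descriptions quoted above; representing a description as a set of predicate tuples is our simplification.

    # A toy comparison of get-13.5.1 and obtain-13.5.2-1 via their semantic
    # descriptions; sets of predicate tuples are a simplified representation.
    get_13_5_1 = {
        ("has_possession", ("start(E)", "?Source", "Theme")),
        ("transfer", ("during(E)", "Theme")),
        ("has_possession", ("end(E)", "Agent", "Theme")),
        ("cause", ("Agent", "E")),
    }
    # per the example above, obtain-13.5.2-1 carries an identical description
    obtain_13_5_2_1 = set(get_13_5_1)
    # identical descriptions make gain (H) and acquire (T) matching candidates
    print(get_13_5_1 == obtain_13_5_2_1)   # True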

VerbNet also allows for establishing relations between verbs on the basis of the semantic description of each frame. This can be seen in the relation between the verbs assassinate and die, illustrated in pair 41.

Example 5: Establishing consequence relations

T: Varun and his estranged cousins are the grandchildren of Indira Gandhi, the former prime minister who was assassinated in 1984. Her two sons - Rajiv Gandhi, who became premier after Indira's assassination and Sanjay Gandhi who died in an aircraft crash while doing aerobatics - married women with very different personalities. Varun's mother, Maneka, widow of Sanjay Gandhi, is headstrong, voluble, and strident. Rahul and Priyanka's mother is the Italian-born Roman Catholic, Sonia Gandhi, who was widowed by Rajiv's assassination, was more demure as a daughter-in-law.

H: Indira Gandhi died doing aerobatics.

The semantic description for the VN class murder-42.1, to which assassinate belongs, is


cause(Agent, E)
alive(start(E), Patient)
not(alive(result(E), Patient))

and that for the class disappearance-48.2, the class of die, is

disappear(during(E), Theme)

Since die (not alive) is a troponym of disappear (according to WordNet), and the arguments of the two semantic descriptions match (Theme and Patient are similar), it can be deduced that the frame attached to the verb die is a consequence of the frame attached to the verb assassinate. The end result of the entailment pair is CONTRADICTION, according to step 3.b. of the algorithm in section 3.2, as the predicate in the hypothesis has an adjunct (doing aerobatics) that is in contradiction with the adjuncts of the predicate assassinate.

3.3.2. Pairs solved using NE and NA

In the case of the three-way task for Recognizing Textual Entailment, one of the difficulties is the recognition of UNKNOWN cases. According to the intuition on the basis of which the analysis has been carried out, UNKNOWN cases are due to two main reasons: one or more of the entities in H are not present in T (NE), or all the information in H is present in T, but is not linked by any one predicate (NA).

An example of entities present in the hypothesis which are not semantically and syntactically linked in the same way in the text can be seen in example 6 (pair 82).

Example 6: Entities from H unrelated in T

T: Currently, there is no specific treatment available against dengue fever, which is the most widespread tropical disease after malaria. Sanofi Pasteur is collaborating with the Communicable Disease Center in Singapore and the Pasteur Institute in Vietnam to conduct these clinical studies in children and adults. "Controlling the mosquitoes that transmit dengue is necessary but not sufficient to fight against the disease. A safe and effective vaccine has been long awaited to prevent dengue epidemics," said Professor Leo Yee Sin, director of the Communicable Disease Center in Singapore. "Clinical studies in Singapore are critical steps to advance the development of a vaccine for the prevention of dengue in Asia. We are happy to contribute to scientific research that would benefit the entire region."

H: Malaria is the most wide-spread disease transmitted by mosquitoes.

As can be seen, malaria appears just once in the text, and is not in any way linked to the notion of mosquitoes (even though WN defines malaria as a disease transmitted by mosquitoes, this is not specifically mentioned in the text); this is thus an UNKNOWN case, according to step 2.c. of the algorithm in section 3.2.

Another way of extracting UNKNOWN cases is to have an entity in the hypothesis that cannot be found in the text (NE), as can be seen in example 7 (pair 21).

Example 7: An entity from H not found in T

T: I was nearly charged with petty theft for pilfering coffee at the illustrious Hippodrome Building. But lest I be judged too quickly, I must convey the sublimity of the fourth floor's coffee machine. Harry Houdini performed at the Hippodrome, at 1120 Avenue of the Americas near 44th Street. Many of the best and most famous performers of the time appeared there. It was one of the biggest and most successful theaters of its time, capable of accommodating 5,200 people.

H: Harry Houdini was a magician.

This pair is an UNKNOWN case because there is no mention of the entity “magician” in any form.

After the analysis, all of the 22 NA and 43 NE cases were found to point to UNKNOWN cases, and account for 65 of the 69 total UNKNOWN cases. The remaining four cases are due to the fact that the predicates used in the hypothesis are not found in the text, as can be seen in example 8 (pair 105).

Example 8: Determining UNKNOWN using verb matching

T: The company also faced shortages of skilled riveters, the archives showed. Dr. McCarty said that for a half year, from late 1911 to April 1912, when the Titanic set sail, the company's board discussed the problem at every meeting. For instance, on Oct. 28, 1911, Lord William Pirrie, the company's chairman, expressed concern over the lack of riveters and called for new hiring efforts. In their research, the scientists, who are metallurgists, found that good riveting took great skill. The iron had to be heated to a precise cherry red color and beaten by the right combination of hammer blows. Mediocre work could hide problems.

H: Titanic sank in 1912.

This is not a typical NA case because of the predicate set sail in the text, which has as arguments the exact entities that can be found in the hypothesis. Therefore, the correct solution of this pair depends exclusively on verbs. The VerbNet class of the verb sink is other_cos-45.4, and has the semantic description

state(result(E), Endstate, Patient)

The only other verb in T that is related in any way to this one is heat, which is actually in the same class as sink, but has no arguments that are linked in any way to Titanic or 1912, thus leading to UNKNOWN, according to step 3.c. of the algorithm described in section 3.2. This is an interesting case, also because the class other_cos-45.4 is very large and contains verbs that have significantly different senses. Despite all this, there is no real danger of incorrectly matching two (semantically) unrelated predicates from this class, as the widely different senses also require significantly different arguments (this observation applies to most of the VerbNet classes in this situation).

3.3.3. Pairs solved using OS

As stated above, OS (Other entailment Sources) refers to the solving of textual entailment by other means than verb matching (mostly, the analysis of entities in order to attach the feature sets so that entailment analysis can be carried out). This is in accordance with the intuition given in the introduction, according to which entailment can be determined through feature set unification. The difficulty of this approach is the classification of attributes; this is solved by the use of an ontology (preliminary tests showed that both WordNet (Fellbaum, 1998) and SUMO (Niles and Pease, 2001) are useful for this). An example of using world knowledge in order to determine entailment is given in the analysis of example 9 (pair 7).

Example 9: Solving entailment with world knowledge (part_of)


T: Global technology giant IBM is in talks to buy Sun Microsystems in a deal that would expand its server market share, the Wall Street Journal reported Wednesday. IBM may pay as much as US$6.5 billion in cash for Sun, the newspaper reported on its Web site, without naming its sources. That amount of money would be nearly double Sun's closing share price on Tuesday of $4.97 per share. The report cautioned that while the two companies are holding discussions, a transaction may not occur.

H: The price of Sun Microsystems is $4.97.

World knowledge, extracted from SUMO, is used in this case to determine that a share is a part of a company. From this knowledge it follows that the price of a company is greater than the price of one share, which leads to CONTRADICTION.

Another example of using world knowledge in order to solve entailment can be found in example 10 (pair 191).

Example 10: Solving entailment with world knowledge (is_a)

T: Ernie Barnes, whose drawings and paintings of athletes, dancers and other figures in motion reflected his first career as a professional football player, died on Monday in . He was 70. The cause was complications of a blood disorder, his personal assistant, Luz Rodriguez, said. Mr. Barnes was an offensive lineman in the old American Football League, playing four seasons in the 1960s for the New York Titans, the Chargers and the Denver Broncos. He would often say later that even during his playing days, his heart was more in the painting and sketching he had been doing since he was a child.

H: Ernie Barnes was an athlete.

In this case, WordNet contains the information that offensive lineman is a type of athlete, which leads to the classification of this pair as ENTAILMENT.

Also, copulative verbs need to be seen as a form of assignment of properties; the hypotheses of both examples in this subsection have a copulative verb, which is not analyzed by means of VerbNet but which is treated as a link between an entity and an attached property.

A very important issue in terms of feature set extraction for entities is the solving of coreference chains. This can be helped by VerbNet, because the chance of two entities coreferring is increased in the case of entities that have the same argument position with regard to verbs belonging to the same Levin class, as can be seen in the example below.

Example 11: Solving entailment by means of coreference resolution

T: The BBC has seen documents alleging that a former rebel leader indicted for war crimes is playing a leading role in a mission involving the UN in DR Congo. The documents appear to prove that Gen Bosco Ntaganda is taking an active part in the mission's chain of command, a BBC correspondent in the country says. The UN-Congolese force is fighting Hutu rebels in the eastern DR Congo. The force says Congolese authorities have given assurances that Gen Ntaganda was not involved in joint operations.

H: Bosco Ntaganda was a rebel leader.

The corefering entities are rebel leader and Bosco Ntaganda and they are both Agents of the verbs play and take part respectively, which are part of class performance-26.7, thus leading to ENTAILMENT.

The use of feature set unification for determining entailment can be seen in example 12 given below:

Example 12: Solving entailment by means of feature set unification

T: Justice Ian Josephson was still to pass judgment on Ajaib Singh Bagri, 55, who along with Malik had faced charges of first-degree murder, conspiracy to commit murder and attempted murder in the downing of Flight 182.

H: Ripudaman Singh Malik had faced charges of murder and conspiracy.

On the basis of the algorithm described in section 3.2, step 2.a. determines that the verb in the hypothesis has the same class as a verb in the text (in this case, the verbs are identical: “had faced”). On the basis of the existing verb match, the algorithm then attempts to perform argument matching in order to determine entailment. The verb in the hypothesis has two arguments; the subject is “Ripudaman Singh Malik”, which matches the subject in the text, “Malik” (previous information available in the source document proves that there exists a coreference relation between the two entities). The direct object in both cases is a compound structure, but the unification of these structures is successful, and is equal to the feature structure given in the text. Therefore, the result for this entailment pair is ENTAILMENT.
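A toy rendering of this unification is given below; reducing each direct object to a head plus a set of charge properties, and the hand-made normalization table standing in for WordNet-based matching, are illustrative assumptions only.

    # A toy rendering of the unification in example 12; the normalization
    # table is a hand-made stand-in for WordNet-based lexical matching.
    arg_T = {"head": "charges",
             "of": {"first-degree murder", "conspiracy to commit murder",
                    "attempted murder"}}
    arg_H = {"head": "charges", "of": {"murder", "conspiracy"}}

    normalize = {"first-degree murder": "murder",
                 "conspiracy to commit murder": "conspiracy",
                 "attempted murder": "murder"}
    t_props = {normalize.get(p, p) for p in arg_T["of"]}
    # unification succeeds: H's properties are subsumed by the text's, so the
    # unified structure equals the text argument's structure
    print(arg_H["head"] == arg_T["head"] and arg_H["of"] <= t_props)   # True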

3.4. Conclusions

In this section we have described a novel way of solving textual entailment using predicational semantics and feature structures. The analysis has been carried out on a corpus of 200 entailment pairs extracted from the RTE-5 test set.

For determining predicational semantics we have explored the utility of VerbNet, which was able to solve 38% (76 pairs) of the set taken for analysis. Another 29.5% of the test set (59 pairs) can be correctly solved by using property sets defined on the basis of coreference, copulative verbs, world knowledge (ontologies) and various grammatical constructions (such as appositions). We have also discovered that, on the given corpus, UNKNOWN cases behave very regularly, as most of them (65 out of 69) are solved as either NA cases (the information in the hypothesis is not linked by any predicate in the text) or NE cases (there is an entity in the hypothesis that does not exist in the text).

The feasibility study given in this section also provides the motivation for the use of VerbNet for textual entailment. First of all, the use of VN for textual entailment is novel, as there are, to our knowledge, no attempts at using predicational semantics in general, and VN in particular, to create a deep semantic representation of entailment pairs. The study also proves the usefulness of such an approach, as it has shown that 38% of entailment pairs can be solved using the information in VerbNet, and 29.5% can be solved by means of argument unification. We have also given an algorithm that solves textual entailment on the basis of verb knowledge provided by VerbNet and ontological knowledge provided by taxonomies such as WordNet.

The problem of using VN for textual entailment is a complex one, however, as can be seen in the example below (pair 145):

Example 13: Entailment based on inference chains

T: If 20 multiple sclerosis patients were in a room, all would exhibit different symptoms, she says. "The disease is not clear and not set. It's highly unpredictable." In particular, in the relapsing-remitting stage, there are symptoms such as fatigue, blurry vision, numbness or tingling that the patient feels but are not noticeable to others. As a result, outsiders may not be aware of the extent of the patient's suffering. "Sometimes, they think if you're not in a wheelchair, M.S. must not be that bad. That's not true," Caesar observed. In Florida, 20,000 individuals have been diagnosed with the disease. Nationwide, that number is 400,000. Although it is still in the basic research stage, stem cell therapy shows promise, says Caesar. The research focuses on getting stem cells to regenerate the nerve-insulating myelin.

H: Stem cells could be useful to treat multiple sclerosis.

The solution for this pair (which is ENTAILMENT) can only be extracted after performing a series of complex inferences on the basis of the text; “stem cell therapy shows promise” is an incomplete proposition, as it does not have a Recipient, which is, in this case, “multiple sclerosis”. The semantic description attached to “show” is

indicate(during(E), Cause, ?Recipient, Topic)

and links the Recipient, not realized in this frame but linked to “multiple sclerosis”, to the Cause, which is “stem cell therapy”, and the Topic, “promise”. This linking of the three concepts is similar to that in H, as “shows promise” is similar to “useful”, and therefore leads to ENTAILMENT. However, this extraction of the Recipient is difficult, and hinges on the identification of the fact that it is not realized in the surface form of T.

This method is also feasible in the sense that the required analyses can be done with the aid of current generation parsing techniques, such as functional dependency parsing and semantic role labelling, which are accurate for English. Also, the semantic descriptions attached to verb classes in VerbNet are, in fact, atomic propositions grouped into equivalence classes, which greatly reduces the amount of work required to match up predicates.

The effectiveness of the method we have proposed for solving textual entailment has been proven in two ways: firstly, the results obtained in the last two RTE challenges (particularly RTE-5) have shown that the systems enhanced by means of this method are among the top scoring systems in the world. Secondly, the feasibility study given in this chapter has shown that our method has the capability of improving a state of the art system by correctly solving entailment cases that were missed, without decreasing performance (this analysis was carried out over the results obtained by the top scoring system in RTE-5; existing correct solutions were validated by our algorithm, while incorrect solutions, such as those obtained for pairs 1, 2, 102 and 105, to give just a few, were corrected by it).

As future work, we intend to further expand a series of resources linked to VerbNet, particularly to attach semantic descriptions to the selectional restrictions and predicates, in order to then automatically extract relations between frames and verb classes.

A major advantage of using VN is that it is aligned with a series of resources such as FrameNet and WordNet, which means that creating a similar resource for Romanian can be done quickly (the main difficulties arise from the alignment of the syntactic frames in English and Romanian). Given a Romanian VerbNet, the algorithm put forward in this section could be used to solve entailment pairs in Romanian without the need for modifications. The same can be said for the world knowledge bases (particularly SUMO), as they are also aligned to WordNet, making it much easier to port such resources for Romanian.

Because of the advantages described above, our approach can solve textual entailment in a language independent context, given the appropriate resources: WordNet for the given language, a version of VerbNet (which can be created on the basis of WN), and parsing tools for that language (POS tagger, syntactic parser, and, possibly, a semantic role labeller). Chapter 5 describes two approaches to extending the Romanian version of WordNet; given a WN version that has significant coverage, the algorithm described at the beginning of this chapter can be implemented for solving textual entailment for the Romanian language. The proposed extension methods are based on the electronic format of the Romanian Language Thesaurus Dictionary (eDTLR), obtained within the eDTLR grant, PNCDI II Project No. 91_013/18.09.2007, to which we had significant contributions. Our contribution towards obtaining the electronic format of the Romanian Language Thesaurus Dictionary, together with the methods developed and implemented for obtaining the eDTLR are described in detail in chapter 5.

3.5. Comparison with Current Entailment Methods

The inherent difficulty of recognizing textual entailment arises from the fact that it attempts to address the issue of natural language variability; informally, natural language variability means conveying the same semantic information through possibly very different surface realizations of the text. It is our sincere opinion that this can only be properly addressed by a deep semantic understanding of natural language, and that such an understanding can be reached by turning to the basic building blocks of human utterances: predications and their arguments. We have based the assertions made in this chapter on the results obtained in the RTE competitions by the systems described in section 2, which provide an experimental setting for the methods discussed, because the unified evaluation framework available allows for the comparison of significantly different approaches.

Given the reasons stated above, it can be said that textual entailment solving methods based on lexical matching (section 2.1), syntactic graph distance (section 2.2) and tree edit distance (section 2.3) cannot provide a sufficiently detailed deep semantic representation of the text and hypothesis, and it is our opinion that, because of this, our method shows more room for improvement and greater flexibility in adapting to new entailment settings. Also, pure logical representations of natural language utterances are difficult to obtain, thus hindering the ability of such systems as those described in section 2.4 to obtain deep semantic representations of language.

It can be argued that some of the more successful approaches for solving textual entailment consist of using atomic propositions (described in section 2.5), as they address, at least in part, the notion of predicates in natural language. The difficulty of this approach, however, is that it treats the text and the hypothesis as bags of atomic propositions, which does not accurately represent the deep semantic meaning of natural language, as it disconnects arguments of the same predicate from one another.

For the same reasons described above, machine learning based systems (section 2.6) also have difficulties in creating deep semantic representations of natural language; since machine learning leads to emulating the behaviour described in the training data, it can be said that such a system creates accurate feature representations of entailment pairs and then draws a conclusion from them without actually understanding the utterances, relying instead on previously annotated data. This difficulty translates into a lack of flexibility when encountering previously unseen linguistic phenomena.

It is our belief that approaches based on discourse knowledge and analysis (section 2.9), coupled with entailment rules (section 2.7), either manually created or automatically extracted from semantic resources such as FrameNet, are among the best methods for solving current entailment problems, but that they also need to be coupled with predicate structures in order to create a robust entailment engine. Because of these facts, we believe that the approach described in this thesis is a viable one for solving the problem of textual entailment in the general case, as it is both scalable and robust.

The approach we have proposed in chapter 3 is a first step towards creating a deep semantic understanding of natural language. Therefore, we are convinced that the method put forth in this thesis advances the state of the art in textual entailment towards a semantic based interpretation of language, an idea which is closer to the initial definition of textual entailment, given in (Dagan et al., 2005): “We say that a text T entails a hypothesis H if, typically, a human reading T would infer that H is most likely true”.


4. System Description, Experimental Results and Analysis

The system used for solving textual entailment is based on the one described in (Iftene, 2009), which is an improved version of the system initially described in (Iftene and Balahur-Dobrescu, 2007) and (Iftene, 2008). On the conceptual level, the system uses rules to map every entity from the hypothesis to at least one entity from the text by using extensive lexical-semantic knowledge sources such as WordNet, VerbOcean, Wikipedia derived lexical information, etc. The mapping process assigns a fitness score to each word in the hypothesis on the basis of the rules used; these local fitness values are used to calculate a global fitness value. Local fitness values are defined on the interval [0, 1], on the basis of word correlation (1 means maximum correlation and 0 means that the words are not correlated at all). Depending on modifiers applied to the words (negation of verbs, for example), this fitness value may change. Global fitness is computed as a function of the local fitness of each hypothesis element; in the end, a fitness value ranging from -1 to 1 is assigned to the entailment pair. The closer the global fitness value is to the ends of the interval, the more the hypothesis and the text are correlated; positive values are characteristic of agreement, while negative values are characteristic of disagreement. In order to solve three-way textual entailment, the system relies on two empirically computed thresholds that separate the high correlation pairs from the low correlation pairs (two-way textual entailment can be derived from three-way textual entailment). Section 4.1 below describes the general system architecture and development.

4.1. General System Architecture

In this section we give a brief description of the basic version of the system used for the RTE-5 and RTE-6 evaluation campaigns. This system was first described in (Iftene, 2009), then further improved in (Iftene and Moruz, 2010), and was used as a starting point for applying the improvements and algorithms described in chapter 3 of this thesis.

4.1.1. Pre-Processing

The preprocessing module is concerned with the transformation of the raw text received as input into a structured format, upon which entailment rules can then be applied.

Since the data sets contain text that may have grammatical or spelling errors, and also because some of the processing tools behave poorly when encountering certain phenomena, such as contractions, some modifications are performed over the input text (Iftene and Moruz, 2010). Thus, all contractions are expanded: “hasn’t” is replaced with “has not”, “isn’t” with “is not”, “couldn’t” with “could not”, etc. The meaning remains the same after this transformation, but the quality of the MINIPAR (Lin, 1998) output is considerably improved. Also, before sending the text to LingPipe (see footnote 14), some punctuation signs such as quotation marks “”, brackets (), [], {} and commas are padded with spaces (each is replaced with the initial character surrounded by space characters). Again, the meaning of the text is the same, but the quality of the LingPipe output is improved after this transformation.
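The preparation step can be sketched in a few lines of Python; this is a minimal illustration rather than the system's actual implementation, and the contraction list is only a sample of the full inventory.

```python
import re

# Sample contractions only; the system's full list is larger.
CONTRACTIONS = {"hasn't": "has not", "isn't": "is not", "couldn't": "could not"}

def expand_contractions(text: str) -> str:
    """Replace contracted forms with their expanded equivalents."""
    for short, long_form in CONTRACTIONS.items():
        text = re.sub(re.escape(short), long_form, text, flags=re.IGNORECASE)
    return text

def pad_punctuation(text: str) -> str:
    """Surround quotation marks, brackets and commas with space characters."""
    return re.sub(r"([\"'()\[\]{},])", r" \1 ", text)

print(pad_punctuation(expand_contractions("He hasn't (yet) arrived, they say.")))
```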

After the preparation step, the text and the hypothesis are parsed with MINIPAR. For those cases in which MINIPAR does not identify any verb in the processed sentence, the system employs TreeTagger to identify, with a higher degree of precision, the part-of-speech (POS) tags, and replaces the incorrect POS tags determined by MINIPAR. This step is very important, especially in the case of verbs, because our algorithm begins its mapping with verbs and builds the comparison on the basis of verb arguments.

In parallel, the result obtained after preparation is processed by LingPipe, in order to identify named entities (NEs). In the case of named entities of type JOB and LANGUAGE, GATE (Cunningham et al., 2001) is used as an additional resource, because its gazetteer contains finer-grained classes of entities, which considerably increases the accuracy of NE extraction.

4.1.2. Main Module

The purpose of the main module is to map all the nodes from the hypothesis syntactic tree to at least one node in the syntactic tree of the text, in a similar manner to that described in (Iftene, 2008), (Iftene and Moruz, 2010) and (Iftene and Moruz, 2011). The mapping between entities can be done either directly (when entities from the hypothesis tree exist in the text tree, in the sense that we can identify a lexical match between entities in the hypothesis and the text) or indirectly (when entities cannot be mapped directly and require transformations using external resources). The mapping of each node yields a fitness value, based on the transformations involved in carrying it out (an exact match yields a fitness of 1, for example, and antonyms yield a fitness of -1), which indicates the similarity between entities of the text and the

14 LingPipe: http://www.alias-i.com/lingpipe/

hypothesis. Using the local fitness values, an extended local fitness is computed, and then, using all partial values, the system calculates a normalized value that represents the global fitness. When an entity from the hypothesis can be mapped to more than one entity from the text, the system selects the mapping which maximizes the global fitness.

The global fitness value is then used to determine the relation between the text and the hypothesis. This is done by comparing the global fitness of the entailment pair to two thresholds computed on the basis of the training corpus; these thresholds lie between -1 and 1, and they partition the solution interval into three separate intervals: entailment cases are characterized by global fitness values in the top interval, contradiction cases by values in the bottom interval, and unknown cases by values in the middle interval (Iftene, 2009).
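A minimal sketch of this decision rule is given below; the threshold values used here are hypothetical placeholders, the real ones being computed empirically on the training corpus.

```python
# Hypothetical thresholds in [-1, 1]; the real values are learned empirically.
T_LOW, T_HIGH = -0.2, 0.4

def three_way_decision(global_fitness: float) -> str:
    """Partition the [-1, 1] fitness interval into the three answer classes."""
    if global_fitness >= T_HIGH:
        return "ENTAILMENT"
    if global_fitness <= T_LOW:
        return "CONTRADICTION"
    return "UNKNOWN"

def two_way_decision(global_fitness: float) -> str:
    """Two-way entailment is derived from the three-way decision."""
    return "YES" if three_way_decision(global_fitness) == "ENTAILMENT" else "NO"
```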

4.1.3. Mapping Rules for Entailment Cases

In order to determine the global fitness for a given entailment pair, the entailment engine attempts a mapping of the nodes from the syntactic tree of the hypothesis to the nodes of the syntactic tree of the text (Iftene, 2008), (Iftene and Moruz, 2010), (Iftene and Moruz, 2011). For every node from the hypothesis tree which can be mapped directly to a node from the text tree, the local fitness value is 1 (the maximum value). When direct mapping is not possible (no direct lexical match), the system applies transformations to the unmatched hypothesis node, based on external knowledge sources, so that it becomes more similar to some node in the text. For verbs we use DIRT (Lin and Pantel, 2001) and transform the hypothesis tree into an equivalent one, in which the verb node is replaced with an equivalent form. This is supplemented by the use of VerbNet (see footnote 15) to determine the relation between the verbs in the hypothesis and the text, as described in (Moruz, 2010). This is the case in the example below, where the text contains “An English-born blues legend, passed away…” and the hypothesis contains “A musician has died…”. After using this resource, the hypothesis changes into “A musician passed away”, a form in which it is easier to compare the text and the hypothesis; in the end, the value of the global fitness score is increased.

15 VerbNet: http://verbs.colorado.edu/~mpalmer/projects/verbnet.html

76

In the case of named entities, the system either uses an acronym database or obtains information related to the entity from background knowledge (Iftene and Balahur-Dobrescu, 2008). Apart from the relations between acronyms and full names, relations such as part-of (the relation between Texas and USA, for instance) are also taken into account.

For nouns and adjectives WordNet (Fellbaum, 1998) is used, along with some of the relations from eXtended WordNet (see footnote 16), to look up synonyms, which are then mapped to nodes from the text tree. Also, on the basis of the WordNet taxonomy, we perform argument matching, as described in chapter 3.

For every transformation with DIRT or WordNet, the value of the local fitness is considered to be the value of the term similarity in the respective resource. If the mapping was carried out using background knowledge or the acronym database, the value of the local fitness is considered to be 1.

In the case of numerical data, the mapping process is not as straightforward as for nouns, for example, and some special situations need to be taken into account (Iftene, 2009). There are cases in which, even if the numbers from the text and the hypothesis do not correspond, certain quantifiers may change their meaning enough for a positive match. For solving these cases, the system creates intervals for both the text and the hypothesis expressions; if the interval from the text is contained in the interval from the hypothesis, it awards a local fitness value of 1. The quantifiers are taken from a list which contains expressions such as “more than”, “less than”, or words such as “over”, “under”, etc., which describe intervals that are unbounded at one end (Iftene, 2008), (Iftene and Moruz, 2010). In the case of intervals that are bounded at both ends, a local fitness value of 1 is awarded only if the interval from the hypothesis is included in the interval from the text (e.g. T: “John trains from Monday to Friday”, H: “John trains from Tuesday to Friday”), in opposition to the previous case, where the intervals in the text and the hypothesis were unbounded at one end.
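The interval rule can be sketched as follows; this is a minimal illustration using only a small sample of the quantifier list described above.

```python
import math

def to_interval(number, quantifier=None):
    """Map a number and an optional quantifier to a (low, high) interval."""
    if quantifier in ("more than", "over", "at least"):
        return (number, math.inf)
    if quantifier in ("less than", "under"):
        return (-math.inf, number)
    return (number, number)

def unbounded_match(text_iv, hyp_iv):
    """Unbounded case: the text interval must lie inside the hypothesis one."""
    return hyp_iv[0] <= text_iv[0] and text_iv[1] <= hyp_iv[1]

def bounded_match(text_iv, hyp_iv):
    """Bounded case: the containment test is reversed (hypothesis inside text)."""
    return text_iv[0] <= hyp_iv[0] and hyp_iv[1] <= text_iv[1]

# T: "at least 14 people"  vs.  H: "more than 10 people"
print(unbounded_match(to_interval(14, "at least"), to_interval(10, "more than")))  # True
```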

If any of the numbers in the text or the hypothesis has an attached unit of measure, it is always kept, as it is possible to find the same numbers in the text and the hypothesis, but to have those numbers referring to different entities:

16 eXtended WordNet: http://xwn.hlt.utdallas.edu/


T: At least 14 people have been killed in a suicide bomb attack; government officials were among the 35 injured.

H: 35 government officials were injured.

4.1.4. Mapping Rules for Negative Entailment Cases

First, if after all checks are made we cannot map a node from the hypothesis syntactic tree, the value of the node's local fitness is weighed down; the weight depends on the type of entity in question, as some entities are more important for determining entailment (for example, since named entities are very indicative of entailment, the failure to map a NE from the hypothesis to the text inserts a very large penalty, while the failure to map a common noun results in a lighter penalty). Also, because stop words from the hypothesis (“the”, “an”, “a”, “at”, “to”, “of”, “in”, “on”, “by”, etc.) artificially increase the value of the global fitness, they are not taken into consideration when calculating the global fitness (Iftene, 2008), (Iftene and Moruz, 2010), (Iftene and Moruz, 2011).

Since No Entailment cases can be further separated into contradiction and unknown cases, the system employs different sets of rules to identify these distinctive cases and assign the corresponding values for local fitness (in general, rules designed for finding contradiction tend to weigh down local fitness values with negative numbers, while rules designed for finding unknown cases use weights that are closer to 0).

The most indicative element for contradiction is verb negation. To every verb from the hypothesis the system attaches a Boolean value which indicates whether that verb is negated or not. For determining negation, the verb's subtree is examined for words such as “not”, “never”, “cannot”, etc. Each of these words successively negates the truth value of the variable attached to the verb, which is initialized with “false”.
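A minimal sketch of this negation check is given below, with the verb's subtree simplified to a flat list of tokens.

```python
NEGATION_WORDS = {"not", "never", "cannot"}

def is_negated(verb_subtree_tokens) -> bool:
    """Toggle the truth value, initialized with False, for each negation
    word found in the verb's subtree."""
    negated = False
    for token in verb_subtree_tokens:
        if token.lower() in NEGATION_WORDS:
            negated = not negated
    return negated

print(is_negated(["will", "never", "be", "allowed"]))  # True
print(is_negated(["did", "not", "never", "stop"]))     # False (double negation)
```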

For determining contradiction, several situations are considered, the most common of which is the negation in the hypothesis of a semantic equivalent of a verb from the text. In the example below, the text is “Movie studio company, New Line Cinema has announced that movie director Peter Jackson will never be allowed to work on another New Line films.” and the hypothesis is “New Line wants to work with Peter Jackson.”. Another type of contradiction case is that of long infinitive verbs preceded by words such as “refuse”, “deny”, “ignore”, “plan”, “intend”, “proposal”, “able”, etc.

Contradiction is also determined on the basis of the semantic opposition relation between synsets that contain words from the text and synsets that contain words from the hypothesis. For determining semantic opposition, the system uses the [opposite-of] relation from VerbOcean (Chklovski and Pantel, 2004), the antonymy relation from WordNet and the inter-verb relations extracted using VerbNet (all these resources are used for greater coverage). The domain of the antonymy relation is broadened by combining synsets and antonyms in WordNet or opposites from VerbOcean. For words from the hypothesis which cannot be mapped to words from the text using either synonymy or antonymy, the system considers the set of antonyms for their synonyms and then checks if any entity from this new set can be mapped to an entity in the text.
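The broadening of the antonymy relation can be sketched with NLTK's WordNet interface, as below; note that the actual system also consults VerbOcean and VerbNet, which are not queried in this illustration.

```python
# Requires NLTK with the WordNet corpus installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def antonyms_of_synonyms(word: str) -> set:
    """Collect the antonyms of every synonym of `word`."""
    result = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for antonym in lemma.antonyms():
                result.add(antonym.name())
    return result

def semantically_opposed(word_h: str, word_t: str) -> bool:
    """True if one word appears among the antonyms of the other's synonyms."""
    return (word_t in antonyms_of_synonyms(word_h)
            or word_h in antonyms_of_synonyms(word_t))
```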

In some situations, the relation extracted from DIRT is in fact an antonymy relation (the scores in DIRT are more similar to co-occurrence scores), and for this reason an extra check of DIRT-related verb pairs is performed, in order to see whether an antonymy relation exists between them in either WordNet or VerbOcean. For all identified contradiction cases, the system applies the greatest penalty possible, so the final answer for the considered pairs will be “Contradiction”; however, contradiction cases need to be clear, in the sense that a verb in the hypothesis is a negation of a synonym of a verb in the text, or an entity in the hypothesis is an antonym of an entity in a similar position in the text, so that the system solves contradiction consistently (Iftene, 2008), (Iftene and Moruz, 2010), (Iftene and Moruz, 2011).

If the text or hypothesis contains modifiers such as “may”, “can”, “should”, “could”, “must”, “might”, “infrequent”, “rather”, “probably”, etc., ahead of already aligned verbs (verbs are aligned using the rules described for the positive cases: synonymy, co-occurrence, consequence, etc.), the penalties are not decisive in establishing the final answer, which is obtained only after computing global fitness; this is because these types of words weigh down the local fitness of the modified word by a smaller amount, and usually point to an unknown case.

In the case of named entities, however, the solution chosen is different, and relies on the previously stated observation that all the named entities in the hypothesis need to be aligned with entities in the text. If, even after using the acronym database and the external knowledge base, there exists a named entity in the hypothesis that cannot be mapped to an entity in the text, the conclusion is that the available data is not sufficient for determining a precise answer for the given entailment pair, and the result for the pair is “Unknown”.

An exception to the named entity rule presented above is the case when the entity is a first name, in which case we only insert a penalty into the global fitness; indeed, it can be argued that, for the pair below, the alignment holds because the named entities in the text and the hypothesis are in a coreference relation.

T: A man is accused of killing Ms. Zapata.

H: Angie Zapata has been killed.

In the general case, if there are enough entities in the hypothesis that are not aligned with elements in the text, or if enough of the aligned entities are preceded by modal modifiers, the global fitness for the pair tends toward 0 (which means unknown), a fact which confirms the experimental observation that the text and the hypothesis in unknown pairs are relatively unrelated on the lexical-semantic level.

For the Main Task in the RTE-6 challenge it was no longer necessary to separate the no entailment cases into “Unknown” and “Contradiction”. Because of this, the threshold separating these two sets of pairs is ignored, and the same answer is given in both cases, but it is important to note that the option to separate unknown and contradiction is still available.

The RTE-5 track (see footnote 17) at TAC 2009 continues the previous RTE challenges, which have aimed to focus research and evaluation on the underlying semantic inference task. The Fifth Recognizing Textual Entailment Challenge (RTE-5) brought two significant changes compared to RTE-4: 1) the average length of the texts was greater, and 2) the texts came from a variety of sources, without any additional corrections or simplifications compared to the source documents. (Iftene and Moruz, 2010) describes the system in detail.

The system used for RTE-5 represents an improvement over the previous systems from RTE-3 (Iftene and Balahur-Dobrescu, 2007) and RTE-4 (Iftene, 2008). We also added new modules and used new semantic resources in order to deal with the new changes, and with the aim of better identifying Unknown cases. Figure 1 shows the system we have employed, with the newly added components highlighted in grey (Iftene and Moruz, 2010):

17 http://www.nist.gov/tac/tracks/2009/rte/

[Figure 1 (diagram): the RTE-5 system architecture. A pre-processing stage (Preparation module; MINIPAR and TreeTagger produce dependency trees for the 600 (T, H) pairs; LingPipe and GATE produce named entities, jobs & languages, numbers & dates) feeds the main module (NE rule, fitness rule, negation rule, contradiction rule), which consults the resources DIRT, acronyms, background knowledge, Wikipedia, WordNet, VerbOcean and the Google API to produce the final result.]

Figure 1: RTE-5 system architecture

The 6th Recognizing Textual Entailment Challenge changed significantly compared to previous challenges, by moving the task to a more realistic scenario. According to the RTE-6 guidelines (see footnote 18), building on the promising outcome of the RTE-5 Pilot Search Task (see footnote 19), RTE-6 has two goals:

- to advance the state of the art in RTE, by proposing a data set which reflects the natural distribution of entailment in a corpus and presents all the problems that can arise while detecting textual entailment in a natural setting, such as the interpretation of sentences in their discourse context;

- to further explore the contribution that RTE engines can make to summarization applications. In a general summarization setting, correctly extracting all the sentences

18 RTE-6: http://www.nist.gov/tac/2010/RTE/
19 RTE-5: http://www.nist.gov/tac/2009/RTE/


entailing a given candidate statement for the summary (similar to hypotheses in RTE) corresponds to identifying all its mentions in the text, which is useful for assessing the importance of that candidate statement for the summary and, at the same time, for detecting those sentences which contain redundant information and should probably not be included in the summary. Furthermore, if automatic summarization is performed in the Update scenario (where systems are required to write a short summary of a set of newswire articles, under the assumption that the user has already read a given set of earlier articles), it is important to distinguish between novel and non-novel information. In such a setting, RTE engines which are able to detect the novelty of hypotheses can help summarization systems filter out non-novel sentences from their summaries.

The basis for the system used in RTE-6 was the system created for RTE-5 (Iftene and Moruz, 2010), which was further refined and honed. Most of the modifications consist of tweaking various thresholds, performing extra pre-processing of the pairs, adding a new data source in the form of VerbNet, and modifying the rules to accommodate the new knowledge source. The system consists of a preprocessing module and a decision module, which contains the rules that determine the value of the entailment pair. A schema of the system is given in Figure 2 below. Broadly, the system first runs the input through a preprocessing stage, which expands contractions, recognizes NEs, performs dependency parsing and recognizes dates. The processed input is then fed into the decision module, which, using resources such as WordNet, VerbOcean, VerbNet, and so on, makes a decision on the entailment value of the given pair. The dotted resource module in Figure 2 shows where the current method for argument matching is being added, in preparation for the RTE-7 challenge. Previous versions of the argument matching procedures were included in the WordNet module in the form of entailment rules.

In sections 4.2 and 4.3 below, we give a detailed description of the performance of our system. The results described are obtained on the data sets of the latest Recognizing Textual Entailment challenges; the reasons for choosing these data sets are twofold: the RTE challenge provides a framework for testing textual entailment engines, allowing for the comparison of different systems, and the data sets provided are intended to advance the current state of textual entailment, by incrementally proposing more difficult problems.

[Figure 2 (diagram): the RTE-6 system architecture. A pre-processing stage (Preparation module; MINIPAR and TreeTagger produce dependency trees for the 243 H and 19,686 T; LingPipe and GATE produce named entities, jobs & languages, numbers & dates) feeds the main module (NE rule, fitness rule, negation rule, contradiction rule, feature set unification, 3 thresholds), which consults the resources VerbNet, DIRT, acronyms, background knowledge, Wikipedia, WordNet, VerbOcean and the Google API to produce three final results.]

Figure 2: RTE-6 system architecture

4.2. Results in RTE-5

This section describes the results obtained in the 5th RTE challenge. (Iftene and Moruz, 2010) describes the participation at large; it is worth noting that our entry for RTE-5 won both main tasks (the two-way and three-way challenges) by a margin of 5%, and that our solution for the pilot task was fourth overall (Bentivogli et al., 2010). The distribution of our answers in the 3-way task is presented below:

Answer Type    | In Gold | Correct offered by our system | Total offered by our system | Precision | Recall | F-measure
Entailment     | 300     | 260                           | 379                         | 68.60%    | 86.67% | 76.58%
Contradiction  | 90      | 22                            | 44                          | 50.00%    | 24.44% | 32.84%
Unknown        | 210     | 128                           | 177                         | 72.32%    | 60.95% | 66.15%
Total          | 600     | 410                           | 600                         | 68.33%    |        |

Table 12: Results in RTE-5 on the 3-way task


As we can see, the highest precision is for the Unknown case (72.32%), and the lowest precision is for the Contradiction case (50.00%). Also, the highest recall and F-measure are obtained for the Entailment case (86.67% and 76.58%), and the lowest for the Contradiction case (24.44% and 32.84%). The system offers many Entailment answers and catches almost all possible Entailment cases; it also offers a lower number of Contradiction and Unknown answers, but most of these are correct.

For the 2-way task, the distribution is presented in the table below:

Answer Type | In Gold | Correct offered by our system | Total offered by our system | Precision | Recall | F-measure
Yes         | 300     | 260                           | 379                         | 68.60%    | 86.67% | 76.58%
No          | 300     | 181                           | 221                         | 81.90%    | 60.33% | 69.48%
Total       | 600     | 441                           | 600                         | 73.50%    |        |

Table 13: Results in RTE-5 on the 2-way task

The results are similar to those of the 3-way task, and we notice the very high precision for No cases (81.90%), where out of the 221 answers offered by our system 181 are correct. The difference between the global precision on the 2-way task and on the 3-way task means that in 31 out of 221 cases the system does not correctly distinguish between Contradiction and Unknown cases.

We can see in Table 14 that the system used for RTE-5 obtained results comparable to those from RTE-4, with an improvement on data from the QA task. In comparison with the results from RTE-3, we can see significant improvements on the IR and IE tasks; for the QA task, however, the best results were obtained in RTE-3.

Source of test data | RTE-3   | RTE-4   | RTE-5
IR                  | 69.00 % | 82.00 % | 84.0 %
QA                  | 87.00 % | 63.00 % | 70.5 %
SUM                 | 63.50 % | 78.00 % | N/A
IE                  | 57.00 % | 64.33 % | 66.0 %
Total               | 69.13 % | 72.10 % | 73.5 %

Table 14: Comparison of results between RTE-3, RTE-4 and RTE-5

4.2.1. Ablation Tests

In order to determine the contribution of individual components and data sources, the systems for RTE-3, RTE-4 and RTE-5 were run in turn with each component removed (Iftene and Moruz, 2010). Table 15 presents these results in parallel for RTE-3, RTE-4 and RTE-5, where P = Precision, C = Contribution and WR = Weighted Relevance.

System Description          | RTE-3 (69.13 %)    | RTE-4 (72.1 %)     | RTE-5 (73.5 %)
                            | P     C     WR     | P     C    WR      | P     C     WR
DIRT                        | 68.76 0.37  0.54   | 71.40 0.7  0.97    | 73.33 0.17  0.23
WordNet                     | 68.00 1.13  1.63   | 69.10 3.0  4.16    | 72.50 1.00  1.36
Acronyms                    | 68.38 0.75  1.08   | 71.80 0.3  0.42    | 73.33 0.17  0.23
BK                          | 67.75 1.38  2.00   | 70.40 1.7  2.36    | 72.33 1.17  1.59
NE rule                     | 57.58 11.55 16.71  | 66.90 5.2  7.21    | 67.33 6.17  8.39
Negation rule               | 67.63 1.50  2.17   | 68.70 3.4  4.72    | 73.50 0.00  0.00
Contradiction rule          | -     -     -      | 68.10 4.0  5.55    | 71.50 2.00  2.72
Additional processing steps | -     -     -      | -     -    -       | 69.33 4.17  5.67
Total                       |       16.68 24.13  |       18.3 25.39   |       14.85 20.20

Table 15: Component relevance for the 2-way task (P, C and WR in %)

The values in each of the columns were obtained in the manner described below:

- Precision_Without_Component was obtained by running the RTE-3 system without a specific component (for example, Precision_Without_DIRT is 68.76 %, the precision of the RTE-3 system without the DIRT component);

- Contribution_Component = Full_system_precision − Precision_Without_Component (for example, Contribution_DIRT = 69.13 % − 68.76 % = 0.37 % for the DIRT component of the RTE-3 system, where 69.13 % is the precision of the full RTE-3 system and 68.76 % is the precision of the RTE-3 system without the DIRT component);

- WeightedRelevance_Component = (100 × Contribution_Component) / Full_system_precision (for example, for the DIRT component in RTE-3, WeightedRelevance_DIRT = 100 × 0.37 / 69.13 = 0.54 %).
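A minimal sketch of these two metrics, checked against the DIRT figures of the RTE-3 system, is given below.

```python
def contribution(full_precision: float, precision_without: float) -> float:
    return full_precision - precision_without

def weighted_relevance(full_precision: float, precision_without: float) -> float:
    return 100 * contribution(full_precision, precision_without) / full_precision

# DIRT component of the RTE-3 system:
print(round(contribution(69.13, 68.76), 2))        # 0.37
print(round(weighted_relevance(69.13, 68.76), 2))  # 0.54
```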

The results in Table 15 show that the system's rules related to negation, named entities and contradictions are the most important. For RTE-5 we also performed ablation tests for the module comprising the additional processing steps, which include the preparation of the input data, the identification of named entities (with our patterns or using GATE), and the running of TreeTagger.

Table 16 presents a comparison between ablation tests performed on RTE-5 data for the 2-way and 3-way tasks:

System Description          | 2-way (73.5 %)  | 3-way (68.33 %)
                            | P (%)   C (%)   | P (%)   C (%)
DIRT                        | 73.33   0.17    | 68.00   0.33
WordNet                     | 72.50   1.00    | 67.00   1.33
Acronyms                    | 73.33   0.17    | 68.17   0.17
BK                          | 72.33   1.17    | 66.83   1.50
NE rule                     | 67.33   6.17    | 63.33   5.00
Negation rule               | 73.50   0.00    | 66.83   1.50
Contradiction rule          | 71.50   2.00    | 69.67   -1.34
Additional processing steps | 69.33   4.17    | 64.33   4.00
Total                       |         14.85   |         12.49

Table 16: Component relevance in RTE-5

We can see in the table above the importance of the resources used for the 2-way and 3-way tasks. It is interesting to see that one of the most valuable rules of the RTE-4 system, the rule that identifies contradictions, has a negative contribution to the overall result of the system for the three-way task.

4.2.2. Pilot Task

RTE-5 introduced a pilot task, concerning the extraction, from a series of newspaper articles, of texts that yield positive entailment for a given set of hypotheses. The difficulty of the task is twofold: first, the texts are not modified in any way compared to the original source, so they may contain spelling errors, sentences with grammar errors, abbreviations, contractions, etc. The second problem is that there is a large number of candidate pairs: for every one of the nine topics there are about ten hypotheses, and for every hypothesis in a topic the number of candidate pairs is equal to the number of sentences. This leads to a very large search space, and reducing it becomes very important.

In order to reduce the search space, we have made use of a technique used for our question answering systems, described in (Iftene et al., 2009a). First, using Lucene, we have indexed the articles from each topic at the sentence level, thus obtaining nine indexes. Then we have built queries for all the hypotheses by removing all punctuation and stop words, and used them to extract the relevant text snippets. Based on experiments on the training data, we have determined that the snippets with the highest chance of yielding positive entailment are clustered around the top scoring snippets, and that the first item that is not in the cluster has a Lucene score at least three times lower than that of the last item in the cluster. We have also empirically determined that the smallest feasible number of candidates is ten, and that a candidate number above twenty is too large. In practice, the number of candidates selected is almost always above fifteen.
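The candidate selection heuristic can be sketched as follows; the hits argument stands in for the (sentence, score) list returned by the Lucene query, which is assumed here rather than reproduced.

```python
MIN_CANDIDATES, MAX_CANDIDATES, SCORE_RATIO = 10, 20, 3.0

def select_candidates(hits):
    """Keep top-scoring snippets until one scores at least three times lower
    than the last selected snippet, within the empirical [10, 20] bounds.
    `hits` is a list of (sentence, score) pairs with positive scores."""
    hits = sorted(hits, key=lambda h: h[1], reverse=True)
    selected = hits[:1]
    for sentence, score in hits[1:]:
        if len(selected) >= MAX_CANDIDATES:
            break
        if len(selected) >= MIN_CANDIDATES and selected[-1][1] / score >= SCORE_RATIO:
            break  # first item outside the top-scoring cluster
        selected.append((sentence, score))
    return selected
```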

In order to determine the entailment value of the candidate pairs (approximately 1700 in all), we have applied a lightweight version of our entailment system. The results are given in Table 17 below:

Result                   | Precision | Recall | F-measure
Micro-average            | 51.12%    | 22.88% | 31.61%
Macro-average Topic      | 53.03%    | 24.08% | 33.12%
Macro-average Hypothesis | 46.55%    | 26.42% | 33.71%

Table 17: Results for the RTE-5 pilot task

4.3. Results in RTE-6

The basic system described in section 4.1 was further improved for the next challenge (RTE-6), where it was entered in the main and novelty detection tasks of the RTE-6 evaluation campaign with three distinct runs, obtained by running the system with different thresholds. The results are given in Table 18 below:

Run ID | Precision | Recall | F-measure
001    | 22.89%    | 27.20% | 24.85%
002    | 14.02%    | 39.15% | 20.64%
003    | 31.49%    | 17.46% | 22.46%

Table 18: Results for the RTE-6 Main Task

The first run was obtained with the thresholds set to maximize both precision and recall. Run two was obtained by lowering the threshold separating the entailment and non-entailment cases; this is the reason for its higher recall and lower precision. The third run was obtained by raising the value of the threshold, which explains its mirrored results (higher precision and lower recall).


The average, top and bottom results in the Main task are 33.72%, 48.01% and 11.60%, respectively. The lower results we obtained for RTE-6 are due to the significant change in the data sets; also, we did not take into account the information available in the discourse, due to the lack of tools for performing such analyses (we did not include a coreference engine, for example).

The results for the novelty task are given in table 19 below:

Run ID | Precision | Recall | F-measure
001    | 81.40%    | 70.00% | 75.27%
002    | 81.54%    | 53.00% | 64.24%
003    | 73.28%    | 85.00% | 78.70%

Table 19: Results for the RTE-6 Novelty Detection Task

The runs for this task were obtained by running the system with the settings described above (in terms of thresholds); the reason for run 3 having the highest score lies in the scoring method. The average, top and bottom scores for this task were 77.84%, 82.91% and 43.98%, respectively. The reason for which the third configuration of the system performed better on the novelty detection subtask is that most of the sentences were not novel. Because of this, by raising the entailment threshold, we reduced the number of false positive entailment decisions, and thus raised our recall greatly, while only slightly lowering our precision.

4.3.1. Ablation Tests

In order to determine each component's relevance, the system was run in turn with each component removed. This technique was first employed for the RTE-3 system and was subsequently used in RTE-4 and RTE-5 (starting with RTE-5, ablation tests are mandatory for all participants in the RTE challenge, as they provide a good relevance score for various NLP resources). Table 20 presents these results for the system used in RTE-6, where P = Precision, R = Recall, F = F-measure, C = Contribution and WR = Weighted Relevance (Contribution and Weighted Relevance are computed with regard to F-measure). Ablation tests are given for our best scoring run.

System Description | P (%)  | R (%)  | F (%)  | C (%)  | WR (%)
DIRT and VerbNet   | 25.86  | 26.98  | 26.41  | -1.56  | -6.27
BK                 | 23.91  | 22.01  | 22.92  | 1.93   | 7.76
NE rule            | 25.86  | 26.98  | 26.41  | -1.56  | -6.27
Negation rule      | 22.67  | 28.25  | 25.15  | -0.30  | -1.20
Contradiction rule | 22.87  | 27.30  | 24.89  | -0.04  | -0.16

Table 20: Component relevance for the RTE-6 main task

The meanings of the columns are the following:

- Precision_Without_Component was obtained by running the system without a specific component (for example, Precision_Without_DIRT is 25.86 %, the precision of the system without the DIRT and VerbNet component);

- Recall_Without_Component was obtained in the same way (for example, Recall_Without_DIRT is 26.98 %, the recall of the system without the DIRT and VerbNet component);

- F-measure_Without_Component was obtained in the same way (for example, F-measure_Without_DIRT is 26.41 %, the F-measure of the system without the DIRT and VerbNet component);

- Contribution_Component = Full_system_F-measure − F-measure_Without_Component (for example, Contribution_DIRT = 24.85 % − 26.41 % = −1.56 %, where 24.85 % is the F-measure of the full system and 26.41 % is the F-measure of the system without the DIRT and VerbNet component);

- WeightedRelevance_Component = (100 × Contribution_Component) / Full_system_F-measure (for example, for the DIRT and VerbNet component, WeightedRelevance_DIRT = 100 × (−1.56) / 24.85 = −6.27 %).


Upon examination of the output provided for the RTE-6 challenge main task, we discovered that, due to a bug in the reporting of the results, some of the results were mismatched (the results reported for some of the pairs were actually extracted for other pairs). We want to emphasise the fact that the entailment system works properly, and that the only problem lies in the reporting of the results. Because the organizers of the RTE-6 challenge had not provided the gold standard for the main task at the time of writing, we were unable to check the quality of the results after the reporting bug was fixed. Because of this issue, the official results do not give a clear picture of the system's performance; also, because of the low correlation between the entailment pairs and the obtained results, the ablation tests are not relevant in the context of the RTE-6 main task. The results for the novelty detection subtask of the RTE-6 challenge are valid.

4.4. Error Analysis

Concurrently with the ablation tests, we have also carried out an error analysis over the runs submitted for evaluation on the RTE-5 and RTE-6 main task data sets. The error analysis was carried out over the best scoring runs, as we made the assumption that the relevant types of errors are due to underlying conceptual problems rather than to specific parameter settings or degrees of resource coverage, and will therefore appear in any run.

In the course of the error analysis, we have found that the common error types differ widely between the RTE-5 data set and the RTE-6 data set. The difference is partly due to the fact that the addition of new resources and methods solved many errors, and partly due to the significant changes to the task. Below we give a set of common errors, grouped according to type, starting with the RTE-5 data set.

One of the most common errors we have found in the RTE-5 test run is the incorrect linking of entities (usually the linking of Named Entities) in terms of coreference. Because of this, sets of properties are not correctly transferred, and correct entailment decisions cannot be drawn. This is the case in pair 311, given, in part, below:

T: Grey's Anatomy star Katherine Heigl and her musician hubby Josh Kelley are adopting a baby from Korea, so says the National Enquirer ...

H: A Korean child is going to be adopted by the "Grey's Anatomy" movie.


The error in this case comes from the fact that we do not recognize the coreference relation between "star" and "Katherine Heigl". Because of this, "Grey's Anatomy" is not recognized as an attribute of the entity "Katherine Heigl", and the system considers that the subject of the relevant sentence is "Grey's Anatomy"; the reported result is therefore ENTAILMENT, instead of CONTRADICTION. This issue can be corrected by appropriate linking of entities. A similar problem is found for pair 337:

T: The march was organised by Swede Annie Börjesson's mother and her best friend Maria Jansson, who between them campaigned for a fatal accident inquiry after Annie's body was found washed ashore at Prestwick in December 2005.

H: Maria Jansson was friends with Annie Börjesson.

The coreference relation that was not recognized is the link between "Annie Börjesson" and "her", and because of this, the system reported UNKNOWN instead of ENTAILMENT.

Another common error is the incorrect recognition of named entities. This type of error is highlighted by pair 209, given below:

T: Brazilian federal police arrested Jesse James Hollywood, aged 25, in Saquarema city, Rio de Janeiro, on March 8.

H: Jesse James was arrested in Hollywood.

The incorrectly delimited named entity is "Jesse James Hollywood", which is separated into "Jesse James" and "Hollywood"; since "Hollywood" is recognized as a location, the system reports ENTAILMENT instead of CONTRADICTION. The incorrect recognition of the named entity is due to the fact that we use a named entity recognizer which seems to lack robustness when encountering new names. This issue can also be seen in pair 551, where the gazetteer based NER system does not recognize "H5N1" as a named entity; because of this, the rule stating that a NE present in the hypothesis but not present in the text automatically means UNKNOWN does not fire, and the system detects a CONTRADICTION instead of UNKNOWN:

T: Mr Lu was later found to have survived but suffered serious injuries. Mr Lu has told the Guardian that he was battered unconscious and later driven hundreds of miles to his home town where he is now recuperating. Civil rights lawyers said they were considering a legal case against his attackers, thought to be a group of thugs hired by the local authorities to put down an anti-corruption campaign against the chief of Taishi village.

H: Mr. Lu suffers from the H5N1 virus.


The problem of incomplete named entity recognition can be solved in two ways: either by expanding the name database to include more names by using an external resource such as Wikipedia, or by employing a heuristic based NER system, which has greater robustness and the advantage that it is not dependent on external knowledge sources.

A less common type of error involves the incorrect analysis of predicates, in terms of argument identification, for example. The problem of incorrect treatment of predicate arguments can be observed in pair 560, where the system does not recognize the switched arguments in the hypothesis, and therefore reports ENTAILMENT instead of CONTRADICTION:

T: Protests have continued after Manchester United defeated Lille in the Champions League round of 16.

H: Lille has defeated Manchester United.

A connected problem is that the system aligns the text and the hypothesis at the syntactic relation level, and sometimes mixes relations from different sentences, without taking into account the predicate under which they appear. The problem of treating the text and hypothesis as bags of links instead of predication driven links is highlighted in pair 302, where the system prefers the link between "Barack Obama" and "EU leaders" instead of the link to "US President". This pair can also be solved by using coreference resolution, as the entities "Barack Obama" and "US president" are in a coreference relation; since "Barack Obama" is, therefore, a "US president", it follows that he cannot simultaneously be "an EU leader", thus proving that the solution for the current entailment pair is CONTRADICTION.

T: Barack Obama is due to end his first overseas trip as US President today. Mr Obama has attended a series of engagements in Europe over the past week, including the G20 summit in London, a NATO meeting in Strasbourg and a conference of EU leaders in Prague.

H: Barack Obama is an EU leader.

This problem can be solved by determining that both "Barack Obama" and "US president" are linked by a common predicate ("is due to end"), while "EU leaders" is not directly linked to "Barack Obama". Pair 200 is another example of incorrect predicate analysis, as the fact that the system does not make use of the semantics of the verb "defeat" leads to an incorrect result. The semantics of the verb can be analyzed by using VerbNet or FrameNet, which describe the relations between arguments:


T: Barack Obama made history on 4 November 2008 when he defeated Republican rival John McCain ...

H: Barack Obama was a Republican candidate for the US presidency.

We have also found some other types of errors, such as incorrect global fitness calculation (pair 207), faulty application of rules (pair 459) or ignoring entities in the hypothesis (pair 443), but these are less common.

The error analysis was also carried out over the results obtained on the RTE-6 test set. The RTE-6 main task differs from all of the previous RTE main tasks in the way texts are chosen for creating entailment pairs, in the sense that an RTE-6 candidate pair may contain references to information outside the entailment pair. The error analysis showed that this difference is the main source of errors in the output of our system, as can be seen below.

After examining the results of our previous system, we have found that, in order for entailment to hold, all of the named entities in the hypothesis need to be found in the text. While this is simple to test in the classical setting of the RTE challenge, in the summarization setting this rule does not always work, one of the reasons being gapping (information that is not explicitly present in the surface form of a text). This is a common type of error in the RTE-6 test set, as highlighted in the pair given below:

T: Bush made a statement in the White House Rose Garden, shortly after Justice Sandra Day O'Connor announced her retirement.

H: Sandra Day O'Connor retired from the Supreme Court.

The result reported by the system is UNKNOWN, instead of ENTAILMENT, because one of the named entities in the hypothesis, "Supreme Court", is absent from the text. Entailment does hold, however, because we can use information already given for the article in the title or in the synopsis (usually the first sentence of a newspaper article) to augment the knowledge in each sentence. For the current example, the title of the article is "Bush promises timely announcement on new Supreme Court Justice", and given this extra information, it is easy to see that entailment holds. This error can also be observed in the example given below, where the article title is "Abortion issue rallies conservatives, liberals ahead of Supreme Court fight by Stephanie Griffith":

T: "The worst has happened with the resignation of Sandra Day OConnor," the group said on its website.

H: Sandra Day O'Connor retired from the Supreme Court.


The solution we propose for this type of error is to consider certain information in any article (the title of the article could be a good approximation) as underlying every statement in that article. In practice, this would mean adding the text of the title to any candidate sentence.

Another way of referring to entities outside the entailment pair is coreference. While in previous data sets this was not an issue, because of the way in which those data sets were created, the lack of coreference resolution is now a large source of errors. Below we give an example of such an error:

T: "This is to inform you of my decision to retire from my position as an associate justice of the Supreme Court of the United States, effective upon the nomination and confirmation of my successor," she said in a letter to Bush. H: Sandra Day O'Connor retired from the Supreme Court. Because the named entity “Sandra Day O'Connor” cannot be found in the candidate text, the system output is UNKNOWN, instead of ENTAILMENT. However, if the previous sentence in the document is taken into account (given below), the coreference relation between “my”, “her” and “Sandra Day O'Connor” can be deduced, thus adding the missing named entity to the text. This type of error is one of the most common in the current entailment system.

P: Sandra Day O'Connor, the first woman ever appointed to the US Supreme Court, said Friday that she is retiring, giving US president George W. Bush his first opportunity to appoint a justice.

Nominal coreference is also needed, as there are cases where entailment cases are missed because entities are referred to by different names (or by parts of names), which leads the system not to align those entities and therefore make an incorrect decision. An example of such an entailment pair is given below:

T: Tyco, which has about 270,000 employees and $36 billion (euro27.5 billion) in annual revenue, makes a wide range of products including electronics, medical supplies and security devices.

H: Tyco International Ltd. has about 250,000 employees.

In this example, "Tyco" is recognized as a named entity, but since the NE in the hypothesis is "Tyco International Ltd.", the system decides that there is no viable mapping for it in the text, which leads to an UNKNOWN judgement, instead of ENTAILMENT. The solution for this type of error is to solve nominal coreference, so that the semantic relation between the two entities can be identified.

Another common source of errors in the current system is the background knowledge, as it does not have sufficient coverage and overfits some entailment pairs. An example of BK overfitting can be seen below:

T: The court will review, in the Kentucky case, the legality of displaying framed copies of the Ten Commandments in two courthouses among copies of other documents that form the basis of US laws.

H: There is a Ten Commandments monument on the grounds of the Texas Capitol.

In the case of this entailment pair, the system first attempts to lexically match the named entity "Texas" to an entity in the text. Since this matching fails, the system resorts to alternative methods, such as using background knowledge. Since "Texas" is a part of "US", according to BK, the system decides to align these entities in the text and hypothesis, and the result for the pair is ENTAILMENT, instead of UNKNOWN. This is due to an incorrect application of the rule stating that a part of a whole can refer to the whole; this is a directional relation, and it only holds if the left hand side member of the part-of relation is in the text and the right hand side member of the relation is in the hypothesis. A possible solution to this problem is to apply background knowledge based decisions more strictly, particularly for relations of the type "part-of" and similar relations, and to pay special attention to the directional nature of some of the lexical-semantic relations employed by our rules.

There are also cases where the background knowledge resource does not cover all of the required cases, as can be seen in the next example:

T: The Texas case involves the legality of a stone monument inscribed with the commandments installed on the grounds of the state legislature.

H: There is a Ten Commandments monument on the grounds of the Texas Capitol.

For this pair, our system returns UNKNOWN instead of ENTAILMENT, because it does not recognize any entity in the text to which the named entity "Texas Capitol" can be aligned. However, since there is a coreference relation between "Texas" and "state", with proper background knowledge (the legislature of an American state is a Capitol), it can be deduced that "state legislature" (or "Texas legislature", because of coreference) is equivalent to "Texas Capitol".

Named entity recognition is a large source of errors as well, particularly in the case of less common names (such as names of books, songs, etc.), because not recognizing a named entity in the hypothesis greatly weakens its constraints over candidate texts, allowing for entailment decisions that would usually not even be considered by the system. An example of such an error is given below:

T: Betty Friedan, a founder of the modern feminist movement in the United States, died here Saturday of congestive heart failure, feminist leaders announced.

H: Betty Friedan is the author of "The Feminine Mystique."

In the case of this entailment pair, the error is derived from the analysis of the hypothesis, as the NER system does not recognize "The Feminine Mystique" as a name. Since the only recognized entity in the hypothesis is "Betty Friedan", the fact that the candidate text has no mention of the book title merely reduces the fitness score instead of rejecting the pair outright. This type of error seems to be common, as it leads to 7 incorrect ENTAILMENT results for the above hypothesis alone, a very large error margin for a single hypothesis. On average, a hypothesis may have up to 100 candidate texts, of which about 5% are entailments (Bentivogli et al., 2011). This problem can be addressed by using a heuristic based RTE engine that can compensate for incomplete knowledge, as it is unfeasible to create, manage and use a sufficiently comprehensive gazetteer. A simple heuristic that would solve this issue is based on the fact that the book title in this example is surrounded by inverted commas; if all the words between the inverted commas, apart from prepositions and conjunctions, are capitalized, then the text delimited by the inverted commas is a named entity.
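A minimal sketch of this heuristic is given below; the function word list is an illustrative sample.

```python
import re

# Sample list; prepositions and conjunctions are exempt from the
# capitalization requirement.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to"}

def quoted_named_entities(text: str):
    """Treat a quoted span as a named entity when every word in it is
    either capitalized or a function word."""
    entities = []
    for span in re.findall(r'"([^"]+)"', text):
        words = span.split()
        if words and all(w[0].isupper() or w.lower() in FUNCTION_WORDS
                         for w in words):
            entities.append(span)
    return entities

print(quoted_named_entities('Betty Friedan wrote "The Feminine Mystique".'))
# ['The Feminine Mystique']
```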

We have also identified errors due to the thresholds used to separate the ENTAILMENT cases from the NO ENTAILMENT cases, which suggests that the two classes may be more difficult to separate in the context of recognizing textual entailment in a summarization setting.

The error analysis described in this section leads us to believe that improving named entity recognition and adding a tool for extracting coreferring entities are the best ways to improve our results. One of the main error sources for the system used in RTE-5, the lack of predicate analysis, is no longer an issue for the improved version of the entailment engine that was used for RTE-6.

4.5. Conclusions

In this section we have given a brief overview of the entailment system developed for the RTE-5 (Iftene and Moruz, 2010) and RTE-6 (Iftene and Moruz, 2011) challenges. The system is based on that of (Iftene, 2008), to which new resources and rules have been added, as described in (Moruz, 2010). The latest version of the system, which was used for the RTE-6 challenge, is further enhanced by the use of the VerbNet lexical-semantic database, which is used for detecting verb correlation; many of the resources feeding the initial system are kept as well, as they provide significant amounts of lexical and background knowledge. To compensate for lack of coverage, some resources are used in conjunction (VerbNet and DIRT, for example).

In terms of results, the system described above performs well, as proven by the results in the RTE-5 challenge, where it ranked first in both main tasks and fourth in the pilot task. For the RTE-6 challenge, the results reported to the organizers, on the basis of which the system was ranked, were not properly correlated with the results given by the system, because of a bug in result reporting; however, the system performed well on the novelty detection subtask, where it ranked fifth.

Given the good results that the system obtained on the RTE-5 data set, we conclude that the system has reached a certain maturity in solving the original formulation of the textual entailment task. The innovations we have proposed proved adequate, advancing the state of the art in the field and moving the solving of textual entailment towards a deeper semantic analysis. Nevertheless, we believe that there is still room for significant improvement before a consistent and elegant solution can be provided for the problem of recognizing textual entailment. It is our sincere opinion that significant progress in textual entailment will be made when the issue of deep semantic understanding of natural language is properly addressed. This conclusion is also supported by the ablation tests, which show that most knowledge sources we have used in the RTE-5 challenge greatly increased performance; the largest gain comes from better preprocessing, which increases both the performance of the rule application and the performance of the preprocessing tools.


As for the summarization setting of the textual entailment challenge, we have found that better preprocessing of the texts (more careful management of named entities, for example), coupled with the addition of some tools for discourse analysis, such as a coreference resolver, would significantly improve the results.


5. Romanian Lexical-Semantic Resources for Entailment

As we have discussed in chapter 3, the basis of the approach proposed for solving textual entailment is having access to lexical-semantic resources such as VerbNet, WordNet, SUMO-MILO, etc. While this allows for great flexibility in adapting the approach to another language, it does require that the basic resources be available for the language in question. Consequently, adapting our entailment system to the Romanian language requires, at the least, Romanian versions of VerbNet and WordNet (VN is used for predicate matching, while WN is the lexical ontology required for argument matching). Since most notable lexical-semantic resources for English are already aligned to WordNet, it stands to reason that a Romanian WordNet aligned at the synset level to the English WordNet would greatly help in developing these resources. This chapter discusses ways in which existing lexical-semantic resources for Romanian can be enhanced by using the electronic format of the Romanian Language Thesaurus Dictionary (eDTLR), thus allowing for the adaptation of the algorithm in chapter 3 for Romanian.

5.1. The Romanian WordNet

The Romanian WordNet (Tufiş et al., 2004) was built as part of the BalkaNet multilingual lexical semantic network (Stamou et al., 2002). BalkaNet exploits the concept of the inter-lingual index (ILI), introduced by the EuroWN project (Rodriguez et al., 1998). The ILI structure can be viewed as a semantic network whose nodes are language independent concepts, linked by labelled directed arcs that represent the semantic connections between concepts. In order for the Romanian version of WordNet to be representative of the Romanian language, the authors considered that the best approach to its creation would be a language centric one, as opposed to a translation of the synsets of the Princeton WordNet. To this end, the authors relied on Romanian lexicographic resources such as the Explanatory Dictionary of Romanian, the Dictionary of Synonyms, as well as an in-house Romanian-English dictionary. In terms of coverage, the current version of the Romanian WordNet (Tufiş, 2008; Tufiş and Ştefănescu, 2011) covers all the Base Concepts in EuroWordNet, concepts commonly implemented in most EuroWN WordNets, a series of concepts proposed by the partners of the BalkaNet project, as well as Romanian specific concepts, thus offering significant cross-lingual coverage. The terms proposed for the Romanian language are extracted according to their frequency in a newspaper corpus, the number of senses of the word, and its definitional productiveness (i.e. the number of sense definitions a word is used in). The set of ILI candidates that might represent a given word was extracted based on the in-house bilingual dictionary, and the final alignment was then chosen by lexicographers.

After implementing the ILI concepts for Romanian, the authors examined the nature of the semantic relations between them. Even though some of those semantic relations are language dependent (such as derivative or direct antonymy, for example), many relations can be automatically imported into Romanian. Table 21 below gives the relations that have been automatically mapped (a sketch of such an import is given after the relation descriptions).

Relation          Imported
hypernym          yes
holo_part         yes
holo_portion      yes
holo_member       yes
subevent          yes
causes            yes
verb_group        yes
be_in_state       yes
similar_to        yes
also_see          yes
category_domain   yes
near_antonym      yes, but with restrictions
derived           partially

Table 21: Relations in PWN 2.0 that are subject to import in the Romanian WN


The description of each of the relations in Table 21 is given below, along with examples from the English WN, taken from (Tufiş et al., 2004):

• Hyperonymy, denoted by hypernym, is a semantic relation which establishes a specific-generic relationship between the related meanings;
• holo_part denotes a PART-OF relation in PWN 2.0: (finger:1)→(hand:1);
• holo_portion denotes a SUBSTANCE-OF relation in PWN 2.0: (wood:1)→(timber:1);
• holo_member denotes a MEMBER-OF relation in PWN 2.0: (tree:1)→(forest:1);
• subevent denotes that the activity described by one argument is temporally included in the activity denoted by the other: (snore:1)→(sleep:1);
• causes denotes that the verb concepts are in a causative relation: (kill:1)→(die:1);
• verb_group groups several similar overlapping meanings of the verbs: (act:2 behave:1 do:9)→((act:5 play:8 act-as:2)(dissemble:3 pretend:2 act:9));
• be_in_state specifies a value for a property. The values related by be_in_state are represented by descriptive adjectival synsets and the properties by nominal synsets: (tall:1)→(stature:2 height:3);
• similar_to denotes a relation between adjectival meanings analogous to the verb_group relation for verbal meanings;
• also_see links semantically related verbs and similar adjectival clusters: ((tall:1 vs. short:3))→(((high:2) vs. (low:2))((large:1 big:1) vs. (small:1 little:1)));
• category_domain denotes topical classifications of the meanings represented by the synsets; the target synset of the relation is always a nominal synset, but any synset, irrespective of its POS, can be the source of this relation: (diplomatic immunity:1)→(law:2 jurisprudence:2);
• near_antonym. Antonymy is not granted for fully automatic import, so it is manually checked;
• derived denotes a lexical relation that links derivatives to their stems: (quickly→quick), (astomatal→stomatal), (abbatial→abbey).
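As announced above, a rough sketch of how such an automatic import might be implemented is given below, assuming both wordnets are indexed by ILI concept identifiers; all names are hypothetical, and near_antonym is excluded because its import requires manual checking:

    from dataclasses import dataclass, field

    @dataclass
    class Synset:
        literals: list
        relations: list = field(default_factory=list)  # (relation_name, target_ili) pairs

    AUTO_IMPORTABLE = {
        "hypernym", "holo_part", "holo_portion", "holo_member", "subevent",
        "causes", "verb_group", "be_in_state", "similar_to", "also_see",
        "category_domain",
    }

    def import_relations(pwn, rown):
        """pwn, rown: dicts mapping an ILI concept id to a Synset."""
        imported = 0
        for ili, en_synset in pwn.items():
            ro_synset = rown.get(ili)
            if ro_synset is None:
                continue  # the concept is not (yet) lexicalized in Romanian
            for rel, target_ili in en_synset.relations:
                # import only relation types that are safe for automatic transfer,
                # and only when the target concept also exists on the Romanian side
                if rel in AUTO_IMPORTABLE and target_ili in rown:
                    ro_synset.relations.append((rel, target_ili))
                    imported += 1
        return imported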

The Romanian version of WordNet described in (Tufiş et al., 2004) contains 16566 synsets and approximately 30000 token literals (the statistics are given for the 17th May 2004 version). Tables 22 and 23 below show the coverage of the reported version of WordNet. Even though the coverage offered is good, it is insufficient compared to the over 115000 synsets in the Princeton WordNet; also, real life applications, such as textual entailment, require greater coverage. To this end, and based on our experience with dictionary parsing, we present an idea for automatically expanding the Romanian WordNet by using the electronic format of the Romanian Thesaurus Dictionary (eDTLR) (Cristea et al., 2007) and a series of simple heuristics. It is important to note that such an extension would not be as accurate (in terms of synset or hierarchy quality, for example) as the manually corrected version described in this subsection, but, for the purposes of our approach, even a less than perfect extension would be greatly useful.

Noun synsets   Verb synsets   Adj. synsets   Adv. synsets   Total
10725          4164           844            833            16566

Table 22: POS distribution of the synsets for the 2004 version of RoWN

Language   Synsets   Token literals   Type literals   Average synset length   Average senses/literal
Romanian   16566     29130            17538           1.75                    1.66
English    115424    203147           145627          1.76                    1.39

Table 23: Comparison between the 2004 RoWN and PWN

Since the 2004 version, the Romanian WordNet has been further extended by adding more literals and synsets. At the moment of writing, the authors report the statistics shown in Table 24 below[20]. Unfortunately, since this latest version of RoWN is unavailable to us, we had to make use of the latest available version, which is the one described above. Examples of RoWN synsets are given in Annex 3.

           Noun    Verb    Adjective   Adverb   Total
Synsets    40540   10044   4968        3173     58725
Literals   38490   6766    4251        2850     52357
Senses     55861   16402   8754        4158     85175

Table 24: Statistics for the current Romanian WordNet (at the time of writing)

[20] Data taken from http://www.racai.ro/wnbrowser/Help.aspx

5.2. The Electronic Format of the Romanian Thesaurus Dictionary (eDTLR)

The need for machine readable dictionaries has been apparent for years now, and a number of attempts have been made to transform various printed dictionaries into an electronic form (the most relevant example is that of the Trésor de la Langue Française Informatisé[21]). Electronic dictionaries in plain text format, however, are not very useful, and so the need arises for a structured version of dictionary entries that can be easily searched. The parsing of a dictionary entry consists of the creation of a lexical-semantic tree of senses corresponding to the meanings that define the dictionary lexical entry. In (Curteanu et al., 2008b) the authors introduce a new strategy for dictionary shallow parsing, called Dictionary Sense Segmentation & Dependency (DSSD), for extracting the sense tree (the hierarchy of the lexical-semantic meanings) of a dictionary entry. The DSSD algorithm was first used for parsing the entries of the Romanian Language Thesaurus (DTLR – Dicţionarul Tezaur al Limbii Române) within the eDTLR research project[22] (Cristea et al., 2007), (Cristea, Raschip and Moruz, 2009). Our contribution to this project was significant, and includes, among others, the creation of the parsing algorithms, the analysis and creation of the sense marker hierarchies (together with N. Curteanu and D. Trandabăţ), the analysis of sense definitions (also with N. Curteanu and D. Trandabăţ) and the implementation and testing of the parser.

Given a dictionary entry, sense trees are obtained in two steps: first, the primary and secondary senses have to be identified by means of typographical markers, and then the identified senses are arranged in a sense tree on the basis of a predefined hierarchy. For the DTLR dictionary, the sense marker hierarchy includes 6 levels. The DTLR sense markers are, from the most general to the most specific: capital letter markers (A., B., etc.), Roman numeral markers (I., II., etc.), Arabic numeral markers (1., 2., etc.), filled diamond (♦), empty diamond (◊) and bold and italic definition markers (used for expressions and idioms). Apart from the six marker types described above, there also exists a special type of sense marker that does not have a fixed position in the hierarchy, the literal enumeration, consisting of lowercase letter markers (a), b), c), etc.). The literal enumeration can be attached to any of the existing six levels, as a means of refining sense description. Therefore, on the basis of sense markers, any dictionary entry is represented as a tree of senses, the lower levels being more specific instances of the higher levels.

[21] http://atilf.atilf.fr/
[22] The present research was partly financed within the eDTLR grant, PNCDI II Project No. 91_013/18.09.2007.

For example, for the dictionary entry verb, the sense tree contains three senses corresponding to level 3, one of them having a sub-sense corresponding to level 5. Each sense can have its own definition or examples. Also, all definitions can be modified using reserved words, thus changing the scope of any given gloss.

[Sense tree for the entry VERB: three senses on level 3 (1., 2., 3.), one of which has a sub-sense on level 5 (◊)]

The method described for the parsing of the DTLR thesaurus is robust, and can be adapted to any dictionary, provided that the general semantic structure of the dictionary entries is known and the sense hierarchy is defined.

5.2.1. Dictionary Sense Segmentation and Dependency (DSSD)

As described in (Curteanu et al., 2008b), the DSSD parsing strategy is based on the idea of the SCD (Segmentation-Cohesion-Dependency) parsing strategy, developed and applied to Romanian free text analysis and described at large in (Curteanu, 2006). DSSD is extended on the basis of the similarity between the SCD theory and dictionary entry parsing: SCD is based on the notion of discourse marker classes arranged in a hierarchy, while DSSD relies on sense marker classes, also arranged in a hierarchy. The data structures obtained after parsing are also similar, as SCD creates discourse trees with finite clauses in the nodes, and DSSD builds sense trees, with sense definitions in the nodes. Despite the formal similarity between the data structures, each type of tree models different relations, as the nucleus-satellite relation described by the SCD tree is completely different semantically from the lexical-semantic subsumption relation described by sense trees. This is the reason why the DSSD derivation of SCD keeps the dependency and segmentation aspects of the theory, and discards the cohesion aspects, as they have no meaning in the context of dictionary entry sense trees. The lexical subsumption relation is defined as follows: sense1 subsumes sense2 if sense1 is less informative (or more general) than sense2; in terms of dictionary entry structure, sense1 subsumes sense2 if the sense tree of sense2 is a subtree of sense1.

While the SCD and DSSD parsing strategies seem quite different, as SCD is devoted to free text parsing and DSSD is used for dictionary entry parsing, they are however related. The two strategies work formally with the same technology, using similar analysis tools and data structures, including the same Breadth-First search strategy. The significant distinction between the two comes from the very different kinds of texts that each has to analyze (free text and dictionary entry text), and the two different (but complementary) semantics that drive the corresponding parsing structures: predicational and rhetorical (cohesion-proper) semantics for SCD, and lexical semantics (cohesion-free) for DSSD (Curteanu et al., 2008).

The proposed method analyses dictionary entries using two SCD configurations: one for building a tree of senses and subsenses for each dictionary entry (usually a word), and another SCD configuration for identifying, for each node of the sense tree (corresponding to a specific meaning of the entry word), the types of definitions described (i.e. regular definition, usage specifications, usage examples, etc.).

An SCD configuration has the following components (a minimal data-structure sketch is given after the list):

• A set of marker classes: a marker is a boundary for a specific linguistic category.
• A hierarchy that establishes the dependencies between the marker classes.
• A parsing algorithm.
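Such a configuration can be represented, for instance, as follows (a rough sketch; the field names are of our own choosing, and a real implementation would encode the searching procedure as well):

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class MarkerClass:
        name: str      # e.g. "capital_letter", "roman_numeral", "filled_diamond"
        pattern: str   # a recognizer (here, a regex) for the marker boundary
        rank: int      # position in the dependency hierarchy (0 = most general)

    @dataclass
    class SCDConfiguration:
        marker_classes: List[MarkerClass]   # the set of marker classes
        parse: Callable                     # the parsing algorithm driven by the hierarchy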

The parsing algorithm is designed to perform the following actions: recognize markers, thus identifying the structures they bound, and classify these structures according to the hierarchy. The algorithm can be applied to different marker classes or hierarchies, strictly depending on the semantics of the text to be parsed.

5.2.2. Advantages of DSSD over Standard Dictionary Entry Parsing (DEP)

This section outlines the novelties of the DSSD approach compared to standard DEP, e.g. (Neff and Boguraev, 1989), (Lemnitzer and Kunze, 2005), (Kammerer, 2000). DSSD applies the same "technology" as the SCD strategy does, i.e. marker classes, specific hierarchies, and adequate searching procedures which govern the parsing algorithms. Most importantly, DSSD parses and builds the sense tree of a (DTLR) dictionary entry independently of the sense definition parsing process. In standard DEP, building the sense tree for an entry is inherently embedded into the general process of parsing all the sense definitions enclosed in the dictionary entry. This is the case for the dictionary entry parser employed by (Neff and Boguraev, 1989) and for LexParse (Kammerer, 2000: 10-11), whose recognition strategy is specified as Depth-First, Top-Down.

The advantage of the proposed DSSD approach is that it does not parse, at least in the beginning, the details of sense definitions; the first step of DSSD parsing is concerned with discovering the sense markers in a dictionary entry and establishing dependencies between them. The second step of dictionary entry parsing, definition analysis, is then carried out over an already formed sense tree, thus greatly reducing the error rate of both steps, and also guaranteeing that the result of the parsing process is always a valid sense tree (in the case of standard DEP, if the parser encounters an error, the entire sense tree is discarded).

Based on different types of DTD standards for dictionary text representation, such as CONCEDE-TEI (Erjavec et al., 2000; Tufiş, 2001) or XCES-TEI (2007), the parsing process may continue "in depth" to identify sense definitions. The DSSD strategy has the quality of being able to independently compute the entry sense tree, prior to the process of sense definition parsing. Subsequently, the process of parsing the sense definitions can be performed separately and sequentially, avoiding the frequent situation in which the general parsing of an entry is stopped simply because of an unparsable sense definition (even if this unparsable definition is the last one) (Curteanu et al., 2008b).

The procedural pseudo-code in Fig. 3 clearly shows the important difference between standard DEP and DSSD parsing, with the essential advantage provided by DSSD: standard DEP is based on Depth-First search, while DSSD works with Breadth-First search. Specifically, the four operations compared for the standard DEP and DSSD strategies, labeled (1) sense marker recognition, (2) sense definition parsing, (3) attaching the definition to its node, and (4) adding the node to the entry sense tree, are organized in quite different cycles: in standard DEP there is a single, large running cycle, (1) + (2), under which the sub-cycle (3) + (4) is embedded (and on which it strictly depends). The DSSD parsing exhibits two distinct (and independently) running cycles: (1) + (4), for constructing the (DTLR) sense trees, and (2) + (3), devoted to parsing the sense definitions and attaching the parsed or unparsed sense definitions to their corresponding nodes in the sense tree(s).

Dictionary classical parsing strategy:

    For i from 0 to MarkerNumber
        (1) Sense-i Marker Recognition;
        (2) Sense-i Definition Parsing;
        If (Success)
            (3) Attach (Parsed) Sense-i Definition to Node-i;
            (4) Add Node-i to EntrySenseTree;
        Else Fail and Stop.
    EndFor
    Output: EntrySenseTree with Parsed Sense Definitions
            (only if all sense definitions are parsed).

DSSD parsing strategy:

    For i from 0 to MarkerNumber
        (1) Sense-i Marker Recognition;
        Assign (Unparsed) Sense-i Definition to Node-i;
        (4) Add Node-i to EntrySenseTree;
        Standby on Sense-i Definition Parsing;
    EndFor
    Node-k = Root(EntrySenseTree);
    While not all nodes in EntrySenseTree are visited
        (2) Sense-k Definition Parsing;
        If (Success)
            (3) Attach Sense-k Definition to Node-k;
        Else Attach Sense-k Parsing Result to Node-k;
        Node-k = getNextDepthFirstNode(EntrySenseTree);
        Continue
    EndWhile
    Output: EntrySenseTree with Parsed or Unparsed Sense Definitions

Notice: MarkerNumber is the number of markers in the input marker sequence.

Figure 3: A pseudo-code comparison of the classical and DSSD dictionary parsing strategies

The second procedural cycle is optional, while the first cycle works on the sense marker sequence of the entry (whether correct or not), the DSSD output being an entry sense tree in either case (again, correct or not). This is why the DSSD algorithm never returns FAIL, regardless of whether the obtained sense tree is correct. In the case of errors, it leaves spans of input text unparsed but gets back on the correct track each time a non-ambiguous marker is found. Figure 3 was initially presented in (Curteanu et al., 2008b), a paper to which we also brought significant contributions.


5.2.3. DTLR Marker Classes, their Dependency Structure, and the DSSD Parsing Algorithm

As already pointed out, DSSD can be viewed as a simplified version of SCD, since only the segmentation and dependency aspects are involved, the (local) cohesion matters being without object for the (one-word) lexical semantics of DSSD. As in the case of SCD, the DSSD parsing strategy requires a set of marker classes (in our case, DTLR sense markers), arranged in the hierarchy illustrated in Fig. 4 and described below:

    DLR Entry
      A., B., C., …                        a), b), c), …
        I., II., III., …                   a), b), c), …
          1., 2., 3., …                    a), b), c), …
            ♦                              a), b), c), …
              ◊                            a), b), c), …
                BoldDefMark, ItalDefMark   a), b), c), …

Figure 4: Hierarchy of DTLR sense markers

The capital letter marker class (A., B., etc.) is the topmost level in the sense hierarchy of DTLR markers for any given dictionary entry. When it appears, this marker designates the (coarsest-grained) primary senses of the lexical word defined. If the top level has only one element of this kind, then the marker is not explicitly represented.

The Roman numeral marker class (I., II., etc.) is the second level of sense analysis for a given DTLR entry. It is subsumed by a capital letter marker if one exists for the head word; if a capital letter marker does not exist (i.e. it is not explicitly represented), the Roman numeral marker appears on the topmost level of the sense tree. If the lexical entry has only one sense value for this analysis level, the marker is not explicitly represented.

The Arabic numeral marker class (1., 2., etc.) is the third level of sense analysis for a DTLR entry. It is subsumed by a Roman numeral marker if one exists for the entry; if a Roman numeral marker is not explicitly represented, it is subsumed by the first explicit marker on a higher level. If the entry has only one sense value for this level of sense analysis, the marker is not explicitly represented. These first three levels encode the primary senses of a DTLR lexical entry.

The filled diamond marker class is the fourth level of sense analysis and it is used for enumerating secondary (finer-grained) senses of a DTLR entry. It is generally subsumed by any explicit DTLR sense marker on a higher level, i.e. any of the primary sense markers.

The empty diamond marker class is the fifth level of sense analysis and it is used for enumerating expressions for a given, secondary sub-sense. It is generally subsumed by a filled diamond marker or by any primary sense marker.

The BoldDefMark / ItalDefMark marker class is the sixth level of sense analysis and it is used for giving idioms and expressions that contain the current sense of the title word and constitute a secondary sub-sense. It is generally subsumed by an empty diamond marker or by any primary sense marker.

The lowercase letter markers a), b), c), etc. are not an actual class of sense markers, but rather a procedure used to refine, through literal enumeration, a semantic paradigm of a DTLR entry sense or sub-sense. A lowercase letter marker does not have a specific level in the tree-like marker class hierarchy, since it belongs to the sense marker level (of either primary or secondary sense) that is its parent. The important rules of the literal enumeration procedure in DTLR are: (a) it associates with the hierarchy level of the sense marker class to which it is assigned (see Fig. 4), and (b) it can embed lower (i.e. deeper than its parent level) senses, provided that each literal enumeration is finally closed on the sense level to which it belongs.

Fig. 4 shows the hierarchy of the DTLR sense marker classes. The continuous-dashed arrows in Fig. 4 point downwards from the higher to the lower priority levels of the DTLR marker class hyper-tree. Because of its special representation characteristics, the literal enumeration is illustrated on a layer attached to the hierarchy level to which it belongs, on each of the sense levels.

The DSSD algorithm for the construction of the DTLR sense tree, according to the marker hierarchy described in Fig. 4, is the following:

    Stack S; Tree T
    S.push(root)
    while article has more markers
        crt = get_next_marker()
        while crt > S.top()                – get to the first higher rank marker in the stack
            S.pop()
        if (crt = lowercaseLetter)
            S.top().addPart(crt)           – add a lowercase marker as a subset of the higher level sense value
            crt.level = S.top().level + 1  – the lowercase letter marker is given a level in accordance with the level of its parent
            S.push(crt)
        else
            S.top().add_son(crt)           – add the son to the higher level marker in the stack
            S.push(crt)                    – add the current marker to the stack

While the DTLR sense marker recognition in DSSD is achieved with a Breadth-First search, the marker sequence analysis for sense tree construction is based on a Depth-First parsing of the sense marker sequence input, as it uses a stack to keep track of previous unfinished (in terms of attaching subsenses) sense markers.
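To make the procedure above concrete, a runnable Python sketch of the sense tree construction is given below. The marker kinds, rank values and class names are ours, not those of the actual eDTLR implementation; as noted in the comments, re-closing a literal enumeration over embedded deeper senses is simplified here, whereas the real parser handles it via the level mechanism described above.

    RANKS = {"capital": 0, "roman": 1, "arabic": 2, "filled_diamond": 3,
             "empty_diamond": 4, "def_mark": 5}

    class Node:
        def __init__(self, label, level, is_lower=False):
            self.label, self.level, self.is_lower = label, level, is_lower
            self.children = []

        def add_son(self, child):
            self.children.append(child)

    def build_sense_tree(markers):
        """markers: the recognized marker sequence, e.g. [("arabic", "1."), ("lower", "a)")]."""
        root = Node("ROOT", -1)
        stack = [root]
        for kind, label in markers:
            if kind == "lower":
                # a sibling lowercase letter closes the previous one on the same level;
                # re-closing an enumeration over embedded deeper senses is not handled here
                if stack[-1].is_lower:
                    stack.pop()
                # the lowercase marker takes its level from its parent sense
                node = Node(label, stack[-1].level + 1, is_lower=True)
            else:
                node = Node(label, RANKS[kind])
                # pop until the top of the stack is a strictly more general marker
                while stack[-1].level >= node.level:
                    stack.pop()
            stack[-1].add_son(node)
            stack.append(node)
        return root

    tree = build_sense_tree([("roman", "I."), ("arabic", "1."), ("lower", "a)"),
                             ("lower", "b)"), ("arabic", "2.")])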

5.2.4. DSSD Parser Applied on DTLR Entries

Fig. 5 shows the result of applying the parser described previously to a DTLR entry. Note that the presented input example (VENIT2) consists only of the sequence of DTLR sense markers. The entry for which the parsing was conducted is given only in part (we have shown only the aspects relevant to the sense tree, as the entire entry spans more than two dictionary pages):


Figure 5: Sense tree for the eDTLR entry VENIT2

As one can see, the input of the sense tree parser is the DSSD marker sequence of the considered DTLR entry. The output of the parsing is much less verbose than the original dictionary entry, since the sense definitions and the example text are not depicted, in order to better observe the sense tree of the entry. Also, this representation shows that understanding the sense definitions is not strictly necessary for building the sense tree, a task for which the marker hierarchy discussed in the previous section is sufficient.

Analysis of the semantic content of DTLR senses is carried out by a second SCD configuration, which is based on a set of definition markers (Curteanu et al., 2008b). The markers are used to create a finite number of templates, which are then used for extracting subsenses (a simplified sketch of such template matching is given after the list below). DTLR entries have the following types of definitions:

• MorfDefs – morphological definitions, which give the part of speech of the current sense;
• RegDefs – definitions written in regular font, which are, in most cases, glosses for the current sense;
• BoldDefs – definitions written in bold. These definitions are in fact sense markers, and are used for introducing an expression;
• ItalDefs – definitions written in italics. These definitions are in fact sense markers, and are used for introducing an idiom;
• SpecDefs – specifications on the current sense (for example usage information);
• SpSpecDefs – specifications on the current sense regarding specific information, such as grammatical knowledge (transitivity for verbs);
• DefExems – examples, given as explanations for the current sense.
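As mentioned before the list, the definition types are recognized by matching a finite number of templates. A much simplified sketch of such template matching is given below; the patterns are illustrative stand-ins for the real eDTLR templates, and font information is modelled with <b> and <i> pseudo-tags, which we assume the conversion step has preserved:

    import re

    DEFINITION_TEMPLATES = [
        ("SpecDef", re.compile(r"^\(.+\)$")),               # bracketed usage specification
        ("BoldDef", re.compile(r"^<b>.+</b>\s*=")),         # bolded phrase followed by "="
        ("ItalDef", re.compile(r"^<i>.+</i>")),             # italicized collocation
        ("DefExem", re.compile(r"[A-Z]{2,}[^=]*\d+\s*$")),  # example ending in a sigle
    ]

    def classify_definition(chunk):
        """Return the first matching definition type, defaulting to RegDef."""
        for def_type, pattern in DEFINITION_TEMPLATES:
            if pattern.search(chunk):
                return def_type
        return "RegDef"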

The definition types proposed here receive specific functional roles in describing the meanings under primary and secondary senses (Curteanu et al., 2008b). Two taxonomies of DTLR definitions can be defined. The first contains the following classes:

• (obli) compulsory definitions: the MorfDefs and, for each DTLR entry, one of the following three definitions: either RegDef, or BoldDef, or ItalDef. In other words, every entry has a MorfDef and at least one definition from the set {RegDef, BoldDef, ItalDef}.
• (opti) optional definitions in DTLR: SpecDefs, SpSpecDefs, and DefExems, whose presence is optional, as modifiers for an obligatory definition.

The other taxonomy separates definitions into:

• (auto) autonomous definitions: RegDef, BoldDef, and ItalDef, meaning that these definitions have a stand-alone role in introducing DTLR senses;
• (cont) contingent definitions: MorfDefs, SpecDefs, SpSpecDefs, and DefExems, which have no standalone meaning.

MorfDef is compulsory at the root level of any DTLR entry, being inherited on the lower levels of the sense tree. MorfDef is both a compulsory (at the root level) and a contingent definition, being placed in front of an autonomous definition (usually, a RegDef one). SpecDefs, SpSpecDefs, and DefExems are contingent definitions since they cannot define a subsense in an autonomous manner but only as auxiliary methods that modify either autonomous definitions or other contingent definitions.


Morphological Definitions – MorfDefs

The morphological definitions (MorfDefs) consist of one or more labels describing morphological or phrasal categories, at various levels of the sense tree. The very first element in a dictionary entry is a MorfDef detailing all the possible morphological categories that the current word may belong to. As senses become more refined, subsequent MorfDefs become more and more specific up to the point where they designate a single morphological category. Those DTLR senses that lack a MorfDef have to inherit it from the closest upward regent sense where such a MorfDef is lexically present. In what follows, examples of all definitions will be highlighted in grey.

VERZIŞÓR, -OÁRĂ adj., subst. I. Adj. Diminutiv al lui v e r d e (I 1)... II. Subst. 1. S. m. (La pl.) Corp de trupă al cavaleriei... 2. S. m. şi f. (Iht.; prin Munt.) Boiştean... 3. S. n. (Prin Mold.; în forma verdişor) Rachiu cu mentă... 4. S. f. (Regional) Varietate de struguri... 5. S. n. (Familiar) Bancnotă de culoare verde...

Regular-font Definitions – RegDefs

Regular definitions (RegDefs) are the most common linguistic instrument used in DTLR to describe the senses. A RegDef is a span of text in regular font which is not fully enclosed within brackets (but may contain bracketed text), and with the exception of several reserved words and abbreviations. RegDef represents the gloss of a word, as it is used in most common dictionaries.

VENÍRE s. f. Acţiunea de a v e n i şi rezultatul ei. I. 1. Deplasare către cineva sau către ceva; parcurgere a unui traseu pentru a ajunge la un anumit loc,... ◊ E x p r. Bun venit = formulă de salut prin care se exprimă mulţumirea în legătură cu sosirea, cu prezenţa cuiva. ◊ Venit naţional = parte a produsului economiei naţionale dintr-o perioadă de timp, care rămîne după...

RegDef, possibly accompanied by other contingent definitions, together with the lexical entry root or a sense marker, constitutes the complete description of that sense. RegDef can also occur refined by literal enumeration, as shown below:

VIOÁRĂ2 s. f. (Regional, mai ales în Transilv.) Numele mai multor plante erbacee: a) toporaş (II 1 a) (Viola odorata). ...; b) plantă din familia cruciferelor, cu tulpina simplă sau ramificată, cu frunzele mici, rotunde-ovale, crestate, de culoare verde-deschis, acoperite cu peri,...

Bold-font Definitions – BoldDefs

A BoldDef definition is devoted to explaining the meaning of a specific phrase or utterance, its general form being a bolded pattern of that phrase, followed by a BoldDef separator (typically "=")[23], followed by a RegDef definition. Usually, BoldDefs refine more specific subsenses as compared to the DTLR secondary senses introduced by ♦ and ◊.

◊ A semăna în verde = a semăna imediat după arat, cînd arătura este încă proaspătă. … A ara în verde = a ara un pămînt care este încă jilav. ...

There are situations when BoldDefs may occur in primary senses, including at the root level of a DTLR lexical entry. A BoldDef can be very complex, containing different variants of the bolded expression.

3. A se duce (sau a merge, a se lăţi, învechit şi regional, a ieşi) vestea (cuiva, a ceva, de ceva etc.) sau a i se duce (ori a-i merge, a i se lăţi, învechit, rar, a i se ridica, regional, a-i ieşi cuiva) vestea, a-i merge (sau a i se duce cuiva) vestea şi povestea, (învechit şi regional) a ieşi veste (de cineva sau de ceva) = a social foarte bine cunoscut, a i se duce faima;...

Ital-font Definitions – ItalDefs

ItalDefs are syntactically similar to BoldDefs, but are semantically different, as they usually describe collocations, as compared to BoldDefs which describe expressions.

Verde antic = matostat. ... VERZÉR subst. (Regional; în sintagma) Verzerul tilegii = schimbătoare la roţile plugului...

Specification-based Definitions – SpecDefs

SpecDefs are contingent definitions written in regular font and enclosed in brackets. Most of them are abbreviations or reserved words denoting various usage contexts, as for instance: "(Regional usage)", "(Slang)", etc. We point out that SpecDefs sometimes do not occur within brackets, but this is only so when they are reserved words or abbreviations. SpecDefs are used at any sense level and modify definitions, as shown in the examples below.

[23] Sometimes the "=" separator is replaced by equivalent expressions, such as "vezi", "v.", "se spune", etc., and introduces a relation meaning that the expression in bold on the left hand side is semantically equal to the right hand side member.

(1) In the entry root, immediately after MorfDef:

VENIÁL, -Ă adj. (Livresc; despre păcate2, greşeli etc.) Care poate fi iertat (de Biserică); uşor, fără importanţă…

(2) SpecDefs in the root of a primary sense (with or without literal enumeration):

2. (Învechit şi regional; despre lichide, substanţe etc.) Veninos (2). ...

(3) SpecDefs in the root of a secondary sense (with or without literal enumeration):

♦ F i g. (Despre oameni) Rău (A I 1); duşmănos; (despre manifestări, stări, acţiuni etc. ale oamenilor) care trădează răutate (I 1),...

Spaced-character Definitions – SpSpecDefs

Another contingent definition is SpSpecDef, which stipulates various standard features; it is written with spaced literals from an established list of abbreviations.

SpSpecDef may occur at all DTLR sense levels, sometimes together with other contingent definitions, as in the examples below:

2. T r a n z. şi r e f l. F i g. A (se) amărî, a (se) supăra, a (se) necăji, a (se) mînia. A sa prea iubită inimă ş-a veninat. PANN, E. II, 94/18.

Some cross-references are written with spaced font, so one must take care to always check whether a given spaced word is part of the list of SpSpecDefs or not.

Plantă erbacee din familia scrofulariacee, cu florile albe sau trandafirii, care creşte în locuri umede sau mlăştinoase şi care este folosită în medicină pentru proprietăţile ei iritante şi purgative; avrămeasă, (regional) milostivă (v. M i l o s t i v III 2), potroacă1 (4), mila-Domnului (v. M i l ă1 I 6) (Gratiola officinalis). Cf. Hem 2182, conv. Lit. Xxiii, 1060, brandza, fl. 349, soci, t. 188, barcianu, jahresber. Viii, 101,...

Definition-meanings by Examples – DefExems

Finally, each autonomous definition may receive one or several examples, taken from bibliographic sources referenced by sigles or created by the dictionary authors, which refine the meaning of that autonomous definition. A sequence of DefExems, each followed by a sigle, is given below:


A intra în viaţă = a) (despre oameni; şi în forma a păşi în viaţă) a începe să se confrunte cu realitatea. Cum a intrat el în viaţă? Cît amor de drept şi bine, Cîtă sinceră frăţie adusese el cu sine? EMINESCU, O. I, 53. …; b) (rar) a începe să activeze, să funcţioneze. Guvernul cel nou... va intra în luna lui marţiu în viaţă. VASICI, ap. BARIŢIU, C. II, 47.

Special Observations and Situations

(1) The procedure of literal enumeration is frequently met in DTLR, and can be applied to each autonomous definition, independently of the sense level on which this definition occurs.

(2) In DTLR one may encounter (non-standard) shapes that match none of the already discussed definitions. The following DTLR entry contains neither a RegDef, nor a BoldDef, nor an ItalDef:

VIDUÍ vb. IV. R e f l. (Prin Olt.) = zvidui. Cf. VÎRCOL, V. ...

5.2.5. Result Analysis

Analysis of Sense Tree Parsing

The sense tree parser was tested on more than 500 dictionary entries of medium and large sizes. The success rate was 91.18%, computed as the percentage of entries for which the output of the program perfectly matches the gold standard. Furthermore, it is worth noting that an article with only one incorrect parse (i.e. one node in the sense tree attached incorrectly) was considered to be erroneously parsed altogether, an approach which disregards all the other correctly attached nodes in that entry. It is worth mentioning some sources of errors and ambiguities found in the DSSD parsing for the eDTLR sense tree computation. We grouped the error sources into two main classes:

I. Inconsistencies in writing the original DTLR article

A first source of parsing errors is the non-monotony of the marker values on the same level of the sense marker hierarchy:

Ex. 13. A. [B. missing] … C. etc.
Ex. 14. 2. [instead of 1.] … 2. etc.
Ex. 15. a) … b) … c) … b) [instead of d)] etc.

The tree structure returned by the parser does not consider the consistency of each marker level. Thus, in Ex. 13, it will place the two markers A. and C. as sibling nodes in the sense tree. A (partial but feasible) solution for the parser is to check the strict monotony of neighbouring markers, an operation which is also useful when sense markers interfere with literal enumeration.

II. Ambiguity in deciding which is the regent and which is the dependent (sub)sense

An inherent ambiguity was found for the following sequences of DTLR sense markers:

Ex. 16. 1. a) b) c) ◊ [◊]

The problem occurs since one cannot discern between attaching the first (and/or second) "◊" as depending on c) or on the upper level marker (1.). Solving these ambiguities is a problem related to the semantic contexts of the involved pairs of markers. The rule we used was to attach a "◊" to the parent level of "c)" if "c)" is the last marker in the literal enumeration, and to "c)" if the literal enumeration continues with "d)". We want to make it clear that this solution is a convention, and that it is possible that we may choose the incorrect parent for the sense represented by the "◊" marker.
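The attachment convention can be expressed as a small decision procedure; the following sketch (with hypothetical names, operating on a flat marker sequence) illustrates it:

    def diamond_parent(markers, i):
        """Decide the parent marker for the "◊" found at position i."""
        prev = markers[i - 1]                      # e.g. "c)"
        next_letter = chr(ord(prev[0]) + 1) + ")"  # e.g. "d)"
        if next_letter in markers[i + 1:]:
            return prev                            # the enumeration continues: attach to "c)"
        # the enumeration is finished: attach to the nearest non-letter marker before it
        for m in reversed(markers[:i]):
            if not m[0].islower():
                return m
        return "ROOT"

    print(diamond_parent(["1.", "a)", "b)", "c)", "◊"], 4))        # 1.
    print(diamond_parent(["1.", "a)", "b)", "c)", "◊", "d)"], 4))  # c)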

Analysis of Modelling Definition Types

The segmentation of the low level sense elements, found between two successive high level markers, was performed; we have found that regular expressions suffice for this particular task. Establishing dependencies between the low level sense elements simply requires a hierarchy between the different classes of components, as proven by the solution for the task of sense tree building, and the only real difficulty is to correctly chunk the various elements. Evaluation for this task was performed using two metrics: exact match and overlap. The exact match metric represents the amount of correctly extracted chunks (in terms of precision, recall and f-measure); the overlap metric is a relaxation of the exact match metric, and represents the percentage of words assigned to the correct chunk (also in terms of precision, recall and f-measure). Since there are cases where chunks of the same type are consecutive, the first and last words of each chunk are specially marked, so as to penalize classifying all of the words as one big chunk rather than a succession of smaller chunks.
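For clarity, the two metrics can be sketched as follows, under the assumption that chunks are represented as (type, first word index, last word index) triples; the representation and the function names are ours:

    def prf(correct, predicted, gold):
        p = correct / predicted if predicted else 0.0
        r = correct / gold if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    def exact_match(system, gold):
        # a chunk counts as correct only if its type and both boundaries match
        correct = len(set(system) & set(gold))
        return prf(correct, len(system), len(gold))

    def word_labels(chunks):
        # expand chunks to word-level labels; the first and last words of each
        # chunk are specially marked, so that one big chunk is penalized with
        # respect to a succession of smaller chunks of the same type
        labels = set()
        for t, start, end in chunks:
            for i in range(start, end + 1):
                pos = "begin" if i == start else ("end" if i == end else "in")
                labels.add((t, i, pos))
        return labels

    def overlap(system, gold):
        sys_l, gold_l = word_labels(system), word_labels(gold)
        return prf(len(sys_l & gold_l), len(sys_l), len(gold_l))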

Evaluation type   Precision   Recall   F-measure
Exact match       93.24%      85.41%   89.15%
Overlap           97.86%      97.80%   97.83%

Table 25: Evaluation results for low level sense element chunking

Parse Result   Gold Standard   Error Rate
Sigle          Sigle begin     29.45%
Sigle          Sigle end       28.08%
RegDef         SpecDef         6.39%
RegDef         RegDef end      3.88%
RegDef         RegDef begin    2.96%
DefExem        ItalMarker      2.73%
RegDef         SpecDef begin   2.51%
Sigle          RegDef          2.28%
RegDef         Sigle           2.05%
RegDef         SpSpecDef       2.05%

Table 26: Top ten error sources for low level sense chunking

For purposes of evaluation, we have used 52 dictionary entries of various sizes as a gold standard, totalling approximately 2,000 chunks and 22,000 words. The results given in Tables 25 and 26 are taken from the technical report of the eDTLR grant.

Upon further analysis of the evaluation results, we have found that the most frequent errors were due to faulty sigle segmentation. Table 26 details the ten most frequent error types. Correcting the acquisition of sigles leads to a 94.43% f-measure for exact match and a 98.01% f-measure for overlap. An example of a parsed dictionary entry is given in Annex 1.

The parsing method described above has been successfully applied not only to the DLR, but also to a series of thesauri for various languages (the Romanian DAR – Dicţionarul Academiei Române, the French thesaurus TLFi and two German thesauri, the Grimm Deutsches Wörterbuch and the Goethe Wörterbuch) (Curteanu et al., 2010a), proving its flexibility and robustness.


The parsing method described above was applied to all of the current DLR volumes (apart from the volume containing the words starting with the letter M). Note that the DLR is the new format of the Romanian Thesaurus and is a continuation of the work started at the beginning of the 20th century; a large portion of the Romanian vocabulary had already been covered by the old format of the thesaurus, the DA (Dicţionarul Academiei), which explains why not all of the words of the Romanian language are parsed. The word series covered by the current version of the eDTLR are D – Doznic, E – Ezredeș, K – Luzula, N – Zvugni. This amounts to approximately 90,000 words with over 190,000 senses and more than 440,000 quotes, by far the largest structured lexical-semantic database available for the Romanian language. The lexical-semantic information encoded in the electronic version of the Romanian Thesaurus can be used to automatically expand current resources such as the Romanian WordNet or the Explanatory Dictionary of Romanian (DEX), as will be shown below.

5.3. Extending the Romanian WordNet with eDTLR

Based on the electronic format of the DTLR described above, (Păpuşoi, 2009) proposes a series of heuristics that aim at automatically extending the Romanian WordNet. These heuristics make use of the extensive sense descriptions and fine sense separation of the eDTLR in order to extend existing synsets, or to create new synsets and then integrate them into the WordNet ontology. The algorithm for extending WordNet is given below.

    For all synsets s in RoWN
      1. If a word w in eDTLR occurs in s, then
      2.   For all senses of w in eDTLR
      3.     If definition(w) equals definition(s), then extend the synset of s with an alignment to the eDTLR sense
      4.     Else, if no sense of w has a definition that matches the definition of s, then
      5.       Extract synonyms from the definition of the current sense of w
      6.       If any synonyms of w are found in RoWN synsets, then go to 2 for each of them
      7.       Else create a new synset for w and integrate the new synset into the RoWN taxonomy
      8. Else, create new synsets for all the senses of w and integrate them into the RoWN taxonomy

Figure 6: Algorithm for aligning ROWN with eDTLR

The first step in the algorithm is to attempt to align RoWN synsets to dictionary entries (step 1). If any match is found, then the algorithm attempts to match the exact eDTLR senses to the RoWN synset in question (step 3). This is done on the basis of definition (gloss) matching, which is carried out using a lemma matching heuristic. The gloss of the WN synset and the gloss of the eDTLR sense are both lemmatized and cleaned of stop words; the definitions are considered to match if the lemma overlap between them is above an empirically determined threshold. If there is no suitable candidate for gloss matching (step 4), then the algorithm attempts to match known synonyms of the current sense to literals in RoWN, by repeating steps 2 through 4 described above (steps 5 and 6). Synonyms are extracted from the gloss of the sense, as it is common for the RegDef portion of a DLR entry to have a segment of similar senses of other words (this procedure is called definition by synonymy by the DLR authors). In case none of the synonyms are found in RoWN, a new synset is created for the current sense (step 7); the literals of the synset are the synonyms extracted from the gloss (if any exist), and its definition is the gloss of the sense.
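A sketch of the gloss matching step is given below; the stop-word sample, the overlap formula and the threshold value are illustrative placeholders, and the lemmatizer is assumed to be supplied externally:

    # Illustrative stop-word sample; the real list is considerably larger.
    STOP_WORDS = {"şi", "sau", "de", "la", "cu", "care", "a", "al", "o", "un"}

    def lemma_bag(gloss, lemmatize):
        """Lemmatize a gloss and clean it of stop words."""
        return {lemmatize(tok) for tok in gloss.lower().split()
                if lemmatize(tok) not in STOP_WORDS}

    def glosses_match(gloss_a, gloss_b, lemmatize, threshold=0.5):
        # two glosses match when their lemma overlap exceeds the threshold
        a, b = lemma_bag(gloss_a, lemmatize), lemma_bag(gloss_b, lemmatize)
        if not a or not b:
            return False
        return len(a & b) / min(len(a), len(b)) >= threshold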

The last step of the algorithm (step 8) is concerned with integrating any new synsets into the existing RoWN taxonomy. This problem is twofold: first, newly created synsets need to be compared to existing synsets, in order to avoid duplication; second, semantic relations involving the new synsets need to be determined. The first task is carried out using the definition matching procedure used at step 3 of the alignment algorithm described above; the threshold for definition matching is also determined empirically.

Semantic relations are determined using a set of heuristics based on key words and phrases used in eDTLR glosses. Definitional patterns for a word sense X such as "Y, care..." usually denote hypernymy between X and Y (for example, a person is defined as "Fiinţă superioară, socială, care se caracterizează prin ..." – "superior, social being, which is characterized by ...", from which we can determine that the hypernym of person is being). The holo_portion relation can be identified by patterns of the type "fiecare dintre cei doi X", "fiecare din X", "fiecare dintre X" between the head word and X (for example "Fiecare dintre pantele unui deal" – "each of the slopes of a hill"), and the holo_member relation can be identified by patterns like "numele mai multor X", "nume dat unor X", "nume dat unei X", "din familia X" (for example "Nume dat unor animale nevertebrate cu corpul moale" – "name given to a soft bodied invertebrate").
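The pattern heuristics above can be sketched with regular expressions as follows; the regexes cover only the cited patterns, and extracting the target word as a single token is a simplification of the actual heuristics:

    import re

    RELATION_PATTERNS = [
        ("hypernym",     re.compile(r"^(\w+)[^.]*?,\s*care\b")),   # "Y, care ..."
        ("holo_portion", re.compile(r"fiecare\s+din(?:tre)?\s+(?:cei\s+doi\s+)?(\w+)")),
        ("holo_member",  re.compile(r"(?:numele?\s+(?:mai\s+multor|dat\s+un\w+)|din\s+familia)\s+(\w+)")),
    ]

    def gloss_relations(gloss):
        """Return (relation, target word) pairs suggested by the gloss."""
        found = []
        for rel, pattern in RELATION_PATTERNS:
            m = pattern.search(gloss.lower())
            if m:
                found.append((rel, m.group(1)))
        return found

    print(gloss_relations("Fiinţă superioară, socială, care se caracterizează prin ..."))
    # [('hypernym', 'fiinţă')]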

Performance analysis was carried out over approximately 1300 senses extracted from eDTLR. These senses were sent as input to the algorithm, and the results were analyzed manually. Out of the 396 eDTLR senses aligned with existing WN synsets, 393 were correct; the output also contained 918 synsets, of which 832 were correct (90.63% accuracy). In terms of relations between synsets, the output contained 138 semantic links, of which 90 were correct. This shows that eDTLR can be successfully aligned to RoWN, and new synsets can be automatically added with significant accuracy.

5.4. Aligning DEX and eDTLR

Another method for aligning eDTLR and the Romanian WordNet is to use the electronic version of the Explanatory Dictionary of Romanian (DEX). The reasoning behind this approach is that DEX was one of the resources used to create RoWN, and the senses of DEX are aligned to WN synsets. Also, since the eDTLR senses are much more refined than those of DEX, the RoWN taxonomy can be enhanced by adding those subsenses that are missing. Because of the hierarchical structure of eDTLR entries, semantic relations between existing and new synsets can also be extracted (particularly hypernym relations).

(Ivănescu, 2011) proposes a method for automatically aligning DEX and eDTLR at the sense level. The alignment procedure starts by pairing the entries of DEX with those of eDTLR by means of string matching. The result of this process is a list of paired sets, corresponding to words present in both dictionaries. These words, however, may have several parts of speech (in the sense that the same word can be both a noun and a verb, for example), so the next step is to match the entries according to their part of speech. In the process of entry matching it is important to take into account the numbers attached to each head word (for example VIOÁRĂ3), which denote variants of polysemous words.


Once the entries are aligned at the part of speech level, they need to be aligned at the sense level. This is done by applying three lexical distance metrics on the lemmatized sense definitions to determine sense similarity. The first of these measures is word overlap between lemmas of sense definitions, taking into account all tokens, including stop words and punctuation. The second measure is word overlap over sense definitions, disregarding stop words and punctuation. The final measure is called prefix match and it is applied over prefixes of the words in the sense definitions, disregarding stop words. The word prefixes are similar to word stems.

Based on empirical data, the result of each metric is weighted, and the weighted results are then combined into a global alignment score. If the global score is above an empirically determined threshold, then the two senses are aligned.
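The combination of the three metrics can be sketched as follows; the weights and the threshold stand in for the empirically determined values, and the prefix length is likewise illustrative:

    def overlap_ratio(a, b):
        a_set, b_set = set(a), set(b)
        return len(a_set & b_set) / max(len(a_set | b_set), 1)

    def alignment_score(lemmas_a, lemmas_b, content_a, content_b, prefix_len=4):
        m1 = overlap_ratio(lemmas_a, lemmas_b)    # all tokens, incl. stop words and punctuation
        m2 = overlap_ratio(content_a, content_b)  # stop words and punctuation removed
        m3 = overlap_ratio([w[:prefix_len] for w in content_a],
                           [w[:prefix_len] for w in content_b])  # prefix (stem-like) match
        w1, w2, w3 = 0.2, 0.5, 0.3                # hypothetical weights
        return w1 * m1 + w2 * m2 + w3 * m3

    def senses_aligned(score, threshold=0.4):     # hypothetical threshold
        return score >= threshold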

5.5. Conclusions

In this chapter we have analyzed methods for expanding existing lexical-semantic resources for Romanian, particularly the Romanian WordNet. These resources are useful in themselves, but in the context of this thesis they are used to complement existing textual entailment resources. WordNet, in particular, is aligned to a series of important lexical-semantic databases, such as SUMO-MILO and VerbNet. Therefore, given a robust alignment between the English WordNet and its Romanian counterpart, versions of resources such as VerbNet can be ported to the Romanian language (the basic idea for creating a Romanian VerbNet has been described in chapter 3).

Another advantage of the electronic version of the Romanian Thesaurus is that it can be used to create resources for the Romanian language. At the moment, the possibility of creating a Romanian FrameNet, based on the examples available in eDTLR, is under investigation.


6. Conclusions

This thesis has described a novel way of solving textual entailment on the basis of semantic knowledge extracted from predicates and features extracted for predicate arguments. The novelty of our approach comes from the fact that we give a fresh perspective to the definition of textual entailment, which focuses on the basic elements of human utterances, predicates.

The main contributions of the thesis are given below:

• A new interpretation of the definition of textual entailment, which is based on predicational semantics (the matching of predicates and their arguments from the text to the hypothesis) and intends to advance the field of textual entailment towards deep semantic understanding of natural language texts (first given in section 1.1, and then analyzed in chapter 3);
• A thorough description of the current state of the art in textual entailment, organized according to the type of approach towards solving TE, together with various uses of textual entailment (chapter 2);
• An original method for solving textual entailment, based on predicational semantics and argument structure unification, formalized as an algorithm for solving TE that relies on VerbNet and lexical taxonomies (section 3.2). The algorithm described is language independent, in the sense that, given appropriate lexical-semantic resources for a language, it can be applied directly;
• A feasibility study carried out over 200 entailment pairs from the RTE-5 test set, which proves the validity of the method proposed in section 3.2, with detailed analysis of entailment pairs (section 3.3);
• A rule based system for solving textual entailment, which extends a previous system on the basis of the method proposed in chapter 3, and the results obtained by this system in the RTE-5 and RTE-6 challenges, together with a detailed error analysis (chapter 4);
• A novel method for dictionary entry parsing, developed for the creation of the electronic format of the Romanian Thesaurus Dictionary (eDTLR), which can be used to enhance existing lexical-semantic resources for Romanian and to create new ones, allowing for the adaptation of the algorithm described in chapter 3 for the Romanian language (chapter 5).

In order to place our approach within the field of current textual entailment, we have first described in detail the current state of the domain. The description of the field of textual entailment was done in two steps: first, we have described the general notions regarding TE, with examples and definitions, together with a detailed description of the Recognizing Textual Entailment Challenge. The RTE challenge constitutes the main proving ground for current textual entailment engines, and has been so ever since its first edition, which took place in 2005; all the RTE challenges to date have been described, together with the top results and main advancements. The second step in describing the current state of the field of textual entailment is the state of the art for textual entailment engines and methods, given in chapter 2, which reviewed many of the important ideas and solutions currently available for recognizing textual entailment.

Given a thorough description of the domain, we can confidently say that the method for solving textual entailment that we put forward in chapter 3 of this thesis is both novel and effective (Moruz, 2010). It is novel because it approaches solving textual entailment in a manner that, to our knowledge, has never been used before, and it is effective because it performs well for detecting textual entailment, as we have shown in chapter 4. We have also put at the basis of our model the following interpretation of the original definition for textual entailment:

Given a text T and a hypothesis H, we say that T entails H, denoted by T→H, if and only if:

• Predicate matching: each of the predicates in H is entailed by at least one predicate in T. We say that a predicate p ∈ T entails a predicate q ∈ H, denoted p → q, if q is a consequence of p, or p and q are synonyms, or q is a subevent of p;
• Argument matching: given alignments in T for all of the predicates in H (we consider a predicate p in T aligned to a predicate q in H if there is an entailment relation between p and q), entailment holds if and only if the arguments of the aligned predicates are in an entailment relation. Two arguments are in an entailment relation if they both refer to similar entities (e.g., the heads of the arguments are synonyms, or the arguments are in a part-of relation, etc.), and if the unification of their feature sets is successful and is equal to the feature set of that argument in the text. Formally, if p ∈ T and q ∈ H and p → q and arg(p) = {a, b}, arg(q) = {a′, b′}, then entailment holds if and only if a → a′ and b → b′. Argument entailment is defined as the result of the unification between the argument feature structures in H and T (as described in greater detail in chapter 3; a minimal sketch of this unification check is given after the definition).
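As announced in the definition above, the argument matching step can be illustrated by a minimal feature unification sketch; the feature names are ours, and real feature structures are extracted from the WordNet taxonomy, as described in chapter 3:

    def unify(features_t, features_h):
        """Return the unified feature set, or None if unification fails."""
        for key in features_t.keys() & features_h.keys():
            if features_t[key] != features_h[key]:
                return None  # clash on a shared feature
        return {**features_h, **features_t}

    def argument_entails(features_t, features_h):
        # entailment requires successful unification whose result equals
        # the feature set of the argument in the text
        unified = unify(features_t, features_h)
        return unified is not None and unified == features_t

    # e.g. T argument: "the French striker" vs. H argument: "the striker"
    print(argument_entails({"head": "striker", "nationality": "French"},
                           {"head": "striker"}))   # True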

The implementation of this idea uses VerbNet for predicate matching and the WordNet taxonomy for extracting the argument features; we have chosen a predicate based approach because of our previous experience with predicate and verbal group structures (Moruz et al., 2006), (Curteanu et al. 2006a, 2006b, 2006c, 2007).

Chapter 4 describes the participation of our system at the RTE-5 and RTE-6 challenges, where it obtained very good results (Iftene and Moruz, 2010, 2011), thus proving the value of our approach to solving textual entailment. Also, based on our experience with information retrieval for question answering (Puscasu et al., 2007), (Iftene et al., 2008, 2009a, 2009b, 2010a, 2010b) we have adapted the entailment system with the addition of an information retrieval element, in order to solve the RTE-5 pilot task.

Because of the way in which the idea described above is implemented, it can be easily adapted for the Romanian language, given the basic resources needed for the system's operation (WordNet and VerbNet). Based on our experience with dictionary entry parsing, which resulted in the electronic format of the Romanian Thesaurus Dictionary (DLR) (Curteanu et al., 2008a, 2008b, 2009, 2010a, 2010b), we propose the extension of current resources for Romanian (such as WordNet) and the creation of new ones on the basis of existing alignments (VerbNet).

In terms of future work, our immediate goal is to take part in the seventh Recognizing Textual Entailment Challenge, which will take place in September 2011, with a further improved system. We also intend to make this system widely available as either a web service or an open source package.

For longer term goals, we intend to adapt and port the entailment system described in this thesis (particularly in chapter 4) to the Romanian language. As discussed above, a Romanian version of our entailment system depends on a series of Romanian lexical-semantic resources, such as WordNet or VerbNet, which can be automatically created or extended with reasonable accuracy by using the electronic format of the Romanian Thesaurus Dictionary. At the moment we have carried out limited studies to determine the feasibility of extending the Romanian WordNet, and are currently exploring the possibility of creating a Romanian FrameNet on the basis of the examples in DLR.

In terms of applications for our entailment system, we have been exploring two possibilities: validating legal documents (checking for entailment between various laws, in order to determine possible contradictions or discrepancies) and opinion mining from Amazon customer reviews (we formulate the problem of determining opinion as an entailment problem by creating hypotheses from common user searches: the search "I want a camera with a detachable lens" yields the hypothesis "Camera X has a detachable lens").

A major use for textual entailment is the validation of machine summaries: given a summary of a text, or of a collection of texts, the summary is valid if and only if it is entailed by the original source of the summary. Since it is difficult to handle entailment for large hypotheses (hypotheses with more than one sentence), we propose splitting the summary into its component sentences, thus reducing the problem to solving textual entailment in a summarization setting, as defined by the RTE-5 challenge pilot task and the RTE-6 challenge main task.
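The proposed reduction amounts to the following sketch, in which sent_tokenize stands in for any sentence splitter and entails() is the sentence-level entailment system, treated here as an abstract callback:

from nltk.tokenize import sent_tokenize   # requires: nltk.download('punkt')

def summary_is_valid(summary, source, entails):
    """A summary is valid iff each of its component sentences, taken as a
    hypothesis H, is entailed by the original source text T."""
    return all(entails(source, h) for h in sent_tokenize(summary))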

Textual entailment can also be used for knowledge base population, and this idea is also explored in the RTE-6 challenge pilot task. The KBP Validation Pilot aims at validating the output of the systems participating in the KBP Slot Filling task24 by using Textual Entailment techniques.

The KBP Slot Filling task is focused on searching a collection of newswire and Web documents and extracting values for a pre-defined set of attributes (“slots”) for a series of entities. Given an entity in a knowledge base and an attribute for that entity, systems must find, in a large corpus, the correct value(s) for that attribute and return the extracted information together with a corpus document supporting it as a correct slot filler. The RTE KBP Validation Pilot is based on the assumption that an extracted slot filler is correct if and only if the supporting document entails a hypothesis created on the basis of the slot filler (Textual Entailment has previously been used as a tool for validating the outputs of other applications, as was the case of the Answer Validation Exercise within CLEF (Peñas et al., 2007)).

24 http://nlp.cs.qc.cuny.edu/kbp/2011/

The KBP Validation Task consists of determining whether a candidate slot filler is supported in the justification document. Each slot filler that is submitted by a system participating in the KBP Slot Filling task creates one evaluation item (i.e. a T-H “pair”) for the RTE-KBP Validation Pilot, where T is the source document that was cited as supporting the slot filler, and H is a set of simple, synonymous Hypotheses created from the slot filler. The resulting T-H pairs differ from the traditional pairs in that T is an entire document (vs. single sentences or paragraphs) and H is a set of roughly synonymous sentences representing different linguistic realizations of the same slot filler (some of which may be ungrammatical)25.
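Building one such evaluation item can be sketched as below; the slot name follows the KBP convention, but the templates are hypothetical illustrations of the "roughly synonymous realizations" in H, with T being the cited source document:

def kbp_pair(entity, slot, filler, source_document):
    """Build one RTE-KBP evaluation item: T is the supporting document and
    H is a set of roughly synonymous realizations of the slot filler."""
    templates = {
        "per:city_of_birth": ["{e} was born in {f}.",
                              "{f} is the birthplace of {e}.",
                              "{e}'s city of birth is {f}."],
    }
    hypotheses = [t.format(e=entity, f=filler) for t in templates[slot]]
    return source_document, hypotheses

T, H = kbp_pair("John Smith", "per:city_of_birth", "Boston", "...document text...")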

Logic based methods are hindered by the difficulty of transforming natural language text into a logical formula (in propositional logic). Fuzzy logic, however, is inherently flexible, and could thus be used with greater success for modelling natural language phenomena. Because of this, we intend to examine the effect of using fuzzy logic modelling techniques for solving textual entailment.

One of the main goals of textual entailment is to provide machine understanding of natural language at a level close to the human one. In order to achieve this, an entailment system needs to access the deep semantic level of utterances. We are currently examining a novel way of representing the semantic knowledge extracted from the text and the hypothesis, which would lead to deep semantic understanding of text, and which is based on the idea of semantic triples represented in a graph format.

The idea of a semantic triple is to determine the semantic relation between each pair of entities in an utterance, and to create the semantic triple rel(x, y), where rel is the relation between the entities x and y. If we consider the entities as nodes and the relations as named arcs, the semantic representation of a text becomes a connected graph, which has subgraphs clustered around separate predicates as semantic cliques. Given such a representation, entailment can be defined as an isomorphism between the graph of the hypothesis and a subgraph of the text. We are convinced that a semantic representation of this type allows for easier detection of isomorphism and a more relevant match over semantic graphs.

25 The description of the RTE-6 KBP pilot task was taken from the RTE-6 challenge guidelines (http://www.nist.gov/tac/2010/RTE/RTE6_KBP_Validation_Pilot_Guidelines.pdf)
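A minimal sketch of this representation and of the entailment test, using the networkx library, is given below; the toy triples are invented for illustration, and requiring identical node names is a simplifying assumption (a real system would also compare the entities through lexical-semantic similarity):

import networkx as nx
from networkx.algorithms import isomorphism

def triples_to_graph(triples):
    """Entities become nodes and each relation rel(x, y) a named arc."""
    g = nx.DiGraph()
    for rel, x, y in triples:
        g.add_edge(x, y, rel=rel)
    return g

T = triples_to_graph([("agent", "buy", "John"),
                      ("theme", "buy", "car"),
                      ("attribute", "car", "red")])
H = triples_to_graph([("agent", "buy", "John"),
                      ("theme", "buy", "car")])

# Entailment: the hypothesis graph is isomorphic to a subgraph of the
# text graph, with matching relation labels on the arcs.
matcher = isomorphism.DiGraphMatcher(
    T, H, edge_match=lambda e_t, e_h: e_t["rel"] == e_h["rel"])
print(matcher.subgraph_is_isomorphic())   # True: H maps into T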

The difficulty of this approach arises from the necessity of precise semantic analysis of human utterances, such as word sense disambiguation, coreference resolution or semantic role labelling. It may also become necessary to create semantic nodes that have no surface realization, which further increases the difficulty of such an approach.


7. Bibliography

Adams, R. 2006. Textual Entailment Through Extended Lexical Overlap. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 128-132, 10 April, 2006, Venice, Italy.

Adams, R., Nicolae, G., Nicolae, C. and Harabagiu, S. 2007. Textual Entailment Through Extended Lexical Overlap and Lexico-Semantic Matching. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 119-124, 28-29 June, Prague, Czech Republic.

Agerri, R. 2008. Metaphor in Textual Entailment, In Companion Procs. of the 22nd Int. Conf. on Computational Linguistics (COLING 2008), pp.2-6, Manchester, UK.

Aharon, R. B., Szpektor, I., Dagan, I. 2010. Generating Entailment Rules from FrameNet. Proceedings of the ACL 2010 Conference Short Papers, pages 241–246, Uppsala, Sweden, 11-16 July 2010.

Akhmatova, E. 2005. Textual Entailment Resolution via Atomic Propositions. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 61-64, 11–13 April, 2005, Southampton, U.K.

Baker, C. F., Fillmore, C. J., and Lowe, J. B. 1998. The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics (COLING/ACL-98), pages 86–90, Montreal.

Bayer, S., Burger, J., Ferro, L., Henderson, J., Yeh, A. 2005. MITRE’s Submissions to the EU Pascal RTE Challenge. In Proceedings of the First PASCAL Challenge Workshop for Recognising Textual Entailment, pages 41–44, 11–13 April, 2005, Southampton, U.K.

Bensley, J. and Hickl, A. 2008. Workshop: Application of LCC’s GROUNDHOG System for RTE-4. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.


Bentivogli, L., Magnini, B., Dagan, I., Dang, H.T. Giampiccolo, D. 2010. The Fifth PASCAL Recognizing Textual Entailment Challenge, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Bentivogli, L., Clark, P., Dagan, I., Dang, H. T., Giampiccolo, D. 2011. The Sixth PASCAL Recognizing Textual Entailment Challenge, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Berant, J., Dagan, I. and Goldberger, J. 2010. Global Learning of Focused Entailment Graphs. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1220–1229, Uppsala, Sweden, 11-16 July 2010.

Bergmair, R. 2008. Monte Carlo Semantics: MCPIET at RTE4. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Blake, C., Zheng, W., Painter, K., Weyerhaeuser, W. 2011. The Role of Semantics in Recognizing Textual Entailment, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Bos, J. and Markert, K. 2005. Recognising Textual Entailment with Logical Inference. In HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, Pages 628-635, Association for Computational Linguistics, Morristown, NJ, USA.

Breck, E. 2010. A simple system for detecting non-entailment, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Bunke, H. and Shearer, K. 1998. A graph distance metric based on the maximal common subgraph. Pattern Recogn. Lett., 19(3-4), pages 255–259.


Burchardt, A. and Frank, A. 2006. Approximating Textual Entailment with LFG and FrameNet Frames. Proceedings of the 2nd PASCAL Workshop on Recognising Textual Entailment, 92-97, Venice, Italy.

Burchardt, A., Reiter, N., Thater, S., Frank, A. 2007. A Semantic Approach To Textual Entailment: System Evaluation and Task Analysis. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 10-15. 28-29 June, Prague, Czech Republic.

Cabrio, E., Magnini, B. 2010. Toward Qualitative Evaluation of Textual Entailment Systems. COLING (Posters) 2010, pp: 99-107

Castillo, J. J. 2010a. Recognizing Textual Entailment: Experiments with Machine Learning Algorithms and RTE Corpora. Research in Computing Science. Special Issue: Natural Language Processing and its Applications, pp: 155-166

Castillo, J. J. 2010b. Sagan in TAC2009: Using Support Vector Machines in Recognizing Textual Entailment and TE Search Pilot task, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Castillo, J. J. 2010c. A machine learning approach for recognizing textual entailment in Spanish, Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas, pp: 62-67

Castillo, J. J. 2011. Sagan in TAC2010: A Machine Learning Approach to RTE within a Corpus, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Chambers, N., Cer, D., Grenager, T., Hall, D., Kiddon, C., MacCartney, B., Marneffe, M. C., Ramage, D., Yeh, E., Manning, C. D. 2007. Learning Alignments and Leveraging Natural Logic. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pp.165-170. 28-29 June, Prague, Czech Republic

Chklovski, T. and Pantel, P. 2004. Verbocean: Mining the web for fine-grained semantic verb relations. In Proceedings of EMNLP 2004, pages 33–40, Barcelona, Spain, July. Association for Computational Linguistics.


Clark, P. and Harrison, P. 2008. Recognizing Textual Entailment with Logical Inference. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Clark, P., Harrison, P. 2010. An Inference-Based Approach to Recognizing Entailment, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Clark, P., Harrison, P. 2011. BLUE-Lite: a Knowledge-Based Lexical Entailment System for RTE6, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Cristea, D., Răschip, M., Forăscu, C., Haja, G., Florescu, C., Aldea, B., Dănilă, E. 2007. The Digital Form of the Thesaurus Dictionary of the Romanian Language. In Proceedings of the 4th International IEEE Conference SpeD 2007, "Advances in Spoken Language Technology", C. Burileanu, H.N. Teodorescu (Eds.), pp. 195-206. Iasi, 10-12 May 2007, Romanian Academy Publishing House, ISBN 978-973-27-1516-1.

Cristea, D., Răschip, M., Moruz, A. 2009. Steps in Building the Electronic Version of a Thesaurus Dictionary of the Romanian Language, the third conference of the Academy of Technical Sciences of Romania, in Buletinul Institutului Politehnic din Iaşi, Universitatea Tehnică „Gheorghe Asachi” din Iaşi, Secţia Matematică, Mecanică Teoretică, Fizică.

Curteanu, N., Moruz, M., Trandabat, D., Bolea, C., Dornescu, I. 2006a. The Structure and Parsing of Romanian Verbal Group and Predicate. 4th European Conference on Intelligent Systems and Technologies, ECIT 2006, Iaşi, Romania, 2006, ISBN 973-730-265-6.

Curteanu, N., Trandabat, D., Moruz, M. 2006b. Substructures of the (Romanian) Predicate and Predication using FX-bar Projection Functions on the Syntactic Interface. 4th European Conference on Intelligent Systems and Technologies, ECIT 2006, Iaşi, Romania, 2006, ISBN 973-730-265-6.


Curteanu, N., Trandabat, D., Moruz, M. 2006c. Structura grupului verbal, predicaţia lexicală şi reprezentarea logică a predicatului în limba română, Forăscu, C., Tufiş, D., Cristea, D. (Eds.), Lucrările atelierului „Resurse lingvistice şi instrumente pentru prelucrarea limbii române”, Iaşi, 3-4 November 2006, Editura Universităţii “Al.I.Cuza” Iaşi, ISBN 978-973-703-208-9, pag. 143-148, Iaşi, Romania, 2006.

Curteanu, N., Trandabăţ, D., Moruz, M. A. 2007. Functional FX bar Projections of the (Romanian) Verbal Group and Sub-Groups on the Syntactic-Semantic Interface, Computer Science Journal of Moldova, Academy of Science of Moldova, Institute of Mathematics and Computer Science, volume 15, no. 2(44)/2007, pp. 123-152.

Curteanu, N., Trandabăţ, D., Moruz, A., Pavel, G., Vereştiuc, C. 2008a. Dictionary Sense Segmentation & Dependency – A Flexible Strategy for Thesauri Parsing, Proceedings of 5th European Conference on Intelligent Systems and Technologies ECIT 2008, CD edition.

Curteanu, N., Moruz, A., Trandabăţ, D., 2008b. Extracting Sense Trees from the Romanian Thesaurus by Sense Segmentation & Dependency Parsing, Proceedings of CogAlex Cognitive Aspects of the Lexicon: Enhancing the Structure, Indexes and Entry Points of Electronic Dictionaries, COLING 2008 Workshop endorsed by SIGLEX, pp. 55–63, ISBN 978-1-905593-56-9

Curteanu, N., Moruz, A., Trandabăţ, D., Bolea, C., Spătaru, M., Husarciuc, M. 2009. Sense tree parsing and definition segmentation in eDTLR Thesaurus, (in Romanian), Trandabăţ, D., Tufis, D., Cristea, D. (Eds.), Proceedings of the Workshop "Linguistic Resources and Instruments for Romanian Language Processing", pp. 65-74. Iasi, Romania, November, 19-21, 2008, "Al.I.Cuza" University Publishing House, ISSN 1843-911X.

Curteanu, N., Moruz, A., Trandabăţ, D. 2010a. An Optimal and Portable Parsing Method for Romanian, French, and German Large Dictionaries, Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon (CogALex 2010), pages 38–47, Beijing, August 2010.

Curteanu, N., Trandabăţ, D., Moruz, M. A. 2010b. Lexical semantics modeling and robust parsing for Romanian, French and German Thesauri (in Romanian), Adrian Iftene, Dan Cristea and Dan Tufiș (Eds.), Proceedings of ConsILR 2010, Univ. Al. I. Cuza Press, pp. 113-122. Bucharest, Romania, May 2010, ISSN 1843-911X.

Dagan, I. and Glickman, O. 2004. Probabilistic textual entailment: Generic applied modeling of language variability. In Learning Methods for Text Understanding and Mining, Grenoble, France.

Dagan, I., Magnini, B., Glickman, O. 2005. The PASCAL Recognising Textual Entailment Challenge. In Proceedings of the First PASCAL Challenge Workshop for Recognising Textual Entailment, pages 1–9, 11–13 April, 2005, Southampton, U.K.

Dinu, G., Wang, R. 2009. Inference rules and their application to recognizing textual entailment, Proceedings of the 12th Conference of the European Chapter of the ACL, pages 211–219, Athens, Greece, 30 March – 3 April 2009.

Delmonte, R., Tonelli, S., Aldo Piccolino Boniforti, M., Bristot, A., Pianta, E. 2005. VENSES – a Linguistically-Based System for Semantic Evaluation. In Proceedings of the First PASCAL Challenge Workshop for Recognising Textual Entailment, pages 49–52, 11–13 April, 2005, Southampton, U.K.

Delmonte, R., Tonelli, S., Tripodi, R. 2010. Semantic Processing for Text Entailment with VENSES, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Erjavec, T., Evans, R., Ide, N., Kilgarriff, A. 2000. The CONCEDE Model for Lexical Databases. Research Report on TEI-CONCEDE LDB Project, Univ. of Ljubljana, Slovenia.

Fellbaum, C., editor. 1998. WordNet: An Electronic Lexical Database. Language, Speech and Communications. MIT Press, Cambridge, Massachusetts.

Ferrández, O., Muñoz, R., Palomar, M. 2010. Alicante University at TAC 2009: Experiments in RTE, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA


Ferrés, D., Rodríguez, H. 2007. Machine Learning with Semantic-Based Distances Between Sentences for Textual Entailment. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pp.60-65. 28-29 June, Prague, Czech Republic

Glickman, O., Dagan, I. and Koppel, M. 2005. Web Based Probabilistic Textual Entailment. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 17-20, 11–13 April, 2005, Southampton, U.K.

Giampiccolo, D., Magnini, B., Dagan, I., Dolan, B. 2007. The Third PASCAL Recognising Textual Entailment Challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pp.1-9. 28-29 June, Prague, Czech Republic

Giampiccolo, D., Dang, H. T., Magnini, B., Dagan, I., Dolan, B. 2008. The Fourth PASCAL Recognising Textual Entailment Challenge. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Bar-Haim, R., Dagan, I., Dolan, B., Ferro, L., Giampiccolo, D., Magnini, B. and Szpektor, I. 2006. The Second PASCAL Recognising Textual Entailment Challenge, In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Venice, Italy.

Bar-Haim, R., Berant, J., Dagan, I., Greental, I., Mirkin, S., Shnarch, E., Szpektor, I. 2008. Efficient Semantic Deduction and Approximate Matching over Compact Parse Forests. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Bar-Haim, R., Berant J., and Dagan, I. 2009. A compact forest for scalable inference over entailment and paraphrase rules. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1056–1065, Singapore, 6-7 August 2009.

Harabagiu, S., Pasca, M., and Maiorano, S. 2000. Experiments with Open-Domain Textual Question Answering. Proceedings of the 18th conference on Computational linguistics, pp: 292-298, Beijing, China.


Harmeling, S. 2007. An Extensible Probabilistic Transformation-based Approach to the Third Recognising Textual Entailment Challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 137-142. 28-29 June, Prague, Czech Republic.

Heilman, M. and Smith, N. A. 2010. Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions. The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp 1011-1019.

Herrera, J., Peñas, A. and Verdejo, F. 2005. Textual Entailment Recognition Based on Dependency Analysis and WordNet. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 21-24, 11–13 April, 2005, Southampton, U.K.

Hickl, A., Bensley, J., Williams, J., Roberts, K., Rink, B., Shi, Y. 2006. Recognising Textual Entailment with LCC’s GROUNDHOG System. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 80-85, 10 April, 2006, Venice, Italy.

Hickl, A. and Bensley, J. 2007. A Discourse Commitment-Based Framework for Recognising Textual Entailment. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 185-190. 28-29 June, Prague, Czech Republic.

Hickl, A. 2008. Using Discourse Commitments to Recognize Textual Entailment. Proceedings of the 22nd Conference on Computational Linguistics (COLING 2008). Pp: 337-344. Manchester, United Kingdom.

Iftene, A., Balahur-Dobrescu, A. 2007. Hypothesis Transformation and Semantic Variability Rules Used in Recognising Textual Entailment. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 125-130. 28-29 June, Prague, Czech Republic.

Iftene, A., Balahur-Dobrescu, A. 2008. Named Entity Relation Mining Using Wikipedia. In Pro- ceedings of the Sixth International Language Resources and Evaluation (LREC'08). pp: 763-766. 28-30 May, Marrakech, Morocco.

Iftene, A. 2008. UAIC Participation at RTE4. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.


Iftene, A., Trandabăţ, D., Pistol, I., Moruz, A., Balahur-Dobrescu, A., Cotelea, D., Dornescu, I., Drăghici, I., Cristea, D. 2008. UAIC Romanian Question Answering system for QA@CLEF, Lecture Notes in Computer Science, 5152/2008, p. 336-34.

Iftene, A., Trandabăţ, D., Pistol, I., Moruz, A., Husarciuc, M., Cristea, D. 2009a. UAIC Participation at QA@CLEF2008, Evaluating Systems for Multilingual and Multimodal Information Access, Lecture Notes in Computer Science, vol. 5706/2009, pp. 385-392, ISBN 978-3-540-74998-1, ISSN 0302-9743 (Print) 1611-3349

Iftene, A., Trandabăţ, D., Pistol, I., Moruz, A.M., Husarciuc, M., Sterpu, M. and Turliuc, C. 2009b. Question Answering on English and Romanian Languages, Working Notes for the CLEF 2009 Workshop, pp: 229-236. Corfu, Greece, 30 Sept. - 2 Oct. 2009.

Iftene, A. 2009. Textual Entailment, PhD Thesis, “Al. I. Cuza” University, Iasi

Iftene A., Trandabăţ D., Husarciuc M., Moruz A. 2010a. Question Answering on Romanian, English and French Languages, in CLEF 2010 LABs and Workshops, Notebook Papers, 22-23 September 2010, Padua, Italy, ISBN 978-88-904810-0-0.

Iftene A., Trandabăţ D., Moruz A., Pistol I., Husarciuc M., Cristea D. 2010b. Question Answering on English and Romanian Languages, in Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments, LNCS Series, Eds. Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Penas, Giovanna Roda, Springer, 2010, pp. 229-236.

Iftene, A., Moruz, M.-A. 2010. UAIC Participation at RTE5, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Iftene, A., Moruz, M.-A. 2011. UAIC Participation at RTE6, The Sixth PASCAL Recognizing Textual Entailment Challenge, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Inkpen, D., Kipp, D. and Nastase, V. 2006. Machine Learning Experiments for Textual Entailment. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 17-20, 10 April, 2006, Venice, Italy.


Ivănescu, M. L. 2011. Tehnici de completare a wordnetului românesc din Dicţionarul Tezaur al limbii române. B. Sc. Thesis, Faculty of Computer Science, “Al. I. Cuza” University, Iaşi, Romania

Jia, H., Huang, X., Ma, T., Wan, X., Xiao, J. 2011. PKUTM Participation at TAC 2010 RTE and Summarization Track, The Sixth PASCAL Recognizing Textual Entailment Challenge, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Jijkoun, V. and de Rijke, M. 2005. Recognising Textual Entailment Using Lexical Similarity. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, 25–28 April, 2005, Southampton, U.K.

Joachims, T. 2002. Learning to Classify Text Using Support Vector Machines. Kluwer.

Jurafsky, D. and Martin, J. H. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice-Hall. Chapter 14-15.

Kammerer, M. 2000. Wörterbuchparsing. Grundsätzliche Überlegungen und ein Kurzbericht über praktische Erfahrungen, http://www.matthias-kammerer.de/content/WBParsing.pdf

Kamp, H. and Reyle, U. 1993. From Discourse to Logic. Introduction to Model theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht, Netherlands.

Katrenko, S. and Adriaans, P. 2006. Using Maximal Embedded Syntactic Subtrees for Textual Entailment Recognition. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 33-37, 10 April, 2006, Venice, Italy.

Kipper, K., Trang Dang, H., and Palmer, M. 2000. Class-based construction of a verb lexicon. In AAAI/IAAI, pages 691–696.

Kipper-Schuler, K. 2005. VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, Computer and Information Science Dept., University of Pennsylvania, Philadelphia, PA, June


Kipper, K., Korhonen, A., Ryant, N., and Palmer, M. 2006. Extending VerbNet with Novel Verb Classes. Fifth International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy. June, 2006.

Korhonen, A. and Briscoe, T. 2004. Extended Lexical-Semantic Classification of English Verbs. In Proceedings of the HLT/NAACL Workshop on Computational Lexical Semantics, pp: 38-45, Boston, MA

Kouylekov, M. and Magnini, B. 2005. Recognising Textual Entailment with Tree Edit Distance Algorithms. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 17-20, 25–28 April, 2005, Southampton, U.K.

Kouylekov, M., Negri, M., Mehdad, Y., Cabrio, E. 2011. FBK Participation in RTE6: Main and KBP Validation Task, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Kozareva, Z. and Montoyo, A. 2006. MLEnt: The Machine Learning Entailment System of the University of Alicante. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 17-20, 10 April, 2006, Venice, Italy.

Krestel, R., Witte, R., Bergle, S. 2010. Believe It or Not: Solving the TAC 2009 Textual Entailment Tasks through an Artificial Believer System, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Lemnitzer, L., Kunze, C. 2005. Dictionary Entry Parsing, ESSLLI 2005.

Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation, University of Chicago Press, Chicago, IL.

Li, B., Irwin, J., Garcia, E.V. and Ram, A. 2007. Machine Learning Based Semantic Inference: Experiments and Observations at RTE-3. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 159-164. 28-29 June, Prague, Czech Republic.

Li, F., Zheng, Z., Tang, Y., Bu, F., Ge, R., Zhu, X., Zhang, X., Huang, M. 2008. THU QUANTA at TAC 2008 QA and RTE track. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Litkowski, K. 2010. Overlap Analysis in Textual Entailment Recognition, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

MacCartney, B., Grenager, T., de Marneffe, M. C., Cer, D. and Manning, C. D. 2006. Learning to recognize features of valid textual entailments. In Proceedings of NAACL, pages 41– 48, New York City, NY.

MacKinlay, A., Baldwin, T. 2010, A Baseline Approach to the RTE5 Search Pilot, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Majumdar, D., Bhattacharyya, P. 2011. Lexical Based Text Entailment System for Main Task of RTE6, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Malakasiotis, P. 2010. AUEB at TAC 2009, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Mehdad, Y., Cabrio, E., Negri, M., Kouylekov, M., Magnini, B. 2010a. Using Lexical Resources in a Distance-Based Approach to RTE, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Mehdad, Y., Moschitti, A., Zanzotto, F. M. 2010b. SemKer: Syntactic/Semantic Kernels for Recognizing Textual Entailment, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Mehdad, Y., Moschitti, A., Zanzotto, F. M. 2010c. Syntactic/Semantic Structures for Textual Entailment Recognition. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pages 1020–1028, Los Angeles, California, June 2010.


Mehdad, Y., Negri, M., Federico, M. 2010d. Towards cross-lingual textual entailment, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pages 321–324, Los Angeles, California, June 2010.

Miller, G. 1995. WordNet: A Lexical Database for English, Communications of the ACM, 38(11): Pages 39-41.

Mirkin, S., Specia, L., Cancedda, N., Dagan, I., Dymetman, M. and Szpektor, I. 2009. Source- Language Entailment Modeling for Translating Unknown Terms. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 791–799, Suntec, Singapore, 2-7 August 2009.

Mirkin, S., Bar-Haim, R., Dagan, I., Shnarch, E., Stern, A., Szpektor, I., Berant, J. 2010. Addressing Discourse and Document Structure in the RTE Search Task, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Mirkin, S., Dagan, I., Pado, S. 2010a. Assessing the Role of Discourse References in Entailment Inference, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1209-1219

Mirkin, S., Berant, J., Dagan, I. and Shnarch, E. 2010b. Recognising Entailment within Discourse. Proceedings of the 23rd International Conference on Computational Linguistics, pp: 770-778, Beijing, China.

Moens, M. and Steedman, M. 1988. Temporal Ontology and Temporal Reference. Computational Linguistics, 14:15–38.

Montejo-Ráez, A., Perea, J.M, Martínez-Santiago, F., García-Cumbreras, M. Á., Martín- Valdivia, M., Ureña-López, A. 2007. Combining Lexical-Syntactic Information with Machine Learning for Recognising Textual Entailment. In Proceedings of the ACL- PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 78-82. 28-29 June, Prague, Czech Republic.

Montejo-Ráez, A., Perea-Ortega, J. M., Martínez-Santiago, F., Ureña-López, L. A. 2011. SINAI at RTE-6: Personalized Page Rank Vectors and Named Entities, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Moruz, M., Curteanu, N., Trandabat, D., Dornescu, I., Bolea, C. 2006. Parsarea predicatului (verbal / nominal) şi a clauzei (finite / nefinite) în limba română. Aplicare la parsarea FDG. Forăscu, C., Tufiş, D., Cristea, D. (Eds.), Lucrările atelierului „Resurse lingvistice şi instrumente pentru prelucrarea limbii române”, Iaşi, 3-4 November 2006, Editura Universităţii “Al.I.Cuza” Iaşi, ISBN 978-973-703-208-9, pag. 123-128, Iaşi, Romania, 2006.

Moruz, M. A. 2010. Folosirea verbelor pentru determinarea inferenţelor textuale, Lucrările conferinţei Resurse lingvistice şi instrumente pentru prelucrarea limbii române, Iftene, A., Teodorescu, H.N., Tufiş, D. (Eds.), pp. 223-233, Bucureşti, 6-7 May 2010.

Neff, M., Boguraev, B. 1989. Dictionaries, Dictionary Grammars and Dictionary Entry Parsing, Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pp. 91-101, Vancouver, British Columbia, Canada.

Niles, I. and Pease, A. 2001. Towards a standard upper ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems FOIS-2001, pp. 2-9.

Ofoghi, B., Yearwood, J. 2010. UB.dmirg: A Syntactic Lexical System for Recognizing Textual Entailments, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Ofoghi, B., Yearwood, J. 2011, UB.dmirg: Learning Textual Entailment Relationships Using Lexical Semantic Features, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Padó, S., de Marneffe, M., MacCartney, B., Rafferty, A., Yeh, E., Manning, C. 2008. Deciding Entailment and Contradiction with Stochastic and Edit Distance-based Alignment, In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Padó, S., Galley, M., Jurafsky , D., Manning, C. 2009. Robust Machine Translation Evaluation with Entailment Features, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp: 297-305

Palmer, M., Gildea, D., Kingsbury, P. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics, 31:1 , pp. 71-105.

Pakray, P., Bandyopadhyay, S., Gelbukh, A. 2010. Lexical based two-way RTE System at RTE-5, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Pakray, P., Pal, S., Poria, S., Bandyopadhyay, S., Gelbukh, A. 2011. JU_CSE_TAC: Textual Entailment Recognition System at TAC RTE-6, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Papineni, K., Roukos, S., Ward, T. and Zhu, W. 2001. BLEU: a method for automatic evaluation of machine translation. Research report, IBM.

Pazienza, M. T., Pennacchiotti, M., Zanzotto, F M. 2005. Textual Entailment as Syntactic Graph Distance: a rule based and a SVM based approach. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, Pages 9-12, 25–28 April, 2005, Southampton, U.K.

Păpuşoi, Ş. I. 2009. Tehnici automate de completare a wordnetului românesc din Dicţionarul Tezaur al Limbii Române, B. Sc. Thesis, Faculty of Computer Science, “Al. I. Cuza” University, Iaşi, Romania

Peñas, A., Rodrigo, Á., Verdejo, F. 2007. Overview of the Answer Validation Exercise 2007. In Working Notes of the CLEF 2007 Workshop, pp: 237-248. 19-21 September, Budapest, Hungary.


Pérez, D. and Alfonseca, E. 2005. Application of the Bleu algorithm for recognising textual entailments. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, Pages 9-12, 11–13 April, 2005, Southampton, U.K.

Perini, A. 2010. Detecting Textual Entailment with Conditions on Directional Text Relatedness Scores, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Perini, A. 2011. Detecting Textual Entailment with Conditions on Directional Text Relatedness Scores Revisited, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Puscasu, G., Iftene, A., Pistol, I., Trandabăţ, D., Tufiş, D., Ceauşu, A., Stefănescu, D., Ion, R., Dornescu, I., Moruz, A., Cristea, D. 2007. Cross-Lingual Romanian to English Question Answering at CLEF 2006. Evaluation of Multilingual and Multi-modal Information Retrieval, 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006, Revised Selected Papers, Lecture Notes in Computer Science vol. 4730/2007, pag. 385-394, ISBN 978-3-540-74998-1, ISSN 0302-9743 (Print) 1611-3349 (Online)

Quinlan, J.R. 2000. C5.0 Machine Learning Algorithm http://www.rulequest.com

Raina, R., Haghighi, A., Cox, C., Finkel, J., Michels, J., Toutanova, K., MacCartney, B., Marneffe, M.C., Manning, C., Ng, A., Y. 2005. Robust Textual Inference using Diverse Knowledge Sources. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, Pages 57-60, 11–13 April, 2005, Southampton, U.K.

Ren, H., Ji, D., Wan, J. 2010. WHU at TAC 2009: A Tri-categorization Approach to Textual Entailment Recognition, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Rodriguez, H., Climent, S., Vossen, P., Bloksma, L., Peters, W., Alonge, A., Bertagna, F., Roventini, A. 1998. The Top-Down Strategy for Building EuroWordNet: Vocabulary Coverage, Base Concepts and Top Ontology, Computers and the Humanities, 32(2-3), pp: 117-152.

Sacaleanu, B., Orasan, C., Spurk, C., Ou, S., Ferrandez, O., Kouylekov, M. and Negri, M. 2008. Entailment-based Question Answering for Structured Data. In Proceedings of Coling 2008: Companion volume: Posters and Demonstrations, pp. 29-32, Manchester, UK, August 2008.

Sammons, M., Vydiswaran, V. G. V. and Roth, D. 2010. Ask not what Textual Entailment can do for you.... Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp: 1199-1208

Shivhare, H., Nath, P., Jain, A. 2011. Semi Cognitive approach to RTE 6 - Using FrameNet for Semantic Clustering, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Stamou, S., Oflazer, K., Pala, K., Christodoulakis, D., Cristea, D., Tufis, D., Koeva, S., Totkov, G., Dutoit, D., Grigoriadou, M. 2002. BALKANET - A Multilingual Semantic Network for the Balkan Languages, in Proceedings of the International WordNet Conference, pp: 21-25, Mysore, India, January 21-25, 2002.

Szpektor, I. and Dagan, I. 2008. Learning entailment rules for unary templates, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 849–856. Manchester, August 2008

Stern, A., Shnarch, E., Mirkin, S., Kotlerman, L., Zeichner, N., Dagan, I., Lotan, A., Berant, J. 2011. Rule Chaining and Approximate Match in textual inference, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Tateishi, K., Ishikawa, K. 2011. IKOMA at TAC2010: Textual Entailment System using Local-Novelty Detection, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA


Tatu, M., Iles, B., Slavick, J., Novischi, A., Moldovan, D. 2006. COGEX at the Second Recognising Textual Entailment Challenge. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 17-20, 10 April, 2006, Venice, Italy.

Tatu, M. and Moldovan, D. 2007. COGEX at RTE3. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Pages 22-27. 28-29 June, Prague, Czech Republic.

Tătar, D., Şerban, G., Mihiş, A. and Mihalcea, R. 2009. Textual entailment as a directional relation. Journal of Research and Practice in Information Technology, 41(1): 53-64, 2009.

Tufiş, D. 2001. From Machine Readable Dictionaries to Lexical Databases, RACAI, Romanian Academy, Bucharest, Romania.

Tufiş, D., Barbu, E., Barbu-Mititelu, V., Ion, R., and Bozianu, L. 2004. The Romanian Wordnet. In Dan Tufiş (ed.), Romanian Journal on Information Science and Technology. Special Issue on BalkaNet, volume 7, pp. 105-122. Romanian Academy, April 2004. ISSN 1453- 8245.

Tufiş, D. 2008. Ro-WordNet: ontologie lexicală pentru limba română. Academica, XVIII (208- 209): 30-34, februarie-martie 2008. Editura Academiei Române. ISSN 1220-5737.

Tufiş, D., Ştefănescu, D. 2011. Connotative Meaning Markup for WordNet. Proceedings of the Romanian Academy, Series A, Volume 12, Number 2/2011, pp. 143–150

Volokh, A., Neumann, G., Sacaleanu, B. 2011. Combining Deterministic Dependency Parsing and Linear Classification for Robust RTE, Proceedings of the Third Text Analysis Conference (TAC 2010) November 15-16, 2010 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Wang, R. and Neumann, G. 2008. An Accuracy-Oriented Divide-and-Conquer Strategy for Recognising Textual Entailment. In Text Analysis Conference (TAC 2008) Workshop - RTE-4 Track. National Institute of Standards and Technology (NIST). November 17-19, 2008. Gaithersburg, Maryland, USA.

Wang, R., Zhang, Y., Neumann, G. 2010. A Joint Syntactic-Semantic Representation for Recognizing Textual Relatedness, Proceedings of the Second Text Analysis Conference (TAC 2009) November 16-17, 2009 National Institute of Standards and Technology Gaithersburg, Maryland, USA

Wang, R., Callison-Burch, C., 2010, Cheap Facts and Counter-Facts, Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pp: 163-167

Wu, D. 2005. Textual Entailment Recognition Based on Inversion Transduction Grammars. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 13-16, 11–13 April, 2005, Southampton, U.K.

XCES TEI Standard, Variant P5 (2007): http://www.tei-c.org/Guidelines/P5/

Zanzotto, F. M., Moschitti, A., Pennacchiotti, M., Pazienza, M. T. 2006. Learning textual entailment from examples. In Proceedings of the Second Challenge Workshop Recognising Textual Entailment, pages 50-55, 10 April, 2006, Venice, Italy.

Zanzotto, F.M., Moschitti, A. 2006. Automatic learning of textual entailments with cross-pair similarities. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 401–408, Sydney, July 2006


8. Annexes

8.1. Annex 1: Example of a Parsed DLR Entry

[Parsed entry for DORITÓR, -OÁRE ("desirous"): the sense tree segments the entry into four numbered senses, ranging from adjectival readings ("who feels pleasure or affection for something", "who nurtures affection or erotic feelings for someone", "who intends, aspires or consents to something") to a feminine noun sense denoting the plant Veronica hederifolia, each with its definitions and attested quotations; the bibliographic sources of the quotations are encoded as XML attributes (author, sigla, volume, pages). The entry closes with the morphological note (Pl.: doritori, -oare; older variants doritoriu, duritor) and the etymology (dori + suffix -tor).]

8.2. Annex 2: VerbNet Class “eat-39.1”

[The annex reproduces the VerbNet XML for class eat-39.1; the example sentences of its frames are:]

Cynthia ate the peach.
Cynthia ate.
Cynthia ate at the peach.
Cynthia ate herself sick.
He ate off of the table.
Cynthia drank the wine.
Cynthia drank.
Cynthia drank herself sick.
He drank out of the goblet.

8.3. Annex 3: RoWN Entries for Noun “casă”

[The annex lists the Romanian WordNet synsets for the noun "casă", with their interlingual (ENG20) identifiers; glosses translated from Romanian:]

ENG20-07489779-n (hypernym ENG20-07489070-n, domain factotum, SUMO FamilyGroup=)
  Literals: casă 1.10.2, neam 9.4, familie 2.3
  Gloss: aristocratic family

ENG20-07570097-n (hypernym ENG20-07571175-n, domain enterprise, SUMO Corporation=)
  Literals: casă 1.14, companie 1.4.x, firmă 2
  Gloss: the staff of a commercial or business organization

ENG20-07506727-n (hypernym ENG20-07488793-n, domain factotum, SUMO Group+)
  Literals: casă 1.11, căsătorie 2.2, căsnicie 1.2, menaj 3, mariaj x
  Gloss: two married persons

ENG20-07489070-n (hypernym ENG20-07610417-n, domain factotum, SUMO FamilyGroup+)
  Literals: casă 1.12, familie 3, neam 4.2, viţă 9
  Gloss: all the persons descended from a common ancestor (blood relatives); kin, lineage

ENG20-03413667-n (hypernyms ENG20-02809375-n, ENG20-03141215-n, domain building_industry, SUMO House=)
  Literals: casă 1.1.1
  Gloss: a building intended to serve as a dwelling for one or more families

8.4. Annex 4: VerbNet – FrameNet Mapping

[The annex tabulates the VerbNet–FrameNet mapping as XML elements pairing the thematic roles of the two resources via the attributes fnrole and vnrole (e.g. fnrole="Theme" vnrole="Theme").]