ON THE EFFECTIVE REVISION OF (BAYESIAN) LOGIC PROGRAMS

Aline Marins Paes

Doctorate Thesis presented to the Graduate Department of Systems Engineering and Computer Science, COPPE, of Federal University of Rio de Janeiro as a partial fulfillment of the requirements for the degree of Doctor of Systems Engineering and Computer Science.

Advisors: Gerson Zaverucha
          Vítor Manuel de Morais Santos Costa

Rio de Janeiro
September, 2011


Committee:

Prof. Gerson Zaverucha, Ph.D.

Prof. Vítor Manuel de Morais Santos Costa, Ph.D.

Prof. Mario Roberto Folhadela Benevides, Ph.D.

Prof. Bianca Zadrozny, Ph.D.

Prof. Stephen Howard Muggleton, Ph.D.

RIO DE JANEIRO, RJ - BRAZIL
SEPTEMBER, 2011

Paes, Aline Marins
On the Effective Revision of (Bayesian) Logic Programs / Aline Marins Paes. - Rio de Janeiro: UFRJ/COPPE, 2011
XXIV, 282 p.: il.; 29,7 cm.
Advisors: Gerson Zaverucha
          Vítor Manuel de Morais Santos Costa
Thesis (PhD) - UFRJ/COPPE/Department of Systems Engineering and Computer Science, 2011
Bibliography: p. 240-276
1. Machine Learning. 2. Inductive Logic Programming. 3. Theory Revision from Examples. 4. Probabilistic Logic Learning. 5. Bayesian Networks. 6. Stochastic Local Search. I. Zaverucha, Gerson et al. II. Federal University of Rio de Janeiro, COPPE, Department of Systems Engineering and Computer Science. III. Title.

To my dearest mother Geiza, the pillar of my creation, and to my beloved husband Ricardo, the pillar of my evolution.

Acknowledgments

First, I would like to thank God, for the gift of life. "God created everything through him, and nothing was created except through him." (John 1:3)

I could not have completed this thesis without the huge support of my family, and I would like to thank them greatly. I thank my parents, Antonio and Geiza, for everything they denied and sacrificed of themselves to provide me the best possible education; for their belief in me, which has been much greater than my belief in myself; and for supporting me in all the decisions that led me to this PhD. I thank my husband Ricardo, for being my greatest encourager, for carefully listening to all the ideas and difficulties related to this thesis, and for giving so much wise advice; for making my life much more meaningful and happy. I thank my sister Alessandra and my brother Thiago, my brother-in-law Braz and sister-in-law Marcela, for their affection, care and encouragement. I thank my niece Susan and my nephew João Vítor, for their ability to fill my life with joy, even in the most stressful times. And I thank my family from Niterói, Jane, Ricardo, Janete, Fernando, Arthur, Zélia and Ronald, for all the affection.

I would like to thank all the teachers and professors who have been extremely generous in transmitting knowledge to me. Especially, I would like to thank the professors who advised and supervised this research.

I thank my advisor, professor Gerson Zaverucha, for all the support, dedication to the research, and encouragement, and for being a model of determination in pursuing research. Gerson's passion for research was a big factor in my decision to become a

researcher. I also thank him for the opportunities he offered me during the PhD, contributing to and stimulating my participation in prestigious conferences, and for allowing me to co-advise some of his undergraduate students.

I thank my advisor, professor Vítor Santos Costa, for always being available to help me with my numerous doubts. I thank him very much for his generosity in the development, maintenance and continuous update of YAP Prolog, the best Prolog compiler ever. I thank him for all the valuable tips on how to write English elegantly. Additionally, I would like to thank him and his family for hosting me so nicely in their home in Portugal.

I thank my internship supervisor, professor Stephen Muggleton, for allowing me to be part of the Computational Bioinformatics Laboratory of Imperial College London for a year. His generosity, creativity, patience and profound scientific knowledge have greatly contributed to this research and to my own development as a researcher.

I would like to thank my friends for the great support they have provided me over these years, especially: Kate Revoredo, for being a model of determination, for the partnership, collaboration and a lot of great advice; Ana Luísa Duboc, for the partnership and encouragement; past and present friends of the AI Lab, Juliana Bernardes, Cristiano Pitangui, Carina Lopes, Elias Barenboim, Fábio Vieira, Glauber Menezes, Aloísio Pina, for sharing ideas over these years and for providing a fun workplace; friends of the Computational Bioinformatics Laboratory, José Santos, Ramin Ramezani, Pedro Torres, Niels Pahlavi, Jianzhong Chen, Alireza Tamaddoni-Nezhad, Hiroaki Watanabe, Robin Baumgarten and John Charnley, for sharing ideas during the year I worked at Imperial College and for being such wonderful hosts; my undergraduate "co-students", Eric Couto, Rafael Pereira and Guilherme Niedu, for their collaboration; and all my friends, those far away and those close to me.
I know that they have always kept me in their good thoughts.

I thank the committee very much, professors Stephen Muggleton, Mário Benevides and Bianca Zadrozny, for the careful analysis of so many pages in such little time and for the greatly valuable comments made on this thesis.

I thank the PESC secretaries, Claudia, Solange, Sônia and Gutierrez, and the CBL secretary, Bridget, who are always available to solve any administrative issue. I thank the PESC support team, Itamar, Adilson, João Vítor, Alexandre and Thiago, who are always ready to solve any physical and network issues. I thank UFRJ and Imperial College London for the physical facilities. And finally, I thank CAPES and CNPq for the financial support.

Abstract of Thesis presented to COPPE/UFRJ as a partial fulfillment of the requirements for the degree of Doctor of Science (D.Sc.)

ON THE EFFECTIVE REVISION OF (BAYESIAN) LOGIC PROGRAMS

Aline Marins Paes

September/2011

Advisors: Gerson Zaverucha
          Vítor Manuel de Morais Santos Costa

Department: Systems Engineering and Computer Science

Theory Revision from Examples is the process of improving user-defined or automatically generated theories, guided by a set of examples. Although several successful theory revision systems have been built in the past, they have not been widely used. We claim that the main reason is the inefficiency of these systems, as they usually yield large search spaces. This thesis contributes towards the goal of designing feasible theory revision systems. First, we focus on first-order theory revision. We introduce techniques from Inductive Logic Programming (ILP) and from Stochastic Local Search to reduce the size of each individual search space generated by a first-order logic (FOL) theory revision system. We show through experiments that it is possible to have a revision system as efficient as a standard ILP system that still generates more accurate theories. Moreover, we present an application involving the game of Chess that is successfully solved by theory revision, in contrast with a learning-from-scratch system that fails to correctly achieve the required theory. As first-order logic handles multi-relational domains well but fails to represent uncertain data, there has been great recent interest in joining relational representations with probabilistic reasoning mechanisms. We have contributed a probabilistic first-order theory revision system called PFORTE. However, despite promising results in artificial domains, PFORTE faces the complexity of searching and performing probabilistic inference over large search spaces, making it infeasible to apply to real-world domains. Thus, the second major contribution of this thesis is to address the bottlenecks of the probabilistic logic revision process. We aggregate techniques from ILP and probabilistic graphical models to reduce the search space of the revision process and also of the probabilistic inference. The new probabilistic revision system was successfully applied to real-world datasets.

Contents

Acknowledgments v

Nomenclature xiii

List of Algorithms xv

List of Figures xviii

List of Tables xxiv

1 Introduction 1
1.1 First-order Logic Theory Revision from Examples ...... 2
1.1.1 Contributions to First-order Logic Theory Revision ...... 4
1.2 Probabilistic Logic Learning ...... 7
1.2.1 BFORTE: Towards a Feasible Probabilistic Revision System ...... 8
1.3 Publications ...... 11
1.4 Thesis outline ...... 12

2 ILP and Theory Revision 14
2.1 Inductive Logic Programming ...... 15
2.1.1 Learning Settings in ILP ...... 17
2.1.2 Ordering the Hypothesis Search Space ...... 18
2.1.3 Mode Directed Inverse Entailment and the Bottom Clause ...... 18
2.2 First-order Logic Theory Revision from Examples ...... 23
2.2.1 Revision Points ...... 25
2.2.2 Revision Operators ...... 26
2.2.3 FORTE ...... 27


3 YAVFORTE: A Revised Version of FORTE, Including Mode Directed Inverse Entailment and the Bottom Clause 42
3.1 Introduction ...... 42
3.2 Restricting the Search Space of Revision Operators ...... 44
3.3 Improvements Performed on the Revision Operators ...... 46
3.3.1 Using the Bottom Clause as the Search Space of Antecedents when Revising a FOL theory ...... 46
3.3.2 Modifying the Delete Antecedent Operator to use Modes Language and to Allow Noise ...... 53
3.4 Experimental Results ...... 54
3.5 Conclusions ...... 63

4 Chess Revision: Acquiring the Rules of Variants of Chess through First-order Theory Revision from Examples 66
4.1 Motivation ...... 66
4.2 Revision Framework for Revising Rules of Chess to Turn Them into the Rules of Variants ...... 70
4.2.1 The Format of Chess Examples ...... 70
4.2.2 The Background Knowledge ...... 75
4.3 Modifying YAVFORTE to Acquire Rules of Chess Variants ...... 80
4.3.1 Starting the Revision Process by Deleting Rules ...... 80
4.3.2 Using abduction during the revision process ...... 81
4.3.3 Using negated literals in the theory ...... 89
4.4 Experimental Results ...... 92
4.4.1 Discussions about the automatic revisions performed by the revision system ...... 94
4.5 Conclusions ...... 98

5 Stochastic Local Search 100
5.1 Stochastic Local Search Methods ...... 101
5.1.1 SLS Methods Allowing Worsening Steps ...... 102
5.1.2 Rapid Random Restarts ...... 104
5.1.3 Other SLS approaches ...... 105
5.2 Stochastic Local Search Algorithms for Satisfiability Checking of Propositional Formulae ...... 105
5.2.1 The GSAT Algorithm ...... 106
5.2.2 The WalkSAT Algorithm ...... 107
5.3 Stochastic Local Search in ILP ...... 109


5.3.1 Stochastic Local Search for Searching the Space of Individual Clauses ...... 110
5.3.2 Stochastic Local Search in ILP for Searching the Space of Theories and/or using Propositionalization ...... 112

6 Revising First-Order Logic Theories through Stochastic Local Search 117
6.1 Introduction ...... 117
6.2 Stochastic Local Search for Revision Points ...... 119
6.3 Stochastic Local Search for Literals ...... 122
6.3.1 Stochastic Component for Searching Literals ...... 124
6.4 Stochastic Local Search for Revisions ...... 131
6.4.1 Stochastic Greedy Search for Revisions with Random Walk ...... 132
6.4.2 Stochastic Hill-Climbing Search for Revisions with Random Walks ...... 136
6.4.3 Hill-Climbing Search for Revisions with Stochastic Escapes ...... 139
6.4.4 Simulated Annealing Search for Revisions ...... 140
6.5 Experimental Results ...... 143
6.5.1 Datasets ...... 145
6.5.2 Behavior of the Stochastic Local Search Algorithms with Different Parameters ...... 146
6.5.3 Comparing Runtime and Accuracy of SLS Algorithms to Aleph and YAVFORTE ...... 151
6.6 Conclusions ...... 155

7 Probabilistic Logic Learning 159
7.1 Bayesian Networks: Key Concepts ...... 161
7.1.1 D-separation ...... 164
7.1.2 Inference in Bayesian Networks ...... 172
7.1.3 Learning Bayesian Networks from Data ...... 172
7.2 Bayesian Logic Programs ...... 177
7.2.1 Bayesian Logic Programs: Key concepts ...... 179
7.2.2 Answering queries procedure ...... 181
7.2.3 Learning BLPs ...... 184
7.3 PFORTE: Revising BLPs from Examples ...... 190
7.3.1 PFORTE: Key concepts ...... 191
7.3.2 The PFORTE Algorithm ...... 193


8 BFORTE: Addressing Bottlenecks of Bayesian Logic Programs Revision 200
8.1 Addressing Search Space of New Literals ...... 201
8.2 Addressing Selection of Revision Points ...... 203
8.2.1 Bayesian Revision Point ...... 203
8.2.2 Searching Bayesian Revision Points through D-Separation ...... 205
8.2.3 Analyzing the Benefits Brought by Revision Operators According to D-Separation ...... 207
8.3 Addressing Inference Space ...... 219
8.3.1 Collecting the Requisite Nodes by d-separation ...... 220
8.3.2 Grouping together Ground Clauses and Networks ...... 221
8.4 Experimental results ...... 223
8.4.1 Comparing BFORTE to ILP and FOL Theory revision ...... 226
8.4.2 Speed up in the revision process due to the Bottom clause ...... 227
8.4.3 Selection of Revision Points ...... 228
8.4.4 Inference time ...... 230
8.5 Conclusions ...... 232

9 General Conclusions and Future work 235
9.1 Summary ...... 235
9.2 Future work: First-Order Logic Theory Revision ...... 237
9.3 Future work and work in progress: Probabilistic Logic Revision ...... 238
9.3.1 Revision of Bayesian Logic Programs ...... 238
9.3.2 Revision of Stochastic Logic Programs ...... 238

Bibliography 240

Index 276

A The Laws of International Chess 279

Nomenclature

AIC Akaike's Information Criterion

BLP Bayesian Logic Program

CLP(BN) Constraint Logic Programming with Bayes Nets

CLL Conditional Log-likelihood

CPD Conditional Probability Distribution

DAG Directed Acyclic Graph

EM Expectation-Maximization

FN False Negative

FORTE First-Order Revision of Theories from Examples

FP False Positive

ICI Independence of Causal Influence

ICL Independent Choice Logic

IID Independently and Identically Distributed

ILS Iterated Local Search

ILP Inductive Logic Programming

LL Log-Likelihood

LPM Logical Probabilistic Model

MDIE Mode Directed Inverse Entailment


MDL Minimum Description Length

MLE Maximum Likelihood Estimator

MLN Markov Logic Networks

PCFG Probabilistic Context-free Grammar

PGM Probabilistic Graphical Model

PILP Probabilistic Inductive Logic Programming

PII Probabilistic Iterative Improvement

PLL Probabilistic Logic Learning

PLM Probabilistic Logical Model

PRM Probabilistic Relational Models

RII Randomised Iterative Improvement

RRR Rapid Random Restarts

SA Simulated Annealing

SLP Stochastic Logic Program

SLS Stochastic Local Search

SRL Statistical Relational Learning

TN True Negative

TP True Positive

wp Walk Probability

List of Algorithms

2.1 Top-level Algorithm of the Bottom Clause Construction Process (De Raedt, 2008) ...... 19
2.2 Bottom clause construction Algorithm (Muggleton, 1995) ...... 21
2.3 FORTE Algorithm (Richards and Mooney, 1995) ...... 28
2.4 FORTE Algorithm for collecting specialization revision points (Richards and Mooney, 1995) ...... 30
2.5 FORTE Algorithm for collecting generalization revision points (Richards and Mooney, 1995) ...... 32
2.6 FORTE Delete Rule Revision Operator Algorithm ...... 33
2.7 FORTE Top Level Add Antecedents Revision Operator Algorithm ...... 33
2.8 Hill climbing add antecedents Algorithm ...... 34
2.9 Top level Algorithm for Adding Antecedents to a Clause Using Relational Pathfinding Approach ...... 36
2.10 Hill Climbing Antecedents Generation Algorithm ...... 37
2.11 Relational Pathfinding Antecedent Generation Algorithm (Richards and Mooney, 1995) ...... 38
2.12 Top-level Delete Antecedents Revision Operator Algorithm ...... 39
2.13 Hill Climbing Delete Antecedents Algorithm ...... 40
2.14 Delete Multiple Antecedents Algorithm ...... 40
2.15 Top-level Add rule Revision Operator Algorithm ...... 41
3.1 YAVFORTE Top-Level Algorithm ...... 45
3.2 Bottom clause Construction Algorithm in FORTE-MBC ...... 49
3.3 Hill Climbing Add Antecedents Algorithm Using the Bottom Clause ...... 50
3.4 Top-level Relational Pathfinding Add Antecedents Algorithm Using the Bottom Clause ...... 51
3.5 Remodeled Greedy Hill Climbing Delete Antecedents Algorithm ...... 55
4.1 Chess examples generator Algorithm ...... 75
4.2 First Step for Deleting Rules ...... 81


4.3 Algorithm for Refining a Clause whose Head Corresponds to an Intermediate Predicate ...... 85
4.4 Algorithm for fabricating an intermediate instance ...... 85
4.5 Atomic Facts Abduction Algorithm ...... 87
5.1 Top-level Step Function Performed by an Algorithm using Randomised Iterative Improvement method (Hoos and Stützle, 2005) ...... 103
5.2 Simulated Annealing algorithm (Geman and Geman, 1984) ...... 104
5.3 GSAT Algorithm (Selman et al., 1992) ...... 107
5.4 WalkSAT General Algorithm (Selman et al., 1994) ...... 108
5.5 Most Used Heuristic Function in the WalkSAT Architecture ...... 109
5.6 A SLS algorithm to learn k-term DNF formulae (Rückert and Kramer, 2003) ...... 115
6.1 SLS Algorithm for generating revision points ...... 121
6.2 Algorithm for adding antecedents using hill-climbing SLS ...... 128
6.3 Stochastic Relational-pathfinding ...... 129
6.4 Algorithm for deleting antecedents using hill-climbing SLS ...... 130
6.5 Stochastic version of the algorithm for deleting multiple antecedents ...... 131
6.6 Stochastic Greedy Search for Revisions with Random Walk ...... 135
6.7 Stochastic Hill-climbing Search for Revisions with Random Walk ...... 138
6.8 Hill-Climbing Search for Revision with Stochastic Escape ...... 141
6.9 Simulated Annealing Search for Revisions ...... 144
7.1 The Bayes Ball Algorithm to Collect Requisite Nodes in a Network (Shachter, 1998) ...... 170
7.2 Algorithm for Inducing a Support Network from a Probabilistic Query (Kersting and De Raedt, 2001a) ...... 182
7.3 Algorithm for learning BLPs (Kersting and De Raedt, 2001b) ...... 190
7.4 PFORTE Algorithm (Paes et al., 2006a) ...... 195
7.5 Algorithm for generating Probabilistic Revision Points in a Logical Probabilistic Model such as BLP ...... 196
7.6 Algorithm for deletion/addition of antecedents in PFORTE ...... 197
7.7 Algorithm to Score Possible Revisions in PFORTE ...... 199
8.1 BFORTE Top-Level Algorithm ...... 205
8.2 Top-level Selection of Revision Points ...... 207
8.3 Generation of Revision Points and Revisions Top-Level Procedure ...... 219
8.4 Top-level Procedure for Performing Inference over Requisite Nodes Only ...... 221
8.5 Top-level Procedure for Detecting Similar Ground Clauses and Mapping them to Only One ...... 223


8.6 Top-level Procedure for Detecting Similar Networks and Mapping them to Only One ...... 224

List of Figures

2.1 Schema of the learning task in ILP, where BK is background knowledge, E is the set of positive (E+) and negative (E−) examples and H0 is the theory learned by the ILP system ...... 16
2.2 Schema of Theory Revision from Examples, where FDT is the fixed preliminary knowledge and H0 is the modifiable preliminary knowledge, E is the set of positive (E+) and negative (E−) examples. H is the theory returned by the revision system ...... 24
2.3 An instance of a relational graph representing part of the family domain (Richards and Mooney, 1995) ...... 35

3.1 Scatter plot for all datasets and systems settings considering theories learned by Aleph with its default parameters ...... 62
3.2 Scatter plot for all datasets and systems settings considering theories learned by Aleph with its default parameters, except for clause length ...... 62
3.3 Scatter plot for all datasets and systems settings considering theories learned by Aleph with better parameter settings than just default ...... 63

4.1 Visualization of situations when an ILP system should learn definitions for subconcepts. Figure (a) shows a chess board with a checked king. Figure (b) shows a piece working as a pin ...... 67
4.2 Boards of variants of chess. In the first row is international Chess, beside it Xiangqi (Chinese chess), followed by Shogi (Japanese Chess). In the next row is one of the first chess games, Chaturanga in the Hindu version, followed by the Chess in the Round game and a more modern version of Chaturanga, quite close to international Chess. In the last group are Shogi, Anti King Chess, Los Alamos and , in this order ...... 69


4.3 Components of the chess revision framework: the Initial theory is revised by the Revision system, using FDT and Dataset. The Dataset is created by the Examples generator component ...... 71
4.4 The initial setting of the board in international chess and the atomic facts corresponding to it. The board figure is taken from http://www.mark-weeks.com/aboutcom/bloa0000.htm ...... 73
4.5 Initial boards of minichess games, taken from Wikipedia ...... 93

6.1 Comparing runtime and accuracy of Algorithm 6.1 on the Pyrimidines dataset, with the number of revision points varying in 1, 5, 10, 20. Probabilities are fixed at 100% and 50% ...... 147
6.2 Comparing runtime and accuracy of Algorithm 6.1 on the Proteins dataset, with the number of revision points varying in 1, 5, 10, 20. Probabilities are fixed at 100% and 50% ...... 147
6.3 Comparing runtime and accuracy of SLS algorithms on the Pyrimidines dataset, varying the maximum number of iterations, which is set to 10, 20, 30, 40. Probabilities are fixed at 100% and 50%, when applicable ...... 148
6.4 Comparing runtime and accuracy of SLS algorithms on the Proteins dataset, varying the maximum number of iterations, which is set to 10, 20, 30, 40. Probabilities are fixed at 100% and 50%, when applicable ...... 149
6.5 Comparing runtime and accuracy of SLS algorithms on the Pyrimidines dataset, with probabilities varying in 100, 80, 60, 40. The maximum number of iterations is fixed at 20, when applicable ...... 150
6.6 Comparing runtime and accuracy of SLS algorithms on the Proteins dataset, with probabilities varying in 100, 80, 60, 40. The maximum number of iterations is fixed at 20, when applicable ...... 150
6.7 Comparing accuracy of SLS algorithms to Aleph and YAVFORTE on the Pyrimidines dataset. Results of SLS algorithms are those with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parentheses ...... 153
6.8 Comparing accuracy of SLS algorithms to Aleph and YAVFORTE on the Proteins dataset. Results of SLS algorithms are those with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parentheses ...... 153


6.9 Comparing accuracy of SLS algorithms to Aleph and YAVFORTE on the Yeast Sensitivity dataset. Results of SLS algorithms are those with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parentheses ...... 154
6.10 Comparing runtime of SLS algorithms to Aleph and YAVFORTE on the Pyrimidines dataset. Results of SLS algorithms are those with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parentheses ...... 155
6.11 Comparing runtime of SLS algorithms to Aleph and YAVFORTE on the Proteins dataset. Results of SLS algorithms are those with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parentheses ...... 155
6.12 Comparing runtime of SLS algorithms to Aleph and YAVFORTE on the Yeast Sensitivity dataset. Results of SLS algorithms are those with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parentheses ...... 156

7.1 Bayesian network representing the blood type domain within a particular family ...... 164
7.2 The four possible edge trails from X to Y, via E. Figure (a) is an indirect causal effect; Figure (b) is an indirect evidential effect; Figure (c) represents a common cause between two nodes and Figure (d) shows a common effect trail ...... 167
7.3 The ball visiting nodes with the Bayes Ball algorithm. Red nodes are observed and the green node is the query node ...... 171
7.4 Support networks created from the query variable bt(susan) and evidence variables bt(brian), pc(brian), pc(susan), pc(allen) and bt(allen) ...... 183
7.5 Bayesian network after joining the same random variables from the different support networks of Figure 7.4 ...... 184
7.6 Bayesian network representing the blood type domain within a particular family. Nodes in magenta are yielded by the same Bayesian clause ...... 186


7.7 Figure (a) is a Bayesian network without adding extra nodes to represent decomposable combining rules. Node pred(1) is produced by three different clauses. Figure (b) is the induced network representing decomposable combining rules by adding extra nodes (yellow nodes) so that each node is produced by exactly one Bayesian clause. Nodes n?pred(1) have the domain of pred and cpd(c) associated ...... 188
7.8 Decomposable combining rules expressed within a support network. Instantiations of the same clause and different clauses are combined separately. Nodes l1 n?pred(1) represent different instantiations of the same clause and nodes l2 n?pred(1) represent different clauses ...... 189

8.1 Non-observed node visited from a child, where shadowed nodes are observed nodes. Figure (a) shows a non-observed node visited from a child. Figure (b) shows this node passing the received ball to its parents and children ...... 209
8.2 An example of the effect of deleting a rule corresponding to a non-observed node visited from a child, in a misclassification node visiting scenario ...... 209
8.3 An example of the effect of adding antecedents to a rule corresponding to a non-observed node visited from a child, in a misclassification node visiting scenario ...... 210
8.4 The effect of deleting antecedents from a rule corresponding to a non-observed node visited from a child, in a misclassification node visiting scenario ...... 210
8.5 The effect of adding a new rule from a non-observed node visited from a child, in a misclassification node visiting scenario. Figure (a) shows a situation where the old and new rules had to be combined. Figure (b) shows the case where both rules were not combined ...... 211
8.6 Observed node visited from a child, where shadowed nodes are observed nodes. Figure (a) shows an observed node visited from a child. Figure (b) shows this node blocking the ball ...... 214
8.7 Non-observed node visited from a parent, where shadowed nodes are observed nodes. Figure (a) shows a non-observed node visited from a parent. Figure (b) shows this node passing the ball to its children ...... 215


8.8 The effect of deleting antecedents from a clause whose head is a non-observed node visited from a parent. Figure (a) is the network before deleting antecedents. The red node is the visiting parent and the yellow node is the visited node. Figure (b) shows a possible resulting network, after the clause becomes more general ...... 216
8.9 Observed node visited from a parent, where shadowed nodes are observed nodes. Figure (a) shows an observed node visited from a parent. Figure (b) shows this node passing the ball back to its parents ...... 217

A.1 Initial position of a Chess board: first row: rook, knight, bishop, queen, king, bishop, knight, and rook; second row: pawns ...... 280
A.2 Castling from the queen side and from the king side (figure is due to http://www.pressmantoy.com/instructions/instruct_chess.html) ...... 281
A.3 En-passant move of a pawn in the Game of Chess. Figure is from http://www.learnthat.com/courses/fun/chess/beginrules15.shtml ...... 282

List of Tables

2.1 Some standard Logic Programming terms and their definitions . . . . 15

3.1 Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer amine dataset ...... 58
3.2 Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer toxic dataset ...... 59
3.3 Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer choline dataset ...... 59
3.4 Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer scopolamine dataset ...... 60
3.5 Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the DssTox dataset ...... 60

4.1 Format of one example in the Chess dataset ...... 72
4.2 Part of one example in the Mini-chess dataset ...... 74
4.3 Ground facts representing pieces and colors in the game and files and ranks of the board, considering international Chess ...... 77
4.4 An extracted piece of the chess theory, to exemplify the need of abduction when learning intermediate concepts ...... 82
4.5 An extracted piece of the chess theory, to exemplify the need of abducing predicates when searching for revision points ...... 88
4.6 An extracted piece of the chess theory, to exemplify the need of marking a negated literal as a revision point ...... 92
4.7 Evaluation of the revision on chess variants ...... 94

7.1 Propositional Logic Program representing the network in Figure 7.1 ...... 178
7.2 First-order Logic Program representing the regularities in the network of Figure 7.1 ...... 178


7.3 First-order Logic Program representing the regularities in the network of Figure 7.1, where each CPD is represented as a list of probability values ...... 179
7.4 Qualitative part of a Bayesian Logic Program ...... 182
7.5 Format of an example in the PFORTE system ...... 192

8.1 BFORTE compared to Aleph and FORTE ...... 226
8.2 Comparison of runtime and accuracy of BFORTE with the Bottom Clause and BFORTE using FOIL to generate literals ...... 228
8.3 Comparison of BFORTE runtime and score with PFORTE and BLP algorithms for selecting points to be modified ...... 230
8.4 Inference runtime in seconds for the UW-CSE dataset ...... 231
8.5 Inference runtime in seconds for the Metabolism dataset ...... 231
8.6 Inference runtime in seconds for the Carcinogenesis dataset ...... 232

Chapter 1

Introduction

Artificial Intelligence is concerned with building computer programs that solve pro- blems which would require intelligence if solved by a human. As intelligence requires learning, to build computer programs that can learn plays a central role in artificial intelligence. This task is the subject of the area of machine learning, whose ultimate goal is to construct computer programs that can automatically improve their behav- ior with experience (Mitchell, 1997). Traditional machine learning algorithms learn from independent homogeneous examples, described in attribute-value (or proposi- tional) format. However, in many real world applications, data are multi-relational, highly structured and sampled from complex relations. Consequently, propositional algorithms are not appropriate to learn from them. In this case, formalisms such as first-order logic are more adequate to represent such data than the classical propo- sitional format. Inductive Logic Programming (Muggleton, 1992; Muggleton and De Raedt, 1994) is the process of automatically learning First-Order Logic Theories from a set of examples and a fixed body of prior knowledge, the background knowledge, both written as logical clauses. ILP offers several advantages. It learns from an ex- pressive language; it is easy for humans to understand; the background knowledge is a useful tool to guide the learning process. A large number of algorithms and systems have been developed towards learning in this context. Popular examples in- clude FOIL (Quinlan, 1990), Progol (Muggleton, 1995), Claudien (De Raedt, 1997), Aleph (Srinivasan, 2001b), Tilde (Blockeel and De Raedt, 1998), among many oth- ers. ILP has been successfully applied on a number of applications, mainly involving

chemo- and bio-informatics (Muggleton et al., 1992; Bratko and King, 1994; Srinivasan et al., 1997; Srinivasan et al., 2006; Tamaddoni-Nezhad et al., 2006; Srinivasan and King, 2008). Most ILP systems learn one clause at a time, employing a covering approach: after learning a clause, positive examples covered by it are removed from the set of examples, so that only unprovable positive examples remain to be explained by new clauses.

1.1 First-order Logic Theory Revision from Examples

ILP systems consider the background knowledge to be correct. However, it may be the case that prior knowledge is available but is incomplete or only partially correct. The background theory may have been elicited from a domain expert, who relies on incorrect assumptions or who has only a partial, even if useful, understanding of the domain. Or it may be that new examples, which cannot be explained by the current theory, have become available. Still, it may be the case that a theory has been learned or elicited for a domain and one would like to transfer it to a related domain. In all such cases, since the initial theory probably contains important information, one would like to take advantage of the original theory as a starting point for the learning process, and repair or improve it. Ideally, this should accelerate the learning process and result in more accurate theories. Several theory refinement systems have been proposed towards this goal (Shapiro, 1981; Muggleton, 1987; Wogulis and Pazzani, 1993; Wrobel, 1994; Adé et al., 1994; Wrobel, 1996; Richards and Mooney, 1995; Garcez and Zaverucha, 1999; Esposito et al., 2000). Such systems assume the initial theory is approximately correct. If so, then only some points (clauses and/or literals) in the theory prevent it from correctly modeling the dataset. Therefore it should be more effective to search for such points in the theory and revise them than to use an algorithm that learns a whole new theory from scratch. Thus, theory revision systems operate by searching for revision points, which are the points considered responsible for the faults in the theory, and then proposing revisions to such points, by applying a number of revision operators to each one. Theory revision systems thus learn from whole theories, instead of individual clauses.

Despite the advantages of refining complete theories (Bratko, 1999), this unfortunately produces a large search space of proposed hypotheses. Consider, for example, the FORTE (Richards and Mooney, 1995) system, which automatically revises first-order logic theories from a set of positive and negative examples. Its first step is to search for the points responsible for misclassifying positive and negative examples. After identifying all revision points in the current theory, FORTE proposes revisions to each one of them, using a set of revision operators. Those operators include adding/deleting antecedents to/from a clause and adding/deleting rules to/from the theory. Finally, from the set of proposed revisions, FORTE chooses one to be implemented and restarts the cycle. Arguably, each step of such a revision process produces a large search space, depending on the number of revision points and the number of possible revisions that can be proposed to each one of them. Thus, despite the various theory revision systems developed in the past (Wrobel, 1996), they are not widely used anymore. There are two main reasons for that, as pointed out in (Dietterich et al., 2008): (1) the lack of applications with substantial codified initial theory and (2) as discussed above, the large search space theory revision systems explore. The first problem no longer holds, as large-scale resources of background knowledge in areas such as biology have become available (Muggleton, 2005). Therefore, there is an increasing need for efficient theory revision systems that can fulfill the promise of theory revision. The work in this thesis contributes toward the goal of obtaining more efficient relational revision systems, both for purely logical and for probabilistic relational theories.
We demonstrate that search can be substantially improved by techniques such as bottom-clause-based search and stochastic search, with comparable or improved performance, and we demonstrate that these techniques allow revising challenging applications. Next, we introduce each individual contribution of the present thesis.


1.1.1 Contributions to First-order Logic Theory Revision

In this work we focus on the FORTE system, since it revises theories automatically from a set of positive and negative examples. FORTE performs an iterative hill-climbing search in three steps. First, it searches for the points in the current theory considered responsible for the misclassification of some example. Second, it searches for modifications at each revision point, considering the appropriate revision operators, so that a number of revisions to the theory are proposed. Third, it must choose which revision is going to be implemented. Execution time strongly depends on the second step. When searching for modifications to be implemented in the theory, FORTE takes into account operators that attempt to generalize or specialize the theory, according to the revision point. To specialize theories, it simply tries to delete rules and/or add antecedents to existing rules. To generalize theories, it may delete antecedents from existing rules or add new rules to the theory¹. Depending upon the number of revision points, FORTE may take a large amount of time to apply each revision operator to each revision point. Additionally, it may be the case that the domain expert already has some idea or constraint about how the theory can be modified. For example, he/she may want the new theory to preserve all the old clauses, so that rule deletion would not be allowed. Considering those issues, we modified the FORTE algorithm so that not all operators are employed to propose modifications to the theory. First, we observe that a simpler operator (for example, rule deletion, compared to antecedent addition) may already achieve the goal of the revision point, in which case it is not necessary to propose modifications with a more complex operator. Second, we allow the expert to decide beforehand which operator(s) will be employed in the revision process.
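The three-step cycle just described can be sketched as a generic hill-climbing loop. This is an illustrative sketch, not FORTE's actual implementation: the helper names (`find_revision_points`, `operators`, `score`) and the toy usage below are our own assumptions.

```python
# Hypothetical sketch of a FORTE-style hill-climbing revision loop.
# A "theory" is a list of clauses; operators propose candidate revisions
# and the best-scoring one is kept until no proposal improves the score.

def revise(theory, examples, find_revision_points, operators, score):
    """Iteratively revise `theory` until no proposed revision improves the score."""
    current_score = score(theory, examples)
    while True:
        # Step 1: locate the points blamed for misclassified examples.
        points = find_revision_points(theory, examples)
        # Step 2: apply every enabled operator to every revision point.
        candidates = [op(theory, pt) for pt in points for op in operators]
        if not candidates:
            return theory
        # Step 3: implement the best revision, if it improves the current theory.
        best = max(candidates, key=lambda t: score(t, examples))
        if score(best, examples) <= current_score:
            return theory  # hill climbing stops at a local optimum
        theory, current_score = best, score(best, examples)

# Toy usage: "clauses" are just labels, the score counts how many target
# labels the theory contains, and the single operator adds one missing label.
target = {"a", "b", "c"}
score = lambda th, ex: len(set(th) & ex)
find_points = lambda th, ex: [l for l in ex - set(th)]
add_rule = lambda th, pt: th + [pt]

final = revise([], target, find_points, [add_rule], score)
```

In FORTE itself the score is computed over the positive and negative examples and the operators are the generalization and specialization operators discussed above.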
We further observe that FORTE spends a large amount of time choosing literals to be added to clauses, namely when applying the add-antecedents and add-rules operators. This is due to the top-down approach borrowed from FOIL (Quinlan, 1990). In this approach, literals are generated from the knowledge base without

¹FORTE has other operators that we do not consider in this work, such as absorption and identification.

taking into account the set of examples. The dataset is employed only to score the literals, so that the best one is chosen to be added to a clause. On the other hand, standard ILP algorithms such as Progol (Muggleton, 1995) take advantage of a hybrid bottom-up and top-down approach to refine clauses. First, they create a Bottom Clause from a single positive example, following a Mode Directed Inverse Entailment (MDIE) approach (Muggleton, 1995). Next, this variabilized Bottom Clause composes the search space of possible antecedents to be added to a clause. Arguably, the size of the search space can be greatly reduced compared to trying all possible antecedents generated from the knowledge base. Thus, a second contribution of this thesis is to replace the FOIL literal-generation approach with the use of the Bottom Clause. Sets of literals subsuming the Bottom Clause are scored and the best one is chosen to be added to a clause. The system built upon the modifications discussed in the last two paragraphs is called here YAVFORTE, and it includes FORTE MBC (Duboc et al., 2009) as the algorithm for collecting literals from the Bottom Clause. Next, we have designed a challenging application to show the power of revision compared to learning from scratch. The application concerns revising the rules of the game of Chess to acquire the rules of variants of this game. Throughout the years the game of chess has inspired several variants, either to be more challenging to the player, or to produce an easier and faster variant of the original game (Pritchard, 2007). It also has several different regional versions, such as the Chinese and Japanese versions. Ideally, once the rules of chess have been obtained, we would like to use them as a starting point to obtain the rules of a variant. However, such rules need non-trivial changes in order to represent the particular aspects of the variant.
In a game such as chess this is a complex task that may require addressing different board sizes, introducing or deleting promotion and capture rules, and redefining the role of specific pieces in the game. Thus, we address this problem as an instance of theory revision from examples. Additionally, we show that, to tackle this problem effectively, the revision system must handle abduction (Flach and Kakas, 2000a) and negation as failure (Clark, 1978). Both of these techniques are therefore integrated into the revision process of the YAVFORTE system.
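The bottom-clause idea discussed above can be sketched under strong simplifying assumptions: ground background facts only, a depth bound of one, and no mode declarations (which Progol and FORTE MBC do use). The function names and the tuple representation of literals are ours, for illustration only.

```python
# Didactic sketch of bottom-clause construction: starting from a seed example,
# gather background facts that share constants with terms seen so far, then
# replace constants by variables (variabilization).

def bottom_clause(seed, bk_facts, depth=1):
    """seed: (pred, (args...)); bk_facts: list of ground (pred, (args...))."""
    known = set(seed[1])          # constants reachable so far
    body = []
    for _ in range(depth):
        for pred, args in bk_facts:
            if any(a in known for a in args) and (pred, args) not in body:
                body.append((pred, args))
        for _, args in body:      # newly reached constants feed the next layer
            known.update(args)
    # Variabilize: each distinct constant gets its own variable name.
    varmap = {}
    def var(c):
        return varmap.setdefault(c, f"V{len(varmap)}")
    head = (seed[0], tuple(var(a) for a in seed[1]))
    return head, [(p, tuple(var(a) for a in args)) for p, args in body]

# Toy usage: saturate grandparent(ann, carl) against parent/2 facts.
bk = [("parent", ("ann", "bob")), ("parent", ("bob", "carl")),
      ("parent", ("eve", "dan"))]
head, body = bottom_clause(("grandparent", ("ann", "carl")), bk)
# head/body encode: grandparent(V0, V1) :- parent(V0, V2), parent(V2, V1).
```

With a deeper bound and mode declarations restricting which argument positions may introduce new terms, this grows into the MDIE Bottom Clause that bounds the literal search space.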


Although those earlier developments are capable of improving the runtime of the revision process without harming the accuracy of the system, the performance of the revision system must still be improved when it faces large initial theories and datasets. When looking for revision points, the system must first select the misclassified examples. Next, it must mark the clauses responsible for such misclassification. In case the initial theory is composed of a large number of clauses and/or there are several (misclassified) examples, the search for revision points can be very expensive. Moreover, if after selecting the revision points a large number of clauses is marked to be revised, several possible modifications are going to be proposed. Additionally, it may be the case that the background knowledge and the language of the theory (predicates and mode declarations) are large enough to produce a huge Bottom Clause. In those situations, the modifications we described before may not lead to an acceptable revision time. In order to overcome the intractability of the revision process in those situations, we propose to abandon completeness in favor of finding good solutions in a reasonable amount of time. To achieve this goal we make use of stochastic local search techniques (Hoos and Stützle, 2005) to introduce randomization into the searches performed during the revision process. We include stochastic search components in the key searches within the revision process of YAVFORTE: (1) the search for revision points; (2) the search for literals to be added to or deleted from a clause when proposing modifications within a revision operator; and (3) the search for the revision operator that will indeed modify the theory. To summarize, the first major contribution of this thesis, concerning first-order logic revision, is the development of an effective revision system named YAVFORTE.
Compared to FORTE, YAVFORTE searches reduced spaces of new literals, revision points and revision operators, limited mainly by the use of the Bottom Clause and of stochastic components. The revision process is also made more powerful by the integration of abduction and negation as failure.
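The kind of randomization injected into these searches can be illustrated with a generic stochastic hill climber. All names here are illustrative, not YAVFORTE's API; the random-walk probability `p_walk` plays the role of the noise parameter common to stochastic local search methods (Hoos and Stützle, 2005).

```python
import random

# Illustrative stochastic hill climbing with random walk: with probability
# `p_walk` pick a random neighbor (escaping local optima); otherwise pick the
# best-scoring neighbor, as in pure hill climbing. The best state seen is kept.

def stochastic_search(start, neighbors, score, p_walk=0.2, steps=100, rng=random):
    current, best = start, start
    for _ in range(steps):
        cand = neighbors(current)
        if not cand:
            break
        if rng.random() < p_walk:
            current = rng.choice(cand)          # random-walk move
        else:
            current = max(cand, key=score)      # greedy move
        if score(current) > score(best):
            best = current                      # remember the best state so far
    return best

# Toy usage: maximize -(x - 7)^2 over the integers; neighbors are x - 1 and x + 1.
best = stochastic_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2,
                         p_walk=0.1, steps=200, rng=random.Random(0))
```

In the revision setting, the states would be theories (or revision points, or literal sets), the neighbors would be the revisions proposed by the operators, and the score would be the evaluation function over the examples.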


1.2 Probabilistic Logic Learning

The successful application of classical ILP to relational domains is often limited by the need to represent uncertainty and partially observed information. Although traditional statistical machine learning techniques can handle uncertainty, they cannot represent relational domains, as they are essentially propositional. Thus, recently there has been great interest in the integration of logical and probabilistic reasoning mechanisms. Building such languages, and algorithms for learning programs in these languages, is the main subject of a new area of Artificial Intelligence named Probabilistic Logic Learning (PLL), also known as Probabilistic Inductive Logic Programming (PILP) and closely related to Statistical Relational Learning (SRL). PLL deals with machine learning in relational domains where information may be missing, only partially observed, noisy or uncertain. Several formalisms have been developed in this area in the last decades, including SLP (Stochastic Logic Programs) (Muggleton, 1996), PRM (Probabilistic Relational Models) (Koller and Pfeffer, 1998), BLP (Bayesian Logic Programs) (Kersting and De Raedt, 2001b), CLP(BN) (Constraint Logic Programming with Bayes nets) (Santos Costa et al., 2003a), MLN (Markov Logic Networks) (Richardson and Domingos, 2006), ProbLog (De Raedt et al., 2007), among many others. In most such systems, knowledge is represented by definite clauses annotated with probability distributions. Inference may be performed either using a logical mechanism that takes the probabilities into account, or through a probabilistic graphical model built from the examples and the current model. This is, for example, the case of BLPs, which generalize both logic programs and Bayesian networks. Inference is performed over Bayesian networks, built from each example and the clauses in the BLP through a Knowledge Base Model Construction (KBMC) approach (Ngo and Haddawy, 1997).
Most algorithms developed in PLL assume either that the model must be learned from scratch, from background knowledge and a dataset, or that the rules must be modified as a whole, considering them all at the same level of correctness. The latter is the approach followed by the BLP learning algorithm. There is less work on the task of repairing or improving an initial probabilistic logic model. Examples of models that could be fixed or improved include those elicited by a domain expert,

containing useful information but not guaranteed to reflect the set of examples. Also, it may be the case that an initial first-order theory was learned by an ILP system, and therefore noise and uncertainty were not taken into account during the learning process.

1.2.1 BFORTE: Towards a Feasible Probabilistic Revision System

As the learning task is time consuming and the initial model may contain valuable information, one would like to use it as a starting point and refine it. Ideally, this should result in faster learning and more accurate models. Motivated by the benefits brought by first-order theory revision to the learning task, we proposed to automatically revise an initial Bayesian Logic Program in (Revoredo and Zaverucha, 2002; Paes et al., 2005b; Paes et al., 2005a; Paes et al., 2006a), resulting in a system we called PFORTE, since it is based on FORTE. Learning or revising probabilistic first-order theories does introduce interesting novel issues. In PLL it is convenient to see examples as evidence for random variables. The probability distribution then gives an expectation of whether an example takes a specific value, say the true value or the false value. Following a discriminative approach, a theory should be revised if it fails to generate the appropriate probability distribution for an example. In this case, we say the example is misclassified. PFORTE's strategy is as follows. First, it addresses theory incompleteness, by finding the points that failed when proving examples. PFORTE uses generalization operators to propose modifications to those points, choosing the best one through a scoring function. This phase uses hill-climbing search and stops when example coverage can no longer be improved. Second, PFORTE uses the generalized theory as the starting point for a search in the space composed of generalizations and specializations at the points of the theory that failed to produce a proper probability distribution. In this case, clauses taking part in the Bayesian network built from a misclassified example are considered revision points. After this step we expect a theory that is more accurate in classification. Although with PFORTE we experimentally demonstrated that it is possible to

obtain more accurate models compared to learning from scratch, the time it spends to achieve that is prohibitive. We have identified at least three bottlenecks in the revision process:

• Choice of the revision points: As stated before, the BLP learning algorithm starts from an initial most general theory and proposes modifications to each clause, in an attempt to represent the uncertainty in the model. PFORTE, on the other hand, proposes modifications to every clause used to construct the Bayesian network in which an example is misclassified. Albeit PFORTE reduces the number of clauses subject to modification compared to BLP learning, it may still select several clauses to be revised. The problem is that, as the Bayesian network is built from a relational example composed of several ground facts, it usually contains several clauses from the BLP, many of them not even relevant to computing the probability distribution of an instance.

• Inference time: Bayesian networks built from relational examples must capture the relationships between different objects of the domain. This fact, added to the inherent complexity of inference in Bayesian networks, makes inference very expensive. Additionally, as we assume a discriminative approach, every possible modification is scored through an evaluation function that must infer the probability distribution of each query variable. Because of those issues, inference time dominates the cost of the revision process.

• Search for literals to be added to a clause: Similarly to FORTE, PFORTE also considers the addition of antecedents to clauses in two revision operators, namely add antecedents and add rules. PFORTE uses FORTE's top-down FOIL approach. As in first-order logic theory revision, this approach yields a large number of literals to be added to clauses during the revision of BLPs.

We propose several contributions in order to make the revision of BLPs feasible. First, we noticed that the number of revision points can be reduced. We observe that not all random variables in the Bayesian network of a misclassified example are in fact influencing its probability. We would like to identify the variables relevant to the misclassification, so that only the clauses related to them are candidates to be

modified. To achieve this goal, we employ the d-separation concept (Geiger et al., 1990) through the use of the Bayes Ball algorithm (Shachter, 1998). Bayes Ball is a linear-time algorithm that identifies the set of nodes relevant to a query variable, given evidence on some of the other variables. We collect the relevant random variables and mark the clauses producing them as revision points. In this way, only clauses contributing to the misclassification are liable to be modified during the revision process, greatly reducing the runtime. Second, the inference runtime, and consequently the runtime of the whole revision process, may also be improved if only the nodes requisite to computing the probability distribution of a variable are taken into account, instead of performing inference in the complete network. Moreover, BLP collects all proofs of each example to compose the Bayesian network. Several ground clauses in the set of proofs may share the same features: besides containing the same predicates, the random variables originating from them may also have the same evidence value, or the same lack of evidence. Thus, we propose to split the original large network into small networks, each containing only the requisite nodes for computing one probability distribution. The requisite nodes are collected through the Bayes Ball algorithm. Ground clauses with the same features are overlapped so that the number of nodes in the network decreases. Finally, those smaller networks may also contain the same information for more than one query variable. Then, we also overlap networks with exactly the same features (same graph and same evidence). Third, motivated by the great reduction in runtime resulting from the use of the Bottom Clause in first-order theory revision, we also compose the search space of possible literals to be added to clauses from the literals of the Bottom Clause.
Since BLPs do not distinguish logically between positive and negative examples (the class of an example is the value of its random variable), a Bottom Clause may be built from examples of any class. We name the revision system built upon those modifications BFORTE, where B stands for BLP, Bayes Ball and Bottom Clause.
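The relevance computation can be illustrated with a simplified Bayes-Ball-style reachability procedure (after Shachter, 1998). This sketch returns the unobserved nodes connected to the query by an active path; the full algorithm additionally distinguishes requisite probability and observation nodes. The graph encoding and function name are ours.

```python
from collections import deque

# Simplified Bayes-Ball reachability: a "ball" bounces through the network
# according to d-separation rules. `parents` maps node -> list of parents;
# children are derived. Returns the unobserved nodes d-connected to `query`.

def relevant_nodes(parents, query, evidence):
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    visited, reached = set(), set()
    # Start as if the ball arrived at the query from a child.
    frontier = deque([(query, "child")])
    while frontier:
        node, origin = frontier.popleft()
        if (node, origin) in visited:
            continue
        visited.add((node, origin))
        observed = node in evidence
        if not observed:
            reached.add(node)
        if origin == "child":
            if not observed:                     # ball passes up and down
                frontier.extend((p, "child") for p in parents[node])
                frontier.extend((c, "parent") for c in children[node])
            # an observed node blocks a ball arriving from a child
        else:  # origin == "parent"
            if observed:                         # v-structure: bounce to parents
                frontier.extend((p, "child") for p in parents[node])
            else:                                # ball passes through to children
                frontier.extend((c, "parent") for c in children[node])
    return reached

# Toy network: burglary -> alarm <- earthquake, alarm -> call.
net = {"burglary": [], "earthquake": [], "alarm": ["burglary", "earthquake"],
       "call": ["alarm"]}
```

For example, observing `alarm` makes `earthquake` relevant to a query on `burglary` (the v-structure becomes active) while `call` becomes irrelevant; with no evidence the opposite holds.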


1.3 Publications

The following publications arose from work conducted during the course of this thesis research:

• “Using the Bottom Clause and Mode Declarations in FOL Theory Revision from Examples”, published in the Machine Learning Journal (2009) (Duboc et al., 2009) and presented at the 18th International Conference on Inductive Logic Programming (ILP-2008) (Duboc et al., 2008). The main ideas and algorithms of this paper are presented in Chapter 3.

• “Chess Revision: Acquiring the Rules of Chess through Theory Revision from Examples”, which I presented at the 19th International Conference on Inductive Logic Programming (ILP-2009) (Muggleton et al., 2009b). A preliminary version of this paper was also presented at the Workshop on General Game Playing / 21st International Joint Conference on Artificial Intelligence (IJCAI-09) (Muggleton et al., 2009a). A more detailed version of these papers is presented in Chapter 4.

• “ILP through Propositionalization and Stochastic k-term DNF Learning” (Paes et al., 2006b), which I presented at the 16th International Conference on Inductive Logic Programming (ILP-2006) and which motivated me to start studying Stochastic Local Search algorithms. This paper employs stochastic local search over propositionalized versions of the set of examples. The main ideas of this paper are reviewed in the last section of Chapter 5.

• “Revising First-order Logic Theories from Examples through Stochastic Local Search” (Paes et al., 2007b), which I presented at the 17th Annual International Conference on Inductive Logic Programming (ILP-2007). A preliminary Portuguese version of this paper won the best paper prize of the VI Encontro Nacional de Inteligência Artificial (Paes et al., 2007a). Chapter 6 is a significant extension of these papers, using the Bottom Clause to bound the search space and including three additional stochastic components in different steps of the revision process.


• “PFORTE: Revising Probabilistic FOL Theories” (Paes et al., 2006a), which I presented at the 2nd International Joint Conference (10th Ibero-American Conference on AI, 18th Brazilian AI Symposium), 2006. This paper extends the PFORTE system introduced in (Paes et al., 2005b) by including generalization operators to solve the misclassification problem. The algorithm and main ideas discussed there are reviewed in the last section of Chapter 7.

In work carried out during the thesis, I also co-authored the following papers:

• “Combining Predicate Invention and Revision of Probabilistic FOL theories” (Revoredo et al., 2006), which I presented at the 16th International Conference on Inductive Logic Programming (ILP-2006) and which was published in the short paper proceedings of the conference. This paper introduces two predicate-invention-based operators into the revision process. Due to the high cost incurred by these operators, we do not consider them when experimenting with the revision system designed in this thesis.

• “On the Relationship between PRISM and CLP(BN)” (Santos Costa and Paes, 2009), presented at the International Workshop on Statistical Relational Learning (SIM-2009).

• “Revisando Redes Bayesianas através da Introdução de Variáveis Não-observadas” (in English, “Revising Bayesian Networks through the Introduction of Unobserved Variables”) (Revoredo et al., 2009), presented at the VI Encontro Nacional de Inteligência Artificial (ENIA-2009).

1.4 Thesis outline

The thesis is organized as follows. Chapter 2 reviews Inductive Logic Programming and theory revision. Basic concepts of ILP and the ideas behind Bottom Clause construction are presented in this chapter. We also review key concepts of theory revision from examples and extensively discuss the FORTE system, which is the base system employed in this thesis. We present the top-level algorithm and the algorithms for searching for revision points and for proposing modifications to the revision points through revision operators.


Next, in Chapter 3 we present the first contribution of this thesis: the introduction of the Bottom Clause and mode declarations into FORTE, and user-controlled revision. Part of this chapter was published in (Duboc et al., 2008; Duboc et al., 2009). Chapter 4 presents the framework we have developed to acquire the rules of variants of chess from the rules of traditional chess, with this problem tackled from the theory revision point of view. We also present additional improvements to the revision system, so that it includes abduction to modify theories and negation in the clauses. This work is published in (Muggleton et al., 2009b; Muggleton et al., 2009a; Muggleton et al., 2009c). Chapter 5 reviews the main concepts and algorithms of Stochastic Local Search. We also present a contribution combining propositionalization and stochastic local search to induce definite logic programs, published in (Paes et al., 2006b). In Chapter 6 we present another contribution to first-order theory revision, namely the introduction of stochastic local search into the key searches of the revision process: the search for revision points, for the literals to be added to or removed from a clause, and for the best revision. Part of this work is published in (Paes et al., 2007a; Paes et al., 2007b). Chapter 7 reviews the main concepts of Bayesian networks, Bayesian Logic Programs and our revision system PFORTE, as published in (Paes et al., 2005b; Paes et al., 2006a). Chapter 8 presents the BFORTE revision system, built upon PFORTE with significant improvements, namely the use of the Bottom Clause to bound the search space of new literals, the techniques developed to reduce the inference space, and the reduction of the space of clauses to be modified, compared to PFORTE and the BLP learning algorithm. Finally, we conclude the thesis and discuss future work in Chapter 9.

The digital version of this thesis can be found at www.cos.ufrj.br/~ampaes/thesis_ampc.pdf. The systems developed in this work and the datasets used in the experiments will be made available at www.cos.ufrj.br/~ampaes/theory_revision.

Chapter 2 Inductive Logic Programming and Theory Revision

Machine learning is a subarea of artificial intelligence which studies systems that improve their behavior over time with experience (Mitchell, 1997). Standard machine learning algorithms are propositional, searching for patterns in data of fixed size represented as attribute-value pairs. In this way, they assume the objects of the domain are homogeneous and sampled from a simple relation. However, real data usually contain many different types of objects and multiple entities, typically spread across several related tables. Inductive Logic Programming (ILP) (Muggleton, 1992; Lavrac and Dzeroski, 1994; Muggleton and De Raedt, 1994; Nienhuys-Cheng and De Wolf, 1997; De Raedt, 2008), also known as Multi-Relational Data Mining (Dzeroski and Lavrac, 2001), combines machine learning and logic programming to automatically induce sets of first-order clauses (theories) from multi-relational data. ILP systems have been experimentally tested on a number of important applications (King et al., 1995b; Srinivasan et al., 1997; Muggleton, 1999; Fang et al., 2001; King et al., 2004). Theories learned by ILP elegantly and expressively represent complex situations, even those involving a variable number of entities and relationships among them. Such theories are formed with the goal of discriminating between positive and negative examples, given background knowledge. The background knowledge consists of a list of logical facts and/or a set of inference rules. ILP algorithms assume the background knowledge is fixed and correct.

However, it may be the case that the background knowledge also contains incorrect rules, and therefore it would be necessary to modify them so that they become correct. This is the task of theory revision, whose goal is to fix the rules responsible for the misclassification of some example. In this chapter, we start by reviewing the main techniques used in the ILP community in Section 2.1. Next, we discuss theory revision in Section 2.2, with special attention to the system used in this thesis, the FORTE system (Richards and Mooney, 1995). The reader familiar with ILP and theory revision may skip this chapter.

2.1 Inductive Logic Programming

Inductive Logic Programming algorithms aim to induce, from a set of training examples, a logic program describing a relational domain, using concepts defined in the background knowledge. The returned logic program is used to classify new examples as positive or negative, using techniques such as resolution. A brief description of some logic programming terms can be found in Table 2.1, and more details can be explored in (Sterling and Shapiro, 1986; Lloyd, 1987; Flach, 1994; Nilsson and Maluszynski, 2000). The learning problem in ILP is usually defined as follows.

Table 2.1: Some standard Logic Programming terms and their definitions

constants: Symbols for denoting individuals. Following Prolog convention, we represent constants in lower case.
variables: Symbols referring to an unspecified individual. Following Prolog convention, we represent variables in upper case.
predicate: Symbols for denoting relations, such as mother, loves, etc.
term: A constant, a variable, or a function applied to terms.
atom: A formula of the form p(t1, ..., tn), where p is a predicate and t1, ..., tn are terms.
ground atom: An atom which contains no variables.
clause: A formula ∀(L1 ∨ ... ∨ Ln), where each Li is an atom (positive literal) or the negation of an atom (negative literal).
definite clause: A clause with exactly one positive literal, of the form A0 ← A1, ..., An, or equivalently A0 ∨ ¬A1 ∨ ... ∨ ¬An, where n ≥ 0. A0 is the head of the clause, whereas A1, ..., An is the body of the clause.
fact: A definite clause where n = 0.
Horn clause: A clause with at most one positive literal.
Herbrand universe: The set of all ground terms constructed from the functors and constants in a domain.
Herbrand base: The set of all ground atoms over a domain.


Figure 2.1: Schema of the learning task in ILP, where BK is background knowledge, E is the set of positive (E+) and negative (E−) examples and H0 is the theory learned by the ILP system.

Definition 2.1 Given:

• A set E of examples, divided into positive (E+) and negative (E−) examples, and

• Background knowledge BK, both expressed as first-order Horn clauses.

Learn:

• A hypothesis H composed of first-order definite clauses such that BK ∧ H ⊨ E+ (H is complete) and BK ∧ H ⊭ E− (H is consistent), i.e., H is correct.

Figure 2.1 shows a schema of the learning task in ILP. Often it is not possible to find a correct hypothesis, and then the criteria BK ∧ H ⊨ E+ and BK ∧ H ⊭ E− are relaxed.
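The completeness and consistency criteria of Definition 2.1 can be illustrated with a naive, purely didactic coverage test; no real ILP system enumerates substitutions this way, and the literal encoding and function names below are ours.

```python
from itertools import product

# Didactic sketch: a hypothesis clause covers an example if some substitution of
# its variables makes every body literal a known background fact and the head
# equal to the example. Variables are strings starting with an upper-case letter.

def is_var(t):
    return t[0].isupper()

def covers(clause, example, bk):
    head, body = clause
    constants = {a for _, args in bk for a in args} | set(example[1])
    variables = sorted({t for _, args in [head] + body for t in args if is_var(t)})
    for binding in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, binding))
        ground = lambda lit: (lit[0], tuple(sub.get(t, t) for t in lit[1]))
        if ground(head) == example and all(g in bk for g in map(ground, body)):
            return True
    return False

# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
clause = (("grandparent", ("X", "Z")),
          [("parent", ("X", "Y")), ("parent", ("Y", "Z"))])
bk = {("parent", ("ann", "bob")), ("parent", ("bob", "carl"))}
pos_covered = covers(clause, ("grandparent", ("ann", "carl")), bk)  # completeness
neg_covered = covers(clause, ("grandparent", ("carl", "ann")), bk)  # consistency
```

Under the relaxed criteria, an ILP system would instead count how many positive and negative examples are covered and trade the two off in its scoring function.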


2.1.1 Learning Settings in ILP

The standard learning setting previously defined is known as learning from entailment (Frazier and Pitt, 1993). ILP algorithms may also follow the learning from interpretations approach (Valiant, 1984; Angluin et al., 1992; De Raedt and Džeroski, 1994), where the examples are Herbrand interpretations and the goal is to induce a hypothesis H that is true in the minimal Herbrand model of BK ∧ E. In this setting, it is assumed that the examples are completely specified. Otherwise, the learning is done from partial interpretations (Fensel et al., 1995) and H must be true in the Herbrand model created by extending BK ∧ E. The generalization of learning from partial interpretations is called learning from satisfiability (De Raedt, 1997). In this last case, it is only required that H ∧ BK ∧ E be satisfiable, i.e., H ∧ BK ∧ E ⊭ □. From the point of view of example representation, there are two learning settings (De Raedt, 1997):

1. Extensional ILP. An example e ∈ E is a ground atom. In this case, the whole BK is shared by all the examples, i.e., the conditions are the same for each example. This setting is reduced to learning from entailment.

2. Intensional ILP. An example e ∈ E is a definite ground clause. Each example may have different information associated with it, i.e., the BK may be divided into a set of definite clauses shared by all examples and sets of definite clauses restricted to each example. This setting is reduced to learning from satisfiability.

Usually, ILP systems such as Progol (Muggleton, 1995), Aleph (Srinivasan, 2001b), FOIL (Quinlan, 1990) and TopLog (Muggleton et al., 2008) learn a single target predicate, are extensional and learn from entailment. ILP algorithms may also be designed to learn a set of target concepts, possibly related to each other. CLAUDIEN (De Raedt, 1997) learns multiple predicates from interpretations. TILDE (Blockeel and De Raedt, 1998) learns relational decision trees from complete interpretations. Hyper (Bratko, 1999) learns multiple related predicates from entailment. We refer the reader to (Lavrac and Dzeroski, 1994; Dzeroski and Lavrac, 2001; De Raedt, 2008) for more details on ILP systems.


2.1.2 Ordering the Hypothesis Search Space

In order to generalize and/or specialize hypotheses, ILP systems rely on the framework of θ-subsumption. A clause c1 θ-subsumes a clause c2 if and only if there exists a variable substitution θ such that c1θ ⊆ c2. Due to θ-subsumption, ILP algorithms may traverse the search space bottom-up (generalizing the hypothesis), top-down (specializing the hypothesis) or even combining both specialization and generalization strategies. Top-down ILP systems such as (Shapiro, 1983), (Quinlan, 1990) and (De Raedt and Dehaspe, 1997) search the hypothesis space by considering, at each iteration, all valid refinements of a current set of candidate hypotheses. The hypotheses are then evaluated considering their coverage of positive and negative examples, and also measures such as information gain or the length of the clauses. This strategy uses the set of examples only to evaluate the candidate hypotheses, not to create new hypotheses. On the other hand, algorithms following a bottom-up strategy use the examples to propose hypotheses. They start with the most specific hypothesis and proceed to generalize it until no further generalization is possible without covering negative examples. Usually, they are based on (1) the least general generalization relative to the background knowledge (RLGG) (Plotkin, 1971), as done in the Golem system (Muggleton and Feng, 1990) and its descendant ProGolem (Muggleton et al., 2010), or (2) inverse resolution (Muggleton, 1987; Muggleton and Buntine, 1988; Muggleton and De Raedt, 1994). Nowadays, it is common to somehow combine the bottom-up and top-down strategies, in order to exploit the strengths of both techniques while avoiding their weaknesses. This is the case of systems such as Progol (Muggleton, 1995), Aleph (Srinivasan, 2001b), CHILLIN (Zelle et al., 1994), BETH (Tang et al., 2003) and TopLog (Muggleton et al., 2008), among others.
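The θ-subsumption test can be sketched by brute force as follows, assuming clauses are encoded as sets of literal tuples and upper-case strings denote variables. This encoding is an illustrative assumption, not the thesis' notation.

```python
# Brute-force θ-subsumption: try every mapping of c1's variables to c2's terms.
from itertools import product

def is_var(t):
    return t[0].isupper()

def apply_sub(clause, theta):
    return {(lit[0],) + tuple(theta.get(a, a) for a in lit[1:]) for lit in clause}

def theta_subsumes(c1, c2):
    """True iff some substitution θ over c1's variables makes c1θ ⊆ c2."""
    variables = sorted({a for lit in c1 for a in lit[1:] if is_var(a)})
    terms = sorted({a for lit in c2 for a in lit[1:]})
    for values in product(terms, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if apply_sub(c1, theta) <= c2:
            return True
    return False

c1 = {("parent", "X", "Y")}
c2 = {("parent", "jack", "anne"), ("female", "anne")}
print(theta_subsumes(c1, c2))  # True: θ = {X/jack, Y/anne}
```

Note this enumeration is exponential in the number of variables; real ILP systems use far more efficient (though still worst-case intractable) subsumption tests.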

2.1.3 Mode Directed Inverse Entailment and the Bottom Clause

The bottom clause (Muggleton, 1995) ⊥(e) with regard to a clause e and background theory BK is the most specific clause within the hypothesis space that covers the example e, i.e.,

BK ∪ ⊥(e) ⊨ e    (2.1)

Any single clause hypothesis covering the example e with regard to BK must be more general than ⊥ (e). Any clause that is not more general than ⊥ (e) cannot cover e and can be safely disregarded. Thus, the bottom clause bounds the search for a clause covering the example e, as it captures all relevant information to e and BK. A top-level algorithm of the bottom clause construction process defined in (De Raedt, 2008) is reproduced here as Algorithm 2.1.

Algorithm 2.1 Top-level Algorithm of the Bottom Clause Construction Process (De Raedt, 2008)

1: Find a skolemization substitution θ for e with regard to BK
2: Compute the least Herbrand model M of BK ∪ ¬body(e)θ
3: Deskolemize the clause head(eθ) ← M
4: return the result of step 3

In order to understand how the bottom clause is constructed, consider the example e as

nice(X) ← dog(X).

and the BK as

animal(X) ← pet(X).
pet(X) ← dog(X).

First of all, notice that BK ∪ ⊥(e) ⊨ e is equivalent to BK ∪ ¬e ⊨ ¬⊥(e). The first step is to replace all variables in ¬e by distinct constants not appearing in the clause (skolemization). The result is one false ground fact, coming from the head of ¬eθ (since e is a definite clause), and a set of positive ground facts, coming from the body of ¬eθ. In the example above, considering the skolemization substitution θ = {X ← skol}, we have

¬eθ = {¬nice(skol), dog(skol)}

The next step is to find the set of all ground facts entailed by BK ∪ ¬e, i.e., the ground literals which are true in all models of BK ∪ ¬e. This is achieved by computing the least Herbrand model of BK ∪ ¬body(eθ). In the example, we have

¬⊥(e)θ = ¬eθ ∪ {pet(skol), animal(skol)}

Finally, each skolemization constant is replaced by a different variable in ¬⊥(e)θ and the result is negated to obtain ⊥(e). In the example,

⊥(e) = nice(X) ← dog(X), pet(X), animal(X)
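The core of Algorithm 2.1, computing the least Herbrand model of BK ∪ ¬body(e)θ, can be sketched by naive forward chaining for the ground, single-variable, unary case of this example. The tuple encodings below are illustrative assumptions.

```python
# Naive forward chaining for the least Herbrand model, restricted to unary
# predicates and rules with a single variable X (enough for the example).

def least_herbrand_model(rules, facts):
    """rules: list of (head, [body atoms]); facts: set of ground atoms."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            constants = {a[1] for a in model}
            for c in constants:
                ground_body = [(p, c) for p, _ in body]
                ground_head = (head[0], c)
                if all(b in model for b in ground_body) and ground_head not in model:
                    model.add(ground_head)
                    changed = True
    return model

bk = [(("animal", "X"), [("pet", "X")]),
      (("pet", "X"), [("dog", "X")])]
body_of_e = {("dog", "skol")}        # ¬body(e)θ with θ = {X ← skol}
m = least_herbrand_model(bk, body_of_e)
print(sorted(m))
# [('animal', 'skol'), ('dog', 'skol'), ('pet', 'skol')]
```

Negating and variabilizing this model yields exactly ⊥(e) = nice(X) ← dog(X), pet(X), animal(X).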

In general, ⊥ could have infinite cardinality. Thus, Mode Directed Inverse Entailment (Muggleton, 1995) systems such as Aleph and Progol consider a set of user-defined mode declarations, together with other settings, to constrain the search for a good hypothesis. A mode declaration (Muggleton, 1995) has either the form modeh(recall, atom) or modeb(recall, atom), where recall is either a positive integer or '*', and atom is a ground atom. Modeh declarations indicate predicates appearing in the head of clauses; modeb declarations, predicates in the body of clauses. Recall is the maximum number of different instantiations of atom allowed to appear in a clause (where '*' means an indefinite number of times). Terms in the atom are either normal or place-markers. A normal term is either a constant or a function symbol followed by a bracketed tuple of terms. A place-marker is either +type, -type or #type, where type is a constant defining the type of the term. The meaning of +, - and # is as follows.

• Input (+) - an input variable of type T in a body literal Bi appears as an output variable of type T in a body literal that appears before Bi, or appears as an input variable of type T in the head of the clause.

• Output (−) - an output variable of type T in the head of the clause must appear as an output variable of type T in any literal of the body of the clause.

• Constant (#) - an argument denoted by #T must be ground with terms of type T.
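A possible encoding of mode declarations and their place-markers is sketched below; the dataclass fields and the tiny marker parser are assumptions for illustration, not Progol's internal representation.

```python
# An illustrative encoding of Progol-style mode declarations.
from dataclasses import dataclass

@dataclass
class Mode:
    kind: str        # "modeh" or "modeb"
    recall: object   # positive int or "*"
    pred: str
    args: tuple      # place-markers like "+person", "-person", "#person"

def marker(arg):
    """Split a place-marker into (io, type): '+person' -> ('+', 'person')."""
    return (arg[0], arg[1:]) if arg[0] in "+-#" else ("const", arg)

modes = [Mode("modeh", 1, "father", ("+person", "+person")),
         Mode("modeb", 10, "parent_of", ("+person", "+person")),
         Mode("modeb", 10, "parent_of", ("-person", "+person"))]
print(marker("+person"))  # ('+', 'person')
```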

Algorithm 2.2 shows in more detail the construction of the bottom clause in the Progol system (Muggleton, 1995), taking mode declarations into account. The list


InTerms keeps the terms responsible for instantiating the input terms in the head of the clause and the terms instantiating output terms in the body of the clause. The function hash associates a different variable to each term. The depth of a variable v in a definite clause C is 0 if v is in the head of C and (max_{u∈Uv} d(u)) + 1 otherwise, where Uv are the variables of the body literals of C containing v.

Algorithm 2.2 Bottom clause construction Algorithm (Muggleton, 1995)

Input: Background knowledge BK; a positive example e, where ¬e is a clausal normal form logic program ¬a, b1, ..., bn
Output: The bottom clause ⊥

1: InTerms ← ∅; ⊥ ← ∅
2: i ← 0, corresponding to the variable depth
3: BK ← BK ∪ e
4: find the first modeh declaration h such that h subsumes e with substitution θ
5: for each v/t in θ do
6:   if v corresponds to a # type then
7:     replace v in h by t
8:   if v corresponds to a + or − type then
9:     replace v in h by vk, where vk is a variable such that k = hash(t)
10:  if v corresponds to a + type then
11:    InTerms ← InTerms ∪ {t}
12: ⊥ ← ⊥ ∪ {h}
13: for each modeb declaration b do
14:   for all possible substitutions θ of arguments corresponding to + type by terms in the set InTerms do
15:     repeat
16:       if b succeeds with substitution θ′ then
17:         for each v/t in θ and θ′ do
18:           if v corresponds to a # type then
19:             replace v in b by t
20:           else
21:             replace v in b by vk, where k = hash(t)
22:           if v corresponds to a − type then
23:             InTerms ← InTerms ∪ {t}
24:         ⊥ ← ⊥ ∪ {b}
25:     until recall times are reached
26: i ← i + 1
27: Go to line 13 if the maximum depth of variables has not been reached
28: return ⊥

Suppose, for example, the mode declarations below, expressing a fatherhood relationship, where the first argument is the recall number

modeh(1, father(+person, +person))
modeb(10, parent_of(+person, +person))
modeb(10, parent_of(−person, +person))

and the background knowledge

parent_of(jack, anne)
parent_of(juliet, anne)
parent_of(jack, james)
parent_of(juliet, james)

and the example father(jack, anne). The most specific clause is

⊥ = father(jack, anne) ← parent_of(jack, anne), parent_of(juliet, anne)

where the first body literal was obtained from modeb(10, parent_of(+person, +person)) and the second from modeb(10, parent_of(−person, +person)). The bottom clause with the constants replaced by variables is

⊥ = father(A, B) ← parent_of(A, B), parent_of(C, B)
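The construction of this bottom clause body can be sketched as a toy version of Algorithm 2.2's main loop, restricted to depth one and to the set encodings below; everything here is an assumption of the sketch, not Progol itself.

```python
# Toy mode-directed bottom-clause construction for the father example:
# a modeb literal is added when its + arguments are instantiated by known
# input terms; its - arguments then contribute new input terms.

bk = {("parent_of", "jack", "anne"), ("parent_of", "juliet", "anne"),
      ("parent_of", "jack", "james"), ("parent_of", "juliet", "james")}
example = ("father", "jack", "anne")
modeb = [("parent_of", ("+", "+")), ("parent_of", ("-", "+"))]

in_terms = {"jack", "anne"}            # + terms from the modeh head
bottom_body = []
for pred, markers in modeb:
    for fact in sorted(bk):
        if fact[0] != pred:
            continue
        ok = all(fact[i + 1] in in_terms
                 for i, m in enumerate(markers) if m == "+")
        if ok and fact not in bottom_body:
            bottom_body.append(fact)
            for i, m in enumerate(markers):
                if m == "-":
                    in_terms.add(fact[i + 1])   # outputs become new inputs

print(bottom_body)
# [('parent_of', 'jack', 'anne'), ('parent_of', 'juliet', 'anne')]
```

Replacing the constants by variables via a hash of each term then yields father(A, B) ← parent_of(A, B), parent_of(C, B).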

The space complexity of the bottom clause is its cardinality, which is bounded by r(|M| j+ j−)^(i j+), where |M| is the cardinality of M (the set of mode declarations), j+ is the number of + type occurrences in each modeb in M plus the number of − type occurrences in each modeh, j− is the number of − type occurrences in each modeb in M plus the number of + type occurrences in each modeh, r is the recall of each mode m ∈ M, and i is the maximum variable depth (Muggleton, 1995). For more details on formal definitions of the bottom clause and inverse entailment we refer the reader to (Muggleton, 1995).


2.2 First-order Logic Theory Revision from Examples

ILP algorithms learn first-order clauses given a set of examples and a static background knowledge assumed to be correct. Theory revision from examples (Wrobel, 1996), on the other hand, aims to improve previously acquired knowledge. To do so, theory revision assumes the provided BK may also contain incorrect rules, which should be modified to better reflect the set of examples. Revision of certain clauses of the BK can be avoided by defining part of the preliminary knowledge as correct and invariant. Thus, in theory revision the BK is divided into two parts: a set of rules assumed correct and therefore not modifiable, called here the Fundamental Domain Theory (FDT) (Richards and Mooney, 1995), and the remaining rules, which may be incorrect and are subject to modifications, called the Initial Theory. The goal of a theory revision process is to identify points in the initial theory which prevent it from correctly classifying positive or negative examples, and to propose modifications to such points, so that the revised theory together with the FDT is as close to a correct theory as possible. The task of theory revision from examples is defined as follows (Wrobel, 1996).

Definition 2.2 Given:

• Background knowledge BK, divided into

– A modifiable set of clauses which might be incorrect (H0) and

– An invariant set of clauses assumed to be correct (FDT), and

• A set of positive (E+) and negative (E−) examples composing the set of examples E,

both written as logic programs.

Find:

• A revised theory H consisting of definite first-order clauses such that FDT ∧ H ⊨ E+ (H is complete) and FDT ∧ H ⊭ E− (H is consistent), i.e., H is correct, and which obeys a minimality criterion.


The minimality criterion may be seen as the attempt to obtain a revised theory as syntactically and semantically close as possible to the original BK. Usually it is not possible to find a correct theory, and such a criterion is relaxed to find a theory as close to correct as possible. The schema of theory revision is shown in Figure 2.2. Note that ILP could be seen as a special case of theory revision, where H0 is empty and therefore BK = FDT. Theory revision is particularly powerful and challenging because it must deal with the issues arising from revising multiple clauses (a theory) and even multiple predicates (multiple target concepts). Additionally, as the initial theory is a good starting point and the revision process takes advantage of it, the theories returned by revision systems are usually more accurate than theories learned by standard ILP systems from the same dataset. Several papers, such as (Shapiro, 1981), (Wogulis and Pazzani, 1993), (Richards and Mooney, 1995), (Buntine, 1991), (Towell and Shavlik, 1994), (Adé et al., 1994), (Wrobel, 1996), (Ramachandran and Mooney, 1998), (Garcez and Zaverucha, 1999) and (Esposito et al., 2000), show that propositional and first-order theory revision systems are capable of learning more accurate theories than purely inductive systems, using fewer examples.

Figure 2.2: Schema of Theory Revision from Examples, where FDT is the fixed preliminary knowledge, H0 is the modifiable preliminary knowledge and E is the set of positive (E+) and negative (E−) examples. H is the theory returned by the revision system.


2.2.1 Revision Points

Usually, the first step of a revision system is to identify the misclassified examples. A positive example not covered by the theory (a false negative) indicates that the theory is too specific and needs to be generalized. Conversely, a negative example covered by the theory (a false positive) indicates that it is too general and needs to be specialized. As the theory may be composed of several target concepts, described by several clauses, it is necessary to find out which clauses and/or literals are responsible for the misclassification of the examples. Also, many clauses can be responsible for proving negative examples, just as many clauses could be generalized so that the misclassified positive examples become covered. In theory revision, the clauses and literals considered responsible for misclassifying examples are called revision points. It is expected that modifications performed on such points improve the quality of the theory. Depending on the type of the misclassified example being considered, we can define two types of revision points:

• Generalization revision points - points in the theory where proofs of positive examples fail.

• Specialization revision points - clauses used in successful proof paths of negative examples.

Revision points are the same as culprit clauses in the MIS system (Shapiro, 1981): clauses covering negative examples, found through the SLD tree, and clauses not covering positive examples. In this last case, these clauses are the ones leading to missing clauses, which can be suggested by posing queries to an oracle and using examples the system already knows (De Raedt, 2008). The specification of the revision point determines the type of revision operator that will be applied to make the theory consistent with the dataset. One may consider two types of operators: generalization operators, applied on generalization revision points, and specialization operators, applied on specialization revision points (Wrobel, 1996).
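Classifying misclassified examples into the two kinds of revision needs can be sketched as follows, assuming a `proves` oracle; all names and the toy theory are illustrative.

```python
# A sketch of splitting misclassified examples into the two revision kinds:
# false negatives call for generalization, false positives for specialization.

def find_revision_needs(theory, pos, neg, proves):
    generalization = [e for e in pos if not proves(theory, e)]  # false negatives
    specialization = [e for e in neg if proves(theory, e)]      # false positives
    return generalization, specialization

theory = {"covered"}                       # toy theory: the atoms it proves
proves = lambda th, e: e in th
gen, spec = find_revision_needs(theory, ["covered", "missed"], ["covered"], proves)
print(gen, spec)  # ['missed'] ['covered']
```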


2.2.2 Revision Operators

Theory revision relies on operators that propose modifications at each revision point, aiming to transform a theory into another one. Any operator used in first-order machine learning can be used in a theory revision system. In this work we use some operators previously defined in (Richards and Mooney, 1995). Below, we briefly describe them; the next section shows in detail how some of these operators work. The operators for specialization are:

• Delete-rule - this commonly used operator removes a clause that was used to prove a negative example.

• Add-antecedent - this operator adds antecedents to an inconsistent clause.

Other approaches exist. Indeed, there are several specialization operators based on the idea of inventing new concepts (predicate invention (Stahl, 1993; Kramer, 1995)) while revising theories (Wrobel, 1994; Bain, 2004). Different from specialization operators, which only modify existing clauses, generalization operators may create entirely new clauses. As the goal is to cover unprovable positive examples, any ILP operator which accepts positive examples as input can be used. The usual generalization operators are:

• Delete-antecedent - this operator removes failed antecedents from clauses that could be used to prove positive examples.

• Add-rule - this operator generates new clauses, either from failed existing clauses (deleting antecedents followed by addition of antecedents) or from scratch (starting only from the generalized head of the example).

Other generalization operators exist. One can use abduction: first, look for a clause that might satisfy the example but has missing premises, and then add the missing premises. For more details on revision operators we refer the reader to (Richards and Mooney, 1995) and (Wrobel, 1996).
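The four basic operators can be illustrated as transformations on a theory represented as a list of (head, body) clauses; this representation is an assumption of the sketch, not FORTE's actual data structure.

```python
# Illustrative implementations of the four basic revision operators.

def delete_rule(theory, clause):
    return [c for c in theory if c != clause]

def add_antecedent(theory, clause, literal):
    head, body = clause
    revised = (head, body + [literal])          # specialize the clause
    return [revised if c == clause else c for c in theory]

def delete_antecedent(theory, clause, literal):
    head, body = clause
    revised = (head, [l for l in body if l != literal])  # generalize it
    return [revised if c == clause else c for c in theory]

def add_rule(theory, new_clause):
    return theory + [new_clause]

t = [("nice(X)", ["dog(X)"])]
t2 = add_antecedent(t, t[0], "pet(X)")
print(t2)  # [('nice(X)', ['dog(X)', 'pet(X)'])]
```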


2.2.3 FORTE

In this work we follow the First Order Revision of Theories from Examples (FORTE) system (Richards and Mooney, 1995), which automatically revises function-free first-order Horn clauses. There are several first-order theory revision systems described in the literature, including MIS (Shapiro, 1981), RUTH (Adé et al., 1994), MOBAL (Wrobel, 1993; Wrobel, 1994) and INTHELEX (Esposito et al., 2000), among others. FORTE performs a hill-climbing search through a space of both specialization and generalization operators, in an attempt to find a minimal revision to a theory that makes it consistent with the set of training examples. In order to find the revision points, FORTE follows a bottom-up strategy, as the training examples are used to find the clauses/literals presenting some problem. The key ideas of the system are:

1. Identify all the revision points in the current theory using the misclassified training examples.

2. Generate a set of proposed modifications for each revision point using the revision operators. It starts from the revision point with the highest potential, defined as the number of misclassified examples that could be turned into correctly classified ones by a revision at that point. FORTE stops proposing revisions when the potential of the next revision point is less than the score of the best revision to date. Conceptually, each operator develops its revision using the entire training set. However, in practice this is usually unnecessary, and thus FORTE considers only the examples whose provability can be affected by the revision.

3. Score each revision by the actual increase in theory accuracy it achieves, calculated as the difference between the number of misclassified examples which become correctly classified and the number of correctly classified examples which become misclassified because of the revision.

4. Retain the revision with the highest score.


5. Implement the revision in case it actually improves the overall score.

The top-level algorithm is exhibited as Algorithm 2.3. The algorithm finishes when it cannot find any revision capable of improving the score.

Algorithm 2.3 FORTE Algorithm (Richards and Mooney, 1995)

1: repeat
2:   generate revision points;
3:   sort revision points by potential (high to low);
4:   for each revision point do
5:     generate revisions;
6:     update best revision found;
7:   until potential of next revision point is less than the score of the best revision to date
8:   if best revision improves the theory then
9:     implement best revision;
10: until no revision improves the theory;
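The hill-climbing loop of Algorithm 2.3 can be condensed into a toy sketch, where the "theory" is just a set of atoms it proves and a proposed revision toggles the provability of one misclassified example; the whole setup is purely illustrative of the search strategy, not FORTE's machinery.

```python
# A toy sketch of FORTE's outer hill-climbing loop (Algorithm 2.3).

def score(theory, pos, neg):
    """Number of correctly classified instances."""
    return sum(e in theory for e in pos) + sum(e not in theory for e in neg)

def forte_loop(theory, pos, neg):
    while True:
        base = score(theory, pos, neg)
        # revision "points": examples the current theory gets wrong
        points = [e for e in pos if e not in theory] + \
                 [e for e in neg if e in theory]
        best, best_score = None, base
        for e in points:
            revised = theory ^ {e}        # propose: toggle provability of e
            s = score(revised, pos, neg)
            if s > best_score:
                best, best_score = revised, s
        if best is None:
            return theory                 # no revision improves the score
        theory = best                     # implement the best revision

t = forte_loop(frozenset({"n1"}), pos=["p1", "p2"], neg=["n1"])
print(sorted(t))  # ['p1', 'p2']
```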

Next we detail the learning setting and representation of the examples in FORTE, followed by the procedures for finding revision points and the algorithms employed by each revision operator.

Examples Representation and Learning Setting

Most ILP algorithms employ the extensional approach to represent examples and background knowledge. In this case, each example is a ground fact and the background knowledge is equally shared by all the examples. FORTE differs from them by representing the examples intensionally: they are written as sets of clauses in the format

Ground Instances of Target Predicates ← Conjunction of facts from the context.

The head of the above clause is a set of positive and/or negative ground facts, with the same predicate as the concepts intended to be learned. The conjunction of facts from the context is a set of definite clauses confined to the example, i.e., the BK restricted only to that example. There is also a set of background clauses FDT common to all the examples. The FDT and the BK restricted to each example compose the background knowledge of the domain. In this way, each example can be considered as a partial interpretation (De Raedt, 1997).
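One possible encoding of such an example as a partial interpretation is sketched below; the field names and the sample atoms are assumptions for illustration.

```python
# An illustrative encoding of a FORTE example: ground instances of the target
# predicate plus context facts confined to this example, alongside the FDT
# clauses shared by all examples.

example = {
    "positive": ["father(jack, anne)"],        # positive target instances
    "negative": ["father(anne, jack)"],        # negative target instances
    "context": ["parent_of(jack, anne)",       # BK restricted to this example
                "male(jack)"],
}
fdt = ["ancestor(X, Y) :- parent_of(X, Y)"]    # clauses common to all examples
print(len(example["positive"]) + len(example["negative"]))  # 2
```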


Although the dataset is composed of partial interpretations, the learning requirement is that the positive instances of each example are logically entailed by the current hypothesis H′ together with the background knowledge, i.e., ∀e ∀e+ ∈ e, H′ ∧ FDT ∧ Be ⊨ e+, and that the negative instances of each example are not logically entailed by the hypothesis and background knowledge, i.e., ∀e ∀e− ∈ e, H′ ∧ FDT ∧ Be ⊭ e−, where e+ represents the positive instances, e− represents the negative instances and Be is the background knowledge confined to each example e.

It should be noted from Section 2.1.1 that most ILP systems learn a single target predicate only. FORTE, on the contrary, is able to learn multiple predicates simultaneously. Such a task is facilitated because the search space is composed of whole theories instead of individual clauses. When learning individual clauses there is a great chance that the final returned theory is largely composed of locally optimal and unnecessarily long clauses. On the other hand, when learning whole theories the final hypothesis tends to be globally optimal and smaller. However, learning whole theories is known to be much more expensive than learning individual clauses.

Finding revision points in FORTE

FORTE identifies revision points by annotating proofs of incorrectly provable negative instances or by annotating attempted proofs of incorrectly unprovable positive instances. When the goal is to find specialization revision points, all the provable instances are considered, since they are the ones whose provability may be affected by a specialization of the theory: any of these instances might become unprovable because of the specialization. These instances are either True Positive (TP) - correctly classified positive instances - or False Positive (FP) - misclassified negative instances. The algorithm for collecting specialization revision points is shown as Algorithm 2.4. First, FORTE annotates each clause participating in the successful proof of the instances. The positive instances are annotated separately from the negative instances in the clauses. In case a clause has no annotation of false positive instances, it is discarded. The remaining clauses compose the set of specialization


Algorithm 2.4 FORTE Algorithm for collecting specialization revision points (Richards and Mooney, 1995)

Input: The current theory H′ and the FDT; EP, the set of provable instances, composed of TP, the set of correctly provable positive instances, and FP, the set of incorrectly provable negative instances
Output: RPS, a set of clauses marked as specialization revision points, each one annotated with TPC, the true positive instances relative to clause C, FPC, the false positive instances relative to clause C, and PC, the potential of the revision point

1: for each provable instance e ∈ EP do
2:   Ce ← clauses participating in the proof of the instance e using H′ and FDT
3:   for each clause C ∈ Ce do
4:     if e ∈ FP then
5:       FPC ← FPC ∪ {e};
6:       PC ← PC + 1;
7:     else
8:       TPC ← TPC ∪ {e};
9:   RPS ← RPS ∪ Ce;
10: for each clause C ∈ RPS do
11:   if PC = 0 then
12:     delete C from RPS;

revision points. The true positive and false positive instances relative to the clauses are used to calculate the score of the revisions proposed at those points.

In case there are misclassified positive instances, the goal is to find generalization revision points. In this case, all the unprovable instances are considered, since they are the ones whose provability may be affected by a generalization of the theory: any of these instances might become provable because of the generalization. These instances are either True Negative (TN) - correctly classified negative instances - or False Negative (FN) - misclassified positive instances. In order to identify generalization revision points, it is necessary to make annotations from failed proofs of positive instances. Three types of points in the failed proof path are collected:

1. the literal in a clause responsible for the proof failure,

2. the clause whose body contains such a literal, and

3. the literals which might have contributed to the failure by assigning incorrect values to variables (a contribution point).


Thus, each time a backtrack occurs, the failed antecedent is noted and marked as a failure point. Next, the literals binding values to variables in failure points are collected recursively. Finally, the clauses with the failed antecedents and with the contributing antecedents are also marked as failure points. This process is also followed to identify which failure points are responsible for not proving negative instances, since these might become provable after a revision at such points. The lists of TN and FN instances are used to calculate the potential of the revision point and, after some revision is proposed, to calculate the score of the revision at that point. The procedure is exhibited as Algorithm 2.5.
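The bookkeeping of this procedure can be sketched as follows, assuming each unprovable instance comes annotated with the failure points collected during its attempted proof; names and data shapes are illustrative.

```python
# A sketch of collecting generalization revision points: annotate each
# failure point with the FN/TN instances that reached it, and count the
# potential (FN instances a revision there might fix).
from collections import defaultdict

def collect_gen_points(unprovable, false_negatives, failure_points_of):
    fn, tn, potential = defaultdict(set), defaultdict(set), defaultdict(int)
    for e in unprovable:
        for point in failure_points_of(e):
            if e in false_negatives:
                fn[point].add(e)
                potential[point] += 1      # a revision here may fix e
            else:
                tn[point].add(e)
    # keep only points that could fix at least one false negative
    return {p: (fn[p], tn[p], potential[p]) for p in potential}

fp_of = {"e1": ["lit_a", "clause_1"], "e2": ["lit_a"], "e3": ["clause_1"]}.get
points = collect_gen_points(["e1", "e2", "e3"], {"e1", "e2"}, fp_of)
print(points["lit_a"][2])  # 2: a revision at lit_a could fix both e1 and e2
```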

Proposing revisions in FORTE

As stated before, the four basic revision operators add rules, delete rules, add antecedents to clauses and delete antecedents from clauses. FORTE revision operators are ultimately composed of these four basic operations, combined with techniques for escaping local maxima, such as deleting/adding several antecedents at once from/to a clause. Additionally, the system uses two operators based on inverse resolution (Muggleton, 1992), namely the absorption and identification operators. The operators are described in terms of the changes they make to the theory. However, recall that each one is proposing a revision and not actually implementing it on the theory. This is only done after proposing all possible revisions and choosing the one with the highest score. In order to calculate the score, FORTE employs a simple evaluation function: the number of incorrect instances which become correct minus the number of correct instances which become incorrect because of the revision. Next, we review the way FORTE revision operators work. Note that here we only describe the cases for non-recursive clauses. For more details, including how FORTE deals with recursive clauses, we refer the reader to the original paper (Richards and Mooney, 1995).

Specialization operators

Delete rule The clause marked as a specialization revision point is deleted from the theory. If it is the only clause explaining a concept, it is replaced by


Algorithm 2.5 FORTE Algorithm for collecting generalization revision points (Richards and Mooney, 1995)

Input: The current theory H′ and the FDT; EU, the set of unprovable instances, composed of TN, the set of correctly unprovable negative instances, and FN, the set of incorrectly unprovable positive instances
Output: RPG, a set of points marked as generalization revision points, each one annotated with TNC, the true negative instances relative to point C, FNC, the false negative instances relative to point C, and PC, the potential of the revision point

1: for each unprovable instance e ∈ EU do
2:   try to prove instance e using H′ and FDT
3:   for each time that a literal fails do
4:     collect the failed literal lfe;
5:     collect the literals LCe responsible for binding variables in lfe, recursively
6:     collect the clauses Ce where lfe failed and where LCe appeared
7:     if e ∈ TN then
8:       TNlfe ← TNlfe ∪ {e}
9:       for each literal lce ∈ LCe do
10:        TNlce ← TNlce ∪ {e}
11:      for each clause ce ∈ Ce do
12:        TNce ← TNce ∪ {e}
13:    else
14:      FNlfe ← FNlfe ∪ {e}
15:      Plfe ← Plfe + 1
16:      for each literal lce ∈ LCe do
17:        FNlce ← FNlce ∪ {e}
18:        Plce ← Plce + 1
19:      for each clause ce ∈ Ce do
20:        FNce ← FNce ∪ {e}
21:        Pce ← Pce + 1
22:    RPG ← RPG ∪ {lfe} ∪ LCe ∪ Ce
23: for each point P ∈ RPG do
24:   if PP = 0 then
25:     delete P from RPG;

concept :- fail. This simple process is stated as Algorithm 2.6.

Add Antecedents The operation of adding antecedents to clauses works through two nested processes: the innermost process adds antecedents to the input clause in an attempt to make as many negative instances as possible unprovable. This specialized clause is included in the revision. In case the specialized clause makes previously provable positive instances become unprovable, the outer process restarts the specialization from the original input clause, looking for alternative specializations which retain the proofs of positive instances while still making the negative instances unprovable. The top-level process of this revision operator is exhibited as Algorithm 2.7.

Algorithm 2.6 FORTE Delete Rule Revision Operator Algorithm

Input: The current theory H′ and the FDT; a clause C marked as a specialization revision point, together with TPC, the set of correctly provable positive instances using clause C, and FPC, the set of incorrectly provable negative instances using clause C
Output: A proposed revision Rev together with its score Sc

1: Rev ← H′ − C;
2: if C is the only clause explaining concept then
3:   Rev ← Rev ∪ {concept :- fail};
4: Sc ← calculate_score(Rev, FDT, FPC, TPC)

Algorithm 2.7 FORTE Top Level Add Antecedents Revision Operator Algorithm

Input: The current theory H′ and the FDT; a clause C marked as a specialization revision point, together with TPC, the set of correctly provable positive instances using clause C, and FPC, the set of incorrectly provable negative instances using clause C
Output: A proposed revision Rev, containing one or more clauses specialized from C, together with its score Sc

1: H′ ← H′ − C
2: Rev ← H′;
3: repeat
4:   C′ ← addAntecedents(C, H′, FDT, TPC, FPC);
5:   if C′ is different from C then
6:     Rev ← Rev ∪ C′;
7:   FNC ← instances in TPC which become unprovable by Rev ∪ FDT
8:   TPC ← FNC
9: until FNC = ∅ or it is not possible to create a specialized version of C (C′ == C)
10: Sc ← calculate_score(Rev, FDT, FPC, TPC)

There are two algorithms for adding antecedents to a clause, which may be executed in place of line 4 of Algorithm 2.7:

1. Hill Climbing (Algorithm 2.8) - This algorithm follows FOIL (Quinlan, 1990), adding one antecedent at a time. It works as follows. First, all possible antecedents are created and scored using a slightly modified version of the FOIL scoring function, displayed in formula 2.2. There, Old_score is the score of the clause without the literal being evaluated, #TPCA is the number of positive instances proved by the clause with the literal added to it and #FPCA is the number of negative instances proved by the clause with the literal. The difference concerns the fact that the FOIL score counts the number of proofs of instances, whereas FORTE counts the number of provable instances, ignoring the fact that one instance may be provable in several different ways. Next, the antecedent with the best score is selected. If the best score is better than the current clause score, the antecedent is added to the clause. This process continues until either there are no further antecedents to be added to the clause or no antecedent can improve the current score. This approach is susceptible to local maxima.

foil_based_score = #TPCA · (Old_score − log(#TPCA / (#TPCA + #FPCA)))    (2.2)
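As a concrete illustration, the scoring function of formula 2.2 can be sketched in a few lines of Python. This is a minimal sketch, not FORTE's actual code; the function name and the set-based encoding of coverage are assumptions made here for clarity:

```python
import math

def foil_based_score(old_score, tp_covered, fp_covered):
    """FORTE's variant of the FOIL score (formula 2.2) -- a sketch.

    old_score  -- score of the clause before adding the literal
    tp_covered -- set of positive instances still provable with the literal (#TPCA)
    fp_covered -- set of negative instances still provable with the literal (#FPCA)

    Unlike FOIL, FORTE counts provable *instances*, not proofs, so the
    inputs are sets: each instance counts once, however many proofs it has.
    """
    tpca = len(tp_covered)
    fpca = len(fp_covered)
    if tpca == 0:
        # A literal proving no positives cannot be useful; the log is undefined.
        return float("-inf")
    return tpca * (old_score - math.log(tpca / (tpca + fpca)))
```

For example, a literal that keeps two positives and two negatives provable scores `2 * (Old_score − log(1/2))`, while one that keeps all positives and no negatives scores `#TPCA * Old_score`, since `log(1) = 0`.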

Algorithm 2.8 Hill Climbing Add Antecedents Algorithm

Input: The current theory H0 and the FDT, a clause C, TPC, the set of correctly provable positive instances using clause C and FPC, the set of incorrectly provable negative instances using clause C
Output: A (specialized) clause C′
1: repeat
2:   antes ← generateAntecedents(C)
3:   Ante ← best antecedent from antes, scored with FPC and TPC
4:   if score(C ∪ Ante) > score(C) then
5:     C ← C ∪ Ante
6:     FPC ← FPC − instances in FPC not proved by C
7: until FPC = ∅ or it is not possible to improve the score of the current clause
8: return C

2. Relational Pathfinding (Algorithm 2.9) - This approach adds a sequence of antecedents to a clause at once, in an attempt to escape local maxima, as sometimes none of the antecedents put individually in the clause improves its performance.

The Relational Pathfinding algorithm is based on the assumption that, generally, in relational domains there is a path with a fixed set of relations connecting a set of terms, and such a path satisfies the target concept. Its goal is to find such paths in a relational domain, since important concepts are represented by a small set of fixed paths between the terms defining a positive instance. In order to find the paths, the relational domain is represented as a graph where the nodes are the terms and the edges are the relations among them. Thus, a relational path is defined as the set of edges (relations) which connect nodes (terms) of the graph. To better visualize this approach, consider, for instance, the graph in Figure 2.3, which represents part of the family domain. Horizontal lines denote marriage relationships, and the remaining lines denote parental relationships:

Figure 2.3: An instance of a relational graph representing part of the family do- main (Richards and Mooney, 1995)

Now, suppose the goal is to learn the target concept grandfather, given an empty initial rule and the positive instance grandfather(peter,anne). The relational path between the terms peter and anne is composed of the relation parents connecting peter to victoria, and also of the relation parents connecting victoria to anne. From these relations, the path parents(peter,victoria), parents(victoria,anne) is formed, which can be used to define the target concept grandfather(A,B) :− parents(A,C), parents(C,B).

From the point of view of theory revision, this algorithm can be used whenever a clause needs to be specialized and it does not have relational paths connecting its variables. In this case, a positive instance proved by the clause is chosen to instantiate it, and, from it, relational paths to the terms not yet linked by a relation in the clause are searched for.
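The path search above can be sketched as a breadth-first search over the relational graph of Figure 2.3. The sketch below hand-codes a tiny subset of the family domain as ground facts (the tuple encoding, the extra `married` edge with the name `victoria_sr`, and the `find_relational_path` helper are assumptions made here, not FORTE's representation):

```python
from collections import deque

# A tiny, hypothetical subset of the family domain as (relation, term1, term2).
facts = [
    ("parents", "peter", "victoria"),
    ("parents", "victoria", "anne"),
    ("married", "peter", "victoria_sr"),  # hypothetical extra edge
]

def find_relational_path(start, goal):
    """BFS for a relational path between two terms.

    Returns the list of ground relations connecting start to goal,
    e.g. parents(peter,victoria), parents(victoria,anne), or None."""
    # Build an undirected graph: nodes are terms, edges are relations.
    adj = {}
    for rel, a, b in facts:
        adj.setdefault(a, []).append((rel, a, b))
        adj.setdefault(b, []).append((rel, a, b))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        term, path = queue.popleft()
        if term == goal:
            return path
        for rel, a, b in adj.get(term, []):
            nxt = b if term == a else a
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(rel, a, b)]))
    return None

path = find_relational_path("peter", "anne")
# path: [("parents","peter","victoria"), ("parents","victoria","anne")]
```

Variabilizing the returned ground path (peter → A, victoria → C, anne → B) yields exactly the body of the grandfather clause in the example.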

If the found relations introduce new terms appearing only once, FORTE tries to complete the clause by adding relations that hold between these singleton terms and other terms in the clause; these new relations are not allowed to eliminate any of the currently provable positive instances. If FORTE is unable to use all of the new singletons, the relational path is rejected.

Algorithm 2.9 Top level Algorithm for Adding Antecedents to a Clause Using Relational Pathfinding Approach

Input: The current theory H0 and the FDT, a clause C, TPC, the set of correctly provable positive instances using clause C and FPC, the set of incorrectly provable negative instances using clause C
Output: A (specialized) clause C′
1: ex ← a positive instance from TPC, covered only because of C (and not because of other clauses with the same head)
2: find all clauses created from C and from paths generated through the terms in the head of ex
3: C′ ← the clause retaining the most instances in TPC as provable, or, in case of a tie, the shortest clause
4: FNC ← negative instances still provable
5: if FNC ≠ ∅ then
6:   C′ ← Hill_climbing_add_antecedents(C′, TPC, FPC)
7: return C′

Antecedents Generation Following the FOIL approach, all the predicates in the knowledge base are considered for creating literals to be added to the clause. A literal is created from a predicate by instantiating its arguments with variables, while respecting the following constraints:

1. At least one variable of the new literal must be in the clause being revised;

2. The arguments of the literals must obey the types defined in the knowledge base.

The larger the number of new variables in the clause, the more literals are created. In fact, the space complexity grows exponentially in the number of new variables, since the complexity of enumerating all possible combinations of variables is exponential in the arity of the predicate. The algorithms for generating literals used by the Hill Climbing and Relational Pathfinding approaches can be seen in Algorithm 2.10 and Algorithm 2.11, respectively.
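The blow-up described above can be made concrete with a small sketch of FOIL-style literal generation for a single predicate. This is an illustration only: it collapses all variables to one type and uses a hypothetical `generate_antecedents` helper, but it enforces the linking-variable constraint and shows how the candidate count grows with the number of fresh variables:

```python
from itertools import product

def generate_antecedents(clause_vars, predicate, arity, n_new):
    """FOIL-style candidate literals for one predicate (simplified sketch).

    clause_vars -- variables already in the clause (all one type here)
    n_new       -- how many fresh variables may be introduced
    Each candidate must share at least one variable with the clause."""
    fresh = ["V%d" % i for i in range(n_new)]
    pool = clause_vars + fresh
    candidates = []
    for combo in product(pool, repeat=arity):
        # Linking-variable constraint: reject combos built only from fresh vars.
        if any(v in clause_vars for v in combo):
            candidates.append((predicate,) + combo)
    return candidates

# The candidate space grows quickly with fresh variables:
print(len(generate_antecedents(["A", "B"], "parents", 2, 0)))  # 4
print(len(generate_antecedents(["A", "B"], "parents", 2, 2)))  # 12
```

With a variable pool of size n and a predicate of arity a there are n^a raw combinations, which is exactly the exponential dependence on arity noted in the text.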


Algorithm 2.10 Hill Climbing Antecedents Generation Algorithm
Input: A clause C
Output: A set of literals antes
1: for each literal lit in the knowledge base do
2:   varsC ← variables in C with their types
3:   argsL ← terms in lit with their types
4:   for each combination comb of variables ∈ varsC in the arity of lit do
5:     if comb is compatible with argsL considering the types in argsL then
6:       create a new antecedent ante by replacing the terms of lit with the variables in comb
7:       antes ← antes ∪ ante
8:   n ← arity of lit − 1
9:   i ← 1
10:  while i ≤ n do
11:    create a new variable v
12:    varsN ← varsC ∪ v
13:    for each combination comb of variables ∈ varsN in the arity of lit, including at least one variable ∈ varsC do
14:      if comb is compatible with argsL considering the types in argsL then
15:        create a new antecedent ante by replacing the terms of lit with the variables in comb
16:        antes ← antes ∪ ante
17:    i ← i + 1

As already mentioned, the Relational Pathfinding algorithm starts from a clause grounded with a positive instance covered by the clause. The terms in the ground clause become the nodes in the graph, connected by the relations defined in the body of the clause. The algorithm constructs the graph iteratively, starting from these initial nodes and expanding them until finding the relational paths. The end values are the terms (nodes) created when a node is expanded. In practice, the Relational Pathfinding and Hill Climbing algorithms might be executed competitively, with the clause chosen in the inner loop being the one with the highest FOIL score. Alternatively, if desired, only one of the two approaches may be executed.

Generalization operators

Delete antecedents The algorithm followed by the delete antecedents operator is displayed as Algorithm 2.12.


The antecedents are deleted from a clause marked as a generalization revision point using a hill climbing approach or, in case hill climbing has no success, a method that removes a set of antecedents simultaneously. The generalized clause is added to the proposed revision and the process restarts, in an attempt to make the remaining false negative instances become provable, until it is no longer possible to create useful generalized versions of the original clause. The two methods used to remove antecedents are described below.

1. The first method deletes one antecedent from the clause at a time, following a hill climbing approach. FORTE chooses the antecedent whose removal makes the largest number of unprovable positive instances become provable, requiring that no unprovable negative instance becomes provable. This process is iterated until there is no antecedent in the clause whose deletion would make misclassified positive instances become provable. The process is exhibited as Algorithm 2.13.

Algorithm 2.11 Relational Pathfinding Antecedent Generation Algo- rithm (Richards and Mooney, 1995)

Input: A clause C
Output: A set of sequences of literals paths
1: CI ← C instantiated with a randomly chosen positive instance
2: find isolated sub-graphs among the terms in CI
3: for each sub-graph do
4:   terms become initial end-values
5: repeat
6:   for each sub-graph do
7:     expand paths by one relation in all possible ways
8:     remove paths with previously seen end-values
9: until intersection found or resource bound exceeded
10: if one or more intersections found then
11:   for each intersection do
12:     C′ ← C with path-relations added
13:     if C′ contains new singleton variables then
14:       add relations using the singleton variables
15:       if all singletons cannot be used then
16:         discard C′
17:     replace terms with variables
18:     paths ← paths ∪ C′


2. Multiple antecedents are removed from the clause at once, in order to overcome the vulnerability of the former approach to getting stuck in local maxima. First, the antecedents whose individual removal does not allow true negative instances to become provable are collected. Then, combinations of such antecedents are produced, looking for the combination that makes the largest number of positive instances become provable without making negative instances become provable. The process continues deleting antecedents in this way, trying to prove as many positive instances as possible. This algorithm is computationally expensive and it is only executed when the hill climbing approach cannot propose any modification to the current theory. It is shown as Algorithm 2.14.
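The greedy variant of antecedent deletion (method 1 above) can be sketched as follows. The encoding is hypothetical: antecedents are plain strings, and coverage is abstracted into two callbacks counting the previously unprovable positives and negatives that a candidate clause would prove:

```python
def hill_climb_delete(clause, proves_pos, proves_neg):
    """Greedy antecedent deletion (a simplified sketch of FORTE's method 1).

    proves_pos(c) -- number of previously unprovable positives proved by c
    proves_neg(c) -- number of previously unprovable negatives proved by c
    At each step, delete the antecedent recovering the most positives,
    requiring that no unprovable negative becomes provable."""
    improved = True
    while improved:
        improved = False
        best, best_gain = None, proves_pos(clause)
        for ante in clause:
            cand = [a for a in clause if a != ante]  # assumes distinct antecedents
            if proves_neg(cand) == 0 and proves_pos(cand) > best_gain:
                best, best_gain = cand, proves_pos(cand)
        if best is not None:
            clause, improved = best, True
    return clause

# Toy coverage semantics: an instance (a set of facts) is proved when it
# contains every antecedent of the clause.
positives = [{"p", "q"}, {"p"}]
negatives = [{"r"}]
def proves_pos(c): return sum(1 for inst in positives if set(c) <= inst)
def proves_neg(c): return sum(1 for inst in negatives if set(c) <= inst)

print(hill_climb_delete(["p", "q", "r"], proves_pos, proves_neg))  # ['p']
```

In the toy run, "r" and then "q" are deleted because each deletion recovers positives, but deleting "p" as well is refused: the empty clause would prove the negative instance.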

Algorithm 2.12 Top-level Delete Antecedents Revision Operator Algorithm

Input: The current theory H0 and the FDT, a clause C marked as generalization revision point together with TNC, the set of correctly unprovable negative instances because of clause C, and FNC, the set of incorrectly unprovable positive instances possibly because of clause C
Output: A proposed revision Rev, containing one or more generalized versions of clause C, together with its score Sc
1: Rev ← H0 − C
2: stop ← false
3: repeat
4:   C′ ← hillClimbingDeleteAntecedents(C, TNC, FNC, Rev, FDT)
5:   if C′ = C then
6:     C′ ← delMultipleAntecedents(C, TNC, FNC, Rev, FDT)
7:     if C′ = C then
8:       stop ← true
9:   if C′ ≠ C then
10:    Rev ← Rev ∪ C′
11:    FNC ← FNC − instances in FNC which become provable by Rev
12:  if FNC = ∅ then
13:    stop ← true
14: until stop = true

Add rules FORTE implements two operators for adding rules to the current theory. It may create rules from an existing one or it may create a completely new rule from scratch. In the first case, the operator makes a copy of the clause


Algorithm 2.13 Hill Climbing Delete Antecedents Algorithm
Input: A clause C, TNC, a set of unprovable negative instances, FNC, a set of unprovable positive instances, a theory H′′, the static BK FDT
Output: A (generalized) clause C
1: stop ← false
2: repeat
3:   for each antecedent ante ∈ C do
4:     CTemp ← C − ante
5:     score_CTemp ← number of instances in FNC which are proved by H′′ ∪ CTemp ∪ FDT
6:     score2_CTemp ← number of instances in TNC which are proved by H′′ ∪ CTemp ∪ FDT
7:     if score_CTemp ≤ 0 or score2_CTemp > 0 then
8:       discard CTemp
9:     else antes ← antes ∪ (CTemp, score_CTemp)
10:  if antes ≠ ∅ then
11:    C ← the CTemp ∈ antes with the highest score_CTemp
12:  else
13:    stop ← true
14: until stop = true
15: return C

Algorithm 2.14 Delete Multiple Antecedents Algorithm
Input: A clause C, TNC, a set of unprovable negative instances, FNC, a set of unprovable positive instances, a theory H′′, the static BK FDT
Output: A (generalized) clause C′
1: antes ← all antecedents in C whose deletion does not change TNC
2: repeat
3:   ante ← an antecedent in antes
4:   CTemp ← C − ante
5:   if no negative instance in TNC becomes provable by H′′ ∪ CTemp ∪ FDT then
6:     C ← C − ante
7: until there are no antecedents left to try
8: if one or more positive instances in FNC become provable by H′′ ∪ CTemp ∪ FDT then
9:   return C

marked as a generalization revision point and tries to modify such a copy of the clause in two steps. First, antecedents are deleted from the clause as long as they make false negative instances become provable. Antecedents are deleted even if true negative instances become provable as a result. Then, the next step of the add rule operator is to add antecedents to the created clause in an attempt to make such negative

instances become unprovable again. This operation is performed by the add antecedents operators, as previously explained. The process is exhibited as Algorithm 2.15. The second add rule operator starts by creating the head of the rule from a predicate marked as a generalization revision point. The next step is to compose the body of the clause, using the add antecedents operator.

Algorithm 2.15 Top-level Add rule Revision Operator Algorithm

Input: The current theory H0 and the FDT, a clause C marked as a generalization revision point together with TNC, the set of correctly unprovable negative instances because of clause C, and FNC, the set of incorrectly unprovable positive instances possibly because of clause C
Output: A proposed revision Rev, containing one or more generalized versions of clause C, together with its score Sc
1: H′′ ← H0 ∪ C
2: C′ ← clause C after deleting the antecedents which make instances from FNC become provable
3: H′′ ← H′′ ∪ C′
4: TPC ← instances in FNC which become provable by H′′ ∪ FDT ∪ C′
5: FPC ← instances in TNC which become provable by H′′ ∪ FDT ∪ C′
6: Rev ← addAntecedents(H′′, FDT, C′, TPC, FPC)
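The two-phase structure of the first add-rule operator can be sketched as below. The encoding is hypothetical (clauses as lists of strings, coverage abstracted into a `provable` callback, and the add antecedents step abstracted into a `specialize` callback), but the control flow mirrors the description: first over-generalize the copy, then re-specialize it against the negatives it now wrongly proves:

```python
def add_rule(clause, false_negatives, true_negatives, provable, specialize):
    """Two-phase sketch of FORTE's first add-rule operator.

    Phase 1: copy the clause and delete every antecedent whose removal makes
    some false-negative positive provable, even if true negatives become
    provable as a side effect.
    Phase 2: hand the over-general copy to an add-antecedents procedure
    (the abstract `specialize` callback) to rule those negatives out again."""
    copy = list(clause)
    for ante in list(copy):
        without = [a for a in copy if a != ante]  # assumes distinct antecedents
        gained = [e for e in false_negatives
                  if provable(without, e) and not provable(copy, e)]
        if gained:
            copy = without
    # Negatives wrongly proved by the generalized copy, to be ruled out:
    fp = [e for e in true_negatives if provable(copy, e)]
    return specialize(copy, fp)

# Toy semantics: an instance (a set of facts) is proved when it contains
# every antecedent of the clause.
def provable(clause, instance):
    return set(clause) <= instance

specialize = lambda c, fp: (c, fp)  # stand-in for the add-antecedents operator
result = add_rule(["p", "q"], [{"p", "r"}], [{"r"}], provable, specialize)
# result: the generalized copy and the negatives it must rule out
```

In the toy run, deleting "q" recovers the false-negative positive, so the copy is generalized to ["p"] before being passed on for re-specialization.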

Chapter 3

YAVFORTE: A Revised Version of FORTE, Including Mode Directed Inverse Entailment and the Bottom Clause

3.1 Introduction

Theory revision systems usually induce more accurate theories than ILP techniques learning from scratch. However, these more accurate theories come at the expense of searching a large search space, mainly because theory revision refines whole theories instead of individual clauses, which is known to be a hard problem (Wrobel, 1996; Bratko, 1999). Therefore, it is essential to develop efficient theory revision systems so that their advantages remain feasible. Focusing on the FORTE theory revision system, in this chapter we contribute towards this goal by identifying a number of bottlenecks of the revision process and developing algorithms based on state-of-the-art ILP systems to overcome them. The worst bottleneck is the generation of literals to be included in the body of clauses, which is done inside the add antecedents and add rules operators. FORTE followed the FOIL top-down approach (Quinlan, 1990), considering all the literals of the knowledge base to create antecedents for clauses, which leads to a huge search space, dominating the cost of the revision process. Instead of following a pure top-down approach when specializing clauses, ILP algorithms such as Progol (Muggleton, 1995) and Aleph (Srinivasan, 2001b) restrict the search for literals to those belonging to the Bottom Clause. The


Bottom Clause contains the literals relevant to a positive example, collected by a Mode Directed Inverse Entailment (MDIE) search in the BK. This hybrid bottom-up and top-down approach often generates much fewer literals, which are also guaranteed to cover at least one positive example (the one used to generate the Bottom Clause); for this reason it is a good option to reduce the search space for literals in the revision process. As part of this work, in (Duboc, 2008) and (Duboc et al., 2009) the Bottom Clause approach was introduced as the search space of literals to speed up FORTE. We describe that process here, and also how to further take advantage of the Bottom Clause by (1) allowing its generation to start from a base clause and (2) using the current theory to create literals, in addition to the BK. Moreover, we use a mode directed search when deleting antecedents from a clause, so that after the deletion the clause is still valid according to the mode declarations.

Besides following FOIL's pure top-down approach, FORTE has six different revision operators and may try to propose revisions suggested by all of them in a single step, depending on the revision points found. This may lead to a large search space of operators. In many tasks, the domain expert has some idea about the kind of modifications the theory needs; this could be exploited to reduce the search space of operators. Another weakness of FORTE is that the delete antecedents operator does not allow any unprovable negative example to become provable, which may cause overfitting of the clause, or may prevent this operator from generating a revision at all because of such a hard requirement. In this chapter we describe a number of modifications performed on the FORTE system to handle the shortcomings listed above.
We call the resulting system YAVFORTE (Yet Another Version of FORTE); it includes FORTE_MBC (Duboc, 2008; Duboc et al., 2009) to create the search space of new literals. The chapter starts by depicting the modified top-level revision process in section 3.2. Next, the modifications implemented on the revision operators are described in section 3.3, concerning mainly the use of the Bottom Clause and Mode Directed Search. Experimental results are shown in section 3.4, followed by conclusions about this work in section 3.5.

3.2 Restricting the Search Space of Revision Operators

The original FORTE system proposes modifications to the theory through six different revision operators: four designed to generalize the theory and two designed to specialize it. Moreover, it tries to apply all these possible operators, even when one of them has already achieved its maximum potential1 or a maximum score. Suppose, for example, that we have a clause proving 10 positive examples and 20 negative examples. Now, suppose an extreme case where these 10 positive examples are also covered by another clause. The simplest way to rectify the theory would then be to delete that clause. However, besides proposing the deletion of the clause, FORTE would also try to add antecedents to the body of the clause so that the negative examples become unprovable, which is clearly a waste of time. Thus, in order to make the revision proposals more flexible and efficient, we modified the revision process by including two amendments:

1. The user is able to stipulate which operators the system must apply in order to propose modifications in the theory. In case the user does not specify any operator, the system tries to apply all of them, according to the revision points. Thus, the revision process is able to only specialize/generalize the theory, as well as to apply a subset of specialization and/or generalization operators when proposing modifications.

2. Instead of applying all possible revision operators on a revision point, the system establishes an order of simplicity in which to apply the operators. Using this order, once a simpler operator has achieved the maximum potential or a maximum score, the system stops proposing modifications. In other words, the system only applies a more complex operator when no operator simpler than it was able to attain the maximum potential or score. The order imposed on the specialization operators is to first apply the delete rule and then the add antecedents operator, clearly because the number of operations

1Remember from the previous chapter that the potential is the number of examples indicating the necessity of revision at one point.


performed to evaluate the delete rule is much smaller than when proposing the addition of antecedents. The order imposed on the generalization revision operators is delete antecedents, absorption, identification, add rule. Delete antecedents has as its search space only the literals belonging to the initial clause. Identification and absorption define their search spaces in terms of the literals present in the theory. Add rule first uses the delete antecedents and then the add antecedents operators. This last operator may have a large number of literals to evaluate, depending on the background knowledge and mode definitions; therefore we take it as the most complex generalization revision operator.
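The ordered application with early stopping can be sketched as follows. The operator names, scores, and the `propose_revisions` helper are hypothetical illustrations, not YAVFORTE's actual interface:

```python
def propose_revisions(revision_point, operators, max_potential):
    """Apply revision operators in order of simplicity (a sketch of the
    amendment above): stop as soon as an operator reaches the revision
    point's maximum potential, skipping the more expensive operators.

    operators -- list of (name, apply_fn) ordered simplest first, where
                 apply_fn(revision_point) returns a score (here, the number
                 of misclassified examples the proposed revision fixes)."""
    best = None
    for name, apply_fn in operators:
        score = apply_fn(revision_point)
        if best is None or score > best[1]:
            best = (name, score)
        if score >= max_potential:  # a simpler operator already suffices
            break
    return best

# Usage: delete rule fixes all 20 misclassified examples, so the costlier
# add antecedents operator is never evaluated.
calls = []
ops = [
    ("delete_rule", lambda rp: calls.append("delete_rule") or 20),
    ("add_antecedents", lambda rp: calls.append("add_antecedents") or 15),
]
best = propose_revisions("clause_1", ops, max_potential=20)
# best == ("delete_rule", 20); calls == ["delete_rule"]
```

This mirrors the example above: when deleting the clause already resolves every misclassified example, proposing antecedent additions for the same revision point is skipped entirely.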

Algorithm 3.1 presents the modified revision process, built upon Algorithm 2.3 of chapter 2.

Algorithm 3.1 YAVFORTE Top-Level Algorithm
Input: An initial theory T, background knowledge FDT, a set of examples E, a list of applicable operators Rev
Output: A revised theory T′

1: if Rev = ∅ then
2:   Rev ← all revision operators
3: GenRev ← ordered list of generalization operators in Rev, starting from the simplest one
4: SpecRev ← ordered list of specialization operators in Rev, starting from the simplest one
5: repeat
6:   generate revision points
7:   sort revision points by potential (high to low)
8:   for each revision point RP do
9:     if RP is a specialization revision point then
10:      for each revision operator RO ∈ SpecRev do
11:        apply RO to RP
12:        compute scoreRO
13:    else
14:      for each revision operator RO ∈ GenRev do
15:        apply RO to RP
16:        compute scoreRO
17:    update best revision found
18:  until scoreRO = maximum score or RO achieved maximum potential of RP
19:  if best revision improves the theory then
20:    implement best revision
21: until no revision improves the theory

3.3 Improvements Performed on the Revision Operators

This section starts by reviewing the theory behind the Bottom Clause and MDIE, since we use them to restrict the search space of the add antecedents and the delete antecedents operators. Next, the operator developed in (Duboc et al., 2009) to add antecedents is reviewed, including the improvements performed on it in this work. Then, we describe the modifications implemented on the delete antecedents operator.

3.3.1 Using the Bottom Clause as the Search Space of Antecedents when Revising a FOL Theory

FORTE, following FOIL, generates literals to be added to a clause obeying two conditions: (1) the variables of the literals must follow their types defined in the knowledge base and (2) they must have at least one variable in common with the current clause. While this makes the generation of antecedents simple and fast, it also leads to a large search space composed of all the possible literals of the knowledge base. Such a large search space makes the complexity of adding antecedents to a clause very high, contributing to the bottleneck of the revision process. Aiming to reduce this cost, we implemented the following modifications to the FORTE system:

1. The variabilized Bottom Clause generated by Algorithm 2.2 becomes the search space of literals, which reduces the search space and also imposes the following constraints:

• Limits the maximum number of different instantiations of a literal (the recall number);

• Limits the number of new variables in a clause;

• Guarantees that at least one positive example is covered (the one which generates the Bottom Clause).

2. The mode declarations are used to further constrain the antecedents: they may only be added to the clause if their terms respect the modes defined in the knowledge base.


3. Determination definitions of the form

determination(HeadPredicate/Arity, BodyPredicate/Arity)

state which predicates can be called in the clauses defining HeadPredicate.

The modified version of FORTE considering mode declarations and the Bottom Clause is called FORTE_MBC. Recall that when specializing a clause, the goal of the operation is to make false positive examples become unprovable while still covering the true positive examples. Thus, the Bottom Clause is created immediately before the search for antecedents begins, by saturating a positive example covered by the clause being specialized (called the base clause). The Bottom Clause is composed of the literals relevant to at least that positive example and is guaranteed to be a super-set of the base clause. The created Bottom Clause becomes the search space for antecedents, which improves the efficiency of the add antecedents operation, since it usually has many fewer literals than the previous space of the whole knowledge base. It is important to emphasize that the constraints of FOIL continue to be met here, as the arguments of the literals in the Bottom Clause must obey their types and there is a linking variable between the literal being added to the clause and the literals of the current clause.

Algorithm 3.2 shows the process of constructing the Bottom Clause in YAVFORTE. It differs from Algorithm 2.2 when the base clause has a non-empty body, since in this case it is necessary to take into account the terms of the current clause. Note that in (Duboc et al., 2009) the variables of the Bottom Clause were unified with the variables of the base clause only after the Bottom Clause had been constructed. Besides this being an expensive process involving a lot of backtracking, sometimes it was not possible to find a correct unification. We noticed two such situations: (1) cases where the example contains two or more equal terms in the head but the base clause has two different variables in their place, and (2) cases where the base clause has no constant in the head but the first modeh has a constant, or vice versa.
In these cases the Bottom Clause would follow the unification according to the example, differing from the base clause and making it more difficult to find a substitution that would match the Bottom Clause and the base clause. Thus, the first

step in Algorithm 3.2 is to find the ground literals of the base clause considering the example but maintaining the substitution of the variables in the base clause. Then, the terms of the base clause are put in the list of terms to be used by the procedure and the Bottom Clause is initialized with the literals of the base clause. The rest of the process is the same as in the original algorithm, except that we do not allow the inclusion of a literal already in the Bottom Clause. Note that if the clause is being constructed from scratch, it is not necessary to keep an association with the base clause passed as an input argument. In this case, we completely follow the original algorithm. Another difference from the algorithm developed in (Duboc et al., 2009) is the input: there, the Bottom Clause is constructed considering only the BK, whereas here we also allow the current theory to be used to produce literals. This is essential in case we have intermediate clauses in the current theory. Next, we show how the Bottom Clause is used as the space of new literals.

Using the Bottom Clause in Hill Climbing Add Antecedents Algorithm

The Hill Climbing add antecedents algorithm, modified from Algorithm 2.8 to take the Bottom Clause into account as search space, is detailed in Algorithm 3.3. The two algorithms differ in three aspects:

1. Algorithm 3.3 has as its first step the construction of the Bottom Clause, in line 2, since it is used as the search space for antecedents. Algorithm 3.2 is used to obtain the Bottom Clause from a positive instance correctly proved by the base clause.

2. This Bottom Clause becomes the input for antecedents generation in line 4. This procedure is explained in detail below.

3. As the Bottom Clause created at the beginning is not modified during the execution of this algorithm, it is necessary to remove from the Bottom Clause the antecedent added to the clause, in line 8. From this last difference, it follows that the algorithm also stops when there are no more literals left in the Bottom Clause.


Algorithm 3.2 Bottom Clause Construction Algorithm in FORTE_MBC
Input: The current theory H0 and the FDT, a clause C, Ex, an instance
Output: The Bottom Clause BC

1: if C has an empty body then
2:   ⊥ ← Algorithm 2.2(H0, FDT, Ex)
3: else
4:   InTerms ← ∅; ⊥ ← ∅
5:   Cground ← instantiation of the clause C using H0, FDT, and Ex, with substitution θ maintaining the variables of C
6:   for each v/t in θ do
7:     InTerms ← InTerms ∪ t
8:   ⊥ ← ⊥ ∪ C
9:   i ← 0, corresponding to the variables depth
10:  BK ← FDT ∪ Ex
11:  for each modeb declaration b do
12:    for all possible substitutions θ of arguments corresponding to +type by terms in the set InTerms do
13:      repeat
14:        if b succeeds with substitution θ′ then
15:          for each v/t in θ and θ′ do
16:            if v corresponds to #type then
17:              replace v in b by t
18:            else
19:              replace v in b by vk, where k = hash(t)
20:              if v corresponds to −type then
21:                InTerms ← InTerms ∪ t
22:          if b ∉ C then
23:            ⊥ ← ⊥ ∪ b
24:      until reaching recall times
25:  i ← i + 1
26: Go to line 11 if the maximum depth of variables is not reached
27: return ⊥

4. There is a parameter specifying the maximum size a clause is allowed to have.

Using the Bottom Clause in Relational Pathfinding Add Antecedents Algorithm

The Relational Pathfinding algorithm, adding more than one antecedent at once to a clause and considering the Bottom Clause as search space, is exhibited in Algorithm 3.4. There are only two major differences between Algorithm 3.4 and Algorithm 2.9: the creation of the Bottom Clause from a positive example covered by the


Algorithm 3.3 Hill Climbing Add Antecedents Algorithm Using the Bottom Clause
Input: The current theory H0 and the FDT, a clause C, TPC, the set of correctly provable positive instances using clause C, FPC, the set of incorrectly provable negative instances using clause C, and CL, the maximum size of a clause
Output: A (specialized) clause C′

1: Ex ← an instance from TPC
2: BC ← createBottomClause(Ex, H0, FDT, C)  % use Algorithm 3.2
3: repeat
4:   antes ← getAntecedentsfromBC(C, BC)
5:   Ante ← best antecedent from antes, scored with FPC and TPC
6:   if score(C ∪ Ante) > score(C) then
7:     C ← C ∪ Ante
8:     remove Ante from BC
9:     FPC ← FPC − instances in FPC not proved by C
10: until FPC = ∅ or there are no more antecedents in BC or it is not possible to improve the score of the current clause or |C| = CL
11: return C

base clause happens before the search for relational paths (1); consequently, the Bottom Clause is used as the search space for paths, together with the same positive example used to generate the Bottom Clause and the base clause (2).

Using the Bottom Clause as Search Space for Antecedents Generation

The original FORTE dynamically generates antecedents at each iteration of the add antecedents procedure, since it is necessary to collect the variables of the current clause to create new literals. FORTE MBC, on the contrary, generates literals statically, at the beginning of the process, by creating the Bottom Clause. However, not every literal in the whole Bottom Clause can be added to the current clause at a given moment. Suppose, for example, that the clause head(A,B), whose body is empty, is being specialized. The modeh definition for this predicate is modeh(1, head(+ta, +ta)), indicating that it can be used in the head of a clause, that only one instantiation of it is allowed (recall = 1), since we deal with definite clauses, and that both its arguments are input arguments of type ta. Consider a BK and a positive instance producing the Bottom Clause

head(A,B) :- body1(A,C), body2(B,C), body3(C,A).


Algorithm 3.4 Top-level Relational Pathfinding Add Antecedents Algorithm Using the Bottom Clause
Input: The current theory H′ and the FDT; a clause C; TPC, the set of correctly provable positive instances using clause C; FPC, the set of incorrectly provable negative instances using clause C
Output: A (specialized) clause C′

1: Ex ← an example from TPC, covered only because of C (and not because of other clauses with the same head)
2: BC ← createBottomClause(Ex, H′, FDT, C) % use Algorithm 3.2
3: find all clauses created from C and from paths generated through the terms in the head of Ex, considering BC as search space
4: C′ ← the clause retaining the most instances in TPC as provable, or, in case of a tie, the shortest clause
5: FPC ← negative instances still provable
6: if FPC ≠ ∅ then
7:   C′ ← Hill climbing add antecedents algorithm(C′, TPC, FPC)
8: return C′

based on the following mode declaration:

{modeb(∗, body1(+ta, −ta)), modeb(∗, body2(+ta, −ta)), modeb(∗, body3(+ta, −ta))}, which indicates that the clause can have infinitely many different instantiations of the predicates bodyi (recall = ∗), and that the arguments of these predicates are both of type ta, the first one an input term and the second an output term. If every literal in the whole Bottom Clause were a candidate to specialize the base clause, the literal body3(C,A) could be added to the current clause. However, notice that the variable C occupies the place of an input term, and therefore such a variable should have appeared before in the current clause, which is not the case. Considering only the FOIL constraint of having a connecting variable between the clause and the literal is not enough, since then body3(C,A) would be valid because of its second variable. Note that such a literal is correctly placed inside the Bottom Clause, because the variable C appeared before in body2(B,C). Because of that, ILP systems such as Progol and Aleph take advantage of the order of the literals in the Bottom Clause when choosing an antecedent to be added to a clause. We follow a different approach for collecting, from a Bottom Clause, the literals eligible to be added to a clause. A literal in the Bottom Clause is a candidate to be included in a current

clause if and only if its input variables have already appeared in another literal of the current clause. Thus, line 4 of Algorithm 3.3 is composed of two steps: (1) collect the literals of the current Bottom Clause and (2) validate such candidate antecedents, verifying whether each one of them is actually allowed to be part of the clause, according to its input variables. In regard to the Relational Pathfinding algorithm we follow a slightly different approach. First of all, it is important to notice that the generation of antecedents for this algorithm is the same as the one performed in Algorithm 2.9, with one obvious difference: now the literals are searched for in the Bottom Clause generated at the beginning of the process. Thus, when looking for paths, the literals considered for a path are the ones from the Bottom Clause, still taking into account the end values of the relational paths. However, we do not validate literals against the modes immediately before they are considered for a path, since more than one antecedent will be added at once; it may be the case that the whole path is valid according to the modes even though an isolated literal would not be. Thus, the whole path is validated against the mode declarations: if the relational path does not obey the modes, it is discarded just before it would be evaluated. Line 3 of Algorithm 3.4 returns the clauses created from C and from paths, ensuring such clauses are valid according to the mode declarations.
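The eligibility test for a single candidate literal can be sketched compactly. The literal and mode representations below are illustrative assumptions of ours: a Bottom Clause literal may specialize the current clause only if all of its input (+type) variables already occur somewhere in the clause.

```python
# Sketch of the input-variable eligibility check: a candidate literal from the
# Bottom Clause is valid only if every variable in one of its '+' (input)
# argument positions already appears in the current clause.

def clause_vars(clause):
    """All variables occurring in a clause, given as a list of (pred, args)."""
    return {v for _, args in clause for v in args}

def eligible(literal, clause, modes):
    """literal: (pred, args); modes: pred -> tuple of '+'/'-'/'#' per argument."""
    have = clause_vars(clause)
    return all(mode != '+' or arg in have
               for mode, arg in zip(modes[literal[0]], literal[1]))
```

With the running example (clause head(A,B) with an empty body), body1(A,C) is eligible because its input variable A occurs in the head, while body3(C,A) is not, because its input variable C has not yet appeared.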

Remarks about the Complexity of Antecedents Addition

The revision process of FORTE has a complexity exponential in the size of the input theory and in the arity of the theory predicates (Richards and Mooney, 1995). When adding antecedents to a rule, the number of permutations of arguments to a predicate is an exponential function of the predicate's arity. Thus, the space of possible literals grows exponentially with the number of new variables, since the complexity of enumerating all possible combinations of variables is exponential in the arity of the predicate. On the other hand, the space complexity of the Bottom Clause is its cardinality, which is bounded by r(|M|j⁺j⁻)^(ij⁺), where |M| is the cardinality of M (the set of mode declarations), j⁺ is the number of +type occurrences in each modeb in M plus the number of −type occurrences

in each modeh, j⁻ is the number of −type occurrences in each modeb in M plus the number of +type occurrences in each modeh, r is the recall of each mode m ∈ M, and i is the maximum variable depth (Muggleton, 1995). In the same way that the use of the Bottom Clause reduces the search space of Progol compared to a top-down approach such as FOIL, it also reduces the search space of antecedents of FORTE, which originally generated antecedents in FOIL's fashion. However, it is important to point out that this advantage is only guaranteed if the variable depth i is small enough. In the extreme case where i and the background knowledge are both large, the Bottom Clause has a very large number of literals and hence the search space is as large as, or even larger than, the search space of FORTE. Addressing exactly this problem, Tang et al. (2003) proposed the BETH system, which uses a hybrid top-down and bottom-up approach when constructing the Bottom Clause, aiming to reduce the cardinality of the Bottom Clause and consequently the search space of literals.
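As a quick illustration of this bound, the following sketch (the function name is ours) simply evaluates r(|M|j⁺j⁻)^(ij⁺), making the exponential dependence on the maximum variable depth i explicit:

```python
# Evaluate the Bottom Clause cardinality bound r(|M| j+ j-)^(i j+) quoted
# above (Muggleton, 1995). Purely illustrative.

def bottom_clause_bound(r, m, j_plus, j_minus, i):
    return r * (m * j_plus * j_minus) ** (i * j_plus)
```

For example, with r = 2, |M| = 3 and j⁺ = j⁻ = 1, the bound grows from 18 literals at depth i = 2 to 54 at i = 3, which is why the advantage over a top-down search is only guaranteed for small i.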

3.3.2 Modifying the Delete Antecedent Operator to Use the Modes Language and to Allow Noise

Using Mode Directed Search when Deleting Literals from a Clause

Besides using the theory of MDIE to generate the Bottom Clause and taking advantage of it when adding literals to a clause, we also need to yield a theory that follows the modes language. This is not only a requirement to make the final theory valid according to the modes; it is also essential to the specialization procedure, since, as said before, its first step is to include the terms of the base clause among the terms to be used in the Bottom Clause. Such terms must correctly follow the modes, otherwise no match will exist for them in the Bottom Clause construction procedure. To begin with, we assume the initial theory follows the modes established in the language bias. To continue following the modes when revising the theory, it is necessary to validate each single proposed modification against them. When specializing clauses, this is already done, since any literal to be added to the body of a clause comes from the Bottom Clause and, immediately before it is actually included, it

is checked whether it obeys some mode. It remains to be checked that a proposed modification does not make the theory invalid according to the modes when the theory is generalized. Thus, a mode check is included after deleting a literal from the body of the clause being generalized. It works as follows. Immediately before evaluating a literal for removal from a clause, it is checked whether the resulting clause still follows the mode assertions. The inspection prevents a literal from being removed if it falls into one of two cases: (1) the literal to be removed is the only one with an output variable of the head of the clause; or (2) the literal to be removed is the only one with a variable that is input to another literal. Both cases would make the clause disobey the modes language and therefore the procedure does not allow them to happen.
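A minimal sketch of this two-case inspection follows; the literal representation and the modes table are illustrative assumptions. A deletion is safe only if, after it, every output variable of the head and every input variable of the remaining body literals is still supplied somewhere:

```python
# Sketch of the mode check performed before deleting a body literal.
# head/body literals are (pred, args); modes maps pred -> '+'/'-'/'#' per slot.

def safe_to_delete(idx, head, body, modes):
    def args(lit, kind):
        """Variables of `lit` sitting in argument positions of mode `kind`."""
        return {a for m, a in zip(modes[lit[0]], lit[1]) if m == kind}
    rest = body[:idx] + body[idx + 1:]
    # variables still bound after the deletion: head inputs plus the output
    # variables produced by the remaining body literals
    available = args(head, '+').union(*(args(l, '-') for l in rest))
    # variables that must stay bound: the head's output variables (case 1)
    # and the input variables of every remaining literal (case 2)
    needed = args(head, '-').union(*(args(l, '+') for l in rest))
    return needed <= available
```

For head(A,B) with body p(A,C), p(A,D), q(C,B): deleting p(A,D) is safe, but deleting p(A,C) would orphan the input C of q, and deleting q would leave the head output B unbound.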

Allowing noise in Delete Antecedent Operator

In addition to using mode definitions to validate the deletion of a literal from a clause, we also remodel the delete antecedents operator so that it becomes more flexible in the sense of allowing noise, i.e., true negative instances may become provable. Rather than specifying a maximum number of negative examples to be considered as noise, as is done in the Aleph and Progol systems for example, the number of negative examples that a clause may cover is decided by the score of the clause. Thus, at each iteration the antecedent whose removal most improves the score is selected. If, after deleting such an antecedent from the body of a clause, the score is improved even though a true negative instance becomes a false positive, the procedure nevertheless proceeds to further deletions. The deletion of antecedents stops when it is not possible to improve the score, following a classical greedy hill climbing approach. This modus operandi is used both in the delete antecedents operator and when deleting antecedents to create a new rule from an existing one (add rules operator).

3.4 Experimental Results

We have already experimentally demonstrated in (Duboc et al., 2009) the benefits of using mode declarations and the Bottom Clause when revising FOL theories. Experimental results presented a speed-up of 50 times compared to the original


Algorithm 3.5 Remodeled Greedy Hill Climbing Delete Antecedents Algorithm
Input: A clause C; TNC, a set of unprovable negative instances; FNC, a set of unprovable positive instances; a theory H′; the static BK FDT
Output: A (generalized) clause C
1: scoreC ← compute current score
2: repeat
3:   for each antecedent ante ∈ C do
4:     CTemp ← C − ante
5:     scoreCTemp ← compute score for CTemp
6:     antes ← antes ∪ {(CTemp, scoreCTemp)}
7:   CTemp ← the element of antes with the highest score scoreCTemp
8:   if scoreCTemp is better than scoreC then
9:     C ← CTemp
10:   scoreC ← scoreCTemp
11: until it is not possible to improve the score
12: return C
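A hedged Python sketch of this greedy loop follows; the score function is treated as an assumed black box and the list-of-antecedents representation is ours, not FORTE's Prolog code:

```python
# Sketch of Algorithm 3.5: greedily delete the antecedent whose removal most
# improves the score, tolerating newly covered negatives as long as the
# overall score improves; stop when no single deletion helps.

def generalize(clause, score):
    best_score = score(clause)
    improved = True
    while improved and len(clause) > 1:
        improved = False
        # evaluate every single-antecedent deletion
        trials = [clause[:i] + clause[i + 1:] for i in range(len(clause))]
        cand = max(trials, key=score)
        if score(cand) > best_score:
            clause, best_score, improved = cand, score(cand), True
    return clause
```

For instance, with a score that rewards keeping one essential antecedent and penalizes clause length, the loop strips all the superfluous antecedents one at a time.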

FORTE system, without significantly decreasing accuracy. Additionally, we showed there that the revision system provides more accurate and smaller theories compared to a standard inductive method also based on the Bottom Clause. In this chapter, we want to know whether it is possible to further decrease the runtime of the revision process without decreasing accuracy. With this goal, we present the results obtained from the current implementation, varying the set of operators used to revise the theory, compared to the implementation of FORTE containing the FORTE MBC algorithm and also to the Aleph system. We compare average runtime, accuracy, and the size of the theories in number of clauses and literals.

Datasets We consider the same datasets used in (Duboc et al., 2009), namely the Alzheimer domain (King et al., 1995a), composed of four datasets, and the DssTox dataset (Fang et al., 2001). The Alzheimer domain compares 37 analogues of Tacrine, a drug for combating Alzheimer's disease, according to the four properties described below, where each property originates a different dataset:

1. inhibit amine re-uptake, composed of 343 positive examples and 343 negative examples

2. low toxicity, with 443 positive examples and 443 negative examples


3. high acetyl cholinesterase inhibition, composed of 663 positive examples and 663 negative examples and

4. good reversal of scopolamine-induced memory deficiency, containing 321 positive examples and 321 negative examples

The Alzheimer domain considers 33 different predicates and 737 facts in the background knowledge. The DssTox dataset was extracted from the EPA's DSSTox NCTRER Database. It contains structural information about a diverse set of 232 natural, synthetic and environmental estrogens, together with classifications regarding their binding activity for the estrogen receptor. The dataset is composed of 131 positive examples and 101 negative examples. There are 25 different predicates and 16177 facts in the background knowledge.

Experimental Methodology The datasets were split into 10 disjoint folds to use a k-fold stratified cross-validation approach. Each fold keeps the original distribution of positive and negative examples (Kohavi, 1995). The significance test used was the corrected paired t-test (Nadeau and Bengio, 2003), with p < 0.05. As stated by Nadeau and Bengio (2003), the corrected t-test takes into account the variability due to the choice of training set and not only that due to the test examples; ignoring the former can lead to gross underestimation of the variance of the cross-validation estimator and to the wrong conclusion that a new algorithm is significantly better when it is not. In this work, we assume that both mode and type definitions are correct and therefore cannot be modified. All the experiments were run on Yap Prolog (Santos Costa, 2008). The initial theories were obtained from the Aleph system using three settings:

- The first setting runs Aleph with its default parameters, except for minpos, which was set to 2 to prevent Aleph from adding to the theory ground unit clauses corresponding to positive examples. It is identified in the tables as Theory-def.

(The minpos parameter sets a lower bound on the number of positive examples to be covered by an acceptable clause. If the best clause covers fewer positive examples than this number, it is not added to the current theory (Srinivasan, 2001b).)


- The second setting, named Theory-def+cl, runs Aleph with its default parameters, except for the minpos and clauselength parameters, the latter setting an upper bound on the number of literals in a clause. We chose this parameter because the YAVFORTE implementation also limits the number of literals in an acceptable clause. For all systems, clause length is set to 5 in the Alzheimer domain and 10 in DssTox, following previous work (Landwehr et al., 2007).

- The third setting, Theory-best, runs Aleph with "literature" parameters according to previous work (Huynh and Mooney, 2008; Landwehr et al., 2007). Thus, following Huynh and Mooney (2008), Aleph was set to use a minscore of 0.6 and the M-estimate (Dzeroski and Bratko, 1992) as evaluation function; noise is 300 for Alzheimer and 10 for DssTox, and clause length and minpos are defined as above. Additionally, the induce_cover command was invoked instead of the standard induce. The difference is that positive examples covered by a clause are not removed prior to seeding on a new example when using induce_cover. Note that only the clause length parameter is used in YAVFORTE, as the other parameters are not implemented there.

To generate these theories, the whole dataset was considered, but using a 10-fold cross-validation procedure. Thus, a different theory was generated for each fold, and each of these theories was revised considering its respective fold (the same fold is used to generate and revise the theories). In order to identify whether there is any benefit in pre-defining the set of applicable operators when revising the theories, YAVFORTE is run with 5 different sets of operators. Four settings consider a combination of one specialization operator together with one generalization operator. The last setting, identified in the tables as YAVFORTE, considers all operators.

- YAV-del considers only the delete rule specialization operator and the delete antecedents generalization operator.

- YAV-add considers only addition of antecedents as specialization operator and addition of rules as generalization operator.


- YAV-add-del considers addition of antecedents as specialization operator and deletion of antecedents as generalization operator.

- YAV-del-add considers deletion of rules as specialization operator and addition of antecedents as generalization operator.

Both YAVFORTE and FORTE MBC run the add antecedents algorithm using the Relational Pathfinding algorithm followed by Hill Climbing in the Alzheimer domains. As the DssTox top-level predicate is unary, Relational Pathfinding is not applicable there; hence only the Hill Climbing algorithm is taken into account. Both systems consider the clause length parameter as 5 for Alzheimer and 10 for DssTox.
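The corrected paired t-test used in the methodology above can be sketched as follows. The function name and the per-fold representation are our assumptions; the correction of Nadeau and Bengio (2003) inflates the variance term by n_test/n_train to account for the overlap between training sets across cross-validation folds:

```python
# Sketch of the corrected resampled paired t-test (Nadeau and Bengio, 2003).
# The resulting statistic is compared against a Student t distribution with
# k-1 degrees of freedom (here, k = 10 folds and p < 0.05).

import math

def corrected_paired_t(diffs, n_train, n_test):
    """diffs: per-fold score differences between the two compared systems."""
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)   # sample variance
    # the uncorrected test would use var / k; the correction adds n_test/n_train
    denom = math.sqrt((1.0 / k + n_test / n_train) * var)
    return mean / denom if denom > 0 else float('inf')
```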

Table 3.1: Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer amine dataset

             | Theory-def                    | Theory-def+cl                 | Theory-best
System       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C
Aleph        | 7.53      62.67     20.8/5.5  | 48.66     65.62     36.9/8.5  | 43.19     73.64     69.7/7.16
FORTE MBC    | 21.96     71.38     34.6/8.7  | 26.29     75.09     35.5/8.6  | 69.09     76.87     52.5/12
YAV-del      | 3.91♦•?   70.13♦?   18.4/5.5  | 6.01♦•?   75.09♦    34.7/8.5  | 12.89♦•?  76.87♦    51.9/11.8
YAV-add      | 7.29•?    70.48♦?   46.7/10.8 | 8.61♦•    74.94♦    53.8/11.9 | 15.31♦•?  76.55♦    79/17.2
YAV-add-del  | 9.14♦•?   71.92♦?   42.2/9.8  | 7.13♦•?   75.54♦    39.1/9.2  | 15.93♦•?  76.55♦    74.9/16.4
YAV-del-add  | 7.18•?    72.34♦?   43.2/10.1 | 8.93♦•    75.24♦    52.8/11.7 | 14.14♦•?  77.57♦    55.5/12.4
YAVFORTE     | 12.76♦•   74.23♦•   34.6/8.7  | 11.15♦•   75.68♦    38/9      | 29.12♦•   77.27♦    58.9/12.8

Results and remarks Tables 3.1, 3.2, 3.3, 3.4 and 3.5 present the results for the Amine, Toxic, Choline, Scopolamine and DssTox datasets, respectively. The best values for each column are in bold, but we require that, in case the best runtime or theory size result belongs to one of the revision settings, the accuracy of the initial theory is still improved. The symbol ♦ identifies the cases where the runtime and accuracy of the new systems differ significantly from Aleph. The symbol • is used for a similar reason, comparing YAVFORTE with its


Table 3.2: Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer toxic dataset

             | Theory-def                    | Theory-def+cl                 | Theory-best
System       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C
Aleph        | 17.29     62.48     13/4.3    | 53.57     63.85     32.2/8.1  | 25.98     77.75     40.1/10.4
FORTE MBC    | 22.54     71.32     25.4/7.8  | 37.67     71.19     27.9/8.1  | 22.14     79        33.5/8.9
YAV-del      | 2.93♦•?   68.48♦    11.6/4.5  | 8.79♦•?   71.19♦    28.1/8.1  | 4.38♦•?   79.12♦    32.4/8.7
YAV-add      | 4.21♦•    63.62•?   22.3/6.3  | 16.57♦•   75.71♦•?  62.1/14.3 | 5.92♦•?   77.53•    51.5/11.8
YAV-add-del  | 4.64♦•    68.94♦    14.8/5.1  | 10.99♦•?  71.64     30.9/8.6  | 6.13♦•?   77.76•    51.4/11.8
YAV-del-add  | 4.69♦•    64.30♦•?  20.9/6    | 22.39♦•?  75.93♦    53.6/12.6 | 4.42♦•?   78.89♦    32.5/8.7
YAVFORTE     | 6.10♦•    69.05♦    14.8/5.1  | 16.73♦•   71.64♦    30.9/8.6  | 11.51♦•   78.67     44.9/10.6

Table 3.3: Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer choline dataset

             | Theory-def                    | Theory-def+cl                 | Theory-best
System       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C
Aleph        | 39.23     56.02     30.7/8.5  | 100.63    56.62     39.33/9.4 | 147.94    64.47     74.9/17.1
FORTE MBC    | 100.73    61.01     34.6/9.8  | 79.21     64.93     40/10.3   | 228.18    67.6      15.9/8.9
YAV-del      | 19.83♦•?  65.21♦•   27.7/8.9  | 24.13♦•?  64.48     38.7/10.1 | 38.33♦•?  65.09•    66.2/15.6
YAV-add      | 37.57•?   64.32♦•   66.1/6.3  | 36.23♦•?  65.16♦    67.6/14.3 | 46.85♦•?  64.70•    91.1/19.8
YAV-add-del  | 46.05•?   62.98     60.2/15.2 | 39.07♦•?  67.09     51.9/12.1 | 59.22♦•?  64.94•    82.4/18.2
YAV-del-add  | 39.58•?   66.32♦•?  63.7/15.2 | 38.63♦•?  65.39♦    65.6/14.4 | 51.42♦•?  64.93•    75/16.7
YAVFORTE     | 64.36♦•   64.02♦•   51.7/13.5 | 70.57♦    64.93♦    40/10.3   | 98.76♦•   64.86•    77.8/16.7

different settings to FORTE MBC. Finally, ? identifies the cases where there is a significant difference between YAVFORTE disregarding some revision operator and YAVFORTE using all the operators. From the results we make the following remarks.

- YAVFORTE is always faster than FORTE MBC and even produces more accurate theories, with a significant difference in four cases. There are two cases where YAVFORTE returns worse theories than FORTE MBC: Choline and DssTox with the literature Aleph parameters. In the rest of the cases both


Table 3.4: Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the Alzheimer scopolamine dataset

             | Theory-def                    | Theory-def+cl                 | Theory-best
System       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C
Aleph        | 15.96     51.71     24.9/6.5  | 41.40     51.41     31.3/7.5  | 57.24     58.42     45.3/10.5
FORTE MBC    | 47.43     61.53     37.8/9.7  | 42.56     66.53     37/9.2    | 48.04     64.51     45.1/10.9
YAV-del      | 4.97♦•?   60.74♦?   22.8/6.5  | 6.45♦•?   64.79♦    28.3/7.5  | 7.27♦•?   64.20♦    40.9/10
YAV-add      | 3.54♦•?   51.87•?   28.2/7.4  | 14.67♦•   62.02♦•?  57.5/13   | 15.02♦•?  62.17♦•?  66.9/14.8
YAV-add-del  | 5.30♦•?   60.59♦?   23.70/6.7 | 7.70♦•?   63.87♦•?  29.8/7.8  | 8.9♦•?    64.35♦    45.8/11
YAV-del-add  | 3.48♦•?   51.87•?   28.2/7.4  | 14.8♦•    62.02♦•?  57/12.9   | 14.06♦•?  61.70♦•?  62.6/13.9
YAVFORTE     | 13.93♦•   62.47♦    37.5/9.6  | 15.68♦•   66.53♦    37/10.3   | 48.04♦    64.51♦    45.1/11.3

Table 3.5: Runtime in seconds, predictive accuracy and size in number of literals and clauses of final theories for the DssTox dataset

             | Theory-def                    | Theory-def+cl                 | Theory-best
System       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C       | Time(s)   Acc(%)    L/C
Aleph        | 25.55     51.36     10.2/2.9  | 50.07     51.36     11.2/2.9  | 39.96     55.39     9/2.3
FORTE MBC    | 4.26      71.37     31.4/6    | 4.27      71.37     31.4/4.2  | 3.12      77.43     20.8/4.2
YAV-del      | 0.25♦•?   49.08•?   10.2/2.9  | 0.24♦•?   49.08•?   10.2/2.9  | 0.15•?    54.73•?   8.4/2.3
YAV-add      | 2.47♦•    78.33♦•   40.5/7.4  | 2.46♦•    78.33♦•   40.5/7.4  | 1.5♦•     71.34♦•   26.2/5
YAV-add-del  | 2.39♦•    75.29♦•?  35.9/6.4  | 2.38♦•    75.29♦•?  35.9/6.4  | 1.66♦•    72.64♦•   27.6/5
YAV-del-add  | 2.06♦•    79.20♦•   33.9/6.6  | 2.06♦•    79.20♦•   33.9/6.6  | 0.57♦•?   61.34♦•?  14.7/3.3
YAVFORTE     | 2.56♦•    78.33♦•   38.9/7.2  | 2.61♦•    78.33♦•   38.9/7.2  | 1.72♦•    71.34♦•   24/4.7

systems produce similar accuracy results. The difference in time is mainly due to the fact that YAVFORTE may stop proposing revisions without using all operators, when a simpler operator already reaches the potential of the revision point. Additionally, there is some gain in time when the terms of the base clause are included in the set of terms of the Bottom Clause before generating literals. Remember that FORTE MBC unifies the terms of the base clause and the Bottom Clause only after the Bottom Clause has been constructed.


Concerning the difference in accuracy, two factors are responsible: (1) stopping revision proposals before trying all operators may leave aside a revision that would be better on the test set, which is the case with the Choline dataset; (2) the delete antecedents operator in YAVFORTE does not try to eliminate the proofs of all negative examples; instead, it deletes antecedents following a score value. While this more flexible operator is capable of producing more accurate results in the four cases previously mentioned, it is also responsible for the worse revision of the DssTox theory. However, in most cases, YAVFORTE revises theories as well as FORTE MBC (without significant difference) in reduced runtime.

- Note that the results for the Choline dataset using FORTE MBC are, in most of the figures, detached from the rest. FORTE MBC has a harder time revising the theories generated from this dataset than from the other datasets, for two main reasons: this is the dataset with the largest set of examples, and the theories Aleph generates for it are the largest.

- YAVFORTE considering only delete antecedents and delete rules as revision operators is the fastest revision setting and produces the smallest theories, in both number of literals and number of clauses. What is interesting about this setting is that, with rare exceptions (Choline with the best Aleph parameters, and DssTox), the results show that these two operators alone are able to provide significant improvements over the initial theory. Actually, in Toxic with the literature Aleph parameters this setting provides the best revision: the final theory is the smallest and most accurate, and it was even revised in the least time.

- The results indicate that there are benefits in considering only a subset of the revision operators: most cases produce theories as accurate as when considering all operators, and in less time. Thus, when the expert of the application has some insight into what to expect from the revised theory, this knowledge can be used to reduce the set of applicable revision operators.

- It is important to emphasize that YAVFORTE, considering any set of operators,


is faster than Aleph in 11 of 15 cases, with most of the settings still returning more accurate theories. With this result, we can claim that a revision system is capable of behaving better than an inductive system in both runtime and accuracy.

Figure 3.1: Scatter plot for all datasets and systems settings considering theories learned by Aleph with its default parameters

Figure 3.2: Scatter plot for all datasets and systems settings considering theories learned by Aleph with its default parameters, except for clause length


Figure 3.3: Scatter plot for all datasets and systems settings considering theories learned by Aleph with better parameters settings than just default

Figures 3.1, 3.2 and 3.3 exhibit scatter plots for all systems and theories, considering theories learned with Aleph default parameters, Aleph default parameters plus changed clause length, and Aleph best parameters, respectively. Symbols in the bottom right corner indicate the best results. Note that when considering default parameters Aleph concentrates in the bottom left corner: runtime is small, at the cost of poorly accurate theories. When the clause length parameter is changed, the situation slightly improves for accuracy, but runtime increases, as expected. Better parameters improve accuracy, at the cost of increasing runtime. In most cases FORTE MBC sits higher on the y-axis than the other systems, although it is more present on the left side, indicating that its accuracy results are still elevated compared to Aleph. The YAVFORTE settings, on the other hand, concentrate in the bottom (right) corner, indicating that, overall, they are capable of producing the most accurate theories in less time.

3.5 Conclusions

Although work in theory revision gained great attention in the 1990s, in recent years the ILP community has practically set aside research in this area, since the benefits of revising theories could not outweigh the large runtime effort expended by those

systems. This chapter contributes towards changing this scenario, building on the FORTE revision system. First, in (Duboc et al., 2009) we abandoned the top-down, FOIL-based search for literals in favor of the Bottom Clause, reducing the set of literals taken into account when refining a theory, as standard ILP systems do. In this chapter we further improve the scalability of the revision system by (1) making the use of revision operators more flexible, since it is possible to choose which operators are going to be considered to revise the theory; (2) stopping revision proposals at a revision point as soon as a simpler operator achieves the full potential of that point; (3) requiring that a clause continue to obey the mode declarations after an antecedent is deleted. We additionally introduced one modification in the delete antecedents operator, since in the original FORTE and FORTE MBC an antecedent could be deleted only when no true negative instance became provable. It was necessary to make this hard requirement more flexible, because there are cases where the set of false negative instances is only reduced if the clause is allowed to also prove some negative instances. Finally, we modified the Bottom Clause construction procedure of FORTE MBC in two ways: (1) the terms of the ground base clause are considered part of the Bottom Clause, so that they can be used to bring further terms to the Bottom Clause, and so that it becomes unnecessary to unify the base clause with the Bottom Clause after the latter is constructed; (2) the current theory is taken into account when proving literals to be included in the Bottom Clause. We named the system including all these modifications YAVFORTE. Experimental results were obtained from five relational benchmark datasets, namely four datasets from the Alzheimer domain and the DssTox dataset.
Through them, it was possible to verify that the revision process can indeed be improved with those modifications. YAVFORTE is faster than FORTE MBC without decreasing accuracy. In fact, there were cases where the accuracy of the initial theory was further improved. It was also possible to see that when the set of revision operators is restricted, the runtime diminishes while the initial accuracy is still improved, though not as much as when the complete set of operators is considered. More importantly, we showed that when the same parameter for

limiting the size of a clause is used for revising and for learning from scratch, the revision performs faster than the inductive method. In this way, we achieve our goal of devising a revision system as feasible as a standard inductive system, at least when datasets of regular size are used. The datasets used in this chapter are not considered large ones. Therefore, we still need to verify how the revision system behaves when the datasets have a large number of examples and/or large background knowledge. Also, as the revision system starts from an initial theory, if that theory has a large number of faulty clauses, the system will probably behave badly. In this case, the revision becomes more difficult than learning from scratch, since modifications must be proposed for each faulty clause. Because of those issues, in Chapter 6 we investigate stochastic local search techniques applied to the revision process.

Chapter 4

Chess Revision: Acquiring the Rules of Variants of Chess through First-order Theory Revision from Examples

4.1 Motivation

In a recent paper, Dietterich et al. (2008) point out that there was considerable effort in the development of theory revision systems in the past, but the lack of applications suited to that task meant those systems were not widely deployed. In this chapter we intend to contribute in this direction by designing an application in the area of games that fits theory revision perfectly. Game playing is a fundamental human activity, and has been a major topic of interest in the AI community since the very beginning of the area. Games quite often follow well defined rituals or rules on well defined domains, hence simplifying the task of representing them as computer programs. On the other hand, good performance in games often requires a significant amount of reasoning, making this area one of the best ways of testing human-like intelligence. Indeed, datasets based on games are common testbeds for machine learning systems (Fürnkranz, 2007). Usually, machine learning systems may be required to perform two different kinds of tasks (Fürnkranz, 1996). The first task is to learn a model that can be used to decide whether a move in a game is legal or not. Having such a model is fundamental for the second task, where one wants to learn a winning strategy (Bain and Muggleton,


1994; Sadikov and Bratko, 2006). We focus on the first task in this work. In order to acquire a meaningful representation of the set of rules describing a game, one can take advantage of the expressiveness of first-order logic and hence its ability to represent individuals, their properties and the relationships between them. Thus, using ILP methods it is possible, roughly speaking, to induce the game's rules written as a logic program, from a set of positive and negative examples and background knowledge. Previous work has demonstrated the feasibility of using ILP to acquire a rule-based description of the rules of chess (Goodacre, 1996). However, to effectively learn a chess theory, it is necessary to induce not only the top-level concept of how a piece should legally move, but also subconcepts that help in that task. Consider, for example, Figure 4.1. In the first case, a king is in check and therefore its only legal moves are those which get it out of check. In the second case, a piece is playing the role of protector of the king (it is a pin). Before moving such a piece, one must verify whether the king remains protected once the piece leaves its position. These are intermediate concepts that must also be induced by the revision system. The author of (Goodacre, 1996) employed hierarchical induction to learn one concept at a time, starting from the lowest-level concept and incrementally adding the learned definitions to the final theory, until reaching the ultimate goal: learning the concept of legal moves.

Figure 4.1: Visualization of situations when an ILP system should learn definitions for subconcepts. Figure (a) shows a board of chess with a checked king. Figure (b) shows a piece working as a pin.


On the other hand, game playing is a dynamic environment where games are always being updated, say, to be more challenging to the player, or to produce an easier and faster variant of the original game. In fact, popular games often have different regional versions, which may be considered variants or even new versions of the game. Consider, for example, the game of chess, arguably the most widely played board game in the world. It has also been a major game testbed for research on artificial intelligence, and it has offered several challenges to the area. There are numerous chess variants, where we define a variant as any game that is derived from, related to or inspired by chess, such that the capture of the enemy king is the primary objective (Pritchard, 2007). For instance, Shogi (Hooper and Whyld, 1992) is the most popular Japanese version of chess. Although both games have similar rules and goals, they also have essential differences. For example, in Shogi a captured piece may change sides and return to the board 1, which is not allowed in Western chess. Figure 4.2 shows boards of several chess variants. Ideally, once the rules of a game have been obtained, we would like to take advantage of them as a starting point to obtain the rules of a variant. However, such rules need to be modified in order to represent the particular aspects of the variant. In a game such as chess this is a complex task that may require addressing different board sizes, introducing or deleting promotion and capture rules, and may require redefining the role of specific pieces in the game. In this work, we address this problem as an instance of Theory Revision from Examples (Wrobel, 1996). In this case, theory revision is closely related to Transfer Learning (Thrun, 1995; Caruana, 1997), since the rules of international chess (the initial theory for the theory revision system) have been learned previously using ILP.
Arguably, transfer learning is concerned with retaining and applying the knowledge learned in one or more tasks to efficiently develop an effective hypothesis for a completely new task, whereas theory revision deals with closely related problems. For example, transfer learning may carry out a mapping between two different predicates, a task theory revision systems are not designed to perform. However, after mapping one predicate to another it is usually necessary to change the definition of

1It is suggested that this innovative drop rule was inspired by the practice of 16th century mercenaries who switched loyalties when captured (Pritchard, 2007).


the predicate, which is in fact a task of theory revision. Thus, theory revision may be seen as an important part of transfer learning systems.

Figure 4.2: Boards of variants of chess. The first row shows international Chess, followed by Xiangqi (Chinese chess) and Shogi (Japanese chess). The next row shows one of the earliest forms of chess, Chaturanga in its Hindu version, followed by the Chess in the Round game and a more modern version of Chaturanga, quite close to international Chess. The last group shows Shogi, Anti-King Chess, Los Alamos and Grand chess, in this order.

We show that we can learn rules between different variants of the game of chess. Starting from the YAVFORTE revision system explained in the previous chapter, we contribute with (i) a new strategy designed to simplify the initial theory by removing facts that will not be transferred between variants; (ii) support for abduction; and (iii) support for negation as failure. Experimental evaluation on real variants of chess shows that our technique can transfer between variants with smaller and larger boards, acquire unusual rules, and acquire different pieces. This chapter is organized as follows. First, we describe the components, besides the revision system, composing the framework for revising the rules of chess in section 4.2. Next, the modifications performed on YAVFORTE to support abduction and negation, so that the problem is best addressed, are described in section 4.3. Finally, we show

the effectiveness of our approach through experimental results on chess revision in section 4.4 and conclude the work in section 4.5. A reduced version of this chapter has been published in (Muggleton et al., 2009b) and (Muggleton et al., 2009a).

4.2 Revision Framework for Revising Rules of Chess to Turn Them into the Rules of Variants

The framework designed in this work for revising the rules of chess, so that the laws of a variant of the game can be found, is composed of five components: three input components, one transformer component and one resulting component. The input components are (1) the initial theory, containing rules of international chess previously learned from examples and/or defined by a domain expert, (2) a set of examples reflecting the laws of the game's variant, responsible for pointing out where the initial theory differs from the domain of the variant, and (3) a set of fundamental concepts supporting the provability of the initial theory. The transformer component is the theory revision system, which is responsible for modifying the initial theory according to the examples so that it reflects the rules of the variant of the game. Finally, the last component is the resulting revised theory, which ideally is a logic program capable of deciding whether a move in the game is legal or not, according to its governing rules. Figure 4.3 displays the components of the framework and how they cooperatively work to achieve this goal.

4.2.1 The Format of Chess Examples

As in most ILP frameworks, theory revision refines an initial theory from a set of positive and negative examples. The chess domain addressed in this work has as positive instances the legal moves allowed by the rules of the game. Consequently, the negative instances are the illegal moves, i.e., the moves that do not obey the rules of the game. In order to represent the moves executed during a game, the dataset is composed of a set of simulated games up to a specified number of rounds. The moves are placed within a game in order to represent castling and en-passant, which require the history of the game, and promotion, which requires an update of the board. In case of


castling, history is necessary to check whether the king and the rook have already moved in previous rounds. To allow an en-passant, it is necessary to check the immediately preceding move. Promotion requires replacing the promoted pawn with another piece in subsequent moves.

Figure 4.3: Components of the chess revision framework: the Initial theory is revised by the Revision system, using the FDT and the Dataset. The Dataset is created by the Examples generator component.

We take advantage of the FORTE example representation discussed in chapter 2.1, where an example obeys the format

Ground Instances of Target Predicate ← Conjunction of facts from the context

Thus, the ground instances of the target predicate are instances of legal (positive instances) and illegal (negative instances) moves, and the facts from the context are the positions of the pieces at each round (the board of the game). Each simulated game has its separate set of legal and illegal moves and set of positions of the pieces during the game, in the format exhibited in Table 4.1. The terms of the target predicate are the round of the move, with 1 as the first round, 2 the subsequent round, and so on, and the current and next status of the piece. The status is the name of the piece, its color and its position, composed of File and Rank. For example, move(9, pawn, white, c, 7, rook, white, c, 8) states that in round 9 a white pawn moved from c, 7 to c, 8 and was promoted to a rook, i.e., its next


Table 4.1: Format of one example in Chess dataset

Target Predicate:
  Positives: move(Round, Piece, Colour, File, Rank, NextPiece, NextColour, NextFile, NextRank), ...
  Negatives: move(Round, Piece, Colour, File, Rank, NextPiece, NextColour, NextFile, NextRank), ...
Context:
  board(Round, Piece, Colour, File, Rank), ...
  out_board(Round, Piece, Colour, -1, -1), ...
  out_board(Round, Piece, Colour, 0, 0), ...

status is rook, white, c, 8. Similarly, move(5, bishop, black, c, 8, bishop, black, e, 5) states that in round 5 the black bishop moved from c, 8 to e, 5. The facts from the context represent the position of the pieces on the board at each round of the game, through the predicate board/5, and the pieces removed from the game by capturing or promotion moves, through the predicate out_board/5. In the out_board/5 predicate, the two last terms are −1, −1 in case of a capture and 0, 0 in case of a promotion. The board setting is updated according to the legal move(s) performed on the previous round. Consider, for example, the black bishop move above. There is a fact board(5, bishop, black, c, 8) in that example and, after the move, another fact is generated to represent the new position of the piece, namely board(6, bishop, black, e, 5). Suppose there were a fact board(5, pawn, white, e, 5) in the context of this same game. Thus, when the bishop moved it captured a white pawn, which “produces” another fact out_board(6, pawn, white, −1, −1) in the new setting of the board. The board facts for the first round are composed of the initial positions of the pieces for the game and therefore they are the same for each example in the dataset. Figure 4.4 shows the initial board for international chess and the board/5 predicates representing it. A move generator procedure is responsible for creating the dataset of simulated games, since we could not find any saved games for variants of chess. Basically, it starts from the initial board setting and successively chooses a piece from the


current board, generating legal moves for it. After that, it chooses at random one of the legal moves to be the positive example of the round. Then, it generates il illegal moves from pieces chosen at random from the current board to compose the set of negative examples of the round. Note that only one legal move is chosen because each round of the game naturally has exactly one move actually performed. However, several illegal moves are selected, since they are not in fact performed. Moreover, the board must be updated based on the legal move, which does not happen for illegal moves, allowing us to have as many negative examples for each round as desired. The process continues up to the specified maximum number of rounds, respecting for each round the piece with the right to play. It may also be necessary to represent negative examples created from a non-existing board. This is the case for negative examples representing non-existing pieces or non-existing positions in the game. A variant of chess which does not use all the usual pieces, or which uses a smaller board, would require that. In this situation, a random board is created specifically to represent such a negative example. Algorithm 4.1 shows the steps necessary to generate the set of examples. Table 4.2 exhibits part of one example in the dataset, extracted from a chess variant known as Gardner mini-chess (5×5 board).

Figure 4.4: The initial setting of the board in international chess and the atomic facts corresponding to it. The figure of the board is taken from http://www.mark-weeks.com/aboutcom/bloa0000.htm
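The board-update bookkeeping described above can be illustrated with a short Python sketch. This is our own illustration, not the thesis code: the tuple encoding of board/5 and out_board/5 facts, and the function name, are assumptions made for the example.

```python
# Sketch: derive the board/5 and out_board/5 facts of round R+1 from the
# facts of round R plus the legal move performed in round R.

def update_board(board_facts, move):
    """board_facts: set of (round, piece, colour, file, rank) tuples for round R.
    move: (R, piece, colour, file, rank, npiece, ncolour, nfile, nrank).
    Returns (board facts for round R+1, out_board facts for round R+1)."""
    r, piece, colour, f, rk, npiece, ncolour, nf, nrk = move
    nxt, out = set(), set()
    for (_, p, c, pf, prk) in board_facts:
        if (p, c, pf, prk) == (piece, colour, f, rk):
            continue                          # the moving piece: re-added below
        if (pf, prk) == (nf, nrk):
            out.add((r + 1, p, c, -1, -1))    # captured piece leaves the board
            continue
        nxt.add((r + 1, p, c, pf, prk))       # unaffected piece is carried over
    if npiece != piece:                       # promotion: the pawn leaves the board
        out.add((r + 1, piece, colour, 0, 0))
    nxt.add((r + 1, npiece, ncolour, nf, nrk))
    return nxt, out
```

With the bishop example from the text, board(5, bishop, black, c, 8) and board(5, pawn, white, e, 5) yield board(6, bishop, black, e, 5) and out_board(6, pawn, white, −1, −1) after move(5, bishop, black, c, 8, bishop, black, e, 5).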


Table 4.2: Part of one example in Mini-chess Dataset

example(
% positive examples
[move(p1,pawn,white,a,2,pawn,white,a,3),
move(p2,pawn,black,c,4,pawn,black,c,3),
% This is the black pawn that moved from c,4 to c,3 in round 2
move(p4,pawn,black,c,3,pawn,black,b,2),...,
move(p23,knight,white,c,3,knight,white,a,4),
% A pawn is promoted to a knight
move(p24,pawn,black,2,black,knight,c,1),
move(p25,rook,white,d,1,rook,white,d,2)],
% negative examples
% promotion in an existing rank (therefore, from a non-existing board)
[move(r94,pawn,bishop,white,b,7,bishop,white,b,8),
move(p1,pawn,white,c,2,pawn,white,b,2),...,
move(w84,queen,black,f,2,queen,black,b,2),...,
move(p10,pawn,black,d,4,pawn,black,d,1),...,
move(p25,king,white,e,1,king,white,f,1)],
% facts from the context: board setting for each round
board(p26,king,white,a,1),board(p26,knight,black,c,1),...,
board(p25,knight,black,d,5),board(p25,rook,black,e,5),...,
% A white pawn has been promoted to a knight
% The fact that such a pawn no longer belongs to the board is
% represented by the out_board predicate
out_board(p24,pawn,white,0,0), board(p24,knight,white,a,4),...,
board(p23,pawn,white,a,3),board(p23,knight,white,c,3),...,
board(p23,rook,black,e,5),...,
% The black queen has been captured
out_board(p25,queen,black,-1,-1),...,
board(p5,rook,white,e,1),board(p5,queen,white,a,2),
% Board updated after the fourth round: a black pawn has moved to b,2
board(p5,pawn,black,b,2),...,board(p5,pawn,black,e,4),...,
board(p5,knight,black,d,5),board(p4,king,white,a,1),...,
% Board updated after the third round: the white queen has moved to a,2
board(p4,queen,white,a,2),board(p4,pawn,white,b,2),...,
board(p3,king,white,a,1),board(p3,queen,white,b,1),...,
% Board updated after the second round: a black pawn has moved to c,3
board(p3,pawn,black,c,3),...,board(p2,pawn,white,e,2),...,
% Board updated after the first round: a white pawn has moved to a,3
board(p2,pawn,white,a,3),board(p2,pawn,black,a,4),...,
board(p1,bishop,white,c,1),board(p1,rook,white,e,1),
board(p1,pawn,white,a,2),board(p1,pawn,white,b,2),...,
board(p1,knight,black,d,5),board(p1,rook,black,e,5),
% Non-existing boards are updated after a negative move
board(r95,bishop,white,b,8),...,board(w9,rook,black,b,1)


Algorithm 4.1 Chess examples generator algorithm
Input: the initial board setting CB; int d, the depth of each example/game; int il, the number of negative examples for each round; int n, the number of examples
Output: a dataset for a chess variant
1: nex ← 1
2: while nex <= n do
3:   round ← 1
4:   current_board ← CB
5:   color ← white
6:   while round <= d do
7:     piece ← choose a piece with color color from current_board
8:     legal_moves ← generate legal moves for piece in current_board
9:     pos_move_round ← a move chosen at random from legal_moves
10:    neg_moves_round ← il illegal moves generated at random from current_board and/or a non-existing board
11:    current_board ← update current_board according to pos_move_round
12:    color ← the opposite of color
13:    round++
14:  nex++
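The control flow of Algorithm 4.1 can be sketched in Python as follows. The generators `legal_moves(board, colour)`, `random_illegal(board)` and `apply_move(board, move)` are placeholders we introduce for the game-specific procedures, so only the loop structure mirrors the algorithm.

```python
import random

def generate_examples(initial_board, d, il, n,
                      legal_moves, random_illegal, apply_move):
    """Return n simulated games, each a (positives, negatives) pair of depth d."""
    dataset = []
    for _ in range(n):
        board, colour = initial_board, 'white'
        pos, neg = [], []
        for _ in range(d):
            moves = legal_moves(board, colour)
            if not moves:
                break
            move = random.choice(moves)        # one legal move per round
            pos.append(move)
            # il illegal moves per round; they are never applied to the board
            neg.extend(random_illegal(board) for _ in range(il))
            board = apply_move(board, move)    # board follows the legal move only
            colour = 'black' if colour == 'white' else 'white'
        dataset.append((pos, neg))
    return dataset
```

Because illegal moves never change the board, any number of negatives can be drawn per round, while colours alternate and exactly one positive is recorded.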

4.2.2 The Background Knowledge

In the chess revision problem, the initial theory describes the rules of the standard game of chess, which will be modified, using appropriate examples, to state the rules of the variants. This theory is inspired by the one learned in (Goodacre, 1996), where hierarchically structured induction was employed in the Progol system to learn clauses from the lowest-level predicate (one which does not have any other target predicate in its body) to the top-level predicate (the one which is not in the body of any other predicate). The resulting theory was approved by Professor Donald Michie, who was a world authority in the field. To accomplish that, a set of examples was generated for each level in turn, with the predicate of that level as the target. Thus, differently from our representation, those examples were not represented within a game, but isolated from each other. The major differences between the theory exploited in the present work and the one learned in (Goodacre, 1996) concern the clauses describing the special moves, namely castling, en-passant and promotion, since that work opted not to represent those moves. Additionally, we try as much as possible to avoid

negated literals with output variables, to increase efficiency, as explained above; because of that, some rules were re-learned to obey this requirement.

Initial Theory

The initial theory, which must be revised to become the theory of the variant's rules, describes the rules for achieving a legal move following the rules of chess (Burg and Just, 1987). Given a query about the legality of a move, the first step of the logic program is to inspect whether the opponent king is in check. If so, it is illegal to move the piece and the clause does not succeed. Otherwise, it proceeds by trying to find a valid next position for the piece. In more detail:

1. Either the piece is a king, and:

(a) It must be in a valid position (existing file and rank), and,

(b) it must not go next to the opponent king, and,

(c) it must not go into check, that is to a position threatened by an opponent piece and,

(d) it cannot go to a position where there is already a piece of the same color, and then,

(e) it can perform castling with a rook, if the castling conditions hold; or,

(f) it moves one square in any direction to an existing position on the board.

2. Or the piece is not a king, and:

(a) It must be in a valid position (existing file and rank), and,

(b) it cannot move if its king is in double check (in this situation the only piece allowed to move is the king), and,

(c) if it is protecting the king from a check (absolute pin) it can only move to a position where it continues protecting the king, or,

(d) it can move to stop a check if the king is in simple check, either by capturing the threatening piece or by blocking the path between the threatening piece and the king, as long as it also respects conditions (g) and (h), or,


(e) it may perform an attack, only in case the intended position is already occupied by an opponent piece, by moving according to the direction it is allowed to, except for the pawn, or,

(f) if the piece is a pawn, and ((d) or (e)) holds, it can perform an en-passant move; otherwise it captures one square diagonally, or,

(g) if the intended position is not occupied, the piece moves without capturing other piece;

(h) except for a knight, it must not pass over any other piece on the board while going to its next position;

(i) finally, it must move according to its basic move (bishops diagonally, rooks orthogonally, etc), to a valid position in the board (existing file and rank).

(j) if the piece is a pawn reaching the last rank, it is promoted; a pawn can move two squares vertically forward only if this is its first move in the game.

Note that if the rules of the chess variant differ from standard chess in any of the conditions above, the revision process must identify this through the examples and change those rules so that they reflect the rules of the variant. The pieces, files, ranks and colors are represented in the initial theory as ground facts, where each fact is related to a piece, a file, a rank or a color allowed in the game of chess, as exhibited in Table 4.3.

Table 4.3: Ground facts representing the pieces and colors in the game and the files and ranks of the board, considering international Chess

Pieces: piece(rook). piece(knight). piece(bishop). piece(pawn). piece(queen). piece(king).
Colors: color(white). color(black).
Files: file(a). file(b). file(c). file(d). file(e). file(f). file(g). file(h).
Ranks: rank(1). rank(2). rank(3). rank(4). rank(5). rank(6). rank(7). rank(8).

In order to avoid output variables in negated literals, some clauses were modified to provide the same definition using only input variables. Suppose, for

example, the clause below explaining the concept of check, where the terms related to the piece checking the opponent king are unified with (Piece, Colour1, File1, Rank1).

check(Round, Piece, Colour1, File1, Rank1, king, Colour2, File2, Rank2) ←
    nequal(Colour1, Colour2),
    % load the information about a piece on the board
    board(Round, Piece, Colour1, File1, Rank1),
    attack(Round, Piece, Colour1, File1, Rank1, king, Colour2, File2, Rank2).

The clause above holds true if there is an opponent piece attacking the king; in this case (Piece, Colour1, File1, Rank1) identifies the attacker. If one simply needs to know whether the king is in check or not, the information regarding the attacking piece can be kept local to the clause, i.e., it is not necessary to return it as an answer substitution. Thus, a simplified predicate for deciding if the king is in check has a clause wherein all variables may be considered input:

check(Round, King, Colour2, File2, Rank2) ←
    check(Round, Piece, Colour, File1, Rank1, king, Colour2, File2, Rank2).

In this case, the predicate check/5 has as input only the information about the king which might be in check, i.e., Colour2, File2, Rank2 refer to the color and position of the possibly checked king.

Now, we can have not(check(Round, King, Colour2, File2, Rank2)) in the body of a clause where all the terms are input variables, provided we have a subgoal board(Round, King, Colour2, File2, Rank2) first. The same was done with other concepts in the theory appearing as negated literals in the body of some clause. Although this optimization was not required by the revision itself, it was a useful step towards efficient construction of the bottom clause. The clauses representing special moves such as castling, en-passant and promotion were written by an expert and inserted by hand into the initial theory in the appropriate clauses. Defining some part of the knowledge with the help of an expert

in the domain is usual in the theory revision area.

Fundamental Domain Theory

Besides the initial theory, there is previous knowledge about the domain which is assumed to be correct and therefore does not need to be revised. The Fundamental Domain Theory (FDT) contains definitions valid for standard chess and also for the chess variants. During the revision process, this set of clauses is not modified, and some of the clauses in the theory being revised need to use them in order to be satisfied. After the revision process, the clauses in the FDT, together with the revised theory, will compose the theory of the chess variant. The FDT is mainly responsible for keeping the definitions of fundamental concepts such as differences between positions (files or ranks), relations of equality, non-equality, greater-than or less-than between files and ranks, and directions, among others. Below there is an example of a clause belonging to the FDT, which defines the concept of the north direction.

% (b,2 is north of b,3)
direction(File, Rank1, File, Rank2, north) ← less_than(Rank1, Rank2).

where the less_than concept is defined as

less_than(A, B) ← integer(A), integer(B), !, A < B. % for ranks
less_than(A, B) ← name(A), name(B), A < B. % for files

The initial theory has 109 clauses with 42 intermediate predicates, and the FDT has 42 clauses. The work in (Goodacre, 1996) had 61 clauses in the BK and learned 61 clauses. The difference in the total number of clauses comes from the clauses that were added to avoid output variables in negated literals, as explained in the previous section, and from the clauses representing special moves.

4.3 Modifying YAVFORTE to Acquire Rules of Chess Variants

Unfortunately, the revision algorithm described in chapter 3 cannot tackle the problem of revising between variants of chess. Analysis showed that chess generates a very large search space that could not be addressed well with uninformed search. Moreover, we must consider changes in the domain (such as differences in board size) that are not addressed well by the standard revision operators. We therefore propose a number of modifications to the current version of the system, described as follows.

4.3.1 Starting the Revision Process by Deleting Rules

In an attempt to decrease the complexity of the theory, and consequently of the whole revision process, we introduce a first step of deletion of rules. This step is performed as a hill-climbing iterative procedure where, at each iteration, the clauses used in proofs of negative examples are selected, each one is deleted in turn using the delete-rule operator, and the resulting theory is scored. The modified theory with the best score is selected for the next iteration. The process finishes when no deletion is able to improve the score. A similar algorithm was proposed to revise ProbLog programs (De Raedt et al., 2008b), where it was shown to be quite effective at finding minimal explanations. In our case, the goal is to find a “common denominator” between the two different games. We found this procedure reduces both theory size and noise, namely when the target theory is a specialized version of the initial theory. On the other hand, it could mean that the final revision is not the “minimally revised” one, since the proof of negative examples could instead be avoided through the addition of a few antecedents to the rules. However, the benefit of the decrease in runtime outweighs the possibility of returning a non-minimal revision. After this step, the algorithm is executed as usual. The procedure for deleting rules is exhibited as Algorithm 4.2 and is executed before line 1 of Algorithm 3.1. Note that the operator for deleting rules may still be used during the rest of the revision process (since it is one of the revision operators).


Algorithm 4.2 First Step for Deleting Rules
1: repeat
2:   generate specialization revision points;
3:   for each specialization revision point (a clause) do
4:     delete the clause;
5:     update the best revision found;
6:   if the best deletion improves the current score then
7:     delete the clause with the best score;
8: until no deletion improves the theory;
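A minimal Python sketch of this hill-climbing deletion loop is shown below. The functions `score` and `clauses_in_negative_proofs` are placeholders we introduce for the revision system's scoring function and its selection of clauses used in proofs of negative examples.

```python
def initial_deletion_step(theory, score, clauses_in_negative_proofs):
    """Greedy hill-climbing deletion of rules, as in Algorithm 4.2.
    `score(theory)` is higher for better theories; the loop stops when no
    single deletion of a clause used by negative proofs improves the score."""
    current = score(theory)
    while True:
        best, best_score = None, current
        for clause in clauses_in_negative_proofs(theory):
            trial = [c for c in theory if c != clause]   # delete one clause
            s = score(trial)
            if s > best_score:
                best, best_score = trial, s              # keep best deletion
        if best is None:
            return theory          # no deletion improves the score: stop
        theory, current = best, best_score
```

Each iteration commits to the single best deletion, so the procedure is greedy; as discussed above, this may miss revisions that would instead add a few antecedents.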

4.3.2 Using Abduction During the Revision Process

Abduction is concerned with finding explanations for observed facts, viewed as missing premises in an argument, from available knowledge deriving those facts (Flach and Kakas, 2000a). Usually, theory revision systems, including FORTE, use abduction when searching for generalization revision points, to locate faults in a theory and suggest repairs for it. Using abduction, a revision system determines a set of assumptions (atomic ground or existentially quantified formulae) that would allow a positive example to be proved. Consider, for example, the theory below, taken from (Mooney, 2000):

p(X) ← r(X), q(X).
q(X) ← s(X), t(X).

and the positive instance p(a), with BK r(a), s(a), v(a), unprovable by the current theory. The algorithm discovering revision points would find out by abduction that the instance is unprovable because the literal t(a) fails. One of its suggestions to fix the theory would be to add a new definition for t(X). We further benefit from abduction in three distinct situations of the revision process.
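Mooney's example can be traced with a toy prover. The sketch below is our own illustration, not FORTE's code: the clauses are propositionalised for the single constant a, whereas a real system works over first-order clauses with substitutions. It collects the unprovable leaf literals, i.e., the abducible assumptions.

```python
# Toy sketch of abduction locating the missing premise in Mooney's example.
RULES = {'p(a)': [['r(a)', 'q(a)']], 'q(a)': [['s(a)', 't(a)']]}
FACTS = {'r(a)', 's(a)', 'v(a)'}

def abduce(goal, missing):
    """Try to prove `goal`; collect unprovable leaf literals in `missing`.
    Returns True iff the goal is provable from FACTS and RULES alone."""
    if goal in FACTS:
        return True
    for body in RULES.get(goal, []):
        # a body proves the goal only if every literal in it is provable
        if all(abduce(lit, missing) for lit in body):
            return True
    if goal not in RULES:
        missing.add(goal)      # assumable leaf: an abduced explanation
    return False
```

Proving p(a) fails, and the only collected assumption is t(a), exactly the literal whose missing definition the revision would propose to add.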

Intermediate Predicates Abduction. Intermediate predicates are those appearing in the head of some clauses and also in the body of other clauses, but for which there are neither examples nor facts in the dataset. Suppose, for example, the extracted piece of the chess theory in Table 4.4. The clause we show

in the table handles the case of a king moving by calling the clause defining how a king should legally move. Note that there are two clauses for doing this, one of them dealing with a king in check and another one handling a move of a king not in check. The predicates kingmove/7, kingmove_in_check/7 and kingmove_no_check/7 are intermediate predicates, as they appear in the body of clauses and also in the head of other clauses. The instances in the dataset are of the predicate move/9 only.

Table 4.4: An extracted piece of the chess theory, to exemplify the need for abduction when learning intermediate concepts

move(BoardID, king, Colour, File1, Rank1, king, Colour, File2, Rank2) :-
    file(File1), rank(Rank1), file(File2), rank(Rank2),
    color(Colour1), nequal(Colour, Colour1),
    equal(Piecek, king), % no constants in negated predicates
    % there is only one king of Colour1 on the board
    board(BoardID, Piecek, Colour1, File3, Rank3),
    % the king is not in check
    not(any_check(BoardID, Piecek, Colour1, File3, Rank3)),
    % clause defining the legal move of a king
    kingmove(BoardID, king, Colour, File1, Rank1, File2, Rank2).

kingmove(BoardID, king, Colour, File1, Rank1, File2, Rank2) :-
    kingmove_in_check(BoardID, king, Colour, File1, Rank1, File2, Rank2).

kingmove(BoardID, king, Colour, File1, Rank1, File2, Rank2) :-
    kingmove_no_check(BoardID, king, Colour, File1, Rank1, File2, Rank2).

It may be the case that the clause(s) defining intermediate predicates are wrongly defined. The error propagates to clauses containing literals with such a predicate in their bodies, and so on. In this situation, such a literal is marked as a revision point, since it is in the path of the failing/successful proof of some positive/negative example. Suppose, for example, there is no definition, or a wrong definition, for a king moving when it is in check, as in the example of Table 4.4. In the first case, when the predicate is contributing to a positive example not being proved, besides trying to modify clauses from where the intermediate predicate is called, the revision is going to propose the following modifications concerning the

predicate itself:

• Delete antecedents from the body of the clause defining an intermediate predicate, if such a clause exists;

• Add rules with the intermediate predicate in the head of the clause.

In the second case, an intermediate predicate is in the path of a successful proof of a negative example. Possible modifications proposed to solve this problem are:

• Delete the rule with the intermediate predicate in the head.

• Add antecedents to the body of such a clause.

The second modification of each item concerns refining the clause by adding literals to its body. As we discussed in the previous chapter, this search space is composed of literals in the Bottom Clause, generated from a positive example covered by the clause, whose example predicate is the same as the one in the head of the clause. The problem is that we have no examples for such intermediate predicates. In the chess theory, for example, the instances correspond to move predicates, but there is no example in the dataset for kingmove, kingmove_in_check, or kingmove_no_check. Therefore, we need to tackle non-observation predicate learning (Muggleton and Bryant, 2000), where the concept being learned differs from the one observed in the examples. We introduce intermediate predicate abduction in order to "fabricate" the required example. From a positive instance belonging to the answer set (relevant examples) of the intermediate clause, i.e., one whose proof includes the clause, we obtain an "intermediate instance" using the current theory and FDT. The procedure takes an example from the answer set together with the intermediate predicate and instantiates that predicate to the first call to it encountered when attempting to prove the goal. The proof starts from the example and finds an instantiation for the specified intermediate predicate. After constructing the intermediate example, the bottom clause construction procedure is ready to run, followed by the refinement of the clause. The whole procedure can be visualized in Algorithm 4.3.


In Table 4.4, assume the predicate kingmove_in_check/7 is marked as a revision point and move(p12, king, black, e, 9, king, black, b, 2) is one of the relevant examples. The intermediate instance kingmove_in_check(p12, king, black, e, 9, b, 2) would be generated by the procedure, by first calling the clause corresponding to the move predicate, next calling the clause with kingmove/7 in its head and finally instantiating the intermediate instance. FORTE also uses a similar procedure to generate extensional definitions for lower level recursive predicates, when revising theories with recursive clauses. In (Muggleton and Bryant, 2000) the procedure we employ here to abduct the intermediate instance is used to obtain "contra-positive examples" for the intermediate predicates and then construct a Bottom Clause for them. Note that we can only reach an intermediate predicate from higher level clauses. In case there already exists a correct clause defining such a predicate, there is no need to further refine it, and such a literal is a candidate to be included in the body of higher level clauses (clauses where the intermediate predicate is in the second term of a determination definition). However, if there is no clause defining this predicate and it does not belong to the body of a higher level clause either, the revision process currently has no means of reaching this predicate and consequently of proposing the creation of a clause for it. This happens because, in order to change the provability of the example, the revision would have to include the intermediate predicate in the body of some clause taking part in the proof of the example and, at the same time, create a clause for the intermediate predicate. Currently, the revision system is not capable of proposing modifications to more than one revision point at the same time.
On the other hand, if there is a definition of the intermediate predicate that, even if not correct for all required examples, solves the problems of some of them, the revision would be able to include such a predicate in the body of a clause. In a future iteration, the predicate should be marked as a revision point and have its definition modified, so that the previously missing examples become correctly proved. To sum up, although this procedure does not escape the limitations of inverse entailment (Yamamoto, 1997), it is enough to address this class of requirement when revising a chess

theory. Finally, it is worth mentioning that there is great interest in learning concepts from non-observed predicates in the ILP community, by combining two strategies of logical reasoning: induction and abduction (Flach and Kakas, 2000b; Inoue, 2001; Moyle, 2003; Ray et al., 2004).

Algorithm 4.3 Algorithm for Refining a Clause whose Head Corresponds to an Intermediate Predicate
Input: A theory T and FDT BK, a positive instance pi, a clause C = l :- Body0, where l corresponds to an intermediate predicate and Body0 is a (possibly empty) set of literals
Output: A possibly refined clause C′ = l :- Body0 + Body1
1: ipi ← intermediate instance obtained by Algorithm 4.4
2: BC ← Bottom Clause created from ipi, T and BK
3: C′ ← C refined with literals in BC
4: return C′

Algorithm 4.4 Algorithm for fabricating an intermediate instance

Input: A theory T and FDT BK, a positive instance pi, a clause C = l :- Body0, where l corresponds to an intermediate predicate and Body0 is a (possibly empty) set of literals
Output: An intermediate instance ipi
1: repeat
2:   choose a clause to start the proof of pi
3:   while l ∉ the derivation path or the proof does not finish, with a failure or a success do
4:     continue to build the proof tree
5:   if l is the atom in the leaf of the proof path then
6:     try to prove l :- Body0, accordingly unified
7:     if it is possible to prove l :- Body0 then
8:       ipi ← ground atom coming from the proof of l
9: until ipi is found
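The fabrication step of Algorithm 4.4 can be sketched as a toy meta-interpreter: while proving the positive instance, record the first (instantiated) call to the given intermediate predicate. Everything below — the tuple representation, the simplified arities, the predicate and variable names — is our own illustrative assumption, not the thesis implementation.

```python
# Toy SLD-style meta-interpreter that, while proving a positive instance,
# records the first call to a target intermediate predicate: that recorded
# call is the "fabricated" intermediate instance of Algorithm 4.4.

def is_var(t):
    # convention: variables are capitalized strings, e.g. "File1"
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(clause, suffix):
    head, body = clause
    ren = lambda lit: (lit[0],) + tuple(
        f"{t}_{suffix}" if is_var(t) else t for t in lit[1:])
    return ren(head), [ren(l) for l in body]

def solve(goals, program, s, target, found, depth=0):
    """Backward chaining; records the first call to `target` in `found`."""
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    g = (goal[0],) + tuple(walk(t, s) for t in goal[1:])
    if g[0] == target and not found:
        found.append(g)  # the abducted intermediate instance
    for i, clause in enumerate(program):
        head, body = rename(clause, f"{depth}_{i}")
        s2 = unify(head, g, s)
        if s2 is not None:
            yield from solve(body + rest, program, s2, target, found, depth + 1)

# Toy theory mirroring the structure of Table 4.4 (arities simplified):
program = [
    (("move", "B", "C", "F1", "R1", "F2", "R2"),
     [("kingmove", "B", "C", "F1", "R1", "F2", "R2")]),
    (("kingmove", "B", "C", "F1", "R1", "F2", "R2"),
     [("kingmove_in_check", "B", "C", "F1", "R1", "F2", "R2")]),
    (("kingmove_in_check", "p12", "black", "e", "9", "b", "2"), []),
]

found = []
example = ("move", "p12", "black", "e", "9", "b", "2")
next(solve([example], program, {}, "kingmove_in_check", found), None)
print(found[0])  # ('kingmove_in_check', 'p12', 'black', 'e', '9', 'b', '2')
```

The recorded call is already ground because the proof starts from a ground example and bindings flow down through the intermediate clauses, mirroring the thesis example in Table 4.4.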

Abducting Atomic Facts to be Part of the Theory

The simplest way of employing abduction in theory revision is to include in the theory assumptions that allow positive examples to become provable. Thus, the need for introducing an abducible predicate is identified during the search for generalization revision points. Consider, for example, the extracted piece of the chess theory in Table 4.4, and assume the goal is to modify the theory so that it is possible to represent

a game of chess played on a larger board, say a 9X9 board. The positive instance move(p12, king, black, e, 8, e, 9) will fail, since it would not be possible to prove rank(9). We let any failing predicate be a candidate for abduction, provided it obeys three additional requirements: (1) the predicate must have a modeh definition in the modes declarations, so that it can become a clause whose body is empty, (2) there must be at least one modeh definition with only constants as its terms, and (3) the predicate must be defined as abducible. To abduct rank(9), as required in the example above, a modeh(1, rank(#rank)) would have to be in the modes definitions. There is a maximum number of abducible predicates allowed in the theory, in order to prevent it from having several predicates that perhaps prove only one positive example. Note that this abduction is part of the add new rule operator, which acts on predicates by creating an explanation for them. Because of that, an abducted predicate is only proposed as a revision if there is no way of creating a clause with such a predicate in its head. Thus, what really happens is: if the add new rule operator is unable to create a clause defining the predicate, it is checked whether the predicate is abducible and whether the maximum number of abducible predicates has not yet been reached. If the answer is affirmative, the operator looks for the modeh definition mentioned above and, from it and an instance indicating the necessity of abduction, the atomic fact is created. As it is a proposed revision, the fact is only abducted if it brings an improvement to the current theory. Note that it is necessary to call Algorithm 4.4 so that the final atomic fact is found. The algorithm performing this task is exhibited as Algorithm 4.5. Note that if the abducted fact becomes a faulty point in the theory, it may eventually be generalized/specialized in the next iterations.
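The three requirements can be summarized in a small sketch. The function below is our own illustration of the checks performed before proposing a ground fact (it is not the thesis code); modeh declarations are assumed to be pairs of a functor and a tuple of mode terms, with constant-only terms written as `#type`.

```python
# Hedged sketch of the atomic-fact abduction check (in the spirit of
# Algorithm 4.5): a failing predicate is turned into a proposed ground fact
# only if (1) a constant-only modeh declaration exists for it, (2) it is
# declared abducible, and (3) the abduction budget is not exhausted.

def abduce_fact(failing_atom, modeh_decls, abducibles, n_abduced, max_abd):
    functor, args = failing_atom[0], failing_atom[1:]
    if functor not in abducibles or n_abduced >= max_abd:
        return None
    for d_functor, d_modes in modeh_decls:
        # constant-only modeh: every mode term is of the form '#type'
        if d_functor == functor and len(d_modes) == len(args) \
                and all(m.startswith("#") for m in d_modes):
            return (functor,) + args  # propose the ground fact as a new rule
    return None

# The 9X9-board scenario: rank(9) fails, rank/1 is declared abducible,
# and modeh(1, rank(#rank)) is among the mode declarations.
modeh_decls = [("rank", ("#rank",))]
fact = abduce_fact(("rank", "9"), modeh_decls, {"rank"}, n_abduced=0, max_abd=3)
print(fact)  # ('rank', '9')
```

If the budget is spent or the predicate is not abducible, the function returns None and the operator simply proposes nothing, matching the guard in Algorithm 4.5.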

Look-ahead abduction

The last abduction approach addressed in this thesis is a strategy we call look-ahead abduction, which acts during the search for revision points. It works as follows. When searching for generalization revision points, faulty abducible predicates are assumed to be true and the search continues,


Algorithm 4.5 Atomic Facts Abduction Algorithm
Input: Modes definitions M, abducible predicate pred/N, maximum number of abducibles maxabd, a theory T, FDT BK, a positive instance pi
Output: An atomic fact pred(t1, ..., tn), proposed to be included in T
1: if the number of abducible predicates so far is less than maxabd then
2:   pred(t1, ..., tn) ← Algorithm 4.4
3:   if pred(t1, ..., tn) matches a modeh definition where the terms are constants only then
4:     return pred(t1, ..., tn)

looking for further revision points, possibly depending on the abducible predicate. In this way, it is possible to fix a clause with at least two problematic literals, since the first one failing is considered as proved. Suppose, for example, we have the positive instance a(1, 2), the negative instance a(1, 3), the clause a(X,Y) :- b(X,Z), c(X,Y,Z) and the BK facts d(1, 2, 4) and d(1, 2, 5). The positive instance a(1, 2) is unprovable and thus generalization revision points must be found. When doing that, it is noticed that the literal b(1,Z) cannot be proved. Thus, originally, the search for revision points finishes, marking that clause and the literals b(X,Z) and a(X,Y) as generalization revision points. Notice that the revision points search procedure has no means to verify whether c(1, 2,Z) could be proved or not, since the literal before it has already failed. Although there are two revision points (the single clause and predicate b/2), it may be the case that no revision is able to fix the misclassification of the positive instance without also making the negative instance provable. Considering the most common delete antecedent and add rule operators, note what the revision operators could propose:

1. Deleting either antecedent b(X,Z) or c(X,Y,Z): this does not make the positive instance become provable.

2. Creating the most general rule a(X,Y) would make the negative instance provable. Only the most general rule could be created in this case.

3. Creating a rule capable of proving b(1, _) does not solve the problem of the non-proof of c(1, 2, _) and therefore does not bring any benefit to the theory.

However, if we assume b/2 as an abducible predicate during the search for

revision points, c(X,Y,Z) is also marked as a generalization revision point and a rule could be created to explain c(1, 2, _), say c(X,Y,Z) :- d(X,Y,Z). Later, either b(X,Z) would have to be removed from the clause or a definition would be created for it (possibly including the single assumption b(1, 2)). For a more concrete example regarding the chess revision problem, consider the case of a game that has a different piece, say a counselor, that moves exactly like the bishop, except that it cannot move backwards, only forward. Consider the simplified version of the move concept in Table 4.5. The revision system cannot define the basic move for such a new piece, as it is required that the piece itself be known by the theory. However, if one abducts the literal piece(counselor) during the search for revision points, the system is able to mark basic_move as a revision point and then propose a definition for it. The abducted literal piece(counselor) is going to be part of the proposed revision, besides the possibly created definition for basic_move(BoardID, counselor, ...).

Table 4.5: An extracted piece of the chess theory, to exemplify the need for abducting predicates when searching for revision points

move(BoardID, Piece, Colour, File1, Rank1, Piece, Colour, File2, Rank2) :-
    file(File1), rank(Rank1), file(File2), rank(Rank2),
    color(Colour), piece(Piece),
    % the piece is on the board
    board(BoardID, Piece, Colour, File1, Rank1),
    % any other necessary verifications
    ...
    basic_move(BoardID, Piece, Colour, File1, Rank1, File2, Rank2).

% how a bishop moves
basic_move(BoardID, bishop, Colour, File1, Rank1, File2, Rank2) :-
    abs_fdiff(File1, File2, Diff),
    abs_rdiff(Rank1, Rank2, Diff).

Thus, to overcome the problem of more than one faulty literal in a clause, we take advantage of abduction and consider abducible predicates (up to a maximum number of predicates) in the theory under revision. To do so, if a failing abducible

literal is found during the search for generalization revision points (points failing to prove positive examples), it is assumed to be correct and kept in the revision point structure as a pending abducible predicate. Note that at this moment the literal must be an atomic fact. The search for revision points proceeds, possibly including other abducible predicates as pending facts, until the proof tree reaches either a leaf or a failing literal not considered abducible. In the first situation, the proposed revision is to include the abducible literals in the theory, as if they were new rules, similarly to what is done in the previous subsection. In the second case, the system marks the failing literal as a generalization revision point with pending abducible literals. When proposing modifications to such a point, the abducible predicates are considered as part of the theory. Eventually, in case the revision is chosen to be implemented, besides the generalization performed in the revision point, the system must also include the abducible predicates; otherwise the generalized revision point would continue failing. Note that if an abducted predicate prevents some positive example from being proved or helps a negative example to be proved, the next iterations are going to mark it as a generalization/specialization revision point. The process of obtaining abductive explanations and possibly generalizing them later is performed in (Moyle, 2003), but to learn clauses (completing definitions of background predicates) and not to revise them.
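The look-ahead search over a clause body can be sketched as follows. This is our own simplified illustration (a flat list of literals and a precomputed provability set, rather than a full proof tree): failing abducible literals are assumed true and kept as pending, so the search can reach a later failing literal in the same body.

```python
# Sketch of look-ahead abduction during the search for generalization
# revision points: failing abducible literals are assumed true ("pending"),
# letting the search reach a non-abducible failing literal further down
# the clause body, which is then marked as the revision point.

def find_revision_point(body, provable, abducibles):
    """body: list of literal tuples; provable: set of literals that succeed;
    abducibles: set of functors declared abducible."""
    pending = []
    for lit in body:
        if lit in provable:
            continue                      # this literal succeeds; move on
        if lit[0] in abducibles:
            pending.append(lit)           # assume true, keep looking ahead
        else:
            return {"revision_point": lit, "pending_abducibles": pending}
    # no non-abducible failure: propose the pending facts themselves
    return {"revision_point": None, "pending_abducibles": pending}

# The a/2 example from the text: b(1,Z) fails but b/2 is abducible, so the
# search reaches c(1,2,Z) and marks it as the revision point, with b pending.
body = [("b", "1", "Z"), ("c", "1", "2", "Z")]
rp = find_revision_point(body, provable=set(), abducibles={"b"})
print(rp["revision_point"])      # ('c', '1', '2', 'Z')
print(rp["pending_abducibles"])  # [('b', '1', 'Z')]
```

When a revision built on this point is implemented, the pending facts must be added together with the generalization, exactly as described above.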

4.3.3 Using negated literals in the theory

YAVFORTE and FORTE_MBC are able neither to introduce negated literals in the body of a clause nor to revise negated literals. Negation is essential to elegantly model the chess problem, since we need to represent concepts such as the king is not in check or a piece is not landing on another piece, among others. In order to add negated literals to the body of clauses, it is required that the Bottom Clause construction procedure be able to elicit them. Therefore, it is necessary to explicitly specify modeb and determination declarations for them. In the first case, the modeb declaration must be in the format:

modeb(RecallNumber, not(literal(Modes)))

stating that not(literal(...)) can appear in the body of some clause. The determination declaration must be in the format

determination(HeadPredicate/Arity, not(BodyPredicate/Arity))

stating that not(BodyPredicate/Arity) is allowed to be in the body of a clause whose head has HeadPredicate/Arity. In contrast to Aleph, we explicitly define the arity of the negated literal in determination definitions. The inclusion of negation into logic programs is traditionally considered a hard task, since the incorporation of full logical negation leads to super-exponential time complexity of the prover. One of the most successful and widely used alternatives to full negation is the procedural approach of Negation as Failure (NAF) (Clark, 1978). One of the major drawbacks of NAF is that it cannot produce answer substitutions (new bindings) for the variables of a negated query, requiring the negated literal to be ground in order to be proved. If one wants more than a simple "yes" or "no" as answer, it is better not to apply negation as failure, since it may result in floundering of the goal (Marriott et al., 1990; Drabent, 1996). As a result, to use pure NAF inside the bottom clause construction procedure it is necessary to ensure there are no free variables in any not(Goal) that might be called. To do so, the mode type of all variables of negated literals must be input, so that it is guaranteed that when not(Goal) is called all terms are ground, since they will have been instantiated by previous literals. The initial theory learned for the game of chess used in this work makes use of negated literals with output variables, since in some cases it is necessary to check the negation of a concept. For example, it could be necessary to verify whether a king is not in check through the clause defining the check concept itself. Normally, such a clause would provide as output the piece responsible for putting the king in check.
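Under the closed-world assumption, the behaviour of NAF and the grounding requirement can be sketched in a few lines. The toy check below is our own illustration (the fact base and predicate names are invented for the example, not taken from the thesis code): a ground negated goal succeeds exactly when the goal is underivable, while a goal with a free variable is rejected, which is the floundering the text describes.

```python
# Sketch of negation as failure (NAF) under the closed-world assumption:
# not(G) succeeds iff the GROUND goal G cannot be derived. A non-ground
# negated goal flounders, which is why the bottom-clause procedure requires
# all variables of a negated literal to be input (already bound).

facts = {("check", "p12", "king", "black")}  # toy fact base (illustrative)

def is_ground(goal):
    # convention: variables are capitalized strings, e.g. "Piece"
    return not any(t[:1].isupper() for t in goal[1:])

def naf(goal):
    if not is_ground(goal):
        raise ValueError("floundering: negated goal has free variables")
    return goal not in facts  # unprovable under CWA => negation succeeds

print(naf(("check", "p12", "king", "white")))  # True: white is not in check
try:
    naf(("check", "p12", "Piece", "black"))    # free variable Piece
except ValueError as e:
    print(e)
```

The last call is exactly the situation of the chess theory's check concept: asking which Piece gives check through a negated literal needs a binding that plain NAF cannot produce.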
To address this problem, the work presented in (Goodacre, 1996) made a special provision: a redefinition of not was introduced into the general proof which, after succeeding on the failure of the literal, skolemises the variables so that they cannot be used again. We add the further requirement in the bottom clause construction procedure that the output variables of such literals be singletons (they appear only once in the clause) and therefore must not be used as input variables of

other literals. Although we have re-learned some concepts so that output variables in negated literals become dispensable (subsection 4.2.2), we keep the implementations above in the system, to handle situations where this re-learning process would not be possible. Constant modes are not allowed in either work, since this would involve generating literals with all possible values, leading to a very inefficient process. In case it is essential to bind variables in a negated query, the most used alternative to overcome the drawbacks of NAF is to employ constructive negation techniques (Chan, 1988; Drabent, 1995). These extend NAF to handle non-ground negative subgoals, so that it is possible to construct new bindings for query variables. They work as follows: after running the positive version of the negated literal in the same way NAF does, the solution of the possibly non-ground goal is collected as a disjunction, and then this disjunction is negated to get a formula equivalent to the negative subgoal. The chess problem addressed here has not shown the need for constructive negation, since the initial theory does not need to bind variables from negated literals to answer a query. However, it would be worthwhile to implement such an approach for the general case of negation in the revision process. We leave this question to future work. In addition to inserting negated literals in the body of clauses, it may be the case that a negated literal is the culprit for the misclassification of examples. Remember from the laws of chess that a king cannot move next to the other king, in such a way that the latter could reach the former after a basic move. In the theory we tackle here, this is verified through a negated literal, as exemplified in line 6 of Table 4.6.
Now, suppose a chess variant with a more restrictive version of this rule: a king cannot move to a position that is reachable by the other king after two basic moves. In this case, a positive example would be unprovable, since king_next_king/9 would succeed, making its negation fail. To fix this problem, one could modify the second clause in the table so that it becomes more specific. One possible modification is to add a second basic_move/7 predicate to the body of the rule. Note that the clause is going to be specialized, even though we want to solve the problem of a positive example. Because the revision has been indicated by a negated literal, it


Table 4.6: An extracted piece of the chess theory, to exemplify the need for marking a negated literal as a revision point

kingmove1(BoardID, king, Color, File1, Rank1, File2, Rank2) :-
    not(any_land_on(BoardID, king, Color, File1, Rank1, Color, File2, Rank2)),
    color(Color), nequal_color(Color, Color2),
    board(BoardID, king, Color2, Filek, Rankk),
    basic_move(BoardID, king, Color, File1, Rank1, File2, Rank2),
    not(king_next_king(BoardID, king, Color2, Filek, Rankk, king, Color, File2, Rank2)).

king_next_king(BoardID, king, Color1, File1, Rank1, king, Color2, File2, Rank2) :-
    basic_move(BoardID, king, Color1, File1, Rank1, File2, Rank2).

is necessary to reverse the proposed modifications: a clause corresponding to the negated literal must be specialized to fix the misclassification of positive examples, since this is going to make the clause unsatisfiable and therefore its negation succeed. Similarly, in situations where a negative example has been proved because the negation of a literal succeeds, it is necessary to generalize such a clause, so that it can be satisfied and its negation fails. We thus introduced a procedure for handling a faulty negated literal during the revision process. Roughly speaking, if the negated literal is responsible for a failed proof of positive examples, it is treated as a specialization revision point. On the other hand, if the negated literal leads to a proof of a negative example, it is treated as a generalization revision point. This is a preliminary attempt at introducing non-monotonic reasoning into YAVFORTE.
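The reversal rule can be stated compactly: each negation layer flips the usual repair direction. The helper below is our own minimal sketch of that bookkeeping, not the YAVFORTE implementation.

```python
# Sketch: a faulty negated literal reverses the usual repair direction.
# Normally, a failed positive example asks for GENERALIZATION and a proved
# negative example asks for SPECIALIZATION; when the fault is found inside
# not(L), the clause defining L must be repaired in the OPPOSITE direction.

def revision_type(inside_negation, example_is_positive):
    base = "generalize" if example_is_positive else "specialize"
    if not inside_negation:
        return base
    return "specialize" if base == "generalize" else "generalize"

# not(L) blocks a positive example -> specialize L's clause so L fails
print(revision_type(inside_negation=True, example_is_positive=True))   # specialize
# not(L) lets a negative example through -> generalize L's clause so L succeeds
print(revision_type(inside_negation=True, example_is_positive=False))  # generalize
```

A fuller treatment would track nested negations by counting layers and flipping once per layer, but a single flip covers the cases handled in this chapter.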

4.4 Experimental Results

Experimental methodology

To present the experimental results obtained with the framework discussed in this chapter, we generated datasets for 3 different chess variants. The variants, selected from (Pritchard, 2007), are described as follows.

• Gardner minichess. Minichess comprises a set of chess variants played with regular pieces and standard rules, but on smaller boards. There are games played on boards of size 3X3, 4X4, 4X5, 5X5, 5X6 and 6X6. The goal of this family of games is to make the game simpler and shorter than international


chess. Figure 4.5 shows several initial boards for minichess games. Here we focus on Gardner's minichess, the smallest chess game (5X5) in which all original chess pieces and legal moves are still used, including the pawn double move, castling and en passant.

• Reform chess, also known as Free capture chess, differs from international chess in that either side may capture its own men as well as the opponent's. Only the friendly king cannot be captured.

• Neunerschach chess is played on a 9X9 board, with two extra pieces but without the queen. The first extra piece, called Marshall, moves like a queen. The second, named Hausfrau, also moves like a queen, but only two squares. The pieces are arranged on the board in the following order: Rook-Knight-Knight-Marshall-King-Hausfrau-Bishop-Bishop-Rook.

Figure 4.5: Initial boards of minichess games, taken from Wikipedia

We employed the framework described in this chapter to obtain theories describing each one of the above variants. We performed 5X2-fold cross validation and scored the revisions using f-measure. The datasets are arbitrarily composed of 5

simulated games (examples), where each round has 1 positive and 5 negative examples, and the maximum number of rounds per game is 25. Next we present the evaluation measures obtained for each variant in Table 4.7.

Table 4.7: Evaluation of the revision on chess variants

Variant            | # Pos. examples | # Neg. examples | # Facts in Context | # Initial clauses | # Final clauses | Initial Acc (%) | Final Acc (%)
Gardner minichess  | 97              | 500             | 3294               | 151               | 147             | 59              | 99.83
Reform chess       | 100             | 500             | 3360               | 151               | 154             | 93              | 100
Neunerschach chess | 100             | 500             | 12651              | 151               | 160             | 87              | 100

Note that the variant with a smaller board had the size of its theory decreased, while the other variants, which include all the rules of chess plus additional ones, had the size of the theory increased. We could say that the first case is a specialized version of chess while the others are more general versions of the game. The number of clauses includes the initial (revised) theory and the FDT, which is fixed. Notice that using only the initial theory can still achieve a good accuracy, as most rules are shared between the different variants. On the other hand, accuracies were significantly improved through revision in every case, suggesting that the revision makes the theory correctly reflect the dataset. The next subsection discusses the revisions performed by the system in each variant, together with remarks about them.

4.4.1 Discussions about the automatic revisions performed by the revision system

Gardner's Mini-chess

The system performed the following revisions on the initial theory to obtain a correct theory for Gardner minichess:

1. The delete rule step was able to remove the following clauses from the theory: file(f), file(g), file(h), rank(6), rank(7), rank(8). After that, any negative example coming from an invalid position or going to an invalid position becomes unprovable, since it does not satisfy one of the conditions 1.(a), 1.(f),


2.(a) or 2.(h) of section 4.2.2.

2. The add rule generalization operator added the following clauses to the theory:

• basic_move(pawn, black, File, 4, File, 2), which allows the black pawn to move 2 squares. This is necessary because in the chess theory the black pawn moves 2 squares only from rank 7 (within condition 2.(i)), which is not a valid rank in Gardner chess.

• promotion_zone(pawn, white, File, 5), which allows the white pawn to be promoted when it reaches the last rank of the Gardner mini-chess board.

Remarks about the revision/learning process in Gardner Mini-chess

• The final accuracy was on average 99.83%, since not all folds contained examples of promotion and of the black pawn moving 2 squares, as these moves are rarely executed during a game.

– Promotion occurs rarely both in chess and in Gardner mini-chess, since the pawn is usually captured before reaching the last rank.

– The initial double moves of pawns are difficult on Gardner's smaller board since, differently from chess, the intended positions (rank 4 for the white pawn and rank 2 for the black pawn) are already occupied by opponent pieces, and a pawn cannot capture a piece through its usual move.

• The best final theory is able to correctly classify all the moves of this variant, although it did not remove the clauses defining the pawn double move from rank 7 to rank 5 and pawn promotion upon reaching rank 8, as to perform the former move the pawn would have to be on rank 7. As previously stated, the clauses defining these ranks were removed from the theory and therefore they became invalid ranks. In this way, condition 2.(a) of section 4.2.2 is not satisfied and the piece cannot move anyway. A post-pruning procedure should remove such useless clauses from the theory, so that the best final theory would perfectly correspond to the target theory of Gardner Mini-chess.


• We have tried to run Aleph and Progol 5 to find the rules of Gardner's minichess. As they do not revise a theory, we let the initial theory and the FDT be the background knowledge. Aleph's abduction procedure was not working properly. Progol, on the other hand, was able to abduct the predicates basic_move and promotion_zone. However, it did not let such predicates be in the final theory since, according to its evaluation function, there had been no improvement in score. The revision system is more fortunate in this respect, since it focuses on the misclassified examples to propose revisions. Nor could Progol delete rules, as this is not one of its refinement operators.

Reform chess: Unusual capture rule

The system performed the following revisions on the initial theory:

1. According to condition 1.(e) of section 4.2.2, the king must not go to a position where there is already a friendly piece. This is declared through subgoals of the form not(land_on(.., Colour, ..., Colour, ...)) in the clauses defining how a king may move. As free capture chess allows the king to capture a friendly piece, the delete antecedent operator removed these literals from their respective clauses.

2. In the original theory, a piece may move to an occupied position only if an opponent piece is there. This is defined as a move by the attack concept. In Reform chess a piece is also allowed to attack a friendly piece. Because of that, the revision chosen to be implemented in most of the folds was proposed by the add rule operator. The main literals in the body of such a rule allow an attack on a piece of the same color, provided that piece is not a king. Note that the original rule defining the attack concept remains in the theory.

3. Progol did not propose the creation of new rules for the same predicate, as the revision did. Instead, it tried to learn a new definition of the top-level move predicate, without success.

Unusual pieces and larger board: Neunerschach

The system performed the following revisions on the initial theory:


1. The delete rule step discussed in section 4.3.1 removed the clause piece(queen) from the theory;

2. Abduction when searching for revision points (the third introduced type) added the facts piece(marshall) and piece(hausfrau) to the theory and, just after that,

3. From the rules defining the basic moves of the queen, the add rule operator created rules for the marshall by replacing the constant queen in the head of such rules with marshall. This was done by first treating the constants in the clause, substituting them with equality predicates; FORTE always does that before searching for revision points. Then, the equality literal was deleted and the constant marshall was made to replace queen;

4. The add new rule operator included the following new rules in the theory:

• Rules defining the basic move of the hausfrau, so that this piece can move diagonally and orthogonally, but only two squares;

• New ground clauses, produced by abduction of the second type, to make the last file and rank valid, namely file(i) and rank(9).

Remarks about the revision process in Neunerschach chess

Note that without the abduction procedure it would be difficult for the revision process to induce the rules for the new pieces since, first of all, a piece must be valid to match condition 2.(a) of section 4.2.2 and only after that can the piece try to move. Thus, only including the clauses defining the pieces would not bring any benefit to the theory, since there were no clauses defining how such pieces should move. Additionally, as the pieces did not exist in the theory, the search for revision points would not be able to reach the literals defining the basic moves of the pieces and then identify them as revision points. Even if it could, defining the basic moves of the pieces is not enough for them to move, since the pieces must also be valid in the theory. The move generator procedure was not able to generate games with promotion cases for this dataset, probably due to the size of the board combined with the maximum number

of rounds in the game. Thus, the revision process failed to correct the rules of promotion, which would allow the pawn to be promoted to the new pieces and the white pawn to be promoted on rank 9 instead of rank 8. We expect that using games with a larger number of rounds would allow us to represent them, but at the cost of much larger datasets. Finally, Progol was able to propose the abduction of predicates file/1, rank/1 and piece/1, but it could not create clauses defining the basic moves of the new pieces.

4.5 Conclusions

We presented a framework for learning variants of a game through automatic theory revision from examples. The framework is composed of a theory revision system, the initial rules of chess (expressed as the initial theory, which is allowed to be modified) together with the fundamental domain theory (assumed to be correct), and a move generator for obtaining the examples. We described the modifications performed on the revision process to best address the chess revision problem, including (1) the introduction of an initial step for deleting rules responsible for misclassified negative examples, (2) the use of abduction at three different moments and (3) the use of negation. The experimental results encompassed three variants of chess, ranging from a specialised version of chess (minichess) to a more general version (including a larger board and new pieces). The revision was able to return final theories correctly describing most of the rules of the variants. The missing cases were due to the lack of examples of rare events during a game, such as promotion. Also, the final theory would benefit from a post-pruning procedure to remove clauses that have become useless after the revision.

We are aware that the datasets were generated in a quite arbitrary way, since the number of examples and the depth of the games were chosen according to the need of generating the situations necessary for the revision. A better experimental setting should include several datasets of at least varying sizes. Notice, however, that our primary goal was to demonstrate the capability of the revision system of acquiring the rules of variants of the game, using the rules of the traditional game as a starting point. We also wanted to show that it was necessary to change the base revision system to achieve that. In this way, new issues were introduced in the revision system that can also be useful for other domains besides chess.

Thus, as future work, we intend to further experiment with the framework on datasets varying in the maximum number of rounds and in the number of simulated games. We would like to apply the framework to other, more complex variants, namely regional variants such as Shogi. Shogi and several other variants of chess require the knowledge of an entirely new concept. The Einstein chess variant, for example, has the concept of demotion, where each time a piece moves without capturing it is "demoted" to a less valuable piece. This concept does not exist in international chess, and if one would like to obtain the theory for this variant, literals for demotion would have to be included in the language by hand. This is not the best scenario, since the previous knowledge is assumed to concern only the situations occurring in traditional chess. Therefore, the revision system would greatly benefit from predicate invention operators (Muggleton and Buntine, 1988).

A more ambitious future work is learning playing strategies (Bain and Muggleton, 1994). In this case, we would like to obtain strategies for playing the variants of chess from strategies of chess. To do that, we believe we would benefit from probabilistic theory revision, so that uncertainty about the strategies could be represented. Finally, we would like to apply theory revision to other tasks involving transfer learning, such that the domains share the same predicate language. We believe that in this way it would not be necessary to map predicates of one domain to predicates of another domain.

Chapter 5

Stochastic Local Search

Searching over large spaces is a recurrent problem in Computer Science. In order to obtain good hypotheses while still keeping the search feasible, one may take advantage of local search algorithms, which start by generating a candidate hypothesis at some location of the search space and afterwards move from the present location to a neighbouring location. Each location has a relatively small number of neighbours and each move is determined by a decision based on local knowledge (Hoos and Stützle, 2005). In this way, local search algorithms abandon completeness to gain efficiency. It is possible to further improve efficiency, and also to escape from local optima, by making use of randomised choices when generating or selecting candidates in the search space, through Stochastic Local Search (SLS) algorithms. One major motivation and successful application of SLS has been satisfiability checking of propositional formulae, namely through the well-known GSAT (Selman et al., 1992) and WalkSAT (Selman et al., 1996) algorithms. A large number of tasks in areas such as planning, scheduling and constraint solving can be encoded as satisfiability problems, and empirical observations show that SLS can often substantially improve their efficiency (Chisholm and Tadepalli, 2002; Rückert and Kramer, 2003). Randomisation may also be used to improve other search strategies, such as backtracking search. For example, Rapid Randomised Restarts (RRR) introduces a stochastic element into backtrack-style search, restarting the search from scratch if it is not making progress (Gomes et al., 1998; Gomes et al., 2000).


This has led to an interest in applying such techniques to data-mining applications, and more specifically to multi-relational data mining. Initial work in the area has indeed shown promising results for stochastic techniques when learning theories from scratch in ILP systems (Paes et al., 2006b; Železný et al., 2006). For two very different algorithms, results showed very substantial improvements in efficiency, with little or no cost in accuracy. Motivated by these works, this thesis also contributes to the development of SLS methods for revising first-order logical theories. Thus, in order to ground the contributions of the next chapter, this chapter presents an overview of SLS algorithms and of SLS methods employed in Inductive Logic Programming. The chapter starts by describing standard SLS methods in section 5.1. Then, well-known and widely used SLS algorithms, such as GSAT and WalkSAT, are reviewed in section 5.2. Finally, in section 5.3, SLS algorithms designed to efficiently learn first-order logic theories are discussed. This last section includes a technique developed with our contribution, where first-order logic theories are learned using propositionalization combined with an SLS algorithm (Paes et al., 2006b).

5.1 Stochastic Local Search Methods

The key ideas of the search process performed by a Stochastic Local Search algorithm are as follows.

1. Initialisation: An initial candidate solution is selected, usually by generating a candidate at random;

2. Move step: Iteratively, the process decides (possibly at random) to move from the present candidate solution to a neighbouring candidate solution, usually (but not always) considering a function to evaluate the neighbours.

3. Stop criterion: The process finishes when it meets a termination criterion, such as reaching a maximum number of iterations or finding a solution.

Consider, for example, a Stochastic Hill Climbing (or Iterative Descent) strategy (Russell and Norvig, 2010). It starts from a randomly selected point in the search space (initialisation) and tries to improve the current candidate solution by choosing, with uniform probability, a neighbour of the current candidate (move step), but requiring that the value of the evaluation function improves. The process finishes when none of the neighbours improves the evaluation function (stop criterion). A Greedy Stochastic Hill Climbing strategy (or Discrete Gradient Descent) would try to perform the best move, by choosing uniformly the next candidate solution from the set of maximally improving neighbours (the best possible improvement). As such a method requires a complete evaluation of all neighbours in each iteration, one may prefer a First Improvement neighbour selection strategy, which moves to the first neighbour encountered that improves the value of the evaluation function.
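The first-improvement variant just described can be sketched in a few lines of Python. The bit-string landscape, the `bit_neighbours` generator and the use of `sum` as evaluation function are illustrative assumptions, not part of the text; only the neighbour-selection logic matters.

```python
import random

def first_improvement_step(candidate, neighbours, score):
    """Return the first randomly encountered neighbour that improves the score,
    or None when the candidate is a local optimum."""
    current = score(candidate)
    shuffled = list(neighbours(candidate))
    random.shuffle(shuffled)   # random visiting order: stochastic first improvement
    for n in shuffled:
        if score(n) > current:
            return n
    return None

def stochastic_hill_climbing(initial, neighbours, score):
    """Climb until no neighbour improves the evaluation function (stop criterion)."""
    candidate = initial
    while True:
        improved = first_improvement_step(candidate, neighbours, score)
        if improved is None:
            return candidate
        candidate = improved

# Toy landscape: maximise the number of 1-bits; neighbours differ by one bit flip.
def bit_neighbours(bits):
    for i in range(len(bits)):
        yield bits[:i] + (1 - bits[i],) + bits[i + 1:]

best = stochastic_hill_climbing((0,) * 8, bit_neighbours, sum)
# On this smooth landscape every local optimum is global, so best is (1, 1, ..., 1).
```

Note that on a landscape with poor local optima the same loop would stop early, which motivates the worsening steps and restarts discussed next.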

5.1.1 SLS Methods Allowing Worsening Steps

It is easy to note that the performance of an SLS algorithm (as also occurs in other local search methods) strongly depends on the size of the neighbourhood: the larger the neighbourhood, the more potentially better candidate solutions it contains. In fact, the ideal case is to have an exact neighbourhood, i.e., a neighbourhood relation for which any locally optimal candidate solution is guaranteed to be globally optimal. Obviously, taking into account an exact neighbourhood relation prevents the search method from getting stuck in very low-quality local optima. However, it is also much more expensive to determine improving search steps in exact neighbourhoods. It is possible to escape from local optima and still avoid an expensive search by considering a fairly simple neighbourhood and allowing the search strategy to perform worsening steps. Usually, a strategy following this idea alternates with a fixed frequency between selecting an improving neighbour and selecting a neighbour at random. In order to avoid cycles, which may happen if the random walk is undone in subsequent improvement steps, the algorithm can probabilistically decide in each step whether to apply an improvement step or a random walk step. The family of algorithms following this strategy is called Randomised Iterative Improvement (RII), and its top-level search step is exhibited as Algorithm 5.1. An RII algorithm does not terminate as soon as a local optimum is encountered. Instead, it may stop the execution when it reaches a given number of iterations, or when a number of search steps has been performed without making progress. It can be proved that, when the search process runs long enough, an optimal solution to any problem instance is eventually found with arbitrarily high probability (Hoos and Stützle, 2005). This method has been successfully applied to SAT problems, through the WalkSAT algorithm (Selman et al., 1996), which we discuss more thoroughly in the next sections.

Algorithm 5.1 Top-level Step Function Performed by an Algorithm using Randomised Iterative Improvement (Hoos and Stützle, 2005)

Input: problem instance π, candidate solution s, walk probability wp
Output: candidate solution s'

1: u ← random(0,1) # returns a number between zero and one
2: if u ≤ wp then
3:    s' ← random walk step(π, s)
4: else
5:    s' ← improvement step(π, s)
6: return s'
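As a concrete illustration of the RII idea, the sketch below interleaves random walk and greedy improvement steps on a toy bit-string problem. It is not a verbatim rendering of Algorithm 5.1: the improvement step is fixed to the best single-bit flip, and all names, parameter values and the onemax evaluation function are hypothetical choices for the example.

```python
import random

def rii_search(initial, score, wp=0.1, max_steps=5000, target=None):
    """Randomised Iterative Improvement on bit strings: with probability wp flip
    a random position (random walk step), otherwise take the best greedy flip
    (improvement step). The best candidate seen so far is kept as incumbent."""
    def flip(bits, i):
        return bits[:i] + (1 - bits[i],) + bits[i + 1:]
    candidate = best = initial
    for _ in range(max_steps):
        if random.random() <= wp:
            i = random.randrange(len(candidate))              # random walk step
        else:
            i = max(range(len(candidate)),
                    key=lambda j: score(flip(candidate, j)))  # improvement step
        candidate = flip(candidate, i)
        if score(candidate) > score(best):
            best = candidate
        if target is not None and score(best) >= target:
            break                                             # termination criterion
    return best

random.seed(0)
best = rii_search((0,) * 8, sum, target=8)
```

Keeping the incumbent `best` separate from `candidate` is what lets the search accept worsening moves without losing the best solution found so far.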

Another mechanism for allowing worsening steps is to accept a worsening step depending on the deterioration in the evaluation function value, i.e., the worse a step is, the less likely it is to be performed. The algorithms following such a strategy compose the family of Probabilistic Iterative Improvement (PII). In each step of the search process, a PII algorithm selects a neighbour according to a given function p(g, s), which determines a probability distribution over neighbouring candidate solutions of s based on their evaluation function values g. A widely used strategy closely related to PII is the Simulated Annealing (SA) method (Kirkpatrick et al., 1983; Geman and Geman, 1984; Cerny, 1985; Laarhoven and Arts, 1987) (top-level algorithm in 5.2). SA starts from a random initial solution s, and in each iteration of the search a neighbour s' of s is selected at random. The search decides whether to move to s' or to stay in s based on a temperature value T, which is adjusted at each step t by a scheduling function. Standard SA always accepts the candidate s' in case it has a better score than s. Otherwise, s' is accepted with a probability computed as the exponential of the difference between both scores divided by T. Thus, the worse the move and the lower the value of T, the exponentially smaller the acceptance probability. Bad moves are therefore chosen more frequently at the beginning of the search, when the value of T is high, and they become harder to accept as the value of T decreases. It has been proved that, if the value of T is reduced slowly enough, the algorithm finds a global optimum with probability close to 1 (Russell and Norvig, 2010).

Algorithm 5.2 Simulated Annealing algorithm (Geman and Geman, 1984)

Input: integer numbers k and limit; a float number lam
Output: a solution s

1: s ← a randomly generated candidate solution
2: for t from 1 to ∞ do
3:    T ← scheduling(t, k, limit, lam)
4:    if T = 0.0 then
5:       return s
6:    s' ← a neighbour of s, chosen at random from the neighbourhood relation
7:    ∆E ← score(s') − score(s)
8:    if ∆E > 0 then
9:       s ← s'
10:   else
11:      s ← s' only with probability e^(∆E/T)
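A minimal executable version of the scheme above, assuming a toy bit-flip neighbourhood and a geometric cooling schedule; both are illustrative choices, not prescribed by the algorithm.

```python
import math
import random

def simulated_annealing(initial, neighbour, score, schedule, max_steps=100_000):
    """SA sketch: accept improving moves always; accept a worsening move with
    probability exp(delta_e / T), where T comes from the cooling schedule."""
    s = initial
    for t in range(1, max_steps):
        T = schedule(t)
        if T <= 0:
            return s                 # frozen: stop and return the current solution
        s_prime = neighbour(s)
        delta_e = score(s_prime) - score(s)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            s = s_prime              # worse moves pass ever more rarely as T drops
    return s

# Illustrative setup: maximise the number of 1-bits under random single-bit flips,
# with a geometric cooling schedule cut off to zero after 10,000 steps.
random.seed(1)

def flip_random_bit(bits):
    i = random.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

def schedule(t):
    return 2.0 * (0.999 ** t) if t < 10_000 else 0.0

result = simulated_annealing((0,) * 12, flip_random_bit, sum, schedule)
```

Early on, `T` is large and nearly any flip is accepted; late in the run the acceptance test degenerates into pure hill climbing, matching the behaviour described in the text.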

5.1.2 Rapid Random Restarts

Another way of escaping from local optima is to simply restart the search algorithm whenever it reaches a local minimum (or maximum). Such a strategy works reasonably well when the number of local optima is rather small or when reinitialising the process is not very costly. The development of this approach was motivated by the recognised variability in performance of combinatorial search methods (Gomes et al., 1998; Gomes et al., 2000), such as those for constraint satisfaction problems. Often, the cost distributions of a complete backtracking search have very long tails and an erratic average behaviour. Search algorithms with RRR include a stochastic component in backtracking-style search. The key idea is to restart the search, possibly from another random initial seed, if it is not making any progress after a number of tries. To do so, a cutoff specifies the number of attempts at finding the best solution before restarting from another point. A simple example of the RRR strategy applied to a hill-climbing algorithm conducts a series of improving moves from a random initial candidate solution. The algorithm is complete with probability close to 1, for the simple reason that it will eventually generate the goal solution as the initial solution. If each iteration of the hill climbing has a probability p of success, the expected number of restarts is 1/p (Russell and Norvig, 2010).
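The effect of restarts can be seen on a small deceptive landscape, where greedy climbing from most starting points ends at a poor local optimum and only restarting gives access to the global one. The `trap` function and all parameter values below are illustrative assumptions, not from the text.

```python
import random

def trap(bits):
    """A deceptive landscape: the global optimum (all ones, value n + 1) is
    isolated from the broad slope that leads greedy search toward all zeros."""
    n = len(bits)
    return n + 1 if sum(bits) == n else n - 1 - sum(bits)

def greedy_climb(bits):
    """Plain greedy hill climbing over one-bit-flip neighbours."""
    while True:
        nbrs = [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]
        best = max(nbrs, key=trap)
        if trap(best) <= trap(bits):
            return bits                       # local optimum reached
        bits = best

def rrr_hill_climbing(n, max_tries):
    """Restart greedy climbing from fresh random seeds until the optimum appears."""
    for tries in range(1, max_tries + 1):
        start = tuple(random.randrange(2) for _ in range(n))
        result = greedy_climb(start)
        if trap(result) == n + 1:
            return result, tries
    return None, max_tries

random.seed(2)
optimum, tries = rrr_hill_climbing(5, max_tries=500)
```

Here a single climb succeeds only when the random start is already near all-ones; with success probability p per try, the expected number of restarts is 1/p, exactly the relation quoted above.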

5.1.3 Other SLS approaches

Other SLS methods exist. Iterated Local Search (ILS) (Lourenço et al., 2002) combines two types of SLS methods, one in each step: one step reaches local optima as fast as possible, and the other escapes from local optima. At each iteration, ILS first applies a perturbation to the current candidate solution s, yielding a modified candidate solution s'. A local search is then performed from s' until a local optimum s'' is obtained. Finally, an acceptance criterion decides whether the search continues from s or from s''. Greedy Randomised Adaptive Search Procedures (GRASP) (Feo and Resende, 1995) randomise the construction of candidate solutions so as to generate a large number of good starting points for a local search procedure. Evolutionary algorithms are a large and diverse class of stochastic search methods strongly inspired by models of the natural evolution of biological species (Bäck, 1996). Generally speaking, evolutionary algorithms such as genetic algorithms (Mitchell, 1996) start with a set of candidate solutions and repeatedly apply three genetic operators, namely selection, mutation and recombination, replacing partially or completely the current population with a new set of candidate solutions. For many other SLS approaches, and for a better understanding of the methods briefly described in this section, we refer the reader to (Hoos and Stützle, 2005).

5.2 Stochastic Local Search Algorithms for Satisfiability Checking of Propositional Formulae

Stochastic local search has been used since the early nineties to solve hard combinatorial search problems, starting with the seminal algorithms published independently by Selman (Selman et al., 1992) and Gu (Gu, 1992). These SLS algorithms were able to solve hard satisfiability problems in only a fraction of the time required by the most sophisticated complete algorithms. The satisfiability problem in propositional logic (SAT) is to decide whether there exists an assignment of truth values to the variables of a formula F under which F evaluates to true (a model of F). SLS algorithms are among the most successful methods for solving the search variant of SAT, i.e., finding models of a given formula rather than just deciding whether such a model exists. In the following we discuss the seminal GSAT algorithm (Selman et al., 1992) and arguably its most prominent derivative, the WalkSAT algorithm (Selman et al., 1996).

5.2.1 The GSAT Algorithm

The original GSAT algorithm (Selman et al., 1992) is a SAT solver based on a greedy hill-climbing procedure with a stochastic component. It is based on a one-variable-flip neighbourhood in the space of all complete truth assignments of a given formula, written in conjunctive normal form (CNF), i.e., a conjunction of clauses, where each clause is a disjunction of literals. Thus, two variable assignments are neighbours iff they differ in the truth value of exactly one variable. GSAT uses an evaluation function g(F, va) that maps each variable assignment va to the number of clauses of the given formula unsatisfied under va. At each iteration of the algorithm, g(F, va) is minimised by flipping the value of one variable, since the goal is to find a model of F, i.e., an assignment evaluated to zero under g(F, va). The variable to be flipped is selected at random from the neighbourhood of the current candidate solution, among those minimising the number of unsatisfied clauses (Hoos and Stützle, 2005). Algorithm 5.3 presents the basic GSAT algorithm. It has two nested loops, with the inner loop starting from a randomly chosen truth assignment of the variables in the CNF formula F. Then, it iteratively flips the variable resulting in a maximal decrease in the number of unsatisfied clauses. If there is a tie, the variable to be flipped is chosen at random from the set of variables that improve the score the most. The inner loop continues until it finds an assignment satisfying F or until it reaches a user-defined number of steps. In case no model is found after the maximum number of steps, GSAT reinitialises the search at another randomly chosen truth assignment of F. This is strictly necessary, since the inner loop easily gets stuck in local minima. The outer loop keeps trying to find a model of F until a user-defined number of tries is reached. After the maximum number of tries without finding a solution, the algorithm ends with "no solution found".

Algorithm 5.3 GSAT Algorithm (Selman et al., 1992)

Input: positive integers maxFlips and maxTries; a CNF formula F
Output: model of F or "no solution found"

1: for try from 1 to maxTries do
2:    va ← a randomly generated assignment of the variables in formula F
3:    for step from 1 to maxFlips do
4:       if va satisfies F then
5:          return va
6:       v ← a randomly selected variable whose flip minimises the number of unsatisfied clauses
7:       va ← va with v flipped
8: return "no solution found"
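A compact executable rendering of basic GSAT, assuming clauses are represented as lists of (variable, polarity) pairs; the toy formula at the end is an illustrative example, not taken from the text.

```python
import random

def unsat_count(formula, va):
    """Evaluation function g(F, va): the number of clauses unsatisfied under va.
    A clause is a list of (variable, polarity) pairs; va maps variable -> bool."""
    return sum(1 for clause in formula
               if not any(va[v] == pol for v, pol in clause))

def gsat(formula, variables, max_tries, max_flips):
    """Basic GSAT: greedily flip the variable minimising unsatisfied clauses,
    breaking ties at random, and restart from a fresh random assignment."""
    for _ in range(max_tries):
        va = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            if unsat_count(formula, va) == 0:
                return va                       # model found
            scores = {}
            for v in variables:                 # score every one-variable flip
                va[v] = not va[v]
                scores[v] = unsat_count(formula, va)
                va[v] = not va[v]
            best = min(scores.values())
            v = random.choice([w for w in variables if scores[w] == best])
            va[v] = not va[v]
    return None                                 # "no solution found"

# Toy formula (x1 or x2) and (not x1 or x2) and (x1 or not x2); model: x1 = x2 = true.
formula = [[(1, True), (2, True)], [(1, False), (2, True)], [(1, True), (2, False)]]
random.seed(3)
model = gsat(formula, [1, 2], max_tries=10, max_flips=20)
```

Note that the greedy step is allowed to be a sideways or even worsening flip when no improving flip exists, which is why the restart loop is essential.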

Due to its greedy hill-climbing nature, GSAT suffers from severe stagnation behaviour, easily getting stuck in local optima for any fixed number of restarts. Thus, GSAT has been extended into other search strategies in order to improve its performance. One of the most prominent is the GWSAT algorithm (Selman and Kautz, 1993), which decides at each step, with a fixed probability np, whether to do a standard GSAT step or to flip a variable selected uniformly at random from the set of all variables occurring in unsatisfied clauses. The probability np is called walk probability, noise setting or noise level. The WalkSAT algorithm, which we discuss next, is derived from GWSAT.

5.2.2 The WalkSAT Algorithm

WalkSAT (Selman et al., 1994; Selman et al., 1996; Selman et al., 1997) modifies GSAT-based algorithms mainly by considering only a dynamically determined subset of the static GSAT neighbourhood relation. This is effectively done by first considering only variables occurring in unsatisfied clauses. Then, in order to find the next assignment, the variable to be flipped is selected from that set in two steps. In the first step, a clause c which is unsatisfied under the current assignment of truth values is selected at random. In the second step, the new assignment results from flipping one of the variables appearing in c. This general procedure of the WalkSAT architecture is exhibited as Algorithm 5.4. Note that, as GSAT, WalkSAT starts from a randomly generated assignment of the variables in the initial formula. Also as GSAT, it considers a maximum number of tries and a maximum number of steps in order to find the solution.

Algorithm 5.4 WalkSAT General Algorithm (Selman et al., 1994)

Input: positive integers maxFlips and maxTries; a CNF formula F; a heuristic function hf
Output: model of F or "no solution found"

1: for try from 1 to maxTries do
2:    va ← a randomly generated assignment of the variables in formula F
3:    for step from 1 to maxFlips do
4:       if va satisfies F then
5:          return va
6:       unsat ← the set of all clauses unsatisfied under va
7:       c ← a randomly selected clause from unsat
8:       v ← a variable selected from c according to the heuristic function hf
9:       va ← va with v flipped
10: return "no solution found"

The general procedure requires a heuristic function as input to decide which variable is selected in line 8 of the algorithm. The heuristic function most commonly used in the WalkSAT architecture first scores each variable v by counting the number of currently satisfied clauses that would become unsatisfied by flipping v. Then, it tries to perform a zero-damage step: if there is a variable with score equal to zero, that is, if the clause c becomes satisfied by flipping the variable v without damaging any other clause, then v is flipped (if there is more than one variable in this situation, one of them is chosen at random). In case a zero-damage step is not possible, WalkSAT must decide which step to follow, a random walk step or a greedy step. The decision is taken considering a walk probability wp as follows:

- With a certain probability wp, one of the variables from the clause c is selected at random to be flipped (random walk step);

- With probability 1 − wp, the variable with the minimal score calculated as above is selected to be flipped (greedy step).


This heuristic function is presented as Algorithm 5.5.

Algorithm 5.5 Most Commonly Used Heuristic Function in the WalkSAT Architecture

Input: va, a variable assignment of a formula F; a clause c; wp, a walk probability
Output: v, a variable to be flipped

1: scores ← for each variable in c, the number of clauses that are currently satisfied but become unsatisfied if the variable is flipped
2: if min(scores) = 0 then
3:    v ← a random variable from c whose score is 0
4: else
5:    with probability wp do
6:       v ← a random variable from c
7:    otherwise
8:       v ← a random variable from c whose score is min(scores)
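The two-step clause/variable selection and the heuristic above can be sketched together in Python; the clause representation and the toy formula are illustrative assumptions, not from the text.

```python
import random

def sat(clause, va):
    """clause: list of (variable, polarity) pairs; satisfied if some literal holds."""
    return any(va[x] == pol for x, pol in clause)

def break_count(formula, va, v):
    """Number of currently satisfied clauses that flipping v would break."""
    flipped = dict(va)
    flipped[v] = not va[v]
    return sum(1 for c in formula if sat(c, va) and not sat(c, flipped))

def pick_variable(formula, va, clause, wp):
    """The common WalkSAT heuristic: zero-damage flip if one exists, else a
    random walk step with probability wp, else a greedy (minimal-break) step."""
    variables = [x for x, _ in clause]
    scores = {x: break_count(formula, va, x) for x in variables}
    zero = [x for x in variables if scores[x] == 0]
    if zero:
        return random.choice(zero)                        # zero-damage step
    if random.random() < wp:
        return random.choice(variables)                   # random walk step
    best = min(scores.values())
    return random.choice([x for x in variables if scores[x] == best])  # greedy step

def walksat(formula, variables, max_tries, max_flips, wp=0.5):
    """The general WalkSAT loop, restricted to variables of a random unsatisfied clause."""
    for _ in range(max_tries):
        va = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            unsat = [c for c in formula if not sat(c, va)]
            if not unsat:
                return va                                 # model found
            c = random.choice(unsat)
            v = pick_variable(formula, va, c, wp)
            va[v] = not va[v]
    return None                                           # "no solution found"

# Toy satisfiable CNF: (x1 or not x2) and (x2 or x3) and (not x1 or not x3).
formula = [[(1, True), (2, False)], [(2, True), (3, True)], [(1, False), (3, False)]]
random.seed(4)
model = walksat(formula, [1, 2, 3], max_tries=10, max_flips=30)
```

Restricting candidate flips to one random unsatisfied clause is what makes each WalkSAT step much cheaper than a full GSAT scoring pass.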

Following the success of GSAT/WalkSAT-based algorithms in checking the satisfiability of propositional formulae, a large number of randomised strategies were derived from them to solve tasks in areas such as planning, scheduling and constraint solving. Empirical observations show that SLS can often substantially improve their efficiency (Chisholm and Tadepalli, 2002; Rückert and Kramer, 2003; Rückert and Kramer, 2004).

5.3 Stochastic Local Search in ILP

Most Inductive Logic Programming algorithms search a large space of possible clauses, leading to huge time and storage requirements and calling for clever search strategies (Page and Srinivasan, 2003). It is therefore unsurprising that research on stochastic search has taken place since the early days of ILP (Kovacic et al., 1992). Many ILP algorithms indeed include a limited amount of stochastic search. As an example, the GOLEM system randomly selects examples as seeds to start its search (Muggleton and Feng, 1990). Next we present recent work on stochastic search in ILP.


5.3.1 Stochastic Local Search for Searching the Space of Individual Clauses

A recent study in ILP implemented and evaluated the performance of several randomisation strategies in the ILP system Aleph (Železný et al., 2002; Železný et al., 2004; Železný et al., 2006), using Aleph's deterministic general-to-specific search as reference. Aleph runs a covering algorithm which obtains hypotheses by inducing clauses one by one, until all the positive examples are covered. Roughly, Aleph's covering procedure is composed of two nested loops. The inner loop starts with the generation of the bottom clause from a seed example. Then, a clause with the best score is generated by adding antecedents from the bottom clause. The outer loop of the covering algorithm removes the already covered positive examples from the set of examples, and the inner loop restarts with this new set. The procedure continues until there are no more uncovered positive examples, or it is no longer possible to generate clauses obeying the parameter settings defined by the user. Note that the inner iteration returns a single clause, generated independently from the clauses previously added to the theory. Thus, the stochastic strategies developed in that study were framed in terms of a single-clause search algorithm. They designed four randomised restart strategies to search the ILP subsumption lattice, namely (1) a simple randomised search strategy (RTD); (2) a Rapid Random Restart strategy (RRR); (3) a GSAT-based strategy; and (4) a WalkSAT-based strategy. The randomised strategies differ on how the saturated example is chosen, on which clause to start from, on which clause to try next, on whether to do greedy or full search, and on whether to do bidirectional refinement or not, following the main properties:

1. the saturant example is chosen at random in all randomised algorithms, instead of being the first positive example as in the deterministic reference strategy;

2. the search starts from a clause selected with uniform probability from the set of allowable clauses, except for the RTD search, which starts from the most general definite clause (Srinivasan, 2000);

3. GSAT and WalkSAT strategies update the list of possible modifications in the current hypothesis greedily, i.e., only the newly explored nodes are retained, whereas RRR and RTD maintain a list of all elements.

4. the selection of the next clause follows a random choice in WalkSAT and in RTD, according to the following criterion: with probability 0.5 the clause with the highest score is chosen (at random in case of ties), and otherwise a random clause is chosen with probability proportional to its score. The other strategies choose the clause with the highest score.

5. GSAT, WalkSAT and RRR perform bidirectional refinement, combining specialisation and generalisation, instead of only performing specialisations of the clause being refined.

6. All strategies include restarts based on a cutoff parameter, defined as the maximum number of clauses evaluated on any single restart.

It was observed that if a near-to-optimal value of the cutoff parameter (the number of clauses examined before the search is restarted) is used, then the mean search cost (measured by the total number of clauses explored rather than by CPU time) may be decreased by several orders of magnitude compared to a deterministic non-restarted search. It was also observed that the differences between the tested randomised methods were rather insignificant. There are other approaches applying SLS methods to search the space of candidate clauses. In particular, Muggleton and Tamaddoni-Nezhad have been conducting research based on the stochastic search performed by genetic algorithms (Tamaddoni-Nezhad and Muggleton, 2000; Muggleton and Tamaddoni-Nezhad, 2008). Their approach builds on the fact that, to find desirable consistent clauses, ILP systems must evaluate a large number of inconsistent clauses, and such consistent clauses are located at the fringe of the search space. The approach is composed of two components. The first one, called Quick Generalisation (QG), carries out a random-restart stochastic bottom-up search to efficiently generate a population of consistent clauses on the fringe of the refinement graph without needing to explore the graph in detail. The second component is a Genetic Algorithm which evolves and recombines those seeded clauses, instead of performing the A* search of the Progol system. The experiments performed in that work indicate that the QG/GA algorithm can be more efficient than the standard refinement graph search of the Progol system, while generating similar or better solutions.

The approaches just reviewed were both framed in a single-clause search algorithm. One consequence of this is that the statistically assessed performance ranking of the individual strategies may not be representative of their performance when used for an incremental entire-theory construction, due to the statistical dependence between successive clause searches. Another issue concerns the time spent to evaluate a candidate hypothesis. Even though the search space is reduced by SLS methods, a large amount of time is used to check whether a hypothesis should be chosen as the next candidate solution. In the next section we present methods to learn whole theories, which try to overcome both the covering and the hypothesis evaluation pitfalls.

5.3.2 Stochastic Local Search in ILP for Searching the Space of Theories and/or using Propositionalization

The standard greedy covering algorithm employed by most ILP systems is a shortcoming of typical ILP search. There is no guarantee that greedy covering will yield the globally optimal hypothesis; consequently, greedy covering often gives rise to problems such as unnecessarily long hypotheses with too many clauses. To overcome the limitations of greedy covering, the search can be performed in the space of entire theories rather than clauses (Bratko, 1999). However, there is a strong argument against this: the search space of theories is much larger than the search space of individual clauses. It is interesting, then, to apply an efficient search strategy such as SLS to search for hypotheses in the space of theories, and hence unite the benefits of both techniques. Stochastic local search algorithms for propositional satisfiability benefit from the ability to quickly test whether a truth assignment satisfies a formula. As a result, many possible solutions (assignments) can be tested and scored in a short time. In contrast, the analogous test within ILP (testing whether a hypothesis covers an example) takes much longer, so that far fewer possible solutions can be tested in the same time. Thus, considering both motivations above, in a recent work (Paes et al., 2006b) we applied stochastic local search to ILP, but not in the usual space of first-order Horn clauses. Instead, we used a propositionalization approach that transforms the ILP task into an attribute-value learning task. In this alternative search space, we can take advantage of fast testing, as in propositional satisfiability. Additionally, we use a non-covering approach to search the space of theories, since we are now dealing with a more efficient search strategy and we also turn the relational domain into a simpler propositional one. Then, we use an SLS algorithm to induce k-term DNF formulae, which performs refinements on an entire hypothesis rather than on a single rule (Rückert and Kramer, 2003).

Propositionalization

Propositionalization is a method to compile a relational learning problem into an attribute-value problem, which can then be solved by propositional learners (Lavrac and Dzeroski, 2001; Krogel et al., 2003). During propositionalization, features are constructed from the background knowledge and structural properties of individuals.

Each feature is defined as a clause of the form fi(X) ← Liti,1, ..., Liti,n, where the literals in the body are derived from the background knowledge and the argument in the clause's head is an identifier of the example. The features are the attributes which form the basis for columns in single-table (propositional) representations of the data. If such a clause defining a feature is called for a particular individual and this call succeeds, the feature is set to "true" in the corresponding value column of the given example; otherwise it is set to "false". There are several propositionalization systems, such as RSD (Železný and Lavrač, 2006) and SINUS (Lavrac and Dzeroski, 1994), among others. In this work we used RSD as the base propositionalization system. RSD constructs features by discovering statistically interesting relational subgroups in a population of individuals (RSD is publicly available at http://labe.felk.cvut.cz//zelezny/rsd).
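As an illustration of this mapping (a toy sketch, not RSD itself; the train facts and feature definitions below are invented for the example):

```python
# Toy background knowledge about two trains (hypothetical facts).
has_car = {("t1", "c1"), ("t2", "c2")}
short = {"c1"}
closed = {"c2"}

def f1(x):
    # f1(X) :- has_car(X, C), short(C).
    return any(c in short for t, c in has_car if t == x)

def f2(x):
    # f2(X) :- has_car(X, C), closed(C).
    return any(c in closed for t, c in has_car if t == x)

def propositionalise(individuals, features):
    """One row per example; each feature call fills one boolean column."""
    return {x: [f(x) for f in features] for x in individuals}

table = propositionalise(["t1", "t2"], [f1, f2])
# t1 has a short car only; t2 has a closed car only.
```

Each row of `table` is then a propositional training example, ready for any attribute-value learner.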


Stochastic Local Search in k-term DNF Relational Propositionalised Domain Learning

After propositionalising the relational domain, we apply an SLS algorithm to learn k-term DNF formulae from the feature-value table. The aim of k-term DNF learning is to induce a formula of k terms in disjunctive normal form, where each term is a conjunction of literals (Kearns and Vazirani, 1994). k-term DNF learning is an NP-hard combinatorial search problem. The SLS algorithm designed in (Rückert and Kramer, 2003) to solve k-term DNF learning is reproduced here as Algorithm 5.6. The algorithm starts by randomly generating a hypothesis, i.e., a DNF formula with k terms, and then refines this hypothesis in the following manner. First, it picks a misclassified example at random. If this example is a positive one, the hypothesis must be generalised. To do so, a literal has to be removed from a term of the hypothesis. With probabilities pg1 and pg2 respectively, the term and a literal in this term are chosen at random. Otherwise, the term in the hypothesis which differs in the smallest number of literals from the misclassified example and the literal whose removal from the term decreases the score the most are chosen. On the other hand, if the example is a negative one, the hypothesis must be specialised. Therefore, a literal has to be added to a term. The term is chosen at random from those that cover the misclassified negative example. Similarly to the previous case, with probability ps the literal to be added to this term is chosen at random; otherwise, the literal whose addition decreases the score the most is taken. This iterative process continues until the score is equal to zero or the algorithm reaches a maximum number of modifications. The whole procedure is repeated a pre-specified number of times. It is important to mention that Algorithm 5.6 performs refinements on an entire hypothesis rather than a single rule.
A detailed analysis of SLS performance compared to WalkSAT shows the advantages of using SLS to learn a hypothesis as short as possible (Rückert and Kramer, 2003).
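The refinement loop just described can be sketched as follows (a simplified illustration: the score below counts misclassifications only, whereas scoreL in (Rückert and Kramer, 2003) also penalises hypothesis size, and the data layout is invented for the sketch):

```python
import random

# A term is a set of literals (variable, value); a hypothesis is a list
# of k terms. An example is (assignment dict, label).

def covers(term, x):
    return all(x[v] == val for v, val in term)

def classify(hyp, x):                 # DNF: true if any term covers x
    return any(covers(t, x) for t in hyp)

def score(hyp, examples):             # simplified: number misclassified
    return sum(classify(hyp, x) != y for x, y in examples)

def sls_step(hyp, examples, pg1, pg2, ps):
    wrong = [(x, y) for x, y in examples if classify(hyp, x) != y]
    if not wrong:
        return hyp
    x, y = random.choice(wrong)
    if y:          # false negative: generalise by removing a literal
        if random.random() < pg1:
            i = random.randrange(len(hyp))
        else:      # term differing from x in the fewest literals
            i = min(range(len(hyp)),
                    key=lambda j: sum(x[v] != val for v, val in hyp[j]))
        if not hyp[i]:
            return hyp
        if random.random() < pg2:
            lit = random.choice(list(hyp[i]))
        else:      # literal whose removal decreases the score the most
            lit = min(hyp[i], key=lambda l: score(
                hyp[:i] + [hyp[i] - {l}] + hyp[i + 1:], examples))
        hyp[i] = hyp[i] - {lit}
    else:          # false positive: specialise a covering term
        i = random.choice([j for j, t in enumerate(hyp) if covers(t, x)])
        cands = [(v, not x[v]) for v in x if (v, not x[v]) not in hyp[i]]
        if not cands:
            return hyp
        if random.random() < ps:   # any such literal makes the term miss x
            lit = random.choice(cands)
        else:
            lit = min(cands, key=lambda l: score(
                hyp[:i] + [hyp[i] | {l}] + hyp[i + 1:], examples))
        hyp[i] = hyp[i] | {lit}
    return hyp
```

Note that each step touches a single term of the whole hypothesis, which is what makes the refinement operate on entire theories rather than one rule at a time.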

Experiments and remarks about them

Two ILP benchmarks were considered in (Paes et al., 2006b): the East-West Trains (Michalski and Larson, 1977)


Algorithm 5.6 An SLS algorithm to learn k-term DNF formulae (Rückert and Kramer, 2003)

Input: Integers k and maxSteps; probability parameters pg1, pg2 and ps; a set of examples E in attribute-value form
Output: A k-term DNF formula

1: H ← a randomly generated DNF formula with k terms
2: steps ← 0
3: while score(H) ≠ 0 and steps < maxSteps do
4:   steps ← steps + 1
5:   ex ← an incorrectly classified example under H, picked at random from E
6:   if ex is a positive example then
7:     with probability pg1 do
8:       t ← a random term from H
9:     otherwise
10:      t ← the term in H that differs in the smallest number of literals from ex
11:    with probability pg2 do
12:      l ← a random literal in t
13:    otherwise
14:      l ← the literal in t whose removal decreases scoreL(H) the most
15:    H ← H with l removed from t
16:  else if ex is a negative example then
17:    t ← a (random) term in H that covers ex
18:    with probability ps do
19:      l ← a random literal m such that t ∧ m does not cover ex
20:    otherwise
21:      l ← the literal whose addition to t decreases scoreL(H) the most
22:    H ← H with l added to t

and Mutagenesis Data (Srinivasan et al., 1996). They were both propositionalised by RSD, producing a set of features in attribute-value form. Then, the k-term DNF SLS learner was compared to Aleph in its default mode and to Aleph using GSAT search as explained in section 5.3, both using the original relational dataset. Additionally, the propositionalised domains were given as input to Part (Frank and Witten, 1998) and Ripper (Cohen, 1995), two popular rule-learning algorithms. Besides not using SLS, they also differ from the k-term DNF learner in their use of a covering approach. The results were compared in terms of compression achieved, CPU time consumed and predictive accuracy. They indicated that DNF/SLS runs faster than all the other tested methods when it comes to short theories (in number of rules). Compared to the relational methods, the performance gap was significantly large (by orders of magnitude), while the corresponding predictive accuracy favors neither SLS/DNF nor the relational methods. Compared to the propositional methods, the performance gap is much smaller, while SLS/DNF's short theories exhibit slight superiority in terms of predictive accuracy.

There are other approaches to inducing hypotheses with SLS, with or without propositionalization. For instance, (Serrurier and Prade, 2008) employs Simulated Annealing to induce hypotheses directly, using neither propositionalization nor covering. The candidate hypotheses are generated by a neighbourhood relationship derived from a refinement operator defined over hypotheses. In this case, the neighbourhood of a current hypothesis H is composed of the theories obtained by adding or removing clauses from H, or by applying a refinement operator (downward or upward) to each one of its clauses. In (Joshi et al., 2008; Specia et al., 2009) an approach was developed to construct features randomly in order to build models assisting the task of Word Sense Disambiguation. A randomised search procedure based on GSAT and using theory-error-guided sampling is designed to dynamically construct the features and generate the output model. The procedure is composed of two nested steps: the outer loop iterates R times, where R is a pre-defined number of restarts, and the inner loop executes M times, where M is a pre-defined number of local moves. In the outer loop, a sample of n acceptable features is generated, where a feature is considered acceptable if it covers at least s examples, has a minimal pre-defined precision p and obeys the constraints of the language. A model is then constructed from this set of features. The inner loop iteratively selects a new subset of features based on the errors made by the current features on the current model.

Chapter 6

Revising First-Order Logic Theories through Stochastic Local Search

6.1 Introduction

Usually, a First-Order Theory Revision system performs search in three steps. First, it searches for the points in the theory responsible for misclassifying an example. Second, it searches for possible modifications to be implemented by each revision operator, including the addition and deletion of antecedents in the body of clauses. Finally, it searches among the available revisions for the one which will be responsible for implementing a modification to the theory. In each one of these searches, a system such as FORTE and its descendant systems follow an enumerative strategy, engendering large search spaces that may grow to be intractable, according to the factors below.

1. The number of misclassified examples, since the revision system traverses each example’s proof looking for faulty points;

2. The size of the initial theory, since every clause in it is a potential revision point;

3. The number of clauses responsible for misclassifying an example, since the revision system proposes modifications to each one of them;


4. The size of the knowledge base and background knowledge, since antecedents must be generated and added to the body of clauses as possible modifications to be implemented on the theory.

Additionally, theory revision systems tackle whole theories instead of the stepwise search for individual clauses performed by most ILP systems. Search over whole theories is known to be a hard problem (Bratko, 1999). As a result, traditional theory revision systems must search over extremely large spaces, and can become rather inefficient and even intractable. Recent years have shown that stochastic local methods, originally designed to solve difficult combinatorial propositional problems (Selman et al., 1992; Selman et al., 1996; Rückert and Kramer, 2003), can also perform well in a variety of applications. Moreover, combining Stochastic Local Search with Inductive Logic Programming has shown very substantial improvements in efficiency, with little or no cost in accuracy (Srinivasan, 2000; Paes et al., 2006b; Železný et al., 2006; Muggleton and Tamaddoni-Nezhad, 2008). Such results motivate the contribution of this chapter. Further, we take Trefethen's Maxim No. 30 into high consideration, which states that if the search space is huge, the only reasonable way to explore it is at random (Trefethen, 1998). Thus, we aim to achieve a balance between efficiency and efficacy by decreasing the negative impact of the factors mentioned above on the running time of theory revision. To do so, we sacrifice completeness in favor of finding good solutions rather than optimal ones, by means of SLS techniques. Stochastic components are included in the key searches of the revision process, namely:

1. Search for revision points: a random decision may return a subset of the revision points instead of always returning all of them.

2. Search for literals: as the proposed modifications are dominated by the addition and deletion of antecedents, one may benefit from randomising the antecedent search (Paes et al., 2007a; Paes et al., 2007b).

3. Revision search: rather than proposing all revisions, one might enumerate the possible modifications and choose one at random to implement (Paes et al., 2007b).

Preliminary experiments with the theory revision system FORTE showed good promise for introducing stochastic search in the last two searches above (Paes et al., 2007a; Paes et al., 2007b). This chapter revisits that work, enhancing it by designing a number of stochastic searches for every step of the YAVFORTE system. Note that in (Paes et al., 2007a; Paes et al., 2007b) stochastic components were introduced into the original FORTE system, while in the present work they are built upon the YAVFORTE system, including the bottom clause to bound the search space and mode declarations as the language. The outline of this chapter is as follows. First, section 6.2 presents the stochastic algorithm developed to search for revision points. Next, stochastic algorithms applied within the revision operators, for choosing literals to be added to or removed from a clause, are devised in section 6.3. Then, we present a number of SLS algorithms for deciding which revision operator will be responsible for modifying the theory in section 6.4. Finally, experimental results are presented in section 6.5, followed by conclusions in section 6.6.

6.2 Stochastic Local Search for Revision Points

FORTE-based systems follow the key steps below in order to generate revision points:

1. Identify the misclassified instances;

2. Through the misclassified examples, find the clauses and/or antecedents responsible for such misclassifications. These points will compose the set of revision points;

3. Calculate the potential of each revision point. Recall that the potential is the number of examples which indicate the need for modifying the revision point;

4. Identify the relevant examples for each revision point, i.e., the examples whose provability can be affected by a revision proposed at that point. This is essential to make the evaluation of revision operators more efficient, by not proving every example at each modification: only the proofs of relevant examples must be re-done when modifications are proposed at specific points of the theory.

It can be seen from the steps above that there are two major factors working together to possibly increase the cost of searching for revision points: the set of examples and the size of the theory. This happens because each example must be tested on the theory, either to identify faulty clauses or literals or to check whether the example is relevant to a revision point. Moreover, each clause in the theory may be tested as a potential revision point. Therefore, this thesis decreases the work performed in those tests by introducing a stochastic component within the search for revision points. Rather than always looking for all revision points in the theory, the stochastic component allows only a subset of them to be sought.

The strategy developed in this chapter not only avoids searching the whole theory for revision points but also avoids considering all misclassified examples. It works by alternating between stochastic and complete moves according to a certain probability, a method that can be seen as an instance of the Randomised Iterative Improvement and WalkSAT techniques. Thus, with probability prp a stochastic move is taken and the procedure looks for only a subset of all the possible revision points. The size of the subset is previously defined by the user, with a default value of 1. Otherwise, a complete move is taken, just as in the original algorithm.

The stochastic move works as follows to gather the subset of revision points. First, it selects a misclassified example at random. Then, revision points are collected from that example. In case this single misclassified instance already produces the required number of revision points, the procedure stops. If it produces more than that, the required number is chosen at random from among them. If the instance does not have enough revision points, the procedure proceeds to collect more revision points by choosing another misclassified example at random. After collecting the subset of random revision points, it is time to find the relevant examples. This is necessary to compute the potential and also to restrict the examples to be proved again when evaluating a modification at the revision point. Note that in case the probability prp is 100% and the subset is required to contain only one revision point, the approach employed here follows the same line of thought as (Rückert and Kramer, 2003), although that work addresses propositional learning. Algorithm 6.1 presents the procedure for collecting revision points using a stochastic component.

Algorithm 6.1 SLS algorithm for generating revision points
Input: A set of positive and negative examples E, divided into ECC, the correctly classified examples, and EIC, the incorrectly classified examples; a theory T; a probability p1; an integer k
Output: A set of revision points RP

1: RP ← ∅
2: with probability p1 do
3:   while #RP < k do
4:     ex ← a misclassified instance chosen at random from EIC;
5:     num_rp ← k − #RP
6:     RPex ← at most num_rp revision points generated from ex;
7:     RP ← RP ∪ RPex;
8: otherwise
9:   RP ← the revision points generated from all misclassified examples in E
10: for each revision point rp ∈ RP do
11:   identify the relevant examples for rp;
12:   calculate the potential of rp;
13: sort RP by potential;
14: return RP;
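The alternation between stochastic and complete moves can be sketched as follows (an illustrative Python rendering; generate_points stands in for FORTE's traversal of an example's proof, and the relevant-example and potential computations are left as comments):

```python
import random

def stochastic_revision_points(misclassified, generate_points, p1, k,
                               rng=random):
    """With probability p1, collect at most k revision points from
    randomly chosen misclassified examples; otherwise collect them all."""
    rp = []
    if rng.random() < p1:               # stochastic move
        pool = list(misclassified)
        while len(rp) < k and pool:
            ex = rng.choice(pool)       # pick a misclassified example
            pool.remove(ex)
            found = list(generate_points(ex))
            rng.shuffle(found)
            rp.extend(found[:k - len(rp)])   # take at most k - #RP points
    else:                               # complete move, as in the original
        for ex in misclassified:
            rp.extend(generate_points(ex))
    # Identifying relevant examples, computing each point's potential and
    # sorting by potential would follow here, as in steps 10-13.
    return rp
```

With p1 = 1 and k = 1, a single proof traversal suffices per revision cycle, which is the best case discussed below.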

It is important to stress that examples from both classes have the same chance of being chosen at random. This is particularly important when we have skewed datasets, with one class largely dominating the other in the number of examples. Algorithm 6.1 introduces the stochastic component as an alternative path in Algorithms 2.2 and 2.3: stochastic moves avoid searching for revision points in every clause of the theory and do not consider all misclassified examples. In this way, roughly speaking, instead of a time complexity bounded by the number of training examples times the size of the theory, stochastic moves have a time complexity limited by the number of required revision points, which in the best case is only 1. Note that in the complete case, the theory is traversed even for the correctly classified examples, because it is at this moment that the relevant examples are collected. Although the stochastic search still has to find the relevant examples for the revision points within the set of all training examples, this is also a reduced search, since it is restricted to the set of revision points instead of traversing the whole theory. Additionally, in case the search returns only one kind of revision point, two distinct cases are addressed: (1) the revision point is a generalisation one, and then only unprovable examples (true negatives and false negatives) are considered; or (2) the revision point is a specialisation one, and only provable examples (true positives and false positives) are considered.
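When only one kind of revision point is returned, this restriction reduces to a simple filter over the examples' current proof outcomes (an illustrative sketch; the record layout is invented):

```python
# Generalisation points only affect currently unprovable examples
# (TN/FN); specialisation points only affect provable ones (TP/FP).
# Each example record carries its current proof outcome.

def relevant_examples(examples, point_kind):
    if point_kind == "generalisation":
        wanted = {"TN", "FN"}     # currently unprovable
    else:                         # "specialisation"
        wanted = {"TP", "FP"}     # currently provable
    return [e for e in examples if e["outcome"] in wanted]
```

Only the returned examples need to be re-proved when a modification at the corresponding point is evaluated.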

6.3 Stochastic Local Search for Literals

The main goal of introducing antecedents into a clause is to stop proving negative examples while continuing to cover as many of the originally proved positive examples as possible. To do so, the operator can follow two approaches. Either it uses a hill-climbing procedure, where at each iteration the antecedent which improves the score the most is chosen to be added to the clause, or it uses the relational pathfinding algorithm, where more than one antecedent can be added to a clause at once. These two approaches can also be combined, with the relational pathfinding algorithm being executed first and antecedents then being added to the clause through the hill-climbing algorithm. Both approaches consider the bottom clause generated from a covered positive example as their search space. Similarly, the delete antecedents operator has the goal of making the clause start proving positive examples while still proving as few of the negative examples as possible. To achieve this goal, the operator either removes one antecedent at a time from the clause, using a hill-climbing approach, or it deletes multiple antecedents at once to escape from local maxima. However, the latter approach is only used when the former does not produce any results, since it is expensive to list and test all possible combinations of literals to be removed from the clause. Both approaches require the modes language to be obeyed after a removal.

Adding and deleting antecedents are the basic operations performed within all operators of YAVFORTE, with the exception of delete rule: specialisation is performed either by adding antecedents to a clause or by deleting clauses from the theory; generalisation is achieved either by deleting antecedents from a clause or by creating a new clause. In the latter case it is possible to create a clause from another one, by leaving the original one in the theory and deleting antecedents from, followed by adding antecedents to, its copy. It is also possible to create a completely new clause using the add antecedents operator on a clause with a predicate in its head, properly defined in a modeh declaration.

Regarding the addition of antecedents, there are three main factors impacting the search space of the bottom clause: (1) the size of the intensional and extensional background knowledge, (2) the number of different modeb definitions for the predicates, together with the recall of each one of them, and (3) the value set for the variables depth. In fact, as shown in Algorithm 5.3, the cardinality of a bottom clause is bounded by r(|M|j+j−)^(ij+), where |M| is the cardinality of the set of mode declarations, j+ is the number of + type occurrences in each modeb in M plus the number of – type occurrences in each modeh, j− is the number of – type occurrences in each modeb in M plus the number of + type occurrences in each modeh, r is the recall of each mode m ∈ M, and i is the maximum variable depth. Thus, the bottom clause generates a search space of exponential size w.r.t. the maximum variable depth. In case the recall r is defined as ∗, which is the most common case, all the possible instantiations of a literal are collected from the BK. Because of that, the size of the BK also influences the cardinality of the bottom clause.
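To get a feel for this bound, one can evaluate it for small illustrative values (the numbers below are hypothetical, not taken from any experiment in this thesis):

```python
def bottom_clause_bound(r, m, j_plus, j_minus, i):
    # |bottom clause| <= r * (|M| * j+ * j-) ** (i * j+)
    return r * (m * j_plus * j_minus) ** (i * j_plus)

# Ten mode declarations, recall 5, j+ = 2, j- = 1:
print(bottom_clause_bound(5, 10, 2, 1, 2))   # depth i = 2: 5 * 20**4
print(bottom_clause_bound(5, 10, 2, 1, 3))   # depth i = 3: 5 * 20**6
```

Raising the variable depth from 2 to 3 multiplies the bound by 20², illustrating the exponential growth in i.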
Potentially, each element of the bottom clause may be tested on each example relevant to the clause being specialised, excluding those that do not have variables compatible with the mode declarations of the clause (although these also slightly influence the running time, since they must be tested for compatibility). Additionally, in the worst case, to specialise one single clause it is necessary to pick literals from the bottom clause as many times as the maximum size set for a clause. Considering all these factors, the addition of antecedents is an expensive operation, and it is still performed many times during the whole revision process, not counting the relational pathfinding algorithm, which is expensive in itself (Richards and Mooney, 1992). Therefore, we intend to make the add antecedents operator, and consequently the revision process, more efficient by introducing stochastic components into it. Once again, we sacrifice completeness to gain efficiency when proposing modifications to the theory by adding antecedents to clauses or creating new rules.

Deleting antecedents is obviously a less expensive operation than adding antecedents: the search space is composed only of the literals in the clause. Because of that, each time an antecedent is deleted the search space of the next iteration is reduced. Thus, in the worst case, the search space of tested literals at each iteration contains all antecedents in the body of the current clause. Note that sometimes the search space is smaller than the size of the clause, because some literals, when removed, would make the clause incompatible with the modes definitions. However, they still increase the running time, although slightly, since it is necessary to check whether they can be candidates for deletion. Additionally, in the worst case, the operation of deleting antecedents may be performed |clause| times, in case deletions always improve the score. Thus, although it benefits less than the addition of antecedents, we can also improve the running time of the proposals of modifications by making the delete antecedent operator more efficient. Aiming at this goal, we also introduce stochastic search when deleting antecedents, either when proposing generalisations of a single clause or when generalising the theory by creating a new rule from an existing one.

Stochastic versions of the delete antecedents and add antecedents algorithms were developed, performing according to the following strategy. They may perform either a random or a greedy move, depending upon a fixed probability. While the greedy move is the same for both approaches, since in this case the original algorithm is maintained, the random move differs for each approach.
Next we devise each approach separately.

6.3.1 Stochastic Component for Searching Literals

The algorithm introducing a stochastic component follows a stochastic hill climbing technique and adopts a conservative strategy even when performing a random move, by maintaining the requirement of improving the value of the evaluation function and by refining clauses in the same way as in the original algorithm. The stochastic component is employed for choosing the next candidate clause. The decision not to allow bad moves to escape from local maxima is arguably justified because adding or deleting a sequence of literals at once, using the relational pathfinding algorithm or the algorithm for deleting multiple antecedents, may already be capable of escaping from local maxima. Basically, a random walk is carried out by taking a random step for choosing the next candidate clause with a fixed probability pl.

In case the probability pl is not reached, the original greedy hill climbing algorithm is performed. This algorithm is built upon the following decisions.

• Defining the search space: candidate clauses are formed by adding/deleting a single literal or a sequence of literals in the current clause, depending on the algorithm employed (greedy hill climbing or relational pathfinding/deletion of multiple antecedents). The literals to be added/deleted are elected differently, depending on whether the move is greedy or stochastic. In case the move is greedy, the approach is pure hill climbing, in the sense that the current clause and candidate clauses differ by only one literal; when the goal is to add antecedents, the set of possible antecedents to be added to a clause is composed of literals taken from the bottom clause. Similarly, when adding a sequence of literals, the paths are created considering the literals of the bottom clause. When deleting antecedents in the greedy move, candidate clauses are created either by removing a single literal or by removing a combination of literals. In case the move is random and the goal is to add single antecedents, a literal picked at random from the bottom clause is added to the current clause to form a candidate clause. Similarly, a path created from the literals of the bottom clause in the relational pathfinding algorithm is chosen at random. To delete antecedents from a clause in a stochastic move, the body of the clause is randomised and then a literal is chosen. In a similar way, combinations of literals are randomised and one of them is taken at random. In all of these cases, the candidate clause must be valid according to the modes declarations.


• Choosing the next clause: since this conservative approach requires that the next clause improve the current score, in all cases listed above this requirement must be met. However, in greedy moves, all candidate clauses are tested on the examples in order to calculate their score, and the one improving the score the most is chosen, while in random moves, the first randomly generated candidate clause improving the score is chosen. In other words, to choose the next clause at random, a single candidate clause, formed as explained in the last topic, has its score computed using the set of examples. If this clause already improves the score, it is going to be the next clause. Otherwise, it is discarded and another candidate clause is chosen, again formed as explained above, by selecting a random literal or a sequence of literals. This procedure continues until a candidate clause improving the score is found, or until it is established that no clause is able to do so, finishing the operator procedure. The only exception is the relational pathfinding algorithm, since its procedure allows a path to be chosen even if the score is not changed (neither increased nor decreased). It allows that only because, after it runs, hill climbing is employed to further specialise the resulting clause. Note that in the best case only one candidate clause is evaluated, which makes the running time of the procedure independent of the size of the bottom clause in the case of specialisations, or of the current clause in the case of generalisations. In the worst case, all possible candidate clauses are evaluated, as in greedy steps. Choosing the next clause in random moves greatly benefits from not computing the score for each possible candidate, which is the most expensive task performed inside the original algorithms, since it is necessary to check the provability of each relevant example.

• Stopping criteria: as usual in hill climbing approaches, the algorithm stops when there are no more candidate clauses improving the score, either because the set of generated candidate clauses is not able to do so, or because there are no further valid clauses to be evaluated.


Algorithm 6.2 replaces the hill-climbing addition of antecedents, exhibited as Algorithm 3.3, in both the add-antecedent specialisation operator and the second phase of the add-rule generalisation operator. The algorithm starts by generating the bottom clause from a covered positive example, as is done in the original algorithm. After that, it performs a random walk, following the approach of algorithms such as WalkSAT, and decides the type of the move based on a fixed probability pls. In case pls is not reached, the algorithm performs a greedy hill climbing step, exactly as in the original algorithm: all valid (according to modes) candidate clauses formed by adding literals from the bottom clause to the current clause are evaluated on the examples. Then, the candidate clause improving the score the most is selected. If there is no such improving clause, the procedure returns nothing.

If the probability pls is reached, a random step is taken: a literal is selected at random from the bottom clause and added to the current clause. After the candidate clause is validated relative to the modes, it is evaluated using the examples. In case such a clause improves the current score, it is chosen to replace the current clause. Otherwise, it is discarded and another candidate clause is selected. This procedure continues until finds a clause improving the score or until exhausting all the possibilities. Finally, the candidate clause replaces the current clause (if there is one) and the algorithm proceeds to the next iteration. This procedure is performed until there is no further clause improving the score or if is reaches the maximum size defined to clauses. Relational pathfinding algorithm provides a sequence of antecedents to be in- troduced in a clause. The algorithm searches for all possible sequences and chooses the one with the highest score. In case of a tie, the smallest sequence is chosen. A stochastic version of this algorithm selects the sequence to be added to the cur- rent clause according to a stochastic decision: with a probability pls it chooses a sequence at random from all the possible generated paths; otherwise it proceeds as in the original algorithm. We do not generate paths at random, since this algorithm tries to find a sequence of literals connecting the variables in the head of the clause, and to introduce a randomness component into this process could either disregard a possible valid sequence or to force the procedure to backtrack to several previ-

127 6.3. STOCHASTIC LOCAL SEARCH FOR LITERALS

Algorithm 6.2 Algorithm for adding antecedents using hill-climbing SLS

Input: A clause C, CL, maximum size of a clause, pls, the probability for deciding which move is going to be taken Output: A (specialised) clause C0

1: repeat 2: currentScore ← compute score of C; 3: BC ← createBottomClause(...); 4: with probability pls do 5: repeat 6: ante ← an antecedent chosen at random from BC, whose input variables are already in C (therefore it obeys modes); 0 7: C ← C with ante added to it; 0 8: candidateScore score of C ; 9: if candidateScore > currentScore then 0 10: C ← C 11: currentScore ← candidateScore 12: else 13: BC ← BC − ante 0 14: until C = C or BC =6 ∅ 15: otherwise 16: for each antecedent ante ∈ BC do 0 17: C ← C with ante added to, in case C + ante obeys the modes declarations; 0 18: candidateScore score of C ; 19: bestClause ← candidate clause with the highest candidateScore 20: if candidateScore > currentScore then 21: C ← bestClause 22: currentScore ← candidateScore 23: remove ante from BC 24: FPC ← FPC−instances in FPC not proved by C; 25: until FPC = ∅ or there are no more antecedents in BC or it is not possible to improve the score of the current clause or |C| = CL 26: return C

As said before, this algorithm is quite expensive by itself, and therefore introducing more backtracking into it runs contrary to our primary objective of reducing runtime. Hence, the benefit the stochastic algorithm brings is to avoid computing the score over the set of examples for every possible sequence, which is obviously an expensive task. Algorithm 6.3 shows this procedure.

Algorithm 6.3 Stochastic Relational-pathfinding

1: generate all possible sequences of antecedents through the relational pathfinding algorithm and the bottom clause
2: with probability pls do
3:   choose a sequence at random
4: otherwise
5:   choose the sequence with the highest score, or the one with fewer antecedents in case of a tie
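As a sketch, the stochastic decision in Algorithm 6.3 amounts to a few lines. Here `paths` is assumed to be the list of antecedent sequences already generated by relational pathfinding, and `score` is an assumed evaluation callback:

```python
import random

def choose_path(paths, score, p_ls=0.3, rng=random):
    """Stochastic relational pathfinding: all candidate paths are still
    generated deterministically; only the *choice* among them is randomised.
    With probability p_ls pick a path at random (no scoring needed);
    otherwise score every path and keep the best, breaking ties by length."""
    if rng.random() < p_ls:
        return rng.choice(paths)          # random move: skip scoring entirely
    # greedy move: highest score first, shortest path breaks ties
    return max(paths, key=lambda p: (score(p), -len(p)))
```

The random branch is where the runtime saving comes from: `score`, which evaluates a path against the whole example set, is never called.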

The delete-antecedent operator benefits less from stochastic local search than add-antecedent, since the search space is restricted to the goals in the clause and is therefore much smaller. The hill-climbing stochastic algorithm for antecedent deletion is shown in Algorithm 6.4. Notice that delete-antecedent is also part of add-rule, with the latter using it in its first phase. The algorithm follows exactly the same random walk approach as the previous algorithms in this section. First, it decides which type of move to take, namely a greedy move or a random move, based on a fixed probability plg. If the move is greedy, it uses the original algorithm to propose deletions of antecedents. Otherwise, it selects at random a literal to be removed from the clause. Both cases require an improvement in the score to actually remove a literal from the clause. A stochastic version of the delete multiple antecedents algorithm was also developed. However, this algorithm is executed only when the hill-climbing approach for deleting antecedents is not capable of deleting any antecedent. It is quite simple and differs from the original one only in the stochastic decision introduced to decide which move to make, greedy or random. A greedy move considers all possible combinations of literals from the body of the clause and chooses the best among them, while a random move chooses the first combination improving the current score. Note that both cases only delete antecedents if, after the deletion, the


Algorithm 6.4 Algorithm for deleting antecedents using hill-climbing SLS

Input: A clause C; plg, the probability for deciding which move is going to be taken
Output: A (generalised) clause C

1: repeat
2:   currentScore ← compute score of C
3:   antes ← antecedents from the body of C
4:   with probability plg do
5:     repeat
6:       ante ← an antecedent chosen at random from antes, whose removal from C keeps it valid relative to the modes
7:       C′ ← C with ante deleted from it
8:       candidateScore ← compute score of C′
9:       if candidateScore > currentScore then
10:        C ← C′
11:        currentScore ← candidateScore
12:      else
13:        antes ← antes − ante
14:    until C = C′ or antes = ∅
15:  otherwise
16:    for each antecedent ante ∈ antes do
17:      C′ ← C with ante deleted from it
18:      candidateScore ← compute score of C′
19:    bestClause ← the candidate clause with the highest candidateScore
20:    if candidateScore > currentScore then
21:      C ← bestClause
22:      currentScore ← candidateScore
23: until no antecedent deletion can improve the score
24: return C

clause still obeys the mode declarations. This algorithm is shown as Algorithm 6.5.

Algorithm 6.5 Stochastic version of the algorithm for deleting multiple antecedents

1: with probability plg do
2:   generate all possible sequences of antecedents from the clause
3:   choose a sequence at random
4: otherwise
5:   choose the sequence with the highest score
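A rough Python sketch of multiple-antecedent deletion, under the assumption that the clause body is a list of literals and that `score` and `obeys_modes` are external callbacks (both names are illustrative, not part of the system):

```python
import itertools
import random

def delete_multiple(clause, score, obeys_modes, p_lg=0.5, rng=random):
    """Stochastic multiple-antecedent deletion: enumerate every combination
    of remaining body literals; a random move keeps the first combination
    that improves the current score, while a greedy move scores them all
    and keeps the best-scoring one."""
    current = score(clause)
    # all strict, non-empty sub-bodies, from largest to smallest
    combos = [list(c)
              for n in range(1, len(clause))
              for c in itertools.combinations(clause, len(clause) - n)]
    combos = [c for c in combos if obeys_modes(c)]
    if rng.random() < p_lg:                        # random move
        rng.shuffle(combos)
        for c in combos:
            if score(c) > current:                 # first improving combination
                return c
        return clause
    best = max(combos, key=score, default=clause)  # greedy move
    return best if score(best) > current else clause
```

Both branches refuse to delete anything unless the score strictly improves, mirroring the requirement in the text.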

6.4 Stochastic Local Search for Revisions

YAVFORTE employs two specialisation revision operators, namely delete rule and add antecedent, and three generalisation revision operators, namely delete antecedent, add rule (creating a rule from an existing one, by copying the original, deleting antecedents from it, and then adding antecedents to it) and add new rule (creating a clause from scratch). Even employing all those revision operators, it may be the case that the set of revision points has only a few components, either because the theory has only a few failure points or because the stochastic search for revision points was applied and only a subset of the revision points is returned. In such a situation, the search space for selecting the operator that will actually modify the current theory can be small enough not to greatly influence the runtime of the revision process. However, this small search space does not always occur, making the choice of the revision operator an important cost factor during the revision process, since it is necessary to evaluate each possible revision yielded by the proposal of each revision operator on each matching revision point. Thus, the revision process can also benefit from stochastic search to reduce the runtime of selecting the revision operator that will be responsible for implementing some modification on the theory. Additionally, the revision can also take advantage of stochastic local search techniques to escape from local maxima. Aiming at those goals, we designed four stochastic versions of the top-level algorithm of YAVFORTE. They differ mainly in the decision made to implement a revision, since in some algorithms a bad move can be executed. The algorithms are as follows.


1. Stochastic greedy search with random walk

2. Stochastic hill-climbing search with random walk

3. Stochastic hill-climbing with stochastic escape

4. Simulated annealing search

Next we discuss each one of these algorithms.

6.4.1 Stochastic Greedy Search for Revisions with Random Walk

This approach follows the Randomised Iterative Improvement technique and its most famous instance, the WalkSAT algorithm; accordingly, it performs random walks, alternating between greedy and stochastic moves based on a fixed probability. In case this fixed probability is not reached, the algorithm performs a greedy move, proposing all possible modifications on the theory through the application of each matching revision operator to each found revision point, in the same fashion as the original algorithm. Then, the best proposed revision is chosen to be implemented. Note that this move is greedy only, rather than greedy hill-climbing as in the original algorithm, and therefore the score may be worse than the current one. In case the fixed probability is reached, the algorithm performs a stochastic move, selecting at random a revision to be implemented from all the possible ones. There is no requirement on the selected revision: it is neither mandatorily the best possible revision nor is it required to improve the current score. It is just a revision proposed on a revision point by a matching revision operator. Because of that, it is not necessary to explicitly propose and evaluate all the possible revisions, since no assumptions are made about the score. Thus, it is enough to enumerate the possible revisions through the revision points and the revision operators suitable to be applied on each of those revision points. For each revision point, the list of possible revisions gets an entry containing a tuple with the revision point plus a matching revision operator. For instance, regarding a single specialisation revision point SRP and assuming both revision operators are employed in the revision process, there are

going to be two tuples in the list: one containing SRP plus add-antecedent and another containing SRP plus delete-rule. The same is done for generalisation revision points. From such an enumeration, a revision can be chosen at random and then implemented. Note that bad moves are always allowed in this stochastic greedy approach. On one hand, this ability makes the algorithm capable of always escaping from local maxima. On the other hand, some moves can drive the theory to such a damaged state that the revision process will not be able to recover from it. However, modifications are performed on failure points and revision operators are designed to propose worthwhile revisions on those points, which makes the risk of really deteriorating the theory very small. To summarise, the key ideas of the Stochastic Greedy Search with Random Walk algorithm are:

• Composing the space of candidate hypotheses: the search space is formed differently depending on the type of the move. If the move is greedy, the search space is composed of all possible proposals of revisions applied on each revision point by the matching revision operators. If the move is stochastic, the candidate hypotheses are also all the possible revisions; however, as one of them is going to be chosen at random, the candidates are not in fact proposals of revisions, but instead representative tuples of the real revisions. Each tuple contains the revision point and a possible revision operator to be applied on it.

• Choosing the revision to be implemented: In a greedy move, all revisions composing the search space of candidate hypotheses are evaluated by an evaluation function and the best revision is indeed implemented on the current theory. In a stochastic move, a tuple representative of one real revision is chosen at random to be implemented. Since, unlike in greedy moves, the revision has not been proposed before being chosen, it is necessary to check whether it is really possible to modify the theory with that revision. In other words, it may be the case that the chosen revision cannot produce any modification in the theory. This is the case, for example, when trying to delete a rule and it


is the last one for a top-level predicate, or when it is not possible to delete or add antecedents in the clause. In such a situation, the procedure must choose another revision from the list, until the list is empty. If it is possible to propose a revision from the tuple, then it is implemented on the theory.

• Stop criteria: The revision process stops under three circumstances: (1) when there are no more revisions to be implemented, (2) when the revision reaches a maximum score on the training set (for example, in case the evaluation function is accuracy, the maximum score could be 1.0) or (3) when the procedure implements a maximum number of revisions (performs a maximum number of steps).
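The tuple enumeration used by the stochastic move can be sketched as follows. Here `operators_for` and `try_implement` are assumed callbacks (names are illustrative): the first maps a revision point to its applicable operators, the second attempts to apply a chosen revision and returns `None` when it cannot modify the theory, e.g. when asked to delete the last rule of a top-level predicate.

```python
import random

def enumerate_revisions(revision_points, operators_for):
    """Represent each candidate revision as a cheap (point, operator) tuple
    instead of actually proposing it; e.g. a specialisation point yields
    (point, 'add_antecedent') and (point, 'delete_rule')."""
    return [(rp, op) for rp in revision_points for op in operators_for(rp)]

def random_implementable(revisions, try_implement, rng=random):
    """Draw tuples at random until one can actually modify the theory;
    returns (None, None) when no tuple is implementable."""
    pool = list(revisions)
    while pool:
        choice = pool.pop(rng.randrange(len(pool)))
        theory = try_implement(choice)       # None if not implementable
        if theory is not None:
            return choice, theory
    return None, None
```

Because only tuples are enumerated, no revision is proposed or scored before being drawn, which is exactly where the stochastic move saves work.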

The procedure is exhibited in Algorithm 6.6 as a simplified replacement of Algorithm 3.1. By simplified version we mean that some lines are omitted, especially those concerning the application of only the required revision operators. The algorithm starts by computing the score of the current theory, since one of the stop criteria is the score reaching a maximum value. Next, it starts a loop which stops according to the criteria established just above. Inside the loop, the first task is finding revision points. Note that either the original algorithm or the stochastic algorithm may be used for this task, depending on the user's preference. Then, as usual, the move must be chosen. In a stochastic move, it is necessary to find a random "implementable" revision. To do so, the revisions are enumerated considering each revision point and the revision operators applicable to them. A revision is then selected at random and the system tries to implement it on the current theory. If it is possible, the procedure proceeds to the next iteration. Otherwise, another revision is chosen at random, until one of them can indeed be implemented or there are no more possible revisions. In case the move is greedy, proposals of revisions are generated and scored in the same way as in the YAVFORTE top-level algorithm. However, here the best revision is implemented even if the score is not improved. While the Stochastic Greedy strategy is able to escape from local maxima, it is also a risky approach, since as discussed before there is a chance that the theory


Algorithm 6.6 Stochastic Greedy Search for Revisions with Random Walk
Input: An initial theory T ; a background knowledge FDT ; a set of examples E; integer maxSteps; real maxScore
Output: A revised theory T′

1: score ← score of T
2: steps ← 0
3: repeat
4:   generate revision points
5:   with probability prev do
6:     possibleRevisions ← possible revisions enumerated from the revision points and respective revision operators
7:     repeat
8:       nextRevision ← a revision chosen at random from possibleRevisions
9:       T′ ← implement nextRevision
10:    until T′ ≠ T or possibleRevisions = ∅
11:  otherwise
12:    generate all possible revisions from the revision points and respective revision operators
13:    compute the score of each proposed revision
14:    nextRevision ← the revision with the highest score
15:    T ← implement nextRevision
16:  score ← score of T
17:  steps++
18: until score ≥ maxScore or steps = maxSteps or T has not been modified

deteriorates in one iteration and never recovers. In the next section we present a second algorithm for selecting a revision to be implemented, based on a Stochastic Hill-Climbing strategy, which therefore always requires a revision improving the current score.

6.4.2 Stochastic Hill-Climbing Search for Revisions with Random Walks

The Stochastic Hill-Climbing Search with Random Walks algorithm, like the previous algorithm, also alternates between greedy and stochastic moves. However, whatever the move is, an improvement in the score is required to go ahead. In this way, it is the most similar to the original algorithm among all the approaches designed in this section. As usual, with a fixed probability prev the algorithm chooses at random a revision to be implemented from the list of possible revisions. Such a list is an enumeration of revision point − operator tuples, formed exactly in the same way as in the previously discussed algorithm. In order to guarantee the improvement in the score, the revision chosen at random is evaluated according to some function and it is verified whether its score is better than the current one. If so, the revision is implemented. If not, another revision is chosen at random until the procedure finds a revision with a score better than the current one, or there are no more possible revisions to be implemented. The benefit in runtime brought by this approach relies on the fact that in random moves it is not necessary to explicitly propose and compute the score of all possible revisions. Proposing and computing all revisions in random moves will only happen in the very unlikely case where there is only one revision improving the score and it is the last one to be chosen.

When the fixed probability prev is not reached, the move is greedy. As the approach is greedy hill climbing, it is necessary to propose and evaluate all the possible modifications on the theory and then to choose the best one, just as the original algorithm does. The revision process stops when there are no more revisions capable of improving the score. The key components of the strategy are as follows.

• Composing the space of candidate hypotheses: The set of candidate


hypotheses is composed of the same elements as in the previous stochastic algorithm.

• Choosing the revision to be implemented: In a greedy move, all revisions composing the search space of candidate hypotheses are evaluated and the best revision is implemented on the current theory, only if it improves the current score. In a stochastic move, a tuple representative of one real revision is chosen at random to be implemented. Since, unlike in greedy moves, the revision has not been proposed before being chosen, it is necessary to check whether it is really possible to modify the theory with that revision and also whether it is capable of improving the current score. In case a revision chosen at random is implementable and improves the score, it is indeed implemented. Otherwise, another revision must be chosen from the list of possible revisions, until the list is empty.

• Stop criteria: The revision process stops under two circumstances: (1) when there are no more revisions whose implementation improves the score or (2) when the revision reaches a maximum score on the training set.

The algorithm is exhibited as Algorithm 6.7 and, as before, we omit some obvious lines concerning the applicability of revision operators. It starts by computing the score of the current theory, since it is the basis for verifying any improvement. Then, it performs an iterative hill-climbing procedure, where at each iteration the revision points are generated and either a greedy or a stochastic move is accomplished, depending on a fixed probability prev. If a stochastic move is selected, then representatives of possible revisions are collected, encompassing each revision point together with each revision operator applicable to that point. An element from that collection is chosen at random and, after proposing the selected revision in a temporary variable, the new score is computed. In case there is an improvement in the score, the proposed revision is accepted as the next revision. Otherwise, it is required to pick another revision, until finding one capable of improving the score or giving up because there is no such revision. If the move is chosen to be greedy, then, as in the original algorithm, the revisions are generated by a number of

matching revision operators and scored with an evaluation function, until the maximum potential of a revision point is achieved by some revision. Afterwards, the revision with the highest score is selected and implemented in case the current score is improved by that revision. The procedure stops when the theory under revision reaches a maximum score or when there is no revision capable of modifying the current theory so that the score is improved.

Algorithm 6.7 Stochastic Hill-Climbing Search for Revisions with Random Walk
Input: An initial theory T ; a background knowledge FDT ; a set of examples E; integer maxSteps; real maxScore
Output: A revised theory T′

1: score ← compute score of T
2: repeat
3:   generate revision points
4:   with probability prev do
5:     possibleRevisions ← possible revisions enumerated from the revision points and respective revision operators
6:     repeat
7:       nextRevision ← a revision chosen at random from possibleRevisions
8:       T′ ← T after implementing nextRevision
9:       scoreNextRevision ← score of T′
10:      if scoreNextRevision > score then
11:        T ← T′
12:        score ← scoreNextRevision
13:      else
14:        possibleRevisions ← possibleRevisions − nextRevision
15:    until T = T′ or possibleRevisions = ∅
16:  otherwise
17:    generate all possible revisions from the revision points and respective revision operators
18:    compute the score scoreNextRevision of each proposed revision
19:    nextRevision ← the revision with the highest score
20:    if scoreNextRevision > score then
21:      T ← implement nextRevision on T
22:      score ← scoreNextRevision
23: until score = maxScore or T has not been modified
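One move of this strategy can be sketched as below; the point of interest is that both move types demand a score improvement, but the stochastic move scores candidates one at a time. `implement` and `score` are assumed callbacks, and `revisions` stands for the enumerated candidates.

```python
import random

def stochastic_hill_climbing_move(theory, revisions, implement, score,
                                  p_rev=0.5, rng=random):
    """One move of stochastic hill-climbing with random walks: whatever the
    move type, the chosen revision must improve the score. A random move
    evaluates candidates one at a time, so usually only a few revisions are
    scored instead of all of them."""
    current = score(theory)
    if rng.random() < p_rev:                      # stochastic move
        pool = list(revisions)
        while pool:
            r = pool.pop(rng.randrange(len(pool)))
            candidate = implement(theory, r)
            if score(candidate) > current:        # first improving revision wins
                return candidate
        return theory                             # no improving revision found
    # greedy hill-climbing move: score everything, keep the best if it improves
    best = max(revisions, key=lambda r: score(implement(theory, r)))
    candidate = implement(theory, best)
    return candidate if score(candidate) > current else theory
```

Returning the unchanged theory when nothing improves corresponds to the algorithm's "T has not been modified" stop condition.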

In the next section we present an intermediate strategy between the two stochastic algorithms discussed here. It is able to perform bad moves, aiming to escape from local maxima, but only during a stochastic move.


6.4.3 Hill-Climbing Search for Revisions with Stochastic Escapes

The stochastic greedy algorithm first presented here may deteriorate the theory, since it always allows bad moves to be performed. The previous algorithm, on the other hand, never allows a bad move, which cannot prevent it from getting stuck in local maxima. Besides, even when the move is stochastic it has to search for a revision improving the score, adding an extra cost factor compared to the former algorithm. Aiming to overcome those issues, a third stochastic version of Algorithm 3.1 is designed in this section. The strategy exploited here is to try to escape from local maxima when performing a stochastic move and to implement the best candidate otherwise.

Thus, as usual, with a certain probability prev, the algorithm performs a stochastic escape, choosing a revision to be implemented at random even if its score is not better than the current one. On the other hand, when the move is greedy, this algorithm proceeds as originally, selecting the revision with the highest score and demanding that such a score is better than the current one. If there is no such revision, the algorithm continues to the next iteration, in an attempt to reach the probability prev and then perform a stochastic move. The procedure stops when it reaches a maximum number of steps or a maximum score. As before, we summarise the key components.

• Composing the space of candidate hypotheses: The set of candidate hypotheses is composed of the same elements as in the previous stochastic algorithms.

• Choosing the revision to be implemented: In a greedy move, it behaves as the greedy component of Stochastic Hill-Climbing and hence as the original revision algorithm. Thus, all revisions composing the search space of candidate hypotheses are evaluated and the best revision is implemented on the current theory, only if it improves the current score. However, in case no revision improves the score, instead of terminating like those algorithms, Hill-Climbing with Stochastic Escape continues to the next iteration. In a stochastic move, it may implement a revision with a bad score, but only if it does not degrade


the score too much. To decide how much degradation is allowed, we use a function based on a perturbation strategy (Hoos and Stützle, 2005), defined as score + 0.5 ∗ (Potential + score), where Potential is the number of examples that indicated the need of revising a point, and therefore the maximum score that a revision could have.

• Stop criteria: The revision process stops under two conditions, namely: (1) when the revision reaches a maximum score or (2) when the procedure performs a maximum number of steps.

Algorithm 6.8 exhibits the Hill-Climbing with Stochastic Escape procedure. The first component of the main loop, executed when the move is stochastic, chooses a revision at random. In case its score obeys the "perturbation" constraint, it is implemented. Otherwise, the algorithm proceeds to the next iteration. The second component, the greedy move, behaves as in Stochastic Hill-Climbing. Nevertheless, the algorithm has a stopping criterion differing from the previous algorithms, as it stops only when a maximum score is reached or when a maximum number of iterations has been executed. This criterion allows the procedure to proceed to the next iteration without implementing any revision, which happens when the greedy move cannot find any revision capable of improving the current score or a randomised revision degrades the score more than is allowed. Finally, we would like to have an algorithm which does not take greedy decisions and also accepts bad moves under a certain condition. As this idea is quite similar to what is achieved with simulated annealing techniques, we implemented a version of this strategy on top of the top-level revision algorithm.

6.4.4 Simulated Annealing Search for Revisions

Simulated annealing chooses at each iteration a candidate hypothesis at random and, in case the current score is not improved, it uses a scheduling function to decide whether such a hypothesis can be accepted as the next hypothesis. As simulated annealing always selects a revision at random, it can arguably be faster than the previous algorithms developed in this section, since it never evaluates all the possible modifications on


Algorithm 6.8 Hill-Climbing Search for Revisions with Stochastic Escape
Input: An initial theory T ; a background knowledge FDT ; a set of examples E; integer maxSteps; real maxScore
Output: A revised theory T′

1: score ← compute score of T
2: steps ← 0
3: repeat
4:   generate revision points
5:   possibleRevisions ← possible revisions enumerated from the revision points and respective revision operators
6:   with probability prev do
7:     repeat
8:       nextRevision ← a revision chosen at random from possibleRevisions
9:       score′ ← compute score of nextRevision
10:      if score′ + 0.5(Potential + score′) > 0 then
11:        T′ ← implement nextRevision
12:      else
13:        T′ ← T
14:        possibleRevisions ← possibleRevisions − nextRevision
15:    until T′ ≠ T or possibleRevisions = ∅
16:  otherwise
17:    generate all possible revisions from the revision points and respective revision operators
18:    compute the score scoreNextRevision of each proposed revision
19:    nextRevision ← the revision with the highest score
20:    if scoreNextRevision > score then
21:      T ← implement nextRevision on T
22:      score ← scoreNextRevision
23:  steps++
24: until score ≥ maxScore or steps = maxSteps
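The escape test on line 10 is small enough to isolate. The sketch below takes the perturbation formula exactly as stated in the text (score′ + 0.5 ∗ (Potential + score′) > 0); the function name is ours, not the system's.

```python
def accept_escape(candidate_score, potential):
    """Stochastic-escape acceptance test, using the perturbation-based
    bound from the text: a randomly chosen revision may be implemented
    with a bad score, as long as score' + 0.5 * (potential + score')
    stays positive. `potential` is the number of examples that flagged
    the revision point, i.e. the maximum score a revision there could
    reach."""
    return candidate_score + 0.5 * (potential + candidate_score) > 0
```

Intuitively, the larger the potential of the revision point, the more negative a candidate score can be and still be accepted as an escape move.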

the theory. However, it can take more iterations to converge. It has been proved that, if the value returned by the scheduling function decreases slowly enough, the algorithm will find the global optimum (Russell and Norvig, 2010). Here we follow exactly the same idea and implement simulated annealing as one of the stochastic algorithms for selecting the revision to be implemented. The key components of the algorithm are as follows.

• Defining the search space of candidate hypotheses: Candidate hypotheses are exactly the same as in the stochastic components of the previous algorithms. Thus, the candidate hypotheses are all the possible revisions applicable to each provided revision point, but without proposing them to compose the search space. Instead, they are representative tuples of each real revision, where each tuple is a revision point and a possible revision operator to be applied on it.

• Choosing the revision to be implemented: The revision to be implemented is chosen at random from the search space of candidate hypotheses. A proposal of the revision is generated and its score is computed. If such a score is better than the current one, the revision is implemented on the current theory. Otherwise, the revision is implemented only if a certain probability is reached. This probability is defined from the difference between the current score and the score of the revision, divided by the value computed with the scheduling function. The scheduling function, among other parameters, takes into account the number of steps performed so far. It may be the case that an iteration gets to the end without implementing any revision, precisely when a revision chosen at random neither improves the current score nor is acceptable as a bad move.

• Stop criteria: The revision process stops under three conditions, namely: (1) when there are no possible revisions to be implemented, (2) when the scheduling function returns zero or (3) when the theory achieves a maximum score.


Algorithm 6.9 presents the simulated annealing strategy applied to the search for revisions. Besides the usual parameters, it requires a parameter limit, indicating the maximum number of iterations, and a reduction factor lam. The main loop starts by generating the revision points as usual, followed by the enumeration of tuples of possible revisions to be implemented upon such revision points. As the revisions are not really proposed before one of them is chosen, it is necessary to check whether it is possible to implement the chosen revision on the theory. In possession of an implementable revision, the algorithm must decide whether it will indeed be implemented on the current theory. To do so, it first verifies whether the score can be improved after implementing the revision, in which case the revision is implemented. If the score is not improved, the revision might be implemented anyway, but only when the scheduling function returns a value higher than a random probability. The scheduling function guarantees that the more iterations the algorithm performs, the smaller the chance of a bad move being selected. The algorithm may proceed without implementing any revision in an iteration, and it finishes the revision process when the scheduling function returns 0.0.

6.5 Experimental Results

In this chapter, we would like to investigate whether it is possible to reduce the runtime of the revision process, even when the search space is larger than usual, through the use of Stochastic Local Search techniques. Thus, the major question we would like to answer is whether the revision process guided by SLS algorithms can be faster than the traditional revision without harming accuracy. A secondary question is whether we can also be comparable to learning from scratch in terms of learning time, while reaching better accuracies than that approach. Accordingly, we have selected three ILP datasets that YAVFORTE had a hard time revising and compared the accuracy and runtime obtained by Aleph (Srinivasan, 2001b), YAVFORTE (see Chapter 3) and the stochastic algorithms. The datasets are as follows.


Algorithm 6.9 Simulated Annealing Search for Revisions
Input: An initial theory T ; a background knowledge BK; a set of examples E
Output: A revised theory T′

1: score ← score of T
2: S ← 1
3: repeat
4:   generate revision points
5:   T′ ← T
6:   possibleRevisions ← possible revisions enumerated from the revision points and respective revision operators
7:   repeat
8:     nextRevision ← a revision chosen at random from possibleRevisions
9:     if it is possible to implement nextRevision on T then
10:      T′ ← implement nextRevision
11:    else
12:      possibleRevisions ← possibleRevisions − nextRevision
13:  until possibleRevisions = ∅ or T′ ≠ T
14:  if T′ ≠ T then
15:    scoreNextRevision ← compute score of T′
16:    ∆E ← scoreNextRevision − score
17:    if ∆E > 0 then
18:      T ← T′
19:      score ← scoreNextRevision
20:    else
21:      S ← scheduling(t, limit, lam)
22:      if S ≠ 0.0 then
23:        with probability e^(∆E/S) do
24:          implement nextRevision
25:          score ← scoreNextRevision
26:  t++
27: until S = 0
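The acceptance decision (lines 16–25) can be sketched as follows. We assume, based on the citation of Russell and Norvig and the parameters limit and lam, an exponential cooling schedule in the style of their textbook; the exact schedule used by the system is not spelled out here, so `exp_schedule` is an illustrative stand-in.

```python
import math
import random

def exp_schedule(t, limit=100, lam=0.005, k=1.0):
    """Assumed exponential cooling: temperature decays as k*e^(-lam*t)
    and is cut to 0 after `limit` steps, which ends the revision loop."""
    return k * math.exp(-lam * t) if t < limit else 0.0

def sa_accept(delta_e, t, rng=random, schedule=exp_schedule):
    """Simulated-annealing acceptance: improvements (delta_e > 0) are
    always taken; a worsening move is taken with probability e^(delta_e/S),
    where S is the current temperature."""
    if delta_e > 0:
        return True
    s = schedule(t)
    return s != 0.0 and rng.random() < math.exp(delta_e / s)
```

Since delta_e ≤ 0 in the second branch, e^(delta_e/S) lies in (0, 1] and shrinks as S cools, so bad moves become increasingly unlikely as iterations accumulate.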


6.5.1 Datasets

• Pyrimidines: this is a Quantitative Structure Activity Relationships (QSAR) problem, concerning the inhibition of E. coli Dihydrofolate Reductase by pyrimidines, which are antibiotics acting by inhibiting Dihydrofolate Reductase, an enzyme on the pathway to forming DNA (King et al., 1992; Hirst et al., 1994a). The dataset we used in this work is composed of 2361 positive examples and 2361 negative examples.

• Proteins: this is a task of secondary structure protein prediction. The task is to learn rules to identify whether a position in a protein is in an alpha-helix (Muggleton et al., 1992). We considered a dataset with 1070 positive and 970 negative examples.

• Yeast sensitivity (Spellman et al., 1998; Kadupitige et al., 2009): this is a dataset concerning the problem of gene interaction in the yeast Saccharomyces cerevisiae. It is composed of 430 positive examples and 680 negative examples. It has a huge background knowledge with approximately 170,000 facts.

Experimental Methodology. The datasets were split into 10 disjoint folds to use a k-fold stratified cross-validation approach. Each fold keeps the original distribution of positive and negative examples (Kohavi, 1995). The significance test used was the corrected paired t-test (Nadeau and Bengio, 2003), with p < 0.05. As stated by Nadeau and Bengio (2003), the corrected t-test takes into account the variability due to the choice of the training set, and not only that due to the test examples; ignoring the former could lead to gross underestimation of the variance of the cross-validation estimator and to the wrong conclusion that the new algorithm is significantly better when it is not. All the experiments were run on Yap Prolog (Santos Costa, 2008). The initial theories were obtained from the Aleph system using default parameters, except for clause length, defined as 10, noise, defined as 30, nodes, set to 10000, and minpos, set to 2. The use of those parameters was inspired by the work of Muggleton et al. (2010). To generate such theories, the whole dataset was

considered, but using a 10-fold cross-validation procedure. Thus, a different theory was generated for each fold, and each of these theories is revised considering its respective fold (the same fold is used to generate and to revise the theory). Theories returned by Aleph have about 200 clauses for each dataset, which makes them difficult for YAVFORTE to revise. Stochastic algorithms were run 5 times because of their random choices.
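The variance correction of Nadeau and Bengio (2003) mentioned above replaces the naive 1/k factor of the standard paired t-test with (1/k + n_test/n_train). A minimal sketch of this statistic follows; the per-fold accuracies used in the usage example are made up for illustration.

```python
import math
from statistics import mean, variance

def corrected_paired_ttest(scores_a, scores_b, test_frac):
    """Variance-corrected paired t statistic of Nadeau and Bengio (2003).

    scores_a, scores_b: per-fold accuracies of two systems over the same
    k cross-validation folds; test_frac is the ratio between the number of
    test and training examples (1/9 for 10-fold cross-validation). The
    (1/k + test_frac) factor accounts for the variability due to the
    choice of training set, not only that due to the test examples.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    return mean(diffs) / math.sqrt((1.0 / k + test_frac) * variance(diffs))

# Hypothetical per-fold accuracies of two systems over 10 folds
a = [0.82, 0.83, 0.81, 0.84, 0.82, 0.83, 0.82, 0.81, 0.83, 0.84]
b = [0.80] * 10
t = corrected_paired_ttest(a, b, test_frac=1.0 / 9.0)
print(t)  # compare against the t distribution with k-1 degrees of freedom
```

The statistic is then compared against the Student t distribution with k-1 degrees of freedom at the chosen significance level (p < 0.05 in this chapter).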

6.5.2 Behavior of the Stochastic Local Search Algorithms with Different Parameters

First, we would like to observe how the different stochastic strategies behave under different parameters. To do so, we used the Pyrimidines and Proteins datasets and plotted curves for each individual algorithm with different appropriate parameters.

Varying the Number of Revision Points We start by comparing the revision time and accuracy of Algorithm 6.1, which includes a stochastic component in the search for revision points, with different maximum numbers of revision points returned. Figures 6.1 and 6.2 exhibit the results for Pyrimidines and Proteins, respectively, with the probability parameter fixed at 100% and 50% and the number of revision points defined as 1, 5, 10 and 20. As expected, runtime increases as more revision points are returned. However, accuracies are not significantly different when 5 or more revision points are found. Accordingly, choosing only one revision point makes the revision system extremely fast, but at the expense of worse accuracies. Accuracies are not significantly different when the probability is either 100% or 50%.
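The bounded stochastic collection of revision points described above can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual implementation: the `find_points` callback, the `max_points` cap and the example representation are hypothetical.

```python
import random

def stochastic_revision_points(misclassified, find_points, max_points,
                               p=0.5, seed=None):
    """Sketch of a stochastic search for revision points (in the spirit
    of Algorithm 6.1): with probability p the misclassified examples are
    shuffled and revision points are collected from them until at most
    max_points distinct points are gathered; otherwise a greedy move
    collects every revision point from every misclassified example.
    find_points maps one example to its revision points (hypothetical)."""
    rng = random.Random(seed)
    points = []
    if rng.random() < p:  # stochastic move: bounded, randomized collection
        examples = list(misclassified)
        rng.shuffle(examples)
        for ex in examples:
            for pt in find_points(ex):
                if pt not in points:
                    points.append(pt)
                if len(points) >= max_points:
                    return points
        return points
    for ex in misclassified:  # greedy move: collect all revision points
        for pt in find_points(ex):
            if pt not in points:
                points.append(pt)
    return points

# Usage (hypothetical): examples 0..9, each yielding one of three points
points = stochastic_revision_points(range(10), lambda ex: [ex % 3],
                                    max_points=2, p=1.0, seed=0)
print(points)
```

Setting p to 1.0 reproduces the purely stochastic regime discussed in the experiments, where at most `max_points` revision points are ever examined.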

Varying the Number of Iterations The Stochastic Greedy (Algorithm 6.6) and Hill Climbing with Stochastic Escape (Algorithm 6.8) algorithms consider a maximum number of iterations as their stopping criterion. Simulated annealing also takes into account a limit to decide when to stop, given by the parameter limit in Algorithm 6.9. In order to check the performance of these approaches under different maximum numbers of iterations, we fixed the probability of random walks at 100% and 50% for the first two cases and plotted the accuracy versus runtime achieved by the algorithms. The graphics are exhibited in


Figure 6.1: Comparing runtime and accuracy of Algorithm 6.1 in Pyrimidines Dataset, with number of revision points varying in 1, 5, 10, 20. Probabilities are fixed in 100% and 50%.

Figure 6.2: Comparing runtime and accuracy of Algorithm 6.1 in Proteins Dataset with number of revision points varying in 1, 5, 10, 20. Probabilities are fixed in 100% and 50%.

Figures 6.3 and 6.4, for the Pyrimidines and Proteins datasets, respectively. In the Pyrimidines dataset, simulated annealing is the fastest algorithm, but it only achieves accuracy equivalent to the other algorithms when the number of iterations is 40. We performed some additional tests to see whether, by increasing the number of iterations, simulated annealing would perform better, but we only verified higher runtime, while accuracies remained stationary. Fixing the probability at 100% for the two other algorithms makes them execute very fast. However, always performing stochastic moves does not improve

the accuracies of the theories as much as it could. We see that by looking at the performance of these algorithms when the probability is fixed at 50%: although the revision process takes more time, since in greedy moves all revisions are evaluated so that the best one is chosen, the accuracies are also significantly higher. The Hill Climbing with Stochastic Escape algorithm with 50% probability achieves the best accuracies, in the same runtime as the second best algorithm, the Stochastic Greedy algorithm, also with 50% probability. In the Proteins dataset, it is interesting to see that the best accuracy was achieved by simulated annealing, in less time than in most of the other cases. All the other algorithms behave as in the Pyrimidines dataset: probabilities of 50% yield better accuracies, achieved at the expense of longer revision time. However, Hill Climbing with Stochastic Escape at 100% is in this case also able to achieve accuracies statistically equivalent to the 50% cases, in less time. Those curves suggest that when the number of iterations increases the systems become slower, as expected, but their accuracies do not change significantly.
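Simulated annealing, unlike the random-walk strategies, never compares all candidate revisions; it decides move by move using the standard Metropolis acceptance rule, which the thesis's Algorithm 6.9 is built on (the exact scoring and cooling schedule used there may differ from this sketch).

```python
import math
import random

def anneal_accept(delta, temperature, rng=random):
    """Metropolis acceptance rule used by simulated annealing: a move
    that improves the score (delta >= 0, assuming higher is better) is
    always accepted; a worsening move is accepted with probability
    exp(delta / temperature), so bad moves become rarer as the
    temperature decreases along the iterations."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)

rng = random.Random(42)
# At high temperature almost any move is accepted; at low temperature
# only (nearly) improving moves pass.
hot = sum(anneal_accept(-0.1, 10.0, rng) for _ in range(1000))
cold = sum(anneal_accept(-0.1, 0.001, rng) for _ in range(1000))
print(hot, cold)
```

This acceptance behavior explains the stationary accuracies observed above: once the temperature is low, extra iterations rarely accept any move, so runtime grows while the returned theory stays the same.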

Figure 6.3: Comparing runtime and accuracy of SLS algorithms in Pyrimidines Dataset, varying maximum number of iterations, which is set to 10, 20, 30, 40. Probabilities are fixed in 100% and 50%, when it is the case.

Varying Probability Values Probability parameters decide the type of move the algorithm is going to make: either the move is greedy, and the best hypothesis found in the set of all generated hypotheses is chosen to be


Figure 6.4: Comparing runtime and accuracy of SLS algorithms in the proteins dataset, varying maximum number of iterations, which is set to 10, 20, 30, 40. Probabilities are fixed in 100% and 50%, when it is the case.

implemented, or the move is stochastic and the possible hypotheses are randomized. All algorithms but simulated annealing follow this strategy. To see the performance of the algorithms under different probability parameters, we plot accuracy versus runtime curves for each algorithm, considering probabilities of 100%, 80%, 60% and 40%. The maximum number of iterations for the Hill Climbing with Stochastic Escape and Stochastic Greedy algorithms for choosing revisions is set to 20, since in the previous section this value showed the best balance between accuracy and runtime. The maximum number of revision points is set to 10. In the Pyrimidines dataset, the performance of the Greedy and Stochastic Escape algorithms follows the same pattern: the smaller the probability, the higher the accuracies achieved, at the expense of higher runtime. However, while Stochastic Escape can reach accuracy very close to the other algorithms in the best case, the accuracies of the Greedy algorithm are always significantly worse than those of the other algorithms. The accuracies achieved by the rest of the algorithms change slightly, but are not significantly different, no matter what the probability value is. On the other hand, the smaller the chance of choosing a stochastic move, the slower the revision process. It is interesting that in these cases, even when always following a stochastic move (probability set to 100%), accuracies are not significantly affected. Thus, we can conclude from this dataset that, if the stochastic move does not demand

an improvement on the score, which is the case for Greedy and Stochastic Escape, probabilities play a fundamental role. On the other hand, if both moves require an improvement on the score (stochastic hill climbing), high probabilities can yield the same results as low probabilities, but in less time. The Proteins dataset shows similar results. In this case, the stochastic hill climbing search for revisions and the stochastic search for literals achieve the best accuracy results, but at the expense of higher runtime.
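The stochastic hill climbing regime just described, where both move types require a score improvement and the probability only decides how the improving move is found, can be illustrated with a minimal sketch. The names and the toy numeric search space are hypothetical; the thesis's algorithms move over revision operators, not integers.

```python
import random

def random_walk_revision(initial, neighbours, score, p_stochastic,
                         max_iters, seed=None):
    """Random-walk local search (sketch): at each iteration, with
    probability p_stochastic the candidates are shuffled and the first
    one improving the current score is taken (stochastic move);
    otherwise every candidate is scored and the best one is taken if it
    improves on the current score (greedy move). Higher score is
    better; the search stops at a local optimum or after max_iters."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    for _ in range(max_iters):
        candidates = neighbours(current)
        if not candidates:
            break
        if rng.random() < p_stochastic:
            rng.shuffle(candidates)
            nxt = next((c for c in candidates if score(c) > current_score),
                       None)
        else:
            nxt = max(candidates, key=score)
            if score(nxt) <= current_score:
                nxt = None
        if nxt is None:
            break  # no improving move available
        current, current_score = nxt, score(nxt)
    return current

# Toy usage: maximize -|x - 7| over integers, starting from 0
best = random_walk_revision(0, lambda x: [x - 1, x + 1],
                            lambda x: -abs(x - 7),
                            p_stochastic=0.5, max_iters=50, seed=1)
print(best)  # converges to the optimum 7 of this toy score
```

The stochastic move is cheaper because it stops at the first improving candidate instead of scoring all of them, which is why high probabilities yield the same results in less time when improvement is required either way.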

Figure 6.5: Comparing runtime and accuracy of SLS algorithms in Pyrimidines Dataset, with probabilities varying in 100, 80, 60, 40. Maximum number of iterations is fixed in 20, when it is the case.

Figure 6.6: Comparing runtime and accuracy of SLS algorithms in Proteins Dataset, with probabilities varying in 100, 80, 60, 40. Maximum number of iterations is fixed in 20, when it is the case.


6.5.3 Comparing Runtime and Accuracy of SLS Algorithms to Aleph and YAVFORTE

In this section we compare the performance of the stochastic local search algorithms to state-of-the-art learning from scratch and revision systems. We compare the accuracy and runtime of Aleph and YAVFORTE (see chapter 3) to the SLS algorithms presenting the best trade-off between accuracy and runtime. In addition to Pyrimidines and Proteins, we also consider here the Yeast sensitivity dataset; in this case, concerning the search for revisions, we show the results only for stochastic hill climbing, since it has the best accuracy results on the other two datasets. Besides running the stochastic algorithms individually for each key search, we also combine the stochastic algorithms as follows.

1. Stochastic search for revision points with stochastic search for literals.

2. Stochastic search for revision points with stochastic Hill climbing search for revisions.

3. Stochastic Hill climbing search for revisions with stochastic search for literals.

4. Stochastic search for revision points with stochastic search for literals and stochastic Hill climbing search for revisions.

Figures 6.7, 6.8 and 6.9 present the accuracies returned by the systems for the Pyrimidines, Proteins and Yeast sensitivity datasets, respectively. The parameters used for each system are presented in parentheses, where the first value is a probability parameter and the second value (when it is the case) is the number of iterations/revision points. Next, we discuss the most relevant cases observed in the graphics.

ˆ All revision algorithms in the three datasets achieve accuracies significantly better than the Aleph system.

ˆ Except for the greedy search for revisions, in the Pyrimidines dataset there is no statistically significant difference between the accuracy returned by YAVFORTE and the stochastic algorithms.


ˆ Best accuracy result for Pyrimidines is achieved by stochastic search for literals combined with Hill climbing stochastic search for revisions, although it is not significantly better than the others.

ˆ In Proteins dataset, best results are achieved by stochastic search for literals and stochastic search for literals combined with Hill climbing stochastic search for revisions.

ˆ In the Proteins dataset, the stochastic search for revision points and the stochastic greedy search for revisions achieve significantly worse accuracies, compared to YAVFORTE and the best SLS results.

ˆ There is no significant difference between YAVFORTE and the rest of the SLS algorithms.

ˆ In Yeast Sensitivity dataset, stochastic search for literals and the combination of the three stochastic components achieve significantly better accuracies than YAVFORTE.

ˆ The other SLS algorithms do not present a significant difference compared to YAVFORTE.

From these results, we can see that the SLS algorithms provide either better or equivalent accuracies compared to the baseline revision system YAVFORTE. In addition, they are always significantly better than the Aleph system. Figures 6.10, 6.11 and 6.12 exhibit the runtime of Aleph, YAVFORTE and the SLS algorithms for Pyrimidines, Proteins and Yeast sensitivity, respectively. The following observations arise from the results graphically represented in the figures.

ˆ Pyrimidines Dataset

– The revision time of YAVFORTE is significantly higher than that of all other algorithms in the Pyrimidines dataset.

– In Pyrimidines dataset, the only algorithm significantly slower than Aleph is SLS with revisions escape. This is likely due to the high number of iterations.


Figure 6.7: Comparing accuracy of SLS algorithms to Aleph and YAVFORTE in Pyrimidines Dataset. Results of SLS algorithms are the ones with a best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parenthesis.

Figure 6.8: Comparing accuracy of SLS algorithms to Aleph and YAVFORTE in Proteins Dataset. Results of SLS algorithms are the ones with a best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parenthesis.

– The stochastic hill climbing search for revisions combined with the stochastic search for literals presents no significant runtime difference compared to Aleph. In this case, the combination of these SLS algorithms takes more time to converge.

– All the other SLS cases are significantly faster than Aleph.


Figure 6.9: Comparing accuracy of SLS algorithms to Aleph and YAVFORTE in Yeast Sensitivity Dataset. Results of SLS algorithms are the ones with a best trade- off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parenthesis.

– The highest speedup of an SLS algorithm compared to YAVFORTE is a factor of 17, while the smallest speedup is a factor of 4.

ˆ In the Proteins dataset all SLS algorithms are significantly faster than Aleph and YAVFORTE, except for the stochastic search for literals, which is faster than YAVFORTE but not significantly different from Aleph. The highest speedup of an SLS algorithm compared to YAVFORTE is a factor of 16, while the smallest speedup is a factor of 2.

ˆ In Yeast sensitivity, Aleph is significantly faster than all revision algorithms. We believe this is due to the huge background knowledge of this dataset, which makes coverage tests slower; revision must perform many more such tests than learning from scratch approaches. On the other hand, all SLS algorithms are significantly faster than YAVFORTE. The highest speedup of an SLS algorithm compared to YAVFORTE is a factor of 26, while the smallest speedup is a factor of 2.

From the results, we can conclude that by using stochastic local search algorithms individually, and especially by combining them, it is possible to greatly reduce the revision time and also to be faster than or competitive with learning from scratch. Moreover, the accuracies achieved by the SLS strategies are always better than the learning from scratch


Figure 6.10: Comparing runtime of SLS algorithms to Aleph and YAVFORTE in Pyrimidines Dataset. Results of SLS algorithms are the ones with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parenthesis.

Figure 6.11: Comparing runtime of SLS algorithms to Aleph and YAVFORTE in Proteins Dataset. Results of SLS algorithms are the ones with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parenthesis.

approach and competitive with the traditional revision approach. Thus, we positively answer both questions posed in the beginning of this section.

6.6 Conclusions

In this chapter we designed a set of stochastic local search algorithms for exploring the key search spaces of the revision process more efficiently.

Figure 6.12: Comparing runtime of SLS algorithms to Aleph and YAVFORTE in Yeast Sensitivity Dataset. Results of SLS algorithms are the ones with the best trade-off between accuracy and runtime. Parameters of probability, number of iterations and number of revision points are in parenthesis.

The algorithms abandon completeness in favor of finding good solutions in a reasonable time. Most of them are based on random walks, so that the choice between pursuing a greedy or a stochastic move is made according to a probability parameter. Stochastic algorithms were implemented in the YAVFORTE system (see chapter 3 and (Duboc et al., 2009)) in every key search of the revision process. First, an SLS algorithm was built to avoid collecting all revision points from all misclassified examples. With a probability p, misclassified examples are randomized and a pre-defined number of revision points is generated. The search alternates with greedy moves, since without the probability p all revision points found in the theory from each misclassified instance are collected. However, through the experimental results we found that in several cases there is no need to employ greedy moves, since defining the probability parameter as 100% already achieves good results: the revision time is greatly reduced and accuracies are not statistically significantly different compared to the baseline system. The performance of the stochastic component in the search for revision points is more influenced by the parameter defining the number of revision points that should be returned. Second, SLS components were included in the search for literals to be added to or removed from a clause. Stochastic search was included in both the hill climbing

and relational pathfinding algorithms for specializing clauses. In the first case, when the move is stochastic, literals from the Bottom Clause are randomized and the first literal found that improves the score is added to the clause. In relational pathfinding, paths created from the Bottom Clause and a positive instance are randomized. In this work, we used both algorithms cooperating with each other, when relational pathfinding was applicable. We noticed from the empirical evaluation that, although the stochastic search for literals is able to reduce runtime compared to the hill climbing greedy approach of YAVFORTE, if it is executed without the other stochastic components it is not as effective as the other SLS algorithms. The reasons for that are mainly due to the Bottom Clause: either it is small and the stochastic component does not make a huge difference, or it is large, but in this case the stochastic move takes more iterations to converge than the original approach. Novel approaches can be investigated to further improve this stochastic component, such as pre-processing literals using Bayesian networks or genetic algorithms (Oliphant and Shavlik, 2008; Muggleton and Tamaddoni-Nezhad, 2008; Pitangui and Zaverucha, 2011). Third, four different stochastic components were included to decide which revision is going to be implemented. Three of them are based on random walks and one of them performs a simulated annealing algorithm. In the results we could see that the stochastic greedy algorithm for searching revisions, which in both stochastic and greedy moves allows a revision with a score worse than the current one to be implemented, does not perform well. As an improvement in score is not always required, the accuracy deteriorates along the iterations.
The stochastic escape approach, which accepts bad moves only if they do not degrade the score too much, performs better than greedy, with good accuracy and runtime results in several cases. However, the simplest strategy, which with a certain probability randomizes revisions and implements the first one improving the score, achieved the best overall results. The three strategies above were compared both individually and by combining the best strategies. In the vast majority of the cases, the stochastic approaches achieve better accuracies than the learning from scratch Aleph system, with faster or competitive runtime. Moreover, accuracies in most of the cases are statistically

equivalent to those of YAVFORTE, but the runtime is always much faster. In the best case, an SLS algorithm is 25X faster than YAVFORTE, with equivalent accuracy. We see at least two different topics for further investigation concerning SLS in first-order logic revision: the use of rapid randomized restart strategies, already employed in ILP in (Železný et al., 2004; Železný et al., 2006), which would avoid being stuck in unproductive refinements by restarting, after a certain criterion is met, from another random point; and the use of a stochastic component in coverage tests, as is done in (Sebag and Rouveirol, 1997; Kuželka and Železný, 2008a).

Chapter 7 Probabilistic Logic Learning

Traditional statistical machine learning algorithms deal with uncertainty by assuming the data are independent and identically distributed (i.i.d.). On the other hand, relational learning algorithms have the capability to represent multi-typed related objects, but they impose severe limitations on representing and reasoning in the presence of uncertainty. However, in most real-world applications data is multi-relational, heterogeneous, uncertain and noisy. Examples include data from the web, bibliographic datasets, social network analysis, chemical and biological data, robot mapping and natural language, among others (Lachiche and Flach, 2002; Battle et al., 2004; Getoor et al., 2004; Jaimovich et al., 2005; Davis et al., 2005a; Neville et al., 2005; Wang and Domingos, 2008; Raghavan et al., 2010). Therefore, in order to extract all the useful information from those datasets it is necessary to use techniques dealing with multi-relational representations and probabilistic reasoning. Probabilistic Logic Learning (PLL), also called Probabilistic Inductive Logic Programming (PILP) (De Raedt et al., 2008a) and Statistical Relational Learning (SRL) (Getoor and Taskar, 2007), is an emerging area of artificial intelligence, lying at the intersection of reasoning about uncertainty, machine learning and logical knowledge representation (De Raedt, 2008). As such, PLL is able to deal with machine learning and data mining in complex relational domains where information may be missing, partially observed and/or noisy. A large number of PLL systems have been proposed in recent years, giving rise to several formalisms for representing logical and probabilistic knowledge. The formalisms can be divided into several general classes (Getoor, 2007), yet here we

choose to place them on the two most relevant axes (De Raedt et al., 2008a). Logical Probabilistic Models (LPM) are extensions of probabilistic models that are able to deal with objects and relations by including logical or relational elements. Typically, they build a directed or undirected probabilistic graphical model to reason about uncertainty, using logic as a template. Inside this category there is a group of formalisms based upon directed probabilistic graphical models, composed of Relational Bayesian Networks (Jaeger, 1997), Probabilistic Logic Programs (Ngo and Haddawy, 1997; Haddaway, 1999), Probabilistic Relational Models (PRM) (Koller and Pfeffer, 1998; Koller, 1999; Friedman et al., 1999; Getoor et al., 2001), Bayesian Logic Programs (BLP) (Kersting and De Raedt, 2001d; Kersting and De Raedt, 2001b; Kersting and De Raedt, 2001a; Kersting and De Raedt, 2007), Constraint Logic Programming with Bayes Nets (CLP(BN)) (Santos Costa et al., 2003a), Hierarchical Bayesian Networks (Gyftodimos and Flach, 2004), Logical Bayesian Networks (Fierens et al., 2005), Probabilistic Relational Language (Getoor and Grant, 2006), etc., and a group composed of systems built upon undirected probabilistic graphical models, including Relational Markov Networks (Taskar et al., 2002), Relational Dependency Networks (Neville and Jensen, 2004) and Markov Logic Networks (MLN) (Singla and Domingos, 2005; Kok and Domingos, 2005; Richardson and Domingos, 2006; Domingos and Lowd, 2009). On the other axis are the Probabilistic Logical Models (PLM), which are formalisms extending logic programs with probabilities, staying as close as possible to logic programming by annotating clauses with probabilities. In this class, the logical inference is modified to deal with the probability parameters.
Formalisms following this approach include Probabilistic Horn Abduction (Poole, 1993) and its extension Independent Choice Logic (ICL) (Poole, 1997), Stochastic Logic Programs (SLP) (Muggleton, 1996; Muggleton, 2000; Muggleton, 2002), PRISM (Sato and Kameya, 1997; Sato and Kameya, 2001), Logic Programs with Annotated Disjunctions (Vennekens et al., 2004), SAYU (Davis et al., 2005a; Davis et al., 2005b; Davis et al., 2007), nFoil (Landwehr et al., 2007), kFoil (Landwehr et al., 2006), ProbLog (De Raedt et al., 2007; Kimmig et al., 2008; Kimmig, 2010), among others. The present work aims to contribute to the rule-based formalism using the

directed probabilistic graphical model of Bayesian networks. BLP was chosen to conduct the research on theory revision in PLL since it elegantly instantiates both logic programs and Bayesian networks. Next we bring some details on the fundamentals of BLPs, starting from Bayesian networks in section 7.1, immediately followed by the ideas behind BLPs in section 7.2. Finally, we review our revision system PFORTE in section 7.3. For some basic concepts from probability theory we refer to (Feller, 1970; Ross, 1988; Pearl, 1988; DeGroot, 1989).

7.1 Bayesian Networks: Key Concepts

Complex systems involving some sort of uncertainty may be characterized through multiple interrelated random variables, where the value of each variable defines an important property of the domain. In order to probabilistically reason about the values of one or more variables, possibly given evidence about others, it is necessary to construct a joint distribution over the space of possible assignments to a set of random variables. Unfortunately, even in the simplest case of binary-valued variables, the representation of a joint distribution over a set of random variables χ = {X1, ..., Xn} that are not assumed independent requires the specification of the probabilities of at least 2^n different assignments of values x1, ..., xn. It is obviously unmanageable to explicitly represent such a joint distribution. Probabilistic Graphical Models (PGMs) provide mechanisms for encoding such high-dimensional distributions over a set of random variables, structuring them compactly so that the joint distribution can be used effectively (Koller and Friedman, 2009). They use a graph-based representation, where the nodes correspond to the random variables in the domain and the edges correspond to direct probabilistic interactions between the variables. There are two most commonly used families of PGMs, one representing the domain through undirected graphs and the other through directed graphs. In the first case lie the Markov networks (aka Markov random fields), consisting of an undirected graph G and a set of potential functions φk (Pearl, 1988). The graph has a node for each random variable and the model has a non-negative real-valued

potential function for each clique in the graph. Markov networks are useful in modeling problems where one cannot naturally define a directionality for the interactions between the variables. The representatives of the second case are the Bayesian networks (Pearl, 1991), in which the edges of the graph have a source and a target. Next we describe Bayesian networks thoroughly, since they are the underlying probabilistic graphical model of BLPs. In order to review the main concepts of Bayesian networks we use the following convention: X denotes a random variable, x a state, X a set of random variables and x a set of states.

Bayesian networks represent the joint probability distribution P(X1, ..., Xn) over a fixed and finite set of random variables {X1, ..., Xn}. Each random variable

Xi has a domain(Xi) of mutually exclusive states. Bayesian networks allow a compact and natural representation of the set of random variables by exploiting conditional independence properties of the distribution. We say a variable X is conditionally independent of a variable Y, given a variable E, in a distribution P if P(X|Y,E) = P(X|E). Conditional independence is denoted by (X ⊥ Y | E). To represent the connections between random variables, as well as their probability distributions, a Bayesian network is composed of two components, as follows. Qualitative or logical component of a Bayesian network: it is an augmented Directed Acyclic Graph (DAG) G whose nodes are the random variables in the domain and whose edges correspond to direct influences among the random variables. The parents of a variable Xi are the variables represented by the nodes whose edges arrive at Xi. Similarly, the children of a variable Xj are the variables represented by the nodes whose edges come from Xj. Henceforth, we use the terms “variables” and “nodes” interchangeably. The local independence assumption in a Bayesian network states that a variable Xi is conditionally independent of its non-descendants in the network, given a joint state of its parents, i.e.:

(Xi ⊥ NonDescendants_Xi | Parents(Xi))    (7.1)

where Parents(Xi) denotes the states of the parents of node Xi, and if the


node has no parents, then P(Xi | Parents(Xi)) = P(Xi). The independence assumption allows us to write down the joint probability distribution as

P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))    (7.2)

by applying the independence assumption to the chain rule expression of the joint probability distribution. Quantitative component of a Bayesian network: because of the independence assumption, each node is associated with a local probability model that represents the nature of the dependence of each variable on its parents. Thus, each node has a

Conditional Probability Distribution (CPD), cpd_Xi, specifying a distribution over the possible values of the variable given each possible joint assignment of values to its parents, i.e., P(Xi | Parents(Xi)). If a node has no parents, then the CPD turns into a marginal or prior distribution, since it is conditioned on an empty set of variables. Consider, for instance, the following problem from genetics (Friedman et al., 1999; Kersting et al., 2006). It is a genetic model of the inheritance of a single gene that determines the blood type of a person. Each person has two copies of the chromosome containing this gene, one inherited from her mother and another inherited from her father. Occasionally a person is not available for testing, and yet, because of crime investigations, paternity tests, allocation of (frozen) semen, etc., it is often necessary to estimate the blood type of that person. A blood type can still be derived for that person through an examination and analysis of the blood types of family members. To represent this domain we would have, for each person, three random variables: one representing her blood type (btperson), another representing the gene inherited from her father (pcperson) and the last one representing the gene inherited from her mother (mcperson). The possible values for btperson are in the set domain(btperson) = {a, b, o, ab} and the domains for mc and pc are the same, composed of {a, b, o}. In this example the independence assumptions are clear due to the biological rules: once we know the blood type of a person, additional evidence about other members of the family will not provide new information about that blood type. In the same way, once we know the blood type of both parents,

we know what each of them can pass on to their descendants. Thus, finding new information about a person's non-descendants does not provide new information about the blood type of the person's children. Those local dependencies are better visualized through the Bayesian network of Figure 7.1. For example, from the biological point of view, the blood type of Susan is correlated with the blood type of her aunt Lily, but once we know the blood types of Allen and Brian, the blood type of Lily is not going to be relevant to the blood type of Susan. Each variable has an associated CPD, encoding the probability that a person has one of the values of the domain as her blood type, given each possible assignment of the genes inherited from her parents.
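The factorization of Equation 7.2 can be made concrete on a small fragment of this network, where mc and pc are the parents of bt for one person. The sketch below is illustrative: the prior probabilities over the inherited genes are made-up numbers, and only the zero/one CPD entries follow the Mendelian rules of the example.

```python
# P(mc) and P(pc): hypothetical prior distributions over inherited genes
p_mc = {"a": 0.3, "b": 0.3, "o": 0.4}
p_pc = {"a": 0.3, "b": 0.3, "o": 0.4}

# CPD P(bt | mc, pc) for a few joint parent states (deterministic here,
# by Mendelian inheritance: a+b gives ab, o+o gives o, and so on)
p_bt = {
    ("a", "a"): {"a": 1.0, "b": 0.0, "o": 0.0, "ab": 0.0},
    ("a", "b"): {"a": 0.0, "b": 0.0, "o": 0.0, "ab": 1.0},
    ("o", "o"): {"a": 0.0, "b": 0.0, "o": 1.0, "ab": 0.0},
}

def joint(mc, pc, bt):
    """P(mc, pc, bt) = P(mc) * P(pc) * P(bt | mc, pc), i.e. Equation 7.2
    applied to this three-node fragment: each factor is the node's CPD
    conditioned on its parents (mc and pc have no parents, so their
    factors are priors)."""
    return p_mc[mc] * p_pc[pc] * p_bt[(mc, pc)][bt]

print(joint("a", "b", "ab"))  # 0.3 * 0.3 * 1.0
```

The full blood-type network of Figure 7.1 is the same construction repeated per family member, with each mc/pc node conditioned on the parents' gene nodes instead of a prior.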

Figure 7.1: Bayesian network representing the blood type domain within a particular family

7.1.1 D-separation

Independence properties in probabilistic graphical models can be exploited in order to reduce the computational cost of the query answering process. If one guarantees that

a set of nodes X is independent of another set of nodes Y given E, then in the presence of evidence about the nodes in E, an observation regarding variables in X cannot influence the beliefs about Y. The independence assumptions satisfied by a Bayesian network can be efficiently identified using a graphical test called d-separation (Geiger et al., 1989). The intuition behind this test encompasses four general cases, illustrated in Figure 7.2, from which we would like to analyze whether knowing evidence about a variable X can change the beliefs about a variable Y, in the presence of evidence about variables E. Naturally, when variables X and Y are directly connected, they are able to influence each other. Now, consider the four cases where variables X and Y are connected through E.

• The first case represents an indirect causal effect, where an ancestor X of Y could pass influence to it via E. Consider, for example, the Bayesian network of Figure 7.1. If one wants to know the blood type of Susan and does not know the gene she received from her mother (mc Susan is unknown), the gene her mother received from Susan's grandmother (represented by the known variable mc Allen) is able to influence the beliefs about Susan's blood type. On the other hand, if one already knows the gene Susan received from her mother, the gene passed from Susan's grandmother to Susan's mother no longer influences her blood type. Referring to the general case, X can only influence Y via E if E is not observed. In this case, we say the evidence E blocks the influence of X over Y, as stated in Definition 7.1.

• The next case is the symmetrical case of the last one: we want to know whether evidence about a descendant may affect an indirect ancestor. In our running example, this is the case of trying to find out whether the gene Allen received from her mother (random variable mc Allen) is affected by knowledge of the blood type of Susan (bt Susan). As before, once the gene Susan received from Allen is known (random variable mc Susan is observed), bt Susan is not able to affect the beliefs about mc Allen. However, in case mc Susan is missing, bt Susan has a free path to reach mc Allen and therefore to affect its value.

• A common cause case is represented in Figure 7.2(c), where the node E is a common cause of nodes X and Y. In the Bayesian network of the example, this is the case of variables mc Allen, mc Susan and bt Allen, where mc Allen is a common cause for the other variables. The blood type of Allen (bt Allen) and the gene Susan received from her mother Allen (mc Susan) are correlated, since knowing the blood type of Allen helps us to predict the gene she received from her mother (mc Allen) and, consequently, the gene Allen has transmitted to her daughter Susan. However, in case mc Allen is observed, knowing the blood type of Allen provides no additional information to predict the gene she transmitted to her daughter, as the information about which gene she has is stronger than her blood type. Thus, in the general case, variables sharing a common cause are able to influence each other if the path of common causes between them is free of evidence, i.e., the path is unblocked.

• Figure 7.2(d) represents a common effect trail. Structures of the form X → E ← Y are called v-structures. The three cases previously discussed share a pattern: X can influence Y via E if and only if E is not observed. In contrast, a common effect trail is different, as X can influence Y via E if and only if E is observed. This is easier to see through an example: consider the random variables mc Susan, pc Susan and bt Susan in Figure 7.1 and suppose we want to know the gene Susan received from her father (random variable pc Susan). If the blood type of Susan is known, the evidence on mc Susan is able to influence the beliefs about pc Susan: knowing the gene she received from her mother and her blood type affects the beliefs about the gene she received from her father. For example, knowing her blood type is a and the gene her mother transmitted to her is o, the only possible gene she received from her father is a. However, if we do not know the blood type of Susan, it is not possible for the gene she received from her mother to affect our beliefs about the gene she received from her father. Thus, if the common effect variable is not observed, knowing about a parent variable cannot affect our expectation about the other parents.

Figure 7.2: The four possible edge trails from X to Y, via E. Figure (a) is an indirect causal effect; Figure (b) is an indirect evidential effect; Figure (c) represents a common cause between two nodes and Figure (d) shows a common effect trail.

Definition 7.1 Block: Let X and Y be random variables in the graph of a Bayesian network. We say an undirected path between X and Y is blocked by a (set of) variable(s) E if E is on such a path and the influence of X cannot reach Y and change the beliefs about it because of the evidence (or lack of it) about E.

The d-separation test guarantees that X and Y are independent, given E if every path between X and Y is blocked by E and therefore influence cannot flow from X through E to affect the beliefs about Y (Koller and Friedman, 2009; Darwiche, 2010). If the path is not blocked, we say there is an active trail between the two sets of nodes.

Definition 7.2 Active trail: Given an undirected path X1 ... Xn in the graph component of a Bayesian network, there is an active trail from X1 to Xn given a subset of the observed variables E, if

• whenever we have a v-structure Xi−1 → Xi ← Xi+1, then Xi or one of its descendants is in E;

• in all other cases, no node along the trail is in E.

An active trail indicates precisely a path in the graph where influence can flow from one node to another one. Thus, we say that one node can influence another if there is any active trail between them. The definition below provides the notion of separation between nodes in a directed graph.


Definition 7.3 d-separation: X and Y are d-separated by a set of evidence variables E if there is no active trail between any node X ∈ X and Y ∈ Y given E, i.e., every undirected path from X to Y is blocked.

Consider, for example, the network in Figure 7.1 and the random variables bt(Lily) and bt(Susan). There is a path between them composed of the nodes

bt(Lily) ← mc(Lily) ← mc(Gy) → mc(Allen) → mc(Susan) → bt(Susan), which is an active trail: the path is a common cause trail, and there is no observed node between them blocking the influence of bt(Lily) over bt(Susan). Note, however, that if there were an observed node in the path between them, say mc(Susan), the path would no longer be an active trail, since this evidence would block the influence of bt(Lily). It has been proved that X ⊥ Y | E if and only if E d-separates X from Y in the graph G (Geiger et al., 1989; Geiger et al., 1990).

Bayes Ball

In order to answer a probabilistic query more efficiently, it is useful to identify the minimal set of relevant random variables, which are the ones influencing the computations. Shachter (1998) developed a linear time algorithm to identify conditional independence and requisite information in a Bayesian network, named the Bayes Ball algorithm, based on the concept of d-separation. Requisite information is composed of the nodes for which conditional probability distributions or observations might be needed to compute the probability of the query nodes, given evidence. Bayes Ball works by using the analogy of bouncing a ball to visit the relevant nodes in the network, starting from the query nodes. From there, the ball may pass through a node from one of its parents to its children and vice versa, may bounce back from any parent to all the other parents of the node or from any child to all children of the node, or may be blocked. The move the ball takes also depends on whether the node is observed or not and/or whether the node is deterministic or not. Precisely,

• An observed node, deterministic or not, always bounces balls back from one parent to all its other parents, in order to identify common effect trails. However, such a node blocks balls from children, i.e., if a ball comes from one of its children, it is not passed on from this node.

• An unobserved non-deterministic node passes balls from parents to its children and from a child to both its parents and its other children.

• An unobserved deterministic node always passes the ball coming from its child to its parents and from a parent to its children.

Algorithm 7.1 formalizes what we have just said, with some important additions to guarantee that the same action is not repeated and that the algorithm terminates:

• It marks visited nodes on the top (bottom) when the ball is passed from a node to its parents (children). When a node is marked on the top (bottom), the algorithm no longer needs to visit the node's parents (children).

• It maintains a schedule of nodes to be visited from parents and from children, so that the move of the ball can be determined;

• It makes sure that the ball visits the same arc in the same direction only once.

After visiting all scheduled nodes, the algorithm returns, as the minimum set of requisite information to answer a query P(X | E), the observed nodes which were visited and all the residual nodes marked on the top, with the visit starting from the nodes in X. It also identifies the nodes not marked on the bottom as irrelevant to estimate X, that is, those nodes conditionally independent of X given E. It is proved that X ⊥ Y | E if and only if Y ⊆ I, where I is the set of conditionally independent nodes from X given E, as determined by Algorithm 7.1. As any edge is traversed at most once in each direction, the complexity of the algorithm is O(n + m), where n is the number of nodes and m is the number of edges in the graph. Figure 7.3 shows an example of the execution of the Bayes Ball algorithm on a Bayesian network extracted from Figure 7.1. The visit starts from the random variable bt Susan, scheduled as if it had been visited from one of its children. Next,


Algorithm 7.1 The Bayes Ball Algorithm to Collect Requisite Nodes in a Net- work (Shachter, 1998)

Input: a Bayesian network B, a set F of deterministic nodes; A set X of query nodes; A set E of observed nodes Output: A minimum set R of requisite nodes, which might be needed to compute P (X|E) and the set I of irrelevant nodes, where X ⊥ I|E

1: Initialize all nodes in B as neither visited, nor marked on the top, nor marked on the bottom.
2: Create a schedule of nodes to be visited, initialized with each node in X to be visited as if from one of its children.
3: while there are still nodes scheduled to be visited do
4:    Pick any node K scheduled to be visited and remove it from the schedule. Either K was scheduled for a visit from a parent, a visit from a child, or both.
5:    Mark K as visited.
6:    if K ∉ E and the visit is from a child then
7:       if the top of K is not marked then
8:          mark its top and schedule each of its parents to be visited;
9:       if K ∉ F and the bottom of K is not marked then
10:         mark its bottom and schedule each of its children to be visited.
11:   if the visit to K is from a parent then
12:      if K ∈ E and the top of K is not marked then
13:         mark its top and schedule each of its parents to be visited;
14:      if K ∉ E and the bottom of K is not marked then
15:         mark its bottom and schedule each of its children to be visited.
16: R ← nodes in E marked as visited ∪ nodes marked on the top
17: I ← nodes not marked on the bottom
18: return R and I

• As bt Susan is not observed, it is marked on the bottom and on the top, and passes the ball to its parents. Then, nodes mc Susan and pc Susan are scheduled to be visited. This corresponds to lines 6–10 of Algorithm 7.1.

• Node pc Susan is picked from the schedule. As it is an observed node visited from a child, the ball is not passed on from it. In other words, its evidence blocks the influence that could pass through it.

• Node mc Susan is marked on the bottom and on the top and passes the ball coming from its child to both its parents (mc Allen and pc Allen) and its child (bt Susan), since it is not observed.


Figure 7.3: The ball visiting nodes with the Bayes Ball algorithm. Red nodes are observed and the green node is the query node.

• Node bt Susan is visited again, this time from its parent, but there is nothing further to do, since it has already been marked in both directions.

• Node pc Allen receives the ball, but it does not pass the ball on, since it is an observed node visited from a child.

• Node mc Allen is marked on the top and on the bottom and bounces the ball to its children, mc Susan and bt Allen.

• Node mc Susan is visited again, this time from its parent, but there is nothing further to do, since it has already been marked in both directions (bottom and top).

• bt Allen is an observed node, visited from a parent, and therefore is marked on the top (line 13 of the algorithm). Then, it bounces the ball back to all its parents, in an attempt to find a common effect trail between its parents. mc Allen is marked in both directions: there is nothing more to do with it. pc Allen is again visited from a child. As there are no nodes scheduled to be visited anymore, the algorithm finishes.

Requisite nodes to compute the probability of bt Susan are the set of nodes {pc Susan, mc Susan, pc Allen, mc Allen, bt Allen}. Note that there is an indirect

causal effect trail from pc Allen to bt Susan, via mc Susan. Also, there is a common cause trail between bt Allen and bt Susan via {mc Allen, mc Susan}. Additionally, pc Allen is able to influence the beliefs about mc Allen via bt Allen, as they form a common effect trail (v-structure). Influence cannot flow from bt Brian and pc Brian to bt Susan, as they are blocked by the evidence on pc Susan.
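The trace above can be reproduced with a small implementation of Algorithm 7.1 (a sketch under our own naming; deterministic nodes default to none). One detail: line 16 of the algorithm also marks the query node itself on the top, so removing it from R recovers exactly the requisite set reported above.

```python
from collections import deque

def bayes_ball(parents, query, evidence, deterministic=frozenset()):
    """Sketch of Shachter's Bayes Ball (Algorithm 7.1): returns the
    requisite nodes R and the irrelevant nodes I for P(query | evidence)."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    top, bottom, visited = set(), set(), set()
    # each query node is scheduled as if visited from one of its children
    schedule = deque((q, "child") for q in query)
    while schedule:
        node, direction = schedule.popleft()
        visited.add(node)
        if direction == "child":                     # lines 6-10
            if node not in evidence:
                if node not in top:
                    top.add(node)
                    schedule.extend((p, "child") for p in parents[node])
                if node not in deterministic and node not in bottom:
                    bottom.add(node)
                    schedule.extend((c, "parent") for c in children[node])
        else:                                        # lines 11-15
            if node in evidence and node not in top:
                top.add(node)
                schedule.extend((p, "child") for p in parents[node])
            if node not in evidence and node not in bottom:
                bottom.add(node)
                schedule.extend((c, "parent") for c in children[node])
    requisite = (evidence & visited) | top           # line 16
    irrelevant = set(parents) - bottom               # line 17
    return requisite, irrelevant

# Sub-network of Figure 7.3: pc_Susan, pc_Allen and bt_Allen are observed,
# bt_Susan is the query node.
net = {
    "mc_allen": (), "pc_allen": (), "pc_susan": (),
    "mc_susan": ("mc_allen", "pc_allen"),
    "bt_allen": ("mc_allen", "pc_allen"),
    "bt_susan": ("mc_susan", "pc_susan"),
}
R, I = bayes_ball(net, {"bt_susan"}, {"pc_susan", "pc_allen", "bt_allen"})
```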

7.1.2 Inference in Bayesian Networks

Typically, a Bayesian network is used to compute marginal distributions of one or more query nodes, given that some of the other nodes are clamped to observed values. Inference is the computation of these marginal probabilities, defined as

P(X | E) ∝ \sum_{Y} P(X, E, Y)    (7.3)

where E represents the set of observed variables (the evidence), X is the set of unobserved variables whose values we are interested in estimating, and Y are the variables whose values are missing (the hidden nodes). For very small Bayesian networks it is easy to compute the marginalizing sums directly. Unfortunately, the number of terms in the sum grows exponentially with the number of missing values in a network, making the exact computation of marginal probabilities intractable for arbitrary Bayesian networks. In fact, this is known to be an NP-hard problem (Cooper, 1990; Dagum and Luby, 1993). The good news is that in some cases it is possible to exploit the graph structure, so that the exact computation of marginal probabilities has complexity linear in the size of the network. This is precisely the case of polytree networks. In polytrees, exact inference algorithms such as variable elimination (Li and D'Ambrosio, 1994) can be applied efficiently, by exploiting the chain-rule decomposition of the joint distribution and marginalizing out the irrelevant hidden nodes. There is also a family of algorithms named junction tree algorithms, which work by joining variables into cluster nodes so that the resulting network becomes a polytree.
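For a tiny network, the marginalization of Equation 7.3 can be computed by direct enumeration. The sketch below is illustrative only: the rain/sprinkler/wet network and its CPD values are made up, not taken from the thesis.

```python
from itertools import product

# Node -> parents, and CPDs as functions of (value, full assignment).
parents = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

def wet_cpd(v, asg):
    p_true = 0.95 if (asg["rain"] or asg["sprinkler"]) else 0.05
    return p_true if v else 1.0 - p_true

cpds = {
    "rain": lambda v, asg: 0.2 if v else 0.8,
    "sprinkler": lambda v, asg: 0.3 if v else 0.7,
    "wet": wet_cpd,
}

def joint(asg):
    """Chain-rule factorization: P(x1,...,xn) = prod_i P(xi | parents(xi))."""
    p = 1.0
    for var in parents:
        p *= cpds[var](asg[var], asg)
    return p

def query(x, evidence):
    """P(x = True | evidence) by summing the joint over the hidden nodes."""
    hidden = [v for v in parents if v != x and v not in evidence]
    score = {True: 0.0, False: 0.0}
    for xval in (True, False):
        for vals in product((True, False), repeat=len(hidden)):
            asg = {x: xval, **evidence, **dict(zip(hidden, vals))}
            score[xval] += joint(asg)
    return score[True] / (score[True] + score[False])
```

The exponential blow-up is visible in `query`: the inner loop runs over all 2^|hidden| completions, which is exactly what variable elimination and junction tree methods avoid.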

7.1.3 Learning Bayesian Networks from Data

Typically, learning in Bayesian networks has the goal of returning a model B∗ = (G∗, θ∗), where G∗ is a Directed Acyclic Graph and θ∗ is a set of probability parameters. They

both together should precisely capture the underlying probability distribution P∗ governing the domain. To do so, both the structure of the Bayesian network (the DAG) and the local probability models (CPDs) may be learned using an Independently and Identically Distributed (IID) data set D = d[1], ..., d[M] of M examples sampled independently from P∗. However, the ideal goal is generally not achievable because of (1) computational costs and (2) a limited data set providing only an approximation of the true distribution. Thus, learning algorithms attempt to return the best approximation to B∗, according to some performance metric. As the metric is often a numerical criterion function, the learning task can be seen as an optimization problem, where the hypothesis space is the set of candidate models and the criterion for qualifying each candidate is the objective function. One common approach is to find the B that maximizes the likelihood of the data, or, more conveniently, its logarithm, since products are converted to summations. The Log-Likelihood (LL) is defined as

LL(B | D) = \sum_{k=1}^{M} \log P(d[k]) = \sum_{k=1}^{M} \sum_{j=1}^{V} \log P(d[k]_j | parents(d[k]_j))    (7.4)

where each example d[k] is composed of a set of V random variables d[k]_j. Note that the likelihood function has the property of decomposability: each variable contributes a separate term. From a statistical point of view, the higher the likelihood, the better the Bayesian network represents P∗. When the model is learned by maximizing the likelihood or a related function, we have generative training, since the model is trained to generate all variables (Friedman et al., 1997). Alternatively, if it is known in advance that the model is going to be used to predict the values of random variables X from Y, the training may be done discriminatively (Allen and Greiner, 2000; Greiner and Zhou, 2002; Grossman and Domingos, 2004), where the goal is to get P(X | Y) as close to the real distribution P∗(X | Y) as possible. In this case, the objective function typically used is the Conditional Log-Likelihood (CLL), computed as

CLL(B | D) = \sum_{k=1}^{M} \log P(d[k]_i | d[k]_1, ..., d[k]_{i-1})    (7.5)

where d[k]_i is the class variable and d[k]_1, ..., d[k]_{i-1} are the other variables.


Notice that

LL(B | D) = CLL(B | D) + \sum_{k=1}^{M} \log P(d[k]_1, ..., d[k]_{i-1})

Because of that, maximizing CLL leads to better classifiers than maximizing LL: when LL is maximized, the contribution of CLL(B | D) is likely to be swamped by the generally much larger (in absolute value) \log P(d[k]_1, ..., d[k]_{i-1}) term. In fact, (Friedman et al., 1997) shows that maximizing the CLL is equivalent to minimizing the prediction error, since this metric takes into account the confidence of the prediction. However, unlike LL, CLL does not decompose into a separate term for each variable, and as a result there is no known closed form for computing the optimal parameters. There are three situations concerning the observability of the dataset that one may encounter when learning Bayesian networks (Koller and Friedman, 2009):

1. The dataset is complete or fully observed in such a way that each example has observed values for all the variables in the domain.

2. The dataset is incomplete or partially observed, i.e., some variables may not be observed in some examples.

3. The dataset has hidden or latent variables, whose values are never observed in any example. It may even be the case that we are unaware of some variables (we do not know they exist), since they are never observed in the data. Despite this, they might play a central role in understanding the domain.

With respect to what must be learned from data to produce a Bayesian network, we have two distinct cases:

1. The graph is known (although it is not necessarily the correct one) and it is only required that the parameters are learned from data;

2. Both the graph structure and parameters are unknown and it is necessary to learn them both from the dataset.

Parameter estimation

The problem of CPD estimation for a Bayesian network is concerned with estimating the best values of the parameters θ of a fixed graph structure. It is particularly

important because, although eliciting the graph structure is arguably easy for a domain expert, eliciting numbers is difficult for people and may be unreliable. Besides the graph structure, the random variables and the values they can take are assumed to be known. The problem is usually solved by finding the Maximum Likelihood Estimator (MLE), which is the set of parameters with the highest likelihood, although other objective functions can be used. When the dataset is complete, learning CPDs by MLE decomposes into separate learning problems, one for each CPD in the Bayesian network, and maximum likelihood estimation reduces to frequency estimation. Thus, the probability that a variable X assumes value x when its parents assume the set of values parents is calculated with the formula

P(X = x | Parents(X) = parents) = N(X = x, Parents(X) = parents) / N(Parents(X) = parents)    (7.6)

where N(X = x, Parents(X) = parents) is the number of training instances in the dataset where X has the value x and its parents have the values parents. When the dataset is only partially observed, the maximum likelihood estimate cannot be written in closed form: it becomes a numerical optimization problem, and all the known algorithms rely on nonlinear optimization. Algorithms such as Expectation-Maximization (EM) (Dempster et al., 1977; Lauritzen, 1995; McLachlan and Krishnan, 1997) and gradient descent (Binder et al., 1997) are used in this case. Here we give an overview of the popular EM algorithm.
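Before moving on to EM, the complete-data frequency estimator of Equation 7.6 can be sketched as follows (illustrative code with made-up variable names and records, not the thesis' own):

```python
from collections import Counter

def mle_cpd(data, child, parent_vars):
    """Estimate P(child | parents) from fully observed records (dicts),
    as the ratio of Equation 7.6: joint counts over parent counts."""
    joint = Counter()   # N(X = x, Parents(X) = parents)
    margin = Counter()  # N(Parents(X) = parents)
    for record in data:
        pa = tuple(record[p] for p in parent_vars)
        joint[(record[child], pa)] += 1
        margin[pa] += 1
    return {(x, pa): n / margin[pa] for (x, pa), n in joint.items()}

# Toy complete dataset for the blood-type CPD P(bt | mc, pc).
data = [
    {"mc": "a", "pc": "o", "bt": "a"},
    {"mc": "a", "pc": "o", "bt": "a"},
    {"mc": "a", "pc": "o", "bt": "o"},
    {"mc": "a", "pc": "b", "bt": "ab"},
]
cpd = mle_cpd(data, "bt", ("mc", "pc"))
```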

EM algorithm EM is based on the idea that, since the real frequency counts cannot be calculated when some of the examples have missing values, such counts should be estimated from the data. These estimates can then be used to maximize the likelihood. The algorithm assumes an initial set of parameters, which may be initialized at random, and iteratively performs the two steps below until convergence:

E-step: Based on the dataset and the current parameters, the distributions over all possible completions of each partially observed instance are calculated (expected counts). In order to calculate such distributions, the procedure must infer the probability of each variable for each value of its domain, using the current CPDs.

M-step: Each completion estimated above is treated as a fully observed data case, weighted by its probability. Then, the (weighted) frequency counts are used to calculate the improved parameters. The log-likelihood is calculated and, in case it converges, the procedure stops. Otherwise, it iterates back to the E-step, but now using the newly computed CPDs.
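The two steps can be sketched for a minimal network X → Y with binary variables, where some records miss the value of X (a toy illustration under our own model and data, not the thesis' implementation):

```python
import math

data = [(1, 1), (1, 1), (0, 0), (None, 1), (None, 0), (1, 0)]  # (x, y)

theta_x = 0.5               # current estimate of P(X = 1)
theta_y = {1: 0.5, 0: 0.5}  # current estimate of P(Y = 1 | X = x)

def log_likelihood():
    ll = 0.0
    for x, y in data:
        if x is None:  # marginalize the missing parent
            p = sum(px * (theta_y[v] if y else 1 - theta_y[v])
                    for v, px in ((1, theta_x), (0, 1 - theta_x)))
        else:
            px = theta_x if x else 1 - theta_x
            p = px * (theta_y[x] if y else 1 - theta_y[x])
        ll += math.log(p)
    return ll

lls = []
for _ in range(100):
    # E-step: complete each record with the posterior P(X | y), giving a
    # weight (w1, w0) for each possible completion: the expected counts.
    n_x1, n_xv, n_y1 = 0.0, {1: 0.0, 0: 0.0}, {1: 0.0, 0: 0.0}
    for x, y in data:
        if x is None:
            w1 = theta_x * (theta_y[1] if y else 1 - theta_y[1])
            w0 = (1 - theta_x) * (theta_y[0] if y else 1 - theta_y[0])
            w1, w0 = w1 / (w1 + w0), w0 / (w1 + w0)
        else:
            w1, w0 = (1.0, 0.0) if x else (0.0, 1.0)
        n_x1 += w1
        for v, w in ((1, w1), (0, w0)):
            n_xv[v] += w
            n_y1[v] += w * y
    # M-step: re-estimate the parameters from the weighted counts.
    theta_x = n_x1 / len(data)
    theta_y = {v: n_y1[v] / n_xv[v] for v in (1, 0)}
    lls.append(log_likelihood())
    if len(lls) > 1 and lls[-1] - lls[-2] < 1e-9:
        break
```

As EM theory predicts, the log-likelihood recorded in `lls` never decreases across iterations.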

Structure Learning

Often it is easy for a domain expert to provide the structure of the network, since it represents basic causal knowledge. However, this is not always the case, and sometimes the causal model may be unavailable or subject to disagreement. In these situations, it is necessary to search for a graph fitting the dataset. Learning a whole Bayesian network gets more complicated in the presence of incomplete data, since inference is necessary to produce the expected counts, and the problem is compounded in the presence of hidden variables: in this case the domains and the number of hidden variables must also be selected from the dataset. Using a naive approach, structure learning may start with a graph with no edges and iteratively add parents to each node, learning the parameters for each structure and measuring how good the resulting model is using an objective function. Alternatively, it may start with a randomly generated initial structure and use a search approach to modify the current structure by adding, deleting or reversing edges. Since the resulting graph structure cannot have cycles, it is necessary either to define beforehand an ordering of the variables or to search over possible orderings (Russell and Norvig, 2010). To measure how well a model explains the data, one may use a probabilistic function such as the log-likelihood, perhaps penalizing complex structures by using, for example, the MDL principle (Lam and Bacchus, 1994). It is important to note that for each candidate graph the parameters may change and therefore will need to be re-estimated. In case the dataset is complete, only the variables involved in the modification must have their CPDs re-learned (decomposability property). Unfortunately, this does not happen when the dataset is incomplete,

since local changes in the structure may result in global changes in the objective function, and because of that the parameters of the CPDs may change. In this case it is imperative to apply a heuristic solution such as Structural EM (Friedman, 1998) to return the final Bayesian network. The situation gets even worse in the presence of hidden variables. Including hidden variables in a network can greatly simplify the structure by inducing a large number of dependencies over a subset of its variables. However, it is important to analyze the trade-offs, since learning a structure with hidden variables is far from trivial: it is necessary to decide how many of them are going to be included, their domains and also where to include them in the structure. If the number of variables and their domains are informed beforehand, it is possible to treat this problem as an extreme case of learning with incomplete data, where such variable(s) are never observed. There are several approaches developed for learning with hidden variables, typically based on the idea of using algebraic (Kearns and Mansour, 1998; Tian and Pearl, 2002) or structural (Elidan et al., 2000) signatures. Recently, we also contributed a method for adding hidden variables to the structure of Bayesian networks, based on a discriminative approach and theory revision (Revoredo et al., 2009).
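The score-and-search loop described above can be sketched around a decomposable score: the log-likelihood of a candidate DAG on complete data, with each CPD set to its MLE (frequency) estimate. The code and toy data are illustrative, not from the thesis; a hill-climbing search would compare such scores for neighboring structures (add, delete or reverse an edge), re-scoring only the variables whose parent sets changed.

```python
import math
from collections import Counter

def ll_score(data, parents):
    """Decomposable log-likelihood of a candidate structure on complete
    data: sum over variables of n * log(MLE probability) per count."""
    score = 0.0
    for var, pa in parents.items():
        joint, margin = Counter(), Counter()
        for r in data:
            key = tuple(r[p] for p in pa)
            joint[(r[var], key)] += 1
            margin[key] += 1
        score += sum(n * math.log(n / margin[key])
                     for (v, key), n in joint.items())
    return score

# Toy data in which y copies x: adding the edge x -> y raises the score.
data = [{"x": 1, "y": 1}] * 4 + [{"x": 0, "y": 0}] * 4
no_edge = {"x": (), "y": ()}
with_edge = {"x": (), "y": ("x",)}
```

Note that a real search must also penalize complexity (e.g., MDL/BIC) and reject candidates that introduce cycles; the pure log-likelihood above never prefers removing an edge.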

7.2 Bayesian Logic Programs: A Logical Probabilistic Model Extending Bayesian Networks

Although Bayesian networks are one of the most important, efficient and elegant formalisms for reasoning about uncertainty, they are a probabilistic extension of propositional logic (Langley, 1995). Indeed, the qualitative component of a Bayesian network corresponds essentially to a propositional logic program. Consider, for example, the network exhibited in Figure 7.1. The influence relations of such a Bayesian network can be represented through the propositional logic program in Table 7.1, where the random variables in the Bayesian network correspond to logical atoms and the direct influence relation corresponds to the immediate consequence operator. Bayesian networks thus inherit the limitations of propositional logic, namely the difficulty of representing objects and relations between them. In the example just

Table 7.1: Propositional Logic Program representing the network in Figure 7.1

bt Susan | mc Susan, pc Susan. bt Allen | mc Allen, pc Allen. bt Brian | mc Brian, pc Brian. bt Lily | mc Lily, pc Lily. bt Gy | mc Gy, pc Gy. bt Anty | mc Anty, pc Anty. mc Susan | mc Allen, pc Allen. pc Susan | mc Brian, pc Brian. mc Allen | mc Gy, pc Gy. pc Allen | mc Anty, pc Anty. mc Lily | mc Gy, pc Gy. pc Lily | mc Anty, pc Anty. mc Brian. pc Brian. mc Gy. pc Gy. mc Anty. pc Anty.

discussed, a new family would require a whole new graph. However, if we have the set of definite clauses in Table 7.2, we can take advantage of the regularities of the domain by applying the rules under different variable bindings.

Table 7.2: First-order Logic Program representing the regularities in the network of Figure 7.1

bt(X) | mc(X), pc(X). mc(X) | mother(Y,X), mc(Y ), pc(Y ). pc(X) | father(Y,X), mc(Y ), pc(Y )

Bayesian Logic Programs (BLPs) (Kersting and De Raedt, 2001d; Kersting and De Raedt, 2001a; Kersting, 2006) were developed to combine the advantages of the elegant and efficient formalism of Bayesian networks with the expressive and powerful representation language provided by Logic Programming. The goal is therefore to eliminate the disadvantages of propositionality in Bayesian networks and the lack of reasoning under uncertainty within ILP. The key idea is to associate each of the clauses above with a CPD that characterizes the probability of the head of the clause, given

the literals in the body. This is exhibited in Table 7.3.

Table 7.3: First-order Logic Program representing the regularities in the network of Figure 7.1, where each CPD is represented as a list of probability values.

bt(X) | mc(X), pc(X), [0.97, 0.01, 0.01, 0.01, ..., 0.01, 0.01, 0.01, 0.97]. mc(X) | mother(Y,X), mc(Y ), pc(Y ), [0.93, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, ..., 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.93]. pc(X) | father(Y,X), mc(Y ), pc(Y ), [0.93, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, ..., 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.93]. mc(X), [0.3, 0.3, 0.4] pc(X), [0.4, 0.3, 0.3]

Next we define the BLP language in more detail.

7.2.1 Bayesian Logic Programs: Key concepts

Bayesian atoms and predicates A Bayesian predicate p/n is a first-order predicate with arity n and a set of finite and mutually exclusive states associated with it (its domain). An atom of the form p(t1, ..., tn) is a Bayesian atom if p/n is Bayesian, and both share the same domain. A Bayesian predicate p/l generically denotes a set of random variables, while a Bayesian ground atom pθ is a random variable over the states domain(p/l).

Bayesian Clause A Bayesian clause c is an expression of the form A | A1, ..., An, n ≥ 0, where A is a Bayesian atom and each Ai is a Bayesian atom or a logical atom. The symbol | is used to highlight the conditional probability distribution. The set of Bayesian atoms in the body of c directly influences the Bayesian atom in the head of c. Thus, there is a CPD cpd(c) associated with each Bayesian clause c, encoding P(head(c) | body(c)) and representing the conditional probability distributions of each ground instance cθ of the clause c. Logical atoms have no probabilistic influence on the Bayesian atom in the head of c. Note that, as logical atoms do not correspond to random variables, they do not have a representation in cpd(c), serving only to instantiate variables in the clause.

The clause mc(X) | mother(Y,X), mc(Y), pc(Y) in the genetic example is a Bayesian clause, where mc/1 and pc/1 are Bayesian predicates with domain(mc/1) = domain(pc/1) = {a, b, o}, and mother(Y,X) is a logical atom, used to connect the logical variables X and Y. In this context, mc(susan) represents the gene a person named Susan inherits from her mother, as a random variable over the states {a, b, o}.

Combining rule It may be the case that several ground clauses have the same head; for example, there may be two clauses c1 and c2 such that head(c1θ1) = head(c2θ2). The clauses specify cpd(c1θ1) and cpd(c2θ2), but not the required probability distribution P(head(c1θ1) | body(c1θ1) ∪ body(c2θ2)). Typically, combining rules are employed in order to obtain such a distribution. A combining rule is a function that maps finite sets of conditional probability distributions {P(A | A_{i1}, ..., A_{in_i}) | i = 1, ..., m} onto one combined conditional probability distribution P(A | B1, ..., Bk), with {B1, ..., Bk} ⊆ ∪_{i=1}^{m} {A_{i1}, ..., A_{in_i}}. For each Bayesian predicate p/l there is a corresponding combining rule cr(p/l), such as noisy-or (Jensen, 2001), noisy-max (Díez and Galán, 2002) or (weighted) mean (Natarajan et al., 2008).
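As a small illustration, the noisy-or combining rule for a Boolean head can be sketched as follows (our own sketch; each p_i stands for the probability that the head is true given the body of the i-th contributing ground clause):

```python
def noisy_or(ps):
    """Noisy-or: P(head true) = 1 - prod_i (1 - p_i), i.e., any one of the
    contributing ground clauses alone suffices to make the head true."""
    out = 1.0
    for p in ps:
        out *= 1.0 - p
    return 1.0 - out
```

For example, two independent causes with probability 0.5 each combine to 0.75, and with no contributing clauses the head has probability 0.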

Definition 7.4 A BLP consists of a set of Bayesian clauses. For each Bayesian clause c there is exactly one conditional probability distribution cpd(c), and for each Bayesian predicate p/l there is exactly one combining rule cr(p/l).

Well-defined BLP As a logical probabilistic model, inference in a BLP is executed in a Bayesian network generated from the BLP B and the least Herbrand model LH(B) associated with the domain and the BLP. The nodes in DAG(B) correspond to ground atoms in LH(B), encoding the direct influence relation over the random variables in LH(B). Thus, there is an edge from a node X to a node Y if and only if there are a clause c ∈ B and a substitution θ such that Y = head(cθ), X ∈ body(cθ) and, for all ground atoms Z ∈ cθ, Z ∈ LH(B). Although the Herbrand base HB(B) is indeed the set of all random variables we can talk about, only the atoms in the least Herbrand model constitute the relevant random variables, as they are the true atoms in the logical sense, and because of that only they are going to appear in DAG(B). Each node Y in DAG(B) is associated with a CPD cpd(cθ) where head(cθ) = Y, obtained after applying the combining rule of the corresponding Bayesian predicate to the set of CPDs related to the corresponding variabilized clauses. We say a BLP B is well-defined if and only if LH(B) ≠ ∅, the graph associated with B is indeed a DAG and each node in DAG(B) is influenced by a finite set of random variables. If B is well-defined, it specifies a joint distribution P(LH(B)) = \prod_{X ∈ LH(B)} P(X | parents(X)) over the random variables in LH(B).

7.2.2 Answering queries procedure

As in Bayesian networks, any probability distribution over a set of random variables can be computed in a BLP. A probabilistic query to a BLP B is an expression of the form

? − Q1, ..., Qm | E1 = e1, ..., Ep = ep

where m > 0, p ≥ 0, Q1, ..., Qm are the query variables, E1, ..., Ep are the evidence variables, and {Q1, ..., Qm, E1, ..., Ep} ⊆ HB(B). The answer to such a query is the conditional probability distribution

P(Q1, ..., Qm | E1 = e1, ..., Ep = ep)

The least Herbrand model, and consequently its corresponding Bayesian network, may become too large to perform inference. However, it is not necessary to compute the whole Bayesian network, but only a part of it, called the support network. The support network N of a random variable X ∈ LH(B) is the sub-network induced by {X} ∪ {Y | Y ∈ LH(B) and Y influences X}. The support network of a finite set {X1, ..., Xn} ⊆ LH(B) is the union of the support networks for each variable Xi. Thus, the support network constructed to answer a probabilistic query consists of the union of the support networks for each query variable and each evidence variable. (Kersting and De Raedt, 2001b) proved that the support network of a finite set X ⊆ LH(B) is sufficient to compute P(X). In order to construct the support network of a query variable X it is necessary to gather all ground clauses employed to prove :- X. The set of all proofs is all

the information needed to compute the support network. The proofs are typically constructed using the SLD-resolution procedure. After that, it may be necessary to combine multiple copies of ground clauses with the same head using the corresponding combining rule. Finally, to indeed calculate the probability distribution, the support networks of each Bayesian atom which is part of the probabilistic query are united, giving rise to a DAG. Such a DAG, together with the CPDs (whether or not resulting from combining rules) and the states of each evidence variable, can be provided to any inference algorithm so that the probability of the query is computed.

Algorithm 7.2 Algorithm for Inducing a Support Network from a Probabilistic Query (Kersting and De Raedt, 2001a)

Input: A probabilistic query ? − Q1, ..., Qn | Ev1 = ev1, ..., Evm = evm
Output: A support network N related to the probabilistic query
1: for each variable Xi ∈ {Q1, ..., Qn, Ev1, ..., Evm} do
2:   compute all proofs for Xi
3:   extract the set S of ground clauses used to prove Xi;
4:   combine multiple copies of ground clauses H|B ∈ S with the same H, generating the support network Ni for Xi;
5: N ← ∪_{i=1}^{k} Ni;
6: N ← prune(N);
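The core of Algorithm 7.2 can be sketched as a backward closure over ground clauses. The data structures below (head/body pairs of atoms represented as strings) are our simplification of the ground clauses extracted from the SLD proofs:

```python
def support_network(query_atoms, ground_clauses):
    """ground_clauses: list of (head, [body atoms]) pairs extracted from the
    proofs. Returns the nodes and edges of the sub-network induced by the
    query atoms and every atom influencing them."""
    nodes, edges = set(), set()
    agenda = list(query_atoms)
    while agenda:
        atom = agenda.pop()
        if atom in nodes:
            continue
        nodes.add(atom)
        for head, body in ground_clauses:
            if head == atom:
                for b in body:
                    edges.add((b, atom))  # b directly influences atom
                    agenda.append(b)
    return nodes, edges

# Ground clauses from the proofs of bt(susan) in the genetics example:
clauses = [
    ("bt(susan)", ["mc(susan)", "pc(susan)"]),
    ("mc(susan)", ["mc(allen)", "pc(allen)"]),
    ("pc(susan)", ["mc(brian)", "pc(brian)"]),
]
nodes, edges = support_network(["bt(susan)"], clauses)
```
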

Consider, for example, the logical part of a BLP in Table 7.4 and the query

? − bt(susan) | bt(brian) = 'a', pc(brian) = 'a', pc(susan) = 'o', pc(allen) = 'o', bt(allen) = 'a'

Table 7.4: Qualitative part of a Bayesian Logic Program

bt(X) | mc(X), pc(X).
mc(X) | mother(Y, X), mc(Y), pc(Y).
pc(X) | father(Y, X), mc(Y), pc(Y).
mother(allen, susan). father(brian, susan).
pc(allen). mc(allen). mc(brian). pc(brian).

Predicates mother and father are deterministic. In order to prove the query bt(susan), the first clause in the table is instantiated with the substitution X/susan.

Then, the procedure tries to prove mc(susan) and gets mother(Y, susan), mc(Y), pc(Y). The variable Y is substituted by allen, because of the ground atom mother(allen, susan), and then the algorithm reaches the ground atoms mc(allen), pc(allen). The same is done to prove pc(susan). From this set of proofs, the support network in Figure 7.4(a) is built, by adding one node for each ground Bayesian atom in the proof and including an edge from an atom directly influencing another atom, following the Bayesian clauses. Next, following the same procedure, support networks are built for the evidence atoms, giving rise to the networks (b), (c), (d), (e) and (f) in Figure 7.4. The next step is to unite all those support networks, which is done by merging the random variables shared by more than one network. The final network is exhibited in Figure 7.5.

Figure 7.4: Support networks created from the query variable bt(susan) and evidence variables bt(brian), pc(brian), pc(susan), pc(allen) and bt(allen).

Figure 7.5: Bayesian network after joining identical random variables from different support networks from Figure 7.4.

7.2.3 Learning BLPs

As usual in machine learning, BLPs are learned from a set of examples D. Each example Dl ∈ D has two parts, a logical part and a probabilistic part. The logical part is a Herbrand interpretation, more specifically the least Herbrand model of the BLP we want to learn. Thus, techniques used in the learning from interpretations setting of ILP can be adapted for learning the qualitative part of the BLP, which implies that the hypothesis space is composed of sets H of Bayesian clauses such that every

Dl ∈ D is a model of H. Notice that it is necessary to check whether the candidate hypothesis H is acyclic on the data, i.e., for each example of the data set the induced Bayesian network's graph must be acyclic. The probabilistic part is composed of a possibly partial assignment of values to the random variables in the least Herbrand model. One example of a data case for the genetics example above would be:

{bt(susan) = ?, mc(susan) = ?, pc(susan) = 'o', pc(allen) = 'o', pc(brian) = 'a', bt(brian) = 'a', bt(allen) = 'a', mc(allen) = ?, mc(brian) = ?, mother(allen, susan), father(brian, susan)}

where ? indicates a missing value for the random variable. Besides the logical part, a candidate hypothesis must also take into account the joint distribution over the random variables induced by the probabilistic part of the examples. To match

this requirement, the conditional probability distributions are learned from the data set and a probabilistic scoring function. Thus, the final goal is to find the hypothesis H, acyclic on the data, with the examples Dl ∈ D being models of H in the logical sense, and where H best matches the examples according to the scoring function. To achieve this last requirement, the parameters of the associated conditional probability distributions in H must maximize the scoring function, which matches the setting of learning CPDs in Bayesian networks. Next, we see in more detail how to estimate the parameters of BLPs and how to traverse the hypothesis space using refinement operators from ILP.

Parameter estimation in BLPs

In order to learn the parameters of a BLP, Bayesian networks are built from each example together with the current BLP. Thus, parameter estimation in BLPs follows the main ideas of parameter estimation in Bayesian networks. If the induced data set is completely observed, frequency estimation is achieved by counting. If there are missing values, it is necessary to use an algorithm capable of estimating the counts. However, parameter estimation differs from traditional algorithms for Bayesian networks for the following reasons:

• In the traditional Bayesian network setting, each node in the network has its separate CPT. In BLPs, CPDs are associated to Bayesian clauses rather than ground atoms, which makes multiple instances of the same rule share the same CPT. As a result, more than one node in the network may have the same CPT. Thus, parameters are learned considering Bayesian clauses instead of individual nodes in the network. Because of that, more than one node in the same network may be taken into account to learn the same CPT. This situation is illustrated in Figure 7.6, where the nodes in magenta are yielded from the same Bayesian clause ci = bt(X)|mc(X), pc(X). Each one of them is going to be considered as a separate "experiment" to estimate the parameters of the Bayesian clause that originates them. Thus, considering the EM algorithm, parameters are estimated using the formula

cpd(ci)jk = [ Σ_{l=1}^{m} Σ_θ P(head(ciθ) = uj, body(ciθ) = uk | Dl) ] / [ Σ_{l=1}^{m} Σ_θ P(body(ciθ) = uk | Dl) ]    (7.7)

where uj stands for the values in the domain of head(ci), uk stands for each combination of values of the Bayesian atoms in body(ci), and θ denotes substitutions such that the example Dl is a model of ciθ. The formula represents the action of computing the CPD of a clause ci by taking into account each node produced from a substitution θ applied to the clause. In Figure 7.6, the substitutions θ applied to the Bayesian clause ci are {X/susan}, {X/allen}, {X/brian}.

Figure 7.6: Bayesian network representing the blood type domain within a particular family. Nodes in magenta are yielded by the same Bayesian clause.
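The counting step of Equation 7.7 can be sketched as follows. The posterior probabilities per ground instance would come from a Bayesian network inference engine, which we assume given; the data layout and function name are illustrative:

```python
from collections import defaultdict

def estimate_cpd(instance_posteriors):
    """instance_posteriors: one dict per ground instance of the clause c_i,
    mapping (head_value u_j, body_values u_k) -> P(head, body | D_l).
    Every instance contributes expected counts to the single CPT shared
    by the clause."""
    joint = defaultdict(float)  # expected counts for (u_j, u_k)
    body = defaultdict(float)   # expected counts for u_k alone
    for posterior in instance_posteriors:
        for (uj, uk), p in posterior.items():
            joint[(uj, uk)] += p
            body[uk] += p
    return {key: joint[key] / body[key[1]] for key in joint}

# Two instantiations of bt(X) | mc(X), pc(X); the second head is unobserved,
# so inference split its expected count between the two blood types:
cpd = estimate_cpd([
    {("a", ("a", "a")): 1.0},
    {("a", ("a", "a")): 0.5, ("o", ("a", "a")): 0.5},
])
# cpd[("a", ("a", "a"))] == 1.5 / 2.0 == 0.75
```
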

• The formula above computes the CPD of an individual clause ci by estimating the joint probability of the head of the clause together with its body. The probabilities are computed through the Bayesian network produced from the examples. Thus, it is required that each node corresponds to the head of exactly one clause. In case combining rules are not handled properly, this is not going to happen with the Bayesian network produced from an example. Suppose, for example, that we have in a BLP the three clauses below:

pred(X) | pred1(X, Y).
pred(X) | pred2(X, Y).
pred(X) | pred3(X, Y).

Then, suppose a substitution θ = {X/1}. After the substitution, we have the ground clauses

pred(1) | pred1(1, 2).
pred(1) | pred2(1, 3).
pred(1) | pred3(1, 4).

The support network built from those ground clauses is the one presented in Figure 7.7(a). Note that, as pred(1) is the same random variable, the support networks of the ground clauses have been joined to produce the final network. The problem is that the requirement that each node is produced by exactly one clause is not met if the support network is built like that. Therefore, it is necessary to assume independence of causal influences (ICI) (Heckerman and Breese, 1994), (Zhang and Poole, 1996), which allows multiple causes on a target variable to be decomposed into several independent causes, whose effects are combined to yield the final value. As combining rules are employed to compute the final required probability distribution of the random variable, it is necessary to assume they are decomposable, i.e., they allow each random variable corresponding to the same ground head to be decomposed into several possible causes. This is expressed by adding extra nodes to the induced network, which are copies of the ground head atom. With this modification in the network, each node is produced by exactly one Bayesian clause ci, and each node derived from ci (the yellow nodes) can be seen as a separate experiment for computing cpd(ci). Those extra nodes are considered hidden, since the value given in the example is relative to the instance itself and not to each representative of a rule. A network with the extra nodes is exhibited in Figure 7.7(b).

In this work, we follow the representation of combining rules defined in (Natarajan et al., 2008), where ground clauses are combined in two levels. The first level combines different instantiations of the same clause with a common head, while the second level combines different clauses with the same head. Suppose, for example, that in addition to the ground clauses above we also had

Figure 7.7: Figure (a) is a Bayesian network without adding extra nodes to represent decomposable combining rules. Node pred(1) is produced by three different clauses. Figure (b) is the induced network representing decomposable combining rules by adding extra nodes (yellow nodes) so that each node is produced by exactly one Bayesian clause. Nodes n?pred(1) have the domain of pred and cpd(c) associated.

pred(1) | pred1(1, 3).
pred(1) | pred1(1, 4).

Then, the instances of the first clause are combined in one level and the other clauses are combined in the second level. Different combination functions may be applied at each level. The final network is reproduced in Figure 7.8.
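The two-level combination can be sketched as below. The particular choice of combination functions is ours for illustration (mean within instantiations of the same clause, noisy-or across clauses); (Natarajan et al., 2008) allow different functions at each level:

```python
def combine_two_levels(probs_by_clause):
    """probs_by_clause: {clause_id: [P(head = true) for each instantiation]}.
    Level 1 averages the instantiations of each clause; level 2 combines the
    per-clause results with noisy-or."""
    level1 = [sum(ps) / len(ps) for ps in probs_by_clause.values()]
    p_all_fail = 1.0
    for p in level1:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# pred(1) is derived by two instantiations of c1 and one of c2:
p = combine_two_levels({"c1": [0.4, 0.6], "c2": [0.5]})
# mean over c1 = 0.5; noisy-or(0.5, 0.5) = 1 - 0.25 = 0.75
```
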

Structure learning in BLPs

The algorithm proposed in (Kersting and De Raedt, 2001b) for learning BLPs traverses the hypothesis space using two ILP refinement operators: ρs(H), which adds constant-free atoms to the body of a single clause c ∈ H, and ρg(H), responsible for deleting atoms from the body of a single clause c ∈ H. The algorithm is reproduced

Figure 7.8: Decomposable combining rules expressed within a support network. Instantiations of the same clause and different clauses are combined separately. Nodes l1 n?pred(1) represent different instantiations of the same clause and nodes l2 n?pred(1) represent different clauses.

as Algorithm 7.3 and works as follows. First, a hypothesis H0, possibly generated by a learning from interpretations algorithm such as CLAUDIEN (De Raedt and Bruynooghe, 1993; De Raedt and Dehaspe, 1997), is assumed as the starting point and the parameters maximizing the log-likelihood LL(D, H) are computed. Then, following basic greedy hill-climbing, the neighbors of H0 are induced using the dataset D and the refinement operators ρs(H0) and ρg(H0), requiring that they induce acyclic Bayesian networks and prove all examples in D. The candidate hypotheses are scored using the set of support networks and a probabilistic scoring function, and the one with the best evaluation becomes the next hypothesis if its evaluation is better than that of the current hypothesis. This process continues until there are no further improvements in the scoring function. Note that the refinement operators may be applied to all Bayesian clauses of


Algorithm 7.3 Algorithm for learning BLPs (Kersting and De Raedt, 2001b)

Input: A finite dataset D, a set of Bayesian clauses H0
Output: A BLP B
1: Θ ← parameters maximizing LL(D, H0)
2: H ← H0 ∪ Θ
3: compute scoreH
4: repeat
5:   for each H′ ∈ ρg(H) ∪ ρs(H) do
6:     if H′ proves D then
7:       if the Bayesian networks induced from H′ and D are acyclic then
8:         Θ′ ← parameters maximizing LL(D, H′)
9:         compute scoreH′
10:        if scoreH′ > scoreH then
11:          H ← H′
12:          scoreH ← scoreH′
13:          Θ ← Θ′
14: until there is no candidate hypothesis H′ scoring better than H
15: return H

the current hypothesis in order to generate their neighbors, which arguably makes the structure learning very expensive. In the next section, we present the first theory revision system designed to revise probabilistic logical models such as BLPs.
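The greedy loop of Algorithm 7.3 can be sketched abstractly as below. Scoring, validity checking (coverage and acyclicity) and the refinement operators are passed in as functions, and all names and the toy instantiation are ours:

```python
def greedy_hill_climbing(h0, refinements, score, valid):
    """Repeatedly move to the best-scoring valid neighbor until no neighbor
    improves on the current hypothesis (first-choice ties broken arbitrarily)."""
    current, current_score = h0, score(h0)
    while True:
        candidates = [h for h in refinements(current) if valid(h)]
        if not candidates:
            return current
        best = max(candidates, key=score)
        if score(best) <= current_score:
            return current
        current, current_score = best, score(best)

# Toy instantiation: hypotheses are sets of "literals"; the score rewards
# similarity to a target set, and refinements add or delete one literal,
# mimicking the rho_s / rho_g operators.
POOL = {"a", "b", "c"}
def refinements(h):
    return [h | {x} for x in POOL - h] + [h - {x} for x in h]
def score(h):
    return -len(h ^ {"a", "b"})
best = greedy_hill_climbing(frozenset(), refinements, score, lambda h: True)
```
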

7.3 PFORTE: Revising BLPs from Examples

Revising first-order logic theories aims to start from an initial theory and modify it in order to return a model more accurate than one obtained by learning from scratch techniques. Similarly, one may have an initial probabilistic logical model with only some points preventing it from correctly reflecting the domain. Such a model could be obtained from the following sources:

• A domain expert elicited the clauses, which do not necessarily reflect the dataset;

• An ILP or PLL system learned the model considering an old data set, and now there are new examples not necessarily reflected by the old model;

• An ILP or PLL system learned the model considering the current dataset, but it could still be improved by revision techniques. In case it was learned from an


ILP system, the uncertainty in the domain was not considered and the revision could improve the model.

Usually, it would be more valuable to search for those revision points and modify them, instead of discarding the original model or proposing modifications at all its points, as is done in the BLP structure learning algorithm. This approach should not only produce more accurate models but also reduce the cost of searching in the space of clauses and parameters. In the probabilistic logic case, new questions arise, since the starting model is composed of two related parts: the clauses and the probabilistic parameters. Thus, in order to check whether the model should be revised, one must take into account these two components when proposing modifications. Moreover, as the inference to answer queries is performed in the induced probabilistic graphical models, it is also through them that we should look for the problematic points. With this motivation, we developed the first probabilistic first-order revision system designed to revise Bayesian Logic Programs. It was named PFORTE since it was built upon the FORTE system (Revoredo and Zaverucha, 2002; Paes, 2005; Paes et al., 2005b; Paes et al., 2005a; Paes et al., 2006a). This section reviews the PFORTE system as published in (Paes et al., 2006a).

7.3.1 PFORTE: Key concepts

Task definition

• Given: an incorrect initial probabilistic logical model, a consistent set of examples, and background knowledge composed of Bayesian and/or logical clauses.

• Find: a revised logical probabilistic model that is complete considering the given examples and has the highest score given some metric.

In order to find this "minimally revised" and complete model, PFORTE works in two steps: the first component addresses theory incompleteness by using generalization operators only; the second component addresses classification problems using both generalization and specialization operators. PFORTE terminology is defined as follows:


Logical probabilistic model: The logical probabilistic model is a BLP, composed of a set of Bayesian clauses, as defined in Section 7.2.1. In the Genetics domain, concepts might include the blood type, bt(person).

Definition 7.5 (PFORTE Example) The schema of an example (Di) is:

Di = {Instances, Deterministic, Evidences}

An instance is an instantiation of a predicate, with an associated value from the domain of the corresponding Bayesian predicate; Deterministic are literals that do not represent random variables; and Evidences are Bayesian atoms with a (partial) assignment of values from their respective domains.

We built this definition following BLPs, where the logical (qualitative) part is each ground atom in the Instances, Deterministic and Evidences sets, and the assignment of values in Instances and Evidences is the probabilistic (quantitative) part. The dataset might be composed of several examples. Instances and Evidences in one single example are mutually dependent, while different examples are assumed to be independent from each other. In the Genetics domain, one example could be the one represented in Table 7.5.

Table 7.5: Format of an example in PFORTE system

[
[bt(susan) = 'o', bt(brian) = 'a', bt(allen) = 'a']
[mother(allen, susan), father(brian, susan)]
[mc(susan) = ?, pc(susan) = 'o', pc(allen) = 'o', pc(brian) = 'a', mc(allen) = ?, mc(brian) = ?]
]

Note that from a logical point of view all the instances are assumed to be positive, so that the final model must cover them, and the class to which the instance belongs is indicated by the value associated to such an instance.

Definition 7.6 (Completeness) Given a set of examples D, where Di ∈ D and

Di = {Insts, Dets, Evs}, we say that a BLP B is complete considering these examples if and only if

∀Di ∈ D, ∀Ij ∈ logical part of Insts : B ∪ Dets ∪ logical part of Evs ⊢ Ij


Provability is established using standard SLD-resolution, taking the instance to be the initial goal.

Definition 7.7 (Classification) Given an instance, we say that it is correctly classified if the probability value computed for its real class is higher than a pre-defined threshold. For example, consider Di exemplified above. We say that the instance bt(susan) is correctly classified if, using an inference method on its support network, the probability value computed for class a is higher than a pre-defined threshold. Otherwise, we say that this instance was misclassified.
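Definition 7.7 amounts to a simple threshold test on the posterior returned by inference. The sketch below is ours; the 0.5 default is only illustrative, since the thesis leaves the threshold as a parameter:

```python
def correctly_classified(class_posterior, true_class, threshold=0.5):
    """class_posterior: {class value: probability} inferred for the instance
    on its support network. True iff the probability of the real class
    exceeds the threshold."""
    return class_posterior.get(true_class, 0.0) > threshold

correctly_classified({"a": 0.7, "o": 0.3}, "a")   # correctly classified
correctly_classified({"a": 0.4, "o": 0.6}, "a")   # misclassified
```
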

7.3.2 The PFORTE Algorithm

PFORTE is a greedy hill-climbing system composed of two steps. The first step focuses on generating a theory as complete as possible, by addressing failed examples. The second step aims to induce a theory as accurate as possible, by addressing misclassified examples. Both steps follow the key ideas below in a greedy hill-climbing search:

1. Identification of revision points. In the first step, those are the clauses failing to prove instances, and they are called logical revision points. In the second step, those are the points responsible for the misclassification of instances, discovered through a probabilistic reasoning mechanism (named probabilistic revision points).

2. Proposal of modifications to the revision points using revision operators. As all the instances are positive and the first step aims to make the hypothesis prove all the examples, only generalization operators are necessary. This differs from the second step, where both generalization and specialization operators are applied, in an attempt to make the final hypothesis more accurate.

3. Scoring of the proposed revisions. Each proposed revision, even the ones contemplated in the first step, is scored by a probabilistic evaluation function such as log-likelihood or conditional log-likelihood, since the ultimate goal is to reflect the joint probability distribution over the random variables in the data set.


4. Choice of the revision to be implemented. Both steps choose the revision with the highest score to be implemented, in case its score is better than the score of the current hypothesis. The first step requires that the coverage is improved as well, while the second step requires that the accuracy is also improved.

5. Stop criteria. The first step stops when there are no further generalizations capable of improving the coverage. The second step stops the whole procedure when there are no further revisions proposed to the revision points capable of improving the accuracy.

Algorithm 7.4 shows how the key ideas are implemented. The first step focuses on generating a complete hypothesis by only addressing failed examples. To do so, it revises the initial hypothesis iteratively, using a hill-climbing approach. Each iteration identifies the logical revision points, where a revision has the potential to improve the theory's coverage. A set of revisions generalizing the current hypothesis is proposed and scored on the training set. The best one is selected and implemented in case the current coverage is improved. This process continues until the first step cannot generate any revisions which improve example coverage. It is expected that the revised hypothesis will be complete considering the data set. This complete BLP is the starting point of the second step of PFORTE. This step tries to improve classification by modifying the probabilistic revision points. Both generalization and specialization operators are applied to such points and, as in the first step, the revisions are scored and the best one is chosen and implemented if it improves the accuracy. This greedy hill-climbing process proceeds until either no further revision improves the scoring function or the probabilistic accuracy cannot be improved. Next, we detail the main parts of PFORTE's algorithm: generation of revision points, generation of possible revisions to these revision points, and how to score the proposed modifications.

Generating Revision Points

Revision points are places in a theory where errors may lie. PFORTE considered two types of revision points:


Algorithm 7.4 PFORTE Algorithm (Paes et al., 2006a)
Input: An initial logical probabilistic model B, the background knowledge BK, and a set of examples D
Output: A revised logical probabilistic model B
1: repeat
2:   generate logical revision points;
3:   for each logical revision point do
4:     generate and score possible revisions;
5:     update best revision according to the score and also improving coverage;
6:   if best revision improves coverage then
7:     implement best revision;
8: until no revision improves coverage
9: repeat
10:  generate probabilistic revision points;
11:  for each probabilistic revision point do
12:    generate and score possible revisions;
13:    update best revision found according to the scoring function and improving the probabilistic accuracy;
14:  if best revision improves the probabilistic accuracy then
15:    implement best revision;
16: until no revision improves score

• Logical revision points. These points are generated in the same way FORTE does for generalization revision points, by annotating places in the theory where proofs of instances fail. These are places where the theory may be generalized in order to become complete. We follow the annotation process originally proposed by Richards and Mooney (Richards and Mooney, 1995): each time there is a backtrack in the proof procedure, the antecedent in which the clause failed is kept; this antecedent is a failure point. In addition, other antecedents that may have contributed to this failure, perhaps by binding variables to incorrect values (the contributing points) are also selected. Both failure and contributing points are considered logical revision points.

• Probabilistic revision points. These points are generated from misclassified instances, by following a prediction criterion which takes into account the probability distribution and the dependency among the Bayesian atoms. Thus, to generate the probabilistic revision points, the support networks built from the examples are collected and used to perform inference over the instances,


now represented as query random variables. The value inferred in the network for each query variable is compared to the original value from the dataset and, in case they differ, the instance is considered misclassified. The probabilistic revision points are then any clauses taking part in the network of the misclassified instance.

Algorithm 7.5 Algorithm for generating Probabilistic Revision Points in a Logical Probabilistic Model such as BLP

Input: A LPM B; background knowledge BK; a set of examples D
Output: A set of revision points RP
1: RP ← ∅
2: for each example Di ∈ D do
3:   d ← Bayesian network built from B, BK, Di
4:   for each random variable v in Di which is an instance (query Bayesian atom) ∈ Di do
5:     valueNode ← class of v inferred by a probabilistic inference engine
6:     valueExample ← class of the Bayesian atom ∈ Di
7:     if ∃ valueExample and valueExample ≠ valueNode then
8:       GRP ← nodes in d
9:       for each Bayesian clause c ∈ B do
10:        if head(c) unifies with a random variable in GRP then
11:          RP ← RP ∪ c
12: return RP
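The selection performed by Algorithm 7.5 can be sketched as below. The representation (predicted and actual class per instance, plus the predicates of the nodes in its induced network) is our simplification of the unification test against clause heads:

```python
def probabilistic_revision_points(examples, clause_heads):
    """examples: dicts with 'predicted' and 'actual' classes of an instance
    and 'predicates': the predicates of the nodes in its induced network.
    clause_heads: {clause_id: head predicate}. A clause is a revision point
    if its head matches a node in some misclassified instance's network."""
    points = set()
    for ex in examples:
        misclassified = ex["actual"] is not None and ex["predicted"] != ex["actual"]
        if misclassified:
            for cid, pred in clause_heads.items():
                if pred in ex["predicates"]:
                    points.add(cid)
    return points

points = probabilistic_revision_points(
    [{"predicted": "a", "actual": "o", "predicates": {"bt", "mc", "pc"}},
     {"predicted": "a", "actual": "a", "predicates": {"bt"}}],
    {"c1": "bt", "c2": "mc", "c3": "father"},
)
# Only clauses whose heads appear in the misclassified network are returned.
```
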

Revision operators

In order to be able to repair arbitrarily incorrect theories, revision operators must be able to transform any theory in the language into another. Depending on the type of revision points different operators can be used.

• Operators for logical revision points. To propose modifications for logical revision points we use all FORTE generalization operators. The differences are in the delete antecedent and add rule operators.

a) The delete antecedent operator in FORTE deletes as many antecedents as possible while negative examples are not proved. As we do not have negative examples, this operator stops removing antecedents when the examples become provable or the scoring function cannot be improved.


b) The original add rule operator works in two steps after copying the incorrect clause. First, it removes the failed antecedents in order to prove previously unproved positive instances, and then it adds new antecedents in an attempt to not cover negative examples. In PFORTE this operator was changed so that, after removing the failed antecedents of a rule, PFORTE adds antecedents in an attempt to improve the value of the probabilistic scoring function.

• Operators for probabilistic revision points. To propose modifications for probabilistic revision points PFORTE also uses FORTE operators, in this case generalization and specialization operators, with slight modifications. When specializing clauses, PFORTE differs from FORTE in three different actions:

1. Antecedents with the highest score are added while they are able to improve the score of the current hypothesis, differing from FORTE, which stops adding antecedents when there are no provable negative examples and all the provable positive examples continue to be covered.

2. FORTE may create more than one specialized version of the revision point in one revision, since a refinement of a clause can make provable positive examples become unprovable. As PFORTE cares about more than provability, it returns one specialized version of the original clause.

Similarly, when deleting antecedents, the antecedent chosen to be removed is the one with the highest score. The algorithm for deletion/addition of antecedents is detailed in Algorithm 7.6.

Algorithm 7.6 Algorithm for deletion/addition of antecedents in PFORTE
1: repeat
2:   for each antecedent in a clause/space of possible antecedents do
3:     if after deletion/addition the coverage of examples still holds then
4:       score this modification
5:   delete/add the antecedent with the highest score
6: until no antecedent can be deleted/added without decreasing the score

Finally, the add rule operator works by combining both operators described above: first it deletes antecedents and then it adds antecedents. All operators for


probabilistic revision points must also enforce coverage of the examples and range-restriction of the clauses.

PFORTE PI (Revoredo et al., 2006; Revoredo, 2009) was developed to incorporate two novel revision operators based on predicate invention (Muggleton and Buntine, 1988; Stahl, 1993; Muggleton, 1994; Kramer, 1995) into the PFORTE system. The first operator, called compaction, creates new predicates to be heads of clauses by absorbing a set of literals from the bodies of other clauses, in an attempt to decrease the complexity of literals and parameters in the LPM. The other operator, called augmentation, replaces a literal from the body of a clause by a new invented predicate, creates a clause with such a literal in the head and tries to specialize this new clause. As usual, the revisions proposed by the predicate invention operators are only implemented if the score is improved.

Scoring possible revisions

Each proposed revision receives a score. Based on this score, PFORTE chooses the best one to be implemented. Since the LPM is changed for each proposed modification, it is necessary to re-learn the CPDs of the clauses. To do so, PFORTE first builds the support networks for each example, generating a data set composed of these support networks. Having the set of support networks, the next step is to learn the CPDs for each clause, where a Maximum Likelihood Estimation (MLE) approach is employed in the same way as in parameter estimation for BLPs. After learning and updating the CPDs for each clause, PFORTE calculates the score for the proposed modification using a probabilistic scoring function such as LL or CLL. The algorithm for scoring possible revisions is detailed in Algorithm 7.7.
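The final scoring step can be sketched as a log-likelihood computation over the examples. The per-example probabilities would come from inference on the support networks with the re-learned CPDs, which we assume given:

```python
import math

def log_likelihood(example_probs):
    """example_probs: P(observed assignment of example l | revised model),
    one value per example, as computed by inference on its support network.
    Since examples are assumed independent, the log-likelihood is a sum."""
    return sum(math.log(p) for p in example_probs)

ll = log_likelihood([0.5, 0.25])  # log(0.5) + log(0.25) = log(0.125)
```
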

PFORTE was applied to three artificially generated datasets (Paes et al., 2005b; Paes et al., 2006a). The two-phase algorithm revising an initial BLP compared successfully to the algorithm learning from scratch (without an initial theory). However, when we tried to run PFORTE with more complex datasets, such as the ones used in the SRL community, the system either could not finish the revision in reasonable time or could not finish the revision at all, due


to memory problems. In (Mihalkova et al., 2007), an algorithm was developed for transfer learning between two Markov Logic Network models, where the first step performs a mapping between predicates and the second step revises the first model learned. The second step uses revision operators in a fashion similar to PFORTE.

Algorithm 7.7 Algorithm to Score Possible Revisions in PFORTE
Input: A LPM B, a set of examples D, background knowledge BK, a scoring function F
Output: Score scoreB of B, considering D, BK and F
1: S ← ∅
2: for each example Di ∈ D do
3:   S ← S ∪ support network built for Di
4: Θ* ← learned CPDs considering S;
5: scoreB ← score computed using F and considering Θ*, S;

Chapter 8

BFORTE: Addressing Bottlenecks of Bayesian Logic Programs Revision

Inference in Bayesian Logic Programs is performed on networks built from all the proofs of the set of examples. The algorithm for learning BLPs starts from a maximally general logic program satisfying the logical part of the examples and proposes refinements to each clause, by adding or deleting literals. In this fashion, learning BLPs requires searching over a large search space of candidate clauses, on the one hand, and the construction of all proofs to build the Bayesian networks, on the other. For each candidate theory one may have to perform full Bayesian inference, in order to compute scores in the presence of non-observed data. The PFORTE system was developed to obtain accurate BLPs by revising an existing BLP and modifying it only in the rules used in Bayesian networks of misclassified examples. However, despite promising results on artificial domains, PFORTE faces bottlenecks similar to those of BLP and other SRL learning algorithms, as it must also address the large search space of logic programs and also perform inference. The goal of this chapter is to address these bottlenecks of the PFORTE system. First, the space of possible refinements can be reduced by limiting the candidate literals to be added to a clause to the ones present in the Bottom Clause (Muggleton, 1995). Second, we show that collecting revision points through Bayes Ball (Shachter, 1998) ultimately reduces the number of clauses marked as candidates for being revised by revision operators, as well as restricting the types of operators applied to them.


Third, we do not separate revision points into logical and probabilistic ones, avoiding the cost of treating them in two separate steps of the revision process. Instead, the algorithm performs a single iterative procedure by considering all revision points as probabilistic ones. Fourth, we observe that in SRL example networks can overlap due to shared entities, resulting in large joint networks. We apply methods to (1) overlap ground clauses with identical features, (2) separate the requisite nodes for a specific query, and (3) overlap requisite nodes found for different instances. Such methods follow the ideas of recent developments in lifted inference (Poole, 2003; Salvo Braz et al., 2005; Meert et al., 2010). By focusing on the relevant modification points and enabling the revision process to explore its full potential more efficiently, we expect to obtain a truly effective and feasible revision system, as pointed out in (Dietterich et al., 2008) as a necessary development in the SRL area. We named this new system BFORTE, referring to BLP, Bayes Ball and Bottom Clause.

The rest of this chapter is organized as follows. Section 8.1 addresses the search space of new literals, restricting candidates to the ones present in the Bottom Clause. Section 8.2 addresses the search space of revision points by presenting the one-step revision algorithm and the algorithm for collecting revision points with d-separation. Section 8.3 addresses the methods developed to reduce the inference space, which is most often built to score each proposed revision. Section 8.4 shows experimental results obtained with the BFORTE system and Section 8.5 concludes the chapter.

8.1 Addressing Search Space of New Literals

The operators that generate the largest search space are the ones that search for new literals to be included in the body of clauses. This includes both the add-antecedent and add-rule operators. PFORTE applies the FOIL (Quinlan, 1990) top-down strategy to generate new literals. As expected of a top-down approach, the number of generated literals may be huge, depending on their definitions and the background knowledge. Moreover, the literals generated will often not be very informative, as no insight is given on which terms should be input/output variables or constants. Whenever we add a new literal to the body of a clause, each generated literal must be scored so

that the best is chosen. Adding literals to Bayesian clauses modifies the qualitative part of the BLP, so it is necessary to re-learn the parameters affected by the revision. Taking everything into consideration, the cost of adding literals to clauses outweighs that of the other proposed modifications and dominates the runtime of the revision process. In BFORTE we use the Bottom Clause to compose the search space of evaluated literals to be added to a clause. The use of the Bottom Clause greatly decreases the number of possible literals, by limiting the candidate clauses to subsets of the Bottom Clause. The use of the Bottom Clause in BFORTE follows the procedure devised in (Duboc et al., 2009), with small differences in the implementation. The most significant difference is that there the Bottom Clause is generated from a positive example covered by the current clause, since the goal when specializing clauses is to keep proving positive examples while making negative examples unprovable. Here, the Bottom Clause is generated from a misclassified instance, no matter its class. It is still required that the instance is covered by the base clause, since the clause before being refined is a subset of the Bottom Clause. As in (Duboc et al., 2009), literals in the clause must obey mode declarations. Additionally, through determination declarations, it is possible to define an ordering over random variables. Remember that a determination declaration determination(pred1/N1, pred2/N2) indicates that pred2/N2 may appear in the body of a clause whose head comes from pred1/N1. By exploring all determinations, the system can identify which random variables can be a parent of another random variable. Those are random variables produced by literals in the place of pred2/N2.
Moreover, it can also identify the opposite: which random variables may be children of another random variable in the Bayesian network of an example.

Those are random variables generated from predicates in the place of pred1/N1 in determination declarations. Bayesian networks may be used to perform two reasoning strategies: bottom-up reasoning, which goes from effects to causes, and top-down reasoning, which goes from causes to effects (Spirtes et al., 2001; Pearl, 2009). We can compare these reasoning strategies to the steps performed by the operator when

taking into account the Bottom Clause: first, the Bottom Clause is generated from the instance, which is similar to searching for causes of an effect (the instance); next, the literals in the Bottom Clause may be added to the body of a Bayesian clause – the algorithm tries to improve the BLP by adding causes to effects. In fact, the causal relationship is used as a guide to construct the graph structure. Note, however, that the process of constructing the Bottom Clause and taking literals from it comprises only one level of reasoning, since only literals directly influencing the instance, i.e., the ones with an edge going to the head random variable, are collected. Thus, the literals in the Bottom Clause representing random variables may be direct causes of the effect represented by the random variable in the head of the Bayesian clause. It is important to notice that the Bottom Clause procedure is instance driven. Literals are collected considering only one instance, making the choice of literals biased by such an instance. Another instance might point out more worthwhile literals, i.e., literals whose corresponding random variables would exert a stronger influence and/or influence more instances. One alternative strategy would be to collect the Bottom Clause for more than one single instance within an example. Arguably, the space of candidate literals would then be larger. Thus, one could consider reducing the search space by measuring the volume of information flowing between the two random variables involved – the one from the candidate literal and the one from the head of the clause (Cheng et al., 2002a). We intend to investigate this strategy in future work.
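The filtering of candidate literals by determination declarations described above can be sketched as follows. This is a minimal illustration: the encoding of literals as (predicate, arguments) pairs, of determinations as (head predicate, body predicate) pairs, and the function name are all our assumptions:

```python
def admissible_literals(head_pred, bottom_clause_body, determinations):
    """Keep only the Bottom Clause literals whose predicate is declared,
    via a determination(head_pred/_, body_pred/_) declaration, as allowed
    in the body of a clause headed by head_pred."""
    allowed = {body_p for head_p, body_p in determinations
               if head_p == head_pred}
    return [lit for lit in bottom_clause_body if lit[0] in allowed]
```

With a single declaration determination(pred1/2, pred2/2), only pred2 literals of the Bottom Clause survive as candidates for the body of a pred1 clause.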

8.2 Addressing Selection of Revision Points

8.2.1 Bayesian Revision Point

The key step of the revision process is to identify revision points. BLP's classification strategy is to build Bayesian networks, grounded from the set of examples and the current Bayesian clauses, and perform inference on them in order to compute probabilities for queries. Thereby, it is natural to use the same mechanism to find the points responsible for a misclassification. PFORTE considers two types of revision points: logical and probabilistic ones. In the first case, in the same fashion

as logical theory revision, clauses failing to prove an instance are marked to be generalized. Recall that in PFORTE every instance is positive in the logical sense, and its evidence value defines the class to which the instance belongs. If the example is unprovable, PFORTE cannot compute a probability for it. In the present work we do not consider logical revision points separately from probabilistic revision points. Instead, in case the example is unprovable, we compute a probability for it by always including a default node in the Bayesian network, i.e., a node grounded from a most general clause for the instance predicate. If such a default node gives a probability below a threshold to the correct class, then it is marked as a revision point. Considering that we may now select revision points from the Bayesian network constructed for the example, we define Bayesian revision points as follows.

Definition 8.1 (Bayesian Revision Point) Let B be the Bayesian network built from Bayesian clauses and an example D. In case an instance from D is misclassified after applying an inference engine to B, there is a maximal set S of nodes, together with their respective CPDs, in B relevant to assessing the belief in the instance. Bayesian revision points are the Bayesian clauses used to yield the nodes in S.

Following such a definition, BFORTE requires only one iterative step to fix the BLP: at each iteration Bayesian revision points are found and revision operators propose refinements on them. Note that default nodes are only introduced when there are unprovable instances. They can also lead to Bayesian revision points, but in this case the only possible revision proposed for them is to create new rules. Algorithm 8.1 presents the top-level procedure of BFORTE, modified from Algorithm 7.4 by removing its first step (line 1 to line 8). As before, the algorithm follows an iterative hill-climbing procedure, where at each iteration Bayesian revision points are collected first, then revisions are proposed to the revision points and their scores are computed. The revision with the best score is chosen to be implemented, but only if it is capable of improving the current score of the BLP. An observation about score computation is necessary here. As the proposed revisions are going to modify the qualitative part of the BLP, it is also necessary to reflect those modifications in the quantitative part of the model. Thus, at each proposed modification we re-learn the parameters involved in the modified structure.


The BLP revision presented in this work has an inherently discriminative behavior: we are concerned with predictive inference, and revision points are identified through misclassified instances. In this way, it is more appropriate to employ a discriminative evaluation function such as conditional log-likelihood, root mean squared error or precision-recall based functions. Additionally, the model would benefit from a discriminative training of the parameters. Because of that, we implemented gradient descent learning of parameters to minimize the mean squared error, as initially designed in (Natarajan et al., 2008). The method is based on representing combining rules in two levels: the first one combines different instantiations of the same rule, where they all share the same ground head, and the second one combines different clauses with the same ground head, as explained in section 7.2.3.
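As a deliberately tiny stand-in for that two-level scheme, the sketch below performs one gradient-descent step on the squared error of a noisy-or combination whose inputs all share a single parameter theta. The function names, the single-parameter setup and the choice of noisy-or as the combining rule are our simplifying assumptions, not the actual parameterization of (Natarajan et al., 2008):

```python
def noisy_or(ps):
    """Noisy-or combination of the probabilities of several ground
    clauses (or instantiations) sharing the same ground head."""
    out = 1.0
    for p in ps:
        out *= (1.0 - p)
    return 1.0 - out

def sgd_step(theta, examples, lr=0.1):
    """One gradient-descent step minimizing the mean squared error.
    `examples` holds (n, target) pairs: n identical instantiations,
    each with probability theta, combined by noisy-or."""
    grad = 0.0
    for n, y in examples:
        pred = 1.0 - (1.0 - theta) ** n        # noisy-or of n copies of theta
        dpred = n * (1.0 - theta) ** (n - 1)   # d pred / d theta
        grad += 2.0 * (pred - y) * dpred / len(examples)
    return theta - lr * grad
```

Starting from theta = 0.5 with one example of two instantiations and target 1.0, a step moves theta upward, increasing the combined probability toward the target.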

Algorithm 8.1 BFORTE Top-Level Algorithm
Input: A BLP BP, the background knowledge BK, a set of examples D, a set OP of revision operators to be considered to revise clauses
Output: A revised BLP BP′
1: learn probability parameters of BP
2: repeat
3:    generate Bayesian revision points;
4:    for each Bayesian revision point do
5:       for each revision operator ∈ OP do
6:          propose revision
7:          score revision
8:          update best score revision found;
9:    if best revision improves the current score then
10:      implement best revision;
11: until no revision improves score
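The control flow of Algorithm 8.1 can be sketched as a generic hill-climbing loop. Everything below is an illustration: the revision-point generator, the operators and the scoring function are passed in as callables of our own design, not BFORTE's actual interfaces (parameter re-learning is assumed to happen inside the scoring callable):

```python
def bforte_revision(blp, examples, revision_points, operators, score):
    """Iterative hill climbing: at each iteration, propose a revision per
    (revision point, operator) pair, keep the best-scoring candidate, and
    stop when no candidate improves on the current score."""
    current = score(blp, examples)
    while True:
        best, best_score = None, current
        for point in revision_points(blp, examples):
            for op in operators:
                candidate = op(blp, point)       # propose one revision
                s = score(candidate, examples)   # CPDs re-learned inside
                if s > best_score:
                    best, best_score = candidate, s
        if best is None:                         # no revision improves score
            return blp
        blp, current = best, best_score
```

With a toy "BLP" represented by an integer, a score peaking at 3, and operators that increment/decrement, the loop climbs from 0 to 3 and stops.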

8.2.2 Searching Bayesian Revision Points through D-Separation

Originally, PFORTE denoted as probabilistic revision points all clauses whose head took part in a Bayesian network where an instance had been misclassified. Although the set of candidate clauses to be modified is smaller than in Kersting's BLP learning algorithm, since not all clauses appear in every network, depending upon the size of the network several Bayesian clauses are going to be marked to be modified by the revision operators. However, not all clauses used to generate the network may be relevant to the classification of the query.


Algorithms for learning Bayesian networks are generally grouped into two categories: scoring-based methods (Heckerman, 1996; Tian, 2000; Chickering, 2002), which use a heuristic to construct the graph and a scoring function to evaluate it; and constraint-based methods (Verma and Pearl, 1990; Spirtes et al., 2001; Cheng et al., 2002b; Campos, 2006), relying on conditional independence tests to build the model. There is also a recent effort to get the best of both worlds, by developing strategies combining these two approaches (Tsamardinos et al., 2006; Pellet and Elisseeff, 2008). In this work, we follow the second group of algorithms when identifying Bayesian revision points, by designing a procedure that analyses the dependency relationships of a misclassified instance. Accordingly, active paths are searched to identify the nodes in the network that exert influence over misclassified instances. By a path we mean any consecutive sequence of edges, regardless of their directionality. A path is active if it carries information or dependence from one node to another through this sequence of edges. Whenever there is an active path between two variables, we say they are d-connected. Formally, this concept is defined as follows.

Definition 8.2 (D-connection for directed graphs (Pearl, 1988)) For disjoint sets of vertices X, Y and E, X is d-connected to Y given E if and only if for some X ∈ X and Y ∈ Y there is an (acyclic) path U between X and Y such that each of its segments matches one of the following cases:

• U is a direct connection between X and Y: X ← Y or X → Y;

• U indicates an indirect causal/evidential effect: X ← W ← Y or X → W → Y, W ∉ E;

• U represents a common cause: X ← W → Y, W ∉ E;

• U represents a common effect (a v-structure): X → W ← Y, where W or any of its descendants ∈ E.
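The cases above can be checked mechanically along a candidate path. The sketch below is a simplified illustration: the graph is encoded as a parents map of our own choosing, and, unlike the full definition, descendants of colliders are not inspected:

```python
def path_active(path, evidence, parents):
    """Decide whether `path` (a sequence of node names) is active given
    the `evidence` set, applying the cases of Definition 8.2.
    Simplification: descendants of colliders are ignored."""
    for i in range(1, len(path) - 1):
        w, x, y = path[i], path[i - 1], path[i + 1]
        # w is a collider (v-structure) if both neighbors point into it
        collider = x in parents.get(w, ()) and y in parents.get(w, ())
        if collider and w not in evidence:
            return False   # unobserved v-structure blocks the path
        if not collider and w in evidence:
            return False   # observed chain/fork node blocks the path
    return True
```

For the v-structure a → c ← b, the path a–c–b is inactive with no evidence but becomes active once c is observed; for the chain a → b → c, observing b blocks the path.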

If no path between the sets of nodes is active, the variables are d-separated. In this case, there is a variable in the undirected path blocking the influence from one node to another. To detect the nodes d-connected to the misclassified node, we take

advantage of the linear-time Bayes Ball algorithm (Algorithm 7.1), starting from the misclassified query node. Algorithm 8.2 presents the top-level procedure for collecting revision points, which is performed in line 3 of Algorithm 8.1. First, it is necessary to identify the misclassified instances in each example. To do so, each instance of each example has its marginal probability computed. Note that when computing the marginal probability for a specific query, the other instances in the same example are considered as evidence, as is usually done in collective inference (Jensen et al., 2004). In this case the algorithm proceeds to run Bayes Ball, so that the active paths leading to the misclassified nodes can be identified, and consequently the relevant nodes are selected. The set of relevant nodes comprises the requisite nodes (observed nodes that have been visited and nodes marked on the top) and also the nodes marked on the bottom, i.e., all nodes that are not discovered as irrelevant by the Bayes Ball algorithm. Finally, the clauses corresponding to the relevant nodes are identified and marked to be revised.

Algorithm 8.2 Top-level Selection of Revision Points
Input: A threshold value T, a set N of Bayesian networks, each corresponding to an example
Output: A set RP of Bayesian clauses marked as revision points
1: RP ← ∅
2: for each Bayesian network Ni ∈ N do
3:    for each query instance id ∈ Ni do
4:       compute the probability P(id = class) using an inference engine, where class is the value of id in the dataset
5:       if P(id = class) < T then
6:          Relevant ← nodes marked as relevant by the Bayes Ball algorithm
7:          RP ← RP ∪ clauses corresponding to nodes in Relevant
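The relevant-node collection of line 6 can be sketched as follows. This is an illustrative rendering of Shachter's Bayes Ball on an explicit parents/children encoding of the ground network; the function name and graph encoding are our assumptions:

```python
def bayes_ball_relevant(query, evidence, parents, children):
    """Collect the nodes d-connected to `query` given the `evidence` set
    (Bayes Ball, Shachter 1998). `parents`/`children` map each node to
    its neighbors in the ground Bayesian network."""
    top, bottom, visited = set(), set(), set()
    # start as if the query node had been visited from one of its children
    queue = [(query, "child")]
    while queue:
        node, came_from = queue.pop()
        visited.add(node)
        observed = node in evidence
        if came_from == "child" and not observed:
            if node not in top:       # bounce the ball up to the parents
                top.add(node)
                queue += [(p, "child") for p in parents.get(node, ())]
            if node not in bottom:    # and down to the children
                bottom.add(node)
                queue += [(c, "parent") for c in children.get(node, ())]
        elif came_from == "parent":
            if observed and node not in top:
                top.add(node)         # bounce back to the other parents
                queue += [(p, "child") for p in parents.get(node, ())]
            elif not observed and node not in bottom:
                bottom.add(node)      # pass the ball through to children
                queue += [(c, "parent") for c in children.get(node, ())]
        # an observed node visited from a child blocks the ball entirely
    # relevant: visited evidence nodes plus nodes marked on top or bottom
    return (visited & set(evidence)) | top | bottom
```

In the v-structure a → c ← b with c observed, querying a makes b relevant; without the evidence on c, b is discarded as irrelevant.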

8.2.3 Analyzing the Benefits Brought by Revision Operators According to D-Separation

In this section we analyze the behavior of the revision operators by considering whether the modifications they propose are capable of changing the influence flowing in the Bayesian network. The question that arises is whether the revision operators are capable of changing the existing influence that is likely to make the instance misclassified and/or of creating new connections influencing such instances. Additionally, to

check whether the operator really brings benefits to the BLP, the latter is scored after being modified, using Bayesian networks built from the set of examples. In this way, the modifications proposed on the BLP form a hybrid mechanism combining independence tests and scored heuristic search methods. To analyze the modifications proposed by the operators, we focus attention on the directions in which the ball is bounced from/to nodes in the Bayes Ball algorithm, aiming to disregard modifications that can neither bring additional influence over the query node nor modify the current one. Additionally, we must also consider whether the modification can change the set of Bayesian clauses instantiated by the example to build the Bayesian network, since this can also change the influence coming to a node. Remember from section 7.2.2 that a directed edge is included between a pair of nodes if there is a ground clause whose head is represented by the node where the edge arrives and whose body contains a representative of the node from where the edge leaves. We proceed by discussing each possible situation occurring in nodes marked as relevant separately.
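The edge-inclusion rule recalled above is mechanical. The sketch below builds the edge set of a support network from ground clauses encoded as (head, body) pairs, an illustrative encoding of our own:

```python
def support_network_edges(ground_clauses):
    """Edge set of the support network: one directed edge from every body
    atom to the head atom of each ground Bayesian clause (cf. section
    7.2.2). Clauses are (head, [body atoms]) pairs."""
    edges = set()
    for head, body in ground_clauses:
        for atom in body:
            edges.add((atom, head))   # the body atom influences the head
    return edges
```

Two ground clauses sharing a body atom thus yield a network in which that atom has two outgoing edges, one per head.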

Non-observed node visited from a child (NOC node)

A non-observed node visited from a child passes the ball from the child to (1) its parents, so that additional indirect causal influence might be brought in, and to (2) its children, which can also bring back further indirect effect influence, as visualized in Figure 8.1. Naturally, the visiting child is also likely to influence the visited node. The node representing the misclassified instance itself is such a node, since the visit starts from it as if it had been visited from a child, according to Algorithm 7.1. In this case, the set of operators is going to behave as follows, as graphically represented in Figures 8.2, 8.3, 8.4 and 8.5.

• Delete rule operator. As the clause whose head represents such a NOC node is deleted, the flow of influence through this path is withdrawn. Thus, the indirect causal/effect influence over the misclassified instance brought by this node is removed from the network. While this is a radical approach, it can reduce the size of the BLP and consequently of the Bayesian networks.

• Add antecedent operator. A NOC node passes the ball to its parents, in an attempt


Figure 8.1: Non-observed node visited from a child, where shaded nodes are observed nodes. Figure (a) shows a non-observed node visited from a child. Figure (b) shows this node passing the received ball to its parents and children.

Figure 8.2: An example of the effect of deleting a rule corresponding to a non-observed node visited from a child, in a misclassification node visiting scenario.

to discover further influence on itself. Adding literals to the body of such a clause is going to add additional parents to the clause head, and consequently a new flow of influence might be brought in, in case the examples remain provable after the modification. On the other hand, the influence passing through this node is removed in case an example becomes unprovable, similarly to what happens with the delete rule operator. Note that this is different from a standard Bayesian network operation, since here changing the set of clauses covering an


instance is also going to change the Bayesian network built from the example.

Figure 8.3: An example of the effect of adding antecedents to a rule corresponding to a non-observed node visited from a child, in a misclassification node visiting scenario.

• Delete antecedents operator. In the same spirit that adding antecedents might bring new profitable influence over the node, deleting antecedents may remove a parent whose influence through the node contributes to a wrong prediction.

Figure 8.4: The effect of deleting antecedents from a rule corresponding to a non-observed node visited from a child, in a misclassification node visiting scenario.

• Add rule. As a new rule shares the set of satisfied examples with the old rule giving rise to a NOC node, two situations may arise: (1) the new rule and the old rule(s) give rise to the same ground atom in their heads, which is going


to be included in the support network. In BLPs, this requires that the final probability value be obtained through a combining function. The second possible situation (2) happens when the new and old rules do not share the same ground atom in their heads. In this case, the node coming from the head of the new rule may bring new influence, as it is likely to become a mate of the NOC node.

Figure 8.5: The effect of adding a new rule from a non-observed node visited from a child, in a misclassification node visiting scenario. Figure (a) shows a situation where the old and new rules had to be combined. Figure (b) shows the case where both rules were not combined.

Observed node visited from a child (OC node)

Observe in Algorithm 7.1 that an observed node visited from a child blocks balls from children, i.e., if a ball comes from one of its children it is not passed on by the OC node. This happens because the evidence on the node makes the path from its children through it inactive. Therefore, an OC node d-separates its child from the other nodes in the graph, i.e., nodes which could be reached if it were possible to pass through an OC node. This is exhibited in Figure 8.6, where the arrows indicate the direction of the visit. Although OC nodes are marked as revision points, since they are not considered irrelevant for the misclassified instance, not all operators are going to propose profitable modifications to change the influence exerted by them.


• In case the rule whose head corresponds to an OC node is deleted from the BLP, its child loses this influence. As the child is considered a requisite node, the operator may bring benefits to the model, in case the influence of such a node is harmful to most of the query nodes. Therefore, this is an eligible operator to modify the set of influential connections to the misclassified instances.

• At first sight, adding antecedents to the body of a clause whose head gives rise to an OC node does not create a new flow of influence, since any of its parents is blocked by its evidence. However, we must pay attention to the fact that those nodes originate from Bayesian clauses, which are in essence logical clauses. In this way, the add antecedents operator may remove some instances from the set of examples proved by the clause. The possible consequence is that the head of the clause is no longer able to produce the OC node indicating a revision point. This is similar to the case of the delete rule operator.

• Deleting antecedents from the body of a clause has two possible consequences in this case. First, the OC node is going to lose a parent. As its parents are blocked by its own evidence, this brings no changes in the flow of influence. Second, because a clause that has had one or more antecedents deleted is more general than before, additional proof paths from the same clause may arise to build the Bayesian network. The ground heads of those additional paths either are the same as the previous OC node, or they induce new ground heads. In the first case, those equal nodes must be combined, but besides sharing the same CPD (as they come from the same clause), they will also continue to contribute the same evidence (as they come from the same ground fact). On the other hand, in case the modified clause produces new ground heads, there is a chance that they are going to bring new influence to the child from which the ball came. Consider, for example, the meaningless ground clause below, which has given rise to an observed node in the network:

pred1(1,2) :- pred2(1,3), pred3(3,2), pred4(2).

Now, suppose pred3(X,Y) is chosen to be deleted from the original clause.


One possible consequence would be the appearance of a new ground clause with the same ground head as before, say

pred1(1,2) :- pred2(1,_), pred4(2).

that had not appeared before because there is no pred3(4,2) in the dataset. Note that both ground clauses have the same head and accordingly they are going to be combined. However, the random variable pred1(1,2) continues to be an observed node visited from a child and therefore is still a "blocking" node. On the other hand, it may be the case that a new ground rule is produced after the deletion of the antecedent pred3(X,Y), for example:

pred1(1,5) :- pred2(1,_), pred4(5).

Such a clause could not be produced before because there is no pred3(_,5) in the background. This clause has a chance to bring new influence to the node from which the ball came, as pred1(1,5) may become an ancestor of it.

• Add rule. Similarly to the deleting antecedents case, adding new rules may bring additional evidence to the network.

Non-observed node visited from a parent (NOP node)

Besides collecting evidence for itself, a non-observed node may be visited from its parent in an attempt to create an indirect effect path through one of its children, as can be seen in Figure 8.7. However, if this is really the case, such a node would also be marked as non-observed visited from a child (NOC node) and therefore would fall into a previously discussed case. Thus, deleting the rule corresponding to such a node's head would only make the BLP, and consequently the Bayesian networks, more compact. Similarly, adding antecedents to its body could only bring influence to its children, but this is also treated by the case of a visit coming from a child. The operators left are the ones involving logical generalization. First, one may try to delete antecedents from such a clause in an attempt to generate new proof


Figure 8.6: Observed node visited from a child, where shaded nodes are observed nodes. Figure (a) shows an observed node visited from a child. Figure (b) shows this node blocking the ball.

paths for some instance and consequently new ground heads forming a "brotherhood" relationship with the NOP node. In case the node visiting the NOP node remains its parent (i.e., it is not one of the deleted antecedents), there will be a link to the node(s) coming from such new instantiations. There is a chance those new nodes bring new evidence, as they are going to represent different ground facts of the dataset. The exact same situation may happen when adding new rules from such a NOP node. In this case, the link to the parent will exist if the visiting parent is also an element in the body of the just-created clause. Consider, for example, the ground clause

pred(1,2) | pred1(1,3), pred2(3,4), pred3(4,2), pred4(2,2).

and assume the node yielded from pred(1,2) is a NOP node, visited from the node produced from pred1(1,3). Now, consider that the Bayesian predicate pred3 is deleted from the clause. This could generate a new ground clause, say

pred(1,4) | pred1(1,3), pred2(3,5), pred4(4,4).

since pred(1,4) could not be proved by this clause before because there is no


Figure 8.7: Non-observed node visited from a parent, where shaded nodes are observed nodes. Figure (a) shows a non-observed node visited from a parent. Figure (b) shows this node passing the ball to its children.

pred3(5,4) in the dataset. Note in Figure 8.8 that a new common cause trail could appear in the network due to this modification.

Observed node visited from a parent (OP node)

The last case concerns an evidence node which is visited from a parent and bounces the ball back to its other parents, in an attempt to produce a common effect scenario, like the one presented in Figure 8.9. All the operators may change the active paths, as discussed next.

• Deleting the rule represented by such a node is going to cut it off from the active path discovered by the visit. If the OP node contributes to a wrong prediction in this active path, removing the node from the network can repair misclassified instances.

• Deleting antecedents is beneficial for two reasons: first, as in previous cases, the generalization of the rule can bring additional active paths built from new instantiations of the rule, which did not exist before because of the deleted antecedent(s). Second, deleting antecedents from the rule may remove the link to the visiting parent and, similarly to the delete rule case, withdraw an influence


Figure 8.8: The effect of deleting antecedents from a clause whose head is a non-observed node visited from a parent. Figure (a) is the network before deleting antecedents. The red node is the visiting parent and the yellow node is the visited node. Figure (b) shows a possible resulting network, after the clause becomes more general.

contributing to the wrong prediction exerted by the OP node. Moreover, deleting antecedents from such a node can remove common effect paths.

• Adding new rules has effects similar to the delete antecedents case: proofs not existing before may bring additional influence to the path, in case active paths are also formed by the instantiations of the new rule, and new common effect paths may arise from the added rule.

• Adding antecedents may bring two different consequences: the first one is similar to the result of deleting the rule, and may remove the link to the visiting parent by making the clause more specific after some antecedents are added to it. Thus, the previous proof leading to the parent node would fail. The second consequence is that adding antecedents to the body of such a clause might create a further common effect influence path, which could be useful in correcting the misclassification of the example because of new evidence.


Figure 8.9: Observed node visited from a parent, where shaded nodes are observed nodes. Figure (a) shows an observed node visited from a parent. Figure (b) shows this node passing the ball back to its parents.

Remarks about Revision Operators and D-connection

One can see that there are several issues concerning the real benefits a revision operator might bring. For example, some of the modifications are only likely to create/remove a connection when the set of ground Bayesian clauses changes after the revision. Moreover, some modifications are likely to bring/remove influence to fix a misclassification, but this is not guaranteed, as whether they make new active paths depends on the evidence. Some modifications also depend on whether a literal remains in the body of a clause, ensuring that a connection in the network is not removed; this can be true for one instance, but may not happen for other instances. As checking all possible cases is expensive, encompassing not only the connections that are going to be removed but also the active paths that are going to be created, we only disregard the operators guaranteed not to change any influence that could fix the misclassified instance. In any case, by computing the score of the revision it is verified whether the modification improves the current BLP. Notice that the revision operators used in this work do not compose a complete set of possible modifications to the BLP. To give an example, note that new active paths could be created by adding rules or antecedents leading a parent to visit

and/or be visited from a child. To achieve this goal, the revision system would have to mark revision points not only throughout the Bayesian networks but also from the knowledge base itself, by identifying all possible determinations leading to new connections coming from children to misclassified instances in the graph. Suppose, for example, that we have the Bayesian clause below, marked as a revision point.

pred1(A, B) :- pred2(A, B).

It would be possible that a non-existing clause could provide new information to pred1/2, say a clause such as

pred3(A, B) :- pred1(A, B).

As pred3/2 does not represent any instance in the examples and there is no existing connection between pred3 and pred1, such a clause would not be proposed by the revision system. BFORTE and its predecessor PFORTE cannot identify those possible new connections, as Bayesian revision points are identified through paths existing at the moment an instance is misclassified. In this way, we cannot guarantee that we implement the complete set of possible modifications to improve classification in a BLP. Finally, the BLP could be compacted by identifying clauses that are completely useless to any example in the dataset. Those correspond to nodes identified as irrelevant by the Bayes Ball algorithm. To achieve this, Bayes Ball would have to run for each instance in the dataset; then, the intersection of irrelevant nodes would correspond to the clauses that could be safely removed from the BLP. Algorithm 8.3 exhibits the procedure for applying revision operators to revision points, extended from lines 3-7 of Algorithm 8.1. It brings an optimization on the use of the four revision operators. First, when looking for revision points, it is necessary to keep track of the type each node matches, according to the four groups listed before. As the marked node is mapped to a Bayesian revision point (a Bayesian clause), whether it is observed or not, and visited from a child or from a parent, is stored together with the revision point. From our analysis, only deleting rules and adding

antecedents to a non-observed evidence are guaranteed not to present any chance of changing the connections to the misclassified instance. In all other cases, the proposed revision may change the Bayesian network. Therefore, before proposing the revision, it is necessary to be sure that a Bayesian revision point is not identified only through that kind of visit.

Algorithm 8.3 Generation of Revision Points and Revisions Top-Level Procedure
Input: Set of Bayesian networks BN built from the dataset and current BLP
Output: Set of revisions proposed to Bayesian revision points
1: BRP ← Bayesian revision points identified through Algorithm 8.2, slightly modified to keep track of the type of the node (OC, NOC, OP, NOP)
2: for each Bayesian revision point BR ∈ BRP do
3:     if type of BR is not only NOP then
4:         propose modifications to BR using all four operators
5:     else
6:         propose modifications to BR using the delete antecedents and/or add rules operators
7:     compute the score for each proposed revision
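The operator dispatch in Algorithm 8.3 can be sketched as follows; the data structures and names are illustrative assumptions, not BFORTE's actual implementation:

```python
# Hypothetical sketch of Algorithm 8.3's operator dispatch: revision points
# whose head node was only ever visited as a non-observed parent (NOP) receive
# only the two generalizing operators; all other points receive all four.
ALL_OPERATORS = ["delete_rule", "delete_antecedents", "add_rule", "add_antecedents"]
GENERALIZING_OPERATORS = ["delete_antecedents", "add_rule"]

def propose_revisions(revision_points):
    """revision_points: dict mapping a Bayesian clause (id) to the set of
    node types (subset of {'OC', 'NOC', 'OP', 'NOP'}) through which it was
    marked as a revision point."""
    proposals = []
    for clause, types in revision_points.items():
        # If the clause was reached ONLY as a non-observed parent, deleting
        # the rule or adding antecedents cannot change the connections to the
        # misclassified instance, so those operators are skipped.
        ops = GENERALIZING_OPERATORS if types == {"NOP"} else ALL_OPERATORS
        for op in ops:
            proposals.append((clause, op))  # each proposal is scored later
    return proposals
```

The score computation of line 7 then ranks these `(clause, operator)` pairs.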

In the next section we discuss how to reduce the amount of time expended in the revision by reducing the inference space.

8.3 Addressing Inference Space

In order to build Bayesian networks from the set of examples, PFORTE collects all ground clauses taking part in every possible way of proving the instances and evidences in an example. Those ground Bayesian atoms become nodes in the Bayes net, and they are connected when there is a direct influence between them. As the instances and evidences in a single example are not i.i.d., each Bayesian network may have a large number of nodes and edges connecting evidences, instances and non-evidence nodes, leading to high inference run times. Aiming to reduce the space where inference is performed, it is necessary to diminish the number of nodes to be considered by the inference engine. Making inference more efficient in relational probabilistic systems has been pursued through the development of lifted inference methods (Poole, 2003; Singla and Domingos, 2008; Milch et al., 2008; Meert et al., 2010), which mostly reduce the networks by exploiting symmetries and grouping variables with identical behavior. Here, we decrease the

search space of the inference procedure by disregarding non-requisite nodes of an instance. Non-requisite nodes can be deleted from the network and we would still be able to resolve the query, as they are incapable of influencing the instance. In addition, clauses and networks exhibiting identical behavior are grouped together so that they do not have their score and parameters computed repeatedly and unnecessarily.

8.3.1 Collecting the Requisite Nodes by d-separation

As said before, the networks built from examples are usually very large, since they are composed of the union of all possible proofs of each instance and each evidence. However, it is usually the case that for computing marginal probabilities for one instance only a subset of that large network is really relevant. We call the network composed of this subset of nodes the minimum requisite network for a query.

Definition 8.3 The minimum requisite network g for a query node n is a subset of the original large network G built for the whole example, containing only the requisite nodes necessary to compute the marginal distribution of n. All the other nodes in G can be safely disregarded and the query is still resolved by g.

We make use of the Bayes Ball algorithm (Shachter, 1998) to find out which nodes in the network are required to compose the minimum requisite network. In this way, instead of performing inference in one large network, the inference takes into account only a small subset of that network, namely the nodes which are really relevant for resolving the query. To collect such requisite nodes, the algorithm starts from one instance id in the original example and uses Bayes Ball to identify the minimum set of nodes required to compute the probability of id. This set is composed of the visited observed nodes and the nodes marked on the top. Following collective inference (Jensen et al., 2004), the other query instances in the example are treated as evidence when collecting the requisite nodes for id. Algorithm 8.4 brings the top-level procedure to identify requisite nodes and perform inference only over them. Algorithm 8.4 is called each time it is necessary to compute marginal probabilities, namely, to learn parameters and compute scores, when changes occur in the structure of the BLP.


Algorithm 8.4 Top-level Procedure for Performing Inference over Requisite Nodes Only

1: for each example Di ∈ D do
2:     build the graph GDi for Di, joining support networks built from the proof tree of each instance and evidence
3:     for each instance id ∈ Di which is a node nid in GDi do
4:         consider nid as a query node
5:         visit nodes in GDi starting from nid, using the Bayes Ball algorithm
6:         requisite ← visited observed nodes ∪ nodes marked on the top
7:         compute probability distribution for id using only requisite
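A minimal sketch of the requisite-node collection of line 5-6, following the standard reachability formulation of Bayes Ball (Shachter, 1998); the graph representation (`parents`/`children` dicts keyed by node) is an assumption for illustration:

```python
from collections import deque

def requisite_nodes(parents, children, query, evidence):
    """Collect the nodes needed to compute P(query | evidence):
    d-connected non-observed nodes plus the observed nodes visited."""
    # Phase 1: ancestors of the evidence (they activate common-effect paths).
    anc, frontier = set(), deque(evidence)
    while frontier:
        y = frontier.popleft()
        if y not in anc:
            anc.add(y)
            frontier.extend(parents[y])
    # Phase 2: traverse active trails starting from the query node.
    visited, reachable = set(), set()
    schedule = deque([(query, "up")])
    while schedule:
        y, d = schedule.popleft()
        if (y, d) in visited:
            continue
        visited.add((y, d))
        if y not in evidence:
            reachable.add(y)
        if d == "up" and y not in evidence:
            for p in parents[y]:
                schedule.append((p, "up"))      # ball bounces up to parents
            for c in children[y]:
                schedule.append((c, "down"))    # and down to children
        elif d == "down":
            if y not in evidence:
                for c in children[y]:
                    schedule.append((c, "down"))
            if y in anc:  # observed (or ancestor of observed): v-structure active
                for p in parents[y]:
                    schedule.append((p, "up"))
    visited_evidence = {y for (y, d) in visited if y in evidence}
    return reachable | visited_evidence
```

On a v-structure A → C ← B with C observed, querying A correctly pulls in B and C; with C unobserved, B is excluded, matching d-separation.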

Bayes Ball was also used in the SRL context to build a lifted inference engine in (Meert et al., 2010). There, the authors apply Bayes Ball directly in the ground graph construction procedure. We are going to investigate in the future whether that algorithm can save further time in the inference process.

8.3.2 Grouping together Ground Clauses and Networks

When constructing the Bayesian networks relative to examples, every successful proof of a Bayesian atom is gathered together to build the graph of the net. However, it is often the case that a large number of those ground clauses share exactly the same features: they are instantiated from the same Bayesian clause, and therefore they have the same conditional probability distribution; additionally, they have the same evidence associated to each of their Bayesian ground atoms. In this way, the only difference among them would be the terms of the ground Bayesian atoms. However, for the Bayesian network, atoms are mapped to nodes, which cannot "see" those terms anyway. Therefore, it is a waste of resources to represent each Bayesian atom obeying these conditions as a different node in the graph. Consider, for example, the following illustrative clause.

pred(A, B) :- pred1(C, A), pred1(C, B).

and suppose that for a ground atom over terms t1, t2 we have several possible instantiations for the variable C, producing, for instance:

pred(t1, t2) :- pred1(r1, t1), pred1(r1, t2).
pred(t1, t2) :- pred1(r2, t1), pred1(r3, t2).
pred(t1, t2) :- pred1(r3, t1), pred1(r3, t2).


...
pred(t1, t2) :- pred1(r30, t1), pred1(r30, t2).

Now, suppose that, say, 25 of these share the same evidence for pred1(X, t1), pred1(X, t2). Instead of mapping each one of those nodes above to a separate node in the Bayesian network, we could group the 25 × 2 nodes with the same evidence together and keep 5 × 2 separate nodes. Thus, to prevent this waste of resources incurred by the BLP network construction, we developed a procedure for grouping together different instantiations of the same Bayesian clause that behave in exactly the same way. Such a group of similar nodes gives rise to a single mega node in the network, instead of one node per instantiation.

Definition 8.4 Let SN be the set of nodes originated from the different successful proof paths created using a Bayesian clause Bcc and an instance ins. A mega node is defined as the grouping of a set of nodes in SN which are relative to the same Bcc and share the same evidence.

Because of that, a random variable in the graph is not necessarily a single ground Bayesian atom, but could be representative of a group of several Bayesian atoms. The number of atoms mapped to the same node is kept so that it is taken into account when learning parameters. Notice that we employed the same schema of combining rules as (Natarajan et al., 2008), where instantiations of clauses with the same head are combined in two levels: one level considers different instantiations of the same rule, while another level considers different rules with the same ground head, as explained in Section 7.2.3. Thus, the clauses we mapped here to the same nodes would be combined anyway in the first level we just mentioned. However, although they would contribute in the same way to the final probability, since they share the same association of evidence values and CPDs, the inference engine has no knowledge of that and would compute their probability over and over again. Moreover, as decomposable combining rules are applied in BLPs, each different instantiation would be seen as a "separate experiment", by adding extra nodes in the network to represent them. By keeping the amount of ground atoms

grouped together in a mega node, those nodes still contribute as separate experiments for the computation of parameters. We also employ another way of overlapping components so that inference engines have to deal with a smaller space. This is the case when more than one graph has the same properties: the nodes come exactly from the same clauses, and the evidence for those nodes is also the same. This is naturally a less frequent situation than the one we discussed immediately before, but it can still happen, mainly when Bayes Ball is applied to split the original networks into groups of smaller ones. To map more than one network to the same one, it is required that (1) they have the same number of nodes; (2) they have the same edges connecting each pair of nodes; (3) each group of child + parents is relative to the same Bayesian clause, so that the probability distributions associated to them in the different graphs are the same; and (4) the nodes have the same evidence values, in case they are observed. The instances relative to the networks grouped into one overlapped mega network are kept around, so that they can be taken into account when learning parameters or computing probabilities. We show in Algorithms 8.5 and 8.6 the procedures for detecting and grouping together groups of nodes with the same behavior.

Algorithm 8.5 Top-level Procedure for Detecting Similar Ground Clauses and Mapping them to Only One

1: for each ground clause clauseiθ1 ∈ proof trees of the same example do
2:     for each ground clause clausejθ2, i ≠ j, ∈ the proof trees do
3:         if clausei comes from the same Bayesian clause as clausej then
4:             for each Bayesian atom aki ∈ clauseiθ1 and akj ∈ clausejθ2 do
5:                 if value associated to aki ≠ value associated to akj then
6:                     go to line 2
7:             remove clausejθ2 from proof tree
8:             increase number of mapped clauses associated to clauseiθ1
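Algorithm 8.5 amounts to grouping ground clauses by a signature. A minimal sketch, assuming (this input format is ours, not BFORTE's) that each ground clause is summarized by its originating Bayesian clause and the tuple of evidence values of its atoms:

```python
def group_ground_clauses(ground_clauses):
    """Collapse ground clauses instantiated from the same Bayesian clause
    whose atoms carry the same evidence values into one entry, keeping a
    multiplicity count so parameter learning still sees every instantiation.
    ground_clauses: iterable of (bayesian_clause_id, evidence_values) pairs,
    where evidence_values is a tuple with one value per atom of the clause."""
    groups = {}
    for clause_id, evidence_values in ground_clauses:
        key = (clause_id, evidence_values)    # terms of the atoms are ignored
        groups[key] = groups.get(key, 0) + 1  # multiplicity of the mega node
    return groups
```

Each resulting key becomes one mega node; the count is the number of ground atoms it represents.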

8.4 Experimental results

In this section we evaluate our revision system, BFORTE, by comparing each contribution developed in this chapter with a version of the system without that contribution. As there is no current implementation of the BLP structure learning algorithm,


Algorithm 8.6 Top-level Procedure for Detecting Similar Networks and Mapping them to Only One

1: for each network Ni do
2:     for each network Nj, i ≠ j do
3:         if DAG(Ni) == DAG(Nj) then
4:             for each node nki ∈ Ni and node nkj ∈ Nj do
5:                 if nki comes from a different clause than nkj or nki has an evidence value different from nkj then
6:                     go to line 2
7:             increase number of grouped nets associated to Ni
8:             associate instance(s) of Nj to corresponding node(s) of Ni
9:             disregard Nj

we use our own system to make the due comparisons. We opted to show each contribution separately, so that it is possible to assess the benefits brought by each of them.

Datasets We have considered datasets used in the ILP and SRL communities. They are briefly described as follows.

1. UW-CSE is a widely used dataset in the SRL community (Singla and Domingos, 2005; Richardson and Domingos, 2006; Kok and Domingos, 2007). It consists of information about the University of Washington Department of Computer Science and Engineering. There are 5 examples, where each one of them contains instances representing the advisedby relationship for a different research line of the department. There are 113 instances of the positive class and 2711 instances of the negative class, and 2673 ground facts.

2. Metabolism is based on the data provided by KDD Cup 2001 (Cheng et al., 2002b). The data consist of 6910 ground facts about 115 positive instances and 115 negative instances. As part of the facts represents the interaction between genes, we gathered all the positive and negative instances together in one example.

3. Carcinogenesis is a well-known domain for predicting structure-activity relationships (SAR) concerning activity in rodent bioassays (Srinivasan et al., 1997). There are 162 instances from the positive class and 136 instances from the negative class, and 24342 ground facts about them. This dataset is totally


separable, as each example is composed of only one instance.

Experimental Methodology The datasets were split into 5 disjoint folds to use a k-fold stratified cross-validation approach. Each fold keeps the rate of the original distribution of positive and negative examples (Kohavi, 1995). To avoid overfitting during the revision process, similarly to (Baião et al., 2003), we applied a 5-fold stratified cross-validation approach to split the input data into disjoint training and test sets and, within that, a 2-fold stratified cross-validation approach to split the training data into disjoint training and tuning sets. The revision algorithm monitors the error with respect to the tuning set after each revision, always keeping around a copy of the theory with the best tuning set score, and the saved "best-tuning-set-score" theory is applied to the test set. The significance test used was the corrected paired t-test (Nadeau and Bengio, 2003), with p < 0.05. As stated by Nadeau and Bengio (2003), the corrected t-test takes into account the variability due to the choice of training set and not only that due to the test examples, which could otherwise lead to gross underestimation of the variance of the cross-validation estimator and to the wrong conclusion that the new algorithm is significantly better when it is not. The initial theories for Carcinogenesis and Metabolism were obtained from the Aleph system using default parameters, except for clause length, which is defined as 5, noise, defined as 30, and minpos, set to 2. As UW-CSE is a highly unbalanced dataset, we use the m-estimate as evaluation function and noise set to 1000. To generate such theories, the whole dataset was considered, but using a 5-fold cross-validation procedure. Thus, a different theory was generated for each fold, and each one of these theories is revised considering its respective fold (the same fold is used to generate and revise the theories). All the experiments were run on Yap Prolog (Santos Costa, 2008) and Matlab.
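The corrected resampled t-statistic of Nadeau and Bengio (2003) replaces the usual 1/k variance factor by 1/k + n_test/n_train, accounting for the overlap between training sets across folds; a minimal sketch (the function name is ours):

```python
import math

def corrected_paired_t(diffs, n_train, n_test):
    """Corrected resampled paired t-statistic (Nadeau and Bengio, 2003).
    diffs: per-fold differences in score between the two compared systems;
    n_train, n_test: number of training and test examples per fold."""
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)  # sample variance
    # The n_test/n_train term inflates the variance to correct for the
    # dependence introduced by overlapping training sets.
    return mean / math.sqrt((1.0 / k + n_test / n_train) * var)
```

The statistic is then compared against a Student t distribution with k - 1 degrees of freedom at the chosen significance level (p < 0.05 here).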
To handle Bayesian networks, we have re-used the Bayes Net Toolbox (Murphy, 2001), properly modified to tackle the particularities of BLPs. Combining rules are represented in two levels, where the first level (different instantiations of the same clause) uses mean as the combining rule and the second level (different clauses with the same head) uses weighted-mean as the combination function. To learn parameters, we

implemented the discriminative gradient descent described in (Natarajan et al., 2008), which minimizes the mean squared error. For UW-CSE, we use a weighted MSE, with the weight inversely proportional to the distribution of the classes, so that the parameters do not totally favor the negative class because of its larger number of examples. To perform inference, we adapted the variable elimination inference engine to handle combining rules. We impose a minimum of 2 misclassified examples for a clause to be considered as a revision point, in order to avoid outliers. The threshold for indicating whether an instance is correctly predicted is defined as 0.5.
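The two-level combining scheme just described (mean within a clause, weighted-mean across clauses sharing a ground head) can be sketched as follows; the input format and function name are illustrative assumptions, not the toolbox's API:

```python
def combine(prob_tables, weights):
    """Two-level combination of P(head = true | parents) values.
    prob_tables: dict clause_id -> list of probabilities, one per ground
    instantiation of that clause (a grouped mega node still contributes one
    entry per ground atom it represents, via its multiplicity).
    weights: dict clause_id -> weight used in the second level."""
    # Level 1: mean over the instantiations of the same Bayesian clause.
    level1 = {cid: sum(ps) / len(ps) for cid, ps in prob_tables.items()}
    # Level 2: weighted mean over the clauses sharing the same ground head.
    total_w = sum(weights[cid] for cid in level1)
    return sum(weights[cid] * p for cid, p in level1.items()) / total_w
```

For example, two instantiations of clause c1 predicting 0.8 and 0.6 combine to 0.7 at level 1; with a second clause c2 at 0.4 and equal weights, the final value is 0.55.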

8.4.1 Comparing BFORTE to ILP and FOL Theory Revision

First of all, we would like to know whether BFORTE achieves better scores than standard first-order logic systems. We compared BFORTE after learning the initial parameters (after line 1 of Algorithm 8.1) and BFORTE after revising the structure to Aleph and FORTE. The same theories learned by Aleph are provided to both revision systems. The first two columns of Table 8.1 present the conditional log-likelihood achieved after learning the initial parameters and after the revision process is finished, respectively. The third and fourth columns present the score of BFORTE after learning the initial parameters and after the revision process is finished. The last two columns present the scores of Aleph and FORTE, respectively. Bold faces indicate the best score results obtained among all systems. The symbol ♦ indicates the cases where the score of BFORTE is significantly better than the score of Aleph. The symbol • indicates the cases where BFORTE is significantly better than FORTE. Finally, ? emphasizes the cases where it was possible to obtain an improvement after revising the structure.

Table 8.1: BFORTE compared to Aleph and FORTE

System/         BFORTE    BFORTE    BFORTE   BFORTE      Aleph   FORTE
Dataset         Params    Struct    Params   Struct      Score   Score
                CLL       CLL       Score    Score
UW-CSE          -0.2142   -0.0746?  0.3504   0.5363♦?•   0.3684  0.2864
Metabolism      -0.784    -0.6912?  60.00    64.76♦?•    56.51   59.57
Carcinogenesis  -0.6712   -0.6618   59.38    61.30♦•     55.0    55.70

From the table we can see that BFORTE, by revising the structure, can improve the probabilities of the examples, since the conditional log-likelihood has significantly increased in two of the three cases. The final value of the score function is better than that of the first-order systems in all cases, showing that it is possible to obtain more accurate systems when uncertainty is taken into account. Moreover, in two cases the revision of the structure improved the initial score. Although learning parameters improved the initial score of Carcinogenesis compared to the first-order systems, the final score is not significantly better than the initial score.

8.4.2 Speed-up in the Revision Process due to the Bottom Clause

Now, we would like to verify whether introducing the Bottom Clause as the space of literals can decrease the runtime without harming the score. To focus on this issue, we run BFORTE using the Bottom Clause procedure and BFORTE using the FOIL algorithm to generate literals. Results of runtime and score are exhibited in Table 8.2. As expected, UW-CSE performs significantly faster when the Bottom Clause is the search space of literals. Surprisingly, the score is also better in the Bottom Clause case, although the difference is not significant. By analyzing the results of each fold, we found out that one fold has a higher score in the FOIL case, as it adds to a clause one literal that does not appear in the Bottom Clause of the chosen example in the BFORTE case. A similar situation also happens in another fold, but this time the literal added to the clause in one of the iterations makes the theory perform worse on the validation set. As a result, BFORTE has a better score for this fold. Unfortunately, we were not able to collect FOIL results on the other two datasets, as the system runs out of memory after some time running. The reason, besides the much larger search space generated by the FOIL approach, is that, differently from UW-CSE, Metabolism and Carcinogenesis have several predicates with constants defined in the mode declarations. The top-down approach of FOIL cannot generate literals with constants, and then a variable is put in the place of a possible constant. The problem that arises is the large number of instantiations of the same literal, yielding a huge amount of different proof paths for the same literal. Consider, for example, the following mode definition of a predicate in the Carcinogenesis

domain:

modeb(*, atm(+drug, -atomid, #element, #integer, -charge)).

Types element and integer are defined as constants. The Bottom Clause construction procedure is able to find a constant for the place of such a mode, since the clause is created from a particular instance. Those constants limit the ground facts that can be unified with this literal. However, FOIL puts variables in the third and fourth terms. As a consequence, every possible atm/5 ground literal in the dataset is able to unify with the variabilized new literal, producing in certain cases a huge amount of different proofs that Matlab cannot handle. Note that this is an additional complexity of BLPs compared to the first-order case, as in the latter it is not necessary to collect all possible proofs explaining an example.

Table 8.2: Comparison of runtime and accuracy of BFORTE with Bottom Clause and BFORTE using FOIL to generate literals.

                BFORTE                  BFORTE without BC
Dataset         Learning Time   Score   Learning Time   Score
UW-CSE          8061.07?        0.5363  10884.16        0.5159
Metabolism      7184.15         64.76   N/A             N/A
Carcinogenesis  8812.16         61.30   N/A             N/A

8.4.3 Selection of Revision Points

In this experiment, we focus on the question of whether the BFORTE approach to selecting revision points can decrease learning time. Due to the limitations of PFORTE and BLP, especially concerning the search space of literals (previous section), we compare BFORTE to BFORTE simulating those algorithms with regard to the selection of revision points. Notice that this is necessary especially because those systems cannot run in two cases without bounding the search space to the Bottom Clause, as we can see in Table 8.2. The compared settings are as follows.

• BFORTE: this is the system implemented with all the contributions presented in this chapter: the Bottom Clause is used to bound the search space of literals, revision points are selected using Bayes Ball, and the inference procedure is optimized.

• PFORTE-like: this case considers the Bottom Clause to bound the search space and optimized inference. However, the selection of revision points follows the PFORTE system, where all clauses appearing in the network of a misclassified instance are marked as revision points.

• BLP-like: this case considers the Bottom Clause to bound the search space and optimized inference. However, the selection of revision points follows the BLP structure learning algorithm, where all Bayesian clauses in the BLP are refined.

We call the last two cases PFORTE-like and BLP-like because the original PFORTE and BLP learning algorithms have none of the improvements we designed in this chapter. Table 8.3 shows the results of runtime, score and number of revised clauses for each setting. The symbol ? indicates the cases where there is a significant difference between BFORTE and PFORTE-like, and the symbol • indicates a significant difference between BFORTE and BLP-like. The runtime of BFORTE is significantly better than both PFORTE-like and BLP-like. Note that PFORTE-like performs worse than BLP-like. This is due to the size of the network: as the network of UW-CSE is quite large and PFORTE selects all clauses taking part in the network of a misclassified instance, it always marks the whole theory as revision points. Thus, it has the same search space as BLP-like. However, as BLP-like modifies all Bayesian clauses anyway and PFORTE additionally performs inference to find out the misclassified instances, it expends more time to finish the revision process than BLP-like. On the other hand, the score of BFORTE is worse than PFORTE-like and BLP-like, although not significantly. The reason for that difference is that in one fold different theories are kept by the validation set during the learning. Unfortunately, we could not obtain results in reasonable time (< 48h) using either the PFORTE or the BLP setting for the Metabolism dataset. This is mainly due to the large search space that is explored, since, during the learning time that we could trace, they mark 18 clauses on average to be modified, which is the same number of clauses in the initial theory.

Table 8.3: Comparison of BFORTE runtime and score with the PFORTE and BLP algorithms for selecting points to be modified.

          BFORTE                         PFORTE-like                  BLP-like
System/   Learning   Score   #Revised   Learning  Score   #Revised   Learning  Score   #Revised
Dataset   Time               clauses    Time              clauses    Time              clauses
UW-CSE    8061.07?•  0.5363  4.7        10517.49  0.5669  7.6        10734.23  0.5669  7.6
Metab     7184.15    -64.76  6.3        N/A       N/A     18         N/A       N/A     18
Carcino   8812.16?   61.30   3.8        8812.16   61.30   3.8        26922.49  61.76   8.0

The same situation as in UW-CSE happens here when comparing PFORTE and BLP: as this dataset yields a single large network, PFORTE marks every clause to be revised. BFORTE considers only 6.3 clauses on average to be revised, which is due to the smart selection of revision points conducted by the Bayes Ball algorithm. A different situation happens with Carcinogenesis. As the instances are not related in this dataset, there is a Bayesian network for each instance, yielding smaller individual networks than in the previous cases. In this way, BFORTE and PFORTE-like select the same clauses as revision points. This is also because the clauses producing ascendant nodes to the top-level instances are derived from clauses in the fixed background knowledge, and therefore cannot be modified. Additionally, the ground facts produced by them are considered as non-observed. Hence, even if they were modifiable, they would be non-observed parent nodes whose parents are observed, and therefore Bayes Ball would also consider them as revision points. Finally, we see that the runtime of BLP-like is much worse than that of the revision systems, since it always marks all clauses of the theory as revision points.

8.4.4 Inference time

Finally, we show the reduction in inference runtime due to the use of Bayes Ball and the grouping of clauses and networks. Note that all cases previously discussed take into account inference with Bayes Ball when learning or revising theories. Instead of running the whole revision process, we opted for running only the procedure that computes the discriminative score for each training set, considering the initial theories. In this way, we are able to see the improvement in inference runtime alone,

without taking into account the particularities of the revision process. Additionally, running the original inference of PFORTE and BLP makes the revision/learning process extremely slow. Tables 8.4, 8.5 and 8.6 show the runtime for computing accuracy in UW-CSE, Metabolism and Carcinogenesis, respectively, for five settings: BFORTE, with all the improvements presented in Section 8.3; BFORTE without collecting requisite nodes with Bayes Ball; BFORTE without detecting similar networks; BFORTE without grouping same rules; and inference performed as in the original PFORTE/BLP, i.e., without any of the algorithms presented in Section 8.3. Let us first focus on the UW-CSE dataset. Observe that the runtime is reduced by a factor of up to 170 with the full BFORTE setting. The largest reduction is due to Bayes Ball, although grouping similar nodes also contributes. On the other hand, detecting similar networks does not help in the reduction of the runtime. In two folds Matlab runs out of memory for PFORTE/BLP. The Metabolism dataset has results similar to UW-CSE, since both are highly relational datasets, but in this case we were able to collect the values for all training sets.

Table 8.4: Inference runtime in seconds for the UW-CSE dataset.
Training  BFORTE  BFORTE without  BFORTE without  BFORTE without  PFORTE/
set               Bayes Ball      similar nets    grouping rules  BLP
1         18.44   1247.72         20.28           74.62           N/A
2         4.16    89.36           4.40            15.34           443.62
3         4.56    92.80           4.74            23.92           693.98
4         22.86   1880.38         25.30           97.94           N/A
5         3.44    66.76           3.52            15.00           443.72

Table 8.5: Inference runtime in seconds for the Metabolism dataset.
Training  BFORTE  BFORTE without  BFORTE without  BFORTE without  PFORTE/
set               Bayes Ball      similar nets    grouping rules  BLP
1         10.98   677.62          676.90          23.30           1505.58
2         9.16    541.82          542.44          19.90           1103.22
3         13.02   827.56          826             29.92           1990.48
4         8.24    487.14          485.78          15.68           958.98
5         11.44   697.44          696.70          21.30           1330.40

The inference runtime of Carcinogenesis is not different from PFORTE and BLP. In fact, in most of the cases, the runtime is slightly worse, due to the cost of executing Bayes Ball. However, the fourth training set has a particular behavior when running BFORTE without grouping rules. In this case, one of the Bayesian clauses yields a large amount of ground clauses to be part of the final network. Without grouping similar rules, the final Bayesian network is too large for Matlab to handle.

Table 8.6: Inference runtime in seconds for Carcinogenesis dataset.

Training set    BFORTE    BFORTE without    BFORTE without    BFORTE without    PFORTE/BLP
                          Bayes Ball        similar nets      grouping rules
1                 5.24              4.36              4.12              6.66          5.08
2                 5.34              4.44              4.30              7.50          5.78
3                 4.64              3.92              3.70              6.04          4.68
4                 9.70              8.96              8.64               N/A           N/A
5                 4.48              3.76              3.46              9.48          6.96

8.5 Conclusions

We addressed in this chapter the bottlenecks of Bayesian Logic Program revision. We showed through experiments that it is possible to obtain a feasible revision system that provides more accurate models than first-order logic systems when the domain is uncertain. First we focused on reducing the search space of new literals. The baseline system, PFORTE, generates literals following FOIL's top-down approach: all possible literals from the language that had at least one variable in common with the current clause were considered for addition to the clause. We were able to reduce the search space of literals by defining it as the Bottom Clause of a misclassified instance. We showed in the experiments that the search space is reduced without harming the score achieved by the system. As a single misclassified instance may not carry sufficient information to produce good literals, in the future we would like to investigate the use of several instances to create the Bottom Clause. Additionally, we could also take into account the probabilistic information to create/remove literals from the Bottom Clause, for example by measuring the information that literals could pass to the clause (Cheng et al., 2002a; Oliphant and Shavlik, 2008; Pitangui and


Zaverucha, 2011). The work in (Mihalkova and Mooney, 2007) employs bottom-up learning, considering the network of examples to reduce the search space of possible refinements. We intend to investigate how such an approach could also be followed when refining Bayesian Logic Programs. There does not seem to be a direct way of doing that, since we construct the Bayesian networks from the ground clauses, instantiated by the examples, whereas Markov Logic Networks define the graph from the constants in the domain. Next, we addressed the search space of clauses to be refined. The BLP learning algorithm starts from an initial set of Bayesian clauses and proposes modifications to each one. PFORTE considers all the clauses used to build the Bayesian network of an example where there was a misclassified instance. Both systems generate a large search space that can even become intractable. We showed that it is possible to reduce this search space by marking as revision points only the clauses that influence the probability distribution computed for the misclassified instance. We used the Bayes Ball algorithm to identify those clauses. Experiments suggested that the search space is indeed reduced by proposing refinements only to the clauses relevant to the misclassified instance. In the future, we would like to investigate in more detail whether it is possible to reduce the number of revision operators according to the revision point, as we started to analyze in this chapter. Additionally, searching for revision points guided by examples may not identify all possible places that could bring new influence to a misclassified instance. In the future, we intend to investigate whether it is possible to strike a good balance between efficiency and score when choosing revision points in promising places outside the set pointed out by the instance. Finally, we addressed the large inference search space due to the large Bayesian networks produced by PLL examples.
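The requisite-node identification mentioned above can be illustrated with a simplified Bayes Ball sketch (Shachter, 1998). The code below is an illustrative reconstruction, not BFORTE's actual implementation; the `parents` dictionary representation and the function name are assumptions made for the example:

```python
from collections import defaultdict, deque

def bayes_ball_requisite(parents, query, evidence):
    """Simplified Bayes Ball: return the nodes whose conditional
    distributions are requisite for computing P(query | evidence).

    parents:  dict mapping each node to the list of its parents.
    evidence: set of observed nodes.
    """
    children = defaultdict(set)
    for node, pars in parents.items():
        for p in pars:
            children[p].add(node)

    requisite = set()                      # nodes whose CPDs are needed
    visited = set()                        # (node, direction) pairs seen
    queue = deque([(query, "child")])      # the ball starts at the query
    while queue:
        node, came_from = queue.popleft()
        if (node, came_from) in visited:
            continue
        visited.add((node, came_from))
        if came_from == "child":
            if node not in evidence:
                requisite.add(node)
                for p in parents.get(node, ()):   # pass up to parents
                    queue.append((p, "child"))
                for c in children[node]:          # bounce down to children
                    queue.append((c, "parent"))
            # observed node blocks a ball arriving from a child
        else:  # ball arrives from a parent
            if node in evidence:                  # observed collider: bounce up
                requisite.add(node)
                for p in parents.get(node, ()):
                    queue.append((p, "child"))
            else:
                for c in children[node]:          # pass through to children
                    queue.append((c, "parent"))
    return requisite
```

For instance, in the chain a -> b -> c with b observed, only c's distribution is requisite for P(c | b), so the clauses that produced a and b would not be marked as revision points.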
First, we developed a procedure to consider only the requisite nodes identified by the Bayes Ball algorithm when performing inference. Second, we reduced the size of the network by identifying ground clauses whose only difference is the terms replacing variables. We argue that it is not necessary to represent them as separate nodes in the network, which would force the inference procedure to repeat several probability computations. Third, we identified that after selecting the requisite nodes, there are "subnetworks" with

exactly the same structure and evidence. We avoid computing the probabilities for all those networks by taking into account only one representative of the group. Experiments showed that the inference runtime is in fact greatly reduced by the algorithms we proposed. Note that we have only taken a simple step in the direction of recent work on lifted inference (Poole, 2003; Salvo Braz et al., 2005; Singla and Domingos, 2008; Milch et al., 2008; Kok and Domingos, 2009; Nath and Domingos, 2010; Meert et al., 2010). There is much more to be done and investigated to fully exploit the advantages of lifted inference. For example, the ground clauses that are grouped together could represent more than one Bayesian clause, if the information they carry is the same. Through preliminary experiments, we noticed that the revision greatly depends on the initial theories. The Aleph system may not be appropriate for learning initial theories that give rise to BLPs. Indeed, the authors of BLP argued that the Claudien system (De Raedt and Dehaspe, 1997), which learns from interpretations, is more adequate for producing most general clauses. In the future we intend to create several different theories, from different first-order systems, and compare the BLPs revised from each of them. In the experiments reported here, the threshold for marking an instance as misclassified was set at 0.5, the value usually adopted in binary domains. That value may not be the best for evaluating the models. Thus, we intend to create curves showing the behavior of the system with different thresholds. Also, an evaluation function that is independent of the threshold should be applied, such as one based on the area under the curve. Last, it is essential to compare BLPs to other SRL languages, such as Markov Logic Networks.
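The detection of identical subnetworks described in this section amounts to computing a canonical signature for each instance's requisite subnetwork (structure plus evidence) and running inference once per group. A minimal illustration follows; the function name and the signature scheme are assumptions, and a real implementation would also have to canonicalize variable renamings:

```python
def group_identical_networks(networks):
    """Group ground networks that share the same structure and evidence,
    so that inference needs to run only once per group.

    networks: dict mapping an instance id to a pair
              (edges, evidence), where edges is a list of (parent, child)
              pairs and evidence maps observed nodes to their values.
    """
    groups = {}
    for net_id, (edges, evidence) in networks.items():
        # Canonical signature: the edge set plus the evidence assignment.
        signature = (frozenset(edges), frozenset(evidence.items()))
        groups.setdefault(signature, []).append(net_id)
    return list(groups.values())
```

Two instances whose requisite subnetworks have the same edges and the same evidence fall into one group, so the probability computed for the group representative can be reused for the others.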
We conclude by observing that, as stochastic local search has been shown to be quite effective in the first-order revision case, we intend to implement in BFORTE the same SLS techniques considered in the YAVFORTE system.

Chapter 9

General Conclusions and Future Work

This chapter summarizes the achievements of the thesis, presents work in progress that we have been investigating and points out directions for future research.

9.1 Summary

The goal of this thesis was to contribute effective theory revision systems. Arguably, theory revision has been set aside so far due to the cost of searching for revisions in a large search space. With this thesis we have shown that it is possible to have a theory revision system running in feasible time and achieving better accuracies than learning-from-scratch approaches. In chapter 3 we contributed a revision system named YAVFORTE, built on top of the FORTE system (Richards and Mooney, 1995). YAVFORTE offers the user the choice of selecting revision operators, so that the number of possible modifications to be proposed to a theory is decreased. We empirically showed that there are several cases where a smaller set of revision operators can achieve the same accuracies as the full set of operators, in reduced runtime. Also, revision operators are employed from the simplest to the most complex, so that if a simpler operator is already able to reach the maximum potential of a revision point, it is not necessary to resort to more complex operators. Another important contribution implemented as part of YAVFORTE is the use of the Bottom Clause and mode declarations to reduce the search space of literals to be scored. This greatly reduced the runtime of the revision process compared to the original FORTE system. In chapter 4 we designed a challenging application of theory revision involving

the game of Chess. We developed a framework for acquiring the rules of variants of Chess, starting from the rules of traditional Chess. It was necessary to include further abduction strategies and also to handle negated literals in theories under revision. We showed that theory revision is able to yield theories describing variants of Chess, while a system unable to modify the initial theory fails. This work also showed that theory revision is able to handle transfer learning tasks where the predicates of the original and final models are the same. Although YAVFORTE is able to revise theories faster than an inductive system learns them in several cases, this is not always so. If the dataset has a large number of examples and/or a large background knowledge, and moreover the initial theory has a large number of clauses, the revision process may struggle. Thus, in chapter 6 we devised a series of stochastic local search algorithms, implemented on each key search of the revision process, so that good solutions, rather than optimal ones, may be found in reasonable time. The system built upon YAVFORTE and including SLS was named ASSERTE. By randomizing revision points, revision operators and literals to be added to or removed from a clause, we achieved faster learning time and equivalent accuracies compared to YAVFORTE. In the best case, an SLS algorithm executed 25x faster than the baseline revision process. Moreover, better accuracies and competitive runtime are also achieved compared to a standard inductive learning system. In chapter 8 we contributed a feasible Bayesian Logic Program revision system. We addressed the bottlenecks of our previous revision system PFORTE that made it impractical for real world datasets. First, following our algorithm FORTE MBC (Duboc et al., 2009), we defined the Bottom Clause to be the search space of new literals, whether a hill-climbing approach or the relational pathfinding algorithm is employed.
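YAVFORTE's simplest-first application of revision operators, recalled earlier in this summary, can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary representation of a revision point and the "potential" bookkeeping are assumptions, not the actual implementation:

```python
def revise_point(revision_point, operators, score):
    """Apply revision operators ordered from simplest to most complex,
    stopping as soon as one reaches the revision point's maximum
    potential (e.g. the score obtained if all the examples misclassified
    at that point were fixed)."""
    best_proposal, best_score = None, float("-inf")
    for operator in operators:            # assumed sorted by complexity
        proposal = operator(revision_point)
        proposal_score = score(proposal)
        if proposal_score > best_score:
            best_proposal, best_score = proposal, proposal_score
        if best_score >= revision_point["potential"]:
            break                         # a simpler operator already suffices
    return best_proposal, best_score
```

If, say, deleting an antecedent already reaches the point's potential, the more expensive operators (such as adding whole rules) are never scored, which is where the runtime saving comes from.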
Next, we addressed the selection of revision points by employing the Bayes Ball algorithm (Shachter, 1998), so that the nodes influencing a wrongly predicted instance are identified and the clauses that produced them are revised. Revision operators intend either to change the present influence or to bring new influence through those points, so that instances become correctly classified. Last, we developed a procedure to speed up the time spent by the revision process to perform inference. First, we required that only requisite nodes are taken into account when computing the probability of a node. The set of requisite nodes is also identified by the Bayes Ball algorithm. Second, we overlap nodes present in ground clauses of an instance that were instantiated from the same clause and have the same evidence. Finally, we identify sets of requisite nodes that are identical, in order to avoid performing repeated inference for different instances. We showed that all these optimizations reduce the runtime of the revision process, by comparing each one of them to the process followed by PFORTE and the BLP learning algorithm.

9.2 Future work: First-Order Logic Theory Revision

First-order revision runtime may be further reduced by using techniques developed in ILP. For example, the query transformation strategies devised in (Santos Costa et al., 2003b) may greatly reduce inference time by making clauses more efficient to evaluate. Also, there is a need for efficient scoring of hypotheses, especially in the presence of a large set of examples. This can be achieved by more efficient subsumption tests (Kuželka and Železný, 2008a; Kuželka and Železný, 2008b; Santos and Muggleton, 2011) and by incremental learning techniques (Lopes and Zaverucha, 2009). Concerning SLS algorithms, their runtime may vary greatly depending on the random choices taken along the revision process. To avoid being stuck for a long time in a search space when looking for a specific modification to the theory, strategies that restart the search from a different point offer the opportunity to abandon a large and unproductive search and start over from a perhaps more promising seed. The strategy followed in (Železný et al., 2006) can be applied to the revision case, so that after a number of hypotheses are scored without success, the search procedure chooses another random point to restart from. Additionally, stochastic search may also be used to estimate the coverage of clauses (Sebag and Rouveirol, 1997). There is still a need for real and challenging applications that fit theory revision well. We tackled this issue by developing the Chess revision framework. Applications from grammar learning, natural language processing and biology, containing significant coded information, also seem to be good examples of problems for theory revision to handle.
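The combination of stochastic local moves and randomized restarts discussed in this section can be sketched generically. This is a sketch only: the walk probability, evaluation budget, number of tries and move generator are hypothetical parameters, not those of ASSERTE or of (Železný et al., 2006):

```python
import random

def sls_with_restarts(random_start, neighbours, score,
                      p_random=0.3, budget=50, tries=4, rng=None):
    """Stochastic local search with randomized restarts: each try starts
    from a random seed theory; after `budget` scored hypotheses the try
    is abandoned and the search restarts from another seed."""
    rng = rng or random.Random(0)
    best, best_score = None, float("-inf")
    for _ in range(tries):
        current = random_start(rng)
        for _ in range(budget):
            moves = neighbours(current)
            if not moves:
                break
            if rng.random() < p_random:
                current = rng.choice(moves)       # random-walk move
            else:
                current = max(moves, key=score)   # greedy move
            if score(current) > best_score:
                best, best_score = current, score(current)
    return best, best_score
```

On a toy landscape, maximizing score(x) = -|x - 7| over the integers with neighbours x-1 and x+1, the search converges to 7 from any seed; in revision, `neighbours` would enumerate the theories produced by applying revision operators at the randomized revision points.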

9.3 Future work and work in progress: Probabilistic Logic Revision

9.3.1 Revision of Bayesian Logic Programs

Inference performed in Bayesian networks may dominate the runtime of the revision process, since it is the most frequently executed task in the system. We showed in this thesis that inference runtime can be greatly reduced by avoiding repeated computations and by reducing the size of the network. It is necessary to further explore the sophisticated techniques developed in the field of lifted inference in probabilistic logical models (Poole, 2003; Kok and Domingos, 2009; Kersting et al., 2010), so that inference runtime, and hence the runtime of the revision process, is reduced even more. Moreover, similarly to the first-order case, stochastic local search may prove very useful in reducing the runtime of the probabilistic revision process. In this thesis we have addressed revision from a discriminative point of view: instances in examples are classified and, in case there are wrongly predicted instances, the Bayesian clauses relevant to the misclassification are revised. The question that arises is whether the revision can be performed from a generative approach. In that case, the need for revision would be indicated by, for example, a low generative score. Examples would not be separated into query variables and evidence; instead they could all be at the same level, as is done in the learning from interpretations setting.

9.3.2 Revision of Stochastic Logic Programs

Bayesian Logic Programs encode probability distributions over possible worlds by associating probability distributions with the possible values of atoms. Stochastic Logic Programs (Muggleton, 2002), on the other hand, follow a distributional semantics based on domain frequency (Halpern, 1989), associating probability distributions with ground facts through stochastic proof trees. An SLP is composed of a set of stochastic clauses of the form p : c, where c is a definite range-restricted clause

238 9.3. FUTURE WORK AND WORK IN PROGRESS: PROBABILISTIC LOGIC REVISION and p is a probability label. Summing out probability labels of all clauses with the same predicate in the head must produce the value 1. SLPs combine definite logic programs with Probabilistic Context Free Grammar, by generalizing this last one. We have designed a system named SCULPTOR for revising Stochastic Logic Programs. Examples in SCULPTOR have a probability label as in (Chen et al., 2008) and this label is used to compute root mean squared error (RMSE). Examples with high RMSE are chosen to indicate the clauses that should be revised. Revision operators include revision of the probability labels, deletion and addition of rules. New clauses are created by unfolding existing clauses (Sato, 1992), guided by a top theory (Muggleton et al., 2008). A first prototype of SCULPTOR has already been implemented, but it needs to be tested and experimentally evaluated. We want to apply SCULPTOR on the challenge application of revising strategies of games. In closing, I would like to briefly cite the main achievements of this thesis, which are: (1) to make the the Theory Revision from Examples as efficient as ILP; (2) to successfully apply Theory Revision from Examples in a Chess application, where standard ILP fails; and (3) to improve a Bayesian Logic Programs revision system we had previously developed so that it could be applied to real world pro- blems. I believe that the revision of (probabilistic) first-order logical theories has the potential to handle complex real world applications and reach better results than simpler machine learning thecniques. With this thesis, we showed that this can be achieved in feasible time.

Bibliography

Adé, H., Malfait, B., and De Raedt, L. (1994). RUTH: an ILP Theory Revision System. In 8th International Symposium on Methodologies for Intelligent Systems (ISMIS-94), volume 869 of LNCS, pages 336–345. Springer.

Allen, T. and Greiner, R. (2000). Model Selection Criteria for Learning Belief Nets: an Empirical Comparison. In Proceedings of the 17th International Conference on Machine Learning (ICML-00), pages 1047–1054.

Alphonse, É. and Rouveirol, C. (2000). Lazy Propositionalisation for Relational Learning. In Proceedings of the 14th European Conference on Artificial Intelligence (ECAI-00), pages 256–260.

Angluin, D., Frazier, M., and Pitt, L. (1992). Learning Conjunctions of Horn Clauses. Machine Learning, 9:147–164.

Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York.

Baffes, P. (1994). Automatic Student Modeling and Bug Library Construction using Theory Refinement. PhD thesis, University of Texas, Austin, TX.

Baffes, P. and Mooney, R. (1993). Symbolic Revision of Theories with M-of-N Rules. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1135–1140, Chambery, France.

Baião, F., Mattoso, M., Shavlik, J., and Zaverucha, G. (2003). Applying Theory Revision to the Design of Distributed Databases. In Proceedings of the 13th


International Conference on Inductive Logic Programming (ILP-03), volume 2835 of LNAI, pages 57–74. Springer.

Bain, M. (2004). Predicate Invention and the Revision of First-Order Concept Lattice. In 2nd International Conference on Formal Concept Analysis (ICFCA 2004), volume 2961 of LNAI, pages 329–336. Springer.

Bain, M. and Muggleton, S. (1994). Learning Optimal Chess strategies. Machine Intelligence, 13:291–309.

Battle, A., Segal, E., and Koller, D. (2004). Probabilistic Discovery of Overlapping Cellular Processes and their Regulation. In RECOMB, pages 167–176.

Binder, J., Koller, D., Russell, S. J., and Kanazawa, K. (1997). Adaptive Probabilistic Networks with Hidden Variables. Machine Learning, 29(2-3):213–244.

Blockeel, H. and De Raedt, L. (1998). Top-Down Induction of First-Order Logical Decision Trees. Artificial Intelligence, 101(1-2):285–297.

Booth, J. G. and Hobert, J. P. (1999). Maximizing Generalized Linear Mixed Model Likelihoods with an Automated Monte Carlo EM Algorithm. Journal of the Royal Statistical Society, Series B, 61:265–285.

Bratko, I. (1999). Refining Complete Hypotheses in ILP. In Proceedings of the 9th International Workshop on Inductive Logic Programming (ILP-99), volume 1634 of LNAI, pages 44–55. Springer.

Bratko, I. and King, R. D. (1994). Applications of Inductive Logic Programming. SIGART Bulletin, 5(1):43–49.

Bunescu, R. C. and Mooney, R. J. (2004). Collective Information Extraction with Relational Markov Networks. In ACL, pages 438–445.

Buntine, W. (1991). Theory Refinement on Bayesian Networks. In Proceedings of the 7th Annual Conference on Uncertainty in Artificial Intelligence (UAI-91), pages 52–60, San Mateo, CA.


Burg, D. B. and Just, T. (1987). U.S. Chess Federation Official Rules of Chess. McKay, New York, USA.

Caffo, B. S., Jank, W. S., and Jones, G. L. (2005). Ascent-Based Monte Carlo EM. Journal of the Royal Statistical Society, Series B, 67:235 – 252.

Cai, D. M., Gokhale, M., and Theiler, J. (2007). Comparison of Feature Selection and Classification Algorithms in Identifying Malicious Executables. Computa- tional Statistics and Data Analysis, 51(6).

Campos, L. M. (2006). A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests. Journal of Machine Learning Research, 7:2149–2187.

Caruana, R. (1997). Multitask Learning. Machine Learning, 28(1):41–75.

Celeux, G., Chauveau, D., and Diebolt, J. On Stochastic Versions of the EM Algorithm. Technical Report RR-2514.

Celeux, G. and Diebolt, J. (1985). The SEM algorithm: a Probabilistic Teacher Algorithm Derived from the EM Algorithm for the Mixture Problem. Comput. Statist. Quater., 2:73–82.

Cerny, V. (1985). A Thermodynamical Approach to the Traveling Salesman Problem. Journal of Optimization Theory and Applications, 45(1):41–51.

Chan, D. (1988). Constructive Negation Based on the Completed Database. In Proceedings of the 5th International Conference and Symposium on Logic Programming, pages 111–125. The MIT Press.

Chen, J., Muggleton, S., and Santos, J. C. A. (2008). Learning Probabilistic Logic Models from Probabilistic Examples. Machine Learning, 73(1):55–85.

Cheng, J., Greiner, R., Kelly, J., Bell, D. A., and Liu, W. (2002a). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137(1-2):43–90.


Cheng, J., Hatzis, C., Hayashi, H., Krogel, M.-A., Morishita, S., Page, D., and Sese, J. (2002b). KDD Cup 2001 report. SIGKDD Explorations, 3(2):47–64.

Chickering, D. M. (2002). Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research, 3:507–554.

Chickering, M. and Heckerman, D. (1997). Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables. Machine Learning, 29:181–212.

Chisholm, M. and Tadepalli, P. (2002). Learning Decision Rules by Randomized Iterative Local Search. In Proceedings of the 19th International Conference on Machine Learning (ICML-02), pages 75–82.

Clark, K. (1978). Negation as Failure Rule. In Gallaire, H. and Minker, J., editors, Logic and Data Bases, pages 293–322. Plenum Press.

Cohen, W. (1992). Compiling Prior Knowledge into an Explicit Bias. In Proceedings of the Ninth International Conference on Machine Learning (ICML-92), pages 102–110, Aberdeen, Scotland.

Cohen, W. (1995). Fast Effective Rule Induction. In Proceedings of 12th International Conference on Machine Learning (ICML-95), pages 115–123. Morgan Kaufmann.

Collet, P. and Rennard, J.-P. (2007). Stochastic Optimization Algorithms. CoRR, abs/0704.3780.

Cooper, G. F. (1990). The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks. Artificial Intelligence, 42(2-3):393–405.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. J. (1999). Probabilistic Networks and Expert Systems. Springer, New York.

Cussens, J. (2001). Parameter Estimation in Stochastic Logic Programs. Machine Learning, 44(3):245–271.


Dagum, P. and Luby, M. (1993). Approximating Probabilistic Inference in Bayesian Belief Networks is NP-Hard. Artificial Intelligence, 60(1):141–153.

Darwiche, A. (2010). Bayesian networks. Communications of the ACM, 53(12):80– 90.

Heckerman, D., Meek, C., and Koller, D. (2004). Probabilistic Models for Relational Data. Technical Report MSR-TR-2004-30, Microsoft Research.

Davis, J., Burnside, E. S., de Castro Dutra, I., Page, D., Ramakrishnan, R., Santos Costa, V., and Shavlik, J. W. (2005a). View Learning for Statistical Relational Learning: With an Application to Mammography. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 677–683.

Davis, J., Burnside, E. S., de Castro Dutra, I., Page, D., and Santos Costa, V. (2005b). An Integrated Approach to Learning Bayesian Networks of Rules. In Proceedings of the 16th European Conference on Machine Learning (ECML-05), volume 3720 of LNAI, pages 84–95. Springer.

Davis, J. and Goadrich, M. (2006). The Relationship between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML-06), pages 233–240.

Davis, J., Ong, I. M., Struyf, J., Burnside, E. S., Page, D., and Santos Costa, V. (2007). Change of Representation for Statistical Relational Learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2719–2726.

De Raedt, L. (1996). Advances in Inductive Logic Programming. IOS Press.

De Raedt, L. (1997). Logical Settings for Concept-Learning. Artificial Intelligence, 95(1):187–201.

De Raedt, L. (2008). Logical and Relational Learning. Springer, Berlin, German.


De Raedt, L. and Bruynooghe, M. (1993). A Theory of Clausal Discovery. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1058–1063.

De Raedt, L. and Dehaspe, L. (1997). Clausal Discovery. Machine Learning, 26(2-3):99–146.

De Raedt, L. and Džeroski, S. (1994). First-Order jk-Clausal Theories are PAC-Learnable. Artificial Intelligence, 70(1-2):375–392.

De Raedt, L., Frasconi, P., Kersting, K., and Muggleton, S., editors (2008a). Probabilistic Inductive Logic Programming – Theory and Applications, volume 4911 of LNAI. Springer.

De Raedt, L. and Kersting, K. (2004). Probabilistic Inductive Logic Programming. In Proceedings of the 15th International Conference on Algorithmic Learning Theory (ALT-2004), Invited paper in S. Ben-David, J. Case and A. Maruoka, editors, pages 19–36.

De Raedt, L., Kersting, K., Kimmig, A., Revoredo, K., and Toivonen, H. (2008b). Compressing Probabilistic Prolog Programs. Machine Learning, 70(2-3):151–168.

De Raedt, L., Kersting, K., and Torge, S. (2005). Towards Learning Stochastic Logic Programs from Proof-Banks. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-05), pages 752–757.

De Raedt, L., Kimmig, A., and Toivonen, H. (2007). ProbLog: A Probabilistic Prolog and Its Application in Link Discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 2462–2467.

DeGroot, M. H. (1989). Probability and Statistics. Addison-Wesley.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39(1):1–38.


Dietterich, T. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10:1895–1924.

Dietterich, T., Domingos, P., Getoor, L., Muggleton, S., and Tadepalli, P. (2008). Structured Machine learning: The Next Ten Years. Machine Learning, 73:3–23.

Díez, F. J. and Galán, S. F. (2002). Efficient Computation for the Noisy Max. International Journal of Intelligent Systems, 18:165–177.

Dimopoulos, Y. and Kakas, A. (1995). Learning Non-Monotonic Logic Programs: Learning Exceptions. In Proceedings of the 8th European Conference on Machine Learning (ECML-95), volume 912 of LNAI, pages 122–138. Springer.

Domingos, P. and Lowd, D. (2009). Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool.

Domingos, P. and Pazzani, M. J. (1997). On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 29(2-3):103–130.

Drabent, W. (1995). What is Failure? An Approach to Constructive Negation. Acta Inf., 32(1):27–29.

Drabent, W. (1996). Completeness of SLDNF-Resolution for Nonfloundering Queries. Journal of Logic Programming, 27(2):89–106.

Duboc, A. L. (2008). Tornando a Revisão de Teorias de Primeira-Ordem mais Eficiente Através da Utilização da Cláusula Mais Específica. Master's thesis, COPPE - UFRJ, Rio de Janeiro, RJ.

Duboc, A. L., Paes, A., and Zaverucha, G. (2008). Using the Bottom Clause and Modes Declarations on FOL Theory Revision from Examples. In Proceedings of the 18th International Conference on ILP (ILP-08), volume 5194 of LNAI, pages 91–106. Springer.


Duboc, A. L., Paes, A., and Zaverucha, G. (2009). Using the Bottom Clause and Modes Declarations on FOL Theory Revision from Examples. Machine Learning, 76(1):73–107.

Džeroski, S. and Bratko, I. (1992). Handling Noise in Inductive Logic Programming. In Proceedings of the 2nd International Workshop on Inductive Logic Programming.

Džeroski, S. and Lavrač, N., editors (2001). Relational Data Mining. Springer-Verlag.

Elidan, G. and Friedman, N. (2005). Learning Hidden Variable Networks: The Information Bottleneck Approach. Journal of Machine Learning Research, 6:81–127.

Elidan, G., Lotner, N., Friedman, N., and Koller, D. (2000). Discovering Hidden Variables: A Structure-Based Approach. In Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS-00), pages 479–485. MIT Press.

Esposito, F., Semeraro, G., Fanizzi, N., and Ferilli, S. (2000). Multistrategy Theory Revision: Induction and Abduction in INTHELEX. Machine Learning, 38:133–156.

Fang, H., Tong, W., Shi, L. M., Blair, R., Perkins, R., Branham, W., Hass, B. S., Xie, Q., Dial, S. L., Moland, C. L., and Sheehan, D. M. (2001). Structure-Activity Relationships for a Large Diverse Set of Natural, Synthetic, and Environmental Estrogens. Chemical Research in Toxicology, 3(14):280–294.

Feller, W. (1970). An Introduction to Probability Theory and Its Applications, vol- ume 1. John Wiley & Sons, third edition.

Fensel, D., Zickwolff, M., and Wiese, M. (1995). Are Substitutions the Better Examples? In Proceedings of the 5th International Workshop on Inductive Logic Programming (ILP-95), pages 120–127.

Feo, T. and Resende, M. (1995). Greedy Randomised Adaptive Search Procedures. Journal of Global Optimization, (6):109–133.


FIDE. FIDE Laws of Chess.

Fierens, D., Blockeel, H., Bruynooghe, M., and Ramon, J. (2005). Logical Bayesian Networks and their Relation to Other Probabilistic Logical Models. In 15th International Conference on Inductive Logic Programming, pages 121–135.

Flach, P. (1994). Simply Logical - Intelligent Reasoning by Example. John Wiley & Sons.

Flach, P. and Kakas, A. (2000a). Abduction and Induction: Essays on their Relation and Integration. Kluwer Academic Publishers.

Flach, P. and Kakas, A. (2000b). Abductive and Inductive Reasoning: Background and Issues. In Flach, P. A. and Kakas, A. C., editors, Abductive and Inductive Reasoning. Kluwer Academic Publishers.

Fogel, L. and Zaverucha, G. (1998). Normal Programs and Multiple Predicate Learning. In Proceedings of the 8th Int. Conf. on ILP, volume 1446 of LNAI, pages 175–184. Springer.

Frank, E. and Witten, I. H. (1998). Generating Accurate Rule Sets Without Global Optimization. In Proceedings of 15th International Conference on Machine Learning (ICML-98), pages 144–151. Morgan Kaufmann.

Frazier, M. and Pitt, L. (1993). Learning From Entailment: An Application to Propositional Horn Sentences. In Proceedings of the Tenth International Con- ference on Machine Learning (ICML-93), pages 120–127.

Friedman, N. (1998). The Bayesian Structural EM Algorithm. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129–138. Morgan Kaufmann.

Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian Network Classifiers. Machine Learning, 29:131–163.


Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. (1999). Learning Probabilistic Relational Models. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), pages 1300–1309.

Fürnkranz, J. (1996). Machine Learning in Computer Chess: The Next Generation. Int. Computer Chess Association Journal, 19(3):147–161.

Fürnkranz, J. (2007). Recent Advances in Machine Learning and Game Playing. ÖGAI-Journal, 26(2):147–161.

Garcez, A. and Zaverucha, G. (1999). The Connectionist Inductive Learning and Logic Programming System. Applied Intelligence, 11:59–77.

Geiger, D., Verma, T., and Pearl, J. (1989). d-Separation: From Theorems to Algorithms. In Proceedings of the Fifth Annual Conference on Uncertainty in Artificial Intelligence (UAI-89), pages 139–148. North-Holland.

Geiger, D., Verma, T., and Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20:507–534.

Geman, S. and Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741.

Getoor, L. (2007). An Introduction to Statistical Relational Learning. In Tutorial Notes of the 18th European Conference on Machine Learning and the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases.

Getoor, L., Friedman, N., Koller, D., and Taskar, B. (2001). Learning Probabilistic Models of Relational Structure. In Proceedings of the 18th International Conference on Machine Learning (ICML-01), pages 170–177.

Getoor, L. and Grant, J. (2006). PRL: A probabilistic Relational Language. Machine Learning, 62(1-2):7–31.

Getoor, L., Rhee, J. T., Koller, D., and Small, P. (2004). Understanding Tuberculosis Epidemiology Using Structured Statistical Models. Artificial Intelligence in Medicine, 30(3):233–256.

Getoor, L. and Taskar, B., editors (2007). An Introduction to Statistical Relational Learning. MIT Press.

Gilks, W., Richardson, S., and Spiegelhalter, D. (1995). Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics. Chapman & Hall/CRC.

Glover, F. (1989). Tabu Search - Part I. INFORMS Journal on Computing, 1(3):190–206.

Glover, F. (1990). Tabu Search - Part II. INFORMS Journal on Computing, 2(1):4–32.

Goadrich, M., Oliphant, L., and Shavlik, J. W. (2004). Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction. In Proceedings of the 14th International Conference on ILP (ILP-04), volume 3194 of LNAI, pages 98–115. Springer.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.

Gomes, C. P., Selman, B., Crato, N., and Kautz, H. (2000). Heavy-Tailed Phenomena in Satisfiability and Constraint Satisfaction Problems. Journal of Automated Reasoning, 24(1-2):67–100.

Gomes, C. P., Selman, B., McAloon, K., and Tretkoff, C. (1998). Randomization in Backtrack Search: Exploiting Heavy-Tailed Profiles for Solving Hard Scheduling Problems. In AIPS, pages 208–213.

Goodacre, J. (1996). Inductive Learning of Chess Rules Using Progol. Master's thesis, Programming Research Group, Oxford University.

Greiner, R. and Zhou, W. (2002). Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers. In Proceedings of the 18th Annual National Conference on Artificial Intelligence (AAAI-02), pages 167–173.

Grossman, D. and Domingos, P. (2004). Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood. In Proceedings of the 21st International Conference on Machine Learning (ICML-04), pages 361–368.

Gu, J. (1992). Efficient Local Search for Very Large-scale Satisfiability Problems. SIGART Bulletin, 3(1):8–12.

Gyftodimos, E. and Flach, P. A. (2004). Hierarchical Bayesian Networks: An Approach to Classification and Learning for Structured Data. In Methods and Applications of Artificial Intelligence, 3rd Hellenic Conference on AI (SETN 2004), volume 3025 of LNAI, pages 291–300. Springer.

Haddawy, P. (1999). An Overview of Some Recent Developments on Bayesian Problem Solving Techniques. AI Magazine - Special issue on Uncertainty in AI, 20(2):11–29.

Hall, M. A. (2000). Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning. In Proceedings of the 17th International Conference on Machine Learning (ICML-00), pages 359–366.

Hall, M. A. and Smith, L. A. (1999). Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper. In Proceedings of the 12th International Florida Artificial Intelligence Research Society Conference, pages 235–239.

Halpern, J. Y. (1989). An Analysis of First-Order Logics of Probability. Artificial Intelligence, 46:311–350.

Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Technical Report MSR-TR-95-06, Microsoft Research, http://research.microsoft.com/~heckerman/.

Heckerman, D. and Breese, J. (1994). Causal Independence for Probability Assessment and Inference using Bayesian Networks. Technical Report MSR-TR-94-08, Microsoft Research, http://research.microsoft.com/~heckerman/.

Hilden, J. (1984). Statistical Diagnosis Based on Conditional Independence Does Not Require It. Comput. Biol. Med., 14(4):429–435.

Hirst, J. D., King, R. D., and Sternberg, M. J. E. (1994a). Quantitative Structure-Activity Relationships by Neural Networks and Inductive Logic Programming. I. The Inhibition of Dihydrofolate Reductase by Pyrimidines. Journal of Computer-Aided Molecular Design, 8(4):405–420.

Hirst, J. D., King, R. D., and Sternberg, M. J. E. (1994b). Quantitative Structure-Activity Relationships by Neural Networks and Inductive Logic Programming. II. The Inhibition of Dihydrofolate Reductase by Triazines. Journal of Computer-Aided Molecular Design, 8(4):421–432.

Hooper, D. and Whyld, K. (1992). The Oxford Companion to Chess. Oxford Uni- versity Press.

Hoos, H. H. and Stützle, T. (2005). Stochastic Local Search: Foundations and Applications. Elsevier, California, USA, 1st edition.

Hutter, F., Hoos, H. H., and Stützle, T. (2005). Efficient Stochastic Local Search for MPE Solving. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 169–174.

Huynh, T. N. and Mooney, R. J. (2008). Discriminative Structure and Parameter Learning for Markov Logic Networks. In Proceedings of the 25th International Conference on Machine Learning (ICML-08), volume 307 of ACM International Conference Proceeding Series, pages 416–423. ACM.

Inoue, K. (2001). Induction, Abduction, and Consequence-Finding. In Proceedings of the 11th International Conference on Inductive Logic Programming, volume 2157 of LNCS, pages 65–79. Springer.

Jaeger, M. (1997). Relational Bayesian networks. In Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), pages 266–273. Morgan Kaufmann.

Jaimovich, A., Elidan, G., Margalit, H., and Friedman, N. (2005). Towards an Integrated Protein-Protein Interaction Network. In RECOMB, pages 14–30.

Jank, W. (2006). The EM Algorithm, Its Randomized Implementation and Global Optimization: Some Challenges and Opportunities for Operations Research. In Alt, F. B., Fu, M. C., and Golden, B. L., editors, Perspectives in Operations Research, pages 367–392. Springer.

Jensen, D. and Neville, J. (2002). Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML-02), pages 259–266. Morgan Kaufmann.

Jensen, D., Neville, J., and Gallagher, B. (2004). Why Collective Inference Improves Relational Classification. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 593–598. ACM.

Jensen, F. V. (1996). An introduction to Bayesian networks. Springer, New York.

Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. Springer-Verlag.

John, G. H., Kohavi, R., and Pfleger, K. (1994). Irrelevant Features and the Subset Selection Problem. In Proceedings of the 11th International Conference on Machine Learning (ICML-94), pages 121–129.

Joshi, S., Ramakrishnan, G., and Srinivasan, A. (2008). Feature Construction Using Theory-Guided Sampling and Randomised Search. In Proceedings of the 18th International Conference on ILP, pages 140–157.

Kuželka, O. and Železný, F. (2007). A Restart Strategy for Fast Subsumption Check and Coverage Estimation. In Proceedings of the 6th Workshop on Multi-Relational Data Mining (MRDM 2007).

Kadupitige, S. R., Julia, K. C. L., Sellmeier, Sivieng, J., Catchpoole, D. R., Bain, M., and Gaeta, B. A. (2009). MINER: Exploratory Analysis of Gene Interaction Networks by Machine Learning from Expression Data. BMC Genomics, 10(Suppl 3):S17.

Kautz, H., Selman, B., and Jiang, Y. (1996). A General Stochastic Approach to Solving Problems with Hard and Soft Constraints. In Du, D., Gu, J., and Pardalos, P. M., editors, The Satisfiability Problem: Theory and Applications (DIMACS Workshop March 11-13, 1996).

Kearns, M. J. and Mansour, Y. (1998). Exact Inference of Hidden Structure from Sample Data in noisy-OR Networks. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 304–310. Morgan Kaufmann.

Kearns, M. J. and Vazirani, U. V. (1994). An Introduction to Computational Learning Theory. MIT Press, Cambridge, Massachusetts.

Kersting, K. (2006). An Inductive Logic Programming Approach to Statistical Relational Learning, volume 148 of Frontiers in Artificial Intelligence and Applications. IOS Press.

Kersting, K. and De Raedt, L. (2001a). Adaptive Bayesian Logic Programs. In Proceedings of the 11th Conference on Inductive Logic Programming (ILP-01), volume 2157 of LNAI, pages 104–117. Springer.

Kersting, K. and De Raedt, L. (2001b). Bayesian Logic Programs. CoRR, cs.AI/0111058.

Kersting, K. and De Raedt, L. (2001c). Bayesian Logic Programs. Technical Report 151, University of Freiburg, Institute for Computer Science, http://www.informatik.uni-freiburg.de/~kersting/publications.html.

Kersting, K. and De Raedt, L. (2001d). Towards Combining Inductive Logic Programming with Bayesian Networks. In Proceedings of the 11th Conference on Inductive Logic Programming (ILP-01), volume 2157 of LNAI, pages 118–131. Springer.

Kersting, K. and De Raedt, L. (2002). Basic Principles of Learning Bayesian Logic Programs. Technical Report 174, University of Freiburg, Institute for Computer Science, http://www.informatik.uni-freiburg.de/~kersting/publications.html.

Kersting, K. and De Raedt, L. (2007). Bayesian logic programming: Theory and tool. In Getoor, L. and Taskar, B., editors, An Introduction to Statistical Rela- tional Learning, pages 291–322.

Kersting, K., De Raedt, L., and Raiko, T. (2006). Logical Hidden Markov Models. Journal of Artificial Intelligence Research (JAIR), 25:425–456.

Kersting, K., Massaoudi, Y. E., Hadiji, F., and Ahmadi, B. (2010). Informed Lifting for Message-Passing. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). AAAI Press.

Kersting, K., Raiko, T., Kramer, S., and De Raedt, L. (2003). Towards Discovering Structural Signatures of Protein Folds Based on Logical Hidden Markov Models. In Pacific Symposium on Biocomputing, pages 192–203.

Kimmig, A. (2010). A Probabilistic Prolog and its Applications. PhD thesis, Katholieke Universiteit Leuven, Belgium.

Kimmig, A., Santos Costa, V., Rocha, R., Demoen, B., and De Raedt, L. (2008). On the Efficient Execution of ProbLog Programs. In 24th International Conference on Logic Programming (ICLP-08), volume 5366 of LNCS, pages 175–189. Springer.

King, R. D., Muggleton, S., and Sternberg, M. (1992). Drug Design by Machine Learning: The Use of Inductive Logic Programming to Model the Structure-Activity Relationships of Trimethoprim Analogues Binding to Dihydrofolate Reductase. Proceedings of the National Academy of Sciences, 89(23):11322–11326.

King, R. D., Sternberg, M. J. E., and Srinivasan, A. (1995a). Relating Chemical Activity to Structure: An Examination of ILP Successes. New Generation Computing, 13(3-4):411–433.

King, R. D., Sternberg, M. J. E., and Srinivasan, A. (1995b). Relating Chemical Activity to Structure: An Examination of ILP successes. New Generation Computing, 13(3-4):411–433.

King, R. D., Whelan, K., Jones, F., Reiser, P., Bryant, C., Muggleton, S. H., Kell, D., and Oliver, S. (2004). Functional Genomic Hypothesis Generation and Experimentation by a Robot Scientist. Nature, 427:247–252.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by Simulated Annealing. Science, 220:671–680.

Kohavi, R. (1995). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pages 1137–1145.

Kok, S. and Domingos, P. (2005). Learning the Structure of Markov Logic Networks. In Proceedings of the 22nd International Conference on Machine Learning (ICML-05), volume 119 of ACM International Conference Proceeding Series, pages 441–448. ACM.

Kok, S. and Domingos, P. (2007). Statistical Predicate Invention. In Proceedings of the 24th International Conference on Machine Learning (ICML-07), pages 433–440. ACM Press.

Kok, S. and Domingos, P. (2009). Learning Markov Logic Network Structure via Hypergraph Lifting. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML-09), volume 382 of ACM International Conference Proceeding Series, page 64. ACM.

Koller, D. (1999). Probabilistic Relational Models. In Proceedings of the 9th International Conference on Inductive Logic Programming (ILP-99), pages 3–13.

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Koller, D. and Pfeffer, A. (1997). Learning Probabilities for Noisy First-Order Rules. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI-97), pages 1316–1323.

Koller, D. and Pfeffer, A. (1998). Probabilistic Frame-Based Systems. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), pages 580–587.

Koller, D. and Sahami, M. (1996). Toward Optimal Feature Selection. In Proceedings of the 13th International Conference on Machine Learning (ICML-96), pages 284–292.

Koppel, M., Feldman, R., and Segre, A. M. (1994). Bias-Driven Revision of Logical Domain Theories. Journal of Artificial Intelligence Research, 1:159–208.

Kovacic, M., Lavrac, N., Grobelnik, M., Zupanic, D., and Mladenic, D. (1992). Stochastic Search in Inductive Logic Programming. In Proceedings of the European Conference on Artificial Intelligence (ECAI-92), pages 444–445.

Kramer, S. (1995). Predicate Invention: A Comprehensive View. Technical Report FAI-TR-95-32, Austrian Research Institute for Artificial Intelligence.

Krogel, M.-A., Rawles, S., Železný, F., Flach, P. A., Lavrac, N., and Wrobel, S. (2003). Comparative Evaluation of Approaches to Propositionalization. In Proceedings of the 13th ILP, volume 2835 of LNAI, pages 197–214. Springer.

Kuželka, O. and Železný, F. (2008a). A Restarted Strategy for Efficient Subsumption Testing. Fundam. Inform., 89(1):95–109.

Kuželka, O. and Železný, F. (2008b). Fast Estimation of First-Order Clause Coverage Through Randomization and Maximum Likelihood. In Proceedings of the 25th International Conference on Machine Learning (ICML-08), volume 307 of ACM International Conference Proceeding Series, pages 504–511. ACM.

Kuželka, O. and Železný, F. (2011). Seeing the World through Homomorphism: An Experimental Study on Reducibility of Examples. In Revised Papers of the 20th International Conference on Inductive Logic Programming (ILP-10), volume 6489 of LNAI, pages 138–145. Springer.

Laarhoven, P. J. V. and Aarts, E. H. L. (1987). Simulated Annealing: Theory and Applications. Reidel Publishers, Amsterdam.

Lachiche, N. and Flach, P. A. (2002). 1BC2: A True First-Order Bayesian Classifier. In Proceedings of the 12th International Conference on Inductive Logic Programming (ILP-02), volume 2583 of LNAI, pages 133–148. Springer.

Lam, W. and Bacchus, F. (1994). Learning Bayesian Belief Networks: an Approach Based on the MDL Principle. Computational Intelligence, 10(4):269–293.

Landwehr, N., Kersting, K., and De Raedt, L. (2007). nFOIL: Integrating Naïve Bayes and FOIL. Journal of Machine Learning Research, 8:481–507.

Landwehr, N., Passerini, A., De Raedt, L., and Frasconi, P. (2006). kFOIL: Learning Simple Relational Kernels. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06).

Langley, P. (1995). Elements of Machine Learning. Morgan Kaufmann.

Langley, P., Iba, W., and Thompson, K. (1992). An Analysis of Bayesian Classifiers. In Proceedings of the 10th Annual National Conference on Artificial Intelligence (AAAI-92), pages 223–228.

Lauritzen, S. L. (1995). The EM algorithm for graphical Association Models with Missing Data. Computational Statistics and Data Analysis, 19:191–201.

Lavrac, N. and Dzeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. Ellis Horwood, New York.

Lavrac, N. and Dzeroski, S. (2001). Relational Data Mining. Springer, New York.

Levy, D. N. L. and Newborn, M. (1990). How Computers Play Chess.

Li, Z. and D’Ambrosio, B. (1994). Efficient Inference in Bayes Networks as a Combinatorial Optimization Problem. International Journal of Approximate Reasoning, 11(1):55–81.

Lloyd, J. (1987). Foundations of Logic Programming. Springer, second edition.

Lopes, C. and Zaverucha, G. (2009). HTILDE: Scaling up Relational Decision Trees for Very Large Databases. In Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), pages 1475–1479. ACM.

Lourenço, H. R., Martin, O., and Stützle, T. (2002). Iterated Local Search. In Glover, F. and Kochenberger, G., editors, Handbook of Metaheuristics, pages 321–353. Kluwer Academic Publishers.

Marriott, K., Søndergaard, H., and Dart, P. (1990). A Characterization of Non-Floundering Logic Programs. In Proceedings of the 1990 North American Conference on Logic Programming, pages 661–680. MIT Press.

McCulloch, C. E. (1994). Maximum Likelihood Variance Components Estimation for Binary Data. Journal of the American Statistical Association, 89:330–335.

McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley Interscience, New York, 1st edition.

Meert, W., Taghipour, N., and Blockeel, H. (2010). First-Order Bayes-Ball. In Proceedings of ECML/PKDD 2010, Part II, volume 6322 of LNAI, pages 369–384. Springer.

Michalski, R. S. and Larson, J. (1977). Inductive Inference of VL decision rules. In Workshop in pattern-Directed Inference Systems, SIGART Newsletter, vol- ume 63, pages 38–44. ACM.

Mihalkova, L., Huynh, T., and Mooney, R. J. (2007). Mapping and Revising Markov Logic Networks for Transfer Learning. In Proceedings of the 22nd AAAI Con- ference on Artificial Intelligence, pages 608–614.

Mihalkova, L. and Mooney, R. J. (2007). Bottom-Up Learning of Markov Logic Network Structure. In Proceedings of the 24th International Conference on Machine Learning (ICML-07), volume 227 of ACM International Conference Proceeding Series, pages 625–632. ACM.

Milch, B., Marthi, B., Russell, S. J., Sontag, D., Ong, D. L., and Kolobov, A. (2005). BLOG: Probabilistic Models with Unknown Objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1352–1359.

Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., and Kaelbling, L. P. (2008). Lifted Probabilistic Inference with Counting Formulas. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence, (AAAI-08), pages 1062–1068. AAAI Press.

Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT Press, Cambridge, USA.

Mitchell, T. (1997). Machine Learning. McGraw-Hill, New York.

Mooney, R. (1997). Integrating Abduction and Induction in Machine Learning. In Workshop on Abduction and Induction in AI, IJCAI-97, pages 37–42.

Mooney, R. (2000). Integrating Abduction and Induction in Machine Learning. In Flach, P. A. and Kakas, A. C., editors, Abduction and Induction, pages 181–191. Kluwer Academic Publishers.

Moyle, S. (2003). Using Theory Completion to Learn a Robot Navigation Control Program. In Proceedings of the 12th International Conference on ILP, volume 2583 of LNAI, pages 182–197. Springer.

Muggleton, S. (1987). Duce, An Oracle-based Approach to Constructive Induction. In Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87), pages 287–292.

Muggleton, S. (1991). Inductive Logic Programming. New Generation Computing, 8(4):295–318.

Muggleton, S. (1992). Inductive Logic Programming. Academic Press, New York.

Muggleton, S. (1994). Predicate Invention and Utilisation. Journal of Experimental and Theoretical Artificial Intelligence, 6(1):127–130.

Muggleton, S. (1995). Inverse Entailment and Progol. New Generation Computing, 13(3&4):245–286.

Muggleton, S. (1996). Stochastic Logic Programs. In Raedt, L. D., editor, Advances in Inductive Logic Programming, pages 254–264. IOS Press.

Muggleton, S. (1999). Scientific Knowledge Discovery Using Inductive Logic Pro- gramming. Communications of the ACM, 42(11):42–46.

Muggleton, S. (2000). Learning Stochastic Logic Programs. Electron. Trans. Artif. Intell., 4(B):141–153.

Muggleton, S. (2002). Learning Structure and Parameters of Stochastic Logic Pro- grams. In Proceedings of the 12th International Conference on Inductive Logic Programming (ILP-02), volume 2583 of LNAI, pages 198–206. Springer.

Muggleton, S. (2005). Machine Learning for Systems Biology. In Proceedings of the 15th International Conference on Inductive Logic Programming (ILP-05), volume 3625 of Lecture Notes in Computer Science, pages 416–423. Springer.

Muggleton, S. and Bryant, C. H. (2000). Theory Completion Using Inverse Entail- ment. In Proceedings of the 10th International Conference on ILP, volume 1866 of LNAI, pages 130–146. Springer.

Muggleton, S. and Buntine, W. L. (1988). Machine Invention of First Order Predicates by Inverting Resolution. In Proceedings of the Fifth International Conference on Machine Learning, pages 339–352.

Muggleton, S. and De Raedt, L. (1994). Inductive Logic Programming: Theory and Methods. Journal of Logic Programming, 19/20:629–679.

Muggleton, S. and Feng, C. (1990). Efficient Induction of Logic Programs. In Proceedings of the 1st Conference on Algorithmic Learning Theory (ALT-90), pages 368–381.

Muggleton, S., Paes, A., Santos Costa, V., and Zaverucha, G. (2009a). Chess Revision: Acquiring the Rules of Chess Variants through FOL Theory Revision from Examples. In Proceedings of the IJCAI-09 Workshop on General Game Playing: General Intelligence in Game-Playing Agents (GIGA'09), pages 61–71.

Muggleton, S., Paes, A., Santos Costa, V., and Zaverucha, G. (2009b). Chess Re- vision: Acquiring the Rules of Chess Variants through FOL Theory Revision from Examples. In Proceedings of the 19th International Conference on ILP (ILP-09), volume 5989 of LNAI, pages 123–130. Springer.

Muggleton, S., Paes, A., Santos Costa, V., and Zaverucha, G. (2009c). Chess Revision: Acquiring the Rules of Chess Variants through FOL Theory Revision from Examples. In Workshop on Transfer Learning - NIPS 2009.

Muggleton, S. and Pahlavi, N. (2007). Stochastic Logic Programs: A Tutorial. In Getoor, L. and Taskar, B., editors, An Introduction to Statistical Relational Learning, pages 323–338.

Muggleton, S., Santos, J. C. A., and Tamaddoni-Nezhad, A. (2008). TopLog: ILP Using a Logic Program Declarative Bias. In Proceedings of the 24th International Conference on Logic Programming (ICLP-08), pages 687–692.

Muggleton, S., Santos, J. C. A., and Tamaddoni-Nezhad, A. (2010). ProGolem: A System Based on Relative Minimal Generalisation. In Proceedings of the 19th International Conference on Inductive Logic Programming (ILP-09), pages 131–148.

Muggleton, S. and Tamaddoni-Nezhad, A. (2008). QG/GA: a stochastic search for Progol. Machine Learning, 70(2-3):121–133.

Muggleton, S. H., King, R. D., and Sternberg, M. J. E. (1992). Protein Secondary Structure Prediction Using Logic-Based Machine Learning. Protein Engineer- ing, 5(7):647–657.

Murphy, K. (1997). Bayes Net Toolbox for Matlab. U.C. Berkeley.

Murphy, K. (2001). The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33.

Murphy, K. P., Weiss, Y., and Jordan, M. I. (1999). Loopy Belief Propagation for Approximate Inference: An Empirical Study. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 467–475. Morgan Kaufmann.

Nadeau, C. and Bengio, Y. (2003). Inference for the Generalization Error. Machine Learning, 53(3):239–281.

Natarajan, S., Tadepalli, P., Dietterich, T. G., and Fern, A. (2008). Learning First-Order Probabilistic Models with Combining Rules. Annals of Mathematics and Artificial Intelligence, 54(1-3):223–256.

Nath, A. and Domingos, P. (2010). Efficient Lifting for Online Probabilistic Inference. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10). AAAI Press.

Neumann, J., Schnörr, C., and Steidl, G. (2005). Combined SVM-Based Feature Selection and Classification. Machine Learning, 61(1-3):129–150.

Neville, J. and Jensen, D. (2004). Dependency Networks for Relational Data. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-04), pages 170–177.

Neville, J., Şimşek, Ö., Jensen, D., Komoroske, J., Palmer, K., and Goldberg, H. G. (2005). Using Relational Knowledge Discovery to Prevent Securities Fraud. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05), pages 449–458. ACM.

Ngo, L. and Haddawy, P. (1997). Answering Queries from Context-Sensitive Proba- bilistic Knowledge Bases. Theoretical Computer Science, 171(1-2):147–177.

Nielsen, S. (2000). The Stochastic EM Algorithm: Estimation and Asymptotic Results. Journal of the Bernoulli Society for Mathematical Statistics and Probability, 6(3):457–489.

Nienhuys-Cheng, S.-H. and De Wolf, R. (1997). Foundations of Inductive Logic Programming. Springer, Berlin.

Nilsson, U. and Maluszynski, J. (2000). Logic Programming and PROLOG. John Wiley and Sons, 2nd edition.

Oliphant, L. and Shavlik, J. W. (2008). Using Bayesian Networks to Direct Stochastic Search in Inductive Logic Programming. In Revised Selected Papers of the 17th International Conference on Inductive Logic Programming (ILP-07), volume 4894 of LNAI, pages 191–199. Springer.

Ong, I. M., Dutra, I. C., Page, D., and Santos Costa, V. (2005). Mode Directed Path Finding. In Proceedings of the 16th ECML, volume 3720, pages 673–681.

Ourston, D. and Mooney, R. J. (1994). Theory Refinement Combining Analytical and Empirical Methods. Artificial Intelligence, 66:311–344.

Paes, A., Revoredo, K., Zaverucha, G., and Santos Costa, V. (2005a). Further Experimental Results of Probabilistic First-order Revision of Theories from Examples. In Workshop on Multi-Relational Data Mining. SIGKDD.

Paes, A., Revoredo, K., Zaverucha, G., and Santos Costa, V. (2005b). Probabilistic First-Order Theory Revision from Examples. In Proceedings of the 15th International Conference on Inductive Logic Programming (ILP-05), volume 3625 of LNAI, pages 295–311. Springer.

Paes, A., Revoredo, K., Zaverucha, G., and Santos Costa, V. (2006a). PFORTE: Revising Probabilistic FOL Theories. In Proceedings of the 18th Brazilian AI Symposium (SBIA-06), volume 4140 of LNAI, pages 441–450. Springer.

Paes, A., Zaverucha, G., and Santos Costa, V. (2007a). Revisando Teorias Lógicas de Primeira-ordem a partir de Exemplos usando Busca Local Estocástica. In Anais do VI Encontro Nacional de Inteligência Artificial (ENIA-07), XXVII Congresso Brasileiro de Computação SBC 2007. SBC. Best paper award.

Paes, A., Zaverucha, G., and Santos Costa, V. (2007b). Revising First-order Logic Theories from Examples through Stochastic Local Search. In Proceedings of the 17th International Conference on ILP (ILP-07), volume 4894 of LNAI, pages 200–210. Springer.

Paes, A., Železný, F., Zaverucha, G., Page, D., and Srinivasan, A. (2006b). ILP through Propositionalization and Stochastic k-term DNF Learning. In Proceedings of the Revised Papers of the 16th International Conference on ILP (ILP-06), volume 4455 of LNAI, pages 379–393. Springer.

Paes, A. M. (2005). PFORTE - Revisão de Teorias Probabilísticas de Primeira-ordem através de Exemplos. Master's thesis, COPPE - UFRJ, Rio de Janeiro, RJ.

Page, D. and Srinivasan, A. (2003). ILP: A Short Look Back and a Longer Look Forward. Journal of Machine Learning Research, 4:415–430.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, CA.

Pearl, J. (1991). Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, second edition.

Pearl, J. (2009). Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition.

Pellet, J.-P. and Elisseeff, A. (2008). Using Markov Blankets for Causal Structure Learning. Journal of Machine Learning Research, 9:1295–1342.

Peot, M. A. and Shachter, R. D. (1991). Fusion and Propagation with Multiple Observations in Belief Networks. Artificial Intelligence, 48(3):299–318.

Pina, A. C. and Zaverucha, G. (2004). An Algorithmic Presentation of Accuracy Comparison of Classification Learning Algorithms. Technical report, Universidade Federal do Rio de Janeiro, COPPE/PESC, Rio de Janeiro, Brasil.

Pitangui, C. G. and Zaverucha, G. (2011). Inductive Logic Programming through Estimation of Distribution Algorithm. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC-2011), pages 54–61. IEEE.

Plotkin, G. D. (1971). Automatic Methods of Inductive Inference. PhD thesis, Edinburgh University.

Poole, D. (1993). Probabilistic Horn Abduction and Bayesian networks. Artificial Intelligence, 64(1):81–129.

Poole, D. (1997). The Independent Choice Logic for Modelling Multiple Agents Under Uncertainty. Artificial Intelligence, 94(1-2):7–56.

Poole, D. (2003). First-order Probabilistic Inference. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03), pages 985–991. Morgan Kaufmann.

Poole, D. (2008). The Independent Choice Logic and Beyond. In De Raedt, L., Frasconi, P., Kersting, K., and Muggleton, S., editors, Probabilistic Inductive Logic Programming – Theory and Applications, volume 4911 of LNAI, pages 222–243. Springer.

Pritchard, D. B. (2007). The Classified Encyclopedia of Chess Variants. John Beasley.

Provost, F. J., Fawcett, T., and Kohavi, R. (1998). The Case Against Accuracy Estimation for Comparing Induction Algorithms. In Proceedings of the 15th International Conference on Machine Learning (ICML-98), pages 445–453. Morgan Kaufmann.

Quinlan, J. (1990). Learning Logical Definitions from Relations. Machine Learning, 5:239–266.

Raghavan, S., Kovashka, A., and Mooney, R. (2010). Authorship Attribution Using Probabilistic Context-Free Grammars. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 38–42. Association for Computational Linguistics.

Ramachandran, S. and Mooney, R. (1998). Theory Refinement of Bayesian Networks with Hidden Variables. In Proceedings of the 15th International Conference on Machine Learning (ICML-98), pages 454–462.

Ray, O. (2009). Nonmonotonic Abductive Inductive Learning. Journal of Applied Logic, 7(3):329–340.

Ray, O., Broda, K., and Russo, A. (2004). A Hybrid Abductive Inductive Proof Procedure. Logic Journal of the IGPL, 12(5):371–397.

Revoredo, K. (2009). PFORTE PI: Revisão de Teorias Probabilísticas Relacionais com Invenção de Predicados. PhD thesis.

Revoredo, K., Paes, A., Zaverucha, G., and Santos Costa, V. (2006). Combining Predicate Invention and Revision of Probabilistic FOL Theories. In Short Paper Proceedings of the 16th International Conference on Inductive Logic Programming (ILP-06), pages 176–178.

Revoredo, K., Paes, A., Zaverucha, G., and Santos Costa, V. (2007). Combinando Invenção de Predicados e Revisão de Teorias de Primeira-Ordem Probabilísticas. In Anais do VI Encontro Nacional de Inteligência Artificial. XXVII Congresso Brasileiro de Computação SBC 2007. SBC.

Revoredo, K., Paes, A., Zaverucha, G., and Santos Costa, V. (2009). Revisando Redes Bayesianas através da Introdução de Variáveis não-observadas. In Anais do VII Encontro Nacional de Inteligência Artificial (ENIA-09). XXIX Congresso Brasileiro de Computação SBC 2009. SBC.

Revoredo, K. and Zaverucha, G. (2002). Revision of First-Order Bayesian Classifiers. In Proceedings of the 12th International Conference on ILP (ILP-02), pages 223–237.


Richards, B. L. and Mooney, R. J. (1992). Learning Relations by Pathfinding. In Proceedings of the 10th Annual National Conference on Artificial Intelligence (AAAI-92), pages 50–55.

Richards, B. L. and Mooney, R. J. (1995). Automated Refinement of First-Order Horn-Clause Domain Theories. Machine Learning, 19(2):95–131.

Richardson, M. and Domingos, P. (2006). Markov Logic Networks. Machine Learning, 62(1-2):107–136.

Ross, S. M. (1988). A First Course in Probability. Macmillan, third edition.

Rückert, U. and Kramer, S. (2003). Stochastic Local Search in k-Term DNF Learning. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 648–655.

Rückert, U. and Kramer, S. (2004). Towards Tight Bounds for Rule Learning. In Proceedings of the 21st International Conference on Machine Learning (ICML-04), volume 69. ACM.

Russell, S. and Norvig, P. (2010). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 3rd edition.

Sadikov, A. and Bratko, I. (2006). Learning Long-Term Chess Strategies from Databases. Machine Learning, 63(3):329–340.

Salvo Braz, R., Amir, E., and Roth, D. (2005). Lifted First-Order Probabilistic Inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1319–1325. Professional Book Center.

Salvo Braz, R., Amir, E., and Roth, D. (2006). MPE and Partial Inversion in Lifted Probabilistic Variable Elimination. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06). AAAI Press.

Sang, T., Beame, P., and Kautz, H. A. (2007). A Dynamic Approach for MPE and Weighted MAX-SAT. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pages 173–179.


Santos, J. and Muggleton, S. (2011). When Does It Pay Off to Use Sophisticated Entailment Engines in ILP? In Revised Papers of the 20th International Conference on Inductive Logic Programming (ILP-10), volume 6489 of LNAI, pages 214–221. Springer.

Santos Costa, V. (2008). The Life of a Logic Programming System. In Proceedings of the 24th International Conference on Logic Programming (ICLP-08), volume 5366 of LNCS, pages 1–6. Springer.

Santos Costa, V. and Paes, A. (2009). On the Relationship between PRISM and CLP(BN). In Proceedings of the International Workshop on Statistical Relational Learning (SRL-09).

Santos Costa, V., Page, D., Qazi, M., and Cussens, J. (2003a). CLPBN: Constraint Logic Programming for Probabilistic Knowledge. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03), pages 517–524.

Santos Costa, V., Srinivasan, A., Camacho, R., Blockeel, H., Demoen, B., Janssens, G., Struyf, J., Vandecasteele, H., and Laer, W. V. (2003b). Query Transformations for Improving the Efficiency of ILP Systems. Journal of Machine Learning Research, 4:465–491.

Sato, T. (1992). Equivalence-Preserving First Order Unfold/fold Transformation Systems. Theoretical Computer Science, 105:57–84.

Sato, T. and Kameya, Y. (1997). PRISM: A Language for Symbolic-Statistical Modeling. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI-97), pages 1330–1339.

Sato, T. and Kameya, Y. (2001). Parameter Learning of Logic Programs for Symbolic-Statistical Modeling. Journal of Artificial Intelligence Research, 15:391–454.

Sato, T., Kameya, Y., and Zhou, N.-F. (2005). Generative Modeling with Failure in PRISM. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 847–852.

Schwarz, G. (1978). Estimating the Dimension of a Model. Annals of Statistics, 6:461–464.

Sebag, M. and Rouveirol, C. (1997). Tractable Induction and Classification in First Order Logic Via Stochastic Matching. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI-97), pages 888–893.

Sebag, M. and Rouveirol, C. (2000). Any-time Relational Reasoning: Resource-bounded Induction and Deduction Through Stochastic Matching. Machine Learning, 38(1-2):41–62.

Selman, B., Kautz, H., and Cohen, B. (1994). Noise Strategies for Improving Local Search. In Proceedings of the 12th National Conference on Artificial Intelligence, pages 337–343. AAAI Press/The MIT Press.

Selman, B., Kautz, H., and McAllester, D. (1997). Ten Challenges in Propositional Reasoning and Search. In Proceedings of the 15th International Joint Conference on Artificial Intelligence, pages 50–54. Morgan Kaufmann Publishers.

Selman, B. and Kautz, H. A. (1993). Domain-Independent Extensions to GSAT: Solving Large Structured Satisfiability Problems. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 290–295.

Selman, B., Kautz, H. A., and Cohen, B. (1996). Local Search Strategies for Satisfiability Testing. In Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, volume 26 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 521–532.

Selman, B., Levesque, H., and Mitchell, D. (1992). A New Method for Solving Hard Satisfiability Problems. In Proceedings of the 10th Annual National Conference on Artificial Intelligence (AAAI-92), pages 440–446.

Serrurier, M. and Prade, H. (2008). Improving Inductive Logic Programming by Using Simulated Annealing. Information Sciences, 178(6):1423–1441.


Shachter, R. D. (1998). Bayes-Ball: The Rational Pastime for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 480–487. Morgan Kaufmann.

Shapiro, E. Y. (1981). The Model Inference System. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI-81), page 1064. William Kaufmann.

Shapiro, E. Y. (1983). Algorithmic Program Debugging. MIT Press.

Singla, P. and Domingos, P. (2005). Discriminative Training of Markov Logic Networks. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-05), pages 868–873.

Singla, P. and Domingos, P. (2006). Memory-Efficient Inference in Relational Domains. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06).

Singla, P. and Domingos, P. (2008). Lifted First-Order Belief Propagation. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), pages 1094–1099. AAAI Press.

Specia, L., Srinivasan, A., Joshi, S., Ramakrishnan, G., and das Graças Volpe Nunes, M. (2009). An Investigation into Feature Construction to Assist Word Sense Disambiguation. Machine Learning, 76(1):109–136.

Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., and Futcher, B. (1998). Comprehensive Identification of Cell Cycle-Regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Molecular Biology of the Cell, 9(12):3273–3297.

Spirtes, P., Glymour, C., and Scheines, R. (2001). Causation, Prediction, and Search. MIT Press, 2nd edition.

Srinivasan, A. (1999). A Study of Two Sampling Methods for Analysing Large Datasets with ILP. Data Mining and Knowledge Discovery, 3(1):95–123.


Srinivasan, A. (2000). A Study of Two Probabilistic Methods for Searching Large Spaces with ILP. Technical Report PRG-TR-16-00, Oxford University Computing Laboratory, Oxford.

Srinivasan, A. (2001a). Extracting Context-Sensitive Models in Inductive Logic Programming. Machine Learning, 44(3):301–324.

Srinivasan, A. (2001b). The Aleph Manual. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/aleph.html.

Srinivasan, A. and King, R. D. (2008). Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming. Journal of Machine Learning Research, 9:1475–1533.

Srinivasan, A., King, R. D., Muggleton, S., and Sternberg, M. J. E. (1997). Carcinogenesis Predictions Using ILP. In Proceedings of the 7th International Conference on Inductive Logic Programming (ILP-97), volume 1297 of Lecture Notes in Computer Science, pages 273–287. Springer.

Srinivasan, A., Muggleton, S., Sternberg, M. J. E., and King, R. D. (1996). Theories for Mutagenicity: A Study in First-Order and Feature-Based Induction. Artificial Intelligence, 85(1-2):277–299.

Srinivasan, A., Page, D., Camacho, R., and King, R. D. (2006). Quantitative Pharmacophore Models with Inductive Logic Programming. Machine Learning, 64(1-3):65–90.

Stahl, I. (1993). Predicate Invention in ILP - an Overview. In Machine Learning: ECML-93, European Conference on Machine Learning, volume 667 of LNCS, pages 313–322. Springer.

Sterling, L. and Shapiro, E. (1986). The Art of Prolog: Advanced Programming Techniques. The MIT Press.

Stone, M. (1977). An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion. Journal of the Royal Statistical Society, Series B, 39:44–47.


Tamaddoni-Nezhad, A., Chaleil, R., Kakas, A. C., and Muggleton, S. (2006). Application of Abductive ILP to Learning Metabolic Network Inhibition from Temporal Data. Machine Learning, 64(1-3):209–230.

Tamaddoni-Nezhad, A. and Muggleton, S. (2000). Searching the Subsumption Lattice by a Genetic Algorithm. In Proceedings of the 10th International Conference on ILP (ILP-00), pages 243–252.

Tang, L. R., Mooney, R. J., and Melville, P. (2003). Scaling Up ILP to Large Examples: Results on Link Discovery for Counter-Terrorism. In Proceedings of the KDD-2003 Workshop on Multi-Relational Data Mining (MRDM-2003), pages 107–121.

Taskar, B., Abbeel, P., and Koller, D. (2002). Discriminative Probabilistic Models for Relational Data. In Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence (UAI-02), pages 485–492.

Thrun, S. (1995). Is Learning the n-th Thing any Easier than Learning the First? In Advances in Neural Information Processing Systems 8, NIPS, pages 640–646. MIT Press.

Tian, J. (2000). A Branch-and-Bound Algorithm for MDL Learning Bayesian Networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI-00), pages 580–588. Morgan Kaufmann Publishers Inc.

Tian, J. and Pearl, J. (2002). On the Testable Implications of Causal Models with Hidden Variables. In Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence (UAI-02), pages 519–527. Morgan Kaufmann.

Titov, I. and Henderson, J. (2007). Incremental Bayesian Networks for Structure Prediction. In Proceedings of the 24th International Conference on Machine Learning (ICML-07), volume 227 of ACM International Conference Proceeding Series, pages 887–894. ACM.

Towell, G. and Shavlik, J. (1994). Knowledge-Based Artificial Neural Networks. Artificial Intelligence, 70(1–2):119–165.


Towell, G. G. and Shavlik, J. W. (1993). Extracting Refined Rules from Knowledge-Based Neural Networks. Machine Learning, 13:71–101.

Trefethen, N. (1998). Maxims about Numerical Mathematics, Computers, Science, and Life. SIAM News.

Tsamardinos, I., Brown, L. E., and Aliferis, C. F. (2006). The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1):31–78.

Valiant, L. G. (1984). A Theory of the Learnable. Communications of the ACM, 27(11):1134–1142.

Van Rijsbergen, C. J. (1979). Information Retrieval. Butterworths.

Vennekens, J., Verbaeten, S., and Bruynooghe, M. (2004). Logic Programs with Annotated Disjunctions. In Proceedings of the 20th International Conference on Logic Programming (ICLP-04), pages 431–445.

Verma, T. and Pearl, J. (1990). Equivalence and Synthesis of Causal Models. In Proceedings of the 6th Annual Conference on Uncertainty in Artificial Intelligence, pages 255–270. Elsevier.

Wang, J. and Domingos, P. (2008). Hybrid Markov Logic Networks. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), pages 1106–1111. AAAI Press.

Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo Implementation of the EM Algorithm and the Poor Man’s Data Augmentation Algorithms. Journal of the American Statistical Association, 85:699–704.

Weiss, Y. (2000). Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 12(1):1–41.

Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., and Vapnik, V. (2000). Feature Selection for SVMs. In Papers from Neural Information Processing Systems (NIPS-00), pages 668–674.


Wogulis, J. (1994). An Approach to Repairing and Evaluating First-Order Theories Containing Multiple Concepts and Negation. PhD thesis, University of California, Irvine, CA.

Wogulis, J. and Pazzani, M. (1993). A Methodology for Evaluating Theory Revision Systems: Results with Audrey II. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1128–1134.

Wrobel, S. (1993). On the Proper Definition of Minimality in Specialization and Theory Revision. In Machine Learning: ECML-93, European Conference on Machine Learning, volume 667 of LNCS, pages 65–82. Springer.

Wrobel, S. (1994). Concept Formation During Interactive Theory Revision. Machine Learning, 14:169–191.

Wrobel, S. (1996). First-Order Theory Refinement. In De Raedt, L., editor, Advances in Inductive Logic Programming, pages 14–33. IOS Press.

Yamamoto, A. (1997). Which Hypotheses Can Be Found with Inverse Entailment? In Proceedings of the 7th International Workshop on Inductive Logic Programming, volume 1297 of LNCS, pages 296–308. Springer.

Yedidia, J. S., Freeman, W. T., and Weiss, Y. (2000). Generalized Belief Propagation. In Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS-00), pages 689–695.

Železný, F. and Lavrac, N. (2006). Propositionalization-Based Relational Subgroup Discovery with RSD. Machine Learning, 62(1-2):33–63.

Železný, F., Srinivasan, A., and Page, D. (2002). Lattice-Search Runtime Distributions May Be Heavy-Tailed. In Proceedings of the 12th International Conference on Inductive Logic Programming (ILP-02), volume 2583 of LNAI, pages 341–358. Springer.

Železný, F., Srinivasan, A., and Page, D. (2004). A Monte Carlo Study of Randomised Restarted Search in ILP. In Proceedings of the 14th International Conference on Inductive Logic Programming (ILP-04), volume 3194 of LNAI, pages 341–358. Springer.

Železný, F., Srinivasan, A., and Page, D. (2006). Randomised Restarted Search in ILP. Machine Learning, 64(1-3):183–208.

Zelle, J. M., Mooney, R. J., and Konvisser, J. B. (1994). Combining Top-down and Bottom-up Techniques in Inductive Logic Programming. In Proceedings of the 11th International Conference on Machine Learning (ICML-94), pages 343–351.

Zhang, N. L. and Poole, D. (1996). Exploiting Causal Independence in Bayesian Network Inference. Journal of Artificial Intelligence Research, 5:301–328.

Index

Bayes Ball algorithm, 168
Bayesian Logic Program, 180
Bayesian networks, 162
Bayesian Revision Points, 204
Bottom clause, 18
Combining rule, 180
combining rules, 180
cutoff, 104
d-separation, 165
EM algorithm, 175
Example × Instance in PFORTE, 192
Extensional ILP, 17
First Improvement strategy, 102
FORTE, 27
FORTE examples representation, 28
FORTE learning setting, 28
FORTE Revision operators, 31
FORTE MBC, 47
Generative and Discriminative training, 173
Greedy Stochastic Hill Climbing strategy, 102
GSAT, 106
ILP, 14
Input variable, 20
Intentional ILP, 17
Learning from entailment, 17
learning from interpretations, 17
Learning from satisfiability, 17
Likelihood, 173
Logic programming terms, 15
Markov networks, 161
MIS culprit clauses, 25
Mode declaration, 20
Multi-Relational Data Mining, 14
Output variable, 20
PFORTE classification, 193
Probabilistic Graphical Models, 161
Probabilistic Inductive Logic Programming, 159
Probabilistic Iterative Improvement, 103
Probabilistic Logic Learning, 159
Propositionalization, 113
Randomised Iterative Improvement, 102
Rapid Random Restarts, 104
Revision operators, 26
Revision points, 25
Simulated Annealing, 103
SLS in ILP, 109
Statistical Relational Learning, 159
Stochastic Hill Climbing strategy, 101
support network, 181
Transfer Learning, 68
Two-levels combining rules, 187
v-structure, 166
WalkSAT, 107

Appendix A

The Laws of International Chess

Here we summarize the main rules of the traditional game of chess, as set up by the Fédération Internationale des Échecs, also known as the World Chess Federation (FIDE, ).

1. Chess is a game played by two people on a chessboard divided into 64 squares of alternating colour, with 32 pieces (16 for each player) of six types. Each player has control of one of two sets of coloured pieces (white and black). White moves first and the players alternate moves. Each type of piece moves in a distinct way. The goal of the game is to checkmate, i.e. to threaten the opponent’s king with inevitable capture.

2. At the beginning of the game the pieces are arranged as shown in Figure A.1.

3. Each square of the chessboard is uniquely identified by a pair of a letter and a number: the vertical files are labelled a through h, and the horizontal ranks are numbered 1 to 8. The white king, for example, starts the game on square e1.
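The file-and-rank naming scheme above can be sketched directly (an illustrative snippet; the `square_name` helper is ours, not part of any chess library):

```python
# Algebraic naming of the 64 squares: file letter 'a'-'h' plus rank number 1-8.
FILES = "abcdefgh"
RANKS = range(1, 9)

def square_name(file_index, rank):
    """Map a 0-based file index and a 1-based rank to a name such as 'e1'."""
    return FILES[file_index] + str(rank)

all_squares = [square_name(f, r) for r in RANKS for f in range(8)]

print(len(all_squares))   # 64 distinct squares
print(square_name(4, 1))  # e1, the white king's starting square
```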

4. During the game, a piece may be captured; it is then removed from the board and may not return to play for the remainder of the game. The king can be put in check but cannot be captured.

5. Pieces can move as follows:

- The king can move exactly one square horizontally, vertically, or diagonally. At most once in every game, each king is allowed to make a special move, known as castling (see below).

- The rook moves any number of vacant squares vertically or horizontally. It is also moved while castling.

- The bishop moves any number of vacant squares in any diagonal direction.

- The queen can move any number of vacant squares diagonally, horizontally, or vertically.

- The knight moves in an "L" or "7" shape (possibly inverted): it moves two squares like the rook and then one square perpendicular to that.

- A pawn can move forward one square, if that square is unoccupied. If it has not yet moved, the pawn has the option of moving two squares forward, provided both squares in front of it are unoccupied. A pawn cannot move backward. Pawns are the only pieces that capture differently from how they move: they can capture an enemy piece on either of the two squares diagonally in front of them, but cannot move to those squares if they are vacant.
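The knight's "L"-shaped move described above can be made concrete with a small move generator (a minimal sketch on an empty board; the function name is ours):

```python
# A knight moves two squares along one axis and one along the other,
# as long as the destination stays on the 8x8 board.
FILES = "abcdefgh"

def knight_moves(square):
    """Return the algebraic names of all squares a knight on `square` can reach."""
    f, r = FILES.index(square[0]), int(square[1])
    jumps = [(2, 1), (2, -1), (-2, 1), (-2, -1),
             (1, 2), (1, -2), (-1, 2), (-1, -2)]
    return sorted(FILES[f + df] + str(r + dr)
                  for df, dr in jumps
                  if 0 <= f + df < 8 and 1 <= r + dr <= 8)

print(knight_moves("b1"))       # ['a3', 'c3', 'd2'] -- the knight's opening options
print(len(knight_moves("d4")))  # 8 -- a centrally placed knight has all jumps available
```

Captures and occupied squares are ignored here; only the board geometry from the rule above is encoded.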

Figure A.1: Initial position of a Chess board: first row: rook, knight, bishop, queen, king, bishop, knight, and rook; second row: pawns

6. Castling consists of moving the king two squares towards a rook on the player's first rank, then moving the rook onto the square over which the king crossed. Castling can only be done if the king has never moved, the rook involved has never moved, the squares between the king and the rook involved are not occupied, the king is not in check, and the king does not cross over or end on a square in which it would be in check (see Figure A.2).
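The castling preconditions listed above amount to a single conjunction. A schematic sketch (it takes the board-dependent facts as boolean inputs rather than computing them from a position):

```python
def castling_legal(king_has_moved, rook_has_moved, path_clear,
                   king_in_check, king_path_attacked):
    """All five FIDE preconditions for castling must hold simultaneously."""
    return (not king_has_moved
            and not rook_has_moved
            and path_clear
            and not king_in_check
            and not king_path_attacked)

# Initial position, nothing in the way and no checks: castling is allowed.
print(castling_legal(False, False, True, False, False))  # True
# Once the king has moved, castling is permanently unavailable.
print(castling_legal(True, False, True, False, False))   # False
```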

Figure A.2: Castling from the queen side and from the king side (figure is due to http://www.pressmantoy.com/instructions/instruct_chess.html)

7. En passant happens when player A's pawn moves forward two squares and player B has a pawn on its fifth rank on an adjacent file; B's pawn may then capture A's pawn as if A's pawn had only moved one square. This capture can only be made on the immediately subsequent move. In Figure A.3 the black pawn moves from b7 to b5 and the white pawn at c5 captures it en passant, ending up on b6.
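The geometric part of the en-passant rule can be expressed directly: the capture is possible only when the opposing pawn has just advanced two squares on an adjacent file and sits beside the capturing pawn. A simplified sketch (timing, i.e. "immediately subsequent move", is taken as given; the function name is ours):

```python
FILES = "abcdefgh"

def en_passant_square(double_step_from, double_step_to, capturer):
    """If the capture is geometrically possible, return the capture square, else None."""
    ff, fr = FILES.index(double_step_from[0]), int(double_step_from[1])
    tf, tr = FILES.index(double_step_to[0]), int(double_step_to[1])
    cf, cr = FILES.index(capturer[0]), int(capturer[1])
    two_square_advance = ff == tf and abs(tr - fr) == 2
    adjacent_file = abs(cf - tf) == 1
    same_rank = cr == tr
    if two_square_advance and adjacent_file and same_rank:
        # The capturer lands on the square the advancing pawn skipped over.
        return FILES[tf] + str((fr + tr) // 2)
    return None

# Black pawn b7 -> b5; the white pawn on c5 may capture it on b6 (cf. Figure A.3).
print(en_passant_square("b7", "b5", "c5"))  # b6
```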

8. If a pawn advances to its eighth rank, it is promoted to a queen, rook, bishop, or knight of the same colour, according to the player's desire.

Figure A.3: En-passant move of a pawn in the game of chess. Figure is from http://www.learnthat.com/courses/fun/chess/beginrules15.shtml

9. When a player makes a move that threatens the opposing king with capture, the king is said to be in check. The definition of check is that one or more opposing pieces could theoretically capture the king on the next move (although the king is never actually captured). If a player's king is in check, then the player must make a move that eliminates the threat(s) of capture. The possible ways to remove the threat of capture are: (1) Move the king to a square where it is not threatened. (2) Capture the threatening piece (possibly with the king, if doing so does not put the king in check). (3) Place a piece between the king and the opponent's threatening piece.

10. If the king is in double check, i.e., threatened by two different pieces, the only piece allowed to move is the king itself, since no single capture or interposition can remove both threats.

11. If a piece is protecting the king from a check, it may only move to another position where it continues protecting the king (such a piece is said to be absolutely pinned).

12. If a player’s king is placed in check and there is no legal move that player can make to escape check, then the king is said to be checkmated and the game ends.
