Automatic Joke Generation: Learning Humour from Examples
Total Page:16
File Type:pdf, Size:1020Kb
Automatic Joke Generation: Learning Humour from Examples Thomas Winters Thesis submitted for the degree of Master of Science in Engineering: Computer Science Thesis supervisor: Prof. dr. Danny De Schreye Assessor: Prof. dr. Luc De Raedt, Dr. Stefano Teso Mentor: Vincent Nys Academic year 2016 – 2017 c Copyright KU Leuven Without written permission of the thesis supervisor and the author it is forbidden to reproduce or adapt in any form or by any means any part of this publication. Requests for obtaining the right to reproduce or utilize parts of this publication should be addressed to the Departement Computerwetenschappen, Celestijnenlaan 200A bus 2402, B-3001 Heverlee, +32-16-327700 or by email [email protected]. A written permission of the thesis supervisor is also required to use the methods, products, schematics and programs described in this work for industrial or commercial use, and for submitting this publication in scientific contests. Preface I would like to express my sincerest gratitude to prof. dr. Danny De Schreye and Vincent Nys for their enthusiasm and encouragements, for their trust, for allowing extensive initial experimentation, for ensuring that the topic of this thesis was steered towards scientifically interesting aspects, for their patience and for the many inspiring discussions, interesting arguments and enlightening suggestions throughout the year. Thanks are also due to all the users of JokeJudger.com, the platform we created for this thesis, where hundreds of volunteers created and rated jokes for the training dataset and the evaluation of the system. I would also like to thank the jury for reading the text. Finally, I would also like to thank my family and my friends for their support. Thomas Winters i Contents Prefacei Abstract iv List of Figuresv List of Abbreviations and Symbols vii 1 Introduction1 1.1 Problem Statement............................ 1 1.2 Research Questions............................ 3 1.3 Relevancy of Computational Humour.................. 4 1.4 Thesis Structure ............................. 6 2 Related Research and Terminology9 2.1 Introduction................................ 9 2.2 Humour Theory.............................. 9 2.3 Related Concepts & Methods...................... 15 2.4 Related Humour Generators....................... 19 2.5 Related Humour Detectors ....................... 21 2.6 Conclusion ................................ 22 3 Generalised Joke Generation 23 3.1 Introduction................................ 23 3.2 Generalising Schemas .......................... 23 3.3 Metric Set Identification......................... 26 3.4 GOOFER Framework .......................... 29 3.5 Notes about the Framework....................... 34 3.6 Conclusion ................................ 34 4 Generalised Analogy Generator 35 4.1 Introduction................................ 35 4.2 Simplified Pipeline............................ 36 4.3 Data Collection: JokeJudger....................... 37 4.4 RegEx Based Template Stripper .................... 47 4.5 Metrics used ............................... 48 4.6 Template Values Generator ....................... 50 4.7 Classifier.................................. 50 4.8 Evaluation................................. 53 ii Contents 4.9 Conclusion ................................ 58 5 Conclusion 59 5.1 Future work................................ 59 5.2 Applications in a Real World Context ................. 60 5.3 Conclusion ................................ 61 A Program manuals 65 A.1 Deploying JokeJudger .......................... 65 A.2 JokeJudger Data............................. 65 A.3 Extending JokeJudger to Other Types of Jokes............ 66 A.4 Deploying Generalised Analogy Generator............... 66 Bibliography 69 iii Abstract This thesis explores how a program is able to generate humour by learning insights from a set of rated example jokes. We construct a framework called Goofer to fulfil this task, and use humour theory and generalise computational humour concepts to guide the construction of its components. These components allow the framework to learn the structures of the given jokes and estimate how funny people might find specific instantiations of joke structures. This knowledge is then employed to generate good jokes. Following the theoretical foundation and specification for the humour generating framework, we implement and evaluate a subset of the framework. This system, called Gag, learns how to generate analogy jokes using the “I like my X like I like my Y , Z” template based on given, rated analogy jokes. Since it uses generalised components and learns its own schemas, this program successfully generalises the best known analogy generator in the computational humour field. iv List of Figures 2.1 An uninstantiated JAPE schema (left) and an instantiated JAPE schema (right), taken from the JAPE paper [7]................... 16 2.2 The newelon2 schema, from the Standup system [34].......... 19 3.1 Our mapping of the analogy generating model created by Petrovic & Matthews [41] to a probabilistic schema.................. 26 3.2 A schematic overview of the generic Goofer framework for joke generation from examples........................... 29 4.1 A schematic overview of how “I like my X like I like my Y , Z” jokes could be generated following the design of our Goofer framework. 36 4.2 The “rate” page, where users rate other users’ jokes............ 39 4.3 The “create” page allows users to create jokes with helpers like challenges, suggestions and a randomiser. ................. 39 4.4 The “hall of fame” lists the best jokes of the week, month and of all time. 40 4.5 The “notifications” page shows the user the most recently received ratings on their jokes. ............................ 40 4.6 The “created jokes” page shows a histogram of the received ratings for every created joke............................... 41 4.7 The “ratings overview” page allows users to easily rediscover their favourite jokes on the site........................... 41 4.8 The “profile” page allows to change several settings and displays achievement ranks............................... 42 4.9 The percentages of the ratings for human-created jokes on JokeJudger at the end of the training data collection phase. ............... 46 4.10 The regular expression to extract template values from the “I like my X like I like my Y , Z” template. ....................... 47 4.11 The number of jokes in the training dataset per mode........... 51 4.12 The confusion matrix of the Random Tree classifier on mode ratings of the training data................................ 51 4.13 The average rating for the jokes in the training dataset.......... 52 4.14 The attribute importance according to the regression version of the Random Forest algorithm on the average score of the training data . 53 v List of Figures 4.15 The percentages of the rating categories for the classification Gag system, the regression Gag system, all human-created jokes, all human-created jokes created on JokeJudger (= excluding jokes scraped from other sites such as Twitter and Reddit for the initial dataset), and all the human-created jokes that only use a single word in for every template value................................. 55 4.16 The percentage of ratings higher than or equal to four stars out of five for each category of figure 4.15........................ 56 vi List of Abbreviations and Symbols Abbreviations AI Artificial Intelligence CH Computational Humour GTVH General Theory of Verbal Humour KR Knowledge Resource (of GTVH) NLG Natural Language Generation NLP Natural Language Processing RegEx Regular expression SSTH Semantic Script Theory of Humour Abbreviations for Our Systems Goofer Generator Of One-liners From Examples with Ratings Gag Generalised Analogy Generator Symbols X The first variable used in the “I like my X like I like my Y , Z” template Y The second variable used in the “I like my X like I like my Y , Z” template Z The third variable used in the “I like my X like I like my Y , Z” template vii Chapter 1 Introduction 1.1 Problem Statement This thesis explores how a computer program is able to learn humour from human- rated examples. We achieve this by using humour theory and concepts from related research as a guidance to design a generalised computational humour framework. We then implement several components of this framework to construct a generalised analogy generator. This research is situated in computational humour (CH), a branch of natural language processing (NLP) and artificial intelligence (AI). In this field, researchers perform several kinds of tasks on humour using programs. There are three main categories of tasks within computational humour [6]. The first type of task is the generation of humorous artefacts using programs. The second type of task is the automatic detection of certain types of humour. The third type is the use of computational concepts to understand and evaluate humour theory [48]. This thesis focuses on the first and the last categories: generating and understanding humour. In this thesis, we create a framework that learns humour theory from input jokes and uses this knowledge to generate new jokes. We call this framework Goofer, which stands for Generator of One-liners From Examples with Ratings. It builds upon multiple existing computational humour systems and theories and can be extended in the interest of generalising other computational humour systems. In order to show an implementation of such a generalisation using this framework, we created a system called Gag