Computational Analysis of Humour
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Exact Humanities by Research

by

Vikram Ahuja
201256040
[email protected]

International Institute of Information Technology
Hyderabad - 500 032, INDIA
July 2019

Copyright © Vikram Ahuja, 2019
All Rights Reserved

International Institute of Information Technology
Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled “Computational Analysis of Humour” by Vikram Ahuja, has been carried out under my supervision and is not submitted elsewhere for a degree.

Date
Adviser: Prof. Radhika Mamidi

To my Parents and Late Prof. Navjyoti Singh

Acknowledgments

I would like to thank Prof. Radhika Mamidi for accepting me to complete my thesis under her guidance. I would like to thank Late Prof. Navjyoti Singh, my advisor, for accepting me in IIIT-H and for his constant support, guidance and motivation. Working under him was a great learning experience. He promoted free thought and exploration, and pushed me to think out of the box. I thank my parents and Rubal for their unconditional love and support throughout the journey. I would like to extend my warm regards to all the research members at CEH for their help. I would especially like to thank Taradheesh Bali for his inputs, for being an awesome research partner and for being my co-author. I would like to thank all my friends and my hostel mates for making my journey in IIIT-H more exciting. A special shoutout to Manas Tewari for helping me review my thesis and most of my research work. I am also grateful to my friends Gaurav Singh, Durgesh Pandey and Amit Kumar Jha for helping me with our late night discussions and cardio sessions in meta states. Special mention to VRV, Vinit, Rathi, Maju, Tau, Priyanka and the whole of the silent wing for helping me throughout this journey. I would also like to thank Dr. Albert Hofmann for all his contributions in the field of consciousness and finally the greens on the foothills of the Himalayas.

Abstract

In this thesis we focus on three major aspects of computational humour recognition. We start by categorizing humour based on the classical theories of humour along with features like theme, emotions and topics. We then look at the problem of recognizing humour in conversations and broadcast speeches, which are longer and more complex than short jokes. Finally, we try to differentiate between different types of off-colour humour and to separate insulting remarks from the rest of off-colour humour, where dark humour is often misclassified as insulting humour.

Most scholarly works in the field of computational detection of humour derive their inspiration from the incongruity theory. Incongruity is an indispensable facet in drawing a line between humorous and non-humorous occurrences, but it is inadequate for explaining what actually made a particular occurrence funny. Classical theories like the Script-based Semantic Theory of Humour (SSTH) and the General Theory of Verbal Humour (GTVH) achieve this to an adequate extent. We adhere to a more holistic approach towards classification of humour based on these classical theories, with a few improvements and revisions. Through experiments based on our linear approach and performed on large datasets of short jokes, we demonstrate the adaptability and componentizability of our model, and show that a host of classification techniques can be used to overcome the challenging problem of distinguishing between various categories and sub-categories of jokes.

Almost all the studies in the field of computational humour recognition have been done on datasets consisting of short jokes, tweets and puns. We try to detect humour in conversations and broadcast speeches, as they are complex and contain more contextual information than short jokes.
For the purpose of automatic humour detection in monologues we built a corpus containing humorous utterances from TED talks, and for dialogues we analysed data from the popular TV sitcom Friends, whose canned laughter gives an indication of when the audience would react. We classified dialogues and monologues into humorous and non-humorous using multiple deep learning methods. Our experiments on the data show that such deep learning methods outperform the baseline by 21 accuracy points on the TED Talk dataset.

Off-colour humour is a category of humour which is considered by many to be in poor taste or overly vulgar. Most commonly, off-colour humour contains remarks on a particular ethnic group or gender, violence, domestic abuse, acts concerned with sex, excessive swearing or profanity. Blue humour, black humour and insult humour are types of off-colour humour. Blue and black humour, unlike insult humour, are not outright insulting in nature but are often misclassified because of the presence of insults and harmful speech. We provide an original dataset consisting of nearly 15,000 instances and a novel approach towards separating black and blue humour from offensive humour, which is essential so that free speech on the internet is not curtailed. Our experiments show that deep learning methods outperform n-gram-based approaches like SVMs, Naive Bayes and Logistic Regression by a large margin.

Contents

Abstract
1 Introduction
    1.1 Motivation
    1.2 Contribution
    1.3 Organisation of thesis
2 Related Work
    2.1 Theories Of Humor
        2.1.1 Classical Theories of Humour
        2.1.2 Linguistic Theories Of humour
    2.2 Computational Humour
        2.2.1 Humour Recognition
3 Automatic Humour Classification of Jokes
    3.1 Overview
    3.2 Related Work
    3.3 Proposed Framework
    3.4 Dataset
    3.5 Experiments
    3.6 Analysis
    3.7 Future Work
4 Humour Detection in Conversations
    4.1 Overview
    4.2 Related Work
    4.3 Dataset
    4.4 Methodology
    4.5 Experiment and Results
    4.6 Discussion
    4.7 Future Work
5 Computational Analysis of Off-Colour Humour
    5.1 Overview
    5.2 Related Work
    5.3 Proposed Framework
    5.4 Dataset
    5.5 Experiments
    5.6 Analysis
    5.7 Future Work
6 Conclusions
Related Publications
Bibliography

List of Figures

5.1 Differentiating between Insulting and Non-Insulting Humor
5.2 Jokes and Insults differentiation
5.3 Graph showing the results of the classifiers used

List of Tables

3.1 Computationally Detectable Characteristics in Jokes
3.2 Example of few jokes in the dataset
3.3 Result for Topic Detection
3.4 Result for Sarcastic Jokes
3.5 Result for Dark Jokes
3.6 Result for Adult Slang/Sexual Jokes
3.7 Result for Gross Jokes
3.8 Result for Insult Jokes
4.1 An excerpt from TED talk Tim Urban: Inside the mind of a master procrastinator
4.2 An excerpt from S01E01 from “Friends” TV Show
4.3 Results of various classifiers on the dataset
5.1 Few examples of jokes used in our dataset
5.2 Table showing the accuracies of various classifiers used

Chapter 1

Introduction

Humour is the tendency of particular cognitive experiences to provoke laughter and provide amusement. Humour is an essential element of all verbal communication. The Oxford English Dictionary defines humour as “that quality of action, speech, or writing which excites amusement; oddity, jocularity, facetiousness, comicality, fun”[65].
Humour has been studied by philosophers for a long time, starting with the classical Greek thinkers Plato[13] and Aristotle[26]. Different disciplines see humour differently, which has led to multiple definitions of it. Plato is considered the first theorist of humour[4] and described humour as a mixed feeling of the soul[51, 4], i.e., a mixture of pleasure and pain. Some linguists describe humour as any object or event that elicits laughter, amuses or is felt to be funny[4], while others describe it as something which sometimes elicits laughter and sometimes a smile[48]. Many researchers have proposed different types of humour, while some reduce humour to just one, i.e., incongruity and its resolution[58]. Others believe that a global anthropological theory of humour and laughter is not possible[3] and that it is impossible to define humour[61]. Humour is considered a multidisciplinary field, as research on it has drawn contributions from various disciplines including psychology, anthropology, sociology, literature, philosophy, philology, semiotics and linguistics[4].

The Greek thinkers highly influenced Latin authors like Cicero[16] and Quintilian[52]. Cicero's work on humour is considered important and is the first attempt at a taxonomy of humour from a linguistic perspective[4]. Cicero describes the distinction between verbal and referential humour, which has been used by many other theorists like Freud[21] and Raskin[53]. Verbal humour is the type of humour which is expressed verbally using a language or text, unlike physical and visual humour, which do not need a language to be represented. Verbal humour is of interest to linguists and NLP researchers.

There can be multiple definitions and categorizations of humour, which makes it a very challenging and interesting domain to work in. On top of that, humour is also incredibly subjective and highly contextual. A joke can be hysterical to one person while another person might find it offensive. Also, there can be types of humour which make sense in one scenario but are inappropriate in another, for example dark humour and self-deprecating humour.