Analysis and Generation of Humorous Texts in Hindi
Total Page:16
File Type:pdf, Size:1020Kb
Computational Humour: Analysis and Generation of Humorous Texts in Hindi Thesis submitted in partial fulfillment of the requirements for the degree of Masters of Science in Computational Linguistics by Research by Srishti Aggarwal 201325049 [email protected] International Institute of Information Technology Hyderabad - 500 032, INDIA October 2018 Copyright c Srishti Aggarwal, 2018 All Rights Reserved International Institute of Information Technology Hyderabad, India CERTIFICATE It is certified that the work contained in this thesis, titled “Computational Humour: Analysis and Gener- ation of Humorous Texts in Hindi” by Srishti Aggarwal, has been carried out under my supervision and is not submitted elsewhere for a degree. Date Adviser: Prof. Radhika Mamidi To My Parents Acknowledgments I am grateful to all the wonderful people in my life for their encouragement and support for my research work. First and foremost, I thank my parents for their love, support and faith in me. Thank you for being there for me, and for always getting me last minute tickets whenever I felt like coming home. I thank my sister for always wishing me well and being my silent supporter. I thank my family - my grandparents, uncles, aunts, cousins - for celebrating the smallest of victories, encouraging me in the biggest of defeats, and for keeping a joke running about the topic of my research. I would like to express my gratitude towards my advisor Prof. Radhika Mamidi who guided and supported me at each step of this work, right from when I showed interest in this topic more than two years ago. She showed great patience in giving me the freedom to work on my own pace, while also keeping me grounded and on track. I am thankful to Kritik Mathur, who worked with me on this project for a semester and co-authored two submissions with me. All our discussions and the many ideas we tried out really helped shape this work. My sincere gratitutde towards Harshit Harchani, Nisarg Jhaveri and Madhuri Tummalapalli for sup- porting and guiding me throughout my research work, for always lending a patient ear whenever I had something to discuss, or even just wanted to talk out loud, and for giving me company all through this last year. I thank Mohit Goel for the many pep talks he gave me, at all the right times. A special thanks to Arpit Merchant for being there for me and always helping me out whenever and wherever possible. I also want to thank my other friends - Smriti Singh, Vivek Ghaisas, Sanjana Sharma, Anvesh Rao, Kaveri Anuranjana for all the fun times together that made my time at IIIT Hyderabad memorable. v Abstract Humour is an intricate part of human behaviour and a valued quality when developing interpersonal relationships. It is an interesting and challenging domain to work in owing to its subjectivity. Humour is one of the most sophisticated forms of human intelligence and creativity, intimately related with factors such as world knowledge, reasoning and perception. With our AI systems becoming increasingly intelligent, there is now a demand for systems which are more human-like than ever before. No AI system could be complete without the ability to to process and use an integral human phenomenon like humour. Humour is often expressed through language in various forms such as funny stories/anecdotes, jokes, witty sayings, puns etc. This verbally expressed humour is of interest to NLP researchers because it deals with spoken and textual language. The subfield of Computational Linguistics and Artificial Intelligence which aims to develop systems which can use and understand humour like humans do is called Computational Humour. A complete computational humour system has to contain two main modules - a humour interpreter, and a humour generator. Both these modules could have their independent uses in a variety of appli- cations, but when it comes to interactive and comprehensive AI systems, both are equally important. In this thesis, we present a first ever work for computational humour for Hindi. We consider both sub- fields in computational humour - Humour Understanding and Humour Generation, and work on specific applications in both. We study Hindi-English code mixed puns by focusing on automatic recovery of the target words/phrases in them. We define code-mixed puns, analyse the type of structures present in them, and classify them into two main categories - intra-sentential and intra-word code-mixed puns. We then propose a four-step methodology to recover intra-sentential code-mixed puns. On evaluating our model on a test dataset of such puns, we observe that we are able to successfully recover 67%. We recognise several limitations of our system, like the inability to deal with highly unusual and creative language use. For the problem of computational generation of humour, we consider two tasks. First, we build a prototype system which automatically generates a simple form of Dur-se-Dekha jokes in Hindi using a small lexicon and handcrafted rules. Second, we design an algorithm which can induce humour in a short non-humorous text by performing single word subsitutions on the basis of phonetic similarity vi vii and sentiment opposition. We perform human evaluation on the results of both and observe that we are successfully able to generate funny instances in both these cases. Contents Chapter Page 1 Introduction :::::::::::::::::::::::::::::::::::::::::: 1 1.1 Computational Humour . 2 1.2 Motivation . 3 1.3 Our Contributions . 4 1.4 Thesis Organisation . 5 2 Background and Related Work ::::::::::::::::::::::::::::::::: 6 2.1 Theories of Humour . 6 2.1.1 Conventional Theories of Humour . 6 2.1.2 Linguistic Theories of Humour . 8 2.2 Computational Humour . 9 2.2.1 Humour Detection and Understanding . 9 2.2.2 Humour Generation . 11 3 Understanding Code-Mixed Puns ::::::::::::::::::::::::::::::: 17 3.1 Introduction . 17 3.2 Our contribution . 18 3.3 Background and Related Work . 18 3.3.1 Linguistic Studies on Puns . 18 3.3.2 Code-Mixing . 19 3.4 Classification of Code-Mixed Puns . 20 3.4.1 Intra-Sentential Code-Mixed Puns . 20 3.4.2 Intra-Word Code Mixed Puns . 21 3.5 Test dataset . 21 3.6 Our Model . 23 3.7 Results and Discussion . 26 3.8 Future Work . 28 3.9 Conclusions . 28 4 Humour Generation :::::::::::::::::::::::::::::::::::::: 29 4.1 Automatic Generation of Dur-se-Dekha jokes . 29 4.1.1 Introduction . 29 4.1.2 Dur-se-Dekha ................................... 30 4.1.3 Our Model . 33 4.1.4 Evaluation and Results . 35 viii CONTENTS ix 4.1.5 Future Extensions . 36 4.1.6 Summary . 37 4.2 Word Substitutions to Induce Humour in Short Phrases . 37 4.2.1 Introduction . 37 4.2.2 Experiment . 38 4.2.3 Evaluation and Discussion . 39 4.2.4 Summary . 40 4.3 Conclusion . 40 5 Conclusion ::::::::::::::::::::::::::::::::::::::::::: 42 Bibliography :::::::::::::::::::::::::::::::::::::::::::: 45 List of Figures Figure Page 3.1 An example of an Amul advertisement containing a intra-word code-mixed pun from 2014. 22 3.2 Flowchart depicting the model for target recovery of code-mixed puns. 24 x List of Tables Table Page 3.1 Examples of intra-sentential code-mixed puns . 20 3.2 Examples of intra-word code-mixed puns . 21 3.3 Distribution of the different types of puns in our test dataset. 22 3.4 Labels for various categories in the language identification step. 23 3.5 Examples of puns successfully recovered by our system . 26 3.6 Example for the error case were a single pun word maps to more than one word in the target. 26 3.7 Example for the error case where the pun is based on the pronunciation of an abbreviation. 27 3.8 Example for the error case where the pun is based on the pronunciation of an abbreviation. 27 3.9 Example for the error case where the target is rare or a new coinage. 27 4.1 The basic structure of Dur-se-Dekha jokes . 30 4.2 The basic structure of Dur-se-Dekha jokes translated to English. 30 4.3 The structure of Type1(a) Dur-se-Dekha jokes (with NP punchlines). 31 4.4 Examples of Type1(a) Dur-se-Dekha jokes (those with NP punchlines). 31 4.5 The structure Type1(b) Dur-se-Dekha jokes (with VP punchlines) . 32 4.6 Examples of Type1(b) Dur-se-Dekha jokes (those with VP punchlines). 32 4.7 Structure of Type2 Dur-se-Dekha jokes . 33 4.8 Examples of Type2 Dur-se-Dekha jokes . 33 4.9 Examples of Dur-se-Dekha generated by our system. 35 4.10 Mean naturalness values of human created jokes vs system generated jokes. 36 4.11 Mean Funniness values, and standard deviations of system generated jokes with varying constraints. 36 4.12 Example texts from the Baseline model. 38 4.13 Example texts from the our sentiment based model. 39 4.14 Breakdown of ratings given to jokes from both models . 39 xi Chapter 1 Introduction Philosophers and researchers since the times of Aristotle [45] and Plato [58] have been trying to understand and define the concept of humour. There are dozens of different definitions of humour. The Oxford English Dictionary defines humour as “that quality of action, speech, or writing which excites amusement; oddity, jocularity, facetiousness, comicality, fun" [78]. According to other researchers, humour can been seen as “any object or event that elicits laughter, amuses or is felt to be funny" [73], or a “non-serious social incongruity" [7], or even an individual trait or a characterictic of a person(rather than a statement/event) [41, 42]. Definitions of humour have evolved from those which relied on laughter into a multi-disciplinary topic of research and spans over the fields of sociology, physiology, psychology, phiosophy, literature and linguistics.