Automatic Question Generation Using Discourse Cues and Distractor Selection for Cloze Questions
Automatic Question Generation using Discourse Cues and Distractor Selection for Cloze Questions

Thesis submitted in partial fulfillment of the requirements for the degree of MS by Research in Computer Science with specialization in NLP

by Rakshit Shah
200702041
[email protected]

Language Technology and Research Center (LTRC)
International Institute of Information Technology
Hyderabad - 500032, INDIA
July 2012

Copyright © Rakshit Shah, 2012
All Rights Reserved

International Institute of Information Technology
Hyderabad, India

CERTIFICATE

This is to certify that the thesis entitled “Automatic Question Generation using Discourse Cues and Distractor Selection for Cloze Questions” submitted by Rakshit Shah to International Institute of Information Technology, Hyderabad, for the award of the Degree of Master of Science (by Research) is a record of bona-fide research work carried out by him under my supervision and guidance. The contents of this thesis have not been submitted to any other university or institute for the award of any degree or diploma.

Date
Adviser: Prof. Rajeev Sangal

To my Parents

Acknowledgments

First and foremost I offer my sincerest gratitude to my supervisor, Dr Rajeev Sangal, Head of Language Technologies Research Centre (LTRC), International Institute of Information Technology - Hyderabad, who has supported me throughout my thesis with his patience and knowledge whilst allowing me the room to work in my own way.

I am thankful to my mentor and guide, Prashanth Mannem, for his wide knowledge and guidance throughout this work. I attribute the level of my Master's degree to his encouragement and effort, and without him this thesis, too, would not have been completed or written. One simply could not wish for a better or friendlier mentor.

I am deeply grateful to Professor Dipti Misra Sharma for her detailed and constructive comments. Her logical way of thinking has been of great value to me.
In my daily work I have been blessed with a friendly and cheerful group of fellow students. Interesting discussions about life with Rahul Agarwal, Manish Agarwal, Abhinav Goel, Rohit Nigam and many more at the cafeteria or in the lab have kept me sane throughout my studies. Shubhangi Sharma kept me on track by giving much needed, motivating but boring, lectures time and again and telling me to hang in there. Abhinav Goel kept us entertained with his huge repertoire of anecdotes and stories. He can sense your achievement and was there to celebrate at every stepping stone of my project. Shashank Sahni has fascinated me with his interest in Linux and his ability to get software installed on any computer system. If it had not been for him, I would not have started my work on QG when I did. Harshit Sureka always made sure that I wasn’t very stressed out with work and often planned the unplanned trips to various places. Late night basketball games after a long day in the lab with Romit, Shubhangi and Yasir cleared my mind and helped me get a good night's sleep.

I am very thankful to Manish Agarwal, my partner in this work, who kept me motivated and supported me throughout the project right from the beginning. The LTRC has provided the support and equipment I needed to produce and complete my thesis. Finally, I thank my parents for supporting me throughout all my studies at the University.

Abstract

A question may be either a linguistic expression used to make a request for information, or else the request itself made by such an expression. This information may be provided with an answer. Asking questions is a fundamental cognitive process that underlies higher-level cognitive abilities such as comprehension and reasoning. The ability to ask questions is the central cognitive element that distinguishes human and animal cognitive abilities. Questions are used from the most elementary stage of learning to original research.
Question Generation (QG) is the task of automatically generating questions from various inputs such as raw text, a database, or a semantic representation. Ultimately, QG allows humans, and in many cases artificial intelligence systems, to understand their environment and each other. Research on QG has a long history in artificial intelligence, psychology, education, and natural language processing.

The present work describes automatic Question Generation systems that take natural language text as input and generate questions of various types and scope for the user. Our aim is to generate questions that assess the content knowledge a student has acquired upon reading a text, rather than questions for vocabulary assessment, grammar assessment or language learning. In this work, we describe two automatic question generation systems. Both systems factor the QG process into several stages, enabling more or less independent development of particular stages.

The QG system, described in chapter 2, generates questions automatically using discourse connectives for different question types. We describe an end-to-end system that takes a document as input and outputs all the questions for selected discourse connectives. The selected discourse connectives include four subordinating conjunctions (since, when, because and although) and three adverbials (for example, for instance and as a result). Our system factors the QG process into two stages: content selection (the text selected for question generation) and question formation (transformations on the content to get the question). The question formation module in turn consists of (i) finding a suitable question type (wh-word), (ii) auxiliary and main verb transformations and (iii) rearranging the phrases to get the final question. The system has been evaluated for syntactic and semantic soundness of the question by two evaluators.
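To make the content selection stage concrete, the following is a minimal sketch in Python that only detects which of the seven connectives listed above occur in each sentence. It is an illustration under stated assumptions, not the thesis's implementation: the function name `select_content`, the plain regular-expression matching and the list-of-strings input are all hypothetical, and the sketch does not attempt argument identification or question formation.

```python
import re

# The seven discourse connectives named in the abstract.
CONNECTIVES = [
    "for example", "for instance", "as a result",
    "because", "although", "since", "when",
]

# Assumption: whole-word, case-insensitive matching is enough for a sketch.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONNECTIVES) + r")\b",
    re.IGNORECASE,
)


def select_content(sentences):
    """Return (sentence, connective) pairs for every connective occurrence.

    Hypothetical helper: the real system works on parsed text, which this
    surface-level sketch does not model.
    """
    candidates = []
    for sentence in sentences:
        for match in _PATTERN.finditer(sentence):
            candidates.append((sentence, match.group(1).lower()))
    return candidates
```

Surface matching of this kind would over-generate (for example, "since" used as a preposition of time), so a fuller system would presumably need syntactic information to filter candidates.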
The overall system has been rated 6.3 out of 8 on the QGSTEC development dataset and 5.8 out of 8 on the Wikipedia dataset. We have shown that some discourse relations, such as causal, temporal and result, are more important than others from the QG point of view. This work also shows that discourse connectives are good enough for QG and that there is no need for full-fledged discourse parsing. We have generated questions using discourse connectives, paving the way for medium and specific scope questions.

The cloze question generation (CQG) system, described in chapter 3, takes a document as input and outputs the important cloze questions. Our system factors the CQG process into three stages: (i) sentence selection, (ii) keyword selection and (iii) distractor selection. A domain-dependent approach is described for the distractor selection module of the CQG system. The system is implemented for and tested on examples from the cricket sports domain. The system is evaluated using the guidelines described in this work. The accuracy of the distractors is 3.05 (Eval-1), 3.14 (Eval-2) and 3.5 (Eval-3) out of 4. With the main focus on distractor selection, we have shown the influence of domain on the quality of distractors.

Contents

1 Introduction
  1.1 Introduction
    1.1.1 What is a question?
    1.1.2 What is a good question?
    1.1.3 Importance of questions in Learning
  1.2 What is Question Generation?
  1.3 Question Generation and its applications
  1.4 Classification of Questions
    1.4.1 Based on Question-type
    1.4.2 Based on Scope
  1.5 Problem Statement
  1.6 Contribution of the Thesis
  1.7 Thesis organization
2 Automatic Question Generation using Discourse Cues
  2.1 Overview
  2.2 Introduction
  2.3 Related Work
  2.4 Selection of Discourse Connectives
  2.5 Discourse connectives for QG
    2.5.1 Question type identification
    2.5.2 Target arguments for discourse connectives
  2.6 Target Argument Identification
    2.6.1 Locate syntactic head
    2.6.2 Target Argument Extraction
  2.7 Syntactic Transformations and Question Generation
  2.8 Evaluation and Results
  2.9 Error Analysis
    2.9.1 Co-reference resolution
    2.9.2 Parsing Errors
    2.9.3 Errors due to the inter-sentential connectives
    2.9.4 Fluency issues
  2.10 Conclusions
3 Distractor Selection for Cloze Questions
  3.1 Overview
  3.2 Introduction
  3.3 Related Work
  3.4 Approach
    3.4.1 Sentence Selection
    3.4.2 Keywords Selection
  3.5 Distractor Selection
    3.5.1 Select distractors from a single team
    3.5.2 Select distractors from both the teams
    3.5.3 Select distractors from any team
  3.6 Evaluation and Results
  3.7 Conclusions
4 Conclusion and Future Work