Automating Question Generation Given the Correct Answer
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Automating Question Generation Given the Correct Answer

HAOLIANG CAO

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Author
Haoliang Cao <[email protected]>
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden
KTH Royal Institute of Technology

Examiner
Viggo Kann
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Supervisor
Johan Boye
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Swedish Title
Automatisering av frågegenerering givet det rätta svaret

Abstract

In this thesis, we propose an end-to-end deep learning model for a question generation task. Given a Wikipedia article written in English and a segment of text appearing in the article, the model can generate a simple question whose answer is the given text segment. The model is based on an encoder-decoder architecture. Our experiments show that a model with a fine-tuned BERT encoder and a self-attention decoder gives the best performance. We also propose an evaluation metric for the question generation task, which evaluates both the syntactic correctness and the relevance of the generated questions. According to our analysis of sampled data, the new metric gives a better evaluation than other popular metrics for sequence-to-sequence tasks.

Keywords
Natural Language Processing, NLP, Natural Language Generation, NLG, Question Generation

Sammanfattning

In this thesis, a deep neural network model for a question generation task is presented. Given a Wikipedia article written in English and a text segment appearing in the article, the model can generate a simple question whose answer is the given text segment. The model is based on an encoder-decoder architecture. Our experiments show that a model with a fine-tuned BERT encoder and a self-attention decoder gives the best performance. We also propose an evaluation metric for the question generation task, which evaluates both the syntactic correctness and the relevance of the generated questions. According to our analysis of sampled data, the new metric gives a better evaluation than other popular evaluation metrics.

Nyckelord
Natural Language Processing, Natural Language Generation, Question Generation

Acknowledgements

I would like to express my sincere thanks and gratitude to my supervisor, Johan Boye, who gave me the opportunity to make a contribution to the question generation task. Without his patient and careful guidance, it would have been impossible for me to complete this thesis. I would also like to thank my examiner, Viggo Kann, who gave me valuable suggestions for improving the quality of my final report.

Contents

1 Introduction
  1.1 Objective
  1.2 Thesis outline
2 Background
  2.1 Introduction of Natural Language Processing
    2.1.1 Features and representations
    2.1.2 Models
  2.2 Natural Language Understanding (NLU)
    2.2.1 BIDAF model
    2.2.2 Other QA models
    2.2.3 Dataset
  2.3 Natural language generation (NLG)
  2.4 Transfer Learning and Pre-trained Language Model
3 Methods
  3.1 Baseline Model
  3.2 ReverseQA Model
4 Implementation
  4.1 Dataset
  4.2 Training
    4.2.1 Baseline model
    4.2.2 ReverseQA model
  4.3 Testing
5 Analysis of the Evaluation Method
  5.1 Metrics for Sequence to Sequence task
  5.2 Readability and Relevance (RaR) Metric
  5.3 Metric Evaluation
    5.3.1 Readability loss
    5.3.2 Overlapping score
    5.3.3 Conclusion
6 Results
  6.1 Quantitative Evaluation
    6.1.1 Baseline model
    6.1.2 ReverseQA model
  6.2 Qualitative Evaluation
7 Discussion
  7.1 About the Metric
  7.2 About the ReverseQA Model
  7.3 Future Work
  7.4 Ethical and Sustainability
Bibliography
A Context for sampled data in Table 5.4
B Context for the typical examples

Chapter 1

Introduction

Natural Language Processing (NLP) is a sub-field of artificial intelligence. Research in this field involves natural language, that is, the language humans use in their daily lives, so it is closely related to linguistics, although the two differ. NLP is not the general study of natural language but the development of computer systems, especially software systems, that can communicate effectively in natural language. Realizing natural language communication between humans and computers means that the computer can not only understand the meaning of a natural language text but also express certain intentions and thoughts in the form of natural language text. The former is called Natural Language Understanding (NLU), and the latter is called Natural Language Generation (NLG).

The Question Answering (QA) task is an essential part of Natural Language Processing (NLP), or more specifically, of the Natural Language Understanding (NLU) field. By mimicking the reading comprehension test, we assume a machine has a certain level of understanding if it can answer questions about a specific corpus after "reading" it. The past years have witnessed fast development of models with excellent performance on some well-known QA datasets, some of which even outperform humans [1, 2]. In this degree project, instead of further developing models for the QA task, we would like to reverse the process and generate questions given the answers and related text. Since the design of the QA task aims to test the reading comprehension ability of the machine, the questions designed should also, to some extent, reflect an understanding of the corpus. Besides the NLU part, this project also involves Natural Language Generation (NLG), since the question cannot simply be extracted from the text. The NLG task has always been challenging [3] because we need to both maintain the intention and make the sentence syntactically correct. NLG models have been through a rule-based phase [4] and an RNN-based phase [5], in which models already gave decent performance on some NLG tasks, such as Neural Machine Translation (NMT) [6]. Nowadays, contextual language models such as BERT [7], GPT-2 [8], and XLNet [9] have again raised the performance on text generation tasks, which provides us with abundant tools to utilize.

If high-quality questions can be successfully generated, possible applications include:

• Helping to automatically generate simple questions for reading comprehension tests.
• Helping to generate more data for QA datasets.
• Helping to train QA models in a semi-supervised manner.
1.1 Objective

The research question is: if we want to ask a question about a piece of information in an article, and the ideal answer to that question is the information itself, how can we automate the question generation? Therefore, we formally define our objective as follows: given a corpus and a segment of text appearing in the corpus, the aim is to train the computer to generate meaningful and reasonable questions about that segment of text. At least one of the answers to such a question should be the provided text segment used as input.

Table 1.1 shows one example from the SQuAD dataset, which is one of the well-known datasets for the QA task. In the QA task, people try to implement models that can output the correct answer "zeta function" given the question "What function is related to prime numbers?" and the context in Table 1.1. However, for the question generation task, given the answer "zeta function" in Table 1.1 and its location in the context, the aim is to train the computer to generate a question that has a similar meaning to the original question "What function is related to prime numbers?". Instead of requiring an exact match, the generated question could be "What is the name of the function that closely relates to prime numbers?" or "Name the function related to prime numbers?". As long as the generated question is readable by a human and the answer to the generated question given the context is "zeta function" or simply "zeta," it will be considered a successful generation.

Table 1.1: One example from the SQuAD dataset [10]

context: The zeta function is closely related to prime numbers. For example, the aforementioned fact that there are infinitely many primes can also be seen using the zeta function: if there were only finitely many primes then ζ(1) would have a finite value. However, the harmonic series 1 + 1/2 + 1/3 + 1/4 + ... diverges (i.e., exceeds any given number), so there must be infinitely many primes. Another example of the richness of the zeta function and a glimpse of modern algebraic number theory is the following identity (Basel problem), due to Euler,

question: What function is related to prime numbers?

answer: zeta function

1.2 Thesis outline

In chapter 2, the basic concepts of NLP are introduced, including the objective of each sub-field and the available methods. We will present the details of some models whose architectures are also applied in our thesis. In chapter 3, we will propose two end-to-end deep learning models for the question generation task. Both models have the standard encoder-decoder structure, with decoders built from multi-head attention layers. The model with an encoder structure similar to that of QANet [11] will be used as the baseline. In the other model, named ReverseQA, a pre-trained BERT model will be used as the encoder. All the concepts mentioned above will be explained in chapter 2.
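For concreteness, the sketch below illustrates the kind of setup outlined in Sections 1.1 and 1.2: a SQuAD-style (context, answer, question) triple and a bare-bones encoder-decoder that pairs a pre-trained BERT encoder with a multi-head self-attention (Transformer) decoder. It is only a minimal illustration written with PyTorch and the Hugging Face transformers library; the class name QuestionGenerator, the choice to mark the answer span through segment ids, and all hyper-parameters are assumptions made for this sketch, not the thesis implementation (the actual baseline and ReverseQA architectures are described in chapter 3).

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

# One training example in the spirit of Table 1.1: the answer is a span of the
# context, and the target output is a question whose answer is that span.
# (The field names are illustrative, not the exact SQuAD JSON schema.)
example = {
    "context": "The zeta function is closely related to prime numbers.",
    "answer": {"text": "zeta function", "start_char": 4},
    "question": "What function is related to prime numbers?",
}


class QuestionGenerator(nn.Module):
    # Hypothetical encoder-decoder sketch: BERT encodes the context (with the
    # answer span marked via token_type_ids), and a Transformer decoder with
    # multi-head attention produces the question token by token.
    def __init__(self, vocab_size, d_model=768, num_layers=4, num_heads=8):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, context_ids, context_mask, answer_segments, question_ids):
        # Encode the context; segment ids mark which tokens form the answer.
        memory = self.encoder(input_ids=context_ids,
                              attention_mask=context_mask,
                              token_type_ids=answer_segments).last_hidden_state
        # Causal mask so each target position only attends to earlier tokens.
        tgt_len = question_ids.size(1)
        causal = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")),
                            diagonal=1)
        hidden = self.decoder(self.embed(question_ids), memory, tgt_mask=causal)
        return self.out(hidden)  # logits over the question vocabulary


if __name__ == "__main__":
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    enc = tokenizer(example["context"], return_tensors="pt",
                    return_offsets_mapping=True, truncation=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    start = example["answer"]["start_char"]
    end = start + len(example["answer"]["text"])
    # Mark the answer tokens with segment id 1 (one simple way of telling the
    # encoder where the answer is located in the context).
    answer_segments = torch.zeros_like(enc["input_ids"])
    for i, (s, e) in enumerate(offsets):
        if s >= start and e <= end and e > s:
            answer_segments[0, i] = 1
    tgt = tokenizer(example["question"], return_tensors="pt")
    model = QuestionGenerator(vocab_size=tokenizer.vocab_size)
    logits = model(enc["input_ids"], enc["attention_mask"],
                   answer_segments, tgt["input_ids"])
    print(logits.shape)  # (batch, question_length, vocabulary_size)

The sketch only runs a single forward pass to show the data flow from a (context, answer) pair to question-token logits; the actual models, answer-marking scheme, and training procedure used in this thesis are presented in chapters 3 and 4.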