DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Automating Question Generation Given the Correct Answer

HAOLIANG CAO

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Authors
Haoliang Cao <[email protected]>
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Place for Project
Stockholm, Sweden
KTH Royal Institute of Technology

Examiner
Viggo Kann
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Supervisor
Johan Boye
School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology

Swedish Title
Automatisering av frågegenerering givet det rätta svaret

Abstract
In this thesis, we propose an end-to-end deep learning model for a question generation task. Given a Wikipedia article written in English and a segment of text appearing in the article, the model generates a simple question whose answer is the given text segment. The model is based on an encoder-decoder architecture. Our experiments show that a model with a fine-tuned BERT encoder and a self-attention decoder gives the best performance. We also propose an evaluation metric for the question generation task that evaluates both the syntactic correctness and the relevance of the generated questions. Our analysis of sampled data indicates that the new metric evaluates generated questions better than other popular metrics for sequence-to-sequence tasks.

Keywords
Natural Language Processing, NLP, Natural Language Generation, NLG, Question Generation

Sammanfattning (Swedish abstract, translated)
This thesis presents a deep neural network model for a question generation task. Given a Wikipedia article written in English and a text segment in the article, the model can generate a simple question whose answer is the given text segment.