Automatic Cloze-Questions Generation
Annamaneni Narendra, Manish Agarwal and Rakshit Shah
LTRC, IIIT-Hyderabad, India
{narendra.annamaneni, manish.agrwal, rakshit.shah}@students.iiit.ac.in

Proceedings of Recent Advances in Natural Language Processing, pages 511–515, Hissar, Bulgaria, 7-13 September 2013.

Abstract

Cloze questions are questions containing sentences with one or more blanks and multiple choices listed to pick an answer from. In this work, we present an automatic Cloze Question Generation (CQG) system that generates a list of important cloze questions given an English article. Our system is divided into three modules: sentence selection, keyword selection and distractor selection. We also present evaluation guidelines for CQG systems. Using these guidelines, three evaluators report an average score of 3.18 (out of 4) on Cricket World Cup 2011 data.

1 Introduction

Multiple choice questions (MCQs) have proved efficient for judging students' knowledge. Manual construction of such questions, however, is a time-consuming and labour-intensive task. Cloze questions (CQs) are fill-in-the-blank questions, where a sentence is given with one or more blanks in it, along with four alternatives to fill those blanks. As opposed to MCQs, where one has to generate a WH-style question, CQs use a sentence with blanks to form the question. The sentence can be picked from a document on the topic, avoiding the need to generate a WH-style question. As a result, automatic CQG has received a lot of research attention recently.

1. Zaheer Khan opened his account with three consecutive maidens in the world-cup final.
(a) Zaheer Khan (b) Lasith Malinga (c) Praveen Kumar (d) Munaf Patel

In the above example CQ, the underlined word (referred to as the keyword), Zaheer Khan, is blanked out in the sentence and four alternatives are given. In the area of cloze questions, previous works (Sumita et al., 2005; Lee and Seneff, 2007; Lin et al., 2007; Pino et al., 2009; Smith et al., 2010) have mostly been in the domain of English language learning. Cloze questions have been generated to test students' knowledge of English in using the correct verbs (Sumita et al., 2005), prepositions (Lee and Seneff, 2007) and adjectives (Lin et al., 2007) in sentences. Pino et al. (2009) and Smith et al. (2010) have generated questions to teach and evaluate students' vocabulary. Agarwal and Mannem (2011) generated factual cloze questions from a biology textbook through heuristically weighted features. They do not use any external knowledge and rely only on information present in the document to generate the CQs with distractors. This restricts the possibilities during distractor selection and leads to poor distractors.

In this work, we present an end-to-end automatic cloze question generation system which adopts a semi-structured approach to generate CQs by making use of a knowledge base extracted from a Cricket portal.¹ Also, unlike previous approaches, we add context to the question sentence in the process of creating a CQ. This is done to disambiguate the question and avoid cases where there are multiple answers to a question. In Example 1, we have disambiguated the question by adding the context "in the world-cup final". Such a CQG system can be used in a variety of applications such as quizzing systems, trivia games, assigning fan ratings on social networks by posing game-related questions, etc.

Automatic evaluation of a CQG system is a very difficult task; all the previous systems have been evaluated manually. But even for manual evaluation, one needs specific guidelines to evaluate factual CQs, as compared to those used in a language learning scenario. To the best of our knowledge, there are no previously published guidelines for this task. In this paper, we also present guidelines to evaluate automatically generated factual CQs.

¹A popular game played in commonwealth countries such as Australia, England, India, Pakistan etc.

2 Approach

Our system takes news reports on Cricket matches as input and gives factual CQs as output, using a knowledge base on Cricket players and officials collected from the web. Given a document, the system goes through three stages to generate the cloze questions. In the first stage, informative and relevant sentences are selected; in the second stage, keywords (the words/phrases to be questioned on) are identified in the selected sentences. Distractors (answer alternatives) for the keyword in the question sentence are chosen in the final stage.

The Stanford CoreNLP toolkit is used for tokenization, POS tagging (Toutanova et al., 2003), NER (Finkel et al., 2005), parsing (Klein et al., 2003) and coreference resolution (Lee et al., 2011) of sentences in the input documents.

2.1 Sentence Selection

In sentence selection, relevant and informative sentences from a given input article are picked to be the question sentences in cloze questions. Agarwal and Mannem (2011) use many summarization features for sentence selection, based on heuristic weights. But for this task it is difficult to decide the correct relative weights for each feature without any training data. So our system directly uses a summarizer for the selection of important sentences. There are a few abstractive summarizers, but they perform very poorly, (Michael et al., 1999) for example. So our system uses an extractive summarizer, MEAD², to select important sentences. The top 10 percent of the ranked sentences from the summarizer's output are chosen to generate cloze questions.

²MEAD is a publicly available toolkit for multi-lingual summarization and evaluation. The toolkit implements multiple summarization algorithms (at arbitrary compression rates) such as position-based, Centroid[RJB00], TF*IDF, and query-based methods (http://www.summarization.com/mead).

2.2 Keywords Selection

This step of the process is the selection of words in the selected sentence that can be blanked out. These words are referred to as the keywords in the sentence. For a good factual CQ, a keyword should be the word/phrase/clause that tests the user's knowledge of the content of the article. This keyword shouldn't be too trivial, and neither should it be too obscure. For example, in an article on Obama, Obama would make a bad keyword.

The system first collects all the potential keywords from a sentence in a list and then prunes this list on the basis of observations described later in this section. Unlike the previous works in this area, our system is not bound to select only one-token keywords or to select only nouns and adjectives as keywords. In our work, a keyword can be a Named Entity (NE) (person, number, location, organization or date), a pronoun (that comes at the beginning of a sentence so that its referent is not present in that sentence) or a constituent (selected using the parse tree). In Example 2, the selected keyword is a noun phrase, carrom ball.

2. R Ashwin used his carrom ball to remove the potentially explosive Kirk Edwards in Cricket World Cup 2011.

2.2.1 Observations

From our data analysis, we have some observations, described below, that are used to prune the list.

• Relevant tokens should be present in the keyword: A keyword must contain at least a few tokens other than stop words³, common words⁴ and topic words⁵. We observed that words given by the TopicS tool are too trivial to serve as keywords, as they are easy to predict.

• Prepositions: The preposition at the beginning of a keyword is an important clue with respect to what the author is looking to check. So, we keep it as a part of the question sentence rather than blank it out as part of the keyword. We also prune keywords containing one or more prepositions, as they more often than not make the question unanswerable and sometimes introduce the possibility of multiple answers to such questions.

³In computing, stop words are words which are filtered out prior to, or after, processing of natural language data (text). http://armandbrahaj.blog.al/2009/04/14/list-of-english-stop-words/
⁴Most common words in English, taken from http://en.wikipedia.org/wiki/Most_common_words_in_English.
⁵Topics (words) which the article talks about. We used the TopicS tool (Lin and Hovy, 2000).

We also use the observations presented by Agarwal and Mannem (2011) in their keyword selection step, such as: a keyword must not repeat in the sentence and its term frequency should not be high, a keyword should not be the entire sentence, etc. We use the score given by the TopicS tool to filter out keywords with high frequency.

The above criteria reduce the potential keywords' list by a significant amount. Among the remaining keywords, our system gives preference to NEs (persons, locations, organizations, numbers and dates, in that order), then noun phrases, then verb phrases. To preserve the overall quality of a set of generated

2.3 Distractor Selection

To present more meaningful and useful distractors, this stage is domain dependent and also uses a knowledge base. The system extracts clues from the sentences to present meaningful distractors. The knowledge base is collected by crawling players' pages available at http://www.espncricinfo.com. Each page has a variety of information about the player, such as name, playing style, birth date, playing role, major teams etc. This information is widely used to make better choices throughout the system. Sample rows and columns from the database of players are shown in Table 1.
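The keyword-blanking and knowledge-base-driven distractor stages described above can be sketched as follows. This is a minimal illustration, not the authors' code: the mini knowledge base, the `make_cloze` helper, and its attribute names (`role`, `team`) are hypothetical stand-ins for the crawled espncricinfo data and the clue-extraction logic of the paper.

```python
# Hypothetical mini knowledge base, standing in for the crawled player pages.
KNOWLEDGE_BASE = {
    "Zaheer Khan":    {"role": "Bowler", "team": "India"},
    "Munaf Patel":    {"role": "Bowler", "team": "India"},
    "Praveen Kumar":  {"role": "Bowler", "team": "India"},
    "Lasith Malinga": {"role": "Bowler", "team": "Sri Lanka"},
    "MS Dhoni":       {"role": "Wicketkeeper", "team": "India"},
}

def make_cloze(sentence, keyword, kb, n_distractors=3):
    """Blank out the keyword in the sentence and pick distractors that share
    the answer's playing role, mimicking the clue-based distractor choice."""
    if keyword not in sentence or keyword not in kb:
        return None
    question = sentence.replace(keyword, "_____")
    role = kb[keyword]["role"]
    distractors = [p for p in kb if p != keyword and kb[p]["role"] == role]
    return {
        "question": question,
        "answer": keyword,
        "choices": sorted(distractors[:n_distractors] + [keyword]),
    }

cq = make_cloze(
    "Zaheer Khan opened his account with three consecutive maidens "
    "in the world-cup final.",
    "Zaheer Khan",
    KNOWLEDGE_BASE,
)
print(cq["question"])
print(cq["choices"])
```

On Example 1 this yields the blanked question sentence with four same-role alternatives, matching the (a)-(d) choices shown earlier; in the real system the candidate keyword would first survive the pruning observations of Section 2.2.1.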