Which Is the Effective Way for Gaokao: Information Retrieval Or Neural Networks? 1 1,2 1 Shangmin Guo† , Xiangrong Zeng† , Shizhu He , Kang Liu1 and Jun Zhao1
Total Page:16
File Type:pdf, Size:1020Kb
Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks? 1 1,2 1 Shangmin Guo† , Xiangrong Zeng† , Shizhu He , Kang Liu1 and Jun Zhao1 1National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China 2University of Chinese Academy of Sciences, Beijing, 10049, China shangmin.guo, xiangrong.zeng @nlpr.ia.ac.cn { shizhu.he, kliu, jzhao @nlpr.ia.ac.cn} { } After the World War II, U.S. and Soviet Union are fighting against each other in politics, economics and military. To promote the development of economics in Socialist Countries, Abstract Soviet Union establish The Council for Mutual Economic Assistance. This is against A. Truman Doctrine B. Marshall Plan C. NATO D. Federal Republic of Germany As one of the most important test of China, Entity Question Gaokao is designed to be difficult enough From Qin and Han Dynasties to Ming Dynasty, businessmen are always at the bottom of to distinguish the excellent high school hierarchy. One reason for this is that the ruling class thought the businessmen A. are not engaged in production B. do not respect Confucianism students. In this work, we detailed the C. do not respect the clan D. do not pay tax Gaokao History Multiple Choice Ques- Sentence Question tions(GKHMC) and proposed two differ- ent approaches to address them using var- ious resources. One approach is based on Figure 1: Examples of questions and their types. entity search technique (IR approach), the The upper one is an entity question. The lower other is based on text entailment approach one is a sentence question. where we specifically employ deep neu- ral networks(NN approach). The result of questions and essays and it covers several dif- experiment on our collected real Gaokao ferent subjects, like Chinese, Math, History and questions showed that they are good at etc. In this work, we focus on Gaokao History different categories of questions, i.e. IR Multiple Choice questions which is denoted as approach performs much better at entity GKHMC. Both of the factoid question answering questions(EQs) while NN approach shows task and reading comprehension task are similar to its advantage on sentence questions(SQs). GKHMC. But, the GKHMC questions have their Our new method achieves state-of-the-art own characteristics. performance and show that it’s indispens- A multiple-choice question in GKHMC such able to apply hybrid method when partici- as the examples shown in Figure 1 is composed pating in the real-world tests. of a question stem and four candidates. Our goal is to figure out the only one correct candi- 1 Introduction date. But, there are certain obstacles to achieve it. First, several background sentencess and a lead-in Gaokao, namely the National College Entrance sentence conjointly constitutes the question stem, Examination, is the most important examination which makes these questions more complicated for Chinese senior high school students. Ev- than former one-sentence-long factoid questions ery college in China, no matter it is Top10 or that can be handled by the existing approaches, Top100, would only accept the exam-takers whose like (Kolomiyet and Moens, 2011; Kwiatkowski Gaokao score is higher than its threshold score. et al., 2013; Berant and Liang, 2014; Yih et al., As there are almost 10 million students take the 2015). Secondly, the background sentences gener- examination every year, Gaokao needs to be dif- ally contain various clues to figure out the histori- ficult enough to distinguish the excellent students. cal events or personages which may be the perdue Therefore, it includes various types of questions key to answer the question. These clues may in- such as multiple-choice questions, short-answer clude Tang poem and Song iambic verse, domain- † Both of the two authors contributed equally to this paper. specific expressions, even some mixture of mod- 111 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 111–120, Valencia, Spain, April 3-7, 2017. c 2017 Association for Computational Linguistics Question Type Candidate Type Count from questions to form a query, then inquire this EQ Entities 160 SQ Sentence 584 query in all the text resources to get the most rele- ALL Whatever 744 vant candidate. In NN approach, we take the ques- tion text and every candidate to form four state- Table 1: The GKHMC dataset. ments respectively, then judge how possible every statement is right so that we can figure out which ern Chinese and excerpt from ancient books and is most likely to be the correct answer. etc. The dependence of background knowledge To test the two approaches’ performance, we makes the models that are designed for read- collected and classified the multiple-choice ques- ing comprehension such as (Penas˜ et al., 2013; tions in Gaokao test papers from 2011 to 2015 all Richardson et al., 2013) fail. Thirdly, the diver- over the country, and they are released. From the sity of candidates’ granularity, i.e. candidates can result, we find that the performance of two ap- either be entities or sentences, makes it harder proaches are significantly discrepant at each kind to match the candidate and stem. So, the an- of questions. That is, IR approach shows notice- swer selection is disparate from the former ap- able advantages on EQs, while NN approach per- proaches whose candidates are usually just enti- forms much better on SQs. This will be further ties. Lastly, as the candidates are already given, discussed in Section 4.4. the answer generation step in former neural net- In this paper, our contributions are as follows: work approaches based question answering sys- We gave a detailed description of the Gaokao • tem is no longer necessary. History Multiple Choice Questions task and As mentioned above and shown in Figure showed its importance and difficulty. 1, in accordance with candidates’ granularity, the GKHMC questions can be divided into two We released a dataset1 for this task. The • types: entity questions(EQs) and sentence ques- dataset is manually collected and classified. tions(SQs). Entity questions are those whose can- All questions in the dataset are real Gaokao didates are all entities, no matter they are peo- quesitons from 2011 to 2015. ple, dynasties, warfares or something else. And, We introduced two different approaches for sentence questions are those whose candidates are • all sentences. We observe that such two types of this task. Each approach achieved a promis- questions have their own specific characteristics. ing results. We also compared this two ap- Most of background sentences in EQs are descrip- proaches and found that they are complemen- tion of the right candidate, so it may be partic- tary, i.e. they are good at different types of ularly suitable to apply information retrieval like questions. approach to handle them. Meanwhile, as the back- We introduced permanent provisional mem- • ground sentences and lead-in sentences in SQs are ory network(PPMN) to model the joint back- more like the entailing text, these questions aren’t ground knowledge and sentences in question appropriate to be addressed by lexically searching stem, and it beats existing memory networks and matching. Therefore, it seems that it’s more on SQs. resonable to resolve SQs by using textual reason- ing techniques. 2 Dataset In this paper, we wonder about which kind of As described in the Introduction, we collected the approach is more effective for GKHMC. Further- historical multiple-choice questions from Gaokao more, whether we should select specific method to all over the country in rencent five years. However, work out different types of questions. In terms of quite a lot contain graphs or tables which require various characteristics of GKHMC questions, we the techniques beyond natural language process- introduce two independent approaches to address ing(NLP). So, we filter out this part of questions them. One is based on entity search technique (IR and manually classified the left into two parts: approach) and the other is based on a text entail- EQs and SQs. The number of different kinds of ment approach where we specifically employ deep questions are listed in Table 1. The examples of neural networks (NN approach). In IR approach, we use the key entities and relationships extracted 1 https://github.com/IACASNLPIR/GKHMC/tree/master/data 112 different types of questions translated into English are shown in Figure 1. It is worth mentioning that there is a special type of questions on test papers named sequential ques- tions. The candidates of this kind of questions are just some ordered numbers. Every number stands for a certain content which is given in question stem. We simply replace every sequential number in candidates with their corresponding contents. Then, we can classify these questions as EQs or SQs according to the type of contents. We also collected a wide diversity of re- sources including Baidu Encyclopedia, textbooks Figure 2: Pipeline of IR approach. and practice questions as our external knowledge when inquiring the generated query. Baidu En- cyclopedia which is also known as Baidu Baike, validation on the GKHMC dataset and the results is something like Wikipedia, but the content of are 90.00% precision and 84.38% recall in EQs it is written in Chinese. We denote this resource and 95.79% precision and 97.43% recalls in SQs. as BAIKE. The textbooks resource contains three 3.1.2 Score Functions compulsory history textbooks published by Peo- To calculate the relevance between question stem ple’s Education Press. We denote them as BOOK. and candidates, we introduce three different meth- And we gathered about 50,000 practice questions ods with seven score functions, which are summa- and their answers, and this is denoted as TIKU.