arXiv:2004.12704v1 [cs.CL] 27 Apr 2020

Semantic Graphs for Generating Deep Questions

Liangming Pan1,2 Yuxi Xie3 Yansong Feng3 Tat-Seng Chua2 Min-Yen Kan2
1NUS Graduate School for Integrative Sciences and Engineering
2School of Computing, National University of Singapore, Singapore
3Wangxuan Institute of Computer Technology, Peking University

Abstract

This paper proposes the problem of Deep Question Generation (DQG), which aims to generate complex questions that require reasoning over multiple pieces of information in the input passage. To capture the global structure of the document and facilitate reasoning, we propose a novel framework that first constructs a semantic-level graph for the input document and then encodes the semantic graph by introducing an attention-based GGNN (Att-GGNN). Afterwards, we fuse the document-level and graph-level representations to perform joint training of content selection and question decoding. On the deep-question-centric HotpotQA dataset, our model greatly improves performance on questions requiring reasoning over multiple facts, achieving state-of-the-art performance. The code is publicly available at https://github.com/WING-NUS/SG-Deep-Question-Generation.

a) Example of Shallow Question Generation
Input Sentence: Oxygen is used in cellular respiration and released by photosynthesis, which uses the energy of sunlight to produce oxygen from water.
Question: What life process produces oxygen in the presence of light?
Answer: Photosynthesis

b) Example of Deep Question Generation
Input Paragraph A: Pago Pago International Airport
Pago Pago International Airport, also known as Tafuna Airport, is a public airport located 7 miles (11.3 km) southwest of the central business district of Pago Pago, in the village and plains of Tafuna on the island of Tutuila in American Samoa, an unincorporated territory of the United States.
Input Paragraph B: Hoonah Airport
Hoonah Airport is a state-owned public-use airport located one nautical mile (2 km) southeast of the central business district of Hoonah, Alaska.
Question: Are Pago Pago International Airport and Hoonah Airport both on American territory?
Answer: Yes

Figure 1: Examples of shallow/deep QG. The evidence needed to generate the question is highlighted.

1 Introduction

Question Generation (QG) systems play a vital role in question answering (QA), dialogue systems, and automated tutoring applications, by enriching the training QA corpora, helping chatbots start conversations with intriguing questions, and automatically generating assessment questions, respectively. Existing QG research has typically focused on generating factoid questions relevant to one fact obtainable from a single sentence (Duan et al., 2017; Zhao et al., 2018; Kim et al., 2019), as exemplified in Figure 1 a). However, the comprehension and reasoning aspects of questioning have been less explored, resulting in questions that are shallow and not reflective of the true creative human process.

People have the ability to ask deep questions about events, evaluation, opinions, synthesis, or reasons, usually in the form of Why, Why-not, How, or What-if, which requires an in-depth understanding of the input source and the ability to reason over disjoint relevant contexts; e.g., asking Why did Gollum betray his master Frodo Baggins? after reading the fantasy novel The Lord of the Rings. Learning to ask such deep questions has intrinsic research value concerning how human intelligence embodies the skills of curiosity and integration, and will have broad application in future intelligent systems. Despite a clear push towards answering deep questions (exemplified by multi-hop reading comprehension (Cao et al., 2019) and commonsense QA (Rajani et al., 2019)), generating deep questions remains uninvestigated. There is thus a clear need to push QG research towards generating deep questions that demand higher cognitive skills.

In this paper, we propose the problem of Deep Question Generation (DQG), which aims to generate questions that require reasoning over multiple pieces of information in the passage. Figure 1 b) shows an example of a deep question that requires comparative reasoning over two disjoint pieces of evidence. DQG introduces three additional challenges that are not captured by traditional QG systems. First, unlike generating questions from a single sentence, DQG requires document-level understanding, which may introduce long-range dependencies when the passage is long. Second, we must be able to select relevant contexts to ask meaningful questions; this is non-trivial as it involves understanding the relation between disjoint pieces of information in the passage. Third, we need to ensure correct reasoning over multiple pieces of information so that the generated question is answerable by information in the passage.

To facilitate the selection of and reasoning over disjoint relevant contexts, we distill important information from the passage and organize it as a semantic graph, in which the nodes are extracted based on semantic role labeling or dependency parsing and connected by different intra- and inter-semantic relations (Figure 2). Semantic relations provide important clues about what content is question-worthy and what reasoning should be performed; e.g., in Figure 1, both the entities Pago Pago International Airport and Hoonah Airport have the located at relation with a city in the United States. It is then natural to ask a comparative question: e.g., Are Pago Pago International Airport and Hoonah Airport both on American territory? To efficiently leverage the semantic graph for DQG, we introduce three novel mechanisms: (1) proposing a novel graph encoder, which incorporates an attention mechanism into the Gated Graph Neural Network (GGNN) (Li et al., 2016), to dynamically model the interactions between different semantic relations; (2) fusing the word-level passage embeddings and the node-level semantic graph representations to obtain a unified semantic-aware passage representation for question decoding; and (3) introducing an auxiliary content selection task that is trained jointly with question decoding, which assists the model in selecting relevant contexts in the semantic graph to form a proper reasoning chain.

We evaluate our model on HotpotQA (Yang et al., 2018), a challenging dataset in which the questions are generated by reasoning over text from separate Wikipedia pages. Experimental results show that our model, which incorporates both the semantic graph and the content selection task, improves performance by a large margin in terms of both automated metrics (Section 4.3) and human evaluation (Section 4.5). Error analysis (Section 4.6) validates that our use of the semantic graph greatly reduces the number of semantic errors in generated questions. In summary, our contributions are: (1) the first work, to the best of our knowledge, to investigate deep question generation; (2) a novel framework that combines a semantic graph with the input passage to generate deep questions; and (3) a novel graph encoder that incorporates attention into a GGNN approach.

2 Related Work

Question generation aims to automatically generate questions from textual inputs. Rule-based techniques for QG usually rely on manually designed rules or templates to transform a piece of given text into questions (Heilman, 2011; Chali and Hasan, 2012). These methods are confined to a limited set of transformation rules or templates, making the approach difficult to generalize. Neural approaches take advantage of the sequence-to-sequence (Seq2Seq) framework with attention (Bahdanau et al., 2014). These models are trained in an end-to-end manner, requiring far less labor and enabling better language flexibility compared with rule-based methods. A comprehensive survey of QG can be found in Pan et al. (2019).

Many improvements have been proposed since the first Seq2Seq model of Du et al. (2017): applying various techniques to encode the answer information, thus allowing for better-quality answer-focused questions (Zhou et al., 2017; Sun et al., 2018; Kim et al., 2019); improving training by combining supervised and reinforcement learning to maximize question-specific rewards (Yuan et al., 2017); and incorporating various linguistic features into the QG process (Liu et al., 2019a). However, these approaches only consider sentence-level QG. In contrast, our work focuses on the challenge of generating deep questions with multi-hop reasoning over document-level contexts.

Recently, work has started to leverage paragraph-level contexts to produce better questions. Du and Cardie (2018) incorporated coreference knowledge to better encode entity connections across documents. Zhao et al. (2018) applied a gated self-attention mechanism to encode contextual information. However, in practice, semantic structure is difficult to distill solely via self-attention over the entire document. Moreover, despite considering longer contexts, these works are trained and evaluated on SQuAD (Rajpurkar et al., 2016), which we argue is insufficient for evaluating deep QG because more than 80% of its questions are shallow and only relevant to information confined to a single sentence (Du et al., 2017).

[Figure residue: input example with the question "The 'Happy Fun Ball' was the subject of a series of parody advertisements on a show created by who?" and a panel labelled "Content Selection Prediction".]
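To make the semantic-graph construction described in the Introduction more concrete, the following Python sketch builds a toy graph for the Figure 1 b) example. It is an illustration under stated assumptions, not the authors' pipeline: the SRL-style frames are hand-written simplifications of the two input paragraphs rather than output from a real semantic role labeler, and the inter-passage SIMILAR edges come from a hypothetical token-overlap (Jaccard) rule with an arbitrary 0.5 threshold. The names SemanticGraph, build_graph, and comparable_entities are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGraph:
    nodes: list = field(default_factory=list)  # node texts (predicates and arguments)
    edges: list = field(default_factory=list)  # (src, dst, label) triples

    def add_node(self, text):
        self.nodes.append(text)
        return len(self.nodes) - 1

def tokens(text):
    return set(text.lower().replace(",", " ").split())

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_graph(frames, threshold=0.5):
    """frames: list of (passage_id, predicate, {role: argument_text})."""
    g = SemanticGraph()
    owner = []  # passage id of each node, parallel to g.nodes
    for pid, pred, args in frames:
        p = g.add_node(pred)
        owner.append(pid)
        for role, text in args.items():
            a = g.add_node(text)
            owner.append(pid)
            g.edges.append((a, p, role))  # intra-relation: argument -> predicate
    # Assumption: inter-passage SIMILAR edges from token overlap >= threshold.
    for i in range(len(g.nodes)):
        for j in range(i + 1, len(g.nodes)):
            if owner[i] != owner[j] and jaccard(g.nodes[i], g.nodes[j]) >= threshold:
                g.edges.append((i, j, "SIMILAR"))
    return g

def comparable_entities(g, role="ARG1"):
    """Pairs of `role` arguments whose predicates are linked by a SIMILAR edge."""
    args_of = {}
    for src, dst, label in g.edges:
        if label == role:
            args_of.setdefault(dst, []).append(g.nodes[src])
    pairs = []
    for src, dst, label in g.edges:
        if label == "SIMILAR" and src in args_of and dst in args_of:
            pairs.extend((a, b) for a in args_of[src] for b in args_of[dst])
    return pairs

# Hypothetical, hand-written SRL-style frames (not real labeler output).
frames = [
    ("A", "located", {"ARG1": "Pago Pago International Airport",
                      "ARGM-LOC": "American Samoa, an unincorporated territory of the United States"}),
    ("B", "located", {"ARG1": "Hoonah Airport", "ARGM-LOC": "Hoonah, Alaska"}),
]
g = build_graph(frames)
print(comparable_entities(g))
```

Because both located predicates end up linked, the two airports surface as a comparable pair, mirroring the comparative question in Figure 1 b). A string-overlap rule like this is brittle; a practical system would likely need a more robust similarity signal.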
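The Att-GGNN encoder is only named in this section, so the sketch below shows one plausible single propagation step rather than the paper's formulation. Assumptions: each relation type has its own randomly initialised message matrix, incoming messages are weighted by a dot-product softmax attention against the receiving node's state, and the aggregated message updates the node through a standard GRU cell, as in a GGNN. The class name AttGGNNLayer, its parameters, and the toy graph are hypothetical; plain Python lists keep it dependency-free.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rng, d):
    return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]

class AttGGNNLayer:
    """Hypothetical Att-GGNN step (assumed design, not the paper's equations):
    relation-specific messages, dot-product attention over incoming neighbours,
    then a GRU update of each node state."""

    def __init__(self, dim, relations, seed=0):
        rng = random.Random(seed)
        # Assumption: one message matrix per semantic relation type.
        self.W_rel = {rel: rand_matrix(rng, dim) for rel in relations}
        # GRU parameters: (input weight, hidden weight) for update, reset, candidate.
        self.gru = {g: (rand_matrix(rng, dim), rand_matrix(rng, dim)) for g in ("z", "r", "h")}

    def attention(self, h, i, incoming):
        """incoming: list of (source_node, relation); returns (weights, messages)."""
        msgs = [matvec(self.W_rel[rel], h[j]) for j, rel in incoming]
        scores = [sum(a * b for a, b in zip(h[i], msg)) for msg in msgs]
        return softmax(scores), msgs

    def step(self, h, edges):
        """h: list of node vectors; edges: directed (src, dst, relation) triples."""
        dim = len(h[0])
        Wz, Uz = self.gru["z"]
        Wr, Ur = self.gru["r"]
        Wh, Uh = self.gru["h"]
        new_h = []
        for i, h_i in enumerate(h):
            incoming = [(src, rel) for src, dst, rel in edges if dst == i]
            m = [0.0] * dim  # nodes without incoming edges receive a zero message
            if incoming:
                weights, msgs = self.attention(h, i, incoming)
                for a, msg in zip(weights, msgs):
                    m = [acc + a * x for acc, x in zip(m, msg)]
            z = sigmoid(add(matvec(Wz, m), matvec(Uz, h_i)))  # update gate
            r = sigmoid(add(matvec(Wr, m), matvec(Ur, h_i)))  # reset gate
            rh = [ri * hi for ri, hi in zip(r, h_i)]
            cand = [math.tanh(x) for x in add(matvec(Wh, m), matvec(Uh, rh))]
            new_h.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_i, cand)])
        return new_h

# Toy usage on a 6-node graph shaped like the Figure 1 b) example (made-up states).
layer = AttGGNNLayer(dim=4, relations=["ARG1", "ARGM-LOC", "SIMILAR"], seed=1)
rng = random.Random(2)
h0 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
edges = [(1, 0, "ARG1"), (0, 1, "ARG1"), (2, 0, "ARGM-LOC"),
         (0, 3, "SIMILAR"), (3, 0, "SIMILAR"), (4, 3, "ARG1")]
h1 = layer.step(h0, edges)
```

Each new state is a convex combination of the old state and a tanh candidate, so states that start in [-1, 1] stay there. Stacking several such steps would let information travel along multi-hop paths such as airport -> located -> SIMILAR -> located.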
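The auxiliary content selection task is described as being trained jointly with question decoding. A common way to do this, assumed here rather than taken from the paper, is to score each graph node with a logistic classifier, compare the scores with 0/1 labels marking evidence nodes using binary cross-entropy, and add that loss to the decoder's negative log-likelihood with a weight lam. The function names, the weight value, and the toy numbers below are all hypothetical.

```python
import math

def select_prob(h_i, w, b=0.0):
    """Probability that a node belongs to the reasoning chain (logistic score)."""
    return 1.0 / (1.0 + math.exp(-(sum(x * y for x, y in zip(h_i, w)) + b)))

def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between node probabilities and 0/1 evidence labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def nll(token_probs):
    """Mean negative log-likelihood of the gold question tokens under the decoder."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def joint_loss(token_probs, node_probs, node_labels, lam=0.5):
    """Question-decoding loss plus a weighted content-selection loss (hypothetical lam)."""
    return nll(token_probs) + lam * bce(node_probs, node_labels)

# Toy usage with made-up numbers: two evidence nodes (label 1) and one distractor.
node_states = [[0.8, 0.1], [0.7, 0.3], [-0.6, 0.2]]
w = [2.0, 0.5]
node_probs = [select_prob(h, w) for h in node_states]
loss = joint_loss([0.4, 0.6, 0.9], node_probs, [1, 1, 0])
```

Sharing one loss lets gradients from the selection labels shape the same node representations the decoder attends to, which is the intuition behind using content selection to guide the reasoning chain.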
