Asking Clarification Questions in Knowledge-Based Question Answering

Jingjing Xu1,∗ Yuechen Wang2, Duyu Tang3, Nan Duan3, Pengcheng Yang1, Qi Zeng1, Ming Zhou3, Xu Sun1
1 MOE Key Lab of Computational Linguistics, School of EECS, Peking University
2 University of Science and Technology of China
3 Microsoft Research Asia, Beijing, China
{jingjingxu,yang pc,pkuzengqi,xusun}@pku.edu.cn, [email protected]
{dutang,nanduan,mingzhou}@microsoft.com

Abstract

The ability to ask clarification questions is essential for knowledge-based question answering (KBQA) systems, especially for handling ambiguous phenomena. Despite its importance, clarification has not been well explored in current KBQA systems. Further progress requires supervised resources for training and evaluation, and powerful models for clarification-related text understanding and generation. In this paper, we construct a new clarification dataset, CLAQUA, with nearly 40K open-domain examples. The dataset supports three serial tasks: given a question, identify whether clarification is needed; if yes, generate a clarification question; then predict answers based on external user feedback. We provide representative baselines for these tasks and further introduce a coarse-to-fine model for clarification question generation. Experiments show that the proposed model achieves better performance than strong baselines. Further analysis demonstrates that our dataset brings new challenges and that several problems remain unsolved, such as reasonable automatic evaluation metrics for clarification question generation and powerful models for handling entity sparsity.1

Figure 1: An example of a clarification question in KBQA. There are two entities named "Midori" using different programming languages, which confuses the system. The user asks "What are the languages used to create the source code of Midori?", the system asks "When you say the source code language used in the program Midori, are you referring to web browser Midori or operating system Midori?", and the user answers "I mean the first one."

∗ The work was done while Jingjing Xu and Yuechen Wang were interns at Microsoft Research Asia.
1 The dataset and code will be released at https://github.com/msra-nlc/MSParS_V2.0

1 Introduction

Clarification is an essential ability for knowledge-based question answering, especially when handling ambiguous questions (Demetras et al., 1986). In real-world scenarios, ambiguity is a common phenomenon, as many questions are not clearly articulated, e.g., "What are the languages used to create the source code of Midori?" in Figure 1. There are two entities named "Midori" using different programming languages, which confuses the system. For these ambiguous or confusing questions, it is hard to directly give satisfactory responses unless systems can ask clarification questions to confirm the participant's intention. Therefore, this work explores how to use clarification to improve current KBQA systems.

We introduce an open-domain clarification corpus, CLAQUA, for KBQA. Unlike previous clarification-related datasets with limited annotated examples (De Boni and Manandhar, 2003; Stoyanchev et al., 2014) or restricted to specific domains (Li et al., 2017; Rao and Daumé III, 2018), our dataset covers various domains and supports three tasks. The comparison of our dataset with relevant datasets is shown in Table 1. Our dataset considers two kinds of ambiguity, for single-turn and multi-turn questions. In the single-turn case, an entity name refers to multiple possible entities in a knowledge base while the current utterance lacks the necessary identifying information. In the multi-turn case, ambiguity mainly comes from omission, where a pronoun refers to multiple possible entities from the previous conversation turn.
Dataset | Domain | Size | Task
De Boni and Manandhar (2003) | Open domain | 253 | Clarification question generation.
Stoyanchev et al. (2014) | Open domain | 794 | Clarification question generation.
Li et al. (2017) | Movie | 180K | Learning to generate responses based on previous clarification questions and user feedback in dialogue.
Guo et al. (2017) | Synthetic | 100K | Learning to ask clarification questions in reading comprehension.
Rao and Daumé III (2018) | Operating system | 77K | Ranking clarification questions in an online QA forum, StackExchange.2
CLAQUA (our dataset) | Open domain | 40K | Clarification in KBQA, supporting clarification identification, clarification question generation, and clarification-based question answering.

Table 1: The comparison of our dataset with relevant datasets. The first and second datasets focus on generating clarification questions, while their small size limits their applications. The middle three datasets are either not designed for KBQA or limited to specific domains. Unlike them, we present an open-domain dataset for KBQA.

2 An online QA community.

Unlike the CSQA dataset (Saha et al., 2018), which constructs clarification questions based on predicate-independent templates, our clarification questions are predicate-aware and more diverse. Based on the clarification raising pipeline, we formulate three tasks in our dataset: clarification identification, clarification question generation, and clarification-based question answering. These tasks can naturally be integrated into existing KBQA systems. Since asking too many clarification questions significantly lowers the conversation quality, we first design the task of identifying whether clarification is needed. If no clarification is needed, systems can directly respond. Otherwise, systems need to generate a clarification question to request more information from the participant. Then, external user feedback is used to predict answers.

Our main contribution is constructing a new clarification dataset for KBQA. To build high-quality resources, we elaborately design a data annotation pipeline, which can be divided into three steps: sub-graph extraction, ambiguous question annotation, and clarification question annotation. We design different annotation interfaces for single-turn and multi-turn cases. We first extract "ambiguous sub-graphs" from a knowledge base as raw materials. To enable systems to perform robustly across domains, the sub-graphs are extracted from an open-domain KB, covering domains like music, tv, book, film, etc. Based on the sub-graphs, annotators are required to write ambiguous questions and clarification questions.

We also contribute by implementing representative neural networks for the three tasks and further developing a new coarse-to-fine model for clarification question generation. In the task of clarification identification, the best performing system obtains an accuracy of 86.6%. For clarification question generation, our proposed coarse-to-fine model achieves a BLEU score of 45.02, better than strong baseline models. For clarification-based question answering, the best accuracy is 74.7%. We also conduct a detailed analysis of the three tasks and find that our dataset brings new challenges that need to be further explored, such as reasonable automatic evaluation metrics and powerful models for handling the sparsity of entities.

2 Data Collection

The construction process consists of three steps: sub-graph extraction, ambiguous question annotation, and clarification question annotation. We design different annotation interfaces for single-turn and multi-turn cases.

2.1 Single-Turn Annotation

Sub-graph Extraction. As shown in Figure 2, we extract ambiguous sub-graphs from an open-domain knowledge base, Satori. For simplification, we set the maximum number of ambiguous entities to 2. In single-turn cases, we focus on shared-name ambiguity, where two entities have the same name and there is a lack of necessary distinguishing information. To construct such cases, we extract two entities sharing the same entity name and the same predicate. A predicate represents the relation between two entities. The sub-graphs provide a reference for human annotators to write diverse ambiguous questions based on the shared predicates.

Figure 2: An extracted ambiguous sub-graph in the single-turn case. Two entities share the same name "Midori" and the same predicate "computer.software.language_used".

Shared Name | Midori
Predicate #1 | computer.software.language_used
Predicate Description | A property relating an instance of computer.software to a programming language that was used to create the source code for the software.
Qa | What are the languages used to create the source code of Midori?

Table 2: An ambiguous question annotation example in the single-turn case.

E1 Name | Midori
E1 Type | computer.software, computer.web_browser, media_common.creative_work
E1 Description | Midori is a free and open-source light-weight web browser. It uses the WebKit rendering engine and the GTK+ 2 or GTK+ 3 ...
E2 Name | Midori
E2 Type | computer.operating_system, computer.software
E2 Description | Midori was the code name for a managed code operating system being developed by Microsoft with joint effort of Microsoft Research ...
Qa | What are the languages used to create the source code of Midori?
Qc | When you say the source code language used in the program Midori, are you referring to [web browser Midori] or [operating system Midori]?

Table 3: A clarification question annotation example in the single-turn case.
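The single-turn extraction step above reduces to a simple grouping criterion: two entities form an ambiguous pair if they share both a surface name and a predicate. The following is a minimal sketch of that criterion over a toy triple list; the triple and name formats are assumptions for illustration, not the Satori schema or the released data format.

```python
from collections import defaultdict
from itertools import combinations

# Toy KB fragment: (entity_id, predicate, object) triples plus entity names.
# Identifiers and field layout are illustrative only.
triples = [
    ("m.01", "computer.software.language_used", "C"),
    ("m.02", "computer.software.language_used", "M#"),
    ("m.03", "computer.software.language_used", "Python"),
]
names = {"m.01": "Midori", "m.02": "Midori", "m.03": "Blender"}

def single_turn_candidates(triples, names):
    """Yield entity pairs that share both a surface name and a predicate,
    i.e. the shared-name ambiguity targeted in Section 2.1."""
    by_key = defaultdict(set)
    for entity, predicate, _ in triples:
        by_key[(names[entity], predicate)].add(entity)
    for (name, predicate), entities in by_key.items():
        for e1, e2 in combinations(sorted(entities), 2):
            yield name, predicate, e1, e2

print(list(single_turn_candidates(triples, names)))
# -> [('Midori', 'computer.software.language_used', 'm.01', 'm.02')]
```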

Ambiguous Question Annotation. In this step, the input is a table listing the shared entity name, predicate name, and predicate description. Based on this table, annotators need to write ambiguous questions, e.g., "What are the languages used to create the source code of Midori?". An annotation example is shown in Table 2. For diversity, annotators are encouraged to paraphrase the intention words in the predicate description.

Clarification Question Annotation. Based on the entities and the annotated ambiguous question, annotators are required to summarize distinguishing information and write a multi-choice clarification question with a special marker separating entity information from pattern information, e.g., "When you say the source code language used in the program Midori, are you referring to [web browser Midori] or [operating system Midori]?". Annotators are asked to write predicate-aware clarification questions instead of general questions, like "Which one do you mean, A or B?" or "Do you mean A?" as adopted in Saha et al. (2018). An example is shown in Table 3.

We use multi-choice as our basic type of clarification. There are three possible clarification question types: the zero-choice type (e.g., "I don't understand, can you provide more details?"), the single-choice type (e.g., "Do you mean A?"), and the multi-choice type (e.g., "Which one do you mean, A or B?"). The zero-choice type means that the system does not understand the question and expects more details from the participant. Though simple, it costs more user effort, as it pushes the participant to figure out what confuses the system. In comparison, the advantage of the single-choice type lies in the control of the conversation direction. However, if the system cannot provide the user-expected choices, the conversation may become longer. As for the multi-choice type, its advantage is that fewer conversation turns are needed to ask for more valuable information, while it requires more annotation work.

2.2 Multi-Turn Annotation

Sub-graph Extraction. In multi-turn cases, ambiguity comes from the omission of the target entity name from the previous conversation turn. We first extract two connected entities. If they also share another predicate, we extract all related information as an ambiguous sub-graph, as shown in Figure 3. By asking about the shared predicate, we get an ambiguous question.

Figure 3: An extracted ambiguous sub-graph in the multi-turn case. "Lineodes caracasia" and "Lineodes" are linked and share the same relation "organism_classification".

Ambiguous Question Annotation. An annotation example is shown in Table 4. Based on the two entities in the extracted sub-graph, annotators construct a conversation turn and then write an ambiguous question where the entity name is omitted, e.g., "What is the biological classification?".

E1 Name | Lineodes caracasia
E2 Name | Lineodes
Predicate #1 | biology.higher_classification
Predicate #1 Description | Property relating a biology.organism_classification to its higher or parent biology.organism_classification.
Predicate #2 | biology.organism_classification
Predicate #2 Description | Relating a biology.organism to its biology.organism_classification, the formal or informal taxonomy grouping for the organism.
Qh | What is the higher classification for Lineodes caracasia?
Rh | Lineodes.
Qa | What is the biological classification?

Table 4: An ambiguous question annotation example in the multi-turn case.

Clarification Question Annotation. The annotation guideline of this step is the same as in single-turn annotation. The input includes the two candidate entities and the annotated ambiguous question. The output is a clarification question. An annotation example is shown in Table 5.

E1 Name | Lineodes caracasia
E1 Type | media_common.cataloged_instance, biology.organism_classification, ...
E1 Description | Lineodes caracasia is a moth in the family ...
E2 Name | Lineodes
E2 Type | media_common.cataloged_instance, biology.organism_classification
E2 Description | Lineodes is a genus of moths of the Crambidae family ...
Qh | What is the higher classification for Lineodes caracasia?
Rh | Lineodes.
Qa | What is the biological classification?
Qc | Are you referring to [Lineodes caracasia] or [Lineodes], when you ask the biological classification?

Table 5: A clarification question annotation example in the multi-turn case.
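The multi-turn criterion can be stated similarly: keep entity pairs that are directly linked by one predicate and that also share a second predicate, which is the one the ambiguous follow-up question asks about. A small sketch under the same assumed toy triple format as above:

```python
from collections import defaultdict

# Toy triples; identifiers and predicates are illustrative.
triples = [
    ("lineodes_caracasia", "biology.higher_classification", "lineodes"),
    ("lineodes_caracasia", "biology.organism_classification", "species"),
    ("lineodes", "biology.organism_classification", "genus"),
]

def multi_turn_candidates(triples):
    """Yield (e1, linking_predicate, e2, shared_predicate) for connected
    entity pairs that also share a second predicate (Section 2.2)."""
    preds = defaultdict(set)          # entity -> predicates it has
    for s, p, o in triples:
        preds[s].add(p)
    for s, p, o in triples:           # s --p--> o is the link
        if o in preds:
            for shared in preds[s] & preds[o]:
                if shared != p:
                    yield s, p, o, shared

print(list(multi_turn_candidates(triples)))
# -> [('lineodes_caracasia', 'biology.higher_classification',
#      'lineodes', 'biology.organism_classification')]
```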

3 Tasks

Our dataset aims to enable systems to ask clarification questions in open-domain question answering. Three tasks are designed in this work: clarification identification, clarification question generation, and clarification-based question answering. Figure 4 shows the distribution of the top-10 domains in our dataset.

Figure 4: The distribution of the top-10 domains (film, tv, book, people, traffic, sports, geography, location, music, and organization; each accounts for roughly 6-14% of the examples).

3.1 Clarification Identification

Clarification identification can be regarded as a classification problem. It takes the conversation context and candidate entity information as input. The output is a label identifying whether clarification is needed. Specifically, the input is {Qa, E1, E2} for single-turn cases or {Qp, Rp, Qa, E1, E2} for multi-turn cases. Qa represents the current question. E1 and E2 represent the candidate entities. Qp and Rp are the question and response from the previous conversation turn. The output is a binary label from the set Y = {0, 1}, where 1 indicates that the input question is ambiguous and 0 indicates the opposite.

As Figure 5 shows, negative examples are annotated in the same way as the positive (ambiguous) examples, but without the clarification-related steps. In their sub-graphs, one of the entities has its own unique predicate. The unique predicates are included in the user questions, like "Where is its mouth place?". As only the river "Nile" has a mouth place, this question is unambiguous.

Figure 5: Negative examples of unambiguous sub-graphs and annotated conversations. The dotted line means the non-existent relation. The upper figure is the single-turn case and the lower one is the multi-turn case.

The statistics of our dataset are shown in Table 6. It is important to note that the three tasks share the same split. Since clarification question generation and clarification-based question answering are built upon ambiguous situations, they only use the positive (ambiguous) data in their training, development, and test sets. For generalization, we add some examples with unseen entities and predicates into the development and test sets.

Split | Single-Turn Positive | Single-Turn Negative | Multi-Turn Positive | Multi-Turn Negative
Train | 8,139 | 6,841 | 12,173 | 8,289
Dev | 487 | 422 | 372 | 601
Test | 637 | 673 | 384 | 444

Table 6: Statistics of the dataset. Note that the three tasks share the same split. Since clarification question generation and clarification-based question answering are built upon ambiguous situations, they only use the positive data in their training, development, and test sets.

3.2 Clarification Question Generation

This text generation task takes the ambiguous context and entity information as input, and then outputs a clarification question. For single-turn cases, the input is {Qa, E1, E2}. For multi-turn cases, the input is {Qp, Rp, Qa, E1, E2}. In both cases, the output is a clarification question Qc. We use BLEU as the automatic evaluation metric.

3.3 Clarification-Based Question Answering

As our dataset focuses on simple questions, this task can be simplified as the combination of entity identification and predicate identification. Entity identification is to extract the entity e from the candidate entities based on the current context. Predicate identification is to choose the predicate p that describes the ambiguous question. The extracted entity e and predicate p are combined to query the knowledge triple (e, p, o). The model is correct only when both sub-tasks are successfully completed.

Additional user responses toward clarification questions are also necessary for this task. Due to the strong pattern in responses, we use a template-based method to generate the feedback. We first design four templates: "I mean the first one", "I mean the second one", "I mean [entity name]", and "I mean [entity type] [entity name]". Then, for each clarification example, we randomly select a candidate entity and fill its information into the template as the user response, e.g., "I mean web browser Midori." for the clarification question in Table 3. For entity identification, we use the selected entity as the gold label. For predicate prediction, we use the predicate in the current ambiguous question as the gold label.3 In single-turn cases, the predicate label is Predicate #1, as shown in Table 2. In multi-turn cases, the predicate label is Predicate #2, as shown in Table 4.

3 This brings a little noise because some predicates may be omitted in the used knowledge base.

Entity identification and predicate identification are both classification tasks. They take the same information as input, including the ambiguous context, the clarification question, and the user response. For single-turn cases, the input is {Qa, E1, E2, Qc, Rc}, where Rc is the feedback of the participant toward the clarification question Qc. For multi-turn cases, the input is {Qp, Rp, Qa, E1, E2, Qc, Rc}. For entity identification, the output is a label from the set YE = {E1, E2}. For predicate identification, we randomly choose one negative predicate from the training set and combine it with the gold predicate as the candidate set YP. The output of predicate identification is the index of the gold predicate in YP.
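The template-based feedback simulation in Section 3.3 is purely mechanical and easy to reproduce. A minimal sketch, assuming a simple dictionary format with 'name' and 'type' fields for the candidate entities (the released data format may differ):

```python
import random

def simulate_feedback(entities, gold_predicate, seed=0):
    """Simulate the user's answer to a clarification question and build the
    gold labels for clarification-based QA (Section 3.3)."""
    rng = random.Random(seed)
    idx = rng.randrange(len(entities))           # which entity the user "means"
    chosen = entities[idx]
    templates = [
        # The ordinal template realizes "I mean the first/second one."
        "I mean the first one." if idx == 0 else "I mean the second one.",
        f"I mean {chosen['name']}.",
        f"I mean {chosen['type']} {chosen['name']}.",
    ]
    response = rng.choice(templates)
    return {"user_response": response,
            "gold_entity": idx,                  # label for entity identification
            "gold_predicate": gold_predicate}    # label for predicate identification

entities = [{"name": "Midori", "type": "web browser"},
            {"name": "Midori", "type": "operating system"}]
print(simulate_feedback(entities, "computer.software.language_used"))
```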
For predicate identification, we ran- 3.2 Clarification Question Generation domly choose 1 negative predicates from the train- This text generation task takes ambiguous con- ing set and combine them with the gold predicate text and entity information as input, and then out- together as the candidate set YP . The output of puts a clarification question. For single-turn cases, predicate identification is the index of the gold the input is {Qa,E1,E2}. For multi-turn cases, predicate in YP . the input is {Qp,Rp,Qa,E1,E2}. In both cases, the output is a clarification question Qc. We use 4 Approach BLEU as the automatic evaluation metric. For three tasks, we implement several representa- 3.3 Clarification-Based Question Answering tive neural networks as baselines. Here we intro- duce these models in detail. As our dataset focuses on simple questions, this task can be simplified as the combination of entity 4.1 Classification Identification Models identification and predicate identification. Entity The input of classification models is the ambigu- identification is to extract entity e from candidate ous context and entity information. For simpli- entities based on current context. Predicate identi- fication, we use a special symbol, [S], to con- fication is to choose predicate p that describes the catenate all the entity information together. These ambiguous question. The extracted entity e and two parts can be regarded as different source in- predicate p are combined to query the knowledge formation. Based on whether the inter-relation triple (e, p, o). The model is correct when both between different source information is explicitly tasks are successfully completed. explored, the classification models can be classi- Additional user responses toward clarification fied into two categories, unstructured models and questions are also necessary for this task. Due to structured models. the strong pattern in responses, we use a template- Unstructured models concatenate different based method to generate the feedback. We first source inputs into a long sequence with a spe- design four templates, “I mean the first one”, “I cial separation symbol [SEL]. We implement mean the second one”, “I mean [entity name]”, “I several widely-used sequence classification mod- mean [entity type] [entity name]”. Then for each els, including Convolutional Neural Network 3It brings a little noise because some predicates may be (CNN) (Kim, 2014), Long-Short Term Memory omitted in the used knowledge base. Network (LSTM) (Schuster and Paliwal, 1997), web browser operating [B] [A] Midori system Midori guage used in the program Midori, are you refer- ring to [A] or [B]?”). The entity phrase is summa-

Entity Info Encoder Decoder Decoder rized from the given entity information for distin- guishing between two entities. The pattern phrase Entity Rendering Module Ambiguous Context is used to locate the position of ambiguity, which Template What are the When you say the source languages is closely related with the context. In summary, code language used in used to create Encoder Decoder the program Midori, are the source you referring to [A] or [B]? two kinds of phrases refer to different source in- code of Midori?

Template Generating Module Hidden vector of [A] formation. Based on this feature, we propose a Hidden vector of [B] new coarse-to-fine model, as shown in Figure6. Similar ideas have been successfully applied to se- Figure 6: An illustration of the proposed coarse-to-fine mantic parsing (Dong and Lapata, 2018). model. The proposed model consists of a template gen- erating module and an entity rendering module. The The proposed model consists of a template gen- former is used to generate a clarification template based erating module Tθ and an entity rendering module on ambiguous question, e.g., “When you say the source Rφ. Tθ first generates a template containing pat- code language used in the program Midori, are you re- tern phrases and the symbolic representation of the ferring to [A] or [B]?”. The latter is used to fill up the entity phrases. Then, the symbolized entities con- generated template with detailed entity information. tained in the generated template are further prop- erly rendered by the entity rendering module Rφ to and Recurrent Convolutional Neural Network reconstruct complete entity information. Since the (RCNN) (Lai et al., 2015), Transformer. The annotated clarification questions explicitly sepa- details of the models are shown in Supplementary rate entity phrases and pattern phrases, we can Materials. easily build training data for these two modules. Structured models use separate structures to en- For clarity, the template is constructed by replac- code different source information and adopt an ing entity phrases in a clarification question with additional structure to model the inter-relation of special symbols, [A] and [B], which represent the source information. Specifically, we use two the positions of two entities. representative neural networks, Hierarchical At- The template generating module Tθ uses Trans- tention Network (HAN) (Yang et al., 2016) and former (Vaswani et al., 2017) as implementation. Dynamic Memory Network (DMN) (Kumar et al., The input is the ambiguous context and the output 2016), as our structured baselines. The details of is a clarification template. the models are shown in Supplementary Materials. Similar to Tθ, the entity rendering module Rφ is also implemented with a Transformer structure. 4.2 Clarification Question Generation More specifically, for each symbolized entity in Models the template, we add the hidden state of the de- The input of the generation model is the ambigu- coder of Tθ corresponding to this symbolized en- ous context and entity information. In single-turn tity into the decoder input. cases, the ambiguous context is current question Qa. In multi-cases, the input is current question 4.3 Clarification-Based Question Answering and the previous conversation turn {Qp,Rp,Qa}. Models We use [S] to concatenate all entity informa- This task contains two sub-tasks, entity identifi- tion together, and use [SEL] to concatenate entity cation and predicate identification. They share information and context information into a long the same input. In single-turn cases, the input is sequence. We adopt Seq2Seq (Bahdanau et al., {Qa,Qc,Rc,E1,E2}. In multi-cases, the input is 2015) and Transformer (Vaswani et al., 2017) as {Qp,Rp,Qa,Qc,Rc,E1,E2}. Considering these our baselines and further develop a new genera- two sub-tasks both belong to classification tasks, tion model. The details of baselines can be found we use the same models as used in the task of clar- in Supplementary Materials. ification identification for implementation.
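Because the annotated questions mark entity phrases explicitly (Table 3), building the training targets for the two modules is a mechanical step. Below is a minimal sketch of that data preparation, assuming the bracket convention matches the annotation format shown in Table 3; it is not the authors' released preprocessing code.

```python
import re

def build_coarse_to_fine_targets(annotated_question):
    """Split an annotated clarification question (entity phrases in brackets)
    into the two training targets of Section 4.2: a template with [A]/[B]
    placeholders and the entity phrases themselves."""
    entity_phrases = re.findall(r"\[(.*?)\]", annotated_question)
    template = annotated_question
    for placeholder, phrase in zip(["[A]", "[B]"], entity_phrases):
        template = template.replace(f"[{phrase}]", placeholder, 1)
    return template, entity_phrases

q = ("When you say the source code language used in the program Midori, "
     "are you referring to [web browser Midori] or [operating system Midori]?")
template, phrases = build_coarse_to_fine_targets(q)
print(template)   # ... are you referring to [A] or [B]?
print(phrases)    # ['web browser Midori', 'operating system Midori']
```

At inference time, Tθ generates such a template from the ambiguous context, and Rφ then replaces [A] and [B] with rendered entity phrases, conditioned on the corresponding decoder hidden states.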

4.3 Clarification-Based Question Answering Models

This task contains two sub-tasks, entity identification and predicate identification, which share the same input. In single-turn cases, the input is {Qa, Qc, Rc, E1, E2}. In multi-turn cases, the input is {Qp, Rp, Qa, Qc, Rc, E1, E2}. Considering that these two sub-tasks are both classification tasks, we use the same models as in the clarification identification task for implementation.
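As described in Section 3.3, the two predictions are finally combined into a knowledge-base lookup (e, p, ?). A trivial sketch of that composition step, using a plain dictionary in place of the knowledge base (an assumption for illustration only):

```python
def answer_with_clarification(kb, entity_scores, predicate_scores, candidates):
    """Pick the highest-scoring entity and predicate (Section 4.3), then
    query the knowledge triple (e, p, ?). `kb` maps (entity, predicate)
    to an object; a dict stands in for the real knowledge base."""
    entity = candidates[max(range(len(candidates)), key=lambda i: entity_scores[i])]
    predicate = max(predicate_scores, key=predicate_scores.get)
    return kb.get((entity, predicate))

kb = {("midori_browser", "computer.software.language_used"): "C"}
print(answer_with_clarification(
    kb,
    entity_scores=[0.9, 0.1],
    predicate_scores={"computer.software.language_used": 0.8,
                      "computer.software.developer": 0.2},
    candidates=["midori_browser", "midori_os"],
))  # -> C
```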

5 Experiments

We report the experimental results on the three formulated tasks and give a detailed analysis. More details of the hyper-parameters are in the Supplementary Materials.

5.1 Experiment Results

Clarification Identification. As shown in Table 7, the structured models have the best performance, with an 86.6% accuracy on single-turn cases and an 80.5% accuracy on multi-turn cases. Compared to the best-performing unstructured model, the structured architecture brings obvious performance improvements, with 2.6% and 6.7% accuracy increases. This indicates that the structured architectures have learned the inter-relation between the context and the entity details, which is a vital part of reasoning about whether a question is ambiguous. We also find that all models achieve better results on multi-turn cases. Multi-turn cases additionally contain the previous conversation turn, which can help the models capture more key phrases from the entity information.

Model | Single-Turn | Multi-Turn
CNN | 83.3 | 80.0
RNN | 83.8 | 72.7
Transformer | 84.0 | 72.8
HAN | 83.1 | 73.0
DMN | 86.6 | 80.5

Table 7: Results of clarification identification models (accuracy, %).

Clarification Question Generation. Table 8 shows that Seq2Seq achieves low BLEU scores, which indicates its tendency to generate irrelevant text. The Transformer achieves higher performance than Seq2Seq. Our proposed coarse-to-fine model demonstrates a new state of the art, improving the best baseline results by 3.35 and 0.60 BLEU, respectively.

Model | Single-Turn | Multi-Turn
Seq2Seq | 18.84 | 31.62
Transformer | 20.69 | 44.42
The proposed model | 24.04 | 45.02

Table 8: Results of clarification question generation models (BLEU).

We also find that multi-turn cases generally obtain higher BLEU scores than single-turn cases. In single-turn cases, the same-name characteristic makes it hard to summarize key distinguishing information. In comparison, in multi-turn cases, the two candidate entities usually have different names and it is easier to generate clarification questions. This is also due to the flexible structure of ambiguous questions, which makes it hard to define universal rules to identify the asked objects. Table 9 shows an example generated by the proposed model.

E1 Name | Lineodes interrupta
E1 Type | biology.organism_classification
E1 Description | Lineodes interrupta is a moth in the Crambidae family. It was described by Zeller in 1873. It is found in Mexico and the United States, where it has been recorded from Florida, Louisiana, Oklahoma and Texas.
E2 Name | Lineodes
E2 Type | biology.organism_classification
E2 Description | Lineodes is a genus of moths of the Crambidae family.
Qp | What is higher classification of Lineodes interrupta?
Rp | Lineodes.
Qa | Biologically speaking, what is the classification?
Output | Are you referring to Lineodes interrupta or Lineodes, when you ask the biological classification?

Table 9: An example of the generated clarification questions.

Clarification-Based Question Answering. As Table 10 shows, the unstructured models perform better than the structured models, indicating that this task tends to over-fit and that smaller models have better generalization ability.
Model | Single-Turn | Multi-Turn
CNN | 60.8 | 74.7
RNN | 56.4 | 76.0
Transformer | 52.1 | 58.9
DMN | 50.9 | 54.5
HAN | 51.7 | 54.7

Table 10: Results of clarification-based question answering models (accuracy, %).

5.2 Discussion

Automatic Evaluation. In the task of clarification question generation, we use BLEU, a widely used evaluation metric for language generation tasks, as the automatic evaluation metric. BLEU evaluates the generated text by computing its similarity to the gold text. However, we find this metric not well suited to our task. A good clarification question can ask about different aspects as long as it can distinguish the entities. The reference in our dataset is one possible answer but not the only correct one, so a good question may still get a low BLEU score. Therefore, a more reasonable evaluation metric needs to be explored in the future, such as a paraphrase-based BLEU.
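The issue is easy to reproduce with any standard BLEU implementation (the paper does not state which one it uses; the NLTK scorer below is only for illustration): a clarification question that resolves the ambiguity through a different aspect shares few long n-grams with the reference and therefore scores poorly.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ("when you say the source code language used in the program midori , "
             "are you referring to web browser midori or operating system midori ?").split()
# A clarification question that distinguishes the entities by a different aspect.
alternative = ("do you mean the midori web browser or the midori "
               "operating system developed by microsoft ?").split()

smooth = SmoothingFunction().method1
print(sentence_bleu([reference], alternative, smoothing_function=smooth))
# A perfectly reasonable clarification can still receive a low sentence-level score.
```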

Error Analysis. In clarification question generation, although our proposed model has achieved the best performance, it still generates some low-quality questions. We conduct a human evaluation on 100 generated clarification questions and summarize two error categories. The first one is the entity error, where the generated clarification question has a correct multi-choice structure but contains irrelevant entity information. We present one example in Table 11. In this example, our model generates entity phrases irrelevant to the input ambiguous question. This failure is mainly due to the unseen entity name "Come Back, Little Sheba", which leads the model to wrong generation decisions. Since there are lots of entities with different names and descriptions, how to handle this sparse information is a non-trivial problem. The second category is the grammar error, where models sometimes generate low-quality phrases after generating a low-frequency entity word. These errors can be attributed to the sparsity of entities to some extent. How to deal with the sparsity of entity information still needs further exploration.

E1 Name | Come Back, Little Sheba
E1 Type | award.winning_work, award.nominated_work, ...
E1 Description | Come Back, Little Sheba is a 1950 theater production of the play by William Inge ...
E2 Name | Come Back, Little Sheba
E2 Type | theater.production, award.nominated_work
E2 Description | Come Back, Little Sheba is a 2008 theater production of the play by William Inge ...
Qh | Who directed Come Back, Little Sheba?
Output | Which one do you mean, winning work visual artist Claudia Coleman or winning work Elmer Gantry, when you ask the directors?

Table 11: A bad generated case.

6 Related Work

Our work is related to clarification questions and question generation.

6.1 Clarification Questions in Other Tasks

There are several studies on asking clarification questions. Stoyanchev et al. (2014) randomly drop one phrase from a question and require annotators to ask a clarification question toward the dropped information, e.g., "Do you know the birth date of XXX". However, the small dataset size makes it hard to help downstream tasks. Following this work, Guo et al. (2017) provide a larger synthetic dataset, QRAQ, by replacing some of the entities with variables. Li et al. (2017) use misspelled words to replace entities in questions to build ambiguous questions. Although these are good pioneering studies, the synthetic construction makes them unnatural and far from real-world questions.

Different from these studies, Braslavski et al. (2017) investigate a community question answering (CQA) dataset and study how to predict the specific subject of a clarification question. Similarly, Rao and Daumé III (2018) focus on learning to rank human-written clarification questions in an online QA forum, StackExchange. Our work differs from these two studies in that we have a knowledge graph at the backend, and the clarification-related components are able to facilitate KBQA.

6.2 Question Generation

There are various question generation tasks with different purposes. Hu et al. (2018) aim to ask questions to play the 20-question game. Dhingra et al. (2017) teach models to ask questions to limit the number of answer candidates in task-oriented dialogues. Ke et al. (2018) train models to ask questions in open-domain conversational systems to better interact with people. Guo et al. (2018) develop a sequence-to-sequence model to generate natural language questions.

7 Conclusion and Future Work

In this work, we construct a clarification question dataset for KBQA. The dataset supports three tasks: clarification identification, clarification question generation, and clarification-based question answering. We implement representative neural networks as baselines for the three tasks and propose a new generation model. The detailed analysis shows that our dataset brings new challenges; more powerful models and reasonable evaluation metrics need to be further explored.

In the future, we plan to improve our dataset by including more complex ambiguous situations for both single-turn and multi-turn questions, such as multi-hop questions, aggregation questions, etc. We also plan to integrate the clarification-based models into existing KBQA systems, and to study how to iteratively improve the model based on human feedback.

Acknowledgments

We thank all reviewers for providing thoughtful and constructive suggestions. This work was supported in part by the National Natural Science Foundation of China (No. 61673028). Jingjing Xu and Xu Sun are the corresponding authors of this paper.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR 2015.

Pavel Braslavski, Denis Savenkov, Eugene Agichtein, and Alina Dubatovka. 2017. What do you mean exactly?: Analyzing clarification questions in CQA. In Proceedings of CHIIR 2017, pages 345-348.

Marco De Boni and Suresh Manandhar. 2003. An analysis of clarification dialogue for question answering. In Proceedings of NAACL-HLT 2003, pages 48-55.

Marty J. Demetras, Kathryn Nolan Post, and Catherine E. Snow. 1986. Feedback to first language learners: The role of repetitions and clarification questions. Journal of Child Language, 13(2):275-292.

Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng. 2017. Towards end-to-end reinforcement learning of dialogue agents for information access. In Proceedings of ACL 2017, pages 484-495.

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. arXiv preprint arXiv:1805.04793.

Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Jian Yin, Hong Chi, James Cao, Peng Chen, and Ming Zhou. 2018. Question generation from SQL queries improves neural semantic parsing. In Proceedings of EMNLP 2018, pages 1597-1607.

Xiaoxiao Guo, Tim Klinger, Clemens Rosenbaum, Joseph P. Bigus, Murray Campbell, Ban Kawas, Kartik Talamadupula, Gerry Tesauro, and Satinder Singh. 2017. Learning to query, reason, and answer questions on ambiguous texts. In Proceedings of ICLR 2017.

Huang Hu, Xianchao Wu, Bingfeng Luo, Chongyang Tao, Can Xu, Wei Wu, and Zhan Chen. 2018. Playing 20 question game with policy-based reinforcement learning. arXiv preprint arXiv:1808.07645.

Pei Ke, Jian Guan, Minlie Huang, and Xiaoyan Zhu. 2018. Generating informative responses with controlled sentence function. In Proceedings of ACL 2018, pages 1499-1508.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP 2014, pages 1746-1751.

Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of ICML 2016, pages 1378-1387.

Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Proceedings of AAAI 2015, pages 2267-2273.

Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, and Jason Weston. 2017. Dialogue learning with human-in-the-loop. In Proceedings of ICLR 2017.

Sudha Rao and Hal Daumé III. 2018. Learning to ask good questions: Ranking clarification questions using neural expected value of perfect information. In Proceedings of ACL 2018, pages 2736-2745.

Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, and Sarath Chandar. 2018. Complex sequential question answering: Towards learning to converse over linked question answer pairs with a knowledge graph. In Proceedings of AAAI 2018.

Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673-2681.

Svetlana Stoyanchev, Alex Liu, and Julia Hirschberg. 2014. Towards natural clarification questions in dialogue systems. In AISB Symposium on Questions, Discourse and Dialogue, volume 20.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 6000-6010.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of NAACL-HLT 2016, pages 1480-1489.
A Supplementary Materials

A.1 Classification Models

Model | Settings
CNN | batch size 64, embedding size 128, hidden size 128, filter size 128, kernel size 100, learning rate 0.0003, Adam
RNN | batch size 8, embedding size 128, hidden size 128, learning rate 2, Adam
Transformer | batch size 8, embedding size 128, hidden size 128, learning rate 2, dropout 0.1, Adam
HAN | batch size 64, embedding size 100, hidden size 100, learning rate 0.01, Adam
DMN | batch size 64, embedding size 300, hidden size 300, learning rate 0.005, Adam

Table 12: Settings of classification models.

Table 12 shows the detailed hyper-parameter settings of these classification models. Here we introduce the classification models used in our experiments.

CNN. It first feeds the input embedding sequence into a convolutional layer. The convolutional layer contains K filters f ∈ R^{s×d}, where s is the filter size and d is the dimension of the word embeddings. We use different filter sizes to get rich features. Then, a max pooling layer takes the K vectors as input and generates a distributed input representation h. Finally, the vector h is projected to the probabilities of the labels.

RNN. It is built on a traditional bi-directional gated recurrent unit (GRU) structure, which is used to capture global and local dependencies inside the input sequence. The last output of the GRU is then fed into the output layer to generate the probabilities of the labels.

Transformer. A model similar to RNNs. It removes the recurrent unit in RNNs and replaces it with a special attention mechanism called multi-head attention. The model is composed of a stack of 2 identical layers. Each identical layer has two sub-layers. The first sub-layer is a multi-head self-attention layer, and the second is a position-wise fully connected feed-forward layer. Since the input nodes are not sequentially ordered, the positional embedding layer is removed from the model. The decoder is also composed of a stack of 2 identical layers.

HAN. It is built upon two self-attention layers. The first layer is responsible for encoding the context information and the entity information. Then, the second layer uses a self-attention mechanism to learn the inter-relations between them. Finally, the output of the second layer is fed into the output layer for predicting labels.

DMN. This model regards the context information as a query and the entity information as input to learn their inter-relations. It consists of four modules: an input module, a question module, an episodic memory module, and an answer module. The input module and the question module encode the query and the input into distributed vector representations. The episodic memory module decides which parts of the entity information to focus on through the attention mechanism. Finally, the answer module generates an answer based on the final memory vector of the memory module.

A.2 Clarification Question Generation Models

Here is a detailed introduction of the generation baselines.

Seq2Seq (Bahdanau et al., 2015). This model is based on a traditional encoder-decoder framework, which encodes the input sequence into a dense vector and then decodes the target sequence word by word. Attention is used to capture global dependencies between the input and the output.

Transformer (Vaswani et al., 2017). It is based solely on the attention mechanism to capture global dependencies. Similar to Seq2Seq, it uses an encoder-decoder framework but with a different implementation.

Table 13 shows the detailed hyper-parameter settings of the generation models.

Model | Settings
Seq2Seq | batch size 16, embedding size 32, hidden size 32, learning rate 2, beam size 1, dropout 0.1, Adam
Transformer | batch size 16, embedding size 32, hidden size 32, learning rate 2, beam size 1, dropout 0.1, Adam
Our model | batch size 16, embedding size 32, hidden size 32, learning rate 2, beam size 1, dropout 0.1, Adam

Table 13: Settings of generation models.
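For concreteness, the CNN baseline described in A.1 can be sketched in a few lines of PyTorch. The hyper-parameters follow Table 12 where they are stated; the filter widths and the two-label output are illustrative assumptions, and this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    """A minimal sketch of the CNN baseline in A.1 (Kim, 2014): convolution
    filters of several widths over word embeddings, max pooling over time,
    and a linear projection to label logits."""
    def __init__(self, vocab_size, num_labels=2, emb_dim=128,
                 num_filters=128, filter_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, kernel_size=s) for s in filter_sizes
        )
        self.out = nn.Linear(num_filters * len(filter_sizes), num_labels)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))      # (batch, num_labels)

model = CNNClassifier(vocab_size=10000)
logits = model(torch.randint(0, 10000, (4, 50)))       # e.g. [SEL]-joined input ids
print(logits.shape)                                    # torch.Size([4, 2])
```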