Knowledge-Aware Dialogue Generation Via Hierarchical Infobox
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) Knowledge-Aware Dialogue Generation via Hierarchical Infobox Accessing and Infobox-Dialogue Interaction Graph Network Sixing Wu1 , Minghui Wang2 , Dawei Zhang1 , Yang Zhou3 , Ying Li4;5∗ and Zhonghai Wu4;5 1School of Electronics Engineering and Computer Science, Peking University, Beijing, China 2School of Software and Microelectronics , Peking University, Beijing, China 3Auburn University, Auburn, Alabama, USA 4National Research Center of Software Engineering, Peking University, Beijing, China 5Key Lab of High Confidence Software Technologies (MOE), Peking University, Beijing, China fwusixing, minghui wang, daweizhang, li.ying, [email protected], [email protected] Abstract Due to limited knowledge carried by queries, tra- ditional dialogue systems often face the dilemma of generating boring responses, leading to poor user experience. To alleviate this issue, this pa- per proposes a novel infobox knowledge-aware dialogue generation approach, HITA-Graph, with three unique features. First, open-domain infobox tables that describe entities with relevant attributes Figure 1: The ‘Bill Gates’ Infobox from Wikipedia. are adopted as the knowledge source. An order- irrelevance Hierarchical Infobox Table Encoder is proposed to represent an infobox table at three lev- dialogue context [Yu et al., 2020]; namely, it can effectively els of granularity. In addition, an Infobox-Dialogue assist the dialogue systems in understanding the intrinsic se- Interaction Graph Network is built to effectively in- mantics and fabricating informative responses. tegrate the infobox context and the dialogue context Dialogue systems can gather knowledge from various in- into a unified infobox representation. Second, a Hi- formation sources, which can be broadly classified into (1) erarchical Infobox Attribute Attention mechanism Knowledge texts, which can provide rich semantic informa- is developed to access the encoded infobox knowl- tion, are easy to access on the Internet, such as online ency- edge at different levels of granularity. Last but clopedia (e.g., Wikipedia and Answers.com), news websites, not least, a Dynamic Mode Fusion strategy is de- and search engines [Tam, 2020]; (2) Knowledge bases, such signed to allow the Decoder to select a vocabulary as knowledge graphs [Zhang et al., 2020], and spread-sheet word or copy a word from the given infobox/query. tables [Qin et al., 2019]; their knowledge items are well- We extract infobox tables from Chinese Wikipedia organized, and thus can be easily integrated into dialogue sys- and construct an infobox knowledge base. Exten- tems; and (3) Topics and keywords, which can promote the sive evaluation on an open-released Chinese cor- dialogue towards a specific and consistent direction [Wang et pus demonstrates the superior performance of our al., 2018; Wu et al., 2020b]. However, knowledge texts are approach against several representative methods. unstructured, knowledge bases are hard to construct/collect, and topic words/keywords are not informative. Therefore, 1 Introduction there is a question, can we utilize a new type of knowledge that is easy-collected, informative, structured, and consistent Open-domain dialogue systems aim to generate human-like for further enhancing dialogue systems? responses; however, the inadequate knowledge carried by the Recently, a new kind of knowledge, infobox tables, has at- queries dramatically constrains the ability of dialogue sys- tracted much attention in the data-to-text tasks [Bao et al., tems to understand the intrinsic semantics of the queries and 2018; Chen et al., 2020]. Figure 1 provides an illustrating generate user-friendly dialogues [Ghazvininejad et al., 2018]. example of an infobox table that specifies an entity with mul- The frequently generated boring responses (e.g., “ I don’t tiple attribute key-values. Infobox knowledge integrates the know”) often lead to frustrating user experience [Li et al., advantages of the above three types of knowledge. First, mas- 2016]. Recently, researchers have recognized that introduc- sive infobox tables can be easily obtained from the Internet, ing external knowledge to dialogue generation systems has such as Wikipedia articles or web pages of Google search re- great potential for further improving performance. Knowl- sults. Second, infobox tables are easy to use since infobox edge is regarded as the awareness and understanding of the knowledge is well extracted and organized as informative at- ∗Corresponding author. tributes. Third, an infobox table focuses on one target en- 3964 Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) tity without the interference of irrelevant entities, guarantee- 2 Our Approach ing knowledge consistency. The infobox knowledge offers a 2.1 Problem Formulation great opportunity to employ one kind of reliable knowledge source to generate high-quality dialogue systems. Integrating Let D = fhX; Y; T ig denote the corpus, where X = (x1; ··· ; xn) is a query, Y = (y1; ··· ; ym) is a response, the above three types of knowledge one by one often results k v l in knowledge inconsistency and computational inefficiency. and T = fhf ; f ig is an infobox table that consists of a set of attribute key-values and can be retrieved from To our best knowledge, using the infobox knowledge to the infobox base T . The goal of our task is: Y ∗ = improve the quality of open-domain dialogue generation has 0 not been fully investigated yet. Compared to the usage of arg maxY 0 P (Y jX; T ). the infobox tables in the data-to-text tasks, conducting the 2.2 Context Encoder infobox knowledge-aware dialogue generation is much more difficult to study. Unlike the data-to-text tasks that rephrase A query X is firstly encoded into dialogue context states [ ] infobox attributes in an orderly fashion, dialogue genera- H = (h1; :::; hn) by a bi-directional GRU Cho et al., 2014 : tion tasks conduct attribute order-irrelevance knowledge se- f f b b lection, inference, and dialogue-infobox fusion. In addition, ht = GRU(xt; ht−1); ht = GRU(xn−t+1; ht−1) (1) existing data-to-text studies usually only consider the infobox where x is the embedding of x, and ht is the concatenation tables within limited domains. However, open-domain dia- f b [h ; h ]. hn is regarded as the context summary. logue generation has to use infobox tables in a wide range of t t domains for maintaining enough knowledge coverage, which 2.3 Hierarchical Infobox Table Encoder dramatically increases the difficulty of unambiguous knowl- kv k v k For each attribute fi = hfi ; fi i 2 T , its key fi is an noun edge representation. These challenges indicate that infobox v word, its value fi = (wi;1; ··· ; wi;jf v j) is a word sequence. knowledge-enhanced dialogue generation cannot directly uti- i lize the techniques used in the existing data-to-text works. Intra-Attribute Encoding k With these challenges in mind, we propose HITA-Graph, For an attribute key fi or a value word wi;j , its word distribu- a novel infobox knowledge-aware dialogue generation ap- tion is sparse; namely, there are rare words. If only represent them at the word-level, many words can not be recognized. proach. (1) We propose an order-irrelevance Hierarchical k Infobox Table Encoder (HITE) to represent an infobox ta- Thus, we design a hybrid-level method. Taking fi as an ex- k;hybrid ble at three levels of granularity. To alleviate the sparsity of ample, the corresponding embedding fi is given by: the word distribution of the open-domain infobox attributes, k;hybrid k k;char besides the usual word embedding, HITE further integrates fi = MLPhf (fi ; fi ) (2) the embedding learned from the char sequence of a word, k where fi is obtained by looking up the word-level embedding the part-of-speech tag embedding, and the locally positional matrix, f k;char is computed from the char sequence of f k embedding into an intra-attribute level representation of in- i i with the CharCNN Encoder [Kim et al., 2016], and MLPhf fobox table. Subsequently, a multi-head attention network is a MLP network. The incorporation of chars can alleviate is adopted to compute an attribute-level representation. In the issue of the sparse word distribution. Subsequently, each the context-level, we propose an Infobox-Dialogue Interac- attribute f kv is represented as a set of key-word embedding: tion Graph Network (IDCI-Graph) to capture both the in- i kw kw k;hybrid hybrid pos f b fobox and the dialogue context by building and inferring on Fi = ffi;j g = f[fi ; wi;j ; wi;j ; pi;j; pi;j]g an infobox-dialogue interaction graph. (2) We develop a Hi- (3) kw kv pos erarchical Infobox Attribute Attention (HIAA), which allows where fi;j is the j-th key-word pair of fi , wi;j is the part- the Decoder to access the encoded infobox knowledge in a f b of-speech tag of wi;j , and pi;j and pi;j are local positions coarse-to-fine manner. (3) When predicting the next target v token, the Decoder is equipped with three modes: generating counted from the beginning (i.e., j) and the end (i.e., jfi j − a vocabulary word, copying a word from the query, and copy- j + 1), respectively. ing an attribute word from the infobox. Such three modes are Then, the multi-head self-attention (denoted as MHA, [Vaswani et al., 2017] ) is used to compute the intra-attribute fused via the Dynamic Mode Fusion strategy (DMF). kw;ia kw;ia We collected about 895K Chinese infobox tables from level representations Fi = ffi;j g: Wikipedia. Extensive experiments are conducted on a Chi- Fkw;ia = MHA(Q = Fkw; K = Fkw; V = Fkw) (4) nese corpus [Cai et al., 2019b]. Both the automatic and i i i i human evaluation results demonstrate that the proposed ap- Attribute-Level Encoding kv proach HITA-Graph outperforms the representative competi- For each key-value attribute fi , its attribute-level embed- kv;a kw;ia kw;ia tors in almost all experiments.