Conditional Self-Attention for Query-based Summarization

Yujia Xie∗ (Georgia Tech), Tianyi Zhou (University of Washington), Yi Mao (Microsoft), Weizhu Chen (Microsoft)
Abstract

Self-attention mechanisms have achieved great success on a variety of NLP tasks due to their flexibility in capturing dependencies between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning, where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to a more accurate solution; these dependencies, however, cannot be captured by existing self-attention mechanisms. In this paper, we propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. CSA works by adjusting the pairwise attention between input tokens in a self-attention module with the matching score of the inputs to the given query.

…element's context-aware embedding is then computed by weighted averaging of the other elements with these probabilities. Hence, self-attention is powerful for encoding pairwise relationships into contextual representations.

However, higher-level language understanding often relies on more complicated dependencies than pairwise ones. One example is the conditional dependency that measures how two elements are related given a premise. In NLP tasks such as query-based summarization and knowledge graph reasoning, where inputs are equipped with extra queries or entities, knowing the dependencies conditioned on the given query or entity is extremely helpful for extracting meaningful representations. Moreover, conditional dependencies can be used to build a large relational graph cov-
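To make the idea concrete, the sketch below shows one way the described mechanism could be realized: standard pairwise self-attention logits are rescaled by a learned matching score between each token and the given query. The class name, the sigmoid gating, and the multiplicative way the scores enter the attention logits are illustrative assumptions on our part, not the paper's exact CSA formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalSelfAttention(nn.Module):
    """Illustrative single-head self-attention whose pairwise attention weights
    are conditioned on an extra query vector (hypothetical sketch, not the
    authors' exact module)."""

    def __init__(self, d_model):
        super().__init__()
        self.proj_q = nn.Linear(d_model, d_model)  # token-to-token "query" projection
        self.proj_k = nn.Linear(d_model, d_model)  # token-to-token "key" projection
        self.proj_v = nn.Linear(d_model, d_model)  # value projection
        self.match = nn.Linear(d_model, d_model)   # projects tokens before matching against the query
        self.scale = d_model ** 0.5

    def forward(self, x, query):
        # x:     (batch, seq_len, d_model)  input token embeddings
        # query: (batch, d_model)           embedding of the extra query
        q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)

        # Standard pairwise attention logits between all token positions.
        logits = torch.matmul(q, k.transpose(-1, -2)) / self.scale      # (batch, L, L)

        # Matching score of each token against the query, squashed to (0, 1).
        m = torch.sigmoid(torch.einsum('bld,bd->bl', self.match(x), query) / self.scale)

        # Condition the pairwise attention on the query: a token pair (i, j)
        # contributes more when both tokens are relevant to the query.
        logits = logits + torch.log(m.unsqueeze(1) * m.unsqueeze(2) + 1e-9)

        attn = F.softmax(logits, dim=-1)
        # Context-aware, query-conditioned embeddings: weighted average of values.
        return torch.matmul(attn, v)
```

The final two lines also correspond to the plain self-attention computation described above: each element's output is a probability-weighted average of the other elements; the query-dependent rescaling of the logits is what makes the resulting dependencies conditional.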