Semantic Matching for Sequence-to-Sequence Learning

Ruiyi Zhang♦, Changyou Chen†, Xinyuan Zhang‡, Ke Bai♦, Lawrence Carin♦
♦Duke University, †State University of New York at Buffalo, ‡ASAPP Inc.
[email protected]

Abstract

In sequence-to-sequence models, classical optimal transport (OT) can be applied to semantically match generated sentences with target sentences. However, in non-parallel settings, target sentences are usually unavailable. To tackle this issue without losing the benefits of classical OT, we present a semantic matching scheme based on Optimal Partial Transport (OPT). Specifically, our approach partially matches semantically meaningful words between source and partial target sequences. To overcome the difficulty of detecting active regions in OPT (corresponding to the words that need to be matched), we further exploit prior knowledge to perform partial matching. Extensive experiments are conducted to evaluate the proposed approach, showing consistent improvements on sequence-to-sequence tasks.

that each piece of mass in the source distribution is transported to an equal-weight piece of mass in the target distribution. However, this requirement is too restrictive for Seq2Seq models, making direct applications inappropriate due to the following: (i) texts often have different lengths, and not every element in the source text corresponds to an element in the target text. A good example is style transfer, where some words in the source text do not have corresponding words in the target text. (ii) it is reasonable to semantically match important words while neglecting some other words, e.g., conjunctions. In typical unsupervised models, text data are usually non-parallel in the sense that pairwise data are typically unavailable (Sutskever et al., 2014). Thus, both pairwise information inference and text generation must be performed in the same model
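To make the idea of partial matching concrete, the following is a minimal sketch (not the paper's actual method) of how one might relax classical OT so that only a fraction of the word mass must be transported: a zero-cost dummy point is appended on each side to absorb unmatched mass (e.g., conjunctions), and entropic Sinkhorn iterations solve the extended problem. The function names, the `keep` mass fraction, and the cosine cost are illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=200):
    """Entropic-regularized OT between histograms a and b with cost matrix C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    # transport plan with row sums a and (approximately) column sums b
    return u[:, None] * K * v[None, :]

def partial_match(X, Y, keep=0.8, eps=0.1):
    """Hypothetical partial matching of source word embeddings X to target Y.

    Only a `keep` fraction of the mass on each side must be matched; the
    remaining (1 - keep) may flow to a zero-cost dummy point, so some words
    can stay unmatched, mimicking the partial-transport relaxation.
    """
    n, m = len(X), len(Y)
    # cosine distance between normalized word embeddings
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T
    # extend the cost with a zero-cost dummy row and column
    Ce = np.zeros((n + 1, m + 1))
    Ce[:n, :m] = C
    a = np.full(n + 1, keep / n); a[-1] = 1.0 - keep
    b = np.full(m + 1, keep / m); b[-1] = 1.0 - keep
    T = sinkhorn(a, b, Ce, eps)
    return T[:n, :m]  # plan restricted to real words
```

In this toy formulation the dummy point plays the role of OPT's inactive region: words whose cheapest option is the dummy are simply left out of the match, rather than being forced onto a poor target word as classical OT would require.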