Learning Thematic Similarity Metric Using Triplet Networks Liat Ein Dor,∗ Yosi Mass,∗ Alon Halfon, Elad Venezian, Ilya Shnayderman, Ranit Aharonov and Noam Slonim IBM Research, Haifa, Israel fliate,yosimass,alonhal,eladv,ilyashn,ranita,
[email protected] Abstract to the related task of clustering sentences that rep- resent paraphrases of the same core statement. In this paper we suggest to leverage the Thematic clustering is important for various use partition of articles into sections, in or- cases. For example, in multi-document summa- der to learn thematic similarity metric be- rization, one often extracts sentences from mul- tween sentences. We assume that a sen- tiple documents that have to be organized into tence is thematically closer to sentences meaningful sections and paragraphs. Similarly, within its section than to sentences from within the emerging field of computational argu- other sections. Based on this assumption, mentation (Lippi and Torroni, 2016), arguments we use Wikipedia articles to automatically may be found in a widespread set of articles (Levy create a large dataset of weakly labeled et al., 2017), which further require thematic orga- sentence triplets, composed of a pivot sen- nization to generate a compelling argumentative tence, one sentence from the same sec- narrative. tion and one from another section. We We approach the problem of thematic cluster- train a triplet network to embed sentences ing by developing a dedicated sentence similar- from the same section closer. To test the ity measure, targeted at a comparative task – The- performance of the learned embeddings, matic Distance Comparison (TDC): given a pivot we create and release a sentence cluster- sentence, and two other sentences, the task is to ing benchmark.