Arxiv:2004.10150V1 [Cs.CL] 21 Apr 2020 Vided Strong Impetus to Develop Systems That Per- Able and Cannot Be Easily Sourced

Unsupervised Opinion Summarization with Noising and Denoising Reinald Kim Amplayo and Mirella Lapata Institute for Language, Cognition and Computation School of Informatics, University of Edinburgh [email protected], [email protected] Abstract tentially more challenging, abstractive approaches seem more appropriate for generating informative The supervised training of high-capacity mod- and concise summaries, e.g., by performing var- els on large datasets containing hundreds of ious rewrite operations (e.g., deletion of words thousands of document-summary pairs is critical to the recent success of deep learning tech- or phrases and insertion of new ones) which go niques for abstractive summarization. Unfortu- beyond simply copying and rearranging passages nately, in most domains (other than news) such from the original opinions. training data is not available and cannot be eas- Abstractive summarization has enjoyed renewed ily sourced. In this paper we enable the use of supervised learning for the setting where interest in recent years thanks to the availability there are only documents available (e.g., prod- of large-scale datasets (Sandhaus, 2008; Hermann uct or business reviews) without ground truth et al., 2015; Grusky et al., 2018; Liu et al., 2018; summaries. We create a synthetic dataset from Fabbri et al., 2019) which have driven the devel- a corpus of user reviews by sampling a re- opment of neural architectures for summarizing view, pretending it is a summary, and gener- single and multiple documents. Several approaches ating noisy versions thereof which we treat (See et al., 2017; Celikyilmaz et al., 2018; Paulus as pseudo-review input. We introduce several et al., 2018; Gehrmann et al., 2018; Liu et al., linguistically motivated noise generation functions and a summarization model which learns 2018; Perez-Beltrachini et al., 2019; Liu and La- to denoise the input and generate the original pata, 2019; Wang and Ling, 2016) have shown review. At test time, the model accepts gen- promising results with sequence-to-sequence mod- uine reviews and generates a summary contain- els that encode one or several source documents ing salient opinions, treating those that do not and then decode the learned representations into an reach consensus as noise. Extensive automatic abstractive summary. and human evaluation shows that our model brings substantial improvements over both ab- The supervised training of high-capacity models stractive and extractive baselines. on large datasets containing hundreds of thousands of document-summary pairs is critical to the recent 1 Introduction success of deep learning techniques for abstractive The proliferation of massive numbers of online summarization. Unfortunately, in most domains product, service, and merchant reviews has pro- (other than news) such training data is not avail- arXiv:2004.10150v1 [cs.CL] 21 Apr 2020 vided strong impetus to develop systems that per- able and cannot be easily sourced. For instance, form opinion mining automatically (Pang and Lee, manually writing opinion summaries is practically 2008). The vast majority of previous work (Hu impossible since an annotator must read all avail- and Liu, 2006) breaks down the problem of opin- able reviews for a given product or service which ion aggregation and summarization into three inter- can be prohibitively many. Moreover, different related tasks involving aspect extraction (Mukher- types of products impose different restrictions on jee and Liu, 2012), sentiment identification (Pang the summaries which might vary in terms of length, et al., 2002; Pang and Lee, 2004), and summary or the types of aspects being mentioned, rendering creation based on extractive (Radev et al., 2000; the application of transfer learning techniques (Pan Lu et al., 2009) or abstractive methods (Ganesan and Yang, 2010) problematic. et al., 2010; Carenini et al., 2013; Gerani et al., Motivated by these issues, Chu and Liu(2019) 2014; Di Fabbrizio et al., 2014). Although po- consider an unsupervised learning setting where there are only documents (product or business re- the decoder generate topically consistent text. views) available without corresponding summaries. We perform experiments on two review datasets They propose an end-to-end neural model to per- representing different domains (movies vs busi- form abstractive summarization based on (a) an nesses) and summarization requirements (short vs autoencoder that learns representations for each re- longer summaries). Results based on automatic view and (b) a summarization module which takes and human evaluation show that our method outper- the aggregate encoding of reviews as input and forms previous unsupervised summarization mod- learns to generate a summary which is semantically els, including the state-of-the-art abstractive sys- similar to the source documents. Due to the ab- tem of Chu and Liu(2019) and is on the same par sence of ground truth summaries, the model is not with a state-of-the-art supervised model (Wang and trained to reconstruct the aggregate encoding of re- Ling, 2016) trained on a small sample of (genuine) views, but rather it only learns to reconstruct the en- review-summary pairs. coding of individual reviews. As a result, it may not be able to generate meaningful text when the num- 2 Related Work ber of reviews is large. Furthermore, autoencoders are constrained to use simple decoders lacking at- Most previous work on unsupervised opinion sum- tention (Bahdanau et al., 2014) and copy (Vinyals marization has focused on extractive approaches et al., 2015) mechanisms which have proven useful (Carenini et al., 2006; Ku et al., 2006; Paul et al., in the supervised setting leading to the generation 2010; Angelidis and Lapata, 2018) where a cluster- of informative and detailed summaries. Problem- ing model groups opinions of the same aspect, and atically, a powerful decoder might be detrimental a sentence extraction model identifies text repre- to the reconstruction objective, learning to express sentative of each cluster. Ganesan et al.(2010) pro- arbitrary distributions of the output sequence while pose a graph-based abstractive framework for gen- ignoring the encoded input (Kingma and Welling, erating concise opinion summaries, while Di Fab- 2014; Bowman et al., 2016). brizio et al.(2014) use an extractive system to first select salient sentences and then generate an ab- In this paper, we enable the use of super- stractive summary based on hand-written templates vised techniques for unsupervised summarization. (Carenini and Moore, 2006). Specifically, we automatically generate a synthetic As mentioned earlier, we follow the setting of training dataset from a corpus of product reviews, Chu and Liu(2019) in assuming that we have ac- and use this dataset to train a more powerful neural cess to reviews but no gold-standard summaries. model with supervised learning. The synthetic data Their model learns to generate opinion summaries is created by selecting a review from the corpus, by reconstructing a canonical review of the average pretending it is a summary, generating multiple encoding of input reviews. Our proposed method noisy versions thereof and treating these as pseudo- is also abstractive and neural-based, but eschews reviews. The latter are obtained with two noise gen- the use of an autoencoder in favor of supervised eration functions targeting textual units of different sequence-to-sequence learning through the creation granularity: segment noising introduces noise at the of a synthetic training dataset. Concurrently with word- and phrase-level, while document noising re- our work, Brazinskasˇ et al.(2019) use a hierarchi- places a review with a semantically similar one. We cal variational autoencoder to learn a latent code of use the synthetic data to train a neural model that the summary. While they also use randomly sam- learns to denoise the pseudo-reviews and generate pled reviews for supervised training, our dataset the summary. This is motivated by how humans construction method is more principled making use write opinion summaries, where denoising can be of linguistically motivated noise functions. seen as removing diverging information. Our pro- Our work relates to denoising autoencoders posed model consists of a multi-source encoder and (DAEs; Vincent et al., 2008), which have been a decoder equipped with an attention mechanism. effectively used as unsupervised methods for vari- Additionally, we introduce three modules: (a) ex- ous NLP tasks. Earlier approaches have shown that plicit denoising guides how the model removes DAEs can be used to learn high-level text represen- noise from the input encodings, (b) partial copy en- tations for domain adaptation (Glorot et al., 2011) ables to copy information from the source reviews and multimodal representations of textual and vi- only when necessary, and (c) a discriminator helps sual input (Silberer and Lapata, 2014). Recent work has applied DAEs to text generation tasks, Candidate Summary the movie is a fun comedy specifically to data-to-text generation (Freitag and with fine performances . Roy, 2018) and extractive sentence compression (Fevry and Phang, 2018). Our model differs from these approaches in two respects. Firstly, while Token-level Noising Chunk-level Noising the film is a nice comedy with good production and zohan , previous work has adopted trivial noising methods with stellar cast . PP NP CC NP , such as randomly adding or removing words (Fevry a nice comedy by the film . NP PP NP . and Phang, 2018) and randomly corrupting encodings (Silberer and Lapata, 2014), our noise gen- Document-level Noising the high-handed premise does not always work in erators are more linguistically informed and suit- 0.05 able for the opinion summarization task. Secondly, zohan but you have to admire the chutzpah in trying it . the fine performance of sandler as zohan in this very while in Freitag and Roy(2018) the decoder is lim- funny comedy makes this movie special . 0.67 ited to vanilla RNNs, our noising method enables the latest in a long line of underwhelming adam sandler 0.12 the use of more complex architectures, enhanced comedies .

Load more