Constructiveness-Based Product Review Classification

Constructiveness-Based Product Review Classification Ugo Loobuyck Uppsala University Department of Linguistics and Philology Master Programme in Language Technology Master’s Thesis in Language Technology, 30 ects credits June 8, 2020 Supervisor: Prof. Joakim Nivre, Uppsala University Abstract Promoting constructiveness in online comment sections is an essential step to make the internet a more productive place. On online marketplaces, customers often have the opportunity to voice their opinion and relate their experience with a given product. In this thesis, we investigate the possibility to model constructiveness in product review in order to promote the most informative and argumentative customer feedback. We develop a new constructiveness 4-class scale taxonomy based on heuristics and specic categorical criteria. We use this taxonomy to annotate 4000 Amazon customer reviews as our training set, referred to as the Corpus for Review Constructiveness (CRC). In addition to the 4-class constructiveness tag, we include a binary tag to compare modeling performance with previous work. We train and test several computational models such as Bidirectional Encoder Representations from Transformers (BERT), a Stacked Bidirectional LSTM and a Gradient Boosting Machine. We demonstrate our annotation scheme’s reliability with a set of inter-annotator agreement experiments, and show that good levels of performance can be reached in both multiclass setting (0.69 F1 and 57% error reduction over the baseline) and binary setting (0.85 F1 and 71% error reduction). Dierent features are evaluated individually and in combination. Moreover, we compare the advantages, downsides and performance of both feature-based and neural network models. Finally, these models trained on CRC are tested on out-of-domain data (news article comments) and shown to be nearly as procient as on in-domain data. This work allows the extension of constuctiveness modeling to a new type of data and provides a new non-binary taxonomy for data labeling. Contents Preface5 1. Introduction6 1.1. Purpose...................................6 1.2. Outline...................................7 2. Background9 2.1. Constructiveness.............................9 2.2. Toxicity................................... 11 2.3. Machine Learning............................. 12 2.3.1. Classic feature-based models.................. 12 2.3.2. Neural networks......................... 13 3. Data 16 3.1. Data Gathering.............................. 16 3.1.1. Training Data........................... 16 3.1.2. Test Data............................. 17 3.2. Possible Shortcomings.......................... 17 4. Annotation 19 4.1. Annotation Scheme............................ 19 4.1.1. Four-class scale (CRC<D;C8 ).................... 20 4.1.2. Two-class scale (CRC18=)..................... 21 4.2. Annotation agreement.......................... 23 4.2.1. Experiment 1........................... 23 4.2.2. Experiment 2........................... 25 4.3. Annotated Data Sets............................ 26 5. Experimental Methodology 28 5.1. Preprocessing............................... 28 5.2. Baseline.................................. 28 5.3. Feature-based Model........................... 29 5.3.1. Model............................... 29 5.3.2. Features.............................. 29 5.3.3. Hyperparameters......................... 30 5.4. Neural Network Models.......................... 31 5.4.1. Stacked Bi-LSTM Network.................... 31 5.4.2. BERT................................ 32 5.5. Evaluation................................. 33 6. Results and Discussion 34 6.1. Results................................... 34 6.2. Multiclass vs. Binary Classication................... 35 6.3. Feature Performance........................... 36 6.4. Feature-Based vs. Neural Networks................... 37 6.5. In- vs. Out-of-Domain........................... 39 3 7. Conclusion 41 Appendices 42 A. Amazon categories in CRC 43 A.1. Amazon ocial Dataset 1995-2015.................... 43 A.2. Amazon Review Data 1996-2018 (University of California San Diego. 43 B. Examples of review annotations with 4-class scheme 44 B.1. Class A................................... 44 B.2. Class B................................... 45 B.3. Class C................................... 45 B.4. Class D................................... 45 C. Stacked Bidirectional LSTM Architecture 46 4 Preface Constructiveness is the human way. A Policy of Kindness Dalai Lama Beyond the technical aspect of the master’s thesis from which I learned a lot and the challenges that such a large project implies, the subject I chose oered me a number of unexpected key takeaways. I have learned how to give better feedback, be more positive and measure the impact of words, but also how to receive feedback and capitalize on it. I would like to thank my supervisor, Joakim Nivre, for all the judicious advice he gave me and for bringing a lot of experience into my project. Many thanks to Clara, who has shown tremendous support during these four months, both in the form of advice, moral support and proofreading, and without whom the project would have been much harder. Last but not least, thanks to my friends and family, from whom I was far away during the project, for their remote support and many encouragements. 5 1. Introduction Constructiveness has always been a core component of improvement-oriented human communications, and plays a key role in feedback systems. Instead of giving feedback by simply pointing out mistakes or attempting to hurt, constructiveness can be used through argumentation and respectful discourse techniques in order to capitalize on these past mistakes for future improvements. Similarly, there are many ways of giving positive feedback, but constructive positive feedback is usually supported by relevant examples and details. With the rapid expansion of online communities in the past decades, moderation tools have been continuously developed to face the spread of toxicity and hate in such places. It is important for these media and businesses that the space specically designed for ows of ideas, constructive feedback and respectful discussions is not polluted by a minority of disrupters. On the other side of this perpetual ght to make the internet a safer place, the recent NLP task of constructiveness analysis has arisen, and addresses the promotion of informative online content that aims for general improvement. While it is critical to ensure that a minimal respect standard is followed by users when posting feedback (Reich, 2011), boosting the exposure of constructive comments is twofold: rst, it can have a positive impact on the way other users act within the same environment, through mimetic isomorphism (DiMaggio and Powell, 1983) — second, it allows users to have an immediate access to the most informative content via means of ltering. The latter is perfectly illustrated by the New York Times newspaper,1 who employs a team of human moderators in order to pick and highlight remarkably valuable article comments on a broad range of subjects, referred to as the NYT Picks (Diakopoulos, 2015). Ghose and Ipeirotis (2006) also designed a tool that promotes helpful reviews on Amazon,2 usable by product manufacturers to retrieve the most insightful comments and by customers to have rapid access to missing information, for example. 1.1. Purpose Most of the previous work in this area has focused on the analysis and classication of online news article comments. This type of data is convenient in the way that users often express their points of view and opinion with a certain degree of constructiveness. We take a dierent approach: we believe that constructiveness modeling can also be performed on product feedback, where customers relate their personal experience about items they acquired. Although the intention clearly diers from news article comments which are more oriented towards inter-user discussion, product reviews still exhibit a wide range of explicit argumentation and informativeness features. Moreover, most of the constructiveness classication work has used a binary framework, i.e. constructive or not constructive. This setting is broadly used since it is the simplest form of classication, and usually oers a good balance between realisticness and performance. For instance, sentiment analysis is often performed in a binary 1https://www.nytimes.com 2https://www.amazon.com 6 setting although it is known that more target classes can be legitimately added, e.g. “neutral”. We believe that classifying the constructiveness of user inputs in a binary framework is not representative of the concept of constructiveness itself. For instance, the three product reviews below show increasingly constructive features: (a). That just sucks!! (b). My daughter loves these paints! She got them for Christmas and uses them almost daily. (c). This is a rugged great lighting tripod that will support a fair amount of weight safely. The only issue I had was there wasn’t detailed instructions on how to set it up. One of the extensions was actually inside a part of the unit and it took some time for me to actually nd that part to make it all come together. Overall great product just wished they had done a little bit better on the instructions. This setup raises a few questions: where does the threshold between constructive and non-constructive lie? Review (b) is clearly less destructive than review (a), but also less constructive than review (c), therefore where does it stand in

Load more