Application of Machine Learning in Automatic Sentiment Recognition from Human Speech
Zhang Liu
Anglo-Chinese Junior College
Singapore

Ng EYK
College of Engineering
Nanyang Technological University (NTU)
Singapore
[email protected]

Abstract— Opinions and sentiments are central to almost all human activities and have a wide range of applications. As many decision makers turn to social media due to the large volume of opinion data available, efficient and accurate sentiment analysis is necessary to extract those data. Hence, text sentiment analysis has recently become a popular field and has attracted many researchers. However, extracting sentiments from audio speech remains a challenge. This project explores the possibility of applying supervised Machine Learning to recognising sentiments in English utterances at the sentence level. In addition, the project also aims to examine the effect of combining acoustic and linguistic features on classification accuracy. Six audio tracks are randomly selected as training data from 40 YouTube videos (monologues) with a strong presence of sentiments. Speakers express sentiments towards products, films, or political events. These sentiments are manually labelled as negative or positive based on the independent judgement of 3 experimenters. A wide range of acoustic and linguistic features are then analysed and extracted using sound editing and text mining tools respectively. A novel approach is proposed, which uses a simplified sentiment score to integrate linguistic features and estimate sentiment valence. This approach improves negation analysis and hence increases overall accuracy. Results show that when both linguistic and acoustic features are used, the accuracy of sentiment recognition improves significantly, and that excellent prediction is achieved when the four classifiers are trained respectively, with kNN and Neural Network achieving higher accuracies. Possible sources of error and inherent challenges of audio sentiment analysis are discussed to provide potential directions for future research.

Keywords – Sentiment Analysis; Natural Language Processing; Machine Learning; Affective Computing; Data Analytics; Speech Processing; Computational Linguistics.
I. INTRODUCTION

Sentiment analysis, also called opinion mining, is the field of study that analyses people's opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes [1]. Opinions and sentiments are central to almost all human activities and have a wide range of applications. As many decision makers turn to social media due to the large volume of opinion data available, efficient and accurate sentiment analysis is necessary to extract those data. Business organisations in different sectors use social media to find out consumer opinions and improve their products and services. Political party leaders need to know the current public sentiment to come up with campaign strategies. Government agencies also monitor citizens' opinions on social media. Police agencies, for example, detect criminal intents and cyber threats by analysing sentiment valence in social media posts. In addition, sentiment information can be used to make predictions, such as in the stock market, electoral politics and even box office revenue. Moreover, sentiment analysis that moves towards emotion recognition can potentially enhance psychiatric treatment, as the emotions of patients are more accurately identified.

Since 2000, researchers have made many successful attempts in text sentiment analysis. In comparison, audio sentiment analysis does not seem to receive as much attention. It is, however, equally significant. Many people in contemporary society share their opinions on online multimedia platforms such as YouTube videos, Instagram stories, TV talk shows and TED talks. It is difficult to manually classify the sentiments in them due to the sheer amount of data. With the help of machine automation, we can recognise, with an acceptable accuracy, the general sentiments about certain products, movies and socio-political events, hence aiding the decision-making processes of corporations, societal organisations and governments.

This project explores the possibility of using a machine learning approach to recognise sentiments accurately and automatically from natural audio speech in English. In addition, the project also aims to examine the effect of combining acoustic and linguistic features on classification accuracy. Training data consist of 150 speech segments extracted from 6 YouTube videos of different genres. Both acoustic features and linguistic features will be examined in order to increase the accuracy of automatic sentiment recognition. Sentiments will be categorised into 2 target classes, positive and negative.

II. LITERATURE REVIEW

There were previous attempts to combine acoustic and linguistic features of speech in sentiment analysis. Chul & Narayanan (2005) [2] explored the detection of domain-specific emotions using language and discourse information in conjunction with acoustic correlates of emotion in speech signals. The specific focus was a case study of detecting negative and non-negative emotions using spoken language data obtained from a call center application. Results showed that combining all the information, rather than using only acoustic information, improved emotion classification by 40.7% for males and 36.4% for females (a linear discriminant classifier was used for the acoustic information). This study suggested a comprehensive range of features and provided some insights for my project: acoustic features (Fundamental Frequency (F0), Energy, Duration, Formants) and textual features (emotional salience, discourse information). However, with its speech data collected from a call center, the research focused on emotions in human-machine interactions, rather than in natural human speech.

Another study, Kaushik, Sangwan & Hansen (2013) [3], provided an alternative source of speech data: YouTube videos. The authors proposed a system for automatic sentiment detection in natural audio streams such as those found on YouTube. The proposed technique uses Part of Speech (POS) tagging and Maximum Entropy (ME) modeling to develop a text-based sentiment detection model. Using decoded Automatic Speech Recognition (ASR) transcripts and the ME sentiment model, the proposed system is able to estimate sentiments in YouTube videos. Their results showed that it is possible to perform sentiment analysis on natural spontaneous speech data despite poor word error rates. This study provided a systematic approach and proved that such audio sentiment analysis is possible. It did not, however, include enough acoustic features of audio speech, possibly due to the limitation of document-level analysis.

Ding et al. proposed a holistic lexicon-based approach [4] to solve the problem of insufficient acoustic features by exploiting external evidence and the linguistic conventions of natural language expressions. Inspired by the above work, a simplified sentiment score model is proposed in this project. The proposed method provides sentence-level audio speech analysis. The details of the method are explained in Section III.

III. METHODOLOGY

Figure 1. An overview of the methodology. (The flowchart summarises the data collection and processing pipeline: speech data from YouTube videos are transcribed into text (.txt) by Automatic Speech Recognition (ASR) software and converted into speech signals (.wav); the audio is segmented into sentence-level speech data using the sound editing software Praat; linguistic analysis applies sentiment lexicons and negation analysis to count emotionally salient words (numbers of positive words, negative words and negators), which a formula combines into a sentiment score; acoustic analysis using Praat extracts Low Level Descriptors (LLDs) covering intensity (amplitude, energy, power), pitch (average, maximum, standard deviation) and voice quality (HNR, jitter, shimmer); the linguistic features, the acoustic features, and both combined are then passed to four machine learning classifiers (Naive Bayes, Neural Network, SVM and kNN) whose accuracies are compared.)
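As a concrete illustration of the acoustic-analysis branch in Figure 1, the sketch below extracts the listed Low Level Descriptors from one sentence-level .wav segment. It assumes the Praat analyses are scripted through the parselmouth Python library rather than the Praat application used in the project, and the pitch range and jitter/shimmer settings shown are generic defaults, not the project's actual settings; the function name and feature keys are illustrative.

import numpy as np
import parselmouth
from parselmouth.praat import call


def acoustic_features(wav_path):
    """Extract the Figure 1 LLDs (pitch, intensity, voice quality) from one segment."""
    snd = parselmouth.Sound(wav_path)

    # Pitch contour; keep voiced frames only for the avg/max/sd statistics (Hz).
    f0 = snd.to_pitch().selected_array["frequency"]
    f0 = f0[f0 > 0]

    # Intensity contour in dB, used here as a stand-in for amplitude/energy/power.
    intensity = snd.to_intensity()

    # Voice quality: harmonics-to-noise ratio (HNR), jitter and shimmer.
    harmonicity = snd.to_harmonicity()
    points = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, points], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)

    return {
        "pitch_avg": float(np.mean(f0)) if f0.size else 0.0,
        "pitch_max": float(np.max(f0)) if f0.size else 0.0,
        "pitch_sd": float(np.std(f0)) if f0.size else 0.0,
        "intensity_mean": float(np.mean(intensity.values)),
        "hnr": call(harmonicity, "Get mean", 0, 0),
        "jitter": jitter,
        "shimmer": shimmer,
    }

# Example: acoustic_features("segment_001.wav") returns one row of acoustic features,
# which would be paired with the segment's linguistic features and sentiment label.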
3.1 DATABASE

The speech data used in the experiments are obtained from YouTube, a social media platform. The source is chosen because thousands of YouTube users share their personal opinions or reviews on their channels, so there is a huge amount of accessible speech data containing sentiment valence. More importantly, their ways of speaking are usually closest to natural, spontaneous human speech. Six videos are randomly selected from 40 YouTube videos that have a strong presence of negative or positive sentiments. Subject matters include: 1) Product Review; 2) Movie Review; 3) Political Opinion.

During the pre-processing stage, the videos are converted into .wav files. Speech transcriptions are generated using the Automatic Speech Recognition (ASR) software Speechmatics (https://www.speechmatics.com) and checked manually to increase reliability. Each sound file (.wav) is then edited in the vocal toolkit Praat (http://www.fon.hum.uva.nl/praat/). The TextGrid annotation (as shown in Figure 2) includes 2 tiers, transcription text and numbering, which are useful in keeping track of the data. Meanwhile, the sound file is segmented into smaller sections containing 1 to 5 sentences of relevant meaning and the same sentiment. Each segment is pre-assigned a sentiment label ('negative' or 'positive') based on the independent judgement of 3 experimenters so as to minimise bias and subjective errors. There is a total of 150 sound

[...] positively connoted words in each segment are then counted respectively and the numerical values are stored in the training data set.
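To illustrate the word-counting step described above, the following is a minimal sketch of how per-segment counts of positive words, negative words and negators could be derived from a segment transcript and folded into a count-based sentiment score. The tiny lexicons, the negator list and the scoring rule here are placeholder assumptions for illustration only; they are not the project's sentiment lexicons or its simplified sentiment score formula.

import re

POSITIVE_WORDS = {"good", "great", "love", "excellent", "amazing"}  # placeholder lexicon
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "boring"}     # placeholder lexicon
NEGATORS = {"not", "never", "no", "nothing"}                        # placeholder negator list


def linguistic_features(transcript):
    """Count sentiment-bearing words in one segment and derive a simple score."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n_pos = sum(t in POSITIVE_WORDS for t in tokens)
    n_neg = sum(t in NEGATIVE_WORDS for t in tokens)
    n_negators = sum(t in NEGATORS for t in tokens)

    # Placeholder scoring rule: positive minus negative counts, with the sign
    # flipped when an odd number of negators is present (a crude negation analysis).
    raw = n_pos - n_neg
    score = -raw if n_negators % 2 else raw

    return {
        "positive_words": n_pos,
        "negative_words": n_neg,
        "negators": n_negators,
        "sentiment_score": score,
    }

# Example: linguistic_features("the film was not bad at all")
# -> {"positive_words": 0, "negative_words": 1, "negators": 1, "sentiment_score": 1}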