A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism

Total Page:16

File Type:pdf, Size:1020Kb

A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism electronics Review A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism Eva Lieskovská *, Maroš Jakubec, Roman Jarina and Michal Chmulík Faculty of Electrical Engineering and Information Technology, University of Žilina, Univerzitná 8215/1, 010 26 Žilina, Slovakia; [email protected] (M.J.); [email protected] (R.J.); [email protected] (M.C.) * Correspondence: [email protected] Abstract: Emotions are an integral part of human interactions and are significant factors in determin- ing user satisfaction or customer opinion. speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown as suitable tools for mining information that is unevenly time distributed in multimedia content. The attention mechanism has been recently incorporated in DNN architectures to emphasise also emotional salient information. This paper provides a review of the recent development in SER and also examines the impact of various attention mechanisms on SER performance. Overall comparison of the system accuracies is performed on a widely used IEMOCAP benchmark database. Citation: Lieskovská, E.; Jakubec, M.; Keywords: speech emotion recognition; deep learning; attention mechanism; recurrent neural Jarina, R.; Chmulík, M. A Review on network; long short-term memory Speech Emotion Recognition Using Deep Learning and Attention Mechanism. Electronics 2021, 10, 1163. https://doi.org/10.3390/ 1. Introduction electronics10101163 The aim of human–computer interaction (HCI) is not only to create a more effective and natural communication interface between people and computers, but its focus also lies Academic Editors: Matus Pleva, on creating the aesthetic design, pleasant user experience, help in human development, on- Yuan-Fu Liao, Patrick Bours and line learning improvement, etc. Since emotions form an integral part of human interactions, Chiman Kwan they have naturally become an important aspect of the development of HCI-based applica- tions. Emotions can be technologically captured and assessed in a variety of ways, such Received: 11 March 2021 as facial expressions, physiological signals, or speech. With the intention of creating more Accepted: 9 May 2021 Published: 13 May 2021 natural and intuitive communication between humans and computers, emotions conveyed through signals should be correctly detected and appropriately processed. Throughout Publisher’s Note: MDPI stays neutral the last two decades of research focused on automatic emotion recognition, many machine with regard to jurisdictional claims in learning techniques have been developed and constantly improved. published maps and institutional affil- Emotion recognition is used in a wide variety of applications. Anger detection can iations. serve as a quality measurement for voice portals [1] or call centres. It allows adapting provided services to the emotional state of customers accordingly. In civil aviation, moni- toring the stress of aircraft pilots can help reduce the rate of a possible aircraft accident. Many researchers, who seek to enhance players’ experiences with video games and to keep them motivated, have been incorporating the emotion recognition module into their Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. products. Hossain et al. [2] used multimodal emotion recognition for quality improvement This article is an open access article of a cloud-based gaming experience through emotion-aware screen effects. The aim is to distributed under the terms and increase players’ engagement by adjusting the game in accordance with their emotions. In conditions of the Creative Commons the area of mental health care, a psychiatric counselling service with a chatbot is suggested Attribution (CC BY) license (https:// in [3]. The basic concept consists of the analysis of input text, voice, and visual clues in creativecommons.org/licenses/by/ order to assess the subject’s psychiatric disorder and inform about diagnosis and treatment. 4.0/). Another suggestion for emotion recognition application is a conversational chatbot, where Electronics 2021, 10, 1163. https://doi.org/10.3390/electronics10101163 https://www.mdpi.com/journal/electronics Electronics 2021, 10, x FOR PEER REVIEW 2 of 29 Electronics 2021, 10, 1163 clues in order to assess the subject’s psychiatric disorder and inform about diagnosis2 ofand 29 treatment. Another suggestion for emotion recognition application is a conversational chatbot, where speech emotion identification can play a role in better conversation [4]. A speechreal-time emotion SER application identification should can playfind a an role op intimal better trade conversation-off between [4]. Aless real-time computing SER applicationpower, fast shouldprocessing find times an optimal, and a trade-off high degree between of accuracy. less computing power, fast processing times,In and this a review, high degree we focus of accuracy. on works dealing with the processing of acoustic clues from speechIn to this recogni review,se wethe focusspeaker on’s works emotions. dealing The with task theof speech processing emotion of acoustic recognition clues (SER) from speechis traditionally to recognise divided the speaker’s into two emotions.main parts: The feature task of extraction speech emotion and classification recognition (SER), as de- is traditionallypicted in Figure divided 1. During into two the mainfeature parts: extraction feature stage, extraction a speech and signal classification, is converted as depicted to nu- inmerical Figure values1. During using the various feature extractionfront-end signal stage, aprocessing speech signal techniques. is converted Extracted to numerical feature valuesvectors using have variousa compact front-end form and signal ideally processing should techniques. capture essential Extracted information feature vectors from have the asignal. compact In the form back and-end, ideally an appropriate should capture classifier essential is selected information according from theto the signal. task Into the be back-end,performed. an appropriate classifier is selected according to the task to be performed. Figure 1. Block scheme of general speechspeech emotion recognition system.system. Examples of widely used acoustic features are mel-frequencymel-frequency cepstralcepstral coefficientscoefficients (MFCCs), linear prediction cepstral coefficientscoefficients (LPCC), short-timeshort-time energy, fundamentalfundamental frequencyfrequency (F0), (F0), f formantsormants [5,6] [5,6,], etc etc.. Traditional Traditional classification classification techniques techniques include include probabil- proba- bilisticistic models models such such asas the the Gaussian Gaussian mixture mixture model model (GMM) (GMM) [6 [6––8]8],, h hiddenidden Markov modelmodel (HMM) [[9]9],, andand support support vector vector machine machine (SVM (SVM [10 [10–12–].12] Over. Over the the years years of research, of research, also also var- iousvarious artificial artificial neural neural network network architectures architectures have have been been utilised, utili fromsed, from the simplest the simplest multilayer mul- perceptrontilayer perceptron (MLP) [(MLP)8] through [8] through extreme extreme learning learning machine machine (ELM) [(ELM)13], convolutional[13], convolutional neural networksneural networks (CNNs) (CNNs) [14,15], [14,15] to deep, to architectures deep architectures of residual of residual neural networks neural networks (ResNets) (Res- [16] andNets) recurrent [16] and neuralrecurrent networks neural (RNNs)networks [17 (RNNs),18]. In [17,18] particular,. In particular, long short-term long short memory-term (LSTM)memory and (LSTM) gated and recurrent gated recurrent units (GRU)-based units (GRU) neural-based networks neural networks (NNs), as (NN state-of-the-arts), as state- solutionsof-the-art insolutions time-sequence in time modelling,-sequence havemodelling, been widely have been utilised widely in speech utilised signal in speech modelling. sig- Innal addition, modelling. various In addition, end-to-end various architectures end-to-end havearchitectures been proposed have been to learnproposed jointly to learn both extractionjointly both of extraction features and of features classification and classification [15,19,20]. [15,19,20]. Besides LSTM and GRU networks, the introduction of an attention mechanism (AM) in deepdeep learninglearning may may be be considered considered as as another another milestone milestone in sequentialin sequential data data processing. processing. The purposeThe purpose of AM of is,AM as is, with as with human human visual visual attention, attention, to select to select relevant relevant information information and filter and outfilter irrelevant out irrelevant ones. ones. The attention The attention mechanism, mechanism first, introducedfirst introduced for a machinefor a machine translation trans- tasklation [21 task], has [21] become, has become an essential an essential component component of neural of neural architectures. architectures. Incorporating Incorporating AM intoAM encoder–decoder-basedinto encoder–decoder-based neural neural architectures architectures significantly significantly boosted boosted the performance the perfor- of machinemance of translation machine translation even for long even sequences for long [sequences21,22].
Recommended publications
  • Neuroticism and Facial Emotion Recognition in Healthy Adults
    First Impact Factor released in June 2010 bs_bs_banner and now listed in MEDLINE! Early Intervention in Psychiatry 2015; ••: ••–•• doi:10.1111/eip.12212 Brief Report Neuroticism and facial emotion recognition in healthy adults Sanja Andric,1 Nadja P. Maric,1,2 Goran Knezevic,3 Marina Mihaljevic,2 Tijana Mirjanic,4 Eva Velthorst5,6 and Jim van Os7,8 Abstract 1School of Medicine, University of Results: A significant negative corre- Belgrade, 2Clinic for Psychiatry, Clinical Aim: The aim of the present study lation between the degree of neuroti- 3 Centre of Serbia, Faculty of Philosophy, was to examine whether healthy cism and the percentage of correct Department of Psychology, University of individuals with higher levels of neu- answers on DFAR was found only for Belgrade, Belgrade, 4Special Hospital for roticism, a robust independent pre- happy facial expression (significant Psychiatric Disorders Kovin, Kovin, Serbia; after applying Bonferroni correction). 5Department of Psychiatry, Academic dictor of psychopathology, exhibit Medical Centre University of Amsterdam, altered facial emotion recognition Amsterdam, 6Departments of Psychiatry performance. Conclusions: Altered sensitivity to the and Preventive Medicine, Icahn School of emotional context represents a useful Medicine, New York, USA; 7South Methods: Facial emotion recognition and easy way to obtain cognitive phe- Limburg Mental Health Research and accuracy was investigated in 104 notype that correlates strongly with Teaching Network, EURON, Maastricht healthy adults using the Degraded
    [Show full text]
  • How Deep Neural Networks Can Improve Emotion Recognition on Video Data
    HOW DEEP NEURAL NETWORKS CAN IMPROVE EMOTION RECOGNITION ON VIDEO DATA Pooya Khorrami1, Tom Le Paine1, Kevin Brady2, Charlie Dagli2, Thomas S. Huang1 1 Beckman Institute, University of Illinois at Urbana-Champaign 2 MIT Lincoln Laboratory 1 fpkhorra2, paine1, [email protected] 2 fkbrady, [email protected] ABSTRACT An alternative way to model the space of possible emo- There have been many impressive results obtained us- tions is to use a dimensional approach [11] where a person’s ing deep learning for emotion recognition tasks in the last emotions can be described using a low-dimensional signal few years. In this work, we present a system that per- (typically 2 or 3 dimensions). The most common dimensions forms emotion recognition on video data using both con- are (i) arousal and (ii) valence. Arousal measures how en- volutional neural networks (CNNs) and recurrent neural net- gaged or apathetic a subject appears while valence measures works (RNNs). We present our findings on videos from how positive or negative a subject appears. the Audio/Visual+Emotion Challenge (AV+EC2015). In our Dimensional approaches have two advantages over cat- experiments, we analyze the effects of several hyperparam- egorical approaches. The first being that dimensional ap- eters on overall performance while also achieving superior proaches can describe a larger set of emotions. Specifically, performance to the baseline and other competing methods. the arousal and valence scores define a two dimensional plane while the six basic emotions are represented as points in said Index Terms— Emotion Recognition, Convolutional plane. The second advantage is dimensional approaches can Neural Networks, Recurrent Neural Networks, Deep Learn- output time-continuous labels which allows for more realistic ing, Video Processing modeling of emotion over time.
    [Show full text]
  • Emotion Recognition Depends on Subjective Emotional Experience and Not on Facial
    Emotion recognition depends on subjective emotional experience and not on facial expressivity: evidence from traumatic brain injury Travis Wearne1, Katherine Osborne-Crowley1, Hannah Rosenberg1, Marie Dethier2 & Skye McDonald1 1 School of Psychology, University of New South Wales, Sydney, NSW, Australia 2 Department of Psychology: Cognition and Behavior, University of Liege, Liege, Belgium Correspondence: Dr Travis A Wearne, School of Psychology, University of New South Wales, NSW, Australia, 2052 Email: [email protected] Number: (+612) 9385 3310 Abstract Background: Recognising how others feel is paramount to social situations and commonly disrupted following traumatic brain injury (TBI). This study tested whether problems identifying emotion in others following TBI is related to problems expressing or feeling emotion in oneself, as theoretical models place emotion perception in the context of accurate encoding and/or shared emotional experiences. Methods: Individuals with TBI (n = 27; 20 males) and controls (n = 28; 16 males) were tested on an emotion recognition task, and asked to adopt facial expressions and relay emotional memories according to the presentation of stimuli (word & photos). After each trial, participants were asked to self-report their feelings of happiness, anger and sadness. Judges that were blind to the presentation of stimuli assessed emotional facial expressivity. Results: Emotional experience was a unique predictor of affect recognition across all emotions while facial expressivity did not contribute to any of the regression models. Furthermore, difficulties in recognising emotion for individuals with TBI were no longer evident after cognitive ability and experience of emotion were entered into the analyses. Conclusions: Emotion perceptual difficulties following TBI may stem from an inability to experience affective states and may tie in with alexythymia in clinical conditions.
    [Show full text]
  • Emotion Detection in Twitter Posts: a Rule-Based Algorithm for Annotated Data Acquisition
    2020 International Conference on Computational Science and Computational Intelligence (CSCI) Emotion detection in Twitter posts: a rule-based algorithm for annotated data acquisition Maria Krommyda Anastatios Rigos Institute of Communication and Computer Systems Institute of Communication and Computer Systems Athens, Greece Athens, Greece [email protected] [email protected] Kostas Bouklas Angelos Amditis Institute of Communication and Computer Systems Institute of Communication and Computer Systems Athens, Greece Athens, Greece [email protected] [email protected] Abstract—Social media analysis plays a key role to the person that provided the text and it is categorized as positive, understanding of the public’s opinion regarding recent events negative, or neutral. Such analysis is mainly used for movies, and decisions, to the design and management of advertising products and persons, when measuring their appeal to the campaigns as well as to the planning of next steps and mitigation actions for public relationship initiatives. Significant effort has public is of interest [4] and require extended quantities of text been dedicated recently to the development of data analysis collected over a long period of time from different sources to algorithms that will perform, in an automated way, sentiment ensure an unbiased and objective output. analysis over publicly available text. Most of the available While this technique is very popular and well established, research work, focuses on binary categorizing text as positive or with many important use cases and applications, there are negative without further investigating the emotions leading to that categorization. The current needs, however, for in-depth analysis other equally important use cases where such analysis fail to of the available content combined with the complexity and multi- provided the needed information.
    [Show full text]
  • Facial Emotion Recognition
    Issue 1 | 2021 Facial Emotion Recognition Facial Emotion Recognition (FER) is the technology that analyses facial expressions from both static images and videos in order to reveal information on one’s emotional state. The complexity of facial expressions, the potential use of the technology in any context, and the involvement of new technologies such as artificial intelligence raise significant privacy risks. I. What is Facial Emotion FER analysis comprises three steps: a) face de- Recognition? tection, b) facial expression detection, c) ex- pression classification to an emotional state Facial Emotion Recognition is a technology used (Figure 1). Emotion detection is based on the ana- for analysing sentiments by different sources, lysis of facial landmark positions (e.g. end of nose, such as pictures and videos. It belongs to the eyebrows). Furthermore, in videos, changes in those family of technologies often referred to as ‘affective positions are also analysed, in order to identify con- computing’, a multidisciplinary field of research on tractions in a group of facial muscles (Ko 2018). De- computer’s capabilities to recognise and interpret pending on the algorithm, facial expressions can be human emotions and affective states and it often classified to basic emotions (e.g. anger, disgust, fear, builds on Artificial Intelligence technologies. joy, sadness, and surprise) or compound emotions Facial expressions are forms of non-verbal (e.g. happily sad, happily surprised, happily disgus- communication, providing hints for human emo- ted, sadly fearful, sadly angry, sadly surprised) (Du tions. For decades, decoding such emotion expres- et al. 2014). In other cases, facial expressions could sions has been a research interest in the field of psy- be linked to physiological or mental state of mind chology (Ekman and Friesen 2003; Lang et al.
    [Show full text]
  • Deep-Learning-Based Models for Pain Recognition: a Systematic Review
    applied sciences Article Deep-Learning-Based Models for Pain Recognition: A Systematic Review Rasha M. Al-Eidan * , Hend Al-Khalifa and AbdulMalik Al-Salman College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia; [email protected] (H.A.-K.); [email protected] (A.A.-S.) * Correspondence: [email protected] Received: 31 July 2020; Accepted: 27 August 2020; Published: 29 August 2020 Abstract: Traditional standards employed for pain assessment have many limitations. One such limitation is reliability linked to inter-observer variability. Therefore, there have been many approaches to automate the task of pain recognition. Recently, deep-learning methods have appeared to solve many challenges such as feature selection and cases with a small number of data sets. This study provides a systematic review of pain-recognition systems that are based on deep-learning models for the last two years. Furthermore, it presents the major deep-learning methods used in the review papers. Finally, it provides a discussion of the challenges and open issues. Keywords: pain assessment; pain recognition; deep learning; neural network; review; dataset 1. Introduction Deep learning is an important branch of machine learning used to solve many applications. It is a powerful way of achieving supervised learning. The history of deep learning has undergone different naming conventions and developments [1]. It was first referred to as cybernetics in the 1940s–1960s. Then, in the 1980s–1990s, it became known as connectionism. Finally, the phrase deep learning began to be utilized in 2006. There are also some names that are based on human knowledge.
    [Show full text]
  • A Review of Alexithymia and Emotion Perception in Music, Odor, Taste, and Touch
    MINI REVIEW published: 30 July 2021 doi: 10.3389/fpsyg.2021.707599 Beyond Face and Voice: A Review of Alexithymia and Emotion Perception in Music, Odor, Taste, and Touch Thomas Suslow* and Anette Kersting Department of Psychosomatic Medicine and Psychotherapy, University of Leipzig Medical Center, Leipzig, Germany Alexithymia is a clinically relevant personality trait characterized by deficits in recognizing and verbalizing one’s emotions. It has been shown that alexithymia is related to an impaired perception of external emotional stimuli, but previous research focused on emotion perception from faces and voices. Since sensory modalities represent rather distinct input channels it is important to know whether alexithymia also affects emotion perception in other modalities and expressive domains. The objective of our review was to summarize and systematically assess the literature on the impact of alexithymia on the perception of emotional (or hedonic) stimuli in music, odor, taste, and touch. Eleven relevant studies were identified. On the basis of the reviewed research, it can be preliminary concluded that alexithymia might be associated with deficits Edited by: in the perception of primarily negative but also positive emotions in music and a Mathias Weymar, University of Potsdam, Germany reduced perception of aversive taste. The data available on olfaction and touch are Reviewed by: inconsistent or ambiguous and do not allow to draw conclusions. Future investigations Khatereh Borhani, would benefit from a multimethod assessment of alexithymia and control of negative Shahid Beheshti University, Iran Kristen Paula Morie, affect. Multimodal research seems necessary to advance our understanding of emotion Yale University, United States perception deficits in alexithymia and clarify the contribution of modality-specific and Jan Terock, supramodal processing impairments.
    [Show full text]
  • Image Processing with the Artificial Swarm Intelligence 86 Xiaodong Zhuang, Nikos E
    Advances in Image Analysis - Nature Inspired Methodology Dr. Xiaodong Zhuang 1 Associate Professor, Qingdao University, China 2 WSEAS Research Department, Athens, Greece Prof. Dr. Nikos E. Mastorakis 1 Professor, Technical University of Sofia, Bulgaria 2 Professor, Military Institutes of University Education, Hellenic Naval Academy, Greece 3 WSEAS Headquarters, Athens, Greece Published by WSEAS Press ISBN: 978-960-474-290-5 www.wseas.org Advances in Image Analysis – Nature Inspired Methodology Published by WSEAS Press www.wseas.org Copyright © 2011, by WSEAS Press All the copyright of the present book belongs to the World Scientific and Engineering Academy and Society Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the Editor of World Scientific and Engineering Academy and Society Press. All papers of the present volume were peer reviewed by two independent reviewers. Acceptance was granted when both reviewers' recommendations were positive. See also: http://www.worldses.org/review/index.html ISBN: 978-960-474-290-5 World Scientific and Engineering Academy and Society Preface The development of human society relies on natural resources in every area (both material and spiritual). Nature has enormous power and intelligence behind its common daily appearance, and it is generous. We learn in it and from it, virtually as part of it. Nature-inspired systems and methods have a long history in human science and technology. For example, in the area of computer science, the recent well-known ones include the artificial neural network, genetic algorithm and swarm intelligence, which solve hard problems by imitating mechanisms in nature.
    [Show full text]
  • Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances
    1 Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances Soujanya Poria1, Navonil Majumder2, Rada Mihalcea3, Eduard Hovy4 1 School of Computer Science and Engineering, NTU, Singapore 2 CIC, Instituto Politécnico Nacional, Mexico 3 Computer Science & Engineering, University of Michigan, USA 4 Language Technologies Institute, Carnegie Mellon University, USA [email protected], [email protected], [email protected], [email protected] Abstract—Emotion is intrinsic to humans and consequently conversational data. ERC can be used to analyze conversations emotion understanding is a key part of human-like artificial that take place on social media. It can also aid in analyzing intelligence (AI). Emotion recognition in conversation (ERC) is conversations in real times, which can be instrumental in legal becoming increasingly popular as a new research frontier in natu- ral language processing (NLP) due to its ability to mine opinions trials, interviews, e-health services and more. from the plethora of publicly available conversational data in Unlike vanilla emotion recognition of sentences/utterances, platforms such as Facebook, Youtube, Reddit, Twitter, and others. ERC ideally requires context modeling of the individual Moreover, it has potential applications in health-care systems (as utterances. This context can be attributed to the preceding a tool for psychological analysis), education (understanding stu- utterances, and relies on the temporal sequence of utterances. dent frustration) and more. Additionally, ERC is also extremely important for generating emotion-aware dialogues that require Compared to the recently published works on ERC [10, an understanding of the user’s emotions. Catering to these needs 11, 12], both lexicon-based [13, 8, 14] and modern deep calls for effective and scalable conversational emotion-recognition learning-based [4, 5] vanilla emotion recognition approaches algorithms.
    [Show full text]
  • Machine Learning Based Emotion Classification Using the Covid-19 Real World Worry Dataset
    Anatolian Journal of Computer Sciences Volume:6 No:1 2021 © Anatolian Science pp:24-31 ISSN:2548-1304 Machine Learning Based Emotion Classification Using The Covid-19 Real World Worry Dataset Hakan ÇAKAR1*, Abdulkadir ŞENGÜR2 *1Fırat Üniversitesi/Teknoloji Fak., Elektrik-Elektronik Müh. Bölümü, Elazığ, Türkiye ([email protected]) 2Fırat Üniversitesi/Teknoloji Fak., Elektrik-Elektronik Müh. Bölümü, Elazığ, Türkiye ([email protected]) Received Date : Sep. 27, 2020 Acceptance Date : Jan. 1, 2021 Published Date : Mar. 1, 2021 Özetçe— COVID-19 pandemic has a dramatic impact on economies and communities all around the world. With social distancing in place and various measures of lockdowns, it becomes significant to understand emotional responses on a great scale. In this paper, a study is presented that determines human emotions during COVID-19 using various machine learning (ML) approaches. To this end, various techniques such as Decision Trees (DT), Support Vector Machines (SVM), k-nearest neighbor (k-NN), Neural Networks (NN) and Naïve Bayes (NB) methods are used in determination of the human emotions. The mentioned techniques are used on a dataset namely Real World Worry dataset (RWWD) that was collected during COVID-19. The dataset, which covers eight emotions on a 9-point scale, grading their anxiety levels about the COVID-19 situation, was collected by using 2500 participants. The performance evaluation of the ML techniques on emotion prediction is carried out by using the accuracy score. Five-fold cross validation technique is also adopted in experiments. The experiment works show that the ML approaches are promising in determining the emotion in COVID- 19 RWWD.
    [Show full text]
  • Emotion Classification Based on Biophysical Signals and Machine Learning Techniques
    S S symmetry Article Emotion Classification Based on Biophysical Signals and Machine Learning Techniques Oana Bălan 1,* , Gabriela Moise 2 , Livia Petrescu 3 , Alin Moldoveanu 1 , Marius Leordeanu 1 and Florica Moldoveanu 1 1 Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, Bucharest 060042, Romania; [email protected] (A.M.); [email protected] (M.L.); fl[email protected] (F.M.) 2 Department of Computer Science, Information Technology, Mathematics and Physics (ITIMF), Petroleum-Gas University of Ploiesti, Ploiesti 100680, Romania; [email protected] 3 Faculty of Biology, University of Bucharest, Bucharest 030014, Romania; [email protected] * Correspondence: [email protected]; Tel.: +40722276571 Received: 12 November 2019; Accepted: 18 December 2019; Published: 20 December 2019 Abstract: Emotions constitute an indispensable component of our everyday life. They consist of conscious mental reactions towards objects or situations and are associated with various physiological, behavioral, and cognitive changes. In this paper, we propose a comparative analysis between different machine learning and deep learning techniques, with and without feature selection, for binarily classifying the six basic emotions, namely anger, disgust, fear, joy, sadness, and surprise, into two symmetrical categorical classes (emotion and no emotion), using the physiological recordings and subjective ratings of valence, arousal, and dominance from the DEAP (Dataset for Emotion Analysis using EEG, Physiological and Video Signals) database. The results showed that the maximum classification accuracies for each emotion were: anger: 98.02%, joy:100%, surprise: 96%, disgust: 95%, fear: 90.75%, and sadness: 90.08%. In the case of four emotions (anger, disgust, fear, and sadness), the classification accuracies were higher without feature selection.
    [Show full text]
  • DEAP: a Database for Emotion Analysis Using Physiological Signals
    IEEE TRANS. AFFECTIVE COMPUTING 1 DEAP: A Database for Emotion Analysis using Physiological Signals Sander Koelstra, Student Member, IEEE, Christian M¨uhl, Mohammad Soleymani, Student Member, IEEE, Jong-Seok Lee, Member, IEEE, Ashkan Yazdani, Touradj Ebrahimi, Member, IEEE, Thierry Pun, Member, IEEE, Anton Nijholt, Member, IEEE, Ioannis Patras, Member, IEEE Abstract—We present a multimodal dataset for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection and an online assessment tool. An extensive analysis of the participants’ ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants’ ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence and like/dislike ratings using the modalities of EEG, peripheral physiological signals and multimedia content analysis. Finally, decision fusion of the classification results from the different modalities is performed. The dataset is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods. Index Terms—Emotion classification, EEG, Physiological signals, Signal processing, Pattern classification, Affective computing. ✦ 1 INTRODUCTION information retrieval. Affective characteristics of multi- media are important features for describing multime- MOTION is a psycho-physiological process triggered dia content and can be presented by such emotional by conscious and/or unconscious perception of an E tags.
    [Show full text]