Affective Video Content Analysis: A Multidisciplinary Insight

Yoann Baveye, Christel Chamaret, Emmanuel Dellandréa, and Liming Chen

To cite this version: Yoann Baveye, Christel Chamaret, Emmanuel Dellandréa, Liming Chen. Affective Video Content Analysis: A Multidisciplinary Insight. IEEE Transactions on Affective Computing, Institute of Electrical and Electronics Engineers, 2017, 10.1109/TAFFC.2017.2661284. HAL Id: hal-01489729, https://hal.archives-ouvertes.fr/hal-01489729, submitted on 14 Mar 2017.

• Y. Baveye is with Ixpel, an expertise cell of CAPACITÉS, and is with the University of Nantes, Centre National de la Recherche Scientifique, LS2N, UMR6004, France. E-mail: [email protected]
• C. Chamaret is with Technicolor, 975, avenue des Champs Blancs, 35576 Cesson Sévigné, France. E-mail: [email protected]
• E. Dellandréa and L. Chen are with the Université de Lyon, Centre National de la Recherche Scientifique, Ecole Centrale de Lyon, LIRIS, UMR5205, F-69134, France. E-mail: {emmanuel.dellandrea, [email protected]}

Abstract—In our present society, the cinema has become one of the major forms of entertainment, providing unlimited contexts of emotion elicitation for the emotional needs of human beings. Since emotions are universal and shape all aspects of our interpersonal and intellectual experience, their study has proved to be a highly multidisciplinary research field, ranging from psychology, sociology, and neuroscience to computer science. However, affective multimedia content analysis work from the computer science community benefits very little from the progress achieved in these other research fields. In this paper, a multidisciplinary state of the art for affective movie content analysis is given, in order to promote and encourage exchanges between researchers from a very wide range of fields. In contrast to other state-of-the-art papers on affective video content analysis, this work confronts the ideas and models of psychology, sociology, neuroscience, and computer science. The concepts of aesthetic emotions and emotion induction, as well as the different representations of emotions, are introduced, based on psychological and sociological theories. Previous global and continuous affective video content analysis work, including video emotion recognition and violence detection, is also presented in order to point out the limitations of affective video content analysis work.

Index Terms—Multidisciplinary review, Video content analysis, Cognitive models, Affective computing, State of the art.

1 INTRODUCTION

According to legend, when "L'arrivée d'un train en gare de La Ciotat", directed and produced by Auguste and Louis Lumière, was first screened in 1896, the audience was so terrified at the sight of the oncoming locomotive that people screamed and tried to hide under their seats. Nowadays, film-lovers are more hardened but still enjoy the almost unlimited potential of movies for experiencing emotions. With the massive movie collection available online through popular on-demand streaming media websites such as Netflix (https://www.netflix.com/) or M-GO (http://www.mgo.com/) increasing day after day, film spectators can experience a large variety of emotions everywhere.

Under these circumstances, knowing in advance the emotions that a movie is likely to elicit from its viewers is highly beneficial: not only to improve accuracy for video indexing and summarization (e.g. [1], [2]), but also for mood-based personalized content delivery [3]. While major progress has been achieved in computer vision for visual object detection, scene understanding and high-level concept recognition, a natural further step is the modeling and recognition of affective concepts. This explains why the affective video content analysis research topic has emerged and attracted increasingly more attention in recent years.

Affective video content analysis aims at the automatic recognition of emotions elicited by videos. There are three perspectives for affective movie content analysis work, each related to the detection of a specific type of emotion: the intended, induced and expected emotion [3]. The intended emotion is the emotion that the film maker wants to induce in the viewers. The induced emotion is the emotion that a viewer feels in response to the movie. The expected emotion is the emotion that the majority of the audience feels in response to the same content. In other words, the expected emotion is the expected value of the experienced (i.e. induced) emotion in a population. While the induced emotion is subjective and context dependent, the expected emotion can be considered objective, as it reflects the more-or-less unanimous response of a general audience to a given stimulus [3]. Intended and expected emotions do not always match [4], [5]: a movie can be unsuccessful in conveying the intended emotion (e.g., viewers laughing while watching a horror movie intended to elicit fear). The degree of effectiveness with which a movie creates the desired emotion in the viewer can be a basic criterion for assessing movie quality [6].
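To make the relation between induced and expected emotion concrete, the expected emotion can be written as a population average of induced emotions. The notation below (viewers $v$ drawn from a population $\mathcal{P}$, induced emotion $e_v$ represented for instance as a valence-arousal pair) is introduced here purely for illustration and is not taken from the cited works:

\[
e_{\text{expected}} \;=\; \mathbb{E}_{v \sim \mathcal{P}}\!\left[ e_v \right] \;\approx\; \frac{1}{N} \sum_{i=1}^{N} e_{v_i},
\]

where the approximation averages the induced emotions reported by $N$ sampled viewers (annotators). This is consistent with treating the expected emotion, unlike the induced one, as an approximately objective property of the stimulus.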
A large body of work has been proposed in the literature to tackle this highly challenging task. However, a review of the state of the art shows that affective video content analysis work from the computer science community benefits very little from the progress made in other research fields, including psychology, sociology, and neuroscience. Indeed, it appears from previous works that the main approach for analyzing the affective content of videos is to use machine learning to build, from an annotated dataset, models such as Hidden Markov Models (HMMs), Support Vector Machines for Regression (SVRs), and more recently Convolutional Neural Networks (CNNs).
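As a minimal sketch of this dominant feature-plus-regressor pipeline, the snippet below trains a Support Vector Regressor on pre-extracted audiovisual features to predict a continuous valence score. The feature matrix, labels, and hyperparameters are placeholders chosen for illustration only; they are not taken from any of the works cited above.

```python
# Sketch of the typical "audiovisual features -> regressor -> expected emotion" pipeline.
# X: one row of pre-extracted audiovisual features per movie excerpt (placeholder data).
# y: continuous expected-valence label per excerpt, here drawn at random for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))         # 200 excerpts x 64 audiovisual features (placeholder)
y = rng.uniform(-1.0, 1.0, size=200)   # valence in [-1, 1] (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize the features, then fit an SVR, one of the model families mentioned above.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)

print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```

The same supervised-learning skeleton applies to the other model families mentioned above (HMMs, CNNs): only the learner and the feature representation change, not the overall setup.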
In contrast This idea of universality was reused by Ekman and Friesen to previous state of the art studies [7], [8], this work thus to develop the Emotion Facial Action Coding System (EM- confronts the ideas and models from psychology, sociology, FACS) [12], which is a simplified although less common neuroscience, and computer science with a focus on the version of FACS [13] designed explicitly for the coding of emotions elicited by movies. General psychological emo- facial behavior related only to emotion. Ekman considers tional theories are described alongside models for aesthetic emotions to be discrete states that are associated with facial emotions, focusing on emotions in response to a work of art. expressions. Consequently, he postulates that there are a With these theoretical models in mind, previous works on fixed number of emotions. Plutchik agreed with Ekman’s continuous and global affective video content analysis are biologically driven perspective and theorized that basic presented, including video emotion recognition and violence emotions are biologically primitive and have evolved in detection, which are related to the expected affect estimation order to increase the reproductive fitness of the animal [14]. from audiovisual features. Studies, known as implicit video The term basic is used in the literature to describe three affective content analysis, have also been conducted using perspectives [15]: the spontaneous response of users watching the videos (e.g., • Basic emotions are different from each other and facial recordings, physiological responses). However, this show unique characteristics such as physiological topic has already been investigated in [7]. This paper focuses and behavioral responses rather on direct video affective content analysis studies, using • Basic emotions are a learned response to deal with audiovisual features extracted from the videos to infer a fundamental life tasks given type of emotion. • Basic emotions are elemental, and combine together The paper is organized as follows. Section 2 describes to form complex emotions main psychological emotional theories, gives an overview of neuroscience work, and defines the concepts of aesthetic More specifically, the basic emotion theory identifies emotions and emotion induction, as well as different repre- several criteria for an emotion to be considered basic [15], sentations of emotions. Section 3 presents previous global such as: distinctive universal signals, distinctive physiology, and continuous affective video content analysis workand distinctive universals in antecedent events (i.e., some stimuli lists current existing affective video databases.Limitations render a universal response), presence in other primates,