International Journal of Web Portals, 5(3), 19-39, July-September 2013

Sharing Video Emotional Information in the Web

Eva Oliveira, Digital Games Research Centre (DIGARC), Polytechnic Institute of Cávado and Ave, Barcelos, Portugal
Teresa Chambel, LaSIGE, University of Lisbon FCUL, Lisbon, Portugal
Nuno Magalhães Ribeiro, Centro de Estudos e Recursos Multimediáticos (CEREM), Universidade Fernando Pessoa, Porto, Portugal

ABSTRACT

Video growth over the Internet has changed the way users search, browse and view video content. Watching movies over the Internet is increasing and becoming a common pastime. The possibility of streaming Internet content to TV, together with advances in video compression and video streaming, has made this recent modality of watching movies easy and practical. Web portals, as a worldwide means of multimedia data access, need to have their contents properly classified in order to meet users' needs and expectations. The authors propose a set of semantic descriptors based both on user physiological signals, captured while watching videos, and on video low-level feature extraction. These XML-based descriptors contribute to the creation of automatic affective meta-information that will not only enhance a web-based video recommendation system based on emotional information, but also enhance the search and retrieval of videos' affective content from both users' personal classifications and content classifications in the context of a web portal.

Keywords: Domain Specific Extensible Markup Language (XML) Schema, User Physiological Signals, Video Access, Video Affective Classification, Video Recommendation, Video Search

1. INTRODUCTION

With the advent of rich interactive multimedia content over the Internet, in environments such as education or entertainment, the way people use online multimedia content, such as film viewing and video and image sharing, calls for new ways to access, explore and interact with such information (Purcell, 2010). Video on the Web has been in explosive growth, which enriches the user experience but leads to new challenges in content discovery, searching and accessing. Information needs to be labeled or annotated to be accessible, shared and searchable. The Multimedia Information Retrieval (MIR) research area is still trying to find solutions for automated content and user analysis techniques, and annotation techniques for media, as a result of a huge need for descriptors of content that can be understood by computers and remain accessible to humans. Also, data quality for Web portal consumers is still under debate, and new methods to ensure and improve data quality are being studied (Herrera et al., 2010).

DOI: 10.4018/ijwp.2013070102

Thus, there is a need for automatic methods for gathering information both from multimedia objects (video, images, audio, text) and from users (preferences, emotions, likings, comments, descriptions, annotations), and for subsequently making this information available, searchable and accessible (Lew et al., 2006). In the literature, several studies have attempted to define standards that establish structures of descriptors and concepts for affective applications and their categorization issues (Devillers, Vidrascu & Lamel, 2005; Douglas-Cowie et al., 2007; Luneski & Bamidis, 2007; Schroder et al., 2007). One of the first works developed towards this goal was the HUMAINE database, despite the fact that it did not propose a formal definition to structure emotions, but simply identified the main concepts.

It is clear that there is a demand for new tools that enable automatic annotation and labeling of digital videos. In addition, the development of new techniques for gathering emotional information about videos, be it through content analysis or implicit user feedback through physiological signals, is revealing a set of new ways of exploring emotional information in videos, films or TV series. In fact, gathering emotional information in this context opens new perspectives for personalizing user information by creating emotional profiles for both users and videos. In a collaborative web environment, the collection of user profiles and video profiles has the potential to: (a) empower the discovery of interesting emotional information in unknown or unseen movies, (b) compare reactions to the same movies among other users, (c) compare directors' intentions with the effective impact on users, and (d) analyze, over time, our own reactions or directors' tendencies.

The reason for the growing interest in emotion research lies precisely in the importance of emotions for human beings. According to Axelrod and Hone (2006), emotions are currently regarded as important for human development, in social relationships, and in the context of thinking processes such as reasoning, problem solving, motivation, consciousness, memory, learning, and creativity. Thus, the relationship between people and the world they live in is clearly emotional. Nowadays, with technological development, the world in which people live also involves computers and their applications. When considering this particular relation, it becomes evident that we must support user experiences that are engaging and enjoyable.

The attempt to define a standardized description of emotions faces several challenges, and some questions naturally arise. There is more than one theory of emotions, and there is also no common agreement on the quantity and the actual names of emotions. Curiously, when trying to define emotion, we encounter a multitude of definitions and models in the literature. As stated by Fehr and Russell (1984), "everyone knows what an emotion is, until asked to give a definition" (p. 464). Despite this fact, we have some main theories defining emotions and how they can be understood. Throughout our review, we examine findings from three different perspectives: the categorical, the dimensional and the appraisal. The categorical or discrete model of emotions defines emotions as discrete states. These states are well-defined units that identify a certain behavior and experience. One of the major research programs on emotional expressions was developed by Paul Ekman (1992) (Figure 1). His multicultural research project tested thousands of people from all over the globe. It is commonly accepted that six basic emotions (anger, fear, surprise, sadness, disgust, happiness) represent innate and universal emotions recognized in facial expressions across cultures. However, surprise has been withdrawn from the basic emotions since his later article (Ekman, 1999). One of the most notorious dimensional theorists is Russell (1980), who proposed a circumplex model of emotions, in which one central dimension defines the polarity (positive/negative or pleasant/unpleasant) of the emotion and the other defines the level of activation. Russell bases his theory on a layman's conceptualization of affect and on multivariate analyses of self-reported affective states, showing that there are two predominant dimensions that participants use to communicate and label their own emotional experience.

The appraisal theory of emotions is essentially a categorical emotional model which suggests that emotion elicitation and differentiation depend on people's own evaluation of events and their circumstances. In fact, it was Magda Arnold (1960) who first used the term "appraisal" for categorizing emotions based on people's own judgment and according to their experiences (Scherer, 1999). Klaus Scherer defends that emotions are triggered by people's own interpretation of events, i.e. by their subjective interpretations, and considers that appraisals emerge from the following sequence: novelty, pleasantness, own goals, causes and compatibility with standards. Despite the fact that there are different theories, there is some agreement that certain recognition techniques can be used, such as physiological patterns, brain and speech changes, or facial expressions.

In this article we review current challenges in developing web portals for digital video content dissemination, and then analyze the information requirements for creating a semantic representation of such a system based on the exploration of the emotional components of movies, regarding both their implicit impact on users and the affective categorization of content from the users' perspective (manual input). We then propose a semantic description of emotion oriented towards the user experience when watching videos and, finally, we conclude by suggesting a domain specific XML schema that will provide information to a collaborative web-based recommendation system based on emotional information about videos.

Figure 1. iFelt emotional vocabulary


2. RELATED WORK

2.1. Web Portals for Digital Video

Web portals have been commonly used as applications that allow users to customize their homepage by adding widgets onto it. According to Al Zabir (2007), such an approach provides the user with complete control over the content displayed on the user's personal pages, the location of such content, and the means to interact with it. Furthermore, web portals allow users to control the multimedia data that is presented, provide a platform that enables the mash-up of technologies and data, enable different services to be consumed, and enable the aggregation of relevant content from heterogeneous sources with user-contributed content, as well as user feedback on the content through tagging and rating.

Web portals for digital video dissemination can be considered among the most successful web applications of the Web 2.0 generation. In fact, web video portal sites provide users with a personal homepage which facilitates access to video information gathered from different websites. Such portals provide dashboards that enable the delivery of content aggregation for individual users and enterprises alike. Video portals can thus be used as content repositories for various kinds of video, including generic video clips, public movie clips and trailers, documentaries and TV series excerpts. Such portals also need to provide client-side interactivity, improved usability and fast performance. These requirements are often supported through the use of widgets, which provide specific functionalities while keeping a simple core architecture for the web video portal.

On the other hand, the growth in access to multimedia information and applications, made possible by advances in Internet technologies, has created new challenges for information search and retrieval (Pruski, Guelfi & Reynaud, 2011; Varela, Putnik & Ribeiro, 2012). In this context, portals need to provide such capabilities and services in order to meet requirements for consistent administration, appearance and navigation across all sources of multimedia content. In addition, portals should also support personalization, in order to make available the required tools and information to different users (Wong & Adamson, 2010).

According to Ben-Natan et al. (2004), Web portals became the standard for delivering Web applications as portal plugins or portlets, independently of whether the domain is consumer or business applications. Portals have therefore become the accepted user interface for Web applications. For example, major consumer websites, such as Amazon, Yahoo or YouTube, commonly present a portal look and feel. Web portals are successful essentially because they aggregate important functions, such as integration, presentation, organization and personalization, in highly complex application environments which tend to provide a large number of different applications and support large user populations.

In particular, Web video portals offer huge and diverse collections of audiovisual data to the public, including the public media archives of large national broadcasters (Schneider et al., 2012). Indeed, in recent years, the growth in multimedia storage and the declining costs of storing multimedia data gave rise to increasingly large numbers of digital videos available online in video repositories, making it increasingly difficult to retrieve the relevant videos from such large repositories according to users' interests (Calic et al., 2005). The same difficulties in finding efficient methods for indexing multimedia have been felt in the context of lecture video portals (Haojin et al., 2011), making efficient search and retrieval of videos still a challenging task. Therefore, there is a strong need to make unstructured digital video data accessible and searchable with ease and flexibility, in particular in the context of web video portals.

2.2. Extracting and Analyzing Emotional Data

Since the work of William James (1950), it has been known that corporal changes are correlated with emotions, which made it possible to model emotions through the evaluation of physiological patterns (Rainville, Bechara, Naqvi & Damásio, 2006); this, in turn, unveiled the possibility of automatically recognizing emotions from such patterns.


The Autonomic Nervous System (ANS) is the part of the peripheral nervous system that has the function of conducting and regulating corporal processes such as sensory impulses from the blood, heart, respiration, salivation, perspiration or digestion. This system is divided into two parts: the parasympathetic nervous system (PSNS), responsible for calming our system down, and the sympathetic nervous system (SNS), which has the function of preparing our body for stress conditions. For example, meditation is characterized by the activation of the PSNS, while in running situations the SNS is activated, using body energy (blood pressure increases, the heart beats faster, digestion slows down) to react accordingly.

Some authors claim that there are distinctive ANS patterns for anger, fear, disgust, sadness or happiness (Ekman, Levenson & Friesen, 1983). This evidence led to many advances in the study of emotion, and the most common ways of measuring it include the following: self-reports, autonomic (physiological) measures, startle response magnitudes (reactions to sudden stimuli), brain states, and behavior (vocal, facial, body postures).

Biosensors, or physiological sensors, are used to capture these body changes. In fact, it was precisely through the analysis of the variations of the ANS that Damásio (1995) tested the somatic marker hypothesis, which states that, before we rationalize any decision, our body narrows our options through corporal and somatic changes based on our past experiences. Thus, from this study, we conclude that emotions can be characterized by somatic patterns and that every somatic pattern can be evaluated by analyzing the variation of ANS components (Mauss & Robinson, 2009).

From Ekman's work, it was also evident that an emotion-specific autonomic pattern can be distinguished beyond its valence and arousal, because emotions of the same valence appear very distinguishable when analyzing heart rate, skin temperature, skin resistance and forearm flexor muscle activity. Several other studies show a direct correlation between emotion categories and physiological patterns. For example, heart rate has been used to differentiate positive from negative emotions, suggesting that several emotions can be predicted and defined by the analysis of patterns of cardiac (heart rate) and respiratory activity (Friedman & Thayer, 1998; Rao & Vikram, 2001).

Besides this evidence, the characterization of emotions by physiological patterns faces some problems, despite its advantages when compared to other recognition methods. The main limitation is the difficulty of discriminating emotions from physiological signals. Another problem lies in finding an adequate elicitation technique to target a specific emotion (Rainville, Bechara, Naqvi & Damásio, 2006). Emotions are time, space, context and individual dependent, so trying to find a general pattern for emotions can be difficult. Moreover, it is also difficult to obtain a "ground truth". These are some of the major problems when compared to facial and vocal recognition. Although emotions can be characterized by some physiological variations, machine-learning techniques may be applied in automatic emotion recognition methods, based on pattern recognition, to deal with the huge amounts of data that need to be analyzed.

The collection of physiological data produced while users are watching movies was recently explored in psychology to test whether films can be efficient emotional inductors, which could help psychologists in specific treatments (Kreibig, Wilhelm, Roth & Gross, 2007), and in computer science to automatically summarize videos according to their emotional impact on viewers (Soleymani, Chanel, Kierkels & Pun, 2009).

The major advantage of physiological measurement is that the characterization of emotions from physiological patterns cannot be faked, as people cannot voluntarily control the autonomic nervous system, contrary to the so-called "poker face", where people disguise facial expressions as well as vocal utterances.

Another common argument against physiological measurements for emotion analysis is

Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited. 24 International Journal of Web Portals, 5(3), 19-39, July-September 2013 the fact that sensors might be invasive. However, generation such as that performed by speech nowadays, sensors are less intrusive (e.g. wear- synthesizers. Because there is no agreed upon able, made of rubber) and they allow protecting model of emotions, and there is more than one peoples’ identity or appearance, which could be way to represent an emotion (e.g. dimensional, an advantage when compared with techniques categorical), EARL leaves freedom for users to like, for example, facial recognition. manage their preferred emotion representation, Physiological data allows for the develop- by creating XML schemas for domain–specific ment of systems that facilitate the understand- coding schemas embeded in EARL, that serves ing of cognitive, emotional and motivational application cases which demand for specific processes, by giving access to emotional in- data categorization (Schroder et al., 2006). This formation that can inform about the type of constitutes one of the advantages of EARL, and engagement of a user when interacting with a the other is the fact that being an XML-based computer, for example, when performing a task language, it standardizes the representation of or watching a video. data, allowing for re-use and data-exchange Although emotions can be characterized with other technological components (Schroder by some physiological variations, machine- et al., 2007). learning techniques may be applied in automatic In order to provide suitable descriptions emotion recognition methods, based on pattern among the different use-cases, EARL assumes recognition, to deal with huge amount of data descriptions for the following information pre- needed to be analyzed. The following section sented in Table 1. Schroder et al. (2007) argue discusses how recent studies are processing that these are the informative topics needed physiological data to recognize emotions. to describe simple use-cases that use only emotional labels to process their information. 2.3. Semantically An EARL example of a description that Representing Emotions needs to use a simple emotion to represent a text that aims to sound pleasurable, could be 2.3.1 EARL written in the following simple structure One of the first works (Schroder, Pirker & Lamolle, 2006) towards the standardization of Nice to meet the representation of emotions was the Emo- you! tion Annotation and Representation Language (EARL), which aims to represent emotions Complex emotions, as considered in the suitable for the most common use cases: 1) context of EARL, are composed of several manual annotation of emotional content, 2) elements, such as more than one emotion and affect recognition systems, and 3) affective the probability of each one occurring. In the

Table 1. EARL emotional description minimal requirements

Emotion descriptor: Any emotional representation set
Intensity: Intensity of an emotion, expressed in numeric, discrete values
Regulation types: Encode a person's attempt to regulate the expression of her emotions
Scope of an emotion label: Link to an external media object, text, or other modality
Combination: Co-occurrence of emotions and their dominance
Probability: Labeler's degree of confidence
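Read together, the rows of Table 1 map naturally onto attributes of a single emotion element. The following sketch is only an illustration of how these requirement types could combine in one EARL-style annotation; the attribute names paraphrase the table's categories and are not verbatim EARL syntax, and all values are invented:

<!-- illustrative only: category = emotion descriptor, intensity,
     regulation = regulation type, probability = labeler confidence,
     start/end = scope of the label within a media object -->
<emotion category="anger" intensity="0.7" regulation="suppress"
         probability="0.8" start="00:01:10" end="00:01:15"/>

The remaining requirement, combination, is handled by grouping several such elements, as the complex-emotion examples in the boxes below illustrate.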

In the example presented in Box 1, we have a description of an image with two emotional categories and their intensities. An example of video annotation with a simple emotion is shown in Box 2.

These domain-specific coding schemas take the form of XML schema "plugins". Three examples of plugins are the emotional definitions, namely the categorical, the dimensional and the appraisal, so any application can use these EARL dialects according to its needs, but can also develop different EARL documents with new plugins.

Box 1.
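The original listing of Box 1 did not survive the source conversion. A plausible reconstruction, following the complex-emotion construct of EARL (Schroder, Pirker & Lamolle, 2006), with illustrative category names and intensity values, is:

<!-- reconstructed sketch: an image annotated with two co-occurring emotions;
     the mechanism linking this label to the image file is assumed -->
<complex-emotion>
  <emotion category="happiness" intensity="0.6"/>
  <emotion category="surprise" intensity="0.3"/>
</complex-emotion>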

Box 2.
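Box 2's listing is likewise lost. Under the same EARL conventions, the video annotation with a simple emotion that the text announces might look roughly like the following; the category, probability and time scope are assumptions:

<!-- reconstructed sketch: one segment of a video labeled with a single emotion -->
<emotion category="sadness" probability="0.5"
         start="00:12:00" end="00:12:30"/>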

Box 3.
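What Box 3 originally contained cannot be recovered. Given that the surrounding text introduces the categorical, dimensional and appraisal plugins, one plausible reading is a dimensional annotation of the same kind of clip, sketched here with assumed attribute names and values:

<!-- speculative sketch of a dimensional EARL dialect annotation -->
<emotion arousal="0.8" valence="-0.4"
         start="00:12:00" end="00:12:30"/>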

An annotation using the EARL complex emotion representation could be written as shown in Box 4. EARL was the first movement towards the creation of a standard semantic representation of emotion.

2.3.2. EmotionML

The second step resulted in the Emotion Markup Language (Emotion ML), which was born to define a general-purpose emotion annotation and representation language. After the definition of EARL, the respective working group (Schröder et al., 2011) moved to the World Wide Web Consortium (W3C) in the form of two incubator groups. First, an Emotion Incubator group was set up to analyze use cases and requirements of a markup language, and then the Emotion Markup Language Incubator group to dissect the requirements of such a language. As with EARL, the first W3C incubator group aimed to define a standard for representing and processing emotion-related information in technological environments. The work group opted to use XML as the semantic representation language, because its standardization allows easy integration and communication with external applications by providing a standard emotional representation structure that enables other technological artifacts and applications to understand and process the data. In fact, the most recent markup language is defined in XML, which appears to be a good choice. Emotion ML is presently in the Second Public Working Draft, after the First Public Working Draft (FPWD), which was published in 2009. The FPWD proposed the elements of a syntax to address use cases regarding emotion annotation, the automatic recognition of user-related emotions, and the generation of emotion-related system behavior.

Like EARL, Emotion ML does not fix the set of annotated emotions, but uses XML attributes to represent the type of data to be presented. For example, there is an attribute, "expressed-through", that is used in annotation or recognition use cases to specify where the emotion is expressed, whether by face, voice, posture or physiological data. As an example of the application of this attribute, we might use it in a video annotation in Emotion ML, as in Box 5. An example of the representation of the automatic recognition of emotions using physiological sensors could be written as in Box 6.

Box 4.
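Box 4, announced in the text as a video annotation using EARL's complex emotion representation, is also lost. Following the draft's complex-emotion construct, it presumably combined several co-occurring emotions with their probabilities, roughly as follows (categories, probabilities and time scope are invented):

<!-- reconstructed sketch: a scene where two emotions co-occur,
     each with its probability of being the felt emotion -->
<complex-emotion start="00:45:00" end="00:45:20">
  <emotion category="fear" probability="0.6"/>
  <emotion category="surprise" probability="0.4"/>
</complex-emotion>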


Box 5.
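The content of Box 5 did not survive extraction. Below is a sketch of the video annotation it describes, written against the later W3C Emotion ML syntax (the working draft current at the time differed in detail); the vocabulary URI, category name and video file are assumptions:

<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         expressed-through="face">
  <category name="happiness"/>
  <!-- the annotated video -->
  <reference uri="movie.avi"/>
</emotion>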

As can be seen in the semantic representations in Boxes 5 and 6, the development of technological support, by using XML structures, for human activities that are based on emotional aspects requires the representation of different components. The Emotion Markup Language notably considered a proper body of use cases to set a general structure that supports and enables the communication between technological components regarding the emotional properties of, among others, images, videos, text, automatic recognition of emotions, emotion synthesis and speech. For example, SMIL (Bulterman & Rutledge, 2008) is a presentation-level XML language that combines animation and media objects into a single presentation in an interactive way. It controls the temporal and spatial characteristics of media and animation elements in an interactive interface. Emotion ML can be used as a specialized plug-in language for SMIL, as an emotion engine driving facial and vocal expressions.

2.3.4. Other Semantic Representations

De Carolis et al. (2001) present the Affective Presentation Markup Language (APML), also an XML-based language, used to represent the facial expressions of the agent dialogues in their work on Embodied Conversational Agents (ECA). Other research work related to the representation of emotion concepts is ALMA (Gebhard, 2005), a layered model of affect, also called AffectML, whose main purpose is to represent affect and moods in their virtual humans project. The interesting aspect of this project is that, in a first phase, it uses emotions as a medium-term affect representation and, later, as a long-term representation of the personality of its characters. Both APML and AffectML were developed within affective applications to address their affective needs.

Box 6.
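Box 6 is likewise reconstructed rather than recovered: for the automatic-recognition use case, expressed-through can name the modality the emotion was detected from, and a confidence score can qualify the label. The vocabulary URI, category and sensor-log file below are assumptions:

<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         expressed-through="physiology">
  <category name="fear" confidence="0.8"/>
  <!-- pointer to the physiological recording the label was derived from -->
  <reference role="expressedBy" uri="sensors/gsr-log.xml"/>
</emotion>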

3. A STUDY ON ANNOTATING EMOTIONS FROM USERS WHEN WATCHING MOVIES

There is a difficulty in selecting an appropriate annotation scheme, together with a proper set of semantic labels, to structure and identify emotions (Devillers, Vidrascu & Lamel, 2005), because different application contexts demand specific labels for emotional data annotation. As stated before, some works have already focused on this issue, along with other issues such as the development of semantic representations and descriptions of emotions in different contexts (Devillers, Vidrascu & Lamel, 2005; Douglas-Cowie et al., 2007).

In this section we first propose a set of annotation requirements for emotion-oriented video applications, in order to define the specific label needs in this context, and then we proceed by presenting a first semantic representation proposal based on XML, a standard language that can be used in multiple technological environments.

We started this task by carrying out the requirements analysis of our specific video applications and found some guidelines that we believe are important and should therefore be followed. In this section we present such a set of classification considerations for emotion annotation related to video affective properties, in order to help create a mechanism for obtaining meaningful affective information from the perspective of the viewers' affective impact. Subsequently, we present an XML schema consisting of a set of semantic descriptors based both on user physiological signals, captured while watching videos (to properly characterize our classification method), and on video low-level feature extraction.

3.1. Classification Requirements

We now discuss a set of classification considerations related to the representation of video affective properties, in order to help create a mechanism for obtaining meaningful affective information from the perspective of the affective impact of viewers and movies. Addressing the problem of the different annotation requirements, we established the most relevant aspects that must be included in a classification procedure in order to enable the creation of emotion-oriented applications that include video, users and their affective relationship.

One of the first aspects to consider is the choice of a standard semantic representation for emotional information: in fact, we believe that such a representation must possess the following characteristics (Oliveira, Ribeiro & Chambel, 2010):

• It must be simple enough so that emotions can be captured from the diversity of existing emotional data gathering methods;
• It should be readable by any module of an emotion processing system;
• It should be structured in accordance with the W3C Emotion Markup Language guidelines (Schroder, Wilson, Jarrold & Evans, 2008) so as to enable communication among future web services.

The second aspect to take into consideration is the importance of the movie collection, which should cover each and every basic high-intensity emotion, in order to ensure that we have at least one recognized emotional category. Additionally, a neutral state must also be included in the list of basic emotions, corresponding to the occurrence of an emotionless moment in the video.

The third aspect concerns the information requirements of such a classification system. In fact, the classification of a user's subjective emotions, acquired both from physiological signal analysis and from the user's manual classifications, implies having classified at least the following information:

• The user's affective perspective for every video, from the emotional impact perspective.
• The basic emotion for each scene in the video, in order to have a complete affective description of the whole video along its complete duration.
• An emotional profile constituted by all of a user's subjective emotional (user's emotional impact) classifications, based on physiological data and manual input, for every movie.
• A video emotional profile constituted by all users' classifications.
• User emotional profiles, constructed over time by collecting and analyzing all emotional user data detected for each video scene.

The three major emotional models presented in the introduction should be represented, if possible, in every classification of every scene, improving the access to and interaction with emotional data.

As described above, we have a considerable amount of emotional data to process, which demands suitable structures that allow the annotation of this information in videos and users' profiles. In the following section we describe the information requirements for such a classification system.

3.2. Information Requirements

The exploration and access of movies by their emotional dimensions involves a great amount of information. Here, we first present the information that needs to be structured in order to be accessible from an emotional perspective, from the users' point of view, the videos and their relationships, and then we propose a semantic description of emotion oriented towards the user experience while watching videos, considering the user's implicit assessment (by acquiring physiological signals) and the user's explicit assessment (manual classification). In this way, we gather information about users, videos and their relationships (user-video): Table 2, Table 3 and Table 4 detail such information.

In each table, we represent the subjective emotional information and we explicitly indicate the way used to obtain such information, either through the classification of physiological signals or through manual input from users. Some studies use movies' emotional information to automatically summarize videos according to their emotional impact (Soleymani, Chanel, Kierkels & Pun, 2009), or to detect video emotional highlights from physiological signals (Smeaton and Rothwell, 2009). This novel compilation of information presented in Tables 2, 3 and 4 is focused on classification and annotation to explore and share movie impact on users. Thus, we structured it in an XML file, by defining a schema and using the Emotion ML specification, in order to make this information organized and sharable. Such a structure is presented in the following section.

It is important to note that there is also non-emotional information that we suggest should be included as meta-information about videos. The other types of information that we believe are also needed can be gathered through a standard API from online services like IMDb. More specifically, we are considering using the title of the video, a link to the IMDb description, the year of release, a URL to the cover, users' ratings registered in IMDb, the first of the directors, the genres, the plot, and information on all the actors (or the number selected) listed in the main title page (name, link to actor page, link to picture and character). Also interesting is to include the movie reviews from the New York Times critics, which can be gathered to complete the movie profile general information (see Endnote 1).

After determining the information requirements we present our system, iFelt, and then propose a data structure to organize emotional information about users when watching movies.

4. IFELT SYSTEM

iFelt is an interactive web video application that allows cataloging, accessing, exploring and visualizing emotional information about movies. It is being designed to explore the affective dimensions of movies in terms of their properties and in accordance with users' emotional profiles, choices and states, in two main components.

Table 2. Emotional information: User

Subjective Emotions (User Felt Emotion), obtained through physiologic sensors:
  Most felt category / valence
  Least felt category / valence
  Most recent category / valence


Table 3. Emotional information: Video

Subjective Emotions (User Felt Emotion), obtained through physiologic sensors:
  All users' dominant emotions (Mean | SD)
  All users' most felt emotions per scene (Mean | SD)
  All users' dominant emotions along a movie (Mean | SD)
  All users' valences per movie and scene (Mean | SD)
  All users' intensities per movie and scene (Mean | SD)
User self-assessment:
  All users' self-assessments of the felt dominant emotion (Mean | SD)
  All users' self-assessments of the felt intensity of the dominant emotion (Mean | SD)
  All users' self-assessments of the felt valence (Mean | SD)
  All users' preferences about the movie (0-9) (Mean | SD)

The Emotional Movie Content Classification aims to provide video classification and indexing based on emotions, either expressed in the movies (objective emotions) or felt by the users (subjective emotions), as dominant emotions at the level of the movie or the movie scenes. Objective emotions are being classified with the aid of video low-level feature analysis, combined with audio and subtitle processing, in the VIRUS research project (Langlois et al., 2010), while subjective emotions can be classified manually, or automatically recognized with biometric methods based on physiological signals, such as respiration, heart rate and galvanic skin response, employing digital signal processing and pattern recognition algorithms inspired by the statistical techniques used by Picard et al. (2001). This process and its results are thoroughly described in Oliveira et al. (2011).

4.1. Emotion ML Structure for iFelt

We followed the Emotion ML recommendation for the classification of iFelt movies, their users and the user/movie relationships. Emotion ML, being an XML-based language, standardizes the representation of emotional data, allowing re-use and data exchange with other technological components, following the guidelines we suggested above for emotional semantic representations. This information is linked as metadata to every video in our system, but it was still not used or even tested for the purposes of this work.

Table 4. Emotional information: User-video relationship

Subjective Emotions (User Felt Emotion), obtained through physiologic sensors:
  Most felt category / valence
  Least felt category / valence
  Category felt every 5 seconds
  Global percentage of categories / valences
  Main variances between emotional states (in every movie and over time)
User self-assessment:
  Dominant emotion felt over the whole movie
  Intensity of the felt dominant emotion
  Valence felt
  Intensity of the valence felt
  Preference about the movie (0-9)

In order to have our emotional labels well specified, we created a vocabulary, as the Emotion ML guidelines suggest. For that reason, inside the main vocabulary file of Emotion ML we inserted our set of emotions, like the ones in Figure 1. The vocabulary element has two main attributes: a) type, to specify the type of emotions among the traditional emotional models, and b) id, to identify the list of emotions to be used in external files.

We have created three Emotion ML files and an XML schema to ensure that our data structure can be divided into a user emotional profile, a movie emotional profile and a user-movie classification. This division further enables the reuse of just the user information, just the movie emotional information, just the relationship between both of them, or even the three simultaneously.

The XML schema we created has, as its main objective, to structure the information about users, movies and user-movie relationships. It also includes additional information that our system requires, such as last emotions, lists of users, lists of movies and users' preferences. As an example, the code snippet in Figure 2 illustrates the definition of the movie information.

In this example, we have, as an XML element, the tag MovieInformation, which includes a movieId, a title, a director, a country, a year and a list of actors. This tag is used in the movie Emotion ML file inside the info tag (Figure 3), which is the way Emotion ML allows us to introduce metadata into its files. We also have a list of the users related to this specific movie, which can easily inform us about each user's classification for this movie. The second part of the movie.xml file is concerned with the emotion classification (Figure 4).

In Figure 4, the emotion tag has only the global classifications of a movie, including all forms of classification: manual, for both iFelt emotional labels and the dimensional model measures (arousal, valence), and automatic, through biosensors. The attribute expressed-through allows specifying how the information was gathered. We only present an example for each case, and also for an iFelt vocabulary.

The user's information has the same structure as the movie's information (see Figure 6), the first part covering the iFelt information requirements: 1) user general information, such as name, number of movies watched, number of neighbors and country; 2) last emotions, presenting a list of movies and the associated emotions; and 3) the list of movies that this user watched. Every file header includes the Emotion ML namespace, the reference to the vocabularies to be used, as well as the schema created (see Figure 5).

The second part (similar to Figure 4) also presents global classifications of all forms of classification, but from a user perspective: manual, for both iFelt emotional labels and the dimensional model measures (arousal, valence), and automatic, through the usage of biosensors.

The third file, user-movie.xml, specifies every emotional classification (categorical or dimensional) along the movie, acquired through biosensors, and the final classification, which was manually inserted by the user. Thus, besides the general information about users and movies, we have the emotional classification illustrated in Figure 7.

Figure 7 shows three types of emotional information about a user and a specific movie. The first two emotion tags emotionally characterize a movie in 5-second intervals by using relative time, following the Emotion ML specification. Clipping the movie by time is denoted by the name t, and specified as an interval with a begin time and an end time. In this case, we have an example from second 110 until second 115, with an emotional classification of "happy", gathered through biosensors. Manual input can be specified using the text value for the expressed-through attribute.


Figure 2. iFelt schema: Movie information definition
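Figure 2 is an image in the original and could not be recovered; from the fields enumerated in the text (movieId, title, director, country, year and a list of actors), its schema fragment can be sketched as the following XSD, where the element names and types are our assumptions (namespace declarations omitted):

<xs:element name="MovieInformation">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="movieId" type="xs:string"/>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="director" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
      <xs:element name="year" type="xs:gYear"/>
      <!-- the list of actors named in the text -->
      <xs:element name="actor" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>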

In fact, this value can be used for both categorical and dimensional (valence, arousal) classifications. In this example, the dimensional classification value represents the intensity of the dimension, a 0.3 value being a lower-than-average arousal, and a 0.9 value a very positive valence.

Having this information annotated and available, we believe it is now easier to develop an interface with other systems that may wish to use this information. Affective information of users acquired while watching videos has specific data requirements to be represented. Having videos emotionally classified will further allow for finding interesting emotional information in unknown or unseen movies, and it will also allow for the comparison among other users' reactions to the same movies, as well as comparing directors' intentions with the effective emotional impact on users, and also analyzing users' reactions over time or even detecting directors' tendencies. This classification can subsequently contribute to the specification of a software tool which enables the personalization of TV guides according to individual emotional preferences and states, and recommends programs or movies based on those preferences and states.

Figure 3. iFelt Emotion ML specification part 1: movie.xml
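Figure 3 likewise survives only as a caption. The text states that the MovieInformation tag sits inside Emotion ML's info element, together with the list of users who classified the movie, so the first part of movie.xml presumably resembled the following; all identifiers and values are invented placeholders:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml">
  <info>
    <MovieInformation>
      <movieId>m017</movieId>
      <title>Example Title</title>
      <director>Example Director</director>
      <country>PT</country>
      <year>2010</year>
    </MovieInformation>
    <!-- users whose classifications are linked to this movie -->
    <users>
      <user id="u042"/>
      <user id="u108"/>
    </users>
  </info>
  <!-- emotion elements follow (Figure 4) -->
</emotionml>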


Figure 4. iFelt Emotion ML specification part 2: movie.xml
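The second part of movie.xml (Figure 4, also lost as an image) holds only global classifications, combining a category, the dimensional measures, and the expressed-through source. A sketch under those constraints, with assumed vocabulary references and values:

<!-- global classification gathered automatically from biosensors -->
<emotion category-set="ifelt-vocabulary.xml#iFeltCategories"
         dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions"
         expressed-through="physiology">
  <category name="sadness"/>
  <dimension name="arousal" value="0.6"/>
  <dimension name="valence" value="0.2"/>
</emotion>
<!-- the same global classification entered manually would use
     expressed-through="text", as explained for Figure 7 -->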

5. CONCLUSION

In this article we reviewed the most relevant research works that focus on the representation of emotions, analyzing how emotional models present emotions and how actual systems that explore emotional information use such representations in their work. From the review presented in this article we may conclude that there is no report in the current literature concerning the representation of movies either by their emotional impact on viewers or by their content. This review also made clear that there is an effort to develop standards concerning the representation of emotions, but also that there is not yet a closed solution for this representation. Recent studies reveal that there are specific data that need to be assembled in order to obtain not only quantitative but also qualitative information about users while watching videos. The aggregation of such data items will further improve a collaborative recommendation system due to the wider range of data to relate.


Figure 6. iFelt Emotion ML specification part 1: user.xml
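Figure 6's image is gone; per the text, the first part of user.xml mirrors movie.xml, with general user information, the last-emotions list, and the list of watched movies inside the info element. All names below are invented placeholders:

<info>
  <UserInformation>
    <name>Example User</name>
    <moviesWatched>27</moviesWatched>
    <neighbors>5</neighbors>
    <country>PT</country>
  </UserInformation>
  <!-- movies recently watched and the emotion associated with each -->
  <lastEmotions>
    <entry movieId="m017" emotion="happiness"/>
  </lastEmotions>
  <watchedMovies>
    <movie id="m017"/>
  </watchedMovies>
</info>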

Figure 5. iFelt XML headers
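Figure 5 (the shared file headers) is also reconstructed here rather than recovered: every file declares the Emotion ML namespace, points at the created schema, and references the emotion vocabularies. URIs and file names are assumptions:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2009/10/emotionml ifelt.xsd"
           category-set="ifelt-vocabulary.xml#iFeltCategories">

In the vocabulary file, the vocabulary element carries the type and id attributes discussed above; the item names below follow the six basic emotions of Figure 1 plus the neutral state required in Section 3.1:

<vocabulary type="category" id="iFeltCategories">
  <item name="happiness"/>
  <item name="anger"/>
  <item name="fear"/>
  <item name="surprise"/>
  <item name="sadness"/>
  <item name="disgust"/>
  <item name="neutral"/>
</vocabulary>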


Figure 7. iFelt Emotion ML specification part 2: user.xml
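Figure 7's listing can be sketched fairly closely from the description in the text: 5-second clips addressed through a media-fragment-style t=begin,end interval, a biosensor-derived category, dimensional values, and a manual classification marked as text. Vocabulary URIs and the movie file name are assumptions:

<!-- automatic classification for the clip from second 110 to 115 -->
<emotion category-set="ifelt-vocabulary.xml#iFeltCategories"
         expressed-through="physiology">
  <category name="happy"/>
  <reference uri="movie.avi#t=110,115"/>
</emotion>

<!-- dimensional classification for the same clip:
     lower-than-average arousal, very positive valence -->
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions"
         expressed-through="physiology">
  <dimension name="arousal" value="0.3"/>
  <dimension name="valence" value="0.9"/>
  <reference uri="movie.avi#t=110,115"/>
</emotion>

<!-- final classification entered manually by the user -->
<emotion category-set="ifelt-vocabulary.xml#iFeltCategories"
         expressed-through="text">
  <category name="happy"/>
</emotion>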

On the one hand, physiological signal results can show significant differences between users' responses to the same movie scene; on the other hand, some arousal results from users watching the same scene are captured by different physiological signals, which shows once more the existence of important information that, when collected in a user profile, can enhance search and recommendation in a collaborative context.


The development of such profiles requires semantic descriptions that allow the analysis of user preferences.

We demonstrated the possibility of structuring the emotional information manipulated in our system based on a W3C standard. This allows us to conclude that our system might be universally understandable by other systems, given that it has the advantage of being open to external communication with other systems that might require, or need, to use the emotional information acquired and stored in our system. We believe that the structure we proposed may provide emotional meaning and organization for movies, both as a collection and as media linked to a user, as well as provide machine-understandable meaning to a user profile in what concerns a person's psychological profile, constructed while the person is watching a movie. The possibility of sharing such information also provides a small contribution to the information retrieval area, specifically by enhancing search mechanisms with new search criteria. Given that Web portals need quality data to be returned to users, and that video is an increasingly used digital medium, our work could contribute to enhancing web video portals by alleviating issues and difficulties related to the search and retrieval of video information pertaining to users' interests and preferences.

REFERENCES

Al Zabir, O. (2007). Building a web 2.0 portal with ASP.NET 3.5 (1st ed.). O'Reilly.

Ben-Natan, R., Sasson, O., & Gornitsky, R. (2004). Mastering IBM WebSphere Portal Server: Expert guidance to build and deploy portal applications. John Wiley & Sons.

Bulterman, D. C. A., & Rutledge, L. (2008). SMIL 2.0: Interactive multimedia for web and mobile devices. Amsterdam: Springer.

Calic, J., Campbell, N., Dasiopoulou, S., & Kompatsiaris, Y. (2005). A survey on multimodal video representation for semantic retrieval. In Proceedings of EUROCON 2005: The International Conference on "Computer as a Tool" (Vol. 1, pp. 135-138).

Damasio, A. (1995). Descartes' error. New York, NY: Harper-Collins.

De Carolis, B., De Rosis, F., Carofiglio, V., Pelachaud, C., & Poggi, I. (2001). Interactive information presentation by an embodied animated agent. In Proceedings of the International Workshop on Information Presentation and Natural Multimodal Dialogue (pp. 14-15).

Devillers, L., Vidrascu, L., & Lamel, L. (2005). Challenges in real-life emotion annotation and machine learning based detection. Neural Networks, 18, 407-422. doi:10.1016/j.neunet.2005.03.007 PMID:15993746

Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., Mcrorie, M., & Batliner, A. (2007). The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of Affective Computing and Intelligent Interaction (pp. 488-500). Amsterdam, Netherlands: Springer.

Ekman, P. (1992). Are there basic emotions? Psychological Review, 99, 550-553. doi:10.1037/0033-295X.99.3.550 PMID:1344638

Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 45-60). Sussex, UK: Wiley.

Ekman, P., Levenson, R. W., & Friesen, W. V. (1983). Autonomic nervous system activity distinguishes among emotions. Science, 221, 1208-1210. doi:10.1126/science.6612338 PMID:6612338

EMMA: Extensible multimodal annotation markup language. (2010, May). W3C. Retrieved from http://www.w3.org/TR/emma/#s3.2

Fehr, B., & Russell, J. (1984). Concept of emotion viewed from a prototype perspective. Journal of Experimental Psychology, 113, 464-486. doi:10.1037/0096-3445.113.3.464

Friedman, B. H., & Thayer, J. F. (1998). Autonomic balance revisited: Panic anxiety and heart rate variability. Journal of Psychosomatic Research, 44, 133-151. doi:10.1016/S0022-3999(97)00202-X PMID:9483470


Gebhard, P. (2005). ALMA: A layered model of affect. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 29-36). New York, NY: ACM.

Haojin, Y., Siebert, M., Luhne, P., Sack, H., & Meinel, C. (2011, November 28-December 1). Lecture video indexing and analysis using video OCR technology. In Proceedings of the Seventh International Conference on Signal-Image Technology and Internet-Based Systems (SITIS 2011) (pp. 54-61).

Herrera, M., Moraga, M. A., Caballero, I., & Coral, C. (2010). Quality in use model for web portals (QiUWeP). In F. Daniel & F. M. Facca (Eds.), Proceedings of the 10th International Conference on Current Trends in Web Engineering (ICWE'10) (pp. 91-101). Berlin, Heidelberg: Springer-Verlag.

James, W. (1950). The principles of psychology (Vol. 2). New York, NY: Dover Publications.

Kreibig, S. D., Wilhelm, F. H., Roth, W. T., & Gross, J. J. (2007). Cardiovascular, electrodermal, and respiratory response patterns to fear- and sadness-inducing films. Psychophysiology, 44, 787-806. doi:10.1111/j.1469-8986.2007.00550.x PMID:17598878

Lew, M., Sebe, N., Djeraba, C., & Jain, R. (2006). Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP).

Luneski, A., & Bamidis, P. D. (2007). Towards an emotion specification method: Representing emotional physiological signals. In Proceedings of the Twentieth IEEE International Symposium on Computer-Based Medical Systems (CBMS'07) (pp. 363-370).

Mauss, I. B., & Robinson, M. D. (2009). Measures of emotion: A review. Cognition and Emotion, 23, 209-237. doi:10.1080/02699930802204677 PMID:19809584

Oliveira, E., Benovoy, M., Ribeiro, N., & Chambel, T. (2011). Towards emotional interaction: Using movies to automatically learn users' emotional states. In Proceedings of Human-Computer Interaction - INTERACT 2011 (pp. 152-161).

Oliveira, E., Ribeiro, N., & Chambel, T. (2010, April 10-15). Towards enhanced video access and recommendation through emotions. In Proceedings of the "Brain, Body and Bytes: Psychophysiological User Interaction" Workshop at ACM CHI 2010, Atlanta, GA.

Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), 1175-1191. doi:10.1109/34.954607

Pruski, C., Guelfi, N., & Reynaud, C. (2011). Adaptive ontology-based web information retrieval: The TARGET framework. International Journal of Web Portals, 3(3), 41-58. doi:10.4018/IJWP.2011070104

Purcell, K. (2010). The state of online video. Retrieved June 2010 from http://www.pewinternet.org/Reports/2010/State-of-Online-Video.aspx

Rainville, P., Bechara, A., Naqvi, N., & Damasio, A. R. (2006). Basic emotions are associated with distinct patterns of cardiorespiratory activity. International Journal of Psychophysiology, 61, 5-18. doi:10.1016/j.ijpsycho.2005.10.024 PMID:16439033

Russell, J. A. (1980). A circumplex model of emotion. Journal of Personality and Social Psychology, 39, 1161-1178. doi:10.1037/h0077714

Scherer, K. R. (1999). Appraisal theories. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 637-663). Chichester, UK: Wiley.

Schneider, D., Tschopel, S., & Schwenninger, J. (2012, May 23-25). Social recommendation using speech recognition: Sharing TV scenes in social networks. In Proceedings of the 13th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2012) (pp. 1-4).

Schroder, M., Baggia, P., Burkhardt, F., Pelachaud, C., Peter, C., & Zovato, E. (2011). EmotionML: An upcoming standard for representing emotions and related states. In Affective Computing and Intelligent Interaction (ACII 2011) (Vol. 1, pp. 316-325). doi:10.1007/978-3-642-24600-5_35

Schroder, M., Devillers, L., Karpouzis, K., Martin, J., Pelachaud, C., Peter, C., & Wilson, I. (2007). What should a generic emotion markup language be able to represent? Lecture Notes in Computer Science, 4738, 440. doi:10.1007/978-3-540-74889-2_39


Schroder, M., Pirker, H., & Lamolle, M. (2006). First suggestions for an emotion annotation and representation language. In Proceedings of LREC (pp. 88-92).

Schröder, M., Wilson, I., Jarrold, W., Evans, D., Pelachaud, C., Zovato, E., & Karpouzis, K. (2008). What is most important for an emotion markup language? In Proceedings of the Third Workshop on Emotion and Computing (KI 2008), Kaiserslautern, Germany.

Soleymani, M., Chanel, G., Kierkels, J. J. M., & Pun, T. (2009). Affective characterization of movie scenes based on content analysis and physiological changes. International Journal of Semantic Computing, 3(2), 235-254. doi:10.1142/S1793351X09000744

Varela, M., Putnik, G., & Ribeiro, R. (2012). A web-based platform for collaborative manufacturing scheduling in a virtual enterprise. Information and Communication Technologies for the Advanced Enterprise: An International Journal, 2, 87-108.

Wong, K., & Adamson, G. (2010). Part of the tool kit. International Journal of Web Portals, 2(1), 37-44. doi:10.4018/jwp.2010010104

ENDNOTES

1. http://developer.nytimes.com/docs/movie_reviews_api

Eva Ferreira de Oliveira is currently a lecturer in the School of Technology at the Polytechnic Institute of Cávado and Ave, Portugal. She holds a Dipl. Eng. in the field of information systems and information technologies and an M.Sc. in the field of systems and informatics engineering, both from the University of Minho (Portugal). She teaches web development, interactive systems, and character animation to undergraduate students. Her scientific and research interests are related to new ways of visualizing and interacting with videos, exploring affective dimensions.

Teresa Chambel is a professor and researcher at the Faculty of Sciences, University of Lisbon (FCUL), in Portugal, where she received a Ph.D. in Informatics, on video and learning technologies, and a B.Sc. in Computer Science. Her M.Sc. was in Electrical and Computer Engineering at the Technical University of Lisbon, on distributed hypermedia. She has been a member of the Human-Computer Interaction and Multimedia Group (HCIM) at the LaSIGE Lab at FCUL since 1998, and was previously a member of the Multimedia and Interaction Techniques Group at INESC, Lisbon. Her research interests include multimedia and hypermedia, with a special emphasis on video and hypervideo, human-computer interaction, elearning, creativity, visualization, accessibility, cognition and emotions, interactive TV, digital talking books and digital art. For more information, visit: www.di.fc.ul.pt/~tc


Nuno Magalhães Ribeiro is Associate Professor at the Faculty of Science and Technology of Fernando Pessoa University (Porto, Portugal). He is the coordinator of the Computer Science area, being the course director for bachelor and masters courses on Computer Systems Engineering, with specializations in Multimedia and Information Systems and Mobile Computing. Currently he teaches subjects on Multimedia and Interactive Systems, Multimedia Compression and Representation, Digital Design and Applied Electronics. He holds a Ph.D. in Computer Science from the University of York (U.K.), an M.Sc. in Electrical and Electrotechnical Engineering, specialization in Telecommunications, and a Licentiate degree in Electrical and Electrotechnical Engineering, specialization in Computer Systems, both from the Faculty of Engineering of the University of Porto (FEUP). He participated in international R&D projects developed at the Multimedia Center of the University of Porto (CMUP) and at the Institute of Computers and Systems Engineering (INESC). Currently, he is a researcher at the Center of Multimedia Studies and Resources (CEREM) at Fernando Pessoa University. His research interests lie in e-learning systems based on multimedia and mobile computing technologies, and he supervises Ph.D. projects on m-learning and emotion-based video search and retrieval. He is the author of the books Tecnologias de Compressão Multimédia (FCA), about multimedia compression and representation, Multimédia e Tecnologias Interactivas (FCA), about multimedia interactive systems and interfaces, and Informática e Competências Tecnológicas para a Sociedade da Informação (UFP), about general computer science.
