Examining Emotion Perception Agreement in Live Music Performance

Simin Yang, Courtney N. Reed, Elaine Chew, and Mathieu Barthet

S. Yang, C. N. Reed, and M. Barthet are with the Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom. E. Chew is with the CNRS-UMR9912/STMS, Institute for Research and Coordination in Acoustics/Music (IRCAM), Paris, France. Manuscript received Month XX, 2020; revised Month XX, 2021.

To cite this version: Simin Yang, Courtney Nicole Reed, Elaine Chew, Mathieu Barthet. Examining Emotion Perception Agreement in Live Music Performance. IEEE Transactions on Affective Computing, Institute of Electrical and Electronics Engineers, in press. DOI: 10.1109/TAFFC.2021.3093787. HAL Id: hal-03277877, https://hal.archives-ouvertes.fr/hal-03277877, submitted on 5 July 2021.

Abstract—Current music emotion recognition (MER) systems rely on emotion data averaged across listeners and over time to infer the emotion expressed by a musical piece, often neglecting time- and listener-dependent factors. These limitations can restrict the efficacy of MER systems and cause misjudgements. We present two exploratory studies on music emotion perception. First, in a live music concert setting, fifteen audience members annotated perceived emotion in the valence-arousal space over time using a mobile application. Analyses of inter-rater reliability yielded widely varying levels of agreement in the perceived emotions. A follow-up lab-based study was conducted to uncover the reasons for such variability: twenty-one participants annotated their perceived emotions whilst viewing and listening to a video recording of the original performance and offered open-ended explanations. Thematic analysis revealed salient features and interpretations that help describe the cognitive processes underlying music emotion perception. Some of the results confirm known findings of music perception and MER studies. Novel findings highlight the importance of less frequently discussed musical attributes, such as musical structure, performer expression, and stage setting, as perceived across audio and visual modalities. Musicians are found to attribute emotion change to musical harmony, structure, and performance technique more than non-musicians. We suggest that accounting for such listener-informed music features can benefit MER by helping to address variability in emotion perception, providing reasons for listener similarities and idiosyncrasies.

Index Terms—Music and emotion, music perception, inter-rater reliability, individual factors, live performance, music emotion recognition, music information retrieval

1 INTRODUCTION

Music, like other forms of art, is subjective, and response to music is ultimately up to individual interpretation. Music can both convey and evoke emotions [1]. Common approaches used in the investigation of these emotions involve self-reporting [2], through which participants can actively report their own subjective experiences. This may include perceived emotion, that which the listener recognises the music is trying to convey, or induced emotion, that which is felt by the listener in response to the music [3]. A single musical work can express a range of emotions that vary over time and across individual listeners [4], [5], [6]; thus, self-reporting investigations may use time-based annotation of emotions to help identify detailed, localised emotion "cues" [7], [8], [9], [10].

Previous work using listener annotations has determined that music features such as dynamics, tempo, mode, melodic-harmonic progression and interactions, and sound articulation impact perceived emotion [11], [12]. Continuous-time music emotion recognition (MER) focuses heavily on mapping musical features or low-level correlates to continuous emotion data [13], [14], [15]. Current machine learning approaches may predict listener perception efficiently, but their performance can be confounded [16], [17], and they often fail to address the underlying cognitive processes [18], [19]. Although low-level acoustic features, such as Mel-frequency cepstral coefficients (MFCCs), relate to timbre perception [20] and are commonly used in predictive emotion models [13], [21], [22], it is unknown how these features influence perceived emotion, and they do not submit readily to cognitive modelling [23], [24].
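To illustrate the kind of low-level feature referenced above, the following minimal Python sketch extracts frame-level MFCCs with the librosa library. This is not the pipeline used in the paper; the file path and parameter values are placeholders chosen for illustration.

```python
# Minimal sketch: frame-level MFCCs of the kind commonly fed to
# continuous MER models (illustrative only, not the study's pipeline).
import librosa

# "performance.wav" is a placeholder path, not a recording from the study.
y, sr = librosa.load("performance.wav", sr=22050)

# 13 MFCCs per frame; hop_length sets the time resolution of the feature
# sequence (512 samples at 22.05 kHz is roughly 23 ms per frame).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512)

print(mfcc.shape)  # (13, n_frames): one timbre descriptor per frame
```

Each frame yields a 13-dimensional timbre descriptor, producing the sort of feature sequence that continuous MER models map onto valence-arousal trajectories.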
In the attempt to develop computational models linking music and associated emotions, the subjective and unique perspective of each individual listener has rarely been taken into account [2], [25], [26]. Music emotion research often requires the assessment of agreement among listeners; however, agreement in music emotion ratings from multiple listeners is usually limited [16], [27], [28]. Variance between listeners can be caused by numerous factors, including the inherent subjectivity of individual perception, participants' limited understanding of emotion taxonomies, ill-defined rubrics used to rate emotion, insufficient rating training, and the lack of controls on data collection when using online crowd-sourcing platforms. MER normally utilises an average or majority emotion response as a target for explanatory or predictive models, or simply discards inconsistent ratings from further investigation. This is a reductive way of examining the problem; we must first understand the reliability of emotion annotations, as findings of disagreement between raters are still useful and may indicate that emotion perception differs among individuals more than previously thought [29]. This is evident in the MediaEval Database for Emotional Analysis in Music (DEAM) [16]: the limited consistency in annotator ratings poses a reliability issue when averaged emotion annotations are used as "ground truth" for the creation of a generalised model [30]. This has instead driven analyses towards investigation of the differences between annotators. In emotion modelling, the divergence of participant annotations from this generalisation produces a natural upper bound for computational approaches and creates a serious bottleneck in MER system performance [31]. Models that would go beyond what humans agree upon perhaps lead to a systematic misrepresentation of how emotion perception occurs in an empirical setting [29].
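To make the agreement problem concrete, the sketch below computes Cronbach's alpha over a raters-by-time matrix of valence ratings, treating each rater as an "item", and then measures each rater's divergence from the averaged "ground truth" curve, the quantity that bounds how well any model fit to that average can perform. The data here are simulated and the measure is one common choice for illustration, not necessarily the paper's exact analysis.

```python
# Hedged illustration: inter-rater consistency and divergence from the
# averaged annotation, on simulated (not real) valence time series.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: (n_raters, n_timepoints) array of continuous ratings."""
    k = ratings.shape[0]
    item_vars = ratings.var(axis=1, ddof=1)      # each rater's variance
    total_var = ratings.sum(axis=0).var(ddof=1)  # variance of summed series
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
shared = rng.standard_normal(200)                       # hypothetical shared valence contour
raters = shared + 0.8 * rng.standard_normal((15, 200))  # 15 noisy annotators

alpha = cronbach_alpha(raters)

# Divergence of each rater from the averaged "ground truth" curve: the
# mean of these errors is a natural upper bound on model accuracy.
mean_curve = raters.mean(axis=0)
rmse = np.sqrt(((raters - mean_curve) ** 2).mean(axis=1))
print(f"alpha = {alpha:.2f}, mean rater RMSE from average = {rmse.mean():.2f}")
```

Lowering the simulated noise raises alpha and shrinks the per-rater RMSE together, which is precisely why limited consistency, as in DEAM, undermines an averaged ground truth.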
The present work was thus driven by the research questions: (1) "Does the agreement between listeners on perceived emotion vary over the course of a musical piece?"; (2) "What are the connections between the emotions perceived and the observed semantic features?"; and (3) "What musical or individual factors contribute to specific perceived emotions and emotion change?"

In an initial study, time-based valence-arousal ratings were collected during a live music performance (Live study); a sketch of how such time-stamped ratings can be aligned for analysis follows Table 1. In a secondary study, these emotion ratings were explored through open-ended feedback from participants in a controlled lab setting (Lab study). Through joint thematic analysis [32] of participants' feedback, built upon previous findings [33], we identified seven key themes influencing emotion annotations. The analysis highlights features such as instrumentation, musical structure, expressive embellishments, and music communication as being more closely aligned with underlying cognitive processes. We thus propose a more comprehensive focus in music emotion modelling that includes these listener-informed features. We believe attention to underlying semantic themes will address emotional inconsistencies and redirect the focus of MER systems to the listener experience.

2.1.2 Music and Setting

The music selected for this study was Arno Babajanian's (1921-1983) Piano Trio in F# minor (1952), performed by Hilary Sturt (violin), Ian Pressland (cello), and Elaine Chew (piano), with simultaneous visualisation of spectral art by Alessia Milo. The piece was performed twice at the 2015 Inside Out Festival on 22 October at Queen Mary University of London (QMUL). Audio and video were recorded by Milo, and the first performance was chosen for this study.

The approximately 23-minute piece presents three movements with widely contrasting tempos (Table 1) and is not well known to general audiences, making it likely to avoid familiarity bias. The piece is nonetheless firmly within the Western classical tradition, which allows us to relate the present research to the majority of related MER research [38]; however, the perception of this musical style may not be relevant to other genres, as addressed in Section 5.2.

TABLE 1
Movements of the piece

Movement   Tempo Marking        Tempo Characteristics       Duration (Min:Sec)
1          Largo                slow                        10:14
           Allegro expressivo   fast, bright, expressive
2          Maestoso             majestic                    6:15
           Andante              walking pace, steady
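Returning to the Live study's time-based ratings: continuous annotations arrive at irregular, annotator-specific times, so a typical first step is to align each annotator's valence-arousal log onto a shared timeline before comparing raters. The paper's exact preprocessing is not shown in this excerpt; the pandas sketch below, with hypothetical data, illustrates one common approach.

```python
# Assumed, illustrative preprocessing: align one annotator's irregularly
# timed valence-arousal log onto a shared 1 Hz timeline.
import pandas as pd

# Hypothetical raw log: seconds into the piece, valence/arousal in [-1, 1].
log = pd.DataFrame(
    {"t": [0.4, 2.7, 3.1, 6.9],
     "valence": [0.1, 0.3, 0.2, -0.4],
     "arousal": [-0.2, 0.5, 0.6, 0.1]}
)
log.index = pd.to_timedelta(log.pop("t"), unit="s")

# Resample to 1 s bins, forward-filling the last reported rating so the
# series is defined at every second (a rating persists until changed).
aligned = log.resample("1s").mean().ffill()
print(aligned)
```

Once every annotator is on the same grid, the per-timepoint agreement analyses sketched earlier can be applied directly.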
