
Thesis

The Voice of Primates: Neuro-evolutionary Aspects of Emotions

DEBRACQUE, Coralie

Abstract

As rightly emphasized by Pascal Belin, “voices are everywhere”. From music to social conversation, the human voice has indeed the extraordinary power to communicate our everyday emotions. Shaped by millions of years of evolution, the vocal recognition of emotions guides individuals toward the best decision to take. The capacity to vocally express and then identify an emotion is not a distinctive characteristic of Homo sapiens. In fact, most species in the animal kingdom have such abilities, which maximize their chances of survival and reproductive success. Despite homologous traits between humans and other animals, rare are the studies using a comparative approach to better understand emotional processing in voices. To fill this gap, the present thesis aims to investigate perceptual decision-making mechanisms through emotional vocalizations expressed by humans and non-human primates (NHP), our closest relatives. In line with this goal, six complementary studies were performed using imaging as well as behavioral paradigms.

Reference

DEBRACQUE, Coralie. The Voice of Primates: Neuro-evolutionary Aspects of Emotions. Thèse de doctorat : Univ. Genève et Lausanne, 2020, no. Neur. 278

DOI : 10.13097/archive-ouverte/unige:143258 URN : urn:nbn:ch:unige-1432583

Available at: http://archive-ouverte.unige.ch/unige:143258

Disclaimer: layout of this document may differ from the published version.


DOCTORAT EN NEUROSCIENCES des Universités de Genève et de Lausanne

UNIVERSITE DE GENEVE SECTION DE PSYCHOLOGIE

Professeur Grandjean Didier, directeur de thèse

Professeur Gruber Thibaud, co-directeur de thèse

THE VOICE OF PRIMATES: NEURO-EVOLUTIONARY ASPECTS OF EMOTIONS

THESE Présentée à la Faculté de Psychologie et des Sciences de l’Education de l’Université de Genève pour obtenir le grade de Docteure en Neurosciences

par

Coralie DEBRACQUE

de Cormeilles en Parisis (France)

Thèse N°278

Genève

Imprimeur : Université de Genève

2020


Published articles

Gruber, T*., Debracque, C*., Ceravolo, L., Igloi, K., Marin Bosch, B., Frühholz, S., & Grandjean, D. (2020). Human Discrimination and Categorization of Emotions in Voices: A Functional Near-Infrared Spectroscopy (fNIRS) Study. Frontiers in Neuroscience, 14. https://doi.org/10.3389/fnins.2020.00570

*joint first authors

Additional paper

Ben-Moussa M*, Debracque C*, Rubo M*, Lange WG*. (2017) DJINNI: A Novel Technology Supported Exposure Therapy Paradigm for SAD Combining Virtual Reality and Augmented Reality. Frontiers in Psychiatry, 8:26. https://doi.org/10.3389/fpsyt.2017.00026

*all authors contributed equally


Remerciements

Peu douée pour exprimer mes émotions (les cordonniers sont-ils véritablement les plus mal chaussés ?), c’est pourtant avec une facilité déconcertante que je vous remercie toutes et tous aujourd’hui, pour votre incroyable contribution à ce projet de thèse.

Premièrement, je tiens à remercier le Prof. Didier Grandjean, superviseur émérite, qui par ses conseils toujours avisés et sa bonne humeur permanente, m’a fait grandir en tant que chercheur mais aussi en tant que personne. Les soirées mémorables que nous avons passées ensemble m’ont rappelé à quel point il était important de vivre ! J’aimerais également remercier mon co-superviseur, le Prof. Thibaud Gruber, qui m’a apporté une aide cruciale lors de la rédaction des articles. De plus, il m’a permis de découvrir la recherche en primatologie, univers fascinant qui rappelle à l’Homme qu’il n’est finalement qu’un grand primate parmi tant d’autres. Un grand merci au Dr. Adrien Meguerditchian, mon « co-co-superviseur marseillais » comme j’aime l’appeler, pour cette dernière et quatrième année de thèse. Grâce à sa gentillesse et ses retards légendaires, j’ai eu l’unique opportunité de travailler avec des primates non-humains qui ont rapidement compris comment me manipuler ... Je souhaiterais également remercier les jurés présents lors de mon examen intermédiaire, mon introduction privée et finalement lors de ma défense publique : Prof. David Sander, Prof. Pascal Belin, Prof. Klaus Zuberbühler, Prof. Anne-Lise Giraud et le Dr. Roland Maurer, pour leur bienveillance et leurs questions pertinentes sur le projet.

Je dois également beaucoup à mes chères et chers collègues qui ont malgré eux supporté mes blagues et mes sarcasmes durant quatre années. Vous reconnaitrez tout de même que cela animait l’open space du campus biotech ! J’aimerais particulièrement remercier le Dr. Leonardo Ceravolo, pour son incroyable gentillesse et son aide plus que précieuse, à la fois dans cette thèse et dans mes essais (ratés ?) à la natation. Un grand merci à Blanca Marin Bosch, la seconde et dernière « NIRS girl » de toute la Suisse romande… Nous avons commencé la fNIRS en même temps, nous finirons avec gloire et honneur au même moment ! Dédicace particulière au Dr. Damien Benis, avec qui je débattrai encore longuement de l’utilisation d’électrodes intra-corticales chez les singes et autres animaux de laboratoire. Je


dois cependant reconnaitre que nous sommes tous deux d’accord en ce qui concerne l’intra chez l’espèce humaine (pour ceux qui comprendront). Dans un souci de synthèse, je tenais à remercier: Dr. Simon Schaerlaeken (extraordinaire), Dr. Raphaël Thézé (Pandore), Dr. Donald Glowinski (musique), Dr. Manuella Philippa (douce), Dr. Ben Meuleman (statistiques), Patricia Cernadas (extraordinaire bis), Alexandra Zaharia (résiliente), Cyrielle Chappuis (tinder), Marion Gumy (choupinette), Sylvain Tailamée (S.A.V.) et Carole Varone (culture) pour les nombreux moments passés ensemble à discuter, rire et profiter tout simplement.

Les meilleurs pour la fin ! Je remercie sincèrement mes parents pour leur soutien indéfectible durant ces 30 dernières années. Je sais d’expérience que mon sale caractère (si si ! je vous jure !) ainsi que mes changements d’humeur chaotiques n’ont pas toujours été évidents à supporter. Finalement, un remerciement un peu spécial pour mes fidèles et très chers compagnons, sans lesquels ma vie aurait été bien différente : Mr Pim’s, Cacahuète, Galac et Petit-Chat, qui ne sont malheureusement plus parmi nous ; Patapon, Pixelle et Plume, toujours en vie et heureux de l’être !


Abstract in English

As rightly emphasized by Pascal Belin, “voices are everywhere”. From music to social conversation, the human voice has indeed the extraordinary power to communicate our everyday emotions. Shaped by millions of years of evolution, the vocal recognition of emotions guides individuals toward the best decision to take. The capacity to vocally express and then identify an emotion is not a distinctive characteristic of Homo sapiens. In fact, most species in the animal kingdom have such abilities, which maximize their chances of survival and reproductive success. Despite homologous traits between humans and other animals, rare are the studies using a comparative approach to better understand emotional processing in voices. To fill this gap, the present thesis aims to investigate perceptual decision-making mechanisms through emotional vocalizations expressed by humans and non-human primates (NHP), our closest relatives. In line with this goal, six complementary studies were performed using imaging as well as behavioural paradigms.

Study 1 investigated the vocal recognition of emotions in human voices in implicit and explicit decoding using functional near-infrared spectroscopy (fNIRS). The main objective of this first study was to demonstrate i) the involvement of distinct cerebral and behavioural mechanisms at play in biased and unbiased choices; and ii) the suitability of fNIRS to assess decision-making and emotional mechanisms. Therefore, twenty-eight participants categorized (unbiased choice) or discriminated (biased choice) angry, fearful and neutral pseudo-words in implicit (word recognition) or explicit (emotional content) decoding of emotions. fNIRS analyses revealed differences in the hemodynamic responses of the bilateral inferior frontal cortex (IFC) between the implicit and explicit decoding of emotions, as well as a modulation of IFC activity depending on the categorization and discrimination tasks. These findings are supported by our behavioural data showing that participants were more accurate for explicit categorization and implicit discrimination compared to implicit categorization and explicit discrimination. Overall, our results suggest, first, the existence of distinct mechanisms for the implicit and explicit decoding of emotions and, second, the suitability of fNIRS to assess such mechanisms in humans. The level of complexity in affective decision-making is thus at play for human voices. But do we have the same mechanisms for the processing of heterospecific vocalizations?
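To make the analysis logic behind such results more concrete, the snippet below is a minimal, purely illustrative sketch of the kind of general linear model commonly applied to fNIRS data: boxcar regressors for the four active conditions (categorization/discrimination crossed with implicit/explicit decoding) are convolved with a canonical haemodynamic response function and fitted to one oxygenated-haemoglobin channel. The sampling rate, block onsets, durations and the simulated signal are all assumptions for illustration, not the parameters or pipeline actually used in Study 1.

```python
import numpy as np
from scipy.stats import gamma

FS = 10.0          # assumed fNIRS sampling rate (Hz)
N_SAMPLES = 3000   # assumed recording length (5 minutes at 10 Hz)

def canonical_hrf(fs, duration=30.0):
    """Double-gamma haemodynamic response function (canonical shape)."""
    t = np.arange(0, duration, 1.0 / fs)
    peak = gamma.pdf(t, 6)          # positive peak around 6 s
    undershoot = gamma.pdf(t, 16)   # late undershoot around 16 s
    hrf = peak - 0.35 * undershoot
    return hrf / hrf.max()

def boxcar(onsets, block_dur, n_samples, fs):
    """Boxcar regressor: 1 during each block, 0 elsewhere."""
    reg = np.zeros(n_samples)
    for onset in onsets:
        start = int(onset * fs)
        reg[start:start + int(block_dur * fs)] = 1.0
    return reg

# Hypothetical block onsets (in seconds) for the four active conditions.
conditions = {
    "categorization_explicit": [20, 140],
    "categorization_implicit": [50, 170],
    "discrimination_explicit": [80, 200],
    "discrimination_implicit": [110, 230],
}

hrf = canonical_hrf(FS)
design = []
for onsets in conditions.values():
    reg = np.convolve(boxcar(onsets, 20.0, N_SAMPLES, FS), hrf)[:N_SAMPLES]
    design.append(reg)
X = np.column_stack(design + [np.ones(N_SAMPLES)])  # add an intercept column

# Simulated HbO time series for one frontal channel (replace with real data).
rng = np.random.default_rng(0)
hbo = X @ np.array([0.8, 0.2, 0.3, 0.6, 0.0]) + rng.normal(0, 0.5, N_SAMPLES)

# Ordinary least-squares fit: one beta estimate per condition regressor.
betas, *_ = np.linalg.lstsq(X, hbo, rcond=None)
for name, b in zip(conditions, betas):
    print(f"{name}: beta = {b:.3f}")
```

Condition-wise beta estimates of this kind are what allow hemodynamic responses to be compared between implicit and explicit decoding, or between categorization and discrimination tasks.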


Following up on the findings of Study 1, Study 2 explored the categorization and discrimination of emotions in human voices and NHP vocalizations using fNIRS. The main aim of this experiment was to explore the involvement of similar decision-making mechanisms in cross-taxa recognition. Hence, thirty participants categorized or discriminated threatening, distressful, and affiliative contents in human voices, great ape (chimpanzee –Pan troglodytes, bonobo –Pan paniscus) and monkey (macaque –Macaca mulatta) vocalizations. fNIRS analyses interestingly revealed a distinct involvement of the prefrontal cortex (PFC) and the pars triangularis of the inferior frontal gyrus (IFGtri) between the categorization and the discrimination of affects in the vocalizations of all primate species. Further analyses also demonstrated that the correct categorization of agonistic chimpanzee and bonobo calls, as well as of affiliative chimpanzee vocalizations, was associated with a decrease of activity in bilateral PFC and IFGtri. On the contrary, the accurate discrimination of agonistic chimpanzee vocalizations was correlated with a bilateral enhancement of activity in the frontal regions. Finally, our behavioural results showed that, with the exception of threatening bonobo calls, human participants were able to discriminate all affective cues in all primate species, whereas for the categorization task they were unable to do so for macaque vocalizations. Together, these findings point out the difference between discrimination and categorization processes as well as an acoustic divergence between chimpanzee and bonobo vocalizations. Acoustic analyses are thus needed to better understand the recognition of emotions in NHP vocalizations.

In order to assess the role of acoustic features in cross-taxa recognition, Study 3 investigated, from a phylogenetic and acoustic perspective, the perceptual decision-making mechanisms at play in the identification of affects in human and NHP vocalizations. The aim of this study was to demonstrate the importance of acoustic and phylogenetic proximities in such processes. For this purpose, sixty-eight participants performed the paradigm described in Study 2. Our analyses revealed a closer acoustic similarity between human and chimpanzee vocalizations than between human voices and the calls expressed by bonobos and macaques. Consequently, participants were more accurate at categorizing and discriminating affective cues in chimpanzee vocalizations compared to bonobo or macaque calls. Interestingly, our behavioural results also showed the ability of human participants to identify distressful and affiliative macaque calls during discrimination, while in the categorization task they were only capable of doing so for affiliative vocalizations. Overall,


our findings support the distinction between categorization and discrimination mechanisms revealed in Studies 1 and 2. Furthermore, the difference in recognition performance between chimpanzee and bonobo or macaque vocalizations underlines the importance of both acoustic and phylogenetic distances, triggering an additional question: are acoustic and phylogenetic proximities essential to engage brain regions usually associated with the processing of conspecific vocalizations?
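Acoustic similarity between call types can be quantified in several ways; the later chapters of the thesis mention acoustic Mahalanobis distances. Purely as a hedged illustration (the feature values, feature set and call counts below are invented), the following sketch computes the mean Mahalanobis distance between the acoustic feature distribution of a species' calls and that of human voices, the kind of measure that can then be related to participants' recognition performance.

```python
import numpy as np

def mahalanobis_to_reference(samples, reference):
    """Mean Mahalanobis distance of each row of `samples` to the
    distribution of `reference` (rows = calls, columns = acoustic features)."""
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    cov_inv = np.linalg.pinv(cov)            # pseudo-inverse for numerical stability
    diffs = samples - mu
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return np.sqrt(d2).mean()

rng = np.random.default_rng(1)
# Hypothetical feature matrices: 40 calls x 3 features
# (e.g. mean F0 in Hz, duration in s, spectral centroid in Hz).
human = rng.normal([220.0, 0.8, 1500.0], [40.0, 0.2, 300.0], size=(40, 3))
chimp = rng.normal([300.0, 0.9, 1600.0], [60.0, 0.3, 350.0], size=(40, 3))
macaque = rng.normal([600.0, 0.4, 2500.0], [80.0, 0.1, 400.0], size=(40, 3))

for name, calls in [("chimpanzee", chimp), ("macaque", macaque)]:
    print(f"{name} vs human voices: {mahalanobis_to_reference(calls, human):.2f}")
```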

To answer this question, Studies 4 and 5 explored the recognition of human and NHP species vocalizations in the temporal voice areas (TVA) and the IFG using functional magnetic resonance imaging (fMRI). Twenty-five participants were asked to categorize the species (human, chimpanzee, bonobo, and macaque) that expressed the vocalizations. Wholebrain analyses revealed i) the involvement of the left TVA and of the pars triangularis, pars opercularis and pars orbitalis (except for bonobos) of bilateral IFG for the categorization of NHP vocalizations; ii) an enhancement of activity in all subparts of the IFG for the correct recognition of chimpanzee and macaque calls (left pars orbitalis excluded for macaques); iii) an involvement of the pars triangularis of the IFG for the correct categorization of bonobo screams; and iv) an increase of activity in bilateral TVA for the categorization of human and chimpanzee vocalizations compared to bonobo and macaque screams. Following this, functional connectivity analyses showed a similar coupling between right and left TVA for the identification of human voices and chimpanzee calls. Finally, our behavioural results demonstrated the capacity of human participants to accurately recognize all primate species, with the exception of bonobos. Together, our results suggest that humans are capable of recognizing most primate species. However, phylogenetic, acoustic and behavioural proximities seem required to enhance activity in the TVA and all subparts of the IFG.

Finally, in order to close the comparative loop, it was essential to investigate the auditory processing at play in NHP. The goal of Study 6 was, as a proof of concept, to demonstrate the suitability of fNIRS to explore vocal recognition mechanisms in NHP. For this purpose, we tested fNIRS on three adult female baboons anesthetized with a minimal dose of propofol. Two passive tasks were performed. In the first one, experimenters repeatedly extended the right or left arm of the anesthetized subject. In the second task, white noise as well as agonistic chimpanzee and baboon vocalizations were broadcast through headphones, either in stereo or lateralized to the right or left ear. fNIRS analyses of arm


stimulations revealed, for the three baboons, contralateral activity of the motor cortex depending on right or left arm movements. Similarly, for one subject, contralateral activity of the temporal cortex was found depending on the lateralization of the sounds. In addition, stimuli broadcast in stereo increased oxygenated haemoglobin more strongly in the right temporal cortex. Overall, our results support the suitability of fNIRS to assess the modulation of brain activity in NHP. Further analyses in Study 6, currently in progress, will investigate vocal recognition processing in baboons.
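Hemispheric asymmetries of this kind are often summarized with a normalized asymmetry quotient (the table of contents mentions an "AQ score calculation" in Chapter 6). The sketch below is a minimal, hypothetical illustration of such an index computed from mean O2Hb changes over right- and left-hemisphere channels; it is not the exact formula used in the thesis, and all values are invented.

```python
import numpy as np

def asymmetry_quotient(right_response, left_response):
    """Normalized asymmetry index in [-1, 1]:
    > 0 indicates right-hemisphere dominance, < 0 left-hemisphere dominance."""
    r, l = float(right_response), float(left_response)
    return (r - l) / (abs(r) + abs(l))

# Hypothetical mean O2Hb changes (arbitrary units) over motor-cortex channels
# during passive movements of the LEFT arm for three subjects.
left_arm_stim = {
    "baboon_1": {"right_hemi": 0.42, "left_hemi": 0.11},
    "baboon_2": {"right_hemi": 0.35, "left_hemi": 0.09},
    "baboon_3": {"right_hemi": 0.28, "left_hemi": 0.12},
}

for subject, resp in left_arm_stim.items():
    aq = asymmetry_quotient(resp["right_hemi"], resp["left_hemi"])
    side = "contralateral (right)" if aq > 0 else "ipsilateral (left)"
    print(f"{subject}: AQ = {aq:+.2f} -> {side} dominance")
```

A positive index for left-arm stimulation, as in these made-up values, is what a contralateral motor response would look like.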

In sum, using an evolutionary perspective, the present thesis significantly improves our knowledge, at the cerebral and behavioural levels, of emotional processing through the human recognition of affects in conspecific and heterospecific vocalizations. Importantly, the current project will lead to further non-invasive investigations in NHP.


Abstract in French

Mentionné à juste titre par Pascal Belin « Les voix sont partout ». De la musique aux conversations sociales, la voix humaine a le pouvoir extraordinaire de communiquer quotidiennement nos émotions. Influencée par des millions d’années d’évolution, la reconnaissance vocale des émotions permet de guider les individus dans leur prise de décision. La capacité d’exprimer vocalement puis d’identifier une émotion n’est pas une caractéristique propre à Homo sapiens. En effet, dans le règne animal, la majorité des espèces possède également de telles aptitudes, permettant de maximiser leurs chances de survie et leur succès reproducteur. Bien que l’Homme ait des traits communs avec les autres primates, rares sont les études utilisant une approche comparative afin de mieux comprendre les processus émotionnels en jeu dans la voix. Pour combler cette lacune, cette thèse a pour but d’explorer les mécanismes de prise de décision, à travers les vocalisations émotionnelles exprimées par des humains et des primates non-humains (NHP), nos plus proches cousins. Dans ce but, six études complémentaires analysant des données cérébrales et comportementales ont été réalisées.

L’Etude 1 a permis d’investiguer la reconnaissance vocale implicite et explicite des émotions à travers les voix humaines grâce à la Spectroscopie proche infrarouge fonctionnelle (fNIRS). L’objectif principal de cette première étude était de démontrer i) les mécanismes cérébraux et comportementaux liés à une prise de décision biaisée ou non biaisée ; et ii) la pertinence de la fNIRS dans l’évaluation des mécanismes émotionnels et décisionnels. Par conséquent, vingt-huit participants ont catégorisé (choix non biaisé) ou discriminé (choix biaisé) implicitement (reconnaissance sémantique) ou explicitement (reconnaissance émotionnelle) des pseudo-mots prononcés avec un ton de colère, de peur ou neutre. Les analyses fNIRS ont alors révélé une modulation de la réponse hémodynamique, dans le cortex frontal inférieur (IFC), selon la reconnaissance implicite ou explicite des émotions. De même, les analyses ont démontré une modulation de l’activité dans l’IFC, liée à la tâche de catégorisation ou discrimination. Ces résultats sont confirmés par nos données comportementales. En effet, nous avons par exemple montré que les participants étaient plus performants dans la tâche de catégorisation explicite que dans la tâche de discrimination implicite. Dans l’ensemble, nos résultats soutiennent premièrement, l’existence de mécanismes distincts pour la reconnaissance implicite et explicite des émotions ; deuxièmement, la pertinence de la fNIRS pour évaluer de tels processus. La


complexité du choix influence donc la reconnaissance émotionnelle dans les voix humaines. Mais qu’en est-il des vocalisations hétérospécifiques ?

Faisant suite aux résultats de l’Etude 1, l’Etude 2 a permis d’explorer avec la fNIRS, la catégorisation et la discrimination des émotions à travers les voix humaines et les vocalisations de NHP. Le but de cette expérience était d’investiguer les mécanismes de prise de décision, impliqués dans la reconnaissance des émotions exprimées par diverses espèces de primates. Pour cela, trente participants ont catégorisé ou discriminé l’affiliation (positif), la menace et la détresse dans des voix humaines, des vocalisations de grands singes (chimpanzé –Pan troglodytes, bonobo –Pan paniscus) et de singes (macaque –Macaca mulatta). Les analyses fNIRS ont révélé une implication distincte du cortex préfrontal (PFC) et du pars triangularis du gyrus frontal inférieur (IFGtri) dans la catégorisation et la discrimination des affects à travers toutes les vocalisations primates. D’autres analyses ont aussi démontré que la catégorisation des affects négatifs dans les vocalisations de chimpanzés et de bonobos était associée à une baisse d’activité dans le PFC et l’IFGtri. A l’inverse, la discrimination des affects négatifs dans les vocalisations de chimpanzés était significativement corrélée à une hausse d’activité dans les régions frontales. Enfin, nos résultats comportementaux ont montré la capacité des participants à discriminer tous les affects à travers toutes les vocalisations (à l’exception de la menace dans les vocalisations de bonobos). A contrario, les participants ont été bien incapables de catégoriser les affects dans les vocalisations de macaques. Ensemble, nos résultats suggèrent des différences entre les processus de catégorisation et de discrimination, ainsi qu’une divergence acoustique entre les vocalisations de chimpanzés et de bonobos. Une analyse acoustique nous a donc semblé nécessaire pour mieux comprendre la reconnaissance humaine des émotions à travers les vocalisations des NHP.

Afin d’évaluer le rôle des composantes acoustiques dans de tels processus, l’Etude 3 a eu pour but d’explorer, grâce à une perspective phylogénétique et acoustique, les mécanismes de décision émotionnelle en jeu, dans l’identification des affects à travers les voix humaines et les vocalisations de NHP. Pour cela, soixante-huit participants ont réalisé le même paradigme que celui décrit dans l’Etude 2. Nos analyses ont alors révélé l’existence de similarités acoustiques plus importantes entre les voix humaines et les vocalisations de chimpanzés que pour les autres espèces. En conséquence, les participants furent meilleurs à reconnaitre le contenu affectif dans les vocalisations de ces deux espèces. De façon


intéressante, nos données comportementales ont aussi démontré la capacité des participants, à discriminer la détresse et l’affiliation dans les vocalisations de macaques. Cependant, dans la tâche de catégorisation, seul le contenu affiliatif de ces vocalisations a pu être identifié par les participants. Par conséquent, nos résultats supportent une distinction entre les mécanismes de catégorisation et de discrimination révélés dans les Etudes 1 et 2. De plus, les différences de performances entre les vocalisations chimpanzés et les vocalisations bonobos, soulignent l’importance de la distance phylogénétique et acoustique dans ces processus. Mais ces deux facteurs sont-ils primordiaux pour impliquer des régions cérébrales habituellement associées aux vocalisations conspécifiques ?

Pour répondre à cette question, les Etudes 4 et 5 ont investigué, avec l’imagerie par résonnance magnétique fonctionnelle (IRMf), la reconnaissance des vocalisations humaines et NHP, dans les aires temporales de la voix (TVA) et l’IFG. Vingt-cinq participants ont donc catégorisé les vocalisations de quatre espèces de primates (humain, chimpanzé, bonobo et macaque). Les analyses d’imagerie cérébrale ont révélé i) l’implication des TVA gauches et du pars triangularis, pars opercularis et pars orbitalis (excepté pour les bonobos) de l’IFG, pour la catégorisation des vocalisations de tous les NHP ; ii) une augmentation de l’activité dans toutes les sous-parties de l’IFG (à l’exception du pars orbitalis gauche pour les macaques), pour la reconnaissance correcte des vocalisations de chimpanzés et macaques; iii) une implication du pars triangularis de l’IFG pour la catégorisation correcte des vocalisations de bonobos; et iv) l’activation des TVA pour la catégorisation des vocalisations humaines et de chimpanzés. De même, les analyses de connectivités fonctionnelles ont montré un couplage similaire entre les TVA des hémisphères droit et gauche, pour l’identification des voix humaines et des vocalisations de chimpanzés. Enfin, nos données comportementales ont démontré la capacité des participants à reconnaitre la majorité des espèces de primates. Ensemble, nos résultats suggèrent qu’une proximité phylogénétique, acoustique et comportementale est nécessaire pour impliquer les TVA et l’ensemble des sous parties de l’IFG.

Afin de clore la boucle comparative, il était essentiel d’investiguer les processus auditifs en jeu chez les NHP. Le but de l’Etude 6 était donc, en tant que preuve de concept, de démontrer la pertinence de la fNIRS pour explorer les mécanismes de reconnaissance vocale chez les NHP. Pour cela, nous avons testé la fNIRS sur trois femelles babouins adultes anesthésiées avec un dosage minimum de propofol. Deux tâches passives ont alors été


réalisées. Dans la première, les expérimentateurs ont mobilisé plusieurs fois le bras droit ou gauche du sujet anesthésié. Dans la deuxième tâche, à l’aide d’un casque audio, des bruits blancs ainsi que des vocalisations de chimpanzés et de babouins ont été diffusés en stéréo ou latéralisés dans l’oreille droite ou gauche. Les analyses fNIRS des stimulations motrices ont alors révélé, pour les trois babouins, une activité controlatérale du cortex moteur, en fonction des mouvements passifs du bras droit ou gauche. De même, pour un seul sujet, une activation controlatérale du cortex temporal a été démontrée selon la latéralisation des sons. De plus, d’autres analyses ont montré que les stimuli diffusés en stéréo avaient significativement plus activé le cortex temporal droit de ce même sujet. Dans l’ensemble, nos résultats confirment la pertinence de la fNIRS pour mesurer l’activité cérébrale chez les NHP. De futures analyses dans l’Etude 6, toujours en cours, exploreront les processus impliqués dans l’exposition à des vocalisations de babouins et de chimpanzés.

En résumé, utilisant à la fois une perspective évolutionnaire et des données cérébro-comportementales, cette thèse contribue à améliorer la compréhension des processus émotionnels en jeu dans les vocalisations conspécifiques et hétérospécifiques. Enfin, ce projet conduira, je l’espère, à de futures investigations non-invasives chez les NHP.


Table of Contents

Published articles ...... 3
Remerciements ...... 4
Abstract in English ...... 6
Abstract in French ...... 10
Table of Contents ...... 14
Theoretical Part ...... 19
1. Abstract of Literature Review ...... 20
2. Literature Review ...... 21
2.1 A brief history of Evolutionary Theory ...... 21
2.1.1 Development of Evolutionary Thought ...... 21
2.1.1.1 At the Origins of Sciences ...... 22
2.1.1.2 Darwinism and Modern View of Evolution ...... 25
2.1.2 Evolution in Theories of Emotion ...... 28
2.1.2.1 Evolution of Emotion ...... 29
2.1.2.2 Emotions in Animals: The end of a Debate? ...... 34
2.2 Evolutionary continuity between Primates’ Affective Processing ...... 41
2.2.1 Emotional Prosody Recognition ...... 41
2.2.1.1 Emotional Prosody in Human Voice ...... 42
2.2.1.2 Human Recognition of Affects in other Primate Vocalizations ...... 48
2.2.2 Expression of Affects in Non-Human Primate Calls ...... 53
2.2.2.1 Affective Communication in non-human Primates ...... 53
2.2.2.2 A Continuous Evolution of Brain Mechanisms ...... 57
2.3 Synthesis of the Introduction ...... 62
3. Thesis objectives ...... 62
Experimental Part ...... 65
Chapter 1. Human Discrimination and Categorization of Emotions in Voices: A functional Near-Infrared Spectroscopy (fNIRS) study ...... 65
1.1 Abstract ...... 65
1.2 Introduction ...... 66
1.3 Materials & Methods ...... 70
1.3.1 Participants ...... 70
1.3.2 Stimuli ...... 70
1.3.3 Procedure ...... 71
1.3.4 NIRS Recordings ...... 73
1.4 Analysis ...... 74
1.4.1 Behavioural Data ...... 74
1.4.2 fNIRS Data ...... 75
1.4.3 Analyses Including Passive Blocks ...... 76
1.4.4 Analyses on Active Blocks Only ...... 77
1.5 Results ...... 77
1.5.1 Behavioural Data ...... 77
1.5.1.1 Accuracy Data ...... 77
1.5.1.2 Reaction Time ...... 78
1.5.2 NIRS Data ...... 79
1.5.2.1 Analyses Including the First Passive Run ...... 79
1.5.2.2 Analyses of the Active Blocks ...... 81
1.6 Interim Discussion ...... 83
1.7 Supplementary Material ...... 87
Chapter 2. Categorization and Discrimination of Human and Non-Human Primate Affective Vocalizations: a functional NIRS study of the Frontal cortex involvement ...... 93
2.1 Abstract ...... 93
2.2 Introduction ...... 94
2.3 Materials & Methods ...... 97
2.3.1 Participants ...... 97
2.3.2 Vocalizations ...... 97
2.3.3 fNIRS acquisition ...... 98
2.3.4 Experimental procedure ...... 99
2.4 Analysis ...... 100
2.4.1 Behavioural data ...... 100
2.4.2 Interaction between Participants’ Performance and Brain O2Hb changes ...... 100
2.4.3 fNIRS data ...... 100
2.5 Results ...... 102
2.5.1 Accuracy ...... 102
2.5.2 Interaction between Participants’ Performance and Brain O2Hb changes ...... 102
2.5.3 fNIRS data ...... 104
2.6 Interim Discussion ...... 105
2.7 Supplementary Material ...... 108
Chapter 3. Humans Recognize Affective cues in Primates Vocalizations: Acoustic and Phylogenetic perspectives ...... 113
3.1 Abstract ...... 113
3.2 Introduction ...... 114
3.3 Materials & Methods ...... 117
3.3.1 Participants ...... 117
3.3.2 Vocalizations ...... 117
3.3.4 Experimental procedure ...... 118
3.4 Analysis ...... 119
3.4.1 Acoustic analyses ...... 119
3.4.2 Behavioural analyses ...... 120
3.4.3 Interaction between Behaviour and Acoustic Similarity ...... 121
3.5 Results ...... 121
3.5.1 Acoustic analyses ...... 121
3.5.2 Behavioural results ...... 122
3.5.3 Interaction between Behaviour and Acoustic similarity ...... 124
3.6 Interim Discussion ...... 125
3.7 Supplementary Material ...... 128
Chapter 4. Sensitivity of the Anterior Human Temporal Voice Areas to Affective Chimpanzee Vocalizations ...... 135
4.1 Abstract ...... 135
4.2 Introduction ...... 135
4.3 Materials & Methods ...... 137
4.3.1 Species Categorization task ...... 137
4.3.1.1 Participants ...... 137
4.3.1.2 Stimuli ...... 138
4.3.1.3 Experimental Procedure and Paradigm ...... 138
4.3.2 Temporal Voice Areas localizer ...... 138
4.3.2.1 Participants ...... 138
4.3.2.2 Stimuli and Paradigm ...... 139
4.4 Analysis ...... 139
4.4.1 Behavioural Data analysis ...... 139
4.4.1.1 Accuracy ...... 139
4.4.1.2 Acoustic Mahalanobis distances ...... 140
4.4.1.3 Interaction between Behaviour and Mahalanobis distances ...... 140
4.4.2 Imaging Data acquisition ...... 141
4.4.2.1 Species Categorization task ...... 141
4.4.2.2 Temporal Voice Areas localizer task ...... 141
4.4.3 Wholebrain data analysis in the TVA ...... 141
4.4.3.1 Species Categorization task region-of-interest analysis within the Temporal Voice Areas ...... 141
4.4.3.2 Temporal Voice Areas localizer task ...... 142
4.4.4 Functional Connectivity analysis ...... 143
4.5 Results ...... 144
4.5.1 Interaction between Behaviour and Mahalanobis distances ...... 144
4.5.2 Region-of-interest data within the Temporal Voice Areas ...... 145
4.5.3 Functional Connectivity ...... 146
4.6 Interim Discussion ...... 148
4.7 Supplementary Material ...... 151
Chapter 5. Non-Human Primate Vocalizations are Categorized in the Inferior Frontal Gyrus ...... 154
5.1 Abstract ...... 154
5.2 Introduction ...... 155
5.3 Materials & Methods ...... 156
5.3.1 Participants ...... 156
5.3.2 Stimuli ...... 157
5.3.3 Experimental Procedure and Paradigm ...... 157
5.3.4 Image acquisition ...... 158
5.4 Analysis ...... 158
5.4.1 Wholebrain analysis ...... 158
5.4.2 Behavioural data analysis ...... 160
5.5 Results ...... 160
5.5.1 Wholebrain data ...... 160
5.5.1.1 Model 1: Processing of all Species trials Independently of Categorization Performance ...... 160
5.5.1.2 Model 2: Processing of Correctly Categorized Species trials ...... 162
5.5.2 Accuracy ...... 165
5.5 Interim Discussion ...... 166
5.6 Supplementary Material ...... 168
Chapter 6. Brain Activation Lateralization in Monkeys (Papio anubis) following Asymmetric Motor and Auditory stimulations through functional Near Infrared Spectroscopy ...... 170
6.1 Abstract ...... 170
6.2 Introduction ...... 171
6.3 Materials & Methods ...... 173
6.3.1 Subjects ...... 173
6.3.2 Subject’s Hand Preference in Communicative Gesture and Bi-Manual task ...... 174
6.3.3 Recordings ...... 174
6.3.4 Motor stimulations ...... 175
6.3.5 Auditory stimulations ...... 176
6.4 Analysis ...... 176
6.4.1 fNIRS signal ...... 176
6.4.2 AQ score calculation ...... 177
6.5 Results ...... 178
6.5.1 Motor stimulations ...... 178
6.5.2 Auditory stimulations ...... 178
6.6 Interim Discussion ...... 179
6.7 Supplementary Material ...... 182
General Discussion ...... 184
1. Synthesis and Integration of the Main Findings ...... 184
2. Theoretical implications ...... 189
3. Limitations ...... 193
4. Future Perspectives ...... 195
5. Conclusion ...... 196
References ...... 198
Additional Project ...... 225


Theoretical Part

« J’aime trop les humains ! J’aime qu’ils me servent, j’aime les regarder pleurer, j’aime les regarder souffrir, j’aime les regarder vivre, j’aime les savoir ignorants de ma tendresse à leur égard, j’aime les savoir convaincu de mon incompétence à comprendre leur monde, de mon incapacité à les écouter et partager leurs peines et leurs tristesses. Ne suis-je pas qu’un misérable singe […] le chimpanzé sympathique ? »

Wajdi Mouawad, Anima, Chapter II Bestiae Fabulosae, Pan Troglodytes.


1. Abstract of Literature Review

Mechanisms underlying the human recognition of emotions are still poorly understood. Mainly focused on Homo oeconomicus and its rationality, studies on decision-making1 processes have only recently investigated how emotional experiences affect our daily choices. Yet, for our survival2, it is crucial to distinguish a friendly situation from an agonistic one, in order to approach or avoid the source of these emotions. This ability to recognize emotional signalling must therefore rely on old evolutionary bases. In fact, identifying emotional cues in order to make adaptive choices is also primordial across the animal kingdom, with the effect of increasing the survival chances of an individual of a given species. Despite innate mechanisms and homologous traits between humans and other animals, only a few studies have used a comparative approach to explore the possible capacity of humans to recognize emotions expressed by other animals, especially non-human primates, our closest relatives. Moreover, most of these studies have investigated such processes through facial expressions, and little is known about the human recognition of emotions in primate vocalizations. The present thesis attempts to fill this gap by exploring the behavioural and cerebral mechanisms underlying the human recognition of emotions in human voices, great ape and monkey calls.

Human behaviours, cognition and even emotions were shaped through millions of years of evolution. From our common ancestor with chimpanzees and bonobos 6-8 million years ago to Homo sapiens, human abilities have changed. Taking this extraordinary history into consideration has become crucial in affective sciences for a general understanding of emotional processing. However, evolutionary theories were denied or ignored by most of the scientific community for a long time. Darwinism and its impact on human and animal3 research, especially in the affective domain, was only recognized late (Section 2.1). Scientists of the 20th and 21st centuries underlined the importance of evolution in human daily life, highlighting for example that arousal or emotional valence promote survival. Thus, researchers started to investigate affective communication in mammals, especially in primates, and comparative approaches emerged to explore cross-taxa recognition in humans (Section 2.2).

Evolutionary theories are part of the great history of humankind (Section 2.1.1). Going back to Antiquity, most ancient civilizations already understood or believed that species change

1 The term decision-making refers here to perceptual decision-making.
2 The term survival refers to proximate explanations of evolutionary mechanisms such as the recognition of emotions. For ultimate causes, the concept of fitness is used.
3 To simplify, the term animal is used to refer to all species except humans. Obviously, humans are part of the animal kingdom.

over time. Since this extraordinary period of knowledge, evolutionary thought has continued to progress through the ages, encountering criticism and rejection from a part of the scientific community influenced by morality or religion (Section 2.1.1.1). However, during the 19th century, scientists such as Darwin and Wallace underlined once again the importance of evolutionary mechanisms, specifically natural selection, to understand the global history of mankind. The latter theory is the basis of the modern view of evolutionary thought (Section 2.1.1.2). Since the 20th century, the scientific community has accepted the modern synthesis of evolution. Theories in psychology, especially in affective sciences (Section 2.1.2), have integrated this new point of view to investigate the evolution of emotional processing in humans (Section 2.1.2.1) but also in other animals (Section 2.1.2.2).

Emotions and evolution are closely related. Affective processes rely on vocalisations in humans and other species to promote the survival of individuals (Section 2.2.1). In fact, the prosodic modulation of utterances in human voices can signal, among other things, emotional cues from a speaker to a receiver, allowing the latter to react adequately to a potential danger conveyed, for instance, by a fearful or an angry prosody in the speaker’s voice (Section 2.2.1.1). Over the last decade, comparative approaches have emerged in neurosciences and psychology to explore the human identification of affects in other species’ vocalizations (Section 2.2.1.2). However, studies on this topic are still scarce and none of them have investigated, at both the behavioural and cerebral levels, the differences in mechanisms related, for instance, to the identification task itself (e.g. categorisation or discrimination processing). Consequently, research to fill this gap is needed. Moreover, emotional processing being crucial to attention, social interaction and, more generally, to survival, affective mechanisms can also be found in other primate vocalizations (Section 2.2.2) at the acoustic (Section 2.2.2.1) and cerebral (Section 2.2.2.2) levels.

2. Literature review

2.1. A Brief history of Evolutionary Theory

2.1.1. Development of Evolutionary Thought

From Antiquity to the 21st century, evolutionary thought has been discussed through the ages. In Ancient Greece and the Roman Empire, philosophers, scientists and poets of the time debated the origin of humankind. Despite the fact that the first evolutionary


ideas were emerging, they were not sufficient to explain the appearance of human and animal beings on Earth in the face of the strong belief in divine intervention for the creation of life. Due to the power of religion, evolutionary thought was thus ignored until the Age of Enlightenment. The 18th century was indeed the time of an important philosophical revolution, in which the thirst for knowledge enabled the theory of evolution to finally reach a milestone. Following this great period of scientific knowledge, the first conceptualizations of evolutionary mechanisms emerged in the 19th century with the concepts of the transmutation of species and natural selection, proposed by Jean-Baptiste Lamarck and Charles Darwin respectively. Nevertheless, it was Darwin’s ideas of fitness and phylogenetic branches that initiated the “Darwinian revolution”, in which the fact that humans are primates, and thus animals among others, was finally recognized in science. His theories were strongly debated across the scientific community, even if his friend Thomas Henry Huxley strongly supported him. Nowadays, through the modern synthesis of evolution, evolutionary mechanisms are recognized as key factors to explain physiological and behavioural processes in humans and other animals.

2.1.1.1. At the Origins of Sciences

The intuition that species change over time and descend from a common animal is not a modern theory. In fact, it is probably one of the oldest concepts in science. Indeed, Anaximander of Miletus (610 – 546 BC), often considered the first evolutionist, already suggested that the first animals on Earth lived in water and that humans could be the descendants of a primitive fish (Kočandrle & Kleisner, 2013). However, influential philosophers such as Plato (428 – 348 BC) and Aristotle (384 – 322 BC), who believed in a divine intervention for the creation of life, questioned this first evolutionary perspective. In the same way, later in the Roman Empire, Lucretius (99 – 55 BC) described in his poem “De rerum natura” survival mechanisms in the development of life (Holmes, 2007). Nevertheless, he was also criticized by Cicero (106 – 43 BC), a major philosopher of that time, strongly influenced by religion (Cicero, 2003).

Crossing the Middle Ages and the Renaissance, evolutionary thought finally reached a milestone during the Age of Enlightenment (18th century). Indeed, the word “evolution” took on its full meaning of progression with Charles Bonnet, in his concept of future generation development: the theory of pre-formation, in which females carry within them all future


generations in a miniature form (Pallen, 2011). In addition, Pierre Louis Moreau de Maupertuis was the first to describe in his book “Venus Physique” the mechanism of “natural selection” conceptualized later by Charles Darwin: “In the fortuitous combinations of the productions of nature, none but those that found themselves in certain relations of fitness could subsist, is it not wonderful that this fitness is present in all species that are currently in existence?” (Maupertuis, 1745, p. 197). Similarly, Georges-Louis Leclerc, Comte de Buffon, like his future successor Jean-Baptiste Lamarck, argued that species are varieties modified by environmental factors from one original individual. He even suggested that humans and apes had a common ancestor. However, the Comte de Buffon also believed that each species had an original form arisen through spontaneous generation and, thus, that modifications specific to each species were limited (Pallen, 2011). Finally, James Burnett, Lord Monboddo, probably ahead of his time, already suggested in the late 18th century that humans were the descendants of other primates, and often compared the human species to other great apes such as chimpanzees or orangutans in his book “Of the Origin and Progress of Language”: “The orangutan is an animal of the human form, inside as well as outside: That he has the human intelligence, as much as can be expected in an animal living without civility or arts: That he has a disposition of mind, mild, docile and human: That he has the sentiments and affections peculiar to our species” (Monboddo, 1774, p. 289). Interestingly, such writings impacted scientific theories as well as popular thought, starting to influence theatrical art (see Figure 1) and the poetic literature of the 18th and 19th centuries (E. Darwin, 1803).

Figure 1: Illustration of Mazurier’s ape costume for his role of Joko at the Théâtre de la Porte-Saint-Martin (Mazurier, Rôle de Joko, 1826).


Emerging in the early 19th century, the first comprehensive theory of evolution, known as Lamarckism as well as Transformationism, emphasized the importance of environmental adaptation in species complexity. In fact, even if Jean-Baptiste Lamarck in his theory of the transmutation of species believed in spontaneous generation and thus rejected the idea of a common ancestor (Lamarck, 1809), his concept of the inheritance of acquired characteristics in species strongly influenced evolutionary thinkers. A professor of zoology, Lamarck famously illustrated his theory using the giraffe’s neck adaptation as one example (see Figure 2) in his books of 1802 and then 1809: “In regard to habits, it is interesting to observe a product of them in the particular form and height of the giraffe. […] the earth is nearly always arid and without herbage, obliging it to browse on the leaves of trees and to continually strive to reach up to them. It was resulted from this habit, maintained for a long time by all individuals of the race that the forelegs have become longer than the hind legs and its necks has so lengthened itself” (Lamarck, 1809, p. 219).

Figure 2: Drawing by Lamarck depicting the giraffe’s neck adaptation. He suggested that, over generations the long neck of the giraffe evolved to reach higher leaves, which are unattainable by any other herbivore. (Lamarck, 1802).

Hence, in response to modifications in their environment, a given species adopted new habits leading to new structures (e.g. muscles) accumulated across the future generations and resulting at some point in a new species. Despite the fact that Lamarck’s theory was well conceptualized, he did not explain explicitly how the acquired characters were


transmitted to the next generation. Nevertheless, even if he never pronounced the word “heredity”, Lamarck did refer to sexual reproduction and to characters shared between both parents (Burkhardt, 2013). The main opponent of the theory of the transmutation of species was Georges Cuvier, who, like Aristotle, believed in the fixity of species. According to him, a species could only change after an extinction or a catastrophic episode leading to a new period of creation. Cuvier’s theory came from his work as a palaeontologist, a discipline of which he is often considered the founder, as he is of comparative anatomy (Bowler, 2003). Nowadays, some researchers still claim a Lamarckian point of view in modern fields such as developmental plasticity or epigenetic inheritance, mainly focusing on the developmental aspect of Lamarck’s theory (Gissis et al., 2011).

From Antiquity with Anaximander of Miletus to the early 19th century with Jean-Baptiste Lamarck, many concepts of Darwin’s theory had already been put forward. Nevertheless, the “Darwinian revolution” (Burkhardt, 2013) enabled for the first time the conceptualization of all the relevant hypotheses in one comprehensive theory, which is currently the basis of the modern view of evolution.

2.1.1.2. Darwinism & Modern view of Evolution

Travelling across the globe aboard the HMS Beagle for five years, the young naturalist Charles Darwin collected fossils and geological artifacts while precisely describing his surrounding environment. Writing down his discoveries in a personal notebook, “The Voyage of the Beagle”, Darwin started to question the fixity of species, encouraged by the captain of the HMS Beagle, Robert Fitzroy, who gave him the first volume of “Principles of Geology”, written by Charles Lyell, a famous geologist of his time, which supported the idea of slow and long periods of modification in the development of the Earth, totally rejecting the theory of Georges Cuvier. Influenced by his observations and Lyell’s ideas, Darwin secretly modelled the concept of transmutation of species, which he described, in opposition to Lamarck, as a process of divergence and branching between species. While Darwin had already built the foundations of his concept of natural selection, named so in opposition to the artificial selection made by humans (Huxley, 1881), at the time of his travels (see Figure 3), he waited twenty years to publish his first conceptualization of evolution.


Figure 3: Illustration of Darwin’s finches, observed in the Galápagos Islands (1835). Darwin described his concept of natural selection using the evolutionary mechanisms that were at play in finches. According to Darwin, the different species of finches have different beaks because they are adapted to eat different kinds of food (e.g. nuts, insects), but they all descend from a common ancestor, given their important behavioural and anatomical similarities.

In parallel to Darwin, Alfred Russel Wallace, a naturalist, also believed in the transmutation of species after his own observations in South America and the Malay Archipelago. He also suggested in his early writings that the evolution of species would be explained by branching mechanisms (Pallen, 2011). Like Darwin, Wallace was strongly influenced by a clergyman, Thomas Robert Malthus, and his concept of the “struggle for existence” (Bowler, 2003). Advised by Lyell and Joseph Dalton Hooker, Darwin allowed them to present his own work on natural selection conjointly with Wallace’s theory to the Linnean Society in 1858 (Wyhe & Rookmaaker, 2012). Yet, one year later, the Darwinian revolution was fully realized with the first publication of Darwin’s famous book “On the Origin of Species” (Darwin, 1859). Darwin’s theory of natural selection and branching provoked intense debates in the scientific community of the 19th century. In fact, the concept of branching claimed by Darwin suggested that humans and the other great apes were together on the same evolutionary tree. Triggering a philosophical revolution, scientists such as Cuvier or even Darwin’s friends Lyell and Wallace questioned this perspective. Indeed, in Cuvier’s theory, humans are from a different order of mammals and thus cannot be compared to any other species. Similarly, Lyell and Wallace effectively defended the idea of a common physical ancestor between humans and other great apes but totally rejected the continuity between some aspects of their minds. The debate became even more intense with the successive publications of “The Descent of Man and Selection in Relation to Sex” (Darwin, 1871) and “The Expression of the Emotions in Man and Animals” (Darwin, 1872), in which Darwin clearly compared humans to other


animals, from a biological point of view as well as from an intellectual and emotional one. Read by the non-scientific community as well, the newspapers of the period often caricatured Darwin as a primitive monkey (see Figure 4). Nevertheless, these caricatures ultimately contributed to the popularity of Darwin and of evolutionary theories.

Figure 4: Darwin caricatures in 19th-century newspapers (Cavin & Vallotton, 2009).

Moreover, Thomas Henry Huxley, positioning himself as “Darwin’s Bulldog”, defended Darwin’s hypotheses against all odds, demonstrating for instance how close humans and apes are anatomically speaking, even in their brain structures. He also pointed out the fact that, in contrast to Lamarck’s or Cuvier’s theories, Darwin was the first to describe evolutionary mechanisms without any divine or supernatural intervention. Visionary, Huxley concluded in his book “On the Origin of Species: Or, The Causes of the Phenomena of Organic Nature”: “Mr. Darwin’s work is the greatest contribution which has been made to biological science […] and I believe that, if you take it as the embodiment of a hypothesis, it is destined to be the guide of biological and psychological speculation for the next three or four generations.” (Huxley, 1881, p. 144).

As Huxley predicted, Darwinian theories of evolution are nowadays strongly supported by the cross-disciplinary consensus that emerged during the 20th and 21st centuries (see Figure 5). His grandson Julian Huxley, also a biologist, was the first to name this consensus “the modern synthesis” of evolution (Huxley, 1942).


Figure 5: Diagram of the ideas brought together in the “Modern Synthesis” (modified from Ian, 2020).

Therefore, influencing scientists and thinkers from the 19th to the 21st centuries, Darwinian theories were the keystones of evolutionary thought. Darwin indeed initiated a philosophical revolution in which other animals, like humans, expressed and felt emotions. Section 2.1.2. will develop how this perspective influences current theories of emotion.

2.1.2. Evolution in Theories of Emotion

The definition of emotion may be one of the most debated concepts in research. Despite the fact that the lack of consensus has led a few psychologists to even doubt the necessity of a definition (Frijda, 2007; LeDoux, 2012), we will consider in the next sections that “emotion” is a rapid process, focused on a specific event and involving i) elicitation mechanisms based on stimulus relevance; and ii) multiple emotional response processes (action tendency, autonomic reaction, expression and feeling components - Sander, 2013). While the definition of emotion is still discussed, its evolutionary basis is now recognized by most scientists in the affective domain. From the theories of Nesse, Frijda and Ekman to, more recently, Scherer’s model, evolution is key to understanding the genesis of emotions in humans. Moreover, if old evolutionary mechanisms are really at play in our emotional experiences, we can assume that emotional processes are also present in other animals, especially in phylogenetically close species such as non-human primates. In spite of intense debates on this last perspective, research has often described the existence of more or less complex affective lives in multiple species across the animal kingdom.


2.1.2.1. Evolution of Emotion

Darwin in “The expression of the emotions in man and animals” (Darwin, 1872) was the first to truly emphasize the key role of evolutionary mechanisms in emotional processing. Nowadays, it is accepted that emotions have an evolutionary history. Randolph Nesse, for example, wrote in 2009 that “Natural selection shaped emotions and the mechanisms that regulate them” (Nesse, 2009, p. 159), referring to the role of natural selection in, first, the subdivision of emotional types, and second, the organization of psychological and physiological processes that facilitate an adaptive response to a particular situation (Al-Shawaf & Lewis, 2017; Nesse, 1990 - see Figure 6). In fact, different emotions were probably shaped by natural selection because each specific situation elicited different sets of adaptive responses. However, in our modern environment, emotions are not always adapted (Greenberg, 2002). Johan Bolhuis and Clive Wynne explained this with a practical and quite comical example: “The tendency of modern humans to spontaneously fear spiders rather than cars, which are far more dangerous, is thought to stem from the prevalence of poisonous arachnids, rather than dangerous driving, during the Pleistocene.” (Bolhuis & Wynne, 2009, p. 832). Yet, emotions enable us to react differently to a threatening or a joyful situation, for instance. This last process was well conceptualized by Nico Frijda with his notion of action readiness, referring to the motivational aspect of emotions (Frijda, 2007, 2016). In fact, specific motives would appear to “move” animals to achieve their biosocial goals (e.g. reproduction) and to guide them to pay attention or be emotionally aroused in a certain situation (Gilbert, 2015). Thus, most emotions may involve a position toward a specific object (e.g. rejection) and readiness to implement that position in action (e.g. by moving away). Furthermore, states of action readiness would involve activation and deactivation states as well as action tendency (theorized by Arnold, 1960) to avoid or approach an object in a particular context. Action tendency relies on specific brain networks in the human brain, involving the bilateral prefrontal cortex (PFC), the amygdala-motor pathway, the right inferior frontal gyrus (IFG), the anterior cingulate cortex (ACC) and the periaqueductal grey area (PAG) (see Section 2.2.1.1 for a more detailed description of neural networks involved in emotion in humans and Section 2.2.2.2 for a review in non-human primates). Moreover, in his concept of approach-withdrawal, Richard Davidson demonstrated the distinct involvement of the right and left PFC depending on approach or avoidance behaviours elicited by the type of emotion (Davidson et al., 1990). Davidson and collaborators indeed demonstrated a stronger


activation of anterior fronto-temporal regions in the right hemisphere for fear and disgust faces (withdrawal emotions) compared to higher neural activity in the left hemisphere for happy faces normally triggering approach behaviours (Davidson et al., 1990; Davidson, 1992).

Figure 6: Phylogeny of emotions shaped by natural selection. Resources are represented in upright font, emotions in italic and situations in capitals. Natural selection gradually differentiated responses to increase the ability of individuals to obtain the three main types of resources: personal, reproductive and social effort. Emotions were thus shaped to deal with situations arising from the pursuit of specific goals (Nesse, 2004).

As illustrated in Figure 6, different situations can have adaptive challenges in common. Consequently, emotions can be classified along two dimensions: i) arousal and ii) valence (Nesse, 2009). “Arousal” refers to a short-term increase in some processes that can be viewed as involving excitatory mechanisms (an increase in behaviour or physiological activity) (Fowles, 2009). One example is a massive increase in sympathetic nervous system activity (e.g. cardiac or respiratory rhythm) triggered by a phobic stimulus (e.g. a spider). “Valence” refers to the pleasant or unpleasant character of emotions (e.g. Frijda, 1987; Scherer, 2003). For example, happiness is classified as a positive emotion whereas threat is recognized as a negative one. Positive emotions should elicit approach behaviours whereas negative emotions should trigger avoidance behaviours. These two dimensions are highly correlated and thus both are often used for the modelling of emotions (starting from Russell, 1980). Nevertheless, other dimensional aspects exist to represent emotional experiences (Fontaine et al., 2007). In fact, the individual sense of power


or control used to represent emotions (potency – control) has also been well described in the literature. For example, anger is expected to score highly on this dimension whereas fear is expected to score extremely low (Goudbeek & Scherer, 2010). Interestingly, Fontaine and collaborators also highlighted the importance of a fourth dimension in the representation of emotions. This last dimension, characterized by the unpredictability of an emotional event, would indeed be particularly essential to explain the ambivalent status of surprise (Fontaine et al., 2007). Moreover, most theories of emotions involve complex psychological and physiological processes as well as evolutionary mechanisms (Sander et al., 2018). For instance, Paul Ekman and David Matsumoto define emotions as “Transient, bio-psychosocial reactions designed to aid individuals in adapting to and coping with events that have implications for survival and well-being” (Ekman & Matsumoto, 2009, p. 69). Hence, their definition highlights the key role of emotions in psychological and physiological processes enabling individuals to maximize their chances of survival. Furthermore, Ekman, in his theory of Basic emotions (Ekman, 1999), underlines the existence of an “Emotion alert database” storing schema information and enabling individuals to react adaptively. He gives as an example the perception of a coiled object that could match the schema of a snake and thus trigger the emotion of fear. Importantly, he also emphasizes that the universality of seven emotional expressions relies on brain regions such as the amygdala and the insula (Sander et al., 2018) for anger, disgust, fear, happiness, sadness, contempt and surprise (see Figure 7), suggesting mechanisms inherited from our evolutionary history (Ekman, 2003). For instance, happiness would correspond to sub-goals being achieved, anger to an active plan being obstructed, sadness to the failure of a major plan or the loss of an active goal, fear to a self-preservation goal being threatened or to goal conflict, and disgust to a gustatory goal being violated (Juslin & Laukka, 2003). Yet, the universality of emotions is controversial in the literature. Some findings indeed demonstrate cross-cultural emotional bases (e.g. Sauter et al., 2010; Scherer et al., 2001) whereas others do not (e.g. Crivelli et al., 2016; Jack et al., 2016). Despite the fact that the results found by Carlos Crivelli and collaborators on cultural variation of emotional expression are highly debated (Kret & Straffon, 2018), other findings on facial expressions of emotions also suggest cross-cultural differences. According to Jack and collaborators, Ekman’s work was indeed biased by the use of static images, excluding for instance the temporal dynamics of facial expression (Jack et al., 2014). The authors thus decided to investigate, across three different studies, the conceptual mapping, the dynamics and

31

the similarities of facial expressions of emotions between two cultures. Unlike Ekman, they revealed that only four latent emotional patterns were universal (rage, shame, joy and ecstatic), suggesting cultural differences for certain types of emotion (Jack et al., 2016).

Figure 7: The seven basic emotions and their universal expressions (Matsumoto et al., 2008).

In line with the appraisal mechanisms described by Frijda or Phoebe Ellsworth (Ellsworth & Scherer, 2003) and the components of emotions described by Ekman, Klaus Scherer conceptualized in his component process model (CPM; see Figure 8) the cognitive and physiological sequences that shape the genesis of emotions (Scherer, 1982, 2001). The rapid cognitive evaluation of an event would facilitate adaptive mechanisms in emergency situations. In fact, in 2009 the psychologist described the CPM as “Based on the idea that during evolution, emotion replaced instincts in order to allow for more flexible response to events in a complex environments, and it did so by introducing an interrupt for further processing into the stimulus-response chain.” (Scherer, 2009, p. 93). Moreover, according to the author, emotions would have been optimized i) to evaluate an object or an event; ii) to regulate the process; iii) to prepare and guide action; iv) to communicate reactions and behavioural intentions; and v) to monitor internal states and organism-environment interactions (Scherer, 2001, 2009).


Figure 8: Comprehensive illustration of the component processes of emotion (Sander et al., 2005; Scherer, 2001). In summary, in individuals an emotion is determined by i) the filtering of a relevant stimulus or event; ii) the evaluation of the organism’s implication; iii) the level of control and adjustment to this event; and iv) the response suitability to personal and social norms. In addition, for each sequential stage of assessment, there are two outputs (represented with the arrows): a modification of the cognitive – motivational mechanisms that have influenced the appraisal process and efferent effects on the periphery.

Importantly, Scherer also attempted to conceptualize the mechanisms involved in the vocal communication of emotions between a speaker and a listener. For this purpose, he revised Karl Bühler’s Organon model of linguistic signs (Bühler, 1990), which postulates that any sign of language has three adaptive functions: i) “the symptom” of the speaker’s states; ii) “the symbol” of a socially shared meaning category; and iii) “the appeal”, which refers to a social message directed toward others. Adapting these three functions to the vocal expression of emotions, Scherer thus describes the symptom as the mechanisms underlying the sender’s cognitive and emotional states, the symbol as the concept of emotions, and the appeal as the listener’s reactions in terms of approach or avoidance behaviours (Scherer, 1988, 1992, 2009b). Furthermore, based on Bühler’s and Brunswik’s model of emotional encoding – decoding processing (Brunswik, 1956), Scherer also crucially emphasizes “the strong pressure exerted by impression (pull) factors on expression (push) factors during the course of evolution of communication in socially living species” (Scherer, 2009c, p. 168). For example, an individual from any social species gives the impression of strength and power by producing loud and low-frequency vocalizations relying on vocal tract tension. Hence, both psychobiological mechanisms (internal push effects) and social norms (external pull effects) would influence the encoding and the decoding of the vocal expression of emotions in humans and other animals (Grandjean et al., 2006; Scherer, 1988, 2009c; see Figure 9).

Figure 9: Adaptation of Brunswik’s lens model, including the influences of conventions, norms and display rules (pull effects) and psychobiological mechanisms (push effects) on the encoding of emotional vocalizations produced by the speaker and on the decoding made by the listener, as well as the reciprocal influences of these two aspects on attributions. An emotional content is thus expressed by distal indicator cues (e.g. acoustic features) that are perceived by a listener who, on the basis of proximal percepts and contextual information, makes a subjective attribution of the speaker’s emotional states (Grandjean et al., 2006).

Affective sciences, through the main theories of emotions, recognize the crucial role of evolutionary processes such as natural selection in the emergence of emotions. For instance, despite the current debate, the universality of basic emotions suggests mechanisms inherited through natural selection. If this hypothesis of inheritance is correct, similar processes should be found in other species as well, in particular in non-human primates (NHP), our closest relatives.

2.1.2.2. Emotions in Animals: The end of a Debate?

According to René Descartes, “Animals are like robots: they cannot reason or feel pain.” (Proctor et al., 2013, p. 883). Fortunately, in the 21st century, this Cartesian point of view is no longer shared in affective sciences. However, the necessity to define or even to study emotions is still debated in animal research (de Vere & Kuczaj, 2016; de Waal, 2011; LeDoux, 2012). Many reasons may explain this ongoing debate. First, our anthropocentric view of human emotions has led some scientists not to recognize complex emotional processes in animals, or to consider them differently because they would necessarily be less important than the ones experienced by humans (Proctor et al., 2013). Second, the “anthropo-denial” inherited by research from the avoidance of anthropomorphism when explaining animals’ behaviours (de Waal, 2018; Panksepp, 2011a). Anthropomorphism, indeed, cannot be used directly to explain results; however, it should be considered in the formulation of hypotheses dealing with certain types of primary psychobiological processes that we might share with other animals (Panksepp & Burgdorf, 2003). Third, the ongoing question of whether innate emotion circuitries exist in the human brain (LeDoux, 2012): do emotions exist by nature or are they specific to the human mind? (Charland, 2002; Griffiths, 2004). Finally, the lack of consensus in the use of the terms emotion, affect and feeling. Usually, in psychology, the notion of affect encompasses several classes or categories of mental states such as emotions, moods or attitudes (Frijda & Scherer, 2009). The notion of feeling refers to the subjective perception of emotional states and their accompanying somatic responses. For now, feelings can only be assessed by verbal reports (Anderson & Adolphs, 2014). Yet, in the animal literature, the terms emotion, feeling and affect are often used as synonyms (Anderson & Adolphs, 2014; Boissy et al., 2007; de Vere & Kuczaj, 2016). Hence, it has become crucial in affective sciences, neurosciences, primatology and even ethology to find an agreement on the mechanisms that define animal emotions. Accordingly, in this section and the following chapters we will use the more general term affect when referring to animals’ emotional experiences (humans excluded).

Despite these debates, researchers such as Jaak Panksepp have greatly contributed to the exploration of affective mechanisms in animals. A pioneer in the field of comparative affective neuroscience, he described seven different systems in the mammalian brain that promote affective actions and associated feelings: i) SEEKING for expectancy; ii) RAGE for anger; iii) FEAR for anxiety; iv) LUST for sexuality; v) CARE for nurturance; vi) PANIC for separation; and vii) PLAY for joy (Panksepp, 2010, 2011a, 2011b; see Figure 10). These evolutionary and primal affects are the components of the basic affective circuits of mammalian brains, involving sub-cortical regions of the limbic system (amygdala, hypothalamus, cingulate cortex, hippocampus) and cortical areas such as the frontal or temporal lobes (Montag & Panksepp, 2017; Panksepp, 2011b).

Figure 10: The four major affective operating systems. They are defined primarily by genetically coded but experientially refined neural circuits that generate well-organized behaviour sequences (Panksepp, 2011b).

Going further, Panksepp also investigated affective consciousness in animals. For this purpose, he defined the term consciousness as “The brain states that have an experiential feel to them, and it is envisioned as a multi-tiered process that needs to be viewed in evolutionary terms, with multiple layers of emergence” (Panksepp, 2005, p.32). He then distinguished: i) primary-process consciousness, reflecting sensory – perceptual feelings and emotional – motivational experiences; ii) secondary consciousness, referring to the capacity to have thoughts about external events; and iii) the tertiary form of consciousness, related to awareness, the ability to remember experiences and to transform simple thoughts into linguistic and symbolic meanings (Panksepp, 2005). Affective consciousness in animals is still strongly debated in research (Boissy et al., 2007). Yet, this clear distinction between different levels of consciousness could allow scientists to consider the existence of the primary and secondary processes in animals. These first two levels could be the primitive precursors of more complex layers of consciousness in our evolutionary history, whereas the third, more complex process would be specific to the human mind. Therefore, animals could be conscious at a certain level of their affective experiences but would not be consciously aware of them (Berridge, 2002; Panksepp, 2005, 2010). This lack of tertiary consciousness would explain why only humans can stimulate or regulate their emotions through rumination or self-criticism, for example (Gilbert, 2015). However, it is also possible that animals do have these abilities and that the available scientific methodologies are simply unable to assess them; researchers may not be able to access these kinds of thoughts (Panksepp, 2014).

According to Arnellos and Keijzer, the term animal refers to “Collections of cells that are connected and integrated to such an extent that these collectives act as unitary, large free-moving entities capable of sensing macroscopic properties and events” (Arnellos & Keijzer, 2019, p.1). This definition encompasses a very large number of species, independently of their phylogenetic proximity to humans. The ability to recognize and respond appropriately to the affective expressions of others has biological fitness benefits for both the signaller and the receiver (Schmidt & Cohn, 2001). From this evolutionary point of view, the capacity to express and identify affective cues, in conspecifics or even in heterospecifics, should be found across a large number of species: “The assumption that the other animals are unfeeling behavioural zombies seems evolutionarily improbable” (Panksepp & Burgdorf, 2003, p. 536). In fact, the literature often describes the ability of domestic dogs to recognize affective cues associated with pictures and/or vocalizations. For instance, in a cross-modal looking paradigm, Albuquerque and collaborators demonstrated that seventeen dogs looked significantly longer at pictures of faces (dogs or humans) that were congruent with the valence (positive or negative) of the vocalizations they heard. These results underlined, first, that domestic dogs correctly discriminate between positive and negative affects in both humans and dogs; and second, the importance of recognizing affects in heterospecifics for a species living in mixed groups and socially interacting with humans (Albuquerque et al., 2016). In a famous and controversial study of 1999, Panksepp and Burgdorf revealed that laughter is not only shared between primates but is also present in very distant species such as rats (Rattus norvegicus). The “laughing rats” were indeed able to produce laughter-like vocalizations in contexts of play with conspecifics and heterospecifics (experimenters). Interestingly, the authors hypothesized that rats’ laughter would correspond to the laughter expressed by humans in early childhood (Panksepp & Burgdorf, 2003).


Research on affective experiences in animals is often accused of mammal-centrism (Proctor et al., 2013). However, molluscs or even insects also seem to have a certain type of primitive affect, making the evolutionary explanation only one factor in understanding emotional processing in the human mind (Bolhuis & Wynne, 2009). For example, Anderson and Adolphs showed the potential existence of affective behaviours in the octopus. In fact, the switch between camouflage (freezing behaviour) and ink expulsion – propulsion (avoidance behaviour) in an octopus confronted with a threatening situation (e.g. a predator) would demonstrate a gradation of affective intensity in the mollusc. Furthermore, the same authors, as well as LeDoux, described neural circuitries associated with specific postures in flies (Drosophila; see Figure 11) and approach-avoidance behaviours in worms (C. elegans), emphasizing some primitive aspects of affective experiences in phylogenetically very distant species (Anderson & Adolphs, 2014; LeDoux, 2012).

Figure 11: Illustration of the antithesis theory of Darwin in which opposite affective experiences produce behaviourally opposite expressions. a) Human sadness (left image) and human happiness (right image); b) Antithetical postures in dogs (from Darwin, 1872); c) opposite postures in flies (Anderson & Adolphs, 2014).

Nevertheless, to improve our knowledge about the evolutionary history of human emotions, a comparative approach is necessary with our closest relatives: the NHP (see Figure 12). To simplify, we will divide the NHP into two main groups of species: the great apes (gorillas Gorilla subs., chimpanzees Pan troglodytes, orangutans Pongo subs., bonobos Pan paniscus), with whom we shared a common ancestor around 14 million years ago, and the monkeys, such as macaques or baboons, which separated from the Hominidae branch 28 – 44 million years ago, depending on the species (Schrago & Voloch, 2013).

Figure 12: Primate phylogeny with divergence times (in millions of years ago; Mya) based on comparative MRI and genotyping data. Numbers in parentheses refer to the number of MRIs used for each species (Heuer et al., 2019).

Hence, several studies highlight the abilities of great apes and monkeys to express and perceive affective cues through faces, whole-body expressions or vocalizations (Fragaszy & Simpson, 2011). For instance, using a matching-to-sample design, Kano and collaborators demonstrated the capacity of young female chimpanzees to recognize pictures depicting aggressive conspecifics more easily than pictures representing relaxed chimpanzees, considered as neutral (Kano, 2008). Interestingly, the authors also showed, in juvenile and adult chimpanzees, an attentional bias towards agonistic scenes compared to neutral and positive ones in naturalistic movies involving the whole-body expressions of conspecifics (Kano & Tomonaga, 2010). This attentional sensitivity to negative affects has often been demonstrated in the human literature (e.g. Kret et al., 2013; Sander, Grandjean, Pourtois, et al., 2005; Schaerlaeken & Grandjean, 2018), suggesting common mechanisms across primate species. Indeed, an attentional bias towards threatening or fearful cues would maximize an individual’s chances of survival by enabling a rapid reaction to a potentially dangerous situation. However, this attentional process may not be present in all primate species. For instance, in macaques (Macaca mulatta), Bliss-Moreau and collaborators showed an attentional bias towards affective scenes involving conspecifics’ whole bodies, but they did not find any difference between aggressive and affiliative movies (Bliss-Moreau et al., 2017). Furthermore, using a dot-probe task, a study in bonobos revealed an attentional bias for positive scenes involving conspecifics (Kret et al., 2016). These results could be explained by the particular nature of this great ape species. Bonobos are indeed a more social species than chimpanzees (Gruber & Clay, 2016), and protective and affiliative behaviours could thus play a pivotal role in their society, increasing bonobos’ attentional capture by positive affects. Nevertheless, despite their more aggressive behaviour, chimpanzees are also capable of expressing affiliative contents through facial expressions, similarly to humans (Parr et al., 2005; Parr & Waller, 2006; see Figure 13). In addition, chimpanzees as well as bonobos, gorillas, orangutans and even siamangs (Symphalangus syndactylus), another ape species, are able to laugh in playful situations similarly to human infants or adults (Davila Ross et al., 2009; Davila Ross et al., 2011).

Figure 13: a) Prototypical facial expressions of laughter (left image) and smiling (right) in humans (adapted from Parr & Waller, 2006). b) The corresponding facial expressions in a young chimpanzee: play face (left image) and silent bared-teeth display (right) (Parr et al., 2005).

Finally, before closing this section on affects in animals, we will go further with an open question: do animals understand and feel what others feel? In other terms, do they have empathy? This question is currently debated among the scientific community. Yet, from an evolutionary perspective, empathy would facilitate social interactions (Gilbert, 2015). Thus, the ability to express empathy through altruistic behaviours could be necessary to many species. In fact, the literature on empathy in great apes often describes their probable ability to feel empathy towards their conspecifics. For instance, researchers revealed the capacity of chimpanzees to console the victim of an aggression (de Waal, 2009). Similarly, they also demonstrated affective contagion in orangutans relying on facial mimicry (Davila Ross et al., 2008). Both of these behaviours indeed seem to involve the understanding of others’ feelings. Despite controversial findings, some authors demonstrated possible empathic behaviours in more distant species such as birds or other mammals. For example, Fraser and Bugnyar suggested that ravens (Corvus corax), like chimpanzees, could show consolation behaviours towards the victim of an aggression, but only if the victim had a valuable relationship with the consoler (Fraser & Bugnyar, 2010). Moreover, some researchers also described possible compassion mechanisms in elephants (Loxodonta africana), detailing the particular interest of these elephants in the dead bodies of their conspecifics (Douglas-Hamilton et al., 2006). The question of animals’ empathy is still open. Nevertheless, if we consider empathy as a simple perception-action mechanism that provides an observer (the subject) with access to the subjective state of another (the object) through the subject’s own neural and bodily representations (de Waal, 2009; Preston & de Waal, 2002), we may hypothesize that empathic processing could be present in a large number of species.

Overall, affective scientists of the 21st century have by now accepted the usefulness of evolutionary thought for the general understanding of human emotions. Consequently, comparisons between emotional processes in humans and affective processing in NHP are now common. From this, Section 2.2 will describe the evolutionary continuity between human (Section 2.2.1) and NHP (Section 2.2.2) affective vocalizations.

2.2. Evolutionary continuity between Human and Primate Affective Processing

2.2.1. Emotional Prosody Recognition

Abilities to vocally express and recognize emotional cues in others are crucially involved in the survival of all primate species, humans included. Shaped by millions of years of evolution in its own lineage, the human species has become an expert in the vocal expression and recognition of emotions through prosodic modulations. Relying on specific physiological changes and particular brain networks, the variation of acoustic features is indeed essential to emotional communication in humans. Yet, while researchers have mainly focused on conspecific voices (human to human), little is known about the human capacity to decode emotional cues in other species’ vocalizations. Fortunately, comparative studies on this matter have recently emerged, especially on the human recognition of affect in non-human primate vocalizations. Interestingly, influenced by the familiarity or the acoustic proximity of the calls, humans seem able to identify affective valence or arousal in non-human primate vocalizations.

2.2.1.1. Emotional Prosody in Human Voice

Prosody refers to all suprasegmental changes in a spoken utterance, including intonation, amplitude envelope, tempo, rhythm and voice quality. Through the modulation of acoustic features such as fundamental frequency (F0, the lowest frequency of the voice, perceived as pitch) or energy distribution, prosody enables a listener to infer, for instance, the identity or the emotional state of a speaker (e.g. Grandjean et al., 2006; Wagner & Watson, 2010). Prosodic modulation in human speech is thus necessary for the encoding and decoding of emotional expressions. In fact, studies demonstrate the link between changes in acoustic parameters and the emotional valence or arousal expressed by humans. For instance, Elodie Briefer described how the arousal level of emotions correlates with modulations of respiration and phonation, influencing acoustic features such as F0, duration or speech rate (the number of syllables produced per time unit, i.e. the velocity of speech). Emotional valence, in contrast, relies on intonation and voice quality, mostly changing the energy distribution in the spectrum; positive emotions indeed show lower energy in frequency compared to negative ones (Briefer, 2012). Furthermore, specific prosodic modulations may characterize a particular emotion. For example, the literature often describes that i) angry voices rely on an increase of mean F0, mean energy, high-frequency energy and downward-directed F0 contours (the sequence of F0 values across the utterance, including mean, start, end, maximum and minimum of F0, i.e. the intonation contour); ii) fearful voices also correlate with an increase of mean F0, F0 range (the difference between minimum and maximum F0), high-frequency energy, speech rate, jitter (cycle-to-cycle variation in F0, i.e. pitch perturbation) and shimmer (cycle-to-cycle variation in amplitude, i.e. energy level); iii) sad voices, in contrast, involve a decrease of mean F0, F0 range, mean energy and downward-directed F0 contours; and finally iv) happy voices also increase mean F0, F0 range, mean energy, high-frequency energy, speech rate, jitter and shimmer (Banse & Scherer, 1996; Eyben et al., 2016; Goudbeek & Scherer, 2010; Hammerschmidt & Jürgens, 2007; Juslin & Laukka, 2003). Overall, these findings emphasize the acoustic differences between high and low arousal, in which higher arousal (anger, fear, happiness) corresponds to an increase of these speech parameters and lower arousal (sadness) is linked to a decrease of these acoustic properties. Interestingly, the same authors also show that the way specific emotions are expressed acoustically is similar across cultures, suggesting inherited mechanisms (Zimmermann et al., 2013). Evolutionary functions are indeed at play in the expression of acoustic features. For instance, current findings show that the rough temporal modulations found in alarm screams are selectively used to communicate danger across signal types. Thus, the roughness of screams, by inducing intense fear, would confer a behavioural advantage by increasing the speed and accuracy with which a listener decodes emotional cues in screams (Arnal et al., 2015).
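As an illustration of how such parameters can be extracted in practice, the sketch below uses the Praat-based parselmouth library to compute mean F0, F0 range, local jitter, local shimmer and mean energy from a single recording. It is a minimal sketch under assumed settings (file name, pitch floor and ceiling, default Praat thresholds), not the analysis pipeline of the studies cited above.

```python
import parselmouth
from parselmouth.praat import call

# Hypothetical recording of one emotional utterance
snd = parselmouth.Sound("utterance.wav")

# F0 contour; 75-600 Hz covers most adult emotional speech (assumed bounds)
pitch = snd.to_pitch(pitch_floor=75, pitch_ceiling=600)
f0 = pitch.selected_array['frequency']
f0 = f0[f0 > 0]                                   # keep voiced frames only
mean_f0, f0_range = f0.mean(), f0.max() - f0.min()

# Jitter (cycle-to-cycle F0 variation) and shimmer (cycle-to-cycle amplitude variation),
# computed from a glottal point process with Praat's default parameters
points = call(snd, "To PointProcess (periodic, cc)", 75, 600)
jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, points], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)

# Mean energy expressed as intensity in dB
mean_energy_db = call(snd.to_intensity(), "Get mean", 0, 0, "energy")

print(mean_f0, f0_range, jitter, shimmer, mean_energy_db)
```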

Physiological fluctuations in emotional states have been found to influence the acoustic properties of the speaker’s voice in a reliable and predictable manner that is perceptually available to a receiver (Juslin & Laukka, 2003; Taylor & Reby, 2010). Well conceptualized in the source – filter theory of vocal production (Fant, 1960; Titze, 1994), this biological process is crucially involved in affective communication in humans as well as in other mammals (e.g. Briefer, 2012; Filippi, 2016; Fitch, 2000). The source – filter theory distinguishes two independent functions carried out by different parts of the vocal apparatus: i) the source, including the larynx and laryngeal and sub-laryngeal structures; and ii) the filter (or vocal tract), the tube connecting the larynx to the mouth and the nose from which the sound is propagated into the environment (Titze, 1994; see Figure 14). Hence, studies have shown that the source, through the rate of opening and closing of the glottis as well as the length and mass of the vocal folds, determines the F0 structure of the speaker’s voice. In addition, through muscular interactions and changes in airflow or sub-glottal pressure, the source is also able to shape the tempo, the duration and the amplitude of vocalizations (Belin, 2006; Taylor & Reby, 2010). Thus, by modulating F0 or speech rate, for instance, the source plays an important role in the vocal expression of emotional valence and arousal in the speaker’s voice (Goudbeek & Scherer, 2010; Morton, 1977). Similarly, the vocal tract (filter) length and the position of the mouth enable the production of multiple formants, defined as concentrations of energy around specific frequencies (Belin, 2006; Fitch, 2000; Taylor & Reby, 2010). For example, it has been demonstrated that in a positive context humans often retract their lips, while in agonistic situations they protrude their lips, modulating the speaker’s voice quality (Drahota et al., 2008).

Figure 14: Illustration of the source – filter theory of vocal production in the goat, valid for all living mammals (humans included). The source sound is produced in the larynx by vibration of the vocal folds and is then filtered by the vocal tract, whose resonances determine the formants (F1 to F4), which correspond to concentrations of acoustic energy around particular frequencies (Briefer, 2012; Fitch, 2000).
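To make the filter side of the model concrete, the sketch below estimates formant frequencies with a standard linear predictive coding (LPC) approach, in which the roots of the LPC polynomial approximate the vocal-tract resonances. The file name, sampling rate and model order are assumptions for illustration; this is not the method used in the studies cited above.

```python
import numpy as np
import librosa

# Hypothetical short voiced segment (e.g. a steady vowel)
y, sr = librosa.load("vocalization.wav", sr=16000)

# Pre-emphasis flattens the spectral tilt before fitting the all-pole (LPC) model
y = librosa.effects.preemphasis(y)

# Common rule of thumb for the LPC order: 2 + sampling rate in kHz
a = librosa.lpc(y, order=2 + sr // 1000)

# Poles of the model; keep one of each conjugate pair
roots = np.roots(a)
roots = roots[np.imag(roots) >= 0]

# Convert pole angles to frequencies (Hz); the lowest plausible peaks approximate F1-F4
freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
formants = freqs[freqs > 90][:4]
print(formants)
```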

Emotional vocalizations cover a broad range, from complex speech utterances with elaborate prosodic features (Goudbeek & Scherer, 2010) to laughter (Davila Ross et al., 2009) or non-verbal affect bursts such as “ah” or “eww” (Scherer, 2009a), all relying on specific brain structures. Belin and collaborators demonstrated the specific involvement of the bilateral superior temporal sulcus (STS) in the perception of human voices. Including bilateral regions close to the temporal pole as well as anterior and posterior portions of Heschl’s gyrus, these voice-selective areas showed greater neural activity for human speech and non-speech sounds in comparison to environmental sounds (Belin et al., 2000). Moreover, playing a key role in human voice perception, the STS and the superior temporal gyrus (STG) are crucially involved in the processing of vocal emotions. In fact, Grandjean and collaborators found stronger activation in the middle STS (mSTS) and STG while participants listened to pseudo-words spoken with angry prosody compared to neutral ones. In a second experiment, the authors importantly demonstrated that these results were independent of low-level acoustic features (Grandjean et al., 2005).


Despite the importance of the STS and STG in emotional voice processing, studies have shown the primordial involvement of other brain regions. Ethofer and collaborators indeed revealed the existence of emotional voice areas (EVA) within the primary and secondary auditory cortices (AC). Using functional magnetic resonance imaging (fMRI), the authors found stronger activity in the bilateral STG, posterolateral to the primary AC, during the passive listening of angry, sad, joyful and relieved voices compared to neutral ones. Further analyses revealed that the activation of the STG was driven by the prosodic arousal and not by the valence of the vocalizations. Furthermore, structural connections between the EVA, the ipsilateral inferior frontal gyrus (IFG) and the inferior parietal lobe (IPL) were also found. Interestingly, passive listening to emotional prosody compared to neutral prosody only increased the functional coupling between the EVA and the ipsilateral IFG, highlighting the crucial role of the IFG and STG in vocal emotions (Ethofer et al., 2012). The existence of the EVA is supported by previous and recent literature showing the involvement of these regions in the vocal expression of emotions (e.g. Bach et al., 2008; Czigler et al., 2007; Kotz et al., 2013; Kreifelts et al., 2009; Plichta et al., 2011; Schirmer & Kotz, 2006; Wildgruber et al., 2009; Zhang et al., 2018).

This fronto-temporal network can be modulated by the nature of the task, the vocal emotion (see Figure 15) or even the sex of the listener. In fact, in a meta-analysis, Frühholz and Grandjean showed a distinction between para-verbal (e.g. pseudo-words) and non-verbal emotions (e.g. affect bursts) as well as differences between positive and negative emotions. Para-verbal emotional expressions thus solicit the bilateral primary and secondary AC together with the middle superior temporal cortex (mSTC) and the posterior superior temporal cortex (pSTC), whereas non-verbal stimuli involve the bilateral AC with the right pSTC only. Similarly, positive emotions strongly activate the left mSTS, the right anterior superior temporal cortex (aSTC) and the pSTC, while negative emotions involve the bilateral primary AC, the right mSTC and the left pSTC (Frühholz & Grandjean, 2013a). Along the same lines, studies of brain haemodynamics using fMRI as well as functional near-infrared spectroscopy (fNIRS) revealed stronger activity in the right anterior middle temporal gyrus (aMTG), the posterior middle temporal gyrus (pMTG), the bilateral orbitofrontal cortex (OFC) and the right IFG for happy voices compared to angry ones (Johnstone et al., 2006; Zhang et al., 2018). The literature also often distinguishes the brain haemodynamic responses associated with implicit and explicit decoding of emotional prosody (Bach et al., 2008; Frühholz et al., 2012; Wildgruber et al., 2009). Thus, the posterior superior temporal gyrus (pSTG) and the bilateral IFG were more strongly activated in an explicit task (prosody discrimination), whereas in an implicit task (sex discrimination) the middle superior temporal gyrus (mSTG) as well as the left IFG were more involved. Interestingly, Frühholz and colleagues also highlighted the sensitivity of the bilateral pSTG to the acoustic features of prosody; however, they did not find such sensitivity for the IFG. In addition, researchers demonstrated the involvement of the STC, with the planum temporale (PT), pMTG and pSTG, when salient acoustic cues were increased experimentally (e.g. an increase of F0). In comparison, a decrease of the same acoustic features strongly elicited activity in the inferior frontal cortex (IFC) and increased fronto-temporal connectivity (Leitman et al., 2010). These results suggest that sensory-integrative processing is facilitated when the acoustic signal is rich in emotional information; conversely, if the acoustic signal is ambiguous, more evaluative processes are needed and the connectivity between the IFG and STG is recruited. Finally, it seems that the sex of the listener also plays a role in the modulation of the neural responses associated with the fronto-temporal network. Schirmer and Kotz indeed showed that the left IFG and the N400 effect (an event-related potential – ERP – measured with electroencephalography – EEG) were only involved in female participants for congruent versus incongruent emotional words. Interestingly, the authors suggested that female listeners integrate vocal and verbal emotions in the left IFG around 400 milliseconds (ms) after word onset, and thus seem to use emotional prosody more automatically than men (Schirmer & Kotz, 2006).

Figure 15: Structural and functional connectivity between the sub-regions of IFC and STG. In the left hemisphere, major pathways have been identified for the processing of auditory stimuli, especially for speech features, along a ventral (green) and dorsal pathway (red). In the right hemisphere, studies only describe a dorsal pathway for emotional processing (blue) (Frühholz & Grandjean, 2013b). See also (Ethofer et al., 2006; Schirmer & Kotz, 2006).

The amygdala as well as fronto-temporal regions are crucially involved in the vocal communication of emotions (see Figure 16). Mostly activated by high arousal, independently of explicit or implicit decoding (Bach et al., 2008; Frühholz et al., 2012; Wildgruber et al., 2009; Young et al., 2020), the amygdala is also recruited by more complex emotional mechanisms such as social interactions and the listener’s personality. Emotional prosody is indeed necessary in social interactions, enabling an individual to infer what another feels (Belin, 2006; Grandjean et al., 2006; Sander, Grandjean, Pourtois, et al., 2005). Sander and collaborators emphasized the key role of the amygdala and the fronto-temporal network in this processing. In fact, using a dichotic listening paradigm to assess attentional mechanisms related to social interactions, the authors revealed that the right amygdala as well as the bilateral STS were strongly involved in angry prosody compared to neutral prosody, irrespective of whether it was heard from to-be-attended or to-be-ignored voices. In contrast, the OFC was more activated when the angry voice was to-be-attended (Sander, Grandjean, Pourtois, et al., 2005). Furthermore, the processing of emotional prosody is also influenced by personality. In fact, as described by Brück and collaborators, “The way in which human beings perceive and process their emotional environments tend to differ tremendously across individuals and the same emotional stimuli may often evoke very different responses among subjects” (Brück et al., 2011, p. 260). The authors therefore investigated the relationship between haemodynamic responses and inter-individual differences in personality. They demonstrated a positive correlation between measures of neuroticism and activations of the right amygdala, the right anterior cingulate cortex (ACC, connected to the amygdala) and medial frontal regions. Overall, these neuroimaging results suggest the crucial involvement of distinct but connected networks, including subcortical (amygdala, cingulate) and cortical regions (mainly AC, STS, STG, IFG, OFC), in the vocal expression of emotions. Nevertheless, research also emphasizes the role of the cerebellum, basal ganglia and other subcortical nuclei in such processes. For instance, encompassed in a subcortical system, the cerebellum and basal ganglia have been found to be essential to emotional prosody, especially for auditory information processing and the perception of emotions in voices (Grandjean, 2020). Finally, both hemispheres seem essential, in different ways, to the vocal decoding of emotional cues (Schirmer & Kotz, 2006). Mostly driven by the time scale of auditory information, the right hemisphere would be strongly involved in the processing of slow auditory variations (e.g. pitch dynamics), whereas the left hemisphere would preferentially decode short time-scale information (e.g. roughness) (Grandjean, 2020).

Figure 16: Model of emotional prosody processing during explicit and implicit tasks. Bottom-up modulation (stimulus-driven) is indicated by white arrows. Top-down modulation (task-dependent) is indicated by black arrows. Implicit processing is represented by dotted lines, explicit processing by solid lines. Direct neural connections between regions (e.g. amygdala – ACC) are not represented (Wildgruber et al., 2009).

Emotional prosody, through prosodic modulations and specific brain structures, is essential to human life. Expressed across a large number of para-verbal and non-verbal vocalizations, the vocal expression of emotions is an evolutionarily old mechanism used both in human and animal communication (Goudbeek & Scherer, 2010; Zimmermann et al., 2013). Nevertheless, it remains unclear whether this common evolution enables the decoding of emotions in heterospecific vocalizations. From a comparative point of view, are humans able to identify vocal affective cues in other primates? Section 2.2.1.2 will investigate this question.

2.2.1.2. Human Recognition of Affects in other Primate Vocalizations

The ability to identify conspecific vocalizations is crucially involved in social interactions between individuals. As illustrated by Pascal Belin “In humans particularly in modern societies, voices are everywhere, from physically present individuals as well as increasingly from virtual sources such as radios, TVs, etc., and we spend a large part of our time listening to the voice” (Belin, 2006, p. 2091).


This human expertise for conspecific voices would rely on specific brain pathways. In fact, Peretz and collaborators described such mechanisms in the case of one patient with phonagnosia in whom the recognition of environmental sounds was preserved (Peretz et al., 1994). Briefly, phonagnosia is an auditory agnosia involving an impairment of voice discrimination abilities without comprehension deficits, and it often occurs after posterior right hemisphere lesions. The fact that, after lesions of the right AC, the correct identification of voices becomes impossible whereas the recognition of environmental sounds is preserved suggests dissociated neural substrates, one of which is specific to the human voice (Belin, 2006). Hence, at both the behavioural and cerebral levels, humans are specialists in the decoding of human voices. Nevertheless, the literature often describes inherited mechanisms shared between humans and NHP in the communication of affects (see Sections 2.1.2.2 and 2.2.2). In fact, given the phylogenetic proximity of humans with other primates, it is likely that humans are still able to recognize affective contents in NHP vocalizations. Surprisingly, only a few studies have explored this extraordinary possibility.

Research has recently investigated the human perception of arousal in NHP vocalizations. For instance, Kelly and collaborators showed that, in the absence of exposure (familiarity) to the species’ vocalizations, human adults relied strongly on the main pitch when rating the arousal of human, chimpanzee and bonobo babies’ cries (see Figure 17). Participants indeed rated bonobo infant calls as more distressful than human or chimpanzee ones. Further acoustic analyses showed that bonobo infants have higher-pitched vocalizations than human and chimpanzee babies, whose vocalizations have more similar acoustic features (Kelly et al., 2017). However, even if acoustic differences exist, research often demonstrates similar acoustic patterns for babies’ cries across primate species (e.g. Briefer, 2012; Kelly et al., 2017).


Figure 17: Interaction between cry pitch and the distress perceived by adult listeners in human and ape infants’ cries. Blue dots correspond to human baby cries; yellow dots represent chimpanzee infant calls; and red dots show bonobo infant screams. Solid curves represent the fits of the estimated marginal means. The black curve represents all three species confounded (Kelly et al., 2017).

Similarly, Filippi and collaborators demonstrated the ability of human participants to identify higher levels of arousal in the vocalizations of several mammalian species. In fact, human listeners were able to correctly identify arousal in sad and angry voices expressed by human actors as well as in macaque (Macaca sylvanus) vocalizations produced in negative contexts. Acoustic analyses of the calls show that such capacities rely on the modulation by arousal of F0 and of the spectral centre of gravity (Filippi et al., 2017). Interestingly, using the peak frequency of the calls and the arousal ratings as covariates (results are thus controlled for these factors), Fritz and collaborators revealed the involvement of the right IFG, the bilateral STS and the left PT while participants listened to human, chimpanzee and macaque vocalizations (see Figure 18). The results showed a gradient of brain activity, with the strongest neural responses for human voices, followed by chimpanzee calls and then macaque calls, for which the lowest activations were found (Fritz et al., 2018). These regions are, importantly, part of the EVA (see Section 2.2.1.1). Overall, these results underline inherited acoustic and brain mechanisms enabling modern humans to correctly identify arousal in other primate species’ vocalizations. Acoustic similarities seem primordial to the processing of heterospecifics’ vocalizations.


Figure 18: Coronal, axial and sagittal views for the contrast human > chimpanzee > macaque affective vocalizations. The peak frequency of the calls and the arousal ratings of the participants were used as covariates in the general linear model (GLM). Statistical images were generated using an uncorrected threshold of p < .001 (Fritz et al., 2018).
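To illustrate the logic of such a covariate-controlled analysis outside of any neuroimaging software, the sketch below fits an ordinary least-squares model in which a simulated brain response is explained by species while peak frequency and arousal rating are held as covariates. All variable names and values are hypothetical; this is not the analysis pipeline of Fritz et al. (2018).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 90

# Hypothetical per-stimulus data: species of the call, its peak frequency (Hz),
# the listener's arousal rating, and a simulated response in a region of interest
df = pd.DataFrame({
    "species": np.repeat(["human", "chimpanzee", "macaque"], n // 3),
    "peak_freq": rng.normal(500, 150, n),
    "arousal": rng.uniform(1, 7, n),
})
df["response"] = (df["species"].map({"human": 1.0, "chimpanzee": 0.6, "macaque": 0.3})
                  + 0.001 * df["peak_freq"] + rng.normal(0, 0.3, n))

# Species effect on the response, controlling for peak frequency and arousal rating
model = smf.ols("response ~ C(species) + peak_freq + arousal", data=df).fit()
print(model.summary())
```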

A few studies have also investigated the recognition of valence in NHP vocalizations. Belin and collaborators indeed demonstrated the incapacity of human listeners to correctly identify positive and negative vocalizations expressed by macaques. However, fMRI data revealed a stronger activation in the right medial posterior OFC for negative vocalizations in comparison to positive ones, for human voices as well as for macaque calls (Belin et al., 2008). Yet, these fMRI analyses also included cat vocalizations, making it impossible to draw conclusions about neural substrates activated by primate calls only. Along the same lines, another study also showed the inability of humans to discriminate affective valence in macaque calls; participants were, however, able to do so for human and chimpanzee vocalizations (Fritz et al., 2018). These results emphasize the key role of phylogeny in the accurate identification of affective valence in NHP vocalizations. Humans indeed seem only capable of recognizing positively versus negatively valenced vocalizations expressed by great apes, our closest relatives. Yet, Linnankoski and collaborators demonstrated that adults and 5- to 10-year-old children are in fact able to correctly categorize five different affective contents in macaque (Macaca arctoides) vocalizations: satisfied, angry, fearful, submissive and scolding cues (Linnankoski et al., 1994). These results are not consistent with the ones found by Belin, Fritz and collaborators; nevertheless, they highlight the importance of evaluating how participants are asked to decode affects in NHP calls. It could be easier for human adults and children to directly label the affective contents of NHP vocalizations in a forced-choice paradigm, in which the number of possibilities is limited, than to rate their valence on Likert scales, on which multiple choices are possible.

Species, acoustic features and decoding tasks are thus different factors influencing the human recognition of affective valence and arousal in NHP vocalizations. Furthermore, comparative studies also emphasize the primordial roles of familiarity and attentional capture in affective perception. In fact, Scheumann and collaborators demonstrated the impact of familiarity on the correct rating of affective valence and arousal by humans (see Figure 19). Human participants failed to recognize affiliative chimpanzee calls as well as positive and negative tree shrew (Tupaia belangeri; a primate-like species) calls. As expected, these vocalizations were classified as less familiar than human infants’ cries and dog barks but, interestingly, they were also rated as less familiar than agonistic chimpanzee screams (Scheumann et al., 2014). Furthermore, in another study using an auditory oddball novelty paradigm with EEG, Scheumann and collaborators revealed an orthogonal effect of affiliative and agonistic contents on early negativities (the mismatch negativity, MMN, involved in early emotional processing) at posterior sites for the listening of human infant and chimpanzee vocalizations only. In contrast, no effect was found for tree shrew calls. In addition, affiliative chimpanzee and tree shrew vocalizations, which were rated as less familiar than agonistic ones, also elicited posterior P3a and P3b responses. The authors suggested that the P3a would be involved in the involuntary attention switch from familiar to novel stimuli whereas the P3b response would reflect cognitive evaluation influenced by prior experience (Scheumann et al., 2017). Overall, these results highlight the influence of attentional mechanisms on the human recognition of affects in NHP vocalizations. These findings are in line with the literature describing attentional capture in i) human adults viewing human and chimpanzee affective pictures (Kret et al., 2018); and ii) human infants at 3 – 4 months of age listening to human voices and blue-eyed Madagascar lemur (Eulemur macaco flavifrons) vocalizations (Ferry et al., 2013). Finally, familiarity as well as phylogenetic distance also seem to play a crucial role in the vocal identification of affects.


Figure 19: a) Familiarity ratings for the playback categories: mean and standard deviation. b) Valence ratings for the playback categories: mean and standard deviation. One-sample t-tests: *** p < .001, ** p < .01, * p < .05 (adapted from Scheumann et al., 2014).
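As a minimal illustration of the statistics reported in Figure 19, the sketch below runs a one-sample t-test comparing hypothetical valence ratings for one playback category against the neutral midpoint of the rating scale; the ratings, scale and midpoint are invented for the example and do not reproduce the data of Scheumann et al. (2014).

```python
import numpy as np
from scipy import stats

# Hypothetical valence ratings (1 = very negative, 5 = very positive) for one playback category
ratings = np.array([1.8, 2.2, 1.5, 2.9, 2.1, 1.7, 2.4, 2.0, 1.6, 2.3])

# Test whether the mean rating differs from the neutral midpoint of the scale (3)
t_stat, p_value = stats.ttest_1samp(ratings, popmean=3.0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```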

To conclude, a large number of mechanisms are at play in the human recognition of affects in conspecific and heterospecific vocalizations. From the modulation of acoustic features to complex cognitive processes, all rely on specific brain pathways, and the literature often suggests a continuous evolution between human and NHP species. The following Section 2.2.2 will describe these common mechanisms.

2.2.2. Expression of Affects in non-human Primate Calls

Based on evolutionary mechanisms, affective communication processing is shared across mammals: “The expression of emotion has an important evolutionary basis. For millions of years, long before verbal language evolved as communication device, affect and affect expressions functioned as the main form of social communication, signalling intent and desire” (Greenberg, 2002, p. 233). Hence, certainly guided by the universal structural rules well described by Morton (Morton, 1977), animals, and especially NHP species, also have the ability to modulate specific acoustic features in their vocalizations to express, for instance, low- or high-arousal agonistic and affiliative signals. This extraordinary expertise crucially relies on cerebral networks comparable to the ones found in humans for the processing of emotion in voices.

2.2.2.1. Affective Communication in non-human Primates

The perception and identification of signals are particularly relevant for the survival and reproductive success of NHP, enabling individuals to extract information about identity, referential targets or affective states, for instance (Buttelmann et al., 2009; Fischer & Price, 2017; Ghazanfar & Hauser, 2001; Ghazanfar & Santos, 2004; Zuberbühler, 2000). NHP can use visual signals as well as auditory ones; however, in the wild, visual cues are often compromised by distance or vegetation. Unlike visual signals, acoustic cues can transmit information over large distances despite a rich environment (Ghazanfar & Santos, 2004). This selective pressure could explain the importance of vocal expression in some primate species, humans included. The modulation of prosodic features is indeed considered a homologous trait between human and NHP vocalizations (Filippi, 2016), even if the vocal learning of these acoustic fluctuations during the childhood and adulthood of NHP is still debated (Egnor & Hauser, 2004; Fischer & Price, 2017). Nevertheless, like humans, NHP are able to modulate acoustic parameters such as F0 or energy, using similar vocal production mechanisms (see the source – filter theory in Section 2.2.1.1), to express affective vocalizations (Briefer, 2012). To account for these common abilities, Eugene S. Morton proposed universal motivational-structural rules (Morton, 1977, 1982; see Figure 20). Emerging from the comparison of mammal and bird vocalizations, Morton tried to systematize the role of F0, energy and voice quality in the vocal signalling of anger and fear (Grandjean et al., 2006). Thus, aggressive vocalizations are characterized by low frequencies whereas fearful vocalizations are characterized by high frequencies. Interestingly, Morton suggested that high tonal sounds would be similar to those produced by infants and should therefore have an appeasing effect on the receiver, while in hostile interactions low-frequency calls would increase the perceived size of the caller (Ehret, 2006).


Figure 20: Motivational-structural rules modelling the fluctuation of F0, energy and voice quality in aggressive and fearful/appeasement vocalizations (Morton, 1982). For instance, the fear endpoint is characterized by continuous high pitch; the aggressive endpoint by low pitch and roughness (Grandjean et al., 2006).

Research on the vocal correlates of affects in NHP often describes acoustic parameters related to arousal. For instance, in a meta-analysis of 39 studies involving monkeys (e.g. macaques), lemurs (Lemur catta) and chimpanzees, Zimmermann and collaborators summarized the different acoustic features involved in the arousal of agonistic and affiliative vocalizations. The authors found that increased arousal in NHP is associated with an increase in call rate, duration and F0 as well as a decrease in harmonicity (Zimmermann et al., 2013). These results are consistent with the findings of Rendall in hamadryas baboons (Papio hamadryas), in which the modulation of temporal and source-related parameters such as duration, jitter, F0, amplitude or harmonicity was demonstrated as a function of the caller’s affect intensity (Rendall, 2003). Interestingly, Slocombe and collaborators showed that wild chimpanzees were able to distinguish the severity of an aggression according to the victim’s vocalizations. The authors indeed demonstrated that chimpanzee bystanders looked longer towards victims producing severe aggression calls compared to those expressing mild-aggression or tantrum screams (Slocombe et al., 2009). This attentional distinction relies on the modulation of acoustic parameters in the victims’ calls. In fact, severe aggression vocalizations have a higher pitch and longer call duration than mild aggression screams (Slocombe & Zuberbühler, 2007).

While research has mainly focused on arousal, studies on the acoustic features of affective valence and of specific affects nonetheless exist. For instance, in a recent review, Kret and collaborators described the acoustic parameters underlying alarm calls and food grunts expressed by great apes. Alarm calls, defined as a fearful reaction to threatening or dangerous situations, involve high frequencies and amplitude contours, whereas food grunts (affiliative calls) are characterized by low frequencies (Kret et al., 2020). These findings are in agreement with the previous literature (see Briefer, 2012 for a review). Along the same lines, Slocombe and Zuberbühler distinguished two types of negative vocalizations produced by wild chimpanzees in agonistic interactions. The authors found that calls produced by aggressors (angry contents) involved a larger frequency range and a longer call duration than those produced by the victims (fearful cues). These results suggest that chimpanzees could be able to infer the role of each individual during a conflict (Slocombe & Zuberbühler, 2005). Furthermore, Fichtel, Hammerschmidt and Jürgens, in a multi-parametric analysis in squirrel monkeys (Saimiri sciureus), showed that aversive vocalizations were characterized by a higher pitch, mainly shaped by an increase of peak frequency and frequency range rather than of F0 (Fichtel et al., 2001). The literature on the acoustic features of affiliative vocalizations is even scarcer; it is indeed more difficult to find contexts in which positive vocalizations are produced (Briefer, 2012). Nevertheless, Davila Ross and collaborators investigated the evolution of laughter in humans compared to great ape species in two elegant studies. They analysed the acoustics of tickle-induced vocalizations from human infants and juvenile orangutans, gorillas, chimpanzees and bonobos. The authors revealed common acoustic characteristics of laughter across the five primate species, including stable voicing (regular vocal-fold vibration), egressive airflow (exhalation phase) and call duration (see Figure 21). Hence, even if some differences exist, the results seem to underline a phylogenetic continuity from human to NHP affiliative expressions (Davila Ross et al., 2009). Following this, Davila Ross and collaborators distinguished, in wild chimpanzees, spontaneous laughter from laughter produced in response to the laughter of others. The authors importantly demonstrated that laughter replications involve significantly more calls per laughter series than spontaneous laughter. These last findings represent the first empirical support for the socialization of expressions in great apes, an important mechanism also involved in the social interactions of modern humans (Davila Ross et al., 2011).

Figure 21: Representative spectrograms of great ape and human vocalizations elicited by tickling. In the top right corner, a juvenile orangutan being tickled by a human experimenter (adapted from Davila Ross et al., 2009).

From humans to NHP, prosodic modulations are primordial to the communication of affects in primates. Reflecting old evolutionary mechanisms, these common abilities to vocally encode and decode affects may rely on comparable brain processes. Section 2.2.2.2 will develop this point of view.

2.2.2.2. A Continuous Evolution of Brain Mechanisms

The existence of affective lives in animals has been debated through the ages. These perpetual discussions, guided by behaviourism, long paralyzed research on the underlying brain mechanisms (Gruber & Grandjean, 2017). Nevertheless, for the last two decades, scientists have been studying the cerebral networks of affects across animal species, NHP in particular.

As in humans, comparative studies in affective neuroscience have revealed the key role of prefrontal cortex (PFC) regions in affective processes in NHP. While the PFC is less developed in NHP than in humans (LeDoux, 2012), PFC regions are crucially involved in the encoding of higher-order information, affective contents included, extracted from vocalizations (Cohen et al., 2007). In fact, the PFC is directly connected, through thalamic projections, to the amygdala (see Figure 22), well known for its role in affective memory and arousal (Barbas, 2000; Rauschecker, 2013). However, different subparts of the PFC receive projections of different strengths; for instance, rostral areas of the PFC are less connected to the amygdala than the posterior or medial OFC (Ghashghaei et al., 2007). Barbas and collaborators indeed suggested that robust inputs from the amygdala to the upper layers of the PFC would be involved in the attentional focus on affectively salient stimuli, enabling the elimination of possible distracters. Furthermore, connectivity between the amygdala and the OFC might enable the flexible regulation and arousal modulation of affective expressions based on the behavioural context (Barbas et al., 2011). Hence, the OFC is primordial for affective processing. Neurophysiological studies have indeed shown that lesions of the OFC in NHP lead to a decrease in aggression (e.g. less attentional capture, fewer vocalizations) towards a dangerous stimulus (Rolls, 2004). Similarly, NHP with lesions of the OFC or the amygdala lacked appropriate affective responses, disrupting communication in their social interactions (Barbas, 2000). In fact, the AC, through its connections to the amygdala and to frontal and temporal regions, enables the decoding of affective information in conspecific vocalizations (Belin, 2006; Ghazanfar & Hauser, 2001; Ghazanfar & Santos, 2004; Joly et al., 2012; Rauschecker, 1997). When this connectivity is absent, this ability is disrupted, making it difficult for an individual to interact with others.


Figure 22: Connections between PFC regions and the amygdala. A) Unilateral projections from posterior OFC to the amygdala (magenta). The posterior OFC also has bidirectional connections with the basal nuclei, where its connections overlap (brown) with sensory input (yellow) that reaches the amygdala from sensory association cortices. B) Surface maps on a rhesus monkey brain representing the strength of pathways from the amygdala that terminate in lateral (top) and orbital (bottom) prefrontal areas. The posterior OFC receives the strongest projections from the amygdala; the thickness of the arrows indicates pathway strength. Abbreviations: pOFC, posterior OFC; AMY, amygdala; BL, basolateral nucleus; BM, basomedial nucleus; Ce, central nucleus; Co, cortical nuclei; IM, intercalated masses; La, lateral nucleus (Barbas et al., 2011).

In addition, connectivity between the amygdala and OFC might guide an individual's choices of action based on affective signals and reward processing. Nevertheless, the two regions would contribute distinctly to this process, the OFC playing a key role in inhibition (e.g. object reversal learning) and the amygdala in affective mechanisms (Murray & Izquierdo, 2007). Interestingly, the medial frontal cortex (MFC), in interaction with the amygdala and OFC, is also involved in such decision-action processes, allowing expected outcomes to be estimated from an individual's actions (see Figure 23). Furthermore, the OFC and other PFC regions are connected with motor control structures related to central executive functions, making the OFC an essential brain area for action selection and decision making.


Figure 23: Cerebral pathways involved in goal-directed action based on sensory inputs, illustrated on lateral and medial views of a macaque brain. The OFC and MFC represent the values of expected outcomes, which can be reappraised in interaction with the amygdala. The inferotemporal cortex (IT) and perirhinal cortex (PRh) are both involved in the processing of cross-modal object features in humans as well as in monkeys (adapted from Murray & Izquierdo, 2007).

Connected to the PFC-amygdala network, the hypothalamus, the nucleus accumbens and the PAG are also crucial to affective communication in human and NHP species. The hypothalamus, PAG and nucleus accumbens are indeed essential to aggressive-defensive behaviours (see Figure 24), affective displays (e.g. vocalizations) and reward processing (Barbas, 2000; Berridge, 2002; Owren et al., 2011; Rauschecker, 2013). Neurophysiological studies in NHP demonstrated for instance that damage to the hypothalamus leads to a decrease of motivational and social behaviours shaped by affective signals (Berridge, 2002). Research has also shown the involvement of the PAG in affective vocal production in great apes as well as in monkeys (Owren et al., 2011). Similarly, the nucleus accumbens, through the release of dopamine, is well known for its role in reward processing and affiliative cues (Boissy et al., 2007).


Figure 24: Schematic representation of survival circuits underlying defense reactions elicited by unconditioned (unlearned) and conditioned (learned) threats. Abbreviations: ABA, accessory basal amygdala; BA, basal amygdala; CEA, central amygdala; LA, lateral amygdala; LH, lateral hypothalamus; MEA, medial amygdala; NAcc, nucleus accumbens; VMH, ventromedial hypothalamus; PAGd, dorsal PAG; PAGv, ventral PAG; PMH, premammillary nucleus of the hypothalamus (LeDoux, 2012).

Finally, as often demonstrated in the literature (Berridge, 2002), the ACC is a key region for affective processing in humans, especially for the conscious awareness of one's own emotions. Interestingly, recent studies have also highlighted the role of the ACC in affective mechanisms in NHP. The ACC is involved in the vocal expression of affect (Owren et al., 2011), and Parr, Waller and Fugate further suggested its involvement in social affects and in the evolutionary roots of affective awareness in great apes (Parr et al., 2005). In fact, the spindle cells found in the human ACC are also present in chimpanzees, bonobos, gorillas and orangutans, but not in monkeys (Nimchinsky et al., 1999). Overall, homologous brain mechanisms seem to be at play in the vocal-auditory processing of affect in humans and NHP. These results underline the common evolution of affective communication across primate species.


2.3. Synthesis of the Introduction

Through the ages, evolutionary ideas have influenced the savants and philosophers of their times. Questioned or even denied for a long time, evolutionary mechanisms are now considered a key to understanding modern behaviours in humans and animals. During the 20th and 21st centuries, affective scientists have indeed conceptualized the genesis of emotions in light of inherited adaptive functions. Moreover, at the cerebral and behavioural levels, research has uncovered the old evolutionary bases of emotions, revealing common mechanisms across species, especially between humans and non-human primates, our closest relatives. Despite the importance of these findings, most of them come from the study of faces. Yet, little is known about such processes in humans and NHP for the vocal expression of emotions. The ability to communicate and understand emotional cues in conspecific or even heterospecific vocalizations is however crucial for both humans and NHP. In fact, in our modern society, humans are surrounded by voices. This expertise was shaped by our extraordinary history, in which, over millions of years, the vocal channel became a carrier of emotional and referential information, for instance (Fröhlich et al., 2019). Similarly, due to their phylogenetic proximity to humans and their specific environmental pressures, the capacity of non-human primates to encode and decode affective messages through the vocal channel is also essential to their daily life. Such mechanisms allow an individual to signal a potential danger, its sexual status, or even the location of a productive fruit tree.

Overall, by enabling the expression and recognition of emotions even in a rich environment, vocal communication is essential to the modern behaviours of all primate species, humans included. Comparative studies on this topic are thus crucially needed to (i) improve our general understanding of emotional processes in humans and (ii) provide new insights into the evolutionary history of the great Hominidae.

3. Thesis objectives

Decision-making mechanisms on emotional material are crucially at play in humans' and animals' everyday lives. Indeed, accurately recognizing a perceived emotion is essential to maximize an individual's chances of survival by triggering adaptive behaviours such as the avoidance of a potential danger or the approach of a friendly peer (e.g. Bolhuis & Wynne, 2009; Davidson et al., 1990; Frijda, 1987). Despite the importance of such mechanisms in both humans and other animals, research has only recently begun to investigate the processing of emotional choices. Hence, to this day, very little is known about the physiological processes underlying the recognition of emotions through voices. The present thesis aims to fill this gap.

How the complexity of the choice required by a given task affects the vocal processing of emotions is still poorly understood, even in the human literature. Recent fMRI findings, however, demonstrated a differentiation between categorization (non-biased choice, A versus B) and the less complex discrimination (biased choice, A versus non-A) in the recognition of emotions in human voices (Dricu et al., 2017). The authors revealed a stronger involvement of bilateral IFC, OFC, voice-sensitive areas and the amygdala for the correct categorization of emotions, whereas the discrimination of angry and happy voices elicited more activations in the middle IFC. These results point to the distinct involvement of brain areas in categorization and discrimination processes. Furthermore, and interestingly, within the bilateral IFC both tasks elicited specific subparts of this frontal region, well known for its role in the evaluative judgement of emotional prosody (Schirmer & Kotz, 2006). Despite these promising findings, the study of Dricu and collaborators was limited to explicit tasks, and it remains unclear whether such processes are also involved during the implicit decoding of emotional voices. Using functional near-infrared spectroscopy (fNIRS), Study 1 of the present thesis thus investigated the neural activity of the bilateral IFC in categorization and discrimination tasks during the implicit and explicit decoding of emotions in voices by human participants. If distinct neural mechanisms are effectively involved in the recognition of emotions in human voices, we can assume that similar processes could be at play for the identification of affects in other primate vocalizations. In fact, a few comparative studies have demonstrated the human ability to identify affective cues in NHP vocalizations (Belin et al., 2008; Filippi et al., 2017; Fritz et al., 2018; Kelly et al., 2017; Linnankoski et al., 1994; Scheumann et al., 2014, 2017). However, none of them explored the distinction between categorization and discrimination mechanisms at the behavioural and cerebral levels. Hence, Study 2 investigated, behaviourally and cerebrally using fNIRS, the human categorization and discrimination of affects in human voices and NHP vocalizations, focusing on frontal regions.


Among the comparative studies mentioned above, a few underlined the importance of acoustic features in cross-taxa recognition by humans (Filippi et al., 2017; Kelly et al., 2017). Nevertheless, the direct link between acoustic similarities across primate vocalizations and the human categorization and discrimination of affects in NHP calls is still unknown. Study 3 therefore attempted to fill this gap. Given the acoustic similarity and phylogenetic proximity between humans and NHP, similar brain areas in humans could be involved when listening to the affective vocalizations of other primates (Belin et al., 2008; Fritz et al., 2018; Scheumann et al., 2014, 2017), potentially showing a gradient of activations between human, great ape and monkey calls. This hypothesis was tested in Studies 4 and 5, which explored with fMRI the haemodynamic activity in voice-sensitive areas and inferior frontal regions, respectively. Finally, if understanding the cerebral mechanisms at play in the human recognition of affects in other primate vocalizations is essential, investigating the vocal identification of affects in NHP species is necessary to fully understand such processes. Yet, to compare affective recognition in both species, the use of similar methodologies is required. Study 6 therefore attempted, for the first time, to apply fNIRS, usually restricted to human research, to the investigation of motor and auditory processing of affects in baboons.

Table 1: Overview of the thesis objectives and the related studies and methodologies.
Objective 1: Investigate in humans the categorization and discrimination mechanisms within bilateral IFC during the implicit and explicit decoding of emotions in human voices (methods: behaviour, fNIRS; Study 1).
Objective 2: Explore at the behavioural and cerebral levels the human categorization and discrimination of affects in primate vocalizations (methods: behaviour, fNIRS; Study 2).
Objective 3: Understand the roles of acoustic similarity and phylogenetic proximity in the human categorization and discrimination of affects in human, great ape and monkey calls (methods: behaviour, acoustic analyses; Study 3).
Objective 4: Demonstrate the involvement of similar brain regions in humans for the listening of affective primate vocalizations (methods: behaviour, fMRI; Studies 4 & 5).
Objective 5: Develop a comparative protocol to explore cerebral processes in baboons (method: fNIRS; Study 6).


Experimental Part

Chapter 1.

Human Discrimination and Categorization of Emotions in Voices: A functional Near Infrared Spectroscopy (fNIRS) Study

Thibaud Gruber*, Coralie Debracque*, Leonardo Ceravolo, Kinga Igloi, Blanca Marin Bosch, Sascha Frühholz‡ and Didier Grandjean‡ *joint first authors ‡joint senior authors

Published on June 5th 2020 in Frontiers in Neuroscience, doi: 10.3389/fnins.2020.00570

1.1. Abstract

fNIRS is a neuroimaging tool that has recently been used in a variety of cognitive paradigms. Yet, it remains unclear whether fNIRS is suitable to study complex cognitive processes such as categorization or discrimination. Previously, functional imaging has suggested a role of both inferior frontal cortices in the attentive decoding and cognitive evaluation of emotional cues in human vocalizations. Here, we extended paradigms used in fMRI to investigate the suitability of fNIRS to study the frontal lateralization of human emotional vocalization processing during explicit and implicit categorization and discrimination, using mini-blocks and event-related stimuli. Participants heard speech-like but semantically meaningless pseudo words spoken in various tones and evaluated them based on their emotional or linguistic content. Behaviourally, participants were faster to discriminate than to categorize, and processed the linguistic content of stimuli faster than their emotional content. Interactions between condition (emotion/word), task (discrimination/categorization) and emotion content (anger, fear, neutral) influenced accuracy and reaction time. At the brain level, we found a modulation of the O2Hb changes in the IFG depending on condition, task, emotion and hemisphere (right or left), highlighting the involvement of the right hemisphere in processing fear stimuli, and of both hemispheres in processing anger stimuli. Our results show that fNIRS is suitable to study vocal emotion evaluation, fostering its application to complex cognitive paradigms.


Keywords: categorization, discrimination, emotion, fNIRS, prosody

1.2. Introduction

While the majority of studies investigating cognitive processes in cortical regions have relied on fMRI or EEG, the use of fNIRS as an imaging technique has developed over the last 25 years (Boas et al., 2014; Buss et al., 2014; Chance et al., 1993; Hoshi & Tamura, 1993; Kato et al., 1993; Villringer et al., 1993). Similar to fMRI, fNIRS is a non-invasive and non-ionizing method that investigates brain hemodynamics (Boas et al., 2014). Using the principle of tissue transillumination, fNIRS indirectly measures, via near-infrared light, the oxygenated haemoglobin (O2Hb) and deoxygenated haemoglobin (HHb) sustaining the hemodynamic response function (HRF). In effect, optical property changes assessed at two or more wavelengths between the optical fibers emitting and receiving the near-infrared light provide an indirect measure of cerebral O2Hb and HHb; an increase in O2Hb concentration suggests that the area considered is more active during a particular paradigm compared to a control condition (Mandrick et al., 2013; Scholkmann et al., 2014). Research findings using fNIRS suggest that this method can be an appropriate substitute for fMRI to study brain processes related to cognitive tasks (Cui et al., 2011; Scholkmann et al., 2014) with a more realistic approach (Strait & Scheutz, 2014). Despite a lower spatial resolution than fMRI, fNIRS has a high temporal resolution, and is particularly interesting because of its low cost and high portability, allowing, for instance, participants to be measured while they are engaged in a sport activity (Piper et al., 2014). The fNIRS signal is also less sensitive to movement artifacts than other brain imaging techniques. Over the last two decades, perception and cognition have been extensively studied in cortical regions with fNIRS, which also allows the study of functional connectivity among cortical regions (Boas et al., 2014). For example, Buss et al. (2014) showed that fNIRS can be used to study the frontal-parietal network at the base of visual working memory abilities. As with other neuroimaging techniques such as fMRI, a growing number of fNIRS studies use mini-block or event-related paradigms rather than block designs (Aarabi et al., 2017; Aqil et al., 2012). In fact, even if a block design significantly improves statistical power, mini-block or event-related paradigms crucially avoid strong habituation effects in the HRF time course of complex cognitive processes (Tie et al., 2009). In the present study, we aimed to advance knowledge on the use of fNIRS in complex cognitive paradigms relying on a mini-block design by evaluating its use in
emotional evaluation paradigms, which previous work suggested could constitute a relevant field to evaluate the suitability of fNIRS. fNIRS has indeed recently proven a useful non-invasive technique to study emotion processes (Doi et al., 2013), especially in the visual domain (for a review, see Bendall et al., 2016). In one study, fNIRS was used to study affective processing of pictures in the parietal and occipital areas (Köchel et al., 2011); together with more recent work, it suggests that a large occipital-parietal temporal network is involved in discrimination tasks involving judgments about ‘emotional’ gait patterns (Schneider et al., 2014). fNIRS has also allowed researchers to record PFC activations during two types of task: the passive viewing and the active categorization of emotional visual stimuli. In the first case, researchers found an increase of the O2Hb in the bilateral ventrolateral PFC when participants were watching negative pictures; in contrast, positive pictures led to a decrease of O2Hb in the left dorsolateral PFC (Hoshi et al., 2011). In the second case, the authors isolated an activation of the bilateral PFC involving an increase of O2Hb and a decrease of HHb when participants were viewing fearful rather than neutral images (Glotzbach et al., 2011). These results are consistent with recent findings showing fNIRS activations in ventrolateral PFC during the viewing of threatening pictures (Tupak et al., 2014). Finally, in a recent study, (Hu et al., 2019) showed that fNIRS was suitable to isolate the signature of various positive emotions in the PFC. However, some studies did not find differences in O2Hb between baseline and any kind of pictures, whether negative, neutral or positive (Herrmann et al., 2003). A natural negative mood during task completion was also found to have an impact on PFC activity during a working memory task (Aoki et al., 2011), although an experimentally induced negative mood had the opposite effect with increased PFC O2Hb (Ozawa et al., 2014). As of now, the emerging picture for affective visual stimuli is that the PFC is solicited during both passive and active stimulation; however, the exact pattern of activity must be characterized with more studies and with an effort toward more comparability between the paradigms employed across fNIRS studies (Bendall et al., 2016). While fNIRS studies are found in the literature with respect to visual emotional treatment, studies on affective processes using auditory signals remain rare in fNIRS research. That auditory emotional treatment is neglected is a concern given the abundance of work finding different cortical activations during auditory emotional evaluation through various imaging techniques. Indeed, even if much of the initial vocal emotional processing in the brain

occurs in subcortical and sensory cortical areas (for a review, see Frühholz et al., 2014; Pannese et al., 2015), many higher order processes occur in cortical areas, including the associative temporal and the prefrontal cortices (Belyk et al., 2017; Frühholz et al., 2012; Frühholz & Grandjean, 2013a, 2013b; Wildgruber et al., 2009). For example, in recent years, the PFC has been largely suggested to be involved in the processing of emotional stimuli in the vocal and auditory domain, based on work conducted mainly with fMRI (Dricu et al., 2017; Frühholz et al., 2012, 2016). In particular, IFG is involved in the processing of human vocal sounds, and reacts to some of its properties such as prosody, the variation in intonations that modulates vocal production (Frühholz et al., 2012; Schirmer & Kotz, 2006). In a recent meta-analysis, Belyk et al. (2017) have reviewed the role of the pars orbitalis of the IFG during semantic and emotional processing, highlighting a possible functional organization in two different zones. The lateral one, close to Broca’s area, would be involved in both semantic and emotional aspects while the ventral frontal operculum would be more involved in emotional processing per se. The lateral zone would have been co-opted in human communication for semantic aspects while in non-human primates this zone would be more related to emotional communication. While we broadly agree with this view, the potential existence of vocalizations with semantic content in non-human primates (Gruber & Grandjean, 2017) suggests that this co-optation may have emerged earlier in our evolution.

To our knowledge, only two studies have been published on the treatment of vocal emotional stimuli in fNIRS, both showing that emotional stimuli activated the auditory cortex more compared to neutral stimuli (Plichta et al., 2011; Zhang et al., 2018). While Plichta and colleagues did not investigate how vocal emotional stimuli modulated the activity in the PFC, Zhang and colleagues showed that the left IFG was modulated by emotional valence (positive versus negative) and they also found a bilateral activation for the orbito-frontal cortex when anger was contrasted with neutral stimuli. However, neither of these two studies investigated categorization and discrimination of vocal emotional stimuli.

To fill this gap, the present study investigated O2Hb and HHb changes after the judgment of the emotional content of vocal utterances, with the aim to compare our results with recent fMRI advances. In particular, because of its involvement in the processing of human prosody, we aimed to target the IFG as our region of interest (ROI) in the present study.


An additional interesting aspect of the IFG is that this region is involved in both implicit and explicit categorization and discrimination of emotions in auditory stimuli. Implicit processing occurs when participants are required to conduct a task (e.g., judging the linguistic content of words or sentences pronounced with different emotional tones) other than evaluating the emotional content of the stimuli (e.g., Frühholz et al., 2012). The IFG is also involved when participants make explicit judgments (e.g., categorizing anger vs. fear) about the emotional content of the stimuli they are exposed to (Ethofer et al., 2006; Frühholz et al., 2016). The right IFG may be particularly important for conducting such an explicit evaluation of the emotional content of the voices, although both hemispheres play a role in the processing of the emotional content (Frühholz & Grandjean, 2013b). In general, independently of the implicit or explicit character of the task, hemisphere biases for IFG activation can be expected in the evaluation of auditory emotional stimuli. For example, the right IFG appears especially activated during the listening of emotional stimuli (Wildgruber et al., 2009). In comparison, activations of the left IFG have been connected to the semantic content of a given vocal utterance, in part because the left IFG encompasses Broca's area, which is particularly involved in speech processing (Friederici, 2012), and which the linguistic structure of pseudo-words (e.g., 'belam' or 'molem') used in auditory emotional paradigms is likely to trigger (Frühholz & Grandjean, 2013b). Nevertheless, this lateralized view of the activity of the IFG is not shown in all studies. Indeed, several studies on emotional processing have found bilateral activations of the IFG (Ethofer et al., 2006; Frühholz et al., 2012; Kotz et al., 2013), or even left activations of specific areas of the IFG (Bach et al., 2008; Wildgruber et al., 2009) during emotional tasks. This suggests that different areas of the two IFGs are involved in different tasks concerned with the treatment of emotional vocal stimuli (Frühholz & Grandjean, 2013b).

Despite the current caveats of the research on categorization and discrimination of auditory stimuli that we have outlined here, the well-established paradigms in fMRI as well as the extended literature make a strong case to transfer, adapt, and extend (by adding new emotional stimuli) the fMRI protocols to fNIRS. At the behavioural level, we expected to replicate results from the literature that is participants would be more successful in discrimination compared to categorization, particularly in the pseudo word recognition compared to emotions (Dricu et al., 2017). At the brain level, in line with previous fNIRS studies in the visual modality, i) we first predicted that active evaluation (categorization and

discrimination) of auditory emotional stimuli would increase more O2Hb changes in IFG compared to passive listening of the same stimuli. In addition, based on findings in fMRI (e.g. Dricu et al., 2017), we predicted that categorization (processing A-versus-B computations) would lead to more O2Hb changes in IFG because it is cognitively more demanding than discrimination (only processing A-versus- Non-A computations). Second, based on the body of work in fMRI relying on implicit or explicit judgments, we predicted that ii) O2Hb changes would be modulated differentially according to the experimental manipulation of both the task (categorization or discrimination) and the content focus (condition: pseudo word or emotion). Finally, we also expected to capture hemisphere effects, based on the literature. Yet, because of the large variation recorded in the literature as reviewed above, we only hypothesized iii) that emotional stimuli would involve more the right IFG than neutral stimuli but we did not produce strong hypotheses regarding hemisphere biases beforehand.

1.3. Materials and methods

1.3.1. Participants

Twenty-eight healthy volunteers (14 males; mean age 26.44 years, SD = 4.7, age range 21–35) took part in the experiment. The participants reported normal hearing abilities and normal or corrected-to-normal vision. No participant presented a neurological or psychiatric history, or a hearing impairment. All participants gave informed and written consent for their participation in accordance with the ethical and data security guidelines of the University of Geneva. The study was approved by the Ethics Cantonal Commission for Research of the Canton of Geneva, Switzerland (CCER).

1.3.2. Stimuli

The stimulus material consisted of three speech-like but semantically meaningless two-syllable pseudo words (i.e., "minad," "lagod," "namil"). These three stimuli were selected before the experiment from a pre-evaluation of a pool of pseudo words rated on five emotion scales (sadness, joy, anger, fear, neutral), because they were most consistently evaluated as angry, fearful, and neutral, respectively (Frühholz et al., 2015 and Supplementary Material). These pseudo words were 16-bit recordings sampled at a 44.1 kHz sampling rate. Two male and two female speakers spoke these three different pseudo words in an angry, fearful, or neutral tone, resulting in a total of 36 individual stimuli used in the current study. While there were individual differences between the speakers, all stimuli were evaluated by listeners (N = 12) as reflecting the correct emotion (Frühholz et al., 2015).

1.3.3. Procedure

Participants, sitting in front of a computer, performed two alternative forced-choice tasks (auditory discrimination and categorization) by pressing buttons on the keyboard. Stimuli were presented binaurally through in-ear headphones (Sennheiser). The participants listened to each voice and made a corresponding button press as soon as they could identify the requested target for each block. The categorization and discrimination blocks were split into blocks with a focus on emotion and blocks with a focus on the linguistic features of the stimuli. That is, the participant had to select either the pseudo word they believed they heard, or the emotional tone with which it was pronounced. For discrimination, participants had to answer an A vs. non-A question (e.g., "minad" vs "other" or "fear" vs "other"), while for categorization, participants had to answer an A vs. B question ("minad" vs "lagod" vs "namil" or "fear" vs "anger" vs "neutral"). In the following, and for simplicity, we will refer to all blocks concerned with the recognition of pseudo words as 'word categorization' or 'word discrimination.' Similarly, we will refer to all blocks concerned with the recognition of emotion as 'emotion categorization' or 'emotion discrimination.' Our experiment was thus blocked by task, based on a two (task: discrimination/categorization) by two (condition: emotion/word) design, with two blocks per condition and task (two each for emotion categorization, word categorization, emotion discrimination, and word discrimination). This allowed us to repeat each condition at least once and to make sure that the data of at least one block could be analysed if data acquisition came to a halt in a given block because of a software bug, which a pilot study suggested could occur. The eight blocks were preceded and followed by passive listening blocks, leading to 10 blocks in total (Figure 25). During the two passive blocks, participants only had to listen to the same stimuli as in the active tasks without having to make an active decision. Button assignments, target button and target stimuli alternated randomly across blocks for each participant. Task blocks, block order and response buttons also alternated through the experiment across participants, so that every participant had a unique ordering.


The two blocks of emotion categorizations involved a three alternative forced-choice determining whether the speaker’s voice expressed an “angry,” “fearful,” or “neutral” tone (the options “angry” and “fear” were assigned to left and right index finger buttons, the “neutral” option included a simultaneous press of the left and right buttons no more than 500 ms apart). The two blocks of word categorization involved a three alternative forced-choice determining whether the pseudo word spoken was “minad,” “lagod,” or “namil” (the options “minad” and “lagod” were assigned to left and right index finger buttons, the “namil” option included a simultaneous press of the left and right buttons no more than 500 ms apart). The discrimination blocks included a target emotion or a target pseudo word, which was assigned to one of the two response buttons. During the two emotion discrimination blocks, either angry or fearful voices were the target (e.g., press the left button for “angry” voices, and the right button for all other voices) and the two word discrimination blocks included either “minad” or “lagod” as the target pseudo word (e.g., press the left button for “minad,” and the right button for all other words). We acknowledge that by doing so, participants never had to discriminate “neutral” or “namil” against the opposite pseudo words or emotions. Testing all three would have required three blocks in each condition, multiplying the duration of the experiment or biasing it toward discrimination. In addition, by having “namil” and “neutral” always connected to the same behavioural response, we limited the possible number of button attribution errors (when a participant wrongly associates a button with a pseudo word or emotion, resulting in a stream of incorrect choices in a block), which would have likely increased if no single pseudo word or emotion had been bounded to a particular button combination. Within each block, all 36 voice stimuli were presented twice resulting in 72 trials per block. These 72 trials were clustered into mini-blocks of six voice stimuli, where a stimulus was presented every 2s; each mini-block thus had an average length of 11.5–12s. The presentation of mini-blocks was separated by 10s blank gap for the O2Hb signal to return to baseline. Trials for each mini-block were randomly assigned, with the only exception that every emotion (with no more than three times the same emotion in a row) and every pseudo word had to appear at least one time per mini-block. Each mini-block started with a visual fixation cross (1 x 1) presented on a grey background for 900 x 100 ms. The fixation cross

prompted the participant’s attention and remained on the screen for the duration of the mini-block.
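
To make the trial-ordering constraints above concrete, the following R sketch shows one way a single mini-block of six trials could be drawn; it is an illustration only, the function and object names (draw_miniblock, emotions, words) are not taken from the original experimental scripts, and the four speakers are omitted for brevity.

emotions <- c("anger", "fear", "neutral")
words    <- c("minad", "lagod", "namil")

# Draw six trials such that every emotion and every pseudo word appears at least once
# and the same emotion never occurs more than three times in a row.
draw_miniblock <- function() {
  repeat {
    trials <- data.frame(
      emotion = sample(emotions, 6, replace = TRUE),
      word    = sample(words, 6, replace = TRUE)
    )
    all_present <- all(emotions %in% trials$emotion) && all(words %in% trials$word)
    no_long_run <- all(rle(trials$emotion)$lengths <= 3)
    if (all_present && no_long_run) return(trials)
  }
}

set.seed(1)
draw_miniblock()  # one candidate mini-block of six stimuli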

Figure 25: Experimental protocol with a possible list of blocks and stimuli within a mini-block.

1.3.4. NIRS Recordings

For this study, we used the Oxymon MKIII device (Artinis Medical Systems B.V., Elst, Netherlands) with a 2x4 optode template and wavelengths of 765 and 855 nm corresponding to an optimal range of signal to noise ratio (SNR, see Scholkmann et al., 2014). We placed four optodes as a square on both sides of the participant’s head, forming 4 channels around the F7 or F8 references and corresponding, respectively, to the left and right IFG (Figure 26), as defined in the 10-20- EEG system (Jasper, 1958; Okamoto et al., 2004). All channels were placed at an inter-optode distance of 35 mm and we recorded with a sampling rate of 250 Hz.

Figure 26: Spatial registration of optode locations to the Montreal Neurological Institute (MNI) space using a spatial registration approach (Tsuzuki et al., 2007). This method relies on structural information from an anatomical database to estimate the fNIRS probe locations in 3D space. This procedure thus allows the projection of the eight channels from the subject space into MNI space (Okamoto et al., 2004). Central dots indicate the F7 and F8 electrode positions in the 10-20 EEG system. "o" and "x" indicate optical transmitter and receiver positions, respectively.

1.4. Analysis

1.4.1. Behavioural Data

We only analysed data from N = 26 participants (2 excluded for missing too many blocks) using RStudio software (RStudio Team, 2015, Inc., Boston, MA, United States). The accuracy analysis was performed on a total of N = 14'544 trials across the 26 participants (average: 559.39, SD: 42.27; on a basis of 576 trials/participant, but with four participants' datasets incomplete due to technical issues). We assessed accuracy in the tasks by fitting a generalized linear mixed model (GLMM) with a binomial error distribution, with condition (emotion vs word), task (categorization vs discrimination), and emotion (anger, fear, neutral) as well as their interactions as fixed factors, and with random intercepts for participant ID and block (first or second), and comparing it against a GLMM with the same factors but without the condition by task by emotion interaction, allowing us to assess the effect of the triple interaction (see Supplementary Material for an example of model analysis). Note that for some models we used an optimizer to facilitate convergence. This analysis was followed by contrasts, for which post hoc correction for multiple comparisons was applied using a Bonferroni correction (0.05/66 = 0.00076). The specific contrasts we tested aimed to decipher whether condition, emotion, and task had an effect on participants' behaviour. We analysed reaction times by fitting a general linear mixed model with condition (emotion vs word), task (categorization vs discrimination), and emotion (anger, fear, neutral), as well as their interactions as fixed factors, and with participant IDs and blocks as random factors, using the same approach as in the accuracy analysis. All reaction times were measured from the offset of the stimulus. We only included in our analyses the reaction times for correct answers. This resulted in a total of N = 13'789 trials across the 26 participants (average: 530.35, SD: 50.27). We excluded as outliers data points below 150 ms or more than three standard deviations above the mean (RT < 150 ms and > 1860 ms; 98.85% of RT data points included).
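
As a minimal illustration of the model-comparison logic described above, the R sketch below (using the lme4 package) fits the accuracy GLMM with and without the triple interaction and compares the two with a likelihood-ratio test; the data frame and column names (dat, accuracy, condition, task, emotion, participant, block) are placeholders and not the original analysis code.

library(lme4)

# Full model: all main effects and interactions, random intercepts for participant and block
full <- glmer(accuracy ~ condition * task * emotion + (1 | participant) + (1 | block),
              data = dat, family = binomial,
              control = glmerControl(optimizer = "bobyqa"))  # optimizer to help convergence

# Reduced model: same terms but without the condition x task x emotion interaction
reduced <- glmer(accuracy ~ (condition + task + emotion)^2 + (1 | participant) + (1 | block),
                 data = dat, family = binomial,
                 control = glmerControl(optimizer = "bobyqa"))

anova(reduced, full)  # likelihood-ratio test of the triple interaction

Reaction times (correct trials only, outliers removed) can be handled with the same comparison logic, replacing glmer() with lmer().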


1.4.2. fNIRS Data

Seven participants out of 28 were excluded from the dataset due to poor signal quality or missing fNIRS data. An absent or weak heartbeat signal in the raw O2Hb, as well as a strong negative correlation between O2Hb and HHb, were taken as indicators of a bad SNR. Furthermore, the presence of artifacts after band-pass filtering was also a factor of exclusion. A total of 21 participants were thus analysed in this study. This number of participants is in line with statistical power analyses in fMRI (Desmond & Glover, 2002) and with studies using fNIRS to assess emotional processing in frontal areas (for a review, see Bendall et al., 2016). Given the good repartition of the SNR, we performed the first-level analysis on all channels with MATLAB 2016B (Mathworks, Natick, MA, United States) using the SPM_fNIRS toolbox (Tak et al., 2016) and homemade scripts. Haemoglobin conversion and temporal pre-processing of O2Hb and HHb followed this procedure: i) haemoglobin concentration changes were calculated with the modified Beer-Lambert law (Delpy et al., 1988); ii) motion artifacts were reduced using the method proposed by Scholkmann et al. (2010), based on moving standard deviation and spline interpolation; iii) physiological and high-frequency noise, such as that due to vasomotion or heart beats usually found in extra-cerebral blood flow, was removed using band-stop filters between 0.12–0.35 and 0.7–1.5 Hz following Oostenveld et al. (2010) and a low-pass filter based on the HRF (Friston et al., 2000); iv) fNIRS data were down-sampled to 10 Hz; v) low-frequency confounds were reduced using a high-pass filter based on a discrete cosine transform with a cut-off frequency of 1/64 Hz (Friston et al., 2000).
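
For reference, step i) relies on the modified Beer-Lambert law, which in its standard textbook form (Delpy et al., 1988) relates the change in optical attenuation at a wavelength to the underlying concentration changes; the symbols below (extinction coefficients, inter-optode distance d and differential pathlength factor DPF) describe this general relation and are not transcribed from the toolbox code:

\Delta A(\lambda, t) = \left[\, \varepsilon_{\mathrm{O_2Hb}}(\lambda)\,\Delta[\mathrm{O_2Hb}](t) + \varepsilon_{\mathrm{HHb}}(\lambda)\,\Delta[\mathrm{HHb}](t) \,\right] \cdot d \cdot \mathrm{DPF}(\lambda)

Measuring the attenuation change at the two wavelengths used here (765 and 855 nm) yields a two-equation system that is solved for the O2Hb and HHb concentration changes.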

In line with previous literature using vocal stimuli in fNIRS studies (e.g. Lloyd-Fox et al., 2014), we considered the hemodynamic time course in our second-level analyses. To select the range of the maximum concentration changes (µM) observed across participants for each trial, we averaged the concentration of O2Hb between 4 and 12 s post-stimulus onset. As in fMRI studies, this interval took into consideration the slow timing of participants' HRF and allowed us to assess precisely the O2Hb concentration for one specific stimulus. We performed the same analyses on HHb to check our O2Hb concentration changes (µM) for consistency. Because our results with HHb were coherent with the O2Hb (Tachtsidis & Scholkmann, 2016), we only provide our results for O2Hb in the main text (correlation coefficient: -0.97, p < 0.001, N = 12, Supplementary Figure 33; see Supplementary Material for HHb analyses). All data were log-transformed to normalize them for the analyses. We performed the second-level analysis with RStudio using linear mixed models including the following factors and their interactions depending on their pertinence with regard to our hypotheses (that is, we only ran the contrasts that tested these hypotheses, rather than all possible contrasts indiscriminately): condition (emotion vs word), emotion content (anger vs fear vs neutral), task (categorization vs discrimination vs passive) and hemisphere (right vs left, pooling data from channels 1–4 for the right hemisphere and data from channels 5–8 for the left hemisphere), as well as their interactions as fixed factors, with participant IDs and block orders as random factors. In particular, we tested models including a higher-level interaction against models of lower dimension (e.g., a four-way versus a three-way interaction plus the main effects), presented in the Results, on which we ran subsequent contrasts (see Supplementary Material for models with lower-dimension interactions).
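
The trial-level summary described above can be sketched as follows in R; the sampling rate matches the down-sampled signal (10 Hz), but the function and variable names (trial_o2hb, o2hb, onset_s) are illustrative placeholders rather than the scripts actually used.

fs <- 10  # sampling rate (Hz) after down-sampling

# Mean O2Hb concentration change (µM) in the 4-12 s window after stimulus onset
trial_o2hb <- function(o2hb, onset_s, from = 4, to = 12) {
  idx <- seq(round((onset_s + from) * fs) + 1, round((onset_s + to) * fs))
  mean(o2hb[idx], na.rm = TRUE)
}

# The resulting per-trial values are then log-transformed before the mixed-model analysis.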

1.4.3. Analyses Including Passive Blocks

We first aimed to isolate whether our ROIs were activated differently during active blocks compared to passive blocks, in line with our first hypothesis i). To do so, our first analyses confronted data collected during the passive and the active blocks. We were particularly interested in testing the effects of lateralization and emotional content, as previous fMRI studies had shown possible variation for these factors (see above). We noticed post hoc that subjects’ activations during the first and the final passive run differed widely, with the activation pattern found during the final passive run close to the pattern of activation recorded during the active tasks (see Supplementary Material, in particular Supplementary Figure 32, where we revealed a significant interaction of task by block number (χ²(2) = 2388.50, p < 0.001), with a significant contrast Passive 1 x Passive 2: (χ²(2) = 4.33, p < 0.001)). Therefore, it is likely that subjects were still engaged, consciously or not, in the discrimination or categorization of stimuli during the final passive block, even though they were instructed not to do so. For this reason, we excluded data from the final passive block, and only included data from the first passive block, for which no instruction besides listening to stimuli had been conveyed to the participants, ensuring their naivety to the task. To isolate any effect of active processes (that is processes occurring during blocks

where the task was either discrimination or categorization) vs. passive processes, we tested a three-way model including data from the first passive run and all discrimination and categorization blocks. We specifically tested effects of active vs. passive blocks across emotions and hemispheres iii), resulting in testing a three-way interaction between processes (active vs passive tasks), emotion (anger vs. fear vs. neutral) and hemispheres (right vs left).

1.4.4. Analyses on Active Blocks Only

Second, in line with our second hypothesis ii), we were interested in whether there were differences in activations between the categorization or discrimination of words and emotions across hemispheres, and whether this depended on the emotion being tested. To do so, we focused on the active blocks (discrimination and categorization) and excluded the passive blocks, during which the subjects had received no specific instructions regarding the stimuli, in contrast to the active blocks (see above). To isolate any differences between the factors, we tested a four-way interaction on the active blocks including the effects of hemisphere (right vs left), task (discrimination vs categorization), condition (word vs. emotion), and emotion (anger vs fear vs neutral). Subsequently, as in our first analysis, we tested contrasts between the right and left hemispheres. In a final analysis, we looked at each hemisphere individually iii) to contrast anger and fear, respectively, versus neutral stimuli.
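
The interaction tests described in the two sections above follow the same model-comparison logic as the behavioural analysis; a minimal R sketch for the active-blocks model is given below, with placeholder names (active, o2hb_log, block_order) that are not the original analysis code.

library(lme4)

# Model including the four-way interaction of interest
m4 <- lmer(o2hb_log ~ hemisphere * task * condition * emotion +
             (1 | participant) + (1 | block_order), data = active)

# Lower-dimension model: all terms up to three-way interactions only
m3 <- lmer(o2hb_log ~ (hemisphere + task + condition + emotion)^3 +
             (1 | participant) + (1 | block_order), data = active)

anova(m3, m4)  # likelihood-ratio test of the four-way interaction

# Planned contrasts (e.g., anger vs. neutral within one hemisphere) can then be run on the
# retained model, for instance with the emmeans package.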

1.5. Results

1.5.1. Behavioural Data

1.5.1.1. Accuracy Data

There were significant effects for task (categorization vs. discrimination; χ²(1) = 6.38, p = 0.012), and emotion (anger, fear, neutral; χ²(2) = 33.01, p < 0.001), and for the interactions condition by task (χ² (1) = 21.17, p < 0.001), and condition by emotion (χ² (2) = 14.00, p < 0.001), but not for the main effect related to condition (emotion vs. word; χ²(1) = 2.54, p = 0.11) and for the interactions task by emotion (χ²(2) = 4.65, p = 0.098), and task by condition by emotion (χ²(2) = 2.31, p = 0.32). Analysis of the contrasts of interest, following Bonferroni correction, revealed that participants were better for categorization when listening to neutral compared to anger and fear stimuli for the emotion condition (neutral vs anger: χ²(1) = 28.42, p = 0.0004;

neutral vs fear: χ²(1) = 15.06, p = 0.0001; Figure 27). This effect was not present for emotion discrimination (neutral vs anger: χ²(1) = 9.42, p = 0.002; neutral vs fear: χ²(1) = 8.47, p < 0.004), nor for categorization (p-values > 0.2) or discrimination (neutral vs anger: χ²(1) < 0.01; neutral vs fear: χ²(1) = 5.44, p = 0.02) in the word condition.

Figure 27 : Accuracy (in %) of the 26 participants represented as a function of condition (emotion vs word) and task (categorization vs discrimination), and emotion (anger, fear, neutral).

1.5.1.2. Reaction time

Correlation between reaction time and accuracy was extremely weak (Spearman's rho = 0.033). The analysis revealed significant main effects for condition (χ²(1) = 240.51, p < 0.001), task (χ²(1) = 653.12, p < 0.001), and emotion (χ²(2) = 61.66, p < 0.001), as well as significant interactions between task by emotion (χ²(2) = 9.83, p = 0.007) and between all three factors (χ²(2) = 14.2, p < 0.001), but not for condition by task (χ²(1) = 1.92, p = 0.17) nor condition by emotion (χ²(2) = 5.42, p = 0.07, see Figure 28). Contrast analysis using the same Bonferroni correction as in the accuracy analysis revealed that participants were slower for anger compared to fear and neutral during emotion discrimination (respectively, χ²(1) = 15.46, p < 0.0001 and χ²(1) = 17.55, p < 0.0001) and word discrimination (respectively, χ²(1) = 41.65, p < 0.0001 and χ²(1) = 15.89, p < 0.0001). For word categorization, the anger/neutral comparison was significant (χ²(1) = 21.15, p < 0.0001) but not anger/fear (χ²(1) = 4.82, p = 0.028). For emotion categorization, neither the anger/fear nor the anger/neutral comparison was significant (p-values > 0.026).

Figure 28: Reaction time (in ms) for the correct trials of the 26 participants represented as a function of condition (emotion vs word) and task (categorization vs discrimination), and emotion (anger, fear, neutral).

1.5.2. NIRS Data

1.5.2.1. Analyses Including the First Passive Run

As predicted, we revealed a significant three-way interaction of task by hemisphere by emotion (χ²(10) = 262.47, p < 0.001, see Table 2).

Table 2: Summary of the main effects and results of the three-way interaction between the factors in the models assessing passive vs active processes (categorization, discrimination) comparison.


We subsequently ran contrasts to isolate the contributions of each of the factors. In particular, when contrasting passive listening vs. active tasks (categorization and discrimination) with lateralization (right vs. left) and pairs of emotions together, we found a significant difference with higher O2Hb values for tasks vs passive listening for ‘fear’ compared to ‘neutral’ on the right compared to left hemisphere (χ²(1) = 18.13, p < 0.001;

Figure 29); and a significant difference for ‘fear’ compared to ‘anger’ with higher O2Hb for anger on the left compared to the right hemisphere (χ²(1) = 15.16, p < 0.001); in comparison, ‘anger’ vs ‘neutral’ did not yield significant differences (χ²(1) = 0.13, p = 0.72). When only considering neutral stimuli, the contrast between passive listening and tasks was also significant with higher values for left compared to right (χ²(1) = 29.02, p < 0.001; see Figure 30), showing a general task difference independent of emotional content.

Figure 29: Contrast in log of O2Hb concentration changes (µM) in the right and left hemispheres during the treatment of anger, fear and neutral stimuli. *** p < 0.001.

Figure 30: Contrast between log values of O2Hb concentration changes (µM) for activities during passive listening and active (categorization and discrimination) blocks for neutral stimuli only. *** p < 0.001.

1.5.2.2. Analyses of the Active Blocks

We revealed a significant four-way interaction of task by hemisphere by condition by emotion (χ²(11) = 117.04, p<0.001, see Table 3), confirmed also for HHb (χ²(17) = 2463.9, p < 0.001, see Supplementary Material). To test the specific significant effects related to emotions and lateralization, we performed the following contrasts: we tested the impact of condition (emotion versus word), hemisphere (left vs right) and task (discrimination vs categorization) for each emotion individually: anger (χ²(1) = 32.54, p<0.001), fear (χ²(1) = 54.85, p < 0.001), and neutral (χ²(1) = 79.84, p < 0.001). In addition, we also contrasted emotions with each other: the contrasts condition, hemisphere, task for anger vs fear (χ²(1) = 1.45, p = 0.23) did not reach significance but the contrasts anger vs neutral [χ²(1) = 107.16, p < 0.001] and fear vs. neutral did (χ²(1) = 133.52, p < 0.001, Figure 31), suggesting that the comparison across hemispheres between fear and neutral, on the one hand, and fear and anger, on the other hand, drove most of the interaction.

Finally, to investigate the specificities of the lateralization, we also ran contrasts on the left or right hemispheres only (Table 3). This analysis revealed a significant effect of ‘anger’ vs ‘neutral’ on the left hemisphere (χ²(1) = 13.42, p < 0.001); this effect was also significant for the right hemisphere (χ²(1) = 120.48, p < 0.001). The comparison for ‘fear’ vs ‘neutral’ was also significant for both the left (χ²(1) = 51.63, p < 0.001) and right hemispheres (χ²(1) = 83.83, p < 0.001). Finally, the comparison for ‘fear’ vs ‘anger’ was significant for the left (χ²(1) = 12.4, p <

0.001, Figure 31) but not the right hemisphere (χ²(1) = 3.31, p = 0.07).

Figure 31: Contrast in log of O2Hb concentration changes (µM) for anger, fear, and neutral stimuli in the right and left hemispheres for emotional categorization/discrimination and word categorization/discrimination. *** p < 0.001.

Table 3: Summary of the main effects and results of the four-way interaction between the factors in the models assessing the active tasks comparison.


1.6. Interim Discussion

In this study we showed that fNIRS is a suitable method to study cognitive paradigms related to emotions, particularly categorization and discrimination, in the human frontal regions using mini-block design and event related stimuli. Our first goal was to estimate whether it was possible to isolate significant activity in the IFG using fNIRS, whose activity has been highlighted in previous fMRI studies investigating emotional prosody processing, and in particular during categorization and discrimination of emotional stimuli (Frühholz et al., 2012; Schirmer & Kotz, 2006). Both the right and left IFGs have been connected to the processing of emotional stimuli (Ethofer et al., 2006; Frühholz et al., 2012; Wildgruber et al., 2009) and we were interested to investigate such effects in more depth with fNIRS. We predicted i) that active evaluation (categorization and discrimination) of auditory emotional stimuli would increase more O2Hb changes in IFG compared to passive listening of the same stimuli, and that categorization itself would be more demanding than discrimination, which would be both reflected in the brain and behavioural data. Our second goal was to investigate whether fNIRS, beyond being suitable, could also offer informative data in complex multifactorial analyses. In particular, we expected ii) that the O2Hb changes would be modulated differentially according to the tasks, conditions and emotions, with the possible presence of hemisphere biases. Overall, we found increased differential changes in

O2Hb in the IFG based on experimental conditions suggestive of significant differences in frontal activations during our tasks, including a difference in activation during categorization and discrimination compared to passive listening in the O2Hb and confirmed in the HHb signals. In particular, in our first analysis of the NIRS signal, we isolated left hemisphere activity for active processing versus passive listening of neutral stimuli (Figure 30). This result suggests that fNIRS is in general a suitable method to identify brain signatures related to complex processes such as categorization and discrimination in auditory stimuli. In addition, while we did not observe a main effect of task in the active-only analyses, we uncovered significant interactions that included task, condition and emotion content, suggesting that categorization and discrimination of various content have different fNIRS signatures, and underlining that fNIRS can be used in complex multifactorial paradigms. Furthermore, we isolated specific hemispheric differences between emotions that can be linked with findings in fMRI. While our study was primarily aimed at showing that fNIRS

was suitable to use for the study of auditory discrimination and categorization, our results are also of interest in the current debate on the lateralization of effects in the brain, in particular when compared to former fMRI studies concerned with the involvement of the PFC in the evaluation of emotional stimuli (Dricu et al., 2017; Ethofer et al., 2006). When considering active and passive tasks, the effect for fear and anger versus neutral was more pronounced in the right hemisphere (Figure 29), in line with classic studies highlighting a right dominance for emotional treatment in prosody (Wildgruber et al., 2009) and our preliminary hypothesis iii). However, while the left hemisphere was more deactivated with fear stimuli, anger stimuli activated more the left side of the prefrontal lobe compared to the right side. Both findings are compatible with Davidson’s view (Davidson et al., 1990; 1992), for whom approach related emotions such as anger activate more the left hemisphere, particularly in the prefrontal cortex, while avoidance-related emotions, such as fear, are more located in the right hemisphere. Furthermore, our second analysis on active tasks only also revealed significant differences between categorization and discrimination between the experimental conditions: indeed, we found a significant four-way interaction between condition (word vs emotion), task (categorization vs discrimination), emotion and hemisphere. Interestingly, the results of the analysis of the contrasts suggest that differences in brain activity between categorization and discrimination and lateralization are more important for fear and anger stimuli compared to neutral ones, both on the right and left hemisphere. Nevertheless, activity for anger stimuli across conditions and tasks was higher compared to other stimuli (Figure 31). This result supports a bilateral approach to the treatment of emotional stimuli (Frühholz & Grandjean, 2013b; Schirmer & Kotz, 2006).

Our behavioural results are also informative with respect to a differential treatment of stimuli depending on emotion content, condition and task. While our participants were generally accurate across tasks and conditions (over 96% correct in all tasks), and while we cannot exclude that the minor but significant variations between the four experimental conditions result from the very large number of data points, which made the standard errors quite small, we note that these differences nevertheless appear to reflect the variations in treatment outlined in the four-way interaction found in the fNIRS data. Participants were most accurate when engaged in emotional categorization, seconded by word discrimination, with the lowest accuracy rates found for word categorization and emotional discrimination. This result may seem counter-intuitive at first, as categorization appears to be cognitively

more difficult than discrimination. However, there was also much variation in terms of emotion recognition, with participants more accurate with neutral stimuli when their task was to categorize the correct emotional content. However, the difference across emotions was not present when their task was to judge the linguistic content of the words, nor when they had to discriminate emotions, possibly because of our experimental design. In addition, participants’ reaction times also varied between the conditions and emotions: overall, categorization took more time compared to discrimination, with judgments made on emotional content always taking longer than on linguistic content, particularly with respect to anger stimuli. This behavioural finding may reflect the increased activation across hemispheres observed in the fNIRS data for anger stimuli. Combined, these results suggest different processing between words and emotions (in line with Belyk et al., 2017), with active judgments on emotional stimuli being more demanding (longest reaction time) than judgments on the linguistic content. Indeed, when participants judged the emotional content of stimuli, they were more accurate for categorization than discrimination but spent a longer time before selecting their answer. In contrast, for words, participants were more accurate for discrimination compared to categorization, but they spent less time before answering. Another potential explanation for the differences observed between the active processing of emotional aspects compared to linguistic aspects lies in the fact that the IFG is activated during both implicit and explicit categorization and discrimination of emotions (Dricu et al., 2017; Ethofer et al., 2006; Frühholz et al., 2012, 2016). Our participants may thus have engaged in implicit emotional processing of the stimuli even when their task was to judge the linguistic aspect of the stimuli. This additional treatment may explain the O2Hb differences found between emotions even in the context of word categorization and discrimination. The right IFG has previously been highlighted as particularly important in the explicit evaluation of the emotional content of the voices, and our O2Hb results support this view, particularly when considering fear versus neutral stimuli. The generally higher activity in both hemispheres when participants processed stimuli with an angry content also supports the view that both hemispheres play a role in the processing of the emotional content, whether implicit or explicit (Frühholz & Grandjean, 2013a). Future work will need to explore the specific aspects of emotional stimuli when more types of emotion (e.g., positive) are included. It may also be interesting to study whether bilateral or unilateral treatments are elicited depending on the evaluation process, implicit or explicit.

In general, more work is needed to assess the limitations of fNIRS with respect to complex cognitive processing. For example, there is only an indirect link between the O2Hb measures and the actual neural activity, which ultimately limits the direct connections that can be extrapolated between variation in activity in a given ROI and the behaviour of participants. Note, however, that this criticism also applies to other techniques (e.g., fMRI) relying on indirect measures such as the blood oxygen-level dependent signal to reflect neural activity (Ekstrom, 2010). In our view, work relying on different imaging techniques can thus only improve our understanding of this indirect relationship, and a possible new avenue of research is to combine fMRI and fNIRS to explore the auditory evaluation of stimuli. It also seems necessary to disentangle emotional processing from other kinds of auditory processing. For example, effortful listening has been shown to also affect activity in the PFC and IFG (Rovetti et al., 2019), something that our study did not account for. In particular, listening to emotional stimuli and pseudo-words may be more effortful than listening to ordinary speech and might thus have also driven some of the recorded effects. Future work using this type of paradigm will thus need to tackle other cortical activities related to processing auditory stimuli in general.

To conclude, our study shows that, despite its caveats, fNIRS is a suitable method to study emotional auditory processing in human adults with no psychiatric history or hearing impairment. Beyond fNIRS studies investigating emotions from a perceptual point of view (e.g. Plichta et al., 2011; Zhang et al., 2018), our study replicates and extends effects found with more traditional imaging methods such as fMRI and shows that subtle differences can be found in the fNIRS signal across tasks and modalities in the study of emotional categorization and discrimination. Future work will need to examine in more detail whether differences in stimulus valence or arousal may also influence the fNIRS signal. In this respect, one of the major advantages of fNIRS lies in the fact that it is noiseless. This is all the more important for studies that investigate the perception of sounds, but also in general for more realistic experiments. fNIRS may also be very informative in the context of prosody production thanks to its resistance to movement artifacts compared to other brain imaging methods. Combined with its portability and ease of use, fNIRS may also allow such questions to be extended to populations where the use of fMRI is limited, such as young infants, populations in less developed countries or, possibly, other species (Gruber & Grandjean, 2017). The use of unfamiliar non-verbal human or non-human vocalizations rather than pseudo-words may be particularly informative to study the
developmental and evolutionary origins of the two cognitive processes. Finally, our study contributes to the growing field of affective neurosciences, confirming through a different imaging technique that emotion processing, both explicit and implicit, may be largely conducted in the IFG, a possible hub for the extraction and detection of variant/invariant aspects of stimuli (e.g., acoustical features) subjected to categorization/discrimination representations (e.g., anger/neutral prosody) in the brain.

1.7. Supplementary Material

Selection of stimuli for the experiment

From a pre-evaluation of all stimuli on five emotion scales (sadness, joy, anger, fear, neutral), we selected three pseudo words for each emotional tone that were most consistently evaluated as angry (F(4, 80) = 256.111, p < 0.001), fearful (F(4, 80) = 151.894, p < 0.001), and neutral (F(4, 80) = 193.527, p < 0.001), respectively. A one-way ANOVA revealed a statistically significant difference in arousal scores among the angry, fearful, and neutral voices (F(2, 40) = 54.073, p < 0.001). Bonferroni-corrected post-hoc planned comparisons revealed that arousal scores for angry and fearful tones were significantly higher than the scores received by neutral tones (p < 0.001), but that they did not differ significantly from each other (p = 0.408).
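
For reference, this kind of arousal check can be reproduced with a few lines of R; the data frame and column names (ratings, arousal, emotion) are hypothetical and only illustrate the type of analysis reported, not the original script.

# Hypothetical reconstruction of the arousal check on the selected stimuli.
# 'ratings' is assumed to hold one arousal score per stimulus evaluation,
# with 'emotion' coding the intended tone (anger, fear, neutral).
fit <- aov(arousal ~ emotion, data = ratings)
summary(fit)  # one-way ANOVA on arousal scores

# Bonferroni-corrected pairwise comparisons between emotional tones
pairwise.t.test(ratings$arousal, ratings$emotion,
                p.adjust.method = "bonferroni")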

Example of R script analysis for accuracy

Comparison of the model with the triple interaction against the model with main effects and double interactions, both including random effects (intercept), with a binomial family (model.lme1).
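
As the full script is not reproduced in this document, the lines below are only a minimal sketch of such a comparison, assuming lme4 syntax and hypothetical variable names (accuracy, task, emotion, condition, participant); the name model.lme1 is assumed to refer to one of these models.

library(lme4)

# Model with main effects and all two-way interactions (random intercept per participant)
model.lme1 <- glmer(accuracy ~ (task + emotion + condition)^2 + (1 | participant),
                    data = behav, family = binomial)

# Model adding the three-way (triple) interaction
model.lme2 <- glmer(accuracy ~ task * emotion * condition + (1 | participant),
                    data = behav, family = binomial)

# Likelihood-ratio (chi-squared) test of the triple interaction
anova(model.lme1, model.lme2)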

All other behavioural and NIRS analyses are based on similar scripts with generalized or general linear mixed models.

Analyses of the first (active and passive) blocks versus the second (active and passive) blocks

Task * Block number: We found a significant interaction of task * block number (χ2(2) = 2388.50, p < 0.001).

Contrasts:
Passive * active: χ2(1) = 2400.60, p < 0.001.
Passive 1 * passive 2: χ2(1) = 4334.10, p < 0.001.

Figure 32. Contrast between log values of O2Hb concentration changes (µM) for activities during the first (B1) and the second (B2) passive listening and active blocks including all stimuli.

Analyses of the Oxy-Hemoglobin Signal

Analyses including the first passive run

Main effects

Hemisphere: We found a significant main effect of hemisphere (χ2(1) = 93.4, p < 0.001).
Emotion: We found a significant main effect of emotion (χ2(2) = 2758.8, p < 0.001).
Task: We found a significant main effect of task (χ2(2) = 3491.6, p < 0.001).

Interactions

Task * Emotion: We found a significant two-way interaction of task * emotion (χ2(4) = 48.8, p < 0.001).
Task * Hemisphere: We found a significant two-way interaction of task * hemisphere (χ2(2) = 33.4, p < 0.001).
Emotion * Hemisphere: We found a significant two-way interaction of emotion * hemisphere (χ2(2) = 5564.9, p < 0.001).
Task * Hemisphere * Emotion: We found a significant three-way interaction of task * hemisphere * emotion (χ2(10) = 262.47, p < 0.001).

Analyses on active blocks

Main effects

Condition: We found a significant main effect of condition (χ2(1) = 14.27, p < 0.001).
Emotion: We found a significant main effect of emotion (χ2(2) = 2681.8, p < 0.001).
Hemisphere: We found a significant main effect of hemisphere (χ2(1) = 58.98, p < 0.001).
Task: There was no significant main effect of task (χ2(1) = 0.01, p = 0.92).

Interactions

Task * Hemisphere: We found a significant two-way interaction of task * hemisphere (χ2(1) = 5.29, p < 0.05).
Task * Condition: We found a significant two-way interaction of task * condition (χ2(1) = 28.56, p < 0.001).
Emotion * Hemisphere: We found a significant two-way interaction of emotion * hemisphere (χ2(2) = 5639, p < 0.001).
Condition * Hemisphere: We found a significant two-way interaction of condition * hemisphere (χ2(1) = 16.63, p < 0.001).
Task * Hemisphere * Condition: We found a significant three-way interaction of task * hemisphere * condition (χ2(3) = 40.9, p < 0.001).
Task * Hemisphere * Emotion: We found a significant three-way interaction of task * hemisphere * emotion (χ2(5) = 195.42, p < 0.001).

Analyses of the Deoxy-Hemoglobin Signal

Figure 33. Average of O2Hb and HHb concentration changes (µM) during categorization and discrimination tasks for each emotion. Correlation coefficient: -0.97, p<0.001.

Analyses including the first passive run

Main effects

Hemisphere: We found a significant main effect of hemisphere (χ2(1) = 64.88, p < 0.001).
Emotion: We found a significant main effect of emotion (χ2(2) = 2575.7, p < 0.001).
Task: We found a significant main effect of task (χ2(2) = 1755, p < 0.001).

Interactions

Task * Emotion: We found a significant two-way interaction of task * emotion (χ2(4) = 152.92, p < 0.001).
Task * Hemisphere: We found a significant two-way interaction of task * hemisphere (χ2(2) = 32.41, p < 0.001).
Emotion * Hemisphere: We found a significant two-way interaction of emotion * hemisphere (χ2(2) = 2242.1, p < 0.001).
Task * Emotion * Hemisphere: We found a significant three-way interaction of task * emotion * hemisphere (χ2(10) = 316.92, p < 0.001).

Analyses on active blocks

Main effects

Condition: We found a significant main effect of condition (χ2(1) = 23.26, p < 0.001).
Emotion: We found a significant main effect of emotion (χ2(2) = 2612.9, p < 0.001).
Hemisphere: We found a significant main effect of hemisphere (χ2(1) = 43.53, p < 0.001).
Task: We found a significant main effect of task (χ2(1) = 80.67, p < 0.001).

Interactions

Task * Hemisphere: We found a significant two-way interaction of task * hemisphere (χ2(7) = 21.41, p < 0.001).
Task * Condition: There was no significant task * condition interaction (χ2(1) = 1.56, p = 0.21).
Task * Emotion: We found a significant two-way interaction of task * emotion (χ2(2) = 76.18, p < 0.001).
Hemisphere * Condition: We found a significant two-way interaction of hemisphere * condition (χ2(1) = 8.12, p < 0.005).
Task * Hemisphere * Emotion: We found a significant three-way interaction of task * hemisphere * emotion (χ2(6) = 2388, p < 0.001).
Task * Hemisphere * Condition: We found a significant three-way interaction of task * hemisphere * condition (χ2(3) = 30.02, p < 0.001).
Task * Hemisphere * Condition * Emotion: We found a significant four-way interaction of task * hemisphere * condition * emotion (χ2(17) = 2463.9, p < 0.001).

Chapter 2.

Categorization and Discrimination of Human and Non-Human Primate Affective Vocalizations: A Functional NIRS Study of Frontal Cortex Involvement

Coralie Debracque, Leonardo Ceravolo, Zanna Clay, Katie Slocombe, Didier Grandjean‡ and Thibaud Gruber‡

In preparation

2.1. Abstract

While modern humans constantly have to make choices, other species, and particularly non-human primates, also react differently to threatening or pleasant situations in ways that increase their potential fitness. Because of its adaptive value, the recognition of threatening or affiliative signals is likely to be reflected in the capability of modern humans to recognize the call content of other closely related species. However, at both the behavioural and neural levels, only a few studies have used a comparative approach to understand affective decoding processes in humans, and findings are even scarcer with respect to affective vocalizations. Previous neuroscience research on the recognition of human affective vocalizations has shown the critical involvement of temporal and frontal regions. In particular, frontal regions have been reported as crucial in the explicit decoding of vocal emotions, especially at different levels of task complexity such as discrimination or categorization. The main aim of the present fNIRS study was to investigate the neural activity of the IFGtri and the PFC underlying the categorization (unbiased choice) and discrimination (biased choice) of positive and negative affects in human, great ape (chimpanzee and bonobo), and monkey (rhesus macaque) vocalizations. We also analysed participants' behavioural responses and correlated them with the recorded frontal activations. The fNIRS data revealed a clear distinction between the two frontal regions, with a general increase of IFGtri activity compared to a decrease of PFC activity. We also found a modulation of IFGtri and PFC activations depending on task complexity, with more activity in the IFGtri during discrimination (A versus non-A choice) compared to categorization, and a more pronounced decrease of PFC activity in categorization (A versus B choice) compared to discrimination. Similarly, the interactions between behavioural and neural responses revealed differences between categorization and discrimination mechanisms for agonistic chimpanzee screams.

Interestingly, we also found a decrease of activity in both the IFGtri and the PFC for agonistic cues in great ape vocalizations only. A decrease of activity in frontal regions was nevertheless also found for affiliative chimpanzee and macaque calls, suggesting the importance of these regions in explicit cross-taxa recognition. Regarding our behavioural results, participants were able to recognize almost all affective cues in the vocalizations of all species in the discrimination task (except for threatening bonobo calls). In the categorization task, they mostly identified affective content correctly in human and great ape vocalizations but were unable to do so for rhesus macaque screams. Overall, these findings mirror the Hominidae phylogenetic tree, supporting the hypothesis of a pre-human origin of affective recognition processing inherited from our common ancestor with other primates. Finally, our results highlight the behavioural differences related to task complexity, i.e. between categorization and discrimination processes, and the differential involvement of the PFC and the IFGtri, which seems necessary to explicitly decode affects in all primate vocalizations.

Keywords: categorization, discrimination, affect, vocalization, primate, NIRS, IFG, PFC

2.2. Introduction

Human life is made of choices, especially in the social domain. How we should react to threatening or joyful voices expressed by others involves different complexities of perceptual decision-making related to specific brain mechanisms. Recent research on these mechanisms has emphasized the role of the available sensory information as well as the different levels of complexity involved in the process during which a human makes a decision among several options (de Lange & Fritsche, 2017). In particular, perceptual decision-making involves processing sensory information, which is evaluated and integrated according to the goal and the internal state of an individual, but also depending on the possible number of choices (Hauser & Salinas, 2014). While usually associated with irrational choices, emotions are in fact essential to guide cognitive processes and enable adaptive responses to the environment (Brosch et al., 2013). Over the last three decades, researchers in psychology (for a review, see Lerner, Li,
Valdesolo, & Kassam, 2015) and neurosciences (for a review, see Phelps, Lempert, & Sokol-Hessner, 2014) have investigated the impact of emotions on decision-making processes. However, neuroscience studies have mainly focused on the visual domain. Therefore, the neural bases of perceptual decision-making using affective auditory information remain to be investigated. Until now, fMRI studies involving explicit recognition of affective cues in voices have emphasized the role of frontal regions, such as the inferior frontal cortex. For instance, Brück and colleagues have revealed a stronger activation in the IFG when participants were explicitly decoding emotional prosody as compared to identifying phonetic or semantic aspects of speech (Brück et al., 2011). These results are in line with previous research showing a key role of the IFG in affective prosody decoding (Ethofer et al., 2006; Wildgruber et al., 2009). Furthermore, recent findings have highlighted the role of the IFG in the complexity of perceptual decision-making. The categorization (unbiased choice, 'A vs B') or the discrimination (biased choice, 'A vs non-A') of affective cues in voices indeed involves different subparts of the IFG, with the IFGtri involved in discrimination and the pars opercularis (IFGoper) involved in categorization, respectively (Dricu et al., 2017). Thus, it seems necessary to investigate in more detail the relationship between perceptual decision-making complexity and the role of the IFG in the explicit recognition of affects in human voices, as well as in the recognition of other primate affective vocalizations. Despite the knowledge that the PFC is strongly involved in decision-making (e.g. Brosch et al., 2013; Damasio, 1996), more research on affective decoding in voices and PFC areas is needed. With the emergence of fNIRS, a non-invasive technique to study brain hemodynamics (Boas et al., 2014) using the principle of tissue transillumination (Bright, 1831), new studies have extensively investigated the role of the PFC in emotional processing, highlighting its role in emotion regulation (Glotzbach et al., 2011) and emotion induction (Matsuo et al., 2003; Ohtani et al., 2005; Yang et al., 2007). Interestingly, Zhang and colleagues reported a stronger activation in the PFC and in the IFG during an affective decoding paradigm using human voices (Zhang et al., 2018). Similarly, a recent fNIRS study highlighted the modulation of IFG activity depending on the categorization or the discrimination of affects in auditory stimuli (Gruber et al., 2020). Hence, more investigations of PFC and IFG activations are necessary to improve our knowledge of affective decoding. Moreover, the fNIRS methodology seems particularly well suited to the exploration of frontal
regions in decision-making and emotional paradigms.

Facing choices is not specific to humans. Correctly identifying an affective signal in vocalizations is often a matter of life or death in the animal kingdom. In fact, by allowing species to react adaptively to a threatening or dangerous situation, these recognition mechanisms are essential for the fitness of individuals (Anderson & Adolphs, 2014; Filippi et al., 2017). For example, research on NHP (henceforth, primates), our closest relatives, has demonstrated the ability of great apes such as chimpanzees to modulate their vocalizations depending on whether they are victims or aggressors in agonistic interactions (Slocombe & Zuberbühler, 2005). Interestingly, it has been shown that chimpanzees produce different kinds of calls as a function of the severity of the aggression and of the presence or absence of an alpha male in the close environment (Slocombe & Zuberbühler, 2007; Slocombe, Townsend, & Zuberbühler, 2009). Similar results have been found in other non-human primates, with Gouzoules reporting specific vocalizations in macaques (Macaca mulatta) during conflicts, and their ability to distinguish the seriousness of a situation while listening to the victim's calls (Gouzoules, 1984). Although humans share with other primates the ability to correctly identify affective cues in vocalizations, allowing them to use the available information to make their choices, only a few studies have used a comparative approach to understand affective decoding mechanisms in humans using primate vocalizations. These studies have revealed promising results at both the cerebral and behavioural levels, highlighting the importance of phylogenetic proximity. For example, researchers have emphasized the role of the right IFG and the right OFC, part of the PFC, in the human ability to discriminate agonistic or affiliative content in chimpanzee screams. In comparison, human participants were unable to do the same for macaque vocalizations (Belin, Fecteau, et al., 2008; Fritz et al., 2018). Nevertheless, Linnankoski and colleagues have shown that both human adults and infants are able to recognize affective cues in macaque vocalizations using a categorization paradigm (Linnankoski et al., 1994). This last result points to the difference in complexity between discrimination and categorization tasks in humans, even when affective recognition concerns primate vocalizations. Overall, more controlled investigations in this domain are thus needed (Gruber & Grandjean, 2017).

The main aim of the present study was to test how the IFG and PFC regions are involved in the explicit decoding of primate vocalizations. How does task complexity modulate brain and behavioural responses? Is phylogenetic proximity key to a better understanding of such processes? To address these questions, the present study investigated human affective recognition processing in human and other primate vocalizations using cerebral and behavioural data. The participants performed categorization and discrimination tasks on affective contents (agonistic versus affiliative) in human, great ape (chimpanzee, bonobo) and monkey (rhesus macaque) vocalizations while their brain activity was recorded using fNIRS. We predicted that: i) according to the cognitive complexity hypothesis, the categorization task should involve more activation in the IFG and PFC than discrimination; ii) if a phylogenetic effect was at play, IFG and PFC activity would be modulated differently across human, great ape and monkey vocalizations; and iii) if frontal regions are necessary for cross-taxa recognition of affects, neural activity in the IFG and PFC should be related to the participants' performance.

2.3. Materials and methods

2.3.1. Participants

Thirty healthy volunteers (12 males; mean age 25.06 years, SD = 5.09, age range 20-36) took part in the experiment. The participants reported normal hearing abilities and normal or corrected-to-normal vision. No participant presented a neurological or psychiatric history, or a hearing impairment. All participants gave informed and written consent for their participation in accordance with the ethical and data security guidelines of the University of Geneva. The study was approved by the CCER.

2.3.2. Vocalizations

Ninety-six vocalizations of four primate species (human, chimpanzee, bonobo, rhesus macaque) in agonistic and affiliative contexts were used as stimuli. The human voices, obtained from the Montreal Affective Voices (Belin, Fillion-Bilodeau, et al., 2008b), were denoted as expressing a happy, angry or fearful affect (non-linguistic affective bursts) produced by two male and two female actors. Vocalizations in corresponding contexts were selected for chimpanzees, bonobos and rhesus macaques in the form of affiliative calls (food grunts), threatening calls (aggressor in an agonistic context) and distress calls (victim in an agonistic context). For each species, 24 stimuli were selected containing single calls or call sequences produced by 6 to 8 different individuals in their social environment. All vocal stimuli were standardized to 750 milliseconds using PRAAT (www.praat.org) but were not normalized in order to preserve the naturalness of the sounds (Ferdenzi et al., 2013).

2.3.3. fNIRS acquisition

fNIRS data were acquired using the Octamon device (Artinis Medical Systems B.V., Elst, The Netherlands) at 10 Hz with 6 transmitters and 2 receivers (wavelengths of ±760 nm and ±850 nm) and an inter-probe distance of 3.5 cm. The headband holding the 8 channels was placed identically for all participants according to the 10-20 electroencephalogram (EEG) system (Jasper, 1958; Okamoto et al., 2004), using the FPZ axis as a landmark (see Figure 34). The probe locations in the Montreal Neurological Institute (MNI) space were estimated using the 3D coordinates extracted from 32 healthy participants (Vergotte et al., 2018). Hence, channels 1, 2, 7 and 8 were located on the IFGtri and channels 3, 4, 5 and 6 on the PFC.

Figure 34: Probe locations in MNI space, estimated using SPM12 software implemented in MatLab R2018b (www.fil.ion.ucl.ac.uk/spm/). Red and blue dots indicate transmitter and receiver positions, respectively. Yellow dots indicate the channel numbers.

2.3.4. Experimental procedure

Seated comfortably in front of a computer, participants listened to the vocalizations played binaurally through Sennheiser headphones at 70 dB SPL. Each of the 96 stimuli was repeated nine times across six separate blocks, leading to 864 trials following a randomization process. The overall experiment was structured in various layers (Figure 35). Testing blocks were task-specific, with participants having to perform either a categorization task (A versus B) or a discrimination task (A versus non-A) in a single block. Participants repeated a unique categorization block and a unique discrimination block three times each, resulting in six blocks in total. Each block was made of 12 mini-blocks, each separated by a break of 10 seconds. These comprised one unique mini-block per species (human, chimpanzee, bonobo and rhesus macaque), each repeated 3 times. Within each mini-block were 12 trials, containing four vocalizations from each of the three affective contexts (affiliative/happy; threatening/anger; fear) produced by a single species. The blocks, mini-blocks and stimuli were pseudo-randomly assigned for each participant to avoid more than two consecutive blocks, mini-blocks or stimuli from the same category.
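
As an illustration of the pseudo-randomization constraint just described, the short R sketch below reshuffles a set of labels until no category appears more than twice in a row; the function and variable names are illustrative and not taken from the original experiment code.

# Reshuffle labels until no category occurs more than 'max_run' times in a row.
pseudo_randomize <- function(labels, max_run = 2, max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    candidate <- sample(labels)
    if (max(rle(candidate)$lengths) <= max_run) return(candidate)
  }
  stop("No valid ordering found within the allowed number of tries.")
}

# Example: order the 12 mini-blocks of one testing block (4 species x 3 repetitions)
mini_blocks <- rep(c("human", "chimpanzee", "bonobo", "macaque"), times = 3)
pseudo_randomize(mini_blocks)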

At the beginning of each block, participants were instructed to identify the affective content of the vocalizations using a keyboard. For instance, the instructions for the categorization task could be “Affiliative – press M or Threatening – press Z or Distress – press space bar”. Similarly, the instructions for discrimination could be “Affiliative – press Z or other affect – press M”. The pressed keys were randomly assigned across blocks and participants. The participants had to press the key during the 2-second intervals (jittering of 400 ms) between each stimulus. If the participant did not respond during this interval, the next stimulus followed automatically.

Figure 35: Structure of the experiment, with each of the six blocks made of 12 mini-blocks, which in turn comprised 12 individual trials.

2.4. Analysis

2.4.1. Behavioural data

Raw behavioural data from all participants were analysed using a logistic regression in a GLMM with a binomial error distribution (correct answer – 1 or not – 0), with Species (human, chimpanzee, bonobo, and rhesus macaque), Tasks (categorization and discrimination), and Affects (affiliative, threat, and distress) as fixed factors, and participant IDs and order of the blocks as random factors. In order to test our hypotheses regarding the effects of phylogenetic distance and task complexity on participants' performance, we compared, using specific contrasts, the differences between Species and Affects within the categorization and the discrimination tasks. These contrasts were corrected with a Bonferroni correction (Pcorrected = .05/number of tests). Similarly, the participants' reaction times (correct answers only) were analysed using a GLMM with a Gaussian distribution with the same contrasts and analysis as for accuracy (see Supplementary Material).
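
A minimal sketch of this analysis in R, assuming lme4/emmeans syntax and hypothetical column names (correct, rt, species, task, affect, participant, block_order), is given below; it illustrates the type of model and contrasts described, not the exact original script.

library(lme4)
library(emmeans)

# Accuracy: logistic mixed model with random intercepts
fit_acc <- glmer(correct ~ species * task * affect +
                   (1 | participant) + (1 | block_order),
                 data = behav, family = binomial)

# Species contrasts within each task and affect, Bonferroni-corrected
contrast(emmeans(fit_acc, ~ species | task * affect),
         method = "pairwise", adjust = "bonferroni")

# Reaction times (correct trials only): Gaussian mixed model
fit_rt <- lmer(rt ~ species * task * affect +
                 (1 | participant) + (1 | block_order),
               data = subset(behav, correct == 1))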

2.4.2. Interaction between Participants’ Performance and Brain O2Hb changes

Interaction analyses were run using RStudio (RStudio Team, 2015, Boston, MA; http://www.rstudio.com/). To test whether the IFGtri and PFC activations facilitated the participants' affective recognition, we used the fNIRS data as a continuous predictor in the GLMM analysis of accuracy. To perform this interaction analysis, we only used accuracy data from the twenty participants included in the fNIRS analyses. The GLMM included Species (human, chimpanzee, bonobo and rhesus macaque), Tasks (discrimination and categorization), Affects (threat, distress and affiliative), and ROIs (IFGtri and PFC) as fixed factors, fNIRS data from the IFGtri and PFC as continuous predictors, and participant IDs and block order as random factors. To assess the variance explained by phylogeny within the frontal activations as well, we tested all slopes with the following contrast: human vs [great apes (chimpanzee and bonobo)] vs rhesus macaque.
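
A sketch of this brain-behaviour model, with the trial-wise O2Hb signal entered as a continuous predictor, could look as follows (lme4 assumed; column names hypothetical):

library(lme4)

# Accuracy as a function of frontal O2Hb, in interaction with the design factors.
# A full five-way interaction is heavy to fit; this is only a conceptual sketch.
fit_brain_behav <- glmer(correct ~ o2hb * species * task * affect * roi +
                           (1 | participant) + (1 | block_order),
                         data = nirs_behav, family = binomial)

# The slope of accuracy on o2hb within each cell can then be tested with the
# planned contrast: human vs great apes (chimpanzee, bonobo) vs rhesus macaque.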

2.4.3. fNIRS data

Ten participants out of 30 were excluded from the dataset due to poor signal quality (a large number of artefacts after filtering) or missing fNIRS data. A total of 20 participants were thus analysed in this study, in line with previous power analyses in fMRI (Desmond &
Glover, 2002) and with research using fNIRS to assess emotional processing in frontal areas (for a review, see Bendall et al., 2016). We performed the first-level analysis on all channels with MatLab 2018b (MathWorks, Natick, MA) using the SPM_fNIRS toolbox (Tak, Uga, Flandin, Dan, & Penny, 2016; https://www.nitrc.org/projects/spm_fnirs/) and homemade scripts.

Haemoglobin conversion and temporal pre-processing of O2Hb were carried out using the following procedure:
1. Haemoglobin concentration changes were calculated with the modified Beer-Lambert law (Delpy et al., 1988);
2. Motion artefacts were reduced using the movement artefact reduction algorithm (Scholkmann et al., 2010), based on moving standard deviation and spline interpolation;
3. Low-frequency confounds were reduced using a high-pass filter based on a discrete cosine transform with a cut-off frequency of 1/64 Hz (Friston et al., 2000);
4. Physiological and high-frequency noise such as vasomotion or heart beats, usually found in extra-cerebral blood flow, was removed using a low-pass filter based on the HRF (Friston et al., 2000);

5. O2Hb concentration changes were averaged between 4 and 12 seconds post stimulus onset on each trial to include the maximum peak amplitude of the HRF observed across participants.
As in fMRI, this method of analysis, which takes into account the slow hemodynamic time course of brain activity, is in line with previous literature using auditory stimuli in fNIRS (e.g. Lloyd-Fox et al., 2014).
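
Step 5 amounts to a simple windowed average per trial; a minimal R sketch (10 Hz sampling assumed, variable names illustrative) is:

# Average the filtered O2Hb signal between 4 and 12 s after a stimulus onset.
average_trial_o2hb <- function(signal, onset_sample, fs = 10, from = 4, to = 12) {
  idx <- (onset_sample + from * fs):(onset_sample + to * fs)
  mean(signal[idx], na.rm = TRUE)
}

# One value per trial for a given channel, given the onset samples in 'onsets'
trial_means <- sapply(onsets, average_trial_o2hb, signal = channel_o2hb)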

The second-level analysis was performed with RStudio using GLMMs with Species (human, chimpanzee, bonobo, rhesus macaque), Tasks (categorization versus discrimination), Affects (affiliative, threatening, distressful), ROIs (PFC versus IFGtri), and Hemispheres (right versus left), as well as their interactions, as fixed factors, with participant IDs and block orders as random factors. To assess the significant increase of explained variance for the main effects and their interactions, we systematically compared the different models (the more complex model against the less complex one, in which one fixed factor or one interaction was dropped) using chi-squared tests. This analysis was followed by contrasts, for which post-hoc correction for multiple comparisons was applied using a Bonferroni correction.
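
As an illustration of this model-comparison procedure, one possible sketch in R (lme4 assumed, hypothetical names) drops the highest-order interaction and tests the loss of fit with a chi-squared likelihood-ratio test:

library(lme4)

m_full <- lmer(o2hb ~ species * task * affect * roi * hemisphere +
                 (1 | participant) + (1 | block_order),
               data = nirs, REML = FALSE)

# Drop the five-way interaction and compare the two nested models
m_reduced <- update(m_full, . ~ . - species:task:affect:roi:hemisphere)
anova(m_reduced, m_full)   # chi-squared test of the dropped interaction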

2.5. Results

2.5.1. Accuracy

We investigated how perceptual decision-making complexity influenced the ability of human participants to recognize affective content in phylogenetically close or distant primate species (see Figure 36). Participants were able to correctly recognize most of the affective cues in the vocalizations of all species. They were unable to do so only in the discrimination task for threatening bonobo calls (12%). Moreover, human participants were better at discriminating human voices (threat = 76%, distress = 77%, affiliative = 83%), and distressful (65%) and threatening (63%) chimpanzee screams, followed by distressful bonobo (62%) and affiliative macaque screams (62%). Similar results were found in categorization, with the exception of macaque calls, for which participants were unable to categorize any affective cue.

Figure 36: Means and SE of human recognition of primate affective vocalizations for categorization (CAT) and discrimination (DIS) tasks and the different kinds of affective vocalizations. All contrasts were significant within each condition after Bonferroni correction with Pcorrected = .05/24=.002, excluding the following contrasts: chimpanzee vs macaque and bonobo vs macaque for affiliative cues and bonobo vs macaque for threatening contents in discrimination task (see Supplementary Material Table 5).

2.5.2. Interaction between Participants’ Performance and Brain O2Hb changes

All factors (Tasks, Species, Affects, ROIs, fNIRS data) contributed to a significant five-way interaction (χ2(54) = 660.5, p < 0.001).

We thus assessed more precisely how the affective contents modulated IFGtri and PFC activity across species vocalizations during the categorization and discrimination tasks. Note that similar patterns of performance were found for the PFC and the IFGtri. Thus, for clarity, only the results for the IFGtri are described here (Figure 37 – for the PFC, see Supplementary Material Figure 41). Contrasts testing the phylogenetic differences between the slopes for both IFGtri and PFC, with humans vs [great apes (chimpanzees and bonobos)] vs rhesus macaques within each Affect and Task, were significant at p < 0.001 (see Supplementary Material Table 7). The statistical significance of each slope is summarized in Table 4. Participants better discriminated agonistic (threat and distress) chimpanzee screams when the concentration changes of O2Hb increased in the IFGtri and PFC. In contrast, during the categorization task, the correct identification of all types of chimpanzee calls, as well as of affiliative macaque and agonistic bonobo vocalizations, was associated with a decrease of activity in frontal regions.

Figure 37: Interaction between participants’ accuracy and O2Hb concentration changes in IFGtri within each affect and species for (A) categorization and (B) discrimination. Confidence interval at 0.95.

Table 4: Statistical significance of the logistic regression slopes. The odds ratios quantify the strength of the association between two factors. If the slope is significant and the odds ratio < 1, the factors are negatively correlated (in bold); if the slope is significant and the odds ratio > 1, the factors are positively correlated (in bold italic). ** p < 0.01, * p < 0.05.

                Categorization                        Discrimination
              Threat     Distress   Affiliative    Threat     Distress   Affiliative
Bonobo        0.845 *    0.883 *    1.058          0.989      1.105      1.063
Chimpanzee    0.782 *    0.693 **   0.865 *        1.277 *    1.442 **   0.931
Human         1.017      1.128      1.110          0.984      0.886      1.016
Macaque       1.068      0.939      0.846 *        0.935      0.896      1.054
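
The odds ratios reported in Table 4 are simply exponentiated logistic-regression slopes; with a fitted glmer model such as the hypothetical sketch given earlier, they can be obtained as follows:

# Odds ratios and Wald confidence intervals for the fixed effects
exp(fixef(fit_brain_behav))
exp(confint(fit_brain_behav, method = "Wald"))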

2.5.3. fNIRS data

GLMM analyses revealed significant main effects of Tasks (χ2(1) = 130.91, p < 0.001), Hemispheres (χ2(1) = 2567.1, p < 0.001), and ROIs (χ2(1) = 8743.6, p < 0.001). We therefore only tested the interaction between these three main factors, which was significant (χ2(3) = 207.44, p < 0.001 – see Figure 38).

Contrasts in the three-way interaction revealed more activation in the bilateral IFGtri than in the bilateral PFC for the discrimination task compared to the categorization task (χ2(1) = 49.648, p < 0.001). Following this, more O2Hb was found for discrimination compared to categorization in the right IFGtri (χ2(1) = 5.8662, p < 0.01), left IFGtri (χ2(1) = 3.9285, p < 0.01), right PFC (χ2(1) = 95.3, p < 0.001) and left PFC (χ2(1) = 76.361, p < 0.001).

Figure 38: Means and SE of concentration changes of O2Hb (µM) in right and left PFC and IFGtri during the categorization and discrimination tasks performed by human participants on primate affective vocalizations. All contrasts were significant within each condition after Bonferroni correction with Pcorrected = .05/8 = .006. *** p < 0.001, ** p < 0.01.

2.6. Interim Discussion

The present study emphasized the different levels of complexity in decision-making processes underlying the human recognition of affects in human and non-human primate vocalizations.

In fact, we demonstrated that the left IFGtri and the right PFC were more strongly involved in the discrimination task than in the categorization task. Interestingly, and perhaps counter-intuitively, we had initially expected more activation in the IFGtri for the categorization task (unbiased choice) on the basis of the existing literature (Dricu et al., 2017). However, taking into account our behavioural results showing higher recognition performance in discrimination compared to categorization, we propose that more activity in the IFGtri was required to enable participants to perform better during the discrimination task. Conversely, in line with the cognitive complexity hypothesis, the analyses for the PFC revealed a stronger deactivation in the categorization task. These latter findings may be linked to changes in regional cerebral blood flow. Indeed, Matsukawa and collaborators showed that, during the passive viewing of emotional videos, PFC activity decreased in correlation with a reduction of facial skin blood flow (Matsukawa et al., 2018). Interestingly, these authors suggested that PFC activity might elicit an autonomic reaction with a vasoconstriction or a vasodilatation of cutaneous vessels. Along the same lines, George and collaborators demonstrated a stronger decrease of activity in the right PFC during the viewing of pleasant pictures, also associated with a reduction of frontal blood flow (George et al., 1995). A possibility is thus to extend the results of these visual studies to the decrease of activity in PFC regions observed here during affective auditory processing. Overall, the results highlight the distinct roles of the IFG and the PFC in evaluative judgment and decision tasks in affective recognition (Schirmer & Kotz, 2006; Wagner & Watson, 2010).

Was human recognition influenced by the affects and/or the species that produced the vocalizations? We did find an influence of these factors on behavioural responses and on the interaction between participants' performance and frontal activations. In fact, we demonstrated that
the correct categorization of agonistic cues in bonobo and chimpanzee vocalizations elicited a significant decrease of activity in the IFGtri and the PFC. These results might be related to an inhibition process enabling participants to reduce the high level of stress elicited by agonistic screams, i.e. automatic regulation. Frontal regions are indeed the brain areas most sensitive to stress exposure (Arnsten, 2009). Interestingly, a decrease of activation in frontal regions was also associated with better performance in the categorization task for affiliative chimpanzee and macaque vocalizations. On the contrary, in the discrimination task, agonistic chimpanzee screams were better identified when the level of activity in the IFGtri and PFC increased. These results highlight the involvement of distinct mechanisms in the categorization and discrimination tasks in cross-taxa recognition. For instance, in categorization (unbiased choice), possible inhibition processes elicited by agonistic cues would rely on a decrease of activation in frontal regions, whereas for the simple choice between A versus non-A, similar inhibition mechanisms would require an enhancement of activity in the IFG and PFC. The general absence of an interaction between frontal activations and behaviour for human voices might be explained by three different mechanisms. First, because affective voices are everywhere in our modern human societies (Belin, 2006), the correct recognition of affects may not necessarily involve particular frontal activations, owing to human expertise in human voice processing. Second, the involvement of the IFG has often been demonstrated in the literature for the recognition of emotional voices contrasted with neutral ones (e.g. Frühholz et al., 2012; Frühholz & Grandjean, 2013; Gruber et al., 2020; Sander et al., 2005; Zhang et al., 2018). Yet, in our study, we did not include such stimuli, instead comparing cerebral activations across affective contents. This difference in our experimental paradigm may have led to the absence of an interaction between the hemodynamic response in the frontal regions and emotional recognition in human voices. Third, the IFG encompasses three neuroanatomical and functional subparts: the pars triangularis, pars orbitalis and pars opercularis (Cai & Leung, 2011); the IFGtri may require the recognition of infrequent vocalizations expressed by evolutionarily close species in order to be modulated. Following this, the phylogenetic gap of 25-33 million years between rhesus macaques and the Hominidae branch might explain the lack of results for this monkey species. Furthermore, performance on the macaque calls was very poor, and frontal activations would therefore not help to categorize them, because human participants were, at least in this experiment, unable to categorize these calls.

Finally, behavioural analyses revealed that human participants were able to recognize almost all affective cues in the vocalizations of all species in the discrimination task, with the exception of threatening bonobo calls. Participants may indeed have been biased by the peaceful nature of the bonobo species (Gruber & Clay, 2016), which is reflected in the acoustic features of their calls (Grawunder et al., 2018). For the categorization task, participants were able to recognize almost all affects in great ape vocalizations (threatening bonobo calls excluded) but were unable to do so for rhesus macaque screams. It seems that the lower level of complexity involved in discrimination processing, compared to the more complex categorization mechanisms, allowed participants to discriminate more accurately the affective vocalizations of all primates, including species at large phylogenetic distances such as macaque monkeys.

Overall, these findings demonstrate the cerebral and behavioural processes at play in the human recognition of affective cues in primate vocalizations. Decision-making complexity, phylogeny, behaviour and acoustic proximity seem to be four essential markers to consider in further studies on cross-taxa recognition.

To conclude, in this study we demonstrated the difference in mechanisms between the categorization and discrimination tasks at both the behavioural and cerebral levels in frontal regions. We examined the relationship between frontal activity and the ability of humans to identify affective cues in the vocalizations of other primate species. These findings support the hypothesis of a pre-human origin of the cerebral and behavioural mechanisms of affective recognition, inherited from our common ancestor with other primates. It would be interesting to investigate the ontogenesis of these recognition abilities to better understand how, from exposure to the human voice, infants are or are not able to infer the affective qualities of other primate calls. It also seems essential for future research on perceptual decision-making to take this evolutionary aspect into consideration in order to obtain a complete overview of the crucial variables possibly influencing cross-taxa affective recognition in primates. Furthermore, our results highlight the importance of phylogenetic proximity in affective recognition processes. Additionally, to our knowledge, this study is the first to: i) distinguish categorization and discrimination processes in a neuroscientific experiment with a comparative perspective, and ii) assess the link between cross-taxa affective recognition and frontal activations in an fNIRS paradigm. Finally, these new findings will hopefully contribute to a better understanding of the origins of emotional processing and decision-making in humans.

2.7. Supplementary Material

Accuracy

Table 5: Main contrasts for the participants' accuracy, with associated degrees of freedom, chi-squared tests and p-values for the interaction Species x Affect x Task. All p-values are corrected for multiple comparisons using Bonferroni correction with Pcorrected = .002.

Contrasts                  Dfs   χ2       p-values
Categorization
  Threat
    Human > Chimpanzee      1    27.84    p < 0.001
    Human > Bonobo          1    506.96   p < 0.001
    Human > Macaque         1    363.35   p < 0.001
    Chimpanzee > Macaque    1    210.18   p < 0.001
    Bonobo > Macaque        1    28.06    p < 0.001
  Distress
    Human > Chimpanzee      1    84.84    p < 0.001
    Human > Bonobo          1    135.8    p < 0.001
    Human > Macaque         1    319.29   p < 0.001
    Chimpanzee > Macaque    1    90.67    p < 0.001
    Bonobo > Macaque        1    50.30    p < 0.001
  Affiliative
    Human > Chimpanzee      1    106.2    p < 0.001
    Human > Bonobo          1    91.62    p < 0.001
    Human > Macaque         1    287.96   p < 0.001
    Chimpanzee > Macaque    1    51.58    p < 0.001
    Bonobo > Macaque        1    64.27    p < 0.001
Discrimination
  Threat
    Human > Chimpanzee      1    42.65    p < 0.001
    Human > Bonobo          1    210.35   p < 0.001
    Human > Macaque         1    139.03   p < 0.001
    Chimpanzee > Macaque    1    31.20    p < 0.001
    Bonobo > Macaque        1    9.464    p < 0.01
  Distress
    Human > Chimpanzee      1    35.23    p < 0.001
    Human > Bonobo          1    64.30    p < 0.001
    Human > Macaque         1    133.29   p < 0.001
    Chimpanzee > Macaque    1    34.8     p < 0.001
    Bonobo > Macaque        1    14.75    p < 0.001
  Affiliative
    Human > Chimpanzee      1    152.76   p < 0.001
    Human > Bonobo          1    143.37   p < 0.001
    Human > Macaque         1    104.94   p < 0.001
    Chimpanzee > Macaque    1    5.77     p = 1
    Bonobo > Macaque        1    3.66     p < 0.1

Reaction time

Including only the participants' correct answers, we first tested the difference between the categorization and discrimination tasks for affects in human voices and primate vocalizations. GLMM analysis revealed shorter reaction times in affective discrimination compared to categorization for human (χ2(1) = 176.61, p < 0.001), chimpanzee (χ2(1) = 140.7, p < 0.001), bonobo (χ2(1) = 49.075, p < 0.001) and rhesus macaque (χ2(1) = 152.53, p < 0.001) vocalizations. For the subsequent contrasts testing the phylogenetic hypothesis, we found that participants were faster to categorize and discriminate human voices compared to chimpanzee (cat: χ2(1) = 334.16, p < 0.001; dis: χ2(1) = 393.95, p < 0.001), bonobo (cat: χ2(1) = 184.11, p < 0.001; dis: χ2(1) = 448.5, p < 0.001) and macaque (cat: χ2(1) = 276.86, p < 0.001; dis: χ2(1) = 315.03, p < 0.001) affective vocalizations. Further analyses also revealed faster discrimination of affective content in bonobo vocalizations compared to rhesus macaque calls (contrast bonobo vs rhesus macaque: χ2(1) = 10.811, p = 0.001).

Figure 39: Means and SE of human reaction times (milliseconds) to primate affective vocalizations for the categorization and discrimination tasks.

Finally, as for participants' accuracy, we investigated how the different affects, species and tasks influenced the reaction times of human recognition. Overall, participants needed less time to discriminate than to categorize affective contents in all species' vocalizations. Participants were also faster at recognizing affective cues in human voices (cat = 1080 ms, dis = 950 ms) than in chimpanzee (cat = 1210 ms, dis = 1100 ms), bonobo (cat = 1210 ms, dis = 980 ms) or macaque (cat = 1230 ms, dis = 930 ms) vocalizations. Statistics revealed differences in both tasks depending on species and affective cues. Participants were fastest to discriminate (in this order) human voices (threat = 950 ms, distress = 1000 ms, affiliative = 900 ms), affiliative macaque screams (1090 ms), distressful bonobo calls (1100 ms) and threatening chimpanzee vocalizations (1110 ms). For the categorization task, similar findings were observed, with some differences only for chimpanzee and bonobo vocalizations.

Figure 40: Means and SE of human reaction times (milliseconds) to primate affective vocalizations for the categorization and discrimination tasks and the different kinds of affective vocalizations. All contrasts were significant within each condition after Bonferroni correction with Pcorrected = .05/24 = .002, excluding the following contrasts: chimpanzee vs macaque and bonobo vs macaque for distressful cues in the discrimination task; and, in the categorization task, human vs chimpanzee, human vs bonobo, chimpanzee vs macaque and bonobo vs macaque for distressful cues, and chimpanzee vs macaque and bonobo vs macaque for affiliative contents.

Table 6: Main contrasts for the participants' reaction times, with associated degrees of freedom, chi-squared tests and p-values for the interaction Species x Affect x Task. All p-values are corrected for multiple comparisons using Bonferroni correction with Pcorrected = .002.

Contrasts                  Dfs   χ2       p-values
Categorization
  Threat
    Human > Chimpanzee      1    104.45   p < 0.001
    Human > Bonobo          1    23.44    p < 0.001
    Human > Macaque         1    124.26   p < 0.001
    Chimpanzee > Macaque    1    12.16    p < 0.001
    Bonobo > Macaque        1    12.11    p < 0.001
  Distress
    Human > Chimpanzee      1    5.63     p < 0.05
    Human > Bonobo          1    5.13     p < 0.05
    Human > Macaque         1    9.71     p < 0.001
    Chimpanzee > Macaque    1    0.98     p = 1
    Bonobo > Macaque        1    1.04     p = 1
  Affiliative
    Human > Chimpanzee      1    350.83   p < 0.001
    Human > Bonobo          1    378.44   p < 0.001
    Human > Macaque         1    200      p < 0.001
    Chimpanzee > Macaque    1    5.08     p < 0.05
    Bonobo > Macaque        1    6.77     p < 0.01
Discrimination
  Threat
    Human > Chimpanzee      1    118.3    p < 0.001
    Human > Bonobo          1    210.46   p < 0.001
    Human > Macaque         1    110.98   p < 0.001
    Chimpanzee > Macaque    1    0.05     p = 1
    Bonobo > Macaque        1    16.95    p < 0.001
  Distress
    Human > Chimpanzee      1    63.03    p < 0.001
    Human > Bonobo          1    36.36    p < 0.001
    Human > Macaque         1    78.21    p < 0.001
    Chimpanzee > Macaque    1    1.66     p = 1
    Bonobo > Macaque        1    8.66     p < 0.01
  Affiliative
    Human > Chimpanzee      1    239.25   p < 0.001
    Human > Bonobo          1    254.39   p < 0.001
    Human > Macaque         1    130.74   p < 0.001
    Chimpanzee > Macaque    1    16.46    p < 0.001
    Bonobo > Macaque        1    19.25    p < 0.001

Accuracy * fNIRS data

Table 7: Summary of contrasts testing phylogenetic differences, with humans vs [great apes (chimpanzees and bonobos)] vs rhesus macaques, between the slopes for the interaction Species x Affects x Tasks x fNIRS data for the IFGtri and PFC. All p-values are corrected for multiple comparisons using Bonferroni correction with Pcorrected = .05/24 = .002.

IFGtri              Threat                       Distress                     Affiliative
  Categorization    χ²(1) = 609.25, p < 0.001    χ²(1) = 795.73, p < 0.001    χ²(1) = 1170.5, p < 0.001
  Discrimination    χ²(1) = 456.57, p < 0.001    χ²(1) = 386.25, p < 0.001    χ²(1) = 499.69, p < 0.001
PFC
  Categorization    χ²(1) = 609.53, p < 0.001    χ²(1) = 797.88, p < 0.001    χ²(1) = 1166.4, p < 0.001
  Discrimination    χ²(1) = 454.9, p < 0.001     χ²(1) = 384.98, p < 0.001    χ²(1) = 501.36, p < 0.001

Figure 41: Interaction between participants’ accuracy and concentration changes of O2Hb within Species and Affects for PFC in (A) categorization and (B) discrimination. Confidence interval at 0.95.

Chapter 3.

Humans Recognize Affective Cues in Primate Vocalizations: Acoustic and Phylogenetic Perspectives

Coralie Debracque, Katie Slocombe, Zanna Clay, Didier Grandjean‡ and Thibaud Gruber‡

In preparation

3.1. Abstract

Humans are adept at extracting affective information from the vocalizations of humans and other animals. Current research has mainly focused on phylogenetic proximity to explain such abilities. However, it remains unclear whether the human recognition of affective cues in the vocalizations of other species is due to cross-taxa similarities between acoustic parameters, to the phylogenetic distances between species, or to a combination of both. Here, we investigated how humans recognize the affective content of the vocalizations of four primate species – humans, rhesus macaques, chimpanzees and bonobos – the latter two being equally phylogenetically distant from humans. Sixty-eight participants listened to 96 primate vocalizations from agonistic and affiliative contexts and were asked to categorize ('A vs B') or discriminate ('A vs non-A') them based on their affective content. Results showed that participants could reliably categorize and discriminate most of the affective vocal cues expressed by other primates, except for threat calls by bonobos and macaques. This perceptual capacity was strongly related to the acoustic proximity of these vocalizations to the human voice, with acoustic proximity appearing to facilitate affective recognition and attention toward specific calls. We found that chimpanzee vocalizations were acoustically closer to those of humans than to those of bonobos, suggesting a potential derived vocal evolution in the bonobo lineage. Overall, these findings support the hypothesis of a pre-human origin of affective recognition processing inherited from our common ancestor with other great apes. Furthermore, our results highlight for the first time the importance of explanations at both the phylogenetic and the acoustic parameter levels in these crucial mechanisms.

Keywords: categorization, discrimination, affect, vocalization, primate, acoustic, phylogeny

3.2. Introduction

Vocal affective communication is crucial for the emotional and attentional regulation of human social interactions (Grandjean et al., 2005; Sander et al., 2005; Schore & Schore, 2008). For instance, the modulation of prosodic features in human speech such as intonation or amplitude can convey subtle affective information to receivers (Grandjean, Bänziger, & Scherer, 2006; Scherer, 2003). Humans consistently recognize and evaluate the affective cues of others' vocal productions at different levels of complexity, as exemplified in studies of emotion categorization (unbiased choice; e.g. A versus B) and discrimination (biased choice; e.g. A versus non-A) (Dricu et al., 2017; Gruber et al., 2020). In particular, by attending to the acoustic parameters of a vocalization, the listener is able to subjectively attribute an affective state to the speaker, as well as any potentially referential content (Brunswick, 1956; Grandjean et al., 2006). These affective identification mechanisms serve to trigger adaptive behaviour such as approaching or avoiding the stimulus (Frijda, 1987, 2016; Gross, 1998; Nesse, 1990). Despite the adaptive value and importance of auditory affective processing abilities in humans, the evolutionary origins of these abilities remain to be investigated, and comparative research is a useful tool to help unravel this question. While central to social interactions in our own species, the adaptive behaviours underpinning emotion communication are also shared amongst animals. Over a century ago, Darwin (1872) hypothesized an evolutionary continuity between humans and other animals in the vocal expression of affective signals. Morton subsequently proposed a model of motivational structural rules to characterize the universal relationship between the structure of mammal and bird vocalizations and their affective contents (Morton, 1977, 1982). For instance, being able to correctly detect modulations in acoustic parameters as a cue to affective information is essential to evaluate the level of threat or danger (Anderson & Adolphs, 2014; Filippi et al., 2017). Research on NHP has demonstrated that chimpanzees can modulate their vocalizations depending on whether they are victims or aggressors in agonistic interactions (Slocombe & Zuberbühler, 2005), and can also discriminate aggression severity based on the acoustic properties of victim screams (Slocombe & Zuberbühler, 2007; Slocombe, Townsend, & Zuberbühler, 2009). This enables the receiver to approach or avoid the conflict, and to detect potential changes in the fabric of their social environment. Similar results have been found in macaques, which produce specific vocalizations during conflicts and can distinguish the gravity of a situation while listening to the victim's calls.
Thus, a macaque that listens to a victim's calls can react adaptively by approaching or avoiding the conflict (Gouzoules, 1984). Although accurate conveyance of emotional information in the auditory domain occurs within non-human species as described above, one way of examining the origins of our affective perceptual abilities is to focus on cross-species perception. Playback studies on cross-taxa recognition have indeed shown that humans can recognize the affective valence and arousal of heterospecific vocalizations. For example, they are able to correctly identify affective valence and arousal in Barbary macaque and chimpanzee calls but also in dog and cat vocalizations (Filippi et al., 2017; Scheumann et al., 2014; Belin et al., 2008). Several factors may explain this cross-specific recognition: current empirical research has revealed cross-taxa similarities in the acoustic conveyance of affect (Ross, Owren, & Zimmermann, 2009; Scheumann et al., 2014); moreover, these cues seem to be processed by homologous brain regions related to affective processing (Panksepp, 2011a; Panksepp & Burgdorf, 2003). With respect to pets, familiarity may also lead to a learning process in recognizing the affective states carried by their vocalizations. Thus far, most comparative research has focused on NHPs, including great apes, humans' closest living relatives. In fact, the ability to modulate the prosodic features of a vocal signal to express an affective content may be considered a homologous trait between humans and other primates inherited from our common ancestor (Filippi, 2016). A recent study has highlighted the importance of phylogenetic proximity in human recognition of affects. At the behavioural level, humans can discriminate the valence of chimpanzee vocalizations, such as agonistic screams and food-associated calls; however, they are unable to do so for rhesus macaque calls given in the same contexts (Fritz et al., 2018). Furthermore, fMRI measures within this study showed that neural activations were more similar when attending to chimpanzee and human vocalizations compared to macaque calls. Behaviourally, evidence for successful discrimination of macaque vocalizations remains mixed: Belin and colleagues (2008) found that participants could not accurately judge the valence of rhesus macaque calls, whereas Linnankoski and colleagues (1994) found that both human adults and infants could categorize affective cues in macaque vocalizations. Although the degree to which humans can reliably process the affective content of more distantly related primate species remains uncertain, it seems that humans perform better with more closely related chimpanzee calls (Kamiloğlu et al., 2020). However, research adopting a comparative affective approach remains limited (Gruber & Grandjean, 2017), and current findings, based on a limited sample of tested species, raise an empirical
and theoretical problem: homologous (shared) mechanisms of both affective perception and production may lead individuals of closely related species to produce acoustically similar calls and to be sensitive to similar acoustic cues in their affective vocalizations, allowing listeners to correctly interpret them. It is therefore unclear whether the human ability to recognize affective cues in the vocalizations of other species is due to cross-taxa similarities in acoustic parameters, to the phylogenetic distances between species, or to both, considering that phylogenetically close species are likely to share acoustic parameters. While humans can recognize both valence and arousal in non-human primate vocalizations, only a few studies have investigated the origin of these common mechanisms using the acoustic properties contained in other primate vocalizations (Briefer, 2012). Studies that investigate the relation between acoustic properties and human arousal ratings of NHP vocalizations are even scarcer (Filippi et al., 2017; Linnankoski et al., 1994; Scheumann et al., 2014). Nevertheless, since acoustic differences are influenced not only by the species that produced the vocalizations but also by the emotional valence of the calls (Belin, Fecteau, et al., 2008), it is important to take both sources of modulation into consideration when studying acoustic properties in primates. While claims about phylogenetic distance are so far likely confounded with acoustic similarity in existing studies, a possible way forward is to include our other equally phylogenetically close relatives, the bonobos (Gruber & Clay, 2016). Bonobos are a key species for testing this, as they are as closely related to humans as chimpanzees but are noted for their acoustic differences with the latter (Tuttle, 1993). In particular, bonobos may have experienced evolutionary changes in their vocal communication, in part due to a neoteny process: Grawunder and collaborators (2018) revealed that, due to their larynx morphology, bonobo calls are much higher in fundamental frequency than chimpanzee calls, even though both species live in similar environments. Overall, it is therefore unclear whether the human ability to recognize affective cues in the vocalizations of other species is mainly due to phylogenetic proximity between species or to proximity across acoustic parameters. To address this outstanding question, the present study aimed to investigate human affective recognition processing of human and NHP vocalizations, taking into account the combined influence of acoustic proximity and phylogenetic distance. For a better understanding of general recognition processes, participants performed categorization (A versus B) and discrimination (A versus non-A) tasks on affective contents expressed in threatening, distressful and affiliative contexts with human voices,
great ape (chimpanzee, bonobo) and monkey (rhesus macaque) vocalizations. We predicted that i) if different levels of cognitive complexity between categorization and discrimination were at play as with human voices (e.g. Dricu et al. 2017), participant should be better at discriminating than categorizing primate affective vocalizations; ii) if phylogenetic distance was the main determinant of performance, participants would be able to recognize affective cues in human voices and great ape vocalizations better than in monkey vocalizations (phylogenetic hypothesis); while iii) if acoustic similarity was the main determinant of performance, participants would perform best with the calls of species most acoustically similar to those of humans (acoustic hypothesis). Finally, iv) if both hypotheses explained a part of common variance, it was also possible that different parts of the variance would be related to acoustic versus phylogenetic factors in the participants’ behavioural results; hence, for this specific prediction we would be able to quantify these different parts of variance.

3.3. Materials and methods

3.3.1. Participants

Sixty-eight healthy volunteers (29 males; mean age 23.54 years, SD = 5.09, age range 20–37 years) took part in the experiment. The participants reported normal hearing abilities and normal or corrected-to-normal vision. No participant presented a neurological or psychiatric history or a hearing impairment. All participants gave written informed consent for their participation in accordance with the ethical and data security guidelines of the University of Geneva. The study was approved by the CCER.

3.3.2. Vocalizations

Ninety-six vocalizations of four primate species (human, chimpanzee, bonobo, rhesus macaque) in agonistic and affiliative contexts were used as stimuli. The human voices, obtained from the Montreal Affective Voices (Belin, Fillion-Bilodeau, et al., 2008), were non-linguistic expressions of happy, angry or fearful affect produced by two male and two female actors. Vocalizations in corresponding contexts were selected for the chimpanzee, bonobo and rhesus macaque species in the form of affiliative calls (food grunts), threatening calls (aggressors in agonistic contexts) and distress calls (victims in agonistic contexts). For each species, 24 stimuli were selected containing single calls or call sequences produced by 6 to 8 different individuals. All vocal stimuli were standardized to 750 milliseconds using PRAAT (www.praat.org) but were not normalized for energy, to preserve the naturalness of the sounds (Ferdenzi et al., 2013).

3.3.3. Experimental procedure

Seated comfortably in front of a computer, participants listened to the vocalizations played binaurally through Sennheiser headphones at 70 dB SPL. Each of the 96 stimuli was repeated nine times across six separate blocks, leading to 864 trials following a randomization process. The overall experiment was structured in several layers (Figure 42). Testing blocks were task-specific: in a single block, participants performed either a categorization task (A versus B) or a discrimination task (A versus non-A). Participants completed three categorization blocks and three discrimination blocks, resulting in six blocks in total. Each block was made of 12 mini-blocks, each separated by a break of 10 seconds. These 12 mini-blocks comprised one mini-block per species (human, chimpanzee, bonobo and rhesus macaque), each repeated three times. Each mini-block contained 12 trials, consisting of four vocalizations from each of the three contexts (affiliative/happy; threatening/anger; fear) produced by a single species. The blocks, mini-blocks and stimuli were pseudo-randomly assigned for each participant to avoid more than two consecutive blocks, mini-blocks or stimuli from the same category.
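As an illustration of this pseudo-randomization constraint, the following R sketch (not the original experiment code; function and variable names are ours) reshuffles a set of labels until no more than two consecutive items share the same category:

```r
# Minimal sketch: reshuffle until no category appears more than 'max_run'
# times in a row. 'rle' returns the lengths of runs of identical labels.
pseudo_randomize <- function(labels, max_run = 2, max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    candidate <- sample(labels)
    if (max(rle(candidate)$lengths) <= max_run) return(candidate)
  }
  stop("No valid ordering found within max_tries")
}

# Example: order the four species mini-blocks (three repetitions each) within a block
mini_blocks <- rep(c("human", "chimpanzee", "bonobo", "macaque"), each = 3)
set.seed(1)
pseudo_randomize(mini_blocks)
```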

At the beginning of each block, participants were instructed to identify the affective content of the vocalizations using a keyboard. For instance, the instructions for the categorization task could be “Affiliative – press M, Threatening – press Z, Distress – press space bar”. Similarly, the instructions for discrimination could be “Affiliative – press Z, other affect – press M”. The response keys were randomly assigned across blocks and participants. Participants responded within a 2-second interval (jittered by 400 ms) between stimuli. If a participant did not respond during this interval, the next stimulus followed automatically.


Figure 42: Structure of the experiment, with each of the six blocks made of 12 mini-blocks, which in turn comprised 12 individual trials.

3.4. Analysis

3.4.1. Acoustic analyses

To quantify the impact of acoustic distances on human recognition of affective vocalizations of other primates, we extracted 88 acoustic parameters from all vocalizations using the extended Geneva Minimalistic Acoustic Parameter Set, defined as an optimal set of acoustic indicators for voice analysis (eGeMAPS; Eyben et al., 2016). This set of acoustical parameters was selected based on i) their potential to index affective physiological changes in voice production, ii) their proven value in former studies as well as their automatic extractability, and iii) their theoretical significance. Then, to assess the acoustic distance between the vocalizations of all species, we ran a General Discriminant Analysis (GDA) model. More precisely, we used the 88 acoustical parameters in a GDA in order to discriminate our stimuli based on the different species (human, chimpanzee, bonobo, and rhesus macaque). After excluding the acoustical variables with the highest correlations (r > .90) to avoid redundancy of acoustic parameters, we retained 16 acoustic parameters to discriminate species (see Supplementary material Table 8).
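A minimal R sketch of this redundancy-pruning step (assumed data layout: `feats` is a 96 x 88 data frame of eGeMAPS parameters; the discriminant analysis itself was run separately and is not reproduced here):

```r
# Drop one feature from every pair of acoustic parameters correlated above .90,
# keeping the remaining columns for the species discriminant analysis.
prune_correlated <- function(x, cutoff = 0.90) {
  r <- abs(cor(x))
  drop <- character(0)
  for (i in seq_len(ncol(r) - 1)) {
    for (j in seq(i + 1, ncol(r))) {
      if (r[i, j] > cutoff &&
          !(colnames(r)[i] %in% drop) && !(colnames(r)[j] %in% drop)) {
        drop <- c(drop, colnames(r)[j])   # greedily drop the second member of the pair
      }
    }
  }
  x[, setdiff(colnames(x), drop), drop = FALSE]
}

# feats_pruned <- prune_correlated(feats)   # 16 parameters were retained in the thesis
```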

We subsequently computed Mahalanobis distances for the 96 stimuli using the selected acoustical features that best discriminated the different species. A Mahalanobis distance is obtained from a generalized pattern analysis computing the distance of each vocalization from the centroids of the different species' vocalizations. This analysis allowed us to obtain an acoustical distance matrix used to test how these acoustical distances were differentially related to the different species. To investigate how acoustically close or distant a given vocalization of a particular species was to the other species, we then performed a GLMM on this acoustical distance matrix. This GLMM included the Mahalanobis distances as the dependent measure with two fixed factors: Observed-Species (the species that actually produced the vocalization) and Distance-Species (the species centroid used to compute the distance, for the same or for another species); and the vocalization IDs as a random factor.
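A minimal sketch of these two steps in R (names assumed; the thesis does not specify the software used for this particular model, so lme4 is used here purely for illustration): distances of every vocalization to each species centroid, reshaped to long format and modelled with Observed-Species, Distance-Species and their interaction.

```r
library(lme4)

# 'feats_pruned' = 96 x 16 acoustic features, 'species' = factor of true species labels
species_levels <- levels(species)
dist_mat <- sapply(species_levels, function(sp) {
  sub <- feats_pruned[species == sp, ]
  mahalanobis(feats_pruned, center = colMeans(sub), cov = cov(sub))
})

# Long format: one row per vocalization x distance-species pair
dist_long <- data.frame(
  voc_id           = rep(seq_len(nrow(dist_mat)), times = length(species_levels)),
  observed_species = rep(species, times = length(species_levels)),
  distance_species = rep(species_levels, each = nrow(dist_mat)),
  mahal            = as.vector(dist_mat)
)

# Interaction tested by comparing nested models with a chi-squared (likelihood ratio) test
m_full <- lmer(mahal ~ observed_species * distance_species + (1 | voc_id), data = dist_long)
m_red  <- lmer(mahal ~ observed_species + distance_species + (1 | voc_id), data = dist_long)
anova(m_red, m_full)
```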

3.4.2. Behavioural analyses

Accuracy

Behavioural data were analysed using RStudio (RStudio Team (2015), Inc., Boston, MA, url: http://www.rstudio.com/). To assess participant accuracy, we computed d-Prime values (derived from hit and false-alarm rates) in order to take into account possible response biases (e.g. the tendency to select one response more often than others). We ran GLMM models for the categorization and discrimination tasks separately on the calculated d-Prime values with Species (human, chimpanzee, bonobo, and rhesus macaque) and Affect (affiliative, threat, and distress) as fixed factors, and participant IDs as a random factor. To assess the significant increase of explained variance for the main effects and their interactions, we systematically compared the different GLMM models (each more complex model against a less complex one in which one fixed factor or an interaction was dropped) using chi-squared tests. We also ran a GLMM with a binomial distribution (correct answer: 1, incorrect: 0) with Species (human, chimpanzee, bonobo, and rhesus macaque), Task (categorization and discrimination), and Affect (affiliative, threat, and distress) as fixed factors, and participant IDs and block order as random factors; this model is provided in the Supplementary Material. Finally, we investigated whether participants' accuracy was above chance for the different species, across affects and tasks, using a GLMM with a binomial distribution with Species, Task and Affect as fixed factors and participant IDs as a random factor, also provided in the Supplementary Material.
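A minimal R sketch of the d-Prime computation and of the nested-model comparison (function and data-frame names are ours, not the thesis code; `dp` is assumed to hold one d-Prime value per participant x species x affect cell for a given task):

```r
library(lme4)

# d-prime from hit and false-alarm counts, with a log-linear correction so that
# rates of exactly 0 or 1 do not produce infinite z-scores.
dprime <- function(n_hit, n_signal, n_fa, n_noise) {
  h  <- (n_hit + 0.5) / (n_signal + 1)
  fa <- (n_fa  + 0.5) / (n_noise  + 1)
  qnorm(h) - qnorm(fa)
}
dprime(20, 24, 6, 48)   # e.g. 20/24 hits and 6/48 false alarms -> d' of about 2

# Species x Affect interaction tested by comparing nested mixed models
full    <- lmer(d ~ species * affect + (1 | participant), data = dp)
reduced <- lmer(d ~ species + affect + (1 | participant), data = dp)
anova(reduced, full)    # chi-squared (likelihood ratio) test
```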


3.4.3. Interaction between Behaviour and Acoustic Similarity

Testing the parts of variance related to acoustic and phylogenetic distances

To test whether the acoustic distance between human voices and NHP vocalizations facilitated affective recognition, we used the Mahalanobis distances as a continuous predictor in GLMM analyses of participants' accuracy. The GLMMs included Species (human, chimpanzee, bonobo, and rhesus macaque), Task (discrimination, categorization), and Affect (affiliative, threat and distress) as fixed factors, the Mahalanobis distances computed from the centroids of the different species as a continuous predictor, and participant IDs and block order as random factors. To quantify the specific variance explained by i) the phylogeny and ii) the acoustic distances, we computed the explained variance for the two GLMM models taking into account, respectively, the Mahalanobis acoustic distances and the phylogenetic distances. We also tested how acoustic distances impacted accuracy for the different affective states, taking into account the Task factor.
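A minimal sketch of this variance-partitioning logic in R (assumed trial-level data frame `acc` with columns `correct` [0/1], `species`, `task`, `affect`, `mahal_dist`, `participant` and `block`; the formulas are illustrative rather than the exact specification used in the thesis):

```r
library(lme4)

base     <- glmer(correct ~ task * affect + (1 | participant) + (1 | block),
                  data = acc, family = binomial)
phylo    <- update(base, . ~ . + species)                 # phylogenetic factor only
acoustic <- update(base, . ~ . + mahal_dist)              # acoustic distance only
both     <- update(base, . ~ . + species * mahal_dist)    # combined, with interaction

anova(base, phylo)       # variance gained from Species alone
anova(base, acoustic)    # variance gained from acoustic distance alone
anova(acoustic, both)    # does Species still add variance once distance is modelled?
```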

3.5. Results

3.5.1. Acoustic analyses

The GDA allowed us to compute Mahalanobis distances for all stimuli as a function of Species (Figure 43). A GLMM on these Mahalanobis distances revealed a significant interaction between the Observed-Species and Distance-Species factors (χ2(9) = 516.22, p < 0.001). Chimpanzee vocalizations were significantly closer in acoustic distance to human vocalizations than bonobo vocalizations (χ2(1) = 14.41, p < 0.001), and bonobo vocalizations were closer to human vocalizations than macaque calls (χ2(1) = 15.09, p < 0.001). All pairwise comparisons of distances from human vocalizations were significant after Bonferroni correction (Pcorrected = .05/6 = .008). Mahalanobis distances from the chimpanzee, bonobo and macaque centroids and other pairwise comparisons are reported in the Supplementary Material (Figure 46 and Table 9).


Figure 42: Boxplot of Mahalanobis distances from the GLMM model for the 96 vocalizations representing acoustic distances to human voice compared to the other species vocalizations. Higher values represent greater acoustic distances. (***= p<.001).

3.5.2. Behavioural results

We performed GLMMs on accuracy (i.e. d-Prime values). Chance-level and frequency analyses are provided in the Supplementary Material (Tables 10 and 11, respectively, and Figure 47).

Testing the phylogenetic hypothesis on d-Prime values

To test the effects of phylogenetic distance, we performed contrasts of interest on the Species factor (e.g. human > chimpanzee = bonobo > macaque), taking into account the other fixed and random factors and their interactions, on participants' accuracy based on d-Prime values for the Discrimination and Categorization tasks (Figures 43 and 44). GLMM model comparison revealed a significant interaction between the Species and Affect factors (Discrimination: χ2(6) = 77.07, p < 0.001; Categorization: χ2(6) = 167.96, p < 0.001; see Figure S2). Contrast analyses of interest revealed that d-Prime values were higher for human compared to chimpanzee vocalizations (Discrimination: χ2(1) = 103.34, p < 0.001; Categorization: χ2(1) = 499.17, p < 0.001); they were also higher for chimpanzee versus bonobo (Discrimination: χ2(1) = 30.86, p < 0.001; Categorization: χ2(1) = 89.64, p < 0.001); and higher for bonobo versus macaque (Discrimination: χ2(1) = 7.36, p < 0.0067; Categorization: χ2(1) = 26.89, p < 0.001). All other contrasts are reported in the Supplementary Material Table 12. Note that all contrasts were corrected for multiple comparisons (Bonferroni correction: Pcorrected = .05/6 = .008).

Figure 43: Boxplot of d-Prime values for the Discrimination task from the GLMM model for the different species vocalizations. Higher values represent greater accuracy (***: p<.001; **: p<.01).

Figure 44: Boxplot of d-Prime values for the Categorization task from the GLMM model for the different species vocalizations. Higher values represent greater accuracy (***: p<.001).


3.5.3. Interaction between Behaviour and Acoustic Similarity

In order to test our predictions, we performed statistical model comparisons for Species and the Mahalanobis distances, including the other main effects and the interactions with Task and Affect. Adding the factor Species, related to our phylogenetic hypothesis, significantly increased the explained variance in accuracy (χ2(1) = 5110.1, p < 0.001). The main effect of the Mahalanobis distances was also significant (χ2(1) = 1586.6, p < 0.001). When we tested the explained variance of Species while taking into account the Mahalanobis distances in our model (including the interactions), the analysis also revealed a significant effect of Species (χ2(3) = 2974, p < 0.001); the same was true for the model including Species when we added the Mahalanobis distances (χ2(1) = 40.56, p < 0.001). Furthermore, the specific interaction of Species and Mahalanobis distances was also significant (χ2(1) = 627.75, p < 0.001). Overall, these results show that each factor, as well as their interaction, explained a part of the variance, underlining separate and combined effects. In addition, to assess the statistical significance of the logistic regression slopes, we calculated odds ratios (OR) to quantify the strength of the association between our two factors. If the odds ratio is < 1, factors are negatively correlated; if the odds ratio is > 1, factors are positively correlated. We found that the Mahalanobis distances influenced the recognition rates of the calls differently depending on the task. In the case of Discrimination, the Mahalanobis distances only influenced the recognition of human (OR = 1.655, p < 0.01) and bonobo (OR = 0.77, p < 0.05) vocalizations. For humans, the higher the distance, the higher the recognition, while for bonobos, the lower the distance, the higher the recognition. In contrast, Categorization of all species was influenced by the Mahalanobis distances (humans: OR = 2.359, p < 0.001; chimpanzees: OR = 1.619, p < 0.001; bonobos: OR = 0.686, p < 0.001; macaques: OR = 1.648, p < 0.001). For humans, chimpanzees and macaques, the higher the distance, the higher the recognition, while for bonobos, the lower the distance, the higher the recognition (Figure 45).
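As a sketch of how such slope odds ratios can be obtained in R (illustrative only; data frame and column names are assumed, and with an interaction model the exponentiated terms express the reference species' slope and how each other species' slope differs from it):

```r
library(lme4)

fit_cat <- glmer(correct ~ species * mahal_dist + (1 | participant),
                 data = subset(acc, task == "categorization"), family = binomial)

# Exponentiated fixed effects give odds ratios; Wald 95% confidence intervals
or <- exp(fixef(fit_cat))
ci <- exp(confint(fit_cat, parm = "beta_", method = "Wald"))
cbind(OR = or, ci)
```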


Figure 45: Recognition rate (in percent) as a function of Mahalanobis distances for (A) Discrimination and (B) Categorization. Confidence interval at 0.95.

3.6. Interim Discussion

In this study, we investigated the role of acoustic parameters and phylogenetic proximity in the human recognition of vocal affects in primate vocalizations within a categorization and discrimination paradigm. Our primary question was whether acoustic factors, phylogenetic factors, or a combination of both best explained participants' behavioural responses to NHP calls. We also aimed to analyse whether such responses depended on the task considered (discrimination or categorization). As our participants' results in discrimination and categorization replicated studies that only included human stimuli (e.g. Dricu et al., 2017), we focus here only on the acoustic and phylogenetic aspects of our results.

The factors extracted in our GDA analyses revealed the crucial role of specific acoustic features, such as spectral, frequency and loudness parameters, in distinguishing the affective vocalizations expressed by different primate species. Following this, our analysis of Mahalanobis distances showed that human vocalizations were acoustically closest to chimpanzee vocalizations, but not to bonobo calls, despite chimpanzees and bonobos being equally phylogenetically distant from humans. In fact, bonobo vocalizations, and less surprisingly macaque calls, were not only distant from human calls but also from chimpanzee calls. This result is in line with current evidence that, despite their genetic proximity, chimpanzees and bonobos have known behavioural (Gruber & Clay, 2016), neurological (Staes et al., 2018) and morphological differences, including a shorter larynx for bonobos, which drives a higher F0 in their vocalizations (Grawunder et al., 2018). These observed acoustical differences had direct consequences on our participants' behaviour in our experimental paradigm.

The acoustic proximity of NHP vocalizations to the human voice was indeed reflected in the participants' ability to categorize and discriminate affective contents best in human voices, followed by chimpanzee and bonobo vocalizations, and finally by rhesus macaque calls. Interestingly, and in contrast to previous research (Fritz et al., 2018; Scheumann et al., 2014; Scheumann, Hasting, Zimmermann, & Kotz, 2017; Belin, Fecteau, et al., 2008), while overall worse than with other species, our participants were capable of identifying affiliative cues in rhesus macaque vocalizations in the discrimination task. This result is in line with the findings of Linnankoski and colleagues, who showed that human infants and adults were able to categorize and to discriminate rhesus macaque vocalizations (Linnankoski et al., 1994). The difference in findings may be due to the task required of the participants. Both our study and that of Linnankoski and colleagues used a forced-choice method (the use of two or more specific response options) to identify affective cues, whereas others used Likert response scales. While our findings generally highlight performance according to the Hominidae phylogenetic tree, we also found differential performance for the detection of chimpanzee and bonobo vocalizations. Overall, chimpanzee vocalizations were closer to human voices, and participants were more successful in identifying them compared to rhesus macaque and even bonobo calls. This finding is particularly important because it underlines that phylogenetic proximity is not the only factor at the root of interspecific recognition.

Beyond simple explanations relying only on acoustic or phylogenetic aspects to the exclusion of the other, the aim of our study was also to assess how much the acoustic proximity between affective vocalizations produced by human and NHP species would influence the perception of emotional cues by humans, and how this interacted with phylogeny. Our last analyses thus sought to address how the acoustic similarity to human vocalizations and the phylogenetic distance influenced the recognition of the emotional content of the calls in both human and NHP vocalizations. In particular, we found that both factors individually explained some of the variance recorded, as did their interaction, showing that they all contributed to a different aspect of emotion recognition. In addition, the acoustical distance to human vocalizations also influenced the recognition rate of the various calls. This was more crucial in the categorization task than in the discrimination one, emphasizing the influence of choice complexity in recognition mechanisms. More complex decisions would indeed require additional processes, such as the evaluation of the acoustic parameters of the call. Moreover, further analyses in the affective categorization task revealed that the participants' performance increased with acoustic distance for most species (apart from bonobos). This somewhat unexpected result may possibly be explained by an oddball effect induced by listening to infrequent acoustic features. A recent study indeed demonstrated, at the brain level, an involuntary attention switch from familiar to novel stimuli in the human perception of tree-shrew calls (a primate-like species) compared to the vocalizations of more familiar species such as dogs (Scheumann et al., 2017). In our case, it may be that unfamiliar human, chimpanzee and macaque vocalizations triggered more sustained attention in our participants, leading them to categorize calls more often into the correct category. On the contrary, participants more accurately categorized and discriminated affective contents in bonobo calls when the acoustic features were closer to the ones involved in the human voice. These results could be explained both by the higher F0 of bonobo vocalizations and by the unfamiliarity of naive participants with this species. In fact, bonobo calls were most often verbally pointed out by participants as the most unusual. Hence, future work will need to disentangle the effect of familiarity from potential acoustic parameters. Overall, our results suggest the existence of different mechanisms for the affective categorization and discrimination of human, chimpanzee and macaque vocalizations compared to more unfamiliar bonobo calls.

To conclude, in this study we demonstrated the ability of humans to categorize and discriminate affective cues in other primate species' vocalizations. Thus, our findings support the hypothesis of a pre-human origin of affective recognition mechanisms, innate in humans and inherited from our common ancestor with other primates. Furthermore, for the first time, our results highlight the interaction of both phylogenetic and acoustic similarity in these crucial mechanisms, by using the acoustic distance between four primate species to assess its impact on affective vocalization recognition. Finally, the inclusion of bonobo vocalizations allowed us to disentangle phylogeny from acoustic factors, underlining once again the idiosyncratic evolutionary pathway on which bonobos have embarked. Future work should take into consideration familiarity with the tested species, as well as aim to obtain an overview of the interactions between the variables possibly influencing affective recognition. It would also be interesting to explore the neural correlates associated with these phylogenetic and acoustic parameters. Finally, these new findings will hopefully contribute to a better understanding of the origins of emotional processing in humans.

3.7. Supplementary Material

General Discriminant Analysis (GDA) and Exploratory Factor Analysis (EFA)

The GDA revealed that 16 acoustical parameters are crucial for discriminating the vocalizations of the different species. An EFA on these parameters showed that most of the variance was explained by Factor 1 (27.14%) and Factor 2 (21.63%), followed by Factor 3 (18.99%), Factor 4 (17.57%) and Factor 5 (16.72%). The highest loadings were, for Factor 1, SpectralFlux_sma3_amean (r = 0.723), EquivalentSoundLevel_dBp (r = 0.916) and Loudness_sma3_amean (r = 0.869); for Factor 2, F2bandwidth_sma3nz_amean (r = 0.789); for Factor 3, LogRelF0_sma3nz_amean (r = 0.804) and ShimmerLocaldB_sma3nz_amean (r = -0.703); and finally, for Factor 5, SlopeV0-500_sma3nz_amean (r = 0.891) (see Table 8).
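A minimal R sketch of such an exploratory factor analysis on the retained parameters (illustrative; `feats_pruned` and the choice of five factors follow the description above, while the extraction and rotation methods are assumptions since the thesis does not state them):

```r
# Exploratory factor analysis on the 16 retained acoustic parameters
efa <- factanal(x = feats_pruned, factors = 5, rotation = "varimax")

# Show only the strong loadings, comparable to the highlighted values in Table 8
print(efa$loadings, cutoff = 0.70)
```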

Table 8: EFA summary on the acoustical features selected from the GDA and their loadings for the 5 factors discriminating the species across affective contents. Highlighted correlations are > .70.


Mahalanobis distances

Figure 46: Boxplots of Mahalanobis distances from the GLMM model for the 96 vocalizations representing acoustic distances for each species compared to the others. Higher values represent greater distances. All pairwise comparisons were significant after Bonferroni correction (Pcorrected = .05/6 = .008), excluding bonobo vs human and bonobo vs macaque from chimpanzee vocalizations; chimpanzee vs human/macaque and human vs macaque from bonobo vocalizations; and bonobo vs chimpanzee from macaque vocalizations (Table 9).

Table 9: Significant main contrasts for the Mahalanobis distances analysis on all 96 vocalizations, with chi-squared tests and p-values, for the interaction of the Observed-Species and Distance-Species factors (human, chimpanzee, bonobo, rhesus macaque). All p-values are corrected for multiple comparisons using Bonferroni correction (Pcorrected = .05/6 = .008); non-significant contrasts are shown in italics.

Distance from Human voices:
Human vs Chimpanzee        χ2 = 17.01     p < 0.001
Human vs Bonobo            χ2 = 62.74     p < 0.001
Human vs Macaque           χ2 = 139.36    p < 0.001
Bonobo vs Macaque          χ2 = 15.09     p < 0.001
Bonobo vs Chimpanzee       χ2 = 14.41     p < 0.001
Chimpanzee vs Macaque      χ2 = 58.99     p < 0.001

Distance from Bonobo vocalizations:
Bonobo vs Human            χ2 = 43.17     p < 0.001
Bonobo vs Macaque          χ2 = 36.16     p < 0.001
Bonobo vs Chimpanzee       χ2 = 30.99     p < 0.001
Chimpanzee vs Human        χ2 = 1.01      p = 0.315
Chimpanzee vs Macaque      χ2 = 0.2       p = 0.655
Human vs Macaque           χ2 = 0.31      p = 0.577

Distance from Chimpanzee vocalizations:
Chimpanzee vs Human        χ2 = 31.94     p < 0.001
Chimpanzee vs Macaque      χ2 = 100.76    p < 0.001
Chimpanzee vs Bonobo       χ2 = 71.3      p < 0.001
Bonobo vs Human            χ2 = 7.8       p < 0.01
Bonobo vs Macaque          χ2 = 2.54      p = 0.111
Human vs Macaque           χ2 = 19.24     p < 0.001

Distance from Macaque vocalizations:
Macaque vs Human           χ2 = 81.59     p < 0.001
Macaque vs Chimpanzee      χ2 = 32.93     p < 0.001
Macaque vs Bonobo          χ2 = 21.08     p < 0.001
Bonobo vs Human            χ2 = 19.73     p < 0.001
Bonobo vs Chimpanzee       χ2 = 1.32      p = 0.251
Chimpanzee vs Human        χ2 = 10.85     p < 0.001

Accuracy

Testing whether participants’ accuracy was above chance level

Table 10: Summary of the χ² values in the GLMMs testing if participants’ accuracy was above the chance level for each species (human, chimpanzee, bonobo and macaque) depending on the affective content of the vocalizations (threat, distress and affiliative) and the recognition task (categorization [Cat] and discrimination [Dis]). *** p<0.001; ** p<0.01; * p<0.05; # p<0.06.

              Threat                  Distress                Affiliative
              Cat         Dis         Cat         Dis         Cat         Dis
Human         4.337 ***   8.828 ***   1.942 #     7.444 ***   -0.78       14.72 ***
Chimpanzee    -3.386      5.257 ***   -2.143      3.653 ***   -4.047      1.926 #
Bonobo        -12.37      -2.409      -2.15       4.049 ***   -3.049      2.626 **
Macaque       -5.528      0.683       -6.336      1.21        -5.814      4.211 ***

Testing accuracy on frequency analysis

This analysis revealed the same pattern of performance between the affective categorization and discrimination tasks. The analyses revealed better performance in affective categorization (Cat) and discrimination (Dis), respectively, for human vocalizations compared to chimpanzee (Cat: χ2(1) = 995.48, p < 0.001; Dis: χ2(1) = 513.59, p < 0.0001), bonobo (Cat: χ2(1) = 2175.7, p < 0.001; Dis: χ2(1) = 1163.2, p < 0.001), and macaque vocalizations (Cat: χ2(1) = 2848.5, p < 0.001; Dis: χ2(1) = 1178.3, p < 0.001). We also predicted that humans would better categorize and discriminate chimpanzee and bonobo vocalizations compared to the more phylogenetically distant macaque vocalizations. These contrasts revealed significant effects when comparing chimpanzee to macaque for both categorization (χ2(1) = 611.54, p < 0.001) and discrimination (χ2(1) = 160.73, p < 0.001). The contrast was also significant between bonobos and macaques for categorization (χ2(1) = 67.19, p < 0.001) but not for discrimination (χ2(1) < 1).

Table 11: Statistical values for the main contrasts for the participants' accuracy, with degrees of freedom, chi-squared tests and p-values, for the interaction species (human, chimpanzee, bonobo, rhesus macaque) x affect (affiliative, threat, distress) x task (discrimination, categorization). All p-values are corrected for multiple comparisons using Bonferroni correction with Pcorrected = .002; non-significant contrasts are written in italics.

Categorization

Affiliative
Human > Chimpanzee       df = 1    χ2 = 670.3     p < 0.001
Human > Bonobo           df = 1    χ2 = 626.1     p < 0.001
Human > Macaque          df = 1    χ2 = 944.33    p < 0.001
Chimpanzee > Macaque     df = 1    χ2 = 34.68     p < 0.001
Bonobo > Macaque         df = 1    χ2 = 48.43     p < 0.001
Bonobo > Chimpanzee      df = 1    χ2 = 0.92      p = 0.336

Threat
Human > Chimpanzee       df = 1    χ2 = 176.88    p < 0.001
Human > Bonobo           df = 1    χ2 = 1415      p < 0.001
Human > Macaque          df = 1    χ2 = 1167.3    p < 0.001
Chimpanzee > Macaque     df = 1    χ2 = 539.48    p < 0.001
Bonobo > Macaque         df = 1    χ2 = 34.65     p < 0.001
Bonobo > Chimpanzee      df = 1    χ2 = 702.3     p < 0.001

Distress
Human > Chimpanzee       df = 1    χ2 = 221.56    p < 0.001
Human > Bonobo           df = 1    χ2 = 309.9     p < 0.001
Human > Macaque          df = 1    χ2 = 741.21    p < 0.001
Chimpanzee > Macaque     df = 1    χ2 = 181.16    p < 0.001
Bonobo > Macaque         df = 1    χ2 = 114.75    p < 0.001
Bonobo > Chimpanzee      df = 1    χ2 = 4.63      p = 0.031

Discrimination

Affiliative
Human > Chimpanzee       df = 1    χ2 = 464.86    p < 0.001
Human > Bonobo           df = 1    χ2 = 440.98    p < 0.001
Human > Macaque          df = 1    χ2 = 484.67    p < 0.001
Chimpanzee > Macaque     df = 1    χ2 = 0.32      p = 0.016
Bonobo > Macaque         df = 1    χ2 = 1.64      p = 0.056
Bonobo > Chimpanzee      df = 1    χ2 = 0.36      p = 0.541

Threat
Human > Chimpanzee       df = 1    χ2 = 88.2      p < 0.001
Human > Bonobo           df = 1    χ2 = 627.74    p < 0.001
Human > Macaque          df = 1    χ2 = 426.31    p < 0.001
Chimpanzee > Macaque     df = 1    χ2 = 140.98    p < 0.001
Bonobo > Macaque         df = 1    χ2 = 24.2      p < 0.001
Bonobo > Chimpanzee      df = 1    χ2 = 270.51    p < 0.001

Distress
Human > Chimpanzee       df = 1    χ2 = 59.24     p < 0.001
Human > Bonobo           df = 1    χ2 = 162.83    p < 0.001
Human > Macaque          df = 1    χ2 = 276.37    p < 0.001
Chimpanzee > Macaque     df = 1    χ2 = 86.04     p < 0.001
Bonobo > Macaque         df = 1    χ2 = 16.9      p < 0.001
Bonobo > Chimpanzee      df = 1    χ2 = 26.78     p < 0.001

Figure 47: Boxplots representing accuracy as response frequencies (top) and d-Prime values (bottom) for discrimination (left) and categorization (right) of all species and all affective contents.


Table 12: Statistical values of the d-Prime analysis for the main contrasts for the participants' accuracy, with degrees of freedom, chi-squared tests and p-values, for the interaction species (human, chimpanzee, bonobo, rhesus macaque) x task (discrimination, categorization).

All p-values are corrected for multiple comparisons using Bonferroni correction with Pcorrected = 0.008.

                          Categorization                  Discrimination
Human > Chimpanzee        χ2(1) = 499.17; p < 0.001       χ2(1) = 103.34; p < 0.001
Human > Bonobo            χ2(1) = 1011.9; p < 0.001       χ2(1) = 247.14; p < 0.001
Human > Macaque           χ2(1) = 1368.6; p < 0.001       χ2(1) = 339.76; p < 0.001
Bonobo > Chimpanzee       χ2(1) = 89.63; p < 0.001        χ2(1) = 30.86; p < 0.001
Bonobo > Macaque          χ2(1) = 26.88; p < 0.001        χ2(1) = 7.35; p < 0.01
Chimpanzee > Macaque      χ2(1) = 214.71; p < 0.001       χ2(1) = 68.34; p < 0.001

Interaction between accuracy and Mahalanobis distances

Figure 48: Interaction between participants’ performance in the discrimination task and the Mahalanobis distances as function of Species and Affect.

Table 13: Statistical values for the main contrasts, with degrees of freedom and chi-squared tests, for the interaction species (human, chimpanzee, bonobo, rhesus macaque) x affect (affiliative, threat, distress) x task (discrimination, categorization) x Mahalanobis distances. All p-values are corrected for multiple comparisons using Bonferroni correction with Pcorrected = .008 and were found significant at p<0.001.

Distance from        Observed species
                     Human               Chimpanzee          Bonobo              Macaque
Human                –                   χ2(1) = 17.015      χ2(1) = 62.643      χ2(1) = 139.36
Chimpanzee           χ2(1) = 31.939      –                   χ2(1) = 71.299      χ2(1) = 100.76
Bonobo               χ2(1) = 43.17       χ2(1) = 30.988      –                   χ2(1) = 36.157
Macaque              χ2(1) = 81.589      χ2(1) = 32.935      χ2(1) = 21.079      –

Confusion matrix

Figure 49: Confusion matrix of answers (%) for the affiliative vocalizations expressed by human, chimpanzee, bonobo and macaque species.

Figure 50: Confusion matrix of answers (%) for the threatening vocalizations expressed by human, chimpanzee, bonobo and macaque species.


Chapter 4.

Sensitivity of the Anterior Human Temporal Voice Areas to Affective Chimpanzee Vocalizations

Leonardo Ceravolo*, Coralie Debracque*, Thibaud Gruber‡ and Didier Grandjean‡

Submitted

4.1. Abstract

In recent years, research on voice processing, and particularly the study of the TVA, has been dedicated almost exclusively to human vocalizations. To characterize commonalities and differences in how affective primate vocalizations modulate the human TVA, the inclusion of phylogenetically close NHP species, especially chimpanzees and bonobos, as well as more distantly related ones such as macaques, is needed. We hypothesized that commonalities would depend on both phylogenetic proximity and acoustic distances, with chimpanzees ranking closest to Homo. Presenting human participants with the vocalizations of four primate species (rhesus macaques, chimpanzees, bonobos and humans), we observed within-TVA enhanced left anterior superior temporal gyrus activity (independent of fundamental frequency and mean energy) for chimpanzee compared to other non-human primate vocalizations, and for chimpanzee compared to human vocalizations. Functional connectivity analyses revealed within-TVA shared coupled patterns of activity for human and chimpanzee vocalizations, which was not the case for the other non-human primates. Our results support a common neural basis in the TVA for the processing of phylogenetically and acoustically close vocalizations, namely those of humans and chimpanzees.

Keywords: TVA, chimpanzee, vocalization, fMRI

4.2. Introduction

The study of the cerebral mechanisms underlying speech and voice processing has gained steam since the early 2000s with the emergence of fMRI (Ogawa et al., 1990). Voice-sensitive areas, generally referred to as TVA, have been highlighted along the upper, superior part of the temporal cortex (Belin et al., 2000). Since then, great effort has been put into better characterizing these TVA, with a specific focus on their spatial compartmentalization into different functional subparts (Aglieri et al., 2018; Kriegstein & Giraud, 2004; Pernet et al., 2015). Repetitive transcranial magnetic stimulation over the right mid TVA leads to persistent voice detection impairment in a simple voice/non-voice discrimination task (Bestelmeyer et al., 2011), and a rather large body of literature is aligned with the crucial role of the TVA in voice perception and processing (Kriegstein & Giraud, 2004; Latinus et al., 2011; Zäske et al., 2017). Subparts of the TVA have also been directly linked to social perception (Lahnakoski et al., 2012), vocal emotion processing (Ethofer et al., 2006; Grandjean et al., 2005; Witteman et al., 2012), and voice identity (Latinus et al., 2011, 2013) and sex (Charest et al., 2013) perception. The developmental axis of voice processing has also been studied in infants, revealing the existence of TVA in 7- but not 4-month-olds (Grossmann et al., 2010), while fetuses in utero have been shown to already react specifically to their parents' voice (Kisilevsky et al., 2003). Along the evolutionary axis, evidence for TVA, or more generally voice-sensitive brain areas, has emerged most notably for dogs (Andics et al., 2014) and monkeys (Perrodin et al., 2011; Petkov et al., 2008), raising the questions of whether the TVA are species-specific and to what extent human and non-human primates share neural mechanisms enabling them to process conspecific vocalizations (Belin, 2006). Less attention has however been devoted to paradigms presenting animal vocalizations to humans, and no study to date has reported human TVA activations for the processing of such auditory material, namely other animals' vocalizations. Human processing of animal vocalizations has been studied using both monkey and cat material, but no specific activations related to any of the species were observed (Belin, Fecteau, et al., 2008). Other studies have focused more specifically on phylogenetic distance, including as stimuli vocalizations of one non-human great ape (chimpanzee) and one Old World monkey (rhesus macaque). Such studies could not identify species-specific brain activations in spite of the correct discrimination of chimpanzee affective vocalizations (Fritz et al., 2018), and observed either below-chance (Fritz et al., 2018) or above-chance (Linnankoski et al., 1994) discrimination of affective macaque vocalizations by human participants. This scarce literature motivated the present study, which aims at a reliable investigation of species-specific TVA activations in humans asked to categorize phylogenetically close and distant species' vocalizations while undergoing fast fMRI scanning. The importance of between-species acoustic differences, especially in fundamental frequency, was also of interest (Grawunder et al., 2018; Slocombe & Zuberbühler, 2007). We therefore included vocalizations of our closest sister genus, Pan (chimpanzees, bonobos), whose estimated split from Homo occurred only 6-8 million years ago, as well as those of a phylogenetically more distant species (Cercopithecidae: rhesus macaques, with an estimated split from Homo 25 million years ago). In fact, any claim of human uniqueness in recruiting the TVA remains on hold and should be tested in light of these closely related species. Bonobo vocalizations are of particular interest, as this species is thought to have experienced evolutionary changes in its communication in part due to a neoteny process involving modifications of vocalization characteristics (especially a higher fundamental frequency of their calls) (Grawunder et al., 2018), even though bonobos are as phylogenetically close to humans as chimpanzees (Gruber & Clay, 2016). Whether such changes would affect the ability of human participants to recognize their calls should therefore be investigated in comparison to chimpanzee and rhesus macaque vocalizations. We predicted: i) a closer acoustic distance between human and chimpanzee vocalizations, with greater distances separating these from bonobo and macaque vocalizations; ii) an overlap between the brain networks recruited by Homo and Pan (chimpanzee, bonobo) vocalizations but not by Cercopithecidae (rhesus macaque) vocalizations, in line with the phylogenetic distance hypothesis; iii) shared and localized brain activations for the categorization of human and chimpanzee vocalizations extending to the TVA, depending on both phylogenetic proximity and acoustic distances; iv) shared functional connectivity patterns, within the TVA, for the categorization of human and chimpanzee vocalizations, again depending on both phylogenetic proximity and acoustic distances. These hypotheses assume a control for low-level acoustic differences, namely mean fundamental frequency and mean energy of the vocalizations, included as trial-level covariates of no interest in the brain imaging statistical models.

4.3. Materials and methods

4.3.1. Species Categorization task

4.3.1.1. Participants

Twenty-five right-handed, healthy, either native or highly proficient French-speaking participants took part in the study. We included in our analyses only participants who had above chance level (25%) performance in categorizing each species, leaving us with eighteen participants (9 female, 9 male, mean age 24.61 years, SD 3.71). All participants were naive to the experimental design and study, had normal or corrected-to-normal vision, normal hearing and no history of psychiatric or neurologic incidents. Participants gave written informed consent for their participation in accordance with ethical and data security guidelines of the University of Geneva. The study was approved by the CCER and was conducted according to the Declaration of Helsinki.

4.3.1.2. Stimuli

Seventy-two vocalizations of four primate species (human, chimpanzee, bonobo and rhesus macaque) were used in this study. The eighteen human voices, obtained from the validated nonverbal set of Belin and collaborators (Belin, Fillion-Bilodeau, et al., 2008), were expressed by two male and two female actors. The eighteen selected chimpanzee, bonobo and rhesus macaque vocalizations contained single calls or call sequences produced by 6 to 8 different individuals in their natural environment. All vocal stimuli were standardized to 750 milliseconds using PRAAT (www.praat.org) but were not normalized for energy, in order to preserve the naturalness of the sounds (Ferdenzi et al., 2013).

4.3.1.3. Experimental Procedure and Paradigm

Lying comfortably in a 3T scanner, participants listened to a total of seventy-two stimuli, randomized and played binaurally using MRI-compatible earphones at 70 dB SPL. At the beginning of the experiment, participants were instructed to identify the species that expressed the vocalizations using a keyboard. For instance, the instructions could be “Human – press 1, Chimpanzee – press 2, Bonobo – press 3 or Macaque – press 4”. The pressed keys were randomly assigned across participants. Within a 3–5 second interval (jittered by 400 ms) after each stimulus, participants were asked to categorize the species. If a participant did not respond during this interval, the next stimulus followed automatically.

4.3.2. Temporal Voice Areas localizer

4.3.2.1. Participants

One hundred and fifteen right-handed, healthy, either native or highly proficient French-speaking participants (62 female, 54 male, mean age 25.34 years, SD 5.50) were included in this functional magnetic resonance imaging task. Among them, seventeen of the eighteen participants who performed the species categorization task were included (the temporal voice areas localizer task was not acquired for one of them). All participants were naive to the experimental design and study, had normal or corrected-to-normal vision, normal hearing and no history of psychiatric or neurologic incidents. Participants gave written informed consent for their participation in accordance with ethical and data security guidelines of the University of Geneva. The study was approved by the CCER and was conducted according to the Declaration of Helsinki.

4.3.2.2. Stimuli and Paradigm

Auditory stimuli consisted of sounds from a variety of sources (Belin et al., 2000). Vocal stimuli were obtained from 47 speakers: 7 babies, 12 adults, 23 children and 5 older adults. Stimuli included 20 blocks of vocal sounds and 20 blocks of non-vocal sounds. Vocal stimuli within a block could be either speech (33%: words, non-words, foreign language) or non-speech (67%: laughs, sighs, various onomatopoeia). Non-vocal stimuli consisted of natural sounds (14%: wind, streams), animals (29%: cries, gallops), the human environment (37%: cars, telephones, airplanes) or musical instruments (20%: bells, harp, instrumental orchestra). The paradigm, design and stimuli were obtained through the Voice Neurocognition Laboratory website (http://vnl.psy.gla.ac.uk/resources.php). Stimuli were presented at an intensity that was kept constant throughout the experiment (70 dB sound-pressure level). Participants were instructed to actively listen to the sounds. The silent interblock interval was 8 s long.

4.4. Analysis

4.4.1. Behavioural Data analysis

4.4.1.1. Accuracy

Behavioural data were used to exclude participants who had below chance level categorization of human voices and/or of at least one other species. This decision was made to guarantee the absence of noisy data in the final sample. Therefore, data from eighteen participants mentioned above were analysed using R studio software (R Studio team Inc., Boston, MA, url: http://www.rstudio.com/). These data are reported in the Supplementary Materials since they are not part of the question of interest of this paper addressing the species-specific nature of the temporal voice areas in human participants.


4.4.1.2. Acoustic Mahalanobis distances

To quantify the impact of acoustic similarities in human recognition of affective vocalizations of other primates, we extracted 88 acoustic parameters from all vocalizations using the extended GeMAPS (Eyben et al., 2016). This set of acoustical parameters was selected based on i) their potential to index affective physiological changes in voice production, ii) their proven value in former studies as well as their automatic extractability, and iii) their theoretical significance. Then, to assess the acoustic distance between vocalizations of all species, we ran a GDA. More precisely, we used the 88 acoustical parameters in a GDA in order to discriminate our stimuli based on the different species (human, chimpanzee, bonobo, and rhesus macaque). Excluding the acoustical variables with the highest correlations (>.90) to avoid redundancy of acoustic parameters, we retained 16 acoustic parameters.

We subsequently computed Mahalanobis distances to classify the 96 stimuli using these selected acoustical features. A Mahalanobis distance is obtained from a generalized pattern analysis computing the distance of each vocalization from the centroids of the different species' vocalizations. This analysis allowed us to obtain an acoustical distance matrix used to test how the acoustical distances were differentially related to the different species.

4.4.1.3. Interaction between Behaviour and Mahalanobis distances

To test whether the acoustic similarity between human voices and NHP vocalizations facilitated affective recognition, we used the Mahalanobis distances as continuous predictors in GLMM analyses of participants' accuracy and reaction times. The GLMMs included Species (human, chimpanzee, bonobo, and rhesus macaque) as a fixed factor, the Mahalanobis distances from human voices to NHP vocalizations as continuous predictors, and participant IDs as random factors. To assess the variance explained by the phylogeny as well as by the Mahalanobis distances, we tested all slopes with the following contrast: human vs [great apes (chimpanzee and bonobo)] vs rhesus macaque. Furthermore, in order to assess the statistical significance of the logistic regression slopes, we used odds ratio analysis to quantify the strength of the association between participants' behaviour and the Mahalanobis distances. If the slope is significant and the odds ratio is < 1, factors are negatively correlated; if the odds ratio is > 1, factors are positively correlated.
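A minimal sketch of this slope contrast in R (illustrative; the contrast weights, data frame and column names are ours, and the reaction-time model is shown with lme4 for simplicity):

```r
library(lme4)

rt_data$species <- factor(rt_data$species,
                          levels = c("human", "chimpanzee", "bonobo", "macaque"))
# Planned contrasts: human vs the rest; great apes vs macaque; chimpanzee vs bonobo
contrasts(rt_data$species) <- cbind(human_vs_rest   = c(3, -1, -1, -1),
                                    apes_vs_macaque = c(0, 1, 1, -2),
                                    chimp_vs_bonobo = c(0, 1, -1, 0))

m_rt <- lmer(rt ~ species * mahal_dist + (1 | participant), data = rt_data)
summary(m_rt)   # slope contrasts of interest appear as species:mahal_dist terms
```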


4.4.2. Imaging Data acquisition

4.4.2.1. Species Categorization task

Structural and functional brain imaging data were acquired using a 3T scanner (Siemens Trio, Erlangen, Germany) with a 32-channel coil. A 3D GR\IR magnetization-prepared rapid acquisition gradient echo sequence was used to acquire high-resolution (0.35 x 0.35 x 0.7 mm3) T1-weighted structural images (TR = 2400 ms, TE = 2.29 ms). Functional images were acquired using fast fMRI, with a multislice echo planar imaging sequence (79 transversal slices in descending order, slice thickness 3 mm, TR = 650 ms, TE = 30 ms, field of view = 205 x 205 mm2, 64 x 64 matrix, flip angle = 50 degrees, bandwidth 1562 Hz/Px).

4.4.2.2. Temporal Voice Areas localizer task

Structural and functional brain imaging data were acquired using a 3T scanner (Siemens Trio, Erlangen, Germany) with a 32-channel coil. A magnetization-prepared rapid acquisition gradient echo sequence was used to acquire high-resolution (1 x 1 x 1 mm3) T1-weighted structural images (TR = 1,900 ms, TE = 2.27 ms, TI = 900 ms). Functional images were acquired using a multislice echo planar imaging sequence (36 transversal slices in descending order, slice thickness 3.2 mm, TR = 2,100 ms, TE = 30 ms, field of view = 205 x 205 mm2, 64 x 64 matrix, flip angle = 90°, bandwidth 1562 Hz/Px).

4.4.3. Wholebrain data analysis in the TVA

4.4.3.1. Species Categorization task region-of-interest analysis within the Temporal Voice Areas

Functional images were analysed with Statistical Parametric Mapping software (SPM12, Wellcome Trust Centre for Neuroimaging, London, UK). Pre-processing steps included realignment to the first volume of the time series, slice timing, normalization into the Montreal Neurological Institute (MNI) space (Collins et al., 1994) using the DARTEL toolbox (Ashburner, 2007) and spatial smoothing with an isotropic Gaussian filter of 8 mm full width at half maximum. To remove low-frequency components, we used a high-pass filter with a cut-off of 128 s. Two general linear models were used to compute first-level statistics, in which each event was modelled using a boxcar function and convolved with the hemodynamic response function, time-locked to the onset of each stimulus. Separate regressors were created for all trials of each species (Species factor: human, chimpanzee, bonobo, macaque vocalizations) and two covariates each (mean fundamental frequency and mean energy of each species), for a total of 12 regressors. Finally, six motion parameters were included as regressors of no interest to account for movement in the data; our design matrix therefore included a total of 18 columns plus the constant term. The species regressors were used to compute simple contrasts for each participant, leading to separate main effects of human, chimpanzee, bonobo, and macaque vocalizations. The covariates were weighted zero in these contrasts in order to treat them as regressors of no interest. These four simple contrasts were then taken to a flexible factorial second-level analysis in which there were two factors: a Participants factor (independence set to yes, variance set to unequal) and the Species factor (independence set to no, variance set to unequal). For this analysis, and to be consistent across analyses, we only included participants who were above chance level (25%) in the species categorization task (N=18). Brain region labelling was defined using the xjView toolbox (http://www.alivelearn.net/xjview). All neuroimaging activations were thresholded in SPM12 using a voxel-wise false discovery rate (FDR) correction at p<.05 and an arbitrary cluster extent of k>10 voxels to remove extremely small clusters of activity.
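Schematically (our notation, not the thesis'), the first-level model for each voxel's time series $y$ can be written as

$$ y \;=\; \underbrace{\sum_{s \in \{\mathrm{hum,\,chimp,\,bon,\,mac}\}} \beta_s \,(x_s \ast h)}_{4\ \text{species regressors}} \;+\; \underbrace{\sum_{s} \bigl(\gamma_s \, F0_s + \delta_s \, E_s\bigr)}_{8\ \text{acoustic covariates}} \;+\; \underbrace{\sum_{k=1}^{6} \theta_k \, m_k}_{\text{motion}} \;+\; \beta_0 + \varepsilon , $$

where $x_s$ is the stimulus onset function of species $s$, $h$ the hemodynamic response function, $F0_s$ and $E_s$ the trial-level mean fundamental frequency and mean energy covariates, and $m_k$ the motion parameters; this yields the 18 regressors plus the constant term mentioned above.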

4.4.3.2. Temporal Voice Areas localizer task

Functional images were analysed with Statistical Parametric Mapping software (SPM12, Wellcome Trust Centre for Neuroimaging, London, UK). Pre-processing steps included realignment to the first volume of the time series, slice timing, normalization into the Montreal Neurological Institute (MNI) space (Collins et al., 1994) using the DARTEL toolbox (Ashburner, 2007) and spatial smoothing with an isotropic Gaussian filter of 8 mm full width at half maximum. To remove low-frequency components, we used a high-pass filter with a cut-off of 128 s. A general linear model was used to compute first-level statistics, in which each block was modelled using a block function and convolved with the hemodynamic response function, time-locked to the onset of each block. Separate regressors were created for each condition (vocal and non-vocal; Condition factor). Finally, six motion parameters were included as regressors of no interest to account for movement in the data. The condition regressors were used to compute simple contrasts for each participant, leading to a main effect of vocal and non-vocal at the first level of analysis: [1 0] for vocal, [0 1] for non-vocal. These simple contrasts were then taken to a flexible factorial second-level analysis in which there were two factors: a Participants factor (independence set to yes, variance set to unequal) and the Condition factor (independence set to no, variance set to unequal). All neuroimaging activations were thresholded in SPM12 using a voxel-wise family-wise error (FWE) correction at p<.05. The activation outline for vocal > non-vocal was precisely delineated and overlaid on brain displays of the species categorization task.

4.4.4. Functional connectivity analysis

Seed-to-seed functional connectivity analyses were performed only for the species categorization task, using the CONN toolbox (Whitfield-Gabrieli & Nieto-Castanon, 2012) version 18.b implemented in Matlab 9.0 (The MathWorks, Inc., Natick, MA, USA). Functional connectivity analyses were computed using 28 regions of interest (ROI) extracted with a 6 mm sphere around each peak voxel of the [human > chimpanzee, bonobo, macaque] contrast located within the temporal voice areas delineated in the temporal voice areas localizer task. To ensure consistency and accuracy in this procedure, each ROI was extracted using the WFU PickAtlas toolbox (https://www.nitrc.org/projects/wfu_pickatlas/) and each ROI sphere was then overlaid on the temporal voice areas using the FMRI Software Library v6.0 (FSL) through FSLeyes (Smith et al., 2004). Only ROIs lying completely within the bounds of the TVA were included in this analysis. Spurious sources of noise were estimated and removed using the automated toolbox pre-processing algorithm, and the residual blood-oxygen-level-dependent (BOLD) time series was band-pass filtered using a low-frequency window (0.008 < f < 0.09 Hz). Correlation maps were then created for each condition of interest by taking the residual BOLD time course for each condition from the atlas regions of interest and computing bivariate Pearson's correlation coefficients between the time courses of each voxel of each ROI of the atlas, averaged by ROI. We used generalized psychophysiological interaction (gPPI) measures, representing the level of task-modulated (often labelled 'effective') connectivity between ROIs. gPPI is computed using a separate multiple regression model for each target ROI. Each model includes three predictors: i) task effects convolved with a canonical hemodynamic response function (psychological factor); ii) each seed ROI BOLD time series (physiological factor); and iii) the interaction term between the psychological and the physiological factors, the output of which is the regression coefficient associated with this interaction term. Finally, group-level analyses were performed on these regression coefficients to assess main effects within the group for contrasts of interest in seed-to-seed analyses. Type I error was controlled by the use of seed-level false discovery rate (FDR) correction at p<.05 to correct for multiple comparisons.
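Schematically (our notation, not the toolbox's), the gPPI regression for a given seed-target pair can be written as

$$ y_{\mathrm{target}}(t) \;=\; \beta_0 + \beta_1\,[\,task(t) \ast \mathrm{HRF}\,] + \beta_2\, y_{\mathrm{seed}}(t) + \beta_3\,\bigl([\,task(t) \ast \mathrm{HRF}\,] \times y_{\mathrm{seed}}(t)\bigr) + \varepsilon(t), $$

where the interaction coefficient $\beta_3$ is the task-modulated ('effective') connectivity estimate that is taken to the group-level analyses.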

4.5. Results

4.5.1. Interaction between Behaviour and Mahalanobis distances

GLMM analyses revealed a significant interaction (χ2(3) = 6.9733, p < .05) between Species and the Mahalanobis distances for the reaction time data. However, only the macaque slope was significant at p < .05, with an odds ratio of 8e-05 indicating a negative correlation between participants' reaction times and the Mahalanobis distances for macaque vocalizations. Furthermore, the contrast testing the phylogenetic differences between the slopes of human > [chimpanzee; bonobo] > macaque was also significant (χ2(1) = 173.34, p < .001), emphasizing the link between short reaction times and the short acoustic distances characterizing human and great ape vocalizations compared to macaque vocalizations. Hence, human participants categorized human voices differently from great ape (chimpanzee, bonobo) and rhesus macaque vocalizations. Moreover, participants were faster at categorizing macaque calls when the acoustic distance from human voices was larger (see Figure 51). Note that a trend toward significance was found for the interaction between Species and the Mahalanobis distances for the accuracy data (χ2(3) = 6.36, p = .09).

Figure 51: Interaction between participants’ reaction times in the species categorization task and acoustic distances between non-human primate calls and human voices. Confidence interval at 0.95.


4.5.2. Region-of-interest data within the Temporal Voice Areas

The present study did not aim at uncovering wholebrain results underlying the processing of each species’ vocalizations. Rather, we adopted a region-of-interest approach to uncover functional changes within the temporal voice areas. Vocalization mean energy and mean fundamental frequency were used as covariates of no-interest at the trial level, meaning that signal correlating directly with these low-level acoustics was removed from the reported results.

We were particularly interested in brain activity related to perceiving vocalizations of our closest relative (both acoustically and phylogenetically), the chimpanzee, compared to human vocalizations and those of the other species. TVA activations specific to human and chimpanzee vocalizations, using the [human, chimpanzee > bonobo, macaque] contrast, led to enhanced signal in the bilateral posterior, mid and anterior superior temporal cortex (see Figure 52 abcd). Brain activity specific to chimpanzee vocalizations ([chimpanzee > human, bonobo, macaque]) led to enhanced activity in a cluster of the left anterior STG located within the temporal voice areas (see Figure 52 c). A similar result was observed when directly contrasting chimpanzee to human vocalizations ([chimpanzee > human]) in a slightly more medial area of the anterior STG, again located within the voice-sensitive areas (see Figure 52 g). Enhanced activity for human relative to chimpanzee vocalizations ([human > chimpanzee]) was observed in large parts of the anterior, mid and posterior superior and middle temporal cortex (see Figure 52 efg).


Figure 52: Wholebrain analysis within the TVAs when selectively contrasting processing of chimpanzee to other species’ vocalizations. abc, Enhanced brain activity for human and chimpanzee compared to bonobo and macaque vocalizations (purple to yellow) on a sagittal view, overlaid with activity specific to chimpanzee vocalizations (dark blue to green). d, Percentage of signal change (± standard error of the mean) for each species in the left aSTG. ghi, Direct comparison between human and chimpanzee vocalizations (human > chimpanzee: dark red to yellow; chimpanzee > human: dark green to yellow) on a sagittal render. j, Percentage of signal change (± SEM: standard error of the mean) in a more medial part of aSTG when contrasting chimpanzee to human vocalizations. Brain activations are independent of low-level acoustic parameters for all species (mean fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparison using FDR at a threshold of p<.05. Percentage of signal change extracted at cluster peak including 27 surrounding voxels, selecting among these the ones explaining at least 85% of the variance using singular value decomposition. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. ‘a’ prefix: anterior; ‘m’ prefix: mid; ‘p’ prefix: posterior; L: left; R: right.

4.5.3. Functional Connectivity

In order to characterize the brain networks underlying the processing of each species, we performed functional connectivity analyses, using only within-TVA seed regions (N=28) originating from the [human > chimpanzee, bonobo, macaque] results. These analyses revealed shared coupled patterns of activity for human and chimpanzee vocalizations. The human-specific contrast ([human > chimpanzee, bonobo, macaque]) yielded coupled functional connectivity in the anterior, mid and posterior MTG and the right mid and posterior STS (see Figure 53 a). For chimpanzee vocalizations ([chimpanzee > human, bonobo, macaque]), the closest species to humans in terms of acoustics and phylogeny, similar results were observed: coupled functional connectivity was found between the left mid MTG and right mid STG, while anti-coupling was highlighted between the right mid and anterior STG (see Figure 53 b). For bonobo vocalizations ([bonobo > human, chimpanzee, macaque]), coupling was observed between ipsilateral mid STG and posterior STS regions (right hemisphere). The direct comparison between human and chimpanzee vocalizations revealed coupling between the left mMTG and right mSTS ([human > chimpanzee], see Figure 53 c). No such intra-TVA functional connectivity was significant for macaque vocalizations compared to other species.

Figure 53: ROI-to-ROI functional connectivity analyses for human and chimpanzee vocalizations. a) Human-specific, b) chimpanzee-specific, and c) human > chimpanzee contrasts. Functional connectivity data are independent of low-level acoustic parameters for all species (mean fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparisons using FDR at a threshold of p<.05, two-tailed. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. ‘a’ prefix: anterior; ‘m’ prefix: mid; ‘p’ prefix: posterior; L: left; R: right.

4.6. Interim Discussion

The present study provides evidence, first, of a TVA sensitivity to chimpanzee screams, with enhanced activity in the left and more medial aSTG; second, of a shared brain network for listening to human voices and chimpanzee calls, involving posterior, middle and anterior parts of the bilateral STC as well as coupling between the left mMTG and right mSTS; and third, of connectivity between ipsilateral mSTG and pSTS regions when humans listen to bonobo calls. These fMRI and connectivity analyses within the TVAs show, for the first time, that the hemodynamic activity elicited within the TVA by chimpanzee vocalizations is more similar to that elicited by human voices than to that elicited by bonobo and macaque vocalizations. Moreover, our results suggest that the processing of great ape vocalizations (human, chimpanzee and bonobo included), compared to monkey calls, enhances functional connectivity within ipsilateral and contralateral TVA.

Distinct mechanisms thus seem at play for the processing of great ape and monkey vocalizations. Correlation analyses between reaction times and Mahalanobis distances indeed revealed that human participants categorized human voices differently from great ape calls (chimpanzee, bonobo) and from macaque calls. Interestingly, we also found that participants were faster at recognizing macaque vocalizations when their acoustic distance from the human voice was larger. Although counterintuitive at first, these results suggest that both phylogenetic and acoustic distances matter for the human processing of heterospecific affective vocalizations. In fact, the human categorization of vocalizations expressed by other Hominidae species, namely chimpanzees and bonobos, involves brain regions long thought to be human-specific: the TVA. Conversely, the recognition of macaque vocalizations, a species with which we share a more distant evolutionary history, might be related to a kind of oddball effect: listening to acoustic objects with unusual and infrequent acoustic features may lead to faster categorization of these macaque calls. This last claim is supported by previous EEG findings showing the involvement of the P3a event-related potential (ERP) in the human perception of tree shrew (Tupaia glis, a primate-like species) vocalizations, emphasizing a possible involuntary attention switch from familiar to novel stimuli (Scheumann et al., 2017). The presence of peculiar acoustic features in macaque calls therefore seems to be an important factor in their recognition by humans. Together, these findings indicate that both phylogenetic and acoustic distances shape the human processing of heterospecific affective vocalizations.

Following the findings described above, fMRI wholebrain analyses within the TVA revealed that both human voices and chimpanzee calls enhanced activity in the bilateral posterior, middle and anterior STC. Importantly, we also found specific hemodynamic responses in the TVA, involving the left aSTG, for the human perception of chimpanzee screams only. However, vocalizations expressed by bonobos, another great ape species, did not elicit such activity in the STG or STC. This result might be related to acoustic differences that could underlie the TVA insensitivity to bonobo calls. In fact, our results are consistent with previous findings showing a higher pitch in young bonobo screams in comparison to chimpanzee and human baby cries (Kelly et al., 2017). Hence, it seems reasonable to hypothesize that the human TVA, usually associated with the processing of conspecific vocalizations (Belin et al., 2000), could also be involved in listening to vocalizations of other phylogenetically close primates, provided that these calls are acoustically close to those expressed by conspecifics. Similar processes thus seem at play for the processing of chimpanzee vocalizations and human voices in the human TVA.

This last claim is corroborated by our functional connectivity analyses. Indeed, a similar coupling between the left mMTG and right mSTS was found for the human categorization of both human and chimpanzee vocalizations. Interestingly, functional connectivity also revealed an ipsilateral coupling between mSTG and pSTS for the perception of bonobo calls. Hence, a common part of the cerebral network involved in the recognition of human voices is also at play for the recognition of chimpanzee calls within the bilateral TVA. This finding supports the hypothesis that heterospecific vocalizations can enhance activity in the TVA if i) they are expressed by phylogenetically close species; and ii) their acoustic properties are similar to those of conspecific vocalizations.

Furthermore, only the human listening of great ape vocalizations appears to involve functional coupling between mMTG - mSTS and mSTG - pSTS. By contrast, vocalizations produced by more evolutionarily distant non-human primate species, such as macaques, do not seem to require such connectivity in temporal regions. In accordance with the few existing studies in the literature, these results highlight the key role of phylogenetic proximity in the human processing of cross-taxa vocalizations. In fact, recent fMRI findings have demonstrated a gradient of activity in the bilateral STS for the human perception of primate vocalizations, with the strongest neural responses for human voices compared to chimpanzee calls and then to macaque calls, for which the lowest activations were found (Fritz et al., 2018). Similarly, a study using electroencephalography (EEG) has shown an involvement of early negativities at posterior sites for listening to human infant and chimpanzee vocalizations, whereas the human perception of tree shrew calls, a primate-like species, did not elicit such ERPs (Scheumann et al., 2017).

Finally, and beyond the scope of this paper, the absence of results in the TVA for the perception of bonobo calls, compared to human and chimpanzee vocalizations, supports the evolutionary divergence of this peculiar species. In fact, according to the self-domestication hypothesis, bonobos would have evolved differently from chimpanzees due to selection against aggression (Hare et al., 2012). Interestingly, this differentiation in the evolutionary pathway of bonobos has affected both their behaviour (Gruber & Clay, 2016) and their morphology. For instance, anatomical research has shown a shorter larynx in bonobos than in chimpanzees, resulting in a higher fundamental frequency in their calls (Grawunder et al., 2018). From this assessment, and from our fMRI results showing that the human processing of chimpanzee vocalizations involves mechanisms similar to those engaged by the human voice, we can suppose that the calls of the common ancestor we shared with the other great apes 8 million years ago (Perelman et al., 2011) were closer to those currently expressed by chimpanzees.

To conclude, the present study emphasized the role of both phylogenetic and acoustic proximity in the recognition of primate species. For categorizing the vocalizations of evolutionarily distant species such as macaques, infrequent acoustic features seem to be an important factor inducing an attentional bias toward these calls. Most importantly, we demonstrated for the first time an increase of hemodynamic responses in the left aSTG, within the TVA, when humans listen to chimpanzee calls. Furthermore, the perception of both human and chimpanzee vocalizations enhanced activity in the TVA, involving bilateral pSTC, mSTC and aSTC, as well as a functional coupling between the left mMTG and right mSTS. Bonobo calls also involved functional connectivity between bilateral human TVA. Since the 2000s (Belin et al., 2000), the TVA has often been linked to the processing of the human voice only. However, our results suggest that the TVA is also involved in the processing of heterospecific vocalizations, in particular chimpanzee calls, with which we share the closest acoustic properties and phylogenetic proximity. Bonobos, with the same phylogenetic distance from humans as chimpanzees but with more acoustically distant calls, also involved functional connectivity in the human TVA. Overall, our findings support an evolutionary continuity between human and great ape call structure and the related brain processing in the TVA.

4.7. Supplementary Material

Accuracy

First, an ANOVA revealed significant differences in the recognition of the species by human participants (F(3) = 69.097, p<.001). Second, all contrasts within the ANOVA were significant after Bonferroni correction, except the [chimpanzee > macaque] contrast.

Thus, participants showed better performance in categorizing human voices compared to chimpanzee (χ2(1) = 73.337, p<.001), bonobo (χ2(1) = 196.54, p<.001), and macaque vocalizations (χ2(1) = 95.433, p<.001). Statistics also revealed that humans were better at recognizing human, and then chimpanzee and bonobo, vocalizations compared to the more phylogenetically distant rhesus macaque calls (χ2(1) = 378.08, p<.001).


Figure 54: Rate (%) and SD of human recognition of primate species through their vocalizations. All contrasts were significant after Bonferroni correction with Pcorrected = .01, except the [chimpanzee > macaque] contrast.

Reaction time

An ANOVA revealed significant differences in participants’ reaction times between the species (F(3) = 87.264, p<.001). All contrasts were significant after Bonferroni correction except the chimpanzee vs macaque and bonobo vs macaque contrasts.

Therefore, participants showed faster reaction times when categorizing human voices compared to chimpanzee (χ2(1) = 113.2, p<.001), bonobo (χ2(1) = 126.95, p<.001), and macaque vocalizations (χ2(1) = 218.74, p<.001). Participants were also faster at recognizing human, and then chimpanzee and bonobo, vocalizations compared to macaque screams (χ2(1) = 265.56, p<.001).


Figure 55: Reaction time (ms) and SD of human recognition of primate species through their vocalizations. All contrasts were significant after Bonferroni correction with Pcorrected = .01.


Chapter 5.

Non-human Primate Vocalizations are Categorized in the Inferior Frontal Gyrus

Coralie Debracque*, Leonardo Ceravolo*, Thibaud Gruber‡ and Didier Grandjean‡

In preparation

5.1. Abstract

Homo sapiens and NHP species share a long evolutionary history. The phylogenetic proximity of humans to other great apes, our closest relatives, also suggests that numerous traits are shared with the latter, possibly inherited from a common ancestor. Yet, the literature on homologous traits between humans and other primates, particularly great apes, remains scarce with respect to emotion evaluation. The present study investigated the ability of humans to recognize four primate species, including human, great apes (chimpanzee and bonobo) and monkey (rhesus macaque), through their vocalizations. Focusing on the IFG, well known for its role in auditory object identification, we hypothesized that this brain area would be differently activated depending on the species that produced the vocalizations. Because of their evolutionary and acoustic proximity to humans, we predicted that chimpanzee vocalizations would enhance activity in the IFG more strongly, enabling human participants to recognize this species better than bonobos or macaques. As expected, when contrasted with other species’ vocalizations, chimpanzee calls, but not bonobo or macaque calls, bilaterally increased activity in the IFG subparts, namely the pars triangularis, opercularis and orbitalis, independently of vocalization mean F0 and mean energy. Furthermore, with the exception of the pars orbitalis for the processing of bonobo screams, all non-human primate calls led to enhanced activity in bilateral IFG when compared to human voices. Interestingly, similar brain activations were found for the correct recognition of chimpanzee and macaque screams (left pars orbitalis excluded for macaques). On the contrary, the correct categorization of bonobo calls bilaterally enhanced the pars triangularis of the IFG only. Overall, our results support, first, the existence of common brain and behavioural mechanisms for the categorization of human voices and vocalizations expressed by chimpanzees. Second, given the involvement of the IFG in the processing of all vocalizations expressed by NHP, independently of their phylogenetic proximity to Homo sapiens, our data suggest a distinct role of the pars opercularis, orbitalis and triangularis in the identification of heterospecific vocalizations.

Keywords: NHP, vocalization, IFG, fMRI

5.2. Introduction

The Hominidae clade appeared between 13 and 18 million years ago (Perelman et al., 2011). Encompassing all living great apes, including humans, chimpanzees, bonobos, gorillas and orangutans, this unique primate branch is key to understanding current human behaviours, physiology and cognitive abilities. In fact, despite a larger and more folded brain compared to other great ape species (Heuer et al., 2019), human neuroanatomical traits are mainly considered part of a continuum in primate brain evolution (e.g. Herculano-Houzel, 2009; Semendeferi et al., 2002). For instance, MRI findings have demonstrated the existence of a large frontal cortex in all great ape species, humans included (Semendeferi et al., 2002), underlining that the comparative approach is of particular interest for investigating the anatomical structures and functions of frontal regions. Often associated with problem solving, emotion, judgment, language or even motor and sexual behaviours (Barbas, 2000; Barbas et al., 2011; Binder et al., 2004; Davidson, 1992; Frühholz & Grandjean, 2013; Kambara et al., 2018; LeDoux, 2012), functions of the frontal lobe and their related brain structures are shared by most primate species. Yet, the IFG is an exception. Although anatomically present in evolutionarily distant species such as rhesus macaques (Petrides & Pandya, 2002), the left IFG is most prominently linked to speech production (Broca’s area) in humans (Guenther et al., 2015), the only species capable of language. Hence, in the course of human evolution, and in addition to evolutionarily shared functions such as decision formation (Binder et al., 2004; Brück et al., 2011), inhibition (Hampshire et al., 2010; Tops & Boksem, 2011) and emotional judgement (Grandjean, 2020; Gruber et al., 2020; Zhang et al., 2018), the IFG also became specialized in semantic auditory processing (Belyk et al., 2017; Frühholz et al., 2012; Schirmer & Kotz, 2006). All the described functions rely on specific neuroanatomical subparts of the IFG. In fact, complex by nature, the cytoarchitecture of the IFG encompasses three distinct parts (Cai & Leung, 2011): IFGtri, IFGorb and IFGope. For instance, semantic and emotional processing enhance activity in the left IFGorb (Belyk et al., 2017), while emotional prosody discrimination and categorization involve bilateral IFGtri and IFGope respectively (Dricu et al., 2017). Overall, the current functions and structures of the IFG in humans seem strongly linked to Hominidae evolutionary history (Stout & Chaminade, 2012). However, comparative approaches investigating the evolutionary role of the IFG in the human brain are still lacking, particularly with respect to auditory emotional processing (Gruber & Grandjean, 2017). To partially fill this gap, the present study focused on auditory decision-making, an evolutionarily shared function of the IFG, using functional MRI in humans. Participants were explicitly asked to categorize four primate species through their vocalizations: human, chimpanzee, bonobo and rhesus macaque. To our knowledge, only a few comparative studies have previously used bonobo vocalizations as stimuli (e.g. Kelly et al., 2017). Despite having the same phylogenetic proximity to Homo sapiens as chimpanzees, bonobos differ markedly in their vocal communication and behaviour (Grawunder et al., 2018; Gruber & Clay, 2016; Hare et al., 2012; Staes et al., 2018). It thus seems essential to include both great ape species to distinguish the role of phylogeny from other factors, such as acoustic properties, in such a comparative experiment. In addition, to control for lower-level acoustic differences, the mean fundamental frequency and the mean energy of the vocalizations were included as trial-level covariates of no-interest in the statistical models. Hence, in light of the phylogenetic proximity of the species, we predicted: i) more activity in the IFG for the categorization of chimpanzee vocalizations compared to bonobo and macaque calls; ii) dedicated brain networks within the subparts of the IFG for the correct recognition of non-human primate vocalizations; and iii) better performance by human participants for the categorization of chimpanzee calls compared to bonobo and macaque vocalizations.

5.3. Materials and methods

5.3.1. Participants

Twenty-five right-handed, healthy, either native or highly proficient French-speaking participants took part in the study. We included in our analyses only participants who had above chance level (25%) performance in categorizing each species, leaving us with eighteen participants (9 female, 9 male, mean age = 24.61 years, SD = 3.71). All participants were naive to the experimental design and study, had normal or corrected-to-normal vision, normal hearing and no history of psychiatric or neurologic incidents. Participants gave written informed consent for their participation in accordance with the ethical and data security guidelines of the University of Geneva. The study was approved by the CCER and was conducted according to the Declaration of Helsinki.

5.3.2. Stimuli

Seventy-two vocalizations of four primate species (human, chimpanzee, bonobo and rhesus macaque) were used in this study. The eighteen human voices were expressed by two male and two female actors and were obtained from a validated nonverbal set by Belin and collaborators (Belin, Fillion-Bilodeau, et al., 2008). For each of the other species (chimpanzee, bonobo and macaque), the eighteen selected vocalizations contained single calls or call sequences produced by 6 to 8 different individuals in their natural environment. All vocal stimuli were standardized to 750 milliseconds using PRAAT (www.praat.org) but were not normalized, in order to preserve the ‘naturality’ of the sounds (Ferdenzi et al., 2013).

5.3.3. Experimental Procedure and Paradigm

Lying comfortably in a 3T scanner, participants listened to a total of seventy-two stimuli, randomized and played binaurally through MRI-compatible earphones at 70 dB SPL. At the beginning of the experiment, participants were instructed to identify the species that expressed the vocalizations using a keyboard. For instance, the instructions could be “Human – press 1, Chimpanzee – press 2, Bonobo – press 3 or Macaque – press 4”. The key assignment was randomized across participants. Participants were asked to categorize the species within a 3-5 second interval (400 ms jitter) after each stimulus. If the participant did not respond during this interval, the next stimulus followed automatically (see Figure 56).
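As a purely illustrative sketch of this procedure (in R), one could build the randomized trial list, the jittered response windows and the per-participant key mapping as follows; the file names, the 400-ms jitter grid and the exact timing logic are assumptions, not the actual presentation scripts used in the study.

species <- rep(c("human", "chimpanzee", "bonobo", "macaque"), each = 18)
trials  <- data.frame(stimulus = paste0(species, "_", 1:18, ".wav"),
                      species  = species,
                      stringsAsFactors = FALSE)

trials <- trials[sample(nrow(trials)), ]               # randomize the 72 stimuli
trials$response_window <- sample(seq(3, 5, by = 0.4),  # 3-5 s response interval
                                 nrow(trials), replace = TRUE)

# Random species-to-key assignment for this participant
key_map <- setNames(sample(1:4), c("human", "chimpanzee", "bonobo", "macaque"))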


Figure 56: Experimental setup and task description. Participants were lying in the MRI scanner and performed the species categorization task for about 8 minutes, followed by a 7-minute anatomical scan. The screen sequence depicts what participants saw and heard during the task. Total study time was 15 minutes inside the MRI scanner.

5.3.4. Image acquisition

Structural and functional brain imaging data were acquired using a 3T scanner (Siemens Trio, Erlangen, Germany) with a 32-channel coil. A 3D GR/IR magnetization-prepared rapid acquisition gradient echo sequence was used to acquire high-resolution (0.35 x 0.35 x 0.7 mm3) T1-weighted structural images (TR = 2400 ms, TE = 2.29 ms). Functional images were acquired using a fast fMRI multislice echo planar imaging sequence (79 transversal slices in descending order, slice thickness 3 mm, TR = 650 ms, TE = 30 ms, field of view = 205 x 205 mm2, 64 x 64 matrix, flip angle = 50 degrees, bandwidth 1562 Hz/Px).

5.4. Analysis

5.4.1. Wholebrain analysis

Functional images were analysed with Statistical Parametric Mapping software (SPM12, Wellcome Trust Centre for Neuroimaging, London, UK). Pre-processing steps included realignment to the first volume of the time series, slice-timing correction, normalization into the Montreal Neurological Institute (MNI) space (Collins et al., 1994) using the DARTEL toolbox (Ashburner, 2007), and spatial smoothing with an isotropic Gaussian filter of 8 mm full width at half maximum. To remove low-frequency components, we used a high-pass filter with a cut-off period of 128 s. Two general linear models were used to compute first-level statistics, in which each event was modelled using a boxcar function and convolved with the hemodynamic response function, time-locked to the onset of each stimulus.

For model 1, separate regressors were created for all trials of each species (Species factor: human, chimpanzee, bonobo, macaque vocalizations) with two covariates each (mean fundamental frequency and mean energy of each species), for a total of 12 regressors. Six motion parameters were added as regressors of no interest to account for movement, so the design matrix included a total of 18 columns plus the constant term. The species regressors were used to compute simple contrasts for each participant, leading to separate main effects of human, chimpanzee, bonobo and macaque vocalizations. Covariates were set to zero in order to model them as no-interest regressors. These four simple contrasts were then taken to a flexible factorial second-level analysis with two factors: a Participants factor (independence set to yes, variance set to unequal) and the Species factor (independence set to no, variance set to unequal).

For model 2, separate regressors were created for each correctly categorized species (Correct Species factor: human hits, chimpanzee hits, bonobo hits, macaque hits) with two covariates each (mean fundamental frequency and mean energy of each species), as well as a regressor containing a concatenation of categorization errors across species, for a total of 13 regressors. Six motion parameters were again added as regressors of no interest, so the design matrix included a total of 19 columns plus the constant term. The regressors of interest (Correct Species factor) were used to compute simple contrasts for each participant, leading to separate main effects of human, chimpanzee, bonobo and macaque hits. Covariates were set to zero in order to model them as no-interest regressors. These four simple contrasts were then taken to a flexible factorial second-level analysis with two factors: a Participants factor (independence set to yes, variance set to unequal) and the Correct Species factor (independence set to no, variance set to unequal).

For both models, and to be consistent across analyses, we only included participants who were above chance level (25%) in the species categorization task (N=18). Brain region labelling was defined using the xjView toolbox (http://www.alivelearn.net/xjview). All neuroimaging activations were thresholded in SPM12 using FDR correction at p<0.05 and an arbitrary cluster extent of k>10 voxels to remove extremely small clusters of activity.
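For illustration only, the logic of this final thresholding step (FDR correction combined with a minimum cluster extent) can be sketched in R as below; this is a conceptual stand-in, not the SPM12 implementation, and voxel_p and cluster_id are hypothetical per-voxel inputs.

# Keep voxels that survive FDR (q < .05) and belong to clusters of at least k_min voxels
threshold_map <- function(voxel_p, cluster_id, q = 0.05, k_min = 10) {
  p_fdr <- p.adjust(voxel_p, method = "fdr")   # Benjamini-Hochberg correction
  keep  <- p_fdr < q
  sizes <- table(cluster_id[keep])             # surviving voxels per cluster
  keep & (cluster_id %in% names(sizes)[sizes >= k_min])
}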

5.4.2. Behavioural data analysis

For consistency, behavioural data from the same 18 participants were analysed using RStudio (RStudio Inc., Boston, MA, url: http://www.rstudio.com/). In order to test our hypotheses regarding phylogenetic distance, we performed ANOVAs on participants’ accuracy and reaction times with Species (human, chimpanzee, bonobo, macaque) as fixed factor and participant ID as random factor. Then, using the specific contrasts described below, we compared the species recognition rates of the 18 participants. These contrasts were corrected with Bonferroni correction (Pcorrected = .05/number of tests = .05/4 = .0125). Data for participants’ reaction times can be found in the Supplementary Material.
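A minimal R sketch of this behavioural analysis is shown below, assuming a long-format data frame with one accuracy value per participant and species; the use of lmerTest and emmeans here is an illustrative choice, not necessarily the exact functions used in the thesis.

library(lmerTest)   # mixed model with Species fixed and participant random
library(emmeans)

m <- lmer(accuracy ~ species + (1 | id), data = behav)
anova(m)                                              # main effect of Species

# Pairwise species contrasts, Bonferroni-corrected for the number of tests
pairs(emmeans(m, "species"), adjust = "bonferroni")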

5.5. Results

5.5.1. Wholebrain data

Categorization of the four primate species enhanced activity in several brain areas of human participants. In this paper, we focus on the IFG only. As mentioned in the Methods, mean energy and mean fundamental frequency were used as covariates of no-interest at the trial level, meaning that signal correlating directly with these low-level acoustics was removed from the reported contrasts.

5.5.1.1. Model 1: Processing of all Species trials Independently of Categorization Performance

First, we were particularly interested in brain activity related to the categorization of vocalizations expressed by chimpanzees, our closest relative from an acoustic and phylogenetic point of view. Brain activity specific to chimpanzee vocalizations ([chimpanzee > human, bonobo, macaque] and [chimpanzee > human]) was enhanced in all bilateral subparts of the IFG (pars triangularis, pars opercularis and pars orbitalis; see Figure 57). Second, focusing on cerebral activations related to the categorization of bonobo and macaque vocalizations, our analyses revealed an increase of hemodynamic responses in all subparts of the IFG for the contrasts [macaque > human] and [bonobo > human], with the exception of the pars orbitalis for bonobo calls (see Figure 58). Note that no significant activity in the IFG was found for the categorization of bonobo and macaque calls against other species’ vocalizations.

Figure 57: Wholebrain results when selectively contrasting processing of chimpanzee against other species’ vocalizations. Enhanced brain activity for chimpanzee compared to other species’ vocalizations (ab) and chimpanzee calls compared to human voices (cd) (dark blue to green). ac: left hemisphere; bd: right hemisphere. Brain activations (a-d) are independent of low-level acoustic parameters for all species (fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparisons using FDR at a threshold of p<.05. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque.


Figure 58: Wholebrain results when selectively contrasting processing of bonobo and macaque vocalizations against human voices. Enhanced brain activity for bonobo (ab) and macaque (cd) compared to human voice (dark blue to green). ac: left hemisphere; bd: right hemisphere. Brain activations (a-d) are independent of low-level acoustic parameters for all species (fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparisons using FDR at a threshold of p<.05. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque.

5.5.1.2. Model 2: Processing of Correctly Categorized Species trials

We computed the same contrasts of interest as in model 1 and observed the enhancement of similar brain regions as above for chimpanzee vocalizations. Thus, brain activity specific to the correct categorization of chimpanzee vocalizations ([chimpanzee hits > human hits, bonobo hits, macaque hits]) was enhanced bilaterally in the IFG (see Figure 59 ab). When directly contrasting hits for chimpanzee vs human vocalizations ([chimpanzee hits > human hits]), activity was enhanced in similar regions of the bilateral IFG (see Figure 59 cd). With the exception of the left pars orbitalis, the recognition of macaque calls ([macaque hits > human hits]) also led to an increase of hemodynamic responses in the bilateral IFG, as described above (see Figure 60 cd). On the contrary, the correct categorization of bonobo vocalizations ([bonobo hits > human hits]) bilaterally enhanced activity in the pars triangularis only (see Figure 60 ab). As in model 1, we found no significant activity in the IFG for the correct recognition of bonobo and macaque vocalizations compared to other species’ vocalizations.


Figure 59: Wholebrain results when selectively contrasting correct categorization of chimpanzee against other species’ vocalizations. Enhanced brain activity for chimpanzee compared to other species’ vocalizations (ab) and chimpanzee calls compared to human voices (cd) (dark red to yellow). ac: left hemisphere; bd: right hemisphere. Brain activations (a-d) are independent of low-level acoustic parameters for all species (fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparisons using FDR at a threshold of p<.05. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque.


Figure 60: Wholebrain results when selectively contrasting correct categorization of bonobo and macaque against other species’ vocalizations. Enhanced brain activity for bonobo (ab) and macaque (cd) vocalizations compared to human voices (dark red to yellow). ac: left hemisphere; bd: right hemisphere. Brain activations (a-d) are independent of low-level acoustic parameters for all species (fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparisons using voxel-wise false discovery rate (FDR) at a threshold of p<.05. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque.

5.5.2. Accuracy

As described in Chapter 4, we found significant differences in the recognition of the four species by human participants (F(3) = 69.097, p<.001). Participants categorized human voices more accurately compared to chimpanzee (χ2(1) = 73.337, p<.001), bonobo (χ2(1) = 196.54, p<.001), and macaque vocalizations (χ2(1) = 95.433, p<.001). Participants were also better at categorizing chimpanzee vocalizations compared to bonobo calls (χ2(1) = 29.761, p<.001). Note that the contrast [chimpanzee vs macaque] did not reach significance after Bonferroni correction (see Figure 61).

Figure 61: Rate (%) and SD of human recognition of primate species through their vocalizations. The dotted line represents the 25% chance level.

5.6. Interim Discussion

The present study emphasizes for the first time the role of IFG in the human categorization of heterospecific vocalizations expressed by phylogenetically close species, namely other primates.

Wholebrain analyses indeed revealed an enhancement of activity in bilateral IFGtri, IFGope and IFGorb for the human categorization of chimpanzee calls compared to human, bonobo and macaque vocalizations. Moreover, with the exception of IFGorb for bonobos, similar fMRI activity maps were found within the bilateral IFG for the categorization of chimpanzee, bonobo and macaque screams when contrasted with human voices. These findings extend previous fMRI results showing a gradient of activations in the left IFG (uncorrected threshold of p < .001) for the human discrimination of emotions in the human voice compared to chimpanzee vocalizations and then to macaque calls (Fritz et al., 2018). Interestingly, while the correct recognition of chimpanzee vocalisations also led to an involvement of the bilateral IFG, the correct categorization of bonobo and macaque calls enhanced activity in specific subparts of the frontal regions, respectively IFGtri and

IFGtri/ope. Extending the existing literature on the human auditory processing of human voices, and in light of our findings, we can surmise that the categorization of NHP vocalizations requires the involvement of i) the right IFG for the processing of slow auditory variations (Frühholz & Grandjean, 2013b; Schirmer & Kotz, 2006); ii) the left IFG for the decoding of short-scale information such as call roughness (Frühholz & Grandjean, 2013b; Grandjean, 2020); iii) the bilateral IFGtri and IFGope for the general decision-making processing of heterospecific vocalizations (Dricu et al., 2017); and iv) the bilateral IFGorb for the integration of prosodic modulation (Belyk et al., 2017) in chimpanzee calls only, via its connection to the posterior superior temporal sulcus (Frühholz et al., 2015). In line with our fMRI results and previous comparative studies (Kelly et al., 2017), the behavioural data also highlighted the difference in recognition between the two great ape species. Human participants were indeed capable of identifying human voices, chimpanzee vocalizations and macaque calls, but were unable to do so for bonobo calls. Consequently, the peculiar evolutionary pathway of bonobos as well as the infrequent acoustic features of their calls (Grawunder et al., 2018; Hare et al., 2012) seem to prevent human participants from accurately recognizing the bonobo species, despite its close phylogenetic proximity to Homo sapiens and its membership in the great ape clade. Finally, regarding the correct categorization of macaque calls, previous comparative studies have interestingly demonstrated the relationship between the capacity of humans to accurately recognize the emotional content of vocalizations expressed by macaques (Macaca sylvanus and Macaca arctoides) and the modulation of acoustic features such as fundamental frequency (Filippi et al., 2017) and energy (Linnankoski et al., 1994).

Overall, humans appear able to identify most NHP species, with associated enhanced activity in the bilateral IFG. However, the involvement of all subparts of the IFG seems to require vocalizations expressed by species that are evolutionarily, acoustically and behaviourally close to humans.

To conclude, the present study revealed for the first time the sensitivity of bilateral IFG to NHP vocalizations, highlighting the evolutionary function of these specific frontal regions.

Furthermore, the absence of activations in IFGope and IFGorb, as well as the failure of human participants to categorize bonobo calls, reveals the influence of the bonobos’ evolutionary divergence on such processes. Overall, our results support the hypothesis of a continuum in primate brain evolution and will hopefully contribute to a better understanding of IFG functions in the human brain. Future studies focusing on emotional judgement may be particularly informative in this matter. Most NHP vocalizations are indeed expressed in a motivational or emotional context that modulates their prosodic features, which in turn have been found to influence categorization processing at the behavioural and cerebral levels for human voices (Belin, 2006; Filippi et al., 2017; Fritz et al., 2018; Kelly et al., 2017; Scheumann et al., 2014, 2017). We expect a similar pattern to emerge for the listening of closely related vocalizations, underlining once again the close continuity at the brain level amongst all primates, including humans.

5.7. Supplementary Material

Reaction time

An ANOVA revealed significant differences in participants’ reaction times between the species (F(3) = 87.264, p<.001). All contrasts were significant after Bonferroni correction except the chimpanzee vs macaque and bonobo vs macaque contrasts.

Therefore, participants showed a faster reaction time in categorizing human voices compared to chimpanzee (χ2(1) = 113.2, p<.001), bonobo (χ2(1) = 126.95, p<.001), and macaque vocalizations (χ2(1) = 218.74, p<.001). Participants were also faster to recognize human and then chimpanzee and bonobo vocalizations compared to macaque calls (χ2(1) = 265.56, p<.001).


Figure 62: Reaction time (ms) and SD of human recognition of primate species through their vocalizations. All contrasts were significant after Bonferroni correction with Pcorrected = .01.


Chapter 6.

Brain Activation Lateralization in Monkeys (Papio anubis) following Asymmetric Motor and Auditory stimulations through functional Near Infrared Spectroscopy

Coralie Debracque*, Thibaud Gruber*, Romain Lacoste, Didier Grandjean‡ and Adrien Meguerditchian‡

Submitted

6.1. Abstract

Hemispheric asymmetries have long been seen as characterizing the human brain; yet, an increasing number of reports suggest the presence of such brain asymmetries in our closest primate relatives. However, most available data in non-human primates have so far been acquired through neuro-structural approaches such as MRI, while comparative data in humans are often dynamically acquired through neuro-functional studies. In the present exploratory study in baboons (Papio anubis), we tested whether brain lateralization could be recorded non-invasively using a fNIRS device in two contexts: motor and auditory passive stimulations. Under light propofol anaesthesia monitoring, 3 adult female baboons were exposed to a series of i) left- versus right-arm passive movement stimulations; and ii) left-ear versus right-ear versus stereo auditory stimulations, while fNIRS signals were recorded over the related brain areas (i.e., the motor central sulcus and superior temporal cortices respectively). For the motor condition, our results show that left- versus right-arm stimulations induced the typical contralateral difference in hemispheric activation asymmetries in the 3 subjects for all 3 channels. For the auditory condition, we also revealed typical human-like patterns of hemispheric asymmetries in 1 subject for all three channels, namely i) typical contralateral differences in hemispheric asymmetry between left-ear versus right-ear stimulations, and ii) a rightward asymmetry for stereo stimulations. Overall, our findings support the use of fNIRS to investigate brain processing in non-human primates from a functional perspective, opening the way for the development of non-invasive procedures in non-human primate brain research.


Keywords: fNIRS, hemispheric lateralization, primate, motor perception, auditory perception

6.2. Introduction

Lateralization is often presented as a key characteristic of the human brain, separating it from other animal brains (Eichert et al., 2019); yet, an increasing number of studies, particularly in non-human primates (from here onward, primates), dispute this claim across a broad array of topics ranging from object manipulation and gestural communication to producing or listening to species-specific vocalizations (Fernández-Carriba et al., 2002; Hook-Costigan & Rogers, 1998; Lindell, 2013; Margiotoudi et al., 2019; Meguerditchian et al., 2012, 2013). For instance, several primate studies present behavioral evidence of manual lateralization (Fitch & Braccini, 2013; Meguerditchian et al., 2013), which has been shown to be associated with contralateral hemispheric correlates at the neuro-structural level (Margiotoudi et al., 2019; Meguerditchian et al., 2012). Other examples show orofacial asymmetries during vocal production, as evidenced by more pronounced grimaces on the left side of the mouth, suggestive of right hemisphere dominance in monkeys and great apes (Fernández-Carriba et al., 2002; Hook-Costigan & Rogers, 1998), as has been documented in humans (Moreno et al., 1990). In addition, comparative structural neuroimaging has shown that particular areas known to be leftwardly asymmetric in humans, such as the Planum Temporale in the temporal cortex, also present a leftward asymmetry in both monkeys and great apes (Gannon et al., 1998; Hopkins et al., 2015; Marie et al., 2018; Pilcher et al., 2001), although the bias at the individual level seems to be more pronounced in humans (Rilling, 2014; Yeni-Komshian & Benson, 1976).

At the neural functional level, using fMRI or Positron Emission Tomography (PET), most available studies in nonhuman primates have focused on the lateralization of the perception of synthesized sinusoidal or more complex vocal signals, and have reported inconsistent results. For instance, in rhesus macaques, the processing of species-specific and/or heterospecific calls as well as non-vocal sounds elicited various patterns of lateralized activations within the STG, such as in the left lateral parabelt, either toward the right hemisphere or the left depending on the study (Gil-da-Costa et al., 2006; Joly et al., 2012; Petkov et al., 2008; Poremba et al., 2004). In chimpanzees, a similar PET study reported a rightward activation within the STG for processing conspecific calls (Taglialatela et al., 2009). In general, such variability in the direction of hemispheric lateralization for processing calls appears similar to the variability of hemispheric lateralization described in humans for language processing, depending on the type of auditory information and on the language functions that are processed (Belin et al., 2000; Schirmer & Kotz, 2006; Zatorre & Belin, 2001).

Compared to the leftward bias suggested for language, research investigating emotion perception in primates has strengthened the idea of a rightward bias in lateralization specific to emotion processing (Lindell, 2013). For example, Parr and Hopkins found that right ear temperature increased in captive chimpanzees when they were watching emotional videos, consistent with greater right hemisphere involvement (Parr & Hopkins, 2000). The rightward hemispheric bias documented in chimpanzees is also found in other primate species, such as olive baboons during natural interactions, as evidenced by studies investigating the perception of visual emotional stimuli (Baraud et al., 2009; Casperd & Dunbar, 1996; Wallez & Vauclair, 2011). Yet, while the right hemisphere has understandably received much focus, the left hemisphere is also involved in emotion processing. For example, Schirmer and Kotz have suggested that the left hemisphere is particularly involved in the processing of short segmental information during emotional prosody decoding (Schirmer & Kotz, 2006). Whether this functional differentiation, essential for speech perception in humans (Grandjean, 2020), is also present in non-humans is unclear. Baboons appear in this respect a particularly interesting animal model for studying lateralization, with several recent studies underlining the similarities in manual and brain asymmetries with humans (Margiotoudi et al., 2019; Marie et al., 2018; Meguerditchian & Vauclair, 2006). Furthermore, the baboon brain is on average twice as large as the macaque brain (Leigh, 2004), which may facilitate the specific investigation of sensory regions. Finally, this species has all the primary cortical structures found in humans (Kochunov et al., 2010).

However, a major drawback in current studies lies in the complexity with which brain asymmetry can be investigated comparatively in primates. Here, we used fNIRS in baboons to test whether the blood oxygen level dependent (BOLD) response differed between the two hemispheres following left- versus right-asymmetric auditory and motor stimulations. fNIRS is a non-invasive optical imaging technique that was initially developed to investigate brain processes in potentially at-risk populations such as human premature new-borns, but which is now widely used with adult human participants. fNIRS is a relatively young imaging technique, with around two decades of use for functional research (Boas et al., 2014). Considering its portability and its lower sensitivity to motion artefacts (Balardin et al., 2017) compared to other non-invasive techniques, it might be an excellent methodology to study brain activations in primates under more ecologically relevant testing conditions, for example with a wireless and wearable device. As a first step, the present study tested fNIRS in baboons immobilized under light anaesthesia monitoring. For each of the stimulation types, we targeted the relevant corresponding brain regions of interest - the motor cortex within the central sulcus and the auditory cortex regions in the temporal lobe, respectively - in both hemispheres by positioning two sets of fNIRS channels (one per hemisphere for a given region). We predicted that, if fNIRS was suitable for recording brain signal in baboons, it would reflect contralateral hemispheric asymmetries in signals for each stimulation type within the corresponding brain region of interest, namely the motor cortex, associated with right- versus left-arm movements, and the temporal cortex, associated with right- versus left-ear versus stereo auditory presentations. This latter prediction was modulated by the knowledge that auditory regions are less lateralized than cortical motor regions, with about fifty percent of fibres projecting bilaterally (Robinson & Burton, 1980; Smiley & Falchier, 2009).

6.3. Materials and methods

6.3.1. Subjects

We tested 3 healthy female baboons (Talma, Rubis and Chet; mean age = 14.6 years, SD ± 3.5 years). The subjects had normal hearing abilities and did not present any neurological impairment. All animal procedures were approved by the “C2EA-71 Ethical Committee of Neurosciences” (INT Marseille) and were conducted at the Station de Primatologie CNRS (UPS 846, Rousset-sur-Arc, France) under agreement number C130877 for conducting experiments on animals. All methods were performed in accordance with the relevant French law, CNRS guidelines and European Union regulations (Directive 2010/63/EU). All monkeys were born in captivity from 1 (F1) or 2 generations (F2) and are housed in social groups at the Station de Primatologie, where they have free access to both outdoor and indoor areas. All enclosures are enriched with wooden and metallic climbing structures as well as substrate on the ground to favour foraging behaviours. Water is available ad libitum, and monkey pellets, seeds, fresh fruits and vegetables are given every day.

6.3.2. Subject’s hand Preference in Communicative Gesture and Bi-Manual task

The impact of a subject’s handedness on the cerebral lateralization of language, motor and visual functions is well known in human neuroscience (35). For this reason, we report here the hand preference of individual baboons during a visual communicative gesture (CG - slapping one hand repetitively on the ground toward a conspecific to threaten it) and a bimanual tube task (BM - holding a PVC tube with one hand while removing the food inside the tube with the fingers of the other hand). In both contexts, Talma was left-handed (CG: n=27, HI=-0.56, z-score=-2.89; BM: n=31, HI=-0.42, z-score=-2.33), whereas Rubis showed a preference toward the right hand (CG: n=16, HI=0.25, z-score=1; BM: n=79, HI=1, z-score=8.88). Chet was left-handed in the communicative gesture (n=25, HI=-0.44, z-score=-2.2) but right-handed in the bimanual tube task (n=11, HI=0.45, z-score=1.51).
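The handedness measures reported above are consistent with the standard handedness index HI = (R - L)/(R + L) and a binomial z-score z = (R - n/2)/sqrt(n/4), where R and L are the numbers of right- and left-hand responses; the short R sketch below uses counts reconstructed from the reported values and assumes these conventional formulas, which may differ in detail from the exact procedure used.

hand_stats <- function(right, left) {
  n  <- right + left
  hi <- (right - left) / n             # handedness index, -1 (left) to +1 (right)
  z  <- (right - n / 2) / sqrt(n / 4)  # binomial z-score against chance (0.5)
  round(c(n = n, HI = hi, z = z), 2)
}

hand_stats(right = 6, left = 21)   # Talma, communicative gesture: HI = -0.56, z = -2.89
hand_stats(right = 7, left = 18)   # Chet, communicative gesture:  HI = -0.44, z = -2.20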

6.3.3. Recordings

We selected one of the most wearable, wireless and lightweight fNIRS devices available on the market (Portalite, Artinis Medical Systems B.V., Elst, The Netherlands) to measure brain activations in baboons during the motor and auditory stimulations. The data were acquired at 50 Hz using six channels (three per hemisphere), three probe inter-distances (3 - 3.5 - 4 cm) and two wavelengths (760 and 850 nm). To localize our regions of interest (ROIs), the motor and auditory cortices, the fNIRS probes were placed using T1 MRI images previously acquired on baboons by the LPC group (see Figure 63). Each fNIRS session was planned during a routine health inspection undergone by the baboons at the Station de Primatologie. As part of the health check, subjects were isolated from their social group and anesthetized with an intramuscular injection of ketamine (5 mg/kg - Ketamine 1000®) and medetomidine (50 µg/kg - Domitor®). Then sevoflurane (Sevotek®) at 3 to 5% and atipamezole (250 µg/kg - Antisedan®) were administered before the recordings. The area of interest on the scalp was shaved. Each baboon was placed in ventral decubitus position on the table, and the head was kept straight using foam positioners, cushions and Velcro strips to reduce potential motion occurrences. Vital functions were monitored (SpO2, respiratory rate, ECG, EtCO2, T°) and a drip of NaCl was put in place during the entire anaesthesia. Just before the recording of brain activations, sevoflurane inhalation was stopped and the focal subject was further sedated with a minimal amount of intravenous Propofol (Propovet®), with a bolus of around 2 mg/kg every 10 to 15 minutes or an infusion rate of 0.1 - 0.4 mg/kg/min. After the recovery period, baboons were returned to their social group at the Station de Primatologie and monitored by the veterinary staff.

Figure 63: Schematic representation of fNIRS channel locations on ROIs according to T1 MRI template from 89 baboons (Love et al., 2016) for (a) the motor and (b) the auditory stimulations. Red and blue dots indicate receivers and transmitters’ positions respectively. Yellow dots indicate the channel numbers.

6.3.4. Motor stimulations

The motor stimulations consisted of blocks of 20 successive extensions of the same arm, alternately right and left, repeated three times according to the same set plan (L-R-R-L-L-R) for all baboons, resulting in a total of 120 arm movements. One experimenter on each side of the baboon slowly extended the respective arm while stimulating the inner side of the hand (gentle rhythmic tapping) with their fingers throughout the duration of the extension (about 5 s), upon a brief vocal command triggered by another experimenter. Between each block, there was a 10 s lag.

6.3.5. Auditory stimulations

The auditory stimuli consisted of 20 s long series of agonistic vocalizations of baboons and of chimpanzees recorded in social settings (in captivity in an outside enclosure for baboons, and in the wild for chimpanzees). Equivalent white-noise stimuli matched for the energy dynamics (i.e. the sound envelopes) were produced and used as a comparison to control for sound energy dynamics. In the present study and analysis, we only examine the effect of the lateralization of auditory stimulations (i.e., left ear versus right ear versus stereo) as a whole on hemispheric asymmetry and thus do not distinguish between auditory signal types or species (e.g. white noise and vocalizations). The auditory stimuli were broadcast pseudo-randomly, alternating voiced and white-noise stimuli separated by 15 s silences, either binaurally (stereo), only on the left side, or only on the right side. Due to signal artefacts and anaesthesia shortfalls, the number of stimuli differs slightly between the three baboons: 37 stimuli for Talma, 47 for Rubis, and 25 for Chet.

6.4. Analysis

6.4.1. fNIRS signal

We performed the first-level analysis with MATLAB 2018b (MathWorks, Natick, MA) using the SPM_fNIRS toolbox (Tak et al., 2016; https://www.nitrc.org/projects/spm_fnirs/) and homemade scripts. Hemoglobin conversion and temporal pre-processing of O2Hb and HHb were performed using the following procedure:

1. Hemoglobin concentration changes were calculated with the modified Beer-Lambert law (Delpy et al., 1988); a sketch of this conversion is given after this list;

2. Motion artifacts were removed manually in each individual and each channel for the auditory stimulations. Thus, 10 seconds in total (1.3%) were removed from the O2Hb and HHb signals of Rubis, and 35 seconds (4.8%) from the fNIRS data of Talma and Chet;

3. A low-pass filter based on the HRF (Friston et al., 2000) was applied to reduce physiological confounds;


4. A baseline correction was applied to both the motor and auditory stimulations by subtracting, respectively, i) the average of the 10-second interval preceding each block; and ii) the average of the 15 seconds of silence preceding each sound.
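As a sketch of step 1 above, the modified Beer-Lambert conversion turns optical-density changes at the two wavelengths (760 and 850 nm) into O2Hb and HHb concentration changes by inverting a 2x2 system; in the R illustration below, the extinction coefficients, source-detector distance and differential pathlength factor (DPF) are placeholder values for illustration only, not the ones used by the SPM_fNIRS toolbox.

# delta_OD = eps %*% delta_conc * distance * DPF
# => delta_conc = solve(eps, delta_OD / (distance * DPF))
mbll <- function(d_od_760, d_od_850, distance = 3.5, dpf = c(6, 6),
                 eps = matrix(c(1.49, 0.69,    # placeholder coefficients
                                0.78, 1.16),   # rows: 760, 850 nm; cols: HHb, O2Hb
                              nrow = 2, byrow = TRUE)) {
  d_od <- c(d_od_760, d_od_850) / (distance * dpf)
  setNames(solve(eps, d_od), c("dHHb", "dO2Hb"))
}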

According to the temporal properties of the BOLD responses of each baboon, the O2Hb concentration was averaged for Talma in a window of 4 to 12 s post stimulus onset for each trial, and for Rubis and Chet in a window of 2 to 8 s post stimulus onset, in order to select the range of maximum concentration changes (µM). The difference in averaging windows is explained by the presence of tachycardia episodes in both Rubis and Chet during the experiment, yielding an HRF almost twice as fast as the one found for Talma.
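A minimal sketch of this averaging step (in R), assuming a 50-Hz O2Hb vector and stimulus onsets in seconds; the window bounds are passed per subject (4-12 s for Talma, 2-8 s for Rubis and Chet), and all object names are illustrative.

window_mean <- function(o2hb, onset_s, win = c(4, 12), fs = 50) {
  idx <- seq(round((onset_s + win[1]) * fs) + 1,   # +1 because R vectors are 1-indexed
             round((onset_s + win[2]) * fs))
  mean(o2hb[idx])
}

# e.g. mean O2Hb 2-8 s after an onset at 120 s, for Rubis or Chet:
# window_mean(o2hb_ch1, onset_s = 120, win = c(2, 8))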

6.4.2. AQ score calculation

Asymmetry Quotients (AQ) were derived for each subject and each experimental condition (i.e., stimulation of the right arm and of the left arm for the motor experiment; right, left and stereo audio stimulation for the auditory blocks) by first calculating the difference between the right-hemisphere and the left-hemisphere values, from which we subsequently subtracted the same difference during the preceding baseline block for the same subject, to normalize across trials. In particular, for motor stimuli, the baseline was the 10 s block without motor activity immediately preceding a passive stimulation block of the right or left arm. For auditory stimuli, the baseline was calculated on the 15 s silence block that immediately preceded the auditory stimuli. In this analysis, all auditory stimuli (baboon and chimpanzee calls, and corresponding white noises) were analysed together. All calculated AQs were then normalized using the scale function of R (R Studio Inc., Boston, MA, url: http://www.rstudio.com/). For this analysis, we excluded one block ‘chimpanzee white noise audio stereo’ (2.7% of O2Hb signal) for Rubis, and two blocks ‘chimpanzee white noise audio stereo’ and ‘baboon white noise audio stereo’ (8.3%) for Talma, as the recorded data proved artefactual beyond repair. Positive AQ values indicate a rightward asymmetry and negative values a leftward asymmetry. Finally, using the aov function of R, we performed one-way ANOVAs with pairwise comparisons for each individual baboon, comparing the AQs of all trials across the different stimulation conditions (right versus left motor stimulation; right versus left versus stereo auditory stimulation), enabling generalization of the data of each individual.
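A minimal R sketch of this AQ pipeline is given below, assuming a data frame with one row per trial holding right- and left-hemisphere values for the stimulation window and the preceding baseline; all column and object names are illustrative, not the actual analysis scripts.

dat$aq      <- (dat$right - dat$left) - (dat$right_base - dat$left_base)
dat$aq_norm <- as.numeric(scale(dat$aq))     # z-normalization with R's scale function

# One-way ANOVA per individual across stimulation conditions, then pairwise tests
chet <- subset(dat, subject == "Chet")
summary(aov(aq_norm ~ condition, data = chet))
pairwise.t.test(chet$aq_norm, chet$condition)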


6.5. Results

6.5.1. Motor stimulations

One-way ANOVAs revealed significant differences between the left- and right-arm stimulations across the three channels and baboons. Hence, for Rubis and Chet, comparisons between left- and right-arm stimulations were all significant at p < .001 (Rubis: Ch1: F(1,118) = 52.63; Ch2: F(1,118) = 50.63; Ch3: F(1,118) = 42.35; Chet: Ch1: F(1,118) = 30.16; Ch2: F(1,118) = 28.21; Ch3: F(1,118) = 24.77). Regarding Talma, significant differences were found at p < .05 in channel 1 (F(1,118) = 3.821) and channel 3 (F(1,118) = 6.521); the pairwise comparison in channel 2 (F(1,118) = 14.71) was significant at p < .001. Overall, the differences in AQ between left- versus right-arm stimulations were consistently contralateral across the 3 subjects for all 3 channels: activation asymmetries were more leftward for right-arm stimulations than for left-arm stimulations, and more rightward for left-arm stimulations than for right-arm stimulations (Figure 64). See Table 14 in the Supplementary Material for the mean AQ values.

Figure 64: Normalized averaged AQ (and corresponding SE) in the motor cortex following motor stimulations in the three adult female baboons (see Figure 63 for localization of the channels).

6.5.2. Auditory stimulations

We only found significant overall differences between right, left and stereo ear stimulations (p < .05) for the subject Chet (Figure 65) for all channels (Ch1: F2,6 = 7.073; Ch2: F2,6 = 6.473; and Ch3: F2,6 = 4.289). Pairwise comparisons for right versus left ear stimulations were significant (p < .05) in channel 1 (F1,6 = 5.216) and channel 2 (F1,6 = 5.043). Furthermore, significant differences between right and stereo ear stimulations appeared across all channels (Ch1: F1,6 = 22.55; Ch2: F1,6 = 16.56, p < .001; Ch3: F1,6 = 15.95, p < .05). Note that the comparison left versus stereo did not reach significance for any channel (Ch1: F1,6 = 1.827; Ch2: F1,6 = 1.825; Ch3: F1,6 = 0.989, all p > .05). Hence, for Chet, there was a larger bias toward the left hemisphere with right ear stimulation compared to stereo (for all channels) and to left ear stimulation (for channels 1 and 2 only; Figure 65). No difference was significant for the two other baboons (see Table 15 in Supplementary Material for the mean AQ values).

Figure 65: Normalized averaged AQ (and corresponding SE) above the temporal cortex following auditory stimulations in three adult female baboons (see Figure 63 for localization of the channels).

6.6. Interim Discussion

The results of the present study clearly demonstrate that non-invasive fNIRS is a valid imaging technique to investigate functional lateralization paradigms in a nonhuman primate species.

Our most potent results were found with the motor stimulation, where we observed a strong contralateral hemispheric asymmetry of the fNIRS signals in the motor cortex across baboons. Right-arm movements elicited greater leftward asymmetry than left-arm movements and vice versa in each of the three baboons for all three fNIRS channels. Results were clear-cut for Rubis and Chet, though interestingly opposed, with Rubis having a strong leftward asymmetry as a result of her right arm being stimulated, and Chet showing a strong rightward asymmetry for her left arm. Results for Talma were somewhat similar to Rubis' since right-arm movements elicited more leftward asymmetry than the left arm in channels 1 and 3. Results in channel 2 were most in line with our original prediction, namely a clear mirror pattern of contralateral asymmetries between the two arms: right-arm movements elicited a leftward asymmetry and left-arm movements a rightward asymmetry. Our results are consistent with previous studies in primates: for arm/hand movements, 90% of the corticospinal pathway projects to the contralateral spinal cord (Brösamle & Schwab, 1997; Dum & Strick, 1996; Heming et al., 2019; Lacroix et al., 2004; Rosenzweig et al., 2009). Hence, our study replicates these findings, with brain signal differences detected by non-invasive fNIRS. Despite the robust consistency of findings across subjects concerning the direction of the effect between the left and the right arms, the reasons for the inter-individual variability, as well as the lack of a mirror pattern of results between the two arms (channel 2 of Talma excepted), remain unclear. In particular, potential involuntary differences in the degree of arm stimulation between the two experimenters involved in manipulating each subject's arms, as well as the handedness of each individual baboon, may have had an impact on our results.

These results were also consistent with the predicted asymmetries regarding auditory stimulations for one subject. Contralateral differences of asymmetry were found for Chet in all three channels, with the stimulation of both ears and of the left ear eliciting overall more rightward asymmetries than right ear stimulations. Nevertheless, for Talma and Rubis, the direction and degree of asymmetry varied irrespective of whether the sound was presented to the right or left ear, namely toward the left temporal side for Rubis and toward the right temporal areas for Talma. These mixed results related to auditory stimulation might be interpreted with respect to some characteristics of the hemispheric organization of the brain. It is well known that at least one third of the auditory fibres from the olivary complex project to ipsilateral brain regions, inducing less lateralization compared to motor brain regions. Furthermore, it has been shown that receptive fields in some regions sensitive to somatosensory input from the auditory cortex are 50% contralateral and 50% bilateral (Robinson & Burton, 1980; Smiley & Falchier, 2009), and that temporal regions such as the belt, parabelt and STS receive strong ipsilateral connections in rhesus macaques (Cipolloni & Pandya, 1989; Hackett et al., 1998), suggesting overall a less marked lateralization for auditory processing compared to motor regions. Interestingly, the subjects' handedness in communicative gestures could also explain these mixed results. In fact, our left-handed subject Talma showed a clear right hemisphere bias for most stimuli (with the exception of the right ear stimulation in channel 2), whereas Rubis, right-handed in communicative gestures, showed a stronger bias toward the left hemisphere for the sounds broadcast to the right and left ears. These preliminary findings may thus highlight the impact of hand preference in communicative contexts on contralateral brain organization in baboons during auditory processing, but would need further investigation in a larger cohort of subjects. Overall, given the lack of statistical power related to the low sample size, we cannot draw any conclusion regarding the direction of hemispheric lateralization at the population level for sound processing in baboons, or its relation to hand preference for communicative gesturing. Nevertheless, some of our findings remain consistent with the literature on human auditory pathways: for example, Kaiser and collaborators found that stimuli presented in stereo activated the right hemisphere more compared to lateralized sounds, which showed a left hemisphere bias (Kaiser et al., 2000). These results suggest that stereo sounds involve additional processing steps resulting in stronger and more rightward brain activations (Jäncke et al., 2002). This pattern of rightward asymmetry for stereo and left sound processing in the baboon Chet is also somewhat consistent with previous rightward asymmetries reported in rhesus monkeys (Gil-da-Costa et al., 2006) and in chimpanzees (Taglialatela et al., 2009) for processing conspecific calls. Hence, our data suggest that a phylogenetic functional approach to vocal perception appears possible with fNIRS.

In conclusion, our study shows that fNIRS is a valid methodology to access brain signals in primates non-invasively. In particular, we have replicated findings in the literature on contralateral hemispheric brain activation in two different modalities, showing that fNIRS is able to capture such functional differences even in a context in which the baboons were anesthetized. However, we have also uncovered large variation between individuals. This may be due to inter-individual differences leading to the inability to record precisely at the same spot for all baboons. Indeed, while we placed the optodes according to an averaged structural MRI pattern to which all tested individuals contributed, we cannot exclude small variations across cortices. In the future, fNIRS should thus be coupled with structural imaging techniques such as MRI that allow a precise positioning of the optodes for each individual. Yet, the need to couple fNIRS with existing techniques does not preclude a more widespread use of fNIRS in the future. On the contrary, we believe that our study opens new avenues for brain investigation in nonhuman primates using fNIRS for two main reasons. First, fNIRS has been used in a multitude of contexts where other brain imaging techniques could not be used, for example in the field under more ecological conditions (Piper et al., 2014). While our data were recorded in anesthetized baboons, a logical next step is to train and habituate baboons to accept wearing a fNIRS device. Our experimental paradigms could then be extended to awake baboons with more sophisticated designs involving behavioural contingencies related to different kinds of stimulation. Second, our study stresses that fNIRS could in the future become a valuable method to explore brain activations in lateral regions in a non-invasive way in nonhuman animals, without compromising the physical integrity of the subjects, which would ultimately make the investigation of brain mechanisms in animals much more accessible and flexible.

6.7. Supplementary Material

Calculated mean AQ:

Asymmetry Quotients (AQ) were derived for each subject and each experimental condition (i.e. stimulation of the right arm and of the left arm for the motor experiment; right, left and stereo audio stimulation for the auditory blocks) by first calculating the difference between the right hemisphere and the left hemisphere values, from which we subsequently subtracted the same difference computed on the preceding baseline block for the same subject, to normalize across trials. In particular, for motor stimuli, the baseline was the 10 s block without motor activity immediately preceding a passive stimulation block of the right or left arm. For auditory stimuli, the baseline was calculated on the 15 s silence block that immediately preceded the auditory stimuli. In this analysis, all auditory stimuli (baboon and chimpanzee calls, and corresponding white noises) were analysed together. All calculated AQs were then normalized using the scale function of R (RStudio, Inc. (2015), Boston, MA, url: http://www.rstudio.com/). For this analysis, we excluded one block ‘chimpanzee white noise audio stereo’ (2.7% of the O2Hb signal) for Rubis, and two blocks ‘chimpanzee white noise audio stereo’ and ‘baboon white noise audio stereo’ (8.3%) for Talma, as the recorded data proved artefactual beyond repair. Positive AQ values indicate a rightward asymmetry and negative values a leftward asymmetry.


Table 14: Calculated mean AQ values for subjects Talma, Rubis and Chet during the motor stimulation.

             Channel 1            Channel 2            Channel 3
             Left      Right      Left      Right      Left      Right
Talma       -0.119    -0.330      0.320    -0.146      0.195    -0.609
Rubis        0.362    -1.940      0.089     1.603     -0.117    -1.572
Chet         1.448    -0.067      1.871     0.064      2.247     0.299

Table 15: Calculated mean AQ values for subjects Talma, Rubis and Chet during the auditory stimulation.

            Channel 1                   Channel 2                   Channel 3
            Left     Right    Stereo    Left     Right    Stereo    Left     Right    Stereo
Talma       0.557    0.339    0.497     0.426   -0.639    0.371     0.426    0.822    0.268
Rubis      -0.448   -0.468    0.060    -0.438   -0.501    0.258    -0.462   -0.449    0.332
Chet        0.312   -1.865    1.961     0.226   -2.268    2.189     0.185   -2.718    1.395


General Discussion

1. Synthesis and Integration of the Main Findings

From the human recognition of emotions in conspecific and NHP vocalizations to the passive listening of affective calls in baboons, the present thesis investigated these phenomena in the auditory modality from an evolutionary perspective.

Study 1 indeed crucially demonstrated, using fNIRS, the difference in mechanisms between the explicit categorization and discrimination of emotions and their implicit processing in the human voice. In accordance with previous fMRI findings showing a distinct involvement of the IFC in biased and unbiased choices (Dricu et al., 2017), and with the literature on bilateral frontal approaches to emotions (e.g. Davidson, 1992; Frühholz & Grandjean, 2013b; Grandjean, 2020; Schirmer & Kotz, 2006), we found modulations of the bilateral IFC depending on the categorization or the discrimination of angry and fearful contents in implicit (word recognition task) and explicit (emotion identification task) auditory decoding (Figure 31). These results are supported by behavioural data (Figures 27 & 28) showing that, in the explicit task, participants were more accurate at categorizing than at discriminating but took longer to select their answer, while in the implicit task, participants were better at discriminating than at categorizing words and took less time to answer. Therefore, our findings suggest separate cerebral mechanisms for the vocal recognition of emotions, depending on the level of complexity (number of possible choices) in perceptual decision-making. However, it remains unclear whether such processes are unique to the human voice or are also at play for heterospecific vocalizations.

In order to clarify this point, Study 2 revealed the existence of similar processes for the human recognition of affect in calls expressed by other primate species, namely chimpanzees, bonobos and rhesus macaques. fNIRS data indeed showed a modulation of the bilateral PFC and IFGtri depending on the categorization and discrimination of affects in all primate species (humans included) (Figure 38). Differences in affective decision-making, at least in auditory processing, thus seem to exist independently of the primate species that expressed the vocalizations. Interestingly, further analyses demonstrated that the correct categorization of agonistic chimpanzee and bonobo screams, as well as of affiliative chimpanzee calls, was associated with a decrease of activity in bilateral PFC and IFGtri. On the contrary, the accurate discrimination of agonistic chimpanzee screams was correlated with a bilateral enhancement of PFC and IFGtri (Figure 37). Although these last findings seem counterintuitive, several explanations are possible. First, the categorization of agonistic vocalizations expressed by great apes, our closest relatives, could involve inhibition mechanisms to reduce the potentially induced high level of stress; frontal regions are indeed the brain areas most sensitive to stress exposure (Arnsten, 2009). Second, affiliative chimpanzee calls may have been perceived as agonistic due to their loud and low-frequency characteristics (Kelly et al., 2017; Kret et al., 2018), acoustic parameters that are usually associated with aggressive behaviours (Briefer, 2012; Morton, 1977). Third, the enhancement of activity in frontal regions for agonistic chimpanzee vocalizations could rely on distinct mechanisms between the categorization and discrimination tasks in cross-taxa recognition: the simpler choice between A versus non-A (compared to categorization) would not involve inhibition processes relying on stress reduction. Finally, the level of choice complexity also affected participants' behaviour. In fact, with the exception of threatening bonobo calls, human participants were able to discriminate all affective cues in all primate species, whereas in the categorization task they were unable to do so for macaque vocalizations (Figure 36). It seems that the lower level of complexity involved in discrimination processing, compared to the more complex categorization mechanisms, allowed participants to discriminate affective vocalizations more accurately in all primates, including species at a greater phylogenetic distance from humans such as monkeys. Regarding the non-recognition of threatening cues in bonobo calls, participants may indeed have been biased by the peaceful nature of the species (Gruber & Clay, 2016), which might be related to the higher F0 of their screams (Grawunder et al., 2018). Overall, similar brain and behavioural mechanisms seem involved in perceptual affective decision-making for conspecific and heterospecific vocalizations. Moreover, our data suggest that acoustic as well as phylogenetic proximity could play a crucial role in cross-taxa recognition.

Assessing the role of both phylogenetic and acoustic distances in the human recognition of affects in primate vocalizations, Study 3 importantly revealed strong acoustic similarities between affective chimpanzee calls and the human voice (Figure 42). Consequently, participants were more accurate at categorizing and discriminating affective cues in chimpanzee vocalizations compared to bonobo or macaque screams (Figures 43 and 44). These results highlight the importance of acoustic features in vocal emotion recognition for heterospecific vocalizations. In fact, despite their phylogenetic proximity with Homo sapiens, bonobo calls were not recognized as well as chimpanzee calls by human participants. The peculiar evolutionary pathway of bonobos (Hare et al., 2012; Staes et al., 2018), leading to behavioural (Gruber & Clay, 2016) and vocal expression divergences (Grawunder et al., 2018), may prevent human participants from recognizing, for instance, threatening cues in their calls. Furthermore, in line with Study 2, participants were able to discriminate distressful and affiliative macaque calls, while in the categorization task they were only capable of doing so for affiliative cues. These results emphasize, first, differences between simple and complex choice mechanisms and, second, the distinction between forced-choice and Likert-scale paradigms in recognition performance. In fact, the ability of humans to accurately identify affects in macaque screams is controversial. Most comparative studies using arousal or valence ratings actually failed to demonstrate such capacities (Belin, Fecteau, et al., 2008; Fritz et al., 2018; Scheumann et al., 2014, 2017). However, research investigating the identification of affects using semantic labelling has shown the ability of human adults and infants to recognize various affective contents in macaque vocalizations (Linnankoski et al., 1994). Our findings suggest an intermediate picture in which humans, despite overall poor performance, are nonetheless able to identify affective cues in macaque calls. Finally, the relationships between participants' accuracy and acoustic distances (Figure 45) revealed that acoustic feature similarities facilitated both the categorization and discrimination of affiliative and distressful calls in all primate species. These findings highlight the key role of acoustic proximity in cross-taxa recognition. On the contrary, higher acoustic distances were associated with better performance for threatening human and chimpanzee vocalizations. This last result seems a priori counterintuitive, yet the literature has shown that human infant and chimpanzee vocalizations with agonistic contents lead to similar brain responses in human participants within a novelty oddball paradigm (Scheumann et al., 2017). If threatening contents in human and chimpanzee calls can trigger novelty responses leading to attentional capture, it is possible that, due to their higher acoustic distances, our stimuli were more acoustically salient and thus resulted in better accuracy. Our results suggest that both phylogenetic and acoustic proximities are essential to the correct categorization and discrimination of affect in primate calls. But do the same mechanisms exist in brain regions often linked exclusively to conspecific vocalizations?

Study 4 focused on the TVA, usually associated with the processing of the human voice by human listeners. Nevertheless and importantly, fMRI wholebrain analyses revealed an increase in activity in the left aSTG for the human categorization of chimpanzee screams, as well as an increase of activity in bilateral STC for the recognition of human and chimpanzee vocalizations compared to bonobo and macaque calls (Figure 52). Following this, functional connectivity analyses demonstrated the existence of a similar coupling between the left mMTG and the right mSTS for the perception of human and chimpanzee vocalizations (Figure 53). With the exception of bonobo calls, for which no previous brain data exist in the literature, these results are in line with recent fMRI findings showing a gradient of activity in bilateral STS for the human perception of primate vocalizations, with the strongest neural responses for human voices, followed by chimpanzee calls, and the lowest activations for macaque calls (Fritz et al., 2018). In addition, in line with Studies 2 and 3, our findings highlight the influence of the evolutionary and acoustic divergences of the bonobo species on the recognition of their calls by human participants. Interestingly, further analyses investigated the link between reaction times and the acoustic distances of the vocalizations, and showed that human participants were faster to recognize macaque calls when their acoustic features were the most dissimilar to the human voice (Figure 51). This last finding could be explained by an oddball effect induced by the human recognition of macaque vocalizations with infrequent acoustic features. In fact, previous EEG findings have revealed a P3a ERP for the human perception of macaque screams, emphasizing the involuntary attention switch from familiar to novel stimuli (Scheumann et al., 2017). Hence, the results of Study 4 confirm that both phylogenetic and acoustic proximity are needed to recruit the TVA in the human brain. From this assessment, analogous mechanisms should be at play in the IFG, a region of interest for perceptual decision-making processing.

Using the same paradigm as Study 4, Study 5 revealed increased activity of the pars triangularis, opercularis and orbitalis of the bilateral IFG for the categorization of chimpanzee calls, independently of the correct or incorrect recognition of those calls (Figures 57 and 59). Moreover, wholebrain analyses showed the involvement of the bilateral IFG in the processing of bonobo and macaque vocalizations (Figures 58 and 59), although bonobo calls did not involve the IFGorb. These results extend previous fMRI findings showing a gradient of activation in the left IFG for the human discrimination of emotions in human voices compared to chimpanzee vocalizations and then to macaque calls. Interestingly, our analyses demonstrated the involvement of most subparts of the IFG (left IFGorb excepted) for the correct categorization of macaque vocalizations. On the contrary, activations in the bilateral IFGtri only were found for the correct recognition of bonobo screams. The subparts of the IFG thus seem differently implicated in the processing of heterospecific vocalizations; in agreement with the fNIRS data shown in Study 2, the IFGtri would indeed be particularly involved in such processes. Finally, in line with the behavioural results revealed in Studies 2, 3 and 4, participants were able to identify all NHP vocalizations, with the exception of bonobos (Figure 61), again emphasizing the possible evolutionary divergence (Hare et al., 2012; Staes et al., 2018) as well as the acoustic and behavioural differences (Grawunder et al., 2018; Gruber & Clay, 2016) of this peculiar great ape species. Therefore, human participants seem capable of recognizing most primate species; however, phylogenetic, acoustic and behavioural proximity seems required to enhance activity in all subparts of the bilateral IFG.

Overall, Studies 1 to 5 investigated perceptual decision-making in humans using primate vocalizations (human, chimpanzee, bonobo and macaque). Our findings suggest that, relying on different levels of complexity, distinct behavioural and cerebral mechanisms are involved in categorization and discrimination tasks. Interestingly, we found that such behavioural and brain processes were also at play for the affective recognition of NHP species.

Finally, Studies 1 and 2 demonstrated the suitability of fNIRS for exploring cerebral processing in healthy humans using complex paradigms. Yet, it remained unclear whether such a relatively new device could be used to assess brain mechanisms non-invasively in NHP. To address this, Study 6 investigated, as a proof of concept, hemispheric lateralization in baboons using a wireless and portable fNIRS device. Here, hemodynamic response analyses revealed contralateral activations for motor stimulations, with right-arm movements activating the left hemisphere more and left-arm movements enhancing activity in the right hemisphere (Figure 64). These results are in agreement with previous studies in primates showing that 90% of the corticospinal pathway projects to the contralateral spinal cord (Brösamle & Schwab, 1997; Dum & Strick, 1996; Heming et al., 2019; Lacroix et al., 2004; Rosenzweig et al., 2009). Regarding auditory stimulations, similar contralateral activations were found for the baboon Chet only, with, in addition, a right hemisphere bias for the sounds broadcast in stereo (Figure 65). The lack of results in the two other baboons could be explained by their handedness in communicative gestures. In fact, our left-handed subject Talma showed a clear right hemisphere bias for most of the stimuli, whereas right-handed Rubis showed a stronger bias toward the left hemisphere for the sounds broadcast to the right and left ears. Yet, the findings for stereo stimuli are consistent with the literature on human and NHP auditory pathways showing a rightward asymmetry for stereo sounds (Jäncke et al., 2002; Joly et al., 2012; Kaiser et al., 2000; Petkov et al., 2008). Overall, this last study upholds the suitability of fNIRS to assess brain mechanisms in NHP and opens new avenues of research in free-ranging primates.

Findings from Study 1 to Study 6 will hopefully contribute to a better understanding of i) the human recognition of affects in heterospecific vocalizations; ii) the evolutionary continuity between human and NHP brains; and iii) the development of non-invasive protocols in animal research.

2. Theoretical Implications

In light of evolutionary thought, the present thesis uncovered new findings that could be valuable in affective sciences, psychology, neurosciences or even primatology (see Table 16).

First, the congruent results found across Studies 2 to 5 highlight the importance, for future comparative studies on vocal recognition, of considering both phylogenetic and acoustic proximity (see Figure 66). Yet, studies investigating the link between the acoustic structure of calls and the behavioural or cerebral responses of human participants remain rare. In fact, to our knowledge, only three studies have used this approach (Filippi et al., 2017; Fritz et al., 2018; Kelly et al., 2017), emphasizing the crucial role of frequency parameters in the identification of affects in heterospecific vocalizations. Despite these few interesting findings, Study 3 was the first to extensively assess the influence of phylogenetic and acoustic proximities on vocal affective recognition. Thus, for the first time, we demonstrated that vocalizations expressed by chimpanzees, our closest relatives, were significantly closer in acoustic distance to the human voice than bonobo (great ape) and macaque (monkey) calls were. Moreover, the results of Study 3 revealed that acoustic similarities between human and chimpanzee vocalizations had a direct impact on the human recognition of affects in chimpanzee screams. Furthermore, similar brain mechanisms were also found at play in Studies 4 and 5 for the categorization of NHP calls. Despite bonobos belonging to the great ape family, the higher acoustic distance of their calls from the human voice, as well as their behavioural differences, seems to prevent the involvement of such mechanisms in the human recognition of bonobo vocalizations.
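
As an illustration of how such acoustic distances can be quantified, the R sketch below computes Mahalanobis distances between hypothetical acoustic feature tables. The data frames 'human' and 'chimp' (one row per call, one column per acoustic feature) are assumptions made for the example and do not reproduce the exact pipeline used in Study 3.

    # Hypothetical acoustic feature tables (e.g. F0, duration, energy), one row per call.
    mu    <- colMeans(human)                    # centroid of the human-voice features
    sigma <- cov(human)                         # covariance of the human-voice features
    d2    <- mahalanobis(as.matrix(chimp), center = mu, cov = sigma)  # squared distances
    mean(sqrt(d2))                              # mean acoustic distance of chimpanzee calls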

Hence, both phylogenetic and acoustic factors are essential to human perceptual decision-making on heterospecific vocalizations. Yet, the kind of process related to decision-making itself also seems essential in such paradigms. In fact, previous fMRI findings (Dricu et al., 2017) and the results of Study 1 demonstrated the difference in mechanisms between the categorization and discrimination tasks, depending on the level of complexity (number) of possible choices. However, to our knowledge, Study 2 and Study 3 constitute the only comparative research considering choice complexity in cross-taxa recognition. We thus importantly revealed, for the first time, the distinction between the categorization and discrimination of affect in NHP vocalizations at the cerebral and behavioural levels. Therefore, future studies on this matter should carefully investigate the sub-processes involved in the decision-making mechanisms at play in cross-taxa recognition. Interestingly and unexpectedly, the findings of Studies 2 to 5 support the hypothesis of an evolutionary divergence between the chimpanzee and bonobo species. The self-domestication hypothesis conceptualized by Hare and collaborators indeed suggests a distinct evolutionary pathway for bonobos compared to chimpanzees due to selection against aggression, despite their common affiliation to the great ape family (Hare et al., 2012). In line with previous studies showing morphological (Grawunder et al., 2018), neuroanatomical (Staes et al., 2018), and behavioural (Gruber & Clay, 2016) differences between both species, our results also point to a divergence in their vocal expression of affects. Together, these findings support the fact that bonobo vocalizations are of particular interest for future comparative studies investigating both phylogenetic and acoustic distances.


Figure 66: Human perceptual decision-making tree for affective conspecific and heterospecific vocalizations. Summary of the findings from Study 1 to Study 5. Factors of choice are represented in green; behavioural and cerebral consequences are indicated in red. Decision processing starts at the bottom and ends at the top of the tree.

Table 16: Summary of the findings from Study 1 to Study 6 depending on the five thesis objectives (see p.63).

Objective 1
  Brain data: distinct involvement of bilateral IFC in affective categorisation and discrimination for implicit and explicit decoding.
  Behavioural recognition: explicit categorization (cat) > implicit discrimination (dis) > implicit cat > explicit dis.

Objective 2
  Brain data: distinct involvement of IFGtri and PFC in affective categorization and discrimination for all NHP calls.
  Behavioural recognition: discrimination of all stimuli (threatening bonobo screams excepted) + categorization of all affective calls expressed by great apes.
  Brain * behaviour: categorization of agonistic chimpanzee and bonobo screams & affiliative chimpanzee calls = ↘ PFC-IFG; discrimination of agonistic chimpanzee screams = ↗ PFC-IFG.

Objective 3
  Mahalanobis: human < chimpanzee < bonobo < macaque calls.
  Behavioural recognition: discrimination of all stimuli (threatening bonobo and macaque screams excepted) + categorization of all stimuli (agonistic macaque vocalizations and threatening bonobo screams excluded).
  Mahalanobis * behaviour: ↗ recognition for chimpanzee calls.

Objective 4
  Brain data: ↗ left TVA and bilateral IFGtri-ope-orb for the categorization of chimpanzee vocalizations + similar brain networks between human voices and chimpanzee calls within bilateral TVA + IFGtri for the categorization of NHP.
  Behavioural recognition: chimpanzee > macaque; ∅ bonobo.

Objective 5
  Brain data: contralateral activations in the motor cortex (for all baboons) and temporal cortex (for one subject).

A second important aspect of the present thesis is the successful investigation of preserved mechanisms in the human frontal and temporal cortices. In fact, despite a larger and more folded brain compared to the other great ape species (Heuer et al., 2019), human neuroanatomical traits are mainly considered part of a continuum in primate brain evolution (Herculano-Houzel, 2009; Semendeferi et al., 2002). It is thus unlikely that 18 million years of common evolution (Perelman et al., 2011) did not affect modern human abilities. From this assessment, Studies 2, 4 and 5 explored in humans the processing of cross-taxa recognition within the TVA, PFC and IFG. In line with the few existing fMRI findings showing an increase of activity in frontal regions for the identification of affective valence in macaque (Belin et al., 2008) and chimpanzee calls (Fritz et al., 2018), the results of Study 2 revealed for the first time the involvement of the bilateral IFG and PFC in the recognition of affective contents in human and NHP vocalizations (with the crucial addition of bonobos). Following this, the fMRI data of Study 5 demonstrated an enhancement of activity in the bilateral IFG for the recognition of NHP species. Finally and most importantly, wholebrain and functional connectivity analyses in Study 4 showed the involvement of the TVA for the perception of chimpanzee vocalizations only. Although the TVA are usually associated with the processing of the human voice (Belin et al., 2000), Joly and collaborators did not find such TVA activity in human participants listening to macaque calls (Joly et al., 2012). Yet, this lack of result is not surprising in view of the large phylogenetic distance between macaque species and Homo sapiens. Therefore, Study 4 is the first to demonstrate the involvement of the TVA in the listening of heterospecific vocalizations, namely chimpanzee calls.

Third, Study 6 will hopefully have a positive impact on ethical standards in animal research. In fact, our findings in both the motor and auditory paradigms demonstrate for the first time the suitability of fNIRS to assess the hemodynamic responses of the baboon brain. fNIRS is indeed a relatively new brain imaging technique. Using the principle of tissue transillumination (Bright, 1831), fNIRS measures, via near-infrared light, blood oxygenation changes (e.g. Hoshi, 2016; Jöbsis, 1977) related to the HRF, constituted of oxygenated and deoxygenated haemoglobin. Considering that fNIRS is a lower-cost option and less sensitive to motion artefacts (Balardin et al., 2017) than other non-invasive techniques (e.g. MRI, EEG), it might be an excellent methodology to study the cognitive functions of NHP under more ecologically relevant testing conditions using a wireless and wearable device. In fact, a few fNIRS studies on NHP are emerging, but research has so far focused only on macaques and on cross-validating the technique with other well-established but invasive methods such as electrophysiology (Kim et al., 2017). Nevertheless, fNIRS was successfully used on macaques as the only technique to explore PFC activations (Lee et al., 2017), as well as occipital and frontal activations (Wakita et al., 2010), elicited by visual stimuli. Therefore, Study 6 and the existing literature together support a more extensive use of fNIRS in the future to investigate brain processing in NHP under ecological conditions.

Overall, the present thesis led to various new findings that should improve our knowledge of perceptual affective decision-making for heterospecific vocalizations, as well as of the suitability of fNIRS for NHP brain investigations. Yet, additional experiments are still needed to extend our findings to more ecological conditions, especially during social interactions or specific tasks.

3. Limitations

Six complementary studies were performed, leading to important behavioural and neuroimaging results. Yet, no study is without flaws, and a few methodological limitations may have biased our findings.

A major limitation of this work is its extensive reliance on fNIRS in our experimental paradigms. Despite the various advantages of fNIRS, such as its high portability and its low sensitivity to motion artefacts (Balardin et al., 2017), fNIRS has a poor temporal resolution compared to EEG (Bendall et al., 2016). Crucially, as often demonstrated in the literature, the recognition of emotional voices depends on temporality (e.g. Frühholz & Grandjean, 2013; Grandjean, 2020; Schirmer & Kotz, 2006). For instance, Schirmer and Kotz have well described the distinct cerebral mechanisms at play in the processing of emotional prosody from 0 to 400 ms after the vocal stimulus (Schirmer & Kotz, 2006). Therefore, even though fNIRS data were acquired at 10 to 250 Hz (10 to 250 measurements per second) in Studies 1, 2 and 6, we unfortunately did not investigate the processing of affects in conspecific and heterospecific vocalizations with respect to temporality. Following this, the poor spatial resolution of fNIRS in comparison to fMRI (Bendall et al., 2016) can also be an issue. In fact, despite analogous measurements of the BOLD signal (Cui et al., 2011), the optical pathway (or banana shape) relied upon by fNIRS cannot reach brain areas deeper than 2–2.5 cm (Okada & Delpy, 2003a, 2003b). However, subcortical regions such as the amygdala, basal ganglia and cerebellum are crucially involved in the processing of emotions (Bach et al., 2008; Brück et al., 2011; Frühholz et al., 2012; Grandjean, 2020; Sander et al., 2005; Wildgruber et al., 2009; Young et al., 2020 - see Section 2.2.1.1). Hence, in the present thesis, we only focused on frontal and temporal cortical areas. In order to assess both the temporal and the spatial dimensions of emotional processing using fNIRS, future studies on this matter should use combined systems enabling the simultaneous acquisition of fNIRS-EEG data or fNIRS-fMRI measurements (Balconi et al., 2015; Balconi & Vanutelli, 2016; Boas et al., 2014; Ferrari & Quaresima, 2012; Scholkmann et al., 2014).

The way fNIRS assesses O2Hb could also be a limitation. In fact, within the optical window between 680 nm and 950 nm in which near-infrared light detects oxygenated and deoxygenated haemoglobin, other chromophores such as melanin or water also absorb light (Jacques, 2013). The saturation of the optical signal by melanin, due for instance to dark skin, is indeed a known problem in fNIRS methodology (Wassenaar & Van den Brand, 2005). Similarly, systemic variables can also affect fNIRS measurements (Caicedo et al., 2016; Caldwell et al., 2016; Hoshi, 2016). Indeed, due to the banana-shaped optical pathway, the fNIRS signal can, for instance, be biased by heartbeats or the vasoconstriction of superficial vessels, leading to false positives or false negatives (Tachtsidis & Scholkmann, 2016). Researchers should therefore be particularly attentive when filtering raw fNIRS data (Scholkmann et al., 2010).
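
As a simple illustration of such filtering, the R sketch below applies a zero-phase band-pass filter to a raw O2Hb channel. The 'signal' package, the cut-off frequencies and the 50 Hz sampling rate are assumptions chosen for the example rather than the exact parameters used in our pipelines.

    # Hypothetical example: attenuating slow drifts and cardiac pulsation in a raw channel.
    library(signal)                                          # for butter() and filtfilt()
    fs <- 50                                                 # assumed sampling rate (Hz)
    bp <- butter(3, c(0.01, 0.5) / (fs / 2), type = "pass")  # 0.01-0.5 Hz band-pass
    o2hb_filt <- filtfilt(bp, o2hb_raw)                      # zero-phase filtering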


Overall, the use of fNIRS, despite over 20 years of active research (Boas et al., 2014), remains subject to improvement, both from technological and processing perspectives. Much work assessing the reliability and comparability of fNIRS with other imaging techniques (see above) remains to be conducted. Therefore, future studies should consider the methodological limitations of fNIRS and replicate the present findings with other neuroimaging techniques (Barch & Yarkoni, 2013). The present thesis is a step toward this endeavour, with results in Studies 2 and 5 emphasizing the congruence between our fNIRS and fMRI data.

4. Future Perspectives

Despite various advances in human perceptual decision-making and in NHP methodology, further experiments are still necessary for a better understanding of the underlying brain mechanisms.

In fact, the present thesis explored frontal and temporal activations for the human recognition of species or affective contents in primate vocalizations. Yet, we did not investigate the fronto-temporal connectivity between these two regions. According to the literature (see Section 2.2.1.1), the fronto-temporal network is however essential to the processing of voice in humans. Aware of this limitation, we recently conducted in humans the first comparative fNIRS experiment assessing the functional coupling between the bilateral IFC and the temporal cortices. Thus, as in Studies 2 and 3, human participants performed a categorization and a discrimination task in different blocks. In these blocks, the participants were instructed to identify the affective content of vocalizations expressed by humans, chimpanzees, bonobos or macaques. In order to assess the haemodynamic response modulations within and between the fronto-temporal regions, we used a new fNIRS device comprising 18 channels (9 channels per hemisphere) sampled at 50 Hz. Developing a new methodology takes time; therefore, functional connectivity analyses using complex Matlab and R coding for Generalized Additive Mixed Model (GAMM) analyses are still in progress. However, preliminary results using GLMM have shown a differential involvement of the fronto-temporal regions depending on the recognition task and on the species that expressed the vocalizations. These findings support the need for further analyses to investigate the involvement of the fronto-temporal network in cross-taxa recognition.
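
As a sketch of the kind of mixed-model analysis mentioned above, the R code below fits a simple linear mixed model of trial-wise O2Hb with lme4. The data frame 'fnirs_trials' and its columns are hypothetical, and a full GAMM (e.g. with the mgcv package) would additionally include smooth terms over time.

    # Hypothetical trial-level data: o2hb (mean concentration change), task
    # (categorization vs discrimination), species (human, chimpanzee, bonobo, macaque),
    # and a random intercept per participant.
    library(lme4)
    fit <- lmer(o2hb ~ task * species + (1 | participant), data = fnirs_trials)
    summary(fit)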

The fNIRS analyses described above are not the only ones in progress. Indeed, despite our interest in the processing of vocal recognition across several primate species, we did not differentiate in Study 6 between the vocalizations expressed by baboons and those expressed by chimpanzees. We recently started to assess the modulation of haemodynamic activity in the temporal cortex of our three baboons depending on the lateralization of the sounds as well as on the species that expressed the vocalizations. Using GLMM, our preliminary results interestingly reveal differences in activity within the left temporal cortex between the processing of chimpanzee stereo and baboon stereo calls. These promising findings seem to point to cerebral mechanisms underlying the recognition of caller identity. However, further brain and acoustic analyses are still required.

Finally, and perhaps most challenging, future studies aiming to use fNIRS with NHP should consider developing new protocols to investigate brain mechanisms in awake monkeys. Indeed, Study 6 was performed on three anesthetized baboons. Despite the injection of a minimal amount of propofol, we could not fully control the modulation of the haemodynamic responses induced by the anaesthesia. Furthermore, the recent appearance of wireless and portable fNIRS devices on the market should make it possible to develop such non-invasive paradigms to assess NHP brain functions in more ecological and ethical contexts.

In sum, Studies 1 to 6 will hopefully contribute to improving our knowledge in neurosciences and related fields with regard to emotion processing. Yet, further investigations are still required to advance research using a comparative approach.

5. Conclusion

In light of the great Hominidae history, the present thesis demonstrated the importance of an evolutionary perspective in neuroscience research. In fact, in line with Jaak Panksepp, considered the founder of the comparative approach in affective neuroscience, our results suggest preserved brain and behavioural mechanisms in humans inherited from our common ancestor with other primates. Interestingly, we also pointed out that phylogenetic proximity was insufficient to explain modern human abilities to recognize affective contents in other primate vocalizations. A closer acoustic distance between the human voice and NHP vocalizations indeed seems necessary for the correct affective and species recognition by humans. Therefore, Studies 1 to 6 revealed for the first time: i) differences between the categorization and discrimination of emotions in conspecific and heterospecific vocalizations; ii) the involvement of frontal regions in the human recognition of affects in NHP calls; iii) the capacity of humans to recognize most affects in all primate vocalizations; iv) acoustic similarities between human and chimpanzee calls; v) the involvement of the TVA and IFG in the categorization of NHP vocalizations; and vi) the suitability of fNIRS to assess haemodynamic activity in the NHP brain. Overall, the present thesis will hopefully promote the evolutionary approach to investigating the emotional mechanisms at play in both human and non-human primates' vocal communication.


References

Aarabi, A., Osharina, V., & Wallois, F. (2017). Effect of confounding variables on hemodynamic response function estimation using averaging and deconvolution analysis: An event-related NIRS study. NeuroImage, 155, 25–49. https://doi.org/10.1016/j.neuroimage.2017.04.048 Aglieri, V., Chaminade, T., Takerkart, S., & Belin, P. (2018). Functional connectivity within the voice perception network and its behavioural relevance. NeuroImage, 183, 356– 365. https://doi.org/10.1016/j.neuroimage.2018.08.011 Albuquerque, N., Guo, K., Wilkinson, A., Savalli, C., Otta, E., & Mills, D. (2016). Dogs recognize dog and human emotions. Biology Letters, 12(1), 20150883. https://doi.org/10.1098/rsbl.2015.0883 Al-Shawaf, L., & Lewis, D. M. G. (2017). Evolutionary Psychology and the Emotions. In V. Zeigler-Hill & T. K. Shackelford (Eds.), Encyclopedia of Personality and Individual Differences (pp. 1–10). Springer International Publishing. https://doi.org/10.1007/978- 3-319-28099-8_516-1 Anderson, D. J., & Adolphs, R. (2014). A Framework for Studying Emotions Across Phylogeny. Cell, 157(1), 187–200. https://doi.org/10.1016/j.cell.2014.03.003 Andics, A., Gácsi, M., Faragó, T., Kis, A., & Miklósi, Á. (2014). Voice-Sensitive Regions in the Dog and Human Brain Are Revealed by Comparative fMRI. Current Biology, 24(5), 574–578. https://doi.org/10.1016/j.cub.2014.01.058 Aoki, R., Sato, H., Katura, T., Utsugi, K., Koizumi, H., Matsuda, R., & Maki, A. (2011). Relationship of negative mood with prefrontal cortex activity during working memory tasks: An optical topography study. Neuroscience Research, 70(2), 189–196. https://doi.org/10.1016/j.neures.2011.02.011 Aqil, M., Hong, K.-S., Jeong, M.-Y., & Ge, S. S. (2012). Detection of event-related hemodynamic response to neuroactivation by dynamic modeling of brain activity. NeuroImage, 63(1), 553–568. https://doi.org/10.1016/j.neuroimage.2012.07.006 Arnal, L. H., Flinker, A., Kleinschmidt, A., Giraud, A.-L., & Poeppel, D. (2015). Human Screams Occupy a Privileged Niche in the Communication Soundscape. Current Biology : CB, 25(15), 2051–2056. https://doi.org/10.1016/j.cub.2015.06.043 Arnellos, A., & Keijzer, F. (2019). Bodily Complexity: Integrated Multicellular Organizations for Contraction-Based Motility. Frontiers in Physiology, 10. https://doi.org/10.3389/fphys.2019.01268 Arnold, M. B. (1960). Emotion and personality. Columbia University Press. Arnsten, A. F. T. (2009). Stress signalling pathways that impair prefrontal cortex structure and function. Nature Reviews. Neuroscience, 10(6), 410–422.


https://doi.org/10.1038/nrn2648 Ashburner, J. (2007). A fast diffeomorphic image registration algorithm. NeuroImage, 38(1), 95–113. https://doi.org/10.1016/j.neuroimage.2007.07.007 Bach, D., Grandjean, D., Sander, D., Herdener, M., Strik, W., & Seifritz, E. (2008). The effect of appraisal level on processing of emotional prosody in meaningless speech. NeuroImage, 42, 919–927. https://doi.org/10.1016/j.neuroimage.2008.05.034 Balardin, J. B., Zimeo Morais, G. A., Furucho, R. A., Trambaiolli, L., Vanzella, P., Biazoli, C., & Sato, J. R. (2017). Imaging Brain Function with Functional Near-Infrared Spectroscopy in Unconstrained Environments. Frontiers in Human Neuroscience, 11. https://doi.org/10.3389/fnhum.2017.00258 Balconi, M., Grippa, E., & Vanutelli, M. E. (2015). What hemodynamic (fNIRS), electrophysiological (EEG) and autonomic integrated measures can tell us about emotional processing. Brain and Cognition, 95, 67–76. https://doi.org/10.1016/j.bandc.2015.02.001 Balconi, M., & Vanutelli, M. E. (2016). Hemodynamic (fNIRS) and EEG (N200) correlates of emotional inter-species interactions modulated by visual and auditory stimulation. Scientific Reports, 6, 23083. https://doi.org/10.1038/srep23083 Banse, R., & Scherer, K. R. (1996). Acoustic Profiles in Vocal Emotion Expression. 23. Baraud, I., Buytet, B., Bec, P., & Blois-Heulin, C. (2009). Social laterality and ‘transversality’ in two species of mangabeys: Influence of rank and implication for hemispheric specialization. Behavioural Brain Research, 198(2), 449–458. https://doi.org/10.1016/j.bbr.2008.11.032 Barbas, H. (2000). Connections underlying the synthesis of cognition, memory, and emotion in primate prefrontal cortices. Brain Research Bulletin, 52(5), 319–330. https://doi.org/10.1016/s0361-9230(99)00245-2 Barbas, Helen, Zikopoulos, B., & Timbie, C. (2011). Sensory Pathways and Emotional Context for Action in Primate Prefrontal Cortex. Biological Psychiatry, 69(12), 1133– 1139. https://doi.org/10.1016/j.biopsych.2010.08.008 Barch, D. M., & Yarkoni, T. (2013). Introduction to the special issue on reliability and replication in cognitive and affective neuroscience research. Cognitive, Affective, & Behavioral Neuroscience, 13(4), 687–689. https://doi.org/10.3758/s13415-013-0201-7 Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403(6767), 309–312. https://doi.org/10.1038/35002078 Belin, P. (2006). Voice processing in human and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1476), 2091–2107. https://doi.org/10.1098/rstb.2006.1933 Belin, P, Fecteau, S., Charest, I., Nicastro, N., Hauser, M. D., & Armony, J. L. (2008). Human cerebral response to animal affective vocalizations. Proceedings. Biological Sciences,


275(1634), 473–481. https://doi.org/10.1098/rspb.2007.1460 Belin, P, Fillion-Bilodeau, S., & Gosselin, F. (2008). The Montreal Affective Voices: A validated set of nonverbal affect bursts for research on auditory affective processing. ResearchGate. https://www.researchgate.net/publication/51398475_The_Montreal_Affective_Voices _A_validated_set_of_nonverbal_affect_bursts_for_research_on_auditory_affective_pr ocessing Belyk, M., Brown, S., Lim, J., & Kotz, S. A. (2017). Convergence of semantics and emotional expression within the IFG pars orbitalis. NeuroImage, 156, 240–248. https://doi.org/10.1016/j.neuroimage.2017.04.020 Bendall, R. C. A., Eachus, P., & Thompson, C. (2016). A Brief Review of Research Using Near-Infrared Spectroscopy to Measure Activation of the Prefrontal Cortex during Emotional Processing: The Importance of Experimental Design. Frontiers in Human Neuroscience, 10, 529. https://doi.org/10.3389/fnhum.2016.00529 Berridge, K. C. (2002). Comparing the emotional brains of humans and other animals. In Handbook of Affective Sciences (Orford University Press, pp. 25–51). Davidson, R.J., Scherer, K.R., Goldsmith, H.H. https://doi.org/10.1002/9780470751466.ch9 Bestelmeyer, P. E. G., Belin, P., & Grosbras, M.-H. (2011). Right temporal TMS impairs voice detection. Current Biology: CB, 21(20), R838-839. https://doi.org/10.1016/j.cub.2011.08.046 Binder, J. R., Liebenthal, E., Possing, E. T., Medler, D. A., & Ward, B. D. (2004). Neural correlates of sensory and decision processes in auditory object identification. Nature Neuroscience, 7(3), 295–301. https://doi.org/10.1038/nn1198 Bliss-Moreau, E., Moadab, G., & Machado, C. J. (2017). Monkeys preferentially process body information while viewing affective displays. Emotion (Washington, D.C.), 17(5), 765– 771. https://doi.org/10.1037/emo0000292 Boas, D. A., Elwell, C. E., Ferrari, M., & Taga, G. (2014). Twenty years of functional near- infrared spectroscopy: Introduction for the special issue. NeuroImage, 85, 1–5. https://doi.org/10.1016/j.neuroimage.2013.11.033 Boissy, A., Manteuffel, G., Jensen, M. B., Moe, R. O., Spruijt, B., Keeling, L. J., Winckler, C., Forkman, B., Dimitrov, I., Langbein, J., Bakken, M., Veissier, I., & Aubert, A. (2007). Assessment of positive emotions in animals to improve their welfare. Physiology & Behavior, 92(3), 375–397. https://doi.org/10.1016/j.physbeh.2007.02.003 Bolhuis, J. J., & Wynne, C. D. L. (2009). Can evolution explain how minds work? Nature, 458(7240), 832–833. https://doi.org/10.1038/458832a Bowler, P. J. (2003). Evolution: The History of an Idea. University of California Press. Briefer, E. (2012). Vocal Expression of Emotions in Mammals: Mechanisms of Production and Evidence. Communication Skills. https://animalstudiesrepository.org/comski/1 Bright, R. (1831). Reports of medical cases selected with a view of illustrating the symptoms and


cure of diseases by reference to morbid anatomy, case ccv “diseases of the brain and nervous system.” 2(3), 431. Brösamle, C., & Schwab, M. E. (1997). Cells of origin, course, and termination patterns of the ventral, uncrossed component of the mature rat corticospinal tract. The Journal of Comparative Neurology, 386(2), 293–303. https://doi.org/10.1002/(sici)1096- 9861(19970922)386:2<293::aid-cne9>3.0.co;2-x Brosch, T., Scherer, K., Grandjean, D., & Sander, D. (2013). The impact of emotion on perception, attention, memory, and decision-making. Swiss Medical Weekly, 143(1920). https://doi.org/10.4414/smw.2013.13786 Brück, C., Kreifelts, B., Kaza, E., Lotze, M., & Wildgruber, D. (2011). Impact of personality on the cerebral processing of emotional prosody. NeuroImage, 58(1), 259–268. https://doi.org/10.1016/j.neuroimage.2011.06.005 Brunswick, E. (1956). Perception and the representative design of psychological experiments. University of California Press. Bühler, K. (1990). Theory of Language: The Representational Function of Language. John Benjamins Publishing. Burkhardt, R. W. (2013). Lamarck, Evolution, and the Inheritance of Acquired Characters. Genetics, 194(4), 793–805. https://doi.org/10.1534/genetics.113.151852 Buss, A. T., Fox, N., Boas, D. A., & Spencer, J. P. (2014). Probing the early development of visual working memory capacity with functional near-infrared spectroscopy. NeuroImage, 85 Pt 1, 314–325. https://doi.org/10.1016/j.neuroimage.2013.05.034 Buttelmann, D., Call, J., & Tomasello, M. (2009). Do great apes use emotional expressions to infer desires? Developmental Science, 12(5), 688–698. https://doi.org/10.1111/j.1467- 7687.2008.00802.x Cai, W., & Leung, H.-C. (2011). Rule-guided executive control of response inhibition: Functional topography of the inferior frontal cortex. PloS One, 6(6), e20840. https://doi.org/10.1371/journal.pone.0020840 Caicedo, A., Varon, C., Hunyadi, B., Papademetriou, M., Tachtsidis, I., & Van Huffel, S. (2016). Decomposition of Near-Infrared Spectroscopy Signals Using Oblique Subspace Projections: Applications in Brain Hemodynamic Monitoring. Frontiers in Physiology, 7, 515. https://doi.org/10.3389/fphys.2016.00515 Caldwell, M., Scholkmann, F., Wolf, U., Wolf, M., Elwell, C., & Tachtsidis, I. (2016). Modelling confounding effects from extracerebral contamination and systemic factors on functional near-infrared spectroscopy. NeuroImage, 143, 91–105. https://doi.org/10.1016/j.neuroimage.2016.08.058 Casperd, J. M., & Dunbar, R. I. M. (1996). Asymmetries in the visual processing of emotional cues during agonistic interactions by gelada baboons. Behavioural Processes, 37(1), 57– 65. https://doi.org/10.1016/0376-6357(95)00075-5


Cavin, L., & Vallotton, L. (2009). Comment Darwin s’est transformé en singe. EspacesTemps.Net Electronic Journal of Humanities and Social Sciences.. https://www.espacestemps.net/en/articles/comment-darwin-srsquoest-transforme-en- singe-en/ Chance, B., Zhuang, Z., UnAh, C., Alter, C., & Lipton, L. (1993). Cognition-activated low- frequency modulation of light absorption in human brain. Proceedings of the National Academy of Sciences of the United States of America, 90(8), 3770–3774. Charest, I., Pernet, C., Latinus, M., Crabbe, F., & Belin, P. (2013). Cerebral Processing of Voice Gender Studied Using a Continuous Carryover fMRI Design. Cerebral Cortex (New York, NY), 23(4), 958–966. https://doi.org/10.1093/cercor/bhs090 Charland, L. C. (2002). The Natural Kind Status of Emotion. British Journal for the Philosophy of Science, 53(4), 511–37. https://doi.org/10.1093/bjps/53.4.511 Cicero. (2003). De Natura Deorum (Andrew R. Dyck). Cambridge University Press. Cipolloni, P. B., & Pandya, D. N. (1989). Connectional analysis of the ipsilateral and contralateral afferent neurons of the superior temporal region in the rhesus monkey. The Journal of Comparative Neurology, 281(4), 567–585. https://doi.org/10.1002/cne.902810407 Cohen, Y. E., Theunissen, F., Russ, B. E., & Gill, P. (2007). Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology, 97(2), 1470–1484. https://doi.org/10.1152/jn.00769.2006 Collins, D. L., Neelin, P., Peters, T. M., & Evans, A. C. (1994). Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography, 18(2), 192–205. Crivelli, C., Russell, J., Jarillo, S., & Fernandez-Dols, J.-M. (2016). The fear gasping face as a threat display in a Melanesian society. Proceedings of the National Academy of Sciences, 113, 12403–12407. https://doi.org/10.1073/pnas.1611622113 Cui, X., Bray, S., Bryant, D. M., Glover, G. H., & Reiss, A. L. (2011). A quantitative comparison of NIRS and fMRI across multiple cognitive tasks. NeuroImage, 54(4), 2808–2821. https://doi.org/10.1016/j.neuroimage.2010.10.069 Czigler, I., Cox, T. J., Gyimesi, K., & Horváth, J. (2007). Event-related potential study to aversive auditory stimuli. Neuroscience Letters, 420(3), 251–256. https://doi.org/10.1016/j.neulet.2007.05.007 Damasio, A. R. (1996). The somatic marker hypothesis and the possible functions of the prefrontal cortex. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 351(1346), 1413–1420. https://doi.org/10.1098/rstb.1996.0125 Darwin, C. (1859). On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. (1st edition). John Murray. http://darwin- online.org.uk/content/frameset?itemID=F373&viewtype=text&pageseq=1

Darwin, C. (1871). The descent of man, and selection in relation to sex.: Vol. vol.1. John Murray. http://darwin- online.org.uk/content/frameset?itemID=F937.1&viewtype=text&pageseq=1 Darwin, C. (1872). The expression of the emotions in man and animals. http://darwin- online.org.uk/content/frameset?pageseq=1&itemID=F1142&viewtype=text Darwin, E. (1803). Temple of Nature. Canto I. Production of Life. http://spenserians.cath.vt.edu/TextRecord.php?action=GET&textsid=35443 Davidson, R. J., Ekman, P., Saron, C. D., Senulis, J. A., & Friesen, W. V. (1990). Approach- withdrawal and cerebral asymmetry: Emotional expression and brain physiology. I. Journal of Personality and Social Psychology, 58(2), 330–341. Davidson, Richard J. (1992). Anterior cerebral asymmetry and the nature of emotion. Brain and Cognition, 20(1), 125–151. https://doi.org/10.1016/0278-2626(92)90065-T Davila Ross, M., Owren, M. J., & Zimmermann, E. (2009). Reconstructing the evolution of laughter in great apes and humans. Current Biology: CB, 19(13), 1106–1111. https://doi.org/10.1016/j.cub.2009.05.028 Davila Ross, M., Allcock, B., Thomas, C., & Bard, K. A. (2011). Aping expressions? Chimpanzees produce distinct laugh types when responding to laughter of others. Emotion (Washington, D.C.), 11(5), 1013–1020. https://doi.org/10.1037/a0022594 de Lange, F. P., & Fritsche, M. (2017). Perceptual Decision-Making: Picking the Low- Hanging Fruit? Trends in Cognitive Sciences, 21(5), 306–307. https://doi.org/10.1016/j.tics.2017.03.006 de Vere, A. J., & Kuczaj, S. A. (2016). Where are we in the study of animal emotions? Wiley Interdisciplinary Reviews. Cognitive Science, 7(5), 354–362. https://doi.org/10.1002/wcs.1399 de Waal, F. B. M. (2009). Animal emotions. In The Oxford companion to Emotion and the Affective Sciences (pp. 33–36). Oxford University Press. de Waal, F. B. M. (2011). What is an animal emotion? Annals of the New York Academy of Sciences, 1224(1), 191–206. https://doi.org/10.1111/j.1749-6632.2010.05912.x de Waal, F. B. M. (2018). La dernière étreinte (Les liens qui liberent). http://www.editionslesliensquiliberent.fr/livre-La_derni%C3%A8re_%C3%A9treinte- 554-1-1-0-1.html Delpy, D. T., Cope, M., van der Zee, P., Arridge, S., Wray, S., & Wyatt, J. (1988). Estimation of optical pathlength through tissue from direct time of flight measurement. Physics in Medicine and Biology, 33(12), 1433–1442. Desmond, J. E., & Glover, G. H. (2002). Estimating sample size in functional MRI (fMRI) neuroimaging studies: Statistical power analyses. Journal of Neuroscience Methods, 118(2), 115–128. https://doi.org/10.1016/s0165-0270(02)00121-8 Doi, H., Nishitani, S., & Shinohara, K. (2013). NIRS as a tool for assaying emotional function

in the prefrontal cortex. Frontiers in Human Neuroscience, 7, 770. https://doi.org/10.3389/fnhum.2013.00770 Douglas-Hamilton, I., Bhalla, S., Wittemyer, G., & Vollrath, F. (2006). Behavioural reactions of elephants towards a dying and deceased matriarch. Applied Animal Behaviour Science, 100(1), 87–102. https://doi.org/10.1016/j.applanim.2006.04.014 Drahota, A., Costall, A., & Reddy, V. (2008). The vocal communication of different kinds of smile. Speech Communication, 50(4), 278–287. https://doi.org/10.1016/j.specom.2007.10.001 Dricu, M., Ceravolo, L., Grandjean, D., & Frühholz, S. (2017). Biased and unbiased perceptual decision-making on vocal emotions. Scientific Reports, 7(1), 16274. https://doi.org/10.1038/s41598-017-16594-w Dum, R. P., & Strick, P. L. (1996). Spinal Cord Terminations of the Medial Wall Motor Areas in Macaque Monkeys. Journal of Neuroscience, 16(20), 6513–6525. https://doi.org/10.1523/JNEUROSCI.16-20-06513.1996 Effects of Experience on Fetal Voice Recognition—Barbara S. Kisilevsky, Sylvia M.J. Hains, Kang Lee, Xing Xie, Hefeng Huang, Hai Hui Ye, Ke Zhang, Zengping Wang, 2003. (n.d.). Retrieved July 29, 2020, from https://journals.sagepub.com/doi/10.1111/1467- 9280.02435 Egnor, S. E. R., & Hauser, M. D. (2004). A paradox in the evolution of primate vocal learning. Trends in Neurosciences, 27(11), 649–654. https://doi.org/10.1016/j.tins.2004.08.009 Ehret, G. (2006). Common rules of communication sound perception. Behavior and Neurodynamics for Auditory Communication, 85–114. Eichert, N., Verhagen, L., Folloni, D., Jbabdi, S., Khrapitchev, A. A., Sibson, N. R., Mantini, D., Sallet, J., & Mars, R. B. (2019). What is special about the human arcuate fasciculus? Lateralization, projections, and expansion. Cortex, 118, 107–115. https://doi.org/10.1016/j.cortex.2018.05.005 Ekman, P. (1999). Basic emotions. In Handbook of cognition and emotion (pp. 45–60). John Wiley & Sons Ltd. https://doi.org/10.1002/0470013494.ch3 Ekman, P. (2003). Emotions revealed: Recognizing faces and feelings to improve communication and emotional life (pp. xvii, 267). Times Books/Henry Holt and Co. Ekman, P., & Matsumoto, D. (2009). Basic emotions. In The Oxford companion to Emotion and the Affective Sciences (pp. 159–164). Oxford University Press. Ekstrom, A. (2010). How and when the fMRI BOLD signal relates to underlying neural activity: The danger in dissociation. Brain Research Reviews, 62(2), 233–244. https://doi.org/10.1016/j.brainresrev.2009.12.004 Ellsworth, P. C., & Scherer, K. R. (2003). Appraisal processes in emotion. In Handbook of affective sciences (pp. 572–595). Oxford University Press. Ethofer, T., Anders, S., Erb, M., Herbert, C., Wiethoff, S., Kissler, J., Grodd, W., &

Wildgruber, D. (2006). Cerebral pathways in processing of affective prosody: A dynamic causal modeling study. NeuroImage, 30(2), 580–587. https://doi.org/10.1016/j.neuroimage.2005.09.059 Ethofer, T., Bretscher, J., Gschwind, M., Kreifelts, B., Wildgruber, D., & Vuilleumier, P. (2012). Emotional voice areas: Anatomic location, functional properties, and structural connections revealed by combined fMRI/DTI. Cerebral Cortex (New York, N.Y.: 1991), 22(1), 191–200. https://doi.org/10.1093/cercor/bhr113 Eyben, F., Scherer, K., Schuller, B., Sundberg, J., André, E., Busso, C., Devillers, L., Epps, J., Laukka, P., Narayanan, S., & Truong, K. P. (2016). The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing. IEEE transactions on affective computing, 7(2), 190–202. https://doi.org/10.1109/TAFFC.2015.2457417 Fant, G. (1960). Acoustic Theory of Speech Production: With Calculations Based on X-ray Studies of Russian Articulations. ’s-Gravenhage. Ferdenzi, C., Patel, S., Mehu-Blantar, I., Khidasheli, M., Sander, D., & Delplanque, S. (2013). Voice attractiveness: Influence of stimulus duration and type. Behavior Research Methods, 45(2), 405–413. https://doi.org/10.3758/s13428-012-0275-0 Fernández-Carriba, S., Loeches, Á., Morcillo, A., & Hopkins, W. D. (2002). Asymmetry in facial expression of emotions by chimpanzees. Neuropsychologia, 40(9), 1523–1533. https://doi.org/10.1016/S0028-3932(02)00028-3 Ferrari, M., & Quaresima, V. (2012). A brief review on the history of human functional near- infrared spectroscopy (fNIRS) development and fields of application. NeuroImage, 63(2), 921–935. https://doi.org/10.1016/j.neuroimage.2012.03.049 Ferry, A. L., Hespos, S. J., & Waxman, S. R. (2013). Nonhuman primate vocalizations support categorization in very young human infants. Proceedings of the National Academy of Sciences, 110(38), 15231–15235. https://doi.org/10.1073/pnas.1221166110 Fichtel, C., Hammerschmidt, K., & Jürgens, U. (2001). On the Vocal Expression of Emotion. A Multi-Parametric Analysis of Different States of Aversion in the Squirrel Monkey. Behaviour, 138(1), 97–116. JSTOR. Filippi, P. (2016). Emotional and Interactional Prosody across Animal Communication Systems: A Comparative Approach to the Emergence of Language. Frontiers in Psychology, 7, 1393. https://doi.org/10.3389/fpsyg.2016.01393 Filippi, P., Congdon, J. V., Hoang, J., Bowling, D. L., Reber, S. A., Pašukonis, A., Hoeschele, M., Ocklenburg, S., de Boer, B., Sturdy, C. B., Newen, A., & Güntürkün, O. (2017). Humans recognize emotional arousal in vocalizations across all classes of terrestrial : Evidence for acoustic universals. Proceedings of the Royal Society B: Biological Sciences, 284(1859), 20170990. https://doi.org/10.1098/rspb.2017.0990 Fischer, J., & Price, T. (2017). Meaning, intention, and inference in primate vocal

communication. Neuroscience & Biobehavioral Reviews, 82, 22–31. https://doi.org/10.1016/j.neubiorev.2016.10.014 Fitch, W. T. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4(7), 258–267. https://doi.org/10.1016/s1364-6613(00)01494-7 Fitch, W. T., & Braccini, S. N. (2013). Primate laterality and the biology and evolution of human handedness: A review and synthesis. Annals of the New York Academy of Sciences, 1288, 70–85. https://doi.org/10.1111/nyas.12071 Fontaine, J. R. J., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotions is not two-dimensional. Psychological Science, 18(12), 1050–1057. https://doi.org/10.1111/j.1467-9280.2007.02024.x Fowles, D. C. (2009). Arousal. In The Oxford companion to Emotion and the Affective Sciences (p. 50). Oxford University Press. Fragaszy, D., & Simpson, E. (2011). Understanding emotions in primates: In honor of Darwin’s 200th birthday. American Journal of Primatology, 73(6), 503–506. https://doi.org/10.1002/ajp.20933 Fraser, O. N., & Bugnyar, T. (2010). Do Ravens Show Consolation? Responses to Distressed Others. PLOS ONE, 5(5), e10605. https://doi.org/10.1371/journal.pone.0010605 Friederici, A. D. (2012). The cortical language circuit: From auditory perception to sentence comprehension. Trends in Cognitive Sciences, 16(5), 262–268. https://doi.org/10.1016/j.tics.2012.04.001 Frijda, N. H. (1987). Emotion, cognitive structure, and action tendency. Cognition and Emotion, 1(2), 115–143. https://doi.org/10.1080/02699938708408043 Frijda, N. H. (2007). The laws of emotion (pp. xiv, 352). Lawrence Erlbaum Associates Publishers. Frijda, N. H. (2016). The evolutionary emergence of what we call “emotions.” Cognition and Emotion, 30(4), 609–620. https://doi.org/10.1080/02699931.2016.1145106 Frijda, N. H., & Scherer, K. R. (2009). Affect (psychological perspectives). In The Oxford companion to Emotion and the Affective Sciences (pp. 159–164). Oxford University Press. Friston, K. J., Josephs, O., Zarahn, E., Holmes, A. P., Rouquette, S., & Poline, J. (2000). To smooth or not to smooth? Bias and efficiency in fMRI time-series analysis. NeuroImage, 12(2), 196–208. https://doi.org/10.1006/nimg.2000.0609 Fritz, T., Mueller, K., Guha, A., Gouws, A., Levita, L., Andrews, T. J., & Slocombe, K. E. (2018). Human behavioural discrimination of human, chimpanzee and macaque affective vocalisations is reflected by the neural response in the superior temporal sulcus. Neuropsychologia, 111, 145–150. https://doi.org/10.1016/j.neuropsychologia.2018.01.026 Fröhlich, M., Sievers, C., Townsend, S. W., Gruber, T., & Schaik, C. P. van. (2019).

Multimodal communication and language origins: Integrating gestures and vocalizations. Biological Reviews, 94(5), 1809–1829. https://doi.org/10.1111/brv.12535 Frühholz, S., Ceravolo, L., & Grandjean, D. (2012). Specific brain networks during explicit and implicit decoding of emotional prosody. Cerebral Cortex (New York, N.Y.: 1991), 22(5), 1107–1117. https://doi.org/10.1093/cercor/bhr184 Frühholz, S., & Grandjean, D. (2013a). Multiple subregions in superior temporal cortex are differentially sensitive to vocal expressions: A quantitative meta-analysis. Neuroscience and Biobehavioral Reviews, 37(1), 24–35. https://doi.org/10.1016/j.neubiorev.2012.11.002 Frühholz, S., & Grandjean, D. (2013b). Processing of emotional vocalizations in bilateral inferior frontal cortex. Neuroscience and Biobehavioral Reviews, 37(10 Pt 2), 2847–2855. https://doi.org/10.1016/j.neubiorev.2013.10.007 Frühholz, S., Klaas, H. S., Patel, S., & Grandjean, D. (2015). Talking in Fury: The Cortico- Subcortical Network Underlying Angry Vocalizations. Cerebral Cortex (New York, N.Y.: 1991), 25(9), 2752–2762. https://doi.org/10.1093/cercor/bhu074 Frühholz, S., Trost, W., & Grandjean, D. (2014). The role of the medial temporal limbic system in processing emotions in voice and music (Vol. 123). https://doi.org/10.1016/j.pneurobio.2014.09.003 Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions-Towards a unifying neural network perspective of affective sound processing. Neuroscience and Biobehavioral Reviews, 68, 96–110. https://doi.org/10.1016/j.neubiorev.2016.05.002 Gannon, P. J., Holloway, R. L., Broadfield, D. C., & Braun, A. R. (1998). Asymmetry of chimpanzee planum temporale: Humanlike pattern of Wernicke’s brain language area homolog. Science (New York, N.Y.), 279(5348), 220–222. https://doi.org/10.1126/science.279.5348.220 George, M. S., Ketter, T. A., Parekh, P. I., Horwitz, B., Herscovitch, P., & Post, R. M. (1995). Brain activity during transient sadness and happiness in healthy women. The American Journal of Psychiatry, 152(3), 341–351. https://doi.org/10.1176/ajp.152.3.341 Ghashghaei, H. T., Hilgetag, C. C., & Barbas, H. (2007). Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. NeuroImage, 34(3), 905–923. https://doi.org/10.1016/j.neuroimage.2006.09.046 Ghazanfar, A. A., & Hauser, M. D. (2001). The auditory behaviour of primates: A neuroethological perspective. Current Opinion in Neurobiology, 11(6), 712–720. https://doi.org/10.1016/S0959-4388(01)00274-4 Ghazanfar, A. A., & Santos, L. R. (2004). Primate brains in the wild: The sensory bases for social interactions. Nature Reviews Neuroscience, 5(8), 603–616. https://doi.org/10.1038/nrn1473

Gilbert, P. (2015). An Evolutionary Approach to Emotion in Mental Health With a Focus on Affiliative Emotions. https://journals.sagepub.com/doi/10.1177/1754073915576552 Gil-da-Costa, R., Martin, A., Lopes, M. A., Muñoz, M., Fritz, J. B., & Braun, A. R. (2006). Species-specific calls activate homologs of Broca’s and Wernicke’s areas in the macaque. Nature Neuroscience, 9(8), 1064–1070. https://doi.org/10.1038/nn1741 Gissis, S., Gissis, S. B., & Jablonka, E. (2011). Transformations of Lamarckism: From Subtle Fluids to Molecular Biology. MIT Press. Glotzbach, E., Mühlberger, A., Gschwendtner, K., Fallgatter, A. J., Pauli, P., & Herrmann, M. J. (2011). Prefrontal Brain Activation During Emotional Processing: A Functional Near Infrared Spectroscopy Study (fNIRS). The Open Neuroimaging Journal, 5, 33–39. https://doi.org/10.2174/1874440001105010033 Goudbeek, M., & Scherer, K. (2010). Beyond arousal: Valence and potency/control cues in the vocal expression of emotion. The Journal of the Acoustical Society of America, 128(3), 1322–1336. https://doi.org/10.1121/1.3466853 Gouzoules, S. (1984). Primate mating systems, kin associations, and cooperative behavior: Evidence for kin recognition? American Journal of Physical Anthropology, 27(S5), 99– 134. https://doi.org/10.1002/ajpa.1330270506 Grandjean, D. (2020). Brain Networks of Emotional Prosody Processing: Emotion Review. https://doi.org/10.1177/1754073919898522 Grandjean, D., Bänziger, T., & Scherer, K. R. (2006). Intonation as an interface between language and affect. In Progress in Brain Research (Vol. 156, pp. 235–247). Elsevier. https://doi.org/10.1016/S0079-6123(06)56012-1 Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M. L., Scherer, K. R., & Vuilleumier, P. (2005). The voices of wrath: Brain responses to angry prosody in meaningless speech. Nature Neuroscience, 8(2), 145–146. https://doi.org/10.1038/nn1392 Grawunder, S., Crockford, C., Clay, Z., Kalan, A. K., Stevens, J. M. G., Stoessel, A., & Hohmann, G. (2018). Higher fundamental frequency in bonobos is explained by larynx morphology. Current Biology: CB, 28(20), R1188–R1189. https://doi.org/10.1016/j.cub.2018.09.030 Greenberg, L. S. (2002). Evolutionary Perspectives on Emotion: Making Sense of What We Feel. Journal of Cognitive Psychotherapy, 16(3), 331–347. https://doi.org/10.1891/jcop.16.3.331.52517 Griffiths, P. E. (2004). Is Emotion a Natural Kind? 18. Gross, J. J. (1998). The emerging field of emotion regulation: An integrative review. Review of General Psychology, 271–299. Grossmann, T., Oberecker, R., Koch, S. P., & Friederici, A. D. (2010). The developmental origins of voice processing in the human brain. Neuron, 65(6), 852–858. https://doi.org/10.1016/j.neuron.2010.03.001

Gruber, T., & Clay, Z. (2016). A Comparison Between Bonobos and Chimpanzees: A Review and Update. Evolutionary Anthropology: Issues, News, and Reviews, 25(5), 239–252. https://doi.org/10.1002/evan.21501 Gruber, T., Debracque, C., Ceravolo, L., Igloi, K., Marin Bosch, B., Frühholz, S., & Grandjean, D. (2020). Human Discrimination and Categorization of Emotions in Voices: A Functional Near-Infrared Spectroscopy (fNIRS) Study. Frontiers in Neuroscience, 14. https://doi.org/10.3389/fnins.2020.00570 Gruber, T., & Grandjean, D. M. (2017). A comparative neurological approach to emotional expressions in primate vocalizations. Neuroscience and Biobehavioral Reviews, 73, 182– 190. Guenther, F. H., Tourville, J. A., & Bohland, J. W. (2015). Speech Production. In A. W. Toga (Ed.), Brain Mapping (pp. 435–444). Academic Press. https://doi.org/10.1016/B978-0-12- 397025-1.00265-7 Hackett, T. A., Stepniewska, I., & Kaas, J. H. (1998). Subdivisions of auditory cortex and ipsilateral cortical connections of the parabelt auditory cortex in macaque monkeys. The Journal of Comparative Neurology, 394(4), 475–495. https://doi.org/10.1002/(sici)1096-9861(19980518)394:4<475::aid-cne6>3.0.co;2-z Hammerschmidt, K., & Jürgens, U. (2007). Acoustical correlates of affective prosody. Journal of Voice: Official Journal of the Voice Foundation, 21(5), 531–540. https://doi.org/10.1016/j.jvoice.2006.03.002 Hampshire, A., Chamberlain, S. R., Monti, M. M., Duncan, J., & Owen, A. M. (2010). The role of the right inferior frontal gyrus: Inhibition and attentional control. Neuroimage, 50(3–3), 1313–1319. https://doi.org/10.1016/j.neuroimage.2009.12.109 Hare, B., Wobber, V., & Wrangham, R. (2012). The self-domestication hypothesis: Evolution of bonobo psychology is due to selection against aggression. Animal Behaviour, 83(3), 573–585. https://doi.org/10.1016/j.anbehav.2011.12.007 Hauser, C. K., & Salinas, E. (2014). Perceptual Decision Making. In D. Jaeger & R. Jung (Eds.), Encyclopedia of Computational Neuroscience (pp. 1–21). Springer New York. https://doi.org/10.1007/978-1-4614-7320-6_317-1 Heming, E. A., Cross, K. P., Takei, T., Cook, D. J., & Scott, S. H. (2019). Independent representations of ipsilateral and contralateral limbs in primary motor cortex. ELife, 8. https://doi.org/10.7554/eLife.48190 Herculano-Houzel, S. (2009). The human brain in numbers: A linearly scaled-up primate brain. Frontiers in Human Neuroscience, 3. https://doi.org/10.3389/neuro.09.031.2009 Herrmann, M. J., Ehlis, A.-C., & Fallgatter, A. J. (2003). Frontal activation during a verbal- fluency task as measured by near-infrared spectroscopy. Brain Research Bulletin, 61(1), 51–56. https://doi.org/10.1016/s0361-9230(03)00066-2 Heuer, K., Gulban, O. F., Bazin, P.-L., Osoianu, A., Valabregue, R., Santin, M., Herbin, M., &

Toro, R. (2019). Evolution of neocortical folding: A phylogenetic comparative analysis of MRI from 34 primate species. Cortex, 118, 275–291. https://doi.org/10.1016/j.cortex.2019.04.011 Holmes, B. (2007). Lucretius on Creation and Evolution. A Commentary on De Rerum Natura 5.772-1104, by Gordon Campbell: Ancient Philosophy, 27(1), 208–212. https://doi.org/10.5840/ancientphil200727141 Hook-Costigan, M. A., & Rogers, L. J. (1998). Lateralized use of the mouth in production of vocalizations by marmosets. Neuropsychologia, 36(12), 1265–1273. https://doi.org/10.1016/S0028-3932(98)00037-2 Hopkins, W. D., Misiura, M., Pope, S. M., & Latash, E. M. (2015). Behavioral and brain asymmetries in primates: A preliminary evaluation of two evolutionary hypotheses. Annals of the New York Academy of Sciences, 1359, 65–83. https://doi.org/10.1111/nyas.12936 Hoshi, Y. (2016). Hemodynamic signals in fNIRS. Progress in Brain Research, 225, 153–179. https://doi.org/10.1016/bs.pbr.2016.03.004 Hoshi, Y., & Tamura, M. (1993). Dynamic multichannel near-infrared optical imaging of human brain activity. Journal of Applied Physiology (Bethesda, Md.: 1985), 75(4), 1842– 1846. https://doi.org/10.1152/jappl.1993.75.4.1842 Hoshi, Yoko, Huang, J., Kohri, S., Iguchi, Y., Naya, M., Okamoto, T., & Ono, S. (2011). Recognition of Human Emotions from Cerebral Blood Flow Changes in the Frontal Region: A Study with Event-Related Near-Infrared Spectroscopy. Journal of Neuroimaging, 21(2), e94–e101. https://doi.org/10.1111/j.1552-6569.2009.00454.x Hu, X., Zhuang, C., Wang, F., Liu, Y.-J., Im, C.-H., & Zhang, D. (2019). FNIRS Evidence for Recognizably Different Positive Emotions. Frontiers in Human Neuroscience, 13. https://doi.org/10.3389/fnhum.2019.00120 Huxley, J. (1942). Evolution, the Modern Synthesis. G. Allen & Unwin Limited. Huxley, T. H. (1881). On the Origin of Species: Or, The Causes of the Phenomena of Organic Nature, a Course of Six Lectures to Working Men. D. Appleton. Ian, A. (2020). Evolution: The Modern Synthesis. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Evolution:_The_Modern_Synthesis&oldid =941547603 Jack, R. E., Garrod, O. G. B., & Schyns, P. G. (2014). Dynamic Facial Expressions of Emotion Transmit an Evolving Hierarchy of Signals over Time. Current Biology, 24(2), 187–192. https://doi.org/10.1016/j.cub.2013.11.064 Jack, R. E., Sun, W., Delis, I., Garrod, O. G. B., & Schyns, P. G. (2016). Four not six: Revealing culturally common facial expressions of emotion. Journal of Experimental Psychology: General, 145(6), 708–730. https://doi.org/10.1037/xge0000162 Jacques, S. L. (2013). Optical properties of biological tissues: A review. Physics in Medicine

and Biology, 58(11), R37-61. https://doi.org/10.1088/0031-9155/58/11/R37 Jäncke, L., Wüstenberg, T., Schulze, K., & Heinze, H. J. (2002). Asymmetric hemodynamic responses of the human auditory cortex to monaural and binaural stimulation. Hearing Research, 170(1), 166–178. https://doi.org/10.1016/S0378-5955(02)00488-4 Jasper, H. H. (1958). The Ten-Twenty Electrode System of the International Federation. 10, 371– 375. Jöbsis, F. F. (1977). Noninvasive, infrared monitoring of cerebral and myocardial oxygen sufficiency and circulatory parameters. Science (New York, N.Y.), 198(4323), 1264–1267. Johnstone, T., van Reekum, C. M., Oakes, T. R., & Davidson, R. J. (2006). The voice of emotion: An FMRI study of neural responses to angry and happy vocal expressions. Social Cognitive and Affective Neuroscience, 1(3), 242–249. https://doi.org/10.1093/scan/nsl027 Joly, O., Pallier, C., Ramus, F., Pressnitzer, D., Vanduffel, W., & Orban, G. A. (2012). Processing of vocalizations in humans and monkeys: A comparative fMRI study. NeuroImage, 62(3), 1376–1389. https://doi.org/10.1016/j.neuroimage.2012.05.070 Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770– 814. https://doi.org/10.1037/0033-2909.129.5.770 Kaiser, J., Lutzenberger, W., Preissl, H., Ackermann, H., & Birbaumer, N. (2000). Right- Hemisphere Dominance for the Processing of Sound-Source Lateralization. Journal of Neuroscience, 20(17), 6631–6639. https://doi.org/10.1523/JNEUROSCI.20-17- 06631.2000 Kambara, T., Brown, E. C., Silverstein, B. H., Nakai, Y., & Asano, E. (2018). Neural dynamics of verbal working memory in auditory description naming. Scientific Reports, 8(1), 15868. https://doi.org/10.1038/s41598-018-33776-2 Kamiloğlu, R. G., Slocombe, K. E., Haun, D. B. M., & Sauter, D. A. (2020). Human listeners’ perception of behavioural context and core affect dimensions in chimpanzee vocalizations. Proceedings of the Royal Society B: Biological Sciences, 287(1929), 20201148. https://doi.org/10.1098/rspb.2020.1148 Kano, F. (2008). Enhanced recognition of emotional stimuli in the chimpanzee (Pan troglodytes). Animal Cognition. https://doi.org/10.1007/s10071-008-0142-7 Kano, F., & Tomonaga, M. (2010). Attention to emotional scenes including whole-body expressions in chimpanzees (Pan troglodytes). Journal of Comparative Psychology (Washington, D.C.: 1983), 124(3), 287–294. https://doi.org/10.1037/a0019146 Kato, T., Kamei, A., Takashima, S., & Ozaki, T. (1993). Human visual cortical function during photic stimulation monitoring by means of near-infrared spectroscopy. Journal of Cerebral Blood Flow and : Official Journal of the International Society of Cerebral Blood Flow and Metabolism, 13(3), 516–520.

https://doi.org/10.1038/jcbfm.1993.66 Kelly, T., Reby, D., Levréro, F., Keenan, S., Gustafsson, E., Koutseff, A., & Mathevon, N. (2017). Adult human perception of distress in the cries of bonobo, chimpanzee, and human infants. Biological Journal of the Linnean Society, 120(4), 919–930. https://doi.org/10.1093/biolinnean/blw016 Kim, H. Y., Seo, K., Jeon, H. J., Lee, U., & Lee, H. (2017). Application of Functional Near- Infrared Spectroscopy to the Study of Brain Function in Humans and Animal Models. Molecules and Cells, 40(8), 523–532. https://doi.org/10.14348/molcells.2017.0153 Kočandrle, R., & Kleisner, K. (2013). Evolution Born of Moisture: Analogies and Parallels Between Anaximander’s Ideas on Origin of Life and Man and Later Pre-Darwinian and Darwinian Evolutionary Concepts. Journal of the History of Biology, 46(1), 103–124. https://doi.org/10.1007/s10739-012-9334-8 Köchel, A., Plichta, M. M., Schäfer, A., Leutgeb, V., Scharmüller, W., Fallgatter, A. J., & Schienle, A. (2011). Affective perception and imagery: A NIRS study. International Journal of Psychophysiology, 80(3), 192–197. https://doi.org/10.1016/j.ijpsycho.2011.03.006 Kochunov, P., Castro, C., Davis, D., Dudley, D., Brewer, J., Zhang, Y., Kroenke, C. D., Purdy, D., Fox, P. T., Simerly, C., & Schatten, G. (2010). Mapping Primary Gyrogenesis During Fetal Development in Primate Brains: High-Resolution in Utero Structural MRI of Fetal Brain Development in Pregnant Baboons. Frontiers in Neuroscience, 4. https://doi.org/10.3389/fnins.2010.00020 Kotz, S. A., Kalberlah, C., Bahlmann, J., Friederici, A. D., & Haynes, J.-D. (2013). Predicting vocal emotion expressions from the human brain. Human Brain Mapping, 34(8), 1971– 1981. https://doi.org/10.1002/hbm.22041 Kreifelts, B., Ethofer, T., Shiozawa, T., Grodd, W., & Wildgruber, D. (2009). Cerebral representation of non-verbal emotional perception: FMRI reveals audiovisual integration area between voice- and face-sensitive regions in the superior temporal sulcus. Neuropsychologia, 47(14), 3059–3066. https://doi.org/10.1016/j.neuropsychologia.2009.07.001 Kret, M. E., Jaasma, L., Bionda, T., & Wijnen, J. G. (2016). Bonobos (Pan paniscus) show an attentional bias toward conspecifics’ emotions. Proceedings of the National Academy of Sciences, 113(14), 3761–3766. https://doi.org/10.1073/pnas.1522060113 Kret, M. E., Muramatsu, A., & Matsuzawa, T. (2018). Emotion processing across and within species: A comparison between humans (Homo sapiens) and chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 132(4), 395–409. https://doi.org/10.1037/com0000108 Kret, M. E., Prochazkova, E., Sterck, E. H. M., & Clay, Z. (2020). Emotional expressions in human and non-human great apes. Neuroscience & Biobehavioral Reviews.

https://doi.org/10.1016/j.neubiorev.2020.01.027 Kret, M. E., Stekelenburg, J. J., Roelofs, K., & de Gelder, B. (2013). Perception of face and body expressions using electromyography, pupillometry and gaze measures. Frontiers in Psychology, 4, 28. https://doi.org/10.3389/fpsyg.2013.00028 Kret, M. E., & Straffon, L. M. (2018). Reply to Crivelli et al.: The different faces of fear and threat. Evolutionary and cultural insights. Journal of Human Evolution, 125, 193–197. https://doi.org/10.1016/j.jhevol.2017.11.006 Kriegstein, K., & Giraud, A.-L. (2004). Distinct functional substrates along the right superior temporal sulcus for the processing of voices. Neuroimage, 22(2), 948–955. https://doi.org/10.1016/j.neuroimage.2004.02.020 Lacroix, S., Havton, L. A., McKay, H., Yang, H., Brant, A., Roberts, J., & Tuszynski, M. H. (2004). Bilateral corticospinal projections arise from each motor cortex in the macaque monkey: A quantitative study. The Journal of Comparative Neurology, 473(2), 147–161. https://doi.org/10.1002/cne.20051 Lahnakoski, J. M., Glerean, E., Salmi, J., Jääskeläinen, I. P., Sams, M., Hari, R., & Nummenmaa, L. (2012). Naturalistic fMRI Mapping Reveals Superior Temporal Sulcus as the Hub for the Distributed Brain Network for Social Perception. Frontiers in Human Neuroscience, 6. https://doi.org/10.3389/fnhum.2012.00233 Lamarck, J.-B. de. (1802). Recherches sur l’organisation des corps vivans et particulièrement sur son origine, sur la cause de ses développements et des progrès de sa composition, et sur celle qui, tendant continuellement à la détruire dans chaque individu, amène nécessairement sa mort; précédé du discours d’ouverture du course de zoologie, donné dans le Muséum national d’Histoire Naturelle (Maillard). Lamarck, J.-B. de. (1809). Philosophie zoologique. Dentu. Latinus, M., Crabbe, F., & Belin, P. (2011). Learning-induced changes in the cerebral processing of voice identity. Cerebral Cortex (New York, N.Y.: 1991), 21(12), 2820–2828. https://doi.org/10.1093/cercor/bhr077 Latinus, M., McAleer, P., Bestelmeyer, P. E. G., & Belin, P. (2013). Norm-Based Coding of Voice Identity in Human Auditory Cortex. Current Biology, 23(12), 1075–1080. https://doi.org/10.1016/j.cub.2013.04.055 LeDoux, J. (2012). Rethinking the emotional brain. Neuron, 73(4), 653–676. https://doi.org/10.1016/j.neuron.2012.02.004 Lee, Y.-A., Pollet, V., Kato, A., & Goto, Y. (2017). Prefrontal cortical activity associated with visual stimulus categorization in non-human primates measured with near-infrared spectroscopy. Behavioural Brain Research, 317, 327–331. https://doi.org/10.1016/j.bbr.2016.09.068 Leigh, S. R. (2004). Brain growth, life history, and cognition in primate and human evolution. American Journal of Primatology, 62(3), 139–164.

https://doi.org/10.1002/ajp.20012 Leitman, D. I., Wolf, D. H., Ragland, J. D., Laukka, P., Loughead, J., Valdez, J. N., Javitt, D. C., Turetsky, B. I., & Gur, R. C. (2010). “It’s Not What You Say, But How You Say it”: A Reciprocal Temporo-frontal Network for Affective Prosody. Frontiers in Human Neuroscience, 4, 19. https://doi.org/10.3389/fnhum.2010.00019 Lerner, J. S., Li, Y., Valdesolo, P., & Kassam, K. S. (2015). Emotion and Decision Making. Annual Review of Psychology, 66(1), 799–823. https://doi.org/10.1146/annurev-psych- 010213-115043 Lindell, A. K. (2013). Continuities in Emotion Lateralization in Human and Non-Human Primates. Frontiers in Human Neuroscience, 7. https://doi.org/10.3389/fnhum.2013.00464 Linnankoski, I., Laakso, M., Aulanko, R., & Leinonen, L. (1994). Recognition of emotions in macaque vocalizations by children and adults. Language & Communication, 14(2), 183– 192. https://doi.org/10.1016/0271-5309(94)90012-4 Lloyd-Fox, S., Papademetriou, M., Darboe, M. K., Everdell, N. L., Wegmuller, R., Prentice, A. M., Moore, S. E., & Elwell, C. E. (2014). Functional near infrared spectroscopy (fNIRS) to assess cognitive function in infants in rural Africa. Scientific Reports, 4, 4740. https://doi.org/10.1038/srep04740 Love, S. A., Marie, D., Roth, M., Lacoste, R., Nazarian, B., Bertello, A., Coulon, O., Anton, J.- L., & Meguerditchian, A. (2016). The average baboon brain: MRI templates and tissue probability maps from 89 individuals. NeuroImage, 132, 526–533. https://doi.org/10.1016/j.neuroimage.2016.03.018 Mandrick, K., Derosiere, G., Dray, G., Coulon, D., Micallef, J.-P., & Perrey, S. (2013). Prefrontal cortex activity during motor tasks with additional mental load requiring attentional demand: A near-infrared spectroscopy study. Neuroscience Research, 76(3), 156–162. https://doi.org/10.1016/j.neures.2013.04.006 Margiotoudi, K., Marie, D., Claidière, N., Coulon, O., Roth, M., Nazarian, B., Lacoste, R., Hopkins, W. D., Molesti, S., Fresnais, P., Anton, J.-L., & Meguerditchian, A. (2019). Handedness in monkeys reflects hemispheric specialization within the central sulcus. An in vivo MRI study in right- and left-handed olive baboons. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 118, 203–211. https://doi.org/10.1016/j.cortex.2019.01.001 Marie, D., Roth, M., Lacoste, R., Nazarian, B., Bertello, A., Anton, J.-L., Hopkins, W. D., Margiotoudi, K., Love, S. A., & Meguerditchian, A. (2018). Left Brain Asymmetry of the Planum Temporale in a Nonhominid Primate: Redefining the Origin of Brain Specialization for Language. Cerebral Cortex (New York, N.Y.: 1991), 28(5), 1808–1815. https://doi.org/10.1093/cercor/bhx096 Matsukawa, K., Asahara, R., Yoshikawa, M., & Endo, K. (2018). Deactivation of the

prefrontal cortex during exposure to pleasantly-charged emotional challenge. Scientific Reports, 8(1), 14540. https://doi.org/10.1038/s41598-018-32752-0 Matsumoto, D., Keltner, D., Shiota, M. N., O’Sullivan, M., & Frank, M. (2008). Facial expressions of emotion. In Handbook of emotions, 3rd ed (pp. 211–234). The Guilford Press. Matsuo, K., Kato, T., Taneichi, K., Matsumoto, A., Ohtani, T., Hamamoto, T., Yamasue, H., Sakano, Y., Sasaki, T., Sadamatsu, M., Iwanami, A., Asukai, N., & Kato, N. (2003). Activation of the prefrontal cortex to trauma-related stimuli measured by near- infrared spectroscopy in posttraumatic stress disorder due to terrorism. Psychophysiology, 40(4), 492–500. https://doi.org/10.1111/1469-8986.00051 Maupertuis, P.-L. M. de (1698-1759) A. du texte. (1745). Vénus physique. https://gallica.bnf.fr/ark:/12148/bpt6k862803 Mazurier, rôle de Joko. (1826). NYPL Digital Collections. http://digitalcollections.nypl.org/items/59d55990-beaf-0132-6e89-58d385a7bbd0 Meguerditchian, A., Gardner, M. J., Schapiro, S. J., & Hopkins, W. D. (2012). The sound of one-hand clapping: Handedness and perisylvian neural correlates of a communicative gesture in chimpanzees. Proceedings of the Royal Society B: Biological Sciences, 279(1735), 1959–1966. https://doi.org/10.1098/rspb.2011.2485 Meguerditchian, A., & Vauclair, J. (2006). Baboons communicate with their right hand. Behavioural Brain Research, 171(1), 170–174. https://doi.org/10.1016/j.bbr.2006.03.018 Meguerditchian, A., Vauclair, J., & Hopkins, W. D. (2013). On the origins of human handedness and language: A comparative review of hand preferences for bimanual coordinated actions and gestural communication in nonhuman primates. Developmental Psychobiology, 55(6), 637–650. https://doi.org/10.1002/dev.21150 Monboddo, L. J. B. (1774). Of the Origin and Progress of Language. J. Balfour. Montag, C., & Panksepp, J. (2017). Primary Emotional Systems and Personality: An Evolutionary Perspective. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.00464 Morton, E. S. (1977). On the Occurrence and Significance of Motivation-Structural Rules in Some Bird and Mammal Sounds. The American Naturalist, 111(981), 855–869. https://doi.org/10.1086/283219 Morton, E. S. (1982). Grading, discreteness, redundancy, and motivation-structural rules. In Acoustic Communication in Birds (Kroodsma, D.E., Miller, E.H. and Ouellet, H., pp. 182–212). Academic Press. Murray, E. A., & Izquierdo, A. (2007). Orbitofrontal Cortex and Amygdala Contributions to Affect and Action in Primates. Annals of the New York Academy of Sciences, 1121(1), 273–296. https://doi.org/10.1196/annals.1401.021 Nesse, R. M. (1990). Evolutionary explanations of emotions. Human Nature, 1(3), 261–289.

https://doi.org/10.1007/BF02733986 Nesse, R. M. (2004). Natural selection and the elusiveness of happiness. Philosophical Transactions of the Royal Society B: Biological Sciences, 359(1449), 1333–1347. https://doi.org/10.1098/rstb.2004.1511 Nesse, R. M. (2009). Evolution of emotion. In The Oxford companion to Emotion and the Affective Sciences (pp. 159–164). Oxford University Press. Nimchinsky, E. A., Gilissen, E., Allman, J. M., Perl, D. P., Erwin, J. M., & Hof, P. R. (1999). A neuronal morphologic type unique to humans and great apes. Proceedings of the National Academy of Sciences of the United States of America, 96(9), 5268–5273. https://doi.org/10.1073/pnas.96.9.5268 Ogawa, S., Lee, T. M., Kay, A. R., & Tank, D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proceedings of the National Academy of Sciences of the United States of America, 87(24), 9868–9872. Ohtani, T., Matsuo, K., Kasai, K., Kato, T., & Kato, N. (2005). Hemodynamic response to emotional memory recall with eye movement. Neuroscience Letters, 380(1), 75–79. https://doi.org/10.1016/j.neulet.2005.01.020 Okada, E., & Delpy, D. T. (2003a). Near-infrared light propagation in an adult head model. I. Modeling of low-level scattering in the cerebrospinal fluid layer. Applied Optics, 42(16), 2906–2914. https://doi.org/10.1364/AO.42.002906 Okada, E., & Delpy, D. T. (2003b). Near-infrared light propagation in an adult head model. II. Effect of superficial tissue thickness on the sensitivity of the near-infrared spectroscopy signal. Applied Optics, 42(16), 2915–2921. https://doi.org/10.1364/AO.42.002915 Okamoto, M., Dan, H., Sakamoto, K., Takeo, K., Shimizu, K., Kohno, S., Oda, I., Isobe, S., Suzuki, T., Kohyama, K., & Dan, I. (2004). Three-dimensional probabilistic anatomical cranio-cerebral correlation via the international 10-20 system oriented for transcranial functional brain mapping. NeuroImage, 21(1), 99–111. Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2010, December 23). FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data [Research Article]. Computational Intelligence and Neuroscience; Hindawi. https://doi.org/10.1155/2011/156869 Owren, M. J., Amoss, R. T., & Rendall, D. (2011). Two organizing principles of vocal production: Implications for nonhuman and human primates. American Journal of Primatology, 73(6), 530–544. https://doi.org/10.1002/ajp.20913 Ozawa, S., Matsuda, G., & Hiraki, K. (2014). Negative emotion modulates prefrontal cortex activity during a working memory task: A NIRS study. Frontiers in Human Neuroscience, 8. https://doi.org/10.3389/fnhum.2014.00046 Pallen, M. (2011). The Rough Guide to Evolution. Rough Guides UK.

Panksepp, J. (2005). Affective consciousness: Core emotional feelings in animals and humans. Consciousness and Cognition, 14(1), 30–80. https://doi.org/10.1016/j.concog.2004.10.004 Panksepp, J. (2010). Affective consciousness in animals: Perspectives on dimensional and primary process emotion approaches. Proceedings. Biological Sciences / The Royal Society, 277, 2905–2907. https://doi.org/10.1098/rspb.2010.1017 Panksepp, J. (2011a). Cross-species affective neuroscience decoding of the primal affective experiences of humans and related animals. PloS One, 6(9), e21236. https://doi.org/10.1371/journal.pone.0021236 Panksepp, J. (2011b). The basic emotional circuits of mammalian brains: Do animals have affective lives? Neuroscience and Biobehavioral Reviews, 35(9), 1791–1804. https://doi.org/10.1016/j.neubiorev.2011.08.003 Panksepp, J. (2014). The science of emotions at TEDxRainier. https://www.youtube.com/watch?v=65e2qScV_K8 Panksepp, J., & Burgdorf, J. (2003). “Laughing” rats and the evolutionary antecedents of human joy? Physiology & Behavior, 79(3), 533–547. Pannese, A., Grandjean, D. M., & Frühholz, S. (2015). Subcortical processing in auditory communication. Hearing Research, 328, 67–77. https://doi.org/10.1016/j.heares.2015.07.003 Parr, L. A., & Waller, B. M. (2006). Understanding chimpanzee facial expression: Insights into the evolution of communication. Social Cognitive and Affective Neuroscience, 1(3), 221–228. https://doi.org/10.1093/scan/nsl031 Parr, L. A., Waller, B. M., & Fugate, J. (2005). Emotional communication in primates: Implications for neurobiology. Current Opinion in Neurobiology, 15(6), 716–720. https://doi.org/10.1016/j.conb.2005.10.017 Perelman, P., Johnson, W. E., Roos, C., Seuánez, H. N., Horvath, J. E., Moreira, M. A. M., Kessing, B., Pontius, J., Roelke, M., Rumpler, Y., Schneider, M. P. C., Silva, A., O’Brien, S. J., & Pecon-Slattery, J. (2011). A Molecular Phylogeny of Living Primates. PLOS Genetics, 7(3), e1001342. https://doi.org/10.1371/journal.pgen.1001342 Peretz, I., Kolinsky, R., Tramo, M., Labrecque, R., Hublet, C., Demeurisse, G., & Belleville, S. (1994). Functional dissociations following bilateral lesions of auditory cortex. Brain: A Journal of Neurology, 117 ( Pt 6), 1283–1301. https://doi.org/10.1093/brain/117.6.1283 Pernet, C. R., McAleer, P., Latinus, M., Gorgolewski, K. J., Charest, I., Bestelmeyer, P. E. G., Watson, R. H., Fleming, D., Crabbe, F., Valdes-Sosa, M., & Belin, P. (2015). The human voice areas: Spatial organization and inter-individual variability in temporal and extra-temporal cortices. NeuroImage, 119, 164–174. https://doi.org/10.1016/j.neuroimage.2015.06.050 Perrodin, C., Kayser, C., Logothetis, N. K., & Petkov, C. I. (2011). Voice cells in the primate

temporal lobe. Current Biology : CB, 21(16), 1408–1415. https://doi.org/10.1016/j.cub.2011.07.028 Petkov, C. I., Kayser, C., Steudel, T., Whittingstall, K., Augath, M., & Logothetis, N. K. (2008). A voice region in the monkey brain. Nature Neuroscience, 11(3), 367–374. https://doi.org/10.1038/nn2043 Petrides, M., & Pandya, D. N. (2002). Comparative cytoarchitectonic analysis of the human and the macaque ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. The European Journal of Neuroscience, 16(2), 291–310. https://doi.org/10.1046/j.1460-9568.2001.02090.x Phelps, E. A., Lempert, K. M., & Sokol-Hessner, P. (2014). Emotion and decision making: Multiple modulatory neural circuits. Annual Review of Neuroscience, 37, 263–287. https://doi.org/10.1146/annurev-neuro-071013-014119 Pilcher, D. L., Hammock, E. A. D., & Hopkins, W. D. (2001). Cerebral volumetric asymmetries in non-human primates: A magnetic resonance imaging study. Laterality, 6(2), 165–179. https://doi.org/10.1080/13576500042000124 Piper, S. K., Krueger, A., Koch, S. P., Mehnert, J., Habermehl, C., Steinbrink, J., Obrig, H., & Schmitz, C. H. (2014). A wearable multi-channel fNIRS system for brain imaging in freely moving subjects. NeuroImage, 85 Pt 1, 64–71. https://doi.org/10.1016/j.neuroimage.2013.06.062 Plichta, M. M., Gerdes, A. B. M., Alpers, G. W., Harnisch, W., Brill, S., Wieser, M. J., & Fallgatter, A. J. (2011). Auditory cortex activation is modulated by emotion: A functional near-infrared spectroscopy (fNIRS) study. NeuroImage, 55(3), 1200–1207. https://doi.org/10.1016/j.neuroimage.2011.01.011 Poremba, A., Malloy, M., Saunders, R. C., Carson, R. E., Herscovitch, P., & Mishkin, M. (2004). Species-specific calls evoke asymmetric activity in the monkey’s temporal poles. Nature, 427(6973), 448–451. https://doi.org/10.1038/nature02268 Preston, S. D., & de Waal, F. B. M. (2002). Empathy: Its ultimate and proximate bases. The Behavioral and Brain Sciences, 25(1), 1–20; discussion 20-71. https://doi.org/10.1017/s0140525x02000018 Proctor, H. S., Carder, G., & Cornish, A. R. (2013). Searching for Animal Sentience: A Systematic Review of the Scientific Literature. Animals, 3(3), 882–906. https://doi.org/10.3390/ani3030882 Rauschecker, J. P. (1997). Processing of complex sounds in the auditory cortex of cat, monkey, and man. Acta Oto-Laryngologica. Supplementum, 532, 34–38. https://doi.org/10.3109/00016489709126142 Rauschecker, J. P. (2013). Brain networks for the encoding of emotions in communication sounds of human and nonhuman primates. In The Evolution of Emotional Communication: From Sounds in Nonhuman Mammals to Speech and Music in Man (OUP

Oxford, pp. 49–60). Altenmüller, E., Schmidt, S., Zimmerman, E. Rendall, D. (2003). Acoustic correlates of caller identity and affect intensity in the vowel-like grunt vocalizations of baboons. The Journal of the Acoustical Society of America, 113(6), 3390–3402. https://doi.org/10.1121/1.1568942 Rilling, J. K. (2014). Comparative primate neuroimaging: Insights into human brain evolution. Trends in Cognitive Sciences, 18(1), 46–55. https://doi.org/10.1016/j.tics.2013.09.013 Robinson, C. J., & Burton, H. (1980). Organization of somatosensory receptive fields in cortical areas 7b, retroinsula, postauditory and granular insula of M. fascicularis. Journal of Comparative Neurology, 192(1), 69–92. https://doi.org/10.1002/cne.901920105 Rolls, E. T. (2004). Convergence of sensory systems in the orbitofrontal cortex in primates and brain design for emotion. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology, 281A(1), 1212–1225. https://doi.org/10.1002/ar.a.20126 Rosenzweig, E. S., Brock, J. H., Culbertson, M. D., Lu, P., Moseanko, R., Edgerton, V. R., Havton, L. A., & Tuszynski, M. H. (2009). Extensive spinal and bilateral termination of cervical corticospinal projections in rhesus monkeys. Journal of Comparative Neurology, 513(2), 151–163. https://doi.org/10.1002/cne.21940 Ross, M. D., Menzler, S., & Zimmermann, E. (2008). Rapid facial mimicry in orangutan play. Biology Letters, 4(1), 27–30. https://doi.org/10.1098/rsbl.2007.0535 Rovetti, J., Goy, H., Pichora-Fuller, M. K., & Russo, F. A. (2019). Functional Near-Infrared Spectroscopy as a Measure of Listening Effort in Older Adults Who Use Hearing Aids. Trends in Hearing, 23, 2331216519886722. https://doi.org/10.1177/2331216519886722 Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. https://doi.org/10.1037/h0077714 Sander, D. (2013). Models of emotion: The affective neuroscience approach. In The Cambridge handbook of human affective neuroscience (pp. 5–53). Cambridge University Press. https://doi.org/10.1017/CBO9780511843716.003 Sander, D., Grandjean, D., Pourtois, G., Schwartz, S., Seghier, M. L., Scherer, K. R., & Vuilleumier, P. (2005). Emotion and attention interactions in social cognition: Brain regions involved in processing anger prosody. NeuroImage, 28(4), 848–858. https://doi.org/10.1016/j.neuroimage.2005.06.023 Sander, D., Grandjean, D., & Scherer, K. R. (2005). A systems approach to appraisal mechanisms in emotion. Neural Networks, 18(4), 317–352. https://doi.org/10.1016/j.neunet.2005.03.001 Sander, D., Grandjean, D., & Scherer, K. R. (2018). An Appraisal-Driven Componential Approach to the Emotional Brain: Emotion Review.

https://doi.org/10.1177/1754073918765653 Schaerlaeken, S., & Grandjean, D. (2018). Unfolding and dynamics of affect bursts decoding in humans. PloS One, 13(10), e0206216. https://doi.org/10.1371/journal.pone.0206216 Scherer, K. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40(1–2), 227–256. https://doi.org/10.1016/S0167-6393(02)00084-5 Scherer, K. R. (1982). Emotion as a process: Function, origin and regulation. Social Science Information/Sur Les Sciences Sociales, 21(4–5), 555–570. https://doi.org/10.1177/053901882021004004 Scherer, K. R. (1988). On the Symbolic Functions of Vocal Affect Expression: Journal of Language and Social Psychology. https://doi.org/10.1177/0261927X8800700201 Scherer, K. R. (1992). Vocal affect expression as symptom, symbol, and appeal. In Nonverbal Vocal Communication: Comparative and Developmental Approaches (Cambridge University press, pp. 43–60). Papousek, H., Jürgens, U., Papousek, M. https://www.researchgate.net/publication/313721225_Vocal_affect_expression_as_sy mptom_symbol_and_appeal Scherer, K. R. (2001). Appraisal considered as a process of multilevel sequential checking. In Appraisal processes in emotion: Theory, methods, research (pp. 92–120). Oxford University Press. Scherer, K. R. (2009a). Affect bursts. In The Oxford companion to Emotion and the Affective Sciences (p. 11). Oxford University Press. Scherer, K. R. (2009b). Component process model. In The Oxford companion to Emotion and the Affective Sciences (pp. 93–94). Oxford University Press. Scherer, K. R. (2009c). Principles of emotional expression. In The Oxford companion to Emotion and the Affective Sciences (pp. 167–169). Oxford University Press. Scherer, K. R. (2009d). Push/Pull effects. In The Oxford companion to Emotion and the Affective Sciences (p. 326). Oxford University Press. Scheumann, M., Hasting, A. S., Kotz, S. A., & Zimmermann, E. (2014). The voice of emotion across species: How do human listeners recognize animals’ affective states? PloS One, 9(3), e91192. https://doi.org/10.1371/journal.pone.0091192 Scheumann, M., Hasting, A. S., Zimmermann, E., & Kotz, S. A. (2017). Human Novelty Response to Emotional Animal Vocalizations: Effects of Phylogeny and Familiarity. Frontiers in Behavioral Neuroscience, 11. https://doi.org/10.3389/fnbeh.2017.00204 Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences, 10(1), 24–30. https://doi.org/10.1016/j.tics.2005.11.009 Schmidt, K. L., & Cohn, J. F. (2001). Human facial expressions as : Evolutionary questions in facial expression research. American Journal of Physical Anthropology, 116(S33), 3–24. https://doi.org/10.1002/ajpa.20001

Schneider, S., Christensen, A., Häußinger, F. B., Fallgatter, A. J., Giese, M. A., & Ehlis, A.-C. (2014). Show me how you walk and I tell you how you feel—A functional near- infrared spectroscopy study on emotion perception based on human gait. NeuroImage, 85 Pt 1, 380–390. https://doi.org/10.1016/j.neuroimage.2013.07.078 Scholkmann, F., Spichtig, S., Muehlemann, T., & Wolf, M. (2010). How to detect and reduce movement artifacts in near-infrared imaging using moving standard deviation and spline interpolation. Physiological Measurement, 31(5), 649–662. https://doi.org/10.1088/0967-3334/31/5/004 Scholkmann, Felix, Kleiser, S., Metz, A. J., Zimmermann, R., Mata Pavia, J., Wolf, U., & Wolf, M. (2014). A review on continuous wave functional near-infrared spectroscopy and imaging instrumentation and methodology. NeuroImage, 85 Pt 1, 6–27. https://doi.org/10.1016/j.neuroimage.2013.05.004 Schore, J. R., & Schore, A. N. (2008). Modern Attachment Theory: The Central Role of Affect Regulation in Development and Treatment. Clinical Social Work Journal, 36(1), 9–20. https://doi.org/10.1007/s10615-007-0111-7 Schrago, C. G., & Voloch, C. M. (2013). The precision of the hominid timescale estimated by relaxed clock methods. Journal of Evolutionary Biology, 26(4), 746–755. https://doi.org/10.1111/jeb.12076 Semendeferi, K., Lu, A., Schenker, N., & Damasio, H. (2002). Humans and great apes share a large frontal cortex. Nature Neuroscience, 5(3), 272–276. https://doi.org/10.1038/nn814 Slocombe, K. E., Townsend, S. W., & Zuberbühler, K. (2009). Wild chimpanzees (Pan troglodytes schweinfurthii) distinguish between different scream types: Evidence from a playback study. Animal Cognition, 12(3), 441–449. https://doi.org/10.1007/s10071-008-0204-x Slocombe, K. E., & Zuberbühler, K. (2005). Agonistic screams in wild chimpanzees (Pan troglodytes schweinfurthii) vary as a function of social role. Journal of Comparative Psychology (Washington, D.C.: 1983), 119(1), 67–77. https://doi.org/10.1037/0735- 7036.119.1.67 Slocombe, K. E., & Zuberbühler, K. (2007). Chimpanzees modify recruitment screams as a function of audience composition. Proceedings of the National Academy of Sciences of the United States of America, 104(43), 17228–17233. https://doi.org/10.1073/pnas.0706741104 Slocombe, K. E, Townsend, S. W., & Zuberbüuhler, K. (2009). Wild chimpanzees (Pan troglodytes schweinfurthii) distinguish between different scream types: Evidence from a playback study. Animal Cognition, 12(3), 441–449. https://doi.org/10.1007/s10071-008-0204-x Smiley, J. F., & Falchier, A. (2009). Multisensory connections of monkey auditory cerebral cortex. Hearing Research, 258(1), 37–46. https://doi.org/10.1016/j.heares.2009.06.019

Smith, S. M., Jenkinson, M., Woolrich, M. W., Beckmann, C. F., Behrens, T. E. J., Johansen- Berg, H., Bannister, P. R., De Luca, M., Drobnjak, I., Flitney, D. E., Niazy, R. K., Saunders, J., Vickers, J., Zhang, Y., De Stefano, N., Brady, J. M., & Matthews, P. M. (2004). Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage, 23 Suppl 1, S208-219. https://doi.org/10.1016/j.neuroimage.2004.07.051 Staes, N., Smaers, J. B., Kunkle, A. E., Hopkins, W. D., Bradley, B. J., & Sherwood, C. C. (2018). Evolutionary divergence of neuroanatomical organization and related genes in chimpanzees and bonobos. Cortex. https://doi.org/10.1016/j.cortex.2018.09.016 Stout, D., & Chaminade, T. (2012). Stone tools, language and the brain in human evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1585), 75–87. https://doi.org/10.1098/rstb.2011.0099 Strait, M., & Scheutz, M. (2014). What we can and cannot (yet) do with functional near infrared spectroscopy. Frontiers in Neuroscience, 8, 117. https://doi.org/10.3389/fnins.2014.00117 Tachtsidis, I., & Scholkmann, F. (2016). False positives and false negatives in functional near-infrared spectroscopy: Issues, challenges, and the way forward. Neurophotonics, 3(3), 031405. https://doi.org/10.1117/1.NPh.3.3.031405 Taglialatela, J. P., Russell, J. L., Schaeffer, J. A., & Hopkins, W. D. (2009). Visualizing Vocal Perception in the Chimpanzee Brain. Cerebral Cortex (New York, NY), 19(5), 1151– 1157. https://doi.org/10.1093/cercor/bhn157 Tak, S., Uga, M., Flandin, G., Dan, I., & Penny, W. D. (2016). Sensor space group analysis for fNIRS data. Journal of Neuroscience Methods, 264, 103–112. https://doi.org/10.1016/j.jneumeth.2016.03.003 Taylor, A. M., & Reby, D. (2010). The contribution of source–filter theory to mammal vocal communication research. Journal of Zoology, 280(3), 221–236. https://doi.org/10.1111/j.1469-7998.2009.00661.x Team, R. (2020). RStudio: Integrated Development for R. RStudio. RStudio, Inc. https://rstudio.com/ Tie, Y., Suarez, R. O., Whalen, S., Radmanesh, A., Norton, I. H., & Golby, A. J. (2009). Comparison of blocked and event-related fMRI designs for pre-surgical language mapping. NeuroImage, 47 Suppl 2, T107-115. https://doi.org/10.1016/j.neuroimage.2008.11.020 Titze, I. R. (1994). Principles of Voice Production. Prentice Hall. Tops, M., & Boksem, M. A. S. (2011). A Potential Role of the Inferior Frontal Gyrus and Anterior Insula in Cognitive Control, Brain Rhythms, and Event-Related Potentials. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00330 Tsuzuki, D., Jurcak, V., Singh, A. K., Okamoto, M., Watanabe, E., & Dan, I. (2007). Virtual

spatial registration of stand-alone fNIRS data to MNI space. NeuroImage, 34(4), 1506– 1518. https://doi.org/10.1016/j.neuroimage.2006.10.043 Tupak, S. V., Dresler, T., Guhn, A., Ehlis, A.-C., Fallgatter, A. J., Pauli, P., & Herrmann, M. J. (2014). Implicit emotion regulation in the presence of threat: Neural and autonomic correlates. NeuroImage, 85, 372–379. https://doi.org/10.1016/j.neuroimage.2013.09.066 Tuttle, R. H. (1993). Kano, T. 1992. The Last Ape: Pygmy Chimpanzee Behavior and Ecology. Stanford University Press, Stanford, CA, xxviii + 248 pp. ISBN 0-8047-1612-9. Price (hardbound), $45.00. Journal of Mammalogy, 74(1), 239–240. https://doi.org/10.2307/1381928 Vergotte, G., Perrey, S., Muthuraman, M., Janaqi, S., & Torre, K. (2018). Concurrent Changes of Brain Functional Connectivity and Motor Variability When Adapting to Task Constraints. Frontiers in Physiology, 9. https://doi.org/10.3389/fphys.2018.00909 Villringer, A., Planck, J., Hock, C., Schleinkofer, L., & Dirnagl, U. (1993). Near infrared spectroscopy (NIRS): A new tool to study hemodynamic changes during activation of brain function in human adults. Neuroscience Letters, 154(1), 101–104. https://doi.org/10.1016/0304-3940(93)90181-J Wagner, M., & Watson, D. G. (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes, 25(7–9), 905–945. https://doi.org/10.1080/01690961003589492 Wakita, M., Shibasaki, M., Ishizuka, T., Schnackenberg, J., Fujiawara, M., & Masataka, N. (2010). Measurement of neuronal activity in a macaque monkey in response to animate images using near-infrared spectroscopy. Frontiers in Behavioral Neuroscience, 4, 31. https://doi.org/10.3389/fnbeh.2010.00031 Wallez, C., & Vauclair, J. (2011). Right hemisphere dominance for emotion processing in baboons. Brain and Cognition, 75(2), 164–169. https://doi.org/10.1016/j.bandc.2010.11.004 Wassenaar, E. B., & Van den Brand, J. G. H. (2005). Reliability of near-infrared spectroscopy in people with dark skin pigmentation. Journal of Clinical Monitoring and Computing, 19(3), 195–199. https://doi.org/10.1007/s10877-005-1655-0 Whitfield-Gabrieli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity, 2(3), 125–141. https://doi.org/10.1089/brain.2012.0073 Wildgruber, D., Ethofer, T., Grandjean, D., & Kreifelts, B. (2009). A cerebral network model of speech prosody comprehension. International Journal of Speech-Language Pathology, 11(4), 277–281. https://doi.org/10.1080/17549500902943043 Witteman, J., Van Heuven, V., & Schiller, N. (2012). Hearing feelings: A quantitative meta- analysis on the neuroimaging literature of emotional prosody perception. Neuropsychologia, 50(12), 2752–2763.

223

https://doi.org/10.1016/j.neuropsychologia.2012.07.026 Wyhe, J. V., & Rookmaaker, K. (2012). A new theory to explain the receipt of Wallace’s Ternate Essay by Darwin in 1858. Biological Journal of the Linnean Society, 105(1), 249– 252. https://doi.org/10.1111/j.1095-8312.2011.01808.x Yang, H., Zhou, Z., Liu, Y., Ruan, Z., Gong, H., Luo, Q., & Lu, Z. (2007). Gender difference in hemodynamic responses of prefrontal area to emotional stress by near-infrared spectroscopy. Behavioural Brain Research, 178(1), 172–176. https://doi.org/10.1016/j.bbr.2006.11.039 Yeni-Komshian, G. H., & Benson, D. A. (1976). Anatomical study of cerebral asymmetry in the temporal lobe of humans, chimpanzees, and rhesus monkeys. Science, 192(4237), 387–389. https://doi.org/10.1126/science.816005 Young, A. W., Frühholz, S., & Schweinberger, S. R. (2020). Face and Voice Perception: Understanding Commonalities and Differences. Trends in Cognitive Sciences, 0(0). https://doi.org/10.1016/j.tics.2020.02.001 Zäske, R., Hasan, B. A. S., & Belin, P. (2017). It doesn’t matter what you say: FMRI correlates of voice learning and recognition independent of speech content. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, 94, 100–112. https://doi.org/10.1016/j.cortex.2017.06.005 Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cerebral Cortex (New York, N.Y.: 1991), 11(10), 946–953. https://doi.org/10.1093/cercor/11.10.946 Zhang, D., Zhou, Y., & Yuan, J. (2018). Speech Prosodies of Different Emotional Categories Activate Different Brain Regions in Adult Cortex: An fNIRS Study. Scientific Reports, 8(1), 218. https://doi.org/10.1038/s41598-017-18683-2 Zimmermann, E., Leliveld, L., & Schehka, S. (2013). Toward the evolutionary roots of affective prosody in human acoustic communication: A comparative approach to mammalian voices. Oxford University Press. https://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199583560.001.0001/ acprof-9780199583560-chapter-8 Zuberbühler, K. (2000). Referential labelling in Diana monkeys. Animal Behaviour, 59(5), 917– 927. https://doi.org/10.1006/anbe.1999.1317

224

Additional Project

Djinni: A Novel Technology Supported Exposure Therapy Paradigm for SAD Combining Virtual Reality and Augmented Reality

Maher Ben-Moussa*, Coralie Debracque*, Marius Rubo*, and Wolf-Gero Lange*
*All authors listed have made an equally substantial, direct, and intellectual contribution to the work

Published on April 28th 2017 in Frontiers in Psychiatry, doi: 10.3389/fpsyt.2017.00026

Abstract

The present paper explores the benefits and the capabilities of various emerging state-of-the-art interactive 3D and Internet of Things technologies and investigates how these technologies can be exploited to develop a more effective technology-supported exposure therapy solution for social anxiety disorder. “DJINNI” is a conceptual design of an in vivo augmented reality (AR) exposure therapy mobile support system that exploits several capturing technologies and integrates the patient’s state and situation by vision-based, audio-based, and physiology-based analysis as well as by indoor/outdoor localization techniques. DJINNI also comprises an innovative virtual reality exposure therapy system that is adaptive and customizable to the demands of the in vivo experience and therapeutic progress. DJINNI follows a gamification approach where rewards and achievements are utilized to motivate the patient to progress in her/his treatment. The current paper reviews the state of the art of technologies needed for such a solution and recommends how these technologies could be integrated in the development of an individually tailored and yet feasible and effective AR/virtual reality-based exposure therapy. Finally, the paper outlines how DJINNI could be part of classical cognitive behavioural treatment and how to validate such a setup.

Keywords: social anxiety disorder, social phobia, VRET, exposure therapy, virtual reality therapy, augmented reality


Introduction

In the well-known movie Amélie (1), the eponymous heroine has trouble managing social situations. When a friend of hers is publicly embarrassed by his boss, being called a vegetable, she intends to help, but does not know how to react. However, her wishful thinking of having someone whisper the perfect response to her at the right time magically comes true, and she confidently masters the situation by replying: “You’ll never be a vegetable. Even artichokes have hearts!” While the dream of being helped out unobtrusively when caught off-guard or feeling insecure in social situations is usually not fulfilled and may even seem naïve, the aim of this paper is to explore the technical possibilities of achieving just this. By using the metaphor of a “DJINNI,” an Arabian mythological spirit that can be summoned to help its “Master” and has the ability to hide itself from others, we convey our core idea behind a set of technological solutions for helping individuals who fear social situations. The last two decades saw the emergence of virtual reality (VR) as a therapeutic tool for the treatment of various mental disorders (2). Especially in the field of anxiety disorders, the immersive experience that VR offers has made it a useful tool for exposure therapy, the gold standard in the treatment of these conditions. In fact, in situations where the confrontation with the feared object or situation is expensive (e.g., fear of flying) or difficult to provide (e.g., high buildings in rural areas when treating fear of heights; fear of open spaces; audiences for fear of speaking in public, etc.), virtual reality exposure therapy (VRET) has become a tangible solution (3). Most challenging in that respect, however, is social anxiety disorder (SAD). The nature of the social situations that SAD patients fear is heterogeneous. Simulating, in VR, the various in vivo situations where patients can experience possible negative evaluation by others is quite difficult to achieve technologically. A lot of effort and resources would be required to build virtual environments that would simulate the different situations experienced by the patient. In fact, the heterogeneous nature of the situations experienced by SAD patients is also difficult to address with traditional approaches to exposure therapy. While the presence of the therapist in traditional exposure therapy for first guidance or modelling purposes is usual in the treatment of other anxiety disorders (e.g., specific phobia or agoraphobia), it seems rather odd to accompany a patient to speeches they have to give, or to parties, or dates. As a mental disorder, SAD is desperately in need of novel technological therapeutic solutions that can overcome the current limitations. The aim of the present manuscript is to explore the potential of new, emerging alternative technologies that have advanced considerably in recent years [e.g., augmented reality (AR) glasses and various Internet of Things (IoT) devices such as smartwatches, sporting sensors, etc.] as support tools for exposure therapy. With “DJINNI,” we propose a technological solution integrating VR, AR, and IoT technologies to deliver a complete support solution for patients suffering from SAD as well as for their therapists.

SAD as a mental disorder

Social anxiety disorder (or social phobia) is a mental disorder accumulating in frequency (4,5), with lifetime prevalence rates of 7–13% in Europe (6), and it belongs to the most prevalent mental disorders after depression and substance abuse (7, 8). It is a debilitating condition characterized by marked fear and embarrassment in social and performance situations and when under scrutiny by others (9). SAD typically has a chronic course if untreated (6), and becomes increasingly associated with comorbid mental problems such as depression, or alcohol abuse. In the Netherlands, mental health services for SAD cost 11,952€ per capita. Even the costs for sub-threshold SAD sum up to 4,687€, substantially more than for healthy individuals (10, 11). But, most importantly, it severely disrupts social and occupational functioning (6, 12) of individuals suffering from SAD. Cognitive models of SAD (13, 14) suggest that socially anxious individuals (SAs) tend to interpret their own performance in social contexts and ambiguous social information surrounding them, in a threat-confirming way (15). In addition, they are believed to quickly attend to negative social stimuli, predominantly faces (16). Both of these cognitive biases are believed to fuel SAs’ subjective experiences of anxiety and reconfirm the fearful convictions they enter a situation with thereby maintaining the disorder (12, 13). Interestingly, measurable relationships with observable behaviour (14, 17-20) or physiology (21, 22) have been indecisive. This is intriguing as cumulative evidence suggests that SAs are experienced as somewhat odd in interactions—although no systematic behavioural patterns have been identified (23, 24). With SAs being experts in looking for signs of devaluation, they will most likely sense negative signals validating their fear of negative evaluation (24). Consequently, along with attentional biases, and interpretational biases, measurement of psychophysiological indices and assessment as well as retraining of “true” social behaviour in real social contexts should become part of therapeutic approaches in the future.


In sum, while evidence supports the notion that social anxiety is related to a negative interpretive and attentional biases, and elevated subjective anxiety levels (15), deviations in behavioural and physiological indices have been difficult to substantiate. One reason may be that fear is, according to Lang’s model of emotion (25, 26), reflected in three independent but interrelated systems: verbal report, fear related behaviour, and patterns of somatic activation. He stresses that these systems are not necessarily synchronized. In fact, it is plausible to assume that, e.g., continuous reporting (on one’s thoughts, behaviour, or physiology) will disrupt their natural patterns. Additionally, human (anxious) behaviour is often guided by reflexive, automatic behavioural responses inaccessible to introspection (27, 28). When social anxiety is considered, self-report could be compromised by SAs’ prone- ness to social desirability effects (29), or again cognitive biases. Furthermore, SAs’ self- reported deficits in, e.g., social skills appear to be quite accurate in one context (social interaction) but not in another (public speaking) (30). Another reason for these inconsistencies may lie in the fact that social anxiety is quite heterogeneous. Its specific form knows many facets such as fear of trembling, blushing, or sweating, and speaking, writing, eating/drinking, and urinating in public restrooms. Often, research participants are screened on general aspects of social anxiety, which they all share. However, stimuli or social scenarios might not be specific enough to tap into their particular fear. Although effective treatment regimens exist, the very same heterogeneity is thought responsible for the smaller treatment outcomes and higher relapse rates when compared to exposure therapy in other anxiety disorders: the effect sizes of SAD treatments lag behind those of the other anxiety disorders (>1.74) (31, 32) and of the 60% that remit due to treatment 40% relapse (33). In addition, cognitive behaviour therapy (CBT), the gold standard in anxiety treatment, focuses on distorted cognitions and exposure to frequently avoided social situations, but readily ignores shortcomings in subtle interpersonal behaviour. Finally, replicable scientific investigation of observable social behaviour in real life is hampered by the necessity to incorporate numbers of confederates. Their (non-)verbal reactions, even after extensive training, are neither completely controllable nor predictable. Physiological assessment with mobile hardware makes “normal” interaction awkward. In addition, a reliable behavioural observation demands at least two observers/film angles capable of registering, e.g., frequency and length of eye contact or physical distance between interlocutors, and a team of observers/evaluators blind to the conditions (of the


participant). High-resolution measurement is impossible in real social environments, even under laboratory conditions. With regard to exposure treatment, SAD is one of the few anxiety disorders where the therapist is usually absent during the in vivo exposure sessions. He or she can only rely on the subjective record of the patient’s description of the situation. They cannot assess the anxiety levels or behaviours “in vivo” to evaluate if the session was successful. In addition, gradual exposure is difficult to accomplish and the therapist cannot serve either as model or as helping hand, should an exposure task seem too difficult.

Current state-of-the art solutions

Virtual Reality Exposure Therapy

To overcome the abovementioned difficulties in creating anxiety-evoking, highly controllable, and replicable situations in real life, immersive virtual reality (IVR) technology and VRET have gained considerable attention in research and assessment of anxiety disorders over the last decades (34). As an analogue to in vivo exposure, VRET to threat-evoking stimuli has proven to be an equally effective way of provoking (reflexive) threat responses in close-to-real situations and initiating habituation as a prerequisite of treatment (35). However, the difficulty of assessing and addressing the complex pattern of SAD symptoms in VRET is reflected in a relative scarcity of studies in the field. Two meta-analyses, one by Opriş et al. (36) and a more recent one by Kampmann and colleagues (37), explored the efficacy of technology-assisted interventions for SAD since 1985. They list only a handful of high-quality studies on VRET in SAD. These studies yield generally positive results. In a study by Klinger and colleagues (38), patients participated in virtual conversations in a meeting room and at a dinner table, were scrutinized by, and needed to assert themselves against virtual agents. This treatment was found to be as effective as group CBT. Wallach and colleagues (39) found effects comparable to CBT in a public speaking task in a VR scenario while at the same time achieving smaller dropout rates. Similarly, Anderson and colleagues (40) also report significant improvement in a public speaking task in VR and no difference between virtual and in vivo exposure therapy. Moreover, they found the effect to be stable throughout a period of 1 year after treatment. Finally, Kampmann et al. (37) found VRET treatment effects by exposing patients to virtual speech situations, small-talk with strangers, job interviews, dining in a restaurant, having a blind date, or returning bought products to a shop. Most interestingly in this study, the therapist could adjust the number, gender, and gestures of avatars, the friendliness, and, to a certain degree, the content of the semi-structured dialogs depending on the patients’ needs, anxiety, and treatment progress. Yet, treatment evaluation was heavily based on subjective self-report questionnaires and improvements in speech performance as the only behavioural measure. While the studies conducted so far document a promising future for VR applications in SAD treatment, they fall short of exploiting its full potential. Besides often being unpretentious in their audio-visual quality, they put, except for Kampmann’s study, a strong emphasis on the fear of public speaking, largely neglecting the complex structure of other contexts in which SAD symptoms occur. More importantly, they fail to properly address the dissimilarities that exist between SAD patients with regard to attentional and behavioural indices. Therefore, it is advised to establish a VR treatment program that covers the complex pattern of SAD symptoms while allowing for patient-specific adaptations.

The Potential of AR as a Complementary or an Alternative Tool to VR

Although numerous virtual scenarios exist for treatment purposes, they are generally not individually tailored and very few utilize behavioural measures available through VR technology such as amount of mimicry (41), interpersonal distance and movement speed (42), or gaze direction (43), nor combine them with physiological measures (44). In addition, AR devices would allow therapists to get a first-hand impression of their patient’s behaviour in a social situation in order to individually tailor the VRET scenarios. AR could also incorporate physiological data from, e.g., smartwatches to analyse indices of anxiety in a situation, hint on friendly faces in a crowd via emotion recognition, and provide helpful sentences to the patient (invisible for others) should he/she be stuck in a conversation. Taken together, utilizing IVR or AR means a great step forward in investigating the interplay between self-report, actual behaviour, and physiology in social anxiety as well as their response to different aspects of the treatment or their predictive value for treatment outcome. While circumventing the methodological difficulties of assessment in real life, IVR allows investigating differences in behaviour such as gaze direction, gaze duration, distance from, and movement speed toward audience/interlocutor, psycho-physiology, and voice properties between high and low SAs, in a high ecologically valid setting. In addition, e.g., wearable AR glasses could be used to assess actual social behaviour of SADs in real social contexts. But they could also provide cues for positive social interaction via emotion


recognition and help out in conversations or provide soothing comments, should the heartbeat registered by peripheral devices indicate anxiety. In sum, such an integrative approach will yield the potentially most reliable source of assessment for all possible indices of social anxiety, will fill the gaps in our comprehension of their correlations and of SAD, and will lead to better treatment in the future.

Application of Wearable AR Glasses in Mental Health

Similar to VR, AR has also been employed in mental disorders for the last two decades, however, on a much smaller scale due to AR’s complexity and limitations (45). With the more recent emergence of modern AR wearable glasses, such as Google Glass™, and the maturation of this technology, AR started becoming a serious potential tool for the treatment of various mental disorders. AR has been employed as assistive technology for social interaction by assisting users to identify/remember people and acquire more information about these people (46, 47). Swan and colleagues (48) proposed a system for brain training using wearable AR glasses. AR has also been used in the treatment of small animal phobia (49, 50) with some promising results. McNaney and colleagues investigated the use of Google Glass™ wearable glasses as an assistive everyday device for people suffering from Parkinson’s disease (51). Wearable AR glasses have also recently been utilized as a tool for supporting and teaching children suffering from autism spectrum disorder to recognize emotions and social signals, in Stanford’s Autism Glass project (52, 53) and as commercial products of the start-up company Brain Power (54). Although AR has been used in exposure therapy, to date, no serious attempts have been published to use AR in SAD treatment, though its potential use has been speculated upon (45). However, early attempts at simulating patients in other fields of health sciences using wearable AR glasses can be taken as a demonstration of its feasibility (55).

Proposed Solution

Enhancing CBT with VRET and Augmented Reality Exposure Therapy (ARET)

To address the challenges and limitations of contemporary SAD exposure therapy approaches as described above, DJINNI is proposed as a software and hardware solution that integrates AR, VR, and various sensing technologies to support SAD patients’ CBT treatment. In a nutshell, DJINNI would offer i) an ARET that would compensate for the absence of the therapist during the in vivo exposure experiences and would automatically interpret various events occurring during these experiences and guide and support the patient; and ii) a VRET that would simulate exposure experiences in a safe 3D environment while incorporating the interaction and behavioural data collected from the ARET experiences. The goals and the scenarios of both ARET and VRET will be influenced by the CBT. The experiences and progress statistics collected by both ARET and VRET can influence the course of the whole SAD treatment (Figure 1).

Augmented Reality Exposure Therapy

In the ARET situation, DJINNI will be experienced as an intelligent assistant that guides and supports the patient during in vivo real-life exposure experiences. As illustrated in Figure 2, DJINNI will be experienced by the patient through her/his wearable AR glasses as a system that “understands” the patient’s environment, interprets the state of the people in the environment (location, activity, conversational state, emotions, etc.), interprets the social context, assesses the state of the patient/user (location, activity, conversational state, emotions, etc.), and provides support and advice during the in vivo exposure experience in the form of cues, advice, and soothing comments. As illustrated in Figure 2, the system’s interaction with the patient follows a gamification approach (56) where progress is measured in scores/rewards and the patient is encouraged to progress in her/his exposure treatment. The gamification approach will be partly based on previous work where techniques for visualization of rewards and progress were employed to motivate eating disorder patients (57-59). Let us say we have a patient suffering from anxiety about being in crowded places with a high probability of scrutiny by others. For the in vivo exposure therapy, the patient will be equipped with wearable AR glasses, a smartphone, and a wristband with physiological sensors and will be sent to, for example, a post office. For such a system to function properly in an unpredictable environment, it needs to be able to reliably detect and interpret various events occurring in the environment (e.g., spatial information; people in the environment, their roles, and their current actions/behaviours; the patient, her/his actions, and her/his physiological state, etc.). With the help of the sensors the patient is connected to and/or sensors installed in the environment, and based on the predefined information entered by the therapist about the environment, the system should be able to capture and detect various events reliably if we acknowledge some constraints and limitations, as will be discussed later. In practice, the system will understand that the persons behind the counter, or persons with a specific shirt colour and name tags, are post office employees. The other persons are clients also waiting in the queue. Upon entering the post office, the system should monitor the physiological state of the patient and how she/he behaves in the environment. When the patient hesitates, the system can display some instructional or motivational messages on the AR glasses on what to do (e.g., DJINNI could say: “You are in the post office. Please stand in the queue and wait for your turn”). When the patient gets stressed, the system can attempt to provide messages to calm down the patient (e.g., “Don’t panic. Everything is ok. Please keep standing in the queue”). By detecting the facial expressions and the gaze of others in the environment, the system would be able to have a better understanding of the situation (e.g., “The employee just greeted you and smiles at you. Try to smile back”). The system will also be able to detect if anxiety has caused the patient to perform inappropriate behaviours such as jumping the queue, anxiously looking people right in the eyes, or especially avoiding eye contact (e.g., “Relax. No reason to panic. Don’t stare at her/Try to look into her eyes once in a while,” or “Call a friend on the phone to feel better”).


Virtual Reality Exposure Therapy

The DJINNI VRET system will complement the ARET system by allowing the patient to re-experience similar exposure events in the safe 3D VR environment. The novelty in DJINNI’s VRET system in comparison with traditional VRET solutions lies in its incorporation of data collected by the ARET system, which makes it possible to automatically generate and adapt VR elements to simulate the events similarly to how they occurred in the in vivo ARET experience. Thus, it gives the patient a platform to, first, re-experience (replay) previous experiences recorded by the ARET system. Consequently, they can learn to objectively analyse/evaluate others’ and their own behaviour to prevent negative rumination/post-event processing. Finally, they eventually learn how to deal with these experiences in the safe environment of VR as the ARET feeds right into the VRET platform. Similar to the ARET system, the VRET system will also follow a gamification approach to motivate the patient during her/his therapy by feeding back the actual achievements but also the progress that is already accomplished. Considering the “post-office scenario” as described in Section “Augmented Reality Exposure Therapy,” DJINNI’s VRET system would utilize the data related to the events and the people in the post office environment to automatically adapt the VRET virtual world by simulating a similar crowd density in the post office, the specific events that occurred (various people starting to look at the patient, someone in the queue smiling at the patient, etc.), the timing and duration of these events, etc.
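A minimal sketch of how such an adaptation could work is given below: an assumed ARET event log (timestamps, event kinds, and free-form payloads) is turned into a simple VR scene description that replays the events at their original times. The log format and all field names are illustrative assumptions, not the actual DJINNI data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoggedEvent:
    t: float        # seconds since the start of the in vivo exposure
    kind: str       # e.g. "person_gazed_at_patient", "person_smiled"
    payload: Dict   # extra data produced by the perception/interpretation components


def build_vret_scenario(events: List[LoggedEvent], people_count: int) -> Dict:
    """Derive a simple VRET scene description from one in vivo ARET session."""
    return {
        "crowd_density": people_count,   # simulate a similar crowd in the virtual post office
        "scheduled_events": [            # replay each logged event at its original time
            {"at": e.t, "kind": e.kind, "data": e.payload}
            for e in sorted(events, key=lambda e: e.t)
        ],
    }
```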


Technical Specifications of DJINNI

The DJINNI solution consists of two separate systems that complement each other and exchange data with each other to improve the treatment. As illustrated in Figure 3, DJINNI’s ARET system will rely on several intelligent software components that together determine the functioning and behaviour of the system. In general, DJINNI’s ARET platform consists of the following (see also the sketch after these component lists):

• Wearable AR glasses for displaying DJINNI’s messages to the user and capturing the events in the environment through the built-in camera.
• Smartphone for giving the patient a familiar and safe interface to interact with the system.
• The components environment events tracking, patient behaviour interpretation, and people behaviour analysis analyse data acquired by the various perception components, the predefined environment information, and workflows, and interpret the occurring event, the patient’s current behaviour, or the behaviour of other people in the environment, respectively. The perception components would detect what they perceive (e.g., a woman facing the patient is smiling), while the interpretation components would “understand” what it means in the current situation (e.g., the post office employee just greeted the patient with a smile).
• The workflow engine represents the core component of the ARET system and generates the supportive and guiding behaviours of the system by executing the predefined workflows during the exposure experience.
• Progress tracking component tracks the progress of the patient and communicates it to the therapist based on predefined objectives set by the therapist. This component is also responsible for the gamification-based reinforcement/motivation.
• Event tracking and logging logs all the events that occur during an exposure experience, which can be used to generate the VRET scenarios.
• Vision-based analysis represents all the components responsible for analysing the video captured by the camera of the wearable AR glasses.
• Audio-based analysis represents all the components responsible for analysing the audio, detected speech, and environmental sounds.
• Physiology-based analysis represents the components that measure the patient’s physiological signals through the wristband and other sensors.
• Indoor and outdoor localization represent components for determining the geographical location of the patient. Implementation of indoor localization may require equipping popular exposure therapy venues in the city (e.g., post office, supermarket) with localization sensors that help the system in determining the exact location of the patient in an indoor environment.
• Scenario workflows are predefined workflows/programs defining, in a detailed way, how the system should behave in each situation of a specific exposure scenario/environment. Scenario workflows should be considered as complex but general definitions created for all patients, while individualised customizations are simple adaptations of these general workflows that take a specific patient’s needs into account.

As is the case with the ARET system, the VRET system as illustrated in Figure 4 will also rely on several intelligent software components that together determine the functioning and behaviour of the system. However, all the components that analyse and interpret the in vivo situation events are excluded, as these events are simulated rather than captured in the VR environment. Next to the components already described above for the ARET system, the VRET system includes the following components:

• Environment simulation component simulates the environment and the events in the environment based on predefined data as well as on data collected by the ARET system.
• People/agents simulation component simulates the behaviour of the people in the scenario based on the predefined workflows as well as on the data captured by the ARET system.
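The component boundaries listed above can be made concrete with a minimal interface sketch. The class and method names below are illustrative assumptions rather than the actual DJINNI implementation; Python is used only for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class PerceptionComponent(ABC):
    """Vision-, audio-, physiology- or location-based analysis: reports *what* is perceived,
    e.g. {"kind": "face", "expression": "happiness", "facing_patient": True}."""

    @abstractmethod
    def sense(self) -> List[Dict]:
        ...


class InterpretationComponent(ABC):
    """Combines raw percepts with predefined environment information to state what they
    *mean* in the current situation, e.g. "the employee greeted the patient"."""

    @abstractmethod
    def interpret(self, percepts: List[Dict], environment_info: Dict) -> List[Dict]:
        ...


class EventLogger:
    """Event tracking and logging: stores interpreted events so that the VRET system
    can later regenerate similar scenarios from them."""

    def __init__(self) -> None:
        self.events: List[Dict] = []

    def log(self, event: Dict) -> None:
        self.events.append(event)
```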


Workflows and Interpretations

DJINNI’s scenario workflows will define step by step what actions the system should take at each possible step/situation of a specific exposure experience. Based on previous work on interactive virtual environments (60), the workflows will be implemented in artificial intelligence condition–action rules defining what actions to take given certain derived or predefined data. Table 1 illustrates the types of data that will be derived by the system or predefined by the therapist/programmer that will serve as conditions for determining the system’s actions.
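As a rough illustration of such condition-action rules, the sketch below encodes a few steps of the post office scenario as (condition, action) pairs over a dictionary of derived and predefined data. The field names, thresholds, and messages are assumptions made for this example only, not the published workflow definitions.

```python
# Illustrative condition-action rules for the post office scenario (assumed data fields).
POST_OFFICE_RULES = [
    # condition over the derived/predefined data -> message shown on the AR glasses
    (lambda d: d.get("patient_location") == "entrance" and d.get("patient_hesitating", False),
     "You are in the post office. Please stand in the queue and wait for your turn."),
    (lambda d: d.get("stress_level", 0.0) > 0.7,
     "Don't panic. Everything is ok. Please keep standing in the queue."),
    (lambda d: d.get("employee_smiling", False) and d.get("employee_facing_patient", False),
     "The employee just greeted you and smiles at you. Try to smile back."),
]


def fire_rules(rules, derived_data):
    """Return the messages whose conditions hold for the current situation."""
    return [message for condition, message in rules if condition(derived_data)]


# Example: fire_rules(POST_OFFICE_RULES, {"stress_level": 0.8}) returns the calming message.
```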

Technology Feasibility

The development of the DJINNI solution would require the integration of various technologies, some of which are mature and others experimental. The following subsections will review these technologies, determine their current level of development, and provide recommendations on how these technologies should be integrated and utilized in the DJINNI solution.

Wearable AR Glasses

Although the idea of having wearable AR glasses as a communication interface for guiding and supporting the patient is innovative, it can also be seen as a pitfall. The currently discontinued Google Glass™ wearable glasses (61) have long been seen as the standard in wearable AR glasses. However, due to their cyborg-style design, they would not be suitable for patients suffering from SAD, as wearing them would attract undesirable attention. However, recent years have also seen the development of various new wearable AR glasses that may attract less undesirable attention than Google Glass™. Currently available wearable AR glasses such as Laforge Shima™ (62) and Vuzix VidWear™ B3000 (63) will enable DJINNI to exploit the advantages of AR glasses without the negative impact risked if the technology is visible. Furthermore, the emerging waveguide optical technologies [e.g., Trulife Optics™ (64) and Dispelix™ (65)], which are also used in the Laforge and Vuzix glasses, will enable many wearable AR glasses manufacturers to produce more fashionable and less attention-attracting glasses in the near future.


Consequence for DJINNI

For the first prototype of DJINNI, it is the intention to employ Laforge Shima™ or Vuzix VidWear™ B3000 AR glasses.


Audio- and Vision-Based Perception and Interpretation Technologies

Although human perceptual capabilities are quite advanced and we are able to easily distinguish between the different objects in our field of view and to understand the characteristics and affordances of these objects, for a computer system it is currently impossible to reliably detect and recognize all the objects in its view, let alone derive their characteristics and affordances. However, for each type of object, specific algorithms have been developed, and some have started to reach a sufficient level of maturity (66). Using machine learning techniques, computers can emulate human cognitive abilities such as sound detection, speech recognition, image recognition, emotion recognition, and behaviour analysis.


Speech Recognition

Although speech recognition has seen significant success in research using techniques such as hidden Markov models (67), practical experiments have shown that currently available speech technologies are still not reliable enough for natural human–computer interaction, as most algorithms are still too sensitive to noise and accents (68). In addition, it appears that the most reliable speech recognition is available for the English language (60).

Consequence for DJINNI

Because of this unreliability, speech recognition technologies will only be used in the DJINNI VRET system, and only if really needed. Use of speech recognition in public places will be avoided.

Object Recognition/Tracking

Object recognition/tracking has seen some progress in the past decades due to innovations in learning algorithms, image processing techniques, and feature extraction (69, 70). However, most object recognition and tracking algorithms only work well with objects that are near the camera view or objects that are large enough (represented by a large number of pixels in the image).

Consequence for DJINNI

As the camera is part of the wearable glasses, object recognition/tracking algorithms will only be able to detect objects that are in the view of the camera and close to the patient. Small objects and objects far away will not be detected by the system. Therefore, use of object recognition and tracking should be limited in the scenario workflows to large or clearly visible objects and only at certain states of the workflows (e.g., for detecting the post office counter).
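A minimal sketch of this restriction is shown below, assuming detections arrive as label/bounding-box dictionaries; the field names and the size threshold are illustrative assumptions.

```python
def filter_detections(detections, expected_labels, frame_area, min_area_fraction=0.05):
    """Keep only detections that are large (close to the wearable camera) and expected
    in the current workflow state. detections: dicts with 'label' and 'bbox' = (x, y, w, h)."""
    kept = []
    for det in detections:
        _, _, w, h = det["bbox"]
        large_enough = (w * h) / frame_area >= min_area_fraction
        if large_enough and det["label"] in expected_labels:
            kept.append(det)
    return kept


# e.g. while waiting in the queue, only look for the counter:
# filter_detections(dets, expected_labels={"post_office_counter"}, frame_area=1280 * 720)
```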

Facial Expression Recognition

Facial expression recognition research has been ongoing for some time and has reached a convincing level of maturity. For instance, the facial action coding system, originally designed to allow human users to objectively describe facial states (71), can now be automatically processed and classified into prototypical emotional expressions using


computer algorithms (72). Various research and commercial software prototypes have been designed and have reached a sufficient level of recognition.

Consequence for DJINNI

When speaking about emotional expressions, one has to be aware that there is no common definition of what emotions are (73) and workflows should be limited to utilizing prototypical Ekmanian emotions—the basic emotions anger, fear, sadness, disgust, surprise, and happiness, which are believed to be the most universal/ stable across cultures and contexts (74). In addition, emotion detection should only occur at states of the workflows where emotions are expected to be detected and relevant. Limiting the scenario will increase the reliability. Furthermore, the system should also be able to differentiate whether someone is talking to the patient or not.
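A minimal sketch of such gating is shown below; the workflow state names, the confidence threshold, and the classifier output format are assumptions made for illustration, not part of a real DJINNI API.

```python
BASIC_EMOTIONS = {"anger", "fear", "sadness", "disgust", "surprise", "happiness"}
EMOTION_RELEVANT_STATES = {"at_counter", "being_addressed"}   # assumed workflow states


def gated_emotion(raw_prediction, workflow_state, is_addressing_patient):
    """Return an emotion label only when it is expected to be reliable and relevant."""
    if workflow_state not in EMOTION_RELEVANT_STATES:
        return None
    if not is_addressing_patient:          # ignore expressions of bystanders
        return None
    label, confidence = raw_prediction     # e.g. ("happiness", 0.83) from an external classifier
    if label in BASIC_EMOTIONS and confidence >= 0.6:
        return label
    return None
```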

Sound Detection

Sound detection algorithms have reached a very convincing level of maturity in the last years. Reliable commercial and open-source products that have been released can reliably distinguish between a large variety of sounds [e.g., ROAR™ (75) and Audio Analytics™ Ltd (76)].

Consequence for DJINNI

One of the reliable sound detection products can be used to detect environmental sounds that are relevant to identify certain social environments.

VR Technologies

The last years saw a rapid emergence of high-quality and affordable commercial VR head-mounted devices [e.g., HTC Vive™ (77), Oculus Rift™ (78), Sony PlayStation VR™ (79), Samsung Gear VR™ (80), and Microsoft HoloLens™ (81)].

Consequence for DJINNI

All the existing products have reached sufficient quality and can be directly deployed for DJINNI VRET system.


Physiological and Affective State Interpretation Technologies

Over the last years, various fitness wristbands and chest straps have been developed and employed to track one’s fitness performance by integrating peripheral sensors for tracking of motion and heart rate [Mio LINK™ (82), Fitbit Charge HR™ (83), and Garmin Soft Strap Premium HR Monitor™ (84)]. Information derived from these affordable fitness trackers can be used to extract some information about the patient’s physiological state. The more professional wristband Empatica E4™ (85) has also been gaining prominence in research in recent years due to its reliability and the number of sensors it embeds. Detected sensor signals and signal changes can be directly incorporated into the workflows and can lead to system actions. Furthermore, research has also shown that heart rate variability and galvanic skin response can be used as good indicators of stress (86, 87). However, interpretation of physiological signals is also prone to errors due to noise and to other possible causes of the same physiological changes.

Consequence for DJINNI

As indicated for other components, it is possible to reach a certain level of reliability if interpretation is limited to certain scenarios. When a change in physiological signals is preceded by events that can cause stress, there is a high chance that the patient feels stressed. By considering these indicators, DJINNI can reach a good level of reliability.
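The sketch below illustrates this kind of event-gated interpretation: a physiological change only counts as stress when a plausibly stressful event occurred shortly before it. The thresholds, event names, and time window are illustrative assumptions.

```python
def stress_detected(heart_rate, baseline_hr, skin_conductance, baseline_sc,
                    recent_events, now, window_s=30.0):
    """Flag stress only when a physiological change follows a plausibly stressful event."""
    physio_change = (heart_rate > 1.15 * baseline_hr) or (skin_conductance > 1.3 * baseline_sc)
    if not physio_change:
        return False
    stressful = {"queue_jumped_ahead", "stranger_gazed_at_patient", "employee_asked_question"}
    # recent_events: dicts with a kind and a timestamp "t" (same clock as `now`)
    return any(e["kind"] in stressful and (now - e["t"]) <= window_s for e in recent_events)
```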

Indoor/Outdoor Localization

Localization refers to locating an object or oneself, either inside a building (indoor localization) or outside (outdoor localization). Various technologies for outdoor localization already exist; however, GPS, sometimes complemented by WiFi router-based localization, has been the most reliable so far. Indoor localization is enabled by various technologies (e.g., iBeacons, RFID, sound, infrared, ultra-wideband, etc.) (88). However, the most accurate solutions for indoor localization are the ones that combine various technologies rather than using only one (89, 90). Several mature commercial systems exist that also employ various technologies [e.g., IndoorAtlas™ (91) and AccuWare™ (92)].

Consequence for DJINNI

For DJINNI, it is also intended to combine various technologies for indoor localization and create a hybrid system based on previous work (93). However, indoor localization is not the most crucial component of the DJINNI solution. Although the example of the post office may require some indoor localization, many other exposure scenarios do not (e.g., public speaking anxiety, dating anxiety, etc.). For several scenarios where indoor localization is needed, local health institutes can install localization devices in some public buildings if these buildings can be used for exposure therapy (e.g., iBeacons).
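A minimal sketch of such a hybrid fallback is given below: when nearby localization beacons are detected, the strongest one determines the indoor position; otherwise the system falls back to GPS (optionally refined by WiFi). Beacon identifiers, positions, and the signal-strength threshold are assumptions made for illustration.

```python
def estimate_position(gps_fix, beacon_readings, beacon_positions, rssi_threshold=-80):
    """beacon_readings: list of (beacon_id, rssi); beacon_positions: id -> (x, y, venue)."""
    usable = [(bid, rssi) for bid, rssi in beacon_readings
              if rssi >= rssi_threshold and bid in beacon_positions]
    if usable:
        # indoors: take the position of the strongest nearby beacon (e.g. an iBeacon)
        best_id, _ = max(usable, key=lambda reading: reading[1])
        return {"source": "indoor_beacon", "position": beacon_positions[best_id]}
    # outdoors, or no beacons installed: fall back to the GPS fix
    return {"source": "gps", "position": gps_fix}
```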

Validation Study

As shown above, VR and AR are certainly key to improving current exposure therapies. To validate DJINNI and its ARET and VRET components, the following validation study needs to be conducted. The study aims to evaluate the impact that the new technology has on therapeutic progress in a population with, e.g., SAD. The proposed design consists of three consecutive steps.

Diagnostic Phase and Anxiety Evaluation

Naturally, the first step consists of thorough diagnostics of patients, by trained psychologist/therapists, by means of (structured) clinical interviews [e.g., Mini-International Neuropsychiatric Interview or Structured Clinical Interview for DSM-5 (95)] and trait questionnaires [e.g., Social Phobia Scale (96, 97) or Fear of Negative Evaluation Scale (98)]. After the diagnostics, the therapist should repeatedly assess subjective reports of state anxiety in anxiety evoking situations by traditional scales [e.g., State Anxiety Inventory (99), Subjective Units of Discomfort Scale (100), or visual analogue scale (101)] and/or its physiological equivalents (e.g., heart rate, skin conductance). These parameters at baseline should be assessed again in the same (formerly) anxiety evoking situations at the end of the treatment (after 12 weeks) and at follow-up (3 months after treatment) to eventually analyse treatment effects.
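As a small illustration of this assessment schedule, the sketch below computes change scores relative to baseline at the end of treatment (12 weeks) and at follow-up (3 months after treatment). The data structure and the measure (e.g., a mean state-anxiety rating per time point) are assumptions for illustration only.

```python
ASSESSMENT_POINTS = ["baseline", "post_treatment_12w", "follow_up_3m"]


def change_scores(patient_scores):
    """patient_scores: dict mapping time point -> mean state-anxiety rating in the
    (formerly) anxiety-evoking situation; returns change relative to baseline."""
    baseline = patient_scores["baseline"]
    return {point: patient_scores[point] - baseline
            for point in ASSESSMENT_POINTS[1:] if point in patient_scores}


# e.g. change_scores({"baseline": 75, "post_treatment_12w": 40, "follow_up_3m": 45})
# -> {"post_treatment_12w": -35, "follow_up_3m": -30}
```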

ARET/VRET/Traditional Exposure Therapies

Assuming that traditional CBT as well as VRET are effective treatments for SAD (37, 102), the experimental design aims to evaluate the added value of adaptive VRET and ARET. Accordingly, six groups of patients with a principal diagnosis of SAD would be necessary to combine all possible exposure treatments (see Table 2) with ARET. Based on this semi-experimental design, short-term and long-term changes in subjectively experienced anxiety, as well as behavioural and physiological indices, can be used to evaluate the added value of (adaptive) VR and AR in exposure therapy. Furthermore, it will also help to assess the importance of technology-based individual tailoring/customization of treatment and to evaluate the role of positive or negative feedback on the patient’s behaviour.

Conclusion and Discussion

The present article presented a conceptual design of DJINNI, a system that combines AR and VR to provide more effective exposure therapy solutions for patients suffering from SAD. Due to the limited effect sizes of traditional exposure therapies, the heterogeneity of patients’ difficulties in social interaction, and the difficulty of personalization in current state-of-the-art VRET, an approach that exploits the benefits of wearable AR glasses is desperately needed. Yet, the use of AR technology as well as various experimental technologies for environment analysis should not lead to the development of an unreliable system that will not be beneficial or even suitable for use by SAD patients in the end. However, by taking the limitations of various technologies carefully into consideration and incorporating various contextual information regarding the place, situation, and phases of the interaction when developing the solution, it would be possible to achieve a reliable level of accuracy with the proposed technologies. A thorough evaluation of various technologies is needed before integration into analytical and therapeutic workflows. On that basis, the paper has proposed technologies that have reached a sufficient level of reliability to be considered mature enough for building an effective ARET system. Another innovation of DJINNI is the adaptive nature of its VRET system. By incorporating data collected from in vivo exposure experiences or even everyday social encounters, it is possible to improve and personalize VRET scenarios. Data collected during anxiety-evoking events, as well as the behaviour of the people in the patients’ environment, can be used to inform the behaviour of the virtual environment and the people in it. It is expected that this solution can be more effective than traditional VRET solutions as it is very personalized: It simulates events that have been recently experienced by the patient in in vivo situations, and, yet, can take place in a “safe” therapy setting. The final deployment of DJINNI would require the development of detailed workflows that define how the system should behave in ARET and VRET situations. This may require considerable work as each possible situation that may occur and each possible action to take by the system need to be defined in the workflows. However, by distinguishing between the general workflows and the individualized customizations, personalizing the system for each patient should not be too time consuming. Ideally, by providing advanced, high-end technical support solutions, DJINNI could substantially improve the efficacy of CBT (for SAD) and thereby alleviate the individuals’ suffering and societal burden considerably.

References

1. Deschamps J-M, Ossard C, Jeunet J-P. Le fabuleux destin d’Amélie Poulain. France: UGC-Fox Distribution (2001).

2. Rizzo AA, Schultheis M, Kerns KA, Mateer C. Analysis of assets for virtual reality applications in neuropsychology. Neuropsychol Rehabil (2004) 14:207–39. doi:10.1080/09602010343000183

3. Meyerbröker K, Emmelkamp PM. Virtual reality exposure therapy in anxiety disorders: a systematic review of process-and-outcome studies. Depress Anxiety (2010) 27:933–44. doi:10.1002/da.20734

4. Furmark T. Social phobia: overview of community surveys. Acta Psychiatr Scand (2002) 105:84–93. doi:10.1034/j.1600-0447.2002.1r103.x

5. Rapee RM, Spence SH. The etiology of social phobia: empirical evidence and an initial model. Clin Psychol Rev (2004) 24:737–67. doi:10.1016/ j.cpr.2004.06.004

6. Fehm L, Pelissolo A, Furmark T, Wittchen H-U. Size and burden of social phobia in Europe. Eur Neuropsychopharmacol (2005) 15:453–62. doi:10.1016/j.euroneuro.2005.04.002

7. Fehm L, Wittchen HU. Comorbidity in social anxiety disorder. In: Bandelow B, Stein DJ, editors. Social Anxiety Disorder. New York: Dekker (2004). p. 49–63.

8. Kessler RC, Berglund P, Demler O, Jin R, Merikangas KR, Walters EE. Lifetime


prevalence and age-of-onset distributions of DSM-IV disorders in the national comorbidity survey replication. Arch Gen Psychiatry (2005) 62:593–602. doi:10.1001/archpsyc.62.6.593

9. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 5th ed. Washington, DC: American Psychiatric Association (2013).

10. Acarturk C, Smit F, de Graaf R, van Straten A, ten Have M, Cuijpers P. Economic costs of social phobia: a population-based study. J Affect Disord (2009) 115:421–9. doi:10.1016/j.jad.2008.10.008

11. Heimberg RG, Stein MB, Hiripi E, Kessler RC. Trends in the prevalence of social phobia in the United States: a synthetic cohort analysis of changes over four decades. Eur Psychiatry (2000) 15(1):29–37. doi:10.1016/ S0924-9338(00)00213-3

12. Stein MB. An epidemiologic perspective on social anxiety disorder. J Clin Psychiatry (2006) 67(Suppl 12):3–8.

13. Clark DM, Wells A. A cognitive model of social phobia. In: Heimberg R, Liebowitz M, Hope D, Schneier F, editors. Social Phobia: Diagnosis, Assessment and Treatment. New York: Guilford Press (1995). p. 69–112.

14. Rapee RM, Heimberg RG. A cognitive-behavioral model of anxiety in social phobia. Behav Res Ther (1997) 35:741–56. doi:10.1016/S0005- 7967(97)00022-3

15. Huppert JD, Pasupuleti RV, Foa EB, Mathews A. Interpretation biases in social anxiety: response generation, response selection, and self-appraisals. Behav Res Ther (2007) 45:1505– 15. doi:10.1016/j.brat.2007.01.006

16. Staugaard SR. Threatening faces and social anxiety: a literature review. Clin Psychol Rev (2010) 30:669–90. doi:10.1016/j.cpr.2010.05.001

17. Baker SR, Edelmann RJ. Is social phobia related to lack of social skills? Duration of skill- related behaviours and ratings of behavioural adequacy. Br J Clin Psychol (2002) 41:243–57. doi:10.1348/014466502760379118

18. Hofmann SG. Cognitive factors that maintain social anxiety disorder: a comprehensive model and its treatment implications. Cogn Behav Ther (2007) 36(4):193–209. doi:10.1080/16506070701421313

19. Moscovitch DA, Hofmann SG. When ambiguity hurts: social standards moderate self- appraisals in generalized social phobia. Behav Res Ther (2007) 45:1039–52. doi:10.1016/j.brat.2006.07.008


20. Rapee RM, Abbott MJ. Mental representation of observable attributes in people with social phobia. J Behav Ther Exp Psychiatry (2006) 37:113–26. doi:10.1016/j.jbtep.2005.01.001

21. Gramer M, Sprintschnik E. Social anxiety and cardiovascular responses to an evaluative speaking task: the role of stressor anticipation. Pers Individ Dif (2008) 44:371–81. doi:10.1016/j.paid.2007.08.016

22. Mauss I, Wilhelm F, Gross J. Is there less to social anxiety than meets the eye? Emotion experience, expression, and bodily responding. Cogn Emot (2004) 18:631–42. doi:10.1080/02699930341000112

23. Schneider BW, Turk CL. Examining the controversy surrounding social skills in social anxiety disorder: the state of the literature. In: Weeks JW, editor. The Wiley Blackwell Handbook of Social Anxiety Disorder. Chichester, UK: John Wiley & Sons, Ltd (2014). p. 366–87. doi:10.1002/ 9781118653920.ch17

24. Lange W-G, Rinck M, Becker ES. Behavioral deviations: surface features of social anxiety and what they reveal. In: Weeks JW, editor. The Wiley Blackwell Handbook of Social Anxiety Disorder. Chichester, UK: John Wiley & Sons, Ltd (2014). p. 344–65. doi:10.1002/9781118653920.ch16

25. Lang PJ. A bio-informational theory of emotional imagery. Psychophysiology (1979) 16:495–512. doi:10.1111/j.1469-8986.1979.tb01511.x

26. Lang PJ. The cognitive psychophysiology of emotion: fear and anxiety. In: Maser JD, Tuma AH, editors. Anxiety and the Anxiety Disorders. Hillsdale, NJ: Lawrence Erlbaum Associates (1985). p. 131–70.

27. De Houwer J. What are implicit measures and why are we using them? In: Wiers RW, Stacy AW, editors. Handbook of Implicit Cognition and Addiction. Thousand Oaks, CA: SAGE (2006). p. 11–28. doi:10.4135/9781412976237.n2

28. Strack F, Deutsch R. Reflective and impulsive determinants of social behavior. Pers Soc Psychol Rev (2004) 8:220–47. doi:10.1207/s15327957pspr0803_1

29. Frisch MB. Social-desirability responding in the assessment of social skill and anxiety. Psychol Rep (1988) 63:763–6.

30. Voncken MJ, Bögels SM. Social performance deficits in social anxiety disorder: reality during conversation and biased perception during speech. J Anxiety Disord (2008) 22:1384– 92. doi:10.1016/j.janxdis.2008.02.001


31. Bijl RV, Van Zessen G, Ravelli A. Psychiatric morbidity among adults in the Netherlands: the NEMESIS-study. II. Prevalence of psychiatric disorders. Netherlands Mental Health Survey and incidence study. Ned Tijdschr Geneeskd (1997) 141(50):2453–60.

32. Simon NM, Otto MW, Korbly NB, Peters PM, Nicolaou DC, Pollack MH. Quality of life in social anxiety disorder compared with panic disorder and the general population. Psychiatr Serv (2002) 53:714–8. doi:10.1176/appi. ps.53.6.714

33. Fehm L, Beesdo K, Jacobi F, Fiedler A. Social anxiety disorder above and below the diagnostic threshold: prevalence, comorbidity and impairment in the general population. Soc Psychiatry Psychiatr Epidemiol (2008) 43:257–65. doi:10.1007/s00127-007-0299-4

34. Bush J. Viability of virtual reality exposure therapy as a treatment alternative. Comput Human Behav (2008) 24:1032–40. doi:10.1016/j.chb.2007. 03.006

35. Powers MB, Emmelkamp PM. Virtual reality exposure therapy for anxiety disorders: a meta-analysis. J Anxiety Disord (2008) 22:561–9. doi:10.1016/ j.janxdis.2007.04.006

36. Opriş D, Pintea S, García-Palacios A, Botella C, Szamosközi Ş, David D. Virtual reality exposure therapy in anxiety disorders: a quantitative meta-analysis. Depress Anxiety (2012) 29:85–93. doi:10.1002/da.20910

37. Kampmann IL, Emmelkamp PM, Morina N. Meta-analysis of technology-assisted interventions for social anxiety disorder. J Anxiety Disord (2016) 42:71–84. doi:10.1016/j.janxdis.2016.06.007

38. Klinger E, Bouchard S, Légeron P, Roy S, Lauer F, Chemin I, et al. Virtual reality therapy versus cognitive behavior therapy for social phobia: a preliminary controlled study. Cyberpsychol Behav (2005) 8:76–89. doi:10.1089/ cpb.2005.8.76

39. Wallach HS, Safir MP, Bar-Zvi M. Virtual reality cognitive behavior therapy for public speaking anxiety: a randomized clinical trial. Behav Modif (2009) 33:314–38. doi:10.1177/0145445509331926

40. Anderson PL, Price M, Edwards SM, Obasaju MA, Schmertz SK, Zimand E, et al. Virtual reality exposure therapy for social anxiety disorder: a randomized controlled trial. J Consult Clin Psychol (2013) 81:751–60. doi:10.1037/a0033559

41. Vrijsen JN, Lange W-G, Becker ES, Rinck M. Socially anxious individuals lack unintentional mimicry. Behav Res Ther (2010) 48:561–4. doi:10.1016/ j.brat.2010.02.004

42. Rinck M, Rörtgen T, Lange W-G, Dotsch R, Wigboldus DH, Becker ES. Social anxiety


predicts avoidance behaviour in virtual encounters. Cogn Emot (2010) 24:1269–76. doi:10.1080/02699930903309268

43. Wieser MJ, Pauli P, Grosseibl M, Molzow I, Mühlberger A. Virtual social interactions in social anxiety-the impact of sex, gaze, and interpersonal distance. Cyberpsychol Behav Soc Netw (2010) 13:547–54. doi:10.1089/ cyber.2009.0432

44. Wiederhold BK, Wiederhold MD. Virtual Reality Therapy for Anxiety Disorders: Advances in Evaluation and Treatment. 1st ed. Washington, DC: American Psychological Association (2005)

45. Baus O, Bouchard S. Moving from virtual reality exposure-based therapy to augmented reality exposure-based therapy: a review. Front Hum Neurosci (2014) 8:112. doi:10.3389/fnhum.2014.00112

46. Utsumi Y, Kato Y, Kunze K, Iwamura M, Kise K. Who are you? A wearable face recognition system to support human memory. Proceedings of the 4th Augmented Human International Conference. New York, NY: ACM Press (2013). p. 150–3.

47. Mandal B, Chia S-C, Li L, Chandrasekhar V, Tan C, Lim J-H. A wearable face recognition system on Google glass for assisting social interactions. In: Jawahar CV, Shan S, editors. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Cham: Springer International Publishing (2014). p. 419–33. doi:10.1007/978-3-319-16634-6_31

48. Swan M, Kido T, Ruckenstein M. BRAINY–multi-modal brain training app for Google glass: cognitive enhancement, wearable computing, and the Internet-of-Things extend personal data analytics. Workshop on Personal Data Analytics in the Internet of Things 40th International Conference on Very Large Databases. Hangzhou, China (2014).

49. Botella C, Breton-López J, Quero S, Baños RM, García-Palacios A, Zaragoza I, et al. Treating cockroach phobia using a serious game on a mobile phone and augmented reality exposure: a single case study. Comput Human Behav (2011) 27:217–27. doi:10.1016/j.chb.2010.07.043

50. Michaliszyn D, Marchand A, Bouchard S, Martel MO, Poirier-Bisson J. A randomized, controlled clinical trial of in virtuo and in vivo exposure for spider phobia. Cyberpsychol Behav Soc Netw (2010) 13:689–95. doi:10.1089/ cyber.2009.0277

51. McNaney R, Vines J, Roggen D, Balaam M, Zhang P, Poliakov I, et al. Exploring the acceptability of Google glass as an everyday assistive device for people with Parkinson’s.


Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. New York, NY: ACM Press (2014). p. 2551–4.

52. Voss C, Winograd T, Wall D, Washington P, Haber N, Kline A, et al. Superpower glass: delivering unobstructive real-time social cues in wearable systems. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing Adjunct. New York, NY: ACM Press (2016). p. 1218–26.

53. Washington P, Voss C, Haber N, Tanaka S, Daniels J, Feinstein C, et al. A wearable social interaction aid for children with autism. Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. New York, NY: ACM Press (2016). p. 2348–54.

54. Official Web Page of Brain Power LCC. (2016). Available from: http://www.brain-power.com/

55. Chaballout B, Molloy M, Vaughn J, Brisson Iii R, Shaw R. Feasibility of augmented reality in clinical simulations: using Google glass with manikins. JMIR Med Educ (2016) 2:e2. doi:10.2196/mededu.5159

56. Reeves B, Read JL. Total Engagement: How Games and Virtual Worlds Are Changing the Way People Work and Businesses Compete. Boston, MA: Harvard Business Review Press (2009).

57. Fernández-Aranda F, Jiménez-Murcia S, Santamaría JJ, Gunnard K, Soto A, Kalapanidas E, et al. Video games as a complementary therapy tool in mental disorders: Playmancer, a European multicentre study. J Ment Health (2012) 21:364–74. doi:10.3109/09638237.2012.664302

58. Fagundo AB, Santamaría JJ, Forcano L, Giner-Bartolomé C, Jiménez-Murcia S, Sánchez I, et al. Video game therapy for emotional regulation and impulsivity control in a series of treated cases with bulimia nervosa. Eur Eat Disord Rev (2013) 21:493–9. doi:10.1002/erv.2259

59. Fernandez-Aranda F, Jimenez-Murcia S, Santamaría JJ, Giner-Bartolomé C, Mestre- Bach G, Granero R, et al. The use of videogames as complementary therapeutic tool for cognitive behavioral therapy in bulimia nervosa patients. Cyberpsychol Behav Soc Netw (2015) 18:1–7. doi:10.1089/cyber. 2015.0265

60. Tsiourti C, Ben-Moussa M, Quintas J, Loke B, Jochem I, Albuquerque Lopes J, et al. A virtual assistive companion for older adults: design implications for a real-world application. SAI Intelligent Systems Conference. London: IntelliSys (2016).

61. Official Web Page of Google Glass. (2016). Available from: https://www.google.com/glass/

62. Official Web Page of Laforge Shima. (2016). Available from: https://www.laforgeoptical.com/

63. Official Web Page of Vuzix VidWear B3000. (2016). Available from: https://www.vuzix.com/Products/Series-3000-Smart-Glasses

64. Official Web Page of Trulife Optics. (2016). Available from: https://www.trulifeoptics.com/

65. Official Web Page of Dispelix. (2016). Available from: http://www.dispelix.com/

66. Ben-Moussa M, Fanourakis MA, Tsiourti C. How learning capabilities can make care robots more engaging? RoMan 2014 Workshop on Interactive Robots for Aging and/or Impaired People. Edinburgh, UK (2014).

67. Jurafsky D, Martin JH. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Pearson Prentice Hall (2009).

68. Tsiourti C, Joly E, Ben Moussa M, Wings-Kolgen C, Wac K. Virtual assistive companion for older adults: field study and design implications. 8th International Conference on Pervasive Computing Technologies for Healthcare (Pervasive Health). Oldenburg, Germany: ACM (2014). doi:10.4108/icst.pervasivehealth.2014.254943

69. Park IK, Singhal N, Lee MH, Cho S, Kim C. Design and performance evaluation of image processing algorithms on GPUs. IEEE Trans Parallel Distrib Syst (2011) 22:91–104. doi:10.1109/TPDS.2010.115

70. Sonka M, Hlavac V, Boyle R. Image Processing, Analysis, and Machine Vision. Australia: Cengage Learning (2014).

71. Ekman P, Friesen WV, Hager JC. Facial Action Coding System. Manual and Investigator’s Guide. Salt Lake City, UT: Research Nexus (2002).

72. Pantic M, Bartlett MS. Machine analysis of facial expressions. In: Delac K, Grgic M, editors. Face Recognition. InTech (2007). p. 377–416. doi:10.5772/4847

73. Scherer KR. What are emotions? And how can they be measured? Soc Sci Inf (2005) 44:695–729. doi:10.1177/0539018405058216

74. Ekman P. An argument for basic emotions. Cogn Emot (1992) 6:169–200. doi:10.1080/02699939208411068

75. Romano JM, Brindza JP, Kuchenbecker KJ. ROS open-source audio recognizer: ROAR environmental sound detection tools for robot programming. Auton Robots (2013) 34:207–15. doi:10.1007/s10514-013-9323-6

76. Mitchell CJ. Sound Identification Systems. (2015). Available from: https://www.google.com/patents/US20150106095

77. Official Web Page of HTC Vive. (2016). Available from: http://www.vive.com/

78. Official Web Page of Oculus Rift. (2017). Available from: https://www3.oculus.com/en-us/rift/

79. Official Web Page of Sony PlayStation VR. (2016). Available from: https://www.playstation.com/en-us/explore/playstation-vr/

80. Official Web Page of Samsung Gear VR. (2016). Available from: http://www.samsung.com/global/galaxy/gear-vr/

81. Official Web Page of Microsoft HoloLens. (2016). Available from: https://www.microsoft.com/microsoft-hololens/

82. Official Web Page of Mio LINK. (2016). Available from: http://www.mioglobal.com/en-us/Mio-Link-heart-rate-wristband/Product.aspx

83. Official Web Page of Fitbit Charge HR. (2016). Available from: https://www.fitbit.com/uk/chargehr

84. Official Web Page of Garmin Soft Strap Premium HR Monitor. (2016). Available from: https://buy.garmin.com/en-US/US/shop-by-accessories/fitness-sensors/soft-strap-premium-heart-rate-monitor/prod15490.html

85. Official Web Page of Empatica E4. (2016). Available from: https://www.empatica.com/e4-wristband

86. Hernandez J, McDuff D, Benavides X, Amores J, Maes P, Picard R. AutoEmotive: bringing empathy to the driving experience to manage stress. Proceedings of the 2014 Companion Publication on Designing Interactive Systems – DIS Companion. New York, NY: ACM Press (2014). p. 53–6.

87. Reynolds E. Nevermind: creating an entertaining biofeedback-enhanced game experience to train users in stress management. SIGGRAPH Posters; Anaheim, CA. New York, NY: ACM (2013). doi:10.1145/2503385.2503469

88. Mautz R. Indoor Positioning Technologies. Zurich: ETH Zurich (2012). doi:10.3929/ethz-a-007313554

89. Deak G, Curran K, Condell J. A survey of active and passive indoor localisation systems. Comput Commun (2012) 35:1939–54. doi:10.1016/j.comcom.2012.06.004

90. Xiao J, Zhou Z, Yi Y, Ni LM. A survey on wireless indoor localization from the device perspective. ACM Comput Surv (2016) 49:1–31. doi:10.1145/2933232

91. Official Web Page of IndoorAtlas. (2016). Available from: https://www.indooratlas.com/

92. Official Web Page of AccuWare. (2016). Available from: https://www.accuware.com/

93. Martinez C, Anagnostopoulos GG, Deriaz M. Smart position selection in mobile localisation. The Fourth International Conference on Communications, Computation, Networks and Technologies. Barcelona, Spain (2015).

94. Lecrubier Y, Sheehan D, Weiller E, Amorim P, Bonora I, Harnett Sheehan K, et al. The Mini International Neuropsychiatric Interview (MINI). A short diagnostic structured interview: reliability and validity according to the CIDI. Eur Psychiatry (1997) 12:224–31. doi:10.1016/S0924-9338(97)83296-8

95. First MB, Williams JBW, Karg RS, Spitzer RL. Structured Clinical Interview for DSM-5 Disorders, Clinician Version (SCID-5-CV). Arlington, VA: American Psychiatric Association (2015).

96. Liebowitz MR. Social phobia. Modern Problems of Pharmacopsychiatry (1987). p. 141–73. doi:10.1159/000414022

97. Fresco DM, Coles ME, Heimberg RG, Liebowitz MR, Hami S, Stein MB, et al. The Liebowitz Social Anxiety Scale: a comparison of the psychometric properties of self-report and clinician-administered formats. Psychol Med (2001) 31:1025–35. doi:10.1017/S0033291701004056

98. Leary MR. A brief version of the fear of negative evaluation scale. Pers Soc Psychol Bull (1983) 9:371–5. doi:10.1177/0146167283093007

99. Spielberger CD, Gorsuch RL, Lushene R, Vagg P, Jacobs G. Manual for the State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press (1983).

100. Kaplan DM, Smith T, Coons J. A validity study of the subjective unit of discomfort (SUD) score. Meas Eval Couns Dev (1995) 27:195–9.

101. Wewers ME, Lowe NK. A critical review of visual analogue scales in the measurement of clinical phenomena. Res Nurs Health (1990) 13:227–36. doi:10.1002/nur.4770130405

102. Stangier U. New developments in cognitive-behavioral therapy for social anxiety disorder. Curr Psychiatry Rep (2016) 18:25. doi:10.1007/s11920-016-0667-4
