DESIGN OF A PROTOCOL FOR THE MEASUREMENT OF PHYSIOLOGICAL AND EMOTIONAL RESPONSES TO SOUND STIMULI

ANDRÉS FELIPE MACÍA ARANGO

UNIVERSIDAD DE SAN BUENAVENTURA MEDELLÍN FACULTAD DE INGENIERÍAS INGENIERÍA DE SONIDO MEDELLÍN 2017

DESIGN OF A PROTOCOL FOR THE MEASUREMENT OF PHYSIOLOGICAL AND EMOTIONAL RESPONSES TO SOUND STIMULI

ANDRÉS FELIPE MACÍA ARANGO

A thesis submitted in partial fulfillment for the degree of Sound Engineer

Adviser: Jonathan Ochoa Villegas, Sound Engineer

Universidad de San Buenaventura Medellín Facultad de Ingenierías Ingeniería de Sonido Medellín 2017

TABLE OF CONTENTS

ABSTRACT ...... 7
INTRODUCTION ...... 8
1. GOALS ...... 9
2. STATE OF THE ART ...... 10
3. REFERENCE FRAMEWORK ...... 15
3.1. Noise ...... 15
3.1.1. Noise by Colors ...... 15
3.1.2. Tonal Noise (and Low Frequency Noise) ...... 16
3.1.3. Temporal Characteristics of Noise ...... 17
3.1.4. Speech Noise ...... 18
3.2. Binaural Hearing ...... 18
3.2.1. Binaural Signals and Dummy Head Recording ...... 18
3.3. Psychoacoustic Concepts ...... 19
3.3.1. Masking ...... 20
3.4. Emotions and their Measurement ...... 21
3.5. Digital Filters ...... 23
4. DESCRIPTION AND ELECTION OF SOUND STIMULI ...... 24
4.1. Frequency Characteristics ...... 24
4.1.1. Filtering ...... 25
4.2. Time Characteristics ...... 26
4.3. Spatial Information ...... 26
4.3.1. Dummy Head Recording ...... 26
4.3.2. Convolution ...... 27
4.4. Sound Pressure Level ...... 27
4.5. Stimuli Choice ...... 28
5. EXPERIMENT DESIGN ...... 30
5.1. General Considerations ...... 30
5.1.1. Participants ...... 30
5.1.2. Visual Stimuli ...... 30
5.1.3. Responses to be Measured ...... 31
5.1.4. Measurement tools ...... 31
5.1.5. Reproduction system ...... 31
5.1.6. Room Setup ...... 32
5.2. Sound Pressure Level Test ...... 32
5.3. Audiovisual Test ...... 33
6. GENERAL RESULTS AND ANALYSIS ...... 36
6.1. Sound Pressure Level Test ...... 36
6.2. Audiovisual Test ...... 38
7. CONCLUSIONS ...... 43

LIST OF FIGURES

Fig. 1. Noise by Colors. Spectrum of each type of noise represented with different colors ...... 16
Fig. 2. Low – Middle – High over White noise representation ...... 17
Fig. 3. Binaural Hearing ...... 19
Fig. 4. Equal loudness-level or phon curves (based on values in ISO 226-2003) ...... 20
Fig. 5. Masking patterns produced by various tone maskers (masker frequency indicated in each frame). Numbers on curves indicate masker level ...... 21
Fig. 6. SAM (Self Assessment Manikin) ...... 22
Fig. 7. Digital Filter in MATLAB ...... 25
Fig. 8. Dummy Head Recording Procedure ...... 26
Fig. 9. PSD at 500 Hz centered noise ...... 29
Fig. 10. Sound Pressure Level Test Stimuli ...... 33
Fig. 11. Setup for the Audiovisual Test ...... 34
Fig. 12. Audiovisual Stimuli Combinations ...... 35
Fig. 16. Valence mean values for the three Frequency noises ...... 37
Fig. 17. Arousal mean values for the three Frequency noises ...... 37
Fig. 18. Dominance mean values for the three Frequency noises ...... 38
Fig. 19. Software readings when participants move their head down ...... 40
Fig. 20. Reactions to audiovisual stimuli ...... 41
Fig. 21. 125 Hz filter designed in Matlab with its cut frequencies and characteristics ...... 50
Fig. 22. 500 Hz filter designed in Matlab with its cut frequencies and characteristics ...... 51
Fig. 23. 3150 Hz filter designed in Matlab with its cut frequencies and characteristics ...... 51
Fig. 24. Diagram of System Connection and Stimuli Presentation ...... 63

LIST OF TABLES

Table I. Studies using sound stimuli ...... 12
Table II. Valence Results ...... 58
Table III. Arousal Results ...... 58
Table IV. Dominance Results ...... 59
Table V. Values from Audiovisual Test Pt. 1 ...... 60
Table VI. Values from Audiovisual Test Pt. 2 ...... 60
Table VII. Shapiro-Wilk test for audiovisual test results ...... 61
Table VIII. Kruskal-Wallis test for audiovisual test results ...... 61
Table IX. Shapiro-Wilk test for sound pressure level test results ...... 62


ABSTRACT

A protocol for the measurement of physiological and emotional responses to sound stimuli is presented. The sounds used in the study were created by comparing different methods and possible bandwidths. They correspond to white noise filtered at three central frequencies, namely 125 Hz, 500 Hz and 3150 Hz, with a variable bandwidth based on 1/3 octaves. Additionally, spatial information was given to the sounds by convolving them with a simulated binaural impulse response.

Two experiments were conducted. The first one consisted of the 3 sounds presented at different sound pressure levels, between 50 dB and 80 dB in 6 steps. It was found that both valence and arousal changed as the level increased, the former decreasing and the latter increasing, showing a possible relation between the emotions elicited by a sound and its sound pressure level.

The second experiment presented image and sound simultaneously. The sounds were the same as described above, at a fixed level of 65 dB. Two images were used, one with positive semantic content and the other with negative content. Both images were taken from the IAPS (International Affective Picture System). Responses were measured with the Self-Assessment Manikin (SAM) and Noldus FaceReader technology. The results obtained with the SAM were not conclusive, probably due to sample size, experiment design and other factors. The results obtained with the FaceReader showed clear reactions from participants to the audiovisual stimuli, but further analysis and refinement of this data is needed. Finally, based on the protocol design, some recommendations are made for future studies.

Keywords: Acoustic Noise, sound, SAM, FaceReader, emotions, crossmodal.


INTRODUCTION

The field of psychoacoustics saw great growth in the 20th century, with much research aiming at one goal: the understanding of human hearing. As the years go by, this goal is constantly renewed, influenced by new findings, new technologies and new demands from both the academic world and society at large. Nowadays, sound is important not only to researchers and academics; it has also become a matter of public health and, more recently, an important tool for commercial use.

The understanding of the underlying mechanisms of human hearing and perception is no longer restricted to the mere reception and transduction of stimuli, but extends to the whole cognitive and emotional impact that sound might elicit. Along this line of thought, new research that approaches the perception of sound in different ways is needed, and the constant evolution of technological resources brings new possibilities and opportunities.

The current project contributes to a doctoral thesis being developed at Universidad de San Buenaventura, where the emotional impact of audiovisual stimuli is studied. The use of technologies like the FaceReader brings new approaches for understanding the problem at hand, and can provide useful information to expand the present knowledge. The goal here is to develop a solid guideline for future work in this field, structuring a proper protocol that allows for the correct measurement of sound perception with various tools and techniques.


1. GOALS

General Goal

Design a protocol for the measurement of physiological and emotional responses to sound stimuli, with the purpose of evaluating their impact on multisensory processes.

Specific Goals

- Choose the sound stimuli that will be included in the test.

- Define the visual stimuli and variables that will be evaluated through the tests.

- Establish a methodology for the application of the sound stimuli in the experimental framework.


2. STATE OF THE ART

Thanks to the technological advances achieved in the last decades [1], and the accessibility of such technology, it has been possible to employ new methods and tools in research processes. This is the case, for example, of using electroencephalography (EEG) to associate brain wave fluctuations with sound stimuli. In [2], an 8-electrode EEG was used to evaluate the perceived quality of the acceleration sound of a car, using binaural recordings, and managing to establish a relationship between alpha brain waves and good perceived sound quality. On the other hand, [3] and [4] carried out studies to determine whether supra-threshold sound waves are perceived at the neural level, where Tsutomu et al. found evidence favoring such hypothesis. Another topic usually approached with the use of neuroscientific tools is language [5],[6],[7], where the goal is to find brain processing mechanisms when perceiving such stimuli under different conditions.

Noise is a broadly studied stimulus, due to the great impact it generates on health [8]; therefore, the development of new tools that allow for a deeper comprehension of its effects is of much interest. The review done by [9] shows how different kinds of background noise affect specific populations in different ways, and proposes important guidelines for further studies. Other authors have analyzed the combined effect of noise with other bioclimatic variables such as temperature and illumination [10],[11], finding results that demonstrate the multisensory interactions in perception. It has also been shown that noise composed of speech is particularly disruptive in working environments [12],[13], and that its effect depends mainly on intelligibility, rather than on the sound level itself [14]. Low frequency noise and infrasonic noise have been of particular interest, on one hand due to the harmful effects that have been proven, and on the other, due to the fact that most anthropogenic sound sources possess a great amount of energy at low frequencies [15]. A study done in 2014 evaluated the impact of wind turbine noise (under 20 Hz), finding fluctuations in specific brainwaves [16]. Another study done by the same researcher evaluated the same parameters, only that this time the participants were exposed to infrasonic sound between 4 Hz and 8 Hz, a frequency range associated with theta and alpha brainwaves, where it was found that the latter diminished in amplitude with the application of the stimuli [17].

Simple sounds like pure tones or white noise have also been studied, especially for the evaluation of very basic brain processes [18],[19],[20]. Another approach to sound stimuli research is to analyze how music and music therapy can influence people in positive or negative ways, a field that has been strengthened with the use of neuroscientific tools [21],[22],[23],[24],[25],[26].

Nevertheless, this topic can also be approached from the point of view of the effects that are generated and how these can be characterized. Generally, one can talk about physiological and psychological effects of sound [15]. Physiological effects are mostly generated at high sound level exposures; examples include hearing loss, nystagmus, nausea and even organ damage. Other physiological effects are related to cardiovascular problems and endocrine system alterations due to stress [15].

Psychological effects greatly depend on the interpretation (positive or negative) of the stimuli. A clear example is annoyance, which depends on the particular context, the task at hand, the age, the sound source and the way it is perceived [27]. From this point, several psychological effects emerge that are related to cognition and emotional states, which are of great interest for the present study.

The interdisciplinary investigations done at Universidad de San Buenaventura also provide valuable information, from the analysis of computational methods for virtual sound environments [28],[29], to studies on how background noise and time can affect cognitive processes like memory and attention [30],[31].

Some studies have analyzed the multisensory interaction between auditory and visual stimuli, finding interesting crossmodal effects, such as the McGurk effect [32] and the ventriloquist effect. These findings demonstrate the dominance of sight as the primary human sense. However, there are also studies that reveal the bilateral relation between these two senses [33],[34]. In [35], a general review shows how sound can affect sight and vice versa.

The work still needed on the comprehension of these interactions is considerable, and neurophysiological measurement tools like EEG can help considerably in bringing new and better results [34]. The use of event-related potentials (ERP) can provide key elements to understand exactly how crossmodal modulation occurs.

The following table summarizes some of the work done in past years in relation to sound stimuli and their analysis. It is important to note how most of the studies neglect to report the sound reproduction setup and the acoustic conditions of the experiments. Therefore, the necessity for properly designed protocols for the reproduction and general presentation of sound or multisensory stimuli becomes clear.

Table I. Studies using sound stimuli

Alimohammadi 2013. Stimuli: low frequency noise at 50 dB and 70 dB. Results: better performance in cognitive tests and arousal rise. Reproduction: not specified (Vienna testing system).

Cho 2011. Stimuli: noise in bands of 100 Hz, 1000 Hz and 10000 Hz. Results: brainwave and pupil responses to noise suggest noise as a stress factor. Reproduction: not specified speakers.

Daly 2014. Stimuli: previously selected excerpts from film music. Results: found correlation between emotions induced by music and brain activity. Reproduction: not specified.

Daly 2015. Stimuli: previously selected excerpts from film music. Results: found relations between brainwave activity and acoustic features of each excerpt. Reproduction: not specified.

Drossos 2012. Stimuli: use the IADS. Results: compare acoustic features of sounds with elicited emotions; find that the semantic content of sounds is predominant. Reproduction: not specified.

Drossos 2015a. Stimuli: use the BEADS. Results: find a relation between the incidence angle of sound and its perceived impact. Reproduction: web survey, every user with their own sound system.

Drossos 2016. Stimuli: use the IADS and the BEADS. Results: analyze the impact of the semantic content of sound on its perception; find that it is important but limited, and depends on other aspects. Reproduction: not specified.

Fassbender 2012. Stimuli: different musical compositions. Results: analyze the impact of music on memory; didn't find conclusive results. Reproduction: Sennheiser HD 280 headphones.

Hygge 2001. Stimuli: heat machine with frequency components around and below 250 Hz, at 38 dBA and 58 dBA. Results: better performance but poorer accuracy on writing tests when the noise was present. Reproduction: heat machine.

Inui 2010. Stimuli: pure tones at 800 Hz and 840 Hz at 70 dB. Results: 300 ms of sound is enough to generate a memory trace. Reproduction: E-A-Rtone 3A headphones.

Johnson 2007. Stimuli: broadband noise for 500 ms. Results: reduced activity on beta brainwaves. Reproduction: Etymotic Research ER2 headphones.

Kasprzak 2013a. Stimuli: infrasonic sound between 4 Hz and 8 Hz at 110 dB. Results: amplitude decay of alpha brainwaves. Reproduction: 6 GND 30/80/2 subwoofers hung from the rooftop.

Kasprzak 2014a. Stimuli: recorded infrasonic sound from a wind turbine, filtered at 20 Hz, presented at 91.6 dB. Results: changes in EEG readings, with no statistical validity. Reproduction: 6 GND 30/80/2 subwoofers hung from the rooftop.

Knoeferle 2016. Stimuli: sound associated with a product, and product jingles. Results: semantic content of sound can affect visual perception of the products. Reproduction: two speakers in front of a screen at 70 dB.

Lane 1998. Stimuli: binaural beats at various carrier frequencies. Results: binaural beats in the beta range influenced the test positively, associated with less negative feeling. Reproduction: headphones at a "comfortable" level.

Li 2014. Stimuli: 160 Hz, 500 Hz and 4000 Hz tones and white noise at 70 dBA. Results: average power of EEG (APEEG) changed in different frequency ranges depending on signal duration. Reproduction: pure tones with Brüel & Kjær HP1001; white noise with Nor270 dodecahedron.

Min 2015. Stimuli: various signals. Results: better localization of moving objects when considering both image and audio. Reproduction: not specified.

Oberfeld 2012. Stimuli: low-, mid- and high-frequency narrow-band noise based on the Bark scale. Results: spectral and temporal weights of loudness are independent but influence the overall loudness perception. Reproduction: headphones equalized according to IEC 318.

Padmanabhan 2005. Stimuli: music with binaural beats taken from commercial CDs. Results: music with binaural beats slightly reduced anxiety in post-operative patients. Reproduction: Philips Electronics HP 140 headphones, CD played on CD420.

Reyes 2014. Stimuli: binaural beats with various carrier frequencies. Results: greater effects found when the carrier frequency is 432 Hz. Reproduction: Beyerdynamic DT 990 PRO headphones.

Rickard 2012. Stimuli: relaxing and exciting classical music with recorded background noise. Results: remembrance of an emotional story was reduced when presented with relaxing music. Reproduction: headphones with self-adjusted level.

Schuller 2012. Stimuli: sounds with annotated semantic content and acoustic features. Results: it is possible to perform automatic emotion recognition of sounds; further research is needed. Reproduction: not specified.

Trimmel 1996. Stimuli: white noise, traffic noise and relaxing music at 55 dBA and 75 dBA. Results: found relation between sound, level, brain load and DC potentials. Reproduction: speakers 5 m away.

Verhey 2001. Stimuli: variable bandwidth noise centered at 2 kHz, varying in duration from 10 to 1000 ms. Results: results suggest that spectral loudness summation depends on signal duration. Reproduction: Sennheiser HD 25 with free-field equalization.

Vernon 2014. Stimuli: 10 Hz and 20 Hz binaural beats with 400 Hz carrier frequency. Results: no clear changes in EEG measurements. Reproduction: headphones.


3. REFERENCE FRAMEWORK

3.1. Noise

Noise can be defined simply as "an unwanted sound", taking into account the subjective nature of this definition. It is also possible to give a statistical description of noise; according to Hartmann: "In […] and other communication sciences, the word usually refers to a complicated signal with a dense spectrum whose properties are defined only statistically" [36]. From a broader point of view, one can talk about different kinds of noise according to their frequency and time behavior, and also the source that generates them.

The subjective nature of noise has already been mentioned, but for practical and research purposes there are certain signals that can be universally considered to be noise in their own context; for example, traffic noise, machinery sound or an airplane passing by. All these sounds are present in most urban areas, and therefore the effects that they might cause are a matter of public health. Different types of noise are described next, according to different classification criteria.

3.1.1. Noise by Colors

Named after their analogy with the light spectrum, pink and white noise are the most used. There are other types of noise, but they are not commonly used. The main characteristic of these signals is that they are broadband, and due to their pseudo-random nature, they can be associated with other types of noise that are created by combining multiple sound sources. They are generally used as a sound "base" in auditory experiments given these characteristics.

White noise has constant power across the whole frequency range, while pink noise is a signal whose power spectral density is inversely proportional to frequency; that is, it has equal energy per octave band. Because of this, pink noise is perceived with greater sound level at low frequencies, and white noise at high ones. Using one or the other stimulus depends on the particular application, but it is important to note that prolonged reproduction of white noise at high sound pressure levels may damage the speakers.
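As an illustration of the two spectra, the following minimal MATLAB sketch (not part of the original thesis code) generates both noises and inspects their power spectral densities; the pink-noise generator assumes the DSP System Toolbox is available.

```matlab
% Minimal sketch: white vs. pink noise and their spectra.
fs = 44100;                            % assumed sampling rate
N  = 3 * fs;                           % 3 seconds of samples
white = randn(N, 1);                   % white noise: flat PSD
% Pink noise approximation (requires the DSP System Toolbox).
pinkGen = dsp.ColoredNoise('Color', 'pink', 'SamplesPerFrame', N);
pink = pinkGen();
pwelch(white, [], [], [], fs);         % approximately flat
figure;
pwelch(pink, [], [], [], fs);          % falls by ~3 dB per octave
```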


Fig. 1. Noise by Colors. Spectrum of each type of noise represented with different colors. Taken from Mwchalmers

3.1.2. Tonal noise (and Low Frequency Noise)

Tonal noise is a type of noise in which a great amount of energy is concentrated in a limited frequency range. Most engines and rotating machines, air conditioning systems and ventilators can generate it [37], which means it is present in most urban spaces. This, and the fact that several studies have shown a greater impact of tonal noise over broadband noise (see Table I), makes it of great interest.

Low frequency noise is somewhat similar: as its name implies, it exhibits a greater concentration of acoustic energy in the low frequency range, which can be defined as roughly 20 Hz to 200 Hz. This type of noise is of great interest given the effects it can produce on people, shown by many studies (see Table I).


Fig. 2. Low – Middle – High frequencies over White noise representation

3.1.3. Temporal Characteristics of Noise

The temporal characteristics of sound can be defined regardless of its frequency behavior. It is important to note, though, that when it comes to perception, frequency and time depend on each other, especially when considering loudness [38]. Constant noise can be defined as a signal whose sound pressure level does not change significantly over time. Impulsive sound, on the other hand, is characterized by a great amount of energy over a short period of time; take for example a short burst or a gunshot. Finally, intermittent noise contains random or regular sound pressure changes, where each sound has to last at least 5 seconds [39].

From a perceptual point of view, the temporal characteristics of sound are very important. The "modality appropriateness hypothesis" [35] states that, given a task under a specific context, the most "appropriate" or precise sense for such a task is the one that will dominate perception. In this way, and taking into account the fact that hearing has a greater time resolution than sight, it becomes clear how the time behavior of sound can greatly affect the perception of a given stimulus.


3.1.4. Speech Noise

This type of noise has a direct impact on learning, teaching, working and productivity in general, so it is very important to understand the underlying mechanisms of its effects. In essence, "speech" means spoken language, and this has to be understood as a special kind of sound, not only because of its complex semantic content, but also because it is processed differently in the auditory cortex [13].

3.2. Binaural Hearing

The way in which we human beings listen to sound is called "binaural hearing"; that is, hearing with both ears, not only stereophonically, but also modulated by the shape and consistency of the pinna, the ear canal, the head, and the rest of the body. This kind of hearing allows for spatiality and a sense of sound direction [40].

3.2.1. Binaural Signals and Dummy Head Recording

A dummy head [41] is composed of a real-sized head and torso manikin, along with a specially made pair of ears built from a material similar to real ones, in order to recreate the reflections and diffractions that occur due to these structures. The ears are usually made of high density silicone, and there is an omnidirectional microphone in each of them. This way, it is possible to record sounds in the same way a person would perceive them.1 On the other hand, an HRTF (head related transfer function), according to Vorländer, "is defined by the sound pressure measured at the eardrum or at the ear canal entrance divided by the sound pressure measured with a microphone at the center of the head but with the head absent" [42, p. 87]. Then, by obtaining the impulse response of the sound reaching both ears, it is possible to give spatial information to any sound by convolving it with the impulse response.

Additionally, it is important to consider how spatial information of sound can modify the way it's perceived, not only from the listening itself, but from its cognitive or emotional effect. One study showed how the incoming angle of the sound can affect the emotional state of the listener [43].

1 It is important to note that each person has a unique HRTF due to the specific ear and head structure. Nevertheless, general approximations or "averages" are used in dummy head construction to account for this problem.

Fig. 3. Binaural Hearing. Taken from [42]

3.3. Psychoacoustic Concepts

One very important concept when dealing with sound perception is that of critical bands, which describe the frequency resolution of human hearing. One way to express them is the equivalent rectangular bandwidth (ERB), defined as the rectangular filter that passes the same amount of energy as an auditory filter would, and expressed by the following equation, where F denotes the central frequency in kHz and ERB is the bandwidth of the filter in Hz:

ERB=24.7*(4.37*F+1) (1)
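As a quick sanity check (a sketch added here, not part of the original text), Equation (1) can be evaluated in MATLAB for the three center frequencies used later in this thesis:

```matlab
% ERB from Equation (1), with F given in kHz.
F   = [0.125 0.5 3.15];           % center frequencies in kHz
ERB = 24.7 * (4.37 * F + 1);      % approx. 38 Hz, 79 Hz and 365 Hz
```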

Another important concept is loudness, which is the perceptual correlate of sound intensity; although the two are correlated, loudness also depends on other perceptual and physical variables. The equal loudness curves or equal loudness contours were first presented in 1927 by Kingsbury; however, the first ones to be widely accepted were those introduced in 1933 by Fletcher and Munson. These curves describe the SPL required for tones of all frequencies to sound as loud as a reference 1 kHz tone.

Fig. 4. Equal loudness-level or phon curves (based on values in ISO 226-2003). Taken from [44]

However, there are many different techniques to measure loudness, which depend on their approach and the type of stimuli they compare. The unit of loudness is called “sone”, which represents the loudness of a 1 kHz pure tone at 40 dB. Also, since loudness and bandwidth are closely related, it is important to consider the critical bands models.

One of these models is the Bark scale. Proposed by Zwicker and Terhardt, it represents a frequency scale divided into 24 steps of perceptually equal distance. Each band is not limited to a given cut frequency but can be defined depending on the central frequency of interest. Another important model, the ERB scale, is described at the beginning of this chapter.

3.3.1. Masking

Masking can be defined as the change in sensitivity for one sound in the presence of another. This can be reflected in the threshold shift of the given sound, or in its loudness shift, which is called partial masking [45]. It is important to note that this phenomenon depends on frequency and is not symmetrical along the frequency range. The masking patterns at different frequencies can be seen in Fig. 5:

Fig. 5. Masking patterns produced by various tone maskers (masker frequency indicated in each frame). Numbers on curves indicate masker level. Taken from [45]

3.4. Emotions and their Measurement

First, it is important to explain how emotions are described and categorized. One method is to name the most basic and universal emotions, like happiness and sadness. Paul Ekman's model proposes 5 basic emotions: disgust, anger, enjoyment, sadness and fear [46]. The other way to describe emotions is through dimensional descriptions, that is, valence, arousal and dominance. All three can be put on a scale from 1 to 9, as described in [47]. In the case of valence, 1 means a very unpleasant emotion and 9 a very pleasant one. For arousal, 1 represents a state of calm or ease, and 9 a very excited state. For dominance, 1 means the person perceives a lack of control over the situation, and 9 represents total control over it. By combining these 3 dimensions, it is possible to create a space in which every emotion can be placed.

Measuring qualitative variables has always been a challenge, especially in quantitative investigations. That is the case of emotion, which is broadly studied and analyzed in all kinds of contexts. In the sound context, for example, researchers try to understand why and how certain pieces of music, certain noises or audio signals can make an impact on one's emotional state. However, the lack of a unifying and holistic theory on emotion and affect in this context limits the possibilities of investigation [48].


Nevertheless, there are validated and accepted tests to measure emotions, depending on their theoretical approach. One of these is the SAM (Self Assessment Manikin), which has 3 rows of figures that represent the dimensional characterization of emotions: valence, arousal, and dominance [47]. This tool has been widely used in psychological studies, proving to be useful.

Fig. 6. SAM (Self Assessment Manikin). Taken from [49]

Other tools and methods for measuring responses to stimuli gather information from more basic and biological processes of human behavior. The work done by Paul Ekman and other researchers over more than two decades has brought knowledge about the way face movements, even on the order of millimeters, can reveal information about the emotional state of a person. Computational algorithms and tools that use these findings have been created to automatically identify emotional responses, achieving great results [50].

ECG (electrocardiogram), EEG (electroencephalography) and EDA (electro-dermal activity) measure different types of physiological responses and, as with any tool of this kind, require interpretation by a trained professional to ensure accurate results. Additionally, going from basic biological data to highly complex responses like emotions and feelings requires not only a great understanding of such data but also a lot of signal processing to eliminate noise and translate the raw information into relevant results.

3.5. Digital Filters

Filters are one of the most important tools in digital signal processing. They are essentially systems that operate on a discrete signal to reduce or enhance parts of that signal. They can be characterized according to their impulse response or their difference equation, and can be put in one of two categories: FIR (finite impulse response) or IIR (infinite impulse response).

FIR filters are those whose impulse response is finite; that is, their output depends only on the input signal, and not on past values of the output signal. They have some advantages, for example the ability to implement linear-phase filtering. On the other hand, they are generally more computationally demanding. IIR filters are often called recursive filters, because their output depends on past values of the output itself. They have the advantage of being very efficient to implement, but they can be unstable, and their phase is difficult to control.


4. DESCRIPTION AND ELECTION OF SOUND STIMULI

Based on the reference framework and the state of the art, the proposed sound stimuli consist of white noise filtered in three specific frequency bands. 125 Hz can be considered low frequency, which has proven to potentially generate various negative effects on people. In the band of 500 Hz there is a great concentration of energy from the vocal range, which can also mask higher frequencies, and it is therefore of great interest. Finally, around 3 kHz to 4 kHz, the ear canal creates a resonance that implies an increase in the perceived loudness of the signal at those frequencies; a central frequency of 3150 Hz can represent such phenomena [15].

Additionally, by choosing these frequencies, it is possible to obtain a representation of the spectrum divided in three parts, which can be called “low” (125Hz), “middle” (500Hz) and “high” (3150Hz) frequencies. This convention is almost universally used when dealing with the sound frequency range in audio applications, in order to easily refer to a given portion of it. Some psychoacoustic studies have adopted this convention [51], still maintaining the rigor when defining the properties of each sound.

4.1. Frequency Characteristics

Once the center frequencies were defined, it was necessary to decide the spectral behavior of the noise for each band. For this, three options were considered. First, a bandwidth according to the ERB definition (see section 3.3), so that every frequency range was similar to the critical bands of human hearing. The second option proposed the use of 1/3 octave bands, specifically 5/3 octaves for each central frequency: two 1/3-octave bands above, two below, and the central band, aiming to obtain a broader spectrum that could be representative of the specific frequency range. As a third option, a variable bandwidth based on 1/3 octaves was proposed: for 125 Hz, a 5/3-octave bandwidth; for 500 Hz and 3150 Hz, a 3/3-octave bandwidth (not to be confused with the standard full-octave band, whose lower and upper cut frequencies are different). The reasons for the choice among these options are described in section 4.5.
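For the third (chosen) option, the band edges can be computed directly; the short MATLAB sketch below does so, assuming each band is centered geometrically on its center frequency (the exact edge values are not listed in this chapter):

```matlab
% Band edges for the variable 1/3-octave-based bandwidths.
fc    = [125 500 3150];           % center frequencies in Hz
bwOct = [5/3 1 1];                % total bandwidth in octaves per band
fLow  = fc .* 2.^(-bwOct/2);      % lower cut frequencies (~70, 354, 2228 Hz)
fHigh = fc .* 2.^( bwOct/2);      % upper cut frequencies (~223, 707, 4455 Hz)
```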


4.1.1. Filtering

White noise, previously created in MATLAB, had to be filtered in such a way that only the frequencies inside the defined range would be present in the signal. To do this, a high-slope bandpass FIR filter was applied. By doing this, any effect observed during the experiments could be attributed to that specific frequency range. This is also supported by the studies that show how low-intensity noise can cause clear effects (see Table I). The characteristics of the filters created can be seen in Appendix A.

The filters were created based on the defined frequency ranges for each option mentioned in the section above. This way, three filter banks were obtained for each central frequency. Later, the noise was filtered with MATLAB's 'filtfilt' function, which performs filtering without modifying the phase of the signal. In any case, at this point the phase shift was irrelevant due to the random phase information of the white noise; the importance of this type of filtering will be shown later.
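A minimal sketch of this step is shown below (illustrative parameters only; the actual filter orders and edges are those documented in Appendix A, not these):

```matlab
% Sketch: high-order linear-phase FIR bandpass applied with zero-phase
% filtering (filtfilt), as done for each noise band.
fs    = 44100;                        % assumed sampling rate
fLow  = 354; fHigh = 707;             % example edges for the 500 Hz band
order = 2000;                         % high order -> steep band edges
b = fir1(order, [fLow fHigh]/(fs/2), 'bandpass');   % FIR coefficients
white = randn(3*fs, 1);               % 3 s of white noise
white = white / max(abs(white));      % normalize to avoid clipping
noise500 = filtfilt(b, 1, white);     % zero-phase bandpass filtering
```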

Fig. 7. Digital Filter in MATLAB


4.2. Time Characteristics

Taking into account the way in which stimulus duration can affect the perception of loudness, and also considering the type of test being made, it was decided to make the sounds 3 seconds long, so that there is enough time to avoid loudness variations due to stimulus duration [52]. For reproduction, fades in and out were applied to each sound sample, so as to avoid the clicks and pops that might occur otherwise. The fades were linear and 3 ms long, again so that the loudness perception of the overall sound was not affected.
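A small sketch of the fade step, assuming the 'noise500' and 'fs' variables from the previous sketch:

```matlab
% 3 ms linear fade-in and fade-out applied to a stimulus.
nFade = round(0.003 * fs);                   % 3 ms in samples
ramp  = linspace(0, 1, nFade).';             % linear ramp, column vector
noise500(1:nFade)         = noise500(1:nFade)         .* ramp;
noise500(end-nFade+1:end) = noise500(end-nFade+1:end) .* flipud(ramp);
```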

4.3. Spatial Information

Once the sounds were created, several methods were used to give them a "spatial" sensation, with two intentions: (1) giving specific, non-random phase information to the sound, and (2) giving the sensation that the sound is being reproduced within a space and not just from a focal point (a speaker). The methods used are described next.

4.3.1. Dummy Head Recording

The first approach used to give spatial information to the sounds was to reproduce and record them with a Dummy Head (See section 3.2.1). This way, it was possible to obtain a binaural recording from a mono reproduction. The procedure was done as shown in Fig. 8:

Fig. 8. Dummy Head Recording Procedure


Both broadband and filtered noise were reproduced and recorded, and in the broadband case the recording was later filtered using the same procedure described in section 4.1.1, for comparison purposes.

4.3.2. Convolution

Another way to give spatial information to the sounds is to convolve them with a BIR (binaural impulse response), thus making the stimuli sound as if they were reproduced in the specific space and setup the BIR represents. In this case, the BIR was taken from Urrego's work [53], where an acoustically "ideal" space was created by simulating a classroom with acoustic treatment.

It is important to note that the BIR obtained with this method uses generic HRTF’s, since it is a general model. This means that the spatial information conveyed in the impulse response might not match completely with each person’s way of listening, or more specifically, the way each brain processes such information. This is a limitation that must be considered as part of the experiment, since the creation of personalized HRTF’s for each participant is not a viable option in this study.

The convolution process was done with Audacity's 'Aurora' module, which allows for easy use inside the DAW (Digital Audio Workstation). The convolved audio was then brought back to MATLAB for analysis.
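The same operation can also be sketched directly in MATLAB; the file names below are placeholders, not the actual project files, and the BIR is assumed to share the stimulus sampling rate:

```matlab
% Sketch: convolving a mono stimulus with a stereo binaural impulse response.
[bir, fsBir] = audioread('classroom_BIR.wav');    % columns: left, right (fsBir assumed equal to fs)
left  = conv(noise500, bir(:, 1));
right = conv(noise500, bir(:, 2));
binaural = [left right];
binaural = binaural / max(abs(binaural(:)));      % normalize to avoid clipping
audiowrite('stimulus_500Hz_binaural.wav', binaural, fs);
```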

4.4. Sound Pressure Level

If the Bark scale were chosen, the frequency boundaries wouldn't match any measurement standard, so the SPL would be limited to a global measurement. On the other hand, by choosing 1/3-octave-band-based filters, each stimulus could be measured with a sound level meter on the specific bands, and then an energy summation could be performed (Equation 2). This means that each measurement is less affected by other frequency components of the background noise, as was the case in the room used for the experiment, where important low frequency components were found. All the measurements in this project were done in dBZ, or unweighted sound level.


Le = 10*log10(10^(L1/10) + 10^(L2/10) + ... + 10^(Ln/10)) dB (2)
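A short MATLAB sketch of this summation, using hypothetical per-band levels (not measured data):

```matlab
% Energy summation of per-band levels (Equation 2).
L  = [58.2 60.1 57.4 55.0 52.3];     % example 1/3-octave band levels in dB
Le = 10 * log10(sum(10.^(L / 10)));  % total (energetic) level in dB
```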

4.5. Stimuli Choice

Once all the stimuli were created with the methods described above, they had to be compared to find the ones that worked best for the current investigation. Regarding the frequency characteristics, the variable bandwidth based on 1/3-octave bands was chosen. The first option, the ERB-based filters, turned out to be too narrow for the purpose of generalizing frequency ranges, both from a theoretical and a critical listening point of view. Also, the perceived loudness of a sound differs for bandwidths both narrower and wider than the critical bands; in narrower bandwidths, level fluctuations are much more easily perceived [51].

On the other hand, using constant 3/3-octave-band filters meant that the higher frequencies would have a much broader spectrum, and would thus be perceived as louder [36]. This could bias the experiment in unwanted ways, since the loudness perception of the stimuli would depend not only on the frequency itself, but also on the chosen bandwidth.

Taking this into account, the last option was chosen as the best fit for this study. The variable bandwidths based on 1/3-octave bands resemble the ones found in the Bark scale [44]. The question of why not simply use Bark scale bands might arise, and the answer lies in reproducibility and ease of measurement, which are described in section 4.4. For all cases, the frequency response and the PSD (power spectral density), which describes the distribution of power over the frequency components of the signal, were analyzed to ensure that each filtered noise was accurate according to its definition, and also to check for abnormal behavior, of which none was found.

For the spatial information, the proposed methods were evaluated as follows. First, frequency response and PSD were compared between each other and the pure generated noise (Fig. 9). Then, a critical listening test was undertaken, where some significant resonances were found in the dummy head recordings. This can be explained by the possible frequency changes introduced in the process of reproducing and recording the sound. The room, the speaker and the set up itself could account for this problem. The resonance issue could also be observed in the frequency response and PSD plots.


Fig. 9. PSD at 500 Hz centered noise. *Case 1 = filtered then recorded. Case 2 = recorded then filtered

When comparing the filtered-before and filtered-after signals, some noise was perceived in the first case, which might be due to the noise floor of the system and possible resonances of both the speaker and the room. In general, the spatial impression of the sound was minor, though greater than that of the pure noise.

On the other hand, the convolved signals achieved the goal of sounding “wide” and “surrounding”. They were perceived clearly different from the pure noise, but no major frequency changes were found in the plots (Fig. 9). In all cases, the low frequency noise was the one that changed the most, both in the dummy head recordings and the convolved signals. In the first case, due to possible resonances of the room, and in the second one due to the limitations of the computational models. The convolved signal was chosen to be used in the tests.


5. EXPERIMENT DESIGN

The general framework of this project proposes collaboration in the creation of a study that evaluates the crossmodal relations between sound and visual stimuli, focused on the emotional responses that these might elicit. It is then necessary to define a series of parameters and variables for such a study.

5.1. General Considerations

The main aspects to consider in the experiment are mentioned below. Some of these items were defined according to the guidelines of the macro project.

5.1.1. Participants

For this study, the selected participants were students from Universidad de San Buenaventura Medellín who met the following criteria:

- Not to study sound engineering
- Not to study psychology
- Not to have advanced musical knowledge or training
- Not to have any hearing impairment
- Not to have any visual impairment
- To be right-handed

These requirements stem from the need to exclude participants who might be biased in the experiment due to their professional education, or who may have some sensory difficulty.

5.1.2. Visual Stimuli

The visual stimuli chosen for this study were part of the IAPS (International Affective Picture System). The advantages of using this dataset are that it is internationally used and recognized, and that it has been locally validated [54]. All the images it contains are annotated in the dimensions of valence, arousal and dominance, and are therefore easily catalogued by their rankings. For this test, two images were chosen, one with a negative ranking and one with a positive ranking. This allowed setting the two extreme conditions of emotional responses, according to the investigation purposes.

5.1.3. Responses to be Measured

According to the goals set on the macro project, the measurement of emotions elicited by both sound and image stimuli is approached by using the dimensional structure, that is, by measuring valence, arousal and dominance. Additionally, physiological responses corresponding to face movements were also considered, in order to obtain emotional values from a different source, rather than self-evaluation alone.

5.1.4. Measurement tools

On a multisensory perception study, there are several tools and measurement techniques that can be used. Self-assessed tests like SAM (Self assessment manikin) provide “rationalized” answers from the subjects; however, they are widely used because of their ease of use and reproducibility. Different techniques consider other aspects of sensory processing and measure biological responses like brainwaves, heartbeat changes or facial expressions.

For this study, SAM was used as a baseline to evaluate the emotional responses of the participants. Additionally, the FaceReader technology from Noldus was also applied to analyze their reactions as they did the test (See section 3.4).

5.1.5. Reproduction system

In a research context, the way sound is presented or reproduced is one of the most important and often neglected aspects of psychoacoustic experiments (see Table I). The sound pressure level, the acoustic conditions of the room, the source positioning, and the type of source itself are just some of the necessary considerations for a well-designed protocol that doesn't affect the overall results.

As mentioned before, the spatial information of the sound is an important perceptual element, so it has to be successfully transmitted through the reproduction system. For this purpose, the typical 2.0 stereo configuration is not well suited, because it provides only a left and right panning. There are several tools and technological resources that can achieve binaural reproduction.

For this project, the Marantz OPSODIS system was chosen, as it allows binaural reproduction through speakers, using signal processing (crosstalk cancellation filters) to avoid the effects of crosstalk [42]. Compared to headphone reproduction, the former has a more "natural" configuration, as it avoids partial or total occlusion of the ears and the effects this can generate, for example in-head localization.

5.1.6. Room Setup

The tests for this project were done in the recording room of studio "A" at Universidad de San Buenaventura. This space provided the best available acoustic scenario, with a short reverberation time and overall controlled parameters; a complete characterization of the recording room is provided in [55]. It is important to note, though, that there were low frequency components present in the room, which had to be considered in the SPL measurement. Additionally, considering that this was not an anechoic environment, aspects such as early reflections could, and most likely did, affect the overall experiment. However, this is a drawback that must be accepted as part of the setup, as an infrastructure limitation.

For both tests, the participants sat 1.2 m away from the sound system, which was placed on a 1 m tall table. The instructor remained behind one of the sound panels placed in the room during the tests.

5.2. Sound Pressure Level Test

In this first test, only sound was used, presenting different SPLs randomly over the three noises previously created. The aim was to evaluate the whole sound system and setup, as well as to help define the SPL to be used in the following audiovisual test. The level was set from 50 dB to 80 dB in 6 steps at the listener position. Below 50 dB the sounds were barely audible, and above 80 dB the whole sound system would start clipping.
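Assuming the six steps are equally spaced (the text states only the range and the number of steps), the presented levels can be laid out as follows; combined with the three noises, this yields the 18 stimuli mentioned below:

```matlab
% Six equally spaced presentation levels between 50 and 80 dB (assumed spacing).
levels = linspace(50, 80, 6);    % 50, 56, 62, 68, 74, 80 dB
```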

A total of 18 stimuli were presented to each participant, where he or she had to evaluate every sound with the SAM.

Fig. 10 Sound Pressure Level Test Stimuli

5.3. Audiovisual Test

For this test, not only the SAM was used, but also the FaceReader software from Noldus, which implied that every participant had to be recorded with an HD camera. To do this, a professional Canon digital camera was placed over the screen, supported by a tripod on the table. This posed an extra challenge, since the illumination had to be improved, and having a camera up front while being recorded was intimidating for the participants.


Fig. 11 Setup for the Audiovisual Test

The test was done as follows. First, the participant was given the sheet with the SAM and the informed consent, which he or she had to sign. Once the participant was sitting comfortably, straight, and looking forward, a pre-recorded voice gave clear instructions on how to undertake the test. An example was given in video format, and then two practice stimuli were presented to ensure the correct understanding of the procedure. The video for the test was created using Final Cut Pro, and was later imported into Pro Tools and combined with the sounds.

Once everything was clear, the test began. It consisted of 6 audiovisual stimuli corresponding to the combination of the three noises previously created and two images, one positive and one negative (see section 5.1.2). The stimuli were presented in random order and balanced by creating several random combinations. A beep and a cross were presented 2 seconds before each stimulus to attract the participant's attention and to ensure an equal expectation level. The stimuli were shown for three seconds, image and audio simultaneously, with a gap of 15 seconds between them. This gap was enough time for the participant to fill in the SAM.
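In the actual test the stimuli were embedded in a single pre-rendered video, but the timing logic can be sketched in MATLAB as follows (file names and the playback approach are illustrative, not the thesis implementation):

```matlab
% Sketch of the presentation timing: random order, 2 s cue, 3 s stimulus,
% 15 s gap for filling in the SAM.
files = {'neg_125.wav','neg_500.wav','neg_3150.wav', ...
         'pos_125.wav','pos_500.wav','pos_3150.wav'};   % hypothetical names
[cue, fsCue] = audioread('beep.wav');                    % attention cue
for k = randperm(numel(files))                           % randomized order
    sound(cue, fsCue); pause(2);                         % cue, 2 s before stimulus
    [y, fsY] = audioread(files{k});
    sound(y, fsY);  pause(3);                            % 3 s stimulus
    pause(15);                                           % gap to fill in the SAM
end
```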


Fig. 12 Audiovisual Stimuli Combinations


6. GENERAL RESULTS AND ANALYSIS

6.1. Sound Pressure Level Test

Considering the purpose of this first test, special attention was paid to the whole procedure from beginning to end, trying to find mistakes that could be corrected for the following tests. Some key aspects of the test were considered for improvement or change.

First, the use of the measuring tool (SAM) was far from clear, and several participants filled in the form incorrectly. Despite the instructions given verbally to each participant according to a written script, some failed to understand the dynamics of the test. This might have affected the overall results, because even though only complete and correctly filled tests were taken into consideration, it is possible that some participants did not completely understand the emotions meant to be conveyed by the figures in the SAM.

The test revealed some interesting tendencies. It was found that the values for valence slightly decreased as the SPL rose, while the ones for arousal increased. As for dominance, a decreasing tendency was also found, though less pronounced. It is important to note, though, that given the standard deviation of the values found, these tendencies cannot be taken as conclusive results.


Fig. 13. Valence mean values for the three Frequency noises

Fig. 14. Arousal mean values for the three Frequency noises


Fig. 15. Dominance mean values for the three Frequency noises

The results show that the data is too dispersed to draw solid conclusions. However, by following this approach with the given stimuli, a bigger study could reveal clear tendencies and correlated behaviors between frequency, level and the emotions elicited. This could further support a general hypothesis about the dependency of emotional responses to sound on the SPL, and how exactly this dependency behaves as the level increases or decreases. Additionally, describing the behavior of emotional impact as a function of level could complement the broad findings on noise annoyance [15].

6.2. Audiovisual Test

Even though corrections were made after analyzing the first test, the extra measurement tool posed some problems in the setup. First, the lights in the room had to point at the participant's sitting spot, which felt quite harsh on the eyes. Also, the big camera over the screen was clearly visible, and often made people feel intimidated.

After every test, the instructors asked each person how they felt about the stimuli and how they perceived them. Based on their answers, it could be concluded, or at least conjectured, that the intrasubject design of the test was not adequate. Almost everyone stated that after looking at the negative image, the positive one was not perceived the same as before.

This means that not only was the sound affecting the way the images were perceived, but the images themselves were also affecting one another. Therefore, the order of appearance might have affected the overall reactions.

The questions asked at the end also revealed that almost everyone claimed to feel differently when looking at the same image with different sounds, both in the positive and negative cases. This suggests that sound alone can modulate the impact of the audiovisual stimuli, but this should be reflected in the results to support such a claim.

After processing the data obtained with the SAM, basic statistical descriptors were extracted and a Shapiro-Wilk test was done, which showed that the results did not come from a normal distribution. Therefore, a Kruskal-Wallis test was performed, which is a non-parametric method to determine whether independent samples come from groups with the same distribution. It revealed that the data did not vary significantly between groups, and therefore no conclusion can be drawn from the results obtained with the SAM.
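A minimal sketch of this check in MATLAB is shown below; 'valence' and 'group' are hypothetical variable names, and since MATLAB has no built-in Shapiro-Wilk test, a Lilliefors normality test stands in for it here (the thesis used Shapiro-Wilk, presumably via another tool or script):

```matlab
% Sketch: normality check followed by a non-parametric group comparison
% (requires the Statistics and Machine Learning Toolbox).
h = lillietest(valence);                   % h = 1 -> normality rejected
p = kruskalwallis(valence, group, 'off');  % p > 0.05 -> no significant difference
```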

On the other hand, the analysis performed with the FaceReader indicated some tendencies. However, it is important to mention that the results obtained for every video recording had to be interpreted individually, and required the investigators' judgment to determine which parts of the results were relevant and which were not.

One example of this is the pattern found in the valence measurements, where most readings showed a pronounced negative peak around 4 seconds after the stimulus started. This could easily be confused with a negative reaction or the presence of anger, as the software suggested, but it was actually due to the participants moving their heads when looking down at the sheet to write their answers. In other cases, the software stopped reading values as the person completely turned their head down.


Fig. 16. Software readings when participants move their head down

This first analysis suggests that looking down to write an answer while a stimulus is presented up front is both a possible distractor and a difficulty for the video recording, resulting in inaccurate readings from the software. Therefore, it became clear that a tool allowing the whole procedure to be done while always looking at the screen would be optimal. An application that integrates the SAM test digitally is recommended for further evaluations.

Aside from these findings, there were also some results that revealed emotional reactions to the audiovisual stimuli. The stimuli were presented at around 2 seconds from the start of each video. At that moment, some participants had a reaction that was caught by the software and was reflected in the valence measurements, as shown in Fig. 17 below.

Fig. 17. Reactions to audiovisual stimuli

Most of the reactions at this time were interpreted by the software as anger, but surprise and sadness were also found. This clearly shows that there were actually physiological reactions to the audiovisual stimuli that could be captured by the software. The ability to accurately measure the intensity and difference from one reaction to another requires further analysis.

Additionally, as seen in Fig. 16 and Fig. 17, the valence readings were almost always on the negative side, which can point to two things. First, as noted in section 5.3, being recorded, having direct lighting and being in such a controlled environment could create tension or stress in the participants, which could account for the overall negative reactions. On the other hand, it is possible that the face calibration for each participant was not optimal, because there was no still frame in which the participant was told to "look neutral".

As mentioned before, the results given by the FaceReader software require a qualified professional to interpret them in order to extract the most information. However, the findings here suggest that this tool can be helpful and provide valuable information to complement other tests like the SAM. It is important to take note of the video recording problems and improve them in further evaluations to ensure better readings from the software.


7. CONCLUSIONS

After analyzing and evaluating the overall results of this project, some conclusions can be drawn. First, the overall approach of changing basic sound characteristics such as frequency and level to evaluate their impact on perception seems very promising, since even with a small pilot test a clear tendency was found. It was shown how valence and arousal ratings changed as the level of the stimuli increased. Further investigation on the impact of the frequency range is needed.

On the other hand, evaluating not only sound but also image stimuli implies a considerable growth in the complexity of the test, which makes it much more prone to error. Therefore, special care and rigor are recommended for further studies of this kind. In particular, using an intersubject instead of an intrasubject design can avoid unwanted modulation in the perception of the stimuli.

The results obtained from the two measuring tools were relevant and complementary. On one hand, the SAM proved to be relatively easy to implement, but the correct understanding of the test by the participants is something to consider. On the other hand, the FaceReader showed great potential but also more complexity; the need for data interpretation and optimal recording conditions made it difficult to use. If correctly combined, the data from both tools can bring more depth to the overall analysis.

Due to time and resource limitations, no other measuring tools were used in this project. It is recommended, though, that future studies include more tools, such as ECG or EEG, which could contribute to the understanding of the studied phenomena from a broader perspective.

Finally, it is concluded that the general goal of this project was achieved, successfully creating a protocol that allows for the measurement of reactions to multisensory stimuli.

Hopefully, this will contribute positively to the larger project that aims at finding conclusive results on this matter, by providing a solid design of the sound stimuli and the presentation system, from both a theoretical and a practical point of view.


REFERENCES

[1] E. Niedermeyer, D. L. Schomer, and F. H. Lopes da Silva, Niedermeyer's Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Wolters Kluwer/Lippincott Williams & Wilkins Health, 2011.
[2] S. M. Lee and S. K. Lee, "Objective evaluation of human perception of automotive sound based on physiological signal of human brain," Int. J. Automot. Technol., vol. 15, no. 2, pp. 273–282, 2014.
[3] M. Omata, K. Ashihara, M. Koubori, Y. Moriya, M. Kyoso, and S. Kiryu, "A psycho-acoustic measurement and ABR for the sound signals in the frequency range between 10 kHz and 24 kHz," pp. 1–5, 2008.
[4] T. Oohashi, E. Nishina, N. Kawai, Y. Fuwamoto, and H. Imai, "High-frequency sound above the audible range affects brain electric activity and sound perception," Audio Eng. Soc. Convention, 1991.
[5] B. Liu, Y. Lin, X. Gao, and J. Dang, "Correlation between audio-visual enhancement of speech in different noise environments and SNR: A combined behavioral and electrophysiological study," Neuroscience, vol. 247, pp. 145–151, 2013.
[6] Y. Lin, B. Liu, Z. Liu, and X. Gao, "EEG gamma-band activity during audiovisual speech comprehension in different noise environments," Cogn. Neurodyn., vol. 9, no. 4, pp. 389–398, 2015.
[7] J.-N. Antons, A. K. Porbadnigk, R. Schleicher, B. Blankertz, S. Möller, and G. Curio, "Subjective listening tests and neural correlates of speech degradation in case of signal-correlated noise," Audio Eng. Soc., no. 100, pp. 2–5, 2010.
[8] L. M. Luxon and D. Prasher, Noise and Its Effects, 1st ed. Wiley, 2007.
[9] C. P. Beaman, "Auditory distraction from low-intensity noise: A review of the consequences for learning and workplace environments," Appl. Cogn. Psychol., vol. 19, no. 8, pp. 1041–1064, 2005.
[10] S. Hygge and I. Knez, "Effects of noise, heat and indoor lighting on cognitive performance and self-reported affect," J. Environ. Psychol., vol. 21, no. 3, pp. 291–299, 2001.
[11] A. Liebl, J. Haller, B. Jödicke, H. Baumgartner, S. Schlittmeier, and J. Hellbrück, "Combined effects of acoustic and visual distraction on cognitive performance and well-being," Appl. Ergon., vol. 43, no. 2, pp. 424–434, 2012.
[12] S. Banbury and D. C. Berry, "Disruption of office-related tasks by speech and office noise," Br. J. Psychol., vol. 89, pp. 499–517, 1998.
[13] P. Roelofsen, "Performance loss in open-plan offices due to noise by speech," J. Facil. Manag., vol. 6, no. 3, pp. 202–211, Jul. 2008.
[14] S. J. Schlittmeier, J. Hellbrück, R. Thaden, and M. Vorländer, "The impact of background speech varying in intelligibility: Effects on cognitive performance and perceived disturbance," Ergonomics, vol. 51, no. 5, pp. 719–736, May 2008.
[15] J. P. Cowan, The Effects of Sound on People, 1st ed. Chichester: John Wiley & Sons, Ltd, 2016.
[16] C. Kasprzak, "The influence of noise from wind turbines on EEG signal patterns in humans," Acta Phys. Pol. A, vol. 125, no. 4-A, pp. 20–23, 2014.
[17] C. Kasprzak, "The effect of the narrow-band noise in the range 4-8 Hz on the alpha waves in the EEG signal," Acta Phys. Pol. A, vol. 123, no. 6, pp. 980–983, 2013.
[18] K. Inui, T. Urakawa, and K. Yamashiro, "Echoic memory of a single pure tone indexed by change-related brain activity," BMC …, vol. 11, no. 1, p. 135, 2010.
[19] B. W. Johnson, S. D. Muthukumaraswamy, W. C. Gaetz, and D. O. Cheyne, "Neuromagnetic and neuroelectric oscillatory responses to acoustic stimulation with broadband noise," Int. Congr. Ser., vol. 1300, pp. 41–44, 2007.
[20] E. Manjarrez, I. Mendez, L. Martinez, A. Flores, and C. R. Mirasso, "Effects of auditory noise on the psychophysical detection of visual signals: Cross-modal stochastic resonance," Neurosci. Lett., vol. 415, no. 3, pp. 231–236, 2007.
[21] M. H. Thaut, Rhythm, Music, and the Brain: Scientific Foundations and Clinical Applications. Routledge, 2005.
[22] T. Egner and J. Gruzelier, "Ecological validity of neurofeedback: Modulation of slow wave EEG enhances musical performance," Neuroreport, vol. 14, no. 9, pp. 1221–1224, 2003.
[23] O. Sourina, Y. Liu, and M. K. Nguyen, "Real-time EEG-based emotion recognition for music therapy," J. Multimodal User Interfaces, vol. 5, no. 1–2, pp. 27–35, 2012.
[24] P. N. Juslin and D. Västfjäll, "Emotional responses to music: The need to consider underlying mechanisms," Behav. Brain Sci., vol. 31, no. 5, pp. 559–575, 2008.
[25] I. Daly et al., "Music-induced emotions can be predicted from a combination of brain activity and acoustic features," Brain Cogn., vol. 101, pp. 1–11, 2015.
[26] I. Cross, S. Hallam, and M. Thaut, The Oxford Handbook of Music Psychology. Oxford: Oxford University Press, 2008.
[27] C. Marquis-Favre, E. Premat, D. Aubrée, and M. Vallet, "Noise and its effects – A review on qualitative aspects of sound. Part II: Noise and annoyance," Acta Acust. united with Acust., vol. 91, pp. 626–642, 2005.
[28] M. A. Henríquez and A. D. Londoño, "Evaluación de auralizaciones creadas mediante métodos numéricos basados en acústica geométrica y reproducidas en el sistema de reproducción binaural Opsodis," Universidad de San Buenaventura, 2015.
[29] J. C. Rodriguez and A. Naranjo, "Evaluación de auralizaciones obtenidas combinando métodos de elementos finitos y acústica geométrica en dos recintos y su aplicación en la valoración acústica de uno de ellos," Universidad de San Buenaventura, 2015.
[30] D. Q. Vertel, "Análisis del impacto de las condiciones acústicas en un aula de enseñanza sobre los procesos cognitivos mediante auralizaciones," Universidad de San Buenaventura, 2015.
[31] D. C. P. B. Ochoa and S. Escobar, "Análisis del impacto de ruido de fondo y tiempo de reverberación en procesos cognitivos por medio de auralizaciones," Universidad de San Buenaventura, 2015.
[32] H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746–748, 1976.
[33] J. Udesen, T. Piechowiak, and F. Gran, "The effect of vision on psychoacoustic testing with headphone-based virtual sound," J. Audio Eng. Soc., vol. 63, no. 7–8, pp. 552–561, 2015.
[34] S. Yuval-Greenberg and L. Y. Deouell, "What you see is not (always) what you hear: Induced gamma band responses reflect cross-modal interactions in familiar object recognition," J. Neurosci., vol. 27, no. 5, pp. 1090–1096, 2007.
[35] D. R. Tobergte and S. Curtis, The Handbook of Multisensory Processes, vol. 53, no. 9, 2013.
[36] W. M. Hartmann, Signals, Sound and Sensation, 1st ed. Michigan: Springer.
[37] "UNE-ISO 1996-2," 1998.
[38] J. Rennies and J. L. Verhey, "Temporal weighting in loudness of broadband and narrowband signals," J. Acoust. Soc. Am., vol. 126, no. 3, pp. 951–954, 2009.
[39] ISO, "Acoustics – Description, measurement and assessment of environmental noise. Part 1: Basic quantities and assessment procedures (ISO 1996-1:2003)," 2003.
[40] J. Blauert, The Technology of Binaural Listening, 1st ed. Bochum: Springer, 2014.
[41] Burkhard and Sachs, "Anthropometric manikin for acoustic research," 1975.
[42] M. Vorländer, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Springer, 2008.
[43] K. Drossos, A. Floros, A. Giannakoulopoulos, and N. Kanellopoulos, "Investigating the impact of sound angular position on the listener affective state," IEEE Trans. Affect. Comput., vol. 6, no. 1, pp. 27–42, 2015.
[44] S. A. Gelfand, Hearing: An Introduction to Psychological and Physiological Acoustics, 5th ed. London: Informa Healthcare, 2010.
[45] S. A. Gelfand, Hearing, 5th ed. New York: Informa Healthcare, 2010.
[46] P. Ekman and E. Rosenberg, What the Face Reveals. 2005.
[47] M. M. Bradley and P. J. Lang, "Measuring emotion: The self-assessment manikin and the semantic differential," J. Behav. Ther. Exp. Psychiatry, vol. 25, no. 1, pp. 49–59, 1994.
[48] F. Weninger, F. Eyben, B. W. Schuller, M. Mortillaro, and K. R. Scherer, "On the acoustics of emotion in audio: What speech, music, and sound have in common," Front. Psychol., vol. 4, pp. 1–12, 2013.
[49] J. Redondo, I. Fraga, I. Padrón, and M. Comesaña, "The Spanish adaptation of ANEW (Affective Norms for English Words)," Behav. Res. Methods, vol. 39, no. 3, pp. 600–605, 2007.
[50] V. Terzis, C. N. Moridis, and A. A. Economides, "Measuring instant emotions during a self-assessment test," in Proc. 7th Int. Conf. on Methods and Techniques in Behavioral Research (MB '10), 2010, pp. 1–4.
[51] D. Oberfeld, W. Heeren, J. Rennies, and J. Verhey, "Spectro-temporal weighting of loudness," PLoS One, vol. 7, no. 11, 2012.
[52] J. L. Verhey and A.-K. Anweiler, "Spectral loudness summation for short and long signals as a function of level," Acoust. Soc. Am., vol. 45, no. 5, pp. 287–294, 2006.
[53] D. U. Ruiz, "Impacto de las condiciones acústicas en la inteligibilidad y la dificultad de escucha en tres aulas de la Universidad de San Buenaventura Medellín, sede San Benito," vol. 1, 2015.
[54] C. A. Gantiva Díaz, P. Guerra Muñoz, and J. Vila Castelar, "Colombian validation of the International Affective Picture," Acta Colomb. Psicol., vol. 14, no. 2, pp. 103–111, 2011.
[55] J. O. Villegas, "Estimación del coeficiente de absorción en incidencia aleatoria utilizando presión y velocidad de partícula mediante la sonda PU de Microflown Technologies," 2015.


APPENDIX

A. FILTER CHARACTERISTICS

Fig. 18 125 Hz filter designed in Matlab with its cut frequencies and characteristics


Fig. 19 500 Hz filter designed in Matlab with its cut frequencies and characteristics

Fig. 20 3150 Hz filter designed in Matlab with its cut frequencies and characteristics

B. MATLAB CODE FOR SIGNAL PROCESSING

%% NOISE GENERATION
Fs = 44100;                      % sampling frequency in Hz
d  = 10;                         % duration in seconds
RB = 1.5*(rand(Fs*d,1) - 0.5);   % uniform white noise, zero mean
load('variables6');              % precomputed filter coefficients (kaiss125wide, kaiss500wide, ...)
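% --- Illustrative sketch (not part of the original script) ----------------
% The band-pass coefficients loaded from 'variables6' (kaiss125wide, etc.)
% could have been designed, for example, as Kaiser-window FIR filters with
% fir1. The filter order, window beta and the exact 5/3-octave band edges
% used here are assumptions for illustration only, not the original values.
fc    = 500;                         % example centre frequency in Hz
f_lo  = fc*2^(-5/6);                 % lower edge of a 5/3-octave band
f_hi  = fc*2^( 5/6);                 % upper edge of a 5/3-octave band
order = 2000;                        % assumed FIR order
beta  = 5;                           % assumed Kaiser window beta
b_500 = fir1(order, [f_lo f_hi]/(Fs/2), 'bandpass', kaiser(order+1, beta));
% b_500 would then play the role of kaiss500wide in the filtfilt calls below.
% ---------------------------------------------------------------------------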

%% FILTERING

% 5/3 octave filters (zero-phase filtering with filtfilt)
F_M_125  = filtfilt(kaiss125wide, 1, RB);
F_M_500  = filtfilt(kaiss500wide, 1, RB);
F_M_3150 = filtfilt(kaiss3150wide, 1, RB);

%% CREATE AUDIO FILE
audiowrite('F_M_125.wav',  F_M_125,  Fs);
audiowrite('F_M_500.wav',  F_M_500,  Fs);
audiowrite('F_M_3150.wav', F_M_3150, Fs);

%% FILTERING OF RECORDED NOISE WITH DUMMY HEAD

% Import the white noise recorded with the dummy head
[W_E, FsW] = audioread('W_E.wav');

F_E2_125  = filtfilt(kaiss125wide, 1, W_E);
F_E2_500  = filtfilt(kaiss500wide, 1, W_E);
F_E2_3150 = filtfilt(kaiss3150wide, 1, W_E);

%% IMPORT CONVOLVED AUDIO

% Import the sounds that were filtered and then convolved in Audacity

[F_BIR_125]  = audioread('F_BIR_125.wav');
[F_BIR_500]  = audioread('F_BIR_500.wav');
[F_BIR_3150] = audioread('F_BIR_3150.wav');

% Import the sounds that were filtered and captured with the binaural (dummy) head

[F_E1_125]  = audioread('F_E1_125.wav');
[F_E1_500]  = audioread('F_E1_500.wav');
[F_E1_3150] = audioread('F_E1_3150.wav');

% Create the sounds from the white noise recorded with the binaural head

%% RESIZE VECTORS

% Trim all signals to the same length: 10 s at 44.1 kHz (441000 samples)
F_BIR_125  = F_BIR_125(1:441000,:);
F_BIR_500  = F_BIR_500(1:441000,:);
F_BIR_3150 = F_BIR_3150(1:441000,:);

F_E1_125  = F_E1_125(1:441000,:);
F_E1_500  = F_E1_500(1:441000,:);
F_E1_3150 = F_E1_3150(1:441000,:);

F_E2_125  = F_E2_125(1:441000,:);
F_E2_500  = F_E2_500(1:441000,:);
F_E2_3150 = F_E2_3150(1:441000,:);

%% NORMALIZE

% F_M_XXX

% Normalize peak amplitude to 1/1.1 of full scale (about -0.8 dBFS) to avoid clipping
F_M_125  = F_M_125 /(max(abs(F_M_125)) *1.1);

F_M_500  = F_M_500 /(max(abs(F_M_500)) *1.1);

F_M_3150 = F_M_3150/(max(abs(F_M_3150))*1.1);

% F_BIR_XXX

F_BIR_125_L = F_BIR_125(:,1)/(max(abs(F_BIR_125(:,1)))*1.1);
F_BIR_125_R = F_BIR_125(:,2)/(max(abs(F_BIR_125(:,2)))*1.1);
F_BIR_125   = [F_BIR_125_L, F_BIR_125_R];

F_BIR_500_L = F_BIR_500(:,1)/(max(abs(F_BIR_500(:,1)))*1.1);
F_BIR_500_R = F_BIR_500(:,2)/(max(abs(F_BIR_500(:,2)))*1.1);
F_BIR_500   = [F_BIR_500_L, F_BIR_500_R];

F_BIR_3150_L = F_BIR_3150(:,1)/(max(abs(F_BIR_3150(:,1)))*1.1);
F_BIR_3150_R = F_BIR_3150(:,2)/(max(abs(F_BIR_3150(:,2)))*1.1);
F_BIR_3150   = [F_BIR_3150_L, F_BIR_3150_R];

% F_E1_XXX

F_E1_125_L = F_E1_125(:,1)/(max(abs(F_E1_125(:,1)))*1.1);
F_E1_125_R = F_E1_125(:,2)/(max(abs(F_E1_125(:,2)))*1.1);
F_E1_125   = [F_E1_125_L, F_E1_125_R];

F_E1_500_L = F_E1_500(:,1)/(max(abs(F_E1_500(:,1)))*1.1);
F_E1_500_R = F_E1_500(:,2)/(max(abs(F_E1_500(:,2)))*1.1);
F_E1_500   = [F_E1_500_L, F_E1_500_R];

F_E1_3150_L = F_E1_3150(:,1)/(max(abs(F_E1_3150(:,1)))*1.1);
F_E1_3150_R = F_E1_3150(:,2)/(max(abs(F_E1_3150(:,2)))*1.1);
F_E1_3150   = [F_E1_3150_L, F_E1_3150_R];

% F_E2_XXX

F_E2_125_L = F_E2_125(:,1)/(max(abs(F_E2_125(:,1)))*1.1);
F_E2_125_R = F_E2_125(:,2)/(max(abs(F_E2_125(:,2)))*1.1);
F_E2_125   = [F_E2_125_L, F_E2_125_R];

F_E2_500_L = F_E2_500(:,1)/(max(abs(F_E2_500(:,1)))*1.1);
F_E2_500_R = F_E2_500(:,2)/(max(abs(F_E2_500(:,2)))*1.1);
F_E2_500   = [F_E2_500_L, F_E2_500_R];

F_E2_3150_L = F_E2_3150(:,1)/(max(abs(F_E2_3150(:,1)))*1.1);
F_E2_3150_R = F_E2_3150(:,2)/(max(abs(F_E2_3150(:,2)))*1.1);
F_E2_3150   = [F_E2_3150_L, F_E2_3150_R];

%% PLOT SPECTRUM
% Frequency axis for the single-sided spectrum (220500 bins up to Fs/2)
freq = linspace(0, Fs/2, 220500);

hold on
grid on

subplot(2,2,1)
prueba = abs(fft(F_M_500(:,1)));
prueba = prueba(1:end/2);
semilogx(freq, 10*log10(prueba), 'r')

subplot(2,2,2)
prueba2 = abs(fft(F_BIR_500(:,1)));
prueba2 = prueba2(1:end/2);
semilogx(freq, 10*log10(prueba2), 'b')

subplot(2,2,3)
prueba3 = abs(fft(F_E1_500(:,1)));
prueba3 = prueba3(1:end/2);
semilogx(freq, 10*log10(prueba3), 'g')

subplot(2,2,4)
prueba4 = abs(fft(F_E2_500(:,1)));
prueba4 = prueba4(1:end/2);
semilogx(freq, 10*log10(prueba4), 'y')


%% PLOT PSD
hold on

%figure
subplot(2,2,1)
pwelch(F_M_125,[],[],[],Fs)
subplot(2,2,2)
pwelch(F_BIR_125,[],[],[],Fs)
subplot(2,2,3)
pwelch(F_E1_125,[],[],[],Fs)
subplot(2,2,4)
pwelch(F_E2_125,[],[],[],Fs)

hold on

%figure
subplot(2,2,1)
pwelch(F_M_500,[],[],[],Fs)
subplot(2,2,2)
pwelch(F_BIR_500,[],[],[],Fs)
subplot(2,2,3)
pwelch(F_E1_500,[],[],[],Fs)
subplot(2,2,4)
pwelch(F_E2_500,[],[],[],Fs)

hold on

%figure
subplot(2,2,1)
pwelch(F_M_3150,[],[],[],Fs)
subplot(2,2,2)
pwelch(F_BIR_3150,[],[],[],Fs)
subplot(2,2,3)
pwelch(F_E1_3150,[],[],[],Fs)
subplot(2,2,4)
pwelch(F_E2_3150,[],[],[],Fs)

%% CREATE AUDIO FILES

audiowrite('F_E2_125.wav',  F_E2_125,  Fs);
audiowrite('F_E2_500.wav',  F_E2_500,  Fs);
audiowrite('F_E2_3150.wav', F_E2_3150, Fs);
audiowrite('F_BIR_125.wav',  F_BIR_125,  Fs);
audiowrite('F_BIR_500.wav',  F_BIR_500,  Fs);
audiowrite('F_BIR_3150.wav', F_BIR_3150, Fs);
audiowrite('F_E1_125.wav',  F_E1_125,  Fs);
audiowrite('F_E1_500.wav',  F_E1_500,  Fs);
audiowrite('F_E1_3150.wav', F_E1_3150, Fs);
audiowrite('F_M_125.wav',  F_M_125,  Fs);
audiowrite('F_M_500.wav',  F_M_500,  Fs);
audiowrite('F_M_3150.wav', F_M_3150, Fs);

%% VARIABLE BANDWIDTH FILTERING (BARK)

[Bark_125_W]  = filtfilt(Bark125,1,F_M_125);
[Bark_500_W]  = filtfilt(Bark500,1,F_M_500);
[Bark_3150_W] = filtfilt(Bark3150,1,F_M_3150);

% P_125, P_500 and P_3150 are assumed here to be the pink-noise counterparts
% of F_M_XXX, already available in the workspace (e.g. loaded from 'variables6')
[Bark_125_P]  = filtfilt(Bark125,1,P_125);
[Bark_500_P]  = filtfilt(Bark500,1,P_500);
[Bark_3150_P] = filtfilt(Bark3150,1,P_3150);

%% CREATE FILES WITH VARIABLE BANDWIDTH
audiowrite('Bark_125_W.wav',Bark_125_W,Fs);
audiowrite('Bark_500_W.wav',Bark_500_W,Fs);
audiowrite('Bark_3150_W.wav',Bark_3150_W,Fs);
audiowrite('Bark_125_P.wav',Bark_125_P,Fs);
audiowrite('Bark_500_P.wav',Bark_500_P,Fs);
audiowrite('Bark_3150_P.wav',Bark_3150_P,Fs);

%% RESIZE VARIABLE BANDWIDTH FILES

[Bark_125_BIR_W]  = audioread('Conv_125_W.wav');
[Bark_500_BIR_W]  = audioread('Conv_500_W.wav');
[Bark_3150_BIR_W] = audioread('Conv_3150_W.wav');

[Bark_125_BIR_P]  = audioread('Conv_125_P.wav');
[Bark_500_BIR_P]  = audioread('Conv_500_P.wav');
[Bark_3150_BIR_P] = audioread('Conv_3150_P.wav');

Bark_125_BIR_W  = Bark_125_BIR_W(1:441000,:);
Bark_500_BIR_W  = Bark_500_BIR_W(1:441000,:);
Bark_3150_BIR_W = Bark_3150_BIR_W(1:441000,:);

Bark_125_BIR_P  = Bark_125_BIR_P(1:441000,:);
Bark_500_BIR_P  = Bark_500_BIR_P(1:441000,:);
Bark_3150_BIR_P = Bark_3150_BIR_P(1:441000,:);

audiowrite('Bark_125_BIR_W.wav',Bark_125_BIR_W,Fs);
audiowrite('Bark_500_BIR_W.wav',Bark_500_BIR_W,Fs);
audiowrite('Bark_3150_BIR_W.wav',Bark_3150_BIR_W,Fs);
audiowrite('Bark_125_BIR_P.wav',Bark_125_BIR_P,Fs);
audiowrite('Bark_500_BIR_P.wav',Bark_500_BIR_P,Fs);
audiowrite('Bark_3150_BIR_P.wav',Bark_3150_BIR_P,Fs);

%% CONVOLVED AND THEN FILTERED WHITE NOISE

[RB_conv] = audioread('White_Conv.wav');

[Bark_125_W_2]  = filtfilt(Bark125,1,RB_conv);
[Bark_500_W_2]  = filtfilt(Bark500,1,RB_conv);
[Bark_3150_W_2] = filtfilt(Bark3150,1,RB_conv);

audiowrite('Bark_125_W_2.wav',Bark_125_W_2,Fs);
audiowrite('Bark_500_W_2.wav',Bark_500_W_2,Fs);
audiowrite('Bark_3150_W_2.wav',Bark_3150_W_2,Fs);


C. RESULTS FROM SOUND PRESSURE LEVEL TEST

Table II Valence Results

Table III Arousal Results


Table IV Dominance Results

D. RESULTS FROM AUDIOVISUAL TEST


Table V Values from Audiovisual Test Pt1.

Table VI Values from Audiovisual Test Pt2.

E. STATISTICAL DATA FOR THE AUDIOVISUAL TEST RESULTS


Table VII Shapiro-Wilk test for audiovisual test results.

Table VIII Kruskal Wallis test for audiovisual test results.

F. STATISTICAL DATA FOR THE SOUND PRESSURE LEVEL TEST RESULTS


Table IX Shapiro-Wilk test for sound pressure level test results.

G. DIAGRAM OF SYSTEM CONNECTION AND STIMULI PRESENTATION


Fig. 21 Diagram of System Connection and Stimuli Presentation