
How does binaural audio mixed for headphones translate to loudspeaker setups in terms of listener preferences?

Ian Eiderbo

Audio Technology, bachelor's level 2021

Luleå University of Technology
Department of Social Sciences, Technology and Arts

How does binaural audio mixed for headphones translate to loudspeaker setups in terms of listener preferences?

Abstract

While most of today’s music listening is done through headphones, mixing techniques using binaural audio are still not widely implemented in modern music production. This study aims to help inform mixing engineers about the applicability of binaural processing in music production. Its specific focus is how binaurally processed audio translates to loudspeakers in terms of listener preference. A listening test was performed in which binaurally processed mixes were given preference ratings in relation to a reference mix. Each listener completed the test twice, once using headphones and once using loudspeakers, and the results for the two playback systems were then compared. Only one of the 12 mixes showed a significant difference in preference ratings with playback system as the factor, but the reported ratings showed large disagreement among the 13 test subjects. The results of the study are inconclusive. However, they do not suggest that the binaural processing used for the stimuli suffers in terms of listener preference when played back over loudspeakers.

Table of contents
Abstract
1. Introduction
1.1 Background
1.2 Head related transfer functions (HRTFs)
1.3 Listening and monitoring on loudspeakers vs. headphones
1.4 Binaural processing
1.5 Compatibility issues
1.6 The research question
2. Method
2.1 Stimuli
2.1.1 Processing method for binaural positioning
2.1.2 Measurements of Ambeo Orbit
2.1.3 Panning
2.2 Subjects
2.3 Procedure
3. Results and Analysis
3.1 Preference with playback system as factor
3.2 Testing significance for preference differences between stimuli and the reference
4. Discussion
4.1 Comparing preference of stimuli with playback system as factor
4.2 Comparing preference of different stimuli with processing as factor
4.3 Keywords gathered from listener survey
4.3.1 Key points from listener comments
4.4 Critique of method
4.5 Findings
5. References
6. Appendix
6.1 Raw data
6.2 Written instructions for listeners


1. Introduction

This research aims to gain insight into how binaural processing for headphones affects listener preference when the material is played back on loudspeakers. Listeners rate their preference for mixes that apply binaural processing to different elements. Each of these mixes is rated in relation to a reference mix that is free from binaural processing.

1.1 Background

Consumer playback systems today look different from what they once did. With the advent of the iPod and smartphones, listening to music on headphones has become the most common way to experience music (Zinga, 2011). In this study, the term “headphones” refers to both headphones and earphones where a distinction is not necessary. According to a survey by IFPI, more than 60% of music listening time worldwide in 2019 was spent on either mobile devices or computers (IFPI, 2019). In China, India and Mexico, for example, almost every music listener uses their smartphone for music playback. By contrast, only 8% of the listening time in the same survey was spent on a hi-fi system. This suggests that few listeners hear music in the sweet spot of a correctly set up loudspeaker system, the way audio engineers monitor it. Most listeners instead listen through headphones. Still, it is common practice among audio engineers to monitor primarily on loudspeakers. Headphones are used for secondary checks, for specific tasks, or when conditions for using loudspeakers are less than optimal.

With the technological advances of the 21st century, more music is also being produced at home. In recent years the term “bedroom producer” has changed meaning. It no longer denotes only amateur producers but also covers a group of professional music makers who contribute a significant part of the music available to the public today. A well-designed monitoring environment demonstrably produces successful results across different kinds of playback systems, but a lot can be achieved without expensive control rooms. For the bedroom producer, headphones are a convenient solution to the problem of poor acoustic treatment. They are possibly also a well-suited monitoring system for music production, since so much music ends up being played over headphones and earbuds. According to Owsinski (2014), most monitoring practices work well if the engineer is familiar with the sound of the system. At the same time, he argues that music should be mixed to translate well to a wide range of playback systems. With so many listeners experiencing music mostly through headphones today, one might ask whether standard monitoring and mixing practices truly optimize the listening experience for the widest audience. Another question is whether optimizing for headphone playback is underused in audio engineering, for instance by applying binaural or HRTF techniques to one or several elements of a mix. The latter question is the one examined in this study.

1.2 Head related transfer functions (HRTFs)

Francis Rumsey (2011) writes that “The term binaural audio is commonly used to refer to systems or techniques that capture or simulate ‘head-related’ signals containing natural acoustic differences between the two ears” (p. 672). For binaural reproduction, the left and right channels normally need to be isolated from each other at the listener’s two ears, as is the case with headphones.
There are, however, techniques using controlled crosstalk cancellation that can create binaural audio for loudspeaker setups. In this study, though, the term “binaural” refers to techniques for audio played back over headphones.

The HRTF is an important aspect of how we hear sounds and localize their sources in a space. HRTFs consist of spectral alteration, the time difference between the sound arriving at the two ears, and the amplitude difference at the two ears. The human head and body obstruct some of the sound travelling in the room. This shadows high frequencies, so the spectral balance differs between the two ears when sound comes from the side. This is one cue for the human sense of auditory direction. Another cue is that the pinnae cause several resonances to form before the sound reaches the eardrum. Every ear is unique, so the sense of direction depends directly on the listener’s personal HRTF (Møller et al., 1996). This applies especially to the sense of a sound’s elevation. The brain interprets these direction-dependent resonances to help provide a sense of localization. A person’s HRTF is unique because of differences in head size, body size and ear shape. Nevertheless, several solutions use an approximated HRTF, such as Neumann’s KU100 dummy head. Møller et al. (1996) show that a personalized HRTF results in a better sense of direction. Generic HRTF models still give the listener a sense of depth and space. Furthermore, Merimaa (2009) concludes from his findings that the spectral coloration caused by the HRTF can be omitted from generic HRTF models without worsening the listener’s sense of position in the lateral plane.
A precise sense of direction is also arguably more important in a visual context. In music, preference does not necessarily correlate with the ability to place instruments at specific spots, since no visual reference is given. Generic HRTF models can also be made less complex than personal ones, using simpler algorithms and filters.

1.3 Listening and monitoring on loudspeakers vs. headphones

Practically all modern studios use a monitoring setup with at least two loudspeakers. The standard practice for audio engineers is to place the loudspeakers at ±30° from the center line in front of the listener’s sweet spot (ITU, 2012). This setup is tried and tested in the audio production industry and is used to produce music that translates well to different playback systems. Professional studios also take great care to minimize acoustic coloration of the frequency response at the mixing engineer’s position.

Two loudspeakers placed apart from each other and playing the same signal create the illusion that the sound originates from a point between them (Vickers, 2009). This is known as a phantom image. Two loudspeakers at different locations in the room also introduce a phenomenon known as crosstalk. The sound from the left loudspeaker reaches both ears, but at different times, and the same applies to the right loudspeaker (see figure 1). In contrast, the left and right channels are completely separated when played back over headphones. The audio played back by loudspeakers also interacts with the listener’s HRTF.

Figure 1 - The wavefront of audio from a sound source arrives at the two ears at different times because of the different distances the sound travels.


Moving the head outside the sweet spot drastically changes the frequency response at the eardrum when listening on loudspeakers. Lateral movement introduces amplitude and time changes to the sound at the listener’s ears. When a sound is not positioned straight in front of the listener, the resulting interaural time difference also gives rise to what is known as the Haas effect. Sounds arriving at one ear before the other appear to come from the direction the wavefront hit first, even when the amplitude is the same at both ears (Haas, 1972). Relatedly, incorrect vertical positioning might put the listener where frequencies around the crossover between the woofer and tweeter of the monitor cancel completely. This makes it important to place the loudspeakers at the correct height and angle (Bohn, 2005). Level differences also contribute to the sense of direction, and amplitude difference is the parameter that regular panners in mixers and DAWs use.

The sound from the two loudspeakers arrives at each ear at different points in time. With the standard ITU-R BS.775 loudspeaker placement, this combination creates a comb-filter effect that is very prominent around 2000 Hz, due to the approximate size of the human head (Vickers, 2009). Placing the loudspeakers closer together moves these notches up in frequency, since the interaural time difference decreases. However, it also narrows the perceived width of the program material. For this reason, the center speaker in a surround setup is generally perceived to make speech more intelligible, since it reproduces center content from a single source without this comb filtering. Another reason is that the range around 2000 Hz is important for speech. Clark (1986) describes how the comb filtering is not easily perceived in a room where early reflections and reverberation fill in the notches. He also states that listeners often comment on mono program material played back on a two-speaker system. It sounds fuller, more solid and deeper to them than playback over a single loudspeaker.
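A back-of-the-envelope calculation can check the figure of roughly 2000 Hz. The sketch below makes several assumptions. It uses Woodworth's spherical-head approximation for the ITD, ITD = (a/c)(θ + sin θ), with an assumed head radius of 8.75 cm and the speed of sound of 343 m/s used elsewhere in this study. It also treats the crosstalk at each ear as the same signal summed with a delayed copy, which ignores head shadowing of the far loudspeaker. Under these simplifications the first cancellation falls at 1/(2·ITD).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def woodworth_itd(azimuth_deg: float, radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """ITD in seconds for a distant source at the given azimuth (0-90 degrees),
    using Woodworth's spherical-head approximation."""
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


def first_comb_notch(delay_s: float) -> float:
    """First cancellation frequency in Hz when a signal is summed with a copy
    of itself delayed by delay_s (the copies are half a period apart)."""
    return 1.0 / (2.0 * delay_s)


if __name__ == "__main__":
    for angle in (30.0, 15.0):
        itd = woodworth_itd(angle)
        print(f"+/-{angle:.0f} deg: ITD = {itd * 1000:.3f} ms, "
              f"first notch = {first_comb_notch(itd):.0f} Hz")
```

Under these assumptions the ±30° placement gives an ITD of about 0.26 ms and a first notch a little below 2 kHz. Narrowing the placement to ±15° moves the notch up to roughly 3.8 kHz, which matches the direction of the effect described above.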

Listening on headphones differs from listening on loudspeakers in several respects. First, there is no crosstalk between the left and right channels, so there is no inter-channel comb filtering. The frequency response is therefore free from the notches present in stereo loudspeaker playback. Research does, however, show that people experience the phantom image differently on headphones because of this lack of crosstalk (Vickers, 2009). Listeners generally perceive the phantom image on headphones as coming from inside the head rather than from a physical space between two loudspeakers.

One advantage of using headphones is that they eliminate the need to deal with room acoustics. In addition, some types of headphones attenuate background noise considerably. For instance, it is not unusual for in-ear monitors to attenuate outside noise by more than 25 dB, while open-back headphones barely attenuate it at all. The listening device sits on the listener’s head or in the ears, so head movement does not affect the frequency or time response of the monitoring system. This gives the listener freedom of movement while retaining the headphones’ frequency response. The closeness to the eardrum also eliminates the room’s absorption of high frequencies. As a result, music played back on headphones with a flat frequency response can sound overly bright. A study by Olive et al. (2013) shows that listeners usually prefer a headphone target response that is not flat. Instead, it is attenuated in a way that mimics the in-room response of a loudspeaker setup.


1.4 Binaural processing

Today, audio engineers can choose from many DSP software applications that emulate in-room loudspeaker playback on headphones. These applications are a possible solution to the problem of an internalized phantom image. The processing introduces crosstalk between the left and right channels, often in combination with filters, delays and reverberation algorithms. Together these simulate the head related transfer functions (HRTFs) of stereo loudspeakers in an acoustic space. Some applications, for example Waves Abbey Road Studio 3, even incorporate head tracking to simulate movement in relation to the virtual speakers (Waves, 2020). However, most listeners do not use such software when listening on headphones. Monitoring a full mix through binaural processing may therefore cause the same translation problems for plain headphone playback that loudspeaker monitoring can.
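The following Python sketch illustrates the crosstalk principle with a basic crossfeed. It is not the algorithm of Abbey Road Studio 3 or any other product. Each output channel receives a delayed, attenuated and low-passed copy of the opposite channel, which approximates loudspeaker crosstalk. The parameter values are hypothetical defaults chosen for illustration. The 0.26 ms delay is an assumed value of the order of the interaural delay for loudspeakers at ±30°. Real products use measured HRTFs and room simulation.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """One-pole low-pass filter, a very rough stand-in for head shadowing."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def crossfeed(left, right, sample_rate=44100, delay_ms=0.26,
              gain_db=-6.0, cutoff_hz=700.0):
    """Add a delayed, attenuated, low-passed copy of each channel to the
    opposite channel, imitating loudspeaker crosstalk on headphones.
    All default parameter values are illustrative assumptions."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    n = len(left)
    d = round(delay_ms * sample_rate / 1000.0)  # crosstalk delay in samples
    g = 10.0 ** (gain_db / 20.0)                # crosstalk gain as a factor

    def bleed(src):
        shadowed = one_pole_lowpass(src, cutoff_hz, sample_rate)
        delayed = ([0.0] * d + shadowed)[:n]    # shift right by d samples
        return [g * s for s in delayed]

    r_to_l, l_to_r = bleed(right), bleed(left)
    out_l = [left[i] + r_to_l[i] for i in range(n)]
    out_r = [right[i] + l_to_r[i] for i in range(n)]
    return out_l, out_r


if __name__ == "__main__":
    impulse_left = [1.0] + [0.0] * 99
    silent_right = [0.0] * 100
    out_l, out_r = crossfeed(impulse_left, silent_right)
    first = next(i for i, v in enumerate(out_r) if v != 0.0)
    print(f"crosstalk reaches the right channel after {first} samples")
```

With these defaults, an impulse in the left channel appears in the right channel 11 samples later (0.26 ms at 44.1 kHz), softened and about 6 dB quieter. This is the same kind of inter-channel bleed that two loudspeakers produce acoustically.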

1.5 Compatibility issues

Compatibility between different playback systems is still relevant to the mixing engineer, since music is in most cases produced to be heard in many different scenarios. Today’s mixing engineer knows that commercial music is played back on smartphones, on TV, on radio, in venues with large PA systems and so on. Music therefore needs to be mixed to translate well to many different playback systems, but there will always be tradeoffs. Headphones are the most common playback system. It would therefore be valuable for the mixing engineer to know how binaural mixing techniques affect the compatibility of a mix with other playback systems. Headphones have specific mix-translation issues and account for a substantial share of today’s listening time. One might therefore ask how monitoring and mixing practices could be optimized to deliver a product suited to the widest range of listeners. On the one hand, headphones let the audio engineer create an immersive experience. Ironically, the common complaint about headphones is that the sound lacks the depth of a stereo loudspeaker setup. Listeners also complain that the phantom center image appears to come from inside their head (Vickers, 2009).

1.6 The research question

A major part of music listening is done on a playback system that is well suited to binaural processing. Research is needed on how well binaural audio translates to systems other than headphones. This would show whether binaural techniques can be applied more broadly as a tool for the music mixing engineer. Stereo material is checked for mono compatibility to make sure it translates to monophonic playback systems. In the same way, increased use of binaural processing and recording techniques would need safeguards against poor translation to loudspeaker setups.
Little research has been done on the translation of binaural audio to loudspeakers in terms of listener preference. The sense of space and direction that binaural audio gives on headphones might not transfer to loudspeakers. But how do binaural recordings and processing perform on loudspeakers in terms of listener preference? Binaurally processed recordings could be compared with other stereo techniques in loudspeaker playback. Such a comparison would show mixing engineers how widely binaural techniques could be applied, whether to individual instruments or to full mixes. Binaural recording and processing could be a very valuable mixing tool in an age where common consumer technology lends itself well to a binaural experience.


One hypothesis is that the spectral alteration of the HRTF will have a significant impact on listener preference, with more HRTF coloration of the frequency content lowering preference. The test therefore tries different amounts of spectral HRTF processing. Another hypothesis concerns binaural processing played back on loudspeakers. It might work well for instruments with little transient content, like a pad or piano. For percussive sounds, however, the time differences introduced between the channels might smear the transients on loudspeakers, making them less preferred by listeners.


2. Method

The purpose of the experiment is to see whether preference differs between headphone and loudspeaker playback. The comparison depends on how the components of the program material are placed. They are either panned with a traditional amplitude-based panner or positioned in the lateral plane with binaural processing. The listening tests used Ambeo Orbit, a plugin for binaural processing, for the binaural positioning (Sennheiser, 2020). Ambeo Orbit was first tested to see exactly how it affects the signal passed through it, so that the variables could be examined in depth.

2.1 Stimuli

The program material consists of 13 different mixes that combine binaural processing and standard panning in different ways. Only positioning in the lateral plane using Ambeo Orbit was used in the tests (see figure 2). Only positions in front of the listener were used, because of the limitations of standard panning. Standard panning cannot use psychoacoustic cues to differentiate between positions in front of and behind the listener. The mixes were prepared on headphones, as that is the intended playback system for binaural processing. The mixes consist of the following three elements:

• A stereo drum loop consisting of several percussive sounds, with every sound within the loop either panned or binaurally processed individually.
• A stereo pad with no transients.
• A mono lead sound with soft transients.

Figure 2 - The interface of Ambeo Orbit.

Each element was binaurally processed in three ways using Ambeo Orbit. The versions differ only in the setting of the plugin’s “clarity” parameter: 0%, 50% or 100%. Every binaurally processed element is placed in the context of a mix where the other elements are panned using standard panning. For example, one mix might have the drum loop and pad panned with standard panning, with the lead sound positioned using binaural processing at a clarity value of 50%. Additionally, three mixes have all elements binaurally positioned, one for each clarity value. In a paired-comparison listening test, every mix was compared against a reference mix consisting only of standard-panned elements. This gives a total of 12 mixes plus one reference mix (see table 1).

Table 1 - A list of the 13 mixes used for the listening tests.
1. All elements binaurally processed using a clarity value of 0%
2. All elements binaurally processed using a clarity value of 50%
3. All elements binaurally processed using a clarity value of 100%
4. Only drums binaurally processed using a clarity value of 0%
5. Only drums binaurally processed using a clarity value of 50%
6. Only drums binaurally processed using a clarity value of 100%
7. Only melody binaurally processed using a clarity value of 0%
8. Only melody binaurally processed using a clarity value of 50%
9. Only melody binaurally processed using a clarity value of 100%
10. Only pad binaurally processed using a clarity value of 0%
11. Only pad binaurally processed using a clarity value of 50%
12. Only pad binaurally processed using a clarity value of 100%
13. No binaural processing (reference mix)
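The stimulus set in Table 1 is a 4 × 3 design: four processing conditions times three clarity values, plus one reference. The hypothetical helper below only illustrates that structure. It regenerates the list in the same order as Table 1.

```python
from itertools import product

# Processing conditions and clarity settings as described in section 2.1.
CONDITIONS = ["All elements", "Only drums", "Only melody", "Only pad"]
CLARITY_VALUES = [0, 50, 100]


def build_stimulus_list():
    """Return the 12 processed mixes in Table 1 order, followed by the reference."""
    stimuli = [
        f"{condition} binaurally processed using a clarity value of {clarity}%"
        for condition, clarity in product(CONDITIONS, CLARITY_VALUES)
    ]
    stimuli.append("No binaural processing (reference mix)")
    return stimuli


if __name__ == "__main__":
    for number, label in enumerate(build_stimulus_list(), start=1):
        print(f"{number}. {label}")
```

`itertools.product` varies the clarity value fastest. This is why each condition appears with 0%, 50% and 100% before the next condition starts, as in the table.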


The sound design was deliberately kept simple, derived entirely from a monophonic synth, a polyphonic synth and drum machine samples. The main reason is that natural reverberation in recordings could complicate the isolation of the variables in the listening test. By isolating the different elements in the mix, confounding factors can be controlled for during analysis.

2.1.1 Processing method for binaural positioning

Sennheiser’s plugin Ambeo Orbit was chosen because of the properties described in this section. It is a free plugin that uses binaural processing to place audio sources in the lateral and longitudinal plane. Its algorithms are based on the Neumann KU100 dummy head microphone, which is widely used in the audio industry. The plugin has a variety of features, but only two parameters are used for mono signals and a third for stereo signals. These are the “clarity” value, the “positioning” in the lateral plane, and the “width” function for stereo signals. The “clarity” parameter decreases the amount of spectral coloration that the plugin emulates from the KU100 dummy head. Tests conducted by the author show that a higher “clarity” value decreases the simulated spectral coloration. The effect is strongest on the side of the lateral plane where the sound source is positioned. At 100% “clarity”, spectral coloration on the ipsilateral side stays within approximately 1 dB of the original sound. On the contralateral side, the coloration is kept mostly intact. A “clarity” value of 0% keeps the full emulated spectral coloration of the KU100. The “clarity” parameter is potentially useful in this test because spectral timbre is one variable that might affect listener preference. Varying the “clarity” value can show whether this parameter correlates with listener preference.
2.1.2 Measurements of Ambeo Orbit

Ambeo Orbit simulates the HRTF of a KU100 dummy head microphone. This includes the frequency response, the interaural time difference (ITD) and the amplitude of the left and right channels, depending on the position chosen in the plugin. These features were measured to see exactly how the plugin processes audio. The measurements were made before the program material was designed and the listening test was run.

Measuring ITD

An impulse was played on a track in Cubase with the plugin inserted after it in the signal chain. Three lateral placement settings in Ambeo Orbit were used: 0°, 45° Left and 90° Left. The resulting impulse response was recorded. The number of samples N between the onsets of the two channels was then measured visually in Cubase. The project ran at a sample rate of 44.1 kHz (44100 samples per second), so the ITD between the two channels was calculated as:

$$\text{ITD (ms)} = \frac{1000\ \text{ms} \times N}{44100}$$

No ITD was found at the 0° placement. For the 45° setting the measured ITD was 0,43 ms and for 90° it was 0,77 ms. Assuming a speed of sound of 343 m/s at room temperature, 0,77 ms corresponds to a path-length difference of about 26 cm between the ears. This is roughly what one would expect from a plugin that mimics the HRTF of a dummy head microphone. The results of these measurements are shown in table 2.

Table 2 - ITD measurements from Ambeo Orbit’s processing.
Angle | 0° | 45° Left | 90° Left
ITD | 0 ms | 0,43 ms | 0,77 ms
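The sketch below converts a measured sample offset to an ITD, and the ITD to a path-length difference. The exact sample counts are not reported. The values 19 and 34 are used only because they are the whole-sample offsets that round to the reported 0,43 ms and 0,77 ms at 44.1 kHz. The function names are illustrative.

```python
SAMPLE_RATE = 44100     # Hz, the project sample rate (44.1 kHz)
SPEED_OF_SOUND = 343.0  # m/s, as assumed in the text


def samples_to_ms(n_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Convert an inter-channel offset in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate


def path_difference_cm(itd_ms: float, c: float = SPEED_OF_SOUND) -> float:
    """Extra distance in cm that sound travels to the far ear for a given ITD."""
    return itd_ms / 1000.0 * c * 100.0


if __name__ == "__main__":
    # Back-calculated offsets (assumption): the only whole-sample counts that
    # round to the reported 0,43 ms and 0,77 ms.
    for angle, n in (("45 Left", 19), ("90 Left", 34)):
        itd = samples_to_ms(n)
        print(f"{angle}: {n} samples = {itd:.2f} ms, "
              f"about {path_difference_cm(itd):.1f} cm path difference")
```

At 44.1 kHz one sample is about 0.023 ms. A visual reading that is off by a single sample therefore shifts the reported ITD by roughly that amount.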


Measuring amplitude difference

A burst of white noise was routed through the plugin, and the RMS level of the noise was measured separately for the left and right channels. The level difference was then calculated by subtracting the right channel’s level in dB from the left channel’s level in dB. The plugin settings used during this test were 0% “clarity”, “reflections” turned off, and 0% “width”. The results can be seen in table 3.

Table 3 - Measured amplitude differences for the left and right channels at three different angle settings in Ambeo Orbit.
Measurement | 0° | 45° Left | 90° Left
Left channel RMS | -25,6 dB | -22,9 dB | -21,6 dB
Right channel RMS | -24,7 dB | -34,1 dB | -34,1 dB
Difference (left - right) | -0,9 dB | 11,2 dB | 12,5 dB
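The level-difference calculation behind Table 3 can be sketched in Python. The functions are hypothetical helpers, not part of Cubase or Ambeo Orbit. They compute each channel's RMS level in dB relative to full scale and subtract the right channel from the left, as described above. The noise signal and the 11.2 dB attenuation in the example are stand-ins, not the plugin's output.

```python
import math
import random


def rms_dbfs(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def level_difference_db(left, right):
    """Left-minus-right RMS level difference in dB."""
    return rms_dbfs(left) - rms_dbfs(right)


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 0.1) for _ in range(44100)]  # one second of white noise
    # Stand-in for the contralateral channel: the same noise attenuated by 11.2 dB.
    gain = 10.0 ** (-11.2 / 20.0)
    right = [gain * s for s in noise]
    print(f"{level_difference_db(noise, right):.1f} dB")  # prints 11.2 dB
```

A pure gain change scales every sample equally. The computed difference therefore equals the applied attenuation regardless of the random noise content.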

Measuring frequency response

The plugin’s frequency response was tested with white noise in Cubase. FabFilter’s Pro-Q3 plugin can analyze the spectra of the left and right channels and show them simultaneously in the same plot window. Figure 3 layers the left and right channels on top of each other for nine different measurements. Frequency plots 1-3 show the sound source positioned at 0°, with a “clarity” setting of 0%, 50% and 100% respectively. Frequency plots 4-6 use a position of 90° to the left with the same “clarity” settings, and plots 7-9 use a position of 45°. The plots show that increasing “clarity” evens out the resonances created by the plugin’s processing. When the source is at 0°, straight in front of the listener, the frequency response mimics the KU100 at 0% clarity and is almost fully flattened at 100%. In plots 4-9, increasing “clarity” evens out the frequency response of the ipsilateral channel. The coloration of the contralateral channel remains partly intact.

Figure 3 - Measurements of Ambeo plugin using Fabfilter Pro-Q2.


2.1.3 Panning

Panning is done using the plugin MUtility. The panning law is set to 4.5 dB+. This means that a signal panned fully to the left or right is 4.5 dB louder than the same signal panned to the center.

2.2 Subjects

The participants in the listening test were all second- and third-year students in the audio engineering program at LTU in Piteå, Sweden. A total of 13 listeners performed the listening test. None of the participants reported any hearing impairment. All subjects had at least one year of education in audio engineering, so they were considered suitable for evaluating the stimuli in the listening experiment.

2.3 Procedure

To answer the research question, a listening test divided into two parts was conducted. In one part the listeners rated their preference using headphones. The other part was identical except that loudspeakers were used as the playback system. Doing the same test on two different systems allows the preference ratings for a specific stimulus to be compared between the two playback systems. The listeners rated the stimuli on a tablet showing the interface of the software STEP. Half of the listeners did the test on headphones first and then repeated it on loudspeakers. For the other half the order was reversed, so that test order would not be a confounding variable. Before rating on each system, the listeners were asked to set a playback level they felt they could use throughout the test. Running the test on both systems makes it possible to see whether aspects of the stimuli other than the processing under study affect the listener’s experience. The listeners heard the program material on both the intended playback system (headphones) and loudspeakers. Correlations between their experiences on the two systems can therefore be examined.
For example, a binaurally processed mix might get a poor preference rating on both headphones and loudspeakers. Such a mix might have issues unrelated to how binaural processing translates to loudspeakers. In the headphone part of the test the stimuli were played back on high-quality over-ear headphones, Beyerdynamic DT-990. In the loudspeaker part the stimuli were played back on an ITU-R BS.775 stereo setup using high-quality studio monitors and subwoofers, Klein & Hummel KH O410 and KH O870. The monitor setup was situated in a studio control room with well-controlled acoustics, designed for mastering purposes. The listeners reported their preference in the software STEP. The test used an AB7 format. The listeners heard pairs of mixes and indicated which of the two they preferred, and to what degree, on a seven-point scale where 4 means no preference. The test was randomized and double blind. One of the two alternatives in each pair was always the reference mix without binaural processing. This mix serves as the reference against which every processed mix is rated. The listeners could also write down on a paper form why they preferred one mix over the other. This information could be used to form new hypotheses for further research. However, due to limitations of the software the randomized playback order could not be recovered, so the comments cannot be linked to specific stimuli.
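The sketch below shows how an AB7 rating can be turned into a score relative to the reference, on the -3 to +3 scale used in the results. The record layout is an assumption for illustration, and so is the convention that 1 means a strong preference for A and 7 a strong preference for B. STEP's actual export format is not described here. The reference was placed randomly in slot A or B, so the sign has to be flipped depending on its position.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One AB7 judgement (hypothetical record layout, not STEP's export format)."""
    stimulus: int        # number of the processed mix, 1-12 as in Table 1
    reference_slot: str  # "A" or "B": where the reference mix was placed
    rating: int          # 1 = strongly prefer A, 4 = no preference, 7 = strongly prefer B


def score_vs_reference(trial: Trial) -> int:
    """Map a 1-7 AB7 rating to a -3..+3 score that is positive when the
    processed mix is preferred over the reference, whatever its A/B slot."""
    if not 1 <= trial.rating <= 7:
        raise ValueError("AB7 ratings must be between 1 and 7")
    if trial.reference_slot == "A":
        return trial.rating - 4  # processed mix was in slot B
    if trial.reference_slot == "B":
        return 4 - trial.rating  # processed mix was in slot A
    raise ValueError("reference_slot must be 'A' or 'B'")


if __name__ == "__main__":
    trials = [Trial(2, "A", 6), Trial(2, "B", 6), Trial(2, "B", 4)]
    print([score_vs_reference(t) for t in trials])  # prints [2, -2, 0]
```

The same raw rating of 6 yields opposite scores depending on where the reference was placed. This is why the randomization has to be known when the ratings are analyzed.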


3. Results and Analysis

3.1 Preference with playback system as factor

The results from the listening test were analyzed using t-tests to see whether any preference differences were statistically significant. Table 4 shows the reported preferences for each mix on both headphones and loudspeakers. The right column shows the p-values of paired t-tests for differences between the two playback systems. An asterisk marks a statistically significant difference (p < 0.05).

Table 4 - Preference scores and standard deviations for individual stimuli, shown separately for headphones and loudspeakers. The p-value relates to the difference in means between headphone and loudspeaker playback. The degrees of freedom for the comparisons in this table are 12.

Stimuli | Mean score Headphones | Mean score Loudspeakers | Std Dev Headphones | Std Dev Loudspeakers | p-value
1. All processed 0% clarity | -0,077 | 1,154 | 2,019 | 1,951 | 0,088
2. All processed 50% clarity | -0,769 | 0,385 | 2,048 | 1,660 | 0,045*
3. All processed 100% clarity | -1,077 | -1,538 | 1,891 | 1,266 | 0,418
4. Drums processed 0% clarity | 1 | 1,385 | 1,155 | 1,446 | 0,544
5. Drums processed 50% clarity | 0,846 | 1 | 0,800 | 1,155 | 0,584
6. Drums processed 100% clarity | -0,308 | -0,308 | 1,316 | 0,947 | 1
7. Melody processed 0% clarity | 0,154 | 0,154 | 0,800 | 0,800 | 1
8. Melody processed 50% clarity | 0,231 | 0,077 | 1,092 | 0,76 | 0,7
9. Melody processed 100% clarity | 0,154 | 0,154 | 0,800 | 0,800 | 1
10. Pad processed 0% clarity | -1,462 | -1,231 | 1,198 | 1,300 | 0,273
11. Pad processed 50% clarity | -0,385 | -1,077 | 1,660 | 1,441 | 0,311
12. Pad processed 100% clarity | -1,231 | -1,308 | 1,74 | 1,377 | 0,886
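Python's standard library is enough to sketch the paired comparison behind the p-values in Table 4. The rating lists below are hypothetical, not the study's raw data. The sketch also does not compute an exact p-value. Instead, it compares the t statistic with the tabulated two-sided critical value for 12 degrees of freedom at α = 0.05, which is about 2.179. With SciPy installed, `scipy.stats.ttest_rel` would return the p-value directly.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 12 degrees of freedom.
T_CRIT_DF12 = 2.179


def paired_t(headphones, loudspeakers):
    """Paired t statistic and degrees of freedom for per-listener scores."""
    if len(headphones) != len(loudspeakers) or len(headphones) < 2:
        raise ValueError("need two equally long lists with at least two listeners")
    diffs = [h - l for h, l in zip(headphones, loudspeakers)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical scores relative to the reference for 13 listeners.
    hp = [-2, 1, -1, 0, -3, 1, -1, 0, -2, 2, -1, 0, -1]
    ls = [0, 1, 1, 0, -1, 2, 0, 1, 0, 2, 0, 1, 0]
    t, df = paired_t(hp, ls)
    verdict = "significant" if abs(t) > T_CRIT_DF12 else "not significant"
    print(f"t({df}) = {t:.2f}: {verdict} at alpha = 0.05")
```

Pairing works on each listener's own headphone-minus-loudspeaker difference. It therefore ignores how much listeners disagree with each other overall, which matters here given the large standard deviations in Table 4.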


Figure 4 shows the preference scores on headphones and on loudspeakers for each mix. Stimulus 2 (all processed, 50% clarity) is the only stimulus for which the difference between headphones and loudspeakers is significant (p = 0.045). Listeners rated it lower over headphones (mean -0.769) than over loudspeakers (mean 0.385).

[Figure: "Preference ratings for individual mixes". x-axis: stimuli (All, Drums, Melody and Pad at 0%, 50% and 100% clarity); y-axis: score relative to reference (-3 to 3); series: Headphones, Loudspeakers.]

Figure 4 - Mean preference ratings for headphone and loudspeaker playback. Error bars show standard deviation.

Combinations of mixes were also tested for significant differences in preference between headphones and loudspeakers. This analysis was performed to see whether stimuli sharing a variable, such as the clarity setting or the processed element, show a larger effect when combined. The groups and their mean scores relative to the reference mix are listed in Table 5; the p-value shows whether preference for a group differs significantly between the two playback systems.

Table 5 - Mean preference ratings for groups of stimuli, with p-values for the difference between headphones (HP) and loudspeakers (LS).

Stimulus group                                  Mean HP   Mean LS   p-value
1. All 12 stimuli combined                       -0.045    -0.295    0.304
2. 0% clarity (stimuli 1, 4, 7, 10)              -0.096     0.365    0.07
3. 50% clarity (stimuli 2, 5, 8, 11)             -0.019     0.096    0.648
4. 100% clarity (stimuli 3, 6, 9, 12)            -0.615    -0.75     0.624
5. All elements processed (stimuli 1, 2, 3)      -0.641     0        0.074
6. Only drums processed (stimuli 4, 5, 6)         0.513     0.692    0.514
7. Only melody processed (stimuli 7, 8, 9)        0.18      0.128    0.809
8. Only pad processed (stimuli 10, 11, 12)       -1.026    -1.205    0.535
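The group means in rows 2 to 8 of Table 5 can be reproduced by averaging the per-stimulus means from Table 4. This works because every stimulus was rated by all 13 listeners, so pooling the ratings gives the same mean as averaging the stimulus means. A minimal Python sketch follows; the dictionary and function names are illustrative. The p-values are not recomputed here, because the exact test used for the pooled groups is not described in this section.

```python
# Mean scores per stimulus from Table 4: stimulus -> (headphones, loudspeakers)
TABLE4_MEANS = {
    1: (-0.077, 1.154), 2: (-0.769, 0.385), 3: (-1.077, -1.538),
    4: (1.0, 1.385), 5: (0.846, 1.0), 6: (-0.308, -0.308),
    7: (0.154, 0.154), 8: (0.231, 0.077), 9: (0.154, 0.154),
    10: (-1.462, -1.231), 11: (-0.385, -1.077), 12: (-1.231, -1.308),
}

# Stimulus groups 2-8 of Table 5
GROUPS = {
    2: [1, 4, 7, 10],  # 0% clarity
    3: [2, 5, 8, 11],  # 50% clarity
    4: [3, 6, 9, 12],  # 100% clarity
    5: [1, 2, 3],      # all elements processed
    6: [4, 5, 6],      # only drums processed
    7: [7, 8, 9],      # only melody processed
    8: [10, 11, 12],   # only pad processed
}


def group_means(stimuli):
    """Average the per-stimulus means; valid because every stimulus has 13 ratings."""
    hp = sum(TABLE4_MEANS[s][0] for s in stimuli) / len(stimuli)
    ls = sum(TABLE4_MEANS[s][1] for s in stimuli) / len(stimuli)
    return hp, ls


for group, stimuli in GROUPS.items():
    hp, ls = group_means(stimuli)
    print(f"Group {group}: headphones {hp:+.3f}, loudspeakers {ls:+.3f}")
```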


None of the groups of stimuli shows a statistically significant difference in preference between headphones and loudspeakers; the lowest p-values are for group 2 (0% clarity, p = 0.07) and group 5 (all elements processed, p = 0.074). Figure 5 shows the wide spread of preference ratings, with group numbers referring to the stimulus groups in Table 5.

[Figure: "Preference ratings for groups of stimuli". x-axis: group number (1 to 8); y-axis: score relative to reference (-3 to 3); series: Headphones, Loudspeakers.]

Figure 5 - Groups of stimuli with preference rated in comparison to reference. Error bars show standard deviation.


3.2 Testing significance for preference differences between stimuli and the reference.

Table 6 presents the ratings for each stimulus relative to the reference mix, with p-values indicating whether the mean score differs significantly from the reference. Significant results (p < 0.05) are marked with an asterisk.

Table 6 - Preference scores and p-values for comparisons between each stimulus and the reference mix, for headphones (HP) and loudspeakers (LS).

Stimulus                             Mean HP   p HP      Mean LS   p LS
1. All processed, 0% clarity          -0.077   0.893       1.154   0.054
2. All processed, 50% clarity         -0.769   0.201       0.385   0.42
3. All processed, 100% clarity        -1.077   0.063      -1.538   0.001*
4. Drums processed, 0% clarity         1       0.009*      1.385   0.005*
5. Drums processed, 50% clarity        0.846   0.002*      1       0.01*
6. Drums processed, 100% clarity      -0.308   0.416      -0.308   0.264
7. Melody processed, 0% clarity        0.154   0.502       0.154   0.502
8. Melody processed, 50% clarity       0.231   0.461       0.077   0.721
9. Melody processed, 100% clarity      0.154   0.502       0.154   0.502
10. Pad processed, 0% clarity         -1.462   0.001*     -1.231   0.005*
11. Pad processed, 50% clarity        -0.385   0.42       -1.077   0.02*
12. Pad processed, 100% clarity       -1.231   0.025*     -1.308   0.005*
* p < 0.05
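Each score is already expressed relative to the reference (0 = no preference), so one way to obtain p-values like those in Table 6 is a two-sided one-sample t-test of the 13 scores against zero. This is an assumption about the method, not a description taken from the study. It does, however, reproduce the reported values for stimulus 4 over headphones (mean 1, p of about 0.009). The sketch uses only the standard library and repeats the illustrative `t_two_sided_p` helper so that it runs on its own.

```python
import math
import statistics


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value for Student's t with integer degrees of freedom."""
    # theta = atan(|t| / sqrt(df)), c2 = cos(theta)^2
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    term = total = 1.0
    if df % 2 == 0:
        # P(|T| < t) = sin(theta) * (1 + 1/2 c^2 + 1*3/(2*4) c^4 + ...), up to c^(df-2)
        for k in range(1, df // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        inside = math.sin(theta) * total
    else:
        # P(|T| < t) = 2/pi * (theta + sin*cos * (1 + 2/3 c^2 + ...)), up to c^(df-3)
        for k in range(1, (df - 1) // 2):
            term *= c2 * (2 * k) / (2 * k + 1)
            total += term
        correction = 0.0 if df == 1 else math.sin(theta) * math.cos(theta) * total
        inside = 2 / math.pi * (theta + correction)
    return 1.0 - inside


def one_sample_t_test(scores, mu=0.0):
    """Two-sided one-sample t-test of the mean score against mu; returns (t, df, p)."""
    n = len(scores)
    t = (statistics.mean(scores) - mu) / (statistics.stdev(scores) / math.sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1)


# Stimulus 4 (drums processed, 0% clarity) over headphones, listeners 1-13, Appendix 6.1
stimulus4_hp = [1, 0, 1, 0, 3, 1, 1, 2, -1, 1, 1, 0, 3]
t, df, p = one_sample_t_test(stimulus4_hp)
print(f"mean {statistics.mean(stimulus4_hp):.3f}, t({df}) = {t:.3f}, p = {p:.3f}")
```

With SciPy, `scipy.stats.ttest_1samp(stimulus4_hp, 0)` would perform the same test.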


4. Discussion

4.1 Comparing preference of stimuli with playback system as factor.
Overall, preference for the binaurally processed stimuli differed little between headphone and loudspeaker playback. Only one comparison between playback systems is statistically significant: stimulus 2, the mix in which all elements were processed with a “clarity” value of 50%. It is not clear what accounts for this result, since neither stimulus 1 nor stimulus 3 (the same processing at 0% and 100% clarity) shows a significant difference. The data also show a large spread in reported preference for most stimuli, suggesting that preference can be highly individual, possibly because of differences in listener experience and personal aesthetic values. As seen in Table 5, the combined mean of all binaurally processed stimuli (group 1) is -0.045 for headphones and -0.295 for loudspeakers, and the difference is not statistically significant (p = 0.304). Within this test, therefore, the binaural processing done with Ambeo Orbit cannot be shown to significantly alter listener preference. Of the six stimuli whose ratings differ significantly from the reference on either playback system (stimuli 3, 4, 5, 10, 11 and 12), four (stimuli 4, 5, 10 and 12) differ significantly on both (Table 6), and in all four cases the preference points in the same direction on both systems. This suggests that the binaural processing affects preference in a similar way, at least to some degree, on both playback systems.

4.2 Comparing preference of different stimuli with processing as factor.
Although the mean preference ratings differed little between the two playback systems, some trends appear when the scores of the different mixes are compared with one another.
A notable trend, shown in Figure 6, is that increasing the “clarity” value lowers listener preference on both headphones and loudspeakers for half of the stimuli, namely those in which the drum elements were processed (stimuli 1 to 6). For the other half, where binaural processing was applied only to the melody or the pad (stimuli 7 to 12), changing “clarity” shows no corresponding trend. This could be because the elements processed in stimuli 7 to 12 contain little treble, so “clarity” did not affect these mixes to any great extent. These results, although tentative, are the opposite of what the first hypothesis predicted: increasing “clarity” did not produce higher ratings. The second hypothesis, that binaural processing would smear transient-rich elements in time and thereby lower preference on loudspeakers, could not be confirmed by the listening test.

Figure 6 - Trend lines showing how increasing clarity lowers listener preference for mixes in which the drums were binaurally processed. No such effect is seen in mixes where only the pad or the melody was binaurally processed.
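Assuming the trend lines in Figure 6 are ordinary least-squares fits of the Table 4 mean scores against the clarity setting, their slopes can be computed as in this minimal Python sketch. The names are illustrative, and each slope is scaled to the change in score across the full 0 to 100% clarity range.

```python
# Mean scores from Table 4, ordered by clarity 0%, 50%, 100%
CLARITY = [0, 50, 100]
MEANS = {
    "All processed":    {"HP": [-0.077, -0.769, -1.077], "LS": [1.154, 0.385, -1.538]},
    "Drums processed":  {"HP": [1.0, 0.846, -0.308],     "LS": [1.385, 1.0, -0.308]},
    "Melody processed": {"HP": [0.154, 0.231, 0.154],    "LS": [0.154, 0.077, 0.154]},
    "Pad processed":    {"HP": [-1.462, -0.385, -1.231], "LS": [-1.231, -1.077, -1.308]},
}


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Slope scaled to points of preference per 100 percentage points of clarity
slopes = {
    (mix, system): ols_slope(CLARITY, ys) * 100
    for mix, by_system in MEANS.items()
    for system, ys in by_system.items()
}

for (mix, system), slope in slopes.items():
    print(f"{mix:17s} {system}: {slope:+.3f} per 100% clarity")
```

With these means, the all-processed and drums-processed mixes have negative slopes on both systems, the melody-only slopes are zero, and the pad-only slopes are small and of mixed sign, in line with the text.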


4.3 Keywords gathered from listener survey.
In the survey accompanying the listening test, listeners reported which aspects of the stimuli affected their preference ratings. Because of shortcomings in the method, the notes cannot be linked to the reference or to any particular stimulus. Recurring keywords can still be summarized, both to reveal possible issues with the stimuli and to identify factors worth investigating in further research. The listeners wrote their notes in Swedish; these have been translated into English in Tables 7 and 8. In each table, the attributes are placed in a column according to whether the listener noted them as a positive or a negative aspect, with a third column for comments whose preference is neutral or cannot be determined. The comments are further grouped into five categories: “Spatial and panning related attributes”, “Spectral content”, “Balance-related attributes”, “Phase-related attributes” and “Attributes not specific to any of the above”.

Table 7 - This table shows a compilation of translated comments for headphone playback gathered during the listening test. Positive Negative Neutral or not possible to categorize preference. Spatial and “Panning of instruments sounds “Too wide panning of melody” “Better sound but too panning related comfortable.” “Overly wide mix” wide” attributes “Nicer stereo width” “pad centered” ”Panned pad makes stereo “Nicer width on pad” “Confusing panning” image wider” “narrower” “pad panned to the left” “Comfortable stereo image” “Unbalanced stereo image” “Clearer center in the image” “Stereo image leans to much to “Wider” the right” “Pad sounds better in the middle.” “Better stereo balance” “Nice with pad in both ears” “Stereo width makes it fatter”

Spectral content “Brighter Snare” “More treble makes mix less “Drums sound brighter” “Fatter snare” balanced” “Brighter mix” “Nicer snare” “Don’t like the filter effect” “Boring EQ” Balance-related “Kick sounds louder and punchier” “Kick too loud” attributes Phase related “Phasey” “Sounds better although attributes “Snare sounds phasey” more phasey” “Pad sounds phasey” Attributes not “Clearer melody” “Melody sounded too close.” “Fuller sounding pad” specific to any of “Snare fits better” “Annoying drums” “Nearer and warmer the above “Fuller sounding” “Melody sounds more sound but also has a bit of “Clearer bass” interesting.” ‘ugly’ distortion.” “Sounds livelier.” “Tame mix” “more HiFi” “Dislike the pitched drums” “Sounds bigger” “Thin sounding mix” “More energetic” “Snare sounds distracting” “Sounds more old school” “More even” “Snare sounds livelier”


Table 8 - Compilation of translated comments for loudspeaker playback gathered during the listening test.

Positive Negative Neutral or not possible to categorize preference. Spatial and “Nicer width in the pad” “Width of pad is too big.” “Melody is panned.” panning related “Better width” “The pad is panned to the left.” “Synth is in the middle.” attributes “Better width in pad” “The mix tilts to the right” “The pad sounds better in the “melody is far to the right” middle.” “Fuller with pads in the middle” “Stereo width makes mix more interesting.” “Pad sounded narrower, making mix more balanced.” “Pad’s positioning makes it bigger.” “More 3D-feeling” “Stereo width makes it fatter.” “More comfortable placement of the sounds in the stereo image.” Spectral content “Brighter snare sound which fit the “Snare sounds canny” “More bottom end” mix.” “Hi hat sounds like it lost some “clearer spectrum” treble.” “Snare clearer” “Uncomfortable filtering” “Fuller frequency spectrum” “More low mids” Balance-related “Kick balanced better” “louder kick” attributes “Drums are louder” Phase related “Drums sound phasey.” “Sounds “phasey” but not attributes “Felt like something hade its necessarily bad” phase inverted” “Phase issues” “Snare sounds phasey” “Sounds stereo enhanced and phasey”

Attributes not “More force in the snare” “The drums sound confined.” specific to any of “Clearer drums” “less full sounding” the above “More clarity” “Thin sounding” “Clearer bass” “Sounded confined” “Prefer the snare” “Sounds more HiFi” “Hihat fits better in mix” “The whole mix sounds fatter.” “Bigger sound” “Fuller sound” “More oomph in kick” “More cohesive mix” “Snare sounds uncomfortable.” “Fuller kick” “Sounds more even.”

4.3.1 Key points from listener comments
One takeaway from the listener comments is that listeners disagreed on how the perceived width of the pad affects preference. Both positive and negative comments about the pad's width were reported for both playback systems: some listeners reported a wider pad as positive, while others preferred a narrower one. This could partly explain the large spread in reported preference. Similarly, although few comments relate to amplitude balance, one listener described the kick as “too loud” (a negative), while another gave “Kick sounds louder and punchier” as a reason for preferring the stimulus.


When it comes to creating the stimuli, the comments also make clear that listeners, at least sometimes, perceived the lateral placement of some elements to change between the stimulus and the reference. Comments such as “The mix tilts to the right”, “The pad sounds better in the middle” and “The pad is panned to the left” show this. They illustrate how hard it is to create mixes in which panoramic placement is not a possible confounding variable, especially when it is the specific properties of the binaural processing that are being evaluated.

4.4 Critique of method
The processing of the pad had a large impact on preference: in all cases where the pad was the only processed element, the mean rating was negative. This is not necessarily due to binaural processing itself but could be due to how it was applied to these specific stimuli. Further tweaking of the pad processing would be needed to ensure that the stimuli do not have confounding issues. Even so, no significant differences in means were found with playback system as the factor.

To better understand the reasons behind the listeners' ratings, survey comments would need to be linked to specific stimuli. In hindsight, a slightly different approach to collecting comments could have been used. In STEP, the built-in blinding function breaks any link between comments noted outside the software and the ratings entered in it, since the randomization order is not stored. Instead, the stimulus names shown in STEP could have been left visible to the listener but randomized, using names unlikely to influence the listener (a random number sequence, for instance). This would likely have given a better understanding of how the parameters affect listener preference and of other trends such as the spread of ratings.
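The alternative suggested above can be sketched as follows: give both mixes in every pair a random, meaningless numeric code, and keep the key outside the listening software so that comments written next to a code can be decoded afterwards. This is a minimal Python sketch. `make_pair_labels`, the three-digit codes and the seed value are hypothetical choices, and nothing here relies on features of STEP. Giving the reference a new code in each pair keeps it from being recognized by a repeated label.

```python
import random


def make_pair_labels(n_pairs, seed):
    """Give both mixes in every A/B pair a unique, meaningless 3-digit code.

    Returns {code: (pair_number, role)} with role "processed" or "reference",
    so a comment written next to a code can be decoded after the test.
    Storing the seed lets the same key be regenerated later.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(100, 1000), 2 * n_pairs)  # all codes are distinct
    key = {}
    for pair in range(1, n_pairs + 1):
        key[codes.pop()] = (pair, "processed")
        key[codes.pop()] = (pair, "reference")
    return key


# Twelve processed mixes, each paired with the reference
key = make_pair_labels(12, seed=2021)
for code, (pair, role) in sorted(key.items()):
    print(code, pair, role)
```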
4.5 Findings
The listening test showed no significant difference between the two playback systems except in one of twelve cases, and in that single case it is hard to tell why the difference was significant. The trends seen when comparing preference ratings for the different stimuli follow roughly the same pattern on both playback systems. This lack of differences in preference between the two playback systems may indicate that binaural processing is not necessarily problematic in terms of compatibility. More research on specific instrument types is needed to determine which instruments present compatibility issues, and listening tests with a narrower focus, perhaps on a single instrument, could inform the subject further.


5. References

Dolby. (2018). Dolby Atmos for Mobile. https://www.dolby.com/us/en/technologies/mobile/dolby-atmos.html
ITU. (2012, August). BS.775: Multichannel stereophonic sound system with and without accompanying picture. https://www.itu.int/rec/R-REC-BS.775/en
Merimaa, J. (2009). Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis (AES Convention Paper 7912). Retrieved from http://www.aes.org.proxy.lib.ltu.se/e-lib/browse.cfm?elib=15107
Möller, H., Sörensen, M. F., Jensen, C. B., & Hammershöi, D. A. (1996). Binaural Technique: Do We Need Individual Recordings? JAES, 44(6), 451-469. Retrieved from http://www.aes.org.proxy.lib.ltu.se/e-lib/browse.cfm?elib=7897
Rumsey, F. (2011). Whose head is it anyway? JAES, 59(9), 672-677. Retrieved from http://www.aes.org.proxy.lib.ltu.se/e-lib/browse.cfm?elib=15982
Sennheiser. (2020). Ambeo Orbit. https://en-us.sennheiser.com/ambeo-orbit
Vickers, E. (2009). Fixing the Phantom Center: Diffusing Acoustical Crosstalk (AES Convention Paper 7916). Retrieved from http://www.aes.org.proxy.lib.ltu.se/e-lib/browse.cfm?elib=1511
Waves. (2020). Abbey Road Studio 3. https://www.waves.com/plugins/abbey-road-studio-3#inside-the-waves-abbey-road-studio-3-plugin


6. Appendix

6.1 Raw data
Scores relative to the reference (-3 to +3) for each listener, playback session (HP = headphones, LS = loudspeakers) and stimulus (S1 to S12).

Listener  Session   S1   S2   S3   S4   S5   S6   S7   S8   S9  S10  S11  S12
1         HP         1   -2   -1    1    1   -1    0    0   -1   -1    1   -2
1         LS         2    0   -1    2    2    0    0    0    0   -1   -2   -2
2         HP        -3   -3   -3    0    0   -2    0    0    0   -3   -2   -3
2         LS        -3   -2   -2    3    0    0    0    0    0   -3   -2   -1
3         HP         0    2    2    1    2    0    1    2    0   -1   -1   -2
3         LS         3    2   -3    2    2   -2    1    0    0   -2   -2   -2
4         HP         2   -1    2    0    2    2    0   -2    0   -2    3    2
4         LS         2    3   -2    3    0   -1    0    0   -1   -1   -2   -2
5         HP        -2    1   -3    3    2    1    2    1    1   -2   -1   -3
5         LS         3    2   -3    2    3   -2    0    0    0   -2   -2   -1
6         HP         3   -2   -3    1    1   -1    0    1    0   -2   -2   -2
6         LS         2   -1   -1    1    1   -1   -1   -1    1   -2   -2   -2
7         HP        -2   -3   -2    1    1    0    0    1    2   -2   -2   -2
7         LS         1    1   -1    2    2    0    0    0    0   -2    1    1
8         HP        -2   -3   -1    2    1   -2    0    1    0   -2   -1   -2
8         LS         1   -2   -2    1    2    0   -1    1    1   -2   -2   -2
9         HP         2    2    2   -1    0   -1   -1   -1    1    2    2    2
9         LS         2    2    2    1    1    1    1   -1   -1    2    1    2
10        HP         1    1   -1    1    1   -1    0    0    0   -1   -1   -1
10        LS        -2    1   -2   -1    0    0    0    0    0    0    1   -1
11        HP         2    2   -2    1    0   -1   -1   -1   -1   -2    0   -2
11        LS         2   -1   -2    2   -1    0    2    2    2   -2   -2   -2
12        HP        -1   -2   -2    0    0    0    0    0    0   -1    1    1
12        LS        -1   -1   -1    2    0    0    0    0    0   -1   -2   -2
13        HP        -2   -2   -2    3    0    2    1    1    0   -2   -2   -2
13        LS         3    1   -2   -2    1    1    0    0    0    0    1   -3

6.2 Written instructions for listeners
