Auditory Illusion Through Headphones: History, Challenges and New Solutions
The Technology of Binaural Listening & Understanding: Paper ICA2016-363

Auditory illusion through headphones: History, challenges and new solutions

Karlheinz Brandenburg(a),(b), Stephan Werner(b), Florian Klein(b), Christoph Sladeczek(a)

(a)Fraunhofer IDMT, Germany, [email protected], [email protected]
(b)TU Ilmenau, Germany, [email protected], [email protected], fl[email protected]

Abstract

The dream of perfect recreation of sound has always consisted of two parts: the reproduction of monaural sounds such that they seem to be exact copies of an original signal, and the plausible recreation of complex sound environments, that is, the possibility to be immersed in sound. The latter goal seems to be much more difficult, especially if we consider reproduction over headphones. From standard two-channel sound reproduced over headphones, through artificial head recordings, to the inclusion of HRTFs and binaural room impulse responses, something was always missing to create a perfect auditory illusion. Depending on refinements such as individually adapted HRTFs, these methods work for many people, but not for everybody. As we know now, in addition to the static, source- and listener-dependent modifications to headphone sound, we need to pay attention to cognitive effects: the perceived presence of an acoustical room rendering changes depending on our expectations. Prominent context effects are, for example, acoustic divergence between the listening room and the synthesized scene, visibility of the listening room, and prior knowledge triggered by where we have been before. Furthermore, cognitive effects are mostly time variant, which includes anticipation and assimilation processes caused by training and adaptation. We present experiments proving some of these well-known contextual effects by investigating features like distance perception, externalization, and localization. These features are shifted by adaptation and training.
Furthermore, we present some proposals for how to reach the next level of fidelity in headphone listening. This includes the use of room simulation software and the adaptation of its auralization to different listening rooms by changing acoustical parameters.

Keywords: immersive sound via headphones, room simulation

1 Introduction

As long as there has been recording of sound, people have been dreaming about perfect sound reproduction that enables the real illusion of artists being in the room. There are reports that even Edison, when marketing early phonograph systems, emphasized audio quality, even over the artistic quality of the recording [9]. He organized demos around the world where people were asked whether they were actually listening to the live artist or a recording. With the continued improvement of recording, amplifier technology and loudspeakers, today we can get a quite faithful reproduction of monaural signals.

For reproducing sound from multiple sources in a room, we still cannot say that the task of plausibly recreating the sound in a different room has been completely solved. Difficult as this task is for reproduction using loudspeakers, it is even more of a problem when using headphones. Recordings that have been mixed for playback via two loudspeakers usually produce sound which seems to come from within the head. What we desire is to hear the sound coming from a stage before us (or around us). If we succeed with this, we call the result an externalized sound. Externalization describes the perception of the position of an auditory event outside or inside the head of the listener [29, 15]. Externalization is a crucial feature to reach a plausible spatial auditory illusion with binaural headphone systems.
In the following chapters, we will first look at the main reason for non-externalized sound and present older efforts to get around these problems. We then present more current work explaining the extent of the difficulties and look into some newer proposals to enable perfect audio illusion via headphones.

2 Earlier work

In the field of electrical sound reproduction, headphones have always played a major role. In the early days, due to technical constraints, the development of a headphone speaker was much simpler than the development of a loudspeaker. Therefore the first electrical devices allowing listening to recorded audio were based on headphone-like technology. One of the first systems that was available to a broader audience was the "Théâtrophone" developed by Clément Ader in 1881 [8]. This device allowed the transmission of a two-channel audio signal to different receiver stations, where the user needed to put two earcups to their ears to listen to concerts or plays. As the recording was realized using two microphones placed at a distance from each other, the users reported a spatial impression because of the inequalities of the sound at the two ears [22].

However, as these signals do not really represent the sound pressure at the ear drums as it would occur in natural listening, researchers have been working for more than 40 years to create signals that resemble the "real world" signals. Major cues for directional hearing are level and time (phase) differences between the two ears (Interaural Level Differences - ILD, Interaural Time Differences - ITD) and the direction- and frequency-dependent transfer function from sound source to ear drum (Head Related Transfer Functions - HRTF) [27]. To simulate the HRTFs in an audio recording situation, so-called dummy heads are used [26]. One of the first demonstrations made to a public audience was the presentation of a mechanical man called "Oscar" with microphone ears by AT&T at the Chicago World's Fair in 1933 [13].
This was the starting point for the development of different dummy heads, which are still used today [21, 14].

With the availability of dummy heads it became possible to record sound as if a head were placed at a specific position. However, a true spatial impression of the recorded scenes was not perceivable for every listener, since there are large differences between the HRTFs of different test subjects and we usually are not very good at listening with somebody else’s ears. Therefore a next big step towards more realistic auralization was the introduction of individualized HRTFs. Over the years, different methods have been used, among others: in-ear measurement using small probes in the blocked or unblocked ear canal [26]; on-ear measurements including some correction factors; selection of HRTFs (without actual measurement) according to "what works best" [33]; and optical measurement of ear and ear canal geometry followed by calculation of an HRTF using numerical simulations [18, 19]. Another question in the usage of individualized HRTFs is the needed accuracy. There is more research on such questions than would fit in this paper. There are publicly available databases of HRTF measurements. When we look at the amount of research going into the different methods, HRTF individualization has clearly received the lion’s share.

Since actual HRTFs depend not only on the individual, but also on the position of the sound, an important cue for auditory illusion is the change of the sound when the listener moves the head or moves within space. In the literature this is typically called dynamic binaural synthesis [32]. It has been implemented using head trackers and the selection, including interpolation, of actual HRTFs [10]. Such systems are known for a much better auditory illusion and externalization. Another cue to help with externalization is the actual reflection pattern in a room [3]. This is founded on the fact that reflections have a major influence on distance perception [24].
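The head-tracked selection and interpolation of HRTFs mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in the cited systems. It assumes a horizontal-plane table of measured head-related impulse responses (HRIRs, the time-domain form of HRTFs) keyed by azimuth in degrees, and it uses plain sample-wise linear interpolation between the two nearest measurements, which is a simplification chosen for brevity. The names `TOY_HRIRS`, `relative_azimuth`, `interpolate_hrir`, and `hrir_for_head_pose`, as well as all numeric values, are hypothetical.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [0, 360)."""
    return angle % 360.0


def relative_azimuth(source_azimuth, head_yaw):
    """Source direction relative to the listener's nose, in degrees."""
    return wrap_degrees(source_azimuth - head_yaw)


def interpolate_hrir(hrir_table, azimuth):
    """Blend the two measured HRIR pairs that enclose `azimuth`.

    hrir_table maps a measured azimuth (degrees) to a (left, right) pair
    of equal-length impulse responses. Sample-wise linear interpolation
    is an assumption of this sketch, not a recommendation.
    """
    angles = sorted(hrir_table)
    az = wrap_degrees(azimuth)
    # Nearest measured neighbours, wrapping around the circle.
    lower = max((a for a in angles if a <= az), default=angles[-1])
    upper = min((a for a in angles if a > az), default=angles[0])
    span = wrap_degrees(upper - lower) or 360.0
    weight = wrap_degrees(az - lower) / span
    left_lo, right_lo = hrir_table[lower]
    left_hi, right_hi = hrir_table[upper]
    left = [(1.0 - weight) * a + weight * b for a, b in zip(left_lo, left_hi)]
    right = [(1.0 - weight) * a + weight * b for a, b in zip(right_lo, right_hi)]
    return left, right


def hrir_for_head_pose(hrir_table, source_azimuth, head_yaw):
    """Pick the HRIR pair for a fixed source given the tracked head yaw."""
    return interpolate_hrir(hrir_table, relative_azimuth(source_azimuth, head_yaw))


# Toy two-tap HRIRs measured every 90 degrees (hypothetical values).
TOY_HRIRS = {
    0: ([1.0, 0.0], [1.0, 0.0]),
    90: ([0.5, 0.0], [0.0, 1.0]),
    180: ([0.5, 0.5], [0.5, 0.5]),
    270: ([0.0, 1.0], [0.5, 0.0]),
}

# The source stays at 45 degrees in the room; the filters change as the head turns.
facing_front = hrir_for_head_pose(TOY_HRIRS, 45.0, 0.0)
facing_source = hrir_for_head_pose(TOY_HRIRS, 45.0, 45.0)
```

In a real-time system the returned pair would be recomputed for every tracker update and applied to the audio stream by convolution, typically with cross-fading between successive filters to avoid audible switching.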
To include room acoustics in headphone sound, Binaural Room Impulse Responses (BRIR) are used. These transfer functions can be determined either by room acoustic measurement or by room acoustic simulation.

Regarding the lack of externalization, we find different theories in the literature. If we include the newer results on room divergence and adaptation (see the following chapters), the authors favor the following explanation: sounds in a room are localized via a complex interaction of simple auditory cues, the expectation in higher layers of the brain, and the recognition of sound patterns (including reflections) of known signals. Whenever there is too much divergence between the expected sound pattern and the actual sound delivered to the inner ear, there is a decreased probability of externalization. Context-dependent quality parameters like room divergence, presence of visual cues, and personalization of the system influence the perception of externalization [38]. From this it is clear that externalization is not just the result of using correct HRTFs etc., but also the result of complex cognitive interactions in the brain.

It is hypothesized that the build-up of the experienced quality is a cognitive process which includes expectations and prior knowledge of the listener. Jekosch [16, 17], Blauert [5, 4], Raake [30, 31], and others [25] propose a quality formation process with two essential paths. The quality perception path is driven by the physical nature of the event which reaches the sensory organs. The perceived auditory event is created or constructed with respect to an internal reference path. This process includes a comparison and judgment with the internal reference (or expectation) of each individual person. The reference path describes the time-dependent, context-dependent, multi-sensory, and cognitive influencing factors on the quality formation process.
To transfer this knowledge to new applications, an extension of this process is proposed. The extension includes, on the one hand, the technical system used to build up the perceived quality and, on the other hand, feedback mechanisms from the perceived quality to the technical elements of the system [35]. The quality of the system can be described by the technical quality elements and the context of use of the system (context-dependent quality parameters like room divergence or personalization of the system).
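Returning to the signal-processing side of this chapter: rendering with a BRIR, as described earlier, amounts to convolving the dry (anechoic) source signal with one impulse response per ear. The following Python sketch shows this under simplifying assumptions: a static source and listener, a mono input, and a direct-form convolution instead of the FFT-based block convolution a practical system would use. The function names and the toy BRIR values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two lists of samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def render_binaural(dry_signal, brir_left, brir_right):
    """Return the (left, right) headphone signals for a mono source."""
    return convolve(dry_signal, brir_left), convolve(dry_signal, brir_right)


# Toy BRIR (hypothetical values): a direct-sound tap followed by one reflection.
brir_left = [1.0, 0.0, 0.0, 0.5]
brir_right = [0.0, 0.5, 0.0, 0.25]

left, right = render_binaural([1.0, -1.0], brir_left, brir_right)
```

Whether the BRIR pair comes from a measurement or from a room acoustic simulation does not change this rendering step; only the impulse responses differ.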