Audio Engineering Society Convention Paper Presented at the 113Th Convention 2002 October 5–8 Los Angeles, CA, USA
Total Page:16
File Type:pdf, Size:1020Kb
ALGAZI ET AL. HAT MODELS FOR SYNTHESIS Audio Engineering Society Convention Paper Presented at the 113th Convention 2002 October 5{8 Los Angeles, CA, USA This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society. THE USE OF HEAD-AND-TORSO MODELS FOR IMPROVED SPATIAL SOUND SYNTHESIS V. Ralph Algazi1, Richard O. Duda1, and Dennis M. Thompson1 1CIPIC Interface Laboratory, University of California, Davis CA, 95616-8553, USA Correspondence should be addressed to Richard O. Duda ([email protected]) ABSTRACT This paper concerns the use of a simple head-and-torso model to correct de¯ciencies in the low-frequency behavior of experimentally measured head-related transfer functions (HRTFs). This so-called \snowman" model consists of a spherical head located above a spherical torso. In addition to providing improved low- frequency response for music reproduction, the model provides the major low-frequency localization cues, including cues for low-elevation as well as high-elevation sources. The model HRTF and the measured HRTF can be easily combined by using the phase response of the model at all frequencies and by \cross-fading" between the dB magnitude responses of the model and the measurements. For e±cient implementation, the exact snowman HRTF is approximated by two time delays and two ¯rst-order IIR ¯lters. Because the poles are independent of the location of the virtual source, this supports a simple real-time implementation that allows for arbitrarily rapid head and source motion. AES 113TH CONVENTION, LOS ANGELES, CA, USA, 2002 OCTOBER 5{8 1 1. INTRODUCTION as the elevation ranges from overhead to underneath. A snowman ¯lter model is then developed by building upon Many systems for spatial sound synthesis employ ex- a ¯lter model for a single sphere and by analyzing the perimentally measured head-related transfer functions combined geometry of the two spheres that compose the (HRTFs). Unfortunately, it is both time-consuming and snowman. Filter implementation and applications are experimentally di±cult to measure HRTFs accurately. briefly discussed, with emphasis on the low-frequency Low-frequency measurements are particularly problem- compensation of measured HRTFs. atic, partly because large loudspeakers are required, and partly because even good anechoic chambers reflect long- 2. THE SNOWMAN MODEL wavelength sound waves. The use of windowing to elim- inate room reflections also modi¯es the response to low Although it has been known for some time that the torso frequencies. Without low-frequency compensation, mu- has a measurable influence on the HRTF, reports on the sic synthesized with HRTFs often sounds \thin" and psychoacoustic importance of torso e®ects for sound lo- lacking in bass. Ad hoc methods for boosting the bass calization have been inconclusive (see [3] for a review can improve the sound quality, but such compensation of previous investigations). The best known e®ect is the can disturb localization cues. Finally, most HRTF-based so-called \shoulder bounce," the specular reflection from systems are not able to convincingly position virtual the upper torso that occurs when the source is above the sound sources at very low elevations. subject. The reflected wave introduces a series of comb- ¯lter notches into the HRTF spectrum. For overhead The low-frequency performance of HRTF-based systems sources, the ¯rst notch appears at frequencies as low as can be signi¯cantly improved by the judicious use of the- 600 Hz, which makes it potentially important for sources oretical HRTFs derived from simple models. The sim- that do not contain much high-frequency energy. plest of these is the classical spherical-head model [1, 2]. It provides the low-frequency interaural time delay In a previous study, we showed that this torso reflec- (ITD) and the interaural level di®erence (ILD), which are tion provides a weak but signi¯cant elevation cue for the two most important cues for azimuth. The spherical sources that are away from the median plane [3]. Fur- head model all by itself does not provide any elevation thermore, the e®ect can be simulated with a simple cues, and sounds synthesized using this model alone are model. When the source spectrum was limited to 3 kHz weakly externalized and sometimes experienced as being to prevent the pinnae from providing elevation cues, es- elevated. sentially the same localization performance was obtained when the measured HRTF was replaced by the HRTF for This paper describes a simple model that accounts for a spherical-head model augmented by torso reflections the contributions of the torso as well as the head to computed from an ellipsoidal torso model. the HRTF. This so-called \snowman model" is based In general, the longest time delay between the direct on approximating both the head and the torso by rigid pulse and the reflected pulse occurs when the source is spheres. The e®ect of these spheres on sound waves is in overhead. As the source elevation is reduced, the de- turn approximated by simple signal-processing modules. lay time shrinks, eventually becoming zero when the ray These modules consist of two frequency-independent from the source to the ear becomes tangent to the torso. time delays and two ¯xed-pole, variable-zero, ¯rst-order The set of tangent rays de¯nes what we call the \torso- IIR ¯lters. The time delays account for the propaga- shadow cone" (see Figure 1). A sound source outside of tion of sound across the head and the reflection by the the torso-shadow cone produces a torso reflection. The torso for sounds at high elevation. The ¯lters accounts reflection disappears when the source is inside the torso- for head shadow for sources on the contralateral side and shadow cone, where the source is shadowed by the torso for torso shadow for sources at low elevation. We show instead. how this ¯lter model can be used to compensate exper- imentally measured HRTFs to produce virtual auditory Although an ellipsoid ¯ts the human torso better than sources that have a natural sounding bass response while does a sphere, the geometry of a spherical torso is easier retaining good source localization cues. to analyze. This gives rise to the so-called \snowman model," in which the body is approximated by a spheri- We begin by reviewing some of the observed character- cal head directly above a spherical torso. The snowman istics of HRTFs that are caused by the torso and that model is de¯ned by three parameters | the head radius have been explained and simulated by a simple head- a, the torso radius b and the neck height h (see Fig- and-torso geometric model. We then present the exact ure 2a).1 In principle, the model can be adapted to ¯t HRTF for the snowman model at all elevations. This HRTF indicates the major modes of behavior that occur 1One can further elaborate the model in various ways. For ALGAZI ET AL. HAT MODELS FOR SYNTHESIS n o ti Ip c s le il f a re te l a ra r l e r t e la fl a e £¦¥ c tr w ti n ¢¡¤£¦¥¨§ © o o d n C d " a ! ht spot ¤ h C I p o s n i tr b l w a a t l e a t r ) * ( e a r l a T + l t o o t h r o r t so r $ # s so shado p % & # o t ' $ w w Fig. 1: The torso shadow cone, which is de¯ned as (a) (b) the set of rays from the ear that are tangent to the torso. A source outside the torso shadow cone gives Fig. 2: (a) Frontal view of the snowman model. The rise to a torso reflection. A source inside the torso head and torso are coaxial spheres. The de¯ning shadow cone is shadowed by the torso. parameters are the head radius a, the torso radius b and the neck height h. The range for the elevation angle ± is 90± ± < 270±. (b) Zones for the right- ear response.¡ The· torso-shadow cone is divided into speci¯c individuals by determining the parameters from an ipsilateral and a contralateral zone by the torso anthropometric data [4]. In particular, we have found bright spot. Similarly, the head-shadow zone is di- that the following values are appropriate for the KE- MAR mannequin: a = 8:7 cm, b = 16:9 cm, and h = 5:3 vided by the head bright spot. cm. These are the values that are used in this paper. 3. THE IMPULSE RESPONSE OF THE than a meter or so away. Although it is conventional SNOWMAN MODEL to examine the behavior of the HRTF as a function of Although there is no analytical solution for the HRTF azimuth in the horizontal plane, or as a function of el- for the snowman model, it is possible to compute the evation in the median plane, the behavior of the HRTF HRTF quite accurately using numerical methods.2 In in the frontal plane is more revealing. Thus, all of the general, the HRTF is a function of azimuth, elevation results that we present are for the frontal plane. and range. We restrict ourselves to the case where the We used the multipole-reexpansion method described in source is in¯nitely distant, which is a good approxima- [6] to compute the HRTF in the frontal plane.