Forum: Short Papers

Localization Cues for a Magnified Head: Implications from Sound Diffraction about a Rigid Sphere

W. M. Rabinowitz, J. Maxwell, Y. Shao, and M. Wei
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Presence, Vol. 2, No. 2, Spring 1993, 125-129. © 1993 The Massachusetts Institute of Technology

1 Introduction

Auditory displays for teleoperation and virtual environments are currently receiving great attention. While some applications require displays that recreate the normal acoustic inputs to the ears, others may benefit from modifications designed to enable localization that exceeds normal performance. Systems for so-called "superauditory localization" can be realized in several ways (Durlach, Shinn-Cunningham, & Held, 1993). Most systems focus on magnifying only the cues for a source's direction; cues for a source's distance are not intentionally modified. One such approach that is directly applicable to virtual-environment (VE) systems presents localization cues that would correspond to having an enlarged head. From basic acoustic principles, such head scaling will alter the head-related transfer functions (HRTFs) from sound sources to each ear, producing increases in both interaural amplitude and time (or phase) differences, as well as increases in pinna cues.

In this paper, we quantify this notion for a range of parameters of interest. In particular, we assess the impact of simulating magnified-head listening conditions by using normal, empirically measured HRTFs that are frequency scaled by the inverse of the desired head-magnification factor. For example, to simulate a head of four times normal size, a normal HRTF(f) would be frequency scaled to HRTF(f/4); the resulting interaural differences for the scaled head at 1 kHz would then be represented by the differences that normally occur at 4 kHz. Insofar as this scaling is valid, simulations can be conveniently implemented using normal HRTFs. We shall see that the key variable of importance is the distance from the head to the sound source(s). General implications of the results for magnified-head VE systems and other related systems are given at the end of the paper.

2 Rigid Sphere Results

To obtain insight into the frequency scalability of HRTFs within a framework that is theoretically and computationally straightforward, we have analyzed the following simplified problem. HRTFs were computed as the pressure Ps along the surface of a rigid spherical "head" of radius a = 9 cm arising from a point source at distance r from the sphere's center. Due to sound diffraction, there are frequency- and angle-dependent changes in the magnitude and phase of the sound that would otherwise exist were the sphere absent. Following the derivation given in Morse and Ingard (1968, Sec. 7.2; for related review of head effects, see Kuhn, 1977; Blauert, 1983), Ps is given by

$$P_s = \frac{\rho c U_0}{4\pi a^2} \sum_{m=0}^{\infty} (2m+1)\, L_m(\cos\theta)\, \frac{H_m(kr)}{H'_m(ka)}\, e^{j(2\pi f t - \pi/2)} \tag{1}$$

where

ρ = density of air = 1.18 kg/m³
c = speed of sound = 344.8 m/sec
f = sound frequency
k = wave number = 2πf/c = 2π/λ, with λ = sound wavelength
a = radius of spherical head = 0.09 m
r = distance from source position to the center of the sphere
θ = angle between radii from the sphere's center to the sound source location and to the measurement position on the sphere's surface

Rabinowitz et al. 125

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/pres.1993.2.2.125 by guest on 01 October 2021 126 PRESENCE: VOLUME 2, NUMBER 2

U0 = volume velocity strength of the sound source (of infinitesimal extent)
Lm(cos θ) = Legendre polynomial functions in cos θ
Hm(kr) = spherical Hankel functions in kr
H'm(ka) = derivatives of spherical Hankel functions with respect to ka

Observe that the form of Ps, i.e., the dependence on f and θ, is determined by the arguments kr and ka. Scalings that do not change these arguments will, therefore, leave the form of Ps unchanged. In particular, scaling the sphere to a new (larger) radius a' = βa (with β > 1), while simultaneously scaling the source distance r' = βr, and inverse scaling the sound frequency (f', k') = (f/β, k/β), results in P's = Ps/β², which is unchanged except for an overall scaling by 1/β² that is independent of f and θ. This result is consistent with the well-known phenomenon that the solution of a geometric acoustics problem at a particular frequency equals the solution associated with scaling up (or down) all geometric dimensions of the problem and evaluating at a frequency that is scaled down (or up) by the same factor. This concept is usefully exploited in architectural acoustics when the sound characteristics of large halls are evaluated in advance using small-scale physical models tested at appropriately scaled up frequencies.

In the case of superauditory localization, we wish to magnify interaural differences by simulating an enlarged head. However, we probably will not want the external world to change. In other words, we will want the sound sources (and other objects) to remain at the same positions and, therefore, r' will remain at r rather than scaling to βr. The resulting sound pressure will be Ps(r, βa, f/β, θ) (which we shall denote as P*). Because this is not the same as the pressure Ps(βr, βa, f/β, θ) = P's = Ps/β², perfect scalability of HRTFs does not obtain. But how large are the differences between P* and Ps/β²?

If the sound sources are far from the sphere, then P* = Ps/β². This follows because as kr → ∞, the acoustic waves incident on both the normal and scaled spheres become planar, the effect of r (other than an overall amplitude scaling) disappears, and keeping k'a' = ka gives the above sound pressure equality. Thus, for simulations that either ignore source distance or consider only distal sound sources, perfect frequency scalability of HRTFs is obtained. However, as sound sources are positioned close to the scaled sphere (r → βa), the differences between P* and Ps/β² increase, with consequent errors in HRTF scalability. The size and form of these differences are illustrated below.

Polar sensitivity results for three frequencies (0.4, 2, and 10 kHz), two source distances (r = 1 and 4 m), and one scaling value (β = 4) are given in Figure 1. To emphasize the angular dependencies independent of overall magnitude changes, the plots have been normalized to unity at θ = 0°. Also, note that the specified frequency applies to the normal sphere (i.e., β = 1); for the scaled sphere, the results apply to the inverse-scaled frequency f/β. As one expects, the patterns generally become more directional with increasing f and with decreasing r. Of more relevance here, as r decreases, the angular differences between P* (dashed lines) and Ps (solid lines) increase but the forms of P*(θ) and Ps(θ) still remain similar. The main differences are summarized by the amount of fall-off in going from 0° to 180°. For r = 1 m and β = 4, this fall-off is about 7 dB more than that which occurs normally. For less extreme cases, such as β = 2 and/or r = 20 m (not shown), the differences are substantially smaller.

Frequency responses as functions of source distance, scaling factor, and the angle of source incidence were also computed. For normalization purposes, Ps (and P*) was divided by the "free-field" pressure Pff at the sphere-center location with the sphere absent. Pff is simply the spherical radiation pressure at a distance r from a point source:

$$P_{ff} = \frac{\rho c k U_0}{4\pi r}\, e^{j(2\pi f t - kr + \pi/2)} \tag{2}$$

As above, results at frequency f/β with the scaled sphere were compared to those at f with the normal sphere. Phase effects were examined via the group delay Tg = −dφ(f)/df, where φ(f) is the phase of the normalized sound pressure. For the scaled sphere, Tg(f/β)/β was used (and plotted below) to remove effects due to the overall change in distance (and Tg) associated with the scaling.
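The series in Eq. (1) and the normalization in Eq. (2) can be evaluated numerically with a few lines of code. The following is a minimal sketch, not the authors' implementation. It assumes that Hm denotes the spherical Hankel function of the second kind (the outgoing-wave choice under the e^{j2πft} time convention of Eq. (2)); the text does not state the kind. The function names, the upward recurrences, and the truncation rule are illustrative choices, and the common factor e^{j2πft} is omitted because it cancels in every ratio.

```python
import cmath
import math

RHO = 1.18   # density of air, kg/m^3 (value from the text)
C = 344.8    # speed of sound, m/s (value from the text)
A = 0.09     # radius of the normal sphere, m (value from the text)
U0 = 1.0     # source strength; arbitrary here because it cancels in every ratio


def series_sum(kr, ka, theta, tol=1e-12, max_terms=200):
    """Sum over m of (2m+1) L_m(cos theta) H_m(kr) / H'_m(ka), as in Eq. (1).

    Assumption: H_m is the spherical Hankel function of the second kind.
    """
    x = math.cos(theta)
    # Closed forms for orders 0 and 1; higher orders follow by upward recurrence.
    h_r_prev = 1j * cmath.exp(-1j * kr) / kr
    h_r = -cmath.exp(-1j * kr) * (kr - 1j) / kr**2
    h_a_prev = 1j * cmath.exp(-1j * ka) / ka
    h_a = -cmath.exp(-1j * ka) * (ka - 1j) / ka**2
    leg_prev, leg = 1.0, x                      # L_0 and L_1
    total = h_r_prev / (-h_a)                   # m = 0 term, using H'_0 = -H_1
    for m in range(1, max_terms):
        dh_a = h_a_prev - (m + 1) / ka * h_a    # H'_m(ka)
        ratio = h_r / dh_a
        total += (2 * m + 1) * leg * ratio
        # |L_m| <= 1, so this bounds the size of the term just added.
        if m > ka and (2 * m + 1) * abs(ratio) < tol * abs(total):
            break
        h_r_prev, h_r = h_r, (2 * m + 1) / kr * h_r - h_r_prev
        h_a_prev, h_a = h_a, (2 * m + 1) / ka * h_a - h_a_prev
        leg_prev, leg = leg, ((2 * m + 1) * x * leg - m * leg_prev) / (m + 1)
    return total


def sphere_pressure(r, a, f, theta):
    """Complex amplitude of Ps from Eq. (1), time factor omitted."""
    k = 2 * math.pi * f / C
    prefactor = RHO * C * U0 / (4 * math.pi * a**2)
    return prefactor * cmath.exp(-1j * math.pi / 2) * series_sum(k * r, k * a, theta)


def free_field(r, f):
    """Complex amplitude of Pff from Eq. (2), time factor omitted."""
    k = 2 * math.pi * f / C
    return RHO * C * k * U0 / (4 * math.pi * r) * cmath.exp(1j * (math.pi / 2 - k * r))


def falloff_db(r, a, f):
    """Level at theta = 0 deg minus level at theta = 180 deg, in dB."""
    front = abs(sphere_pressure(r, a, f, 0.0))
    back = abs(sphere_pressure(r, a, f, math.pi))
    return 20 * math.log10(front / back)


if __name__ == "__main__":
    beta, f, r = 4.0, 2000.0, 1.0
    normal = falloff_db(r, A, f)               # Ps: normal sphere at f
    starred = falloff_db(r, beta * A, f / beta)  # P*: scaled sphere, same r, at f/beta
    print(f"0-to-180 deg fall-off, normal sphere: {normal:.1f} dB")
    print(f"0-to-180 deg fall-off, scaled sphere: {starred:.1f} dB")
```

Scaling r, a, and 1/f together leaves kr and ka unchanged, so `sphere_pressure(beta * r, beta * A, f / beta, theta)` differs from `sphere_pressure(r, A, f, theta)` only by the factor 1/β² from the prefactor. Holding r fixed while scaling a and f changes kr, which is where the differences between P* and Ps/β² come from.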


[Figure 1 image not reproduced. Polar plots with panels a. 400 Hz, b. 2 kHz, c. 10 kHz.]

Figure 1. Polar responses for the normal sphere, Ps (solid lines), and for the sphere scaled by a factor of β = 4, P* (dashed lines). Results are given for normal (unscaled) frequencies of (a) 400, (b) 2000, and (c) 10,000 Hz and corresponding 1/β-scaled frequencies for the scaled conditions. At each frequency, results are given for source distances of r = 4 m (upper half circle) and 1 m (lower half circle); actual results are symmetric in θ. All results are normalized to 0 dB at θ = 0° and radial sensitivity is 10 dB per division.

[Figure 2 image not reproduced. Abscissa of each panel: FREQUENCY (Hz).]

Figure 2. Frequency responses for normal (solid line) and scaled (β = 4, dashed line) conditions, for source distances of r = 4 m (a, c) and 1 m (b, d), with magnitude (a, b) and group delay (c, d), and for two angles of incidence (θ = 0° and 120°, as labeled). Results are given relative to the reference free-field sphere-center sound pressure. The specified frequency abscissae refer to the normal (unscaled) conditions. For the scaled conditions, the results were computed at corresponding 1/β-scaled frequencies and the group delays are shown divided by β. Symbols indicate asymptotic model results (normal = squares, scaled = diamonds) at low frequencies, after Rayleigh (see text), plotted near 40 Hz, and at high frequencies, after Woodworth (see text), plotted near 14 kHz. Note that the predictions are sometimes independent of β (e.g., for group delay at θ = 0°) and the symbols superimpose.

Frequency responses for r = 1 and 4 m, β = 4, and θ = 0° and 120° are given in Figure 2. Consider first the results for r = 4 m with the normal sphere (Fig. 2a and c, the solid lines). At θ = 0° the magnitude increases from near 0 dB at low frequencies to 6 dB at high frequencies due to "pressure doubling"; at θ = 120° the magnitude falls off at high frequencies due to diffraction (i.e., "head shadow"). The group delays transition from low-frequency asymptotes near 390 and −200 µsec for θ = 0° and 120°, respectively, to high-frequency asymptotes near 260 and −140 µsec. (We use positive delay to mean that the sound at the measurement position leads that at the sphere-center free-field reference.) For scaling with β = 4 and r = 4 m, very similar results are obtained (compare the solid and dashed lines). For the smaller source distance (r = 1 m, Fig. 2b and d), larger differences exist between the normal and scaled spheres. At θ = 0° the scaled magnitude exceeds the normal magnitude by 4-6 dB and at 120° the scaled magnitude is lower than the normal magnitude by a similar range. Asymptotic differences in group delay are also evident, but the forms of the results for the scaled and normal spheres are, once again, generally similar. Also, as noted above, results for other intermediate conditions exhibit differences smaller than those in Figure 2.

Accounting for the "effective" distance between the source and the measurement position can explain part of the changes that result from sphere scaling (i.e., part of the differences between Ps and P*). For the polar dependencies, consider ignoring diffraction by removing the


sphere and using the "straight-line" distances from the source to the measurement positions. From Eq. (2), sound pressure is inversely proportional to distance. For the farthest and nearest measurement positions, these distances are r + βa and r − βa (for θ = 180° and 0°, respectively). Then, for r = 1 m and β = 4, the ratio of these distances compared to the ratio for the normal sphere is 5.0 dB, close to the relative difference of ~7 dB noted above.

Models that are slightly more sophisticated can account for parts of the asymptotic frequency responses. For low frequencies, diffraction effects are well modeled by removing the sphere and substituting a new measurement position that is along the original radius but at a distance of 1.5 times the original or scaled radius (i.e., at 1.5a or 1.5βa). This result is not intuitively obvious and appears to have been first recognized by Rayleigh in 1896 (see Rayleigh, 1945, p. 248). From Figure 2, the values from this simplified model (the symbols plotted near 40 Hz) are in good agreement with the exact calculations. The most significant discrepancy occurs with the source close to the scaled sphere (r = 1 m, β = 4) and θ = 0°; the model gives a delay of Tg/4 = 390 µsec (identical to the prediction for the normal sphere) whereas the actual delay is somewhat smaller, ~340 µsec. For very high frequencies, good predictions for asymptotic group delay follow from the model of Woodworth (Woodworth & Schlosberg, 1962); Tg is computed as the distance from the source to the point-of-tangency on the sphere plus the arclength distance along the sphere to the measurement position, all divided by the sound speed c. For θ = 0°, this prediction gives Tg = a/c = 261 µsec for the normal sphere relative to the sphere-center reference and β times this value for the scaled sphere (with division by β superimposing this point on that for the normal case in Fig. 2). For magnitude effects, we are unaware of a simplified model that accurately accounts for high-frequency diffraction. Finally, the transitions between the low- and high-frequency asymptotes for group delay are irregular and not simply modeled; nevertheless, these irregularities are moderate in extent and little would probably be lost by using a simplified transitional representation.

3 Discussion

The sound pressure along the surface of a scaled up rigid sphere computed at inversely scaled-down frequencies has been analyzed for sources whose locations remain fixed in space and which may be both near to and far from the sphere. In general, this sound pressure (P*) is close to the corresponding pressure along the normal (unscaled) sphere (Ps), with differences that become substantial only for sources that are relatively nearby. However, even in these cases, the differences are orderly with respect to the main parameters of interest: sound frequency and the angle of incidence. Several aspects of the results, in particular some of the asymptotic behaviors at low and high frequencies, can be understood using simplified models.

As a concise quantitative summary, the normalized magnitude and group-delay differences between P* and Ps are within about 2.5 dB and 20 µsec for source distances r > 1 m with a sphere magnification β = 2, and for r > 2.5 m with β = 4. These results apply for a normal (unscaled) frequency up to 20 kHz. For a scaled frequency up to 20 kHz, normal frequency goes to β times 20 kHz and, for the above conditions on r and β, magnitude differences between P* and Ps remain systematic with f and θ but increase to about 4 dB at the highest frequencies with θ = 180°.

The use of normal HRTFs at inversely scaled down frequencies appears to reproduce the main effects of a magnified head. Of course, HRTFs from a real head (as opposed to a sphere) exhibit effects due to the size and shape of the particular head, pinnae, and torso (and other factors such as hair, clothing, etc.) and, in contrast to the cylindrical symmetry that exists with a sphere, real HRTFs are dependent on both azimuth and elevation. Scaling HRTFs (via f/β) corresponds to uniformly scaling up all physical dimensions. Such scaling is appropriate for VE applications and is convenient to implement since normal HRTFs can be used. One caveat, however, is that the desired upper frequency limit for the magnified-head simulation requires the normal HRTFs to be available at β times this frequency.

In contrast to the above scaling of all dimensions, other means of head magnification might correspond to


scaling up only some dimensions. For example, a helmet-based scaling "pseudophone" is a device in which sound is picked up by left/right microphones (separated by a distance greater than the normal head width) and routed to the left/right ears. Since such a system requires real-world sound sources, it is not directly applicable to VE studies; however, it could be very useful to explore localization with a magnified head without some of the problems posed by scaling virtual audio displays, including time delays associated with head tracking, interpolating HRTFs between measured values, etc. Ignoring construction difficulties, a pseudophone would provide a means of conveniently magnifying the interaural separation, possibly including magnified pinnae and head diffraction, but magnifying the torso would be more challenging. A pseudophone also represents distance cues, whereas most virtual displays presently do not. From the above analysis, the localization cues from the use of a scaling pseudophone will exhibit interactions with source distance that are somewhat distorted (re normal). However, as emphasized recently in another context by Durlach, Woods, Kulkarni, Colburn, Rigopolous, Pang, and Wenzel (1992), the precision and accuracy required to represent HRTFs (and localization cues) probably depend both on the specific task objective and whether adequate opportunity for adaptation to the novel HRTFs has been provided. Clearly, for any studies involving head magnification, a subject must interpret a new set of interaural differences. Normal differences will occur at novel (reduced) frequencies and at a given frequency interaural differences will be magnified relative to their normal values. The necessity to adapt is, therefore, unavoidable.

Finally, for some VE studies, simulating a head that is reduced, as opposed to enlarged, may be of interest. The above rigid-sphere theory applies, except that all variables scale in the opposite directions, which can be realized simply by using β < 1. Thus, to simulate a head that is reduced in size by a factor of four, β = 1/4 and a normal HRTF(f) would be frequency scaled to HRTF(4f). From the analysis above, the main factor that causes differences between P* and Ps is the change in effective distance from the sound source to the measurement position. Because these distance changes are proportionately smaller for head minification than for magnification, the sound pressure differences will be reduced and the scalability of normal HRTFs will be more accurate.

Acknowledgments

Nat Durlach motivated this work and he, Barbara Shinn-Cunningham, David Zeltzer, and Abhijit Kulkarni provided useful comments on an earlier version of the paper. The work was supported by AFOSR Grant 90-0200 and NIH Grant R01 DC00270.

References

Blauert, J. (1983). Spatial hearing. Cambridge, MA: MIT Press.

Durlach, N. I., Rigopolous, A., Pang, X. D., Woods, W. S., Kulkarni, A., Colburn, H. S., & Wenzel, E. M. (1992). On the externalization of auditory images. Presence: Teleoperators and Virtual Environments, 1, 251-257.

Durlach, N. I., Shinn-Cunningham, B. G., & Held, R. M. (1993). Supernormal auditory localization I. General background. Presence: Teleoperators and Virtual Environments, 2, 89-103.

Kuhn, G. F. (1977). Model for the interaural time differences in the azimuthal plane. Journal of the Acoustical Society of America, 62, 157-167.

Morse, P. M., & Ingard, K. U. (1968). Theoretical acoustics. New York: McGraw-Hill.

Rayleigh, J. W. S. (1945). The theory of sound (Vol. 2). New York: Dover.

Woodworth, R. S., & Schlosberg, H. (1962). Experimental psychology. New York: Holt, Rinehart, & Winston.
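As a numerical footnote to the simplified models of Section 2 and to the effective-distance argument that closes the Discussion, the sketch below evaluates the straight-line distance ratio, the low-frequency model after Rayleigh (measurement point moved to 1.5 times the radius, sphere removed), and the high-frequency path-length model after Woodworth. It is an illustrative sketch under stated assumptions, not the authors' code. The function names are hypothetical, and using the direct straight-line path for measurement points that the source can see (as opposed to the tangent-plus-arc path for shadowed points) is an assumption consistent with the θ = 0° value a/c quoted in the text. Delays are returned as leads relative to the sphere-center free-field reference, matching the sign convention of the text.

```python
import math

C = 344.8   # speed of sound, m/s (value from the text)
A = 0.09    # radius of the normal sphere, m (value from the text)


def straight_line_falloff_db(r, radius):
    """Level difference between theta = 0 and 180 deg with the sphere removed.

    Pressure is taken as inversely proportional to distance, as in Eq. (2).
    """
    return 20 * math.log10((r + radius) / (r - radius))


def rayleigh_lead(r, radius, theta):
    """Low-frequency model: lead (s) of a point at 1.5 * radius, sphere removed."""
    rho = 1.5 * radius
    d = math.sqrt(r * r + rho * rho - 2 * r * rho * math.cos(theta))
    return (r - d) / C


def woodworth_lead(r, radius, theta):
    """High-frequency model: lead (s) from the shortest path around the sphere."""
    theta_t = math.acos(radius / r)   # angle of the point of tangency
    if theta <= theta_t:
        # Assumption: directly visible points use the straight-line path.
        path = math.sqrt(r * r + radius * radius - 2 * r * radius * math.cos(theta))
    else:
        # Tangent segment plus arc along the sphere, as described in the text.
        path = math.sqrt(r * r - radius * radius) + radius * (theta - theta_t)
    return (r - path) / C


if __name__ == "__main__":
    r = 1.0
    normal = straight_line_falloff_db(r, A)
    for beta in (4.0, 0.25):
        extra = straight_line_falloff_db(r, beta * A) - normal
        print(f"beta = {beta}: extra fall-off re normal = {extra:.2f} dB")
    for deg in (0.0, 120.0):
        th = math.radians(deg)
        print(f"theta = {deg:.0f} deg, r = 4 m: "
              f"Rayleigh {rayleigh_lead(4.0, A, th) * 1e6:.0f} usec, "
              f"Woodworth {woodworth_lead(4.0, A, th) * 1e6:.0f} usec")
```

For r = 1 m the extra fall-off with β = 4 comes out close to the 5.0 dB quoted in Section 2, while β = 1/4 gives a change of roughly 1 dB in the opposite direction, in line with the remark that distance changes are proportionately smaller for minification. For r = 4 m the two delay models land near the asymptotes reported for Figure 2 (about 390 and −200 µsec at low frequencies, about 260 and −140 µsec at high frequencies).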
