<<

Is My Decoder Ambisonic?

Aaron J. Heller Artificial Intelligence Center, SRI International Menlo Park, CA 94025, USA Email: [email protected] Richard Lee Eric M. Benjamin Pandit Littoral Dolby Laboratories Cooktown, Queensland 4895, AU San Francisco, CA 94044, USA Email: [email protected] Email: [email protected] Revision 2

Abstract In earlier papers, the present authors established the importance of various aspects of Am- bisonic decoder design: a decoding matrix matched to the geometry of the loudspeaker array in use, phase-matched shelf filters, and near-field compensation [1, 2]. These are needed for accurate reproduction of spatial localization cues, such as interaural time difference (ITD), interaural level difference (ILD), and distance cues. Unfortunately, many listening tests of Ambisonic reproduction reported in the literature either omit the details of the decoding used or utilize suboptimal decoding. In this paper we review the acoustic and psychoacoustic criteria for Ambisonic reproduc- tion, present a methodology and tools for “black box” testing to verify the performance of a candidate decoder, and present and discuss the results of this testing on some widely used decoders.

Keywords: Ambisonics, decoder design, listening tests, surround , psychoacoustics

NOTE: This is a revised and corrected version of the paper presented at the 125th Convention of the Audio Engineering Society, held October 1 – 5, 2008 in San Francisco, CA USA

$Id: imda-revised.tex 25737 2012-06-10 23:40:07Z heller $

1 1 INTRODUCTION rience with good Ambisonic reproduction, we This paper is about testing Ambisonic de- might have stopped there and written off Am- coders. The decoder is the component of an bisonics as yet another failed surround sound Ambisonic reproduction system that derives the technology. loudspeaker signals from the program signals. Instead, we went to the benchmark of Unlike most other surround sound systems in good Ambisonic playback, what are known in- which each channel of a recording is intended formally as Classic Ambisonic Decoders the to drive a single loudspeaker directly, an Am- hardware-based decoders designed by the origi- bisonic recording can be played back on a va- nal Ambisonics team [5] and built up an offline, riety of speaker layouts, both 2-D and 3-D, by file-to-file decoding workflow that mimicked the using an appropriate decoder. processing performed by those decoders. Since A key feature of Ambisonic theory is that it each step produced an intermediate file, we were provides a mathematical encapsulation of prac- able to verify that our implementation was per- tically all known auditory localization models, forming as expected. The techniques described except the pinna coloration and impulsive (high- in this paper are a formalization and extension ) interaural time delay models. These of this verification process. mathematical descriptions can be used to prove Finally, by using a playback program that theorems about surround sound recording and provided synchronized playback of a number of reproduction, predict what spatial information files and rapid switching among them, we were can and cannot be conveyed by a particular sys- able to gain an understanding of the effects of tem, guide the design of decoders, and as dis- each of the key components in an Ambisonic cussed in this paper, evaluate and validate im- decoder: a decoding matrix matched to the ge- plementations. ometry of the loudspeaker array in use, phase- We assume that the reader is familiar with matched shelf filters, and near-field compensa- the basic workings of surround sound in gen- tion (NFC). These listening tests demonstrated eral and Ambisonics in particular. Background that using the correct decoder results in dramat- material on these topics, as well as sample Am- ically improved performance [1, 2]. bisonic recordings, can be found at various web- A number of recent papers have reported sites [3, 4]. on the results of Ambisonic listening tests that Our interest in determining whether or not a have used decoders that are clearly faulty or given decoder meets the criteria for Ambisonic employed incorrectly. As an example, in ref- reproduction is motivated by practical consider- erence [6] the authors compare various spatial- ations. When we first conducted listening tests, ization techniques, including Ambisonics. The we did what many do: obtained some - methodology used was well thought out, but un- ings made with a Soundfield , set up fortunately the used to decode the Am- six loudspeakers in a hexagon, downloaded a de- bisonic program material may not have been the coder off the Internet, and listened with the de- most appropriate: fault settings. What we experienced was quite “The ‘in-phase’ ambisonic decoder confusing completely ambiguous localization was selected as it is recommended and severe comb filtering artifacts from slight for larger rooms and listening areas, head movements. Over the next few listening preventing anti-phase signals to be sessions, we tried other software decoders and fed to the loudspeaker opposite to other settings with different but equally unsat- the sound source.” isfying results. Had we not had previous expe- Later in the paper, the authors conclude that

2 Ambisonics provides poor localization. How- want to design a decoder; they simply want to ever, given that the listening tests were per- verify that an existing one works properly and formed with single listeners using a speaker then use it. array with 2-meter radius, the best (known) Good engineering practice dictates that the methodology for decoding would have been to behaviors of the individual components of a sys- perform exact, or velocity decoding at low fre- tem under test be verified so that its overall per- quencies, energy decoding at middle and high formance can be properly characterized. While and use near-field compensation. the design criteria have been outlined or implied Another recent paper discusses “spectral in many papers, we have found no discussion of impairment” in Ambisonic reproduction [7]. tools or methodologies to assess how well they While not stated explicitly, a careful reading re- have been met in a given implementation. veals that exact decoding was used over the en- We confine the discussion in this paper to de- tire frequency range. This is known to produce coders suitable for one or two listeners.1 In this comb-filtering artifacts, and again, better decod- paper we test only horizontal, first-order Am- ing strategies are known. bisonic decoders; however, the extensions for Craven sums up the current situation as fol- full 3-D reproduction (periphony) and arbitrary lows [8], orders are straightforward. There are a number of additional factors, any There may be doubt about the of which can have a large effect on the quality meaning of the term “Ambison- of playback but are beyond the scope of what ics”. The precise definition given is discussed here, such as room acoustics, accu- in [Sec. 1.1 of this paper] seems to racy of speaker positioning and matching, tim- have been largely ignored, and the ing skew in multichannel D/A converters, and so term “ambisonic” is now applied forth. Simply due to the number of interconnec- loosely to any system that makes tions, speakers, and amplifiers in a typical Am- use of circular or spherical harmon- bisonics playback system, the odds of making a ics. setup error are much higher than in the case of Most software decoders have many adjust- stereo and the faults more difficult to diagnose ments, but their authors provide little or no guid- than, for example, a channel reversal in stereo ance on appropriate settings for various play- reproduction. back situations, making it difficult for a user to Due to space limitations, we test just four know if they are functioning correctly without √decoders and a single speaker configuration, the extensive listening tests. We have read many ac- 3 : 1 rectangle. In this configuration there are counts of “phasey”, “ambiguous”, or “unpleas- four speakers at azimuths 30 and 150 de- ant” Ambisonic reproduction that can be at- grees in the horizontal plane. It was preferred tributed to this problem. In particular, phasey over a square layout in previous listening tests, reproduction will occur when exact velocity de- as well as being easier to fit in most domestic coding is used at higher frequencies, where the rooms. are smaller than the inter-ear dis- 1Design of decoders that work well over large areas is tance. a distinct art and in general involves additional constraints The key point here is that it is not enough to that compromise their performance for small areas. [9] simply specify that an Ambisonic decoder was used; not all decoders or decoder philosophies perform in the same way. It is also worth to noting that various workers in the field may not

3 1.1 Definition of Ambisonics tion of the localization perception, and the mag- In summary, we are trying to decide if a given nitude indicates the quality of the localization. decoder and loudspeaker configuration meet the In natural hearing, from a single source the mag- criteria for Ambisonic reproduction as defined nitude of each is exactly 1 and the direction is by Gerzon in [10]. the direction to the source. Ideally, both types of cue will be accurately A decoder or reproduction system recreated by a multispeaker playback environ- ◦ for 360 surround sound is defined ment and they will be in agreement with each to be Ambisonic if, for a central lis- other. In terms of Gerzon’s models this means tening position, it is designed such that rV and rE should agree in direction up to that around 4 kHz; that below 400 Hz, the magnitude of rV is near unity for all reproduced directions; i) velocity and energy vec- | | 2 and that between 700 Hz and 4 kHz, rE is max- tor directions are the same at imized over as many reproduced directions as least up to around 4 kHz, such possible. |rE| achieves a maximum value of 1 that the reproduced azimuth for a single source and is always less than 1 for θ θ V = E is substantially multiple sources. Gerzon observes that a value unchanged with frequency, less than 0.5 “gives rather poor image stability.” ii) at low frequencies, say below [11] around 400 Hz, the magnitude Following Gerzon [12], the magnitude and of the velocity vector is near direction of the velocity vector, rV and rˆV, at the unity for all reproduced az- center of a speaker array with n speakers is imuths, n ∑i=1 Giuˆi iii) at mid/high frequencies, say rV rˆV = Re (1) ∑n G between around 700 Hz and i=1 i 4 kHz, the energy vector mag- whereas the magnitude and direction of the en- nitude, rE, is substantially ergy vector, rE and rˆE are computed by maximised across as large a ◦ n ∗ part of the 360 sound stage as ∑i=1(GiGi )uˆi rErˆE = n ∗ (2) possible. ∑i=1(GiGi )

We feel that these are necessary, if perhaps not where the Gi are the (possibly complex) gains sufficient, conditions for good surround sound from the source to the i-th speaker, uˆ is a unit ∗ reproduction. vector in the direction of the speaker, and Gi is the complex conjugate of Gi. 2 REVIEW OF AMBISONIC CRITERIA The main goal of the test protocol outlined Gerzon defines two primitive models that are in Sec. 3 is to recover the Gi’s used by the de- characterized by the velocity localization vec- coder under test for a given source direction and speaker configuration. In the general case, they tor (rV) and energy localization vector (rE). ∗ These models encapsulate the primary Interau- vary with frequency; hence, Gi and GiGi can be ral Time Difference (ITD) and Interaural Level thought of as the complex frequency and energy Difference (ILD) theories of auditory localiza- responses of the decoder for a particular direc- tion. The direction of each indicates the direc- tion. The remaining parameters are the imaginary 2Precise definitions of these are given in Sec. 2.

4 parts of velocity localization vector ing time of this file is 109.2 seconds. 3 This file is then applied directly to the in- ∑n G uˆ Im i=1 i i (3) put of a software decoder, or played out though ∑n i=1 Gi a multichannel soundcard into a hardware de- coder, and the output recorded for subsequent which correspond to “phaseyness” arising from analysis. In either case the intermediate output using filters whose phase responses are not of the testing process is a file containing the re- matched. The most important part of this is the sultant loudspeaker feeds derived by the decoder Y-component, the direction of the ear axis, over for the particular speaker configuration. The the frequency range 300 to 1500 Hz, and should sync signal is recorded directly into the output be as close to zero as possible [12]. file, without passing through the decoder under In general, optimizing the r and r vectors V E test. A screen capture showing this process is requires the use of a different decoding matrix shown in Figure 2. for each frequency range. This can be accom- In the current work, 72 horizontal directions plished with shelf filters or band-splitting filters are used and the four loudspeaker feeds captured similar to those used in loudspeaker crossovers. to the file, resulting in 288 impulse response In either case, it is imperative that the filters are (IR) measurements for each decoder configura- phase matched to preserve uniform frequency tion tested. response over all directions. To perform the analysis, the complex fre- Last, near-field compensation corrects for quency and energy responses are computed for the reactive component of the reproduced each IR, yielding the G s needed to compute r soundfield when the listening position is within i V and r according to Eqns. 1 and 2. By exam- a few meters of the loudspeakers. E ining these results, we can evaluate the decoder against Gerzon’s criteria for Ambisonic repro- 3 TEST PROTOCOL duction as well as our other criteria. It is not necessarily straightforward to determine It is worth noting that a number of methods whether a decoder is operating optimally sim- can be used to measure the impulse response ply by inspecting the software or listening to of a system. A survey of these techniques can the output. They must be tested in order to be found in Stan, et al. [15]. Any of these verify that the desired characteristics have been techniques should work for this analysis. For achieved. our current purposes, we have selected the sim- To do that, a test signal was created con- plest one since it provides adequate signal-to- 16 sisting of unit impulses at 2 -sample intervals. noise (S/N) ratio for direct testing of software This signal was encoded according to the B- decoders and removes the deconvolution pro- format conventions (see Appendix B) to create cess as a potential source of errors. To test hard- a series of unit impulses from varying source di- ware decoders, more sophisticated IR measure- rections. This test signal is the equivalent to the ment techniques, such as MLS or Sine Sweep, output of a virtual soundfield microphone with are needed to deal with the lower S/N ratio and a virtual source that is moved from one angu- possibly higher distortion levels found in analog lar position to the next. The original series of circuitry. impulses is included on an additional channel in the test file to act as a sync signal to simplify the 3Matlab code to generate this test file, along with the later analysis. A plot of the test file is shown in code discussed in Sec. 3.1 is available on our website [13]. Figure 1. At 48 kHz sample frequency, the play-

5 Figure 1: A plot of the B-format impulses file encoding an impulse from 72 source directions in the horizontal plane. The signals from top to bottom are W, X, Y, Z, and original impulses. They are offset vertically on the plot for clarity. The original impulse is included as a sync pulse that is recorded directly to the output file to simplify the analysis process. The first and last four impulses in the file excite each B-format component of the decoder independently. These are not used in the current analysis.

3.1 Analysis speaker. It also returns the sample rate of the We have created a set of tools in Matlab to an- file, Fs. Optional return values are the raw sam- alyze the recorded impulse responses and pro- ples and an array containing the locations of the sync pulses. duce various plots, which give us insight into the behavior of the decoders. All the Matlab [ imps_dir_spkr Fs ] = ... code used in this paper is available on our web- read_spkr_imps( file ); site [13]. The next function, compute_ffts_imps, takes The first function, read_spkr_imps, reads the imps_dir_spkr array as input and com- the speaker feeds file, normalizes the range putes the FFT of each impulse. This returns of the data to fullscale = 1.0, extracts the a complex valued array, with the same indices sync pulses from the sync track, and then uses as above, but with frequency instead of sample them to extract the individual impulses. It re- number. It also returns the length of the FFT. By 16 × × turns a 2 72 4 real-valued array, called slicing though this array along various dimen- imps_dir_spkr in this example. The indices sions, we obtain the data we need to compute to each dimension represent sample number, the parameters of interest. source direction in 5 degree increments, and [ ffts_dir_spkr NFFT ] = ...

6 Figure 2: A screen capture of Plogue Bidule [14] being used as a test harness for a decoder avail- able as a VST plug-in. This works for Windows and MacOSX, which also supports Apple Audio Unit plug-ins. A similar setup using Jack and is used to test -hosted decoders.

compute_ffts_imps( imps_dir_spkr ); Epxyz = sum_pxyz( ffts_dir_spkr .* ... conj(ffts_dir_spkr), ... √Next, the speaker positions are specified. In the speaker_weights ); 3 : 1 rectangular array used here, there are four ◦ loudspeakers, with 60 separation between the The absolute value of Vpxyz is used to exam- pairs of speakers in the front and rear. It is im- ine the frequency response of the pressure and portant that the order of these corresponds to the velocity components to determine whether or order of the recorded speaker signals in the data not the decoder under test implements near-field file. correction and dual-band processing. We also phi = pi/6; % frontal spkrs half-angle compare the frequency and phase responses in speaker_weights = [ ... various directions to check that they are consis- 1, cos( phi), sin( phi), 0 ; tent. 1, cos(pi-phi), sin(pi-phi), 0 ; rV and rE are now computed by normaliz- 1, cos(pi+phi), sin(pi+phi), 0 ; ing by the pressure component and converting 1, cos( -phi), sin( -phi), 0 ]; to spherical coordinates to yield the direction and magnitude. rVcart is complex. The real The next functions compute the unnormalized parts comprise rV. The imaginary parts, and in components (Pressure, X, Y, and Z) of rV and particular the one parallel to the Y-axis, indicate rE, by summing the ffts and the ffts squared phaseyness due to use of filters that are not phase weighted by the speaker locations. Vpxyz and matched. Epxyz are indexed by frequency, direction, and component. The values in Vpxyz are complex, [ rVsph rVcart ] = r_from_pxyz( Vpxyz ); those in Epxyz are real. [ rEsph rEcart ] = r_from_pxyz( Epxyz ); Vpxyz = sum_pxyz( ffts_dir_spkr, ... speaker_weights ); At this point, polar plots of rV and rE are created at various frequencies and evaluated according

7 to the Ambisonic criteria discussed in Sec. 2. is assuming that also holds for irregular arrays. Moore and Wakefield have published schemes Sec. 5.3 discusses the effect of this error. In our for prioritizing and weighting these criteria to experience, most software Ambisonic decoders produce a single figure of merit suiteable for use that can be downloaded are of this type. in numerical optimization methods [16, 17]. The exact decoder matrix recreates the pres- sure and velocity at the central position under 4 KEY COMPONENTS OF AN the assumption that the wavefronts are planar, AMBISONIC DECODER i.e., sources and loudspeakers at an infinite dis- All decoders must perform the fundamental tance. Sources and loudspeakers at finite dis- function of forming suitable linear combinations tances produce wavefronts with a “reactive” (or of the B-format signals for each loudspeaker in imaginary) component, which is perpendicular the array that reproduces the pressure and parti- to the direction of propagation, in addition to cle velocity at the central position in the array. the “real” component, which is parallel to the di- This set of linear combinations is called the ex- rection of propagation. This results in the well- act or matching decoder matrix. It is also called known bass-boosting proximity effect in direc- the basic or systematic solution of the speaker tional . It is important to under- array or simply the velocity decode. Regardless stand that this is an actual physical effect, not a 4 of what it is called, it is unique for each loud- design flaw in the microphone or loudspeaker. speaker array geometry. For point sources, the frequency at which the re- In general, there are three types of loud- active and real components are equal is given by speaker arrays: c f = (4) 2πd 1. regular polygons and polyhedra, such as square, hexagon, cube, dodecahedron where c is the speed of sound and d is the dis- tance from the loudspeaker [20]. 2. irregular layouts but with speakers in√ dia- In terms of the velocity localization vector, metrically opposite pairs, such the 3 : 1 this makes rV > 1 at low frequencies, which has rectangle tested here and its 3-D coun- the effect of widening the source images and terparts the bi- and tri-rectangular arrays. making them unstable with head rotation. This This are also called semiregular arrays in artifact is most apparent in recordings of string some literature. trios and quartets, where the cello as if it 3. general irregular arrays such as an ITU 5.1 is somewhat larger and closer than the other in- array or hemispherical dome. struments. To compensate for this, a single-pole high-pass filter is applied to the velocity signals In all cases the number of loudspeakers must ex- in the decoder. We call this near-field compen- ceed the number of B-format signals. sation. This is also refered to as distance com- A method for deriving the decoder matrix pensation in the older literature. The design of for the first two types is given in Appendix A. 4The implication for B-format signal encoding is that Methods for the third type remain an area of the X, Y, and Z signals must have a low-frequency boost open research [10, 18, 19]. In the case of strict and phase shift relative to the W signal, the amount of regular arrays (type 1), this reduces to the re- which is a function of the source distance. For natu- sult that the decoding and encoding matrices are ral acoustic sources, a properly aligned Soundfield-type microphone does this by virtue of accurate transduction identical, with the speaker positions substituted of the incident wavefronts, and thereby encodes distance. for the source positions. The single most perva- For synthetic sources, this must be included in the encod- sive error in Ambisonic decoder design and use ing equations. Further discussion of this is in Appendix B.

8 this filter is covered in Appendix A. values of k are often called “Max-rE” or “energy This exact reproduction of acoustic pressure decodes.” and velocity is equivalent to the condition rV = 1 Next we must apportion the total between in Gerzon’s velocity localization model. In the- boost for pressure and cut for velocity in such ory that happens at only a single point in space; a way that the overall loudness and balance however, in practice, it is good enough up to between low and high frequencies is main- roughly one-half from the central tained. One approach is preserving the root- position. If we desire reconstruction over an mean-square (RMS) level.7 In the 2-D case area of 0.5-meter, the exact decoder matrix can W 2 + X2 +Y 2 = 3 at both LF & HF (7) be used up to roughly 350 Hz and corresponds √ to the frequency regime of ITD-based auditory X Y 2 = = at HF only (8) localization. If it is used beyond that frequency, W W 2 comb filter artifacts and in-head localization ef- fects will be experienced by the listener. This is solving for the HF gains probably the second most common error in Am- √ 3 bisonic reproduction. W = ≈ +1.76 dB (9) At higher frequencies, say 700 to 4000 Hz, √2 ILD-based auditory localization models are ap- 3 X = Y = ≈ −1.25 dB (10) propriate, which Gerzon encapsulates in the en- 4 ergy localization vector, r . The one parameter E In the 3-D case that can be changed in the exact solution with- out changing the direction of the velocity local- W 2 + X2 +Y 2 + Z2 = 4 at both LF & HF ization vector, rˆV, is the velocity-to-pressure ra- (11) tio (i.e, the gain of X, Y, and Z vs. W), usually √ denoted by k.5 Writing the magnitude of the en- X Y Z 3 = = = at HF only (12) ergy localization vector, rE, as a function of k, W W W 3 for any regular 2-D polygonal array with at least four speakers, we get solving for the HF gains √ ≈ 2k W = 2 +3.01 dB (13) r (k) = (5) √ E 2k2 + 1 2 X = Y = Z = ≈ −1.76 dB (14) 3 which√ attains its maximum value when k = 2 ≈ ≈ − 6 2 0.7071 3.01 dB. In the case of a regu- Note that the gains derived here for the 3- lar 3-D polyhedral array, with at least six speak- D case differ from those given by Gerzon in ers, we get 2k 7In our listening tests with small numbers of loud- rE(k) = 2 (6) speakers, this provides a tonal balance similar to a single- 3k + 1 band energy decode. However, Adriaensen has observed which√ attains its maximum value when k = that with a 16-speaker array reproducing first- and second- 3 ≈ ≈ − order material, there is a quite apparent low-frequency 3 0.5774 4.77 dB. Figure 3 shows graphs of these equations. Solutions with these boost unless the decoder is set up assuming that pressures are added [21]. This is because very low frequencies sum 5k is equivalent to the inverse of the specific acoustic coherently, so equalizing the RMS levels is not correct in impedance. that regime. The effect is also present in smaller arrays, 6found by setting the derivative with respect to k equal but to a lesser extent, and results in the subjective impres- to zero and finding the roots sion of extended bass. The best way to treat this is an open question at this time.

9 [11]. We have auditioned both sets using a bi- rectangle array and find that the present ones yield a similar tonal balance to 2-D playback on a rectangular rig with similar aspect ra- tio, whereas Gerzon’s provide a distinctly more rolled off and spatially diffuse balance. For semiregular speaker arrays (“type 2”), the magnitude of rE varies in direct proportion to the angular density of the loudspeakers in a given direction, but for first-order√ Ambisonics 2 the average value√ cannot exceed 2 for hori- 3 zontal arrays√ and 3 for 3-D arrays. Gerzon 3 notes that 3 is “perilously close to being un- Figure 3: Plots of rE as a function of the satisfactory” [11]. However, in most periphonic velocity-to-pressure ratio k. The top curve (with height) systems, practical and domestic shows the 2-D case, the bottom curve√ shows√ the considerations often dictate that there will be 2 3 3-D case. The maximum values are 2 and 3 , more speakers in near horizontal than vertical respectively. directions. Localization is better in directions with more speakers hence, our preference for a rectangle horizontal array over a square for pre- “morph” the exact solution into the energy so- dominantly frontal source material. However, lution. A more flexible strategy, first suggested Ambisonic systems have a clear advantage over by Barton [10], is to split each B-format sig- other surround systems in that ambient/diffuse nal into two bands so that two independent so- sounds are still perceived realistically even from lutions can be used. We call this a dual-band directions with “poor localization.” decoder. It requires the use of phase-matched A physical interpretation of the energy de- band-splitting filters similar to those used in code is that for a square array, a source directly loudspeaker crossover networks. The design of ahead (azimuth zero), is reproduced with equal such filters is discussed in Appendix A. gain in the two front speakers and with zero gain In summary, all the key components in the two rear speakers. That is, the virtual mi- • decoding matrices matched to the speaker crophone pattern formed by the gains from a di- array geometry rectly frontal source to the speakers is a near- supercardioid, with the two nulls at the angu- • near-field compensation lar locations of the rear speakers. The same is • true in 3-D of a cube array; a frontal sound with frequency-dependent gains (shelf filters or azimuth and elevation zero, is reproduced with a dual-band) optimizing for ITD and ILD equal gain in the front speakers and with zero cues using phase-matched filters gain in the rear speakers. are needed to satisfy various localization mecha- This suggests that for optimal reproduction nisms. It is this compensation for important psy- two decoding matrices are needed, one for low choacoustic phenomena by simple means that frequencies with k = 1√and another√ for high makes a decoder Ambisonic. 2 3 frequencies, with k = 2 or 3 for the 2-D Anecdotal evidence suggests that us- and 3-D cases, respectively. Classic Ambisonic ing shelf or band-splitting filters with small Decoders used phase-matched shelf filters to phase-matching errors is better than omitting

10 frequency-dependent processing all together. was set to 2.0 meters and dual-band decoding This remains to be confirmed with formal was turned on with 380 Hz crossover. listening tests, however. Finally, if one must Results are shown in Figure 4. Examining use a decoder without frequency-dependent the frequency and phase response graph, we see processing, the best result will be obtained that NFC is implemented and has the correct - using the energy-optimized values of k for all 3 dB point for 2 meter distance (27 Hz). The frequencies [1]. NFC (correctly) has a large effect on the low- frequency gain and phase response of the ve- 5 EXAMPLES locity signals, which makes the magnitude of We review general classes of decoders and dis- rV less than 1 (0.91) and introduces a phase- cuss the results of testing on four samples. Two matching error between pressure and velocity. perform well and two do not perform√ well for These are intended to be the complement of the the given speaker configuration, the 3 : 1 rect- near-field effect of the loudspeakers, so that at angle. the listening position, the magnitude of rV will be exactly 1 and the phase mismatch will be 0. 5.1 Types of Decoders To examine the frequency and phase re- sponse of the band-splitting filter in isolation, Beyond the issues of operating system, audio we ran a second test with NFC turned off. and user interfaces, the main distinguishing at- These results are shown in Figure 5. Frequency- tributes of decoders are which of the three nec- dependent gains are implemented with the cor- essary components discussed in Sec. 4 are im- rect values of k, and the phase responses of the plemented and how the decoding matrix is spec- filters are matched. Also note that the pressure ified to the decoder. and velocity gains are identical over all source Some decoders provide presets such as azimuths. square, rectangle, pentagon, hexagon, cube, do- Examining the polar plots of r and r , we decahedron, and so forth. With others the angu- V E see that all source azimuths are rendered cor- lar position of the speaker is entered along with rectly at both high and low frequencies. At the directivity of a virtual microphone or gains low frequencies the magnitude of r is (almost) of the various orders of spherical harmonics. In V 1 and at high frequencies the magnitude of r the third type, the decoding matrix is specified E varies smoothly and has the highest attainable√ directly. 2 average for a first-order decoder of 2 . The 5.2 Decoder 1 magnitude of rV is slightly less than 1 (0.95) be- cause the shelf filters are already affecting the Decoder 1 is Adriaensen’s AmbDec [22]. Ver- gain at 150 Hz, as seen in Figure 5(c). sion 0.2.0 was tested on a dual AMD Athlon This decoder and configuration is Am- machine running Fedora Core 8 and the Planet bisonic. CCRMA distribution of audio software [23]. It has provisions for near-field compensation, 5.3 Decoder 2 dual-band processing, and a number of other features. The decoding matrices are specified di- Decoder 2 is a VST plugin that was tested on a rectly in the configuration file. The distribution MacBook Pro running OS X 10.5 using Bidule as a host program.8 The GUI has sliders used contains presets files for common√ loudspeaker arrays, but does not include the 3 : 1 array. 8We do not identify Decoder 2 since many decoders It was tested with decoding matrices derived by appear to use the same underlying approach. We suspect the procedure outlined in Appendix A. Distance that any one of them would have produced results no bet-

11 (a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response

Figure 4: AmbDec with configuration derived by the procedure given in Appendix A. This is a very good result. (a) and (b) show rV and rE at 150 Hz and 3 kHz. Source directions are correct and matched.√ The magnitude of rV is uniform in all directions and rE at 3 kHz attains an average value 2 of 2 . (c) shows that NFC and dual-band processing is implemented. The next figure shows the same configuration with NFC switched off so that frequency and phase response can be examined in isolation.

12 (a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response

(d) phase mismatch between pressure and velocity

Figure 5: AmbDec with NFC switched off. Only three lines are seen in (c) because the pressure and velocity phase responses are identical.

13 to specify azimuth, elevation, directivity, and compiler or an audio programming language distance for each loudspeaker. There is also a [24]. The sound processing elements are called switch to turn shelf filters on and off. The shelv- opcodes, which are connected and invoked us- ing frequency is not listed in the documentation, ing orchestra and score files. The documenta- nor is there any mention of near-field compensa- tion for bformdec says “New in Version 5.07 tion. It was tested with shelf filters switched on (October 2007).” This opcode does not have and the azimuths of the four speakers entered on provision for the rectangular configuration we the sliders. All other settings were left in their are using; however, it is still useful to include default positions. this test as an example of a decoder that is in Testing results are shown in Figure 6. No widespread use. The configuration tested is the near-field compensation is provided. Shelf fil- square speaker layout. ters are implemented with a turnover frequency The test was conducted with the following of about 415 Hz. The magnitude of rV varies orchestra file: with source direction, and its direction does not sr = 48000 match the source direction. For some source di- kr = 4800 > . rections rV 1 0. Frequency response varies ksmps = 10 with source direction. At 3 kHz, the magni- nchnls = 5 tude of rE is < 0.5 from 45 to 135 degrees az- imuth. Note that over these azimuths, the shelf instr 1 filters merely make a suboptimal rE even worse. a1 soundin "bf-1-sync.wav" We emphasize that shelf filters will have the in- aw soundin "bf-2-w.wav" tended effect only when paired with the correct ax soundin "bf-3-x.wav" decoder matrix for the speaker array geometry. ay soundin "bf-4-y.wav" There is a small (15◦) error in phase matching az soundin "bf-5-z.wav" near the crossover frequency. a2, a3, a4, a5 bformdec 2, aw, ax, ay, az In summary, this decoder is not suitable for outc a1, a2, a3, a4, a5 use with this speaker configuration when set up endin according to the instructions provided. Test results are shown in Figure 7. To be fair, it is possible to derive appropri- The most apparent feature of these results is ate virtual microphone angles and directivities that the frequency response is perfectly flat; nei- from the exact decoder matrix, enter those into ther NFC nor shelf filters are implemented. At decoders of this type and obtain somewhat better low frequencies the magnitude of rV is 1 and performance than we observe here. In the gen- source directions are rendered correctly. How- eral case, the virtual microphones will not point ever, at high frequencies it remains 1, which will at the speaker positions. See Sec. A.1.2 for fur- result in in-head localization and comb filtering ther discussion and examples. artifacts with head movement. This also puts the high-frequency rE magnitude well below the op- 5.4 Decoder 3 timum value. This decoder should not be used as Decoder 3 is Csound’s bformdec Opcode.9 currently implemented. Csound is a computer programming language If one is forced to use this decoder, reducing for dealing with sound, also known as a sound the levels of X, Y, and Z by 3 dB will produce ter than the one we happened to test. an “energy decode,” which was recommended in 9Csound tests were performed by Sven Bien from Bre- [1] in cases where no shelf filters are available. men University.

14 (a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response for a 0◦ (d) frequency and phase response for a 90◦ azimuth source azimuth source

(e) phase mismatch between pressure and velocity

Figure 6: Decoder 2

15 (a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response for all source azimuths

Figure 7: Decoder 3 Csound’s bformdec Opcode. Only two of the four lines are visible in (c) because the pressure and velocity responses are identical.

16 (a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency response with and without NFC (d) phase mismatch between pressure and velocity

Figure 8: Decoder 4 Minim Analog Ambisonic Decoder.

17 5.5 Decoder 4 sense of envelopment, with no audible ar- Decoder 4 is a Mimim AD-10 decoder whose tifacts. Ambisonic processing is similar to the model • With Decoder 2 all sources were localized described here [5] and is an example of a Classic to the front or rear, with no sense of envel- Ambisonic Decoder. It is an all-analog imple- opment and a very narrow “sound stage” mentation of an Ambisonic decoder for horizon- on orchestral test material. tal decoding to four or six loudspeakers. It has a switch to turn NFC on and off and a “layout” • We were not able to audition the Csound adjustment to accommodate speaker arrays with decoder directly, but did simulate it in aspect ratios ranging from 1 : 2 to 2 : 1. For these Bidule using the measured parameters, tests, the layout control was set to “3 o’clock”, and can confirm that the predicted comb- which√ was our best estimate of the setting for filtering and in-head localization artifacts the 3 : 1 array used for our analysis. NFC was are quite apparent. switched on. The test was performed using an Audio Precision analyzer. Results are shown in 7 DISCUSSION Figure 8. As can be seen from the graphs, cor- From the measurements reported above, it may rect shelf filters and NFC are implemented. At be safely concluded that not all available Am- low frequencies, rV = 1 and source directions bisonic decoders perform according to the re- are rendered correctly. At high frequencies, the quirements set down in Sec. 2, and in fact most maximum possible values of rE are achieved, in- do not! The most common problems that were dicating that this decoder is operating near opti- found in decoders are mally for the speaker array used for these tests. The slight “egg shape” in rE at low frequencies • Incorrect coefficients for rectangular or is most likely due to a gain mismatch between other nonregular polygonal loudspeaker the front and rear loudspeaker outputs. arrays This decoder is Ambisonic. • Lack of dual-band decoding 6 LISTENING TESTS • Lack of near-field compensation One of the authors carried out informal listening tests in a domestic setting using some of the test These omissions or errors cause the audio repro- files employed in earlier listening tests: voice duction to suffer from poor localization behav- announcements in eight directions, continuously ior, phasiness, and other artifacts. panned pink noise, a recording of a classical chamber orchestra, and the applause that fol- The regular polygon dilemma lowed the performance. The last was recorded In General Metatheory, Gerzon described a with a Calrec Soundfield Microphone MkIV, se- “naïve decoder for a regular polygon loud- rial number 099, with original capsules and cal- speaker Layout,” with the following form: ibration. The results broadly confirm the results of the W + Xcosθ + Y sinθ (15) testing described in this paper. for which he proved that “the Makita and En- • Adriaensen’s AmbDec decoder provided ergy vector localizations coincide, and the√ en- good localization in all directions, uni- 2 ergy vector magnitude, rE, cannot exceed 2 . form frequency response, and a good [12]

18 Unfortunately, that statement is true only for importance of all the features of an Ambisonic the case of regular polygons, and following this decoder. The use of Ambisonic decoders that form for nonregular polygons gives an incorrect are inappropriate to the venue, have incorrect result. Specifically, if a regular array is nar- decoding coefficients, or lack the important fea- rowed, then the naïve decoder gives values for tures of dual-band decoding and NFC will give the coefficients that are larger for X and smaller results that are inferior to what would be ob- for Y, relative to the regular polygon. But in- tained with a correct Ambisonic decoder and tuition tells us that narrowing the array would which will prejudice the results of comparative require less X and more Y to achieve the same listening tests. localization vectors. In this paper, we have made every effort to The correct decoder for the rectangle case is “tell it all” as clearly and plainly as we can with- X Y out oversimplification, and back that up with W  √  √ (16) examples, test files, and sample code that can ϕ ϕ 2cos 2sin be downloaded and used. We hope that re- where 2ϕ is the angle subtended by the front two searchers conducting experiments in audio lo- loudspeakers. calization will adopt these or similar techniques It may be argued, and in fact we do argue, to validate their experimental setups and that de- that the rectangle decoder (and its 3-D exten- coder writers will now have necessary knowl- sion, the bi-rectangle) is the single most impor- edge and tools to write better decoders. tant case for actual applications in reproduction A decade ago, lack of program material (B- of first-order Ambisonic recordings. The reason format recordings) was the biggest problem with for this importance is that rectangular arrays fit Ambisonics. Now that downloads of B-format better into ordinary rooms, and in addition that recordings [4], relatively low-cost B-format mi- they give a needed improvement in mid/high- crophones [25], and pocket-sized multichannel frequency localization toward the front and back digital audio recorders are available, suitable when the correct decoder is used. program material is somewhat more plentiful. Many of the available software Ambisonic The next hurdle is the creation of easy-to-use decoders do not implement dual-band decod- playback software that runs on popular comput- ing, presumably because of lack of knowledge ing platforms, about which we can say: These about how to design IIR filters. When dual- are Ambisonic. band decoding is implemented, it frequently is found to utilize shelf filters that are not phase 8 ACKNOWLEDGMENTS matched between the filter for W and the fil- The authors thank Sven Bien from Bremen ters for X/Y/Z. Since that phase mismatch oc- University and his advisor Jörn Loviscach at curs only in the frequency range around the tran- Hochschule Bremen, University of Applied Sci- sition frequency (see Figure 6(e)), it is difficult ences, for help with Csound, and in partic- to evaluate the seriousness of the error. Clearly, ular Sven for running the tests under very any errors will be program dependent, depend- tight deadline constraints. We also thank Mar- ing on the spectral density in that particular fre- tin Leese, Daniel Courville, David Malham, quency range. However, it is relatively easy to and Etienne Deleflie for their careful reading do it correctly and it should be done correctly. and suggestions, as well as the participants in See Appendix A. the sursound email discussion group for their Previous listening tests by the authors of this encouragement and comments, Don Drewecki paper, and informal listening tests done during for providing a steady flow of top-notch B- the writing of the present paper, have shown the

19 format concert recordings for our listening tests, [8] Peter G Craven. The hierarchical view- and Jim Mastracco for many hours of stimu- point. AES 22nd UK Conference - Illusions lating discussion on microphones and acous- In Sound, (16):7, Sep 2007. 3 tics. [9] David G Malham. Experience with Large REFERENCES Area 3-D Ambisonic Sound Systems. Pro- ceedings of the Institute of Acoustics Au- [1] Eric Benjamin, Richard Lee, and Aaron tumn Conference on Reproduced Sound 8, Heller. Localization in Horizontal-Only 1992. 3 Ambisonic Systems. In 121st Audio En- gineering Society Convention Preprints, [10] Michael A. Gerzon and Geoffrey J. Barton. number 6967, San Francisco, October Ambisonic Decoders for HDTV. In 92nd 2007. 1, 2, 11, 14, 23 Audio Engineering Society Convention Preprints, number 3345, Vienna, March [2] Richard Lee and Aaron J Heller. Am- 1992. AES E-lib http://www.aes.org/ bisonic localisation - part 2. 14th Interna- e-lib/browse.cfm?elib=6788. 4, 8, 10 tional Congress on Sound and Vibration, 2007. 1, 2 [11] Michael A. Gerzon. Practical Periphony: The Reproduction of Full-Sphere Sound. [3] Richard Elen. Ambisonic.net - where In 65th Audio Engineering Society surround-sound comes to life. http:// Convention Preprints, number 1571, www.ambisonic.net/. 2 London, February 1980. AES E-lib [4] Etienne Deleflie. Ambisonia: The next http://www.aes.org/e-lib/browse. generation surround sound experience, by cfm?elib=3794. 4, 10, 22, 23 the enthusiasts, for the home theatre. [12] Michael A. Gerzon. General Metathe- http://www.ambisonia.com/. 2, 19 ory of Auditory Localisation. In 92nd [5] Michael Gerzon. Multi-System Ambisonic Audio Engineering Society Convention Decoder. Wireless World, 83(1499):43–47, Preprints, number 3306, Vienna, March July 1977. Part 2 in issue 1500. Available 1992. AES E-lib http://www.aes.org/ from http://www.geocities.com/ e-lib/browse.cfm?elib=6827. 4, 5, ambinutter/Integrex.pdf (accessed 18, 27 June 1, 2006). 2, 18 [13] Aaron Heller. Ambisonics reference ma- [6] Catherine Cuastavino, Veronique Larcher, terial. http://www.ai.sri.com/ajh/ Guillaume Gatusseau, and Patrick ambisonics. 5, 6, 22 Bossard. Spatial audio quality evaluation: [14] Plogue Art et Technologie Inc. Bidule: Comparing transaural, ambisonics and modular audio software. http://www. stereo. Proceedings of the 13th Interna- plogue.com/. 7 tional Conference on Auditory Display, 2007. 2 [15] Guy-Bart Stan, Jean-Jacques Embrechts, and Dominique Archambeau. Compari- [7] Audun Solvang. Spectral impairment for son of different impulse response measure- two-dimensional higher order ambisonics. ment techniques. Journal of the Audio Journal of the Audio Engineering Society, Engineering Society, 50(4):249–262, Apr 56(4):267–279, Jan 2008. 3 2002. 5

20 [16] David Moore and Jonathan Wakefield. [23] Fernando Lopez-Lezcano. Planet ccrma The Design of Improved First Order Am- at home. http://ccrma.stanford.edu/ bisonic Decoders by the Application of planetccrma/software/. 11 Range Removal and Importance in a Heuristic Search Algorithm. Preprints [24] Wikipedia. Csound — wikipedia, the free AES 122nd Convention, 7053, 2007. 8 encyclopedia, 2008. [Online; accessed 11- August-2008]. 14 [17] David Moore and Jonathan Wakefield. Ex- ploiting Human Spatial Resolution in Sur- [25] Core sound tetramic. http://www. round Sound Decoder Design. In 125th core-sound.com/TetraMic/1.php. 19 AES Convention Preprints, San Francisco, [26] Eric W. Weisstein. Moore- Oct 2008. 8 Penrose Matrix Inverse. From [18] Bruce Wiggins, Iain Paterson-Stephens, MathWorld–A Wolfram Web Resource. Val Lowndes, and Stuart Berry. The De- http://mathworld.wolfram.com/ sign and Optimisation of Surround Sound Moore-PenroseMatrixInverse.html, Decoders Using Heuristic Methods. In 2008. 23 Proceedings of UKSim 2003, Conference [27] Wikipedia. Singular value decomposition of the UK Simulation Society, pages 106– — wikipedia, the free encyclopedia, 2008. 114. UK Simulation Society, 2003. Availi- [Online; accessed 10-August-2008]. 23 ble from http://sparg.derby.ac.uk/ SPARG/PDFs/SPARG_UKSIM_Paper.pdf [28] Wikipedia. Digital biquad filter — (accessed May 15, 2006). 8 wikipedia, the free encyclopedia, 2008. [Online; accessed 11-August-2008]. 26 [19] David Moore and Jonathan Wakefield. The Design of Ambisonic Decoders for the [29] Leo Beranek. Acoustic Measurements, ITU 5.1 Layout with Even Performance pages 56–64. John Wiley, New York, Characteristics. In 124th Audio Engineer- 1949. 27 ing Society Convention Preprints, number 7473, Amsterdam, May 2008. 8 [30] Philip Cotterel. On the Theory of the Second-Order Soundfield Microphone. [20] Philip M. Morse and K. Uno Ingard. The- PhD thesis, Dept. of Cybernetics. Reading oretical Acoustics, chapter 7, page 311. University, Febuary 2002. 27 Princeton University Press, 1986. ISBN: 0691024014. 8, 27 [31] Jerome Daniel. Spatial sound encoding in- cluding near field effect: Introducing dis- [21] Fons Adriaensen. 1st vs 2nd order: tim- tance coding filters and a viable, new am- bre differences? email to Sursound online bisonic format. Preprints 23rd AES Inter- discussion group at music.vt.edu, January national Conference, Copenhagen, 2003. 2009. 9 27 [22] Fons Adriaensen. AmbDec User Manual, 0.4.3 edition, 2011. http://kokkinizita.linuxaudio. org/ambisonics/. 11

21 APPENDICES wards the center of the array that is parallel to the direction given by its position relative to the A DECODER DESIGN center. For the ith loudspeaker [ ] We present some “cookbook” procedures for the Li = xi yi zi (17) design of the key decoder components, along 2 2 2 with some digressions into the underlying the- where xi + yi + zi = 1, that is they are the di- ory and mathematics. Examples and implemen- rection cosines of the of the vector from the cen- tations of all three can be found on the authors’ ter of the array to the ith loudspeaker. In spheri- website [13]. cal coordinates this is [ ] L = cosθ cosε sinθ cosε sinε (18) A.1 Exact-Solution Decoder Matrix i θ This method is based on the idea of inversion where is the counterclockwise azimuth from ε where we write down the direction of propa- directly ahead, and is the elevation from hori- gation of the acoustic impulse created by each zontal. loudspeaker in the array, decompose that into Next, we select a set of spherical harmonic the selected set of spherical harmonics and use functions, up to the desired order, that form an generalized inversion to derive the decoder ma- orthogonal basis and then project the speaker di- trix that recreates the original impulse. This is a rections onto it. For first-order Ambisonics there 10 generalization of Figure 12 (“The Design Math- is single choice, the B-format definitions. In ematics”) in [11] and can be shown to be equiv- Cartesian coordinates, each Li becomes [ √ ] alent to the least-squares solution. If the prob- 2 Ki = x y z . (19) lem is under-constrained (many solutions), as it 2 i i i √ is for typical Ambisonic speaker arrays (more For the array used in this paper, the 3 : 1 rect- speakers than signals), it will give the solution angle, which has speakers at azimuths 30, 150, that requires the minimum overall radiated en- 210, and 330 degrees in the horizontal plane ergy, which also will yield the largest |rE|’s. K_rect30 = While this procedure provides a solution for 0.7071 0.8660 0.5000 0 any loudspeaker array, only regular arrays and 0.7071 -0.8660 0.5000 0 those with speakers in diametrically opposed 0.7071 -0.8660 -0.5000 0 pairs (Type 1 and 2 from Sec. 4) will result in the 0.7071 0.8660 -0.5000 0 directions of rV and rE agreeing for all source directions, which is one of the basic criteria for where each row corresponds to the coefficients Ambisonic reproduction. As noted earlier, so- of a single speaker. For a cuboid array that is 2 meters wide, 3 meters deep, and 1.5 meters tall lution of general irregular arrays (Type 3) that satisfy Ambisonic criteria is beyond the scope K_cuboid = of this paper. 0.7071 0.5121 0.7682 -0.3841 An additional constraint is that all the speak- 0.7071 0.5121 -0.7682 -0.3841 ers in the array are equidistant from the listen- 0.7071 -0.5121 -0.7682 -0.3841 ing position. While this can be relaxed by in- 0.7071 -0.5121 0.7682 -0.3841 troducing delays, “1/r”-level adjustments, and 0.7071 0.5121 0.7682 0.3841 per-speaker NFC, it is also beyond the scope of 0.7071 0.5121 -0.7682 0.3841 0.7071 -0.5121 -0.7682 0.3841 this paper. 0.7071 -0.5121 0.7682 0.3841 We assume that each loudspeaker in the ar- 10 ray produces a planar wave front propagating to- For higher-order Ambisonics, there are at least three possibilities.

22 We want to find M, the decoder matrix, that sat- 0.1768 -0.2441 0.1627 -0.3254 isfies the condition 0.1768 0.2441 0.1627 0.3254 0.1768 0.2441 -0.1627 0.3254 M × K = I (20) 0.1768 -0.2441 -0.1627 0.3254 0.1768 -0.2441 0.1627 0.3254 where I is the identity matrix, a matrix with 1’s on the diagonal and 0’s everywhere else and × where the function transpose() swaps the indicates matrix multiplication. When that is rows and columns of the matrix, yielding a ma- satisfied, it means that the speaker array, L, in trix where each column corresponds to one of combination with the decoder matrix, M, can the B-format signals, W, X, Y, and Z and each reproduce all the spherical harmonics indepen- row contains the decoder gains for the corre- dently (i.e., without crosstalk). sponding speaker for that signal. If K is invertible, then M = K−1; however, We can now use Eqn. 21 to check the qual- in the case with most Ambisonic arrays (and in ity of the solution by examining the entries in particular in the two examples above), it is not, r. Non-zero entries on the diagonal indicate a so we use the least-squares solution to the sys- spherical harmonic that is not being reproduced tem correctly. Non-zero entries off the diagonal indi- M × K − I = r (21) cate crosstalk or aliasing between the spherical harmonics. Either condition indicates that fur- In the typical case, we have more speakers than ther analysis of the array geometry is needed. signals, so this system is over determined and there are many solutions. By selecting the one A.1.1 A further math digression... that also minimizes the 2-norm of r, we obtain A† the one providing the highest average value of One way to compute , the pseudoinverse of A |r |. , is E A† = (ATA)−1AT The desired solution of Eqn. 21, M is given (22) by the Moore-Penrose pseudoinverse of K [26], where AT indicates the transpose of A.11 With which is available in Matlab and GNU Octave one further optimization (factoring out the W as the function pinv() and in many other com- column, which is constant), this is what Gerzon puter mathematics systems, such as Scilab and is doing in Figure 12 of Practical Periphony[11], Mathematica. which is reproduced as Eqn. 4 in [1]. In Matlab, Most spreadsheet programs include basic matrix operations such as transposition and in- >> M_rect30 = pinv(K_rect30); version, so it is possible to create spreadsheets >> transpose(M_rect30) that do these calculations, however from a nu- ans = 0.3536 0.2887 0.5000 0 merical computing standpoint, this is not the 0.3536 -0.2887 0.5000 0 best way to obtain the pseudoinverse. † 0.3536 -0.2887 -0.5000 0 A better way to compute A is to use 0.3536 0.2887 -0.5000 0 singular-value decomposition (SVD) [27]. This will also yield some insight into the underlying >> M_cuboid = pinv(K_cuboid); mechanism of the inverse method. The SVD >> transpose(M_cuboid) factors any matrix (real or complex) into three ans = 11 0.1768 0.2441 0.1627 -0.3254 If A is complex, use the conjugate transpose, denoted AH (where the H stands for "Hermitian".) 0.1768 0.2441 -0.1627 -0.3254 0.1768 -0.2441 -0.1627 -0.3254

23 other matrices, as follows >> svd(K_rect30) ans = × Σ × T A = U V (23) 1.7321 where U and V are orthonormal and Σ is diago- 1.4142 nal. Then the pseudoinverse is 1.0000 0 A† = V × Σ† × UT. (24) indicates that one spherical harmonic will not be Σ† is trivial to compute because it is diagonal; reproduced (Z) and that rE will not be uniform simply substitute each non-zero entry in Σ with in all directions. its reciprocal. Informally, V is said to contain the “input” A.1.2 Virtual Microphones or “analyzing” vectors of A, U is said to con- With some decoders, the decoder matrix is spec- tain the “output” vectors of A, and the diagonal ified in terms of virtual microphones. The the entries of Σ the “gains.” directions and directivites are obtained from the In terms of solving for Ambisonic decoder exact solutions found in Sec. A.1 as follows matrices [ az el r ] = cart2sph( M(:,2), ... T 1. V transforms K, into an orientation that M(:,3), ... is symmetric about the coordinate axes, X, M(:,4) ); Y, and Z in the case of first order, so each d = r ./ ( r + M(:,1)*sqrt(1/2) ); can be adjusted independently, then vm = [ az el d ] 2. Σ† adjusts the gain of each spherical har- Using the two examples from above and con- monic, so the pressure, velocity, and so verting from radians to degrees forth, are correctly reproduced, which also assures that source directions are cor- >> [ az el r ] = cart2sph( M_rect30(:,2), ... rectly reproduced, and finally, M_rect30(:,3), ... M_rect30(:,4) ); 3. U returns everything to the original orien- >> d = r ./ ( r + M_rect30(:,1)*sqrt(1/2) ); tation of the speaker array. >> vm_rect30 = [ az*(180/pi) el*(180/pi) d ]

The non-zero entries in Σ are called the singular vm_rect30 = values of A. In analyzing Ambisonic playback systems, if the number of singular values does 60.0000 0 0.6202 not equal the number of signals in use, then the 120.0000 0 0.6202 speaker array is not capable of reproducing all -120.0000 0 0.6202 the intended spherical harmonics. This may be a -60.0000 0 0.6202 trivial result, for example that a horizontal array cannot reproduce Z, or more significantly, that >> [ az el r ] = cart2sph( M_cuboid(:,2), ... the array geometry is degenerate in some other M_cuboid(:,3), ... way. M_cuboid(:,4) ); The ratio of the largest to the smallest sin- >> d = r ./ ( r + M_cuboid(:,1)*sqrt(1/2) ); gular value is called the condition of the matrix. >> vm_cuboid = [ az*(180/pi) el*(180/pi) d ] This is related to the minimum and maximum vm_cuboid = values of rE, and hence gives us insight into the overall quality of localization the array will pro- vide. For example, in Matlab 33.6901 -47.9689 0.7125

24 -33.6901 -47.9689 0.7125 which is a first-order all-pass network. Hence, -146.3099 -47.9689 0.7125 the phase response is the same as the LF sec- 146.3099 -47.9689 0.7125 tion and is maintained regardless of the relative 33.6901 47.9689 0.7125 levels of the LF and HF sections. -33.6901 47.9689 0.7125 Applying the bilinear transformation to im- -146.3099 47.9689 0.7125 plement these as digital infinite-impulse re- 146.3099 47.9689 0.7125 sponse (IIR) filters, the second-order pole 1 Note that the virtual microphones do not point H(s) = (30) at the loudspeakers. 1 + 2sT + (sT)2 becomes A.2 Phase-Matched Band-Splitting −1 −2 Filters b0 + b1z + b2z H(z) = −1 −2 (31) Sec. 4 discussed the need for frequency- a0 + a1z + a2z dependent processing in order to transition from where, for the LF section the exact solution at low-frequencies (LF), to one that optimizes rE at high frequencies (HF). k2 The crossover frequency use by Classic Am- b0 = (32) k2 + 2k + 1 bisonic Decoders and in our earlier listening b = 2b (33) tests is 380 Hz. We present the design of a suit- 1 0 able filter for this. b2 = b0 (34) The key idea is to treat the LF-to-HF transi- a0 = 1 (35) tion as one would the crossover network feeding 2(k2 − 1) a = (36) the LF and HF units in a loudspeaker. We de- 1 k2 + 2k + 1 sire a gradual transition, so simple second-order k2 − 2k + 1 filters are used a = (37) 2 k2 + 2k + 1 1 LF(s) = (25) 1 + 2sT + (sT)2 and, for the HF section 2 (sT) 1 HF(s) = (26) b = (38) 1 + 2sT + (sT)2 0 k2 + 2k + 1 These have the -6 dB point at the crossover fre- b1 = −2b0 (39) ω quency = 1/T rad/sec. If you combine these, b2 = b0 (40) the outputs cancel at the crossover frequency, but reversing the phase of the HF section, makes with a0, a1, a2 as in the LF section and its phase match that of the LF section and there πF is no cancellation at the crossover frequency. k = tan c (41) The output is Fs

1 − (sT)2 and Fc is the crossover frequency in Hz and Fs is Total(s) = (27) 12 1 + 2sT + (sT)2 the sample rate. As an example, with F = 380 and F = (1 + sT)(1 − sT) c s = (28) 48000 (1 + sT)(1 + sT) 12 1 − sT Note that this k is not the same as the k used in Sec. 4. = (29) 1 + sT

25 b_lp = 0.000589143208472 0.001178286416944 0.000589143208472

b_hp = 0.952044598366767 -1.904089196733534 0.952044598366767

a = 1.000000000000000 -1.902910910316590 0.905267483150478 Figure 10: Frequency and phase response of 2- meter NFC filter implemented as Direct Form 1 These filters can be implemented in Plogue Bid- IIR with Bidule’s recursive function block. ule as Direct Form 1 IIRs [28] using the Recur- sive Function block. The HF section is speci- fied by entering with 1 ( 0.952044598366767 * x) + b = (44) 0 k + 1 (-1.904089196733534 * prevX(1)) + − ( 0.952044598366767 * prevX(2)) - b1 = b0 (45) (-1.902910910316590 * prevR(1)) - a0 = 1 (46) ( 0.905267483150478 * prevR(2)) a1 = (k − 1)b0 (47) and the LP section by entering where k is given by Eqn. 41. ( 0.000589143208472 * x) + As an example, for 2 meters, f-3dB = 27.1 Hz ( 0.001178286416944 * prevX(1)) + and Fs = 48000 ( 0.000589143208472 * prevX(2)) - b = (-1.902910910316590 * prevR(1)) - 0.998229447703366 -0.998229447703366 ( 0.905267483150478 * prevR(2)) a = 1.000000000000000 -0.996458895406731 Recall that the desired phase response is ob- tained by subtracting the output of these sec- Implementing in Bidule’s recursive function tions, so after scaling according to the desired block response, the output signals must be differenced, ( 0.998229447703366 * x) + not summed. (-0.998229447703366 * prevX(1)) - (-0.996458895406731 * prevR(1)) A.3 Near-Field Compensation Filter All that is needed here is a first-order high-pass B IS MY ENCODER AMBISONIC? (HP) filter s H(s) = (42) Many references show the first-order horizontal 1 + sT Ambisonic B-format encoding equations as which translates to the digital IIR filter    √  W 2 2 −1     b0 + b1z X = cosθ S (48) H(z) = −1 (43) Y θ a0 + a1z sin

26 (a) low-pass filter (b) high-pass filter

Figure 9: Frequency and phase response of phase-matched 380 Hz band-splitting filters imple- mented as Direct Form 1 IIR with Bidule’s recursive function block.

where θ is the azimuth and S is the pressure The correct equation, via Beranek [29] en- due to the source at the position of the “micro- codes a single point source at distance d as phone.” But this is a simplification that has led    √  W 2 to much misunderstanding of the nature of B-    2  format. As an example, one myth suggests that X = D(s)cosθ S (49) B-format does not accurately encode near or dif- Y D(s)sinθ fuse soundfields because these equations have 1 + sT d no “phase information.” where D(s) = , T = , and c is the speed sT c W is a perfect omni-directional pressure mi- of sound. crophone with -3 dB gain. However, no practi- Eqn. 49 is essentially Gerzon’s full expres- cal microphone has a response as shown for X sion for velocity components at the bottom of and Y. The correct equations regard X and Y page 15, General Metatheory of Auditory Lo- as two perfect figure-8 particle velocity micro- calisation [12], but from an encoding viewpoint. phones with an on-axis gain of 1. As such, X and The Wave Equation precludes any practical de- Y are subject to the variations in phase and am- vice that implements the simplistic Eqn. 48. plitude between particle velocity and pressure Daniel [31] extends this form for higher or- encountered in real life. An important example ders but his derivation does not conveniently of this is proximity [29, 20], which is a clear describe diffuse fields, standing waves or non- and unambiguous coding of distance for near point sources for which we refer you to Cotterel sources. Cotterel [30] correctly derives XYZ as [30]. However, a microphone with response to solutions of the Helmholtz wave equation. This point sources as Eqn. 49 is necessary and suffi- codes near and diffuse soundfields properly and cient to correctly encode diffuse fields, standing is consistent with practical implementations of waves, non-point and nearby sources. the Soundfield Microphone.

27