Ambisonic Decoders by Nonlinear Optimization
Total Page:16
File Type:pdf, Size:1020Kb
AES Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4–7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society. Design of Ambisonic Decoders for Irregular Arrays of Loudspeakers by Non-Linear Optimization Aaron J. Heller1, Eric Benjamin2, and Richard Lee3 1 Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, US [email protected] 2 Surround Research, Pacifica, CA, 94044, US [email protected] 3 Pandit Littoral, Cooktown, Queensland 4895, AU [email protected] ABSTRACT In previous papers, the present authors described techniques for design, implementation, and evaluation of Ambi- sonic decoders for regular loudspeaker arrays. However, irregular arrays are often required to accommodate domes- tic listening rooms. Because the figures of merit used to predict decoder performance are non-linear functions of speaker positions, non-linear optimization techniques are needed. In this paper we discuss the implementation of an open-source application, based on the NLopt non-linear optimization software library, that derives decoders for arbi- trary arrays of loudspeakers, as well as providing a prediction of their performance using psychoacoustic criteria, such as Gerzon’s velocity and energy localization vectors. We describe the implementation and optimization crite- ria, and report on informal listening tests comparing the decoders produced. those cues as accurately as possible over the listening area. 1. INTRODUCTION The ability to generate decoders for any given ad hoc The main goal of Ambisonic sound is to reproduce lo- array of loudspeakers is needed to accommodate typical calization cues that approximate those experienced in domestic listening rooms. Although the ITU- natural hearing, while utilizing a modest number of recommended five-loudspeaker configuration is a com- transmission channels and loudspeakers. The signals monly referenced system, its actual use is relatively carried by the transmission channels define “what it uncommon. Real systems may be symmetrical, but not should sound like” and it is the job of the decoder to have the ITU shape, or may be completely asymmetri- produce a set of loudspeaker signals that reproduce cal. If the angular coverage is non-uniform, the derived Heller, Benjamin, and Lee Design of Ambisonic Decoders by Non-Linear Optimization decoder may be theoretically suitable but in practice In addition, we present two new results: require that the loudspeaker emit unrealistically intense sounds. Likewise, the localization cues may be repro- • A decoder for the ITU 5 array that makes full duced correctly at the center of the array (the “sweet- use of the center loudspeaker. In our listening spot”) but vary so rapidly with displacement from the tests it was preferred over decoders that did not center that the localization experienced by the listener is use the center speaker.3 unstable with respect to head movements. • A decoder optimized for a left/right asymmet- There are numerous numerical techniques for optimiz- ric five-speaker array where the front speakers ing non-linear objective functions; in particular, modi- and one of the surround speakers are placed fied tabu search [31][24][25], neural networks [30], and according to the ITU recommendations and the genetic search [2] have been applied to the problem of remaining surround loudspeaker is signifi- designing Ambisonic decoders. cantly displaced. In our listening tests this was preferred over misapplied decoders. We have employed the NLopt library for non-linear optimization [20] and other free/open-source software Because an Ambisonic recording is a definition of packages to produce an open-source application for the “what it should sound like” rather than “what comes out generation of practical Ambisonic decoders for end- of a speaker,” the promise it offers is that results for (at users. NLopt provides a common application program- least) the central listener should be independent of the mer interface (API) to a number of non-linear optimiza- speaker layout. This is subject to certain constraints. tion techniques, allowing a common framework to be One logical expectation might be that the sound is developed to support rapid-turnaround experiments. “better” or more “accurate” in a direction with more speakers. Previous decoder designs for irregular arrays In the present work we limit ourselves to first-order have not lived up to this promise. We feel the tech- decoders because the vast majority of Ambisonic re- niques in this paper, especially what we are calling cordings are first order. Furthermore, the listening fa- “Vienna-like decodes” are an important step in achiev- cilities available to us have horizontal arrays, so our ing this promise. listening tests were limited to those arrays. The present work has been limited to ITU-like arrays initially, be- At the time of this writing it is not known what the cause they are a configuration that others have worked precise perceptual tradeoffs are between maximizing the on and therefore provide a good benchmark of our ap- various parameters associated with a given decoder proach. However, there is nothing that limits the tech- design. As a separate project, the present authors have niques presented to a particular Ambisonic order or created a methodology for assessing the effective repro- speaker array.1 duction of various factors such as ITD and ILD cues and the perception of envelopment. Differences with work in this area published previously include: We do not claim that the present work represents the ultimate, or even best available, solution, but we do • The software is being released as an open- hope that by making our tools available on an open- source project2. source basis, it will foster further work in this area, as well as provide a useful tool for listeners. • The system is not limited to ITU 5 loudspeaker arrays. In particular there are no assumptions about left/right symmetry. It will operate with arbitrary arrays. • The user can impose constraints on the ranges of the individual parameters. 3 Wiggins, Moore, and others have published ITU B-format 1 We recognize that as more parameters are added, the more decoders where the center speaker is effectively shut off. Ger- slowly the system will converge and the more likely it will zon’s “Vienna decoders” use the center speaker, but he does settle in a local minimum rather than the global minimum. not show a decoder for the ITU array. Hence we refer to the 2 Please email the authors for access information. current five-speaker decoder as “Vienna-like”. AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7 Page 2 of 23 Heller, Benjamin, and Lee Design of Ambisonic Decoders by Non-Linear Optimization 2. BACKGROUND the low-frequency decoder using phase-matched shelf filters. [18] 2.1. Classic Decoders Decoder design for general irregular arrays, such as It is likely that, until the last five years, the majority of ITU-R, is more difficult because the energy localization good experiences with Ambisonic playback have been vector is not guaranteed to point in the same direction as with the hardware designs by Dr. Geoffrey Barton of the the velocity localization vector, and in general will not original Ambisonic team, e.g., [23][19]. When we at- point in the same direction as the velocity vector, and is tempted to duplicate these experiences with modern therefore not a linear function of the speaker locations. software decoders, we found many would not have been deemed Ambisonic by the original team and did not The idea of using separate decoders for the different give as good results as the ‘Classic’ hardware designs frequency regimes (below and above 400 Hz) was a done in accordance with Gerzon’s theories. [17] major advance leading to the so-called “Vienna Decod- ers.” [16] These are also known generically as “dual- Our first paper “Localization in Ambisonic Systems” band decoders.” Gerzon and Barton outline an analytic [4] confirmed via controlled listening tests, the impor- optimization technique (characterized as “very tedious tance of each aspect of “Classic” Ambisonic decoder and messy”) and show results for some five-speaker design: a decoding matrix matched to the geometry of arrays, but do not give enough details to generalize the loudspeaker array in use, phase-matched shelf fil- these to other arrays. Furthermore, Wiggins has pointed ters, and near-field compensation.4 out that these solutions have flawed localization. [33] “Ambisonic Localisation – Part 2” [22] compared suc- More recent work in this area has made use of numeri- cessful decoders from the first paper with an ITU-R cal optimization techniques. While steady progress on decoder by Wiggins [31]. We investigated how robust the design of decoders for the ITU 5-channel loud- decoders were to movement away from the central speaker arrays has been accomplished by Gerzon and ‘sweet spot’ and when used with the wrong speaker Barton [16], Craven [7], Wiggins [31], Moore and layout. Wakefield [25], and others [30] [2], significant work remains to be done in this area, in particular for play- “Is My Decoder Ambisonic” [18] put into the public back of first-order B-format recordings. domain what we had learned about the design of Classic decoders.