A Comparative Study of Multilateration Methods for Single-Source Localization in Distributed Audio
Total Page:16
File Type:pdf, Size:1020Kb
A Comparative Study of Multilateration Methods for Single-Source Localization in Distributed Audio Srdan¯ Kitic,´ Clément Gaultier, Grégory Pallone Orange Labs Cesson-Sévigné, France srdan.kitic, clement.gaultier, [email protected] Abstract—In this article we analyze the state-of-the-art in multilateration - the family of localization methods enabled by the range difference observations. These methods are computa- tionally efficient, signal-independent, and flexible with regards to the number of sensing nodes and their spatial arrangement. How- ever, the multilateration problem does not admit a closed-form y solution in the general case, and the localization performance is r? conditioned on the accuracy of range difference estimates. For that reason, we consider a simplified use case where multiple distributed microphones capture the signal coming from a near x field sound source, and discuss their robustness to the estimation errors. In addition to surveying the relevant bibliography, we Fig. 1. Source localization with 5 single-channel microphones. present the results of a small-scale benchmark of few “main- stream” multilateration algorithms, based on an in-house Room Impulse Response dataset. with the relatively small number of microphones, seriously degrades performance of beamforming-based techniques, at I. INTRODUCTION least in narrowband [5]. The approaches based on distributed As the audio technologies incorporating distributed acous- beamforming, e.g. [6], [7], [8], could still be appealing if they tic sensing – like Internet Of Audio Things [1] – gain mo- operate in the wideband regime: unfortunately, the literature mentum, the questions regarding efficient exploitation of such on wideband beamforming by distributed mono microphones acquired data naturally arise. A valuable information that could is somewhat scarce. Another major downside of beamforming- be provided by these networks is the location of the sound based localization is generally high computational cost, al- source, which can be a daunting task in the adverse acoustic though various attempts have been made in order to reduce conditions, namely in the presence of noise and reverberation. its complexity, e.g. [9]. Finally, the fact that the number of On the flipside, localization is usually only a pre-processing sensors and their spatial arrangement can vary precludes the block of a larger processing chain (e.g. in the case of location- use of contemporary learning-based localization methods (e.g. guided separation and enhancement [2]), thus its computational [10], [11], [12]), which have shown remarkable performance efficiency is of uttermost importance. in more restricted use cases. In this work we consider a specific distributed audio Under these constraints, the pairwise Time Difference Of context, where we assume a sensor network composed of dis- Arrival (TDOA) features emerge as a viable choice for sound tributed single-channel microphones with potentially different source localization (rectifying the need for adequate synchro- gains (Fig.1). Such a network could be seen as one large nization, since the TDOA estimation is known to be sensitive arXiv:1910.10661v4 [cs.SD] 28 Jul 2020 scale microphone array - note that this is markedly different to clock offsets and internal delays [13]). The corresponding from a network whose nodes are (compact) microphone arrays sound source localization pipeline is presented in Fig. 2. First, themselves, as detailed in the following paragraph. The array TDOAs are estimated for each microphone pair, followed by geometry is assumed known in advance, and the microphones their conversion to (pseudo) Range Differences (RD). These are already synchronized/syntonized [3]. Lastly, we assume estimates are then processed by a multilateration algorithm the presence of a single sound source and a direct path [14], which finally yields the source position. In this article we (line-of-sight) between the source and microphones. The latter focus on the last part of the pipeline, i.e. we discuss exclusively assumption is essential for most of the traditional sound source and thoroughly the localization algorithms belonging to the localization methods to work, in order to avoid an extremely multilateration class, as opposed to related review papers [14], challenging “hearing behind walls” problem [4]. [15], [16] that study localization methods in a broader sense. Albeit deceivingly simple, the considered scenario imposes Simultaneous localization of multiple sound sources is of a several technical constraints. First, the absence of compact great practical interest. From the perspective of a multilatera- arrays prevents the node-level Direction-of-Arrival (DOA) tion algorithm, as long as multiple sets of RDs (corresponding estimation. Second, no knowledge of the source emission time to each sound source) are available, the localization of each prohibits the Time-of-Flight (TOF) estimation. Third, large source can be done independently of the rest. Therefore, array size implies significant spatial aliasing, which, along discussing the single-source case only does not endure a loss as well (since Dm = c·τm), provided that the emission time is known. Surprisingly, despite theoretical evidence, a simulation study in [25] has revealed that the TOF and TDOA features TDOA RD Source position TOF/TDOA seem to perform similarly in terms of localization accuracy. estimates estimates multilateration estimate estimation algorithm … algorithm Since the microphone signals are often corrupted by noise and reverberation, the TDOA measurements – thereby RDs – could be erroneous, which negatively affects the perfor- mance of localization algorithms. In the RD domain, such degradations are usually modeled by an additive noise term. Fig. 2. Standard multilateration processing chain (notations are defined in Another cause of localization errors is the inexact knowledge section II). of microphone positions. As shown in [26], the Cramér-Rao lower Bound (CRB) [27] of the source location estimate increases rather quickly with the increase in the microphone of generality with regards to multilateration-based localization. position “noise” (fortunately, somewhat less fast in the near However, we underline that the performance of multilateration field, than in the far field setting). Finally, the localization methods is conditioned on the accuracy of TDOA/RD esti- accuracy also depends on the array geometry [28], which is mation, which is a difficult problem in its own right [17], assumed arbitrary in our case. [18], [19]. The problem becomes even more challenging in the presence of multiple overlapping sources [20], but it falls In noiseless conditions, the number of linearly independent out of the scope of the present article. Indeed, as practitioners RD observations is equal to M − 1, and all the remaining we are particularly interested in performance of multilateration RDs could be calculated from such a set. However, in the algorithms using the pseudo RDs obtained from off-the-shelf presence of noise, considering the full set of observations (of TDOA estimators, such as Generalized Cross Correlation - size M(M − 1)=2) may be useful for alleviating the noise- PHAse Transform (GCC-PHAT) [21]. related degradation [25], [29], [30], [31]. If only independent RDs are to be used, one microphone is chosen as a reference II. PROBLEM STATEMENT (the choice of which, under the same noise level, does not affect the theoretical localization accuracy [25]). The same Distances between microphones are considered to be of the microphone could be conveniently put at the coordinate origin, same order as the distances between the microphones and the e.g. r1 = 0, where 0 is the null vector. By denoting r := rs, source, hence – with regards to the audible frequency range from (3), the non-redundant RDs are compactly given as of speech – we discuss the near field scenario. The general formulation of the time-domain signal ym(t), recorded at the dm0 := d1;m0 = krm0 − rk − krk: (4) mth microphone is given by the time-variant convolution: 0 t Finally, given a minimal fdm0 j m 2 [2; M]g, or an Z 0 0 extended fdm;m0 j (m; m ) 2 [1; M] × [1; M]; m 6= m g set of ym(t) = am(t; τ)xs(t − τ)dτ + nm(t); (1) observations, along with microphone position frmg, the goal 0 now is to estimate source position r. In the following sections, where am(t; τ) is the time-variant Room Impulse Response we discuss two major classes of such localization algorithms: th (RIR) filter, relating the m microphone position rm with maximum likelihood and least squares estimators. the source position rs, xs(t) is the source signal, and nm(t) is the additive noise of the considered microphone. In (1), III. MAXIMUM LIKELIHOOD ESTIMATION the microphone gains are absorbed by RIRs. In practice, various simplifications are commonly used instead of the Since the observations (4) are non-linear, a statistically general expression (1). Commonly, a free-field, time-invariant efficient estimate (i.e. the one that attains CRB) may not be approximation is adopted, as follows [22]: available. The common approach is to seek the maximum likelihood (ML) estimator instead. ym(t) = amxs(t − τm) + nm(t); (2) ^ Let ^r and dm0 (^r) denote the estimated source position, and where the offset τm denotes the TOF value, which is propor- the corresponding RD, respectively: tional to the source-microphone distance. ^ The TDOA, corresponding to the difference in propagation dm0 (^r) = krm0 − ^rk − k^rk: delay between the microphones m and m0, is defined as τm;m0 = τm0 − τm. In homogeneous propagation media, the Under the hypothesis that the observation noise is Gaus- TDOA values τm;m0 , directly translate into Range Differences sian, the ML estimator is given as the minimizer of the negative (RD) dm;m0 , given the sound speed c: log-likelihood [32], [33] dm;m0 := Dm0 −Dm = krm0 −rsk −krm −rsk = c·τm;m0 ; (3) T L(r) = d − d^(^r) Σ−1 d − d^(^r) ; (5) where Dm denotes the source-microphone distance. The ob- servation model (3) defines the two-sheet hyperboloid with T T ^ h ^ ^ ^ i respect to rs, with foci in rm and rm0 [23], [24].