A New Nonlinear Filtering Technique for Source Localization

Frederik Beutler, Uwe D. Hanebeck
Intelligent Sensor-Actuator-Systems
Institute of Computer Design and Fault Tolerance
Universität Karlsruhe
Kaiserstrasse 12, 76128 Karlsruhe, Germany
[email protected], [email protected]

Abstract
A new model-based approach for estimating the parameters of an arbitrary transformation between two discrete-time sequences is introduced. One sequence is interpreted as part of a nonlinear measurement equation; the other sequence is typically measured sequentially. Based on every measured value, the probability density function of the parameters is updated using a Bayesian approach. For the evolution of the system over time, a system equation is included. The new approach provides a high update rate for the desired parameters, up to the sampling rate, with high accuracy. It is demonstrated for source localization of a speaker, where the parameters describe the position of the source.

Keywords
Source Localization, Nonlinear Filter Technique, Bayesian Approach

INTRODUCTION
Estimating the position of a target object is a typical problem arising in acoustic localization systems. A sound source is placed in the environment and emits signals that are picked up by a sensor node equipped with microphones. If the signal is additionally transmitted through a different medium to the sensor node, e.g. wireless radio, the position of the source can be calculated. The conventional method for this problem consists of two steps. In the first step, the Times of Arrival (ToA) are estimated, for example by means of generalized cross correlation (GCC) [6]. These estimates are then used in the second step, which converts Times of Arrival to object positions based on the known microphone arrangement. This is in general a nonlinear optimization procedure [5].

When the signal emitted by the sound source is not known a priori and no timing reference is given, the position of the source has to be calculated on the basis of Time Differences of Arrival (TDoA). In [8] a Sequential Monte Carlo (SMC) method for source tracking based on Auxiliary Particle Filtering (APF) is presented for that purpose. The method makes use of a model for speaker motion. Other frameworks for source tracking based on particle filtering algorithms can be found in [7, 9, 10].

A method for source direction estimation based on hemisphere sampling is presented in [2], where a nonlinear transformation is used to convert the correlation vectors of the GCC for all TDoA measurements to a common coordinate system.

In [3] a fast Bayesian acoustic localization method is presented; a unifying framework for localization can be found in [1]. These approaches are similar to the nonlinear filtering technique presented here, but they use a window for source estimation from received samples. Furthermore, a motion model is not included.

This paper introduces a new model-based approach for estimating the state, e.g. the position of a source, directly from time sequences. Hence, a model for the evolution of the desired state over time can be used. The novelty is the new interpretation of a known reference sequence as part of a measurement equation of a nonlinear system. This new approach can be used as a framework for a multitude of procedures.

An important application is the use of this new approach in sensor-actuator-networks, because the received sequence does not have to be stored on the sensor node, since every received measured value is processed immediately.

In comparison to the conventional method there are several advantages. In the conventional method the estimate of the time delay from the time sequences results from processing overlapping or consecutive time windows. This disadvantage disappears in the new approach, since every measured value is used immediately. Thus the computational complexity is lower and the processing delay for state estimation is shorter compared to the conventional method, because for every measured value the approach provides a result for the desired state. In addition, the structure and the density of measurement noise and process noise can be exploited. The search complexity compared to the conventional beamforming algorithm is reduced. In addition, prior knowledge can be exploited.

The structure of the paper is as follows. The problem of estimating the parameters of a transformation between two time sequences is formulated in the next section. For efficient estimation based on sequential measurements, a nonlinear filtering technique is presented. This approach is then specialized for the case of source localization. The performance of the proposed new approach is evaluated in comparison to the conventional method. Conclusions and some hints on future investigations are given in the last section.

PROBLEM FORMULATION
We consider a nonlinear discrete-time system with a system equation, which describes the evolution of the system state over time as

$$ x(k+1) = a(x(k), w(k)) , $$

where $a(\cdot)$ is the nonlinear system equation, $x(k)$ is the system state at time $t_k$, and $w(k)$ is the process noise. The main problem of state estimation based on sequences is the reconstruction of the state $x(k)$ from different measured values $y(k)$. In Fig. 1 a block diagram for the discrete-time measurement equation is shown. The parameters of an arbitrary transformation $h(\cdot)$ represent the system state $x(k)$, which is mapped to an intermediate state. The nonlinear function $g(\cdot)$ maps the sequence $s(\cdot)$, depending on this intermediate state, onto the sequence $y(\cdot)$. The sequence $y(\cdot)$ is typically measured sequentially and can be described as

$$ y(k) = g(s(k), h(x(k)), v(k)) , \tag{1} $$

where $v(k)$ is the measurement noise.

Figure 1. Diagram of the measurement equation. (Block diagram: $x(k)$ is mapped by $h(\cdot)$ to $h(x(k))$; $g(\cdot)$ combines $s(k)$, $h(x(k))$, and $v(k)$ into $y(k)$.)

NEW MODEL-BASED APPROACH FOR DIRECT NONLINEAR STATE ESTIMATION
The nonlinear state estimation technique for estimating the desired transformation parameters is presented in this section.

Measurement Update, Filter Step
In the measurement update, or filter step, the estimated density $f^e(x(k))$ is calculated according to Bayes' law

$$ f^e(x(k)) = c(k) \cdot f^J(x(k)) \cdot f^p(x(k)) , $$

where $f^J(x(k))$ is the joint density, which depends on the likelihoods corresponding to the measurement equation, $f^p(x(k))$ is the predicted density resulting from the previous estimation step, and $c(k)$ is a normalization constant.

Prediction Step
In the prediction step the new predicted density is given by

$$ f^p(x(k+1)) = \int_{\mathbb{R}} f^T(x(k+1)) \cdot f^e(x(k)) \, dx(k) , $$

where $f^e(x(k))$ is the estimated density from the filter step and $f^T(x(k+1))$ is the transition density, which depends on the system equation.

Calculation of Point Estimate
The point estimate of the state can be calculated according to

$$ E(x(k)) = \int_{\mathbb{R}} x(k) \, f^e(x(k)) \, dx(k) , $$

where $E(x(k))$ is the expected value of the estimated density.

SPECIAL CASE: SOUND SOURCE LOCALIZATION
We consider the far field of an acoustic point source located at $x = [x \;\, y \;\, z]^T$, e.g. a speaker. The source emits a known signal $s(k)$, which is detected by several microphones. The microphones are placed at known points $M_i = [x_i \;\, y_i \;\, z_i]^T$, $i = 1 \ldots N$, in the environment. For microphone $i$ the received signal $y_i(k)$ according to (1) is given as

$$ y_i(k) = g(s(k), h_i(x(k), M_i), v_i(k)) . $$

The speaker signal is delayed and attenuated depending on the distance from the microphones. Assuming additive measurement noise gives

$$ y_i(k) = h_i(k, x(k), M_i) * s(k) + v_i(k) , $$

where $*$ stands for convolution. We assume that there is no reflection. The time delay and attenuation of the signal can be described by means of a Dirac function

$$ h_i(k) = D_i(x(k)) \cdot \delta(k - \tau_i(x(k))) , $$

where the attenuation factor and the time delay are given as

$$ D_i(x(k)) = \frac{1}{\lVert M_i - x(k) \rVert} $$

and

$$ \tau_i(x(k)) = \frac{\lVert M_i - x(k) \rVert}{c} , $$

where $c$ is the acoustic velocity. The received signal at microphone $i$, depending on the position of the microphone $M_i$ and of the source $x$, is given by

$$ y_i(k) = D_i(x(k)) \cdot s(k - \tau_i(x(k))) + v_i(k) $$

or

$$ y_i(k) = \frac{1}{\lVert M_i - x(k) \rVert} \cdot s\!\left(k - \frac{\lVert M_i - x(k) \rVert}{c}\right) + v_i(k) . \tag{2} $$

Measurement Equation
For the derivation of the joint density according to the measurement update we use (2) and can thus define the likelihood $f^L$ as

$$ f^L(x(k)) = f^v\big(y_i(k) - D_i(x(k)) \cdot s(k - \tau_i(x(k)))\big) , $$

where $s(\cdot)$ is piecewise constant.

We assume the density $f^v$ to be Gaussian, resulting in

$$ f^L(x(k)) = \exp\!\left( -\frac{1}{2} \frac{\big[y_i(k) - D_i(x(k)) \, s(k - \tau_i(x(k)))\big]^2}{\sigma^2} \right) . $$

The likelihood function maps the scalar measurement $y_i(k)$ to the 3-dimensional speaker position.

When combining the likelihoods of $N$ microphones to a joint density $f^J(x(k))$, we obtain

$$ f^J(x(k)) = \prod_{i=1}^{N} \exp\!\left( -\frac{1}{2} \frac{\big[y_i(k) - D_i(x(k)) \cdot s(k - \tau_i(x(k)))\big]^2}{\sigma^2} \right) . $$

In this case we assume that the measurement noise processes at different microphones are independent.

System Equation
We now derive the system equation, which describes the motion of the speaker over time. In the case of additive noise $w(k)$ the system equation is given by

$$ x(k+1) = a(x(k)) + w(k) . $$

In this paper we consider a simple linear model according to

$$ x(k+1) = x(k) + w(k) , $$

where the uncertainty of the source position is increasing over time.

IMPLEMENTATION
In order to evaluate the performance of the proposed new approach, it was implemented using a grid-based procedure. Although this kind of implementation is inefficient for real-time tracking, it is sufficient for demonstrating the new algorithm with simulated and real data.

RESULTS
The performance of the new approach is evaluated in simulation as well as in experiments with real data. Furthermore, the new algorithm is compared with the conventional method. In the simulation the locations of the microphones and of the speaker are given as $M_1 = [2.5 \;\, 2.8]^T$, $M_2 = [3 \;\, 2]^T$, $M_3 = [2 \;\, 1.8]^T$, and $x = [0.2 \;\, 0.5]^T$, respectively. The speaker signal is delayed depending on the distance between microphone and speaker. The signals are generated with zero-mean white Gaussian noise with variance 1. In addition, the signals are disturbed by adding zero-mean white Gaussian noise with a variance of 2. The estimated probability density function for the new algorithm is calculated on a two-dimensional grid comprising 50 × 50 grid points and ranging from 0 m to 1 m. For every received sample the position is calculated. In the conventional method, the Time of Arrival is estimated by means of cross correlation, where the block length for the received sequence is set to 100. In the second step, the estimated Times of Arrival are used to estimate the position of the speaker by using the algorithm presented in [5] as a starting guess for a gradient descent algorithm. In Fig. 2 the Root-Mean-Square-Errors between the estimate of the speaker position and its true value for the new algorithm and the conventional method are plotted for each received sample. The new algorithm converges after 31 samples to the true value, because for every received sample the estimated density is updated, whereas the conventional method provides a result after 100 samples, which depends on the block length.

Figure 2. Comparison of the new algorithm and the conventional method: RMSE versus number of received samples. (Axes: RMSE in meters over samples; curves: new algorithm, conventional method.)

Finally, we compare the new algorithm to the conventional method in real experimental setups. The microphone positions are given as $M_1 = [0.16 \;\, {-0.145} \;\, 0.115]^T$, $M_2 = [0.485 \;\, {-0.007} \;\, 0]^T$, $M_3 = [0.315 \;\, {-0.155} \;\, 0.255]^T$, and $M_4 = [0 \;\, 0 \;\, 0]^T$. The sampling frequency is 48 kHz and the acoustic velocity is 340 m/s. In the first setup the speaker position is fixed. In Fig. 3a, the speaker position is plotted as a function of the number of received samples. Both methods converge to the same value. The proposed method, however, provides accurate estimates for every new sample starting from sample 150, while the conventional method provides estimates whenever 200 samples have been received and processed, as the block length is set to 200.

In the second setup the speaker moves along the y-axis toward the microphones. The results are plotted in Fig. 3b, where the estimated source position is depicted as a function of the received samples. The new approach provides good tracking results, whereas the conventional method produces outliers resulting from the assumption that the source position remains constant while a block is being received and processed.

Figure 3. Experimental results. (a) Comparison of the new algorithm and the conventional method for a fixed speaker. (b) Comparison of the new algorithm and the conventional method for a dynamic speaker. (Each panel shows the x, y, and z position in meters over samples.)

CONCLUSIONS AND FUTURE WORK
A new model-based approach for estimating the parameters of an arbitrary transformation between two discrete-time sequences has been presented, where the corresponding model characterizes the evolution of the desired transformation parameters over time. The new approach provides probability densities describing the parameter estimates that are updated based on every received measurement sample. It is demonstrated for the special case of source localization, where the source location parameterizes the transformation between the emitted and the received sound sequence. This parameter set, the source location, is directly estimated from the sound samples and updated with every new sample received. The results are compared with the conventional two-step procedure for source localization, which relies on time-delay estimation. A motion model for a moving object is also included in the new approach.

The presented implementation provides very good results compared to the conventional procedure, but is not efficient, as the probability density functions of the parameters are represented on a grid. Future work is concerned with developing an efficient approximation scheme for the probability density functions based on the Progressive Bayes framework [4].

REFERENCES
[1] Birchfield, S. T. A unifying framework for acoustic localization. 12th European Signal Processing Conference (2004).
[2] Birchfield, S. T., and Gillmor, D. K. Acoustic source direction by hemisphere sampling. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 5 (2001), 3053–3056.
[3] Birchfield, S. T., and Gillmor, D. K. Fast Bayesian acoustic localization. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2 (2002), 1793–1796.
[4] Hanebeck, U. D., Briechle, K., and Rau, A. Progressive Bayes: A new framework for nonlinear state estimation. Proceedings of SPIE, AeroSense Symposium 5099 (2003), 256–267.
[5] Hanebeck, U. D., and Schmidt, G. Closed-form elliptic location with an arbitrary array topology. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96) (1996), 3070–3073.
[6] Knapp, C., and Carter, G. C. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 24, 4 (1976), 320–327.
[7] Lehmann, E. A., Ward, D. B., and Williamson, R. C. Experimental comparison of particle filtering algorithms for acoustic source localization in a reverberant room. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 5 (2003), 177–180.
[8] Vermaak, J., and Blake, A. Nonlinear filtering for speaker tracking in noisy and reverberant environments. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 5 (2001), 3021–3024.
[9] Ward, D. B., Lehmann, E. A., and Williamson, R. C. Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing 11, 6 (2003), 826–836.
[10] Ward, D. B., and Williamson, R. C. Particle filter beamforming for acoustic source localization in a reverberant environment. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2 (2002), 1777–1780.
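
Supplementary note on the Prediction Step. The following is a sketch under one added assumption: the process noise w(k) is independent of x(k) and has a density written here as $f^w$ (this symbol is introduced only for illustration and does not appear in the paper). For the random-walk system equation x(k+1) = x(k) + w(k), the transition density then depends only on the position increment, and the prediction integral takes the form of a convolution:

$$ f^T(x(k+1)) = f^w\big(x(k+1) - x(k)\big) , \qquad f^p(x(k+1)) = \int f^w\big(x(k+1) - x(k)\big) \cdot f^e(x(k)) \, dx(k) . $$

If $f^w$ is additionally taken to be zero-mean with covariance $C_w$ (again a hypothetical symbol), the predicted density keeps the mean of $f^e$ while its covariance grows by $C_w$ in every step. This is one way to read the statement that the uncertainty of the source position increases over time until a new measurement is processed.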
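
Supplementary sketch for the IMPLEMENTATION and RESULTS sections. The paper does not list its code, so the following Python/NumPy fragment is only one possible grid-based realization of the filter step, the prediction step, and the point estimate for the simulated two-dimensional setup. The microphone positions, the 50 × 50 grid, and the noise variance default come from the text; everything else is an assumption made for illustration: the function name `run_grid_filter`, the sampling rate and acoustic velocity (borrowed from the experimental setup), the process noise spread `process_std`, the conversion of the delay to whole samples, and the placement of the simulated source on the grid node closest to [0.2 0.5]^T.

```python
import numpy as np

def run_grid_filter(noise_var=2.0, n_samples=1500, fs=48_000.0, c=340.0,
                    process_std=0.005, seed=0):
    """Illustrative grid-based Bayes filter; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    mics = np.array([[2.5, 2.8], [3.0, 2.0], [2.0, 1.8]])   # M_1..M_3 from the paper
    axis = np.linspace(0.0, 1.0, 50)                          # 50 x 50 grid, 0 m to 1 m
    gx, gy = np.meshgrid(axis, axis, indexing="ij")
    grid = np.stack([gx, gy], axis=-1)                        # shape (50, 50, 2)

    # Per grid node and microphone: distance, attenuation D_i, delay tau_i.
    dist = np.linalg.norm(grid[:, :, None, :] - mics, axis=-1)  # (50, 50, 3)
    gain = 1.0 / dist
    # Assumption: delay expressed in whole samples (s is piecewise constant).
    delay = np.floor(dist / c * fs).astype(int)

    # Assumption: the simulated source sits on the node nearest to [0.2, 0.5].
    ix = np.abs(axis - 0.2).argmin()
    iy = np.abs(axis - 0.5).argmin()
    x_true = grid[ix, iy]

    s = rng.normal(0.0, 1.0, n_samples)        # known source signal, variance 1

    def s_at(idx):
        # s(k - tau); the signal is taken to be zero before emission starts.
        return np.where(idx >= 0, s[np.clip(idx, 0, None)], 0.0)

    # Random-walk transition on one axis (hypothetical Gaussian spread).
    T = np.exp(-0.5 * ((axis[:, None] - axis[None, :]) / process_std) ** 2)
    T /= T.sum(axis=0, keepdims=True)          # columns sum to one

    f_p = np.full((50, 50), 1.0 / 2500.0)      # uniform prior density
    for k in range(n_samples):
        # Simulated measurements according to eq. (2).
        v = rng.normal(0.0, np.sqrt(noise_var), 3)
        y = gain[ix, iy] * s_at(k - delay[ix, iy]) + v
        # Filter step: joint Gaussian likelihood times predicted density.
        resid = y - gain * s_at(k - delay)                    # (50, 50, 3)
        f_j = np.exp(-0.5 * np.sum(resid ** 2, axis=-1) / noise_var)
        f_e = f_j * f_p
        f_e /= f_e.sum()                                      # normalization c(k)
        # Prediction step: separable convolution with the transition kernel.
        f_p = T @ f_e @ T.T

    # Point estimate: expected value of the last estimated density.
    estimate = np.tensordot(f_e, grid, axes=([0, 1], [0, 1]))
    return estimate, x_true, f_e
```

A call such as `estimate, x_true, f_e = run_grid_filter(noise_var=0.01, n_samples=1000)` returns the point estimate, the simulated source position, and the final density on the grid. The density is renormalized after every sample, so no block of samples has to be stored, which mirrors the sequential processing described in the paper. How quickly the estimate settles depends on the assumed noise level and process noise spread.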
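
Supplementary sketch for the conventional method used as the baseline in the RESULTS section. The first step of that method estimates a Time of Arrival from a block of received samples by cross correlation. The fragment below shows plain cross correlation only, not the weighted GCC of [6], and is an illustration rather than the code used in the paper. The function names, the delay of 17 samples, the gain, and the noise level are hypothetical values chosen for the example; the block length of 200 matches the experimental setup.

```python
import numpy as np

def estimate_toa(reference, received, fs):
    """Return the delay (in seconds) of `received` relative to `reference`,
    taken from the peak of their cross correlation over one block."""
    corr = np.correlate(received, reference, mode="full")
    # In "full" mode, index len(reference) - 1 corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / fs

def simulate_block(delay_samples=17, block_len=200, gain=0.3,
                   noise_std=0.1, seed=0):
    """Hypothetical test block: delayed, attenuated, noisy copy of white noise."""
    rng = np.random.default_rng(seed)
    reference = rng.normal(0.0, 1.0, block_len)
    received = np.zeros(block_len)
    received[delay_samples:] = gain * reference[:block_len - delay_samples]
    received += rng.normal(0.0, noise_std, block_len)
    return reference, received
```

With `reference, received = simulate_block()` and `toa = estimate_toa(reference, received, fs=48_000.0)`, the product of `toa` and the acoustic velocity gives the range to the microphone that the second step of the conventional method would use. One estimate becomes available per block, which is the behavior contrasted with the per-sample update of the new approach.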