A PARTICLE FILTER FOR MODEL BASED AUDIO SOURCE SEPARATION

C. Andrieu, S.J. Godsill

Signal Processing Lab., Department of Engineering, University of Cambridge, Trumpington Street

CB2 1PZ Cambridge, UK

Email: {ca226, sjg}@eng.cam.ac.uk

ABSTRACT

In this paper we present an original modelling of the source separation problem that takes into account all the non-stationarities of the underlying processes. The estimation of the sources then reduces to that of a filtering/fixed-lag smoothing algorithm, for which we propose an efficient numerical solution relying on particle filter techniques.

1 Introduction

The problem of separating convolutively mixed sources is of interest to many applications such as speech enhancement, crosstalk removal in multichannel communications and multipath channel identification. In this paper we consider the problem of separating audio signals modelled as autoregressive processes. The desired sources are clearly non-stationary, as the vocal tract is continually changing, sometimes slowly, sometimes rapidly, and the coupling filters may be time varying as the propagation conditions in the medium, or more simply the sources/sensors geometry, change over time. Here we explicitly model the non-stationarities of both the sources and the propagation medium, allowing us to take into account different stationarity time scales: speech will typically be stationary over periods of 20-40 milliseconds, whereas the transmission channel is expected to be stationary over longer periods. Taking these non-stationarities into account clearly removes some of the problems associated with frame-based algorithms that assume piecewise stationarity, and results in a delayless algorithm.

The modelling we adopt for the system facilitates a state space representation, and the problem of estimating the sources from observed mixtures then reduces to that of a filtering/fixed-lag smoothing estimation problem. The filtering and smoothing problems, that is the recursive estimation of the sources conditional upon the currently available data, require the evaluation of integrals that do not admit closed-form analytic solutions, and approximate methods must be employed. Classical methods to obtain approximations to the desired distributions include analytical approximations, such as the extended Kalman filter [1] and the Gaussian sum filter [3], and deterministic numerical integration techniques (see e.g. [5]). The extended Kalman filter and Gaussian sum filter are computationally cheap, but fail in difficult circumstances.

Instead of hand-crafting such algorithms, we propose here the use of an adaptive stochastic grid approximation, which introduces a so-called system of particles. These particles evolve randomly in time in correlation with one another, and either give birth to offspring particles or die according to their ability to represent the different zones of interest of the state space dictated by the observation process and the dynamics of the underlying system. This class of Monte Carlo methods was introduced in the late 60's, but was overlooked mostly because of its computational complexity; see [9] for a review. The advantage of these methods is that they get around the problems faced by classical methods: the adaptivity of the stochastic grid results in an accurate representation of the density of interest at a rate independent of the dimension of the problem [6].

Particle techniques are here applied to obtain filtered and fixed-lag smoothed estimates of the non-stationary sources, modelled as time varying autoregressive processes observed in additive white Gaussian noise. The simulation-based algorithms developed here are not just a straightforward application of the basic methods, but are designed to make efficient use of the analytical structure of the model. At each iteration the algorithm has a computational complexity that is linear in the number of particles, and can easily be implemented on parallel computers, thus facilitating near real-time processing. It is also shown how an efficient fixed-lag smoothing algorithm may be obtained by combining the filtering algorithm with Markov chain Monte Carlo (MCMC) methods (see [12] for an introduction to MCMC methods). Note that the power and flexibility of these techniques can allow for far more complex models than those considered here.

The remainder of the paper is organized as follows. The model specification and estimation objectives are stated in Section 2. In Section 3 sequential simulation-based methods are developed to solve the filtering and fixed-lag smoothing problem and determine the model adequacy. In this section we first introduce Monte Carlo basics, secondly show how to take advantage of the structure of the model and reduce dramatically the dimensionality of the problem, and thirdly explain why a selection scheme for the particles is required and why diversity must be introduced, here using MCMC steps. Section 4 presents and discusses simulation results.

2 Model of the data

The problem addressed is that of source separation, the $n$ sources being modelled as autoregressive processes; at each time $t$ we have $m$ observations which are convolutive mixtures of the $n$ sources.

2.1 Model for the sources

Source $i$ can be modelled, for $t = 1, 2, \ldots$, as:

$$s_{i,t} = \sum_{k=1}^{p_i} a_{i,k,t}\, s_{i,t-k} + \sigma_{i,t}\, v_{i,t} \qquad (1)$$

(C. Andrieu is sponsored by AT&T Lab., Cambridge UK.)

where $p_i$ is the order of the AR model for source $i$. We assume that $v_{i,t}$ is a zero mean normalized i.i.d. Gaussian sequence, i.e.

$$v_{i,t} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1), \quad i = 1, \ldots, n, \quad t = 1, 2, \ldots \qquad (2)$$

The $\sigma_{i,t}^2$ are the variances of the dynamic noise for each source at time $t$. We assume that the evolving autoregressive model follows a linear Gaussian state-space representation,

$$a_{i,t+1} = A_i\, a_{i,t} + B_i\, w_{i,t} \qquad (3)$$

with $a_{i,t} \triangleq (a_{i,1,t}, \ldots, a_{i,p_i,t})^{\mathrm{T}}$ and

$$w_{i,t} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_{p_i}) \qquad (4)$$

A future version of the model will include switching between different sets of matrices $(A_i, B_i)$ to take into account silences, for example. Typically $A_i = I_{p_i}$ and $B_i = \delta_a I_{p_i}$.
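As a concrete illustration of (1)-(4), the following sketch simulates one time varying AR source under the typical choices just mentioned, $A_i = I_{p_i}$ and $B_i = \delta_a I_{p_i}$, with a constant dynamic noise variance. It is a minimal sketch under stated assumptions; the function, variable names and initialization are ours, not from the paper.

```python
import numpy as np

def simulate_tvar_source(T, p, delta_a=0.005, sigma=1.0, seed=0):
    """Simulate a time varying AR source, Eqs. (1)-(4), with random-walk
    coefficients (A_i = I, B_i = delta_a * I) and fixed noise std sigma."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 0.1, size=p)   # initial AR coefficients (arbitrary)
    s = np.zeros(T + p)                # p leading zeros act as initial conditions
    for t in range(T):
        a += delta_a * rng.standard_normal(p)        # Eq. (3): coefficient random walk
        s_past = s[t:t + p][::-1]                    # (s_{t-1}, ..., s_{t-p})
        s[t + p] = a @ s_past + sigma * rng.standard_normal()   # Eq. (1)
    return s[p:]
```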

2.2 Model of the mixing system and noise

The mixing model is assumed to be a multidimensional time varying FIR filter. More precisely we assume that the sources are mixed in the following manner, and corrupted by an additive Gaussian i.i.d. noise sequence: at the $j$-th sensor, for $j = 1, \ldots, m$,

$$y_{j,t} = \sum_{i=1}^{n} \sum_{k=0}^{l-1} h_{j,i,k,t}\, s_{i,t-k} + \sigma_{\varepsilon_j,t}\, \varepsilon_{j,t} \qquad (5)$$

where $l$ is the length of the filter from source $i$ to sensor $j$ and $h_{j,i,t} \triangleq (h_{j,i,0,t}, \ldots, h_{j,i,l-1,t})^{\mathrm{T}}$. The series $\varepsilon_{j,t}$ is a zero mean normalized i.i.d. Gaussian sequence, i.e.

$$\varepsilon_{j,t} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1), \quad j = 1, \ldots, m, \quad t = 1, 2, \ldots \qquad (6)$$

The quantities $\sigma_{\varepsilon_j,t}^2$ are the variances of the observation noise for each sensor at time $t$. The $\varepsilon_{j,t}$ are assumed independent of the excitations of the AR models. We impose the constraint $h_{i,i,0,t} = 1$ and $h_{i,i,k,t} = 0$ for $k = 1, \ldots, l-1$ and $i = 1, \ldots, n$; in the case $n = m = 2$ this constraint corresponds to $h_{1,1,t} = (1, 0, \ldots, 0)^{\mathrm{T}}$ and $h_{2,2,t} = (1, 0, \ldots, 0)^{\mathrm{T}}$. As for the model of the sources, we assume that the observation system also follows a linear Gaussian state-space representation:

$$h_{j,i,t+1} = C_{j,i}\, h_{j,i,t} + D_{j,i}\, \eta_{j,i,t} \qquad (7)$$

with

$$\eta_{j,i,t} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_l) \qquad (8)$$

for $i = 1, \ldots, n$ and $j = 1, \ldots, m$. Typically $C_{j,i} = I_l$ and $D_{j,i} = \delta_h I_l$; in fact one can be more general, and this scalar can be replaced by a different value for each filter coefficient.
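The sketch below generates the observations of Eq. (5) for a time invariant mixing system, which is enough to see the convolutive structure; the filters, lengths and noise levels it expects are illustrative placeholders of ours.

```python
import numpy as np

def mix_sources(S, H, sigma_eps, seed=0):
    """Convolutive mixture, Eq. (5). S: (n, T) sources; H: (m, n, l) FIR
    filters h_{j,i}; sigma_eps: (m,) observation noise std devs.
    Returns the (m, T) sensor signals."""
    rng = np.random.default_rng(seed)
    m, n, l = H.shape
    T = S.shape[1]
    Y = np.zeros((m, T))
    for j in range(m):
        for i in range(n):
            # Truncated 'full' convolution implements sum_k h_{j,i,k} s_{i,t-k}.
            Y[j] += np.convolve(S[i], H[j, i])[:T]
        Y[j] += sigma_eps[j] * rng.standard_normal(T)
    return Y

# Identifiability constraint of Section 2.2: direct paths are unit impulses,
# e.g. H[i, i] = np.r_[1.0, np.zeros(l - 1)].
```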

2.3 State-space representations

2.3.1 State-space representation of the sources

For each of the sources the signal can be rewritten in the following form:

$$x_{i,t+1} = A(a_{i,t+1})\, x_{i,t} + b\, \sigma_{i,t+1}\, v_{i,t+1} \qquad (9)$$

where $x_{i,t} \triangleq (s_{i,t}, \ldots, s_{i,t-p_i+1})^{\mathrm{T}}$, $b \triangleq (1, 0, \ldots, 0)^{\mathrm{T}}$ and $A(a_{i,t})$ is a $p_i \times p_i$ matrix defined as:

$$A(a_{i,t}) \triangleq \begin{pmatrix} a_{i,1,t} & \cdots & a_{i,p_i-1,t} & a_{i,p_i,t} \\ & I_{p_i-1} & & 0 \end{pmatrix} \qquad (10)$$

Then one can rewrite the dynamic system equations:

$$x_{t+1} = A(a_{t+1})\, x_t + B(\sigma_{t+1})\, v_{t+1} \qquad (11)$$

$$y_t = C(h_t)\, x_t + \Lambda_t\, \varepsilon_t \qquad (12)$$

with $x_t \triangleq (x_{1,t}^{\mathrm{T}}, \ldots, x_{n,t}^{\mathrm{T}})^{\mathrm{T}}$, $A(a_t) \triangleq \mathrm{diag}(A(a_{1,t}), \ldots, A(a_{n,t}))$, $v_t \triangleq (v_{1,t}, \ldots, v_{n,t})^{\mathrm{T}}$, $\varepsilon_t \triangleq (\varepsilon_{1,t}, \ldots, \varepsilon_{m,t})^{\mathrm{T}}$ and $\Lambda_t \triangleq \mathrm{diag}(\sigma_{\varepsilon_1,t}, \ldots, \sigma_{\varepsilon_m,t})$, where $C(h_t)$ contains the mixing system and $B(\sigma_t)$ routes the scaled dynamic noise into the leading component of each source state.

2.3.2 State-space representation of the parameters

Defining the "stacked" parameter vectors, for $i = 1, \ldots, n$ and $j = 1, \ldots, m$,

$$a_t \triangleq (a_{1,t}^{\mathrm{T}}, \ldots, a_{n,t}^{\mathrm{T}})^{\mathrm{T}}, \qquad h_t \triangleq (h_{1,1,t}^{\mathrm{T}}, \ldots, h_{m,n,t}^{\mathrm{T}})^{\mathrm{T}} \qquad (13)$$

it is possible to consider the following state space representations:

$$a_{t+1} = A\, a_t + B\, w_t, \qquad s_t = \Phi(x_{t-1})\, a_t + \Sigma_t\, v_t \qquad (14)$$

where $s_t \triangleq (s_{1,t}, \ldots, s_{n,t})^{\mathrm{T}}$, $\Sigma_t \triangleq \mathrm{diag}(\sigma_{1,t}, \ldots, \sigma_{n,t})$ and the observation matrix is built from the lagged source values,

$$\Phi(x_{t-1}) \triangleq \mathrm{diag}(x_{1,t-1}^{\mathrm{T}}, \ldots, x_{n,t-1}^{\mathrm{T}}) \qquad (15)$$

and similarly

$$h_{t+1} = C\, h_t + D\, \eta_t, \qquad y_t = \Psi(x_t)\, h_t + \Lambda_t\, \varepsilon_t \qquad (16)$$

where $\Psi(x_t)$ is built by stacking the lagged source values $\tilde{s}_{i,t} \triangleq (s_{i,t}, \ldots, s_{i,t-l+1})^{\mathrm{T}}$ so that row $j$ multiplies the filters $h_{j,1,t}, \ldots, h_{j,n,t}$,

$$\Psi(x_t) \triangleq I_m \otimes \left(\tilde{s}_{1,t}^{\mathrm{T}}, \ldots, \tilde{s}_{n,t}^{\mathrm{T}}\right) \qquad (17)$$

This is of practical interest as it allows us to integrate out the mixing filters and autoregressive filters since, conditional upon the variances and states $x_{0:t}$, the systems (14) and (16) are linear Gaussian state space models. Together with (11) they define a bilinear Gaussian process.
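The companion form (9)-(10) in code: a small helper of ours that builds $A(a)$ and advances one source state. It is a direct transcription of the matrix in (10), not code from the paper.

```python
import numpy as np

def companion(a):
    """Companion matrix A(a) of Eq. (10) for AR coefficients a = (a_1, ..., a_p)."""
    p = len(a)
    A = np.zeros((p, p))
    A[0, :] = a                  # first row carries the AR coefficients
    A[1:, :-1] = np.eye(p - 1)   # identity block shifts past samples down
    return A

def propagate_source_state(x, a, sigma, v):
    """One step of Eq. (9): x_{t+1} = A(a_{t+1}) x_t + b * sigma * v_{t+1}."""
    b = np.zeros(len(a))
    b[0] = 1.0
    return companion(a) @ x + b * sigma * v
```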




2.4 Objectives

Our aim is, given the number of sources $n$ and the model orders, to estimate sequentially the sources and their parameters from the observations $y_{1:t}$. More precisely, in the framework of Bayesian estimation, one is interested in the recursive, in time, estimation of posterior distributions of the type $p(z_{0:t-L} \mid y_{1:t})$: when $L = 0$ this corresponds to the filtering distribution, and when $L > 0$ this corresponds to the fixed-lag smoothing distribution. This is a very complex problem that does not admit any analytical solution, and one has to resort to numerical methods. In the next section we develop such a numerical method based on Monte Carlo simulation. Subsequently we will use the following notation: $y_{1:t} \triangleq (y_1, \ldots, y_t)$, $\sigma_t \triangleq (\sigma_{1,t}, \ldots, \sigma_{n,t}, \sigma_{\varepsilon_1,t}, \ldots, \sigma_{\varepsilon_m,t})$ and $z_t \triangleq (x_t, \sigma_t)$.
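To fix ideas before the algorithmic development, here is one possible per-particle container: it tracks $z_t = (x_t, \sigma_t)$ together with the Kalman sufficient statistics that Section 3.2 uses to integrate out $a_t$ and $h_t$. The field names and layout are our own illustrative choices, not an API from the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Particle:
    """One particle: the tracked state z_t = (x_t, sigma_t), the Kalman
    statistics for the marginalized parameters a_t (system (14)) and
    h_t (system (16)), and a log importance weight."""
    x: np.ndarray           # stacked source states x_t
    sigma_v: np.ndarray     # dynamic noise std devs (one per source)
    sigma_eps: np.ndarray   # observation noise std devs (one per sensor)
    a_mean: np.ndarray      # posterior mean of AR coefficients a_t
    a_cov: np.ndarray       # posterior covariance of a_t
    h_mean: np.ndarray      # posterior mean of mixing filters h_t
    h_cov: np.ndarray       # posterior covariance of h_t
    log_w: float = 0.0      # log importance weight
```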



3 A Simulation-Based Optimal Filter/Fixed-lag Smoother

This section develops a simulation-based optimal filter/fixed-lag smoother to obtain filtered/fixed-lag smoothed estimates of the unobserved sources and their parameters of the type

$$I(f_t) = \mathbb{E}\left[f_t(z_{0:t}) \mid y_{1:t}\right] = \int f_t(z_{0:t})\, p(z_{0:t} \mid y_{1:t})\, dz_{0:t} \qquad (18)$$

The standard Bayesian importance sampling method is first described, and then we show how it is possible to take advantage of the analytical structure of the model by integrating out the parameters $a_{0:t}$ and $h_{0:t}$, which can be high dimensional, using Kalman filter related algorithms. This leads to an elegant and efficient algorithm for which the only tracked parameters are the sources and the noise variances. Then a sequential version of Bayesian importance sampling for optimal filtering is presented, and it is shown why it is necessary to introduce selection as well as diversity in the process. Finally, a Monte Carlo filter/fixed-lag smoother for our problem is described.

3.1 Monte Carlo Simulation for Optimal Estimation

For any $t$ it will subsequently be assumed that $p(z_{0:t} \mid y_{1:t})$ can be evaluated pointwise up to a normalizing constant. Suppose that it is possible to sample $N$ i.i.d. samples, called particles, $z_{0:t}^{(1)}, \ldots, z_{0:t}^{(N)}$, according to $p(z_{0:t} \mid y_{1:t})$. Then an empirical estimate of this distribution is given by

$$\hat{p}_N(dz_{0:t} \mid y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{z_{0:t}^{(i)}}(dz_{0:t}) \qquad (19)$$

so that a Monte Carlo approximation of the marginal distribution $p(z_t \mid y_{1:t})$ follows as

$$\hat{p}_N(dz_t \mid y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{z_t^{(i)}}(dz_t) \qquad (20)$$

Using this distribution, an estimate of $I(f_t)$ for any $f_t$ may be obtained as

$$\hat{I}_N(f_t) = \frac{1}{N} \sum_{i=1}^{N} f_t\!\left(z_{0:t}^{(i)}\right) \qquad (21)$$

This estimate is unbiased and, from the strong law of large numbers, $\hat{I}_N(f_t)$ converges to $I(f_t)$ almost surely as $N \rightarrow +\infty$. Under additional assumptions the estimates satisfy a central limit theorem. The advantage of the Monte Carlo method is clear. It is easy to estimate $I(f_t)$ for any $f_t$, and the rate of convergence of this estimate does not depend on $t$ or the dimension of the state space, but only on the number of particles $N$ and the characteristics of the function $f_t$. Unfortunately, it is not possible to sample directly from the distribution $p(z_{0:t} \mid y_{1:t})$ at any $t$, and alternative strategies need to be investigated.

One solution to estimate $p(z_{0:t} \mid y_{1:t})$ and $I(f_t)$ is the well-known Bayesian importance sampling method [9]. This method assumes the existence of an arbitrary importance distribution $\pi(z_{0:t} \mid y_{1:t})$ which is easily simulated from, and whose support contains that of $p(z_{0:t} \mid y_{1:t})$. Using this distribution, $I(f_t)$ may be expressed as

$$I(f_t) = \frac{\int f_t(z_{0:t})\, w(z_{0:t})\, \pi(dz_{0:t} \mid y_{1:t})}{\int w(z_{0:t})\, \pi(dz_{0:t} \mid y_{1:t})} \qquad (22)$$

where the importance weight is given by

$$w(z_{0:t}) = \frac{p(z_{0:t} \mid y_{1:t})}{\pi(z_{0:t} \mid y_{1:t})} \qquad (23)$$

The importance weight can normally only be evaluated up to a constant of proportionality since, following from Bayes' rule,

$$p(z_{0:t} \mid y_{1:t}) = \frac{p(y_{1:t} \mid z_{0:t})\, p(z_{0:t})}{p(y_{1:t})} \qquad (24)$$

where the normalizing constant $p(y_{1:t})$ can typically not be expressed in closed-form.

If $N$ i.i.d. samples $z_{0:t}^{(i)}$ can be simulated according to the distribution $\pi(z_{0:t} \mid y_{1:t})$, a Monte Carlo estimate of $I(f_t)$ in (22) may be obtained as

$$\hat{I}_N(f_t) = \sum_{i=1}^{N} \tilde{w}_t^{(i)}\, f_t\!\left(z_{0:t}^{(i)}\right) \qquad (25)$$

where the normalized importance weights are given by

$$\tilde{w}_t^{(i)} = \frac{w\!\left(z_{0:t}^{(i)}\right)}{\sum_{j=1}^{N} w\!\left(z_{0:t}^{(j)}\right)} \qquad (26)$$

This method is equivalent to a point mass approximation of the target distribution of the form

$$\hat{p}_N(dz_{0:t} \mid y_{1:t}) = \sum_{i=1}^{N} \tilde{w}_t^{(i)}\, \delta_{z_{0:t}^{(i)}}(dz_{0:t}) \qquad (27)$$

The perfect simulation case, when $\pi(z_{0:t} \mid y_{1:t}) = p(z_{0:t} \mid y_{1:t})$, corresponds to $\tilde{w}_t^{(i)} = 1/N$ for $i = 1, \ldots, N$. In practice, the importance distribution will be chosen to be as close as possible to the target distribution in a given sense. For finite $N$, $\hat{I}_N(f_t)$ is biased, since it involves a ratio of estimates, but asymptotically, according to the strong law of large numbers, it converges to $I(f_t)$ almost surely. Under additional assumptions a central limit theorem also holds [9].
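The self-normalized estimator (25)-(26) is straightforward to implement; the sketch below works in log-space for numerical stability, which is an implementation choice of ours rather than something specified in the paper.

```python
import numpy as np

def normalized_weights(log_w):
    """Eq. (26): normalized importance weights from unnormalized log-weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())    # subtract the max to avoid overflow
    return w / w.sum()

def is_estimate(f_values, log_w):
    """Eq. (25): self-normalized importance sampling estimate of E[f_t | y_{1:t}]."""
    return float(np.dot(normalized_weights(log_w), np.asarray(f_values, dtype=float)))
```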

3.2 Analytical integrations

It is possible to reduce the estimation of $p(a_{0:t}, h_{0:t}, z_{0:t} \mid y_{1:t})$ and $I(f_t)$ to one of sampling from the lower dimensional distribution $p(z_{0:t} \mid y_{1:t})$, where we recall that $z_t \triangleq (x_t, \sigma_t)$. Indeed,

$$p(a_{0:t}, h_{0:t}, z_{0:t} \mid y_{1:t}) = p(a_{0:t}, h_{0:t} \mid z_{0:t}, y_{1:t})\; p(z_{0:t} \mid y_{1:t}) \qquad (28)$$

where $p(a_{0:t}, h_{0:t} \mid z_{0:t}, y_{1:t})$ is a Gaussian distribution whose parameters may be computed using Kalman filter type techniques. Thus, given an approximation of $p(z_{0:t} \mid y_{1:t})$, an approximation of $p(a_{0:t}, h_{0:t}, z_{0:t} \mid y_{1:t})$ may straightforwardly be obtained. Defining the marginal importance distribution and associated importance weight as

$$\pi(z_{0:t} \mid y_{1:t}) = \int \pi(a_{0:t}, h_{0:t}, z_{0:t} \mid y_{1:t})\, da_{0:t}\, dh_{0:t}, \qquad w_t = \frac{p(z_{0:t} \mid y_{1:t})}{\pi(z_{0:t} \mid y_{1:t})} \qquad (29)$$

and assuming that a set of samples $z_{0:t}^{(i)}$ distributed according to $\pi(z_{0:t} \mid y_{1:t})$ is available, an alternative Bayesian importance sampling estimate of $I(f_t)$ follows as

$$\hat{I}_N(f_t) = \sum_{i=1}^{N} \tilde{w}_t^{(i)}\, f_t\!\left(z_{0:t}^{(i)}\right) \qquad (30)$$

provided that $p(z_{0:t} \mid y_{1:t})$ can be evaluated up to a normalizing constant in a closed-form expression. In (30) the marginal normalized importance weights are given by

$$\tilde{w}_t^{(i)} = \frac{w_t\!\left(z_{0:t}^{(i)}\right)}{\sum_{j=1}^{N} w_t\!\left(z_{0:t}^{(j)}\right)}, \qquad i = 1, \ldots, N \qquad (31)$$

Intuitively, to reach a given precision, (30) will need a reduced number of samples over (25), since it only requires samples from the marginal distribution $p(z_{0:t} \mid y_{1:t})$. It can be proved that the variance of the estimates is subsequently reduced [8]. In our case this is important as, at each time instant, the number of parameters is (when assuming that all mixing filters and AR processes have the same lengths $l$ and $p$):

- of the order of $n \times m \times l$ parameters for the mixing filters, where $l$ can be large,
- $m$ parameter(s) for the observation noise,
- $n \times p$ parameters for the autoregressive processes,
- $n$ parameters for the sources.

It is not clear which integration will allow for the best variance reduction [8], but at least in terms of search in the parameter space the integration of the mixing filters and autoregressive filters seems preferable. Given these results, the subsequent discussion will focus on Bayesian importance sampling methods to obtain approximations of $p(z_{0:t} \mid y_{1:t})$ and $I(f_t)$ using an importance distribution of the form $\pi(z_{0:t} \mid y_{1:t})$. The methods described up to now are batch methods. The next section illustrates how a sequential method may be obtained.

3.3 Sequential Bayesian Importance Sampling

The importance distribution at discrete time $t$ may be factorized as

$$\pi(z_{0:t} \mid y_{1:t}) = \pi(z_0) \prod_{k=1}^{t} \pi(z_k \mid z_{0:k-1}, y_{1:k}) \qquad (32)$$

The aim is to obtain at any time $t$ an estimate of the distribution $p(z_{0:t} \mid y_{1:t})$ and to be able to propagate this estimate in time without modifying subsequently the past simulated trajectories $z_{0:t-1}^{(i)}$. This means that $\pi(z_{0:t} \mid y_{1:t})$ should admit $\pi(z_{0:t-1} \mid y_{1:t-1})$ as marginal distribution. This is possible if the importance distribution is restricted to be of the general form

$$\pi(z_{0:t} \mid y_{1:t}) = \pi(z_{0:t-1} \mid y_{1:t-1})\; \pi(z_t \mid z_{0:t-1}, y_{1:t}) \qquad (33)$$

Such an importance distribution allows recursive evaluation of the importance weights in time, and in our particular case

$$w_t \propto w_{t-1}\; \frac{p(y_t \mid y_{1:t-1}, z_{0:t})\; p(z_t \mid z_{0:t-1})}{\pi(z_t \mid z_{0:t-1}, y_{1:t})} \qquad (34)$$

The quantity $p(y_t \mid y_{1:t-1}, z_{0:t})$ can be computed up to a normalizing constant using a one step ahead Kalman filter for the system given by Eq. (16), and $p(z_t \mid z_{0:t-1})$ can be computed using a one step ahead Kalman filter of the system given by Eq. (14).
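Conditional on the sources and variances, system (16) is linear Gaussian, so the predictive density $p(y_t \mid y_{1:t-1}, z_{0:t})$ appearing in (34) can be evaluated with one Kalman prediction/update step. The sketch below assumes the typical random walk dynamics $C = I$, $D = \delta_h I$ from Section 2.2; the function and argument names are ours.

```python
import numpy as np

def kalman_step(mean, cov, Psi, y, R, delta_h=0.01):
    """One prediction/update step of a Kalman filter for system (16):
    random-walk state h_t (C = I, D = delta_h * I), observation
    y_t = Psi(x_t) h_t + noise with covariance R. Returns the updated
    (mean, cov) and log p(y_t | y_{1:t-1}, sources), used in Eq. (34)."""
    # Prediction: same mean, inflated covariance.
    cov = cov + (delta_h ** 2) * np.eye(len(mean))
    # One-step-ahead predictive distribution of y_t.
    S = Psi @ cov @ Psi.T + R            # innovation covariance
    e = y - Psi @ mean                   # innovation
    K = cov @ Psi.T @ np.linalg.inv(S)   # Kalman gain
    # Update with the new observation.
    mean = mean + K @ e
    cov = cov - K @ Psi @ cov
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * S)
    log_pred = -0.5 * (logdet + e @ np.linalg.solve(S, e))
    return mean, cov, log_pred
```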


3.3.1 Choice of the Importance Distribution

There is an unlimited number of choices for the importance distribution $\pi(z_t \mid z_{0:t-1}, y_{1:t})$, the only restriction being that its support includes that of $p(z_t \mid z_{0:t-1}, y_{1:t})$. Two possibilities are considered next. A possible strategy is to choose at time $t$ the importance distribution that minimizes the variance of the importance weights given $z_{0:t-1}$ and $y_{1:t}$. The importance distribution that satisfies this condition is [9] $p(z_t \mid z_{0:t-1}, y_{1:t})$, with the associated incremental importance weight given by

$$w_t \propto w_{t-1}\; p(y_t \mid y_{1:t-1}, z_{0:t-1}) \qquad (35)$$

Direct sampling from the optimal importance distribution is difficult, and evaluating the importance weight is analytically intractable. The aim is thus in general to mimic the optimal distribution by means of tractable approximations, typically local linearization of $p(z_t \mid z_{0:t-1}, y_{1:t})$. Instead, here we describe a mixed suboptimal method. We propose to sample the particles at time $t$ according to two importance distributions $\pi_1$ and $\pi_2$, with proportions $\gamma$ and $1 - \gamma$, such that the importance weights now have the form (note that it would be possible to draw the numbers of particles assigned to each distribution randomly according to a Bernoulli distribution with parameter $\gamma$, but this would increase the estimator variance)

$$w_t \propto w_{t-1}\; \frac{p(y_t \mid y_{1:t-1}, z_{0:t})\; p(z_t \mid z_{0:t-1})}{\gamma\, \pi_1(z_t \mid z_{0:t-1}, y_{1:t}) + (1 - \gamma)\, \pi_2(z_t \mid z_{0:t-1}, y_{1:t})} \qquad (36)$$

which in practice is normalized across the particles and estimated as

$$\tilde{w}_t^{(i)} = \frac{w_t^{(i)}}{\sum_{j=1}^{N} w_t^{(j)}}, \qquad i = 1, \ldots, N \qquad (37)$$

The importance distribution $\pi_1$ for the new source states is taken to be a normal distribution centered around zero with fixed variance, and $\pi_2$ is taken to be the Gaussian distribution obtained from a one step ahead Kalman filter for the state space model described in (11), with the current estimates of the autoregressive and mixing parameters as values for $a_t$ and $h_t$, together with their initial variances. The variances are sampled from their prior distributions, and expression (34) is used to compute the weights. Note that other importance distributions are possible, but this approach yields good results and seems to preserve diversity of the samples.
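A sketch of this mixed proposal: a fraction $\gamma$ of the particles is drawn from a zero-centered Gaussian $\pi_1$ and the rest from a Kalman-based Gaussian $\pi_2$, and both densities are evaluated at every draw so the mixture denominator of (36) can be formed. Here $\pi_2$ is abstracted as a (mean, covariance) pair; shapes and names are our assumptions.

```python
import numpy as np

def gaussian_logpdf(z, mean, cov):
    """Log density of N(mean, cov) evaluated at the rows of z."""
    d = z - mean
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * cov)
    maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
    return -0.5 * (logdet + maha)

def sample_mixed_proposal(N, gamma, pi1_std, pi2_mean, pi2_cov, seed=0):
    """Draw N1 = round(gamma*N) particles from pi_1 = N(0, pi1_std^2 I) and
    N - N1 from pi_2 = N(pi2_mean, pi2_cov); return the draws and the log
    mixture density appearing in the denominator of Eq. (36)."""
    rng = np.random.default_rng(seed)
    dim = len(pi2_mean)
    N1 = int(round(gamma * N))
    z1 = pi1_std * rng.standard_normal((N1, dim))
    z2 = rng.multivariate_normal(pi2_mean, pi2_cov, size=N - N1)
    z = np.vstack([z1, z2])
    log_pi1 = gaussian_logpdf(z, np.zeros(dim), pi1_std ** 2 * np.eye(dim))
    log_pi2 = gaussian_logpdf(z, pi2_mean, pi2_cov)
    log_mix = np.logaddexp(np.log(gamma) + log_pi1, np.log(1.0 - gamma) + log_pi2)
    return z, log_mix
```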

3.3.2 Degeneracy of the Algorithm

For importance distributions of the form specified by (33) the variance of the importance weights can only increase (stochastically) over time, see [9] and references therein. It is thus impossible to avoid a degeneracy phenomenon. Practically, after a few iterations of the algorithm, all but one of the normalized importance weights are very close to zero, and a large computational effort is devoted to updating trajectories whose contribution to the final estimate is almost zero. For this reason it is of crucial importance to include selection and diversity. This is discussed in more detail in the following section.

3.4 Selection and diversity

The purpose of a selection (or resampling) procedure is to discard particles with low normalized importance weights and multiply those with high normalized importance weights, so as to avoid the degeneracy of the algorithm. A selection procedure associates with each particle, say $\tilde{z}_{0:t}^{(i)}$, a number of children $N_i \in \mathbb{N}$, such that $\sum_{i=1}^{N} N_i = N$, to obtain $N$ new particles $z_{0:t}^{(i)}$. If $N_i = 0$ then $\tilde{z}_{0:t}^{(i)}$ is discarded, otherwise it has $N_i$ children at time $t$. After the selection step the normalized importance weights for all the particles are reset to $1/N$, thus discarding all information regarding the past importance weights. Thus, the normalized importance weight prior to selection in the next time step is proportional to (34). These will be denoted $\tilde{w}_t^{(i)}$, since they do not depend on any past values of the normalized importance weights. If the selection procedure is performed at each time step, then the approximating distribution before the selection step is given by $\hat{p}_N(dz_{0:t} \mid y_{1:t}) = \sum_{i=1}^{N} \tilde{w}_t^{(i)} \delta_{\tilde{z}_{0:t}^{(i)}}(dz_{0:t})$, and the one after the selection step follows as $\check{p}_N(dz_{0:t} \mid y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{z_{0:t}^{(i)}}(dz_{0:t})$. We choose systematic sampling [8] for its good variance properties; a sketch is given below.
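For completeness, the standard systematic resampling algorithm written out in code; the implementation details are ours, the technique is the one referenced above.

```python
import numpy as np

def systematic_resample(weights, seed=0):
    """Systematic sampling: particle i receives N_i children with
    E[N_i] = N * w_i, using a single uniform offset shared by N strata."""
    rng = np.random.default_rng(seed)
    N = len(weights)
    positions = (rng.random() + np.arange(N)) / N
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0                      # guard against rounding error
    return np.searchsorted(cumsum, positions)
```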

However selection poses another problem. During the resampling stage any particular particle with a high importance weight will be duplicated many times. In particular, when $L > 0$, the trajectories are resampled $L$ times from time $t-L$ to $t$, so that very few distinct trajectories remain at time $t$. This is the classical problem of depletion of samples. As a result the cloud of particles may eventually collapse to a single particle. This degeneracy leads to poor approximations of the distributions of interest. Several suboptimal methods have been proposed to overcome this problem and introduce diversity amongst the particles. Most of these are based on kernel density methods [9, 10], which approximate the probability distribution using a kernel density estimate based on the current set of particles, and sample a new set of distinct particles from it. However, the choice and configuration of a specific kernel are not always straightforward. Moreover, these methods introduce additional Monte Carlo variation. It is shown below how MCMC methods may be combined with sequential importance sampling to introduce diversity amongst the samples without increasing the Monte Carlo variation.

An efficient way of limiting sample depletion consists of simply adding an MCMC step to the simulation-based filter/fixed-lag smoother (see Berzuini and Gilks in [9], and [12] for an introduction to MCMC methods). This introduces diversity amongst the samples and thus drastically reduces the problem of depletion of samples. Assume that, at time $t$, the particles $z_{0:t}^{(i)}$ are marginally distributed according to $p(z_{0:t} \mid y_{1:t})$. If a transition kernel with invariant distribution $p(z_{0:t} \mid y_{1:t})$ is applied to each of the particles, then the new particles are still distributed according to the distribution of interest. Any of the standard MCMC methods, such as the Metropolis-Hastings (MH) algorithm or Gibbs sampler, may be used. However, contrary to classical MCMC methods, the transition kernel does not need to be ergodic. Not only does this method introduce no additional Monte Carlo variation, but it improves the estimates in the sense that it can only reduce the total variation norm [12] of the current distribution of the particles with respect to the target distribution.

3.5 Implementation Issues

3.5.1 Algorithm

Given at time $t-1$, $N$ particles $z_{0:t-1}^{(i)}$ distributed approximately according to $p(z_{0:t-1} \mid y_{1:t-1})$, the Monte Carlo filter/fixed-lag smoother proceeds as follows at time $t$.

Monte Carlo filter/fixed-lag smoother

Sequential Importance Sampling Step
- For $i = 1, \ldots, N$, sample $\tilde{z}_t^{(i)} \sim \pi(z_t \mid z_{0:t-1}^{(i)}, y_{1:t})$ and set $\tilde{z}_{0:t}^{(i)} \triangleq (z_{0:t-1}^{(i)}, \tilde{z}_t^{(i)})$.
- For $i = 1, \ldots, N$, compute the normalized importance weights $\tilde{w}_t^{(i)}$ using (34) and (37).

Selection Step
- Multiply / discard particles $\tilde{z}_{0:t}^{(i)}$ with respect to high / low normalized importance weights to obtain $N$ particles $z_{0:t}^{(i)}$, using systematic sampling.

MCMC Step
- For $i = 1, \ldots, N$, apply to $z_{t-L:t}^{(i)}$ a Markov transition kernel with invariant distribution $p(z_{t-L:t} \mid z_{0:t-L-1}^{(i)}, y_{1:t})$ to obtain $N$ updated particles $z_{0:t}^{(i)}$.

3.5.2 Implementation of the MCMC Steps

There is an unlimited number of choices for the MCMC transition kernel. Here a one-at-a-time MH algorithm is adopted that updates at time $t$ the values of the Markov process from time $t-L$ to $t$. More specifically, $z_k^{(i)}$, for $k = t-L, \ldots, t$ and $i = 1, \ldots, N$, is sampled according to an MH step with $p(z_k \mid z_{-k}^{(i)}, y_{1:t})$ as target distribution, where $z_{-k}^{(i)} \triangleq (z_{0:k-1}^{(i)}, z_{k+1:t}^{(i)})$. Evaluation of the acceptance probability can be done efficiently via a backward-forward algorithm whose complexity is linear in the lag $L$ [7]. The algorithm is fully described in [2]. The computational complexity of the whole algorithm at each iteration is clearly $O(N)$. At first glance, it could appear necessary to keep in memory the paths of all the trajectories $z_{0:t}^{(i)}$, so that the storage requirements would increase linearly with time. In fact, the importance distribution $\pi(z_t \mid z_{0:t-1}, y_{1:t})$, and the associated importance weights, depend on $z_{0:t-1}^{(i)}$ only via a set of low-dimensional sufficient statistics, and only these values need to be kept in memory. Thus, the storage requirements are also $O(N)$ and do not increase over time.
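One possible shape for this MCMC step is sketched below: a one-at-a-time random-walk MH sweep over the lag window, targeting $p(z_k \mid z_{-k}, y_{1:t})$. The `log_target` callable (the conditional density evaluated, e.g., via the backward-forward recursion of [7]) is abstracted away, and the step size and proposal form are illustrative assumptions of ours.

```python
import numpy as np

def mh_refresh_lag(z_window, log_target, step=0.1, seed=0):
    """One-at-a-time Metropolis-Hastings over the lag window z_{t-L:t}.
    z_window: (L+1, dim) array; log_target(k, z_k, z_window) evaluates
    log p(z_k | z_{-k}, y_{1:t}) up to a constant."""
    rng = np.random.default_rng(seed)
    for k in range(z_window.shape[0]):        # sweep k = t-L, ..., t
        prop = z_window[k] + step * rng.standard_normal(z_window.shape[1])
        # Symmetric proposal, so the acceptance ratio is the target ratio.
        log_acc = log_target(k, prop, z_window) - log_target(k, z_window[k], z_window)
        if np.log(rng.random()) < log_acc:
            z_window[k] = prop
    return z_window
```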

4 Simulations

To illustrate the efficiency of our method we have applied the procedure to two scenarios. In the first case we generated two stationary time invariant autoregressive processes, mixed by two filters of fixed length. In the second case we made the phases of the poles of the AR processes evolve over the duration of the signal. The algorithm was run with a fixed number of particles; the parameters of the system and the sources were initialized at random. The matrices $B_i$ and $D_{j,i}$ were set to small multiples of the identity. The results for the two cases are presented in Fig. 1 and Fig. 2, where the two original sources and the sources estimated on-line using the particle filter are displayed. Note that for the first scenario convergence really occurs from iteration 150 approximately, and that source 2 was apparently mistaken for source 1 before that point. For the second scenario, as expected, the problem is much harder, especially when the two poles are close, or when zero/pole cancellation occurs. Application of the algorithm to real speech signals is currently under investigation.

[Figure 1: True (two top) and estimated (two bottom) sources.]

[Figure 2: True (two top) and estimated (two bottom) sources.]

5 REFERENCES

[1] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, 1979.

[2] C. Andrieu and S.J. Godsill, "A particle filter for audio source separation", Tech. Rep., Cambridge University Engineering Department, 2000.

[3] D.L. Alspach and H.W. Sorenson, "Nonlinear Bayesian estimation using Gaussian sum approximations", IEEE Trans. Automatic Control, vol. 17, no. 4, pp. 439-448, 1972.

[4] H. Broman, U. Lindgren, H. Sahlin and P. Stoica, "Source Separation: A TITO System Identification Approach", Signal Processing, 1998.

[5] R.S. Bucy and K.D. Senne, "Digital synthesis of nonlinear filters", Automatica, vol. 7, pp. 287-298, 1971.

[6] D. Crisan, P. Del Moral and T. Lyons, "Discrete filtering using branching and interacting particle systems", Markov Processes and Related Fields, to appear, 2000.

[7] A. Doucet and C. Andrieu, "Iterative algorithms for optimal state estimation of jump Markov linear systems", in Proc. IEEE ICASSP, 1999.

[8] A. Doucet, J.F.G. de Freitas and N.J. Gordon (eds.), Sequential Monte Carlo Methods in Practice, Springer-Verlag, to appear June 2000.

[9] A. Doucet, S.J. Godsill and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering", Statistics and Computing, to appear, 2000.

[10] N.J. Gordon, D.J. Salmond and A.F.M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation", IEE Proceedings-F, vol. 140, no. 2, pp. 107-113, 1993.

[11] J.E. Handschin and D.Q. Mayne, "Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering", Int. J. Control, vol. 9, no. 5, pp. 547-559, 1969.

[12] C.P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, 1999.

[13] E. Weinstein, A.V. Oppenheim, M. Feder and J.R. Buck, "Iterative and Sequential Algorithms for Multisensor Signal Enhancement", IEEE Trans. Signal Processing, vol. 42, no. 4, April 1994.
