Auxiliary function based IVA using a source prior exploiting fourth order relationships Yanfeng Liang, Jack Harris, Gaojie Chen, Syed Mohsen Naqvi, Christian Jutten, Jonathon Chambers

To cite this version:

Yanfeng Liang, Jack Harris, Gaojie Chen, Syed Mohsen Naqvi, Christian Jutten, et al.. Auxiliary function based IVA using a source prior exploiting fourth order relationships. EUSIPCO 2013 - 21th European Conference, Sep 2013, Marrakech, Morocco. pp.EUSIPCO 2013 1569746061. ￿hal-00867131￿

HAL Id: hal-00867131 https://hal.archives-ouvertes.fr/hal-00867131 Submitted on 27 Sep 2013

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. EUSIPCO 2013 1569746061

1 2 3 4 AUXILIARY FUNCTION BASED IVA USING A SOURCE PRIOR 5 EXPLOITING FOURTH ORDER RELATIONSHIPS 6 7 Yanfeng Liang∗, Jack Harris∗,†, Gaojie Chen∗, Syed Mohsen Naqvi∗,Christian Jutten†, Jonathon Chambers∗ 8 9 ∗ 10 School of Electronic, Electrical and System Engineering 11 Loughborough University, UK 12 {y.liang2, g.chen, s.m.r.naqvi, j.a.chambers}@lboro.ac.uk 13 †GIPSA-Lab, CNRS UMR5216, 14 Universite´ de Grenoble, France 15 {jack.harris, christian.jutten}@gipsa-lab.grenoble-inp.fr 16 17 18 19 ABSTRACT algorithmically. The permutation ambiguity can theoretically be avoided by using a multivariate source prior to retain the 20 Independent vector analysis (IVA) can theoretically avoid the dependency between different bins of each source 21 permutation ambiguity present in indepen- [7][8][9]. IVA can thereby solve the permutation problem 22 dent component analysis by using a multivariate source prior during the convergence process without post processing. In 23 to retain the dependency between different frequency bins of recent years, different modified IVA methods have been pro- 24 each source. The auxiliary function based independent vector posed. An adaptive step size IVA method is proposed to in- 25 analysis (AuxIVA) is a stable and fast update IVA algorithm crease the convergence speed [10]. The fast fix-point IVA 26 which includes no tuning parameters. In this paper, a particu- method adopts Newton’s update method to obtain a fast con- 27 lar multivariate generalized Gaussian distribution source prior vergence form of IVA [11]. Video information is introduced 28 is therefore adopted to derive the AuxIVA algorithm which to form the audio-video based IVA method [12]. In 2011, the 29 can exploit fourth order relationships to better preserve the auxiliary function based form of independent vector analy- 30 dependency between different frequency bins of speech sig- sis (AuxIVA) was proposed, which is a stable and fast form 31 nals. Experimental results confirm the improved separation of IVA algorithm. By using the auxiliary function technique 32 performance achieved by using the proposed algorithm. AuxIVA can guarantee a monotonic decrease of the cost func- 33 Index Terms— AuxIVA, multivariate generalized Gaus- tion without the need for tuning parameters such as step size 34 sian distribution, fourth order relationships [13]. 35 36 The source prior is important to all IVA methods, be- 37 1. INTRODUCTION cause it is used to derive the nonlinear score function which 38 is used to keep the dependency between different frequency 39 Independent component analysis (ICA) is the central tool for bins. For the original IVA algorithm, the multivariate Laplace 40 the blind source separation (BSS) problem [1][2]. The most distribution is adopted as the source prior [8]. However, it is 41 famous BSS problem is the cocktail party problem [3][4], in not the only form for source prior; an improved source prior 42 which one speaker must be selected from a mixture of sounds. which can better preserve the relationships between differ- 43 In the real room environment, due to reverberation, it be- ent frequency bins is still needed. In this paper, we adopt 44 comes a convolutive blind source separation (CBSS) problem. a generalized multivariate Gaussian distribution as the source 45 Therefore time domain methods are not appropriate because prior, which can preserve the fourth order terms within the 46 of the computational complexity [1]. Thus frequency domain score function. Then the AuxIVA algorithm based on this 47 methods are proposed to solve the CBSS problem [5]. How- particular source prior is derived. The proposed method will 48 ever, the permutation problem inherent to BSS needs to be be shown to better preserve the relationships between differ- 49 solved to achieve the separation result. Many methods ex- ent frequency bins, thereby improving the separation perfor- 50 ploiting the positions or spectral structures of the sources have mance. 51 been proposed to solve the permutation problem, but all of The paper is organized as follows, in Section 2, the par- 52 these methods need post processing to address the problem ticular source prior is analyzed. In Section 3, the AuxIVA al- 53 [6]. gorithm with our source prior is derived and the advantage of 54 Independent vector analysis (IVA)is proposed to solve the this source prior is discussed. Then, the experimental results 55 frequency domain blind source separation (FD-BSS) problem are shown to confirm the advantages of the proposed method 56 57 60 61

1 1 2 in Section 4. Finally, conclusions are drawn in Section 5. 3. AUXIVA USING THE PARTICULAR SOURCE 3 PRIOR 4 2. A MULTIVARIATE GENERALIZED GAUSSIAN The auxiliary function technique is developed from the 5 DISTRIBUTION SOURCE PRIOR expectation-maximization algorithm, and avoids the step size 6 tuning [14]. In the auxiliary function technique, an auxiliary 7 For the convolutive blind source separation problem, the basic function is designed for optimization. Instead of minimiz- 8 noise free model in the frequency domain is described as: ing the cost function, the auxiliary function is minimized in 9 terms of auxiliary variables. The auxiliary function technique 10 x(k)=H(k)s(k) (1) can guarantee monotonic decrease of the cost function, and 11 (k)=W (k) (k) y x (2) therefore provides effective iterative update rules [13]. 12 T The contrast function for AuxIVA is derived from the 13 where x(k)=[x1(k), ··· ,xm(k)] is the observed sig- T source prior [14]. For the original IVA algorithm, 14 nal vector, while s(k)=[s1(k), ··· ,sn(k)] and y(k)= T [y1(k), ··· ,yn(k)] 15 are the source signal vector and the es- G(yi)=GR(ri)=ri (6) 16 timated source vector respectively in the frequency domain; T 17 and (·) denotes vector transpose. H(k) is the mixing matrix where ri = yi2. 18 with m×n dimension, and W (k) is the unmixing matrix with By using the proposed source prior, we can obtain the fol- 19 n × m dimension. The index k denotes the k − th frequency lowing contrast function k =1, ··· ,K 20 bin, and . The unmixing matrix can be defined 2 3 21 as G(yi)=GR(ri)=ri . (7) h 22 W (k)=[w1(k) ···wn(k)] (3) 23 The update rules contain two parts, i.e. the auxiliary vari- (·)h 24 where denotes Hermitian transpose. able updates and unmixing matrix updates. In summary, the 25 For traditional CBSS approaches, the scalar Laplace dis- update rules are as follow: 26 tribution is widely used for the source prior. However, the   K 27 resultant nonlinear score function is a univariate function,  r =  |wh(k)x(k)|2 28 which can not retain the dependency between different fre- i i (8) k=1 29 quency bins for each source. In [8], IVA exploits a dependent 30 multivariate Laplace distribution as the source prior. In this GR(ri) h 31 paper we adopt a particular multivariate distribution as the Vi(k)=E[ x(k)x(k) ] (9) ri 32 source prior which can be written as: 33    −1 3 h −1 wi(k)=(W (k)Vi(k)) ei (10) 34 p(si) ∝ exp − (si − μi) Σi (si − μi) (4) 35 T wi(k) 36 =[s (1), ··· ,s (K)] i wi(k)= . where si i i is the -th vector source, and h (11) −1 wi (k)Vi(k)wi(k) 37 μi and Σi are respectively the vector and 38 matrix. The is then assumed to be a di- In equation (10), ei is a unity vector, the i-th element of 39 agonal matrix due to the orthogonality of the Fourier bases, which is unity. 40 which implies that each frequency bin sample is mutually un- During the update process of the auxiliary variable Vi(k), G (r ) 41 correlated. Moreover, if the source signal is zero mean and R i we notice that ri is used to keep the dependency between 42 unity , then the source prior becomes: different frequency bins for source i. In this paper, as we 2 43 3   K  1     2  defined previously, GR(ri)=ri . Therefore 44 2 3 3 p(si) ∝ exp − |si(k)| = exp − si2 (5) 45 G (r ) 2 2 k=1 R i = = r 4 (12) 46 i 3r 3 3 K 2 2 i 3 ( k=1 |yi(k)| ) 47 where |·|denotes the absolute value, and ·2 denotes the 48 Euclidean norm. This new source prior can be taken as a mul- If we expand the equation under the cubic root in equation 49 tivariate generalized Gaussian distribution with the shape pa- 2 (12), then 50 rameter 3 , and the original multivariate Laplace distribution K K 51 also belongs to this family with the 1. For the 2 2 4 2 2 ( |yi(k)| ) = |yi(k)| + cuv|yi(u)| |yi(v)| (13) 52 multivariate generalized Gaussian distribution, the smaller the k=1 k=1 u=v 53 shape parameter, the heavier are the tails. Thus, this source 2 2 54 prior has a heavier tail, which which can have advantage in which contains the cross items u=v cuv|yi(u)| |yi(v)| , 55 separating non-stationary signals. where cuv denotes a scalar coefficient between the u-th and 56 57 60 61

2 1 v 2 -th frequency bins. These terms contain the fourth order inter-relationships of different components for each source 60 3 20 vector. Thus, they can provide an informative model of the 50 4 40 5 dependency structure. 40 The use of such fourth order information has seldom been 60 6 30 highlighted in original AuxIVA.We will show the comparison 80 7 20 8 of the second order information and the fourth order informa- 100 tion inherent to speech signals. We choose a particular speech 10 9 120 10 signal “si10390.wav” from the TIMIT database [15], with 8 20 40 60 80 100 120 11 kHz frequency and 1024 DFT length. Fig 1 is part of the covariance matrix, which is correspondent to the low 12 Fig. 2. Fourth order inter-frequency relationships informa- frequency bins. It is difficult to observe any information in 13 tion for the speech signal “si1039.wav”, x and y dimensions the high frequency bins due to the limited energy. We can see 14 correspond to frequency bins 1 to 128 of 512. 15 that the second order information is mainly distributed on the 16 diagonal. This is because the is an orthog- 17 onal based transform. Then the general form of equation (12) is: 18 GR(ri) 2β 2β = = (17) 60 2(1−β) K 2 1−β 19 ri r ( |yi(k)| ) 20 i k=1 20 50 21 40 In order to preserve the form of the fourth order relationship 40 as defined in equation (13), the root needs to be odd. Thus the 22 60 30 following condition must be satisfied 23 80 24 20 2 100 1 − β = (18) 25 10 2I +1 26 120 20 40 60 80 100 120 I 27 where is positive integer. Then we can obtain the condition for β 28 2I − 1 29 Fig. 1. Second order inter-frequency relationships informa- β = 2I +1 (19) 30 tion for the speech signal “si1039.wav”, x and y dimensions β 31 correspond to frequency bins 1 to 128 of 512. On the other hand, is the shape parameter of the gen- 32 eralized multivariate Gaussian distribution. In order to make β 33 Now we construct a fourth order matrix to exploit the the proposed source prior more robust to outliers, should 34 fourth order information be less than the 1/2, which is correspondent to the original ⎛ 2 2 2 2 ⎞ source prior. Thus 35 E[(yi(1)) (yi(1)) ] ··· E[(yi(1)) (yi(K)) ] 2I − 1 1 36 ⎜ . . ⎟ < ⎝ . .. . ⎠ (14) 2I +1 2 (20) 37 . . . E[(y (K))2(y (1))2] ··· E[(y (K))2(y (K))2] 38 i i i i Finally, I =1is the only solution, and the correspondent β is 39 Fig 2 is part of this fourth order matrix, which is also 1/3, as proposed in this paper. 40 correspondent to the same low frequency bins. It is evident 41 that there is fourth order information not only on the diago- 4. AND RESULTS 42 nal. Thus, such fourth order information should be exploited 43 to help separation, as have been recently highlighted in [16], In this section, we use experiments to confirm the advan- 44 namely considering the correlation of squares of components. tages of the proposed method. We chose different speech 45 signals from the TIMIT dataset [15]. Each speech signal 46 Now we will discuss that the proposed source prior is the was approximately seven seconds long. The image method 47 most appropriate for introducing the fourth order relationship was used to generate the room impulse responses, and the 3 48 as shown in equation (13). We assume that the source prior size of the room was 7 × 5 × 3m . The DFT length was 49 has the general form 1024. We used a 2 × 2 mixing case, i.e. m = n =2, for which the microphone positions are [3.48, 2.50, 1.50]m 50   K β    2β  2 m 51 p(si) ∝ exp − |si(k)| = exp − si2 (15) and [3.52, 2.50, 1.50] respectively. The sampling frequency 52 k=1 was 8kHz. The separation performance was evaluated objec- 53 Thus the general contrast function is: tively by the signal-to-distortion ratio (SDR) and signal-to- 54 interference ratio (SIR)[17]. Fig 3 denotes the experimental 2β 55 G(yi)=GR(ri)=ri (16) setting. 56 57 60 61

3 1 2 environments. Fig 4(a) and Fig 4(b) show the SDR and SIR 3 comparison respectively. It indicates that the proposed algo- 4 rithm can consistently improve the separation performance in 5 different reverberant environment. 6 16 18

7 15 proposed 17 proposed 8 14 original 16 original 13 15

9 12 14 10 11 13 SIR(dB) SDR(dB) 10 12

11 9 11 12 8 10 7 9

13 6 8 200 250 300 350 400 450 200 250 300 350 400 450 14 Fig. 3. Plan view of the setting in the room envi- Reverberation time RT60 (ms) Reverberation time RT60 (ms) 15 ronment with two microphones and two sources (a) (b) 16 17 Fig. 4. Separation comparison between original and proposed 18 In the first experiment, we set the reverberation time RT60 AuxIVA algorithms as a function of reverberation time 19 = 200ms. We selected two different speech signals randomly 20 from the TIMIT dataset and convolved them into two mix- 21 tures. Then the orignal AuxIVA method and the proposed 22 AuxIVA method with the new source prior were used to sep- 5. CONCLUSIONS 23 arate the mixtures respectively. Then we changed the source 24 positions to repeat the simulation. For every pair of speech The auxiliary function based IVA algorithm is a recently pro- 25 signals, three different azimuth angles for the sources rela- posed fast form IVA algorithm. For all the IVA algorithms, 26 tive to the normal to the microphone array were set for test- the selection of the source prior is important. In this paper, we examined a particular multivariate generalized Gaussian 27 ing, these angles were selected from 30, 45, 60 and -30 de- distribution as the source prior, then the AuxIVA method was 28 grees as shown in Fig 3. After that, we chose another pair of derived based on this particular source prior. The proposed 29 speech signals to repeat the above simulations. In total, we method exploits a score function which contains a term de- 30 used five different pairs of speech signals, and repeated the scribing the fourth order relationships between different fre- 31 simulation 15 times at different positions. Table 1 shows the quency bins for each source, and thereby provides a more 32 average separation performance for each pair of speech sig- informative dependency structure. The experimental results 33 nals in terms of SDR and SIR. The average SDR and SIR im- showed that the proposed AuxIVA method can improve the 34 provements are approximately 1.7dB and 1.9dB respectively. separation performance by approximately 1.7dB and 1.9dB 35 The results confirm the advantage of the proposed AuxIVA respectively in terms of SDR and SIR. Meanwhile, the pro- 36 method which can better preserve the dependency between posed method can consistently improve the separation perfor- 37 different frequency bins of each source. mance in different reverberant environments. 38 39 Table 1. Separation performance comparison in terms of 40 SDR and SIR measures in dB 6. REFERENCES 41 Mixtures mix1 mix2 mix3 mix4 mix 5 42 AuxIVA (SDR) 12.13 14.62 9.86 19.23 18.64 [1] A. Cichocki and S. Amari, Adaptive Blind Signal and 43 Proposed (SDR) 14.82 16.30 12.45 19.92 19.50 Image Processing: learning algorithms and applica- 44 AuxIVA (SIR) 14.06 16.72 11.59 20.54 20.12 tions, Wiley, 2003. 45 Proposed (SIR) 17.26 18.42 14.58 21.20 20.90 46 [2] A. Hyvarinen,¨ J. Karhunen, and E. Oja, Independent 47 In the second experiment, we tested the robustness of the Component Analysis, Wiley, 2001. 48 proposed AuxIVA method in different reverberant room en- 49 vironments. We selected two speech signals from the TIMIT [3] C. Cherry, “Some experiments on the recognition of 50 dataset randomly and convolved them into two mixtures. The speech, with one and with two ears,” The Journal of 51 azimuth angles for the sources relative to the normal to the mi- The Acoustical Society of America, vol. 25, pp. 975– 52 crophone array were set as 60 and -30 degrees. Both the orig- 979, 1953. 53 inal AuxIVA and the proposed AuxIVA were used to separate 54 the mixtures. The results are shown in Fig 4, which shows the [4] S. Haykin and Z. Chen, “The cocktail party problem,” 55 separation performance comparison in different reverberant Neural Computation, vol. 17, pp. 1875–1902, 2005. 56 57 60 61

4 1 2 [5] L. Parra and C. Spence, “Convolutive blind separa- 3 tion of non-stationary sources,” IEEE Transactions 4 on Speech and Audio Processing, vol. 8, pp. 320–327, 5 2000. 6 [6] M. S. Pedersen, J. Larsen, U. Kjems, and L. C. Parra, “A 7 survey of convolutive blind source separation methods,” 8 Springer Handbook on Speech Processing and Speech 9 Communication, 2007. 10 11 [7] T. Kim, I. Lee, and T.-W. Lee, “Independent vector anal- 12 ysis: definition and algorithms,” in Fortieth Asilomar 13 Conference on Signals, Systems and Computers 2006, 14 Asilomar, USA, 2006. 15 [8] T. Kim, H. Attias, S. Lee, and T. Lee, “Blind source 16 separation exploiting higher-order frequency dependen- 17 cies,” IEEE Transactions on Audio, Speech and Lan- 18 guage processing, vol. 15, pp. 70–79, 2007. 19 20 [9] T. Kim, “Real-time independent vector analysis for con- 21 volutive blind source separation,” IEEE Transactions on 22 Circuits and Systems I, vol. 57, pp. 1431–1438, 2010. 23 [10] Y. Liang, S.M. Naqvi, and J. Chambers, “Adaptive step 24 size indepndent vector analysis for blind source separa- 25 tion,” in 17th International Conference on Digital Sig- 26 nal Processing, Corfu, Greece, 2011. 27 28 [11] I. Lee, T. Kim, and T.-W. Lee, “Fast fixed-point in- 29 dependent vector analysis algorithms for convolutive 30 blind source separation,” Signal Processing, vol. 87, pp. 31 1859–1871, 2007. 32 [12] Y. Liang, S.M. Naqvi, and J. Chambers, “Audio 33 video based fast fixed-point independent vector analy- 34 sis for multisource separation in a room environment,” 35 EURASIP Journal on Advances in Signal Processing, 36 vol. 2012:183, 2012. 37 38 [13] N. Ono, “Stable and fast update rules for independent 39 vector analysis based on auxiliary function technique,” 40 in 2011 IEEE WASPAA, New Paltz, USA, 2011. 41 [14] N. Ono and S. Miyabe, “Auxiliary-function-based 42 independent component analysis for super-Gaussian 43 source,” in LVA/IVA 2010, St. Malo, France, 2010. 44 45 [15] J. S. Garofolo et al., “TIMIT acoustic-phonetic contin- 46 uous speech corpus,” in Linguistic Consortium, 47 Philadelphia, 1993. 48 [16] A. Hyvarinen,¨ “Independent component analysis: re- 49 cent advances,” Philos Transact A Math Phys Eng Sci, 50 vol. 371(1984), pp. 1–19, 2012. 51 52 [17] E. Vincent, C. Fevotte,´ and R. Gribonval, “Performance 53 measurement in blind audio source separation,” IEEE 54 Transactions on Audio, Speech, and Language Process- 55 ing, vol. 14, pp. 1462–1469, 2006. 56 57 60 61

5