<<

SPECTRAL SMOOTHING FOR -DOMAIN BLIND SOURCE SEPARATION

Hiroshi Sawada Ryo Mukai Se´bastien de la Kethulle de Ryhove Shoko Araki Shoji Makino

NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan {sawada,ryo,delaketh,shoko,maki}@cslab.kecl.ntt.co.jp

ABSTRACT We need thousands of filter taps to separate acoustic This paper describes the circularity problem of frequency- in a room. The approach is impractical for such cases unless domain blind source separation (BSS), and presents a new starting from a good initial solution. method for solving it. Frequency-domain BSS performs in- The other approach is frequency-domain BSS, where dependent component analysis (ICA) in each frequency bin. complex-valued ICA for an instantaneous mixture is applied It is more efficient than time-domain BSS where ICA is ap- in each frequency bin [3Ð8]. The merit of this approach is plied to convolutive mixtures. However, frequency-domain that the ICA algorithm can be performed separately at each BSS has two problems. The first is the permutation prob- frequency, and the convergence of each ICA is fast. The dif- lem, for which we have recently proposed a method. It ficulty lies in solving the permutation problem and the cir- provides a robust and precise solution for the permutation cularity problem. The permutation problem is well known problem and reveals the influence of the second problem, to be a difficult problem. Recently, we have proposed a namely the circularity problem. Our solution for this sec- method [6] that provides a robust and precise solution and ond problem is based on spectral smoothing by window- also enables the separation of more than two sources. As ing. However, the direct application of windowing changes a consequence, the influence of the circularity problem is the frequency responses for separation obtained by ICA and highlighted. It deteriorates separation performance as ex- causes an error. Therefore, we adjust the frequency responses plained in Sec. 3. before windowing so that the error is minimized. The effec- One well-known aspect of the circularity problem is that tiveness of the method is shown by experimental results for a multiplication in the frequency domain corresponds to a the separation of up to four sources. circular in the time domain, which is different from a linear one. A widely used technique for simulating a linear convolution by frequency-domain multiplication in- 1. INTRODUCTION volves constraining frequency responses such that the cor- responding time-domain filter contains enough consecutive We consider the problem of blind source separation (BSS) zeros at the end [9]. To maintain this constraint strictly in for convolutive mixtures [1]. Suppose that N source sig- BSS, the gradient of a filter in the ICA algorithm should also nals sp(t) are mixed and observed at M sensors xq(t)=   be constrained to contain enough consecutive zeros. If this N ( ) ( − ) ( ) p=1 l hqp l sp t l , where hqp l represents the im- constraint is followed in frequency-domain BSS, it becomes pulse response from source p to sensor q. The goal is to sep- a frequency-domain implementation of time-domain convo- ( ) arate the mixtures xq t and to obtain a filtered version of lutive ICA [10, 11], whose convergence is slow as in the ( ) ( ) a source sp t at each output yr t . The separation system case of time-domain BSS. Some BSS methods interleave typically consists of a set of FIR filters wrq(l) of length L M L−1 this constraint with a frequency-domain ICA algorithm [7, to produce separated signals yr(t)= q=1 l=0 wrq(l) 8]. With these methods, the constraint conflicts with the xq(t − l). In a practical situation, separation should be per- ICA solution, and these two different operations should be formed without knowing sp(t) or hqp(l). Independent com- applied repeatedly until convergence. In both cases, the al- ponent analysis (ICA) [2] is one of the major statistical tools gorithm moves back and forth between the two domains, for solving this problem. We can classify BSS methods into and loses the attractive characteristic of frequency-domain two approaches based on how we apply ICA. BSS whereby ICA can be applied separately in each fre- The first approach is time-domain BSS, where ICA is quency bin. applied directly to the convolutive mixture model. The ICA Our recognition of the circularity problem is more gen- algorithm correctly evaluates the independence of separated eral than the circular convolution problem. The circularity signals yr(t). Thus, the approach achieves good separation problem arises from discrete frequency representation as ex- once the algorithm converges. However, if the algorithm plained in Sec. 3. Our technique for solving the problem in- starts from a solution far from the final one, it takes many volves spectral smoothing by a window that tapers smoothly iterations and much time to converge. This is because filter to zero at each end, and forcing a filter to fit a specific length coefficients wrq(l) depend on each other in the algorithm. L. The direct application of windowing, however, changes mixed signals separated signals x(t) w(l) y(t) 1 0 STFT IDFT X( f ,t) W( f ) −1 Amplitude −2 ICA permutation scaling smoothing W( f ) 1000 2000 3000 4000 5000 6000 1 Fig. 1. Flow of frequency-domain BSS 0

−1 the ICA solutions and causes an error. Therefore, we adjust Amplitude −2 the ICA solutions within their scaling ambiguity before win- 1000 2000 3000 4000 5000 6000 dowing so that the error is minimized in the least-squares Time (sample) sense. These techniques are explained in Sec. 4. Fig. 2. Periodical time-domain filter represented by fre- quency responses sampled at L = 2048 points (above) and 2. FREQUENCY-DOMAIN BSS its one-period realization (below). This section describes frequency-domain BSS where ICA Target: u (l) Interference: u (l) is applied separately in each frequency bin [3Ð6]. Figure 1 11 13 shows the flow. First, time-domain signals xq(t) are con- 1 1 ( ) verted into frequency-domain time-series signals X q f,t 0.5 0.5 by short-time Fourier transform (STFT), where t is now 0 0

down-sampled with the distance of the frame shift. Then, Amplitude −0.5 −0.5 to obtain the frequency responses Wrq(f) of filters wrq(l), complex-valued ICA Y(f,t)= W(f)X(f,t) is solved, 3000 4000 5000 3000 4000 5000 T where X(f,t)=[X1(f,t), ..., XM (f,t)] , Y(f,t)= T 1 1 [Y1(f,t), ...,YN (f,t)] , and W(f) is a separation matrix 0.5 0.5 whose elements are Wrq(f). Note that any ICA algorithm can be used in this scheme. The ICA solution in each fre- 0 0 quency has scaling and permutation ambiguity. We perform Amplitude −0.5 −0.5 W( ) ← P( )W( ) P( ) permutation alignment f f f , where f 3000 4000 5000 3000 4000 5000 is a permutation matrix obtained by the method [6]. The Time (sample) Time (sample) scaling ambiguity is solved by the minimal distortion prin- Fig. 3. Impulse responses urp(l) obtained by the periodical W( ) ← (W( )−1)W( ) ( ) ciple, f diag f f , to make Yr f,t as filter (above) and by the finite-length filter (below). close to Xr(f,t) as possible [4, 12]. Then, we solve the circularity problem by spectral smoothing as described in ( ) Sec. 4. Finally, separation filters wrq l are obtained by ap- are obtained by the infinite-length filters, and the lower ones ( ) plying inverse DFT to Wrq f . are by the one-period filters. We see that the one-period fil- ters create spikes, which distort a target and degrade 3. THE CIRCULARITY PROBLEM the separation performance. Since these spikes does not exist in the infinite-length case, both adjacent periods are The frequency-domain BSS described in the previous sec- needed to eliminate a spike. Here, we consider two reasons tion is influenced by the circularity of the discrete frequency for these spikes. One is that the frequency responses are un- representation. The circularity refers to the fact that fre- dersampled and the corresponding time domain filter has an quency responses sampled at L points with an interval f s/L overlap with another period. This problem does not occur (fs: sampling frequency) represent a time-domain signal if the required filter length is less than L. The other rea- whose period is L/fs. Figure 2 shows two time-domain fil- son is that adjacent periods work together to perform some ters. The upper one is a periodical infinite-length filter rep- filtering even if the first problem is solved. The effect of ( ) resented by frequency responses Wrq f calculated by ICA the second problem can be mitigated if the amplitude of the at L points. Since this filter is unrealistic, we usually use the filter coefficients around both ends is small. one-period realization of the filter shown in the lower part. The ICA algorithm in the frequency domain does not However, one-period filters may cause a problem. Fig- know the length L of the time-domain filters, and ICA solu- ure 3 shows impulse responses from a source sp(t) to a sep- M L−1 tions in various frequency bins might require the length of arated signal yr(t): urp(l)= q=1 τ=0 wrq(τ)hqp(l−τ). the time-domain filter to be longer than L and generally infi- Those on the left u11(l) correspond to the extraction of a nite. Therefore, we need to control the frequency responses target signal, and those on the right u13(l) correspond to the so that the corresponding time-domain filter fits length L suppression of an interference signal. The upper responses and has small amplitude around the ends. 4. SPECTRAL SMOOTHING Length of separation filters: 150° L = 2048 a 4.1. By Windowing Length of source signals: 7 sec We propose a method for changing the frequency responses b 110° to meet the requirement considered in the previous section. 4cm 1.2m The basic idea is spectral smoothing by windowing. We Sampling rate: 4cm apply a window g(l) to a filter wrq(l) to obtain a new fil- 8000 Hz ( ) · ( ) 4cm c 70° ter wrq l g l . To ensure that the new filter have small Reverberation time: amplitude around both ends, we use a window that tapers 130 ms smoothly to zero at each end, such as a Hanning window ° 1 2πl Room size: d 30 g(l)= 2 (1 + cos L ). By this operation, frequency re- 4.45 m × 3.55 m × 2.50 m W( ) W( ) ← sponses f obtained by ICA are smoothed as f fs−∆f φ=0 G(φ)W(f −φ), where G(f) is the frequency re- Fig. 4. Experimental conditions sponse of g(l) and ∆f = fs/L. If a Hanning window is used, the frequency responses are smoothed as great change in the predetermined scaling. Thus, the total W( ) ← [W( −∆ )+2W( )+W( +∆ )] 4 f f f f f f / cost to be minimized is J = f Jr(f), where ( ) 2 2 2 since the frequency responses G f of a Hanning window Jr(f)=||er(f)|| / ||wr(f)|| + β|dr(f) − 1| , are G(0) = 1/2, G(∆f)=G(fs −∆f)=1/4, and zero for the other frequency bins. and β is a parameter indicating the importance of maintain- ing the predetermined scaling. The minimization can be Although the windowing eliminates the spikes, it changes ∂J performed iteratively by dr(f)=dr(f) − µ with a the frequency response obtained by ICA and causes an error. ∂dr(f) The error can be calculated for each row w r(f)=[Wr1(f), small step-size µ. If a Hanning window is used, the smoothed frequency responses are ...,WrM(f)] of W(f). Let br(f) be the smoothed fre- quency responses, and αr be a complex-valued scalar rep- br(f)=[vr(f −∆f)+2vr(f)+vr(f +∆f)] / 4, resenting the scaling ambiguity of the ICA solution. The and the error can be represented as error is er(f)=minαr [br(f) − αrwr(f)], and its least- − + er(f)=[dr(f −∆f)c (f)+dr(f +∆f)c (f)] / 4. squares solution is r r c− c+ b ( )w ( )H using previously defined r and r . Thus, the gradient is e ( )=b ( ) − r f r f w ( ) r f r f 2 r f . ∂J ∂Jr(f −∆f) ∂Jr(f +∆f) ∂Jr(f) ||wr(f)|| = + + ( ) ( ) ( ) ( ) If we use a Hanning window, the smoothed frequency re- ∂dr f ∂dr f ∂dr f ∂dr f + H sponse is br(f)=[wr(f −∆f)+2wr(f)+wr(f +∆f)]/4. =[er(f −∆f)cr (f −∆f) + − H 2 Thus, the error can be represented as er(f +∆f)c (f +∆f) ] / (8 ·||wr(f)|| ) − + r er(f)=[c (f)+c (f)] / 4, where r r +2β(dr(f) − 1). w ( −∆ )w ( )H c−( )=w ( −∆ ) − r f f r f w ( ) r f r f f 2 r f , ||wr(f)|| 5. EXPERIMENTAL RESULTS H + wr(f +∆f)wr(f) c ( )=wr( +∆ ) − wr( ) We performed experiments under the conditions shown in r f f f ||w ( )||2 f . r f Fig. 4. As the ICA algorithm, we used FastICA [2] followed − + This cr (or cr ) represents the difference of two vectors by Infomax + Natural gradient [1] with 50 loops to improve wr(f) and wr(f −∆f) (or wr(f +∆f)). These differences the performance. Table 1 summarizes the separation results. are usually not very large, therefore the error does not seri- To show the effectiveness of the proposed method, we com- ously affect the separation. pare cases where the spectral smoothing was applied differ- ently: no smoothing (“no”), simply multiplying a Hanning 4.2. Pre-scaling window (“win”) and pre-scaling before multiplying a Han- ning window (“pre + win”). In the “pre + win” case, we Even if the error caused by the windowing is not very large, tested two different values for β. we can improve the separation by minimizing the error. We We evaluated the separation using signal-to-interference minimize the error by adjusting the scaling of the ICA so- ratio (SIR) and signal-to-distortion ratio (SDR). SIR is cal- lution before windowing. Let dr(f) be a complex-valued culated by the ratio of the power of a target component scalar and vr(f)=dr(f)wr(f) be a new row vector of a  urr(l) and an interference component p=r urp(l). To cal- separation matrix. We want to find dr(f) such that the error H culate SDR, we decompose the target component u rr(l) into br(f)vr(f) e ( )=b ( ) − v ( ) a scaled version of a reference and a distortion e r(t).We r f r f ||v ( )||2 r f r f selected hrr(l) as the reference following the minimal dis- is minimized, where br(f) is the smoothed frequency re- tortion principle [12]. Thus, the target component is de- sponses. A scalar dr(f) should be close to 1 to avoid any composed as urr(l)=αr · hrr(t)+er(t), where αr is a Table 1. Experimental results #sources / position 2/ab 3/abd 4/abcd smoothing no win pre + win no win pre + win no win pre + win β 0.10.01 0.10.01 0.10.01 SIR (dB) 20.6 21.6 21.8 22.1 13.4 17.2 17.8 19.1 11.8 15.6 17.0 19.0 SDR (dB) 20.5 21.1 20.8 19.6 14.9 17.0 17.6 17.0 15.1 17.9 15.8 12.2 Execution time (s) 9.7 9.7 9.8 10.1 17.9 18.0 18.2 18.7 27.0 27.0 27.3 27.9

no smoothing smoothing (win) 7. ACKNOWLEDGEMENT 1 1 We thank Prof. Scott C. Douglas for valuable discussions. 0 0

−1 −1 8. REFERENCES Amplitude −2 −2 [1] S. Haykin, Ed., Unsupervised Adaptive Filtering (Volume I: 500 1000 1500 2000 500 1000 1500 2000 Time (sample) Time (sample) Blind Source Separation), John Wiley & Sons, 2000. [2] A. Hyv¬arinen, J. Karhunen, and E. Oja, Independent Com- Fig. 5. Separation filters w11(l) ponent Analysis, John Wiley & Sons, 2001. [3] P. Smaragdis, “Blind separation of convolved mixtures in Target: u (l) Interference: u (l) 11 13 the frequency domain,” Neurocomputing, vol. 22, pp. 21Ð34, 1 1 1998. 0.5 0.5 [4] S. Ikeda and N. Murata, “A method of ICA in timeÐ frequency domain,” in Proc. ICA ’99, Jan. 1999, pp. 365Ð 0 0 370. Amplitude −0.5 −0.5 [5] H. Saruwatari, S. Kurita, and K. Takeda, “Blind source sepa- 1000 2000 3000 1000 2000 3000 ration combining frequency-domain ICA and beamforming,” Time (sample) Time (sample) in Proc. ICASSP 2001, May 2001, MULT-P2.2. Fig. 6. Impulse responses urp(l) obtained by filters, one of [6] H. Sawada, R. Mukai, S. Araki, and S. Makino, “A ro- which is shown in “smoothing (win)” in Fig. 5. bust and precise method for solving the permutation prob- lem of frequency-domain blind source separation,” in Proc. ICA2003, Apr. 2003, pp. 505Ð510. real-valued scalar to make the distortion er(t) minimum. [7] L. Parra and C. Spence, “Convolutive blind separation of Finally, SDR is calculated by the ratio of the power of αr · non-stationary sources,” IEEE Trans. Speech Audio Process- hrr(t) and er(t). ing, vol. 8, no. 3, pp. 320Ð327, May 2000. We found that spectral smoothing improved the SIR in [8] L. Schobben and W. Sommen, “A frequency domain blind all cases. With spectral smoothing, both ends of a separa- signal separation method based on decorrelation,” IEEE tion filter converge to zero as shown in Fig. 5. This elimi- Trans. , vol. 50, no. 8, pp. 1855Ð1865, nates the spikes caused by the circularity as shown in Fig. 6. Aug. 2002. However, spectral smoothing alone changes the ICA solu- [9] J. J. Shynk, “Frequency-domain and multirate adaptive fil- tions and causes an error. The pre-scaling minimizes the er- tering,” IEEE Signal Processing Magazine, vol. 9, no. 1, pp. ror and improves SIR further. If we use a smaller β, a better 14Ð37, Jan. 1992. SIR is obtained at the cost of a worse SDR. An appropriate [10] M. Joho and P. Schniter, “Frequency domain realization of β can be chosen to balance SIR and SDR. a multichannel blind deconvolution algorithm based on the natural gradient,” in Proc. ICA2003, Apr. 2003, pp. 543Ð 548. 6. CONCLUSION [11] H. Buchner, R. Aichner, and W. Kellermann, “A general- ization of a class of blind source separation algorithms for We have observed the circularity problem, and have pro- convolutive mixtures,” in Proc. ICA2003, Apr. 2003, pp. posed a method for solving the problem based on spectral 945Ð950. smoothing. This method provides good separation in com- [12] K. Matsuoka and S. Nakashima, “Minimal distortion prin- bination with our previously reported solution to the permu- ciple for blind source separation,” in Proc. ICA 2001, Dec. tation problem [6]. We have succeeded in separating many 2001, pp. 722Ð727. sources with a practical execution time. The experiments [13] R. Mukai, H. Sawada, S. de la Kethulle de Ryhove, shown here were for up to four sources with linearly ar- S. Araki, and Shoji Makino, “Array geometry arrangement ranged sensors. We have also separated six sources with a for frequency domain blind source separation,” in Proc. planar array of eight sensors [13]. IWAENC2003, Sept. 2003.