<<

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 4, JULY 2006

New Insights Into the Noise Reduction Wiener Filter

Jingdong Chen, Member, IEEE, Jacob Benesty, Senior Member, IEEE, Yiteng (Arden) Huang, Member, IEEE, and Simon Doclo, Member, IEEE

Abstract—The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

Index Terms—Microphone arrays, noise reduction, speech distortion, Wiener filter.

Manuscript received December 20, 2004; revised September 2, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Li Deng. J. Chen and Y. Huang are with Bell Labs, Lucent Technologies, Murray Hill, NJ 07974 USA (e-mail: [email protected]; [email protected]). J. Benesty is with the Université du Québec, INRS-EMT, Montréal, QC H5A 1K6, Canada (e-mail: [email protected]). S. Doclo is with the Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, Leuven 3001, Belgium (e-mail: [email protected]). Digital Object Identifier 10.1109/TSA.2005.860851

I. INTRODUCTION

SINCE we are living in a natural environment where noise is inevitable and ubiquitous, speech signals are generally immersed in acoustic ambient noise and can seldom be recorded in pure form. Therefore, it is essential for speech processing and communication systems to apply effective noise reduction/speech enhancement techniques in order to extract the desired speech from its corrupted observations.

Noise reduction techniques have a broad range of applications, from hearing aids to cellular phones, voice-controlled systems, multiparty teleconferencing, and automatic speech recognition (ASR) systems. The choice between using and not using a noise reduction technique may have a significant impact on the functioning of these systems. In multiparty conferencing, for example, the noise picked up by the microphone at each point of the conference combines additively at the network bridge with the noise from all other points. The loudspeaker at each location of the conference therefore reproduces the combined sum of the noise processes from all other locations. Clearly, this problem can be extremely serious if the number of conferees is large, and without noise reduction, communication is almost impossible in this context.

Noise reduction is a very challenging and complex problem due to several reasons. First of all, the nature and the characteristics of the noise signal change significantly from application to application, and moreover vary in time. It is therefore very difficult—if not impossible—to develop a versatile algorithm that works in diversified environments. Secondly, the objective of a noise reduction system is heavily dependent on the specific context and application. In some scenarios, for example, we want to increase the intelligibility or improve the overall speech perception quality, while in other scenarios, we expect to ameliorate the accuracy of an ASR system, or simply reduce the listeners' fatigue. It is very hard to satisfy all objectives at the same time. In addition, the complex characteristics of speech and the broad spectrum of constraints make the problem even more complicated.

Research on noise reduction/speech enhancement can be traced back to 40 years ago with two patents by Schroeder [1], [2], where an analog implementation of the spectral magnitude subtraction method was described. Since then it has become an area of active research. Over the past several decades, researchers and engineers have approached this challenging problem by exploiting different facets of the properties of the speech and noise signals. Some good reviews of such efforts can be found in [3]–[7]. Principally, the solutions to the problem can be classified from the following points of view.
• The number of channels available for enhancement; i.e., single-channel and multichannel techniques.
• How the noise is mixed to the speech; i.e., additive noise, multiplicative noise, and convolutional noise.
• Statistical relationship between the noise and speech; i.e., uncorrelated or even independent noise, and correlated noise (such as echo and reverberation).
• How the processing is carried out; i.e., in the time domain or in the frequency domain.


In general, the more microphones are available, the easier the task of noise reduction. For example, when multiple realizations of the signal can be accessed, beamforming, source separation, or spatio-temporal filtering techniques can be applied to extract the desired speech signal or to attenuate the unwanted noise [8]–[13]. If we have two microphones, where the first microphone picks up the noisy signal, and the second microphone is able to measure the noise field, we can use the second microphone signal as a noise reference and eliminate the noise in the first microphone by means of adaptive noise cancellation. However, in most situations, such as mobile communications, only one microphone is available. In this case, noise reduction techniques need to rely on assumptions about the speech and noise signals, or need to exploit aspects of speech perception, speech production, or a speech model. A common assumption is that the noise is additive and slowly varying, so that the noise characteristics estimated in the absence of speech can be used subsequently in the presence of speech. If in reality this premise does not hold, or only partially holds, the system will either have less noise reduction, or introduce more speech distortion.

Even with the limitations outlined above, single-channel noise reduction has attracted a tremendous amount of research attention because of its wide range of applications and relatively low cost. A variety of approaches have been developed, including the Wiener filter [3], [14]–[19], spectral or cepstral restoration [17], [20]–[27], signal subspace [28]–[35], parametric-model-based methods [36]–[38], and statistical-model-based methods [5], [39]–[46]. Most of these were developed independently of each other, and generally their noise reduction performance was evaluated by assessing the improvement of signal-to-noise ratio (SNR), subjective speech quality, or ASR performance (when the ASR system is trained in clean conditions and additive noise is the only distortion source). Almost with no exception, these algorithms achieve noise reduction by introducing some distortion to the speech signal. Some algorithms, such as the subspace method, are even explicitly formulated based on the tradeoff between noise reduction and speech distortion. However, so far, few efforts have been devoted to analyzing such a tradeoff behavior even though it is a very important issue. In this paper, we attempt to provide an analysis about the compromise between noise reduction and speech distortion. On one hand, such a study may offer us some insight into the range of existing algorithms that can be employed in practical noisy environments. On the other hand, a good understanding may help us to find new algorithms that can work more effectively than the existing ones.

Since there are so many algorithms in the literature, it is extremely difficult—if not impossible—to find a universal analytical tool that can be applied to any algorithm. In this paper, we choose the Wiener filter as the basis since it is one of the most fundamental approaches, and many algorithms are closely connected to this technique. For example, the minimum-mean-square-error (MMSE) estimator presented in [21], which belongs to the category of spectral restoration, converges to the Wiener filter at a high SNR. In addition, it is widely known that the Kalman filter is tightly related to the Wiener filter.

Starting from optimal Wiener filtering theory, we introduce a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated. We then show that for the single-channel Wiener filter, the amount of noise reduction is in general proportional to the amount of speech degradation, implying that when the noise reduction is maximized, the speech distortion is maximized as well.

Depending on the nature of the application, some practical noise-reduction systems require very high-quality speech, but can tolerate a certain amount of residual noise, whereas other systems require the speech signal to be as clean as possible, but may allow some degree of speech distortion. Therefore, it is necessary that we have some management scheme to control the compromise between noise reduction and speech distortion in the context of Wiener filtering. To this end, we discuss three approaches. The first approach leads to a suboptimal filter where a parameter is introduced to control the tradeoff between speech distortion and noise reduction. The second approach leads to the well-known parametric-model-based noise reduction technique, where an AR model is exploited to achieve noise reduction while maintaining a low level of speech distortion. The third approach pertains to a multichannel approach where spatio-temporal filtering techniques are employed to obtain noise reduction with less or even no speech distortion.

II. ESTIMATION OF THE CLEAN SPEECH SAMPLES

We consider a zero-mean clean speech signal x(k) contaminated by a zero-mean noise process v(k) (white or colored, but uncorrelated with x(k)), so that the noisy speech signal at the discrete time sample k is

    y(k) = x(k) + v(k).                                   (1)

Define the error signal between the clean speech sample at time k and its estimate,

    e_x(k) = x(k) − x̂(k) = x(k) − h^T y(k),              (2)

where superscript T denotes transpose of a vector or a matrix,

    h = [h_0  h_1  ⋯  h_{L−1}]^T

is an FIR filter of length L, and

    y(k) = [y(k)  y(k−1)  ⋯  y(k−L+1)]^T

is a vector containing the L most recent samples of the observation signal y(k). We now can write the mean-square error (MSE) criterion,

    J_x(h) = E{e_x²(k)},                                  (3)

where E{·} denotes mathematical expectation. The optimal estimate x̂(k) of the clean speech sample x(k) tends to contain less noise than the observation sample y(k), and the optimal filter that forms x̂(k) is the Wiener filter, which is obtained as follows:

    h_o = arg min_h J_x(h).                               (4)

Consider the particular filter

    h_1 = u = [1  0  ⋯  0]^T,

whose first coefficient is equal to one and whose remaining L − 1 coefficients all consist of zeros. This means that the observed signal y(k) will pass this filter unaltered (no noise reduction), thus the corresponding MSE is

    J_x(h_1) = E{v²(k)} = σ_v².                           (5)

In principle, for the optimal filter h_o, we should have

    J_x(h_o) ≤ J_x(h_1).                                  (6)

In other words, the Wiener filter will be able to reduce the level of noise in the noisy speech signal y(k). From (4), we easily find the Wiener–Hopf equation

    R_yy h_o = r_yx,                                      (7)

where

    R_yy = E{y(k) y^T(k)}                                 (8)

is the correlation matrix of the observed signal and

    r_yx = E{y(k) x(k)}                                   (9)

is the cross-correlation vector between the noisy and clean speech signals. However, x(k) is unobservable; as a result, an estimation of r_yx may seem difficult to obtain. But

    r_yx = E{y(k)[y(k) − v(k)]} = r_yy − r_vv.            (10)

Now h_o depends on the correlation vectors r_yy and r_vv. The vector r_yy (which is also the first column of R_yy) can be easily estimated during speech-and-noise periods, while r_vv can be estimated during noise-only intervals assuming that the statistics of the noise do not change much with time. Using (10) and the fact that R_yy = R_xx + R_vv, we obtain the optimal filter

    h_o = R_yy^{−1}(r_yy − r_vv) = (R_xx + R_vv)^{−1} R_xx u,   (11)

where

    SNR = σ_x² / σ_v²                                     (12)

is the signal-to-noise ratio, I is the L × L identity matrix, and

    R̃_xx = R_xx / σ_x²,                                   (13)
    R̃_vv = R_vv / σ_v²                                    (14)

are the correlation matrices normalized by the speech and noise powers, so that the optimal filter can also be written as h_o = [I + (SNR · R̃_xx)^{−1} R̃_vv]^{−1} u. The minimum MSE (MMSE) is

    J_x(h_o) = σ_v² − r_vv^T R_yy^{−1} r_vv.              (15)

We see clearly from the previous expression that J_x(h_o) ≤ σ_v²; therefore, noise reduction is possible. The normalized MMSE is

    J̃_x(h_o) = J_x(h_o) / J_x(h_1) = J_x(h_o) / σ_v²,     (16)

and 0 ≤ J̃_x(h_o) ≤ 1.

III. ESTIMATION OF THE NOISE SAMPLES

In this section, we will estimate the noise samples from the observations y(k). Define the error signal between the noise sample at time k and its estimate,

    e_v(k) = v(k) − v̂(k) = v(k) − g^T y(k),               (17)

where

    g = [g_0  g_1  ⋯  g_{L−1}]^T

is an FIR filter of length L. The MSE criterion associated with (17) is

    J_v(g) = E{e_v²(k)}.                                  (18)

The estimation of v(k) in the MMSE sense will tend to attenuate the clean speech. The minimization of (18) leads to the Wiener–Hopf equation

    R_yy g_o = r_yv.                                      (19)

We have

    r_yv = E{y(k) v(k)} = r_vv,                           (20)
    g_o = R_yy^{−1} r_vv.                                 (21)

The MSE for the particular filter g_1 = u (no clean speech reduction) is

    J_v(g_1) = E{x²(k)} = σ_x².                           (22)

Therefore, the MMSE and the normalized MMSE are, respectively,

    J_v(g_o) = σ_v² − r_vv^T R_yy^{−1} r_vv,              (23)
    J̃_v(g_o) = J_v(g_o) / J_v(g_1) = J_v(g_o) / σ_x².     (24)

Since J_v(g_o) ≤ J_v(g_1), the Wiener filter g_o will be able to reduce the level of the clean speech in the signal y(k). As a result, J̃_v(g_o) ≤ 1. In Section IV, we will see that while the normalized MMSE, J̃_x(h_o), of the clean speech estimation plays a key role in noise reduction, the normalized MMSE, J̃_v(g_o), of the noise process estimation plays a key role in speech distortion.
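As a quick illustration of Sections II and III (ours, not part of the original paper), the following Python sketch estimates the two filters from data under the stated assumptions: stationary signals, an available noise-only segment for r_vv, and biased sample autocorrelations. The function names and the toy interface are our own choices.

import numpy as np

def sample_correlation_vector(s, L):
    # Biased estimates of r(0), ..., r(L-1) from one signal segment.
    n = len(s)
    return np.array([np.dot(s[:n - i], s[i:]) / n for i in range(L)])

def wiener_filters(y, v_ref, L):
    # y     : noisy speech samples (speech-and-noise period), gives r_yy and R_yy
    # v_ref : noise-only samples, gives r_vv
    # L     : filter length
    r_yy = sample_correlation_vector(y, L)
    r_vv = sample_correlation_vector(v_ref, L)
    R_yy = np.array([[r_yy[abs(i - j)] for j in range(L)] for i in range(L)])  # Toeplitz
    h_o = np.linalg.solve(R_yy, r_yy - r_vv)   # clean-speech filter, cf. (11)
    u = np.zeros(L)
    u[0] = 1.0
    g_o = u - h_o                              # complementary noise-estimation filter, cf. (21)
    return h_o, g_o

# Applying the filter gives the clean-speech estimate x_hat(k) = h_o^T y(k), e.g.,
# x_hat = np.convolve(y, h_o)[:len(y)]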

IV. IMPORTANT RELATIONSHIPS BETWEEN NOISE REDUCTION AND SPEECH DISTORTION

Obviously, there are some important relationships between the estimation of the clean speech and noise samples. From (11) and (19), we get a relation between the two optimal filters,

    h_o + g_o = u.                                        (25)

In fact, minimizing J_x(h) or J_v(u − h) with respect to h is equivalent. In the same manner, minimizing J_v(g) or J_x(u − g) with respect to g is the same thing. At the optimum, we have (26). From (15) and (23), we see that the two MMSEs are equal,

    J_x(h_o) = J_v(g_o).                                  (27)

However, the normalized MMSEs are not, in general. Indeed, we have a relation between the two,

    J̃_x(h_o) = SNR · J̃_v(g_o).                            (28)

So the only situation where the two normalized MMSEs are equal is when the SNR is equal to 1. For SNR > 1, J̃_x(h_o) > J̃_v(g_o), and for SNR < 1, J̃_x(h_o) < J̃_v(g_o). It can also easily be verified that (29) holds.

The optimal estimation of the clean speech, in the Wiener sense, is in fact what we call noise reduction,

    x̂(k) = h_o^T y(k),                                    (30)

or equivalently, if the noise is estimated first,

    v̂(k) = g_o^T y(k),                                    (31)

we can use this estimate to reduce the noise from the observed signal,

    x̂(k) = y(k) − v̂(k) = (u − g_o)^T y(k).                (32)

The power of the estimated clean speech signal with the optimal Wiener filter is

    E{x̂²(k)} = E{[h_o^T x(k)]²} + E{[h_o^T v(k)]²},       (33)

which is the sum of two terms. The first one is the power of the attenuated clean speech and the second one is the power of the residual noise (always greater than zero). While noise reduction is feasible with the Wiener filter, expression (33) shows that the price to pay for this is also a reduction of the clean speech (by a quantity equal to the difference between the powers of the clean and the attenuated clean speech, and this implies distortion). In other words, the power of the attenuated clean speech signal is, obviously, always smaller than the power of the clean speech itself; this means that parts of the clean speech are attenuated in the process and, as a result, distortion is unavoidable with this approach.

We now define the speech-distortion index due to the optimal filtering operation as

    υ_sd(h_o) = E{[x(k) − h_o^T x(k)]²} / E{x²(k)}.        (34)

Clearly, this index is always between 0 and 1 for the optimal filter. Also, we have (35) and (36). So when υ_sd(h_o) is close to 1, the speech signal is highly distorted, and when υ_sd(h_o) is near 0, the speech signal is lowly distorted. We deduce that for low SNRs, the Wiener filter can have a disastrous effect on the speech signal.

Similarly, we define the noise-reduction factor due to the Wiener filter as

    ξ_nr(h_o) = E{v²(k)} / E{[h_o^T v(k)]²},               (37)

and ξ_nr(h_o) ≥ 1. The greater ξ_nr(h_o) is, the more noise reduction we have. Also, we have (38) and (39). Using (34) and (37), we obtain important relations between the speech-distortion index and the noise-reduction factor, given in (40) and (41). Therefore, for the optimum filter, when the SNR is very large, there is little speech distortion and little noise reduction (which is not really needed in this situation). On the other hand, when the SNR is very small, speech distortion is large as well as noise reduction.
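To make the two performance measures concrete, here is a small illustrative Python sketch (ours) that evaluates them empirically from a clean-speech segment and a noise segment, using the speech-distortion index and noise-reduction factor in the forms reconstructed in (34) and (37); edge effects of the convolution are ignored.

import numpy as np

def speech_distortion_index(h, x):
    # v_sd(h) = E{[x(k) - h^T x(k)]^2} / E{x^2(k)}, cf. (34)
    x_filt = np.convolve(x, h)[:len(x)]
    return np.mean((x - x_filt) ** 2) / np.mean(x ** 2)

def noise_reduction_factor(h, v):
    # xi_nr(h) = E{v^2(k)} / E{[h^T v(k)]^2}, cf. (37)
    v_filt = np.convolve(v, h)[:len(v)]
    return np.mean(v ** 2) / np.mean(v_filt ** 2)

# A larger xi_nr means more noise reduction; v_sd lies between 0 and 1 and measures
# how much the filter deforms the clean speech (0: no distortion, 1: speech removed).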

Another way to examine the noise-reduction performance is to inspect the SNR improvement. Let us define the a posteriori SNR, after noise reduction with the Wiener filter, as

    SNR_o(h_o) = E{[h_o^T x(k)]²} / E{[h_o^T v(k)]²}.      (42)

It can be shown that the a posteriori SNR and the a priori SNR satisfy SNR_o(h_o) ≥ SNR (see the Appendix), indicating that the Wiener filter is always able to improve the SNR of the noisy speech signal. Knowing this, we can now bound the two performance measures. As a matter of fact, it follows from (42) that (43); it can then be easily shown that the lower bound (44) holds, and, similarly, we can derive the upper bound (45). Fig. 1 illustrates expressions (44) and (45).

Fig. 1. Illustration of the areas where the speech-distortion index and the noise-reduction factor take their values as a function of the SNR. The former can take any value above the solid line, while the latter can take any value under the dotted line.

We now introduce another index for noise reduction, defined in (46). The closer this index is to 1, the more noise reduction we get. This index will be helpful in Sections V–VII.
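A quick numerical check of the SNR-improvement property can be sketched as follows (our illustration; the AR(1)-like speech correlation model, the filter length, and the variances are assumptions chosen for the example, not values from the paper).

import numpy as np

def a_posteriori_snr(h, R_xx, R_vv):
    # Filtered-speech power over residual-noise power, cf. (42).
    return (h @ R_xx @ h) / (h @ R_vv @ h)

# Toy check with an AR(1)-like speech correlation and white noise.
L = 20
rho, sigma_x2, sigma_v2 = 0.9, 2.0, 1.0        # assumed values, for illustration only
R_xx = sigma_x2 * np.array([[rho ** abs(i - j) for j in range(L)] for i in range(L)])
R_vv = sigma_v2 * np.eye(L)
u = np.zeros(L)
u[0] = 1.0
h_o = np.linalg.solve(R_xx + R_vv, R_xx @ u)   # Wiener filter, cf. (11)
snr_prior = sigma_x2 / sigma_v2                # a priori SNR, cf. (12)
print(snr_prior, a_posteriori_snr(h_o, R_xx, R_vv))  # the second value should not be smaller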

V. PARTICULAR CASE: WHITE NOISE

In this section, we assume that the additive noise is white, so that

    R_vv = σ_v² I.                                         (47)

From (16) and (24), we observe that the two normalized MMSEs are given by (48) and (49), where h_{o,0} and g_{o,0} are the first components of the vectors h_o and g_o, respectively. Clearly, 0 ≤ h_{o,0} ≤ 1 and 0 ≤ g_{o,0} ≤ 1. Hence, the normalized MMSE is completely governed by the first element of the Wiener filter h_o. Now, the speech-distortion index and the noise-reduction factor for the optimal filter can be simplified to (50) and (51), and corresponding bounds follow from (50).

We know from linear prediction theory that [47]

    R_yy [1  −a^T]^T = [E  0  ⋯  0]^T,                     (52)

where a is the forward linear predictor and E is the corresponding error energy. Replacing the previous equation in (11), we obtain

    h_o = u − (σ_v² / E) [1  −a^T]^T,                      (53)

where the first coefficient is

    h_{o,0} = 1 − σ_v² / E.                                (54)

Equation (53) shows how the Wiener filter is related to the forward predictor of the observed signal y(k). This expression also gives a hint on how to choose the length of the optimal filter: it should be equal to the length of the predictor required to have a good prediction of the observed signal y(k). Equation (54) contains some very interesting information. Indeed, if the clean speech signal is completely predictable, the prediction-error energy E approaches σ_v², so that h_{o,0} approaches 0 and almost all of the noise can be removed. On the other hand, if x(k) is not predictable, E = σ_x² + σ_v² and h_{o,0} = SNR/(1 + SNR), so that little noise reduction is obtained. This implies that the Wiener filter is more efficient at reducing the level of noise for predictable signals than for unpredictable ones.
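The link between the white-noise Wiener filter and forward linear prediction can be illustrated with the following sketch (ours); it solves the order-(L − 1) normal equations directly instead of using the Levinson–Durbin recursion, and it assumes the autocorrelation of the noisy signal and the white-noise power are given.

import numpy as np

def wiener_from_predictor(r_yy, sigma_v2):
    # White-noise case: build h_o from the forward predictor of y, cf. (52)-(53).
    # r_yy     : autocorrelation of the noisy signal, r_yy[0], ..., r_yy[L-1]
    # sigma_v2 : white-noise power
    L = len(r_yy)
    R = np.array([[r_yy[abs(i - j)] for j in range(L - 1)] for i in range(L - 1)])
    a = np.linalg.solve(R, r_yy[1:])           # forward predictor of order L - 1
    E = r_yy[0] - a @ r_yy[1:]                 # prediction-error energy
    u = np.zeros(L)
    u[0] = 1.0
    h_o = u - (sigma_v2 / E) * np.concatenate(([1.0], -a))
    return h_o, E

# A well-predictable y (E close to sigma_v2) yields a small first coefficient and strong
# noise reduction; an unpredictable y (E = sigma_x2 + sigma_v2) yields a first coefficient
# of SNR/(1 + SNR) and little noise reduction.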

VI. BETTER WAYS TO MANAGE NOISE REDUCTION AND SPEECH DISTORTION

For a noise-reduction/speech-enhancement system, we always expect that it can achieve maximal noise reduction without much speech distortion. From the previous sections, however, it follows that while noise reduction is maximized with the optimal Wiener filter, speech distortion is also maximized. One may ask the legitimate question: are there better ways to control the tradeoff between the conflicting requirements of noise reduction and speech distortion? Examining (34), one can see that to control the speech distortion, we need to minimize E{[x(k) − h^T x(k)]²}. This can be achieved in different ways. For example, a speech signal can be modeled as an AR process. If the AR coefficients are known a priori or can be estimated from the noisy speech, these coefficients can be exploited to minimize E{[x(k) − h^T x(k)]²}, while simultaneously achieving a reasonable level of noise attenuation. This is often referred to as the parametric-model-based technique [36], [37]. We will not discuss the details of this technique here. Instead, in what follows we will discuss two other approaches to manage noise reduction and speech distortion in a better way.

A. A Suboptimal Filter

Consider the suboptimal filter

    h_s = u − β g_o = (1 − β) u + β h_o,                   (55)

where β is a real number. The MSE of the clean speech estimation corresponding to h_s is

    J_x(h_s) = E{[x(k) − h_s^T y(k)]²},                    (56)

and, obviously, J_x(h_s) ≥ J_x(h_o) for all β; we have equality for β = 1. In order to have noise reduction, β must be chosen in such a way that J_x(h_s) ≤ J_x(h_1) = σ_v²; the resulting condition on β is given in (57). We can also check that (58) holds. Let

    x̂_s(k) = h_s^T y(k)                                    (59)

denote the estimation of the clean speech at time k with respect to h_s. The power of x̂_s(k) is given by (60). The speech-distortion index corresponding to the filter h_s is

    υ_sd(h_s) = β² υ_sd(h_o).                              (61)

The previous expression shows that the ratio of the speech-distortion indices corresponding to the two filters h_s and h_o depends on β only.

In order to have less distortion with the suboptimal filter h_s than with the Wiener filter h_o, we must find β in such a way that (62) holds; hence, the condition on β should be as in (63). Finally, the suboptimal filter h_s can reduce the level of noise of the observed signal, but with less distortion than the Wiener filter, if β is taken as in (64). For the extreme cases β = 0 and β = 1, we obtain, respectively, h_s = u, no noise reduction at all but no additional distortion added, and h_s = h_o, maximum noise reduction with maximum speech distortion.

Since (65) holds, it follows immediately that the speech-distortion index and the noise-reduction factor due to h_s are given by (66) and (67). From (61), one can see that υ_sd(h_s)/υ_sd(h_o) = β², which is a function of β only. Unlike the speech-distortion index, the noise-reduction factor ξ_nr(h_s) does not only depend on β, but on the characteristics of both the speech and noise signals as well. However, using (56) and (15), we find the relation (68).

Fig. 2 plots ξ_nr(h_s)/ξ_nr(h_o) and υ_sd(h_s)/υ_sd(h_o), both as a function of β. We can see that when β = 0.7, the suboptimal filter achieves 90% of the noise reduction of the Wiener filter, while the speech distortion is only 49% of that of the Wiener filter. In real applications, we may want the system to achieve maximal noise reduction, while keeping the speech distortion as low as possible. If we define a cost function to measure the compromise between the noise reduction and the speech distortion as in (69), it is trivial to see that the β that maximizes it is given by (70). In this case, the suboptimal filter achieves 75% of the noise reduction of the Wiener filter, while the speech distortion is only 25% of that of the Wiener filter.

The parameter β obtained in this way, which is optimal in terms of the tradeoff between noise reduction and speech distortion, can be used as a guidance in designing a practical noise reduction system for applications like ASR.

Another way to obtain an optimal β is to define a discriminative cost function between ξ_nr(h_s) and υ_sd(h_s), i.e., (71), where the weighting constant is application dependent and determines the relative importance between the improvement in speech distortion and the degradation in noise reduction (e.g., in hearing aid applications we may tune this parameter using subjective intelligibility tests). In contrast to υ_sd(h_s)/υ_sd(h_o), which is a function of β only, the cost function (71) does not only depend on β, but on the characteristics of the speech and noise signals as well. Fig. 3 plots this cost function as a function of β in different SNR conditions, where both the signal and the noise are assumed to be Gaussian random processes. This figure shows that, for the same β, the cost function decreases with SNR, indicating that the higher the SNR, the better the suboptimal filter is able to control the compromise between noise reduction and speech distortion.

In order for the suboptimal filter to be able to control the tradeoff between noise reduction and speech distortion, β should be chosen in such a way that the cost function remains positive, which yields a condition on β. From Fig. 3, we notice that the maximum of the cost function is always positive if the SNR is above 1 (0 dB). When the SNR drops below 1 (0 dB), however, it may become negative, indicating that the suboptimal filter cannot work reliably in very noisy conditions (SNR below 0 dB).

Fig. 3 also shows the β that maximizes the cost function in different SNR situations. It is interesting to see that this β approaches 1 at very low SNR, which means that the suboptimal filter converges to the Wiener filter in very low SNR conditions. As we increase the SNR, the optimal β begins to decrease; it goes to 0 when the SNR is increased to 1000 (30 dB). This is understandable: when the SNR is very high, the speech signal is already very clean, so filtering is not really needed. By searching the β that maximizes (71), the system can adaptively achieve the best tradeoff between noise reduction and speech distortion according to the characteristics of both the speech and noise signals.

Fig. 2. ξ_nr(h_s)/ξ_nr(h_o) (dashed line) and υ_sd(h_s)/υ_sd(h_o) (solid line), both as a function of β.

Fig. 3. Illustration of the cost function (71) in different SNR conditions, where both the signal and the noise are assumed to be Gaussian random processes and the weighting constant is set to 0.7. The marker on each curve represents the maximum of the cost function in the corresponding condition.
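Under the reading of (55) used above (h_s = u − β g_o, which reproduces the stated limiting cases and the β² distortion ratio), the suboptimal filter is a one-line modification of the Wiener solution; the following sketch (ours) shows it together with the β = 0.7 setting discussed in the text.

import numpy as np

def suboptimal_filter(h_o, beta):
    # h_s = u - beta * g_o = (1 - beta) * u + beta * h_o, cf. (55).
    # beta = 0: identity filter (no noise reduction, no added distortion).
    # beta = 1: the Wiener filter (maximum noise reduction and maximum distortion).
    # The speech-distortion index scales as beta**2 relative to the Wiener filter, cf. (61).
    u = np.zeros_like(h_o)
    u[0] = 1.0
    return (1.0 - beta) * u + beta * h_o

# Example: beta = 0.7 keeps most of the noise reduction while roughly halving the
# speech-distortion index (0.7**2 = 0.49) compared to the Wiener filter:
# h_s = suboptimal_filter(h_o, 0.7)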

B. Noise Reduction With Multiple Microphones

In more and more applications, multiple microphone signals are available. Therefore, it is interesting to investigate deeply the multichannel case, where various techniques such as beamforming (nonadaptive and adaptive) and spatio-temporal filtering can be used to achieve noise reduction [13], [50]–[52]. One of the first papers to do so is a paper written by Doclo and Moonen [13], where the optimal filter is derived as well as a general class of estimators. The authors also show how the generalized singular value decomposition can be used in this spatio-temporal technique. In this section, we take a slightly different approach. We will see, in particular, that we can reduce the level of noise without distorting the speech signal.

We suppose that we have a linear array consisting of N microphones whose outputs are denoted as y_n(k), n = 0, 1, …, N − 1. Without loss of generality, we select microphone 0 as the reference point and, to simplify the analysis, we consider the following propagation model:

    y_n(k) = α_n s(k − t − τ_n) + v_n(k),                  (72)

where α_n is the attenuation factor (with α_0 = 1), t is the propagation time from the unknown speech source s(k) to microphone 0, v_n(k) is an additive noise signal at the nth microphone, and τ_n is the relative delay between microphones 0 and n, with τ_0 = 0.

In the following, we assume that the relative delays τ_n, n = 1, …, N − 1, are known or can easily be estimated. So our first step is the design of a simple delay-and-sum beamformer, which spatially aligns the microphone signals to the direction of the speech source. From now on, we will work on the time-aligned signals defined in (73).

A straightforward approach for noise reduction is to average these signals, as in (74). If the noise signals are added incoherently, the output SNR will, in principle, increase [48]. We can further reduce the noise by passing the averaged signal through a Wiener filter as was shown in the previous sections. This approach has, however, two drawbacks. The first one is that, since the attenuation factors differ from one in general, the output SNR will not improve that much; the second one, as we know already, is the speech distortion introduced by the optimal filter.

Let us now define the error signal, for the nth microphone, between the clean speech sample and its estimate as in (75), where the h_n are filters of length L. Since the speech components of the aligned signals are scaled copies of each other, (75) becomes (76). Expression (76) is the difference between two error signals; the first term represents signal distortion and the second term represents the residual noise. The MSE corresponding to the residual noise, with the nth microphone as the reference signal, is given in (77).

Usually, in the single-channel case, the minimization of the MSE corresponding to the residual noise is done while keeping the signal distortion below a threshold [28]. With no distortion, the optimal filter obtained from this optimization is the identity filter, hence there is not any noise reduction either. The advantage of multiple microphones is that, actually, we can minimize (77) with the constraint that the signal-distortion term is identically zero (no speech distortion at all). Therefore, our optimization problem is (78). By using a Lagrange multiplier, we easily find the optimal solution (79), where we assumed that the noise signals are not perfectly coherent so that the noise correlation matrix is not singular. This result is very similar to the linearly constrained minimum variance (LCMV) beamformer [51], [52]; but in (79) additional attenuation factors have been included. Note also that this formula has been derived from a different point of view, as a multichannel extension of a single-channel MMSE noise-reduction algorithm.

Given the optimal filter, we can write the MMSE for the nth microphone as (80). Since we have N microphones, we have N MMSEs as well. The best MMSE from a noise reduction point of view is the smallest one, which is, according to (80), the one associated with the microphone signal with the smallest attenuation factor. The attenuation factors can be easily determined, if the power of the noise signals is known, by using the formula (81).

For the particular case where the noise is spatio-temporally white with a power equal to σ_v², the MMSE and the normalized MMSE for the nth microphone are, respectively, (82) and (83). As in the single-channel case, we can define for the nth microphone the speech-distortion index as in (84) and the noise-reduction factors as in (85) and (86).

With the optimal filter given in (79), for the particular case where the noise is spatio-temporally white with a power equal to σ_v², these quantities can be easily evaluated in closed form. It can be seen that when the number of microphones goes to infinity, the noise-reduction factor and the noise-reduction index approach, respectively, infinity and 1, while the speech-distortion index stays at zero, which indicates that the noise can be completely removed with no signal distortion at all.
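The constrained minimization in (78)-(79) has the classical Lagrange-multiplier form "minimize the residual noise power subject to a distortionless constraint." The sketch below (ours) shows the generic solution of such a problem; the exact constraint matrix built from the attenuation factors of the propagation model is not reproduced here and is only represented abstractly.

import numpy as np

def distortionless_noise_filter(R_vv, C, f):
    # Minimize h^T R_vv h subject to C^T h = f (Lagrange-multiplier solution).
    # (79) is of this form; the columns of C and the response vector f encode the
    # no-speech-distortion constraint built from the propagation/attenuation model.
    R_inv_C = np.linalg.solve(R_vv, C)
    return R_inv_C @ np.linalg.solve(C.T @ R_inv_C, f)

# With spatio-temporally white noise, R_vv = sigma_v2 * I and the solution reduces to
# h = C (C^T C)^{-1} f, i.e., a projection onto the constraint set.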

VII. SIMULATION EXPERIMENTS

By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, we have analytically examined the performance behavior of the Wiener-filter-based noise reduction technique. It is shown that the Wiener filter achieves noise reduction by distorting the speech signal: the more the noise is reduced, the more the speech is distorted. We also proposed several approaches to better manage the tradeoff between noise reduction and speech distortion. To further verify the analysis, and to assess the noise-reduction-and-speech-distortion management schemes, we implemented a time-domain Wiener-filter system.

The sampling rate is 8 kHz. The noise signal is estimated in the time-frequency domain using a sequential algorithm presented in [6], [7]. Briefly, this algorithm obtains an estimate of the noise using the overlap-add technique on a frame-by-frame basis. The noisy speech signal is segmented into frames with a frame width of 8 ms and an overlapping factor of 75%. Each frame is then transformed via a DFT into a block of spectral samples; successive blocks of spectral samples form a two-dimensional time-frequency matrix, indexed by the frame number (the time dimension) and the angular frequency. An estimate of the magnitude of the noise spectrum is then formulated as shown in (87), where α_a and α_d are the "attack" and "decay" coefficients, respectively. Meanwhile, to reduce its temporal fluctuation, the magnitude of the noisy speech spectrum is smoothed according to the recursion (88), where again α_a is the "attack" coefficient and α_d the "decay" coefficient. To further reduce the spectral fluctuation, both quantities are averaged across the neighboring frequency bins around each frequency. Finally, an estimate of the noise spectrum is formed from these smoothed magnitudes, and the time-domain noise signal is obtained through the IDFT and the overlap-add technique. See [6], [7] for a more detailed description of this noise-estimation scheme.

Fig. 4. Noise and its estimate. The first trace (from the top) shows the waveform of a speech signal corrupted by a car noise where SNR = 10 (10 dB). The second and third traces plot the waveform and spectrogram of the noise signal. The fourth and fifth traces display the waveform and spectrogram of the noise estimate.

Fig. 4 shows a speech signal corrupted by a car noise (10 dB), the waveform and the spectrogram of the car noise that is added to the speech, and the waveform and spectrogram of the noise estimate. It can be seen that during the absence of speech, the estimate is a good approximation of the noise signal. It is also noticed from its spectrogram that the noise estimate contains some minor speech components during the presence of speech. Our listening test, however, shows that the residual speech in the noise estimate is almost inaudible. An apparent advantage of this noise-estimation technique is that it does not require an explicit voice activity detector. In addition, our experimental investigation reveals that such a scheme is able to capture the noise characteristics in both the presence and absence of speech; therefore, it does not rely on the assumption that the noise characteristics in the presence of speech stay the same as in the absence of speech.
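The exact recursions (87) and (88) are given in [6], [7]; the following Python sketch (ours) shows one plausible attack/decay magnitude tracker of this kind, with hypothetical coefficient values, purely to illustrate the asymmetric-smoothing idea described in the text.

import numpy as np

def track_noise_magnitude(Y_mag, alpha_a=0.95, alpha_d=0.995):
    # One plausible attack/decay tracker (hypothetical coefficients, illustration only).
    # Y_mag : array of |Y(m, w)| magnitude spectra, one row per frame m.
    # The estimate follows the noisy magnitude quickly when it falls back toward the noise
    # floor and only slowly when it rises, so that speech onsets are largely ignored.
    V_hat = Y_mag[0].astype(float).copy()
    out = [V_hat.copy()]
    for frame in Y_mag[1:]:
        rising = frame > V_hat
        V_hat = np.where(rising,
                         alpha_d * V_hat + (1.0 - alpha_d) * frame,   # slow upward tracking
                         alpha_a * V_hat + (1.0 - alpha_a) * frame)   # faster downward tracking
        out.append(V_hat.copy())
    return np.array(out)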

Fig. 5. Noise-reduction factor and signal-distortion index, both as a function of the filter length: (a) noise reduction and (b) signal distortion. The source is a signal recorded in a NYSE room; the background noise is a computer-generated white Gaussian random process; and SNR = 10 (10 dB).

Fig. 6. Noise-reduction factor and signal-distortion index, both as a function of the filter length: (a) noise reduction and (b) speech distortion. The source signal is an /i:/ sound from a female speaker; the background noise is a computer-generated white Gaussian process; and SNR = 10 (10 dB).

Based on the implemented system, we evaluate the Wiener filter for noise reduction. The first experiment investigates the influence of the filter length on the noise reduction performance. Instead of using the estimated noise, here we assume that the noise signal is known a priori. Therefore, this experiment demonstrates the upper limit of the performance of the Wiener filter. We consider two cases. In the first one, both the source signal and the background noise are random processes in which the current value of the signal cannot be predicted from its past samples. The source signal is a noise signal recorded from a New York Stock Exchange (NYSE) room. This signal consists of sound from various sources such as speakers, telephone rings, electric fans, etc. The background noise is a computer-generated Gaussian random process. The results for this case are graphically portrayed in Fig. 5. It can be seen that both the noise-reduction factor and the speech-distortion index increase linearly with the filter length. Therefore, a longer filter should be applied for more noise reduction. However, the more the noise is attenuated, the more the source signal is deformed, as shown in Fig. 5.

In the second case, we test the Wiener filter for noise reduction in the context of speech signals. It is known that a speech signal can be modeled as an AR process, where its current value can be predicted from its past samples. To simplify the situation for the ease of analysis, the source signal used here is an /i:/ sound recorded from a female speaker. Similarly as in the previous case, the background noise is a computer-generated white Gaussian random process. The results are plotted in Fig. 6. Again, the noise-reduction factor, which quantifies the amount of noise being attenuated, increases monotonically with the filter length; but unlike the previous case, the relationship between the noise reduction and the filter length is not linear. Instead, the curve at first grows quickly as the filter length is increased up to 10, and then continues to grow but at a slower rate. Unlike the noise-reduction factor, the speech-distortion index exhibits a nonmonotonic relationship with the filter length: it first decreases to its minimum, and then increases again as the filter length is increased. The reason, as we have explained in Section V, is that a speech signal can be modeled as an AR process. Particular to this experiment, the /i:/ sound used here can be well modeled with a sixth-order LPC (linear prediction coding) analysis. Therefore, when the filter length is increased to 6, the numerator of (34) is minimized and, as a result, the speech-distortion index reaches its minimum. Continuing to increase the filter length leads to a higher distortion due to more noise reduction.

Fig. 8. Noise reduction in a car noise condition (same speech and noise signals as in Fig. 7) where SNR = 1 (0 dB): (a) noisy speech and its spectrogram and (b) noise reduced speech and its spectrogram.

To further verify this observation, we investigated several other vowels, and found that the curve of the speech-distortion index versus the filter length follows a similar shape, except that the minimum may appear in a slightly different location. Taking into account the sounds other than vowels in speech, which may be less predictable, we find that good performance with the Wiener filter (in terms of the compromise between noise reduction and speech distortion) can be achieved when the filter length is chosen around 20. Figs. 7 and 8 plot, respectively, the outputs of our Wiener filter system for SNR = 10 dB and SNR = 0 dB, where the speech signal is from a female speaker, the background noise is a car noise signal, and the filter length is 20.

Fig. 7. Noise reduction in a car noise condition where SNR = 10 (10 dB): (a) clean speech and its spectrogram; (b) noisy speech and its spectrogram; and (c) noise reduced speech and its spectrogram.

The second experiment tests the noise reduction performance in different SNR conditions. Here the speech signal is recorded from a female speaker as shown in Fig. 7. The computer-generated random Gaussian noise is added to the speech signal to control the SNR. The length of the Wiener filter is set to 20. The results are presented in Fig. 9, where besides the noise-reduction factor and the speech-distortion index, we also plotted the Itakura–Saito (IS) distance, a widely used objective quality measure that performs a comparison of spectral envelopes (AR parameters) between the clean and the processed speech [53]. Studies have shown that the IS measure is highly correlated (0.59) with subjective quality judgements [54]. A recent report reveals that the difference in mean opinion score (MOS) between two processed speech signals would be less than 1.6 if their IS measure is less than 0.5 for various codecs [55]. Many other reported experiments confirmed that two spectra would be perceptually nearly identical if their IS distance is less than 0.1. All this evidence indicates that the IS distance is a reasonably good objective measure of speech quality.

As the SNR decreases, the observation signal becomes more noisy. Therefore, the Wiener filter is expected to have more noise reduction for low SNRs. This is verified by Fig. 9(a), where significant noise reduction is obtained for low SNR conditions. However, more noise reduction would correspond to more speech distortion. This is confirmed by Fig. 9(b) and (d), where both the speech-distortion index and the IS distance increase as the speech becomes more noisy.

Fig. 9. Noise reduction performance as a function of SNR in white Gaussian noise: (a) noise-reduction factor; (b) speech-distortion index; (c) Itakura–Saito distance between the clean and noisy speeches; and (d) Itakura–Saito distance between the clean and noise-reduced speeches.

TABLE I
NOISE REDUCTION PERFORMANCE WITH THE SUBOPTIMAL FILTER, WHERE THE FIRST IS DISTANCE IS BETWEEN THE CLEAN SPEECH AND THE FILTERED VERSION OF THE CLEAN SPEECH, WHICH PURELY MEASURES THE SPEECH DISTORTION DUE TO THE FILTERING EFFECT; THE SECOND IS BETWEEN THE CLEAN AND NOISE-REDUCED SPEECHES; AND THE THIRD IS BETWEEN THE CLEAN AND NOISY SPEECH SIGNALS

Comparing the IS distance before [Fig. 9(c)] and after [Fig. 9(d)] noise reduction, one can see that a significant gain in the IS distance has been achieved, indicating that the Wiener filter is able to reduce noise and improve speech quality (but not necessarily speech intelligibility).

The third experiment is to verify the performance behavior of the suboptimal filter derived in Section VI-A. The experimental conditions are the same as outlined in the previous experiment. The results are presented in Table I, where, for the purpose of comparison, besides the speech-distortion index and the noise-reduction factor, we also show three IS distances: between the clean and filtered speech signals, between the clean and noise-reduced speech signals, and between the clean and noisy signals, respectively.

One can see that the IS distance between the clean and noisy speech signals increases as the SNR drops. The reason for this is apparent: when the SNR decreases, the speech signal becomes more noisy.
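For reference, the Itakura–Saito distance used in this comparison can be computed from two power spectra (e.g., the LPC envelopes of the clean and the processed speech) with the standard formula; the short sketch below (ours) is one way to evaluate it.

import numpy as np

def itakura_saito_distance(psd_ref, psd_test):
    # d_IS = mean( P/P_hat - log(P/P_hat) - 1 ), zero when the two spectra coincide.
    ratio = psd_ref / psd_test
    return float(np.mean(ratio - np.log(ratio) - 1.0))

# psd_ref and psd_test would typically be AR-model spectral envelopes of the clean and
# the processed speech, evaluated on the same frequency grid.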

Fig. 10. Noise-reduction factor and signal-distortion index, both as a function of the number of microphone sensors: (a) noise reduction; (b) speech distortion. The source signal is speech from a female speaker as shown in Fig. 7; the background noise is a computer-generated white Gaussian process; and SNR = 10 (10 dB).

As a result, the difference between the spectral envelope (or AR parameters) of the clean speech and that of the noisy speech tends to be more significant, which leads to a higher IS distance. It is noticed that the IS distance between the clean and noise-reduced speech is much smaller than that between the clean and noisy speech. This significant gain in IS distance indicates that the use of the noise reduction technique is able to mitigate noise and improve speech quality. Comparing the results from both the Wiener and the suboptimal Wiener filters, we can see that a better compromise between noise reduction and speech distortion is accomplished by using the suboptimal filter. For example, when SNR = 10 dB, the suboptimal filter has achieved a noise reduction of 2.0106, which is 82% of that with the Wiener filter; but its speech-distortion index is 0.0006, which is only 54% of that of the Wiener filter, and the corresponding IS distance between the clean and filtered speech is 0.0281, which is only 17% of that of the Wiener filter. From the analysis shown in Section VI-A, we know that the β-dependent ratios are independent of the SNR, which can be easily verified from Table I. However, it is noted that the distortion measured on the filtered clean speech decreases with SNR, which may indicate that the suboptimal filter works more efficiently for higher SNR than for lower SNR conditions.

The last experiment is to investigate the performance of the multichannel optimal filter given in (79). Since the focus of this paper is on the reduction of additive noise, the reverberation effect is not considered here. To simplify the analysis, we assume that we have an equispaced linear array, which consists of ten microphone sensors. The spacing between adjacent microphones is … cm. There is only a single speech source (a speech signal from a female speaker) propagating from the far field to the array with an incident angle (the angle between the wavefront and the line joining the sensors in the linear array) of …. We further assume that all the microphone sensors have the same signal and noise power. The sampling rate is 16 kHz. For the experiment, we choose Microphone 0 as the reference sensor, and synchronize the observation signals according to the time-difference-of-arrival (TDOA) information estimated using the algorithm presented in [56]. We then pass the time-aligned observation signals through the optimal filter given in (79) to extract the desired speech signal. The results for this experiment are graphically portrayed in Fig. 10. It can be seen that the noise-reduction index increases linearly with the number of microphones, while the speech distortion is approximately 0. Comparing Fig. 10 with Fig. 9, one can see that in the 10-dB condition the multichannel optimal filter with 4 sensors achieves a noise reduction similar to the optimal single-channel Wiener filter, but with no speech distortion, which shows the advantage of using multiple microphones.

VIII. CONCLUSION

The problem of speech enhancement has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise-reduction approaches. It is widely known that the Wiener filter achieves noise reduction by deforming the speech signal. However, so far not much has been said on how the Wiener filter really works. In this paper we analyzed the inherent relationship between noise reduction and speech distortion with the Wiener filter. Starting from the speech and noise estimation using the Wiener theory, we introduced a speech-distortion index and two noise-reduction factors, and showed that for the single-channel Wiener filter the amount of noise attenuation is in general proportional to the amount of speech degradation, i.e., more noise reduction incurs more speech distortion.
Depending on the nature of the application, some practical noise-reduction systems may require very high-quality speech but can tolerate a certain amount of noise, while other systems may want the speech as clean as possible, even with some degree of speech distortion. Therefore, it is necessary to have some management schemes to control the contradicting requirements between noise reduction and speech distortion. To do so, we have discussed three approaches. If we know the linear prediction coefficients of the clean speech signal, or they can be estimated from the noisy speech, these coefficients can be employed to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can use a suboptimal filter in which a free parameter is introduced to control the compromise between noise reduction and speech distortion. By setting the free parameter to 0.7, we showed that the suboptimal filter can achieve 90% of the noise reduction compared to the Wiener filter, while the resulting speech distortion is less than half of that of the Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

APPENDIX
RELATIONSHIP BETWEEN THE A PRIORI AND THE A POSTERIORI SNR

Theorem: With the Wiener filter in the context of noise reduction, the a priori SNR given in (12) and the a posteriori SNR defined in (42) satisfy

    SNR_o(h_o) ≥ SNR.                                      (89)

Proof: From their definitions, we know that all three matrices, R_xx, R_vv, and R_yy, are symmetric and positive semi-definite. We further assume that R_vv is positive definite, so its inverse exists. In addition, based on the independence assumption between the speech signal and the noise, we have R_yy = R_xx + R_vv. In case that both R_xx and R_vv are diagonal matrices, or R_xx is a scaled version of R_vv, it can be easily seen that the theorem holds. Here, we consider the more complicated situations where at least one of the matrices R_xx and R_vv is not diagonal. In this case, according to [49], there exists a linear transformation that can simultaneously diagonalize R_xx, R_vv, and R_yy. The process is done as follows: the transformation is defined in (90), where again I is the identity matrix, (91) is the corresponding eigenvalue matrix, and (92) is the eigenvector matrix. Note that the transformation matrix is not necessarily orthogonal, since the matrix being diagonalized is not necessarily symmetric. Then, from the definitions of SNR and SNR_o(h_o), we immediately have (93) and (94), in which the matrices involved are diagonal. If, for the ease of expression, we denote their diagonal elements compactly, then both SNR and SNR_o(h_o) can be rewritten as in (95). Since all the quantities involved are nonnegative numbers, as long as we can show that the inequality (96) holds, then SNR_o(h_o) ≥ SNR. Now we prove this inequality by way of induction.

• Basic Step: For the lowest order, since the quantities involved are nonnegative, it is trivial to show that the inequality holds. Therefore,

the property is true for the basic step, where equality holds when any one of the two corresponding terms is equal to 0 (note that they cannot both be zero at the same time, since the matrix involved is invertible).

• Inductive Step: Assume that the property is true for a given order, i.e., the inequality (96) holds for that order. We must prove that it is also true when the order is increased by one. As a matter of fact, expanding the sums leads to (97). Using the induction hypothesis, and also the fact that the additional terms are nonnegative, we obtain (98), where equality holds when all the terms corresponding to nonzero weights are equal. That completes the proof.

Even though it can improve the SNR, the Wiener filter does not maximize the a posteriori SNR. As a matter of fact, (42) is well known as the generalized Rayleigh quotient. So the filter that maximizes the a posteriori SNR is the eigenvector corresponding to the maximum eigenvalue of the matrix R_vv^{−1} R_xx. However, this filter typically gives rise to large speech distortion.

REFERENCES
[1] M. R. Schroeder, "Apparatus for suppressing noise and distortion in communication signals," U.S. Patent 3 180 936, Apr. 27, 1965.
[2] M. R. Schroeder, "Processing of communication signals to reduce effects of noise," U.S. Patent 3 403 224, Sep. 24, 1968.
[3] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proc. IEEE, vol. 67, no. 12, pp. 1586–1604, Dec. 1979.
[4] J. S. Lim, Speech Enhancement. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[5] Y. Ephraim, "Statistical-model-based speech enhancement systems," Proc. IEEE, vol. 80, no. 10, pp. 1526–1554, Oct. 1992.
[6] E. J. Diethorn, "Subband noise reduction methods for speech enhancement," in Audio Signal Processing for Next-Generation Multimedia Communication Systems, Y. Huang and J. Benesty, Eds. Boston, MA: Kluwer, 2004, pp. 91–115.
[7] J. Chen, Y. Huang, and J. Benesty, "Filtering techniques for noise reduction and speech enhancement," in Adaptive Signal Processing: Applications to Real-World Problems, J. Benesty and Y. Huang, Eds. Berlin, Germany: Springer, 2003, pp. 129–154.
[8] S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1614–1626, Aug. 2001.
[9] S. E. Nordholm, I. Claesson, and N. Grbic, "Performance limits in subband beamforming," IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 193–203, May 2003.
[10] F. Asano, S. Hayamizu, T. Yamada, and S. Nakamura, "Speech enhancement based on the subspace method," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 497–507, Sep. 2000.
[11] F. Jabloun and B. Champagne, "A multi-microphone signal subspace approach for speech enhancement," in Proc. IEEE ICASSP, 2001, pp. 205–208.
[12] M. Brandstein and D. Ward, Eds., Microphone Arrays: Signal Processing Techniques and Applications. Berlin, Germany: Springer, 2001.
[13] S. Doclo and M. Moonen, "GSVD-based optimal filtering for single and multimicrophone speech enhancement," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230–2244, Sep. 2002.
[14] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[15] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113–120, Apr. 1979.
[16] R. J. McAulay and M. L. Malpass, "Speech enhancement using a soft-decision noise suppression filter," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 2, pp. 137–145, Apr. 1980.
[17] P. Vary, "Noise suppression by spectral magnitude estimation—mechanism and theoretical limits," Signal Process., vol. 8, pp. 387–400, Jul. 1985.
[18] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504–512, Jul. 2001.
[19] W. Etter and G. S. Moschytz, "Noise reduction by noise-adaptive spectral magnitude expansion," J. Audio Eng. Soc., vol. 42, pp. 341–349, May 1994.
[20] D. L. Wang and J. S. Lim, "The unimportance of phase in speech enhancement," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-30, no. 4, pp. 679–681, Aug. 1982.
[21] Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109–1121, Dec. 1984.
[22] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443–445, Apr. 1985.
[23] N. Virag, "Single channel speech enhancement based on masking properties of human auditory system," IEEE Trans. Speech Audio Process., vol. 7, no. 2, pp. 126–137, Mar. 1999.
[24] Y. M. Chang and D. O'Shaughnessy, "Speech enhancement based conceptually on auditory evidence," IEEE Trans. Signal Process., vol. 39, no. 9, pp. 1943–1954, Sep. 1991.
[25] T. F. Quatieri and R. B. Dunn, "Speech enhancement based on auditory spectral change," in Proc. IEEE ICASSP, vol. 1, May 2002, pp. 257–260.

Jingdong Chen (M'99) received the B.S. degree in electrical engineering and the M.S. degree in array signal processing from Northwestern Polytechnic University in 1993 and 1995, respectively, and the Ph.D. degree in pattern recognition and intelligence control from the Chinese Academy of Sciences in 1998. His Ph.D. research focused on speech recognition in noisy environments. He studied and proposed several techniques covering speech enhancement and HMM adaptation by signal transformation.

From 1998 to 1999, he was with ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan, where he conducted research on speech synthesis and speech analysis, as well as objective measurements for evaluating speech synthesis. He then joined Griffith University, Brisbane, Australia, as a Research Fellow, where he engaged in research in robust speech recognition, signal processing, and discriminative feature representation. From 2000 to 2001, he was with ATR Spoken Language Translation Research Laboratories, Kyoto, where he conducted research in robust speech recognition and speech enhancement. He joined Bell Laboratories as a Member of Technical Staff in July 2001. His current research interests include adaptive signal processing, speech enhancement, adaptive noise/echo cancellation, microphone array signal processing, signal separation, and source localization. He is a co-editor/co-author of the book Speech Enhancement (Berlin, Germany: Springer-Verlag, 2005).

Dr. Chen is the recipient of a 1998–1999 research grant from the Japan Key Technology Center and the 1996–1998 President's Award from the Chinese Academy of Sciences.

Jacob Benesty (SM'04) was born in Marrakech, Morocco, in 1963. He received the Master's degree in microwaves from Pierre & Marie Curie University, France, in 1987, and the Ph.D. degree in control and signal processing from Orsay University, France, in April 1991.

During his Ph.D. program (from November 1989 to April 1991), he worked on adaptive filters and fast algorithms at the Centre National d'Etudes des Telecommunications (CNET), Paris, France. From January 1994 to July 1995, he worked at Telecom Paris on multichannel adaptive filters and acoustic echo cancellation. From October 1995 to May 2003, he was first a Consultant and then a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. In May 2003, he joined the Université du Québec, INRS-EMT, in Montréal, QC, Canada, as an Associate Professor. His research interests are in acoustic signal processing and multimedia communications. He co-authored the book Advances in Network and Acoustic Echo Cancellation (Berlin, Germany: Springer-Verlag, 2001). He is also a co-editor/co-author of the books Speech Enhancement (Berlin, Germany: Springer-Verlag, 2005), Audio Signal Processing for Next-Generation Multimedia Communication Systems (Boston, MA: Kluwer, 2004), Adaptive Signal Processing: Applications to Real-World Problems (Berlin, Germany: Springer-Verlag, 2003), and Acoustic Signal Processing for Telecommunication (Boston, MA: Kluwer, 2000).

Dr. Benesty received the 2001 Best Paper Award from the IEEE Signal Processing Society. He is a member of the editorial board of the EURASIP Journal on Applied Signal Processing. He was the co-chair of the 1999 International Workshop on Acoustic Echo and Noise Control.

Yiteng (Arden) Huang (S'97–M'01) received the B.S. degree from Tsinghua University in 1994, and the M.S. and Ph.D. degrees from the Georgia Institute of Technology (Georgia Tech), Atlanta, in 1998 and 2001, respectively, all in electrical and computer engineering.

During his doctoral studies from 1998 to 2001, he was a Research Assistant with the Center of Signal and Image Processing, Georgia Tech, and was a Teaching Assistant with the School of Electrical and Computer Engineering, Georgia Tech. In the summers from 1998 to 2000, he worked with Bell Laboratories, Murray Hill, NJ, where he engaged in research on passive acoustic source localization with microphone arrays. Upon graduation, he joined Bell Laboratories as a Member of Technical Staff in March 2001. His current research interests are in multichannel acoustic signal processing and multimedia and wireless communications. He is a co-editor/co-author of the books Audio Signal Processing for Next-Generation Multimedia Communication Systems (Boston, MA: Kluwer, 2004) and Adaptive Signal Processing: Applications to Real-World Problems (Berlin, Germany: Springer-Verlag, 2003).

Dr. Huang was an Associate Editor of the IEEE SIGNAL PROCESSING LETTERS. He received the 2002 Young Author Best Paper Award from the IEEE Signal Processing Society, the 2000–2001 Outstanding Graduate Teaching Assistant Award from the School of Electrical and Computer Engineering, Georgia Tech, the 2000 Outstanding Research Award from the Center of Signal and Image Processing, Georgia Tech, and the 1997–1998 Colonel Oscar P. Cleaver Outstanding Graduate Student Award from the School of Electrical and Computer Engineering, Georgia Tech.

Simon Doclo (S'95–M'03) was born in Wilrijk, Belgium, in 1974. He received the M.Sc. degree in electrical engineering and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Belgium, in 1997 and 2003, respectively.

Currently, he is a Postdoctoral Fellow of the Fund for Scientific Research-Flanders, affiliated with the Electrical Engineering Department of the Katholieke Universiteit Leuven. In 2005, he was a Visiting Postdoctoral Fellow at the Adaptive Systems Laboratory, McMaster University, Hamilton, ON, Canada. His research interests are in microphone array processing for acoustic noise reduction, dereverberation and sound localization, adaptive filtering, speech enhancement, and hearing aid technology. He serves as Guest Editor for the Journal on Applied Signal Processing.

Dr. Doclo received the first prize "KVIV-Studentenprijzen" (with E. De Clippel) for the best M.Sc. engineering thesis in Flanders in 1997, a Best Student Paper Award at the International Workshop on Acoustic Echo and Noise Control in 2001, and the EURASIP Signal Processing Best Paper Award 2003 (with M. Moonen). He was Secretary of the IEEE Benelux Signal Processing Chapter (1998–2002).