Robust Signal-To-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis

Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis Chanwoo Kim, Richard M. Stern Department of Electrical and Computer Engineering and Language Technologies Institute Carnegie Mellon University, Pittsburgh, PA 15213 {chanwook, rms}@cs.cmu.edu speech and background noise are independent, (2) clean Abstract speech follows a gamma distribution with a fixed shaping In this paper, we introduce a new algorithm for estimating the parameter, and (3) the background noise has a Gaussian signal-to-noise ratio (SNR) of speech signals, called WADA- distribution. Based on these assumptions, it can be seen that if SNR (Waveform Amplitude Distribution Analysis). In this we model noise-corrupted speech at an unknown SNR using algorithm we assume that the amplitude distribution of clean the Gamma distribution, the value of the shaping parameter speech can be approximated by the Gamma distribution with obtained using maximum likelihood (ML) estimation depends a shaping parameter of 0.4, and that an additive noise signal is uniquely on the SNR. Gaussian. Based on this assumption, we can estimate the SNR While we assume that the background noise can be by examining the amplitude distribution of the noise- assumed to be Gaussian, we will demonstrate that this corrupted speech. We evaluate the performance of the algorithm still provides better results than the NIST STNR WADA-SNR algorithm on databases corrupted by white algorithm, even in the presence of other types of maskers such noise, background music, and interfering speech. The as background music or interfering speech, where the WADA-SNR algorithm shows significantly less bias and less corrupting signal is clearly not Gaussian. variability with respect to the type of noise compared to the The organization of this paper is as follows: in Sec. 2 we standard NIST STNR algorithm. In addition, the algorithm is discuss the assumptions about clean speech and additive noise. quite computationally efficient. In Sec. 3 we describe how the SNR measurement can be Index Terms: SNR estimation, Gamma distribution, obtained from the amplitude distribution of the input signal. Gaussian distribution Section 4 contains experimental results that compare the accuracy of WADA-SNR to the standard NIST STNR algorithm. 1. Introduction The estimation of signal-to-noise ratios (SNRs) has been 2. Characterization of clean speech and extensively investigated for decades and it is still an active additive noise field of research (e.g. [1-7]). Reliable SNR estimation can improve algorithms for speech enhancement [1][2], speech It is widely known that the symmetric gamma distribution detection, and speech recognition [3], since knowledge of is a good approximation to the amplitude distribution of a SNR makes it easier to compensate for the effects of noise. large speech corpus (e.g. [8][9]). Specifically, the probability Techniques for estimating SNR can be classified into density function, f (x) of clean speech can be represented by several categories. One of the approaches is based on x the following equation [8-11]: distinguishing the spectra of noise and speech. Noise spectrum estimation (e.g. [3]) or spectral subtraction techniques usually belong to this category. Another approach f (x | ) x | x | x 1 exp( | x |) (1) x x x x is based on measurement of the energy. The widely used 2 ( x ) NIST STNR (Signal-To-Noise-Ratio) algorithm is based on this technique. In this approach, a histogram of short-time where x is the amplitude of the speech, and x and !x are the energy is constructed, from which the signal and noise energy shaping and rate parameters of the gamma distribution, distributions are estimated. In another approach, Martin [4] respectively [10][11]. Fig. 1 (a) illustrates this property. Many used the low-energy envelope in frequency bands to estimate research results show that values of 0.4 or 0.5 for x provide the SNR level. Still other approaches are based on statistics the best fit for clean speech (e.g. [8][9]). We will assume for that are obtained from waveform samples rather than from now that a clean speech signal x[n] exhibits a gamma energy or spectral coefficients. For example, Nemer [5] used distribution with a fixed shaping parameter x of 0.4 and an kurtosis values to estimate the SNR in each frequency band. arbitrary value of !x. (The parameter !x serves to normalize In this approach, short-time voiced signals in a given the density function and has no impact on the SNR frequency band are assumed to be sinusoidal with a fixed estimation.) As will be shown later (cf. Fig. 2), the SNR phase, and short-time unvoiced signals in this band is value estimated by our algorithm is relatively independent of assumed to be a sinusoidal signal with random phase. x if the true value of the SNR is less than 20 dB. Throughout Our approach is based on the fact that the amplitude this paper, x[n], *[n], and z[n] will denote sample functions distribution of a waveform usually can be characterized by a for clean speech, noise, and corrupt speech respectively. The gamma distribution with a shaping parameter value between variables x, , and z will denote sample values without regard 0.4 and 0.5. This fact has been observed by several research to time, and x, *, and z will represent the random variables groups and has been described in numerous books and papers that describe them. (e.g. [8][9]). The only assumptions we make are that (1) the Copyright © 2008 ISCA Accepted after peer review of full paper 2598 September 22-26, Brisbane Australia 100 2 =0.4 Actual Speech Amplitude Distribution x =0.44 Gamma Distribution Model x 1.5 80 x=0.54 z 1 60 G 40 0.5 Probability Density Probability 20 0 -20 0 20 40 60 80 100 SNR (dB) Figure 2: Calculated dependence of the parameter Gz on 0 0 0.02 0.04 0.06 0.08 0.1 the SNR in dB. Amplitude (Maximum Possible Amplitude Is Normalized to 1.0) (a) Clean speech corrupt speech signal z[n] is represented by the following equation: 100 z[n] x[n] [n] (3) Actual Speech Amplitude Distribution Gamma Distribution Model From (1) and (2), the power of the speech and noise 80 parts can be obtained from the above distributions using some arithmetic: ( 1) 60 P x x (4) x 2 x P 2 (5) 40 * * where Px and Pv are the signal and noise power, respectively. Probability Density Probability Hence, the SNR of this signal z[n] is given by 20 Px x ( x 1) (6) z 2 P* ( * x ) 0 0 0.02 0.04 0.06 0.08 0.1 Amplitude (Maximum Possible Amplitude Is Normalized to 1.0) 3. SNR measurement based on the gamma (b) 10-dB additive white Gaussian noise distribution 100 In Figs. 1(a) to 1(c), we observe the amplitude distribution of Actual Speech Amplitude Distribution clean and corrupt utterances. It can be seen that even for noisy Gamma Distribution Model 80 speech utterances, the gamma distribution model is still quite close to the actual amplitude distribution. More importantly, we can easily see that the shapes of Figs. 1(b) and 1(c) are 60 very different from that of Fig. 1(a). Hence we observe that the value of the parameter z characterizes the amplitude distribution of noisy speech using the Gamma-function model 40 depends on the SNR. From the probability density function of the gamma distribution, we can obtain the following relation Probability Density Probability 20 for corrupt speech [11]: N 1 N 1 1 1 (7) ln( z ) 0 ( z ) ln( B| z[n] |) Bln[| z[n] |] N N 0 n 0 n 0 0 0.02 0.04 0.06 0.08 0.1 where () is the digamma function. Amplitude (Maximum Possible Amplitude Is Normalized to 1.0) 0 z (c) 0-dB additive white Gaussian noise From the above equation, we can see that the shaping Figure 1: Comparison between the actual amplitude parameter depends on the right hand side of (7). Based on this distribution of speech and the gamma distribution model. observation, we define the parameter Gz: N1 N1 A subset of 1,600 utterances of the DARPA RM test set 1 1 (8) Gz ln( B| z[n] |) Bln(| z[n] |) was used. N n0 N n0 We now show that Gz in (8) can be employed to uniquely If a clean speech signal is corrupted by additive Gaussian determine SNRs. Let us assume that |z[n]| and ln(|z[n]|) are noise *[n], its probability density function can be expressed as: both ergodic in the mean. For N sufficiently large, we can 1 2 (2) replace the time averages by their corresponding ensemble f ( ) exp( ) * 2 averages: 2 * 2 * 1 N1 where * is the standard deviation of the noise. B| z[n] | E[| z |] (9) We will further assume that both x[n] and *[n] have zero N n0 means and that they are statistically independent. The 2599 50 White Noise Background Music Noise 40 Interfering Speaker Noise Ideal Case 30 20 10 Estimated SNR [dB] Figure 3: The structure of the WADA-SNR estimation system 0 1 N 1 (10) -10 Bln | z[n] | E[ln | z |] -10-50 5 1015202530 N n0 True SNR [dB] producing: G ln(E[| z |]) E(ln(| z |)) (a) Results with the WADA-SNR algorithm Z . (11) ln(E[| x * |]) E(ln(| x * |)) 50 Let’s consider the following normalized random variables ~x ~ and* : 40 ~ x xx, (12) ~ 30 * * / * (13) From (1), (2), (12), and (13), we see that the probability ~ 20 densities of ~x and * are represented by the following distributions: ~ 10 x dx 1 ~ ~ X 1 ~ (14) f~ (x) f ( ) | x | exp( | x |) [dB] SNR Peak Estimated x x ~ White Noise X dx 2 ( x ) 0 Background Music Noise Interfering Speaker Noise ~ ~ d 1 ~2 f~ ( ) f ( ) exp( ) (15) Ideal Case (7dB Difference Assumption) * * * ~ -10 d 2 -10 -5 0 5 10 15 20 25 30 Note that Eqs.

Robust Signal-To-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support