<<

Proceedings of the 7th WSEAS International Conference on CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL and (CSECS'08)

Estimation of using Thresholding

PETRSYSEL JIRI MISUREC Brno University of Technology Brno University of Technology Department of Department of Telecommunications Purkynova 464/118, 612 00 Brno Purkynova 464/118, 612 00 Brno CZECH REPUBLIC CZECH REPUBLIC

Abstract: The basic problem of the single-channel speech enhancement methods lies in a rapid and precise method for estimating , on which the quality of enhancement method depends. The paper describes a new method of power spectral using in spectral domain. To reduce the proposed method use the procedure of thresholding the wavelet coefficients of a periodogram. Then the smoothed estimate of power spectral density of noise is obtained using the inverse discrete wavelet transform.

Key–Words: Speech Enhancement, Power Spectral Density, Periodogram, Wavelet Transform Thresholding

1 Introduction the theoretical power spectral density Gxx [k] and the value of power spectral density estimation Most of the speech enhancement methods use estima- Gˆ [k] [8] tion of noise and interference characteristics. Thus the xx notion of power spectral density is introduced, which Gˆ k G k Gˆ k . (3) defines the density of total noise energy of a random B xx [ ] = xx [ ] − E xx [ ] signal in dependence on . n o n o Estimation is unbiased when the mean value of esti- By the Wiener-Khinchin relation [3, 8] it can be mation is equal to the theoretical power spectral den- derived that the power spectral density G [k] of a xx sity stationary random process can be obtained by the ˆ discrete of the se- E Gxx [k] = Gxx [k] . (4) quence Estimation is asymptoticallyn o unbiased when this equa- ∞ tion holds for a sufficiently large count of realizations. −j2πkm G [k] = γxx [m] e , (1) ˆ xx Estimation variance D Gxx [k] is defined as the m=−∞ X mean square value of the differencen o between the esti- where γxx [m] is the autocorrelation sequence of sta- mate and the mean value of estimate [8] tionary random process, for which it holds ˆ ˆ ˆ 2 ∞ D Gxx [k] = E Gxx [k] − E Gxx [k] . γxx [m] = x[n + m]x[n]. (2) n o  n o  (5) n=−∞ X Thus the estimation error can be defined by nor- The stationary random process denotes a random pro- malized mean square error [8] cess whose mean value and variance are independent ǫ2 = ǫ2 + ǫ2, (6) of time n while the autocorrelation sequence γxx [m] c b r depends only on lags m. 2 where ǫb is the normalized mean square error due to The autocorrelation sequence in formula (1) is 2 bias, and ǫr is the normalized mean square error due theoretically infinite and in practice it must be re- to variance. Both errors are defined as placed by an estimate from one or a few process realizations of finite length. On the other hand, this 2 B Gˆxx [k] causes and the power spectral density ob- ǫ2 = , (7) b  nG [k] o tained is only an estimate Gˆxx [k] of theoretical power xx spectral density Gxx [k].  ˆ  The estimation bias (systematic error) D Gxx [k] ǫ2 = . (8) r n 2 o B Gˆxx [k] is defined as the difference between Gxx [k] n o

ISSN: 1790-5117 207 ISBN: 978-960-474-035-2 Proceedings of the 7th WSEAS International Conference on CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL and SIGNAL PROCESSING (CSECS'08)

2 Power spectral density estimation achieves the value 1, i.e. 100 %. For this reason esti- methods mation using the periodogram is improper. One of the methods of smoothing the power spec- 2.1 Non-parametric methods tral density estimation is the mean value of a few , which are obtained from a few suffi- Non-parametric methods of power spectral density es- ciently long random process realizations. [3, 8] If timation use the discrete Fourier transform. The auto- we have at our disposal only one random process correlation sequence can be estimated by the relation realization, then we subdivide an N-point signal into [3] K segments. The effect of subdividing is reduced fre- quency resolution by a factor K. For each segment, 1 N−m−1 r [m] = x [n + m] x [n], 0 ≤ m < N, the periodogram is computed xx N n=0 X 2 N−1 M−1 1 (i) 1 −j2πkn rxx [m] = x [n + m] x [n], −N

where N is the signal length. The estimation used is where M = N is the length of the each segment. biased but it asymptotically approximates the theoret- K Then the powerj spectralk density estimation is equal to ical autocorrelation sequence the mean value of partial periodograms

lim rxx [m] = γxx [m] . (10) N→∞ 1 K−1 P B [k] = P (i) [k] . (16) Moreover, estimation variance converges to zero as xx K xx i=1 N → ∞, therefore rxx [m] is a consistent estimate X [3]. This method is called the Bartlett method [3]. The If we substitute (9) into (1), an estimate of power power spectral density estimation by the Bartlett spectral density of an ergodic random process, so method is again an asymptotically unbiased estima- called periodogram, can be obtained tion. However, the estimation variance is reduced by N−1 a factor K [3] −j2πkm 1 2 P [k] = rxx [m] e = |X[k]| . xx N k 2 m=−(N−1) 1 sin 2π N X P B k G2 k 2N−1 . (11) D xx [ ] = xx [ ] 1 + k K  N sin 2π !  However, the periodogram is also a biased estimate n o 2N−1 (17) but asymptotically it is an unbiased estimate   After substituting into (8) the normalized mean square lim E {Pxx [k]} = Gxx [k] . (12) error due to variance is shown to be reduced by a fac- N→∞ tor K too. When the random process is a Gaussian random Another modification of averaging periodograms process, the estimation variance is shown to be [3] is proposed by Welch. The fundamentals are the same as in the Bartlett method but Welch proposed segment k 2 2 sin 2π 2N−1 N overlapping and weighting segments by a chosen win- D {Pxx [k]} = Gxx [k] 1 + k .  N sin 2π !  dow. The Welch method can reduce the estimation 2N−1 variance while preserving sufficient frequency resolu- (13)   tion. [3] Hence the estimation variance is approaching the square of power spectral density 2 2.2 Parametric methods lim D {Pxx [k]} = Gxx [k] (14) N→∞ Unlike non-parametric methods the parametric as N → ∞. Therefore estimation using the pe- methods emulate a model for the generation of sig- riodogram is an inconsistent estimation. Moreover, nals that is constructed from a number of parameters estimation variance achieves the value of the square estimated from a short realization of the random of power spectral density and the normalized mean process observed. The modelling aproach makes square error due to variance it possible to extrapolate the value of the random process outside the realization and also to extrapolate ˆ D Gxx [k] G2 [k] the value of the autocorrelation sequence for lags ǫ2 = lim = xx = 1 (15) r n 2 o 2 |m| ≥ N. An advantage is that these methods N→∞ Gxx [k] Gxx [k]

ISSN: 1790-5117 208 ISBN: 978-960-474-035-2 Proceedings of the 7th WSEAS International Conference on CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL and SIGNAL PROCESSING (CSECS'08)

provide a better frequency resolution than non- where Pxx [k] are samples of the periodogram of the parametric methods and their frequency resolution is implementation of a random process of length 2N = almost independent of the length of random process 2M+1 obtained by the discrete Fourier transform, and realization. [3] the base functions ψj,m [k] are derived by a time shift The discrete stationary random process x [n] with j = 0, 1,..., 2m − 1 and dilatation with the scale unknown power spectral density Gxx [k] can be trans- m = 0, 1,..., log2 N of a single mother function formed into w[n] when sequence x [n] is ψ [k] according to the relation filtered by a linear digital filter with a 1/H(z) that is inverse to the transfer function (20). 1 k ψ [k] = √ ψ − j . (23) This filter is then called whitening filter. If it is possi- j,m m 2m 2   ble to find out the transfer function of whitening filter then the power spectral density is shown to be The transformation is linear and therefore the coeffi- cients will represent the sum of the coefficients repre- j2πf 2 j2πf j2πf Gxx e = σwH(e )eH( ), (18) senting the sought power spectral density gj,m[k] and   the coefficients representing the noise ej,m[k] where 2 υ[0] σw = e (19) Cj,m[k] = gj,m[k] + ej,m[k]. (24) is the variance of white noise, As the random processes ς[k] are independent and the ∞ −m transformation is orthogonal, the coefficients ej,m[k] H(z) = exp υ [m] z = (20) will be non-correlated. At the same time, however, m=1 ! r X they are not independent since their probability distri- −i biz bution is independent of the shift j but is dependent i=0 , z > r , r < , on the scale m. But according to the central limiting = Ps | | 1 1 1 −j theorem their converges to the 1 + ajz j=0 Gaussian normal distribution with m → ∞ [2]. The P coefficients are modified via soft thresholding accord- is the transfer function of the model for the observed ing to the relation random process generation, and υ [m] are the cepstral coefficients of the autocorrelation sequence γxx [m]. (s) Cj,m[k] = sgn (Cj,m[k]) max (0, |Cj,m[k]| − λ) , (25) 2.3 Enhancing the estimate of power spec- where λ is the threshold value chosen. After that the tral density using the wavelet transform smoothed estimate of power spectral density of noise ˆ Other methods of power spectral density estimation Gxx [k] is obtained (via the inverse discrete wavelet can be based on thresholding the wavelet coefficient transform) in the form of periodogram [9, 7, 4, 5]. Consider a stationary ran- M 2m−1 dom process x[n], which has a defined logarithm of ˆ 1 (s) ln Gxx [k] = Cj,m[k]ψj,m [k] , j2πf N power spectral density ln G e , |f| ≤ 0.5. If m=0 j=0 xx X X we assume noise to be a Gaussian random process, (26) then the logarithm of periodogram can be modelled as k = 0, 1,...,N − 1. j2πf j2πf ln Pxx [k] = ln Gxx e + ς(e ) + γ, (21) With regard to the asymptotically normal distribu-   tion of coefficients e k the threshold value λ can j2πf j,m[ ] where ς(e ) is a random process with probabi- be determined using the universal thresholding pro- 2 lity distribution χ2 with two degrees of freedom, and posed by Johnstone Donoho [1] in the form γ ≈ 0.57721 is the Euler-Mascheroni constant. [2, 9] The random process, which is responsible for the pe- λ = σ 2, log N (27) riodogram variance, can be removed via thresholding p the wavelet transform coefficients. The coefficients where σ is the of noise, and N is of the discrete wavelet transform with discrete time the half of length. In [9] an optimum determina- Cj,m[k] will be calculated according to the relation tion of the threshold in dependence on the scale is pro- N−1 posed. If the scale is large, then the threshold equals Cj,m[k] = (ln Pxx [k] − γ) ψj,m [k] , (22) λm = αm ln N, (28) kX=0

ISSN: 1790-5117 209 ISBN: 978-960-474-035-2 Proceedings of the 7th WSEAS International Conference on CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL and SIGNAL PROCESSING (CSECS'08)

where the constants αm are as given in Table 1 blender and vacuum cleaner, which have the character and N is the length of data. If the scale is of wideband noise almost approximating white noise. small, m << M − 1, the threshold will be deter- In the testing, the speech signal was also exposed to mined according to the relation interference by noise from a drilling machine and a π √ Ford Transit, which on the contrary have the character λm = √ ln. N of narrow-band noise. Approximately 60 recordings 3 from an actual environment with various signal-to- noise ratio and with various types of interference

scale m αm scale m αm (drill, vacuum cleaner, vehicles, blender, shower, etc.) are processed. [6] In view of the fact that the recor- M − 1 19.2 M − 6 04.5 dings were made in a real environment and it was M − 2 19.0 M − 7 06.4 impossible to obtain pure speech signal without inter- M − 3 02.9 M − 8 09.3 ference, the estimation of the quality was performed M − 4 07.7 M − 9 02.3 using the signal-to-noise ratio determined segment- M − 5 05.6 0M − 1 07.2 wise according to the relation

Rs ∆κ = 10 log10 . (29) Table 1: Values of constant αm for determining the Rν threshold when thresholding the wavelet transform The power of signal Rs is determined from a segment coefficients. containing the speech signal while the power of inter- ference Rν is estimated from a segment that does not contain the speech signal. The resulting signal-to-noise ratio and improve- ment in the signal-to-noise ratio were monitored in all enhanced recordings. The mean value and the vari- ance of improvement of signal-to-noise ratio were cal- culated. The results are summarized in Table 2.

4 Conclusion

A new method for enhancing the estimation of power spectral density of noise using thresholding the wavelet transform coefficients was demonstrated. It led to a reduced variance in the estimate of power spectral density of noise. The accuracy of power spec- tral density estimation can greatly affect the results of Figure 1: Power spectral density extimation using th the methods that use it. This can be demonstrated periodogram, AR model 10 order, and wavelet on speech enhancement by the spectral subtraction thresholding of circular saw noise. method with various types of interference and various signal-to-noise ratios. Thanks to the lower variance in the estimate of power spectral density of noise there 3 was also a lower variance in the estimate of the power of speech signal and a reduction in the signal The proposed method of power spectral density es- reconstruction error. timation using wavelet thresholding was tested on Although the mean value of improvement in the speech enhancement by spectral subtraction [6] of ac- signal-to-noise ratio changes very little depending on tual recordings of speech signal interfered with by the method of power spectral density estimation used, different types of noise. The block diagram of the marked differences can be seen in the variance of modified spectral subtraction is in the Fig. 2. [4, 5] improvement in the signal-to-noise ratio. It can be For comparison, the speech was also enhanced by the said that methods of power spectral density estimation method of spectral subtraction with the power spec- with a smaller variance of estimation (thresholding of tral density of interference being estimated using the wavelet coefficient of peridogram) provide a smaller Welch method and the AR model with order s = 20. variance of the improvement of signal-to-noise ratio. The speech signal was interfered with by the noise of Hence these methods are more suitable because the

ISSN: 1790-5117 210 ISBN: 978-960-474-035-2 Proceedings of the 7th WSEAS International Conference on CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL and SIGNAL PROCESSING (CSECS'08)

noisy speech FFT abs( . )a DTWT

threshold estimation

arg( . ) VAD

soft thresholding enhanced speech IFFT abs( . )1/a subtraction IDTWT

Figure 2: Modified spectral subtraction method using the wavelet transform to estimate the power spectral density of noise.

Table 2: Results of spectral subtraction using various power spectral estimation methods.

Power spectral density estimation by Improvement of signal-to-noise ratio ∆κ(dB) Minimum Maximum Mean Variance Welch method 2,7 218 ,5 96,2 163 ,5 wavelet thresholding 32,5 200 ,1 8,6 109 ,1 AR model s = 20 39,6 237 ,8 102 ,0 119 ,6

improvement in the signal-to-noise ratio is less depen- tions. Third edition. New Jersey: Prentice Hall, dent on the type of interference or on the input signal- 1996. 1016 p. ISBN 0-13-373762-4. to-noise ratio. Judging by the experiments made, the [4] Z. Smekal, P. Sysel, Single-Channel Noise Sup- method is in the first place suitable for noise that is of pression by in Spectral Domain. In wideband nature - e.g. shower, vacuum cleaner, and Proceedings of COST Action 2102 International the like. Workshop. Vietri sul Mare, Italy, Springer. 2007. p. 150 - 164. ISBN 978-3-540-76441-0. Acknowledgements: This work was supported [5] P. Sysel, Z. Smekal, Power Spectral Density within the framework of the project No 102/07/1303 Noise Estimation using Wavelet Transform in of the Grant Agency of the Czech Republic and the Spectral Domain. In Proceedings of 17th Czech- National Research Project “Information Societ” No German Workshop Speech Processing. Prague. 1ET301710509 by the Academy of Sciences of the 2007. p. 98 - 105. ISBN 978-80-86269-00-9. Czech Republic. This work is also supported in [6] P. Sysel, Single-channel speech enhancement part by the Ministry of Education, Youth and Sports method based on wavelet transform in spectral of the Czech Republic under research program No. domain. Ph.D. Thesis. Brno, Brno University of MSM0021630513. Technology. 2007. p. 1 - 108. [7] P. Sysel, Wiener filtering with spectrum estima- References: tion by wavelet transformation. In Proceedings of International Conference on Trends in Com- [1] D. L. Donoho, I. M. Johnstone, Ideal spatial munications. Bratislava: FEI STU, July 2001. adaptation by wavelet shrinkage. Biometrika. pp. 471–474. ISBN 0-7803-6490-2. 1994, vol. 3, no. 81, pp. 425–455. [8] J. Uhl´ıˇr, P. Sovka, Digital signal processing [in [2] P. Moulin, Wavelet thresholding techniques for Czech]. 1. edition. Prague: CVUTˇ Publishing, power spectrum estimation. IEEE Transactions 1995. 313 s. ISBN 80-01-01303-0. on Signal Processing. November 1994, vol. 42, [9] B. Vidakovic, Statistical Modeling by Wavelets. no. 11, pp. 3126–3136. First edition. New Jersey: Wiley, 1999. 408 p. [3] J. G. Proakis, D. G. Manolakis, Digital Signal Probability and . ISBN 0-471-29365-2. Processing Principles, Algorithms, and Applica-

ISSN: 1790-5117 211 ISBN: 978-960-474-035-2