Speech Detection Using High Order Statistic (Skewness)
Juraj Kačur¹, Matúš Jurečka²

1) Slovak University of Technology, Faculty of Electrical Engineering and Information Technology, Department of Telecommunications, Ilkovičová 3, Bratislava, Slovakia, e-mail: [email protected], phone: +421-2-68279416
2) Slovak University of Technology, Faculty of Electrical Engineering and Information Technology, Department of Telecommunications, Ilkovičová 3, Bratislava, Slovakia, e-mail: [email protected]

Abstract
In this article we discuss one of the many possible applications of high order statistics (HOS) to the speech detection problem. As a speech feature we use skewness, i.e. the third-order cumulant with two zero lags, in a simple voiced-speech detection algorithm. We compare its properties with those of the energy feature on symmetric noises and speech signals at various signal-to-noise ratios (SNR). The results show that, under these conditions, the skewness feature is much better suited to detection than energy. Furthermore, we carried out detection accuracy tests, which showed skewness outperforming the classical approach, especially at low SNRs.

Key words: speech detection, WSS noises, symmetric noises, HOS, cumulants, skewness.

1. Introduction
Speech detection algorithms can be found in many different signal and speech processing applications, usually as parts of complex systems. They are widely employed, especially in the following areas: speech compression and transmission, speech recognition, speech enhancement, medicine, etc. Although many such algorithms exist, there is still no universal detection algorithm that works reliably in all possible noises and settings. As the requirements on their performance keep growing, it is worth exploring this area ever more thoroughly.

The task of speech detection consists of labelling the end points of words if they are uttered in isolation, or of marking the active segments of a speech signal. Clearly, finding word boundaries in continuous speech is much more difficult and can be done only within a speech recognition process. Another detection problem is posed by the speech itself, which contains high-energy vowels of various lengths as well as unvoiced consonants of low energy and higher-frequency content; even intervals of silence are regular parts of speech. The situation deteriorates dramatically if background noise is mixed with the useful signal. It is important to stress that noises can have almost any spectral and temporal characteristics, which sometimes makes their separation from speech almost impossible.

The process of speech detection includes: finding proper speech features that distinguish speech from noise, evaluating their behaviour over time, and a decision-taking algorithm. All these stages are closely related to each other and will be presented in turn.

2. High Order Statistics

2.1 Introduction & Theory
The decision to use HOS parameters in our speech detection algorithm was based on the outstanding properties of HOS techniques compared with classical approaches. Although HOS has been investigated for some 40 years, there is still a lot to be explored and tested, notably in real speech processing applications.

HOS is a non-linear statistical branch of signal processing. It provides a new, more complete view and description of stochastic signals. Commonly used features such as intensity, energy, autocorrelation and various spectral techniques can be expressed in terms of HOS as first and second order statistics. Obviously, these important and well-known parameters suppress some of the information (such as phase), which may cause ambiguity in system identification and other applications. These drawbacks can be eliminated by using HOS, namely third and fourth order moments and cumulants and their spectra: the bispectrum and the trispectrum, respectively.
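To make the remark about classical features concrete, the short sketch below (a minimal illustrative example, assuming NumPy; it is not the paper's own implementation) estimates the second-order moment m2(τ) of one frame by a sample average: the frame's average power is simply m2(0), and the autocorrelation sequence is m2(τ) at the remaining lags.

import numpy as np

def second_order_moment(x, tau):
    """m2(tau) = E{x(k) x(k+tau)}, estimated by a sample average.
    Average power (an energy-type feature) is m2(0); the
    autocorrelation sequence is m2(tau) at the other lags."""
    x = np.asarray(x, dtype=float)
    tau = abs(tau)
    n = len(x) - tau
    return float(np.mean(x[:n] * x[tau:tau + n]))

rng = np.random.default_rng(0)
frame = rng.normal(size=2000)          # one quasi-stationary frame
print("average power   m2(0):", second_order_moment(frame, 0))
print("autocorrelation m2(5):", second_order_moment(frame, 5))

Third and fourth order moments extend the same idea to products of three and four samples, which is what equations (1)-(3) below formalise.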
High order moments can be expressed by means of the moment generating function as follows:

\mathrm{Mom}[x_1^{k_1},\dots,x_n^{k_n}] = E\{x_1^{k_1}\cdots x_n^{k_n}\} = (-j)^r \left.\frac{\partial^{r}\,\Phi(\omega_1,\dots,\omega_n)}{\partial\omega_1^{k_1}\cdots\,\partial\omega_n^{k_n}}\right|_{\omega_1=\dots=\omega_n=0}   (1)

where r = k_1 + \dots + k_n and the moment generating function \Phi is the Fourier transform of the probability density function (pdf) of the given stochastic process (2):

\Phi(\omega_1,\omega_2,\dots,\omega_n) = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \mathrm{pdf}(x_1,\dots,x_n)\, e^{\,j(\omega_1 x_1+\dots+\omega_n x_n)}\, dx_1\cdots dx_n   (2)

Even though these moments definitely provide additional information, they do not necessarily exhibit properties convenient for further processing of stochastic signals. Therefore a new set of features, called cumulants, was derived, defined according to (3):

\mathrm{Cum}[x_1^{k_1},\dots,x_n^{k_n}] = (-j)^r \left.\frac{\partial^{r}\,\ln\Phi(\omega_1,\dots,\omega_n)}{\partial\omega_1^{k_1}\cdots\,\partial\omega_n^{k_n}}\right|_{\omega_1=\dots=\omega_n=0}   (3)

This set of parameters exhibits interesting properties which make cumulants a very powerful tool for signal processing. The most important ones, which are also of direct use in our application, are listed next:

• If the sets of random variables x_1,..,x_n and y_1,…,y_n are mutually independent, then the following equality (4) holds:

\mathrm{Cum}(x_1+y_1, x_2+y_2,\dots,x_n+y_n) = \mathrm{Cum}(x_1,x_2,\dots,x_n) + \mathrm{Cum}(y_1,y_2,\dots,y_n)   (4)

• If the random variables can be divided into two or more statistically independent sets, then their joint cumulant is identically equal to zero.
• Let a_1, a_2,..,a_n be constants; then:

\mathrm{Cum}(a_1 x_1, a_2 x_2,\dots,a_n x_n) = a_1 a_2\cdots a_n\,\mathrm{Cum}(x_1, x_2,\dots,x_n)   (5)

• Any random vector that is jointly Gaussian has all its cumulants of order three and higher identically equal to zero.
• Odd order cumulants of processes with symmetric pdfs are equal to zero.
• Cumulants of order higher than one are invariant to the mean (i.e. to an additive constant).

Other interesting properties can be found in [1]. The definition given by (3) is somewhat awkward and is practically never used directly. In practice, cumulants of stationary processes are computed from moments by (6). Let m_n(\tau_1,\tau_2,..,\tau_n) = E\{x(k+\tau_1)\,x(k+\tau_2)\cdots x(k+\tau_n)\}; then

c_1 = m_1 \quad (\text{mean})
c_2(\tau_1) = m_2(\tau_1) - (m_1)^2 \quad (\text{covariance})
c_3(\tau_1,\tau_2) = m_3(\tau_1,\tau_2) - m_1\,[\,m_2(\tau_1)+m_2(\tau_2)+m_2(\tau_2-\tau_1)\,] + 2(m_1)^3
c_4(\tau_1,\tau_2,\tau_3) = m_4(\tau_1,\tau_2,\tau_3) - m_2(\tau_1)\,m_2(\tau_3-\tau_2) - m_2(\tau_2)\,m_2(\tau_3-\tau_1) - m_2(\tau_3)\,m_2(\tau_2-\tau_1)
\qquad - m_1\,[\,m_3(\tau_2-\tau_1,\tau_3-\tau_1)+m_3(\tau_2,\tau_3)+m_3(\tau_1,\tau_3)+m_3(\tau_1,\tau_2)\,]
\qquad + 2(m_1)^2\,[\,m_2(\tau_1)+m_2(\tau_2)+m_2(\tau_3)+m_2(\tau_3-\tau_1)+m_2(\tau_3-\tau_2)+m_2(\tau_2-\tau_1)\,] - 6(m_1)^4
   (6)

Zero-lag cumulants, i.e. cumulants evaluated at \tau_n = 0, have well-known statistical meanings and names and can be related to moments by (7), provided the processes are zero mean:

E\{x(k)^2\} = c_2(0) \quad (\text{variance})
E\{x(k)^3\} = c_3(0,0) \quad (\text{skewness})
E\{x(k)^4\} - 3c_2(0)^2 = c_4(0,0,0) \quad (\text{kurtosis})
   (7)

Combining these properties, we can easily see the great potential of cumulants in signal processing, for example in system identification, Gaussian noise suppression, and the detection and classification of various non-linear distortions, among many others. Detailed information can be found in [1], [2], [3].
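As a small illustration of (7), the zero-lag cumulants of a finite zero-mean frame can be estimated simply by replacing each expectation with a sample average; the sketch below (a minimal example assuming NumPy, with an arbitrary frame length) does exactly that.

import numpy as np

def zero_lag_cumulants(x):
    """Estimate c2(0), c3(0,0) and c4(0,0,0) of a sequence by
    replacing the expectations in (7) with sample averages."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                    # enforce the zero-mean assumption of (7)
    c2 = np.mean(x ** 2)                # variance
    c3 = np.mean(x ** 3)                # skewness (third-order cumulant, zero lags)
    c4 = np.mean(x ** 4) - 3 * c2 ** 2  # kurtosis (fourth-order cumulant, zero lags)
    return c2, c3, c4

# For Gaussian noise both c3 and c4 should stay close to zero.
rng = np.random.default_rng(1)
print(zero_lag_cumulants(rng.normal(size=8000)))

For Gaussian noise the c3 and c4 estimates fluctuate around zero, in line with the Gaussian and symmetric-pdf properties listed above; the fluctuations shrink as the frame gets longer, which is precisely the estimation-variance issue discussed next.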
2.2 Real life application of HOS
In the previous section the definitions and basic properties of HOS were given. However, in real applications we must not forget their limitations. These are mainly related to the calculation (estimation) of cumulants from real data sets. Real data are always limited in size, which inevitably leads to a certain estimation dispersion (because the expectation is approximated by a finite average). This is precisely the case for speech, which is a highly non-stationary process. Therefore speech must be segmented into intervals which can be regarded as stationary and can thus be conveniently described by statistical means. For speech signals these intervals are usually about 10-30 ms long. Obviously, the more parameters we have to estimate, the more data we need, which contradicts real-life conditions. Note that higher order moments or cumulants have many more parameters to estimate than the mean or the covariance sequence does. Furthermore, when only one realization of the stochastic process is available (almost always the case), we have to resort to the ergodicity assumption for the examined process, which is usually not exactly fulfilled. Ergodicity is a strict condition (if a process is ergodic it must also be stationary; the reverse implication does not hold) that enables us to obtain various statistical characteristics of the tested stochastic process by applying time averages to the samples of a single realization. Otherwise, we would need many realizations and would have to average over them rather than over time, which is usually unfeasible. Unfortunately, there is no exact test of ergodicity, so the decision usually rests on the designer's experience. Speech signals are regarded as "ergodic enough" within their stationary intervals. To cope with the abovementioned strict assumptions and limitations, a key role is played by more sophisticated estimation methods.

One such step is to divide the signal into K segments of M samples and to remove the local mean within each segment (9):

\hat{X}^{i}(l) = \begin{cases} X(l) - \dfrac{1}{M}\displaystyle\sum_{j=iM}^{(i+1)M-1} X(j), & l \in \langle iM,\,(i+1)M) \\[6pt] 0, & \text{otherwise} \end{cases}   (9)

Finally, a consistent and asymptotically unbiased estimate of the third order cumulants from real data is given by (10):

c_3(\tau_1,\tau_2) = \frac{1}{K}\sum_{i=1}^{K} m_3^{i}(\tau_1,\tau_2)   (10)

where m_3^{i}(\tau_1,\tau_2) denotes the third-order moment computed from the mean-removed samples \hat{X}^{i} of the i-th segment.

3. Detection algorithm
We based our detection algorithm on the skewness parameter (c_3(0,0)), assuming symmetric pdfs of the noise-generating processes, an assumption that many real-life applications obey. It proved to be correct for all available and tested noises appearing in our experiments. Theoretically this parameter should be equal to zero in such noises; due to approximations and estimation variance it is not, but it remains small enough. The feature is also negligibly small in unvoiced speech segments, but it increases in voiced segments.
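The recovered text breaks off before the decision rule itself is spelled out, so the following is only a minimal sketch of how (9), (10) and the skewness feature could be combined into a frame-wise voiced-speech detector: each analysis frame is split into sub-frames whose local mean is removed as in (9), the zero-lag third-order cumulant is averaged over the sub-frames as in (10), and a frame is marked as voiced when the magnitude of that estimate exceeds a threshold. The frame length, the number of sub-frames K, the threshold rule and the toy test signal are illustrative assumptions, not values from the paper; NumPy is assumed.

import numpy as np

def skewness_feature(frame, K=4):
    """c3(0,0) of one analysis frame, estimated in the spirit of (9)-(10):
    split the frame into K sub-frames of M samples, remove each
    sub-frame's mean, and average the sample mean of x^3 over them."""
    M = len(frame) // K
    c3 = 0.0
    for i in range(K):
        seg = np.asarray(frame[i * M:(i + 1) * M], dtype=float)
        seg = seg - seg.mean()          # (9): local mean removal
        c3 += np.mean(seg ** 3)         # m3^i(0,0) of the i-th sub-frame
    return c3 / K                       # (10): average over the sub-frames

def detect_voiced(signal, fs, frame_ms=20, threshold=None):
    """Label ~20 ms frames as voiced when |c3(0,0)| is large.
    The threshold here is purely illustrative: it is set relative to
    the median feature magnitude over the whole recording."""
    N = int(fs * frame_ms / 1000)
    frames = [signal[k:k + N] for k in range(0, len(signal) - N + 1, N)]
    feats = np.array([abs(skewness_feature(f)) for f in frames])
    if threshold is None:
        threshold = 5.0 * np.median(feats + 1e-12)
    return feats > threshold, feats

# Toy usage: symmetric noise with a short asymmetric ("voiced-like") burst.
fs = 8000
rng = np.random.default_rng(2)
noise = 0.1 * rng.normal(size=fs)                 # symmetric pdf -> c3 ~ 0
burst = rng.exponential(0.3, size=fs // 4) - 0.3  # asymmetric pdf -> c3 far from 0
x = np.concatenate([noise, burst, noise])
labels, feats = detect_voiced(x, fs)
print("frames flagged as voiced:", int(labels.sum()), "of", len(labels))

The paper's experiments compare this feature against classical short-term energy; the same skeleton yields that baseline if skewness_feature is replaced by a mean-square computation of the frame samples.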