
Digital Speech Processing, Lectures 7-8

Time Domain Methods in Speech Processing

General Synthesis Model

[Figure: general synthesis model, with voiced sound amplitude and unvoiced sound amplitude controls and radiation model $R(z) = 1 - \alpha z^{-1}$]

[Figure labels: Log Areas, Reflection Coefficients, Formants, Vocal Tract Polynomial, Articulatory Parameters, …]

[Figure labels: Pitch Detection, Voiced/Unvoiced/Silence Detection, Gain Estimation, Vocal Tract Parameter Estimation, Glottal Pulse Shape, Radiation Model]

Overview

[Figure: speech or music, x[n] -> Signal Processing -> representation of speech, A(x,t): formants, reflection coefficients, voiced-unvoiced-silence, pitch, sounds of language, speaker identification, emotions]

• time domain processing => direct operations on the speech waveform
• frequency domain processing => direct operations on a spectral representation of the signal

[Figure: x[n] -> simple processing system -> zero crossing rate, level crossing rate, energy]

• simple processing
• enables various types of feature estimation

Basics

• 8 kHz sampled speech (bandwidth < 4 kHz)
• properties of speech change with time
• excitation goes from voiced to unvoiced
• peak amplitude varies with the sound being produced
• pitch varies within and across voiced sounds
• periods of silence where background signals are seen
• the key issue is whether we can create simple time-domain processing methods that enable us to measure/estimate speech representations reliably and accurately

Fundamental Assumptions

• properties of the speech signal change relatively slowly with time (5-10 sounds per second)
  – over very short (5-20 msec) intervals => uncertainty due to small amount of data, varying pitch, varying amplitude
  – over medium length (20-100 msec) intervals => uncertainty due to changes in sound quality, transitions between sounds, rapid transients in speech
  – over long (100-500 msec) intervals => uncertainty due to large amount of sound changes
• there is always uncertainty in short time measurements and estimates from speech signals

Compromise Solution

• "short-time" processing methods => short segments of the speech signal are "isolated" and "processed" as if they were short segments from a "sustained" sound with fixed (non-time-varying) properties
  – this short-time processing is periodically repeated for the duration of the waveform
  – these short analysis segments, or "analysis frames", often overlap one another
  – the results of short-time processing can be a single number (e.g., an estimate of the pitch period within the frame), or a set of numbers (an estimate of the formant frequencies for the analysis frame)
  – the end result of the processing is a new, time-varying sequence that serves as a new representation of the speech signal

Frame-by-Frame Processing in Successive Windows

[Figure: Frames 1 to 5 laid over the waveform; 75% frame overlap => frame length = L, frame shift = R = L/4]

Frame 1: samples 0, 1, ..., L-1
Frame 2: samples R, R+1, ..., R+L-1
Frame 3: samples 2R, 2R+1, ..., 2R+L-1
Frame 4: samples 3R, 3R+1, ..., 3R+L-1

Frame1 = {x[0], x[1], …, x[L-1]}
Frame2 = {x[R], x[R+1], …, x[R+L-1]}
Frame3 = {x[2R], x[2R+1], …, x[2R+L-1]}
…
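The frame indexing above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `frame_signal` and the toy signal are assumptions, and only complete frames are kept.

```python
def frame_signal(x, L, R):
    """Split x into frames of length L with frame shift R.

    Frame i holds samples x[i*R], ..., x[i*R + L - 1]; a trailing partial
    frame is dropped (an assumption made for this sketch).
    """
    frames = []
    start = 0
    while start + L <= len(x):
        frames.append(x[start:start + L])
        start += R
    return frames


# Toy signal whose sample values equal their indices, so frames are easy to read.
x = list(range(20))
L, R = 8, 2                      # R = L/4, i.e. 75% frame overlap
frames = frame_signal(x, L, R)
overlap = 1 - R / L
```

With R = L/4 each sample (away from the ends) is covered by four frames, which is the 75% overlap case shown in the figure.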

Frame-by-Frame Processing in Successive Windows

[Figure: Frames 1 to 4 with 50% frame overlap]

• Speech is processed frame-by-frame in overlapping intervals until the entire region of speech is covered by at least one such frame
• Results of analysis of individual frames are used to derive model parameters in some manner
• Representation goes from time sample x[n], n = ..., 0, 1, 2, ... to parameter vector f[m], m = 0, 1, 2, ..., where n is the time index and m is the frame index.

Frames and Windows

F_S = 16,000 samples/second
L = 641 samples (equivalent to 40 msec frame (window) length)
R = 240 samples (equivalent to 15 msec frame (window) shift)
Frame rate of 66.7 frames/second

Short-Time Processing

[Figure: speech waveform, x[n] -> short-time processing -> speech representation, f[m]]

• x[n] = samples at 8000/sec rate (e.g., 2 seconds of 4 kHz bandlimited speech, x[n], 0 ≤ n ≤ 16000)
• f[m] = {f_1[m], f_2[m], ..., f_L[m]} = vectors at 100/sec rate, 1 ≤ m ≤ 200; L is the size of the analysis vector (e.g., 1 for pitch period estimate, 12 for autocorrelation estimates, etc.)

Generic Short-Time Processing

$$Q_{\hat{n}} = \left. \left( \sum_{m=-\infty}^{\infty} T(x[m])\, \tilde{w}[n-m] \right) \right|_{n=\hat{n}}$$

[Figure: x[n] -> T( ) (linear or non-linear transformation) -> T(x[n]) -> $\tilde{w}[n]$ (window sequence, usually finite length) -> $Q_{\hat{n}}$]

• $Q_{\hat{n}}$ is a sequence of local weighted average values of the sequence T(x[n]) at time $n = \hat{n}$
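The generic form of $Q_{\hat{n}}$ can be written directly as a double loop. The sketch below is illustrative only: the function name `short_time`, the argument `R` (frame shift) and the convention that samples before x[0] are zero are assumptions, and the window is taken to be causal and finite (non-zero for 0 ≤ n ≤ L-1).

```python
def short_time(x, T, w, R=1):
    """Evaluate Q[n_hat] = sum_m T(x[m]) * w[n_hat - m] every R samples.

    x : list of samples
    T : transformation applied sample by sample (e.g. squaring)
    w : causal window, w[0..L-1]; samples before x[0] are treated as zero
    """
    L = len(w)
    out = []
    for n_hat in range(0, len(x), R):
        acc = 0.0
        for j in range(L):           # j = n_hat - m, so m = n_hat - j
            m = n_hat - j
            if m >= 0:
                acc += T(x[m]) * w[j]
        out.append(acc)
    return out


x = [1.0, -2.0, 3.0, -4.0]
w = [1.0, 1.0]                                   # 2-point rectangular window
energy = short_time(x, lambda v: v * v, w)       # T(x) = x^2
magnitude = short_time(x, abs, w)                # T(x) = |x|
```

Choosing T(x) = x² or T(x) = |x| gives the short-time energy and short-time magnitude discussed on the following slides; only the transformation changes, not the windowing.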

Short-Time Energy

$$E = \sum_{m=-\infty}^{\infty} x^2[m]$$

-- this is the long term definition of signal energy
-- there is little or no utility of this definition for time-varying signals

$$E_{\hat{n}} = \sum_{m=\hat{n}-N+1}^{\hat{n}} x^2[m] = x^2[\hat{n}-N+1] + \dots + x^2[\hat{n}]$$

-- short-time energy in vicinity of time $\hat{n}$

$$T(x) = x^2, \qquad w[n] = \begin{cases} 1 & 0 \le n \le N-1 \\ 0 & \text{otherwise} \end{cases}$$

Computation of Short-Time Energy

[Figure: window $w[\hat{n}-m]$ positioned over the sequence of squared values]

• window jumps/slides across sequence of squared values, selecting interval for processing
• what happens to $E_{\hat{n}}$ as sequence jumps by 2, 4, 8, ..., L samples ($E_{\hat{n}}$ is a lowpass function, so it can be decimated without loss of information; why is $E_{\hat{n}}$ lowpass?)
• effects of decimation depend on L; if L is small, then $E_{\hat{n}}$ is a lot more variable than if L is large (window bandwidth changes with L!)
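The dependence of the variability of $E_{\hat{n}}$ on the window length can be checked numerically. The following sketch is an illustration under stated assumptions: a 100 Hz sinusoid sampled at 10 kHz stands in for a voiced sound (period 100 samples), and `relative_ripple` is a hypothetical helper that measures how much the rectangular-window energy fluctuates.

```python
import math


def short_time_energy(x, n_hat, L):
    """E_n = sum of x[m]^2 for m = n_hat-L+1 .. n_hat (rectangular window).

    Assumes n_hat >= L - 1 so that the window lies fully inside x.
    """
    return sum(v * v for v in x[n_hat - L + 1:n_hat + 1])


def relative_ripple(x, L):
    """(max - min) / mean of E_n over all positions where the window is full."""
    E = [short_time_energy(x, n, L) for n in range(L - 1, len(x))]
    return (max(E) - min(E)) / (sum(E) / len(E))


fs, f0 = 10000, 100                      # assumed test values; period = 100 samples
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
ripple_short = relative_ripple(x, 10)    # window much shorter than one period
ripple_long = relative_ripple(x, 100)    # window spans exactly one period
```

A window much shorter than the period tracks the waveform itself and fluctuates strongly, while a window covering a full period gives an essentially constant energy, in line with the remark that small L makes $E_{\hat{n}}$ more variable.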

Effects of Window

$$Q_{\hat{n}} = \left. T(x[n]) * \tilde{w}[n] \right|_{n=\hat{n}} = \left. x'[n] * \tilde{w}[n] \right|_{n=\hat{n}}$$

• $\tilde{w}[n]$ serves as a lowpass filter on T(x[n]), which often has a lot of high frequencies (most non-linearities introduce significant high frequency energy; think of what $x[n] \cdot x[n]$ does in frequency)
• often we extend the definition of $Q_{\hat{n}}$ to include a pre-filtering term so that x[n] itself is filtered to a region of interest

[Figure: $\hat{x}[n]$ -> Linear Filter -> x[n] -> T( ) -> T(x[n]) -> $\tilde{w}[n]$ -> $Q_{\hat{n}} = Q_n|_{n=\hat{n}}$]

Short-Time Energy

• serves to differentiate voiced and unvoiced sounds in speech from silence (background signal)
• natural definition of energy of weighted signal is:

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} \left[ x[m]\, w[\hat{n}-m] \right]^2 \quad \text{(sum of squares of portion of signal)}$$

-- concentrates measurement at sample $\hat{n}$, using weighting $w[\hat{n}-m]$

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} x^2[m]\, w^2[\hat{n}-m] = \sum_{m=-\infty}^{\infty} x^2[m]\, h[\hat{n}-m], \qquad h[n] = w^2[n]$$

[Figure (short time energy): x[n] -> ( )^2 -> x^2[n] -> h[n] -> $E_{\hat{n}} = E_n|_{n=\hat{n}}$; sampling rates F_S, F_S, F_S/R]

Short-Time Energy Properties

• depends on choice of h[n], or equivalently, window $\tilde{w}[n]$
  – if $\tilde{w}[n]$ duration is very long and of constant amplitude ($\tilde{w}[n]=1$, n=0,1,...,L-1), $E_n$ would not change much over time, and would not reflect the short-time amplitudes of the sounds of the speech
  – very long duration windows correspond to narrowband lowpass filters
  – want $E_n$ to change at a rate comparable to the changing sounds of the speech => this is the essential conflict in all speech processing, namely we need a short duration window to be responsive to rapid sound changes, but short windows will not provide sufficient averaging to give a smooth and reliable energy function

Windows

• consider two windows, $\tilde{w}[n]$
  – rectangular window:
    • h[n]=1, 0≤n≤L-1 and 0 otherwise
  – Hamming window (raised cosine window):
    • h[n]=0.54-0.46 cos(2πn/(L-1)), 0≤n≤L-1 and 0 otherwise
  – rectangular window gives equal weight to all L samples in the window (n,...,n-L+1)
  – Hamming window gives most weight to middle samples and tapers off strongly at the beginning and the end of the window

Rectangular and Hamming Windows

[Figure: rectangular and Hamming windows, L = 21 samples]

Window Frequency Responses

• rectangular window

$$H(e^{j\Omega T}) = \frac{\sin(\Omega L T/2)}{\sin(\Omega T/2)}\, e^{-j\Omega T (L-1)/2}$$

• first zero occurs at f=Fs/L=1/(LT) (or Ω=(2π)/(LT)) => nominal cutoff frequency of the equivalent "lowpass" filter
• Hamming window

$$w_H[n] = 0.54\, w_R[n] - 0.46 \cos\!\left(2\pi n/(L-1)\right) w_R[n]$$

• can decompose Hamming window frequency response into a combination of three terms

RW and HW Frequency Responses

• log magnitude response of RW and HW
• bandwidth of HW is approximately twice the bandwidth of RW
• attenuation of more than 40 dB for HW outside passband, versus 14 dB for RW
• stopband attenuation is essentially independent of L, the window duration => increasing L simply decreases window bandwidth
• L needs to be larger than a pitch period (or severe fluctuations will occur in $E_n$), but smaller than a sound duration (or $E_n$ will not adequately reflect the changes in the speech signal)

There is no perfect value of L, since a pitch period can be as short as 20 samples (500 Hz at a 10 kHz sampling rate) for a high pitch child or female, and up to 250 samples (40 Hz pitch at a 10 kHz sampling rate) for a low pitch male; a compromise value of L on the order of 100-200 samples for a 10 kHz sampling rate is often used in practice

Window Frequency Responses

[Figures: Rectangular Windows, L=21,41,61,81,101; Hamming Windows, L=21,41,61,81,101]
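The window properties listed above (first zero of the rectangular window at Fs/L, wider main lobe but much lower sidelobes for the Hamming window) can be checked by evaluating the window's Fourier transform directly. This is a small sketch, not the slides' own computation: the helper names and the choice Fs = 10 kHz, L = 51 are assumptions made for the example.

```python
import cmath
import math


def rectangular(L):
    return [1.0] * L


def hamming(L):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]


def magnitude_db(w, f, fs):
    """Log magnitude of the window's frequency response at f Hz,
    in dB relative to its DC gain (floored at -240 dB to avoid log(0))."""
    omega = 2 * math.pi * f / fs
    H = sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w))
    return 20 * math.log10(max(abs(H) / sum(w), 1e-12))


fs, L = 10000.0, 51                      # assumed example values
rect_null_db = magnitude_db(rectangular(L), fs / L, fs)            # first zero of RW
rect_sidelobe_db = magnitude_db(rectangular(L), 1.5 * fs / L, fs)  # near first RW sidelobe peak
hamm_at_rect_null_db = magnitude_db(hamming(L), fs / L, fs)        # still inside HW main lobe
hamm_stopband_db = magnitude_db(hamming(L), 3 * fs / L, fs)        # beyond HW main lobe
```

At f = Fs/L the rectangular window response is numerically zero while the Hamming window is only a few dB down, which reflects the roughly doubled bandwidth of the Hamming window.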

Short-Time Energy

• Short-time energy computation:

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} \left( x[m]\, w[\hat{n}-m] \right)^2 = \sum_{m=-\infty}^{\infty} (x[m])^2\, \tilde{w}[\hat{n}-m]$$

• For L-point rectangular window, $\tilde{w}[m] = 1$, m = 0, 1, ..., L-1, giving

$$E_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} (x[m])^2$$

Short-Time Energy using RW/HW

[Figures: $E_{\hat{n}}$ computed with rectangular and Hamming windows, L=51, 101, 201, 401]

• as L increases, the plots tend to converge (however you are smoothing sound energies)
• short-time energy provides the basis for distinguishing voiced from unvoiced speech regions, and for medium-to-high SNR recordings, can even be used to find regions of silence/background signal

Short-Time Energy for AGC

Can use an IIR filter to define short-time energy, e.g.,

• time-dependent energy definition

$$\sigma^2[n] = \sum_{m=-\infty}^{\infty} x^2[m]\, h[n-m] \Big/ \sum_{m=0}^{\infty} h[m]$$

• consider a filter of the form

$$h[n] = \alpha^{n-1} u[n-1] = \begin{cases} \alpha^{n-1} & n \ge 1 \\ 0 & n < 1 \end{cases}$$

$$\sigma^2[n] = \sum_{m=-\infty}^{\infty} (1-\alpha)\, x^2[m]\, \alpha^{n-m-1}\, u[n-m-1]$$

Recursive Short-Time Energy

• u[n-m-1] implies the condition n-m-1 ≥ 0 or m ≤ n-1, giving

$$\sigma^2[n] = \sum_{m=-\infty}^{n-1} (1-\alpha)\, x^2[m]\, \alpha^{n-m-1} = (1-\alpha)\left( x^2[n-1] + \alpha\, x^2[n-2] + \dots \right)$$

• for the index n-1 we have

$$\sigma^2[n-1] = \sum_{m=-\infty}^{n-2} (1-\alpha)\, x^2[m]\, \alpha^{n-m-2} = (1-\alpha)\left( x^2[n-2] + \alpha\, x^2[n-3] + \dots \right)$$

• thus giving the relationship

$$\sigma^2[n] = \alpha \cdot \sigma^2[n-1] + x^2[n-1]\,(1-\alpha)$$

• and defines an Automatic Gain Control (AGC) of the form

$$G[n] = \frac{G_0}{\sigma[n]}$$

[Figure: block diagram of the recursion: x[n] -> ( )^2 -> x^2[n] -> z^{-1} -> gain (1-α) -> adder -> σ^2[n], with feedback through z^{-1} and gain α]
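The recursion for σ²[n] and the resulting AGC gain can be sketched as follows. This is a minimal illustration: the function names, the zero initial condition σ²[0] = 0 and the small floor that avoids division by zero are assumptions added for the example.

```python
import math


def recursive_energy(x, alpha):
    """sigma2[n] = alpha * sigma2[n-1] + (1 - alpha) * x[n-1]^2.

    sigma2[0] is set to 0 (assumed initial condition); the output has
    len(x) + 1 entries because sigma2[n] depends on x up to index n-1.
    """
    sigma2 = [0.0] * (len(x) + 1)
    for n in range(1, len(x) + 1):
        sigma2[n] = alpha * sigma2[n - 1] + (1 - alpha) * x[n - 1] ** 2
    return sigma2


def agc_gain(sigma2, G0=1.0, floor=1e-8):
    """G[n] = G0 / sigma[n]; the floor is an added safeguard against sigma = 0."""
    return [G0 / math.sqrt(max(s, floor)) for s in sigma2]


# Small check against the closed-form sum (1-a)(x^2[n-1] + a x^2[n-2] + ...).
small = recursive_energy([1.0, 2.0, 3.0], alpha=0.5)

# Constant-amplitude input: sigma^2 settles at x^2, so the gain settles at G0/|x|.
steady = recursive_energy([2.0] * 500, alpha=0.9)
gain = agc_gain(steady)
```

Values of α closer to 1 give a longer effective window, so the gain reacts more slowly to level changes; this is the difference between the α = 0.9 and α = 0.99 cases.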

Use of Short-Time Energy for AGC

[Figures: AGC examples with α = 0.9 and α = 0.99]

Short-Time Magnitude

• short-time energy is very sensitive to large signal levels due to x^2[n] terms
  – consider a new definition of 'pseudo-energy' based on average signal magnitude (rather than energy)

$$M_{\hat{n}} = \sum_{m=-\infty}^{\infty} |x[m]|\, \tilde{w}[\hat{n}-m]$$

  – weighted sum of magnitudes, rather than weighted sum of squares
• computation avoids multiplications of signal with itself (the squared term)

[Figure: x[n] -> | | -> |x[n]| -> $\tilde{w}[n]$ -> $M_{\hat{n}} = M_n|_{n=\hat{n}}$; sampling rates F_S, F_S, F_S/R]

Short-Time Magnitudes

[Figures: $M_{\hat{n}}$ for L=51, 101, 201, 401]

• differences between E_n and M_n noticeable in unvoiced regions
• dynamic range of M_n ~ square root (dynamic range of E_n) => level differences between voiced and unvoiced segments are smaller
• E_n and M_n can be sampled at a rate of 100/sec for window durations of 20 msec or so => efficient representation of signal energy/magnitude
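The square-root relation between the dynamic ranges of M_n and E_n can be seen on a toy signal. The sketch below is an illustration only: the two constant-amplitude segments (amplitudes 1.0 and 0.1) are assumed stand-ins for a loud and a quiet sound, and rectangular windows are used.

```python
def short_time_energy(x, n_hat, L):
    """E_n with an L-point rectangular window ending at n_hat (n_hat >= L-1)."""
    return sum(v * v for v in x[n_hat - L + 1:n_hat + 1])


def short_time_magnitude(x, n_hat, L):
    """M_n with an L-point rectangular window ending at n_hat (n_hat >= L-1)."""
    return sum(abs(v) for v in x[n_hat - L + 1:n_hat + 1])


L = 50
loud = [(-1) ** n * 1.0 for n in range(L)]    # alternating +/-1.0
quiet = [(-1) ** n * 0.1 for n in range(L)]   # alternating +/-0.1
x = loud + quiet

n_loud, n_quiet = L - 1, 2 * L - 1            # windows covering each segment exactly
energy_ratio = short_time_energy(x, n_loud, L) / short_time_energy(x, n_quiet, L)
magnitude_ratio = short_time_magnitude(x, n_loud, L) / short_time_magnitude(x, n_quiet, L)
```

A 10:1 amplitude difference appears as 100:1 in energy but only 10:1 in magnitude, which is why level differences between voiced and unvoiced segments look smaller in M_n.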

Short Time Energy and Magnitude: Rectangular Window

[Figures: $E_{\hat{n}}$ and $M_{\hat{n}}$ for L=51, 101, 201, 401]

Short Time Energy and Magnitude: Hamming Window

[Figures: $E_{\hat{n}}$ and $M_{\hat{n}}$ for L=51, 101, 201, 401]

Other Lowpass Windows

• can replace RW or HW with any lowpass filter
• window should be positive since this guarantees E_n and M_n will be positive
• FIR windows are efficient computationally since they can slide by L samples with no loss of information (what should L be?)
• can even use an infinite duration window if its z-transform is a rational function, i.e.,

$$h[n] = \begin{cases} a^n & n \ge 0,\ 0 < a < 1 \\ 0 & n < 0 \end{cases} \qquad H(z) = \frac{1}{1 - a z^{-1}}, \quad |z| > |a|$$

Other Lowpass Windows

• this simple lowpass filter can be used to implement E_n and M_n recursively as:

$$E_n = a\, E_{n-1} + (1-a)\, x^2[n] \quad \text{(short-time energy)}$$
$$M_n = a\, M_{n-1} + (1-a)\, |x[n]| \quad \text{(short-time magnitude)}$$

• need to compute E_n or M_n every sample and then down-sample to 100/sec rate
• recursive computation has a non-linear phase, so delay cannot be compensated exactly

Short-Time Average ZC Rate

[Figure: zero crossings marked on a waveform]

zero crossing => successive samples have different algebraic signs

• zero crossing rate is a simple measure of the 'frequency content' of a signal, especially true for narrowband signals (e.g., sinusoids)
• sinusoid at frequency F0 with sampling rate FS has FS/F0 samples per cycle with two zero crossings per cycle, giving an average zero crossing rate of
    z1 = (2) crossings/cycle x (F0 / FS) cycles/sample
    z1 = 2F0 / FS crossings/sample (i.e., z1 proportional to F0)
    zM = M (2F0 / FS) crossings/(M samples)

Sinusoid Zero Crossing Rates

Assume the sampling rate is FS = 10,000 Hz

1. F0 = 100 Hz sinusoid has FS/F0 = 10,000/100 = 100 samples/cycle; or z1 = 2/100 crossings/sample, or z100 = 2/100 * 100 = 2 crossings/10 msec interval
2. F0 = 1000 Hz sinusoid has FS/F0 = 10,000/1000 = 10 samples/cycle; or z1 = 2/10 crossings/sample, or z100 = 2/10 * 100 = 20 crossings/10 msec interval
3. F0 = 5000 Hz sinusoid has FS/F0 = 10,000/5000 = 2 samples/cycle; or z1 = 2/2 crossings/sample, or z100 = 2/2 * 100 = 100 crossings/10 msec interval
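The three sinusoid cases can be reproduced by counting sign changes directly. This is a sketch under stated assumptions: the helper names are hypothetical, a small phase offset (0.3 rad) is added so that no sample falls exactly on zero, and the count is taken over 10,000 sample-to-sample transitions (one second at FS = 10,000 Hz).

```python
import math


def sgn(v):
    """sgn(x) = 1 for x >= 0, -1 for x < 0."""
    return 1 if v >= 0 else -1


def count_zero_crossings(x):
    """Number of sign changes between successive samples of x."""
    return sum(abs(sgn(x[m]) - sgn(x[m - 1])) // 2 for m in range(1, len(x)))


def crossings_per_interval(f0, fs, M, n_transitions=10000, phase=0.3):
    """Measured z_M = M * z_1 for a sinusoid of frequency f0 sampled at fs."""
    x = [math.sin(2 * math.pi * f0 * n / fs + phase) for n in range(n_transitions + 1)]
    z1 = count_zero_crossings(x) / n_transitions      # crossings per sample
    return z1 * M


fs, M = 10000, 100                                     # M = 100 samples = 10 msec
measured = {f0: crossings_per_interval(f0, fs, M) for f0 in (100, 1000, 5000)}
predicted = {f0: M * 2 * f0 / fs for f0 in (100, 1000, 5000)}
```

The measured rates agree with z_M = M (2F0/FS), showing that for a narrowband signal the zero crossing count is a direct readout of frequency.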

Zero Crossing for Sinusoids

[Figures: 100 Hz sinewave, ZC=9; 100 Hz sinewave with dc offset (Offset=0.75), ZC=8]

Zero Crossings for Noise

[Figures: random gaussian noise, ZC=252; random gaussian noise with dc offset (Offset=0.75), ZC=122]

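ZC Rate Definitions

$$Z_{\hat{n}} = \frac{1}{2 L_{\text{eff}}} \sum_{m=\hat{n}-L+1}^{\hat{n}} \left| \operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1]) \right| \tilde{w}[\hat{n}-m]$$

$$\operatorname{sgn}(x[n]) = \begin{cases} 1 & x[n] \ge 0 \\ -1 & x[n] < 0 \end{cases}$$

• simple rectangular window:

$$\tilde{w}[n] = \begin{cases} 1 & 0 \le n \le L-1 \\ 0 & \text{otherwise} \end{cases} \qquad L_{\text{eff}} = L$$

Same form for $Z_{\hat{n}}$ as for $E_{\hat{n}}$ or $M_{\hat{n}}$

ZC Normalization

• The formal definition of $z_n$ is:

$$Z_{\hat{n}} = z_1 = \frac{1}{2L} \sum_{m=\hat{n}-L+1}^{\hat{n}} \left| \operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1]) \right|$$

is interpreted as the number of zero crossings per sample.

• For most practical applications, we need the rate of zero crossings per fixed interval of M samples, which is

$$z_M = z_1 \cdot M = \text{rate of zero crossings per } M \text{ sample interval}$$

Thus, for an interval of τ sec., corresponding to M samples, we get

$$z_M = z_1 \cdot M; \qquad M = \tau F_S = \tau / T$$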

ZC and Energy Computation

[Figures: Hamming window with duration L=201 samples (12.5 msec at Fs=16 kHz); Hamming window with duration L=401 samples (25 msec at Fs=16 kHz)]

ZC Normalization

• For a 1000 Hz sinewave as input, using a 40 msec window length (L), with various values of sampling rate (F_S), we get the following:

    F_S      L     z_1    M     z_M
    8000     320   1/4    80    20
    10000    400   1/5    100   20
    16000    640   1/8    160   20

• Thus we see that the normalized (per interval) zero crossing rate, z_M, is independent of the sampling rate and can be used as a measure of the dominant energy in a band.
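The normalization table can be regenerated from the definition of z_1. The sketch below is illustrative: `normalization_row` is a hypothetical helper, the interval length M is taken to be 10 msec (which matches M = 80, 100, 160 in the table), and a phase offset of 0.3 rad is assumed so that no sample is exactly zero.

```python
import math


def sgn(v):
    return 1 if v >= 0 else -1


def zc_rate_per_sample(x):
    """z_1 = (1/(2L)) * sum |sgn(x[m]) - sgn(x[m-1])| over L transitions.

    x must hold L + 1 samples so that every term x[m-1] is available.
    """
    L = len(x) - 1
    return sum(abs(sgn(x[m]) - sgn(x[m - 1])) for m in range(1, L + 1)) / (2 * L)


def normalization_row(fs, f0=1000, window_sec=0.040, interval_sec=0.010, phase=0.3):
    """Return (F_S, L, z_1, M, z_M) for a sinusoid of frequency f0."""
    L = round(window_sec * fs)          # window length in samples
    M = round(interval_sec * fs)        # samples per 10 msec interval
    x = [math.sin(2 * math.pi * f0 * n / fs + phase) for n in range(L + 1)]
    z1 = zc_rate_per_sample(x)
    return fs, L, z1, M, z1 * M


rows = [normalization_row(fs) for fs in (8000, 10000, 16000)]
```

Each row gives z_1 = 2 F0 / F_S, which changes with the sampling rate, while z_M = z_1 · M stays at 20 crossings per 10 msec.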

ZC Rate Distributions

[Figure: ZC rate distributions for voiced and unvoiced speech; frequency axis marked 1 kHz, 2 kHz, 3 kHz, 4 kHz]

Unvoiced Speech: the dominant energy component is at about 2.5 kHz

Voiced Speech: the dominant energy component is at about 700 Hz

• for voiced speech, energy is mainly below 1.5 kHz
• for unvoiced speech, energy is mainly above 1.5 kHz
• ZC rate for unvoiced speech is 49 per 10 msec interval
• mean ZC rate for voiced speech is 14 per 10 msec interval

ZC Rates for Speech

• 15 msec windows
• 100/sec sampling rate on ZC computation

Short-Time Energy, Magnitude, ZC

[Figure: short-time energy, magnitude and zero crossing rate contours]

Issues in ZC Rate Computation

• for zero crossing rate to be accurate, need zero DC in signal => need to remove offsets, hum, noise => use bandpass filter to eliminate DC and hum
• can quantize the signal to 1-bit for computation of ZC rate
• can apply the concept of ZC rate to bandpass filtered speech to give a 'crude' spectral estimate in narrow bands of speech (kind of gives an estimate of the strongest frequency in each narrow band of speech)

Summary of Simple Time Domain Measures

[Figure: s(n) -> Linear Filter -> x(n) -> T[ ] -> T(x[n]) -> $\tilde{w}[n]$ -> $Q_{\hat{n}}$]

$$Q_{\hat{n}} = \sum_{m=-\infty}^{\infty} T(x[m])\, \tilde{w}[\hat{n}-m]$$

1. Energy:

$$E_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} x^2[m]\, \tilde{w}[\hat{n}-m]$$

   • can downsample $E_{\hat{n}}$ at rate commensurate with window bandwidth

2. Magnitude:

$$M_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} |x[m]|\, \tilde{w}[\hat{n}-m]$$

3. Zero Crossing Rate:

$$Z_{\hat{n}} = z_1 = \frac{1}{2L} \sum_{m=\hat{n}-L+1}^{\hat{n}} \left| \operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1]) \right| \tilde{w}[\hat{n}-m]$$

   where sgn(x[m]) = 1 for x[m] ≥ 0 and sgn(x[m]) = -1 for x[m] < 0

Short-Time Autocorrelation

- for a deterministic signal, the autocorrelation function is defined as:

$$\Phi[k] = \sum_{m=-\infty}^{\infty} x[m]\, x[m+k]$$

- for a random or periodic signal, the autocorrelation function is:

$$\Phi[k] = \lim_{L \to \infty} \frac{1}{2L+1} \sum_{m=-L}^{L} x[m]\, x[m+k]$$

- if x[n] = x[n+P], then Φ[k] = Φ[k+P] => the autocorrelation function preserves periodicity
- properties of Φ[k]:
  1. Φ[k] is even, Φ[k] = Φ[-k]
  2. Φ[k] is maximum at k = 0, |Φ[k]| ≤ Φ[0], ∀k
  3. Φ[0] is the signal energy or power (for random signals)

Periodic Signals

• for a periodic signal we have (at least in theory) Φ[P]=Φ[0], so the period of a periodic signal can be estimated as the first non-zero maximum of Φ[k]
  – this means that the autocorrelation function is a good candidate for speech pitch detection algorithms
  – it also means that we need a good way of measuring the short-time autocorrelation function for speech signals

Short-Time Autocorrelation

- a reasonable definition for the short-time autocorrelation is:

$$R_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat{n}-m]\; x[m+k]\, w[\hat{n}-k-m]$$

  1. select a segment of speech by windowing
  2. compute deterministic autocorrelation of the windowed speech

- symmetry:

$$R_{\hat{n}}[k] = R_{\hat{n}}[-k] = \sum_{m=-\infty}^{\infty} x[m]\, x[m-k] \left[ w[\hat{n}-m]\, w[\hat{n}+k-m] \right]$$

- define filter of the form

$$\tilde{w}_k[\hat{n}] = w[\hat{n}]\, w[\hat{n}+k]$$

- this enables us to write the short-time autocorrelation in the form:

$$R_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} x[m]\, x[m-k]\, \tilde{w}_k[\hat{n}-m]$$

- the value of $R_{\hat{n}}[k]$ at time $\hat{n}$ for the k-th lag is obtained by filtering the sequence $x[\hat{n}]\, x[\hat{n}-k]$ with a filter with impulse response $\tilde{w}_k[\hat{n}]$

Short-Time Autocorrelation

$$R_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} \left[ x[m]\, w[\hat{n}-m] \right] \left[ x[m+k]\, w[\hat{n}-k-m] \right]$$

[Figure: window positions for x[m] and x[m+k]; axis labels N-L+1 and n+k-L+1]

$$R_{\hat{n}}[k] = \sum_{m=0}^{L-k-1} \left[ x[\hat{n}+m]\, w'[m] \right] \left[ x[\hat{n}+m+k]\, w'[k+m] \right]$$

=> L-k points used to compute $R_{\hat{n}}[k]$
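The finite-sum form of $R_{\hat{n}}[k]$ translates directly into code, and its largest peak away from lag 0 gives a pitch period estimate. This is a minimal sketch, not a complete pitch detector: the function names, the rectangular default window, the lag search range and the synthetic test signal (a decaying pulse repeated every 72 samples, with an assumed 10 kHz sampling rate) are all assumptions made for illustration.

```python
def short_time_autocorrelation(x, n_hat, L, max_lag, window=None):
    """R[k] = sum_{m=0}^{L-k-1} (x[n_hat+m] w'[m]) (x[n_hat+m+k] w'[m+k]).

    Only L - k products contribute at lag k. The window w' defaults to an
    L-point rectangular window; requires n_hat + L <= len(x), max_lag < L.
    """
    if window is None:
        window = [1.0] * L
    seg = [x[n_hat + m] * window[m] for m in range(L)]
    return [sum(seg[m] * seg[m + k] for m in range(L - k)) for k in range(max_lag + 1)]


def estimate_period(R, min_lag, max_lag):
    """Lag of the largest autocorrelation value in [min_lag, max_lag]."""
    return max(range(min_lag, max_lag + 1), key=lambda k: R[k])


# Synthetic periodic test signal (assumed): decaying pulse every P = 72 samples.
P, fs = 72, 10000
x = [0.9 ** (n % P) for n in range(700)]

R = short_time_autocorrelation(x, n_hat=0, L=401, max_lag=200)
period = estimate_period(R, min_lag=20, max_lag=200)
f0 = fs / period
```

Because fewer products enter the sum as k grows, R[P] comes out smaller than R[0] even for a perfectly periodic input, and the peak at 2P is smaller again.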

Examples

• autocorrelation peaks occur at k=72, 144, ... => 140 Hz pitch
• Φ(P)<Φ(0) since windowed speech is not perfectly periodic
• over a 401 sample window (40 msec of signal), pitch period changes occur, so P is not perfectly defined
• much less clear estimates of periodicity since HW tapers signal so strongly, making it look like a non-periodic signal
• no strong peak for unvoiced speech

Voiced (female) L=401 (magnitude)

[Figure: waveform with pitch period T0 = N0 T, t = nT, and magnitude spectrum with harmonic spacing F0 = 1/T0; frequency axis F/Fs]

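Voiced (female) L=401 (log mag)

[Figure: waveform with pitch period T0, t = nT, and log magnitude spectrum with F0 = 1/T0; frequency axis F/Fs]

Voiced (male) L=401

[Figure: waveform with pitch period T0 and spectrum with the third harmonic marked, 3F0 = 3/T0]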

Unvoiced L=401

[Figures: two unvoiced speech examples, L=401]

Effects of Window Size

• choice of L, window duration
  • small L so pitch period almost constant in window
  • large L so clear periodicity seen in window
• as k increases, the number of window points decreases, reducing the accuracy and size of $R_n(k)$ for large k => have a taper of the type R(k)=1-k/L, |k|<L

[Figures: autocorrelations for L=401 and L=251]

Modified Autocorrelation

• want to solve problem of differing number of samples for each different k term in $R_{\hat{n}}[k]$, so modify definition as follows:

$$\hat{R}_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} x[m]\, w_1[\hat{n}-m]\; x[m+k]\, w_2[\hat{n}-m-k]$$

- where $w_1$ is the standard L-point window, and $w_2$ is an extended window of duration L+K samples, where K is the largest lag of interest
- we can rewrite modified autocorrelation as:

$$\hat{R}_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} x[\hat{n}+m]\, \hat{w}_1[m]\; x[\hat{n}+m+k]\, \hat{w}_2[m+k]$$

- where $\hat{w}_1[m] = w_1[-m]$ and $\hat{w}_2[m] = w_2[-m]$

Examples of Modified Autocorrelation

[Figures: window $\hat{w}_1$ ending at L-1 and extended window $\hat{w}_2$ ending at L+K-1]

- $\hat{R}_{\hat{n}}[k]$ is a cross-correlation, not an auto-correlation
- $\hat{R}_{\hat{n}}[k] \ne \hat{R}_{\hat{n}}[-k]$
- $\hat{R}_{\hat{n}}[k]$ will have a strong peak at k=P for periodic signals and will not fall off for large k
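The property that the modified autocorrelation does not fall off with lag can be checked on a synthetic periodic signal. The sketch below assumes rectangular windows, with $\hat{w}_1$ of length L and $\hat{w}_2$ of length L+K, in which case the sum reduces to L products at every lag; the function name and the test signal (decaying pulse repeated every 72 samples) are illustrative assumptions.

```python
def modified_autocorrelation(x, n_hat, L, K):
    """R_hat[k] = sum_{m=0}^{L-1} x[n_hat+m] * x[n_hat+m+k], for k = 0..K.

    Corresponds to a rectangular L-point window on the first factor and a
    rectangular (L+K)-point window on the second, so every lag uses exactly
    L products. Requires n_hat + L + K <= len(x).
    """
    return [sum(x[n_hat + m] * x[n_hat + m + k] for m in range(L)) for k in range(K + 1)]


# Synthetic periodic test signal (assumed): decaying pulse every P = 72 samples.
P = 72
x = [0.9 ** (n % P) for n in range(700)]

R_hat = modified_autocorrelation(x, n_hat=0, L=401, K=200)
```

For this exactly periodic input the values at lags P and 2P equal the value at lag 0, in contrast with the ordinary short-time autocorrelation, whose peaks shrink as the lag grows.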

Examples of Modified AC

[Figures: Modified Autocorrelations, fixed value of L=401; Modified Autocorrelations, values of L=401, 251, 125]

Short-Time AMDF

- belief that for periodic signals of period P, the difference function

$$d[n] = x[n] - x[n-k]$$

- will be approximately zero for k = 0, ±P, ±2P, ... For realistic speech signals, d[n] will be small at k=P, but not zero. Based on this reasoning, the short-time Average Magnitude Difference Function (AMDF) is defined as:

$$\gamma_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} \left| x[\hat{n}+m]\, \hat{w}_1[m] - x[\hat{n}+m-k]\, \hat{w}_2[m-k] \right|$$

- with $\hat{w}_1[m]$ and $\hat{w}_2[m]$ being rectangular windows. If both are the same length, then $\gamma_{\hat{n}}[k]$ is similar to the short-time autocorrelation, whereas if $\hat{w}_2[m]$ is longer than $\hat{w}_1[m]$, then $\gamma_{\hat{n}}[k]$ is similar to the modified short-time autocorrelation function. In fact it can be shown that

$$\gamma_{\hat{n}}[k] \approx \sqrt{2}\, \beta[k] \left[ \hat{R}_{\hat{n}}[0] - \hat{R}_{\hat{n}}[k] \right]^{1/2}$$

- where β[k] varies between 0.6 and 1.0 for different segments of speech.
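The AMDF can be sketched in the same style as the autocorrelation examples. The code below is an illustration under stated assumptions: rectangular windows with the second one long enough that every term x[n_hat+m-k] is available (so n_hat must be at least the largest lag), hypothetical function names, and a synthetic test signal with period 72 samples. The pitch period shows up as a dip rather than a peak.

```python
def amdf(x, n_hat, L, max_lag):
    """gamma[k] = sum_{m=0}^{L-1} |x[n_hat+m] - x[n_hat+m-k]|, k = 0..max_lag.

    Rectangular windows are assumed, with the second window long enough to
    cover all shifted samples. Requires n_hat >= max_lag and n_hat + L <= len(x).
    """
    return [sum(abs(x[n_hat + m] - x[n_hat + m - k]) for m in range(L))
            for k in range(max_lag + 1)]


def estimate_period_amdf(gamma, min_lag, max_lag):
    """Lag of the smallest AMDF value; ties resolve to the smallest lag."""
    return min(range(min_lag, max_lag + 1), key=lambda k: gamma[k])


# Synthetic periodic test signal (assumed): decaying pulse every P = 72 samples.
P = 72
x = [0.9 ** (n % P) for n in range(700)]

gamma = amdf(x, n_hat=200, L=401, max_lag=200)
period = estimate_period_amdf(gamma, min_lag=20, max_lag=200)
```

Only subtractions and absolute values are needed, which is the practical appeal of the AMDF compared with the multiplications required by the autocorrelation.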

AMDF for Speech Segments

[Figure: AMDF for several speech segments]

Summary

• Short-time parameters in the time domain:
  – short-time energy
  – short-time average magnitude
  – short-time zero crossing rate
  – short-time autocorrelation
  – modified short-time autocorrelation
  – short-time average magnitude difference function