ECE6255 Digital Processing of Speech Signals

Lectures 16-18: Homomorphic Speech Processing

Chin-Hui Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, GA 30332, USA

Center of Signal and Image Processing, Georgia Institute of Technology

Basic Speech Model

• A short segment of speech can be modeled as the output of an LTI system excited by either a quasi-periodic impulse train or a random noise signal
• Speech analysis => estimate the parameters of the speech model and measure their variations with time (and perhaps even their statistical variabilities, for quantization)

• Speech = excitation * system response => want to deconvolve speech into excitation and system response => do this using homomorphic filtering methods

ECE6255 Spring 2010

Superposition Principle

$$x(n) = a\,x_1(n) + b\,x_2(n)$$

$$y(n) = L[x(n)] = a\,L[x_1(n)] + b\,L[x_2(n)]$$

Generalized Superposition for Convolution

• for LTI systems we have the result

$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k)$$

• "generalized" superposition => addition replaced by convolution

$$x(n) = x_1(n) * x_2(n)$$

$$y(n) = H[x(n)] = H[x_1(n)] * H[x_2(n)]$$

• $H[\cdot]$ is a homomorphic system for convolution

Homomorphic Filter

• homomorphic filter => homomorphic system $H[\cdot]$ that passes the desired signal unaltered, while removing the undesired signal

$$x(n) = x_1(n) * x_2(n)$$

with $x_1(n)$ the "undesired" signal

$$H[x(n)] = H[x_1(n)] * H[x_2(n)]$$

$$H[x_1(n)] \rightarrow \delta(n) \quad \text{(removal of } x_1(n)\text{)}$$

$$H[x_2(n)] \rightarrow x_2(n)$$

$$H[x(n)] = \delta(n) * x_2(n) = x_2(n)$$

• for linear systems this is analogous to additive noise removal

Canonic Form for Homomorphic Convolution

• any homomorphic system can be represented as a cascade of three systems, e.g., for convolution
1. a system that takes inputs combined by convolution and transforms them into additive outputs
2. a conventional linear system
3. the inverse of the first system, which takes additive inputs and transforms them into convolutional outputs

Canonic Form for Homomorphic Convolution

$$x(n) = x_1(n) * x_2(n) \quad \text{(convolutional relation)}$$

$$\hat{x}(n) = D_*[x(n)] = \hat{x}_1(n) + \hat{x}_2(n) \quad \text{(additive relation)}$$

$$\hat{y}(n) = L[\hat{x}_1(n) + \hat{x}_2(n)] = \hat{y}_1(n) + \hat{y}_2(n) \quad \text{(conventional linear system)}$$

$$y(n) = D_*^{-1}[\hat{y}_1(n) + \hat{y}_2(n)] = y_1(n) * y_2(n) \quad \text{(inverse of convolutional relation)}$$

=> the design problem is converted back to that of a linear system, $L$

• $D_*[\cdot]$ is fixed (called the characteristic system for homomorphic deconvolution)
• $D_*^{-1}[\cdot]$ is fixed (the characteristic system for inverse homomorphic deconvolution)

Properties of Characteristic Systems

$$\hat{x}(n) = D_*[x(n)] = D_*[x_1(n) * x_2(n)] = D_*[x_1(n)] + D_*[x_2(n)] = \hat{x}_1(n) + \hat{x}_2(n)$$

$$y(n) = D_*^{-1}[\hat{y}(n)] = D_*^{-1}[\hat{y}_1(n) + \hat{y}_2(n)] = D_*^{-1}[\hat{y}_1(n)] * D_*^{-1}[\hat{y}_2(n)] = y_1(n) * y_2(n)$$

Domain Canonic Form

• need to find a system that converts convolution to addition

$$x(n) = x_1(n) * x_2(n)$$

$$X(z) = X_1(z) \cdot X_2(z)$$

• since

$$D_*[x(n)] = \hat{x}_1(n) + \hat{x}_2(n) = \hat{x}(n)$$

we need

$$D[X(z)] = \hat{X}_1(z) + \hat{X}_2(z) = \hat{X}(z)$$

=> use the log function, which converts products to sums

$$\hat{X}(z) = \log[X(z)] = \log[X_1(z) \cdot X_2(z)] = \log[X_1(z)] + \log[X_2(z)] = \hat{X}_1(z) + \hat{X}_2(z)$$

$$\hat{Y}(z) = L[\hat{X}_1(z) + \hat{X}_2(z)] = \hat{Y}_1(z) + \hat{Y}_2(z)$$

$$Y(z) = \exp[\hat{Y}_1(z) + \hat{Y}_2(z)] = Y_1(z) \cdot Y_2(z)$$

Characteristic Systems for Homomorphic Deconvolution

$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\,z^{-n}$$

$$\hat{X}(z) = \log[X(z)] = \log|X(z)| + j\arg[X(z)]$$

$$\hat{x}(n) = \frac{1}{2\pi j}\oint_C \hat{X}(z)\,z^{n-1}\,dz$$

Characteristic Systems for Homomorphic Deconvolution

$$\hat{Y}(z) = \sum_{n=-\infty}^{\infty} \hat{y}(n)\,z^{-n}$$

$$Y(z) = \exp[\hat{Y}(z)]$$

$$y(n) = \frac{1}{2\pi j}\oint_C Y(z)\,z^{n-1}\,dz$$

Issues with the Logarithm

• it is essential that the logarithm obey the equation

$$\log[X_1(z) \cdot X_2(z)] = \log[X_1(z)] + \log[X_2(z)]$$

• this is trivial if $X_1(z)$ and $X_2(z)$ are real and positive; however, $X_1(z)$ and $X_2(z)$ are usually complex
• consider evaluating $X(z)$ only on the unit circle, i.e., $z = e^{j\omega}$; then the complex log can be written in the form:

$$X(e^{j\omega}) = |X(e^{j\omega})|\,e^{j\arg[X(e^{j\omega})]}$$

$$\log[X(e^{j\omega})] = \hat{X}(e^{j\omega}) = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$

• there are no problems with the log magnitude term; uniqueness problems arise in defining the imaginary part of the log; it can be shown that the imaginary part (the phase angle of the z-transform) needs to be a continuous odd function of $\omega$

Complex and Real Cepstrum

• define the inverse transform of $\hat{X}(e^{j\omega})$ as

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega$$

• $\hat{x}(n)$ is called the "complex cepstrum" since a complex logarithm is involved in the computation
• can also define a "real cepstrum" using just the real part of the logarithm, giving

$$c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Re}[\hat{X}(e^{j\omega})]\,e^{j\omega n}\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(e^{j\omega})|\,e^{j\omega n}\,d\omega$$

• can show that $c(n)$ is the even part of $\hat{x}(n)$

Representation: The Complex Cepstrum

[Block diagram: the characteristic system $D_*[\cdot]$ maps $x(n)$ through $F\{\cdot\}$ to $X(e^{j\omega})$, through $\log\{\cdot\}$ to $\hat{X}(e^{j\omega})$, and through $F^{-1}\{\cdot\}$ to $\hat{x}(n)$]

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} = |X(e^{j\omega})|\,e^{j\arg[X(e^{j\omega})]}$$

$$\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega$$

The Complex Cepstrum-DFT Implementation

$$X_p(k) = X(e^{j\frac{2\pi}{N}k}) = \sum_{n} x(n)\,e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \ldots, N-1$$

$X_p(k)$ is the $N$-point DFT corresponding to $X(e^{j\omega})$

$$\hat{X}_p(k) = \log[X_p(k)] = \log|X_p(k)| + j\arg[X_p(k)]$$

$$\hat{x}_p(n) = \frac{1}{N}\sum_{k=0}^{N-1} \hat{X}_p(k)\,e^{j\frac{2\pi}{N}kn} = \sum_{r=-\infty}^{\infty} \hat{x}(n + rN), \quad n = 0, 1, \ldots, N-1$$

• $\hat{x}_p(n)$ is an aliased version of $\hat{x}(n)$
• use as large a value of $N$ as possible to minimize aliasing

The Cepstrum-DFT Implementation

$$X_p(k) = X(e^{j\frac{2\pi}{N}k}) = \sum_{n} x(n)\,e^{-j\frac{2\pi}{N}kn}, \quad k = 0, 1, \ldots, N-1$$

$$X_p(k) = |X_p(k)|\,e^{j\arg[X_p(k)]}$$

$$C(e^{j\omega}) = \log|X(e^{j\omega})|$$

$$C_p(k) = \log|X_p(k)| = \log|X(e^{j\frac{2\pi}{N}k})|$$

$$c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} C(e^{j\omega})\,e^{j\omega n}\,d\omega = \frac{\hat{x}(n) + \hat{x}(-n)}{2}$$

$$c_p(n) = \frac{1}{N}\sum_{k=0}^{N-1} C_p(k)\,e^{j\frac{2\pi}{N}kn} = \sum_{r=-\infty}^{\infty} c(n + rN), \quad n = 0, 1, \ldots, N-1$$

• $c_p(n)$ is an aliased version of $c(n)$
• use as large a value of $N$ as possible to minimize aliasing

Computational Considerations

• consider only absolutely summable input sequences so that the z-transform region of convergence includes the unit circle => the sequence has a FT
• for finite length sequences, the characteristic system for convolution can be represented using FT's, giving

$$X(e^{j\omega}) = \sum_{n=0}^{N-1} x(n)\,e^{-j\omega n}$$

$$\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega$$

• need to have the log magnitude be an even function of $\omega$
• need to have the phase be a continuous periodic function of $\omega$ with period $2\pi$
• need to approximate the FT by the DFT (works fine for finite length sequences)

Finite Length Sequences

• for finite length sequences all computations can be done using DFT's

$$X_p(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j\frac{2\pi}{N}kn}, \quad 0 \le k \le N-1$$

$$\hat{X}_p(k) = \log[X_p(k)], \quad 0 \le k \le N-1$$

$$\hat{x}_p(n) = \frac{1}{N}\sum_{k=0}^{N-1} \hat{X}_p(k)\,e^{j\frac{2\pi}{N}kn}, \quad 0 \le n \le N-1$$

• this is not the exact complex cepstrum: the complex log used in the DFT calculations is a sampled version of $\hat{X}(e^{j\omega})$, so the resulting inverse transform is an aliased version of the true complex cepstrum, i.e.,

$$\hat{x}_p(n) = \sum_{r=-\infty}^{\infty} \hat{x}(n + rN)$$

Cepstrum

• the complex cepstrum involves the use of the complex log and the cepstrum involves only the log magnitude, i.e.,

$$c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(e^{j\omega})|\,e^{j\omega n}\,d\omega, \quad -\infty < n < \infty$$

• an approximation to the cepstrum (using an inverse DFT) is

$$c_p(n) = \frac{1}{N}\sum_{k=0}^{N-1} \log|X_p(k)|\,e^{j\frac{2\pi}{N}kn}, \quad 0 \le n \le N-1$$

• the relation to the true cepstrum is

$$c_p(n) = \sum_{r=-\infty}^{\infty} c(n + rN)$$

• need to use large values of $N$ (512 or greater) in computing $\hat{x}_p(n)$ and $c_p(n)$ to minimize aliasing
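The DFT-based definitions of $\hat{x}_p(n)$ and $c_p(n)$ can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference implementation: the function names, the use of `np.unwrap` for the continuous phase, and the removal of an integer linear-phase term are assumptions made here. The short test sequence $x(n) = \delta(n) - 0.9\,\delta(n-1)$, whose complex cepstrum is $-0.9^n/n$ for $n > 0$, is also chosen only for illustration.

```python
import numpy as np

def complex_cepstrum(x, n_fft):
    """Aliased complex cepstrum: DFT -> complex log -> inverse DFT (n_fft even)."""
    X = np.fft.fft(x, n_fft)
    phase = np.unwrap(np.angle(X))                # continuous phase, not the principal value
    half = n_fft // 2
    delay = int(np.round(-phase[half] / np.pi))   # integer time shift seen as linear phase
    phase = phase + np.pi * delay * np.arange(n_fft) / half   # remove the linear-phase term
    x_hat = np.fft.ifft(np.log(np.abs(X)) + 1j * phase).real
    return x_hat, delay

def real_cepstrum(x, n_fft):
    """Aliased (real) cepstrum: inverse DFT of the log magnitude only."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, n_fft)))).real

x = np.array([1.0, -0.9])                         # single zero at z = 0.9 (inside the unit circle)
exact_1 = -0.9                                    # true complex cepstrum at n = 1
for n_fft in (8, 1024):
    x_hat, _ = complex_cepstrum(x, n_fft)
    print(n_fft, abs(x_hat[1] - exact_1))         # aliasing error shrinks as N grows
```

With the small DFT size the $n = 1$ sample absorbs the aliased terms $\hat{x}(1 + rN)$, which is why the slides recommend a large $N$.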

z-Transform Cepstrum Analysis

The main z-transform formula for cepstrum analysis is based on the power series expansion:

$$\log(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,x^n, \quad |x| < 1$$

• Example 1 --apply this formula to the exponential sequence

$$x_1(n) = a^n u(n) \;\Leftrightarrow\; X_1(z) = \frac{1}{1 - a z^{-1}}$$

$$\hat{X}_1(z) = \log[X_1(z)] = -\log(1 - a z^{-1}) = -\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,(-a)^n z^{-n} = \sum_{n=1}^{\infty} \frac{a^n}{n}\,z^{-n}$$

$$\hat{x}_1(n) = \frac{a^n}{n}\,u(n-1) \;\Leftrightarrow\; \hat{X}_1(z) = -\log(1 - a z^{-1})$$

z-Transform Cepstrum Analysis

• Example 2 --consider the case of a digital system with a single zero outside the unit circle ($|b| < 1$)

$$x_2(n) = \delta(n) + b\,\delta(n+1)$$

$$X_2(z) = 1 + bz \quad \text{(zero at } z = -1/b\text{)}$$

$$\hat{X}_2(z) = \log[X_2(z)] = \log(1 + bz) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,b^n z^n$$

Since the $z^{m}$ term of this series corresponds to time index $n = -m$, the complex cepstrum is

$$\hat{x}_2(n) = \frac{(-1)^{n}\,b^{-n}}{n}\,u(-n-1)$$

so that, for example, $\hat{x}_2(-1) = b$.

z-Transform Cepstrum Analysis

• consider digital systems with rational z-transforms of the general type

$$X(z) = \frac{A\,z^{r}\,\prod_{k=1}^{M_i}(1 - a_k z^{-1})\,\prod_{k=1}^{M_0}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1})\,\prod_{k=1}^{N_0}(1 - d_k z)}$$

• with all coefficients $|a_k|, |b_k|, |c_k|, |d_k| < 1$ => all $c_k$ poles and $a_k$ zeros are inside the unit circle; all $d_k$ poles and $b_k$ zeros are outside the unit circle; the factor $z^{r}$ is just a shift of the time origin and has no effect

z-Transform Cepstrum Analysis

• the complex logarithm of $X(z)$ is

$$\hat{X}(z) = \log[X(z)] = \log|A| + \log[z^{r}] + \sum_{k=1}^{M_i}\log(1 - a_k z^{-1}) + \sum_{k=1}^{M_0}\log(1 - b_k z) - \sum_{k=1}^{N_i}\log(1 - c_k z^{-1}) - \sum_{k=1}^{N_0}\log(1 - d_k z)$$

• evaluating $\hat{X}(z)$ on the unit circle, we can ignore the term related to $\log[e^{j\omega r}]$ (as this contributes only to the imaginary part and is a linear phase shift)

z-Transform Cepstrum Analysis

• we can then evaluate the remaining terms, use the power series expansion for the logarithmic terms, and take the inverse transform to give the complex cepstrum:

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega = \begin{cases} \log|A|, & n = 0 \\[4pt] \displaystyle\sum_{k=1}^{N_i}\frac{c_k^{\,n}}{n} - \sum_{k=1}^{M_i}\frac{a_k^{\,n}}{n}, & n > 0 \\[4pt] \displaystyle\sum_{k=1}^{M_0}\frac{b_k^{\,-n}}{n} - \sum_{k=1}^{N_0}\frac{d_k^{\,-n}}{n}, & n < 0 \end{cases}$$
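As a rough numerical check of the closed-form expression, the sketch below evaluates it from lists of pole and zero parameters and compares it with an inverse DFT of the sampled complex log. The helper name `cepstrum_from_roots`, its argument layout, and the example root values are assumptions chosen for illustration; the roots are picked so that the total phase stays inside $(-\pi, \pi)$ and the principal-value log needs no unwrapping.

```python
import numpy as np

def cepstrum_from_roots(n, A=1.0, a=(), b=(), c=(), d=()):
    """Closed-form complex cepstrum at integer index n.

    a, c: zeros / poles inside the unit circle, as factors (1 - a_k z^-1), (1 - c_k z^-1)
    b, d: zeros / poles outside the unit circle, as factors (1 - b_k z), (1 - d_k z)
    All parameters are assumed to have magnitude < 1.
    """
    a, b, c, d = (np.asarray(v, dtype=complex) for v in (a, b, c, d))
    if n == 0:
        return float(np.log(abs(A)))
    if n > 0:
        return float((np.sum(c ** n) - np.sum(a ** n)).real / n)
    return float((np.sum(b ** (-n)) - np.sum(d ** (-n))).real / n)

# Hypothetical example: one zero inside (0.5), one zero outside (b = 0.4), one pole inside (0.8)
N = 2048
w = 2 * np.pi * np.arange(N) / N
X = (1 - 0.5 * np.exp(-1j * w)) * (1 - 0.4 * np.exp(1j * w)) / (1 - 0.8 * np.exp(-1j * w))
x_hat = np.fft.ifft(np.log(X)).real           # negative n is stored at index N + n

for n in (1, 2, -1, -2):
    print(n, x_hat[n], cepstrum_from_roots(n, a=[0.5], b=[0.4], c=[0.8]))
```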

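Summary

1. Homomorphic System for Convolution:

[Block diagram: $x(n) \rightarrow Z(\cdot) \rightarrow X(z) \rightarrow \log(\cdot) \rightarrow \hat{X}(z) \rightarrow L(\cdot) \rightarrow \hat{Y}(z) \rightarrow \exp(\cdot) \rightarrow Y(z) \rightarrow Z^{-1}(\cdot) \rightarrow y(n)$, where $L(\cdot)$ is realized as $\hat{X}(z) \rightarrow Z^{-1}(\cdot) \rightarrow \hat{x}(n) \rightarrow \text{Linear System} \rightarrow \hat{y}(n) \rightarrow Z(\cdot) \rightarrow \hat{Y}(z)$]

2. Practical Case: $Z(\cdot) \rightarrow$ DFT, $Z^{-1}(\cdot) \rightarrow$ IDFT

$$X(e^{j\omega}) = |X(e^{j\omega})|\,e^{j\arg[X(e^{j\omega})]}$$

$$\log[X(e^{j\omega})] = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$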

Summary

3. Complex Cepstrum:

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega$$

4. Cepstrum:

$$c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(e^{j\omega})|\,e^{j\omega n}\,d\omega$$

$$c(n) = \frac{\hat{x}(n) + \hat{x}(-n)}{2} = \text{even part of } \hat{x}(n)$$

Summary

[Block diagram: $x(n) \rightarrow \text{DFT} \rightarrow X_p(k) \rightarrow \text{Complex Log} \rightarrow \hat{X}_p(k) \rightarrow \text{IDFT} \rightarrow \hat{x}_p(n)$]

5. Practical Implementation of Complex Cepstrum:

$$X_p(k) = X(e^{j(2\pi/N)k}) = \sum_{n} x(n)\,e^{-j(2\pi/N)kn}$$

$$\hat{X}_p(k) = \log[X_p(k)] = \log|X_p(k)| + j\arg[X_p(k)]$$

$$\hat{x}_p(n) = \frac{1}{N}\sum_{k=0}^{N-1} \hat{X}_p(k)\,e^{j(2\pi/N)kn} = \sum_{r=-\infty}^{\infty} \hat{x}(n + rN) \quad \text{(aliasing)}$$

$\hat{x}_p(n)$ is an aliased version of $\hat{x}(n)$

6. Examples:

$$X(z) = \frac{1}{1 - a z^{-1}} \;\Leftrightarrow\; \hat{x}(n) = \sum_{r=1}^{\infty} \frac{a^r}{r}\,\delta(n - r)$$

$$X(z) = 1 - bz \;\Leftrightarrow\; \hat{x}(n) = -\sum_{r=1}^{\infty} \frac{b^r}{r}\,\delta(n + r)$$

z-Transform Cepstrum Analysis for 2 Pulses

• Example 3 --an input sequence of two pulses of the form

$$x_3(n) = \delta(n) + \alpha\,\delta(n - N_p) \quad (0 < \alpha < 1)$$

$$X_3(z) = 1 + \alpha z^{-N_p}$$

$$\hat{X}_3(z) = \log[X_3(z)] = \log(1 + \alpha z^{-N_p}) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,\alpha^n z^{-nN_p}$$

$$\hat{x}_3(n) = \sum_{k=1}^{\infty} (-1)^{k+1}\,\frac{\alpha^k}{k}\,\delta(n - kN_p)$$

• the cepstrum is an impulse train with impulses spaced at $N_p$ samples

[Figure: impulses of $\hat{x}_3(n)$ at $n = N_p, 2N_p, 3N_p, \ldots$]
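The two-pulse result is easy to confirm numerically. The sketch below uses illustrative values ($\alpha = 0.7$, $N_p = 15$, $N = 1024$, all chosen here rather than taken from this slide) and relies on the fact that for $0 < \alpha < 1$ the real part of $X_3(e^{j\omega})$ is positive, so the principal-value complex log can be used directly without phase unwrapping.

```python
import numpy as np

alpha, Np, N = 0.7, 15, 1024           # illustrative values
x3 = np.zeros(Np + 1)
x3[0], x3[Np] = 1.0, alpha             # x3(n) = delta(n) + alpha * delta(n - Np)

X3 = np.fft.fft(x3, N)
x3_hat = np.fft.ifft(np.log(X3)).real  # Re{X3} > 0, so no phase wrapping occurs

k = np.arange(1, 5)
predicted = (-1.0) ** (k + 1) * alpha ** k / k
print(x3_hat[k * Np])                  # samples at Np, 2Np, 3Np, 4Np
print(predicted)                       # closed-form impulse weights
```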

Cepstrum for Train of Impulses

• an important special case is a train of impulses of the form:

$$x(n) = \sum_{r=0}^{M} \alpha_r\,\delta(n - rN_p)$$

$$X(z) = \sum_{r=0}^{M} \alpha_r\,z^{-rN_p}$$

• clearly $X(z)$ is a polynomial in $z^{-N_p}$ rather than $z^{-1}$; thus $X(z)$ can be expressed as a product of factors of the form $(1 - a z^{-N_p})$ and $(1 - b z^{N_p})$, giving a complex cepstrum, $\hat{x}(n)$, that is non-zero only at integer multiples of $N_p$

z-Transform Cepstrum Analysis for Convolution of Two Sequences

• Example 4 --consider the convolution of sequences 1 and 3, i.e.,

$$x_4(n) = x_1(n) * x_3(n) = a^n u(n) * [\delta(n) + \alpha\,\delta(n - N_p)] = a^n u(n) + \alpha\,a^{n - N_p} u(n - N_p)$$

• The complex cepstrum is therefore the sum of the complex cepstra of the two sequences (since convolution in the time domain is converted to addition in the cepstral domain)

$$\hat{x}_4(n) = \hat{x}_1(n) + \hat{x}_3(n) = \frac{a^n}{n}\,u(n-1) + \sum_{k=1}^{\infty} \frac{(-1)^{k+1}\alpha^k}{k}\,\delta(n - kN_p)$$
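The conversion of convolution into addition can be checked directly. In this sketch the values $a = 0.9$, $\alpha = 0.7$, $N_p = 15$, the truncation of $a^n u(n)$ to 300 samples, and the DFT length are all assumptions made for illustration. Both sequences are minimum phase and their combined phase stays inside $(-\pi, \pi)$, so the principal-value complex log is sufficient here.

```python
import numpy as np

def cepstrum_principal(x, n_fft):
    """Complex cepstrum via the principal-value log (valid only if the phase never wraps)."""
    return np.fft.ifft(np.log(np.fft.fft(x, n_fft))).real

a, alpha, Np, N = 0.9, 0.7, 15, 4096   # illustrative values
x1 = a ** np.arange(300)               # a^n u(n), truncated where a^n is negligible
x3 = np.zeros(Np + 1)
x3[0], x3[Np] = 1.0, alpha             # delta(n) + alpha * delta(n - Np)
x4 = np.convolve(x1, x3)               # x4 = x1 * x3 (linear convolution, shorter than N)

x1_hat, x3_hat, x4_hat = (cepstrum_principal(v, N) for v in (x1, x3, x4))
print(np.max(np.abs(x4_hat - (x1_hat + x3_hat))))   # cepstra add, up to rounding
print(x4_hat[Np], a ** Np / Np + alpha)              # decaying term plus first pitch impulse
```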

z-Transform Cepstrum Analysis for Convolution of 3 Sequences

• Example 5 --consider the convolution of sequences 1, 2 and 3, i.e.,

$$x_5(n) = x_1(n) * x_2(n) * x_3(n) = a^n u(n) * [\delta(n) + b\,\delta(n+1)] * [\delta(n) + \alpha\,\delta(n - N_p)]$$

$$x_5(n) = a^n u(n) + \alpha\,a^{n - N_p} u(n - N_p) + b\,a^{n+1} u(n+1) + \alpha b\,a^{n - N_p + 1} u(n - N_p + 1)$$

• The complex cepstrum is therefore the sum of the complex cepstra of the three sequences

$$\hat{x}_5(n) = \hat{x}_1(n) + \hat{x}_2(n) + \hat{x}_3(n) = \frac{a^n}{n}\,u(n-1) + \sum_{k=1}^{\infty} \frac{(-1)^{k+1}\alpha^k}{k}\,\delta(n - kN_p) + \frac{(-1)^{n}\,b^{-n}}{n}\,u(-n-1)$$

Example: a = 0.9, b = 0.8, α = 0.7, Np = 15

$$X(z) = \frac{(1 + bz)\,(1 + \alpha z^{-N_p})}{1 - a z^{-1}}$$

[Figure: complex cepstrum of this example, with three labeled components: $a^n/n$ for $n > 0$; $(-1)^{n} b^{-n}/n$ for $n < 0$; and the impulses $\sum_{k=1}^{\infty} (-1)^{k+1}\frac{\alpha^k}{k}\,\delta[n - kN_p]$]

Properties of the Complex Cepstrum

• in general, the complex cepstrum is non-zero and of infinite extent for both positive and negative $n$, even when $x(n)$ is causal, stable, and even of finite duration
• the complex cepstrum is a decaying sequence which is bounded by

$$|\hat{x}(n)| < \beta\,\frac{\alpha^{|n|}}{|n|} \quad \text{for } |n| \rightarrow \infty$$

• where $\alpha$ is the maximum absolute value of the quantities $a_k, b_k, c_k, d_k$ and $\beta$ is a constant multiplier
• if $X(z)$ has no poles or zeros outside the unit circle ($b_k = d_k = 0$) then $\hat{x}(n) = 0$ for $n < 0$ => such signals are called "minimum phase" signals

Relation of Cepstrum to Complex Cepstrum

• for the sequence $x(n)$ we have:

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} = |X(e^{j\omega})|\,e^{j\arg[X(e^{j\omega})]}$$

$$\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$

• giving the complex cepstrum

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\,e^{j\omega n}\,d\omega$$

• with cepstrum

$$c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(e^{j\omega})|\,e^{j\omega n}\,d\omega$$

• because of the symmetries of $\log|X(e^{j\omega})|$ (even) and $j\arg[X(e^{j\omega})]$ (odd), along with the symmetries of $\cos(\omega n)$ (even) and $\sin(\omega n)$ (odd), it can easily be shown that

$$c(n) = \frac{\hat{x}(n) + \hat{x}(-n)}{2} = \mathrm{Ev}[\hat{x}(n)]$$

The Phase Issue in Computation

In order for the complex cepstrum to be computable, we need

$$\arg[X(e^{j\omega})] = \arg[X_1(e^{j\omega})] + \arg[X_2(e^{j\omega})]$$

when

$$X(e^{j\omega}) = X_1(e^{j\omega}) \cdot X_2(e^{j\omega})$$

The principal value of the phase, ARG, does not satisfy this additivity in general:

$$\mathrm{ARG}[X(e^{j\omega})] \ne \mathrm{ARG}[X_1(e^{j\omega})] + \mathrm{ARG}[X_2(e^{j\omega})]$$

The “Phase Unwrapping” Problem
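A small numerical sketch makes the problem concrete. The two frequency responses below are hypothetical, picked only because their phases exceed $\pi$ in magnitude, and `np.unwrap` stands in for a phase-unwrapping algorithm; the slides do not say which unwrapping method is used.

```python
import numpy as np

w = np.linspace(0.0, np.pi, 1024)                        # dense grid so unwrapping is reliable
X1 = np.exp(-2j * w) * (1 - 0.9 * np.exp(-1j * w))       # hypothetical factor with delay
X2 = np.exp(-1j * w) * (1 + 0.5 * np.exp(-1j * w))       # second hypothetical factor
X = X1 * X2

ARG = np.angle                                           # principal value in (-pi, pi]
principal_adds = np.allclose(ARG(X), ARG(X1) + ARG(X2))
unwrapped_adds = np.allclose(np.unwrap(ARG(X)),
                             np.unwrap(ARG(X1)) + np.unwrap(ARG(X2)))
print("principal values add:", principal_adds)
print("unwrapped phases add:", unwrapped_adds)
```

On a grid this dense the unwrapped phases add while the principal values do not, which is the property the complex logarithm needs.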

Cepstrum for Minimum Phase Signals

• for minimum phase signals (no poles or zeros outside the unit circle) the complex cepstrum can be completely represented by the real part of its Fourier transform
• this means we can represent the complex cepstrum of minimum phase signals by the log of the magnitude of the FT alone
• since the real part of the FT is the FT of the even part of the sequence

$$\mathrm{Re}[\hat{X}(e^{j\omega})] = FT\left[\frac{\hat{x}(n) + \hat{x}(-n)}{2}\right]$$

$$FT[c(n)] = \log|X(e^{j\omega})|$$

$$c(n) = \frac{\hat{x}(n) + \hat{x}(-n)}{2}$$

• giving

$$\hat{x}(n) = \begin{cases} 0, & n < 0 \\ c(n), & n = 0 \\ 2c(n), & n > 0 \end{cases}$$

• thus the complex cepstrum (for minimum phase signals) can be computed by computing the cepstrum and using the equation above
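The folding rule above translates into a few lines of NumPy. This is a sketch under stated assumptions: the function name is invented here, the DFT length must be large enough for aliasing to be negligible, and the test signal $0.7^n u(n)$ (truncated to 64 samples) is an illustrative minimum phase sequence whose complex cepstrum is $0.7^n/n$ for $n > 0$.

```python
import numpy as np

def min_phase_complex_cepstrum(x, n_fft):
    """Complex cepstrum of a minimum phase sequence from its real cepstrum alone."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(x, n_fft)))).real   # real cepstrum c(n)
    half = n_fft // 2
    x_hat = np.zeros(n_fft)            # entries for n < 0 (upper half) stay zero
    x_hat[0] = c[0]                    # x_hat(0) = c(0)
    x_hat[1:half] = 2.0 * c[1:half]    # x_hat(n) = 2 c(n) for n > 0
    return x_hat

x = 0.7 ** np.arange(64)               # illustrative minimum phase sequence
x_hat = min_phase_complex_cepstrum(x, 1024)
n = np.arange(1, 6)
print(x_hat[n])
print(0.7 ** n / n)                    # expected a^n / n
```

No phase computation is needed, which is the practical attraction of the minimum phase case.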

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

• the complex cepstrum for minimum phase signals can be computed recursively from the input signal, $x(n)$, using the relation

$$\hat{x}(n) = \begin{cases} 0, & n < 0 \\ \log[x(0)], & n = 0 \\ \displaystyle\frac{x(n)}{x(0)} - \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}(k)\,\frac{x(n-k)}{x(0)}, & n > 0 \end{cases}$$

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

1. basic z-transform: $x(n) \Leftrightarrow X(z)$
2. scale by $n$ rule: $n\,x(n) \Leftrightarrow -z\,\dfrac{dX(z)}{dz} = -z X'(z)$
3. definition of complex cepstrum: $\hat{x}(n) \Leftrightarrow \hat{X}(z) = \log[X(z)]$
4. differentiation of z-transform: $\dfrac{d\hat{X}(z)}{dz} = \dfrac{d}{dz}\log[X(z)] = \dfrac{X'(z)}{X(z)}$
5. multiply both sides of the equation by $-z X(z)$: $-z\,\dfrac{d\hat{X}(z)}{dz}\,X(z) = -z X'(z)$

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

$$n\,\hat{x}(n) * x(n) \;\Leftrightarrow\; -z\,\frac{d\hat{X}(z)}{dz}\,X(z) = -z X'(z) \;\Leftrightarrow\; n\,x(n)$$

$$x(n) = \sum_{k=-\infty}^{\infty} \left(\frac{k}{n}\right) \hat{x}(k)\,x(n-k)$$

for minimum phase systems we have $\hat{x}(n) = 0$ for $n < 0$ and $x(n) = 0$ for $n < 0$, giving:

$$x(n) = \sum_{k=0}^{n} \left(\frac{k}{n}\right) \hat{x}(k)\,x(n-k)$$

separating out the term for $k = n$ we get:

$$x(n) = \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}(k)\,x(n-k) + x(0)\,\hat{x}(n)$$

$$\hat{x}(n) = \frac{x(n)}{x(0)} - \sum_{k=0}^{n-1} \left(\frac{k}{n}\right) \hat{x}(k)\,\frac{x(n-k)}{x(0)}, \quad n > 0$$

$$\hat{x}(0) = \log[x(0)], \qquad \hat{x}(n) = 0, \; n < 0$$
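The recursion can be sketched as a short loop. The function name and the requirement $x(0) > 0$ (so that the real logarithm exists) are assumptions of this sketch; samples of $x(n)$ beyond the end of the array are treated as zero.

```python
import numpy as np

def min_phase_cepstrum_recursive(x, n_max):
    """Complex cepstrum x_hat(0..n_max) of a minimum phase sequence via the recursion."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros(n_max + 1)
    x_hat[0] = np.log(x[0])                   # x_hat(0) = log x(0); assumes x(0) > 0
    for n in range(1, n_max + 1):
        xn = x[n] if n < len(x) else 0.0
        acc = 0.0
        for k in range(1, n):                 # the k = 0 term vanishes because of the k/n weight
            if n - k < len(x):
                acc += (k / n) * x_hat[k] * x[n - k]
        x_hat[n] = (xn - acc) / x[0]
    return x_hat

# x(n) = 2 delta(n) - delta(n-1) = 2 (1 - 0.5 z^-1): zero at z = 0.5, inside the unit circle
x_hat = min_phase_cepstrum_recursive([2.0, -1.0], 5)
n = np.arange(1, 6)
print(x_hat[0], np.log(2.0))                  # gain term
print(x_hat[1:], -(0.5 ** n) / n)             # -a^n / n for a zero at a = 0.5
```

Unlike the DFT route, the recursion involves no aliasing, but it applies only to minimum phase inputs.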

Recursive Relation for Complex Cepstrum for Minimum Phase Signals

why is $\hat{x}(0) = \log[x(0)]$?

assume we have a finite sequence $x(n)$, $n = 0, 1, \ldots, N-1$; we can write $x(n)$ as:

$$x(n) = x(0)\,\delta(n) + x(1)\,\delta(n-1) + \ldots + x(N-1)\,\delta(n-N+1)$$

$$x(n) = x(0)\left[\delta(n) + \frac{x(1)}{x(0)}\,\delta(n-1) + \ldots + \frac{x(N-1)}{x(0)}\,\delta(n-N+1)\right]$$

taking z-transforms, we get:

$$X(z) = \sum_{n=0}^{N-1} x(n)\,z^{-n} = G \prod_{k=1}^{NZ1}(1 - a_k z^{-1}) \prod_{k=1}^{NZ2}(1 - b_k z)$$

where the first term is the gain, $G = x(0)$, and the two product terms are the zeros inside and outside the unit circle.

for minimum phase systems all zeros are inside the unit circle, so the second product term is gone, and we have the result that

$$\hat{x}(0) = \log[G] = \log[x(0)]; \qquad \hat{x}(n) = 0, \; n < 0$$

$$\hat{x}(n) = -\sum_{k=1}^{NZ1} \frac{a_k^{\,n}}{n}, \quad n > 0$$

Cepstrum for Maximum Phase Signals

• for maximum phase signals (no poles or zeros inside the unit circle)

$$c(n) = \frac{\hat{x}(n) + \hat{x}(-n)}{2}$$

• giving

$$\hat{x}(n) = \begin{cases} 0, & n > 0 \\ c(n), & n = 0 \\ 2c(n), & n < 0 \end{cases}$$

• thus the complex cepstrum (for maximum phase signals) can be computed by computing the cepstrum and using the equation above

Recursive Relation for Complex Cepstrum for Maximum Phase Signals

• the complex cepstrum for maximum phase signals can be computed recursively from the input signal, $x(n)$, using the relation

$$\hat{x}(n) = \begin{cases} 0, & n > 0 \\ \log[x(0)], & n = 0 \\ \displaystyle\frac{x(n)}{x(0)} - \sum_{k=n+1}^{0} \left(\frac{k}{n}\right) \hat{x}(k)\,\frac{x(n-k)}{x(0)}, & n < 0 \end{cases}$$

Characteristic System for Homomorphic Convolution

• Still need to define (and design) the L operator part (the linear system component) of the system to completely define the characteristic system for homomorphic convolution for speech
  – to do this properly and correctly, need to look at the properties of the complex cepstrum for speech signals

Complex Cepstrum of Speech

• Model of speech:
  – voiced speech is produced by a quasi-periodic pulse train exciting a slowly time-varying linear system => p(n) convolved with hv(n)
  – unvoiced speech is produced by random noise exciting a slowly time-varying linear system => u(n) convolved with hu(n)
• Time to examine the full model and see what the complex cepstrum of speech looks like

Complex Cepstrum for Speech

• full model for voiced speech is:

$$s(n) = p(n) * g(n) * v(n) * r(n) = \sum_{r=-\infty}^{\infty} h_v(n - rN_p)$$

• where
  – $p(n)$ is a periodic pulse train of period $N_p$ samples
  – $g(n)$ is a glottal wave shape
  – $v(n)$ is the vocal tract impulse response
  – $r(n)$ is the radiation impulse response
  – $h_v(n)$ is the impulse response of a linear system that combines the effects of glottal wave shape, vocal tract IR, and radiation IR

Complex Cepstrum for Speech

• the transfer function for voiced speech is of the form

$$H_v(z) = G(z) \cdot V(z) \cdot R(z)$$

• similarly, for unvoiced speech we have

$$s(n) = u(n) * v(n) * r(n) = u(n) * h_u(n)$$

• where $u(n)$ is a random noise excitation and $h_u(n)$ is the IR of a system that combines the effects of the vocal tract and radiation
• the transfer function is of the form

$$H_u(z) = V(z) \cdot R(z)$$

Complex Cepstrum for Speech

• the models for the speech components are as follows:

1. vocal tract:

$$V(z) = \frac{A\,z^{-M}\,\prod_{k=1}^{M_i}(1 - a_k z^{-1})\,\prod_{k=1}^{M_0}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1})}$$

  -- for voiced speech, only poles => $a_k = b_k = 0$, all $k$
  -- for unvoiced speech and nasals, need a pole-zero model, but all poles are inside the unit circle => $|c_k| < 1$
  -- all speech has complex poles and zeros that occur in complex conjugate pairs

2. radiation model: $R(z) \approx 1 - z^{-1}$ (high frequency emphasis)

3. glottal pulse model: finite duration pulse with transform

$$G(z) = B \prod_{k=1}^{L_i}(1 - \alpha_k z^{-1}) \prod_{k=1}^{L_0}(1 - \beta_k z)$$

with zeros both inside and outside the unit circle

Complex Cepstrum for Voiced Speech

• combination of vocal tract, glottal pulse and radiation will be non-minimum phase => complex cepstrum exists for all values of n
• the complex cepstrum will decay rapidly for large n (due to polynomial terms in expansion of complex cepstrum)
• effect of the voiced source is a periodic pulse train for multiples of the pitch period

Simplified Speech Model

• short-time speech model

$$x(n) = w(n) \cdot [p(n) * g(n) * v(n) * r(n)] \approx p_w(n) * h_v(n)$$

• short-time complex cepstrum

$$\hat{x}(n) = \hat{p}_w(n) + \hat{g}(n) + \hat{v}(n) + \hat{r}(n)$$

Complex Cepstrum for Voiced Speech

[Figure, panels a-f]
a: Hamming-windowed (HW) voiced speech segment
b: log magnitude of DFT (periodic component)
c: principal value of phase => rapidly discontinuous
d: unwrapped phase
e: complex cepstrum (based on b and d), with pitch peaks for both positive and negative n
f: cepstrum, with the same general properties

Homomorphic Filtering of Voiced Speech

• Goal is to separate out the excitation impulses from the remaining components of the complex cepstrum
• Use cepstral window, l(n), to separate excitation pulses from combined vocal tract

l(n)=1 for |n|

l(n)=0 for |n|
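A cepstral window of this kind can be sketched on a synthetic voiced frame. Everything specific in the example is an assumption made for illustration: the cutoff name `n0` and its value of 40 samples, the single-resonance stand-in for the vocal tract, the 100 Hz pitch, and the use of the real cepstrum (which avoids phase unwrapping) instead of the complex cepstrum.

```python
import numpy as np

fs, Np, N = 8000, 80, 1024                        # illustrative: 100 Hz pitch at 8 kHz
n = np.arange(320)
h = 0.9 ** n * np.cos(2 * np.pi * 700 * n / fs)   # hypothetical one-resonance vocal tract
p = np.zeros(320)
p[::Np] = 1.0                                     # quasi-periodic excitation
x = np.convolve(p, h)[:320] * np.hamming(320)     # Hamming-windowed voiced frame

log_mag = np.log(np.abs(np.fft.fft(x, N)) + 1e-12)
c = np.fft.ifft(log_mag).real                     # real cepstrum (even in n)

n0 = 40                                           # assumed cutoff, below the pitch period
lifter = np.zeros(N)
lifter[:n0] = 1.0                                 # n = 0 .. n0-1
lifter[N - n0 + 1:] = 1.0                         # n = -(n0-1) .. -1, stored modulo N

low_time = c * lifter                             # mostly vocal tract, glottal pulse, radiation
high_time = c * (1.0 - lifter)                    # mostly excitation (pitch peaks)
smoothed_log_mag = np.fft.fft(low_time).real      # lowpass-liftered log magnitude spectrum
print(np.argmax(high_time[:N // 2]))              # location of the strongest high-time peak
```

The low-time part gives a smoothed log spectrum with the harmonic ripple removed, while the high-time part keeps the pitch-related peaks.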

Homomorphic Filtering of Voiced Speech

[Figure, panels a-f]
a and b: lowpass liftered log magnitude and unwrapped phase
c: resulting combined vocal tract IR
d and e: highpass liftered log magnitude and unwrapped phase
f: resulting periodicity IR (see HW shape effects)

Homomorphic Filtering of Unvoiced Speech

[Figure, panels a-d]
a: HW windowed signal
b: log magnitude DFT, random component seen
c: cepstrum => no sharp peaks due to periodicity
d: smoothed log magnitude
- phase not used as it makes no sense for unvoiced speech
- for most applications do not need phase => use real cepstrum

Short-Time Homomorphic Analysis

[Figure: block diagram of short-time homomorphic analysis based on the STFT]

Homomorphic Spectrum Smoothing


Cepstral Pitch Detection

• Simple procedure for cepstral pitch detection
  – Compute cepstrum every 10-20 msec
  – Search for periodicity peak in expected range of n
  – If found and above threshold => voiced, pitch period = location of cepstral peak
  – If not found => unvoiced
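The procedure above can be sketched for a single frame as follows. The function name, the search range (60 to 300 Hz), the DFT length, and the synthetic test frame are all assumptions made for this illustration; the 0.1 threshold follows the example value quoted later in these slides, and a practical detector would add the secondary checks those slides call for.

```python
import numpy as np

def cepstral_pitch(frame, fs, f_min=60.0, f_max=300.0, threshold=0.1, n_fft=2048):
    """Return (voiced, period_in_samples, peak_value) for one speech frame."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed, n_fft)) + 1e-12)   # guard against log(0)
    c = np.fft.ifft(log_mag).real                                   # real cepstrum
    lo, hi = int(fs / f_max), int(fs / f_min)                       # expected pitch-period range
    peak = lo + int(np.argmax(c[lo:hi + 1]))
    voiced = bool(c[peak] > threshold)
    return voiced, (peak if voiced else None), float(c[peak])

# Synthetic voiced frame: 100 Hz impulse train through a damped 700 Hz resonance (40 ms at 8 kHz)
fs, period = 8000, 80
n = np.arange(320)
h = 0.9 ** n * np.cos(2 * np.pi * 700 * n / fs)
p = np.zeros(320)
p[::period] = 1.0
frame = np.convolve(p, h)[:320]
print(cepstral_pitch(frame, fs))
```

Calling this every 10-20 msec on successive frames gives a pitch contour together with a voiced/unvoiced decision.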

Cepstral Pitch Detection

[Figure: log spectra and cepstra for two speakers]
• male speaker (left), female speaker (right)
• log spectra and cepstra [c(n)]2
• 40 msec HW with 10 msec jumps
• for male, the first 7 intervals are unvoiced; for female, voiced at beginning, unvoiced at end
• pitch period increases with time for male; pitch period much shorter for female
• pitch period doubling at end of voiced regions for female

Issues in Cepstral Pitch Detection

• a strong peak in the 3-20 msec range is a strong indication of voiced speech; absence of such a peak does not guarantee unvoiced speech
  – cepstral peak depends on length of window, and structure
  – maximum height of pitch peak is 1 (RW, unchanging pitch, window contains exactly N periods); height varies dramatically with HW, changing pitch, window interactions with pitch period => need at least 2 full pitch periods in window to define pitch period well in cepstrum => need 40 msec window for low pitch male, but this is way too long for high pitch female
• bandlimited speech makes finding pitch period harder
  – extreme case of single harmonic => single peak in log spectrum => no peak in cepstrum
  – this occurs during voiced stop sounds (b, d, g) where the spectrum is cut off above a few hundred Hz
• need very low threshold (e.g., 0.1) on pitch period, with lots of secondary verifications of pitch period

Cepstral Formant Estimation

• The low-time cepstrum corresponds primarily to the combination of vocal tract, glottal pulse, and radiation, while the high-time part corresponds primarily to excitation => use the lowpass liftered cepstrum to give smoothed log spectra from which to estimate formants
• want to estimate time-varying model parameters every 10-20 msec

Cepstral Formant Estimation

• Fit peaks in cepstrum and decide if section of speech is voiced or unvoiced
• If voiced, estimate pitch period, lowpass lifter the cepstrum, and match the first 3 formants to the smoothed log magnitude spectrum
• If unvoiced, set pole frequency to highest peak in smoothed log spectrum; choose zero to maximize fit to smoothed log spectrum

Cepstral Formant Estimation

[Figure panels: cepstra, spectra]

• Sometimes 2 formants get so close that they merge and there are not 2 distinct peaks in the log magnitude spectrum => use higher resolution spectral analysis via CZT
• Blown up region of 0-900 Hz showing 2 peaks when only 1 is seen in the normal spectrum

Speech Synthesis

• can use cepstrally estimated parameters to control speech synthesis model
• for voiced speech the vocal tract transfer function is modeled as

$$V(z) = \prod_{k=1}^{4} \frac{1 - 2e^{-\alpha_k T}\cos(2\pi F_k T) + e^{-2\alpha_k T}}{1 - 2e^{-\alpha_k T}\cos(2\pi F_k T)\,z^{-1} + e^{-2\alpha_k T}\,z^{-2}}$$

-- cascade of digital resonators ($F_1$-$F_4$) with unity gain at $f = 0$
-- estimate $F_1$-$F_3$ using formant estimation methods, $F_4$ fixed at 4000 Hz
-- formant bandwidths fixed ($\alpha_1$-$\alpha_4$)

• fixed spectral compensation approximates glottal pulse shape and radiation

$$S(z) = \frac{(1 - e^{-aT})(1 + e^{-bT})}{(1 - e^{-aT} z^{-1})(1 + e^{-bT} z^{-1})}, \quad a = 400\pi, \; b = 5000\pi$$
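The resonator cascade can be evaluated on the unit circle to check the unity-gain property at $f = 0$. In this sketch the function name, the 10 kHz sampling rate, the formant frequencies, and the choice $\alpha_k = \pi B_k$ with hypothetical bandwidths $B_k$ are all assumptions made for illustration, not values from the slides (apart from $F_4 = 4000$ Hz).

```python
import numpy as np

def resonator_cascade(freqs_hz, formants_hz, alphas, fs):
    """Frequency response V(e^{j 2 pi f T}) of the cascade of digital resonators."""
    T = 1.0 / fs
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs_hz, dtype=float) * T)
    V = np.ones_like(z_inv)
    for F, alpha in zip(formants_hz, alphas):
        r = np.exp(-alpha * T)                       # pole radius e^{-alpha_k T}
        c = 2.0 * r * np.cos(2.0 * np.pi * F * T)    # 2 e^{-alpha_k T} cos(2 pi F_k T)
        V = V * (1.0 - c + r ** 2) / (1.0 - c * z_inv + r ** 2 * z_inv ** 2)
    return V

fs = 10000.0
formants = [500.0, 1500.0, 2500.0, 4000.0]           # F1-F3 illustrative, F4 fixed at 4000 Hz
alphas = [np.pi * B for B in (60.0, 100.0, 120.0, 175.0)]   # hypothetical bandwidths in Hz
freqs = np.array([0.0, 500.0, 1000.0])
print(np.abs(resonator_cascade(freqs, formants, alphas, fs)))
```

The numerator of each section is exactly the denominator evaluated at $z = 1$, which is what forces unity gain at zero frequency.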

Speech Synthesis

• for unvoiced speech the model is a complex pole and zero of the form

$$V(z) = \frac{\left(1 - 2e^{-\beta T}\cos(2\pi F_p T) + e^{-2\beta T}\right)\left(1 - 2e^{-\beta T}\cos(2\pi F_z T)\,z^{-1} + e^{-2\beta T}\,z^{-2}\right)}{\left(1 - 2e^{-\beta T}\cos(2\pi F_p T)\,z^{-1} + e^{-2\beta T}\,z^{-2}\right)\left(1 - 2e^{-\beta T}\cos(2\pi F_z T) + e^{-2\beta T}\right)}$$

$F_p$ = largest peak in smoothed spectrum above 1000 Hz

$$F_z = (0.0065\,F_p + 4.5 - \Delta)(0.014\,F_p + 28)$$

$$\Delta = 20\log_{10}|H(e^{j2\pi F_p T})| - 20\log_{10}|H(e^{j0})|$$

• these formulas ensure spectral amplitudes are preserved

Spectral Matches for Voiced Sounds

Analysis and Synthesis of Speech

• essential features of signal well preserved
• very intelligible synthetic speech
• speaker easily identified

[Figure: formant synthesis]

Mel Cepstrum

[Figure: mel filter bank with R filters]

$$E_{mel}[n, l] = \frac{1}{A_l}\sum_{k=L_l}^{U_l} \left|V_l(k)\,X[n, k]\right|^2, \qquad A_l = \sum_{k=L_l}^{U_l} \left|V_l(k)\right|^2$$

$$c_{mel}[n, m] = \frac{1}{R}\sum_{l=0}^{R-1} \log\!\left(E_{mel}[n, l]\right)\cos\!\left[\frac{2\pi}{R}\left(l + \frac{1}{2}\right) m\right]$$
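The two mel-cepstrum formulas can be sketched for a single frame as below. The function name and the array layout are assumptions, and the filter bank built in the usage lines uses uniformly spaced triangles purely as a placeholder: a real front end would place the filters on a mel-warped frequency axis, which the slide shows only as a figure.

```python
import numpy as np

def mel_cepstrum(X_mag, V, n_coeffs):
    """Mel cepstrum of one frame.

    X_mag: |X[n, k]| for k = 0..K-1
    V:     (R, K) array of filter weights V_l(k), zero outside [L_l, U_l]
    """
    R = V.shape[0]
    A = np.sum(V ** 2, axis=1)                         # normalization A_l
    E = np.sum((V * X_mag) ** 2, axis=1) / A           # band energies E_mel[n, l]
    l = np.arange(R)
    m = np.arange(n_coeffs)[:, None]
    basis = np.cos(2.0 * np.pi / R * (l + 0.5) * m)    # cosine terms, shape (n_coeffs, R)
    return basis @ np.log(E) / R

# Placeholder filter bank: R uniformly spaced triangles over K DFT bins (not a true mel scale)
K, R = 129, 8
edges = np.linspace(0, K - 1, R + 2)
k = np.arange(K)
V = np.array([np.clip(np.minimum((k - edges[i]) / (edges[i + 1] - edges[i]),
                                 (edges[i + 2] - k) / (edges[i + 2] - edges[i + 1])), 0.0, None)
              for i in range(R)])
print(mel_cepstrum(np.full(K, 2.0), V, 5))             # flat spectrum: only c[0] is non-zero
```

For a flat spectrum every band energy is equal, so all coefficients except the zeroth vanish, which is a convenient sanity check.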

Delta Cepstrum

[Figure labels: 3200 Hz, 200 Hz]

$$\Delta c_{mel}[n, l] = c_{mel}[n, l] - c_{mel}[n-1, l]$$

$$\Delta \log E_{mel}[n, l] = \log E_{mel}[n, l] - \log E_{mel}[n-1, l]$$

Differencing removes effect of linear filtering.
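A fixed linear filter (for example a channel) adds the same vector to the log energies or cepstrum of every frame, so a frame-to-frame difference cancels it. The sketch below illustrates this with random stand-in data; the array shapes, the seed, and the function name `delta` are assumptions made for the example.

```python
import numpy as np

def delta(c):
    """First difference along the frame axis: delta[n] = c[n] - c[n-1]."""
    return c[1:] - c[:-1]

rng = np.random.default_rng(0)
c_clean = rng.normal(size=(20, 13))        # stand-in cepstra: 20 frames, 13 coefficients
channel = rng.normal(size=13)              # fixed filter => constant additive cepstral offset
c_filtered = c_clean + channel             # the same offset appears in every frame

print(np.max(np.abs(delta(c_filtered) - delta(c_clean))))   # offset cancels in the difference
```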

Summary

• Today’s Class
  – Homomorphic speech processing
• Homework
  – HW4 assigned (due on 2/26)
• Quiz #2: 2/10, HWs #4 - #6 (Open book)
• Next Classes
  – Linear Prediction Coding of Speech, Feb. 25 – March 3
• Reading assignments
  – Quatieri, Chapters 6 & 5
  – Rabiner and Schafer, Chapters 7 & 8
