Homomorphic Speech Processing

ECE6255 Digital Processing of Speech Signals Lectures 16-18: Homomorphic Speech Processing Chin-Hui Lee School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332, USA [email protected] Center of Signal and Image Processing Georgia Institute of Technology Basic Speech Model • Short segment of speech can be modeled as having been generated by exciting an LTI system either by a quasi-periodic impulse train, or a random noise signal • Speech analysis => estimate parameters of the speech model, measure their variations (and perhaps even their statistical variabilites-for quantization) with time • Speech = excitation * system response => want to deconvolve speech into excitation and system response => do this using homomorphic filtering methods 2 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Superposition Principle x()()() n ax12 n bx n yn()()()() Lxn aLxn 12 bLxn 3 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Generalized Superposition for Convolution for LTI systems we have the result yn()()()()() xnhn xkhnk k "generalized" superposition => addition replaced by convolution x()()() n x12 n x n yn()()()() Hxn Hxn 12 Hxn homomorphic system for convolution 4 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Homomorphic Filter homomorphic filter => homomorphic system H that passes the desired signal unaltered, while removing the undesired signal x( n ) x1 ( n ) x 2 ( n ) - with x 1( n ) the "undesired" signal H x()()() n H x12 n H x n Hx 11(n ) ( n ) - removal of x( n ) H x22()() n x n H x()()()() n n x22 n x n for linear systems this is analogous to additive noise removal 5 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Canonic Form for Homomorphic Convolution any homomorphic system can be represented as a cascade of three systems, e.g., for convolution 1. system takes inputs combined by convolution and transforms them into additive outputs 2. system is a conventional linear system 3. inverse of first system--takes additive inputs and transforms them into convolutional outputs 6 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Canonic Form for Homomorphic Convolution x( n ) x12 ( n ) x ( n ) - convolutional relation xˆ( n ) D x ( n ) x ˆ12 ( n ) x ˆ ( n ) - additive relation ynˆ( ) Lxn ˆ1 ( ) xn ˆ 2 ( ) yn ˆ 1( ) yn ˆ 2 ( ) - conventional linear system 1 y()()()() n D yˆˆ1 n y 2 n y 1 n y2 (n ) - inverse of convolutional relation => design converted back to linear system, L D - fixed (called the characteristic system for homomorphic deconvolution) 1 D - fixed (characteristic system for inverse homomorphic deconvolution) 7 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Properties of Characteristic Systems D x()()() n D x12 n x n D x12()() n D x n xˆ12()()() n x ˆ n x ˆ n 11 D yˆ()()() n D y ˆ12 n y ˆ n 11 D yˆˆ12()() n D y n y12()()() n y n y n 8 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Frequency Domain Canonic Form need to find a system that converts convolution to addition x()()() n x12 n x n X()()() z X12 z X z since D x( n ) xˆ12 ( n ) x ˆ ( n ) x ˆ ( n ) ˆ ˆ ˆ D X()()()() z X12 z X z X z => use log function which converts products to sums ˆ X( z ) log X ( z ) log X12 ( z ) X ( z ) ˆˆ log X1 ( z ) log X 2 ( z ) X 1 ( z ) X 2 ( z ) Yzˆ()()()()() LXz ˆ Xz ˆ YzYz ˆ ˆ 1 2 1 2 Y( z ) exp Yˆˆ ( z ) Y ( z ) Y ( z ) Y ( z ) 1 2 1 2 9 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Characteristic Systems for Homomorphic Deconvolution X()() z x n z n n Xˆ ( z ) log X ( z ) log X ( z ) j arg X ( z ) 1 xˆ()() n Xˆ z zn1 dz 2 j C 10 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Characteristic Systems for Homomorphic Deconvolution + Yˆ()() z yˆ n zn n ˆ Y( z ) exp Y ( z ) 1 y()() n Y z zn1 dz 2 j C 11 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Issues with Logarithms it is essential that the logarithm obey the equation log X1 ( z ) X 2 ( z ) log X 1 ( z ) log X 2 ( z ) this is trivial if X12( z ) and X( z ) are real -- however usually X12( z ) and X( z ) are complex consider evaluating X( z ) only on the unit circle, e.g., z e j then the complex log can be written in the form: jarg X ( e j ) X( ejj ) | X ( e ) | e logX ( ej ) Xˆ ( e j ) log | X ( e j ) | j arg X ( e j ) no problems with log magnitude term; uniqueness problems arise in defining the imaginary part of the log; can show that the imaginary part (the phase angle of the z-transform) needs to be a continuous odd function of 12 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Complex and Real Cepstrum define the inverse transform of Xeˆ (j ) as 1 xˆ()() n Xˆ ej e j n d 2 where xnˆ( ) called the "complex cepstrum" since a complex logarithm is involved in the computation can also define a "real cepstrum" using just the real part of the logarithm, giving 1 c( n ) Re Xˆ ( ej ) e j n d 2 1 log |X ( ej ) | e j n d 2 can show that c( n ) is the even part of xˆ( n ) 13 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Fourier Transform Representation: The Complex Cepstrum D*[] xn() * ● ● + + + xnˆ() F{ } log{ } F-1{ } Xe()j Xeˆ ()j j j j n j jarg X ( e ) X()()() e x n e X e e n ˆ j j j j X( e ) log X ( e ) log X ( e ) j arg X ( e ) 1 xˆ()() n Xˆ ej e j n d 2 14 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology The Complex Cepstrum-DFT Implementation 22 j k j kn NN Xp ( k ) X ( e ) x ( n ) e k 0,1,..., N 1, n j Xp ( k ) is the N point DFT corresponding to X( e ) ˆ Xp( k ) log X p ( k ) log X p ( k ) j arg X p ( k ) 2 1 N 1 j kn ˆˆˆ N xpp( n ) X ( k ) e x ( n rN ) n 0,1,..., N 1 N kr0 xnˆ p ( ) is an aliased version of xnˆ( ) use as large a value of N as possible to minimize aliasing 15 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology The Cepstrum-DFT Implementation 22 j k j kn NN Xp ( k ) X ( e ) x ( n ) e n 0,1,..., N 1 n jarg Xp ( k ) Xpp()() k X k e C( ejj ) log X ( e ) 2 jk N Cpp( k ) log X ( e ) 1 c( n ) C ( ej ) e j n d xˆˆ ( n ) x ( n ) / 2 2 2 N 1 j kn 1 N cpp( n ) C ( k ) e c ( n rN ) n 0,1,..., N 1 N kr0 cp (n ) is an aliased version of c( n ) use as large a value of N as possible to minimize aliasing 16 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Computational Considerations consider only absolutely summable input sequences so that the z-transform region of convergence includes the unit circle => the sequence has a FT for finite length sequences, the characteristic system for convolution can be represented using FT's, giving N 1 X( ej ) x ( n ) e j n n0 Xˆ ( ej ) log X ( e j ) log X ( e j ) j arg X ( e j ) 1 xˆ()() n Xˆ ej e j n d 2 need to have the log magnitude be an even function of need to have the phase be a continuous periodic function of with period 2 need to approximate the FT by the DFT (works fine for finite length seqences) 17 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Finite Length Sequences for finite length sequences all computations can be done using DFT's 2 N 1 j kn N Xp ( k ) x ( n ) e 0 k N 1 n0 ˆ Xpp( k ) log X ( k ) 0 k N 1 2 1 N 1 j kn ˆ ˆ N xpp( n ) X ( k ) e 0 k N 1 N k 0 not exact complex cepstrum since the complex log used in the DFT calculations is a sampled version of Xeˆ (j ) and thus the resulting inverse transform is an aliased version of the true complex cepstrum, i.e., ˆˆ xp ( n ) x ( n rN) r 18 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology Cepstrum the complex cepstrum involves the use of the complex log and the cepstrum involves only the log magnitude, i.e., 1 c( n ) log X ( ej ) e j n d n 2 an approximation to the cepstrum (using an inverse DFT) is 2 N 1 j kn 1 N cpp( n ) log X ( k ) e 0 n N 1 N k 0 the relation to the true cepstrum is cp ( n ) c ( n rN ) r need to use large values of N (512 or greater) in computing xˆ pp( n ), c ( n ) to minimize aliasing 19 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology z-Transform Cepstrum Analysis The main z-transform formula for cepstrum alanysis is based on the power series expansion: n1 1 log(1x ) xn x 1 n1 n Example 1--Apply this formula to the exponential sequence 1 xn( ) an u()() n X z 1 1 1 az1 n1 1 ˆ 1 nn X11( z ) log[ X ( z )] log(1 az ) ( a ) z n1 n aann ˆ ˆ 1 n x11( n ) u ( n 1) X ( z ) log(1 az ) z nnn1 20 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology z-Transform Cepstrum Alanysis Example 2--consider the case of a digital system with a single zero outside the unit circle ( b 1) x2 ( n ) ( n ) b ( n 1) X2 ( z ) 1 bz (zero at z 1/ b ) ˆ X22( z ) log X ( z ) log 1 b z ( 1)n1 ()bznn n1 n ( 1)nn1b xˆ (n ) u ( n 1) 2 n 21 ECE6255 Spring 2010 Center of Signal and Image Processing Georgia Institute of Technology z-Transform Cepstrum Analysis consider digital systems with rational z-transforms of the general

Homomorphic Speech Processing

Cepstrum Analysis and Gearbox Fault Diagnosis

Cepstral Peak Prominence: a Comprehensive Analysis

Exploiting the Symmetry of Integral Transforms for Featuring Anuran Calls

A Processing Method for Pitch Smoothing Based on Autocorrelation and Cepstral F0 Detection Approaches

Speech Signal Representation

'Specmurt Anasylis' of Multi-Pitch Signals

The Exit-Wave Power-Cepstrum Transform for Scanning Nanobeam Electron Diffraction: Robust Strain Mapping at Subnanometer Resolution and Subpicometer Precision

Study Paper for Timbre Identification in Sound

L9: Cepstral Analysis

ECW-706 Lecture 9 Ver. 1.1 1 Cepstrum Alanysis a Cepstrum Is

Pitch Tracking

Digital Signal Processing of Multi-Echo Interference for Angle Modulated Signal