Source-Channel Coding for CELP Speech Coders

2G'S c5 Source-Channel Coding for CELP Speech Coders J.A. Asenstorfer B.Ma.Sc. (Hons),B.Sc.(Hons),8.8',M'E' University of Adelaide Department of Electrical and Electronic Engineering Thesis submitted for the Degree of Doctor of Philosophy. Septemb er 26, L994 A*ond p.] l'ìq t Contents 1 1 Speech Coders for NoisY Channels 1.1 Introduction 1 2 I.2 Thesis Goals and Historical Perspective 2 Linear Predictive Coders Background I I 2.1 Introduction ' 10 2.2 Linear Predictive Models . 2.3 Autocorrelation of SPeech 15 17 2.4 Generic Code Excited Linear Prediction 3 Pitch Estimation Based on Chaos Theory 23 23 3.1 Introduction . 3.1.1 A Novel Approach to Pitch Estimation 25 3.2 The Non-Linear DYnamical Model 26 29 3.3 Determining the DelaY 32 3.4 Speech Database 3.5 Initial Findings 32 3.5.1 Poincaré Sections 34 35 3.5.2 Poincaré Section in Time 39 3.6 SpeechClassifrcation 40 3.6.1 The Zero Set 4l 3.6.2 The Devil's Staircase 42 3.6.3 Orbit Direction Change Indicator 45 3.7 Voicing Estimator 47 3.8 Pitch Tracking 49 3.9 The Pitch Estimation Algorithm ' 51 3.10 Results from the Pitch Estimator 55 3.11 Waveform Substitution 58 3.11.1 Replication of the Last Pitch Period 59 3.1I.2 Unvoiced Substitution 60 3.11.3 SPlicing Coders 61 3.11.4 Memory Considerations for CELP 61 3.12 Conclusion Data Problem 63 4 Vector Quantiser and the Missing 63 4.I Introduction bÐ 4.2 WhY Use Vector Quantisation? a Training Set 66 4.2.1 Generation of a Vector Quantiser Using Results 68 4.2.2 Some Useful Optimal Vector Quantisation ,7r 4.3 Heuristic Argument ùn . fù 4.4 Codebook Generation ll 74 4.4.I Fixed Codebook . 74 4.4.2 AdaPtive Codebook 79 4.5 Performance 81 4.5.1 Common Current Techniques 82 4.5.2 ComParison of Methods 86 4.6 Improving the Vector Quantiser Error Corrector 88 4.7 Conclusions Information' 90 5 Vector Quantiser Design for Spectral 90 5.1 Introduction 91 5.2 Tree Structured Vector Quantiser Design 98 5.2.I Initial Tree-structured vector Quantiser Design Channels t02 5.3 Vector Quantiser Design for Noisy 105 5.4 Genetic Algorithms 107 5.5 Genetic Algorithms Applied to Vector Quantiser Design on Noisy Channels Ll2 5.6 Simulation Results of a Protected Vector Quantiser 115 5.7 Split Vector Quantisers t20 5.8 Multistage Vector Quantisers for Noisy Channels LzI 5.9 Conclusions 126 6 Tlellis Vector Excitation I25 6.1 Introduction . Model r27 6.2 The Cod.e Excited Linear Prediction Vocoder ' 130 6.3 Analysis of the Trellis Encoding Process lll Residual 137 6.4 Trellis Encoding of the Excitation 6.4.1 ImplementationDetailsoftheTrellisVectorExcitedLPCoder.l4l r45 6.5 Code Book OPtimization r47 Functions ' 6.5.1 Optimization for Rough Non-convex 156 6.6 Coding Performance 162 Speech Using Pitch Information 6.7 Post Processing of the Synthetic 166 7 Analysis of Tlellis Coding 166 7.r Introduction 167 7.2 Channel Error AnalYsis t72 7.2.1 Informal Listening Tests Error 173 7.3 Regular Pulse Excitation Prediction t74 7.4 Model of the Coding Mechanism 180 7.5 Conclusions 1E3 I Conclusions 183 8.1 Pitch Estimation 185 Spectral Information ' 8.2 Protection of the Short Term Correction Mechanism 186 8.2.1 Vector Quantiser as an Error 188 8.2.2 Protection of Vector Quantised Parameters ' 190 8.2.3 ComParison of Techniques 191 8.3 Trellis Encoding of the Residual 191 8.3.1 Training of the Trellis 193 8.3.2 Trellis Encoding Performance IV 193 8.4 Error Modelling of Trellis Encoded Residual 194 8.5 Trellis Coder Overview V List of Figures 10 2,7 Differential Pulse Code Modulator Filter' I4 2.2 All Pole Auto-Recursive Prediction juice 16 2.3 Speech Waveforms: fine and in juice 16 2.4 Autocorrelations: 'i' in fine and 's' 18 2.5 Block Diagram of Basic CELP Structure 31 versus Pitch Estimate 3.1 Decorrelation Lags in Samples Map generated using a Stroboscope ' 34 3.2 A Poincaré Section and' a Poincaré and 3 Dimensional AMDF's' 37 3.3 Comparison of 1 Dimensional 37 3.4 ReferencelDimensionaland3DimensionalAMDF'sforNoiseTests. 3.5 ComparisonoflDimensionaland3DimensionalAMDF,sofSpeech 38 with Additive White Gaussian Noise' 3.6 ComparisonoflDimensionaland.3DimensionalAMDF'swith100Hz 38 Tone Noise parameter solid line, atd coo,rse 3.7 Structure Parameter s lot wagon: fi,ne 4l parameter broken line' 43 3.8 The Devil's Staircase lor wagon' fine and 's' in juice' 43 3.g Two Dimensional Phase Pl0ts of the sounds: 'i'in 45 3.l0speechWaveformandNormalizedDirectionChanges. VI 47 3.11 Speech Waveform and Voicing Indicator' 53 3.12 Pitch Contour lor juice' fot wagon' 53 3.13 The Speech Waveform and Pitch Contour 3.l4SpeechWaveformtheFemaleUtterance:Thejuiceoflemonsmakesfi,ne bÐ punch. 56 3.15 AMDF Pitch Contour 56 3.16 Super-resolution Pitch Contour for the Male utterance" The wûgon 3.17 speech waveform and Pitch contour Ðl rnoaed on well oiled' wheels' ' for the Female utterance: These 3.1g speech waveform and Pitch contour 57 ilays a chiclcen leg is a rare dish' ' 4.I FixedVectorCodebookPerformanceinMissingDataEstimation:the codebook' 82 plot names indicate the size of the fixed 4.2 AdaptiveVectorCodebookPerformanceinMissingDataEstimation: the codebook' 83 plot name extensions indicate the size of Data Estimation: 128 entry frxed 4.3 Comparison of Method,s in Missing repetition of entire vector codebook, sorting of the Lsp components, errors' 84 LSP vectors and distortion due to undetected all Vectors with Bit Errors' 85 4.4 Ratio of Detected. Error Vectors to 85 Codebook and an Adaptive Codebook' 4.5 Comparison of a Fixed Vector 5.1 AverageSpectralDistortionVersusLog2Cod.ebookSizeforTreeand full search and tree tree search' 101 Full search vector Quantisers: /s denotes 5.2 WeightedNegativeSpectralDistortionVe[susGenerationNumber:öesú denotesthebestofthepopulation,auedenot,esthepopulationaverage.ll0 vl1 cod,ebook size for 3 Bits of the Index 5.3 Average spectral Distortion Versus case and pernù Protected: unpernù denotes the unpgrmuted index Error ll4 denotes permutation of the indices' 5.4 AverageSpectralDistortionVersustheNumberofErrorProtectedBits.116 Two I2l 5.5 Multi-Stage Vector Quantiser Using Stages' the Total codebook size: The upper 5.6 Average spectral Distortion versus curveisforthe2S6vectorcoalsequantiserandthelowercurveforthe show the performance of the 512 vector coarse quantiser, the points r22 better split-vector quantisers' r27 6.1 Simple Multi-Pulse Encoder' Excited Linear Predictor coder' ' 130 6.2 Block Diagram of a Generic code 132 6.3 BlockDiagramoftheTrellisCod.eExcitedLinearPredictorCoder. 135 6.4 ConvolutionalsourceCodeBranchsymbol-VectorGeneration 138 6.5 Example 4 State Trellis' 6.6 BlockDiagramoftheDoublyLinkedTrepStructure.Theblocksindi- pointers were used for cate records and the arrows pointers. Forward generatingthetreeandreversearrolüswereusedfortrace-backtothe t44 root node. Function; the function value was 6.7 Four proflles of the Trellis Evaluation theaveragemeansquareelrolonencodingthesourcespeechusingthe 147 M,L algorithm f'or four randomly selected' branches Using Regular Pulse Excitation; 6.8 Four profiles of the Trellis Evaluation 148 evaluated for four randomly selected' branches' Algorithms: without: con 6.9 Evaluation Profile of simulated Annealing 151 and with restarts: con? and con?' required for self-affine curves 6.10 plot to determine if the scale relationship wasmetbythemeansquareencodingerrorforregularpulseexcitation.lSS Optimizalion Algorithm 155 6.11 Evaluation profile of Conjugate Stochastic vlll 6.12 Original Waveform Uttered by a Male Speaker' 159 159 6.13 coded waveform for Hate lf2 Bit per sample Trellis, 160 6.14 Detail Showing Comparison of Original and Synthetic Waveform' 160 6.15 coded waveform for Ftate ll2 Bit per sample untrained Trellis' 161 6.16 Coded Waveform for Rate 1/4 Bit per Sample Trellis' 6.17 Original Waveform of u.'ide Spoken by a Male Speaker' 163 t64 6.18 Reconstructed Coded Waveform o1 wiile' 165 6.19 Enhanced Version: post processing of wide' 7.I cumulative Probability Distribution for Maximum Error Events Greater state, than a given Threshold: cumul128, cumul6f, cumul72 refer to 128 17t 64 state and 32 state trellises respectively' r77 7.2 Frequency Response of synthesis Filter 1 in the Encoding Model . f77 7.3 Chirp Input Waveform to be Encoded Rate 1'178 7.4 Reconstructed waveform from Encoding Model; Residual coding Rate 7.5 Reconstructed Waveform from Encoding Model; Residual Coding | 12. 178 Rate 7.6 Reconstructed waveform from Encoding Model; Residual coding rl3. 179 Rate 7.7 Reconstructed waveform from Encoding Model; Residual coding rl4. t79 180 7.g Encoding Rate 1/4 With a Greater codebook-Range/source Mismatch 181 7.g Frequency Response for Synthesis Filter 2 ' ' ' ' ' 181 7.10 Encoding Simulation For Rate Ila with Filter 2 195 8.1 Simplifred Block Diagram of Proposed Speech Encoder' IX List of Tables 2.1 Bit Allocation for Cox's CELP coder 2T 2.2 Bit Allocation for Perkis' CELP coder 2l 8.1 Average Standard Deviations on Pitch Estimates for Different Dimen- sionality: for speech sampled at 8 kHz. 52 100 5.1 Spectral Distortion and Percent Outliers for the Tree Structured Quantiser' 5.2 Spectral Distortion and Percent Outliers for the Fully Searched Quantiser.101 111 5.3 Average Spectral Distortion (sD) in dB for the specifred Error Rates. 5.4 Average spectral Distortion in dB Due to channel Errors (e : 0.06) with Three Index Bits Protected. 113 5.5 Average spectral Distortion in dB due to channel Errors (e : 0'03) with Three Index Bits Protected' Il4 5.6 Average spectral Distortion in dB due to channel Errors (e : 0.06) with a Variable Number of Index Bits Protected' 115 5.T Spectral Distortion and Percent Outliers for the Split Tree Structured 119 Quantiser.

Source-Channel Coding for CELP Speech Coders

Relating Optical Speech to Speech Acoustics and Visual Speech Perception

Improving Opus Low Bit Rate Quality with Neural Speech Synthesis

Lecture 16: Linear Prediction-Based Representations

Samia Dawood Shakir

End-To-End Video-To-Speech Synthesis Using Generative Adversarial Networks

Linear Predictive Modelling of Speech - Constraints and Line Spectrum Pair Decomposition

Audio Processing and Loudness Estimation Algorithms with Ios Simulations

Code Excited Linear Prediction with Multi-Pulse Codebooks

A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds

CELP and Speech Enhancement Ian Mcloughlin

Harmonic Coding of Speech at Low Bit Rates

TRANSCOM Proceedings, Section 3