CELP and Speech Enhancement

Ian McLoughlin

PhD Thesis

School of Electronic and Electrical Engineering
Faculty of Engineering
University of Birmingham
October 1997


Synopsis

This thesis addresses the intelligibility enhancement of speech that is heard within an acoustically noisy environment. In particular, a realistic target situation of a police vehicle interior, with speech generated from a CELP (codebook-excited linear prediction) speech compression-based communication system, is adopted.

The research has centred on the role of the CELP speech compression algorithm, and its transmission parameters. In particular, novel methods of LSP-based (line spectral pair) speech analysis and speech modification are developed and described. CELP parameters have been utilised in the analysis and processing stages of a speech intelligibility enhancement system to minimise additional computational complexity over existing CELP coder requirements.

Details are given of the CELP analysis process and its effects on speech, and of the development of speech analysis and alteration algorithms coexisting with a CELP system, together with their effects and performance.

Both objective and subjective tests have been used to characterise the effectiveness of the analysis and processing methods. Subjective testing of a complete simulation enhancement system indicates its effectiveness under the tested conditions, and the results are extrapolated to predict real-life performance.

The developed system presents a novel integrated solution to the intelligibility enhancement of speech, and can provide a doubling, on average, of intelligibility under the tested conditions of very low intelligibility.

Acknowledgements

I would like to express thanks to the following people who have assisted me greatly in my work. Firstly, my supervisor, R. J. Chance, who for the past three years has rendered invaluable help and assistance, has continually expressed his interest in my work, and has provided much of the encouragement and guidance that has resulted in this thesis.

Many other past and present members of the University of Birmingham School of Electronic and Electrical Engineering have also assisted me in my research, and although too numerous to name, I would like to especially thank the following: Dr J. A. Edwards, who was instrumental in establishing the initial industrial link with Simoco (formerly Philips PMR) and has remained interested in this work throughout; Dr S. I. Woolley, who has provided ideas and encouragement, and in particular has lent her time to comment on my draft thesis; and Dr H. Ghafouri-Shiraz, Dr I. Gu, Dr D. G. Checketts, Dr S. M. Bozic and D. Pycock, who also deserve mention for their help.

I wish to acknowledge the encouragement and support of Dr. M. Rayne of Simoco International (Cambridge), who accepted the industrial link from the former Philips Telecom PMR, who has assisted in the funding of the work, and who has expressed much interest in the research.

Further acknowledgements must go to my ever patient and helpful wife, my parents, fellow researchers and friends, and also to K. Worrell and D. Pinfold of Her Majesty's Government Communications Centre (HMGCC, Hanslope, Buckinghamshire) who have remained in contact with me, and have kept me up to date with the latest industrial developments.

I would like to thank Chief Inspector Anderson and Sgt. Bassett of West Midlands Police, who arranged my police vehicle noise measurement tests, and Sgt. Higgins and Sgt. Simmonite, who patiently drove me around North Birmingham for half a day while I was attached to recording equipment.

Finally, my thanks extend to those people who participated in the listening tests, for their patience and willingness to help.

Contents

List of figures and tables

Glossary of terms

1 Introduction 1

PART I: Investigation and Context...... 3

2 Candidate speech enhancement methods...... 4
2.1 Definitions and background...... 4
2.2 Existing methods of enhancement...... 4
2.2.1 Direct speech modification...... 5
2.2.2 Enhancing CELP...... 8
2.2.3 Noise removal techniques...... 8
2.2.4 Techniques developed for hearing-impaired listeners...... 9
2.3 Enhancement methods...... 11
2.3.1 Amplification...... 11
2.3.2 Spectral modification...... 12
2.3.2.1 Masking...... 13
2.3.3 Clipping and selective amplification...... 14
2.3.4 Temporal methods...... 15
2.4 Acoustic background noise...... 16
2.4.1 Generation of noise...... 16
2.4.2 Noise dynamics...... 17
2.4.3 Frequency distribution...... 19
2.4.4 Target environment...... 21

3 Speech compression using CELP...... 24
3.1 Introduction...... 24
3.2 A description of CELP...... 24
3.2.1 Derivation of algorithms...... 24
3.2.2 Operating principle...... 25
3.2.3 Description of operating process...... 26

6 Sound Classification...... 58
6.1 Introduction...... 58
6.2 Available measures...... 59
6.3 Speech detection and classification...... 60
6.4 Speech analysis...... 61
6.4.1 Speech type classification...... 61
6.4.2 Formant detection...... 61
6.5 Noise analysis...... 65
6.6 The hearing model, and intelligibility...... 65
6.6.1 Psychoacoustic effects...... 66
6.6.2 Reported hearing models...... 66
6.6.2.1 Spectral analysis...... 67
6.6.2.2 Critical band warping...... 67
6.6.2.3 Critical band function convolution...... 69
6.6.2.4 Equal-loudness preemphasis...... 69
6.6.2.5 Intensity-loudness conversion...... 70
6.6.3 Hearing model outcome...... 70

7 Designing a speech-enhancing CELP coder...... 73
7.1 Introduction...... 73
7.2 Data flow analysis...... 74
7.3 Hearing model and expert system...... 74
7.4 Enhancement...... 75
7.5 Expert system...... 76

PART III: Testing and Evaluation...... 78

8 Test objectives and systems...... 79
8.1 Introduction...... 79
8.2 CELP simulator...... 79
8.3 Speech classification...... 81
8.3.1 LSP measure in relation to speech features...... 81
8.3.2 LSP measure compared to other measures...... 84
8.3.3 Interpretation of phoneme test results...... 92
8.4 Speech intelligibility...... 93
8.5 Noise simulation...... 94
8.5.1 Realistic noise simulation...... 94
8.5.2 Noise simulations...... 98
8.5.3 Simplified interior and siren noise simulations...... 99

9 Testing...... 100
9.1 Introduction...... 100
9.2 Test methods...... 100
9.2.1 Test material...... 100
9.2.2 Test conditions...... 101
9.2.3 Standard tests...... 101
9.2.3.1 Standard procedures...... 102
9.2.3.2 Realism...... 102
9.2.3.3 Information content...... 102
9.3 LSP enhancement...... 103
9.3.1 Testing requirements...... 103
9.3.2 Type of test...... 103
9.3.3 Design of test...... 104
9.3.3.1 Speech...... 104
9.3.3.2 Noise...... 105
9.3.3.3 Test organisation...... 106
9.3.4 Test results...... 106
9.4 Selective amplification...... 107
9.5 Formant sharpening and broadening...... 108
9.5.1 Formant shaping tests...... 109
9.5.2 Test results...... 110
9.6 Speech enhancement system testing...... 111
9.6.1 The DRT test...... 112
9.6.2 Test arrangement...... 113
9.6.3 Enhancement...... 114
9.6.4 Test results...... 115

10 Summary...... 119
10.1 Discussion of results...... 119
10.2 Runtime system...... 121
10.3 Practical considerations...... 122

11 Conclusion...... 125

Appendix 1: Speech and hearing...... 127
A1.1 Speech amplitude by phoneme...... 127
A1.2 Speech formants...... 128
A1.3 A-weighting and equal loudness...... 129
A1.4 The Bark scale...... 130

Appendix 2: CELP coding...... 131
A2.1 CELP as a collection of algorithms...... 131
A2.2 CELP algorithms...... 131
A2.2.1 LTP filter...... 131
A2.2.2 LPC filter...... 132
A2.2.3 Perceptual error weighting filter...... 132
A2.2.4 Pitch extraction...... 133
A2.2.5 Reflection coefficient calculation...... 135
A2.2.6 LPC coefficient calculation...... 137
A2.2.7 LSP representation...... 137
A2.2.8 Codebook search loop...... 138
A2.3 Alternative representation...... 139
A2.3.1 Matrix representation...... 139
A2.3.2 Simplified CELP...... 140

Appendix 3: Line Spectral Pairs...... 142
A3.1 Generation of LSPs from LPC coefficients...... 142
A3.2 Generation of LPC coefficients from LSPs...... 145
A3.3 LSP simulation...... 147

Appendix 4: Implementation of the classifier...... 149
A4.1 Formant detection...... 149
A4.1.1 Differentiation of LPC spectrum...... 149
A4.1.2 Solving LPC polynomial...... 149
A4.1.3 Spectral peak picking...... 150
A4.1.4 LSP-based formant detection...... 151
A4.2 Noise detection...... 151
A4.2.1 Two-tone...... 151
A4.2.2 Wailer...... 152
A4.2.3 Yelper...... 152
A4.2.4 Methods of siren detection...... 153
A4.2.5 Siren noise detection...... 155
A4.3 Speech classification...... 155

Appendix 5: Intelligibility...... 158
A5.1 Speech understanding...... 158
A5.1.1 Measurements of intelligibility...... 158
A5.1.2 Contextual information, redundancy and vocabulary size...... 158
A5.1.3 Background noise...... 160

Appendix 6: Test results...... 162
A6.1 LSP effectiveness tests...... 162
A6.2 Formant sharpening/broadening...... 163
A6.3 DRT test results...... 163
A6.4 DRT test contents...... 164
A6.5 LSP measure evaluation tests...... 166
A6.5.1 Noise 0 correlation plots...... 169
A6.5.2 Noise 1 correlation plots...... 171
A6.5.3 Noise 4 correlation plots...... 173
A6.5.4 Variance between noise-free and noise 1 condition...... 174
A6.5.5 Variance between noise-free and noise 2 conditions...... 176
A6.5.6 Variance between noise-free and noise 4 condition...... 177
A6.6 Extrapolation from DRT results to predict intelligibility...... 179

Appendix 7: Practical Considerations...... 181
A7.1 Speech enhancement additions to CELP...... 181
A7.1.1 Interior noise analysis...... 181
A7.1.2 Speech analysis...... 182
A7.1.3 Hearing model...... 183
A7.1.4 Formant detection...... 183
A7.1.5 Intelligibility measure...... 183
A7.1.6 Expert system...... 184
A7.1.7 Speech adjustment...... 184

Appendix 8: Publications...... 186
A8.1 Electronics Letters...... 186
A8.2 DSP'97...... 188
A8.3 DSP'97 poster...... 192

References...... 193

List of figures and tables

Figures

2.1 Articulation of amplitude-normalized speech clipped at three levels...... 5
2.2 Long-time averaged speech power distribution, showing average formant location...... 7
2.3 Effect of frequency range on the syllable articulation of speech...... 7
2.4 Spectrum of speech operated on by perceptual weighting filter...... 10
2.5 Variation of intelligibility with amplitude...... 12
2.6 Illustration of the masking effect generated by a tonal noise...... 14
2.7 Spectral analysis of tyre noise...... 17
2.8 Variation of two measures of interior noise with speed...... 18
2.9 Noise spectrum of the interior of various cars...... 19
2.10 Low frequency spectrum of interior noise...... 20
2.11 Changes in lorry interior noise spectral power caused by opening a window...... 21

3.1 Block diagram of a simple CELP algorithm...... 26
3.2 Human voice production apparatus compared to CELP functions...... 28
3.3 Precursory noise due to an inter-frame amplitude step...... 31

4.1 Selective amplification of the word “catastrophe”...... 32
4.2 Illustration of a masked formant being sharpened to rise above masking noise...... 33
4.3 Illustration of the spectrum of a masked speech utterance and formant shifting...... 35

P2.1 Enhancing CELP system block diagram...... 37

5.1 LPC-derived spectrum and corresponding line spectral frequencies...... 39
5.2 Method used to visualise LSP relationship to LPC spectrum...... 40
5.3 Method used to visualise the spectral effect of altering LSPs...... 40
5.4 Reference spectral plot with overlaid LSP values...... 42
5.5 Spectrum obtained by widening first pair of lines...... 43
5.6 Spectrum obtained by applying upward frequency shift...... 43
5.7 Spectrum obtained by increasing frequency of line 7...... 43
5.8 Spectrum obtained by shifting line spectral pair 7/9 upwards...... 44
5.9 The effect of decreasing the separation of widely-spaced lines...... 44
5.10 Spectrum obtained by adding two lines to the set of test LSPs...... 45
5.11 A diagram of one channel of a CELP communications system...... 47
5.12 Amplitude-normalized plots of original and LSP processed speech...... 48
5.13 Effective LSP upward scaling factor for linear and Bark-based shifts...... 50
5.14 Speech waveform, manual analysis and corresponding LSP tracks...... 51
5.15 Gross change in LSP value over time...... 52
5.16 LSP deviation from median position over time...... 53
5.17 Average value of LSP within each frame...... 53
5.18 Count of LSPs per frame above median value...... 54

6.1 How different CELP parameters vary with a typical speech utterance...... 59
6.2 Speech waveform alongside output of four formant detection routines...... 63
6.3 Comparison of critical-band spreading functions from various authors...... 68
6.4 Equal-loudness preemphasis function...... 69
6.5 40 equal Bark width convolved critical band filters...... 70
6.6 Masking level from spectrum for 16 constant Bark width subbands...... 71
6.7 Audibility of frequency shifted spectral peaks with respect to masking noise...... 72

7.1 A block diagram of the enhancing CELP system...... 73
7.2 Selective amplification speech enhancement scheme amplification factor...... 75
7.3 Rules for enhancement type selection based on speech and noise types...... 76
7.4 Formant shifting being applied to two formant tracks in siren noise...... 77

8.1 Simplified CELP structure used for enhancement simulation...... 80
8.2 LSP vote measure average and standard deviation against phoneme type...... 83
8.3 Average, maximum and minimum LSP vote measures per phoneme class...... 84
8.4 ZCR measure against frame power measure...... 87
8.5 LSP measure against frame power measure...... 88
8.6 ZCR measure against frame power measure for 9dB SNR...... 89
8.7 LSP measure against frame power measure for 9dB SNR...... 90
8.8 LSP measure against frame power measure for 0dB SNR...... 91
8.9 Speech classification result for a test waveform...... 92
8.10 Hearing model output comparing white noise and a single tone...... 93
8.11 Frequency spectrum obtained from vehicle interior noise simulator...... 94
8.12 Spectrum correction for microphone frequency response...... 95
8.13 Waveform from engine compartment test recording...... 95
8.14 Correlogram of engine compartment recording...... 96
8.15 Enhanced periodic waveform...... 96
8.16 Simulation model of engine noise waveform...... 97
8.17 Waveforms of actual and simulated engine noise...... 97
8.18 Comparison of simulated and real engine noise correlograms...... 98
8.19 Frequency response of simplified vehicle interior noise...... 99

9.1 Original power spectrum and CELP adaptive postfiltered spectrum...... 108
9.2 Original power spectrum and spectrum resulting from LSP narrowing...... 108
9.3 Percentage of correctly identified vowels for the given LSP separation change...... 110
9.4 Average improvement factor in intelligibility caused by enhancement...... 116

10.1 Illustration of a CELP communications system...... 121

Tables

2.1 A summary of typical police vehicle noise amplitudes...... 22

9.1 LSP scaling factors, and alteration in LSP separation resulting from these...... 109
9.2 Improvement factor in intelligibility in the six feature classes...... 117

10.1 Derivation of combined speech enhancement measure...... 120
10.2 Estimated processing requirement for speech enhancement operations...... 123

A1.1 Relative intensity of components of speech...... 127
A1.2 Critical band scale and corresponding frequency in Hz...... 130

A6.1 Results of intelligibility testing for LSP processed vowel sounds in noise...... 162
A6.2 Individual and average correct response rates for 14 listening trials...... 163
A6.3 Listeners, overall correct response average and response average...... 164
A6.4 Consonants used in the DRT test and their feature categories...... 164
A6.5 The 96 alternative words comprising the DRT test, by feature difference...... 165
A6.6 Noise conditions for Spearman rank calculations...... 166
A6.7 Spearman ranking coefficient for the tested conditions...... 167
A6.8 Spearman ranking coefficients for noise degraded measures...... 168
A6.9 DRT speech feature categories and the TIMIT phonemes that comprise them...... 180
A6.10 Relative frequency of each speech feature category...... 180

Glossary of terms

ACELP Algebraic Code Excited Linear Prediction
AMDF Average Magnitude Difference Function
ANSI American National Standards Institute
audibility the degree to which a sound can be distinguished by a listener
Bark a frequency scale with units of equal perceptual relevance
CCITT International Telegraph and Telephone Consultative Committee (now known as the ITU)
CELP Code Excited Linear Prediction
cepstrum the signal obtained by inverse Fourier transforming the log-magnitude of a Fourier transformed signal (where the logarithmic action is to convert a convolution of signals to an addition)
DAM Diagnostic Acceptability Measure
DCC Digital Compact Cassette (© Philips)
DMCT Diagnostic Medial Consonant Test
DMOS Degradation Mean Opinion Score
DRT Diagnostic Rhyme Test
DSP Digital Signal Processing
ETSI European Telecommunications Standards Institute
FIR Finite Impulse Response
formant the spectral peaks in speech due to vocal tract resonances
fricative a consonant formed through a constriction in the vocal tract
FS1015 US Department of Defense standard LPC speech coder
FS1016 US Department of Defense standard CELP speech coder
Fx Formant number x
G.728 CCITT standard CELP coder
glissando musical term for a continuous slide of musical notes upwards or downwards in frequency
GSM Groupe Speciale Mobile (renamed in 1994 to Global System for Mobile communications)
IIR Infinite Impulse Response
IPA International Phonetic Alphabet
ITU International Telecommunication Union
LPC Linear Prediction Coefficients
LSP Line Spectral Pair(s)
LTP Long Term Predictor

masking the process whereby a sound may be inaudible in the presence of a second sound of similar frequency and amplitude
MiniDisc miniature magneto-optical audio recording format (© Sony)
MFLOPS Million FLoating point Operations Per Second
MIPS Million Instructions Per Second
MOPS Million Operations Per Second
MOS Mean Opinion Score
MRT Modified Rhyme Test
MSE Mean-Squared Error
PARCOR Partial Correlation
PB Phonetically balanced
PEWF Perceptual Error Weighting Filter
phoneme a single unit of speech sound
POW abbreviation for frame power measure
psychoacoustic relating to the difference between a measured and a subjective audio experience
RPE Regular Pulse Excitation
SEGSNR Segmental Signal-to-Noise Ratio
slur musical term for a frequency slide between different musical notes
SNR Signal-to-Noise Ratio
SpAT Spelling Alphabet Test
SPL Sound Pressure Level
TETRA ETSI standard: Trans-European Trunked Radio system
voiced to add a vocal cord contribution to a sound
unvoiced a sound produced without a pitch element
TIMIT a standard database of American English speech recordings
VSELP Vector Sum Excited Linear Predictor
ZCR Zero-Crossing Rate

1 Introduction

This thesis describes a system for automatically adjusting speech, transmitted through a CELP codec, into an environment containing high levels of interfering acoustic background noise. The aim of the enhancement is to improve the intelligibility of the transmitted speech for a listener located in that environment.

The CELP speech codec is a method for compressing the information present in a speech signal for storage or transmission, and later reconstruction. CELP relies upon an internal structure similar to the human vocal apparatus to analyse speech, and thus compresses the speech signal into parameters, each of which encodes a separate aspect of the speech content, resulting from one of several vocal production stages.
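By way of illustration only (the grouping and names below are generic rather than those of any particular CELP standard), the parameters transmitted for each frame of speech might be summarised in code as:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class CelpFrame:
        """One frame of CELP transmission parameters (illustrative names;
        real coders differ in detail and in bit allocation)."""
        lsps: List[float]      # vocal tract filter shape, coded as line spectral pairs
        ltp_lag: int           # long-term predictor (pitch) delay, in samples
        ltp_gain: float        # long-term predictor gain
        codebook_index: int    # index of the chosen stochastic excitation vector
        codebook_gain: float   # gain applied to that excitation

Each field corresponds loosely to one stage of vocal production: the codebook excitation to the sound source, the long-term predictor to pitch, and the line spectral pairs to the vocal tract filter.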

Speech reconstructed from CELP analysis will, like unprocessed speech, be unintelligible when listened to in the presence of high levels of acoustic background noise. This situation commonly arises due to the widespread use of CELP coding for mobile communication systems such as public service vehicle radios and mobile telephones.

When both the listener and speaker are located in high levels of acoustic background noise, the speaker will naturally adjust his or her voice to enable the listener to understand. In a situation where the listener is located in a noisy environment but the speaker is in quiet, the speaker's voice is unsuited to the listener's environment. Speech enhancement can be considered to be a method of automatically adjusting the speaker's speech to suit the conditions of the listener.

It is useful to define bounds on the speech enhancement system, and this is done with the premise that the speech enhancements developed in this thesis should integrate closely with the CELP coder, and should be tested to enhance speech intelligibility in a particular environment. This target environment is a vehicular communication system where speech is transmitted from a quiet base station to the acoustically noisy interior of a police vehicle. When considering the target environment in this thesis, the person located in the vehicle is termed the listener, and the person in the quiet environment, the speaker, after their usual roles for speech enhancement purposes. In fact, the listener may also speak. This is important in that his speech alters the characteristics of the acoustic noise in the target vehicle.

One of the latest CELP variants designed for public service use, the TETRA codec, is considered to be the speech coder in use in the target environment. Methods described in this thesis are matched against the target environment, as are the tests by which the performance of those methods is judged.

Part I of this thesis describes relevant background information relating to methods of enhancing speech, situations requiring enhancement, background acoustic noise, the speech production and understanding mechanisms and speech compression using CELP. Chapter 4 then ends part I by discussing further methods of enhancing speech, both existing and proposed.

Part II discusses the basic processes required to implement the chosen enhancement methods of part I, in particular introducing the novel LSP processing method. Chapter 6 proposes methods of analysis that are required to enable automated speech enhancement to occur, and chapter 7 gathers this information together to create and describe a speech enhancing CELP system.

Part III of this thesis tests and analyses the processes developed in part II, and describes experimental procedures used to characterise the performance of a fully integrated enhancing CELP structure. The final outcome of the testing procedure, given in chapter 10, is a prediction of speech enhancement performance, and a description of the final speech enhancing CELP structure.

Chapter 11 concludes by relating the speech enhancing CELP coder to the target environment, outlining the novel aspects of the system, and its potential for application elsewhere.

PART I: Investigation and Context

This section begins by laying the foundations upon which the speech enhancement algorithms will later be based. In the search for a method of automatically adjusting speech output from a CELP coder in order to improve its intelligibility to a listener in an acoustically noisy environment, these chapters introduce CELP, the noisy environment, and methods of enhancing speech.

In chapter 2, existing methods of speech enhancement, and their relationship to the human vocal or speech comprehension systems, are described, and further suggestions are made. The acoustic background noise of the target environment is then considered.

Chapter 3 introduces the CELP algorithm, with particular reference to those parts of the CELP algorithm upon which speech enhancements will rely, or with which they can be integrated.

Chapter 4 then summarises the methods of speech enhancement discussed in chapter 2, relating these to the target environment and the CELP algorithm, in order to choose a subset for further investigation. The analysis and control requirements of each of the chosen speech enhancement methods are discussed, and will be considered further in part II of this thesis.

2 Candidate speech enhancement methods

2.1 Definitions and background

It is useful before embarking upon a description of speech enhancement methods to define speech intelligibility. Intelligibility, the ability of the speech to impart information, must be clearly distinguished from quality, a measure of how pleasant a sound appears to a panel of listeners [104]. The ANSI definition of intelligibility is followed in this thesis:

"That property which allows units of speech to be identified. Intelligibility over a speech communication system is that property which allows trained listeners to receive and to identify speech spoken by trained talkers or by a speech coder when the talkers or that coder and the listeners are connected by a speech communication system." [3]

Considering the above definition of intelligibility, speech enhancement may be defined as any process which causes intelligibility to increase in a given situation. This chapter is mainly concerned with methods of speech intelligibility enhancement, discussing existing methods and proposing alternatives.

2.2 Existing methods of enhancement

In recent years much research has been conducted in the field of noise reduction; however, this tends to address the different requirement of reducing, or attempting to remove, noise from noise-contaminated speech. Where an acoustically noisy listening environment is encountered, research has centred on adaptive noise cancellation techniques [22] rather than the converse: speech intelligibility strengthening techniques.

Very little research has been conducted on the adjustment of speech in order to improve intelligibility. It is still possible however to consider research in related fields such as adaptive noise reduction and noise removal, and in the field of CELP coding itself, where many individual techniques will apply to the speech-alteration case.

The outcome of much of the research relevant to speech enhancement is discussed in the following subsections.

2.2.1 Direct speech modification

In normal speech, vowels are spoken with an average amplitude approximately 12dB greater than that of consonants [113] and, in fact, without exception, vowels are spoken with more power than consonants. For average speech, the range of intensity for phonemes is around 28dB, as shown in table A1.1 in appendix A1.1.

The fact that vowels are louder than consonants is rather surprising considering that consonants convey more intelligibility than vowels. The relative information carrying content of vowels compared to consonants may be demonstrated by speaking a sentence such as "The yellow dog had fleas", firstly with all consonants spoken the same (something like "Tte tettot tot tat tteat") and secondly with all vowels spoken the same ("Tha yallaw dag had flaas")! Both sentences sound a little unusual, but the second is more understandable than the first - hence consonants can be demonstrated as carrying a greater proportion of information (this idea was taken from a similar sentence presented by Her Majesty's Government Communications Centre audio group in an internal demonstration tape).

Of course this does not mean that only consonants should be considered as candidates for enhancement: words like ‘bait’, ‘boat’, ‘bite’ and ‘beat’ can only be differentiated by their vowels. In this example, vowels convey all of the information regarding the difference between words, so although consonants usually convey more information, this is not always so.

However, we can say that the more information-rich class of phonemes is, on average, 12dB lower in amplitude than the less information-rich class. The clipping process normalizes the amplitudes of all classes of speech, effectively improving intelligibility in a similar manner to amplifying consonants by 12dB. Unfortunately, the side effect of clipping is substantial distortion, which tends to reduce the intelligibility gain slightly.

Later research extended the clipping idea by high-pass filtering the speech signal prior to the clipping process [74][116][117]. For a 1.1kHz cutoff HPF with 12dB/octave response, 0dB signal-to-noise ratio and 90dBSPL total sound amplitude, around 7dB improvement is noted over non-high-passed results [116] (dBSPL being the ratio of the sound pressure to that of the average threshold of hearing for 1kHz tones, 0.00002Pa = 0dBSPL [53]).
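For reference, the dBSPL scale quoted above follows the standard sound pressure level relation, using the stated 0.00002Pa threshold as its reference pressure:

    $L_{\mathrm{SPL}} = 20 \log_{10}(p / p_0)\ \mathrm{dB}, \qquad p_0 = 0.00002\,\mathrm{Pa}$

so that a 1kHz tone at the average threshold of hearing measures 0dBSPL.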

The effect of the high-pass filtering (combined with the subsequent renormalization) is to reduce the amplitude of the first formant (formants are discussed in section A1.2) with respect to the second and higher formants. As the second and higher formants together convey more intelligibility than the first formant, there is thus a greater intelligibility transmitted on average per unit energy if the power of the first formant is attenuated.

Figure 2.2: Long-time averaged speech power distribution, constructed from examination of figures in [83] and showing ranges of average formant location from data in [127].

Fig2.2 illustrates the average energy distribution of speech, with the frequency ranges of the first three formants (F1, F2 and F3) overlaid. In fact 84% of speech energy is located below 1kHz [127], and this peak energy corresponds closely with the first formant frequency range.

Next consider the graph of fig2.3, which demonstrates the relative information carrying content of the speech signal. An analysis of fig2.3 reveals that if a speech signal were low-pass filtered at 1kHz, around 25% of speech syllables would be recognisable. If it were high-pass filtered at 1kHz, around 90% would be recognisable.

Figure 2.3: Effect of frequency range on the syllable articulation of speech. Graph reproduced from notes presented in a seminar given by the HMGCC audio group.

Fig2.2 and fig2.3 together illustrate that much of the speech signal energy is conveyed by F1 (the first formant) but that if these frequencies were removed, approximately 90% of speech

would still be intelligible. Thus the high-pass filtering process and subsequent renormalization effectively provides more intelligibility per unit energy, and thus higher overall intelligibility.

Combining the two methods as filter-clipping therefore improves speech both by redressing the amplitude-intelligibility imbalance between vowels and consonants, and by redressing that between F1 and the higher formants. Many modifications have been made to these basic techniques by different authors, but these are mostly limited to improving the clipping process [60][74] or the filtering step [117].

2.2.2 Enhancing CELP

Since the CELP speech compression system was introduced in 1985 [98], there have been many additions and alterations for enhancement. CELP will be described further in chapter 3, but it can be noted here that CELP usually relies upon a perceptual error weighting filter (PEWF: see section A2.2.3). The PEWF works by concentrating energy around the formant regions, effectively emphasising this important part of the speech signal [119][35].
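Although the details vary between coders, the PEWF is commonly written in terms of the LPC analysis filter with two bandwidth expansion factors; the following is a general textbook form rather than necessarily the filter used in any specific coder discussed in this thesis:

    $W(z) = \dfrac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}, \qquad 0 < \gamma_2 < \gamma_1 \le 1$

Scaling $z$ by the $\gamma$ factors moves the LPC poles and zeros radially towards the origin, so that the weighting follows, and concentrates error energy around, the formant structure described by $A(z)$.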

The PEWF, although usually used within the CELP structure, has also been applied successfully to the speech output from the CELP decoder in order to improve quality [16].

Although perceptual weighting of the CELP output has been shown to improve quality, there is no corresponding increase in intelligibility [16]. More advantageous in terms of intelligibility would be a perceptual weighting filter that attenuated F1 and amplified the higher formants, as described in section 2.2.1. However the importance of the PEWF lies in its ability to modify the speech spectrum in relation to the formants.

2.2.3 Noise removal techniques

The most commonly applied method for removal of noise from noise-corrupted speech is spectral subtraction [11]. In this method, an estimate is generated of the frequency spectrum of the interfering noise, which is subtracted from the noisy speech spectrum to give an estimate of the clean speech spectrum. The noise frequency estimate may be generated from an analysis of the noise signal during non-speech activity - assuming of course that the noise spectrum remains fairly stationary throughout the speech signal.
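A minimal sketch of the method follows, in Python; the windowing, spectral floor and single-frame scope are illustrative choices rather than those of [11]:

    import numpy as np

    def spectral_subtract(noisy_frame, noise_mag, floor=0.02):
        """Magnitude spectral subtraction on one frame of noisy speech.
        noise_mag is a magnitude spectrum (length n//2 + 1) estimated
        during non-speech activity; the noisy phase is retained, and a
        spectral floor prevents negative magnitudes."""
        n = len(noisy_frame)
        spectrum = np.fft.rfft(noisy_frame * np.hanning(n))
        mag, phase = np.abs(spectrum), np.angle(spectrum)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        return np.fft.irfft(clean_mag * np.exp(1j * phase), n=n)

The noise_mag estimate would be refreshed whenever a speech activity detector reports a non-speech period, which is why the detection and estimation routines discussed below matter to this thesis.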

Although the spectral subtraction process is not directly relevant to speech modification, both the speech activity detection and the noise spectrum estimation are useful. For a CELP-based speech-modification system working in a vehicle, it is essential to measure the

ambient noise within the vehicle (to determine audibility and how to alter the speech), and this must be performed when the vehicle occupants are not talking. Thus attention must be paid to the development of speech detection and spectral estimation routines such as those used for spectral subtraction.

Further consideration will be given to speech detection and noise spectrum estimation in chapter 6.

2.2.4 Techniques developed for hearing-impaired listeners

Although any general speech intelligibility enhancement mechanism would be designed to improve intelligibility for average-hearing listeners, there has been little direct research to date in this field. However, much research has been conducted by the medical community on the modification of speech for the benefit of hearing-impaired listeners, and on the enhancement of the intelligibility of that speech. Whilst the type of speech adjustment may be inappropriate for normal-hearing listeners, the methods and assumptions used are still valid.

There are two classes of enhancements for hearing-impaired listeners: those that are designed to reduce the perception of noise in noisy speech signals, and those that adjust speech itself to improve intelligibility. The former are usually targeted at processing systems located in hearing aids, and may include spectral subtraction [11] or other methods of noise reduction [45]. The latter may reside in hearing aids, telephony systems or in cochlear implants, and adjust speech through modification of formants [10][2][95][109] or by altering amplitude scales [26].

Schaub and Straub [95] reported the use of a perceptual weighting filter (as mentioned in section 2.2.2) using linear prediction parameters to filter speech in order to ‘sharpen’ the formants by increasing energy around them, and reducing energy in the spectral valleys between formants - spectral sharpening. Fig2.4 shows the results of a simulation of their perceptual weighting filter on a period of voiced speech.

Figure 2.4: Spectrum of a) unfiltered and b) filtered test speech frame as operated on by c) perceptual weighting adaptive postfilter function.

In fig2.4, lines a and b, the unfiltered and filtered speech spectra, have both been normalized with respect to amplitude, whilst c shows the alteration induced by perceptual weighting, which is to concentrate more power in the formant regions and, incidentally, to amplify the high-frequency region of the spectrum. It must be noted that the simulation leading to fig2.4 did not replicate the complicated adaptive normalization process used in [95], the effect of which was to ensure that all filtered speech was of higher peak amplitude than unfiltered speech: a factor which must be considered alongside their claims of speech enhancement. It is likely that some degree of enhancement would be noted by just using their amplitude normalization scheme and omitting their enhancement scheme.

The perceptual weighting technique outlined above was designed for commercial use in hearing aids, and particularly for people with reduced frequency selectivity. A similar scheme has been attempted by Alcantara et al. [2], who additionally tested listeners with normal hearing. The scheme adopted was similar in intent to [95]; however the formant adjustment was made by means of bandpass filters located at the formant centre frequencies.

Previous experimental results reported in [2] indicate that spectral sharpening of speech for normal hearing listeners generally reduces intelligibility unless the level of noise added to the processed signal is increased, to give signal-to-noise ratios of 6dB and below. Other results indicated that both spectral broadening (the inverse of sharpening) and spectral sharpening reduce speech intelligibility when that speech is listened to in a noise-free environment.

Alcantara et al. [2] reported using two bandpass filters dynamically located at the frequencies of F1 and F2 (the first and second formants), and adjusted to attenuate the signals outside of the two formant regions. The filter Q settings were identical, resulting in a larger bandwidth for F2 than F1. This is unfortunate since in general F1 has a larger bandwidth than F2 [115], and is probably unintentional considering the following note in their paper: "Naturally occurring formants are typically ^ to £ octave bandwidth for f1 and ^ to ^ octave for f2" [2].

The results presented in [2] for consonant-vowel-consonant and vowel-consonant-vowel tests (described later in section 9.2.3) are mixed, but indicate that formant adjustment can improve the recognition of vowel sounds when listened to in noise (and also in quiet conditions for hearing-impaired listeners). The case for consonant intelligibility improvement is less clear: a statistically insignificant reduction in intelligibility was noted.

2.3 Enhancement methods

Section 2.2 discussed known techniques for speech intelligibility enhancement. These are extended further in this section where speech, hearing, psychoacoustic features and the CELP coder are investigated in order to discover opportunities for enhancement. Some methods of achieving these enhancements are also discussed here.

2.3.1 Amplification

The most obvious method of negating the effects of background noise is that of amplifying the speech so that the speech-to-noise level is increased. Whilst this approach is valid (after all, it is similar to the natural human response of raising the voice when talking in the presence of background noise), it must be noted that it is more important to compare the level of the information carrying part of the signal with the noise at the corresponding frequencies, rather than to compare average levels. For non-white noise, a measurement of total power could disguise the fact that the noise may consist of large components at certain frequencies combined with a low noise floor. Thus, analysis of noise amplitude in the frequency band between about 800Hz and 3kHz, and suitable amplification to give a reasonable value of weighted signal-to-noise ratio, would be advantageous [99].
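A sketch of such a band-weighted measurement is given below, in Python; the band edges follow the text, while the use of simple FFT power sums over time-aligned speech and noise segments is an illustrative assumption:

    import numpy as np

    def band_snr_db(speech, noise, fs, f_lo=800.0, f_hi=3000.0):
        """Signal-to-noise ratio in dB computed only over the frequency
        band carrying most speech intelligibility, so that strong noise
        components outside the band do not dominate the measure."""
        assert len(speech) == len(noise)
        freqs = np.fft.rfftfreq(len(speech), d=1.0 / fs)
        band = (freqs >= f_lo) & (freqs <= f_hi)
        p_speech = np.sum(np.abs(np.fft.rfft(speech))[band] ** 2)
        p_noise = np.sum(np.abs(np.fft.rfft(noise))[band] ** 2)
        return 10.0 * np.log10(p_speech / p_noise)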

Preferable would be a determination of actual formant frequencies and then an analysis of the speech-to-noise ratio in small frequency bands centred on those formant frequencies. Suitable filtering would then ensure that speech-to-noise levels at those important frequencies were maintained.

Amplification of speech must always operate with consideration to the variation in

intelligibility with absolute intensity. This is because as the amplitude of a speech signal is increased above a certain limit (whilst maintaining constant speech to noise ratio), intelligibility does not continue to increase [53].

Figure 2.5: Variation of intelligibility (measured as percentage of spoken words correctly identified) with amplitude. Data obtained from examination of figures in [53] and [55].

The graph of fig2.5 indicates that for the three example levels of speech SNR, as the speech amplitude is increased above about 75dB, intelligibility begins to decrease. In fact, when SNR is 6dB and speech amplitude is 80dB, a further increase of 6dB in speech amplitude to combat a 6dB rise in noise level results in lower intelligibility (78%, from the 6dB SNR curve at 86dB speech amplitude) than if the speech amplitude were not increased (80%, from the 0dB SNR curve at 80dB speech amplitude).

Any process must strive to ensure that the absolute amplitude of processed speech lies within the area of maximum intelligibility shown in fig2.5. As soon as amplification is required to give a certain speech-to-noise level that places the speech within the region of reducing intelligibility, it may be preferable to lower the speech-to-noise level target, and thus the amount of amplification, to improve intelligibility.

2.3.2 Spectral modification

Background noise having a steeply sloping spectral shape may be negated in part by applying a corresponding amplification-filtering operation to the speech, such as those described in section 2.2.4. The limitations are that too much shaping of the speech spectrum will reduce quality and intelligibility, and perhaps even cause the speech to alter so much that the brain cannot recognise the sounds as speech.

A more general approach is to filter the speech so that the power of the information carrying part of the speech is increased in relation to other less essential frequencies. This may be

accomplished by using methods to strengthen the formant frequencies (section 2.2.2), or may involve simple non-adaptive filtering to increase the amplitude of the all-important 800Hz to 3kHz frequency band within which most speech intelligibility resides (section 2.2.1).

We have noted that much of the intelligibility in speech is conveyed by formants, and thus an algorithm to detect formants [64][76][134], and to filter the signal so as to strengthen them, is likely to improve intelligibility in the presence of noise. Such algorithms are in general use within many CELP coders as the perceptual error weighting filter, as covered in sections 2.2.2 and 2.2.4 and described in section A2.2.3.

Spectral modification of formants, as described here, includes both formant sharpening and its converse, broadening or widening. However one further class exists: adjusting the frequency of formants, a technique developed for voice-changing applications [105].

Some further considerations follow:

2.3.2.1 Masking

It is important here to introduce the concept of masking. Masking in general is defined by the American standards agency as: "(1) The process by which the threshold of audibility for one sound is raised by the presence of another sound. (2) The amount by which the threshold of audibility of a sound is raised by the presence of another sound."

The frequency selective function of the basilar membrane [49][127] within the inner ear may be considered similar to a bank of bandpass filters, with the threshold of audibility in each filter being dependent upon the noise energy falling within its passband [10]. The filters each have similarly shaped responses, with bandwidths of around 100Hz up to frequencies of approximately 1kHz. Above this frequency, bandwidth increases in a linear fashion with frequency, up to a 3kHz bandwidth at 10kHz. Each logical filter is termed a critical band [99].
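One widely used analytical approximation to this critical band (Bark) mapping is that of Zwicker and Terhardt; it is quoted here as a representative choice, since reported hearing models differ in the exact warping they adopt:

    $z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\!\left((f/7500)^2\right)\ \mathrm{Bark}$

with $f$ in Hz. This reproduces the roughly 100Hz wide bands below 1kHz and the progressively widening bands above.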

For a given tone having fixed amplitude and frequency, the sensitivity of the ear to other coincident tones of similar frequency is reduced. It has been possible for many authors to derive logical models of this masking process, of differing degrees of complexity and ability [18][19][33][63][99][101][121][124][135].

For sounds whose bandwidth falls entirely within one critical band, the perceived loudness of that sound is independent of its bandwidth. However, for sounds with bandwidth greater than one critical band, the perceived loudness depends very strongly on the proportion of the sound's bandwidth falling within one critical band. Thus, in general, a complex sound having components in more critical bands sounds louder than a complex sound with components in fewer critical bands [70]. Such a sound will thus be more intelligible in a given degree of acoustic background noise.

Figure 2.6: Illustration of the masking effect generated by a tonal noise.

Fig2.6 illustrates that a tonal noise of given amplitude generates a masking effect in the immediate frequency region of the tone, extending beyond it with an effect that reduces as the distance from the generating tone increases. If another tone were introduced whilst an average listener was hearing the tone shown, then the peak of the second tone would have to be outside of the masking region to be heard.

The effect of tonal masking, as described above, is to cause a sound to be hidden by another sound of similar frequency and greater amplitude [7]. As formants are extremely important to vocal communications, it is reasonable to expect that slightly altering the position of a formant that is to be masked by noise will improve its audibility, and thus its intelligibility.

Apart from the obvious methods of fixed or adaptive filtering, targeted spectral modification can be accomplished by applying well-established methods such as LPC-pole adjustment [105], or by developing alternative methods (see chapter 5).

2.3.3 Clipping and selective amplification

As discussed in section 2.2.1, consonants convey more information than vowels, and yet are 12dB lower in amplitude on average. If it were possible to amplify consonants to equalise their amplitude with respect to vowels, an improvement in intelligibility should result.

The clipping process developed during World War II, and described in section 2.2.1, is such a system; however the effects of this intelligibility enhancement include severe quality degradation and a reduction in the ability of listeners to identify speakers [59][60][74][116].

Amplifying consonants whilst not altering vowel amplitude would be expected to increase intelligibility [81] in a similar manner to clipping, but would not suffer the same degree of quality degradation. Such a proposed system would require a speech classifier to identify consonant and vowel periods, and an amplitude tracker to provide an adaptive level at which to normalize consonant amplitudes. Periods of non-speech need to be identified to prevent amplification of noise, and a smoothly varying attack and decay to amplification periods is required to reduce sudden transients.

Such a process also benefits by being able to apply differing levels of amplification to speech segments such that the overall speech amplitude envelope is preserved (for example, allowing whispering or other long-term variations in speech loudness).

Segmentation of speech into different classes for selective amplification also introduces the possibility of applying alternative processes to different parts of speech. An example of this is to selectively amplify fricative and sibilant sounds so that they stand out from a background of white noise, which has a similar frequency distribution.
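A minimal sketch of the selective amplification process proposed above follows, in Python; the speech classifier and vowel-level tracker described earlier are assumed to exist and appear only as inputs, and the gain ceiling and smoothing constants are illustrative:

    import numpy as np

    def amplify_consonants(frames, is_consonant, vowel_rms,
                           max_gain_db=12.0, attack=0.3, decay=0.1):
        """Raise consonant frames towards the tracked vowel level.
        frames is a list of sample arrays; is_consonant comes from a
        speech classifier (non-speech frames must be flagged False so
        that noise is not amplified) and vowel_rms from an amplitude
        tracker. The gain is smoothed with separate attack and decay
        constants to avoid sudden transients."""
        gain, out = 1.0, []
        max_gain = 10.0 ** (max_gain_db / 20.0)
        for frame, consonant, target in zip(frames, is_consonant, vowel_rms):
            rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
            wanted = min(target / rms, max_gain) if consonant else 1.0
            rate = attack if wanted > gain else decay
            gain += rate * (wanted - gain)
            out.append(frame * gain)
        return out

Because the target level tracks the recent vowel amplitude rather than a fixed value, long-term variations in speech loudness such as whispering are preserved, as required above.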

2.3.4 Temporal methods

Speech processors often split speech recordings into frames of equal length. An evaluation of the speech power and frequency distribution over each frame thus provides a useful reference when compared to the background noise evaluated over the same period. Upon this basis, the non-temporal processing methods considered previously may be applied, allowing the processing to be closely matched to the time-varying background noise.

Certain speech processing methods exist that utilise the gaps in conversational speech to compress transmitted information. Use of these techniques for segmenting speech and performing simple cut-and-paste operations may allow a process whereby a short word or syllable, about to be obscured by a large incidence of noise, could be shifted slightly in time. The shift could place the word or syllable a few hundred milliseconds away, but still within the range found in normal speech. The likely effect, if the warping delay were limited, would be to increase intelligibility at the expense of, perhaps, a slightly unnatural rhythm to the speaker's voice.

Under normal conversational circumstances, a talker would use non-verbal clues to determine if a listener had understood or heard his/her speech, repeating or re-phrasing him or herself as necessary to ensure that the message was conveyed. For remote communication channels, non-verbal clues are not passed, and listeners must explicitly ask for information to be repeated. If all else fails, and speech enhancement is not sufficient for a listener to understand the transmitted speech, repetition will be the only solution. This requires a bidirectional communication channel.

It is possible to predict (although with limited accuracy) if a listener has not heard a message by consideration of the absolute limits to an average listener's ability, the specifics of the transmitted signal and the background noise or distortion. However the use of this information either to delay words until conditions are more favourable, or to repeat words, would require a considerable degree of processing.

2.4 Acoustic background noise

In order to consider speech enhancement methods, and develop working algorithms, an appreciation of the acoustic background noise within the target environment is essential. The target environment of a police vehicle is investigated to determine the type and degree of noise likely to interfere with speech.

2.4.1 Generation of noise

Vehicle noise arises predominantly from the tyres’ interaction with the road surface, the engine and fan, the exhaust system, the air intake, and aerodynamic or wind noise [40][86].

The noise produced by tyres is heavily dependent upon tyre type, vehicle weight and the road surface, however the graph in fig2.7 illustrates a typical tyre noise spectrum.

Note that the noise amplitude peaks between about 300Hz and 2kHz - having unfortunate consequences for speech communication, which relies on the same frequency band. The noise peaks appear to rise about 2.5dB for every 10mph increase in vehicle speed, indicating that tyre noise may become very loud at high speed.

Figure 2.7: Spectral analysis of continuous-rib tyre noise, constructed from examination of a figure in [129].

An analysis of the difference between exterior vehicle noise under maximum acceleration and that when travelling at constant speed [122] indicates that noise at frequencies between 100Hz and 1.5kHz is increased by about 20dB, with the increase being roughly constant at speeds of between 30 and 60mph. During acceleration, the throttle opens, causing air intake noise to increase and the rotational speed and firing rate of the engine to rise, while tyre noise and aerodynamic noise remain relatively constant. Thus, during acceleration, a 20dB increase in sound intensity is mostly due to engine and intake noise.

At low speeds of up to 30mph, the exhaust system is likely to contribute most to the overall noise figure, along with engine fan noise, if present.

2.4.2 Noise dynamics

Most of the research in this area has been conducted on test tracks, with values averaged over relatively long periods of time, and collected under artificial test conditions. In reality, many factors will intrude to cause the generalisations offered by the research to become invalid.

Personal observations suggest that engine noises change when the vehicle is under acceleration or the engine is under strain (perhaps travelling up a steep incline). Road surface materials, as well as tyre material, have a dramatic effect upon noise [21][126].

Under normal conditions, a vehicle may drive over manhole covers, rumble strips or white lines, all of which cause a temporary noise level increase. Wet roads are noisier, due to the impact of surface water sprayed on the inside of wheel arches and of rain hitting the surface of the vehicle; the degree of loudness increase is highly dependent upon the type of vehicle. Occupants of vehicles passing through tunnels or along cuttings will experience an increase in noise due to reflected sound from their own and other engines. Reflection of sound also occurs whilst passing other vehicles, but this is generally less significant than the direct sound originating from the other vehicle.

Our expectation is that total noise in vehicles depends to some extent upon speed; the relationship is, however, surprisingly linear, as shown in fig2.8.

Figure 2.8: Variation of two measures of interior noise with speed. Constructed from examination of a similar figure presented in [114].

The shaded areas in fig2.8 represent the range of values found at the given speeds for the 47 lorries and 68 cars tested. The A-weighted measure is described in section A1.3. The apparently linear relationship between the interior noise envelopes and speed suggests that extrapolation of these results to higher speeds may be possible. For example, we may expect to find a maximum of around 108dBSPL in a car travelling at 90mph, with a minimum result about 5dB lower.

2.4.3 Frequency distribution

Total A-weighted noise within a typical vehicle is around 65 to 85dBA at 60mph [110]; however this figure, a weighted average, does not give a true picture of the amount of noise present in the vehicle, as fig2.9 illustrates.

Figure 2.9: Noise spectrum of the interior of various cars. Constructed from data in [129].

In addition to showing the wide variation between different types of car (the shaded area), fig2.9 illustrates that even in a car whose noise rating at 70mph is a rather low 72dBA, certain frequencies may contain noise power in excess of 80dB.

It is worth noting that the British Standards Institute have developed a way of assessing noise within vehicles [13]: BS6086 requires measurement of A-weighted noise, octave, and one-third octave band noise levels, in order to evaluate speech interference levels and determine the risk of hearing damage.

The predominant displacement of noise towards low frequencies follows the tyre-noise spectrum to some extent, and may be accentuated by resonances set up in the vehicle passenger space [20]. For a 2m wide and 3m long passenger cabin, resonances would occur at frequencies of 160Hz and 83Hz. Further research has been conducted on infrasonic noise within cars as illustrated in fig2.10.

Figure 2.10: Low frequency spectrum of interior noise, constructed from data in [114].

In addition to the high levels of low frequency noise shown in fig2.10 at even quite moderate speeds, research has shown that levels of 110 to 120dB, due to random air turbulence, were present in the 2 to 32Hz frequency bands inside cars travelling with one window opened by about six inches [110]. Some studies suggest that this infrasonic noise, although supposedly imperceptible, does play a significant role in the subjective assessment of vehicle noise levels [130].

The data presented in fig2.10 has been used to construct a filter which, when operating on Gaussian white noise, produces a noise distribution similar to that found inside a vehicle. This forms the basis of a vehicle interior noise simulation, discussed in section 8.5.
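A sketch of this construction follows, in Python with scipy; the frequency/gain breakpoints are placeholders standing in for the fig2.10 data rather than the measured values, and the filter length and sample rate are arbitrary choices:

    import numpy as np
    from scipy.signal import firwin2, lfilter

    def simulated_interior_noise(n_samples, fs=8000, seed=0):
        """Gaussian white noise shaped by an FIR filter whose response
        approximates a low-frequency dominated vehicle interior noise
        spectrum. Breakpoint gains are illustrative placeholders."""
        freqs = [0.0, 50.0, 100.0, 500.0, 1000.0, 2000.0, fs / 2.0]
        gains_db = [0.0, -2.0, -10.0, -25.0, -35.0, -45.0, -50.0]
        gains = [10.0 ** (g / 20.0) for g in gains_db]
        taps = firwin2(255, freqs, gains, fs=fs)
        white = np.random.default_rng(seed).standard_normal(n_samples)
        return lfilter(taps, [1.0], white)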

Opening the window of a moving vehicle immediately creates a sound path between the exterior and interior of the vehicle. Such a path effectively presents a low impedance to sound compared with existing channels which are likely to have been treated with structural noise mitigation methods by the manufacturer. The effect of opening a side window by one inch in a lorry travelling at 50mph [130] is to increase interior noise, predominantly in low frequency regions as shown in fig2.11. Although effects will differ between lorries and cars, the shape of the frequency distribution and the degree of increase in noise are expected to be similar.

Figure 2.11: Changes in the spectral power of lorry interior noise at 50mph caused by opening a side window [130].

2.4.4 Target environment

In order to appreciate the levels, types, and subjective effects of noise, measurements were made in a typical West Midlands Police Force patrol vehicle.

Such testing is not intended to be statistically representative of such situations, but rather to provide an example of a situation in which noise levels regularly interfere with vocal radio communication, and in which the speech enhancement systems presented in this thesis should lead to improvements.

The 3.5 litre V8 Rover 800 automatic transmission vehicle in which most tests were conducted was extremely well soundproofed. Personal observation suggests that engine noise was not intrusive even at 60mph in second gear or at 100mph in fourth gear. The 200W siren was located beneath the bonnet facing forward, and was not loud inside the vehicle, even when stationary, unless a window was opened.

The radio system speakers in the vehicle were standard manufacturers' 3 to 3½ inch cone speakers located in the driver and passenger doors, and under the rear parcel shelf. Microphones were custom-fitted: handheld for the passenger and, for the driver, hands-free, located on a stalk ending just below the top rim of the steering wheel.

West Midlands police currently use three siren sounds:
• wailer: a long drawn out up and down chirp with period around 4s and frequency bound of approximately 480 to 980Hz.
• yelper: similar to the above but with a reduced period of around 0.5s and identical frequency range.


Measurements of noise within the police vehicle were made using an A-weighted noise meter located approximately in the centre of the vehicle cabin, and a DAT recorder attached to one of two arrangements of electret microphone selected via a switch. The first arrangement located a microphone at the entrance to each pinna (outer ear) of the driver, whilst the second arrangement located one microphone at the entrance to the driver's left pinna and one microphone on the surface of the radio system enclosure (itself located similarly to a factory-fitted car radio).

Table 2.1 contains a summary of the average A-weighted noise measurements obtained in the vehicle interior under the given conditions. Note that the weather was warm and dry at the time with little wind.

Condition                                      Noise (dBA SPL)
police vehicle stationary¹
  ambient noise                                50
  siren on (loudest)                           74
  and window open (loudest)                    78
police vehicle moving²
  30mph with radio call (long Tc)              70
  60mph                                        72
  70mph                                        77
  90mph + wailer                               80
  90mph + yelper                               80
  90mph + two-tone                             81
  90mph window open (all sirens)               84
motorcycle radio (max volume) & siren¹         112

Table 2.1: A summary of typical noise amplitudes noted from measurements with police vehicles. ¹ located in a walled courtyard at an inner-city police station; ² typical average for various common A- and B-road surface conditions.

It must be noted that the values given in table 2.1 are typical values for the situations given, and are for an A-weighted sound measurement (section A1.3). This means that when the car window is opened at 90mph the noise amplitude is noted as 84dB, when in fact the low frequency noise increased substantially and was actually peaking above 100dB in the 0-100Hz range, a range of frequencies whose influence on the overall A-weighted figure is small.

Points to note in particular are the amplitude increases due to the siren noise (24dB when stationary) and that due to opening the window (3 to 4dB). In addition, a simple measure of noise generated from the siren and radio test (at normal working volume) for a stationary motorcycle indicated extremely loud sound levels. Comments made at the time indicated that police motorcyclists often experienced a degree of hearing loss, both temporary and permanent, from everyday usage.

The relevance of these measurements is that noise levels in the situations shown routinely exceed levels at which conversation is normally held, and that noise levels exceed the amplitudes at which increases in speech amplitude improve intelligibility as discussed in section 2.3.1.

3 Speech compression using CELP

3.1 Introduction

This chapter begins by introducing the CELP speech compression algorithm and discusses aspects of CELP relevant to speech enhancement.

CELP is an analysis-by-synthesis speech compression algorithm that has evolved over a number of years. It is the logical conclusion of much research, collecting together various techniques of speech modelling into a coding system that provides good speech quality combined with low bit-rate transmission [98]. CELP utilises a source-filter model of speech and so the algorithmic operations can be compared to the human vocal process.

3.2 A description of CELP

3.2.1 Derivation of algorithms

CELP is derived from basic linear predictive (LPC) coders (section A2.2.2), first applied to speech compression in 1971 [4][5][96]. These rely on the fact that speech is pseudo-stationary over intervals of around 30ms [62], and that linear prediction of such a speech frame can provide a good approximation to the original speech [6][61][62][87][88]. An example of an LPC coder is the US Department of Defense developed Federal Standard 1015 algorithm of 1975. This 2400 bits/second coder is used for military communications.

Analysis of the difference between the linear predicted signal and the real speech, called the residual, reveals a waveform having a spiky shape whenever the speech under consideration is voiced [131]. The spikes or impulses have a period related to the pitch of the speech. This is because the pitch signal is generated in humans by the glottis as a blast of air resembling an impulse train [81], and the linear predictor is incapable of adapting to such sharp transients. Therefore a substantial difference exists between the actual and predicted waveforms, and this difference is predominantly caused by pitch spikes.
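A minimal sketch of this analysis follows, using the standard autocorrelation (Levinson-Durbin) method; the synthetic voiced frame, the order and the frame length are assumptions made for illustration:

    import numpy as np
    from scipy.signal import lfilter

    def lpc(frame, order=10):
        """Levinson-Durbin recursion returning A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a, e = np.array([1.0]), r[0]
        for i in range(1, order + 1):
            k = -np.dot(a, r[i:0:-1]) / e        # reflection (PARCOR) coefficient
            a = np.concatenate([a, [0.0]])
            a = a + k * a[::-1]                  # order update
            e *= 1.0 - k * k                     # prediction error update
        return a

    # Synthetic voiced frame: impulse train (glottal pulses) through a resonator.
    fs, f0 = 8000, 100
    excitation = np.zeros(240)
    excitation[::fs // f0] = 1.0
    frame = lfilter([1.0], [1.0, -1.6, 0.9], excitation)

    a = lpc(frame, order=10)
    residual = lfilter(a, [1.0], frame)          # inverse (analysis) filtering
    # 'residual' retains spikes at the pitch period: the linear predictor
    # cannot model the sharp glottal transients.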

The second generation of coders detects the presence or absence of pitch (by detecting if speech is voiced or unvoiced) and the pitch period [87]. For a frame of speech that is voiced, the decoder excites its linear predictor with a train of artificial pitch pulses having the detected period [12]. For unvoiced frames, the excitation is random noise. An example of such a speech coder is the full-rate GSM (Groupe Speciale Mobile) European Cellular

Standard of 1988.

CELP, as the next generation of speech coder, now generally uses pitch information to define a long term predictive (LTP) filter that adds pitch to the excitation signal. This operates in series with an LPC filter. Most importantly, the CELP coder contains a codebook of artificially constructed signal frames which it uses to model the vector produced when pitch is removed from the residual [87][98]. Some examples of such a coder are the CCITT G.728 standard of 1992 (designed for use in mobile telephony) [14], the US Federal Standard 1016 coder for military and police communication, and the TETRA codec [120].

3.2.2 Operating principle

Although many variants of CELP coder now exist, the basic operating principle of most of them remains the same:

1. A frame of speech is analysed to determine its pitch characteristics. These pitch signals are removed from the speech (using an inverse LTP filter), and a linear predictor generates coefficients modelled upon the remainder.
2. A large codebook, containing many subframe-sized vectors, presents each in turn for filtering. Two filters operate on each vector: the pitch filter (with parameters constructed using the results from the pitch analysis) and an LPC synthesis filter (with parameters being the just-derived linear prediction coefficients).
3. The result of filtering each codebook vector, which is a synthetic speech vector, is weighted and compared to the actual subframe of original speech.
4. The index number of the codebook vector which results in the synthetic speech vector best matching the actual speech subframe is transmitted to the decoder along with pitch, LPC and gain parameters, all of which describe the current subframe of speech.

The decoder uses an identical codebook indexed by the received codebook vector number, and the received LPC, LTP and gain parameters to recreate each synthetic speech subframe. These subframes are then joined together in a fashion which matches the subframe splitting process occurring at the coder, to produce high-quality synthetic speech.

In fact, subframes are usually overlapped by 50% or more, and due to a difference in the stationarity of the pitch and linear prediction signals in human speech, the linear prediction coefficients are usually calculated and updated less frequently than the pitch values.

3.2.3 Description of operating process

Fig3.1 shows a block diagram of a simple forward-adaptive CELP coder. The blocks outside the dotted box occur once per speech frame. The codebook search loop processes within the dotted box must occur for each codebook entry, every speech frame (i.e. 1024 times as often for a typical 1024-entry codebook).

Figure 3.1: Block diagram of simple CELP algorithm. Codebook search loop operations are shown within the dotted rectangle.

Input speech to be encoded is compared with a reference signal of synthetic speech, created by the algorithm. The difference between these two signals is called the objective error. This error is passed through a perceptual weighting filter, which adds an interpretation of human aural perception to the signal. The result, the perceptual error, is an approximate measure of how closely the synthetic speech matches the original speech to a listener's ears.

The synthetic speech is obtained by exciting two filter structures that together mimic the human vocal tract and speech production physiology. The two filters are short (LPC) and long-term (LTP) predictors. The filter excitation signal is derived by amplifying a subframe-sized vector taken from a set stored in a codebook. During operation, each vector in the codebook is filtered in turn and the results compared with the actual speech. The two filters have adaptive coefficients, updated by an analysis of the current speech frame.

The algorithm scans through each of the codebook waveform vectors, filtering and then comparing each to the original speech (but also applying a perceptual bias to the comparison). Once each code vector has been tried, the index of the one that corresponds best with the input speech is transmitted to the decoder along with LTP, LPC and gain values. Analysis then begins on the next input frame.

The comparison process subtracts the artificial from the original speech (the artificial speech having been calculated from the current codebook entry processed by LTP and LPC synthesis filters), and then filters the result using a perceptual-error weighting filter [57], which amplifies frequencies located around the speech formant regions and attenuates frequencies away from the formants. The comparison process then calculates the mean-squared error for the current frame, interpreted as a measure of how closely the two frames match each other to a listener's ears.
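The search loop just described can be sketched compactly. The following fragment is a minimal illustration only: the LTP filter and subframe bookkeeping are omitted, the perceptual weighting is assumed to take the common bandwidth-expansion form W(z) = A(z)/A(z/γ), and the codebook is random Gaussian:

    import numpy as np
    from scipy.signal import lfilter

    def search_codebook(target, codebook, a_lpc, gamma=0.8):
        """Return (index, gain) of the codebook vector minimising the
        perceptually weighted error against the target subframe."""
        a_w = a_lpc * gamma ** np.arange(len(a_lpc))   # A(z/gamma) coefficients
        best_idx, best_gain, best_err = 0, 0.0, np.inf
        for idx, code in enumerate(codebook):
            synth = lfilter([1.0], a_lpc, code)        # LPC synthesis (LTP omitted)
            gain = np.dot(target, synth) / (np.dot(synth, synth) + 1e-12)
            error = target - gain * synth              # objective error
            weighted = lfilter(a_lpc, a_w, error)      # perceptual weighting filter
            err = np.mean(weighted ** 2)               # mean-squared perceptual error
            if err < best_err:
                best_idx, best_gain, best_err = idx, gain, err
        return best_idx, best_gain

    codebook = np.random.randn(1024, 60)   # 1024 Gaussian subframe vectors
    # a_lpc would come from the frame analysis, e.g. a Levinson-Durbin routine.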

3.3 A vocal description of CELP

The CELP coder generally mimics the human vocal tract in its arrangement, and is termed a source-filter model (with the source and filter elements relating to parts of the human vocal system). This indicates that parts of the CELP algorithm, similar to parts of the human vocal system, impart distinct characteristics to the resultant speech.

The CELP codebook, which represents the lungs, contains vectors that are usually Gaussian distributed random noise. Reports [98] suggest that a Gaussian distribution matches closely the distribution found on average in the excitation air emitted from human lungs. The air pressure from human lungs is modelled by the CELP system gain parameter, which operates separately from either of the filters (this is partially because each filter operates on normalized values within a typical implementation).

The LTP filter adds pitch to the excitation signal, as do the human vocal cords to air from the lungs. The LPC filter adds a spectral shape in a similar manner to the human throat and mouth. This is illustrated in fig3.2, which shows human vocal actuators grouped in terms of CELP processing blocks.

Figure 3.2: Human voice production apparatus with a CELP function interpretation overlaid.

CELP differs from a purely human model, however, in that it processes the speech on a frame-by-frame basis rather than continuously, and the linear prediction model has been found to model air passage through the nasal cavity rather poorly (this is because the LPC filter is an all-pole filter, and the nasal cavity introduces zeros into the equivalent circuit. Although many authors agree that a zero can be modelled with two poles [90], there is nevertheless a degree of mismatch).

The dependence of CELP on a vocal model does not end with structural similarities: LPC and LTP filters are fixed over their frame periods (usually a frame of 20 to 30ms and a subframe of 5 to 6ms respectively). These periods of time are derived from the maximum rate of movement found in the muscles of the corresponding vocal tract regions. For example, the throat muscles move relatively slowly in speech, and can be assumed to be pseudo-stationary over 30ms, whereas the glottis moves quicker and pseudo-stationarity can only be assumed for around 6ms.

3.4 Speech enhancement with CELP

Speech enhancement is based upon altering speech to match the acoustic conditions in the listener’s environment. Alterations to speech are best made in a domain that is relevant to vocal communications. The CELP coder, as described in section 3.3, divides the speech signal into lung, glottis and vocal tract descriptive parameters each of which may be altered to create a speech signal that has been changed in a way relevant to spoken communication.

Furthermore, the CELP coder undertakes a thorough analysis of the spoken signal resulting in the transmitted parameters. These parameters thus encode important information concerning the speech signal, which may be examined by any enhancement system in order to modify

enhancements dynamically depending on the speech type.

Some of the aspects of the CELP coder that are relevant to speech enhancement are as follows:
1. The long-term pitch filter in CELP requires pitch timing and strength information, thus any pitch-based post-processing scheme can benefit by using the same information, saving calculation. In addition, scaling of these parameters may cause the CELP algorithm to pitch shift or scale speech.
2. The short-term linear predictive filter uses coefficients that can be analysed to reveal information regarding formants, or can be used on their own to implement spectral sharpening [2][95]. These LPC coefficients encode an efficient spectral representation of the signal.
3. The LPC coefficients themselves are usually transformed to LSP parameters within the CELP encoder, which may be exploited for enhancement as described further in chapter 5.
4. Analysis of LTP and LPC parameters along with gain value can allow distinction between segments of speech such as voiced or unvoiced parts [39][56][79].
5. As CELP processes on a frame-by-frame basis, the coded LPC, LTP and gain values are available to any post-processor about 30ms (a typical frame length) before the output from the decoder. Although any such post-processor could introduce a latency of its own, such additional delays are generally unattractive in a working system. An external system using these parameters has them available before the speech is decoded. In implementational terms, this is far more advantageous than the common arrangement where analysis parameters are only available after speech has been heard and analysed.

To summarise the opportunities for speech enhancement combined with the CELP algorithm, the coder transmits four basic parameters to the decoder for speech reconstruction. These are:
1. gain
2. LPC coefficients encoded as LSP parameters
3. pitch (LTP) coefficients
4. codebook index
The proposed speech enhancement system would firstly analyse these parameters to gain information on the characteristics of speech being transmitted for the current frame (if indeed speech is being transmitted rather than, perhaps, a pause between words). Then an efficient method of speech enhancement would be an adjustment of these parameters prior to decoding. These parameters are highly condensed representations of the speech waveform, and so adjustment of these few values (typically around 5kbits/s) is far more efficient than a process adjusting the decoded speech waveform directly (approximately 128kbits/s).

3.5 Effects of CELP coding on speech

The entire CELP coder is tailored to process speech signals, specifically those produced by an "average" speaker. This causes a CELP coder to process any signal as if it were average speech, transforming non-average speech, and even incidental signals such as noise, into an average, speech-like form. Thus the output from a CELP coder is limited by the following processes:
1. The synthesised output is stationary over an LPC analysis frame, irrespective of whether the input, assumed to be pseudo-stationary, is actually so (A2.2.2).
2. Pitch strength and period (represented by the LTP parameters) are constant over each pitch analysis subframe (A2.2.1).
3. Pitch strength is limited by quantization to upper and lower limits associated with speech, thus non-speech signals are quantized into a speech-like range.
4. Pitch period is also limited by the pitch extraction search process (A2.2.4) to upper and lower limits associated with average speech.
5. Gain is constant over each frame, and is limited by the quantization process to that occurring during average speech.
6. The filter excitation source is derived from a known codebook entry which was itself chosen to be applicable to average speech.
The processes listed above confer advantages to any post-processing schemes due to the limitations which they create: the CELP coder ensures that all parameters are speech-like to some degree, and so any alterations will not be able to adjust speech to outside this range.

Unfortunately the CELP algorithm also acts to degrade speech signals in a number of ways. These include problems associated with the choice of analysis frame. If major changes in sound frequency distribution occur within the span of an analysis frame, then the resultant synthetic frame will be an amalgamation of the two frequency distributions. This averaging effect also occurs when significant changes in signal amplitude are evident, leading to precursory noise [46].

Precursory noise, illustrated in fig3.3, arises when an amplitude step occurs during an analysis frame. The synthetic frame has had a constant gain applied to it, causing amplification of the samples within the frame that occur immediately before the amplitude step. Thus the original noise floor before the step becomes audible due to a gain contribution from the wanted signal.
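The mechanism can be illustrated numerically. In the sketch below, which assumes a flat-energy codebook excitation and illustrative amplitudes, a single gain fitted to a frame containing a step reproduces the quiet pre-step samples at close to the frame's average level:

    import numpy as np

    rng = np.random.default_rng(0)
    frame = np.concatenate([0.01 * rng.standard_normal(120),   # quiet noise floor
                            1.00 * rng.standard_normal(120)])  # loud onset

    gain = np.sqrt(np.mean(frame ** 2))      # a single gain for the whole frame

    # Codebook vectors have roughly uniform energy across the subframe, so the
    # decoded frame is reproduced at a near-constant level throughout:
    code = rng.standard_normal(240)
    code /= np.sqrt(np.mean(code ** 2))
    synthetic = gain * code

    print("pre-step RMS: original %.3f, synthetic %.3f"
          % (frame[:120].std(), synthetic[:120].std()))   # ~0.01 versus ~0.7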

Figure 3.3: Precursory noise due to an intra-frame amplitude step.

Other signal degradations caused by the CELP algorithm include the breakup of rapid changes in pitch, such as those produced by a glissando or slur. Each synthetic frame has a pitch equivalent to an average of the actual pitch, and thus if the actual pitch changes rapidly then the resultant synthetic output may be a number of pitch steps. A similar effect occurs when formant frequencies change rapidly. This may explain why CELP coders are particularly poor at coding music (as discovered during informal CELP coder listening trials).

4 Enhancement methods chosen for implementation

4.1 Introduction

For a CELP-based communication system, operating in the target police vehicle environment, possible speech enhancement methods were investigated to choose the most promising in terms of their potential to improve performance and in terms of their degree of integration with existing CELP functions.

4.2 Selective amplification

CELP encodes speech on a frame-by-frame basis, with each frame being described by amplitude, pitch, vocal tract and excitation values (gain, LTP, LPC and codebook index respectively: refer to section 3.2 for further information).

An increase in gain value will amplify the contents of the current frame by the required amount. In addition, the transmitted parameters are an information-rich description of the sound contained within that frame, and can be analysed to determine the type of speech, if any, contained within that frame.

Thus, two of the requirements for selective amplification are already present within the CELP coder: an amplification arrangement for short periods of speech and a means of determining the class of speech (i.e., whether the type of speech is applicable for selective amplification).

Figure 4.1: Selective amplification of the word 'catastrophe'. a) plots the unmodified speech waveform, and b) the waveform after selective amplification of the phoneme regions 1, 2 and 3.

Selective amplification is illustrated in fig4.1, where a segment of the recorded word 'catastrophe' has been classified to determine its phonemic content (the classification method is described later in chapter 6). Three of the regions within the segment were found suitable for selective amplification, and were amplified to give the new waveform as shown.
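A minimal sketch of the gain-domain operation follows. The classifier labels are placeholders for the classification method of chapter 6, and the boost value is an assumption:

    import numpy as np

    def selectively_amplify(gains, frame_classes, boost_db=6.0,
                            amplify_classes=("unvoiced", "plosive")):
        """Scale per-frame CELP gain values for frames in the chosen classes."""
        boost = 10.0 ** (boost_db / 20.0)
        return np.array([g * boost if c in amplify_classes else g
                         for g, c in zip(gains, frame_classes)])

    gains = np.array([0.2, 0.9, 0.15, 0.4])
    classes = ["silence", "voiced", "unvoiced", "plosive"]
    print(selectively_amplify(gains, classes))   # only the last two frames boosted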

4.3 Formant sharpening

CELP encodes spectral information in a condensed format, with most of the spectrum represented as linear prediction coefficients or LSP transformations of these. Section 2.2.4 reported some methods of spectral alteration: the information-rich nature of the CELP parameters suggests that spectral alterations may be made by adjusting these few parameters as opposed to the obvious alternative of filtering the entire array of speech samples.

Even if the filtering alternative is considered, operating on decoded speech, the spectral modification filter response can be added to the existing CELP filter response to effect the alteration with a few additions rather than the many multiply-accumulate operations of an additional filter.

The evidence presented in section 2.2.4, and the results of testing (described later in section 9.3), indicate that spectral sharpening is a valid method of improving intelligibility. This process is demonstrated in fig4.2, where a speech spectrum is plotted with a psychoacoustically-weighted background noise spectrum. The latter is the masking level: the level above which a tone must rise, when listened to in the noise shown, to be audible. The left hand graph in fig4.2 shows that the peak, or formant, in the speech spectrum does not rise above the noise masking level and is thus inaudible. Formant sharpening has narrowed the formant in the right hand graph, and increased its peak amplitude, which now rises above the masking level and is thus audible.

Figure 4.2: Illustration of a masked formant (left) being sharpened to rise above the masking level (right). Assumptions of masking effect and audibility may be made using these plots for average-hearing listeners.

One further effect should be mentioned here: the effect of spreading signals across as many critical bands as possible [69][110][68]. Evidence suggests that when the bandwidth of a wanted signal is present across more than one critical band (section 2.3.2.1), the hearing system is able to utilise correlation effects to improve audibility [54][73][69]. Spreading the bandwidth of a spectral peak is the opposite of sharpening: it is spectral broadening.

So it would seem that spectral sharpening, and the converse, spectral broadening can both increase audibility under certain circumstances. It is likely that spectral sharpening improves the intelligibility of speech in the presence of wideband noise, and that spectral broadening improves intelligibility in the presence of narrowband noise (where one critical band is saturated with the noise, the signal can be extended into an adjacent, non-saturated critical band). This is explored further in section 9.5.

4.4 Formant shifting

As noted in section 3.4, the CELP coder utilises linear prediction coefficients, an information-dense representation of the spectrum of the encoded speech. For most phonemes, the speech spectrum contains formants; these formants can be adjusted through spectral modification as discussed.

The concept of masking (section 2.3.2.1) dictates that a tone, or formant, will be inaudible if it is coincident with (or close in frequency to) another tone of higher amplitude. If a frame of speech contains a formant frequency that is coincident with a louder interfering frequency in the acoustic background noise environment of the listener, then this formant will be inaudible, possibly rendering the speech unintelligible.

Spectral modification can be used to adjust the frequency of formants. Evidence suggests that a certain degree of mistuning can be tolerated by the hearing process with little subjective signal degradation [23]. If we consider a vowel, made up of a series of related tones, or formants, it has been found that mistuning a non-fundamental formant by about 8% has a similar perceptual effect to removing it from the series [73]. Of course, the sound timbre changes slightly, but this result is equivalent to saying that up to 8% mistuning causes little effect. Furthermore, there is reported evidence which suggests that formant 'mistuning' can, under certain circumstances, improve intelligibility [73].

Figure 4.3: Illustration of the spectrum of a speech utterance containing three formants. In the left graph, formants F1 and F3 are audible but F2 is not. F2 has been shifted in the right graph so as to become audible.

The effects of formant mistuning can be beneficial. Consider the speech spectrum of fig4.3, showing three formants in the presence of masking noise. On the left hand graph, formants F1 and F3 are audible, whilst F2 is inaudible. Spectral processing, operating on F2, has altered its frequency in the right hand graph, moving it into a region of lower acoustic background noise where it is now audible.

Of course, any speech alteration, especially a high degree of formant shift, can be expected to degrade the quality of processed speech to some degree [23].

4.5 Rejection of other methods

Although the clipping of speech (section 2.3.3) demonstrated good intelligibility results, it severely reduces the quality of the processed speech. For a CELP coder system, developed in part with the intention of maintaining good speech quality at high compression ratios, and designed to analyse, transmit and reproduce natural speech, clipping the reproduced speech appears a retrograde step. Thus clipping will not be considered further. Selective amplification has been chosen to preserve the naturalness and quality of speech whilst operating on the same intelligibility enhancement premise as clipping (section 2.2.1).

CELP coding systems are generally designed for two-way communication, as is true within the target environment. Time domain processing techniques (section 2.3.4), which add latency are therefore undesirable.

Furthermore, the temporal processing methods of section 2.3.4 require an advance knowledge of the interfering acoustic background noise, so that phonemes may be advanced or delayed slightly around loud periods of noise. It is possible to predict the occurrence of periodic sounds in advance, and to adjust decoded speech playback correspondingly. However, the noise types found within the target environment, and which cause most interference (such as wind, rain, tyre

rumble etc.) to listeners in that environment are predominantly non-periodic (section 2.4). Thus temporal methods of speech processing were not considered to be promising enhancement schemes for a CELP communications system operating in the target environment.

PART II: Analysis and Implementation

Part I of this thesis has reviewed existing speech enhancement methods, and proposed additional methods. These have been considered alongside the CELP algorithm for use in the target environment. The CELP structure and the characteristics of the target situation have allowed a subset of algorithms to be chosen for further investigation. These are selective amplification, formant shifting and formant sharpening/broadening.

Part II considers the chosen enhancement techniques and investigates methods of implementing these, and the analyses that are required for their use. Already, it is possible to examine the CELP structure and note the additions required to implement speech enhancement.

FigP2.1 shows a simple block diagram of the CELP speech coder that would be located in the police vehicle of the target environment. The blocks located in the dotted area are those additions to the existing system (shown outside the dotted area) that are required to perform speech enhancement.

Figure P2.1: Enhancing CELP system block diagram (blocks within the dotted enclosure are enhancing additions to the existing CELP components outside the enclosure).

Part II provides more detail of these operations. It begins, in chapter 5, by introducing the line spectral pair representation, and its relevance to speech enhancement. Chapter 6 then discusses existing and proposed methods of speech analysis and classification, before presenting methods of noise analysis and the definition of a hearing model for speech intelligibility prediction.

Finally, chapter 7 compiles the analysis and speech enhancement methods into a speech enhancing CELP coder, based upon the completed structure of figP2.1.

5 LSP-based analysis and processing

5.1 Introduction

Modern standard CELP coders, such as the TETRA codec used in the target situation (and also including G.728 [16] and FS1016 [15]), employ LSP parameters as a means of efficiently encoding LPC coefficients for transmission from coder to decoder [15][120]. LSP parameters are derived via a mathematical transformation of LPC coefficients.

LSP parameters are used firstly because they appear more information-dense than other representations, in that LSPs can be quantized more severely than other representations whilst retaining equivalent levels of speech distortion [37][48]. This is partly a consequence of the LSP property that the parameters are of equal importance to the underlying spectrum, and thus quantization can be equal across all parameters, giving a quantization effect spread over the entire frequency spectrum (LPC parameters, by contrast, are of unequal importance) [37][48]. Secondly, LSP parameters can be interpolated between frames or scaled, and the LPC filter deriving from the altered parameters is always stable, whereas injudicious LPC parameter quantization often results in an unstable filter [90].

It is for their advantages in the transmission of spectral information between CELP coder and decoder that LSPs are commonly utilised. In addition, two authors have used a feature vector of raw, uninterpreted LSP values for hidden Markov model based speech recognition [30][80].

As the LSP values convey spectral information from CELP encoder to decoder, adjusting these values before they reach the decoder enables known spectral adjustments to be made extremely efficiently. In this way, LSPs have been found to have specific usefulness for speech enhancement. This chapter discusses the line spectral pair representation before demonstrating some of its properties and relating these to speech enhancement algorithms.

5.2 Line spectral pair representation

Line spectral pairs are parametric representations of linear prediction coefficients, with the useful properties of being resident in the frequency domain and being relatively more resistant to the effects of quantization, shifting and interpolation [51][90].

Line spectral pairs represent the resonances of the all-pole filter model when the standard form

of the PARCOR (or reflection coefficient) representation of the linear predictor is advanced to an extreme conclusion. The PARCOR process compares the linear predictor to a system of joined pipes of constant length but differing width (the number of pipes being determined by the order of the system). The PARCOR coefficients encode the degree of back-reflected energy from each pipe join, where the entire system is assumed to be perfect, but the back-reflected energy is then assumed to be lost.

If the PARCOR representation is modified to form two distinct cases where the start of the system of tubes is either a perfect opening or a perfect closure, then a series of standing waves are set up; equivalent to the resonances or the poles of the linear predictor. The frequencies of these resonances are the line spectral frequencies. Pairs of line spectral frequencies, one from the fully-open case and one from the fully-closed case then act together to define the peaks and troughs of the underlying spectrum.

To relate this in some way to reality, imagine that the system of interconnected tubes is representative of the vocal tract. The vocal tract begins at the rapidly opening and closing glottis, and although the throat muscles are pseudo-stationary over the period of analysis (for example, a 30ms frame period), the glottis is not. The two analysed cases of open and closed tubes thus correspond to open and closed glottis; in fact the glottis is neither fully open nor fully closed, and thus the real spectrum (the resonance of the system of tubes) occurs between the two cases. Fig5.1 illustrates the line spectral pair representation of a frame of sound. The LSPs derived from the open and closed tube conditions are shown as dotted and continuous vertical lines respectively, overlaid on a spectral plot. It can be seen that the peaks in the spectral plot (spectral resonances) occur between closely spaced dotted and continuous lines.

Figure 5.1: LPC-derived spectrum and corresponding line spectral frequencies. Data under analysis was derived as in figure 5.4, section 5.5.
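The decomposition just described can be sketched directly: forming the sum and difference polynomials from the LPC coefficients and taking the angles of their unit-circle roots yields the line frequencies. The following fragment is an illustrative implementation only (production coders use more efficient Chebyshev-series root searches), and the test filter is an assumed cascade of five resonances:

    import numpy as np

    def lpc_to_lsp(a):
        """a = [1, a_1, ..., a_p] with p even; returns p LSP angles in (0, pi)."""
        a_ext = np.concatenate([a, [0.0]])
        A_poly = a_ext - a_ext[::-1]   # antisymmetric: the (1 - z^-1) branch
        B_poly = a_ext + a_ext[::-1]   # symmetric: the (1 + z^-1) branch
        angles = []
        for poly in (A_poly, B_poly):
            for r in np.roots(poly):
                ang = np.angle(r)
                if 1e-6 < ang < np.pi - 1e-6:   # keep one of each conjugate pair
                    angles.append(ang)
        return np.sort(angles)

    # An assumed stable 10th-order model: five resonant pole pairs.
    pole_angles = (0.3, 0.9, 1.6, 2.2, 2.8)
    poles = [0.95 * np.exp(1j * t) for t in pole_angles]
    a = np.real(np.poly(poles + [np.conj(pp) for pp in poles]))

    print(lpc_to_lsp(a))   # close pairs of lines straddle each pole angle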

5.3 Evaluation of LSPs

It is apparent through simulation and testing that line spectral frequencies track the underlying spectrum in a manner that suggests a predictable relationship; this is substantiated by Paliwal [80].

Figure 5.2: Method used to visualise LSP relationship to LPC spectrum.

Fig5.2 shows a method to plot line-spectral frequencies overlaid onto a linear prediction-derived spectrum. Fig5.1 is such a plot, showing the LPC frequency response (section A3.3) and corresponding line spectral pairs, symmetric being shown solid and anti-symmetric shown dotted, for a spoken vowel analysed using a 10th order linear predictor.

This method of visualising LSPs can be extended one step further as shown in fig5.3, where the effect of changing some or all of the LSPs for a given analysis frame of sound can be compared by noting differences between the original and altered spectra.

Figure 5.3: Method used to visualise the spectral effect of altering LSPs.

5.4 Important mathematical properties

The derivation of LSP parameters is given in appendix 3; however, certain properties of LSPs are important to their understanding. These are now examined mathematically.

Consider A_k and B_k, the symmetric and antisymmetric LSP polynomials made up from the p LPC coefficients a_k (with initial conditions A_0 = 1 and B_0 = 1, and assuming that the order p is even). The expressions relating LSP and LPC coefficients are described in section A3.1, and are reproduced here:

    A_k = a_k - a_{p+1-k} + A_{k-1}    (5.1)

    B_k = a_k + a_{p+1-k} - B_{k-1}    (5.2)

A theorem developed by Sugamura and Itakura [108] states that:

    a_p = (1/2)(A_k + B_k)    (5.3)

where a_p is the linear prediction polynomial. If we now calculate the power spectrum from the linear prediction parameters:

    |H(e^{jω})|² = 1 / |a_p(e^{jω})|²    (5.4)

               = 4 / |A(e^{jω}) + B(e^{jω})|²    (5.5)

and using the following expressions for A(z) and B(z) [107]:

    A(z) = (1 - z^{-1}) ∏_{i=2,4,...,p} (1 - 2z^{-1} cos ω_i + z^{-2})    (5.6)

    B(z) = (1 + z^{-1}) ∏_{i=1,3,...,p-1} (1 - 2z^{-1} cos ω_i + z^{-2})    (5.7)

with the usual assumption that the array of LSPs, ω_i, are ordered least first and are constrained to angular frequencies between 0 and π.

Now eqns5.6 and 5.7 may be substituted into equation 5.5 to give:

    |H(e^{jω})|² = 2^{-p} [ sin²(ω/2) ∏_{i=2,4,...,p} (cos ω - cos ω_i)² + cos²(ω/2) ∏_{i=1,3,...,p-1} (cos ω - cos ω_i)² ]^{-1}    (5.8)

If the above equation were evaluated for all ω in the range 0 to π, the result would be the linear prediction spectrum. Note however that this spectrum reaches a maximum when the terms of the denominator (cos ω - cos ω_i) are minimum for adjacent values of i.

Imagine that the equation is evaluated at a certain frequency ω. If line spectral value ω_2 is located here, then the left hand side of the denominator (even LSPs) will be zero. It then requires one of the odd-valued line spectral parameters to be close to this frequency for the right hand side of the denominator to also approach zero, and the spectrum to peak. This demonstrates the mathematical basis for assuming that peaks in the linear prediction spectrum are located where two adjacent LSPs are closest.

5 LSP-based analysis and processing 41 In addition, the sin2 and cos2 terms in this equation indicate that each half of the denominator is zero, respectively at values of angular frequencies of 0 and it. This property indicates that single lines approaching values of 0 or cause spectral peaks.

5.5 Properties of line spectral pairs

In order to appreciate the properties of LSPs further, the method outlined in section 5.3 has been used to compare the spectra derived from the linear prediction coefficients of a typical sound with that resulting from LSP-adjusted linear prediction coefficients.

The reference spectrum, with which the figures from each LSP modification are compared has been derived from tabular test vector data presented in [86], and shown in fig5.4. This consists of 10th order linear predictive analysis, and thus 10 LSP values are plotted, for clarity, as vertical lines overlaying the spectrum.

It should be noted that one of the properties discussed in section 5.4, that lines located close together straddle spectral peaks, can clearly be seen in fig5.4 and each of the subsequent plots.

Figure 5.4: Reference spectral plot with overlaid LSP values.

The following subsections plot the spectrum with overlaid LSPs resulting from an alteration in the LSP values at the "modify LSPs" stage in fig5.3. The plots are those from the right hand graph in fig5.3, whilst the left hand graph is that of fig5.4.

5.5.1 Frequency shifting and scaling

The results of slightly separating the first pair of lines by increasing the frequency of line 2 by 20% are shown in fig5.5:

Figure 5.5: Spectrum obtained by widening first pair of lines (right) compared to original reference spectrum (left).

Note that the first peak of the spectrum has reduced in amplitude and spread out, indicating that separation of the LSPs affects both the amplitude and the bandwidth of frequency peaks. More specifically, widening the LSP separation reduces amplitude and appears to increase the bandwidth.

Next, both LSP 1 and 2 were increased in frequency by 20% of their original values, and the results plotted in fig5.6:

Figure 5.6: Spectrum obtained by applying upward shift in frequency of lines 1 and 2 (right) compared to original reference spectrum (left).

This change has resulted in the lowest frequency peak in the spectrum increasing in frequency by around 20%, but a slight reduction in amplitude is evident, perhaps due to some of the energy constrained between the line 1 and line 2 pair bleeding into line 3 which is now slightly closer.

Figure 5.7: Spectrum obtained by increasing frequency of line 7 (right) compared to original reference spectrum (left).

In fig5.7 the frequency of the seventh line has been increased. Note that the third frequency peak has become sharper, consistent with the effect of narrowing the separation between the

pair of lines 7 and 8.

Finally, lines 7 and 8 were both increased in frequency from their original values, and plotted in fig5.8:

Figure 5.8: Spectrum obtained by shifting line spectral pair 7/8 upwards (right) compared to original reference spectrum (left).

This has caused the third spectral peak to increase in frequency, but its amplitude has reduced. The example is more complicated than is initially obvious, as the two lines have been shifted to between the pair of lines that were 9 and 10. The self-ordering property of LSPs has ensured that stability is maintained.

5.5.2 Spectral peak induction

Using the same test data, the frequencies of LSPs 3 and 4 were changed to narrow their separation. The effect of this is shown in fig5.9.

Figure 5.9: The effect of decreasing the separation of widely-spaced lines is shown with the separation of the second and third lines, indicated by arrows, decreasing from left to right.

This operation has resulted in the generation of an additional frequency peak as the lines close towards each other, with the penalty of causing a slight reduction in the amplitude of the existing peaks.

5.5.3 Spectral peak formation by addition

Apart from inducing the formation of non-existent spectral peaks by shifting lines, it is possible to add new line spectrum frequencies to an existing system. This, however, creates a potential problem in that the order of the LPC filter describing the system must be increased in accordance with the number of lines added.

Fig5.10 shows the spectrum obtained by the addition of two extra line spectrum frequencies.

Figure 5.10: Spectrum obtained by adding two lines to the set of test LSPs (right) compared to original reference spectrum (left).

5.5.4 Further LSP properties

Another property of LSPs is that movement of a line towards zero, or addition of a line close to zero, will result in an increased DC component in the spectrum; it appears that a line equivalent exists at a frequency of 0Hz. This has been explained in section 5.4.

Mathematically, LPC coefficients derived from LSPs are guaranteed to produce a stable filter [30]; however, the audible consequences of altering LSPs are difficult to predict. It must be remembered that LSPs operate in the frequency domain, and are thus not directly related to audio quality in a perceptual manner.

The tests resulting in the plots shown in figs5.5 to 5.10, and other similar tests have revealed that inopportune alteration of LSP values can have effects that are difficult to predict accurately. Listening to the results of these tests has reinforced the belief that modification of line spectral pairs can either reduce or increase audio quality considerably.

5.6 The application of LSPs to enhancement

Line spectral pairs, being resident within many CELP coders (including the TETRA variant present in the target situation), and containing important spectral information, can be utilised in a number of ways to perform enhancement.

5.6.1 LSP alteration in CELP coders

The figures presented in section 5.5 have shown the effects on test spectra of alterations to the line spectral pairs representing them. In fact, the reference test data of fig5.4 was derived from one frame of a speech utterance [94].

The test spectrum is thus known to represent a segment of voiced speech. In fact the three spectral peaks are actually the formant frequencies FI, F2 and F3 for that utterance. With this in mind, alterations to line spectral pairs have been shown to cause formant shifting, formant sharpening/broadening, the addition of extra formants, or the replacement of corrupted ones.

One of the inherent advantages of LSPs is that although the lines do not have any direct perceptual basis, lines usually cluster around formant locations. It is known that changes to lines predominantly affect the immediate frequency region of the line and thus LSP changes are mainly formant related: and are thus, after all, perceptually biased.

Fig5.11 shows a block diagram of one channel of a CELP communications system, with encoder on the left and decoder on the right, and parameters, including LSPs passing from encoder to decoder.

In addition to the standard CELP arrangement, fig5.11 shows an LSP adjustment process, working on the parameters transmitted from coder to decoder, before these are utilised by the decoder. The LSP adjustment process can cause changes in the speech spectral envelope, with these changes being dependent upon the current speech and interfering acoustic background noise (the dependence upon background noise being the reason for locating the enhancements at the decoder).

Figure 5.11: A diagram of one channel of a CELP communications system, showing speech enhancement additions performing LSP processing, requiring acoustic background noise and transmitted speech analysis information.

This section will now consider the content of the ‘LSP process’ block of fig5.11 in greater detail.

5.6.1.1 Formant amplitude and bandwidth adjustment

Considering two nearby LSPs, located either side of a spectral peak: further reducing the frequency of the lower frequency line by a small amount, and further increasing the frequency of the higher frequency line by a small amount will increase the separation between the lines. As has been demonstrated in section 5.5.1, the effect of this on the output spectrum is to reduce the amplitude of the spectral peak and to increase its bandwidth. Similarly, narrowing the lines will increase the peak amplitude and decrease the bandwidth of the spectral peak.


Where the peaks described by the spectrum correspond to speech features, they are usually formant frequencies. If such peaks are identified and the values of the nearby LSPs determined, formants can be adjusted.

If the line spectral pairs describing a period of speech are ω_1 ... ω_p (where p is the number of LSPs, equivalent to the order of the linear prediction analysis filter from which the LPC and LSP coefficients were originally derived), and a formant is known to exist at an angular frequency of γ, the line spectral pairs relating most strongly to that formant are:

    ω_l = max(ω_i) in the range 0 < ω_i < γ    (5.9)

    ω_h = min(ω_i) in the range γ < ω_i < π    (5.10)

where ω_l and ω_h are the angular frequencies of the lines immediately below and immediately above the formant frequency respectively.

Formant peak adjustment, by altering the separation between the two existing lines ω_l and ω_h, may be performed using:

    ω'_l = ω_l + (Δ/2)(ω_h - ω_l)    (5.11)

    ω'_h = ω_h - (Δ/2)(ω_h - ω_l)    (5.12)

where ω' are the altered values and Δ is the fractional change in separation, positive values causing formant sharpening and negative values causing formant broadening.
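In code, eqns5.11 and 5.12 as reconstructed above amount to a few lines; the example values are illustrative:

    def adjust_formant_bandwidth(w_l, w_h, delta):
        """Eqns 5.11/5.12: delta > 0 sharpens the formant (narrows the
        straddling pair of lines), delta < 0 broadens it."""
        half_change = 0.5 * delta * (w_h - w_l)
        return w_l + half_change, w_h - half_change

    # Example: a pair straddling F1, narrowed by 20% (delta = 0.2)
    print(adjust_formant_bandwidth(0.50, 0.60, 0.2))   # -> (0.51, 0.59)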

Naturally there are limitations to this procedure, discussed further in section 5.8. Fig5.12 illustrates the effect on a speech spectrum of narrowing the lines describing one formant using eqns5.11 and 5.12 with a narrowing factor, Δ, of 0.2, resulting in an increase in F1 amplitude of approximately 5dB.

Figure 5.12: Amplitude-normalized plots of original (a) and LSP processed (b) speech spectra. Processing involved narrowing the separation between the pair of LSPs describing the lowest frequency spectral peak.

5.6.1.2 Formant frequency alteration

In a similar way to formant amplitude adjustment, spectral peak centre frequencies may be altered by moving the two LSP lines located immediately to either side of the peak (section 5.5.1).

Within limits further discussed in section 5.8, if a spectral peak or formant is located at an angular frequency of γ and the two LSPs describing it are found from eqns5.9 and 5.10, then the spectral peak centre frequency may be altered by:

    ω'_l = 0.5β(ω_h + ω_l) - 0.5(ω_h - ω_l)    (5.13)

    ω'_h = 0.5β(ω_h + ω_l) + 0.5(ω_h - ω_l)    (5.14)

with

    β = 1 + σ{π - 0.5(ω_h + ω_l)}/π    (5.15)

where σ is the degree and direction of frequency scaling (positive is frequency increase, negative is decrease, with the value giving the increment factor). The equations allow alteration in centre frequency but preservation of the separation between lines, thus giving a frequency shift whilst minimising any unwanted amplitude variations. In addition, eqn5.15 applies a reduction in the degree of shift as the maximum frequency value, at an angular frequency of π, is approached. The identities may be applied to a single formant (via an adjustment to one pair of lines), or to the entire set of LSPs, thus scaling the entire speech spectrum.

Note that maximum and minimum frequency positions exhibit the characteristics of a line presence (see sections 5.4 and 5.5 for mathematical proof and graphical demonstration), and thus adjusting formants too far upwards or too far downwards in frequency may locate them close to angular frequencies of π or 0 respectively, and cause spurious spectral peaks. Therefore limits must be established to excessive downward LSP frequency scaling as well as upward.

An alternative scheme is to introduce a perceptual scaling to LSP shifting. In this case, when all lines are adjusted, the arithmetic and geometric relationship between them is altered, but the perceptual difference is maintained. This relies upon shifting lines by a constant Bark [102] value, δ, as discussed in section A1.4:

    ω'_l = (2π/f_s) × 600 sinh{(b_k + δ)/6} - 0.5(ω_h - ω_l)    (5.16)

    ω'_h = (2π/f_s) × 600 sinh{(b_k + δ)/6} + 0.5(ω_h - ω_l)    (5.17)

where

    b_k = 6 log{c + √(c² + 1)}    (5.18)

and

    c = (ω_h + ω_l) f_s / (2.4π × 10³)    (5.19)

The sampling frequency f_s is a required input to eqns5.16, 5.17 and 5.19 because the Bark scale is non-linear with respect to frequency, and is thus not convertible to and from angular frequencies, which convey no absolute frequency information.

The Bark-based and linear LSP frequency shifting schemes are illustrated in fig5.13,

5 LSP-based analysis and processing 49 with the Bark-based scheme additionally having a hard cutoff at around 3000Hz (once the shifted value exceeds 3000Hz, no more upward adjustment is possible), found by subjective testing to give good results for speech formant scaling.

Figure 5.13: Effective LSP upward scaling factor for linear (solid line, with β = 1.5 from eqns5.13 and 5.14) and Bark-based (dotted line, with δ = 1 in eqns5.16 and 5.17) shifting of LSP centre frequency.
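A minimal implementation of the Bark-based scheme, using eqns5.16 to 5.19 as reconstructed above (natural logarithm assumed in eqn5.18, so that it exactly inverts the sinh mapping), together with the 3000Hz ceiling, might read:

    import numpy as np

    def bark_shift_pair(w_l, w_h, delta, fs=8000.0, ceiling_hz=3000.0):
        """Shift a line pair by a constant Bark offset, preserving separation."""
        c = (w_h + w_l) * fs / (2.4e3 * np.pi)        # eqn 5.19
        b = 6.0 * np.log(c + np.sqrt(c * c + 1.0))    # eqn 5.18
        centre_hz = 600.0 * np.sinh((b + delta) / 6.0)
        centre_hz = min(centre_hz, ceiling_hz)        # hard upward cutoff
        centre = 2.0 * np.pi * centre_hz / fs
        half_sep = 0.5 * (w_h - w_l)
        return centre - half_sep, centre + half_sep   # eqns 5.16 and 5.17

    # Example: a pair centred near 600Hz shifted up by one Bark unit (delta = 1)
    print(bark_shift_pair(0.44, 0.50, delta=1.0))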

5.6.2 LSP-derived measurements

Given that the line-spectral frequency arrangement of a frame of speech relates closely to the speech spectrum, it is possible to use LSP information empirically to form general conclusions about the associated frequency spectrum.

Fig5.14 shows the speech waveform resulting from a recording of a North American female saying "oval face without an expression in the world", a manual analysis of the speech, and a plot of the corresponding LSPs derived from a 160-sample frame, 10th order Hamming-windowed LPC analysis of the recording. The data, in the form of a speech recording, has been taken from the TIMIT database [118].

Figure 5.14: Speech waveform, manual analysis and corresponding LSP tracks.

Several trends are visible in fig5.14 involving the variation in LSP frequency over time. Most noticeably, once a period of voiced speech leads into a period of unvoiced speech (indicated by bars below the speech waveform plot and above the transcription), there is a general upwards shift in all of the line spectral frequencies. Tests (presented in section 8.3) confirm that the measurement of LSP values can be used to automatically distinguish between some of the constituent parts of speech.

5.6.2.1 Gross change in LSP value

Fig5.15 shows a measure of the sum change in LSP value between frames for the test recording. Note the correspondence of the measure to the phonemes, in that spikes occur coincident with changes in the waveform.

This measure, for frame i, in a pth order system with current LSPs ω, is:

    msr = Σ_{j=1..p} |ω_{i,j} - ω_{i-1,j}|    (5.20)

Figure 5.15: Gross change in LSP value over time, with manual analysis bars indicating the presence of unvoiced speech.

5.6.2.2 Deviation from median position

Any period when speech is not present tends to be described by LSPs that are approximately equally spaced and placed around their median positions (those that divide the frequency range equally, and which describe a totally flat spectrum). In a corresponding fashion, deviation from median positions can indicate the presence of unvoiced speech (causing a general increase in frequency) or of voiced speech (causing a general decrease in frequency). In fact, the comparison to median position, for a flat spectrum, may logically be replaced by any basic LSP location comparison. For example, in known interfering noise, the comparison positions could be made equal to the interfering noise spectrum in order to minimise its effect on the resultant measure. This measure is the sum of the square of the deviation.

If the p comparison LSPs are ω̄ (in radians), then for median positions these would equal:

    ω̄_j = j × π/(p + 1);  j = 1 ... p    (5.21)

and the measure would thus be:

    msr = Σ_{j=1..p} (ω_j - ω̄_j)²    (5.22)

Fig5.16 shows a plot of the total separation of the distance squared between LSPs in each frame and their median values.

Figure 5.16: LSP deviation from median position over time, with manual analysis bars indicating the presence of unvoiced speech.

5.6.2.3 Average LSP distribution

A measure of the average LSP value indicates in a broad sense whether the spectrum is top-heavy (unvoiced) or bottom-heavy (voiced) in that frame.

The measure is thus given by:

    msr = (1/p) Σ_{j=1..p} ω_j    (5.23)

This measure works surprisingly well and is demonstrated in fig5.17, which plots data very similar to that in fig5.16:

Figure 5.17: Average value of LSP within each frame, with manual analysis bars indicating the presence of unvoiced speech.

5.6.2.4 LSP vote measure: count of LSPs that increase

A subtly different measure is to count the number of line spectral frequencies which have increased from their median positions (an increase in frequency often indicates a period of unvoiced speech). This vote measure again relies upon the LSP median positions found from eqn5.21.

For each of the LSPs positioned above its median position, one vote is cast. The number of votes collected in each frame (up to 10 for order p=10) is thus the measure value.

Fig5.18 shows a count of the number of LSPs per frame which have moved above their median value.

Figure 5.18: Count of LSPs per frame above median value, with manual analysis bars indicating the presence of unvoiced speech.

5.6.2.5 Summary of LSP measures

As noted in section 5.2, the separation of LSPs tends to indicate the presence or absence of frequency peaks or formants. Indications are that measures involving the summation of the few closest LSPs, or of the few most separated LSPs, are, on average, able to distinguish between voiced speech, unvoiced speech and periods of silence. However, the thresholds and decision regions required to convert such raw measures into accurate indicators of speech class must be the subject of further analysis, as discussed in section 6.2.
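The four measures of this section can be stated compactly in code. The sketch below implements eqns5.20 to 5.23 and the vote count in their reconstructed forms; the example frame values are illustrative:

    import numpy as np

    def median_positions(p):
        return np.arange(1, p + 1) * np.pi / (p + 1)          # eqn 5.21

    def gross_change(lsp_now, lsp_prev):
        return np.sum(np.abs(lsp_now - lsp_prev))             # eqn 5.20

    def deviation_from_median(lsp):
        return np.sum((lsp - median_positions(len(lsp))) ** 2)  # eqn 5.22

    def average_lsp(lsp):
        return np.mean(lsp)                                   # eqn 5.23

    def vote_measure(lsp):
        return int(np.sum(lsp > median_positions(len(lsp))))  # section 5.6.2.4

    # Example frame: lines pulled generally downwards suggests voiced speech.
    lsp = np.array([0.25, 0.30, 0.90, 1.10, 1.60, 1.90, 2.20, 2.50, 2.80, 3.00])
    print(deviation_from_median(lsp), average_lsp(lsp), vote_measure(lsp))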

5.7 Information required to enable processing

The LSP processing schemes described in section 5.6.1 are designed to adjust LSP values to influence the shape of the underlying spectrum. If these techniques are to be used to alter formant frequencies then the position of each formant must be known, to establish which LSPs lie to each side of the formant, and consequently describe it.

Many methods have been developed and published to determine formant frequency. These include the obvious frequency domain techniques of power spectrum peak-picking, spectrum differentiation and certain time domain techniques. Chapter 6 will discuss formant detection more thoroughly.

Since the processing algorithms operate on line spectral pair values, and it is known that LSP locations are related to formant positions (see section 5.6.2), with a line pair positioned around each formant peak, it should be possible to predict formant location from an analysis of LSP distribution.

In general, the centre frequencies of the narrowest pairs of lines correspond closely with the formant frequencies (as determined by alternative methods). More specifically, if formants are present, the three narrowest LSP pairs correspond to the three highest-power formant positions. Chapter 6 investigates this novel method alongside alternative formant detection methods.
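As a sketch of this LSP-based formant estimate, the following Python fragment (illustrative, not the thesis code; an 8kHz sample rate is an assumption) returns the centre frequencies of the narrowest adjacent line pairs:

```python
import numpy as np

def formants_from_lsps(lsps, fs=8000, n_formants=3):
    """Estimate formants as the centres of the narrowest adjacent LSP
    pairs. lsps: sorted 1-D array of LSPs in radians (0..pi)."""
    gaps = np.diff(lsps)                       # separations of adjacent lines
    narrowest = np.argsort(gaps)[:n_formants]  # indices of narrowest pairs
    centres = (lsps[narrowest] + lsps[narrowest + 1]) / 2.0
    return np.sort(centres) * fs / (2.0 * np.pi)  # radians -> Hertz
```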

5.8 Limitations and advantages of LSP-based methods

5.8.1 LSP-based processing

Line spectral pair processing provides a method of inducing specific localised changes in the underlying frequency spectrum. A possible disadvantage, that changes are quite localised to LSP position, is irrelevant to this formant alteration application, because some LSPs will always be located close to formants: the regions requiring change.

For a typical implementation, only the six lines corresponding to the first three formant locations would be altered, although if an entire spectral shift were wanted, then all lines (usually no more than 10 or 12) would be altered. The most common alternative method of spectral alteration is filtering, requiring multiply-accumulate operations on each of the samples in the current analysis frame (usually around 240): a much less efficient method. In addition, the LSP method is by nature adaptive, because LSPs are already positioned around

formants: creating an adaptive filter alternative is considerably more complex.

The processes of formant sharpening/broadening and formant shifting are limited in extent by their effects. Too great a formant broadening will cause that formant to cease to exist, whilst extreme formant sharpening could produce a sound resembling a tone rather than speech. Formant shifting (and broadening) is limited by the observation that too great an adjustment to LSP lines may produce a situation where two previously distant lines become close. The spectrum will then exhibit a spurious peak (as demonstrated in section 5.5.2).

LSPs that are moved to approach angular frequencies of 0 or π will induce low or high frequency peaks respectively. Thus the functions describing LSP shifting were designed to reduce the degree of shifting as these limits are approached.

Rather narrower than the above limitations on LSP adjustment is the extent to which values can be adjusted without the speech degrading excessively. Extreme sharpening, broadening and shifting will produce sound that, although originally speech, is no longer recognisable as such. To ensure that the listener does not notice changes, formant shifting should be limited to around 8% (section 4.4); however, subjective listening tests have established that a shift of up to 20% can maintain speech quality within acceptable limits, improve intelligibility under certain circumstances, and allow the speech to remain recognisable (discussed further in section 7.4). Such results are presented in chapter 9.
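The exact shaping function used for the shifts is not reproduced here, but the idea can be sketched as follows (Python; the sin() taper and the clamping values are illustrative assumptions, chosen only so that the shift decays to zero at the 0 and π limits and never exceeds the 20% bound):

```python
import numpy as np

def shift_lsps(lsps, fraction):
    """Shift LSPs (radians, 0..pi) by a signed fraction, tapering the
    shift to zero near 0 and pi so no spurious low- or high-frequency
    peak is induced. A full implementation would also preserve line
    ordering and separation to avoid spurious peaks (section 5.5.2)."""
    fraction = np.clip(fraction, -0.20, 0.20)         # 20% quality bound
    shifted = lsps * (1.0 + fraction * np.sin(lsps))  # taper -> 0 at limits
    return np.clip(shifted, 1e-3, np.pi - 1e-3)
```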

5.8.2 LSP-based analysis

In addition to adjusting the characteristics of speech, LSPs have been shown in this chapter to convey spectral information. Section 5.6.2 introduced several measurement techniques based on interpretation of LSPs. In fact, one of the few publications based around LSPs involved interpretation of LSP position: Erzin et al. [30] used a feature vector composed of raw line spectral parameters to drive a hidden Markov model based speech recognition system, and although their system did not rely upon a predetermined LSP interpretation, the positive results indicate that the Markov model was able to construct a good internal recognition framework.

It is probable that, as with alternative spectral measures, LSP based analyses would need to be combined with other signal measurements such as zero-crossing rate or band power calculations in order to construct an accurate speech classification system. LSP-based analysis is, however, extremely efficient, with around 10 to 20 operations required per frame for the measurements described in section 5.6.2 (on a typical 10th order analysis): even the efficient AMDF (average magnitude difference function) analysis requires at least 240 operations per frame (for a typical 240 sample analysis frame) [1].

5.9 LSP analysis and processing summary

This chapter has introduced the line spectral pair representation, discussed relevant properties and related these to signal processing and analysis. The outcome has been a number of LSP-based techniques relevant to speech enhancement. These are either directly relevant, by performing spectral alterations, or are indirectly relevant by aiding in the analysis of signals (speech or noise) to allow the speech enhancements to adapt to the type of speech being decoded and the type of background acoustic noise in the target situation. The methods are as follows:

1. Formant frequency alteration, through adjusting the LSPs describing a particular formant (section 5.6.1.2).
2. Spectral frequency alteration, through adjusting the LSPs describing the entire spectrum (section 5.6.1.2).
3. Formant broadening or sharpening, through narrowing the gap between certain pairs of lines (section 5.6.1.1).
4. Formant detection, by measuring the gap between adjacent lines (sections 5.4 and 5.5).
5. Speech detection, by the measures in section 5.6.2 combined with existing indicators.
6. Speech classification (voiced/unvoiced/fricative etc.) by the measures in section 5.6.2 combined with existing indicators.

Speech classification and detection will be explored further in chapter 6. Chapter 7 will integrate speech classification, the LSP-based processing and other speech modification methods into a speech enhancing CELP codec.

6 Sound Classification

6.1 Introduction

In the introduction to part II of this thesis, it was explained that an adaptive speech enhancement system was envisaged, integrated into a CELP codec, as shown in figP2.1. Adaptive enhancements are required firstly because it is known that certain of the speech intelligibility enhancement strategies to be employed can reduce speech quality, and are thus not wanted unless necessary (for example, speech modification is not required if there is no acoustic background noise in the environment of the listener), and secondly because each enhancement method may be specific to speech type and noise type (for example, shifting the frequencies of formants is not likely to improve intelligibility when listened to in white noise, or when the current type of speech contains no formants).

An adaptive enhancement strategy relies upon an analysis of both the acoustic background noise in the presence of the listener, and an analysis of the speech to be reconstructed by the CELP decoder. The acoustic background noise within the target environment is already sampled by a microphone, and analysed by the CELP encoder that transmits speech from the vehicle to the base station. Thus the output of this encoder is available within the CELP codec unit (where the enhancements are located), and can be analysed. The following analyses are performed:

1. It must be determined whether the listener is speaking. If not, then the microphone signal is assumed to derive from acoustic background noise and can be analysed. If the listener is speaking, a previous measure of the acoustic background noise must be used (and some measure of variance kept to determine how much the noise is changing between analysis frames).
2. The CELP parameters from frames of acoustic background noise are analysed to determine the level and type of acoustic background noise.
3. The CELP parameters received by the vehicle are analysed to decide if speech is present in the received signal.
4. If speech is being transmitted to the vehicle, then this must be classified.
5. The acoustic background noise and the currently transmitted speech are compared by a hearing model to determine the degree of intelligibility of the current speech in the current noise to an average listener.

Subsequent to these analysis stages, a decision can be made as to whether speech enhancement is required, and if so, which type and degree of enhancement should be used.

6.2 Available measures

Before describing speech and noise analysis, it must be emphasised, with reference to figP2.1, that parameters for analysis derive from two locations: the listener's environment and the speaker's environment. Both sets of parameters are limited to the standard CELP gain, pitch, linear prediction and codebook information.

Sound analysis is a well-established topic, with many solutions, such as measurement of zero-crossing rate, power, average magnitude difference function (AMDF), cepstral techniques, autocorrelation [1][71], higher-order statistics [89][75] or others [8], often combining several techniques. Unfortunately the data available in the CELP transmission stream does not contain many of the parameters used in reported analysis schemes, and thus alternative measures must be developed.

The CELP gain parameter encodes the energy contained in each analysis frame, and is thus similar to AMDF or power measures. The pitch parameters comprise two values encoding the strength of the pitch component in the speech, and the pitch period (β and M respectively, from section A2.2.4). The LSP values, as discussed in chapter 5, relate closely to speech type. Fig6.1 shows a typical example of a male speech utterance and the various CELP parameters relating to this. The LSP measure shown is that described in section 5.6.2.2, but now valued in Hertz.

Figure 6.1: How different CELP parameters vary with a typical speech utterance: (a) the speech waveform, compared to (b) LSP measure, (c) pitch strength and (d) gain.

Examination of fig6.1 reveals that pitch is strong during certain speech periods. The gain generally follows the speech amplitude envelope (when pitch is not present) and the LSP vote measure is low during voiced speech, and high during unvoiced speech.

Each of the measures here can be confused to some extent when the speech signal contains noise with characteristics resembling speech. The solution to noise confusion is not to rely on a single measure, but to combine them, and if necessary, to average measures over a longer time period, or to perform statistical analyses on measure distribution over time. This means that only the relatively unlikely occurrence of noise having speech-like characteristics in every measured respect will be misclassified.

The development of classification and detection models is a time-consuming process, involving much testing of different measures under different conditions, and using various speakers. Appendix 4 presents more detail concerning the implementation of the analysis systems - the remainder of this chapter summarises the outcome of the tests described in the appendix.

6.3 Speech detection and classification

Speech detection analysis is required both for the listener's acoustic background noise signal and for the decoded speech; however, there are different requirements for each. Considering that no analysis is perfect and that certain misclassifications will result, it is necessary to define thresholds and boundaries appropriately. For acoustic background analysis, it is extremely important that no speech contaminates the noise signal. If it did, then the signal being analysed would not be the signal interfering with communications for the listener (as the listener would stop talking when listening, especially in reduced-intelligibility conditions). For transmitted speech detection, speech enhancement is not required during periods of non-speech; however, it is more important to enhance all speech than it is to avoid mis-enhancing non-speech. Thus misclassifications should tend to exclude speech for the noise analysis path, and include non-speech for the speech analysis path.

For the decoded speech signal, classification is performed during the analysis process, with non-speech being assigned a separate class. For the noise signal, the presence of any speech must be detected. In fact the parameters available for noise analysis are identical to those available for speech analysis, and thus speech detection in both the speech path and the noise path utilises similar mechanisms.

6.4 Speech analysis

6.4.1 Speech type classification

The candidate enhancement schemes of formant adjustment and selective amplification operate on speech containing formants and unvoiced or fricative speech respectively. Thus these are two speech classes that must be detected. In addition, the presence or absence of speech must be determined.

The speech classifier operates on the LSP parameters, pitch strength and gain of the transmitted speech to classify each frame of speech as voiced, fricative, other speech or non-speech. The names of the classes do not exactly match their detected content, which is instead designed to match the enhancement strategy requirements: to classify the speech signal in terms of the type of enhancement to apply.

Thus speech frames detected in the voiced class are generally vowels, containing formant information, and are therefore most susceptible to enhancement through formant adjustment. Speech detected as being fricative, on the other hand, is better enhanced by selective amplification. Speech in the other speech class is indeterminate, and thus formant bandwidth adjustment or selective amplification may be appropriate, depending on the maximum amplitude of the current frame.

Speech classification is performed by comparing the available measures to fixed thresholds. The measures are normalized, and through extensive testing (section 8.3), suitable thresholds are found that correctly classify speech type for multiple speakers.
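A minimal sketch of such a fixed-threshold classifier is given below (Python); the threshold values are placeholders for illustration, not the tuned values found in section 8.3.

```python
def classify_frame(pitch_strength, gain, lsp_votes, p=10):
    """Classify one CELP frame as voiced, fricative, other or
    non-speech. pitch_strength and gain are assumed normalized to
    0..1; lsp_votes is the 0..p vote measure of section 5.6.2.4.
    Thresholds are illustrative placeholders only."""
    if gain < 0.05:
        return "non-speech"      # too little energy to be speech
    if pitch_strength > 0.5 and lsp_votes <= 0.3 * p:
        return "voiced"          # strong pitch, bottom-heavy spectrum
    if pitch_strength < 0.3 and lsp_votes >= 0.6 * p:
        return "fricative"       # weak pitch, top-heavy spectrum
    return "other"
```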

6.4.2 Formant detection

The detection of formants is required in order to firstly measure if each formant is audible (using the hearing model described later in section 6.6), and secondly to direct any formant adjustment, by the correct choice of which LSPs to move.

Although formant detection is an active research topic, with many reported techniques relying on analysis of such diverse parameters as linear prediction coefficients [2][11][64][134], cepstra [97] and even a Newton-Raphson method [72], the parameter constraints imposed by CELP dictate the methods that can be used: only LSPs or LPC coefficients are available. In fact, the LSP parameters do describe spectral information, and are converted to LPC coefficients in the CELP decoder - it is relatively simple to construct a spectral representation from the linear prediction coefficients (section A3.3).

Once a spectral representation has been obtained, it is possible to detect formants (spectral peaks) by a number of methods such as differentiating the spectrum to find maxima, peak picking, or determining maximum value within ranges corresponding to usual formant location (as seen in fig2.2 in section 2.2.1). In fact, once LSPs have been converted to LPCs, it is possible to solve the linear prediction equation numerically, with the roots corresponding to spectral peaks. However it is reported [64] that the former spectral techniques are preferable in terms of computation: both the root solving and the further processing required for the LPC polynomial solving technique are computationally more intensive.

Of course, spectral peaks may also be derived directly from the line spectral pair representation, usually being equivalent to the locations of the narrowest pairs of lines (section 5.4).

Figure 6.2: Speech waveform (a) alongside the output of different formant detection routines: (b) by differentiation of the LPC spectrum, (c) by solving the LPC polynomial, (d) by peak picking formants from the power spectrum and (e) by detecting the most closely spaced LSPs. (y-axes: frequency, 0-4000Hz; x-axis: analysis frames of 20ms.)

Fig6.2 compares the formant locations derived from spectral differentiation, solving of the linear prediction equation, picking peaks from the power spectrum, and detecting close pairs of LSPs. In each case the likely first three formants have been noted, and spurious peaks also plotted. No time-domain processing has been applied to any of the graphs, but a manual interpretation has been used when ‘joining the dots’ between peaks detected in each frame, to show the likely formant tracks for clarity. The strength and bandwidth of each detected formant have also not been shown.

It must be noted that there is no absolute measure of formant location, and that subjective analysis is often relied upon to interpret the raw spectral peak detection obtained through any particular algorithm. Details of the algorithms used to derive the graphs in fig6.2 are given in section A4.1.

It can be seen from fig6.2, and was observed through a number of tests on recordings of different speakers, that the formant positions derived from a differentiation of the linear prediction spectrum, or from peak picking the spectrum, are the most accurate. With little to choose between them in terms of computational cost, the former method was chosen for use in the enhancer.

All methods of formant extraction suffer from incorrect results: formants may merge into single peaks (or appear to when the frequency resolution of the representation is too low), formants may appear and disappear at different locations, and the number of formants may vary between none and five or more (although the number of formants conveyed in the linear prediction spectrum will never exceed half of the analysis order [90]). Reported methods of formant extraction almost always apply a further time-domain constraint to the analysis process. This confines formant locations to smoothly varying positions (with the maximum frequency change gradient derived from the maximum rate at which the human throat muscles can move).

To summarise, a method of formant detection is used which derives the formant positions and amplitudes from the linear prediction power spectrum. The spectrum is differentiated to find turning points, and differentiated again to determine which of these points are maxima. The amplitude of each maximum is found by averaging the spectral amplitudes of the three points located around the turning point (a frequency resolution of 20Hz is used for the spectral representation, and the averaging reduces the effects of spectral quantization). The resulting array of spectral peaks, ordered lowest frequency first, is considered to be a relatively good estimation of the formant frequencies present in the current frame.
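A sketch of this peak-picking procedure, under the assumption of a power spectrum sampled at the stated 20Hz resolution, might read as follows (Python; names are illustrative):

```python
import numpy as np

def pick_formants(power_spectrum, freq_step=20.0):
    """Find spectral peaks by double differencing, averaging the three
    points around each turning point to reduce the effect of the 20Hz
    spectral quantization. Returns (frequency_Hz, amplitude) pairs,
    ordered lowest frequency first."""
    d = np.diff(power_spectrum)
    # a maximum is where the slope changes from positive to negative
    peaks = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    return [(i * freq_step,
             power_spectrum[max(i - 1, 0):i + 2].mean())
            for i in peaks]
```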

The formant shifting/sharpening/broadening process does not require prior information on the formant location: it simply applies to each spectral peak encoded by the line spectral pairs. However, the formant locations are required for intelligibility analysis, as described in section 6.6, where the lowest frequency spectral peak (scoring in the top three in terms of amplitude) is considered F1, and the two spectral peaks of the top three next highest in frequency are considered to be F2 and F3. The relatively low resolution (20Hz step) of the spectral representation and the averaging process inherent in the formant location reduce the likelihood that spurious double peaks will be flagged as formants (in contrast, note that the LSP-based formant detection algorithm, with results shown in fig6.2e, suffers greatly from double spectral peak problems, seen as a number of misdetected formants clustering together).

6.5 Noise analysis

As discussed in chapter 4, the different enhancement methods are effective in different types of noise. Formant frequency shifting is ineffective in white noise, but formant sharpening/ broadening is most effective in white noise.

It is impossible to predict every type of noise that will occur in the target situation, but at least generic car-interior noise and the noise of the three police sirens will be present (section 2.4.4). The speech enhancement process must therefore be capable of reacting to any noise type or level. Despite this, there are good reasons for pre-formulating strategies to deal with common noise types. For example, if a siren is detected in the listener's acoustic background noise, and one frame of noise is contaminated by speech, then using the previous noise frame estimate for analysis will not work, because the masking frequency will have moved. However, it is possible to predict the masking frequency due to a siren at any instant.

Noise analysis can thus be considered for two cases. Firstly, if the current noise type fits one of a predetermined codebook of known noise types, then enhancement accuracy is improved. Secondly, when the current noise cannot be found in the codebook, the enhancement strategy must be calculated for this general case.

The design of the noise siren codebook classifier is given in section A4.2.

6.6 The hearing model, and intelligibility

The hearing model is designed to compare the background noise and the speech in a perceptual fashion to determine if that speech would be intelligible to an average listener in that acoustic background noise. In addition, the hearing model may be called upon to predict if modified speech can improve intelligibility (and if so, what degree of modification is required).

In order to effect a perceptual comparison, the hearing model must account for psychoacoustic effects in its analysis. Psychoacoustics concerns the difference between a purely physical audio measure and the experience of a listener, and is well researched for a number of consumer products such as A-law (or μ-law) compression in telephone systems [47], and the Philips DCC and Sony MiniDisc systems, both of which rely upon equal-loudness and masking models to compress high quality audio [44].

6.6.1 Psychoacoustic effects

Although many processes may be defined as psychoacoustic, only the following relevant subset are discussed:

• Masking (section 2.3.2.1)
• Equal-loudness (section A1.3)
• Frequency discrimination (section 4.4)
• Speech perception

Masking as a psychoacoustic effect may be modelled in two ways. Firstly, each tone in a complex sound may be analysed and a masking effect determined. The masking effects for each tone are then summed to yield an overall masking level. Unfortunately there is evidence that masking effects are not entirely additive [73]; however, some empirical methods have been developed to account for the non-linearity. A second method is to consider the ear as containing a number of critical-band filters (defined in section 2.3.2.1), and the masking level as a weighted sum of all noise powers that reside within each band. Implicit in such a model is a method of accounting for the effects of equal loudness.

The perception of speech may loosely be classed as psychoacoustic: whether or not a given period of speech is intelligible depends upon various factors. These include the audibility of the sounds that make up the speech (which will vary with time), and the relative importance of those sounds to the speech communication. Other perceptual factors include the predictability of the speech, its redundancy, and the listener's familiarity with that speech.

6.6.2 Reported hearing models

It is usual for most authors to begin to model psychoacoustic effects by considering the frequency resolution of the ear. The most popular measure of frequency is the Bark scale (see section A1.4), which is a frequency scale with the property that unit Bark increases are of equal perceptual relevance. The audio signal must be represented in this Bark scale. This may be accomplished by using a filter bank, with each filter bandwidth being a constant Bark, and the absolute position of each filter related to the critical band frequencies [124].

More usually, the audio signal is represented in the frequency domain as a spectrum (usually derived via FFT analysis), and this signal is warped into the Bark scale [25][101][102].

The Bark-scaled spectrum is then convolved with a spreading function [124] (a pre-calculated masking effect), or applied to an equal-loudness pre-emphasis filter and converted to perceptual loudness units [42].

Most auditory models of masking work in similar ways; the following method is derived from Hermansky, 1990 [42], and is compared with the methods of other authors.

6.6.2.1 Spectral analysis

A frame of speech is weighted by a Hamming window and discrete Fourier transformed to yield a frequency domain representation linear in Hertz, and converted to a power spectral representation.

6.6.2.2 Critical band warping

The power spectrum \(P(\omega)\) must be warped in frequency to fit a Bark scale. If the Bark frequency is \(\Omega\) and the (Hertz-linear) angular frequency is \(\omega\), then [42], for a sample rate of 4kHz:

\[ \Omega(\omega) = 6 \ln\left\{ \frac{\omega}{1200\pi} + \left[ \left(\frac{\omega}{1200\pi}\right)^2 + 1 \right]^{0.5} \right\} \qquad (6.1) \]

Now that the spectral index is represented in Barks, the effect of the critical band filter must be calculated using a relationship such as [42]:

\[ \Psi(\Omega) = \begin{cases} 0 & \text{for } \Omega < -1.3 \\ 10^{2.5(\Omega + 0.5)} & \text{for } -1.3 \le \Omega < -0.5 \\ 1 & \text{for } -0.5 \le \Omega \le 0.5 \\ 10^{-1.0(\Omega - 0.5)} & \text{for } 0.5 < \Omega \le 2.5 \\ 0 & \text{for } \Omega > 2.5 \end{cases} \qquad (6.2) \]
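Eqns 6.1 and 6.2 translate directly into code. The sketch below (Python/numpy, illustrative) is a literal transcription, with no parameters beyond those appearing in the equations themselves:

```python
import numpy as np

def hz_to_bark(omega):
    """Bark frequency from angular frequency omega (rad/s), eqn 6.1."""
    x = omega / (1200.0 * np.pi)
    return 6.0 * np.log(x + np.sqrt(x * x + 1.0))

def critical_band(offset):
    """Piecewise critical-band curve of eqn 6.2, where offset is the
    distance in Barks from the band centre."""
    b = np.asarray(offset, dtype=float)
    psi = np.zeros_like(b)
    lo = (b >= -1.3) & (b < -0.5)
    mid = (b >= -0.5) & (b <= 0.5)
    hi = (b > 0.5) & (b <= 2.5)
    psi[lo] = 10.0 ** (2.5 * (b[lo] + 0.5))
    psi[mid] = 1.0
    psi[hi] = 10.0 ** (-1.0 * (b[hi] - 0.5))
    return psi
```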

Figure 6.3: Comparison of critical-band spreading functions from various authors: Sen et al. [101], Virag [124], Jayant et al. [47], Cheng et al. [18][19] and Hermansky [42].

As the critical-band curve (also known as the spreading function, lateral inhibition function and noise masking curve) is a psychophysical phenomenon, it must be determined empirically. Most authors derive an approximation to this function such as that given in eqn6.2. This approximation is compared in fig6.3 with the functions used by various other authors. The curve of Cheng & O'Shaughnessy [18] additionally introduces some attempt to account for the lateral inhibition phenomenon [69].

Jayant et al. [47], Virag [124] and Sen et al. [101] use curves that appear similar, differing here only in the upper-frequency side of the curve (the part of the curve that is most centre-frequency specific). In fact the last of these authors corrects the upper-frequency side of their curve depending upon both absolute centre frequency and absolute power, to give a much better representation of a realistic function. Without such correction, accurate modelling of the critical-band function is not possible, and thus the curves of Jayant et al. [47] and Virag [124] are clearly inferior models. Note that the curves shown in fig6.3, where modelled as absolute-power and absolute-frequency dependent functions, have been evaluated only for a fixed centre frequency of 1kHz and a fixed power of 70dB SPL.

The flat-top curve of Hermansky [42] not only approximates the human critical-band function well, but accounts for the dependence on absolute amplitude and centre frequency. In addition the flat top function minimises the very real problems associated with the use of such functions with discrete frequency arrays (where the curve peak will not always coincide with a sample frequency). This is in effect a reduction in sensitivity to frequency quantization errors.

6.6.2.3 Critical band function convolution

The chosen critical-band function (eqn6.2) must be convolved with the warped spectrum (eqn6.1) to generate a critical-band power spectrum:

\[ \Theta(\Omega_i) = \sum_{\Omega = -1.3}^{2.5} P(\Omega - \Omega_i)\,\Psi(\Omega) \qquad (6.3) \]

In general, a spectrum of around 200 samples will now have been convolved into a coarser Bark domain representation (of around 20 to 40 constant Bark-width sample bins).
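Using the two functions sketched after eqn6.2, the convolution of eqn6.3 can be written as below; the choice of 24 output bands is an illustrative value within the 20 to 40 range quoted above:

```python
import numpy as np

def critical_band_spectrum(power, omega, n_bands=24):
    """Convolve a power spectrum (power sampled at angular frequencies
    omega, rad/s) with the critical-band curve at equally spaced Bark
    centres, per eqn 6.3. Uses hz_to_bark() and critical_band()."""
    bark = hz_to_bark(omega)
    centres = np.linspace(bark.min(), bark.max(), n_bands)
    theta = np.array([np.sum(power * critical_band(bark - c))
                      for c in centres])
    return centres, theta
```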

6.6.2.4 Equal-loudness preemphasis

Many attempts have been made to quantify the equal-loudness function of the ear. As the function is both frequency and amplitude specific, and is usually derived for the case of a single tone, certain assumptions must be made for its general use. Most authors simply base their preemphasis around the 40dB curve (figA1.2 in appendix 1), such as the following used by Hermansky [42]:

\[ E(\omega) = \frac{\omega^4 (\omega^2 + 56.8 \times 10^6)}{(\omega^2 + 6.3 \times 10^6)^2 \, (\omega^2 + 0.38 \times 10^9)} \qquad (6.4) \]

Note that this approximation to the equal-loudness curve is close up to around 5kHz, but above this it should be extended with a further term:

\[ E(\omega) = \frac{\omega^4 (\omega^2 + 56.8 \times 10^6)}{(\omega^2 + 6.3 \times 10^6)^2 \, (\omega^2 + 0.38 \times 10^9) \, (\omega^6 + 9.58 \times 10^{26})} \qquad (6.5) \]

The function in eqn6.5 is plotted in fig6.4.

Figure 6.4: Equal-loudness preemphasis function of eqn6.5 (x-axis: frequency in Hz).
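Eqn6.5 likewise transcribes directly; a minimal Python sketch:

```python
def equal_loudness(omega):
    """Equal-loudness preemphasis of eqn 6.5 (the eqn 6.4 curve with
    the additional high-frequency denominator term); omega in rad/s."""
    w2 = omega ** 2
    num = (w2 ** 2) * (w2 + 56.8e6)
    den = (w2 + 6.3e6) ** 2 * (w2 + 0.38e9) * (omega ** 6 + 9.58e26)
    return num / den
```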

6.6.2.5 Intensity-loudness conversion

Finally, the equal-loudness function of eqn6.5 is combined with the critical-band power spectrum, and a numerical conversion is made to relate the power units to perceived loudness rather than power - this is known as the power law of hearing [42][73]. The perceptually relevant spectrum is now found to be:

\[ \Phi(\Omega) = \left[ E(\omega)\,\Theta(\Omega) \right]^{0.33} \qquad (6.6) \]

For a flat input spectrum, fig6.5 shows a graph of the convolved critical-band filters for each element of a frequency array, with each frequency bin having constant Bark width.

Figure 6.5: 40 equal-Bark-width convolved critical band filters of eqn6.6.

For comparing two perceptually weighted spectra, the power law of hearing is unnecessary if the absolute difference between the two spectra is not required: the conversion is monotonic, and so does not affect which of the two spectra is the larger at any frequency. This simplification has been accepted in practice for the speech enhancement intelligibility estimation subsystem.

6.6.3 Hearing model outcome

The psychoacoustic processes modelled in section 6.6 can be used to analyse a given frame of sound in order to determine the overall masking effect of this sound. That is, a tone of given frequency would need to be louder than the calculated masking value at this frequency in order to be heard.

In fact, the output of most methods is a Bark domain array - with each bin holding the masking value for the critical filter centred at that frequency. It can be assumed that a harmonic whose components do not exceed the level of masker in any of the bins within which they lie would be inaudible, whilst a harmonic whose components all exceed the masking levels would be audible. If certain of the tones within the harmonic exceed the masking level, then there is a degree of audibility, but the sound heard may not be that of the full harmonic sound.

Figure 6.6: Masking level from spectrum for 16 constant-Bark-width subbands (x-axis: Bark; y-axis: relative amplitude). Top shows the frequency spectrum, with subband masking levels added below.

Fig6.6 illustrates the output from a 16-band psychoacoustic analysis of a sound, the spectrum of which is shown. Any tone must rise above the masking level in the subband within which it is located in order to be audible. For speech, formants rising above the masking level will be audible; formants not rising above the masking level will be inaudible.

Section 2.2.1 discussed the relative importance of speech formants, with F1 being found to be relatively less important to intelligibility than F2 and F3. In order to construct a measure of intelligibility, the audibility of each formant must be considered. For speech not containing formants, the audibility of the entire speech spectrum (approximately 250Hz to 2kHz) should be considered. Audibility is defined here as the amplitude difference between the (perceptually weighted) sound to be heard and the calculated noise masking level in the current critical band (Bark index). Negative values of audibility indicate that the sound will not be heard by an average listener in the given noise, but the value of audibility has no other meaning.

Subjective testing has determined that a good approximate measure of intelligibility for voiced speech is a weighted average of the audibility of the first three formants, with the weighting acting to reduce the contribution of F1 by 50%.

In order to calculate the effects on intelligibility of shifting formants, it is necessary to first calculate the audibility of the formants at different frequency positions on the masking curve. Intelligibility is then found from the weighted average.

6 Sound Classification 71 Figure 6.7: Audibility of frequency shifted spectral peaks (a to f) with respect to acoustic background masking noise. Fig6.7 illustrates the method of determining the audibility of shifted spectral peaks (or formants). Peaks a to f have been shifted higher or lower in frequency, and in each position the peak signal to masking noise level has been found (on the right hand side). For this example, instances a,b and f are audible, and c to e are all inaudible. Thus a spectral peak as shown would need to be shifted to position f to be most audible. For formants, or multiple spectral peaks, the weighted average of the peak signal to masking noise levels would be considered instead.

7 Designing a speech-enhancing CELP coder

7.1 Introduction

This chapter describes the speech enhancing CELP coder. The enhancements are known: some of them are performed by adjusting LSPs, and others by adjusting the CELP gain parameter, whilst the type of enhancement is chosen based upon the current type of speech and noise.

Figure 7.1: A block diagram of the enhancing CELP system. Original CELP functions are outside the dotted enclosure; enhancing additions are inside. Shown is a CELP decoder and part of a CELP encoder, sufficient for acoustic background noise analysis.

Fig7.1 contains a block diagram of the speech enhancing CELP coder, within which can be seen the three chosen types of enhancement algorithm, selected and adjusted by an ‘expert system’ on the basis of an analysis of the speech currently being decoded and an analysis of the acoustic background noise in the environment of the listener. The parameters for the latter analysis are derived from a CELP uplink normally used to transmit speech from the listener to the speaker.

This chapter describes the operation of the speech enhancing CELP system in more detail - summarising the findings of the previous chapters on enhancement using LSPs, speech classification, noise analysis and others.

7.2 Data flow analysis

Speech is encoded by a CELP coder and transmitted to a CELP decoder located in the target environment. The received parameters (LSP, pitch, gain and codebook values) are then passed to the speech enhancement subsystem before being decoded by the CELP decoder into speech.

The speech enhancement subsystem analyses the CELP parameters to firstly determine the class of speech being received (non-speech, voiced, fricative or other) and then determines what formant frequencies, if any, are present in that speech. A perceptually-weighted speech spectrum is constructed from the LSP parameters (via spectral generation from LPC coefficients as described in section A3.3).

Concurrently with the speech analysis, the enhancement subsystem is also analysing the CELP parameters from a CELP encoder located within the target environment. Firstly the parameters are checked to ensure that no speech is present within the target environment, and secondly, a noise spectrum is derived from these parameters. This spectrum is then perceptually weighted and applied to a hearing model to determine its masking effect. If speech is contaminating the noise signal then the previously uncontaminated perceptual analysis is re-used.

The hearing model and expert system together select the type and degree of speech adjustment (if any) to be made to the CELP LSP and gain parameters. CELP parameters, having been adjusted or not, are passed from the enhancement subsystem to the speech reconstruction stage of the CELP decoder, which synthesises the speech to finally broadcast to the listener.

7.3 Hearing model and expert system

The perceptually weighted speech and noise spectra are compared in the regions of the formant frequencies to determine the audibility of each formant, and from this an intelligibility measure is constructed (section 6.6.3). If this intelligibility measure is found to be negative (i.e. unintelligible) then the noise masking spectrum is examined in a number of regions around each formant, extending as wide as the allowable formant shift, to determine if formant shifting will improve intelligibility. If, however, the speech was found to be intelligible then no further processing is required and the unaltered CELP parameters are passed on to the speech reconstruction stage of the CELP decoder.

If the speech is fricative, the intelligibility is still measured by comparing the perceptually weighted noise masking and speech spectra. If the speech is unintelligible then selective amplification is used to improve intelligibility. A check is made of absolute amplitude levels of speech and noise to ensure that selective amplification does not actually reduce intelligibility (discussed in section 2.3.1). If this looks likely then selective amplification is not used, and a fixed degree of formant narrowing is used. If speech in the other class is detected, or if voiced speech intelligibility cannot be improved through formant shifting, then formant broadening is used instead.

7.4 Enhancement

Under the selective amplification scheme, the CELP gain parameter is scaled by an increasingly larger gain multiplier for each consecutive frame of unintelligible fricative speech, until a maximum gain multiplier value is reached. For non-fricative speech frames, the gain multiplier value is reduced towards unity, with the reduction twice as steep as the increase was. The gain multiplier's sloped attack and decay curves, shown in fig7.2, were chosen empirically through informal listening tests, which also demonstrated an obvious enhancement of fricative speech periods.

Figure 7.2: Selective amplification speech enhancement scheme amplification factor.
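The per-frame update behind fig7.2 can be sketched as a two-slope state update (Python); the step size and gain ceiling below are illustrative values, not those chosen in the listening tests:

```python
def update_gain_multiplier(g, fricative_and_unintelligible,
                           step=0.1, g_max=2.0):
    """Ramp the selective amplification gain multiplier up while
    unintelligible fricative speech persists; otherwise decay it back
    towards unity twice as steeply (fig7.2)."""
    if fricative_and_unintelligible:
        return min(g + step, g_max)      # sloped attack
    return max(g - 2.0 * step, 1.0)      # decay, twice as steep
```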

LSP adjustments have been explained in chapter 5. Maximum LSP shifting is ±20%. This figure has been established through informal listening tests and has been demonstrated as a value that does in fact result in enhancement (in section 9.3). Formant broadening to counteract wideband noise is fixed at 180% (established through testing in section 9.5.2, and also shown to provide enhancement in section 9.3), and formant narrowing of fricative speech that cannot be amplified is fixed at 75% (established through informal testing; impractical to test extensively, but tested implicitly in the final system tests of section 9.6).

7.5 Expert system

The expert system within the enhancement system is designed to select the most appropriate enhancement, and degree of enhancement for each decoded frame of speech. To do this, a number of rules have been established governing the type of enhancement that can be applied in any particular type of noise to any particular class of speech.

Fig7.3 shows the decision matrices implemented within the expert system. Firstly, the type of speech is determined, ruling out certain enhancement types. For example, only the voiced speech class has distinct formants, and thus formant frequency shifting can only be applied to voiced speech. However minor spectral peaks are also sometimes observed in fricative speech, and often in the other speech class. These may be sharpened to subjectively improve intelligibility.

Figure 7.3: Rules for enhancement type selection based on speech and noise types. (Two decision matrices, reproduced here only in outline: one maps types of speech to enhancement, the other types of noise to enhancement. The enhancements considered are LSP shift up, general LSP shift, selective amplification, LSP narrow and LSP broaden; the speech classes are voiced, fricative, other and non-speech, the last requiring no change. Footnotes: [1] if amplitude is low; [2] unlikely in various noise types.)

Selective amplification is designed to normalize speech amplitudes between louder and quieter phonemes, and thus does not operate on the voiced speech class containing mostly vowel sounds. However some speech in the other class may be amplified if it is of low amplitude. Non-speech is not processed.

Secondly, the expert system uses the type of acoustic background noise to restrict enhancements: either a general case, or noise detected within a noise codebook (siren noise). Siren noise is very much a tonal noise (section A4.2), and thus only those speech frequencies close to the varying siren tone will be masked, making this an ideal situation for formant shifting. Fig7.4 illustrates how example formants will be shifted in the presence of siren noise. Note that the formant frequencies are shifted away from the changing siren frequency - this means both upward and downward shifts.

Fig7.3 also indicates the response to car interior noise, a specially shaped noise type used in listening tests. LSP shifting in such noise is generally upward due to the predominance of low frequencies in the spectrum (shown in fig2.10).

Figure 7.4: Formant shifting being applied to two formant tracks (roughly horizontal lines) to overcome periods of inaudibility due to interfering siren noise (one period shown, sweeping through its frequency profile).

Finally, the enhancements are performed. If multiple enhancements are possible on the current frame, the allowed enhancements are selected in the following order: LSP shift → selective amplification → LSP narrow/broaden. Thus for voiced speech, if LSP shifting cannot increase intelligibility, LSP narrowing/broadening will be selected instead. For fricative speech, selective amplification is preferable; however, if the amplitude is too loud this will be disallowed, and LSP narrowing/broadening will be selected.

PART III: Testing and Evaluation

Parts I and II of this thesis have presented speech enhancement methods, related these to hearing, speech, the CELP coder and the types of noise likely to be found in the target system, before proposing a subset of the methods for further integration with the CELP coder.

Methods were reported of integrating enhancements into the CELP coder, including the use of a novel LSP-based speech modification technique. The requirements of the enhancement methods, and the systems needed to control them automatically, were investigated, and means of conducting the relevant analyses and deriving the required parameters were found.

Finally, a speech enhancing CELP structure was proposed, and its operation discussed.

Part III begins in chapter 8 by describing some of the experimental work conducted in order to develop the speech enhancements and integrate these with the CELP structure. Appendix 6 contains implementational details of and more results from the tests in chapters 8 and 9.

Chapter 9 will discuss the methods used to test the speech enhancements, and to test the final speech enhancing CELP system - in contrast to chapter 8, here aspects up to the level of the entire system are investigated using objective and listener trial methods. Test results are presented and analysed before chapter 10 summarises the entire speech enhancing CELP system, its capabilities and limitations.

Chapter 11 then concludes by restating the aims of the speech enhancements, and how the system developed and described in this thesis fulfils those aims. Consideration is also given to novel methods developed here to perform speech enhancement and the possibilities of extending those methods, and the enhancement system in general to other applications.

8 Test objectives and systems

8.1 Introduction

Much experimental testing was required in the investigation of the speech enhancing CELP codec. This chapter presents the CELP simulation upon which the methods were tested, and some of the experiments that have been conducted with it.

Evidence is also presented of the effectiveness of the speech classification, speech detection and noise classification tests and methods.

In order to test the speech enhancement functions, samples of realistic speech and noise were required. The speech type and noise simulations are described.

8.2 CELP simulator

The CELP codec, described in chapter 3, compresses speech into LSP, gain, pitch and excitation codebook values. Of these, the LSP, gain and pitch parameters are required for speech detection and analysis, and the LSP and gain parameters are adjusted by the enhancement methods.

It is well known that the most computationally intensive part of the CELP encoder is the codebook search loop [98][78], however for the purpose of speech enhancement testing, this is not necessary: the codebook index is not used. In fact, CELP can be simplified considerably to investigate such speech enhancements.

Fig8.1 shows the simplified CELP structure used for enhancement simulation purposes. The LPC analysis process determines LPC coefficients and removes the LPC contribution from the speech signal, which is then subject to LTP analysis: determining the pitch parameters and removing these from the speech signal. The speech signal gain is calculated from the speech signal (a calculation usually performed as part of the LPC analysis process), but the residual after LPC and LTP analysis is not normalized and encoded as a codebook index as in standard CELP.

Figure 8.1: Simplified CELP structure used for enhancement simulations (signals labelled in the original diagram: enhancement commands, analysis parameters, speech signal, CELP parameters).

In the structure used, the residual signal is amplified by the speech enhancement gain multiplier, used for selective amplification, and then the pitch parameters added in. Finally, the signal is fully reconstructed with the LPC contribution added in. LSP adjustments act to alter this LPC contribution.

Standard CELP is a ‘lossy’ coder: all parameters are quantized, with the residual in particular being quantized as one of a number of codebook vectors. This CELP simulation, however, is an almost lossless coder. If no adjustments are made to LSP values, and the gain multiplier is unity, then the output speech will be identical to the input speech, excepting floating point or fixed point rounding effects.
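This near-lossless property is easy to verify in a sketch. The fragment below (Python; illustrative, and omitting the LTP stage for brevity) performs LPC analysis by the autocorrelation method, removes and then restores the LPC contribution, and confirms the round trip is exact to rounding error:

```python
import numpy as np
from scipy.signal import lfilter

def lpc_coeffs(x, order=10):
    """Analysis filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p via the
    autocorrelation method and Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        new = a.copy()
        for j in range(1, i):
            new[j] = a[j] + k * a[i - j]
        new[i] = k
        a, err = new, err * (1.0 - k * k)
    return a

frame = np.random.randn(240)               # one 240-sample frame
a = lpc_coeffs(frame)
residual = lfilter(a, [1.0], frame)        # analysis (whitening) filter
rebuilt = lfilter([1.0], a, residual)      # synthesis filter
assert np.allclose(frame, rebuilt)         # lossless round trip
```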

Simplification of the CELP simulation not only saves a significant amount of otherwise wasted processing time (for example, simplification increases the speed of a MATLAB simulation by a factor of around 100), but removes some of the effects of CELP coding, such as those due to parameter quantization, to allow the enhancements to be individually characterized, and to allow enhancement effect to be related directly to cause without consideration of any interfering processes.

This CELP simulator, developed under MATLAB, has been used for many tests, including most of the subjective listening tests, and the development of the enhancement strategies. The CELP simulation itself is investigated more thoroughly in [68]. However, the final system testing, as described in chapter 9, uses a commercial CELP coder written in ‘C’ and modified with enhancement additions. This coder includes all of the effects of CELP coding, such as parameter quantization, and the use of the codebook (vector quantization).

8.3 Speech classification

8.3.1 LSP measure in relation to speech features

Section 6.2 explained that speech classification and detection must be accomplished using the available CELP parameters, while section 6.4 proposed a method of speech classification using the CELP pitch strength, LSP vote and power measures.

Tests have been conducted to further substantiate the assumption (section 5.6.2) that noticeable changes in LSP value when representing different classes of speech can be quantified into a speech measure. Tests used the LSP vote measure of section 5.6.2.4 compiled within the CELP simulation of section 8.2.

A number of sentences, totalling over 20 minutes of speech, were randomly selected from the TIMIT multi-speaker speech database [118], which contains not only a large selection of speech sentences spoken by various North American male and female speakers, but also a time-indexed phonetic transcription of the speech. The phonetic transcription, made manually by a panel of speech experts, notes the start and end positions of each speech feature within the recordings.

A simulation was constructed that calculated the LSP vote measure on fixed-sized speech frames in order to correlate these against the phoneme within which they are found. The LSP measure value obtained from each frame was added to the total measure obtained for the current phoneme, and the phoneme occurrence counter incremented. When a phoneme boundary occurred within a frame, the measure was assigned proportionately to the phoneme on each side of the boundary, and the count of each phoneme occurrence incremented fractionally.

Fig8.2 gives the raw results in terms of average LSP vote measure with respect to type of phoneme for a selection of 5896 individual phonemes of 57 types. The phoneme names are those of the TIMIT database, with certain additional symbols, h# (padding frames before and after speech onset) and pau (pause between phonemes or words) indicating non-speech.

The results are difficult to interpret, but it can be seen that certain of the fricative phonemes (jh, ch, f, z, sh, s) show high average measure values, and that strongly voiced phonemes (m, n, ao, aa, oy, w) show low average measure values.

Figure 8.2: LSP vote measure average and standard deviation shown against phoneme type.

To aid the interpretation of results, in fig8.3, the phonemes have been grouped into five classes by type, and the average phoneme measure per group plotted with error bars indicating the maximum and minimum average phoneme measure found for any phoneme within that group.

Figure 8.3: Average, maximum and minimum LSP vote measures per phoneme class (classes: voiceless fricatives and affricatives, fricatives, plosives, pause & silence, other, nasals & glides; x-axis: votes per frame, 0-8).

Fig8.3 indicates that those phonemes that would most benefit from speech enhancement by selective amplification can be distinguished by the LSP vote measure. Fricatives and affricatives, in particular, can be detected for possible further selective amplification.

Of course, no single speech measure alone would be used for classification. Improvement would be noted when other measures (pitch strength and gain) were considered. In particular other measures must be relied upon to detect non-speech and voiced speech (contained here within the ‘other’ category) which the LSP vote measure distinguishes poorly.

8.3.2 LSP measure compared to other measures

Section 8.3.1 indicated that an LSP measure alone could be used to distinguish between certain classes of speech. In this section, an LSP measure is similarly applied to a number of sentences taken from the TIMIT database [118], however it is here compared to three other standard speech classification measures. In addition, the test has been conducted for the speech sentences mixed with each of five levels of interfering white noise.

Hereafter, the four tested speech measures are referred to by their abbreviations:

• LSP: the measure of section 5.6.2.3, equal to summing the differences between each LSP value in Hertz, but also subtracting the sum of the nominal LSP values in Hertz for a flat spectrum. (Equivalent to the measure of section 5.6.2.2 with no square prior to summation, and normalized by dividing the result by the system order.)

• ZCR: zero-crossing rate, the number of times the sample value within a speech frame crosses the zero-axis, divided by the number of samples within the speech frame.

• POW: the frame power measure, equivalent to summing the square of each sample value within the frame, divided by the number of samples within the frame.

• AMDF: average magnitude difference function, the sum of the absolute value of each sample divided by the number of samples within the frame (often used instead of POW when implemented on systems without efficient multiply instructions).

With the exception of the novel LSP-based method, these are all standard metrics by which speech can be classified [1][132][131]. Note that the tested speech samples were in 16-bit format, and speech was normalized to a maximum amplitude of 40% of full-scale. Variable amounts of noise were added to this as explained in section A6.5.

For the test, each speech sentence was analysed in a number of 240-sample speech frames and the four measures calculated for each frame. Use of the TIMIT phonetic transcription files enabled measures to be assigned to the current phoneme type, as in the experiment of section 8.3.1. Changes made between the previous and current experiments were an increase in the number of analysed sentences, and the rejection of measure values for phonemes with less than 25% overlap with the current frame. There were 4169 logged measure occurrences for 58 phoneme types.

Each of five levels of noise was added to the sentences prior to analysis, giving six test conditions including the noise-free case. As the amplitude of each speech recording was normalized, the maximum speech amplitude to maximum noise amplitude ratios tested can be given as 9dB, 6dB, 3dB, 0dB and -3dB.

For each of the noise conditions, Spearman rank correlation was used to determine the similarity between the different measures, and between the same measures when the speech was noise-contaminated and when it was not. The Spearman correlation relies upon the placings of each phoneme in terms of its rank, rather than the measure value itself, thus working in a similar way to speech classification which deals with the separability of classes of phonemes in the n-dimensional space defined by the n speech classification measures.
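In a modern toolchain the rank correlation itself is a one-liner; the following sketch (Python with scipy, using random stand-in data in place of the per-phoneme averages of the real experiment) shows the computation assumed throughout this section:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
lsp_by_phoneme = rng.random(58)   # stand-in per-phoneme LSP averages
zcr_by_phoneme = rng.random(58)   # stand-in per-phoneme ZCR averages

rho, p_value = spearmanr(lsp_by_phoneme, zcr_by_phoneme)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```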

The results indicate that under noise-free conditions, the LSP and ZCR measures correlate well (the full interpreted experimental results are presented in table A6.7, section A6.5). In a CELP decoder environment, there is no possibility of a speech enhancement system having access to the original speech signal in order to calculate ZCR, and thus the significant correlation shows that the LSP measure can be used as an alternative for speech classification. Reassuringly, the AMDF correlates very well with the frame power measure. There is little strong correlation elsewhere; however, both the ZCR and LSP measures are weakly inversely proportional to the power and AMDF measures, with the LSP measure slightly less weakly related than the ZCR measure.

As noise level increases, all correlations except the power-AMDF relationship break down with saturation. Evidence suggests that the LSP measure deteriorates less than the ZCR measure as noise increases (table A6.8 and figA6.1 in section A6.5). The frame power measure also appears more robust in noise than the AMDF measure, as has been reported [1].

Speech classification, involving the determination of phoneme type through parameter analysis of speech, can be described as defining a region in n-dimensional space within which occurrences of the desired phoneme reside, and where n is the number of feature measures used. Speech classification is explored further in section A4.3.

Fig8.4 shows the two-dimensional zero-crossing rate versus frame power measure space for noise-free speech, with the average locations of each of 58 phonemes noted (with the type being as described in the TIMIT documentation [118]). With ZCR and power parameters being common speech classification methods, as discussed previously, it can be seen that a suggested classification region can be drawn to encompass the phonemes judged most likely to benefit from selective amplification.

The meaning of the TIMIT phonemes can often be guessed from their symbols, perhaps with the exception of q, which indicates the presence of the ‘t’ in “bat”, and dh, the ‘th’ in “then”. pau denotes a pause between words; epi is an epenthetic silence (such as the gap between the ‘m’ and ‘b’ in “thimble”, often found between a fricative and a semivowel or nasal). h# is the TIMIT marker prior to the beginning, and following the end, of a word, denoting non-speech or silence.

Figure 8.4: ZCR measure against frame power measure, showing phoneme classifications and approximate detection region bounded by a dotted line.

The suggested classification region encompasses all fricatives, stops and nasals, some glides and semivowels, and two vowels. ax-h is called a devoiced schwa, a very short unvoiced vowel typically bounded by voiceless consonants, such as the ‘u’ in “suspect”. ix is the ‘i’ sound in “debit”, an unvoiced breathy sound. The glides hh, hv and y are the ‘h’ in “hay”, the ‘h’ in “ahead” and the ‘y’ in “yacht”. Each of these sounds is unvoiced and therefore likely to benefit from enhancement through selective amplification.

Further sounds, classed as ‘vowels & other’, are the consonant closures. The TIMIT documentation describes the closure intervals of the stops b, d, g, p, t, k as being distinguished from the stop release wherever possible. The closures are thus bcl, dcl, gcl, pcl, tcl, kcl, with the closures of jh and ch also being dcl and tcl. Closures occur when the path of air and sound out of the mouth is momentarily stopped by the throat, tongue or lips blocking the vocal tract. These may thus be classified as non-speech, although they, and some of the marked non-speech sounds, are an integral part of speech communication. For the purposes of enhancement, however, each of these sounds can be categorized as a feature that would not benefit from the available enhancement methods.

Note from fig8.4 that the tight bunching of the nasals indicates that the combination of ZCR and POW measures is very good at classifying this type of speech.

In contrast, the LSP measure is plotted against the power measure in fig8.5, and the phoneme positions marked, again for noise-free speech.

Figure 8.5: LSP measure against frame power measure, showing phoneme classifications and approximate detection region within the dotted boundary. (Key: affricates & fricatives; stops; nasals; glides & semivowels; vowels & other.)

The classification boundary of fig8.5 encloses the same phoneme types as those in fig8.4; however, it can be seen that the separation of the classification region and the bunch of vowels at the top left of the plot has improved over the ZCR case. This is evidence that the LSP measure, when used in conjunction with the POW measure, is better able to distinguish the phonemes in the suggested region than the ZCR measure.

When speech is corrupted by noise, the effect is to reduce the distinction between classification regions through saturation of the measured values. Fig8.6 shows phoneme positions plotted against ZCR and frame power measures for speech contaminated with noise such that the speech to noise amplitude ratio is 9dB. Due to the saturation of analysis frames by additive noise, there is much less variation in both the frame power and ZCR measures (previously ranging from 10³ to 10⁷ and from 0.1 to 0.7 respectively).

Despite the evident bunching together of phonemes, it is still possible to apply a decision region and separate the wanted phonemes at this level of noise. In this case, the closure locations and non-speech periods also fall within the classification region; the latter must therefore be distinguished in another way.

Figure 8.6: ZCR measure against frame power measure for 9dB SNR, showing phoneme classifications and approximate detection region within the dotted boundary. (Key as for fig8.5.)

Similarly, when the LSP measure is plotted against frame power for a 9dB speech-to-noise ratio condition, as in fig8.7, the measure scales are significantly reduced (LSP measures previously ranged from -400 to 140) and phoneme bunching occurs. However, as in the case of the ZCR measure, a classification region can still be drawn: even in such noise conditions, both LSP and ZCR measures, when combined with a frame power measure, are capable of applying the suggested speech classification region.

Figure 8.7: LSP measure against frame power measure for 9dB SNR, showing phoneme classifications and approximate detection region within the dotted boundary. (Key as for fig8.5.)

When the amplitude of noise added to the speech prior to analysis becomes extreme, all of the tested classification methods become unusable. This is shown in fig8.8, which plots the LSP measure against the frame power measure for a speech-to-noise ratio of -3dB.

The impossibility of separating phoneme types is illustrated by the presence of the vowel sounds uh, uw and iy within the main body of consonant phonemes. Note that due to the saturation of the frame power measure, this is no longer plotted logarithmically. The saturated LSP measure now extends from approximately -12 to 15, a range that extended from -400 to 140 under noise-free conditions.

Figure 8.8: LSP measure against frame power measure, showing phoneme positions coloured according to classification, for speech mixed with equal amplitude white noise. (Key as for fig8.5.)

Despite the reduction in effectiveness of the measures with added white noise, the experiment demonstrated that the LSP measure is more robust to noise than the well-known zero-crossing rate measure (this is also demonstrated numerically in table A6.8 in section A6.5). The strong correlation between the LSP and ZCR measures in noise-free and 9dB SNR conditions indicates that both measures describe approximately equivalent speech features.

It should perhaps be noted here that zero-crossing rate measurement, although a relatively computationally efficient measure, still requires m comparisons and up to ½m additions (= 1½m operations) to be performed within a speech frame of length m. By contrast, the LSP measure only requires p+1 additions for a p-th order analysis system. Typical values (and those used in the TETRA codec [120]) are m=160 and p=10. Thus the LSP measure is over 20 times more efficient.

8.3.3 Interpretation of phoneme test results

Section 8.3.1 demonstrated that an LSP-based measure could be used to classify speech phonemes into different classes (particularly those classes of phonemes naturally suited to enhancement by selective amplification). Section 8.3.2 indicated the similarity between the LSP-based and ZCR measures, implying that the LSP measure may be suitable to replace the ZCR measure in certain situations, such as speech classification. In addition, the LSP measure is more efficient to compute, more robust to noise, and better suited to classification for speech enhancement due to its ability to classify the wanted phonemes into a region that is more distinct than the region drawn for the ZCR measure.

The actual implementation of the speech classifier relies upon frame power, LSP measure and pitch strength features, and is described further in section A4.3; fig8.9 demonstrates the classification output for a test speech waveform.

Figure 8.9: Speech classification result (b) for a test waveform (a), over 20ms analysis frames. (Classes: non-speech, other, fricative, voiced.)

A number of test plots similar to that of fig8.9 were made for different speakers to test the classification scheme, and to fine-tune the thresholds and decision regions.
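The overall shape of such a classifier can be sketched as below. This is purely illustrative: the actual features, decision regions and threshold values are those developed and tuned in section A4.3, every constant here is a placeholder, and the sense of each comparison is an assumption.

```c
/* Illustrative frame classifier sketch; real thresholds and decision
   regions are those of section A4.3, not these placeholders. */
enum speech_class { NON_SPEECH, OTHER, FRICATIVE, VOICED };

enum speech_class classify_frame(double power, double lsp_measure,
                                 double pitch_strength)
{
    const double POWER_FLOOR   = 1.0e3; /* placeholder values only */
    const double LSP_FRICATIVE = 40.0;
    const double PITCH_VOICED  = 0.5;

    if (power < POWER_FLOOR)
        return NON_SPEECH;              /* too quiet to be speech  */
    if (pitch_strength > PITCH_VOICED)
        return VOICED;                  /* strong periodicity      */
    if (lsp_measure > LSP_FRICATIVE)
        return FRICATIVE;               /* fricative-like spectrum */
    return OTHER;
}
```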

Despite the signal-to-noise levels tested in section 8.3.2 ranging from 9dB to -3dB, it must be emphasised that in the target enhancement system, the speech analysis is assumed to operate on speech from a quiet police base station. Even SNR levels of 9dB are considerably more noisy than is likely: these tests were designed to characterise the LSP classification performance by comparison to the ZCR measure, and not to indicate probable noise levels.

8.4 Speech intelligibility

The speech intelligibility measure, based upon the hearing model described in section 6.6, identifies the audibility of each formant (or higher amplitude spectral regions when formants are not present), weights these, and calculates an intelligibility measure.

The operation of the hearing model can be tested by comparing a given sound to a given acoustic background noise to calculate the audibility, then mixing these and presenting them to a listener.

Figure 8.10: Hearing model output comparing white noise (dotted curve) and a single 1.6kHz tone (solid curve).

Fig8.10 shows a typical output from the hearing model, comparing the masking effect of white noise and the audibility of a 1.6kHz sinusoidal tone. The levels of tone and noise chosen are those found to be just audible by a normal-hearing listener. Such informal tests have demonstrated that the hearing model can in general predict audibility and non-audibility for tonal sounds. When the tone is just-audible to some listeners, the model can be incorrect, and thus a safety margin is used when determining whether speech enhancement is required. A formant that is only audible by a small amount (up to 50 units above the noise on the currently used audibility scale) will be considered inaudible for enhancement purposes.
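As a sketch, the safety-margin rule just described might be expressed as follows. The 50-unit margin is taken from the text; the function and parameter names are illustrative assumptions.

```c
/* A formant is treated as inaudible (and thus a candidate for
   enhancement) unless it exceeds the noise by more than the safety
   margin on the hearing model's audibility scale. */
int formant_needs_enhancement(double formant_audibility,
                              double noise_audibility)
{
    const double MARGIN = 50.0; /* audibility-scale units (from text) */
    return (formant_audibility - noise_audibility) <= MARGIN;
}
```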

8.5 Noise simulation

A simulation of the noise likely to be found within the vehicle is an important aspect of the testing of the enhancements, and to this end, the vehicle noise has been characterised and simulated. Simulations have been made of a reasonably realistic vehicle interior noise, a simplified noise with similar shape (designed to be easily reproducible by other experimenters to allow direct comparisons of enhancement methods), the three sirens currently used by UK police forces, and a simplified tonal sound (again designed to be easily reproducible).

8.5.1 Realistic noise simulation

A simulation program has been written that generates a model of car interior noise based around previously published average vehicle spectrum analyses. In particular, the data presented in section 2.4 was used as a prototype for a filter that, when operating on Gaussian random noise, produces a spectrum similar to that found in the interior of a car. Listening tests confirm that noise from this process resembles that heard in a car interior under steady-state conditions. Fig8.11 shows a spectral plot of the car interior simulator, which can be compared to fig2.10 - the actual average of car interior spectra:

Figure 8.11: Frequency spectrum obtained from vehicle interior noise simulator.
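A minimal sketch of the generation principle - Gaussian noise shaped by a low-pass filter - is given below. The one-pole filter here is only a stand-in for the actual prototype filter fitted to the averaged spectra of section 2.4, and all names are illustrative.

```c
#include <math.h>
#include <stdlib.h>

/* One Gaussian sample via the Box-Muller transform. */
static double gaussian(void)
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

/* Shape white Gaussian noise with a one-pole low-pass filter
   (0 < pole < 1); a stand-in for the prototype spectral filter. */
void simulate_interior_noise(double *out, size_t n, double pole)
{
    double y = 0.0;
    for (size_t i = 0; i < n; i++) {
        y = pole * y + (1.0 - pole) * gaussian();
        out[i] = y;
    }
}
```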

A digital tape recorder was used to record various samples of noise generated from a 4-cylinder petrol-engined vehicle. Recordings were processed to remove the spectral weighting introduced by the microphone frequency response. A filter was derived for this purpose, based upon the inverse frequency response of the microphone (as given in the microphone specification document). This filter response is shown in fig8.12:

Figure 8.12: Spectral correction for microphone frequency response.

The recording of engine noise, shown in fig8.13, has been analysed in order to determine the shape of the noise waveform produced by the engine.

Figure 8.13: Waveform from engine compartment test recording.

The analysis began by determining the fundamental period of the noise, from examination of the auto-correlation function of the recording, as shown in fig8.14:

Figure 8.14: Correlogram of engine compartment recording.

The auto-correlation results indicate that periodicity predominantly exists within the engine recording at a lag of 57 samples, corresponding to a frequency of 70Hz. The repetitive waveform occurring at this frequency was analysed further by adding together all samples within the recording at multiples of this lag, to accentuate the periodic features and reduce the effects of uncorrelated features. The resulting enhanced waveform is shown in fig8.15:
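The two analysis steps just described can be sketched as follows (function names are ours; the 57-sample lag is the value found above):

```c
#include <stddef.h>

/* Unnormalised auto-correlation at a given lag; the fundamental
   period is the lag of the dominant peak (57 samples above). */
double autocorr(const double *x, size_t n, size_t lag)
{
    double sum = 0.0;
    for (size_t i = 0; i + lag < n; i++)
        sum += x[i] * x[i + lag];
    return sum;
}

/* Average the recording over successive periods of the found lag,
   accentuating the periodic features and reducing the effect of
   uncorrelated features. `period` must hold `lag` samples. */
void average_period(const double *x, size_t n, size_t lag, double *period)
{
    size_t reps = n / lag;
    for (size_t i = 0; i < lag; i++) {
        double sum = 0.0;
        for (size_t r = 0; r < reps; r++)
            sum += x[r * lag + i];
        period[i] = sum / (double)reps;
    }
}
```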

In order to produce a simulation of vehicle engine noise, a model was constructed of the waveform periods (three of which are shown in fig8.15), as plotted in fig8.16:

Figure 8.16: Simulation model of engine noise waveform.

The simulation model consists of a basic path having node points (represented in the figure by error bars) shifted randomly within the range found to occur in the analysed test recording. Shaded areas in fig8.16 represent further ranges of random variation from the nominal path. Thus the simulation generates a simplified engine waveform shape, having a similar sound to the original. Fig8.17a shows the original analysed engine waveform, which can be compared to the simulated waveform shown in fig8.17b. Fig8.18 compares the auto-correlation of the real engine noise and that of the simulated noise, showing a close degree of similarity. Time domain scaling allows the simulator to produce waveforms of different periods by specifying the rotational speed to be simulated, in revolutions per minute.

Figure 8.17: Waveforms of a) actual and b) simulated vehicle engine noise.

Figure 8.18: Comparison of simulated and real engine noise correlograms.

The vehicle interior noise simulator mixes a speech waveform with variable amounts of interior and engine noise, resulting in a simulation of the corrupted speech apparent to a listener within a noisy car interior. If enhanced speech, rather than raw speech, is mixed into the simulation, then the effectiveness of the applied enhancement may be determined.

8.5.2 Siren noise simulations

Siren noise analysis (see section 2.4.4) has revealed how siren frequency alters with respect to time for each of the three common sirens (two-tone, wailer and yelper). These are plotted in section A4.2, which also gives the siren sound generating equations.

The analysis was performed through inspection of spectrograms and short-term correlograms, with equations found empirically to fit the shape of the frequency curves.
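The generation principle can be illustrated as below for a "wailer"-style siren, as a tone whose instantaneous frequency sweeps slowly between two limits. The actual empirically fitted frequency-curve equations for all three sirens are those of section A4.2; the sweep shape, limits and rate below are placeholders only.

```c
#include <math.h>
#include <stddef.h>

/* Illustrative wailer: instantaneous frequency follows a raised
   sinusoid between f_lo and f_hi; all parameters are placeholders,
   not the fitted values of section A4.2. */
void wailer(double *out, size_t n, double fs)
{
    const double PI = 3.141592653589793;
    double f_lo = 600.0, f_hi = 1200.0; /* placeholder limits, Hz */
    double sweep_hz = 0.5;              /* placeholder sweep rate */
    double phase = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = (double)i / fs;
        double f = f_lo + 0.5 * (f_hi - f_lo)
                        * (1.0 - cos(2.0 * PI * sweep_hz * t));
        phase += 2.0 * PI * f / fs;     /* integrate frequency */
        out[i] = sin(phase);
    }
}
```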

Siren noise simulations may be added to vehicle interior and engine noise to create realistic acoustic background noise conditions, and have also been used in the speech enhancement analysis procedures as entries in the noise analysis codebook. The noise analysis process measures the degree of fit between acoustic background noise frames and each of the sirens in the noise codebook to determine if such a siren sound is present.

The simulated sirens sound similar to recorded sirens, but allow precise control over amplitude level and time alignment.

8.5.3 Simplified interior and siren noise simulations

Whilst the interior and engine noise simulations of section 8.5.1 produce realistic vehicle noise, based upon actual vehicle noise recordings (section 2.4.4), there exists a need for a simpler sound. A simplified interior noise simulation is required to enable comparison of the speech enhancement techniques described and tested within this thesis with techniques to be developed by other authors.

The simplified vehicle interior noise simulation is obtained by filtering white noise. The filter is a 20th order Blackman FIR low-pass filter with a design cutoff frequency of 0.001Hz. The frequency response of the filter is shown in fig8.19:

Figure 8.19: Frequency response of simplified vehicle interior noise.
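A windowed-sinc design of this kind can be sketched as follows. How the quoted design cutoff figure maps onto the normalized cutoff fc is not restated here, so fc is left as a free parameter; the window coefficients are the standard Blackman values.

```c
#include <math.h>

/* 20th-order low-pass FIR via the windowed-sinc method with a
   Blackman window. fc is the cutoff as a fraction of the sampling
   rate (an assumption; the thesis's design figure is not mapped
   here). h must hold 21 coefficients. */
void blackman_lowpass(double h[21], double fc)
{
    const double PI = 3.141592653589793;
    const int M = 20;                    /* filter order */
    for (int i = 0; i <= M; i++) {
        double m = i - M / 2.0;          /* centre the sinc */
        double sinc = (m == 0.0) ? 2.0 * fc
                                 : sin(2.0 * PI * fc * m) / (PI * m);
        double w = 0.42 - 0.5 * cos(2.0 * PI * i / M)
                        + 0.08 * cos(4.0 * PI * i / M);
        h[i] = sinc * w;
    }
}
```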

Simplified siren noise is also necessary, for other reasons. Due to the frequency movement of siren noise, and the frequency changes in formants, it cannot be guaranteed that any instance of speech plus siren noise actually includes periods when formants are masked by the siren frequency. Thus a fixed-frequency tone has been developed which coincides with the most likely F1 location of around 280Hz (see fig2.2 in section 2.2.1).

To slightly widen the frequency span of the tone masking F1, a set of 50 sinusoidal frequencies was combined in a range between 280 and 280.4Hz. The set of tones contributed a beat frequency small enough that no beating effect was evident during its use to mask single words (section 9.6.2), each lasting little over one second in duration. Individual listener responses revealed that the broader tone was found much less annoying by those participating in test procedures than a single tone, while the masking effect was judged subjectively to be greater (i.e. a pure tone would have to be louder, and thus more annoying, to achieve the same degree of masking).
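The broadened masking tone can be generated directly as below. Even spacing of the 50 component frequencies is an assumption, as the text does not state the spacing used.

```c
#include <math.h>
#include <stddef.h>

/* Broadened masking tone: 50 sinusoids spanning 280 to 280.4 Hz
   (assumed evenly spaced), summed and normalised. */
void masking_tone(double *out, size_t n, double fs)
{
    const double PI = 3.141592653589793;
    for (size_t i = 0; i < n; i++) {
        double t = (double)i / fs;
        double sum = 0.0;
        for (int k = 0; k < 50; k++) {
            double f = 280.0 + 0.4 * k / 49.0; /* 280 .. 280.4 Hz */
            sum += sin(2.0 * PI * f * t);
        }
        out[i] = sum / 50.0;
    }
}
```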

9 Testing

9.1 Introduction

Chapter 8 discussed the development and testing conducted to define and refine the systems designed for use within a speech enhancing CELP codec. In this chapter, test results are given that firstly demonstrate that LSP-based processing can enhance speech, secondly investigate the enhancement possible through formant sharpening or broadening, and finally characterize the performance of the speech enhancing system when integrated with a commercial CELP codec.

9.2 Test methods

Many test procedures using human listeners have been developed for speech system characterization by authors and standards bodies, some of which are listed in section 9.2.3. The proliferation of such tests reflects the number of assumptions, independent variables, and considerations inherent in the testing process. Here, only intelligibility tests are considered, as opposed to quality tests. Some of the more important factors are discussed below:

9.2.1 Test material

Speech intelligibility may be measured through recognition rates for phonemes, syllables, words, phrases, sentences, meaning, or any other grouping. In general, the smaller the unit tested, the more information is provided on the effect that the enhancement process has on individual parts of speech. However, no reliable method has been developed of extrapolating from, for example, the results of a phoneme test to determine effectiveness on sentence recognition (appendix 5 contains more information on aspects of intelligibility).

Test material may be familiar or unfamiliar to the listener. The latter provides a truer test of recognition rate, whilst the former provides a better indication of the effectiveness of a system to its users. Accent, phraseology and enunciation are also important.

Context plays a large part in the test results (see section A5.1.2), and must thus be accounted for when planning such a test.

9.2.2 Test conditions

Testing should ideally occur in an environment free from distractions and extraneous noise, and can be interactive, where a response is required to each question before the next is presented, or non-interactive, where the test continues regardless of the listener's responses.

The listener can be presented with a choice of words in advance, or simply asked to identify an unknown word.

What is clear is that such tests do not replicate realistic conditions, or allow predictions of system performance in realistic conditions.

9.2.3 Standard tests

A survey of appropriate literature reveals the following set of standard procedures:

1 DRT, diagnostic rhyme test (ANSI S2.3-1989) - asking listeners to distinguish between two words rhyming by initial, such as {freak, leak} [11][104][3][125]
2 MRT, modified rhyme test (ANSI S2.3-1989) - asking listeners to select one of six words, half differing by initial and half by final, such as {cap, tap, rap, cat, tan, rat} [104][3]
3 Phonetically balanced word lists (ANSI S2.3-1989) - presenting listeners with 50 sentences of 20 words each, and asking them to write down the words they hear [104][3]
4 Diagnostic medial consonant test [104]
5 Diagnostic alliteration test [104]
6 ICAO spelling alphabet test [104]
7 2 alternative forced choice - a general test category that includes the DRT procedure [25]
8 6 alternative rhyme test - a general test category that includes the MRT procedure [45]
9 4 alternative auditory feature test - asking listeners to select one of 4 words, chosen to highlight the intelligibility of the given auditory feature [11]
10 CVC, consonant-vowel-consonant test - a test of a vowel syllable sandwiched between two identical consonants, with the recognition of the vowel being the listener's task; examples {T-A-T}, {B-O-B} [2][123][112]
11 General sentence test - similar to the phonetically balanced word list test, but using self-selected sentences that may be more realistic in content [112]
12 General word test - asking listeners to write down each of a set (usually of 100) spoken words, possibly containing realistic words [115]

These test methods are briefly compared below:

9.2.3.1 Standard procedures

Every test shown in section 9.2.3 is recognised as a standard test procedure by the academic and industrial communities. Use of reproducible and standard speech libraries such as TIMIT and defined test settings is possible in every case. Such care is required for reproduction and comparison of the results by other authors.

9.2.3.2 Realism

Sentence testing will naturally be more realistic than phonetic testing. Thus a realism ranking would place test 11 first, followed by tests 12, 9, 7 and 8, and then 3, 4 and 5. Tests 1, 2, 6 and 10 are the most unrealistic.

9.2.3.3 Information content

The outcome of certain tests may be analysed to provide further insight into the ability of the speech enhancement system to enhance different categories of speech under various conditions. In general, tests 1, 2, 7, 8, 9 and 10 provide more opportunity to study, and perhaps further refine, an enhancement system.

9.3 LSP enhancement

9.3.1 Testing requirements

When this research first proposed that line spectral pair alteration might be used to alter speech features usefully, and that this could lead to speech enhancement, there was only circumstantial evidence to support the proposal: authors had noted the relationship of LSPs to spectral features (section 5.3), and spectral modification had been shown to enhance speech (section 2.2.4). However, it had yet to be demonstrated that LSP alteration could produce specific spectral changes, as shown in section 5.5, or that such alteration could enhance speech intelligibility.

A test was devised in order to demonstrate that speech enhancement could be accomplished through LSP adjustment.

9.3.2 Type of test

The LSP modification processes adjust either the position, or the amplitude and bandwidth of spectral peaks. Thus vowels, the speech features with the most distinctive spectral peaks or formants, were chosen to demonstrate enhancement. The C-V-C class of tests (number 10 in section 9.2.3) was chosen.

The test involved 15 listeners drawn from the general University population, each of whom reported no hearing abnormalities, and who were between 20 and 35 years old. Listeners had one of eight mother tongues (6 English, 3 Mandarin, 1 German, 1 Dutch, 1 Korean, 1 Hakka, 1 Cantonese and 1 Spanish).

A list of framed vowel syllables, mixed with background noise, was presented in turn to each of the listeners, who were asked to identify them. The vowel syllable list included non-enhanced, formant shifted and formant broadened instances, randomly distributed. Listeners were presented with a list of framing syllables and asked to fill in the vowel sound they heard between each framing set. Following the test, the responses were compared to the type of enhancement to obtain recognition rates.

9.3.3 Design of test

9.3.3.1 Speech

Recordings were made of several vowels and consonants and the most clear-sounding sections sampled to computer. LSP adjustments were made, using the CELP simulator of section 8.2, to copies of two of the vowels (/a/ and /o/). The degree and type of adjustment were selected manually, and chosen by ear to provide the best degree of enhancement in the given noise: formant widening was set at a factor of -0.7 (a value of around 300%, shown in section 9.5.2 to provide a good degree of enhancement), and formant shifting was set to a frequency increase of 1.5 (using a non-Bark based shift as described in section 5.6.1.2).

The enhanced and non-enhanced vowels were all normalized to be of equal amplitude, as were the framing syllables.

A pair of identical consonants was used as framing syllables, selected from a set of three {/d/, /m/, /n/}, each of which was not enhanced. The vowels were positioned between the framing consonants to form a ‘phrase’ such as “m-a-m” or “d-o-d”.

There were thus 18 possible phrases, having one of:

• 3 types of framing consonant {/d/, /m/, /n/}
• 3 types of enhancement {non-enhanced, formant widened, formant shifted}
• 2 types of vowel {/a/ or /o/}

The framing consonants were used to help listeners to identify syllable onset and ending in high levels of background noise, and to prepare listeners for the vowel sound.

Section 2.2.4 commented on the complex normalization functions chosen by certain authors which can obscure results. In this case amplitude normalization was used. Informal tests performed subsequently on individuals, and using the same method, revealed that not normalizing the vowels resulted in better enhancement performance, and that normalization by sample power slightly reduced the enhancement performance.

In a digital system, volume tends to be controlled by an adjustment in signal amplitude, whereas a listener experiences psychoacoustic effects (section A1.3) related to sound pressure level, which is dependent upon loudspeaker power. Thus justification exists for psychoacoustic, power or amplitude based normalization; but as CELP coders adjust gain by changes in amplitude, this method of normalization was advocated for all tests.

9.3.3.2 Noise

Simulated vehicle interior noise was used in the tests (section 8.5.1) with zero engine or siren noise components, and was mixed with the speech at one of four different relative amplitude levels before presentation to the listener.

A preliminary investigation using three listeners, conducted before the main test, found that speech mixed with the simulated noise at a relative amplitude of between 1.1 and 1.4 times that of the speech resulted in the listener hearing just-intelligible speech.

The just-intelligible speech condition was chosen to allow intelligibility improvements to be measurable. If speech is 100% intelligible after enhancement, this saturation prevents the degree of improvement caused by enhancement from being measured. Conversely, if speech is 0% intelligible prior to enhancement but intelligible after enhancement, the degree of improvement may be greater than the value of the resulting intelligibility - the true degree is thus not measurable. These arguments provided upper and lower intelligibility requirements for the tests, which the relative amplitude settings of {1.1, 1.2, 1.3 or 1.4} satisfied for each of the listeners.

The noise onset was timed to begin 1 second prior to the initial framing consonant, and end 1 second after the final consonant. This 1 second duration was long enough to negate any effects of pre- and post-stimulatory masking [70][33][68].
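The mixing arrangement just described can be sketched as below, with the noise scaled by the relative amplitude and padded either side of the speech (1 second each side in the test; names are ours).

```c
#include <stddef.h>

/* Mix a phrase with noise at a given relative amplitude, with the
   noise starting `pad` samples before the speech and continuing
   `pad` samples after it. The output holds n_speech + 2*pad
   samples; `noise` must supply at least that many. */
void mix_with_noise(const double *speech, size_t n_speech,
                    const double *noise, double rel_amp,
                    size_t pad, double *out)
{
    size_t total = n_speech + 2 * pad;
    for (size_t i = 0; i < total; i++) {
        double s = 0.0;
        if (i >= pad && i < pad + n_speech)
            s = speech[i - pad];       /* speech active region */
        out[i] = s + rel_amp * noise[i];
    }
}
```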

The simulated vehicle interior noise spectrum has a large low-frequency bias (section 8.5.1) and tails off towards high frequencies; hence the LSP shift parameter was chosen to shift formants upward in frequency to improve the formant-to-noise ratio on average (see section 4.4), as the chosen degree of limited shifting does not significantly alter formant amplitude, but noise power is lower at higher frequencies.

9.3.3.3 Test organisation

The 18 speech phrases (section 9.3.3.1), combined with the 4 relative noise amplitudes (section 9.3.3.2), yielded 72 unique test combinations for each listener. To account for possible learning effects, each listener was given a set of ten example questions before beginning the test, and a calibration set of ten further questions. These ten calibration questions were a repeat of the final ten questions presented in the test.

The mixed audio phrases and the corresponding listener response list were formulated randomly by a MATLAB program: the response sheet was printed out, and the audio list recorded to DAT.

Each listener was seated in an anechoic chamber equipped with a DAT player and active loudspeakers, and given the printed response sheet. After being given examples, listeners were left alone to fill in the response sheet as they listened to the audio recording. Repeated questions were not allowed, and listeners were asked to write either ‘a’, ‘o’, or ‘x’ for don’t know. The non-interactive nature of the test forced listeners to write down their initial response, and did not allow thinking time or repeated questions.

9.3.4 Test results

The raw test results are given in section A6.1, along with some of the statistics obtained from these. To summarise, one listener exhibited an intelligibility reduction for both enhancements, and one listener exhibited an intelligibility reduction for formant shifting. The other 27 conditions showed improvements in intelligibility.

On average, 52% of vowels were correctly identified. With no enhancement, intelligibility was 44.4%, and improved to 54% and 59% for shifting and widening formants. From this it can be seen that these enhancements have improved the vowel recognition on average by factors of 1.26 and 1.37 respectively.

A more detailed statistical analysis, performed over the 1148 sample questions to determine the mean improvement and its standard deviation, found that, to a 95% confidence level (within two standard errors), formant shifting improved intelligibility by a factor of more than 1.15 and formant widening by a factor of more than 1.21.

9.4 Selective amplification

Selective amplification was initially investigated using the CELP simulator of section 8.2 which provided parameters describing the current speech frame, and allowed adjustment of the amplitude of each reconstructed frame in order to perform the enhancement.

Unlike the speech classification system finally adopted in section 8.3, the speech classification here was performed solely using the CELP gain and LSP parameters (and not using the pitch strength value). In this investigation a full classifier was not required because LSP adjustment was not occurring. Thus only the parts of speech for which selective amplification is necessary required detection, and this detection was performed using LSP analysis, and an examination of the gain value to rule out non-speech.

The enhancement mechanism was that of the sloped attack and decay scheme described in section 7.4, which was applied to a variety of speech recordings from the TIMIT database [118]. The enhancements resulted in an obvious increase in the audibility of fricative speech regions such as the ‘ch’ in church or the ‘sh’ in ship. This demonstrable intelligibility improvement was in contrast to the subtle enhancement through LSP adjustment, which required testing before a decision on whether it was effective or not could be made.
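The sloped attack and decay idea can be sketched as below: the gain ramps up to the boost factor, holds, and ramps back down, rather than switching abruptly at frame edges. The scheme itself is that of section 7.4; the slope lengths and boost factor here are illustrative, and the sketch assumes attack + decay does not exceed the frame length.

```c
#include <stddef.h>

/* Apply selective amplification with a linear attack and decay.
   boost > 1 raises the frame; attack and decay are slope lengths
   in samples (assumed attack + decay <= n). */
void amplify_frame(double *x, size_t n, double boost,
                   size_t attack, size_t decay)
{
    for (size_t i = 0; i < n; i++) {
        double g = boost;
        if (i < attack)                 /* ramp up from unity   */
            g = 1.0 + (boost - 1.0) * (double)i / (double)attack;
        else if (i >= n - decay)        /* ramp back down       */
            g = 1.0 + (boost - 1.0) * (double)(n - 1 - i) / (double)decay;
        x[i] *= g;
    }
}
```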

Listening tests involving selective amplification in isolation were considered unnecessary because:

• the speech changes are obvious in nature, visible as waveform alterations
• other authors have demonstrated clear enhancement due to phoneme normalization (although achieved using different methods, see section 2.2.1)
• the effectiveness of selective amplification depends upon the effectiveness of the speech classification scheme, which has been tested objectively in section 8.3
• the testing of a combined final system (section 9.6) was designed to indicate selective amplification effectiveness

Fig4.1 in section 4.2 demonstrated the effects of selective amplification on unvoiced or fricative parts of a speech recording.

9.5 Formant sharpening and broadening

Whilst the extreme limits of formant shifting can be established by considering the degree of formant shift that can be accommodated in speech, before that speech ceases to resemble speech (section 7.4), there is no such limit that can be applied to formant broadening or sharpening.

The broadening and sharpening of formants adjusts both the formant amplitude and bandwidth and thus both loudness and frequency effects are involved, which are difficult to quantify individually.

Figure 9.1: a) original power spectrum, and b) power spectrum resulting from the processing of the original speech by a standard CELP adaptive postfilter.

Figure 9.2: a) original power spectrum, and b) power spectrum resulting from the further narrowing of the three most closely spaced pairs of line spectral frequencies.

An example speech spectrum, and the spectrum resulting from the filtering induced by a standard adaptive CELP postfilter, are shown in fig9.1. The effects of this can be compared with those shown in fig9.2, obtained through LSP processing. In both cases the three main spectral peaks exhibit little change in absolute amplitude (although the lowest of the three, possibly corresponding to F1, exhibits a small amplitude reduction in both cases), but the amplitude of the spectrum in the ‘valleys’ between formants has dropped. The effect is to increase the proportion of the energy of these speech frames devoted to the formant frequency regions.

A logical degree of LSP sharpening would be that which produced the most similar effect to the CELP postfilter, found by inspection to be A=0.2 (section 5.6.1.1), and corresponding to a narrowing in LSP separation of 75% (section 7.4) for the three most prominent formants.

Despite the precedent for the degree of LSP sharpening given by the CELP postfilter, the appropriate degree of LSP broadening, and even the question of whether formants should be broadened or sharpened in noise, remained unanswered. For this, listening tests were required.

9.5.1 Formant shaping tests

In order to test the effectiveness of formant sharpening and broadening, a test was constructed, based upon the C-V-C method of section 9.3.

In this case, however, the tests consisted of fourteen listening sessions using six listeners and were conducted directly by computer, with the listeners in a quiet environment, wearing headphones.

Copies of the /a/ and /o/ vowels of the previous C-V-C test were subject to 11 degrees of formant widening and narrowing as shown in table 9.1:

     1    2    3    4    5    6    7    8    9    10   11
A    2    1.7  1.4  1.1  0.8  0.5  0.2  -0.1 -0.4 -0.7 -1
%    400  270  240  210  180  150  120  90   60   30   0

Table 9.1: LSP scaling factors, and the alteration in LSP separation resulting from these.

The value of A relates to the LSP gap scaling factor of eqns5.11 and 5.12 in section 5.6.1.1, whilst the % figure indicates the LSP separation after processing as a percentage of the original separation, running from four times as wide (chosen as the value at which distortion becomes annoying) to no line separation.

These eleven instances of scaling were applied to two vowels, each of which was framed between a pair of /m/ consonants before being mixed with a single level of simulated vehicle interior noise and presented to each listener.

In order to ensure that a single level of interior noise could be used for each listener, with no possibility of all-incorrect or all-correct responses, the test was conducted interactively, and a series of calibration questions ensured that listeners correctly identified between 50% and 70% of vowels. If their responses lay outside this range, or if listeners heard almost every vowel as being identical (a common outcome, detected as over 80% identical responses to the randomly ordered question set), then the level of added noise was adjusted and the calibration procedure restarted.

Once calibration was complete, entailing between 20 and 50 questions per listener, the main test of eleven enhancement levels applied to two vowels (22 question phrases) began. The 22 phrases were each repeated three times and presented randomly to each listener, who was expected to respond by pressing the ‘a’, ‘o’ or ‘x’ keys to each question. The MATLAB program waited for a response to each question before continuing. Lasting around 15 minutes per listener, the test concluded with the program calculating recognition rates for each degree of LSP enhancement.

9.5.2 Test results

Test results, tabulated in section A6.2, are plotted in fig9.3, in which the average intelligibility across listeners (with the percentage of correct responses corrected for guesswork as shown in eqn9.1, section 9.6.1) is plotted against the degree of LSP narrowing.

Figure 9.3: Percentage of correctly identified vowels for the given percentage change in the separation of the three closest line spectral pairs.

Inspection of the results shows an average intelligibility of 52% for 120% widening. Intelligibility tails off to 21% with complete narrowing, and rises to 71% when line separation is widened by a factor of four. There appears to be a slight dip in the results for a widening of 210%, and the improvement in intelligibility due to widening tails off for the 270% and 400% results from the maximum recognition of 76% at 240% widening.

It appears that a good compromise widening of 150% to 180% produces an improvement in recognition over the unaltered rate, to around 62%. The region where distortion due to processing becomes noticeable is from about 150% onwards: at 180% widening, distortion is not great, and certainly would not be noticed in appropriate levels of acoustic background noise (i.e. those levels of noise occurring when enhancement must be triggered), whereas distortion at 240% widening may be.

9.6 Speech enhancement system testing

The speech enhancement algorithms, designed as presented in this thesis and operated with parameters derived from the tests described in chapters 8 and 9, were coded in ‘C’. The ‘C’ source code of the TETRA coder was modified by including the speech enhancement code. This modified commercial CELP system is described more fully in chapter 10.

In order to quantify the operation of the entire speech enhancement system, rather than the tests on component parts described so far in chapters 8 and 9, a speech intelligibility test was designed based upon this system.

The test chosen was the diagnostic rhyme test (DRT) defined in ANSI standard S2.3-1989 [3]. Such a test is a well recognised standard test, and is known to be repeatable. The results from such a test should provide a finer characterization of the system and be more relevant to reality than the C-V-C tests conducted previously. Each enhancement has been shown to work in isolation through other testing, but this DRT procedure will detail the integrated system performance.

9.6.1 The DRT test

ANSI S2.3-1989 is intended “for use in measuring the intelligibility of English speech...”. It describes a number of trained speakers reading a list of words, the DRT list, which is recorded and later replayed to a number of audiometrically normal [3] listeners.

Listeners must be trained prior to testing: in other words, they should be familiar with the test material, and should have English as their native language. Each word is replayed to the listener who selects which word he has heard from two alternatives, rhyming by initial consonant. There must be no non-auditory cues to the sound, but the alternative words must be presented prior to hearing the sound.

Listeners must be trained to a plateau with respect to learning effects before the test begins, and ANSI S2.3-1989 sets out a considerable number of other constraints to the test procedure.

Once results are collected, the effects of guesswork and chance must be accounted for by a correction to the results:

Ra = R - W/(n - 1)    (9.1)

where Ra is the number of correct items adjusted for chance/guessing, R is the total number of correct answers, W is the number of incorrect answers and n is the number of alternatives per question. Once such a correction has been made, Ra is considered to be a measure of the intelligibility under that condition.

The DRT questions comprise 96 alternatives (and thus 192 separate words) which can be divided into six categories [125] as shown in tables A6.3 and A6.4 in appendix 6. On completion, the results for each listener can be expressed as a percentage correct in each category for control (unenhanced) speech and enhanced speech. For the DRT test, with just two alternatives per question, the intelligibility can thus be calculated in each category by doubling the percentage correct and subtracting 100.
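The two-alternative case can be worked through directly: with n = 2, eqn9.1 gives Ra = R - W, which expressed as a percentage of the questions asked is twice the percentage correct minus 100, as stated above. A minimal sketch:

```c
/* Chance correction of eqn 9.1 specialised to the two-alternative
   DRT: with n = 2, Ra = R - W; as a percentage of the N questions
   asked this equals 2 * (percent correct) - 100. */
double drt_intelligibility(int correct, int total)
{
    int wrong = total - correct;
    double ra = correct - (double)wrong / (2 - 1); /* eqn 9.1, n = 2 */
    return 100.0 * ra / total;                     /* percentage     */
}
```

For example, 75% correct yields an intelligibility of 50%, since half of the apparent successes at chance level carry no information.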

9.6.2 Test arrangement

The DRT word lists were supplied spoken, on DAT tape, by Simoco International Ltd, where they had been recorded in accordance with ANSI recommendations. These word lists were transferred to computer. Each of the 192 words was copied, and the copies enhanced by the modified TETRA codec, whilst the original words were processed by a non-enhancing TETRA codec. Thus both sets of words had passed through the commercial CELP coder and decoder, with one set having been subject to possible LSP and gain value alterations.

A ‘C’ program was written to conduct the DRT test, and was run in an anechoic chamber on a portable computer. Listeners were seated in front of the computer screen, which presented introductory notes, explanations, and a set of example questions before beginning anti-learning effect training and calibration. Listeners were given a hand-held button box upon which were two distinctive buttons to enable them to choose the left or right word. Each test question began with two large words appearing on the computer screen (one to the left and one to the right) prior to the listener hearing the question.

The portable computer contained files of each DRT word in both enhanced and non-enhanced form. The entire set of DRT words was presented four times: once each for enhanced and non-enhanced speech, repeated for simplified vehicle interior noise and simplified siren noise. The order in which each word, or enhancement condition, was presented was chosen randomly, but with the vehicle interior noise test presented before the siren noise test.

For each word and condition in the computer's internal random list, the appropriate word file was retrieved and mixed with the appropriate noise type. The recordings of each type of noise were 1.5 seconds long and the DRT words varied in duration, with the maximum being below 0.75 seconds. The speech was positioned with 75% of the excess noise duration falling before the speech start (in other words, there was always at least 0.5 seconds of noise prior to the speech onset - sufficient to negate temporal psychoacoustic effects [33][70]).

There was no ‘don’t know’ condition for the listeners to report: they were asked to provide their best estimate of what they heard. A response was required before the next question was presented, and the interactive nature of the test enabled feedback: every 20 questions, listeners were told the percentage that they correctly identified.

A calibration procedure before the beginning of the test was used to adjust the relative levels of added noise with respect to the speech, to ensure that correct responses fell within the relatively wide range of 50% to 80%. As the test progressed, if less than 50% or more than 80% of the past 20 questions were correctly answered then the gain was adjusted. For both this and the calibration procedure, 20% gain adjustment steps were used.

With two enhancement conditions, two types of noise and 192 words, each listener was subject to 768 questions plus around 60 anti-learning effect and calibration questions. With each word requiring between 2.5 and 3 seconds to listen to and answer, the test progressed for between 32 and 48 minutes for each listener.

9.6.3 Enhancement

The nature of the DRT test placed certain constraints upon the enhancement system, although the intention was to test the performance of the system realistically: in as full and as unconstrained a manner as possible.

The speech enhancing TETRA codec was written in ‘C’, whereas the commercial use of the TETRA codec was based around DSP code, running on DSP systems. Although TETRA is a relatively efficient implementation, the compiled code cannot be run at anything even approaching real time on computers, such as the chosen portable machine, that are capable of being transported to an anechoic chamber. This necessitated the TETRA processing being conducted off-line, with copies of the DRT words being processed by both the standard TETRA codec and the speech enhancing TETRA codec, and stored to hard disc.

The speech enhancement software chooses the enhancement degree and type by comparing speech and noise, and so the time-aligned noise file was input to the enhancing TETRA codec with each DRT word to be enhanced. There were thus three stored copies of the DRT words: the TETRA coded but unenhanced words, the enhanced words for use with simulated vehicle noise, and the enhanced words for use with simulated siren noise.

A final requirement is that each listener must find the words to be partially unintelligible. If words are fully intelligible then the improvement due to enhancement cannot be calculated; if words are totally unintelligible then the degree of improvement can also not be quantified. The only appropriate means of adjusting intelligibility for different listeners is alteration of the relative amplitudes of speech and noise, and in the test program this was done by a calibration procedure, and performed again if needed during the test.

The speech enhancement system included a check that amplitude levels had not exceeded the region where further increases in amplitude cause a reduction in intelligibility, as described in section 7.3. As the absolute sound pressure level experienced by the listener was not available to the speech enhancing TETRA program performing its processing off-line, this feature was disabled. The feature would have prevented selective amplification and formant sharpening (which causes an amplitude increase) when the appropriate amplitude levels were exceeded.

9.6.4 Test results

Twenty members of the general University population, aged between 19 and 55, were selected for testing, with each listener reporting no hearing abnormalities. Listeners were paid for their participation in the test and were unaware of the exact test objectives.

Sixteen of the listeners were native English speakers, three others had a high standard of English, and one listener was native Thai. The results from the latter do not differ significantly from those of the native English speakers.

DRT words from only one speaker were used in the test, selected prior to the experiment (through a subjective comparison) as the clearest-sounding male speaker of the five alternatives supplied on DAT tape by Simoco. The DRT test can be used to provide evidence of a communication system's absolute intelligibility. This feature is not required here: the required outcome is only evidence of an improvement in intelligibility between two conditions, and therefore gain alteration throughout the test and the use of only a single speaker are both acceptable (these would be disallowed if an absolute intelligibility value were required).

An initial viewing of the DRT test results indicates that enhancement processing has improved the ability of almost all the listeners to correctly recognise words in the given noise, as the percentage of correct results for enhanced words is higher than for unenhanced words.

In addition, the function of the automatic calibration procedures in the test program, to maintain results in the 50% to 80% region, has been verified in that only two average results fall outside this range.

The results, tabulated in section A6.3, also indicate that the simulated siren noise was generally less obstructive to intelligibility than the simulated vehicle interior noise, and that speech enhancement improved average recognition in the former noise by a more significant fraction than in the latter noise.

The rhyming word pairs used in the DRT test can be subdivided into six categories which describe the features that differ between the alternative words (section A6.4). Thus compiling average intelligibility for enhanced and unenhanced words in each category and dividing the former by the latter gives an intelligibility improvement measure. This measure relates the effectiveness of the speech enhancements to particular categories of speech, and is shown for each of the listeners, and on average for each noise type in table 9.2.

The average results for each class are plotted in fig9.4:

Figure 9.4: Average improvement factor in measured intelligibility rate caused by speech enhancement in each of the six speech feature classes (solid line: tonal noise; dashed line: random noise).

simulated vehicle interior noise

listener   voicing  nasality  sustention  sibilation  graveness  compactness
aislam     0.2      2.4       2.0         0.6         3.5        1.8
ben        inf      6.7       1.3         1.0         11.0       1.6
dave       6.5      7.0       1.6         0.9         3.2        1.5
daveh      3.1      7.2       1.2         1.1         4.0        1.4
derekc     2.7      4.2       1.0         1.1         3.8        1.2
frank      2.7      3.4       1.0         1.0         2.4        1.1
hardw      2.6      4.0       1.0         1.1         2.8        1.1
harp       2.5      4.0       1.0         1.0         2.7        0.9
kirk       2.7      4.6       1.1         1.1         1.8        0.9
klaus      2.6      3.6       1.0         1.1         2.0        1.1
krishna    2.6      3.3       0.9         1.2         1.7        1.0
ong        2.7      3.4       1.0         1.2         1.5        1.0
robg       2.7      3.7       1.1         1.2         1.6        1.0
robj       2.6      3.4       1.0         1.2         1.6        1.0
salousm    2.6      3.4       1.1         1.1         1.6        1.0
sandhu     2.5      3.4       1.1         1.1         1.8        1.1
thai       2.3      3.4       1.2         1.2         1.8        1.0
temple     2.2      3.7       1.3         1.2         1.9        1.1
terry      2.2      3.7       1.3         1.2         1.9        1.1
zentani    2.2      3.7       1.3         1.1         1.8        1.1
average    3.0      3.9       1.1         1.1         2.3        1.1

simplified simulated siren noise

listener   voicing  nasality  sustention  sibilation  graveness  compactness
aislam     0.5      11.0      0.8         0.8         0.6        1.1
ben        0.5      3.9       1.0         0.7         0.6        1.1
dave       0.8      3.5       1.1         1.5         0.9        1.0
daveh      0.9      3.4       1.0         1.3         1.0        1.1
derekc     1.3      5.3       0.9         1.2         1.1        1.1
frank      1.3      6.0       0.8         1.3         1.2        1.0
hardw      1.5      5.9       0.7         1.3         1.0        1.2
harp       1.8      5.9       0.7         1.5         1.0        1.2
kirk       2.1      7.7       0.7         1.8         1.2        1.2
klaus      1.9      5.6       0.8         1.8         1.2        1.2
krishna    2.1      4.7       0.8         1.7         1.0        1.1
ong        2.1      4.9       0.8         1.7         1.1        1.1
robg       2.2      4.3       0.8         1.7         0.9        1.1
robj       2.1      4.1       0.9         1.6         0.9        1.1
salousm    2.3      4.4       0.8         1.5         1.0        1.1
sandhu     2.2      4.2       0.8         1.6         1.0        1.1
thai       2.0      4.7       0.9         1.6         0.9        1.1
temple     1.9      4.5       0.9         1.5         0.9        1.1
terry      1.9      4.6       0.9         1.5         1.0        1.2
zentani    1.8      4.4       1.0         1.6         0.9        1.2
average    1.5      4.8       0.8         1.4         0.9        1.1

Table 9.2: Improvement factor in speech intelligibility in the six feature classes between results for unenhanced and enhanced words for each of the listeners.

The DRT test results demonstrate that the speech enhancement scheme does in fact improve the intelligibility of speech under the tested conditions, by an average factor of 1.9 (obtained by averaging the improvement in each class for both types of noise).

The values in the six speech classes in table 9.2 and fig9.4 relate to the improvement in intelligibility in that class due to speech enhancement. The measure in each class itself is an indication of the ability of listeners to distinguish the presence or absence of the given feature.

When the different speech features are examined more closely, the enhancements are shown to slightly reduce the intelligibility of graveness and sustention in tonal noise. If such an enhancement system were to be used for unconstrained speech, in real situations, it is likely that further classes would be added to the speech classification, perhaps called grave and sustended. Speech falling in these classes, in tonal noise, would be subject to either no processing (which, judging from these results, would improve their intelligibility over the present ‘enhanced’ condition), or would be subject to another, yet undetermined, type of enhancement. This is an example of the further optimisations possible through acting on the results analysis presented here.

To summarise, the nasality class shows the greatest intelligibility increase, followed by voicing. The voiced speech feature class indicates that a choice of words was presented to the listener, one of which was voiced, and one of which was unvoiced. The unvoiced speech is enhanced through selective amplification, and voiced speech is generally enhanced through LSP adjustment, both of which thus appear to have been successful in enhancing intelligibility.

10 Summary

10.1 Discussion of results

Results from chapter 9 have shown that the novel LSP-based enhancements that were proposed can improve the recognition of vowel sounds in simulated vehicle noise. Testing of the enhancement system when integrated into a commercial CELP codec showed that the combination of LSP and selective amplification enhancements with an automated enhancement controller can improve the intelligibility of words in vehicle-type noise.

The actual improvement noted in the test does not relate easily to subjective experience. In most cases, the listeners were not able to recognise any improvement in intelligibility of individual words despite the fact that the test results indicate that enhancement is clearly present.

A given improvement in the ability of a listener to recognise a constrained set of vowel sounds does not necessarily mean that a similar improvement would be noted for words. Similarly, a given improvement in word intelligibility does not indicate a corresponding increase in sentence intelligibility (which would describe such a system operating under more realistic conditions).

The DRT test result is expected to relate more closely to realistic conditions than the C-V-C test result, in that the word set is less constrained, and relies upon the listener recognising a larger set of phonemes. There is evidence that given improvements in speech transmission conditions, such as through enhancement, cause a larger improvement in word recognition than in syllable recognition (this can be demonstrated through an inspection of the slopes of the ‘nonsense syllables’ and ‘words in sentences’ curves of figA5.1 in section A5.1.2: prior to saturation at around 3dB SNR, the latter has a steeper slope [112]. This may be interpreted, with extreme caution as the graph is not being used entirely in context, as indicating that a given improvement in conditions yields a larger recognition score increase for words than for syllables).

The relative improvements in intelligibility for word recognition over phoneme recognition may be expected to extrapolate to sentence intelligibility, or even the ability to communicate concepts. This is a natural consequence of the signal processing occurring within the human brain utilising additional factors to improve intelligibility. These include context (in terms of subject and also in terms of grammatical rules), redundancy and repetition, all absent for syllable communication. Word recognition tests provide the brain with significant extra clues involving the sequence of phonemes comprising each word, in that certain combinations are more likely to occur, and certain of the possible combinations of phonemes found in the English language are unused.

In order to predict the speech enhancement to be expected from a system operating on unconstrained speech under conditions similar to the DRT test, and not accounting for non­ linear effects caused by signal processing in the brain, it is possible to relate the relative occurrences of speech feature classes in continuous speech to the differing degree of enhancement found for each of those classes.

Using the test data of the speech classification tests in section 8.3.2, the relative occurrences of each of the six speech feature classes were found (see section A6.6), and each proportion was multiplied by the average speech enhancement degree in that class for the vehicle interior and tonal noise DRT tests (assuming an equal mixture of both), as shown in table 10.1. The average degree of speech enhancement expected for similar signal-to-noise ratio conditions for unconstrained speech is thus 2.1, with this value expected to increase if non-linear effects are considered.

class:                voicing  nasality  sustention  sibilation  graveness  compactness
average enhancement:  2.25     4.35      0.95        1.25        1.55       1.1
relative frequency:   0.137    0.228     0.188       0.216       0.132      0.099
effect x frequency:   0.308    0.992     0.179       0.270       0.205      0.109
                                                                 total:     2.1

Table 10.1: Derivation of combined speech enhancement measure for unconstrained speech based upon effectiveness in each speech feature class multiplied by relative frequency of each class (as described in section A6.6).
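The calculation behind table 10.1 is a relative-frequency-weighted sum. A minimal sketch, in Python for illustration, with the values transcribed from the table:

# Weighted combination of per-class enhancement degrees (table 10.1).
# The values come from the table; the code layout itself is illustrative.
enhancement = {"voicing": 2.25, "nasality": 4.35, "sustention": 0.95,
               "sibilation": 1.25, "graveness": 1.55, "compactness": 1.1}
frequency = {"voicing": 0.137, "nasality": 0.228, "sustention": 0.188,
             "sibilation": 0.216, "graveness": 0.132, "compactness": 0.099}

total = sum(enhancement[c] * frequency[c] for c in enhancement)
print(f"combined enhancement measure: {total:.2f}")  # 2.1 to the precision quoted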

10.2 Runtime system

The speech enhancement system proposed in chapter 7 has been tested as described in chapters 8 and 9. Results substantiate the assumptions that selective amplification, LSP-based formant shifting, and LSP-based formant widening can be used to improve the intelligibility of speech. The speech analysis, classification and speech detection methods proposed in chapter 7 incorporated a novel LSP-based speech measure. This has been investigated and found to be capable of classifying phonemes (section 8.3.1), and to be similar in value to the well established zero-crossing rate measure while being more efficient and less prone to interference from noise (section 8.3.2).

A hearing model was constructed from published material, subjectively tested and used to define a speech intelligibility measure (section 8.4). This was used in an expert system to choose type and degree of enhancement for a series of DRT tests (section 9.6). Results indicated that significant speech intelligibility enhancement occurred under the tested conditions.

The structure of the CELP coder modified with speech enhancements is thus known to be capable of enhancing speech intelligibility. The nature of such a system is that many of the system parameters and structures could probably be further adjusted to improve performance, reduce errors or reduce complexity.

Were a speech-enhancing CELP coder to be used in realistic situations, its performance could not be known in advance. The system would be constructed as outlined in chapter 7, and used as shown in fig 10.1.

Figure 10.1: Illustration of a CELP communications system.

The intelligibility threshold of section 8.4 (the value above which the intelligibility measure must rise in order to start speech enhancement) was initially set to zero, effectively equal to the decision of the average listener model. Tests revealed that this value should be increased slightly in normal usage (section 8.4); this is effectively applying a correction between the average listener model and the experiences of real listeners.

The intelligibility threshold value can be used as a correction for non-average hearing listeners. For example, if a user experiences speech which is unintelligible to him or her, then an offset adjustment can be made to soften the enhancement criteria. For highly-trained radio users operating a speech-enhancing CELP system, the offset may safely be increased.

Such a correction would be applicable only to listeners with reduced sensitivity but otherwise normal hearing: if the hearing loss were highly variable with frequency, then the frequency response of the listener model would no longer fit the hearing frequency response of the user.

10.3 Practical considerations

The speech enhancements are designed to integrate closely with the existing structure and functions of a CELP codec (specifically, the TETRA codec [120]), and through this and an efficient implementation of algorithms, not to impose a significant complexity overhead when implemented.

The estimated number of signal processor operations per second required to implement the speech enhancement system has been calculated. The target signal processor is considered to be a generic device in which each individual arithmetic operation, comparison, and multiply-accumulate instruction executes in a single instruction cycle.

Various authors have commented on the complexity of existing CELP implementations, discussed in more detail in appendix 2. These include the following (for CELP encoder unless specified otherwise):

• 12MIPS on a DSP56001 for 9.6kbit/s VSELP [35].
• 75% of the processing power of a DSP32 for a multipulse LPC coder (approximately 9.5MFLOPS) [12].
• 100MFLOPS on a Cray-1 for standard 4.8kbit/s CELP [98].
• 1.2MFLOPS using a sparse codebook and frequency-domain codebook search method [58].
• 10.6MIPS for ITU G.728 standard 16kbit/s CELP [16].
• 14MIPS for clipped overlapping codebook CELP at 4.0 to 9.6kbit/s [50].
• 80% of a DSP32C for 6.8kbit/s CELP (approximately 10MFLOPS) [52].

The TETRA codec, around which the enhancements are specifically aimed, is a commercial coder, closely guarded by those organisations with access to it. Due to the highly competitive marketplace, those organisations refrain from publishing details of their implementations. However, a reasonable figure for an efficient CELP implementation appears to be processing power in the region of 10MIPS.

Table 10.2 lists the estimated processing requirements of the additional algorithms required to implement speech enhancement when integrated with a CELP coder. The quoted figures have been derived in appendix 7.

Function                      IPS
LPC -> spectrum (1)           2 x 73260
perceptual weighting (1)      2 x 11655
intelligibility measure       1765
formant detection (2)         14420
speech detection              500
speech classification         633
expert system                 266
formant shift (3)             1820
formant sharpen (3)           800
selective amplification (3)   33

Total (worst case):           189234

(1) performed once each for noise and speech analysis paths.
(2) not performed for non-speech or fricative frames.
(3) only one of these can be chosen for any particular frame.

Table 10.2: Estimated processing requirement in instructions per second for speech enhancement operations (from appendix 7).

Even if the calculations have been underestimated by a factor of three, which is not uncommon in such estimations, the entire processing requirement still totals less than 0.6MIPS. The speech enhancement additions to the CELP codec thus require a processing budget increase of only around 6% over a typical 10MIPS CELP implementation.
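These figures can be checked directly. A minimal sketch using the values of table 10.2; the mutually exclusive enhancement operations contribute only their maximum to the worst case:

# Worst-case instructions-per-second total from table 10.2.
always = (2 * 73260 + 2 * 11655   # LPC->spectrum and perceptual weighting,
                                  # once each for noise and speech paths
          + 1765 + 14420 + 500 + 633 + 266)
enhancement = max(1820, 800, 33)  # only one of shift/sharpen/selective amp.

worst_case = always + enhancement
print(worst_case)                              # 189234, as tabulated
print(round(worst_case * 3 / 10e6 * 100, 1))   # ~6% of a 10MIPS CELP budget,
                                               # even with a threefold margin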

11 Conclusion

Previous chapters have described the work conducted towards defining a speech enhancement system designed to modify the speech being decoded from a speech compression system. The type and degree of speech modification was designed to be dependent upon the acoustic background noise in the environment of the listener, the type of speech being decoded, and a model of an average human listener.

Investigation considered the nature of speech and the hearing process, the structure of the speech coder, and the acoustic noise types likely to interfere with the listener's understanding. The structure of the speech coder and the constraints of an adaptive system determined that the speech enhancements should reside within the speech compression decoding system, and utilise the existing speech compression analysis parameters where possible, for reasons of efficiency.

The widespread use of line spectral pairs for quantization within speech coding systems prompted this research to focus on the interpretation and alteration of such parameters, and has resulted in the definition of two novel speech enhancement schemes, and a speech classification method. Tests later explored these schemes and found that they are capable of significantly improving the intelligibility of speech. A further adaptation of an existing high distortion speech enhancement scheme led to the lower distortion but effective selective amplification enhancement.

In order to further specify likely interfering noise types and communications parameters, an example application was constructed: the speech compression system was assumed to be a CELP codec, and a target situation of a police vehicle environment was introduced. In this situation, the system operates to improve the intelligibility of the speech decoded and replayed to the vehicle occupant, when the acoustic noise levels within the vehicle become so high as to render communications difficult. This knowledge allowed the proposed enhancement algorithms to utilise raw speech parameters derived from the CELP coder, and to integrate the speech modification algorithms within the CELP decoder.

Existing speech analysis methods were combined with a novel LSP-based analysis scheme to define speech detection and classification algorithms. These operate in conjunction with a speech intelligibility detector constructed from an amalgamation of previously published spectral weighting methods, and an expert system applying hand-optimised decision rules to select and modify the speech enhancement methods used on particular frames of speech. The LSP-based analysis scheme was tested automatically using a phonetically labelled speech corpus, and found to classify at least as well as, be more robust in noise than, and be more efficient than a common standard method for deriving similar analyses.

After further successful component testing, the entire speech and noise analysis algorithm, listener model, expert decision system and speech enhancement methods were explored using standard multi-listener DRT intelligibility tests, and found to be capable of improving the intelligibility of speech which is severely masked by background acoustic noise in the environment of the listener. Enhancement applies to both phoneme and word sounds in specific examples of both narrow and wide-band noise. Further system optimization based upon the results from these tests could improve the enhancement still further.

A method has thus been demonstrated of enhancing the speech output from a standard CELP coder, with maximum integration with existing CELP components, and therefore low additional complexity and a high degree of efficiency.

The system could equally well be applied in any situation where high levels of interfering acoustic background noise are found, and speech compression algorithms are employed. Such conditions apply to most mobile radios, many public address or announcement systems, and more importantly, to mobile telephones.

The forthcoming generation of mobile telephones employs speech compression algorithms that are suitable for modification with the speech enhancement system described in this thesis. The enhancement system is inherently adaptive to its environment and is only activated when required, needing no user intervention: a prerequisite for non-technical users.

A patent application [65] has been made for the novel techniques described in this thesis, and in addition to the planned deployment in police and other public service vehicle radio systems, licensing for use in future mobile telephone products is likely.

Further advancement of the line spectral pair adjustment and measurement techniques will require investigation into continuous speech, and into the application of the methods to other areas of speech processing such as speech recognition, speaker recognition, language recognition and voice alteration.

Appendix 1 Speech and hearing

A1.1 Speech amplitude by phoneme

Speech can be broken up into individual sound units called phonemes, defined using the international phonetic alphabet (IPA) [131]. In speech, phonemes are spoken with different amplitudes, with average values as shown in table A1.1:

Phoneme     classification        relative intensity (dB)
ford        vowel                 28.3
card        vowel                 27.8
mud         vowel                 27.1
pad         vowel                 26.9
good        vowel                 26.6
head        vowel                 25.4
true        vowel                 24.9
him         vowel                 24.1
team        vowel                 23.4
roll        glide                 23.2
luck        glide                 20.0
ship        voiceless fricative   19.0
sang        nasal                 18.6
mad         nasal                 17.2
church      affricative           16.2
night       nasal                 15.6
jack        affricative           13.6
azure       voiced fricative      13.0
zoo         voiceless fricative   12.0
six         voiceless fricative   12.0
tap         voiced fricative      11.8
get         voiced plosive        11.8
kick        voiceless plosive     11.1
van         voiced fricative      10.8
that        voiced fricative      10.4
big         voiced plosive        8.5
dog         voiced plosive        8.5
peep        voiceless plosive     7.8
fog         voiceless fricative   7.0
thought     voiceless fricative   0.0

Table A1.1: Relative intensity of components of speech, from [113].

Examination of the table indicates that, without exception, vowels are spoken with more power, and that the range of intensity across all sounds, 28dB, is very large. Note also that these values are time-averaged.

A1.2 Speech formants

Speech is usually defined in terms of a pitch contour and formant frequencies [31]. Formants are resonant frequencies of the vocal tract which appear in the speech spectrum as peaks, as shown in figA1.1.

Figure A1.1: Speech spectrum showing three distinct formants, plotted against frequency (0 to 8000Hz). Calculated from linear prediction coefficient test vectors tabulated in [94].

Klatt [72], and other authors [36], have described formants as the single most important criterion in speech communication. Although many formants will be present in a typical speech spectrum, only the first three or so (named F1, F2, F3) contribute significantly to the intelligibility or quality of speech. In fact, F1 contains most of the energy, but F2 and F3, between them, contribute more to speech intelligibility [115].

The pitch contour (often called f0 - note the lower case notation) is the parameter that describes the tone of the voice (the perceived frequency), and is in effect the fundamental vocal frequency. Again, pitch frequencies contain energy but contribute little to intelligibility for English and other European languages [84].

A1.3 A-weighting and equal loudness

Human subjects do not judge differing frequency signals of equal amplitude to be equal in loudness [9][28]. FigA1.2 shows typical equal loudness contours, measured in phons, where a curve of n phons sounds equally loud to a subject as an n dBSPL tone at 1kHz.

Figure A1.2: Equal-loudness contours (constructed after figures given in [24][111][128]).

For speech and hearing purposes, voice power, background noise and other sound levels are usually measured in dBA, where the signal is A-weighted before being quantified. A-weighting, now incorporated as an ISO standard, applies to the signal a frequency weighting based on the 40-phon equal loudness contour for hearing (refer to figA1.2). Thus all frequency components in the signal are weighted so that they make a contribution to the overall figure dependent upon their perceived loudness, rather than upon their actual intensity. Although this scheme appears reasonable, it takes no account of the ability of particular frequencies to disturb speech communications to different degrees (as the importance of frequencies to speech does not match the equal-loudness contour), or of the absolute loudness of the signal (the 40 phon curve only applies to a signal of 40dBSPL at 1kHz). Other common measures are the ISO B- and C-weighting curves, based on the shapes of the 70 and 100 phon curves respectively.
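The A-weighting curve itself is not reproduced here; for reference, a sketch of the standard closed-form magnitude response (the constants below are those of the standard A-weighting definition, not values taken from this thesis):

import math

def a_weighting_db(f):
    """Standard A-weighting gain in dB at frequency f (Hz).

    The constants are those of the standard analogue A-weighting
    definition; the +2.0 term normalises the gain to 0dB at 1kHz.
    """
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2))
    return 20.0 * math.log10(ra) + 2.0

print(round(a_weighting_db(1000.0), 2))  # ~0.0: no weighting at 1kHz
print(round(a_weighting_db(100.0), 1))   # strong attenuation of low frequencies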

The curves of figA1.2 are the result of a number of factors, one of which is the filtering induced by the pinna, related to the orthotelephonic gain [32]. The frequency distribution impinging on the eardrum differs when inner-ear headphones are used as opposed to loudspeakers, as the pinna provides around 5dB of gain at 2kHz, 10dB of gain at 4kHz and 2dB of gain at 8kHz [48]. The filtering effect of the pinna below 500Hz is negligible [127].

A1.4 The Bark scale

The hearing process is often considered to derive from a bandpass-filter-like processing of the input sound into a number of critical bands [10][99][70]. Each critical band filter has a similar shape, but the bandwidth and weighting applied to each filter depend upon frequency. The amplitude weighting with respect to frequency is considered in section A1.3 as the equal-loudness response.

It still remains, however, that frequency selectivity and masking effects depend on the bandwidth of each critical band. For this purpose, table A1.2 quantifies how the relative sizes of critical bands vary with frequency.

The non-linear Bark frequency scale [102] is derived from the critical band filter bandwidths, ensuring that a unit change in Bark value is reflected by a perceived unit change in frequency effect by listeners. The Bark scale is thus a psychoacoustic frequency scale.

Critical band (Bark)   Lower cutoff frequency (Hz)
1                      300
2                      410
3                      510
4                      630
5                      770
6                      920
7                      1080
8                      1270
9                      1480
10                     1720
11                     2000
12                     2320
13                     2700
14                     3150
15                     3700

Table A1.2: Critical band scale and corresponding frequency in Hz [102].
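Where a closed form is preferred to table lookup, a widely published approximation to the Bark transformation (due to Zwicker and Terhardt; not taken from this thesis, which uses the tabulated band edges above) can be sketched as:

import math

def hz_to_bark(f):
    """Approximate Bark value for a frequency f in Hz.

    This is the standard Zwicker & Terhardt closed-form approximation,
    quoted for reference alongside table A1.2.
    """
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

for f in (300, 920, 2000, 3700):
    print(f, round(hz_to_bark(f), 2))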

Appendix 2 CELP coding

A2.1 CELP as a collection of algorithms

The CELP coder is considered here to be a collection of disparate algorithms. Each algorithm imparts certain of its characteristics to the synthetic speech produced at the output of the coder.

In part, the evolutionary nature of modern CELP algorithms has contributed to the fairly loose-fitting nature of the algorithms used. Some of the algorithms included within CELP are listed below:

1. Autocorrelation analysis (which eventually yields linear prediction coefficients).
2. Pitch (long-term) analysis, which determines the period and amplitude of pitch spikes, and the parameters to be used in the pitch (LTP) filter.
3. Linear predictive filtering to yield LPC coefficients and, after conversion, LSP parameters.
4. Long term predictive (pitch) filtering.
5. Spectral sharpening, under the guise of a perceptual weighting filter (PEWF).
6. Mean-squared-error calculation.

A2.2 CELP algorithms

Some of the many algorithms that together comprise CELP are presented below for reference:

A2.2.1 LTP filter

In the inner loop of a CELP coder, the pitch synthesis filter operates on a codevector, c, to produce a 'spiky codeword', x:

x(n) = c(n) + \beta x(n - M)   (A2.1)

\beta is the pitch scaling factor (the strength of the pitch component) and M corresponds to the pitch period. Multiple or fractional pitch representations are occasionally used to increase the (rather poor) quality of the simple pitch filter. A three-tap pitch filter would be:

x(n) = c(n) + \beta_1 x(n - M - 1) + \beta_2 x(n - M) + \beta_3 x(n - M + 1)   (A2.2)

It can be seen that the pitch (LTP) filter requires its past output values, and as M may be shorter than the length of a subframe, or possibly longer than a frame, this is an important feature of any potential realisation.
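A direct realisation of the one-tap filter of eqnA2.1 might be sketched as follows (Python for illustration; the history buffer layout is an assumption of this sketch, not a prescribed structure, and must be at least M samples long):

import numpy as np

def ltp_filter(codevector, beta, M, history):
    """One-tap LTP (pitch) synthesis, x(n) = c(n) + beta * x(n - M).

    `history` holds past filter outputs (most recent last), since the
    pitch lag M may exceed the subframe length. Illustrative sketch.
    """
    x = np.concatenate([history, np.zeros(len(codevector))])
    h = len(history)
    for n in range(len(codevector)):
        # h + n - M indexes either the history or an earlier output sample
        x[h + n] = codevector[n] + beta * x[h + n - M]
    return x[h:], x[len(codevector):]   # output subframe, updated history

out, hist = ltp_filter(np.ones(40), beta=0.8, M=55, history=np.zeros(160))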

A2.2.2 LPC filter

Below is the equation of the LPC synthesis filter, a simple all-pole IIR filter:

y(n) = x(n) + \sum_{p=1}^{P} a_p y(n - p)   (A2.3)

Output y is the synthetic subframe, input x is the filter excitation vector (the ‘spiky codeword’ from the LTP filter), a are the linear prediction coefficients and P is the filter order.
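EqnA2.3 can likewise be realised directly; a minimal sketch with zeroed filter memory (a real coder carries the memory between subframes):

import numpy as np

def lpc_synthesis(x, a):
    """All-pole LPC synthesis, y(n) = x(n) + sum_p a_p * y(n - p).

    `a` holds the P prediction coefficients a_1..a_P in the sign
    convention of eqnA2.3. Filter memory is zero here for clarity.
    """
    P = len(a)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for p in range(1, P + 1):
            if n - p >= 0:
                acc += a[p - 1] * y[n - p]
        y[n] = acc
    return y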

A2.2.3 Perceptual error weighting filter

The PEWF uses the LPC coefficients to sharpen the formant regions and attenuate the frequencies surrounding these regions [34][38]. This has the effect of giving the formant regions greater weight during the subsequent mean-squared-error calculation. As the formant regions are more relevant to human perception of speech, this is called a perceptual weighting filter.

The z-transform form of the equation illustrates the bandwidth expansion function of the filter, where \xi_2 < \xi_1 \le 1:

W(z) = \frac{1 - H(z/\xi_1)}{1 - H(z/\xi_2)}   (A2.4)

H(z) is the standard LPC synthesis filter, and the \xi's are bandwidth expansion parameters. As

H(z) = \sum_{k=1}^{P} a_k z^{-k}   (A2.5)

then

H(z/\xi) = \sum_{k=1}^{P} \xi^k a_k z^{-k}   (A2.6)

In difference equation form, this filter can then be realised as:

y[n] = x[n] + \sum_{k=1}^{P} a_k \{ \xi_2^k y[n-k] - \xi_1^k x[n-k] \}   (A2.7)

Certain CELP coders employ this filter operating on the decoded output [2] [16] [52] [95] as a speech quality enhancement.
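A sketch of eqnA2.7, precomputing the two bandwidth-expanded coefficient sets; the expansion values 0.9 and 0.6 below are illustrative, not those of any particular coder:

import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(x, a, xi1=0.9, xi2=0.6):
    """Perceptual weighting per eqnsA2.4 to A2.7 (sketch).

    `a` are a_1..a_P in the sign convention of eqnA2.3, so that the
    numerator polynomial is 1 - H(z/xi1) and the denominator is
    1 - H(z/xi2). xi1 and xi2 are illustrative values.
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(1, len(a) + 1)
    num = np.concatenate([[1.0], -(xi1 ** k) * a])  # 1 - H(z/xi1)
    den = np.concatenate([[1.0], -(xi2 ** k) * a])  # 1 - H(z/xi2)
    return lfilter(num, den, x)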

A2.2.4 Pitch extraction

Although there are many ways of deriving pitch parameters [1][17][43][52][103][133], the method described here is probably the most common. It relies on minimising the mean-squared error between the inverse LPC filtered input speech (the residual) and the value predicted using the pitch prediction formula.

If E is the mean squared error,

E = \sum_n \{ e(n) - e'(n) \}^2   (A2.8)

then

E = \sum_{n=0}^{N-1} \{ e(n) - \beta e(n - M) \}^2   (A2.9)

N is the analysis window size, usually a subframe, e is the residual and e' is the predicted residual. \beta is the pitch scaling parameter and M is the pitch delay or period. To find the optimal \beta, differentiate the expression and set to zero:

\frac{\partial E}{\partial \beta} = -2 \sum_{n=0}^{N-1} e(n - M) \{ e(n) - \beta e(n - M) \} = 0   (A2.10)

so

\beta = \frac{\sum_{n=0}^{N-1} e(n) e(n - M)}{\sum_{n=0}^{N-1} e^2(n - M)}   (A2.11)

If this is substituted back into the original equation, the value of M giving the optimum \beta is

E_{opt}(M) = \sum_{n=0}^{N-1} e^2(n) - E'_{opt}(M)   (A2.12)

As only the second part of the equation varies with respect to M, it must be maximised in order to minimise the error. Thus the following must be determined for each permissible value of M, and the value at which a maximum occurs, stored:

E'_{opt}(M) = \frac{\left\{ \sum_{n=0}^{N-1} e(n) e(n - M) \right\}^2}{\sum_{n=0}^{N-1} e^2(n - M)}   (A2.13)

An interpretation of the above process is that the pitch delay that, averaged over a whole subframe, allows the best prediction of that subframe, is chosen as the final parameter.

Once the delay has been found, the pitch scaling factor, \beta, is chosen as the optimal scaling factor averaged over the subframe using:

\beta = \frac{\sum_{n=0}^{N-1} e(n) e(n - M)}{\sum_{n=0}^{N-1} e^2(n - M)}   (A2.14)

In practice, this method of pitch extraction often produces multiples of the pitch period, and thus some method of constraining the rate of change of the pitch period is used (in actual speech, the pitch period does not vary quickly). One example is used in the G.728 coder, which constrains the rate of change of the pitch period to ±6 samples unless the relative strength of the new pitch value outside this range is 2.5 times as great as the old value [14].
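The search of eqnsA2.13 and A2.14 amounts to maximising a normalised cross-correlation over the permissible lag range. A sketch, assuming the LPC residual and sufficient residual history (at least max_lag samples) are available; the lag limits are illustrative:

import numpy as np

def ltp_search(residual, history, min_lag=20, max_lag=147):
    """Open-loop pitch search per eqnsA2.13 and A2.14 (sketch).

    `residual` is the current subframe of the LPC residual e(n);
    `history` holds preceding residual samples so that e(n - M) exists
    for all candidate lags M.
    """
    e = np.concatenate([history, residual])
    N, h = len(residual), len(history)
    best_M, best_score = min_lag, -np.inf
    for M in range(min_lag, max_lag + 1):
        delayed = e[h - M : h - M + N]          # e(n - M), n = 0..N-1
        denom = np.dot(delayed, delayed)
        if denom <= 0.0:
            continue
        score = np.dot(residual, delayed) ** 2 / denom   # E'opt(M)
        if score > best_score:
            best_M, best_score = M, score
    delayed = e[h - best_M : h - best_M + N]
    beta = np.dot(residual, delayed) / np.dot(delayed, delayed)  # eqnA2.14
    return best_M, beta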

Other methods of extracting the pitch period include [1]:

• The average magnitude difference function (AMDF), which instead of using autocorrelation, calculates the magnitude difference between the residual signal and a delayed version of itself. The delay at which a minimum in the function occurs corresponds to the pitch period.
• The cepstrum technique, which computes the inverse Fourier transform of the log power spectrum of the speech, and then searches for maximum values within the pitch range.
• The maximum likelihood technique, using statistical analysis on an assumed periodic signal corrupted by white noise.
• Time domain techniques that rely on measurement of waveform characteristics, such as the well known parallel processing or Gold-Rabiner technique.

A2.2.5 Reflection coefficient calculation

Although reflection coefficients can be converted easily to and from LPC coefficients, the most commonly used method of calculation is directly from the input speech, with LPC parameters derived from the reflection coefficients. This method is based on autocorrelation of the speech input wave.

Assuming that the speech waveform over the frame of interest can be approximated as a linear combination of the past P samples (P is the filter order) given by:

\hat{x}[n] = a_1 x[n-1] + a_2 x[n-2] + a_3 x[n-3] + \ldots + a_P x[n-P]   (A2.15)

the prediction error, e, is given by:

e[n] = x[n] - \hat{x}[n]   (A2.16)

It is possible to derive the a parameters such that they produce a minimum error over the analysis period. The least mean squared error is then given as:

E = \sum_n e^2[n] = \sum_n \left\{ x[n] - \sum_{k=1}^{P} a_k x[n-k] \right\}^2   (A2.17)

To minimise the error we then differentiate E with respect to each coefficient, and equate to zero:

\frac{\partial E}{\partial a_j} = -2 \sum_n x[n-j] \left\{ x[n] - \sum_{k=1}^{P} a_k x[n-k] \right\} = 0   (A2.18)

\sum_{k=1}^{P} a_k \sum_n x[n-j] x[n-k] = \sum_n x[n] x[n-j]   (A2.19)

where j = 1 \ldots P. There are now two methods of solving the set of equations: the covariance and autocorrelation methods. The former corresponds to splitting the speech into segments using a rectangular window and minimising the error only over each segment of length N. The latter method assumes that the given signal is stationary with finite energy and that the range of summation is infinite; thus the speech must be windowed prior to analysis. Covariance analysis does not require a soft window function and is therefore more accurate for narrow frames than autocorrelation analysis; however, it does not always result in a stable LPC filter, whereas autocorrelation analysis does. In practice the majority of speech coders use autocorrelation analysis.

Given that the following relationship exists for the infinite summation:

\sum_{n=-\infty}^{\infty} x[n-j] x[n-k] = \sum_{n=-\infty}^{\infty} x[n-j+1] x[n-k+1] = \sum_{n=-\infty}^{\infty} x[n] x[n+j-k]   (A2.20)

the equations can now be re-formulated as:

\sum_{k=1}^{P} a_k \sum_{n=-\infty}^{\infty} x[n] x[n+j-k] = \sum_{n=-\infty}^{\infty} x[n] x[n-j]   (A2.21)

but now making use of the autocorrelation function:

R(k) = \sum_{n=-\infty}^{\infty} x[n] x[n+k]   (A2.22)

the relationship can now be written in matrix form as:

\begin{bmatrix} R(0) & R(1) & R(2) & \cdots & R(P-1) \\ R(1) & R(0) & R(1) & \cdots & R(P-2) \\ R(2) & R(1) & R(0) & \cdots & R(P-3) \\ \vdots & & & \ddots & \vdots \\ R(P-1) & R(P-2) & R(P-3) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_P \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ R(3) \\ \vdots \\ R(P) \end{bmatrix}   (A2.23)

In practice, a window, usually Hamming, is applied to the input speech prior to calculating the autocorrelation functions, and the autocorrelation results are all divided by R(0) to give normalized autocorrelation coefficients, r(i).

This system can then be solved using a variety of techniques, such as the Durbin-Levinson-Itakura or Le Roux methods. The former solution is very efficient but requires complicated control [90], whereas the Le Roux method is a slightly less efficient recursive formula that is most often used:

k_{n+1} = \frac{e_{n+1}^{n}}{e_0^{n}} \quad \textrm{for } n = 0 \ldots P-1

e_0^{n+1} = e_0^{n} (1 - k_{n+1}^2)

e_i^{n+1} = e_i^{n} - k_{n+1} e_{n+1-i}^{n} \quad \textrm{for } i = n \ldots P   (A2.24)

with the initial conditions e_i^0 = R(i) for i = 0 \ldots P. The values of k derived from these relationships are the reflection coefficients (named for the model of the predictive filter that they describe: a number of lossless, joined tubular segments characterised by their backward reflected energy), sometimes referred to as partial correlation (PARCOR) coefficients.

A2.2.6 LPC coefficient calculation

The LPC coefficients are required for the LPC synthesis filter (and also for the analysis filter used prior to LTP parameter extraction), the perceptual weighting filter, and any formant emphasis required in a post-processing scheme. To convert from reflection coefficients into LPC parameters:

a_i^{(i)} = k_i, \qquad a_j^{(i)} = a_j^{(i-1)} + k_i a_{i-j}^{(i-1)} \quad \textrm{with } 1 \le j \le i-1, \textrm{ for } i = 1 \ldots P   (A2.25)

and the conversion from LPC parameters into reflection coefficients is accomplished by:

k_i = a_i^{(i)}, \qquad a_j^{(i-1)} = \frac{a_j^{(i)} - k_i a_{i-j}^{(i)}}{1 - k_i^2} \quad \textrm{with } 1 \le j \le i-1   (A2.26)

with i decreasing from P to 1, and with the initial condition a_j^{(P)} = a_j for j between 1 and P [61].

A2.2.7 LSP representation

Line spectral pairs result from a mathematical transformation of the linear prediction parameters. The LSP representation has advantages for transmission purposes because the effects of quantization are uniform across the frequency spectrum, do not cause instability, and, when LSPs are quantized, they result in lower speech degradation than LPC coefficients quantized to an equivalent degree [37][48]. LSPs can also be interpolated and scaled effectively [90].

Line spectral pairs are discussed further in appendix 3.

A2.2.8 Codebook search loop

The codebook search loop contains codeword extraction, amplification, LTP and LPC filtering, perceptual error weighting and mean-square-error calculation. Using matrix notation, the unweighted MSE measure between synthetic subframe, y, and real speech subframe, s, obtained from the jth codeword, c_j, is:

E_j = \| s - y_j \|^2   (A2.27)

Expanding y:

E_j = \| s - g H c_j \|^2   (A2.28)

where g is the gain and H is the matrix form of the combined LTP and LPC filters. If we then differentiate this with respect to the gain and set to zero to find the optimum value of g:

\frac{dE_j}{dg} = -2 s H c_j + 2 g \| H c_j \|^2 = 0   (A2.29)

Thus:

g = \frac{s H c_j}{\| H c_j \|^2}   (A2.30)

And substituting this back in to find the optimum error value:

E_j = \| s \|^2 - \frac{(s H c_j)^2}{\| H c_j \|^2}   (A2.31)

Since only the second term changes with respect to j over a search of one complete codebook, the first term can be ignored: the value of E which is minimum (and thus maximum value of the second term) occurs for the best codebook index.

Reverting to difference equation notation, the expression is:

E'_j = \frac{\left\{ \sum_{n=0}^{N-1} s(n) y_j(n) \right\}^2}{\sum_{n=0}^{N-1} y_j^2(n)}   (A2.32)

Note that the above calculation is performed once per codeword for each subframe (in real terms, that is around 140,000 times per second), which is why CELP can be so processor intensive.
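A sketch of this search loop; the LTP/LPC/weighting filtering is assumed already applied to each codeword, so that only the correlation and energy terms of eqnsA2.30 and A2.32 remain:

import numpy as np

def codebook_search(s, filtered_codewords):
    """Exhaustive codebook search (sketch).

    `filtered_codewords` has shape (codebook_size, N), holding each
    codeword after gain-free LTP/LPC/weighting filtering; `s` is the
    target subframe. Returns the best index and its optimal gain.
    """
    best_j, best_score, best_gain = -1, -np.inf, 0.0
    for j, y in enumerate(filtered_codewords):
        energy = np.dot(y, y)
        if energy <= 0.0:
            continue
        corr = np.dot(s, y)
        score = corr * corr / energy        # term to maximise, eqnA2.32
        if score > best_score:
            best_j, best_score, best_gain = j, score, corr / energy
    return best_j, best_gain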

A2.3 Alternative representation

It is often advantageous to represent the filtering operation by the use of matrices. This may reveal opportunities for efficiency savings, or allow operations that are not obvious when using a z-transform or similar approach. A review of the matrix representation of filters, followed by its relevance to CELP, is given below.

A2.3.1 Matrix representation

The impulse response of the LPC and LTP filters is represented by the matrix H, which is a square matrix of size equal to the frame length. The matrix is constructed as a lower diagonal Toeplitz matrix of the combined impulse responses of the two filters [119].

If the impulse response of the filters is given as [a_0\ a_1\ a_2\ a_3] (for the synthesis filters considered here, a_0 = 1), then the lower diagonal Toeplitz matrix representing that filter will be:

H = \begin{bmatrix} a_0 & 0 & 0 & 0 & 0 \\ a_1 & a_0 & 0 & 0 & 0 \\ a_2 & a_1 & a_0 & 0 & 0 \\ a_3 & a_2 & a_1 & a_0 & 0 \\ 0 & a_3 & a_2 & a_1 & a_0 \end{bmatrix}   (A2.33)

Whilst the impulse response of an FIR filter is given directly by its coefficients, the case is slightly more complicated for IIR filters, and the LPC and LTP synthesis filters are actually IIR: no account of the memory stored in the filter is represented by the matrix. Thus the filter memory must also be stored as a vector prior to any arithmetic.

To determine the impulse response of the required filters, construct a matrix as shown in eqnA2.33 using the IIR coefficients zero padded to the length of the frame, making the lower left hand corner of the square matrix contain zeros. This matrix is then inverted (a relatively simple procedure due to its structure), and the resultant lower triangular Toeplitz matrix holds the impulse response. See [64] for some detail on matrix form computational complexity.

The action of filtering is now performed by matrix multiplication, and the addition of the filter memory vector to the output. An alternative way of describing this process is as the addition of the zero-state response to the zero-input response.

The zero-input response of a filter captures the effect of the filter memory. Feeding a zero block of data (of size given by the frame size under consideration) into an IIR filter yields the zero-input response. If an IIR filter with zero memory (or a matrix multiplication) yields a result that is added to the zero-input response, then the combined result is equal to a filtering operation by that same IIR filter with memory.
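This decomposition can be verified numerically; a sketch using scipy's direct-form filter with an explicit initial state:

import numpy as np
from scipy.signal import lfilter

# Verify: filtering with memory == zero-state response + zero-input response.
b, a = [1.0], [1.0, -0.9]          # a simple one-pole IIR filter
x = np.random.randn(40)            # one frame of input
zi = np.array([0.5])               # non-zero filter memory

y_full, _ = lfilter(b, a, x, zi=zi)            # filter with memory
y_zs = lfilter(b, a, x)                        # zero-state response
y_zi, _ = lfilter(b, a, np.zeros(40), zi=zi)   # zero-input response

assert np.allclose(y_full, y_zs + y_zi)        # the decomposition holds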

A2.3.2 Simplified CELP

Using the matrix representation from section A2.3.1, we can see that the task of choosing a correct codeword is to maximise the measure:

\tau_j = \frac{(s H c_j)^2}{\| H c_j \|^2} = \frac{(s H c_j)^2}{c_j^T H^T H c_j}   (A2.34)

This would be calculated for each codeword described by the index, j. Using matrix notation, we can pre-calculate some of the multiplications on a subframe-by-subframe basis, rather than for each codeword, using \Psi = H^T s and \Phi = H^T H.

The former relationship is called backwards filtering, or reverse time series filtering. The error measure becomes:

\tau_j = \frac{(\Psi^T c_j)^2}{c_j^T \Phi c_j}   (A2.35)

This only provides a simplification under certain circumstances, such as in ACELP. Precalculations for each subframe require N^2 operations for \Psi and N^3 operations for \Phi.

In the case of ACELP, where almost all of the input frame samples are of zero value, it is possible to use prior knowledge of the positions of the non-zero values to simplify the measure of equation A2.35 [57]. If the codeword has only R pulses that are non-zero, in locations described by m and with sign described by b, the mean square measure is now:

\tau_j = \frac{\left\{ \sum_{i=0}^{R-1} b_i \Psi(m_i) \right\}^2}{\sum_{i=0}^{R-1} \sum_{k=0}^{R-1} b_i b_k \Phi(m_i, m_k)}   (A2.36)

with

b_i = \begin{cases} 1 & \textrm{if } i \textrm{ is even} \\ -1 & \textrm{if } i \textrm{ is odd} \end{cases}   (A2.37)

Appendix 3 Line Spectral Pairs

A3.1 Generation of LSPs from LPC coefficients

The process of converting from LPC coefficients (a's) to LSPs is explained below; for a rigorous proof of the conversion process, refer to [91][92][93].

Firstly define polynomials to represent two extreme conditions relating to LPC coefficients [15] [48]. A symmetric polynomial, Ak, is formed by adding the time-reversed and forward LPC coefficients:

A_k = (a_k - a_{p+1-k}) + A_{k-1}   (A3.1)

for k between 1 and p, with initial condition A_0 = 1.

Similarly, an antisymmetric polynomial is formed by subtracting the time-reversed and forward LPC coefficients:

B_k = (a_k + a_{p+1-k}) - B_{k-1}   (A3.2)

for k again between 1 and p, with initial condition B_0 = 1.

It is then necessary to determine the complex roots of each polynomial, which will lie on the unit circle in the z-plane if the original LPC filter was stable. It has been shown [106] that this condition also results in the roots of A and B alternating around the unit circle; in fact, any alternating pairs of line spectral frequencies in turn represent a stable set of linear prediction coefficients.

Roots can be found by a number of methods, including Newton-Raphson; evaluation at intervals around the unit circle in the z-plane (looking for sign changes in either real or imaginary components, indicating the function crossing the origin, with the angle at the crossing representing the line spectral frequency), followed by zooming in on areas of sign change for increased resolution; the FFT; or other more computationally efficient algorithms (perhaps using Chebychev polynomials [37][48]).

Once the complex roots, \theta_k, have been found, the line spectral frequencies are easily determined from the formula given in eqnA3.3:

\omega_k = \tan^{-1} \left( \frac{\textrm{Im}\{\theta_k\}}{\textrm{Re}\{\theta_k\}} \right)   (A3.3)

Listing A3.1 shows a routine, written in the ‘C’-like language of RLaB (a mathematical development tool), to derive LSP coefficients from LPC values:

% This file defines a function, lpc_lsp, which derives line-spectral
% frequencies for the given linear prediction coefficients, a.
%
% © Ian McLoughlin, 10 November 1995
%
% Note: format of input argument should be;
% [1,a1,a2,a3,a4], with an initial '1' and an even number of coeffs.

require(roots);

lpc_lsp=function(a)
{
  local(p,k,A,B,r1,r2,theta1,theta2,theta);
  global(pi);
  p=length(a);
  % derive the coefficients for P'(z) and Q'(z)
  A[1]=1;
  B[1]=1;
  for (k in 2:p)
  {
    A[k]=(a[k] - a[p+2-k]) + A[k-1];
    B[k]=(a[k] + a[p+2-k]) - B[k-1];
  }
  r1=roots(A);
  r2=roots(B);
  for (k in 1:p-1)
  {
    % take the angle of each root, mapped into the range 0 to pi
    if (real(r1[k]) < 0)
    {
      theta1[k]=pi-abs(atan(imag(r1[k])/real(r1[k])));
    } else {
      theta1[k]=abs(atan(imag(r1[k])/real(r1[k])));
    }
    if (real(r2[k]) < 0)
    {
      theta2[k]=pi-abs(atan(imag(r2[k])/real(r2[k])));
    } else {
      theta2[k]=abs(atan(imag(r2[k])/real(r2[k])));
    }
  }
  p=p-1;
  % roots occur in conjugate pairs, so keep every other angle
  for (k in 1:p/2)
  {
    theta[k]=theta1[k*2];
    theta[k+(p/2)]=theta2[k*2];
  }
  theta=sort(theta).val;   % sort into ascending order
  return theta;            % return the line-spectral frequencies
}

Listing A3.1: RLaB code routine to convert from LPC to LSP coefficients.

A3.2 Generation of LPC coefficients from LSPs

Conversion from LSPs to LPCs is a mathematically efficient process. The proof is again to be found in [83]. It is advantageous to begin the process in the cosine domain, where q is the array of ordered line spectral frequencies (ordered such that the lowest is first):

q_k = \cos(\omega_k)   (A3.4)

where \omega is a line spectral frequency expressed in radians.

The following recursive equations are now solved, for i = 1 \ldots p/2:

f_1(i) = -2 q_{2i-1} f_1(i-1) + 2 f_1(i-2)

and for j = i-1 \ldots 1:

f_1(j) = f_1(j) - 2 q_{2i-1} f_1(j-1) + f_1(j-2)   (A3.5)

with the initial conditions f_1(0) = 1, f_1(-1) = 0.

The coefficients f_2(i) are calculated similarly, but with q_{2i-1} replaced by q_{2i}.

These values are then used in a second set of equations:

f_1'(i) = f_1(i) + f_1(i-1)   (A3.6)

f_2'(i) = f_2(i) - f_2(i-1)   (A3.7)

which are then used to form the LPC coefficients from:

a_i = \frac{1}{2} f_1'(i) + \frac{1}{2} f_2'(i) \quad \textrm{for } i = 1 \ldots 5   (A3.8)

a_i = \frac{1}{2} f_1'(11-i) - \frac{1}{2} f_2'(11-i) \quad \textrm{for } i = 6 \ldots 10   (A3.9)

Listing A3.2 gives the RLaB routine required to convert a set of line spectral pair parameters into a set of linear prediction coefficients.

% This file defines a function, lsp_lpc, which derives linear pred-
% iction coefficients for the given line spectral pairs, w.
%
% © Ian McLoughlin, 10 November 1995
%
% Note: output format has an initial 1 followed by an even number
% of coefficients; [1,a1,a2,a3,a4]

lsp_lpc=function(w)
{
  local(k,q,n,f1,f2,f1b,f2b,a2);
  p=length(w);
  for (k in 1:p)
  {
    q[k]=cos(w[k]);
  }
  % f1 recursion; array offset 10 holds f1(0), offset 9 holds f1(-1)
  f1[10]=1;
  f1[9]=0;
  for (n in 1:p/2)
  {
    f1[10+n]=-2*q[2*n-1]*f1[10+n-1] + 2*f1[10+n-2];
    for (k in n-1:1:-1)
    {
      f1[10+k]=f1[10+k] - 2*q[2*n-1]*f1[10+k-1] + f1[10+k-2];
    }
  }
  % f2 recursion, as f1 but using the even-indexed q values
  f2[10]=1;
  f2[9]=0;
  for (n in 1:p/2)
  {
    f2[10+n]=-2*q[2*n]*f2[10+n-1] + 2*f2[10+n-2];
    for (k in n-1:1:-1)
    {
      f2[10+k]=f2[10+k] - 2*q[2*n]*f2[10+k-1] + f2[10+k-2];
    }
  }
  f1b[1]=f1[11]+1;
  f2b[1]=f2[11]-1;
  for (n in 2:p/2)
  {
    f1b[n]=f1[10+n] + f1[10+n-1];
    f2b[n]=f2[10+n] - f2[10+n-1];
  }
  for (n in 1:p/2)
  {
    a2[n] = 0.5*(f1b[n] + f2b[n]);
    a2[n+p/2] = 0.5*(f1b[(p/2)-n+1] - f2b[(p/2)-n+1]);
  }
  return([1,a2]);
};

Listing A3.2: RLaB code routine to convert from LSP to LPC coefficients.

A3.3 LSP simulation

FigA3.1 shows the LSP simulation process used to derive the information required for a plot of the linear prediction spectrum with LSP values overlaid:

The test process begins with a reference speech frame, represented as a set of LPC coefficients describing a reference spectrum. Routines then extract LPC coefficients from a given frame of data. These LPC coefficients are then converted to line spectral pairs, processed, and new LPC coefficients and spectrum determined. Thus the effects on the underlying spectrum of the process under test can be visualised. This basic procedure has been used to conduct various tests, including those presented in chapter 5.

Conversion of LPC coefficients to line spectral pairs and vice versa is accomplished using the algorithms presented in sections A3.1 and A3.2 respectively.

The LPC filter frequency response is calculated by substituting z = e^{-j\theta} into the linear prediction equation (eqnA2.3 in section A2.2.2), with \theta being swept from 0 to \pi. EqnsA3.10 and A3.11 show, respectively, the real and imaginary components of the linear predictor function:

\textrm{Re}\{A(\theta)\} = \sum_{k=0}^{P} a_{k+1} \, \textrm{Re}\{(ni + j \cdot nr)^k\}   (A3.10)

\textrm{Im}\{A(\theta)\} = \sum_{k=0}^{P} a_{k+1} \, \textrm{Im}\{(ni + j \cdot nr)^k\}   (A3.11)

where nr = \sin\theta and ni = \cos\theta, and where each power of (ni + j \cdot nr) may be expanded into an explicit polynomial in nr and ni for the filter order used (P = 10).

Once real and imaginary components have been found, the magnitude is calculated, giving the amplitude response of the system at the current frequency:

\textrm{mag}(\theta) = 10 \log_{10} \left( \textrm{Re}\{A(\theta)\}^2 + \textrm{Im}\{A(\theta)\}^2 \right)   (A3.12)

This provides the frequency response of the linear prediction analysis filter A(z), with the synthesis filter being defined as the inverse of this. Thus figures which plot the LPC filter spectrum actually plot the inverse of eqnA3.12.
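Equivalently to the expansions above, A(\theta) can be evaluated directly from the complex exponential. A sketch, using the [1, a1, ..., aP] coefficient layout of the appendix 3 listings:

import numpy as np

def lpc_spectrum_db(a, points=256):
    """Magnitude response of the LPC analysis filter A(z) in dB (sketch).

    `a` is the coefficient array in the [1, a1, ..., aP] layout of the
    appendix 3 listings. Plotting the LPC synthesis spectrum means
    negating this result, as noted in the text.
    """
    theta = np.linspace(0.0, np.pi, points)
    k = np.arange(len(a))
    # A(theta) = sum_k a_{k+1} e^{-jk.theta}, evaluated on the unit circle
    A = np.exp(-1j * np.outer(theta, k)) @ np.asarray(a, dtype=float)
    return 10.0 * np.log10(A.real**2 + A.imag**2 + 1e-12)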

Appendix 4 Implementation of the classifier

A4.1 Formant detection

This section describes tests of formant detection methods, and the implementation of each of these methods.

A4.1.1 Differentiation of LPC spectrum

Operating on the linear prediction spectrum, s (obtained as shown in section A3.3), this analysis method calculates the first and second derivatives:

s_i' = s_i - s_{i-1}   (A4.1)

s_i'' = s_i' - s_{i-1}'   (A4.2)

for i = 1 \ldots window length,

then checks for gradient changes due to maxima:

if \textrm{sign}(s_i') \ne \textrm{sign}(s_{i-1}') and \frac{1}{2}(s_i'' + s_{i-1}'') < 0, then i is a formant frequency   (A4.3)

The averaging process when checking the second derivative in eqnA4.3 is required because the gradient change occurred somewhere between index i and i-1.

Detected formants are then located between the index positions where eqnA4.3 is satisfied. The formant amplitudes are calculated as the average amplitude of the spectrum at the two index points between which the gradient changes.
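A sketch of the first and second difference search of eqnsA4.1 to A4.3:

import numpy as np

def formant_peaks(s):
    """Locate spectral maxima by differentiation (eqnsA4.1 to A4.3).

    `s` is the sampled LPC magnitude spectrum. Returns indices near
    gradient sign changes whose averaged second difference is negative,
    i.e. maxima. Sketch only; amplitude estimation is omitted.
    """
    d1 = np.diff(s)      # d1[i-1] holds s'_i  (eqnA4.1)
    d2 = np.diff(d1)     # d2[i-2] holds s''_i (eqnA4.2)
    peaks = []
    for i in range(3, len(s)):
        if np.sign(d1[i - 1]) != np.sign(d1[i - 2]):       # s'_i vs s'_{i-1}
            if 0.5 * (d2[i - 2] + d2[i - 3]) < 0.0:        # avg of s''_i, s''_{i-1}
                peaks.append(i - 1)                        # maximum near i-1
    return peaks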

A4.1.2 Solving LPC polynomial

The LPC polynomial, eqnA2.3 in section A2.2.2, describes a resonant circuit modelling the vocal tract. If this polynomial is solved, the complex conjugate roots resulting from this solution give the resonant frequencies.

A numerical method is used to solve for the roots (simulations have used the MATLAB built-in ‘roots’ function; as this method has not been selected for implementation, further details are not required here. However, conversion of LPC coefficients to LSP parameters also requires polynomial solving: see section A3.1).

The angular frequency of the resulting roots gives the formant frequencies, and the magnitude gives a measure of the formant bandwidth.

A4.1.3 Spectral peak picking

Prior knowledge of likely formant position (fig2.2, section 2.2.1) allows maxima to be determined within small ranges; however, spectral tilt will cause confusion. For example, the low frequency bias in a speech frame will often cause the low-frequency end of the spectrum to have much higher amplitude than the high end. Furthermore, the maximum in the band within which F1 is usually located will often not be due to F1, but due to low-frequency bias (in which case the maximum amplitude will be at the lowest frequency in that band).

A more general solution to the problem is to search for gradient changes in a process similar to that of section A4.1.1. As the index i is swept across the spectrum, s, if s_i is found to be greater than s_{i-1}, then the subsequent samples are tested to determine whether they are greater or less than s_i. If the first non-equal sample is less, then i is a formant location; otherwise it is not.

A further constraint is then applied to reject any candidate formants located within 300Hz of lower formants. This figure was found through testing to provide good results, reducing the occurrence of double-peak confusion.

A4.1.4 LSP-based formant detection

The p line spectral pair parameters are analysed to determine the three closest pairs, presumed to correspond to the three strongest formants. The formant frequency is considered to be at the mid point of each of these three pairs.
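A sketch of the narrowest-pair search; the line spectral frequencies are assumed already ordered, as produced by listing A3.1:

import numpy as np

def lsp_formants(lsf, count=3):
    """Estimate formants as midpoints of the closest LSP pairs (sketch).

    `lsf` is the ordered array of line spectral frequencies (radians).
    The `count` narrowest adjacent pairs are taken as the strongest
    resonances, and their midpoints returned in frequency order.
    """
    lsf = np.asarray(lsf)
    widths = np.diff(lsf)                    # adjacent pair separations
    narrowest = np.argsort(widths)[:count]   # indices of the closest pairs
    mids = (lsf[narrowest] + lsf[narrowest + 1]) / 2.0
    return np.sort(mids)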

A4.2 Noise detection

In order to detect a given noise, it is advantageous to first define it. This section initially describes three types of siren noise (two-tone, wailer and yelper), and presents simulation equations for these. The equations were developed through analysis and inspection of short-time spectrograms, autocorrelograms and FFT spectra. The siren noises are shown in sections A4.2.1, A4.2.2 and A4.2.3.

Section A4.2.4 considers methods of siren noise detection, and section A4.2.5 describes the implementation of a chosen siren detection method.

A4.2.1 Two-tone

Figure A4.1: Temporal variation in two-tone siren frequency (one period); frequency (Hz) plotted against samples (at 8kHz).

Measurements of the low-period frequency, from correlation and inspection of power spectra and spectrograms, ranged from 460 to just under 500Hz, with the high frequency at between 599 and 620Hz. Spectrogram plots allowed easy identification of the period of high and low sections.

A4.2.2 Wailer

Figure A4.2: Temporal variation in wailer siren frequency (one period).

On the assumption that analogue electronics produced the siren, exponential functions were chosen to model the sound. Parametric curve fitting revealed an exception to the exponential model for the upward rising frequency part of the waveform. The decreasing frequency part of the curve is given by eqnA4.4:

f = 980 \times e^{-t \cdot 3.814 \times 10^{-5}}   (A4.4)

where the sample index t runs from 0 to 18714. The increasing frequency part is modelled by eqnA4.5:

f = 980 - 500 \times 1.6^{-t \cdot 4.982 \times 10^{-4}} + t \cdot 2.164 \times 10^{-3}   (A4.5)

with the sample index t running from 0 to 12476.

A4.2.3 Yelper

Figure A4.3: Temporal variation in yelper siren frequency (one period).

Again working on the assumption that analogue electronics produced the siren, exponential functions were chosen to model the sound. The decreasing frequency part of the curve is given by eqnA4.6:

/ = 980 x e"'A059>< 10-4 (A4.6) with the sample index t running from 0 to 1760. The increasing frequency part of the siren is modelled by eqnA4.7:

f = 980 - 500 \times e^{-t \cdot 5.981 \times 10^{-3}}   (A4.7)

with the sample index t running from 0 to 1040.
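Taken together, these equations define per-sample frequency tracks. A sketch of a generator for the yelper model (the constants are transcribed from eqnsA4.6 and A4.7, whose print quality makes them indicative rather than definitive):

import numpy as np

def yelper_track():
    """Frequency track (Hz) for one period of the yelper siren model.

    Implements eqnsA4.6 and A4.7 at an 8kHz sample rate; treat the
    transcribed constants as indicative rather than definitive.
    """
    t_down = np.arange(1761)                         # t = 0..1760, eqnA4.6
    f_down = 980.0 * np.exp(-t_down * 4.059e-4)
    t_up = np.arange(1041)                           # t = 0..1040, eqnA4.7
    f_up = 980.0 - 500.0 * np.exp(-t_up * 5.981e-3)
    return np.concatenate([f_down, f_up])            # sweep ~480Hz to ~980Hz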

A4.2.4 Methods of siren detection

The initial method of siren detection that was tested involved storing a large number of past spectral arrays, each containing the power spectrum of successive analysis frames.

The maximum phase delay position is the most likely phase of the tested siren. Comparing these values for each siren yields the most likely siren type present. It was hoped that adaptive thresholding of this likelihood would then allow the presence or absence of sirens to be stated.

The scheme outlined was unsuccessful for a number of reasons. Firstly, the mean and variance of the frequency locations of the sounds comprising each siren are different, and thus even when no sirens are present, the type of siren mostly located in frequency regions of high spectral energy yields the highest score. The large variance in noise frequency distribution means that normalization to account for this would have to be adaptive and would also have to ignore siren sounds. In particular, spectral tilt, such as high levels of low frequency noise, confused the results.

Secondly, the narrowness of the frequencies from which sirens are constructed means that even small phase variations (for yelper and wailer sirens) and small variations in the absolute frequency detected (for the two-tone siren) mean that, when such siren noise is present, the correlation process may not be summing spectral peaks but perhaps the frequency bins adjacent to the spectral peaks. Normalization then produces a small measure result.

Despite the poor response of this technique, adapting the process slightly to calculate the amplitude mean and variance for each frequency bin in the past arrays could detect tonal noise. The technique was then extended to also calculate the mean and variance for variable frequency sounds. For this, the calculation was performed not only for each fixed frequency, but also between a given frequency and each frequency in the previous spectrum that could be related to the current frequency through the siren noise equations (then to the previous spectral array, and hence continuing through the history arrays). Each possible frequency path was normalized with respect to its length, and the paths were compared in strength. This method suffered the same phase misalignment and spectral tilt disadvantages as the previous scheme.

To counter possible phase misalignment problems, the methods were adjusted to sum frequency bins in a small region (of ±30Hz) around the expected model frequency. Whilst performance was slightly improved, errors due to variation in power across the spectrum increased.

The degree of computation required to perform the two detection schemes for three sirens, over many analysis frames at every possible phase delay, was considered excessive, as were the memory storage requirements for the history of 50 or more 200-sample frequency arrays.

Simplified approaches were then developed, detecting frequency peaks in the noise spectrum, and storing these for past frames. Then the frequency tracks for each possible siren phase delay, for each siren, could be matched against the stored frequency peak positions using a mean squared error criterion.

A4.2.5 Siren noise detection

Given the criteria that LSP parameters are available for analysis, a spectral peak picking technique was developed using narrowest-LSP detection (such as was used for formant detection in section A4.1.4).

The three most prominent (most closely paired) LSP-detected spectral peaks in the range of 200Hz to 1kHz were calculated for each speech analysis frame and stored in a history array. The frequency tracks for each possible siren phase were matched against the history array. During this calculation, if no spectral peak is present in the history array within 200Hz of the expected location, then the position is ignored for calculation purposes. This is because the spectral peak detection algorithm will only pick out strong spectral peaks, not those obscured in high levels of noise, and so missing points are likely.

The LSP detection algorithm is unaffected by moderate levels of spectral tilt (ie. DC contamination or high levels of high-frequency noise), which prevented most of the methods of section A4.2.4 from performing adequately.

A4.3 Speech classification

Speech classification is performed according to the outcome of one or several speech analysis methods. The n resultant values form an n-dimensional feature vector which describes the speech. The speech classification is then performed by defining regions in n-dimensional space, with feature points falling in a particular region being classed together.

Efficiency dictates that classification regions are best bounded by straight lines (in two dimensions) or planes (in higher dimensions). This considerably simplifies the classification process.
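With straight-line boundaries, classification reduces to testing the sign of a linear function for each boundary. A minimal sketch for a two-line wedge of the kind shown in figA4.4; the slope and intercepts here are placeholders, not the hand-optimised boundaries:

def classify(lsp_measure, power):
    """Wedge classification between two parallel lines (sketch).

    Boundaries are of the form y = m*x + c in (LSP measure, frame
    power) space; the coefficients are illustrative placeholders.
    """
    upper = power - (0.8 * lsp_measure + 0.5)   # above this line: voiced
    lower = power - (0.8 * lsp_measure - 0.5)   # below this line: non-speech
    if upper > 0.0:
        return "voiced"
    if lower < 0.0:
        return "non-speech"
    return "fricative/affricative/nasal"        # inside the wedge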

Once the format of the feature vector is defined (ie. the speech analysis methods are known), automatic or manual methods may be employed to find equations for the classification boundaries. For manual optimisation, a number of plots can be made of pairs of the measures in the feature vector, for different speakers and different speech, and compromise decision boundaries drawn by hand.

The process of defining decision regions in the case of a two-dimensional feature vector formed by LSP-based and frame power measures is illustrated in figA4.4. The experiment used to derive this graph is that of section A6.5, with the phonemes deriving from an automatic speech analysis of a number of sentences from the TIMIT database [118].

Figure A4.4: Example straight line classifications performed to determine example phoneme regions in an LSP measure against frame power measure space, with phoneme types shown.

Two straight lines have been drawn in figA4.4, defining a wedge-shaped internal region. Although this is only an example classification region, it can be seen that the phonemes above the top line are predominantly vowels, and are all voiced. Those below the lower line are non-speech regions or closures (those periods in speech when the tongue or lips block the vocal tract and sound is momentarily paused).

The phonemes within the wedge-shaped region are all fricatives, affricatives and nasals, and include the unvoiced schwa ax-h, along with hv, y and hh (the ‘h’ in “ahead”, the ‘y’ in “yacht” and the ‘h’ in “hay” respectively). These are all low energy unvoiced phonemes, most likely to require speech enhancement by selective amplification when heard in noise.

In practice, the method of plotting out many of these graphs to produce compromise speech classifications is extremely time consuming. The more efficient method adopted is to begin with a plot such as figA4.4 to define approximate classification regions, and then to process a large number of speech recordings, comparing the speech classification output to the waveform (and phonetic transcription, if available) and to time-domain plots of the feature vector measures.

Figure A4.5: (a) a typical speech waveform shown with (b) LSP, (c) pitch strength and (d) gain measures; (e) shows the results of speech classification (non-speech, other, fricative, voiced) performed using those measures, over 20ms analysis frames.

FigA4.5 is an example of the plot used to fine-tune decision regions. Straightforward thresholds applied to individual measures may be shown by drawing horizontal lines on the graphs at the appropriate value. Adjustments are made to the classification rules based upon any mismatch between the waveform contents (known through listening or through an existing phonetic transcription) and the automatic classification output.

Appendix 5 Intelligibility

A5.1 Speech understanding

Speech understanding is a complex issue, involving hearing, speech context, importance, audibility and perception. Some of these issues are considered in the following subsections.

A5.1.1 Measurements of intelligibility

In order to determine and compare the ways in which various factors influence the understanding of speech, it is necessary to create a standard measurement. In fact two main standards and many variations of them exist. An intelligibility test measures the ability of listeners to correctly identify words, phrases or sentences, whereas an articulation test measures the ability of listeners to correctly identify individual phonemes (vowels and consonants in monosyllabic or polysyllabic real or artificial words as described in [55]).

A5.1.2 Contextual information, redundancy and vocabulary size

Everyday experience indicates that contextual information plays an important role in the understanding of speech, often compensating for an extreme lack of original information. For example the sentence “He likes to xxxx brandy”, can easily be understood even though a complete word is missing (“drink”).

The construction of sentences is such that the importance of missing words is almost impossible to predict. The missing word “stop” differs in both importance and predictability in the two sentences “She waited in a queue at the bus xxxx” and “As the car sped towards him he shouted ‘xxxx’!”.

Contextual information may be regarded as being provided by surrounding words, which constrain the possibilities of the enclosed word through grammatical rules or subject matter; on a smaller scale, the surrounding syllables may constrain the choice of a missing syllable (as certain combinations do not appear, or are not common, in the English language). Vocabulary size reduction also causes a similar constraint. It is natural for humans to reduce vocabulary size when communication is impaired; eloquence is not common under noisy conditions.

Redundancy, in which information is imparted in more ways than would normally be necessary, has effects similar to contextual constraint. Redundancy may be provided in the form of over-complex sentences in which the role of each single word is limited, by context, to a very small set of choices. In this way the extra information given in a sentence should enable complete understanding even if some words are lost. A simple example would be the repetition of important words, such as saying “It is a green car. The green car is turning ....” instead of the more concise “There is a green car turning”. In the latter sentence, the two pieces of information “green” and “car” are spoken only once; repetition gives the listener more than one opportunity to receive them, improving the robustness of the communication.

Redundancy may also be achieved by the use of longer descriptive phrases, such as the use of “alpha bravo foxtrot” instead of “ABF”. This reduces the chance that a short distortion would obliterate the transmission, or cause confusion over an entire quantum of information.

The effect of contextual information on understanding is extremely difficult to quantify and is highly subjective. It is also entirely dependent upon the testing method, and upon the method of information removal necessary to carry out the tests.

Figure A5.1: Effect of contextual information on speech. Constructed from examination of a figure presented in [112].

However some comparisons may be useful as shown in figA5.1, which plots the percentage of correctly identified digits, syllables or words spoken in the presence of the given degree

of background noise [110]. Although the shorter digit words are relatively more likely to be corrupted by noise than the predominantly longer unconstrained words, the extremely limited set of possible choices involved (as the listeners knew they were only listening for digits) means that even a corrupted digit may be guessed with some accuracy.

Figure A5.2: Effect of vocabulary size on intelligibility. Constructed from tabular data reported in [112], and similar to a graph in [54].

FigA5.2 indicates the effects of changing the size of the speech vocabulary (where listeners are given the indicated number of choices) on intelligibility. This shows the very large improvement in recognition when vocabulary size is constrained either artificially or by context, for example reducing vocabulary size from 256 to 16 at -9 dB signal to noise level results in almost four times as many words being recognised. It should be remembered that the articulation index is a measure of the recognition rate of individual phonemes, not words.

This section has demonstrated the importance of context and redundancy in speech communication systems, and why these must thus be regarded as major factors in the effectiveness of such systems.

A5.1.3 Background noise

The effects of background noise play an important part in speech communication, but it is as important to consider the type of noise present as it is to consider its amplitude.

FigsA5.1 and A5.2 illustrate the effects of signal-to-noise level on speech understanding in the presence of evenly-distributed noise, where even quite small changes in this ratio may

cause large changes in the degree of intelligibility.

Cases where standard A-weighted signal-to-noise ratio measurements are ineffective are relatively common. In motor vehicles, noise measured within the vehicle cabin may reach 80dBA [127]. In contrast, noise at infrasonic frequencies (2 to 32Hz, not considered by the A-weighted measure) can easily reach 120dB. Apart from causing a considerable masking effect on lower speech frequencies, this amplitude may cause hearing damage.

A second case where dBA measures do not apply is where audio frequencies are not actually present, but are perceived as residuals, being induced from higher frequency harmonics. The most famous example of this is related in [127]: complaints from householders near New York’s JFK airport about excessive rumble from aeroplanes prompted tests for noise pollution. The reported findings indicated that very little low-frequency noise was present; however, there were several high frequency harmonics which induced the perception of rumble that the householders complained about [127].

In general, for effective communication, around 6dB of signal should be present above the noise floor. Special consideration should be given to the formant frequency regions in speech [27], if not for the entire important frequency range of 800Hz to 3kHz, within which they predominantly lie.

Appendix 6 Test results

A6.1 LSP effectiveness tests

The results of the LSP effectiveness tests reported in section 9.3 are as follows:

[Table A6.1 placeholder: the per-listener response grid for the 82 individual vowel presentations (14 listeners, tested between 4/12/96 and 22/1/97) is not legible in the scanned original. The summary rows from the foot of the table are reproduced below.]

listener:               1     2     3     4     5     6     7     8     9    10    11    12    13    14
raw recognition:       54%   33%   42%   50%   42%   58%   25%   46%   46%   33%   38%   46%   33%   75%
shifted:               63%   46%   46%   46%   50%   54%   42%   58%   54%   42%   58%   63%   50%   80%
widened:               63%   50%   58%   58%   50%   46%   50%   71%   58%   63%   50%   54%   50%  100%
improvement, shifted:  1.15  1.36  1.10  0.92  1.20  0.93  1.67  1.27  1.18  1.25  1.56  1.36  1.50  1.17
improvement, widened:  1.15  1.50  1.40  1.17  1.20  0.79  2.00  1.55  1.27  1.60  1.33  1.18  1.50  1.33

Averages over all 14 subjects (1148 responses in total, approximately 383 per enhancement condition): 52.38% correct overall; raw 44.35%; shifted 54.17% (average improvement factor 1.22); widened 58.63% (average improvement factor 1.32).

Table A6.1: Results of intelligibility testing for LSP processed vowel sounds listened to in simulated vehicle noise.

A6.2 Formant sharpening/broadening

Table A6.2 gives the raw intelligibility results obtained for each listening trial in a number of tests designed to assess the effects of various degrees of LSP widening and narrowing.

condition:      1      2      3      4      5      6      7      8      9     10     11
% widened:    400    270    240    210    180    150    120     90     60     30      0
trial 1:      100    100    100    100    100   83.3   66.7    100   66.7   83.3   66.7
trial 2:     66.7   83.3   83.3     50    100   83.3    100    100   66.7   83.3   66.7
trial 3:     83.3    100   83.3   83.3   83.3   83.3   83.3     50   66.7     50   66.7
trial 4:      100    100   66.7     50   66.7    100   66.7   83.3   83.3   83.3   83.3
trial 5:     83.3   83.3   83.3    100    100   83.3   66.7   83.3   83.3   66.7     50
trial 6:     66.7     50   83.3   83.3   66.7     50   83.3   33.3   66.7     50   33.3
trial 7:      100    100    100   83.3   83.3   83.3   66.7   66.7   83.3   83.3     50
trial 8:     83.3   66.7    100   83.3   83.3   66.7   66.7   66.7     50   33.3   33.3
trial 9:     66.7   83.3   83.3    100   83.3   83.3   83.3    100    100   83.3    100
trial 10:    66.7   83.3   83.3   83.3     50   66.7   66.7   83.3     50     50   66.7
trial 11:     100   83.3   83.3   66.7   83.3   83.3   83.3   83.3     50     50     50
trial 12:     100    100    100    100   83.3    100   83.3   66.7    100     50   33.3
trial 13:     100   66.7    100   83.3   83.3    100   83.3     50   66.7     50   83.3
trial 14:    83.3    100   83.3   66.7   83.3   83.3   66.7   66.7   83.3   83.3   66.7
Average:     85.7   85.7   88.1   80.9   82.1   82.1   76.2   73.8   72.6   64.3   60.7

Table A6.2: Individual and average correct response rates for 14 listening trials tested against 11 degrees of formant widened and narrowed vowels in added noise.

A6.3 DRT test results

The DRT test results for 20 listeners hearing both enhanced and unenhanced words in two types of noise are given in table A6.3. Listeners are identified by the filename of their results, and values are percentages of the words correctly identified in the given categories; no correction has yet been made for guessing.

listener    overall score    simulated interior noise     simulated siren noise
            (% correct)      unenhanced    enhanced       unenhanced    enhanced
aislam      65               55            69             65            71
ben         66               54            69             66            77
dave        68               59            65             70            78
daveh       67               71            61             66            69
derekc      65               63            64             62            71
frank       61               48            62             66            67
hardw       60               59            56             61            63
harp        62               53            61             64            71
kirk        57               55            55             59            61
klaus       68               59            76             70            66
krishna     62               60            61             61            67
ong         60               50            61             62            67
robg        66               56            67             64            79
robj        65               58            65             62            76
salousm     64               48            66             71            70
sandhu      67               56            66             66            80
thai        63               50            67             62            73
temple      71               60            71             72            82
terry       67               59            66             67            75
zentani     59               50            63             62            63
average     64               56            65             65            68

Table A6.3: Listeners (identified by result filename), overall correct response average and response averages for unenhanced and enhanced speech in both types of noise. Boxed results indicate an intelligibility reduction caused through processing.

A6.4 DRT test contents

Table A6.4 shows the consonant taxonomy used in construction of the DRT test.

Features      m  n  v  ð  z  ʒ  dʒ  b  d  g  w  r  l  j  f  θ  s  ʃ  tʃ  p  t  k  h
Voicing       +  +  +  +  +  +  +   +  +  +  +  +  +  +  -  -  -  -  -   -  -  -  -
Nasality      +  +  -  -  -  -  -   -  -  -  -  -  -  -  -  -  -  -  -   -  -  -  -
Sustention    -  -  +  +  +  +  -   -  -  -  +  +  +  +  +  +  +  +  -   -  -  -  +
Sibilation    -  -  -  -  +  +  +   -  -  -  -  -  -  -  -  -  +  +  +   -  -  -  -
Graveness     +  -  +  -  -  o  o   +  -  o  +  -  o  o  +  -  o  o  o   +  -  o  o
Compactness   -  -  -  -  -  +  +   -  -  +  -  -  o  +  -  -  -  +  +   -  -  +  +

Table A6.4: Consonants used in the DRT test, and their relation to the six word feature categories (from [125]). + indicates present, - absent and o inapplicable.

Table A6.5 gives the 96 DRT stimulus word pairs in feature categories. Within each feature list, words on the left are positive within that category, and words on the right are negative [3]. Note that the word ‘calf’ may rhyme with ‘gaff’ when spoken by the American English speakers for whom the DRT test was designed, but does not necessarily rhyme for native English speakers, so this word is replaced by ‘caff’ as is usual when speakers are British [77].

Voicing              Nasality           Sustention
voiced   unvoiced    nasal    oral      sustained  interrupted
veal     feal        meat     heat      vee        bee
bean     peen        need     deed      sheen      cheat
gin      chin        mitt     bit       vill       bill
dint     tint        nip      dip       thick      tick
zoo      sue         moot     boot      foo        pooh
dune     tune        news     dues      shoes      choose
voal     foal        moan     bone      those      doze
goat     coat        note     dote      though     dough
zed      said        mend     bend      then       den
dense    tense       neck     deck      fence      pence
vast     fast        mad      bad       than       dan
gaff     calf [caff] nab      dab       shad       chad
vault    fault       moss     boss      thong      tong
daunt    taunt       gnaw     daw       shaw       chaw
jock     chock       mom      bomb      von        bon
bond     pond        knock    dock      vox        box

Sibilation            Graveness          Compactness
sibilated unsibilated grave    acute     compact    diffuse
zee      thee         weed     reed      yield      wield
cheep    keep         peak     teak      key        tea
jilt     gilt         bid      did       hit        fit
sing     thing        fin      thin      gill       dill
juice    goose        moon     noon      coop       poop
chew     coo          pool     tool      you        rue
joe      go           bowl     dole      ghost      boast
sole     thole        fore     thor      show       so
jest     guest        met      net       keg        peg
chair    care         pent     tent      yen        wren
jab      gab          bank     dank      gat        bat
sank     thank        fad      thad      shag       sag
jaws     gauze        fought   thought   yawl       wall
saw      thaw         bong     dong      caught     taught
jot      got          wad      rod       hop        top
chop     cop          pot      tot       got        dot

Table A6.5: The 96 alternative word pairs comprising the DRT test arranged by feature difference.

A6.5 LSP measure evaluation tests

Spearman ranking correlation values were obtained for different speech measures for a large number of phoneme sounds with differing levels of added white noise. Table A6.6 shows the noise conditions tested:

Noise condition:            0         1        2        3        4        5
Noise amplitude scaling:    0%FSD     5%FSD    10%FSD   20%FSD   40%FSD   80%FSD
Relative noise amplitude:   no noise  9dB SNR  6dB SNR  3dB SNR  0dB SNR  -3dB SNR

Table A6.6: Noise conditions for Spearman rank calculations, with the corresponding noise amplitude in percentage of full scale value, and the noise amplitude relative to speech amplitude (fixed at 40%FSD).

The Spearman sample coefficient for each of the tested conditions is shown in table A6.7; the measures correlated were the LSP measure, zero-crossing rate, frame power and average magnitude difference function (referred to as LSP, ZCR, POW and AMDF respectively).

For each phoneme, in each type of noise, arrays were created containing the average value of each of the measures. In order to calculate the Spearman coefficient, rs, from two measure arrays m1 and m2, the arrays were first ranked as in eqnsA6.1 and A6.2:

    k1[i] = rank(m1[i])    (A6.1)
    k2[i] = rank(m2[i])    (A6.2)

for i = 1 ... n, where n, the length of each measure array, is the number of separate phonemes (here 58), and m1 and m2 are the two measures being correlated.

The Spearman coefficient for the n phonemes is then:

    rs = 1 - [6 Σ (k1[i] - k2[i])²] / [n(n² - 1)]    (A6.3)

A large absolute value of coefficient indicates strong correlation, with the sign indicating proportionality or reverse proportionality. For a large sample size (over 20), it may be assumed that the results are normally distributed, and actual significance, if required, may be estimated from normal variate values [29].
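As an illustration, eqns A6.1 to A6.3 can be computed as in the following minimal sketch (the function names are illustrative; ties in measure values are given arbitrary distinct ranks):

import numpy as np

def rank(values):
    # Assign 1-based ascending ranks (eqns A6.1 and A6.2); ties receive
    # distinct ranks in order of occurrence.
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def spearman(m1, m2):
    # Spearman sample coefficient of eqn A6.3 for two measure arrays,
    # one entry per phoneme (n = 58 in the tests reported here).
    n = len(m1)
    d = rank(np.asarray(m1, dtype=float)) - rank(np.asarray(m2, dtype=float))
    return 1.0 - 6.0 * float(np.sum(d * d)) / (n * (n * n - 1))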

Noise 0        LSP       ZCR       POW
LSP             -        0.8098   -0.6434
POW           -0.6434   -0.545      -
AMDF          -0.6548   -0.5719    0.9919

Noise 1        LSP       ZCR       POW
LSP             -        0.9257   -0.9232
POW           -0.9232   -0.8592     -
AMDF          -0.9305   -0.8472    0.9951

Noise 2        LSP       ZCR       POW
LSP             -        0.789    -0.9247
POW           -0.9247   -0.7133     -
AMDF          -0.9057   -0.6497    0.9859

Noise 3        LSP       ZCR       POW
LSP             -        0.626    -0.863
POW           -0.863    -0.4686     -
AMDF          -0.7387   -0.2843    0.9419

Noise 4        LSP       ZCR       POW
LSP             -        0.3373   -0.6077
POW           -0.6077   -0.0554     -
AMDF          -0.422     0.3337    0.8702

Noise 5        LSP       ZCR       POW
LSP             -        0.1339   -0.2114
POW           -0.2114    0.5433     -
AMDF          -0.0583    0.7009    0.9099

Table A6.7: Spearman ranking coefficients for the tested conditions.

Table A6.8: Spearman ranking coefficients for noise degraded measures correlated with respect to the measures for noise-free speech, and a plot of the deterioration in each measure (x-axis is noise level, y-axis is Spearman coefficient, as figA6.1).

The deterioration plots in table A6.8 show the effect of increasing noise level on the performance of each measure, correlated against its values for noise-free speech. These plots are shown combined in figA6.1.

Figure A6.1: Spearman coefficient against speech to noise amplitude ratio, showing the deterioration in each measure as the proportion of noise is increased.

Subsections A6.5.1 to A6.5.3 plot the correlations between various pairs of the measures for the noise 0, noise 1 and noise 4 conditions respectively. In the plots, the degree of correlation between two measures can be seen as the resemblance of the distribution to a straight line at 45° for proportionality or 135° for inverse proportionality.

Appendix 6: Test results 168 Each point on the distribution graphs corresponds to a single phoneme - located by its average measure values over all of the tested speech. The “sc” value in the headings is the absolute value of the Spearman ranking coefficient for the given plot and given conditions.

A6.5.1 Noise 0 correlation plots

In noise-free speech analysis, the LSP measure correlated well with the ZCR measure, as did the POW and the AMDF measure.

Figure A6.2: Correlation between LSP and ZCR measures for noise 0 (sc=0.8098).

Figure A6.3: Correlation between LSP and POW measures for noise 0 (sc=0.6434).

Figure A6.4: Correlation between LSP and AMDF measures for noise 0 (sc=0.6548).

Figure A6.5: Correlation between ZCR and POW measures for noise 0 (sc=0.545).

Figure A6.6: Correlation between ZCR and AMDF measures for noise 0 (sc=0.5719).

Figure A6.7: Correlation between POW and AMDF measures for noise 0 (sc=0.9919).

A6.5.2 Noise 1 correlation plots

Figure A6.8: Correlation between LSP and ZCR measures for noise 1 (sc=0.9257).

Figure A6.9: Correlation between LSP and POW measures for noise 1 (sc=0.9232).

Figure A6.10: Correlation between ZCR and POW measures for noise 1 (sc=0.8592).

A6.5.3 Noise 4 correlation plots

Figure A6.11: Correlation between LSP and ZCR measures for noise 4 (sc=0.3373).

Figure A6.12: Correlation between LSP and POW measures for noise 4 (sc=0.6077).

Figure A6.13: Correlation between ZCR and POW measures for noise 4 (sc=0.0554).

A6.5.4 Measure variance between noise-free and noise 1 conditions

Figure A6.14: Correlation of the LSP measure from noise-free speech with that from noise 1 (sc=0.7667).

Figure A6.15: Correlation of the ZCR measure from noise-free speech with that from noise 1 (sc=0.5938).

Figure A6.16: Correlation of the POW measure from noise-free speech with that from noise 1 (sc=0.9929).

Figure A6.17: Correlation of the AMDF measure from noise-free speech with that from noise 1 (sc=0.9825).

A6.5.5 Measure variance between noise-free and noise 2 conditions

Figure A6.18: Correlation of the LSP measure from noise-free speech with that from noise 2 (sc=0.6734).

Figure A6.19: Correlation of the ZCR measure from noise-free speech with that from noise 2 (sc=0.4598).

Figure A6.20: Correlation of the POW measure from noise-free speech with that from noise 2 (sc=0.9772).

Figure A6.21: Correlation of the AMDF measure from noise-free speech with that from noise 2 (sc=0.9408).

A6.5.6 Measure variance between noise-free and noise 4 conditions

Figure A6.22: Correlation of the LSP measure from noise-free speech with that from noise 4 (sc=0.3043).

Figure A6.23: Correlation of the ZCR measure from noise-free speech with that from noise 4 (sc=0.1989).

Figure A6.24: Correlation of the POW measure from noise-free speech with that from noise 4 (sc=0.6899).

Figure A6.25: Correlation of the AMDF measure from noise-free speech with that from noise 4 (sc=0.3024).

A6.6 Extrapolation from DRT results to predict intelligibility

The DRT test results of section A6.3 indicated the average improvement in speech intelligibility obtained for all listeners in each of six different groups of speech features (section A6.4). Values have been found for both the tonal and wideband noise tests.

In order to construct a single composite intelligibility measure for the tests, it is necessary to firstly determine the relative frequency of the six feature classes in unconstrained speech, secondly to choose the relative frequency of tonal and wideband interfering noise, and thirdly to use these values as weights to the intelligibility increase value in each speech feature category.

The types of phonemes comprising each of the six speech feature categories are known (and are shown in table A6.4 in section A6.4), where the symbols given in the table are from the international phonetic alphabet.

Speech classification tests performed using the TIMIT database [118] compiled a cumulative count of the phonetic transcription of a number of recorded sentences of speech (actually for the results in section A6.5). The occurrences of each phoneme in this test can be grouped to determine the proportion of occurrences for each of the six DRT test speech feature classes.

Unlike the DRT test taxonomy of table A6.4, the TIMIT database uses a substantially different proprietary phonetic classification, requiring the construction of mappings between the two formats. The TIMIT mapping for each of the phonemes comprising the six speech feature categories, and the number of occurrences of each phoneme, is shown in table A6.9, where the average number of occurrences of each feature class is calculated.

voicing:      m 99, n 117.7, v 27, dh 26.5, z 105.1, zh 0, b 10.6, d 14.1, g 18.8, w 46.6, r 127.7, l 106.2, y 29.9, jh 25.7, ch 24.7 (average 52.0)
nasality:     m 99, n 117.7, ng 43.4 (average 86.7)
sustention:   v 27, dh 26.5, z 105.1, zh 0, hh 25.3, w 46.6, r 127.7, l 106.2, y 29.9, f 75.6, th 21.7, s 245.5, sh 92.3 (average 71.5)
sibilation:   z 105.1, zh 0, s 245.5, sh 92.3, jh 25.7, ch 24.7 (average 82.2)
graveness:    m 99, v 27, b 10.6, w 46.6, f 75.6, p 42.4 (average 50.2)
compactness:  zh 0, g 18.8, y 29.9, sh 92.3, k 59, hh 25.3 (average 37.5)

Table A6.9: DRT speech feature categories, the TIMIT transcribed phonemes comprising them, and the number of occurrences of each phoneme in the tested subset of the TIMIT database. The average number of occurrences has been calculated for each speech feature category.

The average numbers of occurrences of the speech feature categories in the TIMIT sentences from table A6.9 were summed, and each category average was divided by this total to give the relative frequency of each category, shown in table A6.10.

voicing   nasality   sustention   sibilation   graveness   compactness
0.137     0.228      0.188        0.216        0.132       0.099

Table A6.10: Relative frequency of each speech feature category.
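A minimal sketch of the composite weighting, assuming the per-category averages reconstructed in table A6.9; the per-category gain values passed in are hypothetical and would be taken from the DRT results of section A6.3:

averages = {'voicing': 52.0, 'nasality': 86.7, 'sustention': 71.5,
            'sibilation': 82.2, 'graveness': 50.2, 'compactness': 37.5}

# Normalise the per-category average occurrence counts into the relative
# frequencies of table A6.10 (e.g. 52.0 / 380.1 = 0.137 for voicing).
total = sum(averages.values())
weights = {cat: avg / total for cat, avg in averages.items()}

def composite_intelligibility(per_category_gain):
    # Weight per-category intelligibility gains by the relative frequency
    # of each speech feature category to give a single composite measure.
    return sum(weights[cat] * gain for cat, gain in per_category_gain.items())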

Appendix 7: Practical Considerations

A7.1 Speech enhancement additions to CELP

Figure A7.1 shows a block diagram of the detail within the vehicular CELP system. Note that the functions inside the dotted area are those additional to a standard CELP system for the purpose of speech enhancement.

Figure A7.1: Speech enhancing CELP block diagram.

The various blocks presented in figure A7.1 are now explored in more detail, and described in terms of the required instructions per second (ips). The TETRA CELP coder operates on 240-sample windows at a sampling rate of 8kHz (and thus 33.3 30ms frames per second).

A7.1.1 Interior noise analysis

The enhancement system requires an analysis of the noise present in the environment of the listener in order to make ‘intelligent’ decisions regarding the type and degree of alteration to make to the speaker’s speech.

Noise analysis would generally consist of time domain elements to track the noise amplitude

and frequency domain elements to determine the spectrum of the noise. In addition, a method is needed to determine whether the current noise parameters in fact contain the speech of the listener (in which case they would be discarded and the previous frame’s noise analysis used instead).

Rather than conduct explicit noise analysis, it is possible to utilise the powerful analysis capabilities of the existing CELP uplink (ie the CELP encoder located in the vehicle).

Amplitude tracking would consider the CELP gain parameter for each analysis frame, and construct a measure based on past frames, probably requiring one multiply-accumulate operation per frame, or 33 ips.
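A minimal sketch of such a tracker, assuming a simple exponentially weighted running average (the smoothing constant beta is an assumption, not a value specified by the system):

def track_noise_amplitude(estimate, frame_gain, beta=0.9):
    # One multiply-accumulate per frame: fold the current CELP gain
    # parameter into a running noise amplitude estimate.
    return estimate + (1.0 - beta) * (frame_gain - estimate)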

Speech detection would require a comparison of the CELP parameters for each frame, with 11 additions for LSP measure construction, and up to 4 comparisons for parameter interpretation, giving a total of 15*33.3 = 500 ips.

Finally, the noise spectrum must be derived for perceptual weighting and comparison with the speech spectrum. This would actually be derived from the raw LPC parameters (rather than the more complex route from LSP parameters), to yield a 200 element spectral array. The process requires 10 multiply-accumulates for the LPC polynomial, repeated 200 times for each spectral frequency. An overhead for indexing and finding angular frequencies may require 1*200 operations (if not implemented using a look-up table). Thus these calculations in each frame would be 2200 operations, or 73260 ips.
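The spectral derivation can be sketched as follows (a floating-point illustration assuming the LPC convention A(z) = 1 - Σ a_i z^-i; a fixed-point implementation would replace the complex exponentials with the look-up table mentioned above):

import numpy as np

def lpc_spectrum(a, npoints=200):
    # Evaluate the LPC synthesis filter magnitude |1/A(e^jw)| at npoints
    # frequencies between 0 and half the sampling rate; the polynomial
    # evaluation costs about len(a) multiply-accumulates per point.
    spectrum = np.empty(npoints)
    for k in range(npoints):
        w = np.pi * k / npoints
        A = 1.0 - sum(a[i] * np.exp(-1j * w * (i + 1)) for i in range(len(a)))
        spectrum[k] = 1.0 / abs(A)
    return spectrum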

A7.1.2 Speech analysis

Similar to the interior noise analysis, the CELP decoder is provided with spectral, amplitude and pitch information by the downlink. Construction of a spectral array would also require 73260 ips, and speech classification, like speech detection, needs the LSP measure construction and rather more comparisons (up to 8), or 633 ips.

A7.1.3 Hearing model

The hearing model applies a perceptual comparison to the speech and noise spectra in order to ascertain the audibility (and thus provide information on the intelligibility) of the currently decoded speech in the current background noise for an average listener.

Given that spectra have been provided to the model, that the equal loudness pre-emphasis is based on a precalculated array, and that the log calculation is performed by table look-up, the warping from frequency to Bark scale and the critical band convolution (to derive a 40-point Bark-domain weighted spectral array) together require around 350 operations per frame for each of the speech and noise spectra, or 23310 ips.
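The frequency-to-Bark warping follows directly from the 600 sinh(b/6) mapping used for Bark-scale LSP shifting elsewhere in the thesis; the sketch below uses a simplified rectangular band accumulation in place of the full critical band convolution, so the band shapes are an assumption:

import numpy as np

def hz_to_bark(f):
    # b = 6*asinh(f/600), the inverse of the 600*sinh(b/6) mapping.
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def bark_spectrum(spectrum, fs=8000.0, nbands=40):
    # Accumulate a 200-point linear-frequency array into a 40-point
    # Bark-domain array (simplified: rectangular bands, no spreading).
    f = np.linspace(0.0, fs / 2.0, len(spectrum), endpoint=False)
    edges = np.linspace(0.0, hz_to_bark(fs / 2.0), nbands + 1)
    idx = np.clip(np.digitize(hz_to_bark(f), edges) - 1, 0, nbands - 1)
    warped = np.zeros(nbands)
    np.add.at(warped, idx, spectrum)
    return warped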

A7.1.4 Formant detection

If speech is classified as not being in the fricative or non-speech classes, then formant detection must be used to determine the position of the first three spectral peaks. Differentiation of the 200 point LPC-derived spectrum requires 199 subtractions (and comparisons to determine zero-crossings); in the region of each zero-crossing, another 4 subtractions for second differentiation, 3 additions to average the peak value, and a comparison are performed. For a 10th order LPC spectrum there may typically be 5 such points. This leads to 199+199+(4x5)+(3x5) = 433 operations per frame, or 14420 ips.
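A sketch of the peak-picking by differentiation (names are illustrative, and the averaging of values around each peak is omitted for brevity):

import numpy as np

def find_formant_peaks(spectrum, npeaks=3):
    # First difference of the spectrum (199 subtractions for 200 points);
    # a positive-to-negative zero-crossing marks a spectral peak.
    d = np.diff(spectrum)
    peaks = [i for i in range(1, len(d)) if d[i - 1] > 0 and d[i] <= 0]
    # Keep the npeaks largest peaks, returned in ascending frequency order.
    peaks.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(peaks[:npeaks])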

A7.1.5 Intelligibility measure

The speech intelligibility measure compares the perceptually weighted speech spectrum and the perceptually weighted noise spectrum in the region of the three most prominent speech spectral peaks. Comparison takes the form of averaging the weighted speech to weighted noise amplitude values in the ranges around each formant peak. This can be performed using 18 additions and 4 multiply operations per frame, a total of 733 ips.
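A minimal sketch of the comparison (the half-width of the region examined around each formant is an assumed parameter):

import numpy as np

def formant_region_snr(speech_w, noise_w, peaks, halfwidth=3):
    # Average the perceptually weighted speech-to-noise amplitude ratio
    # over a small range around each of the three formant peaks.
    ratios = []
    for p in peaks:
        lo, hi = max(0, p - halfwidth), p + halfwidth + 1
        ratios.append(np.mean(speech_w[lo:hi]) / np.mean(noise_w[lo:hi]))
    return float(np.mean(ratios))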

For unintelligible voiced speech, the intelligibility measure investigates the regions around the formant positions to determine whether formant shifting can improve intelligibility. For this purpose, it conducts the same 22 operations, using the same formant data, but using noise data from shifted locations. In addition to the basic 22 operations, an array of correction data must be added in to the shifted values to account for the perceptual change in

formant amplitude should the frequency be shifted. This correction data is based upon the perceptual weighting obtained for purely white noise, and increases the 22 operations to 31 per frame. For a worst case analysis, in which every frame is unintelligible voiced speech, this adds another 1032 instructions, giving 1765 ips.

A7.1.6 Expert system

The expert system must use intelligibility results from the hearing model, and with consideration to the speech and noise spectra, and the available types of enhancement, direct such enhancements. The expert system decides which (if any) speech alterations to apply, and if necessary regulates the degree of adjustment.

The most efficient means of implementing such a system is through a series of binary decisions regarding intelligibility, speech type, noise type, absolute amplitude and formant shift applicability. There are thus a maximum of 5 levels to the decision tree, meaning only 5 comparisons per frame.

In addition, the selective amplification scheme requires the use of a gain multiplier which must be tracked and updated frame-by-frame (1 multiply-accumulate) and protected from overflow/underflow (1 decision and one load), totalling another 3 operations per frame.

The expert system thus requires around 266 ips.
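The decision tree might be sketched as below; the predicates and enhancement labels are illustrative rather than the exact rules used by the system:

def choose_enhancement(is_speech, intelligible, voiced, shift_helps):
    # At most five binary decisions per frame, as described above.
    if not is_speech:
        return 'none'
    if intelligible:
        return 'none'
    if not voiced:
        return 'selective_amplification'   # low-energy unvoiced speech
    if shift_helps:
        return 'formant_shift'
    return 'formant_widening'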

A7.1.7 Speech adjustment

Selective amplification causes modification of the amplitude of certain frames of speech. The actual amplification process is already performed by the CELP decoder using the CELP gain parameter. For selective amplification, only a single additional multiplication, performed on the CELP gain parameter, is required (33 ips).

For formant shifting by LSP adjustment, firstly the LSPs describing each formant are found through 9 subtractions and comparisons (for a 10th order, 10 LSP system), and then up to three formants, or the entire set of LSPs, are moved. For a non-Bark based shift, this requires a maximum of 10 multiply-accumulate operations, but for the Bark-based shift, the Bark calculation would be performed using a look-up table (as it involves log and sinh calculations). There would be 10 lookups, 10 additions and divisions, followed by another 10 lookups. With per-frame setup calculations, this may then total 1820 ips.

Formant narrowing or widening would simply require 9 subtractions and comparisons to locate formant-related LSPs, and then 6 multiply-accumulates to adjust the lines of the three pairs, totalling approximately 800 ips.
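For illustration, the pair adjustment can be written as in the following sketch, following eqns 1 and 2 of the paper reproduced in appendix 8 (positive alpha narrows a pair, negative alpha widens it):

def adjust_lsp_pair(lsp, i, alpha):
    # Two multiply-accumulates per pair: move lsp[i] and lsp[i+1]
    # symmetrically towards (alpha > 0) or away from (alpha < 0) each
    # other, sharpening or broadening the enclosed formant.
    gap = lsp[i + 1] - lsp[i]
    lsp[i] = lsp[i] + alpha * gap
    lsp[i + 1] = lsp[i + 1] - alpha * gap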

Note that only one type of speech enhancement would be chosen per frame, and thus the worst case for enhancement complexity is the Bark-based shift requirement of 1820 ips.

Appendix 8 Publications

Draft copies of some of the publications referred to in the thesis are reproduced in this appendix. Being draft copies, some small differences will be evident when compared to the final published document.

A8.1 Electronics Letters [67]

LSP analysis and processing for speech coders

Indexing terms: CELP, Speech Enhancement, LSP

Abstract: Linear prediction parameters within CELP coders are commonly represented by line spectral pairs (LSP), giving stable filters and efficient coding. However, LSP manipulation can also alter the frequencies of the represented signals. We use computationally efficient LSP manipulation to enhance the intelligibility of speech degraded by acoustic interference.

Introduction: Line spectral pairs are a mathematical transformation of the linear prediction parameters generated within a CELP coder [1]. They have risen to prominence because they are guaranteed to be stable even after quantization [2], and may be quantized with fewer bits than representations such as reflection coefficients or log-area ratios, whilst maintaining speech quality [3].

LSP values are based in the frequency domain as shown in fig. 1, where the linear prediction spectrum obtained from analysing a typical segment of speech is plotted. Lines drawn at the LSP frequencies derived from the linear prediction parameters are overlaid on this.

Speech alteration through LSP modification: The effects of moving lines may be determined by comparing the speech coder's original linear prediction spectrum with that obtained from the linear prediction parameters derived from an altered set of lines.

Fig. 2 compares an original spectrum with that obtained by altering four lines, by plotting the newly created spectrum and the difference between this and the original spectrum. In fig. 2, lines that are moved closer together (B) have resulted in a higher amplitude, sharper peak between them. Lines that are further apart (A) have resulted in a wider, lower amplitude spectral peak between them. The effects on the underlying spectrum of modifying a line are predominantly confined to the immediate frequency region [4].

Considering that the original spectrum in fig. 2 was derived from speech, and that the three spectral peaks represent formants, the effect of the LSP operations has been to either 'sharpen' an individual formant (reducing formant bandwidth) or widen it (increasing formant bandwidth), and to alter relative amplitudes.

Interpretation of LSPs: Spectral peaks are usually bracketed quite closely by LSP line pairs, with the degree of closeness being dependent upon the sharpness of the spectral peak, and its amplitude. Note that in fig. 1, the three largest spectral peaks correspond with the 3 narrowest pairs of LSPs, and that this ordering is analogous to that given by the bandwidths of the three spectral peaks.

Figure 2. LSP-modified speech frame with overlaid LSPs (top) and a plot of the change between the original spectrum of fig. 1 and the modified spectrum (bottom).

Speech enhancement by formant spreading: Altering the positions of LSPs has been shown to change the underlying spectrum. In the case of vowels, moderately widening the lines corresponding to spectral peaks (increasing the bandwidths of the formants) has been found to improve intelligibility.

Figure 1. LPC spectral plot of a speech frame with LSPs overlaid (odd shown as solid and even as dotted lines).

Fig. 3 demonstrates the effect on the LPC-derived spectrum of widening the three pairs of lines that correspond to the three spectral peaks. It can be seen that the peaks have become broader but that the spectral valleys between the peaks have also increased in amplitude.

Figure 3. 20% formant widened speech frame with overlaid LSPs (top) and a plot of the difference between this and the original spectrum (bottom).

LSP alteration such as described here carries a minimal processing overhead compared to alternative methods. For example, the action of narrowing or widening formant bandwidths may alternatively be performed with an adaptive filter (such as the perceptual error weighting filter of [5]) which has the form of a p-order IIR filter. If operating over a window of N samples, this would entail 2Np multiplications per window. For the case of LSP alteration, only p-1 comparisons and p+5 multiply-accumulate instructions are required to isolate and alter the three main formants in similar ways.

Testing: intelligibility tests were constructed using a set of LSP-modified voiced syllables, processed to widen or narrow the three spectral peaks corresponding to the three narrowest LSP pairs (preselected as corresponding to the three main formants). Normalization was applied to the processed speech, which was then interposed between framing syllables and mixed with noise before presentation to the listener. White noise filtered to create a spectral representation of vehicle interior noise [6] was used.

Alternative forced choice questions were applied to a number of listeners within an anechoic chamber. Testing determined mixing SNR values required to span the range between just-intelligible and just-unintelligible (for the given syllables for average listeners), and determined the degree of formant alteration that generated the highest average increase in intelligibility.

Other authors have applied complex normalization functions to formant bandwidth-altering algorithms [7], however such functions may themselves improve intelligibility in such tests irrespective of the formant-altering function. We thus normalized processed speech to be equivalent either by-amplitude or by-power to non-processed speech.

Results: Preliminary testing identified that the optimum degree of LSP alteration of those tested was to induce a widening in the lines describing each formant of 20%, when SNR was set so that the average recognition rate for all listeners was 52.4%. Results obtained from testing on 18 volunteers were analysed. 17 of the 18 subjects found the processed speech more intelligible, with the minimum improvement in syllable recognition rate (to a 95% confidence level) found to be 21%.

Conclusion: It has been shown that the transformation of LSPs can provide a useful means of manipulating the frequency spectrum of speech waveforms. An algorithm for the enhancement of speech intelligibility in the presence of acoustic noise has been shown to be effective by subjective testing. Such a scheme is computationally much more efficient than alternatives, such as adaptive filtering, if LSPs are available.

I.V.McLoughlin, R.J.Chance
School of Electronic and Electrical Engineering, The University of Birmingham, Edgbaston, Birmingham, UK, B15 2TT
[email protected], [email protected]

References
[1] Goalic A, Saoudi S, "An intrinsically reliable and fast algorithm to compute the line spectrum pairs (LSP) in low bit-rate CELP coding", ICASSP, pp. 728-731, 1995.
[2] Saito S, Nakata K, Fundamentals of Speech Signal Processing, Academic Press, 1985, chapter 9.
[3] Kabal P, Ramachandran RP, "The computation of line spectral frequencies using Chebyshev polynomials", IEEE Trans. ASSP, vol. ASSP-34, no. 8, pp. 1419-1425, 1986.
[4] Paliwal KK, "On the use of line spectral frequency parameters for speech recognition", Digital Signal Processing, vol. 2, pp. 80-87, 1992.
[5] Chen JH, Cox RV, Lin Y, Jayant N, Melcher MJ, "A low delay CELP coder for the CCITT 16kb/s speech coding standard", IEEE J. Selec. Areas Comms, vol. 10, no. 5, pp. 830-849, 1992.
[6] Tempest W, The Noise Handbook, Academic Press, 1985, chapter 9.
[7] Schaub A, Straub P, "Spectral sharpening for speech enhancement/noise reduction", ICASSP, pp. 993-996, 1991.

A8.2 DSP '97 [66]

Analysis and modification of LSPs for speech intelligibility enhancement I.V.McLoughlin, R.J.Chance School of Electronic and Electrical Engineering The University of Birmingham Edgbaston, Birmingham B15 2TT, U.K. [email protected], [email protected]

CELP coders commonly use line spectral pairs (LSP) to represent linear prediction parameters, giving stable filters and efficient coding. LSP ordering is related to the spectral properties of the underlying signal, such that analysis of LSP positions can reveal useful information about the frequency distribution of that signal. In addition, LSP manipulation can alter frequencies within the represented signal. This paper describes computationally efficient LSP-based methods of speech analysis and speech modification, and the application of these methods to enhance the intelligibility of speech degraded by acoustic interference.

1 Introduction

Line spectral pairs (LSP) are a mathematical transformation of linear prediction parameters as generated and used within many speech compression systems, such as CELP coders [1]. Their usage originates from their stability even after quantization [2], and from the fact that they may be quantized with fewer bits than representations such as reflection coefficients or log-area ratios, with less reduction in speech quality [3].

LSPs describe the two resonance conditions arising from an interconnected tube model of the human vocal tract (which is a consequence of the linear prediction representation [4]). These conditions relate to the modelled vocal tract being either fully open or fully closed at the glottis, with the consequent resonance frequencies being the line spectral frequencies. In reality, as the glottis is opened and closed rapidly, resonances occur at frequencies somewhere between the two extremes. The LSP representation thus has a significant physical basis.

LSPs are resident in the frequency domain as shown in fig. 1, which shows lines drawn at the LSP frequencies derived from the linear prediction parameters, overlaid on the linear prediction spectrum. The linear prediction parameters were obtained from performing a 10th order linear predictive analysis on a 20ms frame of voiced speech.

Figure 1. Plot of LPC spectrum with LSPs overlaid (LSP line numbers 1 to 10).

2 Interpretation of LSPs

Spectral peaks are usually bracketed quite closely by LSP line pairs, with the degree of closeness being determined by the sharpness of the underlying spectral peak, and its amplitude. Note that in fig. 1, the three tallest spectral peaks correspond to the 3 narrowest pairs of LSPs (between lines 1 and 2, 5 and 6, and 7 and 8 respectively), and that the separation of each line pair relates to the bandwidth of the corresponding spectral peak.

3 LSP-based processing

Within a typical CELP speech codec system, LSP data is derived from an LPC analysis of a frame of speech. The LSPs are quantized and transmitted from encoder to decoder, where they are reconstructed and used within a synthesis filter, as shown in fig. 2.

Figure 2. LSP alteration process located within a CELP coder-decoder system.

In the diagram of fig. 2, the LSP process block illustrates the functional location of LSP-based analysis and modification functions within the CELP codec. The process block receives LSP data as transmitted from the coder, can analyse this data and then alter it before the data is used to generate LPC coefficients and then reconstruct speech.

4 LSP analysis

As LSP position relates to the frequency spectrum of the underlying signal, it is reasonable to expect that analysis of LSP data can reveal facts about the spectrum.

Figure 3. LPC spectrum derived from an altered set of LSPs (overlaid and numbered).

Analysis methods may be divided into instantaneous and continuous procedures, the former being used to interpret a single frame of speech and the latter being used to interpret speech features over time.

Possibly the most useful instantaneous LSP analysis procedure is the detection of speech frequency resonances; the spectral peaks or formants of fig. 1 are an example. The measure simply compares the separation in frequency between each consecutive pair of lines, with narrower separations meaning larger spectral peaks. For speech analysis purposes, the centre frequency between the three closest pairs of lines corresponds well with the frequencies of the three most prominent formants as determined by other methods such as solving the LPC coefficient equation or differentiating the spectrum.

Continuous LSP analysis methods include determining overall spectral 'tilt' and degree of voicing. Spectral 'tilt' can be measured by comparing the distribution of LSP values to those obtained for a reference spectrum (such as the flat spectrum of white noise), and this measure can be used, for example, to detect periods of fricative speech. Measurement of the presence, position and amplitude (by LSP separation) of spectral peaks can indicate the degree of voicing present in a given period of speech.

5 The effects of LSP adjustment

The effects of moving lines may be determined by comparing the original linear prediction spectrum with that derived from an altered set of lines. Angular frequencies of 0 and π both demonstrate the properties of virtual lines, as can be seen in fig. 3 where line 10 has been moved closer to 4kHz: a spectral peak has formed between the line and the virtual line located at the angular frequency of π.

Paliwal [5] reports that the effects on the underlying spectrum of modifying a line are predominantly confined to the immediate frequency region. Note however that amplitude changes in one region will cause some compensatory power redistribution in other regions.

The original LPC data from which the spectral plots of figs. 1 and 3 were calculated was derived from a recording of speech, so that the three spectral peaks represent formants. LSP operations have thus been demonstrated to alter formant bandwidths and shift the position of formants.

The LSP operations to derive fig. 3 are performed as follows. If ω_i represent the LSP frequencies, where i = 1...p (with the order p being 10), and ω'_i the altered frequencies, then narrowing line pair {1:2} by degree α would be achieved by:

    ω'_1 = ω_1 + α(ω_2 - ω_1)    (1)
    ω'_2 = ω_2 - α(ω_2 - ω_1)    (2)

and shifting line k by degree γ may be achieved by:

    ω'_k = ω_k{π + (γ - 1)(π - ω_k)}/π    (3)

This method of adjusting line pairs consequentially alters the frequency relationship between the formants being shifted, and so degrades the perceived quality of the underlying speech. To reduce quality degradation, a perceptual basis for line shifting was introduced, whereby frequencies were altered by a constant bark. If b_k is the bark corresponding to frequency ω_k then the line shifted by degree δ would be given by:

    ω'_k = 600 sinh{(b_k + δ)/6}    (4)

In such a case, a hard limit is applied to prevent LSP values approaching π. Fig. 4 illustrates the upward shift factor applied to LSPs by eqns. 3 and 4 with γ = 1.5 and δ = 1.

Figure 4. LSP upward shift for γ = 1.5 derived from eqn. 3 (solid line) and δ = 1 derived from eqn. 4 (dotted line).

The LSP narrowing enhancement scheme requires two operations per line, or 6 operations for a 3-formant frame. Shifting using eqn. 3 also requires two operations per line, or 20 operations to shift all lines in a frame. LSP shifting using eqn. 4 requires around 6 operations per line.

Similar bandwidth-altering and formant shifting effects can be produced using an adaptive filter such as that implemented by Schaub and Straub [6], however this requires at least 2Np operations per N-sample frame. For a typical 10th-order analysis system with frame size of 240 samples, the LSP processes described here are respectively 800, 480 and 80 times more efficient, provided that LSP data is already available (which is the case for many CELP coders). If LSP data is not available however, it is likely that the overheads required to transform LPC coefficients to and from LSPs are greater than any efficiency gain.

5 Speech enhancement testing

In order to determine the effects of the two LSP processes described above, tests were conducted to measure the intelligibility of certain processed and unprocessed vowels, when replayed to listeners in an acoustically noisy environment. Prospective methods of intelligibility enhancement under investigation were formant frequency movement and formant bandwidth alteration as described in section 4.

In the tests, processing was applied to chosen vowels which were then normalized before being mixed with shaped noise and presented to listeners. The noise was created to model the acoustic interference within an average car interior [7].

Listeners were presented with a series of vowels delimited by unprocessed consonants (voiced plosives), and asked to choose which of two vowels they had heard. Overall amplitude was set to a comfortable level by each listener, seated within an anechoic chamber, and presented with example questions prior to the commencement of testing. The relative amplitude of each sound was confined to a narrow range such that approximately 50% of vowels were correctly identified for average listeners, with every phoneme being individually normalized by amplitude.

Each vowel was interposed between each of three different consonant pairs, with instances of four different speech-noise amplitude ratios. Formant bandwidth altered vowels, formant shifted vowels, and a control (unprocessed) version of each vowel were included in a randomly-ordered question list, which was extended by repeating the final group of questions at the beginning to minimise learning effects.

Acoustic noise mixed with the speech presented to listeners consisted of a normally distributed signal shaped by a 20th order IIR filter to resemble car interior noise. The filter transfer function was created to replicate the graphs of average vehicle interior noise power given in [7]. The noise exhibited an extreme low-frequency bias, with the amplitude falling off significantly as frequency increases.

6 Choice of enhancement types

A formant shift was designed to move all formants upward in frequency by a small amount. As frequency increases, the power of the vehicle interior noise decreases, thus improving the signal-to-noise ratio at that frequency. For formant movements, the 'formant-to-noise' ratio is thus increased. Too great a formant shift results in speech quality degradation.

Subjective tests determined that an upward movement in formants using γ = 1.5 produced a noticeable intelligibility improvement but did not significantly reduce perceived quality.

Preliminary testing also determined the speech intelligibility alteration that could be expected from applying different degrees of formant bandwidth widening and narrowing.

It was found that a formant bandwidth change of α = -0.7 typically resulted in an intelligibility improvement, and did not significantly reduce the subjective quality of the vowel sounds.

7 Results

18 volunteers were tested in the manner described in section 6. In 17/18 cases, formant bandwidth adjustment improved recognition rates, and in 16/18 cases, formant shifting improved recognition rates.

Table 1 lists both the average recognition rates for all listeners and the enhancement factor. Recognition rate is defined as the percentage of correct replies given by all of the listeners in the tests for the given conditions. The enhancement factor is defined as the average improvement in recognition rate over unenhanced speech, for speech altered using the relevant LSP process.

condition      recognition   improvement
all vowels     52.4%         -
unenhanced     44.4%         -
position       54.2%         1.22
bandwidth      58.6%         1.32

Table 1. Average recognition rates, and improvement factor attributed to the LSP-based enhancements.

Further statistical analysis reveals that, to a 95% confidence level, the formant shift improved intelligibility by 15% and that formant bandwidth adjustment improved intelligibility by 21%.

8 Conclusion

It has been shown that the transformation of LSPs can manipulate the frequency spectrum of speech waveforms in a useful fashion. Two algorithms for the enhancement of speech intelligibility in the presence of acoustic noise have been shown to be effective by subjective testing results. Both LSP processing schemes presented in this paper are significantly more efficient to compute than alternatives such as adaptive filtering.

Existing CELP coder implementations that utilise LSP data may integrate a speech-enhancement processing element if the target environment is acoustically noisy. Such an addition would be computationally simple to integrate into the CELP structure.

9 References

[1] Goalic A, Saoudi S, "An intrinsically reliable and fast algorithm to compute the line spectrum pairs (LSP) in low bit-rate CELP coding", ICASSP, pp. 728-731, 1995.
[2] Saito S, Nakata K, Fundamentals of Speech Signal Processing, Academic Press, 1985, chapter 9.
[3] Kabal P, Ramachandran RP, "The computation of line spectral frequencies using Chebyshev polynomials", IEEE Trans. ASSP, vol. ASSP-34, no. 8, pp. 1419-1425, 1986.
[4] Sugamura N, Itakura F, "Speech analysis and synthesis methods developed at ECL in NTT - from LPC to LSP", Speech Communication, vol. 5, pp. 199-215, 1986.
[5] Paliwal KK, "On the use of line spectral frequency parameters for speech recognition", Digital Signal Processing, vol. 2, pp. 80-87, 1992.
[6] Schaub A, Straub P, "Spectral sharpening for speech enhancement/noise reduction", ICASSP, pp. 993-996, 1991.
[7] Tempest W, The Noise Handbook, Academic Press, 1985, chapter 9.

A8.3 DSP '97 [66] poster

Improving Speech Intelligibility by Line Spectral Pair Adjustment

CELP Coder Structure
o For replaying uncorrupted speech in a noisy environment
o Alters speech characteristics to improve intelligibility
o Figure shows the CELP coder-decoder structure, ignoring the coder quantisation
[Figure: CELP coder-decoder block diagram]

Line Spectral Pairs
o Are resident in the frequency domain
o Relate predictably to the underlying spectrum
o Can be readily altered and remain stable
o CELP transmits LSP, pitch and gain information from coder to decoder
[Figure: LSP lines 1-12 overlaid on an LPC magnitude spectrum; vertical axis in dB, horizontal axis frequency, Hz]

LSP Adjustment

o Adjusting LSP values alters the recreated output speech
o LSP adjustment is based upon the current speech being decoded and analysis of the background noise
[Figure: spectrum showing original and adjusted LSP lines]

o Shifting line k by a degree γ can be achieved using:

\hat{\omega}_k = \omega_k \{ \pi + (\gamma - 1)(\pi - \omega_k) \} / \pi \qquad (3)
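The poster reproduction is heavily degraded, so eqn. (3) above is a best-effort recovery: the division by π is an assumption made so that the shift factor runs from γ at low frequencies down to 1 at the Nyquist line, keeping shifted lines inside (0, π). A minimal C sketch of such a shift applied to a decoded LSP vector, with hypothetical names throughout (lsp_shift, NLSP), not taken from the thesis code:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define NLSP 10 /* order of the LPC analysis; 10 is typical, assumed here */

/* Shift each LSP angular frequency w[k], 0 < w[k] < pi, by degree gamma
   following eqn. (3) as recovered above. gamma > 1 moves lines upward,
   with the shift tapering to zero at the Nyquist frequency (w = pi),
   so line ordering is preserved and all lines stay inside (0, pi). */
void lsp_shift(double w[NLSP], double gamma)
{
    int k;
    for (k = 0; k < NLSP; k++)
        w[k] = w[k] * (M_PI + (gamma - 1.0) * (M_PI - w[k])) / M_PI;
}

Under this reading, with γ = 1.5 as used in the preliminary tests above, a line near d.c. is raised by a factor approaching 1.5 while a line at π is left unmoved.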

o To introduce a perceptual basis for the degree of LSP shifting, a Bark-based scheme may be used:

\omega_k = 600 \sinh\{ (b_k + d) / 6 \}
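The 600 sinh(b/6) form is the standard approximate Bark-to-Hz mapping, with inverse b = 6 sinh^{-1}(f/600), so a fixed displacement d in the Bark domain moves each line by a perceptually equal amount. A small C sketch, again with hypothetical naming, assuming the line frequency is held in Hz:

#include <math.h>

/* Move a line spectral frequency (in Hz) by d critical bands:
   convert Hz -> Bark with b = 6*asinh(f/600), apply the Bark-domain
   offset, then convert back with f = 600*sinh(b/6). */
double lsp_bark_shift(double f_hz, double d_bark)
{
    double b = 6.0 * asinh(f_hz / 600.0);
    return 600.0 * sinh((b + d_bark) / 6.0);
}

Because the Bark scale is approximately linear below about 500 Hz and logarithmic above, an equal Bark shift translates to a larger movement in Hz for higher-frequency lines.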

[Figure: degree of shift against frequency, from eqn. (3) and from the Bark-based scheme; vertical axis 1.6-2.0]

Speech Intelligibility-Enhancing CELP Coder Structure
[Figure: block diagram in which input speech plus noise is replayed to the listener through an enhancing speech decoder in place of the plain speech decoder]

o 15 listeners heard, in total, more than 1100 different C-V-C triplets
o Instances of control (unprocessed), LSP position-shifted and LSP bandwidth-altered speech were replayed in random order to each of the listeners
o The acoustic background noise used was filtered pseudo-random noise with a spectrum corresponding to average car interior noise

Enhancing Speech Intelligibility
o Move formant frequencies away from interfering tones, or shift formants to frequency regions with lower noise amplitude
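The formant-movement principle stated in the final bullet amounts to a comparison against the measured noise spectrum. A schematic C fragment of one such decision, where choose_formant_target and the noise_psd accessor are illustrative names, not from the thesis:

/* Pick the candidate formant position exposed to less noise: compare
   the background-noise power at the up-shifted and down-shifted
   frequencies and return whichever target is quieter. */
double choose_formant_target(double f_hz, double shift_hz,
                             double (*noise_psd)(double))
{
    double up = f_hz + shift_hz;
    double down = f_hz - shift_hz;
    return noise_psd(up) < noise_psd(down) ? up : down;
}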

References

1 Ainsworth WA, Advances in Speech, Hearing and Language Processing, JAI Press Ltd, 1990.
2 Alcantera JI, Dooley GJ, Blamey PJ, Seligman PM, “Preliminary evaluation of a formant enhancement algorithm on the perception of speech in noise for normally hearing listeners”, J. Audiology, vol. 33, no. 1, pp. 15-27, 1994.
3 American National Standards Institute, “American national standard method for measuring the intelligibility of speech communication systems”, standard S2.3-1989, Acoustical Society of America, 1990.
4 Atal BS, “Predictive coding of speech at low bit-rates”, IEEE Trans. on Communications, vol. COM-30, pp. 600-, 1982.
5 Atal BS, Hanauer SL, “Speech analysis and synthesis by linear prediction of the speech wave”, J. Acoustical Soc. America, vol. 50, pp. 637-655, 1971. Reprinted in Flanagan JL, Rabiner LR (Ed), Speech Synthesis, Dowden Hutchinson and Ross Inc., 1973.
6 Atal BS, Schroeder MR, “Predictive coding of speech signals and subjective error criteria”, IEEE Trans. ASSP, vol. ASSP-27, 1979.
7 Azirani A, Jeannes R, Faucon G, “Optimizing speech enhancement by exploiting masking properties of the human ear”, ICASSP, pp. 800-803, 1995.
8 Bagshaw PC, Hiller SM, Jack MA, “Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching”, from CSTR website, Uni. of Edinburgh, April 1997.
9 Beranek LL, “The design of speech communications systems”, Proc. IRE, pp. 880-890, September 1947.
10 Blamey PJ, Dowell RC, Clark GM, “Acoustic parameters measured by a formant-estimating speech processor for a multiple-channel cochlear implant”, J. Acoustical Soc. America, vol. 82, no. 1, pp. 38-47, 1987.
11 Boll SF, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Trans. ASSP, vol. 27, no. 2, pp. 113-120, 1979.
12 Boyd I, Southcott CB, “A speech codec for the skyphone service”, B.T. Technical Journal, vol. 6, no. 2, pp. 50-, 1988.
13 British Standards Institute, BS6086: “Measurement of Noise Inside Motor Vehicles”, 1981.
14 CCITT (ITU), “Coding of speech at 16kbit/s using low-delay code excited linear prediction”, Recommendation G.728, September 1992.
15 Chan C, “Computation of LSP parameters from reflection coefficients”, Electronics Letters, vol. 27, no. 19, pp. 1773-1774, 1991.
16 Chen JH, Cox RV, Lin Y, Jayant N, Melcher MJ, “A low delay CELP coder for the CCITT 16kb/s speech coding standard”, IEEE J. Selec. Areas Comms, vol. 10, no. 5, pp. 830-849, 1992.

17 Chen H, Wong WC, Ko CC, “Comparison of pitch prediction and adaptation algorithms in forward and backward adaptive CELP systems”, IEE Proc.-I, vol. 140, 1993.
18 Cheng YM, O’Shaughnessy D, “Speech enhancement based conceptually on auditory evidence”, ICASSP, pp. 961-963, 1991.
19 Cheng YM, O’Shaughnessy D, “Speech enhancement based conceptually on auditory evidence”, IEEE Trans. Sig. Pro., vol. 39, no. 9, pp. 1943-1954, 1991.
20 Chiesa A, “Experimental studies on noise inside cars”, J. Sound Vib., vol. 1, no. 2, pp. 211-225, 1964.
21 Cracknell P, “Asphalt Bungle”, Motoring & Leisure, the magazine of the Civil Service Motoring Association, pp. 12-13, March 1995.
22 Crawford DH, Stewart RW, Toma E, “Digital signal processing strategies for active noise control”, IEE Electronics & Comms Engineering Journal, vol. 9, no. 2, pp. 81-89, April 1997.
23 Darwin CR, Gardner RB, “Mistuning a harmonic of a vowel: Grouping and phase effects on vowel quality”, J. Acoustical Soc. America, vol. 79, pp. 838-845, 1986.
24 Doelle LL, Environmental Acoustics, section 3.5, McGraw-Hill, 1972.
25 Dowling REP, Turner LF, “Modelling the detectability of changes in auditory signals”, ICASSP, pp. 133-136, 1993.
26 Drake LA, Rutledge JC, Cohen J, “Wavelet analysis in recruitment of loudness compensation”, IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3306-3312, December 1993.
27 Drullman R, “Speech intelligibility in noise: Relative contributions of speech elements above and below the noise level”, J. Acoustical Soc. America, vol. 93, no. 3, pp. 1796-1798, 1995.
28 Duncan-Luce R, Sound and Hearing: a conceptual introduction, Lawrence Erlbaum and Associates, 1993, part IV.
29 Erricker BC, Advanced General Statistics, Hodder and Stoughton, 1971.
30 Erzin E, Cetin A, Yardimci Y, “Subband analysis for robust speech recognition in the presence of car noise”, ICASSP, pp. 417-420, 1995.
31 Fant G, Tatham M (Ed), Auditory Analysis and Perception of Speech, Academic Press, 1975.
32 Ganong WF, Review of Medical Physiology, chapter 9, 9th edition, Lange Medical Publications, 1979.
33 Gao Y, Huang T, Haton JP, “Central auditory model for spectral processing”, ICASSP, pp. 704-707, 1993.
34 Galand CR, Menez JE, Rosso MM, “Adaptive code excited predictive coding”, IEEE Trans. Sig. Pro., vol. 40, no. 6, pp. 1317-1326, 1992.

35 Gerson IA, Jasiuk MA, “Techniques for improving the performance of CELP type speech coders”, ICASSP, pp. 205-208, 1991.
36 Glaskin M, “Earplugs let speech through”, Sunday Times, p. 10, 17 December 1995.
37 Goalie A, Saoudi S, “An intrinsically reliable and fast algorithm to compute the line spectrum pairs (LSP) in low bit-rate CELP coding”, ICASSP, pp. 728-731, 1995.
38 Guylain R, Kabal P, “Wideband CELP speech coding at 16kbits/sec”, ICASSP, pp. 17-20, 1991.
39 Hardwick J, Yoo CD, Lim JS, “Speech enhancement using the dual excitation speech model”, ICASSP, pp. 367-370, 1991.
40 Harland DG, “Rolling noise and vehicle noise”, J. Sound Vib., vol. 43, no. 2, pp. 305-315, 1975.
41 Hawley M, Speech Intelligibility and Speaker Recognition, Dowden, Hutchinson & Ross Inc, 1977.
42 Hermansky H, “Perceptual linear predictive (PLP) analysis of speech”, J. Acoustical Soc. America, vol. 87, no. 4, pp. 1738-1752, April 1990.
43 Hess WJ, “Pitch determination - An example for the application of signal processing methods in the speech domain”, Signal Processing Theories and Applications; Proc. EURASIP, pp. 625-634, 1980.
44 “ISO/MPEG-Audio standard layers”, editorial pages of Studio Sound, pp. 40-41, July 1992.
45 Jamieson DG, Brennan RL, Cornelisse LE, “Evaluation of a speech enhancement strategy with normal-hearing and hearing impaired listeners”, Ear and Hearing, vol. 16, no. 3, pp. 274-286, 1995.
46 Jayant N, “Digital Coding of Wideband Audio”, Tutorial #3, April 26, 1993, presented during ICASSP 1993.
47 Jayant N, Johnston, Safranek, “Signal compression based on models of human perception”, Proc. IEEE, vol. 81, no. 10, pp. 1383-1421, October 1993.
48 Kabal P, Ramachandran R, “The computation of line spectral frequencies using Chebyshev polynomials”, IEEE Trans. ASSP, vol. 34, no. 6, pp. 1419-1425, 1986.
49 Keele CA, Neil E, Samson Wright’s Applied Physiology, section 40, 12th edition, Oxford University Press, 1973.
50 Kleijn BW, Krasinski DJ, Ketchum RH, “Fast methods for the CELP speech coding algorithm”, IEEE Trans. on ASSP, vol. 38, 1990.
51 Knagenhjelm H, Kleijn W, “Spectral dynamics is more important than spectral distortion”, ICASSP, pp. 732-735, 1995.
52 Kroon P, Swaminathan K, “A high quality multirate real time CELP coder”, IEEE J. Selec. Areas Comms., vol. 10, no. 5, pp. 850-857, 1992.
53 Kryter K, The Effects of Noise on Man, second edition, Academic Press, 1985.
54 Kryter K, The Handbook of Hearing and the Effects of Noise, chapter 4, Academic Press, 1994.

55 chapter 4 of [54]
56 Kuo CC, Jean FR, Wang HC, “Speech classification embedded in adaptive codebook search for CELP coding”, ICASSP, vol. II, pp. 147-151, 1993.
57 Laflamme C, Adoul JP, Salami R, Morissette S, Mabilleau P, “16kbps wideband speech coding technique based on algebraic CELP”, ICASSP, pp. 13-16, 1991.
58 Lee JI, Un CK, “On reducing computational complexity of codebook search in CELP coding”, IEEE Trans. Comms., vol. 38, no. 11, 1990.
59 Licklider J, “Effects of amplitude distortion upon the intelligibility of speech”, J. Acoustical Soc. America, vol. 18, no. 2, pp. 429-434, 1946.
60 Lim J (Ed), Speech Enhancement, Prentice Hall, 1983.
61 Makhoul J, “Linear prediction: a tutorial review”, Proc. IEEE, vol. 63, no. 4, pp. 561-580, 1975.
62 Markel J, Gray A, Linear Prediction of Speech, Springer-Verlag, 1976.
63 Martens JP, “A new theory for multitone masking”, J. Acoustical Soc. America, vol. 72, no. 2, pp. 397-, 1982.
64 McCandless SS, “An algorithm for automatic formant extraction using linear prediction spectra”, IEEE Trans. ASSP, vol. 22, no. 2, pp. 135-141, 1974.
65 McLoughlin IV, Chance RJ (both of the University of Birmingham), Simoco International, Cambridge UK, “Method and apparatus for speech enhancement in a speech communications system”, patent no. 9714001.6, lodged with the UK patent office, 2nd July 1997.
66 McLoughlin IV, Chance RJ, “LSP-based speech enhancement”, 13th International Conference on DSP, Santorini, Greece, July 1997.
67 McLoughlin IV, Chance RJ, “LSP analysis and processing for speech coders”, IEE Electronics Letters, vol. 33, no. 9, p. 743, April 1997.
68 McLoughlin IV, “CELP and Speech Enhancement”, M.Phil(Q) thesis, The University of Birmingham, April 1996.
69 Moore B, An Introduction to the Psychology of Hearing, chapter 1, 3rd edition, Academic Press, 1992.
70 chapter 3 of [69]
71 chapter 7 of [69]
72 chapter 8 of [69]
73 Moore BCJ, Hearing, 2nd edition, Academic Press, 1995.
74 Niederjohn R, Grotelueschen J, “The enhancement of speech intelligibility in high noise levels by high-pass filtering followed by rapid amplitude compression”, IEEE Trans. ASSP, vol. 24, no. 4, pp. 277-282, 1976.
75 Nikias CL, Raghuveer MR, “Bispectrum estimation: A digital signal processing framework”, Proc. IEEE, vol. 75, no. 7, pp. 869-891, July 1987.

76 Olive J, “Automatic formant tracking by a Newton-Raphson technique”, J. Acoustical Soc. America, vol. 50, no. 2, part 2, pp. 661-670, 1971.
77 Ovens M, “Speech intelligibility testing: summary of methods”, an internal Simoco International report, published 16th August 1996.
78 Oyama G, “A stochastic model of excitation source for linear prediction speech analysis-synthesis”, ICASSP, pp. 941-944, 1985.
79 Paksoy E, Srinivasan K, Gersho A, “Variable rate speech coding with phonetic segmentation”, ICASSP, pp. 155-158, 1993.
80 Paliwal KK, “On the use of line spectral frequency parameters for speech recognition”, Digital Signal Processing, vol. 2, pp. 80-87, 1992.
81 Pickett J, The Sounds of Speech Communication, Allyn and Bacon, 1980.
82 chapter 1 of [81]
83 chapter 2 of [81]
84 chapter 4 of [81]
85 Pollack I, “Speech communications at high noise levels: the roles of a noise-operated automatic gain control system and hearing protection”, J. Acoustical Soc. America, vol. 29, no. 12, pp. 1324-1327, 1957.
86 Priede T, “The effect of operating parameters on sources of vehicle noise”, J. Sound Vib., vol. 43, no. 2, pp. 239-252, 1975.
87 Rabiner LR, “Applications of voice processing to telecommunications”, Proc. IEEE, vol. 82, no. 2, pp. 199-228, 1994.
88 Rabiner LR, Schafer RW (Eds), Digital Processing of Speech Signals, Prentice-Hall, 1978.
89 Raghuveer MR, Nikias CL, “Bispectrum estimation: A parametric approach”, IEEE Trans. ASSP, vol. ASSP-33, no. 4, pp. 1213-1230, October 1985.
90 Saito S, Nakata K, Fundamentals of Speech Signal Processing, Academic Press, 1985.
91 chapter 9 of [90]
92 appendix 21 of [90]
93 appendix 22 of [90]
94 Saoudi S, Boucher J, Guyader A, “A new efficient algorithm to compute the LSP parameters for speech coding”, Signal Processing, vol. 28, no. 2, pp. 201-212, 1992.
95 Schaub A, Straub P, “Spectral sharpening for speech enhancement/noise reduction”, ICASSP, pp. 993-996, 1991.
96 Schaefer RW, Rabiner LR, “Digital representations of speech signals”, Proc. IEEE, vol. 63, no. 4, pp. 662-677, 1975.
97 Schaefer RW, Rabiner LR, “System for automatic formant analysis of voiced speech”, J. Acoustical Soc. America, vol. 47, no. 2 (part 2), pp. 634-648, 1970.
98 Schroeder MR, Atal BS, “Code-excited linear prediction (CELP): high-quality speech at very low bit rates”, ICASSP, pp. 937-940, 1985.

99 Schroeder MR, Atal BS, Hall JL, “Optimizing digital speech coders by exploiting masking properties of the human ear”, J. Acoustical Soc. America, vol. 66, no. 6, pp. 1647-, 1979.
100 Sears WG, Anatomy and Physiology for Nurses and Students of Human Biology, chapter 10, 4th edition, Arnold, 1967.
101 Sen D, Holmes WH, “Perceptual enhancement of CELP speech coders”, ICASSP, pp. 105-108, 1993.
102 Sen D, Irving DH, Holmes WH, “Use of an auditory model to improve speech coders”, ICASSP, vol. II, pp. 411-415, 1993.
103 Sharp DWN, White RL, “Determining the pitch period of speech using no multiplications”, ICASSP, vol. II, pp. 527-531, 1993.
104 Sharpley AD, “Summary of speech intelligibility testing methods”, from Dynastat Inc. (internet page http://www.realtime.net/dynastat/), May 1996.
105 Slifka J, Anderson TR, “Speaker modification with LPC pole analysis”, ICASSP, pp. 644-647, 1995.
106 Soong FK, Juang BW, “Line spectrum pair (LSP) and speech data compression”, ICASSP, part 1.10, pp. 1-4, 1984.
107 Sugamura N, Farvardin N, “Quantizer design in LSP speech analysis-synthesis”, IEEE J. Selec. Areas Comms., vol. 6, no. 2, pp. 432-440, February 1988.
108 Sugamura N, Itakura F, “Speech analysis and synthesis methods developed at ECL in NTT - from LPC to LSP”, Speech Communication, vol. 5, pp. 213-229, 1986.
109 Summerfield Q, Foster J, Tyler R, Bailey PJ, “Influences of formant bandwidth and auditory frequency selectivity on identification of place of articulation in stop consonants”, Speech Communication, vol. 5, pp. 199-215, 1986.
110 Tempest W (Ed), The Noise Handbook, Academic Press, 1985.
111 chapter 1 of [110]
112 chapter 3 of [110]
113 chapter 4 of [110]
114 chapter 9 of [110]
115 Thomas IB, “The influence of first and second formants on the intelligibility of clipped speech”, J. Acoustical Soc. America, vol. 16, no. 182, 1968.
116 Thomas I, Niederjohn R, “The intelligibility of filtered-clipped speech in noise”, J. AES, vol. 18, no. 3, pp. 299-303, 1970.
117 Thomas I, Ohley W, “Intelligibility enhancement through spectral weighting”, Proc. IEEE Conf. Speech, Comms and Processing, pp. 360-363, 1972.
118 TIMIT database, a CD-ROM database of phonetically classified recordings of sentences spoken by a number of different male and female speakers. Speech disc 1-1.1 of the National Institute of Standards and Technology of the U.S. Department of Commerce.

119 Trancoso IM, Atal BS, “Efficient search procedures for selecting the optimum innovation in stochastic coders”, IEEE Trans. ASSP, vol. 38, no. 3, 1990.
120 Trans-European Trunked Radio System (TETRA) standard, a European Telecommunications Standards Institute (ETSI) standard.
121 Tsoukalas D, Paraskevas M, Mourjopoulos J, “Speech enhancement using psychoacoustic criteria”, ICASSP, pp. 359-362, 1991.
122 U.S. Environmental Protection Agency, Transportation, Noise and Noise from Equipment Powered by Internal Combustion Engines, Washington D.C., p. 109, 1971.
123 van Velden JG, Smoorenburg GF, “Vowel recognition in noise for male, female and child voices”, ICASSP, pp. 961-963, 1991.
124 Virag N, “Speech enhancement based on masking properties of the auditory system”, ICASSP, pp. 796-799, 1995.
125 Voiers WD, “Evaluating processed speech using the diagnostic rhyme test”, Speech Technology, pp. 30-39, Jan/Feb 1983.
126 Vredestein (company), advertisement, Motoring & Leisure, the magazine of the Civil Service Motoring Assn., back cover, June 1995.
127 White F, Our Acoustic Environment, John Wiley & Sons, 1976.
128 chapter 6 of [127]
129 chapter 9 of [127]
130 Williams D, Tempest W, “Noise in heavy goods vehicles”, J. Sound Vib., vol. 43, no. 1, pp. 97-107, 1975.
131 Witten IH, Principles of Computer Speech, chapter 2, Academic Press, 1982.
132 Yannakoudakis EJ, Hutton PJ, Speech Synthesis and Recognition Systems, John Wiley & Sons.
133 Yasheng Q, Kabal P, “Pseudo three-tap pitch prediction filters”, ICASSP, pp. 523-527, 1993.
134 Yegnanarayana B, “Formant extraction from linear-prediction phase spectra”, J. Acoustical Soc. America, vol. 63, no. 5, pp. 1638-1640, 1978.
135 Zong-Liang Wu, Schwartz JL, Escudier D, “Modelling spectral processing in the central auditory system”, ICASSP, pp. 373-377, 1990.
