High-Fidelity Multichannel Audio Coding
Dai Tracy Yang, Chris Kyriakakis, and C.-C. Jay Kuo

EURASIP Book Series on Signal Processing and Communications, Volume 2

Editor-in-Chief: K. J. Ray Liu
Editorial Board: Zhi Ding, Moncef Gabbouj, Peter Grant, Ferran Marqués, Marc Moonen, Hideaki Sakai, Giovanni Sicuranza, Bob Stewart, and Sergios Theodoridis

Hindawi Publishing Corporation
410 Park Avenue, 15th Floor, #287 pmb, New York, NY 10022, USA
Nasr City Free Zone, Cairo 11816, Egypt
Fax: +1-866-HINDAWI (USA toll-free)
http://www.hindawi.com

© 2006 Hindawi Publishing Corporation
All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without written permission from the publisher.

Cover Image: Mehau Kulyk/Science Photo Library
ISBN 977-5945-24-0

Dedication

To Ruhua, Joshua, Junhui, and Zongduo (Dai Tracy Yang)
To Wee Ling, Anthony, and Alexandra (Chris Kyriakakis)
To Terri and Allison (C.-C. Jay Kuo)

Preface

Audio is one of the fundamental elements of multimedia signals, and audio signal processing has attracted the attention of researchers and engineers for several decades. By exploiting both the unique features of audio signals and the features common to all multimedia signals, researchers and engineers have developed increasingly efficient technologies for compressing audio data. Although books on digital audio have been available for some time, multichannel audio coding techniques have not yet been addressed in great detail.
Drawing on many years of teaching and research in digital audio signal processing and digital audio compression, we see a need for an advanced audio coding book that covers recent developments in this field. When we started this book project, its scope was narrower: our objective was to present several innovative compression techniques for multichannel audio sources in a research monograph. After the first draft, however, we received valuable comments from our colleagues and anonymous reviewers. With their encouragement, we decided to extend the coverage of the book by including more background material, making it suitable as a senior undergraduate or graduate-level textbook on advanced audio coding techniques. Special thanks also go to Dr. Hongmei Ai for her valuable discussions and suggestions while we developed and tested our new audio coding algorithms.

This book consists of three parts. The first part covers basic topics in audio compression, such as quantization, entropy coding, psychoacoustic models, and sound quality assessment. The second part highlights MPEG-4 Audio, currently the most prevalent low-bit-rate, high-performance audio coding standard. More emphasis is given to the audio standards that support multichannel signals, that is, MPEG Advanced Audio Coding (AAC), including the original MPEG-2 AAC specification, additional MPEG-4 toolsets, and the most recent aacPlus standard. The third part introduces several innovative multichannel audio coding methods that can further improve the coding performance and expand the functionality of MPEG AAC. This part is best suited to graduate students and researchers.

Dai Tracy Yang, Chris Kyriakakis, and C.-C. Jay Kuo
Los Angeles, CA
August 17, 2005

Contents

Dedication v
Preface vii
1. Introduction to digital audio 1
  1.1. Digital audio coding 1
    1.1.1. Representing digital audio signals 1
    1.1.2. Building blocks of digital audio codecs 3
    1.1.3. Lossy compression and lossless compression 3
  1.2. Fundamentals of digital signal processing 4
    1.2.1. Fourier transform 4
    1.2.2. Sampling operation 5
    1.2.3. Sampling theorem and aliasing 7
  1.3. Multichannel audio 12
    1.3.1. Perceptual cues 13
    1.3.2. Surround sound 14
    1.3.3. Surround sound standards 15
    1.3.4. A future surround sound system 17
  1.4. Outline of this book 18
2. Quantization 21
  2.1. Scalar quantization 21
    2.1.1. Uniform quantization 21
    2.1.2. Nonuniform quantization 25
  2.2. Vector quantization 26
    2.2.1. Nearest-neighbor quantizers 28
    2.2.2. Optimality of vector quantizers 29
    2.2.3. Vector quantizer design 31
  2.3. Bit allocation 32
    2.3.1. Problem of bit allocation 33
    2.3.2. Optimal bit allocation results 33
3. Entropy coding 35
  3.1. Introduction to information theory 35
  3.2. Huffman coding 38
    3.2.1. Huffman coding algorithm 38
    3.2.2. Variance of Huffman codes 39
    3.2.3. Huffman decoding 40
    3.2.4. Adaptive Huffman coding 41
  3.3. Arithmetic coding 42
    3.3.1. Arithmetic coding algorithm 42
    3.3.2. Implementation issues 44
    3.3.3. Solving underflow problem 47
    3.3.4. Adaptive arithmetic coding 48
  3.4. QM coding 51
    3.4.1. QM encoder 51
    3.4.2. QM decoder 55
    3.4.3. Probability estimation 55
4. Introduction to psychoacoustics 59
  4.1. Perception of loudness 59
  4.2. Masking 61
    4.2.1. Frequency masking 62
    4.2.2. Temporal masking 63
    4.2.3. Interaural masking 64
5. Subjective evaluation of audio codecs 65
  5.1. Introduction 65
  5.2. Listening environment specifications 65
  5.3. Testing methodology 68
  5.4. Data analysis after subjective listening tests 69
    5.4.1. Mean 69
    5.4.2. Variance 69
    5.4.3. Standard deviation 71
    5.4.4. Standard error of the mean 72
    5.4.5. Confidence interval 73
6. MPEG-4 audio coding tools 77
  6.1. Introduction to MPEG-4 audio 77
  6.2. MPEG-4 audio tools 79
    6.2.1. MPEG-4 natural sound coding tools 81
    6.2.2. MPEG-4 audio synthesis tools 87
7. MPEG advanced audio coding 91
  7.1. Introduction to advanced audio coding 91
  7.2. MPEG-2 AAC 92
    7.2.1. Overview of MPEG-2 AAC 92
    7.2.2. Psychoacoustic model 94
    7.2.3. Gain control 94
    7.2.4. Transform 95
    7.2.5. Spectral processing 98
    7.2.6. Quantization 102
    7.2.7. Entropy coding 103
  7.3. New features in MPEG-4 AAC 105
    7.3.1. Perceptual noise substitution 106
    7.3.2. Long-term prediction 107
    7.3.3. TwinVQ 108
    7.3.4. Low-delay AAC 109
    7.3.5. Error-resilient tools 111
    7.3.6. MPEG-4 scalable audio coding tools 112
  7.4. MPEG-4 high-efficiency AAC 118
    7.4.1. Background of SBR technology 119
    7.4.2. Basic principle of SBR technology 121
    7.4.3. More technical details on high-efficiency AAC 122
8. Introduction to new audio coding tools 125
  8.1. Motivation and overview 125
    8.1.1. Redundancy inherent in multichannel audio 125
    8.1.2. Quality-scalable single compressed bitstream 126
    8.1.3. Embedded multichannel audio bitstream 126
    8.1.4. Error-resilient scalable audio bitstream 127
  8.2. Audio coding improvements 127
    8.2.1. Interchannel redundancy removal approach 128
    8.2.2. Audio concealment and channel transmission strategy for heterogeneous network 129
    8.2.3. Quantization efficiency for adaptive Karhunen-Loève transform 129
    8.2.4. Progressive syntax-rich multichannel audio codec design 130
    8.2.5. Error-resilient scalable audio coding 130
9. Interchannel redundancy removal and channel-scalable decoding 133
  9.1. Introduction 133
  9.2. Interchannel redundancy removal 133
    9.2.1. Karhunen-Loève transform 133
    9.2.2. Evidence for interchannel decorrelation 135
    9.2.3. Energy compaction effect 138
    9.2.4. Frequency-domain versus time-domain KLT 141
  9.3. Temporal adaptive KLT 143
  9.4. Eigen-channel coding and transmission 147
    9.4.1. Eigen-channel coding 147
    9.4.2. Eigen-channel transmission 149
  9.5. Audio concealment for channel-scalable decoding 150
  9.6. Compression system overview 152
  9.7. Complexity analysis 154
  9.8. Experimental results 155
    9.8.1. Multichannel audio coding 155
    9.8.2. Audio concealment with channel-scalable coding 157
    9.8.3. Subjective listening test 160
  9.9. Conclusion 162
  9.10. Appendix: Karhunen-Loève expansion 163
    9.10.1. Definition 163
    9.10.2. Features and properties 163
10. Adaptive Karhunen-Loève transform and its quantization efficiency 165
  10.1. Introduction 165
  10.2. Vector quantization 166
  10.3. Efficiency of KLT decorrelation 167
  10.4. Temporal adaptation effect 172
  10.5. Complexity analysis 176
  10.6. Experimental results 176
  10.7. Conclusion 177
11. Progressive syntax-rich multichannel audio codec 179
  11.1. Introduction 179
  11.2. Progressive syntax-rich codec design 180
  11.3. Scalable quantization and entropy coding 182
    11.3.1. Successive approximation quantization 182
    11.3.2. Context-based QM coder 186
  11.4. Channel and subband transmission strategy 187
    11.4.1. Channel selection rule 187
    11.4.2. Subband selection rule 188
  11.5. Implementation issues 191
    11.5.1. Frame, subband, or channel skipping 191
    11.5.2. Determination of the MNR threshold 192
  11.6. Complete description of PSMAC codec 192
  11.7. Experimental results 193
    11.7.1. Results using MNR measurement 194
    11.7.2. Subjective listening tests 196
  11.8. Conclusions 197
12. Error-resilient scalable audio codec design 199
  12.1. Introduction 199
  12.2. WCDMA characteristics 201
  12.3. Layered coding structure 201
    12.3.1.