Source and Channel Coding for Audiovisual Communication Systems
Total Page:16
File Type:pdf, Size:1020Kb
Thesis for the degree of Doctor of Philosophy Source and Channel Coding for Audiovisual Communication Systems Moo Young Kim Sound and Image Processing Laboratory Department of Signals, Sensors, and Systems Kungliga Tekniska H¨ogskolan Stockholm 2004 Kim, Moo Young Source and Channel Coding for Audiovisual Communication Systems Copyright c 2004 Moo Young Kim except where otherwise stated. All rights reserved. ISBN 91-628-6198-0 TRITA-S3-SIP 2004:3 ISSN 1652-4500 ISRN KTH/S3/SIP-04:03-SE Sound and Image Processing Laboratory Department of Signals, Sensors, and Systems Kungliga Tekniska H¨ogskolan SE-100 44 Stockholm, Sweden Telephone + 46 (0)8-790 7790 To my family Abstract Topics in source and channel coding for audiovisual communication systems are studied. The goal of source coding is to represent a source with the lowest possible rate to achieve a particular distortion, or with the lowest possible distortion at a given rate. Channel coding adds redundancy to quantized source information to recover channel errors. This thesis consists of four topics. Firstly, based on high-rate theory, we propose Karhunen-Lo`eve trans- form (KLT)-based classified vector quantization (VQ) to efficiently utilize optimal VQ advantages over scalar quantization (SQ). Compared with code- excited linear predictive (CELP) speech coding, KLT-based classified VQ provides not only a higher SNR and perceptual quality, but also lower com- putational complexity. Further improvement is obtained by companding. Secondly, we compare various transmitter-based packet-loss recovery techniques from a rate-distortion viewpoint for real-time audiovisual com- munication systems over the Internet. We conclude that, in most circum- stances, multiple description coding (MDC) is the best packet-loss recov- ery technique. If channel conditions are informed, channel-optimized MDC yields better performance. Compared with resolution-constrained quantization (RCQ), entropy- constrained quantization (ECQ) produces a smaller number of distortion outliers but is more sensitive to channel errors. We apply a generalized r-th power distortion measure to design a new RCQ algorithm that has less distortion outliers and is more robust against source mismatch than conventional RCQ methods. Finally, designing quantizers to effectively remove irrelevancy as well as redundancy is considered. Taking into account the just noticeable differ- ence (JND) of human perception, we design a new RCQ method that has improved performance in terms of mean distortion and distortion outliers. Based on high-rate theory, optimal centroid density and its corresponding mean distortion are also accurately predicted. The latter two quantization methods can be combined with practical source coding systems such as KLT-based classified VQ and with joint source-channel coding paradigms such as MDC. Keywords: Information theory, high-rate theory, source coding, channel coding, quantization, multimedia communication, error correction, multiple description, just noticeable difference, perceptual quality, source mismatch. iii List of Papers The thesis is based on the following papers: [A] Moo Young Kim and W. Bastiaan Kleijn, \KLT-based Classified VQ for the Speech Signal," in Proc. IEEE Int. Conf. Acoustics, in Speech, Signal Processing (ICASSP), 2002, pp. 645-648. [B] Moo Young Kim and W. Bastiaan Kleijn, \KLT-Based Classified VQ with Companding," in Proc. IEEE Workshop on Speech Coding, 2002, pp. 8-10. [C] Moo Young Kim and W. Bastiaan Kleijn, \Rate-Distortion Comparisons between FEC and MDC based on Gilbert Channel Model," in Proc. IEEE Int. Conf. on Networks (ICON), 2003, pp. 495 - 500. [D] Moo Young Kim and W. Bastiaan Kleijn, \Comparison of Transmitter-Based Packet-Loss Recovery Techniques for Voice Transmission," in Proc. INTERSPEECH2004-ICSLP, 2004, pp. 641-644. [E] Moo Young Kim and W. Bastiaan Kleijn, \KLT-based Adap- tive Classified VQ of the Speech Signal," IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 277-289, May 2004. [F] Moo Young Kim and W. Bastiaan Kleijn, \Comparative Rate- Distortion Performance of Multiple Description Coding for Real- Time Audiovisual Communication over the Internet," submitted to IEEE Transactions on Communications, 2004. [G] Moo Young Kim and W. Bastiaan Kleijn, “Effect of Cell-Size Variation of Resolution-Constrained Quantization," to be sub- mitted to IEEE Transactions on Speech and Audio Processing. [H] Moo Young Kim and W. Bastiaan Kleijn, \Resolution- Constrained Quantization with Perceptual Distortion Measure," to be submitted to IEEE Signal Processing Letters. v Contents Abstract iii List of Papers v Contents vii Acknowledgements ix Acronyms xi I Introduction 1 1 Source Coding and Distortion Measures . 3 2 High-Rate Theory . 5 2.1 Quantization with Various Distortion Measures . 8 2.2 Advantages of Vector Quantization over Scalar Quan- tization . 9 2.3 Distortion Distribution . 11 2.4 Robust Quantization against Source Mismatch . 11 2.5 Comparison between Resolution-Constrained and Entropy-Constrained Quantization . 13 2.6 Comparison between R-D Theory and High-Rate Theory . 14 3 Quantizer Design . 15 3.1 Generalized Lloyd Algorithm . 15 3.2 Entropy-Constrained Vector Quantization . 17 3.3 Lattice Quantization . 17 3.4 Random Coding based on High-Rate Theory . 18 4 Channel Coding . 19 4.1 Packet-Loss Recovery Techniques . 20 4.2 Forward Error Correction (FEC) . 22 4.3 Multiple Description Coding (MDC) . 25 vii 4.4 Rate-Distortion Optimized Packet Loss Recovery Techniques . 30 5 Applications: Audiovisual Coding . 31 5.1 Speech and Audio Coding Standards . 31 5.2 Image and Video Coding Standards . 32 6 Summary of Contributions . 33 References . 35 viii Acknowledgements The work presented in this thesis has been performed at the Sound and Image Processing (SIP) Lab. of the Department of Signals, Sensors and Systems (S3) of KTH (Royal Institute of Technology) from January 2001 to November 2004. During this period, I have been indebted to many people for their help. It is my great pleasure to thank all the people who have supported me during my study. First of all, I am very grateful to my supervisor, Professor W. Bastiaan Kleijn, for his excellent academic guidance and endless encouragement to my research. I am impressed by his broad and deep knowledge and his enthusiasm to share his creativity. I also appreciate his patient proof reading of my works and for his trust in my independent research. I would like to thank my good friends at SIP for sharing a nice at- mosphere that has always filled me with energy. I have enjoyed working together with Volodya Grancharov, Barbara Resch, Anders Ekman, and Kahye Song. Jan Plasberg deserves my gratitude especially for his sense of humor and kindness. I would like to thank Harald Pobloth and David Zhao who have helped everyone in SIP from computer problems to important software tips. I would like to express my gratitude to Dr. Jonas Lindblom, Professor Arne Leijon, Elisabet Molin, and Martin Dahlquist for valuable discussions and interaction. Dr. Jonas Samuelsson is a rare man who can do works quickly with high quality. Renat Vafin and Yusuke Hiwasaki are specially thanked for staying in the office together until midnight and for sharing valuable ideas for both professional and private life. I am very much grateful to my previous office mate Mattias Nilsson especially for his hospi- tality during my first year of study and life in Stockholm. Particular thanks go to my office mate Sriram Srinivasan for his bright comments on my work and his tireless proof reading. I wish to thank our secretary Dora S¨oderberg for her kindness whenever I was in trouble. I cannot forget our energetic lunch-time activity - fooseball which gave us many laughs. This work was financially supported by KTH, Vinnova (Swedish Agency for Innovation Systems), Global IP Sound AB, and Samsung Advanced In- stitute of Technology (SAIT). I would like to express my gratitude to Pro- fessor Mikael Skoglund, Professor Gunnar Karlsson, Professor Gerald Q. ix Maguire Jr., Ian Marsh, Tomas Sk¨ollermo, Gy¨orgy D´an, and other col- leagues for their positive and friendly discussions during our projects on `Audiovisual Mobile Internetworking' and ‘Affordable Wireless Services and Infrastructures'. I am grateful to my previous supervisor Professor Jaihie Kim of Yonsei University for his mental support and inspiring advice. Dr. Sang Ryong Kim, Professor Dae Hee Youn, Professor Hong Goo Kang, Pro- fessor Nam Soo Kim, Professor Hong Kook Kim, Dr. Yong Duk Cho, Mr. Jeongsu Kim, and Dr. Doh Suk Kim are acknowledged for encouraging me to continue on this academic path. Thanks to all the members of the Korean student and scholar association in Sweden (KOSAS) and the Korean church in Sweden. I am also lucky enough to have the support of many good friends in Korea. It was great to meet my best friend, Young Jay Paik, in Stockholm. Most of all, I would like to thank my dear wife, Hyun Soo Kwon, and my lovely daughters, So Yee Kim and So Yon Kim for their endless love. Their support and understanding has been the most essential part for me to complete this thesis. I also would like to share my pleasure with my wife's parents, my sister's family, and my uncles' and aunts' family. Finally, I sincerely dedicate this thesis to my parents, Eung Tae Kim and Byoung Joo Youn, who I respect most. Moo Young Kim Stockholm, November 2004 x Acronyms 3GPP : 3rd Generation Partnership Project 3GPP2 : 3GPP Project 2 AbS : analysis-by-synthesis ADPCM : adaptive differential PCM AFS : adaptive full-rate