Thesis for the degree of Doctor of Philosophy

Source and Channel Coding for Audiovisual Communication Systems

Moo Young Kim

Sound and Image Processing Laboratory
Department of Signals, Sensors, and Systems
Kungliga Tekniska Högskolan

Stockholm 2004

Copyright © 2004 Moo Young Kim, except where otherwise stated. All rights reserved.

ISBN 91-628-6198-0 TRITA-S3-SIP 2004:3 ISSN 1652-4500 ISRN KTH/S3/SIP-04:03-SE

Sound and Image Processing Laboratory
Department of Signals, Sensors, and Systems
Kungliga Tekniska Högskolan
SE-100 44 Stockholm, Sweden
Telephone: +46 (0)8-790 7790

To my family

Abstract

Topics in source and channel coding for audiovisual communication systems are studied. The goal of source coding is to represent a source with the lowest possible rate for a given distortion, or with the lowest possible distortion at a given rate. Channel coding adds redundancy to the quantized source information so that channel errors can be recovered. This thesis consists of four topics.

Firstly, based on high-rate theory, we propose Karhunen-Loève transform (KLT)-based classified vector quantization (VQ) to efficiently exploit the advantages of optimal VQ over scalar quantization (SQ). Compared with code-excited linear predictive (CELP) coding, KLT-based classified VQ provides not only a higher SNR and perceptual quality, but also lower computational complexity. Further improvement is obtained by companding.

Secondly, we compare various transmitter-based packet-loss recovery techniques from a rate-distortion viewpoint for real-time audiovisual communication systems over the Internet. We conclude that, in most circumstances, multiple description coding (MDC) is the best packet-loss recovery technique. If the channel conditions are known, channel-optimized MDC yields better performance.

Thirdly, compared with resolution-constrained quantization (RCQ), entropy-constrained quantization (ECQ) produces fewer distortion outliers but is more sensitive to channel errors. We apply a generalized r-th power distortion measure to design a new RCQ algorithm that has fewer distortion outliers and is more robust against source mismatch than conventional RCQ methods.

Finally, designing quantizers that effectively remove irrelevancy as well as redundancy is considered. Taking into account the just noticeable difference (JND) of human perception, we design a new RCQ method with improved performance in terms of mean distortion and distortion outliers. Based on high-rate theory, the optimal centroid density and its corresponding mean distortion are also accurately predicted.
The latter two quantization methods can be combined with practical source coding systems such as KLT-based classified VQ and with joint source-channel coding paradigms such as MDC.

Keywords: high-rate theory, source coding, channel coding, quantization, multimedia communication, error correction, multiple description, just noticeable difference, perceptual quality, source mismatch.


List of Papers

The thesis is based on the following papers:

[A] Moo Young Kim and W. Bastiaan Kleijn, "KLT-based Classified VQ for the Speech Signal," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2002, pp. 645-648.

[B] Moo Young Kim and W. Bastiaan Kleijn, "KLT-Based Classified VQ with Companding," in Proc. IEEE Workshop on Speech Coding, 2002, pp. 8-10.

[C] Moo Young Kim and W. Bastiaan Kleijn, "Rate-Distortion Comparisons between FEC and MDC based on Gilbert Channel Model," in Proc. IEEE Int. Conf. on Networks (ICON), 2003, pp. 495-500.

[D] Moo Young Kim and W. Bastiaan Kleijn, "Comparison of Transmitter-Based Packet-Loss Recovery Techniques for Voice Transmission," in Proc. INTERSPEECH2004-ICSLP, 2004, pp. 641-644.

[E] Moo Young Kim and W. Bastiaan Kleijn, "KLT-based Adaptive Classified VQ of the Speech Signal," IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 277-289, May 2004.

[F] Moo Young Kim and W. Bastiaan Kleijn, "Comparative Rate-Distortion Performance of Multiple Description Coding for Real-Time Audiovisual Communication over the Internet," submitted to IEEE Transactions on Communications, 2004.

[G] Moo Young Kim and W. Bastiaan Kleijn, "Effect of Cell-Size Variation of Resolution-Constrained Quantization," to be submitted to IEEE Transactions on Speech and Audio Processing.

[H] Moo Young Kim and W. Bastiaan Kleijn, "Resolution-Constrained Quantization with Perceptual Distortion Measure," to be submitted to IEEE Signal Processing Letters.


Contents

Abstract

List of Papers

Contents

Acknowledgements

Acronyms

I Introduction
1 Source Coding and Distortion Measures
2 High-Rate Theory
 2.1 Quantization with Various Distortion Measures
 2.2 Advantages of Vector Quantization over Scalar Quantization
 2.3 Distortion Distribution
 2.4 Robust Quantization against Source Mismatch
 2.5 Comparison between Resolution-Constrained and Entropy-Constrained Quantization
 2.6 Comparison between R-D Theory and High-Rate Theory
3 Quantizer Design
 3.1 Generalized Lloyd Algorithm
 3.2 Entropy-Constrained Vector Quantization
 3.3 Lattice Quantization
 3.4 Random Coding based on High-Rate Theory
4 Channel Coding
 4.1 Packet-Loss Recovery Techniques
 4.2 Forward Error Correction (FEC)
 4.3 Multiple Description Coding (MDC)

 4.4 Rate-Distortion Optimized Packet-Loss Recovery Techniques
5 Applications: Audiovisual Coding
 5.1 Speech and Audio Coding Standards
 5.2 Image and Video Coding Standards
6 Summary of Contributions
References

Acknowledgements

The work presented in this thesis has been performed at the Sound and Image Processing (SIP) Lab. of the Department of Signals, Sensors and Systems (S3) of KTH (Royal Institute of Technology) from January 2001 to November 2004. During this period, I have been indebted to many people for their help. It is my great pleasure to thank all the people who have supported me during my study.

First of all, I am very grateful to my supervisor, Professor W. Bastiaan Kleijn, for his excellent academic guidance and endless encouragement of my research. I am impressed by his broad and deep knowledge and his enthusiasm to share his creativity. I also appreciate his patient proofreading of my work and his trust in my independent research.

I would like to thank my good friends at SIP for sharing a nice atmosphere that has always filled me with energy. I have enjoyed working together with Volodya Grancharov, Barbara Resch, Anders Ekman, and Kahye Song. Jan Plasberg deserves my gratitude especially for his sense of humor and kindness. I would like to thank Harald Pobloth and David Zhao, who have helped everyone in SIP with everything from computer problems to important software tips. I would like to express my gratitude to Dr. Jonas Lindblom, Professor Arne Leijon, Elisabet Molin, and Martin Dahlquist for valuable discussions and interaction. Dr. Jonas Samuelsson is a rare man who can do work quickly and with high quality. Renat Vafin and Yusuke Hiwasaki are specially thanked for staying in the office together until midnight and for sharing valuable ideas on both professional and private life. I am very much grateful to my previous office mate Mattias Nilsson, especially for his hospitality during my first year of study and life in Stockholm. Particular thanks go to my office mate Sriram Srinivasan for his bright comments on my work and his tireless proofreading. I wish to thank our secretary Dora Söderberg for her kindness whenever I was in trouble.
I cannot forget our energetic lunch-time activity, foosball, which gave us many laughs.

This work was financially supported by KTH, Vinnova (Swedish Agency for Innovation Systems), Global IP Sound AB, and Samsung Advanced Institute of Technology (SAIT). I would like to express my gratitude to Professor Mikael Skoglund, Professor Gunnar Karlsson, Professor Gerald Q.

Maguire Jr., Ian Marsh, Tomas Sköllermo, György Dán, and other colleagues for their positive and friendly discussions during our projects on 'Audiovisual Mobile Internetworking' and 'Affordable Wireless Services and Infrastructures'.

I am grateful to my previous supervisor Professor Jaihie Kim of Yonsei University for his mental support and inspiring advice. Dr. Sang Ryong Kim, Professor Dae Hee Youn, Professor Hong Goo Kang, Professor Nam Soo Kim, Professor Hong Kook Kim, Dr. Yong Duk Cho, Mr. Jeongsu Kim, and Dr. Doh Suk Kim are acknowledged for encouraging me to continue on this academic path. Thanks to all the members of the Korean student and scholar association in Sweden (KOSAS) and the Korean church in Sweden. I am also lucky enough to have the support of many good friends in Korea. It was great to meet my best friend, Young Jay Paik, in Stockholm.

Most of all, I would like to thank my dear wife, Hyun Soo Kwon, and my lovely daughters, So Yee Kim and So Yon Kim, for their endless love. Their support and understanding have been the most essential part of completing this thesis. I also would like to share my pleasure with my wife's parents, my sister's family, and my uncles' and aunts' families. Finally, I sincerely dedicate this thesis to my parents, Eung Tae Kim and Byoung Joo Youn, whom I respect most.

Moo Young Kim Stockholm, November 2004

Acronyms

3GPP : 3rd Generation Partnership Project
3GPP2 : 3rd Generation Partnership Project 2
AbS : analysis-by-synthesis
ADPCM : adaptive differential PCM
AFS : adaptive full-rate speech
AHS : adaptive half-rate speech
AMR : adaptive multi-rate
AMR-WB : adaptive multi-rate wideband
ARQ : automatic repeat request
AVC : advanced video coding
BCH : Bose, Chaudhuri, and Hocquenghem
B-pictures : bidirectionally-predicted pictures
BQ-RCQ : RCQ with the biquadratic distortion measure
BS : base station
C/I : carrier-to-interference
CCITT : International Telegraph and Telephone Consultative Committee
CD : compact disc
cdf : cumulative distribution function
CD-MDC : central-distortion optimized MDC
CELP : code-excited linear prediction

CS-ACELP : conjugate-structure algebraic CELP
DCT : discrete cosine transform
DFT : discrete Fourier transform
DVD : digital video disc
DVQ : direct vector quantization
ECQ : entropy-constrained quantization
ECSQ : entropy-constrained scalar quantization
ECVQ : entropy-constrained vector quantization
EGC : El Gamal and Cover
EM : expectation maximization
ETSI : European Telecommunications Standards Institute
EVRC : enhanced variable-rate codec
FEC : forward error correction
FLOPS : floating-point operations per second
GLA : generalized Lloyd algorithm
GMM : Gaussian mixture model
GOP : group of pictures
GSM : global system for mobile communication
HDTV : high-definition television
IEC : International Electrotechnical Commission
IETF : Internet Engineering Task Force
iid : independent and identically distributed
iLBC : Internet low-bit-rate codec
I-pictures : intra-pictures
ISL : inherent SNR limit
ISO : International Organization for Standardization
ITU : International Telecommunication Union
ITU-T : ITU Telecommunication Standardization Sector

JND : just noticeable difference
JPEG : Joint Photographic Experts Group
JVT : joint video team
KLT : Karhunen-Loève transform
KLT-CVQ : KLT-based adaptive classified VQ
KLT-CVQC : KLT-based adaptive classified VQ with companding
LBG : Linde, Buzo, and Gray
LC : layered coding
LD-CELP : low-delay CELP
LDPC : low-density parity check
LPC : linear predictive coding
LQ : lattice quantization
LSD : log-spectral distortion
LSF : line spectral frequencies
MDC : multiple description coding
MDC-SC : MDC for a single channel
MDC-TIC : MDC with two independent channels
MDLQ : multiple description lattice quantizer
MDSQ : multiple description scalar quantizer
MDTC : multiple description transform coding
MI-FEC : media-independent FEC
MLT : modulated lapped transform
MS : mobile station
MS-FEC : media-specific FEC
MTU : maximum transfer unit
optMDC : channel-optimized MDC
PCM : pulse code modulation
PCT : pairwise correlation transform

pdf : probability density function
PER-Q : quantizer based on the perceptual-error criterion
pmf : probability mass function
P-pictures : predicted pictures
PSNR : peak signal-to-noise ratio
PSTN : public switched telephone network
Q-RCQ : RCQ with the squared-error criterion
RCQ : resolution-constrained quantization
RMS : root mean square
RRD : redundancy-rate distortion
RS : Reed-Solomon
RS-FEC : Reed-Solomon based FEC
RTCP : real-time control protocol
SD : spectral distortion
SDC : single description coding
SD-MDC : side-distortion optimized MDC
SMV : selectable mode vocoder
SNR : signal-to-noise ratio
SQ : scalar quantization
SQUARE-Q : quantizer based on the squared-error criterion
SVD : singular-value decomposition
SVS : Servetto, Vaishampayan, and Sloane
TCP : transmission control protocol
UDP : user datagram protocol
ULP : uneven level protection
UQ : uniform quantization
VoIP : voice over IP
VQ : vector quantization

XOR : eXclusive-OR
ZIR : zero-input response
ZSR : zero-state response


Part I

Introduction

Introduction

Source and channel coding is an essential part of modern life. Audiovisual communication and storage systems, such as public switched telephone network (PSTN)-based communication systems, mobile phones, voice over IP (VoIP), digital television, compact discs (CD), digital video discs (DVD), and MP3 players, require source and channel coding as fundamental building blocks. Fig. 1 shows the general building blocks of source and channel coding systems.


Figure 1: Building blocks of source and channel coding.

The goal of source coding is to represent a source with a good compromise between rate and quality. For example, under an average bit-rate constraint, source coding attempts to optimize quality. Decreasing the rate results in decreased quality, while increasing the rate raises the cost of the source coding system. This rate-distortion tradeoff was first formalized as rate-distortion theory by Shannon in 1948 [1,59,60,62]. The minimum achievable rate R at a distortion D is the rate-distortion function R(D). The Shannon lower bound is a lower bound on the rate-distortion function. If we consider a k-dimensional vector X^k and a reconstruction error W^k, then the rate-distortion function R(D) and the Shannon lower bound R_SLB(D) satisfy

$$
R(D) \ge R_{\mathrm{SLB}}(D) = \frac{1}{k}\,h(X^k) - \frac{1}{k}\sup_{\{f_{W^k}:\ \int f_{W^k}(w^k)\,d(w^k)\,dw^k \le D\}} h(W^k) \tag{1}
$$

where h(X^k) and h(W^k) are the differential entropies of X^k and W^k, respectively, and d(w^k) and f_{W^k}(w^k) are the distortion of the reconstruction error and its pdf, respectively. For a Gaussian source with a squared-error criterion, the bound is tight and identical to the rate-distortion function [59,60,62]:

$$
R(D) = R_{\mathrm{SLB}}(D) = \frac{1}{2}\log\Big(\frac{\sigma^2}{D}\Big) \tag{2}
$$

where σ² is the variance of the source.

Bennett [2] analyzed the rate-distortion tradeoff using high-rate theory. While rate-distortion theory characterizes the rates achievable for infinite dimensionality, high-rate theory assumes a high number of reconstruction levels, which corresponds to low distortion. The former is considered to be source coding theory or information theory, while the latter is considered to be quantization theory or high-rate theory [7]. General optimality conditions and design issues for source coding were developed by Lloyd [3], Max [4], Linde et al. [5], and Chou et al. [6].

The original source descriptions often contain redundancy and irrelevancy. Redundancy is information that is superfluous and inessential for representing the source; it can be eliminated without loss in the source representation. Irrelevancy is the imperceptible part of the information, which can be removed without perceptual quality degradation. Perceptual quality for both human vision and hearing is studied in [131-138]. In papers A, C, and D, we use information theory and high-rate theory to propose new methods that efficiently remove redundancy and irrelevancy.

In channel coding, k information bits are mapped into n coded bits. Since n ≥ k, channel coding introduces n − k redundant bits. The probability of a channel error, λ_i, is defined for each information bit i, and the maximal probability of error is

$$
\lambda^{(n)} = \max_{i \in \{1,2,\ldots,k\}} \lambda_i. \tag{3}
$$

The rate of an (n, k) code is R = k/n, and a rate R is said to be achievable if the maximum probability of channel error λ^(n) approaches zero as n → ∞. The channel capacity C is the supremum of all achievable rates, and Shannon proved that information bits can be reliably sent over a channel at all rates R below the channel capacity C [1]. Conversely, any (n, k) code with λ^(n) → 0 must have R ≤ C.
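As a numeric illustration of the two Shannon limits discussed above, the sketch below (my own example, not part of the thesis) evaluates the Gaussian rate-distortion function of (2) and the capacity of a binary symmetric channel (BSC), a standard channel model that the text does not itself introduce:

```python
import math

def gaussian_rd(variance, distortion):
    """Rate-distortion function of eq. (2): R(D) = 0.5*log2(sigma^2/D)
    for a Gaussian source under the squared-error criterion."""
    if distortion >= variance:
        return 0.0  # reproducing the mean already achieves D >= sigma^2
    return 0.5 * math.log2(variance / distortion)

def bsc_capacity(p):
    """Capacity C = 1 - H2(p) of a binary symmetric channel with
    crossover probability p (bits per channel use)."""
    if p in (0.0, 1.0):
        return 1.0
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h2

# Halving the distortion of a unit-variance Gaussian costs 0.5 bit/sample:
r1 = gaussian_rd(1.0, 0.25)    # 1.0 bit
r2 = gaussian_rd(1.0, 0.125)   # 1.5 bits
# A rate R = k/n = 4/7 code is achievable only on channels with C >= 4/7:
ok = bsc_capacity(0.01) > 4 / 7
```

For a BSC with 1% crossover probability, C ≈ 0.92 bit per channel use, so a rate-4/7 code does not exceed capacity.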

From the source and channel separation theorem, it follows that if optimal source codes for given sources and optimal channel codes for given channel conditions are designed separately and independently, the combination results in optimal performance. However, if delay is of critical importance, such as in real-time communication systems, source and channel coding should be optimized jointly. In paper B, we compare different transmitter-based packet-loss recovery techniques for real-time audiovisual communications over the Internet.

This introduction is organized as follows. In section 1, various distortion measures for source coding are introduced. In section 2, based on high-rate theory, we first address the optimal centroid density and its corresponding mean distortion under various distortion measures. We also discuss resolution-constrained versus entropy-constrained quantization, high-rate theory versus rate-distortion theory, scalar versus vector quantization, the distortion distribution, and robust quantization against source mismatch. In section 3, practical quantizer design issues are discussed, such as the generalized Lloyd algorithm (GLA), entropy-constrained vector quantization (ECVQ), lattice quantization (LQ), and random coding theory. In section 4, various packet-loss recovery techniques are introduced, especially forward error correction (FEC) and multiple description coding (MDC). Audiovisual coding standards for speech, audio, image, and video are introduced in section 5. Finally, in section 6, we present a summary of the contributions of this thesis.

1 Source Coding and Distortion Measures

Source coding attempts to encode a source description into an index subject to a rate-distortion tradeoff. The choice of an appropriate distortion criterion is a fundamental component in the design and analysis of a coding system.

Let f_{X^k}(x^k) be the probability density function (pdf) that a k-dimensional random variable X^k takes the k-dimensional real value x^k. Let Q be a mapping from the k-dimensional Euclidean space R^k to a reproduction alphabet having N elements c_1^k, ..., c_N^k. The most common distortion measure is a nonnegative function ρ of the difference between an input source x^k = {x_1, ..., x_k} ∈ R^k and its reconstruction x̂^k = {x̂_1, ..., x̂_k} ∈ R^k:

$$
d(x^k, \hat{x}^k) = \rho(x^k - \hat{x}^k) \tag{4}
$$

which is called a difference-distortion measure [21]. Because of its mathematical convenience, one popular difference-distortion measure is the mean r-th power distortion measure:

$$
d(x^k, \hat{x}^k) = \|x^k - \hat{x}^k\|_r \equiv \Big(\frac{1}{k}\sum_{m=1}^{k}(x_m - \hat{x}_m)^2\Big)^{r/2}. \tag{5}
$$

The r-th power distortion measure for r = 2 is the mean squared distortion measure:

$$
d(x^k, \hat{x}^k) = \frac{1}{k}\sum_{m=1}^{k}(x_m - \hat{x}_m)^2. \tag{6}
$$

The weighted squared-error measure is

$$
d(x^k, \hat{x}^k) = \frac{1}{k}\sum_{m=1}^{k} w_m (x_m - \hat{x}_m)^2 \tag{7}
$$

where {w_m} are the weights. This can be generalized into

$$
d(x^k, \hat{x}^k) = (x^k - \hat{x}^k)\, B\, (x^k - \hat{x}^k)^T \tag{8}
$$

where B is a k × k sensitivity matrix.

Difference-distortion measures are mathematically simple to calculate. However, perceptually relevant distortion measures may not be difference-distortion measures. In [10-13], it is shown that the mean squared error criterion is not a good measure of subjective quality for image coding, and more perceptually relevant distortion measures are developed. For speech coding, the log-spectral distortion (LSD) measure for quantizing linear predictive coding (LPC) coefficients in a perceptually relevant domain has the form [14-16,33]

$$
d(x^k(\omega), \hat{x}^k(\omega)) = \big(20\log_{10}|x^k(\omega)| - 20\log_{10}|\hat{x}^k(\omega)|\big)^2 \tag{9}
$$

where x^k(ω) and x̂^k(ω) are the frequency responses of the LPC parameters and their quantized version, respectively. Another non-difference distortion measure, in which the weight matrix B depends on x^k or x̂^k, is:

$$
d(x^k, \hat{x}^k) = (x^k - \hat{x}^k)\, B(x^k)\, (x^k - \hat{x}^k)^T \tag{10}
$$

where B(x^k) is a function of x^k. Since the perceptual distortion measure in (9) requires heavy computational complexity, it is often modified into the form of (10) [15,16,33].
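The distortion measures of eqs. (5)-(8) are straightforward to compute; the sketch below (illustrative only, not from the thesis) evaluates them for small vectors:

```python
import numpy as np

def rth_power_distortion(x, xhat, r=2):
    """Mean r-th power distortion of eq. (5); r = 2 gives eq. (6)."""
    x, xhat = np.asarray(x, float), np.asarray(xhat, float)
    return float(np.mean((x - xhat) ** 2) ** (r / 2))

def weighted_squared_error(x, xhat, w):
    """Weighted squared-error measure of eq. (7)."""
    x, xhat, w = (np.asarray(a, float) for a in (x, xhat, w))
    return float(np.mean(w * (x - xhat) ** 2))

def sensitivity_distortion(x, xhat, B):
    """Generalized quadratic form of eq. (8) with sensitivity matrix B."""
    e = np.asarray(x, float) - np.asarray(xhat, float)
    return float(e @ np.asarray(B, float) @ e)

x, xhat = [1.0, 2.0], [0.0, 0.0]
d2 = rth_power_distortion(x, xhat)                # (1 + 4)/2 = 2.5
dw = weighted_squared_error(x, xhat, [2.0, 1.0])  # (2 + 4)/2 = 3.0
dB = sensitivity_distortion(x, xhat, np.eye(2))   # 1 + 4 = 5.0
```

With B equal to the identity matrix, eq. (8) reduces to k times the mean squared error of eq. (6), as the last two values illustrate.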

2 High-Rate Theory

It is possible to analyze and predict the performance of quantizers of any dimensionality by using high-rate theory [7,58-62]. High-rate theory was first developed by Bennett [2] for scalar quantization (SQ) and was extended to vector quantization (VQ) by Zador [19] and Gersho [20]. VQ maps a block of k samples x^k into an index, while SQ is the special case of VQ with k = 1.

In this section, we present the performance of block source codes under the assumption that the rate is high and the distortion measure is the r-th power error criterion. High-rate theory assumes that the pdf does not change significantly within a quantization cell. With the r-th power distortion measure, the i-th Voronoi region (or i-th Voronoi cell) in a k-dimensional vector space is given by

$$
V_i = \{x^k \in \mathbb{R}^k :\ \|x^k - c_i^k\|_r \le \|x^k - c_j^k\|_r\ \forall j > i,\ \ \|x^k - c_i^k\|_r < \|x^k - c_j^k\|_r\ \forall j < i\} \tag{11}
$$

where c_i^k and c_j^k are the i-th and j-th centroids, respectively. Quantization can be divided into two classes:

- resolution-constrained quantization (RCQ), where the number of centroids, N, is fixed;

- entropy-constrained quantization (ECQ), where the index entropy, H(I), is fixed.
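The Voronoi partition of eq. (11) amounts to a nearest-centroid search; a minimal sketch (my own, not from the thesis):

```python
import numpy as np

def voronoi_index(x, centroids, r=2):
    """Index of the Voronoi cell of eq. (11): the nearest centroid in the
    r-norm. Ties go to the lower index, matching the strict/non-strict
    inequalities in (11) that make the cells disjoint."""
    x = np.asarray(x, float)
    dists = [np.linalg.norm(x - np.asarray(c, float), ord=r)
             for c in centroids]
    return int(np.argmin(dists))  # argmin picks the first minimum (lowest i)

cents = [[0.0, 0.0], [2.0, 0.0]]
i0 = voronoi_index([0.9, 0.0], cents)  # 0: closer to the first centroid
i1 = voronoi_index([1.5, 0.0], cents)  # 1: closer to the second centroid
it = voronoi_index([1.0, 0.0], cents)  # 0: tie broken toward the lower index
```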

Let D_RC and D_EC be the mean distortion over all cells for RCQ and for ECQ, respectively. Bennett [2] modelled a nonuniform quantizer as a companding quantizer consisting of a compressor followed by a uniform quantizer, with the inverse of the compressor applied at the output of the uniform quantizer. His companding formula shows that the mean squared error of SQ (k = 1 and r = 2) with many centroids and a fixed rate is approximately given by

$$
D_{RC} \approx \frac{1}{12N^2} \int_{x \in \mathbb{R}} \frac{f_X(x)}{g_c(x)^2}\, dx \tag{12}
$$

where N and g_c(x) are the number of reconstruction points and the centroid density, respectively. Panter and Dite developed the mean squared error of SQ [7]:

$$
D_{RC} \approx \frac{1}{12N^2} \Big(\int_{x \in \mathbb{R}} f_X(x)^{1/3}\, dx\Big)^3. \tag{13}
$$
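A quick numerical sanity check of the Panter-Dite formula (13) (my own check, assuming a uniform source on [0,1]): with f_X = 1 the integral equals 1, so (13) predicts D_RC ≈ 1/(12N²), which is also the exact distortion of the N-level uniform quantizer.

```python
import numpy as np

def panter_dite(N, fx_cuberoot_integral):
    """High-rate prediction of eq. (13): (1/(12 N^2)) (int f^(1/3) dx)^3."""
    return fx_cuberoot_integral ** 3 / (12.0 * N ** 2)

def uniform_quantizer_mse(N, samples=1_000_000, seed=0):
    """Monte-Carlo MSE of an N-level uniform quantizer on U(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.random(samples)
    delta = 1.0 / N
    xhat = (np.floor(x / delta) + 0.5) * delta  # midpoint reconstruction
    return float(np.mean((x - xhat) ** 2))

N = 16
predicted = panter_dite(N, 1.0)       # 1/(12 * 16^2) ~ 3.26e-4
measured = uniform_quantizer_mse(N)   # agrees to well under 1%
```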

Algazi found a Bennett-like bound for the r-th power distortion measure [17]. Bucklew extended Bennett's original idea of companding to multidimensional cases [22,24]. Bucklew and Wise rigorously derived the extended Bennett integral for VQ [23,25]. Gish and Pierce found that the asymptotically best centroid density for entropy-constrained scalar quantizers is uniform [18]. They also showed that the mean squared distortion of optimal ECQ for the scalar case is

$$
D_{EC} \approx \frac{1}{12}\, 2^{-2(H(I) - h(X))} \tag{14}
$$

where h(·) is the differential entropy. Zador found the mean distortion of VQ for an r-th power distortion measure [19]:

$$
D_{RC} = A(k, r, V)\, N^{-\frac{r}{k}} \Big(\int_{x^k \in \mathbb{R}^k} f_{X^k}(x^k)^{\frac{k}{k+r}}\, dx^k\Big)^{\frac{k+r}{k}} \tag{15}
$$

and

$$
D_{EC} = B(k, r, V)\, 2^{-\frac{r}{k}(H(I) - h(X^k))} \tag{16}
$$

where A(k, r, V) and B(k, r, V) are the quantization coefficients for the polytope of a Voronoi region, V. Zador also showed that

$$
\frac{k}{k+r}\, \kappa_k^{-\beta} \le B(k, r, V) \le A(k, r, V) \le \Gamma(1+\beta)\, \kappa_k^{-\beta} \tag{17}
$$

where β = r/k, Γ(·) is the gamma function, and κ_k is the volume of the k-dimensional unit sphere. Asymptotically, the upper and lower bounds in (17) agree as k approaches infinity. Based on Zador's work, Gersho [20] conjectured that any optimal high-rate quantizer has Voronoi regions that are congruent to some polytope V:

$$
C(k, r, V_i) = A(k, r, V) = B(k, r, V) \equiv C(k, r, V) \tag{18}
$$

where the dimensionless quantization coefficient C(k, r, V_i) is defined by

$$
C(k, r, V_i) = \frac{\int_{V_i} \|x^k - c_i^k\|_r^r\, dx^k}{v(V_i)^{\frac{k+r}{k}}}. \tag{19}
$$

Using (18), the mean distortion over all cells can be derived. For the resolution-constrained case, the number of centroids, N, is fixed, and the optimal centroid density is

$$
g_c(x^k) \approx N\, \frac{f_{X^k}(x^k)^{\frac{k}{k+r}}}{\int_{x^k \in \mathbb{R}^k} f_{X^k}(x^k)^{\frac{k}{k+r}}\, dx^k} \tag{20}
$$

and the associated distortion is

$$
D_{RC} \approx C(k, r, V)\, 2^{-\frac{rR}{k}} \Big(\int_{x^k \in \mathbb{R}^k} f_{X^k}(x^k)^{\frac{k}{k+r}}\, dx^k\Big)^{\frac{k+r}{k}} \equiv C(k, r, V)\, 2^{-\frac{rR}{k}} \big\langle f_{X^k}(x^k) \big\rangle_{\frac{k}{k+r}} \tag{21}
$$

where R is the required rate and

$$
\big\langle f_{X^k}(x^k) \big\rangle_\alpha = \Big(\int_{x^k \in \mathbb{R}^k} f_{X^k}(x^k)^\alpha\, dx^k\Big)^{1/\alpha}. \tag{22}
$$

For the entropy-constrained case, the average rate, H(I), is fixed. The optimal ECQ centroid density is constant:

$$
g_c(x^k) \approx g_c \approx 2^{-\frac{r}{k}(H(I) - h(X^k))} \tag{23}
$$

and the mean distortion of ECQ is

$$
D_{EC} \approx C(r, k, V)\, 2^{-\frac{r}{k}(H(I) - h(X^k))}. \tag{24}
$$

The optimal high-rate ECQ has a uniform centroid density, and the indices of the uniform quantizer are coded with lossless coding. The goal of lossless coding is to represent the index of a quantization point without introducing distortion at the shortest possible average codeword length. If a cell has a higher probability, its codeword is shorter. Huffman coding provides codes with an average length of not more than H(I) + 1 bits for a known source pdf with relatively few symbols [8]. Arithmetic coding yields codes with an average length of not more than H(I) + 2 bits for a given source pdf [9]. While Huffman coding encodes each symbol separately, arithmetic coding encodes a supersymbol (a block of symbols); thus, the average length per symbol of arithmetic codes can be far less than that of Huffman codes. If the source pdf is not known, it can be estimated with maximum-likelihood, maximum a-posteriori, or Bayesian estimates, and the dynamically estimated source pdf can then be used with arithmetic coding [62]. A Ziv-Lempel code does not need to calculate the empirical pdf [60].

Since ECQ allows variable codeword lengths while RCQ requires fixed codeword lengths, ECQ needs a lower average rate than RCQ to obtain the same distortion. However, ECQ cannot be used for fixed-rate communication systems. More comparisons are given in subsection 2.5. In the following subsections, based on high-rate theory, we discuss:

- quantization with various distortion measures;

- advantages of vector quantization over scalar quantization;

- distortion distribution;

- robust quantization against source mismatch;

- comparison between RCQ and ECQ;

- and comparison between R-D theory and high-rate theory.

2.1 Quantization with Various Distortion Measures

Zador and Gersho studied the asymptotic performance of k-dimensional VQ with an r-th power distortion measure [19,20]. Yamada et al. extended Zador and Gersho's result to VQ with the difference-distortion measures in (4) [21]. Gardner and Rao extended the previous works to non-difference distortion measures [33]. They consider a distortion measure d(x^k, x̂^k) having the following constraints:

- d(x^k, x̂^k) ≥ 0, with equality if and only if x^k = x̂^k;

- d(x^k, x̂^k) is continuous and continuously differentiable.

k 1/2 k [det(B(x ))] gc(x ) = N k 1/2 k . (26) xk Rk [det(B(x ))] dx ∈ In [36,37], the high-rate rate-distortionR function is investigated for a class of non-difference distortion measures. For the source-dependent weighted 2 High-Rate Theory 9 mean squared error criterion in (25), the rate-distortion function is approx- imated by k R(D) h(Xk) log(2πeD/k) ≈ − 2 1 + E[log det(B(xk))]. (27) 2 Note that for B(xk) = I, (27) reduces to the R(D) for the regular mean squared error case. In [38], a rigorous derivation for the asymptotic distortion of a multidi- mensional compander based ECQ is given for a locally-quadratic distortion measure. The optimal compressor does not depend on the source distri- bution, but is determined by the distortion measure. If the compressor and uniform quantizer satisfy the optimality condition, the high-rate per- formance approaches the rate-distortion bound for large dimensionality. In paper D, we consider just noticeable difference (JND)-based distortion measures that are not included in earlier works. Based on high-rate theory, we find bounds and approximations of the mean distortion.

2.2 Advantages of Vector Quantization over Scalar Quantization

In this subsection, for an r-th power distortion measure, the mean distortion bounds of SQ and VQ given by the Bennett integral in (12) and the Zador-Gersho formula in (21) are compared for the resolution-constrained and entropy-constrained cases. Under the resolution-constrained condition, VQ has memory, space-filling, and shape advantages over SQ. The ratio of the distortion of SQ to that of VQ is defined as the VQ advantage factor

$$
\frac{D_{RC}(1, r, V_1)}{D_{RC}(k, r, V_k)} = F_{RC}(k)\, M_{RC}(k)\, S_{RC}(k) \tag{28}
$$

where the space-filling advantage factor is

$$
F_{RC}(k) = \frac{C(1, r, V_1)}{C(k, r, V_k)}, \tag{29}
$$

the memory advantage factor is

$$
M_{RC}(k) = \frac{\big\langle \prod_{i=1}^{k} f_{X_i^k}(x_i^k) \big\rangle_{\frac{1}{1+r}}}{\big\langle f_{X^k}(x^k) \big\rangle_{\frac{k}{k+r}}}, \tag{30}
$$

and the shape advantage factor is

$$
S_{RC}(k) = \frac{\big\langle f_X(x) \big\rangle_{\frac{1}{1+r}}}{\big\langle \prod_{i=1}^{k} f_{X_i^k}(x_i^k) \big\rangle_{\frac{1}{1+r}}}. \tag{31}
$$

The space-filling advantage results from the optimal shape of the Voronoi regions for each dimensionality. The maximum gain of the space-filling advantage is approximately 1.53 dB for r = 2. As the dimensionality k goes to infinity, the Zador upper bound for C(k, r, V_k) approaches the sphere bound and C(k, r, V_k) → (2πe)^(−r/2) [19]. This is the minimum value of C(k, r, V_k) for the optimal vector quantizer, compared with C(1, 2, V_1) = 1/12 for the scalar quantizer.

The memory advantage accounts for the dependencies between the samples. If the vector components are independent, the memory advantage disappears. The shape advantage shows how well the centroid density of SQ can be matched to the optimal centroid density of VQ. The shape advantage is relevant only for RCQ (it is always unity for ECQ). If the source density is uniform, the shape advantage vanishes. In [30], the shape advantage is divided into the oblongitis advantage and the remaining density advantage. The oblongitis advantage is the ratio of the distortion of a quantizer with a cubic cell shape to that of a quantizer with a rectangular cell shape.

In the entropy-constrained case, the shape advantage of VQ vanishes since the optimal centroid density of ECQ is uniform, and the remaining VQ advantage factors are

$$
\frac{D_{EC}(1, r, V_1)}{D_{EC}(k, r, V_k)} = F_{EC}(k)\, M_{EC}(k) \tag{32}
$$

where F_EC(k) = F_RC(k). We note that the shape advantage still exists for a non-difference distortion measure, since the optimal centroid density is not uniform in (26).
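The 1.53 dB figure quoted for the space-filling advantage (r = 2, k → ∞) follows directly from the two quantization coefficients; a one-line check:

```python
import math

# Space-filling advantage for r = 2 as k -> infinity:
# scalar coefficient C(1,2,V1) = 1/12 versus the sphere bound 1/(2*pi*e).
C_scalar = 1.0 / 12.0
C_sphere = 1.0 / (2.0 * math.pi * math.e)
gain_db = 10.0 * math.log10(C_scalar / C_sphere)  # ~1.53 dB
```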

The space-filling advantage factor F_EC(k) is identical to F_RC(k), and the memory advantage factor is

$$
M_{EC}(k) = 2^{\frac{r}{k}\,(k h(X_i) - h(X^k))} \tag{33}
$$

where h(X_i) and h(X^k) are the differential entropies determined by \prod_{i=1}^{k} f_{X_i}(x_i) and f_{X^k}(x^k), respectively. In paper A, we propose a new resolution-constrained quantizer that utilizes the three VQ advantages more efficiently than a conventional speech coding paradigm, code-excited linear predictive (CELP) coding. Even though VQ has three advantages over SQ for the resolution-constrained case, VQ can be computationally complex and needs a huge amount of storage as the dimensionality k increases. To reduce complexity, structured quantizers, such as product, tree-structured, multi-stage, and transform-based quantizers, have been proposed. In [27–30], loss analyses for such quantizers based on high-rate theory are presented.
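For a Gaussian source, the differential entropies in (33) have a closed form, so M_EC(k) can be evaluated numerically. The sketch below is my own illustration (not from the thesis): it evaluates the entropy-constrained memory advantage of (33) for a stationary Gaussian AR(1) source with correlation coefficient a and unit marginal variance; the function name and parameters are hypothetical.

```python
import numpy as np

# Entropy-constrained memory advantage (33), M_EC(k) = 2^{(r/k)(k h(X_i) - h(X^k))},
# for a stationary unit-variance Gaussian AR(1) source with covariance C_ij = a^|i-j|.
def mec_gaussian_ar1(k, a, r=2):
    C = a ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # Toeplitz a^|i-j|
    h_joint = 0.5 * np.log2((2 * np.pi * np.e) ** k * np.linalg.det(C))
    h_marg = 0.5 * np.log2(2 * np.pi * np.e)   # differential entropy of N(0,1)
    return 2 ** ((r / k) * (k * h_marg - h_joint))

for k in (1, 2, 4, 16):
    print(k, 10 * np.log10(mec_gaussian_ar1(k, 0.9)))  # memory advantage in dB
```

For a = 0.9 the advantage is 0 dB at k = 1, as expected for a single sample, and grows toward 10 log10(1/(1 - a^2)), about 7.2 dB, as k increases; for independent components it stays at unity.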

2.3 Distortion Distribution

High-rate theory provides the asymptotic average distortion for a large number of quantization points. An analytical formula for the pdf of the distortion is presented in [31]. The pdf of the error depends on the source pdf, the centroid density, and the quantizer shape profile. For scalar quantization, the error pdf is [31]

f_V(v) = \int_{g_c(x) \le \frac{1}{2|v|}} f_X(x)\, g_c(x)\, dx    (34)

where v = x - \hat{x} is the quantization error. Using the formula for f_V(v), Lee and Neuhoff revisited the Bennett integral for scalar quantization [2, 30]. For vector quantization, if the normalized quantization error is w = N^{1/k} \|x^k - \hat{x}^k\|_r, the error pdf is [31]

f_W(w) = k \kappa_k w^{k-1} \int f_{X^k}(x^k)\, g_c(x^k)\, S\!\left(x^k, w\, g_c(x^k)^{1/k}\right) dx^k, \quad w \ge 0    (35)

where S(x^k, \rho) is the quantizer shape profile. The shape profiles of a circle, a hexagon, and several rectangles for the two-dimensional case are also given in [31]. The Bennett integral for vector quantization [30] can be proven using the error function f_W(w). For a spherical cell shape, the error pdf is approximated by

f_W(w) = k \kappa_k w^{k-1} \int_{c f_{X^k}(x^k)^{k/(k+r)} \le \kappa_k^{-1} w^{-k}} c\, f_{X^k}(x^k)^{(2k+r)/(k+r)}\, dx^k
       = k \kappa_k w^{k-1} \int c\, f_{X^k}(x^k)^{(2k+r)/(k+r)}\, dx^k, \quad \text{for } w \le \frac{1}{(c \kappa_k)^{1/k}\, f_M^{1/(k+r)}}    (36)

where c = \left[\int f_{X^k}(x^k)^{k/(k+r)}\, dx^k\right]^{-1} and f_M is the maximum value of f_{X^k}(x^k). Simulation results with 2-, 3-, and 4-dimensional Gaussian sources show that (36) is a good approximation. In [32], a rigorous derivation of formula (35) is given.
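The scalar formula (34) is easy to verify numerically. The following Monte Carlo sketch is my own construction (not from [31]): for a uniform source on [0,1) and an N-level uniform quantizer, the centroid density is constant, g_c(x) = N, so (34) predicts an error pdf that is flat at height N on [-1/(2N), 1/(2N)].

```python
import numpy as np

# Monte Carlo check of the scalar error pdf (34) for a uniform source and a
# uniform quantizer: the predicted f_V(v) equals N on [-1/(2N), 1/(2N)].
rng = np.random.default_rng(0)
N = 16
x = rng.random(200_000)
xhat = (np.floor(x * N) + 0.5) / N          # midpoint reconstruction levels
v = x - xhat                                 # quantization error
hist, _ = np.histogram(v, bins=20, range=(-0.5 / N, 0.5 / N), density=True)
print(hist.mean(), hist.std())               # mean ~ N = 16, small spread: flat pdf
```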

2.4 Robust Quantization against Source Mismatch

Source mismatch between training and test data is unavoidable in practice. Source mismatch occurs when a quantizer that is trained for one source is applied to another source. An upper bound on the performance loss of scalar quantizers due to source mismatch was derived in [41]. In rate-distortion theory, it is known that Gaussian sources yield the highest distortion at a given rate under a mean-squared-error criterion [43]. It is also known that if a code is designed for a Gaussian source with the mean and variance of the actual source and has mean distortion D_G, then the mean distortion of this code applied to another source with the same mean and variance is less than or equal to D_G [43, 47, 48]. The code generated for the worst source is called a robust code.

Another approach to robust quantization considers a collection of source pdf's with an r-th power distortion measure [40, 44, 45]. Since the source pdf is generally incompletely known in real applications, a class of source pdf's instead of a unique pdf is defined, and a quantizer is designed for that class of sources. The designed quantizer guarantees performance for the specified class of source pdf's. Robust design can be viewed as a two-person game: the designer attempts to minimize the distortion, while an opponent tries to choose the most difficult family of source pdf's to maximize the distortion. Let f, g, and D(f, g) be the source pdf, the centroid pdf, and the mean distortion, respectively. A robust quantizer (f^*, g^*) for RCQ exists if the following inequality holds:

D(f, g^*) \le D(f^*, g^*) \le D(f^*, g)    (37)

where f \in F and F is a predetermined class of pdf's. The left inequality means that the distortion is no more than D(f^*, g^*) for any f if the centroid density g^* is selected. The right inequality means that the distortion is not less than D(f^*, g^*) if the worst pdf f^* is chosen. Since (f^*, g^*) does not always exist, an alternative approach is the minimax solution

D(f^*, g^*) = \min_{g} \max_{f \in F} D(f, g).    (38)

For the entropy-constrained case, the best centroid density has been shown to be uniform [18–20]. Thus, for the entropy-constrained case, robust quantization is equivalent to finding the f \in F that maximizes the entropy. In [40], it is shown that, when the source amplitudes are bounded to [a, b], the scalar minimax quantizer for RCQ is uniform on [a, b]. If we have very little knowledge of the source, it is shown in [40] that uniform quantization is a good choice. In [44], the Bennett companding model was used to find the minimax solution for scalar quantization of a fixed-moment-constrained pdf. In [45], robust vector quantizers were proposed for three classes of distributions:

• the class with fixed moments:

F_1 = \left\{ f(x^k) : \int m(x^k) f(x^k)\, dx^k = q \right\}    (39)

where m(x^k) \ge 0 is the moment function and q is an arbitrary constant;

• the class of \epsilon-contaminated pdf's:

F_2 = \left\{ f(x^k) : f(x^k) = (1 - \epsilon) f_k(x^k) + \epsilon f_u(x^k) \right\}    (40)

where f_k(x^k) and f_u(x^k) are the known and unknown pdf's, respectively; and

• the class of pdf's characterized by a lower and an upper bound on amplitude:

F_3 = \left\{ f(x^k) : f_1(x^k) \le f(x^k) \le f_2(x^k) \right\}    (41)

where f_1(x^k) and f_2(x^k) are known functions, but not necessarily density functions. It is shown in [45] that the maximum-norm element of the above classes of distributions is the solution f^*:

f^*(x^k) = \arg\max_{f(x^k) \in F} \left\| f(x^k) \right\|_{k/(k+r)}    (42)

and the optimal centroid density is

g^*(x^k) = N\, \frac{f^*(x^k)^{k/(k+r)}}{\int_{x^k \in R^k} f^*(x^k)^{k/(k+r)}\, dx^k}.    (43)

In [35], it is shown that when a resolution-constrained scalar quantizer designed for a source pdf f_X(x) is tested with another source f_Y(x), the distortion loss due to the mismatch is, in the high-rate limit, given in the log domain by the relative entropy H(f_Y(x) \| f_X(x)). In [50–52], it is shown that when an entropy-constrained vector quantizer that is trained optimally for f_{X^k}(x^k) is applied to a source f_{Y^k}(x^k), the rate loss is given by H(f_{Y^k}(x^k) \| f_{X^k}(x^k)), provided that f_{Y^k}(x^k)/f_{X^k}(x^k) is bounded. Interestingly, the distortion loss of RCQ and the rate loss of ECQ both depend on the relative entropy between the mismatched sources.

In papers C and D, the source mismatch problem is considered in the design of quantizers. We consider mean and variance mismatch of the Gaussian source, and we also apply the quantizers to practical situations such as the training/testing data mismatch of line spectral frequencies (LSF) in a speech coding application.

2.5 Comparison between Resolution-Constrained and Entropy-Constrained Quantization

From high-rate theory, RCQ and ECQ are known to be the optimal fixed-rate and variable-rate coding schemes, respectively. If the channel allows variable-rate transmission, ECQ is the best choice. For fixed-rate networks, such as the PSTN, RCQ has been used widely [141, 142, 146]. ECQ has three advantages over RCQ: a reduction of the rate required to obtain the same mean distortion, a reduction of the number of distortion outliers, and increased robustness against source pdf mismatch. For a Gaussian source, Max showed that optimal ECQ yields a reduction of around 0.47 bits/sample over optimal RCQ in the scalar case [4]. For vector quantization, as k increases, the required rate difference between ECQ and RCQ approaches about 0.72 bits in total. Thus, the per-sample rate advantage of ECQ over RCQ decreases as k increases. We also note that optimal ECQ needs variable-rate transmission, which is not always feasible in practical situations and which is more sensitive to channel errors.

The optimal centroid density of ECQ is uniformly distributed, while that of RCQ depends on the source pdf [3, 5, 39, 54, 56, 64]. For RCQ, in sparsely populated regions of the source pdf, the volumes of the Voronoi regions are large. Large Voronoi regions produce distortion outliers that are often not acceptable in terms of perceptual quality. ECQ, however, produces equal cells, so the mean distortion of the individual cells is identical. Since the mean-distortion variance between cells is zero, ECQ produces a smaller number of distortion outliers than RCQ at the same average distortion. If the pdf of the test source differs from the pdf of the training data, the mean distortion of RCQ increases at a given fixed rate. For ECQ, misestimation of the source pdf instead increases the average rate, while the distortion performance is preserved.
The three advantages of ECQ result from uniform quantization with lossless coding. In papers C and D, we design a new RCQ scheme to efficiently exploit the advantages of ECQ.

2.6 Comparison between R-D Theory and High-Rate Theory

The Shannon rate-distortion bound considers the dimensionality of the source samples to be asymptotically high for a given average rate, while high-rate theory assumes that the codebook size is asymptotically large for a fixed dimensionality [7, 58–62]. If both dimensionality and rate are infinite, the rate-distortion bound and the high-rate bound are identical. However, for moderate dimensionality and moderate rate, high-rate theory is preferred. For an independent and identically distributed (iid) Gaussian source, high-rate theory is known to be accurate at rates higher than 2–3 bits/symbol, whereas rate-distortion theory requires more than 250 dimensions [7]. In papers A, C, and D, new quantization methods are proposed, and their performance is analyzed with high-rate theory.

The Shannon rate-distortion bound can be calculated analytically for squared-error and absolute-error criteria with Gaussian and Laplacian sources. If the rate-distortion bound cannot be computed analytically, the Blahut algorithm can be used [57]. To calculate the high-rate bound based on the works of Zador and Gersho, we must know the quantization coefficient C(k, r, V_k) [20, 64]. If analytic calculation of the high-rate bound is not feasible, numerical integration is utilized [63]. Since rate-distortion theory assumes an average rate over a long block, it is more natural to compare the rate-distortion function with high-rate ECQ than with high-rate RCQ. For a squared-error criterion (r=2), the Shannon lower bound in (1) is given by

R_{SLB}(\bar{D}) = \frac{1}{k} h(X^k) - \frac{1}{2} \log_2(2\pi e \bar{D})    (44)

where \bar{D} = D/k. The average rate of high-rate ECQ at mean distortion \bar{D}_{EC} is

\frac{1}{k} H(I) = \frac{1}{k} h(X^k) - \frac{1}{2} \log_2\!\left(\frac{\bar{D}_{EC}}{C(k, 2, V_k)}\right).    (45)

By subtracting (44) from (45), we obtain

\frac{1}{k} H(I) - R_{SLB}(\bar{D}) = \frac{1}{2} \log_2\!\left(2\pi e\, C(k, 2, V_k)\right).    (46)

For a scalar quantizer, C(1, 2, V_1) = 1/12 and thus the right-hand side of (46) is about 0.25 bit. As the dimensionality k increases, C(k, 2, V_k) approaches 1/(2\pi e) and the high-rate performance of ECQ reaches the Shannon lower bound. As k increases, the high-rate performance of RCQ also approaches the Shannon lower bound [62].
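The 0.25-bit figure follows directly from (46); a two-line numerical check (my own, with a hypothetical helper name):

```python
import math

# Gap (46) between high-rate ECQ and the Shannon lower bound, in bits:
# 0.5 * log2(2*pi*e*C). For a scalar quantizer C = 1/12; as k -> infinity,
# C -> 1/(2*pi*e) and the gap vanishes.
def ecq_gap_bits(C):
    return 0.5 * math.log2(2 * math.pi * math.e * C)

print(ecq_gap_bits(1 / 12))                      # ~0.254 bit for scalar ECQ
print(ecq_gap_bits(1 / (2 * math.pi * math.e)))  # 0.0 in the high-dimensional limit
```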

3 Quantizer Design

Rate-distortion theory and high-rate theory lead to analytical expressions for quantizer performance. In this section, we introduce practical quantizer designs: the GLA, ECVQ, lattice quantization, and random codebook generation.

3.1 Generalized Lloyd Algorithm

The design problem for quantizers has been considered in numerous studies. Lloyd [3] found optimality conditions for the N-level resolution-constrained scalar quantizer and proposed an iterative algorithm for the mean-squared-error criterion. The optimality conditions are that the set of Voronoi regions must be optimal for the given set of centroids, and that the set of centroids must be optimal for the given set of Voronoi regions. If the squared-error measure is used, for a given Voronoi region V_i, the centroid c_i is given by

c_i = \arg\min_{y \in R} E[d(x, y) \mid x \in V_i] = E[x \mid x \in V_i].    (47)

Lloyd's optimality conditions for r-th power distortion measures were proven by Max [4]. Linde, Buzo, and Gray (LBG) [5] generalized the resolution-constrained Lloyd algorithm to vector quantization, for either a known source pdf or source sample data. The GLA is a double minimization problem and can be solved in an iterative manner. First, the optimal partition \{V_i\} is found for given centroids \{c_i^k\}. Second, given the partition \{V_i\}, the optimal centroids \{c_i^k\} are found. The iterations are terminated when the improvement per iteration falls below a threshold. In [5], a splitting-based initialization method was proposed. The mean of the source is usually assigned as the initial codevector c_1^k, and the codebook is doubled in size by splitting. Given N quantization points c_1^k, ..., c_N^k, 2N new quantization points are generated for GLA training by splitting c_i^k into c_i^k + \epsilon^k and c_i^k - \epsilon^k, where \epsilon^k is a fixed perturbation vector. The mean distortion predicted by high-rate theory and that produced by LBG training are similar at moderate rates [5]. Since the source pdf is usually unknown, training data can be applied directly to find the centroids and their Voronoi regions. For an unknown pdf with a squared-error measure, instead of using (47), the centroid is estimated by

c_i = \frac{1}{|V_i|} \sum_{x \in V_i} x    (48)

where |V_i| is the number of training vectors x in the i-th Voronoi region V_i. The Voronoi region is then a collection of data points rather than a partition of the data space by cell boundaries. This algorithm is called the discrete GLA; it converges in a finite number of iterations. Since unstructured vector quantization requires high complexity and large memory as dimensionality and rate increase, structured vector quantizers have been developed, such as tree-structured VQ [14], multi-stage VQ [72], pyramid VQ [73], polar VQ [74], split VQ [16], and lattice VQ [20, 64]. However, these structured vector quantizers generally yield suboptimal performance.
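The discrete GLA with splitting initialization can be sketched in a few lines; the code below is my own simplified rendition (function name, threshold, and perturbation are illustrative, not the exact procedure of [5]):

```python
import numpy as np

# Discrete GLA: splitting initialization, then Lloyd iterations alternating the
# nearest-neighbor (Voronoi) step and the centroid step (48) until the relative
# improvement falls below a threshold.
def discrete_gla(X, n_levels, eps=1e-3, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    C = X.mean(axis=0, keepdims=True)              # initial codevector: the source mean
    while len(C) < n_levels:
        pert = eps * rng.standard_normal(C.shape)  # fixed perturbation vector
        C = np.vstack([C + pert, C - pert])        # split every codevector
        prev = np.inf                              # inf: never break on the first pass
        while True:
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            idx = d2.argmin(1)                     # Voronoi step
            D = d2[np.arange(len(X)), idx].mean()
            if prev - D < tol * prev:
                break
            prev = D
            for i in range(len(C)):                # centroid step, eq. (48)
                if np.any(idx == i):
                    C[i] = X[idx == i].mean(0)
    return C, D

X = np.random.default_rng(1).standard_normal((4000, 2))
C, D = discrete_gla(X, 8)
print(C.shape, D)   # an 8-level, 2-dimensional codebook and its mean distortion
```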

3.2 Entropy-Constrained Vector Quantization

The GLA can be extended to ECVQ [6]. Based on a Lagrangian formulation, a new distortion criterion is defined as

\eta = d(x^k, c_i^k) + \lambda l_i    (49)

where c_i^k, \lambda, and l_i are the i-th centroid, a Lagrange multiplier, and the length of the i-th codeword, respectively. The codeword length for the i-th Voronoi region is given by

l_i = -\log_2\!\left(\frac{|V_i|}{\sum_{i=1}^{N} |V_i|}\right).    (50)

A Voronoi region for ECVQ is estimated for a given centroid c_i^k and its corresponding codeword length l_i:

V_i = \left\{ x^k \in R^k : d(x^k, c_i^k) + \lambda l_i \le d(x^k, c_j^k) + \lambda l_j\ \forall j > i,\ \ d(x^k, c_i^k) + \lambda l_i < d(x^k, c_j^k) + \lambda l_j\ \forall j < i \right\}    (51)

and its new centroid and codeword length are calculated by (48) and (50), respectively, in an iterative manner. The iteration over V_i, l_i, and c_i^k is terminated when the improvement per iteration falls below a threshold; the improvement is measured with the distortion criterion in (49). The Lagrange multiplier \lambda determines the operating point in average rate and distortion. If \lambda = 0, ECVQ reduces to the GLA.
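The ECVQ iteration above can be sketched as follows; this is my own toy simplification of [6] (variable names and iteration count are illustrative):

```python
import numpy as np

# Toy ECVQ: the Lagrangian criterion (49), d(x, c_i) + lambda*l_i, replaces
# plain distortion in the partition step; codeword lengths follow (50) and
# centroids follow (48).
def ecvq(X, n_init, lam, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_init, replace=False)].astype(float)
    l = np.full(len(C), np.log2(n_init))            # start from fixed-rate lengths
    for _ in range(n_iter):
        cost = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) + lam * l
        idx = cost.argmin(1)                        # Lagrangian partition, eq. (51)
        counts = np.bincount(idx, minlength=len(C))
        keep = counts > 0                           # empty cells die out
        C, counts = C[keep], counts[keep]
        idx = np.searchsorted(np.flatnonzero(keep), idx)
        l = -np.log2(counts / counts.sum())         # codeword lengths, eq. (50)
        for i in range(len(C)):
            C[i] = X[idx == i].mean(0)              # centroid update, eq. (48)
    cost = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) + lam * l
    idx = cost.argmin(1)
    D = ((X - C[idx]) ** 2).sum(-1).mean()          # mean distortion
    R = l[idx].mean()                               # average rate in bits
    return C, D, R

X = np.random.default_rng(1).standard_normal((3000, 2))
C, D, R = ecvq(X, 64, lam=0.1)
print(len(C), D, R)   # a larger lambda trades distortion for a lower average rate
```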

3.3 Lattice Quantization

Gersho conjectured that optimal high-rate quantization of a uniform source has Voronoi regions that are congruent to some polytope V, which naturally leads to lattice vector quantization [20]. Lattice vector quantization with lossless coding can be seen as an extension of the work of Gish and Pierce [18]. In lattice quantization (without considering companding), all the Voronoi regions have the same shape, size, and orientation. The design of a lattice VQ can be separated into finding a generator matrix G, finding a truncation and scaling factor, and assigning indices to the codevectors. The lattice is defined by

\Lambda = \left\{ G^T u^k : u^k \in Z^k \right\}.    (52)

To use only a finite number of centroids, the lattice must be truncated:

C = \Lambda \cap S    (53)

where S is a bounded region in R^k. The truncated lattice is then multiplied by a scaling factor to cover the source pdf. The truncation shape and scaling factor can be estimated with repeated experiments to reduce the distortion [68]. Lattice VQ has fast encoding and requires little memory, which allows much larger dimensionality than GLA-based unconstrained VQ. Since the centroids and their Voronoi regions are generated with G, almost no memory is required to store the codebook. In [69], a fast index assignment algorithm based on a Voronoi code was proposed. The Voronoi code C_\Lambda consists of all vectors x^k - b^k for x^k \in \Lambda \cap (b^k + V_r), where V_r is the r-times magnified Voronoi region. This algorithm requires a fast coding method to find the closest lattice point. In [70], Conway and Sloane presented a fast coding method for a variety of lattices. In [71], Sayood and Blankenau proposed a more generic coding algorithm that works with all kinds of lattices but is slower. For RCQ of non-uniform sources, a lattice VQ (uniform SQ for k=1) can be combined with the companding proposed by Bennett and Gersho [2, 20]. For ECQ, since the optimal centroid density is uniform, a lattice VQ followed by lossless coding can be used. The performance of lattice quantization depends on the quantization coefficient. If k and r are fixed, the quantization coefficient is determined by the lattice structure G. Although a sphere would be the optimal cell shape for quantization, we cannot partition a k-dimensional space into spheres, and hence the associated lower bound on distortion is overly optimistic. Conway and Sloane found the best-performing known lattices for infinite uniform pdfs, for example, in 2–8 dimensions: A_2, A_3^*, D_4, D_5^*, E_6^*, E_7^*, and E_8 [64–66]. Agrell and Eriksson found them for 9–10 dimensions [67]. Jeong and Gibson [68] argued that, for RCQ, the lattice should be truncated by a contour of constant pdf of the source.
The theta function is used to truncate a lattice with the correct number of VQ points. They showed that the shape of the constant pdf contours for a Gaussian source is a sphere, while that for a Laplacian source is a pyramid.
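The fast coding methods of Conway and Sloane [70] exploit lattice structure. As a concrete example, nearest-point search in the D_k lattice (the integer vectors with even coordinate sum) needs only rounding and one parity fix; the sketch below is my rendition of that well-known procedure:

```python
import numpy as np

# Nearest point of the D_k lattice (Z^k points with even coordinate sum):
# round every coordinate; if the parity is odd, re-round the coordinate that
# was furthest from its integer, in the other direction.
def closest_point_Dk(x):
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        i = np.argmax(np.abs(x - f))             # worst-rounded coordinate
        f[i] += 1.0 if x[i] > f[i] else -1.0     # round it the other way
    return f

x = np.array([0.6, 0.1, 0.1, 0.1])
print(closest_point_Dk(x))   # [0. 0. 0. 0.], the nearest even-sum point
```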

3.4 Random Coding based on High-Rate Theory

Zador gave a distortion upper bound for a random code [19]. This upper bound converges to the sphere bound as the dimensionality increases. For sufficiently high dimensionality, a random codebook generated from the optimal centroid density based on high-rate theory gives performance similar to that of optimal codebook design methods such as the GLA [39, 54–56]. To generate an RCQ codebook with random coding, the source pdf f_{X^k}(x^k) must be known. It can be approximated well by a Gaussian mixture model (GMM). A GMM is a weighted sum of multivariate Gaussian densities:

f_{X^k}(x^k) = \sum_{i=1}^{M} \rho_i f_i(x^k \mid u_i, C_i)    (54)

where the weights \rho_i, means u_i, and covariance matrices C_i of the GMM components are calculated with the expectation-maximization (EM) algorithm. The corresponding centroid density g_c(x^k) for RCQ is given by high-rate theory in (20). Since random codebook generation based on this centroid density is not simple, the codebook is instead generated for the GMM-based pdf in [39, 54, 55]. For the GMM-based pdf, a random codevector is generated in two steps: (1) a mixture component i is selected with probability \rho_i / \sum_{j=1}^{M} \rho_j, and (2) a codeword is randomly generated from the selected Gaussian pdf f_i(x^k \mid u_i, C_i). In [54], the codebooks are generated with the centroid density in (20) by using the rejection method [63]. It is shown that both methods produce similar distortion for a high-dimensional source (k=10 in their spectral coding example). The above codebook design method does not need to store a codebook. Thus, the complexity resulting from the memory requirement is negligible even for high-dimensional VQ. In [39, 54–56], performance bounds of spectral quantization for the 10-dimensional case were given. In papers C and D, new high-rate distortion bounds and centroid densities are found for better perceptual quality, and GMM-based random coding is a promising design alternative.
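The two-step GMM sampling described above can be sketched as follows (my own illustration with a hypothetical two-component model; this is not the codebook of [39, 54, 55]):

```python
import numpy as np

# Two-step random codebook generation for a GMM source model: draw a mixture
# component with probability rho_i / sum_j rho_j, then draw the codevector
# from the selected Gaussian.
def gmm_random_codebook(rho, means, covs, N, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(rho, float)
    comp = rng.choice(len(p), size=N, p=p / p.sum())        # step 1: components
    return np.stack([rng.multivariate_normal(means[i], covs[i]) for i in comp])

# hypothetical 2-component, 2-dimensional model
rho = [0.7, 0.3]
means = [np.zeros(2), np.array([4.0, 4.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
cb = gmm_random_codebook(rho, means, covs, 1024)
print(cb.shape)   # (1024, 2): no codebook training or storage is needed
```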

4 Channel Coding

Source coding attempts to remove all redundancy to represent a source efficiently in compressed form, whereas channel coding adds redundancy to recover from channel errors. For example, in the Internet, channel errors include single-bit errors and packet loss. In this thesis, we focus on channel coding for packet-loss recovery. We assume that bit errors of wired/wireless channels are corrected by bit-error-correcting codes, such as convolutional codes, and that packet losses are corrected by packet-loss recovery techniques. In paper B, we compare packet-loss recovery techniques to find the best one for varying channel conditions at a given overall rate. The Shannon separation theorem states that source and channel coding can be designed separately and still achieve the optimal performance of a communication system, provided that the block length is infinite and the entropy rate of the source is less than the channel capacity of a time-invariant system [1]. However, infinite block length implies infinite delay and infinite complexity, which is not feasible in reality. The communication channels may also be time-varying and band-limited, especially in multi-user communication systems. Thus, for optimal system design, source and channel coding must be jointly optimized [75, 76]. In [77], Sayood et al. classified joint source-channel coding into four classes:

• joint source-channel coders, where source and channel coders are truly integrated (e.g., channel-optimized quantization [7]);

• concatenated source-channel coders, where source and channel coders are cascaded and a fixed total rate is divided between the source and channel coders according to the channel condition (e.g., FEC [87] for adaptive multirate (AMR) speech coding [154]);

• unequal error protection, where more bits are allocated to the more important parts of the source (e.g., uneven level protection [83]);

• and constrained source-channel coding, where source encoders and decoders are modified depending on the channel condition (e.g., MDC [94, 104, 106, 111]).

In the following subsections, a variety of packet-loss recovery techniques is introduced. In particular, the second class of joint source-channel coding (FEC) and the fourth class (MDC) are studied more thoroughly for packet-loss recovery.

4.1 Packet-Loss Recovery Techniques

Packet-loss recovery techniques are divided into two classes, transmitter-based and receiver-based, as shown in Fig. 2 [78–80]. Receiver-based error concealment techniques are of use when a transmitter-based scheme repairs most of the burst losses and leaves only a small number of gaps to be repaired [81, 82]. Error concealment schemes produce a replacement for a lost packet. This is possible since the perceived features of audiovisual signals change slowly over time, so that they are quite similar to each other within a short time period.

Error concealment methods can be divided into three categories: insertion, interpolation, and regeneration. Insertion-based schemes repair losses by inserting a fill-in packet, by splicing (zero-length fill-in), silence substitution (zero-amplitude fill-in), noise substitution, repetition of the previous packet (possibly with fading), or extrapolation (for example, losses during voiced speech are corrected using pitch-cycle waveform repetition) [142]. Interpolation-based schemes fill gaps by stretching the received information at both ends of the losses. Interpolation-based schemes, such as time-scale modification, have better reconstruction quality than insertion-based techniques at the cost of additional delay. Regeneration uses knowledge of the source coding algorithm to synthesize lost parts of the audiovisual signal. For example, in [146], the spectrum parameters, excitation signal, and gain parameters are extrapolated instead of extrapolating the speech signal itself.

[Figure 2 depicts the taxonomy: the transmitter-based class comprises automatic repeat request (ARQ), forward error correction (FEC), with media-independent (MI-FEC) and media-specific (MS-FEC) variants, multiple description coding (MDC), interleaving, layered coding (LC), and uneven level protection (ULP); the receiver-based class comprises insertion, interpolation, and regeneration.]

Figure 2: Transmitter-based and receiver-based packet-loss recovery techniques.

The basic transmitter-based mechanisms available to recover packet loss include automatic repeat request (ARQ), media-independent FEC (MI-FEC, or generic FEC) and media-specific FEC (MS-FEC), multiple description coding (MDC), interleaving, layered coding (LC), and uneven level protection (ULP). If the end-to-end delay is of secondary importance, as in radio broadcasts, ARQ, FEC, and interleaving can be successfully applied for packet-loss recovery. Interleaving distributes a burst error into multiple single errors with large gaps between them, while introducing no additional network load. If the delay is of critical importance, then low-delay FEC and MDC schemes can be used. In paper B, we compare the rate-distortion bounds of FEC and MDC for real-time audiovisual communication systems. ULP [83] is used, e.g., in combination with FEC and MDC to exploit the unequal importance of data. UDP Lite provides the application layer with a corrupted packet instead of discarding it at a protocol layer; it is up to the application to decide whether the damaged packet can be used or should be discarded. It has to be noted that even though the described techniques help to solve the packet-loss recovery problem, some of them, such as ARQ and FEC, introduce additional load on the network, which results in increased

probability of congestion and packet loss. The best tradeoff always depends on the application and network conditions.

[Figure 3 shows an FEC encoder producing one parity packet P1 from data packets D1 and D2; when D2 is lost on the channel, the FEC decoder recovers it from the received D1 and P1.]

Figure 3: Example of media-independent FEC [79].

The basic channel-coding blocks for real-time audiovisual communication systems are

• FEC and MDC for transmitter-based packet-loss recovery;
• FEC and ULP for bit-error detection and correction;
• and receiver-based error concealment.

In the following subsections, we describe the basic ideas of FEC, MDC, and rate-distortion optimized packet-loss recovery techniques.
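The interleaving idea mentioned above, spreading a burst loss into isolated single losses at no extra rate, can be illustrated with a toy block interleaver (my own sketch, not a scheme from the thesis):

```python
# Toy block interleaver: packets are written row-wise into a matrix and sent
# column-wise, so a short burst of consecutive transmitted packets maps to
# isolated single losses after de-interleaving.
def interleave(seq, depth):
    rows = [seq[i:i + depth] for i in range(0, len(seq), depth)]
    return [row[c] for c in range(depth) for row in rows]

def deinterleave(seq, depth):
    width = len(seq) // depth
    return [seq[c * width + r] for r in range(width) for c in range(depth)]

pkts = list(range(12))
sent = interleave(pkts, 3)
lost = set(sent[4:7])                    # a burst wipes out 3 consecutive packets
recv = [p if p not in lost else None for p in sent]
print(deinterleave(recv, 3))  # [0, None, 2, 3, None, 5, 6, None, 8, 9, 10, 11]
```

After de-interleaving, the three losses are no longer consecutive, which makes them much easier to conceal at the receiver.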

4.2 Forward Error Correction (FEC)

In FEC, lost data are recovered at the receiver without further reference to the transmitter [87]. Both the original data and the redundant information are transmitted to the receiver. There are two kinds of redundant information: independent of, or dependent on, the media stream:

• Generic FEC (media-independent FEC) [84] does not require information about the original data type (speech, audio, or video). In generic FEC, the original data together with some redundant data, called parities, are transmitted to the receiver.

• In media-specific FEC (MS-FEC), if an original data packet is lost, redundant data packets, which are related to the specific medium, are used to recover the loss [85]. The first transmitted data is referred to as the primary encoding and the redundant transmissions as secondary encodings. Usually, the secondary packet is produced using a lower-bandwidth encoding method than the primary encoding, which results in lower quality.

In paper B, we show that MS-FEC is a subclass of MDC; in the remainder of this section, FEC means generic FEC. In generic FEC, k original data packets and h additional redundant parity packets are transmitted. Figure 3 shows an example for k=2 and h=1. The FEC encoder produces one redundant packet (P1) for two data packets (D1 and D2). If one data packet (D2) is lost, the receiver can recover it using the successfully received packets D1 and P1. In generic FEC, the redundant data, called parity, is derived from the original data using the exclusive-OR (XOR) operation (one parity packet is generated for a given set of original packets), using Reed-Solomon codes (multiple independent parities can be computed for the same set of packets), or using other techniques. Reed-Solomon codes can achieve optimal loss protection, but lead to higher processing costs than schemes based on the XOR operation. The advantages and disadvantages of generic FEC schemes are:

• the operation of generic FEC is source independent;
• the original data packets can be used by receivers that are not capable of FEC;
• FEC requires additional bandwidth for the channel coding;
• parity packets can be useless at the decoder if the number of received packets exceeds the number required to recover the packet losses.

Linear block codes and convolutional codes are subclasses of generic FEC. In the following sections, we focus on linear block codes [86, 87].
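The XOR-based generic FEC of Figure 3 can be illustrated in a few lines (my sketch; the packet contents are arbitrary placeholder bytes):

```python
# One XOR parity packet P1 protects data packets D1 and D2 (k=2, h=1): if
# exactly one data packet is lost, XOR-ing the survivors restores it.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

D1, D2 = b"hello wo", b"rld!!!!!"
P1 = xor_bytes(D1, D2)             # parity packet, same length as the data

# the channel loses D2; the receiver still has D1 and P1
recovered = xor_bytes(D1, P1)
print(recovered)                   # b'rld!!!!!'
```

Note that with a single XOR parity, any two losses among {D1, D2, P1} are unrecoverable; Reed-Solomon codes lift this restriction at a higher computational cost.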

Linear Block Codes

Linear block codes generate n code elements u = \{u_1, u_2, ..., u_n\} from k original message elements o = \{o_1, o_2, ..., o_k\} with a k \times n generating matrix G:

u = oG.    (55)

If we denote the channel error term by e, the signal r received by the decoder is

r = u + e = oG + e.    (56)

For each generating matrix, a parity-check matrix H can be constructed such that GH^T = 0. For error-free channels, e = 0 and thus rH^T = 0. However, if detectable channel errors occur, rH^T is not zero. If the detected errors are correctable, the syndrome vector

S = rH^T = eH^T    (57)

represents a unique error-pattern sequence that is called the coset. If the minimum distance of the codewords in a vector space V_n is d_{min}, the error-detection capability \epsilon and the error-correction capability t are

\epsilon = d_{min} - 1
t = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor    (58)

and such codes are denoted as (n, k, t) codes. One class of linear block codes is the cyclic codes. If u = \{u_0, u_1, ..., u_n\} is a cyclic codeword, single and multiple cyclic shifts of the codeword are also cyclic codewords. Bose-Chaudhuri-Hocquenghem (BCH) codes and Reed-Solomon (RS) codes are examples of cyclic codes. BCH codes are generalizations of the Hamming code [86, 88, 89]:

(n, k, t) = (2^m - 1,\ 2^m - 1 - mt,\ t).    (59)

RS codes are maximum-distance separable, and thus have the best error-correcting capability for a given length and dimensionality [86, 90, 91]:

(n, k, t) = (2^m - 1,\ 2^m - 1 - 2t,\ t).    (60)

RS codes are constructed in an extended Galois field GF(p^m), where p is a prime number and m a positive integer. RS codes can be shortened to any desired size from larger codes, for example by padding the unused parts of the blocks with zeros. These zeros are not transmitted, but they are reinserted at the decoder to recover lost packets. The CD format utilizes shortened RS codes. The error patterns in RS codes are not important as long as the number of incorrectly received symbols does not exceed the maximum number of correctable symbols. With an XOR operation, the pattern of recoverable lost data is much more restricted. However, RS codes are more complex than the XOR operation, which is an important disadvantage of RS codes. This higher complexity is mostly due to the syndrome test in (57), which is used to find a unique error pattern among the huge number of syndrome sets. However, for erasure channels such as packet-switched networks, the error locations are known to the decoder and the complexity of the RS code is not a problem. Moreover, the error-correction capability t is increased to

t = d_{min} - 1 = n - k.    (61)

Half of the information in the syndromes is used to detect the error locations, and the other half is used to correct the detected errors. If the error patterns are known, all the syndrome information can be used to correct errors.
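As a minimal worked example of (55)-(58) (my own, using the classical (7,4) Hamming code rather than an RS code), the syndrome S = rH^T locates a single bit error by matching a column of H:

```python
import numpy as np

# (7,4,1) Hamming code over GF(2): u = oG, syndrome S = rH^T as in (57); a
# single bit error is located by matching S against the columns of H.
G = np.array([[1,0,0,0,1,1,0],
              [0,1,0,0,1,0,1],
              [0,0,1,0,0,1,1],
              [0,0,0,1,1,1,1]])
H = np.array([[1,1,0,1,1,0,0],
              [1,0,1,1,0,1,0],
              [0,1,1,1,0,0,1]])

o = np.array([1, 0, 1, 1])
u = o @ G % 2                        # codeword (55)
assert not np.any(u @ H.T % 2)       # GH^T = 0, so a clean codeword has S = 0

r = u.copy(); r[5] ^= 1              # single bit error in position 5
S = r @ H.T % 2                      # syndrome, eq. (57)
err = int(np.flatnonzero((H.T == S).all(axis=1))[0])
r[err] ^= 1                          # flip the located bit back
print(np.array_equal(r, u))          # True
```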

[Figure 4 shows two encoders operating at rates R1 and R2 over channels 1 and 2; side decoders 1 and 2 each reconstruct an estimate X̂1 or X̂2 from a single description with side distortion D1 or D2, while the central decoder 0 uses both descriptions to produce X̂0 with central distortion D0.]

Figure 4: Multiple description coding: two encoders and three decoders.

Turbo codes [93] and low-density parity-check (LDPC) codes [92] have a performance that approaches the Shannon limit. These codes use iterative decoding with soft decisions based on likelihood functions. Turbo codes are concatenated forms of two or more convolutional or block codes. In this thesis, since we focus on packet-loss recovery techniques under stringent delay constraints, we mainly consider the RS codes. However, for bit-error channels, other error-correcting codes such as turbo codes and LDPC codes can be deployed.

4.3 Multiple Description Coding (MDC)

Channel coding adds redundancy to encoded source streams to obtain robustness against channel errors. If we simply send identical encoded streams twice over a channel, one of the descriptions is redundant whenever both descriptions are successfully received at the decoder. MDC divides the data into multiple streams such that decoding any subset yields at least a minimum acceptable quality, and higher quality is obtained by receiving more descriptions [94, 104, 106, 111]. As shown in Figure 4, for the case of two encoders and three decoders, the information to be transmitted to the receiver is divided into two descriptions. If both descriptions arrive at the receiver, the best possible quality D_0 can be obtained. We expect reasonable quality, D_1 or D_2, even if one packet is lost. In the following subsections, the MDC rate-distortion region is discussed to find the achievable rate-distortion sets for particular sources and distortion measures. Then, we introduce three major practical MDC coders.
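The two-description idea of Figure 4 can be illustrated with a toy sketch (not a scheme from the thesis; the split into even/odd samples and the repetition-based side decoder are invented simplifications): each description alone yields a usable but degraded signal, while both together yield the central-quality reconstruction.

```python
# Toy MDC: split a signal into even- and odd-indexed samples, one
# description per channel. The central decoder interleaves both; a side
# decoder fills the missing samples by repetition (so D1, D2 > D0).

def mdc_encode(x):
    return x[0::2], x[1::2]          # description 1, description 2

def central_decode(d1, d2):
    out = []
    for a, b in zip(d1, d2):
        out += [a, b]
    return out

def side_decode(d, n):
    """Reconstruct all n samples from one description by sample repetition."""
    out = []
    for a in d:
        out += [a, a]
    return out[:n]

x = [1, 4, 2, 8, 5, 7]
d1, d2 = mdc_encode(x)
assert central_decode(d1, d2) == x        # lossless central path in this toy
approx = side_decode(d1, len(x))          # degraded but usable side path
assert len(approx) == len(x)
```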

Rate-Distortion Region of MDC

The MDC problem arose from the idea of diversity-based communication systems, where the information is split and sent over separate paths. For two descriptions, El Gamal and Cover (EGC) [94] found an achievable rate-distortion region (R_1, R_2, D_0, D_1, D_2) for a sequence of iid random variables X = {x_1, x_2, ..., x_n}:

    R_1 > I(X; X̂_1)
    R_2 > I(X; X̂_2)
    R_1 + R_2 > I(X; X̂_1, X̂_2, X̂_0) + I(X̂_1; X̂_2)    (62)

such that

    D_1 ≥ E[d(X; X̂_1)]
    D_2 ≥ E[d(X; X̂_2)]
    D_0 ≥ E[d(X; X̂_0)]    (63)

where X̂_1, X̂_2, and X̂_0 are three estimates of X, and I(·;·) is the mutual information. Ozarow [95] showed that the EGC bound is tight for the iid Gaussian source and the squared-error criterion. Given the rates R_1 and R_2, the central distortion, D_0, and the side distortions, D_1 and D_2, satisfy

    D_0 ≥ 2^{−2(R_1+R_2)} / (1 − (√Π − √Δ)^2),  if Π > Δ
    D_0 ≥ 2^{−2(R_1+R_2)},                      otherwise

    D_1 ≥ 2^{−2R_1}
    D_2 ≥ 2^{−2R_2}    (64)

where Π = (1 − D_1)(1 − D_2) and Δ = D_1 D_2 − 2^{−2(R_1+R_2)}. For other input sources and distortion measures, the MDC bound is not completely known and remains an open problem. Ahlswede [96, 97] proved that the EGC bound is tight for the “no excess rate” case where R_1 + R_2 = R_0(D_0). Zhang and Berger [97] proved that this bound is not tight for the “excess rate” case where R_1 + R_2 > R_0(D_0). In [98], inner and outer bounds of the achievable MDC region, R_X(D_0, D_1, D_2), were found for general memoryless sources and a squared-error measure:
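The Ozarow region (64) is easy to evaluate numerically. The sketch below assumes a unit-variance iid Gaussian source; as a labeled simplification, the case Δ < 0 is mapped to the second branch, which is still a valid (looser) lower bound.

```python
# Numerical sketch of the Ozarow central-distortion bound (64)
# (assumption: Delta < 0 falls back to the looser second branch).
import math

def ozarow_d0_bound(R1, R2, D1, D2):
    base = 2.0 ** (-2.0 * (R1 + R2))
    Pi = (1.0 - D1) * (1.0 - D2)
    Delta = D1 * D2 - base
    if Delta >= 0.0 and Pi > Delta:
        return base / (1.0 - (math.sqrt(Pi) - math.sqrt(Delta)) ** 2)
    return base

# With both side distortions at their individual bounds D_i = 2^{-2 R_i},
# the central distortion must exceed 2^{-2(R1+R2)}.
D1 = D2 = 2.0 ** (-2 * 1.0)
d0 = ozarow_d0_bound(1.0, 1.0, D1, D2)
assert d0 > 2.0 ** (-4.0)           # strictly above the single-user bound
```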

    R*(σ_x^2, D_0, D_1, D_2) ⊆ R_X(D_0, D_1, D_2) ⊆ R*(P_x, D_0, D_1, D_2)    (65)

where R*(σ_x^2, D_0, D_1, D_2) and R*(P_x, D_0, D_1, D_2) are the Ozarow bound with the variance σ_x^2 and with the entropy power P_x = 2^{2h(X)}/(2πe) of the source, respectively [58]. These bounds form an extension of the Shannon lower and upper bounds

    (1/2) log(σ_x^2 / D) ≥ R_X(D) ≥ (1/2) log(P_x / D).    (66)

Equality in (65) and (66) holds for a Gaussian source, for which P_x = σ_x^2. For the high-rate case, the outer bound coincides with the optimum rate-distortion bound. Linder, Zamir, and Zeger [99] found an MDC bound for memoryless sources and locally quadratic distortion measures at high rate. In [100], an MDC bound for wide-sense stationary Gaussian sources with infinite memory was proven to be asymptotically tight at high rate. In [100], an algorithm for the design of optimal two-channel filter banks for MDC of Gaussian sources was also proposed. An achievable bound for the L-channel MDC problem was presented in [101].

In successive refinement or layered coding, base-layer information is sent to obtain the basic quality of a system, and several enhancement descriptions are sent for better quality. An enhancement layer refines the previous descriptions in a successive manner. If only enhancement-layer descriptions are received, the reconstruction quality is poor. On the other hand, MDC yields at least a minimum acceptable quality with any subset of descriptions. Equitz and Cover [102] proved that 2-stage successive refinement with coarse and fine descriptions is achievable if and only if the individual solutions of the R-D problems can be written as a Markov chain. They showed that Gaussian sources with a squared-error measure, Laplacian sources with an absolute-error criterion, and arbitrary discrete sources with Hamming distortion are successively refinable. In [103], the L-stage successive refinement problem was presented.

Practical application I: Multiple Description Scalar Quantizer (MDSQ)

In [106], a multiple description scalar quantizer (MDSQ) with a resolution constraint was proposed for multiple-channel environments. For given rates R_1 and R_2, an MDSQ is optimal if it minimizes E[(X − X̂_0)^2] under the constraints that E[(X − X̂_1)^2] ≤ D_1 and E[(X − X̂_2)^2] ≤ D_2. The optimal encoders and decoders are found with a Lagrange formulation:

    η(A, g, λ_1, λ_2) = E[(X − X̂_0)^2]
        + λ_1 {E[(X − X̂_1)^2] − D_1}
        + λ_2 {E[(X − X̂_2)^2] − D_2}    (67)

where A and g are the encoder partition and the decoder centroid density. The optimal encoder and decoder, (A*, g*), are determined iteratively, such that A* minimizes η(A, g*, λ_1, λ_2) over all A, and g* minimizes η(A*, g, λ_1, λ_2) over all g. The operating point between side-distortion optimized MDC (SD-MDC) and central-distortion optimized MDC (CD-MDC) is obtained by varying λ_1 and λ_2. The central partition is determined by index assignment. A given index set {i_1, i_2} of the side coders is mapped onto an index i_0 for central coding.

In [106], several heuristic techniques were proposed, since optimal index assignment is very difficult. The number of side indices and the redundancy of the index-assignment matrix should be determined to reduce the distortion in (67). If the matrix is full, the side distortion is high and CD-MDC is obtained. The entropy-constrained version of MDSQ was presented in [107]. A high-rate analysis showed that, for a memoryless Gaussian source with a squared-error measure, resolution-constrained MDSQ and entropy-constrained MDSQ have a 3.07 dB and an 8.69 dB performance gap, respectively, compared with the Ozarow R-D bound [108].
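A tiny sketch can make the index-assignment idea concrete. The staircase mapping below, filling two diagonals of the index matrix, is a standard textbook heuristic and not the optimized assignment of [106]; the uniform central quantizer on [0, 1) is likewise an invented simplification.

```python
# Sketch of a two-diagonal MDSQ index assignment (assumptions: uniform
# central quantizer on [0, 1); staircase mapping c -> (c//2, (c+1)//2)).
N = 8                                   # number of central cells

def central_index(x):                   # uniform central quantizer
    return min(int(x * N), N - 1)

def index_assign(c):                    # staircase along two diagonals
    return c // 2, (c + 1) // 2         # (i1, i2)

def central_decode(c):
    return (c + 0.5) / N

def side_decode(i, which):
    # A side index is consistent with two neighbouring central cells;
    # reconstruct at the average of their centroids.
    cells = [c for c in range(N) if index_assign(c)[which] == i]
    return sum(central_decode(c) for c in cells) / len(cells)

x = 0.44
c = central_index(x)                    # c = 3
i1, i2 = index_assign(c)                # (1, 2): one index per description
# The central decoder (both indices) beats either side decoder alone.
assert abs(x - central_decode(c)) < abs(x - side_decode(i1, 0))
```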

Practical application II: Multiple Description Transform Coding (MDTC)

The focus of multiple description transform coding (MDTC) is on how to minimize the side distortions, D_1 and D_2, for a given central distortion, D_0 [109-112]. In single description coding (SDC), the minimum rate R* to achieve distortion D is given by the rate-distortion function R(D). However, to reduce D_1 and D_2 below a certain level, the total rate for the side decoding must be greater than R*. This extra rate is called redundancy, and based on it, a redundancy-rate distortion (RRD) function is defined. For a memoryless Gaussian source with a squared-error criterion, the Ozarow bound can be rearranged into the RRD function [104, 113, 114]. For the balanced case (R_1 = R_2 and D_1 = D_2) with a unit-variance source:

    D_{0,R_1+R_2} ≥ D_{0,r} = 2^{−2r}

    D_{1,R_1+R_2} ≥ (1/2)[1 + 2^{−2r} − (1 − 2^{−2r})√(1 − 2^{−2ρ*})],
                        if ρ* ≤ r − 1 + log_2(1 + 2^{−2r})
    D_{1,R_1+R_2} ≥ 2^{−(r+ρ*)},  otherwise    (68)

where r and ρ* are the base rate and the redundancy rate, respectively, and the total rate is R_1 + R_2 = r + ρ*.

To control the amount of correlation in MDTC, the pairwise correlating transform (PCT), T, was used [109-112]. Lost information can be approximated from other descriptions that are correlated with the lost description. For the balanced case, two independent variables A and B are transformed into correlated variables C and D by a linear transform T:

    (C, D)^T = T (A, B)^T    (69)

where

    T = [  r_2 cos θ_2   −r_2 sin θ_2 ]
        [ −r_1 cos θ_1    r_1 sin θ_1 ]    (70)

where the parameters r_1 and r_2 control the lengths of the basis vectors, and θ_1 and θ_2 control their directions. Wang et al. derived approximate conditions for the optimality of the transform [111]:

    r_1 = r_2 = 1/√(sin 2θ_1),    θ_1 = −θ_2.    (71)

Thus the optimal transform is

    T = [  √(cot θ_1 / 2)   √(tan θ_1 / 2) ]
        [ −√(cot θ_1 / 2)   √(tan θ_1 / 2) ]    (72)

which is different from the general orthogonal transform. Compared to MDSQ, MDTC can reach a lower redundancy range, but MDSQ provides a higher peak signal-to-noise ratio (PSNR) in the high-redundancy region. In [112], Wang et al. generalized MDTC by combining it with the layered coding of [115]; this generalized MDTC yields RRD performance close to the theoretical bound even in the high-redundancy region.
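A quick numerical check of (72) can be sketched as follows (assumptions: θ_1 in (0, π/2) so cot and tan are positive, and iid unit-variance Gaussian inputs as in the balanced case; the sample size and seed are arbitrary). It verifies that T has unit determinant, i.e. it preserves volume without being orthogonal, and that the outputs C and D acquire the correlation that lets one description be estimated from the other.

```python
# Sketch of the optimal pairwise correlating transform (72).
import math, random

theta1 = math.pi / 6
a = math.sqrt(math.tan(theta1) / 2.0)          # sqrt(tan(theta1)/2)
b = math.sqrt(1.0 / (2.0 * math.tan(theta1)))  # sqrt(cot(theta1)/2)
T = [[b, a], [-b, a]]                          # transform of (72)

det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
assert abs(det - 1.0) < 1e-9                   # volume preserving, det = 1

random.seed(0)
cd = []
for _ in range(20000):
    A, B = random.gauss(0, 1), random.gauss(0, 1)   # independent inputs
    cd.append((b * A + a * B, -b * A + a * B))      # (C, D) = T (A, B)^T
corr = sum(c * d for c, d in cd) / len(cd)
# E[CD] = (tan(theta1) - cot(theta1))/2, about -0.577 here: C and D are
# strongly correlated, so a lost description can be estimated.
assert corr < -0.3
```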

Practical application III: Multiple Description Lattice Quantizer (MDLQ)

A multiple description lattice quantizer (MDLQ) is an extension of MDSQ to lattice vector quantization. In [116, 117], Servetto, Vaishampayan, and Sloane (SVS) designed a balanced MDLQ by labelling the side-coding points around a central-coding point with ordered pairs. For a source vector x^k, the central-coding point x̂_o^k is determined by

    x̂_o^k = arg min_{y^k ∈ Λ} d(x^k, y^k)    (73)

where Λ is a lattice. To produce two descriptions for x̂_o^k, a coarse lattice Λ' is defined as

    Λ' = cΛU    (74)

where c and U are a scaling factor and an orthogonal matrix, respectively. The index assignment maps the central-coding point in Λ to the two side-coding points in Λ', l: Λ → Λ' × Λ'. In the SVS method, the central distortion is fixed by the lattice Λ. The rate and the side distortion are determined by the scaling factor c in (74) and by the index assignment. In [117], under the assumption of high rate and a squared-error criterion, the performance of MDLQ was analyzed for a memoryless source with differential entropy h(X) < ∞. For any a ∈ (0, 1) and balanced MDC, the central distortion D_0 satisfies

    lim_{R→∞} D_0 · 2^{2R(1+a)} = (1/4) C(Λ) 2^{2h(X)}    (75)

and the side distortions, D_1 and D_2, satisfy

    lim_{R→∞} D_1 · 2^{2R(1−a)} = C(S_L) 2^{2h(X)}    (76)

where C(Λ) and C(S_L) are the normalized second moments of a Voronoi cell of the lattice Λ and of an L-dimensional sphere, respectively. In [118, 119], a weighted distortion measure is applied to the SVS method such that the operating point of MDLQ can move towards the side-distortion optimized case. In [122], translated sublattices are used instead of coarse sublattices to improve the performance under high packet loss. In [120, 121], an asymmetric MDLQ is presented in which each description is allowed to have a different distortion.
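The fine-lattice/coarse-sublattice relation of (73)-(74) can be sketched in the simplest setting. As labeled assumptions, the sketch takes Λ = Z^2 and Λ' = cZ^2 with c = 4 and U = I (so nearest-point search is just rounding), and it omits the index assignment l that a real MDLQ design also needs.

```python
# Sketch of MDLQ quantization on Z^2 (assumptions: Lambda = Z^2,
# Lambda' = c Z^2 with c = 4, U = I; the index assignment is omitted).
def nearest(point, c=1):
    """Nearest point of the scaled lattice c * Z^2, as in (73)."""
    return tuple(c * round(v / c) for v in point)

def sqerr(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

x = (3.3, -1.7)
central = nearest(x)          # fine reconstruction on Lambda
coarse = nearest(x, c=4)      # a point of the coarse sublattice Lambda'

# The fine lattice gives the (smaller) central distortion; the coarse
# sublattice corresponds to the cruder side information.
assert sqerr(x, central) <= sqerr(x, coarse)
```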

4.4 Rate-Distortion Optimized Packet-Loss Recovery Techniques

One of the transmitter-based packet-loss recovery methods is rate-distortion optimized streaming with feedback channels. Suppose that there are L packets to be transmitted to a receiver. Let π_l be the transmission policy for a packet l ∈ {1, ..., L} and Π = (π_1, ..., π_L) be the policy vector. For example, in the AMR speech coder [154], the policy vector is used to allocate a fixed rate between source and channel coding to minimize the perceptual distortion under varying channel conditions. Given a policy vector Π = (π_1, ..., π_L), we can find an expected distortion D(Π) and an average rate R(Π) [123-125]. The expected rate R(Π) is

    R(Π) = Σ_{l∈{1,...,L}} B_l ρ(π_l)    (77)

where B_l is the number of bytes in data unit l, and ρ(π_l) is the expected number of transmitted bytes per source byte under policy π_l. The expected distortion D(Π) is

    D(Π) = D_0 − Σ_{l∈{1,...,L}} ΔD_l ∏_{l'≤l} (1 − ε(π_{l'}))    (78)

where D_0 is the expected reconstruction distortion if no data unit is received, ΔD_l is the expected reduction in distortion if data unit l arrives on time, and ε(π_l) is the probability that data unit l does not arrive on time. The goal of rate-distortion optimized streaming is to find the optimal policy vector π that minimizes the Lagrangian

J(π) = D(π) + λR(π). (79)

An iterative descent algorithm can be used to minimize the objective function J(π). Rate-distortion optimization for video coding in error-free environments was reviewed in [126]. Extending this approach to error-prone environments was discussed in [127-130]. These works present an effective framework for video communications in error-prone environments, based on layered coding with prioritized transport.

In paper B, we apply a rate-distortion optimized method to MDC for finding the best operating point under given channel conditions. Since we consider the case of fixed-rate transmission, for a fair comparison with other packet-loss recovery methods, λ in (79) is set to zero, which means that only the mean distortion enters the optimization.
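A minimal sketch of (77)-(79) follows. The assumptions are labeled and invented for illustration: each packet's policy is simply its number of transmission attempts, losses are iid with probability eps per attempt (so ε(π_l) = eps^n plays the role of the loss probability in (78)), the numbers B_l, ΔD_l, D_0, and λ are made up, and the tiny policy space is searched by brute force rather than by the iterative descent of [123-125].

```python
# Sketch of rate-distortion optimized policy selection per (77)-(79)
# (assumptions: policy = number of send attempts, iid loss prob. eps,
# invented B_l, Delta D_l, D_0, lambda; brute-force search).
from itertools import product

eps = 0.3                       # per-attempt loss probability
B = [100, 100, 100]             # bytes per data unit B_l
dD = [40.0, 25.0, 10.0]         # distortion reductions Delta D_l
D0 = 100.0                      # distortion with nothing received
lam = 0.05                      # Lagrange multiplier in (79)

def rate(policy):               # R(Pi) = sum_l B_l * rho(pi_l)
    return sum(b * n for b, n in zip(B, policy))

def loss(n):                    # epsilon(pi_l): all n attempts are lost
    return eps ** n

def distortion(policy):         # D(Pi) per (78)
    D = D0
    for l in range(len(policy)):
        surv = 1.0
        for lp in range(l + 1):          # product over l' <= l
            surv *= 1.0 - loss(policy[lp])
        D -= dD[l] * surv
    return D

def J(policy):                  # Lagrangian (79)
    return distortion(policy) + lam * rate(policy)

best = min(product([1, 2, 3], repeat=3), key=J)
assert J(best) <= J((1, 1, 1))   # never worse than single-attempt sending
```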

5 Applications: Audiovisual Coding

In this section, we briefly survey several international standard coding algorithms for speech, audio, image and video.

5.1 Speech and Audio Coding Standards

The Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T), formerly the International Telegraph and Telephone Consultative Committee (CCITT), is responsible for studying and issuing global speech coding standard recommendations. Narrowband (200 Hz to 3.4 kHz) speech coding standards include G.711 A-law or µ-law pulse code modulation (PCM) at 64 kbit/s [142], G.726 adaptive differential PCM (ADPCM) at 40/32/24/16 kbit/s [143], G.728 low-delay CELP at 16 kbit/s [144], the G.723.1 dual-rate speech coder at 6.3/5.3 kbit/s [145], and G.729 conjugate-structure algebraic CELP (CS-ACELP) at 8 kbit/s [146]. The functionality of each recommendation is extended by annexes. Wideband (50 Hz to 7 kHz) speech coding standards include G.722 subband PCM at 64/56/48 kbit/s [147], G.722.1 modulated lapped transform (MLT) based coding at 32/24 kbit/s [148], and G.722.2 AMR wideband (AMR-WB) coding at 6.60-23.85 kbit/s [149].

The European Telecommunications Standards Institute (ETSI) and the 3rd Generation Partnership Project (3GPP) have standardized five speech compression algorithms: full-rate (FR) coding at 13.0 kbit/s [150, 151], half-rate (HR) coding at 5.6 kbit/s [152], enhanced full-rate (EFR) coding at 12.2 kbit/s [153], AMR at 4.75-12.20 kbit/s for source coding [154], and AMR-WB, which is the same as G.722.2 [149].

For the global system for mobile communication (GSM), AMR operates in two traffic channel modes [155]: adaptive full-rate speech (AFS) and adaptive half-rate speech (AHS). The sums of the source and channel coding bit rates for AFS and AHS are 22.8 and 11.4 kbit/s, respectively. Instead of the AMR encoder, the mobile station (MS) requests a mode (a different bit-rate combination of source and channel coding) according to carrier-to-interference (C/I) measurements, which is approved by the base station (BS).

The 3GPP Project 2 (3GPP2) has standardized the IS-127 enhanced variable-rate codec (EVRC) [156]. In IS-127, a specific rate is selected at the encoder side based on the phonetic class of the input speech. This guarantees longer battery life for the reverse link (MS to BS) and a larger per-cell user capacity for the forward link (BS to MS). Even though EVRC has four variable-rate options, it was reported in [157] that EVRC works in practice as a fixed full-rate codec with a silence coding mode. The selectable mode vocoder (SMV) was developed as a new speech coding standard to substitute for EVRC in third-generation systems [157, 158].

For robust voice communication over IP, the Internet Engineering Task Force (IETF) is standardizing the Internet low-bit-rate codec (iLBC) at 13.33 kbit/s with a frame length of 30 ms and 15.20 kbit/s with a frame length of 20 ms [159]. Since iLBC does not utilize the interframe correlation of parameters, the codec enables graceful quality degradation in the case of packet loss.

The Moving Picture Experts Group (MPEG) standardized the MPEG audio coders with three compression layers [160]. Layer 3 of the MPEG-1 standard is known as MP3, which is widely used as an audio file format over the Internet and in portable audio players. AC-3 is another of the main compression schemes, used especially for high-definition television (HDTV) and DVD in North America [161].
The basic building blocks of audio coding standards include transforms, psychoacoustic models to remove irrelevancy, and quantization [162].

5.2 Image and Video Coding Standards

The basic functions of image and video coding are composed of transform, uniform quantization, and lossless coding. For a squared-error criterion and a Gaussian input, the Karhunen-Loève transform (KLT) is optimal in the sense that the vector components are decorrelated [139]. For image and video coding, because of the heavy complexity of the KLT, the discrete cosine transform (DCT) is commonly used [140].

The Joint Photographic Experts Group (JPEG) of the International Organization for Standardization (ISO) and the ITU established a joint standard for still image coding. In 1992, the JPEG members selected a DCT-based method with Huffman coding as ISO/IEC international standard 10918-1, or ITU-T recommendation T.81 [163]. With the increasing use of multimedia, a new image coding standard, JPEG2000, has been developed [164]. JPEG2000 includes new features such as superior performance at low bit rates, progressive transmission, region-of-interest coding, robustness to bit errors, and protective security. JPEG2000 uses a wavelet transform followed by uniform scalar quantization with arithmetic coding.

Numerous international video coding standards exist, including ITU-T H.261, MPEG-1, H.262/MPEG-2 [165], and MPEG-4 (Part 2) [166]. Most video coding standards use block-based DCT, motion compensation, quantization of the DCT coefficients, and lossless coding of the quantized DCT coefficients and motion vectors. Each group of pictures (GOP, typically 15 pictures) has at least one intra-picture (I-picture) and a set of predicted pictures (P-pictures) and bidirectionally-predicted pictures (B-pictures) [162]. H.262/MPEG-2 is the most widely used video coding standard for digital television systems and for high-quality storage such as DVD.
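The transform-plus-uniform-quantization pipeline shared by these block-based standards can be sketched as follows. As labeled assumptions, the sketch uses an 8×8 block, a direct (unoptimized) 2-D DCT-II, and a single flat quantizer step, whereas the real codecs use fast transforms and per-coefficient step sizes.

```python
# Sketch: 2-D DCT-II of an 8x8 block followed by uniform quantization
# (assumptions: direct O(N^4) DCT, flat step size; illustration only).
import math

N = 8

def dct_2d(block):
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    return [[round(x / step) for x in row] for row in coeffs]

flat = [[100] * N for _ in range(N)]       # constant (flat) image block
q = quantize(dct_2d(flat), step=16)
# A constant block concentrates all energy in the DC coefficient, so the
# quantized AC coefficients are zero and cost almost nothing to code.
assert q[0][0] == 50 and q[3][4] == 0
```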
Recently, the joint video team (JVT) of the ITU-T and the ISO/International Electrotechnical Commission (IEC) developed a new coding standard: ITU-T H.264 and ISO/IEC MPEG-4 (Part 10) advanced video coding (AVC), which is referred to as H.264/AVC [166]. H.264/AVC improves coding efficiency by a factor of two (which means halving the bit rate for a given fidelity) over MPEG-2 with reasonable complexity and cost. H.264/AVC contains important building blocks such as inter-picture prediction, a low-complexity variable block-size transform, an adaptive deblocking filter, context-based arithmetic coding, and error-resilience extensions [167].

6 Summary of Contributions

In this thesis, we study the basic building blocks, source and channel coding, for audiovisual communication systems. Source coding has a tradeoff between rate and distortion. Rate-distortion theory and high-rate theory are used to analyze practical quantizers. While rate-distortion theory assumes infinite dimensionality, high-rate theory assumes a large number of reconstruction points. Since high-rate theory is accurate for moderate dimensionality and rate, we predict quantizer performance using high-rate theory before designing quantizers, and verify the prediction by comparing it with the performance of the designed quantizers. In paper A, we propose a new resolution-constrained quantizer that utilizes the three VQ advantages more efficiently than conventional speech coding paradigms. In papers C and D, for the resolution-constrained case, new quantization methods are proposed that consider perceptual quality, various distortion measures, the number of distortion outliers, and the source-mismatch effect.

Channel coding adds redundancy to a source for recovering from channel errors. Shannon proved that information can be sent reliably over a channel at all rates R below the channel capacity C. From the source and channel separation theorem, optimal source codes and optimal channel codes can be designed separately and independently. However, if delay is of critical importance, such as in real-time communication systems, source and channel coding should be optimized jointly. In paper B, we compare various packet-loss recovery techniques for real-time audiovisual communication over the Internet. Two promising techniques, FEC and MDC, are extensively compared in this paper. FEC is optimized for given channel conditions without considering the source, while MDC is a well-known joint source-channel coding scheme.

We summarize the contributions of the appended papers, which constitute the main body of this thesis.

Paper A - KLT-Based Adaptive Classified VQ of the Speech Signal

Compared to SQ, VQ has memory, space-filling, and shape advantages. If the signal statistics are known, direct vector quantization (DVQ) according to these statistics provides the highest coding efficiency, but requires unmanageable storage if the statistics are time-varying. In CELP coding, a single “compromise” codebook is trained in the excitation domain, and the space-filling and shape advantages of VQ are utilized in a non-optimal, average sense. In this paper, we propose KLT-based adaptive classified VQ (KLT-CVQ), where the space-filling advantage can be utilized since the Voronoi-region shape is not affected by the KLT. The memory and shape advantages can also be used, since each codebook is designed for a narrow class of KLT-domain statistics. We further improve basic KLT-CVQ with companding, which utilizes the shape advantage of VQ more efficiently. Our experiments show that KLT-CVQ provides a higher signal-to-noise ratio (SNR) than basic CELP coding, and has a computational complexity similar to DVQ and much lower than CELP. With companding, even single-class KLT-CVQ outperforms CELP, both in terms of SNR and codebook search complexity.

Paper B - Comparative Rate-Distortion Performance of Multiple Description Coding for Real-Time Audiovisual Communication over the Internet

To facilitate real-time audiovisual communication over the Internet, FEC and MDC can be used as low-delay packet-loss recovery techniques. We use both a Gilbert channel model and data obtained from real IP connections to compare the rate-distortion performance of different variants of FEC and MDC. Using identical overall rates with stringent delay constraints, we find that side-distortion optimized MDC generally performs better than Reed-Solomon based FEC, which itself, as expected, performs better than XOR-based FEC. If the channel condition is known from feedback, then channel-optimized MDC can be used to exploit this information, resulting in significantly improved performance. Our results confirm that two-independent-channel transmission is preferable to single-channel transmission for both FEC and MDC.

Paper C - Effect of Cell-Size Variation of Resolution-Constrained Quantization

Based on high-rate theory, resolution-constrained quantization is optimized to reduce the mean distortion at a given fixed rate. However, not only the mean distortion but also the number of distortion outliers is an important factor in human perception. In this paper, to reduce the number of outliers measured with an r-th power distortion criterion, we propose to use an αr-th power distortion criterion for codebook training. As we increase α, codevectors are placed more towards the tail of the source distribution. With a properly selected α, the number of distortion outliers is effectively reduced at a small cost in mean distortion. Experiments with a Gaussian source and line spectral frequencies show that, in practical applications, the mean distortion of the proposed quantizer is similar to that of conventional quantizers, while it has a lower percentage of outliers and a significantly reduced sensitivity to source mismatch.

Paper D - Resolution-Constrained Quantization with JND-Based Perceptual-Distortion Measures

When the squared error of observable signal parameters is below the just noticeable difference (JND), it is not registered by human perception. We modify commonly used distortion criteria to account for this phenomenon and study the implications for quantizer design and performance. Taking the JND into account in the design of the quantizer generally leads to improved performance in terms of mean distortion and the number of outliers. Moreover, the resulting quantizer exhibits better robustness against source mismatch.

References

[1] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.

[2] W. R. Bennett, “Spectra of quantized signals,” The Bell System Technical Journal, vol. 27, pp. 446-472, Jul. 1948.

[3] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 129-137, 1982.

[4] J. Max, “Quantizing for minimum distortion,” IEEE Trans. Inform. Theory, vol. IT-6, no. 1, pp. 7-12, 1960.

[5] Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Commun., vol. COM-28, no. 1, pp. 84-95, 1980.

[6] P. A. Chou, T. Lookabaugh, and R. M. Gray, “Entropy-constrained vector quantization,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 1, pp. 31-42, 1989.

[7] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2325-2383, 1998.

[8] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proc. IRE, vol. 40, no. 9, pp. 1098-1101, 1952.

[9] J. J. Rissanen, “Generalized Kraft inequality and arithmetic coding,” IBM J. Res. Devel., vol. 20, no. 9, pp. 198-203, 1976.

[10] J. L. Mannos and D. J. Sakrison, “The effects of a visual fidelity criterion on the encoding of images,” IEEE Trans. Inform. Theory, vol. IT-20, no. 4, pp. 525-536, 1974.

[11] D. J. Sakrison, “On the role of the observer and a distortion measure in image transmission,” IEEE Trans. Commun., vol. COM-25, no. 11, pp. 1251-1267, 1977.

[12] F. X. J. Lukas and Z. L. Budrikis, “Picture quality prediction based on a visual model,” IEEE Trans. Commun., vol. COM-30, no. 7, pp. 1679-1692, 1982.

[13] N. B. Nill, “A visual model weighted cosine transform for image compression and quality assessment,” IEEE Trans. Commun., vol. COM-33, no. 6, pp. 551-557, 1985.

[14] A. Buzo, A. H. Gray, Jr., R. M. Gray, and J. D. Markel, “Speech coding based upon vector quantization,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, no. 5, pp. 562-574, 1980.

[15] B. S. Atal, R. V. Cox, and P. Kroon, “Spectral quantization and interpolation for CELP coders,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, May 1989, pp. 69-72.

[16] K. K. Paliwal and B. S. Atal, “Efficient vector quantization of LPC parameters at 24 bits/frame,” IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 3-14, 1993.

[17] V. R. Algazi, “Useful approximations to optimum quantization,” IEEE Trans. Commun., vol. COM-14, no. 3, pp. 297-301, 1966.

[18] H. Gish and J. N. Pierce, “Asymptotically efficient quantizing,” IEEE Trans. Inform. Theory, vol. IT-14, no. 5, pp. 676-683, 1968.

[19] P. L. Zador, “Asymptotic quantization error of continuous signals and the quantization dimension,” IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 139-149, 1982.

[20] A. Gersho, “Asymptotically optimal block quantization,” IEEE Trans. Inform. Theory, vol. IT-25, no. 4, pp. 373-380, 1979.

[21] Y. Yamada, S. Tazaki, and R. M. Gray, “Asymptotic performance of block quantizers with difference distortion measures,” IEEE Trans. Inform. Theory, vol. IT-26, no. 1, pp. 6-14, 1980.

[22] J. A. Bucklew, “Companding and random quantization in several dimensions,” IEEE Trans. Inform. Theory, vol. IT-27, no. 2, pp. 207-211, 1981.

[23] J. A. Bucklew and G. L. Wise, “Multidimensional asymptotic quantization theory with rth power distortion measures,” IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 239-247, 1982.

[24] J. A. Bucklew, “A note on optimal multidimensional companders,” IEEE Trans. Inform. Theory, vol. IT-29, no. 2, p. 279, 1983.

[25] J. A. Bucklew, “Two results on the asymptotic performance of quantizers,” IEEE Trans. Inform. Theory, vol. IT-30, no. 2, pp. 341-348, 1984.

[26] T. D. Lookabaugh and R. M. Gray, “High-resolution quantization theory and the vector quantizer advantage,” IEEE Trans. Inform. Theory, vol. IT-35, no. 5, pp. 1020-1033, 1989.

[27] D. L. Neuhoff and D. H. Lee, “On the performance of tree-structured vector quantization,” in Proc. IEEE Int. Symposium Inform. Theory, Jun. 1991, p. 247.

[28] D. L. Neuhoff and D. H. Lee, “On the performance of tree-structured vector quantization,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, Apr. 1991, pp. 2277-2280.

[29] D. H. Lee, D. L. Neuhoff, and K. K. Paliwal, “Cell-conditioned multistage vector quantization,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, Apr. 1991, pp. 653-656.

[30] S. Na and D. L. Neuhoff, “Bennett’s integral for vector quantizers,” IEEE Trans. Inform. Theory, vol. 41, no. 4, pp. 886-900, 1995.

[31] D. H. Lee and D. L. Neuhoff, “Asymptotic distribution of the errors in scalar and vector quantizers,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 446-460, 1996.

[32] D. L. Neuhoff, “On the asymptotic distribution of the errors in vector quantization,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 461-468, 1996.

[33] W. R. Gardner and B. D. Rao, “Theoretical analysis of the high-rate vector quantization of LPC parameters,” IEEE Trans. Speech Audio Processing, vol. 3, no. 5, pp. 367-381, 1995.

[34] J. Li, N. Chaddha, and R. M. Gray, “Asymptotic performance of vector quantizers with the perceptual distortion measure,” in Proc. IEEE Int. Symposium Inform. Theory, Jul. 1997, p. 55.

[35] J. Li, N. Chaddha, and R. M. Gray, “Asymptotic performance of vector quantizers with a perceptual distortion measure,” IEEE Trans. Inform. Theory, vol. 45, no. 4, pp. 1082-1091, 1999.

[36] T. Linder and R. Zamir, “High-resolution source coding for non-difference distortion measures: the rate distortion function,” in Proc. IEEE Int. Symposium Inform. Theory, Jul. 1997, p. 187.

[37] T. Linder and R. Zamir, “High-resolution source coding for non-difference distortion measures: the rate-distortion function,” IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 533-547, 1999.

[38] T. Linder, R. Zamir, and K. Zeger, “High-resolution source coding for non-difference distortion measures: multidimensional companding,” IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 548-561, 1999.

[39] P. Hedelin and J. Skoglund, “Vector quantization based on Gaussian mixture models,” IEEE Trans. Speech Audio Processing, vol. 8, no. 4, pp. 385-401, 2000.

[40] J. M. Morris and V. D. VandeLinde, “Robust quantization of discrete-time signals with independent samples,” IEEE Trans. Commun., vol. COM-22, no. 12, pp. 1897-1902, 1974.

[41] R. M. Gray and L. D. Davisson, “Quantizer mismatch,” IEEE Trans. Commun., vol. COM-23, no. 4, pp. 439-443, 1975.

[42] D. L. Neuhoff, R. M. Gray, and L. D. Davisson, “Fixed rate universal block source coding with a fidelity criterion,” IEEE Trans. Inform. Theory, vol. IT-21, no. 5, pp. 511-523, 1975.

[43] D. J. Sakrison, “Worst sources and robust codes for difference distortion measures,” IEEE Trans. Inform. Theory, vol. IT-21, no. 3, pp. 301-309, 1975.

[44] W. G. Bath and V. D. VandeLinde, “Robust memoryless quantization for minimum signal distortion,” IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 296-306, 1982.

[45] D. Kazakos, “New results on robust quantization,” IEEE Trans. Commun., vol. COM-31, no. 8, pp. 965-974, 1983.

[46] Y. Yamada, S. Tazaki, M. Kasahara, and T. Namekawa, “Variance mismatch of vector quantizers,” IEEE Trans. Inform. Theory, vol. IT-30, no. 1, pp. 104-107, 1984.

[47] A. Lapidoth, “On the role of mismatch in rate distortion theory,” in Proc. IEEE Int. Symposium Inform. Theory, Sept. 1995, p. 193.

[48] A. Lapidoth, “On the role of mismatch in rate distortion theory,” IEEE Trans. Inform. Theory, vol. IT-43, no. 1, pp. 38-47, 1997.

[49] K. K. Lin and R. M. Gray, “Vector quantization of video with two codebooks,” in Proc. Conf., Mar. 1999, p. 537.

[50] R. M. Gray and T. Linder, “Relative entropy and quantizer mismatch,” in Proc. Thirty-Sixth Asilomar Conf. Signals, Systems and Computers, Nov. 2002, pp. 129-133.

[51] R. M. Gray and T. Linder, “Mismatch in high-rate entropy-constrained vector quantization,” IEEE Trans. Inform. Theory, vol. IT-49, no. 5, pp. 1204-1217, 2003.

[52] R. M. Gray and T. Linder, “High rate mismatch in entropy constrained quantization,” in Proc. Data Compression Conf., 2003, pp. 173-182.

[53] P. Hedelin, F. Norden, and J. Skoglund, “SD optimization of spectral coders,” in IEEE Workshop Speech Coding, Jun. 1999, pp. 28-30.

[54] J. Samuelsson and P. Hedelin, “Recursive coding of spectrum parameters,” IEEE Trans. Speech Audio Processing, vol. 9, no. 5, pp. 492-503, 2001.

[55] J. Lindblom and J. Samuelsson, “Bounded support Gaussian mixture modeling of speech spectra,” IEEE Trans. Speech Audio Processing, vol. 11, no. 1, pp. 88-99, 2003.

[56] A. D. Subramaniam and B. D. Rao, “PDF optimized parametric vector quantization of speech line spectral frequencies,” IEEE Trans. Speech Audio Processing, vol. 11, no. 2, pp. 130-142, 2003.

[57] R. E. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Trans. Inform. Theory, vol. IT-18, no. 4, pp. 460-473, 1972.

[58] T. Berger, Rate distortion theory. Englewood Cliffs, NJ: Prentice-Hall, 1971.

[59] R. M. Gray, Source coding theory. Boston, MA: Kluwer Academic Publishers, 1990.

[60] T. M. Cover and J. A. Thomas, Elements of information theory. New York, NY: John Wiley & Sons, 1991.

[61] A. Gersho and R. M. Gray, Vector quantization and signal compression. Boston, MA: Kluwer Academic Publishers, 1992.

[62] W. B. Kleijn, A Basis for Source Coding. Course notes, KTH, Stockholm, Jan. 2004.

[63] A. Papoulis and S. U. Pillai, Probability, random variables and stochastic processes. 4th ed. New York, NY: McGraw-Hill, 2002.

[64] J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups. 2nd ed. New York, NY: Springer-Verlag, 1993.

[65] N. J. A. Sloane, “Tables of sphere packings and spherical codes,” IEEE Trans. Inform. Theory, vol. IT-27, no. 3, pp. 327-338, 1981.

[66] J. H. Conway and N. J. A. Sloane, “Voronoi regions of lattices, second moments of polytopes, and quantization,” IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 211-226, 1982.

[67] E. Agrell and T. Eriksson, “Optimization of lattices for quantization,” IEEE Trans. Inform. Theory, vol. IT-44, no. 5, pp. 1814-1828, 1998.

[68] D. G. Jeong and J. D. Gibson, “Uniform and piecewise uniform lattice vector quantization for memoryless Gaussian and Laplacian sources,” IEEE Trans. Inform. Theory, vol. IT-39, no. 3, pp. 786-804, 1993.

[69] J. H. Conway and N. J. A. Sloane, “A fast encoding method for lattice codes and quantizers,” IEEE Trans. Inform. Theory, vol. IT-29, no. 6, pp. 820-824, 1983.

[70] J. H. Conway and N. J. A. Sloane, “Fast quantizing and decoding algorithms for lattice quantizers and codes,” IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 227-232, 1982.

[71] K. Sayood and S. J. Blankenau, “A fast quantization algorithm for lattice quantizer design,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, May 1989, pp. 1168-1171.

[72] B.-H. Juang and A. H. Gray, Jr., “Multiple stage vector quantization for speech coding,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, Apr. 1982, pp. 597-600.

[73] T. R. Fischer, “A pyramid vector quantizer,” IEEE Trans. Inform. Theory, vol. IT-32, no. 4, pp. 568-583, 1986.

[74] P. F. Swaszek and T. W. Ku, “Asymptotic performance of unrestricted polar quantizers,” IEEE Trans. Inform. Theory, vol. IT-32, no. 2, pp. 330-333, 1986.

[75] S. Vembu, S. Verdú, and Y. Steinberg, “When does the source-channel separation theorem hold?,” in Proc. IEEE Int. Symposium Inform. Theory, 27 June-1 July 1994, p. 198.

[76] S. Vembu, S. Verdú, and Y. Steinberg, “The source-channel separation theorem revisited,” IEEE Trans. Inform. Theory, vol. IT-41, no. 1, pp. 44-54, 1995.

[77] K. Sayood, H. H. Otu, and N. Demir, “Joint source/channel coding for variable length codes,” IEEE Trans. Commun., vol. COM-48, no. 5, pp. 787-794, 2000.

[78] G. Carle and E. W. Biersack, “Survey of error recovery techniques for IP-based audio-visual multicast applications,” IEEE Network, vol. 11, no. 6, pp. 24-36, Nov.-Dec., 1997.

[79] C. Perkins, O. Hodson, and V. Hardman, “A survey of packet loss recovery techniques for streaming audio,” IEEE Network, vol. 12, no. 5, pp. 40-48, Sept.-Oct. 1998.

[80] C. Perkins and O. Hodson, “Options for repair of streaming media,” IETF RFC 2354, Jun. 1998.

[81] P. Cuenca, L. Orozco-Barbosa, A. Garrido, F. Quiles, and T. Olivares, “A survey of error concealment schemes for MPEG-2 video communications over ATM networks,” in Proc. IEEE Canadian Conf. Electrical and Computer Engineering, May 1997, pp. 118-121.

[82] B. W. Wah, X. Su, and D. Lin, “A survey of error-concealment schemes for real-time audio and video transmissions over the Internet,” in Proc. IEEE Int. Symposium Multimedia Software Engineering, Dec. 1998, pp. 17-24.

[83] A. Li, F. Liu, J. Villasenor, J. Park, D. Park, and Y. Lee, “An RTP payload format for generic FEC with uneven level protection,” Internet Draft draft-ietf-avt-ulp-04.txt, Feb. 2002.

[84] J. Rosenberg and H. Schulzrinne, “An RTP payload format for generic forward error correction,” IETF RFC 2733, Dec. 1999.

[85] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.-C. Bolot, A. Vega-Garcia, and S. Fosse-Parisis, “RTP payload for redundant audio data,” IETF RFC 2198, Dec. 1997.

[86] D. G. Hoffman, D. A. Leonard, C. C. Lindner, K. T. Phelps, C. A. Rodger, and J. R. Wall, Coding theory: the essentials. New York, NY: Marcel Dekker, 1991.

[87] B. Sklar and F. J. Harris, “The ABCs of linear block codes,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 14-35, 2004.

[88] R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error correcting binary group codes,” Inform. Contr., vol. 3, pp. 69-79, 1960.

[89] R. C. Bose and D. K. Ray-Chaudhuri, “Further results on error correcting binary group codes,” Inform. Contr., vol. 3, pp. 279-290, 1960.

[90] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” SIAM J. Appl. Math., vol. 8, pp. 300-304, 1960.

[91] R. C. Singleton, “Maximum distance Q-nary codes,” IEEE Trans. Inform. Theory, vol. IT-10, no. 2, pp. 116-118, 1964.

[92] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electronics Letters, vol. 32, no. 18, pp. 1645-1646, 1996.

[93] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Proc. IEEE Int. Conf. Commun., May 1993, pp. 1064-1070.

[94] A. A. El Gamal and T. M. Cover, “Achievable rates for multiple descriptions,” IEEE Trans. Inform. Theory, vol. IT-28, no. 6, pp. 851-857, 1982.

[95] L. Ozarow, “On a source coding problem with two channels and three receivers,” The Bell System Technical Journal, vol. 59, no. 10, pp. 1909-1921, Dec. 1980.

[96] R. Ahlswede, “The rate-distortion region for multiple descriptions without excess rate,” IEEE Trans. Inform. Theory, vol. 31, no. 6, pp. 721-726, Nov. 1985.

[97] Z. Zhang and T. Berger, “New results in binary multiple descriptions,” IEEE Trans. Inform. Theory, vol. 33, no. 4, pp. 502-521, July 1987.

[98] R. Zamir, “Gaussian codes and Shannon bounds for multiple descrip- tions,” IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 2629-2636, Nov. 1999.

[99] T. Linder, R. Zamir, and K. Zeger, “The multiple description rate region for high resolution source coding,” in Proc. Data Compression Conf., 1998, pp. 149-158.

[100] P. L. Dragotti, S. D. Servetto, and M. Vetterli, “Optimal filter banks for multiple description coding: analysis and synthesis,” IEEE Trans. Inform. Theory, vol. 48, no. 7, pp. 2036-2052, July 2002.

[101] R. Venkataramani, G. Kramer, and V. K. Goyal, “Multiple description coding with many channels,” IEEE Trans. Inform. Theory, vol. 49, no. 9, pp. 2106-2114, Sept. 2003.

[102] W. H. R. Equitz and T. M. Cover, “Successive refinement of information,” IEEE Trans. Inform. Theory, vol. 37, no. 2, pp. 269-275, Mar. 1991.

[103] R. Venkataramani, G. Kramer, and V. K. Goyal, “Successive refinement on trees: a special case of a new MD coding region,” in Proc. Data Compression Conf., 2001, pp. 293-301.

[104] V. K. Goyal, “Multiple description coding: compression meets the network,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 74-93, Sept. 2001.

[105] H. Coward, “Joint Source-Channel Coding: Development of Methods and Utilization in Image Communications,” Ph.D. dissertation, Norwegian University of Science and Technology, 2001.

[106] V. A. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Trans. Inform. Theory, vol. 39, no. 3, pp. 821-834, May 1993.

[107] V. A. Vaishampayan and J. Domaszewicz, “Design of entropy-constrained multiple-description scalar quantizers,” IEEE Trans. Inform. Theory, vol. 40, no. 1, pp. 245-250, Jan. 1994.

[108] V. A. Vaishampayan and J.-C. Batllo, “Asymptotic analysis of multiple description quantizers,” IEEE Trans. Inform. Theory, vol. 44, no. 1, pp. 245-250, Jan. 1998.

[109] Y. Wang, M. T. Orchard, and A. R. Reibman, “Multiple description image coding for noisy channels by pairing transform coefficients,” in Proc. IEEE First Workshop Multimedia Signal Processing, Jun. 1997, pp. 419-424.

[110] M. T. Orchard, Y. Wang, V. Vaishampayan, and A. R. Reibman, “Redundancy rate-distortion analysis of multiple description coding using pairwise correlating transforms,” in Proc. IEEE Int. Conf. Image Processing, Oct. 1997, pp. 608-611.

[111] Y. Wang, M. T. Orchard, V. Vaishampayan, and A. R. Reibman, “Multiple description coding using pairwise correlating transforms,” IEEE Trans. Image Processing, vol. 10, no. 3, pp. 351-366, Mar. 2001.

[112] Y. Wang, A. R. Reibman, M. T. Orchard, and H. Jafarkhani, “An improvement to multiple description transform coding,” IEEE Trans. Signal Processing, vol. 50, no. 11, pp. 2843-2854, Nov. 2002.

[113] V. K. Goyal and J. Kovačević, “Generalized multiple description coding with correlating transforms,” IEEE Trans. Inform. Theory, vol. 47, no. 6, pp. 2199-2224, Sept. 2001.

[114] V. K. Goyal, “On the side distortion lower bound in ‘Generalized multiple description coding with correlating transforms’,” submitted to IEEE Trans. Inform. Theory, Nov. 2002.

[115] A. R. Reibman, H. Jafarkhani, M. T. Orchard, and Y. Wang, “Perfor- mance of multiple description coders on a real channel,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, May 1999, pp. 2415-2418.

[116] S. D. Servetto, V. A. Vaishampayan, and N. J. A. Sloane, “Multiple description lattice vector quantization,” in Proc. Data Compression Conf., Mar. 1999, pp. 13-22.

[117] V. A. Vaishampayan, N. J. A. Sloane, and S. D. Servetto, “Multiple description vector quantization with lattice codebooks: design and analysis,” IEEE Trans. Inform. Theory, vol. 47, no. 5, pp. 1718-1734, 2001.

[118] J. A. Kelner, V. K. Goyal, and J. Kovačević, “Multiple description lattice vector quantization: variations and extensions,” in Proc. Data Compression Conf., Mar. 2000, pp. 480-489.

[119] V. K. Goyal, J. A. Kelner, and J. Kovačević, “Multiple description vector quantization with a coarse lattice,” IEEE Trans. Inform. Theory, vol. 48, no. 3, pp. 781-788, 2002.

[120] S. N. Diggavi, N. J. A. Sloane, and V. A. Vaishampayan, “Design of asymmetric multiple description lattice vector quantizers,” in Proc. Data Compression Conf., 2000, pp. 490-499.

[121] S. N. Diggavi, N. J. A. Sloane, and V. A. Vaishampayan, “Asymmetric multiple description lattice vector quantizers,” IEEE Trans. Inform. Theory, vol. 48, no. 1, pp. 174-191, 2002.

[122] D. Y. Zhao and W. B. Kleijn, “Multiple-description vector quantization using translated lattices with local optimization,” to appear in IEEE Global Telecommunications Conf., 2004.

[123] P. A. Chou and Z. Miao, “Rate-distortion optimized sender-driven streaming over best-effort networks,” in Proc. IEEE Fourth Workshop on Multimedia Signal Processing, 2001, pp. 587-592.

[124] J. Chakareski, P. A. Chou, and B. Aazhang, “Computing rate-distortion optimized policies for streaming media to wireless clients,” in Proc. Data Compression Conf., 2002, pp. 53-62.

[125] J. Chakareski and P. A. Chou, “Application layer error correction coding for rate-distortion optimized streaming to wireless clients,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2002, pp. 2513-2516.

[126] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, Nov. 1998.

[127] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 23-50, Nov. 1998.

[128] M. Gallant and F. Kossentini, “Efficient scalable DCT-based video coding at low bit rates,” in Proc. Int. Conf. Image Processing, 1999, pp. 782-786.

[129] M. Gallant and F. Kossentini, “Rate-distortion optimal joint source/channel coding for robust and efficient low bit rate packet video communications,” in Proc. Int. Conf. Image Processing, 2000, pp. 355-358.

[130] M. Gallant and F. Kossentini, “Rate-distortion optimized layered coding with unequal error protection for robust Internet video,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 3, pp. 357-372, Mar. 2001.

[131] J. D. Johnston, “Transform coding of audio signals using perceptual noise criteria,” IEEE J. Selected Areas Comm., vol. 6, no. 2, pp. 314- 323, 1988.

[132] N. Jayant, “Signal compression: technology targets and research directions,” IEEE J. Selected Areas Comm., vol. 10, no. 5, pp. 796-818, 1992.

[133] N. Jayant, J. Johnston, and R. Safranek, “Signal compression based on models of human perception,” Proceedings of the IEEE, vol. 81, no. 10, pp. 1385-1422, 1993.

[134] T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proceedings of the IEEE, vol. 88, no. 4, pp. 451-515, 2000.

[135] A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, “Visibility of wavelet quantization noise,” IEEE Trans. Image Processing, vol. 6, no. 8, pp. 1164-1175, 1997.

[136] D.-F. Shen and J.-H. Sung, “Application of JND visual model to SPIHT image coding and performance evaluation,” in Proc. IEEE Conf. Image Processing, Jun. 2002, pp. 24-28.

[137] D. S. Kim and M. Y. Kim, “On the perceptual weighting function for phase quantization of speech,” in Proc. IEEE Workshop Speech Coding, Sep. 2000, pp. 17-20.

[138] D. S. Kim, “Perceptual phase quantization of speech,” IEEE Trans. Speech Audio Processing, vol. 11, no. 4, pp. 355-364, Jul. 2003.

[139] J. J. Y. Huang and P. M. Schultheiss, “Block quantization of correlated Gaussian random variables,” IEEE Trans. Commun., vol. COM-11, no. 3, pp. 289-296, 1963.

[140] N. Ahmed, T. Natarajan, and K. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90-93, 1974.

[141] W. B. Kleijn and K. K. Paliwal, Speech Coding and Synthesis. Amsterdam, The Netherlands: Elsevier, 1995.

[142] ITU-T Recommendation G.711. Pulse code modulation (PCM) of voice frequencies, ITU-T, 1988.

[143] ITU-T Recommendation G.726. 40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM), ITU-T, 1990.

[144] ITU-T Recommendation G.728. Coding of speech at 16 kbit/s using low-delay code excited linear prediction, ITU-T, 1992.

[145] ITU-T Recommendation G.723.1. Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s, ITU-T, 1996.

[146] ITU-T Recommendation G.729. Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP), ITU-T, 1996.

[147] ITU-T Recommendation G.722. 7 kHz audio-coding within 64 kbit/s, ITU-T, 1988.

[148] ITU-T Recommendation G.722.1. Coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss, ITU-T, 1999.

[149] ITU-T Recommendation G.722.2. Wideband coding of speech at around 16 kbit/s using Adaptive Multi-rate Wideband (AMR-WB), ITU-T, 2002.

[150] Technical Report GSM 06.10, European digital telecommunications system (Phase 2); full rate speech processing functions. ETSI, 1992.

[151] P. Kroon, E. F. Deprettere, and R. J. Sluyter, “Regular-pulse excitation–A novel approach to effective and efficient multipulse coding of speech,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 5, pp. 1054-1063, 1986.

[152] I. A. Gerson and M. A. Jasiuk, “Techniques for improving the performance of CELP-type speech coders,” IEEE J. Selected Areas Comm., vol. 10, no. 5, pp. 858-865, 1992.

[153] Technical Report GSM 06.60, Digital cellular telecommunications system: Enhanced full rate (EFR) speech transcoding. ETSI, 1997.

[154] Technical Report GSM 06.90, Digital cellular telecommunications system (phase 2+): Adaptive multirate (AMR) speech transcoding. ETSI, 1998.

[155] K. Homayounfar, “Rate adaptive speech coding for universal multimedia access,” IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 30-39, 2003.

[156] IS-127, Enhanced variable rate codec. 3GPP2 C.S0014-0, 1997.

[157] Y. Gao, E. Shlomot, A. Benyassine, J. Thyssen, H.-Y. Su, and C. Murgia, “The SMV algorithm selected by TIA and 3GPP2 for CDMA applications,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, May 2001, pp. 709-712.

[158] S. C. Greer and A. DeJaco, “Standardization of the selectable mode vocoder,” in Proc. IEEE Conf. Acoust., Speech, Signal Processing, May 2001, pp. 953-956.

[159] S. V. Andersen, W. B. Kleijn, R. Hagen, J. Linden, M. N. Murthi, and J. Skoglund, “iLBC - a linear predictive coder with robustness to packet losses,” in Proc. IEEE Workshop Speech Coding, Oct. 2002, pp. 23-25.

[160] P. Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 59-81, 1997.

[161] ATSC Doc. A/52, Digital audio compression standard (AC-3), United States Advanced Television Systems Committee (ATSC), 1995.

[162] K. Konstantinides, C.-T. Chen, T.-C. Chen, H. Cheng, and F.-C. Jeng, “Design of an MPEG-2 codec,” IEEE Signal Processing Magazine, vol. 19, no. 4, pp. 32-41, 2002.

[163] G. K. Wallace, “The JPEG still picture compression standard,” IEEE Trans. Consumer Electronics, vol. 38, no. 1, pp. xviii-xxxiv, 1992.

[164] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG 2000 still image compression standard,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36-58, 2001.

[165] MPEG-2/H.262, Generic coding of moving pictures and associated audio information - Part 2: Video, ITU-T and ISO/IEC JTC 1.

[166] H.264/AVC, Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, 2003.

[167] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, 2003.