<<

IMAGE AND VIDEO COMPRESSION AND COPYRIGHT PROTECTION

By

LEI YANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2011

© 2011 Lei Yang

To my beloved parents, elder sister and Zhe

ACKNOWLEDGMENTS

First and foremost, I am heartily thankful to my advisor, Professor Dapeng Oliver

Wu, for his great inspiration and excellent guidance throughout my dissertation and my

Ph.D. education at the University of Florida. His extensive knowledge, valuable suggestions, trust in my research, and kind consideration helped me finish my dissertation and Ph.D. study.

I would also like to thank Prof. Shigang Chen, Prof. Xiaolin Andy Li, and Prof. Yijun

Sun for serving on my dissertation committee and providing valuable suggestions on this dissertation. I am indebted to my master’s thesis advisor, Prof. Pengwei Hao, for bringing me into the research world of signal and image processing. His deep knowledge, research innovation, responsible attitude and impressive kindness have helped me develop fundamental and essential academic competence. I am fortunate to be a student of

Prof. Jianbo Gao. I have learnt a lot from his signal processing classes and elaborately designed course projects. I am thankful to my lab-mates in Multimedia Communications and Networking

Laboratory at UF. I am fortunate to join this big friendly family. I would like to thank senior lab-mates Dr. Jun Xu, Dr. Zhifeng Chen and Dr. Bing Han for research discussion;

I cherish every cooperation and discussion with my current lab-mates Qian Chen,

Huanghuang Li, Zheng Yuan, Yuejia He, Shijie Li; I would also like to thank Zongrui Ding, Dr. Xihua Dong, Wenxing Ye, Dr. Xiaocheng Li, Yakun Hu, Dr. Taoran Lv, Jiangping

Wang, Sunbao Hua, Qin Chen, Qing Wang, Dr. Youngho Jo and Chris Paulson. I wish you all success in your studies and work.

I am grateful to Dr. Debargha Mukherjee, my intern mentor at Google Inc., for his kind guidance, support and help. I am also thankful to my manager Mr. Shastra, Dr. Yaowu Xu, Dr. Jim Bankoski and Dr. Paul Wilkins for their kind instructions and support. It has been a pleasure to share the video coding world with you.

Finally, I owe my deepest gratitude to my parents, elder sister and Zhe for their endless love and constant support. Without my parents, I would never have been able to accomplish what I have today. To them, I dedicate this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ...... 4

LIST OF TABLES ...... 11

LIST OF FIGURES ...... 12

ABSTRACT ...... 15

CHAPTER

1 INTRODUCTION ...... 17

1.1 Problem Statement ...... 17
1.1.1 Research Background ...... 17
1.1.1.1 Compression ...... 17
1.1.1.2 Copyright protection ...... 22
1.1.2 Research Challenges ...... 24
1.1.2.1 Transform design ...... 24
1.1.2.2 Quantizer design ...... 25
1.1.2.3 Image hashing and authentication ...... 25
1.1.2.4 Track-and-trace video watermarking ...... 26
1.2 Contributions of the Dissertation ...... 27
1.3 Outline of the Dissertation ...... 29

2 INTEGER REVERSIBLE TRANSFORMS FOR LOSSLESS COMPRESSION OF IMAGES AND VIDEOS ...... 32

2.1 Research Background ...... 32
2.1.1 Transform Design ...... 32
2.1.2 Introduction of PLUS Factorization ...... 35
2.2 Stable PLUS Factorization Algorithms ...... 37
2.3 PLUS Factorization Optimization ...... 38
2.3.1 Transform Error Analysis ...... 39
2.3.2 Statement of Optimization Problem ...... 40
2.3.3 Optimization Algorithm with Tabu Search ...... 41
2.4 Lossless Transform for Lossy/Lossless Compression ...... 43
2.5 Experimental Results ...... 45
2.5.1 Examples of the Stable PLUS Factorization Algorithms ...... 45
2.5.2 Experiments for PLUS Factorization Optimization ...... 46
2.5.3 Experiments on Applications in Image Coding ...... 47
2.5.3.1 Integer DCT with optimal PLUS factorization ...... 47
2.5.3.2 Integer lapped biorthogonal transform with optimal PLUS factorization ...... 48
2.6 Summary ...... 51

3 ADAPTIVE QUANTIZATION USING PIECEWISE COMPANDING AND SCALING FOR GAUSSIAN MIXTURES ...... 61

3.1 Research Background ...... 61
3.2 Preliminaries ...... 63
3.2.1 MMSE Quantizer ...... 63
3.2.2 Gaussian Mixture Model and Affine Law ...... 64
3.2.3 MMSE Compander ...... 65
3.3 Adaptive Quantizer for Gaussian Mixture Models ...... 66
3.3.1 Design Methodology ...... 66
3.3.2 Three Modes ...... 66
3.3.3 Parameter Determination ...... 67
3.3.4 Piecewise Companding of Mode II ...... 68
3.3.5 Adaptive Quantizer for A General GMM ...... 69
3.3.5.1 GMM estimation by EM ...... 69
3.3.5.2 Generalization to GMM ...... 69
3.4 Reconfigurable A/D converter with Adaptive Quantizer ...... 70
3.5 High Dynamic Range Image Compression with Joint Adaptive Quantizer and Multiscale Techniques ...... 71
3.6 Experimental Results and Discussion ...... 73
3.6.1 Example and Justification of Parameter Determination ...... 73
3.6.2 MSE Performance Comparison ...... 74
3.6.3 An Application in Image Quantization ...... 76
3.6.4 Experimental Results on HDR Image Tone Mapping ...... 77
3.7 Summary ...... 78

4 APPROXIMATING OPTIMAL VECTOR QUANTIZATION WITH TRANSFORMATION AND SCALAR QUANTIZATION ...... 84

4.1 Research Background ...... 84
4.2 Preliminaries ...... 86
4.2.1 n-dimensional MMSE Quantizer and Scaling Law ...... 86
4.2.2 Circular and Elliptical Distributions ...... 88
4.2.3 Ideal Uniform Distribution and Optimal Two-dimensional Hexagon Lattice ...... 89
4.3 System Architecture ...... 90
4.3.1 Quantization for Compression ...... 90
4.3.2 Theorem and System Framework ...... 90
4.4 Preprocessing with Transforms ...... 91
4.4.1 Unitary Transforms ...... 91
4.4.2 Scaling Transforms ...... 92
4.4.3 Optimal Transform for Arbitrary Distributions ...... 93
4.5 Optimal Scalar Quantizers in Tri-axis Coordinate Frame ...... 93
4.5.1 Tri-axis Coordinate Frame ...... 94
4.5.2 Tri-Axis System for Uniform Distribution ...... 94
4.5.3 Tri-Axis Coordinate Frame for Circular and Elliptical Distributions ...... 95

4.5.3.1 Elastic quantization lattices ...... 95
4.5.3.2 Design methodology ...... 96
4.5.3.3 The number of quantization levels in each annulus ...... 98
4.5.3.4 Expansion rule ...... 99
4.5.4 Generalization to GMM or LMM ...... 101
4.5.5 Generalization to High Dimension ...... 101
4.6 Experimental Results and Discussions ...... 101
4.6.1 Basic Optimal Properties ...... 102
4.6.2 Circular Gaussian Distribution ...... 103
4.6.3 Elliptical Gaussian Distribution ...... 103
4.6.4 Circular Laplace Distribution ...... 103
4.6.5 Elliptical Laplace Distribution ...... 103
4.6.6 Bit-rate Saving ...... 104
4.7 Summary ...... 104

5 CONTENT BASED IMAGE HASHING ...... 114

5.1 Research Background ...... 114
5.2 System Overview ...... 116
5.3 Robust Descriptor of Images ...... 117
5.3.1 Preprocessing ...... 118
5.3.2 Feature Point Extraction ...... 118
5.3.3 Feature Point Description ...... 119
5.4 Hash Generation ...... 120
5.4.1 Pseudo Random Permutation of Morlet Coefficients ...... 121
5.4.2 Quantization Using Companding ...... 121
5.4.3 Binary Coding Using Gray Code ...... 122
5.5 Experimental Results ...... 122
5.5.1 The Robustness of Feature Point Detector ...... 122
5.5.2 Parameter Determination of Singularity Descriptor ...... 123
5.5.3 Discriminability and Robustness of Image Hashes ...... 124
5.5.3.1 Discriminability between different images ...... 124
5.5.3.2 Non-predictability of image hashes ...... 124
5.5.3.3 Robustness to content-preserving attacks ...... 124
5.5.3.4 Robustness to tampering ...... 125
5.5.3.5 Discriminative thresholds ...... 125
5.6 Summary ...... 125

6 CONTENT BASED IMAGE AUTHENTICATION ...... 132

6.1 Research Background ...... 132
6.2 System Overview ...... 134
6.3 Feature Point Detection ...... 135
6.3.1 Preprocessing ...... 136
6.3.2 Feature Point Extraction ...... 136
6.4 Feature Point Clustering and Matching ...... 137

6.4.1 Clustering by Fuzzy C-Means ...... 137
6.4.2 Outlier Removal ...... 138
6.4.3 Spatial Ordering and Feature Point Matching ...... 139
6.4.4 Algorithm Summary ...... 139
6.5 Distance Evaluation ...... 139
6.5.1 Normalized Euclidean Distance ...... 140
6.5.2 Hausdorff Distance ...... 141
6.5.3 Histogram Weighted Distance ...... 141
6.5.4 Majority Vote ...... 141
6.5.5 Strategy for Threshold Determination ...... 142
6.6 Possible Attack Identification ...... 142
6.6.1 Geometric Attack Estimation and Registration ...... 142
6.6.2 Tampering Attack Identification ...... 143
6.7 Experimental Results ...... 143
6.7.1 Feature Point Detection ...... 143
6.7.2 Feature Point Matching Example ...... 144
6.7.3 Authentication Performance ...... 144
6.7.4 Distance Comparison ...... 146
6.7.5 Tampering Detection ...... 146
6.8 Summary ...... 147

7 ROBUST TRACK-AND-TRACE VIDEO WATERMARKING ...... 155

7.1 Research Background ...... 155
7.2 Architecture of Robust Video Watermarking System ...... 157
7.2.1 Watermarking Embedder ...... 158
7.2.2 Watermarking Detector ...... 158
7.3 Watermark Embedding Techniques ...... 160
7.3.1 Watermark Pattern Generation ...... 160
7.3.2 Watermark Payload Generation ...... 160
7.3.3 Perceptual Weighting Model ...... 161
7.3.3.1 Temporal perceptual modeling ...... 161
7.3.3.2 Spatial perceptual modeling ...... 162
7.3.4 Geometric Anti-collusion Coding ...... 164
7.4 Watermark Detection Techniques ...... 164
7.4.1 Video Frame Registration ...... 164
7.4.1.1 Spatial registration ...... 165
7.4.1.2 Temporal registration ...... 166
7.4.2 Watermark Extraction and Payload Recovery ...... 167
7.5 Experimental Results ...... 168
7.6 Summary ...... 169

8 CONCLUSIONS ...... 177

8.1 Summary of the Dissertation ...... 177
8.2 Future Work ...... 180

8.2.1 Optimal Integer Reversible Transforms and the Lossless Video Compression ...... 180
8.2.2 A New Video Coding Strategy and RDC Optimization ...... 181

APPENDIX

A PERTURBATION ANALYSIS OF PLUS ...... 182

B PROOFS ...... 187

B.1 Proof of Proposition 3.1 ...... 187
B.2 Proof of Proposition 3.2 ...... 187
B.3 Proof of Proposition 4.1 ...... 188
B.4 Proof of Lemma 2 ...... 188
B.5 Proof of Lemma 4 ...... 189
B.6 Proof of Lemma 5 ...... 189

REFERENCES ...... 190

BIOGRAPHICAL SKETCH ...... 202

LIST OF TABLES

2-1 Performance of optimal factorizations for DCT matrices with exhaustive search. 56

2-2 E2(LUS), OMSE and OME of several PLUS factorizations for DCT matrices ...... 56
2-3 Some optimal factorizations for DCT found by exhaustive search ...... 57

2-4 Transform error E2 of optimal factorizations found by TS ...... 57
2-5 Entropy comparison of integer DCTs ...... 58

2-6 Entropy comparison among different integer transforms...... 58

2-7 Performance comparison of integer lapped biorthogonal transforms ...... 59
3-1 Proposed quantizer vs. Lloyd-Max quantizer ...... 78

3-2 Comparison of complexity of quantizers ...... 81

3-3 Comparison among histograms of images ...... 82

4-1 Average bit-rate saving of the proposed quantizers over other quantizers. ... 113

5-1 Hamming distance between Lena and its tampered version ...... 130
5-2 Hamming distance between different test images ...... 130

5-3 Hamming distance between attacked images and test images...... 131

6-1 Authentication performance comparison among different methods...... 153

6-2 Tampering detection and percentage of tampering area estimation ...... 154
7-1 Step by step results of watermarking embedder ...... 175

7-2 Cross correlation coefficient error ratio performance ...... 175

7-3 Capability of KLT based video registration for various geometric transforms .. 176

LIST OF FIGURES

1-1 The structure of the dissertation...... 31

2-1 Flowchart of 4-point integer DCT implemented with PLUS...... 51

2-2 E2 comparison between the factorizations found by three algorithms ...... 52
2-3 Convergence speed of optimization algorithm using TS ...... 52

2-4 Average bpp vs. PSNR with integer transforms for test images...... 52

2-5 Lossy performance comparison for image Barbara...... 53

2-6 Lossy performance comparison for image Lena ...... 54
2-7 Lossy performance comparison for image Baboon ...... 55

3-1 CDF of Gaussian N(0,1) vs. CDF of SGMM with µ = 0.5...... 78

3-2 Transformation function of a piecewise compressor µ = 1.5...... 79

3-3 CDF of the catenated Gaussian vs. CDF of SGMM with µ = 3...... 79

3-4 Reconfigurable A/D converter ...... 79
3-5 GMM estimation by EM algorithm on histogram of Barbara ...... 79

3-6 Tone mapping by using joint adaptive quantizer and multiscale techniques. .. 80

3-7 MSE comparison among different quantizers (σ = 1)...... 80

3-8 MSE comparison among different quantizers (σ = 2) ...... 80
3-9 Performance comparison among different quantizers when k = 4 ...... 81

3-10 Performance comparison among different quantizers when k = 5...... 81

3-11 Performance comparison between different tone mapping algorithms...... 82

3-12 Visual performance comparison between different tone mapping algorithms. .. 83

4-1 General encoding and decoding pipeline ...... 105
4-2 System architecture with transform plus scalar quantization ...... 105

4-3 Transform plus scalar quantization with companding technique...... 105

4-4 Gaussian mixture model decorrelation...... 106

4-5 Two-dimensional tri-axis coordinate system...... 106

4-6 Two dimensional optimal uniform vector quantizer ...... 106
4-7 Circularly expanded hexagon lattice for two-dimensional circular distribution ...... 107

4-8 Elliptically expanded hexagon lattices...... 107

4-9 Tri-axis frame for a general two-dimensional elliptical distribution...... 107

4-10 Expanded hexagon lattice for two-dimensional circular Gaussian distribution ...... 108
4-11 Expanded hexagon lattice for two-dimensional elliptical Gaussian distribution ...... 108

4-12 The first optimal quantization scheme...... 109

4-13 The second progressive quantization scheme...... 109

4-14 Voxel for three-dimensional uniform distribution...... 110

4-15 MSE per dimension ...... 110
4-16 Optimal magnitude levels ...... 111

4-17 Rate-Distortion comparison among different quantizers 1...... 111

4-18 Rate-Distortion comparison among different quantizers 2...... 112

4-19 Rate-Distortion comparison among different quantizers 3 ...... 112
4-20 Rate-Distortion comparison among different quantizers 4 ...... 113

5-1 Flow chart of image hash generation...... 126

5-2 The 7th frame and the 17th frame in the test video Bunny...... 126

5-3 The stable feature point detector...... 127

5-4 Hash distance at different scales ...... 128
5-5 Six test images for image hashing ...... 128

5-6 Distance between the hashes...... 129

5-7 Six tampered images of Lena...... 129

6-1 The flowchart of the proposed image authentication system...... 148

6-2 The stable feature point detector ...... 149
6-4 Six test images for image hashing ...... 150

6-5 Distance comparison among different authentication methods 1...... 151

6-6 Distance comparison among different authentication methods 2...... 151

6-7 The two frames in two shots in the test video Bunny ...... 152
6-8 Diagram of possible attack identification ...... 152

7-1 Track-and-trace video watermarking embedder...... 171

7-2 Track-and-trace video watermarking detector...... 172

7-3 Perceptual modeling diagram ...... 173
7-4 PSNR of watermarked video sequences in Table 7-1 ...... 173

7-5 Geometric transform/attacks to 5th frame of Grandma ...... 174

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMAGE AND VIDEO COMPRESSION AND COPYRIGHT PROTECTION

By

Lei Yang

August 2011

Chair: Dapeng Wu
Major: Electrical and Computer Engineering

A soaring number of digital images and videos demands efficient compression to facilitate the storage and transmission of images and videos over the Internet, and effective security and copyright protection techniques against malicious fabrication and illegal copying of digital content.

First, we focus on transform and quantizer design for compression. We design integer reversible transforms with stabilized and optimized PLUS factorization for unified lossy/lossless compression. The proposed integer DCTs and integer Lapped Biorthogonal Transform achieve better lossy/lossless image coding performance than some existing integer DCTs and the integer lapped transform in JPEG-XR. Moreover, we propose an adaptive quantizer with three modes using piecewise companding and scaling for Gaussian mixtures. The proposed quantizers approximate the MSE performance of Lloyd-Max quantizers at a complexity similar to that of uniform quantizers, and achieve higher perceptual quality in High Dynamic Range (HDR) image compression.

Furthermore, we propose a low-complexity approximation of the optimal vector quantizer using transforms plus scalar quantizers. The system is built on the tri-axis coordinate frame, works for both circular and elliptical distributions, and almost always outperforms restricted/unrestricted polar quantizers and rectangular quantizers.

Second, we study copyright protection of digital content. The proposed image hashes, generated using companding and Gray code, have a small collision rate, strong

discriminability and are difficult to analyze by attackers. We also propose an image authentication technique based on feature point clustering and matching. Query images are authenticated against anchor images. The query images are registered, and possibly tampered areas are detected. Moreover, a robust track-and-trace video watermarking system is developed, with a watermarking embedder and a detector. In the embedder, we insert a watermark pattern into video frames according to a watermark payload weighted by a human perceptual model, and transform videos with geometric anti-collusion codes. In the detector, the Kanade-Lucas-Tomasi feature tracker is used to register the candidate videos, and the cross-correlation sequence is binarized, ECC decoded and decrypted. This system is very robust to both geometric attacks and collusion attacks, and the watermarks are perceptually invisible to the human vision system.

CHAPTER 1
INTRODUCTION

1.1 Problem Statement

1.1.1 Research Background

The number of digital images and videos is increasing exponentially due to the proliferation of the Internet, digital cameras, and image and video applications. With immersive display and imaging technologies, increasing penetration of high-speed broadband, and the availability of computing power, image and video compression and copyright protection technologies have come of age, enabling images and videos to serve as a driving force for growth and innovation over the years. In many applications and end equipment, images and videos play a key role as a mechanism of information exchange, transmission and storage, and play an important part in human lives.

1.1.1.1 Compression

The soaring number of images and videos requires efficient compression.

Therefore, compression techniques are deployed in many appliances and applications, including cameras, digital cinema, cable and satellite digital video transmission for entertainment, low-latency video telephony, storage-constrained surveillance, machine vision and recognition, small-form and power-constrained mobile handsets, and other handheld devices. For these applications, lossy video compression already satisfies the human perceptual requirement. However, for especially precious data, such as high quality studio products, fine arts and antique documents, and medical and satellite data, lossless archiving is needed to preserve every pixel exactly.

A general compression system is composed of an encoder and a decoder, as shown in Figure 4-1. The encoder usually includes transform, quantization and entropy coding, while the decoder mirrors the encoder. Many image and video coding standards have evolved over the years. JPEG, a standard for still color image compression, began in the International Organization for Standardization

(ISO) in 1982, and was approved in September 1992 as ITU-T Recommendation T.81 and in 1994 as ISO/IEC 10918-1. JBIG1 and JBIG2 are for bilevel fax images or black-and-white images, and JPEG 2000 is for still color images, but they are not as popular as JPEG. The H.26x family of video coding standards was developed by the ITU-T Video Coding Experts Group (VCEG) in the domain of the ITU-T. H.261 is for conversational services. MPEG-1 and MPEG-2 are for storage and broadcast applications. H.263 and H.263+ are video compression standards originally designed as low-bitrate compressed formats for video conferencing. MPEG-7 is for multimedia content description. H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is currently one of the most commonly used formats for the recording, compression, and distribution of high-definition video. These standards are presented here in order of increasing complexity, which is not necessarily the chronological order of their completion.

Among the compression techniques, transforms are widely used in source coding, image processing and computer graphics. Transform coding is a major technique in image and video compression standards, such as JPEG, JPEG 2000, JPEG-XR [56],

MPEG [81] and H.264 [152]. Transform coding is based on the traditional transforms, such as the discrete cosine transform (DCT) in JPEG and H.264 and the discrete wavelet transform

(DWT) in JPEG 2000. For lossless compression, JPEG 2000 uses the integer discrete wavelet transform (IntDWT) and JPEG-XR uses the integer lapped biorthogonal transform (IntLBT) to realize perfect integer reversibility, while lossless JPEG and H.264 are basically based on differential pulse-code modulation (DPCM).

Many integer reversible transforms have been proposed in the literature. Besides directly designed integer transforms [160], most integer transforms are derived from the traditional transforms. Factorization is an effective tool to make the traditional transforms, such as the Discrete Cosine Transform (DCT) [7] and the Discrete Wavelet

Transform (DWT) [92, 134], faster, simpler and integer reversible. PLUS factorization is a kind of customizable triangular matrix factorization, proposed by Hao [63] as a

new framework of matrix factorization, and encompasses and generalizes quite a few triangular matrix factorizations [64, 133, 141]. In the first part of this dissertation, we will

stabilize and optimize the PLUS factorization, and perform perturbation analysis on it. We

use it to design integer reversible transforms, to make image and video compression

lossless. Quantization is another critical step in compression as well as in digitization.

The sources are in great variety. They could be univariate or multivariate, and could follow a Gaussian distribution, a Laplace distribution or a mixture distribution. How can we design the optimal quantization for an arbitrary source in the MMSE sense? The optimal vector quantizers are designed for this purpose. The resulting MMSE quantizers, or Lloyd-Max quantizers, satisfy two optimality conditions: the nearest neighbor condition (NNC) and the centroid condition (CC), as shown in Eq. (3–3) and Eq. (3–4). However, the codebook size increases linearly with the number of quantization levels N; moreover, the codebook design time increases exponentially with N. Therefore, researchers have focused on designing substitutes for the optimal vector quantizers with much lower design complexity. This is the goal of Chapter 3 and Chapter 4. There are several quantizer designs available which provide various trade-offs between simplicity and performance. This also falls within the study of Rate-Distortion-Complexity (RDC) theory.

For univariate quantizers, existing quantization schemes can be classified into

two categories, namely, uniform quantization and nonuniform quantization [60, 61].

Uniform quantization is simple and optimal for uniform distributions, but it is not optimal for signals with nonuniform distributions when more computation and storage are available. Nonuniform quantization is much more complex and comes in great variety. Minimum mean squared error (MMSE) quantization (a.k.a. Lloyd-Max quantization) is a major type of nonuniform quantization. It is optimal in the sense of mean squared error (MSE), but incurs high computational complexity. Companding, which consists of a nonlinear transformation followed by uniform quantization, is a technique capable of trading off quantization performance against complexity for nonuniform quantization. In particular, at high rates, the performance of companding approaches that of Lloyd-Max quantization asymptotically.
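To make the two optimality conditions concrete, the following MATLAB sketch alternates the NNC and the CC on a sampled Gaussian source until the codebook converges; the source, the number of levels and the tolerance are illustrative choices of ours, not the designs developed in Chapter 3.

    % Lloyd-Max design by alternating the NNC and the CC on sampled data.
    % Requires implicit expansion (MATLAB R2016b or later).
    rng(0);                              % reproducible source
    x = randn(1e5, 1);                   % standard Gaussian training samples
    N = 8;                               % number of quantization levels
    c = linspace(-2, 2, N)';             % initial reconstruction levels
    for iter = 1:200
        [~, idx] = min(abs(x - c'), [], 2);   % NNC: nearest-level assignment
        cOld = c;
        for j = 1:N                           % CC: centroid of each cell
            if any(idx == j), c(j) = mean(x(idx == j)); end
        end
        if max(abs(c - cOld)) < 1e-6, break; end
    end
    mse = mean((x - c(idx)).^2);         % empirical distortion of the design

Each pass re-partitions the samples by nearest level (NNC) and moves each level to its cell centroid (CC), so the empirical MSE is non-increasing from one iteration to the next.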

Lloyd-Max quantizers and companders are already well developed for the Gaussian distribution and the Laplacian distribution [61, 68, 109], but not for the Gaussian mixture model (GMM). Since a GMM serves as a good approximation of an arbitrary distribution, it is important to develop quantizers and companders for GMM, which are expected to find wide applications in ADC and high dynamic range (HDR) image compression, as well as audio [111] and video [152] compression.

To address this, we propose a succinct adaptive quantizer with piecewise companding and scaling for GMM in this dissertation. We first consider a simple GMM (SGMM) that consists of two Gaussian components with means −µ and µ, respectively, and the same variance σ^2. The proposed quantizers have three modes, making them capable of adapting their reconstruction levels to the varying means and variances of the

Gaussian components in a GMM.

For multivariate quantization, to reduce codebook design and lookup time, much research has focused on two-dimensional random variables (r.v.), especially those with circular Gaussian distributions, since Gaussian distributions [20, 39] have many elegant closed-form theorems. The earliest work traces back to Huang and

Schultheiss’s method [66], which quantizes each dimension of random variables with

separate Lloyd-Max quantizers [97]. It is efficient and effective, but can certainly be improved. Later, Zador [163] and Gersho [57] studied quantization using companders with a large number of quantization levels. They used a compressor to transform the data into a uniform distribution, applied the optimal quantizers for the uniform distribution, and then transformed the data back with an expander. But this scheme does not work well with a small number of quantization levels. Another major method for designing quantizers for circular distributions uses polar coordinates. Polar quantization consists of separable magnitude quantization and phase quantization. The optimal ratio between the number of magnitude quantization levels and the number of phase quantization levels was studied by Pearlman [108] and Bucklew et al. [21, 22], and an MMSE restricted polar quantizer is implemented by using a uniform quantizer for the phase angles and a scaled Lloyd-Max Rayleigh quantizer for the magnitude. But this scheme does not consider the center of a circular distribution as a quantization level; thus, its MSE performance is sometimes worse than rectangular quantizers and other lattice quantizers, and it does not work well for elliptical distributions. Wilson [153] proposed a series of non-continuous quantization lattices which provide almost the optimal performance among polar quantizers. It is a kind of unrestricted polar quantization, but without Dirichlet boundaries. Peter et al. [135] improved Wilson's scheme by replacing arc boundaries with Dirichlet boundaries, and showed the optimal circularly symmetric quantizers for circular Gaussian distributions. Most of these previous works concentrate on Gaussian distributions and provide numerical results only for Gaussian distributions. Although the Gaussian source is considered the "worst case" source, which is instructive for constructing a robust quantizer [27], it is far from optimal for quantizing other distributions. These works did not consider elliptical distributions either, whose optimal quantizers are different from those for circular distributions. Nor did they provide a unified framework for arbitrary distributions. Therefore, the optimal quantizers for other distributions, such as Laplacian distributions and elliptical Gaussian and Laplacian distributions, need investigation.
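The restricted polar scheme discussed above can be sketched in a few MATLAB lines; to stay in base MATLAB we replace the Lloyd-Max Rayleigh magnitude quantizer with centroids of equal-probability magnitude bins, so this is a simplified stand-in rather than the published design. Note that no reconstruction level sits at the origin, which is exactly the weakness pointed out above.

    % Simplified restricted polar quantizer for a circular Gaussian source:
    % uniform phase levels plus magnitude centroids of equal-probability bins
    % (a stand-in for the scaled Lloyd-Max Rayleigh magnitude quantizer).
    rng(1);
    z  = complex(randn(1e5, 1), randn(1e5, 1));   % circular Gaussian samples
    Np = 8;  Nm = 4;                              % phase / magnitude levels
    phq = (floor(angle(z) / (2*pi/Np)) + 0.5) * (2*pi/Np);  % phase bin centers
    r  = abs(z);
    rs = sort(r);
    edges = [0; rs(round((1:Nm-1)' * numel(r) / Nm)); inf];  % probability cuts
    rq = zeros(size(r));
    for j = 1:Nm
        in = r >= edges(j) & r < edges(j+1);
        rq(in) = mean(r(in));                     % centroid of each annulus
    end
    zq  = rq .* exp(1i * phq);                    % reconstructed points
    mse = mean(abs(z - zq).^2) / 2;               % MSE per dimension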

To address these problems, we propose a unified quantization system to approach the optimal vector quantizers by using transforms and scalar quantizers. The effect of transforms on signal entropy and signal distortion is discussed, especially for unitary transforms and volume-keeping scaling transforms. The optimal decorrelation transform is illustrated, which turns a source with memory into a memoryless source in the ideal case. We then focus on quantizer design for memoryless circular and elliptical sources. The tri-axis coordinate frame is proposed to determine the quantization lattice, i.e., the positions of the quantization levels, inspired by the well-known optimal hexagon lattice for two-dimensional uniformly distributed signals [68]. It provides a unified framework for both circular and elliptical distributions, and encompasses polar quantization as a special case. The proposed quantizer is also a kind of adaptive quantizer with an elastic lattice. We present the simple design methodology and utilize the Lloyd-Max quantizers for the corresponding one-dimensional distributions. The optimality of this scheme is verified on elliptical/circular Gaussian and Laplacian distributions. The methodology description and experiments focus on bivariate random variables, and the extension to high-dimensional random variables is also discussed.
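To give the flavor of lattice-based quantization before the elastic construction of Chapter 4, the sketch below snaps two-dimensional points to a hexagonal lattice by rounding in the lattice basis and checking neighboring sites; the scale factor and the 3-by-3 candidate search are illustrative simplifications of ours.

    % Nearest-point quantization on the two-dimensional hexagonal lattice,
    % the asymptotically optimal lattice for uniform sources cited above.
    a = 0.5;                                  % lattice scale (illustrative)
    B = a * [1, 1/2; 0, sqrt(3)/2];           % hexagonal lattice basis
    x = rand(2, 1e4);                         % uniform points to quantize
    u = round(B \ x);                         % rounded basis coordinates
    best = inf(1, size(x, 2));  xq = zeros(size(x));
    for du = -1:1                             % plain rounding is not exact on
        for dv = -1:1                         % a skewed basis: scan neighbors
            cand = B * (u + [du; dv]);
            d = sum((cand - x).^2, 1);
            sel = d < best;
            best(sel) = d(sel);  xq(:, sel) = cand(:, sel);
        end
    end
    mse = mean(best) / 2;                     % MSE per dimension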

1.1.1.2 Copyright protection

Besides compression techniques, the security and copyright of digital images and videos have also risen to a position that cannot be ignored. Digital images and videos facilitate multimedia processing and, at the same time, make fabricating and copying digital content easy, with the increase of various image and video applications. Computers, interconnected via the Internet, make the distribution of digital media fast and easy, and allow exact copies to be obtained without effort. This poses great challenges to copyright protection for digital media. To protect the copyright of images and videos, efficient and automatic techniques are needed to identify and verify the content of digital multimedia.

Besides the long-established copyright protection tool of image watermarking

[112], image hashing [105] emerges as an effective tool to represent images and automatically identify whether the query image is a fabrication or a copy of the original one. As an alternative to image watermarking, image hashing can be applied to many applications previously accomplished by watermarking, such as copyright protection,

image authentication. It can also be used for image indexing and retrieval as well as video signatures. Unlike watermarking, image hashing need not change the image by inserting watermarks into it. An image hash is a short binary string, mapped from an image by an image hash function. The image hash function has the property that perceptually identical images should have the same or similar hash values with high probability, while perceptually different images should have quite different hash values.

In addition, the hash function should be secure, so that an attacker cannot predict the hash value of a known image.
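The matching rule implied by this property fits in two lines of MATLAB: hashes are compared by normalized Hamming distance and a threshold separates "same content" from "different content". The toy hash values and the threshold below are illustrative.

    % Hash comparison by normalized Hamming distance (toy values).
    h1 = logical([1 0 1 1 0 0 1 0]);     % hash of the anchor image
    h2 = logical([1 0 1 0 0 0 1 0]);     % hash of the query image
    d  = mean(xor(h1, h2));              % normalized Hamming distance in [0,1]
    tau = 0.2;                           % illustrative decision threshold
    isSame = (d <= tau);                 % perceptually identical if below tau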

Image authentication [117] is a promising technique for automatically identifying whether a query image is a fabrication or a simple copy of the original one. It can utilize the established image hashing techniques.

Content changes in video are less noticeable than content changes in images. Therefore, we apply watermarking techniques to video copyright protection [82] rather than image copyright protection. Digital watermark embedding is a process of integrating user and copyright information into the carrier media in a way that is invisible to the human vision system (HVS). Its purpose is to protect digital works from unauthorized duplication or distribution. A video watermarking system is designed to embed the watermark in such a way that it can be detected later for authentication, copyright protection, and tracking and tracing of illegal distribution. Videos, composed of multiple frames, can utilize image watermarking techniques in a frame-wise manner. Although the watermark embedding capacity of videos is much larger than that of images, the attacks that video watermarking suffers are more complicated than those on image watermarking. The attacks include not only spatial attacks, but also temporal attacks, hybrid spatial-temporal attacks and collusion attacks. These challenges motivate us to investigate image/video compression and copyright protection, focusing on transform design, quantizer design, image hashing, image authentication and video watermarking.

1.1.2 Research Challenges

1.1.2.1 Transform design

Integer reversible transforms are one-to-one mappings, and any permutation is an integer reversible transform. For integer reversible transform design for lossless compression, the best energy-concentrating ability, i.e., the best decorrelation ability, is desired. After an integer reversible transform, the joint entropy of the subband coefficients does not change, but the sum of the marginal entropies of the subband coefficients decreases. The least sum of marginal entropies is achieved when the subband coefficients are independent of each other, i.e., their mutual information is 0. For a multi-dimensional information source, such as an image or a video, the least-entropy integer transform, i.e., the optimal permutation, can be derived approximately by algorithms. However, since the content of images and videos varies, the optimal integer transform for one source may not be optimal globally. We therefore wish to design an optimal integer reversible transform in a statistical sense, since the joint two-dimensional distributions of sources are statistically similar. It is well known that the traditional orthogonal transforms have good decorrelation ability, and geometrically they are rotations in Euclidean space. Therefore, integer reversible transforms derived from or mimicking the traditional transforms work well, but still need improvement. In addition to strong decorrelation ability, piecewise linearity and the least transform error are desired of integer reversible transforms. These two properties guarantee that integer reversible transforms can be applied directly to lossy compression, with performance competitive with the traditional transforms. Moreover, the least transform error ensures that the coefficients of an integer reversible transform approach those of the original transform as closely as possible. Later on, we will present our methods using PLUS factorization to address these problems.
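The entropy argument above can be checked empirically: a reversible two-point average/difference step (the S transform, an integer Haar realization) leaves the pair exactly recoverable but shrinks the sum of marginal entropies of a correlated pair. The source model and histogram-based entropy estimate below are illustrative choices of ours.

    % Sum of marginal entropies before and after a reversible 2-point
    % transform on a correlated integer pair (S transform, integer Haar).
    rng(2);
    n  = 1e5;
    x1 = round(8 * randn(1, n));
    x2 = x1 + round(2 * randn(1, n));    % strongly correlated with x1
    y1 = floor((x1 + x2) / 2);           % reversible: x1 = y1 + floor((y2+1)/2)
    y2 = x1 - x2;                        %             x2 = x1 - y2
    ent = @(v) -sum(nonzeros(histcounts(v, min(v)-0.5:max(v)+0.5) / numel(v)) .* ...
        log2(nonzeros(histcounts(v, min(v)-0.5:max(v)+0.5) / numel(v))));
    fprintf('before: %.3f bits   after: %.3f bits\n', ...
            ent(x1) + ent(x2), ent(y1) + ent(y2));

The difference y2 carries only the small innovation between the two pixels, so its marginal entropy is low while the pair remains losslessly invertible.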

1.1.2.2 Quantizer design

Quantizer design considers the trade-offs between simplicity and performance.

Our concern is how to design a quantizer that has Rate-Distortion performance similar to that of the optimal vector quantizer, but with much less complexity. The approximation should work well for sources with arbitrary distributions. The source could be univariate or multivariate. It could be a source with memory or a memoryless source. It could have a circular or an elliptical distribution. The uniform and Gaussian distributions are well investigated; what about other distributions? Can we use companding to transform an arbitrary distribution into a uniform or Gaussian distribution, and then quantize with the Lloyd-Max quantizers for the uniform or Gaussian distribution? Furthermore, how to capture the source distribution information to adaptively tune the quantizers in a unified framework needs investigation. Then, with a certain performance guarantee, how much of the complexity of quantizer design and quantization implementation can we reduce? A memoryless source usually admits a simpler quantizer design than a source with memory, and with transforms we can turn a source with memory into a memoryless source. Thus, with the transform design techniques at hand, we can use transforms to simplify quantizer design and implementation as well as improve quantization performance. Which transforms are effective? We concentrate on these problems and questions in Chapter 3 and Chapter 4.
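The companding question raised above has a direct probability-integral-transform sketch: compress with the source CDF, quantize uniformly, expand with the inverse CDF. A Gaussian source keeps this in base MATLAB via erf/erfinv. Note that the CDF compressor is a convenient illustration, not the MSE-optimal compressor (Bennett's optimal compressor follows the cube root of the density), which is part of what Chapters 3 and 4 address.

    % CDF companding: compress to uniform, quantize uniformly, expand back.
    rng(3);
    x    = randn(1e5, 1);                      % Gaussian source
    F    = @(t) 0.5 * (1 + erf(t / sqrt(2)));  % Gaussian CDF   (compressor)
    Finv = @(u) sqrt(2) * erfinv(2*u - 1);     % inverse CDF    (expander)
    N    = 16;                                 % quantization levels
    u    = min(max(F(x), eps), 1 - eps);       % clamp to (0,1) for safety
    uq   = (floor(u * N) + 0.5) / N;           % uniform quantization of u
    xq   = Finv(uq);                           % reconstruction in source domain
    mse  = mean((x - xq).^2);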

1.1.2.3 Image hashing and authentication

For image copyright protection, image hashing should be robust to malicious image processing. A hash consists of a compact representation of some image features. A typical hash is robust to image filtering, but surrenders to geometric attacks and may not be collision free. The image hash based on the Scale Invariant Feature Transform (SIFT) algorithm [48] and compressive sensing [138] can withstand geometric attacks to a certain degree, but it is computationally expensive. Lin and Chang [86] used the mutual relationship of pairwise block DCT coefficients to distinguish JPEG compression from malicious modifications. But the block-based method is unreliable under some geometric attacks, since possible shifting and cropping operations may change hash values. Venkatesan et al. [147] proposed an image hashing technique in which hashes are generated from statistical features extracted from a random tiling of wavelet coefficients. However, it allows only limited resistance to geometric distortions, and is susceptible to some manipulations, such as luminance change and object insertion. To address these problems, we propose content based image hashing using companding and Gray code.

Image authentication techniques usually include conventional cryptography, fragile and semi-fragile watermarking, digital signatures and so on. The authentication process can be assisted by the original image or operate in its absence. Image authentication methods based on cryptography use a hash function [79, 130] to compute a message authentication code (MAC) from images. The generated hash is further encrypted with a secret key from the sender, and then appended to the image as an overhead, which is easy to remove. Fragile watermarking usually refers to reversible data hiding [23, 140, 160, 162]. These methods cannot distinguish tolerable changes from malicious changes. Semi-fragile watermarking has attack resistance between that of fragile and robust watermarking, and there is a trade-off between image quality and watermark robustness. Digital signature based techniques, also called image hashing, are image content dependent. An image hash is a representation of the image. Besides image authentication, it can also be used for image retrieval and other applications. But it is not intended to identify the locations of changes. To address these problems, we propose content based image authentication by feature point clustering and matching.

1.1.2.4 Track-and-trace video watermarking

A video watermarking system is desired to embed the watermark in such a way that it can be detected later for authentication, copyright protection, and tracking and tracing of illegal distribution. Videos, composed of multiple frames, can utilize image watermarking techniques in a frame-wise manner [112]. Although the watermark embedding capacity of videos is much larger than that of images, the attacks from which video watermarking suffers are more complicated than those on image watermarking. The attacks include not only spatial attacks, but also temporal attacks and hybrid spatial-temporal attacks. In the literature of track-and-trace video watermarking, algebra-based anti-collusion codes have been investigated [10, 17, 24, 139, 144, 151, 155]. Their ability to trace one or multiple colluders depends on the assumption that the code is always available and error-free, which may not be true in practice. Besides, the length of the anti-collusion code determines the system's user-group capacity. Hence, practical and multi-functional watermarking systems based on algebraic anti-collusion codes are very limited. To this end, we propose a robust track-and-trace watermarking system for digital video copyright protection [158].
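The embed/correlate principle behind such a detector fits in a few lines: a keyed pseudo-random pattern is added to a frame and later detected by normalized correlation. The frame model, embedding strength and decision threshold below are illustrative; the full system described in Chapter 7 adds perceptual weighting, AES/ECC payload coding and geometric registration.

    % Additive spread-spectrum embedding and blind correlation detection.
    rng(42);                                  % the secret key seeds the pattern
    w = sign(randn(256));                     % +/-1 frame-size PN pattern
    frame  = 128 + 16*randn(256);             % stand-in for a luminance frame
    marked = frame + 2*w;                     % embedding with strength 2
    r   = marked - mean(marked(:));           % remove the frame's DC component
    rho = (r(:)' * w(:)) / (norm(r(:)) * norm(w(:)));  % normalized correlation
    present = rho > 0.05;                     % illustrative decision threshold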

1.2 Contributions of the Dissertation

The major contributions of our work are summarized as follows:

1. We design integer reversible transforms based on the optimized PLUS factorization.

• We stabilize and optimize the PLUS factorization, and perform perturbation analysis on it to prove its numerical stability.
• We propose the integer DCT and the integer Lapped Biorthogonal Transform by using the optimized PLUS factorization.
• Experimental results show the superiority of our algorithms over some existing integer DCT algorithms, and over the integer lapped transform factorization in JPEG-XR, for lossy/lossless image coding.

2. We design adaptive quantization using piecewise companding and scaling for Gaussian mixtures with three modes.

• The experimental results show that 1) the proposed quantizer achieves performance close to the Lloyd-Max quantizer in the sense of Mean Squared Error (MSE), at much lower computational cost; and 2) the proposed quantizer achieves much better MSE performance than a uniform quantizer, at a similar cost.
• We propose a reconfigurable architecture to implement our adaptive quantizer in an ADC.

• We use it to quantize images and design a tone mapping algorithm for high dynamic range (HDR) image compression, yielding improved visual performance.

3. We design the optimal vector quantizer approximators with transforms and scalar quantizers, especially for two-dimensional random vectors.

• It provides an elegant quantization lattice for an arbitrary number of quantization levels, especially for prime numbers.
• It almost always has smaller MSE than the other quantizers, and has small design and implementation complexity.
• It considers both memoryless sources and sources with memory, with arbitrary distributions, such as circular distributions, elliptical distributions and mixed distributions.
• It is built under the unified framework of the tri-axis coordinate frame.

4. We propose a robust image hashing system.

• The k-largest local total variations are robust to content-preserving attacks such as geometric attacks and luminance attacks, and indicate stable similar feature points in perceptually identical images.
• The Morlet wavelet coefficients are pseudo-randomly permuted with a secret key, which enhances the security and reduces the collision rate of the image hashing system.
• The Morlet wavelet coefficients are quantized using a companding technique according to the probability distribution of the coefficients, which makes the image hashing robust to contrast change and gamma correction of images.
• Gray code is used to binarize the quantized coefficients, which increases discriminability between image hashes (a minimal sketch follows this list).
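As referenced in the last item above, binary-reflected Gray coding maps quantizer indices so that adjacent levels differ in exactly one bit; a one-level quantization perturbation therefore flips at most one hash bit. A minimal MATLAB sketch:

    % Binary-reflected Gray coding of quantizer indices.
    q = uint16(0:7);                     % quantized coefficient indices
    g = bitxor(q, bitshift(q, -1));      % Gray code: g = q XOR (q >> 1)
    disp(dec2bin(g, 3));                 % 000 001 011 010 110 111 101 100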

5. We propose a robust image authentication system.

• Feature points are first generated from a given image, but their locations may change due to possible image processing and degradation. Accordingly, we propose to use the Fuzzy C-means clustering algorithm to cluster the feature points and remove the outliers.
• A histogram weighted distance is proposed, which is equivalent to the Hausdorff distance after outlier removal (see the sketch after this list).
• The authenticity of the query image is determined by a majority vote on whether three types of distance between matched feature point pairs are larger than their respective thresholds.

• The geometric transforms through which the query images are aligned with the anchor images are estimated, and the query images are registered.
• The possible tampered image blocks are identified, and the percentage of the tampered area is estimated.
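As referenced in the distance item above, the Hausdorff distance measures the worst-case nearest-neighbor deviation between two point sets. The sketch below computes the symmetric version on illustrative matched feature point coordinates.

    % Symmetric Hausdorff distance between two feature point sets (toy data).
    P = [10 20; 40 35; 70 80];            % feature points of the anchor image
    Q = [12 19; 41 37; 69 82];            % matched points of the query image
    D = sqrt(sum((permute(P, [1 3 2]) - permute(Q, [3 1 2])).^2, 3));  % pairwise
    hPQ = max(min(D, [], 2));             % directed distance h(P,Q)
    hQP = max(min(D, [], 1));             % directed distance h(Q,P)
    H   = max(hPQ, hQP);                  % symmetric Hausdorff distance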

6. We propose a robust track-and-trace video watermarking system. The system provides:

• Security: user and product copyright information, e.g., a string of length Ls, is first encrypted with the Advanced Encryption Standard (AES); an error correction code (ECC) is applied to the sequence to generate a binary sequence of length L with error-correction ability, called the watermark payload; a frame-size watermark pattern arises from a pseudo-random noise (PN) sequence.
• Perceptual invisibility and robustness: to trade off visual quality against robustness, which is determined by the embedding strength, we build a perceptual model that determines the signal strength that can be embedded into each pixel by using statistical source information and a Just-Noticeable-Difference (JND) model.
• Track-and-trace: geometric anti-collusion coding is used for tracking and tracing colluders.
• An iterative KLT scheme is used for registration for watermark extraction.

1.3 Outline of the Dissertation

The structure and inner connections of the dissertation are shown in Figure 1-1. The first, theoretical part on transforms and quantization for compression provides tools for the second, system-based part on copyright protection. In the first part, transform and scalar quantization are combined to approximate the optimal vector quantization. In the second part, image hashing is incorporated into the image authentication system.

The outline of each chapter is presented as follows.

Chapter 2 studies integer reversible transforms for lossless image/video compression by using PLUS factorization. We propose stabilized PLUS factorization in Section 2.2, and perform perturbation analysis in Appendix A, which proves the numerical stability of PLUS factorization. Furthermore, we optimize the PLUS factorization to achieve the least transform error by using a Tabu Search algorithm in Section 2.3. Then we study the lossless transform for lossy/lossless image compression in Section

2.4. We propose the integer DCT and the integer Lapped Transform by using the optimized PLUS factorization, and test the lossy/lossless image coding performance.

Experimental results show the superiority of our algorithms over some existing integer

DCT algorithms, and over the lapped transform factorization in JPEG-XR, in Section 2.5.

Finally, we conclude this chapter in Section 2.6.

Chapter 3 designs adaptive quantization using piecewise companding and scaling for Gaussian mixtures. Section 3.2 presents the preliminaries of optimal adaptive quantizers. Section 3.3 describes the proposed adaptive quantizer for GMM. In

Section 3.4, we propose a reconfigurable architecture to implement our adaptive quantizer in an ADC. In Section 3.5, the proposed quantizer is applied to high dynamic range image compression. Experimental results are exhibited in Section 3.6. Section 3.7 concludes the chapter.

Chapter 4 designs the optimal vector quantizer approximators with transforms and scalar quantizers. Section 4.2 presents the preliminaries of our proposed quantizer. Section 4.3 describes the system architecture of transform plus scalar quantization to approach the optimal vector quantizer. The preprocessing with transforms is discussed in Section 4.4 to decorrelate signals. In Section 4.5, we present the tri-axis coordinate frame, and the methodology to design the optimal scalar quantizer for both circular and elliptical distributions in detail. Experimental results are shown in Section 4.6. Finally, Section 4.7 concludes the chapter.

Chapter 5 proposes a robust image hashing system. In Section 5.2, we present an overview of the proposed image hashing system. In Section 5.3, we describe how to extract the robust feature of images, namely the Morlet wavelet coefficients at feature points with the k-largest local total variations. Then the Morlet wavelet coefficients are quantized and binary coded with Gray code, as shown in Section 5.4. Section 5.5 shows the experimental results that demonstrate the effectiveness and robustness of the proposed image hashing system. Finally, we conclude this chapter in Section 5.6.

Chapter 6 investigates a robust image authentication system. Section 6.2 presents an overview of the proposed image authentication system. Section 6.3

describes how to detect feature points in images. In Section 6.4, we propose an efficient

and effective algorithm to remove outliers of feature points, and the remaining feature

points are ordered and matched into pairs. In Section 6.5, the histogram weighted distance is proposed, and the normalized Euclidean distance and the Hausdorff distance are also used.

A majority voting strategy is used to determine the authenticity of images. In Section 6.6,

possible attacks are identified, the query images are registered, the tampered image

blocks are located, and the percentage of tampered area is estimated. Experimental

results are shown in Section 6.7. Finally, Section 6.8 concludes the chapter.

Chapter 7 studies track-and-trace video watermarking. Section 7.2 describes

the overall architecture of the proposed track-and-trace video watermarking system.

The watermarking embedder techniques are discussed in Section 7.3. Section 7.4

introduces the watermarking detector techniques. The experimental results presented in Section 7.5 verify the robustness of the proposed video watermarking. Finally, the conclusion and future work are given in Section 7.6.

Figure 1-1. The structure of the dissertation.

CHAPTER 2
INTEGER REVERSIBLE TRANSFORMS FOR LOSSLESS COMPRESSION OF IMAGES AND VIDEOS

2.1 Research Background

2.1.1 Transform Design

Transforms are widely used in source coding, image processing and computer graphics. Transform coding is a major technique in image and video compression standards, such as JPEG, JPEG 2000, JPEG-XR [56], MPEG [81] and H.264 [152]. Besides directly designed integer transforms [160], most integer transforms are derived from the traditional transforms. Factorization is an effective tool to make the traditional transforms, such as the Discrete Cosine Transform (DCT) [7] and the Discrete

Wavelet Transform (DWT) [92, 134], as well as the newly emerging ripplet transform [156], faster, simpler and integer reversible. These linear transforms, such as Fourier transforms, discrete cosine transforms and wavelet transforms, also have wide applications in general signal processing, with their ability of energy-compact frequency decomposition. For prevalent digital images and videos, input signals are available as integer data sequences, or more generally as fixed-point data sequences. In some special applications, such as military, medical and remote sensing imaging [113], loss of information is not tolerated during processing.

Therefore, integer reversible transforms are highly desirable. However, these linear transforms cannot achieve perfect reversibility directly due to the precision limitation of computers. In the literature of transforms, the indirectly implemented transforms usually refer to integer discrete wavelet transforms (IntDWT), integer discrete cosine transforms

(IntDCT), integer discrete Fourier transforms (IntDFT) and so forth.

Generally speaking, an integer reversible transform is the marriage of a linear transform and an appropriate reversible transform framework with rounding operations that approximates the linear transform. Sweldens et al. [44, 136, 137] first proposed factoring wavelets into lifting steps to realize integer

transformation. With the lifting scheme, a new generation of wavelets was constructed, calculated in-place and with further reduced computational complexity. Chen [30] and Liang et al. [85] combined the lifting framework with discrete cosine transforms to construct fast integer DCTs, and Oraintara et al. [104] proposed fast integer Fourier transforms (IntFFT), which are also based on the lifting scheme. Besides, another integer transform framework, the overlapping rounding transform (ORT), was developed by

Jung and Prost [72], which, as later proved by Adams [6], is equivalent to a special case of lifting with only trivial extensions. For generic linear transforms, elementary reversible matrices with Gaussian integer units as the diagonal entries were proposed by Hao and Shi [64] for triangular matrix factorization to realize integer-to-integer transforms. A linear transform with unitary determinant was further proved to be integer reversible if a PLUS factorization was applied to the transform matrix [63]. Differently,

Plonka [110] chose expansion factors for transforms to expand the ranges of the transforms beyond their input domains, and this transform domain redundancy was utilized for reversibility. Modulo transforms (MT), an alternative to lifting, were proposed recently by Srinivasan [131]; they employ certain Pythagorean triples that can be critically quantized to produce a reversible, normalized, scale-free transform, and the ladder structure is employed to approximate orthonormal transforms by 2-point rotations. For linear transforms, the bit width of the low-frequency coefficients is generally greater than that of the original data, so the dynamic ranges of the transforms are expanded [78]. For example, the discrete cosine transform matrix of type II of order n is

C_n^{II} := \left[ \sqrt{2/n}\; \varepsilon_n(j) \cos\frac{j(2k+1)\pi}{2n} \right]_{j,k=0}^{n-1}   (2–1)

where \varepsilon_n(0) = \sqrt{2}/2 and \varepsilon_n(j) = 1 for j \in \{1,\cdots,n-1\}. It is easy to see that \|C_n^{II}\|_\infty = \sqrt{n}. This indicates that m-bit inputs result in (m + \log_2\sqrt{n})-bit outputs, i.e., 16-bit memory space is generally needed in programs to store a 9-bit output for each 8-bit input after 4-point

DCT transformation on general personal computers. Moreover, the corresponding integer reversible transforms, IntDCT, usually need even larger expanded ranges to keep reversibility. However, if we encounter a computational environment with only a limited buffer, fixed-point arithmetic units and fixed-width channels, or with a fixed representation word length, the traditional integer transforms will fail, and dynamic-range-preserving transforms are desired. They are appealing for saving memory and computational resources in both compression and decompression, and attractive for speeding up the computational process of coding, since only processing units for fixed-bitwidth operands are needed. Integer reversible and dynamic-range-preserving transformation is a challenging research topic, since a constant dynamic range, compact coefficient representation and reversibility are conflicting goals. Not much has been done to find the best of all possible solutions, except three methods. Chao et al.

[26] utilized complementary codes and modular arithmetic to automatically preserve the dynamic ranges of integer wavelet transforms. However, the application of this approach is limited to lossless compression, since large positive transform coefficients become negative and vice versa due to modular arithmetic. Thus, the recovered images suffer severe blocking artifacts and salt-and-pepper noise when transform coefficients are lossily compressed. The other two approaches to avoiding dynamic range expansion are the Table-Lookup Haar-like transform (TLHaar) and the Piecewise Linear Haar-like transform (PLHaar), proposed by Senecal et al.

[123, 124]. TLHaar and PLHaar are two-point transforms, both evolving from the S

Transform, the integer realization of the Haar wavelet transform. TLHaar needs a dynamic look-up table built by optimized permutation, and PLHaar is a special 2-D case of our proposed infinity-norm rotation. In [160], we construct general integer-reversible and dynamic-range-preserving infinity-norm rotation transforms by analogy with rotations in the 2-norm space (orthogonal transforms, e.g., DCT). Although PLHaar is a special case

34 of our proposed infinity-norm rotation transforms, it is not straightforward to obtain our proposed transforms from PLHaar.

2.1.2 Introduction of PLUS Factorization

PLUS factorization is a kind of customizable triangular matrix factorization, proposed by Hao [63] as a new framework of matrix factorization, and encompasses and generalizes quite a few triangular matrix factorizations [64, 133, 141]. The PLUS factorization of a general nonsingular matrix A is formulated as:

A = PLUS   (2–2)

where matrices P, L and U are, almost the same as in LU factorization, permutation, unit lower triangular and upper triangular matrices, respectively, while S is a very special matrix, which is unit, lower and triangular, with no more than N − 1 nonzero off-diagonal elements. Different from LU factorization, all the diagonal elements of U in PLUS factorization are customizable, i.e., the diagonal elements can be assigned almost freely, as long as the resulting determinant equals that of A up to a possible sign adjustment. With PLUS factorization, a nonsingular matrix A is easily factorized further into a series of special matrices similar to S. Furthermore, the permutation matrix can also be substituted with a pseudo-permutation matrix, which is a simple unit upper triangular matrix with 0, 1 and −1 as its off-diagonal elements. Besides PLUS, a customizable factorization also has other alternatives: LUSP, PSUL or SULP with lower S, and PULS, ULSP, PSLU or

SLUP with upper S, which are all taken as varieties of PLUS factorization.

Currently, lifting factorization [44] is mostly used to factorize transforms into lifting steps, both to simplify the transforms and to make them integer reversible, as in JPEG-XR [51]. But these factorizations are mostly found by experience and experiment. The PLUS factorization, in contrast, provides a general and universal way to factorize transform matrices of any order into products of Elementary Reversible Matrices [64]. With Elementary Reversible Matrices as factor matrices, PLUS factorization is a powerful tool for realizing integer reversible transforms [5], when assisted by the ladder structure and rounding operations [64]. Therefore, it has promising applications in lossless/lossy coding [64, 67, 128] and reversible image processing [38, 159]. Meanwhile, an elementary reversible matrix is also a triangular shear matrix; thus it has also found applications in computer graphics, where matrix-based transformations dominate, such as transformation acceleration by shears [28] and fast image registration [29]. However, the existing PLUS factorization suffers from two limitations: instability and sub-optimality. By instability, we mean that the PLUS factorization may stop because of zero or near-zero pivoting during Gaussian elimination. By sub-optimality, we mean that the PLUS factorization found by the existing algorithm may lead to large transform error, which may deprive the products of the factor matrices of the good properties of the original transform matrices, such as orthogonality and high energy-compacting ability.

This chapter addresses these problems (other problems of PLUS factorization, such as block factorization and parallel computing arising from solving large linear systems, can be found in [125–127]). Our main contributions include:

1. We propose three stable PLUS factorization algorithms in Matlab-style pseudocode, together with the methodology used to stabilize the factorization.

2. We prove a stability theorem and perform perturbation analysis of PLUS factorization, to guarantee the stability of our algorithms theoretically.

3. We obtain a closed-form formula for the transform error of PLUS factorization, and propose an optimization algorithm based on Tabu Search to quickly find the optimal factorization and realize integer transforms with the least transform error.

4. We apply our algorithms to realize an integer DCT and an integer Lapped Biorthogonal Transform, and test their lossy/lossless image coding performance. Experimental results show the superiority of our algorithms over some existing integer DCT algorithms and over the lapped transform factorization in JPEG-XR.

Stabilization, optimization and perturbation analysis of PLUS factorization are documented in [161]. Here we focus mainly on the newest work on PLUS factorization optimization and its applications in image and video compression.

2.2 Stable PLUS Factorization Algorithms

We exhibit a PLUS factorization algorithm with partial pivoting in Algorithm 1, which selects the two pivots of largest magnitude in A(i : n, n) and A(i : n, i).

Algorithm 1 PLUS factorization with partial pivoting
For a nonsingular n-by-n matrix A, with the first n − 1 diagonal entries of U given in the vector u, the PLUS factorization algorithm with partial pivoting is:

P = 1 : n
for i = 1 : (n − 1) do
    Determine µ1 with i ≤ µ1 ≤ n, so that |A(µ1, n)| = ‖A(i : n, n)‖∞
    P(i) ↔ P(µ1);  A(i, 1 : n) ↔ A(µ1, 1 : n)
    s(i) = (A(i, i) − u(i)) / A(i, n)
    A(1 : n, i) = A(1 : n, i) − s(i) · A(1 : n, n)
    Determine µ2 with i ≤ µ2 ≤ n, so that |A(µ2, i)| = ‖A(i : n, i)‖∞
    P(i) ↔ P(µ2);  A(i, 1 : n) ↔ A(µ2, 1 : n)
    k = (i + 1) : n
    A(k, i) = A(k, i) / A(i, i)
    A(k, k) = A(k, k) − kron(A(k, i), A(i, k))
end for
L = I + strict_lower_tri(A)
U = upper_tri(A)
S = I + [zeros(n − 1, 1); 1] · [s, 0]
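For concreteness, the following NumPy sketch (our own illustration, not the dissertation's reference code) implements the same steps. In the usage check, u = ones(n − 1) is an arbitrary prescription of the first n − 1 diagonal entries of U, and A[P] denotes A with its rows permuted by P.

import numpy as np

def plus_partial_pivoting(A, u):
    # PLUS factorization with partial pivoting (a sketch of Algorithm 1).
    # Returns P, L, U, S with A[P] ~= L @ U @ S, where the first n-1
    # diagonal entries of U equal the prescribed vector u.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P = np.arange(n)
    s = np.zeros(n - 1)
    for i in range(n - 1):
        # row pivot on the last column, so A(i, n) is well away from zero
        m1 = i + np.argmax(np.abs(A[i:, -1]))
        P[[i, m1]], A[[i, m1]] = P[[m1, i]], A[[m1, i]]
        s[i] = (A[i, i] - u[i]) / A[i, -1]
        A[:, i] -= s[i] * A[:, -1]            # post-multiply by S_i^{-1}
        # standard partial pivot on column i for the LU stage
        m2 = i + np.argmax(np.abs(A[i:, i]))
        P[[i, m2]], A[[i, m2]] = P[[m2, i]], A[[m2, i]]
        A[i + 1:, i] /= A[i, i]
        A[i + 1:, i + 1:] -= np.outer(A[i + 1:, i], A[i, i + 1:])
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    S = np.eye(n)
    S[-1, :-1] = s                            # S = I + e_n [s, 0]
    return P, L, U, S

A = np.array([[4., 3, 2, 0], [3, 4, 3, 2], [2, 3, 4, 3], [1, 2, 3, 4]])
P, L, U, S = plus_partial_pivoting(A, u=np.ones(3))
assert np.allclose(A[P], L @ U @ S)           # reconstruction check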

PLUS factorization with complete pivoting is equivalent to applying both left and right permutation matrices to A in each iteration. Not only is Algorithm 2 stable, with no zero divisors, it also avoids subtraction between two very close numbers when calculating s(i).

The column pivots must avoid the elements in the last column, to guarantee that Si PR(i+1) = PR(i+1) Si. Therefore, PL A PR1 S1 PR2 S2 ··· PR(n−1) S(n−1) = PL A PR S, where PR = PR1 PR2 ··· PR(n−1) and S = S1 S2 ··· S(n−1); here PL is the left permutation matrix, PR is the right permutation matrix, and PRi is the right permutation matrix of the i-th iteration.

Algorithm 2 PLUS factorization with complete pivoting

PL = 1 : n
PR = 1 : n
for i = 1 : (n − 1) do
    Determine µ with i ≤ µ ≤ n, so that |A(µ, n)| = ‖A(i : n, n)‖∞
    Determine λ with i ≤ λ ≤ n − 1, so that |A(µ, λ)| = max{|A(µ, m) − u(i)|, m = i : (n − 1)}
    PL(i) ↔ PL(µ);  PR(i) ↔ PR(λ)
    A(i, 1 : n) ↔ A(µ, 1 : n);  A(1 : n, i) ↔ A(1 : n, λ)
    if |A(i, n)| > ε then
        s(i) = (A(i, i) − u(i)) / A(i, n)
        A(1 : n, i) = A(1 : n, i) − s(i) · A(1 : n, n)
        Determine ν with i ≤ ν ≤ n, so that |A(ν, i)| = ‖A(i : n, i)‖∞
        PL(i) ↔ PL(ν);  A(i, 1 : n) ↔ A(ν, 1 : n)
        k = (i + 1) : n
        A(k, i) = A(k, i) / A(i, i)
        A(k, k) = A(k, k) − kron(A(k, i), A(i, k))
    end if
end for
L = I + strict_lower_tri(A)
U = upper_tri(A)
S = I + [zeros(n − 1, 1); 1] · [s, 0]

2.3 PLUS Factorization Optimization

For better performance in applications, optimization of PLUS factorization aims to minimize the transform error, which has three main origins. The first is the precision limitation of computers. The second results from the rounding operations used for integer reversible transforms. The third comes from the possible quantization in compression. The error is further propagated and amplified by subsequent multiplications with factor matrices. The transform error is the difference between the coefficients produced by the factor matrices of PLUS factorization and those produced by the original transform. It is well known that traditional linear transforms such as DCT and DWT possess many good properties, like orthogonality and high de-correlation and energy-concentration ability, for effective image coding. Therefore, keeping the same merits as the original transform matrices is an important concern when using PLUS factorization as a tool for realizing integer reversible versions of the traditional transforms; the least transform error is desired. This optimization problem of PLUS factorization is denoted PLUS FOP in the following discussion.

PLUS factorization is really diversified, even with a given pattern of S and 1 as the diagonal entries of U. For an n-by-n matrix A, A = PL L U S PR, there are up to n! possible left permutation matrices PL, n! possible right permutation matrices PR and 2^(n−1) possible combinations of the first n − 1 diagonal entries of U. This means there are n! × n! × 2^(n−1) possible PLUS factorizations, and finding the optimal one is an NP-hard problem. Thus, enumerating all possible solutions to find the best is out of the question for high-order matrices, but the enumeration results for low-order matrices can be used as a ground truth for comparison between algorithms. To solve PLUS FOP, we present our error analysis of PLUS factorization, propose the E2 error metric, and design an optimization algorithm with Tabu Search to find the factorizations with the least transform error. The optimal PLUS factorization with the least transform error can help improve the performance of various systems, e.g., lossless/lossy image coding.

2.3.1 Transform Error Analysis

The transform error after the transformation steps with the factor matrices of PLUS is actually a mixture of direct round-off error and indirect error propagated and accumulated from the round-off error of the previous steps. For A = PL L U S PR, the transform error can be formulated as:

e = PL(eL + L(eU + U(eS + S PR e0)))   (2–3)

where eL, eU and eS are the round-off error vectors directly introduced by the transformations with L, U and S, respectively, and e0 is the system error vector before transformation. Compared with the transform error using the original matrix A, e = eA + A e0, the term e0 can be disregarded, and the following error model is considered for PLUS factorization:

e = PL(eL + L(eU + U eS))   (2–4)

where eL = [0, 1, 1, ···, 1]^T, eU = [1, 1, ···, 1, 0]^T and eS = [0, 0, ···, 0, 1]^T are the upper bounds on the magnitudes of the error vectors when the floor() operator is used. e can be evaluated by its norm as follows:

‖e‖ = ‖PL(eL + L(eU + U eS))‖
    = ‖eL + L eU + L U eS‖   (2–5)
    ≤ ‖eL‖ + ‖L eU‖ + ‖L U eS‖

In our random numerical experiments, this upper error bound can sometimes be reached. Therefore, in the integer transform domain, an error metric of PLUS factorization can be defined from the error bound in Equation (2–5):

Ep(LUS) = ‖eL‖p + ‖L eU‖p + ‖L U eS‖p   (2–6)

where ‖·‖p is the p-norm operator.
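This bound is cheap to evaluate; the following NumPy sketch (function name ours) assumes the floor() rounding convention stated above:

import numpy as np

def plus_error_metric(L, U, p=2):
    # E_p(LUS) of Eq. (2-6): upper bound on the integer-transform error
    n = L.shape[0]
    eL = np.ones(n); eL[0] = 0          # e_L = [0, 1, ..., 1]^T
    eU = np.ones(n); eU[-1] = 0         # e_U = [1, ..., 1, 0]^T
    eS = np.zeros(n); eS[-1] = 1        # e_S = [0, ..., 0, 1]^T
    return (np.linalg.norm(eL, p) +
            np.linalg.norm(L @ eU, p) +
            np.linalg.norm(L @ U @ eS, p))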

2.3.2 Statement of Optimization Problem

Taking the error metric defined in Equation (2–6) as the objective function, the optimization of PLUS factorization for any nonsingular matrix A is formulated as:

min over (PL, PR, u) of Ep(LUS)   s.t.   A = PL L U S PR   (2–7)

where u is the vector composed of the first n − 1 diagonal elements of U.

For the vector norm in Equation (2–6), we test L1, L2 and L∞ in our experiments, and use E1, E2 and E∞ to denote the corresponding error metrics, respectively. Our experimental results in Section 2.5 show that L2, which is a continuous function of vectors, is consistent with the definition of Mean Square Error and thus targets Least Mean Square Error. Thus, the transform error metric is defined as E2.

The global optimal factorization results for the DCT matrices found by exhaustive search are given in Table 2-3, for matrix orders n = 2, 4, 8. It is easy to obtain the optima when n = 2 or 4. But when n = 8, a PC with a 0.7 GHz CPU and 128 MB of memory takes nearly 3 weeks to try all the possibilities, not to mention matrices of higher orders.

The facts found by exhaustive search are: (i) due to the symmetry of the solution space, the optimal solutions are not unique: 4 optimal results for the 2 × 2 DCT, 4 for the 4 × 4 DCT, and 32 for the 8 × 8 DCT are found in our experiments; (ii) many factorization results that approximate the optimum are scattered through the feasible solution space. In our experiments, there are 8 suboptimal factorizations near the optimal one for the 4 × 4 DCT, with 0.08 more transform error than the least transform error of the optimal factorization.

Therefore, this NP-hard nonlinear combinatorial optimization problem can be attacked by heuristic methods, which yield approximately optimal solutions in polynomial time. Such methods include Neural Networks (NN), Simulated Annealing (SA), Genetic Algorithms (GA), Tabu Search (TS) and so on [149]. In this chapter, we use Tabu Search to solve PLUS FOP, since it is a combinatorial optimization problem that can be well modelled in the TS framework and well solved, as shown in the experiments.

2.3.3 Optimization Algorithm with Tabu Search

Tabu search (TS) is a meta-heuristic technique proposed by Glover [59] to solve combinatorial optimization problems, such as vehicle routing and shop scheduling problems [149]. TS avoids being trapped at local minima by allowing temporary acceptance of worse solutions, and avoids cyclically revisiting solutions by keeping track of recent migrations in a tabu list. It consistently outperforms the algorithms of Section 2.2 and provides impressive optimization performance. Based on the essential procedures of the TS algorithm and the characteristics of the possible solutions described in Section 2.3.2, PLUS FOP is a typical problem in the TS framework, and TS works well for it, as verified by the experiments.

Some key terms in the TS algorithm are:

1. Objective function: The objective function E is defined in Equation (2–7) with p = 2.

2. Possible solution set: A possible solution can be represented as a triplet (PL, PR, u). The possible solution set is Ω = {(PL, PR, u)}, and |Ω| = n! × n! × 2^(n−1).

3. Neighbors: For any X ∈ Ω, X = (PL, PR, u), the neighbors of X are defined as B(X) = {(PL′, PR′, u′) | d((PL, PR, u), (PL′, PR′, u′)) = 1}, where d((PL, PR, u), (PL′, PR′, u′)) = d′(PL, PL′) + d′(PR, PR′) + d″(u, u′), with d′(P, P′) = (Σ_{i,j} δ̄(P(i,j), P′(i,j)))/2, d″(u, u′) = Σ_i δ̄(u(i), u′(i)), and δ̄(i, j) = 0 when i = j, δ̄(i, j) = 1 when i ≠ j.

4. Candidate list: For X ∈ Ω, C(X) ⊆ B(X) is composed of the top k candidates among the neighbors of the current point with the smallest E. The size of the candidate list, |C(X)| = k, is a tunable parameter.

5. Tabu tenets, tabu list and tabu tenure: Tabu tenets are implemented using a tabu list and a tabu tenure. The tabu list records the recent migrations of the current point and forbids revisiting these points within the tabu tenure.

6. Aspiration criteria: Criterion 1: if E(Xnext) < Emin, then the next candidate Xnext is set as the current candidate Xcurrent, even if Xnext is in the tabu list. Criterion 2: if the candidates in the candidate list are all tabu-active, then the best one among them is set as Xcurrent.

7. Termination criteria: Criterion 1: if the number of iterations exceeds the maximum iteration limit, the program stops. Criterion 2: if the improvement ΔE of Emin is less than the improvement threshold ε, the program stops.

Our iterative TS algorithm is summarized as follows:

Algorithm 3 The optimization of PLUS factorization

Initialization: randomly choose an initial point X in Ω; set Xmin = X, Emin = E(X) and i = 0.
while no termination criterion is met do
    i = i + 1
    Find B(X) and calculate E(X′) for every X′ ∈ B(X).
    Construct C(X) and find Xnext.
    if Xnext satisfies the aspiration criteria then
        X = Xnext
        if E(X) < Emin then Xmin = X, Emin = E(X) end if
    else if Xnext is not tabu-active then
        X = Xnext
        if E(X) < Emin then Xmin = X, Emin = E(X) end if
    end if
    Update the tabu list.
end while

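As a concrete illustration, a minimal Python sketch of Algorithm 3 follows. It is our own rendition under stated assumptions: E is a callable that, for a fixed triplet X = (PL, PR, u), performs the corresponding PLUS factorization of A and returns E2; the neighbor set follows the distance-1 definition of term 3 above; and the improvement-threshold stop (termination criterion 2) is omitted for brevity.

import itertools

def neighbors(X):
    # all points at distance 1: one transposition in P_L or P_R,
    # or one sign flip in u (term 3 of the TS key terms)
    pl, pr, u = X
    for i, j in itertools.combinations(range(len(pl)), 2):
        q = list(pl); q[i], q[j] = q[j], q[i]
        yield (tuple(q), pr, u)
        q = list(pr); q[i], q[j] = q[j], q[i]
        yield (pl, tuple(q), u)
    for i in range(len(u)):
        v = list(u); v[i] = -v[i]
        yield (pl, pr, tuple(v))

def tabu_search(E, X0, k=8, tenure=10, max_iter=200):
    # E: objective (assumed to factorize A for fixed X and return E2)
    X, Xmin, Emin = X0, X0, E(X0)
    tabu = {}                                   # point -> expiry iteration
    for it in range(max_iter):
        cand = sorted(neighbors(X), key=E)[:k]  # candidate list C(X)
        for Xn in cand:
            # aspiration criterion 1, or not tabu-active
            if E(Xn) < Emin or tabu.get(Xn, -1) <= it:
                break
        else:
            Xn = cand[0]                        # aspiration criterion 2
        tabu[X] = it + tenure                   # forbid revisiting X
        X = Xn
        if E(X) < Emin:
            Xmin, Emin = X, E(X)
    return Xmin, Emin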
2.4 Lossless Transform for Lossy/Lossless Image Compression

In this section, we apply optimal PLUS factorization to Discrete Cosine Transform (DCT) [114] and Lapped Biorthogonal Transform (LBT) [94, 95] based image coding. There are many integer implementations of DCT [31, 85, 110] and LBT [34, 143, 145] in the literature. Representative ones are Plonka's integer DCT [110], based on expansion factors, and Tu et al.'s method in the JPEG-XR standard.

In our application, DCT and LBT are first factorized by optimal PLUS factorization.

Then, for the test images, integer pixel values are transformed into integer coefficients with the integer transforms under study. The integer transform implemented with PLUS factorization is illustrated in Figure 2-1. The transform implemented with our method can be computed in place, with top-down computation of U and bottom-up computation of L. Furthermore, the coefficients are entropy coded for lossless compression, or quantized and coded for lossy compression.

For lossless compression, we evaluate the performance of the integer transforms in terms of the entropy H(C) of the coefficients, which corresponds to the coding data rate in bits per pixel (bpp). (Note that the distortion caused by lossless coding is zero.) H(C) is the average entropy of the transform coefficients over all subbands:

H(C) = (1/s) Σ_{i=1}^{s} H(Ci)   (2–8)

where H(Ci) is the entropy of the transform coefficients of the i-th subband, and s is the total number of subbands. For an n-point two-dimensional DCT, s = n × n. Here we use the entropy of the transform coefficients instead of implementing an entropy coding scheme (e.g., Huffman or arithmetic coding) for two reasons. First, this chapter focuses on matrix factorization for integer transform implementation; hence we should compare the decorrelation performance of the resulting integer transforms, which is usually characterized by the average entropy defined in Eq. (2–8). Second, entropy coding for specific integer transforms is out of the scope of this chapter; we leave this for future study.
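For reference, Eq. (2–8) can be computed from the integer coefficient arrays as follows (a NumPy sketch; function names ours):

import numpy as np

def subband_entropy(coeffs):
    # empirical entropy (bits/sample) of one subband's integer coefficients
    _, counts = np.unique(np.asarray(coeffs).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def average_entropy(subbands):
    # H(C) of Eq. (2-8): mean entropy over the s subbands
    return sum(subband_entropy(c) for c in subbands) / len(subbands)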

For lossy compression, to compare the performance of the integer transforms, we use bpp vs. Peak Signal-to-Noise Ratio (PSNR), the Structural SIMilarity (SSIM) index, and the quality map proposed by Wang [150]. In the lossy encoder that we implement, an input image is first transformed by an integer transform; the integer transform coefficients are then uniformly quantized; the quantized coefficients are coded with fixed-length coding instead of variable-length (entropy) coding, since entropy coding is out of the scope of this chapter. Finally, the decoder reconstructs the image.

2.5 Experimental Results

2.5.1 Examples of the Stable PLUS Factorization Algorithms

Example. Let

A = [4 3 2 0; 3 4 3 2; 2 3 4 3; 1 2 3 4]

The original general PLUS factorization algorithm stops in the first iteration because a14 = 0. With partial pivoting, the result of Algorithm 1 is A = P · L · U · S with

P = [0 1 0 0; 0 0 1 0; 0 0 0 1; 1 0 0 0],
L = [1 0 0 0; 4 1 0 0; 3 −0.5 1 0; 2 −0.25 1.5 1],
U = [1 1 0.33 4; 0 −1 0.67 −16; 0 0 1 −18; 0 0 0 18],
S = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0.25 0.67 1].

With complete pivoting, the result of Algorithm 2 is A = PL · L · U · S · PR with

      0 1 0 0  1 0 0 0 1 2.5 1.97 4              0 0 0 1  2 1 0 0 0 −1 −0.94 −8    ·   ·   A =       0 0 1 0 2.5 3.13 1 0 0 0 1 18        1 0 0 0 2 1.25 1.22 1 0 0 0 −18

45      1 0 0 0 0 0 0 1          0 1 0 0 1 0 0 0 ·  ·        0 0 1 0 0 1 0 0     0.5 −0.38 0.007 1 0 0 0 1

Remark: The difference between the two factorizations above is that the latter gives lower integer transform error, since complete pivoting results in factor-matrix elements of smaller magnitude: E2 is 17.03 for the first factorization and 15.68 for the second.

2.5.2 Experiments for PLUS Factorization Optimization

DCT matrices are used in the experiments to exemplify the effectiveness of PLUS factorization optimization, due to the popularity of the DCT in image and video coding. For each optimal PLUS factorization, 10000 randomly generated matrices [103] are tested. The elements of the randomly generated matrices lie in [0, 255], simulating blocks of gray images. They are tested for the average transform error with the Overall Mean Square Error (OMSE) and the Overall Mean Error (OME):

OMSE = ( Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{10000} e_k²(i,j) ) / (n × n × 10000)   (2–9)

OME = ( Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{10000} |e_k(i,j)| ) / (n × n × 10000)   (2–10)

where e_k(i,j) = x̂_k(i,j) − x_k(i,j), x_k(i,j) are the coefficients of the k-th randomly generated matrix after the DCT, and x̂_k(i,j) are the coefficients after the integer transform with the PLUS factor matrices of the DCT matrices.
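Given the stacked error arrays e_k(i,j), both statistics reduce to simple means; a minimal sketch (ours):

import numpy as np

def omse_ome(errors):
    # errors: shape (K, n, n), with errors[k] = xhat_k - x_k
    return float(np.mean(errors ** 2)), float(np.mean(np.abs(errors)))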

The results in Table 2-1 and Table 2-2 reveal that:

• OMSE and OME of the global optimal PLUS factorizations found with error metrics defined using L2 are less than those defined using L1 or L∞.

• For optimal PLUS factorizations with different n, small E(LUS) is related to small OMSE and OME, and E(LUS) increases slowly with the orders of matrices.

• For different PLUS factorizations with the same n, small E2(LUS) is related to small OMSE and OME.

Based on the results in Table 2-4, Figure 2-2 and Figure 2-3, some remarks are given as follows:

• For n = 4, when the solution space of optimization is relatively small, our algorithm can always find the optima.

• For n = 8, when the solution space of optimization is expanded, our algorithm can find the global optima and other sub-optimal solutions. The transform errors of sub-optimal solutions are very close to that of the global optima.

• For n = 16, when the solution space of optimization is very large, our algorithm can find the sub-optimal solutions with little fluctuation. The sub-optimal solutions found with our fast TS method are all much better than randomly found ones and those found by any PLUS factorization algorithm.

• The convergence speed of our optimization algorithm using Tabu Search is very fast. In contrast to weeks of exhaustive search, each iteration for the factorization of the 8 × 8 DCT matrix costs only 0.2 ms on a PC with a 0.7 GHz CPU and 128 MB of memory, and sub-optimal solutions can be found in only a few iterations.

• When n is small, the PLUS factorizations obtained from the algorithms in Section 2.2 have only slightly larger transform error. Thus, these algorithms are practical in applications when n is small.

2.5.3 Experiments on Applications in Image Coding

2.5.3.1 Integer DCT with optimal PLUS factorization

We apply the optimal PLUS factorization of DCT found by our proposed algorithm to lossless image coding. The transform matrices are 2-point, 3-point and 4-point DCTs.

Their integer reversible implementations by optimal PLUS factorization are denoted 'Opt2', 'Opt3' and 'Opt4', respectively; their integer reversible implementations by expansion factors [110] are denoted 'IntDCT2', 'IntDCT3' and 'IntDCT4'. Table 2-5 shows the entropy of the transform coefficients obtained by our proposed optimal PLUS factorization schemes and by the integer DCT with expansion factors. The entropy obtained by our algorithm is less than that obtained by the integer DCT with expansion factors for all test images. This indicates that the integer DCT with optimal PLUS factorization has a stronger ability to reduce redundancy in images than the integer DCT with expansion factors.

The PLUS factorization 'Opt2' for the 2-point DCT C2^II is C2^II = P · L · U · S with

P = [0 1; 1 0], L = [1 0; 0.4142 1], U = [1 −0.7071; 0 1], S = [1 0; 0.4142 1].

The PLUS factorization 'Opt3' for the 3-point DCT C3^II is C3^II = P · L · U · S with

P = [0 1 0; 1 0 0; 0 0 1],
L = [1 0 0; 0.3382 1 0; 0.2391 −0.5176 1],
U = [1 −0.3660 −0.7071; 0 1 0.8165; 0 0 1],
S = [1 0 0; 0 1 0; 0.4142 −0.5176 1].

The PLUS factorization 'Opt4' for the 4-point DCT C4^II is C4^II = P · L · U · S · PR with

P = [0 0 0 1; 0 0 1 0; 1 0 0 0; 0 1 0 0],
L = [1 0 0 0; 0.3827 1 0 0; −0.9239 −0.6682 1 0; 0 0.3318 0.6934 1],
U = [1 −0.3318 −0.3318 0.5; 0 1 −0.0761 −0.4619; 0 0 1 −0.5; 0 0 0 1],
S = [1 0 0 0; 0 1 0 0; 0 0 1 0; 1 0.3364 −0.3364 1],
PR = [0 0 1 0; 0 0 0 1; 0 1 0 0; 1 0 0 0].

2.5.3.2 Integer lapped biorthogonal transform with optimal PLUS factorization

We also apply optimal PLUS factorization to make the Lapped Transform [93–95] integer reversible. The Photo Core Transform (PCT) and the Photo Overlap Transform (POT) are defined in [145]. We obtain the optimal PLUS factorization for the 4-point POT and the 4-point DCT, and apply it to lossy/lossless image coding; this scheme is denoted 'PLUS 1'. We also apply optimal PLUS factorization to the 4-point POT and the 4-point PCT, which is an approximation of the DCT; this coding scheme is denoted 'PLUS 2'.

The optimal PLUS factorization of the POT is POT = P · L · U · S with

POT = [−0.1448 0.2313 −0.2313 0.9720; 0.2313 0.9720 −0.1448 −0.2313; −0.2313 −0.1448 0.9720 0.2313; 0.9720 −0.2313 0.2313 −0.1448],
P = [0 0 0 1; 0 1 0 0; 0 0 1 0; 1 0 0 0],
L = [1 0 0 0; 0.276 1 0 0; −0.276 −0.173 1 0; −0.333 0.328 −0.085 1],
U = [1 −0.2585 −0.2311 0.1448; 0 1 −0.2089 −0.1913; 0 0 1 0.1583; 0 0 0 1],
S = [1 0 0 0; 0 1 0 0; 0 0 1 0; 1 0.3364 −0.3364 1].

The optimal PLUS factorization of the DCT is C4 = P · L · U · S with

C4 = [0.5 0.5 0.5 0.5; 0.6533 0.2706 −0.2706 −0.6533; 0.5 −0.5 −0.5 0.5; 0.2706 −0.6533 0.6533 −0.2706],
P = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0],
L = [1 0 0 0; 0.2346 1 0 0; 0.4142 −0.7654 1 0; 0.2346 0 −0.6934 1],
U = [1 −0.2929 −0.0137 −0.6533; 0 1 0.3066 0.6533; 0 0 1 0.5; 0 0 0 1],
S = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0.5307 −0.8626 0.3933 1].

The optimal PLUS factorization of the approximation of the DCT (the PCT) is PCT = P · L · U · S with

PCT = [0.5 0.5 0.5 0.5; 0.7071 0 0 −0.7071; 0.5 −0.5 −0.5 0.5; 0 0.7071 −0.7071 0],
P = [0 1 0 0; 1 0 0 0; 0 0 1 0; 0 0 0 1],
L = [1 0 0 0; 0.2929 1 0 0; 0.2929 0 1 0; 0 0.7071 −2.1213 1],
U = [1 −0.5 −1.5 −0.7071; 0 1 2 0.7071; 0 0 1 0.7071; 0 0 0 1],
S = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0.4142 −0.7071 −2.1213 1].

The lifting factorization scheme in JPEG-XR [51] is denoted 'JPEG-XR'. The entropy performance, as well as the lossy performance, of 'PLUS 1' and 'PLUS 2' is better than that of 'JPEG-XR', as shown in Table 2-6, Table 2-7, Figure 2-4, Figure 2-5, Figure 2-6 and Figure 2-7. In Table 2-6, the entropy obtained by 'PLUS 1' is smaller than that obtained by 'PLUS 2' and 'JPEG-XR' for all test images. In Table 2-7, the PSNR and the SSIM index are shown for all test images at bit rates of 4, 2, 1, 0.5, 0.25 and 0.125 bpp; the block size for the SSIM calculation is 4. 'PLUS 1' has better performance, i.e., larger PSNR and SSIM, than 'PLUS 2' and 'JPEG-XR'. For example, for image Lena at 0.25 bpp, 'PLUS 1' achieves a 2.5 dB PSNR gain over 'JPEG-XR'. In addition, the performance of 'PLUS 1' is better than that of 'PLUS 2', which means that the DCT implemented in 'PLUS 1' has higher decorrelation ability than its approximation in 'PLUS 2'. We also compare the subjective performance of 'JPEG-XR', 'PLUS 1' and 'PLUS 2' in terms of the visual quality of the reconstructed images: 'Barbara' in Figure 2-5, 'Lena' in Figure 2-6 and 'Baboon' in Figure 2-7. The 'Barbara', 'Lena' and 'Baboon' reconstructed by 'PLUS 1' have the best visual quality, followed by 'PLUS 2', with 'JPEG-XR' the worst. The quality maps for these reconstructed images are also shown: more contours in the quality maps imply higher quality with larger SSIM indexes, which can be found around the table cloth of 'Barbara', the hat flowers of 'Lena' and the nose of 'Baboon'. This indicates that our proposed 'PLUS 1' and 'PLUS 2' have better objective and subjective performance in lossy image compression than 'JPEG-XR'.

2.6 Summary

In this chapter, we studied integer reversible transforms for lossless image/video compression using PLUS factorization. We proposed stabilized PLUS factorization algorithms and performed perturbation analysis, which proves the numerical stability of PLUS factorization. Furthermore, we optimized PLUS factorization to achieve the least transform error by using a Tabu Search algorithm. We then studied lossless transforms for lossy/lossless image compression: we proposed an integer DCT and an integer Lapped Transform using the optimized PLUS factorization, and compared their lossy/lossless image coding performance with the standards. Experimental results show the superiority of our algorithms over some existing integer DCT algorithms and over the lapped transform factorization in JPEG-XR. The optimal integer reversible transforms with the least entropy remain to be investigated further.

Figure 2-1. Flowchart of 4-point integer DCT implemented with PLUS. ([·] denotes a round-off operation; an edge with a number denotes multiplication; ⊕ denotes addition.)

Figure 2-2. E2 comparison between the factorizations found by three algorithms.

Figure 2-3. Convergence speed of optimization algorithm using TS.


Figure 2-4. Average bpp vs. PSNR with integer transforms for test images.


Figure 2-5. Lossy performance comparison of 'JPEG-XR', 'PLUS 1' and 'PLUS 2' at 0.25 bpp for image Barbara. (A) Original image. (B) 'PLUS 1', PSNR 32.51 dB. (C) 'PLUS 2', PSNR 31.78 dB. (D) 'JPEG-XR', PSNR 30.33 dB. (E) Quality map with 'PLUS 1', SSIM 0.6556. (F) Quality map with 'PLUS 2', SSIM 0.6427. (G) Quality map with 'JPEG-XR', SSIM 0.5835.


Figure 2-6. Lossy performance comparison of 'JPEG-XR', 'PLUS 1' and 'PLUS 2' at 0.25 bpp for image Lena. (A) Original image. (B) 'PLUS 1', PSNR 33.94 dB. (C) 'PLUS 2', PSNR 31.78 dB. (D) 'JPEG-XR', PSNR 33.52 dB. (E) Quality map with 'PLUS 1', SSIM 0.4831. (F) Quality map with 'PLUS 2', SSIM 0.4831. (G) Quality map with 'JPEG-XR', SSIM 0.4734.


Figure 2-7. Lossy performance comparison of 'JPEG-XR', 'PLUS 1' and 'PLUS 2' at 0.25 bpp for image Baboon. (A) Original image. (B) 'PLUS 1', PSNR 29.52 dB. (C) 'PLUS 2', PSNR 29.30 dB. (D) 'JPEG-XR', PSNR 28.85 dB. (E) Quality map with 'PLUS 1', SSIM 0.7765. (F) Quality map with 'PLUS 2', SSIM 0.7669. (G) Quality map with 'JPEG-XR', SSIM 0.7599.

Table 2-1. E(LUS), OMSE and OME of optimal factorizations for DCT matrices with exhaustive search

n    error metric   E(LUS)    OMSE     OME
2    E1             2.1907    0.3821   0.00015
     E2             1.7809    0.1272   0.00011
     E∞             1.0205    0.3000   0.0003
4    E1             3.2134    0.3855   0.00015
     E2             2.8893    0.1485   0.00011
     E∞             1.3205    0.3000   0.0003
8    E1             17.5834   0.4011   0.00016
     E2             4.6766    0.1549   0.00011
     E∞             4.5655    0.3123   0.00049

n is the order of the DCT matrices.

Table 2-2. E2(LUS), OMSE and OME of several PLUS factorizations for DCT matrices

n    E2(LUS)   OMSE     OME
2    8.5437    1.4213   0.02015
     3.0119    0.4728   0.00103
     1.7809    0.1272   0.00011
4    16.7421   6.5122   0.07982
     6.8102    1.3201   0.01971
     2.8893    0.1485   0.00011

n is the order of the DCT matrices.

Table 2-3. Some optimal factorizations for DCT found by exhaustive search

n   PL                 PR                 u                        E2
2   1 2                1 2                1 −1                     1.7809
    2 1                1 2                1 1                      1.7809
4   2 1 4 3            1 2 3 4            1 1 1                    2.8893
    4 3 2 1            2 4 1 3            −1 1 1                   2.8893
8   4 7 1 8 6 5 3 2    3 2 5 4 7 1 8 6    −1 −1 1 −1 1 1 1 1       4.6766
    4 7 1 8 6 5 3 2    3 2 5 4 7 8 1 6    −1 −1 1 −1 1 1 1 −1      4.6766
    2 3 1 4 8 5 7 6    1 4 6 3 5 2 7 8    1 −1 1 −1 1 −1 −1 −1     4.6766
    2 3 1 4 8 5 7 6    1 4 6 3 5 7 2 8    1 −1 1 −1 1 −1 −1 1      4.6766

Table 2-4. Transform error E2 of optimal factorizations found by TS

                   t
n    k     5      10     15     20     Eae     Emse
4    4     2.89   2.89   2.89   2.89
     8     2.89   2.89   2.89   2.89   0       0
     11    2.89   2.89   2.89   2.89
8    4     4.68   4.81   4.82   4.78
     6     4.76   4.76   4.76   4.76   0.08    0.0016
     8     4.76   4.76   4.71   4.76
16   5     8.35   8.14   8.31   7.94
     8     8.19   8.23   8.12   8.23   —       0.011
     11    8.28   8.28   8.28   8.28

k is the size of the candidate list; t is the tabu tenure; Eae = (1/N) Σ_{i=1}^{N} E(i) − Emin, where Emin is the E2 in Table 2-3; Emse = (1/N) Σ_{i=1}^{N} (E(i) − Ē)², where Ē is the average error.

Table 2-5. Entropy comparison of integer DCTs

Image      IntDCT2   Opt2   IntDCT3   Opt3   IntDCT4   Opt4
Barbara    6.94      5.95   8.02      6.65   6.92      5.57
Lena       6.37      5.38   7.43      5.97   6.36      5.03
Boat       6.41      5.42   7.46      6.06   6.39      5.12
Jet        5.89      5.11   6.89      5.72   5.87      4.95
Mandrill   7.59      6.59   8.66      7.00   7.57      6.62
Goldhill   6.69      5.70   7.78      6.23   6.68      5.43
Average    6.65      5.69   7.71      6.27   6.63      5.45

Table 2-6. Entropy comparison among integer lapped biorthogonal transforms implemented by JPEG-XR, PLUS 1 and PLUS 2

Image      JPEG-XR   PLUS 1   PLUS 2
Lena       4.93      4.48     4.61
Baboon     6.34      6.11     6.22
Barbara    5.64      4.98     5.29
Boat       5.10      4.54     4.71
Goldhill   5.27      4.92     5.01
Peppers    5.09      4.76     4.83
Average    5.40      4.97     5.11

Table 2-7. Bpp vs. PSNR comparison of integer lapped biorthogonal transforms

                    JPEG-XR           PLUS 1            PLUS 2
Image     bpp       PSNR    SSIM      PSNR    SSIM      PSNR    SSIM
Lena      4         44.14   0.9103    45.21   0.9286    45.03   0.9261
          2         42.35   0.8769    42.97   0.8939    42.93   0.8930
          1         38.90   0.7840    39.82   0.8052    39.69   0.8035
          0.5       35.16   0.5866    36.95   0.6206    36.69   0.6150
          0.25      31.45   0.4226    33.94   0.4831    33.52   0.4734
          0.125     28.46   0.2865    30.10   0.3568    29.78   0.3484
Barbara   4         44.13   0.9370    45.22   0.9502    45.06   0.9482
          2         42.36   0.9135    42.99   0.9256    42.93   0.9482
          1         38.89   0.8489    39.96   0.8694    39.73   0.8667
          0.5       34.68   0.7167    36.54   0.7635    36.02   0.7550
          0.25      30.34   0.5835    32.51   0.6558    31.78   0.6427
          0.125     26.43   0.4475    28.14   0.5291    27.42   0.5086
Baboon    4         44.14   0.9868    45.44   0.9901    45.30   0.9897
          2         42.39   0.9807    43.13   0.9837    43.06   0.9834
          1         38.85   0.9597    39.28   0.9636    39.21   0.9633
          0.5       33.89   0.8957    34.31   0.9041    34.24   0.9011
          0.25      28.85   0.7599    29.52   0.7765    29.30   0.7669
          0.125     24.46   0.5574    25.39   0.6079    25.06   0.5916
Boat      4         44.09   0.8834    45.17   0.9062    44.97   0.9027
          2         42.31   0.8449    42.92   0.8668    42.91   0.8655
          1         39.02   0.7490    40.22   0.7662    40.05   0.7641
          0.5       35.26   0.5807    37.01   0.6047    36.67   0.6000
          0.25      30.91   0.4488    33.14   0.4995    32.74   0.4875
          0.125     27.40   0.3033    29.12   0.3744    28.85   0.3614
Goldhill  4         44.13   0.9495    45.29   0.9604    45.09   0.9587
          2         42.37   0.9305    43.01   0.9399    42.96   0.9397
          1         38.88   0.8738    39.55   0.8840    39.41   0.8824
          0.5       34.42   0.7399    35.64   0.7673    35.45   0.7614
          0.25      30.38   0.5413    32.11   0.6039    31.86   0.5882
          0.125     27.51   0.3282    28.79   0.4153    28.63   0.4017
Peppers   4         44.15   0.9336    45.36   0.9490    45.17   0.9473
          2         42.35   0.9051    43.05   0.9193    43.00   0.9193
          1         38.87   0.8242    39.49   0.8423    39.33   0.8388
          0.5       34.59   0.6251    35.67   0.6467    35.57   0.6327
          0.25      31.17   0.3909    33.27   0.4249    33.09   0.4182
          0.125     28.57   0.2619    30.19   0.3160    29.99   0.3017

CHAPTER 3
ADAPTIVE QUANTIZATION USING PIECEWISE COMPANDING AND SCALING FOR GAUSSIAN MIXTURES

3.1 Research Background

Quantization is a critical technique for analog-to-digital conversion and signal compression. On one hand, many input signals are continuous analog signals; therefore, quantization is indispensable for analog-to-digital converters (ADCs) [70], which are important components of many digital products. On the other hand, with the exponential growth of computer and Internet usage, countless digital contents, especially digital images and videos, demand signal compression for efficient storage and transmission. Accordingly, quantization provides a means to represent signals efficiently with acceptable fidelity for signal compression.

Existing quantization schemes can be classified into two categories, namely uniform quantization and nonuniform quantization [60, 61]. Uniform quantization is simple, but not optimal in terms of MSE for signals with a nonuniform distribution when more computation and storage are available; nonuniform quantization is much more complex and comes in great variety. Minimum mean squared error (MMSE) quantization (a.k.a. Lloyd-Max quantization) is a major type of nonuniform quantization. It is optimal in the sense of mean squared error (MSE), but incurs high computational complexity. Companding, which consists of a nonlinear transformation and a uniform quantizer, is a technique capable of trading off quantization performance against complexity for nonuniform quantization. Especially for high-rate compression, the performance of companding can approach that of Lloyd-Max quantization asymptotically.

Lloyd-Max quantizers and companders are already well developed for the Gaussian and Laplacian distributions [61, 68, 109], but not for the Gaussian mixture model (GMM). Since a GMM serves as a good approximation of an arbitrary distribution, it is important to develop quantizers and companders for GMMs, which are expected to find wide applications in ADCs and high dynamic range (HDR) image compression, as well as audio [111] and video [152] compression.

To address this, we propose a succinct adaptive quantizer with piecewise companding and scaling for GMMs in this chapter. We first consider a simple GMM (SGMM) that consists of two Gaussian components with means −µ and µ, respectively, and the same variance σ². The proposed quantizers have three modes, making them capable of adapting their reconstruction levels to the varying means and variances of the Gaussian components in a GMM.

Specifically, for SGMMs, if µ is small, our quantizer operates in Mode I and treats the input as if it came from two overlapping Gaussian random variables (r.v.s) rather than a GMM r.v. For Mode I, our quantizer can be implemented by a compander or a scaled Lloyd-Max quantizer of a unit-variance Gaussian. If µ is large, our quantizer operates in Mode III: if the input is negative, it is treated as a Gaussian r.v. with mean −µ; if the input is positive, it is treated as a Gaussian r.v. with mean µ. For Mode III, our quantizer can be implemented by two companders or two scaled Lloyd-Max quantizers, each corresponding to one of the two Gaussian r.v.s. If µ is of medium value, our quantizer operates in Mode II, i.e., with piecewise companding.

Moreover, we propose a reconfigurable architecture to implement our adaptive quantizer in an ADC. The proposed adaptive quantizer is tuned by information from a signal histogram estimator, to optimally quantize signals within the available speed and power of the devices. Furthermore, the proposed quantizer is applied to image quantization and high dynamic range image compression. We design an HDR tone mapping algorithm that jointly uses adaptive quantizers and multiscale techniques. The proposed algorithm therefore mitigates halo artifacts in the resulting low dynamic range image, while keeping the contrast of image details across the largest gamut.

The experimental results show that 1) our proposed quantizer achieves MSE performance close to that of the Lloyd-Max quantizer for GMMs, at much lower cost; and 2) our proposed quantizer achieves much better MSE performance than a uniform quantizer, at a similar cost. The experimental results also show that the proposed adaptive quantizer holds great potential to improve existing ADC and HDR image compression; it works well for both high-rate and low-rate quantization.

The rest of the chapter is organized as follows. Section 3.2 presents the preliminaries of optimal adaptive quantizers. Section 3.3 describes the proposed adaptive quantizer for GMMs. In Section 3.4, we propose a reconfigurable architecture to implement our adaptive quantizer in an ADC. In Section 3.5, the proposed quantizer is applied to high dynamic range image compression. Experimental results are exhibited in Section 3.6. Section 3.7 concludes the chapter.

3.2 Preliminaries

3.2.1 MMSE Quantizer

The performance of a quantizer can be evaluated by the mean square error (MSE) between the input signal X and the reconstructed signal X̂, i.e.,

MSE = E[(X − X̂)²]   (3–1)

The Lloyd-Max quantizer [58] is an MMSE quantizer. Let tk (k = 0, ···, N) denote the boundary points of the quantization intervals, and let rk (k = 0, ···, N − 1) denote the quantization levels. Then the Lloyd-Max quantizer is characterized by:

{tk*, rk*} = arg min over {tk, rk} of MSE = arg min over {tk, rk} of Σ_{k=0}^{N−1} ∫_{tk}^{tk+1} (x − rk)² fX(x) dx   (3–2)

where fX(x) is the probability density function (pdf) of X, and N is the number of quantization levels. Differentiating with respect to tk and rk in Eq. (3–2), we obtain the nearest neighbor and centroid conditions:

tk* = (r*_{k−1} + rk*) / 2,   k = 1, ···, N − 1,   (3–3)

and

rk* = ( ∫_{tk*}^{t*_{k+1}} x p(x) dx ) / ( ∫_{tk*}^{t*_{k+1}} p(x) dx ),   k = 0, ···, N − 1,   (3–4)

where [t0*, tN*] is the range of the quantizer input. The Lloyd-Max quantizer for the Gaussian distribution with zero mean and unit variance has been well studied; given the number of quantization levels N, it can be obtained from the tables in [68]. Given the Lloyd-Max quantizer for the zero-mean, unit-variance Gaussian, we can use the affine law in Proposition 3.1 to obtain the Lloyd-Max quantizer for a Gaussian distribution with arbitrary mean µ and arbitrary variance σ².
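The two conditions can also be alternated on sample data in the usual Lloyd iteration; the sketch below (ours, a sample-based stand-in for the tabulated designs of [68]) estimates an N-level quantizer:

import numpy as np

def lloyd_max(samples, N, iters=100):
    # alternate the nearest-neighbor (3-3) and centroid (3-4) updates
    x = np.sort(np.asarray(samples, dtype=float))
    r = np.quantile(x, (np.arange(N) + 0.5) / N)   # initial levels
    for _ in range(iters):
        t = np.concatenate(([-np.inf], (r[:-1] + r[1:]) / 2, [np.inf]))
        idx = np.searchsorted(t, x, side='right') - 1
        r = np.array([x[idx == k].mean() if np.any(idx == k) else r[k]
                      for k in range(N)])
    return t, r

t, r = lloyd_max(np.random.default_rng(0).standard_normal(100000), N=4)
# r approaches [-1.5104, -0.4528, 0.4528, 1.5104], the tabulated 2-bit design

By Proposition 3.1, the quantizer for N(µ, σ²) then follows by the affine map t̂ = σt + µ, r̂ = σr + µ.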

3.2.2 Gaussian Mixture Model and Affine Law

The Gaussian distribution is widely used in signal modeling because of its simplicity, its ubiquity, and the Central Limit Theorem. However, signals in the real world, such as the pixel intensities of natural images, may have an arbitrary distribution, which can be better approximated by a GMM than by a single Gaussian distribution. The pdf of a GMM r.v. X is given as below:

fX(x) = Σ_{i=1}^{Ng} pi · gi(x)   (3–5)

where Ng is the number of Gaussian components in the GMM; gi(x) is the Gaussian pdf of component i (i = 1, ···, Ng); pi denotes the probability of component i; and Σ_{i=1}^{Ng} pi = 1. In this chapter, we first consider a simple GMM (SGMM) given as below:

fX(x) = (1 / (2√(2π))) (e^{−(x−µ)²/2} + e^{−(x+µ)²/2})   (3–6)
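The SGMM is easy to sample for the experiments reported later; a minimal sketch (ours):

import numpy as np

def sample_sgmm(n, mu, seed=0):
    # equal-weight mixture of N(-mu, 1) and N(+mu, 1), cf. Eq. (3-6)
    rng = np.random.default_rng(seed)
    signs = rng.choice((-1.0, 1.0), size=n)
    return signs * mu + rng.standard_normal(n)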

Given a suboptimal quantizer for the SGMM, we can use the affine law in Proposition 3.1 to obtain a suboptimal quantizer for a GMM that consists of two Gaussian components with arbitrary means −µ and µ (µ > 0) and the same variance σ² (σ² > 0). It can also be used to obtain a suboptimal quantizer for a GMM with an arbitrary number of components.

Proposition 3.1 (Affine Law). For a r.v. X with zero mean and unit variance, assume that its N-level Lloyd-Max quantizer is specified by tk (k = 0, ···, N) and rk (k = 0, ···, N − 1). Then for the r.v. Y = σX + µ, with mean µ and standard deviation σ, the Lloyd-Max quantizer is specified by t̂k = σtk + µ (k = 0, ···, N) and r̂k = σrk + µ (k = 0, ···, N − 1).

For a proof of Proposition 3.1, see Appendix B.1.

3.2.3 MMSE Compander

A compander consists of a compressor, a uniform quantizer, and an expandor; the compressor performs a nonlinear transformation, and the expandor is the inverse of the compressor. The compressor is intended to convert an input r.v. of arbitrary distribution into a uniformly distributed r.v., so that we can use a simple uniform quantizer, which is the optimal MMSE quantizer for the one-dimensional uniform distribution. Proposition 3.2 gives the nonlinear transformation of a (suboptimal) MMSE compander for any distribution.

Proposition 3.2. Assume that a r.v. X has Cumulative Distribution Function (CDF) FX(x) (x ∈ R). Then the r.v. Y = FX(X) is uniformly distributed in [0, 1]; and the compander with compressor Y = FX(X) is an optimal/suboptimal MMSE quantizer of X, especially when X is quantized at high rate.

For a proof of Proposition 3.2, see Appendix B.2.

For the Gaussian distribution with zero mean and unit variance, an MMSE compressor performs the transformation 1 − Q(X), where

Q(X) = (1/√(2π)) ∫_X^∞ exp(−u²/2) du.   (3–7)

Since the integral in Q(X) has high computational complexity, in this chapter we propose a simple compressor, which only needs the computation of piecewise monomials (see Section 3.3.4).
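For illustration, an N-level compander for N(0,1) built from Proposition 3.2 can be sketched as follows (using SciPy's normal CDF and its inverse; the structure is ours):

import numpy as np
from scipy.stats import norm

def compand_quantize(x, N):
    # compressor Y = F_X(X), N-level uniform quantizer, then expandor
    y = norm.cdf(x)                              # uniform on [0, 1]
    cell = np.clip(np.floor(y * N), 0, N - 1)    # uniform quantization
    return norm.ppf((cell + 0.5) / N)            # midpoint, inverse CDF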

3.3 Adaptive Quantizer for Gaussian Mixture Models

In this section, we first present our adaptive quantizer for the SGMM in Eq. (3–6), and then extend it to a more complicated GMM with arbitrary µ and σ² and an arbitrary number of components, by using Proposition 3.1.

3.3.1 Design Methodology

Because Proposition 3.2 states that the compander with compressor Y = FX(X) is an MMSE quantizer of the input X, our design methodology is to find a compressor whose transformation function is simple but achieves a good approximation of the CDF FX(X). A robust quantizer [75] is then obtained through the determination of the required parameters.

Figure 3-1 shows the CDF of the Gaussian N(0,1) vs. that of the SGMM with µ = 0.5; we can observe that they are similar. Figure 3-2 shows the transformation function of the piecewise compressor specified by Eq. (3–10) vs. the CDF of the SGMM with µ = 1.5; the two are likewise similar. Figure 3-3 shows the catenated CDFs of two Gaussians, given by Eq. (3–15), vs. the CDF of the SGMM with µ = 3; the catenated CDF is similar to the CDF of the SGMM. For this reason, our proposed adaptive quantizer operates under three modes, which correspond to small µ, medium-valued µ, and large µ, respectively.

3.3.2 Three Modes

Let qg(X) denote the Lloyd-Max quantization function for a Gaussian r.v. X ∼ N(0,1). Our proposed adaptive quantizer operates in one of the following three modes, depending on the value of µ.

1. If 0 ≤ µ < µS, the quantizer operates in Mode I: the quantizer can be an MMSE compander for the Gaussian N(0,1), or the Lloyd-Max quantizer for N(0,1). Denote the quantization function in Mode I by qI(X). We use the Lloyd-Max quantizer for N(0,1) to implement Mode I, i.e.,

   qI(X) = qg(X).   (3–8)

   The motivation for Mode I is that the CDF of N(0,1) is similar to the CDF of the SGMM with small µ, as shown in Figure 3-1.

2. If µS ≤ µ < µL, the quantizer operates in Mode II: the quantizer is a compander with the piecewise compressor specified by Eq. (3–10). The motivation for Mode II is that the transformation function of this compressor is similar to the CDF of the SGMM with medium-valued µ, as shown in Figure 3-2.

3. If µ ≥ µL, the quantizer operates in Mode III: the quantizer can be two catenated MMSE companders for two Gaussians, or two catenated Lloyd-Max quantizers for two Gaussians. Denote the quantization function in Mode III by qIII(X). We choose the catenated Lloyd-Max quantizer to implement Mode III as follows:

   qIII(X) = qg(X − µ),  X ≥ 0
   qIII(X) = qg(X + µ),  X < 0   (3–9)

   The motivation for Mode III is that two catenated Gaussian CDFs are similar to the CDF of the SGMM with large µ, as shown in Figure 3-3. A code sketch of the combined mode selection is given after this list.
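Putting the three modes together, the dispatch can be sketched as below (our own illustration; q_gauss stands for a Lloyd-Max quantizer for N(0,1), mode2_compand for the Eq. (3–10) compander, and the ±µ shift-back in Mode III restores the component mean, matching the reproduction values listed in Section 3.6.1):

def adaptive_quantize(x, mu, q_gauss, mode2_compand, mu_S=1.0, mu_L=3.0):
    if mu < mu_S:                       # Mode I: single N(0,1), Eq. (3-8)
        return q_gauss(x)
    if mu < mu_L:                       # Mode II: piecewise companding
        return mode2_compand(x, mu)
    if x >= 0:                          # Mode III: catenated, Eq. (3-9)
        return q_gauss(x - mu) + mu
    return q_gauss(x + mu) - mu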

3.3.3 Parameter Determination

In this section, the values of µS and µL are determined. By the well-known 3-sigma rule, nearly all (99.7%) of the values of a Gaussian distribution lie within 3 standard deviations of the mean. Therefore, if µ ≥ 3, the two Gaussian components of the SGMM can be dealt with separately, as in Mode III. When µ < σ, the data of the right Gaussian component of the SGMM in [µ − σ, µ + σ] always fall in [−µ − 3σ, −µ + 3σ], the 3-standard-deviation interval around the mean of the left Gaussian component, and vice versa. Therefore, for σ = 1, when 0 ≤ µ < 1, we treat the data of the SGMM as in Mode I. In conclusion, for the proposed quantizer, µS = 1 and µL = 3.

3.3.4 Piecewise Companding of Mode II

For Mode II, we choose the monomial f(x) = ax^b to piecewise-approximate the ideal compressor of the SGMM, i.e., the CDF of the SGMM. There are more accurate and more complicated approximating functions, like the sum of monomials f(x) = Σ_i ai x^{bi} (i > 1), the sigmoid function f(x) = 1/(1 + e^{−x}), and f(x) = arctan(x), but their corresponding expandors, i.e., the inverses of the compressors, are hard to obtain or computationally expensive. In contrast, f(x) = ax^b has a simple inverse and is a good approximation to the segments of the CDF of the SGMM. The piecewise compressor, antisymmetric about the origin, is described by Eq. (3–10).

f(x) = a (x + µ)^b + 0.25,        x ≤ −µ       (3–10a)
f(x) = a′ (x + µ)^{b′} + 0.25,    −µ < x ≤ 0   (3–10b)
f(x) = −a′ (µ − x)^{b′} + 0.75,   0 < x ≤ µ    (3–10c)
f(x) = a (x − µ)^b + 0.75,        x > µ        (3–10d)

with

{a, a′, b, b′} = arg min over {a, a′, b, b′} of ∫_1^3 ( ∫_{−∞}^{∞} (F_SGMM(x, µ) − f(x, µ))² dx ) dµ   (3–11)

By the steepest descent method, we obtain b = 1/3, b′ = 1/2, a = 0.15 and a′ = 0.125 (which can be realized by right-shifting 3 bits) for simplicity and fast computation.

The compressor is shown in Figure 3-2 for µ = 1.5. When x < −µ or x > µ, where the PDF decays faster, we use f(x) = a x^{1/3}; when −µ < x < µ, where the PDF decays more slowly, we use f(x) = a′ x^{1/2}. As a result, data with small probability are compressed more, and data with large probability are compressed less. This is more precise than the piecewise linear compander [75], and still simple.
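A direct implementation of Eq. (3–10) takes only a few lines; the sketch below (ours) uses np.cbrt so that the cube root stays real on the negative branch:

import numpy as np

def mode2_compressor(x, mu, a=0.15, ap=0.125):
    # piecewise compressor of Eq. (3-10) with b = 1/3, b' = 1/2
    if x <= -mu:
        return a * np.cbrt(x + mu) + 0.25    # fast-decaying tail
    if x <= 0:
        return ap * np.sqrt(x + mu) + 0.25   # slow-decaying middle
    if x <= mu:
        return -ap * np.sqrt(mu - x) + 0.75
    return a * np.cbrt(x - mu) + 0.75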

Although there are more accurate compressors approximating the CDF for a particular µ, they may not approximate the CDFs for other µ ∈ [1, 3) well on average. The proposed compressor is a good tradeoff between accuracy and generalizability: it provides stable, good performance for µ ∈ [1, 3), as shown by the experiments in Section 3.6, and it is robust. Therefore, the proposed compander has three advantages:

1. It is easy to design the compander by Eq. (3–10);

2. It is fast to quantize data with this compander;

3. It has good average MSE performance for µ ∈ [1, 3).

3.3.5 Adaptive Quantizer for A General GMM

In this section, we design the adaptive quantizer for a general GMM based on the adaptive quantizer for the SGMM.

3.3.5.1 GMM estimation by EM

The GMM (Gaussian Mixture Model) is a probability distribution model consisting of a finite number of Gaussian components, as shown in Eq. (3–5). The Expectation-Maximization (EM) algorithm [15] is a general method to find the maximum likelihood estimate of a GMM.

The EM algorithm can efficiently estimate the components of a GMM [157], as shown in Figure 3-5. The number of components should be assigned to the EM algorithm by experience, restricted by the available computational resources and by N, the number of reconstruction levels of the quantizer; Ng could be N/5 or smaller. The µi, σi and pi (i = 1, ···, Ng) of each Gaussian component in Eq. (3–5) are determined by the EM algorithm. The GMM estimate of the signal is obtained once, for all subsequent quantization.
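In practice, the estimation step can be delegated to an off-the-shelf EM implementation, e.g. scikit-learn (a usage sketch, not the dissertation's own tool; the function name is ours):

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(x, n_components):
    # EM estimate of (p_i, mu_i, sigma_i) for a 1-D signal x
    gmm = GaussianMixture(n_components=n_components).fit(
        np.asarray(x).reshape(-1, 1))
    return (gmm.weights_,
            gmm.means_.ravel(),
            np.sqrt(gmm.covariances_.ravel()))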

3.3.5.2 Generalization to GMM

For the general GMM in Eq. (3–5), with the affine law in Proposition 3.1, the following generalizations are made from the SGMM by processing neighboring pairwise Gaussian components of the GMM. Assuming the Gaussian components are sorted by their means µi, for the neighboring components Ci and Ci+1 we consider the support (µi, µi+1) when i ≠ 1, Ng; otherwise we also consider (−∞, µ1) or (µNg, +∞).

1. Allocate the number Ni of the total N reconstruction levels to each Gaussian component according to its percentage pi:

   Ni = [N · pi]

   where [·] is the round-off operator. Each set of Ni levels is located symmetrically with respect to the mean of the corresponding Gaussian component.

2. Origin shift: for any two adjacent Gaussian components with means and variances (µi, σi²) and (µi+1, σi+1²), the point where the two components contribute about equally is around

   xo = (σi µi + σi+1 µi+1) / (σi + σi+1)

   (the effect of pi is omitted). Then we shift the origin to xo.

3. The three-mode boundaries µS and µL are scaled by (σi + σi+1).

4. Scale the reconstruction levels according to the variance: for the Gaussian component i with (µi, σi), scale the reconstruction levels obtained from the SGMM by σi.

5. Tune Mode II: since each time only the half support (µi, µi+1) of the Gaussian components is considered, the compressor branches of Eqs. (3–10b) and (3–10c) are needed, and should be scaled by pi as:

   f(x) = pi (a′(x + µ)^{b′} + 0.25),   −µ < x ≤ 0
   f(x) = pi (−a′(µ − x)^{b′} + 0.75),  0 < x ≤ µ   (3–12)

In this way, the adaptive quantizer for a GMM is determined.

3.4 Reconfigurable A/D converter with Adaptive Quantizer

With the proliferation of autonomous sensors and digital devices, there has been an increasing demand for reconfigurable analog-to-digital converters (ADCs) [62], where the proposed adaptive quantizer can have important applications.

We propose a reconfigurable A/D converter adaptive to the distribution of the input signals, built around the proposed quantizer, as shown in Figure 3-4. The input signal, of arbitrary distribution, is first quickly sampled and discretized with a uniform quantizer to estimate its distribution. This information is fed back to the proposed adaptive quantizer for mode selection. The adaptive quantizer can then produce a more accurate discrete signal by capturing the signal characteristics as much as possible with the appropriate mode. The residual signal can also be iteratively fed back to the adaptive quantizer to minimize the quantization error. The FPGA implementation of the adaptive quantizer can be reconfigured in Tq milliseconds, where Tq < 10; the system can thus be updated at the beginning of every cycle of Tq milliseconds, according to the distribution of the input signal. The number of quantization levels can be adjusted according to the speed, resolution and power consumption of the devices. Our scheme, based on histogram estimation and GMM modeling, may outperform previous iterative DPCM schemes [36].

The reconfigurable ADC architecture in Figure 3-4 can dynamically adjust the quantization speed, resolution and power consumption to match the input data characteristics. Therefore, it will have wide applications in many ADCs and sensor applications.

3.5 High Dynamic Range Image Compression with Joint Adaptive Quantizer and Multiscale Techniques

High dynamic range imaging (HDRI, or just HDR) is one of the frontier techniques in image processing, computer graphics and photography [46, 47, 49], where image pixels take floating-point values in the range [0, 1] rather than the traditional 8 bits per pixel for gray images and 24 bits per pixel for RGB images. HDRI tries to capture the dynamic range of natural scenes, which can exceed that of display devices by three orders of magnitude. The dynamic range of natural scenes can be captured by human eyes, many films, and new camera sensors, whereas display devices, such as CRTs, LCDs, and print materials, are restricted to a low dynamic range. Therefore, compressing the high dynamic range of HDRI to the low dynamic range of display devices, while keeping the vivid colors and rich details of the original images as much as possible, is receiving more and more attention. This is called tone mapping, an important component of the HDR imaging pipeline, widely used in virtual reality, video advertising, visual simulation, remote sensing, aerospace, medical imaging and many other fields [45].

Tone mapping techniques can be divided into two categories: tone reproduction curves (TRCs) and tone reproduction operators (TROs). They can be applied to images both globally and locally. TRCs use a compressive pointwise nonlinearity, such as a power function f(·), to shrink high dynamic range images into low dynamic range images. K. Chiu et al. proposed spatially nonuniform scaling functions for high contrast images [35]. F. Drago et al. used adaptive logarithmic mapping for displaying high contrast scenes [50]. I. R. Khan et al. [77] and A. Boschetti et al. [19] proposed tone mapping algorithms based on histogram equalization. These algorithms are simple and efficient, but the contrast of image details may be visibly lost. TROs adjust pixel intensity using spatial context to preserve local image contrast, usually with multiscale techniques. In an early work, Stockham [132] separated an HDR image H(x, y) into the product of an illumination image I(x, y) and a reflectance image R(x, y). Later, Jobson et al. [69] and Pattanaik et al. [107] improved the multiscale techniques by introducing mechanisms of the human visual system. These multiscale methods suffer from halo artifacts, which occur around sharp edges and are caused by the blurring effect of the filters. The most recent multiscale technique, proposed by Yuanzhen Li [84], uses a symmetric analysis-synthesis filter bank and local gain control of each subband to mitigate the halo artifacts; but the luminance of the resulting low dynamic range images appears low, and the boundary of the dynamic range is clipped, as can be seen from their histograms. To address these problems, we propose a joint TRC and TRO method for high dynamic range image tone mapping, based on Li's method [84] and our proposed adaptive quantizer.

The proposed tone mapping algorithm, using a joint adaptive quantizer and multiscale techniques, is shown in Figure 3-6; it processes the image in both the wavelet domain and the image domain. The HDR image is first decomposed into several subbands with wavelet analysis. For the signal in each subband, we build a gain map and apply it to the subband signal to release the signal from the compressed log domain. Then the subband signals are synthesized back to the image domain. Furthermore, the proposed adaptive quantizer is applied to the full gamut of the reconstructed image, mapping the floating-point pixel values into a user-assigned shrunken dynamic range, such as integers in [0, 255] for ordinary digital images. Our system benefits from the multiscale method, which keeps the contrast of the HDR image as much as possible. By using Li's local gain map [84], our system mitigates halo artifacts. The proposed adaptive quantizer keeps the most gamut information in the largest available low dynamic range. Therefore, our system outperforms Li's algorithm and the histogram equalization based system.

3.6 Experimental Results and Discussion

We compare the proposed quantizer with the actual Lloyd-Max quantizers for SGMM, by comparing the corresponding approximate CDFs of the proposed quantizer with the actual CDFs of SGMM. We also compare the proposed quantizer with the Lloyd-Max quantizers for SGMM and the uniform quantizer in terms of MSE performance.

The proposed adaptive quantizer is described in detail in Section 3.3. The

Lloyd-Max quantizer for SGMM is found numerically by the LBG algorithm [61]. The uniform quantizer we compare with is the optimal uniform quantizer, applied to the finite region containing 99.8% of the probability mass of the GMM distribution.

3.6.1 Example and Justification of Parameter Determination

The reproduction values of the 2-bit Lloyd-Max quantizer for N(0,1) are [−1.5104, −0.4528, 0.4528, 1.5104]. When µ ≥ 3, for the 3-bit quantizer for SGMM, the 8 reproduction values are [−1.5104 − µ, −0.4528 − µ, 0.4528 − µ, 1.5104 − µ, −1.5104 + µ, −0.4528 + µ, 0.4528 + µ, 1.5104 + µ], as in Mode III. When µ < 1, i.e., in Mode I, for the 2-bit quantizer for SGMM, the reproduction values are [−1.5104, −0.4528, 0.4528, 1.5104].

When 1 ≤ µ < 3, i.e., in Mode II, the compander is chosen as shown in Eq. (3–10).

The differences between the reproduction values of the proposed quantizer and those of the Lloyd-Max quantizer for SGMM are evaluated by the average absolute difference (AAD):

AAD = \int_c^d \frac{1}{N} \sum_{k=0}^{N-1} \left| r_k^p(\mu) - r_k^l(\mu) \right| \, d\mu    (3–13)

where r_k^p and r_k^l are the reproduction values of the proposed quantizer and the Lloyd-Max quantizer for SGMM, respectively, µ is the mean in the SGMM, and (c,d) is the support for averaging, i.e., the region of µ for each mode.

The approximation error between the CDF approximators in the proposed quantizer and the CDFs of SGMM is evaluated by:

\int_c^d \int_{-\infty}^{\infty} \left( F_{SGMM}(x, \mu) - F_A(x, \mu) \right)^2 dx \, d\mu    (3–14)

where F_{SGMM} is the CDF of SGMM and F_A is the CDF approximator in the proposed quantizer. For Mode I, c = 0, d = 1, and F_A(x) = Q(x), where Q(x) is defined in Eq. (3–7); for Mode II, c = 1, d = 3, and F_A(x) is given in Eq. (3–10); for Mode III,

F_A(x) = \begin{cases} \frac{1}{2} Q(x+\mu), & x < 0 \\ \frac{1}{2} Q(x-\mu) + \frac{1}{2}, & x \ge 0 \end{cases}    (3–15)

The numerical experiments show that the AAD is small, on the order of 10^{-1} to 10^{-4} depending on the mode, and the approximation error of the proposed quantizer is small, as listed in Table 3-1. Table 3-1, Figure 3-1, Figure 3-2, and Figure 3-3 indicate the closeness of the proposed quantizer to the Lloyd-Max quantizer as well as the robustness [65] of the proposed quantizer.

3.6.2 MSE Performance Comparison

We randomly generate 10000 samples from the SGMM distribution in Eq. (3–6). Then the proposed adaptive quantizer, the Lloyd-Max quantizer, and the uniform quantizer are used to quantize the data into 8 quantization levels. We reconstruct the data from the quantized values and compare them with the original data in terms of MSE with respect to different µ, as shown in Figure 3-7.
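A small experiment in the spirit of this comparison can be sketched as follows, assuming NumPy; the Lloyd-Max quantizer is approximated by Lloyd iterations on the empirical data (the LBG approach mentioned above), and the sample size and mode parameters are illustrative.

import numpy as np

def sgmm_samples(mu, sigma=1.0, n=10000, seed=0):
    # symmetric two-component Gaussian mixture with means -mu and +mu
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * mu + rng.normal(0.0, sigma, size=n)

def quantize(x, levels):
    # map each sample to the nearest reproduction value
    idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    return levels[idx]

def lloyd_max(x, n_levels=8, iters=100):
    # Lloyd iterations on empirical data (LBG-style)
    levels = np.quantile(x, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(iters):
        xq = quantize(x, levels)
        levels = np.array([x[xq == v].mean() if np.any(xq == v) else v
                           for v in levels])
    return levels

x = sgmm_samples(mu=3.0)
uniform = np.linspace(x.min(), x.max(), 8)
for name, lv in [('uniform', uniform), ('Lloyd-Max', lloyd_max(x))]:
    print(name, np.mean((x - quantize(x, lv)) ** 2))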

The MSE performance of the proposed quantizer is very close to that of the Lloyd-Max quantizer and much better than that of the uniform quantizer. In Mode I, since we take the optimal uniform quantization over a finite high-probability region, the MSE gap between the uniform quantizer and the Lloyd-Max quantizer is small, but the proposed quantizer still outperforms the uniform quantizer. In Mode II, the proposed piecewise compander provides stable MSE performance with a simple design. In Mode III, the MSE of the uniform quantizer increases dramatically with µ, since the distribution is far from uniform when µ is large, and the uniform quantizer wastes many bits on values with small probability around the origin. The proposed quantizer, in contrast, retains MSE performance very close to that of the Lloyd-Max quantizer. We also apply our method to:

G(x) = \sum_{i=1}^{2} \frac{1}{2(2\pi)^{1/2}\sigma} e^{-\frac{(x-\mu_i)^2}{2\sigma^2}}    (3–16)

When µ_1 = −µ_2 = µ and σ = 2, we plot the MSE of the proposed quantizer, the Lloyd-Max quantizer, and the uniform quantizer in Figure 3-8. From Figure 3-8, we can see that in Modes I and III the quantization error of the proposed adaptive quantizer is very close to that of the Lloyd-Max quantizer, and a little higher in Mode II. The values of µ_S and µ_L for σ = 2 are almost twice those for σ = 1.

The proposed adaptive quantizer has MSE performance close to that of the Lloyd-Max quantizer, with computational cost similar to the uniform quantizer. This verifies the affine law of quantizers in Proposition 3.1. Moreover, due to the good MSE performance of the proposed quantizer, its reproduction values are effective initial values for the Lloyd-Max algorithm to quickly find the Lloyd-Max quantizers for GMM.

The time complexity and space complexity of the uniform quantizer, the Lloyd-Max quantizer, and the proposed quantizer for N quantization levels are shown in Table 3-2. Quantizer design time, quantization running time per sample, and memory cost are compared. The uniform quantizer design needs the inverse of the CDF to obtain the optimal quantization range. The uniform quantization function [x/N] + t_0 needs 3 operations per sample, i.e., a multiplication, a rounding operation, and an addition. In Modes I and III, the computation of the proposed adaptive quantizer using q_I and q_{III} is just a table lookup. When the number of quantization levels N is small, the running time of the proposed quantizer per sample, log(N) or log(N/2), is similar to that of uniform quantization. In Mode II, the adaptive quantizer uses the companding technique; its computation is approximately 4 operations per sample, i.e., a multiplication, an exponentiation, a rounding operation, and an addition. In Modes I and III, if companding is used, the complexity is the same as in Mode II. The computation of the Lloyd-Max algorithm includes an addition, a division in Eq. (3–3), and two integrals in Eq. (3–4) for each reconstruction level in each iteration.

The memory costs are also compared. For the uniform quantizer, the proposed quantizer in Mode II, and the companding variants in Modes I and III, O(1) space is needed for computation; the others need O(N) space for table lookup. In short, the proposed quantizer is much more computationally efficient than the Lloyd-Max quantizer and close to the uniform quantizer.
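The two per-sample cost regimes discussed above can be sketched as follows; the boundaries, levels, compress, and expand objects are assumed to come from the quantizer design step and are not specified here.

import numpy as np

def lookup_quantize(x, boundaries, levels):
    # binary search over the interval boundaries: O(log N) per sample
    return levels[np.searchsorted(boundaries, x)]

def companding_quantize(x, compress, expand, n_levels):
    # compress -> uniform midpoint quantization -> expand: O(1) per sample
    u = compress(x)                                  # e.g. an approximate CDF
    q = (np.floor(u * n_levels) + 0.5) / n_levels    # multiply, round, add
    return expand(q)                                 # clip u = 1.0 in practice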

3.6.3 An Application in Image Quantization

We apply the proposed adaptive quantizer to gray-image quantization. Assume that we only have a low dynamic range display device, such as printed paper, with m bits per pixel, where m < 8; i.e., we need an m-bit quantization scheme for proper display. Then what is the best image quality we can obtain from the original gray images after quantization? The quantizer should utilize the information in the image pixel distribution. Again, we compare the proposed adaptive quantizer with the uniform quantizer and the Lloyd-Max quantizer.

We show the cases m = 4 and m = 5 in Figure 3-9 and Figure 3-10, respectively, on the image Barbara, whose histogram and GMM estimate were shown in Figure 3-5. From Figure 3-9 and Figure 3-10, we can see that the proposed adaptive quantizer produces fewer block artifacts than the uniform quantizer and results similar to those of the Lloyd-Max quantizer. The proposed quantizer performs better than the uniform quantizer and approximates the optimal Lloyd-Max quantizer in terms of both perceptual quality and PSNR.

3.6.4 Experimental Results on HDR Image Tone Mapping

In this section, we show the experimental results of the proposed tone mapping algorithm for HDR images using the joint adaptive quantizer and multiscale techniques. We compare our algorithm with two recent algorithms: a TRO based method [84] and a TRC based method [77].

From Figure 3-11, it can be observed that Li's result [84] is a little dark due to the concentrated histograms, and the histogram based algorithm [77] loses some details between the trees and the background, while our result looks better.

We also compare the resulting LDR images and the histograms of their RGB components on the HDR image chairs in Figure 3-12 and Table 3-3. From Figure 3-12, we can see that the board on the wall in our result is clearer than that in Li's result, and the illumination information in our algorithm is richer. In Table 3-3, the first row shows the histograms of the results from Li's algorithm [84] and the second row shows the histograms of the results from our proposed algorithm. The histograms of the RGB components of our results are more spread out than those of Li's algorithm. In addition, Li's algorithm clips both the high and low ends of the dynamic range, which loses information and may cause false-color artifacts.

3.7 Summary

In this chapter, we proposed a novel adaptive quantizer for the Gaussian Mixture Model. The proposed quantizer adapts to the varying means and variances of the components of the Gaussian mixture. The adaptive quantizer has lower mean square error than the uniform quantizer and comes very close to the Lloyd-Max quantizer, with computational cost similar to the uniform quantizer. We also proposed a reconfigurable A/D converter based on our adaptive quantizer. The proposed quantizer also has applications in image quantization and high dynamic range image compression. Gray images quantized with our quantizer have better visual quality and higher PSNR than those quantized with the uniform quantizer, and are similar to those quantized with the Lloyd-Max quantizer. For HDR image compression, we proposed a tone mapping algorithm using our adaptive quantizer and multiscale techniques. The experimental results show that the proposed adaptive quantizer holds great potential for improving existing ADCs and HDR image compression.

Our future work will focus on extending the one-dimensional quantizers for the Gaussian Mixture Model to high dimensional spaces. Potential applications include high dimensional signal processing and clustering.

Figure 3-1. CDF of Gaussian N(0,1) vs. CDF of SGMM with µ = 0.5.

Table 3-1. Proposed quantizer vs. Lloyd-Max quantizer.
                        Mode I    Mode II   Mode III
AAD                     10^{-2}   10^{-1}   10^{-4}
Approximation Error     2.51      16.69     0.03

Figure 3-2. Transformation function of a piecewise compressor vs. CDF of SGMM with µ = 1.5.

Figure 3-3. CDF of the catenated Gaussian vs. CDF of SGMM with µ = 3.

Figure 3-4. Reconfigurable A/D converter.


Figure 3-5. GMM estimation by EM algorithm on histogram of Barbara.

Figure 3-6. Tone mapping by using joint adaptive quantizer and multiscale techniques.


Figure 3-7. MSE comparison among the proposed adaptive quantizer, Lloyd-Max quantizer and Uniform quantizer for SGMM (σ = 1).


Figure 3-8. MSE comparison among the proposed adaptive quantizer, Lloyd-Max quantizer and Uniform quantizer for the data in Eq. (3–16)(σ = 2).


Figure 3-9. Performance comparison among different quantizers when m = 4. (A) Uniform quantizer (34.76 dB). (B) Proposed quantizer (36.21 dB). (C) Lloyd-Max quantizer (36.84 dB).


Figure 3-10. Performance comparison among different quantizers when m = 5. (A) Uniform quantizer (40.72 dB). (B) Proposed quantizer (41.85 dB). (C) Lloyd-Max quantizer (42.45 dB).

Table 3-2. Comparison of complexity of quantizers.
Quantizer                           Design Time       Running Time per Sample   Memory
Uniform quantizer                   Inv               3                         O(1)
Proposed adaptive quantizer:
  Mode I, q_I(x)                    N                 log N                     O(N)
  Mode I, companding                N                 4                         O(1)
  Mode II, companding               N                 4                         O(1)
  Mode III, q_{III}(x)              N/2               log(N/2)                  O(N)
  Mode III, companding              N                 4                         O(1)
Lloyd-Max quantizer for GMM         2k · N · (n + 1)  log N                     O(N)
(N is the number of quantization levels; Inv denotes the complexity of computing the inverse of the CDF; k is the number of iterations in the Lloyd-Max algorithm; n is the number of training samples in the Lloyd-Max algorithm.)


Figure 3-11. Performance comparison between different tone mapping algorithms on HDR image mpi atrium (copyright by Rafal Mantiuk). (A)Li’s algorithm [84]. (B)Histogram based algorithm [77]. (C)Our proposed algorithm.

Table 3-3. Histograms of images obtained by Li's algorithm [84] (first row) and our algorithm (second row): R, G, and B component histograms.


The horizontal axes are normalized to the range [0, 1].


Figure 3-12. Visual performance comparison between different tone mapping algorithms on HDR image chairs. (A)Li’s algorithm [84]. (B)The proposed algorithm.

CHAPTER 4
APPROXIMATING OPTIMAL VECTOR QUANTIZATION WITH TRANSFORMATION AND SCALAR QUANTIZATION

4.1 Research Background

Quantization [61] is a critical technique for analog-to-digital conversion and signal compression. Scalar quantization is simple and fast, while vector quantization [58] in high dimensions can achieve smaller mean square error (MSE) and better rate-distortion performance [39] by jointly considering all the dimensions, but at the cost of exponentially increasing quantizer design time and more quantization computations, i.e., at the cost of more codebook design and lookup time.

To reduce the codebook design and lookup time, much research has focused on two-dimensional random variables (r.v.), especially those with circular Gaussian distributions, since Gaussian distributions [20, 39] enjoy many elegant closed-form theorems. The earliest work dates back to Huang and Schultheiss's method [66], which quantizes each dimension of a random vector with a separate Lloyd-Max quantizer [97]. It is efficient and effective, but leaves room for improvement. Later, Zador [163] and Gersho [57] studied quantization using companders with a large number of quantization levels: a compressor transforms the data into a uniform distribution, the optimal quantizer for the uniform distribution is applied, and the data are transformed back with an expander. This scheme, however, does not work well with a small number of quantization levels. Another major method for designing quantizers for circular distributions uses polar coordinates. Polar quantization consists of separable magnitude quantization and phase quantization. The optimal ratio between the number of magnitude quantization levels and the number of phase quantization levels was studied by Pearlman [108] and Bucklew et al. [21, 22], and an MMSE restricted polar quantizer is implemented by using a uniform quantizer for the phase angles and a scaled Lloyd-Max Rayleigh quantizer for the magnitude. But this scheme does not consider the center of a circular distribution as a quantization level; thus, its MSE performance is sometimes worse than that of rectangular quantizers and other lattice quantizers, and it does not work well for elliptical distributions. Wilson [153] proposed a series of non-continuous quantization lattices that provide almost the optimal performance among polar quantizations; it is a kind of unrestricted polar quantization, but without Dirichlet boundaries. Peter et al. [135] improved Wilson's scheme by replacing arc boundaries with Dirichlet boundaries and showed the optimal circularly symmetric quantizers for circular Gaussian distributions.

Most of these previous works concentrate on Gaussian distributions and provide numerical results only for Gaussian distributions. Although the Gaussian source is considered the "worst case" source for data compression, which is instructive for constructing a robust quantizer [27], it is far from optimal for quantizing other distributions. These works did not consider elliptical distributions either, whose optimal quantizers differ from those for circular distributions, and they did not provide a unified framework for arbitrary distributions. Therefore, the optimal quantizers for other distributions, such as Laplacian distributions and elliptical Gaussian and Laplacian distributions, need investigation.

To address these problems, we propose a unified quantization system that approaches the optimal vector quantizer by using transforms and scalar quantizers. The effect of transforms on signal entropy and signal distortion is discussed, especially for unitary transforms and volume-keeping scaling transforms. The optimal decorrelation transform is illustrated, which turns a source with memory into a memoryless source in the ideal case. We then focus on quantizer design for memoryless circular and elliptical sources. A tri-axis coordinate frame is proposed to determine the quantization lattice, i.e., the positions of the quantization levels, inspired by the well-known optimal hexagonal lattice for two-dimensional uniformly distributed signals [68]. It provides a unified framework for both circular and elliptical distributions and encompasses polar quantization as a special case. The proposed quantizer is also a kind of adaptive elastic lattice quantizer. We present the simple design methodology and utilize the Lloyd-Max quantizers for the corresponding one-dimensional distributions. The optimality of this scheme is verified on elliptical/circular Gaussian and Laplacian distributions. The methodology description and experiments focus on bivariate random variables, and the extension to high-dimensional random variables is also discussed. The advantages of our scheme include the following:

1. It provides an elegant quantization lattice for an arbitrary number of quantization levels, even prime numbers.

2. It almost always has smaller MSE than the other quantizers.

3. It handles both memoryless sources and sources with memory, with arbitrary distributions, including circular, elliptical, and mixture distributions.

4. It operates under the unified framework of the tri-axis coordinate frame.

5. It has small design and implementation complexity.

The rest of the chapter is organized as follows. Section 4.2 presents the preliminaries of our proposed quantizer. Section 4.3 describes the system architecture of transform plus scalar quantization to approach the optimal vector quantizer. Preprocessing with transforms to decorrelate signals is discussed in Section 4.4. In Section 4.5, we present the tri-axis coordinate frame and the methodology to design the optimal scalar quantizer for both circular and elliptical distributions in detail. Experimental results are shown in Section 4.6. Finally, Section 4.7 concludes the chapter.

4.2 Preliminaries

4.2.1 n-dimensional MMSE Quantizer and Scaling Law

Usually, three tools are used to evaluate the performance of a quantizer. First, the mean square error (MSE) between the input signal X and the reconstructed signal \hat{X}, where X, \hat{X} ∈ ℜ^n, is considered:

MSE = E[(X - \hat{X})^2]    (4–1)

Signal-to-noise ratio is the second evaluation tool:

SNR = \frac{|\Sigma|}{MSE}    (4–2)

where Σ is the covariance matrix of X, and |·| is the matrix determinant operator. The rate-distortion curve is the third.

The Lloyd-Max quantizer [58, 97] is an MMSE quantizer. For one-dimensional signals, let t_k (k = 0, ···, N) denote the boundary points of the quantization intervals, and let r_k (k = 0, ···, N−1) denote the quantization levels. Then the Lloyd-Max quantizer is characterized by:

\{t_k^*, r_k^*\} = \arg\min_{\{t_k, r_k\}} MSE = \arg\min_{\{t_k, r_k\}} \sum_{k=0}^{N-1} \int_{t_k}^{t_{k+1}} (x - r_k)^2 f_X(x) \, dx    (4–3)

where f_X(x) is the probability density function (pdf) of X and N is the number of quantization levels. From (4–3), we obtain the centroid and nearest neighbor conditions:

t_k^* = \frac{r_{k-1}^* + r_k^*}{2}, \quad k = 1, \cdots, N-1,    (4–4)

and

r_k^* = \frac{\int_{t_k^*}^{t_{k+1}^*} x \, p(x) \, dx}{\int_{t_k^*}^{t_{k+1}^*} p(x) \, dx}, \quad k = 0, \cdots, N-1,    (4–5)

where [t_0^*, t_N^*] is the range of the input of the quantizer. The Lloyd-Max quantizer for the one-dimensional Gaussian distribution with zero mean and unit variance has been well studied. Given the number of quantization levels N, the Lloyd-Max quantizer for the zero-mean, unit-variance Gaussian is tabulated in [68]. Given the Lloyd-Max quantizer for the zero-mean, unit-variance Gaussian, we can

use the affine law in Proposition 4.1 to obtain the Lloyd-Max quantizer for a Gaussian distribution with arbitrary mean µ and arbitrary variance σ².

Proposition 4.1. (Affine Law) For an r.v. X with zero mean, assume that its N-level Lloyd-Max quantizer is specified by t_k (k = 0, ···, N) and r_k (k = 0, ···, N−1). Then for the r.v. Y = \Sigma^{-\frac{1}{2}} X + \mu, with mean µ and covariance matrix Σ, where Σ = c · I, c > 0, its Lloyd-Max quantizer is specified by \hat{t}_k = \Sigma^{-\frac{1}{2}} t_k + \mu (k = 0, ···, N) and \hat{r}_k = \Sigma^{-\frac{1}{2}} r_k + \mu (k = 0, ···, N−1).

For a proof of Proposition 4.1, see Appendix B.3. It indicates that for a random variable obtained from another random variable by an affine transform, the optimal quantizer can be obtained from the original quantizer by the same affine transform.
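As a numerical illustration of Proposition 4.1, the following sketch (assuming NumPy and SciPy) maps the 4-level Lloyd-Max levels for N(0,1), quoted in Chapter 3, through an affine transform and verifies that the centroid condition (4–5) still holds for the transformed Gaussian; the choice of µ and σ is arbitrary.

import numpy as np
from scipy import integrate, stats

r = np.array([-1.5104, -0.4528, 0.4528, 1.5104])   # 4-level levels for N(0,1)
mu, sigma = 2.0, 0.5
ry = sigma * r + mu                                # affine-law levels for Y
ty = np.concatenate(([-np.inf], (ry[:-1] + ry[1:]) / 2, [np.inf]))

pdf = stats.norm(mu, sigma).pdf
for k in range(len(ry)):
    num, _ = integrate.quad(lambda t: t * pdf(t), ty[k], ty[k + 1])
    den, _ = integrate.quad(pdf, ty[k], ty[k + 1])
    print(ry[k], num / den)   # each centroid matches the affine-law level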

In the sequel, we focus our investigation on random variables with zero means and diagonal covariance matrices.

4.2.2 Circular and Elliptical Distributions

A source with a circular distribution is a memoryless source; each dimension of the data follows exactly the same one-dimensional distribution. For example, a two-dimensional circular Gaussian distribution with unit variance is:

f(x_1, x_2) = \frac{1}{2\pi} e^{-\frac{x_1^2 + x_2^2}{2}}    (4–6)

It can also be represented in the polar coordinate frame as:

f(r, \phi) = \frac{r}{2\pi} e^{-\frac{r^2}{2}}    (4–7)

This is why the Lloyd-Max quantizer for the Rayleigh distribution is preferred for quantizing signal magnitudes in polar quantization.

A source with an elliptical distribution can be a memoryless source with axis-aligned principal axes, or a source with memory with skewed principal axes. A source with memory can be decorrelated by unitary transforms into a memoryless source whose component in each dimension follows the same distribution but with possibly different variances. The memoryless elliptical Gaussian source can be represented as:

f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2} e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}} \cdot e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}}    (4–8)

The uniform distribution is both an ordinary circular distribution and an ordinary elliptical distribution.

4.2.3 Ideal Uniform Distribution and Optimal Two-dimensional Hexagon Lattice

Vector quantization aims at MMSE, but its computational complexity is high and increases exponentially with the number of quantization levels. For the uniform distribution, it is generally accepted that the optimal VQ consists of regular voxels in the source domain. However, due to the finite-domain constraint, regular voxels cannot tile the space with exactly an integer number of voxels. Therefore, the VQs of uniform distributions found by various algorithms consist of irregular voxels degenerated from the regular ones. We build our theorem on an infinite domain, i.e., an infinite dynamic range, to avoid this boundary dilemma. Thus, we define the ideal uniform distribution on the infinite dynamic range as follows.

Definition 1. (Ideal Uniform Distribution) Given a domain of distribution whose volume is V, the uniform distribution is

f(x) = \begin{cases} \frac{1}{V}, & x \in V, \\ 0, & \text{otherwise.} \end{cases}    (4–9)

When V → ∞, f(x) follows the ideal uniform distribution. There is no boundary constraint when designing the optimal VQ for the ideal uniform distribution.

Lemma 1. The optimal VQ for the two-dimensional ideal uniform distribution is the regular honeycomb shown in Figure 4-5.

4.3 System Architecture

4.3.1 Quantization for Compression

The general coding system usually includes transform, quantization, and entropy coding, as shown in Figure 4-1. The optimal transform can simplify a vector quantization scheme into scalar quantization, and can even allow entropy coding to be replaced by fixed-length coding. The rate-distortion code is the optimal code proposed by Shannon [39]; it is an optimal vector code when the block length n → ∞. Only its existence is known, not its design in the general case. Vector quantization can approach the rate-distortion bound when the number of quantization levels N → ∞, but it is overwhelmed by exponentially increasing complexity. Therefore, we wish to replace vector quantization by a transform followed by scalar quantization with the same rate-distortion performance but much less design and implementation complexity, as adopted by the general transform compression system shown in Figure 4-1. An optimal transform plus an optimal scalar quantizer thus gives a promising guideline for achieving the rate-distortion bound, as studied in the following sections.

4.3.2 Theorem and System Framework

We therefore claim the following statement.

Theorem 4.1. MMSE vector quantization can be achieved by a transform followed by scalar quantization.

Theorem 4.1 will be proved in Section 4.5. Following Theorem 4.1, we propose the system architecture shown in Figure 4-2: a vector quantizer is implemented by transforms and scalar quantization. The transforms we focus on are linear transforms with high decorrelation ability. We discuss unitary transforms, volume-keeping scaling transforms, and the optimal decorrelation transforms in Section 4.4. The scalar quantizer is implemented in the tri-axis coordinate frame, which is described in detail in Section 4.5. Transform plus scalar quantization has the advantage of potentially small complexity and good R-D performance, but there is still a tradeoff among complexity, rate, and distortion. Therefore, the system design should be consistent with the C-R-D theory, where C represents complexity, R represents rate, and D represents distortion; the best R-D performance with the least complexity is desired.

This system is flexible; the companding technique can also be plugged in, as shown in Figure 4-3. As we will point out later, the companding technique is asymptotically optimal but may not work well at low rates, whereas our tri-axis coordinate frame works almost universally.

4.4 Preprocessing with Transforms

Transforms are helpful for quantization. Any mapping is a transform, but nonlinear transforms introduce nonlinear error after quantization; therefore, only linear transforms are considered in this section. To preserve constant energy, we focus on linear transforms represented by matrices with unit determinant, such as unitary transforms and volume-keeping scaling transforms.

4.4.1 Unitary Transforms

Unitary transforms are rotations in Euclidean space, aimed at high decorrelation ability. The Karhunen-Loeve transform (KLT), which depends on the signal, is optimal in the sense of the highest decorrelation ability for finite-block-length signals, while the DCT is a fixed transform and a good substitute for the KLT.

Lemma 2. Mean square error is invariant under unitary transforms.

For a proof of Lemma 2, see Appendix B.4.

Lemma 3. The MMSE vector quantizer of a random vector after a unitary transformation is the MMSE vector quantizer of the original random vector mapped through the same unitary transformation.

Lemma 3 follows easily from Lemma 2 and expresses a rotation-invariance property of MMSE vector quantizers.

Lemma 4. The sum of the entropies of the components of a random vector X decreases after decorrelation by unitary transforms.

For a proof of Lemma 4, see Appendix B.5.

4.4.2 Scaling Transforms

From the affine law in Proposition 4.1, we know that MMSE quantizers undergo the same expansion or shrinkage as input signals whose dimensions are scaled by a common factor. Such scaling transforms increase or decrease the signal volume. Thus, we turn to volume-keeping uniform scaling transforms.

Definition 2. A volume-keeping uniform scaling transform is represented by a diagonal matrix with unit determinant, i.e., the product of the diagonal elements is one:

\begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & a_n \end{pmatrix}

where \prod_{i=1}^{n} a_i = 1.

Usually the MSE and energy of signals change after volume-keeping scaling, since \sum_{i=1}^{n} x_i^2 \ne \sum_{i=1}^{n} a_i^2 x_i^2. Rate-distortion theory requires the MSE to be uniformly distributed among the components of the random vector, provided the MSE does not exceed the variance of each component. Therefore, the MMSE vector quantizer for an elliptical distribution cannot be obtained from the MMSE vector quantizer for a circular distribution by a simple scaling; they must be considered separately.

Lemma 5. The sum of the entropies of the components of a random vector with Gaussian or Laplace distribution remains constant after a volume-keeping scaling transform.

For a proof of Lemma 5, see Appendix B.6.

4.4.3 Optimal Transform for Arbitrary Distributions

Since volume-keeping scaling transforms perturb the MSE of each component, we only consider unitary transforms before quantization.

For an arbitrary distribution, the strongest decorrelation cannot be achieved by a single unitary transform, since a unitary transform decreases the intra-component/block correlation but does not address the inter-component/block correlation. An arbitrary distribution can be approximated by a Gaussian Mixture Model (GMM) found by the expectation-maximization (EM) algorithm. The pdf of a GMM random variable X is:

f_X(x) = \sum_{i=1}^{N_g} p_i \cdot g_i(x)    (4–10)

where N_g is the number of Gaussian components in the GMM; g_i(x) is the Gaussian pdf of component i (i = 1, ···, N_g), shown as ellipses in Figure 4-4; p_i denotes the probability of component i (i = 1, ···, N_g); and \sum_{i=1}^{N_g} p_i = 1. Assume the data come from pixel pairs in a gray image. Then the Gaussian components of the data fall almost along the diagonal due to the correlation between pixel pairs. The first step is intra-component decorrelation; the second step is inter-component decorrelation. The two steps can be performed in either order. Afterwards, scalar quantizers can be applied to each decorrelated Gaussian component.
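A sketch of this two-step procedure, assuming scikit-learn's GaussianMixture as the EM estimator, is given below; assigning each sample to its most likely component and rotating by the eigenvectors of that component's covariance performs the per-component (KLT-style) decorrelation, and the helper name decorrelate_gmm is ours.

import numpy as np
from sklearn.mixture import GaussianMixture

def decorrelate_gmm(data, n_components=4, seed=0):
    # EM fit of the GMM, then per-component unitary (rotation) transforms
    data = np.asarray(data, dtype=float)
    gmm = GaussianMixture(n_components=n_components,
                          random_state=seed).fit(data)
    labels = gmm.predict(data)
    out = np.empty_like(data)
    for i in range(n_components):
        _, vecs = np.linalg.eigh(gmm.covariances_[i])  # orthonormal columns
        centered = data[labels == i] - gmm.means_[i]
        out[labels == i] = centered @ vecs             # component-wise KLT
    return out, gmm, labels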

4.5 Optimal Scalar Quantizers in Tri-axis Coordinate Frame

After the transforms, we obtain random vectors with independent components. For Theorem 4.1, one-dimensional vector quantization is scalar quantization, so no transform is needed; it is a trivial case. For two-dimensional vector quantization, we prove the theorem in the tri-axis coordinate system. For high-dimensional vector quantization, a multi-axis coordinate system is needed.

4.5.1 Tri-axis Coordinate Frame

Definition 3. The tri-axis coordinate frame in two-dimensional space has three axes X, Y, and Z, with 120° angles between them.

The tri-axis coordinate frame is shown in Figure 4-5. Every point in the two-dimensional plane can be represented in this tri-axis coordinate frame. For some symmetric distributions, two axes or even one axis is sufficient. For example, the optimal two-dimensional vector quantization for the uniform distribution can be determined by two axes X, Y with 120° between them, spanning a hexagonal lattice. The optimal two-dimensional vector quantization for a circular distribution also needs only two axes, one of which determines the magnitude quantization while the other determines the phase quantization. We show these cases in the next subsections.

4.5.2 Tri-Axis System for Uniform Distribution

It is well known that the optimal vector quantizer for uniform distributions in two-dimensional space is the regular honeycomb [68], a result from the geometry of numbers and from discrete geometry in the Euclidean plane. We implement it with scalar quantization in the tri-axis system, as shown in Figure 4-6.

Proposition 4.2. The hexagonal lattice in the tri-axis system is still rate-distortion optimal for the uniform distribution.

Proof.
1. Vector quantization levels are the centroids of the hexagons. The centroid of each hexagon of the optimal quantizer can be represented by a fixed-length code with R = log_2 N bits, where N is the number of quantization levels.

2. Every point in two-dimensional space can be represented by the vector \vec{r} = c_1\vec{r}_1 + c_2\vec{r}_2, as shown in Figure 4-6, where \vec{r}_1 and \vec{r}_2 are the basis vectors of the VQ. The coefficients c_1 and c_2 are integers and uniformly distributed if the centroid is uniformly distributed.

3. The scalar quantizer consists of two independent scalar quantizers along two axes \vec{r}_1' and \vec{r}_2'. Every point can be represented by two indices into the codebook, obtained by projecting the point to the nearest code on each axis. For example, a point \vec{r} = x\vec{r}_1' + y\vec{r}_2' is quantized to \hat{\vec{r}} = x_m\vec{r}_1' + y_n\vec{r}_2'.

4. Let \vec{r}_1' = \vec{r}_1 and \vec{r}_2' = \vec{r}_2; then c_1 = x_m and c_2 = y_n. The MSE distortions of both SQ and VQ are the squared norms of the error vectors (x − x_m)\vec{r}_1' + (y − y_n)\vec{r}_2' and (x − c_1)\vec{r}_1 + (y − c_2)\vec{r}_2. Thus, VQ and SQ have the same distortion.

5. For the uniform distribution, only one codebook is needed for the two axes \vec{r}_1' and \vec{r}_2'. The indices can be fixed-length coded: R_i = log_2 N_i, where N_i is the number of quantization levels along axis i (i = 1, 2).

6. N_1 × N_2 = N asymptotically, i.e., R_1 + R_2 = log_2 N_1 + log_2 N_2 = log_2 N = R, so VQ and SQ have the same rate.

7. Therefore, SQ and VQ have the same R-D performance.

Therefore, the optimal MMSE vector quantization can be achieved by a transform (here, the identity transform) followed by uniform scalar quantization for the two-dimensional ideal uniform distribution.

In this way, SQ and VQ have the same R-D performance, while the codebook size of SQ is around the square root of that of VQ. Because of the reduced complexity brought by SQ, we use SQ to replace VQ in this chapter. Another point worth mentioning is that the numbers of quantization levels for the hexagonal lattice are the centered hexagonal numbers 1, 7, 19, 37, ···, growing along circles of larger and larger radius, as can be seen in Figure 4-7.
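The proof's construction can be sketched as follows, assuming NumPy: the point is expressed in oblique coordinates along two axes 120° apart, and each coordinate is rounded independently. Note that rounding the coordinates independently selects a point of the hexagonal lattice but does not by itself realize the exact Dirichlet (nearest-neighbor) cell boundaries discussed in Section 4.5.3.

import numpy as np

r1 = np.array([1.0, 0.0])
r2 = np.array([-0.5, np.sqrt(3.0) / 2.0])   # 120 degrees from r1
B = np.column_stack([r1, r2])               # lattice basis matrix

def lattice_quantize(points):
    # oblique coordinates (c1, c2), then two independent scalar roundings
    coeffs = np.linalg.solve(B, points.T)
    return (B @ np.round(coeffs)).T

pts = np.random.default_rng(2).uniform(-3.0, 3.0, size=(5, 2))
print(lattice_quantize(pts))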

4.5.3 Tri-Axis Coordinate Frame for Circular and Elliptical Distributions

What if the distribution is not uniform; what will the shape of the optimal lattice be? This is the inverse problem of finding the transform from a non-uniform distribution to the uniform distribution.

4.5.3.1 Elastic quantization lattices

We now show the elastic quantization lattices for circular and elliptical distributions. Starting from the optimal hexagonal lattice for a uniform distribution, we state that the optimal vector quantizer for a circular distribution forms an expanded hexagonal lattice, as shown in Figure 4-7. The expansion ratio along the radial direction between the optimal lattice for a two-dimensional circular distribution and that of a two-dimensional uniform distribution may approximately follow the expansion ratio between the Lloyd-Max quantizer for the corresponding one-dimensional distribution and that of a one-dimensional uniform distribution, i.e., the Lloyd-Max quantizer for the corresponding one-dimensional distribution.

4.5.3.2 Design methodology

We first focus on the positions of the quantization levels of a two-dimensional vector quantizer. The lattice patterns of the proposed quantizer, on which the quantization levels fall, are determined beforehand, as shown in Figure 4-10 and Figure 4-11. The quantization levels fall approximately on the centroids of the lattice cells, which are uniformly distributed in each annulus. We restrict them to lie on the same circle for simplicity, and each quantization region of the lattice need not be hexagonal. In different annuli, the quantization levels are staggered, similar to rotated polar quantization [135].

The optimal distance between the quantization levels and the origin, i.e., the magnitude quantization, is determined by weighting the Lloyd-Max quantizer of the corresponding unit-variance one-dimensional distribution. For more precise locations of the MMSE magnitude quantization levels, they are further searched outward along the radial direction in terms of MMSE.

To be specific, for a circular distribution, the pdf can be represented in the polar coordinate frame as f(r,θ) = f_1(r) · f_2(θ). For an arbitrary elliptical distribution, the data can be transformed by unitary transforms into a distribution whose principal axes are parallel to the coordinate axes, and then shifted to the origin. The distributions can then be uniformly represented by the following equation in the Cartesian coordinate system:

\sum_{i=1}^{n} \frac{x_i^2}{b_i^2} = 1    (4–11)

with b_1 = b_2 = ··· = b_n = b for circular distributions, and the b_i not all equal for elliptical distributions. For circular distributions, f_1(r) and f_2(θ) are separable, while they are not for elliptical distributions. The weighting effect of b_1 and b_2 for elliptical distributions is important. If a quantizer designed for circular distributions is used for an elliptical distribution, the resulting MSEs per dimension have ratio b_1^2 / b_2^2. However, from Shannon rate-distortion theory (i.e., reverse water-filling), we know that if the MSE is less than the variance of each component, the bitrate should be allocated such that the MSE per dimension is nearly equal. Therefore, we should use b_1 and b_2 to weight the quantization levels accordingly for elliptical distributions.

The magnitude quantization is non-uniform. For both circular and elliptical distributions, the two-dimensional quantization levels falling on each circle or ellipse can be represented by the coordinates (c · b_1 · cos θ, c · b_2 · sin θ), shown as stars in Figure 4-9. The factor c increases non-uniformly along the radial direction; it can be determined by searching outward along the radial direction, starting from the Lloyd-Max quantization for the Gaussian distribution.

Uniform phase quantization is optimal for circular distributions, but may not be for elliptical distributions. We adopt uniform phase quantization for both kinds of distributions, since the optimal phase quantization for elliptical distributions is only a small perturbation of the uniform quantization; we show its sub-optimality for elliptical distributions in the experiments. As shown in Figure 4-10, the number of quantization levels in each annulus is 1, 6, 12, 18, similar to that of the regular hexagonal lattice. Within each magnitude annulus, the k phase regions all have equal size, with boundaries:

\frac{2\pi}{k}(j-1) \le \theta < \frac{2\pi}{k} j    (4–12)

where j = 1, 2, ···, k.
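The lattice of Figure 4-10 and Figure 4-11 can be generated along these lines (a sketch assuming NumPy; the radii would come from the one-dimensional Lloyd-Max levels and the outward MMSE search described above, and the staggering offset is our simplification):

import numpy as np

def lattice_points(radii, b1=1.0, b2=1.0):
    # center level plus annuli of 6, 12, 18, ... uniformly spaced phases
    pts = [(0.0, 0.0)]
    for i, c in enumerate(radii, start=1):
        k = 6 * i
        offset = np.pi / k if i % 2 == 0 else 0.0   # stagger alternate annuli
        for j in range(k):
            th = offset + 2.0 * np.pi * j / k
            pts.append((c * b1 * np.cos(th), c * b2 * np.sin(th)))
    return np.array(pts)

print(lattice_points([1.43]).shape)          # (7, 2): the N = 7 pattern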

The boundaries of the quantization regions are obtained by the nearest-neighbor rule based on the fixed quantization levels, as in Eq. (4–3). The resulting quantization regions are not necessarily hexagons. In this way, the optimal VQ for distributions on finite regions, including uniform distributions, is determined from the theoretical hexagonal lattice.

4.5.3.3 The number of quantization levels in each annulus

How many quantization levels should we assign to each annulus? Previously, for restricted polar quantization [108], the number of quantization levels N is factorized as N = N_θ · N_r, where N_θ is the number of quantization levels in each annulus and N_r is the number of annuli. Although the optimal ratio between N_θ and N_r has been studied, some values of N cannot be factorized in this way, let alone prime numbers. This difficulty also arises in unrestricted polar quantization [153]; the non-continuity of quantization patterns exists in all previous works, and it remains an imperfection in our schemes as well. We have two schemes for arranging magnitude quantization levels versus phase quantization levels. The first is the optimal arrangement according to the number of quantization levels, similar to Wilson's [153]. The second is a progressive, semi-continuous scheme. Our quantizer design and optimization methodology is much simpler than that of unrestricted polar quantization.

The first scheme allows freedom in the number of phases assigned to each magnitude level. The optimal patterns are derived from experiments; they coincide with Wilson's scheme [153] but with better performance and Dirichlet boundaries, as shown in Figure 4-12. The other scheme is the progressive quantization scheme [115] shown in Figure 4-13.

In the progressive scheme, the number of annuli L increases as the number of quantization levels N passes 1, 7, 19, ···, 1 + 6·(1 + 2 + 3 + ···). Define the set NL = {1, 7, 19, ···, 1 + 6·(1 + 2 + 3 + ···)} and let NL(l) be the l-th element of NL. The number of annuli L is then determined by (see the sketch after this paragraph):

L = \begin{cases} \inf\{l : N - 3l \le NL(l)\}, & N \le 4 \\ \inf\{l : N + 7 - 6l \le NL(l)\}, & \text{otherwise} \end{cases}    (4–13)

where inf is the infimum. The quantizer can therefore be implemented progressively as N increases. The previously placed quantization levels need not change their relative positions; only their magnitudes should be shrunk slightly, as suggested by the Lloyd-Max quantizer of the one-dimensional Gaussian. Alternatively, proceeding hierarchically, we could further quantize each existing quantization region with our scheme.
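A direct reading of Eq. (4–13) in code (a sketch in plain Python; the table length 64 is an arbitrary cap, and the index handling is our assumption):

def num_annuli(N):
    # NL holds the centered hexagonal numbers 1, 7, 19, 37, ...
    NL = [1 + 6 * sum(range(1, l)) for l in range(1, 64)]
    for l in range(1, len(NL) + 1):
        bound = N - 3 * l if N <= 4 else N + 7 - 6 * l
        if bound <= NL[l - 1]:
            return l
    raise ValueError("N exceeds the precomputed table")

print([num_annuli(n) for n in (1, 4, 7, 10, 19, 37)])  # [1, 1, 2, 2, 3, 4]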

Comparing the two lattice-pattern schemes: quantitative descriptions of the first, optimal scheme are difficult to provide. For small N, the second scheme performs close to the first, although possibly with different lattice patterns; for large N, their performance difference decreases, and the quantization patterns of the second scheme asymptotically approach those of the first. The two schemes share many common quantization lattice patterns.

4.5.3.4 Expansion rule

How far does the radial expansion rule for two-dimensional distributions deviate from the Lloyd-Max quantizer of the corresponding one-dimensional distribution?

Take the Gaussian distribution as an example. The two-dimensional Gaussian has joint density:

P_X(x_1, x_2) = \frac{1}{2\pi} \exp\left\{ -\frac{x_1^2 + x_2^2}{2} \right\}    (4–14)

where −∞ < x_1, x_2 < ∞. Its polar coordinate representation is:

P_{R,\Theta}(r, \theta) = \frac{r}{2\pi} \exp\{-r^2/2\}    (4–15)

where 0 ≤ r < ∞, 0 ≤ θ < 2π, r = (x_1^2 + x_2^2)^{1/2}, and θ = \tan^{-1}(x_2/x_1).

The number of annuli L of the quantizer for the two-dimensional circular Gaussian relates to the number N_1 of quantization levels of the one-dimensional Gaussian distribution as:

N_1 = \begin{cases} 2L, & L = 1 \\ 2L - 1, & L \ge 2 \end{cases}    (4–16)

The expansion rule for r in Eq. (4–15) with L annuli is then found in the table of the Lloyd-Max quantizer for the one-dimensional Gaussian with N_1 quantization levels. For example, consider the case N = 7, L = 2 shown in Figure 4-12. It corresponds to N_1 = 3 for the one-dimensional Gaussian quantizer, i.e., r_1 = 0, r_2 = 1.2240. How far are r_1 = 0, r_2 = 1.2240 from the optimal r_1^*, r_2^*? Consider an upper bound on the difference between r_1 and r_1^* when L = 1. The radial expansion follows the rule for the Rayleigh distribution, but the Rayleigh distribution has no quantization level at the origin, so we have to utilize the quantizer for the Gaussian distribution in our quantizer.

The centroid of an annular sector of the two-dimensional Gaussian along the radial direction is:

\bar{r} = \frac{\int_0^\infty \int_{\theta_0}^{\theta_1} r \cdot \frac{r}{2\pi} e^{-\frac{r^2}{2}} \, d\theta \, dr}{\int_0^\infty \int_{\theta_0}^{\theta_1} \frac{r}{2\pi} e^{-\frac{r^2}{2}} \, d\theta \, dr} = \sqrt{\frac{\pi}{2}}

while the Lloyd-Max quantizer for the one-dimensional Gaussian distribution gives:

r = \frac{\int_0^\infty x \, \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx}{\int_0^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx} = \sqrt{\frac{2}{\pi}}

Then the upper bound on the difference is around 0.46 for unit-variance distributions.

As N becomes larger, these differences become smaller. They also give the maximal search ranges for finding the optimal magnitude quantizers. The Lloyd-Max quantizer for the one-dimensional Gaussian is a good initial value for finding the optimal magnitude quantization for two-dimensional distributions.

The magnitude quantization is almost independent of the phase quantization: when the phase quantization changes, the magnitude quantization suffers at most a small perturbation.

4.5.4 Generalization to GMM or LMM

We can use the EM algorithm to identify the components of a Gaussian Mixture Model (GMM) or Laplacian Mixture Model (LMM). Then, for each component, the proposed transforms plus scalar quantizer can replace the vector quantizer to approximate the optimal quantization performance.

4.5.5 Generalization to High Dimension

The dodecahedron is the optimal voxel of vector quantization for signals with a three-dimensional uniform distribution [9]. A three-dimensional six-axis coordinate system (six axes for the six parallel surface pairs of the dodecahedron) can be built analogously, as shown in Figure 4-14, and its coordinates (a, b, c, d, e, f) have three independent components. Analogously, vector quantization can be approximated by transforms plus scalar quantization for three-dimensional distributions.

For even higher dimensional spaces, once the optimal voxel of vector quantization for uniformly distributed signals is obtained, we can likewise obtain the optimal transforms plus scalar quantization to replace vector quantization.

4.6 Experimental Results and Discussions

In this section, we first show the basic properties of the proposed scalar quantizer. We then show experimental results on memoryless sources with unit-variance circular Gaussian and Laplace distributions, and with axis-aligned elliptical Gaussian and Laplace distributions with b_1 = 2, b_2 = 1.

We compare the MSE and rate-distortion performance of our proposed quantizers, based on the first scheme with optimal quantization lattices, against the unrestricted polar quantizers [153] (denoted 'UPQ'), the restricted polar quantizers [108] (denoted 'PQ'), and the rectangular quantizers [66] (denoted 'Rectangular'). The rate here is defined as \frac{\log_2 N}{2}, and the distortion is the MSE per dimension. The benchmark is the rate-distortion function for the Gaussian memoryless source:

R(D) = \frac{1}{2} \log_2 \left( \frac{1}{D} \right)    (4–17)

where 0 < D ≤ 1. Each quantizer is tested at its best performance, with the corresponding optimal quantization levels and the optimal ratio between the number of phase quantization levels and the number of magnitude quantization levels. For example, the rectangular quantizers are mostly tested with n² quantization levels for circular distributions, i.e., each dimension is quantized by a Lloyd-Max quantizer with n quantization levels, and with 2n × n for elliptical distributions, i.e., the two dimensions are quantized by Lloyd-Max quantizers with 2n and n quantization levels, respectively. We do not show the results of the vector quantizers found by the LBG algorithm, since it is highly dependent on initialization and the results we obtained are much worse than those of our proposed quantizers.

4.6.1 Basic Optimal Properties

1. We first consider the property of the optimal solutions, taking MMSE per dimension as the objective. For N = 7 = 1 + 6 with two magnitude levels (i.e., the first magnitude quantization level is assigned one phase quantization level and the second magnitude quantization level six phase quantization levels), the MSE-per-dimension performance on the unit-variance circular Gaussian is shown in Figure 4-15 with respect to different radii of the circle on which the second magnitude quantization level lies. The radius shown in Figure 4-15 starts from the second quantization level of the Lloyd-Max quantizer for the univariate Gaussian, around 1.224. The MSE per dimension first decreases with increasing radius, reaches its unique minimum around 1.43, and then increases with the radius (a search of this kind is sketched after this list). With more magnitude levels and more than one radius to be tuned for the optimal performance, there are certainly local minima, but the optimal radii can be searched easily and quickly starting from the quantization levels of the Lloyd-Max quantizers for the univariate Gaussian.

2. For quantization lattices with the same number of magnitude quantization levels, the radii of the optimal magnitude levels increase with the number of quantization levels N and saturate for relatively large N. The optimal radii of the second magnitude quantization level are shown by the vertical coordinates of the points in Figure 4-16, corresponding to the numbers of quantization levels N = 5 (= 1+4), 6 (= 1+5), 7 (= 1+6), 8 (= 1+7), 9 (= 1+8). They increase with N and gradually level off. This gives guidance on how to tune the optimal magnitude quantization levels.
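The search behind Figure 4-15 can be reproduced in miniature as follows (a sketch assuming NumPy; the grid and sample sizes are illustrative):

import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(10000, 2))          # unit-variance circular Gaussian

def mse_per_dim(radius):
    # N = 7 pattern: one level at the origin plus six on a circle
    th = 2.0 * np.pi * np.arange(6) / 6
    ring = np.column_stack([radius * np.cos(th), radius * np.sin(th)])
    levels = np.vstack([[0.0, 0.0], ring])
    d2 = ((data[:, None, :] - levels[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean() / 2.0      # per-dimension MSE

radii = np.linspace(1.224, 1.6, 40)
print(radii[np.argmin([mse_per_dim(r) for r in radii])])  # near 1.43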

4.6.2 Circular Gaussian Distribution

We show the R-D performance of different quantizers on a unit-variance circular Gaussian distribution in Figure 4-17. From Figure 4-17, we can see that the R-D performance of our proposed quantizers is always a little better than that of the UPQs, and much better than that of the PQs and the rectangular quantizers. They all have the same R-D performance when N = 4, due to identical quantization levels. Rectangular quantizers may perform better than PQs for some n² quantization levels.

4.6.3 Elliptical Gaussian Distribution

We show the R-D performance of different quantizers on an elliptical Gaussian distribution in Figure 4-18. From Figure 4-18, we can see that the UPQs do not account for the different variances among the random vector components and thus do not perform well. Our proposed quantizers almost always perform better than the rectangular quantizers, except when N = 8, since N = 8 = 4 × 2 is the best factorization for the rectangular quantizer on elliptical distributions when the ratio of the component variances equals 2. For other, non-factorable N, the rectangular quantizers perform much worse, as expected, although we do not plot this in the figure.

4.6.4 Circular Laplace Distribution

We show the R-D performance of different quantizers on a unit-variance circular Laplace distribution in Figure 4-19. Figure 4-19 indicates that our proposed quantizers always perform a little better than the UPQs, and much better than the PQs and the rectangular quantizers.

4.6.5 Elliptical Laplace Distribution

We show the R-D performance of different quantizers on an elliptical Laplace distribution in Figure 4-20. Figure 4-20 indicates that our proposed quantizers always perform better than the UPQs, and better than the rectangular quantizers except when N = 8 = 4 × 2. Our proposed quantizers have a pronounced advantage when N = 7, 19, 37, ···.

4.6.6 Bit-rate Saving

We also evaluate the average bit-rate saving of our quantizers over the other quantizers. The average bit-rate is calculated using Bjontegaard's method [16, 106] with fitted polynomials of degree 3. The bit-rate saving is evaluated as the relative average bit-rate in percent:

\frac{R_c - R_p}{R_p} \times 100\%    (4–18)

where R_c is the average bit-rate of the compared quantizer and R_p is the average bit-rate of the proposed quantizer.
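The Bjontegaard average bit-rate difference used here can be sketched as follows (assuming NumPy; we fit log-rate as a cubic polynomial in log-distortion, which is our adaptation of the usual PSNR-based interpolation variable):

import numpy as np

def bd_rate_percent(dist_c, rate_c, dist_p, rate_p):
    # cubic fits of log10(rate) against log10(distortion)
    xc, xp = np.log10(dist_c), np.log10(dist_p)
    pc = np.polyfit(xc, np.log10(rate_c), 3)
    pp = np.polyfit(xp, np.log10(rate_p), 3)
    lo = max(xc.min(), xp.min())            # common distortion interval
    hi = min(xc.max(), xp.max())
    ic = np.polyval(np.polyint(pc), hi) - np.polyval(np.polyint(pc), lo)
    ip = np.polyval(np.polyint(pp), hi) - np.polyval(np.polyint(pp), lo)
    avg = (ic - ip) / (hi - lo)             # mean log10 rate difference
    return (10.0 ** avg - 1.0) * 100.0      # relative bit-rate in percent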

From Table 4-1, we can see that the proposed quantizer saves 0.4% - 24.5% bit-rate on average compared to the unrestricted polar quantizers, the restricted polar quantizers, and the rectangular quantizers. We do not list the average bit-rate gain over the restricted polar quantizers for elliptical distributions, which is even higher than that over the unrestricted polar quantizers.

4.7 Summary

In this chapter, we proposed a scheme that uses transformation plus scalar quantization to replace optimal vector quantization. Unitary transforms, rather than scaling transforms, are needed for the optimal vector quantizer approximation. After transformation, scalar quantization for both circular and elliptical distributions was studied in the proposed tri-axis coordinate system. The optimal quantization levels were found on the elastic hexagonal lattices, which include the optimal and the progressive quantizer lattice patterns. The experimental results showed that our proposed quantizers almost always perform better than UPQs, PQs, and rectangular quantizers on both Gaussian and Laplace distributions, especially for prime numbers of quantization levels. We achieved O(N²) design complexity and 0.4%-24.5% bitrate saving, where N is the number of quantization levels per dimension. Therefore, we claim that transforms plus scalar quantizers can approximate the optimal vector quantizers in terms of R-D performance but with much less computational complexity. Our future work will focus on optimal vector quantizer approximation in high-dimensional spaces and on applications in image and video coding.

Figure 4-1. General encoding and decoding pipeline with transforms and quantization.

Figure 4-2. System architecture with transform plus scalar quantization.

Figure 4-3. Transform plus scalar quantization with companding technique.

Figure 4-4. Gaussian mixture model decorrelation.

Figure 4-5. Two-dimensional tri-axis coordinate system.

Figure 4-6. Two dimensional optimal uniform vector quantizer.

Figure 4-7. Circularly expanded hexagon lattice for two-dimensional circular distribution.


Figure 4-8. Elliptically expanded hexagon lattices for two-dimensional elliptical Gaussian distribution. (A)Horizontal elliptical hexagonal lattice. (B)Vertical elliptical hexagonal lattice.

Figure 4-9. Tri-axis frame for a general two-dimensional elliptical distribution.

Figure 4-10. Expanded hexagon lattice for two-dimensional circular Gaussian distribution.

Figure 4-11. Expanded hexagon lattice for two-dimensional elliptical Gaussian distribution.

Figure 4-12. The first optimal quantization scheme.

Figure 4-13. The second progressive quantization scheme.

Figure 4-14. Voxel for three-dimensional uniform distribution.


Figure 4-15. MSE per dimension of quantization for 10000 samples from a unitary-variance circular Gaussian distribution.


Figure 4-16. Optimal magnitude levels for different number of quantization levels N for unitary-variance circular Gaussian distribution.


Figure 4-17. Rate-Distortion comparison among different quantizers for circular Gaussian distribution.


Figure 4-18. Rate-Distortion comparison among different quantizers for elliptical Gaussian distribution.


Figure 4-19. Rate-Distortion comparison among different quantizers for circular Laplace distribution.


Figure 4-20. Rate-Distortion comparison among different quantizers for elliptical Laplace distribution.

Table 4-1. Average bit-rate saving of the proposed quantizers over other quantizers.
                      UPQ      PQ       Rectangular
Circular Gaussian     0.36%    6.78%    3.22%
Elliptical Gaussian   22.4%    —        16.9%
Circular Laplace      0.94%    24.5%    5.62%
Elliptical Laplace    19.8%    —        6.32%

CHAPTER 5
CONTENT BASED IMAGE HASHING

5.1 Research Background

The number of digital images is increasing exponentially due to the proliferation of digital cameras and image applications. This huge number of digital images requires efficient classification and retrieval [43]. Digital images facilitate multimedia processing but, at the same time, make fabricating and copying digital content easy. To protect the copyright of images, efficient and automatic techniques are needed to identify and verify the content of digital multimedia. Besides the long-established copyright protection tool of image watermarking [154], image hashing has emerged as an effective tool to represent images and automatically identify whether a query image is a fabrication or a copy of the original.

As an alternative to image watermarking, image hashing can serve many applications previously handled by watermarking, such as copyright protection and image authentication. It can also be used for image indexing and retrieval as well as video signatures. Unlike watermarking, image hashing need not alter the image by embedding watermarks. An image hash is a short binary string mapped from an image by an image hash function. The image hash function has the property that perceptually identical images have the same or similar hash values with high probability, while perceptually different images have quite different hash values. In addition, the hash function should be secure, so that an attacker cannot predict the hash value of a known image.

Many techniques for image hashing have been proposed in the literature. These algorithms are typically based on statistics [147], relations [86], low-level image features [13, 99], non-negative matrix factorizations [100], and so on. Fridrich and Goljan [55] proposed a robust visual hashing method, whose hash digests of digital images are created by projecting DCT coefficients onto zero-mean random smooth patterns generated using a secret key. Roy et al. [120] proposed content-hashing, which consists of a compact representation of some image features. The resulting image hash is robust to image filtering, but surrenders to geometric attacks and may not be collision free. Image hashes based on the Scale Invariant Feature Transform (SIFT) algorithm [48] and compressive sensing techniques [138] can handle geometric attacks to a certain degree, but are computationally expensive. Lin and Chang [86] used the mutual relationship of pairwise block DCT coefficients to distinguish JPEG compression from malicious modifications, but the block based method is unreliable, since possible shifting and cropping operations may change the hash values. Venkatesan et al. [147] proposed an image hashing technique whose hashes are generated from statistical features extracted from random tilings of wavelet coefficients. However, it allows only limited resistance to geometric distortions, and is susceptible to some manipulations, such as luminance change and object insertion.

To address these problems, we propose content based image hashing using companding and Gray code. Content based image hashing is more robust to content-preserving image processing, such as geometric and luminance attacks, and more sensitive to malicious content tampering, than statistics based image hashing. Our method combines a robust feature point detector with a robust content singularity descriptor at the detected feature points. The feature points are chosen from cross points of lines with the k-largest local total variations [8] in images. The local total variations are determined by the structure of images and are robust to geometric and luminance attacks. Therefore, the method obtains stable, similar feature points in perceptually identical images and is resistant to content-preserving attacks. The Morlet wavelet [14] is good at describing the singularity of signals, so the Morlet wavelet coefficients at the feature points are used to represent images. The Morlet wavelet coefficients are pseudo-randomly permuted with a secret key, which enhances the security of the image hashing system. The coefficients are then efficiently quantized using the companding technique according to their probability distribution; thus, the proposed image hashing is robust to contrast change and gamma correction of images. Gray code is used to binarize the quantized coefficients, which increases the discriminability of image hashes.

The rest of the chapter is organized as follows. In Section 5.2, we present an overview of the proposed image hashing system. In Section 5.3, we describe how to extract the robust feature of images, i.e., the Morlet wavelet coefficients at the feature points with the k-largest local total variations. The Morlet wavelet coefficients are then quantized and binarized with Gray code, as shown in Section 5.4. Section 5.5 shows the experimental results that demonstrate the effectiveness and robustness of the proposed image hashing system. Finally, we conclude this chapter in Section 5.6.

5.2 System Overview

Image hashes should have small collision probability and high discriminability. From two input images I and I′, an image hashing system φ extracts two corresponding binary hashes h and h′ using a secret key K, as shown in Equation (5–1). The distance function, such as the normalized Hamming distance, is denoted by d(·,·), and the discriminative thresholds are denoted by ζ1 and ζ2.

h = φ(I, K),    h′ = φ(I′, K)    (5–1)

For the design of an image hashing system, three objectives should be considered.

1. ∀ I, I′, if I ≠ I′ then d(h, h′) ≥ ζ1;

2. ∀ I, I′, if I = I′ then d(h, h′) < ζ2;

3. ∀ I, P(h(i) = 1) = P(h(i) = 0) = 0.5, where h(i) is the i-th element of hash h, and P(h(i) = j) is the probability that h(i) = j (j = 0 or 1).

The first objective indicates that distances between different images should be larger than a threshold ζ1, which guarantees the discriminability of the image hashing system. The second objective implies that distances between similar images should be smaller than a threshold ζ2, where ζ2 ≤ ζ1, which ensures the robustness of image hashing under intentional or unintentional attacks. For similar images, it is expected that the image hash is able to discriminate images under intentional and unintentional attacks using a threshold, for image authentication purposes. The third objective provides the unpredictability of image hashes, whose binary values are distributed with equal probability.

Our proposed image hashing system is shown in Figure 5-1. First, feature points in images are extracted. The feature points are expected to be similar for similar images, such that distances between hashes of similar images are small and image hashes are robust against perceptually preserving attacks. We extract feature points with the k-largest local total variations, which capture the structure of images. Second, we obtain the Morlet wavelet coefficients at the feature points to describe the degree of singularity at these points. Third, pseudo-random permutation of the Morlet wavelet coefficients with a secret key increases the security and reduces the collision probability of image hashes. Fourth, the coefficients are quantized with the companding technique and binarized with Gray code to form the final image hashes. Inverse error correction coding to compress image hashes is optional in our system.

5.3 Robust Descriptor of Images

Most information in signals is conveyed by their irregular structures and transient phenomena. Feature points such as corners are salient content descriptors of images. There are three stages in extracting the robust descriptor of images in our proposed method:

• Image preprocessing;

• Finding the locations of feature points;

• Evaluating the singularity of image signals at feature points by continuous wavelet transform.

5.3.1 Preprocessing

Preprocessing changes image pixels and may influence the detection and description of feature points. We try to avoid any changes to images and to extract the original information from them. Therefore, the only preprocessing in our method is resizing images to the same size to facilitate later algorithm steps.

5.3.2 Feature Point Extraction

For different applications and corresponding performance requirements, different techniques to extract feature points have been explored in the literature [146]. Since image hashes should be invariant to content-preserving processing, robust and repeatable feature point detectors with low computational cost are desired. Jaroslav et al. [74] proposed a feature point detector for blurred images, which we call BFP in this chapter. BFP can yield a high repetition rate on differently distorted images. We propose a more robust feature point detector based on BFP. BFP efficiently detects points that belong to two edges regardless of their orientations. It selects points with the k-largest local variances. The local variance (LV) is defined on an image block in Equation (5–2).

LV = ∑_{X∈Ω} (I(X) − Ī_Ω)²    (5–2)

where Ω is the image block centered at a feature point, X is a vector representing the pixel coordinates, I(X) is the pixel value, and Ī_Ω is the mean of the pixel values in the block. LV depends on pixel values and is thus easily changed by any image processing. Therefore, we propose to select feature points with the k-largest local total variations (LTV) [8]. LTV is defined as:

LTV = ∑_{X∈Ω} |I′(X)|₂    (5–3)

where Ω is the image block centered at the current feature point, and I′(X) is the gradient of the pixel values at coordinate X = (x₁, x₂):

|I′(X)| = √( (∂I(X)/∂x₁)² + (∂I(X)/∂x₂)² )    (5–4)

LTV depends on the local structure of images. It is robust against content-preserving image processing.

Therefore, our modified feature point extraction algorithm is more robust than BFP.

We use this method to determine the coordinates of the most salient feature points with the k-largest local total variations in images, as shown in Figure 5-3.
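To make the computation concrete, the following is a minimal numpy sketch of LTV-based feature point selection, assuming 15×15 blocks (half-width 7) and a precomputed list of candidate cross points; the function names and block handling are our illustrative choices, not from the dissertation.

import numpy as np

def local_total_variation(image, center, half=7):
    # Extract the (2*half+1) x (2*half+1) block Omega around the candidate point.
    r, c = center
    block = image[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    # Gradient magnitude |I'(X)| of Equation (5-4), via finite differences.
    gy, gx = np.gradient(block)
    # LTV of Equation (5-3): sum of gradient magnitudes over the block.
    return np.sqrt(gx ** 2 + gy ** 2).sum()

def k_largest_ltv_points(image, candidates, k=40):
    # Rank the candidate points by LTV and keep the k largest.
    scores = np.array([local_total_variation(image, p) for p in candidates])
    return [candidates[i] for i in np.argsort(scores)[::-1][:k]]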

Feature points are extracted with a high repetition rate. The coordinates of feature points are not invariant to geometric transforms of images, but the singularity at the feature points is. Therefore, after locating the most salient feature points in images, we use the Morlet wavelet to evaluate the degree of singularity at the feature points.

5.3.3 Feature Point Description

Harmonic analysis [73] of signals can detect and locate the singularity of signals. Wavelet bases have good localization ability in both the time and frequency domains; therefore, they can locate and characterize the singularity of signals very well. The local singularity of a function is measured mathematically by the Lipschitz exponent. A function f(x) has a singularity of Lipschitz exponent α at point x₀ if and only if there exists a constant A such that all points x in a neighborhood of x₀ satisfy |f(x) − f(x₀)| ≤ A|x − x₀|^α. The wavelet coefficient Wf(s, x₀) of f(x) at x₀ and scale s is related to the Lipschitz exponent α as shown in Equation (5–5):

|Wf(s, x₀)| ≤ A_ε s^α    (5–5)

where A_ε is a constant.

The continuous wavelet transform [91] is designed to detect the singularity of signals better than the discrete wavelet transform. The locations of singularity found by the continuous wavelet transform [14] may be influenced by noise in images. False positives may occur at points that are not corners but are close to straight lines; false negatives may occur at points that are corners but have small gray-level variation. The degree of singularity, however, is less influenced. Therefore, based on the robustly extracted feature points, we calculate the continuous wavelet coefficients row-by-row and column-by-column, and use the magnitudes of the coefficients to represent the feature points.

The Morlet wavelet is a continuous wavelet, a sine-modulated Gaussian with a single center frequency. It is used to detect linear structures perpendicular to the orientation of the wavelet. The 2D Morlet wavelet is defined as

φ_M(X) = (e^{iK₀·X} − e^{−|K₀|²/2}) e^{−|X|²/2}    (5–6)

where X = (x₁, x₂) is the 2D spatial coordinate, and K₀ = (k₁, k₂) is the wave-vector of the mother wavelet, which determines the scale-resolving power and angular resolving power of the wavelet.
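A minimal sketch of sampling the 2D Morlet wavelet of Equation (5–6) on a discrete grid follows; the grid size, scale handling, and default wave-vector are illustrative assumptions, not values from the dissertation.

import numpy as np

def morlet_2d(size=33, k0=(0.0, 5.5), scale=8.0):
    # Sample phi_M(X) of Equation (5-6) on a size x size grid dilated by `scale`.
    half = size // 2
    x1, x2 = np.meshgrid(np.arange(-half, half + 1) / scale,
                         np.arange(-half, half + 1) / scale)
    k1, k2 = k0
    carrier = np.exp(1j * (k1 * x1 + k2 * x2))        # e^{i K0 . X}
    correction = np.exp(-0.5 * (k1 ** 2 + k2 ** 2))   # e^{-|K0|^2/2}, removes DC
    envelope = np.exp(-0.5 * (x1 ** 2 + x2 ** 2))     # Gaussian window
    return (carrier - correction) * envelope

The coefficient magnitude at a feature point can then be taken as the absolute value of the correlation between this kernel and the local image patch.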

Because the directions of the strongest responses of the Morlet wavelet filter at feature points may be perturbed by noise, only the horizontal and vertical directions of the Morlet wavelet are considered. Although the magnitudes of the Morlet wavelet coefficients in the horizontal and vertical directions will be slightly perturbed by a small rotation of the image, they are normalized in the later quantization step.

5.4 Hash Generation

After extracting the salient features in images, we further generate binary hash sequences from the obtained Morlet wavelet coefficients in this section.

5.4.1 Pseudo Random Permutation of Morlet Wavelet Coefficients

To enhance the security of image hashes, i.e., to prevent forged inputs designed by an adversary from resulting in the same hashes, we use a secret key K to pseudo-randomly permute the Morlet wavelet coefficients. The random permutation can also decrease the collision probability for different inputs when different secret keys are used.

5.4.2 Quantization Using Companding

Quantization using companding is efficient and unbiased for coefficients with different probabilities. Quantization obtains a discrete representation of the image hash, normalizes the range of the output hash, and can weight different parts of hashes with different values. Vector quantization with the Lloyd-Max algorithm is classical, but it depends on the initial configuration and is computationally expensive. Therefore, we propose to use the companding technique [61] to quantize the floating-point Morlet wavelet coefficients to finite-level binary codes. The algorithm of companding for discrete values is similar to histogram equalization; its computational complexity is O(n), where n is the number of coefficients. Quantization using the companding technique assumes that the distributions of the Morlet wavelet coefficients of similar images have similar shapes, which agrees with our observations. It is a kind of probabilistic quantization: it tries to be fair to every coefficient, i.e., coefficient values with large probability are quantized with small step sizes, while coefficient values with small probability are quantized with large step sizes. A compander consists of a compressor, a uniform quantizer, and an expander. The compressor is a nonlinear transformation designed to convert the coefficient distribution into the uniform distribution. The expander is the inverse of the compressor; it is used for recovery of the original coefficients and is thus disregarded in our image hashing system. Using the companding technique, we quantize the data into L levels. The coefficient probability of each level is the same, i.e., 1/L. L should be 2^m (m ∈ Z⁺) for easy binarization of the coefficients.
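A minimal sketch of such a companding quantizer follows; for brevity it realizes the compressor with a rank transform (the empirical CDF) rather than a running histogram, so this version costs O(n log n) instead of the O(n) histogram variant described above.

import numpy as np

def compand_quantize(coefficients, L=32):
    # Compressor: map coefficients through their empirical CDF so the result
    # is approximately uniform on [0, 1), like histogram equalization.
    coefficients = np.asarray(coefficients, dtype=float)
    ranks = np.argsort(np.argsort(coefficients))
    uniform = (ranks + 0.5) / len(coefficients)
    # Uniform quantizer: L levels, each holding probability roughly 1/L.
    return np.minimum((uniform * L).astype(int), L - 1)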

5.4.3 Binary Coding Using Gray Code

We propose to use Gray code [122] to code the quantized coefficients. Gray code, also known as the reflected binary code, is a binary code in which two successive values differ in only one bit. Thus the Hamming distances between successive values are 1, and the Hamming distances between nonsuccessive values are proportional to their differences, which does not hold for the ordinary binary code. In this way, the distances between similar images decrease and those between different images increase, which helps increase the discriminability of the system. Since each codeword is 5 bits long in our experiments, a 32-entry byte array is used as a lookup table for constructing the hash with Gray code. For arbitrary lengths, Gray code may be constructed recursively.
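A small sketch of the Gray coding step, using the standard bitwise construction g = q XOR (q >> 1) rather than an explicit lookup table:

def gray_bits(q, width=5):
    # Reflected binary (Gray) codeword of a quantization index q; for L = 32
    # levels, width = 5 bits. Successive indices differ in exactly one bit.
    return format(q ^ (q >> 1), '0{}b'.format(width))

# Example: gray_bits(11) -> '01110', gray_bits(12) -> '01010' (one bit apart).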

5.5 Experimental Results

In our experiments, the image block Ω is 15×15 for calculating LTV; the 40 points with the largest LTVs are chosen as feature points; the quantization level L is 32, i.e., 2^5; and the length of the image hash N is 200.

The distances between different hashes are evaluated by the normalized Hamming distance:

d(h, h′) = (1/N) ∑_{i=1}^{N} δ(h(i), h′(i))    (5–7)

where

δ(h(i), h′(i)) = 1 if h(i) ≠ h′(i), and 0 if h(i) = h′(i)    (5–8)

h and h′ are two hash vectors, their i-th values are denoted h(i) and h′(i), and N is the length of an image hash.
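As a small illustration, Equation (5–7) amounts to the fraction of differing bits:

import numpy as np

def normalized_hamming(h1, h2):
    # d(h, h') of Equation (5-7): mean of the bit-wise disagreement indicator.
    h1, h2 = np.asarray(h1), np.asarray(h2)
    return float(np.mean(h1 != h2))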

These parameters could be tuned for better performance in specific applications.

5.5.1 The Robustness of Feature Point Detector

The robustness of the proposed feature point detector is shown in Figure 5-3. The image 'Lena' is tampered, rotated, and contrast-enhanced with histogram equalization, as shown in Figures 5-3(B), (C), and (D), respectively. In Figure 5-3, the extracted feature points are denoted by red 'o'. The feature points extracted from the original image, the tampered image, the rotated image, and the image after histogram equalization are almost the same, indicating a high correct detection rate of the proposed feature point detector. Hence, the proposed feature point detector is robust against tampering, rotation, and histogram equalization.

5.5.2 Parameter Determination of Singularity Descriptor

Besides its elegance, the reason we use the Morlet wavelet is its strong discriminability in the proposed image hashing system, which we show in this subsection.

The optimal scale of the Morlet wavelet is determined using 24 frames from two shots of the video sequence big buck bunny 480p h264.mov [3]. Each shot has 12 frames. The reference image, selected at random, is the 7th frame in the first shot. The 7th and 17th of the 24 frames are shown in Figure 5-2. The frames in the first shot are similar to Figure 5-2(A), and the frames in the second shot are similar to Figure 5-2(B).

From Figure 5-4, we can see that the distance between the 7th frame and the reference image, i.e., itself, is 0. The distances between similar frames in the same shot are much smaller than the distances between different frames in different shots.

Figure 5-4A compares the discriminability of image hashes with the Morlet wavelet at different scales. The gaps between the average distances of the first shot and the average distances of the second shot are 0.2342, 0.2704, 0.2628, and 0.2645 at scales 6, 8, 10, and 12, respectively. The largest gap, 0.2704 at scale 8, indicates that the Morlet wavelet has the strongest discriminability at scale 8 for video signals. Figure 5-4B compares the discriminability of image hashes with different wavelets at scale 8. The gaps between the average distances of the first shot and the average distances of the second shot are 0.2704, 0.2335, 0.2083, and 0.2582 for the Morlet wavelet, Spline wavelet, Haar wavelet, and Symmetric wavelet, respectively. The Morlet wavelet has the strongest discriminability.

5.5.3 Discriminability and Robustness of Image Hashes

We also test the robustness of the proposed image hashing system on natural images under various attacks. The six test images are the 512×512 gray images shown in Figure 5-5.

5.5.3.1 Discriminability between different images

The normalized Hamming distances between the six test images are shown in Table 5-2. They are relatively large, indicating that the discriminability of the proposed image hashing system is good.

5.5.3.2 Non-predictability of image hashes

A desirable property of an image hash function is that the distance between the hash of an image and any random sequence with the same length is large. If this property is achieved, it is unlikely for an attacker to generate an imposter of the hash of an image by using a random sequence.

We evaluate the distance between an image hash and a random binary sequence.

We generate 50 different random sequences; each random sequence is a binary random vector with probability p(0) = p(1) = 0.5 and has the same length as the image hash. The normalized Hamming distances between the hash of Lena and the 50 different random sequences are illustrated in Figure 5-6. The distances are relatively large and constant, which implies the non-predictability of image hashes.

5.5.3.3 Robustness to content-preserving attacks

We apply the following types of attacks to these images: scaling images to 0.5 and 1.5 of their sizes, compressing images using JPEG with quality factor 50 [116], rotating images by 5 degrees, cropping 20% of images, adding white Gaussian noise (variance σ² = 20) to images, and filtering images with Gaussian and median filters. The normalized Hamming distances between the attacked images and the original test images are listed in Table 5-3. The hash distances are small under these content-preserving image processing operations, showing the robustness of the proposed image hashing system.

5.5.3.4 Robustness to tampering

We make several tampered versions of the image 'Lena', as shown in Figure 5-7; the tampered areas are indicated by red rectangles. The normalized Hamming distances between the hash of Lena and those of the tampered versions are shown in Table 5-1. These distances are larger than the normalized Hamming distances between the hash of Lena and those of the content-preserving processed versions of Lena, and smaller than the normalized Hamming distances between the hashes of different images. Thus, the proposed image hashing system has the ability to identify tampering.

5.5.3.5 Discriminative thresholds

Based on the experiments above, the discriminative thresholds in our system are determined as ζ1 = ζ2 = 0.26. With these thresholds, the false positive and false negative rates on the test images are both 0.

5.6 Summary

In this chapter, we proposed a new method to generate robust image hashes. Feature points are extracted from images with the k-largest local total variations, and Morlet wavelet coefficients are calculated at the feature points. The coefficients are pseudo-randomly permuted, quantized with the companding technique, and binarized with Gray code. The generated image hashes are robust to content-preserving image processing. The normalized Hamming distances between the hash of Lena and those of the tampered versions of Lena are larger than the normalized Hamming distances between the hash of Lena and those of the content-preserving processed versions, and smaller than the normalized Hamming distances between the hashes of different images. Our future research will explore applications to image authentication and video signatures, since our proposed image hash has good discriminability between different images and different video shots, and a strong ability to recognize similar images.

Figure 5-1. Flow chart of image hash generation. (Image → Extract Feature Points → Obtain Morlet Wavelet Coefficients at Feature Points → Pseudo-Randomly Permute Morlet Wavelet Coefficients → Quantize Coefficients Using Companding → Binary Coding with Gray Code (Inverse ECC Compression) → Image Hash.)

Figure 5-2. The 7th frame and the 17th frame in the test video Bunny. (A) The 7th frame. (B) The 17th frame.

Figure 5-3. The stable feature point detector. (A) The original image and extracted feature points. (B) The hat-tampered image and extracted feature points. (C) The rotated image and extracted feature points. (D) The image after histogram equalization and extracted feature points.

Figure 5-4. Hash distance with different wavelets at different scales. (A) Hash distance with the Morlet wavelet at scales 6, 8, 10, and 12. (B) Hash distance with the Morlet, Spline, Haar, and Symmetric wavelets at scale 8. (Axes: frame index versus Hamming distance between image hashes.)

Figure 5-5. Six test images for image hashing. (A) Lena. (B) Barbara. (C) Boat. (D) Mandrill. (E) Jet. (F) Pepper.

Figure 5-6. Distance between the hash of Lena and the n-th random sequence (n = 1, 2, ..., 50).

Figure 5-7. Six tampered images of Lena. (A) Tamper 1. (B) Tamper 2. (C) Tamper 3. (D) Tamper 4. (E) Tamper 5. (F) Tamper 6.

Table 5-1. Hamming distance between Lena and its tampered versions.
          (Lena, Tamper 1)  (Lena, Tamper 2)  (Lena, Tamper 3)
Distance  0.28              0.32              0.33
          (Lena, Tamper 4)  (Lena, Tamper 5)  (Lena, Tamper 6)
Distance  0.26              0.35              0.29

Table 5-2. Hamming distance between different test images.
          (Lena, Barbara)   (Lena, Boat)        (Lena, Mandrill)
Distance  0.44              0.39                0.42
          (Barbara, Boat)   (Barbara, Mandrill) (Barbara, Jet)
Distance  0.39              0.48                0.41
          (Boat, Jet)       (Boat, Pepper)      (Mandrill, Jet)
Distance  0.37              0.37                0.41
          (Lena, Jet)       (Lena, Pepper)      (Barbara, Pepper)
Distance  0.35              0.39                0.39
          (Boat, Mandrill)  (Mandrill, Pepper)  (Jet, Pepper)
Distance  0.41              0.40                0.36

Table 5-3. Hamming distance between attacked images and test images.
Attacks             Lena  Barbara  Boat  Mandrill  Jet   Pepper
Scale 0.5           0.05  0.03     0.25  0.08      0.25  0.03
Scale 1.5           0.03  0.03     0.05  0.08      0.05  0.03
JPEG 50             0.05  0.30     0.15  0.03      0.03  0.03
Rotate 5°           0.03  0.34     0.25  0.03      0.23  0.17
Crop 20%            0.03  0.33     0.25  0.08      0.25  0.25
AWGN σ²=20          0.03  0.12     0.03  0.03      0.04  0.03
Gaussian Filtering  0.12  0.03     0.03  0.12      0.06  0.06
Median Filtering    0.06  0.12     0.25  0.12      0.03  0.15

CHAPTER 6
CONTENT BASED IMAGE AUTHENTICATION

6.1 Research Background

Digital images have become an important part of our daily lives due to the rapid growth of the Internet and the increasing demand for multimedia content. The soaring number of image applications facilitates image processing and, at the same time, makes fabricating and copying of digital content easy, casting doubt on digital images used as evidence in court. Therefore, efficient and automatic techniques are desired to identify and verify the content of digital images. Image authentication is such a promising technique: it automatically identifies whether a query image is a different image, a fabrication, or a simple copy of an anchor image. Here, the anchor image is the ground-truth or original image used as the authentication reference, and the query image is the one under suspicion.

Image authentication techniques usually include conventional cryptography, fragile and semi-fragile watermarking, digital signatures, and so on. The authentication process can be assisted by the original image or operate in its absence. Image authentication methods based on cryptography use a hash function [79, 130] to compute a message authentication code (MAC) from images. The generated hash is further encrypted with a secret key from the sender, and then appended to the image as an overhead, which is easily removed. Fragile watermarking usually refers to reversible data hiding [23, 140, 160, 162]: a watermark is embedded into an image in a reversible and unnoticeable way. If the original image is reconstructed and the embedded message is recovered exactly, then the image is declared authentic. Conventional cryptography and reversible watermarking can guarantee the integrity of images, but they are vulnerable to any change: a version of the image differing in a single bit will be treated as a totally different image. These methods cannot distinguish tolerable changes from malicious ones. Semi-fragile watermarking has attack-resistance between fragile and robust watermarking, with the ability to identify tampering. Fridrich [53, 54] proposed block Discrete Cosine Transform (DCT) based methods to identify tampered areas, but block based methods are susceptible to translation and cropping attacks. Besides, semi-fragile watermarking techniques change the pixel values and degrade the image quality once the watermarks are embedded, which is undesirable, and there is a trade-off between image quality and watermark robustness. Digital signature based techniques are image content dependent and are also called image hashing. An image hash is a representation of the image; besides image authentication, it can also be used for image retrieval and other applications. Kozat et al. [80] proposed an image hash technique based on Singular Value Decomposition (SVD). It is assumed that the singular values are robust to general image processing, but not to malicious image tampering. It achieves a high probability of detecting a tampered image at the cost of a high false alarm probability. Venkatesan et al. [147] developed an image hash based on a statistical property of wavelet coefficients, which is invariant to content-preserving modifications of images, but it is not intended to identify the locations of changes.

The image authentication system proposed by Monga et al. [101] is based on feature points of images. The system is not sufficiently robust due to outlier feature points produced by image processing, although the Hausdorff distance is used to evaluate the distances between feature points. Monga et al. [99] also proposed a perceptual image hashing in which the extracted features are the quantized magnitudes of the Morlet wavelet coefficients at feature points. Although the distribution of these magnitudes may be preserved under perceptually insignificant distortions, the location information is lost.

In this chapter, we propose a perceptual image authentication technique based on clustering and matching of feature points of images to address the limitations of the aforementioned schemes. Feature points are first generated from a given image, but their locations may be changed by possible image processing and degradation. Accordingly, we propose to use the Fuzzy C-means clustering algorithm to cluster the feature points and remove the outliers. Meanwhile, the feature points in the query image and the anchor image are matched into pairs in zigzag ordering along the diagonals of the images, cluster by cluster. Three types of distance are used to measure the distances between the matched feature point pairs. A histogram weighted distance is proposed, which is equivalent to the Hausdorff distance after outlier removal. The authenticity of the query image is determined by a majority vote on whether the three types of distance between matched feature point pairs are larger than their respective thresholds. The geometric transforms that align the query images with the anchor images are estimated, and the query images are registered accordingly. Moreover, possibly tampered image blocks are identified, and the percentage of the tampered area is estimated.

The rest of the chapter is organized as follows. Section 6.2 presents an overview of the proposed image authentication system. Section 6.3 describes how to detect feature points in images. In Section 6.4, we propose an efficient and effective algorithm to remove outlier feature points; the remaining feature points are ordered and matched into pairs. In Section 6.5, the histogram weighted distance is proposed, the normalized Euclidean distance and Hausdorff distance are also used, and a majority voting strategy determines the authenticity of images. In Section 6.6, possible attacks are identified, the query images are registered, tampered image blocks are located, and the percentage of the tampered area is estimated. Experimental results are shown in Section 6.7. Finally, Section 6.8 concludes the chapter.

6.2 System Overview

The services provided by the proposed image authentication system include:

• Identify a query image as a similar image, or a tampered image, or a different image, with regard to an anchor image;

• Evaluate similarity of two images by the distance between them;

• Identify and locate three types of tampered area, i.e., added area, removed area, changed area;

• Estimate the percentage of tampered area.

The flowchart of the proposed image authentication system is shown in Figure 6-1. First, feature points are extracted from the anchor image and the query image with the k-largest local total variations. Second, the feature points are clustered, outliers are removed, and corresponding feature point pairs in the anchor and query images are zigzag-aligned along the diagonals of the images. Third, the histogram weighted distance is proposed; three types of distance between the two images are evaluated and compared with thresholds. A low miss rate of authentication is desired in our system, so a majority voting strategy is used to make the authentication decision: if at least two distances are greater than their thresholds, the two images are declared different; otherwise, the two images are declared similar and passed on for further examination. Fourth, if the two images are considered similar, the possible attacks on the query image, i.e., geometric attacks and tampering attacks, are subject to detection. The query image is further registered, and the locations and percentage of the tampered area are estimated.

6.3 Feature Point Detection

Feature points are geometric descriptors of the contents of images. Most information in signals is conveyed by their irregular structures and transient phenomena, and feature points such as corners can be used to characterize the saliency of images. Feature point based descriptors are more robust to geometric attacks than statistics based descriptors. Feature points are also useful for registration and for identifying possible underlying attacks (geometric or non-geometric) on query images.

6.3.1 Preprocessing

Preprocessing changes image pixels and may influence the detection and description of feature points. To extract the original information from the query image, we keep the query image intact except for adapting its size to the size of the anchor image.

6.3.2 Feature Point Extraction

For different applications, different techniques to extract feature points have been explored in the literature [146]. Since image authentication needs to be invariant to content-preserving processing, robust and repeatable feature point detectors with small computational overhead are desired. Jaroslav et al. [74] proposed a feature point detector for blurred images, which we call BFP in this chapter. Here, a more robust feature point detector is proposed based on BFP. BFP is intended to efficiently detect points that belong to two edges regardless of their orientations. It selects points with the k-largest local variances. The local variance (LV) is defined on an image block in Equation (6–1).

LV = ∑_{X∈Ω} (I(X) − Ī_Ω)²    (6–1)

where Ω is the image block centered at the current feature point, X is a vector representing the pixel coordinates, I(X) is the pixel value at X, and Ī_Ω is the mean of the pixel values in the block. LV depends on pixel values and is thus easily changed by any image processing.

Therefore, we propose to select feature points with the k-largest local total variations (LTV) [8]. LTV is defined as:

LTV = ∑_{X∈Ω} |I′(X)|₂    (6–2)

where Ω is the image block centered at the current feature point, and I′(X) is the gradient of the image at coordinate X = (x₁, x₂):

|I′(X)| = √( (∂I(X)/∂x₁)² + (∂I(X)/∂x₂)² )    (6–3)

LTV depends on the local structure of images and is more robust against content-preserving image processing than LV.

Therefore, our proposed feature point extraction algorithm is more robust than BFP.

We use this method to determine the coordinates of the most salient feature points with the k-largest local total variations in images, as shown in Figure 6-2.

6.4 Feature Point Clustering and Matching

Due to possible changes applied to the query image, such as luminance changes and geometric transforms, the extracted feature points of the query image differ from those of the anchor image, whether or not the two images are similar. The possibly missing, emerging, and moving feature points may defeat image authentication. If the two images are similar, such feature point changes in the query image may enlarge their distance and affect the similarity measure. If the query image and the anchor image are totally different, the changes of feature points in the query image may decrease the distance between the two different images and degrade the discriminability of the system. Besides, feature point matching between the anchor image and the query image is needed for distance evaluation. Therefore, to improve the performance of the system, the following clustering process is critical for removing outliers and matching feature points into pairs in a certain spatial ordering. We propose to use Fuzzy C-means clustering to implement outlier removal and feature point matching in one pass.

6.4.1 Clustering by Fuzzy C-Means

The Fuzzy C-means clustering algorithm is used to cluster the feature points. Fuzzy C-means clustering, developed by Dunn [52] in 1973 and improved by Bezdek [12] in 1981, is based on minimization of the following objective function:

J_m = ∑_{i=1}^{N} ∑_{j=1}^{C} u_{ij}^m ‖x_i − c_j‖²    (6–4)

where 1 ≤ m < ∞, u_{ij} is the degree of membership of x_i in cluster j, x_i is the i-th feature point, c_j is the center of cluster j, ‖·‖ is any norm evaluating the distance between a feature point and a center, N is the number of samples, and C is the number of clusters. The membership degrees u_{ij} and the cluster centers c_j are updated by:

u_{ij} = 1 / ∑_{k=1}^{C} ( ‖x_i − c_j‖ / ‖x_i − c_k‖ )^{2/(m−1)}    (6–5)

c_j = ∑_{i=1}^{N} u_{ij}^m x_i / ∑_{i=1}^{N} u_{ij}^m    (6–6)
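A minimal numpy sketch of the alternating updates of Equations (6–5) and (6–6) follows; the fuzzifier m, iteration count, and random initialization are illustrative assumptions.

import numpy as np

def fuzzy_c_means(points, C, m=2.0, iters=100, seed=0):
    # points: (N, 2) array of feature point coordinates.
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((len(points), C))
    u /= u.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(iters):
        um = u ** m
        centers = um.T @ points / um.sum(axis=0)[:, None]        # Equation (6-6)
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)               # avoid division by zero
        u = d ** (-2.0 / (m - 1.0))            # Equation (6-5), unnormalized
        u /= u.sum(axis=1, keepdims=True)
    return u, centers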

6.4.2 Outlier Removal

The outliers are defined as the extra points left unmatched in corresponding clusters of the query image and the anchor image. For example, if there are n feature points in cluster j of the anchor image and n + 1 feature points in the corresponding cluster j′ of the query image, then the one extra emerging feature point in cluster j′ of the query image with the least degree of membership is regarded as an outlier, and vice versa. Like noise, these points should not be considered in the measurement of the distance between the anchor image and the query image, or in the registration of the query image. If the number of outliers in a cluster is greater than a threshold, the cluster is declared 'tampered'. If an image has at least one tampered cluster, the image is declared 'tampered'. The locations of the outliers are used to determine the locations of the tampered area.

6.4.3 Spatial Ordering and Feature Point Matching

After outlier removal, the numbers of remaining feature points in corresponding clusters of the query image and the anchor image are the same. The feature point matching algorithm processes feature points cluster by cluster. In each cluster, the feature points of the two images are ordered zigzag along the diagonals of the images. The proposed feature point matching algorithm may not result in exact pairs between feature points, but it is sub-optimal and very fast.

Given N feature points in the query image, finding the corresponding N feature points in the anchor image by exhaustive search incurs a computational complexity of O(N!). In contrast, the computational complexity of our proposed feature point matching algorithm is O(N log n), where n is the average number of feature points per cluster. Assume there are n feature points per cluster on average; thus, there are N/n clusters, and for each cluster the computation of ordering is O(n log n). Hence, after clustering and outlier removal, the computational complexity of feature point matching reduces to O(N log n) through cluster ordering and spatial ordering.

Spatial matching by diagonal ordering is superior to raster ordering in terms of correct matching rate under the perturbation of possible attacks. The proposed matching algorithm is robust to outliers and to the cases where feature points are removed, emerge, or change their locations due to possible noise or attacks. It increases the similarity measure of similar images, and increases the distance between two different images.

6.4.4 Algorithm Summary

The complete clustering and matching procedure is summarized in Algorithm 4.

6.5 Distance Evaluation

Three types of distance are used to evaluate the distances between images, among which the histogram weighted distance is newly proposed. If at least two types of distance are larger than their corresponding thresholds, the two images are considered different; otherwise they are considered similar. The thresholds are obtained by statistical experiments.

Algorithm 4 Feature Point Clustering and Matching
For feature point set X_A in the anchor image and feature point set X_Q in the query image:

1. Perform fuzzy C-means clustering on X_A and X_Q, which are clustered into clusters X_Aj and X_Qj (j = 1, ..., C), where C is the number of clusters.

2. For each cluster j (j = 1, ..., C) do:

Order the feature points in X_Aj and X_Qj according to their coordinates (x₁, x₂) in zigzag ordering along the diagonals of the images, i.e., order the feature points with respect to (x₁ + x₂).

(a) If length(X_Aj) = length(X_Qj), match (X_Aj(i), X_Qj(i)) into pairs, where X_Aj(i) is the i-th feature point in the j-th cluster of the anchor image and X_Qj(i) is the i-th feature point in the j-th cluster of the query image.

(b) If length(X_Aj) > length(X_Qj), then for each feature point X_Qj(i) in X_Qj, sequentially find the closest unmatched feature point X_Aj(i′) in X_Aj. For pairs (X_Aj(i′₁), X_Qj(i₁)) and (X_Aj(i′₂), X_Qj(i₂)), if i₁ > i₂ then i′₁ > i′₂. The remaining unmatched feature points in X_Aj are considered outliers of X_Aj.

(c) If length(X_Aj) < length(X_Qj), then for each feature point X_Aj(i) in X_Aj, sequentially find the closest unmatched feature point X_Qj(i′) in X_Qj. For pairs (X_Aj(i₁), X_Qj(i′₁)) and (X_Aj(i₂), X_Qj(i′₂)), if i₁ > i₂ then i′₁ > i′₂. The remaining unmatched feature points in X_Qj are considered outliers of X_Qj.
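A sketch of the zigzag ordering and the equal-size pairing of case (a) follows; the unequal-size cases (b) and (c) would additionally track the unmatched points as outliers.

import numpy as np

def zigzag_order(points):
    # Order feature points along the image diagonals, i.e., by x1 + x2.
    points = np.asarray(points)
    return points[np.argsort(points[:, 0] + points[:, 1])]

def match_equal_cluster(anchor_pts, query_pts):
    # Case (a) of Algorithm 4: clusters of equal size are paired index-wise
    # after zigzag ordering.
    return list(zip(zigzag_order(anchor_pts), zigzag_order(query_pts)))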

6.5.1 Normalized Euclidean Distance

The first type of distance is the normalized Euclidean distance between the matched feature point pairs, which is given by:

E(X_A, X_Q) = (1/N) ∑_{i=1}^{N} ‖X_A^{(i)} − X_Q^{(i)}‖_E    (6–7)

where N is the number of feature point pairs, X_A^{(i)} is the coordinate of the corresponding i-th feature point in the anchor image, X_Q^{(i)} is the coordinate of the i-th feature point in the query image, and ‖·‖_E is the Euclidean norm.

6.5.2 Hausdorff Distance

The Hausdorff distance [37] is defined by:

H(X_A, X_Q) = max( h(X_A, X_Q), h(X_Q, X_A) )    (6–8)

where

h(X_A, X_Q) = max_{x∈X_A} min_{y∈X_Q} ‖x − y‖    (6–9)

Since it is a minimax based distance, it is robust to outliers of feature points. It is also used in the image hashing system of [99].
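A direct sketch of Equations (6–8) and (6–9) over two coordinate arrays:

import numpy as np

def hausdorff(XA, XQ):
    # Pairwise distances between the two point sets.
    XA, XQ = np.asarray(XA, float), np.asarray(XQ, float)
    d = np.linalg.norm(XA[:, None, :] - XQ[None, :, :], axis=2)
    h_aq = d.min(axis=1).max()   # h(XA, XQ) of Equation (6-9)
    h_qa = d.min(axis=0).max()   # h(XQ, XA)
    return max(h_aq, h_qa)       # H(XA, XQ) of Equation (6-8)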

6.5.3 Histogram Weighted Distance

We propose a third type of distance, the histogram weighted distance, which is a perception based distance. The significance of a feature point is weighted by the percentage of pixels sharing its pixel value. If the pixel values of feature points have a higher percentage in the histogram of pixels, the distances between these pairs of feature points should be trusted more than others. The histogram weighted distance is given by:

W(X_A, X_Q) = max( (1/N) ∑_{i=1}^{N} w_A^{(i)} ‖X_A^{(i)} − X_Q^{(i)}‖_E , (1/N) ∑_{i=1}^{N} w_Q^{(i)} ‖X_A^{(i)} − X_Q^{(i)}‖_E )    (6–10)

where N is the number of feature point pairs, X_A^{(i)} is the coordinate of the i-th feature point in the anchor image, X_Q^{(i)} is the coordinate of the i-th feature point in the query image, w_A^{(i)} is the luminance percentage of the i-th feature point in the anchor image, w_Q^{(i)} is the luminance percentage of the i-th feature point in the query image, and ‖·‖_E is the Euclidean norm.
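A sketch of Equation (6–10) for matched pairs, assuming 8-bit gray images and interpreting the luminance percentage w^{(i)} as the histogram frequency of the pixel value at feature point i:

import numpy as np

def luminance_weights(image, points):
    # w(i): fraction of image pixels sharing the pixel value at feature point i.
    hist = np.bincount(image.ravel(), minlength=256) / image.size
    return np.array([hist[image[r, c]] for r, c in points])

def histogram_weighted_distance(XA, XQ, wA, wQ):
    # W(XA, XQ) of Equation (6-10) over matched feature point pairs.
    diff = np.linalg.norm(np.asarray(XA, float) - np.asarray(XQ, float), axis=1)
    return max(np.mean(wA * diff), np.mean(wQ * diff))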

6.5.4 Majority Vote

The final decision is made by majority vote on whether the three types of distance are larger than their respective thresholds, as shown in Figure 6-1. This reflects the abilities and limitations of the three types of distance. The normalized Euclidean distance is the most commonly used, but it is easily perturbed by outlier feature points; the Hausdorff distance is a minimax distance that repels outliers, but may lose some geometric information of images; the histogram weighted distance considers pixel/color information and makes the decision more robust, although it is influenced by outliers too. Therefore, a majority vote is necessary to take advantage of all three types of distance. The three types of distance are equally important and are treated with the same weight in the proposed system. They are diverse enough in our experiments to lower the authentication error rate. More distance measures might merely repeat the behavior of the existing distances or dilute their functions, and would increase the system complexity.

6.5.5 Strategy for Threshold Determination

The distance thresholds that differentiate similar images from different images are determined by statistical experiments. A novel strategy we take is to calculate distances across two video shots: a frame in one video shot is taken as the anchor image, and the other frames are query images. Then the middle value between the average distance within the same video shot and the average distance across different video shots is taken as the threshold. More results can be found in the experiments in Section 6.7, especially in Figure 6-6.

6.6 Possible Attack Identification

After distance evaluation, if the two images are considered similar, the possible geometric attacks and tampering that the query image may have experienced are subject to further detection.

6.6.1 Geometric Attack Estimation and Registration

Registration algorithms such as the iterative closest point (ICP) algorithm [164] and the Kanade-Lucas-Tomasi feature tracker (KLT) [90] estimate the translation and rotation transforms between feature point pairs, but do not consider the scaling transform. The scale-invariant feature transform (SIFT) algorithm [88, 89] considers the scaling transform, but requires high computational overhead. In this chapter, we propose to estimate and recover images from possible geometric attacks in two stages. First, the iterative closest point (ICP) algorithm [164] is used to estimate the rotation and translation based on the matched feature point pairs, and the query image is recovered from the rotation and translation transforms. Second, the scaling transforms are estimated: we propose to use the ratio of the standard deviation (STD) of the feature points of the query image to that of the anchor image to estimate the possible scaling transforms after rotation and translation registration.
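A one-function sketch of this STD-ratio scaling estimate, applied after the rotation and translation have been compensated; averaging the per-axis ratios is our illustrative choice.

import numpy as np

def estimate_scaling(anchor_pts, query_pts):
    # Ratio of feature point spreads (standard deviation per axis, averaged)
    # as the scaling factor of the query image relative to the anchor image.
    std_a = np.std(np.asarray(anchor_pts, float), axis=0)
    std_q = np.std(np.asarray(query_pts, float), axis=0)
    return float((std_q / std_a).mean())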

6.6.2 Tampering Attack Identification

Possibly tampered image blocks are detected, and the percentage of the tampered area is estimated. The tampered image blocks are determined by the distances between the local histograms of image blocks around the feature points in the two images. The distance we use is the earth mover's distance (EMD) [83, 121]. We divide tampering into three categories: adding new features, removing existing features, and changing existing features. Feature-added areas are identified around the outlier feature points in the query image that do not appear in the anchor image. Feature-removed areas are identified around the outlier feature points in the anchor image that do not appear in the query image. Feature-changed areas are areas with matched feature points that have large local histogram distances from the corresponding areas in the anchor image. If the EMD between the local histograms of image blocks around feature points in the anchor image and the local histograms of the corresponding blocks in the query image is larger than a threshold, the blocks in the query image are declared tampered areas. After detecting possible tampered areas, we sum the area of these tampered blocks, and use the ratio of this sum to the area of the whole image as the percentage of the tampered area.
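A sketch of the per-block check follows, using the 11×11 blocks adopted in Section 6.7.5 and the one-dimensional earth mover's distance between the pixel-value samples of the two blocks; the threshold value here is an illustrative assumption, not from the dissertation.

import numpy as np
from scipy.stats import wasserstein_distance

def block_is_tampered(anchor_img, query_img, center, half=5, threshold=10.0):
    # Compare the local histograms of the 11x11 blocks around a matched
    # feature point with the earth mover's distance.
    r, c = center
    a = anchor_img[r - half:r + half + 1, c - half:c + half + 1].ravel()
    q = query_img[r - half:r + half + 1, c - half:c + half + 1].ravel()
    return wasserstein_distance(a, q) > threshold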

6.7 Experimental Results

6.7.1 Feature Point Detection

We show the robustness of the proposed feature point detector in this subsection. We create differently distorted versions of the image 'Lena' by tampering with Lena's hat, rotating the image by 3 degrees, and applying histogram equalization. In Figure 6-2, feature points are denoted by red 'o'. The feature points extracted from the original image, the tampered image, the rotated image, and the image after histogram equalization are almost the same, indicating the robustness of our proposed feature point detector against attacks.

6.7.2 Feature Point Matching Example

Figure 6-3 shows the result of the proposed feature point matching algorithm. Specifically, Figure 6-3(A) shows the original image; Figure 6-3(B) shows the tampered and compressed image; and Figure 6-3(C) shows the extracted, clustered, and matched feature points. The axes in Figure 6-3(C) denote the pixel coordinates in the images. Each cluster is concentrated in one ellipse. Feature points of different clusters are illustrated with different colors: '+' and '∗' denote the matched and outlier feature points in the original image, and 'o' and square markers denote the matched and outlier feature points in the query image, respectively. Tampering with the corner of Lena's hat in Figure 6-3(B) adds and removes feature points. The proposed feature point matching algorithm efficiently and correctly detects the outlier feature points, and corresponding feature points between the images in Figure 6-3(A) and Figure 6-3(B) are matched into pairs consistent with the ground truth. This shows the effectiveness of our feature point matching algorithm, which also runs fast.

6.7.3 Authentication Performance

We compare the authentication performance of four image authentication systems: the proposed image authentication system, image hashing based on feature points [99], image hashing based on Singular Value Decomposition (SVD) [80], and image hashing based on wavelets [147], the latter three from the image hashing toolbox [2]. We test the image authentication systems on six test images, the 512×512 gray images shown in Figure 6-4.

Several types of attacks are applied to these images: scaling images to 0.5 and 1.5 of their sizes, compressing images using JPEG with quality factor 50 [116], rotating images by 5 degrees, cropping 20% of images, adding white Gaussian noise (σ² = 20) to images, and filtering images with Gaussian and median filters.

The thresholds to distinguish similar images from different images are 1.5, 0.2, and 0.2 for the normalized Euclidean distance, the histogram weighted distance, and the Hausdorff distance, respectively, in the proposed image authentication system. The proposed system can make correct authentication decisions in cases where the feature point based, SVD based, and wavelet based image authentication in the image hashing toolbox [2] may not. Some experimental results are shown in Table 6-1, where the feature point based, SVD based, and wavelet based image authentication methods are denoted by FP, SVD, and Wavelet, respectively. The decisions of the image authentication systems are represented by 'S' for similar images and 'D' for different images. FP fails to authenticate Lena and its tampered version, and Lena and its compressed and enhanced version, since its feature point extraction is not sufficiently robust and it has no outlier exclusion. SVD underestimates the distances and considers Lena and Mandrill similar. Wavelet fails to recognize the similarity between the image 'Goldhill' and its enhanced version. This indicates that the SVD based and wavelet based methods are not as robust as feature point based methods in some cases; they fail in luminance-adjusted cases. Our proposed authentication system makes correct decisions in these cases.

We also create 84 attacked images from the six test images in Figure 6-4. Each test image has 14 attacked versions, which suffer from scaling by 0.5, scaling by 1.5, JPEG compression with quality 50, rotation by 5 degrees, cropping of 20%, white Gaussian noise addition, filtering with Gaussian and median filters, and 6 tampering attacks. We test similarity and difference among 3486 image pairs. The correct probabilities of the proposed system, FP, SVD, and Wavelet are 84.5%, 81.9%, 83.2%, and 79.1%, respectively.

6.7.4 Distance Comparison

We compare the three types of distance in the proposed image authentication system with the distance of the image hash based on feature points [99], the distance of the image hash based on Singular Value Decomposition (SVD) [80], and the distance of the image hash based on wavelets [147] in the image hashing toolbox [2]. They are denoted by Euclidean, Histogram weighted, Hausdorff, FP, SVD, and Wavelet in Figure 6-5 and Figure 6-6.

Our experiments use the frames of two shots in the video sequence big buck bunny 480p h264.mov [3]. Each shot has 30 frames. The 20th frames of the two shots are shown in Figure 6-7.

The distances between the 20th frame and the other frames in the first shot are shown in Figure 6-5; the distances are all very small. The distances between the 20th frame in the first shot and the other frames in both shots are shown in Figure 6-6. All methods can distinguish the two shots. The discriminability of SVD is the lowest, while the discriminability of FP is the highest. The distances used in our authentication system have both robustness and discriminability, and their non-constant values reflect the similarity between frames better than the other methods, since perceptually similar images have small distances between them. Moreover, the FP, SVD, and Wavelet based methods do not provide tampering location identification.

6.7.5 Tampering Detection

We detect tampering such as adding, removing, and changing features, as shown in Table 6-2. The three types of tampering, i.e., adding, changing, and removing features, are shown in the rows of Table 6-2. In the images in the first column of Table 6-2, the small square blocks are the basic image blocks used in local histogram distance evaluation, and the highlighted blocks indicate the detected tampered blocks around some feature points. The tampered versions of 'Lena' are shown in the first column of Table 6-2. The percentage of the tampered area is also estimated in the second column of Table 6-2, but it is underestimated. If we increase the block size, the miss rate will be high for images with small tampered areas, and the EMD computation overhead will increase; thus we choose 11×11 as the size of the image blocks. The size of the image blocks should be made hierarchical and adaptive in our future work.

6.8 Summary

We proposed an efficient and robust image authentication system. Feature points with the k-largest local total variations are extracted and clustered by the Fuzzy C-means algorithm. The outliers among the feature points are then removed while feature point pairs between the query image and the anchor image are matched in zigzag order, cluster by cluster, which increases the robustness of the proposed image authentication system. Furthermore, the normalized Euclidean distance, the Hausdorff distance, and the histogram weighted distance between the query image and the anchor image are evaluated. Based on these distances, whether the images are similar or not is determined by majority voting. For similar images, possible geometric attacks are detected and image registration is performed. Possible tampered areas are determined and classified, and the percentage of the tampered area is estimated. The proposed image authentication system could serve as a building block in many applications such as copyright protection, image retrieval, and video signatures.

Figure 6-1. The flowchart of the proposed image authentication system. (Feature points are extracted from the anchor and query images; outliers are excluded by clustering and point matching; the normalized Euclidean, histogram weighted, and Hausdorff distances are compared with thresholds T1, T2, and T3; majority voting declares the images different or similar; for similar images, geometric transform identification and registration follow, possible tampered areas are identified (I: removed, II: changed, III: added), and the percentage of tampered area is estimated.)

Figure 6-2. The stable feature point detector. (A) The original image and extracted feature points. (B) The hat-tampered image and extracted feature points. (C) The rotated image and extracted feature points. (D) The image after histogram equalization and extracted feature points.

Figure 6-3. Feature point clustering and matching. (A) The original image and extracted feature points. (B) The hat-tampered image and extracted feature points. (C) Feature point clustering and matching.

Figure 6-4. Six test images for image hashing. (A) Lena. (B) Barbara. (C) Boat. (D) Mandrill. (E) Jet. (F) Pepper.

Figure 6-5. Distance comparison among different authentication methods in one video shot. (Curves: Euclidean×0.06, Histogram Weighted, Hausdorff, FP, SVD, Wavelet; horizontal axis: frame index.)

Figure 6-6. Distance comparison among different authentication methods in two video shots. (Curves: Euclidean×0.1, Histogram Weighted, Hausdorff, FP, SVD, Wavelet; horizontal axis: frame index.)

Figure 6-7. The two frames in two shots in the test video Bunny. (A) The 20th frame in the first shot. (B) The 20th frame in the second shot.

Figure 6-8. Diagram of possible attack identification. (Geometric attack estimation: rotation and translation estimated by ICP, scaling estimated by the ratio of the STDs of feature points; registration; tampering estimation by the EMD of local histograms — I: added area (feature points in the query image with no correspondences in the anchor image), II: removed area (feature points in the anchor image with no correspondences in the query image), III: changed area; estimation of the percentage of tampered area.)

Table 6-1. Authentication performance comparison among different methods (showing some limitations of the methods from [80, 99, 147]).

Table 6-2. Tampering detection and percentage of tampering area estimation.
Detection results                                 Percentage of tampering area
Tampered area detection: possible added area      1.53%
Tampered area detection: possible changed area    0.84%
Tampered area detection: possible removed area    1.02%

CHAPTER 7
ROBUST TRACK-AND-TRACE VIDEO WATERMARKING

7.1 Research Background

Nowadays, computers interconnected via the Internet make the distribution of digital media fast and easy. However, it also takes little effort to obtain exact copies, which poses great challenges to copyright protection for digital media. Digital watermark embedding is a process of integrating user and copyright information into the carrier media in a way invisible to the human visual system (HVS). Its purpose is to protect digital works from unauthorized duplication or distribution.

A video watermarking system is expected to embed the watermark in such a way that it can be detected later for authentication, copyright protection, and tracking and tracing of illegal distribution. Videos, composed of multiple frames, can utilize image watermarking techniques in a frame-wise manner [112]. Although the embedding capacity of video is much larger than that of images, the attacks video watermarking suffers are more complicated than those on image watermarking: they include not only spatial attacks, but also temporal attacks and hybrid spatial-temporal attacks.

In the literature of track-and-trace video watermarking, algebra-based anti-collusion codes have been investigated [10, 17, 24, 139, 144, 151, 155]. Their ability to trace one or multiple colluders depends on the assumption that the code is always available and error-free, which may not hold in practice. Besides, the length of the anti-collusion code limits the system's user capacity. Hence, practical and multi-functional watermarking systems based on algebra-based anti-collusion codes are very limited.

To this end, we propose a robust track-and-trace watermarking system for digital video copyright protection [158]. It consists of two independent bodies, the watermarking embedder and the watermarking detector. At the embedder, the user and product copyright information, e.g., a string of length Ls, is first encrypted with the Advanced Encryption Standard (AES) [42] to form a binary sequence. We then apply an error correction code (ECC) [87] to the sequence to generate a binary sequence with error-correction ability of length L, called the watermark payload. Meanwhile, a frame-size watermark pattern is generated from a pseudo-random noise (PN) sequence [41, 118]. Each binary bit in the watermark payload is associated with one video frame and determines how the watermark is embedded into that frame. Bit 0 indicates subtracting the watermark pattern from the current frame, while bit 1 indicates adding the watermark pattern to the current frame.

We repeatedly embed the L bits if the video is longer than L frames, and re-synchronize the watermark payload at the beginning of dramatic video scene changes to resist temporal attacks. Furthermore, in order to meet the perceptual quality requirement, we build a perceptual model to determine the signal strength that can be embedded at each frame pixel. Note that the stronger the embedded signal, the easier the watermark can be correctly detected. However, the watermark pattern is like random noise, and too strong a noise signal causes noticeable distortion to the picture. The randomness of the PN sequences also makes the embedded watermark information blind to attackers.

To make a trade-off between capacity and visual quality, we build a perceptual model to determine the signal strength that can be embedded at each pixel. Finally, since distributed videos are prone to collusion attacks, we propose to apply geometric transforms to the watermarked videos; this is called geometric anti-collusion coding in this chapter. These transforms include rotation, resizing and translation, and should be moderate enough to cause no visible defect to the HVS, yet still enhance the capability to resist collusion attacks.

The watermarking detector just carries out the reverse process of the embedder.

In this system, we assume the detector always has access to the original video as the prototype of the candidate video. Because of the geometric anti-collusion coding at the embedder, the watermark usually cannot be correctly extracted without pre-processing, even if the candidate video is an error-free copy. Additionally, spatial attacks such as further geometric manipulations, as well as temporal attacks, may occur to distributed videos. In this chapter, we propose to register the candidate video to the original video both spatially and temporally. An iterative KLT [142] based scheme is applied for spatial registration, whereas temporal registration matches frames so as to minimize the mean-square-error (MSE). We then compute cross-correlation coefficients between the re-generated watermark pattern and the frame difference of each registered frame and its corresponding original frame, and demodulate the coefficient sequence to recover the watermark payload. The payload is then ECC decoded (convolutional coding, specifically) and AES decrypted to derive the original user or copyright information. Successful detection indicates that the user or copyright information is correctly extracted; otherwise we say the detector fails to detect the watermark.

The chapter is organized as follows: Section 7.2 describes the overall architecture of the proposed track-and-trace video watermarking system. The watermark embedding techniques are discussed in Section 7.3. Section 7.4 introduces the watermark detection techniques. The experimental results presented in Section 7.5 verify the robustness of the proposed video watermarking. Finally, the conclusion and future work are given in Section 7.6.

7.2 Architecture of Robust Video Watermarking System

The architecture of the track-and-trace video watermarking system includes two independent components, i.e., the watermarking embedder (Figure 7-1) and the watermarking detector (Figure 7-2). It is an additive watermarking system. The watermarking embedder consists of two functional components: a watermark generator that generates the watermark payload (Figure 7-1A), and a watermark embedder that embeds the payload into the video frames (Figure 7-1B). The watermarking detector extracts the payload from the candidate video (Figure 7-2A), and then recovers the user or copyright information from the payload (Figure 7-2B).

7.2.1 Watermarking Embedder

The proposed video watermarking system is an additive system, i.e., it adds the watermark signal to the original video. The inputs of the embedder are the original video, the user ID and the copyright information. The key configuration parameters are the frame size (width × height) of the input video, the AES encryption key (Key1 in Figure 7-1B), the pattern ID (Key2 in Figure 7-1B) used to generate the watermark pattern, and Key3, used to generate the geometric-transform parameters for film distributors.

String-type user/copyright information is binarized, encrypted, and ECC coded into the watermark payload. If a convolutional code of rate 1/2 is used, information of Ls characters is transformed into L = 16Ls bits. Watermark patterns built from orthogonal PN sequences can resist frame-averaging collusion. The pseudo-random watermark patterns, weighted by the perceptual model of each frame, are embedded with the largest strength under the imperceptibility constraint. The length N of the PN sequence is the frame size (width × height), and the number of orthogonal sequences of length N is exactly N. For the geometric transform, bicubic interpolation is used to keep as much of the original video information as possible. The transforms comprise up to 5° rotation, 5-pixel translation and 5% resizing.

Thus, the proposed system could ideally accommodate 1000N distributors. After embedding, the watermarked videos are distributed. They may later suffer from intentional manipulations or unintentional degradations. These attacks include, but are not restricted to, geometric manipulations, erroneous transmission, and collusion.

7.2.2 Watermarking Detector

The inputs of the detector are the candidate video and its original copy. The goal is to extract the watermark payload from the candidate video and recover the user/copyright information with reference to the original copy. Some of the key configuration parameters are the size (width and height) of the two input videos, the AES decryption key (Key1′ in Figure 7-2A) and the pattern ID (Key2′ in Figure 7-2A) to re-generate the watermark pattern. Usually, we set Key1′ = Key1 and Key2′ = Key2 for consistency of the symmetric AES and of the PN-sequence generation at both ends.

The distributed video may be enlarged or cropped in size, which is referred to as a resizing attack. Hence, the candidate video may differ in frame size from the original video. The detector employs a resizing algorithm on the candidate video to match the sizes wherever necessary; the algorithm is bicubic interpolation (for expanding) or decimation (for shrinking). Note that we apply geometric anti-collusion coding at the embedder. Also, malicious attacks may impose spatial and temporal transforms attempting to remove the watermark information. On the other hand, the detector is very sensitive to these transforms and often fails in detection without any pre-processing of the candidate video. Accordingly, we first register the candidate video to the reference video, both spatially and temporally. Normalized cross-correlation coefficients are computed between each pair of registered candidate frame and reference frame. The anti-collusion geometric transform information brought by Key3′ is used to trace possible illegal distributors.

Then we make a binary hard decision on the coefficients based on a threshold to get a +1/−1 sequence, and demodulate it into a binary 0/1 sequence, which is the extracted watermark payload. Finally, the payload is ECC decoded and AES decrypted to recover the user/copyright information of string type, as illustrated in Figure 7-2B. Here the Viterbi algorithm is used for ECC decoding, matching the convolutional ECC encoding at the embedder.

The proposed video watermarking system integrates various techniques, including spread spectrum, AES encryption and decryption, ECC encoding and decoding, a perceptual weighting model, geometric anti-collusion coding and frame registration. The following sections introduce these techniques in detail.

7.3 Watermark Embedding Techniques

7.3.1 Watermark Pattern Generation

A seed, denoted as Key2 in Figure 7-1A, is required to generate a PN-sequence as the watermark pattern using spread spectrum. The pattern should be of the same size as the video frame in order to allow matrix addition. The PN-sequence can be an m-sequence, Walsh codes or a Kasami sequence with optimal cross-correlation values. Orthogonal PN-sequences are desired between different videos to resist averaging collusion, and between the different watermark payload bits (+1/−1) to resist temporal attacks.

Orthogonal m-sequences are used in our system. For frame size N (width × height), the length of the m-sequences is N; hence, there are N orthogonal m-sequences.
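As a concrete illustration, the following Python sketch generates a frame-size ±1 pattern from the seed Key2. A seeded pseudo-random ±1 pattern stands in for a true orthogonal m-sequence here; the function name and parameters are illustrative, not part of the system specification.

import numpy as np

def generate_pattern(key2: int, height: int, width: int) -> np.ndarray:
    # Frame-size +1/-1 watermark pattern derived from the seed (Key2).
    # A real implementation would pick one of the N orthogonal m-sequences;
    # a seeded pseudo-random +/-1 pattern is used as a simplified stand-in.
    rng = np.random.default_rng(key2)
    bits = rng.integers(0, 2, size=(height, width))  # 0/1 noise field
    return (2 * bits - 1).astype(np.int8)            # map {0,1} -> {-1,+1}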

7.3.2 Watermark Payload Generation

Product copyright and user information require encryption to keep them from attackers who want to detect or tamper with the content. After encryption, the information appears as noise to the attackers. The encryption technique could be Rivest-Shamir-Adleman (RSA) cryptography, the Data Encryption Standard (DES), the Advanced Encryption Standard (AES), and so on. The encryption key, denoted as Key1 in Figure 7-1A, can be chosen by the watermark creator following certain rules. In our system, we choose AES for encryption and set the length of the standard key to 128 bits. Key1 could be both user and video related. We assume that it is a common key to both embedder and detector, i.e., a symmetric encryption system. If an asymmetric encryption system were used, the embedder would hold the private key Key1, and the detector the public key Key1′.

Moreover, the video distribution process can be viewed as transmission over a channel, and the attacks on the media can be regarded as channel noise. Therefore, a rate-1/2 convolutional code is adopted in our system for error correction coding (ECC). After encryption and encoding, a binary sequence of length L is generated as the watermark payload.
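The following Python sketch illustrates payload generation under stated assumptions: AES-128 in ECB mode from the pycryptodome package, zero-padding of the string to the AES block size, and a toy constraint-length-3, rate-1/2 convolutional encoder with generator polynomials 7/5 (octal). The chapter does not specify the cipher mode, the padding or the code polynomials, so these choices are illustrative; the last line applies the ±1 modulation of Eq. (7–1) below.

import numpy as np
from Crypto.Cipher import AES  # pycryptodome

def conv_encode_r12(bits, g1=0b111, g2=0b101):
    # Rate-1/2 convolutional encoder (constraint length 3, generators 7/5 octal).
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | int(b)) & 0b111
        out.append(bin(state & g1).count("1") & 1)
        out.append(bin(state & g2).count("1") & 1)
    return np.array(out, dtype=np.int8)

def make_payload(info: str, key1: bytes) -> np.ndarray:
    # Encrypt the user/copyright string with AES-128 and ECC-encode it.
    data = info.encode()
    data += b"\0" * (-len(data) % 16)        # zero-pad to the AES block size
    enc = AES.new(key1, AES.MODE_ECB).encrypt(data)
    bits = np.unpackbits(np.frombuffer(enc, dtype=np.uint8))
    payload = conv_encode_r12(bits)          # 8 bits/char at rate 1/2 -> ~16*Ls bits
    return 2 * payload - 1                   # modulate 0/1 -> -1/+1 (Eq. 7-1)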

The binary payload is further modulated into a +1/−1 sequence as:

$$X' = 2X - 1, \qquad (7–1)$$

where $X \in \{0, 1\}$ is the binary payload and $X'$ is the modulated sequence.

7.3.3 Perceptual Weighting Model

As mentioned in Section 7.1, there is a trade-off between watermark capacity and visual quality in determining the signal strength that can be embedded into video frames. We build a perceptual model in both the temporal domain and the spatial domain. The objective is to maximize the watermark capacity without causing noticeable degradation to the visual quality of the videos. The model diagram is shown in Figure 7-3.

The embedding strength is determined in a pixel-wise manner for every frame. Hence, it is formulated as a height × width mask matrix $M$, with each entry describing the weight for the collocated watermark pattern signal. Then, for a length-$L$ watermark payload, the watermark corresponding to the $i$th payload bit in frame $kL + i$, $k \in \mathbb{Z}^+$, is:

$$W = \mathrm{sign}(X'(i)) \cdot M \circ P \qquad (7–2)$$

where $\circ$ denotes the element-wise product, $P$ is the watermark pattern, and $X'(i)$ is the $i$th watermark payload bit. The pixel values of the watermarked frame (the original frame plus $W$) should be clipped to $[0, 255]$.

7.3.3.1 Temporal perceptual modeling

The perceptual model in the temporal domain is based on the fact that human eyes are sensitive to changes with slow motion, but not to fast-moving changes. Generally, the larger the difference between the current frame and the previous frame, the stronger the embedded signal strength can be. But the simple difference between adjacent frames is not good enough to describe object motion. For example, if an object is moving and leaving a smooth background, we cannot embed a high-strength watermark into the smooth background. Therefore, we propose a block motion matching algorithm to find the difference between blocks in the current frame and the previous frame with the least sum of absolute differences (SAD), which is defined as:

$$\mathrm{SAD}(\Omega, \Omega') = \sum_{(i,j)\in\Omega,\,(i',j')\in\Omega'} |I_c(i,j) - I_p(i',j')| \qquad (7–3)$$

where $\Omega$ is the block in the current frame, $\Omega'$ is the block in the previous frame, $(i,j)$ and $(i',j')$ are the pixel coordinates, $I_c$ is the current frame, and $I_p$ is the previous frame. The algorithm for the perceptual model in the temporal domain is summarized as follows:

Algorithm 5 Perceptual Modeling in Temporal Domain
for each block Ω in the current frame do
    for each block Ω′ in N_B(Ω) in the previous frame do
        if SAD(Ω, Ω′) < minSAD then
            minSAD = SAD(Ω, Ω′)
            Diff = |I_c(Ω) − I_p(Ω′)|
        end if
    end for
end for

where $N_B(\Omega)$ is the neighborhood of $\Omega$ within the range $B$:

$$N_B(\Omega) = \{z \in I_p \mid B_z \cap \Omega \neq \emptyset\} \qquad (7–4)$$

where $B_z$ is the translation of $B$ by the vector $z$, i.e., $B_z = \{b + z \mid b \in B\}$, $\forall z \in I_p$. The temporal model first performs block matching between two adjacent frames and calculates the differences between the matching blocks. Then the differences are scaled to form the temporal mask:

TemporalMask = α · Diff (7–5)
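A minimal Python sketch of the temporal model (Eqs. (7–3)–(7–5) and Algorithm 5) follows; the block size, search range and scaling factor alpha shown are illustrative values, not those of the deployed system.

import numpy as np

def temporal_mask(cur, prev, block=8, search=4, alpha=0.5):
    # For each block in the current frame, find the previous-frame block in a
    # +/- search window with the least SAD (Eq. 7-3), then use the absolute
    # difference against that best match, scaled by alpha (Eq. 7-5).
    h, w = cur.shape
    mask = np.zeros((h, w), dtype=np.float64)
    cur = cur.astype(np.float64)
    prev = prev.astype(np.float64)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            cb = cur[y:y + block, x:x + block]
            best_sad, best_diff = np.inf, None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        pb = prev[yy:yy + block, xx:xx + block]
                        sad = np.abs(cb - pb).sum()
                        if sad < best_sad:
                            best_sad, best_diff = sad, np.abs(cb - pb)
            mask[y:y + block, x:x + block] = alpha * best_diff
    return mask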

7.3.3.2 Spatial perceptual modeling

We propose two perceptual models in the spatial domain. They can either be used independently or combined to generate one spatial mask. Model 1 is based on edges and texture. The underlying principle is that rough areas, such as texture-rich areas and edges, can be embedded with a higher-strength watermark, since human eyes are insensitive to changes in these areas. To accurately identify such areas, we use a combination of three metrics to describe them (a code sketch follows Eq. (7–7) below).

• The first metric, Map1, is the difference between the current pixel and its surrounding pixels. If the difference is large, it means the current pixel is located in an area that can tolerate relatively large changes, so a large embedded signal strength is possible. Map1 is calculated by convolving the original frame with the high-pass filter H, defined below:

$$H = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix} \qquad (7–6)$$

• The second metric, Map2, is the variance of the block centered at the current pixel. The larger the variance, the higher the embedded signal strength can be.

• The third metric, Map3, is the entropy of the block centered at the current pixel. The higher the entropy, the richer the texture of the area, and the higher the embedded signal strength can be.

Each of the three metrics can describe how rich the texture of the local area around a pixel is, but none of them is sufficient on its own. Therefore, we define the spatial mask as the product of the three metrics:

$$\mathrm{SpatialMask1} = \beta \cdot \mathrm{Map1} \cdot \mathrm{Map2} \cdot \mathrm{Map3} \qquad (7–7)$$

where β is a scaling factor.
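The following Python sketch computes SpatialMask1 per Eqs. (7–6) and (7–7). The 9×9 window, the 8-bin histogram used for the block entropy, the absolute value applied to the high-pass response, and the value of beta are illustrative assumptions.

import numpy as np
from scipy.ndimage import convolve, uniform_filter

H = np.array([[-1, -1, -1],
              [-1,  8, -1],
              [-1, -1, -1]], dtype=np.float64)  # high-pass filter of Eq. 7-6

def spatial_mask1(frame, block=9, beta=1e-4):
    # SpatialMask1 = beta * Map1 * Map2 * Map3 (Eq. 7-7)
    f = frame.astype(np.float64)
    map1 = np.abs(convolve(f, H))                      # pixel vs. surroundings
    mean = uniform_filter(f, block)
    map2 = np.clip(uniform_filter(f * f, block) - mean * mean, 0, None)  # variance
    map3 = np.zeros_like(f)                            # local entropy
    r = block // 2
    for y in range(r, f.shape[0] - r):
        for x in range(r, f.shape[1] - r):
            patch = f[y - r:y + r + 1, x - r:x + r + 1]
            hist, _ = np.histogram(patch, bins=8, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]
            map3[y, x] = -(p * np.log2(p)).sum()
    return beta * map1 * map2 * map3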

Model 2 is based on the saliency map [4] and the Just Noticeable Difference (JND) model [18, 40] of video frames. The saliency map highlights salient texture areas in an image, which can serve as imperceptible embedding locations. To obtain the saliency map, the frame is down-sampled and low-pass filtered in the Fourier transform domain, and then up-sampled to the original frame size. The magnitudes of the saliency map describe the frequency content of the frame. The visual just noticeable difference reflects the nonlinear response of human eyes to spatial frequency. Based on the JND human perceptual model, the saliency map is further mapped into a spatial mask by:

$$\mathrm{SpatialMask2} = \frac{\eta}{\mathrm{SaliencyMap} + \delta} \qquad (7–8)$$

To guarantee good visual quality of the videos, we choose the minimum of the spatial mask and the temporal mask for each pixel, and the final embedded signal strength is further bounded by a minimum and a maximum value. The perceptual weighting map (PWM) is defined as:

PWM = min(maxStrength,max(minStrength,min(TemporalMask,SpatialMask))) (7–9)
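A minimal sketch of the mask combination and the embedding itself, following Eqs. (7–2) and (7–9); the minimum/maximum strength bounds shown are placeholder values.

import numpy as np

def embed_frame(frame, pattern, bit, t_mask, s_mask,
                min_strength=1.0, max_strength=8.0):
    # PWM (Eq. 7-9): per-pixel minimum of the masks, bounded above and below.
    pwm = np.clip(np.minimum(t_mask, s_mask), min_strength, max_strength)
    w = np.sign(bit) * pwm * pattern   # bit is +1 or -1 (Eq. 7-2)
    return np.clip(frame.astype(np.float64) + w, 0, 255).astype(np.uint8)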

7.3.4 Geometric Anti-collusion Coding

After embedding the watermark payload into the carrier video, we apply a geometric transform to each copy of the video. The transform is a combination of shifting, resizing and rotation, and varies among the different video distributors. For each video copy, its specific transform index is a random variable generated by Key3, related to the user and copyright information. The extent of the transform should be moderate enough not to be noticed by the HVS, but still detectable by computers. If colluders try to linearly or nonlinearly combine multiple video copies to eliminate the embedded watermark, the resulting video will usually be blurred and become unacceptable to human eyes. Thus, the geometric transform protects the video watermark from inter-video collusion attacks. This process is called geometric anti-collusion coding. To preserve as much information as possible, bi-cubic spline interpolation [76] is used to fill the blank area after the transform.
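A Python/OpenCV sketch of the anti-collusion transform follows. The rotation, scaling and translation ranges mirror the 5°/5%/5-pixel extents described in Section 7.2.1, while the uniform sampling driven by Key3 is an illustrative choice; cv2.warpAffine with bicubic interpolation stands in for the bi-cubic spline interpolation.

import cv2
import numpy as np

def anti_collusion_transform(frame, key3):
    # Mild, per-distributor rotation/scale/shift drawn deterministically from Key3.
    rng = np.random.default_rng(key3)
    angle = rng.uniform(-5, 5)            # degrees
    scale = rng.uniform(0.95, 1.05)       # +/- 5% resizing
    tx, ty = rng.uniform(-5, 5, size=2)   # pixels
    h, w = frame.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += (tx, ty)                   # append the translation
    return cv2.warpAffine(frame, M, (w, h), flags=cv2.INTER_CUBIC)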

7.4 Watermark Detection Techniques

7.4.1 Video Frame Registration

Apart from the geometric anti-collusion coding, the input candidate video at the detector may have gone through many changes, either accidental manipulations or malicious attacks. Two major categories among these changes are affine transforms in the spatial domain and frame add/drop in the temporal domain. Since the detector has access to the original video, we can use the original copy as the reference and register the candidate video to it in both the spatial and temporal domains.

7.4.1.1 Spatial registration

The spatial registration is based on the Kanade-Lucas-Tomasi (KLT) feature tracker. An affine model is used in spatial registration [90, 129]. The affine transform model for any pixel $(x, y)$ is:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} \qquad (7–10)$$

The objective of spatial registration is to find the six affine transform parameters $a$–$f$ in the model, so as to apply the inverse affine transform before detection. KLT achieves this by matching the corresponding feature points in the candidate frame and the original frame, and solving for the parameter set. We call the rectified frame $F^{(1)}$. For each pixel $(x, y)$ of $F^{(1)}$, we compute its pixel position $(x', y')$ in the candidate frame $F^{(0)}$. We take $F^{(0)}(x', y')$ as the match of $F^{(1)}(x, y)$ if $x', y'$ are integers; otherwise, we interpolate $F^{(0)}$ at $(x', y')$. However, due to the complexity of the transform and the imperfection of the KLT algorithm, the rectified frame after a one-time inverse affine transform is often not good enough to extract the watermark from. Therefore, we propose to refine the estimate $F^{(1)}$ by applying KLT iteratively. Specifically, the affine transform displacement is expressed as:

$$\begin{bmatrix} \delta x \\ \delta y \end{bmatrix} = \begin{bmatrix} x' - x \\ y' - y \end{bmatrix} = \begin{bmatrix} a - 1 & b \\ c & d - 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} \qquad (7–11)$$

When the $i$th KLT iteration produces $F^{(i)}$ and the corresponding affine parameter set $a^i$–$f^i$, we compute the displacement of each pixel. We keep doing this until the convergence condition is satisfied or we reach the maximum number of iterations. In this system, we check the maximum pixel displacement between two consecutive rectifications:

$$\max_{x^i, y^i \in F^{(i)}} \{ |\delta x^i|, |\delta y^i| \} < \varepsilon \qquad (7–12)$$

where

$$\delta x^i = x^i - x^{i-1} \qquad (7–13)$$
$$\delta y^i = y^i - y^{i-1} \qquad (7–14)$$

and $\varepsilon$ is a pre-defined threshold.
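A sketch of the iterative registration using OpenCV's KLT tracker (calcOpticalFlowPyrLK) and affine estimation is given below, assuming 8-bit grayscale frames; the detector parameters, the iteration cap, and the corner-displacement convergence test (a proxy for Eq. (7–12)) are illustrative.

import cv2
import numpy as np

def klt_register(candidate, original, max_iter=5, eps=0.5):
    # Iteratively rectify the candidate frame toward the original using KLT
    # feature tracking and an affine model (Eqs. 7-10 to 7-12).
    h, w = original.shape[:2]
    rect = candidate.copy()
    for _ in range(max_iter):
        pts = cv2.goodFeaturesToTrack(original, maxCorners=200,
                                      qualityLevel=0.01, minDistance=8)
        if pts is None:
            break
        moved, status, _ = cv2.calcOpticalFlowPyrLK(original, rect, pts, None)
        ok = status.ravel() == 1
        if ok.sum() < 3:
            break
        M, _ = cv2.estimateAffine2D(moved[ok], pts[ok])  # maps rect -> original
        if M is None:
            break
        rect = cv2.warpAffine(rect, M, (w, h), flags=cv2.INTER_CUBIC)
        # Convergence proxy: residual displacement of the frame corners
        corners = np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=np.float64)
        disp = (M[:, :2] @ corners.T).T + M[:, 2] - corners
        if np.abs(disp).max() < eps:
            break
    return rect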

In most cases, we expect KLT-based spatial registration to improve the detection performance if the candidate video actually experienced an affine transform. However, the detector is unaware of the exact manipulation applied to the candidate frame. If it is not an affine transform, KLT yields wrong parameters, and the performance after spatial registration can be worse than without it. Hence, spatial registration is optional in our detector. Typically, the detector can switch spatial registration on or off if it has information about the manipulation. Otherwise, we can always try both and choose the one with the better detection result.

7.4.1.2 Temporal registration

In temporal registration, we use the simple rule of minimizing the mean-square-error (MSE) to register a candidate frame to an original frame. We scan the original sequence to find the best match for the current candidate frame, i.e., the frame that minimizes the MSE [33]. One causal constraint is imposed so that no frame displayed in the past can be captured in the future. That is, if two frames $i, j$ in the candidate video satisfy $i < j$, then the registered frames $\alpha(i), \alpha(j)$ in the original video must satisfy $\alpha(i) \le \alpha(j)$. For frame $k$ in the candidate video, we compute the MSE with the $n$ consecutive frames $\alpha(k-1)+1, \cdots, \alpha(k-1)+n$ in the original video, where $\alpha(k-1)$ is the previously registered frame, and register the current frame to the one with the minimal MSE.

Likewise, a temporal transform may or may not be present in the candidate video, and the performance after temporal registration could be worse than without it if no temporal manipulation occurred. Therefore, temporal registration is also optional in our detector. If temporal registration is enabled, it is usually performed ahead of spatial registration.
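A minimal Python sketch of this causal MSE matching follows, assuming the videos are given as lists of grayscale frames; the window length n is illustrative.

import numpy as np

def temporal_register(candidate, original, n=10):
    # Match each candidate frame to the original frame minimizing the MSE,
    # scanning only frames alpha(k-1)+1 ... alpha(k-1)+n (causal constraint).
    alpha, prev = [], -1
    for frame in candidate:
        start = prev + 1
        window = original[start:start + n]
        if not window:                    # ran past the end of the original
            alpha.append(prev)
            continue
        mses = [np.mean((frame.astype(np.float64) - g.astype(np.float64)) ** 2)
                for g in window]
        prev = start + int(np.argmin(mses))
        alpha.append(prev)
    return alpha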

7.4.2 Watermark Extraction and Payload Recovery

After registration, it is assumed that each frame in the candidate video has found its matching frame in the original video. Note that this is an additive watermarking system, which adds the watermark pattern to the original frame. Hence, we can detect the existence of the watermark signal by computing the cross-correlation between the watermark pattern and the true frame difference. We use a key exactly corresponding to Key2 at the embedder (i.e., Key2′) to re-generate the watermark pattern, a frame-size PN sequence, at the detector. The normalized cross-correlation is defined as:

$$NC(P, \hat P) = \frac{\langle P, \hat P \rangle}{\|P\|_F \, \|\hat P\|_F} \qquad (7–15)$$

where $P$ is the watermark pattern and $\hat P$ is the true difference between the candidate frame and its registered frame; $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\|\cdot\|_F$ is the Frobenius norm. The range of the normalized cross-correlation is $[-1, 1]$. The larger the absolute value of the coefficient, the better the chance that the candidate frame contains the regenerated pattern, i.e., that it has the watermark information embedded. Each candidate frame corresponds to one coefficient value. A hard-decision threshold of 0 is used to map the coefficient sequence to a binary −1/+1 sequence: if a coefficient is larger than 0, we denote it as "1"; otherwise it is "−1". The extracted −1/+1 sequence $\{X'\}$ is then demodulated to a 0/1 sequence $\{X\}$ as:

$$X = (X' + 1)/2 \qquad (7–16)$$
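A compact Python sketch of Eqs. (7–15) and (7–16): normalized cross-correlation against the re-generated pattern, hard decision at 0, then demodulation. The small constant guarding against division by zero is an added safeguard, not part of the original formulation.

import numpy as np

def extract_payload(candidate_frames, registered_frames, pattern):
    bits = []
    for cand, orig in zip(candidate_frames, registered_frames):
        diff = cand.astype(np.float64) - orig.astype(np.float64)  # P-hat
        nc = (pattern * diff).sum() / (np.linalg.norm(pattern) *
                                       np.linalg.norm(diff) + 1e-12)
        x_mod = 1 if nc > 0 else -1       # hard decision on the coefficient
        bits.append((x_mod + 1) // 2)     # demodulate -1/+1 -> 0/1 (Eq. 7-16)
    return np.array(bits, dtype=np.int8)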

The watermark payload recovery is the reverse process of payload generation. The binary payload sequence needs to be decoded and decrypted to derive the string. For ECC decoding, we use the Viterbi algorithm to decode the convolutional code [102], and the AES decryption method is described in the standard [42]. The 128-bit AES key used in decryption is denoted Key1′ and is usually set to be identical to Key1.

7.5 Experimental Results

The step-by-step results of watermark embedding are listed in Table 7-1. The test video sequences were downloaded from [1]; they are YUV sequences in CIF format, including Foreman, Mobile, News and Stefan. Column 1 shows the Y components of the 90th frames of the original sequences. Column 2 shows the corresponding watermark patterns after the perceptual weighting model. Column 3 shows the watermarked sequences, in which the watermark is imperceptible. Column 4 shows the watermarked frames after the geometric transform with anti-collusion ability; they all undergo an up-right shift of 3 pixels, a clockwise rotation of 2°, and resizing to 101%. We can hardly distinguish the watermarked frames in Column 3 from the original frames in Column 1, which meets our perceptual requirement for the watermarking system. Furthermore, the geometric anti-collusion code does not cause much distortion either, as the frames in Column 4 and Column 3 look almost identical. The PSNR of the watermarked sequences is shown in Figure 7-4 and falls in the range of 34 to 47 dB.

At the detector side, we test the detector's capability to correctly extract the watermark information under various attacks. Among them, the most important task is to verify the capability to resist affine transforms, not only because they are used for anti-collusion coding at the embedder for security purposes, but also because the distributed videos can encounter malicious geometric manipulations. Figure 7-5 shows the watermarked frame under various affine transforms. The test sequence is the 300-frame QCIF Grandma, and the 5th frame is selected to illustrate the effect of multiple geometric transforms, including 25-pixel rotation (around 8°) (7-5C), 10-pixel expanding (around 105.7%) (7-5D), 10-pixel shrinking (around 94.3%) (7-5E) and 40-pixel shifting (7-5F). Note how significantly the last three transforms change the frame structure. Geometric transforms of such extent are easily detected by human eyes; hence they may be beyond the range the anti-collusion coding would apply to a watermarked video, and are very likely the result of third-party attacks. Therefore, the performance the detector achieves on these videos can fully justify it under affine transform attacks. Table 7-2 shows the error rate of the cross-correlation coefficients the detector obtains under the above-mentioned transform scenarios. It is defined as:

$$R_e = \frac{M_e}{M} \qquad (7–17)$$

where $M_e$ is the number of erroneously demodulated binary bits and $M$ is the number of frames in the sequence. Note that this error rate is obtained before ECC decoding, which can further correct bit errors. The first row shows the error rate without frame registration; it is too high for the ECC to correct, and consequently we cannot recover the correct watermark information at the detector. One-time KLT registration significantly reduces the error rate. Iterative KLT registration can further improve the performance (the third row), but not as dramatically as one-time registration improves over no registration at all. We notice that for the 25-pixel rotation, iterative KLT is actually identical to one-time KLT, as it only runs the registration once. This is because these are all single transforms, either rotation, resizing or shifting, and one-time KLT is good enough to track the correct transform parameters. Combinational transforms, however, pose a greater challenge for KLT-based spatial registration. As shown in Table 7-3, complex affine transforms and single transforms of higher magnitude require iterative KLT to enable the detector to extract the correct watermark. Note that for resizing, a positive value means shrinking (7-5E) and a negative value indicates expanding (7-5D). There are some combinatorial transforms under which KLT registration fails (indicated by "N/A"), i.e., iterative KLT registration does not converge within the maximum number of iterations and fails to estimate the correct parameters.

7.6 Summary

In this chapter, we propose a robust track-and-trace anti-collusion watermarking system. At the embedder, the user and copyright information is securely mapped to a binary sequence using AES and ECC, which results in the watermark payload. An orthogonal frame-size PN-sequence is generated with a secret key as the watermark pattern. The pattern is then perceptually weighted and integrated with the original video sequence frame by frame according to the watermark payload. For anti-collusion purposes, the watermarked video is geometrically transformed before distribution. At the detector, the candidate video is spatially and temporally registered to the original video if needed. We compute the cross-correlation between the re-generated watermark pattern and the frame difference to extract the payload. Then the payload is ECC decoded and AES decrypted to obtain the final watermark information. Experimental results show that the proposed system is robust against geometric attacks and collusion attacks, and meets the requirement of invisibility to the HVS. Meanwhile, they also show that iterative KLT registration has limitations. Our future work includes further investigation into transform attacks and enhancing the detector's capability to cope with complicated combinatorial affine transforms and non-affine transforms. Moreover, we will test the detector under other types of video-related attacks such as compression, erroneous transmission, and reverse-order display.

Figure 7-1. Track-and-trace video watermarking embedder. (A) Watermark generator. (B) Watermark embedder.

Figure 7-2. Track-and-trace video watermarking detector. (A) Watermark extractor. (B) Payload recovery.

Figure 7-3. Perceptual modeling diagram.

[Four plots: PSNR (dB) per frame over 90 frames for the four watermarked sequences.]

Figure 7-4. PSNR of watermarked video sequences in Table 7-1. (A) Foreman. (B) Mobile. (C) News. (D) Stefan.


Figure 7-5. Geometric transforms/attacks on the 5th frame of Grandma. (A) Original frame. (B) Watermarked frame. (C) Watermarked frame with 25-pixel rotation. (D) Watermarked frame with 10-pixel expanding. (E) Watermarked frame with 10-pixel shrinking, truncated to the original size. (F) Watermarked frame with 40-pixel shifting.

Table 7-1. Step-by-step results of the watermarking embedder (columns: Sequences, PWM, Watermarked, Geo-transformed).

Table 7-2. Cross-correlation coefficient error ratio (%) with frame registration in the Grandma sequence.

                            Attack1   Attack2   Attack3   Attack4
No registration             41.67     3         48.67     47.67
1-KLT registration          0         2.33      0.67      2.33
Iterative KLT registration  0         0         0         0.33

(Attack1: 25-pixel rotation; Attack2: 10-pixel expanding; Attack3: 10-pixel resizing; Attack4: 40-pixel shifting)

Table 7-3. Capability of KLT-based video registration for various geometric transforms.

shift   resize   rotate   KLT iterations   Capability
0       -15      0        2                Y
0       -20      0        N/A              N
0       15       0        2                Y
0       20       0        N/A              N
0       0        40       2                Y
10      10       10       4                Y
20      10       20       N/A              N
20      5        20       4                Y
20      -5       20       3                Y
30      5        10       N/A              N

(N/A: not applicable)

CHAPTER 8
CONCLUSIONS

8.1 Summary of the Dissertation

In this section, we summarize the research presented in this dissertation.

This dissertation explored algorithms and theories in image and video compression and copyright protection. For compression, integer reversible transforms and low complexity quantization were studied. For copyright protection, image hashing, image authentication and video watermarking techniques were developed.

Chapter 2 studied the integer reversible transform design problem for lossless signal compression. For the PLUS factorization, which factorizes an arbitrary transform matrix with unit determinant into a product of a permutation matrix, a lower triangular elementary reversible matrix, an upper triangular elementary reversible matrix and a single-row elementary reversible matrix, we stabilized and optimized the factorization and performed perturbation analysis on it. Stabilization by pivoting made the PLUS factorization stable; optimization by Tabu search gave it the smallest transform error; perturbation analysis proved it numerically stable. Based on the optimized PLUS factorization, we built integer reversible transforms from traditional transforms such as the DCT and LBT. Applying the proposed integer reversible transforms to lossless image compression, the experimental results showed that our integer DCT with the optimal PLUS factorization outperformed the integer DCT with expansion factors, and our integer lapped biorthogonal transform with the optimal PLUS factorization outperformed that in JPEG-XR.

Chapter 3 studied one-dimensional quantizer design and proposed an adaptive quantizer using piecewise companding and scaling for Gaussian mixtures. Our adaptive quantizer had three modes corresponding to three types of Gaussian mixture. Our experimental results showed that 1) the proposed quantizer was able to achieve performance close to the optimal quantizer (i.e., the Lloyd-Max quantizer for GMM) in the sense of Mean Squared Error (MSE), at a much lower computational cost; 2) the proposed quantizer was able to achieve much better MSE performance than a uniform quantizer, at a cost similar to the uniform quantizer. Furthermore, we proposed a reconfigurable architecture to implement our adaptive quantizer in an ADC. We also used it to quantize images and to design the tone mapping algorithm for high dynamic range (HDR) image compression, yielding improved visual performance.

Chapter 4 studied high-dimensional quantizer design and proposed optimal vector quantizer approximators built from transforms and scalar quantizers. Vector quantizers aim at the optimal rate-distortion performance, but their design complexity increases exponentially with the number of quantization levels. To reduce quantizer design and implementation complexity, we proposed to combine transforms and scalar quantization. The transform was used to decorrelate data and facilitate quantization. Unitary transforms and volume-keeping scaling transforms were discussed, but only unitary transforms were used. After the transform, the memoryless data were plugged into a tri-axis coordinate frame, and then an effective scalar quantization was applied to the data. The proposed quantization framework was suitable for almost arbitrary distributions, and the tri-axis coordinate frame worked for memoryless sources with both circular and elliptical distributions. We tested our proposed quantizer on both Gaussian and Laplace distributions. The resulting performance was almost always better than that of restricted/unrestricted polar quantizers and rectangular quantizers, and the resulting rate-distortion performance approached that of the optimal vector quantization.

Chapter 5 proposed a robust image hashing system. The system was built upon a robust descriptor of images — the k-largest local total variations — which was quantized with companding and binarily coded with Gray code. The k-largest local total variations were robust to content-preserving attacks such as geometric attacks and luminance attacks, and indicated stable, similar feature points in perceptually identical images. The Morlet wavelet coefficients were pseudo-randomly permuted with a secret key, which enhanced the security and reduced the collision rate of the image hashing system. The Morlet wavelet coefficients were quantized in a computationally efficient way using the companding technique, according to the probability distribution of the coefficients. Thus, the proposed image hashing was robust to contrast changes and gamma correction of images. Gray code was used to binarily code the quantized coefficients, which increased the discriminability of the image hashes.

Chapter 6 proposed a robust image authentication system. It was a content-based image authentication system using feature point clustering and matching. Feature points were first detected from the images. Then the feature points from the anchor images and the query images were clustered and matched. The distance and matching information between point pairs were used to identify the authenticity of the query images and the possible attacks they suffered. Since the locations of the feature points generated from a given image may change due to image processing and degradation, we proposed to use the Fuzzy C-means clustering algorithm to cluster the feature points and remove the outliers among them. A histogram weighted distance was proposed, which was equivalent to the Hausdorff distance after outlier removal. The authenticity of the query image was determined by a majority vote on whether the three types of distances between matched feature point pairs were larger than their respective thresholds. The geometric transforms through which the query images could be aligned with the anchor images were estimated, and the query images were registered accordingly. The possible tampered image blocks were identified, and the percentage of tampered area was estimated.

Chapter 7 proposed a robust track-and-trace video watermarking system. The system includes a watermark embedder and a watermark detector. At the embedder, we pseudo-randomly generated watermark patterns and translated the user information and video information into the watermark payload. They were then embedded into video frames according to perceptual weighting models. We also embedded a geometric anti-collusion code to resist collusion attacks. At the detector, we used KLT to register video frames and then extracted the watermark and recovered the payload. The video copyright information and the possible malicious attacks were identified. The system provided: Security: user and product copyright information, e.g., a string of length Ls, was first encrypted with the Advanced Encryption Standard (AES); an error correction code (ECC) was applied to the sequence to generate a binary sequence with error-correction ability of length L, called the watermark payload; a frame-size watermark pattern arose from a pseudo-random noise (PN) sequence. Perceptual invisibility and robustness: to make a trade-off between visual quality and robustness, which was determined by the embedding strength, we built a perceptual model to determine the signal strength that could be embedded at each pixel by using statistical source information and the Just Noticeable Difference (JND) model. Track-and-trace: geometric anti-collusion coding was used for tracking and tracing colluders, and an iterative KLT based scheme was applied for spatial registration for watermark extraction.

8.2 Future Work

In this section, we point out future research directions.

8.2.1 Optimal Integer Reversible Transforms and Lossless Video Compression

The optimal integer reversible transforms with the least entropy need to be investigated further.

Nowadays, capturing, creating, editing and processing moving images employ a wide range of techniques for reducing the amount of data to be stored and transmitted. Significant advances are occurring in the reduction of bit-rates for end-use distribution and consumer applications such as Internet video streaming, the DCI (Digital Cinema Initiative), Blu-ray DVD, and high-definition TV. Video archiving, especially lossless video archiving, is important for high quality studio products, medical videos, fine arts and antique documents, and satellite data. There is a lossless compression mode in H.264 Advanced Video Coding / MPEG-4 Part 10, which is an industry standard for video coding [119]; its implementation is basically DPCM without transform.

We will try to implement the integer DCT in the VP8 video codec, test its performance, and compare it with the lossless compression mode of x264.

Besides, the RDC-optimal transforms are still unknown, and how to effectively use directional transforms still needs investigation.

8.2.2 A New Video Coding Strategy and RDC Optimization

Videos have become an important part of human life in the digital age. The soaring number of videos demands efficient video compression, which is standardized in H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding) [119, 152], and the emerging H.265/HEVC [71, 96, 148]. These standards aim to encode videos with the least bitrate, the least distortion and the least computational complexity under certain constraints.

The existing H.264 video codecs include many encoding strategies, such as one-pass and multi-pass Average Bitrate Encoding (ABR), Constant Bitrate Encoding (CBR), Constant Quantizer Encoding (CQP) and Constant Rate Factor Encoding (CRF) [32, 98]. These encoding strategies generally each pursue a single objective.

Based on the existing rate-distortion-complexity framework, we can incorporate the RDC-optimal transform and RDC-optimal quantization into video coding. The RDC theory could be developed and tested on a practical video coding system.

APPENDIX A
PERTURBATION ANALYSIS OF PLUS

Assume that a nonsingular matrix $A \in \mathbb{R}^{n \times n}$ has a unique PLUS factorization, $A = LUS$. Let $\Delta A \in \mathbb{R}^{n \times n}$ be a perturbation such that $A + \Delta A$ also has a unique PLUS factorization:

$$A + \Delta A = (L + \Delta L)(U + \Delta U)(S + \Delta S) \qquad (A–1)$$

Note that $P$ is not considered, because the analysis of the sensitivity of the general PLUS factorization algorithm is simpler this way and without much loss of generality. The measure of $\Delta L$, $\Delta U$ and $\Delta S$ is deduced as follows.

We use the matrix-vector equation approach [25] to present the perturbation analysis for PLUS factorization. Compared with Chang's perturbation analysis for LU factorization:

1. The pattern of $\Delta U$ in PLUS factorization is different from that of LU factorization.

2. $\Delta S$ increases the analysis complexity.

We use the following notation for the perturbation analysis. $A(t)$ represents a matrix function, and $\dot A(t)$ represents its derivative. For any matrix $A \in \mathbb{R}^{n \times n}$, $A = [a_1, \cdots, a_n]$, where $a_i$ is the $i$th column vector of $A$, and $\mathrm{vec}(A) \in \mathbb{R}^{n^2 \times 1}$ is the vector form of $A$ with the $a_i$ stacked in succession. $\tilde L \in \mathbb{R}^{n(n-1)/2 \times 1}$ is composed of the elements in the nonzero locations of $\dot L(0)$, $\tilde U \in \mathbb{R}^{(n^2-n+2)/2 \times 1}$ is composed of the elements in the nonzero locations of $\dot U(0)$, and $\tilde S \in \mathbb{R}^{(n-1) \times 1}$ is composed of the elements in the nonzero locations of $\dot S(0)$. The uniqueness condition of PLUS factorization is:

Uniqueness Condition. $\det([a_n, a_1, a_2, \cdots, a_{k-1}]^{(k)}) \neq 0$ when $k > 1$, and $a_{1n} \neq 0$ when $k = 1$, where $A^{(k)}$ denotes the $k$-th leading sub-matrix of $A$.

Lemma 1. Assume that a nonsingular matrix $A \in \mathbb{R}^{n \times n}$ has a unique PLUS factorization, $A = LUS$. Let $\Delta A \in \mathbb{R}^{n \times n}$, $\Delta A = \varepsilon E$, where $\varepsilon$ is small enough that $A + tE$ satisfies the Uniqueness Condition for all $|t| \le \varepsilon$. Then $A + tE$ has a unique PLUS factorization:

$$A + tE = L(t)U(t)S(t), \quad |t| \le \varepsilon, \qquad (A–2)$$

which leads to

$$\dot L(0)US + L\dot U(0)S + LU\dot S(0) = E. \qquad (A–3)$$

For $t = \varepsilon$, we obtain $A + \Delta A$ with the unique PLUS factorization

$$A + \Delta A = (L + \Delta L)(U + \Delta U)(S + \Delta S), \qquad (A–4)$$

where $\Delta L$, $\Delta U$ and $\Delta S$ satisfy

$$\Delta L = \varepsilon \dot L(0) + O(\varepsilon^2) \qquad (A–5a)$$
$$\Delta U = \varepsilon \dot U(0) + O(\varepsilon^2) \qquad (A–5b)$$
$$\Delta S = \varepsilon \dot S(0) + O(\varepsilon^2) \qquad (A–5c)$$

This can easily be proved using Taylor expansion in a similar way as in [25]. Note that $L(0) = L$, $L(\varepsilon) = L + \Delta L$, $U(0) = U$, $U(\varepsilon) = U + \Delta U$, $S(0) = S$, and $S(\varepsilon) = S + \Delta S$.

From Equation (A–3), we obtain

$$e_i = \dot L(0)Us_i + L\dot U(0)s_i + LU\dot s(0)_i, \quad i = 1, \cdots, n. \qquad (A–6)$$

By rearranging Equation (A–6), we obtain a matrix-vector equation:

$$\begin{bmatrix} W_L & W_U & W_S \end{bmatrix} \begin{bmatrix} \tilde L \\ \tilde U \\ \tilde S \end{bmatrix} = \mathrm{vec}(E) \qquad (A–7)$$

where $W_L \in \mathbb{R}^{n^2 \times n(n-1)/2}$ is composed of $n \times (n-1)$ sub-matrices with the following pattern:

$$\begin{bmatrix} W_{L_{1,1}} & W_{L_{1,2}} & \cdots & W_{L_{1,n-2}} & W_{L_{1,n-1}} \\ W_{L_{2,1}} & W_{L_{2,2}} & \cdots & W_{L_{2,n-2}} & W_{L_{2,n-1}} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ W_{L_{n-1,1}} & W_{L_{n-1,2}} & \cdots & W_{L_{n-1,n-2}} & W_{L_{n-1,n-1}} \\ W_{L_{n,1}} & W_{L_{n,2}} & \cdots & W_{L_{n,n-2}} & W_{L_{n,n-1}} \end{bmatrix}$$

with $W_{L_{ij}} \in \mathbb{R}^{n \times (n-j)}$, $1 \le i, j \le n$, given by

$$W_{L_{ij}} = (s_i u_{jn} + u_{ji}) \begin{bmatrix} 0 \\ I_{n-j} \end{bmatrix}, \quad i < n, \qquad (A–8a)$$
$$W_{L_{ij}} = u_{jn} \begin{bmatrix} 0 \\ I_{n-j} \end{bmatrix}, \quad i = n. \qquad (A–8b)$$

$W_U \in \mathbb{R}^{n^2 \times (n^2-n+2)/2}$ is composed of $n \times (n-1)$ sub-matrices. Each sub-matrix has $n$ rows; the sub-matrices in the $i$th ($i = 1, \cdots, n-2$) column block have $i$ columns, and the sub-matrices in the $(n-1)$th column block have $n$ columns, with the following pattern:

$$\begin{bmatrix} 0_{n \times 1} & & & s_1 L \\ L1_{n \times 1} & & & s_2 L \\ & L1_{n \times 2} & & s_3 L \\ & & \ddots & \vdots \\ & & L1_{n \times (n-2)} & s_{n-1} L \\ & & & L \end{bmatrix}$$

$W_S \in \mathbb{R}^{n^2 \times (n-1)}$ is a block diagonal matrix composed of identical sub-matrices $Lu_n \in \mathbb{R}^{n \times 1}$, with the following pattern:

$$\begin{bmatrix} Lu_n & & & \\ & Lu_n & & \\ & & \ddots & \\ & & & Lu_n \end{bmatrix}$$

Obviously, by elementary matrix transformations, $[W_L, W_U, W_S]$ can be transformed into a lower triangular matrix with $(\underbrace{a_{1n}, u_1, \cdots, u_1}_{n}, \underbrace{1, u_2, \cdots, u_2}_{n}, \underbrace{1, 1, u_3, \cdots, u_3}_{n}, \cdots, \underbrace{1, \cdots, 1, u_{n-1}}_{n}, \underbrace{1, \cdots, 1}_{n})$ as the diagonal entries. Therefore, $W = [W_L, W_U, W_S]$ is invertible. Let

$$W^{-1} = \begin{bmatrix} Y_L \\ Y_U \\ Y_S \end{bmatrix};$$

then we obtain

$$\begin{bmatrix} \tilde L \\ \tilde U \\ \tilde S \end{bmatrix} = W^{-1} \mathrm{vec}(E) = \begin{bmatrix} Y_L \\ Y_U \\ Y_S \end{bmatrix} \mathrm{vec}(E) \qquad (A–9)$$

$$\|\dot L(0)\|_F = \|\tilde L\|_2 \le \|Y_L\|_F \|\mathrm{vec}(E)\|_2 = \|Y_L\|_F \|E\|_F \qquad (A–10a)$$
$$\|\dot U(0)\|_F = \|\tilde U\|_2 \le \|Y_U\|_F \|\mathrm{vec}(E)\|_2 = \|Y_U\|_F \|E\|_F \qquad (A–10b)$$
$$\|\dot S(0)\|_F = \|\tilde S\|_2 \le \|Y_S\|_F \|\mathrm{vec}(E)\|_2 = \|Y_S\|_F \|E\|_F \qquad (A–10c)$$

$$\frac{\|\Delta L\|_F}{\|L\|_F} \le \frac{\|Y_L\|_F \|E\|_F}{\|L\|_F} \varepsilon + O(\varepsilon^2) = \kappa_L(A) \frac{\|\Delta A\|_F}{\|A\|_F} + O(\varepsilon^2) \qquad (A–11a)$$
$$\frac{\|\Delta U\|_F}{\|U\|_F} \le \frac{\|Y_U\|_F \|E\|_F}{\|U\|_F} \varepsilon + O(\varepsilon^2) = \kappa_U(A) \frac{\|\Delta A\|_F}{\|A\|_F} + O(\varepsilon^2) \qquad (A–11b)$$
$$\frac{\|\Delta S\|_F}{\|S\|_F} \le \frac{\|Y_S\|_F \|E\|_F}{\|S\|_F} \varepsilon + O(\varepsilon^2) = \kappa_S(A) \frac{\|\Delta A\|_F}{\|A\|_F} + O(\varepsilon^2) \qquad (A–11c)$$

where

$$\kappa_L(A) = \|Y_L\|_F \|A\|_F / \|L\|_F$$
$$\kappa_U(A) = \|Y_U\|_F \|A\|_F / \|U\|_F$$
$$\kappa_S(A) = \|Y_S\|_F \|A\|_F / \|S\|_F$$

Theorem 3 (Perturbation analysis II). Assume that $A$ and $A + \Delta A$ are both nonsingular and their PLUS factorizations exist: $A = LUS$ and $A + \Delta A = (L + \Delta L)(U + \Delta U)(S + \Delta S)$. Then Equations (A–9), (A–10a)–(A–10c) and (A–11a)–(A–11c) hold.

Perturbation analysis example:

The original PLUS factorization is:

$$A = LUS = \begin{bmatrix} 0.8913 & 0.4565 \\ 0.7621 & 0.0185 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0.7665 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0.4565 \\ 0 & -0.3314 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -0.2381 & 1 \end{bmatrix}$$

With a perturbation $\Delta A$, the PLUS factorization is:

$$A + \Delta A = \begin{bmatrix} 0.8913 & 0.4565 \\ 0.7621 & 0.0185 \end{bmatrix} + \begin{bmatrix} 0.009 & 0.001 \\ 0.007 & 0.004 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0.7746 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0.4575 \\ 0 & -0.3316 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -0.2179 & 1 \end{bmatrix}$$

Therefore, the perturbation error of PLUS factorization is limited.
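The example can be verified numerically. The following Python/NumPy check (ours, not part of the original analysis) confirms that both factorizations reproduce their matrices up to the printed rounding.

import numpy as np

L = np.array([[1, 0], [0.7665, 1]])
U = np.array([[1, 0.4565], [0, -0.3314]])
S = np.array([[1, 0], [-0.2381, 1]])
A = np.array([[0.8913, 0.4565], [0.7621, 0.0185]])
print(np.allclose(L @ U @ S, A, atol=1e-4))      # True

Lp = np.array([[1, 0], [0.7746, 1]])
Up = np.array([[1, 0.4575], [0, -0.3316]])
Sp = np.array([[1, 0], [-0.2179, 1]])
Ap = A + np.array([[0.009, 0.001], [0.007, 0.004]])
print(np.allclose(Lp @ Up @ Sp, Ap, atol=1e-3))  # True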

APPENDIX B
PROOFS

B.1 Proof of Proposition 3.1

Proof. The Lloyd-Max quantizer solves:

$$\min_{r'_k, t'_k} \sum_{k=1}^{N} \int_{t'_{k-1}}^{t'_k} (x - r'_k)^2 f_X(x)\,dx \qquad (B–1)$$

with $\{t_k\}_{k=0}^{N}$ and $\{r_k\}_{k=1}^{N}$ as the solution. For $Y = \sigma X + \mu$,

$$\min_{r''_k, t''_k} \sum_{k=1}^{N} \int_{t''_{k-1}}^{t''_k} (y - r''_k)^2 f_Y(y)\,dy = \min \sum_{k=1}^{N} \int_{\frac{t''_{k-1}-\mu}{\sigma}}^{\frac{t''_k-\mu}{\sigma}} (\sigma x + \mu - r''_k)^2 f_X(x)\,dx = \min \sum_{k=1}^{N} \sigma^2 \int_{\frac{t''_{k-1}-\mu}{\sigma}}^{\frac{t''_k-\mu}{\sigma}} \left(x - \frac{r''_k - \mu}{\sigma}\right)^2 f_X(x)\,dx \qquad (B–2)$$

Eq. (B–2) is minimal if and only if $\frac{t''_k - \mu}{\sigma} = t_k$ and $\frac{r''_k - \mu}{\sigma} = r_k$; i.e., $\{\sigma t_k + \mu\}_{k=0}^{N}$ and $\{\sigma r_k + \mu\}_{k=1}^{N}$ is the solution to Eq. (B–2).

B.2 Proof of Proposition 3.2

Proof. $F_X^{-1}(y)$ should be well defined as $F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}$, $0 < y < 1$. Then

$$P(Y \le y) = P(F_X(X) \le y) = P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y. \qquad (B–3)$$

Thus, Y is uniformly distributed.

Assume that $X$ has a finite support or a truncated support $(a, b)$, and $f_X(x) > 0$ on the support. By using Bennett's [11] approximate expression for the mean square distortion for a very large number $N$ of quantizer output levels, we have:

$$E[(X - \hat X)^2] \cong \frac{1}{12N^2} \int_a^b \frac{1}{f_X(x)}\,dx \qquad (B–4)$$

This is bounded by

$$\frac{1}{12N^2} (b - a) \max_{(a,b)} \left\{ \frac{1}{f_X(x)} \right\}$$

Therefore, Eq. (B–4) tends to 0 as $N \to \infty$.

B.3 Proof of Proposition 4.1

Proof. For one-dimensional r.v.s, $\Sigma = \sigma^2$, and the Lloyd-Max quantizer solves:

$$\min_{r'_k, t'_k} \sum_{k=1}^{N} \int_{t'_{k-1}}^{t'_k} (x - r'_k)^2 f_X(x)\,dx \qquad (B–5)$$

with $\{t_k\}_{k=0}^{N}$ and $\{r_k\}_{k=1}^{N}$ as the solution. For $Y = \sigma X + \mu$,

$$\min_{r''_k, t''_k} \sum_{k=1}^{N} \int_{t''_{k-1}}^{t''_k} (y - r''_k)^2 f_Y(y)\,dy = \min \sum_{k=1}^{N} \sigma^2 \int_{\frac{t''_{k-1}-\mu}{\sigma}}^{\frac{t''_k-\mu}{\sigma}} \left(x - \frac{r''_k - \mu}{\sigma}\right)^2 f_X(x)\,dx \qquad (B–6)$$

Eq. (B–6) is minimal if and only if $\frac{t''_k - \mu}{\sigma} = t_k$ and $\frac{r''_k - \mu}{\sigma} = r_k$; i.e., $\{\sigma t_k + \mu\}_{k=0}^{N}$ and $\{\sigma r_k + \mu\}_{k=1}^{N}$ is the solution to Eq. (B–6). Similarly, it holds for $n$-dimensional r.v.s.

B.4 Proof of Lemma 2

Proof. A unitary transform $U$ satisfies $U^T U = U U^T = I$. For vectors $x$ and $y$, the Euclidean distance between them is $\|x - y\|$. After transformation by $U$, the Euclidean distance between $Ux$ and $Uy$ is $\|Ux - Uy\| = \|U(x - y)\| = \|x - y\|$. Thus, the Mean Square Error is invariant under unitary transforms.

B.5 Proof of Lemma 4

Proof. Denote the components of the random vector $X$ as $(X_1, X_2)$ and a unitary transform as $U$. After the transform, the random vector becomes $X' = (X'_1, X'_2)$. Neglecting the precision loss introduced by the computer, $U$ is a one-to-one mapping. Therefore, $H(X_1, X_2) = H(X'_1, X'_2)$. If $X'_1$ and $X'_2$ are independent, then $H(X'_1) + H(X'_2) = H(X'_1, X'_2) = H(X_1, X_2) \le H(X_1) + H(X_2)$.

B.6 Proof of Lemma 5

Proof. Denote the variances of the components of an $n$-dimensional Gaussian vector $X = (X_1, X_2, \cdots, X_n)$ as $(\sigma_1^2, \sigma_2^2, \cdots, \sigma_n^2)$. After a volume-keeping scaling transform, the resulting Gaussian vector $X'$ has variances $(\sigma_1'^2, \sigma_2'^2, \cdots, \sigma_n'^2)$ for each component, with $\prod_{i=1}^n \sigma_i'^2 = \prod_{i=1}^n \sigma_i^2$. Therefore, $\sum_{i=1}^n H(X_i) = \sum_{i=1}^n \frac{1}{2}\ln(2\pi e \sigma_i^2) = \frac{1}{2}\ln\big((2\pi e)^n \prod_{i=1}^n \sigma_i^2\big) = \frac{1}{2}\ln\big((2\pi e)^n \prod_{i=1}^n \sigma_i'^2\big) = \sum_{i=1}^n H(X_i')$. Similarly, it holds for random vectors with a Laplacian distribution.

REFERENCES

[1] “YUV Video Sequences.” http://trace.eas.asu.edu/yuv/index.html, 2005.

[2] “Image Hashing Toolbox in Matlab.” http://users.ece.utexas.edu/~bevans/projects/hashing/toolbox/index.html, 2006.

[3] “Test video sequence.” http://www.bigbuckbunny.org, 2008.

[4] Achanta, R., Hemami, S., Estrada, F., and Susstrunk, S. “Frequency-tuned salient region detection.” Proceedings of IEEE Conference on CVPR ’09. IEEE, 2009, 1597–1604.

[5] Adams, M.D. “Generalized reversible integer-to-integer transform framework.” Proceedings of IEEE Pacific Rim Conference on Communications, Computers and signal Processing, 2003. vol. 2. IEEE, 2003, 569–572.

[6] Adams, M.D. and Kossentini, F. “On the relationship between the overlapping rounding transform and lifting frameworks for reversible subband transforms.” IEEE Transactions on Signal Processing 48 (2002).1: 261–266.

[7] Ahmed, N., Natarajan, T., and Rao, K.R. “Discrete cosine transform.” IEEE Transactions on Computers 100 (2006).1: 90–93.

[8] Aubert, G. and Kornprobst, P. “Mathematical problems in image processing.” (2006).

[9] Barnes, E.S. and Sloane, N.J. “Optimal lattice quantizer in three dimensions.” SIAM Journal of algebraic discrete math. 4 (1983).1: 30–41.

[10] Bas, P. and Chassery, J.M. “A survey on attacks in image and video watermarking.” Applications of digital image processing XXV: 8-10, Seattle, Washington, USA (2002): 169.

[11] Bennett, W.R. “Spectra of quantized signals.” Bell Syst. Tech. J 27 (1948).3: 446–472. [12] Bezdek, J.C. Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers Norwell, MA, USA, 1981.

[13] Bhattacharjee, S. and Kutter, M. “Compression tolerant image authentication.” Proceedings of IEEE International Conference on Image Processing, 1998. vol. 1. 1998.

[14] Bhattacharjee, S.K. and Vandergheynst, P. “End-stopped wavelets for detecting low-level features.” Proceedings of SPIE. vol. 3813. 1999, 732.

[15] Bishop, C.M. Pattern recognition and machine learning, vol. 4. Springer New York, 2006.

[16] Bjontegaard, G. “Calculation of average PSNR differences between RD-curves (VCEG-M33).” Proceedings of VCEG Meeting (ITU-T SG16 Q.6). 2001, 2–4.

[17] Boneh, D. and Shaw, J. “Collusion-secure fingerprinting for digital data.” Advances in Cryptology CRYPT095 (1995): 452–465.

[18] Booth, D.A. and Freeman, R.P.J. “Discriminative measurement of feature integration in object recognition.” Acta Psychologica 84 (1993): 1–16.

[19] Boschetti, A., Adami, N., Leonardi, R., and Okuda, M. “High dynamic range image tone mapping based on local histogram equalization.” Proceedings of IEEE International Conference on Multimedia and Expo (ICME) 2010. IEEE, 2010, 1130–1135.

[20] Bryc, W. The normal distribution: characterizations with applications. Springer-Verlag, 1995.

[21] Bucklew, J. and Gallagher Jr, N. “Quantization schemes for bivariate Gaussian random variables.” IEEE Transactions on Information Theory 25 (1979).5: 537–543. [22] ———. “Two-dimensional quantization of bivariate circularly symmetric densities.” IEEE Transactions on Information Theory 25 (1979).6: 667–671.

[23] Celik, M.U., Sharma, G., Tekalp, A.M., and Saber, E. “Reversible data hiding.” Proceedings of IEEE International Conference on Image Processing, 2002. vol. 2. Citeseer, 2002, 157–160.

[24] Cha, B.H. and Kuo, C.C.J. “Design and analysis of high-capacity anti-collusion hiding codes.” Circuits, Systems, and Signal Processing 27 (2008).2: 195–211.

[25] Chang, X.W. and Paige, C.C. “On the sensitivity of the LU factorization.” BIT Numerical Mathematics 38 (1998).3: 486–501.

[26] Chao, H., Fisher, P., and Hua, Z. “An approach to integer wavelet transformations for lossless image compression.” Advances in computational mathematics: proceedings of the Guangzhou international symposium. CRC, 1999, 19. [27] Chen, Q. and Fischer, T.R. “Robust quantization for image coding and noisy digital transmission.” Proceedings on Data Compression Conference, 1996. IEEE, 1996, 3–12. [28] Chen, Y., Hao, P., and Yu, J. “Shear-resize factorizations for fast multimodal volume registration.” Proceedings of IEEE International Conference on Image Processing, 2004. vol. 2. IEEE, 2005, 1085–1088. [29] Chen, Y., Hao, P., and Zhang, C. “Shear-resize factorizations for fast image registration.” Proceedings of IEEE International Conference on Image Processing, 2005. vol. 3. Citeseer, 2005, 1120–1123.

[30] Chen, Y.J., Oraintara, S., and Nguyen, T. “Video compression using integer DCT.” Proceedings of International Conference on Image Processing, 2000. vol. 2. IEEE, 2002, 844–845.

[31] ———. “Video compression using integer DCT.” Proceedings of IEEE International Conference on Image Processing, 2000. vol. 2. IEEE, 2002, 844–845.

[32] Chen, Zhenzhong and Ngan, King Ngi. “Recent advances in rate control for video coding.” Image Commun. 22 (2007): 19–38.

[33] Cheng, H. “A review of video registration methods for watermark detection in digital cinema applications.” Proceedings of IEEE International Symposium on Circuits and Systems’04.. vol. 5. 2004, 704–707.

[34] Cheng, L.Z., Zhong, G.J., and Luo, J.S. “New family of lapped biorthogonal transform via lifting steps.” Proceedings of IEE conference on Vision, Image and Signal Processing. vol. 149. IET, 2002, 91–96.

[35] Chiu, K., Herf, M., Shirley, P., Swamy, S., Wang, C., and Zimmerman, K. “Spatially nonuniform scaling functions for high contrast images.” Graphics interface. Citeseer, 1993, 245–245.

[36] Choi, M. and Abidi, A.A. “A 6-b 1.3-Gsample/s A/D converter in 0.35-µm CMOS.” IEEE Journal of Solid-State Circuits 36 (2001).12: 1847–1858. [37] Cignoni, P., Rocchini, C., and Scopigno, R. “Metro: measuring error on simplified surfaces.” Computer Graphics Forum. vol. 17. 1998, 167–174.

[38] Condat, L. and Van De Ville, D. “Fully reversible image rotation by 1-D filtering.” Proceedings of the 15th IEEE International Conference on Image Processing, 2008. IEEE, 2008, 913–916.

[39] Cover, T.M. and Thomas, J.A. Elements of information theory. Wiley, 2006.

[40] Cox, I.J. Digital watermarking and steganography. Morgan Kaufmann, 2008. [41] Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. “Secure spread spectrum watermarking for multimedia.” IEEE transactions on image processing 6 (1997).12: 1673–1687.

[42] Daemen, J. and Rijmen, V. The design of Rijndael: AES–the advanced encryption standard. Springer Verlag, 2002.

[43] Datta, R., Joshi, D., Li, J., and Wang, J.Z. “Image retrieval: Ideas, influences, and trends of the new age.” ACM Computing Surveys (CSUR) 40 (2008).2: 1–60.

[44] Daubechies, I. and Sweldens, W. “Factoring wavelet transforms into lifting steps.” Journal of Fourier analysis and applications 4 (1998).3: 247–269.

[45] Debevec, P. and McMillan, L. “Image-based modeling, rendering, and lighting.” IEEE Journal of Computer Graphics and Applications 22 (2002).2: 24–25.

[46] Debevec, P.E. and Malik, J. “Recovering high dynamic range radiance maps from photographs.” ACM SIGGRAPH 2008 classes. ACM, 2008, 1–10.

[47] Devlin, K., Chalmers, A., Wilkie, A., and Purgathofer, W. “Tone reproduction and physically based spectral rendering.” Eurographics 2002: State of the Art Reports (2002): 101–123.

[48] Di Wu, X.N. “A self-synchronized image hash algorithm.” Proceedings of 2010 International Conference on Communications and Mobile Computing. IEEE, 2010, 13–15.

[49] DiCarlo, J. and Wandell, B. “Rendering high dynamic range images.” Proceedings of the SPIE: Image Sensors. vol. 3965. SPIE, 2000, 392–401.

[50] Drago, F., Myszkowski, K., Annen, T., and Chiba, N. “Adaptive logarithmic mapping for displaying high contrast scenes.” Computer Graphics Forum. vol. 22. Wiley Online Library, 2003, 419–426.

[51] Dufaux, F., Sullivan, G.J., and Ebrahimi, T. “The JPEG XR image coding standard.” IEEE Signal Processing Magazine 26 (2009).6: 195–199.

[52] Dunn, J.C. “A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters.” Journal of Cybernetics 3 (1973).3: 32–57.

[53] Fridrich, J. “Image watermarking for tamper detection.” Proceedings of IEEE International Conference on Image Processing, 1998. vol. 2. 1998.

[54] ———. “Methods for tamper detection in digital images.” Multimedia and Security (1999): 29.

[55] ———. “Visual hash for oblivious watermarking.” Proceedings of SPIE, the International Society for Optical Engineering. vol. 3971. 2000, 286–294.

[56] Furht, B. “A survey of multimedia compression techniques and standards. Part I: JPEG standard.” Real-Time Imaging 1 (1995).1: 49–67.

[57] Gersho, A. “Asymptotically optimal block quantization.” IEEE Transactions on Information Theory 25 (1979).4: 373–380.

[58] Gersho, A. and Gray, R.M. Vector quantization and signal compression. Kluwer, 1993.

[59] Glover, F. “Tabu search: A tutorial.” Interfaces 20 (1990).4: 74–94.

[60] Graf, S. and Luschgy, H. Foundations of quantization for probability distributions. Springer-Verlag New York, Inc. Secaucus, NJ, USA, 2000.

[61] Gray, Robert M. and Neuhoff, David L. “Quantization.” IEEE Transactions on Information Theory 44 (1998).6: 2325–2383.

[62] Gulati, K. and Lee, H.S. “A low-power reconfigurable analog-to-digital converter.” IEEE Journal of Solid-State Circuits 36 (2001).12: 1900–1911.

[63] Hao, P. “Customizable triangular factorizations of matrices.” Linear Algebra and its Applications 382 (2004): 135–154.

[64] Hao, P. and Shi, Q. “Matrix factorizations for reversible integer mapping.” IEEE Transactions on Signal Processing 49 (2001).10: 2314–2324.

[65] Hossack, D. and Sewell, J.I. “A robust CMOS compander.” IEEE Journal of Solid-State Circuits 33 (1998).7: 1059–1064.

[66] Huang, J. and Schultheiss, P. “Block quantization of correlated Gaussian random variables.” IEEE Transactions on Communications Systems 11 (1963).3: 289–296.

[67] Iwahashi, M., Fukuma, S., Chokchaitam, S., and Kambayashi, N. “Lossless/lossy progressive coding based on reversible wavelet and lossless multi-channel prediction.” Proceedings of IEEE International Conference on Image Processing, 1999. vol. 1. IEEE, 1999, 430–434.

[68] Jain, A.K. Fundamentals of digital image processing. Prentice-Hall, Inc. Upper Saddle River, NJ, USA, 1989.

[69] Jobson, D.J., Rahman, Z., and Woodell, G.A. “A multiscale retinex for bridging the gap between color images and the human observation of scenes.” IEEE Transactions on Image Processing 6 (1997).7: 965–976.

[70] Johns, D.A. and Martin, K. Analog integrated circuit design. Wiley India Pvt. Ltd., 2008.

[71] Joshi, R., Reznik, Y.A., and Karczewicz, M. “Efficient large size transforms for high-performance video coding.” Proceedings of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. vol. 7798. 2010, 24.

[72] Jung, H.Y. and Prost, R. “Lossless subband coding system based on rounding transform.” IEEE Transactions on Signal Processing 46 (1998).9: 2535–2540.

[73] Katznelson, Y. An introduction to harmonic analysis. Cambridge University Press, 2004.

[74] Kautsky, Jaroslav, Zitová, Barbara, Flusser, Jan, and Peters, Gabriele. “Feature point detection in blurred images.” IVCNZ. 1998, 103–108.

[75] Kazakos, D. and Makki, S.K. “Piecewise linear companders are robust.” Proceedings of the 12th Biennial IEEE Conference on Electromagnetic Field Computation, 2006. IEEE, 2006, 462.

[76] Keys, R. “Cubic convolution interpolation for digital image processing.” IEEE Transactions on Acoustics, Speech and Signal Processing 29 (1981).6: 1153–1160.

[77] Khan, I.R., Huang, Z., Farbiz, F., and Manders, C.M. “HDR image tone mapping using histogram adjustment adapted to human visual system.” Proceedings of the 7th International Conference on Information, Communications and Signal Processing, 2009. IEEE, 2009, 1–5.

[78] Kiely, A. and Klimesh, M. “The ICER progressive wavelet image compressor.” IPN Progress Report 42 (2003).155: 1–46.

[79] Knudsen, L. and Preneel, B. “Hash functions based on block ciphers and quaternary codes.” Advances in Cryptology - ASIACRYPT ’96. Springer, 1996, 77–90.

[80] Kozat, S.S., Venkatesan, R., and Mihcak, M.K. “Robust perceptual image hashing via matrix invariants.” Proceedings of IEEE International Conference on Image Processing, 2004. vol. 5. 2004.

[81] Le Gall, D. “MPEG: A video compression standard for multimedia applications.” Communications of the ACM 34 (1991).4: 46–58.

[82] Lee, S.J. and Jung, S.H. “A survey of watermarking techniques applied to multimedia.” Proceedings of IEEE International Symposium on Industrial Electronics, 2001. vol. 1. IEEE, 2001, 272–277.

[83] Levina, E. and Bickel, P. “The earth mover’s distance is the Mallows distance: Some insights from statistics.” Proceedings of IEEE International Conference on Computer Vision, 2001. vol. 2. IEEE, 2001, 251–256.

[84] Li, Y., Sharan, L., and Adelson, E.H. “Compressing and companding high dynamic range images with subband architectures.” ACM Transactions on Graphics (TOG). vol. 24. ACM, 2005, 836–844.

[85] Liang, J. and Tran, T.D. “Fast multiplierless approximations of the DCT with the lifting scheme.” IEEE Transactions on Signal Processing 49 (2001).12: 3032–3044.

[86] Lin, C.Y. and Chang, S.F. “A robust image authentication method distinguishing JPEG compression from malicious manipulation.” IEEE Transactions on Circuits and Systems for Video Technology 11 (2001).2: 153–168.

[87] Lin, S. and Costello, D.J. Error control coding: fundamentals and applications. Prentice-Hall, Englewood Cliffs, NJ, 1983.

[88] Lowe, D.G. “Object recognition from local scale-invariant features.” Proceedings of IEEE International Conference on Computer Vision, 1999. IEEE Computer Society, 1999, 1150.

[89] ———. “Distinctive image features from scale-invariant keypoints.” International Journal of Computer Vision 60 (2004).2: 91–110.

[90] Lucas, B.D. and Kanade, T. “An iterative image registration technique with an application to stereo vision.” Proceedings of the International Joint Conference on Artificial Intelligence. vol. 3. 1981, 674–679.

[91] Mallat, S. and Hwang, W.L. “Singularity detection and processing with wavelets.” IEEE Transactions on Information Theory 38 (1992).2 Part 2: 617–643.

[92] Mallat, S.G. A wavelet tour of signal processing. Academic Press, New York, 1999.

[93] Malvar, H.S. Signal processing with lapped transforms. Artech House, Inc. Norwood, MA, USA, 1992.

[94] ———. “Lapped biorthogonal transforms for transform coding with reduced blocking and ringing artifacts.” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997. vol. 3. IEEE, 1997, 2421–2424.

[95] Malvar, H.S. and Staelin, D.H. “The LOT: Transform coding without blocking effects.” IEEE Transactions on Acoustics, Speech and Signal Processing 37 (1989).4: 553–559.

[96] Marpe, D., Schwarz, H., Bosse, S., Bross, B., Helle, P., Hinz, T., Kirchhoffer, H., Lakshman, H., Nguyen, T., Oudin, S., et al. “Video compression using nested quadtree structures, leaf merging, and improved techniques for motion representation and entropy coding.” IEEE Transactions on Circuits and Systems for Video Technology 20 (2010): 1676–1687.

[97] Max, J. “Quantizing for minimum distortion.” IRE Transactions on Information Theory 6 (1960).1: 7–12.

[98] Merritt, Loren and Vanam, Rahul. “Improved rate control and motion estimation for H.264 encoder.” Proceedings of the IEEE ICIP 2007. 2007, 309–312.

[99] Monga, V. and Evans, B.L. “Perceptual image hashing via feature points: performance evaluation and tradeoffs.” IEEE Transactions on Image Processing 15 (2006).11: 3452–3465.

[100] Monga, V. and Mihcak, M.K. “Robust image hashing via non-negative matrix factorizations.” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. vol. 2. 2006.

[101] Monga, V., Vats, D., and Evans, B.L. “Image authentication under geometric attacks via structure matching.” Proceedings of IEEE International Conference on Multimedia and Expo, 2005. 2005, 229–232.

[102] Moon, T.K. Error correction coding: mathematical methods and algorithms. Wiley-Blackwell, 2005.

[103] CAS Standards Committee of the IEEE Circuits and Systems Society. “IEEE Standard Specifications for the Implementations of 8x8 Inverse Discrete Cosine Transform.” IEEE Std (1990).

[104] Oraintara, S., Chen, Y.J., and Nguyen, T.Q. “Integer fast Fourier transform.” IEEE Transactions on Signal Processing 50 (2002).3: 607–618.

[105] Ou, Y. and Rhee, K.H. “A survey on image hashing for image authentication.” IEICE Transactions on Information and Systems 93 (2010).5: 1020–1030.

[106] Pateux, S. and Jung, J. “An Excel add-in for computing Bjontegaard metric and its evolution.” VCEG document VCEG-AE07, 31st VCEG Meeting, Marrakech, Morocco. 2007, 15–16.

[107] Pattanaik, S.N., Ferwerda, J.A., Fairchild, M.D., and Greenberg, D.P. “A multiscale model of adaptation and spatial vision for realistic image display.” Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1998, 287–298.

[108] Pearlman, W. “Polar quantization of a complex Gaussian random variable.” IEEE Transactions on Communications 27 (1979).6: 892–899.

[109] Perić, Z.H., Milanović, O.D., and Jovanović, A.Ž. “Optimal companding vector quantization for circularly symmetric sources.” Information Sciences 178 (2008).22: 4375–4381.

[110] Plonka, G. “A global method for invertible integer DCT and integer wavelet algorithms.” Applied and Computational Harmonic Analysis 16 (2004).2: 90–110.

[111] Pohlmann, K.C. Principles of digital audio. McGraw-Hill/TAB Electronics, 2005.

[112] Potdar, V.M., Han, S., and Chang, E. “A survey of digital image watermarking techniques.” Proceedings of the 3rd IEEE International Conference on Industrial Informatics (INDIN’05). IEEE, 2005, 709–716.

[113] Rane, S.D. and Sapiro, G. “Evaluation of JPEG-LS, the new lossless and controlled-lossy still image compression standard, for compression of high-resolution elevation data.” IEEE Transactions on Geoscience and Remote Sensing 39 (2001).10: 2298–2306.

[114] Rao, K.R. and Yip, P. Discrete cosine transform: algorithms, advantages, applications. Academic Press, London, 1990.

[115] Ravelli, E. and Daudet, L. “Embedded polar quantization.” IEEE Signal Processing Letters 14 (2007).10: 657–660.

[116] ITU-T. “Information technology - digital compression and coding of continuous-tone still images - requirements and guidelines.” Recommendation T.81. 1993.

[117] Rey, C. and Dugelay, J.L. “A survey of watermarking algorithms for image authentication.” EURASIP Journal on Applied Signal Processing 2002 (2002).1: 613–621.

[118] Rheingold, H. Smart mobs: The next social revolution. Basic Books, 2003.

[119] Richardson, I.E.G. H.264 and MPEG-4 video compression. Wiley Online Library, 2003.

[120] Roy, S. and Sun, Q. “Robust hash for detecting and localizing image tampering.” Proceedings of IEEE International Conference on Image Processing, 2007. vol. 6. 2007.

[121] Rubner, Y., Tomasi, C., and Guibas, L.J. “A metric for distributions with applications to image databases.” Proceedings of the Sixth International Conference on Computer Vision, 1998. 1998, 59–66.

[122] Savage, C. “A survey of combinatorial Gray codes.” SIAM Review 39 (1997).4: 605–629.

[123] Senecal, J., Duchaineau, M., and Joy, K.I. “Reversible n-bit to n-bit integer Haar-like transforms.” Proceedings of the Data Compression Conference, 2004. IEEE, 2004, 564.

[124] Senecal, J.G., Lindstrom, P., Duchaineau, M., and Joy, K. “An improved n-bit to n-bit reversible integer Haar-like transform.” Proceedings of the 12th Pacific Conference on Computer Graphics and Applications. IEEE Computer Society, 2004, 371–380.

[125] She, Y. and Hao, P. “On the necessity and sufficiency of PLUS factorizations.” Linear Algebra and its Applications 400 (2005): 193–202.

[126] She, Y., Hao, P., and Paker, Y. “Block TERM factorization of uniform block matrices.” Science in China (Series F) 47 (2004).4: 421–436.

[127] ———. “Matrix factorizations for parallel integer transformation.” IEEE Transactions on Signal Processing 54 (2006).12: 4675–4684.

[128] Sheng, F., Bilgin, A., Sementilli, P.J., and Marcellin, M.W. “Lossy and lossless image compression using reversible integer wavelet transforms.” Proceedings of IEEE International Conference on Image Processing, 1998. IEEE, 1998, 876–880.

[129] Shi, J. and Tomasi, C. “Good features to track.” Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994. 1994, 593–600.

[130] Skala, V. and Kucha, M. “The hash function and the principle of duality.” Proceedings of Conference on Computer Graphics International 2001. 2001, 167–174.

[131] Srinivasan, S. “Modulo transforms - an alternative to lifting.” IEEE Transactions on Signal Processing 54 (2006).5: 1864–1874.

[132] Stockham, T.G. “Image processing in the context of a visual model.” Proceedings of the IEEE 60 (1972).7: 828–842.

[133] Strang, G. “Every unit matrix is a LULU.” Linear Algebra and Its Applications 265 (1997).1-3: 165–172.

[134] Strang, G. and Nguyen, T. Wavelets and filter banks. Wellesley-Cambridge Press, 1996.

[135] Swaszek, P.F. and Thomas, J.B. “Optimal circularly symmetric quantizers.” Journal of the Franklin Institute 313 (1982).6: 373–384.

[136] Sweldens, W. “Lifting scheme: a new philosophy in biorthogonal wavelet constructions.” Proceedings of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. vol. 2569. SPIE, 1995, 68–79.

[137] ———. “The lifting scheme: A construction of second generation wavelets.” SIAM Journal on Mathematical Analysis 29 (1998).2: 511.

[138] Tagliasacchi, M., Valenzise, G., and Tubaro, S. “Hash-based identification of sparse image tampering.” IEEE Transactions on Image Processing 18 (2009).11: 2491–2504.

[139] Tardos, G. “Optimal probabilistic fingerprint codes.” Journal of the ACM (JACM) 55 (2008).2: 1–24.

[140] Tian, J. “Wavelet-based reversible watermarking for authentication.” Proceedings of SPIE Security and Watermarking of Multimedia Contents IV. vol. 4675. 2002, 679–690.

[141] Toffoli, T. “Almost every unit matrix is a ULU.” Linear Algebra and Its Applications 259 (1997): 31–38.

[142] Tomasi, C. and Kanade, T. “Detection and tracking of point features.” International Journal of Computer Vision. 1991, 590–597.

[143] Tran, T.D., Liang, J., and Tu, C. “Lapped transform via time-domain pre- and post-filtering.” IEEE Transactions on Signal Processing 51 (2003).6: 1557–1571.

[144] Trappe, W., Wu, M., Wang, Z.J., and Liu, K.J.R. “Anti-collusion fingerprinting for multimedia.” IEEE Transactions on Signal Processing 51 (2003).4: 1069–1087.

[145] Tu, C., Srinivasan, S., Sullivan, G.J., Regunathan, S., and Malvar, H.S. “Low-complexity hierarchical lapped transform for lossy-to-lossless image coding in JPEG XR/HD Photo.” Proceedings of SPIE. vol. 7073. SPIE, 2008, 70730C.

[146] Tuytelaars, Tinne and Mikolajczyk, Krystian. “Local invariant feature detectors: A survey.” Foundations and Trends in Computer Graphics and Vision 3 (2007).3: 177–280.

[147] Venkatesan, R., Koon, S.M., Jakubowski, M.H., and Moulin, P. “Robust image hashing.” Proceedings of IEEE International Conference on Image Processing, 2000. vol. 3. 2000.

[148] Vetrivel, S. and Suba, K. “An overview of H.26x series and its applications.” International Journal of Engineering Science and Technology 2 (2010): 4622–4631.

[149] Wang, L. “Intelligent optimization algorithms with applications.” Tsinghua University & Springer Press, Beijing (2001).

[150] Wang, Z., Bovik, A.C., Sheikh, H.R., and Simoncelli, E.P. “Image quality assessment: From error visibility to structural similarity.” IEEE Transactions on Image Processing 13 (2004).4: 600–612.

[151] Wang, Z.J., Wu, M., Trappe, W., and Liu, K.J.R. “Anti-collusion of group-oriented fingerprinting.” Proceedings of IEEE International Conference on Multimedia and Expo (ICME’03). vol. 2. 2003.

[152] Wiegand, T., Sullivan, G.J., Bjontegaard, G., and Luthra, A. “Overview of the H.264/AVC video coding standard.” IEEE Transactions on Circuits and Systems for Video Technology 13 (2003).7: 560–576.

[153] Wilson, S. “Magnitude/phase quantization of independent Gaussian variates.” IEEE Transactions on Communications 28 (1980).11: 1924–1929.

[154] Wong, P.W. and Memon, N. “Secret and public key image watermarking schemes for image authentication and ownership verification.” IEEE Transactions on Image Processing 10 (2001).10: 1593–1601.

[155] Wu, M., Wang, Z., and Liu, K.J.R. “Anti-collusion fingerprinting for multimedia.” IEEE Transactions on Signal Processing 51 (2003): 1069–1087.

[156] Xu, J., Yang, L., and Wu, D. “Ripplet: A new transform for image processing.” Journal of Visual Communication and Image Representation (2010).

[157] Xuan, G., Zhang, W., and Chai, P. “EM algorithms of Gaussian mixture model and hidden Markov model.” Proceedings of International Conference on Image Processing, 2001. vol. 1. IEEE, 2001, 145–148.

[158] Yang, L., Chen, Q., Tian, J., and Wu, D. “Robust track-and-trace video watermarking.” Security and Communication Networks, Wiley (2010).

[159] Yang, L. and Hao, P. “Infinity-norm rotation for reversible data hiding.” Proceedings of IEEE International Conference on Image Processing, 2007. vol. 3. IEEE, 2007.

[160] ———. “Infinity-norm rotation transforms.” IEEE Transactions on Signal Processing 57 (2009).7: 2594–2603.

[161] Yang, L., Hao, P., and Wu, D. “Stabilization and optimization of PLUS factorization and its application in image coding.” Journal of Visual Communication and Image Representation (2010).

[162] Yang, L., Hao, P., and Zhang, C. “Progressive reversible data hiding by symmetrical histogram expansion with piecewise-linear Haar transform.” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. vol. 2. 2007.

[163] Zador, P. Development and evaluation of procedures for quantizing multivariate distributions. Stanford University, 1963.

[164] Zhang, Z. “Iterative point matching for registration of free-form curves.” International Journal of Computer Vision 7 (1994).3: 119–152.

BIOGRAPHICAL SKETCH

Lei Yang received her B.S. degree in computer science from Beijing Information and Technology Institute, and her M.E. degree in electrical and computer engineering from Peking University, Beijing, China, in 2008. She received her Ph.D. in electrical and computer engineering from the University of Florida in the summer of 2011. From May 2010 to December 2010, she was an intern at Google Inc., Mountain View, CA, where she worked on rate-distortion-complexity optimization of video transcoding for YouTube and on the open-source video codec VP8. She will join Google Inc. after August 2011. Her research interests include advanced image and video coding, image processing, image and video security, transform design and optimization, computer vision, and machine learning.
