Vector Quantization for Spatiotemporal Sub-Band Coding by Pasquale Romano Jr
Total Page:16
File Type:pdf, Size:1020Kb
Vector Quantization for Spatiotemporal Sub-band Coding by Pasquale Romano Jr. A.B., Computer Science Harvard University Cambridge, Massachusetts 1987 SUBMITTED TO THE MEDIA ARTS AND SCIENCES SECTION IN PARTIAL FULFILLMENT OF THE REQUIREMENTS OF THE DEGREE OF MASTER OF SCIENCE AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY February 1990 @Massachusetts Institute of Technology 1989 All Rights Reserved Signature of the Author . ..... ... .. f Pasquale Romano Jr. Media Arts and Sciences Section September 28, 1989 In Certified by ............ ............. '............... / .... d w....... .p. .. a... Andrew B. Lippan.1u/ Lecturer, Associate Director, Media Laboratory Thesis Supervisor . I P_ Accepted by Stephen A. Benton Chairman Departmental Committee on Graduate Students mAssACHUSETTS INS OF TEcwQLOGY FEIB 27 1990 Rotert Vector Quantization for Spatiotemporal Sub-band Coding by Pasquale Romano Jr. Submitted to the Media Arts and Sciences Section on September 28, 1989 in partial fulfillment of the requirements of the degree of Master of Science at the Massachusetts Institute of Technology Abstract This thesis investigates image coding using a combination of sub-band analysis/synthesis techniques and vector quantization. The design of a vector quantizer for image sub-bands is investigated, and the interplay between multi-rate filter banks and the vector coder is examined. The goal is twofold, first, a vector quantizer that is bounded by a distortion criterion versus an apriori fixed limit is essential to more optimally allocate bits in a sub- band coding system. This is due to the dynamic nature of the energy distribution in the sub-bands. Second, parameters for the vector quantizer that are psychophysically and statistically well matched to the characteristics of image sub-bands must be determined. Chrominance information is treated independantly due to reduced acuity in humans to chroma information. Ultimately a moving picture coding system is proposed which affords significant bit rate reductions while maintaining a high degree of flexibility with respect to the feature set required in an interactive multi-media environment. Wherever applicable, the tradeoffs between coding efficiency and flexibility are discussed. Issues involving source material and coder/decoder complexity are also examined. Thesis Supervisor: Andy Lippman Title: Lecturer, Associate Director, Media Laboratory This work was supported in part by Columbia Pictures Entertainment Incorporated, Paramount Pictures Corporation and Warner Brothers Incorporated. Contents 1 Introduction 12 2 Sub-band Analysis 17 2.1 Modeling the Human Visual System Using Spatiotemporal Sub-bands 18 2.2 Temporal Source Model Considerations .. ............... 20 2.3 Sub-band Transforms and Image Coding ............. ... 22 2.3.1 Perfect Reconstruction Multi-rate Filter Banks ......... 24 2.3.2 Pyramid Coding ....... 30 2.3.3 Temporal Band Splitting ..................... 33 3 Vector Quantization 51 3.1 Background ........ ................................ 52 3.2 Tree Based Algorithms for Vector Quantization . 55 3.2.1 H istory ........... ........ 3.2.2 Implementation Specifics ......... ..... 58 3.3 Error Limit Coding ... ............. 3.4 Jointly Coding Spatial Detail in QMF pyramids 4 Chrominance Coding 4.1 Color Vision .... ........ ........ 4.2 Source Model Considerations .. ...... ... ...... ... 71 4.3 Using Vector Quantization to Code Chrominance Information 5 A Spatiotemporal Sub-band Coding System 5.1 System Overview . ....... ....... ....... .. 5.2 Preprocessing ... .... .. .... ... .... 80 5.3 Luminance Sub-band Analysis . ...... 84 5.4 Encoding ... ...... .. 85 5.5 Decoding ......... ... 91 5.6 Sub-band Synthesis .. ... 91 5.7 Post Processing .. ... .. 92 5.8 Experimental Results ... .. 92 6 Conclusions and Future Work 98 6.1 Concluding Remarks ...................... ..... 98 6.2 Future Work ................................ 100 A Computational Requirements of Spatiotemporal Sub-band Trans- forms 101 A.1 Spatial QMF pyramid Transforms ............... ..... 101 A.2 Temporal QMF Sub-band Transforms ........ .......... 104 A .3 Sum m ary ........ ............. ............. 105 B Acknowledgments 106 List of Figures 2.1 Arbitrary M band analysis/synthesis system. 25 2.2 Top: Notched 2way band splitter. Middle: Overlapped 2way band splitter. Bottom: Ideal band splitter .................. 26 2.3 Aliasing as a result of decimating the pictured spectrum (top) by a factor of 2 . .. 27 2.4 9 tap QMF analysis/synthesis bank [42]. Top Left: Impulse re- sponse of the low pass filter. Top Right: Impulse response of the high pass filter. Bottom: Magnitude response of the two filters pictured above displayed simultaneously. ..................... 32 2.5 Frequency domain subdivisions in a typical 3 level QMF pyra- mid. LL denotes lowpass filtering vertically and horizontally. LH de- notes lowpass filtering vertically an highpass filtering horizontally. HL denotes highpass filtering vertically and lowpass filtering horizontally. HH denotes highpass filtering both horizontally and vertically. .... 34 2.6 Top: Original image from the sequence "Alley". Bottom: Pyramid decomposition of the above image corresponding to the frequency do- main subdivisions shown in Figure 2.5. Level 1 Gain = 16. Level 2 Gain = 8. Level 3 Gain = 4. ........... ........... 35 2.7 2 way band splitting filter bank (15 tap) used recursively to achieve the 4 way split - Left Column: Analysis transform basis functions. Right Column: Synthesis transform basis functions ... 37 2.8 Magnitude response of the 2 way band splitting filter bank - Top: Magnitude response of the analysis bank (15 tap). Bottom: Magnitude response of the synthesis bank (3 tap). ........ ... 38 2.9 5 way band splitting filter bank (7 tap) - Left Column: Analysis transform basis functions. Right Column: Synthesis transform basis functions ....... ....................... .... 39 2.10 5 way band splitting filter bank (7 tap) - Top: Magnitude re- sponse of the analysis bank. Bottom: Magnitude response of the synthesis bank. ... ....................... .... 40 2.11 5 way band splitting filter bank (13 tap) - Left Column: Anal- ysis transform basis functions. Right Column: Synthesis transform basis functions ............................... 45 2.12 5 way band splitting filter bank (13 tap) - Top: Magnitude response of the analysis bank. Bottom: Magnitude response of the synthesis bank. .............................. 46 2.13 13 frame excerpt from the sequence "Table Tennis. ...... 47 2.14 5 sub-bands resulting from applying the 7 tap 5 band trans- form pictured in Figures 2.9 and 2.10 to the middle 7 frames pictured in Figure 2.13. Top Left: Band 0. Top Right: Band 1 (gain = 2). Middle Left: Band 2 (gain = 3). Middle Right: Band 3 (gain = 5). Bottom: Band 5 (gain = 6). ............... 48 2.15 5 sub-bands resulting from applying the 13 tap 5 band trans- form pictured in Figures 2.9 and 2.10 to the frames pictured in Figure 2.13. Top Left: Band 0. Top Right: Band 1 (gain = 2). Middle Left: Band 2 (gain = 3). Middle Right: Band 3 (gain = 5). Bottom: Band 5 (gain = 6). .... .... ... .... ... 49 2.16 Top: Original. Middle Left: 13 tap analysis/synthesis bank w/bands 2-4 vector coded. Middle Right: 7 tap analysis/synthesis bank w/bands 2-4 vector coded. Bottom Left: Error signal resulting from 13 tap analysis/synthesis system w/vector coding (gain = 8). Bottom Right: Error signal resulting from 7 tap analysis/synthesis w/vector coding (gain = 8). ......... ........... 50 3.1 Kd-Tree: A typical Kd-Tree (K = 2) viewed as both a binary tree and a subdivided 2 dimensional space. ................. 57 3.2 Variance "Cardiogram": Variance of selected sub-bands plotted as a function of time for the test sequence "Alley". ........ .... 62 3.3 Histograms of selected sub-bands. Top Left: Histogram of the LH channel of pyramid level 1. Top Right: Histogram of the LH channel of pyramid level 2. Bottom: Histogram of the LH channel of pyramid level 3. Note that all of the above originate from a selected pyramid assembled from the test sequence "Table Tennis" ...... 63 3.4 Edge Orientation: Probability distribution of the angle (with re- spect to horizontal) of edges for an outdoor scene containing man-made structures (from F. Kretz). ........................ 66 4.1 Contrast sensitivity as a function of spatial frequency for the red-green chromatic grating and a green monochromatic grating [30] ............................... 69 4.2 Contrast sensitivity as a function of spatial frequency for the blue-yellow chromatic grating and a yellow monochromatic grating [30] .................. 69 4.3 Gaussian decimation/interpolation filter applied to the chromi- nance channels. Top: Impulse response. Bottom: Magnitude re- sponse........... .................................... 74 4.4 Top Row: Luminance. Middle Row: I. Bottom Row: Q 75 5.1 High Level Codec Block Diagram. .. .... ... .... .... 8 1 5.2 Encoder. ................. .. ... .... ... .... 82 5.3 Decoder. ................. ..... ...... ..... 83 5.4 Partitioned Luminance Spectrum. .... ...... ..... .. 86 5.5 Retained Luminance Sub-bands. .. .... .... 5.6 Excerpts From the Test Sequence "Alley" ..... 5.7 Excerpts From the Test Sequence "Table Tennis" List of Tables 2.1 Bit rate and vector quantizer parameters for bands 2-4 resulting from the 13 tap analysis/synthesis filter bank. Note: Maximum error clamped to specific SN R 's. ........ ........ ....... 43 2.2 Bit rate and vector quantizer parameters for bands 2-4 resulting from the 7