Lossless Wideband Audio Compression: Prediction and Transform
Total Page:16
File Type:pdf, Size:1020Kb
JONG-HWA KIM Lossless Wideband Audio Compression: Prediction and Transform Verlustfreie Kompression fur¨ Breitband-Audiosignale: Pradiktion¨ und Transformation Von der Fakultat¨ I - Geisteswissenschaften der Technischen Universitat¨ Berlin zur Verleihung des akademischen Grades Doktor der Philosophie - Dr. phil. - genehmigte Dissertation Technische Universitat¨ Berlin Lossless Wideband Audio Compression: Prediction and Transform Verlustfreie Kompression fur¨ Breitband-Audiosignale: Pradiktion¨ und Transformation vorgelegt von Master of Electronic Engineering Jong-Hwa Kim aus Seoul / Korea Von der Fakultat¨ I - Geisteswissenschaften der Technischen Universitat¨ Berlin zur Verleihung des akademischen Grades Doktor der Philosophie - Dr. phil. - genehmigte Dissertation Promotionsausschuss: Vorsitzender: Prof. Dr. Eckart Mensching Berichter: Prof. Dr. Klaus Hobohm Berichterin: Prof. Dr. Helga de la Motte-Haber Tag der wissenschaftlichen Aussprache: 19. Dezember 2003 Berlin 2004 D83 ABSTRACT Lossless Wideband Audio Compression: Prediction and Transform This thesis studies lossless audio compression. In the domain of lossless compression, research takes place on two broad development sections, signal modeling and coding algorithm. The former is concerned with the understanding of the source signal, while coding is the more tightly specified task of efficiently representing a single symbol as a code. The focus of this thesis is the evaluation and the development of signal modeling techniques for lossless compression. Related with the modeling method used to decorrelate a signal, the data compression schemes are generally divided in two categories, predictive modeling and transform-based modeling. In the thesis, all two categories are investigated in depth and handled from the lossless viewpoint. The first contribution of the thesis is an exploration of the general audio compression systems including the lossy compression system. In predictive modeling, the structures of various linear prediction filters are introduced by presenting the fundamental autoregressive modeling. The prediction filters including the approaches to the nonstationary signal modeling and to the adaptive linear prediction filters are explored and evaluated by testing within a prototypical lossless audio compression system. For transform modeling, two well-known subband transform coding methods, Laplacian pyramid and subband coding scheme, are first described, and then the design methods of perfect reconstruction multirate filter banks are studied. Concerning with the modulated lapped orthogonal transform, the efficiency of linear prediction from subband and from fullband is formally compared and empirically examined. Wavelet transform is in depth studied from the various viewpoints in order to find the theoretical relationship between the wavelet and the multirate filter banks. Theoretical and practical aspects of reversible transforms are discussed by introducing the S-transform, S+P transform, and RTS transform. The lifting method is examined as a means to realize the biorthogonal wavelets. Integer lifting scheme with rounding-off method is investigated to construct reversible version of wavelet transforms and its performance is validated by applying to lossless audio compression. Finally, some of the more important results presented in this thesis are summarized with the suggesting directions for future research. ZUSAMMENFASSUNG Verlustfreie Kompression fur¨ Breitband-Audiosignale: Pradiktion¨ und Transformation Das Ziel dieser Dissertation ist die Untersuchung von verlustfreier Audiokompression. Im Bereich der verlustfreien Kompression teilt sich die Forschung in zwei große Entwicklungsabschnitte, n¨amlich Signalmodellierung und Codierungsverfahren. Die Signalmodellierung befasst sich mit dem Verstehen des Quellsignals, w¨ahrend sich die Codierung mit dem speziellen Problem der effizienten Repr¨asentation von Einzelsymbolen befaßt. Der Fokus dieser Dissertation liegt auf der Evaluation und Entwicklung von Signalmodellierungstechniken f¨ur die verlustfreie Kompression. Mit Bezug auf die Modellierungsmethode zur Dekorrelation des Signals fallen die Datenkompressionssysteme in zwei Kategorien, n¨amlich die pr¨adiktive Modellierung und die auf Transformation basierende Modellierung. In der Dissertation wurden beide Kategorien aus der Sicht verlustfreier Kompression ausf¨uhrlich untersucht. Der erste Beitrag der Dissertation ist der Erforschung von Audiokompressions- systemen einschließlich der verlustbehafteten Kompressionssysteme gewidmet. Bei der pr¨adiktiven Modellierung werden die verschiedenen Strukturen der Linear-Pr¨adiktions- filter durch Vorlage der fundamentalen autoregressiven Modellierung vorgestellt. Die Pr¨adiktionsfilter einschließlich der Ans¨atze zur Modellierung des nicht-station¨aren Signals und adaptiven Linear-Pr¨adiktionsfiltern werden vorgestellt, und deren Effizienz im Experiment mit einem prototypischen verlustfreien Audiokompressionssystem evaluiert. Bez¨uglich der Transformationsbasierten Modellierung werden zwei prominente Codierungsmethoden beschrieben, die auf Subbandtransformation basieren, n¨amlich Laplacian Pyramide und Subbandcodierung, beschrieben, und die Entwurfsverfahren f¨ur perfekt rekonstruierbare Multirate-Filterb¨anke werden dargestellt. Die Effizienz linearer Pr¨adiktion von Vollbandsignalen und Subbandsignalen wird theoretisch untersucht und empirisch validiert. Die Wavelet-Transformation wird nach verschiedenen Aspekten gr¨undlich untersucht, um die theoretische Verkn¨upfung zwischen der Wavelet-Transformation und der Multirate-Filterbank zu diskutieren. Theoretische und praktische Aspekte von reversiblen Transformationen werden in Zusammenhang mit S-Transformation, S+P Transformation, und RTS Transformation untersucht. Die Liftingmethode, die die biorthogonalen Wavelets realisiert, wird untersucht. Um eine reversible Wavelet-Transformation zu konstruieren, wird das mit Ganzzahlen operierende Liftingsystem durch Abrundungsverfahren entwickelt. Dessen Funktion wird durch Anwendung auf die verlustfreie Audiokompression validiert. Zum Schluss sind einige wichtige Untersuchungsergebnisse der Dissertation mit Vorschl¨agen f¨ur zuk¨unftige Forschungsarbeiten zusammengefasst. ACKNOWLEDGEMENTS This research was performed at the Institute for Communication Sciences, Technical University Berlin, financially supported by Berliner Senate in the course of a Postgraduate-Scholarship NaF¨oG. This thesis is dedicated to the memory of my supervisor Prof. Dr.-Ing. Manfred Krause. As a teacher and a partner, he introduced me to this intriguing field of research and continuously taught me precise scientific thinking and writing. I feel deeply grateful for his academic support and moral endorsement through the years. The scientific community of audio researchers suffered from a great loss of a person who contributed outstanding ardour and affection to the area, when he passed away a few months ago. I would like to thank Prof. Dr. Klaus Hobohm and Prof. Dr. de la Motte-Haber for reviewing this thesis and numerous valuable comments and suggestions. I also thank Dr.-Ing. Axel R¨obel who works currently at IRCAM, France, for fruitful discussions in this work. Doris Grasse deserves special thanks for all arrangements of my postgraduate life in the Institute. My current colleagues at Multimedia Lab of University Augsburg have also been very supportive for successful conclusion of this work. I would like to thank Prof. Dr. Elisabeth Andr´e who gave me the opportunity to continue and finish my work I started in Berlin. Thanks to Dr. Martin M¨uller for being my best friend. Finally, my deepest thanks for beaming love from my parents and brothers. Thanks to Jongkwan for bringing me to the sunny places. Thanks, Jongihn, gathering our family together always in a good atmosphere. Thanks, Jongho, for reminding how science and art go together, and being my best friend. Thanks to my mother who always understood the value of continuing my long schooling. Thanks to my father for encouraging me to follow my own path to happiness, letting me know you are always there for me, and I deeply respect your honest life of 40 years as a schoolteacher. Eternal gratitude goes to my wife Heejin whose unfathomable love and understanding during this work made it possible for me to keep going. Thanks to our son Nova who does not pose any questions related to the contents of this thesis yet. CONTENTS i Contents 1 Introduction 11 2 High-Quality Digital Audio Compression 14 2.1 Digital Audio Representation . ..................... 14 2.1.1 Wideband audio signal . ..................... 14 2.1.2 Bandwidth, capacity, and high-quality audio . ....... 15 2.2 General Framework of Audio Compression System . ....... 17 2.2.1 Segmentation of audio stream . .............. 17 2.2.2 Signal decorrelation . ..................... 18 2.2.3 Entropy coding . ..................... 20 2.3 Lossy and Lossless Audio Compression . .............. 24 2.3.1 Lossy compression . ..................... 24 2.3.2 Lossless compression . ..................... 30 2.4 Performance Criteria for Lossless Compression System.......... 32 2.5 Test Audio Materials ............................ 34 3 Predictive Modeling 35 3.1 Theoretical Background . ..................... 35 3.2 Linear Prediction System . ..................... 38 3.2.1 General expressions . ..................... 38 3.2.2 Estimation of the predictor coefficients .............. 41 3.2.3 Lattice structure of LP coefficients . .............. 45 3.2.4 Determination of LP filter