

Lapped Transforms in Perceptual Coding of Wideband Audio

Sien Ruan

Department of Electrical & Computer Engineering
McGill University
Montreal, Canada

December 2004

A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Master of Engineering.

© 2004 Sien Ruan


To my beloved parents

Abstract

Audio coding paradigms depend on time-frequency transformations to remove statistical redundancy in audio signals and to reduce the data rate, while maintaining the fidelity of the reconstructed signal. Sophisticated perceptual audio coding further exploits perceptual redundancy in audio signals by incorporating perceptual masking phenomena. This thesis focuses on the investigation of different coding transformations that can be used to compute perceptual distortion measures effectively; among them the lapped transform, which is the most widely used in today's audio coders. Moreover, an innovative lapped transform is developed that can vary the overlap percentage to arbitrary degrees. The new lapped transform is applicable to transient audio by capturing the time-varying characteristics of the signal.

Sommaire

Les paradigmes de codage audio dépendent des transformations de temps-fréquence pour enlever la redondance statistique dans les signaux audio et pour réduire le taux de transmission de données, tout en maintenant la fidélité élevée du signal reconstruit. Le codage sophistiqué perceptuel de l'audio exploite davantage la redondance perceptuelle dans les signaux audio en incorporant des phénomènes de masquage perceptuels. Cette thèse se concentre sur la recherche sur les différentes transformations de codage qui peuvent être employées pour calculer des mesures de déformation perceptuelles efficacement, parmi elles, la transformation enroulée, qui est la plus largement répandue dans les codeurs audio de nos jours. D'ailleurs, on développe une transformation enroulée innovatrice qui peut changer le pourcentage de chevauchement à des degrés arbitraires. La nouvelle transformation enroulée est applicable avec l'acoustique passagère en capturant les caractéristiques variantes avec le temps du signal.

Acknowledgments

I would like to acknowledge my supervisor, Prof. Peter Kabal, for his support and guidance throughout my graduate studies at McGill University. Prof. Kabal's kind treatment of his students is highly appreciated. I would also like to thank Ricky Der for working with me and advising me throughout the work. My thanks go to my fellow TSP graduate students for their close friendship, especially Alexander M. Wyglinski for his various technical assistance. I am sincerely indebted to my parents for all the encouragement they have given me. They are the reason for who I am today. To my mother, Mrs. Dejun Zhao, and my father, Mr. Liwu Ruan, thank you.

Contents

1 Introduction
  1.1 Audio Coding Techniques
    1.1.1 Parametric Coders
    1.1.2 Waveform Coders
  1.2 Time-to-Frequency Transformations
  1.3 Thesis Contributions
  1.4 Thesis Synopsis

2 Perceptual Audio Coding: Psychoacoustic Audio Compression
  2.1 Human Auditory Masking
    2.1.1 Hearing System
    2.1.2 Perception of Loudness
    2.1.3 Critical Bands
    2.1.4 Masking Phenomena
  2.2 Example Perceptual Model: Johnston's Model
    2.2.1 Loudness Normalization
    2.2.2 Masking Threshold Calculation
    2.2.3 Perceptual Entropy
  2.3 Perceptual Audio Coder Structure
    2.3.1 Time-to-Frequency Transformation
    2.3.2 Psychoacoustic Analysis
    2.3.3 Adaptive Bit Allocation
    2.3.4 Quantization
    2.3.5 Bitstream Formatting

3 Signal Decomposition with Lapped Transforms
  3.1 Block Transforms
  3.2 Lapped Transforms
    3.2.1 LT Orthogonal Constraints
  3.3 Filter Banks: Subband Signal Processing
    3.3.1 Perfect Reconstruction Conditions
    3.3.2 Filter Bank Representation of the LT
  3.4 Modulated Lapped Transforms
    3.4.1 Perfect Reconstruction Conditions
  3.5 Adaptive Filter Banks
    3.5.1 Window Switching with Perfect Reconstruction

4 MP3 and AAC Filter Banks
  4.1 Time-to-Frequency Transformations of MP3 and AAC
    4.1.1 MP3 Transformation: Hybrid Filter Bank
    4.1.2 AAC Transformation: Pure MDCT Filter Bank
  4.2 Performance Evaluation
    4.2.1 Full Coder Description
    4.2.2 Audio Quality Measurements
    4.2.3 Experiment Results
  4.3 Psychoacoustic Transforms of DFT and MDCT
    4.3.1 Inherent Mismatch Problem
    4.3.2 Experiment Results

5 Partially Overlapped Lapped Transforms
  5.1 Motivation of Partially Overlapped LT: NMR Distortion
  5.2 Construction of Partially Overlapped LT
    5.2.1 MLT as DST via Pre- and Post-Filtering
    5.2.2 Smaller Overlap Solution
  5.3 Performance Evaluation
    5.3.1 Pre-echo Mitigation
    5.3.2 Optimal Overlapping Point for Transient Audio

6 Conclusion
  6.1 Thesis Summary
  6.2 Future Research Directions

A Greedy Algorithm and Entropy Computation
  A.1 Greedy Algorithm
  A.2 Entropy Computation

List of Figures

2.1 Absolute threshold of hearing for normal listeners
2.2 Generic perceptual audio encoder
2.3 Sine MDCT window (576 points)

3.1 General signal processing system using the lapped transform
3.2 Signal processing with a lapped transform with L = 2M
3.3 Typical subband processing system, using the filter bank
3.4 Magnitude frequency response of an MLT (M = 10)

4.1 MPEG-1 Layer III decomposition structure
4.2 Layer III prototype filter (b) and the original window (a)
4.3 Magnitude response of the lowpass filter
4.4 Magnitude response of the polyphase filter bank (M = 32)
4.5 Switching from a long sine window to a short one via a start window
4.6 Layer III aliasing butterfly, encoder/decoder
4.7 Layer III aliasing reduction encoder/decoder diagram
4.8 Block diagram of the encoder of the full audio coder
4.9 Frequency response of the MDCT basis function h_k(n), M = 4

5.1 Flowgraph of the Modified Discrete Cosine Transform
5.2 Flowgraph of MDCT as block DST via butterfly pre-filtering
5.3 Global viewpoint of MDCT as pre-filtering at DST block boundaries
5.4 Pre-DST lapped transforms at arbitrary overlaps (L < 2M)
5.5 Post-DST lapped transforms at arbitrary overlaps (L < 2M)
5.6 Partially overlapped Pre-DST example showing pre-echo mitigation for sound files of castanets

List of Tables

2.1 Critical bands measured by Scharf

4.1 MOS is a number mapping to the above subjective quality
4.2 Subjective listening tests: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure)
4.3 PESQ MOS values: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure)
4.4 PESQ MOS values: DFT spectrum (DFT) vs. MDCT spectrum (MDCT)

5.1 Subjective listening tests of Pre-DST coded test files of castanets

List of Terms

AAC       MPEG-2 Advanced Audio Coding
ADPCM     Adaptive Differential Pulse Code Modulation
CELP      Code Excited Linear Prediction
DCT       Discrete Cosine Transform
DFT       Discrete Fourier Transform
DPCM      Differential Pulse Code Modulation
DST       Discrete Sine Transform
EBU-SQAM  European Broadcasting Union - Sound Quality Assessment Material
ERB       Equivalent Rectangular Bandwidth
FIR       Finite Impulse Response
IMDCT     Inverse Modified Discrete Cosine Transform
ITU       International Telecommunication Union
MDCT      Modified Discrete Cosine Transform
MDST      Modified Discrete Sine Transform
MLT       Modulated Lapped Transform
MOS       Mean Opinion Score
MPEG      Moving Picture Experts Group
MP3       MPEG-1 Layer III
PCM       Pulse Code Modulation
NMN       Noise-Masking-Noise
NMR       Noise-to-Masking Ratio
NMT       Noise-Masking-Tone
LOT       Lapped Orthogonal Transform
LT        Lapped Transform
QMF       Quadrature Mirror Filter
PE        Perceptual Entropy
PEAQ      Perceptual Evaluation of Audio Quality
PESQ      Perceptual Evaluation of Speech Quality
PR        Perfect Reconstruction
Pre-DST   Pre-filtered Discrete Sine Transform
SFM       Spectral Flatness Measure
SMR       Signal-to-Masking Ratio
SNR       Signal-to-Noise Ratio
SPL       Sound Pressure Level
TDAC      Time-Domain Aliasing Cancellation
TMN       Tone-Masking-Noise
TNS       Temporal Noise Shaping
VQ        Vector Quantization

Chapter 1

Introduction

1.1 Audio Coding Techniques

Audio coding algorithms are concerned with the digital representation of sound using information bits. A number of paradigms have been proposed for the digital compression of audio signals. Roughly, audio coders can be grouped as either parametric coders or waveform coders. The concept of perceptual audio coding is relevant in the latter case, where auditory perception characteristics are applicable [1].

1.1.1 Parametric Coders

Parametric coders represent the source of the signal with a few parameters. Such coders are suitable for speech signals since a good source model of speech production is available. More specifically, the vocal tract is modelled as a time-varying filter that is excited by a train of periodic impulses (voiced speech) or a noise source (unvoiced speech) [2]. The parameters that characterize the filter are estimated, encoded and transmitted. In the decoder, the signal is synthesized from the decoded model parameters. More advanced parametric coders, such as the Code-Excited Linear Predictive (CELP) coders, may include the error signal resulting from the parametric reconstruction to represent the excitation to the vocal tract filter.

1.1.2 Waveform Coders

Waveform coders try to accurately replicate the waveform of the original signal. Such coders have been the best choice for audio encoding, since no appropriate source models are available for general audio signals. Efficient waveform coders remove redundancy within the coded signal by exploiting the correlation between signal components, either in the time or the frequency domain. Perceptual coders additionally remove information that is irrelevant to the perception of the signal.

Time domain waveform coders

Time domain coders perform the coding process on the time representations of the audio data. The well-known coding methods in the time domain are [2] Pulse Code Modulation (PCM), Differential Pulse Code Modulation (DPCM) and Adaptive Differential Pulse Code Modulation (ADPCM). For audio, the PCM scheme typically spends 16 bits to quantize each time sample. Although PCM provides high quality audio, the required bit rate is quite high. In DPCM, instead of the time samples, the difference between the original and predicted signal is quantized, which has a lower variance than the original signal and thus requires fewer bits to quantize. ADPCM, an enhanced version of DPCM, adapts the predictor and quantizer to local characteristics of the input signal.
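To make the DPCM idea concrete, the sketch below (an illustration only, not any standard coder) predicts each sample from the previous reconstructed sample with a fixed first-order predictor and uniformly quantizes the prediction error; the predictor coefficient and step size are arbitrary choices.

```python
import numpy as np

def dpcm_encode_decode(x, step=0.05, a=0.95):
    """Toy first-order DPCM: predict each sample from the previous
    reconstructed sample and uniformly quantize the prediction error."""
    x_hat = np.zeros(len(x))
    indices = np.zeros(len(x), dtype=int)   # quantizer indices to be transmitted
    prev = 0.0
    for n, xn in enumerate(x):
        pred = a * prev                     # fixed first-order predictor
        e = xn - pred                       # prediction error (lower variance than x)
        q = int(np.round(e / step))         # uniform quantization of the error
        indices[n] = q
        prev = pred + q * step              # decoder-side reconstruction
        x_hat[n] = prev
    return indices, x_hat

t = np.arange(2000)
x = np.sin(2 * np.pi * 0.01 * t)
idx, x_hat = dpcm_encode_decode(x)
print(np.max(np.abs(x - x_hat)) <= 0.05 / 2 + 1e-12)   # error bounded by half a step
```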

Frequency domain waveform coders

Frequency domain coders carry out the compression on a frequency representation of the input signal. The main advantages of frequency domain coders include the ability to independently encode different parts of the frequency spectrum, adaptive bit allocation to shape the quantization noise, and the reconstruction of better sound quality [1]. Frequency domain coders are commonly categorized into two groups: subband coders and transform coders. Subband coders employ a small number of bandpass filters to split the input signal into subband signals which are coded independently. At the receiver the subband signals are decoded and summed to reconstruct the output signal. Transform coders use a transformation to convert blocks of the input signal to frequency coefficients. Several advantages result from encoding the input signal in the transform domain [3]. Firstly, effective transforms compact the information of the signal into fewer coefficients, which allows many transform coefficients to be set to zero without affecting the quality. Secondly, transform coefficients are less correlated than temporal samples of the input signal, resulting in more efficient use of quantizers. Furthermore, good frequency resolution is achievable by judiciously selecting the transformation. As such, frequency transform coders are the method of choice for the application of auditory masking characteristics.

Perceptual waveform coders

Perceptual audio coders work in the frequency domain by employing a transform to decompose the input signal into spectral coefficients [1]. The auditory masking threshold is calculated from the signal spectrum. The transform coefficients are quantized and coded using the masking threshold. For example, if the coefficients have an energy less than the masking threshold, they are not quantized and not transmitted. Thus, the perceptual redundancy (these uncoded coefficients) is removed from the signal.

1.2 Time-to-Frequency Transformations

The time-frequency transformation maps the time-domain input to a set of coefficients which cover the entire spectrum and represent the frequency-localized signal energy. By confining significant values to a subset of coefficients, the transformation plays an essential role in the reduction of statistical redundancies. Additionally, by providing explicit information about the distribution of signal, and hence masking power, over the time-frequency plane, the transformation also assists in the identification of perceptual redundancies when used in conjunction with a perceptual model. As a result, both statistical and perceptual redundancies in the signal are removed. Coders typically segment input signals into quasi-stationary frames ranging from 2 to 50 ms in duration. Then the time-frequency mapping estimates the spectral components of each frame, attempting to match the analysis properties of the human auditory system. The time-frequency mapping section might contain [1]:

• Unitary transform;

• Time-invariant bank of critically sampled, uniform, or nonuniform bandpass filters;

• Time-varying (signal-adaptive) bank of critically sampled, uniform, or nonuniform bandpass filters.

The choice of time-frequency analysis methodology always depends on the overall system objectives and design philosophy.

1.3 Thesis Contributions

Extensive research has been performed by audio coding specialists to incorporate transformations within medium to high rate coders. At low coding rates (for instance, 1 bit per sample), some distortion is inevitable, which entails the need for a more effective representation of spectral components. Recent research work is primarily concerned with 50% overlapped and critically sampled transformations and their application to low-rate audio coding, with the aim of reducing audible artefacts and improving the audio quality. In the thesis, two state-of-the-art time-frequency transformations are first presented and an assembly of transformation experiment results is analyzed (Chapter 4). They are both based on 50% overlapped frames. It is concluded that a pure transformation achieves better coding performance than a hybrid one (a filter bank followed by a transformation). It is also suggested that the power spectrum generated from the transform coefficients should be used in the psychoacoustic analysis. Moreover, a novel partially overlapped (less than 50%) transformation is proposed (Chapter 5). It is developed to reduce the noise-to-mask ratio mismatch associated with 50% overlap transformations. At a smaller overlap, the novel transformation mitigates the pre-echo artefact (one generated from the noise-to-mask ratio mismatch) when coding transient audio events and delivers an overall better sound quality.

1.4 Thesis Synopsis

The thesis is organized into 6 chapters. Chapter 2 is concerned with perceptual audio coding. Starting with a brief overview of human auditory masking, Chapter 2 discusses the compression of audio in the perceptual domain with an emphasis on the psychoacoustic modelling of the input audio, followed by a description of the structure of a generic perceptual coder.

In Chapter 3, we discuss lapped transforms and their importance to audio coding. A thorough analysis of lapped transforms is given and the conditions for perfect reconstruction of the output signal are obtained in matrix form. The role of the prototype window is investigated and the Modulated Lapped Transform (MLT), which is a special case of lapped transforms, is analyzed. Finally, window (length) switching is described as a traditional method to capture transient characteristics of audio. Chapter 4 is dedicated to the evaluation of two widely used time-frequency transformations in the MPEG audio coding standards: the hybrid filter bank (used in MP3) and the pure MDCT (Modified Discrete Cosine Transform) filter bank (used in AAC). Their performance is compared based on informal subjective listening experiments. The comparison incorporated transforms for the masking threshold calculation in the psychoacoustic analysis. In Chapter 5, we introduce the proposed partially overlapped lapped transform, the Pre-DST (Pre-filtered Discrete Sine Transform). The matrix representation of the transform is obtained and the properties of perfect reconstruction and critical sampling are given. The functionality of each module is described. A comparison is made between the performance of the Pre-DST and the pure MDCT, based on pre-echo mitigation. Finally, a complete summary of our work is provided in Chapter 6, along with directions for future related research.

Chapter 2

Perceptual Audio Coding: Psychoacoustic Audio Compression

Perceptual audio coding has become a key technology for many types of multimedia services these days. This chapter provides a brief tutorial introduction to a number of issues in today's low rate audio coders. After the discussion of psychoacoustic principles in the first part of this chapter, the second part will focus on the perceptual model along with the structure of generic perceptual audio coders using psychoacoustic approaches.

2.1 Human Auditory Masking

Audio coding algorithms must rely upon hearing models to optimize coding efficiency. In the case of audio, the receiver is ultimately the human ear and sound perception is affected by its psychoacoustic properties. For example, a speaker will be inaudible when the background noise is loud. This is one of various masking instances. Most current audio coders incorporate several psychoacoustic principles, including absolute hearing thresholds, critical band analysis, and masking phenomena, to identify the "irrelevant" signal information during signal analysis. Further, combination of these psychoacoustic notions with properties of signal quantization leads to compressed audio with high fidelity.

2.1.1 Hearing System

The hearing system converts sound waves into mechanical movement and finally into electrical impulses perceived by the brain. This neuro-mechanical interaction in the ear is processed by three main parts: the outer ear, the middle ear and the inner ear. The outer ear is composed of the pinna (auricle), the ear canal (external auditory meatus) and the eardrum (tympanic membrane) [4]. The pinna collects sounds (air pressure waves) in the air and directs them towards the ear canal. The canal acts as a quarter-wavelength resonator, amplifying sound pressures within the range of 3-5 kHz by as much as 15 dB. The sound pressure makes the eardrum vibrate, and in this way the sound is converted into mechanical energy. The middle ear acts as an acoustical impedance-matching device that reduces the amount of reflected wave and improves sound transmission. Additionally, when the sound level exceeds a certain level, some of the tiny muscles in the middle ear contract to attenuate the vibrations passing through the middle ear, and others contract to keep the stirrup away from the oval window in order to weaken the vibrations passed to the inner ear [5]. The inner ear plays the most important role in perception within the auditory system. It includes the cochlea [4], in which mechanical vibrations emanating from the oval window are transformed into electrical impulses. The region of the cochlea close to the oval window is recognized as the base, whereas the inner tip is known as the apex. The basilar membrane extends along the cochlea from the base to the apex. Each point along the basilar membrane is associated with a Characteristic Frequency for which the amplitude of its vibrations is maximal. The basilar membrane performs a frequency-to-place transformation and behaves like a spectrum analyzer. The motion of the basilar membrane causes the bending of sensory hair cells, leading to neural firings in the auditory nerve. Neural information propagates to the brain where it undergoes cognitive processing.

2.1.2 Perception of Loudness

The absolute threshold of hearing indicates the minimum Sound Pressure Level (SPL) that a sound must have for detection in the absence of other sounds. A mean threshold value is obtained by averaging the individual thresholds of numerous listeners. The audibility threshold exhibits a strong dependency on frequency and is approximated by the function proposed in [6],

T_q(f) = 3.64 (f/1000)^{-0.8} - 6.5 e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4,   (2.1)

where f is expressed in Hz and the threshold in dB SPL. The threshold T_q(f) is illustrated in Fig. 2.1. Perceived loudness is a function of both frequency and level. Since coding algorithm designers have no a priori knowledge regarding the actual playback levels (SPL), it is typically assumed that the volume control (playback level) on a decoder will be set such that the smallest possible output signal will be presented close to 0 dB SPL. Hence, a scaling of loudness (SPL normalization) is needed and this procedure will be discussed in detail in Section 2.2.1.


Fig. 2.1 Absolute threshold of hearing for normal listeners [6].
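Equation (2.1) is straightforward to evaluate; the sketch below (illustrative only) simply computes the threshold at a few frequencies and shows the characteristic dip of high sensitivity around 3-4 kHz.

```python
import numpy as np

def threshold_in_quiet(f_hz):
    """Absolute threshold of hearing in dB SPL, Eq. (2.1)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # frequency in kHz
    return 3.64 * f**-0.8 - 6.5 * np.exp(-0.6 * (f - 3.3)**2) + 1e-3 * f**4

# The curve reaches its minimum (highest sensitivity) near 3-4 kHz:
freqs = np.array([100, 1000, 3500, 10000, 16000])
print(dict(zip(freqs, np.round(threshold_in_quiet(freqs), 1))))
```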

2.1.3 Critical Bands

As previously mentioned, a frequency-to-place conversion occurs within the cochlea that affects the frequency selectivity of the hearing system. As a result, the cochlea can be viewed from a signal-processing perspective as a bank of highly overlapping bandpass filters. The critical band refers to the frequency distance that quantifies the cochlear filter passbands. The importance of the critical bands comes from two facts. First, the hearing system discriminates between energy in and out of a critical band. Within a critical band changes in stimuli greatly affect perception, whereas beyond a critical band subjective responses decrease abruptly. Additionally, the simultaneous masking property of the hearing system is related to the critical bands. When two sounds have energy in the same critical band, the sound having the higher level dominates the perception [2]. Experiments by Scharf have shown that the bandwidth of critical bands is a function of their center frequencies [7]. While attempting to represent the inner ear as a discrete set of non-overlapping auditory filters, Scharf determined that 25 critical bands were sufficient to represent the audible frequency range of the ear. The bandwidth of the resulting critical bands is listed in Table 2.1, with center frequencies spanning from 50 Hz to 19.5 kHz.

Table 2.1 Critical bands measured by Scharf [7].

Band No.  Center Freq. (Hz)  Bandwidth (Hz)     Band No.  Center Freq. (Hz)  Bandwidth (Hz)
   1             50              0-100             14           2150           2000-2320
   2            150            100-200             15           2500           2320-2700
   3            250            200-300             16           2900           2700-3150
   4            350            300-400             17           3400           3150-3700
   5            450            400-510             18           4000           3700-4400
   6            570            510-630             19           4800           4400-5300
   7            700            630-770             20           5800           5300-6400
   8            840            770-920             21           7000           6400-7700
   9           1000           920-1080             22           8500           7700-9500
  10           1175          1080-1270             23          10500          9500-12000
  11           1370          1270-1480             24          13500         12000-15500
  12           1600          1480-1720             25          19500             15500-
  13           1850          1720-2000

It is evident that the critical bandwidths are wider at lower frequencies than those at higher frequencies. This nonlinear scale, on which the signal is processed in the inner ear, is called the Bark scale (where an increment of one Bark corresponds to one critical band). Zwicker suggested an analytical expression that converts from frequency in Hertz to the Bark scale [4],

Z(f) = 13 arctan(0.00076 f) + 3.5 arctan[(f/7500)^2]  Bark.   (2.2)

The bandwidth of each critical band as a function of its center frequency can be approximated by [4]

BW(f) = 25 + 75 [1 + 1.4 (f/1000)^2]^{0.69}  Hz.   (2.3)

An alternative measure, employing the concept of the Equivalent Rectangular Bandwidth (ERB) [8], was proposed by Moore and Glasberg. The discussion throughout the whole thesis is based on the Bark scale measure.
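As a quick illustration, Eqs. (2.2) and (2.3) can be evaluated directly; the values below roughly match the Scharf bandwidths of Table 2.1.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker's frequency-to-Bark mapping, Eq. (2.2)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_bandwidth(f_hz):
    """Critical bandwidth (Hz) at centre frequency f, Eq. (2.3)."""
    f = np.asarray(f_hz, dtype=float)
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69

# Bandwidths widen with frequency, in line with Table 2.1:
for fc in (150, 1000, 4000, 13500):
    print(fc, round(hz_to_bark(fc).item(), 2), round(critical_bandwidth(fc).item(), 1))
```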

2.1.4 Masking Phenomena

Auditory masking refers to the process where one sound is rendered inaudible by the presence of another sound. Varieties of masking occur in daily life. For example, a speaker must raise his/her voice in a very noisy environment in order to be understood. For applications in audio compression of discarding irrelevant spectral components, simultaneous masking is the most useful. Simultaneous masking occurs when the masker and the maskee (masked signal) are presented to the hearing system at the same time. The nature of the masker as being noise-like or tone-like has an impact on the masking effects. For the purpose of coding noise shaping it is convenient to distinguish between only three types of simultaneous masking [1]: noise-masking-tone (NMT), tone-masking-noise (TMN), and noise-masking-noise (NMN). Different masking produces different masking power. For example, the masking threshold associated with NMT is significantly greater than with TMN. The effect of simultaneous masking is not only felt in the current critical band, but also in the adjacent bands. This effect, also known as the Spread of Masking, is often modelled in coding applications by an approximately triangular spreading function [9]. When interband masking occurs, a masker centered within one critical band has some predictable effect on detection thresholds in the other critical bands. The masking threshold is an estimate of the maximum quantization noise that can be injected into the signal and remain inaudible to the human ear. The standard practice in perceptual coding involves first classifying masking signals as either noise or tone, next computing appropriate thresholds, then using this information to shape the quantization noise spectrum beneath the thresholds. The following section describes Johnston's model of perceptual entropy. Masking can also take place even when the masking tone begins after or ceases before the masked sound. This is referred to as backward and forward masking respectively; they fall under the category of Temporal Masking.

2.2 Example Perceptual Model: Johnston's Model

At the heart of any audio coder lies the auditory model. The goal of the digital model is to quantify the "irrelevant" information so that perceptual redundancies can be extracted. Various masking models have been proposed with different levels of accuracy and complexity: Johnston's Model [10], MPEG-1 Psychoacoustic Model 1 [11], the AAC Audio Masking Model [12], and the PEAQ Model [13]. All of these models are based on the masking patterns introduced in Section 2.1.4. For our research, we use the auditory masking model proposed in [10] by Johnston. Johnston's model determines the energy threshold of the maximum allowable quantization noise in each critical band such that the quantization noise remains inaudible. We introduce the functional mechanisms of the model and later the notion of perceptual entropy.

2.2.1 Loudness Normalization

As previously mentioned, some of the perceptual quality factors depend on the actual sound pressure level (SPL) of the test signal. A normalization step is needed to fix the mapping from input signal levels to loudness. The loudness normalization procedure in the PEAQ Model works as follows [13]. First, spectral coefficients (e.g., DFT coefficients) are obtained by taking a sine wave of 1019.5 Hz and 0 dB full-scale as the input signal. Then the maximum absolute value of the spectral coefficients is compared to a 90 dB SPL reference level. The normalization factor is calculated such that the full-scale sinusoid will be associated with an SPL near 90 dB. A more appropriate normalization would involve the total energy preserved in the frequency domain, since sound pressure level is an energy phenomenon. Such a normalization factor is independent of the frequency of the test sinusoid [14].

2.2.2 Masking Threshold Calculation

The first step in Johnston's model to calculate the threshold corresponds to the critical band analysis. The complex Fourier spectrum of the input signal is converted to the power spectrum as follows,

P(k) = Re^2(X(k)) + Im^2(X(k)),   (2.4)

where X(k) represents the Discrete Fourier Transform (DFT) coefficients. The energy in each critical band is calculated by partitioning the power spectrum into critical bands (see Table 2.1) and then summing,

B_i = Σ_{k=bl_i}^{bh_i} P(k),   (2.5)

where bl_i and bh_i are the lower and upper boundaries of critical band i and B_i is the signal energy in critical band i (here, one critical band corresponds to one Bark). The Bark power spectrum (critical band spectrum) is spread to estimate the effects of masking across critical bands. The spreading function S is described analytically by,

S_{ij} = 15.81 + 7.5((j − i) + 0.474) − 17.5(1 + ((j − i) + 0.474)^2)^{1/2}  dB,   (2.6)

where i and j represent the Bark indices of the masked and masking signal respectively. The spread Bark spectrum is obtained by convolving the Bark spectrum B_i with the spreading function. The convolution is implemented as a matrix multiplication,

C = S B,   (2.7)

where C_i denotes the spread critical band spectrum. A conversion of S_{ij} from its decibel representation is required before carrying out the multiplication in the power spectrum domain. As tonal maskers and noise maskers generate different masking patterns, Johnston uses the Spectral Flatness Measure (SFM) to determine the noise-like or tone-like nature of the signal. The SFM is defined as the ratio of the Geometric Mean (GM) to the Arithmetic Mean (AM) of the power spectrum

SFM_dB = 10 log_10 (GM / AM),   (2.8)

and is further converted to a coefficient of tonality α, according to

α = min( SFM_dB / SFM_dBmax, 1 ),   (2.9)

where SFM_dBmax = −60 dB. A signal that is completely tonal would result in α = 1, whereas a purely noise-like signal would yield α = 0. The two threshold offsets, 14.5 + i dB for tone-masking-noise and 5.5 dB for noise-masking-tone, are geometrically weighted by the tonality coefficient α. The resulting offset O_i is set as,

O_i = α(14.5 + i) + 5.5(1 − α)  dB.   (2.10)

The spread threshold estimate T_i is then obtained by subtracting O_i (in dB) from the spread Bark spectrum C_i,

T_i = 10^{log_10(C_i) − O_i/10}.   (2.11)

The next step involves renormalization of the noise energy threshold. Johnston argued that the spreading function increases the energy estimates in each band because of its shape. The renormalization multiplies each T_i by the inverse of the energy gain, assuming each band has unit energy. This renormalized threshold is again designated T_i.

Finally, the threshold T_i is compared to the absolute threshold of hearing T_{qi} and replaced by max[T_i, T_{qi}], ensuring that masking thresholds do not demand a level of noise below the absolute limits of hearing. In a manner identical to the SPL normalization procedure, the final thresholds must be converted out of dB SPL by dividing back the normalization factor.
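The threshold calculation of Eqs. (2.4)-(2.11) can be summarized in a few lines of code. The sketch below is a simplified illustration, not the exact implementation used in this thesis: the SPL normalization and the final comparison with the threshold in quiet are omitted, and the critical-band edges must be supplied by the caller as DFT-bin boundaries (the values used in the usage example are taken from the first rows of Table 2.1 and are an assumption for demonstration only).

```python
import numpy as np

def johnston_thresholds(x, band_edges):
    """Per-band masking thresholds (linear power) per Section 2.2.2.
    `band_edges` is a list of (lo, hi) DFT-bin boundaries for each critical
    band; SPL normalization and the threshold-in-quiet check are omitted."""
    X = np.fft.rfft(x)
    P = X.real ** 2 + X.imag ** 2                             # power spectrum, Eq. (2.4)
    B = np.array([P[lo:hi].sum() for lo, hi in band_edges])   # band energies, Eq. (2.5)

    nb = len(B)
    i, j = np.meshgrid(np.arange(nb), np.arange(nb), indexing="ij")
    S_dB = (15.81 + 7.5 * ((j - i) + 0.474)
            - 17.5 * np.sqrt(1.0 + ((j - i) + 0.474) ** 2))   # spreading function, Eq. (2.6)
    S = 10.0 ** (S_dB / 10.0)
    C = S @ B                                                 # spread Bark spectrum, Eq. (2.7)

    # Tonality coefficient from the spectral flatness measure, Eqs. (2.8)-(2.9)
    eps = 1e-12
    sfm_db = 10.0 * np.log10(np.exp(np.mean(np.log(P + eps))) / (np.mean(P) + eps))
    alpha = min(sfm_db / -60.0, 1.0)

    band_index = np.arange(1, nb + 1)
    offset = alpha * (14.5 + band_index) + 5.5 * (1.0 - alpha)   # dB offset, Eq. (2.10)
    T = 10.0 ** (np.log10(C + eps) - offset / 10.0)              # spread threshold, Eq. (2.11)

    # Renormalize away the energy gain introduced by the spreading function
    return T / S.sum(axis=1)

# Usage sketch with the first nine band edges of Table 2.1 (assumed setup):
fs, n_fft = 44100, 2048
edges_hz = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080]
bins = [int(f / fs * n_fft) for f in edges_hz]
band_edges = list(zip(bins[:-1], bins[1:]))
print(johnston_thresholds(np.random.randn(n_fft), band_edges).shape)
```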

2.2.3 Perceptual Entropy

For transparent coding (perceptually lossless), the quantization noise injected at each frequency component must be set corresponding to the masking threshold. Then the total number of bits required to quantize all components represents an estimate of the minimum number of bits necessary to transmit that frame of the signal. The total bit rate divided by the number of samples coded represents the per-sample rate, namely the "Perceptual Entropy". By applying uniform quantization principles to the signal and associated set of masking thresholds, Johnston shows a lower bound on the number of bits required to achieve transparent coding [15],

PE = (1/N) Σ_{i=1}^{25} Σ_{k=bl_i}^{bh_i} ( log_2{ 2 |round( Re(X(k)) / √(6 T_i/(bh_i − bl_i)) )| + 1 }
       + log_2{ 2 |round( Im(X(k)) / √(6 T_i/(bh_i − bl_i)) )| + 1 } )   (2.12)

where N is the number of spectral coefficients, T_i is the masking threshold in critical band i and round(.) denotes the nearest integer operation. The measurement is applied on a frame-by-frame basis and the PE estimate is obtained by choosing a worst case value. Using a 2048-point FFT with a 1/16-overlapped Hann window, Johnston reported a PE of 2.1 bits/sample for transparent audio compression.
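A direct per-frame evaluation of Eq. (2.12) might look like the following sketch (illustrative only; `X`, `T` and `band_edges` are assumed to come from the critical-band analysis above).

```python
import numpy as np

def perceptual_entropy(X, T, band_edges):
    """Per-frame perceptual entropy in bits/sample, Eq. (2.12).
    X: DFT coefficients of one frame, T[i]: masking threshold (linear power)
    of critical band i, band_edges: (lo, hi) bin boundaries per band."""
    N = len(X)
    bits = 0.0
    for (lo, hi), Ti in zip(band_edges, T):
        step = np.sqrt(6.0 * Ti / (hi - lo))     # quantizer step implied by the threshold
        for k in range(lo, hi):
            bits += np.log2(2 * abs(int(np.round(X[k].real / step))) + 1)
            bits += np.log2(2 * abs(int(np.round(X[k].imag / step))) + 1)
    return bits / N
```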

2.3 Perceptual Audio Coder Structure

Perceptual audio coders take into account mathematical models of human perception for purposes of quantization and noise shaping, and the coding algorithm is essentially a psychoacoustic algorithm. Fig. 2.2 shows the structure of a generic perceptual audio encoder, including five primary parts: the filter bank, the psychoacoustic model, bit allocation, quantization, and bitstream formatting.


Fig. 2.2 Generic perceptual audio encoder [1].

2.3.1 Time-to-Frequency Transformation

All audio coders rely upon some type of time-frequency analysis to extract from the time-domain input a set of frequency coefficients that is amenable to encoding in conjunction with a perceptual model. Encoding in the frequency domain can take advantage of the frequency characteristics of the input signal. For example, a spike (one coefficient) in the frequency domain can represent a sine wave, whereas a whole period of samples has to be encoded in the time domain. The tool most commonly employed for the decomposition is the filter bank. The decomposing filter bank analyzes the frequency properties of the input signal and identifies the perceptual redundancies. For digital signals, the traditional decomposition is the Discrete Fourier Transform (DFT),

X_k = Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N},   (2.13)

where n is the sample index and N is the number of samples in the transform. The filter bank most widely used as a dominant tool in today's audio coders is the Modified Discrete Cosine Transform (MDCT) [1],

X(k) = Σ_{n=0}^{2M−1} x(n) w(n) cos[ (n + (M+1)/2)(k + 1/2) π/M ],   (2.14)

where M is the number of transformed coefficients and w(n) denotes the window function. In addition to an energy compaction capability similar to the Discrete Cosine Transform (DCT), the MDCT simultaneously achieves reduction of the blocking edge effects, the critical sampling property and perfect signal reconstruction (Chapter 3).
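A direct O(M²) evaluation of Eq. (2.14) is shown below as an illustration; practical coders use fast FFT-based implementations. The window used here is the sine window defined later in Eq. (2.15), and the 576-point length mirrors Fig. 2.3.

```python
import numpy as np

def mdct(x, w):
    """Direct-form MDCT of one 2M-sample block, Eq. (2.14)."""
    M = len(x) // 2
    n = np.arange(2 * M)
    k = np.arange(M).reshape(-1, 1)
    basis = np.cos((n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M)   # M x 2M cosine basis
    return basis @ (x * w)

M = 288                                   # 2M = 576, as in Fig. 2.3
n = np.arange(2 * M)
w = np.sin((n + 0.5) * np.pi / (2 * M))   # sine window, Eq. (2.15)
X = mdct(np.random.randn(2 * M), w)
print(X.shape)                            # (288,): 2M samples in, M coefficients out
```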

Windowing

Windowing is multiplication of the audio signal directly by a window w(n). The main consideration in designing a window is its shape. For example, it is well known in digital signal processing theory that the rectangular window suffers from energy leakage. Most practical windows have a shape that emphasizes the mid-frame samples while de-emphasizing the edge samples, such as the Hann window and the Hamming window.

• Example Window:

An example window is the "sine" window associated with MDCT, defined as

w(n) = sin[ (n + 1/2) π/(2M) ]   (2.15)

for 0 ≤ n ≤ 2M − 1. It offers good stopband attenuation (24 dB) [1], provides good attenuation of the blocking edge effects, and allows perfect reconstruction. This particular window is perhaps the most popular window in audio coding and is depicted in Fig. 2.3.


Fig. 2.3 Sine MDCT window (576 points).

• Window Switch:

If a sharp attack occurs at the end of a long frame, the psychoacoustic model would be misled into deriving a higher masking threshold for that entire frame. As a result, the quantization noise would be spread over the entire frame and be higher than the signal level at the beginning, manifesting itself as a perceptible pre-echo just before the attack of the signal. This situation can arise when coding recordings of percussive instruments such as the triangle, for example.

To suppress the pre-echo, filter banks work by changing the analysis window length from a "long" duration (e.g., 25 ms) during stationary segments to a "short" duration (e.g., 4 ms) when transients are detected. For relatively stationary segments, long windows provide better compression with finer frequency resolution. On the other

hand, the characteristics of transients are better captured with short time windows. The switching decision is generally based on a measure of information content in the signal, like perceptual entropy.

2.3.2 Psychoacoustic Analysis

Psychoacoustic analysis, based on psychoacoustic models (Section 2.2), represents the core part of a perceptual audio coder. The purpose of psychoacoustic analysis is to estimate a just noticeable noise level (masking threshold) in each band, represented as a Signal-to-Masking Ratio (SMR), where S is the signal energy in the frequency band. This SMR is used in the bit allocation procedure to calculate the Noise-to-Masking Ratio (NMR) [16],

NMR = SMR − SNR  (dB),   (2.16)

which determines the actual quantizer levels. Psychoacoustic models operate in the frequency domain. It is possible to use the output from the filter bank as input to the psychoacoustic model, or to perform a separate transform for the purpose of psychoacoustic analysis. For example, the MDCT is used as the decomposing filter bank in MPEG-1 Layer III (MP3) and MPEG-2 AAC (Advanced Audio Coding), but both coders still use the DFT for psychoacoustic analysis to more accurately apply their perceptual model.

2.3.3 Adaptive Bit Allocation

Information bits are allocated to frequency bands such that a distortion criterion is optimized. An adaptive bit assignment is used so that the spectrum of quantization noise is shaped to be less audible than a noise spectrum evenly distributed without shaping. The process is known as Spectral Noise Shaping, under the constraint that the total number of bits is fixed (though the number of bits assigned to each band can vary from frame to frame). Two categories of distortion measures, perceptual and non-perceptual, are used to shape the audible noise [17]. In the perceptual approach, the quantization noise spectrum is shaped in parallel with the masking threshold curve. The Noise-to-Masking Ratio, among others, is an example distortion measure. The non-perceptual approach employs criteria such as the noise power above the masking threshold, for example.

(a) Noise-to-Mask Ratio (NMR)-based bit allocation

In this approach bit allocation is performed based on the Noise-to-Mask Ratio (NMR). As a result, the noise spectrum will be parallel to the masking threshold curve and be inaudible if it is below the masking threshold. This method attempts to distribute the noise power equally in all frequency bands.

(b) Noise energy-based bit allocation

In the energy-based approach, bit assignment is performed based on the audible part of the quantization noise, i.e., the noise above the masking threshold. Since it is not evenly distributed over the frequency range, the noise is audible to various degrees at different frequencies.
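As a sketch of the NMR-based approach (a), the greedy loop below repeatedly gives one more bit to the band with the largest NMR. The assumption that each additional bit buys roughly 6 dB of SNR in its band is an illustrative simplification, and this is not necessarily the greedy algorithm described in Appendix A.

```python
import numpy as np

def greedy_bit_allocation(smr_db, total_bits, db_per_bit=6.02):
    """Greedy NMR-based allocation: repeatedly give one more bit to the band
    with the worst (largest) NMR = SMR - SNR, cf. Eq. (2.16).  Assumes each
    extra bit improves the band SNR by about 6 dB."""
    smr_db = np.asarray(smr_db, dtype=float)
    bits = np.zeros(len(smr_db), dtype=int)
    for _ in range(total_bits):
        nmr_db = smr_db - db_per_bit * bits   # current NMR per band
        worst = int(np.argmax(nmr_db))        # band where the noise is most audible
        bits[worst] += 1
    return bits

print(greedy_bit_allocation([30.0, 12.0, 3.0, -5.0], total_bits=10))   # -> [6 3 1 0]
```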

2.3.4 Quantization

In the earlier stage, a given number of bits is assigned to represent the spectral components of the audio signal. Now the spectral coefficients are quantized to integer levels according to the bit assignment. This quantization process is lossy, meaning that the quantized signal is not mathematically equal to the original signal. However, this lossy coding scheme can be perceptually lossless (transparent) in the sense that the human ear cannot distinguish between the original and compressed signals. We introduce two major quantization schemes used in audio coding: Scalar Quantization and Vector Quantization (VQ).

(a) Scalar Quantization

A scalar quantizer operates on individual values. It divides the range of input values into L intervals (cells), each represented by a single output (reconstruction) level. It takes a single input value and selects the best match (normally the nearest level) to that value from a predetermined set of scalar levels. These scalar levels can be arranged in either a uniform or a non-uniform pattern.

• Uniform Quantization:

In this method, all the levels are equally spaced. The step size δ, the distance between two successive decision levels, is defined as

δ = (x_max − x_min) / L,   (2.17)

where x_max and x_min are the maximum and minimum values of the input and L is the number of quantization levels. Based on the assumption that the quantization noise is white and uniformly distributed in the interval (−δ/2, δ/2), the variance of such uniformly distributed noise is δ²/12 [18]. The uniform scalar quantizer can be implemented in closed form. Let x be a scalar component which is quantized by a uniform scalar quantizer with a step size of δ. Then, the quantized value x̂ is given as (mid-tread case) [19]

x̂ = δ × round(x/δ).   (2.18)
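Eq. (2.18) takes a single line in code; the small sketch below is illustrative only.

```python
import numpy as np

def uniform_quantize(x, step):
    """Mid-tread uniform quantizer, Eq. (2.18): integer index and reconstruction."""
    idx = np.round(np.asarray(x) / step).astype(int)   # transmitted integer levels
    return idx, idx * step

idx, x_hat = uniform_quantize([0.12, -0.31, 0.049], step=0.1)
print(idx, x_hat)   # e.g. [ 1 -3  0] and [ 0.1 -0.3  0. ]
```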

• Non-uniform Quantization: With non-uniformly spaced decision levels, the quantizer can be tailored to the specific input statistics such that a considerably higher SNR is attained for a given input pdf (probability density function). In general, for an arbitrary input signal, the decision levels are determined by minimizing the average distortion given by [18],

D = Σ_{i=1}^{L} ∫_{R_i} (x − y_i)^2 p_x(x) dx,   (2.19)

where y_i is the ith quantization level, R_i denotes the ith partition (cell) and p_x is the probability density function of the input. Iterative algorithms such as the Lloyd algorithm [20] can be used to design the quantizer.
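A minimal sketch of the Lloyd iteration follows, using an empirical sample set in place of the analytic pdf; the initialization and iteration count are arbitrary choices made for illustration.

```python
import numpy as np

def lloyd_quantizer(samples, L, iters=50):
    """Lloyd iteration for an L-level non-uniform scalar quantizer trained
    on `samples` (the pdf is represented empirically by the data)."""
    levels = np.quantile(samples, (np.arange(L) + 0.5) / L)   # initial codebook
    for _ in range(iters):
        # Nearest-neighbour partition: decision boundaries are midpoints
        edges = (levels[:-1] + levels[1:]) / 2.0
        cells = np.digitize(samples, edges)
        # Centroid condition: each level moves to the mean of its cell
        for i in range(L):
            sel = samples[cells == i]
            if sel.size:
                levels[i] = sel.mean()
        levels.sort()
    return levels

rng = np.random.default_rng(0)
data = rng.laplace(size=20000)            # peaky pdf -> finer levels near zero
print(np.round(lloyd_quantizer(data, L=8), 2))
```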

(b) Vector Quantization

A vector quantizer is a mapping from a vector to a finite set of points, called codewords.

By exploiting the correlation among the vector components, vector quantization achieves a bit-rate performance advantage over scalar quantization, at the expense of complexity and computational power when searching for the matched codeword in a large codebook. For this reason, the uniform scalar quantizer was considered most appropriate and has been selected for the remainder of this thesis.

2.3.5 Bitstream Formatting

A bitstream formatter is used after quantization to achieve further compression; it takes the quantized filter bank outputs, the bit allocation and other required side information, and assembles them in an efficient fashion. This process is known as Entropy Coding. In the case of MP3, variable-length Huffman codes are used to encode the quantized spectral coefficients. These Huffman codes are mathematically lossless and allow for a more efficient bitstream representation of the quantized samples at the cost of additional complexity.

Chapter 3

Signal Decomposition with Lapped Transforms

In this chapter, we introduce an important family of time-frequency transformations, namely lapped transforms. They decompose the time-domain signal into transform-domain coefficients. Several properties are considered when designing a transform.

(a) Perfect Reconstruction

Perfect reconstruction of a transform refers to a signal decomposition from which the original signal can be exactly recovered¹, in the absence of quantization [21]. In other words, the original and reconstructed signals are mathematically the same. This brings the advantage that reconstruction errors are due only to the quantization noise, and thus they can be controlled and masked by the signal.

(b) Critical Sampling

The analysis/synthesis system should be critically sampled [21], i.e., the overall number of transform-domain samples is equal to the number of time-domain samples. Critical sampling ensures that all stages of the audio coder operate at the same sampling rate (the input sampling rate) and that the encoder does not incur an increase in the total number of samples to be processed.

¹Here, x̂(n) = x(n − D), where x̂(n) and x(n) are the reconstructed and original signals respectively, and D is a time delay.

(c) Frequency and Temporal Resolution

The bandwidths of the filter bank should emulate the analysis properties of the human auditory system. The spacings of the filter bank should match the large width-variation of the critical bands in frequency. At the same time, the analysis time window of the filter bank should be short enough to accurately estimate the masking thresholds for highly transient signals. Ideally, the analysis filter bank would have time-varying resolutions in both the time and frequency domains, and this motivates many designs with switched and hybrid filter bank structures.

3.1 Block Transforms

Given a signal x(n), we must group its samples into blocks before computing the transform.

A signal block is defined as x = [x(mM), x(mM − 1), ..., x(mM − M + 1)]^T, where m is the block index and M is the block length². For an orthonormal matrix A, A^{−1} = A^T, the forward and backward transforms for the mth block x are defined as

X = A^T x,   (3.1)

and

x = A X.   (3.2)

An orthogonal A brings advantages such as the convenience of implementing the inverse transform by simply transposing the flowgraph of the forward transform. Different choices of A lead to different transforms; the DFT and DCT are some familiar cases. We have used A^T in the forward transform and A in the backward transform so that the basis vectors (also called the basis functions) of the transform are the columns of A. The transform coefficients represent the linear weights of the basis vectors in the expansion of the block x.
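A small numerical illustration of Eqs. (3.1)-(3.2), using the orthonormal DCT-II as the matrix A (any orthonormal A would do), is sketched below.

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II matrix A whose columns are the basis vectors."""
    n, k = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
    A = np.sqrt(2.0 / M) * np.cos(np.pi * k * (n + 0.5) / M)
    A[:, 0] /= np.sqrt(2.0)
    return A

M = 8
A = dct_matrix(M)
x = np.random.randn(M)
X = A.T @ x            # forward block transform, Eq. (3.1)
x_rec = A @ X          # inverse block transform, Eq. (3.2)
print(np.allclose(x, x_rec), np.allclose(A.T @ A, np.eye(M)))   # True True
```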

3.2 Lapped Transforms

The lapped transform [21] was originally developed in order to eliminate the blocking edge effects. The idea is to extend the basis functions beyond the block boundaries, creating an

overlap between signal blocks. In a lapped transform (LT), an L-sample input block is mapped into M transform coefficients, with L > M. Although L − M samples are overlapped between blocks, the number of transform coefficients is the same as if there were no overlap. This critical sampling property is kept by computing M new transform coefficients for every M new input samples (i.e., the frame update rate is M samples). Thus, there will be an overlap of L − M samples in consecutive LT blocks. The LT of a signal block x is obtained by,

X = H x,   (3.3)

where x is an extended signal block x = [x(mM), x(mM − 1), ..., x(mM − 2M + 1)]^T and H is the forward transform matrix. A diagram of signal processing with lapped transforms is shown in Fig. 3.1, where the block generation operates as a set of serial-to-parallel converters for the input block and parallel-to-serial converters for the output block [21].

²For simplicity of notation, we suppress the dependence of x on m.


Fig. 3.1 General signal processing system using the lapped transform [21].

3.2.1 LT Orthogonal Constraints

Applying an inverse LT to X,

y = G X,   (3.4)

the resulting L-sample block y is not equal to the L-sample block x used to compute the forward LT. The original signal must be recovered in an overlap-add fashion. The whole procedure is illustrated in Fig. 3.2. As we can see, the total system is causal. For example, x(0) is the most recent input sample, and so the first output sample occurs at x̂(0), with an algorithmic delay of 2M − 1 samples.


Fig. 3.2 Signal processing with a lapped transform with L = 2M [21].
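The processing of Fig. 3.2 can be sketched in a few lines. The example below is an illustration of the mechanics rather than a coder implementation: it uses the sine-windowed MLT of Section 3.4 as the analysis matrix H (with G = H^t), slides the 2M-sample window by M samples, and overlap-adds the inverse blocks.

```python
import numpy as np

def mlt_matrix(M):
    """M x 2M analysis matrix H of the sine-windowed MLT (Section 3.4)."""
    n = np.arange(2 * M)
    k = np.arange(M).reshape(-1, 1)
    h = np.sin((n + 0.5) * np.pi / (2 * M))                 # sine window
    return np.sqrt(2.0 / M) * h * np.cos((n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M)

def lt_analysis(x, H):
    """Slide a 2M-sample block by M samples; keep M coefficients per block."""
    M = H.shape[0]
    return [H @ x[m:m + 2 * M] for m in range(0, len(x) - 2 * M + 1, M)]

def lt_synthesis(coeffs, G, length):
    """Inverse transform each block and overlap-add the 2M-sample outputs."""
    M = G.shape[1]
    y = np.zeros(length)
    for m, X in enumerate(coeffs):
        y[m * M:m * M + 2 * M] += G @ X
    return y

M = 32
H = mlt_matrix(M)
x = np.random.randn(20 * M)
y = lt_synthesis(lt_analysis(x, H), H.T, len(x))    # G = H^t (orthogonal LT)
# Interior samples are reconstructed exactly; the first and last M samples
# lack an overlapping partner block and are not fully recovered.
print(np.allclose(x[M:-M], y[M:-M]))                # True
```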

Assuming the overlap is 50% (L = 2M) and dividing H into two M × M matrices and the first data block x^(1) into two M × 1 vectors, we can rewrite Eq. (3.3) as follows,

X^(1) = H_a x_a^(1) + H_b x_b^(1),   (3.5)

where H_a and H_b are matrices containing the first M and last M columns of the analysis matrix H; x_a^(1) and x_b^(1) contain the first and second M elements of x^(1). Similarly, the next transform block X^(2) can be denoted as

X^(2) = H_a x_a^(2) + H_b x_b^(2).   (3.6)

Also on the synthesis side, splitting the output vector y into two sub-vectors, the 2M × 1 reconstructed signal y can be represented as

y = [y_a; y_b] = [G_a; G_b] X,   (3.7)

where y_a and y_b are the first and second halves of y; G_a and G_b are two M × M square matrices containing the first and second M rows of the synthesis matrix G. This results in

y_a^(1) = G_a X^(1),   y_b^(1) = G_b X^(1),   (3.8)

y_a^(2) = G_a X^(2),   y_b^(2) = G_b X^(2).   (3.9)

Therefore, combining Eq. (3.8) and Eq. (3.9), the reconstructed signal in the overlapping parts of y^(1) and y^(2) can be expressed as

y_overlap = y_b^(1) + y_a^(2) = G_b H_a x_a^(1) + G_b H_b x_b^(1) + G_a H_a x_a^(2) + G_a H_b x_b^(2).   (3.10)

This equation shows that an LT operates as four block transforms and then sums the overlapping parts. For perfect reconstruction (PR), we have

y_overlap = x_b^(1) = x_a^(2).   (3.11)

This results in the following constraints

G_b H_a = G_a H_b = 0_M   (3.12)

and

G_a H_a + G_b H_b = I_M,   (3.13)

where 0_M is an M × M zero matrix and I_M is an M × M identity matrix. In the special case

when G = H^t, the orthogonal constraints (PR conditions) become

H_b^t H_a = H_a^t H_b = 0_M,   (3.14)

H_a^t H_a + H_b^t H_b = I_M.   (3.15)

This special case is referred to as a Lapped Orthogonal Transform (LOT) [22], meaning that Ha and Hb are orthogonal and the overlapping parts of the basis functions are also orthogonal.

3.3 Filter Banks: Subband Signal Processing

In many applications it is desirable to separate the incoming signal into several subband components, by means of bandpass filters, and then process each subband separately. The fundamental parts of subband signal processing systems are the analysis and synthesis filter banks. The analysis filter bank should do a good job of separating the incoming signal into its constituent subband signals, and the synthesis filter bank should be able to recover a good approximation to the input signal from the subband signals. The basic block diagram of a filter bank system is shown in Fig. 3.3, where H(z) and G(z) are the analysis and


synthesis filters [21], respectively. The decimator after the analysis filter bank keeps the total sampling rate at the output of all M subbands identical to the input signal rate (critically sampled).

Fig. 3.3 Typical subband processing system, using the filter bank [21].

3.3.1 Perfect Reconstruction Conditions

In order to obtain the condition for signal reconstruction, we analyze the kth channel of the filter bank, since we have a similar structure for all channels. The output of the kth channel (after filtering and subsampling) is,

X_k(m) = Σ_{n=−∞}^{∞} x(n) h_k(mM − n),   (3.16)

where h_k(n) is the impulse response of the kth analysis filter. If the subband signal X_k(m) is not modified (e.g., no quantization), and thus X̂_k(m) = X_k(m), we have the reconstructed signal as,

y(n) = Σ_{k=0}^{M−1} Σ_{m=−∞}^{∞} X̂_k(m) g_k(n − mM),   (3.17)

where g_k(n) is the impulse response of the kth synthesis filter. By substituting Eq. (3.16) into Eq. (3.17), we have the result,

y(n) = Σ_{l=−∞}^{∞} x(l) h_T(n, l)
     = Σ_{l=−∞}^{∞} x(l) [ Σ_{k=0}^{M−1} Σ_{m=−∞}^{∞} g_k(n − mM) h_k(mM − l) ],   (3.18)

where h_T(n, l) denotes the time-varying impulse response of the total system. Perfect reconstruction is obtained if and only if h_T(n, l) = δ(n − l − D), that is,

Σ_{k=0}^{M−1} Σ_{m=−∞}^{∞} g_k(n − mM) h_k(mM − l) = δ(n − l − D),   (3.19)

where D is a time delay and leads to y(n) = x(n − D).

3.3.2 Filter Bank Representation of the LT

Comparing Fig. 3.1 and Fig. 3.3, it is clear that the LT is a special case of the multirate filter bank. The impulse responses of the analysis filters are the time-reversed rows of the analysis matrix H, and the impulse responses of the synthesis filters are the columns of the synthesis matrix G. For a block transform, it is possible to perfectly reconstruct x(n) if the PR conditions are satisfied. For lapped transforms, the above PR conditions cannot be satisfied because of the overlap-add of the inverse blocks.

3.4 Modulated Lapped Transforms

If we design each analysis filter h_i(n) and synthesis filter g_i(n) in Fig. 3.3 independently, then the computational complexity will be proportional to the number of bands. A more efficient implementation of the filter bank is to pass each subband through a cosine modulator to shift its center frequency to the origin. Then a lowpass filter h(n) is used, followed by the decimator. The Modulated Lapped Transform (MLT) is a family of lapped transforms generated by modulating a lowpass prototype filter. The MLT basis functions are defined by [23],

p_k(n) = h(n) √(2/M) cos[ (n + (M+1)/2)(k + 1/2) π/M ],   (3.20)

where k = 0, 1, ..., M − 1, n = 0, 1, ..., 2M − 1 and h(n) is the lowpass prototype filter³. The magnitude frequency response of the MLT with a sine window (Section 2.3.1) is shown in Fig. 3.4.

3.4.1 Perfect Reconstruction Conditions

We start by analyzing the MLT with two different windows for analysis and synthesis stages. The output of the analysis filter bank is given by,

X(k) = √(2/M) Σ_{n=0}^{2M−1} x(n) h(n) cos[ (n + (M+1)/2)(k + 1/2) π/M ],   (3.21)

³h(n) is the window function in the time domain or the lowpass prototype filter in the frequency domain.


Fig. 3.4 Magnitude frequency response of an MLT (M = 10).

where h(n) is the analysis window, M is the number of subbands and k = 0, 1, ..., M − 1. The output of the synthesis filter bank is given by,

y(n) = √(2/M) Σ_{k=0}^{M−1} X(k) g(n) cos[ (n + (M+1)/2)(k + 1/2) π/M ],   (3.22)

where g(n) is the synthesis window. Substituting Eq. (3.21) into Eq. (3.22) and simplifying, we obtain

$$y(n) = \frac{1}{M}\, g(n) \sum_{m=0}^{2M-1} x(m)\, h(m) \sum_{k=0}^{M-1} \cos\left[(m + n + M + 1)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right] + \frac{1}{M}\, g(n) \sum_{m=0}^{2M-1} x(m)\, h(m) \sum_{k=0}^{M-1} \cos\left[(m - n)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]. \qquad (3.23)$$

Observing that the first half of y(n) is zero except for m = n and m = M − 1 − n, and the second half of y(n) is zero except for m = n and m = 3M − 1 − n, we get

$$y(n) = g(n)h(n)x(n) - g(n)h(M - 1 - n)x(M - 1 - n), \quad n = 0, \ldots, M - 1, \qquad (3.24)$$

and

$$y(n) = g(n)h(n)x(n) - g(n)h(3M - 1 - n)x(3M - 1 - n), \quad n = M, \ldots, 2M - 1. \qquad (3.25)$$

Now, the desirable segment in the overlapping parts is given by

$$y_{\text{overlap}} = y_b^{(1)} + y_a^{(2)} \qquad (3.26)$$

Since $y^{(1)}$ and $y^{(2)}$ have different time references, we take the beginning of $y^{(2)}$ as the time reference. Hence,

$$\begin{aligned}
y_{\text{overlap}} &= y^{(1)}(n + M) + y^{(2)}(n), \quad n = 0, \ldots, M - 1 \\
&= g(n + M)h(n + M)x^{(1)}(n + M) + g(n + M)h(2M - 1 - n)x^{(1)}(2M - 1 - n) \\
&\quad + g(n)h(n)x^{(2)}(n) - g(n)h(M - 1 - n)x^{(2)}(M - 1 - n). \qquad (3.27)
\end{aligned}$$

Using a common time reference for the input blocks, we have,

$$x_{\text{overlap}} = x_b^{(1)} = x_a^{(2)} = x^{(1)}(n + M) = x^{(2)}(n), \quad n = 0, \ldots, M - 1, \qquad (3.28)$$

and also,

$$x^{(1)}(2M - 1 - n) = x^{(2)}(M - 1 - n), \quad n = 0, \ldots, M - 1. \qquad (3.29)$$

Therefore, we need the following conditions to achieve perfect reconstruction,

$$h(n)g(n) + h(n + M)g(n + M) = 1, \qquad g(n)h(M - 1 - n) - g(n + M)h(2M - 1 - n) = 0. \qquad (3.30)$$

When the same window h(n) is used for both analysis and synthesis, the perfect reconstruction conditions reduce to,

$$h(n) = h(2M - 1 - n), \qquad h^2(n) + h^2(n + M) = 1. \qquad (3.31)$$

This special transform case is called the Modulated Lapped (Orthogonal) Transform.
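A minimal numerical check of the two conditions in Eq. (3.31), assuming the sine window used for the MDCT, is shown below; the check itself and the names are ours, not part of the thesis.

```python
import numpy as np

M = 64
n = np.arange(2 * M)
h = np.sin((n + 0.5) * np.pi / (2 * M))          # sine window, length 2M

print(np.allclose(h, h[::-1]))                    # symmetry: h(n) = h(2M - 1 - n)
print(np.allclose(h[:M]**2 + h[M:]**2, 1.0))      # power complementarity (Eq. 3.31)
```

Both checks should print True for the sine window.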

Comment: LOT vs. MLT

We have to distinguish two concepts here, namely, the Lapped Orthogonal Transform (LOT) and the Modulated Lapped Transform (MLT). They are both lapped transforms because they are realizations of the general filter bank in Fig. 3.3 with identical analysis and synthesis filters and satisfy the orthogonality conditions in Eq. (3.14) and Eq. (3.15). The LOT, defined in Section 3.2, was developed to reduce the blocking effect in image coding. Eq. (3.15) forces orthogonality of the basis functions within the same block, whereas Eq. (3.14) forces orthogonality of the overlapping portions of the basis functions of adjacent blocks. Fast LOTs can be constructed from components with well-known fast-computable algorithms such as the DCT and the DST, by matrix factorization of the transform matrices. There are many fast solutions and one of the most elegant factorizations is the type-II fast LOT proposed by Malvar [21]. The orthogonality conditions are satisfied by the construction of the inverse transform matrix G ($\mathbf{H}^t$) as

(3.32)

where $D_e$ and $D_o$ are the $M \times M/2$ matrices containing the even and odd DCT basis functions as

$$D_e = c(k)\,\alpha \cos\left[2k\left(n + \tfrac{1}{2}\right)\tfrac{\pi}{M}\right] \qquad (3.33)$$

$$D_o = \alpha \cos\left[(2k + 1)\left(n + \tfrac{1}{2}\right)\tfrac{\pi}{M}\right], \qquad (3.34)$$

and $C^{II}_{M/2}$ and $S^{IV}_{M/2}$ are the DCT-II and DST-IV matrices, defined as

$$c(k)\,\sqrt{\frac{2}{K}} \cos\left[k\left(r + \frac{1}{2}\right)\frac{\pi}{K}\right] \qquad (3.35)$$

$$\sqrt{\frac{2}{K}}\, \sin\left[\left(k + \frac{1}{2}\right)\left(r + \frac{1}{2}\right)\frac{\pi}{K}\right] \qquad (3.36)$$

with

$$c(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise.} \end{cases}$$

The factor J is an antidiagonal matrix and the factor R is a permutation matrix. The LOT corresponds to a perfect reconstruction filter bank. It has been shown in Section 3.3 that any uniform PR FIR filter bank is a lapped orthogonal transform and that the PR conditions in Eq. (3.19) are identical to the orthogonality conditions in Eq. (3.14) and Eq. (3.15). The MLT, defined in Section 3.4, was developed independently in terms of filter bank theory. Calling g_k(n) the impulse response of the kth synthesis filter, the modulated filter bank is constructed as [24]

$$g_k(n) = h(n)\,\sqrt{\frac{1}{M}}\, \cos\left[\left(k + \frac{1}{2}\right)\left(n - \frac{L-1}{2}\right)\frac{\pi}{M} + \frac{\pi}{4}\right] \qquad (3.37)$$

for k even, and

$$g_k(n) = h(n)\,\sqrt{\frac{1}{M}}\, \sin\left[\left(k + \frac{1}{2}\right)\left(n - \frac{L-1}{2}\right)\frac{\pi}{M} + \frac{\pi}{4}\right] \qquad (3.38)$$

for k odd, where L is the length of the lowpass prototype filter h(n). The above construction is called a Quadrature Mirror Filter (QMF) bank and it cancels frequency-domain aliasing terms between neighboring subbands. However, the QMF bank does not cancel time-domain aliasing and thus perfect reconstruction is not necessarily achieved. The possibility of PR was first demonstrated by Princen and Bradley [23, 25] using the arguments of the Time-Domain Aliasing Cancellation (TDAC) filter bank. They have shown that if the lowpass prototype h(n) satisfies the constraints in Eq. (3.31), both time-domain and frequency-domain aliasing will be cancelled. Later, Malvar developed the concept of the Modulated Lapped Transform (MLT) by restricting to a particular prototype filter and formulating the filter bank as a lapped orthogonal transform. Over the years, the consensus name in audio coding for the lapped transform interpretation of this special-case filter bank has evolved into the Modified Discrete Cosine Transform (MDCT). In short, the reader should be aware that the different acronyms TDAC, MLT⁴, and MDCT all refer essentially to the same PR cosine modulated filter bank. Only Malvar's MLT implies a particular choice for

h(n) as described in Eq. (2.15).

⁴It is important to note that the MLT is not a particular case of the fast LOT in Eq. (3.32), since no matrix factorization can be generated from the MLT basis functions.

3.5 Adaptive Filter Banks

As previously mentioned in Chapter 2, some audio coders switch between a set of available windows to match the time-varying characteristics of the input signal [26, 27]. For stationary parts of the signal, a high coding gain can be achieved with a high frequency resolution (using long windows). On the other hand, for energy transient parts of the input signal, it is preferable to have a high temporal resolution (using short windows) to localize a burst of quantization noise and prevent it from spreading over different frames. The switching criterion is based on a measure of the signal energy [28] or perceptual entropy [15]. As an alternative to the window switch, Herre and Johnston [29] proposed Temporal Noise Shaping (TNS) to continuously adapt to the time-varying characteristics of the input signal. To preserve the perfect reconstruction property of the overall system, the transition between windows has to be carefully chosen. Therefore, a start window is used in between to switch from a long window to a short window and a stop window is used to switch back. A start window is defined as

$$h_{\text{start}}(n) = \begin{cases} h_{\text{long}}(n), & 0 \le n \le M - 1 \\ 1, & M \le n \le M + \frac{M}{3} - 1 \\ h_{\text{short}}(n - M), & M + \frac{M}{3} \le n \le M + \frac{2M}{3} - 1 \\ 0, & M + \frac{2M}{3} \le n \le 2M - 1. \end{cases}$$
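For illustration only, the long, short, and start windows can be assembled as below. This is a sketch under the assumption of sine windows and the piecewise definition above, with M = 18 as in the MP3 long block; the helper names are ours.

```python
import numpy as np

def sine_window(length):
    n = np.arange(length)
    return np.sin((n + 0.5) * np.pi / length)

M = 18                                      # MP3 long-block half-length
h_long = sine_window(2 * M)                 # 36-sample long window
h_short = sine_window(2 * M // 3)           # 12-sample short window

# Start (long-to-short transition) window, following the piecewise definition above
h_start = np.zeros(2 * M)
h_start[:M] = h_long[:M]                                 # rising half of the long window
h_start[M:M + M // 3] = 1.0                              # flat region
h_start[M + M // 3:M + 2 * M // 3] = h_short[M // 3:]    # falling half of the short window
# remaining samples stay zero
```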

3.5.1 Window Switching with Perfect Reconstruction

Perfect reconstruction is maintained during the transition. Assuming that $h_{\text{start}}(n)$ is used for both the analysis and synthesis filter banks, the output of the synthesis filter bank is given by

$$y(n) = \frac{1}{M}\, h_{\text{start}}(n) \sum_{m=0}^{2M-1} x(m)\, h_{\text{start}}(m) \sum_{k=0}^{M-1} \cos\left[(m + n + M + 1)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right] + \frac{1}{M}\, h_{\text{start}}(n) \sum_{m=0}^{2M-1} x(m)\, h_{\text{start}}(m) \sum_{k=0}^{M-1} \cos\left[(m - n)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]. \qquad (3.39)$$

Here, we analyze different segments of y(n). For $0 \le n \le M - 1$, $h_{\text{start}}(n) = h_{\text{long}}(n)$, and the output becomes

$$y(n) = h^2_{\text{long}}(n)x(n) - h_{\text{long}}(n)h_{\text{long}}(M - 1 - n)x(M - 1 - n). \qquad (3.40)$$

In a lapped transform, the first half of the current output of the synthesis filter bank will contain the same terms as the second half of the previous output block, differing in that the time-reversed terms have opposite signs. Therefore, by the overlap-add operation those terms cancel each other, resulting in perfect reconstruction of the original signal. In overlapping and adding, the second term in Eq. (3.40) will be cancelled by the time-reversed term from the previous block. For $M \le n \le M + \frac{M}{3} - 1$, $h_{\text{start}}(n) = 1$ and y(n) is zero except when m = n. Therefore the output is

$$y(n) = x(n). \qquad (3.41)$$

For $M + \frac{M}{3} \le n \le M + \frac{2M}{3} - 1$, $h_{\text{start}}(n) = h_{\text{short}}(n - M)$ (second half of the short window), and we obtain

$$\begin{aligned}
y(n) &= h^2_{\text{start}}(n)x(n) - h_{\text{start}}(n)h_{\text{start}}(3M - 1 - n)x(3M - 1 - n) \\
&= h^2_{\text{short}}(n)x(n) - h_{\text{short}}(n)h_{\text{short}}(2M - 1 - n)x(3M - 1 - n). \qquad (3.42)
\end{aligned}$$

Similarly, the second term will be cancelled by the time-reversed term in the next short block and perfect reconstruction is achieved. For $M + \frac{2M}{3} \le n \le 2M - 1$, $h_{\text{start}}(n) = 0$ and so is the output of the synthesis filter bank. The output signal is perfectly reconstructed by overlapping outputs from two successive short frames. Perfect reconstruction conditions can also be shown for the transition from a short window back to a long window.

Chapter 4

MP3 and AAC Filter Banks

In this chapter, we first look at the time-to-frequency transformations used in the standards of MP3 (MPEG-1 Layer III) and MPEG-2 AAC (Advanced Audio Coding). The performance of both filter banks is reported later on, along with different transforms for psychoacoustic analysis.

4.1 Time-to-Frequency Transformations of MP3 and AAC

4.1.1 MP3 Transformation: Hybrid Filter Bank

The transformation used in MPEG-1 Layer-III belongs to the class of hybrid filter banks. It is built by cascading two different kinds of filter banks: first the polyphase filter bank (as used in Layer-I and Layer-II) and then an additional Modified Discrete Cosine Transform (MDCT) filter bank. The polyphase filter bank has the purpose of making Layer-III more similar to Layer-I and Layer-II. The MDCT filter bank subdivides each polyphase frequency band into 18 finer subbands to increase the potential for redundancy removal. A complete MP3 decomposition structure is shown in Fig. 4.1. First we examine the prototype filter to understand the polyphase filter bank.

Polyphase filter bank

The polyphase filter bank [16] is common to all three layers of the MPEG/audio algorithm. This filter bank uses a set of bandpass filters to divide the input audio block into 32

~ '"C C;:) Pl ;:1 0..

12 12 12 > Q> "!j samples samples samples --ux .--- --ux 1 Il Il ~. o I-!j Subband filter Ho 1 ~136/12MDCT 1 .... - M- ~ X -CO 12 12 12 17 ~ '1 samples sampI es sampI es X I8 te 1 Il Il :=: - - Pl s;:: Subband filter HI 1 _1 36/12 MDCT 1 ~- - ;:1 "tl 5- ~ M Audio 12 12 12 X 35 !/Cl [Jl .... Cl ~ 1 samples in samples sampI es sampI es ..... X 36 0- 1 Il t::: - 2 Il - t"" Subband filter H l 'i 36/12 MDCT 1 ~ 2 ëï - ::l ~ X ...., 53 cr ...... g ...... : : : ...... ~ 0.. : : ::h (1) '< (") 12 12 12 o samples sampI es samples El X 558 '0 1 Il Il - o Subband filter H 31 1 ci 36/12 MDCT 1 - g:. L--- - rt -- X ë, X 575 575 ::l rt(FJ ...., Polyphase Frequency MDCT C (")rt filter bank inversIOn filter bank ....,C '"'

C;:) 0":1 4 MP3 and AAC Filter Banks 37 subbands, each of a nominal bandwidth 7f /32. The MPEG standard defines a 512-coefficient analysis window C(n) to derive the lowpass prototype filter h(n) of the filter bank, as

$$h(n) = \begin{cases} -C(n), & \lfloor n/64 \rfloor \text{ is odd} \\ C(n), & \text{otherwise.} \end{cases}$$
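As a small sketch (our code; the 512-coefficient table C is assumed to be available from the standard), the sign-flip rule above can be applied directly:

```python
import numpy as np

def prototype_from_window(C):
    """Derive the lowpass prototype h(n) from the 512-coefficient window C(n):
    flip the sign where floor(n/64) is odd."""
    n = np.arange(len(C))
    return np.where((n // 64) % 2 == 1, -C, C)
```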

A comparison of h(n) and C(n) is plotted in Fig. 4.2 and the magnitude frequency response of h(n) is plotted in Fig. 4.3. The bandpass filter $H_i[n]$ of the ith subband of the filter bank

Fig. 4.2 Layer III prototype filter h(n) (b) and the original window C(n) (a).

is a modulation of the prototype filter with a cosine term to shift the lowpass response to the appropriate frequency band. Hence, they are called the polyphase filter bank and given by

$$H_i[n] = h[n] \cos\left[\frac{(2i + 1)(n - 16)\pi}{64}\right]. \qquad (4.1)$$

As Fig. 4.4 shows, these filters have approximate "brick wall" magnitude responses with center frequencies at odd multiples of $\pi/64T$. The outputs of the filter bank are given by the filter convolution equation,

$$s_t[i] = \sum_{n=0}^{511} x[t - n]\, H_i[n]. \qquad (4.2)$$

Fig. 4.3 Magnitude response of the lowpass filter.

Fig. 4.4 Magnitude response of the polyphase filter bank (M = 32).

For a time instant t, which is an integral multiple of 32 audio sample intervals, the filter bank output for the subband i is given by,

$$s_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k] \times \big(C[k + 64j] \times x[k + 64j]\big) \qquad (4.3)$$

where i is the subband index and ranges from 0 to 31, x[n] is a 512-sample buffer of input audio, and M[i][k] is a 32 × 64 analysis coefficient matrix for cosine modulation, defined as,

$$M[i][k] = \cos\left[\frac{(2i + 1)(k - 16)\pi}{64}\right]. \qquad (4.4)$$
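A sketch of one analysis step per Eqs. (4.3) and (4.4) is given below. The 512-coefficient window C is assumed to be loaded from the standard's tables, and the buffer ordering used here is an assumption (the standard prescribes a specific FIFO indexing); all names are ours.

```python
import numpy as np

def polyphase_analysis(x_buf, C):
    """One polyphase filter-bank step (Eqs. 4.3-4.4).

    x_buf : the 512 most recent windowed-input samples (ordering assumption ours)
    C     : the 512-coefficient analysis window from the MPEG standard
    Returns the 32 subband samples s[i].
    """
    i = np.arange(32).reshape(-1, 1)
    k = np.arange(64)
    Mmat = np.cos((2 * i + 1) * (k - 16) * np.pi / 64)   # Eq. (4.4)
    z = C * x_buf                                         # windowing
    y = z.reshape(8, 64).sum(axis=0)                      # partial sums over j = 0..7
    return Mmat @ y                                       # Eq. (4.3)
```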

We should note that the polyphase filter bank is critically sampled in that it produces 32 output samples for every 32 input samples. In effect, each of the 32 subband filters is followed by a decimator of factor 32, and thus only one sample out of 32 new samples is kept. The computational requirement of the polyphase filter bank, as in Eq. 4.3, is moderate: 32 filter outputs need 512 + 32 × 64 = 2560 multiplies and 64 × 7 + 32 × 63 = 2464 additions. Considerable further reductions in multiplies and adds are possible with, for example, FFT-like implementations. The flow diagram for computing the polyphase filter outputs is described in "Audio Content", part 3 of the MPEG/audio standard [11]. Although the response of the prototype is favorable, the polyphase filter bank has three notable drawbacks. First, the lack of a sharp cut-off at the nominal bandwidth (see Fig. 4.3) results in an overlap in the frequency coverage of adjacent polyphase filters, and this can cause signal energy near nominal subband edges to appear simultaneously in two adjacent subbands. To complicate matters further, the subsampler introduces a considerable amount of aliasing. To mitigate the problem, the analysis filter bank includes a stage of butterfly aliasing reduction, as discussed later. Second, the division of the frequency content into subbands of equal width does not accurately reflect the response of the basilar membrane, of which the width of critical bands is a good indicator. As a result, at low frequencies, a single filter bank subband extends over several critical bands. In this circumstance the noise masking thresholds cannot be specifically computed for individual critical bands and thus they are inaccurate.

Third, the polyphase filter bank and its inverse are not lossless transformations. The filter bank and its inverse in tandem, without a quantization in between, cannot perfectly reconstruct the signal. However, by design the error introduced is imperceptible (less than 0.07 dB ripple [11]).

Frequency inversion

Prior to cascading the subband outputs with the MDCT filter bank, each of the odd subbands must undergo a frequency inversion correction so that the spectral lines will appear in proper monotonic ascending order [30]. The frequency inversion consists of multiplying each odd sample in each odd subband by −1, as illustrated in Fig. 4.1.

MDCT filter bank

To compensate for some polyphase filter deficiencies, the frequency-inverted samples are processed using the Modified Discrete Cosine Transform (MDCT), of which the block diagram is shown in Fig. 4.1. Unlike the polyphase filter bank, the MDCT filter bank is a lossless transform. It further subdivides the subband outputs to provide finer frequency resolution. Layer III specifies two different block sizes for the MDCT: a short block of 6 samples and a long block of 18 samples. Since there is 50% overlap between adjacent time windows, the MDCT transforms 12 time samples to 6 spectral coefficients in the short block mode, and 36 time samples to 18 spectral coefficients in the long block mode. Although the short block length is one third that of the long block, the number of MDCT coefficients for a frame remains constant irrespective of the block size. This is achieved by replacing one long block by three short blocks in the short block mode. When the conditions of pre-echo are detected, the MDCT is triggered to switch to short windows. For the purpose of perfect reconstruction, the switching between short and long windows has to be smooth. A start and stop time window is employed in the transition between short and long block modes (Section 3.5). Fig. 4.5 displays the process of the long-start-short window switching. As we can see, the short sequence is composed of three short blocks, which overlap 50% with each other and with the start (and stop) window at window boundaries. Thus, time is synchronized for different subband channels. The polyphase filter bank and the MDCT filter bank are together called the Hybrid Filter Bank for their adaptation to signal characteristics.

Fig. 4.5 Switching from a long sine window to a short one via a start window.

Aliasing reduction butterfly

Now that the subband components are subdivided in frequency, some aliasing introduced by the polyphase filter bank can be partially cancelled. Layer III specifies a method of processing the MDCT coefficients to remove some aliasing caused by the overlapping bands of the polyphase filter bank. This anti-alias operation is a number of butterflies applied to the 576 frequency lines ($X_0$ to $X_{575}$)¹. The butterfly is calculated between adjacent subbands by reading two values, multiplying and adding the values according to the butterfly in Fig. 4.6, and then putting the new values back. The butterfly rotation coefficients are defined in "Audio Content", part 3 of the MPEG/audio standard [11].

Fig. 4.6 Layer III aliasing butterfly, (a) encoder and (b) decoder [11].

Fig. 4.7 shows the alias reduction operation. As we can see, the group of 576 coefficients is rearranged for the anti-alias purpose: 18 coefficients of each subband are grouped

¹576 = 32 × 18, as 32 subbands contain 18 subsamples each.

Fig. 4.7 Layer III aliasing reduction encoder/decoder diagram.

together. So, the butterfly is calculated on one of the eight designated pairs of spectral lines in every alternate subband.

4.1.2 AAC Transformation: Pure MDCT Filter Bank

Unlike the MP3 coder, AAC eliminates the polyphase filter bank and relies on the MDCT exclusively. Each block of input samples is overlapped by 50% with the immediately preceding block and the following block. The AAC filter bank resolution is adaptive to the characteristics of the input signal. This is done by switching between MDCT transforms whose block lengths are either 2048 or 256. Stationary signals are analyzed with a 2048-point window, while transients are analyzed with a 256-point window [31]. Therefore, the maximum frequency resolution is 23 Hz² for a 48 kHz sample rate, and the maximum time resolution is 2.6 ms. Block switching potentially generates a problem of time synchronization. If one channel uses a 2048-point transform and another channel uses a 256-point transform, the time intervals will not be aligned. To maintain block synchronization for different block size channels, AAC combines eight 256-point short windows into a block and uses a start and stop window to bridge between long and short windows. The start window is defined as,

$$h_{\text{start}}(n) = \begin{cases} h_{\text{long}}(n), & 0 \le n \le M - 1 \\ 1, & M \le n \le M + \frac{7M}{16} - 1 \\ h_{\text{short}}\!\left(n - M - \frac{5M}{16}\right), & M + \frac{7M}{16} \le n \le M + \frac{9M}{16} - 1 \\ 0, & M + \frac{9M}{16} \le n \le 2M - 1. \end{cases}$$

It preserves the time-domain aliasing cancellation property of the MDCT and maintains block synchronization. The whole switching procedure is similar to the MDCT in the MP3 hybrid filter bank, though it has different window sizes and thus a different number of short windows to align with.

²24 kHz/1024 MDCT coefficients ≈ 23 Hz; 128 new samples/48000 samples per second ≈ 2.6 ms.

4.2 Performance Evaluation

4.2.1 Full Coder Description

Full audio coders are implemented to explore the effectiveness of two decomposition structures, the hybrid filter bank (MP3 decomposition) and the pure MDCT filter bank (AAC decomposition). Our coders work in a wide-band regime with a sample frequency of 44.1 kHz, consisting of the decomposition filter banks, the psychoacoustic models, the scalar quantizers, and the decoders. The block diagram of the encoder is shown in Fig. 4.8. Our main goal is not to design a complete coder but merely a prototype of one sufficiently sophisticated and general to produce NMR-coded files, thereby permitting experimental comparisons between the two decomposition structures. Consequently, we are not concerned with the design of an entropy coder mapping the quantized values to binary sequences, nor the coding of the side information.

Decomposition Filter bank

In our two encoder-decoder structures, the first time-frequency decomposition is the hybrid filter bank as used in MP3, shown in Fig. 4.1, and the second a pure MDCT filter bank, as used in AAC. The block sizes we are testing are 1152-sample blocks (26 ms time frames), using 50% overlap. In both decomposing structures, the window switching controls are not implemented. In effect, we are comparing a polyphase filter bank followed by a 36-point MDCT with aliasing reduction butterfly to a pure 1152-point MDCT filter bank. We notice that both the hybrid and the pure MDCT systems are critically sampled, because of the 50% overlap. The pure MDCT structure is quite straightforward, as the MDCT is itself critically sampled. For the hybrid structure (Fig. 4.1), a 1152-sample block is first subband-filtered and decimated by 32, thus producing 36 outputs in each channel, which are then transformed to 18 spectral lines by the 36-point MDCT. The butterfly stage does not affect the sampling rate. Therefore, there are in total 18 × 32 subbands = 576 spectral lines. As the 1152-sample blocks are 50% overlapped with each other and thus contain only 576 new samples, the whole system functions to convert 576 new input samples to 576 spectral lines and maintains the critical sampling property.

~ "'C C;:l Pl := 0-

"r:I > ~. >Q 01:0. oc ....~ M- -~ Cd '"1 0" ~ D Line tD igital Audio NMR-based Bits Coded Audio Pl 0.. Si gnal (PCM) 0 Allocation := ~' - Signal ~ D,:j Hybrid Filter BankJ CIl ~ s Pure MDCT Filter Bank Uniform Scalar o ...... , 575 Quantization Cl M­ ~ ::r' S ct> C/l ct> r;' ::l 0 ~ - n o 3 0.. Side- ct> ...., information o ...... , Johnston's J FFT M- L.. Psychoacoustic ::r' 1152 Points r---- '------ct> Model § ç.l C 0.. 0' n o 0.. ct> ....,

~ CJ1 4 MP3 and AAC Filter Banks 46

Psychoacoustic model

There is one psychoacoustic evaluation per frame. The audio data is first mapped to the frequency domain. Both our coders use a Fourier transform for this mapping. The frequency values are then processed based on the steps in Johnston's psychoacoustic model (Section 2.2). Johnston's model is not identical to the psychoacoustic models used in Layer II and Layer III, but it is a reasonable choice here, since our purpose is to test the influence of different time-frequency decompositions as long as the distortion function is fixed. The output of the psychoacoustic model is a group of 25 masking thresholds, corresponding to 25 critical bands in the perceptual domain. The audio data sent to the psychoacoustic model must correspond with the audio data to be coded, meaning that the psychoacoustic analysis should be applied to the exact 1152 samples to be transformed. It is desirable to have the number of mapped values equal to the number of coefficients decomposed from the filter bank. For this purpose, we choose to eliminate the DC term (first value) of the DFT because human hearing only goes down to 20 Hz, so it is irrelevant what the frequency content is at 0 Hz. A standard Hann weighting, conditioning the data to reduce the edge effects, is applied to the audio data before Fourier transformation. We use a 1152-point Hann window to provide complete coverage for the samples to be coded, while Layer III uses a smaller window size of 1024 to reduce the computational load³.

Quantization and bit allocation

Spectral components decomposed by the filter bank are first grouped into subbands (generally critical bands). A group of spectral coefficients is normalized by a common factor and is then quantized using the same quantizer resolution (same step size δ). Accordingly, the common normalization factor is called the scalefactor and the different groups of spectral coefficients are called scalefactor bands [32, 33]. In our coders, we use the spectral peak value (maximum absolute value) of the coefficients within critical band i as the scalefactor $R_i$ to give a good indication of the signal amplitude in that band. Scalar uniform quantization (Section 2.3.4) is then done for each normalized coefficient $\bar{X}_k = X_k / R_i$ in the ith critical band. For each critical band i, the

³This is a compromise though, in that samples falling outside the analysis window generally have no major impact on the psychoacoustic evaluation [16].

scalar quantizer (mid-riser) is tested and operates as follows:

• If $R_i = 0$, then $B_i = 0$, $L_i = 1$ and $\hat{X}_k = 0$,

• If $R_i \neq 0$, then $L_i = 2^{B_i}$ and $\hat{X}_k = (2/L_i) \times \mathrm{round}(\bar{X}_k \times L_i/2)$,

where $R_i$ is the quantizer range (scalefactor), $B_i$ is the number of allocated bits, $L_i$ is the number of quantizer levels, round(·) denotes the nearest integer operation and $\hat{X}_k$ represents the quantized levels (between [−1, 1]) of each normalized spectral coefficient $\bar{X}_k$. This operation assigns a bit rate of zero bits to any signal with an amplitude that does not need to be quantized, and assigns a bit rate of $B_i$ to those that must be quantized. For example, if the bit assignment is 1, two levels (−1, +1) are generated to quantize the particular component. As the signs of the various spectral coefficients are random, the sign information must be included. When no levels are necessary, a 0 is assigned and transmitted to the decoder. The resolution of the quantizer (step size δ) is controlled carefully in the bit allocation loop according to the time-frequency dependent masking thresholds which are supplied by the perceptual model. If the quantization noise in a given band is found to exceed the masking threshold estimated by the perceptual model, the step size for this band is adjusted to reduce the quantization noise. Since achieving a smaller quantization noise requires a smaller quantization step size and thus a higher bit rate, the noise control loop (bit allocation loop) has to be repeated until the actual noise power (computed from the difference between the original spectral values and the quantized spectral values) is below the masking threshold for every scalefactor band and the total number of allocated bits satisfies the bit requirement⁴. The allocation of bits is performed with the Greedy Algorithm [18], which assigns one bit at each iteration to the band with the largest updated NMR. A step-by-step implementation of the NMR-based greedy algorithm is described in Appendix A.
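The per-band quantization rule above can be sketched as follows; the explicit zero-bit branch and the clipping step are our additions for robustness, and all names are ours, not the thesis's code.

```python
import numpy as np

def quantize_band(X, R_i, B_i):
    """Scalar mid-riser quantization of one critical band, per the rules above.

    X   : spectral coefficients of critical band i
    R_i : scalefactor (peak absolute value in the band)
    B_i : bits allocated to the band
    Returns (quantized levels in [-1, 1], reconstructed coefficients).
    """
    if R_i == 0 or B_i == 0:
        return np.zeros_like(X), np.zeros_like(X)       # nothing transmitted
    L_i = 2 ** B_i                                      # number of quantizer levels
    Xn = X / R_i                                        # normalized coefficients
    Xq = (2.0 / L_i) * np.round(Xn * L_i / 2.0)         # mid-riser uniform quantizer
    Xq = np.clip(Xq, -1.0, 1.0)
    return Xq, Xq * R_i                                 # de-normalize for reconstruction
```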

Bit rate

The data stream sent to the decoder consists of the quantized spectral values and the side information. In our particular coders, the side information includes a vector of 25 scalefactor values and a vector of 25 bit assignments. The information allows the receiver

⁴This means that the bit allocation procedure continues even when transparent quality is achieved, provided that extra bits are available.

to recover the quantization scheme in the same way as the encoder, thus making explicit information on the masking threshold unnecessary. Since the quantization and bit allocation scheme is fixed, the amount of side information is identical in both of our coders and thus is not quantized and not used in the bit rate calculation. We use the per sample bit rate to represent the decomposing capacities of different transformations. The reason is that all the decompositions used in our experiments are critically sampled, meaning that the number of spectral coefficients is a constant, though they are decomposed by different filter banks. Assuming a per sample bit rate of b, the bit rate of the audio signal is calculated as $(b \times M) \times (f_s/M) = b \times f_s$ bits/second, where M is the number of quantized samples (also the number of spectral coefficients and the frame update rate), and $f_s$ is the sample frequency of the input audio. For example, MP3 operates on a 48 kHz sampling rate and the bit rate is 64 kb/s for one channel, which leads to a per sample bit rate of 1.3 bits/sample. The bits per sample value is computed based on the information content (entropy) of the quantized spectral coefficients from frame to frame. A step-by-step entropy calculation procedure is described in Appendix A. After the entropy of each coefficient is obtained, it is averaged across all quantized coefficients and the resulting value is the empirical entropy (bits/sample) of the encoded audio, which is exactly the lower bound on the bit rate if an entropy coder is used.
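A simple first-order estimate of this empirical entropy is sketched below; the exact step-by-step procedure is given in Appendix A and is not reproduced here, so this is only an illustration of the idea, with names of our choosing.

```python
import numpy as np

def empirical_entropy(quantized_levels):
    """First-order empirical entropy (bits/sample) of the quantized coefficients,
    using relative frequencies of the quantizer levels as probabilities."""
    values, counts = np.unique(quantized_levels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```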

Decoder

The decoding operation is straight forward. The inversely quantized values are directly generated from the received scalefactors and bit assignment information. These values are further decoded by the synthesis filter bank. There are two synthesis filter banks corresponding to the two decomposition structures separately. The first one is precisely the transpose of the hybrid filter bank in Fig. 4.1, i.e., first butterfty decoded, and then the inverse 36-point MDCT and polyphase filter bank with frequency inversion in between. The decoding ftow chart is described in "Audio Content", part 3 of the MPEG/audio standard [11]. The second one is simply an inverse 1152-point MDCT. Synthesized outputs shall be the reconstructed PCM audio samples. 4 MP3 and AAC Filter Banks 49

4.2.2 Audio Quality Measurements

To compare different audio coders, we can refer to a number of factors, such as signal bandwidth, bit rate, quality of the reconstructed audio and computational complexity. Among them, bit rate and quality of the reconstructed audio signal are two fundamental attributes of an audio coder, and they are intimately related: the lower the bit rate, the lower the quality. There are basically three quality measurement methods: objective measurement, subjective listening tests, and perceptual measurement techniques. The objective measurement is the traditional signal-to-noise ratio (SNR),

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_n x^2(n)}{\sum_n e^2(n)}, \qquad (4.5)$$

defined from the concept of the mean square error e(n) between the original signal and the decoded signal. However, relying on the SNR of the decoded signal does not show much understanding of the paradigm of perceptual coders, which is: separate the inaudible artefacts from audible distortions and improve the subjective quality by shaping the quantization noise over frequency. Thus, objective measures do not satisfy the evaluation requirements of perceptual audio coders.

The reliable method to assess the audio quality of perceptually coded signals has been subjective listening tests. In a subjective listening test, the listeners can switch between the original signal (reference), R, and two other signals, A and B. Both of these signals are reconstructed signals, though they are processed by different audio codecs. The test has to be double blind, meaning that neither the listeners nor the supervisor knows which of the signals A and B is decomposed by which coding structure. The listeners have to judge the overall quality of the signals and decide which signal sounds better or whether the two signals sound the same.

Because listening tests are very time consuming and expensive, there have been new measurement methods which are capable of yielding a reliable estimate of the perceived sound quality. After years of work, an ITU-R Task Group standardized the perceptual measurement techniques and recommended a system called PEAQ (Perceptual Evaluation of Audio Quality). Similarly, in the field of speech coding, perceptual measurement methods have been introduced known as PESQ (Perceptual Evaluation of Speech Quality) [34]. PESQ simulates the ear model and predicts the subjective Mean Opinion Score (MOS) [35]. Although MOS operates on a scale from 1.0 (unacceptable quality) to 5.0 (excellent quality), as shown in Table 4.1, the PESQ values lie between 1.0 (bad) and 4.5 (no distortion).

Table 4.1 MOS is a number mapping to the above subjective quality.

  Excellent   Good   Fair   Poor   Bad
     5.0       4.0    3.0    2.0   1.0

4.2.3 Experiment Results

Experiment set-ups

We conduct subjective listening tests on the coded audio files. While our listening tests are carried out on a small scale, we apply the PESQ tests to coded speech files as a supplement to the subjective results. A wide range of source files must be tested to ascertain which decomposition has a better frequency interpretation and is more robust to quantization. In our case, we choose a representative set of material including single instrumental music, single speaker speech, and music of mixed types. Six test audio files are from EBU-SQAM (European Broadcasting Union - Sound Quality Assessment Material) and the other two are difficult-to-code material. In the subjective tests, all sound files were first randomly ordered to eliminate any order preference in the testing sequence. Then the quality of the coded signals was evaluated through informal listening tests. Eight coded files including speech and music were presented over loudspeakers to five untrained listeners in a quiet room. The test is double blind: in our case, none of the listeners knew which of the signals was decomposed by the hybrid filter bank or by the pure MDCT filter bank. The listeners had to judge the overall sound quality and give their preference.

Results: subjective and PESQ tests

The results of the subjective listening tests are shown in Table 4.2. The per sample bit rates of the coded files, decomposed by the hybrid and pure MDCT filter banks, have been adjusted as close to each other as possible. Comparing the two, the listeners preferred the quality of the pure MDCT decomposition for 5 of all 8 coded files. For the other 3 cases, the quality of the sound files decomposed by the pure MDCT filter bank is not worse than that decomposed by the hybrid filter bank, with the only exception being the glockenspiel. On average, the pure MDCT filter bank outperforms the hybrid filter bank.

Table 4.2 Subjective listening tests: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure)

Sound Files      Hybrid (bits/sample)   Pure (bits/sample)   Hybrid - No Preference - Pure
Violin           0.572                  0.569                1 - 1 - 3
Flute            0.614                  0.594                1 - 0 - 4
Glockenspiel     0.405                  0.404                3 - 2 - 0
Piano            0.371                  0.372                0 - 2 - 3
Vega             0.981                  0.979                2 - 0 - 3
Seal             1.292                  1.289                2 - 1 - 2
Female Speech    0.853                  0.856                2 - 1 - 2
Male Speech      0.980                  0.975                0 - 1 - 4

PESQ tests are applied to the two speech files from EBU-SQAM. Since the PESQ software runs on a narrowband basis, both the reference speech files and the coded ones are first subsampled to 8 kHz and then PESQ tested. The results are shown in Table 4.3. The pure MDCT filter bank slightly outperforms the hybrid filter bank on both files. The PESQ experiment results are in accordance with the subjective judgements. Therefore, we conclude that a pure filter bank provides better performance than a hybrid one.

Table 4.3 PESQ MOS values: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure)

Speech Files     Hybrid (bits/sample)   Pure (bits/sample)   Hybrid - Pure MOS Values
Female Speech    0.853                  0.856                3.206 - 3.211
Male Speech      0.980                  0.974                3.546 - 3.549

The per sample bit rate is low because none of the test passages was coded at a transparent quality. All files were purposely coded at a close-to-transparent level and thus slight distortions were introduced for the goal of subjective comparison. For transparent coding, we used the notoriously hard-to-code material "vega" to test our coders, since the coder will be transparent for all audio inputs if it is transparent in the crucial test involving difficult-to-code material. Our subjective tests report a bit rate of 2.305 bits/sample for transparent coding of "vega". In addition, our PESQ tests report a bit rate around 2.074 bits/sample for transparent coding of the two speech files from EBU-SQAM. The reconstructed files can give a PESQ value of 4.

4.3 Psychoacoustic Transforms of DFT and MDCT

4.3.1 Inherent Mismatch Problem

The purpose of the psychoacoustic model is to estimate the maximum allowable distortion, represented as masking thresholds. Since the psychoacoustic model runs in the frequency domain, it is possible to use the output from the time-frequency mapping filter bank as the input for the psychoacoustic model, or to perform a separate transform for the purpose of psychoacoustic analysis. For example, the AAC codec uses the MDCT filter bank to decompose the audio while using a separate DFT filter bank for psychoacoustic processing. This could be a problem. Simply put, the DFT filter bank used in the model cannot always simulate the energy values from the codec MDCT filter bank. Because of this, the energy estimation might be incorrect and therefore the psychoacoustic output would be inaccurate. In the quantization stage, the audio coder decouples the psychoacoustic part from the quantization part: a DFT computes masking thresholds but one quantizes in the MDCT domain. But for the analysis-synthesis loop involving excitation distortion minimization, the two cannot be decoupled since there would be an inherent mismatch. For instance, if the quantizer is removed, the distortion should be zero, but it would not be, since each excitation pattern would be generated by a different time-frequency processing. This mismatch is illustrated in Fig. 4.9, represented by the frequency responses of the MDCT basis functions. In the figure, the MDCT seems to perform a projection of the time samples onto one set of basis functions h_k[n] for the forward transform (MDCT), and then uses this set and the set h_k[n + M] for signal reconstruction (IMDCT). Here we look at the DFT of the functions h_k[n]. For this set of basis functions, except for the DC term at k = 0, none of the frequency responses for h_k[n] achieves its maximum at the "expected" value of $k \times \pi/M$. The main lobe of each of the responses has a very large bandwidth, the center

Fig. 4.9 Frequency responses of the MDCT basis functions h_k[n], M = 4: (a) $H_0(\omega)$, (b) $H_1(\omega)$, (c) $H_2(\omega)$, (d) $H_3(\omega)$.

of which is moving to higher frequencies as k increases. It is important to note that, with the MDCT, while past samples can be used to get rid of the time aliasing in the first 50% of the frame, the time aliasing in the latter 50% of a frame would still be very severe, and so the computation of the DFT given the MDCT coefficients of such a frame would be misleading. This raises the question of developing a transform which has less time aliasing, that is, a smaller amount of overlap. We will elaborate on this motivation in Section 5.1.
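The observation above can be checked numerically. The sketch below is our code, assuming MDCT basis functions of the form in Eq. (3.20) with a sine window; it computes the DFT magnitude of each h_k[n] and compares the peak locations with the nominal values $k \times \pi/M$ (expressed in normalized frequency, where 0.5 corresponds to the Nyquist frequency).

```python
import numpy as np

M = 4
n = np.arange(2 * M)
h = np.sin((n + 0.5) * np.pi / (2 * M))                 # sine window
k = np.arange(M).reshape(-1, 1)
basis = h * np.sqrt(2.0 / M) * np.cos((n + (M + 1) / 2) * (k + 0.5) * np.pi / M)

# DFT magnitude of each basis function h_k[n] (cf. Fig. 4.9)
H = np.abs(np.fft.rfft(basis, n=1024, axis=1))
peaks = H.argmax(axis=1) / 1024.0                        # observed peak frequencies
expected = np.arange(M) / (2.0 * M)                      # k*pi/M in normalized frequency
print(peaks)        # broad main lobes; maxima drift away from the nominal centers
print(expected)     # (the k = 0 term is the exception)
```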

4.3.2 Experiment Results

In this part, we investigate the use of the MDCT spectrum as input to the perceptual model. The performance of coded files, using either the DFT spectrum or the MDCT spectrum for psychoacoustic analysis, is tested under the condition of a pure MDCT filter bank decomposition. We conducted both subjective listening tests and PESQ measurements. In the subjective tests, compared to the MDCT spectrum, the listeners unanimously believed that the DFT spectrum delivered better quality for most music passages and never performed worse. The PESQ values are shown in Table 4.4. As we can see, compared to the MDCT spectrum, the DFT spectrum generated better speech quality. The experiment results are different from what we had expected. One possible reason is that the DFT spectrum is a complex spectrum and the imaginary values are instrumental to tonality estimation. In our tests, the DFT spectrum produced better psychoacoustic analysis because the advantage of the complex spectrum outweighed the disadvantage of the energy mismatch.

Table 4.4 PESQ MOS values: DFT spectrum (DFT) vs. MDCT spectrum (MDCT)

Speech Files     DFT (bits/sample)   MDCT (bits/sample)   DFT - MDCT MOS Values
Female Speech    0.856               0.867                3.211 - 3.197
Male Speech      0.975               0.961                3.549 - 3.460

Chapter 5

Partially Overlapped Lapped Transforms

In this chapter, we present a new partially overlapped yet critically sampled transform, the Pre-DST lapped transform. The transform can vary the amount of overlap between neighboring blocks (let $M \le L \le 2M$) and, hence, gives fine control over the coding performance.

5.1 Motivation of Partially Overlapped LT: NMR Distortion

We explored the current decomposition structures in the MPEG standards: the hybrid filter bank in MP3 and the pure MDCT filter bank in AAC (Chapter 4). They are all based on 50% overlapped frames and use overlap-add for signal reconstruction. There is a problem with these coders: the quantized NMR is not the same as the reconstructed NMR. Der showed in [36] that, by overlapping frames, there exist two versions of the "reconstructed" NMR patterns. Version 1 is derived directly from the quantized spectral coefficients. Version 2 is derived from spectral analysis of the final time-domain coded signal, after overlap-add. These two versions of NMR patterns will not be the same if there is overlap between frames. It is obvious that the "correct" reconstructed NMR pattern is the one obtained only after time-domain addition, because this is the signal upon which the listeners perform the perceptual processing. Thus, NMR distortion, generated from the difference between the intermediary signal (version 1) and the reconstructed signal (version 2), will arise in any transform with overlap greater than zero.

There exist two solutions to the posed dilemma. The first is to forego overlapped representations. There are a few problems with this approach. First, it prohibits any lapped-transform decomposition; among them the Modified Discrete Cosine Transform (MDCT), which is perhaps the most popular and widely used transform in audio coding. Second, the quantized coefficients must come from a non-overlapped analysis; by the Nyquist constraints for perfect reconstruction, the only time window that may be used is a rectangular one, which has relatively poor sidelobe suppression properties. Finally, it is well known that non-overlapped reconstruction in transform coders results in discontinuities at frame boundaries due to quantization: the result is a highly audible low-frequency clicking, known as blocking edge effects. The other possibility, and the one which we pursue in this chapter, is to develop a transform which has less time aliasing in the overlap-add procedure, that is, less overlap. The transform we explore would be a partially overlapped yet critically sampled transform.

5.2 Construction of Partially Overlapped LT

5.2.1 MLT as DST via Pre- and Post-Filtering

The 50% overlapped modulated lapped transform (MLT) can be implemented efficiently by means of a fast transform of length M, as explained in [37] by Malvar. Assuming we have the MLT in a common form as

$$X_k(m) = \sqrt{\frac{2}{M}} \sum_{n=0}^{2M-1} x_m(n)\, h(n) \cos\left[\frac{\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\pi}{M}\right], \qquad (5.1)$$

where m is the block index and h(n) is the lowpass prototype filter. Defining a new sequence $y_m(n)$ as

$$y_m(n) = \begin{cases} x_m(n + M/2)\,h(n + M/2) - x_m(M/2 - n - 1)\,h(M/2 - n - 1), & n = 0, \ldots, M/2 - 1, \\ x_m(n + M/2)\,h(n + M/2) + x_m(5M/2 - n - 1)\,h(5M/2 - n - 1), & n = M/2, \ldots, M - 1, \end{cases} \qquad (5.2)$$

we can rewrite Eq. (5.1) as,

$$X_k(m) = \sqrt{\frac{2}{M}} \sum_{n=0}^{M-1} y_m(n) \sin\left[\frac{\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\pi}{M}\right]. \qquad (5.3)$$

The above equation shows that the MLT outputs can be obtained by applying to the sequence $y_m(n)$ the Type-IV Discrete Sine Transform (DST-IV), as defined in Eq. (3.36). Therefore, an MLT filter bank can be implemented in two steps: first, we compute the butterflies in Eq. (5.2); and second, we calculate the DST-IV of the sequence $y_m(n)$ as in Eq. (5.3). As we have shown in Chapter 3, the MDCT is a special case of the MLT with a particular choice of the prototype filter (a sine window), so it can be implemented in the same two-step fashion. The flowgraph of the forward MDCT is illustrated in Fig. 5.1, where the butterfly coefficients are $c_i = \cos[(2i - 1)\pi/4M]$ and $s_i = \sin[(2i - 1)\pi/4M]$. The flowgraph of the inverse MDCT is just the transpose of that of Fig. 5.1. Now the transform consists of a butterfly pre-processor and a DST-IV decomposition stage.
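To make the two-step route concrete, below is a small NumPy sketch (our code, not the thesis's) that computes one block via the butterflies of Eq. (5.2) followed by a DST-IV as in Eq. (5.3), and checks it against the direct form of Eq. (5.1). The alternating-sign correction applied at the end corresponds to the polarity matrix D introduced later in this section; M is assumed even.

```python
import numpy as np

def mlt_direct(x_block, h):
    """Direct MLT/MDCT of one 2M-sample block, Eq. (5.1)."""
    M = len(x_block) // 2
    n = np.arange(2 * M)
    k = np.arange(M).reshape(-1, 1)
    basis = np.cos((n + (M + 1) / 2) * (k + 0.5) * np.pi / M)
    return np.sqrt(2.0 / M) * basis @ (x_block * h)

def mlt_butterfly_dst(x_block, h):
    """Same MLT via butterfly pre-processing (Eq. 5.2) + DST-IV (Eq. 5.3)."""
    M = len(x_block) // 2
    u = x_block * h                                       # windowed samples
    y = np.empty(M)
    n1 = np.arange(M // 2)
    y[n1] = u[n1 + M // 2] - u[M // 2 - n1 - 1]           # first butterfly half
    n2 = np.arange(M // 2, M)
    y[n2] = u[n2 + M // 2] + u[5 * M // 2 - n2 - 1]       # second butterfly half
    n = np.arange(M)
    k = np.arange(M).reshape(-1, 1)
    dst4 = np.sin((n + 0.5) * (k + 0.5) * np.pi / M)      # DST-IV kernel
    X = np.sqrt(2.0 / M) * dst4 @ y
    return -((-1.0) ** np.arange(M)) * X                  # polarity correction (matrix D)

M = 8
h = np.sin((np.arange(2 * M) + 0.5) * np.pi / (2 * M))    # sine (MDCT) window
x = np.random.randn(2 * M)
print(np.allclose(mlt_direct(x, h), mlt_butterfly_dst(x, h)))   # should print True
```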

Fig. 5.1 Flowgraph of the Modified Discrete Cosine Transform [37].

In Fig. 5.1, we note that the input signal passes through an M-decimator after the DST-IV, that M/2 channels with an advance of M samples now appear before the DST-IV, and that the outputs are only computed for every M samples that are shifted in. We put the M-decimator before the butterfly so that the $z^M$ advance becomes a $z^1$ advance. The rearranged system is illustrated in Fig. 5.2, where the matrix B is the butterfly-coefficient

Fig. 5.2 Flowgraph of MDCT as block DST via butterfly pre-filtering.

matrix, as given in Eq. (5.4).

(5.4)

Now the MDCT can be viewed as a combination of the common block-based DST with simple time-domain pre-filtering. Since the inputs to the M/2 branches with $z^1$ advances are from the next data frame, the input block to the DST-IV is essentially the second half of the current butterfly outputs plus the first half of the next butterfly outputs. Thus, the whole system can be viewed globally as in Fig. 5.3.

Fig. 5.3 Global viewpoint of MDCT as pre-filtering at DST block boundaries.

It is important to note that the current frame contains the M samples in the dotted-line block, instead of the M samples of the MDCT frame in the solid-line block. The 50% overlapping is achieved by borrowing M/2 samples in the dotted-line blocks from neighboring frames. In the decomposition stage, B acts as the pre-filter working across the block boundaries, taking away interblock correlation; the pre-filtered time samples are then fed to the DST to be encoded as usual. In Fig. 5.3, the forward transform matrix H can be expressed in matrix form as

$$\mathbf{H} = \mathbf{D}\, \mathbf{S}^{IV}_{M}\, \mathbf{R}_{\mathrm{pre}}\, \mathbf{P} \qquad (5.5)$$

where

$$\mathbf{P} = \begin{bmatrix} \mathbf{B}_{M} & \mathbf{0}_{M \times M} \\ \mathbf{0}_{M \times M} & \mathbf{I}_{M} \end{bmatrix}_{2M \times 2M} \qquad (5.6)$$

$$\mathbf{R}_{\mathrm{pre}} = \begin{bmatrix} \mathbf{0}_{\frac{M}{2}\times\frac{M}{2}} & \mathbf{I}_{\frac{M}{2}\times\frac{M}{2}} & \mathbf{0}_{\frac{M}{2}\times\frac{M}{2}} & \mathbf{0}_{\frac{M}{2}\times\frac{M}{2}} \\ \mathbf{0}_{\frac{M}{2}\times\frac{M}{2}} & \mathbf{0}_{\frac{M}{2}\times\frac{M}{2}} & \mathbf{I}_{\frac{M}{2}\times\frac{M}{2}} & \mathbf{0}_{\frac{M}{2}\times\frac{M}{2}} \end{bmatrix}_{M \times 2M} \qquad (5.7)$$

$$\mathbf{D} = \mathrm{diag}(-1,\ 1,\ -1,\ \ldots,\ 1)_{M \times M} \qquad (5.8)$$

The pre-filter matrix P is composed of the butterfly matrix B and is applied to 2M time samples, $\mathbf{R}_{\mathrm{pre}}$ is the 2M → M mapping operator, $\mathbf{S}^{IV}_{M}$ is the DST-IV transform of length M, and D is the diagonal matrix inverting the polarity of the transform coefficients. We label this kind of Pre-filtered Discrete Sine Transform as the Pre-DST lapped transform.

5.2.2 Smaller Overlap Solution

Similar to the smaller overlap approach on the type-II fast Lapped Orthogonal Transform (LOT) in [38], our partially overlapped MLT is derived from an observation of the structure in Fig. 5.3. The amount of overlap can be lowered by reducing the size of the pre-processing matrix B. An M × L LT, where $L \le 2M$ and $M \ge 2$, can be easily constructed with an $(L - M) \times (L - M)$ pre-filter B, which has the same form as Eq. (5.4), except that the matrix B is now of size L − M. In an extreme situation where L = M (0% overlap), the pre-filtering is turned off and the system turns into a disjoint DST-IV transform. The diagram of the partially overlapped LT at arbitrary overlaps is shown in Fig. 5.4. The matrix representation of the Pre-DST LT in Fig. 5.4 is

$$\mathbf{H} = \mathbf{D}\, \mathbf{S}^{IV}_{M}\, \mathbf{R}_{\mathrm{pre}}\, \mathbf{P} \qquad (5.9)$$

where P is the pre-filtering matrix applied to L time samples and $\mathbf{R}_{\mathrm{pre}}$ is an L → M mapping operator. The full system functions to convert L time samples to M transform coefficients.

$$\mathbf{P} = \begin{bmatrix} \mathbf{B}_{L-M} & \mathbf{0}_{(L-M)\times(2M-L)} & \mathbf{0}_{(L-M)\times(L-M)} \\ \mathbf{0}_{(2M-L)\times(L-M)} & \mathbf{I}_{(2M-L)\times(2M-L)} & \mathbf{0}_{(2M-L)\times(L-M)} \\ \mathbf{0}_{(L-M)\times(L-M)} & \mathbf{0}_{(L-M)\times(2M-L)} & \mathbf{B}_{L-M} \end{bmatrix}_{L \times L} \qquad (5.10)$$

Fig. 5.4 Pre-DST lapped transforms at arbitrary overlaps (L < 2M).

$$\mathbf{R}_{\mathrm{pre}} = \begin{bmatrix} \mathbf{0}_{\frac{L-M}{2}\times\frac{L-M}{2}} & \mathbf{I}_{\frac{L-M}{2}\times\frac{L-M}{2}} & \mathbf{0}_{\frac{L-M}{2}\times(2M-L)} & \mathbf{0}_{\frac{L-M}{2}\times\frac{L-M}{2}} & \mathbf{0}_{\frac{L-M}{2}\times\frac{L-M}{2}} \\ \mathbf{0}_{(2M-L)\times\frac{L-M}{2}} & \mathbf{0}_{(2M-L)\times\frac{L-M}{2}} & \mathbf{I}_{(2M-L)\times(2M-L)} & \mathbf{0}_{(2M-L)\times\frac{L-M}{2}} & \mathbf{0}_{(2M-L)\times\frac{L-M}{2}} \\ \mathbf{0}_{\frac{L-M}{2}\times\frac{L-M}{2}} & \mathbf{0}_{\frac{L-M}{2}\times\frac{L-M}{2}} & \mathbf{0}_{\frac{L-M}{2}\times(2M-L)} & \mathbf{I}_{\frac{L-M}{2}\times\frac{L-M}{2}} & \mathbf{0}_{\frac{L-M}{2}\times\frac{L-M}{2}} \end{bmatrix}_{M \times L} \qquad (5.11)$$

The flowgraph of the inverse transform is described in Fig. 5.5. The received signal is first inverse-DST transformed, then post-filtered along with the (L − M)/2 coefficients from the previous and next frame. Finally the reconstructed signal is obtained as the combination of the post-filtered samples with the outputs directly from the inverse DST. The Pre-DST system has several advantages. First, it is a perfect reconstruction system, which is structurally guaranteed by the orthogonality of the butterfly-coefficient matrix

B, the matrix $\mathbf{S}^{IV}$ and the polarity conversion matrix D. In addition, the whole system is fast computable because the DST-IV module can be implemented by one of the many fast algorithms. Second, though the block length of the transform is L, the partially overlapped Pre-DST is critically sampled in that one data block contains only M new time samples. The varying overlap percentage is achieved by borrowing different amounts of samples from neighboring

Fig. 5.5 Post-DST lapped transforms at arbitrary overlaps (L < 2M).

frames. Third, the Pre-DST system has the property of cascade linear phase. It is true that the impulse responses of the Pre-DST do not have even/odd symmetry; therefore their frequency responses do not have linear phase. Nevertheless, linear phase alone is not generally a required property, since in coding applications if a signal is processed by the kth analysis filter, it will also be processed by the kth synthesis filter. If the analysis filters are equal to the time-reversed synthesis filters, the overall impulse response of any channel has even symmetry and so the cascade connection has linear phase. Through some elementary matrix manipulations, it is easy to verify that the Pre-DST has the property of cascade linear phase.
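As a small structural illustration (our code; the block layout follows Eq. (5.11) as given above), the mapping operator $\mathbf{R}_{\mathrm{pre}}$ simply picks the M "own" samples out of the L-sample pre-filtered block:

```python
import numpy as np

def r_pre(M, L):
    """Mapping operator R_pre of Eq. (5.11), assembled block by block
    (assumes M <= L <= 2M and L - M even)."""
    a = (L - M) // 2          # half of the overlap region
    b = 2 * M - L             # size of the non-overlapped middle part
    R = np.zeros((M, L))
    R[0:a,     a:2 * a]         = np.eye(a)   # first row block
    R[a:a + b, 2 * a:2 * a + b] = np.eye(b)   # middle row block
    R[a + b:M, M:M + a]         = np.eye(a)   # last row block
    return R

M, L = 8, 10                   # 20% overlap: L = 5M/4
R = r_pre(M, L)
print(R @ np.arange(L))        # selects the middle M samples of the L-sample block
```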

5.3 Performance Evaluation

5.3.1 Pre-echo Mitigation

As mentioned, the motivation behind developing a partially overlapped LT is to reduce the NMR distortion in the overlap-add procedure. In this section, we show that the partially overlapped Pre-DST can effectively mitigate the frame-to-frame pre-echo artefact, one form of NMR distortion.

The pre-echo artefact arises in perceptual coding systems. It occurs when a signal with a sharp attack begins near the end of a transform block immediately following a region of low energy. The quantization error will then be spread out over some time before the music attack. For a block-based algorithm, when quantization and encoding are performed in order to satisfy the masking thresholds associated with the average spectral estimate within the analysis window, the quantization error in the coder is added to the spectral components as a signal of the window length. Thus, the inverse transform will spread the quantization error evenly in time over the full window length. This results in audible distortions throughout the low-energy region preceding the signal attack in time. Pre-echoes can arise when coding recordings of percussive instruments such as the castanets. There are a number of techniques to avoid audible pre-echoes, including window switching and temporal noise shaping, as mentioned in Section 2.3.1.

Our Pre-DST framework can be designed to adapt to transient signals and thus compensate for pre-echoes. Based on the energy of the generated transform coefficients, we can vary the number of borrowed samples dynamically while the frame size is fixed to M. For example, the pre-filtering operator can be chosen amongst: no filtering (0% overlap), borrowing M/4 samples (20% overlap), borrowing 2M/3 samples (40% overlap), or borrowing M samples (50% overlap). Thus, we are switching from an M × M to a 5M/4 × M to a 5M/3 × M and to a 2M × M Pre-DST. The price to be paid is a small increase in the side information used to specify the overlap. For instance, if the number of borrowed samples can be chosen from the set {0, M/4, 2M/3, M}, the side information increase for each frame is then 2 bits.

We use the standard test file "castanets", which has sharp transients, to examine our Pre-DST decomposition in a full audio coder. Other parts of the audio coder, such as the psychoacoustic model and quantization, are identical to the audio coder in Section 4.2.1. The experiment set-ups are the same as those in Section 4.2.3. So, we are testing the Pre-DST decomposition at different overlaps and note that, at 50% overlap, the Pre-DST structure becomes the MDCT and our coder is precisely the pure MDCT audio coder in Section 4.2.1. The differences in the experiment results are not only audible but also visible. Coded waveforms are shown in Fig. 5.6. Compared to the MDCT (Pre-DST at 50% overlap), the 20% overlapped Pre-DST significantly reduces the pre-echoes in the waveform.

Bit rate performance is measured by computing the empirical entropy, which is a realistic measure in our critically sampled Pre-DST system. Results are shown in Table 5.1:

Fig. 5.6 Partially overlapped Pre-DST example showing pre-echo mitigation for sound files of castanets: (a) uncoded file, (b) MDCT coded file, 576 new samples per block at 50% overlap (showing pre-echo distortion), (c) Pre-DST coded file, 576 new samples per block at 20% overlap (showing mitigated pre-echo).

as overlap reduces, bits per sample (empirical entropy) monotonically decreases. This is understandable because fewer coefficients are coded and thus less information content is involved. Again, all coding is performed at a non-transparent level and the per sample bit rate has been adjusted as close as possible.

5.3.2 Optimal Overlapping Point for Transient Audio

Obviously, the fewer overlapping samples borrowed at each block boundary, the better we can capture the time-varying characteristics of the transient audio and keep the pre-echoes under control. However, when reducing the overlap percentage, we increase another artefact, the blocking edge effects. Thus, from 0% overlap to 50% overlap, the pre-echoes increase while the blocking edge effects decrease. The question is how much overlap balances both artefacts and makes the reconstructed audio as a whole sound best. We refer to this overlap percentage as the Optimal Overlapping Point. Preliminary experiments are carried out on castanets to find the solution and the results are shown in Table 5.1.

Table 5.1 Subjective listening tests of Pre-DST coded test files of castanets.

Overlap Percentage   Sound Quality     Bits/Sample
10%                  most distorted    0.950
20%                  least distorted   0.969
30%                  distorted         0.982
40%                  more distorted    1.008
50%                  more distorted    1.008

Our experiments show that the optimal overlapping point for sound files of castanets would be around 20%. This implies that an overlap of 20% is sufficient for the blocking effect reduction and simultaneously conceals the pre-echoes under the backward temporal masking thresholds of the sharp attack (Section 2.1.4). To reach a conclusive optimal overlapping point, more transient audio files have to be tested.

Chapter 6

Conclusion

The purpose of our research has been to explore decomposition structures that could be used to compute perceptual distortion measures effectively, and to develop a transform operating at smaller overlaps. To accomplish our goal, two widely used decomposition filter banks were implemented and their performance compared. In addition, we have proposed a partially overlapped lapped transform, the Pre-DST. This structure can be designed to adapt to the time-varying characteristics of the input audio.

6.1 Thesis Summary

In Chapter 1 the major classes of coding paradigms, i.e., parametric coding and waveform coding, were presented. Frequency-domain waveform coders employing the perceptual principle were identified as the best alternative for the coding of general audio signals. Additionally, the basic concept of time-frequency transformation was introduced and its application to audio coding was described.

Chapter 2 started with a description of the physiology of the human auditory system. The basic theory of loudness perception was introduced, since it maps the input signal energy to sound pressure levels. The important concept of critical bands, which approximate the bandwidths of the auditory bandpass filters, was presented to explain the frequency resolution of the ear. Subsequently, auditory masking effects were discussed, with emphasis on the simultaneous masking phenomenon.

Section 2.2 presented well-known auditory models that predict the amount of masking produced by a complex audio signal. The model under study was Johnston's model.

Following that discussion, the basic structure of a perceptual audio coder was presented in Section 2.3. Among other components, the decomposition filter bank, the psychoacoustic model, bit allocation and quantization were examined.

Chapter 3 provided a detailed analysis of lapped transforms. Lapped transforms are a proper choice for transform coders because they operate on overlapping blocks of data, which reduces blocking edge effects. The Modulated Lapped Transform (MLT), which is produced by modulating cosine functions with a prototype lowpass time window, was analyzed; within this family, the Modified Discrete Cosine Transform (MDCT) was noted for its wide popularity in audio coding. Modulated Lapped Transforms were compared to an equivalent filter bank representation, and the effect of the prototype window on the frequency response of the resulting filter bank was discussed. Perfect reconstruction conditions in which an identical window is used in the analysis and synthesis stages were compared to those of the Lapped Orthogonal Transform (LOT), in which two symmetric matrices are used. Finally, the issue of adaptive filter banks was addressed, and a window switching method was analyzed as a form of adaptive filter bank to reduce pre-echo artefacts in audio coding.

In Chapter 4, we thoroughly discussed the two main classes of decomposition schemes in audio coding, those of MP3 (MPEG-1 Layer III) and AAC (MPEG-2 Advanced Audio Coding). The MP3 decomposition is a hybrid filter bank, which consists of a subband filter bank and a transform filter bank; some information is lost during its signal decomposition. The AAC decomposition is a pure MDCT filter bank which can perfectly reconstruct the original signal in the absence of quantization. However, in terms of perceptually transparent coding, no difference between the original and reconstructed signal can be perceived by the human ear with either method.

In the subsequent sections we described the different blocks of our audio coders along with the related algorithms. A hybrid or pure MDCT filter bank was used to decompose the input signal into its spectral components. The spectral coefficients were grouped into 25 subbands to emulate the frequency analysis in the ear. To quantize the transform coefficients, a scalar quantization approach was taken. The bit allocation algorithm based on the Noise-to-Mask Ratio (NMR) was introduced. In the process of quantization, the simultaneous masking thresholds were used to determine the acceptable noise level. Subsequently, following a description of performance evaluation measures, the performance assessment of the MP3 and AAC decompositions was presented. It was argued that the pure transform filter bank performs better than the hybrid structure with subband and transform filter banks.

In Section 4.3, DFT-based and MDCT-based psychoacoustic analysis approaches were compared, and the DFT-based approach performed better than the MDCT-based one.

Chapter 5 introduced a novel coding structure called the Pre-filtering DST (Pre-DST). The development first built on the analysis of the Modified Discrete Cosine Transform (MDCT) that was presented in Chapter 3. Based on this analysis, a framework was proposed which can vary the overlap percentage between blocks at arbitrary degrees. The performance evaluation of the proposed coding structure was presented in Section 5.3. Performance improvements of the Pre-DST were observed when coding transient audio signals, compared to the MDCT decomposition. In addition, the optimal overlap percentage to model transient signals was investigated and we reported a value of around 20%.

6.2 Future Research Directions

In this section, we make some suggestions for future research on more general aspects of lapped transforms in audio coding.

• Use a complex MDCT representation and compare to the DFT spectrum. The experimental results in Section 4.3 showed that the DFT spectrum produced better reconstructed sound quality than the MDCT spectrum. One possible reason is that the DFT generates a complex excitation spectrum, and the incorporated imaginary parts can be instrumental. A complex MDCT spectrum could be constructed from the MDCT and the MDST (a rough sketch of this construction is given after this list).

• Further designs on the Pre-DST structure.

- Instead of using a unitary matrix, experiment with other diagonal matrices between the butterfly matrices to represent the pre-filter.
- Use time windows other than the sine window to construct the butterfly matrix and test the performance.

• We have argued that applying the NMR criterion on the signal before overlap-add is inappropriate. This can be avoided by adopting a partially overlapped transform. However, several problems arise if a partially overlapped representation is used. As the overlap is reduced, the time window has poor sidelobe suppression properties (in the extreme condition of 0% overlap it becomes a rectangular window), resulting in poor frequency separation. Also, as the overlap is reduced, the blocking edge effects at

frame boundaries due to quantization become more audible. Therefore, the task will be to find an overlap percentage which could balance all the beneficial and deleterious effects.
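As a rough illustration of the first suggestion above, the sketch below builds a complex spectrum from direct-form MDCT and MDST analyses of one (already windowed) block of 2M samples. Sign and scaling conventions vary in the literature; this is only one possible convention and not a construction implemented in this thesis.

```python
import numpy as np

def mdct_mdst(block):
    """Direct-form MDCT and MDST of one windowed block of 2M samples:
       Xc[k] = sum_n x[n] cos(pi/M (n + 1/2 + M/2)(k + 1/2)),  k = 0..M-1,
       Xs[k] = sum_n x[n] sin(pi/M (n + 1/2 + M/2)(k + 1/2))."""
    block = np.asarray(block, dtype=float)
    M = len(block) // 2
    n = np.arange(2 * M)
    k = np.arange(M)
    phase = (np.pi / M) * np.outer(k + 0.5, n + 0.5 + M / 2)
    return np.cos(phase) @ block, np.sin(phase) @ block

def complex_mdct(block):
    """One possible complex (MCLT-like) spectrum built from MDCT and MDST."""
    Xc, Xs = mdct_mdst(block)
    return Xc - 1j * Xs

# A psychoacoustic model could then operate on np.abs(complex_mdct(block))**2
# as an excitation power spectrum, analogous to the DFT-based analysis of Section 4.3.
```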

Appendix A

Greedy Algorithm and Entropy Computation

A.1 Greedy Algorithm

The greedy algorithm is a simple and intuitive method for achieving integer-constrained bit allocation [18]. The algorithm is performed iteratively, ensuring an integer assignment of bits to each quantizer. At each iteration, one bit is allocated to the quantizer for which the decrease in a distortion measure is largest. The algorithm is greedy since bit allocations are optimized per iteration rather than considering the final distortion. The algorithm is summarized below. Assume that $B$ bits are available for $N$ quantizers. Let $W_i(b)$ represent the distortion function associated with the $i$th quantizer having $b$ bits. Additionally, let $b_i(m)$ represent the number of bits allocated to the $i$th quantizer after $m$ iterations.

0 - Initialize the number of bits assigned to each quantizer to zero, such that $b_i(0) = 0$ for $i = 1, \ldots, N$, and set the iteration counter to $m = 1$.

1 - Find the index $j$ such that $j = \arg\max_i \{ W_i(b_i(m-1)) - W_i(b_i(m-1) + 1) \}$, i.e., the quantizer whose distortion decreases most when it receives one additional bit.

2 - Set $b_j(m) = b_j(m-1) + 1$ and $b_i(m) = b_i(m-1)$ for all $i \neq j$.

3 - Set $m = m + 1$. If $m \leq B$, return to step 1.
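A minimal sketch of these steps in code is given below; the exponential distortion model $W_i(b) = \sigma_i^2 \, 2^{-2b}$ used in the example is only a placeholder for whatever per-quantizer distortion function is actually available.

```python
import numpy as np

def greedy_allocation(W, N, B):
    """Integer-constrained greedy bit allocation.

    W(i, b) returns the distortion of the i-th quantizer when given b bits;
    B bits are distributed over N quantizers, one bit per iteration.
    """
    bits = np.zeros(N, dtype=int)                          # b_i(0) = 0
    for _ in range(B):                                     # iterations m = 1 .. B
        gains = [W(i, bits[i]) - W(i, bits[i] + 1) for i in range(N)]
        j = int(np.argmax(gains))                          # largest decrease in distortion
        bits[j] += 1                                       # b_j(m) = b_j(m-1) + 1
    return bits

# Example with a placeholder distortion model W_i(b) = var_i * 2^(-2b).
variances = np.array([4.0, 1.0, 0.25, 0.0625])
W = lambda i, b: variances[i] * 2.0 ** (-2 * b)
print(greedy_allocation(W, N=4, B=8))                      # -> [4 3 1 0]
```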

A.2 Entropy Computation

The entropy of a discrete random variable X is a function of its PMF (probability mass function) and is defined by [39]

$$H(X) = \sum_{i=1}^{K} p_i \log\frac{1}{p_i}, \qquad \textrm{(A.1)}$$

where $p_i$ is the probability of the random event $X = a_i$, for all $i = 1, 2, \ldots, K$.

Assume we have the encoded signal $S$, composed of $N$ frames with $M$ quantized coefficients per frame, arranged in a matrix as

$$S = \begin{bmatrix} c_1(1) & c_1(2) & \cdots & c_1(N) \\ c_2(1) & c_2(2) & \cdots & c_2(N) \\ \vdots & \vdots & \ddots & \vdots \\ c_M(1) & c_M(2) & \cdots & c_M(N) \end{bmatrix}, \qquad \textrm{(A.2)}$$

where the $i$th column corresponds to the $i$th frame. All coefficients $c_j(i)$ in the $j$th row constitute a sample space of real numbers and represent a random variable, denoted $X_j$. It consists of values generated by the $j$th subband scalar quantizer of $L$ levels. The set $\mathcal{L} = \{ l_1, l_2, \ldots, l_L \}$ denotes the set in which the random variable $X_j$ takes its values. Similar to Eq. (A.1), the entropy of the discrete random variable $X_j$ is given by

$$H(X_j) = \sum_{i=1}^{L} p_i \log\frac{1}{p_i}, \qquad \textrm{(A.3)}$$

where $p_i$ is the probability of the random event $X_j = l_i$, for all $i = 1, 2, \ldots, L$.

Now that we have the entropy of the $j$th component $X_j$, the per-sample entropy of the whole encoded signal is simply the average entropy of the $M$ components,

$$E_S = \frac{1}{M} \sum_{j=1}^{M} H(X_j). \qquad \textrm{(A.4)}$$

This is the per sample bit rate of the encoded audio signal, also known as the empirical entropy of the quantized coefficients.
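A small sketch of this computation, assuming the quantized coefficients are arranged exactly as in Eq. (A.2) with one frame per column (the function name and the toy data are illustrative only):

```python
import numpy as np

def empirical_entropy(S):
    """Per-sample empirical entropy (in bits) of an M x N matrix of quantized
    coefficients, where column i holds the M coefficients of frame i."""
    M, N = S.shape
    total = 0.0
    for j in range(M):                           # j-th subband, i.e. the j-th row
        _, counts = np.unique(S[j, :], return_counts=True)
        p = counts / N                           # relative frequencies as probabilities
        total += -np.sum(p * np.log2(p))         # H(X_j) in bits
    return total / M                             # average over the M components

# Toy example: 4 subbands, 1000 frames of 3-bit quantizer indices.
rng = np.random.default_rng(0)
S = rng.integers(0, 8, size=(4, 1000))
print(empirical_entropy(S))                      # close to 3 bits/sample for uniform indices
```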

References

[1] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proc. IEEE, vol. 88, pp. 451-513, Apr. 2000.

[2] D. O'Shaughnessy, Speech Communications: Human and Machine. IEEE Press, second ed., 2000.

[3] H. Najafzadeh-Azghandi, Perceptual Coding of Narrowband Audio Signals. PhD thesis, McGill University, Montreal, Canada, Apr. 2000 (http://tsp.ECE.McGill.CA).

[4] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models. Springer-Verlag, 1990.

[5] B. C. J. Moore, An Introduction to the Psychology of Hearing. Academic Press, fourth ed., 1997.

[6] E. Terhardt, "Calculating virtual pitch," Hear. Res., vol. 1, pp. 155-182, May 1979.

[7] B. Scharf, "Critical bands," in Foundations of Modern Auditory Theory (J. V. Tobias, ed.), New York: Academic Press, 1970.

[8] B. C. J. Moore and B. R. Glasberg, "Derivation of auditory filter shapes from notched-noise data," Hear. Res., vol. 47, pp. 103-138, Aug. 1990.

[9] E. Terhardt, G. Stoll, and M. Seewann, "Algorithm for extraction of pitch and pitch salience from complex tonal signals," J. Acoust. Soc. Am., vol. 71, pp. 679-688, Mar. 1982.

[10] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE J. Selected Areas in Comm., vol. 6, pp. 314-323, Feb. 1988.

[11] International Standards Organization, Coding of Moving Pictures and Associated Audio, Apr. 1993. ISO/IEC JTC1/SC29/WG11.

[12] International Standards Organization, Generic Coding of Moving Pictures and Associated Audio Information (Part 7): Advanced Audio Coding (AAC), 1996. ISO/IEC DIS 13818-7.

[13] International Telecommunication Union, Method for Objective Measurements of Perceived Audio Quality, July 1999. ITU-R Recommendation BS.1387.

[14] P. Kabal, "An examination and interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality," tech. rep., McGill University, Montreal, Canada, Dec. 2003 (http://tsp.ECE.McGill.CA).

[15] J. D. Johnston, "Estimation of perceptual entropy using noise masking criteria," in Proc. IEEE Int. Conf. on Acoustic, Speech, Signal Processing, (New York, USA), pp. 2524-2527, Apr. 1988.

[16] D. Pan, "A tutorial on MPEG/audio compression," IEEE Multimedia, vol. 2, pp. 60-74, Summer 1995.

[17] W. B. Kleijn and K. K. Paliwal, eds., Speech Coding and Synthesis. Elsevier, 1995.

[18] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer, 1991.

[19] A. D. Subramaniam and B. D. Rao, "PDF optimized parametric vector quantization of speech line spectral frequencies," IEEE Trans. Speech and Audio Processing, vol. 11, pp. 130-142, Mar. 2003.

[20] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inform. Theory, vol. 28, pp. 129-137, Mar. 1982.

[21] H. Malvar, Signal Processing with Lapped Transforms. Artech House, 1992.

[22] H. S. Malvar, "The LOT: Transform coding without blocking effects," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 37, pp. 553-559, Apr. 1989.

[23] J. P. Princen and A. B. Bradley, "Analysis/synthesis filter bank design based on time domain aliasing cancellation," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 34, pp. 1153-1161, Oct. 1986.

[24] J. H. Rothweiler, "Polyphase quadrature filters - A new subband coding technique," in Proc. IEEE Int. Conf. on Acoustic, Speech, Signal Processing, (Somerville, USA), pp. 1280-1283, Apr. 1983.

[25] J. P. Princen, A. W. Johnson, and A. B. Bradley, "Subband/transform coding using filter bank designs based on time domain aliasing cancellation," in Proc. IEEE Int. Conf. on Acoustic, Speech, Signal Processing, (Guildford, UK), pp. 2161-2164, Apr. 1987.

[26] A. Sugiyama, F. Hazu, M. Iwadare, and T. Nishitani, "Adaptive transform coding with adaptive block size (ATC-ABS)," in Proc. IEEE Int. Conf. on Acoustic, Speech, Signal Processing, (Albuquerque, USA), pp. 1093-1096, Apr. 1990.

[27] D. Sinha and J. Johnston, "Audio compression at low bit rates using a signal adaptive switched filter bank," in Proc. IEEE Int. Conf. on Acoustic, Speech, Signal Processing, (Minneapolis, USA), pp. 1053-1056, May 1996.

[28] D. Sinha and A. Tewfik, "Low bit rate transparent audio compression using adapted wavelets," IEEE Trans. Signal Processing, vol. 41, pp. 3463-3479, Dec. 1993.

[29] J. Herre and J. D. Johnston, "Enhancing the performance of perceptual audio coders by using temporal noise shaping (TNS)," in Proc. 101st Conv. Aud. Eng. Soc., (Los Angeles, USA), Aug. 1996. Preprint 4384.

[30] S. Shlien, "The modulated lapped transform, its time-varying forms, and its applications to audio coding standards," IEEE Trans. Speech and Audio Processing, vol. 5, pp. 359-366, July 1997.

[31] M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagiri, H. Fuchs, M. Dietz, J. Herre, G. Davidson, and Y. Oikawa, "ISO/IEC MPEG-2 advanced audio coding," J. Aud. Eng. Soc., vol. 45, pp. 789-812, Oct. 1997.

[32] J. Herre, "Temporal noise shaping, quantization and coding methods in perceptual audio coding: a tutorial introduction," in Aud. Eng. Soc. 17th International Conference on High Quality Audio Coding, (Florence, Italy), pp. 17-31, Aug. 1999.

[33] K. Brandenburg, "MP3 and AAC explained," in Aud. Eng. Soc. 17th International Conference on High Quality Audio Coding, (Florence, Italy), pp. 9-17, Aug. 1999.

[34] British Telecommunications and Royal KPN NV, "Software: Perceptual Evaluation of Speech Quality (PESQ)," Nov. 2000.

[35] X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing. Prentice Hall, 2001.

[36] R. Der, P. Kabal, and W.-Y. Chan, "Bit allocation algorithms for frequency and time spread perceptual coding," in Proc. IEEE Int. Conf. on Acoustic, Speech, Signal Processing, (Montreal, Canada), pp. 201-204, May 2004.

[37] H. S. Malvar, "Lapped transforms for efficient transform/subband coding," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 38, pp. 969-978, Jun. 1990.

[38] T. D. Tran, J. Liang, and C. Tu, "Lapped transform via time-domain pre- and post-filtering," IEEE Trans. Signal Processing, vol. 51, pp. 1557-1571, Jun. 2003.

[39] J. G. Proakis and M. Salehi, Communication Systems Engineering. Prentice Hall, 2002.