Lapped Transforms in Perceptual Coding of Wideband Audio

Sien Ruan
Department of Electrical & Computer Engineering
McGill University
Montreal, Canada
December 2004

A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Master of Engineering.

© 2004 Sien Ruan

To my beloved parents

Abstract

Audio coding paradigms depend on time-frequency transformations to remove statistical redundancy in audio signals and reduce the data bit rate, while maintaining high fidelity of the reconstructed signal. Sophisticated perceptual audio coding further exploits perceptual redundancy in audio signals by incorporating perceptual masking phenomena. This thesis investigates different coding transformations that can be used to compute perceptual distortion measures effectively, among them the lapped transform, which is the most widely used in present-day audio coders. Moreover, an innovative lapped transform is developed whose overlap percentage can be varied to arbitrary degrees. The new lapped transform is applicable to transient audio, as it captures the time-varying characteristics of the signal.

Sommaire

Audio coding paradigms depend on time-frequency transformations to remove the statistical redundancy in audio signals and to reduce the data transmission rate, while maintaining high fidelity of the reconstructed signal. Sophisticated perceptual audio coding further exploits the perceptual redundancy in audio signals by incorporating perceptual masking phenomena. This thesis concentrates on the investigation of the different coding transformations that can be used to compute perceptual distortion measures effectively, among them the lapped transform, which is the most widely used in present-day audio coders. In addition, an innovative lapped transform is developed that can change the overlap percentage to arbitrary degrees. The new lapped transform is applicable to transient audio, as it captures the time-varying characteristics of the signal.

Acknowledgments

I would like to acknowledge my supervisor, Prof. Peter Kabal, for his support and guidance throughout my graduate studies at McGill University. Prof. Kabal's kind treatment of his students is highly appreciated. I would also like to thank Ricky Der for working with me and advising me through the work. My thanks go to my fellow TSP graduate students for their close friendship, especially Alexander M. Wyglinski for his varied technical assistance. I am sincerely indebted to my parents for all the encouragement they have given me. They are the reason for who I am today. To my mother, Mrs. Dejun Zhao, and my father, Mr. Liwu Ruan, thank you.

Contents

1 Introduction
1.1 Audio Coding Techniques
1.1.1 Parametric Coders
1.1.2 Waveform Coders
1.2 Time-to-Frequency Transformations
1.3 Thesis Contributions
1.4 Thesis Synopsis
2 Perceptual Audio Coding: Psychoacoustic Audio Compression
2.1 Human Auditory Masking
2.1.1 Hearing System
2.1.2 Perception of Loudness
2.1.3 Critical Bands
2.1.4 Masking Phenomena
2.2 Example Perceptual Model: Johnston's Model
2.2.1 Loudness Normalization
2.2.2 Masking Threshold Calculation
2.2.3 Perceptual Entropy
2.3 Perceptual Audio Coder Structure
2.3.1 Time-to-Frequency Transformation
2.3.2 Psychoacoustic Analysis
2.3.3 Adaptive Bit Allocation
2.3.4 Quantization
2.3.5 Bitstream Formatting
3 Signal Decomposition with Lapped Transforms
3.1 Block Transforms
3.2 Lapped Transforms
3.2.1 LT Orthogonal Constraints
3.3 Filter Banks: Subband Signal Processing
3.3.1 Perfect Reconstruction Conditions
3.3.2 Filter Bank Representation of the LT
3.4 Modulated Lapped Transforms
3.4.1 Perfect Reconstruction Conditions
3.5 Adaptive Filter Banks
3.5.1 Window Switching with Perfect Reconstruction
4 MP3 and AAC Filter Banks
4.1 Time-to-Frequency Transformations of MP3 and AAC
4.1.1 MP3 Transformation: Hybrid Filter Bank
4.1.2 AAC Transformation: Pure MDCT Filter Bank
4.2 Performance Evaluation
4.2.1 Full Coder Description
4.2.2 Audio Quality Measurements
4.2.3 Experiment Results
4.3 Psychoacoustic Transforms of DFT and MDCT
4.3.1 Inherent Mismatch Problem
4.3.2 Experiment Results
5 Partially Overlapped Lapped Transforms
5.1 Motivation of Partially Overlapped LT: NMR Distortion
5.2 Construction of Partially Overlapped LT
5.2.1 MLT as DST via Pre- and Post-Filtering
5.2.2 Smaller Overlap Solution
5.3 Performance Evaluation
5.3.1 Pre-echo Mitigation
5.3.2 Optimal Overlapping Point for Transient Audio
6 Conclusion
6.1 Thesis Summary
6.2 Future Research Directions
A Greedy Algorithm and Entropy Computation
A.1 Greedy Algorithm
A.2 Entropy Computation

List of Figures

2.1 Absolute threshold of hearing for normal listeners.
2.2 Generic perceptual audio encoder.
2.3 Sine MDCT window (576 points).
3.1 General signal processing system using the lapped transform.
3.2 Signal processing with a lapped transform with L = 2M.
3.3 Typical subband processing system, using the filter bank.
3.4 Magnitude frequency response of an MLT (M = 10).
4.1 MPEG-1 Layer III decomposition structure.
4.2 Layer III prototype filter (b) and the original window (a).
4.3 Magnitude response of the lowpass filter.
4.4 Magnitude response of the polyphase filter bank (M = 32).
4.5 Switching from a long sine window to a short one via a start window.
4.6 Layer III aliasing-butterfly, encoder/decoder.
4.7 Layer III aliasing reduction encoder/decoder diagram.
4.8 Block diagram of the encoder of the full audio coder.
4.9 Frequency response of the MDCT basis function h_k(n), M = 4.
5.1 Flowgraph of the Modified Discrete Cosine Transform.
5.2 Flowgraph of MDCT as block DST via butterfly pre-filtering.
5.3 Global viewpoint of MDCT as pre-filtering at DST block boundaries.
5.4 Pre-DST lapped transforms at arbitrary overlaps (L < 2M).
5.5 Post-DST lapped transforms at arbitrary overlaps (L < 2M).
5.6 Partially overlapped Pre-DST example showing pre-echo mitigation for sound files of castanets.

List of Tables

2.1 Critical bands measured by Scharf.
4.1 MOS is a number mapping to the above subjective quality.
4.2 Subjective listening tests: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure).
4.3 PESQ MOS values: Hybrid filter bank (Hybrid) vs. Pure MDCT filter bank (Pure).
4.4 PESQ MOS values: DFT spectrum (DFT) vs. MDCT spectrum (MDCT).
5.1 Subjective listening tests of Pre-DST coded test files of castanets.

List of Terms

AAC       MPEG-2 Advanced Audio Coding
ADPCM     Adaptive Differential Pulse Code Modulation
CELP      Code Excited Linear Prediction
DCT       Discrete Cosine Transform
DFT       Discrete Fourier Transform
DPCM      Differential Pulse Code Modulation
DST       Discrete Sine Transform
EBU-SQAM  European Broadcasting Union Sound Quality Assessment Material
ERB       Equivalent Rectangular Bandwidth
FIR       Finite Impulse Response
IMDCT     Inverse Modified Discrete Cosine Transform
ITU       International Telecommunication Union
MDCT      Modified Discrete Cosine Transform
MDST      Modified Discrete Sine Transform
MLT       Modulated Lapped Transform
MOS       Mean Opinion Score
MPEG      Moving Picture Experts Group
MP3       MPEG-1 Layer III
PCM       Pulse Code Modulation
NMN       Noise-Masking-Noise
NMR       Noise-to-Masking Ratio
NMT       Noise-Masking-Tone
LOT       Lapped Orthogonal Transform
LT        Lapped Transform
QMF       Quadrature Mirror Filter
PE        Perceptual Entropy
PEAQ      Perceptual Evaluation of Audio Quality
PESQ      Perceptual Evaluation of Speech Quality
PR        Perfect Reconstruction
Pre-DST   Pre-filtered Discrete Sine Transform
SFM       Spectral Flatness Measure
SMR       Signal-to-Masking Ratio
SNR       Signal-to-Noise Ratio
SPL       Sound Pressure Level
TDAC      Time-Domain Aliasing Cancellation
TMN       Tone-Masking-Noise
TNS       Temporal Noise Shaping
VQ        Vector Quantization

Chapter 1

Introduction

1.1 Audio Coding Techniques

Audio coding algorithms are concerned with the digital representation of sound using information bits. A number of paradigms have been proposed for the digital compression of audio signals. Roughly, audio coders can be grouped as either parametric coders or waveform coders. The concept of perceptual audio coding is relevant in the latter case, where auditory perception characteristics are applicable [1].

1.1.1 Parametric Coders

Parametric coders represent the source of the signal with a few parameters. Such coders are suitable for speech signals