Scalable and Perceptual Audio Compression
University of Wollongong
Research Online

University of Wollongong Thesis Collection 1954-2016
University of Wollongong Thesis Collections

2003

Scalable and perceptual audio compression

Mohammed Raad
University of Wollongong, [email protected]

Follow this and additional works at: https://ro.uow.edu.au/theses

University of Wollongong Copyright Warning

You may print or download ONE copy of this document for the purpose of your own research or study. The University does not authorise you to copy, communicate or otherwise make available electronically to any other person any copyright material contained on this site.

You are reminded of the following: This work is copyright. Apart from any use permitted under the Copyright Act 1968, no part of this work may be reproduced by any process, nor may any other exclusive right be exercised, without the permission of the author. Copyright owners are entitled to take legal action against persons who infringe their copyright. A reproduction of material that is protected by copyright may be a copyright infringement. A court may impose penalties and award damages in relation to offences and infringements relating to copyright material. Higher penalties may apply, and higher damages may be awarded, for offences and infringements involving the conversion of material into digital or electronic form.

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.

Recommended Citation
Raad, Mohammed, Scalable and perceptual audio compression, Doctor of Philosophy thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2003. https://ro.uow.edu.au/theses/1957

Research Online is the open access institutional repository for the University of Wollongong.
For further information contact the UOW Library: [email protected]

NOTE
This online version of the thesis may have different page formatting and pagination from the paper copy held in the University of Wollongong Library.

SCALABLE AND PERCEPTUAL AUDIO COMPRESSION

By Mohammed Raad

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY AT UNIVERSITY OF WOLLONGONG, NORTHFIELDS AVE, WOLLONGONG NSW 2522, AUSTRALIA
FEBRUARY 2003

© Copyright by Mohammed Raad, 2003

To all those who value, seek and apply knowledge for the true benefit of mankind.

Table of Contents

Table of Contents
List of Tables
List of Figures
Abstract
Statement of Originality
Acknowledgements
List of Acronyms

1 Introduction
  1.1 Perceptually scalable audio compression
  1.2 Scalable to lossless audio compression
  1.3 Contributions
  1.4 Publications

2 DSP and psychoacoustic concepts in audio compression
  2.1 Introduction
  2.2 Audio signal processing
    2.2.1 Transforms and transform theory
    2.2.2 Quantization
    2.2.3 Audio signal models
  2.3 Perception and perceptual entropy
    2.3.1 The mechanics of sound perception
    2.3.2 Modelling the human ear
    2.3.3 Masking
    2.3.4 Calculating the masking threshold
  2.4 Sensory pleasantness and its contributing factors
  2.5 Audio quality assessment
    2.5.1 Subjective quality assessment
    2.5.2 Objective quality assessment
  2.6 Summary and conclusion

3 A Review of Perceptual Audio Compression
  3.1 Introduction
  3.2 Transform and sub-band audio compression basics
    3.2.1 Frame selection
    3.2.2 The transform
    3.2.3 The perceptual model and quantization
  3.3 Parametric audio compression basics
  3.4 Scalable audio compression - why and how?
  3.5 Lossless audio compression - why and how?
  3.6 Audio compression standards
    3.6.1 MPEG-1
    3.6.2 MPEG-2
    3.6.3 MPEG-4
  3.7 The Dolby AC series of coders
  3.8 AT&T Perceptual Audio Coding (PAC)
  3.9 Sony’s ATRAC
  3.10 Other transform and sub-band coders
  3.11 Parametric audio coders
  3.12 Scalable audio coders
  3.13 Lossless audio coding
  3.14 Discussion and conclusions

4 Perceptually Scalable Sinusoidal Compression of Audio
  4.1 Introduction
  4.2 Sinusoidal Compression of Speech and Audio
    4.2.1 The Short Time Fourier Transform (STFT) Approach
    4.2.2 The Analysis By synthesis Approach (A-by-S)
  4.3 A new scalable sinusoidal coder
    4.3.1 Motivation
    4.3.2 System Overview
    4.3.3 The Perceptual Model
    4.3.4 Sorting the Parameters
    4.3.5 Frequency Domain Gains
    4.3.6 Quantization
  4.4 Results
    4.4.1 Variable rate coder objective results
    4.4.2 Scalable rate coder objective results
    4.4.3 Subjective scalable results
  4.5 Discussion and conclusion

5 Set Partitioning In Hierarchical Trees (SPIHT) For Audio Coding
  5.1 Introduction
  5.2 A study of SPIHT with current audio coding transforms
    5.2.1 Set Partitioning In Hierarchical Trees
    5.2.2 Audio compression using wavelets and SPIHT
    5.2.3 The M-Band Transform Based Codec
    5.2.4 M-Band Transform Coding Results
    5.2.5 Discussion of the M-Band Transform coding results
  5.3 Improving the compression ratio of the M-Band SPIHT coder
    5.3.1 The use of masking
    5.3.2