High-Fidelity Multichannel Audio Coding (EURASIP Book Series on Signal Processing and Communications)

Total Pages: 16

File Type: PDF, Size: 1020 KB

High-Fidelity Multichannel Audio Coding
Dai Tracy Yang, Chris Kyriakakis, and C.-C. Jay Kuo

EURASIP Book Series on Signal Processing and Communications, Volume 2
Editor-in-Chief: K. J. Ray Liu
Editorial Board: Zhi Ding, Moncef Gabbouj, Peter Grant, Ferran Marqués, Marc Moonen, Hideaki Sakai, Giovanni Sicuranza, Bob Stewart, and Sergios Theodoridis

Hindawi Publishing Corporation
410 Park Avenue, 15th Floor, #287 pmb, New York, NY 10022, USA
Nasr City Free Zone, Cairo 11816, Egypt
Fax: +1-866-HINDAWI (USA toll-free)
http://www.hindawi.com

© 2006 Hindawi Publishing Corporation. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without written permission from the publisher.

Cover Image: Mehau Kulyk/Science Photo Library
ISBN 977-5945-24-0

Dedication

To Ruhua, Joshua, Junhui, and Zongduo - Dai Tracy Yang
To Wee Ling, Anthony, and Alexandra - Chris Kyriakakis
To Terri and Allison - C.-C. Jay Kuo

Preface

Audio is one of the fundamental elements in multimedia signals. Audio signal processing has attracted attention from researchers and engineers for several decades. By exploiting unique features of audio signals and common features of all multimedia signals, researchers and engineers have been able to develop more efficient technologies to compress audio data.
Although books on digital audio have been available for some time, the subject of multichannel audio coding techniques has not yet been addressed in great detail. With many years of teaching and research in the field of digital audio signal processing and digital audio compression, we see a need for an advanced audio coding book that covers recent developments in this field.

When we started this book project, we had a smaller scope. Our objective was to present several innovative compression techniques for multichannel audio sources and publish them as a research monograph. However, after the first draft, we received valuable comments from our colleagues and anonymous reviewers. With their encouragement, we decided to extend the coverage of the book by including more background material to make it a senior undergraduate or graduate-level textbook on advanced audio coding techniques. Special thanks also go to Dr. Hongmei Ai for her valuable discussions and suggestions when we developed and tested our new audio coding algorithms.

This book includes three parts. The first part covers the basic topics in audio compression, such as quantization, entropy coding, psychoacoustic models, and sound quality assessment. The second part highlights the currently most prevalent low-bit-rate, high-performance audio coding standard, MPEG-4 Audio. More emphasis is given to the audio standards that are capable of supporting multichannel signals, that is, MPEG Advanced Audio Coding (AAC), including the original MPEG-2 AAC specification, additional MPEG-4 toolsets, and the most recent aacPlus standard. The third part introduces several innovative multichannel audio coding methods, which can further improve the coding performance and expand the available functionalities of MPEG AAC. This part is more suitable for graduate students and researchers.

Dai Tracy Yang, Chris Kyriakakis, and C.-C. Jay Kuo
Los Angeles, CA
August 17, 2005

Contents

Dedication
Preface
1. Introduction to digital audio
  1.1. Digital audio coding
    1.1.1. Representing digital audio signals
    1.1.2. Building blocks of digital audio codecs
    1.1.3. Lossy compression and lossless compression
  1.2. Fundamentals of digital signal processing
    1.2.1. Fourier transform
    1.2.2. Sampling operation
    1.2.3. Sampling theorem and aliasing
  1.3. Multichannel audio
    1.3.1. Perceptual cues
    1.3.2. Surround sound
    1.3.3. Surround sound standards
    1.3.4. A future surround sound system
  1.4. Outline of this book
2. Quantization
  2.1. Scalar quantization
    2.1.1. Uniform quantization
    2.1.2. Nonuniform quantization
  2.2. Vector quantization
    2.2.1. Nearest-neighbor quantizers
    2.2.2. Optimality of vector quantizers
    2.2.3. Vector quantizer design
  2.3. Bit allocation
    2.3.1. Problem of bit allocation
    2.3.2. Optimal bit allocation results
3. Entropy coding
  3.1. Introduction to information theory
  3.2. Huffman coding
    3.2.1. Huffman coding algorithm
    3.2.2. Variance of Huffman codes
    3.2.3. Huffman decoding
    3.2.4. Adaptive Huffman coding
  3.3. Arithmetic coding
    3.3.1. Arithmetic coding algorithm
    3.3.2. Implementation issues
    3.3.3. Solving underflow problem
    3.3.4. Adaptive arithmetic coding
  3.4. QM coding
    3.4.1. QM encoder
    3.4.2. QM decoder
    3.4.3. Probability estimation
4. Introduction to psychoacoustics
  4.1. Perception of loudness
  4.2. Masking
    4.2.1. Frequency masking
    4.2.2. Temporal masking
    4.2.3. Interaural masking
5. Subjective evaluation of audio codecs
  5.1. Introduction
  5.2. Listening environment specifications
  5.3. Testing methodology
  5.4. Data analysis after subjective listening tests
    5.4.1. Mean
    5.4.2. Variance
    5.4.3. Standard deviation
    5.4.4. Standard error of the mean
    5.4.5. Confidence interval
6. MPEG-4 audio coding tools
  6.1. Introduction to MPEG-4 audio
  6.2. MPEG-4 audio tools
    6.2.1. MPEG-4 natural sound coding tools
    6.2.2. MPEG-4 audio synthesis tools
7. MPEG advanced audio coding
  7.1. Introduction to advanced audio coding
  7.2. MPEG-2 AAC
    7.2.1. Overview of MPEG-2 AAC
    7.2.2. Psychoacoustic model
    7.2.3. Gain control
    7.2.4. Transform
    7.2.5. Spectral processing
    7.2.6. Quantization
    7.2.7. Entropy coding
  7.3. New features in MPEG-4 AAC
    7.3.1. Perceptual noise substitution
    7.3.2. Long-term prediction
    7.3.3. TwinVQ
    7.3.4. Low-delay AAC
    7.3.5. Error-resilient tools
    7.3.6. MPEG-4 scalable audio coding tools
  7.4. MPEG-4 high-efficiency AAC
    7.4.1. Background of SBR technology
    7.4.2. Basic principle of SBR technology
    7.4.3. More technical details on high-efficiency AAC
8. Introduction to new audio coding tools
  8.1. Motivation and overview
    8.1.1. Redundancy inherent in multichannel audio
    8.1.2. Quality-scalable single compressed bitstream
    8.1.3. Embedded multichannel audio bitstream
    8.1.4. Error-resilient scalable audio bitstream
  8.2. Audio coding improvements
    8.2.1. Interchannel redundancy removal approach
    8.2.2. Audio concealment and channel transmission strategy for heterogeneous network
    8.2.3. Quantization efficiency for adaptive Karhunen-Loève transform
    8.2.4. Progressive syntax-rich multichannel audio codec design
    8.2.5. Error-resilient scalable audio coding
9. Interchannel redundancy removal and channel-scalable decoding
  9.1. Introduction
  9.2. Interchannel redundancy removal
    9.2.1. Karhunen-Loève transform
    9.2.2. Evidence for interchannel decorrelation
    9.2.3. Energy compaction effect
    9.2.4. Frequency-domain versus time-domain KLT
  9.3. Temporal adaptive KLT
  9.4. Eigen-channel coding and transmission
    9.4.1. Eigen-channel coding
    9.4.2. Eigen-channel transmission
  9.5. Audio concealment for channel-scalable decoding
  9.6. Compression system overview
  9.7. Complexity analysis
  9.8. Experimental results
    9.8.1. Multichannel audio coding
    9.8.2. Audio concealment with channel-scalable coding
    9.8.3. Subjective listening test
  9.9. Conclusion
  9.10. Appendix: Karhunen-Loève expansion
    9.10.1. Definition
    9.10.2. Features and properties
10. Adaptive Karhunen-Loève transform and its quantization efficiency
  10.1. Introduction
  10.2. Vector quantization
  10.3. Efficiency of KLT decorrelation
  10.4. Temporal adaptation effect
  10.5. Complexity analysis
  10.6. Experimental results
  10.7. Conclusion
11. Progressive syntax-rich multichannel audio codec
  11.1. Introduction
  11.2. Progressive syntax-rich codec design
  11.3. Scalable quantization and entropy coding
    11.3.1. Successive approximation quantization
    11.3.2. Context-based QM coder
  11.4. Channel and subband transmission strategy
    11.4.1. Channel selection rule
    11.4.2. Subband selection rule
  11.5. Implementation issues
    11.5.1. Frame, subband, or channel skipping
    11.5.2. Determination of the MNR threshold
  11.6. Complete description of PSMAC codec
  11.7. Experimental results
    11.7.1. Results using MNR measurement
    11.7.2. Subjective listening tests
  11.8. Conclusions
12. Error-resilient scalable audio codec design
  12.1. Introduction
  12.2. WCDMA characteristics
  12.3. Layered coding structure
    12.3.1.
Recommended publications
  • Lossless Audio Codec Comparison
    Contents: Introduction; 1 CD-audio test (1.1 CDs used; 1.2 Results all CDs together; 1.3 Interesting quirks: 1.3.1 Mono encoded as stereo (Dan Brown's Angels and Demons), 1.3.2 Compressibility; 1.4 Convergence of the results); 2 High-resolution audio (2.1 Nine Inch Nails' The Slip; 2.2 Howard Shore's soundtrack for The Lord of the Rings: The Return of the King; 2.3 Wasted bits); 3 Multichannel audio (3.1 Howard Shore's soundtrack for The Lord of the Rings: The Return of the King); A Motivation for choosing these CDs; B Test setup (B.1 Scripting and graphing; B.2 Codecs and parameters used; B.3 MD5 checksumming); C Revision history; Bibliography.
    Introduction: While testing the efficiency of lossy codecs can be quite cumbersome (as results differ for each person), comparing lossless codecs is much easier. As the last well-documented and comprehensive test available on the internet was a few years ago, I thought it would be a good idea to update. Besides comparing with CD-audio (which is often done to assess codec performance) and spitting out a grand total, this comparison also looks at extremes that occurred during the test and takes a look at 'high-resolution audio' and multichannel/surround audio. While the comparison was made to update the comparison page on the FLAC website, it aims to be fair and unbiased.
  • DV-983H 1080P Up-Converting Universal DVD Player with VRS by Anchor Bay Video Processing and 7.1CH Audio
    DV-983H is the new flagship model in OPPO's line of award-winning up-converting DVD players. Featuring Anchor Bay's leading video processing technologies, 7.1-channel audio, and 1080p HDMI up-conversion, the DV-983H Universal DVD Player delivers the breathtaking audio and video performance needed to make standard DVDs look their best on today's large-screen, high-resolution displays. The DV-983H provides a rich array of features for serious home theater enthusiasts. By applying source-adaptive, motion-adaptive, and edge-adaptive techniques, the DV-983H produces an outstanding image for any DVD, whether it's mastered from an original theatrical release film or from a TV series. Aspect ratio conversion and multi-level zooming enable users to take full control of the viewing experience: maintain the original aspect ratio, stretch to full screen, or crop the unsightly black borders. Special stretch modes make it possible to utilize the full resolution of ultra high-end projectors with an anamorphic lens. For users with an international taste, the frame rate conversion feature converts PAL movies for NTSC output without any loss of resolution or tearing. Custom home theater installers will find the DV-983H easy to integrate into whole-house control systems, thanks to its RS-232 and IR IN/OUT control ports. To complete the home theater experience, the DV-983H produces stunning sound quality. Its 7.1-channel audio with Dolby Digital Surround EX decoding offers more depth, spacious ambience, and sound localization.
  • Ardour Export Redesign
    Ardour Export Redesign. Thorsten Wilms [email protected] Revision 2, 2007-07-17.
    Table of Contents: 1 Introduction; 2 Insights From a Survey (2.1 Export When?; 2.2 Channel Count; 2.3 Requested File Types; 2.4 Sample Formats and Rates in Use; 2.5 Wish List: 2.5.1 More than one format at once, 2.5.2 Files per Track / Bus, 2.5.3 Optionally store timestamps; 2.6 General Problems); 3 Feature Requests (3.1 Multichannel; 3.2 Individual Files; 3.3 Realtime Export; 3.4 Range and File Export History; 3.5 Running a Script; 3.6 Export Markers as Text); 4 The Current Dialog (4.1 Time Span Selection; 4.2 Ranges; 4.3 File vs Directory Selection; 4.4 Container Types; 4.5 Endianness; 4.6 Channel Count; 4.7 Mapping Channels; 4.8 CD Marker Files; 4.9 Trimming; 4.10 Filename Conflicts; 4.11 Peaks; 4.12 Blocking JACK; 4.13 Does it have to be a dialog?); 5 Track Export; 6 MIDI; 7 Steps After Exporting (7.1 Normalize; 7.2 Trim silence; 7.3 Encode; 7.4 Tag; 7.5 Upload; 7.6 Burn CD / DVD; 7.7 Backup / Archiving; 7.8 Authoring); 8 Container Formats (8.1 libsndfile, currently offered for Export; 8.2 libsndfile, also interesting; 8.3 libsndfile, rather exotic; 8.4 Interesting: 8.4.1 BWF (Broadcast Wave Format), 8.4.2 Matroska; 8.5 Problematic; 8.6 Not of further interest; 8.7 Check (Todo)); 9 Encodings (9.1 Libsndfile supported; 9.2 Interesting; 9.3 Problematic; 9.4 Not of further interest); 10 Container / Encoding Combinations; 11 Elements (11.1 Input; 11.2 Output); 12 Specification (12.1 Core; 12.2 Layout; 12.3 Presets; 12.4 Speed; 12.5 Time span; 12.6 CD Marker Files; 12.7 Mapping; 12.8 Processing; 12.9 Container and Encodings; 12.10 Target Folder; 12.11 Filenames; 12.12 Multiplication; 12.13 Left out); 13 Credits; 14 Todo.
    1 Introduction: The basic purpose of Ardour's export functionality is to create mixdowns of multitrack arrangements.
    2 Insights From a Survey: I conducted a quick survey on the Linux Audio Users …
  • Installation Manual, Document Number 200-800-0002 Or Later Approved Revision, Is Followed
    9800 Martel Road, Lenoir City, TN 37772. PAV80 High-fidelity Audio-Video In-Flight Entertainment System With DVD/MP3/CD Player and Radio Receiver. STC-PMA. Document P/N 200-800-0101, Revision 6, September 2005. Installation and Operation Manual. Warranty is not valid unless this product is installed by an Authorized PS Engineering dealer or if a PS Engineering harness is purchased. © 2005 PS Engineering, Inc. Copyright Notice: Any reproduction or retransmittal of this publication, or any portion thereof, without the expressed written permission of PS Engineering, Inc. is strictly prohibited. For further information contact the Publications Manager at PS Engineering, Inc., 9800 Martel Road, Lenoir City, TN 37772. Phone (865) 988-9800.
    Table of Contents: SECTION I GENERAL INFORMATION (1.1 INTRODUCTION; 1.2 SCOPE; 1.3 EQUIPMENT DESCRIPTION; 1.4 APPROVAL BASIS (PENDING); 1.5 SPECIFICATIONS; 1.6 EQUIPMENT SUPPLIED)
  • Real-Time Programming and Processing of Music Signals Arshia Cont
    Real-time Programming and Processing of Music Signals. Arshia Cont.
    To cite this version: Arshia Cont. Real-time Programming and Processing of Music Signals. Sound [cs.SD]. Université Pierre et Marie Curie - Paris VI, 2013. tel-00829771. HAL Id: tel-00829771, https://tel.archives-ouvertes.fr/tel-00829771. Submitted on 3 Jun 2013.
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
    Realtime Programming & Processing of Music Signals, by Arshia Cont. Ircam-CNRS-UPMC Mixed Research Unit, MuTant Team-Project (INRIA), Musical Representations Team, Ircam-Centre Pompidou, 1 Place Igor Stravinsky, 75004 Paris, France. Habilitation à diriger la recherche. Defended on May 30th in front of the jury composed of: Gérard Berry (Collège de France, Professor), Roger Dannenberg (Carnegie Mellon University, Professor), Carlos Agon (UPMC - Ircam, Professor), François Pachet (Sony CSL, Senior Researcher), Miller Puckette (UCSD, Professor), Marco Stroppa (Composer).
    Dedication: To Marie, the salt of my life.
    Contents: 1. Introduction (1.1. Synthetic Summary; 1.2. Publication List 2007-2012; 1.3. Research Advising Summary); 2. Realtime Machine Listening (2.1. Automatic Transcription; 2.2. Automatic Alignment; 2.2.1.
  • Lossless Audio Codec Comparison
    Contents: Introduction; 1 Test setup (1.1 Scripting and graphing; 1.2 Codecs and parameters used; 1.3 WMA, RealAudio and ALAC); 2 CD-audio test (2.1 CDs used; 2.2 Results all CDs together; 2.3 Interesting quirks: 2.3.1 Mono encoded as stereo (Dan Brown's Angels and Demons); 2.4 Convergence of the results); 3 High-resolution audio (3.1 Nine Inch Nails' The Slip; 3.2 Howard Shore's soundtrack for The Lord of the Rings: The Return of the King; 3.3 Wasted bits); 4 Multichannel audio (4.1 Howard Shore's soundtrack for The Lord of the Rings: The Return of the King); A Motivation for choosing these CDs; Bibliography.
    Introduction: While testing the efficiency of lossy codecs can be quite cumbersome (as results differ for each person), comparing lossless codecs is much easier. As the last well-documented and comprehensive test available on the internet was a few years ago, I thought it would be a good idea to update. Besides comparing with CD-audio (which is often done to assess codec performance) and spitting out a grand total, this comparison also looks at extremes that occurred during the test and takes a look at 'high-resolution audio' and multichannel/surround audio. While the comparison was made to update the comparison page on the FLAC website, it aims to be fair and unbiased. Because of this, you probably won't find anything that looks like conclusions: test results are displayed and analysed, but there is no judgement or choice made.
  • (A/V Codecs) REDCODE RAW (.R3D) ARRIRAW
    What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.
  • Name Synopsis Description
    SHNTOOL(1) local SHNTOOL(1)
    NAME
    shntool - a multi-purpose WAVE data processing and reporting utility
    SYNOPSIS
    shntool mode ...
    shntool [CORE OPTION]
    DESCRIPTION
    shntool is a command-line utility to view and/or modify WAVE data and properties. It runs in several different operating modes, and supports various lossless audio formats. shntool is comprised of three parts: its core, mode modules, and format modules. This helps to make the code easier to maintain, as well as aid other programmers in developing new functionality. The distribution archive contains a file named 'modules.howto' that describes how to create a new mode or format module, for those so inclined.
    Mode modules
    shntool performs various functions on WAVE data through the use of mode modules. The core of shntool is simply a wrapper around the mode modules. In fact, when shntool is run with a valid mode as its first argument, it essentially runs the main procedure for the specified mode, and quits. shntool comes with several built-in modes, described below:
    len    Displays length, size and properties of PCM WAVE data
    fix    Fixes sector-boundary problems with CD-quality PCM WAVE data
    hash   Computes the MD5 or SHA1 fingerprint of PCM WAVE data
    pad    Pads CD-quality files not aligned on sector boundaries with silence
    join   Joins PCM WAVE data from multiple files into one
    split  Splits PCM WAVE data from one file into multiple files
    cat    Writes PCM WAVE data from one or more files to the terminal
    cmp    Compares PCM WAVE data in two files
    cue    Generates a CUE sheet or split points from a set of files
    conv   Converts files from one format to another
    info   Displays detailed information about PCM WAVE data
    strip  Strips extra RIFF chunks and/or writes canonical headers
    gen    Generates CD-quality PCM WAVE data files containing silence
    trim   Trims PCM WAVE silence from the ends of files
    For more information on the meaning of the various command-line options for each mode, see the MODE-SPECIFIC OPTIONS section below.
  • 21065L Audio Tutorial
    Using The Low-Cost, High Performance ADSP-21065L Digital Signal Processor For Digital Audio Applications. Revision 1.0 - 12/4/98. Dan Ledger and John Tomarakos, DSP Applications Group, Analog Devices, Norwood, MA 02062, USA.
    This document examines desirable DSP features to consider for implementation of real-time audio applications, and also offers programming techniques to create DSP algorithms found in today's professional and consumer audio equipment. Part One will begin with a discussion of important audio processor-specific characteristics such as speed, cost, data word length, floating-point vs. fixed-point arithmetic, double-precision vs. single-precision data, I/O capabilities, and dynamic range/SNR capabilities. Comparisons between DSPs and audio decoders that are targeted for consumer/professional audio applications will be shown. Part Two will cover example algorithmic building blocks that can be used to implement many DSP audio algorithms using the ADSP-21065L, including basic audio signal manipulation, filtering/digital parametric equalization, digital audio effects, and sound synthesis techniques.
    TABLE OF CONTENTS: 0. INTRODUCTION; 1.
  • Lossless Compression of Audio Data
    CHAPTER 12. Lossless Compression of Audio Data. ROBERT C. MAHER.
    OVERVIEW: Lossless data compression of digital audio signals is useful when it is necessary to minimize the storage space or transmission bandwidth of audio data while still maintaining archival quality. Available techniques for lossless audio compression, or lossless audio packing, generally employ an adaptive waveform predictor with a variable-rate entropy coding of the residual, such as Huffman or Golomb-Rice coding. The amount of data compression can vary considerably from one audio waveform to another, but ratios of less than 3 are typical. Several freeware, shareware, and proprietary commercial lossless audio packing programs are available.
    12.1 INTRODUCTION: The Internet is increasingly being used as a means to deliver audio content to end-users for entertainment, education, and commerce. It is clearly advantageous to minimize the time required to download an audio data file and the storage capacity required to hold it. Moreover, the expectations of end-users with regard to signal quality, number of audio channels, meta-data such as song lyrics, and similar additional features provide incentives to compress the audio data.
    12.1.1 Background: In the past decade there have been significant breakthroughs in audio data compression using lossy perceptual coding [1]. These techniques lower the bit rate required to represent the signal by establishing perceptual error criteria, meaning that a model of human hearing perception is used to guide the elimination of excess bits that can be either reconstructed (redundancy in the signal) or ignored (inaudible components in the signal).
    Copyright 2003, Elsevier Science (USA). All rights reserved.
  • Codec Is a Portmanteau of Either
    What is a Codec? Codec is a portmanteau of either "Compressor-Decompressor" or "Coder-Decoder," which describes a device or program capable of performing transformations on a data stream or signal. Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media solutions. A video codec converts analog video signals from a video camera into digital signals for transmission. It then converts the digital signals back to analog for display. An audio codec converts analog audio signals from a microphone into digital signals for transmission. It then converts the digital signals back to analog for playing. The raw encoded form of audio and video data is often called essence, to distinguish it from the metadata information that together make up the information content of the stream and any "wrapper" data that is then added to aid access to or improve the robustness of the stream. Most codecs are lossy, in order to get a reasonably small file size. There are lossless codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much. Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronization of the audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.
  • Introduction to DVD Carol Cini, U.S
    Proceedings of the 8th Annual Federal Depository Library Conference, April 12 - 15, 1999. Introduction to DVD. Carol Cini, U.S. Government Printing Office, Washington, DC.
    DVD started out standing for Digital Video Disc, then Digital Versatile Disc, and now it's just plain old DVD. It is essentially a bigger and faster CD that is being promoted for entertainment purposes (movies) and some computer applications. It will eventually replace audio CDs, VHS and Beta tapes, laserdiscs, CD-ROMs, and video game cartridges as more hardware and software manufacturers support this new technology. DVDs and CDs look alike. A CD is a single solid injection-molded piece of polycarbonate plastic that has a layer of metal to reflect data to a laser reader and a coat of clear laminate for protection. A DVD is the same size as a CD but consists of two solid injection-molded pieces of plastic bonded together. Like CDs, DVDs have a metalized layer (which requires a special metalization process) and are coated with clear laminate. Unlike CDs, DVDs can have two layers per side and have 4 times as many "pits" and "lands" as a CD. There are various types of DVD, including DVD-ROM, DVD-Video, DVD-Audio, DVD-R, and DVD-RAM. The specifications for these DVDs are as follows. For prerecorded DVDs: Book A, DVD-ROM; Book B, DVD-Video; Book C, DVD-Audio. For recordable DVDs: Book D, DVD-R; Book E, DVD-RAM. The official DVD specification books are available from Toshiba after signing a nondisclosure agreement and paying a $5,000 fee.
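The "Lossless Compression of Audio Data" entry above describes lossless audio packing as a waveform predictor followed by variable-rate entropy coding of the residual, with Golomb-Rice coding named as one option. The sketch below illustrates that two-stage idea in Python under simplifying assumptions: the predictor is a fixed first-order one (each sample is predicted to equal the previous one) rather than the adaptive predictor the excerpt mentions, the Rice parameter `k` is chosen by hand, and the bitstream is a plain string of '0' and '1' characters. All function names are hypothetical and are not taken from any codec.

```python
from typing import List


def predict_residuals(samples: List[int]) -> List[int]:
    """Fixed first-order predictor (an assumption): residual = sample - previous sample."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals


def reconstruct(residuals: List[int]) -> List[int]:
    """Invert the predictor by accumulating the residuals."""
    prev = 0
    samples = []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples


def zigzag(v: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def rice_encode(values: List[int], k: int) -> str:
    """Rice code: quotient (u >> k) in unary, then the k low bits in binary."""
    parts = []
    for v in values:
        u = zigzag(v)
        parts.append("1" * (u >> k) + "0")          # unary quotient, 0-terminated
        if k:
            parts.append(format(u & ((1 << k) - 1), f"0{k}b"))  # k-bit remainder
    return "".join(parts)


def rice_decode(bits: str, k: int, count: int) -> List[int]:
    """Decode `count` Rice-coded values from a string of '0'/'1' characters."""
    values = []
    pos = 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                                     # skip the terminating 0
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values


if __name__ == "__main__":
    samples = [0, 3, 7, 12, 14, 13, 9, 4]            # made-up, slowly varying waveform
    residuals = predict_residuals(samples)
    bits = rice_encode(residuals, k=2)
    decoded = reconstruct(rice_decode(bits, k=2, count=len(residuals)))
    assert decoded == samples                        # lossless round trip
    print(len(bits), "bits for", len(samples), "samples")
```

The design point this is meant to show is that prediction turns a slowly varying waveform into small residuals, and a Rice code spends few bits on small values. A practical coder would adapt both the predictor and `k` to the signal, which this sketch does not attempt.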