Audio Coding and Compression

Course: Networked Multimedia Systems
First-level university Master's programme in Design and Management of Network Systems
Carlo Drioli
Università degli Studi di Verona, Faculty of Mathematical, Physical and Natural Sciences, Department of Computer Science

Outline
- Introduction
- Perceptual coding
- LPC-based coding
- Audio compression standards

PCM coding of speech and audio signals

The simplest form of audio coding (PCM) involves:
- Anti-aliasing filtering
- Sampling
- Quantization

Typical PCM audio coding parameters

Parameters for mono audio and speech:

| Signal           | Frequency range (Hz) | Sampling rate (kHz) | PCM bits per sample | PCM bit rate (kb/s) |
|------------------|----------------------|---------------------|---------------------|---------------------|
| Telephone speech | 300-3400             | 8                   | 8                   | 64                  |
| Wideband speech  | 50-7000              | 16                  | 8                   | 128                 |
| Mediumband audio | 10-11000             | 24                  | 16                  | 384                 |
| Wideband audio   | 10-22000             | 48                  | 16                  | 768                 |

Audio coding and compression schemes

Approaches to improve coding efficiency include:
- Making assumptions about the nature of the source (especially for speech)
- Reducing the redundancy in the signals
- Exploiting perceptual limitations of the human auditory system (especially for generic audio, e.g. MPEG standards)

Auditory masking and perceptual coding

Critical bands and auditory masking:
- The basilar membrane provides a frequency-to-place transformation
- The auditory system has limited, frequency-dependent resolution
- Frequency dependency: the auditory system acts as a bandpass filter bank with nonuniform bandwidths (critical bands)

Auditory masking:
- Simultaneous masking: the auditory system blurs signal components within a critical band
- The noise-masking threshold at any given frequency depends only on the signal energy within a limited region around that frequency
- Low-level signal components below that threshold are not audible
- Temporal masking: may occur when two sounds appear within a short interval of time

Simultaneous masking:
- Signal components below the masking threshold are not audible
Source: (P. Noll 1997)

Temporal masking:
- Post-masking: on the order of 50 to 200 ms
- Pre-masking: below 10 ms
Source: (P. Noll 1997)

Perceptual coding: noise shaping

Dithering and noise shaping:
- Dither aims at reducing artifacts due to quantization (e.g., harmonic and intermodulation distortion)
- Additive dither: noise is added before quantization
- Dither with noise shaping: moves the wideband noise energy due to additive dither into less audible regions of the spectrum

Perceptual coding: companding

Companding:
- The companding operation compresses the dynamic range on encode and expands it on decode
- In audio coding, companding precedes the quantization step, improving the efficiency of the quantizer
- Used in standard speech coders (mu-law companding) and in wideband audio compression standards (e.g., MPEG)

LPC-based coders

Linear predictive coding (LPC):
- The signal is modeled as an AR (autoregressive) process
- LPC attempts to predict the present sample from past samples: $\hat{y}(k) = \sum_{i=1}^{N_p} p_i \, y(k-i)$
- The filter $\frac{1}{1-P(z)}$, with $P(z) = \sum_{i=1}^{N_p} p_i z^{-i}$, recovers $y(k)$ from the prediction error $e(k) = y(k) - \hat{y}(k)$
- LPC exploits the signal redundancies to reduce the transmission bit rate

LPC with open-loop quantization:
- The residual is quantized outside the prediction loop
- The states of the encoder and decoder predictor filters will not evolve in the same way
- Overall system SNR: $\mathrm{SNR} = \frac{\sigma^2}{E[(y-\tilde{y})^2]}$, where $\sigma^2$ is the signal variance
[Figure: encoder and decoder block diagram with quantizer Q and predictor P(z); signals y(k), e(k), ê(k), ŷ(k), ỹ(k).]

DPCM: closed-loop quantization:
- The residual is quantized inside the prediction loop (predictive quantization)
- The states of the encoder and decoder predictor filters will evolve in the same way
- $\frac{\sigma^2}{E[(y-\tilde{y})^2]} = \frac{\sigma^2}{E[(e-\hat{e})^2]}$: the effective resolution of the quantization is increased if the prediction error is small
- Overall system SNR: $\mathrm{SNR} = \frac{\sigma^2}{E[(e-\hat{e})^2]}$
[Figure: encoder and decoder block diagram with quantizer Q and predictor P(z) inside the loop; signals y(k), e(k), ê(k), ŷ(k), ỹ(k).]

Audio compression standards

Dolby Digital:
- Supports multichannel audio (mono, stereo, 5.1, surround)
- The codec uses lossy data compression
- Adopted for cinema soundtracks, DVD Video, digital TV, cable TV, etc.

MPEG/Audio:
- Part of a more general standard for audio and video coding
- Supports multichannel audio (mono, stereo, 5.1)
- The codec uses lossy data compression
- Adopted for data storage, DVD Video, digital TV, cable TV, etc.

MPEG Audio compression

MPEG/Audio features:
- A compression standard for generic audio
- Makes no assumptions about the source
- Exploits the perceptual limitations of the auditory system
- Offers three independent layers of compression
- The encoded bitstream supports CRC for error detection

MPEG/Audio compression layers:
- Layer I: low encoding complexity, bit rates above 128 kb/s
- Layer II: medium complexity, bit rates around 128 kb/s
- Layer III (MP3): high complexity, best audio quality, bit rates around 64 kb/s

MPEG/Audio codec scheme: overview
- Goal: minimize the audibility of quantization noise
- Approach: the bits available for quantization are dynamically allocated to the subband signals
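
To put the layer bit rates quoted above in perspective, the arithmetic behind the PCM parameter table can be combined with them to estimate compression ratios. The sketch below is only illustrative: the function names are made up for this example, the sampling rate is taken in kHz so that the product comes out in kb/s, and it assumes that the quoted layer bit rates are per-channel figures comparable with the mono "Wideband audio" row.

```python
def pcm_bit_rate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """PCM bit rate in kb/s: sampling rate (kHz) x bits per sample x channels."""
    return sampling_rate_khz * bits_per_sample * channels


def compression_ratio(pcm_kbps, coded_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


# Rows of the PCM parameter table: (sampling rate in kHz, bits per sample).
PCM_FORMATS = {
    "Telephone speech": (8, 8),
    "Wideband speech": (16, 8),
    "Mediumband audio": (24, 16),
    "Wideband audio": (48, 16),
}

# Approximate layer bit rates from the slides, in kb/s.
# Assumption: these are per-channel figures.
LAYER_RATES_KBPS = {"Layer II": 128, "Layer III": 64}

for name, (rate_khz, bits) in PCM_FORMATS.items():
    print(f"{name}: {pcm_bit_rate_kbps(rate_khz, bits)} kb/s")

wideband = pcm_bit_rate_kbps(*PCM_FORMATS["Wideband audio"])
for layer, coded in LAYER_RATES_KBPS.items():
    print(f"{layer}: about {compression_ratio(wideband, coded):.0f}:1")
```

Under these assumptions, 768 kb/s of wideband PCM audio maps to roughly 6:1 for Layer II and 12:1 for Layer III. Layer I is left out because the slides only give a lower bound for its bit rate.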

The filter bank stage

Polyphase filter bank:
- The filter bank divides the input into 32 equal-width frequency subbands
- The filter bank is critically sampled: 32 input samples produce 32 output samples
- The subbands have equal widths, which does not accurately reflect the human auditory system
- Filter bank analysis followed by synthesis is not a lossless process
- Adjacent filter bands overlap: a single frequency can excite two adjacent filter bank outputs

The psychoacoustic model

Principal steps of the psychoacoustic model:
- Compute a higher-resolution time-frequency mapping of the signal (e.g., FFT)
- Separate the spectral values into tonal and non-tonal components
- Apply a masking function
- Compute the masking threshold for each subband
- Compute the signal-to-mask ratio (SMR) in each subband (the ratio of the signal energy to the minimum masking threshold)

Example of masking threshold determination:
[Figure: masking components and masking thresholds.]
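
The last step of the psychoacoustic model, the per-subband signal-to-mask ratio, can be sketched as follows. This is a simplified illustration, not the procedure of the standard: it assumes the masking threshold per frequency bin is already available (tonal/non-tonal separation and the masking function are not modeled), that the bins divide evenly among the subbands, and that the subband signal energy is the sum of the bin powers. The function name and the sample values are hypothetical.

```python
import math


def subband_smr_db(power_db, threshold_db, n_subbands):
    """Signal-to-mask ratio (dB) per subband.

    power_db:     signal power per frequency bin, in dB
    threshold_db: masking threshold per frequency bin, in dB
    Assumes len(power_db) is a multiple of n_subbands.
    """
    if len(power_db) != len(threshold_db):
        raise ValueError("power_db and threshold_db must have the same length")
    width = len(power_db) // n_subbands
    smr = []
    for band in range(n_subbands):
        lo, hi = band * width, (band + 1) * width
        # Signal energy of the subband: sum of linear bin powers, back in dB.
        energy = sum(10 ** (p / 10) for p in power_db[lo:hi])
        signal_db = 10 * math.log10(energy)
        # A ratio in linear terms is a difference in dB; use the minimum threshold.
        smr.append(signal_db - min(threshold_db[lo:hi]))
    return smr


# Hypothetical data: 4 bins split into 2 subbands.
power_db = [60, 60, 20, 20]
threshold_db = [30, 40, 25, 35]
smr = subband_smr_db(power_db, threshold_db, n_subbands=2)
print([round(v, 2) for v in smr])
```

With these made-up values the first subband sits about 33 dB above its minimum masking threshold, while the second falls about 2 dB below it, so only the first would need quantization noise kept well under the signal.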
[Figure axes: level from −20 to 100 dB against frequency index from 0 to 250.]

Dynamic bit allocation

The bit allocation process determines the number of bits allocated to each subband.

Procedure:
- Compute the mask-to-noise ratio: $\mathrm{MNR}_{dB} = \mathrm{SNR}_{dB} - \mathrm{SMR}_{dB}$
- Allocate bits to the subband with the lowest MNR
- Re-estimate the SNR, recompute the subband's MNR, and repeat the allocation while bits remain available

Bitstream formatting

[Figure: typical frame format for the MPEG/Audio bitstream.]

MPEG-1

MPEG-1 Audio Layer 1:
- Processes data in groups of 12 samples from each subband
- Each group has a bit allocation and a scale factor
- The scale factor tells how to scale the samples so that they use the full range of the quantizer

MPEG-1 Audio Layer 2 (with respect to Layer 1):
- Similar to Layer 1, but with higher complexity and better performance
- The psychoacoustic model has finer frequency resolution
- Finer quantization
- Allows grouping of subband samples to
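
The Layer 1 scale factor idea can be illustrated with a small sketch. It is not the encoding defined by the standard: here the scale factor is assumed to be simply the peak magnitude of the group (a real encoder picks it from a predefined table), and the quantizer is a plain uniform one. The function names and sample values are hypothetical.

```python
def encode_group(samples, n_bits):
    """Scale a group of subband samples to full range and quantize them.

    Assumption: the scale factor is the group's peak magnitude.
    Requires n_bits >= 2 so that at least one non-zero level exists.
    """
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero
    levels = 2 ** (n_bits - 1) - 1               # largest code magnitude
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def decode_group(scale, codes, n_bits):
    """Undo the scaling applied by encode_group."""
    levels = 2 ** (n_bits - 1) - 1
    return [c / levels * scale for c in codes]


# One group of 12 low-level samples, as in Layer 1.
group = [0.01 * i for i in range(-6, 6)]
scale, codes = encode_group(group, n_bits=4)
decoded = decode_group(scale, codes, n_bits=4)
print(scale, codes)
```

Because the samples are divided by the scale factor first, the largest sample maps to the largest code and the quantization step shrinks with the signal level: the reconstruction error stays within half a step, `scale / levels / 2`, however small the group is.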

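The dynamic bit allocation procedure (MNR = SNR − SMR, serve the subband with the lowest MNR, repeat while bits remain) can be sketched as a greedy loop. Several simplifications are assumed here and are not taken from the slides: the SNR of a quantizer is approximated as 6.02 dB per bit instead of being read from tables, each extra bit of resolution costs one bit for each of the 12 samples in a group, and side information such as scale factors is ignored. The function name and the SMR values are hypothetical.

```python
def allocate_bits(smr_db, total_bits, samples_per_band=12,
                  max_bits=15, snr_per_bit_db=6.02):
    """Greedy bit allocation across subbands.

    smr_db:     signal-to-mask ratio of each subband, in dB
    total_bits: bit budget available for the subband samples
    Returns the number of bits per sample assigned to each subband.
    """
    bits = [0] * len(smr_db)
    remaining = total_bits
    while remaining >= samples_per_band:
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not candidates:
            break
        # MNR = SNR - SMR, with SNR approximated as 6.02 dB per bit (assumption).
        mnr = {b: snr_per_bit_db * bits[b] - smr_db[b] for b in candidates}
        worst = min(mnr, key=mnr.get)  # subband with the lowest MNR
        bits[worst] += 1
        remaining -= samples_per_band
    return bits


# Hypothetical SMR values for three subbands and a 60-bit budget.
print(allocate_bits([20.0, 5.0, -10.0], total_bits=60))
```

With these values the loop runs five times and ends at `[4, 1, 0]`: the subband whose signal is far above its masking threshold receives most of the bits, and the one already below its threshold receives none.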