Audio Coding and Compression

Networked Multimedia Systems course
First-Level University Master's in Design and Management of Network Systems
Carlo Drioli
Università degli Studi di Verona, Faculty of Mathematical, Physical and Natural Sciences, Department of Computer Science

Outline
- Introduction
- Perceptual coding
- LPC-based coding
- Audio Compression Standards

PCM coding of speech and audio signals

The simplest audio coding (PCM) involves:
- Anti-aliasing filtering
- Sampling
- Quantization

Typical PCM audio coding parameters (mono audio and speech)

| Signal | Frequency range (Hz) | Sampling rate (kHz) | PCM bits per sample | PCM bit rate (kb/s) |
|---|---|---|---|---|
| Telephone speech | 300-3400 | 8 | 8 | 64 |
| Wideband speech | 50-7000 | 16 | 8 | 128 |
| Mediumband audio | 10-11000 | 24 | 16 | 384 |
| Wideband audio | 10-22000 | 48 | 16 | 768 |

Audio coding and compression schemes

Approaches to improve coding efficiency include:
- Making assumptions about the nature of the source (esp. for speech)
- Reducing the redundancy in the signals
- Exploiting perceptual limitations of the human auditory system (esp. for generic audio, e.g.
  MPEG standards)

Auditory masking and perceptual coding

Critical bands and auditory masking
- The basilar membrane provides a frequency-to-place transformation
- The auditory system has limited, frequency-dependent resolution
- Frequency dependency: the auditory system acts as a bandpass filter bank with nonuniform bandwidths (critical bands)

Auditory masking
- Simultaneous masking: the auditory system blurs signal components within a critical band
- The noise-masking threshold at any given frequency depends only on the signal energy within a limited region around that frequency
- Low-level signal components below that threshold will not be audible
- Temporal masking: may occur when two sounds appear within a short interval of time

Simultaneous masking
- Signal components below the masking threshold are not audible
Source: (P. Noll 1997)

Temporal masking
- Post-masking: on the order of 50 to 200 ms
- Pre-masking: below 10 ms
Source: (P.
Noll 1997)

Perceptual coding: noise shaping

Dithering and noise shaping
- Dither aims at reducing artifacts due to quantization (e.g., harmonic and intermodulation distortion)
- Additive dither: add noise before quantization
- Dither with noise shaping: moves the wideband noise energy due to additive dither into less audible regions of the spectrum

Perceptual coding: companding

Companding
- The companding operation compresses the dynamic range on encode and expands it on decode
- In audio coding, companding precedes the quantization step, improving the efficiency of the quantizer
- Used in standard speech coders (mu-law companding) and in wideband audio compression standards (e.g., MPEG)

LPC-based coders

Linear predictive coding (LPC)
- The signal is modeled as an AR process
- LPC attempts to predict the present sample from past samples: $\hat{y}(k) = \sum_{i=1}^{N_p} p_i \, y(k-i)$
- The filter $\frac{1}{1 - P(z)}$ recovers $y(k)$ from the prediction error $e(k) = y(k) - \hat{y}(k)$
- LPC exploits the signal redundancies to reduce the transmission bit rate

LPC with open-loop quantization
- Quantization of the residual outside the prediction loop
- The states of the predictor filters (encoder and decoder) will not evolve in the same way
- Overall system SNR: $\mathrm{SNR} = \frac{\sigma_y^2}{E[(y - \tilde{y})^2]}$

[Block diagram: open-loop encoder and decoder, built from the quantizer $Q$ and the predictor $P(z)$, with signals $y(k)$, $e(k)$, $\hat{e}(k)$, $\hat{y}(k)$, $\tilde{y}'(k)$ and $\tilde{y}(k)$.]

DPCM: closed-loop quantization
- Quantization of the residual in the prediction
  loop (predictive quantization)
- The states of the predictor filters (encoder and decoder) will evolve in the same way
- $\frac{\sigma_y^2}{E[(y - \tilde{y})^2]} = \frac{\sigma_y^2}{E[(e - \hat{e})^2]}$: the effective resolution of the quantization is increased if the error is small
- Overall system SNR: $\mathrm{SNR} = \frac{\sigma_y^2}{E[(e - \hat{e})^2]}$

[Block diagram: closed-loop (DPCM) encoder and decoder, built from the quantizer $Q$ and the predictor $P(z)$, with signals $y(k)$, $e(k)$, $\hat{e}(k)$, $\hat{y}(k)$ and $\tilde{y}(k)$.]

Audio Compression Standards

Dolby Digital
- Supports multichannel audio (mono, stereo, 5.1, surround)
- Codec uses lossy data compression
- Adopted for cinema soundtracks, DVD Video, digital TV, cable TV, etc.

MPEG/Audio
- Part of a more general standard for audio and video coding
- Supports multichannel audio (mono, stereo, 5.1)
- Codec uses lossy data compression
- Adopted for data storage, DVD Video, digital TV, cable TV, etc.

MPEG Audio compression

MPEG/Audio features
- A compression standard for generic audio
- Makes no assumptions about the source
- Exploits the perceptual limitations of the auditory system
- Offers three independent layers of compression
- Encoded bitstream supports CRC for error detection

MPEG/Audio compression layers
- Layer I: low encoding complexity, bit rates above 128 kb/s
- Layer II: medium complexity, bit rates around 128 kb/s
- Layer III (MP3): high complexity, best audio quality, bit rates around 64 kb/s

MPEG/Audio codec scheme: overview
- Goal: minimize the audibility of quantization noise
- Approach: the bits available for quantization are dynamically allocated to the subband signals
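The closed-loop (DPCM) scheme from the LPC-based coders slides can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names, the single predictor coefficient 0.95, the step size 0.05 and the uniform rounding quantizer are all assumptions chosen for illustration. The point it shows is that, because the predictor runs on reconstructed samples in both encoder and decoder, the reconstruction error $y - \tilde{y}$ equals the quantization error $e - \hat{e}$ and stays within half a quantizer step.

```python
import math

def quantize(x, step):
    """Uniform quantizer (assumed for illustration): round to the nearest multiple of step."""
    return step * round(x / step)

def dpcm_encode(samples, coeffs, step):
    """Closed-loop DPCM encoder: the predictor is fed with reconstructed samples."""
    history = [0.0] * len(coeffs)          # most recent reconstructed sample first
    residuals = []
    for y in samples:
        y_hat = sum(p * h for p, h in zip(coeffs, history))  # prediction y^(k)
        e_hat = quantize(y - y_hat, step)                     # quantized residual e^(k)
        y_tilde = y_hat + e_hat                               # reconstruction y~(k)
        history = [y_tilde] + history[:-1]
        residuals.append(e_hat)
    return residuals

def dpcm_decode(residuals, coeffs):
    """Decoder: same predictor, driven only by the quantized residuals."""
    history = [0.0] * len(coeffs)
    out = []
    for e_hat in residuals:
        y_hat = sum(p * h for p, h in zip(coeffs, history))
        y_tilde = y_hat + e_hat
        history = [y_tilde] + history[:-1]
        out.append(y_tilde)
    return out

# Hypothetical test signal and parameters.
signal = [math.sin(2 * math.pi * 0.01 * k) for k in range(200)]
coeffs = [0.95]
step = 0.05

residuals = dpcm_encode(signal, coeffs, step)
decoded = dpcm_decode(residuals, coeffs)
max_error = max(abs(y - yt) for y, yt in zip(signal, decoded))
print(f"max reconstruction error: {max_error:.4f} (step/2 = {step / 2})")
```

In an open-loop variant the encoder predictor would be fed with the unquantized samples instead, so the two predictor states would drift apart and this bound would no longer hold.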
The filter bank stage

Polyphase filter bank
- The filter bank divides the input into 32 equal-width frequency subbands
- The filter bank is critically sampled: 32 input samples ⇒ 32 output samples
- Subbands have equal widths, which does not accurately reflect the human auditory system
- Filter bank analysis/synthesis is not a lossless process
- Adjacent filter bands overlap: a single frequency can excite two adjacent filter bank outputs

The psychoacoustic model

Principal steps of the psychoacoustic model
- Compute a higher-resolution time-frequency mapping of the signal (e.g., FFT)
- Separate spectral values into tonal and non-tonal components
- Apply a masking function
- Compute the masking threshold for each subband
- Compute the signal-to-mask ratio (SMR) in each subband (the ratio of the signal energy to the minimum masking threshold)

Example of masking threshold determination
Masking components and masking thresholds.
[Plot: level in dB (from −20 to 100) versus frequency index (from 0 to 250)]

Dynamic bit allocation

The bit allocation process determines the number of bits allocated to each subband.

Procedure:
- Compute the mask-to-noise ratio: $\mathrm{MNR}_{dB} = \mathrm{SNR}_{dB} - \mathrm{SMR}_{dB}$
- Allocate bits to the subbands with the lowest MNR
- Re-estimate the SNR, recompute the subband's MNR, and iterate the allocation until no bits remain available

Bitstream formatting

[Figure: typical frame format for the MPEG/Audio bitstream]

MPEG-1

MPEG-1 Audio Layer 1
- Processes data in groups of 12 samples from each subband
- Each group has a bit allocation and a scale factor
- The scale factor tells how to scale the samples to use the full range of the quantizer

MPEG-1 Audio Layer 2 (with respect to Layer 1)
- Similar to Layer 1, but with higher complexity and better performance
- The psychoacoustic model has finer frequency resolution
- Finer quantization
- Allows grouping of subband samples to

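The dynamic bit allocation procedure (compute MNR = SNR − SMR, give a bit to the subband with the lowest MNR, repeat) can be sketched as a greedy loop. This is a simplified illustration, not the allocation defined by the standard: the function name, the SMR values, the bit budget, the rule of thumb of about 6.02 dB of SNR per quantizer bit, and the cost model (one extra bit for each of the 12 samples in a group, ignoring scale factor and side information bits) are all assumptions.

```python
def allocate_bits(smr_db, bit_budget, samples_per_group=12,
                  max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one more bit per sample to the
    subband with the lowest mask-to-noise ratio (MNR)."""
    bits = [0] * len(smr_db)
    remaining = bit_budget
    # Adding one bit per sample to a subband costs samples_per_group bits.
    while remaining >= samples_per_group:
        # MNR_dB = SNR_dB - SMR_dB, with SNR approximated as db_per_bit * bits.
        mnr = [db_per_bit * b - s for b, s in zip(bits, smr_db)]
        candidates = [i for i, b in enumerate(bits) if b < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=lambda i: mnr[i])
        bits[worst] += 1
        remaining -= samples_per_group
    return bits

# Hypothetical signal-to-mask ratios (dB) for four subbands.
smr_db = [20.0, 5.0, -3.0, 12.0]
allocation = allocate_bits(smr_db, bit_budget=96)
print(allocation)
```

With these assumed inputs, the subband whose signal lies furthest above its masking threshold receives the most bits, while the subband with a negative SMR (signal already below the threshold) receives none.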