Outline Introduction Perceptual coding LPC-based coding Audio Compression Standards
Audio coding and compression
Course: Networked Multimedia Systems
First-level University Master's Programme in Design and Management of Network Systems
Carlo Drioli
University of Verona, Faculty of Mathematical, Physical and Natural Sciences, Department of Computer Science
Audio coding and compression: OUTLINE
Introduction
Perceptual coding
LPC-based coding
Audio Compression Standards
PCM coding of speech and audio signals
The simplest audio coding scheme, pulse-code modulation (PCM), involves:
- Anti-aliasing filtering
- Sampling
- Quantization
Typical PCM audio coding parameters (mono audio and speech):
Signal              Frequency range (Hz)   Sampling rate (kHz)   Bits/sample   PCM bit rate (kb/s)
Telephone speech    300-3400               8                     8             64
Wideband speech     50-7000                16                    8             128
Medium-band audio   10-11000               24                    16            384
Wideband audio      10-22000               48                    16            768
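Each value in the bit-rate column is the sampling rate multiplied by the number of bits per sample. For example, 8000 samples/s × 8 bits = 64 kb/s. The following Python sketch reproduces that arithmetic and walks through the sampling and quantization steps for a single channel. It makes two assumptions: the input is already band-limited (anti-aliasing filtering is not modeled), and quantization uses a hypothetical mid-tread uniform quantizer on [-1, 1).

```python
import math

def pcm_bit_rate_kbps(sampling_rate_hz, bits_per_sample, channels=1):
    """PCM bit rate in kb/s = sampling rate x bits per sample x channels."""
    return sampling_rate_hz * bits_per_sample * channels / 1000.0

def quantize_uniform(x, bits):
    """Mid-tread uniform quantizer: map x in [-1, 1) to a signed integer code."""
    levels = 2 ** (bits - 1)                    # e.g. 128 for 8 bits -> codes -128..127
    code = round(x * levels)                    # nearest quantization level
    return max(-levels, min(levels - 1, code))  # clip to the representable range

def dequantize_uniform(code, bits):
    return code / 2 ** (bits - 1)

def pcm_encode(signal, duration_s, fs, bits):
    """Sample the continuous-time function signal(t) at fs Hz and quantize each sample.
    The input is assumed to be band-limited below fs/2 already (no anti-aliasing filter here)."""
    n = round(duration_s * fs)
    return [quantize_uniform(signal(k / fs), bits) for k in range(n)]

# Bit rates for the rows of the table above (sampling rate in Hz, bits per sample).
for name, fs, b in [("Telephone speech", 8000, 8), ("Wideband speech", 16000, 8),
                    ("Medium-band audio", 24000, 16), ("Wideband audio", 48000, 16)]:
    print(f"{name}: {pcm_bit_rate_kbps(fs, b):.0f} kb/s")

# A 1 kHz tone at half full scale: 10 ms of telephone-quality PCM.
codes = pcm_encode(lambda t: 0.5 * math.sin(2 * math.pi * 1000 * t), 0.01, 8000, 8)
```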
Audio coding and compression schemes
Approaches to improving coding efficiency include:
- Making assumptions about the nature of the source (especially for speech)
- Reducing the redundancy in the signal
- Exploiting perceptual limitations of the human auditory system (especially for generic audio, e.g., MPEG standards)
Auditory masking and Perceptual coding
Critical bands and auditory masking
- The basilar membrane performs a frequency-to-place transformation
- The auditory system has a limited, frequency-dependent resolution
- Frequency dependency: the auditory system acts as a bank of bandpass filters with nonuniform bandwidths (critical bands)
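The sketch below makes the nonuniform bandwidths concrete using the Zwicker-Terhardt analytic approximations for the critical-band rate (in Bark) and for the critical bandwidth. These formulas are an assumption introduced for illustration; the slides do not commit to a particular model.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate z(f) in Bark (Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Approximate width in Hz of the critical band centred at f_hz (same source)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for f in (100, 1000, 4000, 10000):
    print(f"{f:>6} Hz: z = {hz_to_bark(f):5.2f} Bark, bandwidth ~ {critical_bandwidth_hz(f):6.0f} Hz")
```

With these approximations, a critical band is about 100 Hz wide at low frequencies and widens to more than 2 kHz around 10 kHz.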
Auditory masking
- Simultaneous masking: the auditory system blurs signal components within a critical band
- The noise-masking threshold at any given frequency depends only on the signal energy within a limited region around that frequency
- Low-level signal components below that threshold are not audible
- Temporal masking: may occur when two sounds occur within a short time interval of each other
Simultaneous masking
- Signal components below the masking threshold are not audible
source: (P. Noll 1997)
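Here is a toy model of simultaneous masking. It assumes a single tonal masker whose threshold is a triangle in dB on the Bark (critical-band rate) scale, with Bark computed from the Zwicker-Terhardt approximation. The two slopes and the 10 dB offset are hypothetical round numbers picked for illustration; they do not come from any standard.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band rate (assumed model)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Hypothetical spreading-function parameters, for illustration only.
LOWER_SLOPE_DB_PER_BARK = 27.0  # threshold falls off steeply below the masker
UPPER_SLOPE_DB_PER_BARK = 10.0  # and more gently above it (upward spread of masking)
MASKER_OFFSET_DB = 10.0         # threshold peak sits this far below the masker level

def masking_threshold_db(f_hz, masker_hz, masker_db):
    """Triangular (in dB over Bark) masking threshold produced by one masker."""
    dz = hz_to_bark(f_hz) - hz_to_bark(masker_hz)
    slope = LOWER_SLOPE_DB_PER_BARK if dz < 0 else UPPER_SLOPE_DB_PER_BARK
    return masker_db - MASKER_OFFSET_DB - slope * abs(dz)

def audible_components(components, masker_hz, masker_db):
    """Keep only the (frequency, level) pairs that lie above the masking threshold."""
    return [(f, level) for f, level in components
            if level > masking_threshold_db(f, masker_hz, masker_db)]

# A 1 kHz masker at 80 dB and three probe components (frequency in Hz, level in dB).
probes = [(900.0, 45.0), (1100.0, 50.0), (1100.0, 75.0)]
print(audible_components(probes, 1000.0, 80.0))  # -> [(1100.0, 75.0)]
```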
Temporal masking
- Post-masking: on the order of 50 to 200 ms
- Pre-masking: below 10 ms
source: (P. Noll 1997)
Perceptual coding: noise shaping
Dithering and noise shaping
- Dither aims to reduce artifacts due to quantization (e.g., harmonic and intermodulation distortion)
- Additive dither: noise is added before quantization
- Dither with noise shaping: moves the wideband noise energy introduced by additive dither into less audible regions of the spectrum
Perceptual coding: companding
Companding
- Companding compresses the dynamic range on encoding and expands it on decoding
- In audio coding, companding precedes the quantization step, improving the efficiency of the quantizer
- Used in standard speech coders (μ-law companding) and in wideband audio compression standards (e.g., MPEG)
LPC-based coders
Linear predictive coding (LPC)
- The signal is modeled as an autoregressive (AR) process
- LPC attempts to predict the present sample from past samples: $\hat{y}(k) = \sum_{i=1}^{N_p} p_i\, y(k-i)$, i.e. the predictor filter is $P(z) = \sum_{i=1}^{N_p} p_i z^{-i}$
- The filter $1/(1-P(z))$ recovers $y(k)$ from the prediction error $e(k) = y(k) - \hat{y}(k)$
- LPC exploits the signal redundancies to reduce the transmission bit rate
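The predictor coefficients p_i are normally chosen to minimize the energy of the prediction error. The sketch below assumes the autocorrelation method solved with the Levinson-Durbin recursion; this is one common choice, and the slides do not prescribe a specific one. It then computes the residual e(k) and inverts it with the all-pole filter 1/(1 - P(z)). Pure Python, no dependencies.

```python
import random

def autocorrelation(y, max_lag):
    """r(l) = sum_k y(k) y(k - l) for l = 0..max_lag (autocorrelation method)."""
    return [sum(y[k] * y[k - lag] for k in range(lag, len(y))) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for p_1..p_order so that y_hat(k) = sum_i p_i y(k - i)
    minimizes the prediction-error energy. Returns (coefficients, final error energy)."""
    a = [0.0] * (order + 1)              # a[i] holds p_i; a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err  # reflection coeff.
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a[1:], err

def predict(seq, k, p):
    """y_hat(k) = sum_{i=1}^{Np} p_i seq(k - i), treating samples before k = 0 as zero."""
    return sum(pi * seq[k - i] for i, pi in enumerate(p, start=1) if k - i >= 0)

def residual(y, p):
    """Prediction error e(k) = y(k) - y_hat(k)."""
    return [y[k] - predict(y, k, p) for k in range(len(y))]

def synthesize(e, p):
    """All-pole filter 1 / (1 - P(z)): y(k) = e(k) + sum_i p_i y(k - i)."""
    y = []
    for k, ek in enumerate(e):
        y.append(ek + predict(y, k, p))
    return y

# Usage: AR(1) test signal y(k) = 0.9 y(k-1) + w(k), fitted with an order-2 predictor.
rng = random.Random(1)
y = [0.0]
for _ in range(4999):
    y.append(0.9 * y[-1] + rng.gauss(0.0, 1.0))
p, _ = levinson_durbin(autocorrelation(y, 2), 2)
e = residual(y, p)
y_rec = synthesize(e, p)
```

For this AR(1) signal the estimated coefficients should be close to p_1 ≈ 0.9 and p_2 ≈ 0. The residual also carries far less energy than y(k), and that reduction is the redundancy LPC exploits.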
LPC with open loop quantization
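- Quantization of the residual outside the prediction loop
- The encoder predictor is driven by $y(k)$ and the decoder predictor by the reconstruction $\tilde{y}(k)$, so the states of the predictor filters do not evolve in the same way
- Overall system SNR: $\mathrm{SNR} = \sigma_y^2 / E[(y-\tilde{y})^2]$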
[Figure: open-loop LPC encoder ($y(k)$, $e(k)$, quantizer Q, $\hat{e}(k)$, predictor $P(z)$) and decoder ($\hat{e}(k)$, predictor $P(z)$, output $\tilde{y}(k)$)]
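A sketch of open-loop predictive coding. It assumes a known first-order predictor and a hypothetical uniform quantizer with step 0.5. The encoder computes the residual from the clean y(k), but the decoder's predictor only has access to ỹ(k). The residual quantization error is therefore passed through 1/(1 - P(z)) and accumulates.

```python
import random

def predict(seq, k, p):
    """y_hat(k) = sum_{i=1}^{Np} p_i seq(k - i), treating samples before k = 0 as zero."""
    return sum(pi * seq[k - i] for i, pi in enumerate(p, start=1) if k - i >= 0)

def quantize(v, step):
    """Hypothetical uniform quantizer with step size `step`."""
    return step * round(v / step)

def open_loop_encode(y, p, step):
    # The encoder's predictor sees the clean input y(k): quantization is outside the loop.
    return [quantize(y[k] - predict(y, k, p), step) for k in range(len(y))]

def lpc_decode(e_hat, p):
    # The decoder's predictor can only use its own reconstruction y_tilde(k).
    y_tilde = []
    for k, ek in enumerate(e_hat):
        y_tilde.append(ek + predict(y_tilde, k, p))
    return y_tilde

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Usage: strongly correlated AR(1) input with a first-order predictor p_1 = 0.95.
rng = random.Random(7)
y = [0.0]
for _ in range(4999):
    y.append(0.95 * y[-1] + rng.gauss(0.0, 1.0))
p, step = [0.95], 0.5
y_tilde = lpc_decode(open_loop_encode(y, p, step), p)
print(mse(y, y_tilde), step ** 2 / 12)  # reconstruction error vs. plain quantizer noise
```

With p_1 = 0.95, the reconstruction error power should come out roughly 1/(1 - 0.95²) ≈ 10 times the plain quantizer noise step²/12.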
DPCM: closed loop quantization
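- Quantization of the residual inside the prediction loop (predictive quantization)
- The encoder predicts from the reconstructed signal $\tilde{y}(k)$, exactly as the decoder does, so the states of the predictor filters evolve in the same way
- $E[(y-\tilde{y})^2] = E[(e-\hat{e})^2]$: the reconstruction error equals the quantization error of the residual, so the effective resolution of the quantization is increased if the prediction error is small
- Overall system SNR: $\mathrm{SNR} = \sigma_y^2 / E[(e-\hat{e})^2]$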
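A minimal DPCM sketch, assuming a known first-order predictor and a hypothetical uniform quantizer with step 0.5. The encoder contains a local decoder and predicts from ỹ(k). As a result, y(k) - ỹ(k) = e(k) - ê(k), and the reconstruction error never exceeds half a quantizer step.

```python
import math
import random

def predict(seq, k, p):
    """y_hat(k) = sum_{i=1}^{Np} p_i seq(k - i), treating samples before k = 0 as zero."""
    return sum(pi * seq[k - i] for i, pi in enumerate(p, start=1) if k - i >= 0)

def quantize(v, step):
    """Hypothetical uniform quantizer with step size `step`."""
    return step * round(v / step)

def dpcm_encode(y, p, step):
    """Closed-loop (predictive) quantization with a local decoder inside the encoder."""
    e_hat, y_tilde = [], []
    for k in range(len(y)):
        y_hat = predict(y_tilde, k, p)     # predict from reconstructed samples, like the decoder
        eq = quantize(y[k] - y_hat, step)
        e_hat.append(eq)
        y_tilde.append(eq + y_hat)         # local reconstruction y_tilde(k)
    return e_hat

def dpcm_decode(e_hat, p):
    y_tilde = []
    for k, ek in enumerate(e_hat):
        y_tilde.append(ek + predict(y_tilde, k, p))
    return y_tilde

# Usage: AR(1) input, first-order predictor p_1 = 0.95.
rng = random.Random(7)
y = [0.0]
for _ in range(4999):
    y.append(0.95 * y[-1] + rng.gauss(0.0, 1.0))
p, step = [0.95], 0.5
y_tilde = dpcm_decode(dpcm_encode(y, p, step), p)
var_y = sum(v * v for v in y) / len(y)
noise = sum((a - b) ** 2 for a, b in zip(y, y_tilde)) / len(y)
print(f"SNR = {10 * math.log10(var_y / noise):.1f} dB")
```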
[Figure: DPCM encoder ($y(k)$, $e(k)$, quantizer Q, $\hat{e}(k)$, local reconstruction $\tilde{y}(k) = \hat{e}(k) + \hat{y}(k)$, predictor $P(z)$) and decoder ($\hat{e}(k)$, predictor $P(z)$, output $\tilde{y}(k)$)]
Audio Compression Standards
Dolby Digital
- Supports multichannel audio (mono, stereo, 5.1, surround)
- The codec uses lossy data compression
- Adopted for cinema soundtracks, DVD-Video, digital TV, cable TV, etc.
MPEG/Audio
- Part of a more general standard for audio and video coding
- Supports multichannel audio (mono, stereo, 5.1)
- The codec uses lossy data compression
- Adopted for data storage, DVD-Video, digital TV, cable TV, etc.
MPEG Audio compression
MPEG/Audio features
- A compression standard for generic audio
- Makes no assumptions about the source
- Exploits the perceptual limitations of the auditory system
- Offers three independent layers of compression
- The encoded bitstream supports a CRC for error detection
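A bitwise sketch of the kind of CRC involved. It assumes the CRC-16 generator x^16 + x^15 + x^2 + 1 (0x8005) with the register preset to all ones, which is how the MPEG-1 audio error check is usually described. The standard defines exactly which header and side-information bits are protected, and this sketch does not model that.

```python
def crc16_mpeg_audio(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16, generator 0x8005 (x^16 + x^15 + x^2 + 1), register preset
    to all ones, bits processed MSB first, no final XOR (assumed parameters)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

protected = bytes([0x12, 0x34, 0x56])   # arbitrary example bytes, not a real MPEG header
check = crc16_mpeg_audio(protected)
print(hex(check))
```

The decoder recomputes the checksum over the same bits and discards or conceals the frame on a mismatch. A single flipped bit always changes the result.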
MPEG/Audio compression layers
- Layer I: low encoding complexity, bit rates above 128 kb/s per channel
- Layer II: medium complexity, bit rates around 128 kb/s per channel
- Layer III (MP3): high complexity, best audio quality, bit rates around 64 kb/s per channel
MPEG/Audio codec scheme: overview
- Goal: minimize the audibility of quantization noise
- Approach: the bits available for quantization are dynamically allocated to the subband signals
The filter bank stage
Polyphase filter bank
- The filter bank divides the input into 32 equal-width frequency subbands
- The filter bank is critically sampled: 32 input samples ⇒ 32 output samples (one per subband)
- Subbands have equal widths, which does not accurately reflect the critical bands of the human auditory system
- Filter bank analysis/synthesis is not lossless
- Adjacent filter bands overlap: a single frequency can excite two adjacent filter bank outputs
The psychoacoustic model
Principal steps of the psychoacoustic model
- Compute a higher-resolution time-frequency mapping of the signal (e.g., an FFT)
- Separate the spectral values into tonal and non-tonal components
- Apply a masking function
- Compute the masking threshold for each subband
- Compute the signal-to-mask ratio (SMR) in each subband (the ratio of the signal energy to the minimum masking threshold)
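The last step can be sketched as follows, assuming the earlier steps have already produced a signal power and a masking-threshold power (both linear) for every FFT bin. Following the definition above, a subband's SMR is its signal energy divided by the minimum threshold inside it. Splitting the FFT bins into 32 equal groups is a simplification of how bins actually map onto the filter bank subbands.

```python
import math

def smr_db(subband_power, subband_threshold):
    """SMR of one subband: signal energy in the subband divided by the minimum
    masking threshold within it (linear powers in, dB out)."""
    return 10.0 * math.log10(sum(subband_power) / min(subband_threshold))

def smr_per_subband(power, threshold, n_subbands=32):
    """Group FFT bins into n_subbands equal-width subbands (simplified mapping)."""
    width = len(power) // n_subbands
    return [smr_db(power[b * width:(b + 1) * width], threshold[b * width:(b + 1) * width])
            for b in range(n_subbands)]

# Usage: 256 bins of flat signal power, masking threshold 10 dB below every bin.
power = [1.0] * 256
threshold = [0.1] * 256
print([round(v, 2) for v in smr_per_subband(power, threshold)[:3]])
```

A high SMR marks a subband whose quantization noise needs to be held far below the signal. The bit allocation stage uses this information.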
Example of masking threshold determination
[Figure: masking components and masking thresholds, level (dB) versus frequency index]
Dynamic bit allocation
The bit allocation process determines the number of bits allocated to each subband.
Procedure:
- Compute the mask-to-noise ratio $\mathrm{MNR}_{dB} = \mathrm{SNR}_{dB} - \mathrm{SMR}_{dB}$
- Allocate bits to the subband with the lowest MNR
- Re-estimate the SNR, recompute that subband's MNR, and repeat the allocation until no bits are left
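The iterative procedure can be written as a greedy loop. The sketch rests on several assumptions. Each extra bit per sample raises a subband's SNR by about 6.02 dB, standing in for the SNR table a real encoder uses. A subband block holds 12 samples. The cost of side information and the standard's set of allowed quantizers are both ignored.

```python
def snr_db(bits):
    """Assumed SNR of an n-bit quantizer: about 6.02 dB per bit."""
    return 6.02 * bits

def allocate_bits(smr_db_list, bit_pool, samples_per_subband=12, max_bits=15):
    """Greedy allocation: repeatedly give one more bit per sample to the subband with
    the lowest MNR = SNR - SMR, while the remaining pool can pay for it."""
    alloc = [0] * len(smr_db_list)
    while True:
        candidates = [b for b in range(len(alloc))
                      if alloc[b] < max_bits and samples_per_subband <= bit_pool]
        if not candidates:
            break
        worst = min(candidates, key=lambda b: snr_db(alloc[b]) - smr_db_list[b])
        alloc[worst] += 1                    # one more bit for each of the 12 samples
        bit_pool -= samples_per_subband
    return alloc

# Usage: three subbands with SMRs of 20, 5 and -10 dB and 60 bits to spend.
print(allocate_bits([20.0, 5.0, -10.0], 60))  # -> [4, 1, 0]
```

The subband whose SMR is negative gets no bits in this example: its signal already sits below the masking threshold, so any quantization noise there stays inaudible.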
Bitstream formatting
Typical frame format for the MPEG/Audio bitstream
MPEG-1
MPEG-1 Audio Layer 1
- Processes the data in groups of 12 samples from each subband
- Each group has a bit allocation and a scale factor
- The scale factor tells how to scale the samples so that they use the full range of the quantizer
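A sketch of Layer 1 group coding: choose a single scale factor for the 12 samples, normalize, and quantize with the allocated number of bits. It assumes subband samples in [-1, 1], a scale-factor table of the form 2^(1 - i/3) for i = 0..62, and a mid-tread quantizer with 2^n - 1 levels for n ≥ 2 bits. These are assumptions to check against the standard, not its exact tables.

```python
import math

# Assumed scale-factor table: 2^(1 - i/3) for i = 0..62 (2.0 down to about 1.2e-6).
SCALE_FACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]

def choose_scale_factor(block):
    """Index of the smallest table entry that is still >= the block's peak magnitude."""
    peak = max(abs(x) for x in block)
    return max(i for i, sf in enumerate(SCALE_FACTORS) if sf >= peak)

def quantize_block(block, bits):
    """Code 12 subband samples with one scale factor and `bits` bits per sample (bits >= 2)."""
    idx = choose_scale_factor(block)
    half = 2 ** (bits - 1) - 1            # mid-tread: 2**bits - 1 levels, codes -half..+half
    codes = [round(x / SCALE_FACTORS[idx] * half) for x in block]
    return idx, codes

def dequantize_block(idx, codes, bits):
    half = 2 ** (bits - 1) - 1
    return [c / half * SCALE_FACTORS[idx] for c in codes]

# Usage: one group of 12 low-level subband samples, 4 bits per sample.
block = [0.1 * math.sin(k) for k in range(12)]
idx, codes = quantize_block(block, 4)
rec = dequantize_block(idx, codes, 4)
```

Without the scale factor these low-level samples would use only a small part of the quantizer's range. After normalization they span most of the available levels.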
MPEG-1 Audio Layer 2 (wrt Layer 1)
- Similar to Layer 1, but with higher complexity and better performance
- The psychoacoustic model has finer frequency resolution
- Finer quantization
- Allows grouping of subband samples to reduce scale-factor information
- Higher decoding delay
MPEG-1 Audio Layer 3 (wrt Layer 1 and 2)
- Achieves better performance
- Improved time-to-frequency mapping and noise allocation
- Hybrid filter bank: a cascade of the polyphase filter bank and a dynamically windowed MDCT; provides better spectral resolution and advanced pre-echo control
- Nonuniform quantization with entropy (Huffman) coding
- Run-length coding of zero-valued sequences increases the efficiency
- Bit reservoir: excess bits can be donated to encode a different frame of audio
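The MDCT inside the hybrid filter bank can be sketched in isolation, leaving out the polyphase stage and window switching. It uses a sine window, which satisfies w(n)² + w(n+M)² = 1, with 50% overlap. Under those conditions the time-domain aliasing that each inverse transform produces is cancelled by its neighbours (TDAC), and analysis followed by synthesis reconstructs the input. The direct O(M²) evaluation is chosen for clarity, not speed.

```python
import math
import random

def sine_window(two_m):
    return [math.sin(math.pi * (n + 0.5) / two_m) for n in range(two_m)]

def mdct(frame, window):
    """Direct MDCT of one windowed 2M-sample frame -> M coefficients."""
    m = len(frame) // 2
    return [sum(window[n] * frame[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]

def imdct(coeffs, window):
    """Inverse MDCT -> 2M windowed samples that still contain time-domain aliasing."""
    m = len(coeffs)
    return [window[n] * (2.0 / m) *
            sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for k in range(m))
            for n in range(2 * m)]

def analysis_synthesis(x, m):
    """50%-overlapped MDCT analysis, then IMDCT and overlap-add.
    The aliasing of neighbouring frames cancels wherever two frames overlap (TDAC)."""
    window = sine_window(2 * m)
    padded = [0.0] * m + list(x) + [0.0] * (2 * m)   # every input sample is covered by two frames
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        for n, v in enumerate(imdct(mdct(padded[start:start + 2 * m], window), window)):
            out[start + n] += v
    return out[m:m + len(x)]

# Usage: 64 random samples through an 8-coefficient MDCT.
rng = random.Random(3)
x = [rng.uniform(-1.0, 1.0) for _ in range(64)]
y = analysis_synthesis(x, 8)
```

Every frame yields only M coefficients for 2M input samples, so the transform stays critically sampled even though the frames overlap.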
Pre-echo canceling
Overlapping dynamic MDCT windows
- Frequency-domain coding of audio is affected by pre-echo (e.g., when a silent period is followed by a percussive sound in the same coding block): the quantization noise is spread over the whole block and becomes audible before the attack
- Solution: pre-echo detection and dynamic window switching for time-varying temporal resolution
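A hypothetical energy-based transient detector shows the detection half of the solution. When the energy of one sub-block jumps well above that of the previous sub-block, the block is flagged for short windows. The sub-block count and the ratio threshold of 10 are illustrative assumptions. Reference encoders typically derive this decision from the psychoacoustic model instead (e.g., a perceptual-entropy measure).

```python
import math

def needs_short_windows(block, n_sub=8, ratio_threshold=10.0, floor=1e-12):
    """Flag a block for short windows when some sub-block's energy exceeds
    ratio_threshold times the previous sub-block's energy (e.g. silence -> attack)."""
    size = len(block) // n_sub
    energies = [sum(v * v for v in block[i * size:(i + 1) * size]) for i in range(n_sub)]
    return any(energies[i] > ratio_threshold * max(energies[i - 1], floor)
               for i in range(1, n_sub))

steady = [0.5 * math.sin(2 * math.pi * k / 64) for k in range(1024)]
attack = [0.0] * 512 + [0.8 * math.sin(0.3 * k) for k in range(512)]
print(needs_short_windows(steady), needs_short_windows(attack))  # -> False True
```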
Bit allocation
Bit reservoir
- If a signal frame can be efficiently encoded with fewer bits than the maximum allowed, the remaining bits are made available to future signal frames
source: (D. Pan 1995)
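A toy model of the bit reservoir. It assumes a constant per-frame budget and a hypothetical cap on how many unused bits can be banked. The per-frame demands would come from the bit allocation loop.

```python
def simulate_bit_reservoir(demands, bits_per_frame, max_reservoir):
    """Toy bit reservoir.

    demands: bits each frame would like to spend.
    Each frame may use its own budget plus whatever earlier frames left unused;
    leftover bits are banked for later frames, up to max_reservoir.
    Returns (bits granted per frame, reservoir level after each frame)."""
    reservoir, granted, levels = 0, [], []
    for want in demands:
        available = bits_per_frame + reservoir
        used = min(want, available)
        reservoir = min(available - used, max_reservoir)   # bank the leftover, capped
        granted.append(used)
        levels.append(reservoir)
    return granted, levels

# Usage: an easy frame followed by a demanding one (hypothetical numbers).
print(simulate_bit_reservoir([500, 1500, 1000], bits_per_frame=1000, max_reservoir=600))
# -> ([500, 1500, 1000], [500, 0, 0])
```

The average bit rate stays fixed, yet a difficult frame, such as one containing a transient, can temporarily use more than its nominal share.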
MPEG-2
Principal features of MPEG-2
- Extends MPEG-1 to lower sampling frequencies, providing better sound quality at very low bit rates
- Extends MPEG-1 to multichannel sound
- Adds a new coding scheme called Advanced Audio Coding (AAC), which is not backward compatible
MPEG-2 AAC
Principal features of MPEG-2 AAC
- AAC adheres to the same basic coding paradigm as MPEG-1/2 Layer 3, with some improvements (e.g., temporal noise shaping, which improves the coding of speech)
- Provides very high audio quality at 64 kb/s per channel for multichannel operation
- Provides up to 48 main audio channels, 16 low-frequency effects channels, 16 overdub/multilingual channels, and 16 data streams
- Provides scalability (Main Profile, Low Complexity Profile, and Scalable Sampling Rate Profile)
MPEG-4 AAC
Principal features of MPEG-4 AAC
- Includes coding tools from several different coding paradigms: synthetic audio, speech coding, and subband/transform coding
- The core of the MPEG-4 audio coder is based on MPEG-2 AAC
- Provides advanced scalability tools and hierarchical coding (e.g., bandwidth scalability)
- Provides error robustness tools for improved performance on error-prone transmission channels (codec-specific error resilience tools and common error protection tools)
- Provides several profiles to allow optimal use of MPEG-4 in different applications
References
P. Noll, "MPEG Audio Coding," IEEE Signal Processing Magazine, 14(5):59-81, 1997.