Outline Introduction Perceptual coding LPC-based coding Audio Compression Standards

Audio coding and compression

Networked Multimedia Systems course

First-Level University Master's Program in Design and Management of Network Systems

Carlo Drioli

Università degli Studi di Verona, Faculty of Mathematical, Physical and Natural Sciences, Department of Computer Science

Audio coding and compression: OUTLINE

Introduction

Perceptual coding

LPC-based coding

Audio Compression Standards

PCM coding of speech and audio signals

Simplest audio coding (PCM) involves:

I Anti-aliasing filtering

I Sampling

I Quantization

Typical PCM audio coding parameters (mono audio and speech)

Signal             Frequency range (Hz)   Sampling rate (kHz)   PCM bits per sample   PCM bit rate (kb/s)
Telephone speech   300-3400               8                     8                     64
Wideband speech    50-7000                16                    8                     128
Mediumband audio   10-11000               24                    16                    384
Wideband audio     10-22000               48                    16                    768
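The bit rates in the table follow directly from the sampling rate multiplied by the number of bits per sample. A minimal sketch of that arithmetic is given below; the function name `pcm_bit_rate_kbps` and the `channels` parameter are illustrative choices, not part of the original slides.

```python
def pcm_bit_rate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """PCM bit rate in kb/s: samples per second times bits per sample."""
    return sampling_rate_khz * bits_per_sample * channels


# Rows of the table above: (name, sampling rate in kHz, bits per sample)
table = [
    ("Telephone speech", 8, 8),
    ("Wideband speech", 16, 8),
    ("Mediumband audio", 24, 16),
    ("Wideband audio", 48, 16),
]

for name, fs_khz, bits in table:
    print(f"{name}: {pcm_bit_rate_kbps(fs_khz, bits)} kb/s")
```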

Audio coding and compression schemes

Approaches to improve coding efficiency include:

I Making assumptions about the nature of the source (esp. for speech)

I Reducing the redundancy in the signals

I Exploiting perceptual limitations of the human auditory system (esp. for generic audio, e.g. MPEG standards)

Auditory masking and Perceptual coding

Critical bands and auditory masking

I The basilar membrane provides a frequency-to-place transformation

I The auditory system has limited, frequency-dependent resolution

I Frequency dependency: the auditory system acts as a bandpass filter bank with nonuniform bandwidths (critical bands)
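To give a feeling for how nonuniform the critical bands are, the sketch below evaluates two widely used analytic approximations attributed to Zwicker and Terhardt: the critical-band rate (Bark scale) and the critical bandwidth as functions of frequency. These formulas are not given in the slides; they are brought in here as an assumption for illustration, and the function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth (Hz) around a center frequency in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


for f in (100, 1000, 4000, 10000):
    print(f"{f} Hz: {hz_to_bark(f):.1f} Bark, "
          f"bandwidth about {critical_bandwidth_hz(f):.0f} Hz")
```

Under these approximations the bandwidth stays near 100 Hz at low frequencies and grows to a few kHz at the top of the audio range.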

Auditory masking

I Simultaneous masking: auditory system blurs signal components within a critical band

I The noise-masking threshold at any given frequency depends only on signal energy within a limited region around that frequency

I Low-level signal components below that threshold will not be audible

I Temporal masking: may occur when two sounds appear within a limited small interval of time

Simultaneous masking

I Signal components below masking threshold are not audible

source: (P. Noll 1997)

Temporal masking

I Post-masking: on the order of 50 to 200 ms

I Pre-masking: below 10 ms

source: (P. Noll 1997)

Perceptual coding: noise shaping

Dithering and noise shaping

I Dither aims at reducing artifacts due to quantization (e.g., harmonic and intermodulation distortion)

I Additive dither: add noise before quantization

I Dither with noise shaping: moves wideband noise energy due to additive dither into less audible regions of the spectrum
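The three ideas above can be compared on a toy signal. The sketch below quantizes a constant input of 0.3 with a step of 1: plain quantization always returns 0 (a signal-dependent error), additive triangular dither makes the output correct on average, and a first-order error-feedback loop additionally pushes the quantization noise toward high frequencies. The dither distribution, the first-order shaping filter, and all function names are assumptions chosen for illustration, not details taken from the slides.

```python
import math
import random


def quantize(x, step=1.0):
    """Uniform mid-tread quantizer (round half up)."""
    return step * math.floor(x / step + 0.5)


def tpdf_dither(rng, step=1.0):
    """Triangular-PDF dither: sum of two uniform variables of width one step."""
    return (rng.random() - 0.5) * step + (rng.random() - 0.5) * step


def quantize_plain(signal, step=1.0):
    return [quantize(x, step) for x in signal]


def quantize_dithered(signal, step=1.0, seed=0):
    rng = random.Random(seed)
    return [quantize(x + tpdf_dither(rng, step), step) for x in signal]


def quantize_noise_shaped(signal, step=1.0, seed=0):
    """Dither plus first-order error feedback: output = input + e(k) - e(k-1)."""
    rng = random.Random(seed)
    out, err = [], 0.0
    for x in signal:
        v = x - err                      # subtract the previous total error
        y = quantize(v + tpdf_dither(rng, step), step)
        err = y - v                      # total error made on this sample
        out.append(y)
    return out


def mean(values):
    return sum(values) / len(values)


signal = [0.3] * 10000
plain = quantize_plain(signal)
dithered = quantize_dithered(signal)
shaped = quantize_noise_shaped(signal)
print(mean(plain), mean(dithered), mean(shaped))
```

The long-term average is a crude low-pass filter: the noise-shaped output matches the input almost exactly at low frequency because its error has been moved to high frequencies.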

Perceptual coding: companding

Companding

I Companding operation compresses dynamic range on encode and expands dynamic range on decode

I In audio coding, companding precedes the quantization step, improving the efficiency of the quantizer

I Used in standard speech coders (mu-law companding) and for wideband audio compression standards (e.g., MPEG)
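As an illustration of the compress/quantize/expand chain, the sketch below uses the continuous mu-law characteristic with mu = 255 and a simple uniform 8-bit quantizer. It is a simplified model: practical speech coders use a piecewise-linear approximation of this curve, and the helper names are hypothetical.

```python
import math

MU = 255.0


def mu_law_compress(x, mu=MU):
    """Compress x in [-1, 1] with the continuous mu-law characteristic."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def quantize_uniform(v, bits=8):
    """Uniform quantizer on [-1, 1] with 2**(bits-1) - 1 steps per side."""
    levels = 2 ** (bits - 1) - 1
    return math.floor(v * levels + 0.5) / levels


def companded_quantize(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x), bits))


x = 0.01  # a low-level sample
print("uniform error:  ", abs(quantize_uniform(x) - x))
print("companded error:", abs(companded_quantize(x) - x))
```

For low-level samples the companded quantizer has a much smaller error than the uniform one with the same number of bits, which is the efficiency gain mentioned above.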

LPC-based coders

Linear predictive coding (LPC)

I The signal is modeled as an AR process

I LPC attempts to predict the present sample from past samples: $\hat{y}(k) = \sum_{i=1}^{N_p} p_i \, y(k-i)$

I The filter $\frac{1}{1-P(z)}$ recovers $y(k)$ from the prediction error $e(k) = y(k) - \hat{y}(k)$

I LPC exploits the signal redundancies to reduce the transmission bit rate
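The prediction and reconstruction steps can be sketched as follows, assuming the predictor coefficients $p_i$ are already known (how they are estimated is not covered here). The test signal is a synthetic second-order AR process with arbitrarily chosen coefficients; all names are illustrative.

```python
import random


def predict(history, p, k):
    """y_hat(k) = sum_i p[i-1] * y(k-i), using only samples that exist."""
    return sum(p[i] * history[k - 1 - i] for i in range(len(p)) if k - 1 - i >= 0)


def lpc_residual(y, p):
    """Analysis: prediction error e(k) = y(k) - y_hat(k)."""
    return [y[k] - predict(y, p, k) for k in range(len(y))]


def lpc_synthesize(e, p):
    """Synthesis filter 1 / (1 - P(z)): rebuild y(k) from e(k)."""
    y = []
    for k in range(len(e)):
        y.append(e[k] + predict(y, p, k))
    return y


# Synthetic AR(2) signal (coefficients are arbitrary, chosen to be stable)
p = [1.3, -0.6]
rng = random.Random(0)
y = []
for k in range(500):
    y.append(predict(y, p, k) + rng.gauss(0.0, 1.0))

e = lpc_residual(y, p)
y_rec = lpc_synthesize(e, p)

signal_energy = sum(v * v for v in y)
residual_energy = sum(v * v for v in e)
print(f"residual energy / signal energy = {residual_energy / signal_energy:.3f}")
```

The residual carries much less energy than the signal, so it can be coded with fewer bits, while the synthesis filter recovers the signal from it.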

LPC with open loop quantization

I Quantization of the residual out of the prediction loop

I The state of the predictor filters will not evolve in the same way

I Overall system SNR: $\mathrm{SNR} = \frac{\sigma^2}{E[(y-\tilde{y})^2]}$

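[Block diagram. Encoder: e(k) = y(k) − ŷ(k), with ŷ(k) produced by the predictor P(z); the quantizer Q, placed outside the prediction loop, outputs ê(k). Decoder: ỹ(k) is rebuilt by adding ê(k) to the output of its own predictor P(z).]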

DPCM: closed loop quantization

I Quantization of the residual in the prediction loop (predictive quantization)

I The state of the predictor filters will evolve in the same way

I $\frac{\sigma^2}{E[(y-\tilde{y})^2]} = \frac{\sigma^2}{E[(e-\hat{e})^2]}$: the effective resolution of the quantization is increased if the error is small

I Overall system SNR: $\mathrm{SNR} = \frac{\sigma^2}{E[(e-\hat{e})^2]}$
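The difference between quantizing outside and inside the prediction loop can be checked numerically. The sketch below is a toy comparison under stated assumptions: a first-order predictor with a fixed coefficient of 0.9, a uniform quantizer with step 0.5, and a synthetic AR(1) test signal. None of these values come from the slides, and the function names are hypothetical.

```python
import math
import random


def quantize(x, step):
    """Uniform mid-tread quantizer."""
    return step * math.floor(x / step + 0.5)


def open_loop_codec(y, p, step):
    """Encoder predicts from the original signal, decoder from its own output."""
    recon, y_prev, r_prev = [], 0.0, 0.0
    for sample in y:
        e_q = quantize(sample - p * y_prev, step)   # quantizer outside the loop
        r = e_q + p * r_prev                        # decoder state drifts away
        recon.append(r)
        y_prev, r_prev = sample, r
    return recon


def closed_loop_codec(y, p, step):
    """DPCM: encoder and decoder both predict from the reconstructed signal."""
    recon, r_prev = [], 0.0
    for sample in y:
        pred = p * r_prev
        e_q = quantize(sample - pred, step)         # quantizer inside the loop
        r = pred + e_q
        recon.append(r)
        r_prev = r
    return recon


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(1)
y, prev = [], 0.0
for _ in range(5000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    y.append(prev)

STEP = 0.5
open_rec = open_loop_codec(y, 0.9, STEP)
closed_rec = closed_loop_codec(y, 0.9, STEP)
open_mse = mse(y, open_rec)
closed_mse = mse(y, closed_rec)
print(f"open-loop MSE:   {open_mse:.4f}")
print(f"closed-loop MSE: {closed_mse:.4f}")
```

In the closed-loop case the reconstruction error equals the quantization error of the residual, so it never exceeds half a quantizer step; in the open-loop case the decoder filter accumulates the quantization errors.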

[Block diagram. Encoder: e(k) = y(k) − ŷ(k) is quantized by Q to ê(k); the predictor P(z) is driven by the reconstructed signal ỹ(k) = ê(k) + ŷ(k). Decoder: the same loop rebuilds ỹ(k) from ê(k).]

Audio Compression Standards

Dolby Digital

I Supports multichannel audio (mono, stereo, 5.1 surround)

I Codec uses lossy data compression

I Adopted for cinema soundtracks, DVD, digital TV, cable TV, etc.

MPEG/Audio

I Part of a more general standard for audio and video coding

I Supports multichannel audio (mono, stereo, 5.1)

I Codec uses lossy data compression

I Adopted for data storage, DVD Video, digital TV, cable TV, etc.

MPEG Audio compression

MPEG/Audio features

I A compression standard for generic audio

I Makes no assumptions about the source

I Exploits the perceptual limitations of the auditory system

I Offers three independent layers of compression

I Encoded bitstream supports CRC for error detection

MPEG/Audio compression layers

I Layer I: low encoding complexity, bit rates above 128 kb/s

I Layer II: medium complexity, bit rates around 128 kb/s

I Layer III (MP3): high complexity, best audio quality, bit rates around 64 kb/s

MPEG/Audio coding scheme: overview

I Goal: minimize the audibility of quantization noise

I Approach: bits available for quantization are dynamically allocated to the subband signals

The filter bank stage

Polyphase filter bank

I The filter bank divides the input into 32 equal-width frequency subbands

I The filter bank is critically sampled: 32 input samples ⇒ 32 output samples

I Subbands have equal widths, which does not accurately reflect the human auditory system

I Filter bank analysis/synthesis is not a lossless process

I Adjacent filter bands overlap: a single frequency can excite two adjacent filter bank outputs.
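A quick calculation shows why equal-width subbands are a poor match for the ear. The sketch below computes the width of each of the 32 subbands for a given sampling rate and the index of the subband containing a given frequency; it idealizes the bands as non-overlapping, and the function names are illustrative.

```python
N_SUBBANDS = 32


def subband_width_hz(sampling_rate_hz):
    """Each subband covers 1/32 of the band from 0 Hz to half the sampling rate."""
    return sampling_rate_hz / 2 / N_SUBBANDS


def subband_index(frequency_hz, sampling_rate_hz):
    """Index (0..31) of the idealized subband that contains frequency_hz."""
    index = int(frequency_hz // subband_width_hz(sampling_rate_hz))
    return min(index, N_SUBBANDS - 1)


fs = 48000
print(f"subband width at {fs} Hz sampling: {subband_width_hz(fs)} Hz")
for f in (100, 1000, 10000):
    print(f"{f} Hz falls in subband {subband_index(f, fs)}")
```

At 48 kHz each subband is 750 Hz wide, so the lowest subband alone spans several critical bands, which are only about 100 Hz wide at low frequencies.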

The psychoacoustic model

Principal steps of the psychoacoustic model

I Compute a higher-resolution time-frequency mapping of the signal (e.g., FFT)

I Separate spectral values into tonal and non-tonal components

I Apply a masking function

I Compute the masking threshold for each subband

I Compute the signal-to-mask ratio (SMR) in each subband (the ratio of the signal energy to the minimum masking threshold)
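The last step can be sketched as follows, assuming the earlier steps have already produced a signal level and a masking threshold (both in dB) for every spectral line. The simplification used here, taking the strongest line as the subband signal level and the lowest threshold as the subband masking threshold, is an assumption for illustration; the function name and the toy data are hypothetical.

```python
def signal_to_mask_ratios(levels_db, thresholds_db, n_subbands):
    """Per-subband SMR in dB: signal level minus minimum masking threshold."""
    bins_per_band = len(levels_db) // n_subbands
    smr_db = []
    for band in range(n_subbands):
        lo, hi = band * bins_per_band, (band + 1) * bins_per_band
        signal_level = max(levels_db[lo:hi])       # strongest line in the subband
        min_threshold = min(thresholds_db[lo:hi])  # most demanding threshold
        smr_db.append(signal_level - min_threshold)
    return smr_db


# Toy data: 4 spectral lines grouped into 2 subbands
levels = [60.0, 40.0, 30.0, 20.0]
thresholds = [35.0, 30.0, 25.0, 28.0]
print(signal_to_mask_ratios(levels, thresholds, n_subbands=2))
```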

Example of masking threshold determination

[Figure: masking components and masking thresholds. Vertical axis: level (dB), from −20 to 100; horizontal axis: frequency index, from 0 to 250.]

Dynamic bit allocation

The bit allocation process determines the number of bits allocated to each subband.

Procedure:

I Compute the mask-to-noise ratio: $\mathrm{MNR}_{dB} = \mathrm{SNR}_{dB} - \mathrm{SMR}_{dB}$

I Allocate bits to the subband with the lowest MNR

I Re-estimate the SNR, recompute that subband's MNR, and iterate the allocation as long as bits are available
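The procedure above amounts to a greedy loop, sketched below. Two simplifications are assumed: the quantizer SNR is approximated as 6.02 dB per allocated bit (a real encoder would look the SNR up in a table), and each allocation step costs one unit of the bit budget. The function name and parameters are hypothetical.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, snr_per_bit_db=6.02):
    """Greedy allocation: repeatedly give one bit to the subband with lowest MNR."""
    bits = [0] * len(smr_db)

    def mnr_db(band):
        # MNR = SNR - SMR, with SNR approximated from the current bit count
        return snr_per_bit_db * bits[band] - smr_db[band]

    remaining = total_bits
    while remaining > 0:
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits_per_band]
        if not candidates:
            break  # every subband is already at its maximum resolution
        worst = min(candidates, key=mnr_db)
        bits[worst] += 1
        remaining -= 1
    return bits


smr = [20.0, 5.0, -3.0, 12.0]  # toy SMR values in dB for 4 subbands
allocation = allocate_bits(smr, total_bits=8)
print(allocation)
```

Subbands with a high SMR (signal well above the masking threshold) receive the most bits, while a subband whose signal is already below the threshold may receive none.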

Bitstream formatting

Typical frame format for the MPEG/Audio bitstream

MPEG-1

MPEG-1 Audio Layer 1

I Processes data grouping 12 samples from each subband

I Each group has a bit allocation and a scale factor

I The scale factor tells how to scale the samples to use the full range of the quantizer
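The role of the scale factor can be sketched on one group of 12 subband samples. In this simplified model the scale factor is simply the peak magnitude of the group (a real encoder would pick the nearest entry from a table of allowed scale factors), and the quantizer is a symmetric uniform one with 2^n − 1 levels. Function names and sample values are hypothetical.

```python
import math


def encode_group(samples, n_bits):
    """Return (scale, codes): normalize by the peak, then quantize uniformly."""
    scale = max(abs(s) for s in samples)
    half = (2 ** n_bits - 2) // 2          # 2**n_bits - 1 levels, symmetric around 0
    if scale == 0:
        return 0.0, [0] * len(samples)
    codes = [math.floor(s / scale * half + 0.5) for s in samples]
    return scale, codes


def decode_group(scale, codes, n_bits):
    """Map the integer codes back to sample values using the scale factor."""
    half = (2 ** n_bits - 2) // 2
    return [c / half * scale for c in codes]


group = [0.02, -0.05, 0.11, 0.08, -0.12, 0.03,
         0.00, 0.07, -0.09, 0.01, 0.04, -0.06]   # 12 samples of one subband
N_BITS = 4
scale, codes = encode_group(group, N_BITS)
decoded = decode_group(scale, codes, N_BITS)
max_error = max(abs(a - b) for a, b in zip(group, decoded))
print(f"scale factor = {scale}, max error = {max_error:.4f}")
```

Because the samples are normalized before quantization, the quantization error is proportional to the scale factor: quiet groups are coded with the same relative precision as loud ones.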

MPEG-1 Audio Layer 2 (wrt Layer 1)

I Similar to Layer 1, but higher complexity and better performance

I Psychoacoustic model has finer frequency resolution

I Finer quantization

I Allows grouping of subband samples to reduce scale-factor information

I Higher decoding delay

MPEG-1 Audio Layer 3 (wrt Layer 1 and 2)

I Achieves better performance

I Improved time-to-frequency mapping and noise allocation

I Hybrid filter bank: a cascade of a polyphase filter bank and a dynamically windowed MDCT. Provides better spectral resolution and advanced pre-echo control

I Nonuniform quantization with entropy coding

I Run-length coding of zero-valued sequences increases the efficiency

I Bit reservoir: bits in excess can be donated to encode a different frame of audio
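To illustrate why run-length coding of zeros helps, the sketch below encodes a sequence of quantized values as (zero run, value) pairs plus a count of trailing zeros. This is a generic scheme chosen for illustration and is not the actual Layer 3 bitstream syntax; the function names are hypothetical.

```python
def rle_zeros_encode(values):
    """Return (pairs, trailing): each pair is (zeros before value, nonzero value)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run  # 'run' now counts the zeros after the last nonzero value


def rle_zeros_decode(pairs, trailing):
    """Inverse of rle_zeros_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing)
    return out


quantized = [5, 0, 0, -2, 1, 0, 0, 0, 0]
pairs, trailing = rle_zeros_encode(quantized)
print(pairs, trailing)
```

After coarse quantization the high-frequency values are often long runs of zeros, so a single count replaces many individual symbols.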

Pre-echo canceling

Overlapping dynamic MDCT windows

I Frequency-domain coding of audio is affected by pre-echo (e.g., when a silent period is followed by a percussive sound within the same coding block)

I Solution: pre-echo detection and dynamic window switching for time-varying temporal resolution

Bit allocation

Bit reservoir

I If a signal frame can be efficiently encoded with fewer bits than the maximum allowed, the remaining bits are allocated to future signal frames

source: (D. Pan 1995)
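The bookkeeping behind the bit reservoir can be sketched as a simple simulation: each frame receives a fixed budget, bits it does not need are saved (up to a cap), and a later demanding frame may spend the saved bits on top of its own budget. The numbers, the cap, and the function name are assumptions for illustration only.

```python
def simulate_bit_reservoir(demands, frame_budget, reservoir_cap):
    """Return (bits granted per frame, bits left in the reservoir at the end)."""
    reservoir = 0
    granted = []
    for need in demands:
        available = frame_budget + reservoir   # own budget plus saved bits
        used = min(need, available)
        granted.append(used)
        reservoir = min(available - used, reservoir_cap)
    return granted, reservoir


# Two easy frames followed by a demanding one (e.g., a transient)
demands = [300, 200, 700, 400]
granted, left = simulate_bit_reservoir(demands, frame_budget=400, reservoir_cap=500)
print(granted, left)
```

In this example the third frame needs 700 bits, more than the 400-bit budget, and gets them because the first two frames left 300 bits in the reservoir.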

MPEG-2

Principal features of MPEG-2

I Extends MPEG-1 to lower sampling frequencies, providing better quality at very low bit rates

I Extends MPEG-1 to multichannel sound

I Adds a new coding scheme called "Advanced Audio Coding" (AAC), not backward compatible

MPEG-2 AAC

Principal features of MPEG-2 AAC

I AAC adheres to the same basic coding paradigm as MPEG-1/2 Layer-3, with some improvements (e.g., noise shaping for improved )

I Provides very high audio quality at 64 kb/s/channel for multichannel operation

I Provides up to 48 main audio channels, 16 low-frequency effects channels, 16 overdub/multilingual channels, and 16 data streams

I Provides scalability (Main Profile, Low Complexity Profile, and Scalable Sampling Rate Profile)

MPEG-4 AAC

Principal features of MPEG-4 AAC

I Includes coding tools from several different coding paradigms: synthetic audio, speech coding, and subband/transform coding

I The core part of the MPEG-4 audio coder is based on MPEG-2 AAC

I Provides advanced scalability tools and hierarchical coding (e.g., bandwidth scalability)

I Provides error robustness tools for improved performance on error-prone transmission channels (codec-specific error resilience tools and a common error protection tool).

I Provides several "profiles" to allow the optimal use of MPEG-4 in different applications.

References

P. Noll, "MPEG Digital Audio Coding," IEEE Signal Processing Magazine, 14(5):59-81, 1997.
