The MPEG-4 General Audio Coder

The MPEG-4 General Audio Coder

Bernhard Grill Fraunhofer Institute for Integrated Circuits (IIS)

grl 6/98 page 1

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder

Outline • MPEG-2 (AAC)

• MPEG-4 Extensions: – Perceptual Noise Substitution (PNS) – Long Term Prediction – TwinVQ Coding Core

• The MPEG-4 Scalable General Audio Coder

• Results of Listening Tests • Demonstration of a Real-Time Player

grl 6/98 page 2

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder MPEG-2 AAC Encoder Overview: Bitstream Output

Bitstream Multiplexer

Gain Filter TNS M/S Quant.

Control Scale Bank Coding Factors Coupling Intensity/ Noiseless Prediction

Input time signal Perceptual Model Rate/Distortion Control

grl 6/98 page 3

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Extension: Perceptual Noise Substitution (PNS)

Bitstream Output

Bitstream Multiplexer

Gain Filter TNS M/S Quant.

Control Scale Bank Coding Factors Coupling Intensity/ Noiseless Prediction PNS

Input time signal Perceptual Model Rate/Distortion Control

grl 6/98 page 4

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Perceptual Noise Substitution (2) Background: • Parametric coding of noise-like signal components has been used widely e.g. in

MPEG-4: • Perceptual Noise Substitution (PNS) permits a frequency selective parametric coding of noise-like signal components • Noise-like signal components are detected on a scalefactor band basis • Corresponding groups of spectral coefficients are excluded from quantization/coding • Instead, only a "noise substitution flag" plus total power of the substituted band is transmitted in the bitstream • Decoder inserts pseudo random vectors with desired target power as spectral coefficients grl 6/98 page 5

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Perceptual Noise Substitution (3)

Perceptual "Perceptual Noise Model Substitution" (PNS):

Audio Analysis Quantization Bitstream Bitstream Perceptual coder + Input Filterbank & Coding Multiplexer Out parametric represent. Noise Noise subst. signaling

Detection of noise-like signals Substituted signal energies Encoder

Decoder

Audio Synthesis Inverse Bitstream Bitstream Output Filterbank Quantization Demultiplexer In

Noise Noise subst. signaling Generator Substituted signal energies grl 6/98 page 6

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Extension 2: Long Term Prediction

Bitstream Output

Bitstream Multiplexer

Gain Filter TNS M/S Quant.

Control Scale Bank Coding Factors Coupling Intensity/ Noiseless Prediction

Input time signal Perceptual Model Rate/Distortion Control

grl 6/98 page 7

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Long Term Prediction (2) Motivation: • Tone-like signals require much higher coding precision than noise-like signals (e.g. 20 dB vs. 6 dB) • Tonal signal components are predictable

MPEG-2 AAC: • Prediction of each spectral coefficient with backward adaptive predictor • High complexity (ca. 50% of decoder computation & RAM)

MPEG-4: • Long Term Predictor (LTP) as known from speech coding • Lower complexity: Saving of approx. 50% in terms of computation and memory over MPEG-2 predictors • Comparable performance to MPEG-2 predictors

grl 6/98 page 8

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Extension 3: Twin-VQ

Bitstream Output

Bitstream Multiplexer

Gain Filter TNS M/S Quant.

Control Scale Bank Coding Factors Coupling Intensity/ Noiseless Noiseless Prediction

Input time signal time Input Perceptual Model Rate/Distortion Control

grl 6/98 page 9

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Transform-Domain Weighted Interleave VQ (2) Background: • Audio coding at extremely low bitrates (6-8 kbit/s) • CELP speech coders do not perform well for music • 0.5 Bits per frequency line at these data rates !!

MPEG-4: • Transform-Domain Weighted Interleave (TwinVQ) as alternative coding kernel – Vector selection under control of the perceptual model • Fully integrated into MPEG-4 AAC coding system: – Uses same spectral representation as AAC coder – Makes use of other MPEG-4 tools (e.g. LTP, TNS, joint stereo coding) grl 6/98 page 10

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Transform-Domain Weighted Interleave VQ (3) Structure: • Normalization of spectral coefficients: – LPC envelope (overall spectral shape) – Periodic component coding (harmonic components) – Bark-scale envelope coding (additional flattening)

• Vector Quantization (VQ) process: – Interleaving of spectral coefficients into new sub-vectors – Vector quantization (two sets of codebooks, weighted distortion measure allows distortion control by perceptual model)

⇒ no bit/noise allocation or rate control iteration

grl 6/98 page 11

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder

Scalability Definition: • Capability to decode useful sub-sets of the bitstream

Types of • SNR / NMR (Noise to Mask Ratio) Scalability: Scalability: – “Extension layers improve the SNR/NMR of the coded signal” • Audio Bandwidth Scalability: – “Extension layers increase the decodable audio band width”

• Restriction of Generality: – Very low core coder optimized for special signals, e.g speech. Additional layers provide good quality for all types of signals.

• Implementation Complexity: grl 6/98 page 12

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Application examples • Network based (packetized) transmission – Requires routers which know about the importance of a packet – Less important (outer layer) packets may be dropped if the available bandwidth decreases

• Broadcast – The most important (inner layer) packets are transmitted with a better error protection scheme

• Music data base – High quality content is encoded and stored – Access to a lower quality version is possible without recoding to allow for pre-listening with a lower quality grl 6/98 page 13

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable GA Coder (I) Encoder Block Diagram

- Reconstruct F Q&C S MDCT Spectrum + Q&C S

Perceptual Model

• Encoding of the error signal of an AAC or Twin-VQ Quantization and Coding (Q&C) module in a second, or third, or n-th similar quantization module in the frequency domain • Solutions using only AAC, or only Twin-VQ modules possible • Additionally, Twin-VQ / AAC combinations defined • Useful for large enhancement steps of >= 8 kbit/s per step grl 6/98 page 14

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable GA Coder (II) • Twin-VQ Q&C Modules Decoder Block Diagram – 8 kbit/s fixed step size Vector Quantizer (VQ) modules Reconstruct – optional 6 kbit/s in first layer Spectrum Inv. – first choice for a 6 or 8 kbit/s base layer for the coding FSS of general audio signals + • AAC Q&C modules Reconstruct Add – Any step size possible Spectrum – Reasonable step sizes from 8 to >64 kbit/s – The same end quality can be achieved as from a single step AAC coder IMDCT – However, a higher bit rate may be required for the same audio quality

grl 6/98 page 15

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable GA Coder : Combination with CELP Coder (I)

CELP MDCT Quantization FSS & Coding

MDCT Encoder

Perceptual. model

• Very low bitrate core coder ( e.g. speech coder) • Core coder typically operating at a lower sampling frequency • MDCT used for efficient up-sampling grl 6/98 page 16

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable GA Coder : Combination with Core Coder (II)

CELP MDCT IMDCT DECODER IFSS

Re- quantization + Decoder Re- quantization

grl 6/98 page 17

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable Stereo Coding: Stereo / Stereo

grl 6/98 page 18

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable Stereo Coding: Mono / Stereo

grl 6/98 page 19

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable Stereo Coding: Mono Core / Mono GA / Stereo GA

grl 6/98 page 20

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder Scalable GA Coder : Typical Configurations • Some successfully tested mono/mono combinations: 6 kbit/s CELP + 18 kbit/s AAC 6 kbit/s TwinVQ + 18 kbit/s AAC 8 kbit/s TwinVQ + 8 kbit/s TwinVQ

6 kbit/s CELP + 18 kbit/s + 24 kbit/s AAC

• Mono/stereo combinations 6 kbit/s mono CELP + 18 kbit/s mono + 24 kbit/s stereo AAC 24 kbit/s mono + 16 kbit/s stereo + 16 kbit/s stereo AAC 24 kbit/s mono + 72 kbit/s stereo AAC

• Stereo/stereo combinations 2 x 6 kbit/s mono CELP + 36 kbit/s stereo AAC grl 6/98 page 21

115

FRAUNHOfER Institut IntegrierteSchaltungen 4,5 4 The MPEG-4 General Audio Coder Results 3,5(I) Mono Configurations 3 2,5 2 1,5 1

0,5 CELP+A Tw inV Q Layer 3 AAC 24 AAC 18 AC 24 + AAC 024 kbit/s kbit/s kbit/s kbit/s 24 kbit/s

Series1 Series2 3,5 4,1 3,85 3,65 3,27 grl 6/98 page 22

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder

Results5 (II) Mono / Stereo Configuration

4,5 Mono Stereo 4

3,5

3

2,5

2

1,5

1

0,5

0 L3 24 AAC scal L3 40 AAC scal L3 56 AAC scal kbit/s 24 24 24 kbit/s 40 40 kbit/s 56 56 grl 6/98 kbit/s kbit/s kbit/s kbit/s kbit/s kbit/s page 23

115

FRAUNHOfER Institut IntegrierteSchaltungen The MPEG-4 General Audio Coder

Conclusions • Highest quality coding with proven AAC technology • PNS, LTP and TwinVQ further enhance the very low bitrate performance • Mono, Stereo, and Multi-channel Stereo supported • Bitrate range 6 - ~300 kbit/s per channel at 8 - 96 kHz SR • Additional flexibility with the scalable coding modes – Unique capabilities through the availability of the mono- stereo coding modes • Overall complexity within the limits of today’s hardware

•==> • The MPEG-4 GA coder the most versatile audio coding system available today • Low-Delay and Error Resilience Additions in MPEG-4 Version 2 grl 6/98 page 24

115

FRAUNHOfER Institut IntegrierteSchaltungen