<<

18-899 Special Topics in

Multimedia Communications: Coding, Systems, and Networking

Prof. Tsuhan Chen [email protected]

Lecture 8

MPEG-1 Audio

1 MPEG-1 Audio

• Outline – Background – – Subband coding – Layer I and II – Layer III – Frame structure and packetization

18-899/Spring 1998/Chen

MPEG-1 Audio

• ISO/IEC 11172-3 (1988~1991) • First high quality audio compression standard • CD quality two-channel audio at 256 kbits/s – CD: 44.1 kHz × 16 bits × 2 = 1.411 Mbits/s

Frequency Sampling Bits per Raw Bitrate Band (Hz) Rate Sample Telephone 300~3400 8 8 64 Speech Wideband 50~7000 16 8 128 Speech Mediumband 10~11000 24 16 384 Audio Wideband 10~22000 48 16 768 Audio

18-899/Spring 1998/Chen

2 Quality Demonstration

• MPEG-1 Audio (Layer II) – Stereo 44.1 kHz at 64 kbits/s

– Stereo 44.1 kHz at 128 kbits/s

– Stereo 44.1 kHz at 192 kbits/s

– Stereo 44.1 kHz at 256 kbits/s

18-899/Spring 1998/Chen

Psychoacoustics

• Threshold in quiet

18-899/Spring 1998/Chen

3 Frequency Masking

18-899/Spring 1998/Chen

Temporal Masking

Post-Masking: 50~200ms

Also Pre-Masking (much shorter)

18-899/Spring 1998/Chen

4 Encoder Block Diagram

PCM encoded audio samples 32, 44.1, 48 kHz bitstream quantizer frame mapping and packing coding

psychoacoustic model 11172-3 Encoder

ancillary data

18-899/Spring 1998/Chen

Decoder Block Diagram

PCM encoded audio samples bitstream 32, 44.1, 48 kHz frame inverse reconstruction unpacking mapping

11172-3 Decoder

ancillary data

18-899/Spring 1998/Chen

5 Mapping: Subband Coding

H1(z) M Q M F1(z)

H (z) 2 M Q M F2(z)

HM(z) M Q M FM(z) Analysis Synthesis Filterbank Filterbank • Critical downsampling • Q should be based on signal-to-masking ratio (SMR) • Ear’s critical bands are not uniform, but logarithmic

18-899/Spring 1998/Chen

Polyphase Filterbank

M M z -1 z M M E(z) R(z) ......

z -1 z M M

• Alias cancellation and perfect reconstruction

18-899/Spring 1998/Chen

6 Layers

• Increasing complexity, delay, and quality • Layer I ~384 kbits/s for perceptually lossless quality (4:1) • Layer II ~192 kbits/s for perceptually lossless quality (8:1) • Layer III ~128 kbits/s for perceptually lossless quality (12:1)

18-899/Spring 1998/Chen

Layer I and II Encoder

32 Analysis Scaler & Filterbank Quantizer Mux

Masking Dynamic Coder FFT Threshold Bit Generator Allocator

18-899/Spring 1998/Chen

7 Block-Based Coding 12 12 12

Analysis Filterbank ...

Layer I

Layer II • 12 samples for Layer I, 36 samples for Layer II • Block : Each block normalized by scalefactor • For Layer II, up to 3 scalefactors, with 2-bit scalefactor select • Each block receives one bit allocation

Layer III Encoder

6 or 18 with overlap Analysis MDCT Scaler & Huffman Quantizer Coding Filterbank Mux

Masking Coding FFT Threshold Generator

18-899/Spring 1998/Chen

8 New Features in Layer III

• Modified DCT (MDCT) – DCT with overlap – Long/short window switching • Short for better temporal resolution (to prevent pre-echoes) • Long for better frequency resolution • Nonuniform quantization • Entropy coding – Run-length and • Bit reservoir (buffer)

18-899/Spring 1998/Chen

Frame Structure

Header Info Side Info Subband Sanples Aux Data

• Header info: Sync bits, system info, CRC (cyclic redundancy ) • Side info: bit allocation, scalefactor, (and scalefactor select for Layer II and III) • Subband samples: 32 × 12 for Layer I, 32 × 36 for Layer II and III • Packetization: 4-byte header, 184-byte payload

18-899/Spring 1998/Chen

9 Stereo Redundancy Coding

• Four modes: mono, stereo, dual with two separate channel, joint stereo • In joint stereo mode – Human stereo perception > 2kHz is based on envelope – Intensity stereo coding > 2kHz • Encode (L + R) • Assign independent left- and right- scalefactors • Layer III supports (L+R) and (L–R) coding

18-899/Spring 1998/Chen

References

– Peter Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, Sept. 1997, pp. 59-81

– D. Pan, “A tutorial on MPEG/Audio compression,” IEEE Trans. on Multimedia, vol. 2, no. 2, 1995, pp. 60-74

18-899/Spring 1998/Chen

10