18-899 Special Topics in Signal Processing
Multimedia Communications: Coding, Systems, and Networking
Prof. Tsuhan Chen [email protected]
Lecture 8
MPEG-1 Audio
1 MPEG-1 Audio
• Outline – Background – Psychoacoustics – Subband coding – Layer I and II – Layer III – Frame structure and packetization
18-899/Spring 1998/Chen
MPEG-1 Audio
• ISO/IEC 11172-3 (1988~1991) • First high quality audio compression standard • CD quality two-channel audio at 256 kbits/s – CD: 44.1 kHz × 16 bits × 2 = 1.411 Mbits/s
Frequency Sampling Bits per Raw Bitrate Band (Hz) Rate Sample Telephone 300~3400 8 8 64 Speech Wideband 50~7000 16 8 128 Speech Mediumband 10~11000 24 16 384 Audio Wideband 10~22000 48 16 768 Audio
18-899/Spring 1998/Chen
2 Quality Demonstration
• MPEG-1 Audio (Layer II) – Stereo 44.1 kHz at 64 kbits/s
– Stereo 44.1 kHz at 128 kbits/s
– Stereo 44.1 kHz at 192 kbits/s
– Stereo 44.1 kHz at 256 kbits/s
18-899/Spring 1998/Chen
Psychoacoustics
• Threshold in quiet
18-899/Spring 1998/Chen
3 Frequency Masking
18-899/Spring 1998/Chen
Temporal Masking
Post-Masking: 50~200ms
Also Pre-Masking (much shorter)
18-899/Spring 1998/Chen
4 Encoder Block Diagram
PCM encoded audio samples 32, 44.1, 48 kHz bitstream quantizer frame mapping and packing coding
psychoacoustic model 11172-3 Encoder
ancillary data
18-899/Spring 1998/Chen
Decoder Block Diagram
PCM encoded audio samples bitstream 32, 44.1, 48 kHz frame inverse reconstruction unpacking mapping
11172-3 Decoder
ancillary data
18-899/Spring 1998/Chen
5 Mapping: Subband Coding
H1(z) M Q M F1(z)
H (z) 2 M Q M F2(z)
HM(z) M Q M FM(z) Analysis Synthesis Filterbank Filterbank • Critical downsampling • Q should be based on signal-to-masking ratio (SMR) • Ear’s critical bands are not uniform, but logarithmic
18-899/Spring 1998/Chen
Polyphase Filterbank
M M z -1 z M M E(z) R(z) ......
z -1 z M M
• Alias cancellation and perfect reconstruction
18-899/Spring 1998/Chen
6 Layers
• Increasing complexity, delay, and quality • Layer I ~384 kbits/s for perceptually lossless quality (4:1) • Layer II ~192 kbits/s for perceptually lossless quality (8:1) • Layer III ~128 kbits/s for perceptually lossless quality (12:1)
18-899/Spring 1998/Chen
Layer I and II Encoder
32 Analysis Scaler & Filterbank Quantizer Mux
Masking Dynamic Coder FFT Threshold Bit Generator Allocator
18-899/Spring 1998/Chen
7 Block-Based Coding 12 12 12
Analysis Filterbank ...
Layer I
Layer II • 12 samples for Layer I, 36 samples for Layer II • Block companding: Each block normalized by scalefactor • For Layer II, up to 3 scalefactors, with 2-bit scalefactor select • Each block receives one bit allocation
Layer III Encoder
6 or 18 with overlap Analysis MDCT Scaler & Huffman Quantizer Coding Filterbank Mux
Masking Coding FFT Threshold Generator
18-899/Spring 1998/Chen
8 New Features in Layer III
• Modified DCT (MDCT) – DCT with overlap – Long/short window switching • Short for better temporal resolution (to prevent pre-echoes) • Long for better frequency resolution • Nonuniform quantization • Entropy coding – Run-length and Huffman coding • Bit reservoir (buffer)
18-899/Spring 1998/Chen
Frame Structure
Header Info Side Info Subband Sanples Aux Data
• Header info: Sync bits, system info, CRC (cyclic redundancy code) • Side info: bit allocation, scalefactor, (and scalefactor select for Layer II and III) • Subband samples: 32 × 12 for Layer I, 32 × 36 for Layer II and III • Packetization: 4-byte header, 184-byte payload
18-899/Spring 1998/Chen
9 Stereo Redundancy Coding
• Four modes: mono, stereo, dual with two separate channel, joint stereo • In joint stereo mode – Human stereo perception > 2kHz is based on envelope – Intensity stereo coding > 2kHz • Encode (L + R) • Assign independent left- and right- scalefactors • Layer III supports (L+R) and (L–R) coding
18-899/Spring 1998/Chen
References
– Peter Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, Sept. 1997, pp. 59-81
– D. Pan, “A tutorial on MPEG/Audio compression,” IEEE Trans. on Multimedia, vol. 2, no. 2, 1995, pp. 60-74
18-899/Spring 1998/Chen
10