CS 414 – Systems Design Lecture 11 – MP3 Audio & Introduction to MPEG-4 (Part 6)

Klara Nahrstedt Spring 2011

CS 414 - Spring 2011 Administrative

 MP1 – deadline February 18

CS 414 - Spring 2011 Outline  MP3 Audio Encoding  MPEG-2 and Intro to MPEG-4  Reading:  Media Coding book, Section 7.7.2 – 7.7.5  Recommended Paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia, pp. 6-74, 1995  Recommended books on JPEG/ MPEG Audio/Video Fundamentals:  Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”, Chapman and Hall, 1996

CS 414 - Spring 2011 MPEG-1 Audio Encoding

 Characteristics Precision 16 bits Sampling frequency: 32KHz, 44.1 KHz, 48 KHz 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)  Layer 3: 32-320 kbps, target 64 kbps  Layer 2: 32-384 kbps, target 128 kbps  Layer 1: 32-448 kbps, target 192 kbps

CS 414 - Spring 2011 MPEG Audio Encoding Steps

CS 414 - Spring 2011 MPEG Audio Filter Bank

 Filter bank divides input into multiple sub-bands (32 equal frequency sub-bands)  Sub-band i defined 7 7 (2i +1)(k −16)π St[i] = ∑3∑cos( *(C[k + 64 j]* x[k + 64 j] k =0 j=0 64

 i ∈[0,31], St[i]- filter output sample for sub-band i at time t, C[n] – one of 512 coefficients, x[n] – audio input sample from 512 sample buffer

CS 414 - Spring 2011 MPEG Audio Psycho-acoustic Model  MPEG audio compresses by removing acoustically irrelevant parts of audio signals  Takes advantage of human auditory systems inability to hear quantization noise under auditory masking  Auditory masking: occurs when ever the presence of a strong makes a temporal or spectral neighborhood of weaker audio signals imperceptible.

CS 414 - Spring 2011 MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band CS 414 - Spring 2011 MPEG Audio Bit Allocation  This process determines number of code bits allocated to each sub-band based on from the psycho- acoustic model  Algorithm: 1. Compute mask-to-noise ratio: MNR=SNR-SMR  provides tables that give estimates for SNR resulting from quantizing to a given number of quantizer levels 2. Get MNR for each sub-band 3. Search for sub-band with the lowest MNR 4. Allocate code bits to this sub-band.  If sub-band gets allocated more code bits than appropriate, look up new estimate of SNR and repeat step 1

CS 414 - Spring 2011 MP3 Audio Format

Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg

CS 414 - Spring 2011 MPEG Audio Comments  Precision of 16 bits per sample is needed to get good SNR ratio  Noise we are getting is quantization noise from the digitization process  For each added bit, we get 6dB better SNR ratio  Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked away  Raising noise floor is the same as using less bits and using less bits is the same as compression

CS 414 - Spring 2011 MPEG-2 Standard Extension

 MPEG-1 was optimized for CD-ROM and apps for 1.5 Mbps (video strictly non- interleaved)  MPEG-2 adds to MPEG-1: More aspect ratios: 4:2:2, 4:4:4 Progressive and interlaced frame coding Four scalable modes  spatial scalability, data partitioning, SNR scalability, temporal scalability

CS 414 - Spring 2011 MPEG-1/MPEG-2

 Pixel-based representations of content takes place at the encoder  Lack support for content manipulation e.g., remove a date stamp from a video turn off “current score” visual in a live game

 Need support manipulation and interaction if the video is aware of its own content

CS 414 - Spring 2011 MPEG-4 Compression

CS 414 - Spring 2011 Interact With Visual Content

CS 414 - Spring 2011 Original MPEG-4

 Conceived in 1992 to address very low audio and video (64 Kbps)  Required quantum leaps in compression beyond statistical- and DCT-based techniques committee felt it was possible within 5 years

 Quantum leap did not happen

CS 414 - Spring 2011 The “New” MPEG-4  Support object-based features for content

 Enable dynamic rendering of content defer composition until decoding

 Support convergence among digital video, synthetic environments, and the

CS 414 - Spring 2011 MPEG-4 Components  Systems – defines architecture, multiplexing structure, syntax  Video – defines video coding algorithms for animation of synthetic and natural hybrid video (Synthetic/Natural Hybrid Coding)  Audio – defines audio/, Synthetic/Natural Hybrid Coding such as MIDI and text-to-speech synthesis integration  Conformance Testing – defines compliance requirements for MPEG-4 bitstream and decoders  Technical report  DSM-CC Multimedia Integration Framework – defines Digital Storage Media – Command&Control Multimedia Integration Format; specifies merging of broadcast, interactive and conversational multimedia for set-top-boxes and mobile stations CS 414 - Spring 2011 MPEG-4 Example

CS 414 - Spring 2011 ISO N3536 MPEG4 MPEG-4 Example

CS 414 - Spring 2011 ISO N3536 MPEG4 MPEG-4 Example

Daras, P. MPEG-4 Authoring Tool, J. Applied Signal Processing, 9, 1-18, 2003

CS 414 - Spring 2011 Interactive Drama

CS 414 - Spring 2011 http://www.interactivestory.net MPEG-4 Characteristics and Applications

CS 414 - Spring 2011 Media Objects

 An object is called a media object real and synthetic images; analog and synthetic audio; animated faces; interaction

 Compose media objects into a hierarchical representation form compound, dynamic scenes

CS 414 - Spring 2011 Composition

Scene

character furniture

sprite voice desk globe

CS 414 - Spring 2011 ISO N3536 MPEG4 Video Syntax Structure

New MPEG-4 Aspect: Object-based layered syntactic structure

CS 414 - Spring 2011 Content-based Interactivity  Achieves different qualities for different objects with a fine granularity in spatial resolution, temporal resolution and decoding complexity  Needs coding of video objects with arbitrary shapes  Scalability Spatial and temporal scalability Need more than one layer of information (base and enhancement layers) CS 414 - Spring 2011 Examples of Base and Enhancement Layers

CS 414 - Spring 2011 Coding of Objects  Each VOP corresponds to an entity that after being coded is added to the bit stream  Encoder sends together with VOP Composition information where and when each VOP is to be displayed  Users are allowed to change the composition of the entire scene displayed by interacting with the composition

information CS 414 - Spring 2011 Spatial Scalability

VOP which is temporally coincident with I-VOP in the base layer, is encoded as P-VOP in the enhancement layer. VOP which is temporally coincident with P-VOP in the base layer is encoded as B-VOP in the enhancement layer.

CS 414 - Spring 2011 Temporal Scalability

CS 414 - Spring 2011 Composition (cont.)

 Encode objects in separate channels encode using most efficient mechanism transmit each object in a separate stream  Composition takes place at the decoder, rather than at the encoder requires a binary scene description (BIFS)  BIFS is low-level language for describing: hierarchical, spatial, and temporal relations

CS 414 - Spring 2011 MPEG-4 Rendering

CS 414 - Spring 2011 ISO N3536 MPEG4 Interaction as Objects

 Change colors of objects  Toggle visibility of objects  Navigate to different content sections  Select from multiple camera views change current camera angle  Standardizes content and interaction e.g., broadcast HDTV and stored DVD

CS 414 - Spring 2011 Hierarchical Model

 Each MPEG-4 movie composed of tracks each track composed of media elements (one reserved for BIFS information) each media element is an object each object is a audio, video, sprite, etc.  Each object specifies its: spatial information relative to a parent temporal information relative to global timeline

CS 414 - Spring 2011 Synchronization

 Global timeline (high-resolution units) e.g., 600 units/sec

 Each continuous track specifies relation e.g., if a video is 30 fps, then a frame should be displayed every 33 ms.

 Others specify start/end time

CS 414 - Spring 2011 MPEG-4 parts

 MPEG-4 part 2 Includes Advanced Simple Profile, used by such as Quicktime 6  MPEG-4 part 10 MPEG-4 AVC/H.264 also called Used by coders such as Quicktime 7 Used by high-definition video media like Blu- ray Disc

CS 414 - Spring 2011