
Computer and Machine Vision

Deeper Dive into MPEG Digital Encoding

January 22, 2014  Sam Siewert

Reminders

CV and MV Use UNCOMPRESSED FRAMES

Remote Cameras (E.g. Security) May Need to Transport Captured Frames Over a Network to the CV/MV Processor

We NEED to Understand Both!

BEWARE of

I-Frame ONLY or MJPEG Decent Compromise of Both

MPEG: Order of Operations

[Figure: MPEG encoding block diagram with stages labeled #1, #2A, #2B, #2C, and #3]

#1: POINT () Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures

Step #1 – RGB to YCrCb 4:4:4 24-bit (Lossless)

For every Y sample in a scan-line, there is also one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)

Typically a post-production, CEDIA, or DCI format

[Figure: pixel grid indexed 0 to 76,799 (first row 0 … 319, last row 76,480 … 76,799); legend: Y, Cr, and Cb sample; Y sample only]
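The per-pixel conversion behind Step #1 can be sketched as below. The slides do not give the conversion matrix, so the full-range ITU-R BT.601 (JFIF) coefficients used here are an assumption, and `rgb_to_ycrcb` is a hypothetical helper name.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit (Y, Cr, Cb).

    Assumes full-range BT.601 (JFIF) coefficients; the slides do not say
    which matrix is intended.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cr), clamp(cb)


white = rgb_to_ycrcb(255, 255, 255)  # full luma, neutral chroma
black = rgb_to_ycrcb(0, 0, 0)        # zero luma, neutral chroma
red = rgb_to_ycrcb(255, 0, 0)        # Cr saturates for pure red
```

Each pixel still occupies three 8-bit samples after conversion, which is why 4:4:4 stays at 24 bits/pixel.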

Step #1 – RGB to YCrCb 4:2:2 (Lossy)

For every 2 Y samples in a scan-line, one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) sample is 8 bits
– Two RGB pixels = 48 bits, whereas two YCrCb pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)

[Figure: pixel grid indexed 0 to 76,799, showing 48 bits reduced to 32 bits per pixel pair; legend: Y, Cr, and Cb sample; Y sample only]
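A minimal sketch of 4:2:2 subsampling on one scan-line follows. Averaging each horizontal pair of chroma samples is an assumption (an encoder could simply drop every other sample), and `subsample_422` is a hypothetical name.

```python
def subsample_422(scanline):
    """Keep every Y sample, but only one (Cr, Cb) pair per two pixels.

    scanline: list of (Y, Cr, Cb) tuples with even length.
    Chroma for each pixel pair is the integer average of the two pixels
    (an assumption; the slides do not specify the filter).
    """
    if len(scanline) % 2:
        raise ValueError("scan-line length must be even for 4:2:2")
    ys = [p[0] for p in scanline]
    crs = [(scanline[i][1] + scanline[i + 1][1]) // 2
           for i in range(0, len(scanline), 2)]
    cbs = [(scanline[i][2] + scanline[i + 1][2]) // 2
           for i in range(0, len(scanline), 2)]
    return ys, crs, cbs


line = [(10, 100, 200), (20, 110, 210), (30, 120, 220), (40, 130, 230)]
ys, crs, cbs = subsample_422(line)
# 8 bits per retained sample, averaged over the pixels in the line
bits_per_pixel = 8 * (len(ys) + len(crs) + len(cbs)) / len(line)
```

Four pixels now need 4 + 2 + 2 = 8 samples, which gives the 16 bits per pixel quoted on the slide.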

Step #1 – RGB to YCrCb 4:2:0 (Lossy)

For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) sample is 8 bits
– Four RGB pixels = 96 bits, whereas four YCrCb pixels = 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)

[Figure: pixel grid indexed 0 to 76,799; legend: Cr, Cb sample; Y sample only]
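The bits-per-pixel figures for the three sampling formats follow from simple arithmetic, sketched here for the 320x240 frame shown in the figures. `frame_size` is a hypothetical helper; it only assumes 8-bit samples, as the slides state.

```python
def frame_size(width, height, y_per_group, bits_per_sample=8):
    """Return (average bits per pixel, bytes per frame).

    y_per_group is the number of pixels (Y samples) that share a single
    (Cr, Cb) pair: 1 for 4:4:4, 2 for 4:2:2, 4 for 4:2:0.
    """
    group_bits = y_per_group * bits_per_sample + 2 * bits_per_sample
    bits_per_pixel = group_bits / y_per_group
    total_bytes = width * height * group_bits // (y_per_group * 8)
    return bits_per_pixel, total_bytes


bpp_444, bytes_444 = frame_size(320, 240, 1)
bpp_422, bytes_422 = frame_size(320, 240, 2)
bpp_420, bytes_420 = frame_size(320, 240, 4)
```

For 4:2:0, four pixels carry 4 Y samples plus one Cr and one Cb sample, so 48 bits replace the 96 bits of four RGB pixels.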

Step #2 – Convert to 8x8 Macro-blocks and Transform

Aspect Ratios Designed to Fit 8x8

E.g. 640 x 480 => 80 x 60 Macroblocks

Discrete Cosine Transform Applied to Each 8x8
– Spatial Intensity to Frequency Transform
– Applied on X Axis (Row)
– Applied on Y Axis (Column)

Set up for Intra-frame (I-frame) Compression
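Tiling a frame into 8x8 blocks before the transform might look like the following sketch. The frame is a synthetic list of rows, and `split_into_blocks` is a hypothetical helper, not code from the course examples.

```python
def split_into_blocks(frame, n=8):
    """Split a frame (list of equal-length rows) into n x n blocks.

    Returns a dict keyed by (block_row, block_col). Dimensions must be
    multiples of n, which is what "aspect ratios designed to fit 8x8" buys.
    """
    height, width = len(frame), len(frame[0])
    if height % n or width % n:
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = {}
    for by in range(height // n):
        for bx in range(width // n):
            rows = frame[by * n:(by + 1) * n]
            blocks[(by, bx)] = [row[bx * n:(bx + 1) * n] for row in rows]
    return blocks


# Synthetic 640x480 frame whose pixel value is its raster index.
frame = [[x + 640 * y for x in range(640)] for y in range(480)]
blocks = split_into_blocks(frame)
block_count = len(blocks)  # 80 columns x 60 rows of blocks
```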

Convolution Concepts

Math operation on 2 functions that produces a 3rd
Point Spread Function
“Sharpen” meets this Definition
So do Many Mask Operations applied to Pixel Neighborhoods
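A one-dimensional discrete version of the convolution described on this slide can be sketched as follows. The box and sharpen kernels are illustrative values chosen for this example, not taken from the slides.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out


# Two box impulses convolve to a triangle: the overlap area grows, then shrinks.
box = [1, 1, 1]
triangle = convolve(box, box)

# Convolving a unit impulse with a mask returns the mask itself
# (its point spread function). The sharpen mask here is hypothetical.
impulse = [0, 0, 1, 0, 0]
sharpen = [-1, 3, -1]
spread = convolve(impulse, sharpen)
```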

[Figure: 2 impulses, f(t) and g(X – t); the area inside their intersection is f convolved with g over t]

DCT – Discrete Cosine Transform

Convolution of Image with Discrete Cosine
See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
De-convolved to restore image from Convolved Image

[Figure: image after DCT and after Inverse DCT]

DCT Concepts

F(x) is a sum of sinusoids (with frequency, amplitude)
DCT operates on a discrete number of samples
Can derive DC sum at any x, even where F(x) not known
N x N Macro-block has Zero Frequency DC at 0,0
– Increasing Horizontal Frequency
– Increasing Vertical Frequency
Can De-convolve (inverse DCT, or iDCT)
Can Eliminate High Frequency Horizontal and Vertical Terms
– Minimal Losses from Truncation (otherwise lossless)
– Loss of High Frequency Image Features (What are These?)
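For reference, one common way to write the 1D transform pair is the orthonormal DCT-II and its inverse. The symbols s_n (samples), S_k (coefficients), and the scale factor alpha are notation introduced here, not taken from the slides.

```latex
\begin{align*}
S_k &= \alpha(k) \sum_{n=0}^{N-1} s_n \cos\!\left[\frac{\pi (2n+1) k}{2N}\right],
  \qquad k = 0, \dots, N-1 \\
s_n &= \sum_{k=0}^{N-1} \alpha(k)\, S_k \cos\!\left[\frac{\pi (2n+1) k}{2N}\right],
  \qquad n = 0, \dots, N-1 \\
\alpha(0) &= \sqrt{1/N}, \qquad \alpha(k) = \sqrt{2/N} \ \text{ for } k > 0
\end{align*}
```

With this scaling, S_0 is the zero-frequency (DC) term, proportional to the sum of the samples, and higher k correspond to higher frequencies.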

Basic Concept of Waveforms

Complex Waveform is Sum of Simple Fundamentals
Simple Fundamentals Can Be Derived from Complex

Scanline DCT Example

Small Losses Due to DCT, iDCT Numerical Truncation
Larger Losses Due to H.O.T. (Higher-Order Term) Quantization and Truncation
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-N-Fundamentals.xlsx
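The two kinds of loss can be compared with a small round-trip experiment. This sketch assumes the orthonormal DCT-II; the eight sample values are made up for illustration and are not the data in the linked spreadsheet.

```python
import math


def dct(x):
    """Orthonormal 1D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct(c):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


scanline = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
coeffs = dct(scanline)

# Round trip with every term kept: only floating-point error remains.
exact = idct(coeffs)
err_exact = max(abs(a - b) for a, b in zip(scanline, exact))

# Round trip with the four highest-order terms zeroed: a visible loss.
truncated = idct(coeffs[:4] + [0.0] * 4)
err_truncated = max(abs(a - b) for a, b in zip(scanline, truncated))
```

The first error is pure numerical noise, while the second depends on how much energy the dropped high-order terms carried.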

What Is Lost with DCT Quantization?

Noise More Than Anything Else
Complex XY Variable Patterns (Real Science Data?)

Complex Tiling: Higher Frequency X, Higher Frequency Y Terms Can Still be Ignored
Complex Wood Texture: Most Detail in X, Far Less in Y
Randomized Texture Image: High X Detail, High Y Detail; Most Loss of Detail, But Noisy

Step #2A: Macro-block Discrete Cosine Transform

8x8 Pixel Block – Macro-block
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR
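A 2D DCT of one 8x8 block can be built from 1D transforms applied along rows and then columns. This is a sketch assuming the orthonormal DCT-II; it is not the course's OpenCV or C example code.

```python
import math


def dct_1d(x):
    """Orthonormal 1D DCT-II."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n) * sum(
                x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def dct_2d(block):
    """Separable 2D DCT: transform each row (X axis), then each column (Y axis)."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [u][v]; transpose back so the result is indexed [v][u].
    return [list(r) for r in zip(*cols)]


# A flat block has no spatial variation, so all its energy lands in DC at (0, 0).
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
dc = coeffs[0][0]
largest_ac = max(abs(coeffs[v][u])
                 for v in range(8) for u in range(8) if (u, v) != (0, 0))
```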

Step #2B: Macro-block Quantization (Lossy)

Apply Weighting and Scaling 8x8 to DCT
Produces Lots of Repeated Values (and Zeros) Compared to Original
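Quantization divides each DCT coefficient by a per-frequency step size and rounds the result. The weighting table and the coefficient block below are hypothetical values chosen to make the effect visible; they are not the MPEG-2 default matrices.

```python
def quantize(coeffs, qtable):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[int(round(c / q)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    """Decoder side: multiply the levels back by the same step sizes."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]


# Hypothetical weighting table: coarser steps at higher frequencies.
qtable = [[8 + 4 * (u + v) for u in range(8)] for v in range(8)]
# Hypothetical DCT block whose energy falls off quickly with frequency.
coeffs = [[800.0 / (1 + u + v) ** 3 for u in range(8)] for v in range(8)]

levels = quantize(coeffs, qtable)
restored = dequantize(levels, qtable)
zero_count = sum(lv == 0 for row in levels for lv in row)
```

Most high-frequency levels collapse to zero, which is what the later run-length stage exploits; the restored coefficients differ from the originals by at most half a step.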

Decode Process for #2A-B

How Lossy is the Decoded Macro-Block?

OpenCV Macroblock DCT Example

Same Cactus 320x240 with 80x80 DCT Macroblocks

DCT iDCT

Same Cactus 320x240 Again with 8x8 DCT Macroblocks

DCT iDCT

Mathematics for 2D DCT

Frequency Variation on X and Y axes from top left to bottom right (image: http://en.wikipedia.org/wiki/File:Dctjpeg.png)

Straightforward Algorithm Based on 2D Equation is O(n^2) per dimension

Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing (3rd ed.))

http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c
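The 2D equation itself did not survive in the text of this slide. A standard form of the N x N DCT-II is given below as a reference; the symbols p(x,y) for pixel values, C(u,v) for coefficients, and alpha for the scale factor are notation introduced here.

```latex
\begin{align*}
C(u,v) &= \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x,y)
  \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
  \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right] \\
\alpha(0) &= \sqrt{1/N}, \qquad \alpha(k) = \sqrt{2/N} \ \text{ for } k > 0
\end{align*}
```

For N = 8 the product of scale factors reduces to 1/4 when u and v are both nonzero. Because the cosine terms separate, the double sum can be evaluated as 1D transforms over rows and then over columns.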

Step #2C: Macro-block Run-Length and Huffman Encoding

Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in H.O.T. of Quantized DCT
– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0, … Becomes:
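One common run-length scheme codes each nonzero AC coefficient as a (zeros skipped, value) pair and ends the block with an end-of-block marker. The slide's encoded output is not recoverable from the text, so the pair format, the `"EOB"` marker, and the assumption that every coefficient after the ellipsis is zero are all illustrative choices.

```python
def run_length_encode(zigzag):
    """Encode a zig-zag ordered block as (DC, [(zero_run, value), ..., "EOB"]).

    The DC term is kept separately. Trailing zeros are dropped and replaced
    by a single "EOB" (end of block) marker.
    """
    dc, ac = zigzag[0], zigzag[1:]
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    return dc, pairs + ["EOB"]


# The slide's sequence, padded with zeros to 64 coefficients (an assumption).
block = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1,
         0, 0, 0, 0, -1] + [0] * 45
dc, pairs = run_length_encode(block)
```

Under these assumptions the 64 coefficients shrink to one DC value, ten pairs, and the end-of-block marker.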

Huffman Applied to RLE Data

Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2
(Lossless) Compression Based on Probability of Occurrence

Shannon’s Source Coding: -log2(P) bits per symbol, P = probability of occurrence, Binary encoding of Symbols
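The link between symbol probability and code length can be seen by building a Huffman code for a toy symbol stream. MPEG-2 uses fixed tables from the standard rather than building a tree per stream, so this sketch only illustrates the principle; the symbols and function names are hypothetical.

```python
import heapq
import math
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_bits(symbols):
    """Shannon entropy: sum over symbols of -P * log2(P), in bits per symbol."""
    counts, n = Counter(symbols), len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


data = list("AAAABBCD")  # P(A)=1/2, P(B)=1/4, P(C)=P(D)=1/8
code = huffman_code(data)
average_bits = sum(len(code[s]) for s in data) / len(data)
entropy = entropy_bits(data)
```

Because every probability here is a power of two, each code length equals -log2(P) exactly and the average code length matches the entropy.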

Step #3: Group of Pictures Concept – Transmit Change-Only Data

I-Frame Compressed Only Intra-Frame By Methods #2A-2C to Macro-Blocks
I-Frame Can Be Decoded Alone
P-Frame is Differences Only Over the GoP
B-Frame is Differences Only Between Both I-Frame and Closest P-Frame
Difference Data Can be Further Encoded with Lossless Methods
Without Steps 2A-C, Specifically Quantization, and With High Motion Video, Could Blow-Up
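The change-only idea can be sketched with plain frame differencing. This is a deliberate simplification: it stores each later frame as a pixel-wise difference from the previous one, with no motion estimation, no B-frames, and no quantization, so it is not the MPEG algorithm itself. Frames are flat lists of pixel values and all names are hypothetical.

```python
def encode_gop(frames):
    """Keep the first frame whole (I-frame); store the rest as differences."""
    i_frame = list(frames[0])
    deltas = [[cur - prev for cur, prev in zip(cur_frame, prev_frame)]
              for prev_frame, cur_frame in zip(frames, frames[1:])]
    return i_frame, deltas


def decode_gop(i_frame, deltas):
    """Rebuild every frame by accumulating differences onto the I-frame."""
    frames = [list(i_frame)]
    for delta in deltas:
        frames.append([p + d for p, d in zip(frames[-1], delta)])
    return frames


# Mostly static scene: only one pixel changes per frame.
gop = [[10, 10, 10, 10],
       [10, 10, 12, 10],
       [10, 10, 12, 11]]
i_frame, deltas = encode_gop(gop)
decoded = decode_gop(i_frame, deltas)
nonzero = sum(d != 0 for delta in deltas for d in delta)
```

With little motion the difference data is almost all zeros and compresses well with the lossless stages; with high motion most differences are nonzero and the saving disappears.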

Group of Pictures: High Level View

Overall MPEG YCrCb Compression Performance

Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires about 20MB/sec (about 166 Mbps) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, > 40x Compression
– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
– 10 to 20 Programs on QAM 256 (48 Mbps, 6 MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)

HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires about 53MB/sec (about 442 Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, > 20x Compression
– Typical MPEG-4 @ 10 Mbps, > 40x Compression

HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires about 120MB/sec (about 995 Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, > 20x Compression
– Typical MPEG-4 @ 20 Mbps, about 50x Compression
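The uncompressed rates and compression ratios can be recomputed from resolution, frame rate, and 2 bytes per pixel (YCrCb 4:2:2). This sketch counts 8 bits per byte and 1 Mbps as 10^6 bits per second; the function names are hypothetical.

```python
def uncompressed_rate(width, height, fps=30, bytes_per_pixel=2):
    """Return (bytes per frame, bits per second) for uncompressed video."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes, frame_bytes * fps * 8


def compression_ratio(width, height, encoded_mbps, fps=30):
    """Uncompressed bit rate divided by the encoded bit rate."""
    _, bits_per_second = uncompressed_rate(width, height, fps)
    return bits_per_second / (encoded_mbps * 1e6)


sd_bytes, sd_bps = uncompressed_rate(720, 480)
hd720_bytes, hd720_bps = uncompressed_rate(1280, 720)
hd1080_bytes, hd1080_bps = uncompressed_rate(1920, 1080)

sd_mpeg2_ratio = compression_ratio(720, 480, 3.75)
sd_mpeg4_ratio = compression_ratio(720, 480, 1.5)
hd1080_mpeg4_ratio = compression_ratio(1920, 1080, 20)
```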

Parsing an Elementary Video Stream

Many 188-Byte Packet Types and Header Allows for Multiplexing of many Video and Audio Streams on a Carrier

MPEG-4 vs. MPEG-2

MPEG-2
– Defined by ISO 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group – 1988)
– Widely Used for Digital Cable TV, DVD
– Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
    ATSC – Advanced Television Systems Committee (HDTV Broadcast)
    – 8VSB Modulation – 8 level Vestigial Sideband Modulation, 6 MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
    – Up to 1080p (1920x1080) Video Resolution
    – AC-3 (Dolby) Audio
    DVB – Digital Video Broadcasting (Europe, Satellite)
– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)
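Multiplexing on the MPEG-2 Transport Stream relies on the 4-byte header at the start of each 188-byte packet. The sketch below reads the sync byte, payload-unit-start flag, 13-bit packet identifier (PID), and continuity counter; the field layout is as recalled from ISO 13818-1 and should be checked against the standard, and the sample packet is synthetic.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Extract the main header fields from one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        # PID: low 5 bits of byte 1 followed by all 8 bits of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }


def group_by_pid(stream):
    """Demultiplex a byte stream into lists of packets keyed by PID."""
    streams = {}
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        streams.setdefault(parse_ts_header(packet)["pid"], []).append(packet)
    return streams


# Synthetic packets: PID 0x100 with payload start set, then PID 0x101.
video = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
audio = bytes([0x47, 0x01, 0x01, 0x10]) + bytes(184)
header = parse_ts_header(video)
streams = group_by_pid(video + audio + video)
```

Each PID identifies one elementary stream (or table), so grouping packets by PID is the first step in separating the video and audio carried on one channel.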

MPEG-4
– Defined by ISO 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (improved motion prediction for P, B frames), MPEG-4 Part-10 (H.264), e.g. Blu-ray
– Extensions for Digital Rights Management
– Advanced Audio Coding
– Becoming More Widely Deployed for HD and Because of Lower Bit-Rate Transport Streams
