RT Computer and Machine Vision

An Overview of MPEG and Image Processing Tips

December 12, 2017, Sam Siewert

FFMPEG FAQ

Read It!! http://ffmpeg.org/faq.html

You should know how to Decode Video (recorded from your camera or pre-recorded by someone else)
You should know how to Encode Video (to turn in with your labs)

Ffmpeg (avconv) Notes

sudo apt-get install ffmpeg
ffmpeg -i movie.mpg -ss 30 -t 30 movie%d.ppm   (extracts 30 seconds starting at 30 sec)

ssiewert@ssiewert-VirtualBox:~/a485/media$ ffmpeg -i big_buck_bunny_480p_surround-fix.avi -ss 30 -t 30 bbb%d.ppm
ffmpeg version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2000-2013 the Libav developers
  built on Apr 2 2013 17:02:36 with gcc 4.6.3
Input #0, avi, from 'big_buck_bunny_480p_surround-fix.avi':
  Duration: 00:09:56.45, start: 0.000000, bitrate: 2957 kb/s
    Stream #0.0: Video: mpeg4 (Simple Profile), yuv420p, 854x480 [PAR 1:1 DAR 427:240], 24 tbr, 24 tbn, 24 tbc
    Stream #0.1: Audio: ac3, 48000 Hz, 5.1, s16, 448 kb/s
Incompatible pixel format 'yuv420p' for 'ppm', auto-selecting format 'rgb24'
[buffer @ 0x907700] w:854 h:480 pixfmt:yuv420p
[avsink @ 0x9054c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out'
[scale @ 0x905b60] w:854 h:480 fmt:yuv420p -> w:854 h:480 fmt:rgb24 flags:0x4
Output #0, image2, to 'bbb%d.ppm':
  Metadata:
    encoder : Lavf53.21.1
    Stream #0.0: Video: ppm, rgb24, 854x480 [PAR 1:1 DAR 427:240], q=2-31, 200 kb/s, 90k tbn, 24 tbc
Stream mapping:
  Stream #0.0 -> #0.0
Press ctrl-c to stop encoding
...
Last message repeated 719 times
-0kB time=29.00 bitrate= -0.0kbits/s
frame= 720 fps= 38 q=0.0 Lsize= -0kB time=30.00 bitrate= -0.0kbits/s
video:864686kB audio:0kB global headers:0kB muxing overhead -100.000002%
ssiewert@ssiewert-VirtualBox:~/a485/media$

Now with PPM Frames

PPM is Simple, but No Compression
– Good for CV
– http://en.wikipedia.org/wiki/Netpbm_format - Read this!
– JPEG, PNG are Compressed
– TIFF is an Alternative, but More Complex
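Because PPM is uncompressed, the frames that ffmpeg writes can be read with very little code. The following is a minimal sketch, assuming binary (P6) PPM files with a maxval of 255, which matches the rgb24 output shown above. The function names `read_ppm` and `write_ppm` are illustrative and not part of any course library.

```python
def write_ppm(path, width, height, pixels):
    """Write a binary (P6) PPM; pixels is a row-major list of (r, g, b) tuples."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    with open(path, "wb") as f:
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        for r, g, b in pixels:
            f.write(bytes((r, g, b)))


def read_ppm(path):
    """Read a binary (P6) PPM with maxval 255; returns (width, height, pixels)."""
    with open(path, "rb") as f:
        data = f.read()
    tokens, pos = [], 0
    # The header holds four whitespace-separated tokens: magic, width, height, maxval.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":  # header comment runs to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P6" or maxval != 255:
        raise ValueError("only binary P6 PPM with maxval 255 is handled in this sketch")
    # Exactly one whitespace byte separates maxval from the raw RGB bytes.
    body = data[pos + 1:pos + 1 + 3 * width * height]
    if len(body) != 3 * width * height:
        raise ValueError("truncated pixel data")
    pixels = [tuple(body[i:i + 3]) for i in range(0, len(body), 3)]
    return width, height, pixels
```

A frame such as bbb1.ppm could then be loaded with `width, height, pixels = read_ppm("bbb1.ppm")`, and pixel (x, y) is `pixels[y * width + x]`.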

Simple Re-encode

When Quality is not a Concern, Keep it Simple
ffmpeg -f image2 -i bbb%d.ppm bbbtrans.mpg
vlc bbbtrans.mpg

Quality Encoding is Tricky

Use MPEG4 HQ Settings, Encode 480p, AR=4:3
ffmpeg -f image2 -i bbb%d.ppm -maxrate 20000k -bufsize 32M -s 640x480 -vcodec mpeg4 -qscale 1 bbbtranshq.mp4
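For labs that decode and re-encode repeatedly, the commands above can be scripted. This is a sketch only: it builds the same argument lists shown on the slides and passes them to `subprocess.run`. The helper names are made up for illustration, and the flag spellings are copied from the slides (ffmpeg 0.8.6), so a newer ffmpeg build may prefer different spellings.

```python
import subprocess


def decode_to_ppm_cmd(movie, start_sec, duration_sec, pattern):
    """Arguments to extract duration_sec seconds of frames starting at start_sec."""
    return ["ffmpeg", "-i", movie, "-ss", str(start_sec), "-t", str(duration_sec), pattern]


def encode_simple_cmd(pattern, out_file):
    """Arguments for the simple re-encode, default quality."""
    return ["ffmpeg", "-f", "image2", "-i", pattern, out_file]


def encode_hq_cmd(pattern, out_file):
    """Arguments for the MPEG4 high-quality 640x480 encode from the slide."""
    return ["ffmpeg", "-f", "image2", "-i", pattern,
            "-maxrate", "20000k", "-bufsize", "32M",
            "-s", "640x480", "-vcodec", "mpeg4", "-qscale", "1", out_file]


def run(cmd):
    """Run the command; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(decode_to_ppm_cmd("movie.mpg", 30, 30, "movie%d.ppm"))` followed by `run(encode_hq_cmd("movie%d.ppm", "out.mp4"))`. Passing a list instead of a shell string avoids quoting problems with the `%d` pattern.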

Color and Object Recognition Demo (Revisited)

Object Recognition and Tracking Using Color in Real-Time
Use Color Models or “Signatures” for Known Objects
– General Color Perception and Recognition: Computer Vision
– Specific Color Signature Recognition: Machine Vision
– Controlled Lighting, a priori algorithm; tracks not primary colors, but rather centroids of objects with a color signature

ECEN 4623/5623 – University of Colorado Boulder

Basic Concepts

Single Camera Tilt/Pan Object Tracking
– 2 Axes of Rotation
– +/- 45 deg Tilt Rotation Servo
– +/- 45 deg Pan Rotation Servo
– Side or Rear Mounting of Tilt Servo

[Figure: Front View and Side View of the single-camera mount, showing the camera, pan servo and pan rotation, tilt servo and tilt rotation, and the mounting plane]

Dual Camera Tilt/Pan Tracking

Baseline with 2 Fixed Cameras
– Pan Servo Pans Entire Baseline
– Tilt Servo Tilts Pan Servo
– Side Mounting Plate

[Figure: two cameras on a common baseline, with the pan servo and pan rotation, tilt servo and tilt rotation, and the mounting plane]

Target Centroid

Apply Edge Detection or Enhancement
– High Pass Filter PSF Convolution (PSF = Point Spread Function)
– E.g. Edge Enhancement kernel

    -k/8   -k/8   -k/8
    -k/8   k+1    -k/8
    -k/8   -k/8   -k/8

– 9 multiplications at every pixel and accumulate for new pixel value
– k=0,1,2,3,… (k=0, no change)
Raster Processing to Find Edges on Rows
Raster Processing to Find Edges on Columns
Use Target Shape Characteristics
Threshold Filter to Clean up Sharpened Image
Chapter 24, “The Scientist and Engineer’s Guide to Digital Signal Processing”, by Steven Smith
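The edge enhancement kernel above can be applied with a direct 3x3 convolution. The sketch below assumes a grayscale image stored as a list of rows of integers in [0-255]. It copies border pixels unchanged and clamps results, both of which are choices made here rather than anything the slide specifies. The name `edge_enhance` is illustrative.

```python
def edge_enhance(image, k):
    """Apply the 3x3 kernel with -k/8 around a center weight of k+1.

    image is a list of rows of ints in 0..255. Border pixels are copied as-is
    (an assumption of this sketch) and results are clamped to 0..255.
    """
    height, width = len(image), len(image[0])
    kernel = [[-k / 8.0] * 3 for _ in range(3)]
    kernel[1][1] = k + 1.0
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0.0
            # 9 multiply-accumulates per output pixel; the kernel is symmetric,
            # so flipping it for a true convolution would change nothing.
            for j in range(3):
                for i in range(3):
                    acc += kernel[j][i] * image[y + j - 1][x + i - 1]
            out[y][x] = max(0, min(255, int(round(acc))))
    return out
```

The kernel weights sum to 1, so flat regions pass through unchanged while differences between a pixel and its neighbors are amplified by a factor that grows with k.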

Edge Enhancement/Filter

[Figure: example images in two rows, "RGB Sharpen and Filter" and "Balanced Gray Sharpen and Filter", each with a "Sharpen" panel]

Pixel Coordinates

Define Image Coordinates to Track Object Centroid
Tilt/Pan With Servos to Keep Target Centroid in FOV Center

[Figure: image frame with X and Y pixel axes; Frame Origin at Pixel Address 0,0; Reference Target at Pixel Address 160,120; NTSC 320 Pixel Column by 240 Pixel Row Format]

Stereo Ranging with Common Tilt/Pan Fixed Camera Baseline

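[Figure: stereo geometry with the target centroid at distance d from the L and R lenses, which are separated by baseline b; each lens sits at focal length f from its detector; the centroid is offset by dl on the L detector and dr on the R detector; angles α and θ are marked]

$$\Delta_{centroid} = d_l + d_r$$

$$\frac{\Delta_{centroid}}{f} = \frac{b}{d}, \qquad f \equiv \text{focal length}$$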
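Solving the similar-triangles relation for distance gives d = b·f / (dl + dr). A minimal sketch follows, assuming dl, dr, and f are all expressed in the same unit (for example pixels), so that d comes out in the unit of the baseline b. The function name `stereo_range` is illustrative.

```python
def stereo_range(baseline, focal_length, dl, dr):
    """Distance to target from the centroid offsets on the two detectors.

    Uses delta_centroid / f = b / d with delta_centroid = dl + dr.
    focal_length, dl and dr must share a unit; the result has the unit of baseline.
    """
    delta_centroid = dl + dr
    if delta_centroid <= 0:
        raise ValueError("no disparity: target is at or beyond the measurable range")
    return baseline * focal_length / delta_centroid
```

For example, `stereo_range(10.0, 500.0, 20.0, 30.0)` returns 100.0. Small disparities give large distances, so a one-pixel centroid error matters far more for distant targets than for near ones.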

Pixel to Servo Calibration at Distance

[Figure: NTSC 320 Pixel Column by 240 Pixel Row frame. Reference Target at Pixel Address 160,120. After panning one servo increment, the pan-shifted frame shows the target at Pixel Address 158,120. After tilting one servo increment, the tilt-shifted frame shows the target at Pixel Address 160,122]

Pan 1 Servo Increment and Find Centroid X Pixel Change
Tilt 1 Servo Increment and Find Centroid Y Pixel Change
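The calibration step amounts to dividing the measured centroid shift by the number of servo increments commanded. A small sketch using the pixel addresses from the figure; the function name `pixels_per_increment` is made up for illustration.

```python
def pixels_per_increment(reference, shifted, increments=1):
    """Centroid pixel change (dx, dy) per servo increment.

    reference and shifted are (x, y) centroid pixel addresses measured before
    and after commanding the servo by the given number of increments.
    """
    if increments == 0:
        raise ValueError("increments must be non-zero")
    dx = (shifted[0] - reference[0]) / increments
    dy = (shifted[1] - reference[1]) / increments
    return dx, dy


# Values from the calibration figure: one pan increment, then one tilt increment.
pan_gain = pixels_per_increment((160, 120), (158, 120))
tilt_gain = pixels_per_increment((160, 120), (160, 122))
```

Here `pan_gain` is (-2.0, 0.0) and `tilt_gain` is (0.0, 2.0). Commanding several increments and dividing reduces the effect of centroid noise on the estimate.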

Characterize Camera FOV at Distances

FOV Width/Height Linear as a Function of Distance
Pixels/Inch Not Necessarily Linear as Function of Distance
Use to Calibrate Servo Step Size (Gain) for Target Centering
– For Servo Step Sizes of 1 Increment in Tilt/Pan, Camera Will Track Slowly
– For Larger Servo Step Sizes May Over-shoot
– Determine Deadbands (Servo Limits for Pixel Change Accuracy)
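One way to balance slow tracking against overshoot is a proportional step with a deadband and a step limit. The sketch below is one possible reading of the bullets above, not a prescribed controller. It treats the deadband as a pixel error below which no servo command is issued, and `servo_step` with its parameters are hypothetical names.

```python
def servo_step(error_px, px_per_increment, deadband_px, max_step):
    """Servo increments to command for one axis.

    error_px is the pixel shift wanted for the target (center minus centroid),
    px_per_increment is the calibrated pixel change per servo increment,
    deadband_px suppresses commands for errors the servo cannot resolve,
    max_step limits the step size to reduce overshoot.
    """
    if px_per_increment == 0:
        raise ValueError("px_per_increment must be non-zero")
    if abs(error_px) <= deadband_px:
        return 0
    step = int(round(error_px / px_per_increment))
    return max(-max_step, min(max_step, step))
```

With a gain of 2 pixels per increment, a deadband of 2 pixels, and a limit of 5 increments, an error of 30 pixels commands 5 increments and an error of 1 pixel commands none.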

[Figure: two plots. "CCTV FOV analysis" plots distance from target against FOV width, with measured FOV and a linear fit, y = 1.2132x - 1.1711. "Pixels Per Inch at Distances" plots Pixels Per Inch against Distance to Target]

Finding Centroid In Image

Target Known by Color, Shape, or Brightness
Raster to Find Target Edges and Max Width and Height Scanline for Symmetric targets
Mark Centroid in Image for Easy Debug
Noise in Image Will Cause Centroid Error – Use Averaging
Pan Right to Move Target Left in Image
Tilt Up to Move Target Down in Image
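The raster approach can be sketched as follows, assuming a target that is brighter than a threshold in a grayscale image stored as a list of rows. The centroid X is taken as the midpoint of the widest row of target pixels and the centroid Y as the midpoint of the tallest column, which suits symmetric targets. The function names are illustrative.

```python
def find_centroid(image, threshold):
    """Return (x, y) of the target centroid, or None if no pixel passes the threshold."""
    height, width = len(image), len(image[0])
    best_width, cx = 0, None
    for y in range(height):  # raster rows to find the max-width scanline
        xs = [x for x in range(width) if image[y][x] >= threshold]
        if xs and xs[-1] - xs[0] + 1 > best_width:
            best_width = xs[-1] - xs[0] + 1
            cx = (xs[0] + xs[-1]) / 2.0
    best_height, cy = 0, None
    for x in range(width):  # raster columns to find the max-height scanline
        ys = [y for y in range(height) if image[y][x] >= threshold]
        if ys and ys[-1] - ys[0] + 1 > best_height:
            best_height = ys[-1] - ys[0] + 1
            cy = (ys[0] + ys[-1]) / 2.0
    if cx is None or cy is None:
        return None
    return cx, cy


def average_centroids(centroids):
    """Average several (x, y) centroids, e.g. from consecutive frames, to reduce noise."""
    n = len(centroids)
    return sum(c[0] for c in centroids) / n, sum(c[1] for c in centroids) / n
```

A single noisy pixel above the threshold can stretch a scanline, so thresholding after filtering and averaging over a few frames both help stabilize the result.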

More Tips

Consider Controlled Lighting or, For Mobile Robots, On-Board Lighting with LEDs
Use Frame Grabber ADC Sensitivity Settings to Control Brightness
Consider Automatic Calibration Sequences
– Place Reference Targets to Set Pixel Step Size as a Function of Servo Step Size
– Use Stereo Range Estimation to Determine Distance to Target and Set Tracking Gains for Current Distance
When Target is Lost, Go Into Search Mode
– Start from Max Tilt/Pan and Raster to Min Tilt/Pan to Find Target
– For Search Modes Use Coarser Step Size
Consider Small Servo Step Size, But High Frame Rate and Servo Command Rate
Be Careful of Processor Overload with Image Processing at 30 FPS

Project Suggestions

Target Tracker – Tilt/Pan Camera
Target Tracker – Fixed Camera, Tilt/Pan Laser Pointer
Stereo Ranging Tracker
Stereo Scene Imager
– Raster a Scene with Tilt/Pan Laser Pointer
– Fixed Camera Ranges to Each Laser Pointer Location
– Builds 3-D Scene Map
Line Follower Mobile Robot
– Downward Camera Keeps Robot On Course
– Image Processing Drives Steering Commands
– Can Look Upward and Use Laser Pointer to detect Obstacles
GPS Coarse Navigation with Close-In Computer Vision
Scanners – Image to XY Plot
Combine with Robotic Projects for Arm Navigation

Video Media

Embedding Video

Codec = Compression, Decompression

Build Your Own
– Run Length Encoding
– Difference Images
– Python Viewer (Displays PPM sequences)
– X-Windows Viewer (Displays PPM sequences)

Theora Open Source Option
– http://www.theora.org/
– Stream over Raw TCP to VLC Viewer

MJPEG Open Source Option
– http://mjpeg.sourceforge.net/
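For the build-your-own option, run length encoding is the simplest place to start. The sketch below uses a byte-oriented format of (count, value) pairs with counts capped at 255. That format is an assumption chosen for simplicity and is not a standard.

```python
def rle_encode(data):
    """Encode bytes as (count, value) pairs; runs longer than 255 are split."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded):
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)
```

Encoding is lossless, but data with few repeated bytes, such as a noisy camera frame, can double in size, so it is worth comparing the encoded and raw lengths before sending.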

Notes on Computer Color Encoding

RGB, 24-bit, [0-255] for each color band
Each Pixel is a 3-D Vector in RGB Space

[Figure: RGB color cube with corners labeled Black, Red, Green, Blue, Yellow, Cyan, Magenta, and White]

YUV/YCrCb ↔ RGB

An Alternative to RGB is YUV, Where Y is Luminance and CrCb is Chrominance

The following 2 sets of formulae are taken from information from Keith Jack's excellent book "Video Demystified" (ISBN 1-878707-09-4).

RGB to YUV Conversion (For Computers with RGB [0-255])
– Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16
– Cr = V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128
– Cb = U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128

YUV to RGB Conversion
– B = 1.164(Y - 16) + 2.018(U - 128)
– G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
– R = 1.164(Y - 16) + 1.596(V - 128)

In both these cases, you have to clamp the output values to keep them in the [0-255] range.
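The two sets of formulae, including the clamping step, translate directly into code. A minimal sketch with illustrative function names; rounding to the nearest integer before clamping is a choice made here.

```python
def clamp(value):
    """Round to the nearest integer and clamp to the [0-255] range."""
    return max(0, min(255, int(round(value))))


def rgb_to_yuv(r, g, b):
    """RGB [0-255] to (Y, U, V), where U = Cb and V = Cr."""
    y = (0.257 * r) + (0.504 * g) + (0.098 * b) + 16
    v = (0.439 * r) - (0.368 * g) - (0.071 * b) + 128
    u = -(0.148 * r) - (0.291 * g) + (0.439 * b) + 128
    return clamp(y), clamp(u), clamp(v)


def yuv_to_rgb(y, u, v):
    """(Y, U, V) back to RGB [0-255], clamping out-of-range results."""
    b = 1.164 * (y - 16) + 2.018 * (u - 128)
    g = 1.164 * (y - 16) - 0.813 * (v - 128) - 0.391 * (u - 128)
    r = 1.164 * (y - 16) + 1.596 * (v - 128)
    return clamp(r), clamp(g), clamp(b)
```

Black maps to Y=16 and white to Y=235 with U=V=128, so Y does not use the full [0-255] range. A round trip can be off by a count or so because of rounding.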

RGB to Grayscale

From 24 bits to 8 bits most often

Single Color Band from RGB
– Not True Grayscale, but Useful for Computer Vision Applications
– Some Targets Like a Laser Pointer are Best Seen in Red Band or Green Band Alone

GIMP Uses a Conversion to 8-bit Luminance
– Y = 0.3R + 0.59G + 0.11B
– For equal amounts of color, the eye is most sensitive to green, then red, and then blue
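Both grayscale options are short to implement. The sketch below assumes pixels are stored as a list of (r, g, b) tuples. `single_band` and `balanced_gray` are illustrative names.

```python
def single_band(pixels, band):
    """Take one color band only: band 0 = red, 1 = green, 2 = blue."""
    return [p[band] for p in pixels]


def balanced_gray(pixels):
    """8-bit luminance using Y = 0.3R + 0.59G + 0.11B."""
    gray = []
    for r, g, b in pixels:
        y = 0.3 * r + 0.59 * g + 0.11 * b
        gray.append(max(0, min(255, int(round(y)))))
    return gray
```

For a red laser pointer spot, `single_band(pixels, 0)` keeps the full red signal, while the balanced conversion weights it by only 0.3.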

R, G, or B band only vs. Balanced

[Figure: example images labeled R, G, B, and Balanced]

Building Your Own Video Codec

Video Compression Spaces
– Color Space
    RGB (24 bits)
    YCrCb (16 bits / pixel) – Lossy compared to RGB
    Grayscale (8 bits) – Lossy
– XY Dimension
    As an Image: Convolution/Deconvolution (Lossy)
    – Convolution: Moving Average of Pixels to Compress Multiple Pixels to One
    – Deconvolution: Interpolation to Estimate Original Pixel Values Adjacent to Compressed Pixel
    As A String
    – Run Length Encoding (Lossless)
    – Huffman Encoding (Lossless)
– Frame to Frame Time Dimension
    Difference Images (Lossless or Lossy with Thresholds)
    – Pixel Address and data for non-zero ∆pixels
        Pixel Address for 320x240 = 17 bits
        ∆pixel = 24 bits for RGB
    – Scenes often don’t change quickly
    – Transmission of Change-Only Data
    – Threshold on ∆pixel to Compress more (Lossy)
    – Detection of Size Blow-up on Fast Changing Data
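The difference-image idea can be sketched as a list of (pixel address, pixel data) records for changed pixels only. Two simplifications are assumed here: frames are flat lists of (r, g, b) tuples, and each record carries the new pixel value rather than the signed difference, which keeps the data field at 24 bits. The function names are illustrative.

```python
ADDRESS_BITS = 17  # enough for 320x240 = 76,800 pixel addresses
PIXEL_BITS = 24    # RGB


def diff_encode(prev, curr, threshold=0):
    """List (address, new_pixel) for pixels whose largest band change exceeds threshold.

    threshold=0 is lossless; a larger threshold drops small changes (lossy).
    """
    changes = []
    for address, (p, c) in enumerate(zip(prev, curr)):
        if max(abs(c[0] - p[0]), abs(c[1] - p[1]), abs(c[2] - p[2])) > threshold:
            changes.append((address, c))
    return changes


def diff_apply(prev, changes):
    """Rebuild the current frame from the previous frame and the change list."""
    frame = list(prev)
    for address, pixel in changes:
        frame[address] = pixel
    return frame


def is_blowup(changes, num_pixels):
    """True when the change list would be larger than sending the raw frame."""
    return len(changes) * (ADDRESS_BITS + PIXEL_BITS) > num_pixels * PIXEL_BITS
```

Each record costs 41 bits against 24 bits for a raw pixel, so once more than about 58% of the pixels change, sending the full frame is cheaper. With a lossy threshold the receiver's frame drifts from the true frame, so comparing against the reconstructed frame, or sending a full frame periodically, keeps the error bounded.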

YCrCb 4:2:2 16-bit Format

For every 2 Y samples in a scanline, there is one CrCb sample
– Y, Cr, and Cb Samples are 8 bits each
– Two RGB Pixels = 48 bits, Whereas Two YCrCb Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (1/3 smaller frame size)

[Figure: 320x240 frame with pixels numbered 0 to 319 in the first row through 76,480 to 76,799 in the last row; alternating pixels are marked as carrying a Y, Cr, and Cb sample or a Y sample only]

Pixel-0 = Y0[7:0], Cb0[7:0]; Pixel-1 = Y1[7:0], Cr0[7:0]

Pixel-2 = Y2[7:0], Cb1[7:0]; Pixel-3 = Y3[7:0], Cr1[7:0]

Pixel-4 = Y4[7:0], Cb2[7:0]; Pixel-5 = Y5[7:0], Cr2[7:0]
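The pixel layout above can be produced from RGB by converting each pixel pair and keeping one chroma pair for both. In this sketch the chroma is taken from the first pixel of each pair, which is an assumption (averaging the two would also be reasonable), and the bytes are written in the order Y, Cb, Y, Cr to follow the listing. The function names are illustrative.

```python
def _clamp(value):
    return max(0, min(255, int(round(value))))


def _rgb_to_ycbcr(r, g, b):
    """Same coefficients as the RGB to YUV conversion, with U = Cb and V = Cr."""
    y = (0.257 * r) + (0.504 * g) + (0.098 * b) + 16
    cb = -(0.148 * r) - (0.291 * g) + (0.439 * b) + 128
    cr = (0.439 * r) - (0.368 * g) - (0.071 * b) + 128
    return _clamp(y), _clamp(cb), _clamp(cr)


def pack_422(row):
    """Pack a scanline of (r, g, b) pixels into 4:2:2 bytes: Y0 Cb0 Y1 Cr0 Y2 Cb1 Y3 Cr1 ..."""
    if len(row) % 2 != 0:
        raise ValueError("scanline must hold an even number of pixels")
    out = bytearray()
    for i in range(0, len(row), 2):
        y0, cb, cr = _rgb_to_ycbcr(*row[i])      # chroma taken from the even pixel
        y1, _, _ = _rgb_to_ycbcr(*row[i + 1])    # odd pixel contributes Y only
        out += bytes((y0, cb, y1, cr))
    return bytes(out)
```

A 320 pixel scanline packs into 640 bytes instead of 960 bytes of RGB, which is the 1/3 reduction noted above.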

Basic Definitions

Useful Wikipedia Pages
– PPM - http://en.wikipedia.org/wiki/Portable_pixmap
– GIF - http://en.wikipedia.org/wiki/GIF
– JPEG - http://en.wikipedia.org/wiki/JPEG
– MPEG - http://en.wikipedia.org/wiki/MPEG
– Theora - http://en.wikipedia.org/wiki/Theora

PPM and PGM Info
– http://netpbm.sourceforge.net/doc/ppm.html (RGB)
– http://netpbm.sourceforge.net/doc/pgm.html (grayscale)

MPEG Info
– http://www.mpeg.org/MPEG/index.html
– http://www.compression-links.info/MPEG

DivX Info
– http://www.divx.com/divx/

Video Driver and Frame Analysis Resources

Test Dumping Frame over TSFS
– Slow, but sure
– Can load dumped frame to analyze

Single Frame Viewing and Analysis
– http://www.irfanview.com/
– http://www.trilon.com/xv/downloads.html
– http://www.gimp.org/downloads/

Image Processing Libraries
– http://cimg.sourceforge.net/
– http://sourceforge.net/projects/opencvlibrary/

Using Python PPM Stream Viewer

Test Your Python and Vpipe Installation
– Run vpipe_display.py first
– Run frametx_test.py second

Set up and test your Btvid Bt878 driver and hardware

Test Image capture using “report” and write_save_buffer to dump PPM image over TSFS

Write TCP/IP client code to stream 1 frame/sec to Vpipe Display

More on Streaming

Streaming = Codec + Data Transport
– E.g. MPEG-4 / RTP
– Your Codec / UDP

Transport Protocols
– UDP – Connectionless Datagrams, No Delivery Guarantee
    Diversely Routed Data Can Arrive Out of Order
    Datagrams Lost Are Not Re-transmitted
– TCP – Connection-oriented Messaging, Guarantee for Window
    All Messages Segmented, Sequenced, and Fully Acknowledged
    All Messages Re-assembled from Segments and Re-Ordered
    Any Lost Messages Re-transmitted from Re-Transmission Window
    – Re-transmission Window Based on Bandwidth-Delay, Congestion
    – After a Maximum Number of Retries, TCP Finally Gives Up
– RTP/UDP – Real-Time Transport Protocol
    Payload type, Sequence Number, Time-stamp, Delivery Monitoring
    http://www.ietf.org/rfc/rfc1889.txt
– RTSP – Real-Time Streaming Protocol
    Typically Used to Control RTP Delivery, but can use UDP or other transport
    http://www.ietf.org/rfc/rfc2326.txt
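When streaming your own codec over UDP, the receiver has to cope with lost and out-of-order datagrams itself. The sketch below adds a sequence number and time-stamp to each payload, in the spirit of RTP. The 8-byte header layout is hypothetical and is not the RTP wire format, and the function names are illustrative.

```python
import struct

# Hypothetical header: 32-bit sequence number, 32-bit time-stamp in ms, network byte order.
HEADER = struct.Struct("!II")


def make_packet(sequence, timestamp_ms, payload):
    """Prefix the payload bytes with the sequence number and time-stamp."""
    return HEADER.pack(sequence, timestamp_ms) + payload


def parse_packet(packet):
    """Return (sequence, timestamp_ms, payload) from a received datagram."""
    sequence, timestamp_ms = HEADER.unpack_from(packet)
    return sequence, timestamp_ms, packet[HEADER.size:]


def reorder(packets):
    """Sort received datagrams by sequence number and list the missing numbers."""
    parsed = sorted((parse_packet(p) for p in packets), key=lambda item: item[0])
    if not parsed:
        return [], []
    seen = {item[0] for item in parsed}
    missing = [s for s in range(parsed[0][0], parsed[-1][0] + 1) if s not in seen]
    return parsed, missing
```

Each packet would be sent with `socket.sendto` on a `SOCK_DGRAM` socket. For live video it is usually better to skip a missing frame than to wait for it, which is the main reason to prefer UDP over TCP here.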

Project Suggestions

Motion Detection Video Stream Storage and Playback
– Motion Detection Threshold for Difference Images
– Compress on Store and Un-compress on Retrieval for Display
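A motion detection threshold on difference images might look like the sketch below. It assumes grayscale frames stored as flat lists of integers and uses two hypothetical tuning parameters: a per-pixel change threshold to reject noise, and a minimum count of changed pixels to declare motion.

```python
def motion_detected(prev, curr, pixel_threshold, count_threshold):
    """True when enough pixels changed by more than pixel_threshold between frames."""
    changed = sum(1 for p, c in zip(prev, curr) if abs(c - p) > pixel_threshold)
    return changed >= count_threshold


def frames_with_motion(frames, pixel_threshold, count_threshold):
    """Indices of frames that differ enough from the frame just before them."""
    return [i for i in range(1, len(frames))
            if motion_detected(frames[i - 1], frames[i], pixel_threshold, count_threshold)]
```

Storing only the frames returned by `frames_with_motion`, along with their time-stamps, is one simple way to cut storage for a mostly static scene.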

Computer Vision Projects

Video Editing

Digital Video Recorder
