RT Computer and Machine Vision
An Overview of MPEG and Image Processing Tips
December 12, 2017
Sam Siewert

FFMPEG FAQ
Read It!! http://ffmpeg.org/faq.html
You should know how to Decode Video (recorded from your camera or pre-recorded by someone else)
You should know how to Encode Video (to turn in with your labs)

Ffmpeg (avconv) Notes
sudo apt-get install ffmpeg
ffmpeg -i movie.mpg -ss 30 -t 30 movie%d.ppm    -- 30 seconds @ 30 sec (seek 30 sec in, then extract 30 sec of frames)

ssiewert@ssiewert-VirtualBox:~/a485/media$ ffmpeg -i big_buck_bunny_480p_surround-fix.avi -ss 30 -t 30 bbb%d.ppm
ffmpeg version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2000-2013 the Libav developers
  built on Apr 2 2013 17:02:36 with gcc 4.6.3
Input #0, avi, from 'big_buck_bunny_480p_surround-fix.avi':
  Duration: 00:09:56.45, start: 0.000000, bitrate: 2957 kb/s
    Stream #0.0: Video: mpeg4 (Simple Profile), yuv420p, 854x480 [PAR 1:1 DAR 427:240], 24 tbr, 24 tbn, 24 tbc
    Stream #0.1: Audio: ac3, 48000 Hz, 5.1, s16, 448 kb/s
Incompatible pixel format 'yuv420p' for codec 'ppm', auto-selecting format 'rgb24'
[buffer @ 0x907700] w:854 h:480 pixfmt:yuv420p
[avsink @ 0x9054c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out'
[scale @ 0x905b60] w:854 h:480 fmt:yuv420p -> w:854 h:480 fmt:rgb24 flags:0x4
Output #0, image2, to 'bbb%d.ppm':
  Metadata:
    encoder : Lavf53.21.1
    Stream #0.0: Video: ppm, rgb24, 854x480 [PAR 1:1 DAR 427:240], q=2-31, 200 kb/s, 90k tbn, 24 tbc
Stream mapping:
  Stream #0.0 -> #0.0
Press ctrl-c to stop encoding
... Last message repeated 719 times -0kB time=29.00 bitrate= -0.0kbits/s
frame= 720 fps= 38 q=0.0 Lsize= -0kB time=30.00 bitrate= -0.0kbits/s
video:864686kB audio:0kB global headers:0kB muxing overhead -100.000002%
ssiewert@ssiewert-VirtualBox:~/a485/media$

Now with PPM Frames
PPM is Simple, but No Compression
– Good for CV
– http://en.wikipedia.org/wiki/Netpbm_format - Read this!
– JPEG, PNG are Compressed
– TIFF is an Alternative, but More Complex
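
Once the frames are on disk as PPM files, they can be loaded with a few lines of code. The following is a minimal sketch, assuming the frames are binary (P6) PPM files with 8 bits per color band; the function name read_ppm is hypothetical and not part of any course code.

```python
def read_ppm(path):
    """Parse a binary (P6) PPM file.

    Returns (width, height, maxval, pixels), where pixels is a bytes
    object of length width * height * 3 in R, G, B order.
    """
    with open(path, "rb") as f:
        data = f.read()

    tokens = []
    pos = 0
    # The header holds four whitespace-separated tokens:
    # magic number, width, height, maxval. Lines starting with '#' are comments.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            while pos < len(data) and data[pos:pos + 1] != b"\n":
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates maxval from pixel data

    magic = tokens[0]
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    if maxval > 255:
        raise ValueError("this sketch only handles 8-bit color bands")

    pixels = data[pos:pos + width * height * 3]
    if len(pixels) != width * height * 3:
        raise ValueError("truncated pixel data")
    return width, height, maxval, pixels
```

For a frame from the run above, the call would look like read_ppm("bbb1.ppm") and should report an 854x480 image.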

Simple Re-encode
When Quality is not a Concern, Keep it Simple
ffmpeg -f image2 -i bbb%d.ppm bbbtrans.mpg
vlc bbbtrans.mpg

Quality Encoding is Tricky
Use MPEG4 HQ Settings, Encode 480p, AR=4:3
ffmpeg -f image2 -i bbb%d.ppm -maxrate 20000k -bufsize 32M -s 640x480 -vcodec mpeg4 -qscale 1 bbbtranshq.mp4

Color and Object Recognition Demo (Revisited)
Object Recognition and Tracking Using Color in Real-Time
Use Color Models or “Signatures” for Known Objects
– General Color Perception and Recognition – Computer Vision
– Specific Color Signature Recognition – Machine Vision
– Controlled Lighting, A Priori Algorithm: Tracks Centroids of Objects with a Color Signature Rather Than Primary Colors
ECEN 4623/5623 – University of Colorado Boulder

Basic Concepts
Single Camera Tilt/Pan Object Tracking
– 2 Axes of Rotation
– +/- 45 deg Tilt Rotation Servo
– +/- 45 deg Pan Rotation Servo
– Side or Rear Mounting of Tilt Servo
[Figure: Front View and Side View of the single-camera mount, showing the camera, pan servo, tilt servo, pan rotation, tilt rotation, and mounting plane]

Dual Camera Tilt/Pan Tracking
Baseline with 2 Fixed Cameras
– Pan Servo Pans Entire Baseline
– Tilt Servo Tilts Pan Servo
– Side Mounting Plate
[Figure: two cameras on a common baseline, showing pan servo, tilt servo, pan rotation, tilt rotation, and mounting plane]

Target Centroid
Apply Edge Detection or Enhancement
– High Pass Filter PSF Convolution
– Point Spread Function
– E.g. Edge Enhancement kernel
      -k/8   -k/8   -k/8
      -k/8   k+1    -k/8
      -k/8   -k/8   -k/8
– 9 multiplications at every pixel and accumulate for new pixel value
– k=0,1,2,3,… (k=0, no change)
Raster Processing to Find Edges on Rows
Raster Processing to Find Edges on Columns
Use Target Shape Characteristics
Threshold Filter to Clean up Sharpened Image
Chapter 24, “The Scientist and Engineer’s Guide to Digital Signal Processing”, by Steven Smith
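
The edge enhancement kernel can be tried out on a small grayscale image. This is a minimal sketch, assuming the image is a list of rows of 8-bit gray values and that border pixels are simply copied (border handling is not specified on the slide); the names sharpen_kernel and convolve3x3 are hypothetical.

```python
def sharpen_kernel(k):
    """3x3 edge enhancement PSF: -k/8 around the edge, k+1 in the center."""
    e = -k / 8.0
    return [[e, e, e],
            [e, k + 1.0, e],
            [e, e, e]]


def convolve3x3(image, kernel):
    """Apply a 3x3 kernel to a grayscale image (list of rows of 0-255 values).

    Nine multiply-accumulates per output pixel. Border pixels are copied
    unchanged, which is an assumption made to keep the sketch short.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0.0
            for j in range(3):
                for i in range(3):
                    acc += kernel[j][i] * image[y + j - 1][x + i - 1]
            # Clamp the accumulated value back into the 8-bit range.
            out[y][x] = min(255, max(0, int(round(acc))))
    return out
```

The kernel weights sum to 1 for every k, so flat regions pass through unchanged and only pixels that differ from their neighbors are amplified.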

Edge Enhancement/Filter
[Figure: example images in two rows. RGB: original, Sharpen, Sharpen and Filter. Balanced Gray: original, Sharpen, Sharpen and Filter]

Pixel Coordinates
Define Image Coordinates to Track Object Centroid
Tilt/Pan With Servos to Keep Target Centroid in FOV Center
[Figure: image frame with X and Y axes. Frame Origin at Pixel Address 0,0; Reference Target at Pixel Address 160, 120; NTSC 320 Pixel Column Format by NTSC 240 Pixel Row Format]

Stereo Ranging with Common Tilt/Pan
Fixed Camera Baseline
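\[ \Delta_{centroid} = (d_l + d_r), \qquad \frac{\Delta_{centroid}}{f} = \frac{b}{d}, \qquad f \equiv \text{focal length} \]
[Figure: stereo geometry. Target centroid at distance d from the L lens and R lens, which are separated by b; angles α and θ; each lens sits a focal length f in front of its detector; dl and dr are the centroid offsets on the L detector and R detector]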
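
Reading the figure as similar triangles, ∆centroid / f = b / d, so the distance follows as d = b f / ∆centroid. A minimal sketch of that calculation, assuming b is the lens baseline, d is the distance to the target, and dl, dr, and f are all expressed in the same units (for example pixels); the function name stereo_range is hypothetical.

```python
def stereo_range(baseline, focal_length, dl, dr):
    """Estimate distance to the target from the stereo centroid offsets.

    baseline: separation b between the left and right lenses.
    focal_length: f, in the same units as dl and dr.
    dl, dr: centroid offsets on the left and right detectors.
    Returns the distance d in the same units as baseline.
    """
    delta_centroid = dl + dr
    if delta_centroid <= 0:
        raise ValueError("no measurable disparity; target too far or mismatched")
    return baseline * focal_length / delta_centroid
```

Because d is inversely proportional to ∆centroid, a one-pixel error matters far more for distant targets than for near ones.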

Pixel to Servo Calibration at Distance
[Figure: NTSC 320 Pixel Column by 240 Pixel Row frame. Reference Target at Pixel Address 160, 120; Pan One Servo Increment gives the Pan Shifted Target at Pixel Address 158, 120; Tilt One Servo Increment gives the Tilt Shifted Target at Pixel Address 160, 122]
Pan 1 Servo Increment and Find Centroid X Pixel Change
Tilt 1 Servo Increment and Find Centroid Y Pixel Change
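
The calibration amounts to dividing the centroid shift by the number of servo increments commanded. A minimal sketch using the pixel addresses from the figure; the function name pixels_per_step is hypothetical.

```python
def pixels_per_step(reference, shifted, steps=1):
    """Centroid shift in pixels per servo increment, as (dx, dy).

    reference, shifted: (x, y) centroid pixel addresses before and after
    commanding `steps` servo increments on one axis.
    """
    dx = (shifted[0] - reference[0]) / steps
    dy = (shifted[1] - reference[1]) / steps
    return dx, dy


# Values from the figure: one pan increment moves the target from
# (160, 120) to (158, 120); one tilt increment moves it to (160, 122).
pan_gain = pixels_per_step((160, 120), (158, 120))
tilt_gain = pixels_per_step((160, 120), (160, 122))
```

Commanding several increments and dividing by the count is one way to average out centroid noise in the measurement.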

Characterize Camera FOV at Distances
FOV Width/Height Linear as a Function of Distance
Pixels/Inch Not Necessarily Linear as Function of Distance
Use to Calibrate Servo Step Size (Gain) for Target Centering
– For Servo Step Sizes of 1 Increment in Tilt/Pan, Camera Will Track Slowly
– For Larger Servo Step Sizes May Over-shoot
– Determine Deadbands (Servo Limits for Pixel Change Accuracy)
[Figure: two plots. “CCTV FOV analysis”: FOV width and distance from target, measured FOV with linear fit y = 1.2132x - 1.1711. “Pixels Per Inch at Distances”: Pixels Per Inch vs. Distance to Target]
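
One way to turn a pixel error into a servo command is a proportional step with a deadband and a step limit. This is a minimal sketch, not the course's tracking loop: the function name servo_steps, the default deadband of 2 pixels, and the default limit of 5 increments are all assumptions chosen for illustration.

```python
def servo_steps(error_px, px_per_step, deadband_px=2, max_steps=5):
    """Servo increments to command for a given centroid error on one axis.

    error_px: target centroid minus the FOV center, in pixels.
    px_per_step: measured centroid shift per servo increment (may be negative).
    deadband_px: errors at or below this size are ignored to avoid hunting.
    max_steps: cap on the command size to limit overshoot.
    """
    if px_per_step == 0:
        raise ValueError("calibration gain must be non-zero")
    if abs(error_px) <= deadband_px:
        return 0
    # Command the shift that cancels the error, then clamp it.
    steps = round(-error_px / px_per_step)
    return max(-max_steps, min(max_steps, steps))
```

A small max_steps tracks slowly but smoothly, while a large one risks overshoot, which is the trade-off the bullets describe.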

Finding Centroid In Image
Target Known by Color, Shape, or Brightness
Raster to Find Target Edges and Max Width and Height Scanline for Symmetric Targets
Mark Centroid in Image for Easy Debug
Noise in Image Will Cause Centroid Error – Use Averaging
Pan Right to Move Target Left in Image
Tilt Up to Move Target Down in Image
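
A minimal sketch of the raster search, assuming a target known by brightness in a grayscale image (a list of rows) and a symmetric target, so the centroid is taken as the midpoint of its extent in X and Y. The names find_centroid and average_centroid are hypothetical.

```python
def find_centroid(image, threshold):
    """Raster the image and return the (x, y) center of the bright target.

    Pixels at or above `threshold` are treated as target. The centroid is
    the midpoint of the target's horizontal and vertical extent.
    Returns None when no pixel passes the threshold (target lost).
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


def average_centroid(centroids):
    """Average several per-frame centroids to reduce the effect of noise."""
    n = len(centroids)
    return (sum(c[0] for c in centroids) / n,
            sum(c[1] for c in centroids) / n)
```

A single noisy pixel above the threshold stretches the extent, which is why a threshold filter before the search and averaging across frames both help.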

More Tips
Consider Controlled Lighting or, For Mobile Robots, On-Board Lighting with LEDs
Use Frame Grabber ADC Sensitivity Settings to Control Brightness
Consider Automatic Calibration Sequences
– Place Reference Targets to Set Pixel Step Size as a Function of Servo Step Size
– Use Stereo Range Estimation to Determine Distance to Target and Set Tracking Gains for Current Distance
When Target is Lost, Go Into Search Mode
– Start from Max Tilt/Pan and Raster to Min Tilt/Pan to Find Target
– For Search Modes Use Coarser Step Size
Consider Small Servo Step Size, But High Frame Rate and Servo Command Rate
Be Careful of Processor Overload with Image Processing at 30 FPS

Project Suggestions
Target Tracker – Tilt/Pan Camera
Target Tracker – Fixed Camera, Tilt/Pan Laser Pointer
Stereo Ranging Tracker
Stereo Scene Imager
– Raster a Scene with Tilt/Pan Laser Pointer
– Fixed Camera Ranges to Each Laser Pointer Location
– Builds 3-D Scene Map
Line Follower Mobile Robot
– Downward Camera Keeps Robot On Course
– Image Processing Drives Steering Commands
– Can Look Upward and Use Laser Pointer to Detect Obstacles
GPS Coarse Navigation with Close-In Computer Vision
Scanners – Image to XY Plot
Combine with Robotic Projects for Arm Navigation

Video Media

Embedding Video Codecs
Codec = Compression, Decompression
Build Your Own
– Run Length Encoding
– Difference Images
– Python Viewer (Displays PPM sequences)
– X-Windows Viewer (Displays PPM sequences)
Theora/Ogg Open Source Option
– http://www.theora.org/
– Stream over Raw TCP to VLC Viewer
MJPEG Open Source Option
– http://mjpeg.sourceforge.net/

Notes on Computer Color Encoding
RGB, 24-bit, [0-255] for each color band
Each Pixel is a 3-D Vector in RGB Space
[Figure: RGB color cube with corners Black, Red, Green, Blue, Yellow, Cyan, Magenta, and White]

YUV/YCrCb and RGB
An Alternative to RGB is YUV, Where Y is Luminance and CrCb is Chrominance
The following 2 sets of formulae are taken from information from Keith Jack's excellent book "Video Demystified" (ISBN 1-878707-09-4).
RGB to YUV Conversion (For Computers with RGB [0-255])
– Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16
– Cr = V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128
– Cb = U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128
YUV to RGB Conversion
– B = 1.164(Y - 16) + 2.018(U - 128)
– G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
– R = 1.164(Y - 16) + 1.596(V - 128)
In both these cases, you have to clamp the output values to keep them in the [0-255] range.
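
The two sets of formulae translate directly into code. A minimal sketch with the clamping applied after rounding; the function names are hypothetical and rounding to the nearest integer is an assumption (the formulae do not say how to quantize).

```python
def clamp(value):
    """Round to the nearest integer and clamp into [0, 255]."""
    return max(0, min(255, int(round(value))))


def rgb_to_yuv(r, g, b):
    """RGB [0-255] to (Y, U, V), where U = Cb and V = Cr."""
    y = (0.257 * r) + (0.504 * g) + (0.098 * b) + 16
    v = (0.439 * r) - (0.368 * g) - (0.071 * b) + 128
    u = -(0.148 * r) - (0.291 * g) + (0.439 * b) + 128
    return clamp(y), clamp(u), clamp(v)


def yuv_to_rgb(y, u, v):
    """(Y, U, V) back to RGB [0-255]."""
    b = 1.164 * (y - 16) + 2.018 * (u - 128)
    g = 1.164 * (y - 16) - 0.813 * (v - 128) - 0.391 * (u - 128)
    r = 1.164 * (y - 16) + 1.596 * (v - 128)
    return clamp(r), clamp(g), clamp(b)
```

With these coefficients black maps to Y = 16 and white to Y = 235, and a round trip can be off by a count or two per band because of the integer quantization.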

RGB to Grayscale
From 24 bits to 8 bits most often
Single Color Band from RGB
– Not True Grayscale, but Useful for Computer Vision Applications
– Some Targets Like a Laser Pointer are Best Seen in Red Band or Green Band Alone
GIMP Uses a Conversion to 8-bit Luminance
– Y = 0.3R + 0.59G + 0.11B
– Weights reflect that, for equal amounts of each color, the eye is most sensitive to green, then red, and then blue
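
Both conversions are one-liners on packed RGB data. A minimal sketch, assuming the pixels are a bytes object in R, G, B order (as in a PPM file); the function names are hypothetical.

```python
def luminance_gray(rgb):
    """Balanced 8-bit grayscale using Y = 0.3R + 0.59G + 0.11B."""
    gray = bytearray()
    for i in range(0, len(rgb), 3):
        r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
        gray.append(min(255, int(round(0.3 * r + 0.59 * g + 0.11 * b))))
    return bytes(gray)


def single_band(rgb, band):
    """Extract one color band: band 0 = red, 1 = green, 2 = blue."""
    return bytes(rgb[band::3])
```

For a red laser pointer target, single_band(pixels, 0) keeps the band where the spot is brightest and discards the rest.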

R, G, or B band only vs. Balance
[Figure: the same image shown as R band, G band, B band, and Balanced grayscale]

Building Your Own Video Codec
Video Compression Spaces
– Color Space
    RGB (24 bits)
    YCrCb (16 bits / pixel) – Lossy compared to RGB
    Grayscale (8 bits) – Lossy
– XY Dimension
    As an Image
      Convolution/Deconvolution (Lossy)
      – Convolution: Moving Average of Pixels to Compress Multiple Pixels to One
      – Deconvolution: Interpolation to Estimate Original Pixel Values Adjacent to Compressed Pixel
    As a String
      – Run Length Encoding (Lossless)
      – Huffman Encoding (Lossless)
– Frame to Frame Time Dimension
    Difference Images (Lossless or Lossy with Thresholds)
    – Pixel Address and Data for Non-zero ∆pixels
        Pixel Address for 320x240 = 17 bits
        ∆pixel = 24 bits for RGB
    – Scenes often don’t change quickly
    – Transmission of Change-Only Data
    – Threshold on ∆pixel to Compress more (Lossy)
    – Detection of Size Blow-up on Fast Changing Data
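
Run length encoding and difference images are both short enough to sketch. The following assumes frames are lists of (r, g, b) tuples indexed by pixel address, and that each transmitted change stores the new pixel value rather than a signed delta; that choice, and all function names, are assumptions made for illustration.

```python
def rle_encode(values):
    """Lossless run length encoding: list of (run_length, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1
        else:
            runs.append([1, v])
    return [(n, v) for n, v in runs]


def rle_decode(runs):
    out = []
    for n, v in runs:
        out.extend([v] * n)
    return out


def diff_encode(prev, curr, threshold=0):
    """Change-only data: (pixel address, new pixel) where any band changed
    by more than `threshold`. threshold=0 is lossless; larger is lossy."""
    changes = []
    for addr, (p, c) in enumerate(zip(prev, curr)):
        if max(abs(a - b) for a, b in zip(p, c)) > threshold:
            changes.append((addr, c))
    return changes


def diff_decode(prev, changes):
    frame = list(prev)
    for addr, pixel in changes:
        frame[addr] = pixel
    return frame


def diff_size_bits(num_changes, num_pixels, bits_per_pixel=24):
    """Bits to send a difference image: address plus pixel per change."""
    address_bits = (num_pixels - 1).bit_length()  # 17 for 320x240
    return num_changes * (address_bits + bits_per_pixel)
```

Comparing diff_size_bits against num_pixels * 24 detects the size blow-up: for 320x240 RGB each change costs 41 bits, so once more than about 58% of the pixels change it is cheaper to send the raw frame.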

YCrCb 4:2:2 16-bit Format
For every 2 Y samples in a scanline, there is one CrCb sample
– Each Y, Cr, and Cb Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (1/3 smaller frame size)
[Figure: 320x240 pixel grid, addresses 0 … 319 on the first row through 76,480 … 76,799 on the last row; the legend distinguishes pixels with a Y, Cr, and Cb sample from pixels with a Y sample only]
Pixel-0 = Y0[7:0], Cb0[7:0]; Pixel-1 = Y1[7:0], Cr0[7:0]
Pixel-2 = Y2[7:0], Cb1[7:0]; Pixel-3 = Y3[7:0], Cr1[7:0]
Pixel-4 = Y4[7:0], Cb2[7:0]; Pixel-5 = Y5[7:0], Cr2[7:0]
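
The pixel layout above packs each pair of pixels into four bytes in the order Y, Cb, Y, Cr. A minimal sketch of packing and unpacking, assuming the shared chroma for a pair is the average of the two pixels' chroma (the layout itself does not say how the shared sample is chosen); the function names are hypothetical.

```python
def pack_422(pixels):
    """Pack (Y, Cb, Cr) pixels into 4:2:2 bytes: Y0 Cb Y1 Cr per pixel pair.

    pixels: list of (y, cb, cr) tuples with an even length.
    """
    if len(pixels) % 2:
        raise ValueError("4:2:2 packing needs an even number of pixels")
    out = bytearray()
    for i in range(0, len(pixels), 2):
        (y0, cb0, cr0), (y1, cb1, cr1) = pixels[i], pixels[i + 1]
        # One Cb and one Cr sample are shared by the two pixels.
        out += bytes([y0, (cb0 + cb1) // 2, y1, (cr0 + cr1) // 2])
    return bytes(out)


def unpack_422(data):
    """Expand 4:2:2 bytes back to (Y, Cb, Cr) pixels, repeating the chroma."""
    pixels = []
    for i in range(0, len(data), 4):
        y0, cb, y1, cr = data[i:i + 4]
        pixels += [(y0, cb, cr), (y1, cb, cr)]
    return pixels
```

A 320x240 frame packs to 153,600 bytes this way, compared with 230,400 bytes for 24-bit RGB.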

Basic Definitions
Useful Wikipedia Pages
– PPM - http://en.wikipedia.org/wiki/Portable_pixmap
– GIF - http://en.wikipedia.org/wiki/GIF
– JPEG - http://en.wikipedia.org/wiki/JPEG
– MPEG - http://en.wikipedia.org/wiki/MPEG
– Theora - http://en.wikipedia.org/wiki/Theora
PPM and PGM Info
– http://netpbm.sourceforge.net/doc/ppm.html (RGB)
– http://netpbm.sourceforge.net/doc/pgm.html (grayscale)
MPEG Info
– http://www.mpeg.org/MPEG/index.html
– http://www.compression-links.info/MPEG
DivX Info
– http://www.divx.com/divx/

Video Driver and Frame Analysis Resources
Test Dumping Frame over TSFS
– Slow, but sure
– Can load dumped frame to analyze
Single Frame Viewing and Analysis
– http://www.irfanview.com/
– http://www.trilon.com/xv/downloads.html
– http://www.gimp.org/downloads/
Image Processing Libraries
– http://cimg.sourceforge.net/
– http://sourceforge.net/projects/opencvlibrary/

Using Python PPM Stream Viewer
Test Your Python and Vpipe Installation
– Run vpipe_display.py first
– Run frametx_test.py second
Set up and test your Btvid Bt878 driver and hardware
Test Image capture using “report” and write_save_buffer to dump PPM image over TSFS
Write TCP/IP client code to stream 1 frame/sec to Vpipe Display

More on Streaming
Streaming = Codec + Data Transport
– E.g. MPEG-4 / RTP
– Your Codec / UDP
Transport Protocols
– UDP – Connectionless Datagrams, No Delivery Guarantee
    Diversely Routed Data Can Arrive Out of Order
    Datagrams Lost Are Not Re-transmitted
– TCP – Connection-oriented Messaging, Guarantee for Window
    All Messages Segmented, Sequenced, and Fully Acknowledged
    All Messages Re-assembled from Segments and Re-Ordered
    Any Lost Messages Re-transmitted from Re-Transmission Window
    – Re-transmission Window Based on Bandwidth-Delay, Congestion
    – After a Maximum Number of Retries, TCP Finally Gives Up
– RTP/UDP – Real-Time Transport Protocol
    Payload Type, Sequence Number, Time-stamp, Delivery Monitoring
    http://www.ietf.org/rfc/rfc1889.txt
– RTSP – Real-Time Streaming Protocol
    Typically Used to Control RTP Delivery, but can use UDP or other transport
    http://www.ietf.org/rfc/rfc2326.txt
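
The payload type, sequence number, and time-stamp live in a 12-byte fixed header that RTP puts in front of each UDP payload. A minimal sketch of packing and unpacking that header, assuming version 2 with no padding, no extension, and no CSRC entries; the function names and the example values are hypothetical.

```python
import struct


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (network byte order)."""
    byte0 = 2 << 6  # version = 2, padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet):
    """Decode the fixed header fields from the start of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The receiver uses the sequence number to detect lost or out-of-order datagrams and the time-stamp to schedule playback, which is what UDP alone does not provide.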

Project Suggestions
Motion Detection Video Stream Storage and Playback
– Motion Detection Threshold for Difference Images
– Compress on Store and Un-compress on Retrieval for Display
Computer Vision Projects
Video Editing
Digital Video Recorder