Computer and Machine Vision
Lecture Week 3
January 27, 2014 Sam Siewert
Outline of Week 3
Processing Images and Moving Pictures – High Level View and Computer Architecture for it
Linux Platforms for Computer/Machine Vision
I/O, Memory and Processing Challenges
Old School Moving Picture Media and Cameras
NTSC OTA (1941, 1953 color, 2009 dead)
– Analog, Interlaced, Continuous Broadcast Transmission or CCTV (Closed Circuit TV)
– Coax Cable or Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression
Analog Cable
AM/FM OTA
Film Projectors
Modern Digital Cameras
Camera Link
– High Frame Rates
– High Data Rates and Resolutions
– Industry Standard for Machine Vision Automation
– E.g. Inspection Systems
– E.g. Sony, IDT, National Instruments
SD-SDI and HD-SDI
– Standard and High Definition Serial Digital Interface
– Standard for Studios, Broadcast
Digital Cinema
– Red Camera
– 1080p, 2K, 4K Resolutions and Much Higher
– Automated Digital Delivery and Projection
Webcams and Mobile Phone Cameras
– Very Low Cost
– Proprietary
– Performance Varies Dramatically
Differences
Analog vs Digital Encoding for Transmission
– Digital Allows for Image Processing
– Adds Latency
– Requires Compression for Packet Switched Networks and Storage
– Routed (Diversely), Buffered
– Compressed (MPEG, JPEG) to Lower Bit-rates
– Multiplexed (Shares Transmission Carrier for Audio, Video, Channels)
– Transported by IP (Large Packets)
Continuous Transmission – Analog or Constant Bit-Rate / Frame-Rate
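The need for compression on packet switched networks follows from simple bit-rate arithmetic. The sketch below is illustrative only: the function names and the example resolution, pixel depth and channel rate are assumptions, not values from the lecture.

```c
#include <stdint.h>

/* Raw (uncompressed) video bit rate in bits per second.
 * All parameters are illustrative assumptions. */
uint64_t raw_video_bps(uint32_t width, uint32_t height,
                       uint32_t bits_per_pixel, uint32_t frames_per_sec)
{
    return (uint64_t)width * height * bits_per_pixel * frames_per_sec;
}

/* Compression ratio needed to fit a raw stream into a channel,
 * rounded up to a whole number. Returns 0 if channel_bps is 0. */
uint64_t required_compression_ratio(uint64_t raw_bps, uint64_t channel_bps)
{
    if (channel_bps == 0)
        return 0;
    return (raw_bps + channel_bps - 1) / channel_bps;
}
```

For example, 1920x1080 at 24 bits per pixel and 30 frames per second is about 1.49 Gbit/s raw, so a hypothetical 10 Mbit/s channel would need roughly 150:1 compression.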
E.g. UAV Latency and Jitter
Verification of Video Frame Latency Telemetry for UAV Systems Using A Secondary Optical Method, Sam Siewert, Muhammad Ahmad, Kevin Yao
NTSC (Analog TV)
Figure: http://en.wikipedia.org/wiki/File:Ntsc_channel.svg
AM Video to CRT
FM Audio
Chroma Added Later
Odd/Even Lines (Interlaced)
29.97 FPS (30 before color)
Vertical Blanking (CRT Retrace Time, Closed Captioning)
525 Lines, 262.5 per Field, 59.94 Fields per Second (60 before color)
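The NTSC numbers above are related by simple arithmetic. A minimal sketch, assuming the standard color frame rate of 30000/1001 frames per second (the function names are made up for illustration):

```c
/* NTSC color timing derived from the 30000/1001 frame rate. */
#define NTSC_LINES_PER_FRAME  525
#define NTSC_FIELDS_PER_FRAME 2

/* Frames per second, about 29.97. */
double ntsc_frame_rate(void)
{
    return 30000.0 / 1001.0;
}

/* Interlaced fields per second, about 59.94. */
double ntsc_field_rate(void)
{
    return NTSC_FIELDS_PER_FRAME * ntsc_frame_rate();
}

/* Scan lines per field: 262.5. */
double ntsc_lines_per_field(void)
{
    return (double)NTSC_LINES_PER_FRAME / NTSC_FIELDS_PER_FRAME;
}

/* Horizontal line rate in Hz, about 15734. */
double ntsc_line_rate(void)
{
    return NTSC_LINES_PER_FRAME * ntsc_frame_rate();
}
```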
Linux in Computer Vision
Embedded Solutions
– Texas Instruments OMAP (Beagle xM, Bone)
– Numerous ARM SoCs (NVIDIA, Qualcomm, Broadcom, …)
Scalable Solutions
– Multi-Core (Xeon Phi)
– Vector Processing
– CUDA, OpenCL GPU and GP-GPU
Computer and Machine Vision is I/O, Memory and Processing Intensive
Camera Interfaces
CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) Detector
– Integration Time for Photo-sensitive Elements in Array (to Build up Charge)
– Read-out Time to Sample Elements in Array
Luminance and Chroma Analog to Digital Conversion
Double Buffer for Read-out + Processing
Frame Capture
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_doc/Frame-Capture-Chips/
– Host Interface over PCI Bus or USB
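The double buffer for read-out plus processing can be sketched as two frame buffers that swap roles each time a frame completes. This is a single-threaded illustration under assumed names (double_buffer_t, db_swap, FRAME_BYTES); a real capture path would add synchronization between the read-out and processing contexts.

```c
#include <string.h>

#define FRAME_BYTES 16   /* tiny illustrative frame size */

typedef struct {
    unsigned char buf[2][FRAME_BYTES];
    int write_idx;       /* index of the buffer being filled by read-out */
} double_buffer_t;

/* Zero both buffers and start writing into buffer 0. */
void db_init(double_buffer_t *db)
{
    memset(db, 0, sizeof(*db));
}

/* Buffer that the sensor read-out writes into. */
unsigned char *db_write_buf(double_buffer_t *db)
{
    return db->buf[db->write_idx];
}

/* Buffer holding the last complete frame, safe to process. */
const unsigned char *db_read_buf(const double_buffer_t *db)
{
    return db->buf[1 - db->write_idx];
}

/* Called when read-out of a frame completes: swap the two roles. */
void db_swap(double_buffer_t *db)
{
    db->write_idx = 1 - db->write_idx;
}
```

Processing always works on the buffer returned by db_read_buf while the next frame is read out into the other one, so read-out and processing can overlap.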
Digital Video Transport QoS
Latency
– To Tune in a Program, Turn-on
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause
Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-rate Transport?
– Variable Bit-rate Transport and Encoding?
Jitter
– Decode and Presentation Rates
– Elasticity in Decode to Presentation Buffering Necessary
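The elasticity point under Jitter can be made concrete with a rough sizing rule. The sketch below assumes one buffered frame for every started frame period of worst-case decode jitter; this rule and the function names are assumptions for illustration, not a method from the lecture.

```c
#include <stdint.h>

/* Frames to buffer between decode and presentation so that presentation
 * does not underrun when decode completion varies by up to max_jitter_us.
 * Assumption: one frame per started frame period of jitter.
 * Returns 0 if frame_period_us is 0. */
uint32_t jitter_buffer_frames(uint32_t max_jitter_us, uint32_t frame_period_us)
{
    if (frame_period_us == 0)
        return 0;
    return (uint32_t)(((uint64_t)max_jitter_us + frame_period_us - 1)
                      / frame_period_us);
}

/* Presentation latency added by that buffering, in microseconds. */
uint64_t jitter_buffer_latency_us(uint32_t frames, uint32_t frame_period_us)
{
    return (uint64_t)frames * frame_period_us;
}
```

With a 33367 microsecond frame period (about 29.97 FPS) and 50 ms of jitter, this gives 2 frames of buffering and about 66.7 ms of added latency, which shows the trade-off between smooth presentation and latency.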
Linux System Options
(Linux for Image Processing, Camera Interfacing and Computer Vision)
Processing Outline
Many-Core Linux Host(s)
– Intel Atom
– ARM
– Xeon
GP-GPU Vector Processing PCI-E Co-Processors
– NVIDIA Tesla/Fermi
– AMD ATI
NPTL – Native POSIX Thread Library
NPTL Example Code Walkthrough
Conceptual View of RT Resources
Three-Space View of Utilization Requirements (axes: CPU-Utility, IO-Utility, Memory-Utility)
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
Upper Right Front Corner – Low-Margin
Origin – High-Margin
Mobile – Must Consider Battery Life Too (Power)
Processing – Initial Focus
Processing and Scaling Frame Transformation, Encode, Decode is Critical
Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)
I/O for Networking (Transport)
I/O for Storage (On-Demand, Post, Non-Linear Editing)
Flynn’s Computer Architecture Taxonomy
– Single Instruction, Single Data: SISD (Traditional Uni-processor)
– Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
– Single Instruction, Multiple Data: SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing)
– Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)
NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)
SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)
MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data
Computer and Machine Vision
Treated as a Real-time and/or Interactive System
– Requires Predictable Response (By Deadline)
– Rate Monotonic
– Earliest Deadline First
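Predictable response by deadline can be checked with utilization-based tests for the two policies named above. The sketch assumes periodic services with deadlines equal to periods on one processor. For Rate Monotonic it uses the hyperbolic bound, a sufficient but not necessary test chosen here for simplicity; it is not stated in the lecture. All function names are illustrative.

```c
#include <stddef.h>

/* Total utilization U = sum of C_i / T_i (WCET over period). */
double total_utilization(const double *wcet, const double *period, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += wcet[i] / period[i];
    return u;
}

/* EDF, one processor, deadline equal to period: feasible if U <= 1. */
int edf_feasible(const double *wcet, const double *period, size_t n)
{
    return total_utilization(wcet, period, n) <= 1.0;
}

/* Rate Monotonic hyperbolic bound: the set is schedulable if the product
 * of (C_i / T_i + 1) is at most 2. A failure is inconclusive. */
int rm_hyperbolic_ok(const double *wcet, const double *period, size_t n)
{
    double prod = 1.0;
    for (size_t i = 0; i < n; i++)
        prod *= wcet[i] / period[i] + 1.0;
    return prod <= 2.0;
}
```

A service set with WCETs {1, 1, 2} and periods {2, 5, 10} has U = 0.9: it passes the EDF test but fails the hyperbolic bound, so Rate Monotonic feasibility would need an exact analysis.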
CPU Scheduling Taxonomy
[Figure: tree diagram rooted at Execution Scheduling; grouping inferred from the order of the diagram labels]
Execution Scheduling
– Global-MP: Dynamic, Static; Symmetric (SMP OS), SMT (Micro-Parallel), Asymmetric (AMP), Distributed
– Local-Uniprocessor (Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf)
– Preemptive: Fixed-Priority (Rate Monotonic, Deadline Monotonic), Dynamic-Priority (Heuristic, EDF/LLF), Hybrid (RR Timeslice (desktop), Multi-Frequency Executives)
– Non-Preemptive: Batch (FCFS, SJN), Cooperative (Dataflow, Co-Routine, Continuation Function)
Response Latency
Response Time = TimeActuation – TimeSensed (From Release to Response)
[Figure: response timeline from Event Sensed to Actuation Completion. Segments: Input-Latency, Dispatch-Latency, Execution, Interference Time, Execution, Output-Latency; Ci is the WCET. Markers: Interrupt, Dispatch, Preemption, Dispatch, (IO Queued), (IO Completion)]
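Response time as defined above can be measured by reading a monotonic clock when the event is sensed and again when the response completes. A minimal sketch, assuming a POSIX system with clock_gettime; the service callback and function names are hypothetical stand-ins for the real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from start to stop. */
int64_t timespec_diff_ns(const struct timespec *start,
                         const struct timespec *stop)
{
    return (int64_t)(stop->tv_sec - start->tv_sec) * 1000000000LL
         + (stop->tv_nsec - start->tv_nsec);
}

/* Response time of one service invocation. The clock is read at release
 * (event sensed) and again after the response is complete. */
int64_t measure_response_ns(void (*service)(void))
{
    struct timespec sensed, actuated;
    clock_gettime(CLOCK_MONOTONIC, &sensed);
    service();
    clock_gettime(CLOCK_MONOTONIC, &actuated);
    return timespec_diff_ns(&sensed, &actuated);
}

/* 1 if the measured response met its deadline, otherwise 0. */
int met_deadline(int64_t response_ns, int64_t deadline_ns)
{
    return response_ns <= deadline_ns;
}
```

A measurement taken this way includes execution and any interference from preemption, but not input latency that occurs before the first clock read.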
SIMD Vector Instructions
Intel MMX, SSE 1, 2, 3, 4.x
Code Generation Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement)
– http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/
PSF (Point Spread Function)
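Edge enhancement with a PSF is a 3x3 weighted sum around each pixel, the kind of inner loop that SIMD extensions accelerate. The scalar sketch below is a baseline illustration only: the gain SHARPEN_K and the kernel layout are assumptions, not the code from the linked article.

```c
#include <stddef.h>

/* 3x3 edge-enhancement PSF. Weights sum to 1, so flat regions are
 * unchanged. SHARPEN_K is an illustrative gain. */
#define SHARPEN_K 4.0f
static const float PSF[9] = {
    -SHARPEN_K / 8.0f, -SHARPEN_K / 8.0f, -SHARPEN_K / 8.0f,
    -SHARPEN_K / 8.0f,  SHARPEN_K + 1.0f, -SHARPEN_K / 8.0f,
    -SHARPEN_K / 8.0f, -SHARPEN_K / 8.0f, -SHARPEN_K / 8.0f
};

/* Round to nearest and saturate to the 8-bit range. */
static unsigned char clamp_u8(float v)
{
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (unsigned char)(v + 0.5f);
}

/* Apply the PSF to one 8-bit channel. Border pixels are copied. */
void sharpen(const unsigned char *in, unsigned char *out, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
            if (y == 0 || x == 0 || y == h - 1 || x == w - 1) {
                out[y * w + x] = in[y * w + x];
                continue;
            }
            float acc = 0.0f;
            for (size_t j = 0; j < 3; j++)
                for (size_t i = 0; i < 3; i++)
                    acc += PSF[j * 3 + i]
                         * in[(y - 1 + j) * w + (x - 1 + i)];
            out[y * w + x] = clamp_u8(acc);
        }
    }
}
```

Each output pixel needs nine multiply-accumulate operations per channel, which is why packing several pixels into one SIMD register pays off.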
Offload, Co-Proc, Vector Proc
1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games
    Physics Engines
    3D Rendering + Texture (4D Vector Operations)
    Game Engines and Simulation
    HD Output: HDMI, HD-SDI, Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast
    PNY Quadro FX
    NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/
2. Built-In SIMD Instruction Set Extensions
– Intel SSE
GP-GPU, What Is It?
Ideal for Large Bitwise, Integer, and Floating Point Vector Math
Flynn’s Taxonomy SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD
– Single Instruction/Prog, Single Data: SISD (Traditional Uni-processor)
– Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
– Single Instruction/Prog, Multiple Data: SIMD (SSE 4.2, Vector Processing), SPMD (Single Program Multiple Data), GP-GPU
– Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
SSE – Streaming SIMD Extensions
128-bit registers known as XMM0 through XMM7 (XMM0 through XMM15 in 64-bit mode)
Large Operands and Operators (Multi-Word)
E.g. 128-bit XOR of Two Operands
Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
– E.g. 4 Component Vector addition
– 4 Single Precision Pixel Multiply and Accumulate in Single Instruction
Scalar C (16 operations to load 2 operands, add, store):
    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;
SSE (3 SSE operations to load, add, store):
    movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x
    addps xmm0,address-of-v2           ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps address-of-vec_res,xmm0
Scheduling Parallel/Cluster HW
MIMD
– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP is most often used in Embedded Systems
MPMD
– OpenCL, CUDA, DirectCompute (DirectX extension)
– Intel OpenMP, Linux Cluster, MPI
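The 4-component SSE vector addition from the SSE discussion can also be written in C with compiler intrinsics instead of assembly. This is a sketch under assumptions: the vec4 type and vec4_add name are illustrative, unaligned loads (_mm_loadu_ps) are used rather than the aligned movaps form, and a scalar fallback covers targets without SSE.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Four packed single-precision components: c[0..3] = x, y, z, w. */
typedef struct { float c[4]; } vec4;

/* res = a + b, all four components at once where SSE is available. */
void vec4_add(const vec4 *a, const vec4 *b, vec4 *res)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a->c);               /* load a.x .. a.w */
    __m128 vb = _mm_loadu_ps(b->c);               /* load b.x .. b.w */
    _mm_storeu_ps(res->c, _mm_add_ps(va, vb));    /* packed add, store */
#else
    /* Scalar fallback for targets without SSE. */
    for (int i = 0; i < 4; i++)
        res->c[i] = a->c[i] + b->c[i];
#endif
}
```

The intrinsic form lets the compiler choose registers and instruction scheduling while still expressing the single packed add.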
How Does NPTL Work?
No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model
– Manager Becomes Bottleneck
– Two-Level Scheduling Not Deterministic
– Many Pthreads (M) to N Kernel Threads Still an Issue
– O(n) Scheduling for each Manager
Direct Mapping of User to Kernel Thread or 1-to-1
– User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege)
– Deterministic (Non-Determinism due to Kernel Preemptability Issues)
– O(1) Scheduling
Scheduling Policies Selectable
Similar to RTOS Tasking
Linux NPTL Scheduling Policies
Fixed Priority Preemptive
– SCHED_FIFO – This is Priority Preemptive
– SCHED_RR – This is Fair, but at Kernel Level
– SCHED_OTHER – This is OS default and should not be used for real-time threads
POSIX Threads have
– Policy (FIFO, RR, OTHER)
– Priority (RT min to RT max)
– Creation (Fork)
– Join (Wait for thread completion at rendezvous)
– Synchronization Methods
    Semaphores
    Message Queues
– Asynchronous Communication Methods
    Signals
    Queued Signals
POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Availability on Linux TBD)
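The "RT min to RT max" priority range depends on the policy and the platform, so it is usually queried rather than hard-coded. A minimal sketch using the POSIX sched_get_priority_min and sched_get_priority_max calls; the prio_range_t type and get_prio_range name are illustrative.

```c
#include <sched.h>

typedef struct {
    int min_prio;
    int max_prio;
} prio_range_t;

/* Query the valid static priority range for a scheduling policy.
 * Returns 0 on success, -1 if the policy is not supported. */
int get_prio_range(int policy, prio_range_t *range)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1;
    range->min_prio = lo;
    range->max_prio = hi;
    return 0;
}
```

On Linux, SCHED_FIFO and SCHED_RR typically report 1 to 99, while SCHED_OTHER reports 0 for both limits because it does not use static priorities.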
NPTL Coding
Code Walk-through
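July 7, 2004 Sam Siewert
Thread Scheduling Policy
/* Requires <pthread.h>, <sched.h>, <stdio.h>, <unistd.h> */

/* Build a thread attribute that requests SCHED_FIFO explicitly */
pthread_attr_init(&rt_sched_attr);
pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

/* Query the valid SCHED_FIFO priority range */
rt_max_prio = sched_get_priority_max(SCHED_FIFO);
rt_min_prio = sched_get_priority_min(SCHED_FIFO);

/* Move the calling process to SCHED_FIFO just below the maximum
   (typically requires root privilege) */
rt_param.sched_priority = rt_max_prio-1;
rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
if (rc) perror("sched_setscheduler");

/* Report the contention scope of the attribute */
pthread_attr_getscope(&rt_sched_attr, &scope);
if(scope == PTHREAD_SCOPE_SYSTEM)
    printf("PTHREAD SCOPE SYSTEM\n");
else if (scope == PTHREAD_SCOPE_PROCESS)
    printf("PTHREAD SCOPE PROCESS\n");
else
    printf("PTHREAD SCOPE UNKNOWN\n");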
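Thread Creation and Join
/* Requires <pthread.h>, <stdio.h>, <stdlib.h>, <string.h> */

/* Create the thread with explicit scheduling attributes */
rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);
if (rc)
{
    /* pthread functions return the error code instead of setting errno */
    printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc));
    exit(-1);
}

/* Wait for the thread to finish, then release the attribute object */
pthread_join(main_thread, NULL);
if(pthread_attr_destroy(&rt_sched_attr) != 0)
    perror("attr destroy");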
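The policy setup and the create/join steps can be combined into one small function. The sketch below is not the walkthrough code: worker and run_worker are hypothetical names, and the fallback to default attributes when SCHED_FIFO is refused (commonly EPERM without root privilege) is an added assumption so the example runs unprivileged.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Thread entry point: record that the thread ran. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Create and join one thread, preferring SCHED_FIFO. If creation with a
 * real-time policy fails, retry with default attributes.
 * Returns 0 on success, otherwise the pthread error code. */
int run_worker(int *ran, int *used_fifo)
{
    pthread_attr_t attr;
    struct sched_param param;
    pthread_t tid;
    int rc;

    memset(&param, 0, sizeof(param));
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = sched_get_priority_max(SCHED_FIFO) - 1;
    pthread_attr_setschedparam(&attr, &param);

    *used_fifo = 1;
    rc = pthread_create(&tid, &attr, worker, ran);
    if (rc != 0) {
        fprintf(stderr, "SCHED_FIFO create failed: %s\n", strerror(rc));
        *used_fifo = 0;
        rc = pthread_create(&tid, NULL, worker, ran);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);   /* rendezvous with the thread */
}
```

On older toolchains the program may need to be linked with -pthread.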
Issues Beyond Policy and Feasibility
Throughput
Latency
How do they Differ?
E.g. Frame Rate vs. Time to First Frame
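One way to see the difference is a frame pipeline whose stages overlap: time to first frame is the sum of the stage times, while steady-state frame rate is limited by the slowest stage. The pipelined model and the function names below are assumptions for illustration.

```c
#include <stddef.h>

/* Latency (time to first frame): sum of the stage times in ms. */
double pipeline_latency_ms(const double *stage_ms, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += stage_ms[i];
    return total;
}

/* Throughput (frames per second) with overlapped stages: set by the
 * slowest stage. Returns 0 if no stage takes any time. */
double pipeline_fps(const double *stage_ms, size_t n)
{
    double slowest = 0.0;
    for (size_t i = 0; i < n; i++)
        if (stage_ms[i] > slowest)
            slowest = stage_ms[i];
    return slowest > 0.0 ? 1000.0 / slowest : 0.0;
}
```

With hypothetical capture, transform and display stages of 20, 40 and 20 ms, the first frame appears after 80 ms but frames then arrive at 25 per second. Speeding up a fast stage improves latency without changing throughput.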
Digital Video (Quick Reminders)
Simple Encode/Decode is Processing Intensive
GPU Co-Processors Can Offload CPU
Example with Mplayer and VDPAU (Video Decode and Presentation API for Unix) for Linux
[Figure: Core Loading with Mplayer. Panels: Dual-Core SW Decode (Load balancing); VDPAU MPEG Decode (Load balancing and offload)]
Discussion – What Does Eye See?
Ewald Hering (1872), Opponent Colors (R/G, Y/B)
– Red and Green Opponent Colors – Can’t See Both Simultaneously
– Yellow and Blue Opponent Colors
Color Models
– RGB Cube (figure: http://en.wikipedia.org/wiki/File:RGB_Cube_Show_lowgamma_cutout_b.png)
– HSV - Hue/Saturation/Value (HSV Cylinder)
    Hue – Similarity to R, G, Y, B
    Saturation – Color vs. Brightness
    Value – Low=Black, High=Color
– Luminance (Candela/Square-Meter) – Light Passing Through Area Forming a Solid Angle in A Direction
    Candela (Luminous Intensity) = Lumens/Steradian (Photometrically Weighted Watts/Steradian)
    More Precise than “Brightness”
– Chrominance (“CrCb” or “UV” in YCrCb or YUV)
    U = Blue – Luminance (Y)
    V = Red – Luminance (Y)
– Wavelength Spectrum - ROYGBIV
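The chrominance definitions (U as blue minus luminance, V as red minus luminance) can be written out directly. The sketch assumes normalized RGB inputs in the range 0.0 to 1.0 and uses the BT.601 luma weights with the usual analog YUV scale factors; the yuv_t type and function name are illustrative.

```c
typedef struct { double y, u, v; } yuv_t;

/* Convert normalized RGB (each 0.0 to 1.0) to YUV using BT.601 luma
 * weights. U and V are scaled color differences. */
yuv_t rgb_to_yuv(double r, double g, double b)
{
    yuv_t out;
    out.y = 0.299 * r + 0.587 * g + 0.114 * b;   /* luminance */
    out.u = 0.492 * (b - out.y);                 /* blue minus luminance */
    out.v = 0.877 * (r - out.y);                 /* red minus luminance */
    return out;
}
```

Any gray input (r = g = b) gives U and V of approximately zero, which is why a luminance-only signal stays compatible with monochrome display.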
HSV Cylinder figure: http://en.wikipedia.org/wiki/File:HSV_color_solid_cylinder_alpha_lowgamma.png
Frame Analysis and Image Processing
Resources for Raw Frame Data
– GIMP (GNU Image Manipulation Program) – Single Frame Analysis and Transforms
– Octave – Similar to MATLAB
– Irfanview – Simple Viewer includes PPM
– OpenCV (C/C++ and Python API)
Single Frame Viewing and Analysis
– http://www.irfanview.com/
– http://www.gimp.org/downloads/
Image Processing Libraries
– http://cimg.sourceforge.net/
– http://opencv.org/
Practice with Linux
GIMP PPM and JPEG Frame Analysis
FFMPEG MPEG-4 DV to Frames
Sobel Image Transformation Real-Time
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_code/capture-transformer/
Sobel Image Transformation Batch Mode
FFMPEG Re-encoding
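For the Sobel exercises, the transformation itself is a pair of 3x3 gradient kernels applied to each pixel. The sketch below is a generic batch-mode illustration on an 8-bit grayscale buffer, not the capture-transformer code linked above. It approximates gradient magnitude as |Gx| + |Gy| and sets border pixels to 0, both of which are simplifying assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sobel edge detection on an 8-bit grayscale image of w by h pixels.
 * Output is |Gx| + |Gy| saturated to 255; border pixels are set to 0. */
void sobel(const unsigned char *in, unsigned char *out, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
            if (y == 0 || x == 0 || y == h - 1 || x == w - 1) {
                out[y * w + x] = 0;
                continue;
            }
            /* p points at the top-left pixel of the 3x3 window. */
            const unsigned char *p = in + (y - 1) * w + (x - 1);
            int gx = -p[0] + p[2]
                     - 2 * p[w] + 2 * p[w + 2]
                     - p[2 * w] + p[2 * w + 2];
            int gy = -p[0] - 2 * p[1] - p[2]
                     + p[2 * w] + 2 * p[2 * w + 1] + p[2 * w + 2];
            int mag = abs(gx) + abs(gy);
            out[y * w + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
    }
}
```

A frame extracted to PPM would first be reduced to one channel, for example luminance, before calling this function.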