Why Use an RTOS?

Computer and Machine Vision

Lecture Week 3

January 27, 2014 © Sam Siewert

Outline of Week 3

Processing Images and Moving Pictures – High Level View and Computer Architecture for it

Linux Platforms for Computer/Machine Vision

I/O, Memory and Processing Challenges

Old School Moving Picture Media and Cameras

NTSC OTA (1941, 1953 color, 2009 dead) – Analog, Interlaced, Continuous Broadcast Transmission or CCTV (Closed Circuit TV) – Coax Cable or Tuner with Immediate CRT Display – No Buffers, No Routing, No De-mux – No Compression

Analog Cable

AM/FM OTA

Film Projectors

Modern Digital Cameras

Camera Link – High Frame Rates – High Data Rates and Resolutions – Industry Standard for Machine Vision Automation – E.g. Inspection Systems – E.g. Sony, IDT, National Instruments

SD-SDI and HD-SDI – Standard and High Definition Serial Digital Interface – Standard for Studios, Broadcast

Digital Cinema – Red Camera – 1080p, 2K, 4K Resolutions and Much Higher – Automated Digital Delivery and Projection

Webcams and Mobile Phone Cameras – Very Low Cost – Proprietary – Performance Varies Dramatically

Differences

Analog vs Digital Encoding for Transmission – Digital Allows for Image Processing – Adds Latency – Requires Compression for Packet Switched Networks and Storage

Routed (Diversely), Buffered

Compressed (MPEG, JPEG) to Lower Bit-rates

Multiplexed (Shares Transmission Carrier for Audio, Video, Channels)

Transported by IP (Large Packets)

Continuous Transmission – Analog or Constant Bit-Rate / Frame-Rate

E.g. UAV Latency and Jitter

Verification of Video Frame Latency Telemetry for UAV Systems Using A Secondary Optical Method, Sam Siewert, Muhammad Ahmad, Kevin Yao

NTSC (Analog TV)

Figure: NTSC channel spectrum, http://en.wikipedia.org/wiki/File:Ntsc_channel.svg

AM Video to CRT

FM Audio

Chroma Added Later

Odd/Even Lines (Interlaced)

29.97 FPS (30 before color)

Vertical Blanking (CRT Retrace Time, Closed Captioning)

525 Lines, 262.5 per Field, 60 Fields per Second (59.94 with color)

Linux in Computer Vision

Embedded Solutions – Texas Instruments OMAP (Beagle xM, Bone) – Numerous ARM SoCs (NVIDIA, Qualcomm, Broadcom, …)

Scalable Solutions – Multi-Core (Xeon Phi) – Vector Processing – CUDA, OpenCL GPU and GP-GPU

Computer and Machine Vision is I/O, Memory and Processing Intensive

Camera Interfaces

CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) Detector – Integration Time for Photo-sensitive Elements in Array (to Build up Charge) – Read-out Time to Sample Elements in Array

Luminance and Chroma Analog to Digital Conversion

Double Buffer for Read-out + Processing

Frame Capture – http://www.cse.uaa.alaska.edu/~ssiewert/a485_doc/Frame-Capture-Chips/ – Host Interface over PCI Bus or USB

Digital Video Transport QoS

Latency – To Tune in a Program, Turn-on – To Deliver a Video Frame or Audio PCM Sample – To Start, FF, REW, Start-Over, Pause

Bandwidth – Resolution, Lossy/Lossless Compression, High Motion – Pixel Encoding for Color – Frame Rate – Constant Bit-rate Transport? – Variable Bit-rate Transport and Encoding?

Jitter – Decode and Presentation Rates – Elasticity in Decode to Presentation Buffering Necessary

Linux System Options

(Linux for Image Processing, Camera Interfacing and Computer Vision)

Processing Outline

Many-Core Linux Host(s) – Intel Atom – ARM – Xeon

GP-GPU Vector Processing – PCI-E Co-Processors – NVIDIA Tesla/Fermi – AMD ATI

NPTL – Native POSIX Threads Library

NPTL Example Code Walkthrough

Conceptual View of RT Resources

Three-Space View of Utilization Requirements – CPU Margin? – IO Latency (and Bandwidth) Margin? – Memory Capacity (and Latency) Margin?

Figure: utilization cube with axes CPU-Utility, IO-Utility and Memory-Utility – Upper Right Front Corner = Low-Margin – Origin = High-Margin

Mobile – Must Consider Battery Life Too (Power)

Processing – Initial Focus

Processing and Scaling Frame Transformation, Encode, Decode is Critical

Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)

I/O for Networking (Transport)

I/O for Storage (On-Demand, Post, Non-Linear Editing)

Flynn’s Computer Architecture Taxonomy

              | Single Instruction                              | Multiple Instruction
Single Data   | SISD (Traditional Uni-processor)                | MISD (Voting schemes and active-active controllers)
Multiple Data | SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing)  | MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)

NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)

SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)

MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data

Computer and Machine Vision

Treated as a Real-time and/or Interactive System – Requires Predictable Response (By Deadline) – Rate Monotonic – Earliest Deadline First

CPU Scheduling Taxonomy

Figure: scheduling taxonomy tree (node labels recovered from the diagram)

Root: Execution Scheduling

First Level: Global-MP, Local-Uniprocessor

Second Level: Preemptive, Non-Preemptive, Dynamic, Static

Global-MP Leaves: Symmetric (SMP OS), SMT (Micro-Parallel), Asymmetric (AMP), Distributed (Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf)

Policy Classes: Fixed-Priority, Dynamic-Priority, Hybrid, Batch, Cooperative, Dataflow

Policies: Rate Monotonic, Deadline Monotonic, Heuristic, EDF/LLF, RR Timeslice (desktop), Multi-Frequency Executives, FCFS, SJN, Co-Routine, Continuation Function

Response Latency

Response Time = TimeActuation – TimeSensed (From Release to Response)

Figure: response timeline. Events in order: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion). Intervals in order: Input-Latency, Dispatch-Latency, Execution, Interference Time, Execution, Output-Latency. Legend: Ci WCET, Input/Output Latency, Interference Time.

SIMD Vector Instructions

Intel MMX, SSE 1, 2, 3, 4.x

Code Generation

Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement) – http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/

PSF

Offload, Co-Proc, Vector Proc

1. GPU (Graphics Processing Units) – Evolved for Consumer CGI and Games: Physics Engines; 3D Rendering + Texture (4D Vector Operations); Game Engines and Simulation; HD Output: HDMI, HD-SDI, Headless GP-GPU

– Higher End Used for Digital Cinema / Post Production, Broadcast: PNY Quadro FX; NVIDIA CUDA for Post

– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/

2. Built-In SIMD Instruction Set Extensions – Intel SSE

GP-GPU, What Is It?

Ideal for Large Bitwise, Integer, and Floating Point Vector Math

Flynn’s Taxonomy SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD

              | Single Instruction/Prog                                                        | Multiple Instruction
Single Data   | SISD (Traditional Uni-processor)                                               | MISD (Voting schemes and active-active controllers)
Multiple Data | SIMD (SSE 4.2, Vector Processing), SPMD (Single Program Multiple Data), GP-GPU | MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

SSE – Streaming SIMD Extensions

128-bit registers known as XMM0 through XMM7 (XMM8 through XMM15 are added in 64-bit mode)

Large Operands and Operators (Multi-Word) – E.g. 128-bit XOR of Two Operands

Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations) – E.g. 4 Component Vector addition – 4 Single Precision Pixel Multiply and Accumulate in Single Instruction

Scalar C (16 operations to load 2 operands, add, store):

    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

SSE (3 SSE operations to load, add, store):

    movaps xmm0,address-of-v1        ;xmm0=v1.w | v1.z | v1.y | v1.x
    addps xmm0,address-of-v2         ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps address-of-vec_res,xmm0

Scheduling Parallel/Cluster HW

MIMD – OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL – RTOS AMP is most often used in Embedded Systems

MPMD – OpenCL, CUDA, DirectCompute (DirectX extension) – Intel OpenMP, Linux Cluster, MPI

How Does NPTL Work?

No Thread Manager or M-on-N Mapping – Previous POSIX Threading Model – Manager Becomes Bottleneck – Two-Level Scheduling Not Deterministic – Many Pthreads (M) to N Kernel Threads Still an Issue – O(n) Scheduling for each Manager

Direct Mapping of User to Kernel Thread or 1-to-1 – User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege) – Deterministic (Non-Determinism due to Kernel Preemptability Issues) – O(1) Scheduling

Scheduling Policies Selectable

Similar to RTOS Tasking

Linux NPTL Scheduling Policies

Fixed Priority Preemptive – SCHED_FIFO: This is Priority Preemptive – SCHED_RR: This is Fair, but at Kernel Level – SCHED_OTHER: This is OS default and should not be used for real-time threads

POSIX Threads have – Policy (FIFO, RR, OTHER) – Priority (RT min to RT max) – Creation (Fork) – Join (Wait for thread completion at rendezvous) – Synchronization Methods: Semaphores, Message Queues – Asynchronous Communication Methods: Signals, Queued Signals

POSIX RT Extensions Include – Virtual Timer Services – Signals Tied to Timer Services – Priority Inversion Protection (Availability on Linux TBD)

NPTL Coding

Code Walk-through

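July 7, 2004

Thread Scheduling Policy

    /* Request explicit SCHED_FIFO scheduling through the thread attributes. */
    pthread_attr_init(&rt_sched_attr);
    pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

    /* Query the SCHED_FIFO priority range. */
    rt_max_prio = sched_get_priority_max(SCHED_FIFO);
    rt_min_prio = sched_get_priority_min(SCHED_FIFO);

    /* Move the calling process to SCHED_FIFO just below the maximum
       (typically requires root or CAP_SYS_NICE). */
    rt_param.sched_priority = rt_max_prio - 1;
    rc = sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
    if (rc < 0)
        perror("sched_setscheduler");

    /* Report the contention scope recorded in the attributes. */
    pthread_attr_getscope(&rt_sched_attr, &scope);
    if (scope == PTHREAD_SCOPE_SYSTEM)
        printf("PTHREAD SCOPE SYSTEM\n");
    else if (scope == PTHREAD_SCOPE_PROCESS)
        printf("PTHREAD SCOPE PROCESS\n");
    else
        printf("PTHREAD SCOPE UNKNOWN\n");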

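Thread Creation and Join

    /* main_sched_attr is assumed to be initialized elsewhere in the full
       program, in the same way as rt_sched_attr. */
    rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);
    if (rc)
    {
        /* pthread_create() returns the error number and does not set errno,
           so report it with strerror() (from <string.h>) rather than perror(). */
        fprintf(stderr, "ERROR; pthread_create() rc is %d: %s\n", rc, strerror(rc));
        exit(-1);
    }

    /* Wait for the thread to finish. */
    pthread_join(main_thread, NULL);

    if (pthread_attr_destroy(&rt_sched_attr) != 0)
        perror("attr destroy");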

Issues Beyond Policy and Feasibility

Throughput

Latency

How do they Differ?

E.g. Frame Rate vs. Time to First Frame

Digital Video (Quick Reminders)

Simple Encode/Decode is Processing Intensive

GPU Co-Processors Can Offload CPU

Example with Mplayer and VDPAU (Video Decode and Presentation API for Unix) for Linux

Figure: Core Loading with Mplayer – Dual-Core SW Decode (Load balancing) – VDPAU MPEG Decode (Load balancing and offload)

Discussion – What Does Eye See?

Ewald Hering (1872), Opponent Colors (R/G, Y/B) – Red and Green Opponent Colors – Can’t See Both Simultaneously – Yellow and Blue Opponent Colors

Color Models – RGB Cube – HSV - Hue/Saturation/Value: Hue – Similarity to R, G, Y, B; Saturation – Color vs. Brightness; Value – Low=Black, High=Color

Figure: RGB Cube, http://en.wikipedia.org/wiki/File:RGB_Cube_Show_lowgamma_cutout_b.png

Luminance (Candela/Square-Meter) – Light Passing Through Area Forming a Solid Angle in A Direction – Candela (Luminous Intensity) = Lumens/Steradian, a Photometrically Weighted Form of Watts/Steradian – More Precise than “Brightness”

Chrominance (“CrCb” or “UV” in YCrCb or YUV) – U = Blue - Luminance (Y) – V = Red - Luminance (Y)

Wavelength Spectrum - ROYGBIV

Figure: HSV Cylinder, http://en.wikipedia.org/wiki/File:HSV_color_solid_cylinder_alpha_lowgamma.png

Frame Analysis and Image Processing

Resources for Raw Frame Data – GIMP (GNU Image Manipulation Program): Single Frame Analysis and Transforms – Octave: Similar to MATLAB – IrfanView: Simple Viewer, includes PPM – OpenCV (C/C++ and Python API)

Single Frame Viewing and Analysis – http://www.irfanview.com/ – http://www.gimp.org/downloads/

Image Processing Libraries – http://cimg.sourceforge.net/ – http://opencv.org/

Practice with Linux

GIMP PPM and JPEG Frame Analysis

FFMPEG MPEG-4 DV to Frames

Sobel Image Transformation Real-Time – http://www.cse.uaa.alaska.edu/~ssiewert/a485_code/capture-transformer/

Sobel Image Transformation Batch Mode

FFMPEG Re-encoding
