i.MX 8: ‘XS’ GPU

i.MX8 GRAPHICS AND DISPLAY
HUGO OSORNIO
GRAPHICS TECHNOLOGY ENGINEERING CENTER, GRAPHICS ENGINEER
AMF-AUT-T2770 | AUGUST 2017
NXP and the NXP logo are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2017 NXP B.V.
PUBLIC

AGENDA
• Introduction to GPUs
• GPU Types on i.MX8
• 3D GPUs
• 3D APIs Available
• Vision Processing
• Dual GPUs
• Display Controller
• 2D Blitter
• Image Capture Engine
• Video

01. What is Graphics?
http://www.xkcd.com/722

02. GPU Fundamentals

What is a GPU?
• Graphics Processing Unit
• Extremely multithreaded processor
• Can execute the same operation over multiple data elements
• You can think of it as a SIMD (single instruction, multiple data) machine

What is a GPU?
• At the end of the day, the usual goal of a GPU is to generate an image.
• The image is generated from input data such as vertices, textures, segment paths, colors, etc.
• Its main application is real-time rendering.

Real Time Rendering and its Importance
• Real-time rendering is the art and science of generating images quickly using a compute unit.
• It is the most interactive area of computer graphics, because the generated content reacts immediately to user input.
• This contrasts with the computer-generated imagery seen in movies.

Real Time Rendering and its Importance (2)

GPU types for i.MX8
• 2D ‘blitter’ GPU: bitmap copy, combine, filter…
• 3D GPU: 3D mesh rendering, texturing…

Vivante GPU Architecture
• The 2D, VG and 3D Vivante GPUs are all based on the same fundamental pipeline.
• Parametrization of internal engines defines the GPU variants.
• Incremental changes to internal engines allow for a continuous upgrade path utilizing a single unified GPU driver stack.
[Block diagram: Vivante GCxxxx GPU core with AHB Host Interface, AXI Memory Controller, Graphics Pipeline Front End, GPU-dependent engines, Pixel Engine]

03. 3D GPU

Graphics Pipeline Front End (GC Series)
• Inserts high-level primitives into the graphics pipeline
• Determines primitive orientation
[Block diagram, GC Series: AHB Host Interface, 2X AXI Memory Controller, Graphics Pipeline Front End, ScalarMorphic Unified Shader (4X SIMD Vec4, IEEE 32-bit per component; 8X SIMD Vec4, IEEE 16-bit per component; 1024 threads per shader), Texture Engine, 3-D Rendering Engine, Pixel Engine]

Unified Shader Engine: Vertex (GC Series)
• Transforms object and viewport positions

3D Rendering Engine (GC Series)
• Rasterization: conversion of primitives, assembled from the transformed vertices, into fragments
From: http://acko.net/files/fullfrontal/fullfrontal/webglmath/online.html

Unified Shader Engine: Fragment (GC Series)
• Applies texture and color pixel by pixel
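
The rasterization and fragment stages described above can be illustrated in software. The following is a minimal sketch, not the Vivante hardware algorithm: it assumes a single 2D triangle already transformed to screen space, tests each pixel center with edge functions, and "shades" each covered fragment by interpolating per-vertex colors with barycentric weights. The names `edge` and `rasterize` are hypothetical helpers introduced only for this illustration.

```python
def edge(a, b, p):
    """Edge function: twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri, colors, width, height):
    """Return {(x, y): (r, g, b)} for every pixel whose center the triangle covers.

    tri    -- three (x, y) screen-space vertices (output of the vertex stage)
    colors -- one (r, g, b) tuple per vertex
    """
    v0, v1, v2 = tri
    area = edge(v0, v1, v2)
    fragments = {}
    if area == 0:
        return fragments  # degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights; dividing by the signed area makes the
            # test independent of the triangle's winding order.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # "Fragment shading": interpolate the vertex colors per channel.
                fragments[(x, y)] = tuple(
                    w0 * c0 + w1 * c1 + w2 * c2 for c0, c1, c2 in zip(*colors)
                )
    return fragments


if __name__ == "__main__":
    triangle = ((0, 0), (4, 0), (0, 4))
    vertex_colors = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    covered = rasterize(triangle, vertex_colors, 4, 4)
    print(len(covered), "fragments generated")
```

Each pixel is processed independently of the others, which is why this stage maps well onto the SIMD shader units listed in the block diagram.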
https://github.com/hallahan/presentations/blob/master/map-renderers-and-open-gl/index.html

Texture Engine (GC Series)
• Retrieves texture and color via texture coordinates
• Interpolates and filters textures and provides them to the shaders

Pixel Engine (GC Series)
• Samples and filters final pixels onto the screen
• Blends and resolves cache tiles to memory
http://aidiri.com/3d/?p=215

i.MX 8: ‘XS’ GPU Tessellation Shading
• GPU hardware function that increases the detail of given polygons via a shader program. Polygons are subdivided into smaller, higher-resolution polygons programmatically by the tessellation shader.
• Promises close to “unlimited detail” and greatly reduces 3D model sizes, memory requirements and GPU workload.
• Great for detailed maps (navigation), car or other 3D models, extremely detailed GUI designs, etc., with significantly lower starting polygon counts for the 3D content.
• Couple with displacement maps to add detail to the generated mesh.

i.MX 8: ‘XS’ GPU Geometry Shading
• GPU hardware function that can either modify existing 3D primitives that come through the “pipeline” or create NEW polygons programmatically to inject into the 3D pipeline.
• Geometry shaders run AFTER tessellation shading for almost unlimited scene flexibility.
• Unlimited use cases: usable for terrain generation in automotive mapping, but real-world application has mostly been in gaming so far.

04. Vulkan Support

What is Vulkan?
• Vulkan is …
  − Revolutionary high-performance GPU API
  − Cross-platform API
  − Open standard
  − Meant for graphics & compute workloads
• Vulkan is NOT…
  − The successor of OpenGL
  − The race of Spock (‘k’ not ‘c’)

Next Generation GPU APIs

How Did Vulkan Come to Be?
• As graphics APIs evolved, backwards compatibility as a paradigm became unwieldy.
• An embedded version of OpenGL (OpenGL ES) emerged to simplify the API and adapt it to the size and resource constraints of embedded devices.
[Timeline figure: OpenGL 1.1, 1.5, 2.0, 2.1, 3.0, 3.2, 4.0, 4.4; OpenGL ES 1.0, 1.1, 2.0, 3.0, 3.1, 3.2; Vulkan 1.0]

Excess Baggage from Backwards Compatibility
• Supporting scenarios and adding driver complexity for unlikely paths
  − Driver overhead for unnecessary driver paths
• Resource management
• Lacking support for multiple GPU cores in a context
• Driver complexity leads to reduced reliability on the driver side.

How Will Vulkan Help Embedded Graphics?
• Benefits performance
  − Render more with the same GPU
  − Render “the same” with a lower-cost GPU
  − Less latency for all use cases
• Thin, simple driver
  − More robust
  − Smaller code footprint
• Naturally cross-platform
  − Shares all the same code, except windowing, on simulation and target

05. VX: The Vision Extension

i.MX 8: ‘VX’ GPU
[Diagram: CPU (ARM cores: Cortex-A53 | Cortex-A72) alongside GPU + Vision]
• Vision-optimized dynamic VLIW architecture
• Interleaved multi-threading
• Unified cache
• Stream interconnect to hardwired vision functions
• Extended Vision Instruction Set (EVIS)
• Accessible via OpenVX, OpenCL VX extensions, VXC

VX Engine on i.MX8 = CORDIC Engine
From (Suchitra Sathyanarayana, 2007).
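
For readers unfamiliar with CORDIC, here is a minimal sketch of the generic textbook rotation-mode algorithm, which computes sine and cosine using only additions and scalings by powers of two. It is an illustration of the technique only, not the VX engine's implementation: floating-point multiplications by 2^-i stand in for the bit shifts a fixed-point hardware unit would use, and the function name `cordic_sin_cos` and the iteration count are assumptions made for this example.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Rotation-mode CORDIC. Returns (sin(theta), cos(theta)).

    Valid for theta in [-pi/2, pi/2], which lies inside the algorithm's
    convergence range of roughly +/-1.74 rad.
    """
    # Elementary rotation angles atan(2^-i); hardware keeps these in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Every micro-rotation stretches the vector by sqrt(1 + 2^-2i).
    # Starting from 1/gain compensates for the accumulated stretch.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate towards a zero residual angle
        shift = 2.0 ** -i             # a right shift by i bits in fixed point
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, math.sin(0.5))
    print(c, math.cos(0.5))
```

Each iteration adds roughly one bit of accuracy, so the iteration count trades latency against precision.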

Open Standard APIs for Vision & Compute on i.MX 8

OpenGL ES Compute Shaders (OpenGL ES 3.1, OpenGL 4.x)
• Pervasively available on almost any mobile device or OS
• Easy integration into graphics apps: no vision/compute API interop needed
• Program in GLSL, not C
• Limited to GPU acceleration

Vulkan Compute (Vulkan 1.0)
• New bare-metal API for graphics and compute
• Thin driver, explicit GPU control
• Higher performance at the cost of higher code complexity
• Program in SPIR-V, or cross-compile from GLSL

OpenCL (OpenCL 1.2 Full Profile; OpenCL 2.0 driver available in 2H 2017)
• Flexible, low-level access to any device with an OpenCL compiler
• Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
• Open standard for any device or OS; used as a backend by many languages and frameworks
• i.MX8 VX instructions accessible via extensions!
• Needs full compiler stack and IEEE precision

OpenVX: Out-of-the-Box Vision Framework (OpenVX 1.0.1)
• Can run some or all nodes on dedicated hardware: no compiler needed
• Higher-level abstraction means easier performance portability to diverse hardware
• Graph optimization opens up the possibility of low-power, always-on vision acceleration
• Fixed set of operators, but can be extended
• Custom OpenVX nodes can be built with OpenCL or GLSL on programmable devices!

i.MX 8QuadMax Pixel Pipe: CNN example
[Pipeline diagram: inputs MIPI-CSI2 (4 lane) ×2 and Ethernet AVB; blocks ICS (collect video), CPU (machine decision), GPU0 (CNN), GPU1 (segmentation & render), DPU (display); outputs MIPI-DSI (4 lane), HDMI/eDP, LVDS]

i.MX 8 Vision Algorithms in Development
• Surround View
  − 2D top-down & 3D bowl view, static calibration for low-end tier
  − Structure-from-motion (SfM) 3D point cloud extraction for occupancy mapping
  − Real-time stitch re-calibration for high-end tier, based on SfM
• CNN object classification
  − CNN engine on GPU0
  − ISP, segmentation & rendering on GPU1
• Stereo Vision
  − 3D depth mapping
  − 3D object detection
• SoftISP
  − ISP implementation in C, OpenCL, OpenVX
  [SoftISP figure blocks: Bayer input, bad pixel detection & correction, histogram / white balance / equalization, demosaic, noise reduction, RGB output]
• AR HUD
  − Low-latency view frustum adjustment, rendering based on view vector tracking

OpenVX Programming Framework
• Directed Acyclic Graph (DAG) framework pipeline
  − Optimized precompiled kernels of commonly used vision processes
    . A subset of OpenCV that lends itself to HW acceleration
    . HW vendor can create hardened / silicon-aware specialized kernels
    . App developer can create unique shader-based kernels using OpenCL or OpenGL APIs
  − OpenVX graphs can split, join, delay, and produce callbacks depending on heuristics.
• OpenVX primitives include: Images, Image Pyramids, Process Graphs, Kernels, Control Parameters
[Figure: Example OpenVX Graph, with blocks Native Camera Control, OpenVX Node (several), Downstream Application Processing]

Optimized Kernels for OpenVX
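
The directed-acyclic-graph execution model described under "OpenVX Programming Framework" can be sketched in a few lines. This is a conceptual illustration only and does not use the OpenVX API: `run_graph`, the kernel names `scale`, `offset` and `merge`, and the list-of-integers "image" are all hypothetical. It shows a graph that splits one input into two branches and joins them again, with nodes executed in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(kernels, inputs_of, source):
    """Execute a DAG of processing nodes in dependency order.

    kernels   -- node name -> callable taking one argument per upstream node
    inputs_of -- node name -> list of upstream node names (empty for entry nodes)
    source    -- data handed to every entry node (e.g. a camera frame)
    """
    # static_order() yields each node after all of its upstream nodes and
    # raises graphlib.CycleError if the graph is not acyclic.
    order = TopologicalSorter(inputs_of).static_order()
    results = {}
    for name in order:
        upstream = [results[dep] for dep in inputs_of.get(name, ())]
        args = upstream if upstream else [source]
        results[name] = kernels[name](*args)
    return results


if __name__ == "__main__":
    frame = [10, 20, 30, 40]  # stand-in for image data
    kernels = {
        "scale": lambda img: [2 * p for p in img],
        "offset": lambda img: [p + 1 for p in img],
        "merge": lambda a, b: [x + y for x, y in zip(a, b)],
    }
    inputs_of = {"scale": [], "offset": [], "merge": ["scale", "offset"]}
    print(run_graph(kernels, inputs_of, frame)["merge"])
```

Because the whole graph is known before execution starts, a real framework can go further than this sketch and schedule independent branches (here `scale` and `offset`) on different hardware units.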
