i.MX8 GRAPHICS AND DISPLAY

HUGO OSORNIO GRAPHICS TECHNOLOGY ENGINEERING CENTER GRAPHICS ENGINEER AMF-AUT-T2770 | AUGUST 2017

NXP and the NXP logo are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2017 NXP B.V. PUBLIC AGENDA • Introduction to GPUs. • GPU Types on i.MX8 • 3D GPUs • 3D APIs Available. • Vision Processing. • Dual GPUs • Display Controller. • 2D Blitter. • Image Capture Engine. • Video

PUBLIC 1 01. What is Graphics?

PUBLIC 2 http://www.xkcd.com/722

PUBLIC 3 02. GPU Fundamentals

PUBLIC 4 What is a GPU?

• Extremely Multithreaded Processor • Can execute the same operation over multiple Data. • You can think of it as a SIMD!!

PUBLIC 5 What is a GPU?

• At the end of the day, the usual goal of a GPU is to generate an Image. • This image will be generated using input data, as vertices, textures, segment paths, colors, etc. • Its main application is Real Time Rendering

PUBLIC 6 Real Time Rendering and its Importance

• Real Time Rendering is the Art and Science of generating images quickly using a compute unit.

• It is the most interactive area of computer graphics, as the content generated will react immediately to the user input.

• In contrast with the Computer Generated Image that we see in the movies.

PUBLIC 7 Real Time Rendering and its importance (2)

PUBLIC 8 GPU types for i.MX8

2D ‘blitter’ 3D GPU GPU • 3D mesh • copy, rendering, combine, texturing… filter…

PUBLIC 9 Vivante GPU Architecture

• The 2D, VG and 3D Vivante GPUs are all based on the same fundamental pipeline. • Parametrization of internal engines defines the GPU variants • Incremental changes to internal engines allow for a continuous upgrade path utilizing a single unified GPU driver stack

Vivante GCxxxx GPU

GPU Core AHB AXI

Host Interface

Memory Controller

Graphics Pixel Pipeline GPU dependent engines Engine Front End

PUBLIC 10 03. 3D GPU

PUBLIC 11 Front End

GC SERIES • Inserts high level primitives into graphics pipeline

AHB 2X AXI Host Interface

Memory Controller

Texture EngineTexture 3-D Rendering Engine Engine

ScalarMorphic Unified 4X SIMD Vec4 IEEE 32-bit per component Graphics 8X SIMD Vec4 IEEE 16-bit per component 1024 threads per shader • Determines primitive Orientation Pipeline PixelPixel Front EngineEngine End

PUBLIC 12 Unified Shader Engine: Vertex GC SERIES • Transform Object and Viewport Positions

AHB 2X AXI Host Interface

Memory Controller

Texture EngineTexture 3-D Rendering Engine Engine

Unified Vertex Shader 4X SIMD Vec4 IEEE 32-bit per component Graphics 8X SIMD Vec4 IEEE 16-bit per component 1024 threads Pipeline PixelPixel Front EngineEngine End

PUBLIC 13 3D Rendering Engine GC SERIES • Rasterization: Conversion of Vertices to Fragments

AHB 2X AXI Host Interface

Memory Controller

Texture EngineTexture 3-D Rendering Engine Engine

ScalarMorphic Unified Shader 4X SIMD Vec4 IEEE 32-bit per component Graphics 8X SIMD Vec4 IEEE 16-bit per component 1024 threads Pipeline PixelPixel Front EngineEngine From: http://acko.net/files/fullfrontal/fullfrontal/webglmath/online.html End

a

PUBLIC 14 Unified Shader Engine: Fragment GC SERIES • Apply Texture and Color a Pixel by Pixel.

AHB 2X AXI Host Interface

Memory Controller

Texture EngineTexture 3-D Rendering Engine Engine

Unified Fragment Shader 4X SIMD Vec4 IEEE 32-bit per component Graphics 8X SIMD Vec4 IEEE 16-bit per component 1024 threads Pipeline PixelPixel Front EngineEngine https://github.com/hallahan/presentations/blob/master/map-renderers-and-open-gl/index.html End

PUBLIC 15 Texture Engine GC SERIES • Retrieve texture and color via texture coordinate

AHB 2X AXI Host Interface

Memory Controller

Texture EngineTexture 3-D Rendering Engine Engine • Interpolate and filter texture and provide to ScalarMorphic Unified Shader 4X SIMD Vec4 IEEE 32-bit per component Graphics 8X SIMD Vec4 IEEE 16-bit per component 1024 threads Pipeline PixelPixel Front EngineEngine End

PUBLIC 16 Pixel Engine GC SERIES • Sample and Filter final pixels onto screen.

AHB 2X AXI Host Interface

Memory Controller

Texture EngineTexture 3-D Rendering Engine Engine

• ScalarMorphic Unified Shader Blend and Resolve Cache Tiles to Memory 4X SIMD Vec4 IEEE 32-bit per component Graphics 8X SIMD Vec4 IEEE 16-bit per component Pipeline 1024 threads Pixel http://aidiri.com/3d/?p=215 Pixel Front EngineEngine End

PUBLIC 17 i.MX 8: ‘XS’ GPU

Tessellation Shading • GPU hardware function that increases the detail of given polygons via a shader program. Polygons are subdivided into smaller and higher resolution polygons programmatically by the tessellation shader.

• Promises close to “unlimited detail” and greatly reduces 3D model sizes, memory requirements and GPU workload.

• Great for detailed maps (navigation), car or other 3D models, extremely detailed GUI designs, etc. with significantly lower starting polygon counts of the 3D content.

• Couple with displacement maps for adding details to generated mesh.

PUBLIC 19 i.MX 8: ‘XS’ GPU

Geometry Shading • GPU hardware function that can either modify existing 3D primitives that come through the “pipeline” or create NEW polygons programmatically to inject into the 3D pipeline.

• Geometry shaders run AFTER tessellation shading for almost unlimited scene flexibility.

• Unlimited use cases – usable for terrain generation in automotive mapping, but real world application has mostly been in gaming so far.

PUBLIC 20 04. Vulkan Support

PUBLIC 21 What is Vulkan?

• Vulkan is … − Revolutionary Hi Performance GPU API. − Cross platform API − Open Source Standard − Meant for Graphics & Compute Workloads

• Vulkan is NOT… − The successor of OpenGL − The race of Spock (‘k’ not ‘c’)

PUBLIC 22 Next Generation GPU APIs

PUBLIC 23 How Did Vulkan Come to Be?

• As graphics APIs evolved, backwards compatibility as a paradigm became unwieldy. • Embedded Version of OGL emerged to simplify and enhance usage for size and constraints. OpenGL 4.4

OpenGL 4.0

OpenGL 3.2 OGL ES 3.2

OpenGL 3.0 OGLES 3.1

OpenGL 2.1 OGLES 3.0 Vulkan OpenGL 2.0 OGL ES 2.0 1.0 OpenGL 1.5 OGLES 1.1

OpenGL 1.1 OGLES 1.0

PUBLIC 24 Excess Baggage from Backwards Compatibility

• Supporting scenarios and adding driver complexity for unlikely paths. − Driver overhead for unnecessary driver paths

• Resource management

• Lacking support for multiple GPU cores in a context

• Driver complexity leads to reduced reliability on the driver side.

PUBLIC 25 How Will Vulkan Help Embedded Graphics?

• Benefits Performance − Render more with the same GPU − Render “the same” with a lower cost GPU − Less latency for all use cases

• Thin simple driver − More robust − Smaller code footprint

• Naturally Cross platform − Shares all same code – except windowing - on simulation and target

PUBLIC 26 05. VX – The Vision Extension

PUBLIC 27 i.MX 8: ‘VX’ GPU

ARM Cores Cortex-A53 | Cortex-A72

CPU

GPU + Vision  Vision Optimized Dynamic VLIW Architecture  Interleaved Multi-Threading Unified Cache  Stream Interconnect to Hardwired Vision Functions  Extended Vision Instruction Set (EVIS)  Accessible via OpenVX, OpenCL VX extensions, VXC

PUBLIC 28 VX Engine on i.MX8 = Cordic Engine

From (Suchitra Sathyanarayana, 2007).

PUBLIC 29 Open Source APIS for Vision & Compute on i.MX 8

OpenGL ES Compute Shaders (OpenGL ES 3.1, OpenGL 4.x) Pervasively available on almost any mobile device or OS Easy integration into graphics apps – no vision/compute API interop needed Program in GLSL not C Limited to GPU acceleration

Vulkan Compute (Vulkan 1.0) New bare-metal API for graphics and compute Thin driver, explicit GPU control Higher performance at cost of high code complexity Program in SPIR-V, or cross compile from GLSL

OpenCL (OpenCL 1.2 FP. OpenCL 2.0 driver available in 2H 2017) Flexible, low level access to any devices with OpenCL compiler Single programming and run-time framework for CPU’s, GPUs, DSPs, hardware Open standard for any device or OS – being used as backed by many languages and frameworks i.MX8 VX instructions accessible via extensions! Needs full compiler stack and IEEE precision

OpenVX – Out Of The Box Vision Framework (OpenVX 1.0.1) Can run some or all nodes on dedicated hardware – no compiler needed Higher level abstraction means easier performance portability to diverse hardware Graph optimization opens up possibility of low-power, always-on vision acceleration Fixed set of operators – but can be extended Custom OpenVX nodes can be built with OpenCL or GLSL on programmable devices! PUBLIC 30 i.MX 8QuadMax Pixel Pipe – CNN example

MIPI-CSI2 ICS CPU GPU0 GPU1 DPU MIPI-DSI 4 lane 4 lane

Segment MIPI-CSI2 Collect Machine ation & Display HDMI/eDP 4 lane Video Decision CNN Render

Ethernet LVDS AVB

PUBLIC 31 iMX 8 Vision Algorithms in development

• Surround View − 2D Top-down & 3D bowl view, static calibration for low-end tier − Structure-from-motion 3D point cloud extraction for occupancy mapping Surround View − Real-time stitch re-calibration for high-end tier, based on SfM • CNN object classification − CNN engine on GPU0 CNN Object Recognition − ISP, Segmentation & Rendering on GPU1 • Stereo Vision − 3D depth mapping − 3D Object detection Stereo Vision • SoftISP Bad pixel --Histogram Bayer Noise RGB − ISP implementation in C, OpenCL, OpenVX detection & --White balance Demosaic input reduction output correction --Equalization • AR HUD SoftISP − Low latency view frustum adjustment, rendering based on view vector tracking PUBLIC 32 OpenVX Programming Framework

• Directed Acyclic Graph (DAG) Framework Pipeline − Optimized precompiled kernels of commonly used vision processes . A subset of OpenCV that lends itself to HW Acceleration • HW Vendor can create hardened / silicon aware specialized kernels • App Developer can create unique shader-based kernels using OpenCL or OpenGL APIs

− OpenVX Graphs can split, join, delay, and produce callbacks depending on heuristics. . OpenVX Primitives include: Images, Image Pyramids, Process Graphs, Kernels, Control Parameters

OpenVX Node Native OpenVX OpenVX Downstream Camera Node Node Application Control OpenVX Processing Node

Example OpenVX Graph

PUBLIC 33 Optimized Kernels for OpenVX

PUBLIC 34 05. Dual GPUs

PUBLIC 35 PUBLIC 36 Dual GPUs - Pitfalls of GPU Virtualization

• Increased Software Complexity − in the OS and kernel domains (debugging, stability)

• Performance Hit − Virtualization overhead, especially with API trapping techniques and multiple context switching

• Monolithic GPU’s require extra HW − to hide the complexity of command buffer management and TDM (Time Division Multiplexing)

• If Monolithic GPU goes down, everything goes down − Means a driver bug fail in secure vision path can take down the HMI

PUBLIC 37 i.MX 8 Family eCockpit Solutions

Optional High Performance 3D High Performance 3D NXP’s HUD Secondary Display Cluster Infotainment i.MX 8 architecture Display is designed to eliminate complicated software virtualization using hardware i.MX 8 Display Processor 0 i.MX Display Processor 1 separation to streamline e-cockpit development requiring high levels of 3D GPU Complex safety, security and integration i.MX 8 GPU Core 0 i.MX 8 GPU Core 1

Advantages ASIL-B OS OPEN OS

• Dual Independent Display Controllers HYPERVISOR

• Configurable GPU Cores • Single Monolithic GPU • Dual Independent GPUs

• SoC Level Virtualization A5x Series A7x Series CPU Cluster 0 CPU Cluster 1

PUBLIC 38 06. Display Infrastructure

PUBLIC 39 i.MX 8 Display Support

PUBLIC 40 i.MX 8 Display Controller 2 Display Processor Instances: i.MX 8 QuadMax, QuadPlus, Quad • All i.MX 8 Display Processors will have the same feature set and share the same programming model / code base • Fetch from memory and generate pixel streams for “pixel link” interface 4K (4x1080p) Display Support using dual output ports • Each pixel stream can be directed to 1 Display Processor Instance: a selection of output PHYs i.MX 8 QuadXPlus, DualXPlus, DualX − LVDS / DSI / HDMI

2K (2x1080p) Display Support using dual output portsPUBLIC 41 i.MX 8 Display Controller

18 Layer i.MX 8 Display Composition Engine (32 bit) Composition 4 Constant Background Planes (RGBA) 2 Display Output Streams (independent panels)

10 bits per color component (30-bit resolution)

Multi-planar Video 18 Total layers composition at 300 Mpixels / second

Real-time blend and Warp, no DDR passes req’d or Powerful Real Time Automatic Safety Stream Panic + Detection using Safety Display stream Warp 8 Layers Blend Engine CRC Matching

or Content_0 Content_1 Safety Display stream Fractional 8 Layers Safety_0 Safety_1

Display Dual Display Stream Port Stream Multi-planar Video Frame Sync Gamma, Brightness, Contrast Region CRC Dither (Physical resolution) Checker

Display Output Display Output

PUBLIC 42 i.MX 8 e-Cockpit Cluster Safety Strategy

CRC Checking Zones HUD

i.MX 8 GPU Core 0

i.MX 8 Display Controller 0 55 MPH 0.4 mi Turn ►► (2 Ports) Cluster

Detects CRC MISMATCH

Failover images running in background

i.MX 8 GPU Core 0

i.MX 8 Display Controller 0 55 MPH 0.4 mi Turn ►► Safety Streams x 2

Automatically Switchover to Simplified Cluster / HUD

PUBLIC 43 i.MX 8 Display Controller inputs

• 4 independent source planes • Warp and fractional can combine from 8 sources • Warp and fractional can be configured as dedicated safety streams • AXI Access permissions configurable per plane for safety

PUBLIC 44 Video plane Fetch unit i.MX 8 Display Controller inputs Decode Decompression

Warp plane Clip Clip window Window Fetch unit Fractional plane Re- Clip Clip window Flip/Rotate/Repl./Drop sample Fetch unit Window Plane composition Clip Clip window Window Plane composition Re- Flip/Rotate/Repl./Drop Palette Color palette sample Arbitrary warping Extract & Re- Raw to RGBA/YUV Expand Bitwidth expansion sample Flip/Rotate/Repl./Drop Extract & Raw to RGBA/YUV Expand Bitwidth expansion Combine Planar to packed Palette Color palette Planar to packed Combine Chroma YUV422 to 444 Extract & Raw to RGBA/YUV Expand Bitwidth expansion Color YUV to RGB Color YUV to RGB

Color YUV to RGB Gamma Gamma removal Gamma Gamma removal

Multiply Alpha multiply, RGB pre-multiply Gamma Gamma removal Multiply Alpha multiply, RGB pre-multiply

Video processing block Multiply Alpha multiply, RGB pre-multiply Matrix Brightness, Contrast, Saturation Filter Bilinear filter

Scaler Upscale, Deinterlace (Bob) PUBLIC 45 i.MX 8 Display Controller blending

• 4 constant planes = background color generator per plane. • Plane can only be assigned to one output stream • Configurable blend order • Frame generator can toggle between Content stream, Safety Stream and Constant plane based on safety configuration

PUBLIC 46 i.MX 8 Display Controller output

• CRC checker with 8 stackable regions, maskable, exclusive top-to-bottom priority. • CRC check can be inserted after any stage in the post-processing pipe • CRC failure can generate SW interrupt, or switch the Frame Gen to either Safety Stream or Constant Plane.

PUBLIC 47 DPR display interface cache Display channel • Each display plane has a multi line cache • This contains 8 lines of pixels for each plane − RGB, YUV etc formats supported − Supports Video and GPU tile formats Multi line Prefetch buffer • Contents are fetched from memory to fill cache ahead of time − Horizontal and vertical fetches supported • Warp fetches not supported, require bypass.

Prefetch

Memory

PUBLIC 48 MX8QM display support i.MX 8QM

CLK +/- CLK+/- (up to 1.5GHz) DC0#0&1 D2 +/- D0+/- HDMI DC0#0 DC D1 +/- Display 1 D1+/- MIPI DSI D0 +/- D2+/-

D3+/- CLK+/- (up to 150MHz) D0+/- SerializerDisplay3 3 LVDS D1+/- or DC0#1 D2+/- D3+/- Display 3 CLK+/- (up to 1.5GHz) D4+/- D5+/- D0/D4+/- DC1#0 D6+/- D7+/- Display 2 D1/D5+/- MIPI DSI CLK+/- (up to 150MHz) D0+/- Display4 D2/D6+/- LVDS D1+/- D2+/- DC1#1 D3+/- D3/D7+/- D4+/- D5+/- D6+/- PUBLICD7+/- 49 i.MX 8X Display Support i.MX 8X

CLK+/- (up to 150MHz)

Serializer 1 D0+/- or LVDS DC#0 CLK (up to 120 MHz) Display 1 D1+/- or HSYNC (150MHz) MIPI DSI D2+/- VSYNC D3+/- Serializer 3 18-bit ENABLE or PARALLEL RESET Display 3 CLK+/- (up to 150MHz) D0 . Serializer 2 D0/D4+/- . DC#1 . or LVDS . . Display 2 D1/D5+/- or D17 (150MHz) MIPI DSI D2/D6+/-

D3/D7+/-

PUBLIC 50 i.MX 8X Display Support i.MX 8X

CLK+/- (up to 85MHz)

Serializer 1 D0+/- or CLK (up to 120 MHz) Display 1 D1+/- LVDS HSYNC (150MHz) D2+/- VSYNC Serializer 1 D3+/- Serializer 3 DC#0 18-bit ENABLE or or PARALLEL Display 1 RESET Display 3 (85MHz) CLK+/- (up to 85MHz) D0 . Serializer 2 D0/D4+/- . . or . . Display 2 D1/D5+/- LVDS D17 (150MHz) D2/D6+/-

D3/D7+/-

PUBLIC 51 Display Output Capabilities (Dual Display Port) 300MHz pixel rate with 25% blanking distance for each port

Display Use Case Total # of Controller 1 Controller 2 Max. Resolutions Example Displays

3840 x 2160 (60 FPS) Large Screen 1   HW combined dual pixel port

  2 Wide Screens 2 2880 x 1440 (60FPS)

3 @ 2880 x 960 (60FPS) 3 Screens Wide 3 or   3 Screens Normal 3 @ 1280 x 1024 (120 FPS) 3840 x 720 (60FPS) Multiple Screens 3 1280 x 720 (60FPS)   Cluster, HUD, IVI 1920 x 1080 (60FPS) 1920 x 720 (60FPS) Multiple Screens 1920 x 720 (60FPS) 4   Cluster, HUD, IVI, RSE 1920 x 1080 (60FPS) 1920 x 1080 (60FPS)

PUBLIC 52 Display -> ISI loopback

• We can loop the output stream of one of the displays back to memory − Supports RGB/422 formats − Can be done at SAME time as output but limited to 300Mhz single channel (2500x1600) roughly − Uses ISI input path (behaves like a camera) • For QXP − Also supported, and can be used for parallel output path . From Display -> ISI -> Memory -> LCDIF - > parallel output . Limit ~800x600

PUBLIC 53 07. 2D GPU

PUBLIC 54 i.MX 8 2D Composition Engine = Efficiency, Performance and Safety • 2D Graphics Engine Support: Source Buffers i.MX 8 2D Blitting Engine (32 bit) AXI Read Master 3 Source Pipes and 1 Destination Pipe 800 Mpixels / second for fill-rate • Reduces burden on GPU: Allows the 3D GPU to be a 3D 400 Mpixels / second for copy, blend, ROP, Scale 200 Mpixels / second for rotate, warp GPU Intelligent 2D Composition Matrix Filtering for Linear Color Conversion • Plays nice with video: Engine Color LookUp Table for color buffer compression Overlays native video and Free angle rotation(90/180/270), flip, mirror graphics with minimal trips to i.MX 8QM/QP/Q have over 2Gpixels/sec 2D Bandwidth system memory Dest. Buffer i.MX 8QXP/DXP/DX have over 1Gpixels/sec 2D Bandwidth* AXI Write Master

• Saves power: 3D Engine can Color Space Conv. Copy, Blend, ROP, remain off for windowing GUI’s Scale, Rotate, Warp (Android HW Composer) Affine Transformations, Linear Light

*Assumes: 1 or 2 complete sets of display processor + blitting engine. 800 Mpixels * 2 (fill-rate) + 300 Mpixels * 2 (Composition / Blend) PUBLIC 55 i.MX 8 2D Blitter

• The 2D Blitter is exposed via the NXP G2D API, already used by X11, Wayland • Built from the DPU IP • 3-Source blit, smooth scaling • Linear & non-linear color conversion • Warping • Compressed buffer R/W support (TGA & proprietary) • DDR to DDR only • Supports reading of VPU and GPU tiled formats via DPR • Output to GPU tiled format possible

PUBLIC 56 08. Image Capture

PUBLIC 57 i.MX 8 Image Sensing Interface (ISI)

• Combines Pixel DMA and M-JPEG processing • 4K@30Hz total capture capacity, across maximum of 8 sources • iMX8: 2x MIPI-CSI, HDMI, DPU loopback input • iMX8X: 1x MIPI-CSI and Parallel CSI • 8 pixel pipelines total • One input to multi output stream generation − Will consume multiple pipelines − SafeAssure feature • Capture to DDR only – no direct link to DPU • AXI Permission control per pipeline

PUBLIC 58 i.MX 8 NANCI (Not Another Camera Interface)

• Supported To Memory Pixel Formats − RAW6, RAW7, RAW8, RAW10, RAW12, RAW14 − RGB888, RGB444, RGB666, RGB565, RGB555 − YUV420 8-bit, YUV420 10-bit − YUV422 8-bit, YUV422 10-bit − Generic Byte Aligned Data (8-bit,16-bit,24-bit,32-bit) • Color Space Conversion (CSC) − RGB, YUV, YCbCr − User Defined Color Space Matrix-Based Conversion • Alpha Channel insertion for RGB formats − Global Alpha Value − Separate Rectangular Region of Interest Alpha Value • Downscaling (Bilinear filtering, decimation) • Mirroring (Image Flip): horizontal flip and Vertical flip supported • Frame Awareness, Frame Skipping − Clean frame start, shutdown based on HSYNC / VSYNC − Buffer overrun prevention • Simple De-Interlace support for MIPI-CSI Interlaced, Memory to Memory input sources − Weave, Discard (Line Doubling), and Linear Interpolation Supported

PUBLIC 59 iMX 8 M-JPEG

• Motion JPEG Engines − Memory to Memory Tiled Operation − ISO/IEC 10918-1 JPEG Compliance − Programmable Huffman Tables − Supports all possible scan configurations for input/output − Supports image size up to 64K x 64K − 4 Read, 4 Write DMA streams • Encode − Motion JPEG payload encoding − Programmable Quantization Tables with configurable quality − One pass compression ratio regulation − Block based rate control option with independent luminance and chrominance thresholds • Decode − Motion JPEG payload decoding − 8-bit or 16-bit quantization tables − 8-bit and 12-bit per sample − Supports restart markers

PUBLIC 60 09. Video

PUBLIC 61 i.MX 8: Video Processing Subsystem (VPU) overview

• HEVC Support HEVC @ 8Mbps − Up to 4k30 Decode support − Up to 50% lower b/w than h.264 − Same quality • Full Legacy Codecs h.264 @ 16Mbps − H.264 − DivX − VC-1 − MPEG-2 − MPEG-4 i.MX 8 Video Subsystem − VP-8 • Decode • Decode • Decode − H.265 (4kp30) − H.265 (4kp30) − H.265 (4kp30) − H.264 (4kp30) − H.264 (4kp30) − H.264 (4kp30) − MPEG-4 (1080p60) − MPEG-4 (1080p60) − MPEG-4 (1080p60) − MPEG-2 (1080p60) − MPEG-2 (1080p60) − MPEG-2 (1080p60) − VC-1 (1080p60) − VC-1 (1080p60) − VC-1 (1080p60) − DivX (1080p60) − DivX (1080p60) − DivX (1080p60) − VP8 (1080p60) − VP8 (1080p60) − VP8 (1080p60) i.MX 8QuadMax i.MX 8QuadPlus i.MX 8Quad • Encode • Encode • Encode − H.264 (1080p60) − H.264 (1080p60) − H.264 (1080p60) − JPEG − JPEG − JPEG

PUBLIC 62 i.MX 8 Series Video key features

• Decoder: − Legacy Decoder: 4Kp30 AVC/H264 Baseline, Main, High profile, 1080p60 MPEG-2, VC-1, MPEG4 part 2, VP8, VP6, RV9, AVS, MJPEG, JPEG decoder with no de-interlacing − HEVC Decoder: 4Kp30 8/10bit HEVC/H.265 Main, Main10 decoder with no scaling • Encoder: − 1080p60 AVC/H.264 Baseline, Main, High profile encoder • Scaler: − Memory to memory scaler to produce scaled luma search buffer for encoder first pass and to convert raster format 4:2:2 YUV into 8xn tiled 4:2:0 2 or 4 buffer YUV (luma top, luma bottom, chroma top, chroma bottom) No input scaling • Transport stream: − TS input and decryption engine with 2 serial inputs and Multi2 decryption for Japanese Terrestrial broadcast (XUVI)

PUBLIC 63 i.MX 8 Series Video block diagram

• Multi-stream playback (up to 4 simultaneous streams) Examples include 2 HEVC 1080p60 decode streams, 4 H.264 1080p30 decode streams, 4 H.264 SD decode streams, etc.1080p60 • Multi-stream encode (up to 4 simultaneous streams) Examples include 2 H.264 1080p30 encode streams, 4 H.264 SD encode streams, etc.

Host interface Cortex-A53 (FW)

Video Decoder

Video External Memory Encoder

XUVI TS TS

PUBLIC 64 i.MX 8 X: Video Processing Subsystem (VPU) overview

• HEVC Support HEVC @ 8Mbps − Up to 1080p60 Decode support − Up to 50% lower b/w than h.264 − Same quality • Full Legacy Codecs h.264 @ 16Mbps − H.264 − DivX − VC-1 − MPEG-2 − MPEG-4 i.MX 8 Video Subsystem − VP-8 • Decode • Decode − H.265 (1080p60) − H.265 (1080p60) − H.264 (1080p60) − H.264 (1080p60) − MPEG-4 (1080p60) − MPEG-4 (1080p60) − MPEG-2 (1080p60) − MPEG-2 (1080p60) − VC-1 (1080p60) − VC-1 (1080p60) − DivX (1080p60) − DivX (1080p60) − VP8 (1080p60) − VP8 (1080p60) i.MX 8QuadXPlus i.MX 8DualXPlus • Encode • Encode − H.264 (1080p30) − H.264 (1080p30) − JPEG − JPEG

PUBLIC 65 i.MX 8 X Series Video key features

• Decoder: − Legacy Decoder: 1080p60 AVC/H264 Baseline, Main, High profile, 1080p60 MPEG-2, VC-1, MPEG4 part 2, VP8, VP6, RV9, AVS, MJPEG, JPEG decoder with no de-interlacing − HEVC Decoder: 1080p60 8/10bit HEVC/H.265 Main, Main10 decoder with no scaling

• Encoder: − 1080p30 AVC/H.264 Baseline, Main, High profile encoder

PUBLIC 66 i.MX 8 X Series Video block diagram

• Multi-stream playback (up to 4 simultaneous streams) Examples include 2 HEVC 1080p30 decode streams, 2 H.264 1080p30 decode streams, 4 H.264 SD decode streams, etc.

• Multi-stream encode (up to 4 simultaneous streams) Examples include 2 H.264 SD encode streams, etc.

Video Host interface Decoder External Memory

Video Encoder

PUBLIC 67 10. Putting it all together

PUBLIC 68 Graphics Rendering is a Journey…

PUBLIC 69 ..on a very busy freeway CPU Clusters

• Going full throttle!

− Full compatibility of buffer formats between i.MX8QM - 25.6 GB/s pixel producers / consumers 3D 32-bit DDR4 − No DDR-to-DDR format conversion 2D M necessary, including GPU resolve e 32-bit DDR4

− Compressed format support for 2D Blitter and 2 Port m Display o DPU VPU r − Full GL range of compressed texture support y 3D − DPR for optimal DPU access patterns Image 8 Input B Capture Channels − Unified memory; 0-Copy 2D U S Audio 2 Port Display IO

PUBLIC 70 NXP and the NXP logo are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2017 NXP B.V.