
Module Introduction

Purpose

• The intent of this module is to introduce you to the multimedia features and functions of the i.MX31. You will learn about the Imagination PowerVR MBX-Lite hardware core, graphics rendering, video processing, video encoding, and video decoding.

Objectives

• Identify the key features of PowerVR MBX-Lite.
• Describe multimedia capabilities of the i.MX31.
• Identify the features of the IPU.
• Describe MPEG-4 video encoding.
• Describe the role of H.264 decoding during video playback.

Content

• 15 pages
• 2 questions

Learning Time

• 25 minutes

Notice: Because of an order from the United States International Trade Commission, BGA-packaged product lines and part numbers indicated here currently are not available from Freescale for import or sale in the United States prior to September 2010: i.MX31 Product Family, i.MX31L Product Family.

The intent of this module is to introduce you to the multimedia features and functions of the i.MX31. You will learn about the Imagination PowerVR MBX-Lite hardware core, which provides high-performance 3D graphics rendering for less power and bandwidth than many traditionally architected accelerators. You will also learn about video processing, video encoding, and video decoding. Note that the i.MX31L does not have 2D/3D graphics acceleration; otherwise, unless specifically mentioned, all information in this module applies to both the i.MX31 and the i.MX31L.

PowerVR MBX-Lite

Key features of the MBX-Lite:

• Tile-based renderer
– Allows lower bandwidth to system memory vs. traditional architectures
– Allows high precision color and depth operations
• PowerVR Texture Compression (PVR-TC)

3D performance:

• Up to 1 million triangles per second
• 118 million pixels per second

Standard Features

• Flat and Gouraud shading
• Perspective texturing
• Specular highlights
• Two-layer multitexturing
• 32-bit Z support
• Full tile blend buffer
• Alpha test
• Full-scene anti-aliasing
• Per vertex fog
• 16-bit textures
• 32-bit textures
• YUV video textures
• Point, bilinear, trilinear and anisotropic filtering
• Full range of blend modes

The MBX-Lite uses a tile-based rendering technique to achieve high performance while keeping power and bandwidth low. The MBX-Lite is also able to yield higher precision color and depth processing.

The MBX-Lite further reduces bandwidth and memory consumption by providing PowerVR Texture Compression (PVR-TC). This reduces the size of textures, shrinking both their memory footprint and the overall size of applications.

In addition to these key features, the MBX-Lite supports up to 1 million triangles per second and 118 million pixels per second, allowing developers to create compelling 3D applications.

Lastly, the MBX-Lite provides a host of standard 3D features to support industry APIs and developers. Here you can see these 3D features.
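To make the memory saving from texture compression concrete, the following Python sketch compares the footprint of an uncompressed 32-bit texture with that of a PVR-TC texture. It assumes PVR-TC's 4 bits-per-pixel and 2 bits-per-pixel modes and ignores mipmaps and file headers; the function name and the 256 x 256 texture size are illustrative assumptions, not part of any Freescale or Imagination API.

```python
def texture_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one texture level (no mipmaps, no header)."""
    return width * height * bits_per_pixel // 8

# Hypothetical texture size, chosen only for illustration.
WIDTH, HEIGHT = 256, 256

uncompressed = texture_bytes(WIDTH, HEIGHT, 32)  # 32-bit texture
pvrtc_4bpp = texture_bytes(WIDTH, HEIGHT, 4)     # assumed PVR-TC 4 bpp mode
pvrtc_2bpp = texture_bytes(WIDTH, HEIGHT, 2)     # assumed PVR-TC 2 bpp mode

for label, size in [("32-bit uncompressed", uncompressed),
                    ("PVR-TC 4 bpp", pvrtc_4bpp),
                    ("PVR-TC 2 bpp", pvrtc_2bpp)]:
    print(f"{label}: {size // 1024} KB ({uncompressed // size}x smaller)")
```

Under these assumptions the texture shrinks from 256 KB to 32 KB or 16 KB, which is the kind of reduction in memory footprint and application size that the text describes.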

Tile-based Rendering

[Figure: a traditional 3D renderer compared with the MBX-Lite 3D renderer on i.MX31. Labels: all 3D data, intermediate data, resulting data, system memory, low-latency on-chip tile memory.]

In tile-based rendering, the system divides the 3D data into blocks that refer to rectangular regions of the display. This division allows the rendering to occur in one region at a time and uses far fewer resources than if the whole screen were considered at one time.

In traditional systems, all 3D data was saved to system memory. The MBX-Lite uses a set of small on-chip buffers that replaces the large, fast buffers of the traditional 3D renderer. Due to the order of rendering, only the resulting rendered scene is written out to system memory, and the on-chip memory absorbs the intermediate accesses.

In addition, the deferred aspect of a tile-based approach allows the renderer to read from system memory only the texture data that the end scene requires. For the i.MX31 unified memory architecture, this results in lower system bandwidth usage and less power drain. The increased bandwidth and lower latency of the on-chip buffers allow the system to afford higher precision calculations than those available in traditional architectures. This results in more accurate color values and fewer depth-based artifacts.
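The division of 3D data into screen regions can be sketched in a few lines of Python. This is a software illustration only: it bins triangles by their bounding boxes, which is a common simplification and not necessarily how the MBX-Lite hardware does it. The 16-pixel tile size, the function name, and the sample triangles are all assumptions, since the module does not give these details.

```python
from collections import defaultdict

TILE = 16  # assumed tile edge in pixels; the module does not state the real size

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins

# Two hypothetical screen-space triangles on a 64 x 32 pixel display.
triangles = [
    [(2, 2), (10, 3), (5, 12)],
    [(10, 10), (40, 12), (20, 30)],
]
bins = dict(bin_triangles(triangles, width=64, height=32))

for tile_xy in sorted(bins):
    print(tile_xy, bins[tile_xy])
```

Each tile can then be rendered on its own using only the triangles in its list, so the working set stays small enough for on-chip buffers and only the finished tile needs to be written to system memory.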

Graphics Partitioning

[Figure: the stages of the 3D pipeline and the hardware that handles each one.]

• Scene Management, Geometry Processing, Lighting: ARM11 with VFP
• Rasterization: MBX-Lite
• Display: IPU

To render a 3D image, the data must pass through a set of standard stages of processing. Let’s look at the hardware and software partitioning of these stages.

The ARM1136 handles the scene management, lighting, and geometry processing stages in software. These stages are accelerated by the vector floating point (VFP) unit on the processor. This eliminates the need to do costly floating point conversions and emulation.

The MBX-Lite 3D acceleration hardware handles the rasterization portion of the pipeline, which is traditionally the most bandwidth-intensive portion. This stage handles the interpolation of triangles, blending of colors, and occlusion checking. In addition, the tile partitioning is executed as a pre-processing step in hardware just prior to rasterization.

Lastly, the IPU handles the final compositing and display of the resulting 3D rendered image.

Graphics Software APIs

OpenGL ES

• Low level graphics API
• Open standard developed by the Khronos Group
• Available for non-Microsoft platforms for i.MX31

Direct3D Mobile

• Low level graphics API
• Microsoft mobile 3D API
• Available only for WinCE 5.0 devices

M3G / JSR184

• High level (scene-graph) based Java API
• Available for i.MX31 JVM

Depending on the platform, the i.MX31 provides one of three application programmer interfaces for accessing the capabilities of the MBX-Lite.

OpenGL ES provides a low-level hardware abstraction API for native programming on most operating systems. Based on a subset of the desktop OpenGL, this API is an open, royalty-free standard developed by the Khronos Group.

Direct3D Mobile is also a low-level API for 3D graphics accelerators. Similar to Direct3D version 8 for personal computers, Direct3D Mobile provides a comprehensive interface to 3D hardware for WinCE-based platforms.

For Java-based platforms, M3G provides a higher level scene-graph interface for 3D accelerators. While commonly criticized for its floating-point usage, M3G excels on the i.MX31 due to the integrated VFP unit.

Question

Which of the following statements about the tile-based rendering scheme of the MBX-Lite are true? Click all that apply, and then click Done.

a. Tile-based rendering allows lower system bandwidth.
b. Tile-based rendering allows better scene management.
c. Tile-based rendering allows higher texture compression.
d. Tile-based rendering allows higher precision color operations.

Done

Here is a question to check your understanding of the MBX-Lite.

Correct. Tile-based rendering allows lower system bandwidth and higher precision color operations.

Multimedia Capabilities

[Figure: i.MX31 multimedia highlights. Blocks shown: ARM11, VFP, IPU, MPEG-4, stereo DAC, WLAN baseband, 2 displays (18 bits), 2 sensors, TV encoder, MMC/SDIO, MS Pro, Flash card, USB HS, ATA HDD.]

• Up to 60 hours of MP3 playback at 128 Kbps
• Up to 480 Mbps synchronization speed
• 16 megapixels resolution in still picture capture
• Up to 10 hours of real-time video capture and encoding, VGA 30 fps
• 6 hours (3 full movies) of MPEG-4 decoding and playback, VGA 30 fps
• Up to 37 hours of viewfinder operation

The i.MX31 processor is optimized to support a variety of image and video applications. It offers power-efficient image and video processing, pre- and post-processing in hardware, simultaneous MPEG-4 Simple Profile (SP) video encoding and decoding, real-time video decode in advanced formats, and image capture of up to 30 megapixels per second. The video implementation in the i.MX31 processor is the result of a smart trade-off between performance and flexibility. With a VFP co-processor and L2 cache, the i.MX31 is designed for any wireless device running computationally intensive multimedia applications such as digital video broadcast and videoconferencing.

The i.MX31 has many multimedia highlights, including up to 60 hours of MP3 playback at 128 Kbps. It provides versatile connectivity to a variety of image sensors and display devices as well as many peripherals and expansion ports for devices such as MultiMedia Card™, Flash cards, SDIO, Memory Stick PRO, and HDDs. The synchronization speed is up to 480 Mbps.

Image capture in the i.MX31 can reach up to 30 megapixels per second, supporting VGA at 30+ fps in real time, 3 megapixels at 10 fps, and 16 megapixels for still picture capture.

Image and video processing is very power efficient in the i.MX31. In particular, pre- and post-processing is performed fully in hardware, and the viewfinder, with up to 37 hours of operation, does not involve the ARM CPU.

The i.MX31 supports simultaneous MPEG-4 SP video encoding and decoding with up to VGA at 30 fps and 3 Mbits per second. Encoding is accelerated in hardware (approximately 1300 MHz of equivalent ARM11 performance), and decoding is performed in software. Pre- and post-processing is performed fully in hardware, adding considerable processing power to the system (approximately 1200 MHz of equivalent ARM11 performance). Pre- and post-processing includes functions such as resizing, inversion, rotation, de-blocking, de-ringing, blending, and color space conversion.

The i.MX31 supports six hours of real-time video decoding and playback with VGA at 30 fps. Other features of MPEG-4 video decoding include hardware-accelerated post-filtering for MPEG-4 and hardware-accelerated in-loop de-blocking for H.264. The i.MX31 supports real-time video decode in the following advanced formats: MPEG-4 Simple Profile (SP), H.264, Windows Media Video™ (WMV), RealVideo™ (RV), MPEG-2, and DivX. Video conference calling is supported on the i.MX31 with up to VGA at 30 fps and 1 Mbps.
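The capture figures quoted above can be cross-checked with simple arithmetic. The Python sketch below converts each capture mode into pixels per second and compares it with the stated limit of 30 megapixels per second. It assumes VGA means 640 x 480 pixels and that a "3 megapixel" sensor delivers exactly 3,000,000 pixels per frame; the function and variable names are illustrative.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second for a given frame size and frame rate."""
    return width * height * fps

CAPTURE_LIMIT = 30_000_000            # "up to 30 megapixels per second" from the text

vga_30fps = pixel_rate(640, 480, 30)  # assumes VGA is 640 x 480 pixels
three_mp_10fps = 3_000_000 * 10       # assumes exactly 3,000,000 pixels per frame

print(f"VGA at 30 fps:  {vga_30fps / 1e6:.3f} Mpixels/s")
print(f"3 MP at 10 fps: {three_mp_10fps / 1e6:.3f} Mpixels/s")
print(f"VGA at 30 fps uses {100 * vga_30fps / CAPTURE_LIMIT:.1f}% of the capture limit")
```

Under these assumptions, 3 megapixels at 10 fps is the mode that reaches the 30 megapixels per second limit, while VGA at 30 fps needs roughly 9.2 megapixels per second, which is consistent with the "30+ fps" wording.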

Video Processing

[Figure: the video processing chain from image sensor to display. Blocks: format conversion, image quality enhancement, image conversion, viewfinder window, compression, combining with audio, memory, communication network, separation from audio, decompression, post filtering, image conversion. Signal formats: Bayer, RGB, YUV.]

Performed by:

• Camera (Image Signal Processing) (or ARM11 SW)
• IPU in i.MX31
• MPEG-4 Encoder in i.MX31
• ARM11 SW

Let’s examine the video processing chain and its implementation. Images are captured by a camera and input directly to the Image Processing Unit (IPU) via the sensor interface.

The IPU performs some very processing-intensive image manipulations, adding considerable processing power to the system: approximately 1200 MHz of equivalent ARM11 performance. The IPU includes all the functionality required for image processing and display management. It allows a camera preview function to be performed fully in hardware, allowing the CPU to be powered down in this stage. It performs post filtering for MPEG-4, including de-blocking and de-ringing, and it also performs in-loop de-blocking for H.264 as specified in this standard. Video and graphics can be combined, with transparency specified by a key color, a global alpha value, or per-pixel alpha values interleaved with the pixel components.

With regard to image conversion, the IPU provides a fully flexible resizing ratio between essentially any two resolutions. Pixel format conversion features include fully flexible conversion coefficients, color space, and color adjustments. Other IPU functions include filtering, 90, 180, and 270 degree rotation, and horizontal/vertical inversion.

The pre-processor is part of the IPU, and it resizes the data and performs color space conversion. The pre-processor can send data to a small viewfinder display, which provides visual feedback to the user to ensure that the desired data is being captured. The pre-processor then sends data to the MPEG-4 encoder, which performs compression according to the MPEG-4 video standard. The encoded data can be stored to file or sent to a communication network for later retrieval and playback.

Later, when the user wants to view the recorded video, the encoded data is retrieved and passed through the MPEG-4 decoder, which decompresses the data. The decompressed data is then sent to the post-processing module for quality enhancement, image resizing, and color space conversion. The data is then viewable on a display such as an LCD or TV monitor.
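The combining of video and graphics with a key color or a global alpha value can be illustrated per pixel. The Python sketch below is a software model written for this explanation, not the IPU's actual implementation: the function name, the 8-bit RGB tuple representation, and the rounding rule are assumptions. Per-pixel alpha would work the same way, with the alpha value taken from each graphics pixel instead of a single global setting.

```python
def combine(graphics, video, key_color=None, global_alpha=255):
    """Combine one graphics pixel over one video pixel (8-bit RGB tuples).

    If the graphics pixel equals key_color, the video pixel shows through.
    Otherwise the two are blended using global_alpha (0 = video only,
    255 = graphics only).
    """
    if key_color is not None and graphics == key_color:
        return video
    return tuple(
        (g * global_alpha + v * (255 - global_alpha) + 127) // 255
        for g, v in zip(graphics, video)
    )

MAGENTA = (255, 0, 255)  # hypothetical key color

# A keyed graphics pixel is fully transparent: the video pixel is returned.
print(combine(MAGENTA, (10, 20, 30), key_color=MAGENTA))

# Roughly 50% global alpha mixes the graphics and video pixels.
print(combine((200, 100, 0), (0, 100, 200), global_alpha=128))
```

A key color costs no extra storage per pixel but gives only on/off transparency, whereas alpha values allow partial transparency.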

Video Processing

Pre/Post processing:

• Performed fully in hardware
• Includes resizing, rotation and inversion, color conversion, de-blocking, de-ringing, and blending with graphics

Encoding:

• MPEG-4 SP (fully HW accelerated)
– High performance; up to VGA @ 30 fps; image quality not compromised
– Very power efficient
– CPU is totally free to perform other tasks
• Sufficient for most purposes:
– MPEG-4 SP is used for video conferencing
– MPEG-4 SP is supported by most video players
• Other standards are left to SW

Decoding:

• Post-filtering (de-blocking and de-ringing) is HW accelerated, providing significant acceleration.
• For H.264, the most processing-intensive standard, the de-blocking filter is HW accelerated.
• Other standards are implemented in software, enabling full flexibility to support a variety of algorithms and future extensions.
• This is enabled by the powerful ARM11 MCU and multilevel cache system.

The i.MX31 has built-in pre- and post-processing in hardware that includes all the functionality required for image processing and display management, including de-blocking, de-ringing, color space conversion, independent horizontal and vertical resizing, blending of graphics and video planes, and rotation in parallel to video decoding.

For video encoding, MPEG-4 SP and the H.263 baseline formats are fully hardware accelerated, supporting resolutions up to VGA at 30 fps. This achieves a high degree of power efficiency and frees the CPU to perform other tasks. It is sufficient for most purposes, as video conferencing and most video players support MPEG-4 SP. Software performs the encoding for other video standards.

For video decoding, the implementation is based on a mixture of software and hardware, which provides the greatest flexibility to support a variety of algorithms and future extensions. The advanced ARM11 instruction set and multilevel cache system optimize software. For MPEG-4, IPU hardware accelerates the post-filtering (deblocking and deringing), which results in a 75 percent load reduction on the ARM11 core. For H.264 baseline format, the most processing-intensive format, hardware also performs the deblocking filter, which provides a 30 percent acceleration improvement.

Software implements the other standards, which enables full flexibility to support a variety of algorithms and future extensions. The powerful ARM11 processor (including its multi-level cache system) provides the flexibility to decode any currently relevant format at a high rate (up to HVGA at 30 fps), as well as possible future extensions.

IPU

[Figure: the IPU within the i.MX31. Blocks: cameras, TV encoder, displays, graphics accelerator, IPU, MPEG-4 encoder, ARM11 CPU complex, EMI, memory.]

As you saw earlier, the IPU is at the heart of the video processing chain. It offers an integrative approach, including all functionalities required for image processing and display management. The IPU supports connectivity to a wide range of external devices including cameras, displays, graphics accelerators, and TV encoders and decoders. To support all these devices, the IPU has a synchronous interface and an asynchronous interface. The synchronous interface is for transfer of display data in synchronization with the screen refresh cycle. This interface is for memory-less displays and TV encoders, and it also transfers video to smart displays that have a video port. The asynchronous interface is for random read/write access to the memory and registers of smart displays and graphics accelerators. The data bus is 18 bits wide (or less), and it can transfer pixels of up to 24-bit color depth.

The interface with cameras and TV decoders is much more systematic than the interface with displays and requires much less flexibility. The interface receives one data sample per bus cycle, with 8 to 16 bits per sample. There is one exception, a nibble mode, in which 8-bit samples are received through a 4-bit bus, each during two cycles. Synchronization signals (Vsync, Hsync) are either embedded in the data stream, following the BT.656 protocol, or transferred through dedicated pins. The main pixel formats are YUV (4:4:4 or 4:2:2) and RGB. Any other format, such as Bayer or JPEG, can be received as generic data, which is transferred without modification to the system memory.
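The nibble mode mentioned above, in which each 8-bit sample arrives over a 4-bit bus in two cycles, can be modeled as follows. This Python sketch is only an illustration of the idea: the module does not say which half of the sample arrives first, so the `high_first` parameter is an assumption, as is the function name.

```python
def assemble_nibbles(nibbles, high_first=True):
    """Rebuild 8-bit samples from a sequence of 4-bit bus values.

    Each sample takes two bus cycles. Whether the high or the low nibble
    arrives first is an assumption controlled by high_first.
    """
    if len(nibbles) % 2:
        raise ValueError("nibble mode needs two bus cycles per 8-bit sample")
    samples = []
    for first, second in zip(nibbles[0::2], nibbles[1::2]):
        if not (0 <= first <= 0xF and 0 <= second <= 0xF):
            raise ValueError("each bus value must fit in 4 bits")
        hi, lo = (first, second) if high_first else (second, first)
        samples.append((hi << 4) | lo)
    return samples

# Four bus cycles carry two 8-bit samples.
bus_cycles = [0xA, 0x5, 0x0, 0xF]
print([hex(s) for s in assemble_nibbles(bus_cycles)])
print([hex(s) for s in assemble_nibbles(bus_cycles, high_first=False)])
```

The trade-off is pin count against throughput: the bus needs half as many data pins, but each sample takes twice as many cycles.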

IPU

[Figure: IPU block diagram with Synchronization & Control, Video Processing, a Sensor Port, a Display Port, AHB Master Ports to system memory, and an AHB Slave Port (IP port) to the ARM11.]

Sensor Port, interface to:

• smart image sensors
• raw image sensors
• camera flash support

Video Processing:

• Deblocking and deringing
• Resizing
• Color conversion
• Combining with graphics
• Inversion and rotation

Display Port, interface to:

• a smart/memory-less display
• a TV encoder
• a graphics accelerator

The IPU is equipped with powerful control and synchronization capabilities to perform its tasks with minimal involvement of the ARM CPU. The integrated DMA controller (with two AHB master ports) allows autonomous access to system memory. An integrated display controller performs screen refresh of memory-less displays. A page-flip double buffering mechanism synchronizes read and write accesses to the system memory to avoid tearing. The IPU also offers internal synchronization.

Here you can see the layout of the IPU. The sensor port provides an interface to smart image sensors, raw image sensors, and camera flash support. Video processing provides deblocking and deringing, resizing, color conversion, combining with graphics, and inversion and rotation. The display port provides an interface to a smart/memory-less display, a TV encoder, and a graphics accelerator.

With the ARM platform powered down, the IPU performs the following activities completely autonomously: screen refresh of a memory-less display, periodic update of the display buffer in a smart display, and display of a viewfinder window. When the system is idle, the user may want to display on the screen a changing image such as an animation or a running message. In the i.MX31, this can be performed automatically. The CPU stores in system memory all the data to be displayed, and the IPU performs the periodic display update without further CPU intervention.

Integration, combined with internal synchronization, avoids unnecessary access to system memory, so it reduces the load on the memory bus and power consumption. In particular, input from a smart sensor (in YUV or RGB pixel formats) can be processed on the fly before being stored in system memory, and output to a smart display can be processed on the fly while being read from system memory. In some cases, input from a sensor can be sent directly to a display without passing through system memory at all.

The integrative approach enables efficient hardware design in which the hardware is reused whenever possible for different applications. For example, the DMA controller is used for video capture, image processing, and data transfer to display. In addition, the image conversion hardware is used both for captured video (from camera) and for video playback (from memory).
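The page-flip double buffering mechanism mentioned above can be modeled in software. In the i.MX31 this synchronization is handled by the IPU hardware; the Python class below is only a conceptual sketch, and its class name, method names, and byte-array buffers are assumptions made for illustration.

```python
class DoubleBuffer:
    """Two frame buffers: one being displayed, one being written."""

    def __init__(self, size):
        self.size = size
        self.buffers = [bytearray(size), bytearray(size)]
        self.front = 0           # index of the buffer currently displayed
        self.back_ready = False  # True once a complete new frame is waiting

    def write_frame(self, data):
        """Writer side: fill the hidden buffer, never the displayed one."""
        if len(data) != self.size:
            raise ValueError("frame size does not match buffer size")
        self.buffers[1 - self.front][:] = data
        self.back_ready = True

    def on_vsync(self):
        """Reader side: flip only at a frame boundary, and only when a
        complete frame is waiting, so a half-written frame is never shown."""
        if self.back_ready:
            self.front = 1 - self.front
            self.back_ready = False
        return bytes(self.buffers[self.front])


display = DoubleBuffer(4)
display.write_frame(b"ABCD")   # written while the old frame is still displayed
print(display.on_vsync())      # flip happens here, at the frame boundary
display.write_frame(b"EFGH")
print(bytes(display.buffers[display.front]))  # still the old frame until the next vsync
```

Because the flip happens only between refresh cycles, the display never reads a buffer while it is being written, which is what prevents tearing.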

Question

Label the components in the IPU diagram below to show that you recognize the function of each. Drag the letters from the left to the corresponding positions on the right. Click “Done” when you are finished.

A. Interface to smart image sensors, raw image sensors, camera flash support
B. Autonomous access to system memory
C. Interface to a smart/memory-less display, a TV encoder, a graphics accelerator

[Figure: IPU diagram with Synchronization & Control, Video Processing, AHB Slave Port, and IP Port, plus three positions to label: Sensor Port (A), AHB Master Port (B), Display Port (C).]

Let’s review the functions of the components of the IPU.

Correct. The sensor port is the interface to smart image sensors, raw image sensors, and camera flash support. The two AHB Master Ports are for autonomous access to system memory, and the display port is the interface to a smart/memory-less display, a TV encoder, and a graphics accelerator.

MPEG-4 Encoding in Hardware

[Figure: data flow for MPEG-4 video capture in the i.MX31. Blocks: camera, IPU, MPEG-4 encoder, ARM11 CPU, EMI, display. Memory buffers: video input double buffer, reference frame buffer, VLC-encoded frame, MPEG-4 stream, graphics overlay, display double buffer.]

ARM Processing:

• MPEG-4 stream forming

Encoder Processing:

• Motion estimation, DCT & quantization
• Inverse quantization, IDCT & motion compensation
• Scan, run-length coding & Huffman coding
• Rate control

IPU Processing:

• For compression: de-interleaving
• For display (viewfinder): color conversion, combining with graphics
• For both (independently): resizing, inversion, rotation

Here you can see how data flows for video capturing using MPEG-4 encoding. IPU processing takes care of de-interleaving for compression; color conversion and combining with graphics for display (viewfinder); and resizing, inversion, and rotation for both compression and display (independently). Next, the encoder performs motion estimation, discrete cosine transform (DCT) and quantization, inverse quantization, inverse DCT (IDCT) and motion compensation, scan, run-length coding and Huffman coding, and rate control. Finally, the ARM takes care of MPEG-4 stream forming.

The video encoding hardware accelerator of the i.MX31 processor supports MPEG-4 SP (all levels) and H.263 baseline and enables pixel rates up to VGA at 30 fps and compressed bit rate up to 4 Mbps. This adds up to 1300 MHz of equivalent ARM11 performance. Two methods can detect that the encoding of one frame is finished: either poll the register 1 or catch the interrupt signal (IP Indigo IF).

The VGA MPEG-4 encoder in the i.MX31 has motion estimation capabilities with a motion vector length up to 32 pixels. VGA MPEG-4 encoding also includes error resilience tools as defined in the MPEG-4 standard. Additional features of the VGA MPEG-4 encoder include pre-processing for picture smoothing using a low-pass filter and camera movement stabilization, both of which are patented technologies.
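Several of the encoder stages listed above (DCT and quantization, scan, run-length coding) can be illustrated on a single 8 x 8 block. The Python sketch below is a simplified software model written for this explanation: the i.MX31 performs these steps in hardware, and the uniform quantizer, step size, and function names are assumptions. Motion estimation, Huffman coding, and rate control are omitted.

```python
import math

N = 8  # block size used by the DCT stage

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of rows)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantization with one step size (a simplification)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]

def zigzag_order():
    """(row, col) pairs in zigzag scan order, low frequencies first."""
    cells = [(r, c) for r in range(N) for c in range(N)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(values):
    """Encode nonzero values as (preceding_zero_count, value); trailing zeros are dropped."""
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# A flat block has energy only in the DC coefficient.
block = [[16] * N for _ in range(N)]
levels = quantize(dct_2d(block), step=8)
scanned = [levels[r][c] for r, c in zigzag_order()]
print(run_length(scanned))
```

For smooth image regions most quantized coefficients are zero, so the scan followed by run-length coding reduces the 64 values of a block to a handful of pairs before entropy coding.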

Video Playback: H.264

[Figure: data flow for H.264 video playback in the i.MX31. Blocks: ARM11 CPU, IPU, EMI, display. Memory buffers: H.264 stream, reference frame buffer, video output double buffer, graphics overlay, display double buffer.]

ARM Processing:

• Decoding except in-loop deblocking

IPU Processing:

• For in-loop deblocking: in-loop de-blocking
• For post-processing: resizing, color conversion, combining with graphics, inversion, rotation

Here you can see the data flow of video playback using H.264 decoding. ARM processing takes care of decoding except in-loop deblocking. IPU processing takes care of in-loop de-blocking, resizing, color conversion, combining with graphics, inversion, and rotation.

Module Summary

• Imagination PowerVR MBX-Lite
– High performance 3D graphics
– Less power and bandwidth than traditional architectures
• Three graphic software APIs:
– OpenGL ES
– Direct3D Mobile
– M3G/JSR184
• i.MX31 processor multimedia capabilities
– Power-efficient image and video processing
– Simultaneous MPEG-4 SP video encoding and decoding
– Real-time video decode in advanced formats
– Image capture of up to 30 megapixels per second
• IPU

In this module, you learned about the features and functions of the Imagination PowerVR MBX-Lite hardware core, which provides high-performance 3D graphics for less power and bandwidth than many traditionally architected accelerators. You also learned about the three graphic software APIs: OpenGL ES, Direct3D Mobile, and M3G/JSR184. Next you examined the multimedia capabilities of the i.MX31 processor, which include power-efficient image and video processing, simultaneous MPEG-4 SP video encoding and decoding, real-time video decode in advanced formats, and image capture of up to 30 megapixels per second. Finally, you learned about the features of the IPU.
