Multiprocessors in Wireless Multimedia Terminals

Mika Kuulusa / Technology Platforms / Product Platforms

15 August 2006 MPSOC Forum, Colorado, USA

1 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Outline

• Nokia HW/SW and Platforms • Multimedia Computers and Teardown • Power Consumption • Multimedia Application Processors • General-Purpose Processors/Multicore • Multimedia Processors • Key Messages

2 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Nokia Structure Business Groups

Mobile Multimedia Enterprise Networks Customer and Phones Solutions Market Operations

Technology Platforms

Brand and design Developer support Horizontal Groups Horizontal Research and venturing Business infrastructure

CorporateCorporate FunctionsFunctions

3 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Nokia HW/SW Platforms

• Complete, verified HW/SW engines with memories, EM, displays and cellular/proximity radio modems. • Business groups take a variety of chipsets according to product needs (low/mid/high-end). • Product price point generally specifies the chosen platform. • Multimedia accelerators extend features in high-end terminals. • S60 / Symbian 9.1 • S40, S30 / Nokia RTOS • Linux 2.6

4 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 S60 Platform

• Complete software package for . • S60 UI concept: Global design & UI system implementation including Symbian optimisations. • Application suite: Telephony, messaging, browsing, PIM, imaging, connectivity, etc. • Localised to over 30 languages including Chinese. • Licensed in source-code form with extensive documentation, consultation for product integration & modifications. • Native Symbian / S60 C++ & Java (MIDP) interfaces open to 3rd parties (S60 SDK) & closer partners. • S60 Release 2.6 and 2.8: Symbian version 8.x • S60 Release 3.0: Symbian version 9.1

5 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 S60/Symbian Architecture

S60S60 UI System & UIQUIQ Application NokiaNokia SymbianSymbian Suite PlatformPlatform App Engines & Common SymbianSymbian OSOS Middleware & Symbian IF OS ProviderProvider ModulesModules HW-SpecificHW-Specific Adaption SymbianSymbian SWSW CellularCellular Layers ModemModem SWSW

HW & Cellular ApplicationApplication HWHW CellularCellular HWHW Modem IF

6 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Mobile Technologies and My Favourite Features Real-Time Gaming Audio Streaming Radio Recording 1”HDD OpenGL ES Video Streaming Mobile Web Server MMC/SDCard MPEG4 HDMI PodCasting TV-Out MP3/AAC Web Browsing HS-USB VoIP GPS Mediabox Headset DVB-H Office Speaker Voice FM Radio Location-Based WLAN Applications Accelerometer WiMax Applications WCDMA/HSDPA See-What-I-See Real-Time Calendar Wireless USB GSM/GPRS/EDGE DTM 2-Way Video Call Instant Sharing SMS Instant Messaging Your Killer App

7 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Multimedia Computers 770: Internet Web Tablet • Hildon UI/Apps, Linux OS 2.6 kernel Nokia 770 • 800 x 480 (24bpp) • 250MHz ARM926 + C55 DSP (OMAP1710) • WLAN 802.11b/g, Bluetooth 1.2 • 1800mAh Lithium-Ion

Nokia N93

N93: Camcorder Phone • S60 3.0 Edition UI/Apps, Symbian OS 9.2 kernel • 320 x 240 (18bpp), TV-out, 3D Accelerator (MBX/VGP) • 330MHz ARM1136, 220MHz C55 DSP + IVA (OMAP2420) • GSM/GPRS/WDCMA (128/384kbps), WLAN 802.11b/g, Bluetooth • 1100mAh Lithium-Polymer

8 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Source: Portelligent, Inc. (printed with permission) Illustration: N90 Teardown

9 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Power Consumption Breakdown: 2-Way Video Call

3.0 0.6W Applications Processor

1.2W Wireless Modem 2.0

Display Backlight 0.4W 1.0 Display HW Speaker Power Consumption / W Camera Bluetooth Others 0

10 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Battery: 1000mAh, 3.7Wh MP3/MPEG4 File Playback Times Worst Case 3.0 3.0W

2.0

MPEG4 Video 1.0 1.2W 1.0W Power Consumption / W Consumption Power Backlight 0.6W MP3 Audio Idle 0.4W 0.3W 0 0.01W

Time: 15days 12:20 9:15 6:10 3:42 3:05 1:23 (h:min)

11 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Multimedia Application Processor

• ARM processor • Multimedia processing options: • Microcontroller + HW Image L1 Signal ARM Processor • DSP + HW Cache Processor • 2D/3D: acceleration for rendering vector gfx, user interfaces, games. • ISP: raw image enhancements for plain camera sensors (SMIA). • Memory: 166MHz 32-bit Mobile DDR- Memory / SDRAM, NOR/NAND Flash. L1 Cache 2D/3D Graphics • Peripherals: Display/TV-out, HS-USB Multimedia Acclerator (OTG), MMC, SDIO, I2C, SPI, UART. DSP/Accelerators • Die area: 40-60 mm2 • Price: 5-15 USD (>10M units) • 28M Smartphones shipped in 2005.

12 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Sources: arm.convergencepromotions.com, www.intel.com, EETimes 11/05, ARM DevCon 04. Application Processor vs. Mobile x86

Texas Instruments OMAP2420 Intel L2300 Duo Core (Yonah) ARM1136JF 330MHz L1 32/32kB 2CPU x86 1.5 GHz, L1 32/32kB, L2 2MB C55 220MHz L1 16 4/4/64/96kB + IVA, Area: 90.3 mm2, 65nm MBX/VGP 3D, 640kB SRAM Power: 15W (TDP) Power: 0.6W Price: 284 USD Price: <20 USD

13 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 General-Purpose Processors: ARM

• All Nokia terminals include one or more ARM processors: • ARM7TDMI (low-end), ARM926 (mid-end), ARM1136 (high-end). • Processors have MMU and L1. Lately L2-cached designs appearing in industry. • Mobile phones are not sold by GHz-CPU arguments. • ARM CPU power budget is 250mW. • Cortex-A8 and ARM MPCore are next generation application CPUs. • Implementation complexity and leakage/active power (45 nm) may limit reasonable CPU clock speeds around 800MHz. • Simultaneous Multithreading (SMT): probably not a good idea, consider x86. • 10-15% increase in core logic area. • Cache trashing from 2+ threads. • Symmetric Multiprocessing (SMP, Multicore): simpler CPU cores. • 100% increase in core/cache area. • 2x performance is possible.

14 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Sources: www.ARM.com, www.IBM.com, Spring Processor Forum 06, ARM Devcon 05. Cortex A8 area estimate: Cortex-R4 process scaling ratio 0.61 and cache areas. General-Purpose Processor Area

Die Area for 130nm CMOS 20 18 16 14 12 Cache 10

mm² Logic 8 6 4 2 0

/8 6 /16 1 8 6 1 ARM7EJ-S ARM7TDMI ARM968E-S ARM922T ARM946E-S 8/8 ARM920T A8 N 16/16 128 ARM MP11F 16/16 ARM926EJ-SPowerPC 16/16 405ARM1136J-S 16/16 16/16 ARM1136JF-S 16/16ARM1176JZF-SPowerPC 16/ 440 F 32/32 Cortex-

15 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Multicore and SMT Considerations

Pipeline Processor • CPU pipeline length traditionally Stages describes maturity of a processor ARM7E 3 microarchitecture. ARM926EJ 5 • Desktop/mobile x86 CPUs today have PowerPC 405 5 pipeline length around 10-16 stages. ARM1136JF 8 • Pipeline evolution in ARM processors: XScale (ARM) PXA27x 7 • ARM7 3 PowerPC 440 7 • ARM9 5 Pentium III 10 • ARM11 8 AMD Athlon 64 12 • Cortex-A8 13 Pentium M (Yonah) 13* • Optimizing memory architecture for faster random-access and wider Cortex-A8 13 buses can bring significant PowerPC 970 (G3) 16 performance increases. Pentium 4 20

16 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Speedup factors for 2CPU configurations. Multicore Performance 1.5x 1.8x 2x 1800 Very Parallel 1600

1400

1200 1CPU 1000 2CPU 3CPU 800 4CPU Performance Mostly Sequential 600

400

200

0 0 102030405060708090100 Parallelizable Program Code, %

Performance increase requires either: 1) multithreaded application or 2) parallel application usage scenario.

17 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Example: Parallel Usage Scenario

”Advanced end-user Ari does multiple things at the same time.” • Beginning: 1. Streaming audio from XM Radio. 2. Browsing website www.CNN.com (very tricky Java/tables/css). 3. Recording news video received from DVB-H. • Suddenly: 4. Push email downloads 1MB JPEG image (background). 5. Voice call comes in. • At the same, time many OS features are used in parallel: VoIP stack, HTTP/TCP/IP stack, Bluetooth stack, WLAN driver, telephony, MP3 decode, RTP/UDP stack, Java virtual machine, window server, fileserver, etc. • Requires high performance peak. Overload shows as bad user experience for the foreground application. • Any system stutter or unresponsiveness considered harmful.

18 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Multimedia Processors

• DSP and HW accelerator clock speeds are around 100-200MHz. Processors utilize DMA and large tightly-coupled memories. Typically there is no cache. • Application-specific hardware units provide performance boost for time- consuming kernels mostly found in video/imaging. • Three main processor subsystems for multimedia: • Video/audio processing • Camera post-processing • 2D/3D graphics acceleration • Fast camera serial-shooting sets very high peak-processing requirements for JPEG encoders: 20-40Mpxl/second for 5 images/second at 4-8MP resolution. • Mobile 3D graphics: OpenGL ES 2.0 will provide improved look of surface using programmable pixel/vertex shaders. • HW design cycles are long. Programmable DSP/SIMD processors are preferred for implementing new codecs and other unpredicted product features.

19 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 MPEG4 AVC (H.264) Processing

Decoding • MPEG4 Advanced Video Codec (AVC or H.264) decoder provides excellent quality for low bit rates, but requires 2-3x more processing than Simple Profile (H.263). • All-ARM11 SW implementation of CIF (352x288) and D1 (720x576) 30fps decoding requires approximately 400 and 1400 MHz (3.5x expected).

Encoding • MPEG4 AVC video encoding is roughly 4x more complex as decoding. • Using decoder estimates, CIF and D1 30fps encoding would require 1.6GHz and 5.6GHz ARM11. • MPEG4 AVC (H.264) codecs supporting D1 30fps need hardware acceleration. • Coarse comparison: an MPEG4 Advanced Simple Profile (ASP) implementation reports CIF encoding at 35fps with 1.66GHz Athlon CPU. D1 35fps encoding would require 6GHz.

20 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Related Topics

• Open interface standards from MIPI alliance: apps processor-modem, display/camera serial interface, Image L1 high-speed serial (SLVS), UniPro. SignalDSP/ ARM Processor Cache • Advanced multi-die packaging for ProcessorSIMD Processor digital ASICs: Product size reduction and performance improvements. • Standard DSP/SIMD-processor concept: Memory / L1 Cache 2D/3D • Enable reusing optimized software DSP/ DSP/ SIMD SIMD Graphics • Maximize silicon utilization ProcessorMultimedia ProcessorAcclerator • Instruction set architecture for DSP/Accelerators BB/camera/imaging/video • Specify only ISA, implement a family of processors or one processor (IP)

21 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 Six Key Messages

• Wireless multimedia computers are extremely power and heat limited gadgets. Fundamentally this limits amount of processing power integrated into device. • Power constrained CPUs are mandatory, but the most exciting features require system-level SW optimization. ARM Dhrystone_2.1 power budget is 250mW. • General-purpose processors will evolve to multicore due to increasing architecture/implementation complexity in the most advanced CPUs. • We will not get performance increase as ”a free lunch” anymore in multicore/SMT because applications need multithreading or parallel use case. • MIPI: key venue for specifying open, license-free mobile chipset interfaces. • Mobile industry could benefit from a standard, licensable multicore processor that is architected for video/imaging and baseband DSP processing.

22 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006 23 / 22 © 2006 Nokia Mika Kuulusa MPSOC Forum 2006