CARMA Energy-Efficient HPC on ARM

4th July 2012 – Munich, Germany

At the beginning…

What it used to be: at the beginning, ARM and x86 served two different and independent markets, divided by their features, price, performance and power consumption.

Two completely SEPARATE worlds!

[Figure: x86 and ARM in embedded systems, showing size of market against a broad set of markets and applications (security systems, telematics, A/V jukeboxes, slot machines, etc.).]

SECO® Proprietary & Confidential

ARM & x86 Convergence

[Figure: "The Next Billion Computers". Axis: power consumption, from 1 W to 1,000 W.]

ARM & x86 Convergence

[Figure: pie chart. x86: 15%, RISC: 85%.]

ARM & x86 Convergence

[Figure: chart by processor architecture (ARM, PowerPC, x86, SH, MIPS), 2007 to 2013; vertical axis from 0 to 6. Source: Semicast Research.]

ARM & x86 Convergence

[Figure: ARM annual shipments in billions of units (axis 0 to 7), by year, 2005 to 2014.]

The Best Standard For An Easy Migration

It is necessary to BE STANDARD!

The Best Standard For An Easy Migration?

The main questions

Which is the right platform to suggest?

How can we help our customers reach the market quickly?

How can we develop a cost-effective and innovative solution?

What about long-term product availability?

Why Qseven? Historical Background

The Qseven Consortium
• Founded in 2008
• Open to all companies in the embedded market
• First standard accepted by the SGET

Keys to its success
• Free industrial COM standard
• Legacy free
• Cost-effective MXM connector
• Low-power (12 W max TDP) and low-cost solution
• Flexible graphics: new graphics standards are supported
• Embedded API
• Fast serial interfaces
• Solid mechanical mounting (7 x 7 cm, 2.76” x 2.76”)
• Passive cooling
• Cross-platform ARM-x86

Why Qseven?

The Qseven Members

The Best Standard For An Easy Migration

ARM - Key Features

Industrial market: ARM & x86, two completely separate worlds...

• The low-power consumption platform par excellence

• Low-end, custom solutions

• “Microcontroller-style” approach for a friendly solution across most small/medium system integrators

• ARM processors offer many features through pin multiplexing

• Strong platform customization solutions available on the market => no second sources available

• Worse Time-To-Market because of software development

The Best Standard For An Easy Migration

ARM & x86 Convergence is the future

The TRADE-OFF: increasing complexity of ARM solutions!

• Traditional ARM developers now face new technological challenges. As a consequence, overall development costs are increasing, projects may become unfeasible and time-to-market goals may not be achieved.

• Work and added value are still mainly focused on the software.

The Best Standard For An Easy Migration

The wide range of possibilities offered by ARM processors can have negative effects: heavy customization is necessary, and obsolescence or a natural change in board requirements can lead to long re-engineering times.

Two different needs:

• Manufacturers need to produce standard products to be used by as many customers as possible, reducing engineering and manufacturing costs –> Enabling Economies of Scale

• Customers need to reduce resources invested in development, minimizing overall development and material expenses –> Shortest TTM!

Standards help ensure product availability over the years, second sources and cost reduction.

The Best Standard For An Easy Migration

ARM & x86 Convergence is the future

We can observe:
• The latest ARM processors often include standard x86 interfaces:
– PCI-Express
– DVI/HDMI video interface
– S-ATA
– MIPI DSI
• The latest ARM processors are multicore, enabling parallel computing and reducing overall power consumption while improving computational performance
• Convergence of performance: distributed vs. centralized computation often allows an ARM SoC to equal or overtake x86 performance (application-oriented approach vs. full general purpose)
• Convergence of OS and SW: traditional x86 operating systems now support ARM architectures (Linux / Windows CE / Windows Embedded Compact 7 / Windows 8)

The Best Standard For An Easy Migration

Which Architecture?

Thanks to the convergence of the latest ARM processors and standard x86 interfaces, you can easily migrate from x86 to ARM with the Qseven philosophy!

[Figure: x86 and ARM.]

Qseven 1.30: A Standard Evolution

Qseven 1.30: it is a standard oriented to the future!

rel. 1.1
4x PCI Express lanes
2x SATA, 8x USB 2.0
2x ExpressCard
SDIO, I²C Bus
High Definition Audio / AC'97 support
Gigabit / Fast Ethernet
LVDS 2x24 Bit / SDVO / HDMI / Display Port
Additional interfaces
rel. 1.30
Power: 5VDC
USB 3.0 support
Battery management
Up to 4-display support
Support for two single-channel LVDS displays
rel. 1.20
Support for two display ports
CAN
Support for MIPI DSI video port
SPI
More GPIO pins
Debug UART support
Two UARTs
PWM
Half size format proposed → less space requirements

CROSS PLATFORM

What is the Cross Platform?

The Cross Platform is the reference design for SECO Qseven modules.

With the Qseven Cross Platform, you can use x86 or ARM architectures, with full support for all peripherals listed in the Qseven standard specifications.

The Cross Platform Dev Kit

CROSS PLATFORM

SECO provides:

• Electrical schematics available for re-use in customers' applications

• BSPs (Windows, Linux, Android), ready to use, for running SECO Qseven modules on the Cross Platform

• Engineering team providing support for the use of schematics and BSPs

Thanks to the cross-platform reference design, customers can drastically reduce the time and cost required to implement tailored solutions!

CROSS PLATFORM

Advantages for HW developers:

- ORCAD schematics available to cut & paste

- BOM built with common off-the-shelf components, easily sourced from distributors

- Cost-optimized BOM: the Cross Platform is designed with the overall cost of the solution in mind

CROSS PLATFORM

Advantages for the end-product

• Performance / price scalability

• Possibility to switch from x86 to ARM modules on the same carrier board

• Visibility over future technologies thanks to a standard approach

SECO Qseven CROSS PLATFORM

The XPlat DevKit & secoqseven.com

Steps:

1. Access www.secoqseven.com

2. Download the Cross Platform datasheet and manual: http://www.secoqseven.com/en/item/cross-platformdevelopment-kit/

3. Access the Private Area and download the documents, schematics and BSP for your Qseven module.

The Cross Platform Starter Kit

Expansion Slots
• 1 x miniPCI Express slot
• SIM Card slot for miniPCI Express modems

Mass Storage
• 1 x S-ATA connector
• μSD Card slot

I/O
• Up to 7x USB ports (1 x USB client)
• 1 x Gigabit/FastEthernet connector
• 1 x optional additional FastEthernet port
• 8 x GPIO on 10-pin header connector
• 2 x serial ports (RS-232 / RS-422 / RS-485 configurable), one of them available at TTL level
• CAN interface
• 4-wire touch screen controller integrated
• SM Bus pin header
• I2C Bus, SPI interface

Audio
• AC’97 and HD Audio codec, jumper selectable
• Line In, Mic In on internal pin headers
• Earphone pin header

Video
• LVDS interface, 34-pin 2 mm pin header
• Backlight connector, 6-pin 2 mm pin header
• HDMI connector

CMOS Battery
• On-board rechargeable lithium battery for CMOS backup and RTC

Power
• 12V power jack
• Internal pin header for Power, Lid, Sleep and Reset buttons
• Power-on status LED

Temperature
• Operating: 0°C to +60°C
• Available in extreme version: -40°C to +85°C

Dimensions
• 100 x 72 mm (3.94” x 2.83”)

Qseven® specifications rel. 1.20 compliant

Qseven ARM for HPC?

CARMA concept

CARMA is an architecture for high-performance, energy-efficient hybrid computing

Schedule
• Motivation
• System Overview
• System Details

Motivation

HPC systems will be capped by power and thermal limits

• The world’s largest supercomputer systems are near their physical limits
• Broader market HPC installations are capped by pragmatic and site limits

The cluster revolution was driven by:
• Cost-effective computing
– Dollars per FLOP
• Transferable knowledge and accessibility
– Skills and tools developed on personal-scale machines
• Long-term viable architecture
– Commodity market components used at a larger scale

We now need to incorporate power-efficient computing

The next revolution: Power Efficiency!

Once again, look to commodity market for the next generation

Power-efficient computing is driven by phones and tablets
– ARM has an architectural and experience advantage
– System-level software complexity is high
  • Most power optimization work is being done for ARM

High-performance power-efficient computing from GPGPUs
– GPUs have an architectural efficiency advantage
– Many applications already effectively use GPUs

GPU: 225 pJ/flop
• Optimized for throughput and power efficiency
• Explicit management of on-chip memory

CPU: 1700 pJ/flop
• Optimized for latency
• Caches
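The energy-per-flop figures quoted above for the GPU (225 pJ/flop) and the CPU (1700 pJ/flop) can be restated as throughput per watt, which is often easier to compare. The sketch below is only an illustration that takes the slide's two figures at face value; the function names are hypothetical and nothing here is measured data.

```python
# Energy-per-flop figures as quoted on the slide (picojoules per flop).
PJ_PER_FLOP = {"GPU": 225.0, "CPU": 1700.0}


def gflops_per_watt(pj_per_flop: float) -> float:
    """Convert energy per flop (pJ) into throughput per watt (GFLOPS/W).

    One watt is one joule per second, so flops per joule equals
    flops per second per watt: 1 / (pJ * 1e-12) flops/J. Dividing by
    1e9 to express the result in GFLOPS/W gives 1000 / pJ.
    """
    return 1000.0 / pj_per_flop


def efficiency_ratio(better_pj: float, worse_pj: float) -> float:
    """How many times more flops per joule the first figure delivers."""
    return worse_pj / better_pj


if __name__ == "__main__":
    for name, pj in PJ_PER_FLOP.items():
        print(f"{name}: {gflops_per_watt(pj):.2f} GFLOPS/W")
    ratio = efficiency_ratio(PJ_PER_FLOP["GPU"], PJ_PER_FLOP["CPU"])
    print(f"GPU advantage: {ratio:.1f}x")
```

With the slide's numbers this works out to roughly 4.4 GFLOPS/W for the GPU against about 0.59 GFLOPS/W for the CPU, a ratio of about 7.6.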

Process technology: Fermi GPU at 40 nm, Westmere CPU at 32 nm.

Multi-core CPUs

• Multi-core as a first response to power issues
– Performance through parallelism, not frequency increases
– Slow the complexity spiral
– Better locality in many cases
• But CPUs have evolved for single-thread performance rather than energy efficiency
– Fast clock rates with deep pipelines
– Data and instruction caches optimized for latency
– Superscalar issue with out-of-order execution
– Dynamic conflict detection
– Lots of predictions and speculative execution
– Lots of instruction overhead per operation

Less than 2% of chip power today goes to flops!

Possible Power-efficient Future

Power-efficient general core combined with GPU
• Power control shared with mobile products
– Ultra-focused on power efficiency
– Aggressive market forces innovation
• Technology evolution driven by commodity market
• Bulk of compute power provided by inherently efficient GPUs

Increase to over 50% of chip power for flops!

Why CARMA?

• Provide a real platform for these future HPC systems

• Explore the efficiency and performance trade-offs for existing ARM+GPU systems

• Check, tune and evaluate CUDA-accelerated applications

CARMA Hardware Overview

Ultra low power host CPU
• Qseven module with Tegra T30 “Kal-El”, 2GB DDR3, 4GB eMMC
• Four ARM A9 cores with NEON and VFPv3 extensions

NVIDIA GPU for GPU computing
• Quadro 1000M on PCIe 4x
• 96 CUDA cores with 200 GFLOPS SP peak
• MXM module

CARMA Software Overview

• ARM Linux distribution
– Ubuntu 11.04 for ARM
– Linux 3.1.10 kernel
– Enhancements to support Tegra features
• CUDA 4.2 run-time and libraries
• Host x86 system support for cross development
– CUDA cross-compiler

Developer Information

• For support and questions, register on the CUDA DevZone
– http://www.nvidia.com/carmadevkit
– http://www.nvidia.com/devzone
• Future enhancements
– Native (ARM hosted) compile support
– Updated CUDA versions, e.g. CUDA 5.0
• Long term plans for the CARMA platform
– ARMv8 64 bit platform support

World’s First ARM CPU / CUDA GPU Supercomputer

• Mont Blanc Research project

• Exploring energy efficient supercomputer architectures

• Working towards exascale

• Developing applications

http://www.montblanc-project.eu

Can we build supercomputers from embedded technology?

Speaker: Gianluca Venere – Sales Director
E-mail: [email protected]

Thank You!

www.seco.com www.secoqseven.com