CARMA Energy-Efficient HPC on ARM
4th July 2012, Munich, Germany

At the beginning: what it used to be
At the beginning there were two different and independent markets, divided by features, price, performance and power consumption. Two completely SEPARATED worlds!

ARM & x86 Markets
Broad set of embedded applications:
• Security systems
• A/V jukebox
• Slot machines
• Telematics / ETC systems

SECO® Proprietary & Confidential

ARM & x86 Convergence
[Figure: size of market versus power consumption (1,000 W, 100 W, 10 W, 1 W); "The Next Billion Computers"]
[Figure: x86 15% versus RISC 85%]
[Figure: ARM, Power PC, x86, SH and MIPS, 2007 to 2013. Source: Semicast Research]
[Figure: ARM annual shipments, units in billions, 2005 to 2014]

The Best Standard For An Easy Migration
It is necessary to BE STANDARD!

The Best Standard For An Easy Migration? The main questions
• Which is the right platform to suggest?
• How can we let our customers reach the market quickly?
• How do we develop a cost-effective and innovative solution?
• What about long product life?

Why Qseven? Historical Background
The Qseven Consortium
• Was born in 2008
• Open to all companies in the embedded market
• First standard accepted by the SGET
The points of the success
• Free industrial COM standard
• Legacy free
• Cost-effective MXM connector
• Low-power (12 W max TDP) and low-cost solution
• Flexible graphics: new graphic standards are supported
• Embedded API
• Fast serial interfaces
• Solid mechanical mounting (7 x 7 cm, 2.76" x 2.76")
• Passive cooling
• Cross Platform ARM-x86

Why Qseven? The Qseven Members

ARM - Key Features
Industrial market: ARM & x86, two completely separated worlds...
• The low-power platform par excellence
• Low-end, custom solutions
• "Microcontroller-style" approach, a friendly solution for most small and medium system integrators
• ARM processors offer many features through pin multiplexing
• Strongly customized platform solutions are available on the market => no second sources available
• Longer time-to-market because of software development

ARM & x86 Convergence is the future
The TRADE-OFF: increasing complexity of ARM solutions!
Traditional ARM developers now face new technological challenges. As a consequence, overall development costs are increasing, projects may become unfeasible and time-to-market goals cannot be achieved. Work and added value are still mainly focused on the software.

The wide range of possibilities offered by ARM processors can have negative effects: very strong customization is necessary, and obsolescence or a natural change of board requirements can involve a long re-engineering time.

Two different needs:
• Manufacturers need to produce standard products that can be used by as many customers as possible, reducing engineering and manufacturing costs → enabling economies of scale
• Customers need to reduce the resources invested in development, minimizing overall development and material expenses → shortest TTM!

Standards help ensure product availability over the years, second sources and cost reduction.
ARM & x86 Convergence is the future
We can observe:
• The latest ARM processors often include standard x86 interfaces:
  - PCI-Express
  - DVI/HDMI video interface
  - S-ATA
  - MIPI DSI
• The latest ARM processors are multicore, enabling parallel computing and reducing overall power consumption while improving computational performance
• Convergence of performance: distributed rather than centralized computation often allows an ARM SoC to equal or overtake x86 performance (application-oriented approach versus fully general-purpose)
• Convergence of OS and SW: traditional x86 operating systems now support ARM architectures (Linux / Windows CE / Windows Embedded Compact 7 / Windows 8)

Which Architecture?
Thanks to the convergence of the latest ARM processors and standard x86 interfaces, you can easily migrate from x86 to ARM with the Qseven philosophy!

Qseven 1.30: A Standard Evolution
Qseven 1.30 is a standard oriented to the future!

rel. 1.1
• 4x PCI Express lanes
• 2x SATA, 8x USB 2.0
• 2x ExpressCard
• SDIO, I²C Bus
• High Definition Audio / AC'97 support
• Gigabit Ethernet / Fast Ethernet
• LVDS 2x24 bit / SDVO / HDMI / Display Port

Additional interfaces in rel. 1.20 and rel. 1.30
• Power: 5VDC
• USB 3.0 support
• Battery management
• Up to 4-display support
• Support for two single-channel LVDS displays
• Support for two display ports
• CAN
• Support for MIPI DSI video port
• SPI
• More GPIO pins
• Debug UART support
• Two UARTs
• PWM
• Half-size format proposed → less space required

CROSS PLATFORM
What is the Cross Platform?
The Cross Platform is the reference design for SECO Qseven modules. With the Qseven Cross Platform you can use x86 or ARM architectures, with full support for all peripherals listed in the Qseven standard specifications.
The Cross Platform Dev Kit

CROSS PLATFORM
SECO provides:
• Electrical schematics available for re-use in customers' applications
• Ready-to-use BSPs (Windows, Linux, Android) for the implementation of SECO Qseven modules on the Cross Platform
• An engineering team providing support for the use of schematics and BSPs
Thanks to the cross-platform reference design, customers can drastically reduce the time and costs required to implement tailored solutions!

CROSS PLATFORM
Advantages for HW developers:
- ORCAD schematics available to cut & paste
- BOM built with common off-the-shelf components, easily sourced from distributors
- BOM cost-optimized: the Cross Platform is designed with the overall cost of the solution in mind

CROSS PLATFORM
Advantages for the end product:
- Performance / price scalability
- Possibility to switch from x86 to ARM modules on the same carrier board
- Visibility over future technologies thanks to a standard approach

SECO Qseven CROSS PLATFORM
The XPlat DevKit & secoqseven.com
Steps:
1. Access www.secoqseven.com
2. Download the Cross Platform datasheet and manual: http://www.secoqseven.com/en/item/cross-platformdevelopment-kit/
3. Access the Private Area and download the docs, schematics and BSP for your Qseven module.
The Cross Platform Starter Kit

Expansion Slots
• 1 x miniPCI Express slot
• SIM card slot for miniPCI Express modems
Mass Storage
• 1 x S-ATA connector
• μSD card slot
I/O
• Up to 7 x USB ports (1 x USB client)
• 1 x Gigabit/Fast Ethernet connector
• 1 x optional additional Fast Ethernet port
• 8 x GPIO on 10-pin header connector
• 2 x serial ports (RS-232 / RS-422 / RS-485 configurable), one of them available at TTL level
• CAN interface
• 4-wire touch screen controller integrated
• SM Bus pin header
• I2C bus, SPI interface
Audio
• AC'97 and HD Audio codec, jumper selectable
• Line In, Mic In on internal pin headers
• Earphone pin header
Video
• LVDS interface, 34-pin 2 mm pin header
• Backlight connector, 6-pin 2 mm pin header
• HDMI connector
CMOS Battery
• On-board rechargeable lithium battery for CMOS backup and RTC
Power
• 12 V power jack
• Internal pin header for Power, Lid, Sleep and Reset buttons
• Power-on status LED
Temperature
• Operating: 0°C to +60°C
• Available in extreme version: -40°C to +85°C
Dimensions
• 100 x 72 mm (3.94" x 2.83")
Qseven® specifications rel. 1.20 compliant

Qseven ARM for the HPC?

CARMA concept
CARMA is an architecture for high-performance, energy-efficient hybrid computing.

Schedule
• Motivation
• System Overview
• System Details

Motivation
HPC systems will be capped by power and thermal limits
• The world's largest supercomputer systems are near their physical limits
• Broader-market HPC installations are capped by pragmatic and site limits
The cluster revolution was driven by:
• Cost-effective computing: dollars per FLOP
• Transferable knowledge and accessibility: skills and tools developed on personal-scale machines
• Long-term viable architecture: commodity market components used at a larger scale
We now need to incorporate power-efficient computing.

The next revolution: Power Efficiency!
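To make the power cap concrete, the sketch below converts energy per flop into sustained power. The per-flop figures are the ones this deck quotes in its GPU versus CPU comparison (225 pJ/flop for Fermi, 1700 pJ/flop for Westmere). The 1 PFLOP/s target and the function name `sustained_power_watts` are assumptions chosen only for illustration, and the estimate ignores cooling, storage and interconnect overheads.

```python
# Energy-per-flop figures quoted in this deck's GPU vs CPU comparison.
PJ_PER_FLOP = {
    "GPU (Fermi, 40 nm)": 225.0,
    "CPU (Westmere, 32 nm)": 1700.0,
}

# Hypothetical sustained compute target, chosen for illustration only.
TARGET_FLOPS = 1e15  # 1 PFLOP/s


def sustained_power_watts(pj_per_flop: float, flops_per_second: float) -> float:
    """Power (W) = energy per flop (J) x flops per second."""
    joules_per_flop = pj_per_flop * 1e-12  # 1 pJ = 1e-12 J
    return joules_per_flop * flops_per_second


if __name__ == "__main__":
    for name, pj in PJ_PER_FLOP.items():
        kilowatts = sustained_power_watts(pj, TARGET_FLOPS) / 1e3
        print(f"{name}: {kilowatts:.0f} kW at 1 PFLOP/s")
```

Under these assumptions the GPU figure works out to about 225 kW and the CPU figure to about 1.7 MW, a ratio of roughly 7.6.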
Once again, look to the commodity market for the next generation
• Power-effective computing is driven by phones and tablets
  - ARM has an architectural and experience advantage
  - System-level software complexity is high
  - Most power optimization work is being done for ARM
• High-performance, power-efficient computing from GPGPUs
  - GPUs have an architectural efficiency advantage
  - Many applications already use GPUs effectively

GPU versus CPU
• GPU (Fermi, 40 nm): 225 pJ/flop; optimized for throughput and power efficiency; explicit management of on-chip memory
• CPU (Westmere, 32 nm): 1700 pJ/flop; optimized for latency; caches

Multi-core CPUs
Multi-core as a first response to power issues
  - Performance through parallelism, not frequency increases
  - Slow the complexity spiral
  - Better locality in many cases
But CPUs have evolved for single-thread performance rather than energy efficiency
• Fast clock rates with deep pipelines
• Data and instruction caches optimized for latency
• Superscalar issue with out-of-order execution
• Dynamic conflict detection
• Lots of prediction and speculative execution
• Lots of instruction overhead per operation
Less than 2% of chip power today goes to flops!

Possible Power-efficient Future
Power-efficient general core combined with GPU
• Power control shared with mobile products
  - Ultra-focused on power efficiency
  - Aggressive market forces innovation
• Technology evolution driven by commodity market
• Bulk of compute