SOFTWARE STANDARDS FOR THE MULTICORE ERA


Citation: J. Holt et al., "Software Standards for the Multicore Era," IEEE Micro, vol. 29, no. 3, 2009, pp. 40-51. © 2009 IEEE

As Published http://dx.doi.org/10.1109/MM.2009.48

Publisher Institute of Electrical and Electronics Engineers

Version Final published version

Citable link http://hdl.handle.net/1721.1/52432

Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

SYSTEMS ARCHITECTS COMMONLY USE MULTIPLE CORES TO IMPROVE SYSTEM PERFORMANCE. UNFORTUNATELY, MULTICORE HARDWARE IS EVOLVING FASTER THAN SOFTWARE TECHNOLOGIES. NEW MULTICORE SOFTWARE STANDARDS ARE NECESSARY IN LIGHT OF THE NEW CHALLENGES AND CAPABILITIES THAT EMBEDDED MULTICORE SYSTEMS PROVIDE. THE NEWLY RELEASED MULTICORE COMMUNICATIONS API STANDARD TARGETS SMALL-FOOTPRINT, HIGHLY EFFICIENT INTERCORE AND INTERCHIP COMMUNICATIONS.

Jim Holt, Freescale Semiconductor
Anant Agarwal, Massachusetts Institute of Technology
Sven Brehmer, PolyCore Software
Max Domeika, Intel
Patrick Griffin, Tilera
Frank Schirrmeister, Synopsys

Pushing single-thread performance is no longer viable for further scaling of microprocessor performance, due to this approach's unacceptable power characteristics.1-3 Current and future processor chips will comprise tens to thousands of cores and specialized hardware to achieve performance within power budgets.4-7

Hardware designers find it relatively easy to design multicore systems. But multicore presents new challenges to programmers, including finding and implementing parallelism, debugging deadlocks and race conditions (that is, conditions that occur when one or more software tasks attempt to access shared program variables without proper synchronization), and eliminating performance bottlenecks.

It will take time for most programmers and enabling software technologies to catch up. Concurrent analysis, programming, debugging, and optimization differ significantly from their sequential counterparts. In addition, heterogeneous multicore programming is impractical using standards defined for symmetric multiprocessing (SMP) system contexts or for networked collections of computers (see the "Requirements for Successful Heterogeneous Programming" sidebar).

We define a system context as the set of external entities that may interact with an application, including hardware devices, middleware, libraries, and operating systems. A multicore SMP context would be a system context in which the application interacts with multicore hardware running an SMP operating system and its associated libraries.

In SMP multicore-system contexts, programmers can use existing standards such as Posix threads (Pthreads) or OpenMP for their needs.8,9 In other contexts, the programmer might be able to use existing standards for distributed or parallel computing such as sockets, CORBA, or message passing interface (MPI, see http://www.mpi-forum.org).10 However, two factors make these standards unsuitable in many contexts:

- heterogeneity of hardware and software, and
- constraints on code size and execution time overhead.
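As a minimal illustration of the race condition defined above (this example is not from the article), two Pthreads increment a shared counter without synchronization, so some updates are lost depending on how the increments interleave:

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                  // shared program variable

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 1000000; i++)
    counter++;                            // unprotected read-modify-write
  return NULL;
}

int main(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, worker, NULL);
  pthread_create(&t2, NULL, worker, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("counter = %ld\n", counter);     // often less than 2000000
  return 0;
}

Adding proper synchronization, such as a Pthreads mutex around the increment, removes the race.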


Requirements for Successful Heterogeneous Programming

Successful heterogeneous programming requires adequately addressing the following three areas: a development process to exploit parallelism, a generalized multicore programming model, and debugging and optimization paradigms that support this model.

Development process
Multicore development models for scientific computing have existed for years. But applications outside this domain leave programmers with standards and development methods that don't necessarily apply. Furthermore, programmers have little guidance as to which standards and methods provide some benefit and which don't. Domeika has detailed some guidance on development techniques,1 but there is no industry-wide perspective covering multiple embedded-system architectures. This problem invites an effort to formulate industry standard development processes for embedded systems that address current and future development models.

Programming models
A programming model comprises a set of software technologies that help programmers express algorithms and map applications to underlying computing systems. Sequential programming models require programmers to specify application functionality via serial-programming steps that occur in strict order with no notion of concurrency. Universities have taught this programming model for decades. Outside of a few specialized areas, such as distributed systems, scientific applications, and signal-processing applications, the sequential-programming model permeates the design and implementation of software.2

But there is no widely accepted multicore programming model outside those defined for symmetric shared-memory architectures exhibited by workstations and PCs3 or those designed for specific embedded-application domains such as media processing (see http://www.khronos.org). Although these programming models work well for certain kinds of applications, many multicore applications can't take advantage of them. Standards that require system uniformity aren't always suitable, because multicore systems display a wide variety of nonuniform architectures, including combinations of general-purpose cores, digital-signal processors, and hardware accelerators. Also, domain-specific standards don't cover the breadth of application domains that multicore encompasses.

The lack of a flexible, general-purpose multicore-programming model limits the ability of programmers to transition from sequential to multicore programming. It forces companies to create custom software for their chips. It prevents tool chain providers from maximally leveraging their engineering, forcing them to repeatedly produce custom solutions. It requires users to craft custom infrastructure to support their programming needs. Furthermore, time-to-market pressures often force multicore solutions to target narrow vertical markets, thereby limiting the viable market to fewer application domains and preventing efficient reuse of software.

Debugging and optimization
There are standards for the "plumbing" required in hardware to make multicore debugging work,4 and debugging and optimization tools exist for multicore workstations and PCs.5,6 However, as with the multicore-programming model, no high-level multicore debugging standard exists. Tool chain providers have created multicore debugging and optimization software tools, but they are custom-tailored to each specific chip.7,8 This situation leaves many programmers lacking tool support.

It became evident decades ago that source-level debugging tied to the programming language provided efficient debugging techniques to the programmer, such as visually examining complex data structures and stepping through stack back-traces.9 Raising the abstraction of multicore debugging is another natural evolutionary step. In a multicore-system context, the programmer creates a set of software tasks, which are then allocated to the various processor cores in the system. The programmer would benefit from monitoring task life cycles and from seeing how tasks interact via communications and resource sharing. However, the lack of standards means that debugging and optimization for most multicore chips is an art, not a science. This is especially troubling because concurrent programming introduces new functional and performance pitfalls, including deadlocks, race conditions, false sharing, and unbalanced task schedules.10,11 Although it isn't this article's main focus, we believe that the standards work presented in the main text will enable new capabilities for multicore debug and optimization.

References
1. M. Domeika, Software Development for Embedded Multi-core Systems: A Practical Guide Using Embedded Intel Architecture, Newnes, 2008.
2. W.-M. Hwu, K. Keutzer, and T.G. Mattson, "The Concurrency Challenge," IEEE Design & Test, vol. 25, no. 4, 2008, pp. 312-320.
3. IEEE Std. 1003.1, The Open Group Base Specifications Issue 6, IEEE and Open Group, 2004; http://www.opengroup.org/onlinepubs/009695399.
4. B. Vermeulen et al., "Overview of Debug Standardization Activities," IEEE Design & Test, vol. 25, no. 3, 2008, pp. 258-267.
5. L. Adhianto and B. Chapman, "Performance Modeling of Communications and Computation in Hybrid MPI and OpenMP Applications," Proc. 12th Int'l Conf. Parallel and Distributed Systems (ICPADS 06), IEEE CS Press, 2006, pp. 3-8.
6. J. Cownie and W. Gropp, "A Standard Interface for Debugger Access to Message Queue Information in MPI," Recent Advances in Parallel Virtual Machine and Message Passing Interface, J. Dongarra, E. Luque, and T. Margalef, eds., Springer, 1999, pp. 51-58.
7. M. Biberstein et al., "Trace-Based Performance Analysis on Cell BE," Proc. IEEE Int'l Symp. Performance Analysis of Systems and Software (ISPASS 08), IEEE Press, 2008, pp. 213-222.
8. I.-H. Chung et al., "A Study of MPI Performance Analysis Tools on Blue Gene/L," Proc. 20th Int'l Parallel and Distributed Processing Symp. (IPDPS 06), IEEE CS Press, 2006.
9. S. Rojansky, "DDD—Data Display Debugger," Linux J., 1 Oct. 1997; http://www.linuxjournal.com/article/2315.
10. E.A. Lee, "The Problem with Threads," Computer, vol. 39, no. 5, 2006, pp. 33-42.
11. The OpenMP API Specification for Parallel Programming, OpenMP Architecture Review Board, 2008; http://openmp.org/wp.




For heterogeneous multicore contexts, using any standard that makes an implicit assumption about underlying homogeneity is impractical. For example, Pthreads is insufficient for interactions with offload engines, with cores that don't share a memory domain, or with cores that aren't running a single SMP operating system instance. Furthermore, implementers of other standards such as OpenMP often use Pthreads as an underlying API. Until more suitable and widely applicable multicore APIs become available, these layered standards will also have limited applicability.

Systems with heterogeneous cores, instruction set architectures (ISAs), and memory architectures have programming characteristics similar to those of distributed or scientific computing. Various standards exist for this system context, including sockets, CORBA, and MPI. However, there are fundamental differences between interconnected computer systems (common in distributed and scientific computing) and multicore computers. This limits these standards' scalability into the embedded world. At issue is the overhead required to support features that aren't required in the multicore system context. For example, sockets are designed to support lossy packet transmission, which is unnecessary with reliable interconnects; CORBA requires data marshalling, which might not be optimal between any particular set of communicators; and MPI defines a process/group model, which isn't always appropriate for heterogeneous systems.

These concerns justify a set of complementary multicore standards that will

- embrace both homogeneous and heterogeneous multicore hardware and software,
- provide a widely applicable API suitable for both application-level programming and the layering of higher-level tools,
- allow implementations to scale efficiently within embedded-system contexts, and
- not preclude using other standards within a multicore system.

The companies working together in the Multicore Association (MCA, http://www.multicore-association.org) have published a roadmap for producing a generalized multicore-programming model. The MCA deemed intercore communications a top priority for this roadmap, because it is a fundamental capability for multicore programming. In March 2008, the consortium's working group completed the multicore communications API (MCAPI) specification,11 the first in a set of complementary MCA standards addressing various aspects of using and managing multicore systems. MCAPI enables and simplifies multicore adoption and use. The example implementation in this article shows that MCAPI can meet its stated goals and requirements, and there is already a commercial version available.

MCAPI requirements and goals
The MCAPI working group's task was to specify a software API for intertask communication within multicore chips and between multiple processors on a board. The primary goals were source code portability and the ability to scale communication performance and memory footprint with respect to the increasing number of cores in future generations of multicore chips. While respecting these goals, the MCAPI working group also worked to design a specification satisfying a broad set of multicore-computing requirements, including:

- suitability for both homogeneous and heterogeneous multicore;
- suitability for uniform, nonuniform, or distributed-memory architectures;
- compatibility with dedicated hardware acceleration;
- compatibility with SMP and non-SMP operating systems;
- compatibility with existing communications standards; and
- support for both blocking and nonblocking communications.

The working group first identified a set of use cases from the embedded domain to capture the desired semantics and application constraints. These use cases included automotive, multimedia, and networking domains. For due diligence, the working group applied these use cases to reviews of MPI, Berkeley sockets, and Transparent Interprocess Communication (TIPC).12

Although there are some overlaps, the working group found insufficient overlap between the MCAPI goals and requirements and these existing standards.

MPI is a message-passing API developed for scientific computing that can be useful for closely to widely distributed computers. It is powerful and supports many of the MCAPI requirements, but in its full form it is complex with a large memory footprint. Furthermore, the group communication concept of MPI isn't always amenable to the types of application architectures we found in our use cases.

Berkeley sockets are ubiquitous and well understood by many programmers. But they have a large footprint in terms of both memory and execution cycles. These complexities help sockets to handle unreliable transmission and support dynamic topologies, but in a closed system with a reliable and consistent interconnect such complexities are unnecessary.

TIPC serves as a scaled-down version of sockets for the embedded-telecommunications market. TIPC has a smaller footprint than sockets and MPI. But it is larger than the target for MCAPI: the TIPC 1.76 source code comprises approximately 21,000 lines of C code, whereas the MCAPI goal was one quarter of this size for an optimized implementation.

MCAPI overview
To support the various embedded-application needs outlined in our use cases, the MCAPI specification defines three communication types (Figure 1), as follows:

- messages (connection-less datagrams);
- packets (connection-oriented, arbitrary size, unidirectional, FIFO streams); and
- scalars (connection-oriented, fixed size, unidirectional, FIFO streams).

Figure 1. The three multicore communications API (MCAPI) communication types: connection-less messages, and connection-oriented packets and scalars. (In the original figure, four MCAPI nodes communicate through endpoints identified by <node, port> tuples such as <2,1>.)

Messages support flexible payloads and dynamically changing receivers and priorities, incurring only a slight performance penalty in return for these features. Packets also support flexible payloads, but they use connected channels to provide higher performance at the expense of slightly more setup code. Scalars promise even higher performance, exploiting connected channels and a set of fixed payload sizes. For programming flexibility and performance opportunities, MCAPI messages and packets also support nonblocking sends and receives to allow overlapping of communications and computations.

Communication in MCAPI occurs between nodes, which can be mapped to many entities, including a process, a thread, an operating system instance, a hardware accelerator, and a processor core. A given MCAPI implementation will specify what defines a node. MCAPI nodes communicate via socket-like communication termination points called end points, which a topology-global unique identifier (a <node, port> tuple) identifies. An MCAPI node can have multiple end points. MCAPI channels provide point-to-point FIFO connections between a pair of end points.

Additional features include the ability to test or wait for completion of a nonblocking communications operation; the ability to cancel in-progress operations; and support for buffer management, priorities, and back-pressure mechanisms to help manage data between producers and consumers.
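As a minimal sketch of the connection-less message type, consider the following hypothetical exchange between a sender node and a receiver node. It uses only calls that appear in the figures later in this article (Figures 3 through 5), with the same simplified signatures and with error checking omitted as in those figures; the node and port constants are illustrative assumptions, not values from the article, and the type names follow the MCAPI specification.

// Hedged sketch: one connection-less MCAPI message from a sender node to a
// receiver node. SENDER_NODE, RECEIVER_NODE, TX_PORT, and RX_PORT are
// illustrative assumptions.
void sender_node(void) {
  mcapi_status_t err;
  mcapi_request_t req;
  mcapi_endpoint_t tx_endpt, rx_remote_endpt;
  char msg[] = "hello";

  mcapi_initialize(SENDER_NODE, &err);
  tx_endpt = mcapi_create_endpoint(TX_PORT, &err);
  // nonblocking lookup of the remote endpoint, then wait on the request handle
  mcapi_get_endpoint_i(RECEIVER_NODE, RX_PORT, &rx_remote_endpt, &req, &err);
  mcapi_wait(&req, NULL, &err);
  // connection-less datagram: no channel setup required
  mcapi_msg_send(tx_endpt, rx_remote_endpt, msg, sizeof(msg), &err);
}

void receiver_node(void) {
  mcapi_status_t err;
  mcapi_endpoint_t rx_endpt;
  char buf[32];
  size_t recv_size;

  mcapi_initialize(RECEIVER_NODE, &err);
  rx_endpt = mcapi_create_endpoint(RX_PORT, &err);
  // blocks until a message arrives on this endpoint
  mcapi_msg_recv(rx_endpt, buf, sizeof(buf), &recv_size, &err);
}

A packet or scalar version of the same exchange would add the channel connect and open steps that Figures 3 through 5 walk through.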



(a) control_task
setup communications
forever {
  get input from data_task1
  get input from data_task2
  perform calculations
  update actuator
}

(b) data_task1
setup communications
forever {
  sample A-to-D
  update shared memory
  signal control_task
}

(c) data_task2
setup communications
forever {
  sample A-to-D
  perform calculation
  send data to control_task
}

Figure 2. Pseudocode for industrial-automation example. The three tasks involved in this implementation are a control task that receives data from two data tasks, analyzes the data, and adjusts the actuator (a); a data task that collects data from the first sensor (b); and a data task that collects data from the second sensor (c).

Industrial-Automation Application
To illustrate the use of MCAPI, we present a simple example of an industrial-automation application that must read two sensors on a continuous basis, perform some calculation using data from the sensors, and then update a physical device's setting using an actuator. A multicore implementation for this might comprise three tasks: data collection for the first sensor; data collection for the second sensor; and a control task to receive data from the data tasks, analyze the data, and then adjust the actuator. Figure 2 shows pseudocode for the three tasks. This example can highlight the three MCAPI communication modes.

Figure 3 shows an MCAPI implementation of control_task. (For brevity, we removed variable declarations and error checking.) In step 1, control_task initializes the MCAPI library, indicating its node number via an application-defined constant that is well-known to all tasks. In step 2, control_task creates local end points for communications with data_task1 and data_task2. In step 3, control_task uses nonblocking API calls to retrieve remote end points created by data_task1 and data_task2, eliminating possible deadlock conditions during the bootstrapping phase caused by a nondeterministic execution order of the tasks. The nonblocking calls (indicated by the _i suffix in the function name) return an mcapi_request_t handle, which step 4 uses to await completion of the connections. In step 5, control_task allocates shared memory and sends the address via an MCAPI message to data_task1. In step 6, control_task connects end points to create channels to the data tasks, again avoiding potential deadlock by using nonblocking calls. In step 7, control_task waits for the connection operations to complete. MCAPI requires two steps to create channels: first, creating the actual channels, which any node can do; second, opening the channels, which the nodes owning each end point must do. Steps 8 and 9 illustrate channel creation for end points owned by control_task. In step 10, control_task waits for data_task1 to signal that it has updated the shared memory, and in step 11 control_task reads the shared memory. This rendezvous approach provides lock-free access to the critical section for both tasks. In step 12, control_task receives a data structure from data_task2. In step 13, control_task computes a new value and adjusts the actuator. Finally, in step 14, control_task releases the buffer that MCAPI supplied for the packet that was received.

Figure 4 shows the MCAPI implementation for data_task1. Step 1 shows the MCAPI initialization, creation of a local end point, and lookup of the remote control_task end point using nonblocking semantics. In step 2, data_task1 receives the pointer to the shared-memory region from control_task. Step 3 opens the channel between data_task1 and control_task.

Authorized licensed use limited to: MIT Libraries. Downloaded on December 7, 2009 at 11:40 from IEEE Xplore. Restrictions apply. void control_task(void) {

// (1) init the system mcapi_initialize(CNTRL_NODE, &err);

// (2) create two local endpoints t1_endpt = mcapi_create_endpoint(CNTRL_PORT_T1, &err); t2_endpt = mcapi_create_endpoint(CNTRL_PORT_T2, &err);

// (3) get two remote endpoints mcapi_get_endpoint_i(T1_NODE, T1_PORT_CNTRL, &t1_remote_endpt, &r1, &err); mcapi_get_endpoint(T2_NODE, T2_PORT_CNTRL, &t2_remote_endpt, &r2, &err);

// (4) wait on the endpoints while (!((mcapi_test(&r1,NULL,&err)) && (mcapi_test(&r2,NULL,&err))) { }

// (5) allocate shared memory and send the ptr to data_task1 sMem = shmemget(32); tmp_endpt = mcapi_create_endpoint(MCAPI_PORT ANY, &err); mcapi_msg_send(tmp_endpt, t1_remote_endpt, sMem, sizeof(sMem), &err);

// (6) connect the channels mcapi_connect_sclchan_i(t1_endpt, t1_remote_endpt, &r1, &err); mcapi_connect_pktchan_i(t2_endpt, t2_remote_endpt, &r2, &err);

// (7) wait on the connections while (!((mcapi_test(&r1,NULL,&err)) && (mcapi_test(&r2,NULL,&err))) { }

// (8) now open the channels mcapi_open_sclchan_recv_i(&t1_chan, t1_endpt, &r1, &err); mcapi_open_pktchan_recv_i(&t2_chan, t2_endpt, &r2, &err);

// (9) wait on the opens while (!((mcapi_test(&r1,NULL,&err)) && (mcapi_test(&r2,NULL,&err))) { }

// begin the processing phase below while (1) { // (10) wait for data_task1 to indicate it has updated shared memory t1Flag = mcapi_sclchan_recv_uint8(t1_chan, &err);

// (11) read the shared memory t1Dat = sPtr[0];

// (12) now get data from data_task2 mcapi_pktchan_recv(t2_chan, (void **) &sDat, &tSize, &err);

// (13) perform computations and control functions …

// (14) free the packet channel buffer mcapi_pktchan_free(sDat, &err); } }

Figure 3. MCAPI implementation of control_task. The work in this task is divided into an initialization phase (steps 1-9) and a control loop (steps 10-14).

Now, the bootstrap phase is complete, and data_task1 enters its main loop: reading the sensor (step 4); updating the shared-memory region (step 5); and sending a flag using an MCAPI scalar call to control_task, indicating the shared memory has been updated (step 6).

Figure 5 shows the MCAPI implementation of data_task2. Steps 1 through 4 are essentially the same as for data_task1, except that data_task2 communicates with control_task using an MCAPI packet channel. The configuration of the packet and scalar channels is virtually identical. In step 5, data_task2 populates the data structure to be sent to control_task, and in step 6 data_task2 sends the data.

Figure 6 shows two candidate multicore systems for this application: one comprising two cores of equal size, and one comprising three cores of various sizes.



void data_task1() {

  // (1) init the system
  mcapi_initialize(T1_NODE, &err);
  cntrl_endpt = mcapi_create_endpoint(T1_PORT_CNTRL, &err);
  mcapi_get_endpoint_i(CNTRL_NODE, CNTRL_PORT_T1, &cntrl_remote_endpt, &r1, &err);
  mcapi_wait(&r1, NULL, &err);

  // (2) now get the shared mem ptr
  mcapi_msg_recv(cntrl_endpt, &sMem, sizeof(sMem), &msgSize, &err);

  // (3) open the channel; NOTE - connection handled by control_task
  mcapi_open_sclchan_send_i(&cntrl_chan, cntrl_endpt, &r1, &err);
  mcapi_wait(&r1, NULL, &err);

  // all bootstrapping is finished, begin processing
  while (1) {
    // (4) read the sensor
    // (5) update shared mem with value from sensor
    sMem[0] = sensor_data;
    // (6) send a flag to control_task indicating sMem has been updated
    mcapi_sclchan_send_uint8(cntrl_chan, (uint8_t) 1, &err);
  }
}

Figure 4. MCAPI implementation of data_task1. This task first performs initialization (steps 1-3), and then continuously reads a sensor, updates a data structure in shared memory, and notifies the control task using an MCAPI scalar channel (steps 4-6).

void data_task2() {

  // (1) init the system
  mcapi_initialize(T2_NODE, &err);
  cntrl_endpt = mcapi_create_endpoint(T2_PORT_CNTRL, &err);
  mcapi_get_endpoint_i(CNTRL_NODE, CNTRL_PORT_T2, &cntrl_remote_endpt, &r1, &err);

  // (2) wait on the remote endpoint
  mcapi_wait(&r1, NULL, &err);

  // (3) open the channel; NOTE - connection handled by control_task
  mcapi_open_pktchan_send_i(&cntrl_chan, cntrl_endpt, &r1, &err);
  mcapi_wait(&r1, NULL, &err);

  // all bootstrapping is finished, now begin processing
  while (1) {
    // (4) read sensor
    // (5) prepare data for control_task
    struct T2_DATA sDat;
    sDat.data1 = 0xfea0;
    sDat.data2 = 0;
    // (6) send the data to the control_task
    mcapi_pktchan_send(cntrl_chan, &sDat, sizeof(sDat), &err);
  }
}

Figure 5. MCAPI implementation of data_task2. After initialization (steps 1-3) this task continuously reads a sensor and then sends updated data to the control task using an MCAPI packet channel.

The MCAPI implementation can be easily ported to either of these candidate systems. The main concern would be ensuring that code for tasks is compiled for and run on the correct processors for each of these hardware configurations.

Benchmarking MCAPI
To evaluate how well an implementation of the specification could meet the MCAPI goals of a small footprint and high performance, we created an example implementation of MCAPI and a set of benchmarks.


Figure 6. Candidate multicore architectures for industrial-automation example. The first comprises two cores of equal size (a); the other has cores of various sizes (b).

(The implementation and benchmarks are also available on the Multicore Association Web site.)

MCAPI example implementation
We created the example MCAPI implementation to

- determine any MCAPI specification issues that would hinder implementers,
- understand an implementation's code footprint,
- provide a concrete basis for programmers to evaluate their level of interest in MCAPI, and
- establish a baseline for benchmarking optimized implementations.

The example implementation (Figure 7) is publicly available and can be compiled on computers running Linux, or Windows with Cygwin (http://www.cygwin.com) installed. The example implementation's architecture is layered, with a high-level MCAPI common-code layer that implements MCAPI interfaces and policies, and a low-level transport layer with a separate API. This architecture allows for reimplementing the transport layer, as desired. The current transport layer uses Posix shared memory and semaphores to create a highly portable implementation.

Figure 7. Architecture and code size of example implementation. The MCAPI common-code layer (1,800 lines of code) implements MCAPI interfaces and policies beneath the MCAPI API. The shared-memory transport code (3,717 lines of code) sits beneath a separate transport API and can be reimplemented as desired.

MCAPI benchmarks
The example implementation shows that it's possible to implement MCAPI with a small footprint. To characterize performance, we also created two synthetic benchmarks (echo and fury) and characterized the example implementation on several multicore devices. The echo benchmark measures the roundtrip latency of communications through a variable number of hops, with each one being an MCAPI node. A master node initiates each send transaction, which is forwarded in a ring over the desired number of intermediate hops until it returns to the master. When the master node receives the sent data, it reports each transaction's roundtrip latency. The fury benchmark measures throughput between a producer node and a consumer node, sending data as quickly as possible. The consumer node reports the total time and the number of receive transactions along with the amount of data received.

Summary benchmark results
For these results, we didn't tune the example MCAPI implementation for any platform, and we chose a fixed set of implementation parameters (number of nodes, end points, channels, buffer sizes, internal queue depths, and so on).
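To make the echo structure concrete before turning to the results, here is a hedged sketch of what one intermediate hop in the ring might look like. It is not the benchmark's actual source (which is available from the Multicore Association Web site); it reuses only the connection-less message calls shown in this article's figures, and the function name, node/port constants, and buffer size are illustrative assumptions.

// Hedged sketch of an intermediate echo hop (not the published benchmark code).
// MY_NODE, MY_PORT, NEXT_NODE, and NEXT_PORT are illustrative assumptions.
void echo_hop(void) {
  mcapi_status_t err;
  mcapi_request_t req;
  mcapi_endpoint_t my_endpt, next_endpt;
  char buf[64];
  size_t recv_size;

  mcapi_initialize(MY_NODE, &err);
  my_endpt = mcapi_create_endpoint(MY_PORT, &err);
  // nonblocking lookup of the next hop's endpoint, then wait for it
  mcapi_get_endpoint_i(NEXT_NODE, NEXT_PORT, &next_endpt, &req, &err);
  mcapi_wait(&req, NULL, &err);

  while (1) {
    // receive the message from the previous hop and forward it unchanged
    mcapi_msg_recv(my_endpt, buf, sizeof(buf), &recv_size, &err);
    mcapi_msg_send(my_endpt, next_endpt, buf, recv_size, &err);
  }
}

The master node would follow the same pattern, but would timestamp each send and the matching receive to report the roundtrip latency for that transaction.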



Figure 8. Benchmark results: latencies for the echo benchmark, shown as latency relative to sockets for 4 to 14 hops, comparing MCAPI messages, pipes, and sockets (a); and throughput for the fury benchmark, shown as messages per second and megabits per second for 8-, 64-, and 1,024-byte messages (b).

Figure 8a illustrates the echo benchmark for 64-byte data. We collected this data from a dual-core Freescale evaluation system running SMP Linux. The results are normalized to the performance of a sockets-based version of echo. The other data series in the graph compares the sockets baseline to a Unix pipes version of echo and an MCAPI message-based version of echo. For distances of eight or fewer hops, MCAPI outperformed both pipes and sockets. It is important to emphasize that these results were collected on a dual-core processor with a user-level implementation of MCAPI. Thus, sockets and pipes had kernel support for task preemption on blocking calls, whereas the example MCAPI implementation used polling. Despite these disadvantages, the example MCAPI implementation performed quite well, and we expect optimized versions to exhibit better performance characteristics.

Figure 8b shows fury throughput measured on a high-end quad-core Unix workstation running SMP Linux. Because fury involves only two MCAPI nodes, only two of the four CPUs were necessary for this data. We used scheduler affinity to associate each node with a different CPU. This graph shows that the throughput in messages per second (around 86,000 for 8-byte messages) didn't achieve the highest throughput in bits per second (for example, fury is limited by API overhead for 8-byte messages). But for larger message sizes, the throughput in bits per second was almost two orders of magnitude greater, with a small degradation in the number of messages per second. Thus, increasing message size by approximately two orders of magnitude increases bits per second by approximately two orders of magnitude. Between 64 bytes and 1,024 bytes lies an equilibrium point at which the number of messages per second and the number of bits per second are optimally matched. It remains to be seen whether tuning can push the crossover of these two curves toward the larger message sizes, indicating that the API overhead is minimal with respect to message size. But the initial results are encouraging.

The MCAPI specification doesn't complete the multicore-programming model, but it provides an important piece. Programmers can find enough capability in MCAPI to implement significant multicore applications. We know of two MCAPI implementations that should help foster adoption of the standard.

Three active workgroups continue to pursue the Multicore Association roadmap: Multicore Programming Practices (MPP), Multicore Resource Management API (MRAPI), and Hypervisor. MPP seeks to address the challenges described in the sidebar. This best-practices effort focuses on methods of analyzing code and implementing, debugging, and tuning that are specific to embedded multicore applications. These include current techniques using SMP and threads, and forward-looking techniques such as MCAPI.

MRAPI will be a standard for synchronization primitives, memory management, and metadata. It will complement MCAPI in both form and goals. Although operating systems typically provide MRAPI's targeted features, a standard is needed that can provide a unified API to these capabilities in a wider variety of multicore-system contexts.

The Hypervisor working group seeks to unify the software interface for paravirtualized operating systems (that is, operating systems that have been explicitly modified to interact with a hypervisor kernel). This standard will allow operating system vendors to better support multiple hypervisors, and thus more multicore chips, with a faster time to market.

We believe that multicore-programming model standards can provide a foundation for higher levels of functionality. For example, we can envision OpenMP residing atop a generalized multicore-programming model, language extensions for C and C++ to express concurrency, and compilers that target multicore standards to implement that concurrency. Another natural evolution for programming models, languages, and compilers would be debugging and optimization tools that provide higher abstraction levels than are prevalent today. With MCAPI, it could also be possible for creators of debugging and optimization tools to consider how to exploit the standard. MICRO

References
1. M. Creeger, "Multicore CPUs for the Masses," ACM Queue, vol. 3, no. 7, 2005, pp. 63-64.
2. J. Donald and M. Martonosi, "Techniques for Multicore Thermal Management: Classification and New Exploration," Proc. 33rd Int'l Symp. Computer Architecture (ISCA 06), IEEE CS Press, 2006, pp. 78-88.
3. D. Geer, "Chip Makers Turn to Multicore Processors," Computer, vol. 38, no. 5, 2005, pp. 11-13.
4. S. Bell et al., "TILE64 Processor: A 64-Core SoC with Mesh Interconnect," Proc. Int'l Solid-State Circuits Conf. (ISSCC 08), IEEE Press, 2008, pp. 88-89, 598.
5. "P4080: QorIQ P4080 Communications Processor," Freescale Semiconductor; http://www.freescale.com/webapp/sps/site/prod_summary.jsp?fastpreview=1&code=P4080.
6. "Intel Microarchitecture (Nehalem)," Intel, 2008; http://www.intel.com/technology/architecture-silicon/next-gen/index.htm.
7. D.C. Pham et al., "Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor," IEEE J. Solid-State Circuits, vol. 41, no. 1, 2006, pp. 179-196.
8. IEEE Std. 1003.1, The Open Group Base Specifications Issue 6, IEEE and Open Group, 2004; http://www.opengroup.org/onlinepubs/009695399.
9. OpenMP API Specification for Parallel Programming, OpenMP Architecture Review Board, May 2008; http://openmp.org/wp.
10. CORBA 3.1 Specification, Object Management Group, 2008; http://www.omg.org/spec/CORBA/3.1.
11. Multicore Communications API Specification, Multicore Association, Mar. 2008; http://www.multicore-association.org/request_mcapi.php?what=MCAPI.
12. TIPC 1.5/1.6 Protocol Specification, TIPC Working Group, May 2006; http://tipc.sourceforge.net.

Jim Holt manages the Multicore Design Evaluation team in the Networking Systems Division at Freescale Semiconductor. His research interests include multicore-programming models and software architecture. He has a PhD in computer engineering from the University of Texas at Austin. He is a senior member of the IEEE.

Anant Agarwal is a professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, where he is also an associate director of the Computer Science and Artificial Intelligence Laboratory. In addition, he is a cofounder and CTO of Tilera. His research interests include computer architecture, VLSI, and software systems. He has a PhD in electrical engineering from Stanford University. He is an ACM Fellow and a member of the IEEE.

Sven Brehmer is president and CEO of PolyCore Software, a company providing multicore software solutions.



He has many different technical interests, including embedded systems software, multiprocessing (especially multicore), and alternative energy. He has an MS in electrical engineering from the Royal Institute of Technology, Sweden. He is the founding member of the Multicore Association and the MCAPI working group chair. He is a member of the Multicore Association.

Max Domeika is an embedded-tools consultant in the Developer Products Division at Intel, where he builds software products targeting the Embedded Intel Architecture. His technical interests include software development practices for multicore processors. He has a master's degree in computer science from Clemson University and a master's degree in management in science and technology from the Oregon Graduate Institute. He co-chairs the Multicore Programming Practices working group within the Multicore Association. He is the author of Software Development for Embedded Multi-core Systems.

Patrick Griffin is a senior design engineer on the software team at Tilera. His research interests include parallel-programming models and high-speed I/O for embedded applications. He has an MEng in electrical engineering and computer science from the Massachusetts Institute of Technology.

Frank Schirrmeister is director of Product Management in the Solutions Group at Synopsys. His research interests include electronic system-level design and verification, particularly presilicon virtual platforms for software development, architecture exploration, and verification. Schirrmeister has an MS in electrical engineering from the Technical University of Berlin with a focus on microelectronics and computer science. He is a member of the IEEE and the Association of German Engineers (VDI).

Direct questions and comments about this article to Jim Holt, Freescale Semiconductor, 7700 West Parmer Lane, MD:PL51, Austin, TX 78729; jim.holt@freescale.com.

