COMPASS: A Community-driven Parallelization Advisor for Sequential Software

Simha Sethumadhavan, Nipun Arora, Ravindra Babu Ganapathi, John Demme, Gail E. Kaiser
Department of Computer Science, Columbia University, New York, 10027
E-mail: {simha, kaiser}@cs.columbia.edu

IWMSE'09, May 18, 2009, Vancouver, Canada. ICSE'09 Workshop. 978-1-4244-3718-4/09/$25.00 © 2009 IEEE

Abstract

The widespread adoption of multicores has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity in hardware architectures and software environments, however, continues to pose difficulties in the effective use of parallelism, thus delaying a quick and smooth transition to the concurrency era. In this paper, we describe the research being conducted at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers while they reengineer their code for parallelism. The advice proffered to the programmer is based on the wisdom collected from programmers who have already parallelized some similar code. The utility of COMPASS rests not only on its ability to collect the wisdom unintrusively but also on its ability to automatically seek, find and synthesize this wisdom into advice that is tailored to the task at hand, i.e., the code the user is considering parallelizing and the environment in which the optimized program is planned to execute. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization – widely, and on diverse hardware and software. By leveraging the "wisdom of crowds" model [30], which has been conjectured to scale exponentially and which has successfully worked for wikis, COMPASS aims to enable rapid propagation of knowledge about code parallelization in the context of the actual parallelization reengineering, and thus continue to extend the benefits of Moore's law scaling to science and society.

1 Introduction

The adoption of chip multiprocessors (CMPs) poses methodological and linguistic challenges for sequential software. While new programming models and languages can help us create correct parallel programs quickly, there is an immediate need for tools that can systematically help in parallelizing, debugging and performance engineering of the vast sequential legacy code base. In this paper, we outline our vision for such a tool. COMPASS – A Community-driven Parallelization Advisor for Sequential Software – proffers advice to programmers based on information collected from observing a community of programmers parallelize their code. The utility of COMPASS rests in part on the premise that the growing popularity of CMP systems will encourage expert programmers to parallelize some of the existing sequential software, and COMPASS can quickly deploy capabilities to capture their wisdom (including any gained through trial and error intermediate steps) for the multicore software engineering community.

COMPASS observes (Table 1) expert programmers (henceforth called gurus) parallelize their sequential code using parallel programming patterns and other techniques for parallelization, records their code changes (before and after), summarizes this information and stores it in a centralized Internet-accessible database. When a relatively inexperienced new user (henceforth called a learner) wants to parallelize his/her code, the system first identifies the regions of code most warranting performance improvement (determined by profiling typical executions), and then which of those regions are most amenable to parallelization (by consulting its database of previously parallelized code). COMPASS then presents a stylized template, or "sketch", that can be used as a starting point for parallelization by the learner. The learner may then provide feedback to the system on the usefulness of the advice. To effectively provide these capabilities, COMPASS builds upon recent work on code clone detection and graph matching algorithms, and introduces program transformations for generating program sketches from similar code.

Table 1. User Interactions with COMPASS

| Learner | Guru |
|---|---|
| Identifies the program to be parallelized and specifies program inputs. Optionally, identifies code regions of particular interest for parallelization. | Inputs a sequential program and highlights regions being parallelized. Matching of sequential and parallel regions may be automated. |
| Receives parallelization suggestions. Modifies and adapts the suggestion to the program at hand, if necessary. | Inputs the corresponding parallel version. |
| Optionally, provides feedback on the usefulness of the suggestion. | Provides system information and required libraries for the optimization being entered. |

The work described in this paper is still in progress but our pilot system can already analyze enterprise level code in C/C++. Once complete, COMPASS will provide a wide range of advice for improving application performance across multiple granularities of parallelism. Its practical utility will depend on the nature and number of users in the system and the diversity of their code. We believe that it is not unreasonable to imagine that COMPASS will be able to provide advice for a large class of present and upcoming CMP systems with templates covering: peephole Data Level Parallelism optimizations such as using SIMD kernels for inner loops, splitting loop iterations into threads using OpenMP, replacement and threading of large chunks of serial code with vendor-supplied optimized libraries (such as CUDA), stream dataflow graphs, and parallel patterns.

To the best of our knowledge, we are the first to propose leveraging collective human wisdom to propagate "best practices" about how to parallelize code. This is a new way of thinking about (multicore) performance engineering enabled by the connectedness brought about by the Internet and the wide availability of parallel machines. Somewhat related is work by DeLine et al. [11] on understanding programs by recording source code navigation actions from several users, work by Dagenais and Robillard on a recommendation system for adapting to framework evolution [9] and work by Cubranic et al. on collecting and organizing group project memory to help new users understand large code bases [7].

COMPASS is a significant improvement over the state of the art, where the learner has to "pull" parallelization advice from books, class notes and/or Internet tutorial examples. Such a process can be time-consuming and error-prone. COMPASS, on the other hand, is a "push" based system where advice is proactively proffered to the learner with very little overhead for the learner. In addition, unlike tutorials, the advice is customized to the learner's specific coding problem. Further, unlike traditional hardware, compiler, language or hybrid approaches for parallelization, which have long incubation times before application end users can benefit, typically years to decades, COMPASS can enable rapid parallelization of sequential code because the usefulness of knowledge networks like COMPASS scales exponentially as the number of users in the system increases [24].

2 System Architecture

The architecture of COMPASS is shown in Figure 1. The system has four modules that combine to provide the required functionality – the watcher, the data store, the matcher, and [...] mined semi-automatically from code bases with little human intervention. When an optimization is complete, the watcher snips out the code regions corresponding to the optimization, summarizes the before and after versions in the form of unique identifiers, called signatures, and deposits them in the data store. The data store holds the before and after signatures as <key, value> pairs (with the before version acting as the key and the after version serving as the value – which could in principle be reversed to de-parallelize if warranted). When a learner wants to parallelize some code, the matcher module prepares signatures of regions of code that are considered critical for parallelization (e.g., via hotspot profiling), and sends them to the data store. The data store runs a query with the matcher signature as a key, ranks the results and returns the top parallelized signature(s) to the generator. The generator then prepares a "sketch" of the code corresponding to the parallelized version, presented as a graphical overlay that can be accepted as is, modified or rejected by the learner. The learner may optionally provide feedback to the data store on the usefulness of the sketch, which can be used to rank parallelization solutions when multiple matches are available.

§ A Use Case We envision COMPASS being useful in a wide range of optimization scenarios, only one of which we elaborate (refer to Sethumadhavan and Kaiser [25] for more examples). The example described below is chosen to highlight parallelization opportunities that COMPASS is likely to be able to exploit but that typical optimizers – compilers and hardware – do not exploit.

§ Procedurization When chip companies design new hardware features to improve performance, they typically release APIs that can fully utilize the new hardware capabilities. Examples include the CUDA graphics library for effectively using the capabilities of NVIDIA Graphics Co-Processors [21] and the Intel Performance Primitives [29], which contain specially optimized library routines for domain-specific operations such as JPEG image processing or video transcoding.