Speculative Separation for Privatization and Reductions

Speculative Separation for Privatization and Reductions Nick P. Johnson Hanjun Kim Prakash Prabhu Ayal Zaksy David I. August Princeton University, Princeton, NJ yIntel Corporation, Haifa, Israel fnpjohnso, hanjunk, pprabhu, [email protected] [email protected] Abstract Memory Layout Static Speculative Automatic parallelization is a promising strategy to improve appli- Speculative LRPD [22] R−LRPD [7] cation performance in the multicore era. However, common pro- Privateer (this work) gramming practices such as the reuse of data structures introduce Dynamic PD [21] artificial constraints that obstruct automatic parallelization. Privati- Polaris [29] ASSA [14] zation relieves these constraints by replicating data structures, thus Static Array Expansion [10] enabling scalable parallelization. Prior privatization schemes are Criterion DSA [31] RSSA [23] limited to arrays and scalar variables because they are sensitive to Privatization Manual Paralax [32] STMs [8, 18] the layout of dynamic data structures. This work presents Privateer, the first fully automatic privatization system to handle dynamic and Figure 1: Privatization Criterion and Memory Layout. recursive data structures, even in languages with unrestricted pointers. To reduce sensitivity to memory layout, Privateer speculatively separates memory objects. Privateer’s lightweight runtime system contention and relaxes the program dependence structure by repli- validates speculative separation and speculative privatization to en- cating the reused storage locations, producing multiple copies in sure correct parallel execution. Privateer enables automatic paral- memory that support independent, concurrent access. Similarly, re- lelization of general-purpose C/C++ applications, yielding a ge- duction techniques relax ordering constraints on associative, com- omean whole-program speedup of 11.4× over best sequential ex- mutative operators by replacing (or expanding) storage locations. ecution on 24 cores, while non-speculative parallelization yields Prior work [7, 21, 22, 29, 32] demonstrates that privatization and only 0.93×. reductions are key enablers of parallelization. The applicability of privatization systems can be understood in Categories and Subject Descriptors D.1.3 [Software]: Concur- two dimensions (Figure 1). A system uses the Privatization Cri- rent Programming—Parallel Programming; D.3.4 [Programming terion [21] to decide if replacing a shared data structure would Languages]: Processors—Compilers, Optimization change program behavior. To replicate object storage, a system de- termines Memory Layout: the base address and size of memory ob- General Terms Languages, Performance, Design, Experimenta- jects. Prior work assesses the privatization criterion statically [29], tion dynamically [21], or speculatively [7, 22]. However, prior work limits memory layout to arrays and scalar Keywords Automatic parallelization, Separation, Speculation variables only and fails for programs that use linked or recursive data structures. The prevalent use of pointers and dynamic memory 1. Introduction allocation creates a dichotomy between static accesses and objects. The microprocessor industry has committed to multicore architec- A pointer may refer to different objects of different sizes at different tures. These additional hardware resources, however, offer no bene- times, and static analysis usually fails to disambiguate these cases. fit to sequential applications. Automatic parallelization is a promis- As a result, it is difficult for a static compiler to decide which ing approach to achieve performance on existing and new applica- objects to duplicate, even when it can decide which accesses are tions without additional programmer effort or application changes. private. Prior techniques are largely inapplicable to most C or C++ Yet automatic parallelization is not the norm. One limiting fac- applications for this reason. Table 1 summarizes prior work. tor is the compiler’s inability to distribute work across processors This work proposes Privateer, the first fully automatic system due to reuse of data structures. This reuse does not contribute to the capable of privatizing data structures in languages with pointers, constructive data flow of the program, but creates contention that type casts, and dynamic allocation. Instead of relying solely on prevents efficient parallel execution. A parallelizing compiler must static analysis to determine memory layout, Privateer employs pro- either respect this contention by enforcing exclusivity on accesses filing to characterize memory accesses at the granularity of mem- to shared data structures, or ignore it and risk data races that change ory objects. Using profiling information and static analysis, Priva- program behavior. teer identifies accesses to memory objects that are expected to be A compiler can remove contention by creating a private copy of iteration-private. Such objects are speculatively privatized, predict- the data structures for each worker process. Privatization eliminates ing that their accesses will remain iteration-private, thereby relax- ing program dependence structure and enabling optimization and parallelization. Privateer overcomes difficulties in memory layout while mini- mizing validation overheads. The loop’s memory footprint is par- Permission to make digital or hard copies of all or part of this work for personal or titioned into several logical heaps according to observed access classroom use is granted without fee provided that copies are not made or distributed patterns. Privateer speculates that these heaps remain separated for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute at runtime rather than speculating that individual memory access to lists, requires prior specific permission and/or a fee. pairs are independent. Workers validate this separation property au- PLDI’12, June 11–16, Beijing, China. tonomously, requiring neither a log of accesses nor communication Copyright c 2012 ACM 978-1-4503-1205-9/12/04. $10.00 with other workers. This separation property is efficiently checked using compact metadata encoded in pointer addresses. Speculative ulation [7, 22] in prior work. To replace these data structures, the separation reduces sensitivity to memory layout, thus allowing Pri- privatization strategy must also determine the memory layout. vateer to extend an LRPD-style shadow memory test [22] to ar- Determining the memory layout entails identifying all private bitrary objects. Privateer’s robust, layout-insensitive privatization objects: Q, pathcost, and all linked list nodes. The memory lay- and reductions enable automatic parallelization of applications with out enables the system to duplicate objects and re-route memory linked and recursive data structures. This work contributes: accesses so workers refer to their private copy. In the absence of pointers, a variable’s source-level name uniquely identifies a mem- • the first fully automatic system to support privatization and ory object, allowing the compiler to determine the object’s address reductions involving pointers and dynamic allocation; and size. However, languages with pointers and dynamic alloca- • an efficient, scalable validation scheme for speculative priva- tion allow a many-to-many relationship between names and ob- tization and reductions based on the speculative separation jects. Pointers refer to different objects at different times and al- of logical heaps according to usage patterns; and, location sites produce many objects, causing the code to exhibit different reuse patterns. Unlike related work, Privateer addresses • an application of this speculative privatization and reduction the complications of pointers, type casts, and dynamic allocation. technique to the problem of automatic parallelization. Privateer speculatively separates the program state into several Privateer’s transformations facilitate scalable automatic paral- logical heaps according to the reuse patterns observed during pro- lelization on commodity shared-memory machines. No program- filing. The compiler indicates this separation to the runtime system, mer hints are used, nor are any hardware extensions assumed. We which in turn privatizes without concern for individual objects. By implement Privateer and evaluate it on a set of 5 programs. On a grouping objects, a logical heap can be privatized as a whole by 24-core machine, results demonstrate a geomean whole-program adjusting virtual page tables, neither requiring complicated book- speedup of 11.4× over best sequential execution. To achieve these keeping nor adjusting object addresses in a running program. All results, Privateer privatized linked and recursive data structures objects of each logical heap are placed within a fixed memory ad- which are beyond the abilities of prior work. Speculation via heap dress range, allowing efficient validation of the separation property. separation allows Privateer to extract scalable performance from In the example, Q, pathcost, and all linked list nodes are accessed general purpose, irregular applications. privately whereas adj is only read; Privateer allocates them to dis- tinct logical heaps of private objects and read-only objects at com- pile time, validating this separation at runtime. 2. Motivation Speculative separation greatly simplifies the memory layout problem, condensing unboundedly many objects into few heaps. Automatic parallelization

Speculative Separation for Privatization and Reductions

Chinese Privatization: Between Plan and Market

Speculation: Futures and Capitalism in India: Opening Statement

Advanced Data Structures

Fast As a Shadow, Expressive As a Tree: Hybrid Memory Monitoring for C

Speculation in the United States Government Securities Market

Rockjit: Securing Just-In-Time Compilation Using Modular Control-Flow Integrity

Transforming Government Through Privatization

The Many Sides of John Stuart M I L L ' S Political and Ethical Thought Howard Brian Cohen a Thesis Submitted I N Conformity

Monetary Theory

Download/Repository/Ivan Fratric.Pdf, 2012

Speculative Business Loss from Trading in Shares Cannot Be Set-Off Against Profits from the Business of Futures and Options Prio

Practical Analysis of Framework-Intensive Applications