review articles

doi:10.1145/1787234.1787255

Solving the memory model problem will require an ambitious and cross-disciplinary research direction.

by Sarita V. Adve and Hans-J. Boehm

Memory Models: A Case for Rethinking Parallel Languages and Hardware

Most parallel programs today are written using threads and shared variables. Although there is no consensus on parallel programming models, there are a number of reasons why threads remain popular. Threads were already widely supported by mainstream operating systems well before the dominance of multicore, largely because they are also useful for other purposes. Direct hardware support for shared-memory potentially provides a performance advantage; for example, by implicitly sharing read-mostly data without the space overhead of complete replication. The ability to pass memory references among threads makes it easier to share complex data structures. Finally, shared-memory makes it far easier to selectively parallelize application hot spots without complete redesign of data structures.

90 communications of the acm | august 2010 | vol. 53 | no. 8

The memory model, or memory consistency model, is at the heart of the concurrency semantics of a shared-memory program or system. It defines the set of values that a read in a program is allowed to return, thereby defining the basic semantics of shared variables. It answers questions such as: Is there enough synchronization to ensure a thread's write will occur before another's read? Can two threads write to adjacent fields in a memory location at the same time? Must the final value of a location always be one of those written to it?

The memory model defines an interface between a program and any hardware or software that may transform that program (for example, the compiler, the virtual machine, or any dynamic optimizer). It is not possible to meaningfully reason about either a program (written in a high-level, byte code, assembly, or machine language) or any part of the language implementation (including hardware) without an unambiguous memory model.

A complex memory model makes parallel programs difficult to write, and parallel programming difficult to teach. An overly constraining one may limit hardware and compiler optimization, severely reducing performance. Since it is an interface property, the memory model decision has a long-lasting impact, affecting portability and maintainability of programs. Thus, a hardware architecture committed to a strong memory model cannot later forsake it for a weaker model without breaking binary compatibility, and a new compiler release with a weaker memory model may require rewriting source code. Finally, memory-model-related decisions for a single component must consider implications for the rest of the system. A processor vendor cannot guarantee a strong hardware model if the memory system designer provides a weaker model; a strong hardware model is not very useful to programmers using languages and compilers that provide only a weak guarantee.

key insights

- Memory models, which describe the semantics of shared variables, are crucial to both correct multithreaded applications and the entire underlying implementation stack. It is difficult to teach multithreaded programming without clarity on memory models.

- After much prior confusion, major programming languages are converging on a model that guarantees simple interleaving-based semantics for "data-race-free" programs, and most hardware vendors have committed to support this model.

- This process has exposed fundamental shortcomings in our languages and a hardware-software mismatch. Semantics for programs that contain data races seem fundamentally difficult, but are necessary for concurrency safety and debuggability. We call upon software and hardware communities to develop languages and systems that enforce data-race-freedom, and co-designed hardware that exploits and supports such semantics.

Nonetheless, the central role of the memory model has often been downplayed. This is partly because formally specifying a model that balances all desirable properties of programmability, performance, and portability has proven surprisingly complex. At the same time, informal, machine-specific descriptions proved mostly adequate in an era where parallel programming was the domain of experts and achieving the highest possible performance trumped programmability or portability arguments.

In the late 1980s and 1990s, the area received attention primarily in the hardware community, which explored many approaches, with little consensus.2 Commercial hardware memory model descriptions varied greatly in precision, including cases of complete omission of the topic and some reflecting vendors' reluctance to make commitments with unclear future implications. Although the memory model affects the meaning of every load instruction in every multithreaded application, it is still sometimes relegated to the "systems programming" section of the architecture manual.

Part of the challenge for hardware architects was the lack of clear memory models at the programming language level. It was unclear what programmers expected hardware to do. Although hardware researchers proposed approaches to bridge this gap,3 widespread adoption required consensus from the software community. Before 2000, there were a few programming environments that addressed the issue with relative clarity,40 but the most widely used environments had unclear and questionable specifications.9,32 Even when specifications were relatively clear, they were often violated to obtain sufficient performance,9 tended to be misunderstood even by experts, and were difficult to teach.

Since 2000, we have been involved in efforts to cleanly specify programming-language-level memory models, first for Java and then C++, with efforts now under way to adopt similar models for C and other languages. In the process, we had to address issues created by hardware that had evolved without the benefit of a clear programming model. This often made it difficult to reconcile the need for a simple and usable programming model with that for adequate performance on existing hardware.

Today, these languages and most hardware vendors have published (or plan to publish) compatible memory model specifications. Although this convergence is a dramatic improvement over the past, it has exposed fundamental shortcomings in our parallel languages and their interplay with hardware. After decades of research, it is still unacceptably difficult to describe what value a load can return without compromising modern safety guarantees or implementation methodologies. To us, this experience has made it clear that solving the memory model problem will require a significantly new and cross-disciplinary research direction for parallel computing languages, hardware, and environments as a whole.

This article discusses the path that led to the current convergence in memory models, the fundamental shortcomings it exposed, and the implications for future research. The central role of the memory model in parallel computing makes this article relevant to many computer science subdisciplines, including algorithms, applications, languages, compilers, formal methods, software engineering, virtual machines, runtime systems, and hardware. For practitioners and educators, we provide a succinct summary of the state of the art of this often-ignored and poorly understood topic. For researchers, we outline an ambitious, cross-disciplinary agenda toward resolving a fundamental problem in parallel computing today—what value can a shared variable have and how to implement it?

Sequential Consistency

A natural view of the execution of a multithreaded program operating on shared variables is as follows. Each step in the execution consists of choosing one of the threads to execute, and then performing the next step in that thread's execution (as dictated by the thread's program text, or program order). This process is repeated until the program as a whole terminates. Effectively, the execution can be viewed as taking all the steps executed by each thread, and interleaving them in some way. Whenever an object (that is, variable, field, or array element) is accessed, the last value stored to the object by this interleaved sequence is retrieved.

For example, consider Figure 1, which gives the core of Dekker's mutual exclusion algorithm. The program can be executed by interleaving the steps from the two threads in many ways. Formally, each of these interleavings is a total order over all the steps performed by all the threads, consistent with the program order of each thread. Each access to a shared variable "sees" the last prior value stored to that variable in the interleaving.

Figure 1. Core of Dekker's algorithm. Can r1 = r2 = 0?

Initially X=Y=0

Red Thread      Blue Thread
X = 1;          Y = 1;
r1 = Y;         r2 = X;

Figure 2 gives three possible executions that together illustrate all possible final values of the non-shared variables r1 and r2. Although many other interleavings are also possible, it is not possible that both r1 and r2 are 0 at the end of an execution; any execution must start with the first statement of one of the two threads, and the variable assigned there will later be read as one.

Figure 2. Some executions for Figure 1.

Execution 1     Execution 2     Execution 3
X = 1;          Y = 1;          X = 1;
r1 = Y;         r2 = X;         Y = 1;
Y = 1;          X = 1;          r1 = Y;
r2 = X;         r1 = Y;         r2 = X;
// r1 == 0      // r1 == 1      // r1 == 1
// r2 == 1      // r2 == 0      // r2 == 1

Following Lamport,26 an execution that can be understood as such an interleaving is referred to as sequentially consistent. Sequential consistency gives us the simplest possible meaning for shared variables, but suffers from several related flaws.

First, sequential consistency can be expensive to implement. For Figure 1, a compiler might, for example, reorder the two independent assignments in the red thread, since scheduling loads early tends to hide the load latency. In addition, modern processors almost always use a store buffer to avoid waiting for stores to complete, also effectively reordering instructions in each thread. Both the compiler and hardware optimization make an outcome of r1 == 0 and r2 == 0 possible, and hence may result in a non-sequentially consistent execution. Overall, reordering any pair of accesses, reading values from write buffers, register promotion, common subexpression elimination, redundant read elimination, and many other hardware and compiler optimizations commonly used in uniprocessors can potentially violate sequential consistency.2

There is some work on compiler analysis to determine when such transformations are unsafe (for example, Shasha and Snir37). Compilers, however, often have little information about sharing between threads, making it expensive to forego the optimizations, since we would have to forego them everywhere. There is also much work on speculatively performing these optimizations in hardware, with rollback on detection of an actual sequential consistency violation (for example, Ceze et al.14 and Gharachorloo et al.21). However, these ideas are tied to specific implementation techniques (for example, aggressive speculation support), and vendors have generally been unwilling to commit to those for the long term (especially, given non-sequentially consistent compilers). Thus, most hardware and compilers today do not provide sequential consistency.

Second, while sequential consistency may seem to be the simplest model, it is not sufficiently simple and a much less useful programming model than commonly imagined. For example, it only makes sense to reason about interleaving steps if we know what those steps are. In this case, they are typically individual memory accesses, a very low-level notion. Consider two threads concurrently assigning values of 100,000 and 60,000 to the shared variable X on a machine that accesses memory 16 bits at a time. The final value of X in a "sequentially consistent" execution may be 125,536 if the assignment of 60,000 occurred between the bottom and top half of the assignment of 100,000. At a somewhat higher level, this implies the meaning of even simple library operations depends on the granularity at which the library carries out those operations.

More generally, programmers do not reason about correctness of parallel code in terms of interleavings of individual memory accesses, and sequential consistency does not prevent common sources of concurrency bugs arising from simultaneous access to the same shared data (for example, data races). Even with sequential consistency, such simultaneous accesses can remain dangerous, and should be avoided, or at least explicitly highlighted. Relying on sequential consistency without such highlighting both obscures the code, and greatly complicates the implementation's job.

Data-Race-Free

We can avoid both of the problems mentioned here by observing that:

- The problematic transformations (for example, reordering accesses to unrelated variables in Figure 1) never change the meaning of single-threaded programs, but do affect multithreaded programs (for example, by allowing both r1 and r2 to be 0 in Figure 1).

- These transformations are detectable only by code that allows two threads to access the same data simultaneously in conflicting ways; for example, one thread writes the data and another reads it.

Programming languages generally already provide synchronization mechanisms, such as locks, or possibly transactional memory, for limiting simultaneous access to variables by different threads. If we require that these be used correctly, and guarantee sequential consistency only if no undesirable concurrent accesses are present, we avoid the above issues.

We can make this more precise as follows. We assume the language allows distinguishing between synchronization and ordinary (non-synchronization or data) operations (see below). We say that two memory operations conflict if they access the same memory location (for example, variable or array element), and at least one is a write.

We say that a program (on a particular input) allows a data race if it has a sequentially consistent execution (that is, a program-ordered interleaving of the operations of the individual threads) in which two conflicting ordinary operations execute "simultaneously." For our purposes, two operations execute "simultaneously" if they occur next to each other in the interleaving and correspond to different threads. Since these operations occur adjacently in the interleaving, we know that they could equally well have occurred in the opposite order; there are no intervening operations to enforce the order.

To ensure that two conflicting ordinary operations do not happen simultaneously, they must be ordered by intervening synchronization operations. For example, one thread must release a lock after accessing a shared variable, and the other thread must acquire the lock before its access. Thus, it is also possible to define data races as conflicting accesses not ordered by synchronization, as is done in Java. These definitions are essentially equivalent.1,12

A program that does not allow a data race is said to be data-race-free. The data-race-free model guarantees sequential consistency only for data-race-free programs.1,3 For programs that allow data races, the model does not provide any guarantees.

The restriction on data races is not onerous. In addition to locks for avoiding data races, modern programming languages generally also provide a mechanism, such as Java's volatile variables, for declaring that certain variables or fields are to be used for synchronization between threads. Conflicting accesses to such variables may occur simultaneously—since they are explicitly identified as synchronization (vs. ordinary), they do not create a data race.

To write Figure 1 correctly under data-race-free, we need simply identify the shared variables X and Y as synchronization variables. This would require the implementation to do whatever is necessary to ensure sequential consistency, in spite of those simultaneous accesses. It would also obligate the implementation to ensure that these synchronization accesses are performed indivisibly; if a 32-bit integer is used for synchronization purposes, it should not be visibly accessed as two 16-bit halves.

This "sequential consistency for data-race-free programs" approach alleviates the problems discussed with pure sequential consistency. Most important hardware and compiler optimizations continue to be allowed for ordinary accesses—care must be taken primarily at the explicitly identified (infrequent) synchronization accesses since these are the only ones through which such optimizations and granularity considerations affect program outcome. Further, synchronization-free sections of the code appear to execute atomically and the requirement to explicitly identify concurrent accesses makes it easier for humans and compilers to understand the code. (For more detail, see our technical report.11)

Data-race-free does not give the implementation a blanket license to perform single-threaded program optimizations. In particular, optimizations that amount to copying a shared variable to itself, such as introducing the assignment x = x, where x might not otherwise have been written, generally remain illegal. These are commonly performed in certain contexts,9 but should not be.

Although data-race-free was formally proposed in 1990,3 it did not see widespread adoption as a formal model in industry until recently. We next describe the evolution of industry models to a convergent path centered around data-race-free, the emergent shortcomings of data-race-free, and their implications for the future.

Industry Practice and Evolution

Hardware memory models. Most hardware supports relaxed models that are weaker than sequential consistency. These models take an implementation- or performance-centric view, where the desirable hardware optimizations drive the model specification.1,2,20 Typical driving optimizations relax the program order requirement of sequential consistency. For example, Sparc's TSO guarantees that a thread's memory accesses will become visible to other threads in program order, except for the case of a write followed by a read. Such models additionally provide fence instructions to enable programmers to explicitly impose orderings that are otherwise not guaranteed; for example, TSO programmers may insert a fence between a thread's write and read to ensure the execution preserves that order.

Such a program-orderings + fences style of specification is simple, but many subtleties make it inadequate.1,2 First, this style implies that a write is an atomic or indivisible operation that becomes visible to all threads at once. As Figure 3 illustrates, however, hardware may make writes visible to different threads at different times through write buffers and shared caches. Incorporating such optimizations increases the complexity of the memory model specification. Thus, the full TSO specification, which incorporates one of the simplest atomicity optimizations, is much more involved than the simple description here. PowerPC implements more aggressive forms of the optimization, with a specification that is complex and difficult to interpret even for experts. The documentation from both AMD and Intel was ambiguous on this issue; recent updates now clarify the intent, but remain informal.

Figure 3. Hardware may not execute atomic or indivisible writes.

Assume a fence imposes program order. Assume core 3's and core 4's caches have X and Y. The two writes generate invalidations for these caches. These could reach the caches in a different order, giving the result shown and a deduction that X's update occurs both before and after Y's.

Initially X = Y = 0

Core 1      Core 2      Core 3      Core 4
X = 1;      Y = 1;      r1 = X;     r3 = Y;
                        fence;      fence;
                        r2 = Y;     r4 = X;

Can r1 = 1, r2 = 0, r3 = 1, r4 = 0, violating write atomicity?

Second, in well-written software, a thread usually relies on synchronization interactions to reason about the ordering or visibility of memory accesses on other threads. Thus, it is usually overkill to require that two program-ordered accesses always become visible to all threads in the same order or a write appears atomic to all threads regardless of the synchronization among the threads. Instead, it is sufficient to preserve ordering and atomicity only among mutually synchronizing threads. Some hardware implementations attempt to exploit this insight, albeit often through ad hoc techniques, thereby further complicating the memory model.

Third, modern processors perform various forms of speculation (for example, on branches and addresses) which can result in subtle and complex interactions with data and control dependences, as illustrated in Figure 4.1,29
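The data-race-free fix for Figure 1 described earlier can be made concrete in C++. The following sketch is ours, not the article's: it declares X and Y as synchronization variables via std::atomic, whose default sequentially consistent ordering obliges the compiler and hardware to insert whatever fences are needed, so the r1 == 0 && r2 == 0 outcome is forbidden.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// One trial of Figure 1 (Dekker's core), with X and Y identified as
// synchronization variables. The default std::memory_order_seq_cst
// prevents the store->load reordering that a store buffer or compiler
// could otherwise perform on plain ints.
std::pair<int, int> dekker_trial() {
    std::atomic<int> X{0}, Y{0};
    int r1 = -1, r2 = -1;
    std::thread red([&] {
        X.store(1);      // X = 1;
        r1 = Y.load();   // r1 = Y;
    });
    std::thread blue([&] {
        Y.store(1);      // Y = 1;
        r2 = X.load();   // r2 = X;
    });
    red.join();
    blue.join();
    return {r1, r2};     // r1 == 0 && r2 == 0 cannot occur
}
```

Had X and Y been plain ints, the program would contain a data race and the data-race-free model would promise nothing; with the atomics, every run must correspond to one of the interleavings of Figure 2.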


Incorporating these considerations in a The Java memory model. Java pro- precise way adds another source of com- vided first-class support for threads plexity to program-order + fence style with a chapter specifically devoted to specifications. As we discuss later, pre- its memory model. Pugh showed that cise formalization of data and control this model was difficult to interpret dependences is a fundamental obstacle Hardware and badly broken—common compiler to providing clean high-level memory memory model optimizations were prohibited and in model specifications today. many cases the model gave ambiguous In summary, hardware memory specifications or unexpected behavior.32 In 2000, Sun model specifications have often been have often been appointed an expert group to revise the incomplete, excessively complex, and/ model through the Java community or ambiguous enough to be misinter- incomplete, process.33 The effort was coordinated preted even by experts. Further, since through an open mailing list that at- hardware models have largely been excessively tracted a variety of participants, repre- driven by hardware optimizations, they complex, and/or senting hardware and software and re- have often not been well-matched to searchers and practitioners. software requirements, resulting in in- ambiguous enough It was quickly decided that the Java correct code or unnecessary loss in per- to be misinterpreted memory model must provide sequen- formance, as discussed later. tial consistency for data-race-free pro- High-level language memory mod- even by experts. grams, where volatile accesses (and els. Ada was perhaps the first widely locks from synchronized methods used high-level programming lan- and monitors) were deemed synchro- guage to provide first-class support for nization. shared-memory parallel programming. However, data-race-free is inad- Although Ada’s approach to thread syn- equate for Java. 
Since Java is meant to chronization was initially quite differ- be a safe and secure language, it cannot ent from both that of the earlier Mesa allow arbitrary behavior for data races. design and most later language designs, Specifically, Java must support untrust- it was remarkably advanced in its treat- ed code running as part of a trusted ap- ment of memory semantics.40 It used a plication and hence must limit damage style similar to data-race-free, requiring done by a data race in the untrusted legal programs to be well-synchronized; code. Unfortunately, the notions of safe- however, it did not fully formalize the ty, security, and “limited damage” in a notion of well-synchronized and left un- multithreaded context were not clearly certain the behavior of such programs. defined. The challenge with defining Subsequently, until the introduc- the Java model was to formalize these tion of Java, mainstream programming notions in a way that minimally affected languages did not provide first-class system flexibility. support for threads, and shared-mem- Figure 4(b) illustrates these issues. ory programming was mostly enabled The program has a data race and is bug- through libraries and APIs such as Posix gy. However, Java cannot allow its reads threads and OpenMP. Previous work de- to return values out-of-thin-air (for ex- scribes why the approach of an add-on ample, 42) since this could clearly com- threads library is not entirely satisfac- promise safety and security. It would, tory.9 Without a real definition of the for example, make it impossible to programming language in the context guarantee that similar untrusted code of threads, it is unclear what compiler cannot return a password that it should transformations are legal, and hence not have access to. Such a scenario ap- what the programmer is allowed to as- pears to violate any reasonable cau- sume. 
Nevertheless, the Posix threads sality expectation and no current pro- specification indicates a model similar cessor produces it. Nevertheless, the to data-race-free, although there are sev- memory model must formally prohibit eral inconsistent aspects, with widely such behavior so that future specula- varying interpretations even among ex- tive processors also avoid it. perts participating in standards com- Prohibiting such causality violations mittee discussions. The OpenMP mod- in a way that does not also prohibit el is also unclear and largely based on other desired optimizations turned out a flush instruction that is analogous to to be surprisingly difficult. Figure 5 il- fence instructions in hardware models, lustrates an example that also appears with related shortcomings. to violate causality, but is allowed by the

august 2010 | vol. 53 | no. 8 | communications of the acm 95 review articles

Figure 4. Subtleties with (a) control and (b) data dependences. behaved executions, and the to-be com- mitted write is not dependent on a read that returns its value from a data race. It is feasible for core 1 to speculate that its read of X will see 1 and speculatively write Y. Core 2 similarly writes X. Both reads now return 1, creating a “self-fulfilling” speculation or a “causality These conditions ensure that a future loop.” Within a single core, no control dependences are violated since the speculation appears correct; data race will never be used to justify a however, most programmers will not expect such an outcome (the code is in fact data-race-free since speculative write that could then later no sequentially consistent execution contains a data race). Part (b) shows an analogous causal loop justify that future data race. with data dependences. Core 1 may speculate X is 42 (for example, using value prediction based on previous store values) and (speculatively) write 42 into Y. Core 2 reads this and writes 42 into X, thereby A key reason for the complexity in proving the speculation right and creating a causal loop that generates a value (42) out-of-thin-air. the Java model is that it is not opera- Fortunately, no processor today behaves this way, but the memory model specification needs to reflect tional—an access in the future can de- this property. termine whether the current access is Initially X=Y=0 Initially X=Y=0 legal. Further, many possible future Core 1 Core 2 Core 1 Core 2 executions must be examined to deter- r1 = X; r2 = Y; r1 = X; r2 = Y; mine this legality. The choice of future if (r1 == 1) if (Y == 1) Y = r1; X = r2; (well-behaved) executions also gives Y = 1; X = 1; some surprising results. In particular, Is r1 = r2 = 1 allowed? Is r1 = r2 = 42 allowed? as discussed in Manson29 if the code of (a) (b) one thread is “inlined” in (concatenated with) another thread, then the inlined code can produce more behaviors than the original. 
Thus, thread inlining is generally illegal under the Java model Figure 5. Redundant read elimination must be allowed. (even if there are no synchronization- and deadlock-related considerations). For thread 1, the compiler could eliminate the Initially X=Y=0 In practice, the prohibited optimiza- redundant read of X, replacing r2=X with r2=r1. This allows deducing that r1==r2 is always Original Code tions are difficult to implement and this true, making the write of Y unconditional. Then Thread 1 Thread 2 is not a significant performance limi- the compiler may move the write to before r1 = X; r3 = Y; tation. The behavior, however, is non- the read of X since no dependence is violated. r2 = X; X = r3; intuitive, with other implications—it Sequential consistency would allow both the if (r1 == r2) reads of X and Y to return 1 in the new but not occurs because some data races in the Y = 1; the original code. This outcome for the original original code may no longer be data rac- code appears to violate causality since it seems After compiler transformation es in the inlined code. This means that to require a self-justifying speculative write of Y. Thread 1 Thread 2 It must, however, be allowed if compilers are to when determining whether to commit a Y = 1; r3 = Y; perform the common optimization of redundant write early, a read in a well-behaved ex- r1 = X; X = r3; read elimination. r2 = r1; ecution has more choices to return val- if (true); ues than before (since there are fewer Is r1 = r2 = r3 = 1 allowed? data races), resulting in new behaviors. More generally, increasing synchro- nization in the Java model can actually result in new behaviors, even though common compiler optimization of re- cuted (the execution where both reads more synchronization conventionally dundant read elimination.29 After many of X return 0). For Figure 4(b), there is constrains possible executions. 
Recent- proposals and five years of spirited de- no sequentially consistent execution ly, it has been shown that, for similar bate, the current model was approved where Y=42 could occur. This notion of reasons, adding seemingly irrelevant as the best compromise. This model whether a speculative write could occur reads or removing redundant reads allows the outcome of Figure 5, but not in some well-behaved execution is the sometimes can also add new behaviors, that of Figure 4(b). Unfortunately, this basis of causality in the Java model, and and that all these properties have more model is very complex, was known to the definition of well-behaved is the key serious implications than previously have some surprising behaviors, and source of complexity. thought.36 In particular, some optimiza- has recently been shown to have a bug. The Java model tests for the legal- tions that were intended to be allowed We provide intuition for the model be- ity of an execution by “committing” by the Java model are in fact prohibited low and refer the reader to Manson et one or more of its memory accesses at by the current specification. al.26 for a full description. a time—legality requires all accesses to It is unclear if current hardware or Common to both Figure 4(b) and commit (in any order). Committing a JVMs implement the problematic op- Figure 5 are writes that are executed write early (before its turn in program timizations noted here and therefore earlier than they would be with sequen- order) requires it to occur in a well- violate the current Java model. Cer- tial consistency. The examples differ behaved execution where (informally) tainly the current specification is much in that for the speculative write in the the already committed accesses have improved over the original. 
Regardless, latter (Y=1), there is some sequentially similar synchronization and data race the situation is still far from satisfac- consistent execution where it is exe- relationships in all previously used well- tory. First, clearly, the current specifi-

Regardless, the situation is still far from satisfactory. First, clearly, the current specification does not meet its desired intent of having certain common optimizing transformations preserve program meaning. Second, its inherent complexity and the new observations make it difficult to prove the correctness of any real system. Third, the specification methodology is inherently fragile—small changes usually result in hard-to-detect unintended consequences.

The Java model was largely guided by an emergent set of test cases,33 based on informal code transformations that were or were not deemed desirable. While it may be possible to fix the Java model, it seems undesirable that our specification of multithreaded program behavior would rest on such a complex and fragile foundation. Instead, the section entitled "Implications for Languages" advocates a fundamental rethinking of our approach.

The C++ memory model. The situation in C++ was significantly different from Java. The language itself provided no support for threads. Nonetheless, they were already in widespread use, typically with the addition of a library-based threads implementation, such as pthreads22 or the corresponding Microsoft Windows facilities. Unfortunately, the relevant specifications, for example, the combination of the C or C++ standard with the Posix standard, left significant uncertainties about the rules governing shared variables.9 This made it unclear to compiler writers precisely what they needed to implement, resulted in very occasional failures for which it was difficult to assign blame to any specific piece of the implementation, and, most importantly, made it difficult to teach parallel programming, since even the experts were unable to agree on some of the basic rules, such as whether Figure 4(a) constitutes a data race. (Correct answer: No.)

Motivated by these observations, we began an effort in 2005 to develop a proper memory model for C++. The resulting effort eventually expanded to include the definition of atomic (synchronization, analogous to Java volatile) operations, and the threads API itself. It is part of the current Committee Draft24 for the next C++ revision. The next C standard is expected to contain a very similar memory model, with very similar atomic operations.

This development took place in the face of increasing doubt that a Java-like memory model relying on sequential consistency for data-race-free programs was efficiently implementable on mainstream architectures, at least given the specifications available at the time. Largely as a result, much of the early discussion focused on the tension between the following two observations, both of which we still believe to be correct given existing hardware:

• A programming language model weaker than data-race-free is probably unusable by a large fraction of the programming community. Earlier work10 points out, for example, that even core thread library implementors often get confused when it comes to dealing explicitly with memory ordering issues. Substantial effort was invested in attempts to develop weaker, but comparably simple and usable, models. We do not feel these were successful.

• On some architectures, notably on some PowerPC implementations, data-race-free involves substantial implementation cost. (In light of modern (2009) specifications, the cost on others, notably x86, is modest, and limited largely to atomic (C++) or volatile (Java) store operations.)

This resulted in a compromise memory model that supports data-race-free for nearly all of the language. However, atomic data types also provide low-level operations with explicit memory ordering constraints that blatantly violate sequential consistency, even in the absence of data races. The low-level operations are easily identified and can be easily avoided by non-expert programmers. (They require an explicit memory_order_ argument.) But they do give expert programmers a way to write very carefully crafted, but portable, synchronization code that approaches the performance of assembly code.

Since C++ does not support sandboxed code execution, the C++ draft standard can and does leave the semantics of a program with data races completely undefined, effectively making it erroneous to write such programs. As we point out in Boehm and Adve,12 this has a number of (mostly) performance-related advantages, and better reflects existing compiler implementations.

In addition to the issues raised in the "Lessons Learned" section, it should be noted that this really only pushes Java's issues with causality into a much smaller and darker corner of the specification; exactly the same issues arise if we rewrite Figure 4(b) with C++ atomic variables and use low-level memory_order_relaxed operations. Our current solution to this problem is simpler, but as inelegant as the Java one. Unlike Java, it affects a small number of fairly esoteric library calls, not all memory accesses.

As with the Java model, we feel that although this solution involves compromises, it is an important step forward. It clearly establishes data-race-free as the core guarantee that every programmer should understand. It defines precisely what constitutes a data race. It finally resolves simple questions such as: If x.a and x.b are assigned simultaneously, is that a data race? (No, unless they are part of the same contiguous sequence of bit-fields.) By doing so, it clearly identifies shortcomings of existing compilers that we can now begin to remedy.

Reconciling language and hardware models. Throughout this process, it repeatedly became clear that current hardware models and supporting fence instructions are often at best a marginal match for programming language memory models, particularly in the presence of Java volatile fields or C++ atomic objects. It is always possible to implement such synchronization variables by mapping each one to a lock, and acquiring and releasing the corresponding lock around all accesses. However, this typically adds an overhead of hundreds of cycles to each access, particularly since the lock accesses are likely to result in coherence cache misses, even when only read accesses are involved.

Volatile and atomic variables are typically used to avoid locks for exactly these reasons. A typical use is a flag that indicates a read-only data structure has been lazily initialized. Since the initialization has to happen only once, nearly all accesses simply read the atomic/volatile flag and avoid lock acquisitions. Acquiring a lock to access the flag defeats the purpose.

On hardware that relaxes write atomicity (see Figure 3), however, it is often unclear that more efficient mappings (than the use of locks) are possible; even the fully fenced implementation may not be sequentially consistent.


Even on other hardware, there are apparent mismatches, most probably caused by the lack of a well-understood programming language model when the hardware was designed. On x86, it is almost sufficient to map synchronization loads and stores directly to ordinary load and store instructions. The hardware provides sufficient guarantees to ensure that ordinary memory operations are not visibly reordered with synchronization operations. However, it fails to prevent reordering of a synchronization store followed by a synchronization load; thus this implementation does not prevent the incorrect outcome for Figure 1.

This may be addressed by translating a synchronization store to an ordinary store instruction followed by an expensive fence. The sole purpose of this fence is to prevent reordering of the synchronization store with a subsequent synchronization load. In practice, such a synchronization load is unlikely to follow closely enough (Dekker's algorithm is not commonly used) to really constrain the hardware. But the only available fence instruction constrains all memory reordering around it, including that involving ordinary data accesses, and thus overly constrains the hardware. A better solution would involve distinguishing between two flavors of loads and stores (ordinary and synchronization), roughly along the lines of Itanium's ld.acq and st.rel.23 This, however, requires a change to the instruction set architecture, usually a difficult proposition.

We suspect the current situation makes the fence instruction more expensive than necessary, in turn motivating additional language-level complexity such as C++ low-level atomics or lazySet() in Java.

Lessons Learned
Data-race-free provides a simple and consistent model for threads and shared variables. We are convinced it is the best model today to target during initial software development. Unfortunately, its lack of any guarantees in the presence of data races and its mismatch with current hardware imply three significant weaknesses:

Debugging. Accidental introduction of a data race results in "undefined behavior," which often takes the form of surprising results later during program execution, possibly long after the data race has resulted in corrupted data. Although the usual immediate result of a data race is that an unexpected, and perhaps incomplete, value is read, or that an inconsistent value is written, we point out in prior work12 that other results, such as wild branches, are also possible as a result of compiler optimizations that mistakenly assume the absence of data races. Since such races are difficult to reproduce, the root cause of such misbehavior is often difficult to identify, and such bugs may easily take weeks to track down. Many tools to aid such debugging (for example, CHESS30 and RaceFuzzer35) also assume sequential consistency, somewhat limiting their utility.

Synchronization variable performance on current hardware. As discussed, ensuring sequential consistency in the presence of Java volatile or C++ atomic on current hardware can be expensive. As a result, both C++, and to a lesser extent Java, have had to provide less expensive alternatives that greatly complicate the model for experts trying to use them.

Untrusted code. There is no way to ensure data-race-freedom in untrusted code. Thus, this model is insufficient for languages like Java.

An unequivocal lesson from our experiences is that for programs with data races, it is very difficult to define semantics that are easy to understand and yet retain desired system flexibility. While the Java memory model came a long way, its complexity, and the subsequent discoveries of its surprising behaviors, are far from satisfying. Unfortunately, we know of no alternative specification that is sufficiently simple to be considered practical. Second, rules to weaken the data-race-free guarantee to better match current hardware, as through C++ low-level atomics, are also more complex than we would like.

The only clear path to improvement here seems to be to eliminate the need for going beyond the data-race-free guarantee by:
• Eliminating the performance motivations for going beyond it, and
• Ensuring that data races are never actually executed at runtime, thus both avoiding the need to specify their behavior and greatly simplifying or eliminating the debugging issues associated with data races.

Unfortunately, these both take us to active research areas, with no clear off-the-shelf solutions. We discuss some possible approaches here.

Implications for Languages
In spite of the dramatic convergence in the debate on memory models, the state of the art imposes a difficult choice: a language that supposedly has strong safety and security properties, but no clear definition of what value a shared-memory read may return (the Java case), versus a language with clear semantics, but that requires abandoning security properties promised by languages such as Java (the C++ case). Unfortunately, modern software needs to be both parallel and secure, and requiring a choice between the two should not be acceptable.

A pessimistic view would be to abandon shared-memory altogether. However, the intrinsic advantages of a global address space are, at least anecdotally, supported by the widespread use of threads despite the inherent challenges. We believe the fault lies not in the global address space paradigm, but in the use of undisciplined or "wild shared-memory," permitted by current systems.

Data-race-free was a first attempt to formalize a shared-memory discipline via a memory model. It proved inadequate because the responsibility for following this discipline was left to the programmer. Further, data-race-free by itself is, arguably, insufficient as a discipline for writing correct, easily debuggable, and maintainable shared-memory code; for example, it does not completely eliminate atomicity violations or non-deterministic behavior.

Moving forward, we believe a critical research agenda to enable "parallelism for the masses" is to develop and promote disciplined shared-memory models that:
• are simple enough to be easily teachable to undergraduates; that is, minimally provide sequential consistency to programs that obey the required discipline;
• enable the enforcement of the discipline; that is, violations of the discipline should not have undefined or horrendously complex semantics, but should

be caught and returned back to the programmer as illegal;
• are general-purpose enough to express important parallel algorithms and patterns; and
• enable high and scalable performance.

Many previous programmer-productivity-driven efforts have sought to raise the level of abstraction with threads; for example, Cilk,19 TBB,25 OpenMP,39 the recent HPCS languages,28 other high-level libraries, frameworks, and APIs such as java.util.concurrent and the C++ boost libraries, as well as more domain-specific ones. While these solutions go a long way toward easing the pain of orchestrating parallelism, our memory-models-driven argument is deeper—we argue that, at least so far, it is not possible to provide reasonable semantics for a language that allows data races, an arguably more fundamental problem. In fact, all of these examples either provide unclear models or suffer from the same limitations as C++/Java. These approaches, therefore, do not meet our enforcement requirement. Similarly, transactional memory provides a high-level mechanism for atomicity, but the memory model in the presence of non-transactional code faces the same issues as described here.38

At the heart of our agenda of disciplined models are the questions: What is the appropriate discipline? How to enforce it?

A near-term transition path is to continue with data-race-free and focus research on its enforcement. The ideal solution is for the language to eliminate data races by design (for example, Boyapati13); however, our semantics difficulties are avoided even with dynamic techniques (for example, Elmas et al.,17 Flanagan and Freund,18 or Lucia et al.27) that replace all data races with exceptions. (There are other dynamic data race detection techniques, primarily for debugging, but they do not guarantee complete accuracy, as required here.)

A longer-term direction concerns both the appropriate discipline and its enforcement. A fundamental challenge in debugging, testing, and reasoning about threaded programs arises from their inherent non-determinism—an execution may exhibit one of many possible interleavings of its memory accesses. In contrast, many applications written for performance have deterministic outcomes and can be expressed with deterministic algorithms. Writing such programs using a deterministic environment allows reasoning with sequential semantics (a memory model much simpler than sequential consistency with threads).

A valuable discipline, therefore, is to provide a guarantee of determinism by default; when non-determinism is inherently required, it should be requested explicitly and should not interfere with the deterministic guarantees for the remaining program.7 There is much prior work in deterministic data-parallel, functional, and actor languages. Our focus is on general-purpose efforts that continue use of widespread programming practices; for example, global address space, imperative languages, object-oriented programming, and complex, pointer-based data structures.

Language-based approaches with such goals include Jade34 and the recent Deterministic Parallel Java (DPJ).8 In particular, DPJ proposes a region-based type and effect system for deterministic-by-default semantics—"regions" name disjoint partitions of the heap, and per-method effect annotations summarize which regions are read and written by each method. Coupled with a disciplined parallel control structure, the compiler can easily use the effect summaries to ensure that there are no unordered conflicting accesses and the program is deterministic. Recent results show that DPJ is applicable to a range of applications and complex data structures and provides performance comparable to threads code.8

There has also been much recent progress in runtime methods for determinism.4,5,16,31

Both language and runtime approaches have pros and cons and still require research before mainstream adoption. A language-based approach must establish that it is expressive enough and does not incur undue programmer burden. For the former, the new techniques are promising, but the jury is still out. For the latter, DPJ is attempting to alleviate the burden by using a familiar base language (currently Java) and providing semiautomatic tools to infer the required programmer annotations.41


Further, language annotations such as DPJ's read/write effect summaries are valuable documentation in their own right—they promote lifetime benefits for modularity and maintainability, arguably compensating for the upfront programmer effort. Finally, a static approach benefits from no overhead or surprises at runtime.

In contrast, the purely runtime approaches impose less burden on the programmer, but a disadvantage is that the overheads in some cases may still be too high. Further, a runtime approach inherently does not provide the guarantees of a static approach before shipping and is susceptible to surprises in the field.

We are optimistic that the recent approaches have opened up many promising new avenues for disciplined shared-memory that can overcome the problems described here. It is likely that a final solution will consist of a judicious combination of language and runtime features, and will derive from a rich line of future research.

Implications for Hardware
As discussed earlier, current hardware memory models are an imperfect match for even current software (data-race-free) memory models. ISA changes to identify individual loads and stores as synchronization can alleviate some short-term problems. An established ISA, however, is difficult to change, especially when existing code works mostly adequately and there is not enough experience to document the benefits of the change.

Academic researchers have taken an alternate path that uses complex mechanisms (for example, Blundell et al.6) to speculatively remove the constraints imposed by fences, rolling back the speculation when it is detected that the constraints were actually needed. While these techniques have been shown to work well, they come at an implementation cost and do not directly confront the root of the problem of mismatched hardware/software views of concurrency semantics.

Taking a longer-term perspective, we believe a more fundamental solution to the problem will emerge with a co-designed approach, where future multicore hardware research evolves in concert with the software models research discussed in "Implications for Languages." The current state of hardware technology makes this a particularly opportune time to embark on such an agenda. Power and complexity constraints have led industry to bet that future single-chip performance increases will largely come from increasing numbers of cores. Today's hardware cache-coherent multicore designs, however, are optimized for few cores—power-efficient performance scaling to several hundreds or a thousand cores without consideration of software requirements will be difficult.

We view this challenge as an opportunity not only to resolve the problems discussed in this article but, in doing so, to build more effective hardware and software. First, we believe that hardware that takes advantage of the emerging disciplined software programming models is likely to be more efficient than a software-oblivious approach. This observation already underlies the work on relaxed hardware consistency models—we hope the difference this time around will be that the software and hardware models will evolve together rather than as retrofits for each other, providing more effective solutions. Second, hardware research to support the emerging disciplined software models is also likely to be critical. Hardware support can be used for efficient enforcement of the required discipline when static approaches fall short; for example, through directly detecting violations of the discipline and/or through effective strategies to sandbox untrusted code.

Along these lines, we have recently begun the DeNovo hardware project at Illinois15 in concert with DPJ. We are exploiting DPJ-like region and effect annotations to design more power- and complexity-efficient, software-driven communication and coherence protocols and task scheduling mechanisms. We also plan to provide hardware and runtime support to deal with cases where DPJ's static information and analysis might fall short. As such co-designed models emerge, ultimately we expect them to drive the future hardware-software interface, including the ISA.

Conclusion
This article gives a perspective based on work collectively spanning approximately 30 years. We have been repeatedly surprised at how difficult it is to formalize the seemingly simple and fundamental property of "what value a read should return in a multithreaded program." Sequential consistency for data-race-free programs appears to be the best we can do at present, but it is insufficient. The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in the foundation of our languages and systems. It is well accepted that most shipped software has bugs, and it is likely that much commercial multithreaded software has data races. Debugging tools and safe languages that seek to sandbox untrusted code must deal with such races, and must be given semantics that reasonable computer science graduates and developers can understand.

We believe it is time to rethink how we design our languages and systems. Minimally, the system, and preferably the language, must enforce the absence of data races. A longer-term, potentially more rewarding strategy is to rethink higher-level disciplines that make it much easier to write parallel programs and that can be enforced by our languages and systems. We also believe some of the messiness of memory models today could have been averted with closer cooperation between hardware and software. As we move toward more disciplined programming models, there is also a new opportunity for a hardware/software co-designed approach that rethinks the hardware/software interface and the hardware implementations of all concurrency mechanisms. These views embody a rich research agenda that will need the involvement of many computer science sub-disciplines, including languages, compilers, formal methods, software engineering, algorithms, runtime systems, and hardware.

Acknowledgments
This article is deeply influenced by a collective 30 years of collaborations and discussions with more colleagues than can be named here. We would particularly like to acknowledge the contributions of Mark Hill for co-developing the data-race-free approach and other foundational work; Kourosh Gharachorloo for hardware models; Jeremy Manson and Bill Pugh for the Java model; Lawrence Crowl, Paul McKenney, Clark Nelson, and Herb Sutter for the C++ model; and Vikram Adve, Rob Bocchino, and Marc Snir for ongoing work on disciplined programming models and their enforcement. We thank Doug Lea for continuous encouragement to push the envelope. Finally, we thank Vikram Adve, Rob Bocchino, Nick Carter, Lawrence Crowl, Mark Hill, Doug Lea, Jeremy Manson, Paul McKenney, Bratin Saha, and Rob Schreiber for comments on earlier drafts.

Sarita Adve is currently funded by Intel and Microsoft through the Illinois Universal Parallel Computing Research Center.

References
1. Adve, S.V. Designing Memory Consistency Models for Shared-Memory Multiprocessors. Ph.D. thesis, University of Wisconsin-Madison, 1993.
2. Adve, S.V. and Gharachorloo, K. Shared memory consistency models: A tutorial. IEEE Computer 29, 12 (1996), 66–76.
3. Adve, S.V. and Hill, M.D. Weak ordering—A new definition. In Proceedings of the 17th Intl. Symp. on Computer Architecture, 1990, 2–14.
4. Allen, M.D., Sridharan, S. and Sohi, G.S. Serialization sets: A dynamic dependence-based parallel execution model. In Proceedings of the Symp. on Principles and Practice of Parallel Programming, 2009.
5. Berger, E.D., Yang, T., Liu, T. and Novark, G. Grace: Safe multithreaded programming for C/C++. In Proceedings of the Intl. Conf. on Object-Oriented Programming, Systems, Languages, and Applications, 2009.
6. Blundell, C., Martin, M.M.K. and Wenisch, T. InvisiFence: Performance-transparent memory ordering in conventional multiprocessors. In Proceedings of the Intl. Symp. on Computer Architecture, 2009.
7. Bocchino, R. et al. Parallel programming must be deterministic by default. In Proceedings of the 1st Workshop on Hot Topics in Parallelism, 2009.
8. Bocchino, R. et al. A type and effect system for Deterministic Parallel Java. In Proceedings of the Intl. Conf. on Object-Oriented Programming, Systems, Languages, and Applications, 2009.
9. Boehm, H.-J. Threads cannot be implemented as a library. In Proceedings of the Conf. on Programming Language Design and Implementation, 2005.
10. Boehm, H.-J. Reordering constraints for pthread-style locks. In Proceedings of the 12th Symp. on Principles and Practice of Parallel Programming, 2007, 173–182.
11. Boehm, H.-J. Threads basics. July 2009; http://www.hpl.hp.com/personal/Hans_Boehm/threadsintro.html.
12. Boehm, H.-J. and Adve, S.V. Foundations of the C++ concurrency memory model. In Proceedings of the Conf. on Programming Language Design and Implementation, 2008, 68–78.
13. Boyapati, C., Lee, R. and Rinard, M. Ownership types for safe programming: Preventing data races and deadlocks. In Proceedings of the Intl. Conf. on Object-Oriented Programming, Systems, Languages, and Applications, 2002.
14. Ceze, L. et al. BulkSC: Bulk enforcement of sequential consistency. In Proceedings of the Intl. Symp. on Computer Architecture, 2007.
15. Choi, B. et al. DeNovo: Rethinking hardware for disciplined parallelism. In Proceedings of the 2nd Workshop on Hot Topics in Parallelism, June 2010.
16. Devietti, J. et al. DMP: Deterministic shared memory multiprocessing. In Proceedings of the Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (Mar. 2009), 85–96.
17. Elmas, T., Qadeer, S. and Tasiran, S. Goldilocks: A race and transaction-aware Java runtime. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 2007, 245–255.
18. Flanagan, C. and Freund, S. FastTrack: Efficient and precise dynamic race detection. In Proceedings of the Conf. on Programming Language Design and Implementation, 2009.
19. Frigo, M., Leiserson, C.E. and Randall, K.H. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1998, 212–223.
20. Gharachorloo, K. Memory Consistency Models for Shared-Memory Multiprocessors. Ph.D. thesis, Stanford University, Stanford, CA, 1996.
21. Gharachorloo, K., Gupta, A. and Hennessy, J. Two techniques to enhance the performance of memory consistency models. In Proceedings of the Intl. Conf. on Parallel Processing, 1991, I355–I364.
22. IEEE and The Open Group. IEEE Standard 1003.1-2001, 2001.
23. Intel. Intel Itanium Architecture: Software Developer's Manual, Jan. 2006.
24. ISO/IEC JTC1/SC22/WG21. ISO/IEC 14882, Programming Languages—C++ (final committee draft), 2010; http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf.
25. Reinders, J. Intel Threading Building Blocks: Outfitting C++ for Multi-core Parallelism. O'Reilly, 2007.
26. Lamport, L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers C-28, 9 (1979), 690–691.
27. Lucia, B. et al. Conflict exceptions: Simplifying concurrent language semantics with precise hardware exceptions for data-races. In Proceedings of the Intl. Symp. on Computer Architecture, 2010.
28. Lusk, E. and Yelick, K. Languages for high-productivity computing: The DARPA HPCS language project. Parallel Processing Letters 17, 1 (2007), 89–102.
29. Manson, J., Pugh, W. and Adve, S.V. The Java memory model. In Proceedings of the Symp. on Principles of Programming Languages, 2005.
30. Musuvathi, M. and Qadeer, S. Iterative context bounding for systematic testing of multithreaded programs. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 2007, 446–455.
31. Olszewski, M., Ansel, J. and Amarasinghe, S. Kendo: Efficient deterministic multithreading in software. In Proceedings of the Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Mar. 2009.
32. Pugh, W. The Java memory model is fatally flawed. Concurrency: Practice and Experience 12, 6 (2000), 445–455.
33. Pugh, W. and the JSR 133 Expert Group. The Java memory model (July 2009); http://www.cs.umd.edu/~pugh/java/memoryModel/.
34. Rinard, M.C. and Lam, M.S. The design, implementation, and evaluation of Jade. ACM Transactions on Programming Languages and Systems 20, 3 (1998), 483–545.
35. Sen, K. Race directed random testing of concurrent programs. In Proceedings of the Conf. on Programming Language Design and Implementation, 2008.
36. Sevcik, J. and Aspinall, D. On validity of program transformations in the Java memory model. In Proceedings of the European Conference on Object-Oriented Programming, 2008, 27–51.
37. Shasha, D. and Snir, M. Efficient and correct execution of parallel programs that share memory. ACM Transactions on Programming Languages and Systems 10, 2 (Apr. 1988), 282–312.
38. Shpeisman, T. et al. Enforcing isolation and ordering in STM. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 2007.
39. The OpenMP ARB. OpenMP application programming interface: Version 3.0, May 2008; http://www.openmp.org/mp-documents/spec30.pdf.
40. United States Department of Defense. Reference Manual for the Ada Programming Language: ANSI/MIL-STD-1815A-1983. Springer, 1983.
41. Vakilian, M. et al. Inferring method effect summaries for nested heap regions. In Proceedings of the 24th Intl. Conf. on Automated Software Engineering, 2009.

Sarita V. Adve ([email protected]) is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign.

Hans-J. Boehm ([email protected]) is a member of Hewlett-Packard's Exascale Computing Lab, Palo Alto, CA.

© 2010 ACM 0001-0782/10/0800 $10.00
