Memory Consistency Models

David Mosberger
TR 93/11

Abstract

This paper discusses memory consistency models and their influence on software in the context of parallel machines. In the first part we review previous work on memory consistency models. The second part discusses the issues that arise due to weakening memory consistency. We are especially interested in the influence that weakened consistency models have on language, compiler, and runtime system design. We conclude that tighter interaction between those parts and the memory system might improve performance considerably.

Department of Computer Science
The University of Arizona
Tucson, AZ 85721

This is an updated version of [Mos93].

1 Introduction

Traditionally, memory consistency models were of interest only to computer architects designing parallel machines. The goal was to present a model as close as possible to the model exhibited by sequential machines. The model of choice was sequential consistency (SC). Sequential consistency guarantees that the result of any execution of n processors is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program. However, this model severely restricts the set of possible optimizations. For example, in an architecture with a high-latency memory, it would be beneficial to pipeline write accesses and to use write buffers. None of these optimizations is possible with the strict SC model. Simulations have shown that weaker models allowing such optimizations could improve performance on the order of 10 to 40 percent over a strictly sequential model [GGH91, ZB92]. However, weakening the memory consistency model goes hand in hand with a change in the programming model. In general, the programming model becomes more restricted (and complicated) as the consistency model becomes weaker. That is, an architecture can employ a weaker memory model only if the software using it is prepared to deal with the new programming model. Consequently, memory consistency models are now of concern to operating system and language designers too.

We can also turn the coin around. A compiler normally considers memory accesses to be expensive and therefore tries to replace them by accesses to registers. In terms of a memory consistency model, this means that certain accesses suddenly are not observable any more. In effect, compilers implicitly generate weak memory consistency. This is possible because a compiler knows exactly (or estimates conservatively) the points where memory has to be consistent. For example, compilers typically write back register values before a function call, thus ensuring consistency. It is only natural to attempt to make this implicit weakening explicit in order to let the memory system take advantage too. In fact, it is anticipated that software could gain from a weak model to a much higher degree than hardware [GGH91] by enabling optimizations such as code scheduling or delaying updates that are not legal under SC.

In short, weaker memory consistency models can have a positive effect on the performance of parallel shared memory machines. The benefit increases as memory latency increases. In recent years, processor performance has increased significantly faster than memory system performance. In addition to that, memory latency increases as the number of processors in a system increases.

Shared memory can be implemented at the hardware or software level. In the latter case it is usually called Distributed Shared Memory (DSM). At both levels work has been done to reap the benefits of weaker models. We conjecture that in the near future most parallel machines will be based on consistency models significantly weaker than SC [LLG+92, Sit92, BZ91, CBZ91, KCZ92].

The rest of this paper is organized as follows. In section 2 we discuss issues characteristic of memory consistency models. In the following section we present several consistency models and their implications on the programming model. We then take a look at implementation options in section 4. Finally, section 5 discusses the influence of weakened memory consistency models on software. In particular, we discuss the interactions between a weakened memory system and the software using it.

2 Consistency Model Issues

Choosing an appropriate memory consistency model (MCM) is a tradeoff between minimizing memory access order constraints and the complexity of the programming model as well as the complexity of the memory model itself. The weakest possible memory is one returning for a read access some previously written value or a value that will be written in the future. Thus, the memory system could choose to return any of those values. While minimizing ordering constraints perfectly, it is not useful as a programming model. Implementing weaker models is also often more complex as it is necessary to keep track of outstanding accesses and restrict the order of execution of two accesses when necessary. It is therefore not surprising that many different MCMs have been proposed and new models are to be expected in the future. Unfortunately, there is no single hierarchy that could be used to classify the strictness of a consistency model. Below, we define the design space for consistency models. This allows us to classify the various models more easily.

Memory consistency models impose ordering restrictions on accesses depending on a number of attributes. In general, the more attributes a model distinguishes, the weaker the model is. Some attributes a model could distinguish are listed below:

• location of access
• direction of access (read, write, or both)
• value transmitted in access
• causality of access
• category of access

[Figure 1: Access Categories. A tree: a memory access is shared or private; a shared access is competing or non-competing; a competing access is synchronizing or non-synchronizing; a synchronizing access is an acquire or a release; an acquire is exclusive or non-exclusive.]

The causality attribute is a relation that tells if two accesses a1 and a2 are (potentially) causally related [Lam78] and if so, whether a1 occurred before a2 or vice versa.

The access category is a static property of accesses. A useful (but by no means the only possible) categorization is shown in Figure 1. It is an extension of the categorization used in [GLL+90]. A memory access is either shared or private. Private accesses are easy to deal with, so we don't discuss them further. Shared accesses can be divided into competing and non-competing accesses. A pair of accesses is competing if they access the same location, at least one of them is a write access, and they are not ordered. For example, accesses to shared variables within a critical section are non-competing because mutual exclusion guarantees ordering. A competing access can be either synchronizing or non-synchronizing. Synchronizing accesses are used to enforce order, for example by delaying an access until all previous accesses are performed. However, not all competing accesses are synchronizing accesses. Chaotic relaxation algorithms, for example, use competing accesses without imposing ordering constraints. Such algorithms work even if some read accesses do not return the most recent value.

Synchronizing accesses can be divided further into acquire or release accesses. An acquire is always associated with a read synchronizing access while a release is always a write synchronizing access. Atomic fetch-and-Φ operations can usually be treated as an acquiring read access followed by a non-synchronizing write access. Finally, an acquire operation can be either exclusive or non-exclusive. Multiple non-exclusive acquire accesses can be granted, but an exclusive acquire access is delayed [...]. For example, if a critical region has only read accesses to shared variables, then acquiring the lock can be done non-exclusively.

Consistency models that distinguish access categories employ different ordering constraints depending on the access category. We therefore call such models hybrid. In contrast, models that do not distinguish access categories are called uniform.

The motivation for hybrid models is engendered in Adve and Hill's definition for weak ordering [AH90]:

    Hardware is weakly ordered with respect to a synchronization model if and only if it appears sequentially consistent to all software that obey the synchronization model.

That is, as long as the synchronization model is respected, the memory system appears to be sequentially consistent. This allows the definition of almost arbitrarily weak consistency models while still presenting a reasonable programming model. All that changes is the number or severity of constraints imposed by the synchronization model.

3 Proposed Models

We now proceed to define some of the consistency models that have previously been proposed. We do not give formal definitions for the presented models as they do not help much to understand a model's implications on the programming model. More formal descriptions can be found for example in Ahamad et al. [ABJ+92] and Gharachorloo et al. [GLL+90]. We first discuss uniform models and then hybrid models. Figure 2 gives an overview of the relationships among the uniform models. An arrow from model A to B indicates that A is more strict than B. Each model is labeled with the subsection it is described in. Hybrid models are described roughly in decreasing order of strictness.

We use triplets of the form a(l)v to denote a memory access, where a is either R for read access or W for write access, l denotes the accessed location, and v the transmitted value. In our examples, a triplet identifies an access uniquely. Memory locations are assumed to have an initial value of zero.

Execution histories are presented as diagrams with one line per processor. Time corresponds to the horizontal axis and increases to the right. For write accesses, a triplet in a diagram marks the time when it was issued, while for read accesses it denotes the time of completion.
