SELECTED FOR PROC. OF THE IEEE SPECIAL ISSUE ON DISTRIBUTED SHARED-MEMORY. NOT THE FINAL VERSION.

Recent Advances in Memory Consistency Models for Hardware Shared-Memory Systems

Sarita V. Adve, Vijay S. Pai, and Parthasarathy Ranganathan

Department of Electrical and Computer Engineering
Rice University
Houston, Texas
rsim@ece.rice.edu

Abstract

The memory consistency model of a shared-memory system determines the order in which memory operations will appear to execute to the programmer. The memory consistency model for a system typically involves a tradeoff between performance and programmability. This paper provides an overview of recent advances in hardware optimizations, compiler optimizations, and programming environments relevant to memory consistency models of hardware distributed shared-memory systems.

We discuss recent hardware and compiler optimizations that exploit the observation that it is sufficient to only appear as if the ordering rules of the consistency model are obeyed. These optimizations substantially improve the performance of the strictest consistency model, making it more attractive for its programmability. Recent concurrent programming languages and environments, on the other hand, support more relaxed consistency models. We discuss several such environments, including POSIX threads, Java, and OpenMP.

I. Introduction

The memory consistency model of a shared-memory system specifies the order in which memory operations will appear to execute to the programmer. The memory consistency model affects the process of writing parallel programs and forms an integral part of the entire system design, including the architecture, the compiler, and the programming language.

Figure 1 shows a program fragment to illustrate how the memory consistency model affects the programmer. The figure shows processor P1 producing some data (Data1 and Data2) that needs to be consumed by processor P2. The variable Flag is used for synchronization: processor P1 writes the value 1 into Flag to indicate that the data is produced, and processor P2 repeatedly reads Flag until it returns the value 1, indicating the data is ready to be consumed. The program is written with the expectation that processor P2's reads of the data variables will return the new values written by processor P1. However, in many current systems, processor P2's reads may return the old values of the data variables, giving an unexpected result. The memory consistency model of a shared-memory system is a formal specification that determines what values the programmer can expect a read to return.

Uniprocessors provide a simple memory consistency model to the programmer that ensures that memory operations will appear to execute one at a time, and in the order specified by the program (program order). Thus, a read in a uniprocessor returns the value of the last write to the same location, where "last" is uniquely defined by the program order. Uniprocessor hardware and compilers, however, do not necessarily execute memory operations one at a time in program order. They may reorder and overlap memory operations; as long as data and control dependences are maintained, the system appears to obey the expected memory consistency model.

In a multiprocessor, the notion of a "last write" is not as well defined, and a more formal specification of the memory consistency model is required. The most intuitive model for multiprocessors is a natural extension of the uniprocessor model. This model, sequential consistency, requires that memory operations appear to execute one at a time in some sequential order, and that the operations of each processor occur in program order. A read in a sequentially consistent multiprocessor must return the value of the last write to the same location in the above sequential order.

Unlike in uniprocessors, simply maintaining per-processor data and control dependences is insufficient to maintain sequential consistency in a multiprocessor. For example, in Figure 1, there are no data or control dependences among the memory operations of processor P1. Thus, a uniprocessor could allow P1's write to Flag to be executed before its writes to the data variables. In this case, it is possible that P2 executes all its reads before the data writes of P1, returning the old values for the data variables and violating sequential consistency. This simple example shows that sequential consistency imposes more restrictions than simply preserving data and control dependences at each processor.

Sequential consistency can restrict several common hardware and compiler optimizations used in uniprocessors. For this reason, relaxed consistency models have been proposed that explicitly permit relaxations of some program orderings. These models allow more optimizations than sequential consistency, but present a more complex model to the programmer. Thus, choosing the memory consistency model of a multiprocessor system has typically involved a tradeoff between performance and programmability.

(This work is supported in part by IBM, Intel, the National Science Foundation under CCR and CDA grants, and the Texas Advanced Technology Program. Sarita V. Adve is also supported by an Alfred P. Sloan Research Fellowship, Vijay S. Pai by a Fannie and John Hertz Foundation Fellowship, and Parthasarathy Ranganathan by a Lodieska Stockbridge Vaughan Fellowship.)

Initially Data1 = Data2 = Flag = 0

P1                     P2
Data1 = ...;           while (Flag != 1) {;}
Data2 = ...;           register1 = Data1;
Flag = 1;              register2 = Data2;

Fig. 1. Motivation for a memory consistency model.

Model                    Program Order Relaxation
Sequential consistency   None
Processor consistency    Write -> Read
Weak ordering            Data -> Data
Release consistency      Data -> Data, Data -> Acquire,
                         Release -> Data, Release -> Acquire

Fig. 2. Relaxations of program order allowed by different memory consistency models. Only program order between memory operations to different locations is considered.

A tutorial paper by Adve and Gharachorloo gives a detailed description of the optimizations restricted by straightforward implementations of sequential consistency, as well as relaxations permitted by several relaxed models.

This paper provides an overview of recent advances in hardware optimizations, compiler optimizations, and programming languages and environments relevant to memory consistency models of hardware shared-memory systems. The advances in hardware and compiler optimizations seek to narrow the performance gap between memory consistency models, making stricter models more attractive for their programmability. On the other hand, recent concurrent programming languages and environments (e.g., POSIX threads, Java, and OpenMP) support more relaxed consistency models. We conclude with a discussion of the impact of the above advances.

II. Background

A. Hardware-centric Models

We briefly describe key aspects of the memory consistency models most commonly supported in commercial systems: sequential consistency (supported by the HP PA-8000 and MIPS R10000 processors), processor consistency (similar to Sun's total store ordering and the model supported by Intel processors), and weak ordering and release consistency (similar to Sun's relaxed memory ordering and the models supported by Digital Alpha and IBM PowerPC processors). More detailed descriptions of these models and their features can be found in the literature.

The relaxed models mentioned above have been referred to as hardware-centric because they were primarily motivated by hardware optimizations. In this paper, optimizations regarding reordering a pair of memory operations implicitly refer to operations on different locations.

Figure 2 summarizes the relaxations in program order allowed by the various memory consistency models discussed in this paper. Sequential consistency requires program order to be maintained among all operations. Processor consistency allows reordering between a write followed by a read in program order. Weak ordering requires distinguishing between data and synchronization operations: data operations can be reordered with respect to each other; however, program ordering of all operations with respect to synchronization operations must be maintained. Release consistency further categorizes synchronization operations into acquires and releases. Release consistency does not enforce ordering between two data operations, between a data operation and a subsequent acquire, or between a release and a subsequent data operation. In this paper, we use the term release consistency to refer to the RCpc model, which also does not enforce ordering between a release and a subsequent acquire. The distinctions between the various categories of memory operations can be achieved in several ways. Current processors support a conservative method through fence (or memory barrier) instructions; typically, all program orders with respect to a fence or memory barrier instruction are enforced.

The examples in Figure 3 further illustrate the differences among the models from the programmer's viewpoint. Part (a) repeats the example in Figure 1. With sequential consistency, since all memory operations must appear to occur in program order, it follows that P2's reads of Data1 and Data2 do not occur until P1's writes to the data locations complete. Therefore, P2's reads of the data locations will return the newly written values. Processor consistency ensures P1's writes will occur in program order and P2's reads will occur in program order; therefore, P2's reads must again return the newly written values. Weak ordering and release consistency, however, do not impose any such restrictions. Therefore, with these models, P1's writes to Data1 and Data2 may occur after P1's write to Flag, and after P2's reads of Data1 and Data2. Consequently, P2's reads of the data locations may return the old values, giving unexpected results. With a weakly ordered or release consistent system, P2's reads can be made to return the new values if the programmer explicitly identifies all accesses to Flag as synchronization operations.

Figure 3(b) illustrates a simplified version of Dekker's algorithm for implementing critical sections. Sequential consistency prohibits the reads of both Flag1 and Flag2 from returning 0 in the same execution. Thus, with sequential consistency, at most one processor will enter the critical section. With processor consistency, however, the reads may execute before the writes of their respective processors (e.g., if the write is sent to a write buffer and the read is allowed to bypass the buffer). Therefore, both the reads of Flag1 and Flag2 may return 0, and both processors may enter the critical section, an undesirable result. With a weakly ordered or release consistent implementation also, both reads may return 0. To prohibit both processors from returning the value 0, the programmer must identify all operations to Flag1 and Flag2 as synchronization operations with weak ordering and release consistency, and use read-modify-writes with processor consistency.

Initially Data1 = Data2 = Flag = 0         Initially Flag1 = Flag2 = 0

P1                 P2                      P1                   P2
Data1 = ...;       while (Flag != 1) {;}   Flag1 = 1;           Flag2 = 1;
Data2 = ...;       register1 = Data1;      register1 = Flag2;   register2 = Flag1;
Flag = 1;          register2 = Data2;      if (register1 == 0)  if (register2 == 0)
                                             critical section     critical section

Allowed results (a):                       Allowed results (b):
SC, PC: register1 and register2            SC: both register1 and register2
  return the new values.                     cannot be 0.
WO, RC: register1 and register2            PC, WO, RC: register1 = 0 or 1,
  may return either the old or               register2 = 0 or 1.
  the new values.

Fig. 3. Illustration of memory consistency models (SC = sequential consistency, PC = processor consistency, WO = weak ordering, RC = release consistency).

The choice of the consistency model may limit the use of hardware and compiler optimizations that improve performance by overlapping, reordering, or eliminating memory operations (e.g., write buffers, lockup-free caches, out-of-order execution, register allocation, common subexpression elimination, compile-time code motion, and loop transformations). In particular, reordering any pair of memory operations can potentially lead to a violation of sequential consistency. Weak ordering and release consistency allow the most optimizations, since they allow reordering of all data operations between consecutive synchronization operations. However, these models are harder to reason with, since they require programmers to deal with nonintuitive hardware and compiler optimizations. Thus, the choice of the memory consistency model of a system has typically involved a tradeoff between programmability and performance.

B. Programmer-centric Framework

The programmer-centric framework for specifying memory consistency models seeks to alleviate the tradeoff between programmability and performance seen by hardware-centric models. The framework is based on the hypothesis that programmers should not be required to reason about non-sequentially-consistent systems. The framework guarantees sequential consistency if the programmer provides some information about the program or obeys certain constraints. The information and constraints relate only to the behavior of the program on sequentially consistent systems. The knowledge of this behavior is used by the system to determine optimizations that will not violate sequential consistency for that program.

The data-race-free and properly-labeled models illustrate the programmer-centric framework. These models seek to provide the optimizations of the weak ordering and release consistency hardware-centric models, but without requiring the programmer to reason about those optimizations. The models provide sequential consistency to data-race-free programs, which are defined as follows.

Call two memory operations conflicting if they are from different processors, they access the same location, and at least one of them is a write. Assume the system provides a mechanism for distinguishing memory operations as data or synchronization.

A program is data-race-free if, in every sequentially consistent execution of the program, any two conflicting memory operations are either separated by synchronization operations or are both distinguished as synchronization operations. Alternatively, if two conflicting operations occur one after another without any intervening memory operations, then the programmer should distinguish them as synchronization operations (also called race operations); other operations may be distinguished as either data or synchronization.

For example, in Figure 3(a), the operations accessing Data1 and Data2 are always separated by the operations on Flag in any sequentially consistent execution. Therefore, the operations on Data1 and Data2 may be distinguished as data or synchronization. However, the write of Flag and the final read of Flag are not separated by any other operations, and must be distinguished as synchronization.

The optimizations of weak ordering and release consistency can be applied to data-race-free programs without violating sequential consistency. The mechanism for distinguishing operations as data or synchronization may vary based on the programming language support (see Section V).

III. Hardware Advances

Straightforward hardware implementations of consistency models directly enforce their ordering constraints by prohibiting a memory operation O from entering the memory system until the completion of all previous operations for which O must appear to wait. For example, a straightforward implementation of sequential consistency will not issue a memory operation until the preceding memory operation of its processor is complete. However, certain features in recent processors, described below, permit more optimized implementations of consistency models.

Out-of-order scheduling: A processor with out-of-order scheduling simultaneously examines several consecutive instructions within an instruction window. The processor issues an instruction from the instruction window to its functional units or the memory hierarchy once the instruction's data dependences are resolved, even if previous instructions are still blocked. To maintain precise exceptions, however, instructions are retired from the instruction window, and all changes to architectural state (e.g., registers and memory) are made, in program order.

Nonblocking loads: Many current processors do not block on a load, but can continue executing independent instructions, including other loads, while one or more loads (including cache misses) are outstanding. To maintain precise interrupts, a load does not leave the instruction window until it returns a value. Therefore, the effectiveness of this technique is limited by the instruction window size, which determines the maximum number of instructions that will be overlapped with the load.

Speculation: Current processors support speculative execution in several forms, the most common of which is branch prediction. Instructions after the speculation point (e.g., a branch) continue to be decoded, issued, and executed as ordinary instructions. However, these instructions are not allowed to commit their values into the architectural state of the processor until all prior speculations have been resolved. If any speculation is determined to be incorrect (e.g., a mispredicted branch), the execution rolls back to the state at the point of the mispredicted speculation, and all later speculated instructions are squashed.

Prefetching: Many processors provide support for nonbinding prefetch requests for cache lines that are likely to be accessed in the future. Such a request is expected to bring the accessed line into the processor's cache before the processor issues the actual demand access, thereby overlapping the memory latency with other work. Prefetch requests can be initiated either by software, through explicit prefetch instructions, or by hardware, using runtime prediction.

We next describe optimizations that use the above features to improve the performance of consistency models and are supported in current commercial systems. These optimizations exploit the observation that it is sufficient to only appear as if the ordering rules of the consistency model are obeyed. Each of these techniques assumes a hardware cache-coherent system.

A. Hardware Prefetching from the Instruction Window

In straightforward implementations, a processor's instruction window may contain several decoded memory instructions that are not issued to the memory system due to consistency constraints. For example, a decoded load is not issued in a sequentially consistent system until the previous memory operation of the processor completes. The processor can issue nonbinding prefetches for such instructions without violating the consistency model, thereby hiding some memory latency.

The latency that can be hidden using this technique is limited by the instruction window size. Additionally, prefetches that are issued too early may sometimes degrade performance by fetching data before it is ready. Such a degradation may result from extra network traffic, and from extra memory latency seen at processors that lose their cache lines prematurely to early prefetches.

B. Speculative Load Execution

The hardware prefetching technique described in the previous section does not allow a load to consume its value until the completion of preceding memory operations ordered by the consistency model, even if the load's value is already in the cache. Speculative load execution extends the benefits of prefetching by speculatively consuming the value of loads brought into the cache, regardless of consistency constraints. If the accessed location does not change its value until the load could have been nonspeculatively issued, then the speculation is successful. However, if the location does change its value, then the processor rolls back its execution to the incorrect load. The rollback can be achieved using the same mechanisms as used for flushing incorrectly executed instructions after a branch misspeculation or an exception.

To detect whether a value speculatively consumed by a load changes before the load is allowed to issue nonspeculatively, the processor exploits the coherence mechanisms of hardware cache-coherent multiprocessors. In these systems, a change of a cache line by another processor will trigger an external coherence action (e.g., invalidate or update) to other caches holding the data, as long as the data remains in the cache. Thus, the processor monitors coherence requests to, and replacements from, its caches; either action to a cache line with speculatively consumed data triggers a rollback.

As with nonblocking loads in general, the latency tolerance potential of speculative load execution is limited by the size of the instruction window. In addition, consuming speculative values too early can result in increased rollbacks, potentially degrading performance.

Speculative load execution and hardware store prefetching from the instruction window are supported by the HP PA-8000, Intel Pentium Pro, and MIPS R10000 processors.

C. Cross-Window Prefetching

Prefetches may also be issued for instructions that are not currently in the instruction window but are expected to be executed in the future. Either the compiler may insert explicit software prefetch instructions in the program, or the hardware may issue such requests at runtime. Although this technique is applicable even in uniprocessor systems, the use of cross-window prefetching to hide latency may also impact the relative performance of consistency models. This technique alleviates the limitations imposed by a small instruction window size on the hardware prefetching technique described in Section III-A, but requires the compiler or hardware to predict future accesses. Most current processors support a software prefetch instruction, and there has been significant compiler work on effective insertion of such prefetches. Several hardware techniques for predicting future data accesses have also been proposed, but are not yet commonly implemented.

D. Simulation Studies

There have been several simulation studies that analyze the performance differences between different memory consistency models. Earlier studies analyzed straightforward implementations of consistency models, and the optimization of hardware prefetching from the instruction window, with single-issue, in-order processors. More recent studies have examined the optimized implementations discussed in this section for multiprocessors built from state-of-the-art processors. These processors aggressively exploit instruction-level parallelism (ILP) with features such as multiple instruction issue, out-of-order scheduling, nonblocking reads, and speculative execution. This section summarizes results found in the latter studies.

The quantitative data reported here, however, are based on a new set of simulations, with a uniform set of system parameters for all experiments, a more recent and aggressive compiler, and a more aggressive cache coherence protocol (MESI versus MSI).

Simulation framework: We report results for five scientific applications from the Stanford SPLASH and SPLASH-2 suites on a simulated hardware cache-coherent shared-memory multiprocessor system. The simulations are performed with RSIM, the Rice Simulator for ILP Multiprocessors, a detailed execution-driven simulator. RSIM models an out-of-order superscalar processor pipeline, a two-level cache hierarchy, a split-transaction bus on each processor node, and an aggressive memory and multiprocessor interconnection network subsystem, including contention at all resources. The modeled systems implement an invalidation-based, four-state MESI directory cache coherence protocol. Figure 4 summarizes the values of the parameters for the base systems simulated. The cache sizes are chosen following the working-set characterizations of Woo et al. We also performed experiments that double and quadruple all the miss latency components in the system; our results hold qualitatively even in this configuration.

[Table omitted: ILP processor parameters (speed, fetch/retire rate, instruction window and memory queue sizes, functional units) and cache, memory, bus, and network parameters.]
Fig. 4. Base system parameters.

The applications, their input sizes, and the number of processors in the system for each application are summarized in Figure 5. A larger processor count is simulated for the applications that scale well (FFT and Water), while a smaller system is simulated for LU, MP3D, and Radix.

[Table omitted: input sizes and processor counts for FFT, LU, MP3D, Radix, and Water.]
Fig. 5. Applications, input sizes, and system sizes.

Hardware prefetching from the instruction window and speculative load execution: Figure 6 shows, for each application, the execution time for the straightforward implementations of sequential consistency (SC), processor consistency (PC), and release consistency (RC), normalized to the time for sequential consistency. We do not consider weak ordering separately, as our applications see very little synchronization overhead; therefore, we expect few differences between weak ordering and release consistency. Figure 7 shows, for each application, the execution times for the best implementation of the three consistency models, normalized to the time for the straightforward implementation of sequential consistency in Figure 6. Comparing the corresponding bars of Figures 6 and 7 shows the impact of the hardware optimizations.

The optimizations of hardware prefetching from the instruction window and speculative load execution provide significant benefits for the more constrained consistency models (sequential consistency and processor consistency), but do not significantly impact release consistency. The optimizations greatly narrow the performance gap between the various models for all applications. Nevertheless, processor consistency and release consistency continue to show sizeable performance benefits compared to sequential consistency for two of our five applications: in all of the configurations studied, release consistency shows a substantial reduction in execution time for Radix and MP3D. The differences in performance between the various models seen in Figure 7 stem primarily from limited hardware resources (mainly the instruction window), which limit the extent to which the optimizations can be exploited, and from the negative effects of early store prefetches, which lead to additional exposed latencies in the stricter models.

Increasing the instruction window size: The instruction window size largely determines the effectiveness of the optimizations of hardware prefetching from the instruction window and speculative load execution. Increasing the size of the instruction window and memory queue generally narrows the remaining performance gap between memory consistency models. For example, doubling the instruction window and memory queue sizes reduces the difference in execution time between the best SC and RC versions for MP3D and Radix, compared to the base configuration. However, in a few cases, larger increases in the instruction window and memory queue sizes lead to performance degradations in sequential consistency and processor consistency, widening the performance gap with release consistency. This degradation occurs because fundamental limitations of hardware prefetching and speculative loads, caused by the negative effects of early prefetches and of rollbacks with speculative loads, were exposed or exacerbated with larger instruction window sizes.

Cross-window software prefetching: Software prefetches were inserted in the applications by hand, following a state-of-the-art prefetching algorithm.

SELECTED FOR PROC OF THE IEEE SPECIAL ISSUE ON DISTRIBUTED SHAREDMEMORY NOT THE FINAL VERSION

100 100 100 100 100 100 100 | 82 79 79 79 77 80 | 67 68 58 57 60 | 54 44 43

40 |

20 | Normalized execution time

0 || SC PC RC SC PC RC SC PC RC SC PC RC SC PC RC SC PC RC

FFT LU MP3D Radix Water Mean

Fig Performance of straightforward implementations of memory consistency mo dels

100 |

80 | 76 73 68 66 63 63 60 58 60 60

| 57 56 60 53 47 45 43 43 43

40 |

20 | Normalized execution time

0 || SC PC RC SC PC RC SC PC RC SC PC RC SC PC RC SC PC RC

FFT LU MP3D Radix Water Mean

Fig Impact of hardware prefetching and sp eculative execution on consistency mo dels normalized to straightforward SC

rithm Software prefetching improves the p erformance of writes and thus generally have a greater inuence on

of all the consistency mo dels with and without the opti consistency mo dels where write latencies are not already

mizations of hardware prefetching from the instruction win completely overlapped SC and PC RC is less sensitive

dow and sp eculative load execution However MPD to these choices This pap er uses the b est practical con

and Radix continue to see signicant improvements with gurations for all mo dels ie writeback caches and MESI

…processor consistency and release consistency compared to sequential consistency; more than … reduction in execution time with release consistency. Limitations of software prefetching on multiprocessors with aggressive processors (due to late or early prefetches or resource contention) and memory latencies that are not amenable to prefetching (due to limitations of the software prefetching algorithm and contention at processor resources) are responsible for the remaining gap between the consistency models.

E. Other Optimizations

We have focused on optimizations implemented in current commercial systems at the processor level. Several other processor-centric optimizations have also been proposed and evaluated in the literature for hardware shared-memory systems. These include techniques to overlap both data latency (e.g., multithreading) and synchronization latency (e.g., multithreading and fuzzy and selective acquires). The simulation studies described above found that data latency is far more dominant than synchronization latency for the applications and systems studied; therefore, future techniques that target data latency appear more likely to benefit performance for this class of systems and applications.

System design decisions below the processor level also influence the performance of memory consistency models. For example, the cache write policy and cache coherence protocol can impact the relative performance of consistency implementations; these policies affect the overheads the implementations incur.

Finally, even if a processor supports a strict consistency model, the underlying system design decisions may result in a more relaxed model. For example, even if the processor supports sequential consistency, the memory controller may acknowledge a data store to the processor before it is actually globally visible. The net result is a system that is more aggressive than the consistency model supported by the processor, but somewhat more conservative than one in which the processor supports a relaxed consistency model. The Convex SPP system is an example of a commercial system that includes a weak ordering mode even though its processors (HP PA-RISC) support sequential consistency. The Stanford FLASH system also uses a similar approach.

F. Summary

In summary, recent hardware optimizations result in a significant narrowing of the performance gap between consistency models, virtually eliminating the gap for three of the five applications studied on our base system. Nevertheless, processor consistency and release consistency show significant performance benefits over sequential consistency for two of our applications on all the system configurations we studied. Thus, for the class of applications and systems studied here, the choice of the consistency model for future systems will depend on the importance of the remaining hardware performance gap to system vendors, as well as the impact of relaxed consistency models on compiler optimizations and programming language support for relaxed consistency models.
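To make the ordering at issue concrete, the recurring flag interaction (one processor writes two data locations and then sets a flag; another spins on the flag and reads the data) can be sketched with C11 release/acquire atomics. This is our own illustration of the ordering a release-consistent system must preserve; the systems evaluated in this paper enforce it in hardware rather than through C11, and the function names are ours.

```c
/*
 * Sketch: the flag interaction under release/acquire ordering, written
 * with C11 atomics and POSIX threads. Illustration only; the paper's
 * systems provide these orderings at the hardware level.
 */
#include <pthread.h>
#include <stdatomic.h>

static int data1, data2;        /* ordinary shared data */
static atomic_int flag;         /* synchronization flag */

static void *producer(void *arg) {
    (void)arg;
    data1 = 1;                  /* these two writes may be reordered */
    data2 = 2;                  /* with respect to each other ...    */
    /* ... but not past the releasing write of the flag */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

/* Spin until the flag is set, then read the data. */
static int consume(void) {
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                       /* acquire: later reads see producer's writes */
    return data1 + data2;
}

/* Runs one producer/consumer round; always returns 3. */
int run_flag_example(void) {
    pthread_t t;
    data1 = data2 = 0;
    atomic_store(&flag, 0);
    pthread_create(&t, NULL, producer, NULL);
    int sum = consume();
    pthread_join(&t, NULL);
    return sum;
}
```

The two data writes may be freely reordered with each other, which is exactly the flexibility release consistency exploits, yet the acquire on the flag still guarantees the consumer observes both new values.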


The rest of the paper discusses the latter two issues.

[Fig. …: Execution graph corresponding to Figure …(a). P1 performs Write, Data1; Write, Data2; Write, Flag and P2 performs Read, Flag; Read, Data1; Read, Data2, with po edges down each processor's column and co edges from each write to the corresponding read.]

IV. Compiler Advances

Sequential consistency and processor consistency restrict the direct application of several uniprocessor compile-time optimizations that may involve reordering or eliminating memory operations, including register allocation. In their pioneering work, Shasha and Snir developed a compile-time analysis to identify memory operations that can be reordered without violating sequential consistency.

To understand the analysis by Shasha and Snir, consider an execution of a parallel program represented as a graph with the following properties. The vertices of the graph are the memory operations in the execution. There is an edge from operation A to operation B in the graph if either (i) A precedes B in program order, or (ii) A precedes B in the execution and A and B conflict. (Recall that two operations conflict if they are from different processors, they access the same location, and at least one is a write.) Call the two categories of edges program order (po) and conflict order (co) edges, respectively, and call the graph the execution graph. Figure … illustrates an execution graph for an execution of the program in Figure … in which the data reads by processor P2 return the new values written by processor P1.

Shasha and Snir observed that if the execution graph is acyclic, then the execution appears sequentially consistent, since all memory operations can be totally ordered in an order consistent with program order. For example, the execution graph in Figure … is acyclic, and the depicted execution is sequentially consistent.

To ensure the execution graph is acyclic, it is sufficient to ensure that if there is a program order edge from A to B on a potential cycle in the graph, then A and B are not reordered. Operations on all other program order edges can be reordered without violating sequential consistency. For the program in Figure …, potential cycles in the execution graph are:

(1) Write Data1 -(po)-> Write Flag -(co)-> Read Flag -(po)-> Read Data1 -(co)-> Write Data1
(2) Write Data2 -(po)-> Write Flag -(co)-> Read Flag -(po)-> Read Data2 -(co)-> Write Data2
(3) Write Data1 -(po)-> Write Data2 -(po)-> Write Flag -(co)-> Read Flag -(po)-> Read Data1 -(co)-> Write Data1
(4) Write Data2 -(po)-> Write Flag -(co)-> Read Flag -(po)-> Read Data1 -(po)-> Read Data2 -(co)-> Write Data2

With the analysis so far, all program order edges shown in Figure … could be on a potential cycle, prohibiting meaningful optimizations. Shasha and Snir further formalized a minimal set of cycles, called critical cycles, and showed that it is sufficient to consider only program order edges on critical cycles. Operations that are not ordered by such critical program order edges can be reordered without violating sequential consistency. Applying their formalization (not discussed here further due to lack of space), only the first two cycles mentioned above for Figure … are critical. Therefore, program ordering needs to be maintained only with respect to accesses to Flag; the accesses to the data locations can be reordered with respect to each other.

At compile time, it is difficult to predict or analyze all executions and their execution graphs. Therefore, Shasha and Snir apply their analysis to a graph where the vertices are the static memory operations in the program, and a bidirectional conflict order edge is introduced for every pair of static memory operations that could conflict in any execution.

More recently, Krishnamurthy and Yelick showed that Shasha and Snir's algorithm for detecting critical cycles has exponential complexity in the number of processors. They developed an algorithm for SPMD programs with polynomial complexity. They also evaluated the effect of certain optimizations using their algorithm on array-based Split-C programs run on a CM-5 message-passing multiprocessor. They found reductions in execution time of … to …. They further improved their algorithm to reduce the impact of bidirectional conflict order edges, using some information from the programmer. This improvement reduced the number of critical cycles for which program order needs to be enforced, and gave additional reductions in execution time of … to … for Split-C programs on a CM-5 multiprocessor.

It is not yet known how effective the above analyses and optimizations are for general programs on native shared-memory machines, or how they compare to compile-time optimizations enabled by a relaxed consistency model such as release consistency.

V. Programming Languages

Until recently, most memory consistency models were developed primarily by computer architects for the hardware interface. Many common high-level programming languages did not include standard support for explicitly parallel shared-memory programs and did not explicitly consider the issue of the memory consistency model. Programmers typically relied on parallelizing compilers or used nonstandard, vendor-specific extensions to generate shared-memory parallelism and synchronization. Recent languages and programming environments, however, deal with shared-memory parallelism and the issue of memory consistency models more explicitly. Below, we discuss commonly used languages and environments that are currently supported by multiple system vendors, and conclude with a discussion on the implications for the compiler.
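Returning briefly to the analysis of Section IV, the potential-cycle test is mechanical enough to sketch. The encoding below is our own (node names and representation are not from the paper): it builds the static graph for the flag example, with transitive po edges per processor and bidirectional co edges between conflicting operations, and asks whether an edge can lie on a cycle. As noted above, without the critical-cycle refinement every po edge of the example qualifies.

```c
/*
 * Sketch: can an edge of the execution graph (po edges plus bidirectional
 * co edges) lie on a potential cycle? Our own encoding of the flag example
 * from the Shasha-Snir analysis; names are not from the paper.
 */
#include <string.h>

enum { WD1, WD2, WF, RF, RD1, RD2, N };  /* static ops of P1 and P2 */

static int adj[N][N];

static void add_po(int a, int b) { adj[a][b] = 1; }
static void add_co(int a, int b) { adj[a][b] = adj[b][a] = 1; } /* bidirectional */

/* Depth-first search: is there a path from s to t? */
static int reach(int s, int t, int *seen) {
    if (s == t) return 1;
    seen[s] = 1;
    for (int k = 0; k < N; k++)
        if (adj[s][k] && !seen[k] && reach(k, t, seen)) return 1;
    return 0;
}

/* An edge (a,b) can be on a cycle iff some path leads back from b to a. */
int edge_on_cycle(int a, int b) {
    int seen[N];
    memset(seen, 0, sizeof seen);
    return adj[a][b] && reach(b, a, seen);
}

void build_flag_example(void) {
    memset(adj, 0, sizeof adj);
    /* P1: Write Data1; Write Data2; Write Flag (program order, transitive) */
    add_po(WD1, WD2); add_po(WD2, WF); add_po(WD1, WF);
    /* P2: Read Flag; Read Data1; Read Data2 */
    add_po(RF, RD1); add_po(RD1, RD2); add_po(RF, RD2);
    /* Conflict order: conflicting pairs, direction unknown at compile time */
    add_co(WD1, RD1); add_co(WD2, RD2); add_co(WF, RF);
}
```

Running the check over the example confirms the text: every po edge (including Write Data1 to Write Data2 and Read Data1 to Read Data2) lies on some potential cycle, which is why the refinement to critical cycles is needed before any reordering is permitted.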


A. POSIX Threads

POSIX is an IEEE standard that includes a threads interface for the C language. It specifies several synchronization functions (e.g., functions to implement mutual exclusion locks and condition variables). For memory consistency, POSIX requires applications to use the provided synchronization functions to separate conflicting accesses to the same memory location. Such applications can rely on sequentially consistent results. Applications that synchronize in other ways (as in the examples in Figure …) or that include races among user data structures (e.g., asynchronous algorithms) may get unexpected results.

The POSIX threads model is like the data-race-free/properly-labeled model described in Section II-B in that it requires synchronization to be explicit to ensure reliable results. However, it does not provide the full flexibility of the data-race-free/properly-labeled model, since it restricts synchronization to only the provided synchronization functions. The provided synchronization functions may be overkill for certain cases (e.g., Figure …(a)), potentially resulting in a loss of performance. The data-race-free/properly-labeled model conceptually permits any memory operation (including races in asynchronous programs) to be distinguished as synchronization.

B. The volatile Declaration

The ANSI C, C++, and Java languages support the volatile declaration to suppress certain optimizations. A key motivation for this declaration was to inhibit optimizations in codes such as device drivers and interrupt handlers. The declaration is often also used for enforcing data consistency in shared-memory programs.

The Java language specification provides the most precise specification for the semantics of the volatile declaration. Every access to a volatile variable must result in a memory access; i.e., values of such variables cannot be cached in registers, and accesses to such variables cannot be eliminated through optimizations such as common subexpression elimination. Furthermore, program-ordered accesses to volatile variables cannot be reordered with respect to each other. There is no restriction on the ordering between an access to a volatile variable and an access to a non-volatile variable. For example, for the program in Figure …(a), the variables Data1, Data2, and Flag must all be declared volatile to ensure that the reads of Data1 and Data2 will return the new values written in the program. This, however, unnecessarily precludes optimizations on accesses to Data1 and Data2.

The above example illustrates that the current specification is not restrictive enough to use volatile as the sole mechanism for enforcing consistency while enabling common optimizations. An additional constraint of preserving program order between non-volatile and volatile accesses would have provided a model similar to weak ordering and release consistency. Java provides another mechanism to overcome this deficiency, as discussed next.

C. The Java Programming Language

This section discusses the high-level Java programming language, as opposed to the Java virtual machine. The formal memory consistency model for Java is defined as a set of rules on an abstract representation of the system. The abstraction consists of a main memory containing the master copy of all variables, and a working memory per thread. A thread's instructions result in use and/or assign operations on the values in its working memory. The use and assign operations, and the lock and unlock operations described below, occur in program order. The underlying implementation transfers data between the working and main memories, subject to several ordering rules. The current Java model is complex and hard to interpret; only the key ordering rules are informally described below.

A Java program must use the volatile or synchronized keywords to enforce memory ordering. Volatile is common to C, C++, and Java and is discussed in Section V-B. The synchronized keyword results in a lock access on the associated object before executing the corresponding statement or method, and an unlock access afterward. The key ordering rules are: (i) a lock access must appear to flush each variable V from its thread's working memory before the thread's next use of V, unless the thread makes an assignment to V between the lock and the next use of V; (ii) before an unlock access, all values previously assigned by that thread must be copied to main memory. Thus, a thread's load following a lock will either get a value assigned by that thread after the lock, or a value that is at least as recent as in the master copy at the time of the lock. After an unlock, all the values from the thread's preceding assignments will be in the master copy.

Referring to Figure …(a), the synchronized keyword need be applied only to the accesses to Flag to ensure that the reads of the data locations will return the correct values. In contrast to the use of the volatile declaration discussed in Section V-B, the use of synchronized allows the writes and reads of Data1 and Data2 to be reordered with respect to each other without affecting the result of the execution. To prevent excessive spinning on locks, Java also provides wait and notify primitives for producer-consumer synchronization. Nevertheless, these primitives also require lock accesses for effective use, which appears to be overkill for interactions such as in Figure …(a).

The ordering rules for Java are similar to those for release consistency. (There are two subtle differences between Java and release consistency. First, in a Java thread, the presence of a lock between a write and a read to the same location requires that the write appear to be serialized at main memory before the read, unless there is another write to the same location between the lock and the read. Release consistency does not require such a serialization. Second, unlike release consistency, Java does not allow program-ordered reads to the same location to be reordered. As mentioned earlier, this paper does not discuss ordering constraints between memory operations to the same location.) Java is also accompanied by a programming style recommendation similar to the data-race-free/properly-labeled model. It states that if a variable is ever to be assigned by one thread and used or assigned by another, then all accesses to that variable should be enclosed in synchronized methods or synchronized statements.


D. OpenMP

OpenMP is a recently proposed application programming interface for portable shared-memory programs. It provides relatively high-level directives to specify parallel tasks or loops, and various synchronization constructs (e.g., critical sections, atomic updates, and barriers). For memory consistency, it supports a FLUSH directive that must be used to ensure all preceding writes of a processor are seen by the memory system and subsequent reads return new values. A FLUSH is also implicitly associated with many of the synchronization constructs. The description of FLUSH in the current specification is fairly informal (e.g., it does not explicitly state the impact of a FLUSH on preceding reads or subsequent writes), but it appears to provide a model similar to weak ordering and release consistency, and has semantics similar to memory barrier or fence instructions supported by most current processors. Thus, to obtain the expected result for the example in Figure …(a), the application needs to simply include FLUSHes with the accesses to Flag. Additionally, OpenMP allows a FLUSH to be explicitly applied to only a specified list of variables.

E. Message Passing Interface (MPI)

The issue of the memory consistency model is generally not relevant to traditional message-passing applications, because synchronization is implicit with every data transfer message. The new Message Passing Interface (MPI-2) standard, however, also includes support for one-sided communication, which has similarities with shared memory and must consider the consistency model. We briefly discuss relevant aspects of MPI below.

The key primitives for one-sided communication are get and put. These allow a processor to load or store a data buffer from a remote processor's memory without any corresponding send or receive by the remote processor's program. Special synchronization mechanisms are provided to ensure that the loads return, and the stores deposit, appropriate values at the appropriate time. Ordering rules analogous to release consistency are enforced; e.g., the user cannot rely on the completion of gets and puts until appropriate synchronization has occurred.

F. High Performance Fortran (HPF)

The High Performance Fortran (HPF) language provides various constructs to expose data parallelism to the system, but provides sequential semantics to the programmer. Each HPF data parallel construct (including Fortran array statements, the FORALL statement, and HPF intrinsics) has equivalent sequential semantics. Therefore, HPF programmers need not be concerned with the traditional notion of a shared-memory consistency model. One distinct aspect of the sequential model provided by HPF is that the data parallel array statements and the FORALL statement have copy-in/copy-out semantics, where all locations on the right-hand side of an assignment are read before any locations on the left-hand side are assigned. This provides straightforward deterministic sequential semantics, even for a statement that may appear to have dependences among its various computations.

HPF also provides EXTRINSIC procedures to invoke other parallelism paradigms not directly supported in HPF (e.g., explicitly parallel code for branch-and-bound parallelism). Such a procedure may have an independent memory consistency model; the language requires all processors to appear to enter and exit such a procedure together.

G. Implications for the Compiler

The consistency model supported by the programming language influences the optimizations a compiler can perform. From the compiler's viewpoint, most recent programming environments support models similar to data-race-free/properly-labeled, allowing significant flexibility. For example, with POSIX, Java, and OpenMP, the compiler can reorder any pair of operations to different locations between two synchronization/volatile/FLUSH operations, as well as allocate memory in registers in this interval.

In addition, the compiler has the responsibility to ensure that the parallel program it generates runs correctly for the memory consistency model supported by the hardware. This is fairly straightforward for the languages and environments discussed in this section and for current commercial systems. For example, most current processors and systems support memory barriers or fences. For such systems, the special synchronization constructs in the above environments must include fences at all points in the construct where a memory operation may be involved in a race. Typically, these fences would be inserted by library writers for the synchronization constructs, and the compiler simply must ensure that it maintains these fences and their orderings with respect to other memory operations in the final program. Similarly, the FLUSH directive of OpenMP must be replaced by a hardware fence. For HPF, the parallelizing compiler itself inserts synchronization, and must insert hardware fences at these points. A more comprehensive description of how to port data-race-free/properly-labeled programs to common hardware memory consistency models appears in ….

VI. Concluding Remarks

This paper provides an overview of recent advances in memory consistency models, covering hardware, compilers, and programming environments. Recent hardware optimizations have significantly narrowed the hardware performance gap between various consistency models, virtually eliminating the gap for three of the five applications studied on our base system. Nevertheless, processor consistency and release consistency show significant benefits over sequential consistency for two of our applications on all the system configurations we studied.


Compiler optimizations relevant to consistency models have not been as extensively studied as hardware optimizations. Recent work has shown that it is possible to implement reordering optimizations in the compiler without violating sequential consistency. However, these optimizations have been evaluated for restricted programs and systems, and have not been compared with the benefits of relaxed models. The importance of relaxed models to compilers remains one of the key open questions in this area.

Recent programming languages and environments for explicitly parallel shared-memory programs (e.g., POSIX threads, Java, and OpenMP) support relaxed consistency models. For many programs, some of these languages (e.g., POSIX threads and Java) require that, for expected results, system-provided synchronization constructs be used to prevent data races. While such an approach encourages good programming practice, the provided constructs may be overkill or inappropriate for some cases, leading to a performance loss compared to more flexible approaches. OpenMP provides more flexibility, allowing synchronization through arbitrary reads and writes; data consistency is achieved by using the FLUSH directive at the appropriate synchronization points in the program.

High-level languages with relaxed models eliminate the programmability advantage of supporting sequential consistency in the compiler or hardware, from the perspective of programmers of such languages. However, hardware and compiler designers must consider the programmability/performance tradeoff for programmers of other languages (either high-level or assembly level) before deciding the consistency model. This tradeoff will depend on results of future studies on compiler optimizations and the importance of the remaining hardware performance gap to system vendors. Finally, the memory consistency model must be chosen to last beyond current implementations, since it is an architectural specification visible to the programmer.

VII. Acknowledgments

We are grateful to Bill Pugh for correcting the description of the Java memory model in an earlier version of this paper. We thank Vikram Adve and Kourosh Gharachorloo for valuable comments on earlier versions of this paper. We also thank Vikram Adve and Rohit Chandra for clarifications on HPF and OpenMP, respectively.

References

[1] L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Trans. on Computers.
[2] S. V. Adve and K. Gharachorloo, "Shared Memory Consistency Models: A Tutorial," IEEE Computer.
[3] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," in Proc. Intl. Symp. on Computer Architecture.
[4] M. Dubois, C. Scheurich, and F. A. Briggs, "Memory Access Buffering in Multiprocessors," in Proc. Intl. Symp. on Computer Architecture.
[5] S. V. Adve and M. D. Hill, "Weak Ordering - A New Definition," in Proc. Intl. Symp. on Computer Architecture.
[6] D. Shasha and M. Snir, "Efficient and Correct Execution of Parallel Programs that Share Memory," ACM Trans. on Programming Languages and Systems.
[7] J. E. Smith and A. R. Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors," in Proc. Intl. Symp. on Computer Architecture.
[8] K. Gharachorloo, A. Gupta, and J. Hennessy, "Two Techniques to Enhance the Performance of Memory Consistency Models," in Proc. Intl. Conf. on Parallel Processing.
[9] T. Mowry, Tolerating Latency through Software-Controlled Data Prefetching, PhD thesis, Stanford University.
[10] T.-F. Chen and J.-L. Baer, "Effective Hardware-Based Data Prefetching for High-Performance Processors," IEEE Transactions on Computers.
[11] K. Gharachorloo, A. Gupta, and J. Hennessy, "Performance Evaluation of Memory Consistency Models for Shared-Memory Multiprocessors," in Proc. Intl. Conf. on Architectural Support for Programming Languages and Operating Systems.
[12] K. Gharachorloo, A. Gupta, and J. Hennessy, "Hiding Memory Latency Using Dynamic Scheduling in Shared-Memory Multiprocessors," in Proc. Intl. Symp. on Computer Architecture.
[13] R. N. Zucker and J.-L. Baer, "A Performance Study of Memory Consistency Models," in Proc. Intl. Symp. on Computer Architecture.
[14] V. S. Pai, P. Ranganathan, S. V. Adve, and T. Harton, "An Evaluation of Memory Consistency Models for Shared-Memory Systems with ILP Processors," in Proc. Intl. Conf. on Architectural Support for Programming Languages and Operating Systems.
[15] P. Ranganathan, V. S. Pai, H. Abdel-Shafi, and S. V. Adve, "The Interaction of Software Prefetching with ILP Processors in Shared-Memory Systems," in Proc. Intl. Symp. on Computer Architecture.
[16] P. Ranganathan, V. S. Pai, and S. V. Adve, "Using Speculative Retirement and Larger Instruction Windows to Narrow the Performance Gap between Memory Consistency Models," in Proc. Symp. on Parallel Algorithms and Architectures.
[17] J. P. Singh, W.-D. Weber, and A. Gupta, "SPLASH: Stanford Parallel Applications for Shared-Memory," Computer Architecture News.
[18] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The SPLASH-2 Programs: Characterization and Methodological Considerations," in Proc. Intl. Symp. on Computer Architecture.
[19] V. S. Pai, P. Ranganathan, and S. V. Adve, "RSIM: A Simulator for Shared-Memory Multiprocessor and Uniprocessor Systems that Exploit ILP," in Proc. Workshop on Computer Architecture Education.
[20] A. Gupta, J. Hennessy, K. Gharachorloo, T. Mowry, and W.-D. Weber, "Comparative Evaluation of Latency Reducing and Tolerating Techniques," in Proc. Intl. Symp. on Computer Architecture.
[21] A. Krishnamurthy and K. Yelick, "Optimizing Parallel SPMD Programs," in Proc. Workshop on Languages and Compilers for Parallel Computing.
[22] A. Krishnamurthy and K. Yelick, "Optimizing Parallel Programs with Explicit Synchronization," in Proc. SIGPLAN Conf. on Programming Language Design and Implementation.
[23] Information Technology, Portable Operating System Interface (POSIX), Part 1: System Application Program Interface (API) [C Language], IEEE; ISO/IEC and ANSI/IEEE standard.
[24] J. Gosling, B. Joy, and G. Steele, The Java Language Specification, The Java Series, Addison-Wesley.
[25] OpenMP Fortran Application Program Interface.
[26] MPI-2: Extensions to the Message-Passing Interface.
[27] C. Koelbel et al., The High Performance Fortran Handbook, The MIT Press.
[28] S. V. Adve, Designing Memory Consistency Models for Shared-Memory Multiprocessors, PhD thesis, Computer Sciences Department, University of Wisconsin-Madison.
[29] K. Gharachorloo, Memory Consistency Models for Shared-Memory Multiprocessors, PhD thesis, Stanford University.