Scalable Correct Memory Ordering Via Relativistic Programming

Portland State University PDXScholar Computer Science Faculty Publications and Presentations Computer Science 3-2011 Scalable Correct Memory Ordering via Relativistic Programming Josh Triplett Portland State University Philip William Howard Portland State University Paul E. McKenney Jonathan Walpole Portland State University Follow this and additional works at: https://pdxscholar.library.pdx.edu/compsci_fac Part of the Computer and Systems Architecture Commons, and the Systems Architecture Commons Let us know how access to this document benefits ou.y Citation Details Triplett, Josh, Paul E. McKenney, Philip W. Howard, and Jonathan Walpole (2011) "Scalable Correct Memory Ordering via Relativistic Programming." Portland State University Computer Science Department Technical Report 11-03. This Technical Report is brought to you for free and open access. It has been accepted for inclusion in Computer Science Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Scalable Correct Memory Ordering via Relativistic Programming Josh Triplett Paul E. McKenney Portland State University IBM Linux Technology Center [email protected] [email protected] Philip W. Howard Jonathan Walpole Portland State University Portland State University [email protected] [email protected] Abstract operations, and using proof systems to verify correctness. We propose and document a new concurrent pro- We propose relativistic programming,anewmodel gramming model, relativistic programming.This for concurrent programming which achieves scalabil- model allows readers to run concurrently with writ- ity without sacrificing the simplicity of strong order- ers, without blocking or using expensive synchro- ing guarantees. This model allows readers to run nization. Relativistic programming builds on exist- concurrently with writers, without blocking or using synchronization primitives that allow writers to ing expensive synchronization. Despite this concur- wait for current readers to finish with minimal reader rency, however, relativistic writers can strictly con- overhead. Our methodology models data structures trol the order their writes become visible to readers, as graphs, and reader algorithms as traversals of allowing straightforward reasoning about relativistic these graphs; from this foundation we show how algorithms. writers can implement arbitrarily strong ordering guarantees for the visibility of their writes, up to Readers commonly perform multiple separate and including total ordering. memory reads as they traverse a data structure. By allowing readers to run concurrently with writers, and avoiding any delays in readers, we allow readers 1 Introduction to perform their reads interleaved with a writer’s writes. It therefore becomes the writer’s responsi- Concurrent programming faces a difficult tradeoff bility to prevent readers from observing inconsistent between simplicity and scalability. Simple concur- states. Without blocking readers, the writer cannot rent algorithms incur high synchronization costs, generally make a series of writes appear atomic, but while highly scalable algorithms prove highly com- our methodology allows the writer to maintain the plex to develop and maintain. This dilemma hinges order of its writes. With no write-side synchroniza- primarily on the ordering of memory operations, and tion, readers could observe any possible permutation the order in which they become visible to threads. of writes; by adding write-side synchronization, writ- Simple concurrent algorithms force more serializa- ers may strictly preserve program order for any or tion to preserve sequential reasoning techniques, all writes, allowing any ordering policy up to and while scalable algorithms allow more reordering and including total order. The choice of ordering poli- force the programmer to think about concurrent be- cies affects writer latency, but never impacts reader havior. performance or scalability. Existing concurrent programming models ap- “Laws of Order” [1] documents a set of inter- proach this dilemma by choosing a point on the slid- locked properties for concurrent algorithms, and ing scale and attempting to incrementally reduce proves that any algorithm with all those properties the drawbacks associated with that point. Trans- must necessarily rely on expensive synchronization actional memory presents a simple programming instructions. These properties include “strong non- model where transactions appear serialized, and commutativity”: multiple operations whose results strives to eliminate the expensive synchronization both depend on their relative ordering. Relativis- costs. Scalable algorithms seek simplicity by intro- tic readers, however, cannot execute strongly non- ducing reasoning models for weakly ordered memory commutative operations—a relativistic reader can- 1 not affect the results of a write, regardless of order- the base address of the hash table, the head pointer ing. This allows relativistic readers to avoid expen- of the desired hash bucket, and the various “next” sive synchronization while maintaining correctness. pointers of the linked list in that bucket. These mul- Relativistic programming builds on several exist- tiple reads may still represent a single semantic oper- ing synchronization primitives; in particular, rela- ation on some abstract data type, such as the lookup tivistic writers rely on a primitive that waits for all of an item in a map or set. current readers to finish. Read-Copy Update (RCU) Similarly, a single writer may need to perform implementations [10, 9, 7]providesuchaprimitive, multiple modifications to implement a single seman- with little to no overhead for the corresponding read- tic operation. For example, to implement removal side delineation [8]. Any synchronization primitive from a linked list or hash table, a writer must mod- with the same semantics will work as well. ify the preceding next pointer (or head pointer) to Section 2 documents the relativistic programming point past the disappearing item, and then reclaim methodology. Section 3 presents the status of our the memory associated with that node. To move a ongoing research in this area. Section 4 summarizes node in a linked list, a writer could insert the new our contributions. node and remove the old, in some order. In the absence of additional synchronization, if a writer with concurrent readers performs two mem- 2 Methodology ory operations on a data structure, a reader travers- ing the data structure and reading these memory Consider a simple set of reader and writer algo- locations at different times may observe four dis- rithms based on mutual exclusion. Each algorithm tinct states of memory: the state with neither op- begins by acquiring a lock, and ends by releasing eration performed, the state with both operations that lock. Readers and writers run for a finite dura- performed, or either of the two states with a single tion, delineated by these lock operations. As far as operation performed. (These same principles extend other threads can observe, all operations of a run- to more than two memory operations in the obvious ning thread occur between the lock and unlock oper- manner, with 2n possible reader observations for n ations; since other threads cannot examine the data memory operations; subsequent examples will ini- structure without acquiring the lock themselves, the tially focus on the case of two memory operations.) operations performed by a thread appear atomic. With neither operation performed, the state of the Thus, any set of reads and writes in this system will data structure appears the same to a reader as if that appear sequential. reader completed before the writer began; thus, ob- Reader-writer locking does not change this prop- serving this state cannot do harm to a reader. Sim- erty; readers cannot modify the memory observed by ilarly, with both operations performed, the state of other readers, so running readers concurrently does the data structure appears the same to a reader as if not affect the sequential consistency of the system; that reader began after the writer completed; thus, to generate an equivalent sequential order, choose observing this state can do no harm either. Incor- any arbitrary order for read operations. Similarly, rect behavior can only occur if a reader observes one common applications of non-blocking synchroniza- of the two possible intermediate states. tion and transactional memory will roll back readers For an example of such harm, consider the re- or writers to ensure that reader operations cannot moval of a node described earlier: first unlink the observe intermediate states. node, then free the memory. If a reader does not When using a relativistic programming technique observe the node unlinked (and thus sees the node), such as RCU, readers and writers still have finite the reader must not observe the memory reclama- durations, but readers no longer exclude other read- tion; otherwise, the reader would access reclaimed ers, or even writers. Thus, any number of readers memory. can run concurrently, together with a simultaneous Mutual exclusion takes the simplest approach to writer, providing joint-access parallelism. Multiple avoid this problem: by preventing readers from run- writers continue to use disjoint-access parallelism, ning concurrently with writers, readers cannot ob- accessing separate data simultaneously through the serve intermediate states at all,

Scalable Correct Memory Ordering Via Relativistic Programming

Memory Consistency

Memory Ordering: a Value-Based Approach

Connect User Guide Armv8-A Memory Systems

Recent Advances in Memory Consistency Models for Hardware

Fence Scoping

Fences in Weak Memory Models

Linux-Kernel Memory Ordering: Help Arrives at Last!

The Semantics of X86-CC Multiprocessor Machine Code

Memory Ordering in Modern Microprocessors Paul E

Itanium Processor Microarchitecture

Weak Memory Models with Matching Axiomatic and Operational Definitions

Louvre: Lightweight Ordering Using Versioning for Release Consistency