Translating Between Itanium and Sparc Memory Consistency Models

Lisa Higham(1) LillAnne Jackson(1)(2) [email protected] [email protected]

(1) Department of Computer Science, The University of Calgary, Calgary, Canada   (2) Department of Computer Science, The University of Victoria, Victoria, Canada

Categories and Subject Descriptors: B.3.2 [HARDWARE]: Memory Structures—Design styles: Shared memory; C.1.2 [PROCESSOR ARCHITECTURES]: Multiple Data Stream Architectures (Multiprocessors)—Interconnection architectures

General Terms: Theory, Verification.

Keywords: Sparc, Itanium, Memory consistency models, multiprocessors, program transformations.

ABSTRACT

Our general goal is to port programs from one multiprocessor architecture to another, while ensuring that each program's semantics remains unchanged. This paper addresses a subset of the problem by determining the relationships between the memory consistency models of three Sparc architectures (TSO, PSO and RMO) and that of the Itanium architecture. First we consider Itanium programs that are constrained to have only one load-type of instruction in {load, load acquire} and one store-type of instruction in {store, store release}. We prove that in three out of four cases, the set of computations of any such program is exactly the set of computations of the "same" program (using only load and store) on one Sparc architecture. In the remaining case the set is nested between two natural sets of Sparc computations.

Real Itanium programs, however, use a mixture of load, load acquire, store, store release and memory fence instructions, and real Sparc programs use a variety of barrier instructions as well as load and store instructions. We next show that any mixture of the load-types or the store-types (in the case of Itanium) or any barrier instructions (in the case of Sparc) completely destroys the clean and simple similarities between the sets of computations of these systems. Thus (even without considering the additional complications due to register and control dependencies) transforming these more general programs in either direction requires constraining the transformed program substantially more than the original program in order to ensure that no erroneous computations can arise.

1. INTRODUCTION

Our general goal is to translate multiprocessor programs designed for one architecture to programs for another architecture, while ensuring that each program's semantics remains unchanged. This is challenging because the memory access and ordering instructions of various architectures differ significantly. Consequently, the set of possible outcomes of a "fixed" multiprogram can be very different when run on different multiprocessors. Each multiprocessor architecture has a set of rules, called its memory consistency model, that specifies the ordering constraints and the allowable returned values of instructions that access the shared memory. These rules, and the vocabulary used in their specification, vary considerably between architectures, making comparison difficult. Additionally, each architecture defines a set of instructions that further constrain the allowable orderings. For example, Sparc architectures use memory barriers, which order certain instructions before and after the barrier, while Itanium architectures provide instructions with acquire (respectively, release) semantics, which only ensure that instructions after (respectively, before) them remain so ordered.

This paper compares the memory consistency models of the Sparc architecture by Sun [15] and the Itanium architecture by INTEL [11]. First we restrict the memory access and ordering instructions to a subset that is common to each class of machine and derive (and prove) the precise relationship between the sets of computations that can arise from running the same multiprogram on each. Next, system-specific memory ordering instructions are included and we derive the relationship between these more complicated sets of computations on each machine. This two-step approach has proven a useful technique for understanding how different architectures behave.
Sections 3 through 5 restrict Sparc multiprograms to have only load and store memory access instructions that are constrained by one of the RMO, PSO or TSO memory consistency models. Itanium multiprograms are restricted to use only one type of memory access instruction in {load, load acquire} and one type in {store, store release}. We prove that the sets of possible computations on each system for any multiprogram so restricted are closely related and in many cases identical. For example, if a multiprogram uses just load and store memory access instructions, then the set of computations that the multiprogram can produce on a Sparc RMO system is equal to the set of computations it can produce on an Itanium system. The same multiprogram, when run on a Sparc TSO system, can produce exactly the same set of computations as it would produce on an Itanium system after replacing each load by a load acquire and each store by a store release.

These simple relationships hold because different strengths of load and/or store instructions are not combined in the same program. In Section 6, we add the memory barrier instructions of Sparc and allow Itanium multiprograms to contain more combinations of load, load acquire, store and store release instructions. Perhaps surprisingly, any mixture of the load-types or the store-types (in the case of Itanium) or any barrier instructions (in the case of Sparc) completely destroys the clean and simple similarities between the sets of computations of these systems. In fact, we present impossibilities when dealing with the Sparc MEMBAR #StoreLoad instruction and with any additional Itanium memory instructions.

The scope of this paper does not include atomic read-modify-write instructions (i.e., atomic memory transactions on Sparc and semaphores on Itanium). Furthermore, register and control dependencies such as branching are not considered. Capturing these dependency orderings is a significant task that we have addressed elsewhere [9] using different techniques. Since the techniques are orthogonal to those of this paper, they could be combined to give a complete memory model.

Previous work [8, 10] has compared the Sparc models to standard memory consistency models defined in the literature [13, 14, 1, 6, 2, 3]. Gopalakrishnan and colleagues [4, 16] have worked on a formal specification of the memory consistency of Itanium for verification purposes. Previous work [12] has compared memory consistency models on distributed shared memory systems. Gharachorloo [5] (chapter 4) defines porting relationships to and from Sparc and other shared memory architectures, but does not consider Itanium.

2. NOTATION AND MODELS

2.1 Sets, sequences, and orders

Let (B, →po) be a partial order. The notation a <_po b denotes that (a,b) ∈ (B, →po). The partial order (B, →po′) extends (B, →po) if (a,b) ∈ (B, →po) implies (a,b) ∈ (B, →po′). Given a total order (B, →TO) on a finite set B, there is a unique sequence, denoted Sequence(TO), of the elements of B that satisfies: a precedes b in Sequence(TO) if and only if (a,b) ∈ (B, →TO). Since the total orders we consider are on finite sets, we will represent total orders interchangeably as a total order relation TO ⊂ B × B and as the corresponding sequence Sequence(TO).

The subset of a set B consisting of all the elements that satisfy predicate Q is denoted by B|Q. Similarly, for a sequence S, S|Q denotes the subsequence of S consisting of those elements that satisfy Q. When the meaning is clear, we abuse this notation slightly when Q is not a predicate but is naturally associated with one. Int_S(a,b) denotes the subsequence of S strictly between two of its elements a and b, where a <_S b.

2.2 Multiprocesses, computations, memory consistency

As each process in a multiprocess system executes, it issues a sequence of instruction invocations on shared memory objects. We begin with a simplified setting in which we assume the shared memory consists of only shared variables, and each instruction invocation is either of the form store_p(x,v), meaning that process p writes a value v to the shared variable x, or of the form load_p(x), meaning that process p reads a value from shared variable x. For this paper it suffices to model each individual process p as a sequence of these store and load instruction invocations; we call such a sequence an individual program. A multiprogram is defined to be a finite set of these individual programs.

An instruction is an instruction invocation completed with a response. Here the response of a store instruction invocation is an acknowledgment and is ignored. The response of a load invocation is the value returned by the invocation. A (multiprocess) computation of a multiprogram P is created from P by changing each load instruction invocation, load_p(x), to ν ← load_p(x), where ν is either the initial value of x or a value stored to x by some store(x,·) in the multiprogram. A "·" in place of a variable or value is used to denote the set of all the instruction invocations or instructions that match the given pattern. For example, store_p(x,·) represents the set of all store instructions by program p to the shared variable x, or, depending on context, a member of that set.

Let I(C) be all the instructions of a computation C. The sequence of instruction invocations of each individual program induces a partial order (I(C), →prog), called program order, on I(C), defined naturally by i →prog j if both i and j are instructions of the same program, say p, and the invocation of i precedes the invocation of j in p's individual program.

Notice that the definition of a computation permits the value returned by each load(x) instruction invocation to be arbitrarily chosen from the set of values stored to x by the multiprogram. In a real machine, the values that might actually be returned are substantially further constrained by the machine architecture, which determines the way in which the processes communicate and the way shared memory is implemented. A memory consistency model captures these constraints by specifying a set of additional requirements that computations must satisfy. We use C(P, MC) to denote the set of all computations of multiprogram P that satisfy the memory consistency model MC. In the next two subsections, we define memory consistency models that capture the computations that can arise from multiprograms run on systems based on Sparc and Itanium architectures.

We simplify the description of a memory consistency model by assuming that each store instruction invocation has a distinct value. Although it is technically straightforward to remove this assumption, without it the description of the memory model is messy and its properties are consequently obscured.
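To illustrate these definitions, consider the multiprogram P consisting of the two individual programs

    p:  store_p(x,1); load_p(y)
    q:  store_q(y,2); load_q(x)

with both shared variables initially 0. A computation of P completes each load with a returned value: for instance, 0 ← load_p(y) together with 1 ← load_q(x) is one computation of P, and 2 ← load_p(y) together with 0 ← load_q(x) is another. The definition alone even permits the computation in which both loads return 0; which of these computations a particular machine can actually produce is determined by its memory consistency model, as captured by C(P, MC).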
2.3 Sparc-like consistency models

Until Section 6, P denotes a multiprogram consisting only of load and store instruction invocations, C is a computation of P, and I(C) is the set of all the instructions in C. In keeping with the subset notation introduced in Subsection 2.1, …

2.4 Itanium-like consistency models

The Itanium-like memory consistency models capture the INTEL Itanium Processor Family Memory Ordering [11] when applied to individual programs that use one type of load instruction in {load, load acquire} and one type of store instruction in {store, store release}. Specifically, a computation of P is Itanium if and only if it can arise from executing P on an Itanium system where P uses only load and store instructions; a computation of P is Acquire Itanium if and only if it can arise from executing P on an Itanium system where P uses only load acquire and store instructions; a computation of P is Release Itanium if and only if it can arise from executing P on an Itanium system where P uses only load and store release instructions; and a computation of P is Acquire Release Itanium if and only if it can arise from executing P on an Itanium system where P uses only load acquire and store release instructions.

3. SPARC-LIKE COMPUTATIONS SATISFY ITANIUM-LIKE CONSISTENCY

This section establishes that each of the Sparc-like memory consistency models satisfies the memory consistency conditions of some Itanium-like model. The proofs are all constructive. Given a computation C that satisfies a Sparc-like memory consistency model, the Sparc-verifying memory order M for C is transformed into an order T(M) as follows, where New(M) is initially M:

    For every p ∈ P do
        Let l1, l2, ..., lk be the forward load instructions by p in New(M), listed in program order.
        Let s_j be the store instruction causally related to l_j, for each j.
        For j from 1 to k do
            If there is a load instruction in Int_New(M)(l_j, s_j) that precedes l_j in program order, let i be the latest such instruction.
            If i exists, New(M) ← (New(M) with l_j moved to immediately follow i).
    T(M) ← New(M).

Observe that T(M) is the same regardless of the order in which the individual programs are selected.

OBSERVATION 3.2. The order of all backward load and store instructions is the same in memory order M and in T(M).

CLAIM 3.3. Each load instruction has the same type in memory order T(M) as it has in M.

Proof: By Observation 3.2, backward load instructions in M are also backward load instructions in T(M). A forward load might move forward from its position in M to a new position in T(M), but this new position is still earlier in T(M) than its causally related store instruction. Thus, each forward load instruction in M remains a forward load in T(M).

CLAIM 3.4. If a memory order M is Sparc-valid for a computation, then all forward load instructions are in program order in T(M)|x for each x.

Proof: For a pair of forward load instructions l1 and l2 in T(M)|x that satisfy l1 →prog l2, let s1 be the store instruction causally related to l1 and s2 the store instruction causally related to l2. Assume l2 <_T(M) l1. By the construction of T(M), l1 ∉ Int_T(M)(l2, s2), so l2 <_T(M) s2 <_T(M) l1 <_T(M) s1. Thus, by Observation 3.2, s2 <_M s1, and by Claim 3.3, l2 <_M s2. This contradicts the Sparc-validity of M, since l2 should be causally related to s1 rather than to s2.

CLAIM 3.5. If a memory order M is Sparc-valid for a computation, then no forward interval in T(M)|p|x can contain a backward load.

Proof: Assume a forward interval Int_T(M)(l,s)|p|x contains a backward load lb. Thus, l <_T(M) lb <_T(M) s. By Observation 3.2, lb <_M s …

LEMMA 3.9. If a memory order M for a computation C extends (I(C), →p.l.) then T(M) extends (I(C), →p.l.).

Proof: Consider any load l and any other instruction i where l →prog i. Because M extends (I(C), →p.l.), all load instructions are in program order in M. Thus, no instructions are moved in the creation of T(M). So M = T(M), implying that T(M) extends (I(C), →p.l.).

LEMMA 3.10. If a memory order M is Sparc-valid for a computation C and extends (I(C), →f.s.) then T(M) extends (I(C), →f.s.).

Proof: Let i be either a store or a load and let s be a store in M where i →prog s. Since M extends (I(C), →f.s.), i <_M s. If i is a backward load or a store then, by Observation 3.2, i <_T(M) s. Otherwise, let i be a forward load in M and s_i its causally related store. Since M is valid, s_i →prog i, so s_i →prog s; hence s_i <_M s and, by Observation 3.2, s_i <_T(M) s. Since i remains before s_i in T(M), i <_T(M) s.
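For concreteness, the construction of T(M) above can be sketched in Python. This is a minimal sketch and is not taken from [7]: it assumes each instruction is a record carrying its process, its kind, its position in its individual program, a flag marking forward loads, and, for a forward load, a reference to its causally related store; it also reads "latest" as latest in the current order New(M).

    # Minimal sketch of the construction of T(M); the instruction representation
    # below is an assumption made for illustration, and instructions are assumed
    # pairwise distinct (consistent with the distinct-value simplification).
    def T(M):
        new = list(M)                                   # New(M) starts as a copy of M
        for p in {ins['proc'] for ins in M}:
            fwd = [ins for ins in new
                   if ins['proc'] == p and ins['kind'] == 'load' and ins.get('forward')]
            fwd.sort(key=lambda ins: ins['pos'])        # l1, ..., lk in program order
            for l in fwd:
                s = l['causal_store']
                lo, hi = new.index(l), new.index(s)
                interval = new[lo + 1:hi]               # Int_New(M)(l, s)
                blockers = [ins for ins in interval
                            if ins['proc'] == p and ins['kind'] == 'load'
                            and ins['pos'] < l['pos']]  # loads preceding l in program order
                if blockers:
                    i = blockers[-1]                    # the latest such load in New(M)
                    new.remove(l)
                    new.insert(new.index(i) + 1, l)     # move l to immediately follow i
        return new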

Figure 2: Comparison of sets of computations under Sparc-like and Itanium-like Memory Consistency Models. (The diagram is not reproduced; its nodes include C(P, RMO), C(P, Preceding Load RMO), C(P, Following Store RMO), C(P, Itanium), C(P, Acquire Itanium) and C(P, Release Itanium).)

Let LLLS(P) (respectively, LSSS(P)) denote the multiprogram that results from replacing each load (respectively, store) instruction in P with the sequence of instructions load, MEMBAR#LL, MEMBAR#LS (respectively, MEMBAR#LS, MEMBAR#SS, store). It follows from the definitions in Subsection 2.3 that C(P, Preceding Load RMO) = C(LLLS(P), RMO) = C(P, PSO) and, similarly, C(P, Following Store RMO) = C(LSSS(P), RMO) and C(P, Preceding Load Following Store RMO) = C(LLLS(P), Following Store RMO) = C(LSSS(LLLS(P)), RMO) = C(LSSS(P), Preceding Load RMO) = C(P, TSO).

These relations clarify how to port between the three Sparc architectures. Any program designed for a weak model will be correct on a stronger model without changes. Furthermore, any explicit barriers can be removed provided they are implied by the model to which the program is being ported. For example, any RMO program that uses only load, store, MEMBAR#LL and MEMBAR#LS will remain correct when these memory barriers are removed and it is run on a Sparc PSO machine. To port a program designed for a strong Sparc model to a weaker one, the implicit barrier instructions of the stronger model that are not implied by the weaker one must be inserted. For example, a TSO program can be ported to PSO by inserting MEMBAR#LS and MEMBAR#SS before every store.
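As a concrete illustration of such a porting step, the following is a minimal Python sketch (ours, not from the paper) under the assumption that instructions are encoded as tuples such as ('store', x, v) and ('load', x):

    # Port a TSO individual program to PSO by inserting MEMBAR#LS and MEMBAR#SS
    # before every store; the tuple encoding is an assumption for illustration.
    def port_tso_to_pso(individual_program):
        ported = []
        for ins in individual_program:
            if ins[0] == 'store':
                ported.append(('MEMBAR', '#LS'))
                ported.append(('MEMBAR', '#SS'))
            ported.append(ins)
        return ported

    # Example: port_tso_to_pso([('store', 'x', 1), ('load', 'y')]) yields
    # [('MEMBAR', '#LS'), ('MEMBAR', '#SS'), ('store', 'x', 1), ('load', 'y')]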
As well as the weak load and store instructions, Itanium systems support two strong instructions, called load acquire and store release, defined as follows:

• load acquire is equivalent to a load instruction that also ensures the partial order {(o1,o2) | ∃ i1, i2, p where o1 ∼c i1 = load acquire, o2 ∼c i2, and i1, i2 ∈ I(C)|p and i1 →prog i2}.

• store release is equivalent to a store instruction that also ensures the partial order {(o1,o2) | ∃ i1, i2, p where o1 ∼c i1, o2 ∼c i2 = store release, and i1, i2 ∈ I(C)|p and i1 →prog i2 and (o1,o2) ∈ {(LV(i1),LV(i2)), (R(i1),LV(i2)), (RVq(i1),RVq(i2)) for any q ∈ P}}; plus all store release instructions are contiguous.

As well, Itanium systems support a memory fence instruction, which ensures the partial order {(o1,o2) | ∃ i1, i2, i_fen, p where o1 ∼c i1, o2 ∼c i2 and i_fen = memory fence and i1, i2, i_fen ∈ I(C)|p and i1 →prog i_fen →prog i2}. It is clear that the Itanium memory fence instruction and the Sparc MEMBAR#LLLSSSSL instruction are equivalent. Thus, systems that contain only load, store and the appropriate one of these instructions are equivalent. In the remainder of this work we consider only the effect of the load acquire and store release Itanium instructions. Define the following sets of multiprograms that use these additional instructions.

P^It_{acq,rel} is the set of all multiprograms consisting of sequences of load, load acquire, store and store release instructions.

P^It_{acq} is the set of all multiprograms consisting of sequences of load, store and load acquire instructions.

P^It_{rel} is the set of all multiprograms consisting of sequences of load, store and store release instructions.

Let P be a multiprogram with only load and store instructions. Let ACQ(P) (respectively, REL(P)) denote the multiprogram that results from replacing each load (respectively, each store) instruction in P with a load acquire (respectively, store release) instruction. It follows directly from the definitions in Subsection 2.4 that C(P, Acquire Itanium) = C(ACQ(P), Itanium) and, similarly, C(P, Release Itanium) = C(REL(P), Itanium) and C(P, Acquire Release Itanium) = C(ACQ(P), Release Itanium) = C(REL(ACQ(P)), Itanium) = C(REL(P), Acquire Itanium).

Now consider a multiprogram P ∈ P^Sp_{LS,SS}. Clearly, C(LSSS(P), RMO) ⊆ C(P, RMO), since in LSSS(P) each store-type instruction is preceded by a MEMBAR#LS and a MEMBAR#SS, while in P the effect of only some of these MEMBAR instructions is present. From the results in Section 5 (and the summary in Figure 2), C(P, Release Itanium) ⊆ C(LSSS(P), RMO). Thus, P can be transformed to the Itanium architecture: it requires simply removing all the MEMBAR instructions and replacing each store by a store release.

More generally, to transform a Sparc program to a program for the Itanium architecture, start at the node that represents the Sparc architecture for which the program was designed, travel up the Sparc side until each explicit MEMBAR used by the program is included in the implicit MEMBAR types of the node, and remove all barrier instructions. Then travel backward along arrows to the Itanium side. If the resulting node has Acquire (respectively, Release) semantics, replace each load (respectively, store) instruction with the corresponding load acquire (respectively, store release) instruction.

Now consider a multiprogram P ∈ P^It_{acq}. Clearly, C(ACQ(P), Itanium) ⊆ C(P, Itanium), since in ACQ(P) all load-type instructions are constrained to satisfy load-acquire semantics, while in P only some are so constrained. Also, according to the results in Section 5 (and the summary in Figure 2), C(P, Preceding Load RMO) ⊆ C(ACQ(P), Itanium). Thus, P can be transformed trivially to the Sparc PSO architecture; no changes need to be made to the code to accommodate the different memory consistency model. As another example, using similar reasoning, any multiprogram P ∈ P^It_{acq,rel} can be trivially transformed to the Sparc TSO architecture.

In general, Figure 2 summarizes simple ways to transform any Itanium multiprogram to one of the three Sparc architectures. On the Itanium side, the node C(P, Acquire Release Itanium) covers any P ∈ P^It_{acq,rel}, C(P, Acquire Itanium) covers any P ∈ P^It_{acq}, C(P, Release Itanium) covers any P ∈ P^It_{rel}, and C(P, Itanium) covers P only if it uses only load and store instructions. Start at the weakest (lowest in Figure 2) node that covers the given multiprogram, replace each strong instruction with the corresponding weak one, and travel backward along arrows to the Sparc side. The node arrived at has memory consistency constraints for which the program will be correct without any additional MEMBAR instructions.

6.2 Efficient Transformations?

The transforming techniques of the previous subsection can impose many more additional ordering constraints on the computations of the transformed multiprogram than it had originally. This in turn reduces the possible efficiency of the system. We would like more efficient transformations that are correct while curtailing these extra, unnecessary constraints. Even a single MEMBAR#SS in a Sparc multiprogram, for example, resulted in strengthening every store instruction to a store release in order to transform for an Itanium system.
So instead we seek program transformations that replace individual strong instructions with memory barriers, and vice versa, instead of altering the entire code. We call such program transformations instruction transformations to emphasize that they transform each strong instruction or MEMBAR instruction separately, and leave every weak instruction unchanged. Surprisingly, such transformations do not typically exist.

Transforming from Itanium to Sparc

The notation (P, MC) denotes the system consisting of all programs in P executing under the memory constraint MC.

THEOREM 6.1. There is an instruction transformation from (P^It_{acq,rel}, Itanium) to (P^Sp_{LL,LS,SS}, RMO).

The proof of Theorem 6.1 is found in [7]. For any P ∈ P^It_{acq,rel}, the transformation τ(P) of that proof is defined by defining the transformation of each instruction as follows:

    define τ(load_p(x)) =
        v ← load_p(x)
        return v

    define τ(store_p(x,v)) =
        store_p(x,v)

    define τ(load_acq_p(x)) =
        v ← load_p(x)
        MEMBAR#LL
        MEMBAR#LS
        return v

    define τ(store_rel_p(x,v)) =
        MEMBAR#LS
        MEMBAR#SS
        store_p(x,v)

This transformation ensures that all orderings in P ∈ P^It_{acq,rel} are retained in τ(P) ∈ P^Sp_{LL,LS,SS}, but it causes ordering relations between more instructions in τ(P) than are in P. However, the removal of any one of the MEMBAR instructions can cause a necessary partial order to be lost.
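For concreteness, τ can be applied mechanically, one instruction at a time; the following minimal Python sketch mirrors the definition above (the tuple encoding of instructions is an assumption made for illustration):

    # Per-instruction rewrite realizing tau; instruction tuples are an assumption.
    def tau(ins):
        kind = ins[0]
        if kind == 'load':                        # ('load', x)
            return [ins]
        if kind == 'store':                       # ('store', x, v)
            return [ins]
        if kind == 'load_acq':                    # ('load_acq', x)
            return [('load', ins[1]), ('MEMBAR', '#LL'), ('MEMBAR', '#LS')]
        if kind == 'store_rel':                   # ('store_rel', x, v)
            return [('MEMBAR', '#LS'), ('MEMBAR', '#SS'), ('store', ins[1], ins[2])]
        raise ValueError(f"unexpected instruction: {ins!r}")

    def transform_program(individual_program):
        return [out for ins in individual_program for out in tau(ins)]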
Transforming from Sparc to Itanium

Due to space constraints all proofs in this section have been omitted, but they can be found in [7].

LEMMA 6.2. There is no instruction transformation from (P^Sp_{LL}, RMO) to (P^It_{acq}, Itanium).

LEMMA 6.3. There is no instruction transformation from (P^Sp_{SL}, RMO) to (P^It_{acq,rel}, Itanium).

Can a program transformation that is not required to be an instruction transformation overcome the negative results of Lemmas 6.2 and 6.3? There are two negative results and one positive result!

LEMMA 6.4. There is no individual program transformation from (P^Sp_{SL}, RMO) to (P^It_{acq,rel}, Itanium).

THEOREM 6.5. There is no individual program transformation from (P^Sp_{all}, RMO) to (P^It_{acq,rel}, Itanium).

THEOREM 6.6. There is an individual program transformation from (P^Sp_{LS,SS}, RMO) to (P^It_{rel}, Itanium).

Theorem 6.5 implies that C(P^Sp_{all}, RMO) is incomparable to C(P^It_{acq,rel}, Itanium). Theorem 6.6 indicates that C(P^Sp_{LS,SS}, RMO) can be implemented (but not exactly implemented) on C(P^It_{acq,rel}, Itanium), and Theorem 6.1 indicates that C(P^It_{acq,rel}, Itanium) can be implemented (but not exactly implemented) on C(P^Sp_{LL,LS,SS}, RMO).

Acknowledgements

We gratefully thank the anonymous referees for extensive comments. Some of their suggestions have been incorporated in this extended abstract to improve its content and presentation; those that could not be entertained due to space constraints will be addressed in the full version of the paper.

7. REFERENCES

[1] S. V. Adve and M. D. Hill. Weak ordering - a new definition. In Proc. 17th Int'l Symp. on Computer Architecture, pages 2-14, May 1990.
[2] M. Ahamad, R. Bazzi, R. John, P. Kohli, and G. Neiger. The power of processor consistency. In Proc. 5th Int'l Symp. on Parallel Algorithms and Architectures, pages 251-260, June 1993. Technical Report GIT-CC-92/34, College of Computing, Georgia Institute of Technology.
[3] M. Ahamad, G. Neiger, J. E. Burns, P. Kohli, and P. W. Hutto. Causal memory: Definitions, implementations, and programming. Distributed Computing, 9:37-49, 1995.
[4] P. Chatterjee and G. Gopalakrishnan. Towards a formal model of shared memory consistency for Intel Itanium(TM). In Proc. 2001 IEEE International Conference on Computer Design (ICCD), pages 515-518, Sept 2001.
[5] K. Gharachorloo. Memory Consistency Models for Shared-Memory Multiprocessors. PhD dissertation, Department of Electrical Engineering, Stanford University, 1995.
[6] J. Goodman. Cache consistency and sequential consistency. Technical Report 61, IEEE Scalable Coherent Interface Working Group, March 1989.
[7] L. Higham and L. Jackson. Translating between Itanium and Sparc memory consistency models. Technical report, Department of Computer Science, The University of Calgary, May 2006.
[8] L. Higham, L. Jackson, and J. Kawash. Specifying memory consistency of write buffer multiprocessors. ACM Trans. on Computer Systems. To appear.
[9] L. Higham, L. Jackson, and J. Kawash. Capturing register and control dependence in memory consistency models with applications to the Itanium architecture, May 2006. Submitted to: DISC 2006.
[10] L. Higham, J. Kawash, and N. Verwaal. Defining and comparing memory consistency models. In Proc. 10th Int'l Conf. on Parallel and Distributed Computing Systems, pages 349-356, October 1997.
[11] Intel Corporation. A formal specification of the Intel Itanium processor family memory ordering. http://www.intel.com/, Oct 2002.
[12] V. Iosevich and A. Schuster. A comparison of sequential consistency with home-based lazy release consistency for software distributed shared memory. In International Conference on Supercomputing, pages 306-315, September 2004.
[13] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. on Computers, C-28(9):690-691, September 1979.
[14] R. J. Lipton and J. S. Sandberg. PRAM: A scalable shared memory. Technical Report 180-88, Department of Computer Science, Princeton University, September 1988.
[15] D. L. Weaver and T. Germond, editors. The SPARC Architecture Manual, Version 9. Prentice-Hall, 1994.
[16] Y. Yang, G. Gopalakrishnan, G. Lindstrom, and K. Slind. Analyzing the Intel Itanium memory ordering rules using logic programming and SAT. Technical Report UUCS-03-010, University of Utah, 2003.