Multicore Semantics and Programming Mark Batty
Total Page:16
File Type:pdf, Size:1020Kb
Multicore Semantics and Programming Mark Batty University of Cambridge January, 2013 – p.1 These Lectures Semantics of concurrency in multiprocessors and programming languages. Establish a solid basis for thinking about relaxed-memory executions, linking to usage, microarchitecture, experiment, and semantics. x86, POWER/ARM, C/C++11 Today: x86 – p.2 Inventing a Usable Abstraction Have to be: Unambiguous Sound w.r.t. experimentally observable behaviour Easy to understand Consistent with what we know of vendors intentions Consistent with expert-programmer reasoning – p.3 Inventing a Usable Abstraction Key facts: Store buffering (with forwarding) is observable IRIW is not observable, and is forbidden by the recent docs Various other reorderings are not observable and are forbidden These suggest that x86 is, in practice, like SPARC TSO. – p.4 x86-TSO Abstract Machine Thread Thread Write Buffer Write Buffer Lock Shared Memory – p.5 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer Lock x=0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread t0:W x=1 Write Buffer Write Buffer Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (x,1) Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread t1:W y=1 Write Buffer Write Buffer (x,1) Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (x,1) (y,1) Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (x,1) t0:R y=0 (y,1) Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (x,1) (y,1) t1:R x=0 Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (x,1) (y,1) t0:τ x=1 Lock x= 0 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (y,1) Lock x= 1 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer (y,1) t1:τ y=1 Lock x= 1 Shared Memory y= 0 – p.6 SB, on x86 Thread 0 Thread 1 MOV [x] 1 (write x=1) MOV [y] 1 (write y=1) ← ← MOV EAX [y] (read y) MOV EBX [x] (read x) ← ← Thread Thread Write Buffer Write Buffer Lock x= 1 Shared Memory y= 1 – p.6 How to formally define this? Separate instruction semantics and memory model Define the memory model in two (provably equivalent) styles: an abstract machine (or operational model) an axiomatic model Put the instruction semantics and abstract machine in parallel, exchanging read and write messages (and lock/unlock messages). – p.7 Let’s Pretend... that we live in an SC world Thread1 Threadn WRW R Shared Memory Multiple threads acting on a sequentially consistent (SC) shared memory – p.8 A Tiny Language location, x, m address integer, n integer thread id, t thread id expression, e ::= expression n integer literal | x read from address x | x = e write value of e to address x | e; e sequential composition | ! e + e plus | ! process, p ::= process t:e thread | p p parallel composition | | ! – p.9 A Tiny Language That was just the syntax — how can we be precise about the permitted behaviours of programs? – p.10 Defining an SC Semantics: expressions l e e! e does l to become e! −→ l e e! e does l to become e! −→ READ x R x=n n −−−−→ l e1 e! WRITE 1 W x=n −→ PLUS CONTEXT 1 x = n n e + e l e + e −−−−→ 1 2 −→ 1! 2 l e e! l −→ WRITE CONTEXT e2 e2! l −→ PLUS CONTEXT 2 x = e x = e! l −→ n1 + e2 n1 + e! −→ 2 SEQ n; e τ e −→ n = n1 + n2 τ PLUS l n1 + n2 n e1 e1! −→ −→ SEQ CONTEXT e ; e l e ; e 1 2 −→ 1! 2 – p.11 Example: SC Expression Trace (x = y); x – p.12 Example: SC Expression Trace (x = y); x R y=7 W x=7 τ R x=9 (x = y); x 9 −−−→ −−−−→−→−−−→ – p.12 Example: SC Expression Trace (x = y); x R y=7 W x=7 τ R x=9 (x = y); x 9 −−−→ −−−−→−→−−−→ READ R y=7 y 7 WRITE CONTEXT −−−→R y=7 x = y x =7 SEQ CONTEXT −−−→ R y=7 (x = y); x (x =7);x −−−→ – p.12 Example: SC Expression Trace (x = y); x R y=7 W x=7 τ R x=9 (x = y); x 9 −−−→ −−−−→−→−−−→ WRITE W x=7 x =7 7 SEQ CONTEXT −−−−→W x=7 (x =7);x 7; x −−−−→ – p.12 Example: SC Expression Trace (x = y); x R y=7 W x=7 τ R x=9 (x = y); x 9 −−−→ −−−−→−→−−−→ τ SEQ 7; x x −→ READ R x=9 x 9 −−−→ – p.12 Defining an SC Semantics: lifting to processes t:l p p p does t : l to become p −→ ! ! l e e! −→ THREAD t:l t:e t:e −→ ! t:l p1 p1! −→ PAR CONTEXT LEFT t:l p p p p 1| 2 −→ 1! | 2 t:l p2 p2! −→ PAR CONTEXT RIGHT t:l p p p p 1| 2 −→ 1| 2! free interleaving – p.13 Defining an SC Semantics: SC memory Take an SC memory M to be a function from addresses to integers. Define the behaviour as a labelled transition system (LTS): the least set of (memory,label,memory) triples satisfying these rules. t:l M M M does t : l to become M −→ ! ! M (x)=n MREAD t:R x=n M M −−−−−→ MWRITE t:W x=n M M (x n) −−−−−→ ⊕ %→ – p.14 Defining an SC Semantics: whole-system states A system state p, M is a pair of a process and a memory. & ' t:l s s s does t : l to become s −→ ! ! t:l p p! −→t:l M M ! −→ SSYNC t:l p, M p , M & ' −→& ! !' t:τ p p! −−→ TAU t:τ S p, M p , M & ' −−→& ! ' synchronising between the process and the memory, and letting threads do internal transitions – p.15 Example: SC Interleaving All threads can read and write the shared memory. Threads execute asynchronously – the semantics allows any interleaving of the thread transitions. Here there are two: t1:x =1t2:x =2, x 0 & l| R{R %→ }' t1:W x=1 lll RRRt2:W x=2 lll RRR lll RRR ulll RR) t :1 t :x =2, x 1 t :x =1t :2, x 2 & 1 | 2 { %→ }' & 1 | 2 { %→ }' t2:W x=2 t1:W x=1 t :1 t :2, x 2 t :1 t :2, x 1 & 1 | 2 { %→ }'&1 | 2 { %→ }' But each interleaving has a linear order of reads and writes to the memory. – p.16 Combinatorial Explosion The behaviour of t :x = x +1t :x = x +7for the initial store 1 | 2 x 0 : { %→ } r + w t1:1 t2:(x = x +7), x 1 / / / t1:1 t2:8, ll 8 & | i4 { %→ }' • • & | { %→ }' w iiii iiii iiii iiii t1:(x =1)t2:(x = x +7), x 0 t1:1 t2:(x =7+0), x 1 & | m6 UU { %→ }' & | i4 {TT %→ }' + mm UUU r w iii TTT + mm UUUU iiii TTTT mmm UUUU iiii TTT mmm UUU iii TTT m U* ii T* w t1:(x =1+0)t2:(x = x +7), x 0 t1:(x =1)t2:(x =7+0), x 0 t1:1 t2:(x =7), x 1 / t1:1 t2:7, x 7 & |m6 QQ { %→ }' & i|i4 UUU{ %→ }' & | j4 { %→ }' & | { %→ }' r mm QQ r + iii UUU + w jjjj mmm QQQ iiii UUUU jjj mmm QQQ iiii UUUU jjjj mmm QQ( iiii UUU* jjjj t1:(x = x +1)t2:(x = x +7), x 0 t1:(x =1+0)t2:(x =7+0), x 0 t1:(x =1)t2:(x =7), x 0 & | QQ { %→ }' & m| 6 UU { %→ }' & i4 | TT{ %→ }' QQ mm UUUU + iiii TTTT QQQ mmm UUUU iiii TTT r QQ mmr + UUU iii w TTT QQQ mmm UUU iii TTT ( m U* ii T* w t1:(x = x +1)t2:(x =7+0), x 0 t1:(x =1+0)t2:(x =7), x 0 t1:x =1t2:7, x 7 / t1:1 t2:7, x 1 & | QQ { %→ }' & ii4 | UUU{ %→ }' & jj4 | { %→ }' & | { %→ }' QQQ r iiii UUUU + jjj QQ iii UUU jjjj + QQQ iiii w UUUU jjj QQ( iiii UUU* jjjj t :(x = x +1)t :(x =7), x 0 t :x =1+0t :7, x 7 & 1 | 2 { %→ }' & 1 | 2 { %→ }' UUUU UUUUw UUUU UUUU U* r + w t :x = x +1t :7, x 7 / / / t :8 t :7, ll 8 & 1 | 2 { %→ }' • • & 1 | 2 { %→ }' NB: the labels +, w and r in this picture are just informal hints as to how those transitions were derived – p.17 Morals For free interleaving, number of systems states scales as nt,wheren is the threads per state and t the number of threads.