
US005867649A

United States Patent [19]          [11] Patent Number: 5,867,649
Larson                             [45] Date of Patent: Feb. 2, 1999

[54] DANCE/MULTITUDE CONCURRENT COMPUTATION

[75] Inventor: Brian Ralph Larson, Mpls., Minn.

[73] Assignee: Multitude Corporation, Bayport, Minn.

[21] Appl. No.: 589,933

[22] Filed: Jan. 23, 1996

[51] Int. Cl.: G06F 15/16
[52] U.S. Cl.: 395/200.31
[58] Field of Search: 364/DIG. 1 MS File, 364/DIG. 2 MS File; 395/377, 376, 379, 382, 385, 390, 391, 561, 800, 705, 706, 712, 200.31, 800.28, 800.3, 800.36; 340/825.8

[56] References Cited

U.S. PATENT DOCUMENTS

4,814,978   3/1989   Dennis ............. 395/377
4,833,468   5/1989   Larson et al. ...... 340/825.8
5,029,080   7/1991   Otsuki ............. 395/377

OTHER PUBLICATIONS

Reference Manual for the ADA Programming Language, US Dept. of Defense, Jul. 1980.
"Communicating Sequential Processes," C. A. R. Hoare, Communications of the ACM, Vol. 21, No. 8, Aug. 1978.
"An Optimal Algorithm for Mutual Exclusion in Computer Networks," Glenn Ricart et al., Communications of the ACM, Vol. 24, No. 1, Jan. 1981.
"A √N Algorithm for Mutual Exclusion in Decentralized Systems," M. Maekawa, ACM Transactions on Computer Systems, Vol. 3, No. 2, May 1985.
"The Information Structure of Distributed Mutual Exclusion Algorithms," B. Sanders, ACM Transactions on Computer Systems, Vol. 5, No. 3, Aug. 1987.
"A Tree-Based Algorithm for Distributed Mutual Exclusion," K. Raymond, ACM Transactions on Computer Systems, Vol. 7, No. 1, Feb. 1989.
"Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," J. Mellor-Crummey et al., ACM Transactions on Computer Systems, Vol. 9, No. 1, Feb. 1991.
Verification of Sequential and Concurrent Programs, Apt & Olderog, Springer-Verlag, 1991.
The Science of Programming, D. Gries, Springer-Verlag, 1981.
Computer Architecture, R. Karin, Prentice-Hall, 1989 (Chpt. 5).
The Design and Analysis of Parallel Algorithms, S. Akl, Prentice-Hall, 1989.
Executing Temporal Logic Programs, B. C. Moszkowski, Cambridge University Press, 1986, pp. 1-125.

Primary Examiner: Robert B. Harrell
Attorney, Agent, or Firm: Haugen and Nikolai, P.A.

[57] ABSTRACT

This invention computes by constructing a lattice of states. Every lattice of states corresponding to correct execution satisfies the temporal logic formula comprising a Definitive Axiomatic Notation for Concurrent Execution (DANCE) program. This invention integrates into a state-lattice computational model: a polymorphic, strong type system; visibility-limiting domains; first-order assertions; and a logic for proving program correctness. This invention includes special hardware means for elimination of cache coherency, among other things, necessary to construct state-lattices concurrently. The model of computation for the DANCE language consists of four interrelated logical systems describing state-lattices, types, domains, and assertions. A fifth logical system interrelates the other four systems, allowing proofs of program correctness. The method of the present invention teaches programs as temporal formulas satisfied by lattices of states corresponding to correct execution. State-lattices that are short and bushy allow application of many processors simultaneously, thus reducing execution time.

7 Claims, 41 Drawing Sheets

[Front-page drawing: processors and a memory controller exchanging requests and responses with disks, displays, ATM, etc.]

[Figure 1 (Prior Art): 101 initial state; 103 time; 102 final state]

[Figure 2 (Prior Art): 101 initial state(s); 103 time; 102 final state(s)]

[Figure 3 (Prior Art): processes waiting at barriers; 204 barrier]

[Figure 4: a = NIL, b = NIL]

[Figure 5 (Prior Art): 101/102; time]

[Figure 6]

[Figure 7]

[Figure 8: end(i) = end(k)]

[Figure 9]

[Figure 10: 101; 103; make w true on each subinterval]

[Figure 11: assignment x := e to be made true over this interval. At start(i), evaluate the expression e using values of variables in the first state of the subinterval; the value of x at the end of the subinterval becomes the value evaluated]

[Figure 12 (Prior Art): elements in queue]

[Figure 13 (Prior Art)]

[Figure 14 (Prior Art)]

[Figure 15: 101. unsorted employees = Johnson, Swenson, Anderson, Larson, Carlson, Jackson, Henderson, Peterson; sorted employees = nil, nil, nil, nil, nil, nil, nil, nil; number of employees = 8]

[Figure 16: 103. orig = Johnson, Swenson, Anderson, Larson, Carlson, Jackson, Henderson, Peterson; final = nil, nil, nil, nil, nil, nil, nil, nil; low = 1, high = 8]

[Figure 17: 101; 51; 54; 57; 102]

[Figure 18: 81; 82-107]

[Figure 19: 105]

[Figure 20: sorting example, first partition. 53 unsorted employees = Johnson, Swenson, Anderson, Larson, Carlson, Jackson, Henderson, Peterson; sorted employees = nil (x8); number of employees = 8; low-index = 1, high-index = 8; hold = Johnson, Carlson, Jackson, Anderson, Henderson, Larson, Peterson, Swenson; hold(1..5) = Carlson, Anderson, Henderson, Jackson, Johnson, k = 4; hold(7..8) = Peterson, Swenson, k = 7]

[Figure 21: 53 iter(1..5) = Carlson, Anderson, Henderson, Jackson, Johnson, pivot = 4; iter(7..8) = Peterson, Swenson, pivot = 7; 60 final(7) = Peterson; 44 final(5) = Johnson; 65 final(8) = Swenson; final(7..8) = Peterson, Swenson; 111 hold(1..3) = Anderson, Henderson, Carlson, k = 1; 53 iter(1..3) = Anderson, Henderson, Carlson, pivot = 1; 60 final(1) = Anderson]

[Figure 22: 111 hold(2..3) = Carlson, Henderson, k = 3; 53 iter(2..3) = Carlson, Henderson, pivot = 3; 54 low = 2, high = 2; 60 final(3) = Henderson; 44 final(2) = Carlson; 65 final(2..3) = Carlson, Henderson; 65 final(1..3) = Anderson, Carlson, Henderson; 65 final(1..5) = Anderson, Carlson, Henderson, Jackson, Johnson; 65 final(1..8) = Anderson, Carlson, Henderson, Jackson, Johnson, Larson, Peterson, Swenson; 102 sorted employees(1..8) = Anderson, Carlson, Henderson, Jackson, Johnson, Larson, Peterson, Swenson]

[Figure 22A]

[Figure 23: processors]

[Figure 24: memory controller routing requests and responses; to disks, displays, ATM, etc.]

[Figure 25: physical address to RAM address]

[Figure 26: low addresses]

[Figure 27: Main Memory Word: 17 bits | 3 bits | 64 bits]

[Figure 28: segmented address; User Segment Table; origin; range error; effective virtual address]

[Figure 29: effective virtual address; Page Table; origin; range error; physical address]

[Figure 30: free-list head array; pool]

[Figure 31: segment length; segment origin; security level; requirements; page size; OS-only; protected; read-only; global, spread, shared, or private]

[Figure 32: requirements; security level; page size (5 bits); OS-only; protected; read-only; global, spread, shared, or private (2 bits)]

[Figure 33: 30 program module link record; other module parameters; code entry points; indirect procedure calls]

[Figure 34]

[Figure 35: 40; 42; 44; 46; level]

[Figure 36: segment page mapping; mapping from segment; node; IP bus]

[Figure 37: 56, 54 segmented address (user segment, offset); interval check; range error; violation; 52 effective virtual address]

[Figure 38: segmented address; UST; range error; access violation; TLB miss; interval map table; effective virtual address]

[Figure 39: segmented address 54 (user segment, offset, instruction type/mode); TLB; range error; rights violation; effective virtual address; page size; physical address]

[Figure 40: 60, 62 cache line format. user segment (16 bits); offset; cache line (276 bits) holding data words 3-0 (64 bits each)]

[Figure 41: user segment; offset]

[Figure 42: (A, float); (X, float); (Y, bool); (w, ...)]

[Figure 43]

LIST OF REFERENCE NUMERALS

numeral                              in figure(s)
101  initial state                   1, 2
102  final state                     1, 2
103  intermediate state              1, 2
201  process                         3
202  active computation              3
203  process waiting                 3
204  barrier synchronization        3
301  a and b have no value           4
302  a becomes ten                   4
303  b becomes twenty                4
304  a becomes one hundred           4
305  synchronization point           4
306  b becomes one-hundred-twenty    4
307  state transition                5
310  state s                         6
314  state w                         6
315  state x                         6
316  transition (s,t)                6
317  transition (s,v)                6
     transition (t,f)                6
Figure 44

DANCE/MULTITUDE CONCURRENT COMPUTATION

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a method for the execution of plural computer programs being executed by a multiplicity of processors in a parallel configuration, and more specifically to performing multiple state transitions simultaneously.

2. Discussion of the Prior Art

New Parallel Language Needed

I surveyed ways to program parallel machines. Despite contentions to the contrary, I do not consider SIMD (single-instruction, multiple-datastream) machines like the Connection Machine (CM-2) to be parallel processors. SIMD machines are restricted to performing the same operation in many cells and thus can perform only a narrow range of applications. Most applications require much data-dependent branching that degrades performance on SIMD machines, such as the Cray YMP-8 and CM-2 alike. FORTRAN, with a vectorizing compiler, is appropriate for SIMD machines. Every attempt to wring efficient, massive parallelism from conventional FORTRAN programs has failed. Only programs expressed with a language embodying an inherently parallel model of computation can execute efficiently in parallel.

The challenge to those wishing to exact optimum performance from highly parallel processors is to coordinate activities in processing nodes that are doing different things. Modifications to special-purpose languages like LISP and PROLOG to incorporate futures and guards, respectively, are problematic, addressing restricted application domains of largely academic interest.

Two mechanisms allow communication between sequential processes: message passing or shared memory. Load balancing and problem partitioning have produced many academic papers for either message-passing or shared-memory paradigms, nearly all yielding poor performance.

Mutual exclusion (MutEx) via message passing (i.e. for distributed systems, no shared state) is horribly inefficient. If the application needs MutEx often, performance will suffer. See "An Optimal Algorithm for Mutual Exclusion in Computer Networks," Glenn Ricart & Ashok K. Agrawala, Communications of the ACM, Vol. 24, No. 1, January 1981, for a discussion of a prior-art O(N) request/reply algorithm, which really needs O(N) messages when everyone wants the lock. "A √N Algorithm for Mutual Exclusion in Decentralized Systems," Mamoru Maekawa, ACM Transactions on Computer Systems, Vol. 3, No. 2, May 1985, reduces messages to O(√N) with an elegant multidimensional geometry. Sanders generalized MutEx algorithms in more convoluted terms in her article, "The Information Structure of Distributed Mutual Exclusion Algorithms," Beverly A. Sanders, ACM Transactions on Computer Systems, Vol. 5, No. 3, August 1987. She considers deadlock freedom (absence of the possibility that computation ceases because every process is waiting to acquire a lock held by a different process), but the basic model which she proposes needs further augmentation to detect and recover from deadlock. Of course, only a wordy explanation substitutes for a correctness proof. Therefore the desired interference freedom provided by MutEx is not achieved. Raymond, in "A Tree-Based Algorithm for Distributed Mutual Exclusion," ACM Transactions on Computer Systems, Vol. 7, No. 1, February 1989, further reduced the number of messages to O(logN) by increasing the latency, which could easily become a significant burden if processes need the lock briefly.
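For a rough sense of scale, the per-lock-acquisition message counts cited above can be tabulated. The closed forms below (2(N-1) for the Ricart-Agrawala request/reply scheme, about 3√N for Maekawa's quorums, about log N for Raymond's tree) are the headline figures from the cited papers; the function names and the constant 3 for Maekawa's quorum traffic are my own illustrative choices, not anything specified in this patent.

```python
import math

# Approximate messages needed per critical-section entry, per the cited papers.
# Constants vary with load; these are the papers' headline figures.

def ricart_agrawala(n: int) -> int:
    # N-1 request messages plus N-1 reply messages: O(N).
    return 2 * (n - 1)

def maekawa(n: int) -> int:
    # Quorums built on a sqrt(N) geometry: roughly 3*sqrt(N)
    # messages (request, grant, release).
    return 3 * math.isqrt(n)

def raymond(n: int) -> int:
    # Token forwarded along a spanning tree: O(log N) messages.
    return max(1, math.ceil(math.log2(n)))

for n in (16, 256, 4096):
    print(n, ricart_agrawala(n), maekawa(n), raymond(n))
```

The gap widens quickly: at N = 4096 the tree algorithm needs about a dozen messages where the request/reply scheme needs thousands, which is the latency-versus-traffic trade-off the text describes.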
Then there are the many extensions of sequential languages. Such extensions fall into two classes: "processes" and "doall". Ada is perhaps the best example of a process-based language. Each Ada task (i.e. process) represents a single thread of control that cooperates with other tasks through message passing. Communicating Sequential Processes, described in "Communicating Sequential Processes," C. A. R. Hoare, Communications of the ACM, Vol. 21, No. 8, August 1978, forms the underlying model of computation for message-passing, process-based languages. However, deciding whether an arbitrary collection of processes is deadlock-free is undecidable, and proving that a particular collection of processes is deadlock-free is onerous. Furthermore, such pairwise communication and synchronization is unnecessarily restrictive, thus limiting the degree of parallelism. Finally, a programmer writes (essentially) a plurality of sequential programs rather than a single, parallel program. The "doall" is just a FORTRAN do-loop in which all the iterations are instead performed concurrently; again a collection of sequential programs.

Controlling access to shared variables is so difficult that designers have resorted to the message-passing model, in which all state variables are local to some process and processes interact exclusively via exchange of messages. Inevitably, two different kinds of system-wide operations, barrier synchronization and remote state acquisition, are required by the algorithm or application. They are notoriously inefficient. Barrier synchronization ensures that all the parallel computations in one phase are completed before the next phase is begun. Remote state acquisition occurs when the application's data cannot be partitioned disjointly so that all processes use values exclusively from its own partition held in local memory. For example, advanced magnetic resonance imaging (MRI) research seeks to analyze brain scans and call to the attention of the radiologist any anomalies. A three-dimensional fast Fourier transform (FFT) is necessary for filtering and extraction of interesting features.

Current, Conventional, Concurrent Computing
Traditional, sequential computing causes a sequence of states to be created so that the last state holds the data desired. FIG. 1 illustrates sequential computing starting at an initial state (101), proceeding through intermediate states (103), and terminating in a final state (102). The values of program variables at a particular instant of time comprise a state, depicted as circles. In state diagrams, time always flows down the page; that is, the state at the top of the figure occurs first, followed by the state(s) connected by a line. Broken lines indicate repetition or "more of the same."

In "parallel" execution several sequences of state transitions occur concurrently. FIG. 2 depicts parallel execution in doall style. The state diagram shows state transitions occurring as "parallel" sequences. The conception of parallel processing as a collection of sequential executions characterizes the prior art that this invention supersedes.

In the MRI example above, the data consists of a 1k by 1k by 1k array of floating-point intensities. The first phase of the algorithm maps nicely by putting a 1k by 1k plane in each of 1k processors. However, integrating the third dimension finds the data in the worst possible distribution: every computation needs values that are possessed by each of the 1k processors.

Barriers are an artifact of the popular yet problematic process paradigm. A static collection of sequential processes alternately work independently and wait for laggards. FIG. 3 depicts four processes (201) synchronized with barriers (204). Each process is actively computing (202) and then waits idly (203) until all other processes reach the barrier. The best-known algorithms for barrier synchronization take O(logP) time (with a large constant) for both message-passing and shared-memory models. This corresponds with the depth of a tree using either messages or flags.

A large body of academic work has been produced for both the message-passing and the parallel random-access machine (PRAM) models, i.e. shared memory. Barrier synchronization in a message-passing model requires O(logP) time, where P is the number of processors, with a big constant proportional to message transmission latency. See "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," John M. Mellor-Crummey and Michael L. Scott, ACM Transactions on Computer Systems, Vol. 9, No. 1, February 1991.

Consider an ideally parallelizable application that requires synchronization. A single processor would require N steps (time, Ts) to perform the application, with L iterations of a single loop each requiring N/L steps. When parallelized, barrier synchronization is required at the end of each iteration, taking logP steps. (Real systems take many steps for each level of the synchronization tree.) So the parallel time for each iteration, Titer, is N/(L*P)+logP, and the whole application, Tpar, is Titer*L or N/P+L*logP steps. Speedup, S, is the ratio of sequential time to parallel time, Ts/Tpar. Efficiency, E, is the speedup obtained divided by the number of processors used, S/P. In this case, E = Ts/(Tpar*P) = N/(N+P*L*logP).

    Total work to be done (sequential)    Ts = N steps
    Number of loop iterations             L
    Sequential time per iteration         N/L
    Parallel time per iteration           Titer = N/(L*P) + logP
    Total parallel time                   Tpar = N/P + L*logP

When the number of processors is small, N >> P*L*logP, the efficiency can be quite good, close to 1. But as more processors are used (N ≈ P) the barrier-synchronization overhead begins to dominate, even when only a single barrier is needed.

Therefore, for an application needing barrier synchronization on any conceivable message-passing architecture (N ≤ P), the minimum time of execution will be O(logP), the best speedup possible is O(P/logP), and the efficiency can be no better than O(1/logP). Little wonder that efficiencies of even 10% are hard to achieve on parallel processors like CM-2, CM-5, NCube, Paragon, and T3D. They can perform no better than their theoretical model.

    Order parallel time    Tpar = O(logP)
    Order speedup          S = O(P/logP)
    Order efficiency       E = O(1/logP)

Parallel random-access machines (PRAMs) suffer a similar fate. When most people think of "shared memory" they imagine a PRAM: a bunch of sequential processors that can read and write the same memory locations. Many PRAM variations exist. Whether the memory locations can be accessed concurrently or exclusively, and what happens for concurrent accesses, constitute most of the PRAM model differences. Even unrealistic operations like storing the median or geometric mean of all concurrent writes still inflict the same O(logP) penalty for barrier synchronization as message-passing.

Sequential algorithms have been traditionally analyzed using Turing machines. Both the time (number of steps) and the space (number of tape squares) are used to measure the complexity of a computation. Additionally, whether the Turing machine has a unique next action (deterministic) or a set of possible next actions (nondeterministic) substantially affects the complexity measure of a computation.

The following paragraphs summarize three chapters of a recent book on parallel algorithms entitled "Synthesis of Parallel Algorithms", John H. Reif, editor, Morgan Kaufmann, 1993, that I believe is state-of-the-art computer-science theory for shared-memory computers. The following sections quote heavily from those authors (Faith E. Fich, Raymond Greenlaw, and Phillip B. Gibbons) and will be indented to indicate quotation.

PRAMs attempt to model the computational behavior of a shared-memory parallel processor. The alternative is shared-nothing, message-passing parallel processors, which are popular these days. PRAMs have many variations and have a large body of published scholarship. Fich, Greenlaw, and Gibbons cover the main kinds of PRAMs, which I deem represent the conventional wisdom about how to think about parallel computation. This paradigm has, so far, not yielded efficient, general-purpose parallel processing; thus the motive for the present invention.

Fich's chapter begins with a concise definition of a "synchronous" PRAM.

    "The PRAM is a synchronous model of parallel computation in which processors communicate via shared memory. It consists of m shared memory cells M1, ..., Mm and p processors P1, ..., Pp. Each processor is a random access machine (RAM) with a private local memory. During every step of a computation a processor may read from one shared memory cell, perform a local operation, and write to one memory cell. Reads, local operations, and writes are viewed as occurring during three separate phases. This simplifies analysis and only changes the running time of an algorithm by a constant factor." (Underline mine.)

Throughout Fich's paper she assumes a synchronous PRAM: single-instruction, multiple-datastream (SIMD). This allows her to ignore synchronization costs and hassles in her analyses. The synchronous PRAM cannot model MIMD machines with their independent, data-dependent branching and uncertain timing. Having all processors read-execute-write each step sidesteps sticky issues like what happens when a memory cell is read and written simultaneously. Furthermore, the execute part of a step can perform any calculation whatsoever! So the lower bounds derived for various algorithms concentrate solely on shared-memory accesses while totally ignoring the real computation, the data transformation in the execute part of the step.

Fich describes a "forking" PRAM in which new processors are created with a fork operation (as in Unix). A forking PRAM can dynamically describe parallel activities at run time. (Regular PRAMs have a fixed set of processes/processors.) But a forking PRAM still is a collection of sequential execution streams, not a single, parallel program.
    "A forking PRAM is a PRAM in which a new processor is created when an existing processor executes a fork operation. In addition to creating the new processor, this operation specifies the task that the new processor is to perform (starting at the next time step)."

After defining a forking PRAM, Fich never mentions it again. But she does make claims for PRAMs about the "programmer's view," "ignoring lower level architectural details such as ... synchronization," and "performance of algorithms on the PRAM can be a good predictor of their relative performance on real machines," that I completely disagree with. Since PRAMs exemplify the process paradigm (a collection of sequential executions) they may adequately predict performance of algorithms and architectures similarly afflicted (all machines built so far), but certainly not what's possible.

Most distinctions between PRAMs involve memory-access restrictions. The most restrictive is called exclusive-read exclusive-write (EREW). Ensuring that two processors don't write to the same memory cell simultaneously is trivial with a synchronous PRAM, since all operations are under central control. The other four memory-access restrictions are: concurrent-read exclusive-write (CREW), COMMON, ARBITRARY, and PRIORITY. The last three restrictions differ in how concurrent writes are handled. With COMMON, values written at the same time to the same cell must have the same value. With ARBITRARY, any value written in the same step may be stored in the memory cell. With PRIORITY, the value written by the highest-priority process (lowest process number) is stored in the memory cell. Obviously an EREW program would run just fine on a CREW machine; the concurrent-read hardware would not be used. The five memory restrictions Fich uses nest nicely in a power hierarchy:

    PRIORITY ⊇ ARBITRARY ⊇ COMMON ⊇ CREW ⊇ EREW

Fich then spends much of her chapter showing how many steps it takes one PRAM to simulate a single step of another. Proving that EREW can simulate a step of PRIORITY using O(log p) steps and p*m memory cells takes Fich two pages of wordy prose. Essentially the simulation takes three phases:

Phase 1: Each processor that wants to write marks its dedicated cell from the p EREW cells used for each PRIORITY cell.
Phase 2: By evaluating (in parallel) a binary tree of all processors that want to write, the highest-priority processor is found.
Phase 3: The winning processor writes the "real" memory cell.

Since the height of the binary tree is O(log p), it takes O(log p) steps for an EREW PRAM to emulate a PRIORITY PRAM. Much of Fich's chapter is devoted to convoluted schemes by which one model simulates another, without mentioning any applications that benefit from a more powerful and expensive model.

Gibbons finally provides a reasonable PRAM model in the very last chapter of this 1000-page book. None of the other authors constrain themselves to reasonable models, thus limiting the value of their work. In Gibbons' PRAM model:

    "Existing MIMD machines are asynchronous, i.e., the processors are not constrained to operate in lock-step. Each processor can proceed through its program at its own speed, constrained by the progress of other processors only at explicit synchronization points. However in the (synchronous) PRAM model, all processors execute a PRAM program in lock-step with one another. Thus in order to safely execute a PRAM program on an asynchronous machine, there must be a synchronization point after each PRAM instruction. Synchronizing after each instruction is inherently inefficient since the ability of the machine to run asynchronously is not fully exploited and there is a (potentially large) overhead in performing the synchronization. Therefore our first modification to the PRAM model will be to permit the processors to run asynchronously and then charge for any needed synchronization." (Underline mine.)

Additionally, shared-memory accesses are charged more than local accesses:

    "To keep the model simple we will use a single parameter, d, to quantify the communication delay to memory. This parameter is intended to capture the ratio of the median time for global memory accesses to the median time for a local operation."

I invented the Layered class of multistage interconnection networks (U.S. Pat. No. 4,833,468, LAYERED NETWORK, Larson et al., May 23, 1989) specifically to keep d small. For the Multitude architecture (interconnected by a Layered network) the ratio of time for global memory accesses to local memory accesses is about 3 to 4. This compares with a ratio of local-memory access time to cache-hit time of 20 to 30.

Gibbons' asynchronous PRAM model uses four types of instructions: global read, global write, local operation, and synchronization step. The cost measures are:

    Instruction                  Cost
    local operation              1
    global read or write         d
    k global reads or writes     d + k - 1
    synchronization barrier      B

The parameter B = B(p), the time to synchronize all the processors, is a nondecreasing function of the number of processors, p, used by the program. In the Asynchronous PRAM model, the parameters are assumed to obey the following constraints: 2 ≤ d ≤ B ≤ p. However, a reasonable assumption for modeling most machines is that B(p) ∈ Θ(d log p) or B(p) ∈ Θ(d log p / log d). Here again is evidence of a logP performance hit for barrier synchronization.

A synchronous EREW PRAM can be directly adapted to run on an asynchronous PRAM by inserting two synchronization barriers, one after the read phase and one after the write phase of the synchronous PRAM time step. Thus a single step of a synchronous PRAM can be simulated in 2B+2d+1 time on an asynchronous PRAM. Therefore a synchronous EREW PRAM algorithm running in t time using p processors will take t(2B+2d+1) time on an asynchronous PRAM, a significant penalty. Gibbons shows how to maintain the O(Bt) running time but use only p/B processors by bunching the operations of B synchronous processors into a single asynchronous processor, thus getting more work done for each barrier synchronization.

Gibbons transforms an EREW algorithm for the venerable "all prefix sums" operation for an asynchronous PRAM, yielding an O(B log n / log B)-time algorithm, which isn't too bad. Gibbons presents asynchronous versions of descaling lemmas (performing the same application with fewer processors) which tack on a cost for the barrier synchronizations. Still, he uses many barriers in order to obtain behavior similar to synchronous PRAMs. Only when applications are synchronized just when they need it, and those needed synchronizations are really fast (B ≈ d ≈ 1), will scaled-up parallel processing be possible, which is the contribution of this invention and Layered networks.

Gibbons considers several algorithms that run in O(B log n / log B) time. In particular, he proves an interesting theorem about summing numbers: given n numbers stored one per global memory location, and the following four types of instructions: L := G, L := L + L, G := L, and "barrier," where L is a local cell and G is a global cell, the sum of n numbers on an Asynchronous PRAM with this instruction set requires Ω(B log n / log B) time regardless of the number of processors. This is interesting because Multitude can sum n integers in O(1) time! Gibbons concludes with "subset synchronization" in which only a subset of processors in the machine need synchronize.
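Phase 2 of the EREW-simulates-PRIORITY scheme described earlier is just a binary tournament: in each of ceil(log2 p) rounds, surviving candidates are compared pairwise and the lower-numbered (higher-priority) writer advances. The sketch below is my own illustration with hypothetical names, not code from Fich's chapter; it confirms the O(log p) round count.

```python
def priority_write(writers):
    """Simulate one PRIORITY concurrent write in EREW style.

    `writers` maps processor number -> value it wants to write.
    Phase 1: each writer marks a dedicated leaf cell.
    Phase 2: a binary tournament over the marked leaves finds the
             lowest-numbered (highest-priority) writer, pairwise
             and exclusively, in ceil(log2 p) rounds.
    Phase 3: the winner writes the real cell.
    """
    if not writers:
        return None
    candidates = sorted(writers)            # leaves marked in Phase 1
    rounds = 0
    while len(candidates) > 1:              # Phase 2: tournament
        nxt = []
        for i in range(0, len(candidates), 2):
            pair = candidates[i:i + 2]
            nxt.append(min(pair))           # lower number = higher priority
        candidates = nxt
        rounds += 1
    winner = candidates[0]
    return writers[winner], winner, rounds  # Phase 3: value actually stored

value, winner, rounds = priority_write({5: 'e', 2: 'b', 7: 'g', 3: 'c'})
print(value, winner, rounds)
```

With four contending writers the tournament takes two rounds, and in general the round count is the height of the binary tree, matching the O(log p) bound in the text.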
Since the as an abbreviation for {xxe-A and P}. barriers involve fewer processors, and thus are shorter, Some Sets have Standard names Ø denotes the , Summing numbers runs marginally faster. 15 Z denotes the Set of all integers, and No denotes the Set of So the best PRAM model I could find charges PRAM all natural numbers including Zero. algorithms for their “cheating,” but still offers no insights In a set one does not distinguish repetitions of elements. about how to efficiently compute on an MIMD machine. Thus T, F} and {T, T, F} are the same set. Similarly, the Temporal Logic order of elements is irrelevant. In general, two Sets A and B MoSzkowski in his Executing Temporal Logic Programs are equal if-and-only-if the have the same elements, in Cambridge University Press, 1986 presented a temporal symbols: A=B. logic programming language. His language, Tempura, Let A and B be sets. Then ACB (and BDA) denotes that allows description of parallel execution (unsatisfactorily) A is a Subset of B, A?hB denotes the intersection of A and B, within a Single program expression. The foundation for AUB denotes the union of A and B, and A-B denotes the set Tempura is interval temporal logic. A Tempura interval is a 25 difference of A and B. Symbolically, non-empty Sequence of States. A State is determined by the values of program variables at a particular instant of time. The Sequence of States that form an interval provide a discrete notion of time. The semantics of Tempura were much different than any I had previously encountered. Instead of programs being a Step-by-step recipe for how to do the computation, they were logical formulas that are “satisfied” by constructing a “model” that makes the formula “true.” Unfortunately the Note that A=B if both ACB and BCA. A and B are disjoint models constructed are simply Sequences of States-the 35 if they have no element in common that is A?h B=3. process model underlying PRAMs. 
The definitions of interSection and union can be general The semantics of what is referred to herein as DANCE ized to more than two sets. Let A be a set for every element programs, although also temporal logic formula based, are i of Some other set J. Then Satisfied by constructing models with a different and Supe rior mathematical Structure that allows efficient execution on 40 a plurality of computational engines. It is this mathematical structure that forms the core of this invention. For a finite Set A, card(A) denotes the cardinality, or the BACKGROUND-MATHEMATICAL number of elements of A. For a non-empty, finite Set A FOUNDATIONS 45 CZ, min(A) denotes the minimum of all integers in A. Tuples In this Section of the Specification, we explain the notation In Sets the repetition of elements and their order is used throughout this document and Some necessary elemen irrelevant. If ordering matters, we use ordered pairs and tary notions from . Although these nota tuples. For elements a and b, not necessarily distinct, (a,b) is tions are fairly Standard, this presentation is adapted from 50 an ordered pair or simply pair. Then a and b are called Apt and Olderog's book Verification of Sequential and components of (a,b). Two pairs (a,b) and (c,d) are identical Concurrent Programs, Krzysztof R. Apt, and Ernst-Ruidiger (a,b)=(c,d), if-and-only-if a=c and b=d. Olderog, Springer-Verlag, 1991). More generally, let n be any natural number, neNo. Then Sets if a1, . . . , an are any n elements then (a1, . . . , an) is an It is assumed that the reader is familiar with the notion of 55 n-tuple. The element ai, where i{1,..., n} is called the i-th a Set, a collection of elements. Finite Sets may be specified component of (a1, ..., an). An n-tuple (a1, ..., an) is equal by enumerating their elements between curly brackets. For to an m-tuple (b1, ..., bm) if-and-only-if n=m and ai=bi for example {T, F} denotes the set consisting of the Boolean all i{1,..., n}. 
More generally, let n be any natural number, constants T (true) and F (). When enumerating the neNo. Then if a1, . . . , a, are any n elements, then (a1, . . elements of a Set, I Sometimes use ". . . . as a notation. For 60 ., a) is an n-tuple. The element a, where ie1, ..., n} is example, {1,..., n} denotes the set consisting of the natural called the i-th component of (a1, . . . , a,). An n-tuple (a1, . numbers from 1 to n where the upper bound, n, is a natural .., a,) is equal to an m-tuple (b1, ...,b) if-and-only-if n=m number that is not further specified. and a-b; for allie1, ..., n}. Note that 2-tuples are pairs. More generally, Sets are specified by referring to Some Additionally, a 0-tuple is written as (), and a 1-tuple as (a) property of their elements: 65 for any element a. The Cartesian product AxB of sets A and B consists of all pairs (a,b) with aeA and be B. The n-fold Cartesian product 5,867,649 9 10 AX . . . XA, of Sets A1, ..., A consists of all n-tuples (a, Some functions, however, we use postfix notation and write ..., a,) with aeA, for ie. 1, ..., n}. If all A are the same af=b. In both cases b is called the value off applied to the Set A, then the n-fold Cartesian product AX . . . XA is also argument a. To indicate that f is a function from A to B we written A'. write Relations 5 Abinary relation R between sets A and B is a Subset of the Cartesian product AxB, that is, RCAxB. If A=B then R is The set A is called the domain of f and the set B is the called a relation on A. For example, co-domain of f. Consider a function f:A->B and some set XCA. Then the restriction off to X is denoted by fX and defined as the is a between {a,b,c) and {1,2}. More Intersection off (which is a subset of AxB) with XxB: generally, for any natural number n an n-ary relation R between A, . . . , A, is a Subset of the n-fold Cartesian product A X ... XA, that is, RCA X ... XA. 
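These set-theoretic notions are directly executable. The following Python sketch (the variable names are mine, for illustration only; it is not part of the DANCE notation) models Cartesian products and binary relations as sets of tuples:

```python
from itertools import product

# Cartesian product A x B: all pairs (a, b) with a in A and b in B.
A = {"a", "b", "c"}
B = {1, 2}
AxB = set(product(A, B))
assert len(AxB) == len(A) * len(B)   # card(A x B) = card(A) * card(B)

# A binary relation R between A and B is any subset of A x B.
R = {("a", 1), ("b", 2), ("c", 2)}
assert R <= AxB                      # R is a relation between A and B

# The n-fold Cartesian product A^n consists of n-tuples over A.
A3 = set(product(A, repeat=3))
assert len(A3) == len(A) ** 3

# Set comprehension mirrors the {x in A | P(x)} notation.
evens = {x for x in range(10) if x % 2 == 0}
assert evens == {0, 2, 4, 6, 8}
```

Python's built-in `set` already treats repetitions and ordering as irrelevant, matching the definitions above.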
Note that 2-ary relations are the same as binary relations. Instead of 1-ary and 3-ary relations one usually uses unary and ternary instead.

Consider a relation R on a set A. R is called reflexive if (a,a)∈R for all a∈A; it is called irreflexive if (a,a)∉R for all a∈A. R is called symmetric if for all a,b∈A whenever (a,b)∈R then also (b,a)∈R; it is called antisymmetric if for all a,b∈A whenever (a,b)∈R and (b,a)∈R, then a=b. R is called transitive if for all a,b,c∈A whenever (a,b)∈R and (b,c)∈R, then also (a,c)∈R.

The transitive, reflexive closure R* of a relation R on a set A is the smallest transitive and reflexive relation on A that contains R as a subset. The transitive, irreflexive closure R+ of a relation R on a set A is the smallest transitive and irreflexive relation on A that contains R as a subset. The transitive, irreflexive closure is necessary because the graphs considered later do not have self-loops. The "plus" superscript may not be standard.

The relational composition R1∘R2 of relations R1 and R2 on a set A is defined as follows:

R1∘R2 = {(a,c) | there exists b∈A with (a,b)∈R1 and (b,c)∈R2}.

For any natural number n, the n-fold relational composition Rn of a relation R on a set A is defined inductively as follows:

R0 = {(a,a) | a∈A},
Rn+1 = Rn∘R.

Note that

R* = ∪(n≥0) Rn

and that

R+ = ∪(n≥1) Rn.

Membership of pairs in a binary relation R is mostly written in infix notation, so instead of (a,b)∈R one usually writes aRb. Any binary relation R⊆A×B has an inverse R−1⊆B×A defined as:

bR−1a iff aRb.

Functions

Let A and B be sets. A function or mapping from A to B is a binary relation f between A and B with the following special property: for each element a∈A there is exactly one element b∈B with afb. Mostly we use prefix notation for function application and write f(a)=b instead of afb. For some functions, however, we use postfix notation and write af=b. In both cases b is called the value of f applied to the argument a. To indicate that f is a function from A to B we write

f: A→B.

The set A is called the domain of f and the set B is the co-domain of f.

Consider a function f: A→B and some set X⊆A. Then the restriction of f to X is denoted by f|X and defined as the intersection of f (which is a subset of A×B) with X×B:

f|X = f ∩ (X×B).

We are sometimes interested in functions with special properties. A function f: A→B is called one-to-one or injective if f(a1)≠f(a2) for any two distinct elements a1, a2∈A; it is called onto or surjective if for every element b∈B there exists an element a∈A with f(a)=b; it is called bijective if it is both injective and surjective.

Consider a function whose domain is a Cartesian product, say f: A1× ... ×An→B. Then it is customary to drop one pair of parentheses when applying f to an element (a1, ..., an)∈A1× ... ×An. We also say f is an n-ary function.

Consider a function whose domain and co-domain coincide, say f: A→A. An element a∈A is called a fixed point of f if f(a)=a.
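For finite relations, the composition and closure operations defined above are directly computable. The following Python sketch (the helper names are mine, not the specification's) computes relational composition and the two closures; for the acyclic edge relations used later, R+ contains no pairs (a,a):

```python
def compose(r1, r2):
    """Relational composition R1∘R2: pairs (a, c) such that
    (a, b) is in r1 and (b, c) is in r2 for some b."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def transitive_closure(r):
    """R+ : the union of R, R∘R, R∘R∘R, ... (irreflexive when R is acyclic)."""
    closure, power = set(r), set(r)
    while True:
        power = compose(power, r)    # the next n-fold composition R^n
        if power <= closure:         # no new pairs appear: done
            return closure
        closure |= power

def transitive_reflexive_closure(r, universe):
    """R* : R+ together with the identity relation on the universe."""
    return transitive_closure(r) | {(a, a) for a in universe}

E = {(1, 2), (2, 3)}
assert compose(E, E) == {(1, 3)}
assert transitive_closure(E) == {(1, 2), (2, 3), (1, 3)}
assert (3, 3) in transitive_reflexive_closure(E, {1, 2, 3})
```

The loop terminates because once R^n adds no new pairs, no higher power can either.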
Sequences

In the following let A be a set. A sequence of elements from A of length n≥0 is a function f: {1, ..., n}→A. We write a sequence f by listing the values of f, without punctuation, in the order of ascending arguments, that is, as

a1 ... an

where a1=f(1), ..., an=f(n). Then ai, i∈{1, ..., n}, is referred to as the i-th element in the sequence a1 ... an. A finite sequence is a sequence of any length n≥0. A sequence of length 0 is called the empty sequence and is denoted by ε.

We also allow countably infinite sequences. An infinite sequence of elements from A is a function s: N0→A. To exhibit the general form of an infinite sequence s we typically write

s: a0 a1 a2 ...

if ai=s(i) for all i∈N0. Then i is called the index of the element ai.

Consider now relations R1, R2, ... on a set A. For any finite sequence a0, ..., an of elements from A with a0 R1 a1, a1 R2 a2, ..., an−1 Rn an, we write a finite chain

a0 R1 a1 R2 a2 ... an−1 Rn an.

For example, using the relations = and < on Z, we may write the finite chain

0 < 1 = 1 < 2.

We apply this notion also to infinite sequences. Thus for any infinite sequence a0 a1 a2 ... of elements of A with a0 R1 a1, a1 R2 a2, a2 R3 a3, ..., we write an infinite chain

a0 R1 a1 R2 a2 R3 a3 ...

Such a chain stabilizes if from some index n onwards all elements ai with i≥n are identical:

a0 R1 a1 R2 a2 ... an−1 Rn an = an+1 = an+2 ...

Apt and Olderog use chains to describe the computations of programs. This invention does not. Chains represent the sequential paradigm of computation that permeates computer science. It is the fundamental limiting factor for both the kind of programs Apt and Olderog can handle, and the efficiency with which those programs can be executed.

Strings

A set of symbols is often called an alphabet. A string over an alphabet A is a finite sequence of symbols from A. For example, 1+2 is a string over the alphabet {1,2,+}. The syntactic objects considered in this treatise are strings. We shall introduce several classes of strings: expressions, assertions, temporal predicates, programs, and correctness formulas.

We use ≡ for the syntactic identity of strings. For example, 1+2≡1+2 but not 1+2≡2+1. The = will be used for the semantic equality of objects. Thus 1+2=2+1. The concatenation of strings s1 and s2 yields the string s1s2 formed by first writing s1 and then s2 without intervening spaces. A string t is called a substring of a string s if there exist strings s1 and s2 such that s=s1ts2. Since s1 and s2 may be empty, s is always a substring of itself.
The remove it from E., Substitute V, for 1 and V for u in V2 and Syntactic objects considered in this treatise are Strings. We 15 E. and form the union of the vertices and edges, (VUV, shall introduce Several classes of Strings: expressions, EUE). We assert, without proof, that sequential and inser tion combination of lattices results in a lattice. assertions, temporal predicates, programs, and correctness Proofs formulas. A mathematical proof is a Sequence of Statements formu We use for the Syntactic identity of Strings. For las or expressions, each of which is an axion of the logical example 1+2=1+2 but not 1+2=2+1. The = will be System used, a given assumption, or a theorem derived by an used for the Semantic of objects. Thus 1+2=2+1. inference rule from Statements earlier in the Sequence. The The concatenation of Strings S and S yields the String last theorem is usually the desired truth to be established SS formed by first writing S and then S without interven Such as the program meets its Specifications and therefore is ing Spaces. A String t is called a SubString of a String S if there correct. We will exhibit proofs in the following form: exist StringSS and S. Such that S=Sts. Since S and S may 25 be empty, S is always a SubString of itself. Partial Orders (1) theorem 1 A partial order is a pair (A, C) consisting of a set A and -> explanation why theorem 2 is derived from theorem 1 reflexive, antisymmetric, and transitive relation C on A. If (2) theorem 2 Xy for Some X,y6A, we say that X is less than or equal to EX y or y is greater than or equal to X. Since C is reflexive, XCX. Sometimes we consider irreflexive partial orders. These are s pairs (A, C) consisting of a set A and an irreflexive and (n-1) theorem n-1 transitive relation C on A. -> explanation why theorem n is derived from theorems 1 to n-1 Consider now a partial order (A, C). Let aeA and X 35 (n) theorem in CA. Then a is called the least element of X if aeX and a DX for all X6X. 
The element a is called an upper bound of X if Sometimes the parenthetical theorem numbers or obvious Xa for all x6X. Note that upper bound of X need not be explanations may be omitted. The symbol iff will be used for elements of X. Let U be the set of all upper bounds of X. 40 if-and-only-if. The symbol will be used for there exists. Then a is called the least upper bound of X if a is the least The symbol W will be used for for all. element of U. Similarly for greatest lower bound. Induction A partial order is called complete if A contains a least Proofs frequently rely on induction. Usually this involves element and if for every ascending chain proving Some property P that has a natural number 45 parameter, say n. To prove that P holds for all neNo, it a -a at . . . Suffices to proceed by induction on n, organizing the proof of elements from A the set as follows: induction basis: Prove that P holds for n=0. {ao a1, a2, ... } Induction step: Prove that P holds for n+1 from the has a least upper bound. 50 induction hypothesis that P holds for n. Graphs The induction principle for natural numbers is based on the fact that the natural numbers can be constructed by A graph is a pair (V, E) where V is a finite set of Vertices beginning with the number 0 and repeatedly adding 1. By allowing more general construction methods, one obtains 55 the principle of Structural induction, enabling the use of and E is a finite Set of edges where each edge is a pair of more than one case at the induction basis and the induction Vertices Step. GrammarS To facilitate parsing as well as defining attributes to be 60 synthesized or inherited, the grammar for DANCE has been All graphs we consider are directed in that the order of written in Larson normal form or LNF. Vertices within the pair describing an edge is significant. LNF production rules: The set of edges E forms a relation on the set of vertices: Non-terminals are enclosed in angle brackets: . 
A production consists of a single non-terminal followed by ::= and then an expression in LNF describing what can be substituted for the non-terminal. Production expressions have only three forms:

1) A single terminal.
3) A single nonterminal.

Literals within ""
One or more: +
Zero or more: *
Choice: |
Grouping
Zero or one: ()

<function> is just an identifier grammatically, but it is helpful to know when a "function" name is expected; there must be a function of that name declared somewhere.

Data, States, and Meaning

A model is a triple (T, S, I) containing a set of data types T, a set of domains containing states S, and an interpretation I giving meaning to every symbol used in the language. For now, consider the data type domain, T, to be the integers. A state is a function mapping variables to values in T. For a state s in S and a variable x, let I(s)(x) denote x's value in state s. Each k-place function symbol f has an interpretation I(f), which is a function mapping k elements in T to a single value:

I(f): Tk → T

What this means is that every function symbol (like "plus") has a "meaning" (like I(plus)) that is a mapping from zero or more numbers (in this case two) to a single number.

I(plus): T×T → T

The line above doesn't say how "plus" works, just that it takes two numbers and returns one. Meanings aren't rigorously defined in this application for common arithmetic functions. Rather than develop axioms for number theory, the meanings of function symbols rely on intuition.

Moszkowski uses temporal predicates to effect side effects (cause state change). This may cause confusion, since for correctness proofs we need assertions made with first-order predicate calculus. Keep the distinction clear between the text that represents executable program (temporal predicates) and text used to assert some property of state variables (assertions). Interpretations of predicate symbols are similar but map to truth values:

I(p): Tk → {true, false}

Commands in the DANCE language are temporal predicates. Program execution tries to construct a model that makes all the temporal predicates "true." A model is a lattice of states that satisfies the temporal formula represented by the program text.
Several objects and advantages of my invention, the model of computation devised for the Definitive, Axiomatic Notation for Concurrent Execution (DANCE) computer programming language, are:

hundreds, thousands, or more processors may be efficiently used;

program correctness may be proved;

special data structures may be created that allow many elements within the data structure to be simultaneously accessed without locking, blocking, or semaphores;

the meaning of every language construct is precisely and mathematically defined; and

during program execution, processors may simultaneously self-schedule or post work for other processors to do.

The fundamental concern of this invention is reducing the time of program execution by using many processors (i.e. speedup). Existing computer architectures and computational models underlying parallel programming languages encounter fundamental, theoretical limitations of speedup. I invented the "Layered" class of multistage interconnection networks to permit construction of parallel computers comprising thousands of processors connected with a Layered network, to create a fundamentally different computer architecture I called "Multitude." Unfortunately, the models of computation embodied explicitly or implicitly in programming languages adapted for parallel computers (e.g. Ada) would not allow full exploitation of the Multitude architecture computer's power and efficiency. Therefore, I invented a new computational model that allows a programmer to describe, and the machine to utilize, maximal opportunities for simultaneous execution of many parts of the same program. I endeavored to create an embodiment, the DANCE language, whose form is familiar (patterned after Ada) but whose semantics (i.e. meaning, or computational model) are radically different.

"hundreds, thousands, or more processors may be efficiently used"

Techniques that allow a dozen or two processors to cooperate effectively generally do not work for hundreds or thousands of processors. That is, the techniques don't "scale." Efficiency is of utmost importance; keeping the processors busy (almost) always doing something useful economically employs expensive equipment. Machines that use 512 processors to deliver speedup around 25 have not been, and will not be, purchased by savvy, private-sector users. Speedup greater than one thousand for a broad range of applications requires all aspects of an application to scale, from description in an inherently-parallel language down to the topology of the network physically interconnecting the processors.

"program correctness may be proved"

Poor quality software is endemic today. No other product's flaws are so tolerated by society. Buggy programs inevitably result from the "group hack" methodology: code-test-fix-test-fix-release-get bug reports from users-fix-test-release-get more bug reports-etc.-etc.-etc. Testing can only show the presence of errors, not their absence. Only when a proof that a program meets its specifications has been exhibited and confirmed can we be sure that a program does what we say it does and nothing more. Correctness proofs are even more important for parallel programs; debuggers that rely on a single locus of control to step through program executions cannot be extended for debugging of parallel programs. Furthermore, parallel programs must assure noninterference: one part of the execution doesn't adversely modify the values of variables needed by another part of the execution. This invention is precise, mathematical semantics for parallel programming languages, which necessarily permits formal correctness proofs.

"special data structures may be created that allow many elements within the data structure to be simultaneously accessed without locking, blocking, or semaphores"

Achieving efficiency requires existing applications to be reprogrammed using data structures that can be accessed in parallel. Constructs that ensure exclusive use of some object by forcing other users to wait inevitably impede execution, thus destroying efficiency. Many such constructs have been proposed or tried: critical sections, monitors, semaphores, locks, spin waiting, snoopy caches, message passing, etc. This invention incorporates semantics for combinable operations in which many or all of the processors can access the same variable at the same clock cycle, yielding results as if the accesses occurred one-at-a-time. Provision by Layered networks of combinable operations for Multitude architecture computers is crucial for efficient program execution. Automatic conversion of existing, sequential programs for parallel execution is futile. Efficient parallel execution demands that data structures, and their method of access, be radically different from their sequential counterparts.

"the meaning of every language construct is precisely and mathematically defined"

The semantics of most programming languages are defined in a "natural" language, usually English. Therefore a program compiled by different compilers for the same machine may not behave the same, though both compilers are "correct." In reality the compiler defines language semantics; compiler bugs are language features. Only when language and program semantics are precisely (formally, mathematically) defined can an implementation be deemed "correct." Furthermore, proof of program correctness demands formal language semantics.

"during program execution, processors may simultaneously self-schedule or post work for other processors to do"

When an idle processor can fetch its own work to do simultaneously with many other processors, processors can be kept busy doing useful work. Self-scheduling is essential for good efficiency. In addition, anonymous processor scheduling greatly facilitates fault-tolerance; permanent failure of a processor, switch, or memory requires checkpoint/restart of the program and on-the-fly hardware reconfiguration to allow graceful degradation. Clever, but brittle, mapping schemes of applications onto architectures generally implode with the loss of a single processor or link.

Additional objects and advantages, such as elimination of cache coherency, concurrent memory allocation, cache write-back, and others, will become apparent from the specification and drawings.
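The combinable operations described above, such as fetchadd, can be illustrated with a concurrently accessible queue of the kind shown later in FIG. 12. The Python sketch below is only an analogue of my own devising, not the patent's implementation: a lock stands in for the combining network, which performs the fetch-and-add atomically in hardware without blocking, and the class and method names are mine:

```python
import threading

class FetchAddQueue:
    """Bounded concurrent queue: a fetch-and-add on the head/tail indices
    hands each thread its own array slot, so no two threads ever contend
    for the same cell (empty/full checks omitted for brevity)."""
    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.tail = 0                    # next slot index for put
        self.head = 0                    # next slot index for get
        self._lock = threading.Lock()    # models the atomicity of fetchadd

    def _fetch_add(self, field):
        with self._lock:                 # atomic read-then-increment
            old = getattr(self, field)
            setattr(self, field, old + 1)
            return old

    def put(self, item):
        slot = self._fetch_add("tail")   # unique index for this writer
        self.cells[slot % len(self.cells)] = item

    def get(self):
        slot = self._fetch_add("head")   # unique index for this reader
        return self.cells[slot % len(self.cells)]

q = FetchAddQueue(8)
writers = [threading.Thread(target=q.put, args=(i,)) for i in range(4)]
for t in writers:
    t.start()
for t in writers:
    t.join()
assert sorted(q.get() for _ in range(4)) == [0, 1, 2, 3]
```

The point of the sketch is that the only shared mutation is the index increment; once a thread holds its slot number, it touches a cell no other thread will touch.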
DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram showing conventional sequential execution and has one edge from every node except the final state and one edge to every node except the initial state; the final state is reachable from any state in the graph and any node is reachable from the initial state;

FIG. 2 is a diagram showing conventional parallel execution as several (or many) sequential executions;

FIG. 3 is a diagram illustrating barrier synchronization, which ensures that every process has completed a particular computation before any process may start its next computation;

FIG. 4 represents a simple execution in which the values of b and a may be computed concurrently;

FIG. 5 diagrammatically shows sequential intervals; sequential intervals may be represented with either text or a graph;

FIG. 6 illustrates a simple state lattice in accordance with the invention;

FIG. 7 diagrammatically illustrates an arbitrary interval; the interval, i, has multiple, concurrent computations;

FIG. 8 represents a sequential composition in which two intervals are composed sequentially by fusing the "end" of one interval with the "start" of another;

FIG. 9 illustrates a simple parallel composition in which two intervals may be composed in parallel by fusing their "start" states together and fusing their "end" states together;

FIG. 10 illustrates universal quantification: universal, temporal quantification creates many concurrent computations, each with a different value for the quantified variable, in this case x;

FIG. 11 illustrates diagrammatically an assignment interval, which functions to evaluate an expression using the values contained in its "start" node; the "end" node has the new value as the state of the assigned variable, in this case x;

FIG. 12 represents a conventional concurrently accessible queue: using fetchadd to determine the array index for either get or put allows many intervals to access the queue concurrently;

FIG. 13 illustrates the inserting of records into a linked list: two records can be inserted into the same place at the same time using a swap operation;

FIG. 14 illustrates a conventional linked-list queue: using swap to manipulate pointers allows concurrent access to a dynamically-sized queue;

FIG. 15 is a graph of a single node showing an initial state condition; the "first" node contains the values for variables at the beginning of the program, before any computation occurs;

FIG. 16 illustrates the initial state of procedure "sort": because the actual parameter names (what the caller calls the variables) and the formal parameter names (used within a procedure) usually differ, we show two nodes rather than just one; since there is no state change the two nodes can be collapsed into one;

FIG. 17 graphically shows an interval to satisfy "sort": four subintervals are inserted into the lattice for "sort" to first partition the data (51), followed by sorts on the upper (54) and lower (57) halves, concurrent with storing the pivot (60);

FIG. 18 illustrates an interval to satisfy "partition";

FIG. 19 shows concurrent intervals satisfying the alternative command within "partition";

FIGS. 20, 21 and 22, when arranged as shown in FIG. 22A, illustrate as an example a lattice satisfying a temporal logic formula for sorting members of the Swedish Bikini Team;

FIG. 23 shows schematically the Multitude architecture, which connects processors with a layered network;

FIG. 24 shows an exemplary processing node in a Multitude architecture computer;

FIG. 25 represents diagrammatically a physical address comprised of node number and memory address;

FIG. 26 illustrates the memory of each node partitioned into global, local, and spread;

FIG. 27 shows a main memory word comprising 84 bits;

FIG. 28 represents the translation of a segmented address to an effective virtual address;

FIG. 29 represents a translation of an effective virtual address to a physical address;

FIG. 30 shows a free-space list for blocks of 2^n bytes;

FIG. 31 shows the segment capability bit assignment;

FIG. 32 illustrates page descriptor fields;

FIG. 33 represents the static linking with link record;

FIG. 34 shows a lattice of states satisfying a DANCE temporal formula;

FIG. 35 illustrates hidden registers;

FIG. 36 shows that the MMU translates a segmented address to a physical address;

FIG. 37 shows how a User Segment Table defines segmented-to-EVA translation;

FIG. 38 illustrates TLB hit address translation;

FIG. 39 illustrates TLB hit address translation with page descriptor;

FIG. 40 illustrates that a cache line holding four words needs 335 bits;

FIG. 41 shows that cache hits are determined by encoding the virtual address to select cache lines, and then comparing that user, segment, and offset match;

FIG. 42 is a diagram showing nesting of domains in proof of noninterference;

FIG. 43 shows count faces; and

FIG. 44 is an instruction-set architecture table for Multitude architecture processors; it describes the functionality of every instruction and defines it in terms of temporal logic.

SUMMARY

Efficient, massively-parallel computing requires work scheduling that keeps every processor busy almost all the time. The dismal 3% to 5% efficiency found by the Army High-Performance Computing Research Center at the University of Minnesota (AHPCRC) across a broad range of applications results from inherent limitations in currently available massively-parallel computers.

This is not for lack of trying.

A new class of multistage interconnection networks to provide rapid, voluminous communication between RISC processors is disclosed in my earlier U.S. Pat. No. 4,833,468 entitled "LAYERED NETWORK".
object-oriented type system for the temporal logic execution espoused here. Assertions in first-order predicate calculus define specifications and allowable intermediate states for programs. Since only proofs can assure program correctness, a proof system called BPL associates inference rules and/or weakest-precondition predicate transformers with each language production. A proof of soundness for BPL is presented; a logic is sound if it infers true facts from true facts.

The shared-memory multiprocessor architecture based on my "Layered" class of networks has been dubbed "Multitude". Although Multitude nicely supports Ada [ANSI/MIL-STD-1815A], tasking and rendezvous are both difficult to use and retain an inherently sequential paradigm. Therefore, I tried to define a new language, "Samantha", that would allow programmers to define applications without circumscribing opportunities for parallel execution.
The objective of efficient computing with 1000s of processors has daunted computer architects for two decades.

Computational Model
A model of computation provides a mathematical foundation for the meaning of programs and their execution. Hopcroft and Ullman [Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979] present a series of increasingly complex sequential machines and their models. Hopcroft and Ullman show correspondences between the machines and classes of languages recognized by those machines. State transitions are defined for an input character, the current state, and perhaps some internal data structure such as a stack.

However, those edges do have direction in that time flows down the page. FIG. 4 illustrates the lattice corresponding to the previous example. In state 301 both a and b have no value. State 302 has a assigned 10. States 303 and 304 are independently determined and may occur simultaneously. State 305 is the synchronization point for the independent computations of
303 and 304. Finally, state 306 is computed. The heavy lines between states, like 307, are state transitions. The actual computation occurs during state transitions like 307, and these are shown as arcs between the nodes holding state.
Please note that this model explicitly rejects the notion of some "universal sequence of states" comprised of the states of individual processes in some sequential order. Such models, although popular, provide poor paradigms for programming parallel processors.

All the machines are sequential. Hoare ["Communicating Sequential Processes," C. A. R. Hoare, Communications of the ACM, Vol. 21, No. 8, August 1978] expands the sequential model by allowing multiple sequential machines. Others use models such as dataflow (Petri nets), logic programming (Horn clauses), or functional (lambda calculus) without much greater utility. None of these models of computation adequately expresses parallel execution. Therefore I invented the state-lattice temporal logic model for DANCE.

Temporal Logic
Moszkowski, in his Executing Temporal Logic Programs, treats programs as temporal logic formulae. His language, "Tempura", allows description of traditional, process-based, parallel execution within a single program expression. The foundation for Tempura is interval temporal logic. A Tempura interval is a non-empty sequence of states. A state is determined by the values of program variables at a particular instant of time.
Commands in the DANCE language are really temporal predicates. Program execution tries to construct a model that makes all the temporal predicates true. A model is a lattice of states that satisfies the temporal formula represented by the program text. The use of the word "model" here refers to creation of a lattice of states that satisfies a particular temporal logic formula embodied in a DANCE program. The "model" of computation for DANCE refers to the general method of constructing lattices of states concurrently.
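As an illustration (my own sketch, not the patent's code; the state numbers follow FIG. 4 and the helper `precedes` is invented), the lattice for the example program can be built as a DAG of states whose incomparable nodes, 303 and 304, are free to occur simultaneously:

```python
# A sketch of the FIG. 4 state lattice: nodes are states (variable
# bindings) and arcs are state transitions.  States 303 and 304 are
# unordered under the reachability relation, so they may occur at once.

# state id -> bindings holding in that state
states = {
    301: {},                      # a and b have no value yet
    302: {"a": 10},               # after a := 10
    303: {"b": 20},               # after b := a + a   (one parallel branch)
    304: {"a": 100},              # after a := a * a   (other parallel branch)
    305: {"a": 100, "b": 20},     # synchronization point
    306: {"a": 120, "b": 20},     # after a := b + a
}
transitions = {(301, 302), (302, 303), (302, 304),
               (303, 305), (304, 305), (305, 306)}

def precedes(x, y):
    """Irreflexive partial order: y is reachable from x via transitions."""
    frontier = {s2 for (s1, s2) in transitions if s1 == x}
    seen = set()
    while frontier:
        s = frontier.pop()
        if s == y:
            return True
        if s not in seen:
            seen.add(s)
            frontier |= {s2 for (s1, s2) in transitions if s1 == s}
    return False

# 303 and 304 are unordered -> independent, possibly simultaneous
assert not precedes(303, 304) and not precedes(304, 303)
# every state lies between start (301) and end (306)
assert all(precedes(301, s) or s == 301 for s in states)
assert all(precedes(s, 306) or s == 306 for s in states)
```

The absence of an order between 303 and 304 is exactly the opportunity for parallelism the text describes.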
The sequence of states that form an interval provides a discrete notion of time.
The formal model for DANCE presented here differs significantly from the formal model for Tempura:
States form a lattice, not a sequence.
The domains visible at a particular state are both restricted and explicit.
Obviously, one could merely change the grammar and symbols used in the DANCE language, but if the underlying meaning of such languages and programs is based on temporal logic formulae satisfied by lattices of states, it would not depart from the spirit and scope of the invention.

Expressions and Predicates
This section starts to use all the mathematical machinery in the "Background-Mathematical Foundations" section hereof to define the meaning of DANCE programs, which is the model of execution that forms the core of this invention. Later, a sophisticated, polymorphic type system will be substituted for the simple integer types presented here. Again later, a first-order predicate calculus for describing sets of states, very similar to those used by Apt/Dijkstra/Gries/Hoare for states in a guarded-command-language program, is presented. A logical system for reasoning about the behavior of programs in terms of interval-temporal-logic lattices forms another significant, original contribution of this exposition.
Each DANCE interval may be thought of as a non-empty lattice of states partially ordered by their occurrence in time. A state is the values of a set of variables at an instant in time. The set of variables visible by a particular state is its domain. An interval can have many intervening states between any pair of states.
Expressions in DANCE are built inductively as follows:
Variable names or constants: X, Y, Z, 5, 1509, 0
Functions: f(e_1, ..., e_k), where f is a function symbol defined by M, k >= 0, and e_1, ..., e_k are expressions.
DANCE grammar for expressions:
::= "(" + ")"
::=

An interval may also be split, but the subintervals must have a starting and ending state in common (i.e., every subinterval is itself a lattice; combinations of lattices must form lattices). Intervals are strung end-to-end by sequential composition, invoked by the semicolons in the

Example expressions:

X-Y        (X+Y)/Z        somefunction(X,Y,Z)

An example program:

a := 10;
b := a + a & a := a * a;
a := b + a

First 'a' is assigned the value ten. Then the values of 'b' and 'a' are computed in parallel. Finally 'a' is assigned the sum of 'a' and 'b'.
In this paper, interval lattices will be depicted as a set of circles (or stretched circles) connected by undirected edges.

Formulas (constructions of temporal predicates) are built inductively as follows:
Predicates: p(e_1, ..., e_k), where p is a predicate symbol defined by M, k >= 0, and e_1, ..., e_k are expressions. Like functions, we assume a basic set of relations complete with syntax.
Logical connectives: w_1 & w_2 and w_1 ; w_2, where w_1 and w_2 are predicates. The ampersand corresponds to "and." The semicolon corresponds to "after." One more (bounded universal quantification) will be added in a following section.
Parentheses may be used for grouping expressions. Predicates may be grouped with BEGIN-END.
DANCE programs are temporal logic formulas. Execution of a program seeks to construct an interval (e.g. a model) that makes the formula true.
DANCE grammar for temporal logic formulas:
"(" + ")"

A lattice of states must have the following properties:
An irreflexive partial order relation, ⊏, such that

a ⊏ b ≡ ∃n>0: ∃t_1, ..., t_n ∈ T: t_1 = (a, s_1) and t_n = (s_(n-1), b)

(that is, b is reachable from a through a chain of transitions).
Least upper bound (lub) and greatest lower bound (glb) functions are defined for any pair of states in S_L as:

lub(a,b) = c ≡ a ⊑ c and b ⊑ c and not ∃d ∈ S_L: a ⊑ d and b ⊑ d and d ⊏ c
glb(a,b) = c ≡ c ⊑ a and c ⊑ b and not ∃d ∈ S_L: d ⊑ a and d ⊑ b and c ⊏ d

There exist distinguished states "start" and "end".
Being a "partial" order provides opportunities for parallelism. Since neither u ⊑ w nor w ⊑ u, the transitions (t,u) 318 and (v,w) 319 may occur independently, even simultaneously. Good DANCE programs create execution lattices that are short and broad.
FIG. 5 shows these sequential intervals in graph form.

Temporal Meaning
Let I denote the set of all possible intervals.
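A brute-force rendering (my own sketch; the node names follow the FIG. 6 lattice of states s, t, u, v, w, x) of the partial order ⊑ and the lub/glb definitions above:

```python
# lub(a,b): the least state above both a and b; glb(a,b): the greatest
# state below both.  Computed by brute force over a small finite lattice
# with two branches: s -> t -> u -> x and s -> v -> w -> x.

ARCS = {("s", "t"), ("s", "v"), ("t", "u"), ("v", "w"), ("u", "x"), ("w", "x")}
NODES = {"s", "t", "u", "v", "w", "x"}

def before(a, b):
    """a ⊑ b : a equals b, or b is reachable from a via transitions."""
    if a == b:
        return True
    frontier = {y for (x, y) in ARCS if x == a}
    seen = set()
    while frontier:
        n = frontier.pop()
        if n == b:
            return True
        if n not in seen:
            seen.add(n)
            frontier |= {y for (x, y) in ARCS if x == n}
    return False

def lub(a, b):
    uppers = {c for c in NODES if before(a, c) and before(b, c)}
    least = [c for c in uppers if all(before(c, d) for d in uppers)]
    return least[0] if least else None

def glb(a, b):
    lowers = {c for c in NODES if before(c, a) and before(c, b)}
    greatest = [c for c in lowers if all(before(d, c) for d in lowers)]
    return greatest[0] if greatest else None

assert lub("u", "w") == "x"   # the two branches join only at the end state
assert glb("u", "w") == "s"   # ... and share only the start state below
```

Because u and w are unordered, their lub and glb are the end and start of the interval, mirroring the fork/join shape of parallel composition.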
Consider an arbitrary interval i ∈ I, composed of a lattice of states with distinguished states start(i) 101 and end(i) 102, with intermediate states 103 and state transitions 307, as depicted in FIG. 7.
DANCE expands the definition of an interval to a finite lattice instead of a sequence of states. This provides the mathematical machinery upon which correct, efficient, parallel execution will be based.
FIG. 6 depicts a simple lattice interval containing states s, t, u, v, w, and x.
In defining meaning for DANCE language constructs over an interval, meaning will be denoted by a capital M subscripted with an interval or state designator, the language construct in double brackets, [[ ]], and the equality-by-definition symbol, ≡, followed by some expression as to the meaning of the construct:

M_interval[[construct]] ≡ definition

The notation that is good for sequential intervals becomes cumbersome for DANCE lattices, so a graphical representation will be used instead.
A lattice of states, L, is a graph whose nodes are states, S, and arcs are state transitions, T, that must have the properties given earlier.

Meaning of Variables
The meaning of variable names is defined by:

M_i[[X]] ≡ σ_s[[X]]

where X is a variable name, X ∈ s, and s ∈ S. This says "the meaning of X for the interval, i, is defined as the value of X in each state, s, in the interval." If X is not a state variable of s, or s is not a state in the interval i, then X's meaning is undefined.

Meaning of Functions
Where f is a function name and e_1, ..., e_k are expressions or variable names, the meaning of function symbol, f, is defined by:

M_i[[f(e_1, ..., e_k)]] ≡ M_i[[f]](M_i[[e_1]], ..., M_i[[e_k]])

This says "the meaning of an expression in the interval i is the meaning of the function symbol f applied to the meaning of each of the subexpressions f uses as its operands."
DANCE grammar for functions:
::= "(" + ")"

Where w_1 and w_2 are formulas:

M_i[[w_1 ; w_2]] = true iff
∃j ⊆ i and ∃k ⊆ i: (M_j[[w_1]] = true) and (M_k[[w_2]] = true) and (end(j) = start(k))

This says "the meaning of w_1 ; w_2 is true in the interval i if-and-only-if there exist some subintervals of i, namely j and k, such that the meaning of w_1 is true during j and the meaning of w_2 is true during k, and k starts at the end of j." FIG. 8 depicts sequential composition of intervals. Note that the two subintervals of i share the state end(j) = start(k), and w_1 is true on the first part of the interval and w_2 is true on the last part.
Simple DANCE grammar for sequential composition:

::= "FUNCTION" "(" <variable name> ":"
::= "PROCEDURE" "(" <variable name> ":"

Temporal Formulas
Formulas are compositions of temporal predicates. Such compositions will be covered in the later sections. For now, consider that a formula is a simple temporal predicate. A formula w is satisfied by an interval i iff the meaning of w on i is true:

M_i[[w]] = true.

"iff" means "if and only if". Execution of a program constructs an interval so that the interval satisfies the formula that expresses the program. If all intervals satisfy a formula, that formula is deemed valid. However, valid formulas (tautologies) are rather uninteresting programs. Constructing an interval, a lattice of states, is

Parallel Composition
M_i[[w_1 & w_2]] = true iff the meaning of w_1 is true over some path, j, of states from start(i) to end(i), and the meaning of w_2 is concurrently true over some different path, k, of states from start(i) to end(i) in the interval i. w_1 is true on j and w_2 is true on k, as in FIG. 9. Therefore w_1 & w_2 is true over the whole interval.
Moszkowski doesn't recognize such bifurcation. Rather, he uses the more traditional arbitrary interleaving of states. Though he makes reference to parallel computing, his intervals are always sequences; it's just that several predicates must be true over the same sequence of states. In this example the states v and w contain variables seen only by w_2; similarly, the states t and u contain variables seen only by w_1. This is the essential, paradigmatic difference of DANCE from other models. Thus the importance of local variables and limited scoping in DANCE, provided by domains.
Simple DANCE grammar for parallel composition:
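The ';' (after) and '&' (and) compositions defined above can be sketched as predicates over paths through a lattice. This is my own simplified illustration: an interval is represented only by its set of start-to-end paths, and formulas are ordinary Python predicates on paths:

```python
# w1 ; w2 holds on a path if it splits into two subpaths sharing one
# state (end(j) = start(k)).  w1 & w2 holds on an interval if w1 is true
# on some path and w2 is concurrently true on some (possibly different)
# path from start to end.

def holds_seq(w1, w2, path):
    """Sequential composition along a single path."""
    return any(w1(path[:i + 1]) and w2(path[i:]) for i in range(len(path)))

def holds_par(w1, w2, paths):
    """Parallel composition over an interval given by its paths."""
    return any(w1(p) for p in paths) and any(w2(q) for q in paths)

# A FIG. 9 style interval: states t,u seen only by thread 1; v,w only
# by thread 2 (the predicates here are hypothetical examples).
paths = {("s", "t", "u", "x"), ("s", "v", "w", "x")}
w1 = lambda p: "u" in p
w2 = lambda p: "w" in p
assert holds_par(w1, w2, paths)

# Sequential composition: cut the path at the shared state t.
w_first = lambda p: p[-1] == "t"   # true of a subinterval ending at t
w_rest  = lambda p: p[0] == "t"    # true of a subinterval starting at t
assert holds_seq(w_first, w_rest, ("s", "t", "u", "x"))
```

Note that the two threads of the parallel case are satisfied on different paths, with no interleaving of their states, which is the bifurcation the text contrasts with Moszkowski's model.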

A single fetch-add, where X is the shared variable, e is the expression to be added (usually an integer constant), and Y is the variable to receive the fetched value, is defined as follows:

M_i[[FETCH_ADD(X, e, Y)]] = true iff
M_i[[○(Y = X)]] = true and M_i[[○(X = X + e)]] = true.

Simultaneous fetch-ands are defined by:

M_i[[FETCH_AND(X, e_1, Y_1) & FETCH_AND(X, e_2, Y_2) & ... & FETCH_AND(X, e_k, Y_k)]] = true
iff there exists some ndperm(e_1, ..., e_k) = (e_1', ..., e_k') such that
M_i[[○(X = X AND e_1' AND e_2' AND ... AND e_k')]] = true and
M_i[[○(Y_1' = X)]] = true and
M_i[[○(Y_2' = X AND e_1')]] = true and
M_i[[○(Y_3' = X AND e_1' AND e_2')]] = true and ... and

M_i[[○(Y_k' = X AND e_1' AND e_2' AND ... AND e_(k-1)')]] = true.

The single fetch-or is defined by:

M_i[[FETCH_OR(X, e, Y)]] = true iff
M_i[[○(Y = X)]] = true and M_i[[○(X = X OR e)]] = true.

The single fetch-xor is defined by:

M_i[[FETCH_XOR(X, e, Y)]] = true iff
M_i[[○(Y = X)]] = true and M_i[[○(X = X XOR e)]] = true.

Multiple combinable operations are more difficult to define. The new value held by the shared variable is straightforward: the new value is the sum (and, or, xor) of the old value and the sum (and, or, xor) of the values contained in the combinable operations. The returned values are more problematic; they are the result of some nondeterministic ordering, as if they occurred one at a time. The Layered network combines requests and decombines responses for combinable operations. Since which processor executes a part of an interval is unspecified, the apparent sequential order cannot be known. Fortunately it doesn't need to be.
Let ndperm(e_1, ..., e_k) be a nondeterministic permutation of expressions resulting in some scrambling of their order:

ndperm(e_1, ..., e_k) = (e_1', ..., e_k')

where ∀i: 1 <= i <= k: ∃m: e_i' = e_m, each e_m taken exactly once.
Now with ndperm, fetch-add can be defined for simultaneous execution:

M_i[[FETCH_ADD(X, e_1, Y_1) & FETCH_ADD(X, e_2, Y_2) & ... & FETCH_ADD(X, e_k, Y_k)]] = true
iff there exists some ndperm(e_1, ..., e_k) = (e_1', ..., e_k') such that
M_i[[○(X = X + e_1' + e_2' + ... + e_k')]] = true and
M_i[[○(Y_1' = X)]] = true and
M_i[[○(Y_2' = X + e_1')]] = true and ... and
M_i[[○(Y_k' = X + e_1' + e_2' + ... + e_(k-1)')]] = true.

Simultaneous fetch-ors and fetch-xors are defined similarly:

M_i[[FETCH_OR(X, e_1, Y_1) & ... & FETCH_OR(X, e_k, Y_k)]] = true
iff there exists some ndperm(e_1, ..., e_k) = (e_1', ..., e_k') such that
M_i[[○(X = X OR e_1' OR e_2' OR ... OR e_k')]] = true and
M_i[[○(Y_1' = X)]] = true and
M_i[[○(Y_2' = X OR e_1')]] = true and ... and
M_i[[○(Y_k' = X OR e_1' OR ... OR e_(k-1)')]] = true.

M_i[[FETCH_XOR(X, e_1, Y_1) & ... & FETCH_XOR(X, e_k, Y_k)]] = true
iff there exists some ndperm(e_1, ..., e_k) = (e_1', ..., e_k') such that
M_i[[○(X = X XOR e_1' XOR e_2' XOR ... XOR e_k')]] = true and
M_i[[○(Y_1' = X)]] = true and
M_i[[○(Y_2' = X XOR e_1')]] = true and ... and
M_i[[○(Y_k' = X XOR e_1' XOR ... XOR e_(k-1)')]] = true.

The swap operation behaves differently. When a single swap occurs at a time, the value of the shared variable becomes the parameter of the swap, and the value returned is the previous shared-variable value.
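The simultaneous fetch-add semantics can be simulated directly: one atomic update of the shared variable, with returned values as if the adds had been serialized in some nondeterministic permutation. A sketch (the function and parameter names are my own):

```python
import random

def combine_fetch_add(x, requests):
    """Simulate k simultaneous FETCH_ADD(X, e_i, Y_i) operations.
    requests maps a requester id i to its addend e_i.  Returns
    (new_x, {i: fetched Y_i}).  The fetched values look as if the adds
    happened one at a time in some nondeterministic order (ndperm)."""
    order = list(requests)        # ndperm: some scrambling of the order
    random.shuffle(order)
    fetched = {}
    running = x
    for i in order:               # decombined responses, one per request
        fetched[i] = running      # Y_i = X plus a prefix of other adds
        running += requests[i]
    return running, fetched       # running == x + sum of all e_i

new_x, ys = combine_fetch_add(100, {0: 1, 1: 2, 2: 4})
assert new_x == 107                           # X gets all values added
assert 100 in ys.values()                     # some request saw old X
assert all(100 <= y < 107 for y in ys.values())
```

Whatever permutation the shuffle picks, the final shared value and the set of invariants checked here are the same, which is exactly why the apparent sequential order "doesn't need to be" known.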

What the above expressions say is that the shared variable X gets all the values contained in the fetch-adds added to it, and the values returned are the original X value plus some subset of the other values from other fetch-adds. The reason why this is important is that the behavior is as if the fetch-adds occurred in some sequential order, when in fact they occurred at the same instant. Combinable operations are why "interleaving" of states is inappropriate. Any languages or architectures for which interleaving provides an adequate model are doomed to poor efficiency.
When many swaps occur simultaneously, one processor will get the value of the shared variable, the others will get values from some other swap, and the shared variable will get the swap value no processor receives:

M_i[[SWAP(X, e_1, Y_1) & SWAP(X, e_2, Y_2) & ... & SWAP(X, e_k, Y_k)]] = true
iff there exists some ndperm(e_1, ..., e_k) = (e_1', ..., e_k') such that
M_i[[○(Y_1' = X)]] = true and M_i[[○(Y_2' = e_1')]] = true and ... and
M_i[[○(Y_k' = e_(k-1)')]] = true and M_i[[○(X = e_k')]] = true.

The fetch-and is defined like the other combinable operations:

M_i[[FETCH_AND(X, e, Y)]] = true iff
M_i[[○(Y = X)]] = true and M_i[[○(X = X AND e)]] = true.

The swap combinable operation is good for manipulating pointers in dynamic data structures. The fetch-add is good for indexing into a shared array. The fetch-and, fetch-or, and fetch-xor are good for clearing, setting, and toggling flags, respectively. Virtually all DANCE programs use combinable operations to manipulate shared variables to access data structures in parallel without mutual exclusion. Hardware support for combinable operations is necessary for high efficiency.

where && is string concatenation. That is, domain d2 can "see" all the values in every domain d1 whose name is a shortened version of d2. All the values visible in domain d2 are either declared in the DANCE program text corresponding to d2, or are declared in some ancestor of d2.
Domains are neighbors if they have some variables they both can see:
Remote domains have no common variables:

Remote domains are needed to represent programs that span machine boundaries, i.e. distributed processing.

In addition to combining in the Layered network, I'd like combinable operations in the processor's ISA and memory management unit. Although combinable operations are expected to represent a small percentage of all instructions executed, they (can) eliminate parallelism-destroying mutual exclusion and blocking.

Temporal Domains
Temporal domains hold bindings between identifiers (variables) and values. All state variables occur in some domain, and at most one domain. Domains hold not only changeable states, but complete packages, which contain procedures and functions as well as state variables. Domains are essential for proving noninterference: that no other processor is modifying the variables any particular processor is currently using. Noninterference is crucial to showing that parallel activities can occur together correctly ["An Axiomatic Proof Technique for Parallel Programs," Susan Owicki and David Gries, Acta Informatica 6, Springer-Verlag, 1976].

Domain Trees
All domains for a single parallel program running on a single Multitude computer (which is comprised of many processors) form a rooted, directed tree. Intervals being executed (causing state change) exist in some leaf domain. The temporal predicates (commands) that control the interval can use any variable-value bindings in any domain below it in the tree. The domain tree will dynamically expand and contract as the program is executed. A new domain is created when a block is entered that declares local variables, and is deleted when the block is exited.
Simple DANCE program fragment:

package body ex:example is
declare
  A:float;
procedure p is
declare
  W:integer;
begin
  command1
end;
procedure h is
declare
  X:float;
begin
  declare
    Y:bool;
  begin
    command2
  end
  command5
  declare
    Z:char;
  begin
    command3
    declare
      V:string;
    begin
      command4
    end
  end
end;
end package

In the above program fragment:
command1 sees W and A;
command2 sees A, X, and Y;
command3 sees A, X, and Z;
command4 sees A, X, Z, and V;
command5 sees X and A.

Domain names are formed by a string of identifiers separated by periods, representing the path in the domain tree from the root to the leaf. The local variables immediately declared within a subprogram are contained in a domain whose name is the subprogram's identifier prepended (with a period) with the domain name in which it was invoked. Within subprograms, blocks that declare new variables may be encountered; a new domain leaf is grown. Non-nested blocks are assigned numbers starting with 1, in the order in which they are encountered. Nested blocks grow new domain leaves. Automatically numbered blocks cannot be confused with either named domains or subprogram domains, since DANCE identifiers cannot start with digits. However, domain identifiers and subprogram identifiers may be the same; an identifier's position in the name makes clear whether it's a domain name or subprogram name.
FIG. 42 shows the domains for the example above. FIG. 42 is not a set diagram in the usual sense of a subset drawn inside its superset. The set of nameable and therefore usable variables for a particular domain are all the variables declared within the domain, union variables declared in surrounding domains. Domains are necessary for proof of noninterference. The easiest way to assure non-interference is by restricting access to critical variables. Semaphores protect variables by serializing access to critical sections. DANCE protects variables with domains.
In the following, d, d1, d2, etc. stand for domains, and N(d) stands for the name of domain d. Define the domain operator S[ ] by:

S[d] = {(n, v) | n ∈ identifier, v ∈ Type(n)}

which means "the domain of d is a set of pairs in which the first element, n, is a program identifier and the second element, v, is a value in the type of n."
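The visibility rules just listed can be sketched with dotted domain names, where a command sees the declarations of its own domain plus every ancestor (name-prefix) domain. The domain names for the numbered blocks are my own guess at the naming scheme described above:

```python
# Domain names are dotted paths from the root; a domain sees its own
# declarations plus those of every ancestor (name-prefix) domain.

domains = {                      # domain name -> variables declared there
    "ex": {"A"},
    "ex.p": {"W"},
    "ex.h": {"X"},
    "ex.h.1": {"Y"},             # first non-nested block in h
    "ex.h.2": {"Z"},             # second non-nested block in h
    "ex.h.2.1": {"V"},           # block nested inside block 2
}

def is_ancestor(d1, d2):
    """d1 is d2 itself or a truncation of d2's name at a period."""
    return d1 == d2 or d2.startswith(d1 + ".")

def visible(d):
    """All variables a command in domain d can see."""
    return set().union(*(vs for name, vs in domains.items()
                         if is_ancestor(name, d)))

assert visible("ex.p") == {"A", "W"}                 # command1
assert visible("ex.h.1") == {"A", "X", "Y"}          # command2
assert visible("ex.h.2") == {"A", "X", "Z"}          # command3
assert visible("ex.h.2.1") == {"A", "X", "Z", "V"}   # command4
assert visible("ex.h") == {"A", "X"}                 # command5

def neighbors(d1, d2):
    """Domains are neighbors if they share some visible variables."""
    return bool(visible(d1) & visible(d2))

assert neighbors("ex.p", "ex.h")    # both see A
```

The assertions reproduce the five "sees" facts stated for the program fragment, so the prefix rule is a faithful reading of the ancestor-truncation definition.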
Types are sets of values. A domain contains the domains of all its ancestors in the domain tree. Ancestor domain names are formed by truncating their descendants' names:

By relying on the fact that no other interval can "see" critical, local variables, noninterference is assured.

Remote Procedure Call
Distributed systems have domains that do not form a tree: no single domain exists within which all other domains are contained. No shared address space, no combinable operations, just messages, usually point-to-point sans confirmation. Remote procedure call (RPC) works just fine for much of what one would like to do at disparate geographical locations. (Note: I explicitly reject message-passing paradigms for parallel processing that have one machine, out of many microprocessors, in a single location. Only geography demands distribution.) However, there are worthwhile applications that demand geographical distribution and cannot utilize nested RPCs, such as databases. RPCs are great for computation where the duration or lifetime of the domain which contains all accessed variables coincides with the duration of the invoked procedure. Such applications abound, e.g. photorealistic, 3D, animated rendering.
which domain the value of interest lies. Tree-structured directories save much space and may work well for sequential accesses, but they cause "hot spots" that would substantially degrade the performance when distributed. On the other hand, if the number of domains were large enough so that the probability of any two transactions needing the same domain were small, the system would exhibit latency largely independent of the number of transactions, a blessing to travel agents worldwide.

Type Theory
Type systems are important. Most languages, such as Pascal and C, only allow types that are basic and constructed. Basic types generally correspond to types supported directly
by the hardware, like integers, floating-point numbers, and characters. Constructed types are built from the basic types to form arrays, lists, and records. Some languages, such as Ada, allow forms of polymorphism (many types) through operator overloading. Abstract data typing seemed to answer a desire for powerful, reusable, and encapsulated types. A paper by Cardelli and Wegner ["On Understanding Types, Data Abstraction, and Polymorphism," Luca Cardelli and Peter Wegner, Computing Surveys, Vol. 17, No. 4, December 1985] gave new vistas to my thinking about types. The type system of DANCE has been adapted from Cardelli and Wegner's type expression language Fun. Existentially and universally quantified types are realized through "packages" described in a following section.

Sometimes the lifetime of state variables must exceed the lifetime of any procedure that uses them. Databases are like this. The traditional means of ensuring noninterference (and thus partial correctness) is with locks, semaphores, monitors, or some other form of mutual exclusion, which may lead to deadlock. This has caused a colorful consideration of resource allocation and deadlock detection, starting with Dijkstra's dining philosophers and ending with global snapshots. Total correctness proofs cannot tolerate even the possibility of a deadlock occurring: the program must terminate. How to ensure noninterference between transactions that may wish to exclusively use the same values without risking deadlock? Must have noninterference; cannot permit deadlock. Noninterference without deadlock. But how?
1) Divide the database into a large number of domains which may reside on machines across the country.
2) Impose a total ordering on the domains.
3) Allow only one RPC to be active in a domain; queue other received RPC invocations first-in-first-out.

Basic Types
The basic types of DANCE are:

Integer, Bool, Rat, Float, and Char

short for integer, boolean, rational, floating point, and character.
Rational is included because exactness cannot be obtained with floating-point numbers. A special value, NIL, is a member of each of the basic types, representing no value. Uninitialized variables have the value NIL. Attempts to use variables whose values are NIL in expressions will result in error messages. However, NIL can be assigned to variables.

4) Make database accesses that need to maintain consistency between values in different domains use an RPC chain in the order of the domains used.
The proof by induction of totally correct programs that invoke only other (already proven) totally correct subprograms with RPC is direct. Anything that is totally correct and is invoked with RPC will eventually terminate and return with the right value, and the caller will then terminate.

Simple DANCE grammar for simple types:

::= (“I* + “I” ) “IS " (“PRIVATE” ") “END” “PACKAGE” ::= “PACKAGE BODY “:” “IS ::= “PROCEDURE “(“ICvariable name>" 2.2% 2.2% ::= “FUNCTION “(“Izvariable name>" “:” " “FROM ( “:”) (““ formal parameter “=> actual parameter “I” ) “END “IMPORT

Algebraic Specifications in Software Engineering [Ivo Van Horebeek & Johan Lewi, Springer-Verlag, 1989], and inspired by packages in Ada [ANSI/MIL-STD-1815A-1983, Reference Manual for the Ada Programming Language, American National Standards Institute, 1983].
Packages have two parts: a visible declaration section and a hidden body. The body is not really hidden or secret; it is just not within the scope that can be seen by other packages. All that a programmer should need to know about the services of a package should be visible in the declaration section. Those familiar with object-oriented languages like Smalltalk will recognize that packages define classes and that these classes are instantiated as objects. Such considerations will be finessed for the time being. The conception of executing a package body is adequate.

Polymorphism
There are two major kinds of universal polymorphism, that is, two major ways in which a value can have many types. In parametric polymorphism, a polymorphic package has an explicit type parameter. In inclusion polymorphism, a package can be viewed as belonging to many different types, which need not be disjoint. So far, I haven't imposed limitations on overloading to avoid ad hoc polymorphism; for example, a programmer may define a new '+' function on a new type that is not associative.
Types are sets of values. The set of all values, V, contains all simple values, like integers and strings, and all data structures, like pairs, records, and variants. A type is a set of elements of V. Moreover, V, when ordered by set inclusion, forms a lattice (of types, not states). The top of this lattice is the type Top, the set of all values, or V itself. The bottom of this lattice is, essentially, the empty set.
Packages have names. The items within a package can be
named from other packages, similar to records, by prefacing the name of the item with the package's name and a period. Packages group together objects and operations that have a useful purpose. Packages require lots of declarations. C hackers don't want to be bothered with scoping more sophisticated than "local" and "global." DANCE adds some powerful, yet esoteric, declarations to Ada's. These declarations allow generic (polymorphic) programs that are parameterized with type (universal type quantification) and certain requirements for the awaited type (inheritance via bounded universal quantification). Ada's private type declarations serve as existential type quantification.
The types used by any given programming language are only a small subset of V. A particular type system is a collection of sets (ideals) of V, which is usually defined by presenting a language of type expressions and a mapping from type expressions to sets of values. Since types are sets, subtypes are subsets. Moreover, the semantic assertion "T1 is a subtype of T2" corresponds to the mathematical condition T1 ⊆ T2 in V. This is the foundation for type checking. For simple types (like integer subranges) this subtype relationship is intuitive. But Cardelli and Wegner add types such as:

type B = ∀t. t → t
cosine(x) € float functions float functions CB TOP C = to Top 1O Similarly for existential quantification, IVAR C.aCt aOt BAS1 C H aca type C=c.t(C) BAS2) C F kCk means, for Some type, c, C has type tcc) where tcc) is a type 15 C = S1CS and C = tot1. expression using c. Every value has type a.a because for every value there exists Some type (set) for which the value C = S-> tCS1 ti is a member. Thus the type a.a denotes the set of all values C = S1Ct and . . . and C = S.Ct. or Top. RECD Universal quantification yields generic types whereas C = RECORD L 1 : S1, ..., L. : Sr.,..., L. : S. existential quantification yields abstract data types. Com END RECORD CRECORDL1 : S1,..., L. :s, bining these two notions with Subtyping yields parametric END RECORD data abstractions. A package declaration, C = S1Ct and . . . and C = S.Ct. 25 VART C = VARIANT L : (L1 ... L.); L1 : S1, ... package pit is required END VARIANTC VARIANT L : (L... L.); t: L1 : S1,..., L. : S, END VARIANT end required d--declarations C.aCS tC1 end package (UNI)UNI - evey, a nott free.free inIn C creates a type of the form C.aCS tCl1 EXIEXIT -C = (aCS.t)C (aCS.t)- a not free In C type p=WtCr:b:d(b) 35 TRANS C = SCt and C = uCl Which says that package p is a type, Such that for any type C = SCt t which is a Subtype of r, there exists Some body, b, that implements the declaration(s), d. There may be more than Type Rules for Expressions: e one body b that upholds the declaration(s), d, and the A is a Set of type assumptions for free program variables. package will operate with any Supplied package t, So long as 40 it behaves like r. A.X:t is the Set A extended with the assumption that variable Type Checking X has type t. C.A. e.t means that from the Set of constraints Types have been called "armor,” protecting data repre C and the Set of assumptions. A we can infer that e has type Sentations from inappropriate manipulation. By requiring tf is a function. 
explicit declaration of all program variables, the DANCE language supplies redundant information used for checking types. In general, the type of a thing must match the use of that thing. Using separate information to determine thing-type and use-type allows them to be compared. Like balancing a double-entry bookkeeping system, type checking gives confidence.
Type checking involves (somehow) determining the types of an object and its use. That determination is called type inferencing. These type inference rules are adapted from Cardelli and Wegner. The DANCE type system can be formalized as a set of type inference rules that prescribe how to establish the type of an expression from the types of its subexpressions. These rules can be intended as the specification of a type-checking algorithm. The inference rules are given in two groups. The first group is for deducing that two types are in the inclusion relation. The second group is for deducing that an expression has a type.
Type expressions are denoted by s, t, and u; type variables by a and b; type constants (e.g. float) by k; expressions by e and f; variables by x; and labels by L. t{s/a} is textual substitution of s for a in t.

[ETOP]   C,A ⊢ e : Top
[EVAR]   C,A.x:t ⊢ x : t
[ABS]    f(x) = e and C,A.x:s ⊢ e : t, x free in e
         implies C,A ⊢ f : s → t
[APPL]   C,A ⊢ f : s → t and C,A ⊢ e : s
         implies C,A ⊢ f(e) : t

(Simple DANCE needs no RECD rule; record components are accessed individually: r.L.)

[ERECD]  C,A ⊢ e1 : t1 and ... and C,A ⊢ en : tn
         implies C,A ⊢ (L1 => e1, ..., Ln => en) : RECORD L1:t1, ..., Ln:tn END RECORD
[SEL]    C,A ⊢ x : RECORD L1:t1, ..., Ln:tn END RECORD
         implies C,A ⊢ x.Li : ti, i ∈ 1..n
[VART]   C,A ⊢ x : VARIANT L:(L1 ... Ln); L1:s1, ..., Ln:sn END VARIANT
         implies C,A ⊢ x.L : (L1 ... Ln)
[GEN]    C.a⊆s, A ⊢ e : t
         implies C,A ⊢ (PACKAGE P[a] IS REQUIRED a : s END REQUIRED e END PACKAGE) : ∀a⊆s. t
[SPEC]   C,A ⊢ P : ∀a⊆s. t and C ⊢ s1 ⊆ s
         implies C,A ⊢ P[s1] : t{s1/a}
[DEFN]   C,A ⊢ e : t{s/b} and TYPE a[b] IS t;
         implies C,A ⊢ e : a[s]
[ETRANS] C,A ⊢ e : t and C ⊢ t ⊆ u
         implies C,A ⊢ e : u
[HIDE]   C,A ⊢ a : t and PACKAGE P IS TYPE IS PRIVATE; s END PACKAGE
         implies C,A ⊢ P : ∀a⊆s. ∃b⊆t. u
[IMPORT] C,A ⊢ P : ∀a⊆s. ∃b⊆t. u and C,A ⊢ c : s and IMPORT u FROM p : P[c => a] END IMPORT and PACKAGE BODY p : P IS TYPE b IS t
         implies C,A ⊢ p.u : u{c/a}

DANCE shifts the view to the values of variables at a particular instant of time (i.e. state).
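The RECD inclusion rule above says a record type with more fields is a subtype of one with fewer, provided the shared fields' types are componentwise included. A minimal structural checker (my own sketch, handling only base-type names and records):

```python
# Structural subtype check following the TOP, BAS and RECD rules above:
# every type is below Top; base-type constants are only below
# themselves; a record with MORE fields is below a record with fewer,
# componentwise on the shared labels.

def subtype(s, t):
    if t == "Top":                                   # [TOP]
        return True
    if isinstance(s, str) and isinstance(t, str):    # [BAS] k <= k
        return s == t
    if isinstance(s, dict) and isinstance(t, dict):  # [RECD]
        return all(label in s and subtype(s[label], u)
                   for label, u in t.items())
    return False

point3d = {"x": "Float", "y": "Float", "z": "Float"}
point2d = {"x": "Float", "y": "Float"}

assert subtype(point3d, point2d)        # extra field z is forgotten
assert not subtype(point2d, point3d)
assert subtype(point2d, "Top")
assert subtype({"p": point3d}, {"p": point2d})   # depth subtyping
```

This is the set-inclusion reading of subtyping: the set of three-field points is contained in the set of values usable wherever a two-field point is expected.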
If one thinks of a PACKAGE Pal IS REQUIRED a : s END REQUIRED parallel program represented by interval temporal logic as a e END PACKAGE GEN lattice made from threads of State transitions Seeing oppor C, A = P: Wads.t tunities for parallel execution becomes easier. Each thread has (for the most part) its own private variables that can be C, A = P: Wads.t and C = S1CS manipulated and proven to not affect other threads. Perhaps SPEC C, A = P.es 1: t sta} the word “thread” may be confused with popular “multi threaded' operating Systems. Such as Mach. Such threads are C,A = e: tsIb} TYPEab IS t; merely Sequential processes. Arbitrary interleaving of DEFN - - C, A -= e : as- - Sequential process States is a paradigm that clouds parallel conception and requires inspection of the States of all C, A = e : t and C = tou ETRANSI - - - - F processes, rather than just the variables whose values they C, A = e : it share. Two Similar concepts must be kept clear: temporal predi C, A = a : it 15 PACKAGE PISTYPE IS PRIVATE : cates and assertional predicates. Temporal predicates Serve HIDE) SEND PACKAGE as commands that create a model, in interval temporal logic, making the predicates true over the interval. ASSertional predicates have not concept of time, only State. ASSertional C, A = P: Wads.3bCt.u predicates arent actually executed (but checking assertions C, A = c :S IMPORT u FROMp: Pe =>al END IMPORT during debugging would be nice). Rather they declare allow PACKAGE BODYp: P IS TYPE b ISt: able values for State variables and are used for Specifying USE what to do and proving it was done. C, A = p.u : P cla Weakest Precondition Predicate Transformers Any formal logical System consists of rules defined in Correctness Proofs 25 terms of: My attendance at the Institute on Formal Development of a set of Symbols, Programs and Proofs sponsored by the University of Texas a set of formulas constructed from these Symbols, at Austin in 1987 changed my attitude toward correctness proofs. 
I am now completely convinced of the necessity of correctness proofs. Testing can at best show the presence of program errors; only proofs can show the absence of program errors. The problems of debugging are greatly magnified by parallelism. Since no single thread of control exists, two executions of a parallel program may not yield the same results. Experiments showed that a particular program would execute correctly over 99% of the time but would occasionally seize up in deadlock. However, when run in debug mode to generate trace files to find the bug, the program never deadlocked, because generation of the trace files changed the timing enough to avoid the problem. Programs for parallel computers simply cannot be developed by those tied to the C language. Hence the necessity for program proofs.

I could plagiarize David Gries' excellent book but will instead refer the reader to [The Science of Programming, David Gries, Springer-Verlag, 1981]. In Gries' book, weakest precondition predicate transformers refer to a particular place in the text of the program. In DANCE, the assertions are applied to the states of program variables denoted by interval temporal logic, almost always referring to the first or last node of a state lattice. Such states are "between" execution of commands.

Assertions in DANCE are enclosed in curly braces {}. Assertions form pre- and post-conditions for commands. This allows proof of program correctness. The form of DANCE assertions was adapted from [Concurrent Programming: Principles and Practice, Gregory R. Andrews, Benjamin/Cummings, 1991]. Andrews' notation is based on guarded commands like Gries', but from a parallel computing emphasis. For example an array, b, with length k can be asserted to be in order as:

{sorted(b): forall i : 1 <= i <= k-1 : b[i] <= b[i+1]}

An assertion applies to the state of visible variables at the interval(s) when the following command is ready to execute. Andrews views assertions as part of the text of a program. Generating a model for a temporal predicate is executing a DANCE program. This section defines assertions, which are really outside the program, but allow proof that the program's commands (imperative directives, temporal predicates) do what we claim they do.

Any formal logical system consists of rules defined in terms of:

a set of symbols,
a set of formulas constructed from these symbols,
a set of distinguished formulas called axioms, and
a set of inference rules.

Formulas are well-formed sequences of symbols (they are grammatically correct). The axioms are special formulas that are assumed to be true. Inference rules specify how to derive additional true formulas from axioms and other true formulas.

Inference rules have the form:

H1, H2, ..., Hn
---------------
       C

where the Hi are hypotheses and C is a conclusion. Both the hypotheses and the conclusion are formulas. The meaning of an inference rule is that if all the hypotheses are true, then we can infer that the conclusion is also true.

A proof in a formal logical system is a sequence of lines, each of which is an axiom or can be derived from previous lines by application of an inference rule. A theorem is any line in a proof. Thus theorems are either axioms or are obtained by applying an inference rule to other theorems.

By itself, a formal logical system is a mathematical abstraction: a collection of symbols and relations between them. A logical system becomes interesting when the formulas represent statements about some domain of discourse and the formulas that are theorems are true statements. This requires that we provide an interpretation of the formulas. An interpretation of a logical system maps each formula to true (T) or false (F). A logic is sound with respect to an interpretation if all its axioms and inference rules are sound. An axiom is sound if it maps to true; an inference rule is sound if its conclusion maps to true, assuming all the hypotheses are true. Thus, if a logical system is sound, all theorems in the logic are true statements about the domain of discourse. In this case, the interpretation is called a model for the logic. Earlier we talked about "models" of interval temporal logic. Those models were interpretations that assigned values to state variables to make the temporal predicates true.

A logical system is complete with respect to an interpretation if every formula that is mapped to true is a theorem; i.e. every true formula can be proven with the logical system. Unfortunately, parallel programs cannot have both sound and complete axiomatizations as a logical system. This is because program behavior must include arithmetic, and a well-known result in number theory, Gödel's incompleteness theorem, states that no formal logical system that axiomatizes arithmetic can be complete. However, a logic that extends another logic can be relatively complete, meaning that it introduces no additional source of incompleteness beyond that already in the logic it extends. Fortunately, relative completeness is good enough, since the arithmetic properties that we will employ are certainly true, even if not all of them can be proven formally.

Propositions

Propositional logic is an instance of a formal logical system that formalizes what is often called "common sense" reasoning. The formulas of the logic are called propositions. The axioms are special propositions assumed to be true. The inference rules allow new, true propositions to be formed from existing ones.

In propositional logic the symbols are:

Propositional constants: T and F (for "true" and "false")
Propositional variables: identifiers (p, q, r, ...)
Propositional operators: NOT, AND, OR, ->, and =

The formulas of propositional logic can be single constant or variable symbols, or constructed by using propositional operators to combine two other formulas. There are five propositional operators: negation (NOT), conjunction (AND), disjunction (OR), implication (->), and equivalence (=). Their interpretations are given below.

p q | NOT p | p AND q | p OR q | p -> q | p = q
F F |   T   |    F    |   F    |   T    |   T
F T |   T   |    F    |   T    |   T    |   F
T F |   F   |    F    |   T    |   F    |   F
T T |   F   |    T    |   T    |   T    |   T

Given a state s, we interpret a propositional formula P as follows. First replace each propositional variable in P by its value in s. Then use the table to simplify the result. When there are multiple operators in a formula, negation has the highest precedence, followed by conjunction, disjunction, implication, and finally equivalence. As with arithmetic expressions, parentheses may be used to force a different order of evaluation or to make the formula clearer. For example, if p is false and q is true in a state, then the interpretation of the formula

p OR (p -> q)

is:

F OR (F -> T) = F OR T = T

A formula P is satisfied in a state if it is true in that state. Formula P is satisfiable if there is some state in which it is satisfied. Formula P is valid if it is satisfied in every state. A valid proposition is also called a tautology. In a sound propositional logic, an axiom is a tautology since it is assumed to be valid.

There are many tautologies in propositional logic, just as there are many identities in trigonometry. The ones that we will use most are listed in the table below. These tautologies are called propositional equivalence laws, since they allow a proposition to be replaced by an equivalent one. Applications of the propositional equivalence laws are transitive: in any given state, if P = Q and Q = R, then P = R.

Law of Negation: P = NOT(NOT(P))
Law of Excluded Middle: P OR NOT(P) = T
Law of Contradiction: P AND NOT(P) = F
Law of Implication: P -> Q = NOT(P) OR Q
Law of Equality: (P = Q) = (P -> Q) AND (Q -> P)
Laws of OR-simplification:
  P OR P = P
  P OR T = T
  P OR F = P
  P OR (P AND Q) = P
Laws of AND-simplification:
  P AND P = P
  P AND T = P
  P AND F = F
  P AND (P OR Q) = P
Commutative Laws:
  (P AND Q) = (Q AND P)
  (P OR Q) = (Q OR P)
  (P = Q) = (Q = P)
Associative Laws:
  P AND (Q AND R) = (P AND Q) AND R
  P OR (Q OR R) = (P OR Q) OR R
Distributive Laws:
  P OR (Q AND R) = (P OR Q) AND (P OR R)
  P AND (Q OR R) = (P AND Q) OR (P AND R)
DeMorgan's Laws:
  NOT(P AND Q) = NOT(P) OR NOT(Q)
  NOT(P OR Q) = NOT(P) AND NOT(Q)
AND-elimination: (P AND Q) -> P
OR-introduction: P -> (P OR Q)
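The interpretation procedure and the equivalence laws can be checked mechanically by enumerating states. A small sketch; the tuple encoding of formulas ("NOT", "AND", "OR", "IMP", "EQV") is my own, not DANCE syntax.

```python
from itertools import product

# Evaluate a propositional formula, encoded as nested tuples, in a state
# (a dict mapping propositional variables to truth values).

def interpret(P, state):
    if isinstance(P, str):
        return state[P]                     # replace the variable by its value
    op, *args = P
    if op == "NOT":
        return not interpret(args[0], state)
    a, b = (interpret(x, state) for x in args)
    return {"AND": a and b,
            "OR":  a or b,
            "IMP": (not a) or b,            # Law of Implication
            "EQV": a == b}[op]

# The worked example: with p false and q true,
# p OR (p -> q) = F OR (F -> T) = F OR T = T.
assert interpret(("OR", "p", ("IMP", "p", "q")), {"p": False, "q": True})

# DeMorgan's Law holds in all four states, so it is a tautology.
for p, q in product([False, True], repeat=2):
    s = {"p": p, "q": q}
    assert interpret(("EQV", ("NOT", ("AND", "p", "q")),
                             ("OR", ("NOT", "p"), ("NOT", "q"))), s)
print("DeMorgan's Law is a tautology")
```

Enumerating every state is exactly the truth-table method: a formula true in all states is valid, i.e. a tautology.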

Both AND-elimination and OR-introduction are examples of weakening a proposition. For example if a State Satisfies (makes true) PAND Q, then it satisfies Palone. Thus, P is 65 a weaker proposition than P AND QSince (in general) mare states satisfy just P than satisfy both P and Q. Similarly, in the OR-introduction tautology, POR Q is a weaker propo 5,867,649 41 42 Sition than P. Since F implies anything, F is the Strongest -continued proposition (no State Satisfies it), and T is the weakest Conjunction Law: proposition (every State Satisfies it). (FORALL B : R : P AND Q) = Predicates (FORALL B : R : P) AND (FORALL B : R : Q) Propositional logic provides the basis for assertional Disjunction Law: (EXISTS B : R : P ORQ) = reasoning. However, by itself it is too restrictive Since the (EXISTS B : R : P) OR (EXISTS B : R : Q) only propositional constants are T and F. DANCE has Empty Range Laws: additional data types besides just boolean. Thus, we need a (FORALL B : F : P ) = T way to manipulate any kind of boolean-valued expression. (EXISTS B : F: P) = F (NUM B ; F : P) = 0 Predicate calculus extends propositional logic to Support this (SUM B ; F : P) = 0 generalization. The extensions to propositional logic are: (PROD B : F : P) = 1 Any expression that evaluates to T or F can be used in place of a propositional variable; or The counting quantifier, NUM, adding quantifier, SUM or Existential (EXISTS or 3) and universal (FORALL or W) 15 X, and multiplying quantifier, PROD or II, can be helpful in quantifiers are provided to characterize Sets of values. describing properties of groups of elements. This is the same reserved word (FORALL) as the parallel Textual Substitution is the final concept from predicate command, but within a predicate it is different Since and calculus employed: W are not ASCII characters, DANCE programs use instead. 
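The laws for quantified expressions above can be checked over finite ranges, modeling FORALL, EXISTS, NUM, SUM, and PROD with Python's `all`, `any`, `sum`, and `math.prod`. An illustrative sketch only; the sample predicates are mine.

```python
import math

R = range(1, 6)                  # a finite range for the bound variable
P = lambda b: b % 2 == 1         # sample predicates over the range
Q = lambda b: b < 10

# DeMorgan's Laws for quantifiers
assert any(P(b) for b in R) == (not all(not P(b) for b in R))
assert all(P(b) for b in R) == (not any(not P(b) for b in R))

# Conjunction Law:
# (FORALL b : R : P AND Q) = (FORALL b : R : P) AND (FORALL b : R : Q)
assert all(P(b) and Q(b) for b in R) == \
       (all(P(b) for b in R) and all(Q(b) for b in R))

# Empty Range Laws: a FORALL over an empty range is T, an EXISTS is F,
# NUM and SUM are 0, and PROD is 1.
empty = []
assert all(P(b) for b in empty) is True
assert any(P(b) for b in empty) is False
assert sum(1 for b in empty if P(b)) == 0          # NUM
assert sum(b for b in empty if P(b)) == 0          # SUM
assert math.prod(b for b in empty if P(b)) == 1    # PROD
print("quantifier laws verified over finite ranges")
```

The empty-range cases fall out of the identity elements of AND, OR, +, and *, which is why FORALL defaults to T and PROD to 1.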
Textual Substitution: If no variable in expressione has the and W are used in mathematical definitions where appli Same name as any bound variable in predicate P, then cable. Px is defined to be the result of replacing every free The Symbols of predicate calculus formulas are those of occurrence of X with e. Textual Substitution is used for propositional logic plus relational operators and quantifiers. the assignment command. The formulas called predicates, are propositional formulas Brian's Programming Logic in which relational and quantified expressions may replace 25 A programming logic is a formal System that facilitates propositional variables. To interpret a predicate in Some making precise, mathematical Statements that program State, we first interpret each relational and quantified behavior. Brian's Programming Logic (hereinafter BPL) expression, and then interpret the resulting propositional derives from Andrews PL, suitably modified for the formula. DANCE language and its model of computation. When dealing with sets of values, we will often wait to As with any formal logical system BPL contains symbols, assert that Some or all values of a Set Satisfy a property. We formulas, axioms, and inference rules. The symbols of BPL could do this informally, but quantified expressions provide are predicates, curly braces, and DANCE programming a more precise and compact method. language commands. The formulas of BPL are triples of the The existential quantifier, EXISTS, allows one to assert form: that Some element of a Set Satisfies a property. It appears in 35 expressions of the form: Above, P and Q are predicates, C is a simple or compound (EXISTS b, ..., b.:R:P) command. P and Q are assertions about the behavior of C. The b, following the quantifier are new variables called In DANCE, C is a temporal predicate which is made true by bound variables; R is a formula that specifies the set of 40 construction of an interval. 
An interval (or subinterval) is a lattice of states. The assertion P applies to the first state in the interval; Q applies to the last state.

In P and Q, the free variables (those not bound by a quantifier) are either state variables or logical variables. State variables are introduced explicitly by declarations and are manipulated by execution of DANCE commands. Logical variables are special variables that serve as placeholders for arbitrary values; they appear only in predicates (between braces), not in DANCE commands.

Interpretation of a Triple: Let each logical variable have some value of the correct type. Then the interpretation of a triple {P} C {Q} is true if, whenever execution of C is begun in a state satisfying P, the execution of C terminates and the resulting state satisfies Q.

Please note: Andrews uses partial correctness in his triple interpretation. He then requires a separate proof of termination to derive total correctness. I prefer a total correctness interpretation, which includes the termination requirement.

R specifies the set of values (the range) of the bound variables; P is a predicate. The interpretation of existential quantification is true if, for some combination of values of the bound variables (within range R), predicate P is true.

The universal quantifier, FORALL, allows one to assert that all elements of a set satisfy a property. Universal quantification takes the form:

(FORALL b1, ..., bn : R : P)

Again, the bi are bound variables, R specifies the range of values of the bound variables, and P is a predicate. The interpretation of universal quantification is true if P is true for all combinations of the values of the bound variables.

In a quantified expression, the scope of a bound variable is the expression itself. Thus, a quantified expression is like a nested block. Many quantified expressions will also reference program variables. Thus, confusion could result if the
name of ai bound variable conflicts with that of another In a triple, predicates P and Q are called assertions since 60 they assert that the program State must Satisfy the predicate variable, and should be avoided. in order for the interpretation of the triple to be true. Laws for manipulation of quantified expressions: ASSertions characterize acceptable program States, they may be considered specifications, as contracts between the imple DeMorgan's Laws for quantifiers: menter (programmer) and the user (customer). (EXISTS B : R : P) = NOT(FORALL B : R : NOT(P)) 65 Predicate P is called the precondition of C, denoted (FORALL B : R : P) = NOT(EXISTS B : R : NOT(P)) pre(C). Predicate P characterizes the States of program variables; P must be satisfied before execution of C begins. 5,867,649 43 44 Predicate Q is called the postcondition of C, denoted post This illustrates that relations about variables that are not (C). Q characterizes the States of program variables after assigned to are not affected by assignment command execu execution of C. Two Special assertions are provided: T and tion. The Semantics of array and record elements can be F. Predicate T (true) characterizes all program States; predi tricky but understood that arrays and records are like func cate F (false) characterizes no program State. In order for the interpretation of a triple to be a model for tions from indices (for arrays) or field Selection (for records) programming logic BPL, the axioms and inference rules to the values of the appropriate element. ASSignment to an must be sound. This ensures that all theorems provable in element returns the whole array or record with only that BPL are sound. For example, the following triple should be element changed. a theorem: Inference Rules The program State changes as a result of executing assignment commands. The axioms for these Statements use textual Substitution to introduce new values into predicates. 
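The total-correctness interpretation of a triple {P} C {Q} can be phrased operationally: from every state satisfying P, executing C must terminate in a state satisfying Q. A minimal sketch, assuming states are modeled as Python dicts and commands as state-to-state functions; the helper names are mine.

```python
# Operational sketch of the triple interpretation: {P} C {Q} holds iff
# from every state satisfying P, executing C yields a state satisfying Q.

def triple_holds(P, C, Q, states):
    """Check {P} C {Q} over a finite universe of candidate states."""
    return all(Q(C(dict(s))) for s in states if P(s))

states = [{"x": x0, "y": y0} for x0 in range(3) for y0 in range(3)]
assign_x_4 = lambda s: {**s, "x": 4}        # the command x := 4

# The theorem from the text: {y = 1} x := 4 {y = 1 AND x = 4}
ok = triple_holds(lambda s: s["y"] == 1,
                  assign_x_4,
                  lambda s: s["y"] == 1 and s["x"] == 4,
                  states)
print(ok)   # True

# The non-theorem: {T} x := 4 {y = 1} fails, because assigning a value
# to x cannot magically set y to 1.
bad = triple_holds(lambda s: True, assign_x_4, lambda s: s["y"] == 1, states)
print(bad)  # False
```

The first check also illustrates that relations about variables not assigned to (here, y) are unaffected by assignment.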
However the following triple should not be a theorem 15 The inference rules of BPL allow the theorems resulting Since assigning a value to X cannot magically Set y to 1. from these axioms to be manipulated and combined. There is one inference rule for each of the Statements that affect the flow of control in a DANCE program: Sequential composi In addition to being sound, BPL is relatively complete so tion (), parallel composition (& and FORALL), alternation that all triples that are both true and represent useful com (IFFI), and iteration (DOOD). Each inference rule has two putation are in fact provable as theorems. parts separated by a horizontal line. If the Statement(s) above Axioms the line are true, the statement below the line can be inferred This section presents axioms of BPL along with informal justifications of their Soundness and relative completeness. to be true. Expression evaluation is assumed to be side-effect free; i.e. 25 The first rule is for sequential composition. If a triple, {P} only functions may be invoked within expressions because S1 {Q} is true and another triple {Q} S2 {R} is true where they are side-effect free. {Q} is the same assertion in both cases, then the sequential The Skip command changes no variables. If predicate P is composition of S1 and S2 is true, true before execution of Skip, it remains true when skip terminates. {P} S1; S2 {R}. Sequential Composition Rule: An assignment command assigns a value (say e) to a variable (Say X) and thus in general changes program State. At first glance it might Seem that the axiom for assignment 35 should start with the precondition pre(X:=e), and that the postcondition should be pre(X:=e) plus a predicate to indi Suppose we have are alternative command of the form: cate that now X has value e. However, going in the other direction results in a much simpler axiom for assignment. ASSume that the post condition of an assignment is to Satisfy 40 a predicate P. 
ASSignment (with side-effect free expression IFFY: evaluation) changes only the designated variable X, So all other program variables have the same value before and after the assignment. Second, X has a new value e and thus any relation pertaining to X that is to be true after the 45 assignment has to have been true before with X replaced by e. Textual Substitution makes exactly this transformation.
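The assignment axiom works backwards, as described above: the precondition is the postcondition with the assigned variable replaced by the expression. A simplified sketch; since predicates and expressions are Python lambdas here, "textual substitution" is performed semantically (by evaluating Q in the updated state) rather than on program text.

```python
# The assignment axiom via substitution: wp(x := e, Q) is Q[x := e].
# States are dicts; predicates and expressions are functions of a state.

def wp_assign(x, e, Q):
    """Weakest precondition of x := e for postcondition Q."""
    return lambda s: Q({**s, x: e(s)})

# {P and i = 1} i := 1 {P and i = 1}: the wp of i := 1 for (i = 1)
# holds in every starting state, i.e. it is T.
Q = lambda s: s["i"] == 1
P = wp_assign("i", lambda s: 1, Q)
assert all(P({"i": i0}) for i0 in range(5))

# wp(x := x + 1, x > 10) is (x + 1 > 10), i.e. x > 9.
P2 = wp_assign("x", lambda s: s["x"] + 1, lambda s: s["x"] > 10)
print(P2({"x": 10}), P2({"x": 9}))   # True False
```

Going in this backwards direction is what makes the axiom so much simpler than trying to push a precondition forward through the assignment.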

50 Bn => Sn FI To illustrate the use of this axiom consider the following triple:

55 Where each of the Bs are boolean valued expressions and the Ss are DANCE commands. By the DANCE semantics at least one of the B's (guards) must be true. Then if whenever both P and Bi are true P and Bi} and the Statement Si is executed and Q is true of the State that results This indicates that Starting in ally State and assigning a 60 {Q} then we call infer that the whole alternative command value to a variable gives the variable that value. As a Second is true. example, consider the triple: Alternative Rule: {y=1}x:=4 (y=1) AND (x=4) FORALL i IN (1...n): {P and B. S. Q} This is also a theorem Since: 65 {P} IFFY {Q} 5,867,649 45 46 Suppose we have an iterative command of the form: that assignment commands are executed atomically. ASSumptions of atomicity (a command or group of com mands are executed indivisibly) pervade Andrews’ book by placing angle brackets around whatever is to be atomic. To DOT: achieve atomicity Some form of critical Section must be enforced, which, of course, destroys efficiency. For DANCE we need an inference rule something like: for all intervals that may be executed concurrently all the variables that each interval accesses (either read or write) are either 1) Read-only constant values, 2) Variables no other interval accesses, or 3) Shared variables properly accessed with combinable Bn => Sn operations. OD Read-only constants and private variables are obviously {I AND NOT(B1 OR B2 OR. . . OR Bn)} 15 O.K., but what are “properly accessed” shared variables? Remember that the combinable operations have hardware Where the B's are boolean-valued expressions and the S's enforced atomicity; they are both indivisible and can be are commands. The predicate I is the loop invariable; it is performed Simultaneously. The result of combinable opera true before and after each evaluation of the loop. 
For one of tions (even if they occur at the same clock cycle) is the same the commands to be executed (Si) its guard (Bi) must as if they occurred in Some Sequential order. The shared evaluate to true. Therefore, if each triple of the form {I and variables are properly accessed if-and-only-if the correct Bi Si {I} is true (i.e. execution of commands Si when neSS of the accesses is order-independent. Now we are ready Bi=true keeps the loop invariant true), then when the loop to form the parallel rule for DANCE. terminates (all the guards are false) we can infer that {I} Parallel Composition Rule DOIT I and not (B or . . . or B)} is true. 25 Iterative Rule: foralli: i IN (1 ... n.) : I and B. S. {I} {I} DOIT I and not(B or ... or B.) where AccessOK(S1.S2) is defined as: forall variables Vi occurring in S1 or S2 The iterative rule, as Stated, has a problem partial cor neither S1 nor S2 changes the value of Vi, rectness. For an iterative command to terminate, eventually only one of S1 or S2 accesses (read or write) V1, or all the guards must be false. To prove termination an integer S1 and S2 access Vi solely via combinable operations bound function must be included as discussed in the later and the truth of Q1 and Q2 is independent of the Section on weakest precondition predicate transformers. 35 result of the combinable operations, or more formally: Wi:ie1 ... n : B =>(bound > 0) and {I and B, and bound=bd S; I and bound-bd}} AccessorOK(S1.S2) = {I} DOIT I and not (B1 or ... or B) and not (bound > 0) 40 W V: We S1 US2: (Vigi Ihs(S1) Ulhs(S2) ) or (Vie S1 and Vig S2) or (Vie S2 and Vig. S1) or The rule of consequence allows us to Strengthen a ( (Vie co(S1) and Vie co(S2) ) -> precondition, weaken a postcondition, or both. 
This rule is (co(S1) co(S2) and co(S2) co(S1) ) not really necessary Since it follows from the underlying Ihs(S) = { viv := e e S} predicate calculus, it is handy in proofs, and will later be 45 co(S) = { v fetch add(v,ev2) e S or swap(v,ev2) e S or ... } used to understand weakest-precondition predicate trans formerS. Suppose predicate P1 implies predicate P, predicate Since the parallel command (FORALL) is merely a Q implies predicate Q1, and the triple {P} S {Q} is true. parameterized generalization of parallel composition, the Then we can infer that P1}S Q1} is true. parallel composition rule may be simply extended. Rule of Consequence: 50 Parallel Command Rule: forall iii in range:{P(i)}S(i) {Q(i)} and AccessOK(S(i)) and P=> P(i) and O(i) => O {P} FORALL i IN range DOS {Q} Parallel composition (&) and the parallel command 55 (FORALL) are much more difficult to make rules for. The Where P(i), S(i), and Q(i) are commands or assertions problem is “interference.”Interference occurs when more where the variable i has assumed one of the values in the than one intervals reference the same variables. Sometimes range of the parallel command. interference is desired. Accessing shared variables with These inference rules cover only the major language combinable operations allows access to data Structures in 60 constructs. Such things as types, whether constructed or parallel. Andrews insists on “interference freedom.” Conse abstract, have no inference rules as Such. These rules do quently his programs are littered with “await' Statements however allow inferences to be made about the behavior of that block execution until Some boolean expression is true. the DANCE control structures. The next section considers By dumping any operation that may read or write a variable full-blown proofs using BPL. used by a different process into an appropriately guarded 65 Program Correctness Proofs await Statement, Andrews achieves interference freedom. 
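The "properly accessed" discipline for shared variables can be sketched in software. Real combinable operations have hardware-enforced atomicity (and can be combined in the network); here a lock merely simulates that indivisibility. The class and names are mine, not the patent's hardware mechanism.

```python
import threading

class SharedCounter:
    """A shared integer supporting an indivisible fetch_add."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        """Indivisibly read the value, add delta, and return the old value."""
        with self._lock:
            old = self.value
            self.value += delta
            return old

counter = SharedCounter(0)
slots = [None] * 100

def worker(items):
    for item in items:
        s = counter.fetch_add(1)    # each caller receives a distinct slot
        slots[s] = item

threads = [threading.Thread(target=worker,
                            args=(range(i * 25, (i + 1) * 25),))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# Correctness is order-independent: however the fetch_adds interleave,
# every slot is claimed exactly once, so the access is "proper".
assert counter.value == 100 and sorted(slots) == list(range(100))
print("all slots filled")
```

Which thread lands in which slot varies from run to run, but the postcondition (each slot filled exactly once) is independent of the order, which is exactly the AccessOK condition on combinable operations.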
Complete proofs are onerous both to concoct and to Furthermore, Andrews makes an unwarranted assumption follow. Only one example of a complete proof will be 5,867,649 47 48 presented and that of a very Small program: Dijkstra's linear ASSertions in proof outlines can be viewed as precise Search. An array a1 ... n is given for Some positive integer comments in a programming language. They characterize n. Also given is that the value X is held by Some element of exactly what is true of the State at various points in the a. The problem is to find the first occurrence of X in a. program. Just as it is neither necessary nor useful to place a comment between every command in a program it is usually neither necessary nor useful to place an assertion before --there is some element holding the value searched for every command in a proof outline. Thus we will place { P: n>0 and (exists j : 1 <=<= n : a-x)} assertions at critical points to help provide convincing --n is positive and some element of “a holds value x evidence that a proof outline represents a proof. Some day DO (ai f= x) => i:= i+1 OD 1O I hope to create a "proof verifier” that can automatically {LS: ai = x and (forall j : 1 <= Assignment Axiom, a postualte of BPL conditions (precondition) and the final goal (postcondition). (1) { P and 1=1} i := 1 { P and i=1} Additionally, the program is expected to terminate. -> And-simplification of (1) Weakest Preconditions (2) { (P and 1=1)=P } Suppose the goal of a program S is to terminate in a State – Rule of Consequence, (1) and (2) (3) {P} i:= 1 { P and i=1} Satisfying predicate Q. The weakest precondition of com -> Assignment Axiom, (3) mand (or composition of commands) S and a predicate Q, (4) { P and (forall j : 1 <=

-> substitution for label Inv into (Inv and (ai f= x) ) (5) (Inv and (ai f= x) ) -> (P and (forall j:1<= extending the range of j in (5) (g) { (Inv and (ai f= x) ) -> (P and (forall j : 1<= Rule of Consequence, (4) and (6) (7) { Inv and (ai?= x) } { Inv -> Iterative Rule, (7) (8) { Inv DO (ai f= x) => i := i+1 OD { Inv and (a i = x) } -> Empty Range Law and (4) (9) { (P and i=1) => Inv -> Sequential Composition Rule, (8) and (9) (10) { P }

{ Inv and (a i = x) } -> Law of Equality and substitution for Inv and LS (11) { (Inv and ai-x) => LS -> Rule of Consequence, (10) and (11) (12) { P } DO (ai f= x) => i:= i+1 OD
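The proof of Dijkstra's linear search above can also be read as runtime-checkable annotations: the precondition P, loop invariant Inv, and postcondition LS become assertions. A sketch in Python; index 0 is left unused so that the 1-based indexing a[1..n] of the text carries over.

```python
# Dijkstra's linear search with the proof outline's assertions checked
# at runtime. a[1..n] holds the elements; x is known to occur in a.

def linear_search(a, x):
    n = len(a) - 1
    # {P: n > 0 and (exists j : 1 <= j <= n : a[j] = x)}
    assert n > 0 and any(a[j] == x for j in range(1, n + 1))
    i = 1
    while a[i] != x:
        # {Inv: P and (forall j : 1 <= j < i : a[j] != x)}
        assert all(a[j] != x for j in range(1, i))
        i = i + 1
    # {LS: a[i] = x and (forall j : 1 <= j < i : a[j] != x)}
    assert a[i] == x and all(a[j] != x for j in range(1, i))
    return i                      # the first occurrence of x

print(linear_search([None, 7, 3, 9, 3], 9))   # 3
```

Each assert mirrors one line of the proof: the invariant check corresponds to theorem (7), and the final check to LS, the conjunction of the invariant with the negated guard.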

AS this example illustrates even for trivial programs a 55 The function “wp” is an odd function. It takes as its full-blown proof is voluminous and hard to follow. The arguments a predicate and a command. It returns another proof outlines in the next Section present a short-hand predicate. In most cases wp(S, Q) is not even computed. It method to prove programs correct. Proof Outlines is sufficient to show that the precondition implies wp(S, Q). A proof outline, Sometimes called an annotated program, Rather than a plurality of inference rules weakest precon consists of program commands interspersed with assertions. 60 dition predicate transformers use a Single inference rule for A complete proof outline contains at least one assertion program proofs. In practice wp is easier to work with. before and after each command. A complete proof outline Suppose we have a triple encodes applications of the axioms and inference rules for each command in the program. Simply the assertions before and after a command form a triple that correspond to 65 application of the axiom or inference rule defining the behavior of the command. we wish to prove is correct. 5,867,649 49 SO Weakest Precondition Inference Rule: DOT: DO

If P => wp(S, Q), then {P} S {Q} is a theorem of BPL and hence is true. The following laws allow manipulation of expressions containing wp.

Law of Conjunction: wp(S, Q) and wp(S, R) = wp(S, Q and R)

Law of Disjunction: wp(S, Q) or wp(S, R) => wp(S, Q or R)

Note that the law of disjunction is only an implication, while the law of conjunction has equality.

Command Semantics using Weakest Preconditions

The skip command always terminates and changes no program variable: wp(skip, Q) = Q.

For an iterative command DOIT of the form

DO B1 => S1 [] B2 => S2 [] ... [] Bn => Sn OD

let BB be the disjunction of the guards, BB = B1 or B2 or ... or Bn. Let I be a predicate, and "bound" be an integer-valued expression whose value is non-negative. Then

I => wp(DOIT, I and not(BB))
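The loop-invariant and bound-function obligations can be exercised at runtime on a concrete loop. A sketch using quotient-and-remainder by repeated subtraction; the invariant I is x = q*d + r and r >= 0, and the bound function is r itself.

```python
# Runtime check of the iterative-rule obligations for a loop computing
# quotient and remainder by repeated subtraction. Illustrative only.

def divide(x, d):
    assert x >= 0 and d > 0
    q, r = 0, x
    inv = lambda: x == q * d + r and r >= 0     # the loop invariant I
    assert inv()                    # I holds before the loop
    while r >= d:                   # guard B
        bd = r                      # record the bound function's value
        assert bd > 0               # (I and BB) => bound > 0
        q, r = q + 1, r - d         # body S
        assert inv() and r < bd     # {I and B and bound = bd} S {I and bound < bd}
    assert inv() and not (r >= d)   # {I and not BB} on termination
    return q, r

print(divide(17, 5))   # (3, 2)
```

The non-negative, strictly decreasing bound guarantees the loop cannot run forever, which is the extra obligation total correctness adds beyond the invariant.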

provided

(I and Bi) => wp(Si, I) for 1 <= i <= n,

(I and BB) => bound > 0, and

(I and Bi and bound = B) => wp(Si, bound < B) for 1 <= i <= n.

The main implication is the essence of a loop invariant: it is true before execution of the loop and still true after enough executions so that all the guards are false. The invariant and a true guard must imply the weakest precondition of the command with the true guard. As long as any guard is true, the bound function is positive. Finally, execution of a command with a true guard reduces the bound function. The constantly decreasing, yet non-negative, bound function ensures termination. Derivation of loop invariants and bound functions is an art; David Gries' Science of Programming has the best description of how to determine loop invariants.

An assignment command terminates if the expression and target variable are well defined. Henceforth, we assume this to be true. If execution of x := e is to terminate in a state satisfying Q, it must start in a state in which every variable except x has the same value and x is replaced by e. As for the assignment axiom, this is exactly the transformation provided by textual substitution: wp(x := e, Q) is Q with every free occurrence of x replaced by e.

Sequential composition of commands leads to composition of wp: wp(S1; S2, Q) = wp(S1, wp(S2, Q)).

Parallel composition of commands leads to conjunction, provided the commands are AccessOK: wp(S1 & S2, Q) = wp(S1, Q) and wp(S2, Q).

An alternative command terminates if all expressions in the guards are well defined, one (or more) guard is true, and the selected command Si terminates, for an alternative command of the form:

IFFY: IF B1 => S1 [] B2 => S2 [] ... [] Bn => Sn FI

The weakest precondition for the parallel command will finish our suite of predicate transformers.

FA: FORALL i IN range DO S

wp(FA, Q) = wp(S(i1), Q) and wp(S(i2), Q) and ... and wp(S(ik), Q)

where i1, i2, ..., ik are the values i takes in the range and S(ix) is the statement S with i replaced by the value ix.

The weakest precondition predicate transformer for alternative commands is:

wp(IFFY, Q) = (B1 or B2 or ... or Bn) and (B1 => wp(S1, Q)) and ... and (Bn => wp(Sn, Q))

The iterative command has a messy weakest precondition, since it may not terminate. In fact wp(DOIT, Q) is hard to determine and not very useful. Instead, an implication will be given based on a loop invariant and bound function.

An Extended Example: Sorting

To better explain the features of the DANCE language a sorting package is presented. The syntax is "full-blown" DANCE and so is slightly different from the syntax in previous sections. The line numbers at the left are not part of the program, but allow discussion of program text.

1 package Sort[rt, ar, <]
2 is
3 -- this package contains a generic, parallel sorting procedure.
4 -- an array, its type, and a total-order operator are provided by
5 -- the calling package.
7 required
8 --these type parameters must be supplied to use the package
9 type rt; --this is the type of elements to be sorted
10 type ar is array <> of rt; --this type is the array to be sorted
11 function < (a : rt; b : rt) return bool;
12 {(c<d) and (d<e) => (c<e)}
13 --a total order operator must be supplied for the element type
14 end required
16 procedure sort( orig : IN SPREAD ar; final : OUT SPREAD ar; low, high : IN integer);
17 {pre: forall m,n : low <= m,n <= high : orig[m] /= orig[n]}
18 --the elements are distinct
19 {post: forall i,j : low <= i < j <= high : final[i] < final[j]}
44 (low = high) => final[low] := orig[low]
45 {forall i,j : low <= i < j <= high : final[i] < final[j]}
46 [] --this symbol can be read as "or if"; it is two square brackets
47 (low > high) => skip
48 []
49 (low < high) =>
50 begin
51 partition(low, high, orig, inter, pivot)
52 {forall m,n : (low <= m < pivot < n <= high) : inter[m] < inter[n]}
53 ; --explicit sequential composition
54 sort( inter[low..pivot-1], final[low..pivot-1], low, pivot-1)
55 {forall i,j : low <= i < j <= pivot-1 : final[i] < final[j]}
56 & --parallel composition
57 sort( inter[pivot+1..high], final[pivot+1..high], pivot+1, high)
58 {forall i,j : pivot+1 <= i < j <= high : final[i] < final[j]}
59 &
60 final[pivot] := inter[pivot]
61 end
62 fi
63 {post: forall i,j : low <= i < j <= high : final[i] < final[j]
64 AND final = permutation(orig) }
65 end --of sort procedure
67 procedure partition( i,j : IN integer; a[i..j] : IN SPREAD ar;
68 hold[i..j] : OUT SPREAD ar; k : OUT integer)
69 is
70 {pre: forall m,n : i <= m,n <= j : a[m] /= a[n]}
71 --the elements are distinct
72 {post: forall m,n : (i <= m < k < n <= j) : hold[m] < hold[k] < hold[n]}
73 --the lower part's values are less than the values in the upper part
74 declare
75 low_index := i, high_index := j : SHARED integer;
76 h := (i+j) DIV 2 : integer;
77 --choose an entry from a[i..j] to use as a pivot.
78 --pick from the middle: h = (i+j) DIV 2
79 begin
81 forall t in i..j do
82 begin
83 if
84 (a[t] < a[h]) =>
85 declare
86 s : integer; --declare locals
87 begin
88 fetch_add(low_index, 1, s);
89 --fetch_add indivisibly reads the value in the first
90 --variable, adds the second parameter returning the
91 --result in the third; it is combinable
92 hold[s] := a[t]
93 {forall m : i <= m <= s : hold[m] < a[h]}
94 end
95 []
96 (a[h] < a[t]) =>
97 declare
98 s : integer;
99 begin
100 fetch_add(high_index, -1, s);
101 hold[s] := a[t]
102 {forall m : s <= m <= j : a[h] < hold[m]}
103 end
104 []
105 (a[t] = a[h]) => skip
106 fi
107 end --of forall
108 ;
109 k := low_index & hold[low_index] := a[h]
110 {forall m,n : (i <= m < k < n <= j) : hold[m] < hold[k] < hold[n]}
111 end --of partition
113 end body --of package ParallelSort:Sort
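The control flow of "sort" and "partition" can be mimicked sequentially in Python. In this sketch, `fetch_add` stands in for the combinable operation (returning the old value, which matches how `low_index` and `high_index` claim slots above), and the recursion mirrors the if-fi of "sort". It illustrates the data movement only, not the concurrent semantics; elements are assumed distinct, as the precondition requires.

```python
# Sequential sketch of the fetch-add partition and recursive sort above.

def fetch_add(cell, delta):
    """Indivisibly read cell[0], then add delta (combinable in hardware)."""
    old = cell[0]
    cell[0] += delta
    return old

def partition(a, i, j):
    """Scatter elements below the pivot to the low end of `hold` and
    above it to the high end; return (hold, k) with hold[k] = pivot."""
    h = (i + j) // 2                  # pick the pivot from the middle
    pivot = a[h]
    hold = [None] * len(a)
    low, high = [i], [j]              # low_index and high_index cells
    for t in range(i, j + 1):         # the FORALL runs these concurrently
        if a[t] < pivot:
            hold[fetch_add(low, 1)] = a[t]
        elif pivot < a[t]:
            hold[fetch_add(high, -1)] = a[t]
    k = low[0]                        # one past the last smaller element
    hold[k] = pivot                   # drop the pivot into the hole
    return hold, k

def sort(a, low, high):
    """The if-fi of "sort": empty, single element, or partition and recurse."""
    if low > high:
        return []
    if low == high:
        return [a[low]]
    hold, k = partition(a, low, high)
    return sort(hold, low, k - 1) + [hold[k]] + sort(hold, k + 1, high)

print(sort([3, 8, 1, 5, 9, 2, 7], 0, 6))   # [1, 2, 3, 5, 7, 8, 9]
```

Because `fetch_add` hands each competing element a distinct slot, it does not matter which concurrent subinterval gets which index, exactly the point made about "hold" in the walkthrough below.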

Line 1 begins the package declaration called "Sort." The package declaration, which continues through line 23, defines an abstract data type (ADT). ADTs are more powerful than constructed data types in that they contain operations (functions and/or procedures) that manipulate internal, hidden data structures. DANCE packages are even more powerful than most ADTs in that they allow parametric or inclusion polymorphism. Line 1 includes three parameters: "rt", "ar", and "<". These parameters are necessarily types, sets of values. The bounded universal type quantification affected by these parameters allows the package to be used with many other (currently unspecified) types so long as they uphold the bounds required of them.

Line 2 follows Ada syntax; "is" commonly connects the header of a declaration with its body.

Lines 3, 4, & 5 are comments. Comments begin with two dashes and continue to the end of the line.

Lines 7 through 14 are a requirement clause. It demands that "rt" be any type whatsoever; "ar" be a one-dimensional array of unspecified length whose elements are of type "rt"; and "<" be a two-input function on elements of type "rt" returning a boolean result. The assertion in line 12 demands that the function "<" be transitive, otherwise sorting will not produce an ordered list of elements.

Line 16 declares that the package contains a single item, the procedure "sort", with four parameters: "orig" of type "ar" to be supplied by the caller (IN), "final" which holds the result (OUT), and "low" and "high" which indicate the range of "orig" to be sorted. The reserved word "SPREAD" causes the dynamic memory allocation of "orig" and "final" to put parts of it in many different memory banks (processors) so that "orig" and "final" may be concurrently accessed without contention. When concurrently accessible data structures are placed in a single memory bank, memory conflicts can significantly degrade performance. Line 17 demands as the precondition for invoking "sort" that the elements of the array are distinct: no duplicates. Modifying the code to handle duplicates requires an additional array to hold multiple matches of the chosen pivot. Lines 19 and 20, the postcondition, assure that when "sort" is done, the elements will be in sorted order. Other programs that invoke "sort" can use the postcondition in their own proof outlines. Such hierarchical proofs are essential for large programs. Line 23 concludes the package declaration for "Sort".

Line 31 declares that the following package body is an object named "ParallelSort" of type "Sort". Because "ParallelSort" is of type "Sort" the programmer using it need know nothing about its internal details, just that it operates as specified by "Sort". In fact there could be many different objects of type "Sort". Sequential "BubbleSort" or "QuickSort" could be written, also of type "Sort", that would perform the same operation, but in a different way. A program written to use type "Sort" could use any of the different "ParallelSort", "BubbleSort", or "QuickSort" with equal correctness, but different performance.

Line 33 begins the declaration of the procedure "sort" and must exactly match the declaration in the package "Sort". Lines 35 through 38 introduce local variables "inter" and "pivot" by existential quantification. They are only visible within the following begin-end block which corresponds to a domain when the program is executed.

Line 40 begins the body of "sort", which is just one big alternative command (if-fi). Line 41 restates the precondition. Line 43 begins the alternative command which stretches to line 62. Line 44 contains one of the alternatives. The guard, "(low = high)", is evaluated and if true, the body may be executed. (If more than one guard is true the choice of body is nondeterministic.) In this case an array with a single element is already sorted. Line 45 is the assertion established by executing the first alternative. Note that it is the same as the overall postcondition for the whole procedure. Line 46, the "box", separates alternatives; it can be read as "or if" and is written as two square brackets. Line 47, which should never be executed, is necessary to ensure that at least one guard is true.

Line 49 guards the third alternative in which there is more than one thing to sort. Line 51 invokes the "partition" procedure. Line 52 asserts that the data have been divided (partitioned) such that one part holds elements that are all less than the pivot and elements in the other part are greater than the pivot. This corresponds to the postcondition of "partition". Line 53, a simple semicolon, represents sequential composition of temporal logic formulae. Lines 54 and 57 recursively invoke "sort" on each part divided by "partition". Line 56, an ampersand, represents parallel composition; the two sorts may execute concurrently. Lines 55 and 58 assert that each part is in order, which implies lines 63 and 64, the postcondition that the whole array (of this invocation) is in order. Lines 59 and 60 copy the value of the pivot concurrently with the sorting of lines 54 and 57. Line 65 marks the end of the procedure "sort".

Lines 67 and 68 declare the procedure "partition" and its parameters. Note that the arrays passed must be "SPREAD". Lines 70 and 72 are the precondition and postcondition, respectively, for "partition". The postcondition asserts that the pivot, "k", divides "hold" into greater and lesser parts.

Lines 74, 75, and 76 introduce local variables "low_index", "high_index", and "h". Initially "low_index", "high_index", and "h" have values "i", "j", and the average of "i" and "j" respectively. The reserved word "SHARED" demands that the declared variables reside on a non-cacheable page; all accesses must use the Layered network even if the variable happens to reside in local memory.

Line 79 begins the main body of procedure "partition". Any value for "h" between "i" and "j" inclusive will satisfy the assertion in line 80; the average was chosen to minimize pernicious data arrangements.

Line 81 uses the parallel command to create j-i+1 subintervals, each with a different value for "t" from "i" to "j" inclusive, to be executed concurrently. Since the parallel command is really bounded, universal, temporal quantification, "t" is really a constant and must not be assigned in any subinterval. Line 82 begins the block for which values of "t" are valid.

Line 83 begins an alternative command. Line 84 compares the value of the element of "a" unique to this subinterval with the common pivot using the total order operator that must be supplied to use the package. Lines 85 and 86 create a new local variable "s" that is used in line 88 to determine the index to which this element "a[t]" should be assigned. In this case "a[t]" is smaller than the pivot "a[h]" so "low_index" is used to assign a slot in the lower half of "hold" in line 92. Since "low_index" was declared "SHARED", any number of fetch-adds may occur at the same clock cycle with different values for "s" returned as if they had been executed in some sequential order. We don't really care which element of "hold" gets the value of "a[t]" so long as it's in the lower part (less than the pivot) and no other subinterval tries to assign the same element. Line 93 asserts that all the elements between "i" and "s" inclusive are smaller than the pivot.

Line 95 separates alternatives of the alternative command starting at line 83. Lines 96 through 103 are similar to lines 84 through 94 except that the value of "a[t]" is assigned to the high end of "hold". Line 105 handles the pivot and does nothing. Line 106 ends the alternative command; line 107 ends the parallel command; line 108 performs sequential composition. Line 109 copies the index of the pivot to "k" and the value of the pivot into the hole left in "hold".

The assertion on line 110 states that the elements on the low end are smaller than the elements on the high end. The index arithmetic is tricky; "k" will be left one greater than the index of the last element considered that was smaller than the pivot, which is the hole where the pivot's value goes. Line 110 is the postcondition that procedure "partition" seeks to establish. Line 111 ends "partition"; line 113 ends the whole package body of "ParallelSort".

To invoke "ParallelSort", the package must be imported and specialized. Use of "ParallelSort" to sort employee records alphabetically by last name is shown below. In a declaration part:

import sort from ParallelSort:Sort
  rt => employee_record,
  ar => employee_array,
  < => (employee_record.last_name < employee_record.last_name)
end import

In the body of the calling procedure:

ParallelSort.sort(unsorted_team, sorted_team, 1, number_of_Swedes)

Of course "sort" could be used several different ways in the same procedure provided that it is imported with appropriate type parameters each time. This package is also used for an example of parallel execution in OPERATION OF INVENTION, infra.

Example Programs

The following DANCE programs are selected to demonstrate the efficiency and elegance of DANCE compared with the code in The Design and Analysis of Parallel Algorithms, Selim G. Akl, Prentice-Hall, 1989.

Selection

Akl dives into "selection" first. His algorithm picks the kth largest item from an unordered sequence. He then uses selection to pick the median of the sequence for partitioning of the sequence as part of sorting. He claims optimality (in order terms) for his algorithm, but it is so twisted and hard to follow, his claims are difficult to verify. However for this algorithm to have any chance, the number of elements in the sequence must be much larger (orders of magnitude) than the number of processors.

Merging

Akl needs merging to sort. DANCE doesn't. Akl devotes a long chapter to merging on various models and algorithms to merge two sorted sequences into a single sequence. Merging is an excellent example of a simple, efficient sequential program becoming a monstrous parallel program.

Sorting

Akl then uses his inelegant selection and merging to concoct a slow sorting algorithm. Sorting with combinable operations is easy to execute. The trick is to use fetch-add operations to determine the array index during partition. An arbitrary element is chosen as the pivot. Each element is then compared to the pivot and if it is smaller, put at the beginning of the array, otherwise put at the end of the array.

Isoefficiency analysis: The DANCE program used as an example earlier is similar to "quicksort" which recursively partitions the list into larger and smaller halves, and then sorts the halves. The recursion stops when a partition has but a single element. If a median value is used then the algorithm requires logN waves of partitioning. However finding the median requires examining every element which is really wasted effort. If an arbitrary element is chosen as the pivot then O(logN) waves are needed but each wave is much faster. Picking an arbitrary pivot element takes Ucalc time, the unit time of a sequential computer to perform a single operation, roughly one instruction. Comparing an element with the pivot, computing its new index, and copying it takes 3*Ucalc time. Therefore the amount of time for each wave is 3*N*Ucalc plus the number of pivot picks times Ucalc. On the first wave a single pivot will be selected. On each wave the number of partitions roughly doubles, so throughout the whole sorting procedure about N will be chosen. Since there will be (on average) logN waves the total amount of time for quicksort is 3*N*logN*Ucalc + N*Ucalc.

The analysis of the DANCE sort (on a Multitude layered network parallel processor with combinable operations) is similar except that partitioning (and sorting) can be done in parallel.
Choosing the pivot must be done before partitioning, so we incur some "Amdahl's law" penalty, but since many pivots can be chosen simultaneously as the algorithm progresses, and large partitions will automatically have more processors assigned to them, the penalty is negligible. The sacrifice taken in performance is a single Ucalc per wave (hence the logN term below). So the total amount of time to compute in parallel is (3*N*logN*Ucalc)/P + logN*Ucalc. To maintain the same efficiency, the ratio of To to Tcalc must be held constant. So the parallel quicksort is (order) optimal. Increasing the number of processors proportionally with the number of elements maintains the same efficiency. Furthermore, it is expected that real running times with P << N will have extremely high real efficiencies since the vast majority of operations are completely parallelizable.

Matrix Manipulation

Matrix operations are obligatory for many numerical analysis problems and many other problems such as connected components.

function Transpose(B : matrix; s : integer) return matrix
--the type "matrix" must be declared elsewhere
is
declare
C : SPREAD matrix;
begin
forall i,j in 1..s,1..s do C[i,j] := B[j,i]
return C
end --of Transpose

function Multiply(A,B : matrix; s : integer) return matrix
is
declare
C : SPREAD matrix := 0
begin
forall i,j in 1..s,1..s do
declare k : integer := 1; --local k for each parallel interval
begin
{inv: (sum m : k <= m <= s : A[i,m]*B[m,j]) + C[i,j] = val}
{bound: s-k >= 0}
do (k <= s) =>
begin
C[i,j] := A[i,k] * B[k,j] + C[i,j]
k := k+1
end
od
end
return C
end --of Multiply

For both of these matrix operations the parallel versions do no extra work so the efficiency is 1 with unit-linear isoefficiency. Well, not quite, since the burden of scheduling requires a fetch-add to get an index and a DIV-MOD to extract the i,j indices of the element of matrix C to be calculated. Still, for large matrices that overhead is negligible. Akl presents some ugly algorithms for matrix transposition and multiplication for mesh, shuffle, and cube architectures, vividly demonstrating how hard it is for programmers to map their algorithms onto static network architectures (Akl, ch. 7). Furthermore his CRCW multiplication algorithm (Akl p. 187) has a serious flaw in that his definition for write conflicts has "the sum of the numbers to be written is stored in that location." This risks not having all writes exactly simultaneous, thereby overwriting partial sums already computed.

Linear Equation Solution

The oldest of all linear equation solution algorithms is Gauss-Jordan elimination. Given a matrix A and a vector b, determine a vector x such that Ax = b. Akl's CREW SIMD algorithm (Akl, p. 231) does not test for division by zero yet makes no explicit requirement that the matrix be nonsingular or have a non-zero diagonal.

function gauss_jordan (a : matrix; b : vector; size : integer) return vector
is
declare
Ap : SPREAD matrix[size,size] := a;
x : SPREAD vector[size] := b;
i,k : SHARED integer;
j : integer := 1;
begin
{pre: forall m : 1 <= m <= size : a[m,m] /= 0} --non-zero diagonal
do (j <= size) =>
begin
forall i,k in 1..size,j..size+1 do
begin
if
(i = j) => skip
[]
not(i = j) =>
if
(k > size) => x[i] := x[i] - (Ap[i,j]/Ap[j,j])*x[j]
[]
not(k > size) => Ap[i,k] := Ap[i,k] - (Ap[i,j]/Ap[j,j])*Ap[j,k]
fi
fi
end
j := j+1
end
od
forall i in 1..size do
x[i] := x[i] / Ap[i,i]
return x
end --of Gauss-Jordan

This adaptation of Akl's Gauss-Jordan algorithm seems to be correct, but no loop invariant has been created to prove it.

The following matrix invert procedure was developed from scratch and has a loop invariant. Since I = A^-1*A (where I is the identity matrix) we can multiply (on the right side) by a sequence of matrices Bi, transforming A into the identity matrix and I into A^-1. When D is transformed into I then C = A^-1*I = A^-1. So how are the Bi matrices determined? Well, we aren't going to exactly form the matrices Bi, but rather perform the same operations on each side: first normalizing the diagonal element and then subtracting a multiple of that row from each other row. So if initially C = I and D = A then the invariant will be C = A^-1*D.

procedure matrix_invert(A : IN OUT SPREAD matrix; size : integer; singular : OUT bool)
is
declare
C : SPREAD matrix[size,size] := 0;
D : SPREAD matrix[size,size] := A;
i : SHARED integer;
j : integer := 1;
begin
singular := F;
forall i in 1..size do C[i,i] := 1
{C = I}
do {inv: C = A^-1*D}
{bound: size-j >= 0}
(j <= size) and not (singular) =>
begin
--normalize diagonal element
if
(D[j,j] /= 0) =>
forall i in 1..size do C[j,i], D[j,i] := C[j,i]/D[j,j], D[j,i]/D[j,j]
[]
(D[j,j] = 0) =>
--pivot matrix by adding a lower row with a non-zero element
declare k : integer := j+1;
begin
do (k <= size) and (D[k,j] = 0) => k := k+1 od
if
(k > size) => singular := T
[]
not(k > size) =>
begin
--pivot found in row k, add to row j
forall i in 1..size do C[j,i], D[j,i] := C[j,i] + C[k,i], D[j,i] + D[k,i]
--now normalize
forall i in 1..size do C[j,i], D[j,i] := C[j,i]/D[j,j], D[j,i]/D[j,j]
end
fi
end --of pivoting to get non-zero diagonal element
fi --of normalizing diagonal element
--now subtract that row (scaled by D[i,j]) from all the others
forall i in 1..size do
if
(i = j) => skip --the skip command does nothing
[]
not(i = j) =>
declare k : SHARED integer;
begin
--D[i,j] is the multiple of row j to subtract
forall k in 1..size do C[i,k], D[i,k] := C[i,k] - C[j,k]*D[i,j], D[i,k] - D[j,k]*D[i,j]
end
fi
j := j+1
end
od
{C = A^-1 AND D = I AND (j > size) OR singular}
--copy inverse back to A if not singular
if
(singular) => skip
[]
not(singular) => A[1..size, 1..size] := C[1..size, 1..size]
fi
end --of matrix inversion
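The same normalize-and-eliminate sweep, with its invariant C = A^-1*D, can be sketched in plain Python. Dense lists stand in for SPREAD matrices and ordinary loops stand in for the forall commands; this is an illustrative sequential reading, not the concurrent procedure itself.

```python
# A plain-Python sketch of the inversion sweep above: drive D from A
# to I by row operations while C accumulates A^-1 (invariant C = A^-1*D).

def matrix_invert(A):
    """Invert square matrix A by row operations; returns (C, singular)."""
    n = len(A)
    C = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # C = I
    D = [row[:] for row in A]                                           # D = A
    for j in range(n):
        if D[j][j] == 0:                    # pivot: add a lower row
            k = next((r for r in range(j + 1, n) if D[r][j] != 0), None)
            if k is None:
                return C, True              # singular
            for i in range(n):
                C[j][i] += C[k][i]
                D[j][i] += D[k][i]
        d = D[j][j]                         # normalize the diagonal element
        for i in range(n):
            C[j][i] /= d
            D[j][i] /= d
        for i in range(n):                  # subtract row j from the others
            if i != j:
                m = D[i][j]                 # the multiple of row j to subtract
                for k in range(n):
                    C[i][k] -= m * C[j][k]
                    D[i][k] -= m * D[j][k]
    return C, False                         # now D = I and C = A^-1

Cinv, singular = matrix_invert([[4.0, 7.0], [2.0, 6.0]])
print(Cinv)        # approximately [[0.6, -0.7], [-0.2, 0.4]]
```

In the DANCE version each of these inner loops is a forall running in parallel across a row, which is where the speedup comes from.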

The above matrix inversion algorithm allows computation of any number of "x" vectors by simple matrix-by-vector multiplication. It can use up to n processors for a running time of O(n(1+n)) = O(n^2). Therefore the overall cost is O(n^3). Akl claims the sequential time is O(n^x) where 2 < x < 3. Furthermore I am sure that anomalies like the

function rev(x, size : integer) return integer
is --reverse the bits of a number x that is size bits long
declare
j : integer := size;
k : integer := 0;
begin
do (j > 0) =>
begin
if
(x DIV 2**(j-1) > 0) =>
begin
k := k + 2**(size-j)
& x := x - 2**(j-1)
end
[]
not(x DIV 2**(j-1) > 0) => skip
fi
j := j-1
end
od
return k
end --of rev (it reverses bits)

procedure fft( A : IN ivector; B : OUT ivector; size : IN integer)
is
declare
C,D : ivector; --an ivector is a vector of complex numbers
--supplied by a package with complex data
--structures and operations as yet uncoded
h,p,q,k : integer;
z,w : complex;
n : integer := 2**size --n is the number of elements in the sample
begin
h := size - 1 --h is the loop variable counting the number of
--butterfly stages
& --w is this goofy constant, complex number
--used in the FFT formula
--the "i" is the square root of -1 as a defined constant in
--the complex arithmetic package
C := A --assign whole vector to C
do (h >= 0) =>
begin
p := 2**h --difference of indexes that are butterflied
& q := 2**(size-h)
& D := C --make copy of current vector
forall k in 0..n-1 do
if
((k MOD p) = (k MOD 2*p)) =>
[]
not((k MOD p) = (k MOD 2*p)) => skip
fi
h := h-1
end
od
forall k in 0..n-1 do B[k] := C[rev(k, size)]
end --of fft
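The bit-reversal index map that `rev` computes is easy to check with a direct Python transcription. This sketch assumes the true branch both sets the mirrored bit in k and strips the tested bit from x (which is what makes the `x DIV 2**(j-1) > 0` test identify bit j-1 on each pass).

```python
# Python sketch of rev: bit (j-1) of x moves to bit (size-j) of the result.

def rev(x, size):
    """Reverse the low `size` bits of x."""
    k = 0
    for j in range(size, 0, -1):
        if x // 2 ** (j - 1) > 0:    # bit j-1 is the highest bit left in x
            k += 2 ** (size - j)     # set the mirrored bit in the result
            x -= 2 ** (j - 1)        # strip it so the next test stays valid
    return k

print(rev(0b0011, 4) == 0b1100)   # True
```

`rev` is an involution (`rev(rev(x, size), size) == x`), which is what the final gather `B[k] := C[rev(k, size)]` relies on to put every element in its proper place.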

This program is unproven. Rather it was adapted directly from Akl.

Connected Components

Determine for each node (vertex) in a graph its "component number." A component number is the lowest numbered vertex the node is connected to. In this way all nodes that are connected have the same component number, and just as importantly all nodes not reachable have different component numbers.

FUNCTION components( A : array [0..size-1, 0..size-1] OF bool; size : integer)
RETURN array [0..size-1] OF integer
IS
--A holds the adjacency matrix of the graph
--that is A[x,y] is true
--iff node x is directly connected to node y
--the vector returned contains the component numbers: the
--lowest numbered node that can be reached in the graph
DECLARE
B : array [0..size-1, 0..size-1] OF bool;
t : array [0..size-1] OF integer;
i : integer;
BEGIN
--A[x,y] is true if node x is directly connected to node y
{forall x,y : 0 <= x,y < size : A[x,y] => Connect(x,y)}
connectivity(A, B, size) --B contains a matrix
--B[x,y] is true if there is a path from node x to node y
{forall x,y : 0 <= x,y < size : B[x,y] => Path(x,y)}
FORALL i IN 0..size-1 DO
DECLARE j : integer := 0;
BEGIN
--find the lowest numbered node we're connected to
{inv: forall k : 0 <= k < j : NOT(B[i,k])}
DO NOT(B[i,j] OR (i=j)) => j := j+1 OD
{inv AND (B[i,j] OR (i=j))}
t[i] := j
{t[i] = j AND (B[i,j] OR i=j) AND inv}
END
{post: forall v : 0 <= v < size : (forall k : 0 <= k < t[v] : NOT(Path(v,k))) OR t[v] = v}
--this is yet another adaptation of Dijkstra's linear search

RETURN t
END --of components

PROCEDURE connectivity ( B : IN SPREAD array [0..size-1, 0..size-1] OF bool;
C : OUT SPREAD array [0..size-1, 0..size-1] OF bool;
size : IN integer)
IS
--given an adjacency matrix B, compute reachability matrix C
--that means C[x,y] is true if there is some set of nodes n1, n2, ... nk
--so that B[x,n1], B[n1,n2], ... B[nk,y] are all true
DECLARE
D : SPREAD array [0..size-1, 0..size-1] OF bool;
i : integer := 1;
limit : integer := integer(log2(size)); --how many iterations?
--the second "integer" is an invocation of type conversion
BEGIN
C := B --copy input
{forall x,y : 0 <= x,y < size : C[x,y] => Path(x,y)}
{forall x,y : 0 <= x,y < size : C[x,y] => MaxPathLength(x,y) = 1}
{(V => MaxPathLength = u) => (V*V => MaxPathLength = 2*u)}
--see Akl page 254 for justification

DO {inv: C => (MaxPathLength = 2**(i-1))}
{bound: limit - i >= 0}
(i < limit) =>
BEGIN
C := Multiply(C,C) --binary matrix multiplication
i := i+1
END
OD
{C => (MaxPathLength = size)}
{forall x,y : 0 <= x,y < size : C[x,y] => Path(x,y)}

RETURN C END --of connectivity
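The components/connectivity pair can be modeled directly in Python: square the boolean adjacency matrix about log2(size) times, then take each row's lowest reachable vertex with the Dijkstra-style linear search. Reflexive reachability is made explicit here so a node always reaches itself; names and representation are illustrative.

```python
# Python sketch of connectivity (repeated boolean matrix squaring)
# and components (lowest reachable vertex per row).

import math

def connectivity(B):
    """Reachability closure: each squaring doubles the covered path length."""
    n = len(B)
    C = [[B[x][y] or x == y for y in range(n)] for x in range(n)]
    for _ in range(max(1, math.ceil(math.log2(n)))):
        C = [[any(C[x][m] and C[m][y] for m in range(n)) for y in range(n)]
             for x in range(n)]
    return C

def components(A):
    """Component number = lowest-numbered vertex reachable from each node."""
    C = connectivity(A)
    t = []
    for i in range(len(A)):
        j = 0
        while not (C[i][j] or i == j):   # linear search, one row per process
            j += 1
        t.append(j)
    return t

# edges 0-1 and 2-3: two components, each numbered by its lowest vertex
adj = [[False, True, False, False],
       [True, False, False, False],
       [False, False, False, True],
       [False, False, True, False]]
print(components(adj))   # [0, 0, 2, 2]
```

Connected nodes share a component number and unreachable nodes get different ones, matching the definition of "component number" above.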

Isoefficiency Analysis: Connected components calls "connectivity" and then uses linear search in parallel. Connectivity calls "multiply" logN times. Multiply takes O(N^3) operations, so the work involved is O(N^3 * logN). Matrix multiplication can use P = N^2 processors but the linear search for the smallest one can only use N. So if you had enough processors, connected components could be computed in O(logN) time. But if you want to maintain perfect efficiency = 1 (no extra work except for scheduling) you can only use N processors. On a time-shared Multitude layered network parallel processing system with many applications vying for all the computing resources they can get, the unused processing power during linear search will be quickly consumed.

Parallel Prefix

Parallel prefix takes a vector and returns another vector containing partial "sums" of the given vector. The summing may be simple addition, or a more complex commutative operation. Parallel prefix arises in a surprisingly large number of algorithms.

procedure parallel_prefix(var spread A : the_array; N : integer);
declare
M : spread array[0..log2(N)] of the_array;
j : shared integer;
begin
{A1 = A}
forall s in 1..N do M[s,0] := A[s]
; {(s in 1..N) => M[s,0] = A[s]}
j := 1
do {inv: forall n in 0..j-1 : M[i,n] = sum k in i-2**n+1..i : A1[k]}
{bd: log2(N)-j}
(j <= log2(N)) =>
begin
forall s in 1..N do
if
(s-2**j > 0) -> M[s,j] := M[s,j-1] + M[s-2**(j-1), j-1]
[]
not(s-2**j > 0) -> M[s,j] := M[s,j-1]
fi
j := j+1
end
od
forall s in 1..N do A[s] := M[s, log2(N)]
end; --of parallel prefix
{A[s] = sum k in 1..s : A1[k]}

BPL Soundness Proof

The proof system BPL is sound for total correctness. A proof system is "sound" if it derives true facts from true facts. The proof system BPL concerns DANCE programs annotated with proof outlines. The soundness of BPL allows conclusion that a program is indeed correct. This is not to say that every correct program has a proof in BPL; that is a different property, "completeness." Rather, BPL is likely "incomplete" like the proof systems developed early this century by Russell, Whitehead, and Gödel. For DANCE programs, we seek to verify given proofs of correctness, not search for proofs of arbitrary programs.

Due to the form of the proof system BPL, it is sufficient to prove that all axioms are true in the sense of total correctness and that all the inference rules of BPL are sound for total correctness. Then the result follows by induction on the length of proofs. We consider all axioms and proof rules in turn. Where applicable, soundness of the weakest-precondition (wp) predicate transformer definition is proven as well. Including wp into BPL expands the ways to prove a program is correct.

Skip

model: M[[SKIP]](i) = true.
grammar: <command> ::= "SKIP"
proof: SKIP makes no state change whatsoever. Whatever predicates are true before execution are true afterward, so both the axiom and wp for SKIP are sound.

Assignment

model: M[[x := e]](i) = true iff the value of x at end(i) equals the value of e at start(i).
grammar: <command> ::= <variable> ":=" <expression>
proof of axiom: If the predicate P with all occurrences of x replaced by e is true at the beginning of the interval i then P will be true at the end of the interval, because the value held by x at end(i) is determined by the value of e at start(i). This is exactly the same as replacing all occurrences of x with e in P, so the axiom for assignment is sound.
proof of wp: Applying the Weakest Precondition Inference Rule:

Since by the Axiom of Assignment, {Q[x:=e]} x := e {Q}, and P => Q[x:=e], then by the Rule of Consequence, {P} x := e {Q}. So wp for assignment is sound.

Sequential Composition

model: M[[w1 ; w2]](i) = true iff there exist two subintervals of i, j and k, such that M[[w1]](j) = true, M[[w2]](k) = true, and end(j) = start(k).
grammar: <command> ::= <command> ";" <command>
proof of rule: Let w1 = S1 and w2 = S2. Given that j satisfies {P} S1 {Q} and k satisfies {Q} S2 {R}, and Q satisfies end(j) = start(k), then i satisfies the model. Since by construction P satisfies start(j) = start(i), and R satisfies end(k) = end(i), interval i satisfies {P} S1; S2 {R}. So the Sequential Composition Rule is sound.
proof of wp: Let R = wp(S2, Q); then by definition of wp, {R} S2 {Q}. Let P = wp(S1, R); then by definition of wp, {P} S1 {R}, and so, by the Sequential Composition Rule, {P} S1; S2 {Q}. So wp for sequential composition is sound.

Concurrent Composition

model: M[[w1 & w2]](i) = true iff EXIST j,k, subintervals of i, such that (M[[w1]](j) = true) AND (M[[w2]](k) = true) AND (start(j) = start(k) = start(i)) AND (end(j) = end(k) = end(i)).
grammar: <command> ::= <command> "&" <command>
Rule:

{P1} S1 {Q1} and {P2} S2 {Q2} and AccessOK(S1, S2)
--------------------------------------------------
{P1 and P2} S1 & S2 {Q1 and Q2}

AccessOK(S1, S2) is defined as:

AccessOK(S1, S2) = forall Vi : Vi in S1 U S2 :
(Vi not in lhs(S1) U lhs(S2)) or
(Vi in S1 and Vi not in S2) or (Vi in S2 and Vi not in S1) or
((Vi in co(S1) and Vi in co(S2)) and (co(S1) subset co(S2) and co(S2) subset co(S1)))

lhs(S) = {v | v := e in S}
co(S) = {v | fetch_add(v, e, v2) in S or swap(v, e, v2) in S or ... }

wp: AccessOK(S1, S2) => wp(S1 & S2, Q) = wp(S1, Q) and wp(S2, Q)

proof: Both the rule and wp for concurrent composition invoke the strange relation AccessOK. AccessOK allows noninterference to be proved by definition. For now, consider only intervals lacking combinable operations. AccessOK(A,B) => InterferenceFree(A,B). The relation AccessOK is stronger than Owicki-Gries noninterference (not falsify a proof outline, A&O p. 216). There exist lattices that uphold noninterference but not AccessOK. Such intervals are disallowed. The premise of the rule states {P1} S1 {Q1} and {P2} S2 {Q2} and AccessOK(S1, S2). AccessOK(S1, S2) => InterferenceFree(S1, S2), which means that S1 cannot affect the truth of Q2, and S2 cannot affect the truth of Q1. Therefore, by the model for M[[S1 & S2]], intervals j and k are constructed such that M[[S1]](j) = true and M[[S2]](k) = true, and by construction M[[P1 AND P2]] = true at start(i) and M[[Q1 AND Q2]] = true at end(i); therefore we can finally conclude {P1 and P2} S1 & S2 {Q1 and Q2}. So the rule is sound.

proof of wp: The weakest precondition predicate transformer definition uses an implication: AccessOK(S1, S2) => wp(S1 & S2, Q) = wp(S1, Q) and wp(S2, Q). So only when the temporal commands are AccessOK will S1 & S2 have a defined meaning. Since we are interested in programs that have meaning, we can assume AccessOK(S1, S2) is true. Therefore wp(S1, X) is independent of wp(S2, X) whatever the predicate X. By the model, (start(j) = start(k) = start(i)) AND (end(j) = end(k) = end(i)), which means that both wp(S1, Q) and wp(S2, Q) must be true at the exact same state, namely start(i). The fact that the intervals are lattices of states is crucial. The way that interval i is composed from separate (independent, interference-free) intervals j and k allows proof that their concurrent execution does both things at the same time as if they had been executed alone.
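Because AccessOK is stated purely in terms of which variables a command assigns, reads, or touches only through combinable operations, it is a decidable check on variable sets. A toy Python model shows the shape of the test; the read/write/combinable summaries and all names here are assumptions for illustration, not the BPL formalism itself.

```python
# A toy AccessOK check: each command is summarized by the variables it
# touches (vars), assigns (lhs), and accesses only through combinable
# operations such as fetch-add (co). Illustrative names throughout.

def access_ok(s1, s2):
    """True when S1 & S2 cannot interfere: every variable is read-only,
    used by only one side, or combinable on both sides."""
    for v in s1["vars"] | s2["vars"]:
        written = v in s1["lhs"] or v in s2["lhs"]
        shared = v in s1["vars"] and v in s2["vars"]
        combinable = v in s1["co"] and v in s2["co"]
        if written and shared and not combinable:
            return False
    return True

fetch_add_low = {"vars": {"low"}, "lhs": set(), "co": {"low"}}
write_x = {"vars": {"x"}, "lhs": {"x"}, "co": set()}
read_x = {"vars": {"x"}, "lhs": set(), "co": set()}

print(access_ok(fetch_add_low, fetch_add_low))  # True: fetch-adds combine
print(access_ok(write_x, read_x))               # False: write/read conflict
```

This mirrors why the partition procedure earlier is safe: every concurrent subinterval touches `low_index` and `high_index` only through fetch-add, and each writes a distinct slot of `hold`.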
Means of defining variables, Which is exactly the meaning of their scope (visibility), an initial value, and type. 15 {P}FORALL i IN range DOS {Q} grammar: So the rule is Sound. “: proof of wp: < type > (“ = < expression >)+* B E GIN" AccessOK(S)-wp(FORALL i IN range DOS, Q) =wp

Nominally, a collection of processors that share memory via a Layered network constitutes a Multitude architecture computer, as shown in FIG. 23. Each processor (also called a node) has its own microprocessor (CPU 10), cache memory 12, memory management unit (MMU 14), memory, input/output controller (IOC 16), and Layered network interface 18. The Layered network interface comprises a request chip 20 and a response chip 22. The processor accesses remote memory banks through the request chip, which initiates a network request. The request is routed to the response chip of the designated processor, which accesses local memory and sends a response back through the network following the same path back to the original request chip. A processing node is shown in FIG. 24.

The five addressing modes (described further below) take the following forms, where C(addr) denotes the contents of memory location addr:

    mode:   immediate   direct    indirect     displacement     indexed
    look:   #n          addr      (addr)       (addr+disp)      (addr+indx)
    value:  constant    C(addr)   C(C(addr))   C(C(addr)+disp)  C(C(addr)+C(indx))

In all cases, addr must be an address (sometimes of a memory location containing another address), disp must be an integer constant, and indx must be the address of an integer.

Cache coherency protocols hobble many architectures. Multitude requires no cache coherency hardware, because correct DANCE programs will never have the same value in different caches at the same time. Therefore Multitude's caches can be as fast as the fastest sequential processor with no coherency hardware at all. Every other system that provides shared memory has some cache coherency protocol with its attendant higher cost and reduced performance.

Tagged Memory Words

Every memory word (8 bytes) has a 3-bit tag indicating its type. Tags cannot be directly modified by OS routines, and cannot be modified at all by user programs. Any attempt to use a word in conflict with its tag will be intercepted by the memory controller and will force a high-priority interrupt for the offending processor. Tags are but one way that erroneous program behavior is detected and crucial OS tables are protected. Tag types are: address, integer, float, instruction, segment capability, page descriptor, value, or nil.

Error correction coding (ECC) requires 8 additional bits to achieve single error correction/double error detection (SEC/DED). FIG. 27 shows the format of a word stored in RAM by the memory controller.

Memory Accessing

The variables whose states comprise nodes (or vertices) in a lattice satisfying a DANCE program all reside in main memory: the combined, dynamic, random-access memory (DRAM) in all the processing nodes connected via a Layered network. The 64-bit physical address is divided between node number and memory address within the node as shown in FIG. 25. The physical address specifies a byte; full, 64-bit word operations must begin at addresses evenly divisible by 8, the number of bytes in a word. (Many of the following numbers are merely "best guess." For example, 64-bit microprocessors are on the market today and seem to be the trend; 64 is not a magic or crucial parameter.)

Segmented Addressing

All virtual addresses (those seen and manipulated by programs) are composed of a segment number and an offset within the segment. The segment number indexes a User Segment Table (UST) which holds, among other things, the address of the origin of the segment and the length of the segment. The effective virtual address (EVA) is the origin in the UST concatenated with the offset within the segment, as shown in FIG. 28. At the same time the offset can be compared with the length to check that the reference is in bounds.

Address Space Partition

The physical memory is partitioned during boot according to how it will be used: OS-only, local, spread, and global. See FIG. 26.
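The segment-table translation described above can be sketched as follows. This is a minimal illustration, not the patent's mechanism verbatim: all names (`Segment`, `translate`, `ust`) are invented for the example. The User Segment Table entry supplies the origin and length used to form the effective virtual address and to bounds-check the reference at the same time.

```python
# A minimal sketch of UST-based segmented address translation (all names
# are illustrative, not taken from the patent). A virtual address is a
# (segment number, offset) pair.

class Segment:
    def __init__(self, origin, length):
        self.origin = origin      # address of the segment's origin
        self.length = length      # segment length in bytes

def translate(ust, seg_num, offset):
    """Form the effective virtual address and check that it is in bounds."""
    seg = ust[seg_num]
    if offset >= seg.length:      # compare offset with length
        raise MemoryError("segment bounds violation")
    return seg.origin + offset    # origin "concatenated" with the offset

ust = {3: Segment(origin=0x4000, length=0x100)}
print(hex(translate(ust, 3, 0x24)))   # 0x4024
```

A real implementation would do the origin concatenation and length comparison in parallel in hardware, as the text notes; the sequential check here only illustrates the logic.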
OS-only memory 24 contains instructions and tables that must not be accessed by user programs. Local memory 26 is reserved for the microprocessor in the same node as the memory bank. Global memory 28 can be accessed by any processor. Spread memory 30 must be accessed by shuffling address bits so consecutive (word) addresses reside in different memory banks to alleviate contention. OS-only memory is also local, global, or spread depending on how the OS uses it. FIG. 26 shows a possible partition of memory across several banks.

Paging

Paged address translation is depicted in FIG. 29. Effective virtual addresses are translated into physical addresses using pages. A page is a chunk of memory that may be stored on disk when not in active use. Paging allows the virtual address space to appear much larger than the physical address space, by keeping most of the pages on disk. A page fault occurs when a value is accessed that currently resides on disk. If there is an empty page in memory, the IOC controlling the disk will copy the page from disk to memory and update the page table with the new, temporary, physical address of the restored page. Translation of EVA to physical address is similar to segmented address translation except that the page table is indexed with the most significant bits of EVA instead of a separate segment field.

Addressing Modes

Multitude instructions allow five addressing modes: immediate, direct, indirect, indirect with displacement, and indexed. Immediate addressing uses a constant within the instruction for its parameter value. Direct addressing uses the value in the specified memory location. Indirect addressing dereferences the address twice; the value is at the address contained in the memory location whose address is contained in the instruction.
Indirect addressing with displacement (or just displacement) adds a constant to the address found in memory to determine where the value is. Indexed addressing uses two addresses, one to an integer (the index) and another to an address of an address (the base), to get the eventual address of the value. In the foregoing, C(addr) means the contents of memory location addr.

Activation Block

An activation block holds variables (and other necessary information) for procedures, functions, packages, and other language constructs. Activation records (blocks) will be kept entirely on a heap. If a processor doesn't have a pointer for a block, the block cannot be accessed. This contrasts sharply with the stack indexes for activation records (Computer Architecture, Vol. 1, Richard Kain, 1989, Prentice-Hall, Chapter 5). If a processor needs an ancestor's block, a pointer to that activation record may be contained in an interval map table, or found by traversing pointers linking children to their parent, filling in the display at the same time. Displays allow determination of the address for any accessible variable with a single fetch (to obtain a pointer to the right activation record) and simple integer arithmetic (to add the proper offset within that activation record). Activation records of procedures that may be executed by any processor are held in "spread" (interleaved, commonly addressable) memory. Conversely, some activation records are best put on the local stack of a specific processor; say, for example, the processor connected to a fiber-optic channel would handle external communication tasks.

Procedures cannot be invoked as easily as merely pushing parameters on the top of the stack before calling the procedure. A new activation record (context) must be created to hold all of the parameters the procedure will need. Each activation record corresponds with the symbol table associated with each (meaningful, non-empty) lexical scope. Symbol tables are constructed during compilation. Program variable names will eventually devolve into an activation block pointer and an offset into the activation block. Effective address computation at run-time can be simple indexing.

In sequential, block-structured languages a stack adequately supports activation block allocation because the lifetimes of the activation blocks are nested (Kain, Vol. 1, pp. 181-190). For DANCE programs such a relationship does not hold, due to parallel execution. While it is true that parent activation record lifetimes encompass their children's, a parent may have many different children whose lifetimes are unrelated. Therefore activation blocks must be allocated from a heap. Heaps are necessary anyway to support program-invoked memory allocation.

Since DANCE supports exotic, polymorphic types, the size of the activation block may not be known at compile time. However, the size of the activation block must be determined when the procedure is invoked. The following procedure calling scenario steps are adapted from Kain, Vol. 1, p. 182.

A) Compute the size of activation record required.
B) Allocate that sized block from the heap.
C) Copy the current activation record address to the display in the new block. The immediate parent is at the "top" of the display.
D) Copy the display of the caller into the new block. (optional)
E) Copy the actual parameters or their pointers into the block.
F) Call the procedure, updating the activation block pointer to the new block.
G) Perform the procedure.
H) Upon completion, copy returned values into their respective actual parameter locations.
I) Restore the caller's activation record pointer with the value at the top of the display.
J) Release the unneeded activation block to the heap.

Obviously, allocation and deallocation from a heap is more involved than the simple addition used to modify the top-of-stack in sequential machines. Therefore heap allocation and deallocation must both be speedy and be performed in parallel without blocking.

The simplest strategy would be fixed-size heap frames with a queue (or FIFO) policy (Kain, Vol. 1, pp. 168-171). A vector whose length is the total number of frames holds pointers to unallocated frames. Three integers then control heap allocation: NumFree, NextAlloc, and NextReturn. NumFree is decremented on allocation and incremented on deallocation with a fetch-add. Fetch-add could also be used on NextAlloc to get the index into the vector holding the address of an unused block. Deallocation uses NextReturn similarly. The disadvantage is that the size of the page frames is always the same. If the size needed for the allocation block exceeds the frame size, then some form of indirection must be used instead of simple indexing. Additionally, program-directed allocations are often small (in bytes), thus wasting most of the frame. So fixed-size frames won't work.

Kain discusses three variable-size allocation strategies: first fit, best fit, and buddy system (Kain, Vol. 1, pp. 171-176). All three strategies involve subdividing large blocks (when less space is needed) and combining adjacent small blocks (when no available block is big enough). The buddy system is by far the most elegant, since the blocks used are always powers-of-two sized. The buddy system also seems resistant to heap fragmentation, where enough heap space is available, but in lots of noncontiguous little pieces. Furthermore, both free space lists and free space tables do not lend themselves to simultaneous allocation and deallocation.

My proposed solution is the buddy system. Several free space lists are used, with all of the elements in a single list being the same size. When a block of the desired size is unavailable, a next larger sized block is broken in two (or more); one piece is used and the other is placed in the next smaller list. Suppose the smallest sized block is 8 bytes. Then different lists would handle 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1M, 2M, 4M, and 16M bytes. These twenty-one lists could handle all the block sizes a program would need. A different pool of page frames is needed for each physical memory partition: OS-only, global, and spread. The local partition in each node will be managed with a conventional stack and heap. FIG. 30 shows a buddy-heap data structure.

Processor Flags

Multitude processors contain no internal registers, per se. All of the instructions reference memory locations. However, several flags are available for branching or arithmetic calculations. The flags are: carry (C), overflow (V), zero (Z), negative (N), extension (X), and test result (R). These flags are set automatically during instruction execution. They may be used for conditional branching. They may not be written or read any other way.

Three Address Code

This section defines an intermediate code representation that can both:

serve as a target language for compilation, and
be easily translated into specific ISAs by a compiler back-end.

The form of expression will be quadruples, or three-address code. Each of the "addresses" will include the object's type, context, and offset within context. The instructions for the Multitude architecture are loosely based on the intermediate language outlined in the book (Compilers: Principles, Techniques, and Tools, Aho, Sethi, & Ullman, Addison-Wesley, 1986, Chapter 8). The compiler back-end will be able to generate actual machine code for any commercial-off-the-shelf (COTS) microprocessor from the three-address code.
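The power-of-two buddy-system free lists described earlier can be sketched as follows. This is a minimal illustration under stated assumptions: it shows splitting only, without the buddy coalescing a full implementation would perform on release, and all names (`BuddyHeap`, `round_up_pow2`) are invented for the example.

```python
# A minimal sketch of buddy-system free lists: requests round up to a
# power of two; when a list is empty, the next larger sized block is
# broken in two and the unused half goes on the next smaller list.

def round_up_pow2(n):
    p = 8                         # smallest sized block: 8 bytes
    while p < n:
        p *= 2
    return p

class BuddyHeap:
    def __init__(self, total):
        self.total = total
        self.free = {total: [0]}  # block size -> list of block origins

    def alloc(self, n):
        size = round_up_pow2(n)
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                # look for a next larger sized block
        if s > self.total:
            raise MemoryError("out of heap")
        origin = self.free[s].pop()
        while s > size:           # split: keep one half, free the other
            s //= 2
            self.free.setdefault(s, []).append(origin + s)
        return origin

    def release(self, origin, n):
        # no coalescing of buddies shown in this sketch
        self.free.setdefault(round_up_pow2(n), []).append(origin)

heap = BuddyHeap(1024)
print(heap.alloc(100), heap.alloc(100))   # 0 128
```

The first allocation rounds 100 bytes up to 128, splits the 1024-byte block down through 512 and 256, and leaves those halves on their free lists; the second allocation is satisfied directly from the 128-byte list.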
Five addressing modes will be used: direct, indirect, indexed, displacement, and immediate. For direct addressing, the value to be used is held in the memory location specified by the field. For indirect addressing, the memory location specified by the field holds the address of the parameter to be used. For immediate addressing, the parameter to be used is held within the instruction. Indexed addressing adds a memory location containing an integer to a memory location containing an address to determine the address of the operand.

Not all the instructions need three addresses. In all cases, the last address receives the value when one is returned. All addresses are segmented.

Instructions denote the type of operands (e.g. integer or float) and their size in bytes. A size of "w" indicates whole word, "b" indicates byte, and a digit represents the number of bytes (at most eight). Some instructions need no type or size. If the size is unspecified, whole word will be assumed. An instruction will be represented in assembly language by:

    label instruction:type:size op1;op2;result --comment

Lacking greatest common divisor (GCD) hardware and processors that can handle arbitrarily large integers, rational instructions will not be provided.

Memory allocation and deallocation (alloc & free) will be performed using the "buddy system". Such operations are included as instructions (rather than OS calls) since they are so important. Aho, et al., pp. 467-468, list three pointer assignments, two of which are covered by indirect addressing. In the first case the value of location x becomes the address of location y (x := &y). The second case (x := *y) is accomplished by using the "mov" instruction with the indirect addressing mode on the first operand. The third case (*x := y) uses mov with the indirect addressing mode on the last operand. The ptr instruction is not really necessary, since taking the address of y could be accomplished by the mov instruction with the address of y as an immediate operand.

Scheduling

Multitude wouldn't be a parallel architecture without the capacity to spawn, suspend, and terminate intervals. Two different scheduling instructions are required because of the two types of parallel composition in DANCE.
Scheduling a single process is performed by "sched". For example, in the "sort" procedure two recursive calls to "sort" are invoked in parallel with a simple assignment. The parent interval will schedule each new interval sequentially, but they will be picked up by different processors and executed concurrently. Barrier synchronization at the end of each new interval causes the last one to continue with the parent interval.

The power and efficiency of the Multitude architecture comes not from the ability to schedule large-grained processes, but rather from scheduling (and executing) bunches of "light-weight" processes. A light-weight process runs in its parent's environment (activation block), obviating the need for each light-weight process to allocate its own activation block, thus permitting parallel execution with a minimum of fuss. The instruction "schedm" schedules many processes at once that share the same environment and code. The combinable fetch-add operation will be used by each light-weight process to determine a unique parameter so that, although running the same code, the processes do different things.

Both sched and schedm require two parameters that contain addresses: one for the proper activation block and the other for the first instruction to be executed. The instruction "susp" suspends execution of the current process and schedules the process for later execution starting at the following instruction. The instruction "term" terminates the current process.

No spaces are allowed within the instruction name, and a space (or tab) is required between the instruction name and its operands. Labels must be enclosed in square brackets and must be separated from the instruction name by a space (or tab).

User programs may not use any assembly-language code whatsoever. Requiring and enforcing that all code be generated by "trusted" compilers helps assure security and correctness. This is part of the "nothing illegal" philosophy. Much of this section is devoted to architectural features that implement an "everything checked" philosophy. Ensuring that programs do what they're supposed to do, and nothing else, assures correctness and security.

The branch instruction is an exception to the above rule. Instead of type and size as part of the instruction name, the branch condition is appended. Branch conditions are:

    T   true               always branch
    F   false              never branch (useless)
    EQ  equal              Z
    NE  not equal          not(Z)
    CC  carry clear        not(C)
    CS  carry set          C
    VC  overflow clear     not(V)
    VS  overflow set       V
    RC  test result clear  not(R)
    RS  test result set    R
    LT  less than          N and not(Z)
    GT  greater than       not(N) and not(Z)
    AM  at most            N or Z
    AL  at least           not(N) or Z

Examples of instructions:

    [label] add:i:w a;b;c   --add integers in a and b, put result in c
            mod:i:w c;#17;d --d := c MOD 17
            eq:i:w  d;#0    --d = 0?
            br:eq   done    --test equal flag
            add:i:w a;#1;a  --increment a
            br:t    label   --always branch to label
    [done]                  --loop exit, continue

Each instruction is typed where appropriate with i, f, b, r, or c for integer, floating-point, boolean, rational, or character respectively.

Combinable Operations

Combinable operations (fadd, for, fand, fxor, swapi, swapf, & swapb) are included as instructions since they are so critical to parallel processing without locks or critical sections. Inclusion of combinable operations in the ISA of any microprocessor would make it my choice for Multitude. Alas, I'm aware of none that do. Therefore the node processor will probably need some means for transforming a write followed by a read to some specific physical address range into a combinable operation.

Segment Capability

Every segment has an associated capability. A capability is used to form an effective virtual address by concatenating the address of the segment's origin with the offset provided by the instruction. The segment's length is used for range checking.
The page size holding the segment has two purposes: additional memory allocation, and releasing memory when no longer needed. A memory allocation request may be satisfied by the OS by using the extra space in a page without needing to get fresh memory from the heap. A capability also has access rights that limit how a segment can be used.

A segment must be global, spread, shared, or private; a segment may be read-only; a segment may be protected; a segment may be OS-only.

Global: resides in memory accessible by any processor.

Spread: same as global, except that the bits used for memory bank number come from the least significant end of the virtual address to spread the segment over many or all of the working memory banks.

Shared: non-cacheable; every access must use the Layered network; memory locations serving as targets for combinable operations must be on shared pages.

Private: resides in local memory for this node.

Linking

A compiled DANCE program will comprise many separate code modules. The modules have placeholders for addresses not known when they were compiled. The process of connecting the modules is known as "linking." Linking may either be static (all links made when the program is loaded, before execution) or dynamic (some links made during execution). Multitude can do either; static typing is sometimes required to ensure security. FIG. 33 graphically depicts static linking.

The beginning of every linkable subprogram contains a link record 30. The link record contains an entry for every symbol exported (defined in this module, used elsewhere) or imported (defined elsewhere, used in this module). Every export entry contains the exported symbol, its type, and its address. Every import entry also contains a symbol and its type, but its address is nil. Filling in all the nil addresses in every import entry in every code module is the process of static linking.
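The link-record scheme above can be sketched as follows. This is an illustrative sketch only: the dictionary layout and all names (`link`, `modules`) are assumptions made for the example, not the patent's on-disk format. Each module's link record lists exports (symbol, type, address) and imports whose address starts as nil; linking fills each nil address from a matching export, checking that the linked types are compatible.

```python
# A minimal sketch of static linking with link records (illustrative
# data layout; not the patent's actual record format).

def link(modules):
    exports = {}
    for m in modules:
        for sym, typ, addr in m["exports"]:
            exports[sym] = (typ, addr)
    for m in modules:
        for imp in m["imports"]:
            exp_typ, exp_addr = exports[imp["symbol"]]
            if exp_typ != imp["type"]:        # linked types must be compatible
                raise TypeError("type mismatch for " + imp["symbol"])
            imp["address"] = exp_addr         # fill in the nil address

mods = [
    {"exports": [("sort", "instruction", 0x1000)], "imports": []},
    {"exports": [],
     "imports": [{"symbol": "sort", "type": "instruction", "address": None}]},
]
link(mods)
print(hex(mods[1]["imports"][0]["address"]))  # 0x1000
```

For the polymorphic abstract data types the text mentions, the single `!=` comparison would be replaced by comparison of type expressions, as in the compiler.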
Dynamic linking fills in as many links as are known when the program is loaded, but leaves some links nil. These dynamic links are made in the activation record for the subprogram. Space is allocated for the dynamic links when the subprogram's activation record is allocated. Instructions use the link record 30 for links established before execution. The link record is part of the program segment and will be read-only after loading. Instructions use the activation record for dynamic links. In either case, displacement addressing will be used to reference imported symbols: displacement from the origin of the program text, or from the origin of the activation record.

Linking must check that the types linked together are compatible. Usually this is just integer, float, or instruction, but type checking of the polymorphic abstract data types provided by DANCE will entail comparing type expressions similar to that used in the compiler.

Read-Only: no writing; all instructions reside in read-only segments.

Protected: can force writeback of cached values. Data structures that are accessed using combinable operations to assure non-interference reside in protected segments. The wback instruction forces writing back to main memory every value in that segment that may have been loaded into cache. This allows temporary caching of values from a data structure visible by any processor. Executing wback before barrier synchronization assures that main memory reflects the true state of the interval representing execution of the program as a whole. In addition, protected segments are expected to reduce LRU-induced write-back upon cache miss; protected segments will leave many empty cache entries after wback.

OS-only: OS-only segments will never appear in a User Segment Table.

Requirements: a bit field indicating the authorizations necessary to access the segment.
A set of corresponding authorization bits (access rights), bound to each user, is compared with the requirements. All requirements must be met (corresponding authorization bits set) to access the segment. A limited number of discretionary access controls can be directly implemented in hardware.

Security level: users may read at their level or lower; write at their level or higher; and do combinable operations at their level only.

FIG. 31 shows the bits in a capability.

Page Descriptor

A page descriptor is used to translate effective virtual addresses into physical addresses; check access rights, security level, and type of operation; and record page use for virtual memory page swapping. FIG. 32 shows the bits in a page descriptor.

OPERATION OF INVENTION

This section expounds the "operation" of this invention, the model of computation described earlier. That entails disclosing how someone would actually "use" this invention. This invention is a process for transforming information into a desired form. Generally, such transformation is called "computation." This invention performs computation so that many parts of the computation may proceed concurrently. Concurrent computation is achieved by constructing a particular kind of mathematical object that satisfies (makes true) a temporal logic formula. DANCE programs are temporal logic formulas. The constructed mathematical objects are lattices of states. States are values of program variables at an instant of time. The first state holds the information as received; the last state holds the transformed information in the desired form. The operation of this invention is the construction of state lattices that satisfy DANCE programs. Correct DANCE programs describe how to construct a satisfying state lattice.

Temporary Variables

At any time, the processor may use implicit temporary variables that are stored solely in cache.
Of course, temporary variables must be written before they are read. The cache immediately makes a place for the temporary variable if it is not present already. All temporary variables are discarded upon wback, or upon suspending or terminating the current interval. Procedures and functions that use temporary variables must wback before they return.

Use of the invention

To use this invention one must write a correct DANCE program, compile the program into executable form, and link, load, and execute it on a Multitude-architecture computer.

Writing a DANCE program

Composing a DANCE program and its proof outline together is more art than craft. Gries' Science of Programming well explains the derivation of sequential programs. Writing parallel programs in DANCE is similar, but seeks to maximize the opportunities for concurrent computation. In general, new data structures accessed via combinable operations will be required to achieve non-interference without blocking.

How to write a DANCE program

Once the new paradigm embodied in DANCE is understood (a major change from the inherently sequential way people think), DANCE programs will be as easy to write as Pascal programs. High school graduates with a firm grounding in algebra and two years of trade school should be able to write most DANCE programs.

Determine what you want

Software engineering methods (the "waterfall" model) stress requirements evaluation prior to design and coding. These methods determine "what" the program should do, rather than "how" it should do it.

Formally specify requirements

Natural language specifications must be transformed into assertions written in the first-order predicate calculus used for proof outlines. For most applications this means composing the precondition the program can assume and the postcondition the program must assure. Real-time and transaction-based systems will write pre- and postconditions for every event that can occur, with the caveat that several events may be serviced simultaneously.

Formal requirements can form a contract between developers and users. Proof outlines show that a program meets its formal specification, not that the system does what's intended. Assuring that the specification accurately captures the desires of users will remain a human-intensive activity forever. Fortunately, the process of transforming natural language specifications into formal assertions identifies ambiguities and lapses in the natural language specification early in the development process. Thus critics of correctness proofs, because they only assure a program meets its specification and not what is wanted, are wrong. Formality assists conformance to expectations.

Compiling

Compilation of DANCE programs differs little from compilation of ordinary languages. We believe the best way to do so is to use an attribute grammar to define language syntax and semantics. Currently those who compile large programs perform "parallel" compilation by using several networked workstations to compile individual "compilation units." Rich opportunities for concurrent computation within compilation units can be realized with a DANCE compiler written in DANCE: lexical analysis, parsing, tree building, tree decoration, and code generation.

Execution on a Multitude-architecture MPP

Prior to execution, the program must be loaded, linked, and scheduled. Proper linking of packages with type parameters (e.g. ParallelSort) is tricky; every invocation uses the same object code with different, appropriate links. This facilitates use of pre-compiled libraries and minimizes the need for recompilation when the program is modified.

Unfortunately, older FORTRAN programs cannot be automatically converted into DANCE while still obtaining the efficiencies possible. New, concurrently-accessible data structures that use combinable operations must be created.

Choose concurrently accessible data structures

Fetch-add is useful for accessing arrays to assure that at most one interval is reading or writing any array element. Similarly, swap is useful for manipulating dynamic data structures like linked-lists and trees.

Array Queue

FIG. 12 depicts a simultaneously accessible queue 32. For simplicity we (initially) assume that the queue size is sufficiently large that there is always enough room for another put, and there is always something in the queue for a get.

Put code:
    fetch_add(tail, 1, put_index);
    Q[put_index MOD Q_length] := put_value
Get code:
    fetch_add(head, 1, get_index);
    get_value := Q[get_index MOD Q_length]

Now let's assume that the queue 32 may be empty or full, in which case the interval will be suspended and retried later.
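Before moving to the bounded version, the never-full put/get above can be sketched as follows. This is an illustrative simulation under stated assumptions: real Multitude hardware would combine concurrent fetch-adds in the network, whereas here a lock stands in for that atomicity, and all names (`Atomic`, `Q_LENGTH`) are invented for the example.

```python
import threading

# A minimal sketch of the simultaneously accessible array queue (FIG. 12),
# assuming (as the text initially does) that the queue never overflows or
# underflows. fetch_add hands each put (or get) a private slot index, so
# concurrent intervals never touch the same element.

class Atomic:
    def __init__(self, v=0):
        self.v = v
        self._lock = threading.Lock()

    def fetch_add(self, d):
        with self._lock:          # stands in for a hardware combinable op
            old = self.v
            self.v += d
            return old

Q_LENGTH = 64
Q = [None] * Q_LENGTH
head, tail = Atomic(), Atomic()

def put(value):
    Q[tail.fetch_add(1) % Q_LENGTH] = value   # claim a unique slot

def get():
    return Q[head.fetch_add(1) % Q_LENGTH]    # claim the next filled slot

threads = [threading.Thread(target=put, args=(v,)) for v in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(get() for _ in range(8)))        # [0, 1, 2, 3, 4, 5, 6, 7]
```

Eight concurrent puts each receive a distinct index from fetch_add, so every value lands in its own slot regardless of thread interleaving.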

Put code:

    {inv: 0 <= min_in <= (tail-head) MOD Q_length <= max_in < Q_length}
    {OLDTAIL = tail}
    put: success := F;
    do (not success) ->
        declare
            size : nat;
        begin
            fetch_add(max_in, 1, size);
            if (size < Q_length) ->
                begin
                    fetch_add(tail, 1, put_index);
                    Q[put_index MOD Q_length] := put_value;
                    fetch_add(min_in, 1, dummy);
                    success := T
                end
            [] NOT(size < Q_length) ->
                begin
                    fetch_add(max_in, -1, dummy);
                    suspend
                end
            fi
        end
    od
    {inv AND exists i : OLDTAIL <= i < tail : Q[i] = put_value AND AccessOK}

Get code:

    {inv}
    {OLDHEAD = head AND OLDQ = Q}
    get: success := F;
    do (not success) ->
        declare
            size : nat;
        begin
            fetch_add(min_in, -1, size);
            if (size > 0) ->
                begin
                    fetch_add(head, 1, get_index);
                    get_value := Q[get_index MOD Q_length];
                    fetch_add(max_in, -1, dummy);
                    success := T
                end
            [] NOT(size > 0) ->
                begin
                    fetch_add(min_in, 1, dummy);
                    suspend
                end
            fi
        end
    od
    {inv AND exists i : OLDHEAD <= i < head : OLDQ[i] = get_value AND AccessOK}
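The bounded Put/Get code above can be sketched in Python as follows. This is an illustrative simulation under stated assumptions: max_in over-counts puts that have begun (an upper bound on occupancy), min_in counts puts that have completed (a lower bound); a failed reservation is rolled back and the interval retries, with a sleep standing in for "suspend", and a lock standing in for the combinable fetch-add.

```python
import threading
import time

# A sketch of the bounded queue: reserve with fetch-add, roll back and
# retry when the reservation fails (names are illustrative).

class Atomic:
    def __init__(self, v=0):
        self.v = v
        self._lock = threading.Lock()

    def fetch_add(self, d):
        with self._lock:
            old = self.v
            self.v += d
            return old

Q_LENGTH = 4
Q = [None] * Q_LENGTH
head, tail = Atomic(), Atomic()
min_in = Atomic()     # lower bound: puts fully completed
max_in = Atomic()     # upper bound: puts begun

def put(value):
    while True:
        size = max_in.fetch_add(1)              # reserve room
        if size < Q_LENGTH:
            Q[tail.fetch_add(1) % Q_LENGTH] = value
            min_in.fetch_add(1)                 # item is now really present
            return
        max_in.fetch_add(-1)                    # full: roll back, "suspend"
        time.sleep(0.001)

def get():
    while True:
        size = min_in.fetch_add(-1)             # reserve an item
        if size > 0:
            value = Q[head.fetch_add(1) % Q_LENGTH]
            max_in.fetch_add(-1)                # slot is free for puts again
            return value
        min_in.fetch_add(1)                     # empty: roll back, "suspend"
        time.sleep(0.001)

for v in range(4):
    put(v)
result = [get() for _ in range(4)]
print(result)                                   # [0, 1, 2, 3]
```

The two counters bracket the true occupancy exactly as the invariant above states: min_in <= (tail-head) MOD Q_length <= max_in.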

Linked List Queue 35 Put code: FIG. 13 depicts insertion of a record into a linked list. A my record := NEW record type; Swap is used to exchange a pointer in the linked list. When --fill in my record only a Single insertion occurs, the Swap exchanges a pointer swap (tail.my record,old tail); to the new record with the pointer to the “next record. my record.last := old tail; 40 old tail.next := my record ASSigning the returned value to the “next entry in the new Get code: record keeps the list linked. The code to insert a new record, my record := head; “e' after record “c” will look something like this: swap (head,my record.next,dummy); e := NEW record type; This works fine for put; simultaneous Swaps to “tail” --fill in my record 45 Swap (c.next,e,old ptr); allow multiple records to be added to the queue e.next := old ptr; concurrently, but get only works for one interval at a time. Dynamic Double-Ended Queue But if another interval tries to insert a record, say “f , at 50 Making the queue work with multiple, Simultaneous the Same place at the same time, the Swaps will combine "get's gets messy. nondeterministically. The value swapped into “c.next will always be one of the Swap parameters, “e' or “f”; the value swapped out of “c.next will be returned as “old wtr' to Put code: only one interval; and the value returned as “old ptr' to the 55 my record := NEW record type; --fill in my record remaining interval is the Swap parameter of the other inter { list = SET L: L-head OR (EXISTS L2 : L2 IN list: L=L2.next) } val. In FIG. 13 the value swapped into “c.next was “e”, --list is the set of all pointers such that either the pointer is head or “e.next' became “f”, and “f.next got the value originally in --another pointer in the list has it as the next pointer “c.next which was “d'. 
Since swap is nondeterministic, record "f" could have been inserted before record "e", but in any case all of the records will be inserted between "c" and "d" no matter how many records are inserted simultaneously; the inserted records could be in any order.

FIG. 14 depicts a queue made from a doubly-linked list. Such a queue is limited only by the memory available and consumes no more memory than needed.

Put code:

    my_record := NEW record_type;
    --fill in my_record
    { list = SET L : L = head OR (EXISTS L2 : L2 IN list : L = L2.next) }
    --list is the set of all pointers such that either the pointer is head or
    --another pointer in the list has it as the next pointer
    { inv: min_in + lookers <= NUM r : r IN list : r.b }
    --min_in is at most the number of pointers in the list whose records
    --have the b flag true (default value)
    swap (tail,my_record,old_tail);
    my_record.last := old_tail;
    if (old_tail = nil) -> swap (head,my_record,dummy1)
       (old_tail /= nil) -> old_tail.next := my_record
    fi;
    my_record.b := T;
    fetch_add(min_in,1,dummy);
    { inv AND AccessOK }

Get code:

    get: success := F;
    { inv }
    do (not success) ->
        declare
            size : nat;
        begin
            { OLDMIN = min_in AND OLDLOOK = lookers }
            fetch_add (min_in,-1,size);
            { min_in = OLDMIN-1 }
            if (size > 0) ->
                begin
                    { lookers = OLDLOOK+1 }
                    my_record := head;
                    { OLDB = my_record.b }
                    fetch_and(my_record.b,F,success);
                    { inv2: success OR NOT OLDB }
                    do (NOT success) ->
                        begin
                            my_record := my_record.next;
                            { OLDB = my_record.b }
                            fetch_and(my_record.b,F,success)
                            { inv2 }
                        end
                    od
                    { success AND exclusive(my_record) }
                    swap (head,my_record.next,dummy)
                end
               NOT(size > 0) ->
                begin
                    { min_in = OLDMIN-1 }
                    fetch_add(min_in,1,dummy);
                    { (min_in = OLDMIN) AND (lookers = OLDLOOK) }
                    --well, the above assertion isn't quite accurate, but the
                    --net change in min_in and lookers is zero
                    suspend
                end
            fi
        end
    od
    { inv AND exclusive(my_record) AND success AND AccessOK }

Operation of Parallel Sort

Returning to the earlier example of ParallelSort, execution will be explained using reference numerals in Chicago font (1234567890) for line numbers of the package in the section An Extended Example: Sorting. Suppose the employee names to be sorted are: Johnson, Swenson, Anderson, Larson, Carlson, Jackson, Henderson, and Peterson. Execution will (try to) make the following temporal logic formula true by constructing a lattice of states:

    sort (unsorted_employees, sorted_employees, 1, number_of_employees)

where "unsorted_employees" holds the names above and "number_of_employees" is eight. Initially the graph is only a start node 101, as shown in FIG. 15. Exchanging actual parameters for formal parameters (just a name change, no copying or computation) yields the state at the beginning of procedure "sort", line 33, depicted in FIG. 16.

To satisfy an alternative command, we must satisfy a choice with an open guard. Since "low" is less than "high" we must satisfy the choice from line 50 to line 61. To do this we must construct an interval that first satisfies "partition", line 51, and then satisfies two temporal predicates "sort", lines 54 and 57, and an assignment, line 60, concurrently. FIG. 17 depicts such an interval. The dashed lines represent some lattice of states that satisfy the temporal predicate on the line indicated.

The interval satisfying "partition", line 51, must satisfy its definition starting on line 67. As usual, the names of the formal and actual parameters differ without performance impact. The body of "partition", lines 79 through 111, consists of a parallel command, line 81, followed by a pair of concurrent assignments, line 109, as shown in FIG. 18. The body of the parallel command is an alternative command with three choices, lines 82 to 107. The last choice, line 105, is selected when considering the pivot, and does nothing. The other two choices perform a fetch-add, line 88 or 100, and then an assignment, line 92 or 101, using the value returned by the fetch-add. Fetch-add is one of the combinable operations that allow concurrent intervals to use the same state variables simultaneously in a nondeterministic, yet controlled manner. State transitions involving combinable operations that occur simultaneously are crossed with a striped, horizontal bar. FIG. 19 shows a few possible concurrent intervals that satisfy the alternative command, lines 83 to 106.

Many different intervals can satisfy a temporal predicate and yield exactly the same result. The previous example illustrates that many different state lattices (intervals) can satisfy "sort", even a sequence. The best intervals are short (few arcs in the longest path from start to end of the lattice) and bushy (much concurrent computation).

FIGS. 20, 21, and 22 exemplify a particular state lattice that could be an interval that sorts the roster for the Swedish Bikini Team. The first state is just the values of variables in the initial invocation of "sort" as in FIG. 15. The states are labeled with line numbers, but should not be thought of as the location of a program counter or place in the text of the program.

Debuggers are universally used for sequential program development. A debugger allows the programmer to step through the code and examine any variables desired. This works because there is a single locus of control which can be stopped to examine state variables. The semantics of a traditional, prior-art language for FIG. 15 would be: "push current state variables onto the stack, jump to the beginning of the sort subroutine, and sequentially execute the instructions found there." In contrast, DANCE semantics for FIG. 15 are: "make a lattice of states that satisfies

    sort (unsorted_employees, sorted_employees, 1, number_of_employees)

where the values of variables in the initial state are as given." Therefore the states of the lattice really don't correspond to any particular line of the program at all, but tell which temporal predicate is satisfied by the subinterval containing the numbered state.

FIG. 20 deliberately resembles FIG. 17, which depicts an interval (lattice of states) to satisfy "sort". Each of the dashed lines in FIG. 17 represents a subinterval that satisfies the numbered line of the program. The subinterval from 67 to 111 in FIG. 20 replaces 51 in FIG. 17.
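The fetch_and claim step used by the get code can be sketched in Python. This is an illustration under stated assumptions (the Cell class and helper names are hypothetical; a lock stands in for the combinable-operation hardware): fetch_and returns the old flag value, so exactly one concurrent caller sees True and claims the record.

```python
import threading

class Cell:
    """One state variable supporting an indivisible fetch-and."""
    def __init__(self, value=True):
        self.value = value
        self._lock = threading.Lock()

    def fetch_and(self, operand):
        # Indivisibly: old := value; value := value AND operand; return old
        with self._lock:
            old = self.value
            self.value = self.value and operand
            return old

def try_claim(flag):
    """success := old flag value; only the first caller gets True."""
    return flag.fetch_and(False)

flag = Cell(True)
results = []
threads = [threading.Thread(target=lambda: results.append(try_claim(flag)))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()

# Exactly one "interval" claims the record, no matter the interleaving.
assert results.count(True) == 1
assert flag.value is False
```

The controlled nondeterminism is the point: which caller wins is unspecified, but the postcondition (one winner, flag cleared) always holds.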
Similarly, subintervals 54, 57 and 60 in FIG. 17 are replaced by subintervals beginning at states labeled 54, 57 and 60, which continue off the page into FIG. 21. States 88 in FIG. 20 hold initial values for each subinterval labeled 82-107 in FIG. 19. The values for "s" in states 88 or 100 are the first actual state change: the result of a fetch-add. The state in 109 is the last state of the subinterval satisfying "partition". States 54 and 57 begin subintervals that make each (smaller) "sort". Those subintervals have the same shape as FIG. 17. To satisfy temporal predicate "sort" we must satisfy its body; the body of "sort" is just an alternative command. The process of inserting subintervals (lattices) continues until the predicate "sort" can be satisfied by a single assignment, 44. Each subinterval satisfying "sort" ends with a state labeled 65. Finally the array "sorted_employees" gets its proper value in the last state of the interval in FIG. 22. FIG. 34 graphically represents another exemplary lattice satisfying "sort" for different data, without state variable values in each node of the graph. Sorting a million records will still have the same shape for the interval which satisfies "sort", just wider and deeper.

Sort Assembly Language:

    sort    eqi:w low;high          --are the indices the same?
            branch EQ;onething      --yes
            gti:w low;high
            branch GT;endsort
            alloc sp,p;psize;loc1
            mov:w low;loc1.i;wordsize
            mov:w high;loc1.j;wordsize
            subi high low tmp2
            addi tmp2 1 tmp2
            multi rsize tmp2 tmp3
            mov orig loc1.a tmp3
            call loc1 partition

Hardware Operation

The section Multitude Computer Architecture in the DESCRIPTION OF INVENTION summarized what hardware is necessary for execution. This section describes what the hardware does. The three-address code, clever memory management, capabilities, descriptors, hidden registers, and tags will almost certainly be emulated by early prototypes.

Variables declared by a parent subprogram and visible at this level have segment numbers relative to the level register. The "interval" register 46 designates an Interval Mapping Table (IMT). An IMT maps segment numbers up to the level register into the true segment numbers. In this manner many intervals may be executing concurrently, at the same level, but still use independent segments in the same User Segment Table. The "access rights" and "security level" registers 48, 50 hold a user's permissions and security classification. These are compared with segment capabilities and page descriptors to enforce security policies.

Memory Management Unit

The Memory Management Unit (MMU) performs address translation, reference monitoring, cache operation, virtual-memory page swapping, and node bus transactions. The TLB described below usually resides in the microprocessor, but may be included on the MMU chip. In fact, the MMU may not even have its own chip, but may also be included in the microprocessor. The address translation performed by the MMU is depicted in FIG. 36.

Translation Lookaside Buffer

The Translation Lookaside Buffer (TLB) implements segmented addressing. Referring to FIGS. 36, 37 and 38, TLB 50 translates addresses generated by the processor (user, segment, offset) into effective virtual addresses (EVA). The TLB holds temporary copies of capabilities held in a User Segment Table (UST) 52. The TLB is a cache for address translations. The UST defines segmented-to-EVA translation. The user register 54 indicates which UST defines this user's capabilities. The interval register 56 indicates segment substitution for low-numbered segments. It has as many entries as the level register 56.
Segment numbers greater than level are unmodified. This allows multiple intervals to be executing at the same level but with different local variables. The TLB 50 holds copies of recently used capabilities for rapid access, as shown in FIG. 38. EVA is translated into a physical address using a page table. Since a segment resides on a single page, inclusion of a page descriptor in the TLB allows direct translation from segmented address to physical address. FIG. 39 shows the TLB 50 augmented with page descriptors. The page size determines how many bits of page origin are used to form the physical address; larger pages use fewer bits.

A layered network prototype system with custom GaAs switch chips has been developed. It employs off-the-shelf VME-card processors for the nodes. Combining is not done in the network, but rather in the custom hardware that handles network operations for each node. It is believed to be the first platform to execute a DANCE program in parallel, with all the performance-degrading emulation by conventional hardware.

Cache

The cache 60 (FIG. 40) holds values in fast static RAM (SRAM). The values in the cache are the same as the values obtained from main memory in absence of the cache, except for temporary variables, which have no main memory addresses at all. Shared variables are never cached. Fortunately the vast majority of state variables can be cached.

Hidden Registers

Several hidden registers are used by the microprocessor but are not directly accessible: instruction counter 40, user 42, nesting level 44, interval 46, access rights 48, and security level 50. The hidden registers are shown in FIG. 35. The instruction counter (ic) 40 contains the address of the next instruction to be executed by this processor. The "user" register 42 identifies what program the processor is doing. It is implicitly part of every address the processor generates. It tells which User Segment Table defines segments for this program. When Multitude archi
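The segment-substitution step of the translation can be sketched as a small model. This is an assumption-laden illustration (the table layouts, field names, and numbers are invented for clarity, not the Multitude hardware): segments at or below the level register are remapped through the Interval Mapping Table before the User Segment Table supplies the base of the EVA.

```python
# Illustrative model of (user, segment, offset) -> EVA translation.
# Table shapes and values are hypothetical, not the patented design.

def translate(segment, offset, level, imt, ust):
    # Segments up to 'level' are private to an interval: remap via the IMT.
    # Segment numbers greater than level pass through unmodified.
    if segment <= level:
        segment = imt[segment]
    base = ust[segment]          # capability: segment base in virtual memory
    return base + offset         # effective virtual address (EVA)

ust = {3: 0x1000, 7: 0x8000}     # user segment table: segment -> base
imt = {0: 3}                     # this interval's segment 0 is really segment 3

assert translate(0, 0x24, level=1, imt=imt, ust=ust) == 0x1024  # remapped
assert translate(7, 0x10, level=1, imt=imt, ust=ust) == 0x8010  # unchanged
```

Two intervals at the same level would simply carry different IMTs, so each sees independent segments inside the same User Segment Table.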
tecture computers are timeshared as a public utility, many different users' programs will be running concurrently. Processors will be switching between users frequently. The user register ensures that user programs cannot access memory locations or other resources without an explicit entry in the User Segment Table. This is a valuable security mechanism.

The "level" register 44 defines the nesting level of the currently executing code. The default value for the segment number of every address is contained in the level register.

The wback instruction requires that the cache entries for a particular segment be known. A cache line 62 holding four 64-bit words needs 335 bits, not including parity. Wide cache lines strain limitations of chip bonding pads and package pins. Multiplexing can alleviate pinout restrictions at the cost of two-cycle cache access. Standard, two-way, set-associative, LRU replacement cache management is fine, except protected segments must be written back upon wback. FIG. 41 shows the set-associative cache.

One method would be to read every cache line and compare its user and segment fields with the segment being written back, and if the cache line is dirty (has been written) write the values back into main memory. The disadvantage of this approach is the time to cycle through every cache line. Augmenting the TLB with three parameters can substantially reduce the number of cache lines considered for wback. Retaining the largest and smallest offset numbers used to write variables in the segment limits the range of offsets that must be tested. Counting the number of writes tells the maximum number of cache lines that may need writeback. Write counts of zero, one or two are especially helpful.

Memory Controller

The Memory Controller reads and writes the DRAM chips according to node bus transactions. It performs the fetch-op for combinable operations. It checks the tags for proper use. It clears (sets to nil) memory when allocated.

Input/Output Controller

The input/output controller (IOC) swaps pages to disk, reads or writes disk files, and handles any other peripherals under command of the OS.

Request and Response Chips

The request and response chips interface the node to the layered network.

Construct Implementations

The following construct implementations are the form of code the compiler would generate for the particular command. English descriptions of code appear where appropriate.

Parallel Composition

    sync    shared int          --sync must be on shared, noncacheable page
    par     mov #n;sync;1       --initialize sync to n
            alloc sp,p;#c1size;c1block  --allocate spread, protected
            --fill in state variables in activation block: c1block
            sched c1block;c1    --schedule starting at c1 using c1block
            alloc sp,p;#c2size;c2block
            --fill in state variables in c2block
            sched c2block;c2
            ...
            alloc sp,p;#cnsize;cnblock
            --fill in state variables in cnblock
            sched cnblock;cn
            term                --end this interval
    c1      --code implementing c1
            wback c1block       --clear cache
            faddi:w sync;-1;tmp1    --segment for sync is level-1
            eqi:w 1;tmp1        --are we last to sync?
            branch EQ;rest1
            term                --no, quit
    rest1   rtn rest            --yes, continue parent
    c2      --code implementing c2
            wback c2block
            faddi:w sync;-1;tmp2
            eqi:w 1;tmp2
            branch EQ;rest2
            term
    rest2   rtn rest
            ...
    cn      --code implementing cn
            wback cnblock
            faddi:w sync;-1;tmpn
            eqi:w 1;tmpn
            branch EQ;restn
            term
    restn   rtn rest
    rest    --copy state variables (if needed)
            --last sync does rest
            free c1block
            free c2block
            ...
            free cnblock
            --code following the parallel composition
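The three wback-narrowing parameters can be sketched as a model. This is a hedged illustration (the tracker class and line size are invented for the example): the smallest and largest written offsets bound which cache lines must be examined, and the write count bounds how many of them can actually be dirty.

```python
# Illustrative sketch of narrowing writeback with three per-segment
# parameters: min written offset, max written offset, write count.
# Names and the 4-words-per-line figure are assumptions for the example.

class SegmentWriteTracker:
    def __init__(self):
        self.min_off = None
        self.max_off = None
        self.writes = 0

    def note_write(self, offset):
        self.writes += 1
        self.min_off = offset if self.min_off is None else min(self.min_off, offset)
        self.max_off = offset if self.max_off is None else max(self.max_off, offset)

def candidate_lines(tracker, line_words=4):
    """Cache lines whose offset range could hold a written word."""
    if tracker.writes == 0:
        return range(0)
    return range(tracker.min_off // line_words,
                 tracker.max_off // line_words + 1)

def max_dirty_lines(tracker, line_words=4):
    """Never more dirty lines than writes, nor than lines in the range."""
    return min(len(candidate_lines(tracker, line_words)), tracker.writes)

t = SegmentWriteTracker()
for off in (5, 6, 21):
    t.note_write(off)
assert list(candidate_lines(t)) == [1, 2, 3, 4, 5]
assert max_dirty_lines(t) == 3
```

With a write count of zero the wback can skip the cache entirely, which is why counts of zero, one or two are especially helpful.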
Alternative Command

    if      --evaluate boolean expression for g1
            branch:g1 s1    --g1 is condition code or flags: EQ, LT, CC, ...
            --evaluate boolean expression for g2
            branch:g2 s2
            ...
            --evaluate boolean expression for gn
            branch:gn sn
            goto error      --an error has occurred
    s1      --code implementing s1
            goto fi
    s2      --code implementing s2
            goto fi
            ...
    sn      --code implementing sn
    fi      --code executed after alternative command

Parallel Command

    FORALL i IN j..k DO body

    i       shared int      --i in activation record, filled w/address by alloc
    bar     shared int      --barrier synchronization variable
    forall  alloc:w sh;#2;i --get addresses for i and bar
            mov j;i         --initialize i
            sub:i:w j;k;tmp
            addi:w 1;tmp;bar    --number of subintervals
            schedn myblock;body;bar
            term
    after   --code following parallel command
    body    faddi:w i;1;tmpi
            --code for body using tmpi for i
            faddi:w bar;-1;tmpr
            eqi:w 1;tmpr
            branch EQ;goaft
            term
    goaft   goto after

Iterative Command

    do      --evaluate g
            branch:g od     --g is condition code for branch
            --code implementing s
            branch:T do     --always branch
    od      --code executed after iterative command

Math Symbols

    IN   member of                 NOT IN   not member of
    0    empty set                 C=       subset
    C    proper subset             D        superset
    n    intersection              u        union
    x    Cartesian product         o        composition
    ->   function space; mapping, or implication
    <->  double implication
    /=   not equal                 >=   at least
    <=   at most                   ~=   about equal
         empty sequence                 equality by definition
         partial order                  irreflexive partial order
         infinite sequence              function symbol
    EXISTS  there exists           FORALL   for all
         next                      domain operator
         sometime
    SUM  summing quantifier        PROD  multiplying quantifier
         infer

Textual Substitution:

If no variable in expression e has the same name as any bound variable in predicate P, then the substitution is defined to be the result of replacing every free occurrence of x with e.

Simple DANCE Grammar

This is a simplified grammar for DANCE. There are several (small) differences from the full-blown grammar. In particular, parameters in function and procedure declarations are separated by semicolons; parameters in function and procedure invocations are separated by commas.

Predicate Calculus:

Law of Negation: P = NOT(NOT(P))
Law of Excluded Middle: P OR NOT(P) = T
Law of Contradiction: P AND NOT(P) = F
Law of Implication: P => Q = NOT(P) OR Q
Law of Equality: (P = Q) = (P => Q) AND (Q => P)
Laws of OR-simplification:
    P OR P = P
    P OR T = T
    P OR F = P
    P OR (P AND Q) = P
Laws of AND-simplification:
    P AND P = P
    P AND T = P
    P AND F = F
    P AND (P OR Q) = P
Commutative Laws:
    (P AND Q) = (Q AND P)
    (P OR Q) = (Q OR P)
    (P = Q) = (Q = P)
Associative Laws:
    P AND (Q AND R) = (P AND Q) AND R
    P OR (Q OR R) = (P OR Q) OR R
Distributive Laws:
    P OR (Q AND R) = (P OR Q) AND (P OR R)
    P AND (Q OR R) = (P AND Q) OR (P AND R)
DeMorgan's Laws:
    NOT(P AND Q) = NOT(P) OR NOT(Q)
    NOT(P OR Q) = NOT(P) AND NOT(Q)
AND-elimination: (P AND Q) => P
OR-introduction: P => (P OR Q)
DeMorgan's Laws for Quantifiers:
    (EXISTS B : R : P) = NOT(FORALL B : R : NOT(P))
    (FORALL B : R : P) = NOT(EXISTS B : R : NOT(P))
Conjunction Law:
    (FORALL B : R : P AND Q) = (FORALL B : R : P) AND (FORALL B : R : Q)
Disjunction Law:
    (EXISTS B : R : P OR Q) = (EXISTS B : R : P) OR (EXISTS B : R : Q)
Counting Law:
    FORALL I : I NOT IN R : (NUM B1 : R OR I : Q) = (NUM B2 : R : Q) + 1
Summing Law:
    FORALL I : I NOT IN R : (SUM B1 : R OR I : Q) = (SUM B2 : R : Q) + I
Product Law:
    FORALL I : I NOT IN R : (PROD B1 : R OR I : Q) = (PROD B2 : R : Q) * I
Empty Range Laws:
    (FORALL B : F : P) = T
    (EXISTS B : F : P) = F
    (NUM B : F : P) = 0
    (SUM B : F : P) = 0
    (PROD B : F : P) = 1
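The propositional laws above can be machine-checked by enumerating every truth assignment; a small sketch (helper names are for illustration only):

```python
from itertools import product

def valid(law, nvars):
    """True iff the law holds for every assignment of its variables."""
    return all(law(*vals) for vals in product([False, True], repeat=nvars))

implies = lambda p, q: (not p) or q

# Law of Implication: P => Q  =  NOT(P) OR Q
assert valid(lambda p, q: implies(p, q) == ((not p) or q), 2)
# Law of Equality: (P = Q)  =  (P => Q) AND (Q => P)
assert valid(lambda p, q: (p == q) == (implies(p, q) and implies(q, p)), 2)
# DeMorgan: NOT(P AND Q)  =  NOT(P) OR NOT(Q)
assert valid(lambda p, q: (not (p and q)) == ((not p) or (not q)), 2)
# Distributive: P OR (Q AND R)  =  (P OR Q) AND (P OR R)
assert valid(lambda p, q, r: (p or (q and r)) == ((p or q) and (p or r)), 3)
```

Exhaustive enumeration suffices here because each law ranges over a fixed, small number of boolean variables.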


Assignment Axiom: {P[x := e]} x := e {P}

Weakest Precondition Inference Rule:

Sequential Composition Rule:

Weakest preconditions of statements:

Alternative Rule:

Iterative Rule:

    FORALL i : i IN 1..n : {I and Bi and bound = bd} Si {I and bound < bd}
    infer  {I} DOIT {I and not (B1 or ... or Bn) and not (bound > 0)}

Rule of Consequence:

WP for Iterative Command: Let BB be the disjunction of the guards: BB = B1 or B2 or ... or Bn. Let I be a predicate, and "bound" be an integer-valued expression whose value is nonnegative. Then:

    (I and BB) => (bound > 0) and
    (FORALL i : 1 <= i <= n : (I and Bi) => wp(Si, I) and
        ((I and Bi and bound = X) => wp(Si, bound < X)))
    infer  I => wp(DOIT, I and not(BB))
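The Iterative Rule's proof obligations can be checked on a concrete loop. A sketch, assuming the simple loop DO x > 0 -> x := x - 1 OD with invariant x >= 0 and bound function x (all names chosen for the example):

```python
# Check the Iterative Rule's obligations for:  DO x > 0 -> x := x - 1 OD
# with invariant I(x): x >= 0 and bound(x) = x, over a finite test range.

guard = lambda x: x > 0
body = lambda x: x - 1
I = lambda x: x >= 0
bound = lambda x: x

for x in range(0, 50):
    if I(x) and guard(x):
        assert bound(x) > 0                # (I and BB) => bound > 0
        assert I(body(x))                  # {I and B} S {I}: invariant kept
        assert bound(body(x)) < bound(x)   # bound strictly decreases
    if I(x) and not guard(x):
        assert x == 0                      # on exit: I and not BB
```

The strictly decreasing nonnegative bound is what guarantees termination, so the rule yields total correctness, not just partial correctness.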

WP for Parallel Command:

    wp(FORALL i IN range DO S, Q) = wp(S[i1],Q) and wp(S[i2],Q) and ... and wp(S[ik],Q)

where i1, i2, ..., ik are the values i takes in the range, and S[ix] is the statement S with i replaced by the value ix.

Parallel Composition Rule:

    AccessOK(S1,S2) =
        FORALL V : V IN S1 UNION S2 :
            (V NOT IN lhs(S1) UNION lhs(S2))
            or (V IN S1 and V NOT IN S2)
            or (V IN S2 and V NOT IN S1)
            or (V IN co(S1) and V IN co(S2))

Existential Quantification Rule:

    (d1,d2 : domain : S[d1] SUBSET S[d2]) and
    (s : string : (d1 + "." + s) = d2) and
    (FORALL V : variable : V IN d1 => V IN d2) and
    (x IN S[d2]) and (x = c) and (P => P1) and (Q1 => Q)
    infer  {P} DECLARE x = c begin {P1} w {Q1} end {Q}

Parallel Command Rule:

    (FORALL i : i IN range : {P(i)} S(i) {Q(i)}) and
    (FORALL i,j : i,j IN range : AccessOK(S(i),S(j))) and
    (FORALL i : i IN range : P => P(i)) and
    ((FORALL i : i IN range : Q(i)) => Q)
    infer  {P} FORALL i IN range DO S {Q}

DANCE Language Reference Manual

This DANCE Language Reference Manual parallels the Ada language reference manual (MIL-STD-1815A). Principally, naming conventions, packages, and subprogram syntax are based on Ada, which in turn was based on Pascal. The grammar of DANCE is defined using LNF. LNF productions can be parsed by a modified CYK algorithm. LNF is defined in Appendix C, which contains the complete grammar. Appendix D has LNF productions for DANCE in lexicographic order.

Lexical Elements

This chapter defines the lexical elements of DANCE. Lexical elements are denoted with the same LNF notation as the language grammar. LNF notation is described in Appendix C. Simply, LNF productions have either one or two non-terminals (enclosed in angle brackets <>) or a single terminal on the right hand side. For lexical token definition only, the vertical bar, |, should be read as "or."

Character Set

Any lower case letter is equivalent to the corresponding upper case letter, except within character strings and character literals.
All language constructs may be represented with a basic character set, which is subdivided as follows:

Weakest Precondition Predicate Transformers

Law of Conjunction: wp(S,Q) and wp(S,R) = wp(S, Q and R)
Law of Disjunction:

merely approximate them with floating point numbers. Rational numbers are denoted by a vertical bar between integers.

The other special character group is specially used to

Lexical Units and Spacing Conventions

A program is a sequence of lexical units; the partitioning of the sequence into lines and the spacing between lexical units does not affect the meaning of the program. Identifiers are used as names and also as reserved words.


The length of a string is the length of the sequence represented. Backslashes used to delineate special character literals are not counted for string length. Catenation must be used to represent strings longer than one line, and strings containing control characters. The plus character, +, is used for string literal concatenation.

Examples

    "String containing" + ASCII.CR + ASCII.LF + "Control Characters"
    "String containing bracket characters \\\\\\{\}"

Comments

A comment starts with two hyphens and is terminated by the end of line. It may only appear following a lexical unit, or at the beginning or end of a program unit. Comments have no effect on the meaning of a program; their sole purpose is the enlightenment of a human reader.

Example

    --This is a comment.

    Reserved Word   Used For
    OR              logical or
    OUT             mode of procedure operands
    PACKAGE         start (or end) of package specification or body
    PRIVATE         declare existence of a type with assignment
    PROCEDURE       procedure declaration
    PROD            product quantifier
    RANGE           range restriction
    RAT             rational number type
    RECORD          start of record type
    REM             integer remainder
    REMOTE          caller and performer of routine may be distant
    RENAME          introduce alias, nice when names have many dots
    REQUIRED        demand type parameters for this package
    RETURN          declare returned type; command at end
    SHARED          prohibit caching of value
    SKIP            do nothing command
    SPREAD          shared data structure, spread over many memory banks
    SUM             summing quantifier
    SWAP            indivisibly exchange a given value with that stored
    T               true
    TYPE            start of type declaration
    VARIANT         start (or end) of variant type
    XOR             logical exclusive-or

Declarations and Constructed Types

DANCE defines several forms of named entities. A named entity can be: a number, an enumeration literal, an object

Pragmas

Pragmas (compiler directives) are not included in DANCE at this time.
(variable), a discriminant, a record component, a type, a Subprogram (procedure or function), a package, or a param Reserved Words eter. The identifiers listed below are reserved words that have Declarations special significance in DANCE. Declared identifiers may A declaration associates an identifier with a declared not be reserved words. Reserved words' lower, upper, or entity. Each identifier must be explicitly declared before it is mixed case equivalents are considered the Same. used. A declaration may declare one or more entities. 35 ::= ::= Reserved Word Used For ::= ::= ACCESS access type declaration; Ada-style pointers ::= ALL deferences access types when object is scalar: ALL AND logical operator 40 ::=::=::=::= CAND conditional and, the left argument eval 1st The proceSS by which a declaration achieves its effect is CONSTANT declare a named, constant value COR conditional or, the left argument eval 1st 45 called the “elaboration” of the declaration. This process DIV integer divide generally involves Several Successive actions (at compile DO start of do loop, or part of parallel command time): END end of package etc. 1) The identifier of a declared entity is “introduced” at the F false FETCH ADD read the value, indivisibly add to it point of its first occurrence; it may hide other previ FETCH AND read the value, indivisibly AND with it 50 ously declared identifiers from then on. FETCH OR read the value, indivisibly OR with it 2) The entity is elaborated. For all forms of declarations, FETCH XOR read the value, indivisibly XOR with it except those of Subprograms and packages, an identi FILE file type (not included in DANCE yet) FLOAT floating point type fier can only be used as a name of a declared entity once FORALL bounded universal quantification the elaboration of the entity is completed. 
A Subpro FROM part of import declaration 55 gram or package identifier can be used as a name of the FUNCTION declare a function corresponding entity as Soon as the identifier is IF start of alternative command introduced, hence even within the declaration of the IMPORT make objects in named packages visible entity. Elaboration at compile time consists of placing IN part of parallel command, before range INTEGER integer type an entry in the Symbol table corresponding to the Scope IS separates a subprogram declaration from its body of the declaration. LIMITED declares that a type exists, but no 60 3) The last action performed by the elaboration of an MOD integer remainder, result always non-negative object declaration may be the initialization of the NEW allocate memory for object, set up access variable NIL universal value held by uninitialized variables declared object(s), if the initialization is a constant NOT logical negation value. NUM counting quantifier OD end of iterative command 65 Object and Number Declarations OF part of array declaration An object is an entity that contains a value of a specified type. Objects can be introduced by object declarations. 5,867,649 103 104 Objects can also be components of other objects, or formal ::= parameters of Subprograms. ::= ::=“TYPE” ::= ::= ::= ::= ::=": ::= ::= ::= ::= ::= ::= ::= constants::=“CONSTANT ::=

::="SHARED’ ::= ::="SPREAD” ::= remotes::=“REMOTE ::= ::= 15 ::= Examples ::= ::= ::= TYPE operator type1,type2 IS ARRAYtype1 OF ::=":=” type2, ::= TYPE array of integers IS operator0 An object declaration may include an expression speci 10,INTEGER); fying the initial value for the declared objects. Otherwise the object has value “NIL” and will cause a runtime error if the Type Definition value is used before a value is aissigned. 25 A type definition defines a type for type declarations and An object is constant if the reserved word “CONSTANT" other type expressions. appears in its declaration. Objects that are not constant are ::= called “variables.” ::= A number declaration introduces one or more identifiers ::= naming a number defined by a literal expression, which ::= involves only numeric literals, names of numeric literals, ::= parenthesized literal expressions and predefined arithmetic ::= operatorS. ::= ::= ::= ::= 35 ::= ::=::= ::=::= ::= ::= 40 ::= During elaboration, an object is bound to its type. Restric ::= tion of object behavior to its type (type checking) is essential ::= to program correctness. Only when our programs do exactly what we say they do (nothing more or less) then quality Subtype Indication programming can be achieved. 45 A Subtype indication is a type name with possibly a Chackers beware! Your days of code Slinging and bit constraint limiting the range of values to a Subset of the fiddling are few. The new wave of Software engineering will named type. displace your midnight marathons by disciplined teams ::= concocting understandable specifications, elegant designs, ::= Type Declarations The elaboration of a type definition always produces a distinct type. TIhe type identifier Serves as a name of the A “type' denotes a Set of values and a set of operations type resulting from the elaboration of the type definition. applicable to those values. 
"Scalar types have no compo nents, their values can be enumerated, integer or floating 55 point. "Array' and “record' types are composite; their Examples of type declarations values consist of Several component values. “Private” types have well defined possible values, but are not known to users TYPE enumerated IS (one, two, three, four); --enumerated of Such types. Private types are equivalent to Cardelli & TYPE int IS integer RANGE 0.100: --integer Wegner's existentially quantified types. TYPE ch IS char; --character 60 TYPE b IS bool; --boolean Record types may have Special components called "dis TYPE fuzzy IS float RANGE 0.0.1.0; --floatling point criminants' whose values distinguish alternative forms of TYPE gooey IS ARRAYenumerated OF fuzzy; --array values. Discriminants are defined by a “discriminant part.” TYPEs IS string: --string TYPE di IS RECORD --record A private type is known only by its names its discriminiants handle : string; (if any), and the set of operations applicable to its values. 65 salary : float; ::= funny : bool; ::= 5,867,649 105 106 -continued ::="CHAR’ END RECORD; Boolean Types TYPE type operator type 1 type2 IS ARRAYI type1 OF type2; --type definitions with parameters are type operators There is a predefined enumeration type “bool.” It contains TYPE gooey2 IS type operatorenumerated,fuzzy: 5 --a derived type is created by applying a type operator two literals: “Tfor true and “F” for false ordered with the TYPE myob IS PRIVATE; --private relation F-T. The evaluation of a “condition' must result in TYPE ptr IS ACCESS gooey2: --aCCCSS type bool. TYPE two input integer function IS integer,integer => integer; --function type ::="BOOL TYPE either or IS --variant 1O VARIANT selector : enumerated: Integer Types one : integer, two : float, Ada requires all integers to be declared with range con three, four : fuzzy straints. This makes me uncominfortable. I hope that Multi END VARIANT tude will not place a bound on integer representation. 
TYPE rational range ISO1.11; --rational 15 Anyway: ::=::=::=

a ... Z. January...June TYPE fuzzy IS 0.0 . . . 1.0; TYPE weights IS array1 . . . 5 OF FLOAT; where January and June are elements of an enumerated type. 40 Rational Types Enumeration Types Fast GCD hardware to supports Rat. A vertical bar sepa An enumeration type definition defines an ordered Set of rates the denominator from the numerator in rational literals. distinct values that are denoted by enumeration literals. ::=::= ::= ::= 45 ::=")” Examples ::=::= Array Types ::= TYPE Month IS (January, February, March, April, June, ::= July, August, September, October, November, December); ::="ARRAY.” ::=::= A character type is an enumeration type that contains 65 ::= be used in character Strings. 5,867,649 107 108 ::=::= A variant is like a record in that it has named components, ::= but a variant object may have only one actual component determined by its discriminant. ::=::= ::= ::="<> ::="VARIANT Unconstrained arrays have no range explicitly given. The range of the allowed indices is determined by the ranges of 10 their types. Values for unassigned components are initially NIL. Constrained arrays have ranges in their declarations. Examples ARRAY 1 ... 100 OF employee records 15 ::= ARRAY < > OF mystery achievements The discriminant of a variant can only be changed by Strings assigning the whole variant. The predefined type “STRING” denotes a onedimen Example sional arrays of the predefined type “CHARACTER,” indexed by the predefined type “NAT” 25 TYPE device IS (printer, disk, drum); TYPE NAT IS INTEGER RANGE 0 . . . integer last1; TYPE data IS (open, closed); TYPE STRING IS ARRAY-> OF CHAR; TYPE peripheral IS RECORD Concatenation is a predefined operator for the type status : state, STRING and for one-dimensional array types. Concatena variant part: tion is represented as "+'. The relational operators<, <=, >, VARIANT unit: device; printer : 1...page size, and >= are defined for Strings by lexicographic order. 
disk, drum: ::="STRING” RECORD cylinder : cylinder index Example track : track number 35 END RECORD This String has+ catenation. -the first character of a END WARIANT END RECORD; String has index Zero periph.variant part.disk, track first ch :=Some String O --yields a track number from a peripheral periph.variant part.unit --tells what device it is Record Types 40 A record object is a composite object consisting of named components, which may be of different types Access Types ::= DANCE contains acceSS types which are a lot like point ::= CS. ::= month : months, ::= END RECORD; ::= ::= < program component> Discriminants ::= A discriminant is a value of an enumeration type 65 ::=::= ant part is used. ::= 5,867,649 109 110 ::= ::= ::= ::= ::= The body of a Subprogram or package declared in a ::= declarative part must be provided in the same declarative ::= part. If the body of one of these program unitS is a separately compiled Subunit, it must be represented by a body Stub at Examples the place where it would otherwise appear. ::= ::= ::="Separate' Slices A Slice denotes a Subarray formed by a Sequence of Derived Type Definition consecutive components of an array. A derived type definition provides a type expression ::= including type parameters. Type parameters are enclosed in 15 ::= Square brackets immediately following the type name in a ::= type declaration. When a type declaration has parameters it does not create a new type, but rather a type operator. A Example derived type is created when the type operator is invoked Monthly sales January . . . June corner1 . . . n/2,1 . . . with type parameters. n/2 ::= ::=::= ::= ::= ::= '' ::= ::= 5,867,649 111 112 ::=23, julie=>28, children=>0) to days date:= (day =>7, month =>November, year=>1992) Expressions ::= ::= Relational Operators ::= The relational operators have operandi of the same type ::= and return values of the predefined type Bool. 
Equality (=) and inequality (/=) are defined for any two objects of the same type except when the type is "limited private."

=  /=          (in)equality      any type                Bool
<  <=  >  >=   test for order    scalar or discrete      Bool
IN             inclusion         discrete type IN        Bool
                                 discrete range

Equality for discrete types is equality of values. Equality for Float types requires some misty handwaving about nearly equal. If the numerical analysis people want some special equality test, they must supply it; DANCE compares for exact equality. For equality tests of arrays and records, ... strings (arrays of characters) and provides "telephone directory" ordering of various lengths of strings. However, in ... type, including array length.

Each primary has a value and type. The only names allowed as primaries are names denoting objects. The type of an expression depends on the type of its constituents and on the operators applied. For the case where some programmer used operator overloading, the expression type must be determinable by context.

Examples

j + 1
Apik - (Apij/Apij) Apk
NOT(k > n)

Allocators

Evaluating an allocator ("NEW") creates an object and yields an access value that designates the object. An implementation must guarantee that any object created by evaluation of an allocator remains allocated for as long as this object or one of its subcomponents is accessible directly or indirectly, that is, as long as it can be denoted by some name. An implementation may reclaim the storage occupied by an object created by an allocator, once this object has become inaccessible. This should be sufficient to prohibit security breaches due to use of stale access types.

Multiplying Operators

Multiplication (*) and division (/) are straightforward for both Int and Float types. Modulus (MOD) and remainder (REM) are defined by the relations:

A = (A/B)*B + (A REM B)
A = B*N + (A MOD B)

where (A REM B) has the same sign as A and an absolute value less than B.
where (A MOD B) has the same sign as B, which must be positive, and N is some integer. Make sure you get the operator you really want.

Allocator example

TYPE ptr IS ACCESS some type;   --in declaration part
pointer : ptr;                  --in declaration part
pointer := NEW ptr;             --in subprogram body

Static Expressions

Assignment examples

k := k + 1
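The REM/MOD distinction above (REM takes the sign of the dividend, MOD the sign of the divisor) can be checked with a small Python sketch; Python's `%` already behaves like MOD, and REM is recovered from truncating division.

```python
import math

def rem(a: int, b: int) -> int:
    # A = trunc(A/B)*B + (A REM B); the result has the sign of A.
    return a - math.trunc(a / b) * b

def mod(a: int, b: int) -> int:
    # A = B*N + (A MOD B) for some integer N; the result has the
    # sign of B, which must be positive.
    assert b > 0
    return a % b

assert rem(-7, 3) == -1          # sign of A, absolute value < |B|
assert mod(-7, 3) == 2           # sign of B
assert rem(7, 3) == mod(7, 3) == 1   # they agree when A >= 0
```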

Array Assignment

Arrays are functions from a discrete type to the base type. The whole array can be assigned, or member by member.

Array assignment examples
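A minimal Python sketch of the two assignment forms just described, modelling a DANCE array as a Python list (the names are invented for illustration):

```python
# Whole-array assignment versus member-by-member assignment.
a = [0, 0, 0, 0]
b = [1, 2, 3, 4]

# Member by member: copy one component at a time.
for i in range(len(a)):
    a[i] = b[i]
assert a == [1, 2, 3, 4]

# Whole array: assign a copy of the entire value at once.
c = list(b)
assert c == b and c is not b   # same value, distinct object
```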