Reading Assignment

• MULTILISP: a language for concurrent symbolic computation, by Robert H. Halstead (linked from class web page)

Lazy Evaluation

Lazy evaluation is sometimes called "call by need." We do an evaluation when a value is used, not when it is defined.

Scheme provides for lazy evaluation:

(delay expression)
  Evaluation of expression is delayed. The call returns a "promise" that is essentially a lambda expression.

(force promise)
  A promise, created by a call to delay, is evaluated. If the promise has already been evaluated, the value computed by the first call to force is reused.
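As a small illustration (a sketch, not from the reading), forcing the same promise twice shows that the delayed computation happens only once:

(define p
  (delay (begin (display "evaluating...")
                (newline)
                (* 6 7))))

(force p)   ; prints "evaluating..." and returns 42
(force p)   ; returns 42 immediately; the promise remembers its value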


Example: An argument to a function is strict if it is always used. Non-strict arguments may cause failure if evaluated unnecessarily.

Though and is predefined, writing a correct implementation for it is a bit tricky. The obvious program

(define (and A B)
  (if A
      B
      #f
  ))

is incorrect since B is always evaluated whether it is needed or not. In a call like

(and (not (= i 0)) (> (/ j i) 10))

unnecessary evaluation might be fatal.

With lazy evaluation, we can define a more robust and function:

(define (and A B)
  (if A
      (force B)
      #f
  ))

This is called as:

(and (not (= i 0))
     (delay (> (/ j i) 10)))

Note that making the programmer remember to add a call to delay is unappealing.
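One common remedy (our own sketch, not from the slides) is to define the short-circuit behavior as a macro. Macro arguments are substituted unevaluated, so no explicit delay is needed at the call site; and2 and ratio-big? are hypothetical names:

(define-syntax and2            ; macro version of and
  (syntax-rules ()
    ((_ a b) (if a b #f))))

(define (ratio-big? j i)       ; hypothetical example use
  (and2 (not (= i 0)) (> (/ j i) 10)))

(ratio-big? 5 0)               ; => #f, and (/ 5 0) is never evaluated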

Delayed evaluation also allows us a neat implementation of suspensions. The following definition of an infinite list of integers clearly fails:

(define (inflist i)
  (cons i (inflist (+ i 1))))

But with use of delays we get the desired effect in finite time:

(define (inflist i)
  (cons i
        (delay (inflist (+ i 1)))))

Now a call like (inflist 1) creates

[Diagram: a cons cell holding 1 and a promise for (inflist 2)]

We need to slightly modify how we explore suspended infinite lists. We can't redefine car and cdr, as these are far too fundamental to tamper with. Instead we'll define head and tail to do much the same job:

(define head car)

(define (tail L)
  (force (cdr L)))

head looks at car values, which are fully evaluated. tail forces one level of evaluation of a delayed cdr and saves the evaluated value in place of the suspension (promise).
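For example, a take helper built on head and tail (a hypothetical helper, not on the slides) pulls a finite prefix out of the infinite list, forcing one suspension per element:

(define (take n L)
  (if (= n 0)
      '()
      (cons (head L)
            (take (- n 1) (tail L)))))

(take 5 (inflist 1))    ; => (1 2 3 4 5)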


Given

(define IL (inflist 1))

(head (tail IL)) returns 2 and expands IL into

[Diagram: cons cells holding 1 and 2, ending in a promise for (inflist 3)]

Exploiting Parallelism

Conventional procedural programming languages are difficult to compile for multiprocessors. Frequent assignments make it difficult to find independent computations. Consider (in Fortran):

do 10 I = 1,1000
   X(I) = 0
   A(I) = A(I+1)+1
   B(I) = B(I-1)-1
   C(I) = (C(I-2) + C(I+2))/2
10 continue

This loop defines 1000 values for arrays X, A, B and C.

Which computations can be done in parallel, partitioning parts of an array to several processors, each operating independently?

• X(I) = 0
  Assignments to X can be readily parallelized.

• A(I) = A(I+1)+1
  Each update of A(I) uses an A(I+1) value that is not yet changed. Thus a whole array of new A values can be computed from an array of "old" A values in parallel.

• B(I) = B(I-1)-1
  This is less obvious. Each B(I) uses B(I-1), which is defined in terms of B(I-2), etc. Ultimately all new B values depend only on B(0) and I. That is, B(I) = B(0) - I (see the check after this list). So this computation can be parallelized, but it takes a fair amount of insight to realize it.

• C(I) = (C(I-2) + C(I+2))/2
  It is clear that even and odd elements of C don't interact. Hence two processors could compute even and odd elements of C in parallel. Beyond this, since both earlier and later C values are used in each computation of an element, no further means of parallel evaluation is evident. Serial evaluation will probably be needed for even or odd values.
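To make the insight concrete, here is a small Scheme check (our sketch, not part of the slides) that the serial recurrence for B agrees with the closed form B(I) = B(0) - I:

(define (b-serial b0 i)        ; B(I) computed the loop's way
  (if (= i 0)
      b0
      (- (b-serial b0 (- i 1)) 1)))

(define (b-closed b0 i)        ; the form each processor can compute directly
  (- b0 i))

(= (b-serial 100 10) (b-closed 100 10))   ; => #t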


Exploiting Parallelism in Scheme

Assume we have a shared-memory multiprocessor. We might be able to assign different processors to evaluate various independent subexpressions. For example, consider

(map (lambda (x) (* 2 x))
     '(1 2 3 4 5))

We might assign a processor to each list element and compute the lambda function on each concurrently:

[Diagram: Processor 1 through Processor 5 each apply the lambda to one of 1 2 3 4 5, yielding 2 4 6 8 10]

How is Parallelism Found?

There are two approaches:

• We can use a "smart" compiler that is able to find parallelism in existing programs written in standard serial programming languages.

• We can add features to an existing language that allows a programmer to show where parallel evaluation is desired.

Concurrentization

Concurrentization (often called parallelization) is the process of automatically finding potential concurrent execution in a serial program. Automatically finding concurrent execution is complicated by a number of factors:

• Data Dependence
  Not all expressions are independent. We may need to delay evaluation of an operator or subprogram until its operands are available. Thus in

  (+ (* x y) (* y z))

  we can't start the addition until both multiplications are done.

• Control Dependence
  Not all expressions need be (or should be) evaluated. In

  (if (= a 0)
      0
      (/ b a))

  we don't want to do the division until we know a ≠ 0.

• Side Effects
  If one expression can write a value that another expression might read, we probably will need to serialize their execution. Consider

  (define rand!
    (let ((seed 99))
      (lambda ()
        (set! seed
              (mod (* seed 1001) 101101))
        seed )) )
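Two successive calls illustrate the hidden state (a usage sketch; note that standard Scheme spells the operator modulo, so mod above may need that substitution to run):

(define r1 (rand!))   ; updates seed, returns the new value
(define r2 (rand!))   ; updates seed again; r2 differs from r1

; If the two calls ran in parallel, both could read the original
; seed before either set! happened, making the results equal.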


Now in

(+ (f (rand!)) (g (rand!)))

we can't evaluate (f (rand!)) and (g (rand!)) in parallel, because of the side effect of set! in rand!. In fact if we did, f and g might see exactly the same "random" number! (Why?)

• Granularity
  Evaluating an expression concurrently has an overhead (to set up a concurrent computation). Evaluating very simple expressions (like (car x) or (+ x 1)) in parallel isn't worth the overhead cost. Estimating where the "break even" threshold is may be tricky.

Utility of Concurrentization

Concurrentization has been most successful in engineering and scientific programs that are very regular in structure, evaluating large multidimensional arrays in simple nested loops. Many very complex simulations (weather, fluid dynamics, astrophysics) are run on multiprocessors after extensive concurrentization.

Concurrentization has been far less successful on non-scientific programs that don't use large arrays manipulated in nested for loops. A compiler, for example, is difficult to run (in parallel) on a multiprocessor.

Concurrentization within Processors

Concurrentization is used extensively within many modern uniprocessors. Pentium and PowerPC processors routinely execute several instructions in parallel if they are independent (e.g., read and write distinct registers). These are superscalar processors.

These processors also routinely speculate on execution paths, "guessing" that a branch will (or won't) be taken even before the branch is executed! This allows for more concurrent execution than if strictly "in order" execution is done. These processors are called "out of order" processors.

Adding Parallel Features to Programming Languages

It is common to take an existing serial programming language and add features that support concurrent or parallel execution. For example, versions of Fortran (like HPF—High Performance Fortran) add a parallel do loop that executes individual iterations in parallel.

Java supports threads, which may be executed in parallel. Synchronization and mutual exclusion are provided to avoid unintended interactions.


Multilisp

Multilisp is a version of Scheme augmented with three parallel evaluation mechanisms:

• Pcall
  Arguments to a call are evaluated in parallel.

• Future
  Evaluation of an expression starts immediately. Rather than waiting for completion of the computation, a "future" is returned. This future will eventually transform itself into the result value (when the computation completes).

• Delay
  Evaluation is delayed until the result value is really needed.

The Pcall Mechanism

Pcall is an extension to Scheme's function call mechanism that causes the function and its arguments to all be computed in parallel. Thus

(pcall F X Y Z)

causes F, X, Y and Z to all be evaluated in parallel. When all evaluations are done, F is called with X, Y and Z as its parameters (just as in ordinary Scheme). Compare

(+ (* X Y) (* Y Z))

with

(pcall + (* X Y) (* Y Z))

It may not look like pcall can give you that much parallel execution, but in the context of recursive definitions, the effect can be dramatic. Consider treemap, a version of map that operates on binary trees (S-expressions):

(define (treemap fct tree)
  (if (pair? tree)
      (pcall cons
             (treemap fct (car tree))
             (treemap fct (cdr tree))
      )
      (fct tree)
  ))

Look at the execution of treemap on

(((1 . 2) . (3 . 4)) .
 ((5 . 6) . (7 . 8)))

We start with one call that uses the whole tree. This splits into two parallel calls, one operating on ((1 . 2) . (3 . 4)) and the other operating on ((5 . 6) . (7 . 8)). Each of these calls splits into 2 calls, and finally we have 8 independent calls, each operating on the values 1 to 8.
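For instance, doubling every leaf of the tree above (this assumes a Multilisp system providing pcall; in plain Scheme, replacing pcall cons with cons gives the same answer serially):

(treemap (lambda (x) (* 2 x))
         '(((1 . 2) . (3 . 4)) .
           ((5 . 6) . (7 . 8))))
; => (((2 . 4) . (6 . 8)) . ((10 . 12) . (14 . 16)))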


Futures

Evaluation of an expression as a future is the most interesting feature of Multilisp. The call

(future expr)

begins the evaluation of expr. But rather than waiting for expr's evaluation to complete, the call to future returns immediately with a new kind of data object—a future. This future is actually an "IOU." When you try to use the value of the future, the computation of expr may or may not be completed. If it is, you see the value computed instead of the future—it automatically transforms itself. Thus evaluation of expr appears instantaneous.

If the computation of expr is not yet completed, you are forced to wait until computation is completed. Then you may use the value and resume execution. But this is exactly what ordinary evaluation does anyway—you begin evaluation of expr and wait until evaluation completes and returns a value to you!

To see the usefulness of futures, consider the usual definition of Scheme's map function:

(define (map f L)
  (if (null? L)
      ()
      (cons (f (car L))
            (map f (cdr L)))
  ))
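Since Multilisp's future isn't available in ordinary Scheme, a crude serial simulation using delay and force can help in experimenting with the definitions that follow. This is a sketch under that assumption, not Multilisp itself: a real future begins evaluating immediately on another processor and transforms itself automatically, while a promise sits idle until it is explicitly touched.

(define-syntax future      ; serial stand-in for Multilisp's future
  (syntax-rules ()
    ((_ expr) (delay expr))))

(define (touch f)           ; explicit stand-in for the implicit touch
  (force f))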

If we have a call

(map slow-function long-list)

where slow-function executes slowly and long-list is a large list, we can expect to wait quite a while for computation of the result list to complete.

Now consider fastmap, a version of map that uses futures:

(define (fastmap f L)
  (if (null? L)
      ()
      (cons (future (f (car L)))
            (fastmap f (cdr L))
      )
  ))

Now look at the call

(fastmap slow-function long-list)

We will exploit a useful aspect of futures—they can be cons'ed together without delay, even if the computation isn't completed yet. Why? Well, a cons just stores a pair of pointers, and it really doesn't matter what the pointers reference (a future or an actual result value). The call to fastmap can actually return before any of the calls to slow-function have completed:

[Diagram: a list of cons cells whose car fields hold future1, future2, future3, ...]
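Under the serial simulation sketched earlier, the shape of the computation is easy to see: fastmap returns its whole result list at once, and touching an element demands that element's value.

(define result
  (fastmap (lambda (x) (* x x)) '(1 2 3)))

(touch (car result))   ; => 1, computed only now under the simulation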


Eventually all the futures automatically transform themselves into data values:

[Diagram: the same list of cons cells, now holding answer1, answer2, answer3, ... in place of the futures]

Note that pcall can be implemented using futures. That is, instead of

(pcall F X Y Z)

we can use

((future F) (future X)
 (future Y) (future Z))

In fact, the latter version is actually more parallel—execution of F can begin even if all the parameters aren't completely evaluated.
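A sketch of that implementation (hedged: Multilisp's real pcall is built into the evaluator, and pcall-touch is a hypothetical helper name). The macro wraps the function and each argument in a future; under a true parallel evaluator all of them would already be running by the time pcall-touch touches them and applies the function:

(define (pcall-touch ff . fargs)          ; touch everything, then call
  (apply (touch ff) (map touch fargs)))

(define-syntax pcall
  (syntax-rules ()
    ((_ f arg ...)
     (pcall-touch (future f) (future arg) ...))))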
