An Adaptive Evaluation Strategy for Non-Strict Programs

Optimistic Evaluation: An Adaptive Evaluation Strategy for NonStrict Programs Robert Ennals Simon Peyton Jones Computer Laboratory, University of Cambridge Microsoft Research Ltd, Cambridge [email protected] [email protected] ABSTRACT thunk. This argument-passing mechanism is called call-by- Lazy programs are beautiful, but they are slow because they need, in contrast to the more common call-by-value. The build many thunks. Simple measurements show that most extra memory traffic caused by thunk creation and forcing of these thunks are unnecessary: they are in fact always makes lazy programs perform noticeably worse than their evaluated, or are always cheap. In this paper we describe strict counterparts, both in time and space. Optimistic Evaluation | an evaluation strategy that ex- Good compilers for lazy languages therefore use sophisti- ploits this observation. Optimistic Evaluation complements cated static analyses to turn call-by-need (slow, but sound) compile-time analyses with run-time experiments: it evalu- into call-by-value (fast, but dangerous). Two basic analy- ates a thunk speculatively, but has an abortion mechanism ses are used: strictness analysis identifies expressions that to back out if it makes a bad choice. A run-time adaption will always be evaluated [40]; while cheapness analysis lo- mechanism records expressions found to be unsuitable for cates expressions that are certain to be cheap (and safe) to speculative evaluation, and arranges for them to be evalu- evaluate [7]. ated more lazily in the future. However, any static analysis must be conservative, forc- We have implemented optimistic evaluation in the Glas- ing it to keep thunks that are \probably unnecessary" but gow Haskell Compiler. The results are encouraging: many not \provably unnecessary". Figure 1 shows measurements programs speed up significantly (5-25%), some improve dra- taken from the Glasgow Haskell Compiler, a mature op- matically, and none go more than 15% slower. timising compiler for Haskell (GHC implements strictness analysis but not cheapness analysis, for reasons discussed in Section 7.1.). The Figure shows that most thunks are either Categories and Subject Descriptors always used or are \cheap and usually used". The exact D.3.2 [Programming Languages]: Language Classifica- definition of \cheap and usually-used" is not important: we tions|Applicative (functional) languages, Lazy Evaluation; use this data only as prima facie evidence that there is a D.3.4 [Programming Languages]: Processors|Code gen- big opportunity here. If the language implementation could eration, optimization, profiling somehow use call-by-value for most of these thunks, then the overheads of laziness would be reduced significantly. This paper presents a realistic approach to optimistic eval- General Terms uation, which does just that. Optimistic evaluation decides Performance, Languages, Theory at run-time what should be evaluated eagerly, and provides an abortion mechanism that backs out of eager computa- Keywords tions that go on for too long. We make the following contributions: Lazy Evaluation, Online Profiling, Haskell The idea of optimistic evaluation per se is rather ob- • 1. INTRODUCTION vious (see Section 7). What is not obvious is how to (a) make it cheap enough that the benefits exceed the Lazy evaluation is great for programmers [16], but it car- costs; (b) avoid potholes: it is much easier to get big ries significant run-time overheads. Instead of evaluating a speedups on some programs if you accept big slow- function's argument before the call, lazy evaluation heap- downs on others; (c) make it robust enough to run allocates a thunk, or suspension, and passes that. If the full-scale application programs without modification; function ever needs the value of that argument, it forces the and (d) do all this in a mature, optimising compiler that has already exploited most of the easy wins. In Section 2 we describe mechanisms that meet these Permission to make digital or hard copies of all or part of this work for goals. We have demonstrated that they work in prac- personal or classroom use is granted without fee provided that copies are tice by implementing them in the Glasgow Haskell not made or distributed for profit or commercial advantage and that copies Compiler (GHC) [29]. bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICFP'03, August 2529, 2003, Uppsala, Sweden. Copyright 2003 ACM 1581137567/03/0008 ...$5.00. Optimistic evaluation is a slippery topic. We give an % Always Used % Cheap and Usually Used • operational semantics for a lazy evaluator enhanced 100% with optimistic evaluation (Section 5). This abstract machine allows us to refine our general design choices into a precise form. 75% The performance numbers are encouraging: We have pro- 50% duced a stable extended version of GHC that is able to compile arbitrary Haskell programs. When tested on a set of reasonably large, realistic programs, it produced a geomet- 25% ric mean speedup of just over 15% with no program slow- ing down by more than 15%. Perhaps more significantly, 0% naively written programs that suffer from space leaks can digits-of-e1digits-of-e2exp3_8gen_regexpsintegrateparaffinsprimesqueensrfib tak wheel-sieve1wheel-sieve2x2n1constraintsannainfergamtebrsa symalgwordcountatomboyercircsim speed up massively, with improvements of greater than 50% being common. Percentage of dynamically allocated thunks These are good results for a compiler that is as mature as GHC, where 10% is now big news. (For example, strictness analysis buys 10-20% [31].) While the baseline is mature, we have only begun to explore the design space for optimistic Figure 1: How thunks are commonly used evaluation, so we are confident that our initial results can be further improved. make optimistic evaluation really work, in addition to tak- ing care of real-world details like errors, input/output, and 2. OPTIMISTIC EVALUATION so on. The following sub-sections introduce these mech- Our approach to compiling lazy programs involves mod- anisms one at at time. Identifying and addressing these ifying only the back end of the compiler and the run-time concerns is one of our main contributions. To avoid con- system. All existing analyses and transformations are first tinual asides, we tackle related work in Section 7. We use applied as usual, and the program is simplified into a form the term \speculative evaluation" to mean \an evaluation in which all thunk allocation is done by a let expression whose result may not be needed", while \optimistic eval- (Section 5.1). The back end is then modified as follows: uation" describes the entire evaluation strategy (abortion, profiling, adaption, etc). Each let expression is compiled such that it can either • build a thunk for its right hand side, or evaluate its 2.1 Switchable Let Expressions right hand side speculatively (Section 2.1). Which of these it does depends on the state of a run-time ad- In the intermediate language consumed by the code generator, the let expression is the only construct that allocates justable switch, one for each let expression. The set of thunks. In our design, each let chooses dynamically whether all such switches is known as the speculation configuration. The configuration is initialised so that all lets to allocate a thunk, or to evaluate the right-hand side of the are speculative. let speculatively instead. Here is an example: If the evaluator finds itself speculatively evaluating a let x = <rhs> in <body> • very expensive expression, then abortion is used to sus- pend the speculative computation, building a special The code generator translates it to code that behaves log- continuation thunk in the heap. Execution continues ically like the following: with the body of the speculative let whose right-hand side has now been suspended (Section 2.2). if (LET237 != 0) { x = value of <rhs> Recursively-generated structures such as infinite lists • }else{ can be generated in chunks, thus reducing the costs x = lazy thunk to compute <rhs> when needed of laziness, while still avoiding doing lots of unneeded } work (Section 2.3). evaluate <body> Online profiling is used to find out which lets are ex- • pensive to evaluate, or are rarely used (Section 3). The Our current implementation associates one static switch profiler then modifies the speculation configuration, by (in this case LET237) with each let. There are many other switching off speculative evaluation, so that the offend- possible choices. For example, for a function like map, which ing lets evaluate their right hand sides lazily rather is used in many different contexts, it might be desirable for than strictly. the switch to take the calling context into account. We have not explored these context-dependent possibilities because Just as a cache exploits the \principle" of locality, op- the complexity costs of dynamic switches seem to overwhelm timistic evaluation exploits the principle that most thunks the uncertain benefits. In any case, GHC's aggressive inlin- are either cheap or will ultimately be evaluated. In practice, ing tends to reduce this particular problem. though, we have found that a realistic implementation re- If <rhs> was large, then this compilation scheme could quires quite an array of mechanisms | abortion, adaption, result in code bloat. We avoid this by lifting the right hand online profiling, support for chunkiness, and so on | to sides of such let expressions out into new functions. 2.2 Abortion is evaluated speculatively. However once the chunk is fin- The whole point of optimistic evaluation is to use call- ished (for example, after 10 elements have been created) by-value in the hope that evaluation will terminate quickly. rest switches over to lazy evaluation, causing the function It is obviously essential to have a way to recover when this to terminate quickly.

Load more