
ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 3–10, http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, 2016

Compiling functional code for system-level environment

Miroslav Kratochvíl1∗

Department of Software Engineering, Charles University in Prague, [email protected]

∗ This work was supported by the grant 11562/2016 of the Charles University Grant Agency.

Abstract: Compilation of programming languages based on λ–calculus to standalone, low-level code involves the challenge of removing the automatic memory management that is usually required for supporting the implicit allocations produced by currently available compilers. We propose a compilation approach that is able to convert pure, lazily-evaluated functional code to machine code suitable for running on bare hardware with no run-time support, and describe a Haskell-like programming language that demonstrates feasibility of the new method. Using the proposed approach, the missing ability to directly control the hardware and memory allocation may be added to purely functional languages, which otherwise have several advantages over traditional procedural languages, including easier verification and parallelization.

1 Introduction

Ideas from functional programming have always been influencing more traditional imperative programming languages. The apparent simplicity and expressiveness of λ–based constructions is reflected both in new features of some languages (notably the new functionality of C++11) and in whole new languages (Clojure, Rust, partially also in Swift or Nim). Functional languages, on the other hand, are being improved to acquire the benefits of languages with more direct control of the resulting machine code, which may be required for reaching performance, efficiency, or binary compatibility goals — there has been much research aiming to make the high-level constructions more efficient, concerning e.g. memory allocation efficiency [7], type unboxing for performance [18] and various inlining methods.

We push both of these developments to one of their possible meeting points — we demonstrate a language that satisfies the requirements of system-level languages by having a similar (mostly compatible) resulting code, execution and memory management model as C or C++, while supporting functional programming paradigms, type systems, syntax, and many practical programming constructions from Haskell and similar languages.

Major challenges. Such a combination may easily lead to a direct contradiction, as phenomena common in functional languages imply the need either for garbage collection or for other, possibly even more complicated run-time processing:
• Variables of recursive types (like lists and trees), which form a highly valued building block of functional programming, are quite difficult to implement efficiently without some kind of automatic deallocation decision. A common example of such a decision may be seen in the automatic handling of two singly-linked list objects that share a common ‘tail’.

• A quite common technique in functional programming — generating functions at runtime by partial application and passing them around as first-class objects — is impossible in minimalist system-level conditions, as any code generator (or, equivalently, a compiler) can not be a part of the language runtime. A similar problem arises with code that implies the need for closures, which, if it can not be removed by inlining at compile time, is usually supported by run-time allocation of closures in automatically managed memory.

• Arbitrarily deep recursion, a common method to run loops in functional programming, is usually supported by unbounded automatic allocation of a stack-like structure. In system-level programming, the programs and all recursion must fit into a standard, limited and unmanaged program stack segment.

Viability of a new language. The main motivation for creating a new language is to explore the possibilities that arise from the expressive power of a purely functional language applied to a full stack1 of code that runs on bare hardware. The benefits may include easier high-level optimization and simplified static analysis.

1 We indirectly refer to the valuable property of C-like languages, that all dependencies of C (esp. the standard library) can again be written only in C.

In particular, performing partial evaluation of the functional code is a computationally easy, well developed and very effective method of optimization [20]. The lack of side effects (or a correct embedding of side effects in a tangible construction) also simplifies derivation of the semantic meaning of the code by reducing the amount of variables that affect how the code is run.

Moreover, functional languages do not reflect any of the traditional paradigms that the system-level programming languages inherited: They are not designed specifically for register-based machines, nor uniprocessors, nor for sequential code execution2, not even for preservation of call conventions and code structures shared by program parts. We consider the lack of those properties a crucial benefit for the simplification of automatic code processing, significant both for further elimination of variables affecting the derivation of high-level optimization possibilities and, more notably, for automatic parallelization of code, as the compiler is not forced to perform a potentially sub-optimal decomposition of sequentially written code into parallelizable fibers.

2 Any code that touches a common system-level primitive directly is almost necessarily sequential.

Explicitly specified procedures and calls in block-structured code often represent some semantic value (like an API for module separation) that is unimportant or even harmful for the resulting program structure — the non-existence of any explicit code-structuring syntax leaves the choice of the call convention and the separation of the code into subroutines to the compiler, which may produce a better program while leaving the source semantically clean and readable.

We are further motivated by the recent high-performance results achieved by the language-centric functional programming approach, most notably the cache-oblivious memory allocation [2] or the generative approach to code parallelization [22].

1.1 Related research

We highlight several compilers that target a similar set of goals:

PreScheme — a language by Kelsey [15] based on a compiler that transforms a simplified version of the Scheme language to machine code. The approach chosen by the authors is to completely replace all recursive types in the runtime with vectors, and to forcibly inline all code, so that no code-generating β–reductions are present at runtime. As in other Scheme implementations, all evaluation is always eager and the sequential structure of the code is preserved.

Habit — a project of the Portland State University HASP group [10] that states a need for a similar system-level language. The most recent publications from the project discuss the language features and provide a clear specification for implementation. To the best of our knowledge, no implementation of Habit is available yet.

Rust — a relatively new language that exploits techniques similar to linear types to stay both very efficient with memory allocations and safe against programmer errors. The compilation method and execution model chosen by Rust is similar to ours, with the exception of the imperative Rust syntax and the fact that the Rust runtime needs a garbage collector to work with recursive types from its standard library.

Swift — a recent product of Apple that shows corporate interest in small, fast languages that reduce the memory-management overhead and include functional-programming improvements (notably pattern matching and a shifted view on classes, represented e.g. by Swift protocols and improved enumerated types).
An important theoretical result about evaluation methods, the fact that pure and eager functional languages can be shown inferior in terms of performance to languages that are either lazily evaluated or allow side-effects, was discussed by Bird, Jones and De Moor [1]. The low-level target environment of our language can not support lazy evaluation directly (although, in the code, the programmer can use lazy evaluation for any construction); we therefore must allow some well-contained amount of side effects.

We make extensive use of the knowledge that was gathered during the GHC development. The discussed compilation and optimization methods are usually based on the methods used to compile Haskell, as described for example in GHC Internals [13].

1.2 Approach

As a main goal we construct a simple language to demonstrate that there is no technical obstacle that would make system-level programming in a purely functional language impossible. To solve the aforementioned problems with the run-time dependency on automatic memory management, we make the following design choices:

• Recursive types are replaced by pointed types. We discuss the reasons and effects of this removal in section 2.3. As in C, management of all memory except the stack is done by the programmer (directly or indirectly using library code) through pointed types or constructions based on them.

• Deep functional recursion is allowed, but it is replaced with tail recursion at all tail-call positions. Such a trivial approach is used successfully in many compilers, including the GHC. See section 2.4 for details.

• Our main contribution is the method to run lazy evaluation without heap allocation of thunks. We store the thunks on the program stack and apply them to function bodies that were modified at compile time to expect them as arguments, passed to them by a standard calling convention. We describe a fast, deterministic inference-based algorithm that converts functional code to such an equivalently-behaving non-lazy form in section 2.2. Note that our solution works without any partial evaluation technique that is usually exploited for this purpose, but maintains the code in a form that is still able to be optimized and transformed by standard inlining algorithms.

For demonstration purposes, we only provide a simple type system (Hindley-Milner with several extensions) and rely on the LLVM compiler framework to generate platform-specific code. Results are presented in section 3.

Although there is currently no guarantee that the resulting programming language will be practical, we believe that low-level programming languages with high levels of abstraction are currently very favorable3 and current development is producing a lot of small languages inspired by functional programming that target low-level goals (e.g. the Nim language), in which our resulting language could fit easily. Still, its main purpose is to serve as a future test-bed for optimization and automatic parallelization techniques.

3 This knowledge was gained by looking at the statistical distribution of programming languages used in software packages installed on an average Debian system. [19]

2 Language internals

The language is constructed similarly to other pure functional languages. We tightly follow the standard definition structure and functional syntax known from Haskell, with some simplifications (e.g. the syntax for type classes and related constructions is unnecessary for our purpose).

Upon compilation, the code is type-checked, rewritten to a non-polymorphic non-lazy equivalent, functions are lifted to form top-level blocks, and partially evaluated to a certain extent for optimization.

The testing compiler finally emits a pack of LLVM intermediate code with several functions (e.g. main) exported to serve as entry points. The LLVM framework is then used to compile the almost-machine LLVM code to an object file (or executable) of the target platform. Using LLVM as “assembly” allows us to simplify the compiler in three ways: we do not have to care about register allocation, spilling and raw stack operations; the program can be easily linked with any code the LLVM bytecode is able to link with (notably any library that follows the C calling conventions, including the C standard library); and we can use the vast library of already-available low-level optimizations that can be run on LLVM bytecode.
2.1 Evaluation

In our case, the compilation of function evaluation is principally similar to that used in GHC — the function definitions that were optimized and lifted to top-level are compiled to form code blocks that obey a machine-level calling convention, and the machine code is generated for them. In contrast to GHC, our resulting code can only use a set of primitives applicable to a system-level environment; notably it can not implicitly access any memory other than on the program stack. The two cases when GHC uses such allocation are the handling of boxed or recursive types (which we disallowed) and the allocation of thunks for lazy evaluation, which must be worked around.

Lazy evaluation. Instead of allocation in the STG4 data structure, we will use a structure that is of predictable size and stored completely on the stack. The only problematic part of such a static thunk is its “meaning”, or, technically, a description of the function that will evaluate the thunk. As the concept of function objects has no straightforward assembly-level representation, we replace it by a code address of a previously-prepared compiled function (or simply a “function pointer” to code generated at compile time) that is able to compute the actual result of the thunk. The rest of the static thunk consists of a tuple of the argument values that are expected by the pointed function on evaluation.

4 Spineless Tagless G-Machine, the structure used to store all implicitly allocated data of programs compiled by GHC.

In the resulting situation, the compiler must solve the following new tasks:

• It has to ensure that the code is ready for passing the thunks around as function arguments or return values instead of simple values.

• It has to prepare thunk-evaluating functions for all occurrences of thunks in the code.

For example, consider this simple functional code:

    f a = (+) a
    g a b = a b
    h a = g (f a) a

We will ignore the fact that any reasonable compiler would choose inlining with a far better result, and generate static thunks for demonstration. We progress as follows:

1. We will first derive the types of the code. Given the initial basis Γ ∋ (+ : N → N → N), usual type inference would output the following type assignments for the code: f : N → N → N, g : N → N → N → N and h : N → N.

   Instead, we reflect the need to see which functions must be called lazily and modify the type system to allow expressing it. Specifically, the typing inferred for f is f : N → thunk({N}, N → N). The type expression means that f returns a thunk that can be used as a function of type N → N and already carries one argument of type N that will be passed to the evaluating function. We will show how to derive such information later.

2. From that, the compiler sees that f must be called lazily, and creates an eagerly evaluable function that can be referenced from the thunk:

    f_eager t1 t2 = (+) t1 t2

3. It correspondingly translates the “inner” call (f a) in h to a thunk represented as a tuple ⟨address_of(f_eager), a⟩.

4. To translate the outer call, a new version of g, called g_eager, that accepts a tuple in the above form as a second argument, is generated, and h is rewritten:

    g_eager (fptr, a) t3 = fptr a t3
    h a = g_eager (f_eager, a) a

The resulting code now does not contain any lazy evaluation.
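To make the rewrite tangible, the following is a minimal sketch in plain GHC Haskell (not in the described language) that mimics the transformed program: the static thunk is modelled as an ordinary pair of the evaluating function and the captured argument, whereas in the real compiler the first component is a function pointer chosen at compile time and the pair lives on the stack.

    fEager :: Int -> Int -> Int
    fEager t1 t2 = t1 + t2

    -- g_eager receives the "thunk" as a pair and applies its evaluator.
    gEager :: (Int -> Int -> Int, Int) -> Int -> Int
    gEager (fptr, a) t3 = fptr a t3

    h :: Int -> Int
    h a = gEager (fEager, a) a

    main :: IO ()
    main = print (h 21)   -- prints 42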

Overhead of static thunks. Compared to code inlining that would trivially solve the aforementioned example, static thunks may potentially present significant runtime overhead arising from the necessity to transfer function pointers and (possibly bulky) thunk data through the stack structures (such behavior may manifest as the appearance of functions with surprisingly many arguments), and from the possibility that the thunk may get evaluated more than once. In our view, the first case is a counterweight to a similar situation present with the inlining process — inlining may produce a program that runs faster, but usually for the price of code size. Allocating thunks produces possibly smaller code (because the compiler is not forced to inline all partial applications), but the required data transfers and indirections add overhead and reduce execution efficiency.

Compared to thunks allocated on the heap, our approach profits from the simplicity of stack allocation, which is usually handled by a one-instruction modification of a CPU register. General-purpose heap allocators may require hundreds of instructions and possibly produce several CPU cache misses.

Strictness. A strict method of evaluation is a common optimization in compilers of lazy functional languages, as it is usually cheaper to actually evaluate the result than to allocate the thunk and possibly re-evaluate it on each usage5. There are practical methods that automatically determine which results should be evaluated strictly, including the one from Haskell [12] or OCaml [24].

5 With the exception of popular counterexamples, including long lists that get only the first few items instantiated.

Our approach is simpler — because the need to produce specialized code with additional data transfers may impose significant performance overhead, we consider all calls strict by default, allowing lazy evaluation only when needed (i.e. where inlining was unable to remove it) or on the programmer’s choice.

Effect of strict evaluation on correctness. The common argument against strict evaluation — that it may prevent the program from halting — holds here. Moreover, our work with pointed types may cause a program crash if the statements are not evaluated in exact order (e.g. a null-pointer check followed by a dereference is a common construction, but may get evaluated in a reverse order if both parts were arguments of one function). We argue that our environment is not affected by those kinds of errors — all potentially harmful side effects (especially the out-of-stack memory operations) are contained in a monad environment, and we provide syntax for marking lazy code that the programmer can use to avoid possible infinite loops of eager evaluation.
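As a small illustration of the check-then-dereference construction mentioned above, the sketch below is plain GHC Haskell using the standard Foreign modules (safeRead is an illustrative name, not a primitive of the described language); sequencing both steps in the IO monad is what keeps the dereference from being evaluated before the test, even under strict-by-default evaluation.

    import Foreign.Ptr      (Ptr, nullPtr)
    import Foreign.Storable (peek)

    -- Dereference only after the null check has succeeded.
    safeRead :: Ptr Int -> IO Int
    safeRead p =
      if p == nullPtr
        then return 0   -- never touch memory through a null pointer
        else peek p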
2.2 Type system

The language is typed statically by a Hindley-Milner-style type inference as described by Damas and Milner [5]. While we use several non-trivial practical extensions (notably the statically defined overloading in a manner similar to that of Kaes [14] and support for recursion without explicit handling of the fixpoint operator), we only describe the minimal extension of the system that informs the compiler about the requirements for removing lazy evaluation. See Figure 1 for the modified inference rules.

    (Var)   x : σ ∈ Γ                                ⟹  Γ ⊢ x : σ
    (Inst)  Γ ⊢ e : τ,  σ ∈ instancesΓ(τ)            ⟹  Γ ⊢ e : σ
    (Let)   Γ ⊢ e : τ,  Γ, x : close(τ) ⊢ y : σ      ⟹  Γ ⊢ let x = e in y : σ
    (Abs)   Γ, x_1 : σ_1, …, x_n : σ_n ⊢ e : τ       ⟹  Γ ⊢ (λx_1…x_n. e) : thunk({}, σ_1 → σ_2 → ⋯ → σ_n → τ)
    (App)   Γ ⊢ e : thunk(Φ, σ → ρ),  Γ ⊢ x : σ      ⟹  Γ ⊢ (e x) : thunk(Φ ∥ σ, ρ)
    (Eval)  Γ ⊢ e : thunk(Φ, ρ),  ρ ∈ R              ⟹  Γ ⊢ e : ρ

Figure 1: Typing rules based on the Hindley-Milner type system. The original abstraction and application rules are modified to augment the simple function types with information about all possible thunk-evaluating functions.

In the figure, the functions instances and close are used exactly for the purposes of the original system — close introduces the ∀-polymorphism (to create a polytype), and instancesΓ removes it, by describing a set of new possible monotype instances with fresh type variables free in the basis Γ.

The information stored in a thunk covers

• the actual type of the function the thunk represents at the current place,

• one list of argument types for each thunk-evaluating function that must be able to get the arguments required for its evaluation from this thunk.

Syntactically, the situation is described as follows. The structure

    thunk({(τ1, τ2, …, τn), (σ1, …), …}, ρ)

describes a thunk that returns the (possibly functional) type ρ, and stores enough arguments to be evaluated either by a function of type τ1 → τ2 → ⋯ → τn → ρ, or σ1 → ⋯ → ρ, etc. Argument lists can be empty; in that case we write them as (). The operation Φ ∥ σ in the figure produces a new set of argument lists, containing all lists from Φ with σ appended on the end.

Such a set-based construction is necessary to overcome the possible incompatibility of thunks that can arise from code similar to this:

    succ :: N -> N
    if :: Boolean -> a -> a -> a
    f = succ
    g x = (+) x
    h a b = if a f (g b)

Because the underlying code can not prepare the result of h for evaluation by the derived thunk-evaluating functions of either f or g, it is necessary for the thunk to be universal. Its universal type is decided as thunk({(), (N)}, N → N). Consecutive application of a value of type N to the result of h produces thunk({(N), (N, N)}, N), which can be readily evaluated — the argument lists exactly correspond to the argument lists of succ and (+). Note that it is not necessary to mark which argument list is going to be used on evaluation, as that information is also (implicitly) present in the code of the associated pointed function.

Unification of the thunk descriptions, required for the inference system to work, is done structurally in the type part, and by set union in the argument lists. In our implementation, set union is handled by working with the set contents as with a separate object external to the actual unification process, storing only a variable-like reference. Technical details are omitted.

There is a chance for non-determinism introduced in the rules Abs and Eval, as it is not clear for the compiler whether e.g. the thunk should be evaluated on the spot or passed down unmodified; or whether successive λ-abstractions should be converted to a single thunk or to multiple smaller thunks that would eventually get evaluated successively. For our purpose, we use early evaluation of every thunk in the Eval case by completely disallowing already-evaluable thunks, and the all-at-once thunk construction in the Abs case. Such an approach satisfies our requirements on the compiler; other alternatives were not considered as they bring “additional laziness” to the evaluation, and measurement of their effect is not in the scope of this paper.

Thunk storage. Given a deterministic storage method, the size required for the thunk structure can be computed at compile time for any thunk type. Thunks can therefore be passed around as traditional function arguments and return values without any need for dynamic memory management.

The exact order in which the arguments are stored in the thunk is completely arbitrary, as long as the basic operations (addition of a new argument, set unification and extraction of any complete argument list) stay deterministic. An easily implementable example is a tuple with just enough fields of every type so that each argument list can fit in, stored in tail-aligned order to allow easy addition of a newly applied argument to the list tails.
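The tail-aligned layout can be sketched in plain GHC Haskell for the universal thunk thunk({(N), (N, N)}, N) from the example above; the record type and its field names are assumptions made for illustration, and in the real compiler the evaluator field is a code address selected at compile time.

    data UThunk = UThunk
      { evalFn :: (Int, Int) -> Int   -- compiled thunk-evaluating function
      , args   :: (Int, Int)          -- tail-aligned argument storage
      }

    -- succ reads only the last (most recently applied) field, (+) reads both,
    -- so no extra tag is needed to tell which argument list is in use.
    succEval, plusEval :: (Int, Int) -> Int
    succEval (_, x) = x + 1
    plusEval (x, y) = x + y

    force :: UThunk -> Int
    force t = evalFn t (args t)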
Function specialization. After inference, function bodies specialized for thunk processing are created by a very gratifying side effect of the overloading resolution algorithm: in the same way in which e.g. an arithmetic function is specialized to support both integer and floating-point arguments, it can also be specialized to support thunk values. The difference in the code is then created at the place of the “invisible” application operator, instead of the more traditional place of an arithmetical operator.

2.3 Recursive types

Recursive types, such as auto-allocated lists and trees, form a basic building block of many current functional languages. Their removal in our language is justified by the addition of pointed (“reference”) types that, like in other languages, are able to replicate the functionality to a practical extent. While other solutions for replacing the need for automatic deallocation exist (like the linear type systems), our choice is supported by the following considerations:

• Almost all primitives used by a system-programming language require some computation with memory addresses (e.g. the system calls).

• Usage of first-class references is a fairly standard method to define complicated data structures.6

• Inclusion of pointers does not prohibit the possibility of a future addition of automatic memory management by the programmer, just as with garbage collectors for C/C++ [3].

• All memory operations can be contained in a monad to produce a pure language. This can be further exploited to provide some automatic safety of memory operations, but forces the programmer to use complicated syntax for trivially-looking tasks.

6 Moreover, naive data structure implementations in pure languages suffer from a fatal inefficiency resulting from the need to constantly reallocate immutable data. In those cases, non-trivial constructions such as the zipper [11] are required to produce effective code.

As we are not aware of any officially recognized syntax that would allow to directly use pointers in a purely functional language, we reuse the sizeOf, peek and poke primitives from the Haskell FFI library [4], which shares a similar set of goals. Allocation and deallocation functions are linked from the standard C library, having type signatures alloc :: IO (Ptr a) and free :: Ptr a -> IO ().7

7 alloc can decide about the allocation size from previous type inference.
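For comparison, the same programming style is available in ordinary GHC Haskell through the standard Foreign modules; in the sketch below, malloc and free from Foreign.Marshal.Alloc stand in for the alloc and free primitives that the described language links from the C library.

    import Foreign.Marshal.Alloc (malloc, free)
    import Foreign.Ptr           (Ptr)
    import Foreign.Storable      (peek, poke)

    main :: IO ()
    main = do
      p <- malloc :: IO (Ptr Int)   -- corresponds to alloc :: IO (Ptr a)
      poke p 42                     -- write through the pointer
      v <- peek p                   -- read it back
      print v
      free p                        -- manual deallocation, as in C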

2.4 Recursion

The common model for transforming functional recursion to loops using tail calls fits our scheme with one exception — unbounded, non-tail recursion along a data structure will cause a stack overflow much earlier than in a common functional language, where the “stack” structure is allocated dynamically on the heap and the actual system stack holds only rarely-expanding structures. In the case of Haskell, infinite non-tail recursion will cause a regular memory depletion error; stack overflows can happen only on evaluation of deeply-nested thunk values.8

8 The nice example given on the Haskell Wiki [9] exploits the properties of the lazy implementation of scanl: the code print $ last $ scanl (+) 0 [0..1000000] is killed in older GHC implementations as the stack gets filled by the back-references to unevaluated list elements.

The risk of unwanted stack overflows can be mitigated syntactically: whenever the compiler would emit code that calls some function recursively using a non-tail call, i.e. when there is a loop in the call graph that contains at least one non-tail call edge, it may abort the compilation and require the programmer to syntactically acknowledge the acceptance of the risk (or the existence of some functionality that alleviates it) with a keyword at the call site.
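A minimal illustration of the tail-call requirement, written in plain GHC Haskell (the described language has no recursive list types, so the list here only stands in for any recursive traversal): the first definition needs stack space proportional to the input and would require the acknowledgement keyword, while the accumulator version ends in a tail call that can be compiled to a loop.

    sumList :: [Int] -> Int
    sumList []     = 0
    sumList (x:xs) = x + sumList xs          -- non-tail: (+) runs after the call returns

    sumAcc :: Int -> [Int] -> Int
    sumAcc acc []     = acc
    sumAcc acc (x:xs) = sumAcc (acc + x) xs  -- tail call: a loop after compilation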
3 Implementation and results

Our demonstrational compiler is implemented as an expansion of the tlc compiler written in C++ [17]. We measure its performance in two ways: first, to measure the general performance of our approach, we compare the speed of simple implementations of two algorithms that solve relatively common problems with other compilers. Next, the performance overhead of a synthetic case of static thunk usage is tested on a small program that runs an equivalent of the standard Haskell foldl function on a list structure.

The exact test problems for the first comparison are:

• insertion of 2²⁰ pseudo-random elements into a binary search tree, and

• 2²⁸ rounds of the TEA cipher [23].

We have created simple implementations of both test algorithms in C, Haskell and in our testing language, and compared the execution speed of the code produced by each compiler. The measurements are shown in Figure 2.

Figure 2: Performance results for the test programs, lower is better. Values are average run times in seconds over 100 runs. (Bar chart comparing the tlc, C and Haskell implementations on the TEA and BinTree benchmarks.)

Results show almost identical running times for C and our language, and some overhead for the code produced by the Haskell compiler. The tiny speedup over C++ in one case was determined to be a product of complete inlining of the functional code; it was no longer measurable after forcibly inlining the C++ code by hand to a form with a similar structure as the assembly produced by tlc.

Next, the thunk overhead was measured on the foldl function used to sum elements in a prepared linked-list structure with pseudo-random numbers. The function was run repeatedly on a list that fitted in the CPU L3 cache. A fully inlined, specialized foldl was able to process one list element on average in around 4.152 ns; a foldl that was passed a thunk containing the adding function took 4.548 ns. Several (less than 10) extra 8-byte arguments passed with the thunk increased the processing time of one element by around 0.1 ns each.

We consider the indirection overhead of less than 10% in a pathological stress-test a good result, and do not expect real applications to suffer serious performance problems. The overhead of passing reasonably-sized thunk contents around seems similarly inconsequential.

All tests were measured on an Intel® Core™ i5-4200M CPU at 2.50 GHz; the used C compiler was the standard Debian GCC of version 5.3.1-19, the Haskell compiler was GHC version 7.10.3.

3.1 Drawbacks and unsolved problems

The main drawbacks of the new language originate in the new combination of system-level problems with purely functional syntax. Suitable solutions for some problems listed below are not yet established, and are slightly more interesting as a question of language design than of actual compiler functionality. We present the most interesting problems as open questions:

Destructors. Finding a good spot for the automatic release of resources held by local variables is not very straightforward in a functional language, as the concept of imperative code blocks that correspond to variable validity in C-like languages is not extensible to the monad environment.

A related approach that merely prevents the programmer from forgetting to deallocate a structure (thus creating a memory leak) is the bracketing pattern, known for example from Python as the with construction, or as bracket from the Haskell Control.Exception library. For our purpose, bracketing is composable with monads [8] to form an illusion of variable-holding program blocks that resemble the well-accepted syntax from procedural programming.
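As a reminder of what the bracketing pattern looks like on the Haskell side, here is a small self-contained sketch in plain GHC Haskell (withCell is an illustrative helper, not part of the described language): the release action is guaranteed to run even when the body fails, which removes the forgotten-deallocation class of errors.

    import Control.Exception     (bracket)
    import Foreign.Marshal.Alloc (malloc, free)
    import Foreign.Ptr           (Ptr)
    import Foreign.Storable      (peek, poke)

    -- bracket acquire release use: 'free' runs after the body, even on exceptions.
    withCell :: (Ptr Int -> IO a) -> IO a
    withCell = bracket (malloc :: IO (Ptr Int)) free

    main :: IO ()
    main = withCell $ \p -> do
      poke p 7
      peek p >>= print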

Similar problems arise with temporary data structures required for sub-results of certain operations: for instance, the multiplication of three matrix objects (a*b*c) usually implicitly allocates space for the sub-result (a*b). In a functional environment, * has to be replaced by some monadic construction to allow access to the impure primitives (e.g. heap memory operations) required to allocate the space for sub-results.

Pointers to stack. A common method to allocate and deallocate a small structure needed for communication with a system- or a library-call is to allocate it dynamically on the stack and only pass a pointer; a great example of such a structure is struct timeval. In a functional language, careless usage of such a pointer may lead to a program crash — because the stack management does not necessarily correspond to variable scope as in C-like languages, no dependency that would hold the structure in place until it is needed is implicitly materialized, and the pointer may easily become dangling.

List processing. Easy list traversal and recursive structure comprehension is one of the main features of functional programming languages. Similar functionality may also exist in a low-level language that forbids the (required) recursive types: pattern matching on list structures, as known from Haskell, can also be applied to pointers, as sketched below. The common functional allocation of list structures (e.g. the well-known function pattern that returns a list tail with a new prepended head) can be either replaced by allocation-free generators similar to Data.Traversable, or redesigned to work on STL-like list structures that are implicitly allocated by a well-hidden language construction.
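The sketch below, in plain GHC Haskell, shows the pointer-based shape of such a traversal; the node layout (an Int value assumed to occupy 8 bytes, followed immediately by the pointer to the next node) and the helper name are assumptions made only for this example.

    import Foreign.Ptr      (Ptr, nullPtr, plusPtr)
    import Foreign.Storable (peek)

    -- Sum a C-style singly linked list; a null pointer plays the role of [].
    sumNodes :: Ptr Int -> IO Int
    sumNodes p
      | p == nullPtr = return 0
      | otherwise    = do
          v    <- peek p                                 -- head value
          next <- peek (p `plusPtr` 8 :: Ptr (Ptr Int))  -- pointer to the tail node
          rest <- sumNodes next
          return (v + rest)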

4 Conclusion

We have presented a new method to compile a pure functional language to low-level code suitable for system programming. The main improvement, described in section 2.2, is the inference algorithm that allows to transform lazily-evaluated functional code to a form that does not require automatic memory management for the run-time allocation of thunks. The resulting compiler is comparably simple, fast, and produces code that does not require any run-time support, shows no significant performance drawbacks when compared to code generated by current C compilers, and allows basic usage of high-level constructions known from current functional languages.

While the language is not yet very pleasant to work with, mainly because of the drawbacks that are mentioned in section 3.1 and the lack of a well-developed standard library, the code of the test programs is concise, shows no unnecessary complexity, and is readily comprehensible by any functional-aware programmer. Moreover, as described in section 2, the language allows direct linking with many existing libraries through standardized calling conventions, which may be easily exploited for further applications.

4.1 Future research

Apart from the demonstration of the original goal, the new language opens possibilities to apply high-level optimization methods to programs that intend to run on bare hardware, possibly yielding better results than the optimizers of Haskell, which are encased in a pre-defined scheme of memory management, or the optimizers of C++, which are constrained by aliasing problems arising from the impurity of the language.

An example of approaches that aggressively modify and restructure the program is the lambda-dropping work of Danvy and Schultz [6], which shows the beneficial impact of a possibly non-deterministic combination of argument dropping, lifting and inlining. Similarly, the purity of the code allows for a more efficient elimination of common sub-expressions and repeated code. The practical impact of both techniques is yet to be measured.

Newly added program control possibilities, mainly the memory-related primitives described in section 2.3, create many new chances for programmer errors. While current type checkers can discover many errors on compilation, more complicated type systems or verification approaches could be able to check e.g. actual memory safety or related constraint satisfaction. We hope to implement and test the ideas from the Sage language [16], which allows automatic addition of runtime checks for constraints that the compiler was not able to satisfy by static checking; or some of the work of Roorda [21], which allows to use the quite powerful concept of pure type systems in a practical environment.

References

[1] Richard Bird, Geraint Jones, and Oege De Moor. More haste, less speed: lazy versus eager evaluation. Journal of Functional Programming, 7(05):541–547, 1997.

[2] Guy E. Blelloch and Robert Harper. Cache efficient functional algorithms. Communications of the ACM, 58(7):101–108, 2015.

[3] Hans Boehm, Alan Demers, and Mark Weiser. A garbage collector for C and C++, 2002.

[4] Manuel M. T. Chakravarty, Sigbjorn Finne, F. Henderson, Marcin Kowalczyk, Daan Leijen, Simon Marlow, Erik Meijer, Sven Panne, S. Peyton Jones, Alastair Reid, et al. The Haskell 98 foreign function interface 1.0, 2002. Available online: http://www.cse.unsw.edu.au/~chak/haskell/ffi.

[5] Luis Damas and Robin Milner. Principal type-schemes for functional programs. In Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 207–212. ACM, 1982.

[6] Olivier Danvy and Ulrik P. Schultz. Lambda-dropping: transforming recursive equations into programs with block structure, volume 32. ACM, 1997.

[7] Damien Doligez and Xavier Leroy. A concurrent, generational garbage collector for a multithreaded implementation of ML. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 113–123. ACM, 1993.

[8] Cale Gibbard. Bracket pattern. https://wiki.haskell.org/Bracket_pattern, 2008. Accessed: 2016-05-10.

[9] Stack overflow – Haskell Wiki. https://wiki.haskell.org/Stack_overflow. Accessed: 2015-12-21.

[10] The High Assurance Systems Programming Project (HASP). The Habit Programming Language: The Revised Preliminary Report. Department of Computer Science, Portland State University, Portland, Oregon 97207, USA, November 2010.

[11] Gérard Huet. The zipper. Journal of Functional Programming, 7(05):549–554, 1997.

[12] Kristian Damm Jensen, Peter Hjæresen, and Mads Rosendahl. Efficient strictness analysis of Haskell. In Static Analysis, pages 346–362. Springer, 1994.

[13] S. L. Peyton Jones, Cordy Hall, Kevin Hammond, Will Partain, and Philip Wadler. The Glasgow Haskell Compiler: a technical overview. In Proc. UK Joint Framework for Information Technology (JFIT) Technical Conference, volume 93, 1993.

[14] Stefan Kaes. Parametric overloading in polymorphic programming languages. In ESOP '88, pages 131–144. Springer, 1988.

[15] Richard A. Kelsey. Pre-Scheme: A Scheme dialect for systems programming, 1997.

[16] Kenneth Knowles, Aaron Tomb, Jessica Gronski, Stephen N. Freund, and Cormac Flanagan. SAGE: Unified hybrid checking for first-class types, general refinement types, and dynamic (extended report), 2006.

[17] Miroslav Kratochvíl. Low-level functional programming language. Master's thesis, Charles University in Prague, 2015.

[18] Xavier Leroy. Unboxed objects and polymorphic typing. In Proceedings of the 19th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 177–188. ACM, 1992.

[19] Avery Pennarun, Bill Allombert, and Petter Reinholdtsen. Debian popularity contest, 2012.

[20] Simon Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell Compiler inliner. Journal of Functional Programming, 12(4-5):393–434, 2002.

[21] J.-W. Roorda and J. T. Jeuring. Pure type systems for functional programming. 2007.

[22] Michel Steuwer, Christian Fensch, Sam Lindley, and Christophe Dubach. Generating performance portable code using rewrite rules. 2015.

[23] David J. Wheeler and Roger M. Needham. TEA, a tiny encryption algorithm. In Fast Software Encryption, pages 363–366. Springer, 1994.

[24] Hirofumi Yokouchi. Strictness analysis algorithms based on an inequality system for lazy types. In Functional and Logic Programming, pages 255–271. Springer, 2008.