Native Implementation of Mutable Value Semantics ICOOOLPS ’21, July 13, 2021, Online
Total Page:16
File Type:pdf, Size:1020Kb
Native Implementation of Mutable Value Semantics Dimitri Racordon Denys Shabalin Daniel Zheng [email protected] [email protected] [email protected] Faculty of Science Google Google University of Geneva Switzerland United States Switzerland Dave Abrahams Brennan Saeta [email protected] [email protected] Google Google United States United States ABSTRACT 1 INTRODUCTION Unrestricted mutation of shared state is a source of many well- Software development continuously grows in complexity, as ap- known problems. The predominant safe solutions are pure func- plications get larger and hardware more sophisticated. One well- tional programming, which bans mutation outright, and flow sen- established principle to tackle this challenge is local reasoning, de- sitive type systems, which depend on sophisticated typing rules. scrbibed by O’Hearn et al. [4] as follows: Mutable value semantics is a third approach that bans sharing in- To understand how a program works, it should be pos- stead of mutation, thereby supporting part-wise in-place mutation sible for reasoning and specification to be confined to and local reasoning, while maintaining a simple type system. In the cells that the program actually accesses. The value the purest form of mutable value semantics, references are second- of any other cell will automatically remain unchanged. class: they are only created implicitly, at function boundaries, and cannot be stored in variables or object fields. Hence, variables can There are two common ways to uphold local reasoning. One never share mutable state. takes inspiration from pure functional languages and immutability. Because references are often regarded as an indispensable tool Unfortunately, this paradigm may fail to capture the programmer’s to write efficient programs, it is legitimate to wonder whether such mental model, or prove ill-suited to express and optimize some al- a discipline can compete other approaches. As a basis for answer- gorithms [5], due to the inability to express in-place mutation. ing that question, we demonstrate how a language featuring muta- Another approach aims to tame aliasing. Newer programming ble value semantics can be compiled to efficient native code. This languages have successfully blended ideas from ownership types [1], approach relies on stack allocation for static garbage collection and type capabilities [2], and region-based memory management [9] leverages runtime knowledge to sidestep unnecessary copies. into flow-sensitive type systems, offering greater expressiveness and giving more freedom to write efficient implementations. Un- CCS CONCEPTS fortunately, these approaches have complexity costs that signifi- cantly raise the entry barrier for inexperienced developers [10]. • Software and its engineering → Source code generation; Mutable value semantics (MVS) offers a tradeoff that does not Runtime environments; Language features. add the complexity inherent to flow-sensitive type systems, yet preserves the ability to express in-place, part-wise mutation. It does KEYWORDS so treating references as a “second-class” concept. References are mutable value semantics, local reasoning, native compilation only created at function boundaries by the language implementa- arXiv:2106.12678v1 [cs.PL] 23 Jun 2021 tion, and only if the compiler can prove their uniqueness. Further, ACM Reference Format: they can neither be assigned to a variable nor stored in object fields. Dimitri Racordon, Denys Shabalin, Daniel Zheng, Dave Abrahams, and Bren- Hence, all values form disjoint topological trees, whose roots are nan Saeta. 2021. Native Implementation of Mutable Value Semantics. In assigned to the program’s variables. ICOOOLPS ’21, June 13, 2021, Online. ACM, New York, NY, USA, 4 pages. The reader may understandably worry about expressiveness and https://doi.org/10.1145/nnnnnnn.nnnnnnn efficiency, as references are often held as indispensable for both aspects. We note that a large body of software projects already ad- dress expressiveness concerns empirically, such as the Boost Graph Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed Library[7], leveraging MVS to elucidate recurring questions sur- for profit or commercial advantage and that copies bear this notice and the full cita- rounding equality, copies, and mutability, and develop generic data tion on the first page. Copyrights for components of this work owned by others than structures and algorithms [8]. ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission This paper focuses on the question of efficiency. We discuss an and/or a fee. Request permissions from [email protected]. approach for compiling languages featuring MVS to native code, ICOOOLPS ’21, July 13, 2021, Online relying on stack allocation for static garbage collection and using © 2021 Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 runtime information to elide unnecessary copies. We present it in https://doi.org/10.1145/nnnnnnn.nnnnnnn the context of a toy programming language, called MVSL, inspired ICOOOLPS ’21, July 13, 2021, Online Trovato and Tobin, et al. by Swift, for which we have written a compiler. Our implemen- Functions are declared with the keyword func followed by a list tation is available as an open-source project hosted on GitHub: of typed parameters, a codomain, and a body. Functions are anony- https://github.com/kyouko-taiga/mvs-calculus. mous but are first-class citizen values that can be assigned, passed as an argument, or returned from other functions. Arguments are 2 A QUICK TOUR OF MVSL evaluated eagerly and passed by copy. Further, functions are al- MVSL is a statically typed expression-oriented language, designed lowed to capture identifiers from their declaration environment. to illustrate the core principles of MVS. In MVSL, a program is a Such captures also result in copies and stored in the function’s clo- sequence of structure declarations, followed by a single expression sure, thus preserving value independence. denoting the entry point (i.e., the main function in a C program). 1 var foo: Int = 42 in var A variable is declared with the keyword followed by a name, 2 var fn: () -> Int { a type annotation, an initial value, and the expression in which it 3 foo = foo + 1 in foo is bound. A constant is declared similarly, with the keyword let. 4 } in 1 var foo: Int = 4 in 5 let bar = fn() in 2 let bar: Int = foo in bar 6 bar // foo is equal to 0 7 // bar is equal to 1 There are three built-in data types in the MVSL: Int for signed integer values, Float for floating-point values, and a generic type To implement part-wise in-place mutation across function bound- inout [T] for arrays of type T. In addition, the language supports two aries, values of parameters annotated can be mutated by the inout kinds of user-defined types: functions and structures. A structure callee. At an abstract level, an argument is copied when the is a heterogeneous data aggregate, composed of zero or more fields. function is called and copied back to its original location when the 1 inout Each field is typed explicitly and associated with a mutability qual- function returns. At a more operational level, an argument inout ifier (let or var) that denotes whether it is constant or mutable. is simply passed by reference. Of course, extends to multiple arguments, with one important restriction: overlapping mutations 1 struct Pair { are prohibited to prevent any writeback from being discarded. 2 var fs: Int ; var sn: Int 1 struct Pair { ... } in 3 } in 2 struct U {} in 4 var p: Pair = Pair(4, 2) in p 3 let swap: ( inout Int , inout Int) -> U Fields of a structurecan be of any type, but type definitions cannot 4 = (a: inout Int , b: inout Int) -> U { be mutually recursive. Hence, all values have a finite representa- 5 let tmp = a in tion. 6 a = b in All types have value semantics. Thus, all values form disjoint 7 b = tmp in U() topological trees rooted at variables or constants. Further, all as- 8 } in signments copy the right operand and never create aliases, depart- 9 var p = Pair(4, 2) in ing from the way aggregate data types typically behave in popular 10 _ = swap(&p.fs, &p.sn) object-oriented programming languages, such as Python or Java. 11 in p // p is equal to Pair(2, 4) struct in 1 Pair { ... } A more exhaustive specification of MVSL, as well as more elab- 2 var p: Pair = Pair(4, 2) in orate program examples, are available in the GitHub repository. 3 var q: Pair = p in 4 q.sn = 8 in 3 NATIVE IMPLEMENTATION 5 p // p is equal to Pair(4, 2) This section describes the strategy we implemented to compile 6 // q is equal to Pair(4, 8) MVSL to native code. Immutability applies transitively. All fields of a data aggregate 3.1 Memory representation assigned to a constant are also treated as immutable by the type Int Float system, regardless of their declaration. and are built-in numeric types that typically have a 1- to-1 correspondence with machine types. Since struct definitions 1 struct Pair { ... } in cannot be mutually recursive, all values of a structure have a finite 2 let p: Pair = Pair(4, 2) in memory representation (more on that later). Therefore, they can 3 p.sn = 8 in p // <- type error be represented as passive data structure (PDS), where each field is laid out in a contiguous fashion. Likewise, all elements of an array are constant if the array itself is In the absence of first-class references, it is fairly easy to iden- assigned to a constant.