Explicit and Controllable Assignment Semantics
Dimitri Racordon
University of Geneva
Centre Universitaire d’Informatique
Switzerland
[email protected]

Didier Buchs
University of Geneva
Centre Universitaire d’Informatique
Switzerland
[email protected]

arXiv:1907.11317v1 [cs.PL] 25 Jul 2019

Abstract

Despite the plethora of powerful software to spot bugs, identify performance bottlenecks or simply improve the overall quality of code, programming languages remain the first and most important tool of a developer. Therefore, appropriate abstractions, unambiguous syntaxes and intuitive semantics are paramount to convey intent concisely and efficiently. The continuing growth in computing power has allowed modern compilers and runtime systems to handle features once thought unrealistic, such as type inference and reflection, making code simpler to write and clearer to read. Unfortunately, relics of the underlying memory model still transpire in most programming languages, leading to confusing assignment semantics.

This paper introduces Anzen, a programming language that aims to make assignments easier to understand and manipulate. The language offers three assignment operators, with unequivocal semantics, that can reproduce all common imperative assignment strategies. It is accompanied by a type system based on type capabilities to enforce uniqueness and immutability. We present Anzen’s features informally and formalize them by means of a minimal calculus.

CCS Concepts • Software and its engineering → Imperative languages; Language features; Formal language definitions; Compilers;

Keywords imperative languages, assignment, aliasing, uniqueness, immutability

ACM Reference Format:
Dimitri Racordon and Didier Buchs. 2019. Explicit and Controllable Assignment Semantics. In Proceedings of Managed Programming Languages and Runtimes (MPLR’2019). ACM, New York, NY, USA, 13 pages.

MPLR’2019, October 20–25, 2019, Athens, Greece

1 Introduction

Writing software programs is a difficult task. Moreover, with the dissemination of computer-based technology in all domains, this endeavor is no longer reserved for computer experts. Consequently, good language design is essential to the production and maintenance of software. Thanks to the continuing growth in computing power, contemporary programming languages can sit at very high levels of abstraction, letting developers convey their intent with great expressiveness. Unfortunately, this evolution also brings the potential for more inadvertent behaviors, as the small imperfections in these many layers of abstraction usually have deep consequences on a language’s semantics. This is particularly true for the imperative paradigm, in which the interplay between variables and memory can prove to be a prolific source of puzzling bugs. For instance, most mainstream object-oriented programming languages have limited support for expressing aliasing explicitly, blurring the distinction between object mutation and reference reassignment, and often leading to erroneous assumptions about sharing and immutability.

This paper introduces Anzen, a multi-paradigm programming language that aims to dispel the confusion that surrounds assignment semantics. Rather than overloading a single assignment operator with different meanings, depending on its operands’ types, Anzen provides three distinct operators with unequivocal semantics. One creates aliases, another performs deep copies and the last deals with uniqueness. The language further supports high-level concepts, such as generic types and higher-order functions, and offers capability-based [4] memory and aliasing control mechanisms that are enforced by its runtime. We present Anzen’s features informally, by means of various examples, and describe its operational semantics formally. A compiler and interpreter for Anzen are distributed as open-source software, available at https://github.com/anzen-lang/anzen.

2 Background and Problematic

In the imperative paradigm, computation is described by sequences of instructions that define how a program operates. These instructions interact with the state of the program, whose modifications are expressed by memory assignments. A memory assignment is a statement of the form “x ≔ v” that instructs the machine to write (i.e. assign) a value to a specific location in the memory. Here, x is a variable representing a memory address and v is a value. Programming languages can express vastly different data types, ranging from numbers to more abstract data structures (e.g. a binary tree). For example, v could denote a number and be stored in memory as a simple sequence of bits, but could also denote a dynamic list represented by multiple structures chained together. It follows that the actual semantics of “≔” has to deal with data representation and properties thereof.

A language only describes programs, whereas their executions are driven by a runtime system that is responsible for processing each instruction. This means that the details of data representation can be elided at the language level, as long as sufficient information can be derived by the runtime system to actually operate the machine’s memory. Most systems have evolved to divide their memory into a stack, to store local (w.r.t. a function call), small and fixed-sized values, and a heap, to store long-lived and arbitrarily-sized data. A variable is merely a name denoting an address in the stack, meaning that it cannot directly refer to heap-allocated objects. However, addresses thereof are essentially represented as numbers, and therefore do fit into the stack. This means that heap-allocated values are always manipulated through pointers. Modern programming languages (e.g. Python or JavaScript) are able to hide these implementation constraints so that no distinction has to be made between stack-allocated and heap-allocated objects. Thus, a variable can be seen as a container for a value of some kind, without any regard for where this value is stored.

    Stack
    Address  Variable  Content
    0xd4     y         0x2a
    0xd8     x         42
    0xdc     b         0x2a
    0xe0     a         42

    Heap
    The tree that the pointer 0x2a refers to.

Figure 1. Effect of bitwise copy assignment. Two local variables x and y are assigned to two other local variables a and b, respectively. The variable a represents a number allocated on the stack. It is copied during x’s assignment, so x represents a different object. On the other hand, since b represents a tree allocated on the heap, its pointer is copied during y’s assignment, resulting in an alias.

Unfortunately, most execution models do not preserve this abstraction when dealing with assignments, which are implemented as a straightforward “bitwise copy” of the stack-allocated value a variable represents. Consequently, values stored on the stack get fully copied, whereas values stored on the heap and referred to by a pointer get aliased, since only their pointers’ values are copied. Figure 1 illustrates both cases. Although this strategy is efficient and usually matches the developer’s intent, it may lead to confusing situations, in particular when mutation is involved. Consider the two Python programs below. On the left, the variable a’s value is copied. Therefore its mutation at line 3 does not impact x’s value. Conversely, on the right, the variable a represents a heap-allocated dynamic array, whose pointer is copied at line 2. Consequently, the mutation at line 3 also impacts the value that x represents.

    1  a = 4              1  a = [4]
    2  x = a              2  x = a
    3  a += 2             3  a += [2]
    4  print(a, x)        4  print(a, x)
    5  # 6 4              5  # [4, 2] [4, 2]

Another assignment strategy that has gained popularity in recent years is to “move” values from one variable to another, thereby avoiding any copy or alias creation. The advantages of this so-called move semantics are twofold. On the performance front, it allows efficient return-by-value implementations by eliding unnecessary copies [1]. On the memory safety front, it preserves uniqueness, thereby guaranteeing data-race safety [31]. For example, the Rust programming language adopts this approach as its default assignment semantics. However, since the language can fall back to copy semantics for certain data types, it also bears the potential for similarly confusing situations.

We argue that the problem does not reside in the support of multiple assignment strategies in the same programming language. On the contrary, these strategies let developers express intent better and offer tighter control over memory. Instead, the confusion stems from overloading a single operator with different semantics, defined implicitly by the type of its operands, in order to cater for the abstraction of pointers. This comes with several important drawbacks:
• This steepens the learning curve for beginners, as aliasing consistently appears as a difficult aspect of programming [27].
• This increases the risk of errors in code refactoring, as seemingly identical statements can have different side-effects.
• This hinders the control over pointers, as purposely creating aliases on stack-allocated values often requires wrapping them in heap-allocated objects, thereby reimplementing the abstraction.
A better approach would be to support these different strategies explicitly, in order to let the developer choose the most appropriate one and express her intent unambiguously.

3 The Anzen Programming Language

Anzen is a typed general-purpose language that aims to make assignment semantics explicit. Although it supports multiple paradigms, Anzen leans towards object orientation and