Python: the Full Monty                      

Python: The Full Monty A Tested Semantics for the Python Programming Language Joe Gibbs Politz Alejandro Martinez Matthew Milano Providence, RI, USA La Plata, BA, Argentina Providence, RI, USA [email protected] [email protected] [email protected] Sumner Warren Daniel Patterson Junsong Li Providence, RI, USA Providence, RI, USA Beijing, China [email protected] [email protected] ljs.darkfi[email protected] Anand Chitipothu Shriram Krishnamurthi Bangalore, India Providence, RI, USA [email protected] [email protected] Abstract ity it now has several third-party tools, including analyzers We present a small-step operational semantics for the Python that check for various potential error patterns [2, 5, 11, 13]. programming language. We present both a core language It also features interactive development environments [1, 8, for Python, suitable for tools and proofs, and a translation 14] that offer a variety of features such as variable-renaming process for converting Python source to this core. We have refactorings and code completion. Unfortunately, these tools tested the composition of translation and evaluation of the are unsound: for instance, the simple eight-line program core for conformance with the primary Python implementa- shown in the appendix uses no “dynamic” features and con- tion, thereby giving confidence in the fidelity of the seman- fuses the variable renaming feature of these environments. tics. We briefly report on the engineering of these compo- The difficulty of reasoning about Python becomes even nents. Finally, we examine subtle aspects of the language, more pressing as the language is adopted in increasingly im- identifying scope as a pervasive concern that even impacts portant domains. For instance, the US Securities and Ex- features that might be considered orthogonal. change Commission has proposed using Python as an ex- ecutable specification of financial contracts [12], and it is Categories and Subject Descriptors J.3 [Life and Medical now being used to script new network paradigms [10]. Thus, Sciences]: Biology and Genetics it is vital to have a precise semantics available for analyzing Keywords serpents programs and proving properties about them. This paper presents a semantics for much of Python (sec- 1. Motivation and Contributions tion 5). To make the semantics tractable for tools and proofs, we divide it into two parts: a core language, λ , with a small The Python programming language is currently widely used π number of constructs, and a desugaring function that trans- in industry, science, and education. Because of its popular- lates source programs into the core.1 The core language is a mostly traditional stateful lambda-calculus augmented with Permission to make digital or hard copies of all or part of this work for personal or features to represent the essence of Python (such as method classroom use is granted without fee provided that copies are not made or distributed lookup order and primitive lists), and should thus be familiar for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM to its potential users. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a 1 fee. Request permissions from [email protected]. The term desugaring is evocative but slightly misleading, because ours is OOPSLA ’13, October 29-31 2013, Indianapolis, IN, USA. really a compiler to a slightly different language. Nevertheless, it is more Copyright © 2013 ACM 978-1-4503-2374-1/13/10. $15.00. suggestive than a general term like “compiler”. We blame Arjun Guha for http://dx.doi.org/10.1145/2509136.2509536 the confusing terminology. Because desugaring converts Python surface syntax to the Σ ::= ((ref v+undef) ...) core language, when it is composed with an interpreter for ref ::= natural λπ (which is easy to write), we have another implementation v, val ::= 〈val,mval,{string:ref,...}〉 of Python. We can then ask how this implementation com- | 〈x,mval,{string:ref,...}〉 pares with the traditional CPython implementation, which | @ref | (sym string) represents a form of ground truth. By carefully adjusting v+undef ::= v | ☠ the core and desugaring, we have achieved high fidelity with e+undef ::= e | ☠ CPython. Therefore, users can built tools atop λπ, confident t ::= global | local that they are conformant with the actual language. mval ::= (no-meta) | number | string | meta-none [ ...] ( ...) { ...} In the course of creating this high-fidelity semantics, we | val | val | val | (meta-classx) | (meta-code (x ...)xe) identified some peculiar corners of the language. In partic- | λ(x ...) opt-var.e ular, scope is non-trivial and interacts with perhaps unex- opt-var ::= (x) | (no-var) pected features. Our exposition focuses on these aspects. e ::= v | ref | (fetche) | (set!ee) | (alloce) In sum, this paper makes the following contributions: | e[e] | e[e :=e] if • a core semantics for Python, dubbed λπ, which is defined | eee | e e as a reduction semantics using PLT Redex [3]; | let x= e+undef ine | x | e :=e | (deletee) • an interpreter, dubbed λπ#, implemented in 700LOC of | e(e ...) | e(e ...)*e | (framee) | (returne) Racket, that has been tested against the Redex model; | (whileeee) | (loopee) | break | continue | (builtin-prim op(e ...)) • a desugaring translation from Python programs to λπ, | fun ( ...) implemented in Racket; x opt-var e | 〈e,mval〉 | list〈e,[e ...]〉 • a demonstration of conformance of the composition of | tuple〈e,(e ...)〉 | set〈e,(e ...)〉 desugaring with λπ# to CPython; and, | (tryexceptexee) | (tryfinallyee) | (raise ) | (err ) • insights about Python gained from this process. e val | (moduleee) | (construct-modulee) Presenting the semantics in full is neither feasible, given | (in-moduleeε) space limits, nor especially enlightening. We instead focus on the parts that are important or interesting. We first give Figure 1: λπ expressions an overview of λπ’s value and object model. We then introduce desugaring through classes. We then discuss gen- erators, classes, and their interaction with scope. Finally, we describe the results of testing our semantics against either another λπ value or the name of a built-in class. The CPython. All of our code is available online at https: primitive content, or meta-val, position holds special kinds //www.github.com/brownplt/lambda-py. of builtin data, of which there is one per builtin type that λπ models: numbers, strings, the distinguished meta-none 2 2. Warmup: A Quick Tour of λπ value, lists, tuples, sets, classes, and functions. The distinguished ☠ (“skull”) form represents uninitial- We provide an overview of the object model of λπ and ized heap locations whose lookup should raise an excep- Python, some of the basic operations on objects, and the tion. Within expressions, this form can only appear in let- shape of our small step semantics. This introduces notation bindings whose binding position can contain both expres- and concepts that will be used later to explain the harder sions and . The evaluation of in a -binding allocates parts of Python’s semantics. it on the heap. Thereafter, it is an error to look up a store location containing ; that location must have a value 2.1 λπ Values assigned into it for lookup to succeed. Figure 2 shows the Figure 1 shows all the values and expressions of λπ. The behavior of lookup in heaps Σ for values and for . This metavariables v and val range over the values of the lan- notion of undefined locations will come into play when we guage. All values in λπ are either objects, written as triples discuss scope in section 4.2. in 〈〉, or references to entries in the store Σ, written @ref. Python programs cannot manipulate object values di- Each λπ object is written as a triple of one of the forms: rectly; rather, they always work with references to objects. 〈v,mval,{string:ref,...}〉 Thus, many of the operations in λπ involve the heap, and 〈x,mval,{string:ref,...}〉 few are purely functional. As an example of what such an operation looks like, take constructing a list. This takes the These objects have their class in the first position, their primitive content in the second, and the dictionary of string- 2 We express dictionaries in terms of lists and tuples, so we do not need to indexed fields that they hold in the third. The class value is introduce a special mval form for them. (E[ let x= v+undef ine]εΣ) [E-LetLocal] (E[[x/ref]e]εΣ 1) where (Σ1 ref) = alloc(Σ,v+undef) (E[ref]εΣ) (E[val]εΣ) [E-GetVar] where Σ = ((ref1 v+undef1) ... (ref val) (refn v+undefn) ...) (E[ref]εΣ) [E-GetVarUndef] (E[(raise〈%str,“Uninitialized local”,{}〉)]εΣ) where Σ = ((ref1 v+undef1) ... (ref ☠)(refn v+undefn) ...) Figure 2: Let-binding identifiers, and looking up references (E[〈val,mval〉]εΣ) [E-Object] (E[(fetch @ref )]εΣ) [E-Fetch] (E[@refnew ]εΣ 1) (E[Σ(ref)]εΣ) where (Σ1 refnew) = alloc(Σ,〈val,mval,{}〉) (E[(set! @ref val)]εΣ) [E-Set!] (E[ tuple〈valc,(val ...)〉]εΣ) [E-Tuple] (E[val]εΣ 1) ( [@ ]εΣ ) E refnew 1 where Σ1 = Σ[ref:=val] where (Σ1 refnew) = alloc(Σ,〈valc,(val ...),{}〉) (E[(alloc val)]εΣ) [E-Alloc] (E[ set〈valc,(val ...)〉]εΣ) [E-Set] (E[@refnew ]εΣ 1) ( [@ ]εΣ ) E refnew 1 where (Σ1 refnew) = alloc(Σ,val) where (Σ1 refnew) = alloc(Σ,〈valc,{val ...},{}〉) Figure 5: λπ reduction rules for references Figure 3: λπ reduction rules for creating objects 2.2 Accessing Built-in Values values that should populate the list, store them in the heap, Now that we’ve created a list, we should be able to perform and return a pointer to the newly-created reference: some operations on it, like look up its elements.

Load more