arXiv:1804.10806v1 [cs.PL] 28 Apr 2018

KRust: A Formal Executable Semantics of Rust

Feng Wang∗, Fu Song∗, Min Zhang†, Xiaoran Zhu† and Jun Zhang∗
∗School of Information Science and Technology, ShanghaiTech University, Shanghai, China
†Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China

Abstract: Rust is a new and promising high-level system programming language. It provides both memory safety and thread safety through its novel mechanisms such as ownership, moves and borrows. The ownership system ensures that at any point there is only one owner of any given resource. The ownership of a resource can be moved or borrowed according to the lifetimes. The ownership system establishes a clear lifetime for each value and hence does not necessarily need garbage collection. These novel features bring Rust high performance, the fine low-level control of C and C++, and freedom from garbage collection, which distinguish Rust from other existing prevalent languages. For formal analysis of Rust programs and for helping programmers learn its new mechanisms and features, a formal semantics of Rust is desired and useful as a fundament for developing related tools. In this paper, we present a formal executable operational semantics of a realistic subset of Rust, called KRust. The semantics is defined in K, a rewriting-based executable semantic framework for programming languages. The executable semantics automatically yields a formal interpreter and verification tools for Rust programs. KRust has been thoroughly validated by testing with hundreds of tests, including the official Rust test suite.

Index Terms: Formal operational semantics, Rust programming language, K framework

I. Introduction

Recently, a new system programming language Rust was designed and implemented by Mozilla [1], aiming at achieving both high-level safety and low-level control in developing system software, such as operating systems, device drivers, game engines and web browsers. Like most modern high-level languages, Rust guarantees memory safety and thread safety, and meanwhile it supports many common system programming idioms and zero-cost abstractions and provides fine control over the use of memory, without needing a garbage collector. One key to meeting all these promises is Rust's novel system of ownership, moves and borrows. The ownership system establishes a clear lifetime for each value, making garbage collection unnecessary in the core language. Moreover, it also prevents data races at compile time. The three mechanisms are checked at compile time and carefully designed to complement its static and linear type system. Rust has been used to implement an operating system [2], a parallel browser engine [3], Intel SGX Enclave [4], etc. (for organizations running Rust in production, see footnote ¹).

The new features make the semantics of Rust different from other common languages, which brings new difficulties in reasoning about Rust programs. The difficulty constantly baffles developers and has become a common topic of question-and-answer websites (for instance, [5]). The ownership of an object is a variable binding. When a variable goes out of its scope, Rust will free the bound resources. The ownership of an object can be transferred, after which the object cannot be accessed via the outdated owner. This ensures that there is exactly one binding to any object. Instead of transferring ownership, Rust provides borrows, which allow an object to be shared by multiple references. There are two kinds of borrows in Rust: mutable references and immutable references. A mutable reference can mutate the borrowed object if the object is mutable, while an immutable reference cannot mutate the borrowed object even if the object itself is mutable. The basic ownership discipline enforced by Rust is that an object shared by multiple references is immutable. This property eliminates a wide range of common low-level programming errors, such as use-after-free, data races, and iterator invalidation. These specific semantic rules are unusual, and hence the semantics of other modern programming languages such as C++, Java and Javascript cannot be directly adapted to Rust.

To remedy this situation, a formal semantics of Rust is desired and useful as a fundament for reasoning about Rust programs in a formal way and for developing related computer-aided tools. To our knowledge, the formal semantics of Rust has not yet been well studied, which impedes further development of formal analysis and verification tools for Rust programs. In this paper, we make a major step toward rectifying this situation by giving the first formal operational semantics of a realistic subset of Rust. We design a formal operational semantics capturing ownership, ownership moves and borrows. To prevent our semantics from being just "paper work", we formalize the semantics in the K framework [6] (http://kframework.org), a scalable semantic framework for programming languages which has been successfully applied to C [7] and Java [8]. We call the definition of the semantics KRust. To the best of our knowledge, it is the first formal executable semantics for Rust ².

There are several benefits from the formal semantics defined in the K framework. Firstly, the semantics defined in K is both machine readable and executable, from which an interpreter of Rust is automatically generated. Being executable, the semantics has been thoroughly tested with hundreds of tests, including the official Rust test suite. Secondly, K provides a simple notation for modular semantics of languages, making the semantics easy to define and extensible. The semantics offers a formal reference and a correctness definition for implementers of tools such as parsers, compilers, interpreters and debuggers, which would greatly facilitate developers' understanding, freeing them from lengthy, ambiguous, elusive Rust documentation. Moreover, the semantics could automatically yield formal analysis tools such as a state-space explorer for reachability, a model checker, a symbolic execution engine and a deductive program verifier by means of the language-independent tools provided in K [9].

Organization. Section II gives a brief overview of Rust. In Section III, after introducing the basic notations of K, we present the formal semantics KRust of Rust in K. Conformance testing and applications of KRust are described in Section IV. Section V discusses related work. Finally, we conclude the work with a discussion in Section VI.

¹ https://www.rust-lang.org/en-US/friends.html.
² KRust and all the related sample examples are available for download from: http://sist.shanghaitech.edu.cn/faculty/songfu/Projects/KRust.

II. A Tour of Rust

In this section, we give a brief overview of Rust. Rust is a C-like programming language which contains common constructs from modern programming languages such as if/else branches, while/for loops, functions, compound data structures, etc. We will mainly point out distinct features of Rust, compared with other well-known modern programming languages such as C/C++ and Java.

A. Mutability

Variables are declared using let statements. By default, a variable is immutable, which means that its value cannot be mutated. To declare a mutable variable, mut is required in the declaration statement.

    1 fn main() {
    2     let x = 9;
    3     x = 10; // Error!
    4     let mut y = 0;
    5     let mut z: bool;
    6 }

For instance, the above code declares an immutable variable x at Line 2 and a mutable variable y at Line 4 whose types are inferred at compile time. The type can also be explicitly specified in the program, like that of the mutable variable z at Line 5. The Rust compiler will issue an error at Line 3, as the immutable variable x is reassigned. This is different from other modern programming languages like C/C++ and Java. The semantic rules of Rust should take the mutability of variables into account.

B. Functions

Functions are declared with the keyword fn and each of them should return exactly one value. There are two ways to return a value if the function definition declares a return type. The first one returns a value in the body of the function explicitly using the return statement, which is similar to existing prevalent languages. The other one returns the value of the last expression in the body of the function, if there is no explicit return statement. However, if the function definition does not declare a return type, Rust will implicitly return the unit type (). Indeed, a function definition without declaring a return type is just syntactic sugar for the same function definition with return type (). Furthermore, Rust has no restriction on the order of function definitions, namely, Rust programs can invoke functions which are defined later. These features are unusual and introduce tricky corner cases.

    fn foo(x: i32, y: i32) -> i32 {
        x+y // return x+y;
    }

The above program defines a function foo which takes two 32-bit integers x and y as arguments and returns x+y. This function behaves the same as the function which replaces the last expression x+y in foo with the return statement return x+y;.

C. Ownership

Ownership is the key feature of Rust, which guarantees memory safety and thread safety without garbage collection. The basic form of ownership is exclusive ownership, namely, each object has a unique owner at any time. This ensures that at most one reference is allowed to mutate a given location. When an object is created and assigned to a variable x, the variable x becomes the owner of the object. If the object is reassigned (or passed as a parameter, etc.) to another variable y, the ownership of the object is transferred from the variable x to the variable y, namely, y becomes the owner of the object and x is no longer the owner of the object. This is the so-called move semantics in Rust. This ownership discipline rules out aliasing entirely, and thus prevents data races. Moreover, if the owner of the object goes out of scope, the object is destroyed automatically without garbage collection. This is implemented by the Rust compiler, which inserts a call to a destructor, called drop in Rust, at the end of the owner's scope. This ownership discipline enforces automatic memory management and prevents other errors commonplace in low-level pointer-manipulating programs, like use-after-free or double-free. To see this principle in practice, consider the following sample program.

     1 struct Point {
     2     x: i32,
     3     y: i32,
     4 }
     5 fn main() {
     6     let p = Point { x: 1, y: 2 };
     7     {
     8         let mut q = p; // q becomes the owner
     9         q.x = 2;
    10         println!("{}", p.x); // Error!
    11     }
    12     println!("{}", q.x); // Error!
    13 }

Point is a compound data structure consisting of two 32-bit integer fields x and y. At Line 6, a Point object is created and assigned to the variable p. The Rust compiler will issue an error at Line 10, which accesses the x field of the Point object via the variable p, as the ownership of the Point object was already transferred from p to q at Line 8. Moreover, the Rust compiler will also issue another error at Line 12, which accesses the x field of the Point object via the variable q, as the owner q went out of its scope and the Point
object was already destroyed. These scenarios rarely happen in existing prevalent programming languages, which makes the Rust semantics unusual.

D. Borrowing

The ownership discipline is a fairly straightforward mechanism for guaranteeing memory safety and thread safety. However, it is also too restrictive for developing industrial programs. To address this issue, Rust provides a mechanism for handling references, called borrowing. There are two different kinds of borrowing (i.e., reference types) in Rust: mutable references and immutable references. A mutable reference grants temporary exclusive read and write access to the object, i.e., each object has at most one mutable reference (without any immutable references). This ensures that mutable references are always unique pointers. A mutable reference can be reborrowed to someone else. Contrary to mutable references, an immutable reference grants temporary read-only access to the object, and multiple immutable references are allowed to refer to the same object. The design of borrowing in Rust also guarantees memory safety and thread safety.

     1 fn main() {
     2     let x1 = 1;
     3     let p1 = &x1; // x1 is borrowed immutably
     4     let q1 = &x1; // x1 is borrowed immutably
     5     *p1 = 2; // Error!
     6     let y = &mut x1; // Error!
     7
     8     let mut x2 = 1;
     9     let p2 = &x2; // x2 is borrowed immutably
    10     let q2 = &x2; // x2 is borrowed immutably
    11     *p2 = 2; // Error!
    12
    13     let mut x3 = 1;
    14     let p3 = &mut x3; // x3 is borrowed mutably
    15     x3 = 2; // Error!
    16     *p3 = 2; // OK!
    17     let p4 = &mut x3; // Error!
    18 }

We can see borrowing in action in the above example. In this example, the variable x1 is immutable and is immutably borrowed twice, at Lines 3 and 4. The compiler will issue an error at Line 5, which tries to mutate the value of x1. It also issues another error at Line 6, as the immutable variable x1 cannot be mutably borrowed. The code at Lines 8-11 shows that the mutable variable x2 can also be immutably borrowed multiple times, but cannot be mutated through an immutable reference. Line 15 demonstrates that the mutable variable x3 cannot be mutated via x3 once it is borrowed. Instead, it can be mutated via the mutable reference p3 at Line 16. Line 17 shows that the mutable variable x3 cannot be mutably borrowed more than once.

E. Lifetime

Borrowing grants temporary access to the object. Rust associates each reference with a lifetime that specifies how long this temporary access lasts. Intuitively, lifetimes are effectively just names for scopes somewhere in the program, but they are not the same. The lifetime of a reference should be included in the lifetime of the borrowed variable. Rust provides a convention so that lifetimes can be elided in general, which is why they did not show up in the above examples. Rust also supports named lifetimes, which help the Rust compiler to aggressively infer lifetimes and make sure all borrows are valid.

    1 fn main() {
    2     let mut x;
    3     {
    4         let y = 1;
    5         x = &y; // Error!
    6     }
    7     let z = 1;
    8     let p = &z; // OK!
    9 }

The above example illustrates the intuition behind lifetimes. There is an error at Line 5, as the lifetime of x is not included in the lifetime of y. In contrast, it is fine to borrow z at Line 8, as the lifetime of p is included in the lifetime of z. Therefore, Rust's variable context is substructural.

III. KRust: The Formal Semantics of Rust in K

In this section, we first introduce the basic notations of K, and then define the addressed formal syntax of Rust. Finally, we define the configurations and the formal semantics KRust of Rust in K.

A. K Framework

The K framework is a rewrite-based executable semantic framework. The operational semantics of a programming language can be formally defined with the state of an executing program being represented as a configuration and the semantics of each program statement being defined as K rules. With the defined operational semantics, K automatically generates an interpreter which can execute programs of the language. Besides, K also provides formal analysis functionalities such as model checking, symbolic execution, and theorem proving [6]. A number of programming languages have been formalized using K, such as C [7], Python [10], PHP [11] and Java [8].

In this section, we briefly introduce the mechanism of how to define operational semantics in K with an example. Basically, the syntax of a language is defined using BNF with semantic attributes, and the operational semantics is defined by a set of K rules which describe the effect of atomic program statements over K configurations. A K configuration is essentially a nested cell defined in XML style, which specifies a state of an executing program.

For a better understanding of K, let us consider the following semantic rule for the lookup function in Rust, which is used to look up the value of a name (e.g., a variable) when a statement which contains that name is being executed.

\[
\left\langle \dfrac{X}{V}\ \cdots \right\rangle_{k}\
\langle \cdots\ X \mapsto L\ \cdots \rangle_{env}\
\langle \cdots\ L \mapsto V\ \cdots \rangle_{store}
\]

There are three cells related to the lookup function. The cell k (i.e., the cell labeled with k) is used to store the computations of a program to be executed. In this example, X is the next expression/statement to evaluate/execute, where X is a K label representing a program name. Both the env and store cells are of Map type and store key-value pairs, where env is used to store the mapping from program names to locations in the form of X ↦ L, and store is used to store the mapping from locations to values in the form of L ↦ V. The dots "···" in the cells are structural frames, denoting irrelevant pairs or computations. For instance, given the following two statements:

    let a = 1;
    let b = a; // b = 1

Variable a should be first evaluated to 1 using the lookup rule when the second statement is being executed.

B. Syntax

Table I presents the syntax of a realistic subset of Rust defined in KRust. The syntax is described by a dialect of Extended Backus-Naur Form (EBNF) according to the grammar of Rust [12]. We use two repetition forms in the definition of the syntax, where "?" means zero or one repetition and "*" means zero or more repetitions. An option is represented by square brackets [ ... ] followed by "?". For instance, in the syntax for function declarations, "[-> Type] ?" means that "-> Type" may be present just once, or not at all. Since Rust shares many common conventions with prevalent functional and imperative programming languages, most of its syntax is easy to understand and hence we omit the corresponding explanations. We remark that Table I is not the full syntax of Rust, e.g., traits and pattern matching are excluded (cf. Section VI).

TABLE I: The syntax of KRust

    Identifier
        Id ::= [a-zA-Z_][a-zA-Z0-9_]*
    Variable types
        Type ::= i8 | u8 | i16 | u16 | i32 | u32 | i64 | u64 | f32 | f64 | isize | usize
               | char | &str | bool | Id | () | [Type;Exp] | fn (Types) -> Type
        Types ::= Type*
    Auxiliary types
        TypedId ::= Id : Type
        TypedIds ::= TypedId*
    Variable declaration
        ConstAndStatic ::= const Id : Type = Exp ; | static mut ? Id : Type = Exp ;
        DeclExp ::= let mut ? Id [: Type] ? [= Exp] ? | ConstAndStatic;
    Expressions
        Op ::= "+" | "-" | "*" | "/" | "%" | "|" | "&" | ">>" | "<<" | "<" | "<=" | ">" | ">="
             | "==" | "!=" | "||" | "&&"
        Exp ::= Int | Bool | Float | String | Char | Id | *Id | [Exps] | [Exp;Exp] | vec![Exps]
              | (Exp) | Exp[Exp] | {Exp} | Ref Exp | Id { StructValues } | Exp(Exp) | -Exp | ! Exp
              | Exp Op Exp
        Exps ::= Exp*
    Assignment statement
        AssignOp ::= "=" | "+=" | "-=" | "*=" | "/="
        AssignmentStmt ::= Id AssignOp Exp ; | Id[Exp] AssignOp Exp ; | *Id AssignOp Exp;
                         | Id . Id AssignOp Exp ;
    If, while and loop statement
        If ::= if Exp else ? Block
        While ::= while Exp Block
        Loop ::= loop Block | If | While
    Block
        Block ::= {} | { Stmts } | { Stmts Exp }
    Two types of references
        Ref ::= & | &mut
    Struct
        Struct ::= struct Id { TypedIds }
        StructValue ::= Id : Exp
        StructValues ::= StructValue*
        StructInstance ::= Id { StructValues }
    For statement
        For ::= for Id in Int..Int Block
    Function
        Function ::= fn Id (TypedIds) [-> Type] ? Block
    Statements
        Stmts ::= DeclExp | AssignStmt | Block | Exp ; | return ; | return Exp ; | Loop ; | Loop
                | Function | Struct | For

C. Configuration

A K configuration is represented by nested multisets of labeled cells. Figure 1 shows the 13 main cells in a configuration for the representation of the state of KRust programs.

\[
\left\langle
\begin{array}{l}
\langle K \rangle_{k}\ \ \langle Map \rangle_{env}\ \ \langle \langle List \rangle_{fstack} \rangle_{control}\ \ \langle Map \rangle_{genv} \\
\langle Map \rangle_{typeEnv}\ \ \langle Map \rangle_{store}\ \ \langle Map \rangle_{mutType}\ \ \langle 0 \rangle_{nextLoc} \\
\langle Map \rangle_{borrow}\ \ \langle Map \rangle_{ref}\ \ \langle Map \rangle_{refType}\ \ \langle Map \rangle_{moved}
\end{array}
\right\rangle_{T}
\]

Fig. 1. The K configuration for the states of KRust programs

The cell T is the top one in K, which contains all the cells. As aforementioned, the k cell contains the computations of a program. The env cell is the local environment, recording the map from variables to their locations. Inside the control cell, there is an fstack cell which encodes the stack frames. The fstack cell is a list, in which each element contains an env cell, some computations and a return type. The genv cell represents the global environment. The typeEnv cell records the type of a given variable's location. The values of all the defined variables are stored in the store cell.

Since a variable is either mutable or immutable in Rust, we add the mutType cell to record whether the variable is mutable or immutable. When a new variable is declared, a new location, which is an integer value, is allocated from the nextLoc cell. After that, the integer value in nextLoc is increased by one. The borrow cell records whether a variable is mutably or immutably borrowed if there exists a live reference to it. The ref cell records reference relations. The refType cell contains the types of references, including immutable and mutable references. The moved cell is a map from locations of variables to {0, 1}, where 1 denotes that the variable has been moved and 0 that it has not.
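To make the role of these cells more concrete, the following minimal sketch mirrors a few of them with ordinary Rust data structures. It is only an illustration and not part of KRust: the type `Config`, its field names (chosen to resemble the cell names) and the methods `declare` and `lookup` are assumptions introduced here, only five cells are modelled, and values are restricted to `i64`.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of a few KRust cells (not the K definition itself).
/// Locations are plain integers and values are restricted to i64.
#[derive(Debug, Default)]
pub struct Config {
    /// env: variable name -> location
    pub env: HashMap<String, usize>,
    /// store: location -> value, where None plays the role of "uninitialized"
    pub store: HashMap<usize, Option<i64>>,
    /// mutType: location -> true if the variable is mutable
    pub mut_type: HashMap<usize, bool>,
    /// moved: location -> true if the variable has been moved
    pub moved: HashMap<usize, bool>,
    /// nextLoc: the next free location (starts at 0)
    pub next_loc: usize,
}

impl Config {
    /// Declare a variable: take the location from next_loc, fill in the
    /// per-location cells, then increase next_loc by one.
    pub fn declare(&mut self, name: &str, mutable: bool, init: Option<i64>) -> usize {
        let loc = self.next_loc;
        self.env.insert(name.to_string(), loc);
        self.store.insert(loc, init);
        self.mut_type.insert(loc, mutable);
        self.moved.insert(loc, false);
        self.next_loc += 1;
        loc
    }

    /// Lookup: resolve name -> location in env, then location -> value in store.
    /// Returns None for unknown names and for uninitialized variables.
    pub fn lookup(&self, name: &str) -> Option<i64> {
        let loc = self.env.get(name)?;
        self.store.get(loc).copied().flatten()
    }
}
```

Under these assumptions, the two statements `let a = 1; let b = a;` correspond to declaring `a` at location 0, looking `a` up, and declaring `b` at location 1 with the value that was found.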

4 TABLE II The partial semantic rules of KRust

Variable declaration and assignment hX ⇒ Li hL ⇒ ⊥i hL ⇒ Ti hX ⇒ Li hL ⇒ ⊥i hL ⇒ Ti let X : T; env store typeEnv let mut X : T; env store typeEnv .  hL ⇒ 0imutType hL ⇒ ⊥iborrow hL ⇒ ⊥iref  .  hL ⇒ 1imutType hL ⇒ ⊥iborrow hL ⇒ ⊥iref  k  h ⇒ ⊥i h ⇒ i h + i  k  h ⇒ ⊥i h ⇒ i h + i  D E  L refType L 0 moved L 1 nextLoc  D E  L refType L 0 moved L 1 nextLoc   [Declaration-of-Immutable-Variable]  [Declaration-of-Mutable-Variable]    hX ⇒ Li hL ⇒ [T; N]i  hX 7→ Li = hX 7→ Li hL 7→ Ti let mut env typeEnv X env X V; env typeEnv X: [T;N]; hL ⇒ 1i hL + Ni V hL 7→ Vi . hL 7→ 1i hL 7→ ( ⇒ V)i .  mutType nextLoc  k ( store ) k ( mutType store ) k  hL ... L + N − 1 ⇒⊥i  D E[Lookup-of-Variable] D E when T = getType(V) [Assignment] D E  store   [Declaration-of-Mutable-Array]   X[I] = V; hX 7→ Lienv hL 7→ [T; N]itypeEnv X[I] hX 7→ Lienv hL 7→ [T; N]itypeEnv . k ( hL 7→ 1imutType hL + I 7→ ( ⇒ V)istore ) V k ( hL + I 7→ Vistore ) whenD 0 ≤ I < N andE T = getType(V) [Updating-of-Array-Element] DwhenE 0 ≤ I < N [Evaluation-of-Array-Element] Borrow, reference, lifetime and dereference h 7→ , 7→ i h 7→ , 7→ i hX 7→ L1, Y 7→ L2ienv hL1 7→ 1, L2 7→ 1imutType X L1 Y L2 env L1 1 L2 1 mutType X = &mut Y; X = &Y; h 7→ , 7→ i h 7→ ⇒ i hL1 7→ 0, L2 7→ 0imoved hL1 7→ ( ⇒ L2)iref .  L1 0 L2 0 moved L1 ( L2) ref  . k   k  h 7→ ⇒ i h 7→ ⊥⇒ i   hL1 7→ 0i hL2 7→ 0i  D E  L1 ( 1) refType L2 ( 1) borrow  D E  refType borrow   when L > L [Mutable-Reference]  when L1 > L2 [Immutable-Reference]  1 2    ∗X hX 7→ L1ienv hL1 7→ L2iref V k ( hL1 7→ 0imoved hL2 7→ Vistore ) D E [Dereference] Function definition and function call fn F(TIDs) ->T{S} hF ⇒ Lienv hL + 1inextLoc mkDecls((X : T, TIDs), (V,Vs)) λ(TIDs,S,T)(Vs) K h(Env, K, T)ifstack . 
, k ( hL ⇒ λ(TIDs, S, T)istore ) let X : T=V; mkDecls(TIDs, Vs) k mkDecls(TIDs Vs) S return (); k ( hEnv ←⊥ienv ) D [Function-Definition-with-Return-TypeE ] D [mkDecls-Helper-FunctionE ] D E [Function-Call] fn F(TIDs) {S} fn F(TIDs) -> T {SE} return V; − h(Env, K, T)ifstack

fn F(TIDs) -> () {S} k fn F(TIDs) -> T {S return E;} k V K k ( h ← Envienv ) [Function-Definition-without-Return-TypeD E ] [Function-Definition-Return-by-Last-ExpressionD E ] D when T =E getType(V) [Return] Struct and ownership struct Z {TIDs} hZ ⇒ Lienv hL + 1inextLoc let mut X = Z {Vs}; hX ⇒ L1, Z 7→ L2ienv hL1 ⇒ 1imutType hL1 ⇒ 0imoved . , , k ( hL ⇒ F1(TIDs)istore ) F2 (F1(TIDs) X Vs) k ( hL1 ⇒ ZitypeEnv hL2 7→ F1(TIDs)istore hL + 1inextLoc ) D E [Struct-Definition] D E [Declaration-of-Struct-Instance] hX.Y ⇒ Lienv F2(F1 ((Y : T, TIDs), X, (Y : V), Vs) X = Y; hX 7→ L1, Y 7→ L2, Z 7→ L3ienv hL1 7→ Z, L2 7→ ZitypeEnv hL ⇒ Vistore F2(F1 (TIDs), X, Vs)   F3 (X, Y, TIDs) F4 (Y) hL 7→ 1i hL 7→ F (TIDs)i k  hL + 1i  k ( 1 mutType 3 1 store ) D E  nextLoc  D E [Ownership-Move] [F1-F2-Helper-Function]   F3(X,Y,((P:T),TIDs)) F4(X) hX 7→ Lienv X.Y hX.Y 7→ L1, X 7→ L2ienv . X.P = Y.P; F3 (X,Y,TIDs) k k ( hL 7→ (0 ⇒ 1)imoved ) V k ( hL1 7→ Vistore hL2 7→ 0imoved ) F -Helper-Function D [ 3 E ] D E [F4-Helper-Function] D E [Evaluation-of-Struct-Field]

X.Y = V; hX.Y 7→ L1, X 7→ L2ienv hL2 7→ 1imutType . k ( hL1 7→ ( ⇒ V)istore hL2 7→ 0imoved ) D E [Updating-of-Struct-Field]
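Read operationally, the [Mutable-Reference] rule of Table II is a guarded update: it fires for `X = &mut Y;` only if both variables are mutable, neither has been moved, Y is not yet borrowed and X was allocated after Y, and it then updates the ref, refType and borrow cells. The sketch below restates this reading in plain Rust. It is an illustration under assumptions rather than the KRust definition: the names `Cells`, `Kind` and `borrow_mut` are hypothetical, an absent key stands for ⊥, and the error strings are invented for readability.

```rust
use std::collections::HashMap;

/// Kind of a borrow or reference (0 and 1 in the rules of Table II).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Immutable,
    Mutable,
}

/// Hypothetical mirror of the cells touched by [Mutable-Reference].
#[derive(Debug, Default)]
pub struct Cells {
    pub mut_type: HashMap<usize, bool>,   // location -> is mutable
    pub moved: HashMap<usize, bool>,      // location -> has been moved
    pub borrow: HashMap<usize, Kind>,     // location -> how it is borrowed (absent = not borrowed)
    pub reference: HashMap<usize, usize>, // ref cell: reference location -> referent location
    pub ref_type: HashMap<usize, Kind>,   // reference location -> kind of reference
}

impl Cells {
    fn is_mutable(&self, l: usize) -> bool {
        self.mut_type.get(&l).copied().unwrap_or(false)
    }

    fn is_moved(&self, l: usize) -> bool {
        self.moved.get(&l).copied().unwrap_or(false)
    }

    /// Sketch of `X = &mut Y;` where l1 is the location of X and l2 that of Y.
    pub fn borrow_mut(&mut self, l1: usize, l2: usize) -> Result<(), &'static str> {
        if !(self.is_mutable(l1) && self.is_mutable(l2)) {
            return Err("both variables must be mutable");
        }
        if self.is_moved(l1) || self.is_moved(l2) {
            return Err("a moved variable cannot be used");
        }
        if self.borrow.contains_key(&l2) {
            return Err("the referent is already borrowed");
        }
        if l1 <= l2 {
            return Err("the reference must be declared after the referent");
        }
        self.reference.insert(l1, l2);          // ref: L1 now points to L2
        self.ref_type.insert(l1, Kind::Mutable); // refType: L1 is a mutable reference
        self.borrow.insert(l2, Kind::Mutable);   // borrow: L2 is mutably borrowed
        Ok(())
    }
}
```

In this sketch a second `borrow_mut` on the same referent is rejected, which corresponds to the rule requiring the pair L2 : ⊥ in the borrow cell before it can fire.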

D. Formalization of the semantics

In this section, we present the formal semantics KRust in K, with emphasis on key features of Rust such as mutability, borrows and ownership. Table II shows part of the semantic rules of KRust, which specify the semantics of Rust statements by modifying the content of the relevant cells. In the semantic rules, the operator ⇒ is used to add a new pair to its corresponding cell. For instance, ⟨X ⇒ L⟩env means inserting a new pair X : L into the env cell. The symbol ⊥ denotes an undefined value. The dot · is a special character in K, representing the identity element in list and map structures. Note that curly braces are used only for the compactness of the rules, where we leave out all the dots "···" in cells. The complete semantic rules of KRust can be found in its source code.

1) Variable declaration and assignment: The semantic rule [Declaration-of-Immutable-Variable] specifies how cells are affected after the execution of the statement let X : T; in the k cell, where X is a variable and T is a primitive type. A new pair L : ⊥ is added to store, where ⊥ indicates that X is uninitialized. The pair L : 0 is added to mutType, where 0 means that X is immutable. The next available location becomes L + 1 since X has consumed the location L. Some other initialization operations are made in the relevant cells as shown in the rule. The rule for the declaration of a mutable integer variable is defined likewise, with a modification of the mutType cell.

The semantic rule [Assignment] is defined for ordinary assignment, e.g., assigning a new value to an integer variable. It says that the assignment can be successfully executed only if X is mutable and the type of V is T, where getType is a predefined function that returns the type of V. The execution updates the value in store using L ↦ (_ ⇒ V), where _ denotes that the current value of L can be anything. The notation A ↦ (B ⇒ C) will be used regularly in this work, which stands for replacing
5 the pair A : B by A : C in a cell. A declaration with an Type] handles the corner case that the function definition does initial value can be handled similarly by combining the above not have a return type, for which, the unit type () is added semantic rules. as the return type. The semantic rule [Function-Definition- The semantic rule [Declaration-of-Mutable-Array] is de- Return-by-Last-Expression] handles another corner case that fined for the declaration of a mutable array, where a new pair the function definition returns the value of the last expression, X : L is added to env, i.e., allocates a location L for X. In for which, the last expression E is rewritten as a return typeEnv,[T; N] is the array type. L ... L + N − 1 in store statement return E;. denotes the locations L, ··· , L + N − 1 which are uninitialized. The rule [Return] first checks the type of the return value The integer in nextLoc cell is increased by N, as locations V, then returns the value V and finally restores the local L, ··· , L + N −1 are used to store the content of the array. The environment Env and the computations K from the top of the rules for the evaluation and updating of an array are defined fstack cell at the same time. likewise, as depicted in the table. 4) Struct and ownership: The rule [Struct-Definition] is 2) Borrowing, reference, lifetime and dereference: Both used for struct definition, which adds a new pair Z : L into immutable references and mutable references for borrow- the env cell. The value of L in store is a helper function F1 ing are implemented in KRust.[Mutable-Reference] and that is used to record the fields of the struct. [Immutable-Reference] are two of semantic rules for im- The rule [Declaration-of-Struct-Instance] is defined for the mutable and mutable references respectively. The [Mutable- declaration of a mutable struct instance. 
Reference] rule expresses that, if both X and Y are mutable and have not been moved, and Y has not been borrowed yet, then the following updates take place. The value of L1 in the ref cell is updated to L2, which may be used for dereference. The reference type of L1 in refType is set to 1, denoting that the reference type of L1 is mutable. The pair L2 : ⊥ in the borrow cell is updated to L2 : 1, meaning that Y is now mutably borrowed. The condition L1 > L2 ensures that X has to be declared later than Y, i.e., the lifetime of X is included in the lifetime of Y. The [Immutable-Reference] rule is defined likewise, except that Y is already immutably borrowed and the reference type of L1 in refType is set to 0, denoting immutable. The intuition behind the [Dereference] rule is straightforward: the value of ∗X is evaluated via the reference relation L1 ↦ L2 in the ref cell.

3) Function definition and function call: The standard function definition that has an explicit return type is handled by the [Function-Definition-with-Return-Type] rule, which allocates a location L for the name F in the env cell and binds the location L to an auxiliary function λ() in store. The auxiliary function λ() records the type and body of the function, which will be used for function calls, i.e., by the [Function-Call] rule. A function call is processed in two steps: first the function name is replaced by its λ() function in store using the [Lookup-of-Variable] rule, then the [Function-Call] rule is applied. The [Function-Call] rule first stores the current local environment Env together with the return type T and the remaining computations K into the fstack cell, allocates a new empty local environment, denoted by ⟨Env ← ⊥⟩_env in the rule, and sets mkDecls(TIDs, Vs) S return (); as the computations. The latter first executes the helper function mkDecls(), which recursively declares the formal parameters initialized with the actual parameters Vs (cf. the [mkDecls-Helper-Function] rule), then executes the function body S followed by an additional return statement return ();. We remark that return (); is added to handle the case that the function definition does not have a return statement, and it will be executed only if there is no return statement at the end of the function body S. The semantic rule [Function-Definition-without-Return-

It allocates a location L1 for X (i.e., adds X : L1 into env) and sets the type of L1 as Z, which is a struct name (i.e., adds L1 : Z into typeEnv). Other related cells are initialized accordingly. The declaration and initialization of fields are handled by the computation F2(F1(TIDs), X, Vs), which is defined by the [F1-F2-Helper-Function] rule. This rule recursively allocates a location for each field, which is initialized with the corresponding value from Vs.

The rule [Ownership-Move] specifies Rust's move semantics, in which the assignment statement is encoded as the computations of two helper functions F3 and F4, if X and Y have the same type and X is immutable. Notice that the pair L3 : F1(TIDs) in store ensures that Y is a struct instance. The semantics of the helper functions F3 and F4 are expressed by the rules [F3-Helper-Function] and [F4-Helper-Function] respectively. Intuitively, [F3-Helper-Function] copies the fields from Y to X, and [F4-Helper-Function] updates the moved cell. We remark that ⟨L1 ↦ 0, L2 ↦ 0⟩_moved is not added in the [Ownership-Move] rule, as "X.P = Y.P;" in [F3-Helper-Function] ensures that both variables are not moved yet. The semantic rules [Evaluation-of-Struct-Field] and [Updating-of-Struct-Field] are defined for evaluating and updating a struct field; they require that the struct variable is not moved, i.e., the pair L2 : 0 should occur in moved.

IV. Testing and Applications

In this section, we validate our semantics KRust and show some potential applications of KRust.

A. Conformance Testing

Following previous work [7], [8], [11], [13], which used test suites to validate executable language semantics, we did the same. We validated our semantics KRust by testing the KRust interpreter (automatically generated from the KRust semantics using the K framework) against both the official test suite of Rust³ and hand-crafted tests.

³ https://github.com/rust-lang/rust/tree/master/src/test
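To give a concrete feel for the programs involved, the following is a minimal sketch of the kind of Rust code a hand-crafted test might contain. It is not taken from the KRust test set, and the names Point, sum and run_test are hypothetical. It exercises a struct move, a mutable borrow followed by an immutable borrow, and a call to a function with an explicit return type, i.e., constructs of the kind handled by the semantic rules described earlier.

```rust
// Hypothetical example, not one of the authors' tests.

struct Point {
    x: i32,
    y: i32,
}

// Function definition with an explicit return type; it only borrows `p`.
fn sum(p: &Point) -> i32 {
    p.x + p.y
}

fn run_test() -> i32 {
    let a = Point { x: 1, y: 2 };

    // Move: ownership of the struct passes from `a` to `b`.
    // Reading `a` after this line would be rejected by the compiler.
    let mut b = a;

    {
        // Mutable borrow: `r` is declared later than `b`, so the
        // lifetime of `r` is included in the lifetime of `b`.
        let r = &mut b;
        r.x = 10; // update a field through the reference
    } // the mutable borrow ends here

    // Immutable borrow, allowed because no mutable borrow is live.
    let s = &b;

    // Function call: evaluates the body of `sum` with `s` as argument.
    sum(s)
}
```

A test in the "run-pass" style would call run_test() from main and assert that the result equals 12 (the updated x = 10 plus y = 2).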

The official test suite of Rust is used to test the Rust compiler. It is already split into folders containing different categories of tests. We chose tests from the "run-pass" folder, as the others were designed for different purposes, such as error messages. There are 3119 tests in the "run-pass" folder: some of them can be compiled by the nightly or the stable version of the compiler, and some may be ignored during compiler testing, but the official documents do not make clear how these tests are used. Therefore, we parsed all 3119 tests, of which 195 are supported by our syntax. Among these 195 tests, 38 either do not have a main function or call some standard library functions. Therefore, we chose the other 157 tests.

Because the 157 tests might not cover all the supported constructs, we hand-crafted 25 tests according to the syntax defined in Table I. Together, the 25 tests cover all the primitive constructs.

We have tested the KRust interpreter against all these 182

As an example, we will verify the time complexity of the Euclidean algorithm by subtraction, which computes the greatest common divisor.

    fn gcd(a: i32, b: i32) -> i32 {
        if a != b {
            if a > b { return gcd(a - b, b); }
            else { return gcd(a, b - a); }
        } else { return a; }
    }

The Euclidean algorithm is implemented in Rust as shown above. We will prove that its time complexity is indeed O(max(a, b)). To verify time complexity, an extra cell time is added to the configuration, which increases a counter T each time gcd is called. The core part of the specification for proving is:

    gcd(X:Int, Y:Int)
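The counting idea can also be tried out directly in Rust, independently of K. The sketch below is only an empirical check on a finite range of inputs, not the proof carried out with the KRust specification. The names gcd_counted, gcd_with_call_count and bound_holds_up_to are hypothetical, and an explicit counter argument stands in for the counter T of the time cell. The bound itself follows from a short argument: for positive a ≠ b, every recursive call strictly decreases max(a, b), and the recursion stops at a = b ≥ 1, so at most max(a, b) calls are made.

```rust
// Sketch under stated assumptions; none of these names come from KRust.

/// Subtraction-based Euclidean algorithm with a call counter.
/// `calls` plays the role of the counter T in the extra `time` cell.
fn gcd_counted(a: i32, b: i32, calls: &mut u32) -> i32 {
    *calls += 1; // one tick per invocation of gcd
    if a != b {
        if a > b {
            gcd_counted(a - b, b, calls)
        } else {
            gcd_counted(a, b - a, calls)
        }
    } else {
        a
    }
}

/// Runs the algorithm on positive inputs and returns (gcd, number of calls).
fn gcd_with_call_count(a: i32, b: i32) -> (i32, u32) {
    // The subtraction-based variant does not terminate for non-positive inputs.
    assert!(a > 0 && b > 0, "inputs must be positive");
    let mut calls = 0;
    let g = gcd_counted(a, b, &mut calls);
    (g, calls)
}

/// Checks that the number of calls never exceeds max(a, b)
/// for all pairs with 1 <= a, b <= limit.
fn bound_holds_up_to(limit: i32) -> bool {
    (1..=limit).all(|a| {
        (1..=limit).all(|b| {
            let (_, calls) = gcd_with_call_count(a, b);
            calls <= a.max(b) as u32
        })
    })
}
```

For example, gcd_with_call_count(48, 18) yields (6, 5): the successive argument pairs are (48, 18), (30, 18), (12, 18), (12, 6) and (6, 6). The bound is tight for inputs of the form (n, 1), which need exactly n calls.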

Besides the above languages, the formal semantics of Python 3.3, Verilog, Scheme, LLVM IR and Esolang were also defined in K [10], [15], [16], [17], [18].

Compared to other language semantics in K, the most distinctive aspect of our semantics is the formalization of the distinct features of Rust, namely ownership, borrow, lifetime, etc. To address these features, we had to redesign the program state instead of just copying it from existing works. Our semantics has been validated using the Rust standard test suite as well as hand-crafted tests.

B. Works related to Rust

Rust has attracted some attention from researchers, and there are some works on Rust and Rust program verification.

Reed presented a formal semantics for Rust that captures the key features relevant to memory safety, unique pointers and borrowed references, described the operation of the Borrow Checker, and formalized the Rust type system [19]. The main goal of this work is to provide a framework for borrow checking. However, Rust has evolved a lot since their work, and their semantics has not been implemented yet, hence it is not executable. Very recently, Jung et al. defined a formal semantics and type system of a language λRust, which incorporates Rust's notions [20]. Their work has been implemented in Coq. The main goal of this work is to study Rust's ownership discipline in the presence of unsafe code. However, the language λRust is close to Rust's Mid-level Intermediate Representation (MIR) rather than the actual Rust language, and their semantics defines more behaviors than Rust does. Our semantics addresses the exact behavior of the actual Rust language and is executable.

Dewey et al. proposed a technique to fuzz type checker implementations and applied it to test Rust's type checker [21]. It has identified 18 bugs. It is evident that formalization of the Rust type system is important, while formalization of the Rust semantics is the first step toward this.

Two model checkers for verifying unsafe Rust programs have been proposed [22], [23], which verify Rust programs by translating them into the input languages of existing model checkers. Our semantics can be automatically turned into formal analysis tools, such as a state-space explorer for reachability, a model checker and a symbolic execution engine, using the language-independent tools provided by K [9].

VI. Conclusion, Limitations and Future Work

In this work, we proposed a formal executable semantics KRust for Rust using the K framework. KRust captures (1) all the primitive types and their operations, (2) compound data types: struct, array and vector, (3) all the basic control flow constructs: for, while, loop, if, if/else, function definition/call and (4) the three most distinct and important features: ownership, borrow and lifetime. We tested KRust on many hand-crafted tests and Rust's official test suite. All tests within the supported syntax pass. We also demonstrated potential applications of KRust for debugging and verification of Rust programs.

However, KRust does not cover the full features of Rust, as Rust is being actively developed and some of these features are not stable so far. As a witness, although the Rust community provides some syntax in EBNF [12], it is still far from complete. This makes the formalization of Rust much more difficult, as mentioned in [20]. The following is a list of features that we have not implemented yet but plan to implement: (1) structs with reference fields, (2) pattern matching, which can be seen as a generalization of switch-case, (3) trait objects, which are like interfaces in Java, (4) lifetime annotations, which are used to mark explicit lifetimes in functions or structs, (5) complex closures, which use outside variables, (6) concurrency for writing multi-threaded programs, (7) crates and modules, which are used to call external library code, (8) unsafe, which is used to write code whose safety the Rust compiler is unable to prove, etc. A long-term program is to develop an almost complete formal executable semantics for Rust and to formally verify Rust programs using formal analysis tools derived from the semantics, towards which the work reported in this paper is the first cornerstone.

References

[1] N. D. Matsakis and F. S. Klock II, "The Rust language," in ACM SIGAda Ada Letters, vol. 34, no. 3, 2014, pp. 103–104.
[2] Redox, "Redox: a unix-like operating system written in rust," https://www.redox-os.org, 2018.
[3] B. Anderson, L. Bergstrom, M. Goregaokar, J. Matthews, K. McAllister, J. Moffitt, and S. Sapin, "Engineering the servo web browser engine using rust," in ICSE'16, 2016, pp. 81–89.
[4] Y. Ding, R. Duan, L. Li, Y. Cheng, Y. Zhang, T. Chen, T. Wei, and H. Wang, "POSTER: rust SGX SDK: towards memory safety in intel SGX enclave," in CCS'17, 2017, pp. 2491–2493.
[5] Stackoverflow, "https://stackoverflow.com/search?q=rust+," 2018.
[6] G. Roșu and T. F. Șerbănuță, "An overview of the K semantic framework," The Journal of Logic and Algebraic Programming, vol. 79, no. 6, pp. 397–434, 2010.
[7] C. Ellison and G. Roșu, "An executable formal semantics of C with applications," in POPL'12, 2012, pp. 533–544.
[8] D. Bogdanas and G. Roșu, "K-Java: a complete semantics of Java," in POPL'15, 2015, pp. 445–456.
[9] A. Ștefănescu, D. Park, S. Yuwen, Y. Li, and G. Roșu, "Semantics-based program verifiers for all languages," in OOPSLA'16. ACM, 2016, pp. 74–91.
[10] D. Guth, "A formal semantics of Python 3.3," Master's thesis, University of Illinois at Urbana-Champaign, 2013.
[11] D. Filaretti and S. Maffeis, "An executable formal semantics of PHP," in ECOOP'14, 2014, pp. 567–592.
[12] https://doc.rust-lang.org/grammar.html.
[13] D. Park, A. Ștefănescu, and G. Roșu, "KJS: a complete formal semantics of javascript," in PLDI'15, 2015, pp. 346–356.
[14] C. Hathhorn, C. Ellison, and G. Roșu, "Defining the undefinedness of C," in PLDI'15, 2015, pp. 336–345.
[15] P. O. Meredith, M. Katelman, J. Meseguer, and G. Roșu, "A formal executable semantics of Verilog," in MEMOCODE'10, 2010, pp. 179–188.
[16] P. Meredith, M. Hills, and G. Roșu, "A K definition of Scheme," University of Illinois at Urbana-Champaign, Tech. Rep., 2007.
[17] LLVM IR in K, "http://github.com/davidlazar/llvm-semantics."
[18] Esolang in K, "http://esolang-semantics.googlecode.com."
[19] E. Reed, "Patina: A formalization of the rust programming language," University of Washington, Tech. Rep., 2015.
[20] R. Jung, J. Jourdan, R. Krebbers, and D. Dreyer, "RustBelt: securing the foundations of the Rust programming language," PACMPL, vol. 2, no. POPL, pp. 66:1–66:34, 2018.
[21] K. Dewey, J. Roesch, and B. Hardekopf, "Fuzzing the Rust typechecker using CLP(T)," in ASE'15, 2015, pp. 482–493.
[22] J. Toman, S. Pernsteiner, and E. Torlak, "Crust: A bounded verifier for Rust," in ASE'15, 2015, pp. 75–80.
[23] F. Hahn, "Rust2viper: Building a static verifier for rust," Master's thesis, ETH Zürich, 2016.
