Sound, Heuristic Type Annotation Inference for Ruby
Total Page:16
File Type:pdf, Size:1020Kb
Sound, Heuristic Type Annotation Inference for Ruby Milod Kazerounian Brianna M. Ren Jeffrey S. Foster University of Maryland University of Maryland Tufts University College Park, Maryland, USA College Park, Maryland, USA Medford, MA, USA [email protected] [email protected] [email protected] Abstract Keywords: type inference, dynamic languages, Ruby Many researchers have explored retrofitting static type sys- ACM Reference Format: tems to dynamic languages. This raises the question of how Milod Kazerounian, Brianna M. Ren, and Jeffrey S. Foster. 2020. to add type annotations to code that was previously untyped. Sound, Heuristic Type Annotation Inference for Ruby. In Proceed- One obvious solution is type inference. However, in com- ings of the 16th ACM SIGPLAN International Symposium on Dynamic plex type systems, in particular those with structural types, Languages (DLS ’20), November 17, 2020, Virtual, USA. ACM, New type inference typically produces most general types that York, NY, USA, 14 pages. https://doi.org/10.1145/3426422.3426985 are large, hard to understand, and unnatural for program- mers. To solve this problem, we introduce InferDL, a novel Ruby type inference system that infers sound and useful type 1 Introduction annotations by incorporating heuristics that guess types. Many researchers have explored ways to add static types to For example, we might heuristically guess that a parame- dynamic languages [3, 4, 15, 19, 29, 31, 36, 37], to help pro- ter whose name ends in count is an integer. InferDL works grammers find and prevent type errors. One key challenge by first running standard type inference and then applying in using such retrofitted type systems is finding type anno- heuristics to any positions for which standard type infer- tations for code that was previously untyped. Type inference, ence produces overly-general types. Heuristic guesses are which aims to type check programs with few or no type added as constraints to the type inference problem to ensure annotations, is an obvious solution, and indeed there are sev- they are consistent with the rest of the program and other eral type inference systems for dynamic languages [2–4, 15]. heuristic guesses; inconsistent guesses are discarded. We for- Beyond type checking, type inference can also be extended malized InferDL in a core type and constraint language. We to generate type annotations for program values. These an- implemented InferDL on top of RDL, an existing Ruby type notations provide a useful form of documentation, and can checker. To evaluate InferDL, we applied it to four Ruby on be used in other forms of program analysis such as code Rails apps that had been previously type checked with RDL, completion for IDEs. and hence had type annotations. We found that, when using However, type inference systems typically aim to find the heuristics, InferDL inferred 22% more types that were as or most general type for every position, i.e., the least restrictive more precise than the previous annotations, compared to possible type. If the type language is rich—particularly if standard type inference without heuristics. We also found it includes structural types—the most general possible an- one new type error. We further evaluated InferDL by ap- notations might be large, hard to read, and unnatural for plying it to six additional apps, finding five additional type programmers. For example, An et al.[2] describe a type in- errors. Thus, we believe InferDL represents a promising ap- ference system for Ruby that infers that a certain position proach for inferring type annotations in dynamic languages. accepts any object with >, <<, >>, &, and ˆ methods. In contrast, a programmer would mostly likely, and much more CCS Concepts: • Software and its engineering ! Data concisely, say that position takes an Integer. Moreover, even types and structures. if we are not interested in producing annotations, large, com- plex types can lead to difficult-to-understand error messages. Permission to make digital or hard copies of all or part of this work for In this paper, we present InferDL, a novel Ruby type in- personal or classroom use is granted without fee provided that copies ference system that aims to infer sound and useful type anno- are not made or distributed for profit or commercial advantage and that InferDL copies bear this notice and the full citation on the first page. Copyrights tations. More specifically, allows the programmer for components of this work owned by others than the author(s) must to specify heuristics for guessing type annotations. For ex- be honored. Abstracting with credit is permitted. To copy otherwise, or ample, one simple but effective heuristic is to guess the type republish, to post on servers or to redistribute to lists, requires prior specific Integer for any variables whose names end with id, num, or permission and/or a fee. Request permissions from [email protected]. count. InferDL runs such heuristics for positions for which DLS ’20, November 17, 2020, Virtual, USA standard type inference is overly-general (meaning, among © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. others, positions with inferred structural types). Heuristic ACM ISBN 978-1-4503-8175-8/20/11...$15.00 guesses are added as additional type constraints and checked https://doi.org/10.1145/3426422.3426985 for consistency with the rest of the program. Only consistent DLS ’20, November 17, 2020, Virtual, USA Milod Kazerounian, Brianna M. Ren, and Jeffrey S. Foster solutions are kept. In this way, InferDL maintains sound- 1 class User < ActiveRecord::Base ness while producing a less general, but potentially more 2 # α ! β useful, solution than standard type inference. (§2 gives an 3 def self.normalize_username (name) overview of InferDL.) 4 name.unicode_normalize.downcase if name.present? We describe InferDL more formally on a core type and 5 end constraint language. We present standard constraint reso- 6 # γ ! δ lution rules, which rewrite a set of constraints into solved 7 def self.find_by_name (name) form from which most general solutions can be extracted. 8 find_by(name_lower: normalize_username(name)) end We describe that solution extraction procedure in detail and 9 10 end then show how to incorporate heuristics. (See §3 for our formal description.) (a) Source code from Discourse app. We implemented InferDL as an extension to RDL, an ex- isting Ruby type checker [13, 19, 31]. We modified RDL to Constraints Generated generate and resolve type constraints, run heuristics, and (1) α ≤ »unicode_normalize : ?! ϵ¼ extract solutions to produce annotations. InferDL currently (2) α ≤ »present? : ?! ζ ¼ includes eight heuristics: one that replaces structural types (3) ϵ ≤ »downcase : ?! η¼ with nominal types that match the structure; six that look at (4) η ≤ β variable names, such as the one mentioned above for names (5) γ ≤ α ending in id, num, or count; and one that produces precise (6) User ≤ δ hash types. We also extended RDL with choice types, an idea Resolved Constraints inspired by variational typing [7] that helps type inference (7) γ ≤ »unicode_normalize : ?! ϵ¼ work in the presence of overloaded methods. (§4 describes (8) γ ≤ »present? : ?! ζ ¼ the implementation of InferDL.) We evaluated InferDL by applying it to four Ruby on Rails (b) Constraints generated on type variables. apps for which RDL type annotations already existed [19, 31]. We note that our results are preliminary, and further work Figure 1. Inferring method types in Discourse. is needed to affirm they generalize beyond our benchmarks. For these apps, InferDL inferred 496 type annotations. Of Suppose we wish to infer types for these two methods these, 399 exactly matched or were more precise than the using the standard, constraint-based approach. We first gen- programmer-supplied annotations, compared to only 290 erate a type variable for the method argument and return such annotations when not using heuristics. InferDL also types, as shown in the comments on lines2 and6, e.g., nor- found one previously unknown type error. We also applied malize_username takes a value of type α and returns type InferDL to six additional Ruby programs for which we did β. Then we analyze the method body, generating constraints InferDL not have annotations, and found five previously of the form x ≤ y, indicating that x must be a subtype of y. unknown type errors. (§5 discusses our evaluation.) In this case we also say x is a lower bound on y and y is an InferDL We believe that is an effective type inference upper bound on x. system and represents a promising approach to generating The top portion of Figure 1b shows the constraints gener- useful, sound type annotations. ated from this example. Constraint (1) arises from the call name.unicode_normalize1. In this constraint, the structural 2 Overview type »unicode_normalize : ?! ϵ¼ represents an object with We begin by discussing standard type inference, which gen- a unicode_normalize method that takes no argument (here erates and solves type constraints to yield type annotations. written ?) and returns ϵ, a fresh type variable generated We then discuss why this approach alone can be inadequate at the call. Hence, by standard subtyping rules, α must be and give a high-level overview of how InferDL uses heuris- a type that contains at least this method with appropriate tics to infer more precise, useful types. argument and return types. Constraint (2) is similar. Constraint (3) arises from calling downcase on the re- sult of . Constraint (4) arises because the 2.1 Standard Type Inference unicode_normalize result of the call to downcase is returned. Note that normal- Figure 1a shows a code snippet taken from Discourse, a Ruby ize_username may also return nil (if the conditional guard is on Rails web app used in our evaluation (§5). The code de- false), but nil is a subtype of all other types in InferDL, so we fines two methods, normalize_username and find_by_name, omit this constraint here.