Type-Based Alias Analysis Amer Diwan Stanford University
Recommended Citation: Diwan, Amer; McKinley, Kathryn S.; and Moss, J. Eliot B., "Type-Based Alias Analysis" (1998). Computer Science Department Faculty Publication Series. 18. Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/18

Amer Diwan
Department of Computer Science, Stanford University, Stanford, CA 94305
(650) 723-4013

Kathryn S. McKinley and J. Eliot B. Moss
Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610

In the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1998, Montreal, Quebec, Canada, pp. 106–117.

The authors can be reached electronically via Internet addresses [email protected], {mckinley,moss}@cs.umass.edu. This work was supported by the National Science Foundation under grants CCR-9211272 and CCR-9525767 and by gifts from Sun Microsystems Laboratories, Inc., Hewlett-Packard, and Digital Equipment. Kathryn S. McKinley is supported by an NSF CAREER Award CCR-9624209. Amer Diwan was also supported by the Air Force Materiel Command and ARPA award number: F30602-95-C-0098.

Abstract

This paper evaluates three alias analyses based on programming language types. The first analysis uses type compatibility to determine aliases. The second extends the first by using additional high-level information such as field names. The third extends the second with a flow-insensitive analysis. Although other researchers suggest using types to disambiguate memory references, none evaluates its effectiveness. We perform both static and dynamic evaluations of type-based alias analyses for Modula-3, a statically-typed type-safe language. The static analysis reveals that type compatibility alone yields a very imprecise alias analysis, but the other two analyses significantly improve alias precision. We use redundant load elimination (RLE) to demonstrate the effectiveness of the three alias algorithms in terms of the opportunities for optimization and the impact on simulated execution times, and to compute an upper bound on what a perfect alias analysis would yield. We show modest dynamic improvements for RLE, and more surprisingly, that on average our alias analysis is within 2.5% of a perfect alias analysis with respect to RLE on 8 Modula-3 programs. These results illustrate that to explore thoroughly the effectiveness of alias analyses, researchers need static, dynamic, and upper-bound analysis. In addition, we show that for type-safe languages like Modula-3 and Java, a fast and simple alias analysis may be sufficient for many applications.

1 Introduction

To exploit memory systems, multiple functional units, and the multi-issue capabilities of modern uniprocessors, compilers must reorder instructions. For programs that use pointers, the compiler's alias analysis dramatically affects its ability to reorder instructions, and ultimately performance. Alias analysis disambiguates memory references, enabling the compiler to reorder statements that do pointer accesses.

Despite its importance, few commercial or research compilers implement non-trivial alias analysis. Three reasons alias analysis is not implemented are: (1) Many alias analyses are prohibitively slow and thus impractical for production use. (2) The alias analyses in the literature require the entire program (or some representation of it), which inhibits separate compilation and compiling libraries. (3) Most alias analyses have been evaluated only statically, and thus we do not know the effectiveness of these algorithms with respect to the optimizations that use them. To address these concerns, this paper explores using fast alias analyses that rely on programming language types. While prior work [1, 6] mentions using type compatibility for alias analysis, none evaluates the idea or presents the details of an algorithm.

This paper describes and evaluates three fast alias analyses based on programming language types. The first analysis (TypeDecl) uses type compatibility to determine aliases. The second (FieldTypeDecl) uses other high-level properties, such as field names, to improve on the first. The third (SMFieldTypeRefs) improves the second by incorporating a flow-insensitive pass to include the effects of variable assignments and references. This pass is similar to Steensgaard's algorithm [32].

We statically evaluate our alias algorithms using the number of alias pairs (the traditional method). We also evaluate TBAA based on its static and dynamic effects on an optimization. In addition, we evaluate TBAA with respect to an upper bound on the same optimization. Each of the evaluation metrics reveals different strengths and weaknesses in our alias algorithms, and we believe this range of metrics, and especially upper-bound analysis, is necessary to understand the effectiveness of any alias analysis.

Our static evaluation reveals that the simplest type-based alias analysis is very imprecise, but that for our Modula-3 benchmarks, the other two alias analyses significantly reduce the number of intraprocedural aliases of a reference, to an average of 3.4 references (ranging from 0.3 to 20.8). We find that TBAA is much less effective for interprocedural aliases.

We also evaluate TBAA by measuring the static and simulated run-time impact on an intraprocedural optimization that depends on alias analysis: redundant load elimination (RLE). RLE combines loop invariant code motion and common subexpression elimination of memory references. TBAA and RLE combine to improve simulated program performance modestly, by an average of 4%, and up to 8% on a DEC Alpha 3000-500 [12] for 8 Modula-3 benchmarks.

We also compare TBAA to an upper bound that represents the best any alias analysis algorithm could hope to do for RLE. This comparison shows that a perfect alias analysis could at most eliminate an average of 2.5% more heap loads. In addition, we modify TBAA for incomplete programs and demonstrate, using RLE, that it performs as well as it does on complete programs. These results and TBAA's fast time complexity suggest that TBAA is a practical and promising analysis for scalar optimization of type-safe programs.

The remainder of this paper is organized as follows. Section 2 describes our type-based alias analysis algorithms. Section 3 presents our evaluation methodology, and uses it to evaluate TBAA. Section 4 extends and evaluates TBAA for incomplete programs. Section 5 discusses related work in alias analysis. Section 6 concludes.

2 Type-Based Alias Analysis

This section describes type-based alias analyses (TBAA) in which the compiler has access to the entire program except for the standard libraries. TBAA assumes a type-safe programming language such as Modula-3 [25] or Java [33] that does not support arbitrary pointer type casting (this feature is supported in C and C++). We begin with our terminology, and then discuss using type declarations, object field and array access semantics, and modifications to the set of possible types via variable assignments to disambiguate memory accesses.

2.1 Memory Reference Basics

Table 1 lists the three kinds of memory references in Modula-3 programs, their names, and a short description.

Table 1: Kinds of Memory References

    Notation   Name          Description
    p.f        Qualify       Access field f of object p
    p^         Dereference   Dereference pointer p
    p[i]       Subscript     Array p with subscript i

We call a non-empty string of memory references, for example a^.b[i].c, an access path (AP) [22]. Without loss of generality, we assume that distinct object fields have different names. We also define:

    Type(p):      The static type of AP p.
    Subtypes(T):  The set of subtypes of type T, which includes T.

    TYPE
      T = OBJECT f, g: T; END;
      S1 = T OBJECT ... END;
      S2 = T OBJECT ... END;
      S3 = T OBJECT ... END;
    VAR
      t: T;
      s: S1;
      u: S2;

Figure 1: Type Hierarchy Example

2.2 TBAA Using Type Declarations

To use type declarations to disambiguate memory references, we simply examine the declared type of an access path AP, and then assume the AP may reference any object with the same declared type or subtype. We call this version of TBAA TypeDecl. More formally, given two APs p and q, TypeDecl(p, q) determines they may be aliases if and only if:

$$\mathit{Subtypes}(\mathit{Type}(p)) \cap \mathit{Subtypes}(\mathit{Type}(q)) \neq \emptyset$$

Consider the example in Figure 1. Since S1 is a subtype of T, objects of type T can reference objects of type S1. Thus,

$$\mathit{Subtypes}(\mathit{Type}(t)) \cap \mathit{Subtypes}(\mathit{Type}(s)) \neq \emptyset$$
$$\mathit{Subtypes}(\mathit{Type}(t)) \cap \mathit{Subtypes}(\mathit{Type}(u)) \neq \emptyset$$
$$\mathit{Subtypes}(\mathit{Type}(s)) \cap \mathit{Subtypes}(\mathit{Type}(u)) = \emptyset$$

In other words, t and s may reference the same location, and t and u may reference the same location, but s and u may not reference the same location since they have different types. Note that TypeDecl is not transitive.

Table 2: FieldTypeDecl(AP1, AP2) Algorithm

    Case   AP1    AP2    FieldTypeDecl(AP1, AP2)
    1      p      p      true
    2      p.f    q.g    (f = g) $\wedge$ FieldTypeDecl(p, q)
    3      p.f    q^     AddressTaken(p.f) $\wedge$ TypeDecl(p.f, q^)
    4      p^     q[i]   AddressTaken(q[i]) $\wedge$ TypeDecl(p^, q[i])
    5      p.f    q[i]   false
    6      p[i]   q[j]   FieldTypeDecl(p, q)
    7      p      q      TypeDecl(p, q)

2.3 Using Field Access Types

We next improve the precision of TypeDecl using the type declarations of fields and other high level information in the