Using Path Spectra to Direct Function Cloning Tom Way Lori Pollock

Department of Computer and Information Sciences
University of Delaware, Newark, DE 19716
(302) 831-1953 (302) 831-8458 (Fax)
{way,pollock}@cis.udel.edu

Abstract

While function cloning can improve the precision of interprocedural analysis and thus the opportunity for optimization by changing the structure of the call graph, its successful application relies on the cloning decisions. This paper explores the use of program spectra comparisons for guiding cloning decisions. Our hypothesis is that this approach provides a good heuristic for determining which calls contribute different dynamic interprocedural information and thus suggest good candidates for cloning for the purpose of improving optimization.

Keywords: function cloning, path spectra, profile-guided optimization

1 Introduction

As modular and object-oriented programming are becoming the norm, and architectures offer more available fine-grain parallelism, the importance of analysis and optimization across function boundaries continues to increase. While function inlining allows each function to be optimized within the separate context of each call site, its well known limitation is the potentially exponential code growth and associated increased compile time. The most common alternative is interprocedural data flow analysis, which avoids the code growth, but is limited by the program's original calling structure, in particular, making conservative assumptions at the convergence of paths in the call graph. Function cloning seeks to play an intermediary role by partitioning the calls in the program and creating multiple copies of the function body, one for each set of "closely related" calls. This graph restructuring can create opportunity to compute more precise information during interprocedural analysis because each node representing a copy of the function has fewer incoming edges on which information must be merged during analysis; cloning also seeks to avoid the potential exponential code growth of inlining.

This paper explores the use of path spectra to guide cloning decisions. By instrumenting the program, a profiler can report the frequency of execution of individual paths executed during a program run [2] (see footnote 1). For a particular program run, the path spectrum is the set of paths executed along with their execution frequencies [14]. As an indication of program behavior, path spectra have been used successfully for program optimization [1, 8, 9, 10] as well as software maintenance and testing [14].

Footnote 1: Actually, the length of the paths that are profiled must be limited, as there can be an unbounded number of paths for general flow graphs and an exponential number for directed acyclic graphs (i.e., flow graphs without loop backedges included).

Our approach to making cloning decisions is based on comparing the program spectra for the execution of the same function by different calls to that function. Our hypothesis is that this approach provides a good heuristic for determining which calls contribute different dynamic interprocedural information and thus suggest good candidates for cloning. Path profiles indicate run-time control flow, which will have an effect on the actual dynamic data flow and data dependencies within the callee; path profiles for different calls to the same function indicate differences in data values flowing into the function through parameters or via global variables. In addition, profile-guided optimization and path-qualified data flow analysis within the cloned functions may benefit from creating clones based on similar run-time paths through the function. The technique of using program spectra comparison for cloning decisions is different from, but could be used in combination with, the use of program profiling to uncover the most frequently executed calls to guide inlining decisions.

The objective of cloning decisions is threefold: (1) increase program performance, (2) manage compile time, and (3) control code growth. We have been evaluating our technique in terms of its effectiveness in making intelligent cloning decisions that will lead to improved optimization as well as its associated overhead.

Our preliminary experiments have focused on comparing this approach to goal-directed cloning [11] which is based on static interprocedural data flow analysis, directed toward particular optimizations. Thus far in our limited study, we have found that cloning based on path spectra comparisons can produce results comparable to that of goal-directed cloning for constant propagation, with compile-time and run-time analysis that can be reused for cloning that is performed for multiple optimization goals.

We first describe the related work on cloning, profiling, and path spectra comparisons. A simple motivating example for our approach is presented, followed by our profile-directed cloning algorithm. We describe our path spectra comparison technique which is used to partition call sites for cloning. Finally, this method is compared to goal-directed cloning decisions on an example code, and we make some concluding remarks and summarize our future work.

2 Related Work

Goal-directed cloning [6, 11] first solves a forward interprocedural data flow problem with slight modification in order to compute a set of cloning vectors for the particular data flow problem of interest at each call graph node. Cloning vectors which produce equivalent effects on the optimization of interest are merged, and finally the cloning is performed until the program size reaches some threshold. Cooper et al. [6] presented an experiment on the matrix300 code from release one of the SPEC benchmark suite, in which they showed that significant improvement in code quality could be attained by using this method to expose sufficient information to perform inlining and unroll and jam. The approach requires either knowledge of what optimizations would most likely benefit from cloning in order to focus on that forward data flow problem, or a separate clone decision-making phase for each optimization of interest, and then some heuristic to select the clones suggested by each phase. Only goal-directed cloning for constant propagation has been investigated. It is not clear how easy it would be to construct a goal-directed partitioning algorithm for other optimizations or how much goal-directed cloning could improve the opportunity for other optimizations, or even more challenging, for multiple simultaneous optimization goals.

A general framework for selective specialization, the equivalent of cloning for object-oriented languages, combines static analysis and profile data to identify the most profitable specializations [7]. This goal-directed [...] dynamic dispatches, providing significant improvements in performance and reduced code growth over customization [5], the previous state-of-the-art specialization technique. In dynamic compilation environments, cloning is sometimes performed on the fly as a statement is executed the first time [13]. To our knowledge, none of these techniques has used path spectra comparison in their cloning decisions.

Path profiling has been used successfully in compiler optimization [1, 9, 10]. Other basic types of control flow profiling are edge profiling, which measures the execution frequency of each individual flow graph edge, and basic block profiling, which measures how many times each basic block is executed. Edge profiles are good predictors of frequently executed paths (hot paths) for programs with a large amount of definite flow relative to total flow, while path profiles are better when there is less definite flow [3]. Path profiles can be collected efficiently and provide more accurate information than edge profiling [2]. In particular, different path profiles can result in the same edge profile, making it impossible to accurately compute the execution frequency of paths based on edge profile information. In this paper, we use path profiling rather than edge profiling as we are interested in the differences, or actually, the similarities in the run-time behavior of different invocations of a given function. Differences in path spectra obtained from two different calls to a function with different parameter values indicate differences in the execution states, and therefore the function's behavior, due to differences in the parameter values. Edge or basic block profiling could be used, but comparisons of spectra created from path profiles generally will give better predictions of differences in run-time behavior.

When variables cannot be conservatively identified as constants at compile time, value profiling [4] can be used to determine whether they exhibit a high degree of invariant behavior at run-time. Value profiling records information about the invariance of a variable, typically the top N values for an instruction and the number of occurrences for each of those values. In addition to other compilation and optimization decisions, this information can be used for duplicating and specializing code conditioned on particular high frequency values. While value profiling will certainly aid in cloning decisions with the goal of specializing on constant values, we believe that cloning that utilizes path spectra differences as proposed in this paper may provide a more general, or at least complementary, approach to judging the run-time behavioral differences between separate calls to the same function, and enable cloning de-
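
The observation that different path profiles can result in the same edge profile can be illustrated with a small sketch. The flow graph below (two if/else diamonds in sequence, with block names A through G) and the execution counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
from collections import Counter


def edge_profile(path_profile):
    """Derive edge execution counts from a path profile (path -> count)."""
    edges = Counter()
    for path, count in path_profile.items():
        # Each consecutive pair of blocks on a path is one flow graph edge.
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return edges


# Hypothetical flow graph: A -> (B | C) -> D -> (E | F) -> G
run_1 = {("A", "B", "D", "E", "G"): 50, ("A", "C", "D", "F", "G"): 50}
run_2 = {("A", "B", "D", "F", "G"): 50, ("A", "C", "D", "E", "G"): 50}

same_edges = edge_profile(run_1) == edge_profile(run_2)
same_paths = run_1 == run_2
print(same_edges, same_paths)  # True False
```

Every edge is executed 50 times in both runs, so the edge profiles are indistinguishable even though the two runs share no executed path.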

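
The idea of comparing path spectra collected for different calls to the same function, and grouping calls with similar spectra into one clone, might be sketched as follows. This section does not give the paper's actual comparison technique, so everything here is an assumption made for illustration: the similarity measure (overlap of normalized path frequencies), the greedy grouping, the `threshold` value, and the path and call-site names.

```python
def spectrum_similarity(s1, s2):
    """Overlap of two path spectra (path id -> frequency), in [0, 1]."""
    t1, t2 = sum(s1.values()), sum(s2.values())
    if t1 == 0 or t2 == 0:
        return 0.0
    paths = set(s1) | set(s2)
    # Compare relative frequencies so call counts alone do not matter.
    return sum(min(s1.get(p, 0) / t1, s2.get(p, 0) / t2) for p in paths)


def partition_call_sites(spectra, threshold=0.8):
    """Greedily group call sites whose spectrum resembles a group's first member."""
    groups = []
    for site, spec in spectra.items():
        for group in groups:
            if spectrum_similarity(spectra[group[0]], spec) >= threshold:
                group.append(site)
                break
        else:
            groups.append([site])  # no similar group found: candidate clone
    return groups


# Hypothetical per-call-site spectra for one callee.
spectra = {
    "call_1": {"p0": 90, "p1": 10},
    "call_2": {"p0": 45, "p1": 5},
    "call_3": {"p2": 100},
}
groups = partition_call_sites(spectra)
print(groups)  # [['call_1', 'call_2'], ['call_3']]
```

Each resulting group would correspond to one copy of the function body; a real implementation would also have to weigh the code growth and compile-time objectives named in the introduction.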
