On Models for Object Lifetime Distributions (short paper)

Darko Stefanović ([email protected]), Department of Electrical Engineering, Princeton University, Princeton, NJ 08544
Kathryn S. McKinley and J. Eliot B. Moss ({mckinley,moss}@cs.umass.edu), Department of Computer Science, University of Massachusetts, Amherst, MA 01003

Abstract. Analytical models of object lifetimes are appealing because they would enable mathematical analysis or fast simulation of the behavior of programs. In this paper, we investigate models for programs in object-oriented languages such as Java and Smalltalk. We present analytical models and compare them with observed lifetimes for 58 Smalltalk and Java programs. We find that observed lifetime distributions do not match previously proposed object lifetime models, but do agree in salient shape characteristics with the gamma distribution family used in statistical survival analysis for general populations.

1 Introduction

If we can develop accurate analytical models for object lifetimes in object-oriented programs, they would enable faster and more thorough exploration of memory management techniques. For instance, given a model of object lifetimes, we could compute an estimate of copying costs of a generational or some other garbage collector. If distribution models and garbage collector models are simple enough, we may even arrive at closed-form analytical descriptions; but even if both are quite complicated, we can use the lifetime distributions to drive simulations of a proposed garbage collector scheme.

Lifetime models are not sufficient for exploring garbage collection, because they do not account for heap pointer structure effects: the direct cost of pointer maintenance (including write barriers), and the copying cost increase owing to the excess retention of objects, both present in generational schemes. Nevertheless, they could be useful as a tool for preliminary evaluation (and understanding) of collector performance.

Observed object lifetime behaviors are inherently discrete; we measure the lifetime of each object and arrive at a discrete distribution. Most performance-related propositions have dealt with mortality, which is a derivative form; since obtaining a derivative of a discrete observed function involves inherently arbitrary smoothing decisions, it has been difficult to characterize the mortality of observed distributions, let alone to match it to an analytical model. The extremely fast decay of objects exacerbates the situation: most models developed in other domains are for much more slowly decaying populations.

If we knew which distribution family describes typical object behaviors, we could fit observed lifetimes to the model of that family, and find the best matching instance (i.e., its parameters). In fact, a running program could recognize the lifetime distribution of objects allocated (overall, or at a particular allocation site) and adjust collection policies accordingly. But it is not yet known which family this may be.

In the following, we first briefly introduce terms and notation related to lifetime distributions (with more details in the Appendix), then review what assumptions have been made implicitly (or stated explicitly) in the past research in garbage collection. We develop models based on a plausible qualitative characterization of lifetimes; namely, that past lifetime is a strong predictor of future lifetime. Lastly, we put the models to the test of empirical evidence against actual lifetime distributions from object-oriented programs.

2 Background material

The lifetime of an object is defined as the amount of allocation that occurs between the allocation of the object and its demise.^1 We view object lifetime as a random variable. Future studies may look at object lifetimes as stochastic processes, and in this context, distinguish each allocation site as generating different processes. Here, we do not attempt such fine distinctions.

Actual object lifetimes are natural numbers, thus discrete probability distributions are the obvious representation. However, continuous models are used for mathematical ease and convenience. Below we review some definitions and symbols from probability theory as they apply to survival analysis; further details appear in the Appendix along with a summary of properties of commonly used analytical distribution families.

The survival function of a random variable $L$ is $s_L(t) = \mathcal{P}\{L > t\}$. For object lifetimes, it expresses what fraction of original allocation volume is still live at age $t$. We usually drop the subscript $L$. The survivor function is a monotone non-increasing function. The probability density function is $f(t) = -s'(t)$. Occasionally we also use the cumulative distribution function $F(t) = 1 - s(t)$. The mortality function is $m(t) = \frac{f(t)}{s(t)} = -\frac{d}{dt} \log s(t)$, and it expresses the age-specific death rate. Mortality is also known as the hazard function (and written $h(t)$) in the literature on lifetime analysis [CO84].

^1 The actual point of demise depends on the accuracy of the memory management scheme. In the empirical data reported here, that scheme is an accurate-roots garbage collector performing full-heap collection at each object allocation. Thus, demise is detected precisely at the point when the object becomes unreachable from the global roots.

3 Previous statements about object lifetime distributions and their influence on garbage collection performance

Object lifetime distributions have been of interest to researchers of garbage collection, especially generational collection: the success of a particular garbage collector organization or promotion policy depends on how well it is matched to the behavior of typical user programs. In fact, claims have been made about lifetimes.

Hayes introduced a distinction between a "weak" and a "strong" generational hypothesis [Hay91].^2 Our understanding of his statement of the weak generational hypothesis is this: newly created objects have a much higher mortality than objects that are older. His statement of the strong generational hypothesis (which he in fact introduces) is that even if the objects in question are not newly created, the relatively younger objects have a higher mortality than the relatively older objects, or simply, that $m(t)$ is an everywhere decreasing function.

Baker clearly pointed out that an exponential distribution of lifetimes, with $m(t)$ constant, cannot be favorable to generational collection (as opposed to whole-heap collection), and that instead $m(t)$ should be decreasing [Bak93]. Nevertheless, the exponential distribution has a unique cachet among survival distributions: its mathematical simplicity and the property of "lack of memory". In a garbage collector this property assures that an object just discovered live by the collector has the same residual lifetime as the lifetime of a newly allocated object, and this greatly simplifies the analysis. Thus, the exponential distribution was used by Clinger and Hansen in the analysis (and to inspire the design) of a non-predictive collector, outside the generational realm [CH97]. In our examination of a generalized form of that collector [SMM98], we decided to use not only the exponential distribution $s(t) = e^{-\rho t}$, but also a variation with decreasing mortality, $s(t) = e^{-\rho \sqrt{t}}$, as being in agreement with the strong generational hypothesis, as well as a variation with increasing mortality, $s(t) = e^{-\rho t^2}$, for control. In fact, these three are instances of the Weibull distribution $s(t) = e^{-\rho t^c}$ [Wei61, Lem82].

^2 Unfortunately, the paper could be easily misinterpreted, since the term "survival rate" was used both for "the reciprocal of mortality" and "the amount copied from a generation", depending on the context.

4 Models derived from the qualitative assertion that "past is like the future"

A multitude of models can be developed that have decreasing mortality. But developing them ex vacuo, just for the simplicity of their mathematical formulation (or their use in other domains) is not satisfactory. We can base models on a broad experimental study, and in Section 5 we make a first attempt at that. But our understanding would be aided more if distribution models could be derived from certain principles that we expect to be naturally associated with program behavior. In this vein, Appel suggested that plausible object lifetime distributions should satisfy the following:

(1) An object's future expected lifetime is proportional to its current age. [App97]

Thus, past lifetime is a strong predictor of the future (residual) lifetime. This stands in stark contrast with the exponential distribution.

An object's future expected lifetime $C(x)$ is the difference between its expected lifetime and its current age. The expected lifetime (once we know its current age) is the conditional expected value [Pap91] of the lifetime random variable $L$, $E[L \mid L > x]$, where $x$ is the current age. It is calculated as:

$$E[L \mid L > x] = \int_0^\infty t\, f(t \mid t > x)\, dt = \frac{\int_x^\infty t f(t)\, dt}{\int_x^\infty f(t)\, dt} = \frac{\int_x^\infty t f(t)\, dt}{s(x)}.$$

Thus $C(x) = E[L \mid L > x] - x$, and statement (1) is:

$$\exists \psi > 0 \;\; \forall x > 0 : \quad C(x) = \psi x,$$

with $\psi$ a proportionality constant between the current age $x$ and the future expected lifetime $C(x)$. Unfortunately, with $x = 0$, we have $E[L] = E[L \mid L > 0] = C(0) = \psi \cdot 0 = 0$ for any $\psi$. We look at two ways out of the quandary: first, by letting the relation (1) hold only in the limit as $x \to \infty$; and second, by restricting the domain of definition of the distribution to an interval $(x_0, \infty)$.

We shall find use for an alternative form of (1): Let $G(x) = \int_x^\infty t f(t)\, dt$; then $E[L \mid L > x] = \frac{G(x)}{s(x)}$. Let

$$\Lambda(x) = \frac{G(x)}{x\, s(x)} = \frac{E[L \mid L > x]}{x} = \frac{C(x) + x}{x} = \frac{C(x)}{x} + 1.$$

Then statement (1) is

$$\exists \psi' > 1 \;\; \forall x > 0 : \quad \Lambda(x) = \psi',$$

with $\psi' = \psi + 1$.

4.1 Past-is-future in the limit

Let us weaken the statement (1) so that the proportionality holds asymptotically, for large values of $x$. We look for a distribution such that $\lim_{x \to \infty} \Lambda(x)$ exists and is strictly greater than 1. Here is one such distribution. Let

$$f(t) = \frac{\beta}{(\tau + t)^{\lambda}},$$

so that

$$s(t) = \frac{\beta}{\lambda - 1} \cdot \frac{1}{(\tau + t)^{\lambda - 1}},$$

where $\lambda > 2$. Normalization:

$$s(0) = \frac{\beta}{\lambda - 1} \cdot \frac{1}{\tau^{\lambda - 1}} = 1$$

gives

$$\tau = \left( \frac{\beta}{\lambda - 1} \right)^{\frac{1}{\lambda - 1}}.$$

We find

$$G(x) = \frac{\beta}{\lambda - 2} \cdot \frac{1}{(\tau + x)^{\lambda - 2}} - \frac{\tau \beta}{\lambda - 1} \cdot \frac{1}{(\tau + x)^{\lambda - 1}}.$$

Expected value:

$$E[L] = G(0) = \frac{\beta}{(\lambda - 2)(\lambda - 1)} \left( \frac{\lambda - 1}{\beta} \right)^{\frac{\lambda - 2}{\lambda - 1}}.$$

If we take this value as a free parameter $E[L] = V$ (as it is necessary to do in order to generate a trace for simulation, where we want to control the heap size in equilibrium), we have

$$\beta = (\lambda - 1) \left( V (\lambda - 2) \right)^{\lambda - 1}.$$

We can then simplify:

$$\tau = V (\lambda - 2).$$

The ratio

$$\Lambda(x) = \frac{G(x)}{x\, s(x)} = \frac{\lambda - 1}{\lambda - 2} + \frac{1}{\lambda - 2} \cdot \frac{\tau}{x} = \frac{\lambda - 1}{\lambda - 2} + \frac{V}{x},$$

hence

$$\lim_{x \to \infty} \Lambda(x) = \frac{\lambda - 1}{\lambda - 2} > 1,$$

as required. Varying $\lambda$ changes the value $\Lambda(x)$ uniformly for all $x$; lower values of $\lambda$ produce distributions with heavier tails.
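As a rough numerical check of the formulas of Section 4.1, the following Python sketch derives $\tau$ and $\beta$ from a chosen equilibrium volume $V$ and exponent $\lambda$, evaluates $\Lambda(x)$ from the closed forms of $G$ and $s$, and draws lifetimes by inverting the survival function. The function names, the inverse-transform sampler, and the sample values $V = 1000$, $\lambda = 3$ are illustrative assumptions, not part of the original study.

```python
import random

def limit_model_params(V, lam):
    """Return (tau, beta) for f(t) = beta / (tau + t)**lam with E[L] = V."""
    if lam <= 2:
        raise ValueError("lambda must exceed 2 for a finite expected lifetime")
    tau = V * (lam - 2)
    beta = (lam - 1) * tau ** (lam - 1)
    return tau, beta

def survival(t, tau, beta, lam):
    """s(t) = beta / ((lam - 1) (tau + t)^(lam - 1))."""
    return beta / ((lam - 1) * (tau + t) ** (lam - 1))

def tail_mean_integral(x, tau, beta, lam):
    """G(x), the integral of t f(t) from x to infinity, in closed form."""
    return (beta / ((lam - 2) * (tau + x) ** (lam - 2))
            - tau * beta / ((lam - 1) * (tau + x) ** (lam - 1)))

def ratio(x, tau, beta, lam):
    """Lambda(x) = G(x) / (x s(x)); should equal (lam-1)/(lam-2) + V/x."""
    return tail_mean_integral(x, tau, beta, lam) / (x * survival(x, tau, beta, lam))

def sample_lifetime(rng, tau, lam):
    """Inverse-transform sample: solve s(t) = u for t (illustrative helper)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids a zero base
    return tau * (u ** (-1.0 / (lam - 1)) - 1.0)

if __name__ == "__main__":
    V, lam = 1000.0, 3.0  # hypothetical parameter choice
    tau, beta = limit_model_params(V, lam)
    for x in (10.0, 1e3, 1e6):
        print(x, ratio(x, tau, beta, lam), (lam - 1) / (lam - 2) + V / x)
    rng = random.Random(0)
    print([round(sample_lifetime(rng, tau, lam), 1) for _ in range(5)])
```

Sample means converge slowly for small $\lambda$ (the variance is infinite for $\lambda \le 3$), so a sketch like this is better judged by comparing empirical survival fractions with $s(t)$ than by comparing sample means with $V$.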

4.2 Past-is-future restricted to x > x0

We make condition (1) strictly hold for $x > x_0$, where $x_0 > 0$, and we define the distribution (functions $F$, $f$, $s$, and $m$) in that interval, but we set $F = 0$, $f = 0$, $s = 1$, and $m = 0$ in the interval $[0, x_0]$. This formulation is intuitively appealing, since lifetimes in practice take on discrete values in $\mathbb{N}$, and hence setting $x_0 = 1$ is quite natural.

Condition (1):

$$\exists \psi' > 1 \;\; \forall x > x_0 : \quad \Lambda(x) = \psi',$$

that is,

$$G(x) = \psi' x\, s(x),$$

gives the integral equation,

$$\int_x^\infty t f(t)\, dt = \psi' x \int_x^\infty f(t)\, dt.$$

It is easy to obtain the corresponding differential equation:

$$F'(x) = \frac{\psi'}{\psi' - 1} \cdot \frac{1 - F(x)}{x}.$$

The solution, with the boundary value $F(x_0) = 0$, is

$$F(x) = 1 - \left( \frac{x_0}{x} \right)^{\frac{\psi'}{\psi' - 1}},$$

$$f(x) = \frac{\psi'}{\psi' - 1} \cdot \frac{1}{x_0} \left( \frac{x_0}{x} \right)^{\frac{2\psi' - 1}{\psi' - 1}},$$

$$s(x) = \left( \frac{x_0}{x} \right)^{\frac{\psi'}{\psi' - 1}},$$

$$m(x) = \frac{\psi'}{\psi' - 1} \cdot \frac{1}{x}.$$

The mortality is indeed an everywhere decreasing function.

$$G(x) = \psi' x_0 \left( \frac{x_0}{x} \right)^{\frac{1}{\psi' - 1}},$$

$$G(x_0) = \psi' x_0.$$

The expected live amount in the heap is:

$$v(x) = \psi' x_0 - (\psi' - 1)\, x_0 \left( \frac{x_0}{x} \right)^{\frac{1}{\psi' - 1}}.$$

The steady-state heap volume $V = \lim_{x \to \infty} v(x)$ equals the expected value of $L$, which is $\psi' x_0$. What are reasonable parameter values? Suppose that we wish to set $V = 50000$.^3 Suppose also that we want $x_0$ to be 1. Then $\psi' = V / x_0 = 50000$. However, the live volume in the heap, $v(x)$, approaches its limit value $V$ at the rate of decay of the second term, that is, as the reciprocal of the 49999-th root of $x$. With such a slow approach, that is, with such a heavy tail in the distributions, one must allow a time $3.67 \cdot 10^{99997}$ to pass before the heap is within 1% of equilibrium; until the heap is in equilibrium, the distribution of objects in it does not reflect the heavy tail of the source distribution. Simulating that many objects is somewhat impractical. Moreover, can actual programs exhibit such extremely slow heap growth as with $\psi' = 50000$? Suppose that a program does. We can only observe executions of much shorter duration than $10^{99997}$, say, up to $10^{10}$; but then we cannot empirically distinguish the postulated $\psi' = 50000$ past-is-future distribution from another distribution that agrees with it up to $t = 10^{10}$, but lacks the heavy tail beyond that age. Alternatively, to allow 99% of the steady-state volume to be reached with $10^7$ objects simulated, one must have $\psi' < 2.5$ (approximately), but then $x_0 > 20000$. Thus, beyond the construction of an elegant analytical model of object lifetimes, we must keep in mind the need to be able to validate it, or invalidate it, against real data.

^3 This number is sufficiently large so that in simulation, even when a heap is divided into about 100 regions, each one contains at least about 100 objects, which allows us to vary the heap configuration widely without incurring significant discretization errors.

5 Comparing postulated models with empirical evidence

In validating lifetime models, we apply the tools of statistical analysis of survival data to the distributions of object lifetimes. The populations traditionally studied in statistical analyses are quite different from programs' objects. Moreover, the relevant literature concentrates mostly on statistical confirmation of explanatory variables (e.g., in clinical trials), which is different from our goal: finding distributions.
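The equilibrium estimates quoted in Section 4.2 for the restricted past-is-future model can be reproduced with a few lines of Python. The sketch below solves $v(x) = 0.99\,V$ for $x$ in logarithmic form, since the result for $\psi' = 50000$ is far outside floating-point range. The function names and the `fraction` parameter are illustrative assumptions; the formula for $v(x)$ is the one given in Section 4.2.

```python
import math

def live_volume(x, psi1, x0):
    """v(x) = psi' x0 - (psi' - 1) x0 (x0/x)^(1/(psi' - 1)), valid for x >= x0."""
    return psi1 * x0 - (psi1 - 1.0) * x0 * (x0 / x) ** (1.0 / (psi1 - 1.0))

def log10_time_to_fraction(psi1, x0, fraction=0.99):
    """log10 of the allocation time x at which v(x) = fraction * V, V = psi' x0.

    Setting (psi' - 1) x0 (x0/x)^(1/(psi' - 1)) = (1 - fraction) psi' x0 gives
    x = x0 * ((psi' - 1) / ((1 - fraction) psi'))^(psi' - 1).
    """
    shortfall = (1.0 - fraction) * psi1 / (psi1 - 1.0)
    return math.log10(x0) - (psi1 - 1.0) * math.log10(shortfall)

if __name__ == "__main__":
    # Case 1: V = 50000 with x0 = 1, so psi' = 50000.
    lg = log10_time_to_fraction(50000.0, 1.0)
    exponent = math.floor(lg)
    print("mantissa", 10 ** (lg - exponent), "exponent", exponent)
    # Case 2: V = 50000 with psi' = 2.5, so x0 = 20000.
    print("log10(x) =", log10_time_to_fraction(2.5, 20000.0))
```

Working with $\log_{10} x$ rather than $x$ itself is the only design choice of note: it keeps the first case representable and makes the order of magnitude directly readable.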

[Figure 1 plots, two panels: horizontal axis, coefficient of variation γ; vertical axis, coefficient of skewness γ3. Series: past-is-future restricted, past-is-future in the limit, log-normal, Weibull, gamma, observed distributions, fit to observed.]

Figure 1: Coefficient of skewness, γ3, versus coefficient of variation for several analytical distribution families and for empirically observed object lifetime distributions. Top: linear scale; bottom: logarithmic scale. (See Appendix for definitions.)
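To place an observed lifetime distribution in the plane of Figure 1, one needs its two moment ratios. A minimal Python sketch follows, using the raw-moment formulae of Appendix A.2 and assuming that every object is weighted equally and that lifetimes are given as a plain list of numbers; the function name and the sample data are hypothetical.

```python
import math

def moment_ratios(lifetimes):
    """Return (gamma, gamma3): coefficient of variation and of skewness."""
    n = len(lifetimes)
    if n == 0:
        raise ValueError("need at least one observed lifetime")
    # Raw moments m1, m2, m3 by summation over the observed points.
    m1 = sum(lifetimes) / n
    m2 = sum(t ** 2 for t in lifetimes) / n
    m3 = sum(t ** 3 for t in lifetimes) / n
    # Central moments from raw moments.
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    if mu2 <= 0.0:
        raise ValueError("lifetimes have zero variance; ratios are undefined")
    sigma = math.sqrt(mu2)
    return sigma / m1, mu3 / sigma ** 3

if __name__ == "__main__":
    observed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]  # hypothetical lifetimes
    gamma, gamma3 = moment_ratios(observed)
    print("coefficient of variation:", gamma)
    print("coefficient of skewness:", gamma3)
```

For very large lifetimes the raw-moment formulae can lose precision through cancellation; computing the central moments directly from deviations about the mean is a reasonable alternative in that case.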

One recommended test for comparing families of distributions is based on the graph of moment ratios: the coefficient of skewness γ3 vs. the coefficient of variation γ. These moment ratios are dimensionless quantities independent of time-scale [CO84, p.24–28] that reflect only the shape of the distribution. The coefficient of variation is a measure of the spread of the distribution around its mean; the coefficient of skewness is a measure of the asymmetry of the distribution, and large positive values indicate heavy tails. Each distribution corresponds to a single point (γ, γ3); a single-parameter family of distributions defines a parametric curve in the γ–γ3 plane. (See Appendix for definition of γ and γ3.) Three families of distributions commonly used in survival analysis (log-normal (Section A.4.3), Weibull (Section A.4.2), and gamma (Section A.4.1)) are plotted in the γ–γ3 plane in Figure 1. Note that the Weibull and gamma curves intersect at the point (1, 2); at this point each has degenerated into the exponential distribution. The two families introduced here are also shown (past-is-future in the limit (Section 4.1) and past-is-future restricted (Section 4.2)). Also, the figure contains scatter points (γ, γ3) of object lifetime distributions obtained empirically, and a line of least-squares fit for these points. These empirical distributions come from 58 Smalltalk and Java programs.

The scatter points of empirical distributions show a trend of correlation between γ and γ3. This trend is somewhat surprising, since there is no a priori reason to expect it from a haphazard collection of benchmark programs. Perhaps the presence of this trend points to fundamental properties of program behavior, and it certainly ought to be studied further.

The scatter points lie for the most part well to the right and below the common analytical distribution families. The one exception is the gamma family: in fact, even though most scatter points are to the right and below the gamma curve, they are quite close to it.

The two past-is-future families are a disappointment: they both have much lower coefficient of variation and much higher coefficient of skewness than the empirical distributions. Therefore, however intuitively plausible they are, they should not be employed to model object lifetimes. Indeed, we must concentrate the search for analytical distributions on those, not in the standard literature, with much higher coefficient of variation; in the meantime, the gamma family is to be favored as a candidate. We see that the γ–γ3 diagram usefully summarizes the shape properties of distributions and allows us to exclude certain analytical distribution families as models for a set of observed distributions. Note, however, that the two moments displayed do not completely capture the shape of a distribution. For positive matching, further statistical tests are necessary.

6 Summary

Analytical modelling of object lifetimes is desirable for the design, analysis, and simulation of dynamic memory management systems, but it has remained a difficult problem. We examined certain qualitative criteria that may be imposed on lifetime distributions. We demonstrated the use of a simple graphical technique for (in)validating postulated distribution models against empirical evidence.

Acknowledgments. We are grateful to Andrew Appel for comments and discussions that spurred the developments presented.

References

[App97] Andrew Appel, A better analytical model for the strong generational hypothesis, November 1997.
[Bak93] Henry G. Baker, 'Infant Mortality' and generational garbage collection, SIGPLAN Notices 28 (1993), no. 4, 55–57.
[BM70] Richard Stevens Burrington and Donald Curtis May, Jr., Handbook of probability and statistics with tables, 2nd ed., McGraw-Hill, New York, 1970.
[CH97] William D. Clinger and Lars T. Hansen, Generational garbage collection and the radioactive decay model, SIGPLAN Notices 32 (1997), no. 5, 97–108, Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation.
[CO84] D. R. Cox and D. Oakes, Analysis of survival data, Chapman and Hall, London, 1984.
[Dev86] Luc Devroye, Non-uniform random variate generation, Springer-Verlag, New York, 1986.
[EHP93] Merran Evans, Nicholas Hastings, and Brian Peacock, Statistical distributions, 2nd ed., John Wiley, New York, 1993.
[EJJ80] Regina C. Elandt-Johnson and Norman Lloyd Johnson, Survival models and data analysis, Wiley, New York, 1980.

[Hay91] Barry Hayes, Using key object opportunism to collect old objects, Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (Phoenix, Arizona), SIGPLAN Notices 26, 11 (November 1991), October 1991, pp. 33–46.
[JK70] Norman L. Johnson and Samuel Kotz, Distributions in statistics: Continuous univariate distributions (vols. 1 and 2), John Wiley, New York, 1970.
[Lem82] Achille Lemmi, Empirical testing of the Weibull distribution in the analysis of younger ages mortality, Tech. report, Facoltà di Scienze Economiche e Bancarie, Università di Siena, December 1982, Paper presented to a seminar on Demographic Models at the Department of Demography, University of Kinshasa, December 15–22, 1982.
[MF65] Irwin Miller and John E. Freund, Probability and statistics for engineers, Prentice-Hall, Englewood Cliffs, New Jersey, 1965.
[Pap91] Athanasios Papoulis, Probability, random variables, and stochastic processes, 3rd ed., McGraw-Hill, New York, 1991.
[SMM98] Darko Stefanović, J. Eliot B. Moss, and Kathryn S. McKinley, Oldest-first garbage collection, Computer Science Technical Report 98-81, University of Massachusetts, Amherst, April 1998.
[Wei61] Waloddi Weibull, Fatigue testing and analysis of results, Pergamon Press, for the Advisory Group for Aeronautical Research and Development, North Atlantic Treaty Organization, New York, 1961.

A Vademecum of lifetime distributions

A.1 Basic definitions

The survival function of a random variable $L$ is $s_L(t) = \mathcal{P}\{L > t\}$. For object lifetimes, it expresses what fraction of original allocation volume is still live at age $t$. We usually drop the subscript $L$. The survivor function is a monotone non-increasing function.

The probability density function is $f(t) = -s'(t)$. Occasionally we also use the cumulative distribution function $F(t) = 1 - s(t)$. The mortality function is $m(t) = \frac{f(t)}{s(t)} = -\frac{d}{dt} \log s(t)$, and it expresses the age-specific death rate. Mortality is also known as the hazard function (and written $h(t)$) in the literature on lifetime analysis. Occasionally we also use the integrated mortality: $M(t) = \int_0^t m(u)\, du$. Certain properties always hold: $s(0) = 1$; $\lim_{t \to \infty} s(t) = 0$; $\int_0^\infty f(t)\, dt = 1$ (normalization of density); $s(t) = e^{-M(t)}$ [CO84, p.14].

A.2 Moments

The moments of a distribution of random variable $L$ are defined as $m_k = E[L^k] = \int_0^\infty t^k f(t)\, dt$. The central moments are defined as $\mu_k = E[(L - m_1)^k] = \int_0^\infty (t - m_1)^k f(t)\, dt$. Here $m_1$ is the mean, or expected value, and $\mu_2$ is the variance, or dispersion; a common notation is $\sigma^2 = \mu_2$, where $\sigma$ is called standard deviation. In calculation, we usually first find moments $m_1$, $m_2$, and $m_3$, by integration in the case of analytical definitions, or by summation over observed discrete points in empirical distributions, and then compute central moments using the formulae $\mu_2 = m_2 - m_1^2$ and $\mu_3 = m_3 - 3 m_1 m_2 + 2 m_1^3$.

The coefficient of variation is $\gamma = \frac{\sigma}{m_1}$. The standardized third moment or coefficient of skewness is $\gamma_3 = \frac{\mu_3}{\sigma^3}$; it is also written $\eta_3$ or $\sqrt{\beta_1}$.

A.3 Finiteness of expected value

It is a simple exercise to show that the expected value of the live amount in the heap at time $x$ (that is, after an amount $x$ has been allocated) is $v(x) = \int_0^x s(t)\, dt$, and that

$$V = \lim_{x \to \infty} v(x) = \int_0^\infty s(t)\, dt = \int_0^\infty t f(t)\, dt = E[L],$$

when they exist.

We impose on the object lifetime distribution an additional property of finiteness (existence) of expected value, to ensure that a heap equilibrium is reached in the limit. (Heap equilibrium has been the underlying assumption in some comparative analyses of garbage collection costs [CH97, SMM98]. A relative heap size parameter is used as the basis for comparison of two collection algorithms: heap size is a fixed multiple of a steady-state live data amount.) How essential is this requirement, and could we also consider distributions with unbounded expected value? On the one hand, the running time of real programs is finite, and thus $f$ is finally-zero, hence $E[L]$ is finite. On the other hand, it is theoretically plausible that we are observing initial segments of potentially infinite computations, and so it is useful to investigate heaps that grow without bound. From a purist standpoint, many realistic programs that run indefinitely do use increasing amounts of space; for instance, counting requires logarithmically increasing space.

If the live data amount does not stabilize, but rather grows indefinitely, then the available heap size ought to grow in equal proportion, if one desires measurements in terms of the relative heap size parameter. This property must be ensured with due care in analysis and simulation.

A.4 Distribution families of Figure 1

Basic definitions of distribution families, compiled from textbooks [BM70, Dev86, EJJ80, EHP93, JK70, MF65, Pap91].

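A.4.1 Gamma

$$f(t) = \frac{c^{b+1}}{\Gamma(b+1)}\, t^{b} e^{-ct}$$

$$m_k = \frac{(b+1)(b+2) \cdots (b+k)}{c^k}$$

$$\mu_2 = \frac{b+1}{c^2}$$

$$\mu_3 = \frac{2(b+1)}{c^3}$$

$$\gamma = \frac{1}{\sqrt{b+1}}$$

$$\gamma_3 = \frac{2}{\sqrt{b+1}}$$

Therefore $\gamma_3 = 2\gamma$ is a straight line in Figure 1.

A.4.2 Weibull

$$f(t) = c\, t^{c-1} e^{-t^c}$$

$$m_1 = \frac{1}{c}\, \Gamma\!\left( \frac{1}{c} \right)$$

$$m_2 = \Gamma\!\left( 1 + \frac{2}{c} \right)$$

$$m_3 = \Gamma\!\left( 1 + \frac{3}{c} \right)$$

The curve γ–γ3 is plotted parametrically with respect to $c$ in Figure 1.

A.4.3 Log-normal

$$f(t) = \frac{1}{t \delta \sqrt{2\pi}}\, e^{-\frac{(\log t - \zeta)^2}{2\delta^2}}$$

It can be shown that, with abbreviation $\omega = e^{\delta^2}$, $\gamma = \sqrt{\omega - 1}$ and $\gamma_3 = (\omega + 2)\sqrt{\omega - 1}$, therefore $\gamma_3 = \gamma^3 + 3\gamma$ in Figure 1.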
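The three analytical curves of Figure 1 can be regenerated from the formulas in Sections A.4.1 to A.4.3. The Python sketch below is one way to do so under the parameterizations given there; the function names are illustrative assumptions, and the Weibull moments are written as $m_k = \Gamma(1 + k/c)$, which for $k = 1$ equals $\frac{1}{c}\Gamma(\frac{1}{c})$.

```python
import math

def ratios_from_raw_moments(m1, m2, m3):
    """(gamma, gamma3) from raw moments, via the central-moment formulae of A.2."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    sigma = math.sqrt(mu2)
    return sigma / m1, mu3 / sigma ** 3

def gamma_ratios(b):
    """Gamma family with shape b + 1 (b > -1): gamma3 = 2 * gamma."""
    g = 1.0 / math.sqrt(b + 1.0)
    return g, 2.0 * g

def weibull_ratios(c):
    """Weibull family f(t) = c t^(c-1) exp(-t^c), c > 0."""
    m1, m2, m3 = (math.gamma(1.0 + k / c) for k in (1, 2, 3))
    return ratios_from_raw_moments(m1, m2, m3)

def lognormal_ratios(delta):
    """Log-normal family; the ratios depend only on delta, not on zeta."""
    omega = math.exp(delta ** 2)
    g = math.sqrt(omega - 1.0)
    return g, (omega + 2.0) * g

if __name__ == "__main__":
    # Each family traces a curve in the gamma-gamma3 plane as its parameter varies.
    for c in (0.5, 1.0, 2.0):
        print("Weibull c =", c, weibull_ratios(c))
    for b in (-0.5, 0.0, 3.0):
        print("gamma b =", b, gamma_ratios(b))
    for delta in (0.5, 1.0, 1.5):
        print("log-normal delta =", delta, lognormal_ratios(delta))
```

With $c = 1$ and $b = 0$ both the Weibull and gamma families reduce to the exponential distribution, which is the intersection point (1, 2) noted in Section 5. Very small values of $c$ overflow `math.gamma`, so a plot on the logarithmic scale of Figure 1 would need log-gamma arithmetic for that end of the Weibull curve.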