
IV. Triangular Distribution

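Known values are the minimum (a), the mode (b, the most likely value of the pdf), and the maximum (c).

[Figure: the triangular probability density function f(x) plotted against x (area under the curve = 1). It rises linearly from 0 at x = a to its peak height h = 2/(c - a) at x = b, then falls linearly to 0 at x = c.]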

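The pdf is given by

$$
f(x) = \begin{cases}
\dfrac{2(x - a)}{(c - a)(b - a)} & \text{for } a \le x \le b \quad \left(\text{slope} = \dfrac{h}{b - a}\right) \\[2ex]
\dfrac{-2(x - c)}{(c - a)(c - b)} & \text{for } b \le x \le c \quad \left(\text{slope} = \dfrac{-h}{c - b}\right) \\[2ex]
0 & \text{otherwise}
\end{cases}
$$

where h = 2/(c - a) is the height of the pdf at the mode.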

The mean is given by

$$
E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx
     = \int_a^b \frac{2(x - a)}{(c - a)(b - a)} \cdot x\,dx + \int_b^c \frac{2(c - x)}{(c - a)(c - b)} \cdot x\,dx
     = \frac{a + b + c}{3}
$$

The derivation is fairly tedious; with a little work it can be shown that

$$
\begin{aligned}
E(X) &= h \cdot \left[ \frac{2b^3 - 3ab^2 + a^3}{6(b - a)} + \frac{c^3 - 3cb^2 + 2b^3}{6(c - b)} \right]
      = \frac{a^3(c - b) + b^3(a - c) + c^3(b - a)}{3(c - a)(b - a)(c - b)} \\
     &= \frac{(a + b + c)(c - a)(b - a)(c - b)}{3(c - a)(b - a)(c - b)} = \frac{a + b + c}{3}
\end{aligned}
$$
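The closed form for the mean can be spot-checked numerically. The following is a minimal sketch, assuming a < b < c; the function names `triangular_pdf` and `numeric_mean` and the midpoint-rule integration are illustrative choices, not part of the original notes.

```python
def triangular_pdf(x, a, b, c):
    """Triangular pdf with minimum a, mode b, maximum c (assumes a < b < c)."""
    if a <= x <= b:
        return 2.0 * (x - a) / ((c - a) * (b - a))
    if b < x <= c:
        return 2.0 * (c - x) / ((c - a) * (c - b))
    return 0.0


def numeric_mean(a, b, c, steps=100_000):
    """Approximate E(X), the integral of x*f(x) over [a, c], by the midpoint rule."""
    width = (c - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        total += x * triangular_pdf(x, a, b, c) * width
    return total


a, b, c = 1.0, 2.0, 4.0
print(numeric_mean(a, b, c))  # should be close to the closed form below
print((a + b + c) / 3)
```

The two printed values should agree to several decimal places; a finer grid only tightens the agreement.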

Remark:

For a discrete sample, the measures of centrality that are typically determined are the mean, the mode, and the median. The mean is the average value of the sample and corresponds to E(X). The mode corresponds to the maximum value of the pdf. When working with a sample, it is necessary to resort to a histogram (which can be tricky) to estimate the mode of the underlying pdf. The median simply corresponds to that point at which half of the area under the curve is to the left and half is to the right.

The triangular distribution is typically employed when not much is known about the distribution, but the minimum, mode, and maximum can be estimated. Sampling from the triangular distribution requires solving

$$
x = \int_{-\infty}^{r_{\text{sample}}} f(z)\,dz
$$

for rsample, given a random probability x.

Since f(z) is piecewise continuous, its distribution function F(t) is given by

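$$
F(t) = \int_{-\infty}^{t} f(z)\,dz = \begin{cases}
0 & \text{for } t \le a \\[1ex]
\displaystyle\int_a^t f(z)\,dz & \text{for } a < t \le b \\[2ex]
1 - \displaystyle\int_t^c f(z)\,dz & \text{for } b \le t < c \\[2ex]
1 & \text{for } t \ge c
\end{cases}
$$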

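Hence, for a ≤ rsample ≤ b we get

$$
x = \int_a^{r_{\text{sample}}} f(z)\,dz
  = \int_a^{r_{\text{sample}}} \frac{2(z - a)}{(b - a)(c - a)}\,dz
  = \left[ \frac{z^2 - 2az}{(b - a)(c - a)} \right]_a^{r_{\text{sample}}}
  = \frac{(r_{\text{sample}} - a)^2}{(b - a)(c - a)} \tag{A}
$$

and for b ≤ rsample ≤ c, since

$$
\int_{r_{\text{sample}}}^{c} f(z)\,dz
  = \int_{r_{\text{sample}}}^{c} \frac{2(c - z)}{(c - b)(c - a)}\,dz
  = \left[ \frac{2cz - z^2}{(c - b)(c - a)} \right]_{r_{\text{sample}}}^{c}
  = \frac{(c - r_{\text{sample}})^2}{(c - b)(c - a)}
$$

we get

$$
x = 1 - \frac{(c - r_{\text{sample}})^2}{(c - b)(c - a)} \tag{B}
$$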

Since

$$
\int_a^b f(z)\,dz = \frac{b - a}{c - a}
$$

if the random probability x ≤ (b - a)/(c - a) then equation A is used to solve for rsample; otherwise equation B is used.

$$
r_{\text{sample}} = a + \sqrt{(b - a)(c - a)\,x} \quad \text{for } x \le \frac{b - a}{c - a} \tag{A}
$$

$$
r_{\text{sample}} = c - \sqrt{(c - b)(c - a)(1 - x)} \quad \text{for } x > \frac{b - a}{c - a} \tag{B}
$$

Graphically, the sampling function has the appearance

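[Figure: the sampling function, rsample plotted against x for 0 ≤ x ≤ 1. The curve rises from a at x = 0 to c at x = 1 and passes through b at x = (b - a)/(c - a).]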

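Example: (Note: the median corresponds to x = 0.5)

For a = 1, b = 2, c = 4:

mean = (a + b + c)/3 = 2.333
mode = 2
median = $c - \sqrt{(c - b)(c - a)(0.5)} = 4 - \sqrt{3} = 2.268$

Equation B is used for the median because 0.5 > (b - a)/(c - a) = 1/3.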
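Equations A and B translate directly into a sampling routine. The sketch below assumes a < b < c and a probability x in [0, 1]; the name `triangular_sample` and the fixed random seed are illustrative choices.

```python
import math
import random


def triangular_sample(x, a, b, c):
    """Map a probability x in [0, 1] to rsample (assumes a < b < c)."""
    if x <= (b - a) / (c - a):
        # Equation A: x falls on the rising side of the pdf
        return a + math.sqrt((b - a) * (c - a) * x)
    # Equation B: x falls on the falling side of the pdf
    return c - math.sqrt((c - b) * (c - a) * (1.0 - x))


# Median of the example a = 1, b = 2, c = 4
print(round(triangular_sample(0.5, 1, 2, 4), 3))  # 2.268

rng = random.Random(12345)  # fixed seed, an arbitrary choice
draws = [triangular_sample(rng.random(), 1, 2, 4) for _ in range(100_000)]
print(sum(draws) / len(draws))  # should be near (1 + 2 + 4) / 3
```

Feeding uniform random probabilities through the function yields triangular samples, so the sample mean should settle near (a + b + c)/3.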

V. Gamma Distribution

A large number of useful functions are related to the exponential function. The gamma function is one of these. The gamma function traces back to 18th century work by Euler, in which he used interpolation methods to define n! for non-integral values (it was later dubbed the gamma function by Legendre in a series of books published between 1811 and 1826). The gamma function appears naturally in the study of anti-differentiation; it is also studied in the context of differential equations when calculating Laplace transforms. The gamma function is given by

$$
\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx \qquad (\alpha > 0)
$$

Integrating by parts, we get

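$$
\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1) \quad \text{for } \alpha > 1.
$$

Since

$$
\Gamma(1) = \int_0^{\infty} e^{-x}\,dx = 1,
$$

when α is an integer, Γ(α) = (α - 1)!. Hence, the gamma function is a generalization of the factorial, applying to all α > 0, not just integers.

The gamma distribution is obtained from the gamma function by specifying the pdf

$$
f(x) = \begin{cases}
k\,x^{\alpha - 1} e^{-x/\beta} & \text{for } x > 0 \\
0 & \text{otherwise}
\end{cases}
\qquad \text{for fixed } \alpha > 0 \text{ and } \beta > 0
$$

where the proportionality constant k is chosen so that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. k is easy to figure:

$$
1 = \int_0^{\infty} k\,x^{\alpha - 1} e^{-x/\beta}\,dx
  = k\beta^{\alpha} \int_0^{\infty} t^{\alpha - 1} e^{-t}\,dt
  = k\beta^{\alpha}\,\Gamma(\alpha) \qquad \text{where } x = \beta t
$$

so

$$
k = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \quad \text{and } f(x) \text{ is given by } \quad \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\,x^{\alpha - 1} e^{-x/\beta}
$$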

α is called the shape (or order) parameter; β is called the scale parameter. Note that when α = 1, $f(x) = \frac{1}{\beta}\,e^{-x/\beta}$, which is the exponential distribution with mean β.
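As an illustration, the gamma pdf can be evaluated with the normalizing constant k = 1/(β^α Γ(α)). This is a minimal sketch; the function names are hypothetical, and working in log space with Python's `math.lgamma` is an implementation choice made here, not something the notes prescribe.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma pdf with shape alpha > 0 and scale beta > 0."""
    if x <= 0:
        return 0.0
    # ln k = -(alpha*ln(beta) + ln Gamma(alpha)); log space avoids overflow
    log_k = -(alpha * math.log(beta) + math.lgamma(alpha))
    return math.exp(log_k + (alpha - 1.0) * math.log(x) - x / beta)


def exponential_pdf(x, beta):
    """Exponential pdf with mean beta."""
    return math.exp(-x / beta) / beta if x > 0 else 0.0


# With alpha = 1 the gamma pdf should coincide with the exponential pdf
print(gamma_pdf(2.0, 1.0, 5.0), exponential_pdf(2.0, 5.0))
```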

In general, E(X) = αβ and σ² = αβ². Hence, if the mean and standard deviation can be estimated, then α and β can also be determined.
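Solving the two moment equations gives β = σ²/E(X) and α = E(X)/β. A short sketch follows; the function name `gamma_params_from_moments` and the sample inputs are illustrative.

```python
def gamma_params_from_moments(mean, std_dev):
    """Solve mean = alpha*beta and std_dev**2 = alpha*beta**2 for (alpha, beta)."""
    variance = std_dev ** 2
    beta = variance / mean
    alpha = mean / beta  # equivalently mean**2 / variance
    return alpha, beta


alpha, beta = gamma_params_from_moments(mean=5.0, std_dev=5.0 ** 0.5)
print(alpha, beta)  # approximately alpha = 5, beta = 1
```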

Algorithm for calculating the natural logarithm of the gamma function

Attributed to Lanczos, C., Journal S.I.A.M. Numerical Analysis, ser. B, vol. 1, p. 86 (1964) and adapted from Numerical Recipes in C by Press, W.H., B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling (Cambridge University Press, 1988).

    FUNCTION lngamma(z)
        /* Use the reflection formula for z < 1 */
        IF z < 1
            z ← 1 - z
            RETURN ln(πz) - (lngamma(1 + z) + ln(sin(πz)))
        ENDIF
        coeff ← 76.18009173, -86.50532033, 24.01409822,
                -1.231739516, 0.00120858003, -0.00000536382
        /* These values are the (approximate) coefficients for the first 6 terms
           of an infinite series involved in an exact formulation for the gamma
           function credited to Lanczos. They yield an approximation for the
           variable "a" (determined below) which is within |ε| < 2 × 10^-10 of
           its true value */
        a ← 1
        FOR i ← 1 TO 6
            a ← a + coeff(i)/(i + z - 1)
        ENDFOR
        RETURN ln(a √(2π)) - (z + 4.5) + (z - 0.5) ln(z + 4.5)
    END

[Figure: gamma pdf f(x) plotted against x (0 ≤ x ≤ 10) for fixed mean αβ = 5 and varying values of α and β; curves shown for α = 0.5, β = 10; α = 1.5, β = 3.3333; α = 5, β = 1; and α = 10, β = 0.5.]
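The lngamma pseudocode given earlier in this section can be transcribed into Python roughly as follows. This is a sketch rather than a reference implementation: it is restricted to z > 0 (an added guard), and the comparison against Python's built-in `math.gamma` is included only as a sanity check.

```python
import math

# Lanczos series coefficients as listed in the pseudocode
COEFF = (76.18009173, -86.50532033, 24.01409822,
         -1.231739516, 0.00120858003, -0.00000536382)


def lngamma(z):
    """Natural log of Gamma(z) for z > 0, following the pseudocode."""
    if z <= 0:
        raise ValueError("this sketch only handles z > 0")
    if z < 1:
        # Reflection: Gamma(1 - w) = pi*w / (Gamma(1 + w) * sin(pi*w)), w = 1 - z
        w = 1.0 - z
        return math.log(math.pi * w) - (
            lngamma(1.0 + w) + math.log(math.sin(math.pi * w)))
    series = 1.0
    for i in range(1, 7):
        series += COEFF[i - 1] / (i + z - 1.0)
    return (math.log(series * math.sqrt(2.0 * math.pi))
            - (z + 4.5) + (z - 0.5) * math.log(z + 4.5))


for value in (0.25, 0.5, 2.5, 6.0):
    # Second and third columns should agree closely
    print(value, math.exp(lngamma(value)), math.gamma(value))
```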

[Figure: corresponding distribution functions (F(x) plotted against x for 0 ≤ x ≤ 10) and sampling functions (rsample, from 0 to 10, plotted against x for 0 ≤ x ≤ 1).]

The gamma distribution is used to model waiting times or time to complete a task. More specifically, it can be shown that if we have exponentially distributed interarrival times with mean 1/λ, the time needed to obtain k changes is distributed according to a gamma distribution with α = k and β = 1/λ.

Gamma Function: $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx$ (α > 0)

The general relationship Γ(α) = (α - 1)Γ(α - 1) for α > 1 holds. It can also be shown that

$$
\Gamma(\alpha)\,\Gamma(1 - \alpha) = \frac{\pi}{\sin(\pi\alpha)} \quad \text{for } 0 < \alpha < 1.
$$

(Note that in particular, this implies that Γ(.5) = √π.)

For 0 < α < 1, 1 + α > 1, so Γ(1 + α) = αΓ(α).

This in turn gives the reflection formula

$$
\Gamma(1 - \alpha) = \frac{\pi\alpha}{\Gamma(1 + \alpha)\,\sin(\pi\alpha)} \quad \text{for } 0 < \alpha < 1
$$
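For reference, the intermediate steps behind the reflection formula and the value of Γ(.5) can be written out as follows; this only spells out the substitution already described in the text.

```latex
\begin{aligned}
\Gamma(1 - \alpha) &= \frac{\pi}{\Gamma(\alpha)\,\sin(\pi\alpha)}
  && \text{(divide the product identity by } \Gamma(\alpha)\text{)} \\
 &= \frac{\pi\alpha}{\Gamma(1 + \alpha)\,\sin(\pi\alpha)}
  && \text{(substitute } \Gamma(\alpha) = \Gamma(1 + \alpha)/\alpha\text{)} \\[2ex]
\Gamma(\tfrac{1}{2})^2 &= \frac{\pi}{\sin(\pi/2)} = \pi
  && \text{(set } \alpha = \tfrac{1}{2} \text{ in the product identity)} \\
\Gamma(\tfrac{1}{2}) &= \sqrt{\pi}
  && \text{(the positive root, since the integrand is positive)}
\end{aligned}
```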

Selected values computed according to the algorithm for ln(Γ(α)).

Γ(.25) ≈ 3.62560991
Γ(.5) = √π ≈ 1.77245385
Γ(.75) ≈ 1.22541670
Γ(1) = 0! = 1
Γ(1.25) ≈ 0.90640248
Γ(1.5) = .5Γ(.5) = √π/2 ≈ 0.88622693
Γ(1.75) ≈ 0.91906253
Γ(2) = 1! = 1
Γ(2.25) ≈ 1.13300310
Γ(2.5) = 1.5Γ(1.5) = 3√π/4 ≈ 1.32934039
Γ(2.75) ≈ 1.60835942
Γ(3) = 2! = 2
Γ(3.25) ≈ 2.54925697
Γ(3.5) = 2.5Γ(2.5) = 15√π/8 ≈ 3.32335097
Γ(3.75) ≈ 4.42298841
Γ(4) = 3! = 6
Γ(4.25) ≈ 8.28508514
Γ(4.5) = 3.5Γ(3.5) = 105√π/16 ≈ 11.63172840
Γ(4.75) ≈ 16.58620654
Γ(5) = 4! = 24
Γ(5.25) ≈ 35.21161185
Γ(5.5) = 4.5Γ(4.5) = 945√π/32 ≈ 52.34277778
Γ(5.75) ≈ 78.78448105
Γ(6) = 5! = 120

[Figure: Γ(α) plotted against α for 0 < α ≤ 5 (vertical axis 0 to 10), with the factorial values 0!, 1!, 2!, and 3! marked on the curve.]

For the pdf of the gamma distribution

$$
f(x) = \begin{cases}
k\,x^{\alpha - 1} e^{-x/\beta} & \text{for } x > 0 \\
0 & \text{otherwise}
\end{cases}
\qquad \text{for fixed } \alpha > 0 \text{ and } \beta > 0
$$

note that if:

α < 1 then x^(α - 1) → ∞ as x → 0
α = 1 then the distribution is the exponential distribution
α > 1 then x^(α - 1) → 0 as x → 0

The earlier example showed three basic shapes, each of which is described by the behavior of the derivative f '(x) (slope function) of f(x).

$$
f'(x) = k\left[ (\alpha - 1)\,x^{\alpha - 2} e^{-x/\beta} + \left(-\frac{1}{\beta}\right) x^{\alpha - 1} e^{-x/\beta} \right]
$$

There are actually 5 cases:

α < 1: the slope → -∞ as x → 0, since each term is < 0 and each exponent of x is < 0
α = 1: the slope → -1/β² as x → 0, in accord with the exponential distribution, since k = 1/β, term 1 is 0, and term 2 is -1/β
1 < α < 2: the slope → +∞ as x → 0, since the lead term → +∞ and term 2 → 0
α = 2: the slope → +1/β² as x → 0, since k = 1/β^α = 1/β²
α > 2: the slope → 0 as x → 0

In each case the slope → 0 as x → +∞.
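The five limiting cases can be spot-checked by evaluating f '(x) at a point very close to 0. The sketch below is illustrative only: the function name `gamma_pdf_slope`, the scale β = 2, and the evaluation point x = 1e-9 are arbitrary assumptions, and Python's `math.gamma` stands in for the lngamma algorithm.

```python
import math


def gamma_pdf_slope(x, alpha, beta):
    """f'(x) = k[(alpha-1)x^(alpha-2) - (1/beta)x^(alpha-1)] e^(-x/beta) for x > 0."""
    k = 1.0 / (beta ** alpha * math.gamma(alpha))
    term1 = (alpha - 1.0) * x ** (alpha - 2.0)
    term2 = -(1.0 / beta) * x ** (alpha - 1.0)
    return k * (term1 + term2) * math.exp(-x / beta)


beta = 2.0  # arbitrary illustrative scale
x = 1e-9    # a point very close to 0
for alpha in (0.5, 1.0, 1.5, 2.0, 3.0):
    # Expect: large negative, about -1/beta**2, large positive,
    # about +1/beta**2, and about 0, in that order
    print(alpha, gamma_pdf_slope(x, alpha, beta))
```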

The gamma distribution is usually sampled by the accept-reject technique, which requires the constant k and therefore the value of Γ(α).
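When α is an integer, the waiting-time interpretation noted earlier suggests a simpler route that needs neither accept-reject nor Γ(α): add up α independent exponential interarrival times, each of mean β. The following is a minimal sketch of that special case only; the function name, the seed, and the values k = 3 and β = 5 are illustrative assumptions.

```python
import math
import random


def gamma_sample_integer_shape(k, beta, rng):
    """Sum k exponential waiting times of mean beta (integer shape only)."""
    total = 0.0
    for _ in range(k):
        # Inverse transform for the exponential: -beta * ln(1 - u), u in [0, 1)
        total += -beta * math.log(1.0 - rng.random())
    return total


rng = random.Random(2024)  # fixed seed, an arbitrary choice
draws = [gamma_sample_integer_shape(3, 5.0, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, variance)  # expected near alpha*beta = 15 and alpha*beta**2 = 75
```

For non-integer α this shortcut does not apply, and a general method such as accept-reject is still needed.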

VI. Erlang Distribution

Erlang was a Danish telephone engineer who did some of the early work in queuing theory. When α is an integer i, the gamma distribution is called an Erlang distribution of order i. The Erlang distribution is used to model phenomena having i stages, each with independent, exponentially distributed service times of mean μ. Rather than model these separately, an Erlang distribution of order i can be used to model the total service time; e.g., if an experiment has 3 successive stages, each of which takes an average of 5 minutes (exponentially distributed), then the experiment time can be modeled by taking α = 3 and β = 5 (to get the overall mean of αβ = 15 minutes).

VII. Weibull Distribution

Recall that for the gamma distribution the idea was to choose k so that

$$
\int_0^{\infty} k\,x^{\alpha - 1} e^{-x/\beta}\,dx
$$

is equal to 1 and then choose f(x) = k x^(α - 1) e^(-x/β). A major difficulty with this pdf is that it is not integrable in closed form; however, it is close to being so. The idea is to render the part under the integral to be in the form e^z dz so that the function will be integrable; i.e., for

$$
k \int_0^{\infty} e^{-(x/\beta)^{?}}\,x^{\alpha - 1}\,dx
$$

we need to determine a value for the exponent ? so that, for z = -(x/β)^?, dz is a constant multiple of x^(α - 1) dx.

If z = -(x/β)^α then dz = -(α/β)(x/β)^(α - 1) dx, and then

$$
k \left( \frac{-\beta}{\alpha} \right) \int_0^{\infty} e^{-(x/\beta)^{\alpha}} \left( \frac{-\alpha}{\beta} \right) (x/\beta)^{\alpha - 1}\,dx
  = k \left( \frac{-\beta}{\alpha} \right) \left[ e^{-(x/\beta)^{\alpha}} \right]_0^{\infty}
  = k \left( \frac{\beta}{\alpha} \right) = 1 \quad \text{when } k = \frac{\alpha}{\beta}
$$

This forms the basis for the Weibull distribution, which has the pdf

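$$
f(x) = \begin{cases}
\dfrac{\alpha}{\beta} \left( \dfrac{x - \delta}{\beta} \right)^{\alpha - 1} e^{-\left( \frac{x - \delta}{\beta} \right)^{\alpha}} & \text{for } x \ge \delta \\[2ex]
0 & \text{otherwise}
\end{cases}
$$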

Remark: the effect of δ is to displace the distribution by δ along the horizontal axis, so it is often taken to be 0.

With a little work, it can be shown that E(X) = δ + βΓ(1 + 1/α) and VAR(X) = β²(Γ(1 + 2/α) - Γ²(1 + 1/α)).
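These two moment formulas are straightforward to evaluate once Γ is available. The sketch below uses Python's `math.gamma` for convenience; the function name `weibull_mean_var` and the sample parameters are illustrative assumptions.

```python
import math


def weibull_mean_var(alpha, beta, delta=0.0):
    """Return (E(X), VAR(X)) for shape alpha, scale beta, location delta."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return delta + beta * g1, beta ** 2 * (g2 - g1 ** 2)


# alpha = 1 is the exponential case, so mean = beta and variance = beta**2
print(weibull_mean_var(1.0, 2.0))
```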

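Since

$$
x = \int_{\delta}^{r_{\text{sample}}} f(z)\,dz
  = -\left[ e^{-((z - \delta)/\beta)^{\alpha}} \right]_{\delta}^{r_{\text{sample}}}
  = 1 - e^{-((r_{\text{sample}} - \delta)/\beta)^{\alpha}}
$$

the sample function can be solved for rsample for given values of α, β, δ.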
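Solving x = 1 - e^(-((rsample - δ)/β)^α) for rsample gives rsample = δ + β(-ln(1 - x))^(1/α). A minimal sketch of that inversion follows, with a round-trip check through the distribution function; the function names and sample parameters are illustrative.

```python
import math


def weibull_sample(x, alpha, beta, delta=0.0):
    """Invert x = 1 - exp(-((rsample - delta)/beta)**alpha) for 0 <= x < 1."""
    return delta + beta * (-math.log(1.0 - x)) ** (1.0 / alpha)


def weibull_cdf(t, alpha, beta, delta=0.0):
    """Weibull distribution function F(t)."""
    if t <= delta:
        return 0.0
    return 1.0 - math.exp(-(((t - delta) / beta) ** alpha))


r = weibull_sample(0.5, alpha=1.5, beta=1.0)
print(r, weibull_cdf(r, alpha=1.5, beta=1.0))  # second value should be close to 0.5
```

Passing uniform random probabilities for x produces Weibull samples directly, with no accept-reject step.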

The Weibull distribution is often used to model time until failure (for example, light bulbs may show significant early failure, while some last a long time until failure) as per the following graph. When α = 1, β = 1/λ, δ = 0, the Weibull distribution is the exponential distribution.

[Figure: Weibull pdf f(x) plotted against x (0 ≤ x ≤ 4) for fixed β = 1 and varying values of α; curves shown for α = 0.5, 1.5, 5, and 10.]