Review of Mathematical Statistics (Chapter 10)

Definition. If $\{X_1, \dots, X_n\}$ is a set of random variables on a sample space, then any function $f(X_1, \dots, X_n)$ is called a statistic. For example, $\sin(X_1 + X_2)$ is a statistic. If a statistic is used to approximate an unknown quantity, then it is called an estimator. For example, we may use $\frac{X_1 + X_2 + X_3}{3}$ to estimate the mean of a population, so $\frac{X_1 + X_2 + X_3}{3}$ is an estimator.

Definition. By a random sample of size $n$ we mean a collection $\{X_1, X_2, \dots, X_n\}$ of random variables that are independent and identically distributed. To refer to a random sample we use the abbreviation i.i.d. (independent and identically distributed).

Example (exercise 10.6 of the textbook) ∗. You are given two independent estimators of an unknown quantity $\theta$. For estimator $A$, we have $E(\hat\theta_A) = 1{,}000$ and $\operatorname{Var}(\hat\theta_A) = 160{,}000$, while for estimator $B$, we have $E(\hat\theta_B) = 1{,}200$ and $\operatorname{Var}(\hat\theta_B) = 40{,}000$. Estimator $C$ is a weighted average $\hat\theta_C = w\,\hat\theta_A + (1 - w)\,\hat\theta_B$. Determine the value of $w$ that minimizes $\operatorname{Var}(\hat\theta_C)$.

Solution. Independence implies that:

\[
\operatorname{Var}(\hat\theta_C) = w^2 \operatorname{Var}(\hat\theta_A) + (1-w)^2 \operatorname{Var}(\hat\theta_B) = 160{,}000\, w^2 + 40{,}000\,(1-w)^2
\]

Differentiate this with respect to w and set the derivative equal to zero:

\[
2(160{,}000)\,w - 2(40{,}000)\,(1-w) = 0 \;\Rightarrow\; 400{,}000\,w - 80{,}000 = 0 \;\Rightarrow\; w = 0.2
\]
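As a quick numerical check, here is a minimal Python sketch (NumPy assumed; the grid search is just one convenient way to minimize this quadratic):

```python
import numpy as np

# Variance of the weighted estimator as a function of w (from the example above).
var_A, var_B = 160_000, 40_000

def var_C(w):
    return w**2 * var_A + (1 - w)**2 * var_B

# Evaluate on a fine grid and locate the minimizer numerically.
w = np.linspace(0, 1, 100_001)
print(w[np.argmin(var_C(w))])  # ~0.2, matching the calculus solution
```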

Notation: Consider four values

$x_1 = 12$, \quad $x_2 = 5$, \quad $x_3 = 7$, \quad $x_4 = 3$

Let’s arrange them in increasing order:

3 , 5 , 7 , 12

We denote these "ordered" values by the following notation:

$x_{(1)} = 3$, \quad $x_{(2)} = 5$, \quad $x_{(3)} = 7$, \quad $x_{(4)} = 12$

The median of these four values is $\frac{x_{(2)} + x_{(3)}}{2} = \frac{5 + 7}{2} = 6$. As another example, if we have observed the values $\{1, 2, 1, 7, 6\}$, then

$x_{(1)} = 1$, \quad $x_{(2)} = 1$, \quad $x_{(3)} = 2$, \quad $x_{(4)} = 6$, \quad $x_{(5)} = 7$, \quad and the median is $x_{(3)} = 2$.

Definition: If $\{X_1, X_2, \dots, X_n\}$ is a collection of random variables, then the $k$-th order statistic is $X_{(k)}$. The median can be written in terms of order statistics:
\[
\text{median} =
\begin{cases}
X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd}\\[6pt]
\dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
\]
Note that

\[
X_{(1)} = \min(X_1, \dots, X_n), \qquad X_{(n)} = \max(X_1, \dots, X_n)
\]
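In code, order statistics are simply the sorted sample; a minimal Python sketch (values taken from the notation example above):

```python
import numpy as np

x = np.array([12, 5, 7, 3])
print(np.sort(x))                  # order statistics: [ 3  5  7 12]
print(np.median(x))                # 6.0 = (x_(2) + x_(3)) / 2, n even
print(np.median([1, 2, 1, 7, 6]))  # 2.0 = x_(3), n odd
```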

Theorem. Let $\{X_1, \dots, X_n\}$ be an i.i.d. collection of continuous random variables with common density $f(x)$ and common distribution function $F(x)$. Let $Y_1 < Y_2 < \cdots < Y_n$ be their order statistics (since the variables are continuous, ties occur with probability zero, so we may ignore the possibility of equality between them). Then the density of the $k$-th order statistic $Y_k$ is
\[
g_k(y) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(y)]^{k-1}\,[1 - F(y)]^{n-k}\,f(y)
\]

Example. Consider a population with density function

\[
f(x) = 2x, \qquad 0 < x < 1
\]

Then
\[
F(x) = \int_0^x f(t)\,dt = x^2, \qquad 0 < x < 1
\]

Then the density of the random variable $\min(X_1, \dots, X_6)$ is
\[
g_1(y) = \frac{6!}{0!\,(6-1)!}\left[y^2\right]^{1-1}\left[1 - y^2\right]^{6-1}(2y) = 12\,y\,(1 - y^2)^5
\]
and the density of the random variable $\max(X_1, \dots, X_6)$ is
\[
g_6(y) = \frac{6!}{(6-1)!\,(6-6)!}\left[y^2\right]^{6-1}\left[1 - y^2\right]^{6-6}(2y) = 12\,y^{11}
\]
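A short Monte Carlo sketch (NumPy/SciPy assumed; the seed and sample size are arbitrary) comparing simulated moments of the minimum and maximum of six draws with the densities just derived:

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)

# Sample from f(x) = 2x on (0, 1) by inverse transform: F(x) = x^2, so X = sqrt(U).
samples = np.sqrt(rng.uniform(size=(200_000, 6)))

# Derived densities of the minimum and maximum of 6 observations.
g1 = lambda y: 12 * y * (1 - y**2) ** 5
g6 = lambda y: 12 * y**11

# Simulated means vs. means computed from g1 and g6; each pair should agree closely.
print(samples.min(axis=1).mean(), quad(lambda y: y * g1(y), 0, 1)[0])
print(samples.max(axis=1).mean(), quad(lambda y: y * g6(y), 0, 1)[0])  # exact value 12/13
```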

Definition. An estimator $\hat\theta$ for a parameter $\theta$ is said to be unbiased if for all possible values of $\theta$ we have $E(\hat\theta \mid \theta) = \theta$.

Note. What is good about an unbiased estimator? Answer: If $\hat\theta$ is unbiased for estimating $\theta$, then by observing a large number of instances of $\hat\theta$, say $\{\hat\theta_1, \hat\theta_2, \dots, \hat\theta_{100}\}$, and taking the simple average $\frac{\hat\theta_1 + \cdots + \hat\theta_{100}}{100}$, we will have a good approximation of $\theta$. This is due to the Law of Large Numbers.

Example. Suppose $\{X_1, X_2\}$ are two random variables from a population $N(\mu, \sigma^2 = 1)$. Find the parameter for which $X_1^2 + 3X_2$ is an unbiased estimator.

Solution.

\[
E(X_1^2) = \operatorname{Var}(X_1) + E(X_1)^2 = 1 + \mu^2 \;\Rightarrow\; E(X_1^2 + 3X_2) = E(X_1^2) + 3\,E(X_2) = 1 + \mu^2 + 3\mu,
\]
regardless of the true value of $\mu$.

So, $X_1^2 + 3X_2$ is an unbiased estimator for the parameter $\mu^2 + 3\mu + 1$.
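A quick simulation sketch of this unbiasedness claim (the value $\mu = 1.7$ and the sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.7  # any true mean; the claim holds for every mu

# Many simulated pairs (X1, X2) from N(mu, 1).
x1 = rng.normal(mu, 1.0, size=1_000_000)
x2 = rng.normal(mu, 1.0, size=1_000_000)

print((x1**2 + 3 * x2).mean())  # simulated E(X1^2 + 3*X2)
print(mu**2 + 3 * mu + 1)       # target parameter: 8.99 for mu = 1.7
```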

Definition. Consider a class of unbiased estimators for a particular parameter. An estimator from this class is called a Uniformly Minimum Variance Unbiased Estimator (UMVUE) if it has the minimum variance among all elements of that class of unbiased estimators.

Example (exercise 10.8 of the textbook) ∗. Two instruments are available for measuring a particular nonzero distance. The random variable $X$ represents the measurement with the first instrument and the random variable $Y$ the measurement with the second instrument. Assume $X$ and $Y$ are independent, and that
\[
E[X] = 0.8\,m, \qquad E[Y] = m, \qquad \operatorname{Var}(X) = m^2, \qquad \operatorname{Var}(Y) = 1.5\,m^2,
\]
where $m$ is the true distance. Consider the estimators of $m$ that are of the form $Z = \alpha X + \beta Y$. Determine the values of $\alpha$ and $\beta$ that make $Z$ a UMVUE within the class of estimators of this kind.

Solution. Unbiasedness requires:
\[
m = E(\alpha X + \beta Y) = \alpha\,E(X) + \beta\,E(Y) = (0.8\,\alpha + \beta)\,m \;\Rightarrow\; 0.8\,\alpha + \beta = 1 \;\Rightarrow\; \beta = 1 - 0.8\,\alpha
\]

\[
\operatorname{Var}(\alpha X + \beta Y) = \alpha^2 \operatorname{Var}(X) + \beta^2 \operatorname{Var}(Y) = \alpha^2 m^2 + (1 - 0.8\,\alpha)^2 (1.5\,m^2) = \left\{\alpha^2 + 1.5\,(1 - 0.8\,\alpha)^2\right\} m^2 = \left\{1.96\,\alpha^2 - 2.4\,\alpha + 1.5\right\} m^2
\]

Since the coefficient of $\alpha^2$ is positive ($1.96$), the quadratic is minimized at
\[
\alpha = -\frac{-2.4}{2(1.96)} = 0.6122, \qquad \beta = 1 - 0.8(0.6122) = 0.5102
\]
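The same minimizer can be found numerically; a minimal sketch with SciPy (the variance is expressed in units of $m^2$, with $\beta = 1 - 0.8\alpha$ substituted so every candidate in the search is unbiased):

```python
from scipy.optimize import minimize_scalar

# Var(Z)/m^2 as a function of alpha, after enforcing unbiasedness.
res = minimize_scalar(lambda a: a**2 + 1.5 * (1 - 0.8 * a) ** 2)
alpha = res.x
print(alpha, 1 - 0.8 * alpha)  # ~0.6122 and ~0.5102
```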

Theorem. Let $\{Y_1, \dots, Y_n\}$ be random variables from the same population with mean $\mu$ and variance $\sigma^2$. Then the sample mean $\bar Y = \frac{1}{n}\sum_{j=1}^n Y_j$ is an unbiased estimator for the population mean $\mu$. The proof does not need any assumption of independence.

Proof.
\[
E(\bar Y) = \frac{1}{n}\sum_{j=1}^n E(Y_j) = \frac{1}{n}\sum_{j=1}^n \mu = \mu
\]

Theorem. Let $\{Y_1, \dots, Y_n\}$ be mutually independent random variables from the same population with mean $\mu$ and variance $\sigma^2$. Then

(a) $\operatorname{Var}(\bar Y) = \dfrac{\sigma^2}{n}$.

(b) The sample variance $\dfrac{1}{n-1}\displaystyle\sum_{j=1}^n (Y_j - \bar Y)^2$ is an unbiased estimator for the population variance $\sigma^2$.

Proof.
\[
\operatorname{Var}(\bar Y) = \operatorname{Var}\!\left(\frac{Y_1 + \cdots + Y_n}{n}\right) = \frac{1}{n^2}\operatorname{Var}(Y_1 + \cdots + Y_n) = \frac{1}{n^2}\left\{\operatorname{Var}(Y_1) + \cdots + \operatorname{Var}(Y_n)\right\} = \frac{1}{n^2}\left\{\sigma^2 + \cdots + \sigma^2\right\} = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}
\]

This proves part (a). To prove part (b) we first note that
\begin{align*}
\sum_{j=1}^n (Y_j - \bar Y)^2 &= \sum_{j=1}^n \left[(Y_j - \mu) + (\mu - \bar Y)\right]^2\\
&= \sum_{j=1}^n \left[(Y_j - \mu)^2 + 2(Y_j - \mu)(\mu - \bar Y) + (\mu - \bar Y)^2\right]\\
&= \sum_{j=1}^n (Y_j - \mu)^2 + 2(\mu - \bar Y)\sum_{j=1}^n (Y_j - \mu) + \sum_{j=1}^n (\mu - \bar Y)^2\\
&= \sum_{j=1}^n (Y_j - \mu)^2 + 2(\mu - \bar Y)(n\bar Y - n\mu) + n(\mu - \bar Y)^2\\
&= \sum_{j=1}^n (Y_j - \mu)^2 - 2n(\mu - \bar Y)^2 + n(\mu - \bar Y)^2\\
&= \sum_{j=1}^n (Y_j - \mu)^2 - n(\mu - \bar Y)^2
\end{align*}
So
\[
\sum_{j=1}^n (Y_j - \bar Y)^2 = \sum_{j=1}^n (Y_j - \mu)^2 - n\,(\mu - \bar Y)^2
\]

Applying the expectation $E$ to both sides, we have:
\begin{align*}
E\left[\sum_{j=1}^n (Y_j - \bar Y)^2\right] &= \sum_{j=1}^n E\left[(Y_j - \mu)^2\right] - n\,E\left[(\mu - \bar Y)^2\right]\\
&= \sum_{j=1}^n \operatorname{Var}(Y_j) - n\operatorname{Var}(\bar Y)\\
&= n\sigma^2 - n\left(\frac{\sigma^2}{n}\right) = (n-1)\,\sigma^2
\end{align*}
So
\[
E\left[\frac{1}{n-1}\sum_{j=1}^n (Y_j - \bar Y)^2\right] = \sigma^2
\]
This proves part (b).
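A simulation sketch of part (b) (population, seed, and sizes are arbitrary choices): dividing by $n-1$ gives an unbiased variance estimate, while dividing by $n$ understates it by the factor $(n-1)/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, sigma2 = 5, 200_000, 4.0

# Many samples of size n from a population with variance sigma^2 = 4.
samples = rng.normal(10.0, np.sqrt(sigma2), size=(trials, n))

print(samples.var(axis=1, ddof=1).mean())  # ddof=1 divides by n-1: ~4.0 (unbiased)
print(samples.var(axis=1, ddof=0).mean())  # ddof=0 divides by n:   ~3.2 = (n-1)/n * 4
```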

Note. Repeating the argument we used for the identity
\[
\sum_{j=1}^n (Y_j - \bar Y)^2 = \sum_{j=1}^n (Y_j - \mu)^2 - n\,(\mu - \bar Y)^2
\]
shows that for any constant $a$ we have
\[
\sum_{j=1}^n (Y_j - \bar Y)^2 = \sum_{j=1}^n (Y_j - a)^2 - n\,(a - \bar Y)^2
\]
and in particular for $a = 0$:
\[
\sum_{j=1}^n (Y_j - \bar Y)^2 = \sum_{j=1}^n Y_j^2 - n\,(\bar Y)^2
\]

Dividing by n gives:

\[
\frac{1}{n}\sum_{j=1}^n (Y_j - \bar Y)^2 = \frac{1}{n}\sum_{j=1}^n Y_j^2 - (\bar Y)^2
\]
Equivalently:
\[
\frac{1}{n}\sum_{j=1}^n (Y_j - \bar Y)^2 = \frac{1}{n}\sum_{j=1}^n Y_j^2 - \left(\frac{1}{n}\sum_{j=1}^n Y_j\right)^2
\]

Note that the left-hand side is the empirical variance.
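The shortcut identity is easy to verify numerically on arbitrary data, for instance:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=1_000)

lhs = ((y - y.mean()) ** 2).mean()   # empirical variance
rhs = (y**2).mean() - y.mean() ** 2  # mean of squares minus square of mean
print(np.isclose(lhs, rhs))          # True
```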

Example ∗. Mrs. Actuarial Gardener has used a global positioning system to lay out a perfect 20-meter by 20-meter gardening plot in her back yard. Her husband, Mr. Actuarial Gardener, decides to estimate the area of the plot. He paces off a single side of the plot and records his estimate of the length. He repeats this experiment an additional 4 times along the same side. Each trial is independent and follows a normal distribution with a mean of 20 meters and a standard deviation of 2 meters. He then averages his results and squares that number to estimate the total area of the plot. Which of the following is a true statement regarding Mr. Gardener's method of estimating the area?
A) On average, it will underestimate the true area by at least 1 square meter.
B) On average, it will underestimate the true area by less than 1 square meter.
C) On average, it is an unbiased method.
D) On average, it will overestimate the true area by less than 1 square meter.
E) On average, it will overestimate the true area by at least 1 square meter.

Solution. "On average" means "if he repeats his experiment over and over." So we must find $E(\bar X^2)$, where $\bar X = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$ is the average of the five measurements.

\[
E(\bar X^2) = \operatorname{Var}(\bar X) + E(\bar X)^2 = \frac{\sigma^2}{5} + \mu^2 = \frac{4}{5} + 20^2 = 0.8 + 400
\]

So, on average his estimate overestimates the true area by 0.8 square meters, which is less than 1. The answer is (D).
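A simulation sketch of Mr. Gardener's procedure (seed and trial count arbitrary), confirming the $+0.8$ square-meter bias:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 paced measurements per experiment, each N(20, 2^2); area estimate = (average)^2.
paces = rng.normal(20.0, 2.0, size=(1_000_000, 5))
area_estimates = paces.mean(axis=1) ** 2

print(area_estimates.mean() - 400)  # ~ +0.8: overestimates by less than 1 m^2
```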

Definition. If $\hat\theta$ is an estimator for $\theta$, then the difference $\operatorname{Bias}(\hat\theta) = E(\hat\theta) - \theta$ is called the bias of $\hat\theta$.

Note. We use the Law of Large Numbers to estimate expected values. Therefore, we can estimate any bias $E(\hat\theta) - \theta$ by $\frac{\hat\theta_1 + \cdots + \hat\theta_n}{n} - \theta$, where $\{\hat\theta_1, \dots, \hat\theta_n\}$ is a random sample from the distribution of $\hat\theta$. The next example is an application of this technique.

Example. Let $\{X_1, X_2, X_3\}$ be a random sample from a population $\text{Exponential}(\theta)$, where $\theta$ is the mean. Calculate the bias of the median as an estimator for $\theta$.

Solution. For n = 3 the formula for the density of order statistics becomes:

\[
f_{Y_k}(y) = \frac{3!}{(k-1)!\,(3-k)!}\,F(y)^{k-1}\,[1 - F(y)]^{3-k}\,f(y)
\]

For this sample the median is the second order statistic $Y_2$, so for $k = 2$ we get:
\[
f_{Y_2}(y) = \frac{3!}{1!\,1!}\,F(y)\,[1 - F(y)]\,f(y) = 6\left(1 - e^{-y/\theta}\right) e^{-y/\theta}\,\frac{1}{\theta}\,e^{-y/\theta} = \frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right)
\]

\[
E(Y_2) = \int_0^\infty y\, f_{Y_2}(y)\, dy = \frac{6}{\theta}\int_0^\infty y\left(e^{-2y/\theta} - e^{-3y/\theta}\right) dy
\]

Using the Gamma integral formula $\int_0^\infty x^{\alpha-1} e^{-x/\beta}\,dx = \Gamma(\alpha)\,\beta^\alpha$ with $\alpha = 2$:
\[
E(Y_2) = \frac{6}{\theta}\left[\left(\frac{\theta}{2}\right)^2 - \left(\frac{\theta}{3}\right)^2\right] = 6\theta\left(\frac{1}{4} - \frac{1}{9}\right) = \frac{5\theta}{6}
\]

\[
\operatorname{Bias}(Y_2) = E(Y_2) - \theta = \frac{5\theta}{6} - \theta = -\frac{\theta}{6}
\]
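This bias is also easy to see by simulation (here with the arbitrary choice $\theta = 3$, so the predicted bias is $-\theta/6 = -0.5$):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0

# Median of many samples of size 3 from Exponential(theta), theta = mean.
samples = rng.exponential(theta, size=(500_000, 3))
medians = np.median(samples, axis=1)

print(medians.mean() - theta)  # ~ -0.5 = -theta/6
```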

Example ∗. Claim sizes are described by an exponential distribution with parameter θ. The probability density function of the order statistic Yk with sample size n is

\[
f_{Y_k}(y) = \frac{n!}{(k-1)!\,(n-k)!}\,F(y)^{k-1}\,[1 - F(y)]^{n-k}\,f(y)
\]

For a sample of size 5, determine the bias in using $Y_3$, the third order statistic, as an estimate of the median of the distribution.
(A) Less than $0.05\,\theta$
(B) At least $0.05\,\theta$ but less than $0.08\,\theta$
(C) At least $0.08\,\theta$ but less than $0.10\,\theta$
(D) At least $0.10\,\theta$ but less than $0.13\,\theta$
(E) At least $0.13\,\theta$

Solution. Let m be the median.

\[
F(m) = 0.5 \;\Rightarrow\; 1 - \exp\left(-\frac{m}{\theta}\right) = 0.5 \;\Rightarrow\; m = \theta \ln 2
\]
\[
f_{Y_3}(y) = \frac{5!}{2!\,2!}\left[1 - e^{-y/\theta}\right]^2 \left[e^{-y/\theta}\right]^2 \frac{1}{\theta}\,e^{-y/\theta} = \frac{30}{\theta}\left[e^{-3y/\theta} - 2\,e^{-4y/\theta} + e^{-5y/\theta}\right]
\]

\[
E(Y_3) = \int_0^\infty y\, f_{Y_3}(y)\, dy = \frac{30}{\theta}\int_0^\infty y\left[e^{-3y/\theta} - 2\,e^{-4y/\theta} + e^{-5y/\theta}\right] dy
\]

Using the Gamma integral formula $\int_0^\infty x^{\alpha-1} e^{-x/\beta}\,dx = \beta^\alpha\,\Gamma(\alpha)$:
\[
E(Y_3) = \frac{30}{\theta}\left[\left(\frac{\theta}{3}\right)^2 - 2\left(\frac{\theta}{4}\right)^2 + \left(\frac{\theta}{5}\right)^2\right] = 0.783\,\theta
\]

\[
\operatorname{Bias} = E(Y_3) - \text{median} = 0.783\,\theta - \theta \ln 2 = 0.09\,\theta
\]
Since $0.08\,\theta \le 0.09\,\theta < 0.10\,\theta$, the answer is (C).
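The arithmetic of the last two steps can be double-checked directly (values in units of $\theta$):

```python
import math

e_y3 = 30 * (1/9 - 2/16 + 1/25)  # E(Y3)/theta from the Gamma integrals
bias = e_y3 - math.log(2)        # subtract the median, theta*ln(2)
print(e_y3, bias)                # 0.7833..., 0.0902... -> choice (C)
```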

Definition. Let $\hat\theta_n$ be an estimator for $\theta$ based on a sample of size $n$. This sequence of estimators is called asymptotically unbiased if $\lim_{n\to\infty} E(\hat\theta_n) = \theta$.

Example. Let $\{X_1, \dots, X_n\}$ be an i.i.d. sample from a population $\text{Uniform}(0, \theta)$. Show that the largest order statistic $Y_n = \max(X_1, \dots, X_n)$ is asymptotically unbiased for estimating $\theta$.

Solution. For this uniform distribution the distribution function is

\[
F(y) = \frac{y}{\theta}, \qquad 0 < y < \theta
\]

Then by putting $k = n$ in the formula for the density of order statistics we have:

\[
f_{Y_n}(y) = \frac{n!}{(n-1)!\,0!}\,F(y)^{n-1}\,[1 - F(y)]^{n-n}\,f(y) = n\,F(y)^{n-1}\,f(y) = n\left(\frac{y}{\theta}\right)^{n-1}\frac{1}{\theta} = \frac{n\,y^{n-1}}{\theta^n}, \qquad 0 < y < \theta
\]

Then:
\[
E(Y_n) = \int_0^\theta y\, f_{Y_n}(y)\, dy = \frac{n}{\theta^n}\int_0^\theta y^n\, dy = \frac{n}{\theta^n}\left[\frac{y^{n+1}}{n+1}\right]_0^\theta = \frac{n}{n+1}\,\theta \;\to\; \theta \quad \text{as } n \to \infty
\]
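A simulation sketch of the convergence $E(Y_n) = \frac{n}{n+1}\theta \to \theta$ (with the arbitrary choice $\theta = 2$):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0

# Average sample maximum for growing n, next to the exact value n*theta/(n+1).
for n in (2, 10, 100):
    maxima = rng.uniform(0, theta, size=(100_000, n)).max(axis=1)
    print(n, maxima.mean(), n * theta / (n + 1))
```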

Definition. A sequence $\hat\theta_n$ of estimators is called consistent (also called weakly consistent) if for every $\delta > 0$ we have
\[
\lim_{n\to\infty} P\left(|\hat\theta_n - \theta| > \delta\right) = 0
\]

Theorem. If θˆn is asymptotically unbiased and if Var(θˆn) → 0, then θˆn is consistent.

Example. For a random sample (i.e. an i.i.d. collection) $\{X_1, X_2, \dots\}$ of random variables from a population with mean $\mu$ and variance $\sigma^2$, show that the sequence of sample means $\bar X_n = \frac{X_1 + \cdots + X_n}{n}$ is consistent for estimating $\mu$.

Solution. We already know that X¯n is unbiased. Furthermore,

\[
\operatorname{Var}(\bar X_n) = \frac{\sigma^2}{n} \;\to\; 0 \qquad \text{(using the assumption of independence)}
\]

Then from the theorem above, the sequence X¯n is consistent.

Example. Let $Y_n$ be the maximum observation from a uniform distribution on the interval $(0, \theta)$. By calculating $\operatorname{Var}(Y_n)$, show that $Y_n$ is a consistent estimator of $\theta$.

Solution. As we saw above, we have

\[
f_{Y_n}(y) = \frac{n\,y^{n-1}}{\theta^n}, \qquad 0 < y < \theta
\]

Then
\[
E(Y_n^2) = \int_0^\theta y^2\, f_{Y_n}(y)\, dy = \frac{n}{\theta^n}\int_0^\theta y^{n+1}\, dy = \frac{n}{(n+2)\,\theta^n}\left[y^{n+2}\right]_0^\theta = \frac{n\,\theta^2}{n+2}
\]

\[
\operatorname{Var}(Y_n) = E(Y_n^2) - E(Y_n)^2 = \frac{n\,\theta^2}{n+2} - \left(\frac{n\,\theta}{n+1}\right)^2 = \frac{n\,\theta^2}{(n+2)(n+1)^2} \;\to\; 0
\]

But we have already shown that $Y_n$ is asymptotically unbiased; together with $\operatorname{Var}(Y_n) \to 0$, this implies consistency.

Definition. For an estimator $\hat\theta$ of a parameter $\theta$, the mean squared error is defined to be
\[
\operatorname{MSE}_{\hat\theta}(\theta) = E\left[(\hat\theta - \theta)^2 \,\middle|\, \theta\right]
\]

Theorem.
\[
\operatorname{MSE}_{\hat\theta}(\theta) = \operatorname{Var}(\hat\theta \mid \theta) + \left[\operatorname{Bias}_{\hat\theta}(\theta)\right]^2
\]

Note. The conditioning on θ means that the expectation should be calculated under the assumption that the true parameter of the population is θ.

Example (exercise 10.10 of the textbook) ∗. Two different estimators, $\hat\theta_1$ and $\hat\theta_2$, are being considered. To test their performance, 75 trials have been simulated, each with the true value set at $\theta = 2$. The following totals were obtained:

\[
\sum_{j=1}^{75} \hat\theta_{1j} = 165, \qquad \sum_{j=1}^{75} \hat\theta_{1j}^2 = 375, \qquad \sum_{j=1}^{75} \hat\theta_{2j} = 147, \qquad \sum_{j=1}^{75} \hat\theta_{2j}^2 = 312
\]

where $\hat\theta_{ij}$ is the estimate based on the $j$-th simulation using estimator $\hat\theta_i$ ($i = 1, 2$). Estimate the MSE for each estimator and determine the relative efficiency (the ratio of the MSEs).

Solution.
\[
\operatorname{Var}(\hat\theta_1) = E(\hat\theta_1^2) - E(\hat\theta_1)^2 \approx \frac{\sum_{j=1}^{75}\hat\theta_{1j}^2}{75} - \left(\frac{\sum_{j=1}^{75}\hat\theta_{1j}}{75}\right)^2 = \frac{375}{75} - \left(\frac{165}{75}\right)^2 = 0.16
\]
\[
\operatorname{Bias}(\hat\theta_1) = E(\hat\theta_1) - \theta \approx \frac{\sum_{j=1}^{75}\hat\theta_{1j}}{75} - \theta = \frac{165}{75} - 2 = 0.2
\]
\[
\operatorname{MSE}(\hat\theta_1) = \operatorname{Var}(\hat\theta_1) + \operatorname{Bias}(\hat\theta_1)^2 \approx 0.16 + (0.2)^2 = 0.2
\]
\[
\operatorname{Var}(\hat\theta_2) \approx \frac{312}{75} - \left(\frac{147}{75}\right)^2 = 0.3184, \qquad \operatorname{Bias}(\hat\theta_2) \approx \frac{147}{75} - 2 = -0.04
\]
\[
\operatorname{MSE}(\hat\theta_2) = \operatorname{Var}(\hat\theta_2) + \operatorname{Bias}(\hat\theta_2)^2 \approx 0.3184 + (-0.04)^2 = 0.32
\]
\[
\text{relative efficiency} = \frac{\operatorname{MSE}(\hat\theta_1)}{\operatorname{MSE}(\hat\theta_2)} = \frac{0.2}{0.32} = 0.625
\]
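The whole computation is mechanical enough to script; a minimal sketch reproducing the numbers above from the four totals:

```python
n, theta = 75, 2.0
totals = {"estimator 1": (165, 375), "estimator 2": (147, 312)}  # (sum, sum of squares)

for name, (s, ss) in totals.items():
    mean = s / n
    var = ss / n - mean**2      # empirical variance of the simulated estimates
    bias = mean - theta
    print(name, var + bias**2)  # MSE estimates: 0.2 and 0.32

print(0.2 / 0.32)  # relative efficiency = 0.625
```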
