NORMAL DISTRIBUTION

Achievement Standard: 90646 (2.6) (part) external; credits 4

Key words: mean, standard deviation, standard normal, inverse normal, continuity correction.

1. The Normal Distribution - introduction

Some characteristics of a normal distribution are:

* Continuous data which has a bell-shaped histogram.
* Parameters that are the mean, μ, and the standard deviation, σ.
* About 68% of the population lie within μ ± σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ.
* Many quantities in everyday life can be approximated by a normal distribution: IQs, heights of people, lifetimes of light bulbs.
* Some probability distributions can be approximated by a normal distribution.
* The Central Limit Theorem (Confidence intervals topic) allows the use of the normal distribution in statistical decision making.
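The percentages quoted in the list above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` (the notes themselves use tables and a graphics calculator, not Python); the helper name `within` is illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within mu +/- {k} sigma: {within(k):.4f}")
```

Because every normal distribution can be rescaled to the standard normal, the same proportions hold for any μ and σ.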

2. The Standard Normal Distribution

Properties

* Total area under curve = 1
* The mean = 0
* The standard deviation = 1
* The curve is symmetrical
* P(a < Z < b) = the shaded area under the curve between a and b

Use of Standard Normal Tables

- The tables give the value P(0 < Z < a).
- Diagrams are essential.

Examples:

(a) P(0 ≤ Z ≤ 1) = 0.3413

(b) P(-1 ≤ Z ≤ 1) = 2 × 0.3413 = 0.6826

(c) P(0.3 ≤ Z ≤ 3.2) = 0.4993 – 0.1179 = 0.3814

(d) P(-0.326 ≤ Z ≤ 2.215) = 0.1278 + 0.4866 = 0.6144

(e) P(Z ≥ 0.342) = 0.5 – 0.1339 = 0.3661
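The table look-ups in examples (a) to (e) can be mirrored in code. This is a sketch only, assuming Python's standard-library `statistics.NormalDist`; the helper `table` is a hypothetical stand-in for the printed table, returning P(0 < Z < a).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def table(a):
    """Table-style value P(0 < Z < a) for a >= 0."""
    return Z.cdf(a) - 0.5

p_a = table(1)                      # (a) P(0 <= Z <= 1)
p_b = 2 * table(1)                  # (b) uses symmetry about 0
p_c = table(3.2) - table(0.3)       # (c) difference of two table values
p_d = table(0.326) + table(2.215)   # (d) one area either side of 0
p_e = 0.5 - table(0.342)            # (e) upper tail

for label, p in zip("abcde", (p_a, p_b, p_c, p_d, p_e)):
    print(f"({label}) {p:.4f}")
```

Small differences in the last decimal place compared with the worked answers come from rounding in the printed tables.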

Use of Inverse Tables

Examples:

(a) Find the value of z giving an area of 0.3770 between 0 and z.

z = 1.16

(b) Find the value of z giving an area of 0.05 to the right of z.

The area between 0 and z is 0.5 – 0.05 = 0.45, so z = 1.645

3. Using your graphics calculator for standard normal problems

STAT, DIST, NORM; then there are three options:

* Npd – you will not have to use this option
* Ncd – for calculating probabilities
* InvN – for inverse problems

NB: On your graphics calculator shaded areas are from -∞ to the point. To enter -∞ you type - EXP 99. To enter +∞ you type EXP 99.

(a) Calculating probabilities:

(i) P(-0.326 ≤ Z ≤ 2.215) = 0.6144
STAT, DIST, NORM, Ncd
lower: -0.326, upper: 2.215, σ: 1, μ: 0

(ii) P(Z ≥ 0.342) = 0.36617
lower: 0.342, upper: EXP 99, σ: 1, μ: 0

(b) Inverse calculations

Find the value of z giving an area of 0.05 to the right of z.
STAT, DIST, NORM, InvN
Area: 0.95 (NB: the area to the left of z), σ: 1, μ: 0
z = 1.645
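For readers without the calculator to hand, the Ncd and InvN steps can be imitated in software. The sketch below assumes Python's standard-library `statistics.NormalDist`; the function names `ncd` and `inv_n` are hypothetical, chosen only to echo the calculator menu, and are not a calculator API.

```python
from statistics import NormalDist

BIG = 1e99  # plays the role of EXP 99 on the calculator

def ncd(lower, upper, sigma=1.0, mu=0.0):
    """P(lower <= X <= upper), taking the same inputs as the Ncd screen."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_n(area, sigma=1.0, mu=0.0):
    """Value with the given area to its LEFT, as the InvN screen expects."""
    return NormalDist(mu, sigma).inv_cdf(area)

p_i = ncd(-0.326, 2.215)   # example (a)(i)
p_ii = ncd(0.342, BIG)     # example (a)(ii)
z_05 = inv_n(1 - 0.05)     # example (b): 0.05 to the right means 0.95 to the left

print(round(p_i, 4), round(p_ii, 5), round(z_05, 3))
```

As on the calculator, an area given to the right of z has to be converted to the area on the left before the inverse is taken.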

4. Calculating Probabilities for other Normal Random Variables

Any normal random variable, X, can be transformed into a standard normal random variable by the formula

Z = (X – μ) / σ

(Nulake p285; Sigma p358, Ex 17.01)
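A short sketch of what the transformation does: the probability for X below some value x equals the probability for Z below the standardised value z. It assumes Python's standard-library `statistics.NormalDist`, and the numbers μ = 100, σ = 15, x = 120 are made up for illustration and do not come from these notes.

```python
from statistics import NormalDist

# Illustrative values only (not from the notes).
mu, sigma = 100.0, 15.0
x = 120.0

# Standardise: how many standard deviations x lies above the mean.
z = (x - mu) / sigma

p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = NormalDist(0, 1).cdf(z)      # P(Z <= z) on the standard scale

print(round(z, 3), round(p_direct, 4), round(p_standard, 4))
```

The two probabilities agree, which is why a single standard normal table is enough for every normal distribution.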

Examples

(a) Given a normal distribution, X, with μ = 50 and σ = 10, find the probability that X assumes a value between 45 and 62.

Solution:

P(45 ≤ X ≤ 62) = P((45 – 50)/10 ≤ Z ≤ (62 – 50)/10)
= P(-0.5 ≤ Z ≤ 1.2)
= 0.3849 + 0.1915
= 0.5764

(b) (Inverse problems) A manufacturer of car tyres knows that her product has a mean life of 2.3 years with a standard deviation of 0.4 years. Assuming that the lifetime of a tyre is normally distributed, what guarantee should she offer if she only wants to pay out on 2% of tyres produced?

Solution: (Nulake p279; Sigma p366, Ex 17.03)

Let X be a R.V. representing the life of a tyre.
X is normal with μ = 2.3 and σ = 0.4
We want an x value such that P(X ≤ x) = 0.02.

P(Z ≤ z) = 0.02 is given by z = -2.05 (the area between z and 0 is 0.48)

(x – μ)/σ = -2.05
(x – 2.3)/0.4 = -2.05
x = -2.05 × 0.4 + 2.3
x = 1.48

So her guarantee should run for 1.48 years.
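The tyre guarantee can be checked with an inverse normal calculation. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Tyre life in years, as given in the problem.
life = NormalDist(mu=2.3, sigma=0.4)

# z with 2% of the area to its left, then the matching lifetime.
z = NormalDist().inv_cdf(0.02)
guarantee = life.inv_cdf(0.02)

print(round(z, 2), round(guarantee, 2))

# Same answer by undoing the standardisation: x = z * sigma + mu.
by_formula = z * 0.4 + 2.3
```

Using `inv_cdf` directly on the tyre distribution skips the standardising step, in the same way that entering σ and μ into InvN does on the calculator.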

(c) A product has a normally distributed weight with μ = 1 kg. If the risk of marketing a product more than 50 g underweight must be less than 1%, what is the maximum allowable standard deviation?

Solution: (Nulake p279; Sigma p369, Ex 17.04)

Let X be a R.V. representing the weight of the product in grams, so μ = 1000.
We want σ such that P(X ≤ 950) ≤ 0.01

P(Z ≤ (950 – 1000)/σ) ≤ 0.01
P(Z ≤ -50/σ) ≤ 0.01

For the maximum value of σ take P(Z ≤ -50/σ) = 0.01
-50/σ = -2.33
σ = 21.5

so 21.5 g is the maximum allowable standard deviation.

5. Sums and Differences of Normal Distributions

(Nulake p305; Sigma p376, Ex 18.01, 18.02, Ex 18.03)

Recall that if X and Y are two random variables:

E[aX + bY + c] = aE[X] + bE[Y] + c

Var[aX ± bY + c] = a²Var[X] + b²Var[Y], if X and Y are independent.

Assume that if Z = X + Y, and X and Y are independent and both distributed normally, then Z will also be distributed normally.
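The expectation and variance rules above can be wrapped in a small helper. This is a sketch under stated assumptions: it uses Python's standard-library `statistics.NormalDist`, the function name `combine` is hypothetical, and the two distributions in the usage lines are made-up values chosen so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def combine(a, X, b, Y, c=0.0):
    """Distribution of aX + bY + c for INDEPENDENT normal X and Y."""
    mean = a * X.mean + b * Y.mean + c
    # Variances add even for a difference, because b is squared.
    variance = a**2 * X.variance + b**2 * Y.variance
    return NormalDist(mean, sqrt(variance))

# Illustrative values only (not from the notes).
X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=4, sigma=4)

total = combine(1, X, 1, Y)        # X + Y
difference = combine(1, X, -1, Y)  # X - Y

print(total.mean, total.stdev, difference.mean, difference.stdev)
```

Note that the standard deviations do not add: here 3 and 4 combine to 5, for both the sum and the difference.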

Example 1 (a) Two containers are loaded onto a truck. The weight of each container is normally distributed with mean 2000kg and standard deviation 200kg. A truck is licensed to carry a total load of 4500kg. What is the probability that a randomly selected truck is overloaded?

Solution:

Let W1, W2 and T be random variables representing the weights of the two containers and the total weight on the truck respectively.

Then T = W1 + W2

and μT = E[W1 + W2]
= E[W1] + E[W2]
= 2 000 + 2 000
= 4 000 kg

Var[T] = Var[W1 + W2]
= Var[W1] + Var[W2] (assuming independence)
= 200² + 200²
= 80 000 kg²

σT = 283 kg

T is normal since W1 and W2 are.

P(overload) = P(T > 4 500)
= P(Z > (4500 – 4000)/283)
= P(Z > 1.77)
= 0.5 – 0.4616
= 0.0384

(b) If the cost of transporting the containers is $200 plus 25 cents per kilogram for each container, what is the probability that the cost will be less than $1 250?

Let the cost (C) be C = 0.25W1 + 0.25W2 + 200

and μC = E[0.25W1 + 0.25W2 + 200]
= 0.25E[W1] + 0.25E[W2] + 200
= $1 200

Var[C] = Var[0.25W1 + 0.25W2 + 200]
= 0.25²Var[W1] + 0.25²Var[W2] (assuming independence)
= 0.0625 × 200² + 0.0625 × 200²
= 5 000 (in dollars squared)

σC = $70.71 (2DP)

P(C < 1250) = P(Z < (1250 – 1200)/70.71)
= P(Z < 0.707)
= 0.5 + 0.2601
= 0.7601
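Both parts of Example 1 can be verified numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` and independent container weights, as in the worked solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

W_MEAN, W_SD = 2000.0, 200.0  # each container, in kg

# (a) Total weight T = W1 + W2.
t_mean = W_MEAN + W_MEAN
t_sd = sqrt(W_SD**2 + W_SD**2)           # variances add, about 283 kg
p_overload = 1 - NormalDist(t_mean, t_sd).cdf(4500)

# (b) Cost C = 0.25*W1 + 0.25*W2 + 200, in dollars.
c_mean = 0.25 * W_MEAN + 0.25 * W_MEAN + 200
c_sd = sqrt(0.25**2 * W_SD**2 + 0.25**2 * W_SD**2)  # about 70.71
p_under_1250 = NormalDist(c_mean, c_sd).cdf(1250)

print(round(p_overload, 4), round(p_under_1250, 4))
```

The results differ from the table answers only in the last decimal place, because the tables use z rounded to two or three places.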

Example 2 Weights of containers are normally distributed with mean 2 000kg and standard deviation 200kg. 50 containers are put on a ship. What is the probability that the total weight of all the containers is more than 101 tonne?

Solution:

Let C1, …, C50 be random variables representing the weight of each of the containers and T be a R.V. representing the total weight of the 50 containers.

Then T = C1 + … + C50

and E[T] = E[C1 + … + C50]
= E[C1] + … + E[C50]
= 50 × 2 000
= 100 000 kg or 100 tonne

Var[T] = Var[C1 + … + C50]
= Var[C1] + … + Var[C50] (assuming independence)
= 50 × 200²
= 2 000 000 kg²

σT = 1 414 kg or 1.414 tonne

P(T > 101) = P(Z > (101 – 100)/1.414)
= P(Z > 0.707)
= 0.5 – 0.2601
= 0.2399

6. Combining normal probabilities

(Sigma p372, Ex 17.05, select from)

The lengths of fish that Keith catches are independent of each other and are normally distributed with mean 350mm and standard deviation 95mm. A fish can be kept legally if it is more than 200mm long, otherwise it is too small. What is the probability that, if Keith catches two fish, at least one of them is legal?

Answer:

P( a legal fish) = 0.943, P( too small) = 0.057 (graphics calculator)
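The two calculator values above can be reproduced as follows. This is a sketch assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative. The last line uses the complement, "not both too small", as an alternative route to the two-fish probability.

```python
from statistics import NormalDist

# Fish length in mm, as given in the problem.
length = NormalDist(mu=350, sigma=95)

p_small = length.cdf(200)   # P(length <= 200 mm): too small to keep
p_legal = 1 - p_small       # P(length > 200 mm): legal

# Lengths are independent, so P(both too small) = p_small squared.
p_at_least_one_legal = 1 - p_small**2

print(round(p_legal, 3), round(p_small, 3), round(p_at_least_one_legal, 3))
```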

For two fish a probability tree can be drawn.

First fish legal (0.943):
    second fish legal (0.943)
    second fish too small (0.057)
First fish too small (0.057):
    second fish legal (0.943)
    second fish too small (0.057)

P(at least one legal fish) = 0.057 × 0.943 + 0.943 × 0.943 + 0.943 × 0.057 = 0.997

7. Continuity corrections

(Nulake p294; Sigma p362, Ex 17.02)

The normal distribution is sometimes used to calculate probabilities when the data under consideration is discrete. In such cases we are approximating the data by a continuous normal distribution and, in order to get a best answer, have to make an adjustment that takes this approximation into account.

Example: A sample of a certain breed of fish is weighed to the nearest kilogram (this means the data is discrete). The bar graph shows the probability distribution of weights (the area of each bar represents the probability of each weight, so all the areas add to one).

It is decided the weights can be approximated by a normal distribution with a mean of 5 kg and a standard deviation of 1.4 kg. Calculate the probability that a fish will weigh between 6 kg and 8 kg inclusive.

[Bar graph: Weights of fish to the nearest kg, horizontal axis 1 to 9]

Answer: Let X approximate the weight of a fish.
X is approximately normal with μ = 5, σ = 1.4

We must include half intervals in making a continuity correction.

P(weight between 6 and 8 kg inclusive) ≈ P(5.5 ≤ X ≤ 8.5) = 0.3543.

Note that if the question had asked for the probability that a fish was less than 8 kg the following would have to be done:

P(weight less than 8) ≈ P(X ≤ 7.5) (see diagram) = 0.963
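The half-interval adjustment can be written out explicitly. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `p_between_inclusive` is hypothetical and only illustrates the correction for weights recorded to the nearest whole kilogram.

```python
from statistics import NormalDist

# Approximating distribution for fish weight in kg.
weight = NormalDist(mu=5, sigma=1.4)

def p_between_inclusive(lo, hi):
    """P(lo <= recorded weight <= hi) for whole-kg data, with continuity correction."""
    # Each recorded value k covers the interval k - 0.5 to k + 0.5.
    return weight.cdf(hi + 0.5) - weight.cdf(lo - 0.5)

p_6_to_8 = p_between_inclusive(6, 8)

# "Less than 8" means a recorded weight of 7 or below, so stop at 7.5.
p_less_than_8 = weight.cdf(7.5)

print(round(p_6_to_8, 4), round(p_less_than_8, 3))
```

Had the question said "8 kg or less", the upper limit would instead be 8.5, so the wording of the inequality decides which half interval is included.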

8. Summary of distributions

Binomial
X is a discrete r.v. and:
1. There are a fixed number of trials (n).
2. Trials are independent.
3. Each trial has only two outcomes (success/failure).
4. The probability of success (π) remains the same for each trial.
Pop. parameter = π
Sample statistic = p

Poisson
X is a discrete r.v. and:
1. The number of outcomes in a time or region is independent of the number in another time or region.
2. The probability of a single outcome occurring in a short interval/region is proportional to the length of time/region.
3. The probability of more than one outcome occurring in a short time or region is negligible.
Pop. parameter = λ

Normal
X is a continuous r.v. and:
1. If it fits a bell-shaped symmetrical curve then we usually assume it is distributed normally.
Pop. parameters = μ, σ
Sample statistics = x̄, s