Lecture 3, Spring 2005 by Haiyan Huang


Lecture 3, STATC141 Jan. 26, Spring 2006

Introduction on Probability and Statistics (III-b)

I. Uniform Distribution. A uniform distribution is one for which the probability of occurrence is the same for all values of X. It is sometimes called a rectangular distribution. For example, if a fair die is thrown, the probability of obtaining any one of the six possible outcomes is 1/6. Since all outcomes are equally probable, the distribution is uniform. If a uniform distribution is divided into equally spaced intervals, there will be an equal number of members of the population in each interval.

The probability density function P(x) and cumulative distribution function D(x) for a continuous uniform distribution on the interval [a, b] are

P(x) = 1/(b − a) for a ≤ x ≤ b (and 0 otherwise),

D(x) = (x − a)/(b − a) for a ≤ x ≤ b (with D(x) = 0 for x < a and D(x) = 1 for x > b).
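These two functions can be sketched directly in Python. This is a minimal illustration (the function names `uniform_pdf` and `uniform_cdf` are mine, not from the lecture), assuming the standard 1/(b − a) density on [a, b]:

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Cumulative distribution: fraction of the interval [a, b] below x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Midpoint of [0, 2]: density is 1/2, and half the mass lies below it.
print(uniform_pdf(1.0, 0.0, 2.0))  # 0.5
print(uniform_cdf(1.0, 0.0, 2.0))  # 0.5
```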

II. Normal Distribution. A normal distribution in a variate X with mean μ and variance σ² has probability density function P(x) on the domain x ∈ (−∞, ∞). While statisticians and mathematicians uniformly use the term “normal distribution” for this distribution, physicists sometimes call it a Gaussian distribution and, because of its curved flaring shape, social scientists refer to it as the “bell curve.” The probability density function (pdf) P(x) is

P(x) = (1/(σ√(2π))) e^{−(x − μ)²/(2σ²)}

The so-called “standard normal distribution” is given by taking μ = 0 and σ = 1 in a general normal distribution. An arbitrary normal distribution can be converted to a standard normal distribution by changing variables to Y = (X − μ)/σ. The cumulative distribution function D(x), which gives the probability that a variate will assume a value ≤ x, is then the integral of the normal density,

D(x) = ∫_{−∞}^{x} P(t) dt = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(t − μ)²/(2σ²)} dt
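The density and the standardization Y = (X − μ)/σ can be sketched in Python; a closed form for D(x) does not exist, but the standard library's `math.erf` gives it via D(x) = (1 + erf(y/√2))/2 for the standardized value y. This is an illustrative sketch (function names are mine):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """D(x), computed by standardizing to Y = (X - mu)/sigma and using erf."""
    y = (x - mu) / sigma
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

# The standard normal is symmetric about 0, so D(0) = 0.5.
print(normal_cdf(0.0))            # 0.5
print(round(normal_pdf(0.0), 4))  # 0.3989, i.e. 1/sqrt(2*pi)
```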

III. Poisson Distribution. The Poisson distribution is most commonly used to model the number of random occurrences of some phenomenon in a specified unit of space or time. For example,

• The number of phone calls received by a telephone operator in a 10-minute period.
• The number of flaws in a bolt of fabric.
• The number of typos per page made by a secretary.

For a Poisson random variable, the probability that X is some value x is given by the formula

P(X = x) = μ^x e^{−μ} / x!,   x = 0, 1, 2, …

where μ is the average number of occurrences in the specified interval. For the Poisson distribution,

E(X) = μ,   Var(X) = μ

EXAMPLE: The number of false fire alarms in a suburb of Houston averages 2.1 per day. Assuming that a Poisson distribution is appropriate (μ = 2.1), the probability that 4 false alarms will occur on a given day is given by

P(X = 4) = 2.1⁴ e^{−2.1} / 4! ≈ 0.0992
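The Poisson formula and the false-alarm example can be checked with a few lines of Python (the function name `poisson_pmf` is mine):

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) = mu^x e^{-mu} / x! for x = 0, 1, 2, ..."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# False-alarm example: mu = 2.1 per day, probability of exactly 4 alarms.
print(round(poisson_pmf(4, 2.1), 4))  # 0.0992
```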

The Poisson distribution as a limit of the binomial. Assume that there are n independent trials and the success probability of each trial is p. The number of successful trials X, out of the n trials, then follows a binomial distribution, and

P(X = x) = C(n, x) p^x (1 − p)^{n−x} = [n! / (x!(n − x)!)] p^x (1 − p)^{n−x}.

If n → ∞, p → 0, and np → μ with μ > 0, then

lim_{n→∞} P(X = x) = lim_{n→∞} [n! / (x!(n − x)!)] p^x (1 − p)^{n−x}
= lim_{n→∞} (1/x!) (np)((n − 1)p)⋯((n − x + 1)p)(1 − μ/n)^{n−x}    (1)
= μ^x e^{−μ} / x!.

IV. Test of Significance. Was an observed difference due to chance, or to something else? Statisticians have invented “tests of significance” to deal with this sort of question. Nowadays, it is almost impossible to read a research article without running across tests and significance levels.

For example, assume that we observed 9 heads in 10 tosses of a coin. Since 9 is quite different from what we expected (5 heads), we would like to ask: is the coin fair, or not? There are two competing statements about the coin, the null hypothesis and the alternative hypothesis. Each hypothesis represents one side of the argument:

• Null hypothesis: the coin tossed is a fair coin.
• Alternative hypothesis: the coin tossed is NOT a fair coin.

The null hypothesis expresses the idea that the outcome of 9 heads in 10 tosses is due to chance, i.e., the difference is due to chance. The alternative hypothesis says that the outcome of 9 heads in 10 tosses is due to the unfairness of the coin, i.e., the difference is real.

A test statistic is used to measure the difference between the data and what is expected under the null hypothesis. The P-value of a test is the chance of getting a test statistic at least as big as the one observed, assuming the null hypothesis is right. P is not the chance of the null hypothesis being right. If the P-value is small, say less than 0.05, we reject the null hypothesis.

For the above example, we can define the test statistic as |X − 5|, where X is the number of heads we observe in 10 tosses of the coin. The observed test statistic is then 4, and the P-value is the probability of the event |X − 5| ≥ 4 when the coin is fair:

P(|X − 5| ≥ 4) = P(X = 0) + P(X = 1) + P(X = 9) + P(X = 10)
= (1/2)¹⁰ + 10(1/2)¹⁰ + 10(1/2)¹⁰ + (1/2)¹⁰ = 22/1024 ≈ 0.02

This means that if the coin is fair, there is only about a 2% probability of observing |X − 5| ≥ 4. So we would prefer to believe that the coin is NOT fair.
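The P-value in this example can be verified by summing the binomial probabilities of the four extreme outcomes (the function name `binom_pmf` is mine):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# P(|X - 5| >= 4) for X ~ Binomial(10, 1/2): the event {X in {0, 1, 9, 10}}.
p_value = sum(binom_pmf(x, 10, 0.5) for x in (0, 1, 9, 10))
print(round(p_value, 4))  # 0.0215
```

Since 0.0215 < 0.05, the test rejects the null hypothesis of a fair coin, matching the conclusion above.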
