BIO5312 Lecture 04: The Central Limit Theorem and Three Distributions Derived from the Normal Distribution

Dr. Junchao Xia, Center of Biophysics and Computational Biology

Fall 2016

Introduction

In this lecture we will cover the main concepts listed below:

- Expected value and variance of functions of random variables
- Linear combinations of random variables
- Moments and the moment-generating function
- Theoretical derivation of the central limit theorem
- Three distributions derived from the normal distribution


Expected Value of a Random Variable

Measures of location and spread can be developed for a random variable in much the same way as for samples. The analog of the arithmetic mean is called the expected value of a random variable, or population mean. It is denoted by E(X) or μ and represents the "average" value of the random variable.

E(X) = \sum_i x_i\, p(x_i) \quad \text{(discrete)} \qquad E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx \quad \text{(continuous)}

Expected value of a binomial distribution with parameters (n, p): E(X) = np.
Expected value of a Poisson distribution with μ = λt: E(X) = μ.

E(X) = \sum_{k=0}^{\infty} k\, \frac{\mu^k}{k!} e^{-\mu} = \mu e^{-\mu} \sum_{k=1}^{\infty} \frac{\mu^{k-1}}{(k-1)!} = \mu e^{-\mu} \sum_{k=0}^{\infty} \frac{\mu^k}{k!} = \mu e^{-\mu} e^{\mu} = \mu

Expected value of a normal distribution N(μ, σ²):

E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/(2\sigma^2)}\, dx
\overset{z=(x-\mu)/\sigma}{=} \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z\, e^{-z^2/2}\, dz + \frac{\mu}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\, dz = 0 + \mu = \mu
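As a quick numerical illustration (not part of the original slides), these expected values can be checked by Monte Carlo sampling; the parameter values below are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000  # number of Monte Carlo draws

# Binomial(n=10, p=0.3): theory says E(X) = np = 3.0
x_binom = rng.binomial(n=10, p=0.3, size=N)
print("binomial mean:", x_binom.mean())        # ~3.0

# Poisson with mu = 4.0: theory says E(X) = mu
x_pois = rng.poisson(lam=4.0, size=N)
print("poisson mean:", x_pois.mean())          # ~4.0

# Normal(mu=1.5, sigma=2.0): theory says E(X) = mu
x_norm = rng.normal(loc=1.5, scale=2.0, size=N)
print("normal mean:", x_norm.mean())           # ~1.5
```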


Variance of a Random Variable

The analog of the sample variance (s²) for a random variable is called the variance of a random variable, or population variance, and is denoted by Var(X) or σ². The standard deviation of a random variable X, denoted by sd(X) or σ, is defined as the square root of its variance.

Var(X) = E\{[X - E(X)]^2\} = E[(X-\mu)^2] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - \mu^2
= \sum_i (x_i - \mu)^2 p(x_i) \quad \text{(discrete)} \qquad \int (x-\mu)^2 f(x)\, dx \quad \text{(continuous)}

Variance of a binomial distribution with parameters (n, p): Var(X) = npq.
Variance of a Poisson distribution: σ² = μ = λt.
Variance of a normal distribution N(μ, σ²): Var(X) = σ².

Var(X) = E[(X-\mu)^2] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x-\mu)^2 e^{-(x-\mu)^2/(2\sigma^2)}\, dx
\overset{z=(x-\mu)/\sigma}{=} \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\, dz
\overset{u=z^2/2}{=} \frac{2\sigma^2}{\sqrt{\pi}} \int_0^{\infty} u^{1/2} e^{-u}\, du = \frac{2\sigma^2}{\sqrt{\pi}}\,\Gamma\!\left(\tfrac{3}{2}\right) = \sigma^2
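A brief numerical check of these variance formulas (an illustrative sketch, with arbitrary parameter choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

# Binomial(n=10, p=0.3): theory says Var(X) = npq = 10*0.3*0.7 = 2.1
print("binomial var:", rng.binomial(10, 0.3, N).var())     # ~2.1

# Poisson with mu = 4.0: theory says Var(X) = mu = 4.0
print("poisson var:", rng.poisson(4.0, N).var())           # ~4.0

# Normal(mu=1.5, sigma=2.0): theory says Var(X) = sigma^2 = 4.0
print("normal var:", rng.normal(1.5, 2.0, N).var())        # ~4.0
```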


Expected Value and Variance of Functions of Random Variables

Expected value of functions of random variables: if Y = g(X), then

E(Y) = \sum_i g(x_i)\, p(x_i) \quad \text{(discrete)} \qquad E(Y) = \int g(x)\, f(x)\, dx \quad \text{(continuous)}

E(X + Y) = E(X) + E(Y) for any random variables, independent or not.
If X and Y are independent random variables, E(XY) = E(X)E(Y), and more generally

E[g(X)h(Y)] = \sum_i \sum_j g(x_i)\, h(y_j)\, p(x_i)\, p(y_j) = E[g(X)]\, E[h(Y)]

If a and b are constants, E(a + bX) = a + bE(X).

Variance of functions of random variables: if Y = g(X), then

Var(Y) = \sum_i \big[g(x_i) - E[g(X)]\big]^2 p(x_i) \quad \text{(discrete)} \qquad \int \big[g(x) - E[g(X)]\big]^2 f(x)\, dx \quad \text{(continuous)}

If X and Y are independent random variables, Var(X ± Y) = Var(X) + Var(Y).

If a and b are constants,

Var(a + bX) = E\{[a + bX - a - bE(X)]^2\} = E\{b^2 [X - E(X)]^2\} = b^2\, Var(X)


Linear Combinations of Random Variables

Sums, differences, or more complicated linear functions of random variables (either continuous or discrete) are often used.

A linear combination L of the random variables X1, …, Xn is defined as any function of the form L = c1X1 + … + cnXn. A linear combination is also called a linear contrast. To compute the expected value and variance of linear combinations of random variables, we use the principle that the expected value of the sum of n random variables is the sum of the n respective expected values.

Whether or not the X_i are independent of each other,

L = \sum_{i=1}^{n} c_i X_i, \qquad E(L) = \sum_{i=1}^{n} c_i E(X_i) = \sum_{i=1}^{n} c_i \mu_i

The variance of L when X1, …, Xn are independent is

Var(L) = \sum_{i=1}^{n} c_i^2\, Var(X_i) = \sum_{i=1}^{n} c_i^2 \sigma_i^2

The corresponding case for dependent random variables will be shown later; a simulation sketch of the independent case follows below.
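A minimal simulation sketch of these two rules, assuming independent normal components with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000

# L = 2*X1 - 3*X2 with independent X1 ~ N(1, 2^2), X2 ~ N(4, 1^2)
c = np.array([2.0, -3.0])
mu = np.array([1.0, 4.0])
sigma = np.array([2.0, 1.0])

X = rng.normal(mu, sigma, size=(N, 2))   # each row is one draw of (X1, X2)
L = X @ c                                # linear combination per row

print("E(L): simulated", L.mean(), " theory", c @ mu)                  # 2*1 - 3*4 = -10
print("Var(L): simulated", L.var(), " theory", (c**2) @ (sigma**2))    # 4*4 + 9*1 = 25
```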

Covariance of Dependent Random Variables

The covariance between two random variables X and Y is denoted by Cov(X, Y) and is defined by

Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E[(X - \mu_x)(Y - \mu_y)]

= E(XY - X\mu_y - Y\mu_x + \mu_x\mu_y) = E(XY) - \mu_y E(X) - \mu_x E(Y) + \mu_x\mu_y = E(XY) - E(X)E(Y)

where μ_x is the average value of X, μ_y is the average value of Y, and E(XY) is the average value of the product of X and Y.

Cov(a + X, Y) = Cov(X, Y), \qquad Cov(aX, bY) = ab\, Cov(X, Y), \qquad Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)

If X and Y are independent, then the covariance between them is 0. If large values of X and Y occur among the same subjects (as well as small values of X and Y), then the covariance is positive. If large values of X and small values of Y (or, conversely, small values of X and large values of Y) tend to occur among the same subjects, then the covariance is negative. To obtain a measure of relatedness or association between two random variables X and Y, we consider the correlation coefficient, denoted by Corr(X, Y) or ρ, and defined by

\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_x \sigma_y}

where σ_x and σ_y are the standard deviations of X and Y, respectively. It is a dimensionless quantity that is independent of the units of X and Y and ranges between -1 and 1.
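A short illustrative sketch (parameters chosen only for the example) that estimates Cov(X, Y) and Corr(X, Y) from a simulated bivariate normal sample:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000

# Bivariate normal with sd_x = 2, sd_y = 3 and true correlation rho = 0.6
rho = 0.6
cov_true = np.array([[4.0, rho * 2 * 3],
                     [rho * 2 * 3, 9.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=N)
x, y = xy[:, 0], xy[:, 1]

cov_xy = np.mean(x * y) - x.mean() * y.mean()   # Cov = E(XY) - E(X)E(Y)
corr_xy = cov_xy / (x.std() * y.std())          # Corr = Cov / (sd_x * sd_y)
print("Cov(X,Y):", cov_xy, " (theory 3.6)")
print("Corr(X,Y):", corr_xy, " (theory 0.6)")
```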

Examples of Correlated Random Variables

If X and Y are approximately linearly related, a correlation coefficient of 0 implies independence. A correlation coefficient close to 1 implies nearly perfect positive dependence, with large values of X corresponding to large values of Y and small values of X corresponding to small values of Y. A correlation coefficient close to -1 implies nearly perfect negative dependence, with large values of X corresponding to small values of Y and vice versa.

Linear Combinations of Dependent Random Variables

To compute the variance of a linear contrast involving two dependent random variables X1 and X2, we can use the general equation

Var(c_1 X_1 + c_2 X_2) = c_1^2\, Var(X_1) + c_2^2\, Var(X_2) + 2 c_1 c_2\, Cov(X_1, X_2)

More generally, the variance of the linear contrast L = \sum_{i=1}^{n} c_i X_i is

Var(L) = \sum_{i=1}^{n} c_i^2\, Var(X_i) + 2 \sum_{i<j} c_i c_j\, Cov(X_i, X_j)
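A minimal check of the dependent-variable formula above, reusing the correlated normal setup (illustrative parameters only):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000

# X1, X2 correlated normals: Var(X1) = 4, Var(X2) = 9, Cov(X1, X2) = 3.6
cov = np.array([[4.0, 3.6],
                [3.6, 9.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=N)

c1, c2 = 2.0, -3.0
L = c1 * X[:, 0] + c2 * X[:, 1]

theory = c1**2 * 4.0 + c2**2 * 9.0 + 2 * c1 * c2 * 3.6   # 16 + 81 - 43.2 = 53.8
print("Var(L): simulated", L.var(), " theory", theory)
```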

Moments and Higher-Order Statistics

In statistics, higher-order statistics involve using the third or higher power of a sample, such as the third or higher moments defined below.

m-th moment: \mu_m' = E(X^m) = \sum_i x_i^m\, p(x_i) \quad \text{(discrete)} \qquad \int x^m f(x)\, dx \quad \text{(continuous)}

\mu_1' = E(X) = \mu, \qquad \mu_2' = E(X^2) = \sigma^2 + [E(X)]^2 = \sigma^2 + \mu^2

m-th central moment: E[(X - \mu)^m]

The normalised m-th moment, or standardized moment, is the m-th central moment divided by σ^m:

\text{m-th normalised central moment} = E[(X - \mu)^m] / \sigma^m

Skewness and Kurtosis

The normalised third central moment is called the skewness, often denoted by γ:

\gamma = E[(X - \mu)^3] / \sigma^3, \qquad \gamma = 0 \text{ for a normal distribution}

The normalised fourth central moment, or standardized fourth moment, is called the kurtosis, often denoted by κ:

\kappa = E[(X - \mu)^4] / \sigma^4

The excess kurtosis is defined as the kurtosis minus 3. For a normal distribution, the excess kurtosis = 0. Excess kurtosis > 0 indicates fatter tails than a normal distribution; excess kurtosis < 0 indicates thinner tails than a normal distribution.
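These definitions match what scipy.stats computes; a small sketch (the two distributions below are arbitrary examples):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(5)
N = 1_000_000

normal_sample = rng.normal(0.0, 1.0, N)
expon_sample = rng.exponential(1.0, N)   # right-skewed, heavy right tail

# skew() returns gamma; kurtosis() returns the EXCESS kurtosis (kappa - 3) by default
print("normal: skewness", skew(normal_sample),
      " excess kurtosis", kurtosis(normal_sample))        # ~0, ~0
print("exponential: skewness", skew(expon_sample),
      " excess kurtosis", kurtosis(expon_sample))         # ~2, ~6
```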

Sampling Distribution

How is a specific random sample X1, …, Xn used to estimate μ and σ², the mean and variance of the underlying distribution? A natural estimator of μ is the sample mean: take the n sampled values and average them,

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

x̄ is a single realization of the random variable X̄ over all possible samples of size n that could have been selected from the population.

X̄ denotes a random variable, and x̄ denotes a specific realization of the random variable X̄ in a particular sample.

The sampling distribution of X̄ is the distribution of values of x̄ over all possible samples of size n that could have been selected from the reference population.

Example of Sampling Distribution

The figure on this slide shows the frequency distribution of the sample mean from 200 randomly selected samples of size 10. The expected value of X̄ over its sampling distribution is equal to μ.
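A sketch reproducing this kind of experiment numerically (the underlying population below is an arbitrary choice, N(5, 2²)):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma = 5.0, 2.0
n_samples, n = 200, 10

# Draw 200 samples of size 10 and compute each sample mean
samples = rng.normal(mu, sigma, size=(n_samples, n))
means = samples.mean(axis=1)

print("mean of the 200 sample means:", means.mean())        # ~ mu = 5
print("sd of the 200 sample means:", means.std(ddof=1))     # ~ sigma/sqrt(n) ~ 0.63
print("theory: mu =", mu, ", sigma/sqrt(n) =", sigma / np.sqrt(n))
```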

Central Limit Theorem

Let X1, …, Xn be a random sample from some population with mean μ and variance σ². Then for large n, X̄ ≈ N(μ, σ²/n), even if the underlying distribution of individual observations in the population is not normal. (The symbol ≈ is used here to represent "approximately distributed.")

This theorem allows us to perform statistical inference based on the approximate normality of the sample mean despite the nonnormality of the distribution of individual observations.

The skewness of a distribution can be reduced by transforming the data to a log scale. The central-limit theorem can then be applied for smaller sample sizes than if the data are retained in the original scale.

Examples for Central Limit Theorem
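As an illustrative numerical example (assuming NumPy/SciPy are available; the population and sample sizes are arbitrary choices): sample means from a strongly skewed exponential population look increasingly normal as n grows.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)
n_samples = 10_000

# Exponential population (mean 1, strongly right-skewed, skewness = 2)
for n in (1, 5, 30, 200):
    means = rng.exponential(1.0, size=(n_samples, n)).mean(axis=1)
    print(f"n = {n:4d}: skewness of sample means = {skew(means):.3f}")
# The skewness of the sample means shrinks toward 0 (normal shape) as n increases.
```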

Moment-Generating Function

The moment-generating function (mgf) of a random variable X is defined as

M(t) = E(e^{tX}) = \sum_i e^{t x_i} p(x_i) \quad \text{(discrete)} \qquad \int e^{tx} f(x)\, dx \quad \text{(continuous)}

Property A: If the moment-generating function exists for t in an open interval containing zero, it uniquely determines the distribution.
Property B: If the moment-generating function exists in an open interval containing zero, then the n-th derivative at t = 0 gives the n-th moment: M^{(n)}(0) = E(X^n).

Property C: If X has the mgf M_X(t) and Y = a + bX, then Y has the mgf M_Y(t) = e^{at} M_X(bt).
Property D: If X and Y are independent random variables with mgf's M_X and M_Y, and Z = X + Y, then M_Z(t) = M_X(t) M_Y(t) on the common interval where both mgf's exist.

Moment-Generating Function of the Standard Normal Distribution

The moment-generating function (mgf) of the standard normal distribution is defined as

M(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-x^2/2}\, dx

Since tx - \frac{x^2}{2} = -\frac{1}{2}(x^2 - 2tx + t^2) + \frac{t^2}{2} = -\frac{(x-t)^2}{2} + \frac{t^2}{2}, therefore (with u = x - t)

M(t) = \frac{e^{t^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-t)^2/2}\, dx = \frac{e^{t^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\, du = e^{t^2/2}

E(X) = M'(0) = t e^{t^2/2} \big|_{t=0} = 0
E(X^2) = M''(0) = \left( e^{t^2/2} + t^2 e^{t^2/2} \right) \big|_{t=0} = 1
Var(X) = E(X^2) - [E(X)]^2 = 1
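A quick numerical sanity check of M(t) = e^{t²/2} (a sketch; the grid of t values is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
z = rng.standard_normal(2_000_000)

for t in (0.5, 1.0, 1.5):
    mgf_mc = np.exp(t * z).mean()        # Monte Carlo estimate of E(e^{tZ})
    mgf_theory = np.exp(t**2 / 2)        # closed form for the standard normal
    print(f"t = {t}: estimate {mgf_mc:.4f}, theory {mgf_theory:.4f}")
```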

Theoretical Derivation of the Central Limit Theorem

Let S_n = n\bar{X}_n = \sum_{i=1}^{n} X_i, where the X_i are independent and identically distributed with mean 0 and variance \sigma^2.

Let Z_n = S_n / (\sigma \sqrt{n}).

We need to show that the mgf of Z_n tends to the mgf of the standard normal distribution.

Since S_n is a sum of independent random variables,

M_{S_n}(t) = [M(t)]^n \quad \text{and} \quad M_{Z_n}(t) = \left[ M\!\left( \frac{t}{\sigma\sqrt{n}} \right) \right]^n

M(s) has a Taylor series expansion about zero:

M(s) = M(0) + s M'(0) + \frac{1}{2} s^2 M''(0) + \varepsilon_s, \quad \text{where } \varepsilon_s / s^2 \to 0 \text{ as } s \to 0

Since E(X) = 0: M(0) = 1, M'(0) = 0, M''(0) = \sigma^2. As n \to \infty, t/(\sigma\sqrt{n}) \to 0, and

M\!\left( \frac{t}{\sigma\sqrt{n}} \right) = 1 + \frac{1}{2} \sigma^2 \left( \frac{t}{\sigma\sqrt{n}} \right)^2 + \varepsilon_n, \quad \text{where } \varepsilon_n / \big( t^2/(n\sigma^2) \big) \to 0 \text{ as } n \to \infty

We therefore have

M_{Z_n}(t) = \left( 1 + \frac{t^2}{2n} + \varepsilon_n \right)^n

It can be shown that if a_n \to a, then \lim_{n\to\infty} \left( 1 + \frac{a_n}{n} \right)^n = e^a.

From this result we get M_{Z_n}(t) \to e^{t^2/2} as n \to \infty, where e^{t^2/2} is the mgf of the standard normal distribution.
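A tiny numerical illustration of the limit used in the last step (the values of t and n are chosen arbitrarily):

```python
import numpy as np

t = 1.0
target = np.exp(t**2 / 2)                 # mgf of the standard normal at t
for n in (10, 100, 1000, 10000):
    approx = (1 + t**2 / (2 * n)) ** n    # (1 + t^2/(2n))^n
    print(f"n = {n:6d}: {approx:.6f}  ->  e^(t^2/2) = {target:.6f}")
```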

Chi-Square Distribution

To obtain an interval estimate for σ², a new family of distributions, called chi-square (χ²) distributions, must be introduced to enable us to find the sampling distribution of S² from sample to sample.

If Z is a standard normal random variable, the distribution of U = Z² is called the chi-square distribution with 1 degree of freedom, denoted by \chi^2_1.
If X \sim N(\mu, \sigma^2), then (X - \mu)/\sigma \sim N(0, 1), and therefore [(X - \mu)/\sigma]^2 \sim \chi^2_1.

If U_1, U_2, \ldots, U_n are independent chi-square random variables with 1 degree of freedom each, the distribution of V = U_1 + U_2 + \cdots + U_n is called the chi-square distribution with n degrees of freedom and is denoted by \chi^2_n, with the pdf

f(v) = g_{n/2,\,1/2}(v) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, v^{(n/2)-1} e^{-v/2}, \quad v \ge 0

Gamma density function: g_{\alpha,\lambda}(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-\lambda t}, \quad t \ge 0

Gamma function: \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx, \quad \alpha > 0

M(t) = (1 - 2t)^{-n/2}, \quad E(V) = n, \quad Var(V) = 2n

Example of Chi-Square Distribution

If G = X_1^2 + X_2^2 + \cdots + X_n^2, where X1, …, Xn ~ N(0, 1) and the Xi's are independent, then G is said to follow a chi-square distribution with n degrees of freedom (df). The distribution is often denoted by \chi^2_n. The chi-square distribution takes on only positive values and is always skewed to the right.

For n ≥ 3, the distribution has a mode greater than 0 and is skewed to the right. The skewness diminishes as n increases.

The u-th percentile of a \chi^2_d distribution (that is, a chi-square distribution with d df) is denoted by \chi^2_{d,u}, where Pr(\chi^2_d < \chi^2_{d,u}) \equiv u.
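An illustrative sketch using SciPy to look up chi-square percentiles and to confirm the sum-of-squares construction (the df and percentile values are arbitrary examples):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(9)
df = 5

# 95th percentile of chi-square with 5 df: Pr(chi2_5 < value) = 0.95
print("chi2_{5, 0.95} =", chi2.ppf(0.95, df))      # ~11.07

# A sum of df squared standard normals should follow chi-square with df degrees of freedom
g = (rng.standard_normal(size=(1_000_000, df)) ** 2).sum(axis=1)
print("simulated mean:", g.mean(), " (theory:", df, ")")
print("simulated var:", g.var(), " (theory:", 2 * df, ")")
print("simulated 95th percentile:", np.quantile(g, 0.95))
```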

Student's t Distribution

If Z \sim N(0, 1) and U \sim \chi^2_n, and Z and U are independent, then the distribution of W = Z / \sqrt{U/n} is called the t distribution with n degrees of freedom. This problem was first worked out by a statistician named William Gossett ("Student"). The t distribution is not unique but is a family of distributions indexed by a parameter, the degrees of freedom (df) of the distribution. The density function with n degrees of freedom is

f(t) = \frac{\Gamma[(n+1)/2]}{\sqrt{n\pi}\,\Gamma(n/2)} \left( 1 + \frac{t^2}{n} \right)^{-(n+1)/2}

If X1, …, Xn ~ N(μ, σ²) and are independent random variables, then (\bar{X} - \mu)/(s/\sqrt{n}) is distributed as a t distribution with (n - 1) df.

The t distribution with d degrees of freedom is referred to as the t_d distribution. The 100 × u-th percentile of a t distribution with d degrees of freedom is denoted by t_{d,u}, that is, Pr(t_d < t_{d,u}) \equiv u.

The density is symmetric, f(t) = f(-t), and as n \to \infty it approaches the standard normal density.
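A brief simulation sketch of the construction W = Z/\sqrt{U/n}, compared with SciPy's t distribution (the df value is chosen arbitrarily):

```python
import numpy as np
from scipy.stats import t, kstest

rng = np.random.default_rng(10)
n = 7
size = 500_000

z = rng.standard_normal(size)
u = rng.chisquare(n, size)
w = z / np.sqrt(u / n)            # should follow a t distribution with n df

# Kolmogorov-Smirnov test against t_n: a large p-value is consistent with t_n
print(kstest(w, t(df=n).cdf))
print("95th percentile: simulated", np.quantile(w, 0.95), " theory", t.ppf(0.95, n))
```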

F Distribution

Let U and V be independent chi-square random variables with m and n degrees of freedom, respectively. The distribution of the new random variable W = (U/m)/(V/n) is called the F distribution with m and n degrees of freedom and is denoted by F_{m,n}, with the density function

f(w) = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left( \frac{m}{n} \right)^{m/2} w^{m/2 - 1} \left( 1 + \frac{m}{n} w \right)^{-(m+n)/2}, \quad w \ge 0
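A small sketch checking the ratio construction against SciPy's F distribution (the degrees of freedom are arbitrary example values):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(11)
m, n = 4, 12
size = 500_000

u = rng.chisquare(m, size)
v = rng.chisquare(n, size)
w = (u / m) / (v / n)             # should follow F_{m,n}

print("95th percentile: simulated", np.quantile(w, 0.95), " theory", f.ppf(0.95, m, n))
print("mean: simulated", w.mean(), " theory n/(n-2) =", n / (n - 2))
```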

Summary

In this chapter, we discussed

- Expected value and variance of functions of random variables
- Moment-generating function
- Central-limit theorem
- Three distributions derived from the normal distribution: the t, chi-square, and F distributions

The End
