Statistics Terms and Concepts
The random variable, X -- The outcome of one single occurrence; Mean = µ, Standard Deviation = σ
The random variable, X̄ -- The average outcome of n occurrences; Mean = µ, Standard Deviation = σ/√n*
*(σ/√n is called the standard error)
Distribution of X̄ -- the determination of the type of hypothesis test or confidence interval used
If X is normal and σ is known, then X̄ is normal
If X is normal and σ is unknown, then X̄ has a t-distribution
Even if X is not normal, if σ is known X̄ is approximately normal if n is large*
Even if X is not normal, if σ is unknown X̄ has approximately a t-distribution if n is large*
*(This is the Central Limit Theorem)
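A small simulation can make the behavior of the sample average concrete. The sketch below is illustrative only: the exponential population (µ = 1, σ = 1), the sample size n = 40, the number of trials, and the function name simulate_sample_means are all assumptions chosen for the example. It checks that the averages center on µ and that their spread is close to the standard error σ/√n, even though the population is not normal.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(n, trials=5000, seed=1):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


# Assumed population parameters for the exponential(1) distribution.
mu, sigma, n = 1.0, 1.0, 40
means = simulate_sample_means(n)

print("mean of sample means:", round(statistics.fmean(means), 3))  # near mu
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # near sigma/sqrt(n)
print("sigma / sqrt(n):     ", round(sigma / sqrt(n), 3))
```

A histogram of `means` would look roughly bell-shaped, which is the Central Limit Theorem at work for a large n.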
Requirement for doing confidence intervals and hypothesis tests
(1) Sample from a normal distribution OR (2) Take a large sample, n.
Confidence Intervals -- Meaning
A 95% confidence interval* means that if repeated samples of size n are taken and confidence intervals constructed according to the formulas, in the long run 95% of these intervals will contain the true value of µ.
*(In this case 95% is called the confidence level, .95 is called the confidence coefficient, and α = .05 is the proportion that will NOT contain µ.)
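The "long run" reading of a confidence interval can be checked by simulation. The following is a rough sketch, not part of the original handout: it assumes a normal population with known σ, uses the σ-known interval x̄ ± zα/2 · σ/√n, and the values µ = 50, σ = 10, n = 25 and the function name coverage are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, conf_level=0.95, trials=4000, seed=7):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value with alpha/2 to its right
    margin = z * sigma / sqrt(n)              # margin of error
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


# Hypothetical population values; the result should land close to 0.95.
print(coverage(mu=50, sigma=10, n=25))
```

Any single interval either contains µ or it does not; the 95% describes the procedure over many repetitions.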
Form of Confidence Interval for µ: Point Estimate ± Margin of Error
Point Estimate for µ -- x̄ (x̄ is the best single “guess” for µ)
Margin of Error:
If σ is known: z-intervals: zα/2 (Standard Error) = zα/2 · σ/√n*
If σ is unknown: t-intervals: tα/2,n-1 (Standard Error) = tα/2,n-1 · s/√n*
*(zα/2 means the z-value that puts probability α/2 to the right of this z-value; tα/2,n-1 means the t-value that puts α/2 to the right of this t-value with n-1 degrees of freedom)
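As a minimal sketch of the σ-known case, the z-interval can be computed with Python's standard library. The function name z_interval and the sample values are hypothetical, chosen only to illustrate the formula x̄ ± zα/2 · σ/√n.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(data, sigma, conf_level=0.95):
    """Return (LCL, UCL) for the mean when sigma is known."""
    n = len(data)
    xbar = fmean(data)                          # point estimate for mu
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z-value with alpha/2 to its right
    margin = z * sigma / sqrt(n)                # margin of error
    return xbar - margin, xbar + margin


# Hypothetical data with an assumed known sigma of 0.3.
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2, 9.7]
lcl, ucl = z_interval(sample, sigma=0.3)
print(round(lcl, 3), round(ucl, 3))
```

The t-interval has the same shape with s in place of σ and tα/2,n-1 in place of zα/2. The standard library has no t-distribution, so that critical value would have to come from a table or a package such as SciPy (scipy.stats.t.ppf).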
Calculating Correct Sample Sizes to Meet a Specific Margin of Error (E) (with α and σ given)
1. Set zα/2 · σ/√n = E.
2. Then √n = zα/2 · σ/E -- Solve for n
3. n = (√n)² = (zα/2 · σ/E)² --- most likely this will give a number with a decimal --- ROUND UP!

Hypothesis Tests – Basic Idea
You are satisfied (you prove that) µ > v if x̄ is “a lot” greater than v.
You are satisfied (you prove that) µ < v if x̄ is “a lot” less than v.
You are satisfied (you prove that) µ ≠ v if x̄ is “a lot” different from v (higher or lower).
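Returning to the sample-size calculation, the three steps (set the margin of error equal to E, solve for n, round up) reduce to a one-line formula. This is a sketch under the same assumptions as the text (σ known, α given); the function name required_sample_size and the input values are illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin_of_error, conf_level=0.95):
    """Smallest n whose z-interval margin of error is at most E."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z-value with alpha/2 to its right
    n_exact = (z * sigma / margin_of_error) ** 2     # usually not a whole number
    return ceil(n_exact)                             # always round UP


# Hypothetical inputs: sigma = 10, desired margin of error E = 1, 95% confidence.
print(required_sample_size(sigma=10, margin_of_error=1))  # 385
```

Rounding up rather than to the nearest integer matters: rounding down would give a margin of error slightly larger than E.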
The Hypotheses
Null Hypothesis: H0: µ = v
Alternate Hypothesis: HA: µ > v OR µ < v OR µ ≠ v
What you are trying to show is the alternate hypothesis. You must get “strong evidence” to show this is true. The null hypothesis is the one you cannot reject unless you get strong evidence to the contrary.
We reject or do not reject the null hypothesis – we never accept it! We accept or say we do not have enough evidence to accept the alternate hypothesis.
Level of Significance – the risk you are willing to take of concluding HA is true when it’s not* This is α. We assign this. *(This is called a TYPE I Error)
Form of the Test Statistic (z or t) -- how many standard errors x̄ is from v: (x̄ − v)/(Standard Error)
For z-tests, z = (x̄ − v)/(σ/√n)
For t-tests, t = (x̄ − v)/(s/√n)
Form of the Hypothesis Tests* -- (i.e. What is “a lot”?)
“>” Tests
z-test: Reject H0 (Accept HA) if z > zα
t-test: Reject H0 (Accept HA) if t > tα,n-1
“<” Tests
z-test: Reject H0 (Accept HA) if z < -zα
t-test: Reject H0 (Accept HA) if t < -tα,n-1
“≠” Tests
z-test: Reject H0 (Accept HA) if |z| > zα/2
t-test: Reject H0 (Accept HA) if |t| > tα/2,n-1

p-value -- Low p-values (< α) give strong evidence that HA is true. Each probability is computed assuming H0 is true.
> tests: p-value = probability of getting an x̄ value greater than the one we got (area to the right)
< tests: p-value = probability of getting an x̄ value less than the one we got (area to the left)
≠ tests: p-value = probability of getting an x̄ value further away from v than the one we got (= twice the area in the tail)
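The z-test statistic and the three p-value rules can be sketched in a few lines of Python. This is an illustration, not the handout's own method: the function name z_test, the "!=" label for the ≠ test, and the numbers in the example are assumptions. It covers only the σ-known (z) case, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, v, sigma, n, alternative):
    """Return (z, p_value) for H0: mu = v against HA given by `alternative`.

    alternative is one of ">", "<", "!=" (the last standing for the "≠" test).
    """
    z = (xbar - v) / (sigma / sqrt(n))      # standard errors between xbar and v
    cdf = NormalDist().cdf                  # standard normal cumulative probability
    if alternative == ">":
        p_value = 1 - cdf(z)                # area to the right
    elif alternative == "<":
        p_value = cdf(z)                    # area to the left
    elif alternative == "!=":
        p_value = 2 * (1 - cdf(abs(z)))     # twice the area in the tail
    else:
        raise ValueError("alternative must be '>', '<' or '!='")
    return z, p_value


# Hypothetical example: xbar = 52.5 from n = 64 observations, sigma = 10, v = 50.
alpha = 0.05
z, p = z_test(xbar=52.5, v=50, sigma=10, n=64, alternative=">")
print(round(z, 2), round(p, 4))
print("Reject H0" if p < alpha else "Do not reject H0")
```

Comparing the p-value with α gives the same decision as comparing z with zα (or |z| with zα/2 for the ≠ test).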
Calculating β – the probability of not concluding HA is true when it is* for a given true value of µ
*(This is called a TYPE II error)
1. Express the test in terms of x̄. For a “> test”: Accept HA if x̄ > v + zα · σ/√n = x̄crit
2. For a “> test” find z for the given value of µ: z = (x̄crit − µ)/(σ/√n). β is the area to the left of z.

EXCEL APPROACHES
Must be sampling from a normal distribution or take a large sample
Use z with σ known and t with σ unknown

z-intervals
1. Calculate x̄ by: =AVERAGE(of the column of x’s)
2. Calculate the Margin of Error by: =CONFIDENCE(α, σ, n)
3. LCL: x̄ - Margin of Error UCL: x̄ + Margin of Error

t-intervals
1. Go to Data Analysis/Descriptive Statistics – Input Range: (The column of x’s). Check Summary Statistics and Confidence Level for Mean
2. From output LCL: Mean – Confidence UCL: Mean + Confidence

z-tests for HA: µ > v OR µ < v OR µ ≠ v
1. Calculate x̄ by: =AVERAGE(of the column of x’s)
2. Calculate z by: =(x̄ - v)/(σ/SQRT(n))
3. Get p-value:
“> Tests” p-value = 1-NORMSDIST(z)
“< Tests” p-value = NORMSDIST(z)
“≠ Tests and z > 0” p-value = 2*(1-NORMSDIST(z))
“≠ Tests and z < 0” p-value = 2*NORMSDIST(z)
4. If p-value < α, enough evidence to conclude HA is true
If p-value > α, not enough evidence to conclude HA is true

t-tests for HA: µ > v OR µ < v OR µ ≠ v
1. Go to Data Analysis/Descriptive Statistics – Input Range: (The column of x’s). Check Summary Statistics and Confidence Level for Mean
2. From output, calculate t by: =(Mean - v)/Standard Error
3. Get p-value:
“> Tests” with t > 0 p-value = TDIST(t,n-1,1)
“< Tests” with t < 0 p-value = TDIST(-t,n-1,1)
“≠ Tests” with t > 0 p-value = TDIST(t,n-1,2)
“≠ Tests” with t < 0 p-value = TDIST(-t,n-1,2)
4. If p-value < α, enough evidence to conclude HA is true
If p-value > α, not enough evidence to conclude HA is true
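For readers working outside Excel, the two-step β calculation for a “> test” described earlier can be sketched in Python. The function name type_ii_error_greater and the example numbers are hypothetical; the sketch assumes σ is known and that x̄ is (approximately) normal.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_greater(v, true_mu, sigma, n, alpha=0.05):
    """beta for the test H0: mu = v vs HA: mu > v when the true mean is true_mu."""
    std_error = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # z-value with alpha to its right
    xbar_crit = v + z_alpha * std_error          # step 1: accept HA if xbar > xbar_crit
    z = (xbar_crit - true_mu) / std_error        # step 2: standardize under the true mu
    return NormalDist().cdf(z)                   # beta = area to the left of z


# Hypothetical example: v = 50, true mean 53, sigma = 10, n = 64, alpha = 0.05.
beta = type_ii_error_greater(v=50, true_mu=53, sigma=10, n=64)
print(round(beta, 3))
```

As a sanity check on the logic, setting true_mu equal to v returns 1 − α, and β shrinks as the true mean moves further above v or as n grows.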