City University of New York (CUNY) CUNY Academic Works

Open Educational Resources Queensborough Community College

2020

Clear-Sighted Statistics: Appendix 3: Common Statistical Symbols and Formulas

Edward Volchok CUNY Queensborough Community College


More information about this work at: https://academicworks.cuny.edu/qb_oers/143 Discover additional works at: https://academicworks.cuny.edu

This work is made publicly available by the City University of New York (CUNY).

Clear-Sighted Statistics: An OER Textbook

Appendix 3: Common Statistical Symbols and Formulas

I. Introduction

This appendix lists common statistical symbols and formulas used in Clear-Sighted Statistics.

The terms and formulas presented here are explained in detail in the appropriate modules of Clear-Sighted Statistics.

II. Common Statistical Symbols and Formulas

A. Module 4: Picturing with Tables and Charts
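A minimal Python sketch of the Module 4 rules in Table 1 follows. The helper names and data values are hypothetical, and rounding the class interval up to a whole number is an assumed convenience; any width of at least (H − L)/k satisfies the rule.

```python
import math


def number_of_classes(n: int) -> int:
    """Smallest k such that 2**k > n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def minimum_class_width(high: float, low: float, k: int) -> float:
    """Lower bound for the class interval: i >= (H - L) / k."""
    return (high - low) / k


def class_midpoint(lower_limit: float, upper_limit: float) -> float:
    """Class midpoint: (lower class limit + upper class limit) / 2."""
    return (lower_limit + upper_limit) / 2


data = [12, 15, 22, 27, 31, 35, 38, 41, 47, 52, 58, 63]      # made-up values
k = number_of_classes(len(data))                             # 2**4 = 16 > 12, so k = 4
i = math.ceil(minimum_class_width(max(data), min(data), k))  # 51 / 4 = 12.75, rounded up
print(k, i, class_midpoint(10, 20))
```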

| Symbol/Formula | Description |
|---|---|
| $N$ | Number of observations, or items, in a population |
| $n$ | Number of observations, or items, in a sample |
| $k$ | Number of categories, classes, buckets, or bins in a Frequency Distribution |
| $2^k > n$ | The “2 to the k” formula, used to determine the number of categories, classes, buckets, or bins in a Frequency Distribution |
| $H$ | The highest value in a distribution |
| $L$ | The smallest value in a distribution |
| $i \geq \frac{H - L}{k}$ | Class Interval or Width, $i$ |
| $f$ | Frequency, or the number of observations |
| RF or % | Relative frequency, or the proportion of the total number of observations |
| $\text{Midpoint} = \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2}$ | Class Midpoint |

Table 1: Module 4 Symbols and Formulas

B. Module 5: Statistical Measures
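Several of the Module 5 measures listed in Table 2 can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions, not the textbook's method: `describe` is a hypothetical helper, the data are made up, the data are treated as a sample (so `statistics.stdev` divides by n − 1), and quartiles come from `statistics.quantiles(..., method="exclusive")`, which locates them at (n + 1)Q/4, the same rule as the quartile location formula. The outlier fences used here are Q1 − 1.5(IQR) and Q3 + 1.5(IQR).

```python
import statistics


def describe(data: list[float]) -> dict[str, float]:
    """Compute several Module 5 measures for a sample (hypothetical helper)."""
    n = len(data)
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)                    # sample standard deviation, n - 1
    mad = sum(abs(x - mean) for x in data) / n    # mean absolute deviation
    # method="exclusive" places the quartiles at (n + 1)Q/4
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return {
        "mean": mean,
        "median": median,
        "s": s,
        "MAD": mad,
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "lower_fence": q1 - 1.5 * iqr,            # values below this are outliers
        "upper_fence": q3 + 1.5 * iqr,            # values above this are outliers
        "CV_percent": s / mean * 100,
        "trimean": (q1 + 2 * q2 + q3) / 4,
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
for name, value in describe(sample).items():
    print(f"{name}: {value:.4f}")
```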

| Symbol/Formula | Description |
|---|---|
| $X$ | X stands for the random variable. |
| $\Sigma$ | The capital Greek letter Sigma. It stands for the operation of summation, or addition. |
| $\bar{X} = \frac{\sum X}{n}$ | The Sample Mean, X-bar, where X are the random variables |
| $\mu = \frac{\sum X}{N}$ | The Population Mean, mu, where X are the random variables |
| $\bar{X}_w = \frac{\sum wX}{\sum w}$ | Weighted Mean, where X are the random variables and w are the weights |
| M, Med, or $\tilde{X}$ (“x-tilde”) | Median |
| Mo | Mode |
| $\text{Range} = H - L$ | Range: the Highest Value minus the Lowest Value |
| $MD = \frac{\sum \lvert X - \bar{X} \rvert}{n}$ | Mean Deviation (MD) or Mean Absolute Deviation (MAD). The vertical bars mean the absolute value: the distance of a positive or negative number from zero, or the value of a number regardless of its sign. |
| $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ | Population Variance, sigma-squared |
| $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ | Sample Variance, s-squared |
| $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ | Population Standard Deviation, sigma |
| $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ | Sample Standard Deviation, s |
| $\bar{X} = \frac{\sum fM}{n}$ | Sample Mean, Grouped Data, where f is the class frequency and M is the class midpoint |
| $s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}}$ | Sample Standard Deviation, Grouped Data |
| D | Decile. Deciles divide a distribution into ten groups of equal frequency. |
| $L_D = (n + 1)\frac{D}{10}$ | Location of a Decile |
| P | Percentile. $P_{75}$ means the 75th percentile. Percentiles divide a distribution into a hundred groups of equal frequency. |
| $L_P = (n + 1)\frac{P}{100}$ | Location of a Percentile |
| Q | Quartile: $Q_1$ (1st Quartile), $Q_2$ (2nd Quartile), $Q_3$ (3rd Quartile), and $Q_4$ (4th Quartile). Quartiles divide a distribution into four groups of equal frequency. |
| $L_Q = (n + 1)\frac{Q}{4}$ | Location of a Quartile |
| $IQR = Q_3 - Q_1$ | Interquartile Range (IQR) |
| Outlier $< Q_1 - 1.5(Q_3 - Q_1)$; Extreme Outlier $< Q_1 - 3(Q_3 - Q_1)$ | Lower Outlier (Extreme Lower Outlier) |
| Outlier $> Q_3 + 1.5(Q_3 - Q_1)$; Extreme Outlier $> Q_3 + 3(Q_3 - Q_1)$ | Upper Outlier (Extreme Upper Outlier) |
| $CV = \frac{\sigma}{\mu}$ or $CV = \frac{s}{\bar{X}}$ | Coefficient of Variation as an index; multiply by 100 to express it as a percentage |
| $SK_{mode} = \frac{\bar{X} - \text{Mode}}{s}$; $SK_{median} = \frac{3(\bar{X} - \text{Median})}{s}$ | Pearson’s Coefficient of Skewness, or SK |
| $\text{Trimean} = \frac{Q_1 + 2Q_2 + Q_3}{4}$ | Trimean |

Table 2: Module 5 Measures

C. Module 6: Index Numbers
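The weighted index formulas in Table 3 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the prices and quantities are invented, period 0 is the base period, and period t is the given period.

```python
import math


def _weighted_sum(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def simple_aggregate(p0, pt):
    return 100 * sum(pt) / sum(p0)


def laspeyres(p0, pt, q0):
    # Base-period quantities weight both numerator and denominator
    return 100 * _weighted_sum(pt, q0) / _weighted_sum(p0, q0)


def paasche(p0, pt, qt):
    # Given-period quantities weight both numerator and denominator
    return 100 * _weighted_sum(pt, qt) / _weighted_sum(p0, qt)


def fisher(p0, pt, q0, qt):
    # Square root of the product of the Laspeyres and Paasche indexes
    return math.sqrt(laspeyres(p0, pt, q0) * paasche(p0, pt, qt))


def value_index(p0, pt, q0, qt):
    return 100 * _weighted_sum(pt, qt) / _weighted_sum(p0, q0)


# Made-up prices and quantities for two items in the base period (0) and period t
p0, pt = [2.0, 5.0], [3.0, 6.0]
q0, qt = [10, 4], [8, 5]
print(round(laspeyres(p0, pt, q0), 2), round(paasche(p0, pt, qt), 2),
      round(fisher(p0, pt, q0, qt), 2))
```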

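| Symbol/Formula | Description |
|---|---|
| $P = \frac{P_t}{P_0}(100)$ | Simple Index Number, where $P_t$ is the price in the given period and $P_0$ is the price in the base period |
| $P = \frac{\sum P_i}{n}$ | Simple Price Index: the simple average of the $n$ individual price indexes, $P_i$ |
| $P = \frac{\sum P_t}{\sum P_0} \times 100$ | Simple Aggregate Price Index |
| $P_L = \frac{\sum P_t Q_0}{\sum P_0 Q_0} \times 100$ | Laspeyres Index |
| $P_P = \frac{\sum P_t Q_t}{\sum P_0 Q_t} \times 100$ | Paasche Index |
| $P_F = \sqrt{(\text{Laspeyres})(\text{Paasche})}$ | Fisher’s Ideal Index |
| $V = \frac{\sum P_t Q_t}{\sum P_0 Q_0} \times 100$ | Value Index |

Table 3: Module 6: Index Numbers

D. Module 7: Basic Concepts of Probability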
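The probability rules and counting formulas in Table 4 can be checked with exact fractions. This sketch assumes Python 3.8 or later for `math.perm` and `math.comb`; the `bayes` helper is hypothetical and handles only the two-event form of Bayes’ Theorem shown in the table, and its inputs are made up.

```python
import math
from fractions import Fraction


def bayes(p_a1, p_a2, p_b_given_a1, p_b_given_a2):
    """P(A1 | B) for two mutually exclusive, collectively exhaustive events A1 and A2."""
    joint_a1 = p_a1 * p_b_given_a1              # P(A1) P(B | A1)
    return joint_a1 / (joint_a1 + p_a2 * p_b_given_a2)


# General Rule of Addition with one card drawn from a standard 52-card deck:
# P(heart or king) = P(heart) + P(king) - P(heart and king)
p_heart_or_king = Fraction(13, 52) + Fraction(4, 52) - Fraction(1, 52)
print(p_heart_or_king)  # 4/13

# Counting rules: order matters for permutations, not for combinations
print(math.perm(5, 3), math.comb(5, 3))  # 60 10

# Made-up inputs: P(A1) = 0.05, P(A2) = 0.95, P(B | A1) = 0.90, P(B | A2) = 0.10
print(bayes(Fraction(5, 100), Fraction(95, 100), Fraction(90, 100), Fraction(10, 100)))  # 9/28
```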

| Symbol/Formula | Description |
|---|---|
| $P(A)$ | The probability of event “A” |
| $P(\sim A)$ | The probability of the event not A. This is called the complement of event A. It is sometimes written as $P(A^C)$ or P(not A). |
| $P(A \mid B)$ | The probability of event A given that event B has happened. This is called conditional probability. |
| $P(A \text{ or } B) = P(A) + P(B)$, or $P(A \cup B) = P(A) + P(B)$ | Special Rule of Addition (for mutually exclusive events). Note: $\cup$ is pronounced “union” and is the equivalent of the word “or.” |
| $P(A) = 1 - P(\sim A)$ | Complement Rule (Subtraction Rule) |
| $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, or $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ | General Rule of Addition (for non-mutually exclusive events). Note: $\cap$ is pronounced “intersection.” It is the equivalent of the word “and.” |
| $P(A \text{ and } B) = P(A)P(B)$, or $P(A \cap B) = P(A)P(B)$ | Special Rule of Multiplication (for independent events) |
| $P(A \text{ and } B) = P(A)P(B \mid A)$, or $P(A \cap B) = P(A)P(B \mid A)$ | General Rule of Multiplication (for dependent events) |
| $P(A_1 \mid B) = \frac{P(A_1)P(B \mid A_1)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2)}$ | Bayes’ Theorem |
| Total Arrangements = (m)(n)(o) | Multiplication Formula |
| $n!$ | Factorial. The factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n: 4! = 1 × 2 × 3 × 4 = 24. |
| ${}_nP_r = \frac{n!}{(n - r)!}$ | Permutations. ${}_nP_r$ is pronounced “the permutation of r things selected from n things.” Note: With permutations, the order of selection matters. |
| ${}_nC_r = \frac{n!}{r!(n - r)!}$ | Combinations. ${}_nC_r$ is pronounced “the combination of r things selected from n things.” Note: With combinations, the order of selection does not matter. |

Table 4: Module 7: Basic Concepts of Probability

E. Module 8: Discrete Probability Distributions

1) Mean of a Probability Distribution, μ

$\mu = \sum [xP(x)]$, found by multiplying each value of x by its probability and then summing these products.

2) Variance of a Probability Distribution, σ²

$\sigma^2 = \sum [(x - \mu)^2 P(x)]$, found by: 1) subtracting the mean from each value of the random variable, x; 2) squaring each difference, $(x - \mu)$; 3) multiplying each squared difference by its probability; and 4) summing the resulting values to arrive at $\sigma^2$.

3) Standard Deviation of a Probability Distribution, σ

$\sigma = \sqrt{\sigma^2}$; the standard deviation is the positive square root of the variance.

4) Binomial Probability Formula

$P(x) = {}_nC_x \, \pi^x (1 - \pi)^{n - x}$, where C denotes combinations, n is the number of trials, x is the number of successes, and π is the probability of a success on each trial. Note: π, or pi, is not the mathematical constant 3.14159 that you used in your geometry class to find the circumference of a circle.

5) Mean of a Binomial Distribution: $\mu = n\pi$

6) Variance of a Binomial Distribution: $\sigma^2 = n\pi(1 - \pi)$
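As a check on the binomial formulas, the sketch below builds a binomial distribution from the probability formula and confirms that the general mean and variance formulas for a discrete distribution agree with nπ and nπ(1 − π). The parameters n = 5 and π = 0.3 are arbitrary illustrative choices, and `binomial_pmf` is a hypothetical helper name.

```python
import math


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 5, 0.3  # made-up number of trials and probability of success
probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# General discrete formulas applied to the binomial probabilities ...
mu = sum(x * p for x, p in enumerate(probs))
var = sum((x - mu) ** 2 * p for x, p in enumerate(probs))

# ... agree with the binomial shortcuts mu = n*pi and sigma^2 = n*pi*(1 - pi)
print(round(mu, 6), round(n * pi, 6))
print(round(var, 6), round(n * pi * (1 - pi), 6))
```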

7) Hypergeometric Distribution: $P(x) = \frac{({}_SC_x)({}_{N-S}C_{n-x})}{{}_NC_n}$, where N is the size of the population; S is the number of successes in the population; x is the number of successes in the sample (it could be 0, 1, 2, 3, 4, …); n is the size of the sample (the number of trials); and C denotes combinations.
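A short sketch of the hypergeometric formula, using `math.comb` for the combinations. The function name and the example counts (10 items, 4 of them successes, a sample of 3 drawn without replacement) are hypothetical.

```python
import math


def hypergeometric_pmf(x: int, N: int, S: int, n: int) -> float:
    """P(x) = (S C x)(N-S C n-x) / (N C n), sampling without replacement."""
    return math.comb(S, x) * math.comb(N - S, n - x) / math.comb(N, n)


# Made-up example: N = 10 items, S = 4 successes, sample of n = 3
for x in range(4):
    print(x, round(hypergeometric_pmf(x, N=10, S=4, n=3), 4))
```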

8) Poisson Distribution: $P(x) = \frac{\mu^x e^{-\mu}}{x!}$, where μ is the mean number of successes in a particular interval; e is the constant 2.71828, the base of the Napierian (natural) logarithmic system; x is the number of successes; and P(x) is the probability of a specified value of x.

9) Mean of a Poisson Distribution: $\mu = n\pi$
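The Poisson formula can be sketched the same way. The example assumes, for illustration only, n = 200 trials with π = 0.01, so μ = nπ = 2; `poisson_pmf` is a hypothetical helper name.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(x) = mu**x * e**(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Made-up example: n = 200 trials with pi = 0.01, so mu = n * pi = 2
mu = 200 * 0.01
for x in range(5):
    print(x, round(poisson_pmf(x, mu), 4))
```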

F. Module 9: Continuous Probability Distributions

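| Symbol/Formula | Description |
|---|---|
| $z = \frac{X - \mu}{\sigma}$ | Standard Normal Value |
| $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ | Standard Error of the Mean, sigma sub x-bar, or SEM |
| $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ | z-value, μ and σ known |
| $X = \mu + z\sigma$ | Solving for X. Note: z can be either a positive or negative number. |

Table 5: Module 9: Continuous Probability Distributions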
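The z-value and solving-for-X formulas are inverses of each other, as this sketch shows. It assumes made-up parameters μ = 100 and σ = 15 and uses `statistics.NormalDist` from Python's standard library for areas under the standard normal curve; the helper names are hypothetical.

```python
from statistics import NormalDist


def z_value(x: float, mu: float, sigma: float) -> float:
    """Standard normal value: z = (X - mu) / sigma."""
    return (x - mu) / sigma


def solve_for_x(z: float, mu: float, sigma: float) -> float:
    """Solving for X: X = mu + z * sigma (z may be negative)."""
    return mu + z * sigma


mu, sigma = 100, 15                     # made-up population parameters
z = z_value(130, mu, sigma)             # 2.0
area_left = NormalDist().cdf(z)         # area under the standard normal curve left of z
z_top10 = NormalDist().inv_cdf(0.90)    # z value that cuts off the top 10%
print(z, round(area_left, 4), round(solve_for_x(z_top10, mu, sigma), 2))
```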

G. Module 10: Sampling and Sampling Errors
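The sketch below enumerates every possible sample from a tiny, made-up population to illustrate the Module 10 ideas in Table 6: the mean of the sample means equals μ, individual sample means differ from μ (sampling error), and the standard deviation of the sample means equals σ/√n. It assumes sampling with replacement; when sampling without replacement from a small finite population, σ/√n overstates the spread and a finite population correction is needed.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]   # made-up population
n = 2                       # sample size
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Every ordered sample of size n drawn WITH replacement, so the draws are independent
sample_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

mean_of_means = sum(sample_means) / len(sample_means)    # mu sub x-bar
sampling_errors = [xbar - mu for xbar in sample_means]   # X-bar - mu for each sample
sem_formula = sigma / math.sqrt(n)                       # sigma / sqrt(n)
sem_actual = statistics.pstdev(sample_means)             # spread of all the sample means

print(mean_of_means, mu)
print(round(sem_formula, 4), round(sem_actual, 4))
```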

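| Symbol/Formula | Description |
|---|---|
| $\mu_{\bar{X}} = \frac{\text{Sum of all sample means}}{\text{Total number of samples}}$ | Mean of the Sample Means (mu sub x-bar) |
| Sampling Error $= \bar{X} - \mu$ | Sampling Error. There is sampling error whenever $\bar{X} \neq \mu$, that is, whenever $\bar{X} - \mu \neq 0$. |
| $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ | z-value for the sample mean |
| $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ | Standard Error of the Mean, SEM, or $\sigma_{\bar{X}}$ |

Table 6: Module 10: Sampling and Sampling Errors

H. Module 11: Confidence Intervals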
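A minimal sketch of the z-based intervals in Table 7, assuming σ is known for the mean. Python's standard library has no t distribution, so t-based intervals would need t critical values from a table or from Excel's T.INV.2T; the function names and inputs here are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-tailed critical value z_c, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci_z(xbar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """X-bar +/- z * sigma / sqrt(n), for sigma known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)   # margin of error
    return xbar - margin, xbar + margin


def proportion_ci(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """p +/- z * sqrt(p(1 - p) / n)."""
    p = x / n                              # sample proportion
    sep = math.sqrt(p * (1 - p) / n)       # standard error of the proportion
    margin = z_critical(confidence) * sep
    return p - margin, p + margin


print(round(z_critical(0.95), 4))
print(mean_ci_z(50.0, 10.0, 100))
print(proportion_ci(120, 400))
```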

| Symbol/Formula | Description |
|---|---|
| c | The selected confidence level; usually 95%, but in some cases 99% or 90% |
| Critical Value | The value a test statistic must exceed to reject the Null Hypothesis. A test statistic is a value derived from a sample for the purposes of hypothesis testing and confidence intervals. Do not report the Critical Value as CV; CV is the Coefficient of Variation. |
| $z_c$ | The critical value for a confidence level using z values |
| $t_c$ | The critical value for a confidence level using t values |
| $\bar{X} \pm z\frac{\sigma}{\sqrt{n}}$ | Confidence Interval for Means using z |
| $z\frac{\sigma}{\sqrt{n}}$ | Margin of Error for the Mean using z |
| d.f., df, or ν (the lower-case Greek letter nu) | Degrees of freedom. Note: The formula for degrees of freedom depends on the type of distribution used. |
| $\bar{X} \pm t\frac{s}{\sqrt{n}}$ | Confidence Interval for Means using t |
| $t\frac{s}{\sqrt{n}}$ | Margin of Error for the Mean using t |
| p | Sample Proportion = p (a lower-case p). A commonly used symbol for the sample proportion is p-hat, $\hat{p}$. |
| $p = \frac{X}{n}$ | Sample Proportion formula, where X is the number of successes in the sample |
| π | Population Proportion = π. Some use a capital P to symbolize the Population Proportion. In Clear-Sighted Statistics, population parameters are always symbolized with Greek letters. |
| $SE_P = \sqrt{\frac{p(1 - p)}{n}}$ | Standard Error for the Proportion ($\sigma_p$, $SE_P$, or SEP) |
| $p \pm z\sqrt{\frac{p(1 - p)}{n}}$ | Confidence Interval for Proportions |

Table 7: Module 11: Confidence Intervals

I. Module 12: Estimating Sample Size
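The sample-size formulas in Table 8 rarely produce whole numbers, and the usual convention is to round up. This sketch assumes that convention, uses hypothetical helper names and made-up inputs, and defaults to p = 0.5 for the proportion when no prior estimate is available, since p(1 − p) is largest at 0.5.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-tailed critical z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma: float, error: float, confidence: float = 0.95) -> int:
    """n = (z * sigma / E)**2, rounded up to the next whole observation."""
    return math.ceil((z_critical(confidence) * sigma / error) ** 2)


def n_for_proportion(error: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """n = p(1 - p)(z / E)**2, rounded up; p = 0.5 gives the largest (safest) n."""
    return math.ceil(p * (1 - p) * (z_critical(confidence) / error) ** 2)


print(n_for_mean(sigma=15, error=3))   # made-up sigma and margin of error
print(n_for_proportion(error=0.03))
```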

| Symbol/Formula | Description |
|---|---|
| $n = \left(\frac{z\sigma}{E}\right)^2$ | Estimating Sample Size for the Mean, where E is the desired margin of error |
| $n = p(1 - p)\left(\frac{z}{E}\right)^2$ | Estimating Sample Size for the Proportion |

Table 8: Module 12: Estimating Sample Size

J. Module 13: Introduction to Null Hypothesis Significance Testing

| Symbol/Formula | Description |
|---|---|
| $H_0$ | The Null Hypothesis. $H_0$ is pronounced “H sub-zero” or “H sub-naught.” $H_0$ is a hypothesis about a population parameter. The Null Hypothesis states that there is no effect: any difference between the parameter and the statistic is due to sampling error. |
| $H_1$ or $H_A$ | The Alternate Hypothesis, sometimes called the Research Hypothesis. It is pronounced “H sub-one” when the $H_1$ symbol is used or “H sub-A” when the $H_A$ symbol is used. Like the Null Hypothesis, the Alternate Hypothesis is a statement about a population parameter. The Alternate Hypothesis states that there is an effect, which means the difference between the parameter and the statistic is too big to have occurred by chance. |
| α (alpha) | The level of significance, selected by the researcher or analyst. Alpha is also the likelihood of a Type I Error. |
| P(α) | The probability of a Type I Error: rejecting a Null Hypothesis when we should fail to reject it |
| β (beta) | A Type II Error: failing to reject a Null Hypothesis that should be rejected |
| p-value | The likelihood of obtaining a test statistic as extreme as, or more extreme than, the one obtained, assuming the Null Hypothesis is true. If the p-value is greater than the level of significance, fail to reject the Null Hypothesis. When the p-value is equal to or less than the level of significance, reject the Null Hypothesis. |

Table 9: Module 13: Introduction to Null Hypothesis Significance Testing

K. Module 14: One-Sample Tests of Hypothesis (Normal and Student t Distributions)
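The one-sample test statistics and Cohen’s d in Table 10 can be sketched as follows. The data and hypothesized values are invented, and the function names are hypothetical. The sketch computes test statistics only; converting a t statistic into a p-value needs the t distribution (for example, Excel's T.DIST functions), which Python's `statistics` module does not provide.

```python
import math
import statistics


def one_sample_t(data: list[float], mu0: float) -> float:
    """t = (X-bar - mu) / (s / sqrt(n)), used when sigma is unknown."""
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(len(data)))


def one_sample_z_proportion(p: float, pi0: float, n: int) -> float:
    """z = (p - pi) / sqrt(pi(1 - pi) / n); the standard error uses the hypothesized pi."""
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)


def cohens_d(xbar: float, mu0: float, sigma: float) -> float:
    """Cohen's d = |X-bar - mu| / sigma."""
    return abs(xbar - mu0) / sigma


sample = [52, 55, 49, 58, 61, 54, 57, 50]   # made-up data; H0: mu = 50
print(round(one_sample_t(sample, 50), 3))
print(round(one_sample_z_proportion(0.56, 0.50, 400), 3))
print(cohens_d(54.5, 50, 5))
```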

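| Symbol/Formula | Description |
|---|---|
| $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ | One-Sample test for the Mean when σ is known |
| $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$ | One-Sample test for the Mean when σ is unknown |
| $z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$ | One-Sample test for the Proportion |
| $z = \frac{\bar{X}_c - \mu_1}{\sigma / \sqrt{n}}$ | Probability of a Type II Error, P(β) |
| Power of a Test $= 1 - P(\beta)$ | Power of a Test |
| Cohen’s $d = \frac{\lvert \bar{X} - \mu \rvert}{\sigma}$ | Cohen’s d Effect Size |
| $\gamma = \frac{\lvert \mu_1 - \mu_0 \rvert}{\sigma}$; $\delta = \gamma\sqrt{n}$ | delta, δ, for the mean |
| $\gamma = \frac{\lvert \pi_1 - \pi_0 \rvert}{\sqrt{\pi_0(1 - \pi_0)}}$; $\delta = \gamma\sqrt{n}$ | delta, δ, for the proportion |

Table 10: Module 14: One-Sample Tests of Hypothesis

L. Module 15: Two-Sample Tests of Hypothesis (Normal and Student t Distributions)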
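A sketch of the two-sample t statistics and the F ratio in Table 11, using made-up samples and hypothetical function names. The df formula for the unequal-variance test usually gives a non-integer value.

```python
import math
import statistics


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Equal-variance (pooled) t statistic and its df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_variance_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Unequal-variance t statistic and its (usually fractional) df."""
    v1 = statistics.variance(a) / len(a)   # s1^2 / n1
    v2 = statistics.variance(b) / len(b)   # s2^2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df


a = [23, 25, 28, 30, 34]   # made-up samples
b = [18, 20, 21, 25]
f_ratio = statistics.variance(a) / statistics.variance(b)   # F = s1^2 / s2^2
print(pooled_t(a, b), unequal_variance_t(a, b), round(f_ratio, 4))
```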

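| Symbol/Formula | Description |
|---|---|
| $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ | Variance of the Distribution of differences in Means |
| $z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ | Two-sample z-test of Means |
| Pooled Standard Deviation $= \sqrt{\frac{s_1^2 + s_2^2}{2}}$ | Pooled Standard Deviation |
| Cohen’s $d = \frac{\lvert \bar{X}_1 - \bar{X}_2 \rvert}{\text{Pooled Standard Deviation}}$ | Cohen’s d |
| Cohen’s $h = \lvert \phi_1 - \phi_2 \rvert$, where $\phi = 2\arcsin\sqrt{p}$ | Cohen’s h (effect size for proportions) |
| $p_c = \frac{X_1 + X_2}{n_1 + n_2}$ | Pooled Proportion |
| $z = \frac{p_1 - p_2}{\sqrt{\frac{p_c(1 - p_c)}{n_1} + \frac{p_c(1 - p_c)}{n_2}}}$ | Two-sample z-test of Proportions |
| $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ | Pooled Variance for the t-test for Means (equal variances) |
| $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ | Pooled Variance t-test for Means (equal variances) |
| $F = \frac{s_1^2}{s_2^2}$ | F-Test for comparing two sample variances |
| $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ | Two-Sample t-test for Means (unequal variances) |
| $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$ | df for the Unequal Variance t-test |
| d | The difference between paired or dependent samples |
| $\bar{d}$ | The mean of the difference between paired or dependent samples |
| $t = \frac{\bar{d}}{s_d / \sqrt{n}}$ | Paired t-test for dependent samples |

Table 11: Module 15: Two-Sample Tests of Hypothesis

M. Module 16: ANOVA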
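The sums of squares and eta-squared in Table 12 can be computed as follows. This sketch is illustrative: the helper name and groups are made up, and it also computes MSE and the F ratio using the standard one-way ANOVA degrees of freedom, k − 1 and n − k, which are not listed in the table.

```python
import statistics


def one_way_anova(groups: list[list[float]]) -> dict[str, float]:
    """Sums of squares, MSE, F, and eta-squared for a one-way ANOVA (hypothetical helper)."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)                    # X-bar sub G
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = 0                                           # SSW, the error sum of squares
    for g in groups:
        g_mean = statistics.mean(g)                         # treatment mean
        ss_within += sum((x - g_mean) ** 2 for x in g)
    ss_between = ss_total - ss_within                       # SSB, the treatment sum of squares
    k, n = len(groups), len(values)
    mse = ss_within / (n - k)                               # mean square error
    msb = ss_between / (k - 1)                              # mean square between
    return {"SS_total": ss_total, "SSW": ss_within, "SSB": ss_between,
            "MSE": mse, "F": msb / mse, "eta_squared": ss_between / ss_total}


groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]   # made-up treatment groups
print(one_way_anova(groups))
```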

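| Symbol/Formula | Description |
|---|---|
| $\text{Total} = \sum (X - \bar{X}_G)^2$ | Sum of Squares, total, where $\bar{X}_G$ is the grand mean |
| $SSW = \sum (X - \bar{X}_c)^2$ | Sum of Squares, error, or Within (SSW), where $\bar{X}_c$ is the treatment mean |
| $SSB = \text{Total} - SSW$ | Sum of Squares, treatment, or Between (SSB) |
| $\eta^2 = \frac{SSB}{\text{Total}}$ | Eta-squared, $\eta^2$, Effect Size |
| $(\bar{X}_1 - \bar{X}_2) \pm t\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ | Confidence Interval for the difference in Treatment Means |

Table 12: Module 16 ANOVA

N. Module 17: Chi-Square Tests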
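Both chi-square calculations in Table 13 can be sketched in a few lines. The function names and counts are hypothetical. For the contingency table, the expected frequency of each cell is its row total times its column total, divided by the grand total, and Cohen’s w uses the grand total as n.

```python
import math


def chi_square_gof(observed: list[float], expected: list[float]) -> tuple[float, int]:
    """Goodness-of-fit chi-square and df = k - 1."""
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, len(observed) - 1


def chi_square_contingency(table: list[list[float]]) -> tuple[float, int, float]:
    """Chi-square, df = (rows - 1)(columns - 1), and Cohen's w for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, math.sqrt(chi2 / grand_total)


print(chi_square_gof([50, 30, 20], [40, 40, 20]))     # made-up counts
print(chi_square_contingency([[30, 20], [20, 30]]))   # made-up 2 x 2 table
```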

| Symbol/Formula | Description |
|---|---|
| $\chi^2 = \sum \left[\frac{(f_o - f_e)^2}{f_e}\right]$ | Chi-Square ($\chi^2$) Test, where $f_o$ stands for the Observed Frequencies for each category and $f_e$ stands for the Expected Frequencies for each category |
| $f_e = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$ | Chi-Square Expected Frequency for a contingency table |
| Cohen’s $w = \sqrt{\frac{\chi^2}{n}}$ | Cohen’s w Effect Size |
| $df = k - 1$ | Degrees of Freedom for a Goodness of Fit test |
| $df = (\text{number of rows} - 1)(\text{number of columns} - 1)$ | Degrees of Freedom for a contingency table |
| $df = k - 3$ | Degrees of Freedom for a Goodness of Fit test for Normality. (The two extra degrees of freedom are needed because we use the sample mean and sample standard deviation.) |

Table 13: Module 17 Chi-Square Tests

O. Module 18: Linear Correlation and Regression
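A sketch of the correlation and regression formulas in Table 14, computed from the definitional formulas rather than a library routine. The data and the function name are hypothetical, and the t statistic for r assumes |r| < 1.

```python
import math
import statistics


def simple_regression(x: list[float], y: list[float]) -> dict[str, float]:
    """r, r^2, slope, intercept, standard error of the estimate, and t for r."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x                                    # slope
    a = y_bar - b * x_bar                                # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))                      # standard error of the estimate
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # assumes |r| < 1
    return {"r": r, "r2": r ** 2, "a": a, "b": b, "s_yx": s_yx, "t_r": t_r}


print(simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))   # made-up data
```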

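| Symbol/Formula | Description |
|---|---|
| $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)s_X s_Y}$ | Coefficient of Correlation, or r |
| $r^2 = \frac{SSR}{SS\ \text{total}} = 1 - \frac{SSE}{SS\ \text{total}}$ | Coefficient of Determination, or $r^2$ |
| $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ | Test for the significance of r |
| $\rho = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N\sigma_X\sigma_Y}$ | ρ, the lower-case Greek letter rho: the population coefficient of correlation |
| $\hat{Y} = a + bX$ | Regression Equation (y-hat) |
| $b = r\frac{s_Y}{s_X}$ | Slope of the Regression Line |
| $a = \bar{Y} - b\bar{X}$ | Intercept of the Regression Line |
| $t = \frac{b - 0}{s_b}$ | Test for Zero Slope |
| $s_{Y \cdot X} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$ | Standard Error of the Estimate |
| $\hat{Y} \pm t(s_{Y \cdot X})\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$ | Confidence Interval |
| $\hat{Y} \pm t(s_{Y \cdot X})\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$ | Prediction Interval |

Table 14: Module 18 Linear Correlation and Regression

P. Microsoft Excel Statistical Functions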

Analysis ToolPak

Anova: Single Factor

Correlation

Descriptive Statistics

F-Test Two-Sample for Variances

Histogram

Moving Average

Rank and Percentile

Regression

Sampling

t-Test: Paired Two Sample for Means

t-Test: Two-Sample Assuming Equal Variances

t-Test: Two-Sample Assuming Unequal Variances

z-Test: Two Sample for Means

Math Functions

ABS, POWER, ROUND, ROUNDDOWN, ROUNDUP, SQRT, SUM, SUMIF, SUMIFS, SUMPRODUCT, SUMSQ

Frequency Distribution Functions:

FREQUENCY

Descriptive Statistics Functions:

AVEDEV, AVERAGE, AVERAGEA, AVERAGEIF, AVERAGEIFS, COUNT, COUNTA, COUNTBLANK, COUNTIF, COUNTIFS, FREQUENCY, GEOMEAN, HARMEAN, KURT, LARGE, MAX, MAXA, MAXIFS, MEDIAN, MIN, MINA, MINIFS, MODE.MULT, MODE.SNGL, PERCENTILE.EXC, PERCENTILE.INC, PERCENTRANK.EXC, PERCENTRANK.INC, QUARTILE.EXC, QUARTILE.INC, RANK.AVG, RANK.EQ, SKEW, SKEW.P, SMALL, STDEV.P, STDEVA, STDEVPA, STDEV.S, VAR.P, VAR.S

Probability Functions:

COMBIN, FACT, PERMUT, PROB

Binomial Distribution Functions:

BINOM.DIST, BINOM.INV

Exponential Distribution Functions:

EXPON.DIST

Hypergeometric Distribution Functions:

HYPGEOM.DIST

Poisson Distribution Functions:

POISSON.DIST

Normal Distribution Functions:

NORM.DIST, NORM.INV, NORM.S.DIST, NORM.S.INV, STANDARDIZE

t Distribution Functions:

T.DIST, T.DIST.2T, T.DIST.RT, T.INV, T.INV.2T, and T.TEST

Confidence Interval Functions:

CONFIDENCE.NORM and CONFIDENCE.T

F Distribution Functions:

F.DIST, F.DIST.RT, F.INV, F.INV.RT

Chi-Square Functions:

CHISQ.DIST, CHISQ.DIST.RT, CHISQ.INV, and CHISQ.INV.RT

Correlation and Regression Functions:

CORREL, DEVSQ, INTERCEPT, LINEST, PEARSON, RSQ, SLOPE, STEYX, TREND

Q. Greek Letters Commonly Used in Statistics

| Greek Letter | Upper Case | Lower Case | Statistical Symbol |
|---|---|---|---|
| Alpha | Α | α | α = level of significance, Type I Error |
| Beta | Β | β | β = Type II Error; 1 − β = Power of the test |
| Gamma | Γ | γ | |
| Delta | Δ | δ | |
| Epsilon | Ε | ε | |
| Zeta | Ζ | ζ | |
| Eta | Η | η | |
| Theta | Θ | θ | |
| Iota | Ι | ι | |
| Kappa | Κ | κ | |
| Lambda | Λ | λ | |
| Mu | Μ | μ | μ = population mean |
| Nu | Ν | ν | ν = degrees of freedom (df) |
| Xi | Ξ | ξ | |
| Omicron | Ο | ο | |
| Pi | Π | π | π = population proportion |
| Rho | Ρ | ρ | ρ = linear correlation of a population |
| Sigma | Σ | σ | Σ = “Sum of” or summation; σ² = population variance; σ = population standard deviation |
| Tau | Τ | τ | |
| Upsilon | Υ | υ | |
| Phi | Φ | φ | |
| Chi | Χ | χ | Chi-Square statistic (χ²) |
| Psi | Ψ | ψ | |
| Omega | Ω | ω | |

Except where otherwise noted, Clear-Sighted Statistics is licensed under a Creative Commons License. You are free to share derivatives of this work for non-commercial purposes only. Please attribute this work to Edward Volchok.