Frequently Used Statistics Formulas and Tables


Chapter 2

Class width = (highest value − lowest value) / (number of classes)  (increase to the next integer)
Class midpoint = (upper limit + lower limit) / 2

Chapter 3

n = sample size;  N = population size;  f = frequency;  Σ = sum;  w = weight

Sample mean: x̄ = Σx / n
Population mean: μ = Σx / N
Weighted mean: x̄ = Σ(w·x) / Σw
Mean for a frequency table: x̄ = Σ(f·x) / Σf
Midrange = (highest value + lowest value) / 2
Range = highest value − lowest value
Sample standard deviation: s = √[ Σ(x − x̄)² / (n − 1) ]
Population standard deviation: σ = √[ Σ(x − μ)² / N ]
Sample variance: s²;  Population variance: σ²
Sample standard deviation for a frequency table: s = √[ ( n·Σ(f·x²) − (Σ(f·x))² ) / ( n(n − 1) ) ]
Sample coefficient of variation: CV = (s / x̄) · 100%
Population coefficient of variation: CV = (σ / μ) · 100%
Limits for unusual data — below: μ − 2σ;  above: μ + 2σ
Empirical Rule — about 68%: μ − σ to μ + σ;  about 95%: μ − 2σ to μ + 2σ;  about 99.7%: μ − 3σ to μ + 3σ
Sample z-score: z = (x − x̄) / s
Population z-score: z = (x − μ) / σ
Interquartile range: IQR = Q3 − Q1
Modified box plot outliers — lower limit: Q1 − 1.5·(IQR);  upper limit: Q3 + 1.5·(IQR)

Chapter 4

Probability of the complement of event A: P(not A) = 1 − P(A)
Multiplication rule for independent events: P(A and B) = P(A) · P(B)
General multiplication rules: P(A and B) = P(A) · P(B, given A) = P(B) · P(A, given B)
Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
General addition rule: P(A or B) = P(A) + P(B) − P(A and B)
Permutation rule: nPr = n! / (n − r)!
Combination rule: nCr = n! / [ r!(n − r)! ]

Chapter 5

Discrete probability distributions:
Mean: μ = Σ[x · P(x)]
Standard deviation: σ = √( Σ[x² · P(x)] − μ² )

Binomial distributions:
r = number of successes (or x);  p = probability of success;  q = probability of failure;  q = 1 − p;  p + q = 1
Binomial probability distribution: P(r) = nCr · pʳ · qⁿ⁻ʳ
Mean: μ = np;  Standard deviation: σ = √(npq)
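The Chapter 3 and Chapter 5 formulas above can be checked numerically. The following Python sketch (not part of the original sheet; the data values and the binomial parameters are invented for illustration) computes a sample mean, standard deviation, coefficient of variation, a z-score, and one binomial probability:

```python
import math

# Invented sample data, purely for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean = sum(data) / n                                         # x̄ = Σx / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # s = √[Σ(x − x̄)²/(n − 1)]
cv = s / mean * 100                                          # CV = (s / x̄) · 100%
z = (9.0 - mean) / s                                         # sample z-score for x = 9

# Binomial: P(r) = nCr · p^r · q^(n−r), with invented n = 10, p = 0.3
n_trials, p = 10, 0.3
q = 1 - p
prob_3 = math.comb(n_trials, 3) * p ** 3 * q ** (n_trials - 3)

print(mean, s, prob_3)
```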
Permutation and combination on the TI-83/84:  n  MATH  PRB  nPr  ENTER  r;   n  MATH  PRB  nCr  ENTER  r

Poisson distributions:
r = number of successes (or x);  μ = mean number of successes (over a given interval)
Poisson probability distribution: P(r) = e⁻ᵘ · μʳ / r!,  where e ≈ 2.71828
σ² = μ;  σ = √μ
Note: textbooks and formula sheets interchange "r" and "x" for the number of successes.

Chapter 6

Normal distributions:
Raw score: x = zσ + μ
Standard score: z = (x − μ) / σ
Mean of the x̄ distribution: μ_x̄ = μ
Standard deviation of the x̄ distribution: σ_x̄ = σ / √n  (standard error)
Standard score for x̄: z = (x̄ − μ) / (σ / √n)

Chapter 7

Confidence interval: point estimate ± error
Point estimate = (upper limit + lower limit) / 2;  Error = (upper limit − lower limit) / 2

One-sample confidence intervals:
For proportions p (np > 5 and nq > 5): p̂ − E < p < p̂ + E,  where E = z_α/2 · √( p̂(1 − p̂) / n )  and  p̂ = r / n
For means μ when σ is known: x̄ − E < μ < x̄ + E,  where E = z_α/2 · σ / √n
For means μ when σ is unknown: x̄ − E < μ < x̄ + E,  where E = t_α/2 · s / √n  with d.f. = n − 1
For variance σ²: (n − 1)s² / χ²_R < σ² < (n − 1)s² / χ²_L,  with d.f. = n − 1
For variance or standard deviation: see Table 7-2 (last page of formula sheet)

Sample size for estimating:
Means: n = ( z_α/2 · σ / E )²
Proportions: n = p̂q̂ · ( z_α/2 / E )² with a preliminary estimate for p;  n = 0.25 · ( z_α/2 / E )² without a preliminary estimate

Confidence levels and z-values (z_α/2):
70%: 1.04;  75%: 1.15;  80%: 1.28;  85%: 1.44;  90%: 1.645;  95%: 1.96;  98%: 2.33;  99%: 2.58

Chapter 8

One-sample hypothesis testing:
For p (np > 5 and nq > 5): z = (p̂ − p) / √(pq / n),  where q = 1 − p and p̂ = r / n
For μ (σ known): z = (x̄ − μ) / (σ / √n)
For μ (σ unknown): t = (x̄ − μ) / (s / √n)  with d.f. = n − 1
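As a quick illustration of the Chapter 7 formulas, here is a Python sketch (not part of the original sheet; all numbers are invented) that builds a 95% confidence interval for a mean with σ known, using z = 1.96 from the confidence-level table, and then applies the sample-size formula:

```python
import math

# Invented numbers: x̄ = 72.5, σ = 8 known, n = 64, 95% confidence
x_bar, sigma, n = 72.5, 8.0, 64
z_alpha2 = 1.96                       # z-value for 95% from the table above

E = z_alpha2 * sigma / math.sqrt(n)   # E = z_α/2 · σ/√n
lower, upper = x_bar - E, x_bar + E   # x̄ − E < μ < x̄ + E

# Sample size to estimate μ to within E = 1.5: n = (z_α/2 · σ / E)², rounded up
n_needed = math.ceil((z_alpha2 * sigma / 1.5) ** 2)

print(lower, upper, n_needed)
```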
For σ: χ² = (n − 1)s² / σ²  with d.f. = n − 1

Chapter 9

Two-sample confidence intervals and tests of hypotheses:

Difference of means μ₁ − μ₂ (independent samples)
When σ₁ and σ₂ are known:
Confidence interval: (x̄₁ − x̄₂) − E < μ₁ − μ₂ < (x̄₁ − x̄₂) + E,  where E = z_α/2 · √( σ₁²/n₁ + σ₂²/n₂ )
Hypothesis test: z = [ (x̄₁ − x̄₂) − (μ₁ − μ₂) ] / √( σ₁²/n₁ + σ₂²/n₂ )
When σ₁ and σ₂ are unknown:
Confidence interval: (x̄₁ − x̄₂) − E < μ₁ − μ₂ < (x̄₁ − x̄₂) + E,  where E = t_α/2 · √( s₁²/n₁ + s₂²/n₂ )
Hypothesis test: t = [ (x̄₁ − x̄₂) − (μ₁ − μ₂) ] / √( s₁²/n₁ + s₂²/n₂ )
In both unknown-σ cases, d.f. = smaller of n₁ − 1 and n₂ − 1

Difference of proportions (p₁ − p₂)
Confidence interval: (p̂₁ − p̂₂) − E < p₁ − p₂ < (p̂₁ − p̂₂) + E,  where E = z_α/2 · √( p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂ ),
with p̂₁ = r₁/n₁, p̂₂ = r₂/n₂, q̂₁ = 1 − p̂₁, q̂₂ = 1 − p̂₂
Hypothesis test: z = [ (p̂₁ − p̂₂) − (p₁ − p₂) ] / √( p̄q̄/n₁ + p̄q̄/n₂ ),
where the pooled proportion is p̄ = (r₁ + r₂) / (n₁ + n₂) and q̄ = 1 − p̄

Matched pairs (dependent samples)
Confidence interval: d̄ − E < μ_d < d̄ + E,  where E = t_α/2 · s_d / √n  with d.f. = n − 1
Hypothesis test: t = (d̄ − μ_d) / (s_d / √n)  with d.f. = n − 1

Two-sample variances
Confidence interval for σ₁²/σ₂²: (s₁²/s₂²) · (1/F_right) < σ₁²/σ₂² < (s₁²/s₂²) · (1/F_left)
Hypothesis test statistic: F = s₁²/s₂², where s₁² ≥ s₂²,  with numerator d.f. = n₁ − 1 and denominator d.f. = n₂ − 1

Chapter 10

χ² = Σ (O − E)² / E,  where E = (row total)(column total) / (sample size)
Tests of independence: d.f. = (R − 1)(C − 1)

Chapter 11

Regression and correlation:
Linear correlation coefficient: r = [ nΣxy − (Σx)(Σy) ] / √( [ nΣx² − (Σx)² ] · [ nΣy² − (Σy)² ] )
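The pooled-proportion z test from Chapter 9 can be sketched as follows (Python; the counts r₁, r₂ and sample sizes are invented for illustration):

```python
import math

# Invented counts: r1 successes out of n1, r2 out of n2
r1, n1 = 45, 100
r2, n2 = 30, 100

p1_hat, p2_hat = r1 / n1, r2 / n2
p_pool = (r1 + r2) / (n1 + n2)        # pooled proportion p̄
q_pool = 1 - p_pool

# z = (p̂1 − p̂2) / √( p̄q̄/n1 + p̄q̄/n2 ), under H0: p1 = p2
se = math.sqrt(p_pool * q_pool / n1 + p_pool * q_pool / n2)
z = (p1_hat - p2_hat) / se

print(z)
```

Comparing the resulting z against the critical values on the last page of the sheet decides the test.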
Goodness of fit: d.f. = (number of categories) − 1
Alternatively, r = Σ(z_x · z_y) / (n − 1),  where z_x = z score for x and z_y = z score for y
Coefficient of determination: r² = explained variation / total variation
Standard error of estimate: s_e = √( Σ(y − ŷ)² / (n − 2) )  or  s_e = √( (Σy² − b₀Σy − b₁Σxy) / (n − 2) )
Prediction interval: ŷ − E < y < ŷ + E,  where E = t_α/2 · s_e · √( 1 + 1/n + n(x₀ − x̄)² / ( nΣx² − (Σx)² ) )
Sample test statistic for r: t = r / √( (1 − r²) / (n − 2) )  with d.f. = n − 2
Least-squares line (regression line, or line of best fit): ŷ = b₀ + b₁x,  where b₀ is the y-intercept and b₁ is the slope
Slope: b₁ = [ nΣxy − (Σx)(Σy) ] / [ nΣx² − (Σx)² ]  or  b₁ = r · (s_y / s_x)
Intercept: b₀ = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ nΣx² − (Σx)² ]  or  b₀ = ȳ − b₁x̄
Confidence interval for the y-intercept β₀: b₀ − E < β₀ < b₀ + E,  where E = t_α/2 · s_e · √( 1/n + x̄² / ( Σx² − (Σx)²/n ) )
Confidence interval for the slope β₁: b₁ − E < β₁ < b₁ + E,  where E = t_α/2 · s_e / √( Σx² − (Σx)²/n )

Chapter 12

One-way ANOVA:
k = number of groups;  N = total sample size
SS_TOT = Σx² − (Σx)² / N,  summing over all N values
SS_BET = Σ over all groups of [ (Σx_i)² / n_i ] − (Σx)² / N
SS_W = Σ over all groups of [ Σx_i² − (Σx_i)² / n_i ]
SS_TOT = SS_BET + SS_W
MS_BET = SS_BET / d.f._BET,  where d.f._BET = k − 1
MS_W = SS_W / d.f._W,  where d.f._W = N − k
F = MS_BET / MS_W,  with numerator d.f. = k − 1 and denominator d.f. = N − k

Two-way ANOVA:
r = number of rows;  c = number of columns
Row factor F: MS(row factor) / MS(error)
Column factor F: MS(column factor) / MS(error)
Interaction F: MS(interaction) / MS(error)
Degrees of freedom: row factor = r − 1;  column factor = c − 1;  interaction = (r − 1)(c − 1);  error = rc(n − 1)

Critical z-values for hypothesis testing (Figure 8.4):

α = 0.10 (c-level = 0.90): two-tailed (≠): z = ±1.645;  left-tailed (<): z = −1.28;  right-tailed (>): z = 1.28
α = 0.05 (c-level = 0.95): two-tailed (≠): z = ±1.96;  left-tailed (<): z = −1.645;  right-tailed (>): z = 1.645
α = 0.01 (c-level = 0.99): two-tailed (≠): z = ±2.575;  left-tailed (<): z = −2.33;  right-tailed (>): z = 2.33

Greek Alphabet: http://www.keyway.ca/htm2002/greekal.htm
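The Chapter 11 least-squares and correlation formulas can be checked with a short Python sketch (not part of the original sheet; the five data points are invented for illustration):

```python
import math

# Five invented (x, y) points, purely for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)

b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope b1
b0 = (sy - b1 * sx) / n                         # intercept b0 = ȳ − b1·x̄
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(b0, b1, r)
```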
Recommended publications
  • Projections of Education Statistics to 2022 Forty-First Edition
    Projections of Education Statistics to 2022, Forty-first Edition. NCES 2014-051, U.S. Department of Education, February 2014. William J. Hussar, National Center for Education Statistics; Tabitha M. Bailey, IHS Global Insight. U.S. Department of Education: Arne Duncan, Secretary. Institute of Education Sciences: John Q. Easton, Director. National Center for Education Statistics: John Q. Easton, Acting Commissioner. The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, and reporting data related to education in the United States and other nations. It fulfills a congressional mandate to collect, collate, analyze, and report full and complete statistics on the condition of education in the United States; conduct and publish reports and specialized analyses of the meaning and significance of such statistics; assist state and local education agencies in improving their statistical systems; and review and report on education activities in foreign countries. NCES activities are designed to address high-priority education data needs; provide consistent, reliable, complete, and accurate indicators of education status and trends; and report timely, useful, and high-quality data to the U.S. Department of Education, the Congress, the states, other education policymakers, practitioners, data users, and the general public. Unless specifically noted, all information contained herein is in the public domain. We strive to make our products available in a variety of formats and in language that is appropriate to a variety of audiences. You, as our customer, are the best judge of our success in communicating information effectively.
  • Use of Statistical Tables
    TUTORIAL | SCOPE. USE OF STATISTICAL TABLES. Lucy Radford, Jenny V Freeman and Stephen J Walters introduce three important statistical distributions: the standard Normal, t and Chi-squared distributions. PREVIOUS TUTORIALS HAVE LOOKED at hypothesis testing1 and basic statistical tests.2–4 As part of the process of statistical hypothesis testing, a test statistic is calculated and compared to a hypothesised critical value and this is used to obtain a P-value. This P-value is then used to decide whether the study results are statistically significant or not. This tutorial will explain how statistical tables are used to link test statistics to P-values. It introduces tables for three important statistical distributions (the standard Normal, t and Chi-squared distributions) and explains how to use them with the help of some simple examples.

    STANDARD NORMAL DISTRIBUTION. The Normal distribution is widely used in statistics and has been discussed in detail previously.5 As the mean of a Normally distributed variable can take any value (−∞ to ∞) and the standard deviation any positive value (0 to ∞), there are an infinite number of possible Normal distributions.

    TABLE 1. Extract from a two-tailed standard Normal table. Values tabulated are P-values corresponding to particular cut-offs and are for z values calculated to two decimal places.

    z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
    0.00  1.0000 0.9920 0.9840 0.9761 0.9681 0.9601 0.9522 0.9442 0.9362 0.9283
    0.10  0.9203 0.9124 0.9045 0.8966 0.8887 0.8808 0.8729 0.8650 0.8572 0.8493
    0.20  0.8415 0.8337 0.8259 0.8181 0.8103 0.8026 0.7949 0.7872 0.7795 0.7718
    0.30  0.7642 0.7566 0.7490 0.7414 0.7339 0.7263 0.7188 0.7114 0.7039 0.6965
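The tabulated values in the excerpt's Table 1 can be reproduced directly: the two-tailed P-value for a cut-off z is 2·(1 − Φ(|z|)), which equals erfc(|z|/√2). A minimal Python sketch (not from the tutorial itself):

```python
import math

def two_tailed_p(z):
    """Two-tailed P-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

# Reproduce a few entries of a two-tailed standard Normal table
print(round(two_tailed_p(0.00), 4))  # 1.0
print(round(two_tailed_p(0.21), 4))  # 0.8337
print(round(two_tailed_p(1.96), 4))  # 0.05
```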
  • On the Meaning and Use of Kurtosis
    Psychological Methods, 1997, Vol. 2, No. 3, 292-307. Copyright 1997 by the American Psychological Association, Inc. 1082-989X/97/$3.00. On the Meaning and Use of Kurtosis. Lawrence T. DeCarlo, Fordham University. For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness. Many textbooks, however, describe or illustrate kurtosis incompletely or incorrectly. In this article, kurtosis is illustrated with well-known distributions, and aspects of its interpretation and misinterpretation are discussed. The role of kurtosis in testing univariate and multivariate normality; as a measure of departures from normality; in issues of robustness, outliers, and bimodality; in generalized tests and estimators; as well as limitations of and alternatives to the kurtosis measure β2, are discussed. It is typically noted in introductory statistics courses that distributions can be characterized in terms of central tendency, variability, and shape. With respect to shape, virtually every textbook defines and illustrates skewness. On the other hand, another aspect of shape, which is kurtosis, is either not discussed or, worse yet, is often described or illustrated incorrectly. Kurtosis is also frequently not reported in research articles, in spite of the fact that virtually every statistical package provides a measure of kurtosis. The normal distribution has a kurtosis of 3, and β2 − 3 is often used so that the reference normal distribution has a kurtosis of zero (β2 − 3 is sometimes denoted as γ2). A sample counterpart to β2 can be obtained by replacing the population moments with the sample moments, which gives b2 = [ Σ(Xᵢ − X̄)⁴/n ] / [ Σ(Xᵢ − X̄)²/n ]².
  • The Probability Lifesaver: Order Statistics and the Median Theorem
    The Probability Lifesaver: Order Statistics and the Median Theorem. Steven J. Miller, December 30, 2015. Contents: 1 Order Statistics and the Median Theorem 3; 1.1 Definition of the Median 5; 1.2 Order Statistics 10; 1.3 Examples of Order Statistics 15; 1.4 The Sample Distribution of the Median 17; 1.5 Technical bounds for proof of Median Theorem 20; 1.6 The Median of Normal Random Variables 22. Greetings again! In this supplemental chapter we develop the theory of order statistics in order to prove The Median Theorem. This is a beautiful result in its own right, but also extremely important as a substitute for the Central Limit Theorem, and allows us to say non-trivial things when the CLT is unavailable. Chapter 1: Order Statistics and the Median Theorem. The Central Limit Theorem is one of the gems of probability. It's easy to use and its hypotheses are satisfied in a wealth of problems. Many courses build towards a proof of this beautiful and powerful result, as it truly is 'central' to the entire subject. Not to detract from the majesty of this wonderful result, however, what happens in those instances where it's unavailable? For example, one of the key assumptions that must be met is that our random variables need to have finite higher moments, or at the very least a finite variance. What if we were to consider sums of Cauchy random variables? Is there anything we can say? This is not just a question of theoretical interest, of mathematicians generalizing for the sake of generalization. The following example from economics highlights why this chapter is more than just of theoretical interest.
  • Theoretical Statistics. Lecture 5. Peter Bartlett
    Theoretical Statistics. Lecture 5. Peter Bartlett. 1. U-statistics. Outline of today's lecture: We'll look at U-statistics, a family of estimators that includes many interesting examples. We'll study their properties: unbiased, lower variance, concentration (via an application of the bounded differences inequality), asymptotic variance, asymptotic distribution. (See Chapter 12 of van der Vaart.) First, we'll consider the standard unbiased estimate of variance, a special case of a U-statistic.

    Variance estimates:
    s_n² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄_n)²
         = (1/(2n(n − 1))) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ [ (Xᵢ − X̄_n)² + (Xⱼ − X̄_n)² ]
         = (1/(2n(n − 1))) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ [ (Xᵢ − X̄_n) − (Xⱼ − X̄_n) ]²
         = (1/(n(n − 1))) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ (1/2)(Xᵢ − Xⱼ)²
         = (1/C(n,2)) Σ_{i<j} (1/2)(Xᵢ − Xⱼ)².

    This is unbiased for i.i.d. data:
    E s_n² = E[ (1/2)(X₁ − X₂)² ]
           = (1/2) E[ ((X₁ − EX₁) − (X₂ − EX₂))² ]
           = (1/2) E[ (X₁ − EX₁)² + (X₂ − EX₂)² ]
           = E (X₁ − EX₁)².

    U-statistics. Definition: A U-statistic of order r with kernel h is U = (1/C(n,r)) Σ_{i⊆[n]} h(X_{i1}, ..., X_{ir}), where h is symmetric in its arguments. [If h is not symmetric in its arguments, we can also average over permutations.] "U" for "unbiased." Introduced by Wassily Hoeffding in the 1940s. Theorem: [Halmos] θ (parameter, i.e., function defined on a family of distributions) admits an unbiased estimator (ie: for all sufficiently large n, some function of the i.i.d.
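The identity derived in this excerpt, that the unbiased sample variance is a U-statistic of order 2 with kernel h(a, b) = (a − b)²/2, is easy to verify numerically. A Python sketch (the data values are invented; this is not from the lecture notes):

```python
import math
from itertools import combinations

# Invented data, purely for illustration
x = [3.1, 4.7, 2.2, 5.9, 4.0, 3.3]
n = len(x)

# Usual unbiased sample variance s_n^2
mean = sum(x) / n
s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)

# The same quantity as a U-statistic of order 2 with kernel h(a, b) = (a - b)^2 / 2
u = sum((a - b) ** 2 / 2 for a, b in combinations(x, 2)) / math.comb(n, 2)

print(abs(s2 - u) < 1e-9)  # True: the two computations agree
```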
  • Lecture 14 Testing for Kurtosis
    9/8/2016. CHE384, From Data to Decisions: Measurement, Uncertainty, Analysis, and Modeling. Lecture 14: Testing for Kurtosis. Chris A. Mack, Adjunct Associate Professor. http://www.lithoguru.com/scientist/statistics/ © Chris Mack, 2016.

    Kurtosis: For any distribution, the kurtosis (sometimes called the excess kurtosis) is defined as the fourth standardized moment minus 3 (old notation: β2 − 3). For a unimodal, symmetric distribution, a positive kurtosis means "heavy tails" and a more peaked center compared to a normal distribution; a negative kurtosis means "light tails" and a more spread center compared to a normal distribution.

    Kurtosis examples: For the Student's t distribution, the excess kurtosis is 6/(DF − 4) for DF > 4 (for DF ≤ 4 the kurtosis is infinite). For a uniform distribution, the excess kurtosis is −6/5.

    One impact of excess kurtosis: For a normal distribution, the sample variance will have an expected value of σ² and a variance of 2σ⁴/(n − 1). For a distribution with excess kurtosis, the variance of the sample variance is larger, growing with the kurtosis.

    Sample kurtosis: For a sample of size n, the sample excess kurtosis is g2 = [ (1/n) Σ(xᵢ − x̄)⁴ ] / [ (1/n) Σ(xᵢ − x̄)² ]² − 3. An unbiased estimator of the sample excess kurtosis is G2 = [ (n + 1)·g2 + 6 ] · (n − 1) / [ (n − 2)(n − 3) ]. For large n, the sampling distribution of the sample kurtosis approaches Normal with mean 0 and variance 24/n. For small samples, this estimator is biased. D. N. Joanes and C. A. Gill, "Comparing Measures of Sample Skewness and Kurtosis", The Statistician, 47(1), 183-189 (1998).
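The sample excess kurtosis and the Joanes-Gill bias-corrected estimator cited in this excerpt can be sketched in Python (the data list is invented for illustration; this code is not from the lecture):

```python
def excess_kurtosis(x):
    """Sample excess kurtosis g2 = m4 / m2**2 - 3 (biased moment estimator)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3

def unbiased_excess_kurtosis(x):
    """Joanes & Gill correction: G2 = ((n + 1)*g2 + 6)*(n - 1) / ((n - 2)*(n - 3))."""
    n = len(x)
    return ((n + 1) * excess_kurtosis(x) + 6) * (n - 1) / ((n - 2) * (n - 3))

# Invented data, purely for illustration
data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0]
print(excess_kurtosis(data))           # ≈ -0.75
print(unbiased_excess_kurtosis(data))  # ≈ -0.2857
```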
  • Covariances of Two Sample Rank Sum Statistics
    JOURNAL OF RESEARCH of the National Bureau of Standards - B. Mathematical Sciences, Volume 76B, Nos. 1 and 2, January-June 1972. Covariances of Two Sample Rank Sum Statistics. Peter V. Tryon, Institute for Basic Standards, National Bureau of Standards, Boulder, Colorado 80302 (November 26, 1971). This note presents an elementary derivation of the covariances of the c(c − 1)/2 two-sample rank sum statistics computed among all pairs of samples from c populations. Key words: c-sample problem; covariances; Mann-Whitney-Wilcoxon statistics; rank sum statistics; statistics. Mann-Whitney or Wilcoxon rank sum statistics, computed for some or all of the c(c − 1)/2 pairs of samples from c populations, have been used in testing the null hypothesis of homogeneity of distribution against a variety of alternatives [1, 3, 4, 5]. This note presents an elementary derivation of the covariances of such statistics under the null hypothesis. The usual approach to such an analysis is the rank sum viewpoint of the Wilcoxon form of the statistic. Using this approach, Steel [3] presents a lengthy derivation of the covariances. In this note it is shown that thinking in terms of the Mann-Whitney form of the statistic leads to an elementary derivation. For comparison and completeness the rank sum derivation of Kruskal and Wallis [2] is repeated in obtaining the means and variances. Let xᵣⁱ, r = 1, 2, ..., nᵢ, i = 1, 2, ..., c, be the r-th item in the sample of size nᵢ from the i-th of c populations. Let Mᵢⱼ be the Mann-Whitney statistic between the i-th and j-th samples, defined by Mᵢⱼ = Σ_{s=1}^{nⱼ} Σ_{r=1}^{nᵢ} zᵣₛ  (1), where zᵣₛ = 1 if xₛʲ > xᵣⁱ and zᵣₛ = 0 if xₛʲ ≤ xᵣⁱ. Thus Mᵢⱼ is the number of times items in the j-th sample exceed items in the i-th sample.
  • Analysis of Covariance (ANCOVA) with Two Groups
    NCSS Statistical Software NCSS.com. Chapter 226: Analysis of Covariance (ANCOVA) with Two Groups. Introduction. This procedure performs analysis of covariance (ANCOVA) for a grouping variable with 2 groups and one covariate variable. This procedure uses multiple regression techniques to estimate model parameters and compute least squares means. This procedure also provides standard error estimates for least squares means and their differences, and computes the T-test for the difference between group means adjusted for the covariate. The procedure also provides response vs covariate by group scatter plots and residuals for checking model assumptions. This procedure will output results for a simple two-sample equal-variance T-test if no covariate is entered and simple linear regression if no group variable is entered. This allows you to complete the ANCOVA analysis if either the group variable or covariate is determined to be non-significant. For additional options related to the T-test and simple linear regression analyses, we suggest you use the corresponding procedures in NCSS. The group variable in this procedure is restricted to two groups. If you want to perform ANCOVA with a group variable that has three or more groups, use the One-Way Analysis of Covariance (ANCOVA) procedure. This procedure cannot be used to analyze models that include more than one covariate variable or more than one group variable. If the model you want to analyze includes more than one covariate variable and/or more than one group variable, use the General Linear Models (GLM) for Fixed Factors procedure instead. Kinds of Research Questions. A large amount of research consists of studying the influence of a set of independent variables on a response (dependent) variable.
  • What Is Statistic?
    What is Statistic? OPRE 6301. In today's world, we are constantly being bombarded with statistics and statistical information. For example: customer surveys, medical news, demographics, political polls, economic predictions, marketing information, sales forecasts, stock market projections, the Consumer Price Index, sports statistics. How can we make sense out of all this data? How do we differentiate valid from flawed claims? What is Statistics?! "Statistics is a way to get information from data." Data: facts, especially numerical facts, collected together for reference or information. Information: knowledge communicated concerning some particular fact. Statistics is a tool for creating an understanding from a set of numbers. Humorous definitions: The science of drawing a precise line between an unwarranted assumption and a foregone conclusion. The science of stating precisely what you don't know. An example: stats anxiety. A business school student is anxious about their statistics course, since they've heard the course is difficult. The professor provides last term's final exam marks to the student. What can be discerned from this list of numbers? Data: the list of last term's marks (95, 89, 70, 65, 78, 57, ...). Information: new information about the statistics class, e.g. the class average, the proportion of the class receiving A's, the most frequent mark, the marks distribution, etc. Key Statistical Concepts. Population — a population is the group of all items of interest to a statistics practitioner; frequently very large, sometimes infinite. E.g. all 5 million Florida voters (per Example 12.5). Sample — a sample is a set of data drawn from the population.
  • Testing Hypotheses
    Chapter 7: Testing Hypotheses. Chapter Learning Objectives: Understanding the assumptions of statistical hypothesis testing; Defining and applying the components in hypothesis testing: the research and null hypotheses, sampling distribution, and test statistic; Understanding what it means to reject or fail to reject a null hypothesis; Applying hypothesis testing to two sample cases, with means or proportions. In the past, the increase in the price of gasoline could be attributed to a major national or global event, such as the Lebanon and Israeli war or Hurricane Katrina. However, in 2005, the price for a gallon of regular gasoline reached $3.00 and remained high for a long time afterward. The impact of unpredictable fuel prices is still felt across the nation, but the burden is greater among distinct socioeconomic groups and geographic areas. Lower-income Americans spend eight times more of their disposable income on gasoline than wealthier Americans do.1 For example, in Wilcox, Alabama, individuals spend 12.72% of their income to fuel one vehicle, while in Hunterdon Co., New Jersey, people spend 1.52%. Nationally, Americans spend 3.8% of their income fueling one vehicle. The first state to reach the $3.00-per-gallon milestone was California in 2005. California's drivers were especially hit hard by the rising price of gas, due in part to their reliance on automobiles, especially for work commuters. Analysts predicted that gas prices would continue to rise nationally. Declines in consumer spending and confidence in the economy have been attributed in part to the high (and rising) cost of gasoline. In 2010, gasoline prices remained higher for states along the West Coast, particularly in Alaska and California.
  • Tests of Hypotheses Using Statistics
    Tests of Hypotheses Using Statistics. Adam Massey* and Steven J. Miller†, Mathematics Department, Brown University, Providence, RI 02912. Abstract: We present the various methods of hypothesis testing that one typically encounters in a mathematical statistics course. The focus will be on conditions for using each test, the hypothesis tested by each test, and the appropriate (and inappropriate) ways of using each test. We conclude by summarizing the different tests (what conditions must be met to use them, what the test statistic is, and what the critical region is). Contents: 1 Types of Hypotheses and Test Statistics 2 (1.1 Introduction 2; 1.2 Types of Hypotheses 3; 1.3 Types of Statistics 3); 2 z-Tests and t-Tests 5 (2.1 Testing Means I: Large Sample Size or Known Variance 5; 2.2 Testing Means II: Small Sample Size and Unknown Variance 9); 3 Testing the Variance 12; 4 Testing Proportions 13 (4.1 Testing Proportions I: One Proportion 13; 4.2 Testing Proportions II: K Proportions 15; 4.3 Testing r × c Contingency Tables 17; 4.4 Incomplete r × c Contingency Tables 18); 5 Normal Regression Analysis 19; 6 Non-parametric Tests 21 (6.1 Tests of Signs 21; 6.2 Tests of Ranked Signs 22; 6.3 Tests Based on Runs 23); *E-mail: [email protected] †E-mail: [email protected] 7 Summary 26 (7.1 z-tests 26; 7.2 t-tests 27; 7.3 Tests comparing means 27; 7.4 Variance Test 28; 7.5 Proportions 28; 7.6 Contingency Tables ...
  • Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference
    Review Understanding Statistical Hypothesis Testing: The Logic of Statistical Inference Frank Emmert-Streib 1,2,* and Matthias Dehmer 3,4,5 1 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland 2 Institute of Biosciences and Medical Technology, Tampere University, 33520 Tampere, Finland 3 Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, 4040 Steyr, Austria 4 Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), 6060 Hall, Tyrol, Austria 5 College of Computer and Control Engineering, Nankai University, Tianjin 300000, China * Correspondence: [email protected]; Tel.: +358-50-301-5353 Received: 27 July 2019; Accepted: 9 August 2019; Published: 12 August 2019 Abstract: Statistical hypothesis testing is among the most misunderstood quantitative analysis methods from data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components and their connections. Our presentation is applicable to all statistical hypothesis tests as generic backbone and, hence, useful across all application domains in data science and artificial intelligence. Keywords: hypothesis testing; machine learning; statistics; data science; statistical inference 1. Introduction We are living in an era that is characterized by the availability of big data. In order to emphasize the importance of this, data have been called the ‘oil of the 21st Century’ [1]. However, for dealing with the challenges posed by such data, advanced analysis methods are needed.