Inferential Statistics
Inferential Statistics:
Inferential statistics is the body of statistical techniques that deal with the question "How reliable is the conclusion or estimate that we derive from a set of data?" The two main techniques are confidence intervals and hypothesis tests.
Confidence Interval:
A confidence interval is an interval around a sample estimate that quantifies the uncertainty in that estimate. Since many different samples might be drawn from a population, many different confidence intervals might likewise be imagined for a given population parameter (though with the observed data you see only one of them). A 95% confidence interval, for example, is one interval in a set of intervals, and the property of this set is that 95% of the intervals in it contain the population parameter. Likewise, a 90% confidence interval for the population mean is an interval belonging to a set of intervals, 90% of which contain the population mean.
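The coverage property described above can be illustrated by simulation. The sketch below is an added illustration, not from the original text; it assumes NumPy is available and uses a simple normal-approximation 95% interval for the mean, drawn repeatedly from a known population.

import numpy as np

rng = np.random.default_rng(0)
true_mean, true_sd, n, trials = 50.0, 10.0, 40, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, true_sd, size=n)
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)       # standard error of the mean
    lo, hi = m - 1.96 * se, m + 1.96 * se      # approximate 95% confidence interval
    covered += (lo <= true_mean <= hi)

print(f"{covered / trials:.3f} of the intervals contained the true mean")
# Expect a value close to 0.95.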
Hypothesis Testing:
Hypothesis testing (also called "significance testing") is a statistical procedure for discriminating between two statistical hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha, often denoted as H1). Hypothesis testing rests on the presumption of validity of the null hypothesis; that is, the null hypothesis is accepted unless the data at hand testify strongly enough against it.
The philosophical basis for hypothesis testing lies in the fact that random variation pervades all aspects of life, and in the desire to avoid being fooled by what might be chance variation. The alternative hypothesis typically describes some change or effect that you expect or hope to see confirmed by data. For example, new drug A works better than standard drug B. Or the accuracy of a new weapon targeting system is better than historical standards. The null hypothesis embodies the presumption that nothing has changed, or that there is no difference.
Hypothesis testing comes into play if the observed data do, in fact, suggest that the alternative hypothesis is true (the new drug produces better survival times than the old one in an experiment, for example). We ask the question "is it possible that chance variation might have produced this result?"
As noted, the null hypothesis stands ("is accepted") unless the data at hand provide strong enough evidence against it. "Strong enough" means that the probability of obtaining a result as extreme as the observed result, given that the null hypothesis is true, is small enough (usually < 0.05).
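To make the question "could chance variation have produced this result?" concrete, here is a minimal permutation-style sketch (an added illustration, not from the original text; the survival times and group labels are made up). Under the null hypothesis the group labels carry no information, so shuffling them repeatedly shows how often chance alone produces a difference in means at least as large as the observed one.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical survival times (months) for new drug A and standard drug B.
a = np.array([34, 41, 39, 45, 38, 42, 37, 44])
b = np.array([31, 36, 33, 35, 30, 37, 32, 34])

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

count = 0
n_perm = 10_000
for _ in range(n_perm):
    rng.shuffle(pooled)                     # relabel at random (null: labels don't matter)
    diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
    count += (diff >= observed)             # at least as extreme as observed (one-sided)

print(f"observed difference = {observed:.2f}, estimated p-value = {count / n_perm:.4f}")
# A small p-value (e.g. < 0.05) means chance alone rarely produces so large a difference.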
Interval Scale:
An interval scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, but where "0" on the scale does not represent the absence of the thing being measured. Fahrenheit and Celsius temperature scales are examples.
See also: nominal scale, ordinal scale, ratio scale
Nominal Scale:
A nominal scale is really a list of categories to which objects can be classified. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay nor return." The data so classified are termed categorical data.
See also: ordinal scale, interval scale, ratio scale
Ordinal Scale:
An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement) to 10 (disappearance of the condition). While you know that a 4 is better than a 2, there is no implication that a 4 is twice as good as a 2. Nor is the improvement from 2 to 4 necessarily the same "amount" of improvement as the improvement from 6 to 8. All we know is that there are 11 categories, with 1 being better than 0, 2 being better than 1, etc.
Ratio Scale:
A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a "4" on such a scale implies twice as much of the thing being measured as a "2."
See also: nominal scale, ordinal scale, interval scale
Standard Deviation:
The standard deviation is a measure of dispersion. It is the positive square root of the variance.
An advantage of the standard deviation (as compared to the variance) is that it expresses dispersion in the same units as the original values in the sample or population. For example, the standard deviation of a series of measurements of temperature is measured in degrees; the variance of the same set of values is measured in "degrees squared".
Note: When using the sample standard deviation to estimate the population standard deviation, the divisor (n-1) is typically used instead of n when averaging the squared deviations. Using (n-1) reduces the bias of the estimate.
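In NumPy, for instance, the divisor is controlled by the ddof ("delta degrees of freedom") argument. The small sketch below is an added illustration, not from the original text; it contrasts the population-style divisor n with the sample divisor n-1.

import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sd_pop = np.std(x, ddof=0)   # divisor n: describes this set of values as a whole
sd_samp = np.std(x, ddof=1)  # divisor n-1: estimates the standard deviation of a wider population

print(sd_pop, sd_samp)       # 2.0  2.138...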
Degrees of Freedom:
For a set of data points in a given situation (e.g. with the mean or another parameter specified, or not), the degrees of freedom is the minimum number of values that must be specified to determine all the data points.
For example, if you have a sample of N random values, there are N degrees of freedom (you cannot determine the Nth value even if you know the other N-1 values). If the data have been obtained by subtracting the sample mean from each data point (thus making the new sample mean equal to zero), there are only N-1 degrees of freedom: if you know N-1 of the data points, you can find the remaining (Nth) point, which is just the negative of the sum of the other N-1 values. This is another way of saying that if you have N data points and you know the sample mean, you have N-1 degrees of freedom.
Another example is a 2x2 table; it generally has 4 degrees of freedom: each of the 4 cells can contain any number. If the row and column marginal totals are specified, there is only 1 degree of freedom: if you know the number in one cell, you can calculate the remaining 3 numbers from that number and the marginal totals.
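The 2x2 example can be made concrete with a short sketch (an added illustration, not from the original text; it assumes NumPy is available): once the marginal totals are fixed, a single cell determines the whole table.

import numpy as np

row_totals = np.array([30, 70])    # fixed row margins
col_totals = np.array([45, 55])    # fixed column margins (same grand total: 100)

a = 12                             # the one free cell (upper-left); 1 degree of freedom
table = np.array([
    [a,                 row_totals[0] - a],
    [col_totals[0] - a, row_totals[1] - (col_totals[0] - a)],
])
print(table)
print(table.sum(axis=1), table.sum(axis=0))   # recovers the row and column totals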
Degrees of freedom are often used to characterize various distributions. See, for example, chi-square distribution, t-distribution, F distribution.
t-distribution:
A continuous distribution with a single-peaked probability density, symmetrical around zero and bell-curve shaped. The t-distribution is specified completely by one parameter - the number of degrees of freedom.
If X and Y are independent random variables, X has the standard normal distribution and Y has a chi-square distribution with N degrees of freedom, then the random variable
T = X / √(Y/N)
has a t-distribution with N degrees of freedom.
It was found by W. S. Gosset, a statistician working for Guinness (the Irish brewery), to be a good approximation to the distribution of the means of randomly drawn samples from a fixed population. Gosset published his findings in 1908 under the name "Student," hence the distribution is often called "Student's t." In the 1930s, the t-distribution was also found to be a good approximation to the distribution of the difference in means of two randomly drawn samples. (Note: the exact distribution of these differences can be derived by permuting the two samples. Before computers, when the derivation of this exact distribution was difficult or impossible to determine, the t-distribution was universally used as a substitute for the exact permutation distribution. With computer-intensive methods now widely available, exact tests are increasingly used in preference to the t-distribution.)
See also: t-statistic and t-test
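The construction T = X / √(Y/N) described above can be checked numerically. The sketch below is an added illustration, not from the original text; it assumes NumPy and SciPy are available, builds T from standard normal and chi-square draws, and compares its quantiles with those of the t-distribution with N degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N = 5                                   # degrees of freedom
size = 200_000

X = rng.standard_normal(size)           # standard normal draws
Y = rng.chisquare(N, size)              # chi-square draws with N degrees of freedom
T = X / np.sqrt(Y / N)                  # should follow a t-distribution with N df

for q in (0.90, 0.95, 0.99):
    print(q, np.quantile(T, q), stats.t.ppf(q, df=N))
# The empirical and theoretical quantiles should agree closely.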
Standard Normal Distribution:
The standard normal distribution is the normal distribution where the mean is zero and the standard deviation is one.
Chi-Square Distribution:
The square of a random variable having a standard normal distribution is distributed as chi-square with 1 degree of freedom. The sum of squares of n independently distributed standard normal variables has a chi-square distribution with n degrees of freedom. The distribution is typically used to compare multiple-sample count data in contingency tables to expected values under a null hypothesis.
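As an example of the contingency-table use mentioned above, the following sketch (an added illustration with made-up counts, not from the original text; it assumes SciPy is available) compares observed counts with the counts expected under the null hypothesis of independence.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: response to a mail offer by region.
observed = np.array([
    [90, 60],    # region 1: purchased, did not purchase
    [70, 80],    # region 2
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}, p-value = {p:.4f}")
print("expected counts under independence:\n", expected)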
F Distribution:
The F distribution is a family of distributions differentiated by two parameters: m1 (degrees of freedom, numerator) and m2 (degrees of freedom, denominator). If x1 and x2 are independent random variables with a chi-square distribution with m1 and m2 degrees of freedom respectively, then the random variable f = (x1/m1)/(x2/m2) has an F distribution with (m1,m2) degrees of freedom.
The F-distribution arises naturally in tests for comparing the variances of two populations. The ratio of two sample variances has an F-distribution with (m1-1, m2-1) degrees of freedom if the samples, of sizes m1 and m2, are drawn from normal populations with equal variances.
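The variance-ratio property can be verified by simulation. The sketch below is an added illustration, not from the original text; it assumes NumPy and SciPy are available, draws pairs of normal samples with equal variance, and compares the quantiles of the variance ratio with the F distribution with (m1-1, m2-1) degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m1, m2, trials = 12, 15, 100_000

s1 = rng.normal(0.0, 2.0, size=(trials, m1)).var(axis=1, ddof=1)
s2 = rng.normal(0.0, 2.0, size=(trials, m2)).var(axis=1, ddof=1)
ratio = s1 / s2                         # should follow F(m1-1, m2-1)

for q in (0.90, 0.95, 0.99):
    print(q, np.quantile(ratio, q), stats.f.ppf(q, m1 - 1, m2 - 1))
# Empirical and theoretical quantiles should agree closely.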
Z score:
An observation's z-score tells you the number of standard deviations it lies away from the population mean (and in which direction). The calculation is as follows:
z = (x - μ) / σ, where x is the observation itself, μ is the mean of the distribution, and σ is the standard deviation of the distribution.
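A quick numeric sketch of the calculation (an added illustration with made-up values, not from the original text):

# z-score: how many standard deviations an observation lies from the mean
x, mu, sigma = 130.0, 100.0, 15.0   # observation, population mean, population standard deviation
z = (x - mu) / sigma
print(z)                            # 2.0 -> two standard deviations above the mean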
t-test:
A t-test is a statistical hypothesis test based on a test statistic whose sampling distribution is a t-distribution. Various t-tests, strictly speaking, are aimed at testing hypotheses about populations with a normal probability distribution. However, statistical research has shown that t-tests often provide quite adequate results for non-normally distributed populations too.
The term "t-test" is often used in a narrower sense - it refers to a popular test aimed at testing the hypothesis that the population mean is equal to some value (see also t-statistic).
The most popular t-tests are aimed at testing the following hypotheses (a short sketch of the first three cases follows the list):
1) The population mean is as hypothesized (the population variance is not known).
2) The means of two populations are equal (the population variances are not known but equal).
3) The means of two populations are equal (the population variances are not known and not equal).
4) The correlation coefficient for two random variables is zero.
5) The slope of the population regression line is zero.
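The first three cases in the list above map directly onto standard library routines. The sketch below is an added illustration with made-up data, not from the original text; it assumes NumPy and SciPy are available.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(52.0, 8.0, size=30)     # hypothetical measurements, group A
b = rng.normal(48.0, 8.0, size=35)     # hypothetical measurements, group B

# 1) One-sample t-test: is the mean of 'a' equal to 50?
t1, p1 = stats.ttest_1samp(a, popmean=50.0)

# 2) Two-sample t-test assuming equal (but unknown) variances
t2, p2 = stats.ttest_ind(a, b, equal_var=True)

# 3) Welch's t-test: variances not assumed equal
t3, p3 = stats.ttest_ind(a, b, equal_var=False)

print(f"one-sample:     t={t1:.2f} p={p1:.4f}")
print(f"equal-variance: t={t2:.2f} p={p2:.4f}")
print(f"Welch:          t={t3:.2f} p={p3:.4f}")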
t-statistic:
A t-statistic is a statistic whose sampling distribution is a t-distribution.
Often, the term "t-statistic" is used in a narrower sense - as the standardized difference between a sample mean x̄ and a population mean μ, where N is the sample size:
t = (x̄ - μ) / (s/√N)
where x̄ and s are the mean and the standard deviation of the sample.
See also: t-test
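The formula can be checked against a library routine. The small sketch below is an added illustration with made-up data, not from the original text; it assumes NumPy and SciPy are available.

import numpy as np
from scipy import stats

x = np.array([9.1, 10.3, 9.8, 10.9, 10.4, 9.6, 10.1, 10.7])
mu0 = 10.0                                      # hypothesized population mean

t_by_hand = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
t_scipy, p = stats.ttest_1samp(x, popmean=mu0)

print(t_by_hand, t_scipy)                       # the two t values agree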