Goodness of Fit

GOODNESS OF FIT

INTRODUCTION

Goodness-of-fit tests are used to determine how well the shape of a sample of data obtained from an experiment matches a conjectured or hypothesized distribution shape for the population from which the data were collected. The idea behind a goodness-of-fit test is to see whether the sample comes from a population with the claimed distribution. Another way of looking at it is to ask whether the frequency distribution fits a specific pattern or, more to the point, how the observed frequencies in each class interval of a histogram compare to the frequencies that would be expected if the data exactly followed the hypothesized probability distribution. This is relevant to cost risk analysis because we often want to apply a distribution to an element of cost based on observed sample data.

A goodness-of-fit test is a statistical hypothesis test: set up the null and alternative hypotheses; determine alpha; calculate a test statistic; look up a critical value; draw a conclusion.

In this course, we will discuss three tests that are commonly used to perform goodness-of-fit analyses: the Chi-Square (χ²) test, the Kolmogorov-Smirnov One Sample Test, and the Anderson-Darling test. The Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions, while the χ² test can be applied to both discrete and continuous distributions.

GOODNESS OF FIT TESTS

CHI SQUARE TEST

The Chi-Square test is used to test whether a sample of data came from a population with a specific distribution. An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any univariate distribution for which you can calculate the cumulative distribution function. The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes).
This is actually not a restriction, since for non-binned data you can simply calculate a histogram or frequency table before generating the chi-square test. However, the value of the chi-square test statistic depends on how the data are binned. Another characteristic of the chi-square test is that it requires a sufficient sample size for the chi-square test statistic to be valid.

1 Dec 2014

The chi-square statistic measures how well the expected frequencies of the fitted distribution compare with the frequencies in a histogram of the observed data. It compares the histogram of the data to the shape of the candidate density function (continuous data) or mass function (discrete data).

Definition

The chi-square test is defined for the hypotheses:

H0: The data follow a specified distribution.
H1: The data do not follow the specified distribution.

Test Statistic

For the chi-square goodness-of-fit computation, the data are divided into k bins and the test statistic is defined as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency for bin i and $E_i$ is the expected frequency for bin i. Computation of the expected frequency ($E_i$) will be shown by example.

For the chi-square approximation to be valid, the expected frequency in each bin should be at least 5. This test is less sensitive when the sample size is small, and if some of the theoretical bin counts are less than 5, you may need to combine some bins to ensure that there are at least 5 theoretical observations in each bin.

Significance Level: α

Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - 1 - number of population parameters estimated) degrees of freedom, where k is the number of non-empty bins. If specific sample statistics need to be computed in order to develop the binning, then the degrees of freedom are reduced by the number of statistics that were computed.
Therefore, the hypothesis that the data are from a population with the specified distribution is rejected if the computed χ² is greater than the critical value. Note that the information needed to determine critical values from the χ² distribution is the level of significance (α) and the degrees of freedom (df).

If the sum of the scaled squared deviations,

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

is small, the observed frequencies are close to the expected frequencies and there is no reason to reject the claim that the sample came from that distribution. Only when the sum is large is there a reason to question the distribution. Therefore, the chi-square goodness-of-fit test is always a right-tail test.

KOLMOGOROV-SMIRNOV TEST

The Kolmogorov-Smirnov One Sample Test, also referred to as the KS test, is an alternative to the χ² test. It is called a distribution-free test because its critical values do not depend on the specific distribution being tested. The KS test compares the cumulative relative frequency distribution derived from the sample data with the theoretical cumulative relative frequency distribution described by the null hypothesis. In essence, the KS test is based on the maximum distance between these two cumulative relative frequency curves.

The test statistic, D, is the maximum absolute deviation between the observed cumulative relative frequencies and the expected (theoretical) cumulative relative frequencies. The null hypothesis is rejected or not rejected depending on the probability that a deviation this large would occur if the sample data really came from the distribution specified in the null hypothesis. Note that the KS test works with relative frequencies, which are proportions (percentages) rather than actual counts. The KS test is restricted to continuous distributions only.
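The maximum-distance idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's worksheet: the function name `ks_statistic` and the sample values are hypothetical, and the sketch checks the empirical step function on both sides of each jump, which is a common way to find the largest gap against a continuous curve. A table that compares only one value per observation may give a slightly different D.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Largest absolute gap between the empirical and hypothesized CDFs."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fe = cdf(x)  # theoretical cumulative relative frequency at x
        # The observed cumulative relative frequency jumps from (i-1)/n to i/n
        # at x, so the gap is checked just before and just after the jump.
        d = max(d, abs(i / n - fe), abs(fe - (i - 1) / n))
    return d


# Hypothetical sample (not from the course), tested against Normal(100, 5).
sample = [96.2, 101.5, 98.3, 104.9, 99.1, 107.4, 93.8, 100.6]
d = ks_statistic(sample, NormalDist(mu=100.0, sigma=5.0).cdf)
print(f"D = {d:.4f}")
```

The resulting D would then be compared with a tabulated KS critical value for the chosen α and sample size.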
Definition

The Kolmogorov-Smirnov test is defined as:

H0: The data follow a specified distribution.
H1: The data do not follow the specified distribution.

Test Statistic: The Kolmogorov-Smirnov test statistic is defined as:

$$D = \max \lvert F_o - F_e \rvert$$

where:
$F_o$ = observed cumulative relative frequency
$F_e$ = theoretical cumulative relative frequency

Significance Level: α

Critical Values: The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table. There are several variations of these tables in the literature that use somewhat different scalings for the KS test statistic and critical regions. These alternative formulations should be equivalent, but it is necessary to ensure that the test statistic is calculated in a way that is consistent with how the critical values were tabulated.

ANDERSON-DARLING TEST

The Anderson-Darling test is used to test whether a sample of data came from a population with a specific distribution. It is a modification of the Kolmogorov-Smirnov (KS) test and gives more weight to the tails than the KS test does. The KS test is distribution free in the sense that the critical values do not depend on the specific distribution being tested. The Anderson-Darling test makes use of the specific distribution in calculating critical values. This has the advantage of allowing a more sensitive test and the disadvantage that critical values must be calculated for each distribution.

Definition

The Anderson-Darling test is defined as:

H0: The data follow a specified distribution.
H1: The data do not follow the specified distribution.

Test Statistic: The Anderson-Darling test statistic is defined as:

$$A^2 = -\frac{\text{Sum}}{n} - n, \qquad \text{Sum} = \sum_{i=1}^{n} (2i - 1)\left[\ln(P_i) + \ln(1 - P_{n+1-i})\right]$$

where Sum is the total of the $(2i - 1)[\ln(P_i) + \ln(1 - P_{n+1-i})]$ column, $P_i$ is the hypothesized cumulative probability at the i-th ranked observation, and n is the sample size.

The adjusted test statistic, designated A*, is computed as follows:

$$A^* = A^2 \left(1.0 + \frac{0.75}{n} + \frac{2.25}{n^2}\right)$$

This is the value that is compared against the critical value.
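As a rough sketch of how the Sum column, A², and A* fit together, the calculation can be written out in Python. The function name `anderson_darling` and the sample values are hypothetical, and the adjustment factor is simply the one given in the text; which factor is appropriate depends on the distribution and on the critical-value table being used.

```python
import math
from statistics import NormalDist


def anderson_darling(data, cdf):
    """Return (A2, A_star) for data tested against a fully specified CDF."""
    xs = sorted(data)
    n = len(xs)
    p = [cdf(x) for x in xs]  # P_i: hypothesized CDF at the i-th ranked value
    # Sum of the (2i - 1) * [ln(P_i) + ln(1 - P_{n+1-i})] column, i = 1..n.
    # p[i - 1] is P_i and p[n - i] is P_{n+1-i} with zero-based indexing.
    total = sum(
        (2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -total / n - n
    # Adjustment factor as given in the text (assumed to match the table used).
    a_star = a2 * (1.0 + 0.75 / n + 2.25 / n**2)
    return a2, a_star


# Hypothetical sample (not from the course), tested against Normal(100, 5).
sample = [96.2, 101.5, 98.3, 104.9, 99.1, 107.4, 93.8, 100.6]
a2, a_star = anderson_darling(sample, NormalDist(mu=100.0, sigma=5.0).cdf)
print(f"A2 = {a2:.4f}, A* = {a_star:.4f}")
```

The logarithms are undefined if any hypothesized cumulative probability is exactly 0 or 1, so a sketch like this assumes every observation lies strictly inside the support of the hypothesized distribution.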
Significance Level: α

Critical Region: The critical values for the Anderson-Darling test depend on the specific distribution that is being tested. Tabulated values and formulas have been published (Stephens, 1974, 1976, 1977, 1979) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type 1). The test is a one-sided test, and the hypothesis that the distribution is of a specific form is rejected if the test statistic, A², is greater than the critical value.

Note that for a given distribution, the Anderson-Darling statistic may be multiplied by a constant (which usually depends on the sample size, n). These constants are given in the various papers by Stephens. In the sample output below, this is the "adjusted Anderson-Darling" statistic. This is what should be compared against the critical values. Also, be aware that different constants (and therefore different critical values) have been published. You need to know which constant was used for a given set of critical values (the needed constant is typically given with the critical values).

EXAMPLES

CHI-SQUARE TEST EXAMPLE

You have been presented with a set of 25 data points that represent the weights in pounds of missile warheads that have been installed on a number of different kinds of aircraft. The government is interested in determining whether these weights can be considered to be normally distributed with a mean of 100 pounds and a standard deviation of 5 pounds. Table 1 provides the raw data with the values ranked from low to high (reading down each column).

Table 1: Sample Data

WEIGHTS (lbs.)
 79.5    93.6    98.7   102.6   107.3
 85.1    94.8    99.4   103.4   108.2
 88.4    95.8   100.0   104.2   108.4
 89.8    96.4   100.6   104.2   111.8
 93.2    98.7   101.9   105.6   113.9

In order to perform the Chi-Square test, the data must be tabulated into "bins" to form the histogram.
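One way the binning and the test statistic could be worked through for the Table 1 data is sketched below in Python. The bin cut points (97, 100, 103) and the 0.05 significance level are assumptions made only for illustration, not the course's own choices. They were picked so that every expected count is at least 5. Because the mean and standard deviation are specified rather than estimated from the sample, the sketch uses k - 1 degrees of freedom.

```python
from bisect import bisect_right
from statistics import NormalDist

# Table 1 weights (lbs.), ranked from low to high.
weights = [79.5, 85.1, 88.4, 89.8, 93.2, 93.6, 94.8, 95.8, 96.4, 98.7,
           98.7, 99.4, 100.0, 100.6, 101.9, 102.6, 103.4, 104.2, 104.2, 105.6,
           107.3, 108.2, 108.4, 111.8, 113.9]

hypothesized = NormalDist(mu=100.0, sigma=5.0)  # distribution under H0
cuts = [97.0, 100.0, 103.0]  # hypothetical interior bin boundaries


def chi_square_gof(data, dist, cuts):
    """Observed counts, expected counts, and chi-square for bins [lo, hi)."""
    n = len(data)
    k = len(cuts) + 1
    observed = [0] * k
    for x in data:
        observed[bisect_right(cuts, x)] += 1
    # Cumulative probability at each boundary; the outer bins are open-ended.
    cum = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]
    expected = [n * (cum[j + 1] - cum[j]) for j in range(k)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi2


observed, expected, chi2 = chi_square_gof(weights, hypothesized, cuts)
df = len(observed) - 1    # no population parameters estimated from the sample
critical_value = 7.815    # chi-square table value for df = 3, alpha = 0.05
reject_h0 = chi2 > critical_value

print("Observed:", observed)
print("Expected:", [round(e, 2) for e in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumed bins the statistic falls below the table value, so the sketch does not reject the hypothesized distribution. A different choice of bins can change the statistic, as noted earlier for the chi-square test.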
