R session

Shota Gugushvili November 12, 2013

1 Permutation test

Here I will illustrate the permutation test as described on pp. 162–163 in Wasserman. First let me generate two samples, X1,...,Xm ∼ FX and Y1,...,Yn ∼ FY.

> m <- 10
> n <- 10
> set.seed(123456)
> dat.1 <- rnorm(m)
> dat.2 <- rnorm(n, mean=3, sd=1)

I want to test H0 : FX = FY versus H1 : FX ≠ FY. Let T(X1,...,Xm,Y1,...,Yn) = |X̄m − Ȳn|. In the following code I implement the permutation test based on T (since N = m + n = 20, the total number of possible permutations is 20!; type factorial(20) in R to find out how large this number actually is!):

> dat.3 <- c(dat.1, dat.2)  # Combine dat.1 and dat.2 into one data set.
> t.obs <- abs(mean(dat.1) - mean(dat.2))  # t_{obs} in Wasserman, p. 163, step 1.
> t.obs

[1] 2.460882

> # Now steps 2 and 3 in Wasserman, p. 163:
> N <- m + n  # Size of dat.3.
> B <- 100000  # Number of replications.
> # In the next object we will fill in values of T based on permuted data.
> T.sim <- NULL
> # Loop for filling in T.sim values.
> index <- seq(1, N)
> for (i in 1:B) {
+   random1 <- sample(dat.3, m, replace=FALSE)
+   random2 <- setdiff(dat.3, random1)
+   T.sim[i] <- abs(mean(random1) - mean(random2))
+ }
> # Now we compute an approximate p-value (Wasserman, p. 163, step 4).
> p.value <- sum(T.sim > t.obs)/B  # Check the English R guide on the blackboard, pp. 11-12.
> p.value
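As an aside, here is a slightly different way of writing the same simulation (my own sketch, not from Wasserman): permuting indices instead of values avoids a subtle problem with setdiff(dat.3, random1), which would silently drop observations if dat.3 happened to contain tied values.

```r
# Permutation test via index sampling (tie-safe variant of the loop above).
set.seed(123456)
dat.1 <- rnorm(10)
dat.2 <- rnorm(10, mean = 3, sd = 1)
dat.3 <- c(dat.1, dat.2)
t.obs <- abs(mean(dat.1) - mean(dat.2))
B <- 10000
T.sim <- replicate(B, {
  idx <- sample(length(dat.3), length(dat.1))  # indices of the permuted "X" sample
  abs(mean(dat.3[idx]) - mean(dat.3[-idx]))
})
p.value <- mean(T.sim > t.obs)
```

With continuous data ties essentially never occur, so the setdiff version works in practice; the index version is simply the safer habit.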

[1] 7e-05

A small p-value gives very strong evidence against the null hypothesis.

Exercise 1. Figure out why the code does what the algorithm on p. 163 in Wasserman says.

Remark 1. The statistic T as above is not a good choice for detecting a difference between two distributions that have the same means. Always choose T carefully.
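To see the point of the remark concretely, here is a small sketch (my own toy example, not from Wasserman): two samples with equal means but different spreads. A permutation test based on |X̄m − Ȳn| is essentially blind to the difference, while one based on the difference of standard deviations picks it up.

```r
set.seed(1)
x <- rnorm(50, mean = 0, sd = 1)
y <- rnorm(50, mean = 0, sd = 3)   # same mean, three times the spread
z <- c(x, y)
B <- 5000

# Generic permutation p-value for a two-sample statistic t.fun.
perm.p <- function(t.fun) {
  t.obs <- t.fun(x, y)
  T.sim <- replicate(B, {
    idx <- sample(length(z), length(x))
    t.fun(z[idx], z[-idx])
  })
  mean(T.sim > t.obs)
}

p.mean <- perm.p(function(a, b) abs(mean(a) - mean(b)))  # blind to a spread difference
p.sd   <- perm.p(function(a, b) abs(sd(a) - sd(b)))      # targets the spread difference
```

Running this, p.sd comes out far smaller than p.mean: the mean-based statistic has essentially no power against a pure spread difference.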

2 One-sample Kolmogorov-Smirnov test

The command ks.test performs both the one-sample and two-sample Kolmogorov-Smirnov tests. Let us start with the one-sample case. We generate a sample of size n = 50 from the N(µ, σ²) distribution and test the null hypothesis that the data come from the normal CDF Φµ,σ with (µ, σ) = (0, 1).

> data <- rnorm(50)
> ks.test(data, "pnorm", mean=0, sd=1)

        One-sample Kolmogorov-Smirnov test

data:  data
D = 0.109, p-value = 0.5559
alternative hypothesis: two-sided

The syntax is quite obvious: the command takes as its input the data set, the name of the distribution and the parameter values determining the null hypothesis.

Exercise 2. Generate a sample of size n = 20 from the Gamma(α, β) distribution with parameters α = 1 and β = 2 (take good notice: here I use Wasserman's parametrisation of the gamma distribution). Next test the null hypothesis α0 = 0.8, β0 = 1.8 using the Kolmogorov-Smirnov test.
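A hint on the parametrisation (this is a common pitfall): in Wasserman's Gamma(α, β) the parameter β is a scale parameter, whereas rgamma()'s second positional argument is a rate. Pass scale= explicitly (and do the same in pgamma() when testing):

```r
set.seed(1)
alpha <- 1
beta <- 2
x <- rgamma(20, shape = alpha, scale = beta)  # Gamma(1, 2) in Wasserman's notation
# Beware: rgamma(20, alpha, beta) would instead use rate = beta, i.e. scale 1/2.
```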

3 Two-sample Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov test is just as easy to perform. To illustrate it I will again use the MWG and M 31 data sets. Assume Wikipedia is right in saying the distance modulus to Andromeda is 24.4. I will subtract that number from the values (apparent magnitudes) in the M 31 data set and compare the resulting data set to the MWG data set (absolute magnitudes).

> GC_M31 <- read.table("http://astrostatistics.psu.edu/MSMA/datasets/GlobClus_M31.dat",
+   header=TRUE)
> GC_MWG <- read.table("http://astrostatistics.psu.edu/MSMA/datasets/GlobClus_MWG.dat",
+   header=TRUE)
> data.x <- GC_M31[,2] - 24.4
> data.y <- GC_MWG[,2]
> ks.test(data.x, data.y)

        Two-sample Kolmogorov-Smirnov test

data:  data.x and data.y
D = 0.259, p-value = 0.0002817
alternative hypothesis: two-sided

The syntax is self-evident: I supply the command with two data sets. An extremely small p-value gives very strong evidence in favour of the claim that the two distributions are not the same.

4 Mann-Whitney test

Another nonparametric test for the null hypothesis of equality of two distributions is the Mann-Whitney test. It is included in R as part of the functionality of the wilcox.test command. Here is an illustration using the same example as in the case of the two-sample Kolmogorov-Smirnov test above.

> wilcox.test(data.x, data.y)

        Wilcoxon rank sum test with continuity correction

data:  data.x and data.y
W = 17268.5, p-value = 0.009496
alternative hypothesis: true location shift is not equal to 0

The result is comparable to the Kolmogorov-Smirnov test case in that the p-value is small.

5 Normality tests

There are many normality tests implemented in R. Let us try out the Shapiro-Wilk test using the MWG and M 31 data sets.

> data.x <- GC_M31[,2]
> data.y <- GC_MWG[,2]
> shapiro.test(data.x)

        Shapiro-Wilk normality test

data:  data.x
W = 0.9853, p-value = 0.001017

> shapiro.test(data.y)

        Shapiro-Wilk normality test

data:  data.y
W = 0.9883, p-value = 0.675

A small p-value gives strong evidence against the normality assumption in the M 31 case. We see no evidence against normality in the MWG case. You can compare the results of the Shapiro-Wilk test to the graphical checks (histogram, QQ-plot) you have applied to the two data sets before. Many other normality tests are implemented in the nortest package (if you want to use it, you first have to install and then load it).

6 Lilliefors test

Let me draw your attention to the Lilliefors test (a Kolmogorov-Smirnov test with estimated parameters).

> library(nortest)
> lillie.test(data.y)

        Lilliefors (Kolmogorov-Smirnov) normality test

data:  data.y
D = 0.0688, p-value = 0.4498

Compare this to the output of the Kolmogorov-Smirnov test that disregards the fact that the parameters have been estimated.

> ks.test(data.y, "pnorm", mean=mean(data.y), sd=sd(data.y))

        One-sample Kolmogorov-Smirnov test

data:  data.y
D = 0.0688, p-value = 0.8376
alternative hypothesis: two-sided

We get a much larger (and incorrect) p-value in the latter case. This does no harm in our particular example, but the message must be clear: the common practice in astronomy of disregarding the fact that the parameters have been estimated and proceeding with the usual Kolmogorov-Smirnov goodness-of-fit test can lead to invalid inference.
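To make the correction transparent, here is a sketch (my own, on made-up normal data) of approximating what lillie.test does by a parametric bootstrap: simulate the null distribution of D, re-estimating the parameters on every simulated sample.

```r
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)  # stand-in for a real data set
d.obs <- ks.test(y, "pnorm", mean = mean(y), sd = sd(y))$statistic

B <- 2000
d.sim <- replicate(B, {
  ystar <- rnorm(length(y), mean = mean(y), sd = sd(y))
  # The crucial step: parameters are re-estimated from each simulated sample.
  ks.test(ystar, "pnorm", mean = mean(ystar), sd = sd(ystar))$statistic
})
p.corrected <- mean(d.sim > d.obs)
```

The naive ks.test p-value uses the null distribution of D computed as if the parameters were fixed in advance; the bootstrap p-value above accounts for the estimation, which is why it comes out smaller (as lillie.test's does).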

7 Additional practice

You will be working with the Hipparcos data set. Read its description here: http://astrostatistics.psu.edu/datasets/HIP_star.html.

Using exploratory data analysis techniques in R, 92 Hyades stars were identified out of the data set consisting of 2719 Hipparcos stars. The way this has been done is described here (if you want, you can try the steps out, but perhaps at this stage it is better to be content with the end results given below): http://astrostatistics.psu.edu/datasets/2006tutorial/2006reg.html. I presume there exist more refined astronomical techniques for doing that (just as there are more sophisticated statistical techniques), but the goal of that tutorial is just to show how exploratory data analysis can be performed in R and why it is useful for astronomers to know how to do it.

> hip <- read.table("http://astrostatistics.psu.edu/datasets/HIP_star.dat",
+   header=T, fill=T)
> attach(hip)
> filter1 <- (RA>50 & RA<100 & DE>0 & DE<25)
> filter2 <- (pmRA>90 & pmRA<130 & pmDE>-60 & pmDE< -10)
> filter <- filter1 & filter2 & (e_Plx<5)

The user-defined variable filter identifies the Hyades stars. What we are interested in is the colour of the stars. In the code below H contains the colours of the 92 Hyades, while nH gives the colours of the other stars (had I not added !is.na(color) to the definition of nH, there would have been some missing values (NA values) in it; keeping them does no direct harm, but I nevertheless decided to throw them out). The variable B.V is the colour of a star in the original data set.

> color <- B.V
> H <- color[filter]
> nH <- color[!filter & !is.na(color)]

So now we have two data sets, one with the colours of the Hyades and another with the colours of other stars. We are interested in studying whether the two groups of stars really differ from each other as far as their colour is concerned. Numerical summaries indeed suggest some difference.

> summary(H)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0490  0.3680  0.5600  0.6123  0.8410  1.3270

> summary(nH)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.1580  0.5662  0.7160  0.7668  0.9540  2.8000

We can be more formal here. Observations in each group can be thought of as coming from a certain distribution. Thus we can test whether the two distributions are the same.

Exercise 3. Carry out testing in the above setting using the two-sample Kolmogorov-Smirnov and the Mann-Whitney tests. What are your conclusions? The permutation test for the sample means can in principle also be used, but it is going to be slow.

8 General observations

Here I provide a little summary of the various tests I have introduced. One can classify them into two categories: those that make parametric assumptions (t-test, χ²-test) and those that do not (permutation test, Mann-Whitney test, Kolmogorov-Smirnov test).

Parametric tests, such as the t-test, should be applied in those cases where we have good reasons to believe that the parametric assumptions we are making are valid. In the case of the t-test we assume that our data have been generated from the normal distribution. We learned two graphical tools for checking the normality assumption: the histogram and the QQ-plot. But one has to be careful with them: histograms are sensitive to how the bin widths are chosen, and a QQ-plot depends on which quantiles we decide to plot. In this course we just trusted the default choices of R, but they do not always do the job, and in any case there is some subjectivity in drawing conclusions from graphs. More formally, we could have performed a normality test, of which there are many, and of which we considered just one, the Shapiro-Wilk test. These are good, but can be unreliable for small sample sizes. Inference with small sample sizes is difficult in any case.

The Wald test is used for testing hypotheses on a one-dimensional parameter θ. The estimator θ̂ of θ, on which the Wald test statistic is based, has to be asymptotically normal. Not all estimators are asymptotically normal. Also, the Wald test is based on a large-sample argument. It is conceptually simple in those cases where good, natural estimators of θ exist, but can be inaccurate for small sample sizes.

Permutation tests are most useful when sample sizes are small. They are exact in the sense that they do not rely on asymptotic approximations. When used properly (meaning when the test statistic is chosen properly), they are a very powerful tool.
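As a concrete reminder of what the Wald test computes, here is a minimal sketch on toy data (my own example): testing H0 : θ = 0 for θ the mean of the distribution, using the asymptotic normality of the sample mean.

```r
set.seed(1)
x <- rnorm(100, mean = 0.3)          # toy data with true mean 0.3
theta0 <- 0
theta.hat <- mean(x)                 # natural estimator of theta
se.hat <- sd(x) / sqrt(length(x))    # estimated standard error
W <- (theta.hat - theta0) / se.hat   # Wald statistic
p <- 2 * pnorm(-abs(W))              # two-sided p-value via the normal approximation
```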
The Mann-Whitney test, which is based on ranks, can be thought of as a particular instance of a permutation test, with the additional advantage that the distribution of the Mann-Whitney statistic U under the null hypothesis is tabulated (known) for small sample sizes, while for large samples accurate approximations based on asymptotic normality exist. This is a very good test: even when the data come from the normal distribution, the Mann-Whitney test is only slightly inferior to the t-test, while it beats the t-test when the data are not normal.
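The rank-based view can be made explicit with a small sketch (toy data, my own example): recompute the rank sum of the first sample over random relabellings of the pooled sample and compare the result with wilcox.test.

```r
set.seed(1)
x <- rnorm(15)
y <- rnorm(15, mean = 1)
z <- c(x, y)
r <- rank(z)                               # ranks of the pooled sample
w.obs <- sum(r[seq_along(x)])              # rank sum of the x-sample

B <- 5000
w.sim <- replicate(B, sum(sample(r, length(x))))
# Two-sided permutation p-value for the rank-sum statistic.
p.perm <- 2 * min(mean(w.sim <= w.obs), mean(w.sim >= w.obs))
# p.perm should come out close to wilcox.test(x, y)$p.value.
```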

The Kolmogorov-Smirnov test is very popular in astronomy. For it to be applicable, sample sizes must be large. In the two-sample case the test at times performs poorly when the means of the two distributions are the same. The χ²-test for grouped/binned data is typically inferior to the other options mentioned above.

There are a number of papers comparing different nonparametric tests. One that specifically has astronomical applications in mind is Hou et al. (2009): http://iopscience.iop.org/0004-637X/702/2/1199/.
