MULTIPLE HYPOTHESIS TESTING FOR FINITE AND INFINITE NUMBER OF HYPOTHESES

by

Zhongfa Zhang

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Jiayang Sun

Department of Statistics

CASE WESTERN RESERVE UNIVERSITY

August 2005

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

Zhongfa Zhang

candidate for the Doctor of Philosophy degree

Committee Chair: Dr. Jiayang Sun, Dissertation Advisor, Professor, Department of Statistics

Committee: Dr. Wojbor Woyczynski, Professor, Department of Statistics

Committee: Dr. Robert Elston, Professor, Department of Epidemiology & Biostatistics

Committee: Dr. Hemant Ishwaran, Adjunct Associate Professor, Department of Statistics; Staff, Department of Quantitative Health Sciences, Cleveland Clinic Foundation

August 2005

Table of Contents

Table of Contents
List of Tables
List of Figures
Acknowledgements
Abstract

1 Introduction
  1.1 Hypothesis Testing
    1.1.1 Single Hypothesis Testing
    1.1.2 Multiple Hypothesis Testing
    1.1.3 Test Equality of Curves
  1.2 Road Map of the Following Chapters

2 Multiple Hypothesis Testing— New FDR Controlling Procedures
  2.1 Introduction
  2.2 Relationship between FDR and FWER
  2.3 Literature Review
  2.4 A Few Theorems
  2.5 Our Proposed Procedure (PP)
  2.6 Comparison with Other Procedures
  2.7 Application to a Real Data Set

3 Test Equality of Curves
  3.1 An Environmental Study—Lead Project
  3.2 Model Setup
  3.3 Related Work and Outline
  3.4 Methods
    3.4.1 Homoscedastic Case
    3.4.2 Special Case When f_2(t) ≡ 0
    3.4.3 Heteroscedastic Case
  3.5 Simulations
  3.6 Test Results on Teeth Lead Set

4 Connections and Discussions
  4.1 Connections
  4.2 Discussions and Future Research

Appendices

A Proofs of Lemmas and Theorems in Chapter 2
  A.1 Proof of Lemma 2.4.2
  A.2 Proof of Theorem 2.4.6
    A.2.1 Key Lemma
    A.2.2 Other Lemmas
    A.2.3 Proof of Theorem 2.4.6

B Proof of Theorem in Chapter 3
  B.1 Lemmas
  B.2 Proof of Theorem 3.4.2

C Software ctest

Bibliography

List of Tables

1.1 Outcome of single hypothesis testing
1.2 Outcome of multiple hypothesis testing

2.1 Number of genes discovered by three FDR procedures

B.1 Comparison of simulated degrees of freedom ν = 4πm² (upper element, via simulation) with approximated degrees of freedom ν (lower element, by formula (3.4.19)) for different combinations of sample sizes n_1, n_2 and degrees of freedom ν_1, ν_2

List of Figures

2.1 Trellis plot to explore the functional relationship between FWER and FDR: simulated samples from N(µ, 1). Total hypotheses m = 1000, with number of true null hypotheses m_0 = 100, 400, 700, 950, 990, 1000 from left to right; µ = 0 for the null and µ = 0.06, 0.12, ..., 0.36 from bottom up for the alternative distributions.

2.2 Explanation of why the FDR produced by the BH procedure at level β (in the case of independent test statistics) is (m_0/m)β, which depends on m_0, m and β only, not on the realized p-values from the alternative. Solid straight line: BH critical line; thick blue curve: sorted p-values against indices.

2.3 Partition of the unit square such that the joint distribution of (P_1, P_2) constitutes a counterexample to Theorem 2.4.1 when the independence assumption is violated. β/2 = c_1 and β = c_2.

2.4 A joint distribution of (P_1, P_2) that constitutes a counterexample to Theorem 2.4.1.

2.5 The asymptotic quadratic relationship between FDR level β and variance when m and m_0 are large, based on Corollary 2.4.7. A realization for a fixed m and m_0.

2.6 Comparison of three FDR controlling procedures: 1. ST (Storey's), 2. PP (Proposed, Uncorrected), 3. BH procedure. 10000 repetitions were performed to average the FDR, FNR and power. Total number of tests is m = 1000. The generated signal sampled from N(µ, σ²) is relatively weak, with µ = 0.04, 0.04 + 1.18·1/(m_1 − 1), 0.04 + 1.18·2/(m_1 − 1), ..., 1.2.

2.7 Average FDP (left panel), FNP (middle panel) and POWER (right panel) (y-axis) by Storey's (line with mark 1) and Corrected PP (line with mark 2), with δ = 0.035 in formula (2.5.4). 10000 replications were used for the average. Number of total tests is m = 1000; m_0 (x-axis) is the number of true null hypotheses.

2.8 Comparison of the false discovery proportions of three procedures. Averaged FDRs produced by Storey's (ST, solid blue), Uncorrected PP (PP, dashed green) and BH's (BH, dotted brown) procedures are plotted together. "Confidence bands" (plus and minus one standard deviation) were added to the plot. 12000 replications were used for the average. Total test number is m = 1000; m_0 = number of true null hypotheses.

2.9 Index plot for the 7129 p-values computed through permutation and t-test. Two straight lines are added to the plot. One is y = x/m, which corresponds to the case when all genes are insignificant to the class differentiation. The other is y = (β/m)x, corresponding to the BH line at level β.

3.1 Plot of teeth lead concentrations. Red square: M1 group; blue circle: M2 group.

3.2 Plot of teeth lead concentrations with local smoothing curves superimposed for each group. Solid red line: M1 group; dotted blue line: M2 group.

3.3 Simulation result. Test: f_1(t) = f_2(t), t ∈ T = [0, 1]. Homoscedastic variances were assumed. 10000 repetitions were used.

3.4 Simulation result. Test H_0 : f(t) = 0. 10000 iterations were used. σ = 0.1, h = 0.1.

3.5 Simulation results. Test: f_1(t) = f_2(t) for t ∈ T = [0, 1]. 10000 repetitions were used. Heteroscedastic variances were used, with σ_1² = 0.02 and σ_2² = 0.03.

A.1 Illustration for case 3: m = 40, m_0 = 10. All p-values are the same except one, which comes from the alternative. This point is marked as M in the left panel and N in the right panel.

B.1 The true density of Y (solid black) and the density of χ²_ν/ν (dashed red), with ν computed by formula (3.4.19). The density curves of χ²_{ν_i}/ν_i are also added to the plot. n_1 = 800, n_2 = 1000, ν_1 = 120, ν_2 = 300.

B.2 Comparison of the degrees of freedom ν estimated by formula (3.4.19) (dotted green lines) with the degrees of freedom from simulated data with ν = 4πm² (solid red lines), for different combinations of ν_1 = 100, 200, ..., 800 (x-axis) and ν_2 = 100, 200, ..., 800 (from the bottom curve up). Here n_1 = 1000 and n_2 = 1500.

B.3 Tubes with 2 endpoints around a 1-dimensional manifold embedded in R².

ACKNOWLEDGEMENTS

First, I would like to thank my parents, who have sacrificed so much for their children, and my brother and sisters for their unselfish love. No matter what has happened and what will happen, they were and will always be there, ready to give whatever support they can offer at their utmost. My gratitude also goes to the numerous other people who have enlightened me since my primary and middle school years and who have sincerely cared about and helped me. I have been so lucky to have them in my life; without their help, I could not imagine what my life would be. I would also like to express my gratitude to Drs. Alexander, Elston, Ishwaran, Sedransk, Sun, Werner, Woyczynski and Wu for my education in the Mathematics/Statistics Departments and for their understanding. I thank Drs. Elston, Ishwaran and Woyczynski for serving on my thesis committee. Special thanks go to my thesis advisor, Jiayang Sun, who not only supported this research in part through her NSF awards, but also spent so much time and took so much effort in trying to make me a successful researcher during my graduate years here at CWRU. I thank her for her guidance, knowledge and patience. Finally, I thank Dr. Steve Ganocy for proofreading my entire thesis and for his support during this critical period.

Multiple Hypothesis Testing For Finite and Infinite Number of Hypotheses

Abstract

by

Zhongfa Zhang

Multiple hypothesis testing is one of the most active research areas in statistics. The number of hypotheses can be finite or infinite. For a multiple hypothesis testing problem, an overall error criterion must be properly defined and suitable test procedures must be developed. In this thesis, we investigate both the finite and the infinite hypothesis testing situations. Accordingly, the thesis is roughly divided into two parts.

The first part of this thesis focuses on finite hypothesis testing. We study the False Discovery Rate (FDR), proposed by Benjamini and Hochberg in 1995, as an error criterion for a multiple testing procedure. We first attempt to find a functional relationship between FDR and the more familiar family-wise error rate (FWER), in order to study the practical aspects of the two criteria and to obtain a controlling procedure for one from that of the other. A few new theoretical results about FDR are then presented, and based on these results, a new and "suboptimal" FDR controlling procedure is proposed. Comparisons are made between the performance of the proposed procedure and that of Benjamini and Hochberg's (1995) and Storey et al.'s (2003). The procedure is then applied to a microarray data set to illustrate its application in the area.

The second part of this thesis involves testing the equality of two curves. This type of testing problem occurs often in functional data analysis. In this part, we develop procedures for testing whether two curves measured with homoscedastic or heteroscedastic errors are equal. The method is applicable to a general class of curves that can be either specified up to some unknown parameters or only assumed to be smooth. The null distribution of the test is derived, and an approximation formula to estimate the p-value is developed, for the cases when the homoscedastic or heteroscedastic variances are either known or unknown. Simulations are conducted to show how our procedures perform in finite sample situations. Application to our motivating data example from an environmental study is illustrated.

The two areas are actually related. We will discuss their connections in the last chapter and propose questions for future research.

Chapter 1

Introduction

1.1 Hypothesis Testing

In statistical applications, one of the most important tasks is to model data, formulate a pair of null and alternative hypotheses, and then test whether the null hypothesis is true based on the data. Depending on the number of null hypotheses to be tested, we can classify hypothesis testing problems into one of three categories: single hypothesis testing, testing a finite number of hypotheses, and testing an infinite number of hypotheses. The latter two categories belong to multiple hypothesis testing. A brief description of each of the three categories follows.

1.1.1 Single Hypothesis Testing

Single hypothesis testing involves making a decision between one null hypothesis versus an , based on data. The usual process of single hypothesis testing consists of four steps.

Step 1. Formulate a null hypothesis (e.g. the observations are the result of pure chance) and an alternative hypothesis (e.g. the observations show a real effect combined with a component of chance variation).

Step 2. Identify a test statistic that can be used to assess the validity of the null hypothesis versus the alternative hypothesis.

Step 3. Compute the p-value, which is the probability that the test statistic (treated as a ) is as extreme as or more extreme than the observed value of the test statistic (treated as fixed, computed from data), under the

assumption that the null hypothesis H0 is true. That is,

    p-value = Pr_{H_0}(T ≥ t_0),

where T is the test statistic and t_0 = T(X) is the realized test statistic, computed from the data X. Small p-values suggest that the null hypothesis is unlikely to be true; the smaller the p-value, the more convincing the rejection of the null hypothesis.

Step 4. Compare the p-value to an acceptable significance value (sometimes called an α value). If the p-value is smaller than α, we often say that at level α, the null hypothesis is rejected and the alternative hypothesis is favored.
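As a minimal illustration of Steps 2 to 4, the sketch below computes a one-sided p-value for a one-sample z-test with known variance. The data, effect size, and significance level are invented for illustration only:

```python
import math
import random

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """One-sided p-value Pr_{H0}(T >= t0) for a one-sample z-test
    of H0: mu = mu0 against Ha: mu > mu0, with known sigma."""
    n = len(sample)
    t0 = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))  # realized test statistic
    # Pr(Z >= t0) for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(t0 / math.sqrt(2))

random.seed(0)
alpha = 0.05
null_data = [random.gauss(0.0, 1.0) for _ in range(50)]    # H0 true: pure chance
effect_data = [random.gauss(0.8, 1.0) for _ in range(50)]  # a real effect is present

print("p-value under H0:", z_test_p_value(null_data))
print("p-value under Ha:", z_test_p_value(effect_data))
```

With a strong effect, the second p-value falls far below α and H_0 is rejected at level α; under the null, the p-value behaves like a Uniform(0, 1) draw.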

There are two types of errors associated with a single hypothesis test. A type I error occurs when a true null hypothesis is rejected (called a false positive in some application areas), while a type II error occurs when a false null hypothesis is accepted (called a false negative). The possible outcomes of a single hypothesis test are shown in Table 1.1. The probability of making a type I error is called the significance level. Subtracting the probability of a type II error from 1 gives the statistical power:

    Power = Pr_{H_1}(H_0 is rejected) = 1 − Pr_{H_1}(type II error).

The maximum power a test can have is 1; the minimum is 0. Ideally we want a test to have high power, close to 1. Among the tests that have significance level at most α, the one with the largest power is called the most powerful test.

2 Table 1.1: Outcome of single hypothesis testing

                             Test Results
                 Accept H_0 (−)                   Reject H_0 (+)
  Situation
  H_0 is true    Correct decision                 Type I error (false positive)
  H_1 is true    Type II error (false negative)   Correct decision

The t and F tests are two of the most frequently used tests in single hypothesis testing. Statistical hypothesis testing has many applications. It can test whether the means, or the standard deviations, of two populations are equal. It can also test differences between two models, or differences between two density functions. Single hypothesis testing is the starting point of multiple hypothesis testing, which is about testing two or more hypotheses simultaneously. Historically, in most cases, multiple hypothesis testing has been about testing a finite number of hypotheses simultaneously; it has also been termed the multiple hypothesis testing problem, the multiple testing problem, or the multiple comparison problem (we will ignore the subtle differences between these terms). Testing equality of curves, on the other hand, which we will discuss later, is an example of multiple testing with an infinite number of (null) hypotheses.

1.1.2 Multiple Hypothesis Testing

Multiple hypothesis testing occurs naturally in practice. For example, suppose we wish to test the hypothesis H0 : β1 = β2 = 0, where β1 and β2 are two coefficients in a multiple regression model. In situations in which we only wish to test whether H0 is true or not, we can use an F test (e.g., ANOVA). Often, when H0 is rejected, we want to know further whether β1 or β2 or both are nonzero. In this situation, we have a multiple decision problem. We can test the hypotheses H0,1 : β1 = 0 and

H0,2 : β2 = 0 separately with t tests; this is called a pointwise test. We can also test

H0,1 : β1 = 0 and H0,2 : β2 = 0 simultaneously. We then have a simultaneous test, or multiple testing problem. The multiple testing problem differs from the pointwise testing problem (i.e., testing multiple hypotheses separately) in global error control. Testing multiple hypotheses separately, each at level α, does not necessarily control the overall type I error of H0 at level α. This is important, especially when the number of hypotheses is large. On the other hand, procedures that test the multiple hypotheses simultaneously are designed to guarantee that the overall errors are controlled. Examples can be found in brain image analysis, in which we have tens of thousands of hypotheses to be tested simultaneously. Suppose the number of hypotheses is m = 10^6. A pointwise test procedure at level α, say 0.05, will lead to a large number of false rejections (this number is about 5 × 10^4, still a very large number). Thus we need a procedure to control the overall error at a specified level.
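A quick simulation makes the point concrete. When all m null hypotheses are true, each p-value is Uniform(0, 1), so pointwise testing at level α produces roughly αm false rejections, while shrinking the per-test level to α/m (the Bonferroni idea discussed later in this section) all but eliminates them. The values of m and α below are chosen only for illustration:

```python
import random

random.seed(1)
m, alpha = 10_000, 0.05
# All m null hypotheses are true, so every p-value is Uniform(0, 1).
pvals = [random.random() for _ in range(m)]

pointwise = sum(p < alpha for p in pvals)      # each test at level alpha
adjusted = sum(p < alpha / m for p in pvals)   # each test at level alpha/m

print("pointwise false rejections:", pointwise)   # near alpha*m = 500
print("adjusted false rejections: ", adjusted)    # rarely above 0
```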

Testing the two hypotheses H0,1 and H0,2 when we are interested in whether β1 or β2 or both are different from zero induces a multiple decision problem in which the four possible decisions are:

• d_00 : H0,1 and H0,2 are both true,

• d_01 : H0,1 is true, H0,2 is false,

• d_10 : H0,1 is false, H0,2 is true,

• d_11 : H0,1 and H0,2 are both false.

More generally, suppose that the hypothesis H0 is true if and only if the separate hypotheses H0,1, H0,2, ... are all true. The induced test accepts H0 if and only if all the separate hypotheses are accepted. An induced test is termed finite or infinite depending on whether there is a finite or an infinite number of separate null hypotheses.

Typical examples of induced tests can be found in papers by Roy (1953), Roy and Bose (1953), Scheffé (1953) and Tukey (1953). Roy referred to induced tests as union-intersection tests. A lucid presentation of the union-intersection principle of test construction is given in Morrison (1976). A good reference for finite induced tests is Krishnaiah (1979). Miller (1977) presents a survey of induced tests and simultaneous confidence interval procedures. Two commonly used induced tests are the Bonferroni test and the Scheffé test. The Bonferroni test is a finite induced test whose critical value is computed using the well-known Bonferroni inequality stated below:

The Bonferroni Inequality: If A_1, ..., A_m are events (not necessarily independent), each of which has probability 1 − p of occurring, then the probability that they all occur is at least 1 − mp, i.e.:

    Pr(∩_{i=1}^m A_i) = 1 − Pr(∪_{i=1}^m A_i^c) ≥ 1 − Σ_{i=1}^m Pr(A_i^c) = 1 − mp.    (1.1.1)

QED. Testing m hypotheses at level α simultaneously can thus be done by testing each hypothesis at level α/m; this has been called the Bonferroni procedure (test). The Bonferroni test is often very conservative, which has motivated research on finding sharper inequalities than the Bonferroni inequality to bound Pr(∪_i A_i^c). However, the Bonferroni inequality has the advantage that it is very simple to apply. The Scheffé F test is an infinite induced test. It is set up to test whether any contrast

of means from different populations is nonzero: H_0 : Σ_{i=1}^m c_i µ_i = 0 versus H_a : Σ_{i=1}^m c_i µ_i ≠ 0, where Σ_{i=1}^m c_i = 0 and c = (c_1, ..., c_m) ≠ 0, so that c is a contrast, and the µ_i's are the (true) population means. The test statistic is

    S²(c) = |Σ_{i=1}^m c_i X̄_i|² / Var(Σ_{i=1}^m c_i X̄_i),

where Var(Σ_{i=1}^m c_i X̄_i) = S_w² Σ_{i=1}^m c_i²/n_i, and S_w² is the within variance of the whole data set. Under the normality assumption, for any z,

    Pr(max_{c ≠ 0} S²(c) ≥ z) = Pr(F ≥ z/(m − 1)),    (1.1.2)

where F ∼ F(m − 1, N − m) under the one-way ANOVA setting. So the critical value for testing H_0 for all c ≠ 0 based on S²(c) is:

    CV = (m − 1) F(m − 1, N − m, α).

We reject H_0 if the test statistic is greater than the critical value, and declare Σ_{i=1}^m c_i µ_i ≠ 0 for those contrasts (directions) c such that S²(c) ≥ CV. The Scheffé F test is used to find where the differences between means lie when the analysis of variance indicates the means are not all equal. It gives the freedom to test any and all comparisons that appear to be interesting. However, this great flexibility has a cost: the Scheffé test normally has very low power. If, in fact, we are only interested in a subset of c's instead of the whole (unrestrained) contrast space, we can hope to increase the power of the test. This was done by Sun (1993), Sun and Loader (1994), Faraway and Sun (1995), Sun (2001) and others. Another multiple testing procedure worth mentioning here is the Tukey test for comparison of K population means. Let n_1, ..., n_K be the sample sizes for samples from the K populations, with N = Σ_{j=1}^K n_j. The total variance of the samples is calculated from

    s² = Σ_{j=1}^K (n_j − 1) s_j² / (N − K),

where s_j² is the variance of the j-th sample. The test statistic W is calculated as:

    W = q* s / √n,

where q* is the Studentized range statistic, which is tabulated [see, for example, http://fsweb.berry.edu/academic/education/vbissonnette/tables/posthoc.pdf] for

parameters K, ν (= N − K) and level α, and

    n = K / (1/n_1 + 1/n_2 + ··· + 1/n_K).

If this limit W is exceeded by the absolute difference between any two sample means, then the corresponding population means are declared to differ significantly.
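The Tukey calculation above can be sketched with hypothetical summary data for K = 3 equal-sized groups. The group means and variances are invented, and q_star is a placeholder standing in for the tabulated entry q(K, ν, α), not a computed quantile:

```python
import math

# Hypothetical group summaries (means and sample variances); invented numbers.
means = {"A": 5.1, "B": 5.3, "C": 6.4}
variances = {"A": 0.9, "B": 1.1, "C": 1.0}
n_j = 10                      # equal group sizes, so the harmonic-mean n is also 10
K = len(means)
N = K * n_j
nu = N - K                    # degrees of freedom: nu = N - K = 27

# s^2 = sum_j (n_j - 1) s_j^2 / (N - K)
s2 = sum((n_j - 1) * v for v in variances.values()) / nu

q_star = 3.49                 # placeholder for the tabulated q(K = 3, nu = 27, alpha = 0.05)
W = q_star * math.sqrt(s2) / math.sqrt(n_j)

# Declare mu_i != mu_j whenever |mean_i - mean_j| exceeds W.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
significant = [(i, j) for i, j in pairs if abs(means[i] - means[j]) > W]
print(W, significant)   # W is about 1.10; only the A-C difference (1.3) exceeds it
```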

A multiple hypothesis test of m null hypotheses H0,i versus Ha,i for i = 1, 2,...,m can produce outcomes which are summarized in Table 1.2, where S,T,U,V are the

Table 1.2: Outcome of multiple hypothesis testing

                          Test Results
                 Accepted (−)   Rejected (+)   Total
  Situation
  Null true           U              V           m_0
  Alt. true           T              S           m_1
  Total             m − R            R            m

numbers of true discoveries (i.e., H0,i is rejected when H0,i is not true), false nondiscoveries (i.e., H0,i is accepted when H0,i is not true), true nondiscoveries (i.e., H0,i is accepted when H0,i is true) and false discoveries (i.e., H0,i is rejected when H0,i is true), respectively. m_0 and m_1 = m − m_0 are the numbers of hypotheses among the total m whose null and alternative are true, respectively. Several overall error criteria have been defined. Among them, the Family-Wise Error Rate (FWER) has been the most widely used. It is defined to be the probability of rejecting at least one true null hypothesis:

    FWER = Pr_{H_0}(V ≥ 1),    (1.1.3)

which was used to develop the Bonferroni and Scheffé tests. Both of them control FWER at some prespecified level α. It is well known that the Bonferroni and Scheffé procedures can be very conservative, especially when the number of hypotheses is large. The conservativeness of the

tests comes from two sources:

(1) The Bonferroni and Scheffé tests were developed based on conservative upper bounds for the FWER (see (1.1.1) and (1.1.2));

(2) FWER is a more stringent error rate than the FDR defined below, especially when the total number of hypotheses is large.

Developing sharper upper bounds that better approximate the FWER can overcome (1); see Hsu (1999) and references therein, Naiman (1987, 1990) and Sun (1993, 2001), for example. To overcome (2), Benjamini and Hochberg (1995) defined the False Discovery Rate (FDR) as a relaxed overall error criterion for multiple hypothesis testing procedures. FDR is defined to be the expected value of the False Discovery Proportion (FDP):

    FDR = E(FDP),    (1.1.4)

where

    FDP = (V/R) 1{V ≥ 1}.

Clearly, as an error criterion for multiple hypothesis testing, FDR is more liberal than FWER. It tries to control the ratio of false discoveries among the discoveries, rather than the probability of rejecting at least one true null hypothesis, as FWER was defined. Another error criterion related to FWER is the Per Comparison Error Rate (PCER):

    PCER = E(V/m) = E(V)/m.    (1.1.5)

It is easy to see now that the following relationship holds true:

    E(V/m) ≤ E((V/R) 1{V ≥ 1}) ≤ Pr(V ≥ 1) ≤ E(V),    (1.1.6)

so that we have:

    PCER ≤ FDR ≤ FWER.    (1.1.7)

In the case that all null hypotheses are true (m = m_0), we have:

FDR = FWER.

These relations are easy to prove. For example, by m ≥ R and V/R ≤ 1, we have the first two inequalities in (1.1.6). The third inequality in (1.1.6) is obvious. In the case of m = m_0, we have V ≡ R, and thus FDR = E((V/R) 1{V ≥ 1}) = E(1{V ≥ 1}) = Pr(V ≥ 1) = FWER.
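The chain (1.1.6)-(1.1.7) can also be checked empirically. The sketch below simulates m tests with m_0 true nulls (null p-values are Uniform(0, 1); the alternative p-values are drawn from an arbitrary stochastically small distribution chosen only for illustration) and estimates PCER, FDR and FWER for the naive rule "reject when p < α":

```python
import random

random.seed(2)
m, m0, alpha, reps = 100, 80, 0.05, 2000
sum_pcer = sum_fdp = sum_any_false = 0.0

for _ in range(reps):
    p_null = [random.random() for _ in range(m0)]          # U(0,1) under the null
    p_alt = [random.random() ** 4 for _ in range(m - m0)]  # stochastically small
    V = sum(p < alpha for p in p_null)                     # false discoveries
    R = V + sum(p < alpha for p in p_alt)                  # total discoveries
    sum_pcer += V / m
    sum_fdp += V / R if R >= 1 else 0.0                    # FDP = (V/R) 1{V >= 1}
    sum_any_false += V >= 1                                # indicator for FWER

PCER, FDR, FWER = sum_pcer / reps, sum_fdp / reps, sum_any_false / reps
print(PCER, FDR, FWER)   # ordered as PCER <= FDR <= FWER, matching (1.1.7)
```

Note that the ordering holds realization by realization (V/m ≤ (V/R)1{V ≥ 1} ≤ 1{V ≥ 1}), so the averaged estimates inherit it exactly.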

Now suppose we have two test procedures S and T such that S controls FDR at α, and T controls FWER at α, i.e., we have:

    FWER(T) = α,
    FDR(S) = α.

Then by (1.1.6),

    FDR(S) ≤ FWER(S),

hence we have

    FWER(T) ≤ FWER(S).

Therefore, it is natural to expect that this increase in the overall type I error of S relative to T can lead to increased power of S over T.

Whether to use FDR or FWER depends on the application. We shall discuss this further in Chapter 4 (§4.2).

Although Benjamini and Hochberg's (1995) pioneering paper sparked a surge of research on FDR controlling procedures, practical FDR controlling procedures are

available only for some special cases. Here are a few examples. Benjamini and Hochberg's 1995 procedure (BH procedure hereafter; see Chapter 2, Section 2.3) was designed for independent continuous test statistics. Later, Benjamini and Yekutieli (2001) showed that the BH procedure also works for a special class of dependent test statistics, with a property they termed Positive Regression Dependence on a Subset (PRDS). We will give the definition of PRDS in Chapter 2. For the more general dependent case, they require changing α to a quantity equivalent to α/log(m) in the BH procedure to control the FDR at level α, where m is the total number of hypotheses tested. The latter case of using α/log(m) is often as conservative as the Bonferroni test. All of these are essentially step-up procedures, since the algorithm compares the sorted p-values P_(i) with critical values c_i, i = 1, 2, ..., m, in such a way that the maximum index i satisfying P_(i) ≤ c_i is set equal to the total number of rejected hypotheses. Benjamini and Liu (1999) and Somerville (1999) considered step-down procedures. Sarkar (2002) gave a step-down and a step-up procedure. Storey (2003) considered FDR in a Bayesian setting, proposed the Positive False Discovery Rate (pFDR) as an overall error rate and argued that it is a more reasonable error rate than FDR. Storey (2002) and Storey et al. (2004) considered a unified approach to FDR control. Genovese and Wasserman built a mathematical framework for FDR. Other work on FDR can be found in papers by Abramovich et al. (2000), Donoho and Jin (2004), Pacifico et al. (2004) and Sabattia et al. (2003), among others.
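For concreteness, here is a small sketch of the BH procedure with critical values c_i = (i/m)β: sort the p-values, find the largest index i with P_(i) ≤ (i/m)β, and reject the hypotheses with the i smallest p-values. The p-values below are invented:

```python
def bh_reject(pvals, beta):
    """Benjamini-Hochberg procedure at FDR level beta.
    Returns the indices of the rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * beta / m:             # compare P_(i) with (i/m)*beta
            k = rank                                  # remember the largest such index
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.500, 0.990]
print(bh_reject(pvals, beta=0.05))   # rejects the two smallest p-values: [0, 1]
```

Note that later p-values can "rescue" earlier ones: the search keeps the largest qualifying index even if some smaller sorted p-values miss their own thresholds.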

Since the research on the FWER controlled procedures has a long history, we may wonder if we can find a functional relationship between FWER and FDR so that we can build an FDR controlling procedure from an FWER controlling procedure. We will discuss this in Chapter 2.

All of the error criteria mentioned above are analogues of the type I error. Similarly, we can define an error rate analogous to the type II error. Such a criterion is the False Nondiscovery Rate (FNR), defined by Genovese and Wasserman (2002) as the expected value of the False Nondiscovery Proportion. Referring to Table 1.2,

    FNP := (T/(m − R)) 1{T > 0}

and thus,

    FNR := E(FNP) = E((T/(m − R)) 1{T > 0}).    (1.1.8)

Most new FDR studies intend to build test procedures that control the FDR as close to the target FDR level as possible while achieving better power. This involves designing test procedures and then proving that the FDR is indeed controlled at some level by the procedure. Since FDR is the expected value of the FDP, we also want to study the variances of the FDP under different test procedures so that we may compare their efficiencies. If two test procedures have similar FDR control but different variances, we will prefer the test procedure with the smaller variance and call this procedure "suboptimal". We shall study this in Chapter 2.

1.1.3 Test Equality of Curves

Sometimes we need to test an infinite number of hypotheses simultaneously. We saw an example of this in the previous section when we introduced the Scheffé test procedure, since we are actually testing an infinite number of hypotheses of the form Σ_{i=1}^m c_i µ_i = 0, where m ≥ 2 and Σ_{i=1}^m c_i = 0, simultaneously; we have one hypothesis test for each c = (c_1, c_2, ..., c_m). Another example is to test whether two continuous smooth curves or functions, measured with errors, are statistically equal. In this case, we have one hypothesis for each point in a continuous domain, which is equivalent to a multiple testing problem with an infinite number of hypotheses. Here are a few practical examples.

11 Time Course Gene Expression Levels: Gene expression levels are obtained from a chromosome sample at different time points. The expression levels for each gene as a function of time can be treated as a continuous function or curve. We want to test if those curves differ significantly.

Regression : Two time series f (t) and f (t) for t [a, b] need to be 1 2 ∈ tested if they are equal: H : f (t)= f (t), t [a, b], versus H : f (t) = f (t), 0 1 2 ∀ ∈ a 1 6 2 for at least one t [a, b]. ∈ Probability Densities: Two samples are obtained from 1-dimensional continuous distributions. We want to know if the two samples follow the same (continuous) density function, i.e, to test H : p (x)= p (x), versus H : p (x) = p (x), where 0 1 2 a 1 6 2 pi(x),i = 1, 2 are two density functions.

Testing equality of curves is also part of the larger research area of Functional Data Analysis (FDA). The following is quoted from one of the FDA web pages (http://ego.psych.mcgill.ca/misc/fda):

FDA is about the analysis of information on curves or functions. For example, these twenty traces of the writing of ”fda” are curves ......

...... In fact, most of the questions and problems associated with the usual multivariate data analyzed by statistical packages like SAS and SPSS have their functional counterparts.

But what is unique about functional data is the possibility of also using information on the rates of change or derivatives of the curves. We use slopes, curvatures, and other characteristics made available because these curves are intrinsically smooth, and we can use this information in many useful ways.

Studies of functional data analysis can be traced back to Parzen (1961). Besse and Ramsay (1986) considered the functional principal component analysis of functions resulting from polynomial interpolation of observed values. Ramsay and Dalzell (1991) coined the term Functional Data Analysis (FDA) to distinguish it from ordinary data analysis. Since then, many researchers have studied various aspects of the field. For example, Leurgans et al. (1993) considered canonical correlation analysis when the data are curves. James and Hastie (2001) discussed functional linear discriminant analysis for irregularly sampled curves. Unfortunately, there are still many open questions in research on testing equality of curves, due to the difficulty of calculating the tail probability. Fan and Lin (1998) gave a procedure to test the significance of two curves by applying a Fourier or wavelet transformation to the curves and then using the first few coefficients, as an approximation of the curves, to test the hypothesis. We call this an indirect method, since the test is based on transformed functions (curves) rather than on the functions (curves) themselves. Sun (2001) gave a testing procedure based on a regression model when the variances are homoscedastic. A question we want to ask is: can we give a direct method to test the hypothesis by estimating the involved probability directly when the variances are heteroscedastic? We will discuss this in Chapter 3 and give testing procedures that estimate the related probabilities through the Tube formula.

1.2 Road Map of the Following Chapters

In Chapter 2, we give new theoretical results about FDR; based on these results, we then propose our own FDR controlling procedures. In Chapter 3, we start by describing an environmental project that motivated our research on testing equality of curves. We then develop new procedures to test the equality of curves in both the homoscedastic and heteroscedastic cases, with known or unknown standard deviations. Simulation results are presented, and the procedure is then applied to the motivating lead concentration data. In Chapter 4, we discuss the connections between the two areas, give a unified definition of FDR and FWER for discrete and continuous domains, and list some unsolved problems.

Chapter 2

Multiple Hypothesis Testing— New FDR Controlling Procedures

2.1 Introduction

One of the challenging problems in analyzing large data sets is simultaneously testing a large number of hypotheses. Powerful multiple testing procedures are useful in many applications, such as finding disease genes or detecting differential gene expression from microarray data [Lemon et al. (2003); Sabattia et al. (2003)] and locating significant source activity in EEG data [e.g., Isotani et al. (2001)] or fMRI data [e.g., Zheng et al. (2001)].

Suppose we have m null hypotheses H1, ..., Hm. Among them, an unknown but fixed number m0 of the hypotheses are true, and the remaining m1 = m − m0 are not. Suppose further that the test statistics are T1, T2, ..., Tm, and their p-values are

P1, P2, ..., Pm respectively. Our goal is to estimate m0 and find out which null hypotheses are not true (discoveries) by conducting or developing a suitable testing procedure. If we reject a null hypothesis which is true, a type I error (false discovery) occurs; if we accept a null hypothesis which is not true, a type II error (false nondiscovery) occurs. Recall Table 1.2 for the possible outcomes of performing

a multiple testing procedure. Several error criteria have been defined. Among them are the Family-Wise Error Rate (FWER) and the False Discovery Rate (FDR) [Benjamini and Hochberg (1995)], defined in (1.1.3) and (1.1.4) in Chapter 1. Both FWER and FDR try to control the type I error rate. A dual concept to FDR is the False Non-discovery Rate (FNR), defined in (1.1.8) in the previous chapter as the expected proportion of falsely accepted hypotheses among all accepted hypotheses.

Remark: Throughout this chapter, we sometimes say a p-value is from the null to mean that the p-value (taken as a random variable, see the definition below) comes from testing a hypothesis whose null H0 is true, and we say a p-value is from the alternative to mean that the p-value (taken as a random variable) comes from testing a hypothesis whose alternative H1 is true. Sometimes we even write alternative p-value as shorthand for a p-value from the alternative. The explanation is as follows.

Let T be a test statistic for testing a hypothesis H0 versus Ha, and let t0 be the realized value of the test statistic. The p-value from testing the hypothesis is calculated as

p = Pr_{H0}(T ≥ t0) = g(t0).

If t0 is replaced by a random variable, say S, then g(S) is a random variable too. If S and T follow the same distribution, as when the null is true, then P = g(S) ∼ U(0, 1). If S has a different distribution, as when the alternative hypothesis is true, then P = g(S) will not have a uniform distribution. In this setup, each p-value from testing one pair of hypotheses is viewed as a random variable that comes either from the uniform distribution U = U[0, 1] or from an unknown but common alternative distribution F. As a result, in the Bayesian context, a generic p-value follows a mixture of the null distribution U = U(0, 1) and the

16 alternative distribution F . Its CDF can thus be expressed in the following formula:

Pr(P ≤ t) = π0 t + (1 − π0) F(t),

for t ∈ (0, 1), where π0 is the proportion of true null hypotheses among the total number of hypotheses.
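This mixture view can be illustrated with a small simulation. The sizes m0 and m1, the sample size n, and the shift µ below are assumptions chosen for illustration, and a one-sided z-test stands in for the t-test:

```python
import math
import random

random.seed(0)

# Illustrative assumptions (not from the text): m0 true nulls, m1 alternatives.
m0, m1, n, mu = 800, 200, 25, 0.8

def p_value(xbar: float, n: int) -> float:
    """One-sided p-value Pr(Z >= sqrt(n)*xbar) for H0: mu = 0, sigma = 1."""
    z = math.sqrt(n) * xbar
    return 0.5 * math.erfc(z / math.sqrt(2))

def sample_mean(mean: float, n: int) -> float:
    return sum(random.gauss(mean, 1.0) for _ in range(n)) / n

p_null = [p_value(sample_mean(0.0, n), n) for _ in range(m0)]
p_alt = [p_value(sample_mean(mu, n), n) for _ in range(m1)]

# Null p-values behave like U(0,1); alternative p-values pile up near 0, so
# the pooled sample follows the mixture pi0 * t + (1 - pi0) * F(t).
print(round(sum(p_null) / m0, 2), round(sum(p_alt) / m1, 4))
```

The null average sits near 0.5 (the uniform mean), while the alternative average is pushed toward 0, which is exactly the separation the mixture formula encodes.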

2.2 Relationship between FDR and FWER

As shown in (1.1.7) in Chapter 1, the general relationship between FWER and FDR is:

FDR ≤ FWER, (2.2.1)

where equality holds when m = m0. A natural question is: can we find a functional relationship between FDR and FWER, namely, a function g such that

FDR = g(FWER, f, m0)? (2.2.2)

Here f is the probability density function of the alternative p-values, assuming all alternative p-values come from a common distribution. Given a computable formula (2.2.2), we could borrow from our expertise in building FWER controlling procedures. Consider generic multiple tests of the form: reject all those Hi's such that pi ≤ c, where c is a critical value that can be fixed or estimated. In this chapter, we shall only consider the simplest case. Suppose for i = 1, 2, ..., m, our data are independently and normally distributed with mean µi and variance

σi² = 1. We test for each i the null hypothesis H0,i: µi = 0 against Ha,i: µi > 0. Let Ti be the t-test statistic and Pi = Pr(Ti ≥ ti_obs) the corresponding p-value, for i = 1, 2, ..., m. Suppose further that the null hypotheses which are known to be true are indexed from 1 to m0, and the remaining m1 = m − m0 hypotheses come from the alternative, i.e., their alternative hypotheses are true. The test statistics from the alternative

are calculated from samples drawn from a common distribution N(µ, 1) with equal means µi = µ ≥ 0. For any critical value c ∈ ℝ in the sequence of sorted T-values, the test procedure

"reject any null hypothesis with T-value greater than c" will have

FWER(c) = P_{H0}( V(c) ≥ 1 ) = P( max_{i=1,...,m0} Ti ≥ c )
= 1 − [P(t_{n−1} < c)]^{m0}, which we set equal to α,

so that c satisfies P(t_{n−1} < c) = (1 − α)^{1/m0}; the corresponding p-value cutoff is p0 = P(t_{n−1} ≥ c). The same c gives

FDR(c) = E( (V/(V + S)) 1{V ≥ 1} ),

where V = V(c) = Σ_{i=1}^{m0} 1{Pi ≤ p0} ∼ Binomial(m0, p0) and S = S(c) = Σ_{i=m0+1}^{m} 1{Pi ≤ p0} ∼ Binomial(m1, p1) are independent, with p0 defined above and p1 = Pr(Ti ≥ c) = P(t_{n−1} ≥ c − √n µ) for i ∈ {m0 + 1, ..., m}.

Treating c as a bridging parameter between FWER and FDR, we are able to produce the trellis plot of Figure 2.1 through simulation. Each p-value is calculated from a sample of i.i.d. observations from either N(0, 1) (null distribution) or N(µ, 1) (alternative distribution). The total number of hypotheses m is fixed at 1000, while the number of true nulls is m0 = 100, 400, 700, 950, 990, 1000, respectively, from left to right. The alternative distribution N(µ, 1) is allowed to change via µ, from 0.06 at the bottom up to 0.36 in the first row, in steps of 0.06.
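The bridging computation above can be sketched numerically. The sketch below substitutes a plain normal shift delta for the noncentral t, and the sizes are assumptions for illustration:

```python
import math
import random

random.seed(1)

def sf(z: float) -> float:
    """Upper-tail probability Pr(Z >= z) of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Illustrative assumptions: m0 of m statistics are N(0,1), the rest N(delta,1).
m, m0, delta = 1000, 950, 2.0
m1 = m - m0

def fwer(c: float) -> float:
    # FWER(c) = P(max_{i<=m0} T_i >= c) = 1 - Phi(c)^m0 for independent nulls.
    return 1.0 - (1.0 - sf(c)) ** m0

def fdr(c: float, reps: int = 300) -> float:
    # Monte Carlo for FDR(c) = E[ V/(V+S) 1{V>=1} ], with independent
    # V ~ Binomial(m0, p0) and S ~ Binomial(m1, p1).
    p0, p1 = sf(c), sf(c - delta)
    total = 0.0
    for _ in range(reps):
        v = sum(random.random() < p0 for _ in range(m0))
        s = sum(random.random() < p1 for _ in range(m1))
        if v >= 1:
            total += v / (v + s)
    return total / reps

curve = {c: (fwer(c), fdr(c)) for c in (2.5, 3.0, 3.5)}
for c, (fw, fd) in sorted(curve.items()):
    print(c, round(fw, 3), round(fd, 3))
```

Sweeping c traces out exactly the FWER-FDR curve that the trellis plot explores, with FDR staying below FWER as relation (2.2.1) requires.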

From Figure 2.1, we can see that whenever m0 = m (panels in the rightmost column), FWER is indeed equal to FDR, as expected from relationship (2.2.1). Therefore,

Figure 2.1: Trellis plot to explore the functional relationship between FWER and FDR: simulated samples from the normal distribution N(µ, 1). Total hypotheses m = 1000, with the number of true null hypotheses m0 = 100, 400, 700, 950, 990, 1000 from left to right; µ = 0 for the null and µ = 0.06, 0.12, ..., 0.36 from the bottom up for the alternative distributions.

it is possible to get an FDR controlling procedure via a corresponding FWER controlling procedure when m0 is close to m. This is the case when at most a few discoveries are expected (sparse signals). In this situation, suppose we want to control FDR at level β; we can estimate the necessary parameters to obtain the relationship between FDR and FWER, and then translate our FDR controlling problem at level β into an FWER controlling problem at level α by locating the corresponding α on the FWER-FDR curve. At the same time, as m0/m decreases (e.g., to 950/1000 in the fourth column of Figure 2.1), the FDR decreases quickly to 0 throughout the whole range [0, 1] of FWER. This directly illustrates the point that FWER procedures give a more conservative type I error control than FDR procedures, and it makes it unlikely that a critical value for FDR control can be found from the corresponding critical value for a certain FWER control (the line FDR = β will not intersect the

FWER-FDR curve when m0/m is well below 1), unless m0 and m are close.

2.3 Literature Review

Benjamini and Hochberg (1995)’s proposed FDR controlling procedure (BHFDR) is:

Reject all Hi with pi ≤ p(k), where p(1), ..., p(m) are the sorted p-values of p1, ..., pm and

k = max{ j : p(j) ≤ (j/m)β, 0 ≤ j ≤ m },

with p(0) = 0 if needed.

This is a simple two-step step-down procedure. The first step is to order the p-values from small to large: p(1), ..., p(m). The next is to compare the sorted p-values with the BH critical values (j/m)β, j = 1, 2, ..., m, from the upper side down to the lower side (step-down). The process stops the first time we encounter the event p(j) ≤ (j/m)β. The index at this point is denoted by
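The two steps can be written out directly (a minimal sketch; the toy p-values are made up):

```python
from typing import List

def bh_reject(pvals: List[float], beta: float) -> List[int]:
    """BH procedure at level beta: return indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # step 1: sort p-values
    k = 0
    for j in range(1, m + 1):                          # step 2: compare p_(j) with (j/m)*beta
        if pvals[order[j - 1]] <= j * beta / m:
            k = j                                      # keep the largest such j
    return order[:k]                                   # reject all with p <= p_(k)

# Toy p-values, invented for illustration.
pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.9]
print(bh_reject(pv, 0.05))  # -> [0, 1]
```

Here k = 2, since p(2) = 0.008 ≤ (2/8)(0.05) = 0.0125 but p(3) = 0.039 exceeds (3/8)(0.05).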

k. Any hypothesis with a p-value less than or equal to p(k) is declared significant (a discovery), and its null hypothesis is rejected. They showed that, in the independent test statistic case, this procedure controls FDR at the pre-fixed level β:

FDR ≤ (m0/m) β ≤ β.

Later, it was shown that the bound holds for positive regression dependent (PRDS) p-values [Benjamini and Yekutieli (2001)]. Equality holds in the continuous and independent test statistic case [Storey et al. (2004)], shown via a martingale argument. In the following section, we will show that a much stronger conclusion actually holds under the same assumptions for the BH procedure (Section 2.4, Lemma 2.4.2). We will then propose our own test procedures based on our theorems (Section 2.5). Storey et al. (2004) used a different approach to control FDR at a pre-specified level β. Their procedure first estimates the realized FDR for any fixed rejection region, and then finds the rejection region that brings the FDR as close to the pre-specified level β as possible by an optimization procedure. This procedure suffers from being computationally intensive: it requires both bootstrap and optimization computations. Since FDR is the expected value of the (realized) false discovery proportion, the variance of this proportion (as a random variable) can be studied. This is done in Section 2.4 (Theorem 2.4.6 and Corollary 2.4.7) and in Section 2.6, where simulations compare the variances of the false discovery proportions produced by the procedures of BH, Storey, and ours (Figure 2.8). This study is important because variance indicates the efficiency of a test procedure: two procedures with the same FDR control can differ greatly in efficiency, and the one producing a smaller variance is preferred.
In the last section (Section 2.7), we apply the procedure to a real (microarray) data set to determine the genes

that significantly differentiate the samples for some (genetic cancer) classes. A related but different work is Owen (2004), who discussed the variance of the number of false discoveries, i.e., Var(V), where V is the number of false discoveries defined in Table 1.2. This is not what we consider here, which is Var(V/R); computing Var(V/R) is much more complicated than computing Var(V). Since FDR was proposed by Benjamini and Hochberg (1995), a great deal of research has been done in this area. Here is a review of the literature. Apart from the work mentioned above by Benjamini and Yekutieli (2001), Storey (2002) and Storey et al. (2004), Storey (2003) proposed the positive FDR (pFDR) in a Bayesian framework and argued that pFDR is a more appropriate and meaningful error rate to control in the multiple testing setup. Genovese and Wasserman (2002) offered some theoretical results, and Genovese and Wasserman (2004) treated the false discovery proportion as a stochastic process to study the behavior of FDR. Benjamini and Yekutieli (2001) and Sarkar (2002) proved that BHFDR is still controlled at the desired level when the test statistics are dependent with certain special dependence structures (e.g., PRDS). Donoho and Jin (2004) considered the application of FDR in a non-Gaussian setting, in hope of learning whether the good asymptotic properties of FDR thresholding as an estimation tool hold more broadly than at the standard Gaussian model. Abramovich et al. (2000) considered the problem of recovering a high-dimensional vector observed in white noise, where the vector is known to be sparse but the degree of sparsity is unknown. Pacifico et al. (2004) extended the FDR definition from a discrete (counting) measure to general measures, so that FDR can be defined and studied when the test statistics are defined over a continuous domain such as an interval.

2.4 A Few Theorems

Consider random p-value g(S), as in Section 2.1.

Let P1, ..., Pm0 be the p-values (as random variables) from the null, and let Q1, Q2, ..., Qm1 be the p-values from the alternative.

Theorem 2.4.1. If the true null test statistics are independent, then after applying the BH procedure at level β, we have:

FDR = E( (V/R) 1{R ≥ 1} ) ≤ (m0/m) β (2.4.1)

for any configuration of false and true null hypotheses. If the test statistics are also continuous, equality holds in (2.4.1).

This theorem was proved partially (the one-sided upper inequality) by Benjamini and Hochberg (1995) and then by Storey et al. (2004) under the same assumptions using a martingale argument. However, we will provide our own different proof in Appendix A. It is based on the following lemma, which has its own significance, because it actually leads to a stronger result than what is claimed in Theorem 2.4.1.

Lemma 2.4.2. Under the conditions of Theorem 2.4.1, after applying the BH proce- dure at level β, we have

E( (V/R) 1{R ≥ 1} | Q1 = q1, ..., Qm1 = qm1 ) ≤ (m0/m) β (2.4.2)

for any realized values (q1, q2, ..., qm1) of the random variables (Q1, Q2, ..., Qm1), where the Qi's are the p-values from the alternative distribution. If the test statistics are also continuous, equality holds in (2.4.2).

This lemma establishes that the actual BHFDR does not depend on the (realized) values of statistics from the alternative at all. The proof for this lemma is given in Appendix A. By taking expectations on both sides of equation (2.4.2), we get equation (2.4.1) in Theorem 2.4.1.

The following is a simple heuristic proof of Lemma 2.4.2.


Figure 2.2: Explanation of why the FDR produced by the BH procedure at level β (in the case of independent test statistics) is (m0/m)β, which depends on m0, m and β only, not on the realized p-values from the alternative. Solid straight line: BH critical line; thick blue curve: sorted p-values against indices.

Suppose we have a sequence of p-values for testing the null hypotheses at FDR level β by the BH procedure. The procedure first locates the index

R = max{ j : P(j) ≤ (j/m)β, j = 0, 1, ..., m }, (2.4.3)

which is just the total number of rejections. Scanning from the largest index down, it is the first j at which p(j) falls below the corresponding BH critical value (j/m)β, plotted as a straight line in the index plot

(Figure 2.2). Every hypothesis with p-value at or below P(R) is rejected; R is then the "cutoff" value in the index domain. Let C be the "cutoff" random variable in the p-value domain, i.e., the point on the line y = (j/m)β at j = R. By the similarity of triangles ORS and OAB, we have

C/R = β/m, (2.4.4)

for all R > 0. Therefore, even though both C and R are random variables, their quotient C/R remains constant. Notice that P(R) ≤ C always holds by definition (2.4.3) of R, and that there are no p-values between P(R) and C. The BH procedure will have the number of false discoveries

V = V(C) = Σ_{i=1}^{m0} 1{Pi ≤ C}.

In the case of independent continuous test statistics,

E(V/C) = m0, (2.4.5)

for all C between 0 and 1 (because the Pi's are independently and uniformly distributed between 0 and 1). Without the continuity argument, it is easy to see that E(V/C) ≤ m0. Therefore,

BHFDR(β) = E( (V/R) 1{V ≥ 1} ) = E( ((V 1{V ≥ 1})/C) / (R/C) )
= E( ((V 1{V ≥ 1})/C) / (m/β) ) (by (2.4.4))
= (β/m) E(V/C) ≤ (m0/m) β, (by (2.4.5))

where the equality holds if the test statistics are continuous. QED
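The heuristic argument, and the claim of Lemma 2.4.2 that the realized alternative p-values do not matter, can be checked by Monte Carlo. The sizes and the two fixed alternative values below are arbitrary illustrative choices:

```python
import random

random.seed(2)

def bh_fdp(pvals, m0, beta):
    """Run BH at level beta and return the realized false discovery proportion.
    By convention, the first m0 entries of pvals are the true-null p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for j in range(1, m + 1):
        if pvals[order[j - 1]] <= j * beta / m:
            k = j
    if k == 0:
        return 0.0
    v = sum(i < m0 for i in order[:k])   # rejected true nulls
    return v / k

m, m0, beta, reps = 50, 30, 0.2, 10000
avg_fdp = {}
for q_alt in (0.0001, 0.01):             # two very different fixed alternative p-values
    avg_fdp[q_alt] = sum(
        bh_fdp([random.random() for _ in range(m0)] + [q_alt] * (m - m0), m0, beta)
        for _ in range(reps)
    ) / reps
print({q: round(a, 3) for q, a in avg_fdp.items()})
```

Both averages come out near (m0/m)β = 0.12, regardless of where the alternative p-values sit, in line with the lemma.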

Remarks:

1 This is a much simpler proof than the existing proofs of Theorem 2.4.1 and serves as a heuristic proof of Lemma 2.4.2. We give a detailed proof of Lemma 2.4.2 in Appendix A.

2 The only step preventing us from extending the above proof to the general dependent test statistic case is equation (2.4.5), because there C and V have a more complicated correlation structure than under the independence condition, and the expectation of their quotient may not always equal m0. However, once condition (2.4.5) is established, the FDR of the BH procedure is always controlled at the pre-specified level, no matter how the underlying test statistics are correlated with each other.

Prompted by the above reasoning, we can give a simplified proof of FDR control in the case when the test statistics only satisfy the PRDS property.

Definition 2.4.3. (Increasing Set) A set D is called increasing if x ∈ D and y ≥ x imply that y ∈ D as well.

Definition 2.4.4. (PRDS) {Xi, i ∈ I} is said to satisfy the property of Positive Regression Dependence on each one from a Subset I0 ⊂ I, or PRDS on I0 [cf. Benjamini and Yekutieli (2001)], if for any increasing set D and for each i ∈ I0, P(X ∈ D | Xi = x) is nondecreasing in x.

The PRDS property in Definition 2.4.4 is a relaxed form of the positive regression dependency (PRD) property proposed by Sarkar (1969). The latter means that for any increasing set D, P(X ∈ D | X1 = x1, ..., Xi = xi) is nondecreasing in x1, ..., xi. In PRDS, the conditioning is on one variable at a time, and the property is required to hold only for a subset of the variables.

Theorem 2.4.5. If the test statistics have the PRDS property on subset I0 where I0 is the set of indices whose corresponding null hypothesis is true, then

BHFDR(β) ≤ (m0/m) β.

This theorem was proved by Benjamini and Yekutieli (2001) using a novel but lengthy argument. Here we give a short and easily understood proof.

Proof: Let cj = (j/m)β be the BH critical values. Let I0 = {1, 2, ..., m0} be the index set of hypotheses whose null is true, so that P1, P2, ..., Pm0 are the p-values from the null. For C = (R/m)β, the same heuristic argument gives the following:

BHFDR(β) = E( (β/m)(V/C) ) = (β/m) Σ_{i=1}^{m0} E( 1{Pi < C}/C )
= (β/m) Σ_{i=1}^{m0} Σ_{r=1}^{m} Pr(R = r | Pi < c_r) Pr(Pi < c_r)/c_r
≤ (β/m) Σ_{i=1}^{m0} Σ_{r=1}^{m} Pr(R = r | Pi < c_r)
= (β/m) Σ_{i=1}^{m0} [ 1 + Σ_{r=1}^{m−1} ( Pr(R ≤ r | Pi < c_r) − Pr(R ≤ r | Pi < c_{r+1}) ) ]
= (m0/m)β + (β/m) Σ_{i=1}^{m0} Σ_{r=1}^{m−1} [ Pr(R ≤ r | Pi < c_r) − Pr(R ≤ r | Pi < c_{r+1}) ], (2.4.6)

where the inequality uses Pr(Pi < c_r) ≤ c_r for the null p-values (an equality when the null p-values are continuous and uniform), and the telescoping step uses Pr(R ≤ m | Pi < c_m) = 1 and Pr(R = 0 | Pi < c_1) = 0. Then we need only show that

Σ_{r=1}^{m−1} [ Pr(R ≤ r | Pi < c_r) − Pr(R ≤ r | Pi < c_{r+1}) ] ≤ 0 (2.4.7)

for all i = 1, 2, ..., m0. Note that

{R ≤ r} ≡ { P(r+1) ≥ c_{r+1}, ..., P(m) ≥ c_m } (2.4.8)

is an increasing set. By the PRDS property, Pr(R ≤ r | Pi = x) is nondecreasing in x, and hence Pr(R ≤ r | Pi < c) is nondecreasing in c. Since c_r < c_{r+1}, each summand in (2.4.7) is nonpositive, which proves the claim. QED

A Counter Example. Now it is relatively easy to design a counterexample with BHFDR(β) > β in the dependent test statistic case, by finding a joint distribution of P1, P2, ..., Pm such that

Σ_{i=1}^{m0} Σ_{r=1}^{m−1} [ Pr(R ≤ r | Pi < c_r) − Pr(R ≤ r | Pi < c_{r+1}) ] > 0 (2.4.9)

by (2.4.6).

We can take m = m0 = 2 so that we have two tests in which both nulls are true.

The requirement (2.4.9) on the joint distribution of (P1, P2) becomes

Σ_{i=1}^{2} [ Pr(R ≤ 1 | Pi < c1) − Pr(R ≤ 1 | Pi < c2) ] > 0. (2.4.10)

Let β ≤ 1/2 be chosen, so that c1 = β/2 and c2 = β. The left-hand side of (2.4.10) for

i = 1 can be reduced to:

Pr(R ≤ 1 | P1 < c1) − Pr(R ≤ 1 | P1 < c2)
= Pr(P(2) > c2 | P1 < c1) − Pr(P(2) > c2 | P1 < c2), (2.4.11)

since with m = 2 the event {R ≤ 1} is exactly {P(2) > c2}; similarly for i = 2:

Pr(R ≤ 1 | P2 < c1) − Pr(R ≤ 1 | P2 < c2)
= Pr(P(2) > c2 | P2 < c1) − Pr(P(2) > c2 | P2 < c2). (2.4.12)

Note that if P1 and P2 are independent and continuous, both expressions (2.4.11) and (2.4.12) equal 0, so that BHFDR(β) = β = (m0/m)β in this case, as expected from Theorem 2.4.1.

Let

E1 = (c1, c2) × (c1, c2), E2 = (0, c1) × (c2, 1),
E3 = (c2, 1) × (0, c1), E4 = (c2, 1) × (c2, 1),

and let the sets E5 through E9 be as shown in Figure 2.3. Denote Pr((P1, P2) ∈ E) by Pr(E) for short. Condition (2.4.11) becomes Pr(E1 ∪ E7) − Pr(E5 ∪ E6), and condition (2.4.12)

30 1

E2 E9 E4 2 P β

E6 E1 E8 β 2

E5 E7 E3 0 0 β 2 β 1 P1

Figure 2.3: Partition of unit square such that the joint distribution of (P1, P2) con- stitutes a counter example to Theorem 2.4.1 when the independence assumption is violated. β/2= c1 and β = c2.

Figure 2.4: A joint distribution of (P1, P2) that constitutes a counterexample to Theorem 2.4.1: mass β/2 on each of E1, E2, E3, mass 1 − (3/2)β on E4, and 0 elsewhere.

becomes Pr(E1 ∪ E6) − Pr(E5 ∪ E7). Thus the left-hand side of (2.4.10) becomes 2 Pr(E1) − 2 Pr(E5). The maximum value of Pr((P1, P2) ∈ E1) is β/2, since Pr((P1, P2) ∈ E1) ≤ Pr((P1, P2) ∈ E1 ∪ E7 ∪ E9) = Pr(P1 ∈ (c1, c2)) = β/2. The minimum value of Pr((P1, P2) ∈ E5) is 0. Thus the maximum value that expression (2.4.11) (and likewise (2.4.12)) can take is 1/2, attained when the joint distribution of (P1, P2) is the following:

Pr((P1, P2) ∈ E) = β/2 if E = Ei, i = 1, 2, 3; 1 − (3/2)β if E = E4; and 0 otherwise.

See Figure 2.4, where the number in each cell E is the joint probability Pr((P1, P2) ∈ E). In this case, the left-hand side of (2.4.10) takes its maximum value 1. Plugging this into expression (2.4.6), we have BHFDR(β) = (3/2)β > β = (m0/m)β.

The above definition of the joint distribution of (P1, P2) only specifies the probability of each cell. The joint pdf f(p1, p2) is required to be continuous over the domain E = (0, 1) × (0, 1). This can be achieved by smoothing the two-dimensional step function with sigmoidal functions; then BHFDR(β) = (3/2)β − ǫ > β for small ǫ's. QED
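The counterexample can be verified numerically. Since the FDR here depends only on the cell probabilities, it suffices to sample uniformly within each cell rather than from the smoothed density:

```python
import random

random.seed(3)

beta = 0.2
c1, c2 = beta / 2, beta   # BH critical values for m = 2

def draw_pair():
    """Sample (P1, P2) from the counterexample distribution: mass beta/2 on
    each of E1 = (c1,c2)x(c1,c2), E2 = (0,c1)x(c2,1), E3 = (c2,1)x(0,c1),
    and the rest on E4 = (c2,1)x(c2,1), uniform within each cell."""
    u = random.random()
    if u < beta / 2:
        return (random.uniform(c1, c2), random.uniform(c1, c2))   # E1
    if u < beta:
        return (random.uniform(0, c1), random.uniform(c2, 1))     # E2
    if u < 1.5 * beta:
        return (random.uniform(c2, 1), random.uniform(0, c1))     # E3
    return (random.uniform(c2, 1), random.uniform(c2, 1))         # E4

def bh_rejections(p1, p2):
    """Number of BH rejections for two hypotheses at level beta."""
    lo, hi = min(p1, p2), max(p1, p2)
    if hi <= c2:
        return 2        # p_(2) <= (2/2)beta
    if lo <= c1:
        return 1        # p_(1) <= (1/2)beta
    return 0

reps = 100000
# Both nulls are true here, so FDP = 1{R >= 1} and FDR = Pr(R >= 1).
fdr = sum(bh_rejections(*draw_pair()) >= 1 for _ in range(reps)) / reps
print(round(fdr, 3))
```

With β = 0.2 the estimate lands near (3/2)β = 0.3, confirming that the nominal level β is exceeded under this dependence.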

By Lemma 2.4.2, the expected false discovery proportion of the BH procedure (BHFDR) does not depend on any of the realized test statistics corresponding to the alternative hypotheses (in the independent test statistic situation). This prompts us to ask whether the variance of the proportion is also independent of the realized values of the test statistics from the alternative hypotheses. Unfortunately, this is no longer the case. However, we can offer upper and lower bounds on the variance, both sharp in the sense that there are cases attaining the upper and the lower bound respectively.

Theorem 2.4.6. If the true null test statistics are independent and continuous, then there exists an integer M such that for all m ≥ M, after applying the BH procedure at level β, we have

( m0 / [m(m − m0 + 1)] ) β(1 − β) ≤ Var( (V/R) 1{R ≥ 1} ) ≤ (m0/m) β(1 − β). (2.4.13)

The proof of this theorem is quite involved; we give it in Appendix A. A key observation in proving it is: as m becomes large, the variance is determined monotonically by the stochastic order of the p-values from the alternative distribution, as shown in a key lemma in Appendix A. The two bounds correspond to two extreme cases: the upper bound is the variance when the alternative p-values are all uniformly distributed (Least Favorable Configuration, or LFC), while the lower bound is attained when the alternative p-values are all fixed at 0 (Most Favorable Configuration, or MFC).

Combining the above theorem with large sample theory, we have

Corollary 2.4.7. If each test statistic for testing H0,i: mean = 0 against H1,i: mean ≠ 0 is consistent as the common sample size n → ∞, and these test statistics are independent and continuous, then after applying the BH procedure at level β, we have:

Var( (V/R) 1{V ≥ 1} ) = ( m0 / [m(m − m0 + 1)] ) β(1 − β) + o(1/n) (2.4.14)

as long as the total number of tests m ≥ M for some integer M.

Proof. As each test statistic is consistent, we have Pi → 0 in distribution for all i ∉ I0. This then falls into the case of the MFC. Invoking Theorem 2.4.6, we obtain approximation (2.4.14). QED

Implications. From Figure 2.5, we see that the variance of the FDP is a quadratic function of the FDR level β for fixed m, m0 and a fixed distribution of the alternative statistics. The variance increases as the FDR level β increases, for β ≤ 0.5. Therefore, if a modified BH procedure is to make the FDR closer to the target value β, as


Figure 2.5: The asymptotic quadratic relationship between FDR level β and variance when m and m0 are large, based on Corollary 2.4.7; a realization of variances for a fixed m and m0.

various two-step modified BH procedures do, then the price to be paid is a larger variance of the FDP in the new procedure than in the BH procedure. One example illustrating this point: to control FDR at level β, instead of running the BH procedure at level β, we run it at level (m/m0)β. This produces a less conservative FDR controlling procedure, but one with a larger variance than the BH procedure, as we will see from the simulation results in Figure 2.8 in Section 2.6. Hence, we define a "suboptimal" FDR procedure at level β to be the one with the smallest variance Var(FDP) among a set of procedures such that

FDR = E(FDP) ≤ β. (2.4.15)

2.5 Our Proposed Procedure (PP)

The BHFDR can still be conservative, in that the realized FDR is far below the target level β when m0 ≪ m. This can be seen from the conclusion of Theorem 2.4.1 and from the relationship between FWER and FDR (Figure 2.1). Several researchers have tried to

correct this conservativeness. One example is Storey et al. (2004). Their FDR is indeed less conservative than BHFDR, and their procedure will be used as a benchmark for comparing our new FDR procedure. Storey's procedure is a two-step procedure. It first calculates the FDR for any rejection region using a bootstrap sampling method; then it finds the rejection region producing an FDR level as close to the target FDR level as possible by minimizing an MSE. This procedure eliminates much of the conservativeness but, unfortunately, is computationally intensive, since it involves both bootstrap sampling and an optimization process. It suffers from the additional drawback that it says nothing about efficiency, i.e., the variance of the FDP. With the same goal in mind, we propose our own FDR controlling procedure based on Theorem 2.4.1. Our idea is a procedure in two steps.

Step 1 Estimate m0 properly. Let m̂0 be the estimate.

Step 2 Run the BH procedure at level α = (m/m̂0)β, where β is the target FDR level.

Hopefully, by Theorem 2.4.1, the modified BH procedure will produce an FDR closer to the target level β than the BH procedure run at the original level β does.

Let P = (P1, P2, ..., Pm) be the vector of p-value statistics, among which the first m0 are from the null and the remaining m1 = m − m0 are from the alternative. For any λ between 0 and 1, let

A = Aλ(P) = Σ_{i=1}^{m} 1{Pi ≥ λ}

denote the number of p-values that are greater than or equal to λ. Since all the p-values from the alternative are expected to be close to 0, we would expect that all these Aλ(P) p-values come from the null. Since the p-values from the null are i.i.d.

uniformly distributed, we can expect that about m0(1 − λ) p-values will be greater than or equal to λ. Therefore, we have m0(1 − λ) ≈ Aλ(P), and m0 can be estimated by

m̂0(λ) = Aλ(P) / (1 − λ). (2.5.1)

However, since the alternative p-values are not exactly 0, this estimate of m0 is conservative, in that E(m̂0(λ)) ≥ m0 for all λ ∈ (0, 1). More precisely, let U and F be the CDFs of the p-values from the null and the alternative respectively. Then

E m̂0(λ) = E( Aλ(P)/(1 − λ) ) = E( Σ_{i=1}^{m} 1{Pi ≥ λ} )/(1 − λ) = Σ_{i=1}^{m} P(Pi ≥ λ)/(1 − λ)
= [ m0(1 − λ) + m1(1 − F(λ)) ] / (1 − λ)
= m0 + m1 (1 − F(λ))/(1 − λ) (2.5.2)
=: m0 + θ,

where θ = m1 (1 − F(λ))/(1 − λ) denotes the bias of the estimate of m0. It is easy to see that θ ≥ 0, because 0 ≤ F(λ) ≤ 1 for all λ ∈ (0, 1).

The key difference between our procedure and Storey et al.'s is the following. We let λ take on a sequence of values between 0 and 1. Each λ in the sequence gives an estimate of m0. If we take their average as our estimate of m0, we can expect this estimate to have a smaller variance than the estimate from any single λ. This idea is similar to the variance reduction method in statistical computing of adding an antithetic variate to the original variate. By using a proper linear

36 combination of two estimates, we can achieve a smaller variance than we can by using one of the two variates individually.

The estimate of m0 obtained above is generally biased, unless the p-values from the alternative are all smaller than the smallest λ value. The bias is

E m̂0 − m0 = E( (1/k) Σ_{i=1}^{k} ( m̂0(λi) − m0 ) ) = (m1/k) Σ_{i=1}^{k} (1 − F(λi))/(1 − λi) (2.5.3)

by equation (2.5.2), where k is the number of λ's used in the average estimate of m0. From equation (2.5.3), when k is large, large sample theory may apply.

Then the bias of this averaged estimate of m0 is more likely due to the nature of the alternative distribution F itself, rather than due to the randomness of the realized p-values. Because of this observation, we can put a correction term in the estimate, as explained below.

Two ad hoc techniques can be used to reduce the bias of the estimate of m0. One is to choose λ away from both endpoints 0 and 1: if λ is too close to 0, the bias will be large, while if λ is too close to 1, the variance of the estimate will be large. The other is to discard all estimates that are more than 2 standard deviations away from the mean estimate; such large deviations are more likely the result of random fluctuation than good estimates of m0, so they should be excluded from the averaging process.

Once we have the estimate of m0, the next natural step is to apply the BH procedure at level α = (m/m̂0)β, where β is the target level at which FDR should be controlled. It is possible (but very unlikely) that the quantity (m/m̂0)β exceeds 1; in that case, we let α = 1.

Our first FDR controlling procedure is:

Step 1. Specify a sequence of λ, say, 0.02, 0.03, ..., 0.98. For each such λ, compute the estimate m̂0(λ) based on equation (2.5.1).

Step 2. Compute the mean of this sequence of estimates to get the final estimate m̂0 of m0, after excluding those estimates that are more than 2 standard deviations away from the mean.

Step 3. Apply the BH FDR controlling procedure at level α = min(1, (m/m̂0)β) and reject all those hypotheses declared significant by this BH procedure.
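The three steps can be sketched in code as follows; the synthetic mix of null and signal p-values at the end is an assumption for demonstration only:

```python
import random
import statistics

random.seed(4)

def estimate_m0(pvals, lambdas):
    """Steps 1-2: average the lambda-wise estimates A_lambda/(1 - lambda),
    after discarding estimates more than 2 sd away from their mean."""
    ests = [sum(p >= lam for p in pvals) / (1 - lam) for lam in lambdas]
    mu, sd = statistics.mean(ests), statistics.stdev(ests)
    kept = [e for e in ests if abs(e - mu) <= 2 * sd]
    return statistics.mean(kept) if kept else mu

def bh(pvals, level):
    """BH step-up procedure; returns indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for j in range(1, m + 1):
        if pvals[order[j - 1]] <= j * level / m:
            k = j
    return order[:k]

def proposed_procedure(pvals, beta):
    """Step 3: BH at the adjusted level alpha = min(1, (m/m0_hat) * beta)."""
    lambdas = [i / 100 for i in range(2, 99)]    # 0.02, 0.03, ..., 0.98
    m0_hat = estimate_m0(pvals, lambdas)
    alpha = min(1.0, len(pvals) / max(m0_hat, 1.0) * beta)
    return bh(pvals, alpha), m0_hat

# Synthetic example: 700 null (uniform) + 300 strong-signal p-values.
pv = [random.random() for _ in range(700)] + [random.random() * 1e-4 for _ in range(300)]
rejections, m0_hat = proposed_procedure(pv, 0.05)
print(round(m0_hat), len(rejections))
```

On this synthetic mix, m̂0 lands near the true 700 and the adjusted level α ≈ (1000/700)(0.05) recovers all 300 strong signals.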

Theorem 2.5.1. The above procedure controls FDR at the pre-specified level β asymptotically as m → ∞, when the test statistics are independent and continuous and lim_{m→∞} m0/m = π0 ∈ (0, 1).

Proof: Let BHFDR(β) denote the FDR after applying the BH procedure at level β. By Theorem 2.4.1, BHFDR(β) = (m0/m)β for all β ∈ (0, 1). Now the FDR produced by the proposed procedure is

E(BHFDR(α)) = E( BHFDR( min(1, (m/m̂0)β) ) ) = E( (m0/m) min(1, (m/m̂0)β) )
≤ E( (m0/m)(m/m̂0)β ) = β E( m0/m̂0 ) = β E( m0/(m0 + θ) ),

where θ is the bias of the estimate m̂0 of m0 and β is the pre-specified FDR level.

Notice that when lim_{m→∞} m0/m exists and m → ∞, then m0 → ∞. Therefore, by Taylor expansion, we have

β E( m0/(m0 + θ) ) = β E( 1 − θ/m0 + o(θ/m0) ) → β − (β/m0) Eθ ≤ β,

where θ is the bias defined in (2.5.3) with Eθ ≥ 0. We are done. QED


Figure 2.6: Comparison of three FDR controlling procedures: 1. ST (Storey's), 2. PP (Proposed, Uncorrected), 3. BH. 10000 repetitions were performed to average the FDR, FNR, and power (one panel each, plotted against m0). The total number of tests is m = 1000. The generated signals, sampled from N(µ, σ²), are relatively weak, with means µ = 0.04, 0.04 + 1.18·1/(m1 − 1), 0.04 + 1.18·2/(m1 − 1), ..., 1.2.

Simulation shows (Figure 2.6) that the FDR obtained in this way is comparable to Storey et al.'s FDR, and thus is less conservative than BHFDR. However, it can still be conservative, and the bias increases as m0/m decreases.

Suppose m̂0 = m0 + θ, where θ is the bias of the estimate of m0. Using Taylor's expansion to expand m/m0 around m/m̂0 (assuming θ ≪ m̂0), we have

m/m0 = m/(m̂0 − θ) = (m/m̂0) · 1/(1 − θ/m̂0) = (m/m̂0)( 1 + θ/m̂0 + o(θ/m̂0) ).

Therefore, ignoring the nonlinear terms in the expansion, we have

(m/m0) β ≈ (m/m̂0) β (1 + θ/m̂0).

Simulation and equation (2.5.3) show that the bias of the resulting FDR from applying the BH procedure at level α = (m/m̂0)β (which comes from the bias of the estimate of m0) is affected by two factors. One is m0/m: as m0/m → 1, or equivalently, as m1/m → 0, the bias goes to 0; we have already observed this in Figures 2.1 and 2.6. The other factor is the strength of the signals (the closer the alternative p-values are to 0, the stronger the signal). This also shows in the term (1 − F(λi))/(1 − λi) in equation (2.5.3), which is exactly the ratio of signal (1 − F(λ)) to noise (1 − λ = 1 − F_U(λ), where F_U is the CDF of a uniform random variable). If the signal is strong (F(λ) ≈ 1, or p-value ≈ 0, for most λ ∈ (0, 1)), then the bias will be small. This can also be seen in Theorem 2.4.6 and the simulated FDR-FWER relationship plot in Figure 2.1, in which the variance of the false discovery proportion depends not only on the constants m, m0 and the FDR level β, but also on the realized values from the alternative: it is large in the case of weak signals and small otherwise. To take advantage of this information, we introduce a correction term ǫ for α =

(m/mˆ 0)β. By rule of thumb and by the argument above, we can use:

m α′ = (1+ ǫ)α = (1+ ǫ) β, mˆ 0

40 instead of α =(m/mˆ 0)β, where the Correction Coefficient is defined to be:

m mˆ ǫ = − 0 δ, (2.5.4) mˆ 0 where δ is a constant, chosen by hand, that depends on the strength of signals only (by looking at the index plot of p-values). The numerator m mˆ of the correction coefficient is an effort to include factor 1 − 0 into our correction consideration. As m 0, this term will go to 0, as is expected in 1 → the plot (Figure 2.6). We can also introduce another term to include another factor say, ρ, which is the average of the firstm ˆ = m mˆ smallest p-values. But this 1 − 0 does not help the final output FDR. δ is a constant which can be chosen by some optimization algorithm (e.g., Storey et al’s procedure 2002.) or by looking at the index plot of the sorted realized p-values. The closer the curve of the index plot of sorted p-values is to the line y = x/m, the weaker the signal will be and the larger this constant should be. Generally δ will be between 0 and 0.1. The exact way of selecting this constant in order to obtain the best FDR control requires further investigation. With properly selected δ, we can hope to further reduce the bias of FDR on the lower side of m0/m.

Our proposed bias-corrected FDR controlling procedure is:

Step 1. Specify a sequence of $\lambda$ values, say $0.02, 0.03, \dots, 0.98$. For each such $\lambda$, compute the estimate $\hat m_0(\lambda)$ based on equation (2.5.1).

Step 2. Compute the mean of this sequence of estimates to get the final estimate $\hat m_0$ of $m_0$, after excluding those estimates that are more than 2 standard deviations away from the mean.

Step 3. Based on the index plot and our prior belief, choose a suitable $\delta$ and apply the BH procedure to the sequence of p-values at level $\alpha' = \min(1, \alpha(1+\epsilon))$, where $\epsilon$ is computed by equation (2.5.4).

Step 4. Reject all those hypotheses that are rejected by the above BH procedure.

Thereafter, we will call the procedure the bias-uncorrected PP (UPP) if $\alpha$ is used, and the bias-corrected PP (CPP) if $\alpha' = \alpha(1+\epsilon)$ is used.
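The four steps can be sketched in code as follows. This is a sketch, not the dissertation's implementation: the form $\hat m_0(\lambda) = \#\{p_i > \lambda\}/(1-\lambda)$ is assumed for the estimator in equation (2.5.1), and the default $\delta = 0.035$ is the value used in the comparison of Section 2.6.

```python
import numpy as np

def estimate_m0(pvals, lambdas=np.arange(0.02, 0.99, 0.01)):
    """Steps 1-2: lambda-wise estimates m0_hat(lambda) = #{p > lambda}/(1 - lambda)
    (assumed form of equation (2.5.1)), averaged after excluding estimates more
    than 2 standard deviations from the mean."""
    m = len(pvals)
    ests = np.array([(pvals > lam).sum() / (1.0 - lam) for lam in lambdas])
    keep = np.abs(ests - ests.mean()) <= 2 * ests.std()
    return min(float(m), ests[keep].mean())

def corrected_pp(pvals, beta=0.02, delta=0.035):
    """Steps 3-4: BH procedure at level alpha' = min(1, alpha*(1 + eps)),
    with eps = delta*(m - m0_hat)/m0_hat as in equation (2.5.4)."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    m0_hat = estimate_m0(pvals)
    alpha = beta * m / m0_hat
    eps = delta * (m - m0_hat) / m0_hat
    alpha_prime = min(1.0, alpha * (1 + eps))
    order = np.argsort(pvals)
    below = pvals[order] <= alpha_prime * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0  # BH step-up cutoff
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Setting $\delta = 0$ recovers the uncorrected procedure (UPP).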

2.6 Comparison with Other Procedures

To compare the performances of the proposed (PP), BH, and Storey et al.'s FDR controlling procedures, we performed the following simulation study. Assume that each of our statistics comes from testing a normally distributed sample with mean $\mu$ and variance 1, i.e., $N(\mu, 1)$. The null hypothesis is $H_0: \mu = 0$ versus the alternative $H_1: \mu \ne 0$. Assume that we have a total of $m = 1000$ tests. Among them, $m_0 = 100, 300, \dots, 900$ are true null hypotheses, respectively, whose p-values are generated from a uniform distribution. The remaining $m_1 = m - m_0$ p-values are obtained as the p-values of t statistics generated from a noncentral t-distribution, with the noncentrality parameter chosen so that each statistic has the same distribution as the t statistic from testing an i.i.d. sample of size $n = 400$ from the $N(\mu, 1)$ distribution. The three test procedures are applied to the generated p-values at level $\beta = 0.02$, and the observed false discovery proportion, false non-discovery proportion, and power are computed. The above process is repeated 10,000 times to compute the average FDP, FNP, and power. The results are summarized in Figure 2.6. If, instead, we add the correction term to $\alpha$ with $\delta = 0.035$ in (2.5.4), we obtain the plots shown in Figure 2.7. From Figures 2.6 and 2.7 we see that, roughly speaking, all three procedures control the FDR at level $\beta = 0.02$. Among them the BH procedure is the most conservative; next is our uncorrected proposed procedure. After correction, the overall bias is reduced. Storey's procedure performs slightly better than our uncorrected procedure; however, if we choose the right $\delta$, our corrected procedure can perform a little better. The differences in FNR and power are negligible.
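The data-generating scheme of this study can be sketched as follows; the evenly spaced grid of alternative means and the use of two-sided p-values are our reading of the description above, not code from the dissertation.

```python
import numpy as np
from scipy import stats

def simulate_pvalues(m=1000, m0=500, n=400, rng=None):
    """One replication: m0 uniform null p-values plus m1 = m - m0 p-values of
    t statistics drawn under shifted alternatives (a sketch of the setup above)."""
    rng = rng or np.random.default_rng()
    m1 = m - m0
    mu = np.linspace(0.04, 1.2, m1)          # assumed grid of alternative means
    null_p = rng.uniform(size=m0)
    # A t statistic computed from an i.i.d. N(mu, 1) sample of size n follows a
    # noncentral t distribution with df = n - 1 and noncentrality sqrt(n)*mu.
    t_stats = stats.nct.rvs(df=n - 1, nc=np.sqrt(n) * mu, random_state=rng)
    alt_p = 2 * stats.t.sf(np.abs(t_stats), df=n - 1)   # two-sided null p-values
    labels = np.r_[np.zeros(m0), np.ones(m1)]           # 1 marks an alternative
    return np.concatenate([null_p, alt_p]), labels
```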

Figure 2.7: Average FDP (left panel), FNP (middle panel), and power (right panel) (y-axis) by Storey's procedure (line with mark 1) and the corrected PP (line with mark 2), with $\delta = 0.035$ in formula (2.5.4). 10,000 replications were used for averaging. The total number of tests is $m = 1000$; $m_0$ (x-axis) is the number of true null hypotheses.

Figure 2.8 shows the difference in variances of the FDPs produced by the three procedures. It is seen that the variance of the FDP produced by the proposed uncorrected procedure (dashed green) is uniformly smaller than that of Storey's procedure (solid blue). Part of the reason is that our estimate of $m_0$ is an average of a sequence of estimates of $m_0$, and thus has a smaller variance than any single estimate of $m_0$.

2.7 Application to a Real Data Set

We applied our procedure to the leukemia bone marrow training data set presented by Golub et al. (1999), published in Science. There are $m = 7129$ genes and $n = 38$ samples. Among these 38 samples, 27 come from ALL (acute lymphoblastic leukemia) while the remaining 11 come from AML (acute myeloid leukemia). The original paper constructed a classification rule based on the expression training data and then used the rule to predict the class of a future sample. Their classification rule was based on a number of preselected genes that are most significant for the class differentiation. The following is the application of our method to find these genes.

Several steps are involved in this procedure. First, we calculate the statistic $T_i$ for each gene $i$ by
\[
T_i = \frac{\bar X_{i,1} - \bar X_{i,0}}{\sqrt{S_{i,1}^2/n_1 + S_{i,0}^2/n_0}},
\]
where $\bar X_{i,j}$, $S_{i,j}$, and $n_j$ are the mean, standard deviation, and sample size for gene $i$ in class $j$, respectively, $j = 0, 1$. The next step is to find the null distribution of each $T_i$ by randomly permuting the labels of the samples repeatedly, $B$ times, and calculating the statistic $T_i^b$ in the same manner for $b = 1, 2, \dots, B$. The p-value $p_i$ for gene $i$ is computed by
\[
p_i = \frac{1}{B}\sum_{b=1}^{B} 1\{|T_i^b| \ge |T_i|\}.
\]
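The two steps just described can be sketched as follows (our vectorized layout and variable names; the dissertation's own code is in S-Plus):

```python
import numpy as np

def permutation_pvalues(X, labels, B=450, rng=None):
    """Per-gene statistic T_i (a Welch-type two-sample t) and permutation
    p-values p_i = (1/B) * #{b : |T_i^b| >= |T_i|}, as described above.
    X has one row per gene, one column per sample; labels holds 0/1 classes."""
    rng = rng or np.random.default_rng()

    def tstats(lab):
        g1, g0 = X[:, lab == 1], X[:, lab == 0]
        num = g1.mean(axis=1) - g0.mean(axis=1)
        den = np.sqrt(g1.var(axis=1, ddof=1) / g1.shape[1]
                      + g0.var(axis=1, ddof=1) / g0.shape[1])
        return num / den

    T = np.abs(tstats(labels))
    exceed = np.zeros(X.shape[0])
    for _ in range(B):
        exceed += np.abs(tstats(rng.permutation(labels))) >= T
    return exceed / B
```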

Figure 2.8: Variance comparison of the false discovery proportions of the three procedures. The average FDRs produced by Storey's (ST, solid blue), the uncorrected PP (PP, dashed green), and the BH (BH, dotted brown) procedures are plotted together, with "confidence bands" (plus and minus one standard deviation) added to the plot. 12,000 replications were used for averaging. The total number of tests is $m = 1000$; $m_0$ is the number of true null hypotheses.

Figure 2.9: Index plot of the 7129 p-values computed through permutation t-tests. Two straight lines are added to the plot: $y = x/m$, which corresponds to the case when all genes are insignificant for the class differentiation, and $y = (\beta/m)x$, the BH line at level $\beta$.

These p-values are then fed into our algorithm, as well as into the BH and Storey's procedures, to find the most significant genes. We made a total of $B = 450$ permutations; the sorted p-values (y-axis) are plotted against their index (x-axis) in Figure 2.9. We see that the signal is moderately strong. Applying the three procedures at FDR level $\beta = 0.02$, we find that 637 genes are rejected by our proposed procedure and by Storey's procedure, while direct application of the BH procedure discovers only 476 genes (fewer discoveries). When the comparison is run at different levels $\beta$, the total numbers of rejected hypotheses from the three procedures are summarized in Table 2.1. The numbers rejected by Storey's procedure and by our procedure happened to coincide (due to the small number of permutations), and were always larger than the number obtained by directly applying the BH procedure.

Table 2.1: Number of genes discovered by the three FDR procedures

      beta     ST     PP     BH
      0.01    476    476    476
      0.02    637    637    476
      0.03    767    767    637
      0.04    862    862    637
      0.05   1024   1024    767

Chapter 3

Test Equality of Curves

3.1 An Environmental Study—Lead Project

Studies of the effects of lead exposure on children's health have demonstrated a strong adverse relationship. These effects include impaired mental and physical development, decreased heme biosynthesis, an elevated hearing threshold, and decreased serum levels of vitamin D. Several major sources of lead exposure have been identified [ATSDR (1988)]. Among them are leaded paint (used in homes, which peels off and is then ingested by children), leaded gasoline (combusted and then released into the environment), food, drinking water, and industrial wastes and products. High lead concentration in children can cause severe health problems, sometimes even death. However, it is difficult to collect data on lead concentration in children's blood over a long period of time (e.g., the past 70 years). Fortunately, there is a high correlation between the lead concentration in children's blood and that in their teeth, and once a tooth has finished its development, the lead concentration in it remains unchanged throughout one's life. By measuring the lead concentration in teeth of patients of different ages, we can trace the lead concentration in a child's blood back to the year when the corresponding tooth was formed.

Over the past 15 years, scientists from the Department of Neuroscience, the School of Dentistry at CWRU, and the Department of Chemistry at Northern Arizona University have been collecting data on lead concentration and lead isotopic ratios (i.e., Pb207/Pb206 and Pb208/Pb206) in teeth and in a Lake Erie core (as a surrogate for atmospheric lead exposure) to study the key sources of lead exposure to children in the Cleveland area and other related issues [ref: Robbins et al. (2005)]. These issues include, among others:

(1) to fill in the gap in our knowledge about childhood uptake of lead from leaded gasoline before and during its peak usage in the 1960s and 1970s, and

(2) to reconstruct a more complete history of lead uptake in a single population.

This retrospective analysis was begun not only for historical interest, but also to determine which of a few possible scenarios best accounts for the most prominent source(s) of childhood lead uptake over the years 1936-1993. These years include the periods when tetraethyl lead in gasoline was being phased in, peaked, and was being phased out. Each patient who participated in the study had either his/her first molar (molar 1) or second molar (molar 2) extracted, and then the lead concentration as well as the lead isotopic ratios of that tooth were measured. The time of half-maximal enamel formation is at about age 2 for molar 1 and at about age 6 for molar 2. In the following, we will call a patient in group M1 if his/her first molar was used, and in group M2 otherwise. The scatter plot of lead concentrations for groups M1 and M2 is displayed in Figure 3.1. From the plot, we see that points from the two groups are well mixed with each other. Lead concentrations for both groups seemed to go up during the phase-in period (1936-1960), reach their peaks around 1960, and then go down. In Figure 3.2, we add to the scatter plot the smoothing curves fitted by a local regression smoothing method for both groups. It is of interest to ask whether there are any differences between the curves of teeth lead concentrations corresponding to the two patient groups. A positive answer will prompt us to ask what caused the differences, as well as whether one of the two teeth is more suitable for the lead concentration study than the other. A negative answer will clear the way to eliminate the group variable in the later analysis. There seems to be a difference between the two curves in Figure 3.2 between 1940 and 1960, but not in the rest of the time period in the study. Is the difference statistically significant, or is it just a random phenomenon? We need a procedure to answer these questions.

The test developed here is general. It is designed to compare two curves, or one curve with the line 0 (or any other nonzero constant), but it can be readily extended to comparing more than two curves. Applications of testing the equality of curves arise in a variety of situations. If we want to compare the trends of some quantity for two or more groups from longitudinal data, we can either use longitudinal data analysis or use the curve testing procedure developed here (or its generalized version for correlated observations). Furthermore, the curves could be functions over time or functions over location. For example, if we have time-course gene expression levels for the same chromosome, we may consider the profile of gene expression levels over time for a single gene as a smooth curve, and thus the study of the differentiation of genes under some conditions becomes the study of whether the profile curves are equal.

Figure 3.1: Plot of teeth lead concentrations. Red squares: M1 group; blue circles: M2 group.


Figure 3.2: Plot of teeth lead concentrations with local smoothing curves superimposed for each group. Solid red line: M1 group; dotted blue line: M2 group.

3.2 Model Setup

We formalize our model as follows. Suppose

\[
Y_1(t) = f_1(t) + \varepsilon_1(t), \qquad (3.2.1)
\]
and
\[
Y_2(t) = f_2(t) + \varepsilon_2(t), \qquad (3.2.2)
\]
where $\varepsilon_1(t)$ and $\varepsilon_2(t)$ are two independent homogeneous Gaussian random errors indexed by $t$, with means $E\varepsilon_1(t) = 0 = E\varepsilon_2(t)$ and variances $Var(\varepsilon_1(t)) = \sigma_1^2$ and $Var(\varepsilon_2(t)) = \sigma_2^2$, respectively, for all $t$. The homogeneity also implies that the errors at any two locations $t_1 \ne t_2$, $\varepsilon_i(t_1)$ and $\varepsilon_i(t_2)$, are independent for each $i = 1, 2$. If the two variances are equal, we have an example of the homoscedastic case, while if the two variances are unequal, we have an example of the heteroscedastic case. We will differentiate the two cases in the following sections.

Of interest is to test $H_0: f_1(t) = f_2(t)$ for all $t \in \mathcal{T}$ versus $H_1: f_1(t) \ne f_2(t)$ for at least one $t \in \mathcal{T}$, for some domain $\mathcal{T}$ (e.g., an interval).

We also assume that both $f_1(t)$ and $f_2(t)$ are smooth functions with continuous derivatives up to order 2. Such restrictions on $f(t)$ are reasonable and sometimes necessary, since too wide a class of $f(t)$ would lead to a meaningless analysis.

One special case occurs when $f_2(t) \equiv 0$. In this case, we are testing whether $Y_1(t)$ is statistically different from a (homogeneous) Gaussian random error.

The observed data are $\{(t_{1,i}, Y_{1,i}),\ i = 1, 2, \dots, n_1\}$ from model (3.2.1) and $\{(t_{2,j}, Y_{2,j}),\ j = 1, 2, \dots, n_2\}$ from model (3.2.2), where the sample sizes $n_1$ and $n_2$ may or may not be equal. The errors $\varepsilon_{i,j}$ are assumed to be independent for $i = 1, 2$, $j = 1, 2, \dots, n_i$.

3.3 Related Work and Outline.

Studies extending numerical data analysis to functional data analysis can be traced back to Parzen (1961). Besse and Ramsay (1986) considered the functional principal component analysis of functions resulting from polynomial interpolation of observed values. Ramsay and Dalzell (1991) coined the term Functional Data Analysis (FDA) to distinguish it from ordinary data analysis. Since then, several researchers have worked on various aspects of the field. For example, Leurgans et al. (1993) considered canonical correlation analysis when the data are curves, and James and Hastie (2001) discussed functional linear discriminant analysis for irregularly sampled curves. Closely related to our present research is the work by Fan and Lin (1998), which gives a procedure to test the significance of the difference of two curves by executing a Fourier or wavelet transformation (on the curves) and then using the partial coefficients to test the hypothesis. We call this an indirect method, since the test is based on the transformed functions (curves) rather than on the original functions (curves) themselves. In this chapter we present a direct method that tests the hypothesis by estimating the involved probability directly. The outline for the rest of this chapter is as follows. In Section 3.4, we describe our test procedures, which are based on the tube formula [Sun (1993); Sun and Loader (1994)] and local regression [Loader (1996)]; our method is based on directly estimating the involved probabilities rather than on transformations as in Fan and Lin (1998). Simulation studies and their results are presented in Section 3.5. In Section 3.6, the test procedures are applied to the teeth data mentioned above, which motivated this study, to draw our conclusions for the hypothesis testing of this data set.

Testing the equality of curves is closely related to building simultaneous confidence bands (SCBs) around a curve. To test whether two curves measured with error are statistically equal, we could also first build a proper SCB around each of the two curves; if the two SCBs do not overlap in some area(s) of the domain to be tested, we will claim that the two curves are statistically different, and vice versa. See the papers of Sun and Loader (1994), Sun (2001), and Naiman (1987) for more detail. We will also discuss this in more depth in Chapter 4.

3.4 Methods.

We describe the procedure for several different cases.

3.4.1 Homoscedastic Case.

Here we assume the variances of the two homogeneous Gaussian random errors $\varepsilon_i$, $i = 1, 2$, in models (3.2.1) and (3.2.2) are equal, i.e., we assume that $\sigma_1^2 = \sigma_2^2 =: \sigma^2$.

Consider the quadratic local regression estimation. The estimate of $f_1$ can be obtained by solving the following optimization problem:
\[
\min_{(a_0,\,a_1)} \ \sum_{i=1}^{n_1} W\!\left(\frac{t - t_{1,i}}{h}\right)\big(Y_{1,i} - a_0 - a_1(t_{1,i} - t)\big)^2,
\]
where $W(t)$ is a kernel function and $h$ is the window width, with $h > 0$ and $h \to 0$. The estimated $f_1(t)$ from model (3.2.1) is
\[
\hat f_1(t) = \hat a_0 = \sum_{i=1}^{n_1} l_{1,i}(t)\, Y_{1,i} = \langle l_1(t), Y_1 \rangle, \qquad (3.4.1)
\]
where
\[
Y_1 = (Y_{1,1}, \dots, Y_{1,n_1})', \qquad
l_1(t) = (l_{1,i}(t),\ i = 1, 2, \dots, n_1)', \qquad (3.4.2)
\]
\[
l_{1,i}(t) = \frac{w_{1,i}\left\{\sum_j w_{1,j}(t - t_{1,j})(t_{1,i} - t_{1,j})\right\}}
{\sum_i w_{1,i}\left\{\sum_j w_{1,j}(t - t_{1,j})(t_{1,i} - t_{1,j})\right\}},
\]
and $w_{1,i} = W\!\left(\frac{t - t_{1,i}}{h}\right)$ for $i = 1, 2, \dots, n_1$.
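The weight vector $l_1(t)$ above is exactly the first row of the weighted least-squares solution operator, which gives a convenient way to compute it for any polynomial degree. A sketch (the tricube kernel $W(t) = (1-|t|^3)^3$ used later in Section 3.6 is assumed here):

```python
import numpy as np

def local_weights(t, tgrid, h, degree=1):
    """Weight vector l(t) of the linear smoother, so that fhat(t) = <l(t), Y>.
    l(t)' = e1' (X'WX)^{-1} X'W: the fitted intercept is the estimate at t."""
    w = np.clip(1 - np.abs((tgrid - t) / h) ** 3, 0, None) ** 3  # tricube kernel
    X = np.vander(tgrid - t, degree + 1, increasing=True)        # 1, (t_j - t), ...
    A = X.T @ (w[:, None] * X)
    return np.linalg.solve(A, (X * w[:, None]).T)[0]

# Sanity check: local polynomial weights reproduce a linear trend exactly.
tg = np.linspace(0, 1, 50)
l = local_weights(0.3, tg, h=0.2)
print(l.sum(), np.dot(l, 2 + 3 * tg))   # close to 1 and to 2 + 3*0.3 = 2.9
```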

The estimated $f_2(t)$ by quadratic local regression for model (3.2.2) is
\[
\hat f_2(t) = \sum_{i=1}^{n_2} l_{2,i}(t)\, Y_{2,i} = \langle l_2(t), Y_2 \rangle, \qquad (3.4.3)
\]
where $Y_2$ and $l_2(t)$ are defined similarly as in (3.4.2), with subscript "1" replaced by "2".

Suppose both estimators $\hat f_i(t)$ are unbiased, i.e., $E\hat f_i(t) = f_i(t) = \sum_j l_{i,j}(t)\mu_{i,j}$ for $i = 1, 2$, where $\mu_{i,j} = EY_{i,j}$ for all $i, j$. Then for $i = 1, 2$,
\[
E\hat f_i(t) = E\Big(\sum_j l_{i,j}(t) Y_{i,j}\Big) = \sum_j l_{i,j}(t)\mu_{i,j} = \langle l_i(t), \mu_i \rangle,
\qquad
Var(\hat f_i(t)) = \sigma^2 \sum_j l_{i,j}^2(t) = \sigma^2\, \|l_i(t)\|_2^2,
\]
where $\|\cdot\|_2$ denotes the $L_2$ norm. Because $\varepsilon_1(t)$ and $\varepsilon_2(t)$ are assumed to be independent,
\[
Var(\hat f_1(t) - \hat f_2(t)) = \sigma^2\big(\|l_1(t)\|^2 + \|l_2(t)\|^2\big),
\]
and the standard deviation of $\hat f_1(t) - \hat f_2(t)$ is
\[
sd(\hat f_1(t) - \hat f_2(t)) = \sigma\sqrt{\|l_1(t)\|^2 + \|l_2(t)\|^2}.
\]

Estimate of the standard deviation $\sigma$ [cf. Cleveland and Devlin (1988)]. For $i = 1, 2$, let
\[
\hat\varepsilon_i := (\hat\varepsilon_{i,1}, \hat\varepsilon_{i,2}, \dots, \hat\varepsilon_{i,n_i})'
= (Y_{i,1} - \hat Y_{i,1},\, Y_{i,2} - \hat Y_{i,2},\, \dots,\, Y_{i,n_i} - \hat Y_{i,n_i})',
\]
where $\hat Y_{i,j} = \hat f_i(t_{i,j})$, and let
\[
L_i = \begin{pmatrix}
l_{i,1}(t_{i,1}) & l_{i,2}(t_{i,1}) & \cdots & l_{i,n_i}(t_{i,1}) \\
l_{i,1}(t_{i,2}) & l_{i,2}(t_{i,2}) & \cdots & l_{i,n_i}(t_{i,2}) \\
\vdots & \vdots & & \vdots \\
l_{i,1}(t_{i,n_i}) & l_{i,2}(t_{i,n_i}) & \cdots & l_{i,n_i}(t_{i,n_i})
\end{pmatrix}
\]
be the matrix such that $\hat Y_i = L_i Y_i$. Cleveland and Devlin (1988) showed that
\[
E(\hat\varepsilon_i'\hat\varepsilon_i) = \sigma^2\, tr[(I - L_i)(I - L_i)'].
\]
Also for $i = 1, 2$, let
\[
\delta_{i,1} := tr[(I - L_i)(I - L_i)'], \qquad
\delta_{i,2} := tr\big\{[(I - L_i)(I - L_i)']^2\big\}, \qquad
\nu_i := \delta_{i,1}^2/\delta_{i,2}, \qquad (3.4.4)
\]
and let $\nu = \nu_1 + \nu_2$, where $I$ is the identity matrix of order $n_1$ or $n_2$, depending on whether $i = 1$ or $2$. Then
\[
\frac{\hat\varepsilon_1'\hat\varepsilon_1}{\sigma^2}\,\frac{\delta_{1,1}}{\delta_{1,2}} \ \overset{approx}{\sim}\ \chi^2_{\nu_1}
\qquad\text{and}\qquad
\frac{\hat\varepsilon_2'\hat\varepsilon_2}{\sigma^2}\,\frac{\delta_{2,1}}{\delta_{2,2}} \ \overset{approx}{\sim}\ \chi^2_{\nu_2}.
\]
The degrees of freedom thus defined were chosen in such a way that the first two moments of the approximating distribution $\chi^2_{\nu_i}$ match the first two moments of the quadratic form $\hat\varepsilon_i'\hat\varepsilon_i$.

Since $\hat\varepsilon_1$ and $\hat\varepsilon_2$ are independent, we have
\[
\frac{\hat\varepsilon_1'\hat\varepsilon_1}{\sigma^2}\,\frac{\delta_{1,1}}{\delta_{1,2}}
+ \frac{\hat\varepsilon_2'\hat\varepsilon_2}{\sigma^2}\,\frac{\delta_{2,1}}{\delta_{2,2}}
\ \overset{approx}{\sim}\ \chi^2_{\nu_1+\nu_2} = \chi^2_\nu.
\]
If we estimate $\sigma^2$ by
\[
\hat\sigma^2 = \frac{\hat\varepsilon_1'\hat\varepsilon_1\,\delta_{1,1}}{\nu\,\delta_{1,2}}
+ \frac{\hat\varepsilon_2'\hat\varepsilon_2\,\delta_{2,1}}{\nu\,\delta_{2,2}}, \qquad (3.4.5)
\]
then $\nu\hat\sigma^2/\sigma^2 \sim \chi^2_\nu$ approximately.

This estimate of $\sigma^2$ can be viewed in another way. Let $\hat\sigma_i^2 = (\hat\varepsilon_i'\hat\varepsilon_i)/\delta_{i,1}$; then $\nu_i\hat\sigma_i^2/\sigma^2 \sim \chi^2_{\nu_i}$ approximately for $i = 1, 2$, and the right-hand side of equation (3.4.5) can thus be written as
\[
\hat\sigma^2 = \frac{\nu_1\hat\sigma_1^2 + \nu_2\hat\sigma_2^2}{\nu_1 + \nu_2}.
\]
Therefore, our estimate $\hat\sigma^2$ is a proper weighted average of the estimates of the two individual variances.
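Equation (3.4.5) and the moment-matching degrees of freedom can be sketched directly from the smoother matrices (a sketch; `eps_list` holds the residual vectors and `L_list` the matrices $L_i$):

```python
import numpy as np

def pooled_sigma2(eps_list, L_list):
    """Pooled variance estimate of equation (3.4.5): for each group compute
    delta_{i,1} = tr[(I-L)(I-L)'], delta_{i,2} = tr{[(I-L)(I-L)']^2},
    nu_i = delta_{i,1}^2 / delta_{i,2}, then average the per-group estimates
    sigma_i^2 = eps_i'eps_i / delta_{i,1} with weights nu_i."""
    nu_total, weighted = 0.0, 0.0
    for eps, L in zip(eps_list, L_list):
        R = np.eye(L.shape[0]) - L
        RR = R @ R.T
        d1 = np.trace(RR)
        d2 = np.trace(RR @ RR)
        nu_i = d1 ** 2 / d2                  # chi-square moment matching
        sigma2_i = float(eps @ eps) / d1
        nu_total += nu_i
        weighted += nu_i * sigma2_i
    return weighted / nu_total, nu_total     # (sigma2_hat, pooled df nu)
```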

Another method to estimate $\sigma^2$ is as follows. Let
\[
Y(t) = \begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix}, \qquad
f(t) = \begin{pmatrix} f_1(t) \\ f_2(t) \end{pmatrix}, \qquad
\varepsilon(t) = \begin{pmatrix} \varepsilon_1(t) \\ \varepsilon_2(t) \end{pmatrix},
\]
\[
Y = (Y_{1,1}, \dots, Y_{1,n_1}, Y_{2,1}, \dots, Y_{2,n_2})', \qquad
L = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}, \qquad
\varepsilon = (\varepsilon_{1,1}, \dots, \varepsilon_{1,n_1}, \varepsilon_{2,1}, \dots, \varepsilon_{2,n_2})'.
\]
Then the two models (3.2.1) and (3.2.2) can be written together as
\[
Y(t) = f(t) + \varepsilon(t), \qquad (3.4.6)
\]
where $\varepsilon(t)$ is a homogeneous Gaussian random error indexed by $t$. We can then estimate $f(t)$ by $\hat f(t) = \hat Y = LY$, and estimate $\varepsilon$ by $\hat\varepsilon = (I - L)Y$, where $I$ is the $n \times n$ identity matrix with $n = n_1 + n_2$. Define
\[
\delta_1 := tr[(I - L)(I - L)'], \qquad
\delta_2 := tr\big\{[(I - L)(I - L)']^2\big\}, \qquad
\nu := \delta_1^2/\delta_2;
\]
then $\delta_1 = \delta_{1,1} + \delta_{2,1}$ and $\delta_2 = \delta_{1,2} + \delta_{2,2}$. If we estimate $\sigma^2$ by
\[
\hat\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{\delta_1},
\]
then the first two moments of $\nu\hat\sigma^2/\sigma^2$ agree with the first two moments of $\chi^2_\nu$. Therefore, we have the approximation
\[
\nu\,\frac{\hat\sigma^2}{\sigma^2} \ \overset{approx}{\sim}\ \chi^2_\nu.
\]
This method produces slightly different estimates of $\nu$ and $\sigma^2$ from those of the previous method. However, our simulations show that both give very similar estimates of the tail probabilities stated below; therefore, our algorithm is based on the first method only.

Let
\[
Z(t) := \frac{(\hat f_1(t) - \hat f_2(t)) - (f_1(t) - f_2(t))}{sd(\hat f_1(t) - \hat f_2(t))}. \qquad (3.4.7)
\]
Under $H_0: f_1(t) = f_2(t)$, $EZ(t) = 0$ and $Var(Z(t)) = 1$ for all $t \in \mathcal{T}$, and $Z(t)$ is approximately a Gaussian random field. Let
\[
u_i(t) := \frac{l_i(t)}{\sqrt{\|l_1(t)\|_2^2 + \|l_2(t)\|_2^2}}, \qquad i = 1, 2.
\]
Then $Z(t)$ can be expressed in terms of $u_1(t)$ and $u_2(t)$:
\[
Z(t) = \frac{\hat f_1(t) - f_1(t)}{sd(\hat f_1(t) - \hat f_2(t))} - \frac{\hat f_2(t) - f_2(t)}{sd(\hat f_1(t) - \hat f_2(t))}
= \Big\langle u_1(t), \frac{Y_1 - EY_1}{\sigma} \Big\rangle - \Big\langle u_2(t), \frac{Y_2 - EY_2}{\sigma} \Big\rangle
= \langle u_1(t), \xi_1 \rangle - \langle u_2(t), \xi_2 \rangle,
\]
where $\xi_i = \varepsilon_i/\sigma$, $i = 1, 2$, are multivariate standard normal and independent of each other.

The correlation function $\rho(t, t')$ of the random field $Z(t)$ is computed as
\[
\rho(t, t') := corr(Z(t), Z(t')) = \langle u_1(t), u_1(t') \rangle + \langle u_2(t), u_2(t') \rangle \qquad (3.4.8)
\]
by the fact that the $\xi_i$ are standard normal.

Let us return to our primary test problem. Recall that we want to test $H_0: f_1(t) = f_2(t)$ for all $t \in \mathcal{T}$ versus $H_1: f_1(t) \ne f_2(t)$ for at least one $t \in \mathcal{T}$, at a pre-specified level $\alpha$. Consider the test statistic
\[
T = \max_{t \in \mathcal{T}} |Z(t)|. \qquad (3.4.9)
\]
If the realized value $T = t_0$ is too large, we reject the null hypothesis. More specifically, we need to find the (tail) probability $Pr_{H_0}(T > t_0)$. If this probability is larger than $\alpha$, we accept the null and declare that there is no difference between the curves $f_1(t)$ and $f_2(t)$; otherwise, we reject the null hypothesis.

The probability $Pr_{H_0}(T > t_0)$ can be estimated by (3.4.10) and (3.4.11) in the following theorem, generalized from theorems in Sun (2001).

Theorem 3.4.1. (Tail Probability Estimation for the Homoscedastic Case) Suppose $\mathcal{T} = [a, b]$, $\hat f_1(t)$ and $\hat f_2(t)$ are unbiased estimates of $f_1(t)$ and $f_2(t)$, and $l_1(t)$ and $l_2(t)$ are defined in (3.4.2) and (3.4.3), respectively. If $\sigma^2$ is known, then
\[
Pr_{H_0}(T > t_0) \approx \frac{\kappa_0}{\pi}\exp\!\Big({-\frac{t_0^2}{2}}\Big) + E\,(1 - \Phi(t_0)) \quad \text{as } t_0 \to \infty. \qquad (3.4.10)
\]
If $\sigma^2$ is unknown and is estimated by $\hat\sigma^2$ in (3.4.5), so that $\nu\hat\sigma^2/\sigma^2 \sim \chi^2_\nu$, then
\[
Pr_{H_0}(T > t_0) \approx \frac{\kappa_0}{\pi}\Big(1 + \frac{t_0^2}{\nu}\Big)^{-\nu/2} + \frac{E}{2}\, P(|t_\nu| > t_0) \quad \text{as } t_0 \to \infty, \qquad (3.4.11)
\]
where $t_\nu$ follows a standard t distribution with $\nu$ degrees of freedom,
\[
\kappa_0 = \int_{\mathcal{T}} |C(t)|^{1/2}\,dt, \qquad (3.4.12)
\]
$C(t) = \partial^2\rho(t, t')/\partial t\,\partial t'\,\big|_{t'=t}$, and $E$ is the Euler-Poincare characteristic of the manifold $\mathcal{M}(t) = (u_1(t), -u_2(t))$ from $\mathcal{T}$ to $\mathcal{S}^{n-1} = \{x \in \mathbb{R}^n : \|x\| = 1\}$, the unit sphere in $\mathbb{R}^n$ with $n = n_1 + n_2$. Here $E = 0$ if $\mathcal{M}(a) = \mathcal{M}(b)$, and $E = 2$ if $\mathcal{M}(a) \ne \mathcal{M}(b)$ and $\mathcal{M}$ has no self-overlap.

Note that this theorem has the same expression (of the tail probability) as that in Sun (2001), but with a different $\kappa_0$, $\nu$, and setup.

Proof. $Z(t)$ in formula (3.4.7) can be written as
\[
Z(t) = \langle u_1(t), \xi_1 \rangle - \langle u_2(t), \xi_2 \rangle = \langle \mathcal{M}(t), \xi \rangle,
\]
where $\mathcal{M}(t) = \binom{u_1(t)}{-u_2(t)} \in \mathcal{S}^{n-1}$ and $\xi = \binom{\xi_1}{\xi_2} = \binom{\varepsilon_1/\sigma}{\varepsilon_2/\sigma} \in \mathbb{R}^n$ is standard multivariate normal. Conditioning on $\|\xi\|$, the probability can be written as
\[
Pr(T \ge t_0) = Pr\Big(\sup_{t \in \mathcal{T}} |\langle \mathcal{M}(t), \xi \rangle| \ge t_0\Big)
= \int_0^\infty Pr\Big(\sup_{t \in \mathcal{T}} \Big|\Big\langle \mathcal{M}(t), \frac{\xi}{\|\xi\|} \Big\rangle\Big| \ge \frac{t_0}{y} \,\Big|\, \|\xi\| = y\Big)\, g(y, n)\,dy, \qquad (3.4.13)
\]
where $g(y, n)$ is the probability density function (pdf) of the square root of a $\chi^2$ random variable with $n$ degrees of freedom. Since $U = \xi/\|\xi\| \sim uniform(\mathcal{S}^{n-1})$ is independent of $\|\xi\|$, we can drop the conditioning in the probability. Let $T_y = \{x \in \mathcal{S}^{n-1} : \sup_{t \in \mathcal{T}} |\langle \mathcal{M}(t), x \rangle| \ge t_0/y\}$; this is the union of tubes around the curves $\mathcal{M}(t)$ and $-\mathcal{M}(t)$ embedded in $\mathcal{S}^{n-1}$. The probability inside the integral of (3.4.13) can be calculated as $Vol(T_y)/Vol(\mathcal{S}^{n-1})$. We then plug in the tube formula (B.1.4) to get result (3.4.10); see also Sun and Loader (1994) [Proposition 1, p. 1330]. Result (3.4.11) is obtained by replacing $g(y, n)$, the pdf of $\|\xi\| = \sqrt{\varepsilon'\varepsilon/\sigma^2}$, by the pdf of $\|\hat\xi\| = \sqrt{\varepsilon'\varepsilon/\hat\sigma^2}$, where $\|\hat\xi\|^2/n \sim F_{n,\nu}$. QED

When $\sigma^2$ is known, $Z(t) = \langle u_1(t), \xi_1 \rangle - \langle u_2(t), \xi_2 \rangle$ is actually a finite Karhunen-Loeve expansion of the random field $Z(t)$ [see Sun (1993, 2001)]. Thus we have another way to prove the theorem: the correlation function $\rho(s, t)$ in formula (3.4.8) has a finite expansion (of up to $n$ terms of the form $Z_{i,j}(s)Z_{i,j}(t)$, for $i = 1, 2$, $j = 1, 2, \dots, n_i$). Therefore, Theorem 3.1 in Sun (1993, p. 40) can be applied, and the constant $\kappa_0$ can be computed as follows.

Computation of $\kappa_0$: if $\mathcal{T} = [a, b]$ is partitioned into $k$ small intervals $a = t_0 < t_1 < \cdots < t_k = b$, then
\[
\kappa_0 = \int_{\mathcal{T}} |C(t)|^{1/2}\,dt
\approx \sum_{i=1}^{k} \big[\|u_1(t_i) - u_1(t_{i-1})\|_2^2 + \|u_2(t_i) - u_2(t_{i-1})\|_2^2\big]^{1/2}, \qquad (3.4.14)
\]
where $\|\cdot\|_2$ as before denotes the $L_2$ norm. Its computation is often straightforward.
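The discretized $\kappa_0$ of (3.4.14) and the tail approximations (3.4.10) and (3.4.11) can be sketched as follows ($E = 2$ is assumed, the case of an open curve with no self-overlap):

```python
import numpy as np
from scipy import stats

def tube_tail(u1, u2, t0, nu=None, E=2):
    """u1, u2: arrays of shape (k+1, n_i) holding the curves u_1(t), u_2(t) on a
    grid. kappa_0 is the polygonal length from (3.4.14); the return value is the
    tail bound (3.4.10) if nu is None (sigma^2 known), else (3.4.11)."""
    steps = np.sqrt(np.sum(np.diff(u1, axis=0) ** 2, axis=1)
                    + np.sum(np.diff(u2, axis=0) ** 2, axis=1))
    kappa0 = steps.sum()
    if nu is None:                       # Gaussian form (3.4.10)
        return kappa0 / np.pi * np.exp(-t0 ** 2 / 2) + E * stats.norm.sf(t0)
    # t form (3.4.11): (E/2) * P(|t_nu| > t0) = E * P(t_nu > t0)
    return (kappa0 / np.pi * (1 + t0 ** 2 / nu) ** (-nu / 2)
            + E * stats.t.sf(t0, df=nu))
```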

3.4.2 Special Case When $f_2(t) \equiv 0$

This is a much simpler version of the previous problem and is equivalent to testing whether the observations are pure (homogeneous Gaussian) noise over a continuous domain $t$. Studies of this type of problem can be found in James and Stein (1961), Shapiro and Wilk (1965), and Chakravarti et al. (1967), among others, but in a different setup.

Throughout this subsection, we suppress the group subscripts to simplify our notation. Our test procedure first uses local regression to estimate $f(t)$, for a suitably selected window width $h$ and kernel function $W(t)$, as before. The estimated curve can be expressed as
\[
\hat f(t) = \sum_{i=1}^{n} l_i(t) Y_i = \langle l(t), Y \rangle, \qquad (3.4.15)
\]
where $Y$ and $l(t)$ are defined as before in (3.4.2), with the group index omitted. Let $T(t) = l(t)/\|l(t)\|$. Theorem 3.4.1 is still valid, and $\kappa_0$ can be calculated by
\[
\kappa_0 = \int_{\mathcal{T}} \|T'(t)\|\,dt.
\]
Similar to the estimation formula for $\kappa_0$ in (3.4.14), we can estimate $\kappa_0$ by
\[
\kappa_0 \approx \sum_{i=1}^{k} \|T(t_i) - T(t_{i-1})\|,
\]
if $\mathcal{T} = [a, b]$ is partitioned into $k$ intervals $a = t_0 < t_1 < t_2 < \cdots < t_k = b$ with $\max_i |t_i - t_{i-1}| \to 0$ as $k \to \infty$. $E$ has the same definition as in Theorem 3.4.1.

3.4.3 Heteroscedastic Case.

In this case the assumption $\sigma_1^2 = \sigma_2^2$ is no longer valid. Let $\hat f_i(t) = \sum_{j=1}^{n_i} l_{i,j}(t) Y_{i,j}$, for $i = 1, 2$, be estimated as before in the homoscedastic case (note that the estimate of $f_i(t)$ does not depend on the $\sigma_i^2$'s at all), and let
\[
Z(t) := \frac{[\hat f_1(t) - f_1(t)] - [\hat f_2(t) - f_2(t)]}{sd(\hat f_1(t) - \hat f_2(t))},
\]
\[
u_1(t) := \frac{\sigma_1 l_1(t)}{\sqrt{\sigma_1^2\|l_1(t)\|^2 + \sigma_2^2\|l_2(t)\|^2}}, \qquad
u_2(t) := \frac{\sigma_2 l_2(t)}{\sqrt{\sigma_1^2\|l_1(t)\|^2 + \sigma_2^2\|l_2(t)\|^2}},
\]
\[
\xi_1 := \frac{\varepsilon_1}{\sigma_1} = \frac{Y_1 - EY_1}{\sigma_1}, \qquad
\xi_2 := \frac{\varepsilon_2}{\sigma_2} = \frac{Y_2 - EY_2}{\sigma_2},
\]
where $\xi_i \in \mathbb{R}^{n_i}$ is (multivariate) standard normal for $i = 1, 2$, the two being independent of each other, and $l_1(t)$ is defined as before in (3.4.2), and similarly $l_2(t)$.

$Z(t)$ can be expressed in terms of $u_1(t)$ and $u_2(t)$ (under the assumption that the $\hat f_i(t)$, $i = 1, 2$, are unbiased, as in the homoscedastic case):
\[
Z(t) = \langle u_1(t), \xi_1 \rangle - \langle u_2(t), \xi_2 \rangle.
\]
The correlation function $\rho(t, t')$ of this random field $Z(t)$ is computed as
\[
\rho(t, t') := corr(Z(t), Z(t')) = \langle u_1(t), u_1(t') \rangle + \langle u_2(t), u_2(t') \rangle. \qquad (3.4.16)
\]
We can estimate the $\sigma_i^2$ separately for $i = 1, 2$ as
\[
\hat\sigma_i^2 = \frac{\hat\varepsilon_i'\hat\varepsilon_i}{tr[(I - L_i)(I - L_i)']}, \qquad (3.4.17)
\]
where $L_i$ is the matrix in the estimation equation $\hat Y_i = L_i Y_i$, and $\hat\varepsilon_i = Y_i - \hat Y_i = (I - L_i)Y_i$. Such estimates satisfy the property that
\[
\frac{\hat\sigma_i^2}{\sigma_i^2} \ \overset{approx}{\sim}\ \frac{\chi^2_{\nu_i}}{\nu_i}
\]
for $i = 1, 2$, as we have discussed before.

To test $H_0: f_1(t) = f_2(t)$ for all $t \in \mathcal{T}$ versus $H_1: f_1(t) \ne f_2(t)$ for at least one $t \in \mathcal{T}$, at a pre-specified level $\alpha$, consider the test statistic
\[
T = \max_{t \in \mathcal{T}} |Z(t)|. \qquad (3.4.18)
\]
If the realized value $T = t_0$ is too large, we reject the null hypothesis. More specifically, we need to find the probability $Pr(T > t_0)$; if it is larger than $\alpha$, we accept the null and declare that there is no difference between the curves $f_1(t)$ and $f_2(t)$.

The probability $Pr(T > t_0)$ can be approximately estimated by the following theorem, modified from Sun and Loader (1994, p. 1330):

Theorem 3.4.2. (Tail Probability Estimation for the Heteroscedastic Case) When the variances are not equal, everything in Theorem 3.4.1 remains valid with $\nu$ replaced by
\[
\nu = \frac{n^2\,\nu_1\nu_2}{n_2^2\,\nu_1 + n_1^2\,\nu_2}. \qquad (3.4.19)
\]
That is, we have
\[
Pr(T > t_0) \approx \frac{\kappa_0}{\pi}\exp\!\Big({-\frac{t_0^2}{2}}\Big) + E\,(1 - \Phi(t_0)) \quad \text{as } t_0 \to \infty \qquad (3.4.20)
\]
for known variances, and
\[
Pr(T > t_0) \approx \frac{\kappa_0}{\pi}\Big(1 + \frac{t_0^2}{\nu}\Big)^{-\nu/2} + \frac{E}{2}\, P(|t_\nu| > t_0) \qquad (3.4.21)
\]
for unknown variances, which are estimated by formula (3.4.17).

Proof. See Appendix B.

Computation of $\kappa_0$: if the $\sigma_i^2$ are known, $\mathcal{T} = [a, b]$, and $\mathcal{T}$ is partitioned into $k$ small intervals $a = t_0 < t_1 < \cdots < t_k = b$, then $\kappa_0$ can be estimated as in (3.4.14), with $u_1$ and $u_2$ as defined in this subsection. The computation of $\kappa_0$, as before, is generally straightforward.

3.5 Simulations

First assign values to n_1 and n_2, say n_1 = 50 and n_2 = 55; these are the sample sizes for the two groups. The t_{i,j}'s, for i = 1, 2 and j = 1, 2, ..., n_i, are chosen to be equally spaced between 0 and 1. Set f_1(t) = f_2(t) = t(1 − t) so that H_0 is true. Generate n_1 and n_2 i.i.d. Gaussian N(0, σ²) random errors ε_{i,j} for i = 1, 2, j = 1, 2, ..., n_i. The Y_i's are obtained by adding f_i and ε_i together, for i = 1, 2 respectively:

Y_i(t_{i,j}) = f_i(t_{i,j}) + ε_{i,j}.

We actually used σ² = 0.02² in our simulation to produce Figure 3.3.

Now partition 𝒯 = [0, 1] into n equally spaced subintervals with the n + 1 endpoints t_i = i/n for i = 0, 1, 2, ..., n. Select a suitable window width h (say h = 0.15) by visually checking the scatter plot and smoothing curves. Use this h to run a local regression to get the estimated function values f̂_1(t_i) and f̂_2(t_i), for i = 1, 2, ..., n. Then κ_0 is computed based on formula (3.4.22), and the realized test statistic t_0 defined in (3.4.18) is computed, together with the p-value based on the right-hand side of formula (3.4.11). This p-value is compared with a sequence of levels (e.g., α = 0.005, 0.01, 0.02, 0.05, 0.1). At each level, the decision to accept or reject H_0 is made by comparing the p-value with each α in the sequence. The proportion of the 10,000 iterations in which H_0 is falsely rejected is calculated and plotted in Figure 3.3.
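The loop just described can be sketched in code. This is a minimal illustration under stated assumptions, not the dissertation's Splus implementation: a local-constant (Nadaraya-Watson) smoother stands in for the local regression of the text, σ is treated as known, the grid is coarser and the repetition count smaller than in the text, and the boundary term of (3.4.20) is taken as 2(1 − Φ(t_0)) — an assumed value of E for a two-sided test on an interval.

```python
import math, random

random.seed(7)

def tricube(u):
    return (1.0 - abs(u) ** 3) ** 3 if abs(u) <= 1 else 0.0

def nw_weights(t, xs, h):
    # local-constant (Nadaraya-Watson) weights; the text uses local regression
    w = [tricube((t - x) / h) for x in xs]
    s = sum(w)
    return [v / s for v in w]

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n1, n2, h, sigma, alpha = 50, 55, 0.15, 0.02, 0.05
xs1 = [(j + 0.5) / n1 for j in range(n1)]
xs2 = [(j + 0.5) / n2 for j in range(n2)]
grid = [i / 40 for i in range(41)]

# precompute weights: Z(t) is a linear contrast with coefficients (l1(t), -l2(t))
W1 = [nw_weights(t, xs1, h) for t in grid]
W2 = [nw_weights(t, xs2, h) for t in grid]
norms = [math.sqrt(sum(a * a for a in l1) + sum(b * b for b in l2))
         for l1, l2 in zip(W1, W2)]

# kappa0 = arc length of the unit coefficient vector over the grid
units = [[a / nrm for a in l1] + [-b / nrm for b in l2]
         for l1, l2, nrm in zip(W1, W2, norms)]
kappa0 = sum(math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
             for u, v in zip(units, units[1:]))

def p_value(t0):
    # formula (3.4.20); the boundary term E(1 - Phi(t0)) is taken with E = 2,
    # an assumption made for this sketch
    return min(1.0, kappa0 / math.pi * math.exp(-t0 ** 2 / 2.0) + 2.0 * (1.0 - Phi(t0)))

f = lambda t: t * (1.0 - t)   # common curve, so H0: f1 = f2 is true

rejections, nsim = 0, 200
for _ in range(nsim):
    y1 = [f(x) + random.gauss(0.0, sigma) for x in xs1]
    y2 = [f(x) + random.gauss(0.0, sigma) for x in xs2]
    t0 = max(abs(sum(a * y for a, y in zip(l1, y1))
                 - sum(b * y for b, y in zip(l2, y2))) / (sigma * nrm)
             for l1, l2, nrm in zip(W1, W2, norms))
    if p_value(t0) <= alpha:
        rejections += 1
rate = rejections / nsim   # simulated type I error at nominal level alpha
```

With more repetitions and the grid of levels used in the text, `rate` traced against α yields a plot of the kind shown in Figure 3.3.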

[Figure 3.3 about here: simulated error rates plotted against the pre-specified noncoverage level α, with the expected error rates for reference; panel title "Compare 2 Curves, f1(x)=f2(x), Equal Variances Assumed".]

Figure 3.3: Simulation result. Test: f_1(t) = f_2(t), t ∈ 𝒯 = [0, 1]. Homoscedastic variances were assumed. 10,000 repetitions were used.

Figure 3.4 is a plot similar to Figure 3.3, but for the special case when f_2(t) ≡ 0. Figure 3.5 is another plot similar to Figure 3.3, but without the assumption of homoscedastic variances; the variances actually used were σ_1² = 0.02² and σ_2² = 0.03², with the common window width h = 0.1. All three plots show that our approximation formulas (3.4.1) and (3.4.2) are accurate enough for practical purposes.

3.6 Test Results on Teeth Lead Data Set

Next, the above test procedure is applied to the Teeth data set, using our Splus module ctest written for this application (see Appendix C), with the kernel function

W(t) = (1 − |t|³)³ I_[−1,1](t),

where I denotes the indicator function, and with h = 14 for both groups. The value h = 14 was chosen as the common window width after checking the scatter plot with smoothing curves. The outputs for the cases of equal and unequal variances for the two groups M1 and M2 are recorded below:

====== Curve Test Procedures ======
The p-value to test H0: f1(x)=f2(x) is 0.2130898
With test statistic equals 2.314093,
Estimated degree of freedom is 113.3636.
Equal variances assumed.
Estimated common sigma^2 is 6.096678

====== Curve Test Procedures ======
The p-value to test H0: f1(x)=f2(x) is 0.2103608
With test statistic equals 2.292319,
Estimated degree of freedom is 113.3636.
Unequal variances assumed.
Estimated sigma^2 are 6.488023 and 5.675449

From the outputs, we are unable to reject the null hypothesis in either the equal- or the unequal-variance case. This indicates that we do not need to be concerned about whether tooth M1 or M2 is used in the lead concentration study, since the two behave similarly in a statistical sense. The scatter plot with smoothing curves is displayed in Figure 3.2.

[Figure 3.4 about here: simulated and estimated error rates plotted against the pre-specified noncoverage level; panel title "Test Curve = 0".]

Figure 3.4: Simulation result: Test H_0 : f(t) = 0. 10,000 iterations were used. σ = 0.1, h = 0.1.

[Figure 3.5 about here: simulated error rates plotted against the pre-specified noncoverage level α; panel title "Compare 2 Curves, f1(x)=f2(x), Unequal Variances Assumed".]

Figure 3.5: Simulation results. Test: f_1(t) = f_2(t) for t ∈ 𝒯 = [0, 1]. 10,000 repetitions were used. Heteroscedastic variances were used, with σ_1² = 0.02² and σ_2² = 0.03².

Chapter 4

Connections and Discussions

4.1 Connections

We have presented research results in two different areas, namely, multiple hypothesis testing and testing the equality of curves. The former involves testing a finite number of hypotheses, conveniently indexed by the set {1, 2, ..., m}. We have studied FWER and FDR as the overall error criteria, and generalized Benjamini and Hochberg's (1995) seminal theorem with a simpler proof for FDR control in the independent test statistics case, and also under the PRDS condition for correlated test statistics. More importantly, this is the first work in which the range of variances of the BH FDP (for different alternative hypotheses) is provided. In addition, we proposed a new FDR-controlling procedure that is more powerful than Benjamini and Hochberg's (1995) and has a smaller variance than that of Storey et al. (2004). An application to a real microarray data set was also given.

Testing equality of curves, on the other hand, is equivalent to testing simultaneously an uncountable number of hypotheses indexed by a continuous domain. We used the tube formula to develop approximation formulas for computing the corresponding p-values when the variance of the random errors is either homoscedastic or heteroscedastic. The performance of the approximation formulas was examined via simulation. The new test procedure for the heteroscedastic variance case was applied to a real data set.

Different as they may look, the two areas can be related. The key connection is to generalize the definitions of FWER and FDR so that they are properly defined on any type of index set.

Let 𝒯 be an index set of hypotheses to be tested. For the finite hypotheses case, 𝒯 = {1, 2, ..., m} is a discrete set. For the second case, when we test H_0 : f_1(t) = f_2(t) for all t ∈ [a, b] versus H_a : f_1(t) ≠ f_2(t) for some t ∈ [a, b], we have 𝒯 = [a, b], a continuous set. In either case, we have a test statistic Z(t) for each t ∈ 𝒯.

Let µ be a measure on 𝒯. Given a cutoff value c of the test statistics, consider the generic test procedure that rejects all H_{0,t}'s such that p_t ≤ c. Let

V = V(c) = { t ∈ 𝒯 : p_t ≤ c, H_{0,t} is true } = the set of indices of hypotheses that are falsely rejected,

R = R(c) = { t ∈ 𝒯 : p_t ≤ c } = the set of indices of hypotheses that are rejected.

Define

FWER(c) = Pr_{H_0}( µ{V} > 0 ),

FDR(c) = E( (µ{V}/µ{R}) 1_{µ{R} > 0} ).

Then we see:

a. If H_{0,t} is true for all t ∈ 𝒯, then FDR(c) = FWER(c) for all c. This is because in this case each rejected hypothesis is falsely rejected, and V(c) = R(c) for all c. Thus µ{V(c)} = µ{R(c)}, and FDR(c) = E(1_{µ{R(c)} > 0}) = Pr(µ{R(c)} > 0) = Pr(µ{V(c)} > 0) = FWER(c).

b. If µ is the counting measure on the index set 𝒯 = {1, 2, ..., m}, then the definitions of FDR and FWER agree with those in the finite hypothesis testing case.

c. If µ is the Lebesgue measure on the set 𝒯 = [a, b], then Pr(sup_{t ∈ 𝒯} ||Z(t)|| > t_0(c)) = Pr_{H_0}(µ(V(c)) > 0), where t_0(c) is the cutoff value of the test statistic corresponding to c. The procedure is therefore equivalent to an FWER controlling procedure, since Z(t), defined in formula (3.4.7) of Chapter 3, is continuous: any single point t with |Z(t)| > t_0 yields a small neighborhood of t such that |Z(s)| > t_0 for all s in this neighborhood. Therefore, we can find a subset E of this neighborhood such that µ(E) > 0.

These definitions are very general and cover any form of index set, as long as a proper measure is defined on the set. For example, if we want to test whether two images defined on a rectangular area [a, b] × [c, d] are the same everywhere, we have the 2-dimensional domain 𝒯 = [a, b] × [c, d]; by defining a measure on this domain, we obtain definitions of FDR and FWER accordingly.
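These generalized definitions are easy to illustrate in the counting-measure case. The following toy demonstration (illustrative only, not from the dissertation) checks property (a) above: under the complete null, each realized FDP equals the indicator that µ{R} > 0, so FDR(c) and FWER(c) coincide.

```python
import random

random.seed(3)

def fdp_and_any_false(pvals, truths, c):
    # generic cutoff procedure on a finite index set with counting measure mu:
    # reject H_{0,t} iff p_t <= c; return (mu{V}/mu{R} with 0/0 := 0, 1{mu{V} > 0})
    R = [t for t, p in enumerate(pvals) if p <= c]
    V = [t for t in R if truths[t]]          # truths[t]: H_{0,t} is true
    fdp = len(V) / len(R) if R else 0.0
    return fdp, 1.0 if V else 0.0

m, c, nsim = 20, 0.05, 2000
fdr_terms, fwer_terms = [], []
for _ in range(nsim):
    pvals = [random.random() for _ in range(m)]    # complete null: all H_{0,t} true
    fdp, any_false = fdp_and_any_false(pvals, [True] * m, c)
    fdr_terms.append(fdp)
    fwer_terms.append(any_false)
fdr = sum(fdr_terms) / nsim
fwer = sum(fwer_terms) / nsim
# under the complete null every FDP realization is 0 or 1, so the two averages
# agree term by term, exactly as in item (a)
```

The same loop with a Riemann-sum measure over a grid gives the Lebesgue-measure analogue of item (c).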

4.2 Discussions and Future Research

The definitions of FWER and FDR above open up new research areas. For example, one can ask for parallel test procedures, based on the newly defined FDR, for testing the equality of curves. This would give more powerful test procedures than those based on the FWER criterion. Of course, this comes at the price of a higher type I error.

FWER versus FDR. FWER and FDR are two different error criteria, even though both are related to the type I error. A logical question is: when should we use FWER, and when should we use FDR?

As already mentioned, FWER is a more stringent error control than FDR, and hence a test procedure with FWER controlled at α is less powerful than one that controls FDR at α. Hence, when one wants high confidence in all the hypotheses one has declared significant, one should use FWER. On the other hand, if one's goal is to pick up as many significant hypotheses as possible, one should use FDR. In that case one gains flexibility and statistical power, but is more likely to make a type I error, and has less confidence in the declared discoveries (the hypotheses declared significant).

Gene selection in microarray data analysis is an example in point. If the primary goal is high confidence in all selected genes (e.g., selecting candidate genes for Real Time Polymerase Chain Reaction (RT-PCR) validation), use an FWER-controlled procedure. If the primary goal is to find any genes for further study such that a certain proportion of false positives is tolerable (e.g., gene discovery, or selecting candidate co-regulated gene sets for GO/pathway analysis), use an FDR-controlling procedure, especially if the correlation structure of the genes is unknown. See below.

From a practical point of view, when m = ∞, as in the case of curve testing, the BH procedure is not applicable, and there does not exist a practical FDR-controlling procedure. One might suggest solving this problem by choosing a set of m points (for some large m) in the continuous domain of interest and then applying an FDR-controlled procedure at these m points. There are several problems with this ad hoc procedure:

(1) the test is highly sensitive to the choice of m, which can be arbitrary;

(2) no matter what finite m is chosen, it is not a simultaneous test on the entire domain of interest; and

(3) more importantly, in the general correlated test statistics case, Benjamini and Liu (1999) and Benjamini and Yekutieli (2001) suggested adding a factor of log(m) to the BH procedure to control the FDR, which makes it similar to the Bonferroni procedure.

Hence this FDR-controlled procedure would not only have a larger type I error than the FWER-controlled procedure, but can also be much less powerful than the simultaneous tests developed here and by Sun (2001), which control FWER more or less exactly. See the illustration in Figure 4 of Sun (2001). Therefore, in this case, we suggest using a test that controls FWER exactly; such procedures are sometimes called tests based on random fields theory or tube formulas. This is different from recommending a Bonferroni procedure, which is based on a too conservative upper bound on the FWER.

From another practical point of view, if there is no reason to assume that {Z(t) : t ∈ 𝒯} is continuous, or there is no spatial relationship in the domain of t, or the covariance of Z(t) is hard to estimate, as in the case of microarray data analysis, use FDR. In the case where Z(t) is continuous on a spatially related domain, such as an interval, as in some brain imaging applications, we suggest using the exact FWER-controlled tests based on the random fields theory of Worsley et al. (2004) and Adler (2000), the tube formulas of Sun (1993), or the LASR procedure of Wang, Bogie and Sun (2005) (see http://sun.cwru.edu/lasr/) for determining which pixels are significantly activated. Note that the Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001) FDR-controlling procedures do not use the valuable correlation information.

Optimal FDR Procedures. As we pointed out in Chapter 2, the variance of the FDP depends on the p-values that come from the alternative. We have provided sharp upper and lower bounds for this variance and given a test procedure that, in our simulations, produces the smallest variance. Is our test procedure optimal? In other words, can one design another test procedure with an even smaller variance? It would be interesting to find an optimal FDR-controlling procedure that has the smallest variance of the FDP among all test procedures that control FDR at level β.

Duality between Confidence Intervals and Hypothesis Tests. It is well known that there is a dual relationship between testing a hypothesis and building a confidence interval. The test of H_0 : θ = 0 versus H_a : θ ≠ 0 at a specified level α can be translated into the dual problem of constructing a two-sided confidence interval for the parameter θ with coverage 1 − α. If 0 is not inside the constructed CI, it is easy to see that the realized test statistic lies in the α-level rejection region, and hence we reject H_0 in favor of H_a, and vice versa.

Similarly, for multiple hypothesis testing there is a dual relationship between the construction of a simultaneous confidence interval (SCI) and testing multiple hypotheses simultaneously. An example is testing the equality of curves, discussed in Chapter 3. For instance, suppose we want to test H_0 : f(t) = 0 for all t ∈ 𝒯 against H_a : f(t) ≠ 0 for at least one t ∈ 𝒯 at some level α. This test can be carried out by building an SCI for {f(t) : t ∈ 𝒯} around the estimate f̂(t), with simultaneous coverage 1 − α. If the zero function lies entirely within the SCI, we accept the null. For the test of equality of two curves, H_0 : f_1(t) = f_2(t) for all t ∈ 𝒯 against H_a : f_1(t) ≠ f_2(t) for at least one t ∈ 𝒯 at some level α, we can translate it into the SCI problem of constructing an SCI for f_1(t) − f_2(t) around the estimate f̂_1(t) − f̂_2(t). If the zero line sticks out of the SCI anywhere, H_0 is rejected at level α.

The above duality between constructing SCIs and simultaneous tests is based on FWER. Now that we have extended the definition of FDR in this chapter, we may wonder whether we can find a dual problem for multiple testing based on the FDR error criterion. This is a research issue which would be interesting to explore.

Applications. Both studies have applications in the bioinformatics area and beyond. For example, FDR can be used to find the significant genes among tens of thousands in a genome scan, as we have seen in Chapter 2. Another possibility is to treat gene expression levels as continuous curves (because of linkage) over the domain of genomic location, and to study the areas where the two expression level functions differ significantly.

Appendix A

Proofs of Lemmas and Theorems in Chapter 2

Throughout this appendix, we denote the false discovery proportion (V/R) 1_{R ≥ 1} by W. Table 1.2 will be referred to implicitly; for example, V and S denote the numbers of false discoveries and true discoveries, respectively.

A.1 Proof of Lemma 2.4.2

To simplify our notation, we assume that P_1, P_2, ..., P_{m_0} are the p-values from the true null hypotheses, sorted from small to large. Similarly, the p-values from the alternatives are denoted by Q_1, Q_2, ..., Q_{m_1}, also sorted from small to large.

If m = 1, then m_0 = 0 or 1, and it is easy to see that Lemma 2.4.2 is true in both cases. Otherwise, if m = 2, let us distinguish the following 3 cases:

Case 1: m_0 = 0, m_1 = 2. In this case, W ≡ 0, and (2.4.2) is true.

Case 2: m_0 = 2, m_1 = 0. In this case, we do not have any conditions on the left-hand side, and W ≡ 1_{V ≥ 1}. We can show that EW = P(V ≥ 1) = β in this case.

Indeed,

E(W) = P(V = 1, S = 0) + P(V = 2, S = 0)
= P( P_1 ≤ β/2, P_2 ≥ β ) + P( P_2 ≤ β )
= 2 (β/2)(1 − β) + β² = β = (m_0/m) β,

where the third equality is due to the fact that P_1 ≤ P_2 are the order statistics of a pair of i.i.d. uniformly distributed random variables, whose joint density is

f_{(P_1, P_2)}(p_1, p_2) = 2!  for 0 ≤ p_1 ≤ p_2 ≤ 1.

Case 3: m_0 = 1, m_1 = 1. Then we have:

E(W | Q_1 = q_1)
= 1 · P(V = 1, S = 0 | Q_1 = q_1) + (1/2) P(V = 1, S = 1 | Q_1 = q_1)
= P(V = 1, S = 0 | Q_1 = q_1) + (1/2) P( max{P_1, Q_1} ≤ β | Q_1 = q_1 )
= P( P_1 ≤ β/2, Q_1 > β | Q_1 = q_1 ) + (1/2) P( P_1 ≤ β, Q_1 ≤ β | Q_1 = q_1 )
= (β/2) P( Q_1 > β | Q_1 = q_1 ) + (β/2) P( Q_1 ≤ β | Q_1 = q_1 )
= β/2 = (m_0/m) β,

where in passing from line 4 to line 5 we have used the assumption that P and Q are independent.

Now suppose that for all m ≤ k our main lemma is true. We need to show that it is also true for the case m = k + 1.

If m_0 = 0, there is nothing to prove, since W ≡ 0 in this case. Otherwise, if m_0 = 1, let P_1 denote the (only) p-value corresponding to the hypothesis whose null is true, and let Q_1 = q_1, ..., Q_{m_1} = q_{m_1} denote the m_1 = k = m − 1 p-values corresponding to the hypotheses whose null is not true. Then

E(W | Q_1 = q_1, ..., Q_{m_1} = q_{m_1})
= Σ_{s=0}^{m_1} (1/(1+s)) P( V = 1, S = s | Q_1 = q_1, ..., Q_{m_1} = q_{m_1} )
= Σ_{s=0}^{m_1} (1/(1+s)) P( max{P_1, Q_s} ≤ ((s+1)/(k+1))β, Q_{s+1} > ((s+2)/(k+1))β, ..., Q_{m_1} > β | Q_1 = q_1, ..., Q_{m_1} = q_{m_1} )
= Σ_{s=0}^{m_1} (1/(1+s)) P( P_1 ≤ ((s+1)/(k+1))β, Q_s ≤ ((s+1)/(k+1))β, Q_{s+1} > ((s+2)/(k+1))β, ..., Q_{m_1} > β | Q_1 = q_1, ..., Q_{m_1} = q_{m_1} )
= Σ_{s=0}^{m_1} (1/(1+s)) · ((s+1)/(k+1))β · P( Q_s ≤ ((s+1)/(k+1))β, Q_{s+1} > ((s+2)/(k+1))β, ..., Q_{m_1} > β | Q_1 = q_1, ..., Q_{m_1} = q_{m_1} )
= (β/(k+1)) Σ_{s=0}^{m_1} P( Q_s ≤ ((s+1)/(k+1))β, Q_{s+1} > ((s+2)/(k+1))β, ..., Q_{m_1} > β | Q_1 = q_1, ..., Q_{m_1} = q_{m_1} )
= (β/(k+1)) Σ_{s=0}^{m_1} 1{ q_s ≤ ((s+1)/(k+1))β, q_{s+1} > ((s+2)/(k+1))β, ..., q_{m_1} > β }
= β/(k+1) = (m_0/m) β,

where we have assumed that Q_0 = 0 and Q_{m_1+1} = 1. (Exactly one term of the last sum equals 1, since the indicator events are disjoint and exhaust all possibilities.)

Therefore, the claim holds in all cases with m_0 = 0, 1. Now let us look at the case m_0 ≥ 2.

We follow the idea of the proof of the main lemma in Benjamini and Hochberg (1995), with some notation changes. Here we denote by P_1, ..., P_{m_0} and Q_1, ..., Q_{m_1} the p-values for the true and non-true null hypotheses, respectively. The P_i's and the Q_j's are assumed to be independent of each other. To simplify our notation, they are also assumed to have been sorted from small to large within each group. The values q_1, ..., q_{m_1} are a set of realized values of the Q's.

Let j_0 and p′ be defined as follows, with q_0 = 0 if needed:

j_0 = max{ j : q_j ≤ ((m_0 + j)/(k + 1))β, 0 ≤ j ≤ m_1 },

p′ = ((m_0 + j_0)/(k + 1))β,

where j_0 is the maximal index between 0 and m_1 such that the inequality q_j ≤ ((m_0 + j)/(k + 1))β holds, and p′ is the value of the right-hand side of this inequality at j = j_0. If no such j exists, let j_0 = 0.

Now, conditioning on P_{m_0} = p, we have:

E(W | Q_1 = q_1, ..., Q_{m_1} = q_{m_1})
= ∫_0^{p′} E(W | P_{m_0} = p, Q_1 = q_1, ..., Q_{m_1} = q_{m_1}) f_{P_{m_0}}(p) dp
+ ∫_{p′}^1 E(W | P_{m_0} = p, Q_1 = q_1, ..., Q_{m_1} = q_{m_1}) f_{P_{m_0}}(p) dp.

Consider the first integral. Since p ≤ p′, all m_0 + j_0 corresponding hypotheses are rejected by the BH procedure, and W ≡ m_0/(m_0 + j_0). Therefore,

∫_0^{p′} E(W | P_{m_0} = p, Q_1 = q_1, ..., Q_{m_1} = q_{m_1}) f_{P_{m_0}}(p) dp
= ( m_0/(m_0 + j_0) ) ∫_0^{p′} m_0 p^{m_0 − 1} dp = ( m_0/(m_0 + j_0) ) (p′)^{m_0},

where f_{P_{m_0}}(p) = m_0 p^{m_0 − 1} is the probability density function of P_{m_0}, the maximal value among the m_0 p-values corresponding to the true null hypotheses.

Now let us consider the second integral. Here we consider separately each j, j = j_0 + 1, ..., m_1, such that q_j < p ≤ q_{j+1} (with q_{m_1+1} = 1).

Let

P_i′ = P_i/p, i = 1, 2, ..., m_0 − 1,

Q_i′ = Q_i/p, i = 1, 2, ..., j.

Then P_i′ = P_i/p, i = 1, 2, ..., m_0 − 1, are the order statistics of a set of i.i.d. uniformly distributed variables, while 0 ≤ Q_i′ ≤ 1 for i = 1, ..., j. Let

m′ = m_0 + j − 1 < k + 1 = m,

m_0′ = m_0 − 1,

m_1′ = j,

β′ = ( (m_0 + j − 1)/((k + 1)p) ) β.

Notice that, for X either P or Q, X_i′ ≤ (i/m′)β′ if and only if X_i ≤ (i/m)β, for i = 1, ..., m′. Hence the false discovery proportion for testing P_1, ..., P_{m_0}, Q_1, ..., Q_{m_1} at level β, for a specific j, is the same as that for testing P_1′, ..., P_{m_0 − 1}′, Q_1′, ..., Q_j′ at level β′ = ((m_0 + j − 1)/((k + 1)p))β. Therefore, we have by induction that

E(W | P_{m_0} = p, Q_1 = q_1, ..., Q_{m_1} = q_{m_1})
= E(W | Q_1′ = q_1/p, ..., Q_j′ = q_j/p)
= (m_0′/m′) β′ = ( (m_0 − 1)/(m_0 + j − 1) ) · ( (m_0 + j − 1)/((k + 1)p) ) β    (by induction)
= ( (m_0 − 1)/((k + 1)p) ) β.

As a result, the second integral becomes

∫_{p′}^1 E(W | P_{m_0} = p, Q_1 = q_1, ..., Q_{m_1} = q_{m_1}) f_{P_{m_0}}(p) dp
= ∫_{p′}^1 ( (m_0 − 1)/((k + 1)p) ) β f_{P_{m_0}}(p) dp = ∫_{p′}^1 ( (m_0 − 1)/((k + 1)p) ) β m_0 p^{m_0 − 1} dp
= ( m_0/(k + 1) ) β { 1 − (p′)^{m_0 − 1} }.

Note that for the last equality to be valid we need m_0 ≥ 2. This is not a problem, however, since we have already proved the cases m_0 = 0, 1.

Adding the two parts together, we have:

E(W | Q_1 = q_1, ..., Q_{m_1} = q_{m_1})
= ( m_0/(m_0 + j_0) ) (p′)^{m_0} + ( m_0/(k + 1) ) β { 1 − (p′)^{m_0 − 1} }
= ( m_0/(k + 1) ) β + ( m_0/(m_0 + j_0) ) (p′)^{m_0} − ( m_0/(k + 1) ) β (p′)^{m_0 − 1}
= ( m_0/(k + 1) ) β + m_0 (p′)^{m_0 − 1} ( p′/(m_0 + j_0) − β/(k + 1) )
= ( m_0/(k + 1) ) β + m_0 (p′)^{m_0 − 1} ( (1/(m_0 + j_0)) · ((m_0 + j_0)/(k + 1)) β − β/(k + 1) )
= ( m_0/(k + 1) ) β = ( m_0/m ) β,

as was claimed in Lemma 2.4.2. QED.
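The identity E(W) = (m_0/m)β just proved is easy to corroborate numerically. The following Monte Carlo check is an illustration only (the parameter values and the choice of stochastically small alternative p-values, drawn here as cubed uniforms, are arbitrary):

```python
import random

random.seed(11)

def bh_rejections(pvals, beta):
    # step-up BH at level beta: reject the k smallest p-values, where
    # k = max{ j : p_(j) <= (j/m) * beta } (k = 0 if no such j)
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for j, i in enumerate(order, start=1):
        if pvals[i] <= j / m * beta:
            k = j
    return set(order[:k])

m0, m1, beta = 6, 4, 0.2
m = m0 + m1
nsim, total = 20000, 0.0
for _ in range(nsim):
    null_p = [random.random() for _ in range(m0)]      # indices 0..m0-1: true nulls
    alt_p = [random.random() ** 3 for _ in range(m1)]  # stochastically small
    rej = bh_rejections(null_p + alt_p, beta)
    v = sum(1 for i in rej if i < m0)
    total += v / len(rej) if rej else 0.0
mc_fdr = total / nsim
expected = m0 / m * beta    # the lemma's exact value, 0.12 here
```

Changing the alternative distribution leaves the average FDP at (m_0/m)β, which is exactly the content of the lemma.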

A.2 Proof of Theorem 2.4.6

A.2.1 Key Lemma

First we show that the following key lemma is true. Let

P₀ = (P_1, ..., P_{m_0}),    P₁ = (P_{m_0+1}, ..., P_m)

be the vectors of p-values of the test statistics from the true null hypotheses and from the alternatives, respectively. Let

P = (P_1, ..., P_m) = (P₀, P₁).

Also let

W = (V/R) 1_{V ≥ 1}

denote the false discovery proportion of a test procedure.

Lemma A.2.1. For independent and continuous test statistics, Var(W) of a BH procedure at level β is an increasing function of P₁ if m → ∞ and m_0/m → π_0 ∈ (0, 1), where P₁ ≤ P₁′ means that P_i ≤ P_i′ stochastically for all i = m_0 + 1, ..., m.

Proof: Since

Var(W) = E(W²) − (EW)² = E(W²) − (m_0²/m²)β²,

we only need to show that E(W²) is a monotone function of P₁, under the conditions of Lemma A.2.1.

Let P_i be any p-value from the alternative among the P_j's, j = m_0 + 1, ..., m, and let P^(i) = P \ {P_i} = {P_k : k ≠ i} be the sequence of p-values with P_i removed. Let

g = g(p^(i), p_i) = W²(P^(i) = p^(i), P_i = p_i)

denote the dependence of W² on the given values (p^(i), p_i) for a given BH FDR controlling procedure. We claim that g is an increasing function of p_i almost surely as m becomes large. Let

k = max{ j : p_(j) ≤ (j/m)β, j = 0, 1, 2, ..., m };

then

g = g(p^(i), p_i) = (v²/k²) 1_{v ≥ 1},

where v is the number of falsely rejected hypotheses among the total of k rejected hypotheses, and g = 0 if either such a k does not exist or v = 0.

Let p_i′ be another (realized) value of P_i such that 0 ≤ p_i < p_i′ ≤ 1. Let k′ and v′ be defined in the same way, but on the new p-value sequence (p^(i), p_i′), and let

g′ = g(p^(i), p_i′) = (v′²/k′²) 1_{v′ ≥ 1}.

We want to show that the claim

g ≤ g′    (A.2.1)

is true almost surely, as m becomes large.

If k = 0, then g = 0 ≤ g′, and claim (A.2.1) is true. Therefore we can assume k ≥ 1.

Let us distinguish 3 cases based on the relative order between p_i, p_i′ and the 'cutoff' value p_(k):

Case 1: p_i < p_i′ ≤ p_(k). Then the set of rejected hypotheses is unchanged, so k′ = k, v′ = v, and g′ = g.

Case 2: p_(k) < p_i < p_i′. Then both p_i and p_i′ are accepted, and again k′ = k, v′ = v, and g′ = g.

Case 3: p_i ≤ p_(k) < p_i′. In this case the rejection set changes; in particular, p_i′ will be accepted although p_i was rejected. Therefore k decreases by at least 1 in this case. Meanwhile, possibly more p-values become accepted, so the total number of rejections k′ satisfies k′ ≤ k − 1. See Figure A.1 for an illustration of this case.

Further divide this case into the following subcases:

Subcase 1: p_(k) ≤ ((k−1)/m)β. Then k′ = k − 1 and v′ = v, so g′ = v′²/k′² ≥ v²/k² = g. (In case k = 1, the only rejected p-value is p_i, which comes from the alternative, so we must have v = 0 in this case; therefore g = v²/k² = 0, and g′ = 0 as well.)

Subcase 2: ((k−1)/m)β ≤ p_(k) ≤ (k/m)β. The probability of this configuration is

P( ((k−1)/m)β ≤ P_(k) ≤ (k/m)β, P_(k+1) ≥ ((k+1)/m)β, ..., P_(m) ≥ β )
≤ P( ((k−1)/m)β ≤ P_(k) ≤ (k/m)β ) → 0,

[Figure A.1 about here: two panels, "BH on data 1" and "BH on data 2", showing p-values (0 marking a test from a true null, 1 a test from the alternative) plotted against their index.]

Figure A.1: Illustration for Case 3: m = 40, m_0 = 10. All p-values are the same in the two panels except one, which comes from the alternative; this point is marked M in the left panel and N in the right panel.

as m → ∞, since the length of the integration interval for P_(k) is β/m → 0 as m → ∞.

Therefore, our claim that g is an almost surely increasing function of p_i is proved.

Now let P_i ≤ P_i′ stochastically. We must show that E(W²(P^(i), P_i)) ≤ E(W²(P^(i), P_i′)). However,

E(W²(P^(i), P_i)) = ∫_{P^(i)} ∫_{P_i} g(p^(i), p_i) f_{P_i}(p_i) f_{P^(i)}(p^(i)) dp_i dp^(i),

E(W²(P^(i), P_i′)) = ∫_{P^(i)} ∫_{P_i′} g(p^(i), p_i′) f_{P_i′}(p_i′) f_{P^(i)}(p^(i)) dp_i′ dp^(i),

where f_{P_i}(p_i), f_{P_i′}(p_i′), f_{P^(i)}(p^(i)) are the pdf's of P_i, P_i′ and P^(i), respectively, and g(p^(i), p_i), g(p^(i), p_i′) are the realized values of W² at (p^(i), p_i) and (p^(i), p_i′), respectively. Here we have used the independence between P_i and P^(i), and between P_i′ and P^(i).

To show that E(W²(P^(i), P_i)) ≤ E(W²(P^(i), P_i′)), we only need to show that

∫_{P_i} g(p^(i), p_i) f_{P_i}(p_i) dp_i ≤ ∫_{P_i′} g(p^(i), p_i′) f_{P_i′}(p_i′) dp_i′,

or that

E_{P_i} g(p^(i), P_i) ≤ E_{P_i′} g(p^(i), P_i′).

But this is not difficult, because we have shown that 0 ≤ g(p^(i), p_i) ≤ 1 is almost surely monotonically increasing in p_i, we have P_i ≤ P_i′ stochastically by assumption, and the stochastic ordering of random variables (X ≤ Y stochastically) is characterized by Ef(X) ≤ Ef(Y) for every monotonically increasing function f.

Now suppose that more than one p-value increases stochastically. We can repeat the above argument a finite number of times to prove the lemma. QED.

85 A.2.2 Other Lemmas

Lemma A.2.2. Let P_(1:k) ≤ ... ≤ P_(k:k) be the order statistics of k i.i.d. uniformly distributed random variables, and let R_(1:s) ≤ ... ≤ R_(s:s) be the order statistics of another, independent set of s i.i.d. uniformly distributed random variables, where 1 ≤ s < k. Then

P( P_((k−s):k) ≤ c_0, P_((k−s+1):k) ≥ c_1, ..., P_(k:k) ≥ c_s )
= C(k, s) c_0^{k−s} P( R_(1:s) ≥ c_1, ..., R_(s:s) ≥ c_s )    (A.2.2)

for any constants 0 ≤ c_0 ≤ c_1 ≤ ⋯ ≤ c_s ≤ 1, where C(k, s) = k!/(s!(k−s)!) denotes the binomial coefficient.

Proof. By integration and variable substitution:

P( P_((k−s):k) ≤ c_0, P_((k−s+1):k) ≥ c_1, ..., P_(k:k) ≥ c_s )
= ∫_{p_{k−s}=0}^{c_0} ∫_{p_{k−s+1}=c_1}^{1} ⋯ ∫_{p_k=c_s}^{1} k(k−1)⋯(k−s) p_{k−s}^{k−s−1} dp_k ⋯ dp_{k−s+1} dp_{k−s}
= k(k−1)⋯(k−s) ∫_0^{c_0} p_{k−s}^{k−s−1} dp_{k−s} ∫_{p_{k−s+1}=c_1}^{1} ⋯ ∫_{p_k=c_s}^{1} dp_k ⋯ dp_{k−s+1}
= ( k(k−1)⋯(k−s+1)/s! ) ∫_0^{c_0} (k−s) p_{k−s}^{k−s−1} dp_{k−s} ∫_{p_{k−s+1}=c_1}^{1} ⋯ ∫_{p_k=c_s}^{1} s! dp_k ⋯ dp_{k−s+1}
= C(k, s) c_0^{k−s} P( R_(1:s) ≥ c_1, ..., R_(s:s) ≥ c_s ),

where the last equality is obtained by variable substitution: the inner integral over the ordered region, multiplied by s!, is exactly the corresponding probability for the order statistics of s i.i.d. uniform variables. QED

Lemma A.2.3. For any positive integer k, any 0 < β < 1 and c ≥ 0 such that β + (k−1)c ≤ 1, let

P(k, β, c) = P( P_(1:k) > β, P_(2:k) > β + c, ..., P_(k:k) > β + (k−1)c ),

where P_(i:k), i = 1, ..., k, are the order statistics of a set of k i.i.d. uniformly distributed variables. Then

P(k, β, c) = Σ_{j=0}^{k} (−1)^j C(k, j) (β − c)^{j−1} (β + (j−1)c),    (A.2.3)

where the j = 0 term is understood to equal 1. If c = β, the above equation reduces to P(k, β, β) = 1 − kβ.

Proof. By mathematical induction on k. It is Simes' identity when c = β and an extension of it when c ≠ β. See Simes (1986). QED

Lemma A.2.4. For any k ≥ 2 and any 0 ≤ j ≤ k − 1, we have

Σ_{v=0}^{k} (−1)^v C(k, v) (v + 1)^j = 0.

Proof. By mathematical induction on k. For any k ≥ 2 and j = 0, it is the binomial expansion of (1 + (−1))^k. For k ≥ 2 and j > 0, expand each (v + 1)^j for v = 0 to k, and use the induction step. QED.
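Formula (A.2.3) can also be checked independently of the induction. The snippet below is an illustrative numerical verification (the parameter values are arbitrary choices satisfying the lemma's condition β + (k−1)c ≤ 1): it compares the closed form against the empirical probability that every order statistic of k uniforms clears its linear boundary.

```python
import math, random

random.seed(5)

def p_closed_form(k, beta, c):
    # right-hand side of (A.2.3); the j = 0 term equals 1 by convention
    total = 1.0
    for j in range(1, k + 1):
        total += (-1) ** j * math.comb(k, j) * (beta - c) ** (j - 1) * (beta + (j - 1) * c)
    return total

def p_monte_carlo(k, beta, c, nsim=200000):
    hits = 0
    for _ in range(nsim):
        u = sorted(random.random() for _ in range(k))
        if all(u[i] > beta + i * c for i in range(k)):
            hits += 1
    return hits / nsim

k, beta, c = 4, 0.1, 0.05            # satisfies beta + (k - 1) * c <= 1
exact = p_closed_form(k, beta, c)
approx = p_monte_carlo(k, beta, c)
```

Setting c = β makes every term with j ≥ 2 vanish, which recovers the reduction P(k, β, β) = 1 − kβ stated in the lemma.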

Lemma A.2.5. For a two-sided normal test, the p-value P, as a random variable, satisfies 0 ≤ P ≤ U stochastically, where U is a uniformly distributed random variable.

Proof: Suppose the true mean and variance of the normally distributed X are µ and σ², respectively. Without loss of generality, we assume that the null hypothesis is H_0 : µ = 0 against H_a : µ ≠ 0, and that σ = 1.

Obviously, P ≥ 0, so we only need to show P ≤ U, or

Pr(P ≤ t) ≥ t

for all t ∈ (0, 1).

With some calculation, we obtain

Pr(P ≤ t) = 1 − Φ( Φ⁻¹(1 − t/2) − µ ) + Φ( −Φ⁻¹(1 − t/2) − µ ).

For µ = 0, we have

t = 1 − Φ( Φ⁻¹(1 − t/2) ) + Φ( −Φ⁻¹(1 − t/2) ).

Let c = Φ⁻¹(1 − t/2). Then for µ ≥ 0 and for all t ∈ (0, 1), we have c ≥ 0 and

Pr(P ≤ t) − t = Φ(c) − Φ(c − µ) + Φ(−c − µ) − Φ(−c)
= Pr(c − µ ≤ Z ≤ c) − Pr(−c − µ ≤ Z ≤ −c)
= Pr(c − µ ≤ Z ≤ c) − Pr(c ≤ Z ≤ c + µ)
≥ 0,

where Φ denotes the standard normal cdf and Z a standard normal variable. The last inequality holds because the two intervals have the same length µ and, for z ≥ 0, the pdf of a standard normal is (strictly) decreasing. The case µ ≤ 0 can be proved similarly. QED.

Lemma A.2.6. If all Q_i = 0, with the other assumptions in Theorem 2.4.6 unchanged, we have

Var(W) = ( m_0/(m(m − m_0 + 1)) ) β(1 − β).

Proof: In this proof, for simplicity of notation, we assume that the m_0 p-values P_1, ..., P_{m_0} corresponding to the true null hypotheses are already sorted.

Let 0 < β < 1 be specified, and let {P_{v:m_0} > ((v:m_0)/m)β} denote the joint event {P_v > (v/m)β, P_{v+1} > ((v+1)/m)β, ..., P_{m_0} > (m_0/m)β}.

In this case, all m_1 non-true null hypotheses are rejected for every β > 0; therefore S = m_1. Assume P_0 = 0 and P_{m_0+1} = 1. Then

Var(W) = E(W²) − (EW)² = E(W²) − ((m_0/m)β)².

For E(W²), we have:

E(W²) = Σ_{v=1}^{m_0} ( v/(v+m_1) )² P( P_v ≤ ((v+m_1)/m)β, P_{v+1} > ((v+m_1+1)/m)β, ..., P_{m_0} > β ).

By applying Lemmas A.2.2 and A.2.3, we have

Σ_{v=1}^{m_0} ( v/(v+m_1) )² P( P_v ≤ ((v+m_1)/m)β, P_{v+1} > ((v+m_1+1)/m)β, ..., P_{m_0} > β )
= Σ_{v=1}^{m_0} ( v/(v+m_1) )² C(m_0, v) ( ((v+m_1)/m)β )^v P( R_(1:(m_0−v)) > ((v+m_1+1)/m)β, ..., R_((m_0−v):(m_0−v)) > β )
= Σ_{v=1}^{m_0} Σ_{j=0}^{m_0−v} ( v/(v+m_1) )² C(m_0, v) ( ((v+m_1)/m)β )^v (−1)^j C(m_0−v, j) ( ((v+m_1)/m)β )^{j−1} ( (v+m_1+j)/m ) β
= Σ_{v=1}^{m_0} Σ_{j=0}^{m_0−v} (−1)^j [ v(m_1+v+j) m_0! (m_1+v)^{v+j−3} / ( (v−1)! j! (m_0−v−j)! m^{v+j} ) ] β^{v+j},

where in passing from line 1 to 2 and from 2 to 3 we have used Lemma A.2.2 and Lemma A.2.3 (with first boundary ((v+m_1+1)/m)β and increment β/m), respectively.

Suppose 3 ≤ k ≤ m_0. The coefficient of β^k is

Σ_{v+j=k} (−1)^j v(m_1+v+j) m_0! (m_1+v)^{v+j−3} / ( (v−1)! j! (m_0−v−j)! m^{v+j} )
= ( m_0! (m_1+k) / ( (m_0−k)! m^k ) ) Σ_{v=0}^{k} (−1)^{k−v} ( v / ((k−v)! (v−1)!) ) (m_1+v)^{k−3}
= ( m_0! (m_1+k) / ( (m_0−k)! m^k ) ) Σ_{i=0}^{k−3} C(k−3, i) m_1^i Σ_{v=0}^{k} (−1)^{k−v} ( v / ((k−v)! (v−1)!) ) v^{k−3−i}
= ( m_0! (m_1+k) / ( (k−1)! (m_0−k)! m^k ) ) Σ_{i=0}^{k−3} C(k−3, i) m_1^i Σ_{v=1}^{k} (−1)^{k−v} C(k−1, v−1) v^{k−2−i}
= 0,

since each inner sum vanishes by Lemma A.2.4 (substitute v = u + 1 and note that k − 2 − i ≤ k − 2).

There is only one term involving β to the first power, corresponding to v = 1, j = 0. Its coefficient is

(−1)^j v(m_1+v+j) m_0! (m_1+v)^{v+j−3} / ( (v−1)! j! (m_0−v−j)! m^{v+j} ) |_{v=1, j=0}
= ( m_0!/(m_0−1)! ) (m_1+1)^{−1} / m = m_0/(m(m_1+1)) = m_0/(m(m−m_0+1)).

The coefficient of β², combined with the coefficient −m_0²/m² of β² coming from the term −(EW)², is

Σ_{v+j=2} (−1)^j v(m_1+v+j) m_0! (m_1+v)^{v+j−3} / ( (v−1)! j! (m_0−v−j)! m^{v+j} ) − m_0²/m²
= ( (m_1+2) m_0! / (m² (m_0−2)!) ) Σ_{v+j=2} (−1)^j ( v/((v−1)! j!) ) (m_1+v)^{−1} − m_0²/m²    (v = 0, 1, 2)
= ( (m_1+2) m_0(m_0−1)/m² ) ( −1/(m_1+1) + 2/(m_1+2) ) − m_0²/m²
= −(m_1+2) m_0(m_0−1)/(m²(m_1+1)) + 2 m_0(m_0−1)/m² − m_0²/m²
= −m_0/(m(m_1+1)) = −m_0/(m(m−m_0+1)).

Therefore, we have proved that

Var(W) = ( m_0/(m(m_1+1)) ) β − ( m_0/(m(m−m_0+1)) ) β² = ( m_0/(m(m−m_0+1)) ) β(1−β).

QED.
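Lemma A.2.6's closed form can be corroborated by simulation (an illustrative check with arbitrary m_0, m_1, β, not part of the proof): with all alternative p-values equal to 0, the alternatives are always rejected, and the BH false discovery proportion should have mean (m_0/m)β and the stated variance.

```python
import random

random.seed(13)

def bh_fdp_alts_zero(m0, m1, beta):
    # FDP of step-up BH when all m1 alternative p-values equal 0:
    # the alternatives occupy ranks 1..m1 and are always rejected; the null
    # p-value of rank m1 + v is within the step-up boundary iff it is
    # <= ((m1 + v)/m) * beta
    m = m0 + m1
    p = sorted(random.random() for _ in range(m0))
    v_rej = 0
    for v in range(m0, 0, -1):               # largest v that clears the boundary
        if p[v - 1] <= (m1 + v) / m * beta:
            v_rej = v
            break
    return v_rej / (m1 + v_rej)

m0, m1, beta = 5, 5, 0.2
m = m0 + m1
nsim = 200000
vals = [bh_fdp_alts_zero(m0, m1, beta) for _ in range(nsim)]
mean = sum(vals) / nsim
var = sum((x - mean) ** 2 for x in vals) / nsim
expected_mean = m0 / m * beta                                 # = 0.1 here
expected_var = m0 * beta * (1 - beta) / (m * (m - m0 + 1))    # lemma's value
```

Replacing the zero alternatives with uniform draws moves the simulated variance toward the upper bound of Lemma A.2.7.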

Lemma A.2.7. If all Q_i's are uniformly distributed, with the other assumptions in Theorem 2.4.6 unchanged, we have

Var(W) ≤ (m_0/m) β(1 − β).

Proof: Similar to the proof of Lemma A.2.6. Omitted. QED.

A.2.3 Proof of Theorem 2.4.6

Under the conditions of Theorem 2.4.6, we have 0 ≤ Q_i ≤ U for all Q_i, by Lemma A.2.5. Lemma A.2.1 then shows that

Var(W(P, 0)) ≤ Var(W(P, Q)) ≤ Var(W(P, U)),

where P, Q, U, and 0 are, respectively, the vector of p-values for the true null hypotheses, the vector of p-values for the non-true null hypotheses, a vector of uniformly distributed random variables, and the zero vector. We can then use Lemmas A.2.6 and A.2.7 to arrive at our conclusion. QED.

Appendix B

Proof of Theorem in Chapter 3

B.1 Lemmas

Lemma B.1.1. Assume that X_1 ~ χ²_{n_1} and X_2 ~ χ²_{n_2} are independent random variables, and that n_1 and n_2 are relatively large. Let

G_i = ( X_i/(X_1 + X_2) ) / ( n_i/(n_1 + n_2) ).

Then

G_i → 1

in distribution. In particular,

E{G_i} → 1,    (B.1.1)

E{G_i²} → 1,    (B.1.2)

for i = 1, 2.

Proof. Let X = n1 W 2, where W N(0, 1). Since EX = n ,Var(X ) = 2n , 1 i=1 i i ∼iid 1 1 1 1 we can approximateP X1 by

d X1 = n1 + √2n1Z1 + o(√n1).

92 Similar result for X2 leads to:

d X2 = n2 + √2n2Z2 + o(√n2),

where $Z_1$ and $Z_2$ are independent standard normal random variables and the notation $\stackrel{d}{=}$ denotes equality in distribution. Therefore,

$$G_i=\frac{X_i}{X_1+X_2}\Big/\frac{n_i}{n_1+n_2}\stackrel{d}{=}\frac{n_i+\sqrt{2n_i}\,Z_i+o(\sqrt{n_i})}{n_1+n_2+\sqrt{2n_1}\,Z_1+\sqrt{2n_2}\,Z_2+o(\sqrt{n_1})+o(\sqrt{n_2})}\Big/\frac{n_i}{n_1+n_2}$$

$$=\frac{1+\dfrac{\sqrt{2n_i}\,Z_i+o(\sqrt{n_i})}{n_i}}{1+\dfrac{\sqrt{2n_1}\,Z_1+\sqrt{2n_2}\,Z_2+o(\sqrt{n_1})+o(\sqrt{n_2})}{n_1+n_2}}$$

$\stackrel{d}{\approx}1$, as $n_i\to\infty$ for $i=1,2$. QED
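Lemma B.1.1 is easy to illustrate by Monte Carlo (an illustrative sketch assuming NumPy; the sample sizes and tolerances are arbitrary choices, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, reps = 500, 800, 200_000

# G_i = (X_i / (X1 + X2)) / (n_i / (n1 + n2)) with X_i ~ chi^2_{n_i}
X1 = rng.chisquare(n1, reps)
X2 = rng.chisquare(n2, reps)
G1 = (X1 / (X1 + X2)) / (n1 / (n1 + n2))
G2 = (X2 / (X1 + X2)) / (n2 / (n1 + n2))

# Both first and second moments should be close to 1, as in (B.1.1)-(B.1.2)
assert abs(G1.mean() - 1) < 0.01 and abs((G1**2).mean() - 1) < 0.02
assert abs(G2.mean() - 1) < 0.01 and abs((G2**2).mean() - 1) < 0.02
```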

Now suppose that $X_1, X_2, S_1, S_2$ are four independent random variables such that $X_i\sim\chi^2_{n_i}$ and $S_i\sim\chi^2_{\nu_i}/\nu_i$ for $i=1,2$. Let $n=n_1+n_2$ and
$$Y=\frac{X_1+X_2}{X_1/S_1+X_2/S_2}.\qquad\text{(B.1.3)}$$

Lemma B.1.2. Assume $n_i>\nu_i$ for $i=1,2$. Under the conditions of Lemma B.1.1, $Y\sim\chi^2_\nu/\nu$ approximately, with degrees of freedom $\nu$ which can be estimated by formula (3.4.19), i.e., by
$$\nu=\frac{n^2\nu_1\nu_2}{n_2^2\nu_1+n_1^2\nu_2}.$$

Proof. Clearly, since $ES_i=1$ and $\mathrm{Var}(S_i)=2/\nu_i$, we have

$$S_i=\frac{\sum_{k=1}^{\nu_i}W_{ik}^2}{\nu_i}\stackrel{d}{=}1+\sqrt{2/\nu_i}\,Z_i+o\Big(\frac{1}{\sqrt{\nu_i}}\Big),$$

for sequences of i.i.d. standard normal random variables $W_{ik}$, $k=1,2,\dots,\nu_i$, $i=1,2$.

Expanding $Y=f(S_1,S_2)=\dfrac{X_1+X_2}{X_1/S_1+X_2/S_2}$ around $(1,1)$, we have:

$$Y=f(S_1,S_2)=1+\sqrt{\frac{2}{\nu_1}}\,Z_1\,\frac{X_1}{X_1+X_2}+\sqrt{\frac{2}{\nu_2}}\,Z_2\,\frac{X_2}{X_1+X_2}+o\Big(\frac{1}{\nu_1}+\frac{1}{\nu_2}\Big).$$

Conditioning on $X_1, X_2$, it is easy to find that $EY\to 1$. Now

$$\begin{aligned}
\mathrm{Var}(Y)&=\mathrm{Var}(E(Y\mid X_1,X_2))+E(\mathrm{Var}(Y\mid X_1,X_2))\\
&=E(\mathrm{Var}(Y\mid X_1,X_2))+o\Big(\frac{1}{\nu_1}+\frac{1}{\nu_2}\Big)\\
&\approx E\left\{\mathrm{Var}\Big(\sqrt{\frac{2}{\nu_1}}\,Z_1\,\frac{X_1}{X_1+X_2}+\sqrt{\frac{2}{\nu_2}}\,Z_2\,\frac{X_2}{X_1+X_2}\;\Big|\;X_1,X_2\Big)\right\}\\
&=E\left\{\frac{2}{\nu_1}\Big(\frac{X_1}{X_1+X_2}\Big)^2+\frac{2}{\nu_2}\Big(\frac{X_2}{X_1+X_2}\Big)^2\right\}\\
&=\frac{2}{\nu_1}\,E\Big(\frac{X_1}{X_1+X_2}\Big)^2+\frac{2}{\nu_2}\,E\Big(\frac{X_2}{X_1+X_2}\Big)^2\\
&\approx\frac{2}{\nu_1}\Big(\frac{n_1}{n_1+n_2}\Big)^2+\frac{2}{\nu_2}\Big(\frac{n_2}{n_1+n_2}\Big)^2\qquad\text{(by Lemma B.1.1)}\\
&=\frac{2n_1^2\nu_2+2n_2^2\nu_1}{(n_1+n_2)^2\,\nu_1\nu_2}.
\end{aligned}$$

Thus, comparing $\mathrm{Var}(\chi^2_\nu/\nu)=2/\nu$ with $\mathrm{Var}(Y)=\dfrac{2n_1^2\nu_2+2n_2^2\nu_1}{(n_1+n_2)^2\nu_1\nu_2}$, we obtain the estimate of $\nu$ in formula (3.4.19). QED
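The variance matching that yields (3.4.19) can be checked numerically (an illustrative sketch assuming NumPy; the parameter values are borrowed from Figure B.1, and the tolerances are loose since the matching is only first-order):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, nu1, nu2 = 800, 1000, 120, 300
n = n1 + n2
reps = 200_000

# Y = (X1 + X2) / (X1/S1 + X2/S2) with X_i ~ chi^2_{n_i}, S_i ~ chi^2_{nu_i}/nu_i
X1 = rng.chisquare(n1, reps)
X2 = rng.chisquare(n2, reps)
S1 = rng.chisquare(nu1, reps) / nu1
S2 = rng.chisquare(nu2, reps) / nu2
Y = (X1 + X2) / (X1 / S1 + X2 / S2)

# Formula (3.4.19): nu chosen so that 2/nu matches Var(Y)
nu = n**2 * nu1 * nu2 / (n1**2 * nu2 + n2**2 * nu1)

assert abs(Y.mean() - 1) < 0.02
assert abs(Y.var() / (2 / nu) - 1) < 0.1
```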

Our simulation results in Figures B.1 and B.2 and Table B.1 show that the approximation of $Y$ by $\chi^2_\nu/\nu$, with $\nu$ estimated by formula (3.4.19), is quite accurate. In Figure B.1, the density of $\chi^2_\nu/\nu$, with $\nu$ estimated by (3.4.19), is drawn as a dashed red curve, and the density of the simulated $Y$ as a solid black curve; the two curves agree well with each other. Figure B.2 compares the degrees of freedom estimated by formula (3.4.19) (dotted line) with the degrees of freedom obtained by simulation (solid line, also called the true degrees of freedom). The simulated degrees of freedom were

obtained by the following steps. First, draw samples of $X_i\sim\chi^2_{n_i}$ and $S_i\sim\chi^2_{\nu_i}/\nu_i$, $i=1,2$, independently, and compute $Y$ to obtain a sample of $Y$. Then its pdf $p(y)$ is estimated, its peak value $m=\max_y p(y)$ is computed, and $\nu$ is calculated from the model formula $\nu=4\pi m^2$. This relation comes from a theoretical calculation of the relationship between the degrees of freedom $\nu$ and the peak value of the density of a $\chi^2_\nu/\nu$ random variable: for large $\nu$, $\chi^2_\nu/\nu$ is approximately $N(1,2/\nu)$, whose density has peak value $\sqrt{\nu/(4\pi)}$. Our simulations show that the two estimates agree closely with each other; see also Table B.1.

Table B.1: Comparison of the simulated degrees of freedom $\nu=4\pi m^2$ (upper entry, via simulation) with the approximated degrees of freedom $\nu$ (lower entry, by formula (3.4.19)) for different combinations of the sample sizes $n_1$, $n_2$ and degrees of freedom $\nu_1$, $\nu_2$.

ν1\ν2    100     200     300     400     500     600     700     800
 100   192.7   232.6   242.4   253.6   252.8   258.6   264     268.5
       192.3   227.3   241.9   250     255.1   258.6   261.2   263.2
 200   289.5   379     422.9   449.6   471.8   468.7   497.2   500
       294.1   384.6   428.6   454.5   471.7   483.9   493     500
 300   354.8   496.2   580.1   625.4   643.1   677.4   705.3   732
       357.1   500     576.9   625     657.9   681.8   700     714.3
 400   407.3   570.9   697.8   763.9   808.8   844.7   889.8   896.7
       400     588.2   697.7   769.2   819.7   857.1   886.1   909.1
 500   422.2   672.3   789.9   879.5   941     1013.1  1038.9  1098.7
       431     657.9   797.9   892.9   961.5   1013.5  1054.2  1087
 600   456.1   721.3   868.2   1011.5  1062.6  1165.2  1180.2  1256.9
       454.5   714.3   882.4   1000    1087    1153.8  1206.9  1250
 700   471.6   768.9   943.5   1095.5  1204.6  1256.3  1355.2  1409.4
       473     760.9   954.5   1093.8  1198.6  1280.5  1346.2  1400
 800   486.7   793.8   1040    1152.3  1267.9  1433.2  1451.3  1534.3
       487.8   800     1016.9  1176.5  1298.7  1395.3  1473.7  1538.5
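The simulated-degrees-of-freedom recipe can be sketched as follows (an illustrative sketch assuming NumPy and SciPy; parameter values are taken from Figure B.1, and the tolerance is loose because the estimated density peak is noisy):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n1, n2, nu1, nu2 = 800, 1000, 120, 300
n = n1 + n2
reps = 50_000

# Step 1: draw X_i ~ chi^2_{n_i}, S_i ~ chi^2_{nu_i}/nu_i and form Y
X1 = rng.chisquare(n1, reps)
X2 = rng.chisquare(n2, reps)
S1 = rng.chisquare(nu1, reps) / nu1
S2 = rng.chisquare(nu2, reps) / nu2
Y = (X1 + X2) / (X1 / S1 + X2 / S2)

# Step 2: estimate the pdf of Y and its peak value m = max_y p(y)
kde = gaussian_kde(Y)
grid = np.linspace(Y.min(), Y.max(), 400)
m = kde(grid).max()

# Step 3: nu_hat = 4*pi*m^2, since chi^2_nu/nu is approx N(1, 2/nu)
# for large nu, with peak density sqrt(nu/(4*pi))
nu_peak = 4 * np.pi * m**2

# Compare with formula (3.4.19)
nu_formula = n**2 * nu1 * nu2 / (n1**2 * nu2 + n2**2 * nu1)
assert abs(nu_peak / nu_formula - 1) < 0.1
```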

To prove our theorems, we also need the following definition and lemmas.


Figure B.1: The true density of $Y$ (solid black) and the density of $\chi^2_\nu/\nu$ (dashed red), with $\nu$ computed by formula (3.4.19). The density curves of $\chi^2_{\nu_i}/\nu_i$ are also added to the plot. Here $n_1=800$, $n_2=1000$, $\nu_1=120$, $\nu_2=300$.


Figure B.2: Comparison of the degrees of freedom $\nu$ estimated by formula (3.4.19) (dotted green lines) and the degrees of freedom $\nu$ obtained from simulated data via $\nu=4\pi m^2$ (solid red lines), for the combinations $\nu_1=100,200,\dots,800$ (x-axis) and $\nu_2=100,200,\dots,800$ (from the bottom curve up). Here $n_1=1000$ and $n_2=1500$.

Definition B.1.3. (Tube) A tube $T$ with radius $r$ of a manifold $\mathcal{M}(t)=m^n(t):=\{(m_1(t),m_2(t),\dots,m_n(t)),\ t\in\mathcal{T}\}$ embedded in an $n$-dimensional space $X$ (either the Euclidean space $\mathbb{R}^n$ or the spherical surface $S^{n-1}\subset\mathbb{R}^n$) is defined to be the set of points $x\in X$ such that $d(x,\mathcal{M}(t))\le r$, that is,
$$T=T(r)=\{x\in X:\ d(x,y)\le r\ \text{for at least one}\ y\in\mathcal{M}(t)\},$$
where $d$ is the usual Euclidean distance. The manifold is one dimensional because the domain $\mathcal{T}$ of $t$ is.

Since the distance $d$ in $\mathbb{R}^n$ can be equivalently defined by the inner product,
$$d(x,y)^2=\langle x-y,\,x-y\rangle,$$
where $x,y\in\mathbb{R}^n$, the tube can also be defined via the inner product. When $x,y\in S^{n-1}$, the relation between distance and inner product reduces to
$$d(x,y)^2=2-2\langle x,y\rangle.$$

The volume of a tube, $\mathrm{vol}(T)$, can be partitioned into two parts. The first part is the (main) tubular area in the middle. The second part is the boundary correction part, which arises when the manifold has boundaries. See the illustration in Figure B.3. The problem of calculating the volume of a tube was first posed and solved by Hotelling (1939) for a 1-dimensional manifold. Weyl (1939) extended the results to higher dimensional manifolds, that is, when $\mathcal{M}(t)$ is a surface. Other publications on the subject include Knowles and Siegmund (1989), Johansen and Johnstone (1990), Naiman (1987, 1990) and Sun (1993). In the simplest case, when $X=\mathbb{R}^n$ and the manifold is a curve, we have:

Lemma B.1.4. (Tube formula 1) [ref: Weyl (1939), Hotelling (1939), Naiman (1990)]
$$\mathrm{Vol}(T)=\kappa_0 V_{n-1}r^{n-1}+\frac{l_0}{2}\,V_n r^n,$$
where $\kappa_0$ is the length of the manifold, $l_0$ is the number of end-points (often $l_0=2$, when the domain $\mathcal{T}=[a,b]$ and $\mathcal{M}(a)\ne\mathcal{M}(b)$), and $V_k=\pi^{k/2}/\Gamma(1+k/2)$ is the volume of the $k$-dimensional unit ball.

Figure B.3: Tubes with 2 endpoints around a 1-dimensional manifold embedded in $\mathbb{R}^2$.

QED

The volume of the tube in Figure B.3 is thus
$$V(T)=(\text{length of manifold})\times 2r+\pi r^2.$$
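This area formula is easy to verify by Monte Carlo in $\mathbb{R}^2$ (an illustrative sketch assuming NumPy; a straight segment is used so that the tube area is exactly $\kappa_0\cdot 2r+\pi r^2$, and the segment length and radius are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
L, r = 2.0, 0.25          # segment of length L on the x-axis; tube radius r
reps = 2_000_000

# Sample uniformly from a box that contains the tube
box_x = (-r, L + r)
box_y = (-r, r)
x = rng.uniform(*box_x, reps)
y = rng.uniform(*box_y, reps)

# Distance from (x, y) to the segment {(t, 0): 0 <= t <= L}
t = np.clip(x, 0.0, L)
dist = np.hypot(x - t, y)

box_area = (box_x[1] - box_x[0]) * (box_y[1] - box_y[0])
area_mc = box_area * np.mean(dist <= r)

# Tube formula: main band 2rL plus the two half-disk end corrections
area_tube = L * 2 * r + np.pi * r**2
assert abs(area_mc - area_tube) < 0.005
```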

When $X=S^{n-1}$ is the surface of the unit sphere in $\mathbb{R}^n$ and the manifold is a curve, we have:

Lemma B.1.5. (Tube formula 2) [ref: Sun and Loader (1994), Naiman (1990)]
$$\mathrm{Vol}(T)=\frac{\kappa_0 A_n}{2\pi}\,P\big(\beta_{1,(n-2)/2}\ge w^2\big)+\frac{E A_n}{4}\,P\big(\beta_{1/2,(n-1)/2}\ge w^2\big),\qquad\text{(B.1.4)}$$
where $\beta_{a,b}$ denotes a random variable following a $\beta$ distribution with parameters $a$ and $b$, $A_n=2\pi^{n/2}/\Gamma(n/2)$ is the surface area of the unit sphere in $\mathbb{R}^n$, and $w=1-r^2/2$. As before, $\kappa_0$ is the length of the manifold, and $E$ is the Euler number.

QED

This theorem was provided by Knowles and Siegmund (1989) and extended by Sun and Loader (1994) [2.5, p. 1331] to the case when the dimension $d$ of the manifold satisfies $d\ge 2$:
$$Pr\Big\{\sup_{x\in\mathcal{X}}\langle T(x),U\rangle\ge w\Big\}=\kappa_0 J_0(w)+\frac{\xi_0}{2}\,J_1(w)+\frac{\kappa_2+\xi_1+m_0}{2\pi}\,J_2(w)+O\big((1-w^2)^{(n-d+2)/2}\big),\qquad\text{(B.1.5)}$$
where
$$J_e(w)=\frac{A_{n-d+e-1}}{A_n}\int_w^1(1-u^2)^{(n-d+e-3)/2}\,u^{d-e}\,du.$$
It is easy to see the correspondence between formulas (B.1.4) and (B.1.5): apart from the constant $A_n$, by which the volume of the tube must be divided to obtain the probability, each $J_e$ is related to a probability associated with a proper $\beta$ random variable, and $E=\xi_0$. Formula (B.1.4) takes only two of the terms in formula (B.1.5).
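As an illustrative check of (B.1.4) (a sketch assuming NumPy, not from the dissertation), take a great-circle arc of length $\kappa_0$ on $S^2$, so $n=3$ and $E=2$. Dividing (B.1.4) by $A_n$ gives $Pr(\sup_t\langle\mathcal{M}(t),U\rangle\ge c)=\frac{\kappa_0}{2\pi}\sqrt{1-c^2}+\frac{1}{2}(1-c)$, using $P(\beta_{1/2,1}\ge c^2)=1-c$:

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 2_000_000
kappa0 = 1.0   # arc M(t) = (cos t, sin t, 0), t in [0, kappa0]
c = 0.95       # threshold; the tube must not self-overlap

# Uniform points on S^2
U = rng.normal(size=(reps, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)

# sup_t <M(t), U> = sup over t in [0, kappa0] of R cos(t - phi),
# where R = sqrt(u1^2 + u2^2), phi = atan2(u2, u1)
R = np.hypot(U[:, 0], U[:, 1])
phi = np.arctan2(U[:, 1], U[:, 0]) % (2 * np.pi)
inside = phi <= kappa0
sup_val = np.where(inside, R, R * np.maximum(np.cos(phi), np.cos(phi - kappa0)))
p_mc = np.mean(sup_val >= c)

# Tube formula prediction for n = 3, E = 2 endpoints
p_tube = kappa0 / (2 * np.pi) * np.sqrt(1 - c**2) + 0.5 * (1 - c)
assert abs(p_mc - p_tube) < 0.003
```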

B.2 Proof of Theorem 3.4.2

Suppose $\sigma_1^2$ and $\sigma_2^2$ are known. Let $\mathcal{M}(t)=\begin{pmatrix}u_1(t)\\ u_2(t)\end{pmatrix}\in S^{n-1}\subseteq\mathbb{R}^n$ for $n=n_1+n_2$ and $t\in\mathcal{T}$, as before. Let $\xi=\begin{pmatrix}\xi_1\\ \xi_2\end{pmatrix}$ and $U:=\xi/\|\xi\|$. Then $U$ is uniformly distributed over $S^{n-1}$, independent of $\|\xi\|$. Since $\sigma_1^2$ and $\sigma_2^2$ are known, $\|\xi\|^2$ follows a $\chi^2_n$ distribution. Conditioning on the value of $\|\xi\|$ and using the fact that $\varepsilon_1$ and $\varepsilon_2$ are assumed independent, we have

$$\begin{aligned}
Pr(T>t_0)&=Pr\Big(\sup_{t\in\mathcal{T}}Z(t)\ge t_0\Big)=Pr\Big(\sup_{t\in\mathcal{T}}\langle\mathcal{M}(t),\xi\rangle\ge t_0\Big)\\
&=Pr\Big(\sup_{t\in\mathcal{T}}\Big\langle\mathcal{M}(t),\frac{\xi}{\|\xi\|}\Big\rangle\ge\frac{t_0}{\|\xi\|}\Big)\\
&=\int_{y\ge t_0}Pr\Big(\sup_{t\in\mathcal{T}}\langle\mathcal{M}(t),U\rangle\ge\frac{t_0}{y}\;\Big|\;\|\xi\|=y\Big)\,f_{\|\xi\|}(y)\,dy\\
&=\int_{y\ge t_0}Pr\Big(\sup_{t\in\mathcal{T}}\langle\mathcal{M}(t),U\rangle\ge\frac{t_0}{y}\Big)\,f_{\|\xi\|}(y)\,dy,
\end{aligned}$$

where $f_{\|\xi\|}(y)$ denotes the pdf of $\|\xi\|$, whose square follows a $\chi^2_n$ distribution. The last equation holds because $\xi/\|\xi\|$ and $\|\xi\|$ are independent. When $t_0$ is large, the following tube formula [cf. formula (B.1.4) in Lemma B.1.5] can be plugged into the last equation:

$$Pr\Big(\sup_{t\in\mathcal{T}}|\langle\mathcal{M}(t),U\rangle|\ge c\Big)\approx 2\left[\frac{\kappa_0}{2\pi}\,(1-c^2)^{n/2-1}+\frac{E}{4}\,Pr\big(\beta_{\{1/2,(n-1)/2\}}\ge c^2\big)\right],$$

where $\beta_{\{1/2,(n-1)/2\}}$ denotes a Beta random variable with parameters $1/2$ and $(n-1)/2$. The factor 2 corresponds to the two curves satisfying the probability condition: one is $\mathcal{M}(t)$, the other $-\mathcal{M}(t)$. This gives formula (3.4.20).

Now suppose both $\sigma_i^2$ are unknown and are estimated as in formula (3.4.5). Let

$$X_i=\frac{\varepsilon_i'\varepsilon_i}{\sigma_i^2},\qquad S_i=\frac{\hat\sigma_i^2}{\sigma_i^2},\qquad i=1,2,\qquad\text{(B.2.1)}$$

$$Y=\frac{X_1}{S_1}+\frac{X_2}{S_2},\qquad X=\frac{X_1+X_2}{X_1/S_1+X_2/S_2}=\frac{X_1+X_2}{Y}.\qquad\text{(B.2.2)}$$

Then the requirements of Lemma B.1.2 are satisfied, and $X\sim_{\text{approx}}\chi^2_\nu/\nu$, with the degrees of freedom $\nu$ estimated by formula (3.4.19). Therefore,
$$Y=\frac{X_1+X_2}{X}\to\frac{\chi^2_n}{\chi^2_\nu/\nu}\sim nF_{n,\nu}.$$
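The $nF_{n,\nu}$ approximation can be checked by simulation (an illustrative sketch assuming NumPy and SciPy; the sizes are hypothetical, and a single upper quantile is compared with a loose tolerance):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
n1, n2, nu1, nu2 = 800, 1000, 120, 300
n = n1 + n2
nu = n**2 * nu1 * nu2 / (n1**2 * nu2 + n2**2 * nu1)  # formula (3.4.19)

reps = 200_000
X1 = rng.chisquare(n1, reps)
X2 = rng.chisquare(n2, reps)
S1 = rng.chisquare(nu1, reps) / nu1
S2 = rng.chisquare(nu2, reps) / nu2
Ysq = X1 / S1 + X2 / S2   # approx chi^2_n / (chi^2_nu / nu) ~ n F_{n, nu}

# Compare an upper quantile of the simulated distribution with n * F quantile
q_sim = np.quantile(Ysq, 0.95)
q_f = n * f.ppf(0.95, n, nu)
assert abs(q_sim / q_f - 1) < 0.03
```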

Now let

$$\hat{\mathcal{M}}(t)=\frac{\big(\hat\sigma_1 l_1(t),\,\hat\sigma_2 l_2(t)\big)}{\sqrt{\hat\sigma_1^2\|l_1(t)\|^2+\hat\sigma_2^2\|l_2(t)\|^2}}\in S^{n-1},$$
$$\hat\xi=\Big(\frac{\varepsilon_1}{\hat\sigma_1},\,\frac{\varepsilon_2}{\hat\sigma_2}\Big)\in\mathbb{R}^n,\qquad U=\frac{\hat\xi}{\|\hat\xi\|},\qquad \hat Z(t)=\langle\hat{\mathcal{M}}(t),\hat\xi\rangle.$$

Then

$$\begin{aligned}
Pr(T>t_0)&=Pr\Big(\sup_{t\in\mathcal{T}}\hat Z(t)\ge t_0\Big)=Pr\Big(\sup_{t\in\mathcal{T}}\langle\hat{\mathcal{M}}(t),\hat\xi\rangle\ge t_0\Big)\\
&=Pr\Big(\sup_{t\in\mathcal{T}}\Big\langle\hat{\mathcal{M}}(t),\frac{\hat\xi}{\|\hat\xi\|}\Big\rangle\ge\frac{t_0}{\|\hat\xi\|}\Big)\\
&=\int_{y\ge t_0}Pr\Big(\sup_{t\in\mathcal{T}}\langle\hat{\mathcal{M}}(t),U\rangle\ge\frac{t_0}{y}\;\Big|\;\|\hat\xi\|=y\Big)\,f_{\|\hat\xi\|}(y)\,dy\\
&=\int_{y\ge t_0}Pr\Big(\sup_{t\in\mathcal{T}}\langle\hat{\mathcal{M}}(t),U\rangle\ge\frac{t_0}{y}\Big)\,f_{\|\hat\xi\|}(y)\,dy,
\end{aligned}$$

where $\|\hat\xi\|^2=(X_1+X_2)/Y\sim_{\text{approx}}\chi^2_n/(\chi^2_\nu/\nu)$, with $Y$ as defined in (B.1.3). Treating the $\sigma_i^2$ as known values in $\hat{\mathcal{M}}(t)$ in order to estimate $\kappa_0$, we can use tube formula (B.1.4) for the probability inside the integral. After some calculation, we obtain result (3.4.21).

QED

Appendix C

Software ctest

We have written an R routine, ctest, implementing the tests of equality of curves proposed in Chapter 3. Its parameters include:

Data: Specified by the vectors XX1, YY, XX2 and ZZ. XX1 and XX2 are the x vectors, and YY and ZZ are the corresponding response vectors; thus XX1 and YY specify one curve, and XX2 and ZZ specify the other. If XX2 and ZZ are missing, the test is H0 : f(t) = 0.

Equal.var: Logical value specifying whether equal variances are assumed. Default=T.

Plotit: Logical value indicating whether ctest should display the scatter plots and smoothed curves. Plotting is useful for selecting the window width hh below. Default=F.

hh: The common window width for smoothing the data, selected by checking the plot. Default is 0.5.

Conf.level: The α value specifying the type I error level. Default=.05.

nn: Number of points used to smooth the curves. The points are equally spaced over the domain spanned by the two data sets. Default=100.

Kernel: The kernel function used for smoothing. Right now, only trio is implemented.

Usage:

n1=50; n2=55
x1=seq(0, 1, length=n1); x2=seq(0, 1, length=n2)
y1=x1*(1-x1)+rnorm(n1, 0, 0.02)
y2=x2*(1-x2)+rnorm(n2, 0, 0.01)
ctest(x1, y1, x2, y2, equal.var=F, plotit=T)  # explore the best window width
ctest(x1, y1, x2, y2, equal.var=F, hh=0.1, plotit=T)

Output:

====== Curve Test Procedures ======
The p-value to test H0: f1(x)=f2(x) is 0.5283728
With test statistics equals 2.216465,
Estimated degree of freedom is 82.36816,
Unequal variances assumed.
Estimated sigma^2 are 0.0002896806 and 0.0001126348.
======

Bibliography

Abramovich, F., Benjamini, Y., Donoho, D., and Johnstone, I. (2000). Adapting to unknown sparsity by controlling the false discovery rate. Technical report, Dept. of Statistics, Stanford Univ.

Adler, R. J. (2000). On excursion sets, tube formulas and maxima of random fields. Ann. Appl. Probab., 10(1):1–74.

ATSDR (1988). The nature and extent of lead poisoning in children in the United States: a report to Congress. Technical report, Agency for Toxic Substances and Disease Registry, Atlanta: US Department of Health and Human Services, Public Health Service.

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Statist. Soc. B, (57):289–300.

Benjamini, Y. and Liu, W. (1999). A step-down multiple hypothesis testing procedure that controls the false discovery rate under independence. Journal Stat. Planning Inference, 82(1-2):163–170.

Benjamini, Y. and Yekutieli, D. (2001). The control of false discovery rate in multiple testing under dependency. The Annals of Statistics, (29):1165–1188.

Besse, P. and Ramsay, J. O. (1986). Principal components analysis of sampled functions. Psychometrika, (51):285–311.

Chakravarti, I. M., Laha, R. G., and Roy, J. (1967). Handbook of Methods of Applied Statistics, Volume I. John Wiley and Sons.

Cleveland, W. and Devlin, S. (1988). Locally weighted regression: an approach to regression analysis by local fitting. J. Amer. Statist. Assoc., (83):596–610.

Donoho, D. and Jin, J. (2004). Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data. Technical Report 2004-6, Dept. of Statistics, Stanford Univ.

Fan, J. and Lin, S. (1998). Test of significance when data are curves. J. of the American Stat. Assoc., (93):1007–1021.

Faraway, J. J. and Sun, J. (1995). Simultaneous confidence bands for linear regression with heteroscedastic errors. Journal of the American Statistical Association, 90(431):1094–1098.

Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist., 32(3):1035–1061.

Genovese, C. R. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. J. Royal Statist. Soc. B, (64):499–518.

Golub, T. R., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, (286):531–537.

Hotelling, H. (1939). Tubes and spheres in n-spaces, and a class of statistical problems. American Journal of Mathematics, (61):440–460.

Hsu, J. C. (1999). Multiple Comparisons, Theory and Methods. Chapman and Hall.

Isotani, T., Lehmann, D., Pascual-Marqui, R., Kochi, K., Wackermann, J., Saito, N., Yagyu, T., Kinoshita, T., and Sasada, K. (2001). EEG source localization and global dimensional complexity in high- and low-hypnotizable subjects: a pilot study. Neuropsychobiology, (44):192–198.

James, G. and Hastie, T. (2001). Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, Series B, (63):533–550.

James, W. and Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pages 361–380.

Johansen, S. and Johnstone, I. (1990). Hotelling’s theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis. The Annals of Statistics, (18):652–684.

Knowles, M. and Siegmund, D. (1989). On Hotelling's geometric approach to testing for a nonlinear parameter in regression. International Statistical Review, (57):205–220.

Krishnaiah, P. R. (1979). Some Developments on Simultaneous Test Procedures, volume 2. New York: Academic Press.

Lemon, W., Liyanarachchi, S., and You, M. (2003). A high performance test of differential gene expression for oligonucleotide arrays. Genome Biology, 4(10):R67.

Leurgans, S. E., Moyeed, R. A., and Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, Series B, (55):725–740.

Loader, C. (1996). Local likelihood density estimation. The Annals of Statistics, (24):1602–1618.

Miller, R. G., Jr. (1977). Developments in multiple comparisons. Journal of the American Statistical Association, 12:179–188.

Morrison, D. F. (1976). Multivariate Statistical Methods. New York: McGraw-Hill, 2nd edition.

Naiman, D. Q. (1987). Simultaneous confidence bounds in multiple regression using predictor variable constraints. Journal of the American Statistical Association, (82):214–219.

Naiman, D. Q. (1990). On volumes of tubular neighborhoods of spherical polyhedra and statistical inference. The Annals of Statistics, (18):685–716.

Owen, A. B. (2004). Variance of the number of false discoveries. Technical report, Department of Statistics, Stanford University.

Pacifico, P., Genovese, C. R., Verdinelli, I., and Wasserman, L. (2004). False discovery rates for random fields. J. of the American Stat. Ass., 99(468):1002–1014.

Parzen, E. (1961). An approach to time series analysis. Ann. Math. Statist., (32):951– 989.

Ramsay, J. and Dalzell, C. (1991). Some tools for functional data analysis. J. of the Royal Statistical Society Series B, (53):539–572.

Robbins, N., Zhang, Z., Sun, J., Ketterer, M., Lalumandier, J., and Shulze, R. (2005). Exposure and pediatric uptake of lead in cleveland, ohio during the era of leaded gasoline (1936-1993). Environmental Health Perspectives. In progress.

Roy, S. N. (1953). On a heuristic method of test construction and its uses in multivariate analysis. Annals of Mathematical Statistics, (24):220–239.

Roy, S. N. and Bose, R. C. (1953). Simultaneous confidence interval estimation. Annals of Mathematical Statistics, (2):415–536.

Sabatti, C., Service, S., and Freimer, N. (2003). False discovery rate in linkage and association genome screens for complex disorders. Genetics, (164):829–833.

Sarkar, S. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist., 30(1):239–257.

Sarkar, T. K. (1969). Some lower bounds of reliability. Technical Report 124, Dept. Operation Research and Statistics, Stanford Univ.

Scheffe, H. (1953). A method of judging all contrasts in the analysis of variance. Biometrika, (40):87–104.

Shapiro, S. and Wilk, M. B. (1965). An analysis of variance test for normality (com- plete samples). Biometrika, (52, 3 and 4):591–611.

Somerville, P. N. (1999). Critical values for multiple testing and comparisons: one step and step down procedures. Journal of Statistical Planning and Inference, 82(1-2):129–138.

Storey, J. (2002). A direct approach to false discovery rates. J. Royal Statist. Soc. B, (64):479–498.

Storey, J. (2003). The positive false discovery rate: a bayesian interpretation and the q-value. The Annals of Statistics, (31):2013–2035.

Storey, J., Taylor, J., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. Royal Statist. Soc. B, (66):187–205.

Sun, J. (1993). Tail probabilities of the maxima of Gaussian random fields. The Annals of Probability, (21):34–71.

Sun, J. (2001). Multiple comparisons for a large number of parameters. Biometrical J., (43):627–643.

Sun, J. and Loader, C. (1994). Simultaneous confidence bands for linear regression and smoothing. The Annals of Statistics, (22):1328–1345.

Tukey, J. W. (1953). The Problem of Multiple Comparisons. Princeton University, mimeo.

Weyl, H. (1939). On the volume of tubes. Amer. J. of Mathematics, 61(2):461–472.

Worsley, K., Taylor, J., Tomaiuolo, F., and Lerch, J. (2004). Unified univariate and multivariate random field theory. NeuroImage, (23):189–195.

Zheng, Y., Johnston, D., Berwick, J., and Mayhew, J. (2001). Signal source separation in the analysis of neural activity. NeuroImage, (13):447–458.
