
Interfaces Between Bayesian and Frequentist Multiple Testing by Shih-Han Chang

Department of Statistical Science Duke University

Date: Approved:

James O. Berger, Supervisor

Surya Tokdar

David Banks

Yin Xia

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistical Science in the Graduate School of Duke University

2015

Copyright © 2015 by Shih-Han Chang
All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence

Abstract

This thesis investigates frequentist properties of Bayesian multiple testing procedures in a variety of scenarios and depicts the asymptotic behavior of Bayesian methods. Both Bayesian and frequentist approaches to multiplicity control are studied and compared, with special focus on understanding the multiplicity control behavior in situations of dependence between test statistics. Chapter 2 examines a problem of testing mutually exclusive hypotheses with dependent data. The Bayesian approach is shown to have excellent frequentist properties and is argued to be the most effective way of obtaining frequentist multiplicity control without sacrificing power. Chapter 3 further generalizes the model such that multiple signals are acceptable, and depicts the asymptotic behavior of false positive rates and the expected number of false positives. Chapter 4 considers the problem of dealing with a sequence of different trials concerning some medical or scientific issue, and discusses the possibilities for multiplicity control of the sequence. Chapter 5 addresses issues and efforts in reconciling frequentist and Bayesian approaches in sequential endpoint testing. We consider the conditional frequentist approach in sequential endpoint testing and show several examples in which Bayesian and frequentist methodologies cannot be made to match.

To my family, for their unwavering faith in me.

Contents

Abstract

List of Tables

List of Figures

List of Abbreviations and Symbols

Acknowledgements

1 Introduction
 1.1 Multiple testing
  1.1.1 Applications
  1.1.2 A review of multiplicity adjustment
  1.1.3 Family-wise error rate (FWER) control
  1.1.4 False discovery rate (FDR) control
  1.1.5 Other approaches
 1.2 Outline and Contributions of the Thesis

2 Comparison of Bayesian and frequentist multiplicity correction for testing mutually exclusive hypotheses under data dependence
 2.1 Introduction
 2.2 Frequentist Multiplicity Control
  2.2.1 An Ad hoc Procedure
  2.2.2 Likelihood Ratio Test
 2.3 A Bayesian Test
 2.4 The situation as the correlation goes to 1
 2.5 Asymptotic frequentist properties of Bayesian procedures
  2.5.1 Posterior probabilities
  2.5.2 False positive probabilities
 2.6 A Type II maximum likelihood approach
 2.7 Power analysis for the Type II ML Bayesian procedure
 2.8 Analysis as the information grows
 2.9 Conclusions
 2.10 Appendix
  2.10.1 Normal Theory
  2.10.2 Type II MLE

3 Frequentist multiplicity control of Bayesian procedures with spike-and-slab priors
 3.1 The standard Bayesian model of multiple testing
  3.1.1 Bayes factors
  3.1.2 Posterior model probabilities
 3.2 Posterior probability of the null model as n grows
 3.3 Asymptotic frequentist properties of Bayesian procedures
  3.3.1 Inclusion probabilities and the decision rule
  3.3.2 Expected number of false positives
  3.3.3 FPPs for various structures of prior probabilities
 3.4 Conclusion
 3.5 Appendix

4 Bayesian multiple testing for a sequence of trials
 4.1 Introduction
 4.2 Bayesian sequence multiplicity control
  4.2.1 Recursive formula
 4.3 HIV vaccines
 4.4 Analysis when only p-values are available

5 Reconciling frequentist and Bayesian methods in sequential endpoint testing
 5.1 Introduction
 5.2 Conditional frequentist testing
  5.2.1 Testing of two simple hypotheses
  5.2.2 Conditioning partitions for the sequential endpoint problem
  5.2.3 Conditional frequentist error probabilities for the random partition
 5.3 Assessing the possibility of Bayesian/frequentist agreement
  5.3.1 Testing two simple hypotheses
  5.3.2 Testing composite hypotheses
 5.4 Appendix
  5.4.1 Conditional frequentist analysis for the fixed partition
  5.4.2 Conditional frequentist test

Bibliography

Biography

List of Tables

1.1 Decisions and errors
5.1 Frequentist error probabilities, conditional on being in the random partition
5.2 Decisions and reported errors in sequential endpoint tests
5.3 Decisions and reported frequentist errors for two endpoints
5.4 Decisions and reported errors
5.5 Decision rules
5.6 Conditional probabilities up to a normalizing constant

List of Figures

2.1 Ratio of estimated and true posterior probability of M1 as n grows under the null model, for fixed τ, r and different ρ. Each subplot is for a different correlation and contains 200 simulations.
2.2 Estimated (red line) and true posterior probability (blue line) of M1 for different τ under the null model, for fixed n = 2000, ρ = r = 0.5.
2.3 Convergence of P(M0 | x) to its limit (0.5) under the null model. Each subplot has a different correlation and contains 50 simulations.
2.4 Comparison of the simulated FPP and its asymptotic approximation when p = 1/3, r = 0.5, ρ = 0, as n varies from 10^1 to 10^8 (x-axis on an exponential scale).
2.5 Threshold probability (p) versus the number of hypotheses (n) for fixed FPP (= 0.05) and prior probability of the null model (= 0.5).
2.6 Solution of k* − 1/2 (y-axis) versus p/((1−p)(1−r)) (x-axis).
2.7 τ̂² versus θi for fixed n = 10^4, r = 0.5, ρ = 0, and p = 1/3. Each point is the average τ̂² among 10^4 independent draws from the multivariate normal with the constraint τ² > 1.
2.8 Power versus θi for fixed n = 10^4, r = 0.5, ρ = 0, p = 1/3. Each point is the average acceptance rate of the true non-null model with respect to the θ on the x-axis.
2.9 θi versus n for fixed power 0.95 (above) and 0.75 (below), ρ = 0, p = 1/3.
3.1 Box plots of the posterior probability of the null model under the null model, where P(M0) = 0.5, kmax = 4, τ = 2. The green horizontal line is y = 0.5. For each m (number of hypotheses), 3000 iterations have been performed.
3.2 For kmax = 4, τ = 2, and various numbers of hypotheses m, box plots of the logarithm of the ratio of the true inclusion probability to the estimated inclusion probability, under the null model. For each m, 3000 iterations have been carried out. The red horizontal line is y = 0 (ratio = 1).
3.3 Box plots of empirical false positive rates (red line, green dashed line) and the theoretical false positive probabilities (black lines), when kmax = 2, the decision threshold is q = 0.05, and τ = 1.
3.4 False positive probability based on the threshold in Theorem 3.3.10. The blue curve is the numerical FPP (average FPP at given m); the green curve is 1 − (1 − α/m)^m, where α = 0.05.
3.5 Box plots of inclusion probability P(µi ≠ 0 | X) versus signal size θi. m = 100, kmax = 3, τ = 1. For each θi, 100 iterations were generated to get the box plot.
3.6 Posterior of p versus different noise ratios. m = 20, τ = 2. For each plot, kmax is indicated in the subtitle, and data x are simulated according to the true noise ratio in the top right legend.

List of Abbreviations and Symbols

Symbols

X	a vector of i.i.d. random variables.
Xi	the i-th element of X.
x	a vector of observations.
γi	an indicator of a nonzero mean.
γ	(γ1, ..., γn).
M0^i	the i-th null model.
M1^i	the i-th alternative model.
Φ(x)	the cumulative distribution function of the standard Gaussian distribution.
φ(x)	the probability density function of the standard Gaussian distribution.
α	Type-I error.
ρ	correlation.

Abbreviations

FPP	False Positive Probability.
FDR	False Discovery Rate.
FWER	Family-wise Error Rate.
i.i.d.	independent and identically distributed.

Acknowledgements

There were many people involved in the process of my PhD who assisted me greatly in reaching this point. First, I am extremely grateful for the advice and support of Professor Jim Berger, who guided me extensively through the process of finding a research topic and accomplishing high-level research by thinking critically and thoroughly. His feedback is always insightful. I appreciate his time and assistance over the years. I would also like to thank my committee members, David Banks, Surya Tokdar, Yin Xia and Barbara Engelhardt, and statistics professors Robert Wolpert and David Dunson, for all the feedback and support they provided during my graduate studies. Thank you to my fellow classmates: I have been fortunate enough to have you, from our First Year Exam to ISBA Cancún, from discussing research to career development; I have benefited greatly by having you there with me through it all. I also owe a great deal of thanks to my friends from the Duke Taiwanese Student Association, the Duke Table Tennis Varsity Team, and my internship in California. My Duke life and internships were colorful and enjoyable because of you. To all my friends from the Duke Chapel and FYTER congregations, especially Karen and Michael, Carol and Richard, Steve and Sara, and Howard and Joann: I always feel your warmhearted hospitality and love. You are like family and gave me a place where I can feel at home. Special thanks to pastors Luke and Carol for sharing your wisdom; every sermon has been inspiring and enlightening. Thank you to Chapel conductor Rodney for constantly bringing us great music; every Handel's Messiah and Christmas Eve Service has been so beautiful and memorable. Finally, I would like to thank my parents Yu-Miao and Chang-Wang, brother Li-Ying, uncle Yumin and aunt Yolanda for their consistent care and support; thank you to my wonderful wife Chien-Kuang for your love, care, and companionship. Without you, none of this would have been possible.

1

Introduction

1.1 Multiple testing

Multiple testing has been an important topic for decades and, as technology advances, more and more scientific disciplines require testing a large number of hypotheses simultaneously or sequentially. Simply testing each hypothesis at some significance level α and reporting findings results in false discoveries and irreproducible outcomes in multiple testing situations (Ioannidis (2005), Banks et al. (2011), Heller et al. (2014)). In clinical trials or pharmaceutical studies, such errors may waste resources or potentially cost lives. The Bayesian approach to controlling for multiplicity operates through the prior probabilities assigned to hypotheses. For example, suppose the observations come from

Yi ∼ p F0 + (1 − p) F1 ,

where p is the probability that the null model is true, and F0, F1 denote the probability distributions corresponding to the null and the alternative model, respectively. With a non-degenerate prior on p, the posterior on p can shift towards zero when strong signals are discovered. This automatically controls for multiplicity, as discussed extensively

in Scott and Berger (2006) and Scott and Berger (2010) for a two-groups model and variable selection in linear models. This simplicity of Bayesian multiplicity control makes it highly attractive, the hope being that it can lead to fully powered multiplicity control (to be defined later) even in scenarios of high dependence amongst test statistics (where frequentist multiplicity control can be difficult). To be widely accepted, however, the frequentist properties of Bayesian multiplicity control procedures need to be understood. The central goal of this thesis, therefore, is to investigate the frequentist properties of Bayesian multiplicity control in a variety of scenarios. It aims to answer when a Bayesian procedure can be considered a valid frequentist procedure, and depicts the asymptotic behavior of Bayesian procedures as the number of tests grows.
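The self-adjusting role of the prior on p can be illustrated with a small simulation. The following is only a sketch with illustrative, hypothetical choices (a uniform prior on p, a N(0, τ²) alternative so that marginally Yi ∼ p N(0, 1) + (1 − p) N(0, 1 + τ²), and a grid approximation to the posterior); it shows the posterior on p concentrating near 1 when all nulls are true and shifting toward the true null fraction when signals are present:

```python
import numpy as np

def posterior_p(y, tau2=4.0, grid=None):
    """Grid-approximation posterior for the null proportion p (uniform prior),
    under the two-groups marginal y_i ~ p*N(0,1) + (1-p)*N(0, 1+tau2)."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    s2 = 1.0 + tau2
    f0 = np.exp(-y**2 / 2.0) / np.sqrt(2 * np.pi)            # null density
    f1 = np.exp(-y**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)  # alternative marginal
    loglik = np.array([np.log(p * f0 + (1 - p) * f1).sum() for p in grid])
    w = np.exp(loglik - loglik.max())
    return grid, w / w.sum()

rng = np.random.default_rng(0)
all_null = rng.normal(size=500)                                       # true p = 1
half_signal = np.concatenate([rng.normal(size=250),
                              rng.normal(scale=np.sqrt(5.0), size=250)])  # true p = 0.5
for label, y in [("all null", all_null), ("half signals", half_signal)]:
    grid, post = posterior_p(y)
    print(label, "-> posterior mean of p:", round(float((grid * post).sum()), 2))
```

The point is qualitative: the data themselves pull p, and hence the prior odds of each individual null, in the multiplicity-correcting direction.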

1.1.1 Applications

Multiple testing methodology has wide applications, including microarrays (Dudoit and Van Der Laan (2007), Noble (2009)), data quality (Karr et al. (2006)), neuroimage classification (Genovese et al. (2002)), image processing (White and Ghosal (2014)), astrophysics (Miller et al. (2001)), and high-dimensional hypothesis testing (Cai et al. (2013), Tony Cai et al. (2014)). It is also worth noting the strong connection between multiplicity and sparsity, which is discussed in Malloy and Nowak (2011), Abramovich et al. (2006), Datta et al. (2013), Bogdan et al. (2008b), and Cai and Xia (2014).

1.1.2 A review of multiplicity adjustment

Many multiple testing procedures have been developed to control the Type I error rate or variants thereof. For conducting n hypothesis tests, the i-th test consists of testing the null hypothesis M0^i versus the alternative M1^i. For each individual test i, there are four possible scenarios: (1) rejecting the null when the null is true, (2) rejecting the null when the alternative is true, (3) failing to reject the null when the null is true, or (4) failing to reject the null when the alternative is true. Scenarios (2) and (3) are correct. Because we are conducting multiple tests, we need to consider the fact that each test will yield one of the four outcomes, and hence may have a correct or incorrect result. The following table shows the counts of the outcomes for n tests in terms of these four scenarios:

Table 1.1: Decisions and errors

                      Null True    Alternative True    Total
Fail to Reject Null   U            T                   n − R
Reject Null           V            S                   R
Total                 N0           n − N0              n

Recall that only two of these scenarios (rejecting the null when the alternative is true, and failing to reject the null when the null is true) are correct. We can then compute error rates for the n tests based on the counts in the table for the other two scenarios. The types of errors that have often been considered are:

Per-comparison error rate (PCER): E(V)/n
Per-family error rate (PFER): the expected number of Type-I errors, E(V)
Family-wise error rate (FWER): P(V ≥ 1)
False discovery rate (FDR): E(V/R | R > 0) P(R > 0)
Positive false discovery rate (pFDR): E(V/R | R > 0)

A relationship between these error rates is

PCER ≤ FDR ≤ pFDR ≤ FWER ≤ PFER ,     (1.1)

cf. Ge et al. (2003) and Goeman and Solari (2014) for details. We use the standard term weak control to refer to computing any of these error rates assuming all n null hypotheses are true; strong control refers to computing these error rates under any combination of true and false nulls. It may be desirable to control the rate of one or more of these types of errors. In the following sections, we will discuss a few methods for controlling the error rate for specific error types.
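These quantities are easy to estimate by simulation. The sketch below uses purely illustrative settings (independent z-tests, ten true signals with mean 3, naive per-test rejection at the two-sided 5% level) and estimates four of them; note that, per realization, V/n ≤ V/max(R,1) ≤ 1{V ≥ 1} ≤ V, so the corresponding part of the ordering in (1.1) holds for the Monte Carlo estimates as well:

```python
import numpy as np

def naive_error_rates(n_null=90, n_alt=10, effect=3.0, reps=2000, seed=1):
    """Estimate PCER, FDR, FWER, PFER when every test is run at the
    two-sided 5% level with no multiplicity adjustment."""
    rng = np.random.default_rng(seed)
    n = n_null + n_alt
    z_crit = 1.96                       # two-sided 5% critical value
    V = np.empty(reps)                  # false positives per experiment
    R = np.empty(reps)                  # total rejections per experiment
    for t in range(reps):
        z = rng.normal(size=n)
        z[:n_alt] += effect             # first n_alt tests are true signals
        reject = np.abs(z) > z_crit
        V[t] = reject[n_alt:].sum()     # rejections among the true nulls
        R[t] = reject.sum()
    return {"PCER": V.mean() / n,
            "FDR": (V / np.maximum(R, 1)).mean(),
            "FWER": (V >= 1).mean(),
            "PFER": V.mean()}

rates = naive_error_rates()
print(rates)
```

With 90 true nulls, the estimated FWER is close to 1, which is exactly the reproducibility problem described above.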

1.1.3 Family-wise error rate (FWER) control

Three possible methods for controlling FWER are as follows:

1. The Bonferroni procedure rejects a hypothesis if its p-value is less than α/n, which directly controls the FWER. This is equivalent to rejecting if

p̃_i = min{n p_i, 1} ≤ α .

2. Another single-step FWER adjustment is Šidák's method:

p̃_i = 1 − (1 − p_i)^n .

3. The previous two methods adjust all p-values simultaneously. Holm (1979) instead attempts to gain more power by successively making smaller adjustments to the p-values. Denoting p_(1) ≤ p_(2) ≤ ... ≤ p_(n) as the ordered raw p-values, his procedure rejects a hypothesis if

p̃_(i) = max_{k ∈ {1,...,i}} min{ (n − k + 1) p_(k), 1 } ≤ α .

All of these methods suffer when the test statistics are not independent. Westfall and Young (1993)'s resampling-based step-down algorithm controls FWER without the independence assumption. However, in many applications, such as RNA sequencing, which involve millions of hypotheses, controlling FWER is too conservative. With millions of tests being run, it is highly likely that at least one of them will result in a Type I error, and a control as strong as FWER, which attempts to eliminate Type I errors from all tests, is overly conservative.
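The three adjustments above can be sketched as adjusted-p-value functions (a minimal numpy implementation; `holm` uses the running maximum that the step-down formula prescribes):

```python
import numpy as np

def bonferroni(p):
    """Single-step Bonferroni: p~_i = min(n * p_i, 1)."""
    p = np.asarray(p, dtype=float)
    return np.minimum(len(p) * p, 1.0)

def sidak(p):
    """Single-step Sidak: p~_i = 1 - (1 - p_i)^n."""
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** len(p)

def holm(p):
    """Holm step-down: p~_(i) = max_{k <= i} min((n - k + 1) * p_(k), 1)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    # multiplier for the k-th smallest p-value (0-indexed) is n - k
    stepwise = np.minimum((n - np.arange(n)) * p[order], 1.0)
    out = np.empty(n)
    out[order] = np.maximum.accumulate(stepwise)  # enforce monotonicity
    return out

p = [0.001, 0.01, 0.02, 0.5]
print("Bonferroni:", bonferroni(p))   # most conservative
print("Sidak:     ", sidak(p))
print("Holm:      ", holm(p))         # uniformly no larger than Bonferroni
```

Rejecting the hypotheses with adjusted p-value at most α then controls the FWER at level α (for Šidák, under independence).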

4 1.1.4 False discovery rate (FDR) control

FDR is one alternative error metric, typically much smaller than the FWER, whose control therefore sacrifices less power (cf. (1.1)). If all null hypotheses are true and the test statistics are independent, then 5% of p-values are expected to fall below the 0.05 threshold, since p-values are uniformly distributed under the null. Motivated by this, Benjamini and Hochberg (1995) considered dividing each ordered p-value by its rank in order to estimate FDR. In this approach, a hypothesis is declared significant if

p̃_(i) = min_{k ∈ {i,...,n}} min{ n p_(k)/k, 1 } ≤ α .

Alternatively, Efron et al. (2001) proposed the two-groups approach: let Zi be the z-value of the i-th hypothesis test. Assume Z1, ..., Zn follow the mixture cumulative distribution function (cdf)

F(z) = p0 F0(z) + (1 − p0) F1(z) ,     (1.2)

where p0 is the prior probability of the null, and F0, F1 are the cdfs under the null and alternative hypotheses, respectively. From Bayes' rule,

Fdr(z) = P(null | Z ≤ z) = p0 F0(z)/F(z) ,
local FDR = fdr(z) = P(null | Z = z) = p0 f0(z)/f(z) ,

where f0, f1 are the density functions corresponding to the null and alternative models, and f = p0 f0 + (1 − p0) f1. Note that the local FDR is calculated with respect to a single score, which is useful for checking the false positive probability of a single case. Although (1.2) is a strong assumption, the two-groups approach is easy to interpret, deals with dependency (Efron (2004)), and links to empirical Bayes.
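The Benjamini-Hochberg step-up rule can be sketched as an adjusted-p-value computation (the running minimum from the largest p-value downward implements the min over k ≥ i):

```python
import numpy as np

def benjamini_hochberg(p):
    """BH adjusted p-values: p~_(i) = min_{k >= i} min(n * p_(k) / k, 1).
    Rejecting the hypotheses with p~_(i) <= alpha controls the FDR at
    level alpha for independent test statistics."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)        # n * p_(k) / k
    adj = np.minimum.accumulate(scaled[::-1])[::-1]    # min over k >= i
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out

p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(p)
print(adj, "-> reject", int((adj <= 0.05).sum()), "hypotheses at FDR level 0.05")
```

The example p-values are arbitrary; note how the third and fourth hypotheses share an adjusted value because of the monotonicity step.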

5 1.1.5 Other approaches

We have mentioned, at the beginning of this chapter, some Bayesian work which attempts to control error rates in multiple testing situations. There are other Bayes and empirical Bayes approaches, such as parametric empirical Bayes (Bogdan et al. (2007), Bogdan et al. (2008a), Dutta et al. (2015)), decision-theoretic FDR (Muller et al. (2006), Bogdan et al. (2011)) and, in different contexts, subgroup analysis (Wang et al. (2007), Berger et al. (2014)) and Scott (2009).

1.2 Outline and Contributions of the Thesis

Chapter 2 examines a problem of testing mutually exclusive hypotheses with dependent data. Both Bayesian and frequentist approaches to multiplicity control are studied and compared, and this work helps us gain understanding as to the effect of data dependence on each approach. The Bayesian approach is shown to have excellent frequentist properties and is argued to be the most effective way of obtaining frequentist multiplicity control without sacrificing power.

Chapter 3 generalizes the model so that multiple signals are acceptable. The challenging part is the exponential growth in model size (i.e., 2^n models for n tests). We make a very mild assumption on the model complexity and determine the asymptotic behavior of false positive rates and the expected number of false positives.

Chapter 4 applies Bayesian methods to clinical trials, where multiplicity has been neglected in many studies. We demonstrate how to use Bayesian methodology to control for multiplicity and exemplify the technique on HIV vaccine data. In addition, since many existing studies have been conducted via frequentist methods, we show possible ways to obtain Bayesian answers by converting p-values to Bayes factors.

Chapter 5 addresses issues and efforts in reconciling frequentist and Bayesian approaches in the sequential endpoint testing problem. Conditional frequentist methodology has been one of the most successful ways to reconcile Bayes and frequentist approaches in the single hypothesis testing scenario. However, we show that no reconciliation appears to be possible, in general, for sequential endpoint testing.

2

Comparison of Bayesian and frequentist multiplicity correction for testing mutually exclusive hypotheses under data dependence

2.1 Introduction

Modern scientific studies often require considering a large number of hypotheses simultaneously (Efron (2004), Noble (2009)), and this has led to extensive interest in controlling for multiple testing (henceforth, just termed controlling for multiplicity). Many multiplicity control methods have been proposed in the frequentist literature, such as the Bonferroni procedure, which controls the family-wise error rate, and various versions of false discovery rates (cf. Benjamini and Hochberg (1995) and Storey (2003)), which control the fraction of false discoveries among stated discoveries. The asymptotic behavior of false discovery rates has been studied in Abramovich et al. (2006).

The Bayesian approach to controlling for multiplicity operates through the prior probabilities assigned to hypotheses. For instance, in the scenario that is considered herein of testing mutually exclusive hypotheses (only one of n considered hypotheses can be true), one can simply assign each hypothesis prior probability equal to 1/n and carry out the Bayesian analysis; this automatically controls for multiplicity. That multiplicity is controlled through prior probabilities of hypotheses or models is extensively discussed in Scott and Berger (2006) and Scott and Berger (2010) for a two-groups model and variable selection in linear models.

One of the appeals of the Bayesian approach to multiplicity control is that it does not depend on the dependence structure of the data; the Bayes procedure will automatically adapt to the dependence structure through Bayes' theorem, but the prior probability assignment that is controlling for multiplicity is unaffected by data dependence. In contrast, frequentist approaches to multiplicity control are usually highly affected by data dependence. For instance, the Bonferroni correction is fine if the test statistics for the hypotheses being tested are independent, but can be much too conservative (losing detection power) if the test statistics are dependent.

An interesting possibility for frequentist multiplicity control in data dependence situations is thus to develop the procedure in a Bayesian fashion and verify that the procedure has sufficient control from a frequentist perspective. This has the potential of yielding optimally powered frequentist procedures for multiplicity control. There have been other papers that study the frequentist properties of Bayesian multiplicity control procedures (Bogdan et al. (2008a), Guo and Heitjan (2010), Abramovich and Angelini (2006)), but they have not focused on the situation of data dependence.

We investigate the potential for this program by an exhaustive analysis of the simplest multiple testing problem which exhibits data dependence. The data X = (X1, ..., Xn)' arise from the multivariate normal distribution

X ∼ N_n( θ, Σ ) ,   θ = (θ1, ..., θn)' ,   Σ_ii = 1 ,   Σ_ij = ρ for i ≠ j ,     (2.1)

where ρ is the correlation between the observations. Consider testing the n hypotheses M0^i : θi = 0 versus M1^i : θi ≠ 0, but under the assumption that at most one alternative hypothesis could be true. (It is possible that no alternative is true.) Although our study of this problem is pedagogical in nature, such testing problems can arise in signal detection, when a signal could arise in one and only one of n channels, and there is common background noise in all channels, leading to the equal correlation structure. We will, for convenience in exposition, use this language in referring to the situation.

In Section 2.2 we introduce two natural frequentist procedures for multiplicity control in this problem and, in Section 2.3, we introduce a natural Bayesian procedure. Section 2.4 explores a highly curious phenomenon that is encountered in the situation where ρ is near 1: when n > 2, the Bayesian procedure finds the true alternative hypothesis with near certainty, while an ad hoc frequentist procedure fails to do so. Sections 2.5 and 2.6 study the frequentist properties of the original Bayesian procedure and the Type II MLE approach, showing that, as n → ∞, the Bayesian procedures have strong frequentist control of error. Section 2.7 discusses the power of the Type II maximum likelihood procedure, and argues that it is essentially an optimal frequentist procedure. Section 2.8 considers the situation in which there is a data sample of growing size m for each θi.

2.2 Frequentist Multiplicity Control

Two natural frequentist procedures are considered.

2.2.1 An Ad hoc Procedure

Declare channel i to have the signal if max_{1≤j≤n} |Xj| > c, where c is determined to achieve overall family-wise error control:

α = P( max_{1≤j≤n} |Xj| > c | θi = 0 ∀i ) .     (2.2)

Lemma 2.2.1. (2.2) can be expressed as

α = 1 − E_Z { [ Φ( (c − √ρ Z)/√(1−ρ) ) − Φ( (−c − √ρ Z)/√(1−ρ) ) ]^n } ,

where the expectation is with respect to Z ∼ N(0, 1).

Proof. By Lemma 2.21, under the null model, Xi can be written as Xi = √ρ Z + √(1−ρ) Zi, where Z and the Zi are independent standard normal random variables. Thus

P( max_{1≤j≤n} |Xj| > c | θi = 0 ∀i )
  = 1 − E_Z { P( |√ρ Z + √(1−ρ) Zj| < c for all j | Z ) }
  = 1 − E_Z { ∏_{j=1}^n P( (−c − √ρ Z)/√(1−ρ) < Zj < (c − √ρ Z)/√(1−ρ) | Z ) }
  = 1 − E_Z { [ Φ( (c − √ρ Z)/√(1−ρ) ) − Φ( (−c − √ρ Z)/√(1−ρ) ) ]^n } . ∎

Corollary 2.2.2.

• When ρ = 0, Φ(c) = 1 + log(1−α)/(2n) + O(1/n²), essentially calling for the Bonferroni correction.

• When ρ → 1, Φ(c) → 1 − α/2, so the critical region is the same as that for a single test.

Proof. If ρ = 0, α = 1 − (Φ(c) − Φ(−c))^n = 1 − (2Φ(c) − 1)^n, from which it follows that

Φ(c) = (1 + (1−α)^{1/n})/2 = (1 + 1 + log(1−α)/n + O(1/n²))/2 .

If ρ → 1, by Lemma 2.21,

lim_{ρ→1} P( max_{1≤j≤n} |Xj| > c | θi = 0 ∀i )
  = 1 − lim_{ρ→1} E^{Z1,...,Zn} { P( |√ρ Z + √(1−ρ) Zj| < c for all j | Z1, ..., Zn ) }
  = 1 − E^{Z1,...,Zn} { lim_{ρ→1} P( |√ρ Z + √(1−ρ) Zj| < c for all j | Z1, ..., Zn ) }
  = 1 − (Φ(c) − Φ(−c))
  = 2(1 − Φ(c)) . ∎

The extreme effect of dependence in the data on frequentist multiplicity correction is clear here; the correction ranges from the full Bonferroni correction to no correction, as the correlation ranges from 0 to 1.
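Lemma 2.2.1 makes the critical value easy to compute numerically. The sketch below (a Monte Carlo approximation of the expectation over Z, with bisection in c; the sample size and seed are arbitrary choices) reproduces the two endpoints of Corollary 2.2.2, with a Bonferroni-like c at ρ = 0 shrinking toward the single-test value as ρ grows:

```python
import numpy as np
from math import erf, sqrt

ERF = np.vectorize(erf)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + ERF(x / sqrt(2.0)))

def fwer(c, rho, n, z):
    """alpha(c) from Lemma 2.2.1, with E_Z approximated by the sample z."""
    hi = (c - np.sqrt(rho) * z) / np.sqrt(1.0 - rho)
    lo = (-c - np.sqrt(rho) * z) / np.sqrt(1.0 - rho)
    return 1.0 - np.mean((Phi(hi) - Phi(lo)) ** n)

def critical_value(rho, n, alpha=0.05, n_mc=10_000, seed=0):
    """Solve alpha = fwer(c) by bisection (fwer is decreasing in c)."""
    z = np.random.default_rng(seed).normal(size=n_mc)
    lo, hi = 0.0, 10.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if fwer(mid, rho, n, z) > alpha else (lo, mid)
    return 0.5 * (lo + hi)

for rho in (0.0, 0.5, 0.9):
    print(f"rho = {rho}: c = {critical_value(rho, n=10):.3f}")
```

At ρ = 0 the expectation is exact (the √ρ Z term vanishes), giving essentially the Šidák/Bonferroni critical value for n = 10; as ρ increases, c moves down toward the single-test value of 1.96.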

2.2.2 Likelihood Ratio Test

A more principled frequentist procedure would be the likelihood ratio test (LRT):

Theorem 2.2.3. The test statistic arising from the likelihood ratio test is

T = max_j [ √(1−ρ) xj + nρ (xj − x̄)/√(1−ρ) ]² ,

and the LRT rejects the null hypothesis if T > c, where c satisfies α = P(T > c | θi = 0 ∀i).

Proof. Denote by Σ0 the equicorrelation matrix with 1 on the diagonal and ρ off the diagonal, and write its inverse as the matrix with diagonal entries a and off-diagonal entries b, where

a = a_n = (1 + (n−2)ρ) / [ (1 + (n−1)ρ)(1−ρ) ] ,   b = b_n = −ρ / [ (1 + (n−1)ρ)(1−ρ) ] .

The likelihood ratio is then, letting f(·) denote the density of X and x̃i = (x1, ..., x_{i−1}, xi − θi, x_{i+1}, ..., xn)',

LR = f(x | θi = 0 ∀i) / max_{i,θi} f(x | θi ≠ 0, θ(−i) = 0)
   = exp{ −½ x'Σ0^{−1}x } / max_i sup_{θi} exp{ −½ x̃i'Σ0^{−1}x̃i } .     (2.3)

Computation yields, defining ui = Σ_{j≠i} xj,

θ̂i = argmax_{θi} { −x̃i'Σ0^{−1}x̃i } = xi + (b/a) ui ,

from which it is immediate that

LR = min_i exp{ −½ [ a xi² + 2b ui xi + (b²/a) ui² ] } = min_i exp{ −(1/(2a)) (a xi + b ui)² } .

Noting that

(1/a)(a xj + b uj)²
  = (1 / [ (1+(n−1)ρ)(1+(n−2)ρ)(1−ρ) ]) [ (1+(n−2)ρ) xj − ρ Σ_{k≠j} xk ]²
  = (1 / [ (1+(n−1)ρ)(1+(n−2)ρ)(1−ρ) ]) [ (1−ρ) xj + nρ (xj − x̄) ]²
  = (1 / [ (1+(n−1)ρ)(1+(n−2)ρ) ]) [ √(1−ρ) xj + nρ (xj − x̄)/√(1−ρ) ]² ,

it is immediate that LR is a monotone function of the test statistic T. The rejection region is LR ≤ k for some k, which is clearly equivalent to T ≥ c for an appropriate critical value c. ∎

When ρ = 0, T = max_i xi², and the LRT reduces to the ad hoc testing procedure of the previous section. On the other hand, as ρ → 1, T ≈ max_i n² ( (xi − x̄)/√(1−ρ) )², which exhibits a quite different behavior that will be discussed later.
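The statistic T is straightforward to compute; the sketch below also checks, on random inputs, that it agrees with the (a xi + b ui)²/a form from the proof up to the constant (1+(n−1)ρ)(1+(n−2)ρ):

```python
import numpy as np

def lrt_statistic(x, rho):
    """T = max_j [ sqrt(1-rho)*x_j + n*rho*(x_j - xbar)/sqrt(1-rho) ]^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.sqrt(1.0 - rho) * x + n * rho * (x - x.mean()) / np.sqrt(1.0 - rho)
    return np.max(t ** 2)

# rho = 0 recovers the ad hoc statistic max_j x_j^2
assert lrt_statistic([1.0, -2.0, 0.5], rho=0.0) == 4.0

# agreement with the quadratic-form expression from the proof
rng = np.random.default_rng(3)
x, n, rho = rng.normal(size=5), 5, 0.4
a = (1 + (n - 2) * rho) / ((1 + (n - 1) * rho) * (1 - rho))
b = -rho / ((1 + (n - 1) * rho) * (1 - rho))
u = x.sum() - x                                   # u_i = sum_{j != i} x_j
t = np.sqrt(1 - rho) * x + n * rho * (x - x.mean()) / np.sqrt(1 - rho)
const = (1 + (n - 1) * rho) * (1 + (n - 2) * rho)
assert np.allclose((a * x + b * u) ** 2 / a, t ** 2 / const)
print("identities verified")
```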

2.3 A Bayesian Test

On the Bayesian side, it is convenient to view this as the model selection problem of deciding between the n + 1 exclusive models

M0 : θ1 = ... = θn = 0   (null model) ,
Mi : θi ≠ 0, θ(−i) = 0 ,   i = 1, ..., n ,     (2.4)

where θ(−i) is the vector of all θj except θi. The computationally simplest prior assumption is that, for the nonzero θi (if there is one),

θi ∼ N(0, τ²) ;

initially we will assume τ² to be known, but later will consider it to be unknown. The marginal likelihoods of the models are then

m0(x) ∼ N(0, Σ0) ,
mi(x) = ∫ f(x | θ) π(θ) dθ ∼ N(0, Σi) ,     (2.5)

where Σi is equal to Σ0 except that its (i, i) entry is 1 + τ². The posterior probability of Mi (that the i-th channel has the signal) is then

P(Mi | x) = mi(x) P(Mi) / Σ_{j=0}^n mj(x) P(Mj) ,

where P(Mj) is the prior probability of model Mj; below, P(M0) = r and P(Mi) = (1 − r)/n for i = 1, ..., n.

Theorem 2.3.1. For any ρ ∈ [0, 1) and positive integer n > 1, the null posterior probability is

P(M0 | x) = { 1 + ((1 − r)/(nr)) (1/√(1 + τ²a)) Σ_{i=1}^n exp[ (τ²/(2(1 + τ²a))) ( xi/(1−ρ) + b n x̄ )² ] }^{−1} ,

and the posterior probability of an alternative model Mi is

P(Mi | x) = { √(1 + aτ²) (nr/(1−r)) exp[ (−τ²/(2(1 + τ²a))) ( xi/(1−ρ) + n x̄ b )² ]
  + Σ_{k=1}^n exp[ (−τ²/(2(1 + τ²a))) ( (xi² − xk²)/(1−ρ)² + (2b/(1−ρ)) (n x̄)(xi − xk) ) ] }^{−1} .

Proof. The posterior probability of Mi is

P(Mi | x) = mi(x) P(Mi) / Σ_{j=0}^n mj(x) P(Mj)

  = [ ((1−r)/n) |Σi|^{−1/2} exp{ −½ x'Σi^{−1}x } ] / [ r |Σ0|^{−1/2} exp{ −½ x'Σ0^{−1}x } + ((1−r)/n) Σ_{j=1}^n |Σj|^{−1/2} exp{ −½ x'Σj^{−1}x } ]

  = { (nr/(1−r)) (|Σi|/|Σ0|)^{1/2} exp{ −½ x'(Σ0^{−1} − Σi^{−1})x } + Σ_{j=1}^n exp{ −½ x'(Σj^{−1} − Σi^{−1})x } }^{−1} .     (2.6)

The expression can be simplified by further computing Σi^{−1}, (Σi^{−1} − Σk^{−1}) and det(Σi). Writing Σi = Σ0 + τi τi', where τi = (0, ..., τ, ..., 0)' (the i-th element is τ), the Woodbury identity gives

Σi^{−1} = (Σ0 + τi τi')^{−1} = Σ0^{−1} − Σ0^{−1} τi (1 + τi'Σ0^{−1}τi)^{−1} τi'Σ0^{−1}
        = Σ0^{−1} − (τ²/(1 + τ²a)) (Σ0^{−1} ei)(Σ0^{−1} ei)' ,

where ei is the i-th standard basis vector. Since ei'Σ0^{−1}x = a xi + b ui = (a − b) xi + b n x̄, and a − b = 1/(1−ρ), it follows that

x'(Σ0^{−1} − Σi^{−1})x = (τ²/(1 + τ²a)) ( xi(a−b) + b n x̄ )² = (τ²/(1 + τ²a)) ( xi/(1−ρ) + b n x̄ )² ,
x'(Σk^{−1} − Σi^{−1})x = (τ²/(1 + τ²a)) [ (xi² − xk²)(a−b)² + 2b(a−b)(n x̄)(xi − xk) ] .

Also, by Sylvester's determinant identity, the ratio of the two determinants is

det(Σi)/det(Σ0) = det(I + Σ0^{−1} τi τi') = 1 + τi'Σ0^{−1}τi = 1 + τ²a .

By plugging these quantities back into (2.6), the proof is complete. ∎
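The closed form in Theorem 2.3.1 can be checked against a brute-force computation of the posterior probabilities directly from the multivariate normal marginal likelihoods in (2.5). This is only a numerical sketch (with arbitrary example values of x, ρ, τ², r), where r is P(M0) and each alternative receives prior probability (1 − r)/n, as in the proof:

```python
import numpy as np

def log_mvn0(x, Sigma):
    """Log density of N(0, Sigma) at x."""
    n = len(x)
    sign, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(Sigma, x))

def posterior_probs(x, rho, tau2, r):
    """(P(M0|x), P(M1|x), ..., P(Mn|x)) computed directly from (2.5)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    Sigma0 = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
    logs = [np.log(r) + log_mvn0(x, Sigma0)]
    for i in range(n):
        Sigma_i = Sigma0.copy()
        Sigma_i[i, i] += tau2          # model M_i inflates the i-th variance
        logs.append(np.log((1 - r) / n) + log_mvn0(x, Sigma_i))
    w = np.exp(np.array(logs) - max(logs))
    return w / w.sum()

post = posterior_probs([2.5, 0.3, -0.1, 0.2], rho=0.3, tau2=4.0, r=0.5)
print(np.round(post, 3))   # most of the alternative mass goes to M_1
```

The same function can be compared against the n = 2 closed form of Corollary 2.3.3 as a sanity check on the algebra.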

Remark 2.3.2. (2.24) gives the full expression for P pMi | xq, without using a, b, and is utilized in the subsequent proofs.

Corollary 2.3.3. In particular, when n = 2, the null posterior probability is

P(M0 | x) = { 1 + ((1 − r)/(2r)) √( (1−ρ²)/(1−ρ²+τ²) ) Σ_{i∈{1,2}} exp[ (τ²/(2(1−ρ²+τ²))) ( (xi − ρ x(−i))/√(1−ρ²) )² ] }^{−1} ,

and the posterior probability of the alternative Mi, i ∈ {1, 2}, is

P(Mi | x) = { √( (1−ρ²+τ²)/(1−ρ²) ) (2r/(1−r)) exp[ (−τ²/(2(1−ρ²+τ²))) (xi − ρ x(−i))²/(1−ρ²) ]
  + 1 + exp[ (−τ²/(2(1−ρ²+τ²))) ( xi² − x(−i)² ) ] }^{−1} .

2.4 The situation as the correlation goes to 1

The following theorem shows the surprising result that, when the dimension is greater than 2, the Bayesian method can correctly select the true model as the correlation goes to one. In two dimensions, however, there is nonzero probability of choosing the wrong alternative model if a non-null model is true.

Theorem 2.4.1. When n = 2, i ∈ {1, 2}, and ρ → 1:

P(M_0 | X) → 1 under the null model ,

P(M_i | X) → ( 1 + exp{ −½ (X_i² − X_{(−i)}²) } )^{−1} under M_1 or M_2 .

When n > 2, i, j ∈ {0, 1, …, n}, and ρ → 1, under model M_j:

P(M_i | X) → δ_i^j = { 1 if i = j ; 0 else } .

Proof. By Lemma 2.10.1, x_i can be written as x_i = θ_i + √ρ z + √(1 − ρ) z_i.

Case I: n = 2.

(x_i − ρ x_{(−i)})/√(1 − ρ²) = [ (θ_i − ρ θ_{(−i)}) + √(1 − ρ)(z_i − ρ z_{(−i)}) + √ρ (1 − ρ) z ] / √(1 − ρ²)
 = (θ_i − ρ θ_{(−i)})/√(1 − ρ²) + (z_i − ρ z_{(−i)})/√(1 + ρ) + √(1 − ρ) √ρ z/√(1 + ρ) .   (2.7)

If M_0 is true, then θ_i = θ_{(−i)} = 0 and the dominant term of (2.7) is O(1), so the null posterior probability (Corollary 2.3.3) becomes

P(M_0 | x) = ( 1 + O(√(1 − ρ)) )^{−1} = 1 + o(1) .

If M_i is true, the dominant term of (2.7) is O(1/√(1 − ρ)); hence, as ρ → 1,

P(M_i | x) = { (2r/(1 − r)) √( (1 − ρ² + τ²)/(1 − ρ²) ) exp[ −τ² θ_i² (1 + o(1)) / ( 2(1 − ρ² + τ²)(1 − ρ²) ) ] + 1 + exp{ −½ θ_i(θ_i + 2z) } + o(1) }^{−1}
 = (1 + o(1)) ( 1 + exp{ −½ θ_i(θ_i + 2z) } )^{−1}
 = (1 + o(1)) ( 1 + exp{ −½ (x_i² − x_{(−i)}²) } )^{−1} ,

using the fact that e^{−1/(1−ρ)} goes to zero faster than 1/√(1 − ρ) goes to infinity.

If M_{(−i)} is true: similarly to the previous case, (2.7) is O(1/√(1 − ρ)), so only the last term in Corollary 2.3.3 remains.

Case II: n > 2. Denote z = (z_1, …, z_n) and z̄ = (1/n) Σ_{i=1}^n z_i.

If M_0 is true: take θ = (θ_1, …, θ_n) = 0 in (2.24), let

c = c(ρ) = (1 + (n − 1)ρ) / [ (1 − ρ + τ²)(1 + (n − 1)ρ) − τ²ρ ] ,

and note that lim_{ρ→1} c(ρ) = n/(τ²(n − 1)). Then

P(M_i | x)

 = { √(1/(c(1 − ρ))) (nr/(1 − r)) exp[ −(τ²c/2) ( (z_i − z̄) + √(1 − ρ)(√ρ z + √(1 − ρ) z̄)/(1 + (n − 1)ρ) )² ] + 1
   + Σ_{k≠i} exp[ −(τ²c/2) ( (z_i + z_k − 2z̄)(z_i − z_k) + O(√(1 − ρ)) ) ] }^{−1}

 = { √( τ²(n − 1)/(n(1 − ρ)) ) (nr/(1 − r)) exp[ −n(z_i − z̄)²/(2(n − 1)) ] + 1
   + Σ_{k≠i} exp[ −n(z_i + z_k − 2z̄)(z_i − z_k)/(2(n − 1)) ] }^{−1} (1 + o(1))

 ≤ { √( τ²(n − 1)/(n(1 − ρ)) ) (nr/(1 − r)) exp[ −n(z_i − z̄)²/(2(n − 1)) ] }^{−1} (1 + o(1))

 = ((1 − r)/r) (1/n) √( n(1 − ρ)/(τ²(n − 1)) ) exp[ n(z_i − z̄)²/(2(n − 1)) ] (1 + o(1)) → 0 .

If M_j is true (j > 0): take θ_k = 0 for all k ≠ j in (2.24):

P pMj | xq

Both non-trivial ratio terms in (2.24) now contain the factor exp{ −τ²c (1 − 2/n)² θ_j² / (2(1 − ρ)) } (1 + o(1)), which decays faster than any power of 1/(1 − ρ) can grow; hence

P(M_j | x) → 1, since lim_{ρ→1} (1/(1 − ρ)) exp{ −1/(1 − ρ) } = 0 .

Theorem 2.4.2. The likelihood ratio test (Theorem 2.2.3) is fully powered (i.e., rejects the null with probability 1 under an alternative hypothesis) when ρ → 1 and n > 2, but (as with the Bayesian test) is not fully powered when n = 2.

Proof. By (2.22) of Lemma 2.10.1:

(x_i − x̄)/√(1 − ρ) = z_i − z̄  under the null model M_0 ,
(x_i − x̄)/√(1 − ρ) = θ_j(δ_{ij} − 1/n)/√(1 − ρ) + z_i − z̄  under an alternative model M_j .

When n = 2, under M_i, as ρ → 1:

2 xj ´ x¯ lim T “ lim max 1 ´ ρxj ` 2ρ ? ρÑ1 ρÑ1 jPt1,2u 1 ´ ρ „ ˆ ˙  a 2 xj ´ xp´jq “ 2 lim max ρ ? , ρÑ1 jPt1,2u 2 1 ´ ρ ˆ ˙

For both j “ i or j “ p´iq, the corresponding likelihood ratios go to infinity at the same

? 2 ? 2 asymptotic rate since θi{p2 1 ´ ρq ` pzi ´ zp´iqq{2 “ ´ θi{p2 1 ´ ρq ´ pzi ´ zp´iqq{2 .

Hence, one cannot distinguish“ whether θi or θp´iq is‰ nonzero.“ There is a positive probability‰ choosing an incorrect alternative model.

When n > 2, under M_j, as ρ → 1:

2 xi ´ x¯ max 1 ´ ρxi ` nρ ? i 1 ´ ρ „ ˆ ˙  a 2 θi ´ θj{n “ max n ? ` zi ´ z¯ ` op1q i 1 ´ ρ „  2 θjp1 ´ 1{nq “ n ? ` zj ´ z¯ ` op1q . 1 ´ ρ „ 

In this case, the true alternative model has the largest likelihood ratio (= ∞); hence, the LRT is fully powered.

From Theorems 2.4.1 and 2.4.2, when the correlation goes to 1 and the dimension is larger than 2, both the Bayesian procedure and the LRT are fully powered. This surprising behavior as the correlation goes to one can be explained by the following observations. When n = 2 and ρ → 1:

x_i − x_j = { 0 under the null model ; θ_i or −θ_j otherwise } .

Hence, one can correctly identify the null model when it is true, but cannot declare which non-null model is true when x_i − x_j is not 0.

When n > 2 and ρ → 1: if all pairs x_i − x_j are zero, then the null model is true. If some pair x_i − x_j is nonzero, we can further check whether x_i − x_k (k ≠ j) equals zero to determine whether θ_i or θ_j is nonzero.

Note that the ad hoc frequentist test does not have this behavior. As ρ → 1, the test

• still has probability α of incorrectly rejecting a true M0;

• still has positive probability of not detecting a signal when Mi is true.
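The high-correlation selection behavior of Theorem 2.4.1 is easy to see numerically. The sketch below is ours (the data vector is chosen by hand to mimic a strong signal on the third coordinate plus a small shared level): at ρ = 0.99 the exact posteriors concentrate on the correct alternative.

```python
import numpy as np

n, rho, tau2, r = 5, 0.99, 4.0, 0.5
Sigma0 = (1 - rho) * np.eye(n) + rho * np.ones((n, n))

def log_marginal(x, S):
    # log N(0, S) density at x, up to the common (2*pi)^{-n/2} factor
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * logdet - 0.5 * x @ np.linalg.solve(S, x)

# shared level ~0.1 on every channel, plus a signal of size 5 on coordinate 3
x = np.array([0.1, 0.1, 5.1, 0.1, 0.1])

logs = [np.log(r) + log_marginal(x, Sigma0)]
for i in range(n):
    Si = Sigma0.copy()
    Si[i, i] += tau2          # alternative M_{i+1}: extra variance on channel i
    logs.append(np.log((1 - r) / n) + log_marginal(x, Si))
logs = np.array(logs)
post = np.exp(logs - logs.max())
post /= post.sum()

best = int(np.argmax(post))   # index 0 = null model, index i = model M_i
```

Because the idiosyncratic standard deviation is √(1 − ρ) = 0.1, a 5-unit deviation on one coordinate overwhelmingly favors that coordinate's alternative, exactly as the theorem predicts.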

2.5 Asymptotic frequentist properties of Bayesian procedures

In this section, we study the false positive probability (FPP) theoretically and numerically. We first need to obtain the asymptotic posterior probabilities.

2.5.1 Posterior probabilities

Lemma 2.5.1. As n Ñ 8 under the null model,

P(M_i | x) = ( 1 + (n/(1 − r)) √( (1 − ρ + τ²)/(1 − ρ) ) exp[ −(τ²/(2(1 − ρ + τ²))) ( (x_i − x̄)/√(1 − ρ) )² ] )^{−1} (1 + o(1))   (2.8)

almost surely.

Proof. Take θ “ 0 in (2.24):

P pMi | xq

P(M_i | x)

 = { √(1/(c(1 − ρ))) (nr/(1 − r)) exp[ −(τ²c/2) I ] + 1 + Σ_{k≠i} exp[ −(τ²c/2) II ] }^{−1} ,   (2.9)

where

I = ( (z_i − z̄) + √(1 − ρ)(√ρ z + √(1 − ρ) z̄)/(1 + (n − 1)ρ) )² ,
II = (z_i + z_k − 2z̄)(z_i − z_k) + 2(√ρ z + √(1 − ρ) z̄) √(1 − ρ)(z_i − z_k)/(1 + (n − 1)ρ) .

Without loss of generality, assume |z_i| ≤ n^{1/2−ε} for all i, which holds almost surely by Lemma 2.10.4. Asymptotic analysis of (2.9) then yields:

I = ( (1 − 1/n) z_i − z̄_{(−i)} + √(1 − ρ)(√ρ z + √(1 − ρ) z̄)/(1 + (n − 1)ρ) )²   ( where z̄_{(−i)} = (1/n) Σ_{k≠i} z_k )

 = (1 − 1/n)² z_i² + O(n^{−ε}) ,

since, given |z_i| ≤ n^{1/2−ε} for all i, every cross term is at most O(n^{−ε}).

Therefore,

√(1/(c(1 − ρ))) (nr/(1 − r)) exp{ −(τ²c/2) I }

 = √(1/(c(1 − ρ))) (nr/(1 − r)) exp{ −(τ²c/2)(1 − 1/n)² z_i² } ( 1 + O(n^{−ε}) )   (2.10)

 = √( (1 − ρ + τ²)/(1 − ρ) ) (nr/(1 − r)) exp{ −τ² z_i²/(2(1 − ρ + τ²)) } (1 + o(1)) .

II = (z_i² − z_k²)(1 − 2/n) + O(n^{−ε}) ,

the omitted terms (involving z̄, z, and 1/n factors) again being O(n^{−ε}) when |z_i| ≤ n^{1/2−ε} for all i.

The summation term in (2.9) becomes:

Σ_{k≠i} exp{ −(τ²c/2) II }

 = exp{ −(τ²c/2)(1 − 2/n) z_i² } ( 1 + O(n^{−ε}) ) Σ_{k=1}^n exp{ (τ²c/2)(1 − 2/n) z_k² }

 = exp{ −τ² z_i²/(2(1 − ρ + τ²)) } n ( E_Z exp{ τ² Z²/(2(1 − ρ + τ²)) } + o(1) ) (1 + o(1))   (2.11)

(by the Law of Large Numbers, with Z ∼ N(0, 1))

 = exp{ −τ² z_i²/(2(1 − ρ + τ²)) } n √( (1 − ρ + τ²)/(1 − ρ) ) (1 + o(1)) .

The proof is completed by summing (2.10) and (2.11).

Remark 2.5.2. Note that, by (2.22),

xi ´ x¯ ? zi “ zipxq “ ? ` Op1{ nq , (2.12) 1 ´ ρ

so that Lemma 2.5.1 can be written with respect to zi:

P(M_i | x) = ( 1 + (n/(1 − r)) √( (1 − ρ + τ²)/(1 − ρ) ) exp{ −τ² z_i²/(2(1 − ρ + τ²)) } )^{−1} (1 + o(1)) .
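Lemma 2.5.1 can be checked against the exact posterior without forming any n × n matrix, because Σ_0^{−1} for the equicorrelation matrix has the closed form with diagonal a and off-diagonal b. The sketch below is ours (parameters and seed arbitrary); at n = 2000 the exact and approximate posteriors of M_1 under the null agree closely (Figure 2.1 suggests near-equality), so we only assert agreement within a generous band.

```python
import numpy as np

n, rho, tau2, r = 2000, 0.5, 1.0, 0.5
rng = np.random.default_rng(2)
# draw x under the null via the representation of Lemma 2.10.1
z_common = rng.standard_normal()
z = rng.standard_normal(n)
x = np.sqrt(rho) * z_common + np.sqrt(1 - rho) * z

# Sigma_0^{-1} has diagonal a and off-diagonal b
a = (1 + (n - 2) * rho) / ((1 - rho) * (1 + (n - 1) * rho))
b = -rho / ((1 - rho) * (1 + (n - 1) * rho))
w = a * x + b * (x.sum() - x)          # w_i = (Sigma_0^{-1} x)_i

# exact posterior of M_1 via the Woodbury/determinant identities:
# log m_i(x) - log m_0(x) = -0.5*log(1+tau2*a) + tau2*w_i^2/(2*(1+tau2*a))
log_bf = -0.5 * np.log(1 + tau2 * a) + tau2 * w**2 / (2 * (1 + tau2 * a))
denom = r + (1 - r) / n * np.exp(log_bf).sum()
p_exact = (1 - r) / n * np.exp(log_bf[0]) / denom

# approximation (2.8) / Lemma 2.5.1
zi = (x[0] - x.mean()) / np.sqrt(1 - rho)
p_approx = 1.0 / (1.0 + n / (1 - r) * np.sqrt((1 - rho + tau2) / (1 - rho))
                  * np.exp(-tau2 * zi**2 / (2 * (1 - rho + tau2))))
ratio = p_exact / p_approx
```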

Remark 2.5.3. Figure 2.1 shows the ratio of the estimated P(M_1 | x) (from (2.8)) to the true probability (from Theorem 2.3.1) as n grows. Each plot contains 200 ratio curves based on independent simulations with fixed ρ, P(M_0), and τ. As can be seen, the ratio goes to 1 as n grows, and the convergence rate indeed depends on the correlation.

Figure 2.1: Ratio of estimated and true posterior probability of M_1 as n grows under the null model, for fixed τ and r and different ρ. Each subplot is for a different correlation and contains 200 simulations.

Remark 2.5.4. Figure 2.2 gives the estimated and true posterior probability of M_1 under the assumption that the null model is true, for fixed r = ρ = 0.5 and n but varied τ. Notice that, for fixed n, the estimated probability is closer to the true probability when τ is small but worse for larger τ, indicating that a larger n is required to obtain the same precision as τ grows.

[Figure 2.2 shows six panels of the posterior of M_1 under the null versus τ, each displaying the true P(M_1|X) and its estimate, for P(M_0) = 0.5 and ρ ∈ {0, 0.2, 0.4, 0.6, 0.8, 0.9}.]

Figure 2.2: Estimated (red line) and true posterior probability (blue line) of M1 for different τ under the null model, for fixed n “ 2000, ρ “ r “ 0.5.

The following theorem shows the surprising result that, as n grows when the null model is true, the posterior probability of the null model converges to its prior probability. Thus one cannot learn that the null model is true.

Theorem 2.5.5. As n → ∞ and ρ ∈ [0, 1), under the null model,

P pM0 | Xq Ñ P pM0q .

Proof. First note that

a_n = 1/(1 − ρ) − ρ/((1 − ρ)(1 + (n − 1)ρ)) = 1/(1 − ρ) + O(1/n) ,
n b_n = −1/(1 − ρ) + (1 − ρ)/((1 − ρ)(1 + (n − 1)ρ)) = −1/(1 − ρ) + O(1/n) .   (2.13)

The summation term in the null posterior (Theorem 2.3.1) becomes

((1 − r)/(nr)) Σ_{i=1}^n ( 1 + τ²/(1 − ρ) )^{−1/2} exp{ (τ²/(2(1 + τ²/(1 − ρ)))) ( (x_i − x̄)/(1 − ρ) )² } (1 + o(1))

 = ((1 − r)/r) √( (1 − ρ)/(1 − ρ + τ²) ) (1/n) Σ_{i=1}^n exp{ τ² z_i²/(2(1 − ρ + τ²)) } (1 + o(1))   (by (2.22))

 → (1 − r)/r   (by the Strong Law of Large Numbers).

Therefore, P(M_0 | X) → ( 1 + (1 − r)/r )^{−1} = r = P(M_0) .

Remark 2.5.6. Figure 2.3 shows simulations of the null posterior probability for different numbers of hypotheses and different correlations. Interestingly, by Theorem 2.4.1, the Bayes procedure identifies the correct model (here the null model) when n is fixed and the correlation goes to 1, resulting in higher initial posterior probability of the null model for highly correlated cases. On the other hand, by Theorem 2.5.5, this posterior probability converges to its prior probability regardless of the correlation. This convergence can be seen in Figure 2.3.

[Figure 2.3 shows six panels of P(M_0 | X) under the null model versus the number of hypotheses (up to 3 × 10⁵), for r = 0.5 and ρ ∈ {0, 0.2, 0.4, 0.6, 0.8, 0.9}.]

Figure 2.3: Convergence of P pM0 | xq to the prior probability (0.5) under the null model. Each subplot has a different correlation and contains 50 simulations.

2.5.2 False positive probability

Here we focus on the major goal: finding the frequentist false positive probability of the Bayesian procedure under the null model. To begin, we must formally define the Bayesian procedure for detecting a signal.

Definition 2.5.7 (Bayesian detection criterion). Accept model M_i if its posterior probability P(M_i | x) is greater than a specified threshold p ∈ (0, 1). If multiple models pass this threshold, choose the one with the largest posterior probability.

Definition 2.5.8 (False positive probability, FPP). Under the null model, the FPP is the frequentist probability of accepting a non-null model.
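Definitions 2.5.7 and 2.5.8 translate directly into code. A minimal sketch (the function name and return convention are ours, not the thesis's):

```python
import numpy as np

def bayes_detect(posteriors, p):
    """Apply the Bayesian detection criterion of Definition 2.5.7.

    posteriors: sequence [P(M0|x), P(M1|x), ..., P(Mn|x)] summing to 1.
    Returns the index of the accepted non-null model (1..n), or 0 when no
    non-null posterior exceeds the threshold p (no signal declared).
    """
    post = np.asarray(posteriors, dtype=float)
    candidates = np.flatnonzero(post[1:] > p) + 1  # non-null models over threshold
    if candidates.size == 0:
        return 0
    return int(candidates[np.argmax(post[candidates])])

# A false positive, in the sense of Definition 2.5.8, is the event
# bayes_detect(...) != 0 when the data are drawn under the null model.
```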

Theorem 2.5.9 (False positive probability). Under the null model, as n Ñ 8,

P(false positive | r, ρ, τ²) = O( n^{−(1−ρ)/τ²} (log n)^{−1/2} ) .

Proof. Under the null model, by (2.8), P(M_i | x) ≥ p is equivalent to

z_i² ≥ 2 ((1 − ρ + τ²)/τ²) ln( (n/(1 − r)) (p/(1 − p)) √( (1 − ρ + τ²)/(1 − ρ) ) ) (1 + o(1))
  = 2 ((1 − ρ + τ²)/τ²) ln( (n/(1 − r)) (p/(1 − p)) √( (1 − ρ + τ²)/(1 − ρ) ) ) + o(1) .   (2.14)

By Fact 2.10.2:

P( |Z_i| ≥ γ_n ) ,  where γ_n² = 2 ((1 − ρ + τ²)/τ²) ln( (n/(1 − r)) (p/(1 − p)) √( (1 − ρ + τ²)/(1 − ρ) ) ) + o(1) . By the normal tail expansion,

P( |Z_i| ≥ γ_n ) = (2/√(2π)) e^{−γ_n²/2}/γ_n ( 1 + O(γ_n^{−2}) )
 = (1/n) √(2/π) ( (p/((1 − p)(1 − r))) √( (1 − ρ + τ²)/(1 − ρ) ) )^{−(1−ρ+τ²)/τ²} n^{−(1−ρ)/τ²} γ_n^{−1} (1 + o(1)) =: d_n/n .

P(any false positive | M_0) = 1 − Π_{i=1}^n P( |Z_i| < γ_n ) = 1 − ( 1 − d_n/n )^n

 = d_n (1 + o(1)) + O(d_n²)

 = O( n^{−(1−ρ)/τ²} (log n)^{−1/2} ) ,

since γ_n ≍ √(log n).

This is a surprising and unsettling result: a standard Bayesian procedure yields a false positive probability that goes to zero at a polynomial rate. This is much too strong error probability control from a frequentist perspective; that it happens on the Bayesian side is surely an indication that assuming τ² to be known is too strong an assumption. Hence we turn to a more flexible approach in the next section.
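The decay in Theorem 2.5.9 can be made concrete without simulation: under the null, channel i is rejected exactly when |z_i| ≥ γ_n with γ_n from (2.14), so the FPP is 1 − (2Φ(γ_n) − 1)^n. The sketch below is ours (parameter values chosen for illustration) and evaluates this formula at several n.

```python
import math

def fpp_fixed_tau(n, rho=0.0, tau2=1.0, r=0.5, p=0.5):
    # threshold from (2.14): z_i^2 >= 2*(1-rho+tau2)/tau2 * log(C*n)
    C = (p / (1 - p)) / (1 - r) * math.sqrt((1 - rho + tau2) / (1 - rho))
    gamma = math.sqrt(2 * (1 - rho + tau2) / tau2 * math.log(C * n))
    tail = math.erfc(gamma / math.sqrt(2))       # P(|Z| >= gamma)
    return -math.expm1(n * math.log1p(-tail))    # 1 - (1 - tail)^n, computed stably

fpps = [fpp_fixed_tau(n) for n in (10**2, 10**4, 10**6)]
```

Even at n = 100 the FPP is already tiny, and it keeps shrinking polynomially in n — the over-conservatism the text describes.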

2.6 A Type II maximum likelihood approach

The type II maximum likelihood approach to choosing the prior under the alternative model replaces a pre-specified τ² with the value of τ² that maximizes the marginal likelihood; see Berger (1985) for discussion of this approach.

Lemma 2.6.1. Let L̃_n(τ²) be the marginal likelihood of τ² given (x_1, …, x_n):

L̃_n(τ²) = Σ_{i=0}^n P(M_i) m_i(x | τ²) ,

from which it follows that

argmax_{τ²} L̃_n(τ²) = argmax_{τ²} (1/√(1 + τ²a)) (1/n) Σ_{i=1}^n exp{ (τ²/(2(1 + τ²a))) ( x_i/(1 − ρ) + b n x̄ )² } .

Proof.

L̃_n = r |Σ_0|^{−1/2} exp{ −½ x'Σ_0^{−1}x } + ((1 − r)/n) Σ_{i=1}^n |Σ_i|^{−1/2} exp{ −½ x'Σ_i^{−1}x }

 = r |Σ_0|^{−1/2} exp{ −½ x'Σ_0^{−1}x } + ((1 − r)/n) |Σ_0|^{−1/2} (1 + τ²a)^{−1/2} exp{ −½ x'Σ_0^{−1}x } Σ_{i=1}^n exp{ (τ²/(2(1 + τ²a))) ( x_i(a − b) + b n x̄ )² }

 = |Σ_0|^{−1/2} exp{ −½ x'Σ_0^{−1}x } [ r + ((1 − r)/n)(1 + τ²a)^{−1/2} Σ_{i=1}^n exp{ (τ²/(2(1 + τ²a))) ( x_i(a − b) + b n x̄ )² } ] .

Note that a, b, Σ_0, and x'Σ_0^{−1}x do not depend on τ², and the result follows.

For simplicity, denote L_n = (1/√(1 + τ²a)) (1/n) Σ_{i=1}^n exp{ (τ²/(2(1 + τ²a))) ( x_i/(1 − ρ) + b n x̄ )² } ; then

τ̂_n² = argmax_{τ²} L̃_n(τ²) = argmax_{τ²} L_n(τ²)
 = argmax_{τ²} √( (1 − ρ)/(1 − ρ + τ²) ) (1/n) Σ_{i=1}^n exp{ τ² z_i²/(2(1 − ρ + τ²)) } (1 + o(1)) ,  by (2.25).

The following theorem gives the Type II ML estimator and the corresponding FPP:

Theorem 2.6.2 (Type II ML false positive probability). Given null model prior probability r, correlation ρ, and decision threshold p, as n Ñ 8:

P(false positive | null model, τ̂_n²) = (1/log n) ( 1/k* − 1/2 ) (1 + o(1)) ,

where k* satisfies

−2 log( √π ( 1/k* − 1/2 ) ) = log k* + 2 log( p/((1 − p)(1 − r)) ) + 2/k* .   (2.15)

Proof. Without loss of generality, assume max_i z_i² = z_1². By the model selection criterion (2.14), z_1 is a false positive if:

1 ´ ρ p 1 ´ ρ ` τˆ2 z2 ě 2 1 ` log n n ` op1q 1 τˆ2 p1 ´ pqp1 ´ rq 1 ´ ρ ˆ n ˙ ˆ d ˙ 1 ´ ρ ` τˆ2 p “ 2 log n ` log n ` 2 log 1 ´ ρ p1 ´ pqp1 ´ rq (2.16) ˆ ˙ ˆ ˙ u loooooooooooooomoooooooooooooon log n 1 ´ ρ 1 ´ ρ ` τˆ2 p1 ´ ρq ` 2p1 ´ ρq ` log n ` u ` op1q , τˆ2 τˆ2 1 ´ ρ τˆ2 n n ˆ ˙ n 2 2 2 1 ´ ρ τˆnz1 lim Lnpτˆnq “ lim 1{n exp ` nÑ8 nÑ8 1 ´ ρ ` τˆ2 2p1 ´ ρ ` τˆ2q ˜ d n " n ˙* I loooooooooooooooooooooooooomoooooooooooooooooooooooooon 1 ´ ρ n τˆ2z2 1{n exp n i p1 ` op1qq . 1 ´ ρ ` τˆ2 2p1 ´ ρ ` τˆ2q d n i“2 n ¸ ÿ " * II

2 loooooooooooooooooooooooooooomoooooooooooooooooooooooooooon τˆn 2 If log n Ñ 0, then Lnpτˆnq Ñ 1 since

z_1² = 2 log n + (2(1 − ρ)/τ̂_n²) log n + log τ̂_n² + u + o(1) ,

30 1 τˆ2 ` 1 ´ ρ τˆ2 logτ ˆ2 τˆ2 u I “ exp n log n ` n n ` n p1 ` op1qq n τˆ2 1 ´ ρ ` τˆ2 2p1 ´ ρ ` τˆ2q 2p1 ´ ρ ` τˆ2q n " n n n * a 2 2 ´p1´ρq{2{p1´ρ`τˆnq “ Oppτˆnq q “ op1q since

1´ρ ´ 2 ´p1 ´ ρq 2 2p1´ρ`τˆnq 2 logpτˆnq “ 2 logτ ˆn Ñ 0 . 2p1 ´ ρ ` τˆnq

II Ñ 1 by Corollary2 .10.7.

2 τˆn 2 u{2 If log n Ñ 8, Lnpτˆnq Ñ e since

z_1² = 2 log n + log τ̂_n² + u + o(1) ,

τˆ2 log n τˆ2 logτ ˆ2 τˆ2 u I “ pτˆ2q´1{2{n exp n ` n n ` n p1 ` op1qq n 1 ´ ρ ` τˆ2 2p1 ´ ρ ` τˆ2q 2p1 ´ ρ ` τˆ2q " n n n *

uτˆ2 n ´ 1´ρ ´ 1´ρ 2p1´ρ`τˆ2q 2 2p1´ρ`τˆ2q 2p1´ρ`τˆ2q “ e n pτˆnq n n n

“ eu{2 ` op1q since

´ 1´ρ 2 2p1´ρ`τˆ2q ´p1´ρq 2 logpτˆ q n “ 2 logτ ˆ Ñ 0, n 2p1´ρ`τˆnq n 2 $ ´p1´ρq{p1´ρ`τˆnq 1´ρ log pn q “ ´ 2 log n Ñ 0. & 1´ρ`τˆn

II Ñ 0 by Corollary% 2 .10.7.

There exist ε > 0 and k ∈ (0, ∞) such that lim_{n→∞} L_n( (1 − ρ) k log n ) > max{1, e^{u/2}} + ε, since z̃² = 2 log n + log log n + log k + u + 1/k + o(1) by (2.16), and

1 k log n log log n u 1 I “ ? exp log n ` ` ` p1 ` op1qq n k log n 1 ` k log n 2 2 k " * ` ˘ ´1{p1`k log nq “ n´1{p1`k log nq k log n eu{2e1{kp1 ` op1qq a logppk log nq´1{p2p1`k log nqqq “ ´ log log n Ñ 0 , eu{2 since 2 log n Ñ ´1{p1`k log nq ´log n #log pn q “ 1`k log n Ñ ´1{k .

II → 2Φ(√(2/k)) − 1 by Corollary 2.10.7.

Choose ε = 0.5 min{1, e^{u/2}}. Because (1) I + II → e^{u/2} + 2Φ(√(2/k)) − 1 and (2) lim_{k→0} [2Φ(√(2/k)) − 1] = 1, there exists k such that I + II > max{1, e^{u/2}} + ε. Also, k must be bounded away from zero and infinity, since otherwise it would violate the conclusions from the previous two cases. Hence τ̂_n² ∈ (k_1 log n, k_2 log n), where 0 < k_1 < k_2 < ∞.

To find the exact rejection region, we solve for the critical value in (2.16), defined by exact equality holding; denote this critical value by z̃². Then

For each c, by Lemma 2.10.9,

2 τˆn “ p1 ´ ρqkpcq log np1 ` op1qq , ´1 1 ?1 c #kpcq “ 2 ` π exp ´ 2 . ` ( ˘ 2 2 maximizes Ln With suchτ ˆn, by (2.16),z ˜ “ 2 log n ` log log n ` plog k ` 2 log u ` 2{kq, c also needs to satisfy c “ log k ` 2 log u ` 2{k. Therefore, the boundary region is defined, if it exists, by

c˚ “ log k˚ ` 2 log u ` 2{k˚ (2.17) 1 exp t´c˚{2u exp t´c˚{2u “ ´log ` ? ` 2 log u ` 1 ` 2 ? , 2 π π ˆ ˙

where c˚, k˚ are the c, k when the boundary condition holds. Since

? c˚pk˚q “ ´2 log π 1{k˚ ´ 1{2 , ` ` ˘˘ 2 2 p2.15q ô p2.17q. With the above exact ratesτ ˆn, z1, one can compute the FPP similar to Theorem 2.5.9:

32 2 P pfalse positive | τˆn, p, rq

n 1 ρ p 1 ρ τˆ2 ?1 exp ´ 1 2 1` ´ log n ´ ` `op1q 2π $ 2 τˆ2 1´r 1´p 1´ρ , $ $ & ˆ c ˙ . ,, ` ˘ “ 1 ´ ’1 ´ 2 ’ % 1´ρ np 1´ρ`τˆ2 - // ’ ’ 2 1` 2 log `op1q // ’ ’ d τˆ p1´rqp1´pq 1´ρ // &’ &’ c ././ `1 ˘ ` ˘ `o n p log nq2 ’ ’ // ’ ’ ˆ ˙ // ’ ’ // n % % ´ 1` 1 -- ? k˚ log n np 1 ` k˚ log n p1 ` op1qq $ p1´pqp1´rq ` ˘ , “ 1 ´ ’1 ´ ˆ ˙ / ’ ? / &’ 1 np ˚ ./ π 1 ` k˚ log n log p1´rqp1´pq 1 ` k log n ` op1q d ’ ˆ ˙ / ’ ` ˘ / %’ -/ n

´1 $ 1`k˚ log n , p1´pqp1´rq p n 1 ` k˚ log n ’ p p1´pqp1´rq / ’ / “ 1 ´ ’1 ´ 1{n ˆ ˙ˆ ˆ ˙˙ p1 ` op1qq/ ’ ? / &’ ˚ 1 np ˚ ./ π k log n ` 2 ` k˚ log n log p1´pqp1´rq 1 ` k log n d ’ ˆ ˙ / ’ ` ˘ / ’ dn / ’ / ’ loooooooooooooooooooooooooooooooooooooooomoooooooooooooooooooooooooooooooooooooooon / % 1 - “ dnp1 ` op1qq ` O plog nq2 ˆ ˙ 1 ´1 p1 ´ pqp1 ´ rq 1 “ ? exp p1 ` op1qq πk˚ k˚ p log n " * ˆ ˙ 1 ? 1 1 p p1 ´ pqp1 ´ rq 1 “ ? πk˚ ´ p1 ` op1qq by (2.17) πk˚ k˚ 2 p1 ´ pqp1 ´ rq p log n ˆ ˙ ˆ ˙ 1 1 1 “ ´ p1 ` op1qq . log n k˚ 2 ˆ ˙

Note that (2.15) can be solved numerically. When p = 1/3 and r = 0.5, k* ≈ 1.37 and c* ≈ 1.78. The solution 1/k* − 1/2 as a function of p/((1 − r)(1 − p)) is given in Figure 2.6. The Type II ML FPP converges to 0 at a logarithmic rate in n, which is much less conservative than the Bayesian procedure with fixed τ². In addition, the asymptotic FPP does not depend on ρ.
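Since the left side of (2.15) is increasing in k* and the right side is decreasing on (1, 2), the equation has a unique root there, which bisection finds easily. The sketch below is ours (the bracket [1.01, 1.99] reflects the constraint 1/k − 1/2 > 0, i.e. k < 2) and reproduces k* ≈ 1.37 and c* ≈ 1.78 for p = 1/3, r = 0.5.

```python
import math

def solve_kstar(p, r):
    """Solve (2.15): -2*log(sqrt(pi)*(1/k - 1/2)) = log k + 2*log(p/((1-p)(1-r))) + 2/k."""
    g = p / ((1 - p) * (1 - r))
    def f(k):  # LHS - RHS of (2.15); strictly increasing on (1, 2)
        return (-2 * math.log(math.sqrt(math.pi) * (1 / k - 0.5))
                - math.log(k) - 2 * math.log(g) - 2 / k)
    lo, hi = 1.01, 1.99
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    cstar = math.log(k) + 2 * math.log(g) + 2 / k  # c* = log k* + 2 log(...) + 2/k*
    return k, cstar

kstar, cstar = solve_kstar(p=1/3, r=0.5)
```

At the root, 1/k* − 1/2 = e^{−c*/2}/√π holds by construction, which is the relation used in the proof of Theorem 2.6.2.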

Remark 2.6.3. Figure 2.4 provides the simulated (red curve) and theoretical (blue curve) false positive probability (FPP) with respect to the number of hypotheses n. As expected, the simulated results match the theoretical prediction, the rate of convergence being around 1/√(log n). Note that the FPP does not become extremely small even for very large n.

[Figure 2.4 panel: FPP with respect to n, with c* = 1.7761 and k* = 1.3658; curves shown are the simulated FPP, (1/log n)(1/k* − 1/2), and 1 − (1 − d_n)^n.]

Figure 2.4: Comparison of the simulated FPP and its asymptotic approximation for p = 1/3, r = 0.5, ρ = 0, as n varies from 10¹ to 10⁸ (x-axis on a logarithmic scale).

Remark 2.6.4. Figure 2.5 demonstrates how the threshold p (Definition 2.5.7) varies as a function of the sample size, given that the false positive probability (FPP) is fixed at 0.05. Due to the multiplicity correction in (2.8), the posterior probability of each non-null model is penalized by 1/n, resulting in higher posterior probability of the null model and lower FPP. Therefore, to reach the same FPP, a higher acceptance rate, i.e., a lower threshold probability, is needed, as seen in the downward trend in this figure.


Figure 2.5: Threshold probability p versus the number of hypotheses n, for fixed FPP (= 0.05) and prior probability of the null model (= 0.5).

Remark 2.6.5. Figure 2.6 shows the value of 1/k* − 1/2 for different p/((1 − r)(1 − p)). For fixed dimension n, a larger threshold p or larger prior probability of the null model r gives a smaller false positive probability.


Figure 2.6: Solution 1/k* − 1/2 (y-axis) versus p/((1 − p)(1 − r)) (x-axis).

2.7 Power analysis for the Type II ML Bayesian procedure

The FPP in the previous section was computed under the assumption that the null model is true. For power analysis, we assume the alternative model M_i is true, so that θ_i is nonzero. Figure 2.7 shows the numerical solution for the Type II ML τ̂², as a function of θ, for the specified scenario.


Figure 2.7: τ̂² versus θ_i for fixed n = 10⁴, r = 0.5, ρ = 0, and p = 1/3. Each point is the average τ̂² among 10⁴ independent draws from the multivariate normal, with the constraint τ² > 1.

Figure 2.8 demonstrates how the detection power (computed numerically) varies as the signal size increases, for the specified situation.


Figure 2.8: Power versus θ_i for fixed n = 10⁴, r = 0.5, ρ = 0, p = 1/3. Each point is the average acceptance rate of the true non-null model with respect to θ in the x-axis range.

Figure 2.9 shows that, when the number of hypotheses grows, the size of the signal (θi) must also increase to guarantee the same power.


Figure 2.9: θ_i versus n for fixed power 0.95 (top) and 0.75 (bottom), ρ = 0, p = 1/3.

2.8 Analysis as the information grows

In this section, we generalize model (2.4) to the scenario where each channel has m i.i.d. observations. Then the sample mean satisfies

X̄ ∼ multinorm( (θ_1, θ_2, …, θ_n)^T , (1/m) [ 1 ρ ⋯ ρ ; ρ 1 ⋯ ρ ; ⋮ ; ρ ρ ⋯ 1 ] ) .   (2.18)

Hence, m can be seen as the precision of X̄. More generally, we will replace 1/m by a

general function σ_n², where σ_n² decreases to zero as n grows. The theorem below gives the rate of decrease that guarantees consistency.

Theorem 2.8.1. Consider model (2.4) with the following covariance matrix:

X ∼ multinorm( (θ_1, θ_2, …, θ_n)^T , σ_n² [ 1 ρ ⋯ ρ ; ρ 1 ⋯ ρ ; ⋮ ; ρ ρ ⋯ 1 ] ) .   (2.19)

When σ_n² log n → 0, consistency holds for both the null and alternative models.

When σ_n² log n → d ∈ (0, ∞):

Under M_0: P(M_0 | X) → ( 1 + ((1 − r)/r) [ 2Φ( √((1 − ρ)d)/τ ) − 1 ] )^{−1} .

Under an alternative model M_j: if d ∈ (0, θ_j²/(2(1 − ρ))), consistency holds for M_j, whereas consistency does not hold otherwise.

When σ_n² log n = o(log n) and σ_n² log n → ∞, consistency does not hold for any model. In addition, when the null hypothesis is true,

P pM0 | Xq Ñ P pM0q .

Proof. Write σ_n² = d_n/log n, X_i* = X_i/σ_n, and θ_i* = θ_i/σ_n. Then a nonzero θ_i* has prior N(0, τ²/σ_n²), and the generalized model (2.19) becomes

X* ∼ multinorm( (θ_1*, θ_2*, …, θ_n*)^T , [ 1 ρ ⋯ ρ ; ρ 1 ⋯ ρ ; ⋮ ; ρ ρ ⋯ 1 ] ) .

39 Under the null model: by Theorem 2.3.1,

´1 P pM0 | xq

1 ´ r σ2 “ 1 ` n ˚ n r σ2 ` τ 2a ˆ ˙d n n

n 2 ? 2 τ zi ρz ? exp ? ` ` bnn 1 ´ ρz¯ ` bnn ρz 2pσ2 ` τ 2a q 1 ´ ρ 1 ´ ρ i“1 # n n ˆ ˙ + ÿ a 1 ´ r σ2 n τ 2 z2 1 “ 1 ` n exp i 1 ` Op? q nr σ2 ` τ 2{p1 ´ ρq 2pσ2 ` τ 2{p1 ´ ρqq p1 ´ ρq n d n i“1 n ˆ ˙ ÿ " * ˆ ˙ n 1 ´ r ? cn ? “ 1 ` 1 ´ c exp z2 1 ` Op1{ nq , nr n 2 i i“1 ! ) ` ˘ ÿ ` ˘ (2.20) where

a_n = 1/(1 − ρ) − ρ/((1 − ρ)(1 + (n − 1)ρ)) = 1/(1 − ρ) + O(1/n) ,
n b_n = −1/(1 − ρ) + (1 − ρ)/((1 − ρ)(1 + (n − 1)ρ)) = −1/(1 − ρ) + O(1/n) ,
c_n = τ²/((1 − ρ)σ_n² + τ²) = τ²/((1 − ρ) d_n/log n + τ²) .

Since d_n = o(log n), 1 − c_n = o(1), and 0 < c_n < 1, one can apply Theorem 2.10.6 to get the asymptotic analysis of (2.20):

2p1 ´ cnq n 2Φ log ? ´ 1 c 1 ´ c ˆ n n ˙

p1 ´ ρqσ2 p1 ´ ρqσ2 ` τ 2 “ n log n n τ 2 p1 ´ ρqσ2 ˆ d n ˙

p1 ´ ρqdn log n “ log n ` Op1q τ 2 log n d „ ˆ c n ˙ 

1 ´ ρ 1 dn log n “ d ` log p1 ` Op1qq τ 2 n 2 log n d „ ˆ ˙ ˆ n ˙

→ { ∞ if d_n → ∞ ; ((1 − ρ)/τ²) d if d_n → d ; 0 if d_n → 0 } .


Hence, under the null hypothesis,

P(M_0 | X) → { P(M_0) if d_n → ∞ ; ( 1 + ((1 − r)/r)( 2Φ( √((1 − ρ)d)/τ ) − 1 ) )^{−1} if d_n → d ; 1 if d_n → 0 } .

Under the alternative model M_j: by Theorem 2.3.1,

2 2 θ2 ´1 anτ n r ´τ j 1 P pMj | xq “ 1 ` exp ` Op q σ2 1 ´ r 2pσ2 ` a τ 2q σ2p1 ´ ρq2 σ d n ˆ ˙ # n n „ n n +

n 2 2 ´τ θj 1 ` 1 ` exp 2 2 2 2 ` Op q , 2pσ ` anτ q 2p1 ´ ρq σ σn k‰j # n n + ÿ „  the first term:

2 2 2 τ {p1 ´ ρq n r ´τ θj log n 1 ` exp 1 ` O σ2 1 ´ r 2pσ2 ` τ 2{p1 ´ ρqq p1 ´ ρq2σ2 d d n ˆ ˙ # n „ n ˙+ ˆ ˆc n ˙˙

θ2 log n r 1´ j log n “ τ 2{p1 ´ ρq n 2p1´ρqdn 1 ` O d 1 ´ r d c n ˆ ˙ ˆ ˆc n ˙˙ a 2 θj Ñ 0 if and only if lim dn ă . nÑ8 2p1 ´ ρq

And the last term:

n 2 2 ´τ θj log n exp 2 2 2 2 1 ` O 2pσ ` τ {p1 ´ ρqq p1 ´ ρq σ dn k‰j # n n + c ÿ „ ˙ ˆ ˆ ˙˙ 2 θj log n log n “ n exp ´ 1 ` O 2p1 ´ ρq d d # n + ˆ ˆc n ˙˙

θ2 1´ j log n “ n 2p1´ρqdn 1 ` O d ˆ ˆc n ˙˙ 2 θj Ñ 0 when lim dn ă . nÑ8 2p1 ´ ρq

Interestingly, Theorem 2.5.5 can be seen as the limiting case of Theorem 2.8.1 with σ_n² ≡ 1. And this result is true in the more general model selection problem.

2.9 Conclusions

We have studied the comparison of Bayesian and frequentist multiplicity correction under a scenario of data dependence, where there is (at most) one true hypothesis but the number of potential hypotheses grows. We first discussed the surprising result that, as the correlation ρ → 1, the likelihood ratio test and the Bayesian procedure have similar performance, both choosing the correct model with certainty when n > 2. In addition, when n → ∞, the Bayesian procedure with fixed τ² has false positive probability converging to 0 at a polynomial rate in n. For the Type II ML procedure with the default p = r = 0.5 and typical large (but not extreme) n, the FPP ranges between 0.02 and 0.04. The suggestion from this example is that the Bayesian and Type II ML procedures have very strong control of FWER while retaining excellent power for detecting a signal, even with highly dependent data.

2.10 Appendix

2.10.1 Normal Theory

Lemma 2.10.1.

X ∼ multinorm( (θ_1, θ_2, …, θ_n)^T , [ 1 ρ ⋯ ρ ; ρ 1 ⋯ ρ ; ⋮ ; ρ ρ ⋯ 1 ] )

is equivalent to

X_i = θ_i + √ρ Z + √(1 − ρ) Z_i  ∀ i ∈ {1, 2, …, n} ,   (2.21)

where Z, Z_1, …, Z_n are i.i.d. N(0, 1). Furthermore, if θ_j = 0 ∀ j, then as n → ∞,

x̄/√ρ = z + O(1/√n) ,
(x_i − x̄)/√(1 − ρ) = z_i + O(1/√n) .   (2.22)

Proof. The expectation and covariance of (2.21) can be derived straightforwardly:

E(X_i) = θ_i ,  Cov(X_i, X_j) = { 1 if i = j ; ρ if i ≠ j } ,

and (2.22) follows by definition.
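The representation (2.21) is also how the simulations in this chapter can generate equicorrelated data cheaply. A quick empirical check of the covariance structure (ours; sample size and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, N = 4, 0.3, 200_000

Z = rng.standard_normal(N)            # shared component, one per replication
Zi = rng.standard_normal((N, n))      # idiosyncratic components
X = np.sqrt(rho) * Z[:, None] + np.sqrt(1 - rho) * Zi   # theta = 0, as in (2.21)

C = np.cov(X, rowvar=False)           # should be close to the equicorrelation matrix
```

With 2 × 10⁵ replications the Monte Carlo error is a few thousandths, so the empirical covariance matches the unit diagonal and off-diagonal ρ well within 0.02.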

Fact 2.10.2 (Normal tail probability). Let Φ(t) be the cumulative distribution function of the standard normal; then, for t > 0,

( t/(t² + 1) ) (1/√(2π)) e^{−t²/2} ≤ 1 − Φ(t) ≤ (1/√(2π)) e^{−t²/2}/t ,

1 − Φ(t) = (1/√(2π)) e^{−t²/2}/t + O( e^{−t²/2}/t³ ) .
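Fact 2.10.2 can be sanity-checked with the standard-library error function, since 1 − Φ(t) = ½ erfc(t/√2). This numerical check is ours:

```python
import math

def upper(t):
    # Mills-ratio upper bound: phi(t)/t
    return math.exp(-t * t / 2) / (math.sqrt(2 * math.pi) * t)

def lower(t):
    # Mills-ratio lower bound: t/(t^2+1) * phi(t)
    return t / (t * t + 1) * math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def tail(t):
    # exact normal tail 1 - Phi(t) via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))
```

The bounds sandwich the exact tail for every t > 0, and both become sharp as t grows — which is what drives the (log n)^{−1/2} factors in the FPP rates above.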

A proof can be found in Durrett (2010). Expanding a and b in Theorem 2.3.1, one obtains the following explicit form of the posteriors:

Corollary 2.10.3. Write D = (1 − ρ + τ²)(1 + (n − 1)ρ) − τ²ρ. The posterior of any non-null model M_i is

P(M_i | x) = { √( D/((1 + (n − 1)ρ)(1 − ρ)) ) (nr/(1 − r)) exp[ −(τ²/2) ((1 + (n − 1)ρ)/(D(1 − ρ))) ( x_i − x̄ + (1 − ρ)x̄/(1 + (n − 1)ρ) )² ]
  + Σ_{k=1}^n exp[ −(τ²/2) ((1 + (n − 1)ρ)/(D(1 − ρ))) ( (x_i + x_k − 2x̄)(x_i − x_k) + 2x̄(x_i − x_k)(1 − ρ)/(1 + (n − 1)ρ) ) ] }^{−1} .   (2.23)

And in terms of z = z(x), z_i = z_i(x):

P(M_i | x) = { √( D/((1 + (n − 1)ρ)(1 − ρ)) ) (nr/(1 − r)) exp[ −(τ²/2) ((1 + (n − 1)ρ)/D) ( (θ_i − θ̄ + √(1 − ρ)(z_i − z̄))/√(1 − ρ) + √(1 − ρ)(θ̄ + √ρ z + √(1 − ρ) z̄)/(1 + (n − 1)ρ) )² ]
  + Σ_{k=1}^n exp[ −(τ²/2) ((1 + (n − 1)ρ)/D) ( [θ_i + θ_k − 2θ̄ + √(1 − ρ)(z_i + z_k − 2z̄)][θ_i − θ_k + √(1 − ρ)(z_i − z_k)]/(1 − ρ)
   + 2[θ̄ + √ρ z + √(1 − ρ) z̄][θ_i − θ_k + √(1 − ρ)(z_i − z_k)]/(1 + (n − 1)ρ) ) ] }^{−1} .   (2.24)

Lemma 2.10.4. Suppose i ∈ {1, 2, …, n} and the Z_i are i.i.d. standard normal; then, for any ε ∈ (0, 1/2),

|Z_i| ≤ n^{1/2−ε} for all i

holds almost surely for all sufficiently large n.

Proof. By Fact 2.10.2:

P( |Z_i| ≤ n^{1/2−ε} for all i ) = ( 1 − P( |Z_1| ≥ n^{1/2−ε} ) )^n

 = ( 1 − 2 [ (1/√(2π)) n^{−(1/2−ε)} exp{ −n^{1−2ε}/2 } + O( n^{−3(1/2−ε)} exp{ −n^{1−2ε}/2 } ) ] )^n

 = 1 − O( 2 n^{1/2+ε} exp{ −n^{1−2ε}/2 } ) = 1 + o(1) .

2.10.2 Type II MLE

Fact 2.10.5 (Weak law for triangular arrays). For each n, let X_{n,k}, 1 ≤ k ≤ n, be independent. Let β_n > 0 with β_n → ∞, and let X̄_{n,k} = X_{n,k} 1{|X_{n,k}| ≤ β_n}. Suppose that, as n → ∞,

Σ_{k=1}^n P( |X_{n,k}| > β_n ) → 0  and  (1/β_n²) Σ_{k=1}^n E X̄_{n,k}² → 0 .

Then

(S_n − α_n)/β_n → 0 in probability,

where S_n = X_{n,1} + … + X_{n,n} and α_n = Σ_{k=1}^n E X̄_{n,k}.

See Durrett (2010) for a proof.

Theorem 2.10.6. If c_n ∈ (0, 1) for all n and 1 − c_n = o(1), then

lim_{n→∞} (1/n) √(1 − c_n) Σ_{i=1}^n exp{ (c_n/2) z_i² } = lim_{n→∞} [ 2Φ( √( (2(1 − c_n)/c_n) log( n/√(1 − c_n) ) ) ) − 1 ]

in probability.

Proof. Take X_{n,i} = exp{ (c_n/2) z_i² } and β_n = n/√(1 − c_n) in Fact 2.10.5.

The first assumption:

2 n P p|Xn,i| ą βnq “ P |zi| ą log ? c 1 ´ c ˆ d n n ˙

1 ´1 2 n n ´ 1 exp log ? p ? q cn 2π 2 cn 1´cn 1 c “ 2 ` O ´ n !2 n ) p 1 log ? n q3 log ? cn 1 c cn 1´cn ˆ ´ n ˙ b ? 1 1 1 ´ cn cn 1 “ ? p1 ` op1qq 1 n π n cn log ? 1´cn b ? 1 1 1 ´ cn cn 1 ă ? 1 ? p1 ` op1qq . π n cn log n

Therefore,

n P p|Xn,k| ą βnq “ nP p|Xn,k| ą βnq i“1 ÿ 1´ 1 1 1 ă n cn p1 ´ c q 2cn ? n log n

´ 1´cn 1 1 “ n cn p1 ´ c q 2cn ? Ñ 0 n log n

The second assumption: since lim_{n→∞} c_n = 1, without loss of generality assume c_n > 3/4.

1 n EX¯ 2 β2 n,k n k“1 ÿ

1 ´ cn 2 1 ´1 2 “ n exp cnz ? exp z dz n2 2π 2 ż " * |z|ă 2 log ? n cn 1´cn ( b

1 ´ cn 1 1 2 1 2 “ ? $ exp pcn ´ qz dz ` exp pcn ´ qz dz, n 2π ’ 2 2 / ’ ż " * ż " * / &1ă|z|ă 2 log ? n |z|ă1 . cn 1´cn b ’ / %’ -/

1 ´ cn 1 1 2 ď ? $ z exp pcn ´ qz dz ` d, n 2π ’ 2 / ’ ż " * / &1ă|z|ă 2 log ? n . cn 1´cn b ’ / %’ -/ 1 ´ cn 1 1 1 2 n 1 “ ? 1 exp pcn ´ qp q log ? ` d n 2π 2pc ´ q 2 cn 1 ´ cn # n 2 " * +

2´ 1 1 1 1 ´ cn n cn “ ? ? ` op1q 2π 2c ´ 1 n 1 ´ c ˆ n ˙ ˆ n ˙

1 1 1´ 1 1 “ ? n cn p1 ´ cnq 2cn ` op1q 2π 2c ´ 1 ˆ n ˙ ď 2 looooomooooon 2 ´ 1´cn 1 ď ? n cn p1 ´ cnq 2cn ` op1q 2π

“ op1q .

Therefore, both assumptions hold. The limit is
$$
\frac{\sqrt{1-c_n}}{n}\,\alpha_n = \frac{\sqrt{1-c_n}}{n}\sum_{i=1}^n E\bar X_{n,i}
= \sqrt{1-c_n}\int_{|z|<\sqrt{\frac{2}{c_n}\log\frac{n}{\sqrt{1-c_n}}}} e^{\frac{c_n z^2}{2}}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}\,dz
= 2\,\Phi\!\left(\sqrt{\frac{2\log\frac{n}{\sqrt{1-c_n}}}{c_n/(1-c_n)}}\right) - 1 ,
$$
and, by Fact 3.5.2,
$$
\frac{S_n - \alpha_n}{\beta_n} = \frac{\sqrt{1-c_n}}{n}\sum_{i=1}^n e^{\frac{c_n z_i^2}{2}} - \frac{\sqrt{1-c_n}}{n}\,\alpha_n \to 0
$$
in probability, and the result follows. $\square$

Corollary 2.10.7. Let $c_n = \frac{\hat\tau_n^2}{1-\rho+\hat\tau_n^2}$. Then
$$
\frac{1}{n}\sum_{i=1}^n \sqrt{1-c_n}\,\exp\Big\{\frac{c_n z_i^2}{2}\Big\} \to
\begin{cases}
1 & \text{if } \frac{\log n}{\hat\tau_n^2} \to \infty,\\[4pt]
2\,\Phi\big(\sqrt{2/k}\big) - 1 & \text{if } \frac{\log n}{\hat\tau_n^2} \to \frac{1}{(1-\rho)k},\\[4pt]
0 & \text{if } \frac{\log n}{\hat\tau_n^2} \to 0,
\end{cases}
$$
in probability.

Proof. By Theorem 2.10.6, with $1-c_n = \frac{1-\rho}{1-\rho+\hat\tau_n^2}$ and $\frac{1-c_n}{c_n} = \frac{1-\rho}{\hat\tau_n^2}$,
$$
\frac{\sqrt{1-c_n}}{n}\,\alpha_n \Rightarrow 2\,\Phi\!\left(\sqrt{\frac{2(1-\rho)}{\hat\tau_n^2}\log\!\left(n\sqrt{\frac{1-\rho+\hat\tau_n^2}{1-\rho}}\right)}\right) - 1 .
$$
Case I: if $\log n/\hat\tau_n^2 \to \infty$, the argument of $\Phi$ tends to $\infty$ and the limit is $1$.

Case II: if $\log n/\hat\tau_n^2 \to \frac{1}{(1-\rho)k}$, the argument tends to $\sqrt{2/k}$ and the limit is $2\,\Phi(\sqrt{2/k}) - 1$.

Case III: if $\log n/\hat\tau_n^2 \to 0$, the argument tends to $0$ and the limit is $0$. $\square$

Lemma 2.10.8.
$$
\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\sqrt{\frac{1}{1+\tau_n^{2}a}}\,
\exp\bigg\{\frac{\tau_n^{2}}{2(1+\tau_n^{2}a)}\Big(\frac{x_i^{2}}{1-\rho}+b_n\bar x^{2}\Big)\bigg\}
=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\sqrt{\frac{1-\rho}{1-\rho+\tau_n^{2}}}\,
\exp\bigg\{\frac{\tau_n^{2}z_i^{2}}{2(1-\rho+\tau_n^{2})}\bigg\}\quad a.s. \tag{2.25}
$$
Proof. Expand the coefficients:
$$
\frac{1}{1+\tau_n^{2}a}
=\Big(1+\frac{\tau_n^{2}(1+(n-2)\rho)}{(1+(n-1)\rho)(1-\rho)}\Big)^{-1}
=\frac{1-\rho}{1-\rho+\tau_n^{2}\big(1-\frac{\rho}{1+(n-1)\rho}\big)}
=\frac{1-\rho}{1-\rho+\tau_n^{2}}\,(1+o(1)) ,
$$
and
$$
\frac{x_i^{2}}{1-\rho}+b_n\bar x^{2}
=\frac{1}{(1-\rho)^{2}}\Big(x_i-\frac{\rho n\bar x}{1+(n-1)\rho}\Big)^{2}
=\frac{1}{(1-\rho)^{2}}\Big(x_i-\bar x\Big(1-\frac{1-\rho}{1-\rho+\rho n}\Big)\Big)^{2}
$$
$$
=\frac{1}{(1-\rho)^{2}}\Big(\sqrt{1-\rho}\,(z_i-\bar z)+\big(\sqrt{\rho}\,z+\sqrt{1-\rho}\,\bar z\big)\,O(1/n)\Big)^{2}
=\frac{z_i^{2}}{1-\rho}+O\big((\log n)/\sqrt n\big) ,
$$
using $\bar z = O(\sqrt{(\log n)/n})$ and $\max_i |z_i| = O(\sqrt{\log n})$ almost surely. Therefore,
$$
\frac{1}{n}\sum_i\sqrt{\frac{1}{1+\tau_n^{2}a}}\,\exp\bigg\{\frac{\tau_n^{2}}{2(1+\tau_n^{2}a)}\Big(\frac{x_i^{2}}{1-\rho}+b_n\bar x^{2}\Big)\bigg\}
=\frac{1}{n}\sum_i\sqrt{\frac{1-\rho}{1-\rho+\tau_n^{2}+o(1)}}\,\exp\bigg\{\frac{\tau_n^{2}\,[z_i^{2}+o(1)]}{2(1-\rho+\tau_n^{2})}\bigg\} . \qquad\square
$$

Lemma 2.10.9 (Type II ML estimator). Under the null model, suppose

$$
\max_j \left(\frac{x_j-\bar x}{\sqrt{1-\rho}}\right)^2 = 2\log n + \log\log n + c ;
$$
then
$$
\operatorname*{arg\,max}_{\hat\tau_n^2 \in \{k\log n,\; k\in\mathbb{R}^+\}} L(\hat\tau_n^2) = (1-\rho)\,k(c)\log n ,
$$
where $k(c) = \big(\tfrac12 + \tfrac{1}{\sqrt\pi}\exp\{-\tfrac{c}{2}\}\big)^{-1}$.

Proof. Without loss of generality, $\max_i |z_i| = |z_1|$. By (2.25),

$$
\max_{\tau_n^{2}} L(\tau_n^{2})
=\max_{\tau_n^{2}}\frac{1}{n}\sum_i\sqrt{\frac{1-\rho}{1-\rho+\tau_n^{2}}}\,\exp\bigg\{\frac{\tau_n^{2}z_i^{2}}{2(1-\rho+\tau_n^{2})}\bigg\}(1+o(1))
=\max_{c_n}\frac{\sqrt{1-c_n}}{n}\sum_i\exp\Big\{\frac{c_nz_i^{2}}{2}\Big\}(1+o(1))
$$
$$
=\max_{c_n}\Bigg\{\underbrace{\frac{\sqrt{1-c_n}}{n}\exp\Big\{\frac{c_nz_1^{2}}{2}\Big\}}_{O(1)}
+\bigg(2\,\Phi\Big(\sqrt{\tfrac{2(1-c_n)}{c_n}\log\tfrac{n}{\sqrt{1-c_n}}}\Big)-1\bigg)\Bigg\}(1+o(1)) ,
$$
where the last equality holds by Corollary 2.10.7.

The next step is to find the critical points of this function; define $f$, $g$, $h$ by
$$
f(c_n) = \underbrace{\frac{\sqrt{1-c_n}}{n}\,e^{c_n z_1^2/2}}_{g} + \underbrace{2\,\Phi\!\left(\sqrt{\frac{2(1-c_n)}{c_n}\log\frac{n}{\sqrt{1-c_n}}}\right) - 1}_{h} .
$$
Since $c_n = c(\tau_n^2) = \frac{\tau_n^2}{1-\rho+\tau_n^2}$ is monotone, maximizing (2.25) is equivalent to finding $\operatorname*{arg\,max}_{c_n} f(c_n)$:

with $\tau_n^2 = k\log n$ (so that $1-c_n = \frac{1-\rho}{1-\rho+k\log n}$),
$$
\frac{dg}{dc_n} = \frac12\left(\frac{2(1-\rho)}{k}-1\right)\sqrt{\frac{k}{1-\rho}}\;\log n\;\exp\Big\{\frac{c}{2}\Big\}\exp\Big\{-\frac{1-\rho}{k}\Big\}(1+o(1)) ,
$$
$$
\frac{dh}{dc_n} = -\frac{1}{\sqrt{\pi}}\exp\Big\{-\frac{1-\rho}{k}\Big\}\,\sqrt{\frac{1-\rho+k\log n}{1-\rho}\,\log n}\;(1+o(1))
= -\frac{1}{\sqrt{\pi}}\sqrt{\frac{k}{1-\rho}}\,\log n\,\exp\Big\{-\frac{1-\rho}{k}\Big\}(1+o(1)) .
$$
Thus $\frac{df}{dc_n} = 0$ if and only if $k$ satisfies
$$
\frac12\left(\frac{2(1-\rho)}{k}-1\right)\exp\Big\{\frac{c}{2}\Big\} = \frac{1}{\sqrt\pi}\,(1+o(1)) ,
\qquad\text{i.e.}\qquad
\frac{2(1-\rho)}{k} - 1 = \frac{2}{\sqrt\pi}\exp\Big\{-\frac{c}{2}\Big\}(1+o(1)) .
$$
Therefore,
$$
k = \frac{1-\rho}{\frac12+\frac{1}{\sqrt\pi}\exp\{-\frac{c}{2}\}}\,(1+o(1)) = (1-\rho)\,k(c)\,(1+o(1)) . \qquad\square
$$

3

Frequentist multiplicity control of Bayesian model selection with spike-and-slab priors

Much of scientific research requires testing thousands to millions of hypotheses simultaneously, from genome-wide studies (Storey and Tibshirani (2003)) to pharmaceutical research (Dmitrienko et al. (2009)). The goal is to find true signals amongst multiple hypotheses while also controlling the occurrence of false positives, through criteria such as the family-wise error rate (FWER), the false discovery rate (FDR) (Benjamini and Hochberg (1995), Efron et al. (2001), Efron (2004)), and the positive false discovery rate (pFDR) (Storey (2003)). There has also been increased interest in the Bayesian approach to handling multiple testing, including Cui and George (2008), Guindani et al. (2009), and Bogdan et al. (2008a). It has been shown that Bayes and empirical Bayes methods control multiplicity through specification of the prior probabilities of hypotheses (Scott and Berger (2006), Scott and Berger

(2010)). However, exact false positive rates for Bayesian decision-theoretic approaches have yet to be found. In this chapter, we consider the standard Bayesian approach to multiple hypothesis testing, involving spike-and-slab priors, focusing on the frequentist performance of the Bayesian procedures. In particular, we study the frequentist false positive rate (under the null model of no effect) of the Bayesian procedure, asymptotically and through simulation. In variable selection, one would like to include variables which are good predictors of the responses; each variable is either selected or not. Hence, an intuitive prior in this scenario is the spike-and-slab prior (George and McCulloch (1993), Chipman et al. (2001), Clyde et al. (2011)), which places a point mass at zero (indicating that a variable is not selected) and a continuous normal distribution on the nonzero variables. The corresponding posterior has good interpretability, with posterior probability on the point mass (the chance of not being selected) and elsewhere.

3.1 The standard Bayesian model of multiple testing

Let

$$
X_i \sim N(\mu_i, 1) \quad \text{for } i \in \{1, \ldots, m\}, \tag{3.1}
$$
where $\mu = (\mu_1, \ldots, \mu_m)$ are unknown a priori, with the spike-and-slab prior
$$
\mu_i \overset{i.i.d.}{\sim} p\,\delta_0 + (1-p)\,N(0, \tau^2), \tag{3.2}
$$
where $\delta_0$ is a point mass at zero and $p = P(\mu_i = 0)$. We focus on the testing question of which of the $\mu_i$ are zero, and approach this through Bayesian model selection. Following the usual convention, we use $\gamma$ to index models:
$$
\gamma = (\gamma_1, \ldots, \gamma_m) \in \{0,1\}^m, \qquad
\gamma_i = \begin{cases} 0 & \text{if } \mu_i = 0,\\ 1 & \text{if } \mu_i \neq 0; \end{cases}
$$
thus $M_\gamma$ represents the model with the indicated zero and nonzero means, i.e.,
$$
M_\gamma = \big\{ X \sim N(\mu, I_m) \text{ where } \mu_i \sim (1-\gamma_i)\delta_0 + \gamma_i N(0,\tau^2) \big\}, \tag{3.3}
$$
with $I_m$ being the $m \times m$ identity matrix. The model size (or model complexity) of $M_\gamma$ is defined as $|\gamma| = \sum_{i=1}^m \mathbf{1}_{\gamma_i \neq 0}$. We consider models having size up to $k_{\max}$ (between $0$ and $m$), and $k_{\max}$ is allowed to grow with $m$.

The corresponding model space is
$$
\mathcal{M}_{k_{\max}} = \{ M_\gamma \mid \gamma \in \{0,1\}^m,\; |\gamma| \le k_{\max} \} .
$$
Given $p_{(0)}, p_{(1)}, \ldots, p_{(k_{\max})}$ satisfying $p_{(k)} \ge 0$ for all $k$ and $\sum_{k=0}^{k_{\max}} p_{(k)} = 1$, one can assign model priors by
$$
P(M_\gamma) = p_{(|\gamma|)}\,\frac{1}{\binom{m}{|\gamma|}} . \tag{3.4}
$$
In other words, models having the same complexity $k$ share the weight $p_{(k)}$ equally. This prior generalizes the following intuitive prior induced by the Beta distribution:
$$
p = P(\gamma_i \neq 0) \sim \mathrm{Be}(1,1) \quad \forall\, 1 \le i \le m, \qquad
P(M_\gamma) = \int_0^1 p(M_\gamma \mid p)\,\pi(p)\,dp = \frac{1}{(k_{\max}+1)\binom{m}{|\gamma|}} , \tag{3.5}
$$
which is the special case of (3.4) with $p_{(0)} = p_{(1)} = \cdots = p_{(k_{\max})} = \frac{1}{k_{\max}+1}$. Before digging into the asymptotic analysis, we first consider the explicit forms of the Bayes factors and the posterior model probabilities.

3.1.1 Bayes factors

The marginal likelihood of Mγ is:

$$
m_\gamma(x) = \int f(x \mid \mu)\,p(\mu \mid \gamma, \tau)\,d\mu \quad (f \text{ is the sampling density of } x)
= \prod_{i:\gamma_i\neq 0} \frac{\exp\{-x_i^2/(2(1+\tau^2))\}}{\sqrt{2\pi(1+\tau^2)}} \prod_{i:\gamma_i=0} \frac{\exp\{-x_i^2/2\}}{\sqrt{2\pi}} .
$$
The Bayes factor of $M_\gamma$ to the null model $M_0$ is
$$
B_{\gamma 0}(x) = \frac{m_\gamma(x)}{m_0(x)} = \prod_{i:\gamma_i\neq 0} \frac{1}{\sqrt{1+\tau^2}}\exp\!\left\{\frac{\tau^2}{2(1+\tau^2)}\,x_i^2\right\} . \tag{3.6}
$$

3.1.2 Posterior model probabilities

The posterior model probabilities are

$$
P(M_\gamma \mid x) = \frac{P(M_\gamma)\,B_{\gamma 0}(x)}{\sum_{\eta \in \mathcal{M}_{k_{\max}}} P(M_\eta)\,B_{\eta 0}(x)}
= \frac{P(M_\gamma)\,B_{\gamma 0}(x)}{\sum_{k=0}^{k_{\max}} p_{(k)}\,\frac{1}{\binom{m}{k}}\sum_{\eta:|\eta|=k} B_{\eta 0}(x)} . \tag{3.7}
$$

3.2 Posterior probability of the null model as m grows

In Theorem 2.5.5, we saw the interesting fact that, for the mutually exclusive testing scenario and under the null model, the null posterior probability converges to its prior probability as the number of tests grows. In this section, we show the same result for the generalized model.

Theorem 3.2.1. Suppose $k_{\max} = O(m^r)$ where $r \in [0, \frac{1}{1+\tau^2}]$ and $\tau^2 > 1/3$. Then, under the null model,
$$
P(M_0 \mid X) \to P(M_0) \quad \text{as } m \to \infty .
$$

Proof. The posterior can be written in terms of Bayes factors:
$$
P(M_0 \mid X) = \frac{p_{(0)}}{\sum_{k=0}^{k_{\max}} p_{(k)}\,\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0}(X)} . \tag{3.8}
$$
The key step is to show that
$$
\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0} = 1 + o(1) \quad \forall\, k \in \{1, 2, \ldots, k_{\max}\}, \tag{3.9}
$$
which would complete the proof by observing that the denominator of (3.8) is then
$$
p_{(0)} + \sum_{k=1}^{k_{\max}} p_{(k)}\,(1+o(1)) = 1 + o(1) .
$$

To prove (3.9), denote $Y_j = \frac{1}{\sqrt{1+\tau^2}}\,e^{X_j^2\tau^2/(2(1+\tau^2))}$ and expand $\big(\frac{1}{m}\sum_j Y_j\big)^k$:
$$
\left(\frac{1}{m}\sum_{j=1}^m Y_j\right)^{k} = \frac{1}{m^k}\sum_{k_1+\cdots+k_m=k}\binom{k}{k_1, k_2, \ldots, k_m}\prod_{1\le j\le m} Y_j^{k_j}
= \underbrace{\frac{k!}{m^k}\sum_{\substack{k_1+\cdots+k_m=k\\ \max_j k_j = 1}}\,\prod_j Y_j^{k_j}}_{I}
+ \underbrace{\frac{1}{m^k}\sum_{\substack{k_1+\cdots+k_m=k\\ \max_j k_j \ge 2}}\frac{k!}{k_1!\cdots k_m!}\prod_j Y_j^{k_j}}_{II} . \tag{3.10}
$$
Notice that
$$
I = \frac{(m-1)\cdots(m-k+1)}{m^{k-1}}\,\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k}B_{\gamma 0}
= \left[\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k}B_{\gamma 0}(x)\right](1+o(1)) ;
$$
$$
II \le \frac{k(k-1)}{m^k}\left(\sum_{i=1}^m Y_i^2\right)\sum_{\tilde k_1+\cdots+\tilde k_m=k-2}\frac{(k-2)!}{\tilde k_1!\cdots\tilde k_m!}\,Y_1^{\tilde k_1}\cdots Y_m^{\tilde k_m}
= \frac{k(k-1)}{m^{2/(1+\tau^2)}}\left(\frac{\sum_1^m Y_i^2}{m^{2\tau^2/(1+\tau^2)}}\right)\left(\frac{\sum_1^m Y_i}{m}\right)^{k-2} .
$$
To conclude that $II \to 0$ as $m \to \infty$, note that $\frac{k(k-1)}{m^{2/(1+\tau^2)}}$ converges to a constant because of the $k = O(m^r)$ assumption; by Lemma 3.5.3, when $\tau^2 > 1/3$, $\frac{\sum_1^m Y_i^2}{m^{2\tau^2/(1+\tau^2)}} = o(1)$; and by Lemma 3.5.5, $\big(\frac{\sum_1^m Y_i}{m}\big)^{k-2} = 1 + o(1)$. Hence the right-hand side of (3.10) converges to $\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k}B_{\gamma 0}\,(1+o(1))$; by Lemma 3.5.5, the left-hand side goes to $1$. Hence,
$$
\lim_{m\to\infty}\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0} = 1 . \qquad\square
$$

Remark 3.2.2. Figure 3.1 plots the null posterior probability under the null model for different numbers of hypotheses. Here we use the prior $p_{(0)} = 0.5$ and $p_{(1)} = p_{(2)} = \cdots = p_{(k_{\max})} = \frac{1-p_{(0)}}{k_{\max}}$. As can be seen, under the null model, the posterior probability of the null model converges to its prior probability as $m$ grows, as established in Theorem 3.2.1.
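The convergence in Theorem 3.2.1 can be checked directly by brute-force enumeration of the model space. The sketch below is our own code (not part of the thesis; all function names are ours): it implements (3.4), (3.6), and (3.8) for the prior of Remark 3.2.2 and averages the null posterior over simulated null data sets.

```python
# Monte Carlo check that, under the null model, P(M0 | X) concentrates near its
# prior P(M0) as m grows (Theorem 3.2.1). Illustrative sketch only.
import itertools, math, random

def bayes_factor(x_subset, tau2):
    # B_{gamma 0}(x) from (3.6): product over the nonzero coordinates of gamma.
    return math.prod(
        math.exp(tau2 * x * x / (2.0 * (1.0 + tau2))) / math.sqrt(1.0 + tau2)
        for x in x_subset
    )

def posterior_null(x, kmax, tau2, p0=0.5):
    # Denominator of (3.8): sum over model sizes k of p(k) times the average
    # Bayes factor over all models of that size.
    m = len(x)
    pk = [(1.0 - p0) / kmax] * (kmax + 1)
    pk[0] = p0
    denom = pk[0]  # k = 0 term: B_00 = 1
    for k in range(1, kmax + 1):
        avg_bf = sum(
            bayes_factor([x[i] for i in idx], tau2)
            for idx in itertools.combinations(range(m), k)
        ) / math.comb(m, k)
        denom += pk[k] * avg_bf
    return pk[0] / denom

random.seed(0)
m, kmax, tau2 = 40, 2, 4.0   # tau = 2, as in Figure 3.1
draws = [posterior_null([random.gauss(0, 1) for _ in range(m)], kmax, tau2)
         for _ in range(50)]
print(sum(draws) / len(draws))  # typically near the prior P(M0) = 0.5
```

Enumeration costs $\sum_{k\le k_{\max}}\binom{m}{k}$ evaluations, so this is only feasible for small $k_{\max}$; the asymptotic formulas of Section 3.3 avoid the enumeration entirely.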


Figure 3.1: Box plot of the posterior probability of the null model under the null model, where P pM0q “ 0.5, kmax “ 4, τ “ 2. The green horizontal line is y “ 0.5. For each m (number of hypotheses), 3000 iterations have been performed.

3.3 Asymptotic frequentist properties of Bayesian procedures

This section is devoted to studying the asymptotic false positive probability and the expected number of false discoveries under the null model. We begin by discussing rates of the inclusion probability and the decision criterion.

3.3.1 Inclusion probabilities and the decision rule

Definition 3.3.1. The posterior inclusion probability for the $i$th mean is
$$
P(\gamma_i = 1 \mid X) = \sum_\gamma \mathbf{1}_{\gamma_i=1}\,P(M_\gamma \mid X) .
$$

Theorem 3.3.2. Suppose $k_{\max} = O(m^r)$ where $r \in [0, \frac{1}{1+\tau^2}]$ and $\tau^2 > 1/3$. Then, under the null model and as $m \to \infty$,
$$
P(\gamma_i = 1 \mid X) = \frac{\frac{1}{\sqrt{1+\tau^2}}\exp\Big\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\Big\}\,\frac{1}{m}\sum_{k=1}^{k_{\max}} k\,p_{(k)}}
{1 + \frac{1}{m}\sum_{k=1}^{k_{\max}} k\,p_{(k)}\left(\frac{1}{\sqrt{1+\tau^2}}\exp\Big\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\Big\} - 1\right)}\,(1+o(1)) . \tag{3.11}
$$

Proof.
$$
P(\gamma_i = 1 \mid X) = \frac{\sum_{k=1}^{k_{\max}} p_{(k)}\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k,\,\gamma_i=1} B_{\gamma 0}}
{p_{(0)} + \sum_{k=1}^{k_{\max}} p_{(k)}\frac{1}{\binom{m}{k}}\Big[\sum_{|\gamma|=k,\,\gamma_i=1} B_{\gamma 0} + \sum_{|\gamma|=k,\,\gamma_i=0} B_{\gamma 0}\Big]} . \tag{3.12}
$$
From the explicit expression (3.6) for the Bayes factors,
$$
\sum_{\substack{|\gamma|=k\\ \gamma_i=1}} B_{\gamma 0} = \frac{1}{\sqrt{1+\tau^2}}\exp\!\left\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\right\}\sum_{|\tilde\gamma_i|=k-1} B_{\tilde\gamma_i 0}(\tilde X_i) , \tag{3.13}
$$
where
$$
\tilde\gamma_i = (\gamma_1, \ldots, \gamma_{i-1}, \gamma_{i+1}, \ldots, \gamma_m) , \qquad
\tilde X_i = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_m) .
$$
By (3.9),
$$
\frac{1}{\binom{m-1}{k-1}}\sum_{|\tilde\gamma_i|=k-1} B_{\tilde\gamma_i 0}(\tilde X_i) \to 1 \quad \forall\, i .
$$
Therefore,
$$
\frac{1}{\binom{m}{k}}\sum_{\substack{|\gamma|=k\\ \gamma_i=1}} B_{\gamma 0}
= \frac{\binom{m-1}{k-1}}{\binom{m}{k}}\,\frac{1}{\sqrt{1+\tau^2}}\exp\!\left\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\right\}(1+o(1))
= \frac{k}{m}\,\frac{1}{\sqrt{1+\tau^2}}\exp\!\left\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\right\}(1+o(1)) .
$$
Similarly, one can show
$$
\frac{1}{\binom{m}{k}}\sum_{\substack{|\gamma|=k\\ \gamma_i=0}} B_{\gamma 0} = \frac{\binom{m-1}{k}}{\binom{m}{k}}(1+o(1)) = \frac{m-k}{m}\,(1+o(1)) .
$$
Combining all of these, the posterior inclusion probability (3.12) becomes
$$
P(\gamma_i = 1 \mid X) = \frac{\frac{1}{\sqrt{1+\tau^2}}\exp\Big\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\Big\}\,\frac{1}{m}\sum_{k=1}^{k_{\max}} k\,p_{(k)}}
{1 + \frac{1}{m}\sum_{k=1}^{k_{\max}} k\,p_{(k)}\left(\frac{1}{\sqrt{1+\tau^2}}\exp\Big\{\frac{\tau^2 X_i^2}{2(1+\tau^2)}\Big\} - 1\right)}\,(1+o(1)) . \tag{3.14}
$$
$\square$

Remark 3.3.3. Figure 3.2 is a box plot of the ratio of the true inclusion probability (3.12) to the approximation (3.11) under the null model. As expected, the ratio concentrates at 1, indicating that the estimated inclusion probability matches the true one. In addition, the empirical convergence rate can be obtained by running the same simulation for multiple numbers of hypotheses.
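A comparison in the spirit of Remark 3.3.3 can be sketched as follows (our own code, not the thesis' simulation script; the helper names are ours): the exact inclusion probability (3.12) is computed by enumerating models, and the closed form (3.11) is evaluated directly.

```python
# Check that the closed form (3.11) tracks the exact inclusion probability (3.12).
import itertools, math, random

def inclusion_exact(x, i, kmax, tau2, pk):
    # Direct enumeration of (3.12); feasible only for small kmax.
    m = len(x)
    def bf(idx):
        return math.prod(
            math.exp(tau2 * x[j] * x[j] / (2 * (1 + tau2))) / math.sqrt(1 + tau2)
            for j in idx)
    num, den = 0.0, pk[0]
    for k in range(1, kmax + 1):
        for idx in itertools.combinations(range(m), k):
            w = pk[k] * bf(idx) / math.comb(m, k)
            den += w
            if i in idx:
                num += w
    return num / den

def inclusion_approx(x, i, kmax, tau2, pk):
    # Closed form (3.11): s = (1/m) sum_k k p(k), a = exp-factor for coordinate i.
    m = len(x)
    s = sum(k * pk[k] for k in range(1, kmax + 1)) / m
    a = math.exp(tau2 * x[i] * x[i] / (2 * (1 + tau2))) / math.sqrt(1 + tau2)
    return s * a / (1 + s * (a - 1))

random.seed(1)
m, kmax, tau2 = 60, 2, 1.0
pk = [0.5, 0.25, 0.25]
x = [random.gauss(0, 1) for _ in range(m)]
pairs = [(inclusion_exact(x, i, kmax, tau2, pk),
          inclusion_approx(x, i, kmax, tau2, pk)) for i in range(3)]
```

Under the null model both quantities are small and the two columns agree closely, mirroring the concentration of the ratio at 1 seen in Figure 3.2.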


Figure 3.2: For kmax “ 4, τ “ 2, and various numbers of hypotheses m, box plots of the logarithm of the ratio of the true inclusion probability to the estimated inclusion probability, under the null model. For each m, 3000 iterations have been carried out. The red horizontal line is y “ 0 (ratio = 1).

Definition 3.3.4 (Decision rule). The ith channel is claimed to have a signal if the posterior inclusion probability P pγi “ 1 | xq is greater than a preassigned threshold q P p0, 1q.

Theorem 3.3.5 (False positive probability). If the maximum model size $k_{\max} = O(m^r)$ where $r \in [0, \frac{1}{1+\tau^2}]$ and $\tau^2 > 1/3$, then, under the null model, the false positive probability (FPP) asymptotically equals
$$
m\left(\frac{\sum_{k=1}^{k_{\max}} k\,p_{(k)}}{m}\right)^{\frac{1+\tau^2}{\tau^2}}
\left(\frac{1-q}{q}\right)^{\frac{1+\tau^2}{\tau^2}}
\frac{|\tau|}{\sqrt{\pi}\,(1+\tau^2)^{1+\frac{1}{2\tau^2}}}\,
\frac{1}{\sqrt{\log\Big[\frac{q}{1-q}\,\frac{\sqrt{1+\tau^2}\,m}{\sum_{k=1}^{k_{\max}}k\,p_{(k)}}\Big]}} \,.
$$

2 1 ` τ q 2 m P |Xi| ą 2 log 1 ` τ p1 ` op1qq τ 2 1 ´ q kmax d k“1 kppkq ˆ ˆ ˙ „ a ˆ ˙ ˙ γm ř looooooooooooooooooooooooooooooooooooooooomooooooooooooooooooooooooooooooooooooooooon 1`τ2 1`τ2 kmax 2 2 k“1 kppkq τ 1 |τ| 1 ´ q τ “ ? 1 p1 ` op1qq m 2 1` 2 q ˆř ˙ m πp1 ` τ q 2τ ˆ ˙ log kmax kppkq d ˆ k“1 ˙ ř 1`τ2 kmax kppkq τ2 1 “ k“1 m ? ˆř ˙ q 2 m log 1´q 1 ` τ kmax kppkq d „ k“1  ` ř ˘ 1`τ2 |τ| 1 ´ q τ2 ? 1 p1 ` op1qq by Fact5 .4.1. 2 1` 2 q πp1 ` τ q 2τ ˆ ˙ (3.15)

Hence, the FPP is
$$
P(\text{any false positive} \mid M_0) = 1 - P(\text{no false positive} \mid M_0)
= 1 - \prod_{i=1}^m \big[1 - P(|X_i| > \gamma_m)\big]
= m\,P(|X_1| > \gamma_m)\,(1+o(1)) ,
$$
which, combined with (3.15), gives the result. $\square$

Remark 3.3.6. Figure 3.3 demonstrates the convergence rate of the false positive probabilities as the number of hypotheses increases. At each $m$, 300 $m$-dimensional multivariate normal samples were generated; the FPP is the proportion of those samples having at least one false positive. To get the distribution of the FPP, this procedure was repeated one hundred times, and all the resulting FPPs are included in the box plot. Furthermore, the theoretical FPP based on Theorem 3.3.5 is plotted in black lines. As $m$ grows, the simulated FPP converges to the theoretical FPP, as expected.


Figure 3.3: Box plots of empirical false positive rates with median (red line), mean (green dashed line) and the theoretical false positive probabilities (black lines), when kmax “ 2, the decision threshold is q “ 0.05, and τ “ 1.
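A simulation in the style of Figure 3.3 can be sketched as follows (our own code, not the thesis' script): the threshold $\gamma_m$ from the proof of Theorem 3.3.5 is computed for the uniform prior $p_{(k)} = 1/(k_{\max}+1)$, and the Monte Carlo FPP of the rule $|X_i| > \gamma_m$ is compared with the exact normal-tail value $1-(1-P(|Z|>\gamma_m))^m$.

```python
# Monte Carlo vs. exact false positive probability for the approximate decision
# rule |X_i| > gamma_m. Settings mimic Figure 3.3: kmax = 2, q = 0.05, tau = 1.
import math, random

def gamma_m(m, kmax, tau2, q):
    s = sum(k / (kmax + 1.0) for k in range(1, kmax + 1))  # sum_k k p(k), uniform prior
    return math.sqrt(2 * (1 + tau2) / tau2
                     * math.log(q * math.sqrt(1 + tau2) * m / ((1 - q) * s)))

m, kmax, tau2, q = 118, 2, 1.0, 0.05
g = gamma_m(m, kmax, tau2, q)
p_tail = math.erfc(g / math.sqrt(2))          # P(|Z| > g) for standard normal Z
fpp_theory = 1 - (1 - p_tail) ** m

random.seed(0)
reps = 1500
hits = sum(any(abs(random.gauss(0, 1)) > g for _ in range(m)) for _ in range(reps))
fpp_mc = hits / reps
```

With these settings the two values agree to within Monte Carlo error, consistent with the black theoretical lines in Figure 3.3.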

3.3.2 Expected number of false positives

Theorem 3.3.7. Suppose $k_{\max} = O(m^r)$, $r \in [0, \frac{1}{1+\tau^2}]$, and $\tau^2 > 1/3$. Then, under the null model, the expected number of false positives is asymptotically the same as the false positive probability.

60 Proof.

$$
E\left[\sum_{i=1}^m \mathbf{1}_{\{P(\gamma_i=1\mid X)>q\}}\;\Big|\; M_0\right]
= \sum_{i=1}^m P\big(P(\gamma_i = 1 \mid X) > q \mid M_0\big)
$$
$$
= m\,P\left(|X_i| > \sqrt{\frac{2(1+\tau^2)}{\tau^2}\log\Big[\frac{q}{1-q}\,\sqrt{1+\tau^2}\,\frac{m}{\sum_{k=1}^{k_{\max}}k\,p_{(k)}}\Big]}\,(1+o(1))\right) \tag{3.16}
$$
$$
= O\left(\Big(\sum_{k=1}^{k_{\max}}k\,p_{(k)}\Big)^{\frac{1+\tau^2}{\tau^2}}\,\frac{1}{m^{1/\tau^2}}\,\frac{1}{\sqrt{\log\big(m/\sum_{k=1}^{k_{\max}}k\,p_{(k)}\big)}}\right) . \qquad\square
$$

3.3.3 FPPs for various structures of prior probabilities

The false positive probability (Theorem 3.3.5) depends on the prior structure $p_{(i)} = \sum_{|\gamma|=i} P(M_\gamma)$. In this section, we consider several configurations for this prior structure and compare the resulting FPPs.

1. Trivially, no false positive occurs if the prior does not allow any non-null models, i.e., $p_{(0)} = 1$. This is the case in which $\sum_{k=1}^{k_{\max}} k\,p_{(k)} = 0$, and it can also be categorized as $k_{\max} = 0$.

2. If $k_{\max} = 1$ and $p_{(0)} = r$, then $p_{(1)} = 1-r$. In this case, the multiple-signals model reduces to the mutually exclusive model. As expected, the FPP rate in the mutually exclusive model (Theorem 2.5.9) is a special case of the FPP rate in the generalized model (Theorem 3.3.5), and the inclusion probability (3.12) is essentially the same as the posterior probability in Lemma 2.5.1.

3. At the other extreme, the maximum of $\sum_{k=1}^{k_{\max}} k\,p_{(k)}$ can be obtained straightforwardly by Cauchy's inequality:

Fact 3.3.8. When $p_{(k)} = \frac{2k}{k_{\max}(k_{\max}+1)}\,(1-p_{(0)})$, $\sum_{k=1}^{k_{\max}} k\,p_{(k)}$ attains its maximum,
$$
\frac{2k_{\max}+1}{3}\,(1-p_{(0)}) .
$$

4. The Beta-induced prior $p_{(0)} = p_{(1)} = p_{(2)} = \cdots = p_{(k_{\max})}$ has similar asymptotic behavior to the previous case, since $\sum_{k=1}^{k_{\max}} k\,p_{(k)}$ has the same order:

Fact 3.3.9. When the prior probabilities on all levels are the same, $p_{(0)} = p_{(1)} = p_{(2)} = \cdots = p_{(k_{\max})} = \frac{1}{k_{\max}+1}$, then
$$
\sum_{k=1}^{k_{\max}} k\,p_{(k)} = \frac{k_{\max}}{2} .
$$
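Facts 3.3.8 and 3.3.9 are elementary sums and can be verified numerically; the snippet below is our own check, not part of the thesis.

```python
# Numeric check of Facts 3.3.8 and 3.3.9 for the quantity sum_k k p(k).
kmax, p0 = 7, 0.4

# Fact 3.3.8: weights p(k) proportional to k give sum_k k p(k) = (2*kmax+1)(1-p0)/3.
pk_lin = [2 * k * (1 - p0) / (kmax * (kmax + 1)) for k in range(1, kmax + 1)]
s_lin = sum(k * p for k, p in zip(range(1, kmax + 1), pk_lin))

# Fact 3.3.9: uniform weights p(k) = 1/(kmax+1) give sum_k k p(k) = kmax/2.
s_unif = sum(k / (kmax + 1.0) for k in range(1, kmax + 1))
```

Both identities hold exactly for any $k_{\max}$ and $p_{(0)}$, since $\sum_{k=1}^{K} k^2 = K(K+1)(2K+1)/6$ and $\sum_{k=1}^{K} k = K(K+1)/2$.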

For cases 3 and 4 above, if $k_{\max} = O(m^r)$, the corresponding FPP is
$$
O\left(m^{\frac{r(1+\tau^2)-1}{\tau^2}}\,\frac{1}{\sqrt{\log m}}\right) .
$$
In particular, when $r = \frac{1}{1+\tau^2}$, the FPP is $O\big(\frac{1}{\sqrt{\log m}}\big)$.

Theorem 3.3.10. Suppose $k_{\max} = O(m^r)$ where $r \in [0, \frac{1}{1+\tau^2}]$, and that there is non-zero weight on some alternative model, i.e., $p_{(0)} < 1$. Then, for any $\alpha \in (0,1)$, the following decision threshold
$$
q = q(m, \alpha) = \left(1 + \big(\pi\alpha^2\log m\big)^{\frac{\tau^2}{2(1+\tau^2)}}\,\frac{\sqrt{1+\tau^2}\; m^{1/(1+\tau^2)}}{\sum_{k=1}^{k_{\max}} k\,p_{(k)}}\right)^{-1}
$$
guarantees that the false positive probability is asymptotically $\alpha$.

Proof. The FPP can be obtained from criterion (3.15) with the aforementioned $q(m, \alpha)$:
$$
m\,P\left(|X_i| > \sqrt{\frac{2(1+\tau^2)}{\tau^2}\log\Big[\frac{q(m,\alpha)}{1-q(m,\alpha)}\,\sqrt{1+\tau^2}\,\frac{m}{\sum_{k=1}^{k_{\max}}k\,p_{(k)}}\Big]}\,(1+o(1))\right)
$$
$$
= m\,P\left(|X_i| > \sqrt{2\log m - \log\big(\pi\alpha^2\log m\big)}\,(1+o(1))\right) \tag{3.17}
$$
$$
= m\,\frac{2}{\sqrt{2\pi}}\,\frac{\sqrt{\pi\alpha^2\log m}/m}{\sqrt{2\log m - \log\log m - \log(\pi\alpha^2)}}\,(1+o(1)) \quad \text{by Fact 5.4.1}
$$
$$
= \alpha\,(1+o(1)) . \qquad\square
$$
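The calibrated threshold of Theorem 3.3.10 is straightforward to evaluate. The sketch below is our own code (function name ours), assuming the uniform prior $p_{(k)} = 1/(k_{\max}+1)$ of Fact 3.3.9.

```python
# Compute q(m, alpha) from Theorem 3.3.10 for the uniform prior p(k) = 1/(kmax+1).
import math

def q_threshold(m, alpha, kmax, tau2):
    s = sum(k / (kmax + 1.0) for k in range(1, kmax + 1))  # sum_k k p(k)
    boost = ((math.pi * alpha ** 2 * math.log(m)) ** (tau2 / (2 * (1 + tau2)))
             * math.sqrt(1 + tau2) * m ** (1.0 / (1 + tau2)) / s)
    return 1.0 / (1.0 + boost)

qs = [q_threshold(m, 0.05, 2, 1.0) for m in (50, 500, 5000)]
```

The threshold decreases in $m$: to hold the FPP at a fixed $\alpha$, the decision rule must become more lenient as the per-coordinate tail cutoff $\sqrt{2\log m - \log(\pi\alpha^2\log m)}$ grows.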

Remark 3.3.11. Figure 3.4 is a simulation study of Theorem 3.3.10 for fixed $\alpha = 0.05$. The reference curve is $1 - (1 - \frac{\alpha}{m})^m$, which converges to $\alpha$ as $m$ grows. The FPP is generated by the same procedure as in Remark 3.3.6. As can be seen, the numerical FPP converges to $1 - (1 - \alpha/m)^m$ (and hence to $\alpha$) slowly.

Figure 3.4: False positive probability based on the threshold in Theorem 3.3.10. The blue curve is the numerical FPP (average FPP at a given $m$); the green curve is $1 - (1 - \frac{\alpha}{m})^m$, where $\alpha = 0.05$.

Remark 3.3.12. Interestingly, in the numerical study of Figure 3.4, the realized FPP tends to be smaller than the theoretical FPP, indicating that the testing procedure usually performs better than the preassigned FPP.

Remark 3.3.13. Figure 3.5 shows the power with respect to different signal sizes $\theta_i$. For each $\theta_i$, a box plot of 100 iterations of posterior inclusion probabilities is shown.

Figure 3.5: Box plots of the inclusion probability $P(\gamma_i = 1 \mid x)$ versus signal size $\theta_i$, with $m = 100$, $k_{\max} = 3$, $\tau = 1$. For each $\theta_i$, 100 iterations were generated to produce the box plot.

Remark 3.3.14. Figure 3.6 demonstrates how $\pi(p \mid x)$ varies with respect to different expected and true noise ratios (the noise ratio is the number of nonzero means over all $m$ means; the true noise ratio, used in the data-generation process, is indicated in the top-right legend of each plot). As can be seen, when the model size is not restricted ($k_{\max} = m$), the posterior of $p$ concentrates at the true noise ratio. On the other hand, $p$ tends not to surpass the model noise ratio $k_{\max}/m$, even when the true ratio is much higher.

(c) kmax=15 (d) kmax=20

Figure 3.6: Posterior of p versus different noise ratios. m “ 20, τ “ 2. For each plot, kmax is indicated in the subtitle, and simulate data x according to the top right true noise ratio legend.

3.4 Conclusion

In this chapter, we described the asymptotic behavior of frequentist properties of the standard Bayesian model selection procedure, providing explicit rates for the false positive probability and the expected number of false positives. For those who prefer a pre-specified constant false positive rate, a Bayesian decision threshold that guarantees this was provided. Future work on this topic includes extending weak control to strong control: doing all computations under the null model does not reflect situations such as DNA sequencing data, where more than one gene will typically affect phenotypes. Another generalization is to consider more general covariance structures for $X$; it should be possible to apply a matrix decomposition method, similar to Theorem 2.3.1, and obtain the relevant frequentist properties.

3.5 Appendix

Fact 3.5.1 (Normal tail probability). Let $\Phi(t)$ be the cumulative distribution function of the standard normal. Then
$$
\frac{t}{t^2+1}\,\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} \;\le\; 1 - \Phi(t) \;\le\; \frac{1}{t}\,\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} ,
\qquad
1 - \Phi(t) = \frac{1}{\sqrt{2\pi}}\,\frac{e^{-t^2/2}}{t}\,(1+o(1)) .
$$
The proof can be found in Durrett (2010).

Fact 3.5.2 (Weak law for triangular arrays). For each $m$, let $X_{m,k}$, $1 \le k \le m$, be independent random variables. Let $\beta_m > 0$ with $\beta_m \to \infty$, and let $\bar X_{m,k} = X_{m,k}\mathbf{1}_{\{|X_{m,k}| \le \beta_m\}}$. Suppose that, as $m \to \infty$,
$$
\sum_{k=1}^m P(|X_{m,k}| > \beta_m) \to 0
\qquad\text{and}\qquad
\frac{1}{\beta_m^2}\sum_{k=1}^m E\bar X_{m,k}^2 \to 0 .
$$
Denote $S_m = X_{m,1} + \cdots + X_{m,m}$ and $\alpha_m = \sum_{k=1}^m E\bar X_{m,k}$; then
$$
\frac{S_m - \alpha_m}{\beta_m} \to 0 \quad \text{in probability.}
$$
See Durrett (2010) for a proof.

Lemma 3.5.3. If $Y_j = \frac{1}{\sqrt{1+\tau^2}}\exp\{\frac{\tau^2}{2(1+\tau^2)}X_j^2\}$ and $\tau^2 > 1/3$, then
$$
m^{-2\tau^2/(1+\tau^2)}\sum_{j=1}^m Y_j^2 \to 0 \quad \text{as } m \to \infty .
$$
Proof. First, we need the following order estimate before applying Fact 3.5.2: if $\lim_{m\to\infty} a_m \in (0, \infty]$ and $\lim_{m\to\infty} b_m = \infty$, then
$$
\int_0^{b_m} e^{a_m x^2/2}\,dx = O\!\left(\frac{e^{a_m b_m^2/2}}{a_m b_m}\right) . \tag{3.18}
$$

To show (3.18), notice that $\int_0^{b_m} e^{a_m x^2/2}\,dx = \frac{1}{\sqrt{a_m}}\int_0^{\sqrt{a_m}\,b_m} e^{x^2/2}\,dx$, and, by L'H\^opital's rule with $v = \sqrt{a_m}\,b_m$,
$$
\lim_{m\to\infty}\frac{\int_0^{\sqrt{a_m}\,b_m} e^{x^2/2}\,dx}{e^{a_m b_m^2/2}/(\sqrt{a_m}\,b_m)}
= \lim_{v\to\infty}\frac{\int_0^{v} e^{x^2/2}\,dx}{e^{v^2/2}/v}
= \lim_{v\to\infty}\frac{\frac{d}{dv}\int_0^{v} e^{x^2/2}\,dx}{\frac{d}{dv}\,e^{v^2/2}/v}
= \lim_{v\to\infty}\frac{1}{1 - \frac{1}{v^2}} = 1 .
$$

Then take $X_{m,j} = \exp\{\frac{\tau^2}{1+\tau^2}x_j^2\}$ and $\beta_m = m^{2\tau^2/(1+\tau^2)}$ in Fact 3.5.2. Both assumptions in Fact 3.5.2 hold, since
$$
\sum_{j=1}^m P(|X_{m,j}| > \beta_m) = O\Big(\frac{1}{\sqrt{\log m}}\Big) \to 0 \quad \text{by Fact 5.4.1} ,
$$
and
$$
\frac{1}{\beta_m^2}\sum_{j=1}^m E\bar X_{m,j}^2
= \frac{m}{m^{4\tau^2/(1+\tau^2)}}\,\frac{1}{\sqrt{2\pi}}\int_{|x|\le\sqrt{2\log m}} e^{(\frac{2\tau^2}{1+\tau^2}-\frac12)x^2}\,dx
= \begin{cases}
O\big(\frac{1}{\sqrt{\log m}}\big) & \text{if } \tau^2 > 1/3, \text{ by } (3.18),\\[4pt]
O\big(m^{\frac{1-3\tau^2}{1+\tau^2}}\big) & \text{else,}
\end{cases}
$$
which is $o(1)$ when $\tau^2 > 1/3$. The limit is
$$
\frac{1}{\beta_m}\sum_{j=1}^m E\bar X_{m,j}
= m^{-2\tau^2/(1+\tau^2)}\, m\,\frac{1}{\sqrt{2\pi}}\int_{|x|<\sqrt{2\log m}} \exp\Big\{\frac{x^2}{2}\Big(\frac{2\tau^2}{1+\tau^2}-1\Big)\Big\}\,dx
= m^{\frac{1-\tau^2}{1+\tau^2}}\,O\Big(m^{\frac{\tau^2-1}{1+\tau^2}}\,\frac{1}{\sqrt{\log m}}\Big) \quad \text{by } (3.18)
$$
$$
= O\Big(\frac{1}{\sqrt{\log m}}\Big) \to 0 . \qquad\square
$$

Fact 3.5.4 (Marcinkiewicz and Zygmund). Let $X_1, X_2, \ldots$ be i.i.d. with $EX_1 = 0$ and $E|X_1|^p < \infty$, where $1 < p < 2$. If $S_m = X_1 + \cdots + X_m$, then
$$
\frac{S_m}{m^{1/p}} \to 0 \quad a.s.
$$
A proof of this claim can be found in Durrett (2010).

Lemma 3.5.5. Let $Y_j = \frac{1}{\sqrt{1+\tau^2}}\exp\{\frac{\tau^2}{2(1+\tau^2)}X_j^2\}$ and $k = O(m^r)$ where $r \in [0, \frac{1}{1+\tau^2}]$. Then
$$
\left(\frac{1}{m}\sum_{j=1}^m Y_j\right)^k = 1 + o(1) .
$$
Proof. When $r = 0$, note that $E(Y_i \mid M_0) = 1$ for all $i$; by the strong law of large numbers, $\frac{1}{m}\sum_1^m Y_j \to 1$, and this case is proved.

Now consider $r \in (0, \frac{1}{1+\tau^2}]$: take $X_i = Y_i - 1$ and $p = \frac{1+\tau^2}{\tau^2}$ in Fact 3.5.4, so that
$$
\frac{\sum_1^m Y_j - m}{m^{1/p}} = \left(\frac{\sum_1^m Y_j}{m} - 1\right)m^{1-1/p} = o(1) ,
$$
i.e.,
$$
\frac{\sum_1^m Y_j}{m} - 1 = o\Big(\frac{1}{m^{1-1/p}}\Big) = o\big(m^{-\frac{1}{1+\tau^2}}\big) . \tag{3.19}
$$
The lemma follows, since
$$
\log\left[\Big(\frac{1}{m}\sum_{j=1}^m Y_j\Big)^{k(m)}\right] = k(m)\,\log\Big(\frac{1}{m}\sum_{j=1}^m Y_j\Big) = O(m^r)\,o\big(m^{-\frac{1}{1+\tau^2}}\big) = o(1) . \qquad\square
$$

Lemma 3.5.6. $\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0}(x)$ is a U-statistic.

Proof. By definition,
$$
B_{\gamma 0}(x_1, x_2, \ldots, x_m) = \frac{1}{(1+\tau^2)^{|\gamma|/2}}\exp\left\{\frac{\tau^2}{2(1+\tau^2)}\sum_{i:\gamma_i\neq 0} x_i^2\right\} ;
$$
define the kernel
$$
h^k(x_1, x_2, \ldots, x_k) = \frac{1}{(1+\tau^2)^{k/2}}\exp\left\{\frac{\tau^2}{2(1+\tau^2)}\sum_{i=1}^k x_i^2\right\} .
$$
Notice that an iteration over models is the same as an iteration over inputs:
$$
\sum_{\gamma:|\gamma|=k} B_{\gamma 0}(x) = \sum_{C_k^m} h^k(x_{i_1}, x_{i_2}, \ldots, x_{i_k}) .
$$
Therefore, $\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0}(x)$ is a U-statistic with kernel $h^k$. $\square$

Definition 3.5.7. For $c \in \{0, 1, \ldots, k\}$,
$$
h_c^k(x_1, \ldots, x_c) = E\big(h^k(x_1, x_2, \ldots, x_c, X_{c+1}, \ldots, X_k)\big) ,
\qquad
(\sigma_c^k)^2 = \mathrm{Var}\big(h_c^k(X_1, \ldots, X_c)\big) .
$$
Note that $h_k^k = h^k$ and $(\sigma_k^k)^2 = \mathrm{Var}(h^k) = \sigma_k^2$.

Fact 3.5.8. Let $U$ be the U-statistic with kernel $h^k$. If $\sigma_k^2 < \infty$, then
$$
\sqrt{m}\,\big[U - E(h^k(X_1, \ldots, X_k))\big] \to N\big(0,\; k^2(\sigma_1^k)^2\big) .
$$
The asymptotic theory of U-statistics can be found in Ferguson (2003).

Theorem 3.5.9. If $\tau^2 < 1$ and $k_{\max} = o(m)$, then, under the null model,
$$
\sqrt{m}\left[\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0}(x) - 1\right] \to N\left(0,\; k^2\Big(\frac{1}{\sqrt{1-\tau^4}} - 1\Big)\right) .
$$
Proof. By Lemma 3.5.6, $\frac{1}{\binom{m}{k}}\sum_{|\gamma|=k} B_{\gamma 0}$ is a U-statistic. Evaluating the expectation and variance in Definition 3.5.7, one obtains, under the null model,
$$
E(B_{\gamma 0}(X)) = 1 ,
\qquad
E(B_{\gamma 0}(X)^2) = \begin{cases} \dfrac{1}{[(1+\tau^2)(1-\tau^2)]^{|\gamma|/2}} & \text{if } \tau^2 < 1,\\[6pt] \infty & \text{else,}\end{cases}
$$
$$
(\sigma_1^k)^2 = \begin{cases} \dfrac{1}{\sqrt{1-\tau^4}} - 1 & \text{if } \tau^2 < 1,\\[6pt] \infty & \text{else,}\end{cases}
\qquad
(\sigma_k^k)^2 = \begin{cases} \dfrac{1}{[(1+\tau^2)(1-\tau^2)]^{k/2}} - 1 & \text{if } \tau^2 < 1,\\[6pt] \infty & \text{else.}\end{cases}
$$
Then apply Fact 3.5.8. $\square$

4

Bayesian multiple testing for a sequence of trials

4.1 Introduction

Much progress has been made regarding multiplicity issues in drug development and clinical trials (Alosh et al. (2014), Streiner (2015)). These methods usually involve experimental design on multiple endpoints and subgroups, such as fixed sequence procedures (Holm (1979), Westfall and Krishen (2001)), group sequential procedures (O'Brien and Fleming (1979), Whitehead (1997), Siegmund (1985)), and graphical approaches (Bretz et al. (2009)). Here we consider a somewhat different scenario, imagining a sequence of completely separate trials designed to investigate a particular condition. For instance, consider the (by now large) series of trials that have been conducted in an effort to find a treatment for Alzheimer's disease. As these trials have been conducted by different companies and organizations, it is not common to think of the situation as one requiring multiplicity adjustment. But, from the perspective of society, this is a multiple testing problem, and society should make a multiplicity adjustment to understand the evidence resulting from the sequence of trials.

In each of the trials, we presume that there is a null model $M_0^i$ that reflects no treatment effect and an alternative model $M_1^i$ that indicates a treatment effect. We presume that the data $x_i$ from trial $i$ are independent of the data from the other trials, and that any parameters of the models in each trial are also a priori independent.

4.2 Bayesian sequence multiplicity control

The key idea is to treat the sequence of trials as an ordinary multiple testing problem, letting $p$ denote the (assumed common) prior probability that the null hypothesis $M_0^i$ is true in each trial, and allowing $p$ to be updated with each new trial. Thus, if a majority of the trials fail to reject, $p$ will move towards 1, and the next trial will need stronger evidence for a rejection. While this could be treated exactly as in Chapter 3, with a new multiple testing analysis at the end of each trial (taking the trials up to that point as the collection of multiple tests), it is of interest to examine how the formulas update in a sequential fashion. For simplicity in exposition, we will utilize the objective prior $\pi(p) = \mathbf{1}_{[0,1]}(p)$ in the following.

Denoting the marginal likelihood of the $i$th trial under model $M_j^i$ by $m(x_i \mid M_j^i)$, the Bayes factor of the null hypothesis to the alternative hypothesis in the $i$th trial is
$$
B_{01}^i = \frac{m(x_i \mid M_0^i)}{m(x_i \mid M_1^i)} .
$$
Because of the sequential updating aspect of Bayes theorem, the posterior probability of the $(n+1)$st null hypothesis $M_0^{n+1}$, in the sequence of trials, is given by
$$
P(M_0^{n+1} \mid x_1, \ldots, x_{n+1}) = \left(1 + \frac{1-p_n}{p_n}\,\frac{1}{B_{01}^{n+1}}\right)^{-1} ,
$$
where $p_n = E(p \mid x_1, \ldots, x_n)$. The following theorem gives the explicit form of the posterior density of $p$ and its expectation.

Theorem 4.2.1. The probability density function and expectation of the posterior distribution of $p$ are
$$
\pi(p \mid x_1, \ldots, x_n) = (n+1)\;\frac{\sum_{j=0}^{n} p^{\,n-j}(1-p)^{j} \sum_{1\le i_1<\cdots<i_{n-j}\le n} B_{01}^{i_1}\cdots B_{01}^{i_{n-j}}}{\sum_{j=0}^{n} \frac{1}{\binom{n}{n-j}} \sum_{1\le i_1<\cdots<i_{n-j}\le n} B_{01}^{i_1}\cdots B_{01}^{i_{n-j}}} ,
$$
$$
E(p \mid x_1, \ldots, x_n) = \frac{n+1}{n+2}\;\frac{\sum_{j=0}^{n} \frac{1}{\binom{n+1}{n+1-j}} \sum_{1\le i_1<\cdots<i_{n-j}\le n} B_{01}^{i_1}\cdots B_{01}^{i_{n-j}}}{\sum_{j=0}^{n} \frac{1}{\binom{n}{n-j}} \sum_{1\le i_1<\cdots<i_{n-j}\le n} B_{01}^{i_1}\cdots B_{01}^{i_{n-j}}} . \tag{4.1}
$$
Proof. By definition, the posterior of $p$ is
$$
\pi(p \mid x_1, x_2, \ldots, x_n) = \frac{\prod_{i=1}^n \big(pB_{01}^i + (1-p)\big)}{\int_0^1 \prod_{i=1}^n \big(pB_{01}^i + (1-p)\big)\,dp} . \tag{4.2}
$$
The result follows by expanding the products in (4.2) and using the following identity to evaluate the denominator:
$$
\int_0^1 p^i(1-p)^j\,dp = \frac{\Gamma(i+1)\Gamma(j+1)}{\Gamma(i+j+2)} = \frac{i!\,j!}{(i+j+1)!} = \frac{1}{(i+j+1)\binom{i+j}{i}} . \qquad\square
$$

Example 4.2.2. When $n = 1$, the posterior of $p$ is
$$
\pi(p \mid x_1) = \frac{p\,m(x_1 \mid M_0^1) + (1-p)\,m(x_1 \mid M_1^1)}{\frac12\,m(x_1 \mid M_0^1) + \frac12\,m(x_1 \mid M_1^1)} = \frac{2\big(pB_{01}^1 + (1-p)\big)}{B_{01}^1 + 1} ,
$$
$$
p_1 = E(p \mid x_1) = \int_0^1 p\,\pi(p \mid x_1)\,dp = \frac{2B_{01}^1 + 1}{3\,(B_{01}^1 + 1)} .
$$

Example 4.2.3. When $n = 2$, the posterior density of $p$, given $x_1, x_2$, is
$$
\pi(p \mid x_1, x_2) = \frac{f(x_1, x_2 \mid p)\,\pi(p)}{m(x_1, x_2)}
= 6\,\frac{p^2 B_{01}^1 B_{01}^2 + p(1-p)\big(B_{01}^1 + B_{01}^2\big) + (1-p)^2}{2B_{01}^1 B_{01}^2 + B_{01}^1 + B_{01}^2 + 2} ,
$$
$$
p_2 = E(p \mid x_1, x_2) = \frac{3B_{01}^1 B_{01}^2 + B_{01}^1 + B_{01}^2 + 1}{2\big(2B_{01}^1 B_{01}^2 + B_{01}^1 + B_{01}^2 + 2\big)} .
$$
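The closed form in Theorem 4.2.1 can be verified against these two examples. The sketch below is our own code (helper names ours): the inner sums over $1 \le i_1 < \cdots < i_{n-j} \le n$ are elementary symmetric polynomials of the Bayes factors.

```python
# Verify Theorem 4.2.1 against Examples 4.2.2 and 4.2.3.
import itertools, math

def elem_sym(bs, size):
    # Elementary symmetric polynomial: sum of products of `size`-element subsets.
    return sum(math.prod(c) for c in itertools.combinations(bs, size))

def posterior_mean_p(bs):
    # E(p | x_1, ..., x_n) from (4.1), with bs = [B_01^1, ..., B_01^n].
    n = len(bs)
    num = sum(elem_sym(bs, n - j) / math.comb(n + 1, n + 1 - j) for j in range(n + 1))
    den = sum(elem_sym(bs, n - j) / math.comb(n, n - j) for j in range(n + 1))
    return (n + 1) / (n + 2) * num / den

b1, b2 = 2.0, 0.5
p1 = posterior_mean_p([b1])        # should equal (2*b1 + 1) / (3*(b1 + 1))
p2 = posterior_mean_p([b1, b2])    # should match the n = 2 closed form above
```

Each evaluation of the exact formula sums over all $2^n$ subsets; the recursive scheme of Section 4.2.1 reduces the per-trial update cost to linear time.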

74 4.2.1 Recursive formula

Letting $m(x_i)$ denote the overall marginal likelihood of the $i$th trial,
$$
m_i = m(x_i) = P(M_0^i)\,m(x_i \mid M_0^i) + P(M_1^i)\,m(x_i \mid M_1^i) ,
$$
a recursive formula for the posterior density of $p$ can be obtained, as follows:

Corollary 4.2.4.
$$
\pi(p \mid x_1, \ldots, x_n) = \pi(p \mid x_1, \ldots, x_{n-1})\,\big(pB_{01}^n + (1-p)\big)\,\frac{m_{n-1}}{m_n} ,
$$
where
$$
m_n = \frac{1}{n+1}\left[\frac{\prod_{i=1}^n B_{01}^i}{\binom{n}{n}} + \frac{\sum_{j=1}^n \prod_{i\neq j} B_{01}^i}{\binom{n}{n-1}} + \frac{\sum_{j<k} \prod_{i\neq j,k} B_{01}^i}{\binom{n}{n-2}} + \cdots + \frac{1}{\binom{n}{0}}\right] .
$$
Another possibility is dynamic programming:

Corollary 4.2.5 (Recursive formula).
$$
m_n = \frac{1}{n+1}\left[\frac{a_{n1}}{\binom{n}{n}} + \frac{a_{n2}}{\binom{n}{n-1}} + \frac{a_{n3}}{\binom{n}{n-2}} + \cdots + \frac{a_{n(n+1)}}{\binom{n}{0}}\right] ,
$$
where
$$
a_{ij} = B_{01}^i\,a_{(i-1)j} + a_{(i-1)(j-1)} \quad \forall\, i, j \in \{1, \ldots, n\} ,
\qquad
a_{i0} = 0 \;\text{ and }\; a_{i(i+1)} = 1 \quad \forall\, i \in \{0, 1, \ldots, n\} .
$$
By caching the $n+1$ coefficients of $m_n$ (i.e., $a_{ni}$, $1 \le i \le n+1$), $m_{n+1}$ can be computed in linear time when the new data, or the Bayes factor $B_{01}^{n+1}$, is provided.
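A minimal sketch of this dynamic program follows (our own implementation, not the thesis' code; we adopt the boundary convention $a_{i0} = 0$, $a_{i(i+1)} = 1$). The cached row of coefficients is updated in linear time when a new Bayes factor arrives, and the result is cross-checked against direct subset enumeration.

```python
# Linear-time update of the coefficients a_{n,j} of Corollary 4.2.5.
import itertools, math

def update(a_row, b_new):
    # a_row[j] holds a_{n,j} for j = 0..n+1; returns the row for n+1 trials.
    n1 = len(a_row)          # n + 2 entries
    new = [0.0] * (n1 + 1)   # new[0] = a_{n+1,0} = 0
    for j in range(1, n1):
        new[j] = b_new * a_row[j] + a_row[j - 1]
    new[n1] = 1.0            # a_{n+1, n+2} = 1 (empty product)
    return new

def marginal_mn(a_row):
    n = len(a_row) - 2
    return sum(a_row[j] / math.comb(n, n - j + 1) for j in range(1, n + 2)) / (n + 1)

bs = [2.185, 1.048, 0.147]   # Bayes factors from the HIV example below
row = [0.0, 1.0]             # n = 0: a_{0,1} = 1
for b in bs:
    row = update(row, b)

# Cross-check against direct enumeration of the subset-product sums.
n = len(bs)
direct = sum(
    sum(math.prod(c) for c in itertools.combinations(bs, n - j + 1))
    / math.comb(n, n - j + 1)
    for j in range(1, n + 2)) / (n + 1)
```

Here $a_{n,j}$ is the sum of products over all $(n-j+1)$-element subsets of $\{B_{01}^1, \ldots, B_{01}^n\}$, which is exactly what the recursion computes (each trial's factor either belongs to the subset or not).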

4.3 HIV vaccines

In this section, we take the HIV vaccine trial study (Gilbert et al. (2011)) as an example to demonstrate how multiplicity control works in a sequence of trials. Here, a sequence of three vaccines was tested, yielding the following results: Vax004:

127 out of 1805 infected in the placebo group

75 241 out of 3598 infected in the vaccine group

Vax003:

105 out of 1260 infected in the placebo group

106 out of 1267 infected in the vaccine group

Lee et al. analysis

7 out of 240 infected in the placebo group.

24 out of 1257 infected in the vaccine group

For convenience, use i P t1, 2, 3u for these trials and j “ 0, 1 for the placebo and vaccine group, respectively; nij is the number of subjects in vaccine i and group j; xij is the number of infected cases in each group; xi “ txi0, xi1u; and ni “ tni0, ni1u. We model xij „ Binomialppij, nijq with a Betapa, bq prior on pij. To investigate the efficacy of each vaccine, test

\[
\begin{cases}
M_0^i : p_{i0} \le p_{i1}\\
M_1^i : p_{i0} > p_{i1}\,.
\end{cases}
\]

The marginal likelihoods for the null model and the alternative model are

\[
m(x_i, n_i \mid M_0^i) \;=\; \frac{1}{(B(a,b))^2} \int_0^1 \left(\int_0^{p_{i1}} \binom{n_{i0}}{x_{i0}} p_{i0}^{x_{i0}} (1-p_{i0})^{n_{i0}-x_{i0}}\, p_{i0}^{a-1}(1-p_{i0})^{b-1}\, dp_{i0}\right) \binom{n_{i1}}{x_{i1}} p_{i1}^{x_{i1}} (1-p_{i1})^{n_{i1}-x_{i1}}\, p_{i1}^{a-1}(1-p_{i1})^{b-1}\, dp_{i1}\,,
\]
\[
m(x_i, n_i \mid M_1^i) \;=\; \frac{1}{(B(a,b))^2} \int_0^1 \left(\int_{p_{i1}}^1 \binom{n_{i0}}{x_{i0}} p_{i0}^{x_{i0}} (1-p_{i0})^{n_{i0}-x_{i0}}\, p_{i0}^{a-1}(1-p_{i0})^{b-1}\, dp_{i0}\right) \binom{n_{i1}}{x_{i1}} p_{i1}^{x_{i1}} (1-p_{i1})^{n_{i1}-x_{i1}}\, p_{i1}^{a-1}(1-p_{i1})^{b-1}\, dp_{i1}\,.
\]

One can compute the resulting Bayes factors $B_{01}^i$ numerically. For the uniform prior ($a = b = 1$) on the $p_{ij}$, these are 2.185, 1.048, and 0.147, respectively. The posterior probability of a treatment effect in the 3rd trial, without multiplicity adjustment, is
\[
P(M_1^3 \mid x_3, n_3) = \left(1 + B_{01}^3\right)^{-1} = 0.872\,.
\]
With multiplicity adjustment, the posterior probability is
\[
P(M_1^3 \mid x_{1:3}, n_{1:3}) = \left(1 + \frac{\hat p_2}{1-\hat p_2}\, B_{01}^3\right)^{-1} = 0.839\,.
\]
Thus the evidence for effectiveness of the 3rd vaccine is somewhat weakened by the failure of the first two trials.
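A sketch of how such one-sided Bayes factors can be computed numerically under the uniform prior ($a = b = 1$); the grid-integration scheme below is illustrative, and all constants common to numerator and denominator (binomial coefficients, Beta normalizers) are dropped since they cancel in the ratio:

```python
import math

def bayes_factor_01(x0, n0, x1, n1, grid=8000):
    """B_01 for M0: p0 <= p1 vs M1: p0 > p1 with independent uniform
    priors on (p0, p1), by midpoint-rule integration of the binomial
    likelihood over the two half-squares.  Log-scale shifts keep the
    integrands representable for large trial sizes."""
    h = 1.0 / grid
    ps = [(k + 0.5) * h for k in range(grid)]
    l0 = [x0 * math.log(p) + (n0 - x0) * math.log(1 - p) for p in ps]
    l1 = [x1 * math.log(p) + (n1 - x1) * math.log(1 - p) for p in ps]
    a0, a1 = max(l0), max(l1)
    f0 = [math.exp(v - a0) for v in l0]   # group-0 likelihood, rescaled
    f1 = [math.exp(v - a1) for v in l1]   # group-1 likelihood, rescaled
    # cumulative mass of the p0-likelihood up to each grid point
    cum, s = [], 0.0
    for v in f0:
        s += v * h
        cum.append(s)
    m0 = sum(f1[k] * cum[k] for k in range(grid)) * h        # region p0 <= p1
    m1 = sum(f1[k] * (s - cum[k]) for k in range(grid)) * h  # region p0 > p1
    return m0 / m1
```

For instance, `bayes_factor_01(7, 240, 24, 1257)` gives the Bayes factor for the third trial; swapping the two arms inverts the Bayes factor, as it must by the symmetry of the two hypotheses.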

4.4 Analysis when only p-values are available

As we are considering a sequence of completely different trials, it may be that the data from each trial are unavailable. Indeed, we may only have the p-value from each trial. Lower bounds on the Bayes factors in each trial can still be obtained, based on Sellke et al. (2001).

Indeed, they showed that $B_{01}^i \ge -e\, p \log p$ when the corresponding p-value $p$ is less than $1/e$.
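The Sellke et al. (2001) calibration is simple to evaluate; for instance, $p = 0.05$ yields a lower bound of about $0.407$ on the Bayes factor, so a p-value of 0.05 can never favor the alternative by more than roughly 2.5 to 1. A minimal sketch:

```python
import math

def bf_lower_bound(p):
    """Sellke-Bayarri-Berger lower bound on B_01: -e * p * log(p),
    valid for p-values below 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound requires 0 < p < 1/e")
    return -math.e * p * math.log(p)
```

The bound decreases with $p$, so smaller p-values permit stronger evidence against the null.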

Lemma 4.4.1 (lower bound on $\hat p_n$). If $B_{01}^i \ge c_i$ $\forall i$, then
\[
\hat p_n \;\ge\; \frac{n+1}{n+2}\left(\prod_{i=1}^n c_i\right)\left[\frac{\prod_{i=1}^n c_i}{\binom{n}{n}} + \frac{\sum_{j=1}^n \prod_{i\neq j} c_i}{\binom{n}{n-1}} + \frac{\sum_{j<k}\prod_{i\neq j,k} c_i}{\binom{n}{n-2}} + \cdots + \frac{1}{\binom{n}{0}}\right]^{-1}.
\]

Proof.
\[
\hat p_n \;=\; \frac{\int_0^1 p \prod_{i=1}^n \left(p B_{01}^i + 1 - p\right) dp}{\int_0^1 \prod_{i=1}^n \left(p B_{01}^i + 1 - p\right) dp}
\;=\; \frac{\int_0^1 p \prod_{i=1}^n \left(p + \frac{1-p}{B_{01}^i}\right) dp}{\int_0^1 \prod_{i=1}^n \left(p + \frac{1-p}{B_{01}^i}\right) dp}
\;\ge\; \frac{\int_0^1 p \cdot p^n\, dp}{\int_0^1 \prod_{i=1}^n \left(p + \frac{1-p}{c_i}\right) dp}
\;=\; \frac{\frac{1}{n+2}\prod_{i=1}^n c_i}{\int_0^1 \prod_{i=1}^n \left(p\, c_i + 1 - p\right) dp}\,,
\]
and evaluating the denominator by the expression for $m_n$ in Corollary 4.2.4, with the $B_{01}^i$ replaced by the $c_i$, gives the result.

We finish with general, but not very useful, bounds on $(1-\hat p_n)/\hat p_n$.

Corollary 4.4.2 (generic bounds).

\[
\frac{1}{n+1} \;\le\; \frac{1-\hat p_n}{\hat p_n} \;\le\; n+1\,.
\]

Proof.

\[
\hat p_n = E(p \mid x_1, \ldots, x_n) = E\left(p \mid B_{01}^1, \ldots, B_{01}^n\right) = \frac{\int_0^1 p \prod_{j=1}^n \left(p B_{01}^j + 1 - p\right) dp}{\int_0^1 \prod_{j=1}^n \left(p B_{01}^j + 1 - p\right) dp}\,.
\]
Writing $I_k = \int_0^1 p^k \prod_{j \neq i} \left(p B_{01}^j + 1 - p\right) dp$, a direct computation gives
\[
\frac{\partial}{\partial B_{01}^i}\left(\frac{1-\hat p_n}{\hat p_n}\right) \;=\; \frac{I_1^2 - I_0 I_2}{\left(\int_0^1 p \prod_{j=1}^n \left(p B_{01}^j + 1 - p\right) dp\right)^2} \;\le\; 0\,,
\]
by the Cauchy--Schwarz inequality. Hence $(1-\hat p_n)/\hat p_n$ is monotone decreasing with respect to each $B_{01}^i$, and
\[
\frac{1}{n+1} \;=\; \lim_{B_{01}^i \to \infty\ \forall i} \left(\frac{1-\hat p_n}{\hat p_n}\right) \;\le\; \frac{1-\hat p_n}{\hat p_n} \;\le\; \lim_{B_{01}^i \to 0\ \forall i} \left(\frac{1-\hat p_n}{\hat p_n}\right) \;=\; n+1\,.
\]
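Corollary 4.4.2 is easy to illustrate numerically (a sketch with arbitrary Bayes-factor values; the helper below integrates the posterior on a grid): whatever the $B_{01}^i$, the posterior odds $(1-\hat p_n)/\hat p_n$ remain in $[1/(n+1),\, n+1]$.

```python
def posterior_mean_p(bayes_factors, grid=20000):
    """E(p | B_01^1, ..., B_01^n): the posterior density of p is
    proportional to prod_j (p*B_j + 1 - p) under a uniform prior
    (midpoint-rule quadrature)."""
    h = 1.0 / grid
    num = den = 0.0
    for k in range(grid):
        p = (k + 0.5) * h
        w = 1.0
        for B in bayes_factors:
            w *= p * B + 1 - p
        num += p * w
        den += w
    return num / den
```

Taking all Bayes factors huge (or tiny) pushes the odds toward the endpoints $1/(n+1)$ (or $n+1$), matching the limits in the proof.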

5

Reconciling frequentist and Bayesian methods in sequential endpoint testing

5.1 Introduction

Sequential endpoint testing (or fixed-sequence testing) is about testing a sequence of possibly different treatment effects. It was considered by Maurer et al. (1995), Westfall and Krishen (2001), and Wiens (2003), and is widely used in clinical trials (Huque and Alosh (2008)). In the original version of this procedure, endpoints are ordered a priori and tested sequentially at nominal level $\alpha$, until a nonsignificant endpoint is found. This procedure controls FWER in the strong sense, but may lose power by stopping too early.

In this chapter, we discuss the implied multiplicity issue and the possibility of reconciling Bayesian and frequentist methods in sequential endpoint testing. More formally, the frequentist sequential endpoint test proceeds as follows:

• Pre-specify $n$ null and alternative hypothesis pairs, $\{M_0^i, M_1^i\}$, $i \in \{1, \ldots, n\}$, ordered in terms of importance (most important first).

• In the $i$-th test ($i < n$), test $\{M_0^i, M_1^i\}$ at level $\alpha$, continuing to the next test if the $i$-th null model is rejected; if the $i$-th test is not a rejection, stop testing and report the overall type-I error (the probability that any rejection is an error) as $\alpha$.

• If the testing terminates at the last trial, report an overall type-I error of $\alpha$ for the rejections that were made.

The startling feature of this test is that there is apparently no penalty for conducting multiple tests. One could conceivably reject all $n$ hypotheses, each at nominal level $\alpha$, and report that the overall error probability (the chance that any rejection is erroneous) is $\alpha$. This is in stark contrast to Bayesian testing. If one has followed the above procedure and has rejected the first $m$ null hypotheses, the probability of at least one of the rejections being in error is
\[
1 - \prod_{i=1}^m P(M_1^i \mid x_i)\,,
\]
where $P(M_1^i \mid x_i)$ is the posterior probability of the $i$-th alternative given the data. As $m$ grows, this error probability will go to one. Note that this is true even without introducing

a multiplicity adjustment in the Bayesian testing. The major difference between frequentist and Bayesian testing here is that of conditioning. The frequentist procedure is unconditional, and has overall error $\alpha$ because it is very unlikely to observe a long series of rejections. In contrast, the Bayesian procedure is

fully conditional, and assesses the error given that a long series of rejections has happened. This type of frequentist/Bayesian conflict has been addressed in other scenarios by

utilizing the conditional frequentist approach (Berger et al. (1994) and Dass and Berger (2003)), showing that, if conditioning is introduced into the frequentist paradigm, then the

conflict disappears. The purpose of this chapter is to investigate whether the conditional

frequentist paradigm can successfully remove the conflict in sequential endpoint testing.

The answer appears to be no, in general. Section 5.2.2 covers a partition construction on the sequential endpoint sample space when testing two simple hypotheses. Section 5.3 collects several examples explaining when frequentist and Bayesian errors are not equivalent.
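To make the contrast described in this introduction concrete, here is a toy computation (illustrative numbers only): even if each individual rejection has posterior probability $0.95$ of being correct, the Bayesian probability that at least one of $m$ rejections is erroneous grows steadily with $m$, while the frequentist report stays at $\alpha$.

```python
def bayes_cumulative_error(posteriors):
    """1 - prod_i P(M_1^i | x_i): the Bayesian probability that at least
    one of the rejections made so far is erroneous."""
    prod = 1.0
    for q in posteriors:
        prod *= q
    return 1 - prod

# each rejection individually quite convincing (posterior 0.95)
errs = [bayes_cumulative_error([0.95] * m) for m in (1, 5, 10, 50)]
```

The list `errs` is strictly increasing, approaching one as the number of rejections grows.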

5.2 Conditional frequentist testing

5.2.1 Testing of two simple hypotheses

Consider testing two simple hypotheses

\[
\begin{cases}
M_0^i : X_i \text{ has density } f_0(x_i)\\
M_1^i : X_i \text{ has density } f_1(x_i)\,.
\end{cases}
\]

Then α-level tests are as follows.

• The likelihood ratio test (LRT):

reject $M_0^i$ if $B_{01}^i(x_i) = \dfrac{f_0(x_i)}{f_1(x_i)} < k_i$,

where $k_i$ is such that $\alpha = P\left(x_i : B_{01}^i(x_i) < k_i \mid M_0^i\right)$.

• Bayesian test: defining $s_i \equiv B_{01}^i(x_i)$, the Bayesian test can be summarized as

– if $s_i \le 1$, reject $M_0^i$ and report the conditional error probability $\alpha = \dfrac{B_{01}^i}{1+B_{01}^i}$;

– if $s_i \ge 1$, accept $M_0^i$ and report the conditional error probability $\beta = \dfrac{1}{1+B_{01}^i}$.

Note that the frequentist and Bayesian test statistics are equal in this case. In the following, denote the cumulative distribution functions (cdfs) of the null and alternative models by $F_0^i = F_0(x_i)$ and $F_1^i = F_1(x_i)$, respectively. For a single test, Berger et al. (1994) showed the equivalence of the Bayesian procedure and a conditional frequentist procedure. In particular, they defined the conditional frequentist procedure by considering a partition of the sample space, with partitions

\[
\chi_{s_1} = \left\{x_1 \in \chi_1 \mid B_{01}^1(x_1) \in \{s_1, \psi(s_1)\}\right\}, \quad \text{where } \psi(s_1) = (F_0^1)^{-1}\left(1 - F_1(s_1)\right),
\]

and then defining frequentist error probabilities conditioned on being in a partition.

Theorem 5.2.1 (Berger, Brown, Wolpert, 1994). For $0 < s_1 < 1$,
\[
\alpha(s_1) = P_0\left(\text{rejecting } M_0^1 \mid \chi_{s_1}\right) = \frac{s_1}{1+s_1} = \text{posterior probability of } M_0^1 \text{ when } B_{01}^1 < 1\,,
\]
\[
\beta(s_1) = P_1\left(\text{accepting } M_0^1 \mid \chi_{s_1}\right) = \frac{1}{1+\psi(s_1)} = \text{posterior probability of } M_1^1 \text{ when } B_{01}^1 > 1\,.
\]

Two obvious consequences of the above theorem are that

Corollary 5.2.2.
\[
P_0\left(\text{accepting } M_0^1 \mid \chi_s\right) = 1 - \frac{s}{1+s} = \text{posterior probability of } M_1^1 \text{ when } B_{01}^1 < 1\,;
\]
\[
P_1\left(\text{rejecting } M_0^1 \mid \chi_s\right) = 1 - \frac{1}{1+\psi(s)} = \text{posterior probability of } M_0^1 \text{ when } B_{01}^1 > 1\,.
\]

We first generalize this construction to the sequential endpoint testing scenario.

5.2.2 Conditioning partitions for the sequential endpoint problem

Similar to Theorem 5.2.1, we attempt to establish a conditioning partition for the sequential endpoint sample space:

\[
\chi = (X_{1A}) \cup (X_{1R}, X_{2A}) \cup (X_{1R}, X_{2R}, X_{3A}) \cup \cdots \cup (X_{1R}, X_{2R}, \ldots, X_{(n-1)R}, X_n)\,,
\]
where the $X_{iA}$, $X_{iR}$ represent the acceptance and rejection regions for test $i$, respectively.

We will consider two different choices of partitions when $n = 2$.

A fixed partition: A seemingly natural partition to consider begins by defining $\psi(s_i) = (F_0^i)^{-1}\left(1 - F_1^i(s_i)\right)$ for any $0 < s_i \le 1$, and then lets the partition be
\[
\chi_{s_1,s_2} = \left\{x \in \chi : B_{01}^i \in \{s_i, \psi(s_i)\},\ i \in \{1,2\}\right\}
= \{B(x_1)=s_1, B(x_2)=s_2\} \cup \{B(x_1)=s_1, B(x_2)=\psi(s_2)\} \cup \{B(x_1)=\psi(s_1)\}\,.
\]
Unfortunately, this does not work because $\{B(x_1)=\psi(s_1)\}$ is of a different dimension than the other two parts of the partition. To overcome this, we must partition $\{B(x_1)=\psi(s_1)\}$ further, and an interesting way to do this is to define its subpartitions so that they

`match up' with $\{B(x_1)=s_1, B(x_2)=s_2\} \cup \{B(x_1)=s_1, B(x_2)=\psi(s_2)\}$ in terms of probability under the null hypothesis. Specifically, we will partition $\{B(x_1)=\psi(s_1)\}$ into subsets indexed by $s_2$, with the conditional mass of the subsets being $f_0^*(s_2) + f_1^*(s_2)$. The ensuing overall fixed partition of the sample space is
\[
\chi_{s_1,s_2}^F = \left\{x \in \chi : B_{01}^i \in \{s_i, \psi(s_i)\},\ i \in \{1,2\}\right\}
= \{B(x_1)=s_1, B(x_2)=s_2\} \cup \{B(x_1)=s_1, B(x_2)=\psi(s_2)\} \cup \{B(x_1)=\psi(s_1)\}_{s_2} \tag{5.1}
\]
\[
= \mathrm{I} \cup \mathrm{II} \cup \mathrm{III}\,.
\]

This will be considered in Appendix 5.4.1.

A random partition: The fixed partition does not yield results as simple as we had hoped, and so we also consider a random partition, defined by partitioning $\{B(x_1)=\psi(s_1)\}$ into subsets indexed by $s_2$, as follows:
\[
\{B(x_1)=\psi(s_1)\} = \bigcup_{0 \le s_2 \le 1} \left[\{B(x_1)=\psi(s_1)\}_{s_2} \cup \{B(x_1)=\psi(s_1)\}_{\psi(s_2)}\right],
\]
where the $\{B(x_1)=\psi(s_1)\}_{s_2}$ and $\{B(x_1)=\psi(s_1)\}_{\psi(s_2)}$ are chosen to have conditional probabilities proportional to $f_T^*(s_2)$ and $f_T^*(\psi(s_2))$, respectively, where $f_T^*(\cdot)$ refers to the unknown true density. The resulting overall random partition is
\[
\chi_{s_1,s_2}^R = \left\{x \in \chi : B_{01}^i \in \{s_i, \psi(s_i)\},\ i \in \{1,2\}\right\}
= \{B(x_1)=s_1, B(x_2)=s_2\} \cup \{B(x_1)=s_1, B(x_2)=\psi(s_2)\} \cup \{B(x_1)=\psi(s_1)\}_{s_2} \cup \{B(x_1)=\psi(s_1)\}_{\psi(s_2)} \tag{5.2}
\]
\[
= \mathrm{I} \cup \mathrm{II} \cup \mathrm{IIIA} \cup \mathrm{IIIB}\,.
\]
The reason this is called a random partition is that we do not know $f_T^*(s_2)$ and $f_T^*(\psi(s_2))$ and, hence, the partition will change with the true distribution.

Use of a random partition might seem strange, but it does satisfy the key requirements of conditional frequentist inference. First, conditional frequentist error probabilities are still computable; these probabilities are always calculated separately for each given possible true model. Hence, in their computation, the partition is known. Second, the resulting conditional error measures still satisfy the overall frequentist long-run error property, since again that property is defined with respect to a given model. As a final comment on the use of this random partition, note that the randomness only arises in the way that $\{B(x_1)=\psi(s_1)\}$ is partitioned; this partitioning is rather artificial

(it is only $x_1$ that arises from the sequential sample here), so setting up the partition so that the answer does not depend on the supposedly irrelevant $x_2$ seems quite appealing.

5.2.3 Conditional frequentist error probabilities for the random partition

In this section, the conditional frequentist error probabilities, given the random partition, are provided. Assuming $M_i^1$ and $M_j^2$ are true, we compute $P_{ij}(\text{decision} \mid \chi_{s_1,s_2})$, the error probability for the decision conditioned on $\chi_{s_1,s_2}$, in the following two theorems. These results are also summarized in the following table.

Table 5.1: Frequentist error probabilities, conditional on being in the random partition.

                | $(s_1, s_2)$ | $(s_1, \psi(s_2))$ | $(\psi(s_1), s_2 \cup \psi(s_2))$
                | reject $M_0^1$, reject $M_0^2$ | reject $M_0^1$, accept $M_0^2$ | accept $M_0^1$
$P_{00}(\cdot \mid \chi_{s_1,s_2})$ | $\frac{s_1}{s_1+1}\frac{s_2}{s_2+1}$ | $\frac{s_1}{s_1+1}\frac{1}{s_2+1}$ | $\frac{1}{s_1+1}$
$P_{01}(\cdot \mid \chi_{s_1,s_2})$ | $\frac{s_1}{s_1+1}\frac{\psi(s_2)}{\psi(s_2)+1}$ | $\frac{s_1}{s_1+1}\frac{1}{\psi(s_2)+1}$ | $\frac{1}{s_1+1}$
$P_{10}(\cdot \mid \chi_{s_1,s_2})$ | $\frac{\psi(s_1)}{\psi(s_1)+1}\frac{s_2}{s_2+1}$ | $\frac{\psi(s_1)}{\psi(s_1)+1}\frac{1}{s_2+1}$ | $\frac{1}{\psi(s_1)+1}$
$P_{11}(\cdot \mid \chi_{s_1,s_2})$ | $\frac{\psi(s_1)}{\psi(s_1)+1}\frac{\psi(s_2)}{\psi(s_2)+1}$ | $\frac{\psi(s_1)}{\psi(s_1)+1}\frac{1}{\psi(s_2)+1}$ | $\frac{1}{\psi(s_1)+1}$

Theorem 5.2.3 (Type I error).
\[
P_{00}\left(\text{rejecting } M_0^1 \mid \chi_{s_1,s_2}\right) = P_{00}\left(\text{rejecting } M_0^1 \vee M_0^2 \mid \chi_{s_1,s_2}\right) = \frac{B_{01}^1}{B_{01}^1+1} \quad \text{if } B_{01}^1 < 1\,,
\]
\[
P_{00}\left(\text{rejecting } M_0^2 \mid \chi_{s_1,s_2}\right) = P_{00}\left(\text{rejecting } M_0^1 \wedge M_0^2 \mid \chi_{s_1,s_2}\right) = \left(\frac{B_{01}^1}{B_{01}^1+1}\right)\left(\frac{B_{01}^2}{B_{01}^2+1}\right) \quad \text{if } B_{01}^1 < 1,\ B_{01}^2 < 1\,,
\]
\[
P_{10}\left(\text{rejecting } M_0^2 \mid \chi_{s_1,s_2}\right) = P_{10}\left(\text{rejecting } M_0^1 \wedge M_0^2 \mid \chi_{s_1,s_2}\right) = \left(\frac{\psi(s_1)}{1+\psi(s_1)}\right)\left(\frac{B_{01}^2}{1+B_{01}^2}\right) \quad \text{if } B_{01}^2 < 1\,,
\]
\[
P_{01}\left(\text{rejecting } M_0^1 \mid \chi_{s_1,s_2}\right) = P_{01}\left(\text{rejecting } M_0^1 \vee M_0^2 \mid \chi_{s_1,s_2}\right) = \frac{B_{01}^1}{B_{01}^1+1} \quad \text{if } B_{01}^1 < 1\,,
\]
\[
P_{11}\left(\text{rejecting } M_0^1 \mid \chi_{s_1,s_2}\right) = \frac{\psi(s_1)}{1+\psi(s_1)}\,,
\]
\[
P_{11}\left(\text{rejecting } M_0^2 \mid \chi_{s_1,s_2}\right) = \left(\frac{\psi(s_1)}{1+\psi(s_1)}\right)\left(\frac{\psi(s_2)}{1+\psi(s_2)}\right),
\]
\[
P_{10}\left(\text{rejecting } M_0^2 \mid \text{rejecting } M_0^1,\ \chi_{s_1,s_2}\right) = \frac{B_{01}^2}{B_{01}^2+1} \quad \text{if } B_{01}^2 < 1\,.
\]

Theorem 5.2.4 (Type II error).
\[
P_{01}\left(\text{accepting } M_0^2 \mid \chi_{s_1,s_2}\right) = P_{01}\left(\text{rejecting } M_0^1 \cap \text{accepting } M_0^2 \mid \chi_{s_1,s_2}\right) = \left(\frac{B_{01}^1}{B_{01}^1+1}\right)\left(\frac{1}{B_{01}^2+1}\right) \quad \text{if } B_{01}^1 < 1,\ B_{01}^2 > 1\,,
\]
\[
P_{11}\left(\text{accepting } M_0^2 \mid \chi_{s_1,s_2}\right) = P_{11}\left(\text{rejecting } M_0^1 \cap \text{accepting } M_0^2 \mid \chi_{s_1,s_2}\right) = \left(\frac{\psi(s_1)}{\psi(s_1)+1}\right)\left(\frac{1}{B_{01}^2+1}\right) \quad \text{if } B_{01}^2 > 1\,,
\]
\[
P_{10}\left(\text{accepting } M_0^1 \mid \chi_{s_1,s_2}\right) = \frac{1}{B_{01}^1+1} \quad \text{if } B_{01}^1 > 1\,,
\]
\[
P_{11}\left(\text{accepting } M_0^1 \mid \chi_{s_1,s_2}\right) = \frac{1}{B_{01}^1+1} \quad \text{if } B_{01}^1 > 1\,,
\]
\[
P_{11}\left(\text{accepting } M_0^1 \vee M_0^2 \mid \chi_{s_1,s_2}\right) = 1 - \left(\frac{B_{01}^1}{B_{01}^1+1}\right)\left(\frac{B_{01}^2}{B_{01}^2+1}\right) \quad \text{if } B_{01}^1, B_{01}^2 > 1\,,
\]
\[
P_{01}\left(\text{accepting } M_0^2 \mid \text{rejecting } M_0^1,\ \chi_{s_1,s_2}\right) = \frac{1}{B_{01}^2+1} \quad \text{if } B_{01}^2 > 1\,.
\]

Unfortunately, these frequentist errors, conditioned on this random partition, can be quite different from the Bayesian reported errors. In general, constructing appropriate partitions on the sequential endpoint space seems very hard. In the next section, we investigate whether it is possible to construct partitions that lead to Bayesian/frequentist agreement.

5.3 Assessing the possibility of Bayesian/frequentist agreement

One cannot conclude, from the failure in the previous section to find a conditional frequentist procedure that agrees with the Bayesian procedure, that no such procedure exists. But note that conditional frequentist procedures still have unconditional error probability $\alpha$, so that, if one could be found to match the standard Bayesian test, then the standard Bayesian test should also have expected error probability $\alpha$ (at least approximately), when averaged over the rejection regions. In this section we study whether this is so, for a variety of examples. To begin, the following table gives the decisions and reported errors, for the different parts of the sample space, in sequential endpoint testing. Here $\alpha_f(\chi_s)$ and $\alpha_B(\chi_s)$ are the reported frequentist and Bayesian error probabilities, respectively.

Table 5.2: Decisions and reported errors in sequential endpoint tests

subset of $\chi$ | decision | $\alpha_f(\chi_s)$ | $\alpha_B(\chi_s)$
$A_1$ | Accept $M_0^1$ | -- | --
$(R_1, A_2)$ | Reject $M_0^1$, accept $M_0^2$ | $\alpha$ | $P(M_0^1 \mid x_1)$
$(R_1, R_2, A_3)$ | Reject $M_0^1$, $M_0^2$, accept $M_0^3$ | $\alpha$ | $1 - \prod_1^2 P(M_1^i \mid x_i)$
$\ldots$ | $\ldots$ | $\ldots$ | $\ldots$
$(R_1, \ldots, R_{n-1}, A_n)$ | Reject $M_0^1$ to $M_0^{n-1}$, accept $M_0^n$ | $\alpha$ | $1 - \prod_1^{n-1} P(M_1^i \mid x_i)$
$(R_1, \ldots, R_{n-1}, R_n)$ | Reject all null models | $\alpha$ | $1 - \prod_1^n P(M_1^i \mid x_i)$

We first consider the case when $n = 2$. For completeness, we give the frequentist computation showing that the overall error probability of the sequential endpoint test is $\alpha$. For comparison with the Bayesian procedure, we formally compute this unconditional Type I error as the expected value of the frequentist reported error ($\alpha$), conditioned on rejecting.

Example 5.3.1 (two endpoints, frequentist).

Table 5.3: Decisions and reported frequentist errors for two endpoints.

partition $\chi_s$ | decision | frequentist error | $P(\chi_s)$
$A_1$ | Accept $M_0^1$ | -- | $(1-\alpha)$
$(R_1, A_2)$ | Reject $M_0^1$, accept $M_0^2$ | $\alpha$ | $\alpha(1-\alpha)$
$(R_1, R_2)$ | Reject $M_0^1$, reject $M_0^2$ | $\alpha$ | $\alpha^2$

Thus the expected frequentist report, conditioned on being in the rejection region, is
\[
\alpha_f = E(\alpha \mid \text{rejection}) = \frac{1}{\alpha}\left[\int_{R_1}\int_{A_2} \alpha\, f_0(x_1) f_0(x_2)\, dx_1 dx_2 + \int_{R_1}\int_{R_2} \alpha\, f_0(x_1) f_0(x_2)\, dx_1 dx_2\right]
= \frac{1}{\alpha}\left[(1-\alpha)\alpha^2 + \alpha^3\right] = \alpha\,.
\]
Next we give the frequentist expectation, conditioned on rejecting, of the reported Bayesian error for two endpoints.

Lemma 5.3.2 (two endpoints, Bayesian). The expected Bayesian reported error, conditional on the rejection region, is
\[
\alpha_B = (1-v_1)v_2\alpha + v_1\,, \quad \text{where } v_i = \frac{1}{\alpha}\int_{R_i} P(M_0^i \mid x_i)\, f_0(x_i)\, dx_i\,. \tag{5.3}
\]
Proof. The decisions and Bayesian reported error probabilities in two dimensions are

Table 5.4: Decisions and reported errors

partition $\chi_s$ | decision | Bayesian error | $P(\chi_s)$
$A_1$ | Accept $M_0^1$ | -- | $(1-\alpha)$
$(R_1, A_2)$ | Reject $M_0^1$, accept $M_0^2$ | $P(M_0^1 \mid x_1)$ | $\alpha(1-\alpha)$
$(R_1, R_2)$ | Reject $M_0^1$, reject $M_0^2$ | $1 - P(M_1^1 \mid x_1)\, P(M_1^2 \mid x_2)$ | $\alpha^2$

The expectation of this Bayesian error probability, conditioned on the rejection region, is

\[
\alpha_B = E(\text{Bayesian error} \mid \text{rejection})
= \frac{1}{\alpha}\int_{R_1}\int_{A_2} P(M_0^1 \mid x_1)\, f_0(x_1) f_0(x_2)\, dx_1 dx_2
+ \frac{1}{\alpha}\int_{R_1}\int_{R_2} \left(1 - P(M_1^1 \mid x_1)\, P(M_1^2 \mid x_2)\right) f_0(x_1) f_0(x_2)\, dx_1 dx_2
\]
\[
= \frac{1}{\alpha}\left[(1-\alpha)\int_{R_1}\left(1 - P(M_1^1 \mid x_1)\right) f_0(x_1)\, dx_1\right]
+ \frac{1}{\alpha}\left[\alpha^2 - \left(\int_{R_1} P(M_1^1 \mid x_1)\, f_0(x_1)\, dx_1\right)\left(\int_{R_2} P(M_1^2 \mid x_2)\, f_0(x_2)\, dx_2\right)\right]
\]
\[
= \frac{1}{\alpha}\left[(1-\alpha)v_1\alpha + \alpha^2 - (1-v_1)(1-v_2)\alpha^2\right]
= (1-v_1)v_2\alpha + v_1\,.
\]
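Lemma 5.3.2 can be checked by simulation (a sketch; for concreteness both endpoints use the densities $f_0 = \text{Uniform}(0,1)$ and $f_1(x) = 2x$ of Example 5.3.3 below, an illustrative choice):

```python
import math
import random

def simulate_alpha_B(alpha, trials=200_000, seed=0):
    """Monte Carlo average of the Bayesian reported error, conditional
    on rejecting the first null, for two endpoints each testing
    f0 = Uniform(0,1) vs f1(x) = 2x.  Rejection regions are
    R_i = (1 - alpha, 1]; the report is P(M_0^1 | x1) = 1/(1+2 x1)
    after (reject, accept), and 1 - prod 2x_i/(1+2x_i) after two rejections."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        x1 = rng.random()
        if x1 <= 1 - alpha:
            continue                       # first null accepted: no rejection
        count += 1
        x2 = rng.random()
        if x2 > 1 - alpha:                 # both nulls rejected
            total += 1 - (2*x1/(1 + 2*x1)) * (2*x2/(1 + 2*x2))
        else:                              # reject first, accept second
            total += 1 / (1 + 2*x1)
    return total / count

def alpha_B_closed(alpha):
    """(5.3) with v1 = v2 = (1/(2 alpha)) log(3/(3 - 2 alpha)) for these densities."""
    v = math.log(3 / (3 - 2*alpha)) / (2*alpha)
    return (1 - v) * v * alpha + v
```

The simulated and closed-form values agree, and both exceed $\alpha$, anticipating the conclusion of Example 5.3.3.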

In practice, we are interested in small $\alpha$, e.g. $\alpha = 0.05$. An immediate consequence of (5.3) is that, if $v_1$ does not converge to zero as $\alpha$ goes to zero, then $\alpha_B$ cannot possibly equal $\alpha$. Here is an example.

Example 5.3.3 (bounded distributions). Suppose the primary endpoint is testing:
\[
\begin{cases}
M_0^1 : f_0(x_1) = 1\,, & x_1 \in [0,1]\,,\\
M_1^1 : f_1(x_1) = 2x_1\,, & x_1 \in [0,1]\,.
\end{cases}
\]

The likelihood ratio is
\[
\frac{f_0(x_1)}{f_1(x_1)} = \frac{1}{2x_1}\,.
\]

Therefore, the rejection region is

\[
R_1(\alpha) = \{x_1 : x_1 > 1-\alpha\}\,; \quad P(R_1 \mid M_0^1) = \alpha\,.
\]

Since the posterior probability is

\[
P(M_0^1 \mid x_1) = \frac{f_0(x_1)}{f_0(x_1) + f_1(x_1)}\,,
\]
\[
v_1 = \frac{1}{\alpha}\int_{1-\alpha}^1 \frac{dx}{2x+1} = \frac{1}{2\alpha}\log\left(\frac{3}{3-2\alpha}\right).
\]
Now consider the secondary endpoint:

\[
\begin{cases}
M_0^2 : x_2 \sim \text{Uniform}(0,1)\\
M_1^2 : x_2 \sim \text{Beta}(1/2, 1)\,.
\end{cases}
\]

The likelihood ratio and rejection region are

\[
\frac{f_0(x_2)}{f_1(x_2)} = \frac{1}{\left(2\sqrt{x_2}\right)^{-1}} = 2\sqrt{x_2}\,,
\]
\[
R_2(\alpha) = \{x_2 : x_2 < \alpha\}\,; \quad P(R_2 \mid M_0^2) = \alpha\,.
\]

Therefore,
\[
v_2 = \frac{1}{\alpha}\int_0^{\alpha} \frac{2\sqrt{x}\, dx}{2\sqrt{x}+1} = 1 - \frac{1}{\alpha}\left[\sqrt{\alpha} - \frac{1}{2}\log\left(2\sqrt{\alpha}+1\right)\right].
\]
Notice that

\[
\lim_{\alpha \to 0} v_1 = 1/3\,, \qquad \lim_{\alpha \to 0} v_2 = 0\,.
\]

Since $v_1$ does not converge to zero, from (5.3) it is clear that $\alpha_B(\alpha) > \alpha$ for small $\alpha$.
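The closed forms in this example are straightforward to confirm numerically (an illustrative sketch; `quad` is a simple midpoint integrator):

```python
import math

def quad(f, a, b, n=100_000):
    # simple midpoint-rule integrator
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def v1(alpha):
    # closed form: (1/(2 alpha)) log(3 / (3 - 2 alpha))
    return math.log(3 / (3 - 2 * alpha)) / (2 * alpha)

def v2(alpha):
    # closed form: 1 - (1/alpha)(sqrt(alpha) - (1/2) log(2 sqrt(alpha) + 1))
    return 1 - (math.sqrt(alpha) - 0.5 * math.log(2 * math.sqrt(alpha) + 1)) / alpha
```

Direct integration of the posterior null probabilities over the rejection regions reproduces both formulas, and shrinking $\alpha$ exhibits the limits $v_1 \to 1/3$ and $v_2 \to 0$.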

This example shows that it is not possible to find a partition of the sequential endpoint testing sample space having the same conditional frequentist and Bayesian errors at every endpoint. Nonetheless, since $\alpha_B > \alpha$, the reported Bayesian error probability is greater than the frequentist error (on average), so that this Bayesian procedure could still be considered a valid frequentist procedure (though overly conservative).

Even if $\lim_{\alpha\to 0} v_i = 0$, it can still be the case that $\alpha_B > \alpha$ for small $\alpha$, as the following example shows.

Example 5.3.4 (Gaussian distribution).

\[
\begin{cases}
M_0^i : f_0(x_i) \sim N(\mu_0, \sigma^2)\\
M_1^i : f_1(x_i) \sim N(\mu_1, \sigma^2)
\end{cases}
\quad \text{where } \mu_0 < \mu_1\,,
\]
\[
R_i(\alpha) = \{x : x > \sigma z_{1-\alpha} + \mu_0\}\,.
\]

Since $v_i$ does not have a closed form here, we use the following approximation instead, which is valid for small $\alpha$:

\[
v_i = \frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0(x_i)}{f_0(x_i) + f_1(x_i)}\, f_0\, dx_i
= 1 - \frac{1}{\alpha}\int_{R_i(\alpha)} \frac{1}{1 + f_0/f_1}\, f_0\, dx_i
\]
\[
= 1 - \frac{1}{\alpha}\int_{R_i(\alpha)} f_0\, dx_i + \frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0^2}{f_1}\, dx_i + O\!\left(\frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0^3}{f_1^2}\, dx_i\right)
= \frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0^2}{f_1}\, dx_i + O\!\left(\frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0^3}{f_1^2}\, dx_i\right). \tag{5.4}
\]

Also,

\[
\frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0^2}{f_1}\, dx = \frac{1}{\alpha}\int_{\sigma z_{1-\alpha}+\mu_0}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}\left[2(x-\mu_0)^2 - (x-\mu_1)^2\right]\right\} dx
= \frac{1}{\alpha}\exp\left\{\frac{(\mu_0-\mu_1)^2}{\sigma^2}\right\}\left(1 - \Phi\!\left(z_{1-\alpha} + \frac{\mu_1-\mu_0}{\sigma}\right)\right). \tag{5.5}
\]

By Fact 5.4.1 and Lemma 5.4.2 (taking $\sigma = 1$ for simplicity),
\[
1 - \Phi(z_{1-\alpha} + \mu_1 - \mu_0) = \frac{\phi(z_{1-\alpha})\, e^{-(\mu_1-\mu_0) z_{1-\alpha} - (\mu_1-\mu_0)^2/2}}{z_{1-\alpha} + \mu_1 - \mu_0}\left(1+o(1)\right)
= \alpha\, e^{-(\mu_1-\mu_0) z_{1-\alpha} - (\mu_1-\mu_0)^2/2}\left(1+o(1)\right),
\]
so that
\[
\frac{1}{\alpha}\int_{R_i(\alpha)} \frac{f_0^2}{f_1}\, dx = e^{(\mu_1-\mu_0)^2/2 - (\mu_1-\mu_0) z_{1-\alpha}}\left(1+o(1)\right).
\]
Since $z_{1-\alpha} = O(\sqrt{-\log\alpha})$, this quantity tends to zero more slowly than any positive power of $\alpha$; in particular, (5.5) is greater than $\alpha$ when $\alpha$ is small, so $v_i \gg \alpha$ and hence $\alpha_B > \alpha$.

Similarly, with the exponential distribution (testing $f_0(x_i) = \lambda_0 \exp\{-\lambda_0 x_i\}$ versus $f_1(x_i) = \lambda_1 \exp\{-\lambda_1 x_i\}$, $\lambda_1 > \lambda_0$), it can be shown that $\alpha_B > \alpha$ for small $\alpha$. Thus we have several examples, involving testing of simple hypotheses, where the Bayesian and conditional frequentist procedures cannot possibly report the same error probabilities in sequential endpoint testing. In the next section, we investigate composite hypotheses.

5.3.2 Testing composite hypotheses

For composite hypothesis testing:

\[
\begin{cases}
M_0^i : x_i \sim f_0(x_i \mid \theta_0)\,, & \theta_0 \in \Theta_0\,,\\
M_1^i : x_i \sim f_1(x_i \mid \theta_1)\,, & \theta_1 \in \Theta_1\,.
\end{cases} \tag{5.6}
\]

Given priors $\pi_0$, $\pi_1$ on the unknown parameters $\theta_0$, $\theta_1$ and equal prior probability ($=1/2$) on both models, the posterior probability of the null hypothesis is
\[
P(M_0^i \mid x_i, \pi_0, \pi_1) = \frac{m_0(x_i)}{m_1(x_i) + m_0(x_i)}\,,
\]
where the marginal likelihood is
\[
m_j(x_i) = \int_{\Theta_j} f_j(x_i \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j\,, \quad j \in \{0,1\}\,.
\]

In particular, when $f_0 = f_1$, $\pi_0 = \pi_1$, $\Theta_0 = (-\infty, 0)$, and $\Theta_1 = (0, \infty)$, (5.6) is a one-sided test. The following theorem shows that, using the standard constant prior, the average Bayesian null posterior probability over the rejection region is equal to $\alpha/2$, which is anti-conservative.

Theorem 5.3.5 (one-sided hypothesis testing). Suppose
\[
\begin{cases}
M_0^i : f(x_i - \theta)\,, & \theta \le 0\,,\\
M_1^i : f(x_i - \theta)\,, & \theta > 0\,,
\end{cases}
\]
where $f$ is symmetric about zero and has monotone likelihood ratio (MLR), and the prior is the constant prior $\pi(\theta) = 1$. Then the expected Bayesian error probability, conditional on rejection, satisfies
\[
\frac{1}{\alpha}\int_{R_i(\alpha)} P(M_0^i \mid x_i)\, f(x_i)\, dx_i = \alpha/2\,.
\]
Proof. By Casella and Berger (1987),
\[
P(M_0^i \mid x_i) = p(x_i)\,,
\]
where $p(x_i)$ is the p-value. Since $f$ has MLR, without loss of generality let $R_i(\alpha) = \{x : x > c\}$, so that
\[
\int_{R_i(\alpha)} p(x_i)\, f_0(x_i)\, dx_i = \int_c^{\infty}\left(\int_{x_i}^{\infty} f_0(t)\, dt\right) f_0(x_i)\, dx_i = \int_c^{\infty}\left(1 - F_0(x_i)\right) f_0(x_i)\, dx_i
\]
\[
= \alpha - \int_c^{\infty} F_0(x_i)\, dF_0(x_i) = \alpha - \left(1 - (1-\alpha)^2\right)/2 = \alpha^2/2\,.
\]
The theorem follows by noticing that
\[
\frac{1}{\alpha}\int_{R_i(\alpha)} P(M_0^i \mid x_i)\, f_0(x_i)\, dx_i = \frac{1}{\alpha}\int_{R_i(\alpha)} p(x_i)\, f_0(x_i)\, dx_i = \alpha/2\,.
\]

While the Bayesian report here is the same as the p-value, neither is a valid frequentist report, both having expected value, conditional on the rejection region, of $\alpha/2$; i.e., both are anti-conservative.

Remark 5.3.6. Casella and Berger also showed that
\[
\inf_{\pi \in \Gamma_s} P(M_0^i \mid x_i, \pi) = p(x_i)\,,
\]
where $\Gamma_s = \{\text{all distributions symmetric about zero}\}$, and the infimum is attained when the prior is constant. Hence, under the same assumptions,
\[
\inf_{\pi \in \Gamma_s} \frac{1}{\alpha}\int_{R_i(\alpha)} P(M_0^i \mid x_i, \pi)\, f(x_i)\, dx_i = \alpha/2\,.
\]

Corollary 5.3.7. If the primary endpoint satisfies the assumptions in Theorem 5.3.5, then the average Bayesian error, conditional on rejecting, is a proper (conservative) frequentist report if and only if the secondary endpoint has
\[
v_2 > \frac{1/2}{1-\alpha/2}\,. \tag{5.7}
\]
Proof. By Theorem 5.3.5, $v_1 = \alpha/2$, so by (5.3),
\[
\alpha_B = \frac{\alpha}{2} + \left(1 - \frac{\alpha}{2}\right) v_2\,\alpha\,,
\]
which exceeds $\alpha$ exactly when (5.7) holds.

When testing two-sided hypotheses, the p-value is much smaller than the Bayesian error probability (Sellke et al. (2001)), so it might well be the case that $\alpha_B$ is greater than $\alpha$ for two-sided tests. Thus we finish with the example of a two-sided test with the Gaussian distribution. This example is similar to Example 5.3.4.

Example 5.3.8 (two-sided hypothesis testing, Gaussian distribution).

\[
\begin{cases}
M_0^i : f_0(x_i) \sim N(0, \sigma^2)\\
M_1^i : f_1(x_i) \sim N(\mu, \sigma^2)\,, & \mu \neq 0\,.
\end{cases}
\]
Suppose $\mu \sim N(0, \tau^2)$, so that the marginal likelihoods are
\[
m_0(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-x_i^2/(2\sigma^2)\right\}, \qquad
m_1(x_i) = \frac{1}{\sqrt{2\pi(\sigma^2+\tau^2)}}\exp\left\{-x_i^2/(2\sigma^2+2\tau^2)\right\},
\]
\[
R_i(\alpha) = \left\{x_i : |x_i| > \sigma z_{1-\alpha/2}\right\},
\]
and, to leading order (cf. (5.4)),
\[
v_i = \frac{1}{\alpha}\int_{R_i(\alpha)} \frac{1}{1 + m_1/m_0}\, f_0(x_i)\, dx_i
= \frac{1}{\alpha}\,\frac{\sqrt{\sigma^2+\tau^2}}{\sigma^2\sqrt{2\pi}}\int_{|x_i| > \sigma z_{1-\alpha/2}} \exp\left\{-\frac{x_i^2}{2\sigma^2}\left(\frac{\tau^2}{\sigma^2+\tau^2}+1\right)\right\} dx_i\ \left(1+o(1)\right)
\]
\[
= \frac{2(\sigma^2+\tau^2)}{\alpha\,\sigma\sqrt{2\tau^2+\sigma^2}}\left[1 - \Phi\!\left(z_{1-\alpha/2}\sqrt{\frac{\tau^2}{\sigma^2+\tau^2}+1}\right)\right]\left(1+o(1)\right).
\]
By Lemma 5.4.2,
\[
1 - \Phi\!\left(z_{1-\alpha/2}\sqrt{\frac{\tau^2}{\sigma^2+\tau^2}+1}\right)
= O\!\left(\alpha^{\tau^2/(\sigma^2+\tau^2)+1}\,(-\log\alpha)^{\tau^2/(2(\sigma^2+\tau^2))}\right),
\]
so that
\[
v_i = O\!\left(\alpha^{\tau^2/(\sigma^2+\tau^2)}\,(-\log\alpha)^{\tau^2/(2(\sigma^2+\tau^2))}\right) \to 0\,, \qquad \text{while } \frac{v_i}{\alpha} \to \infty\,.
\]

Therefore, $v_i$ is greater than $\alpha$ for small $\alpha$, showing that $\alpha_B$ is a conservative frequentist report. These examples show that virtually anything is possible in sequential endpoint testing: the Bayesian answers can be conservative frequentist reports or anti-conservative frequentist reports. Obtaining exact agreement appears almost impossible: first one would need to find a (peculiar) situation in which the Bayesian procedure has exact average reported error of $\alpha$, and then one would need to find a partitioning of the sample space so that the conditional frequentist error probabilities equal the Bayesian error probabilities.

5.4 Appendix

5.4.1 Conditional frequentist analysis for the fixed partition

In this section, we derive conditional frequentist error probabilities using the fixed partition

(5.1).

Table 5.5: Decision rules

region | I | II | III
$(B_{01}^1(x_1), B_{01}^2(x_2))$ | $(s_1, s_2)$ | $(s_1, \psi(s_2))$ | $(\psi(s_1), s_2)$
decision | reject $M_0^1$ \& $M_0^2$ | reject $M_0^1$, accept $M_0^2$ | accept $M_0^1$

Table 5.6: Conditional probabilities up to a normalizing constant

 | I | II | III
$P_{00}(\cdot \mid \chi_{s_1,s_2}) \propto$ | $f_0^*(s_1)\, f_0^*(s_2)$ | $f_0^*(s_1)\, f_1^*(s_2)$ | $f_1^*(s_1)\left(f_0^*(s_2) + f_1^*(s_2)\right)$
$P_{01}(\cdot \mid \chi_{s_1,s_2}) \propto$ | $f_0^*(s_1)\, f_1^*(s_2)$ | $f_0^*(s_1)\,\frac{1}{\psi(s_2)}\, f_1^*(s_2)$ | $f_1^*(s_1)\left(f_0^*(s_2) + f_1^*(s_2)\right)$
$P_{10}(\cdot \mid \chi_{s_1,s_2}) \propto$ | $f_1^*(s_1)\, f_0^*(s_2)$ | $f_1^*(s_1)\, f_1^*(s_2)$ | $\frac{f_1^*(s_1)}{\psi(s_1)}\left(f_0^*(s_2) + f_1^*(s_2)\right)$
$P_{11}(\cdot \mid \chi_{s_1,s_2}) \propto$ | $f_1^*(s_1)\, f_1^*(s_2)$ | $f_1^*(s_1)\,\frac{1}{\psi(s_2)}\, f_1^*(s_2)$ | $\frac{f_1^*(s_1)}{\psi(s_1)}\left(f_0^*(s_2) + f_1^*(s_2)\right)$

Error probabilities

• $P_{00}(\text{reject } M_0^1 \vee M_0^2 \mid \chi_{s_1,s_2}) = P_{00}(\text{reject } M_0^1 \mid \chi_{s_1,s_2}) = \frac{\mathrm{I}+\mathrm{II}}{\mathrm{I}+\mathrm{II}+\mathrm{III}} = \frac{f_0^*(s_1)}{f_0^*(s_1)+f_1^*(s_1)} = \frac{B_{01}^1}{B_{01}^1+1}$

• $P_{01}(\text{reject } M_0^1 \mid \chi_{s_1,s_2}) = \frac{\mathrm{I}+\mathrm{II}}{\mathrm{I}+\mathrm{II}+\mathrm{III}} = \frac{f_0^*(s_1)}{f_0^*(s_1)+f_1^*(s_1)\,\frac{1+s_2}{1+1/\psi(s_2)}}$

• $P_{10}(\text{reject } M_0^2 \mid \chi_{s_1,s_2}) = \frac{\mathrm{I}}{\mathrm{I}+\mathrm{II}+\mathrm{III}} = \frac{f_0^*(s_2)}{\left(1+\frac{1}{\psi(s_1)}\right)\left(f_0^*(s_2)+f_1^*(s_2)\right)} \le \frac{f_0^*(s_2)}{f_0^*(s_2)+f_1^*(s_2)}$

• $P_{11}(\text{falsely reject } M_0^1 \vee M_0^2 \mid \chi_{s_1,s_2}) = 0$

95 Power

1. Power: make all correct rejections

• $P_{00}(\text{correctly reject } M_0^1, M_0^2 \mid \chi_{s_1,s_2}) = 0$

• $P_{01}(\text{reject } M_0^1, \text{accept } M_0^2 \mid \chi_{s_1,s_2}) = \frac{\mathrm{II}}{\mathrm{I}+\mathrm{II}+\mathrm{III}} = \frac{1}{1+\psi(s_2)+\psi(s_2)\,\frac{1+s_2}{s_1}}$

• $P_{10}(\text{reject } M_0^1 \mid \chi_{s_1,s_2}) = \frac{\mathrm{I}+\mathrm{II}}{\mathrm{I}+\mathrm{II}+\mathrm{III}} = \frac{1}{1+\frac{1}{\psi(s_1)}} = \frac{\psi(s_1)}{\psi(s_1)+1} = \frac{B_{01}^1}{B_{01}^1+1}$

• $P_{11}(\text{reject } M_0^1 \wedge M_0^2 \mid \chi_{s_1,s_2}) = \frac{\mathrm{I}}{\mathrm{I}+\mathrm{II}+\mathrm{III}} = \frac{1}{1+\frac{1}{\psi(s_2)}+\frac{1+s_2}{\psi(s_1)}}$

2. Marginalized power: $P(\text{reject } M_0^i \mid M_1^i)$

5.4.2 Conditional frequentist test

Fact 5.4.1 (Normal tail probability). Let $\Phi(t)$ be the cumulative distribution function of the standard normal; then
\[
\frac{t}{t^2+1}\,\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} \;\le\; 1 - \Phi(t) \;\le\; \frac{1}{t}\,\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,,
\]
\[
1 - \Phi(t) = \frac{1}{t\sqrt{2\pi}}\,e^{-t^2/2} + O\!\left(\frac{e^{-t^2/2}}{t^3}\right).
\]

A proof can be found in Durrett (2010).

Lemma 5.4.2 (Asymptote of the Gaussian quantile). Let $z_p$ be the $p$-quantile of the standard Gaussian distribution, i.e.
\[
p = \Phi(z_p)\,, \quad \text{where } \Phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^t e^{-x^2/2}\, dx\,;
\]
then
\[
z_{1-\alpha} = \sqrt{-2\log\left(\sqrt{2\pi}\,\alpha\right) - \log\left(-2\log\left(\sqrt{2\pi}\,\alpha\right)\right)}\;\left(1+o(1)\right).
\]

Proof. By Fact 5.4.1, the ratio of the normal tail probability, evaluated at the claimed value $t$ of $z_{1-\alpha}$, to $\alpha$ is
\[
\frac{1 - \Phi(t)}{\alpha} = \frac{\phi(t)}{t\,\alpha}\left(1+o(1)\right)
= \frac{\frac{1}{\sqrt{2\pi}}\exp\left\{\log\left(\sqrt{2\pi}\,\alpha\right) + \frac{1}{2}\log\left(-2\log\left(\sqrt{2\pi}\,\alpha\right)\right)\right\}}{\alpha\sqrt{-2\log\left(\sqrt{2\pi}\,\alpha\right) - \log\left(-2\log\left(\sqrt{2\pi}\,\alpha\right)\right)}}\left(1+o(1)\right)
\]
\[
= \left(1 - \frac{\log\left(-2\log\left(\sqrt{2\pi}\,\alpha\right)\right)}{-2\log\left(\sqrt{2\pi}\,\alpha\right)}\right)^{-1/2}\left(1+o(1)\right) = 1 + o(1)\,.
\]
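Lemma 5.4.2 can be verified numerically; the standard normal quantile is obtained here by bisection on `math.erf` (an illustrative helper, not from the thesis):

```python
import math

def norm_quantile(p, lo=-40.0, hi=40.0):
    # bisection on Phi(t) = (1 + erf(t / sqrt(2))) / 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if (1 + math.erf(mid / math.sqrt(2))) / 2 < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def quantile_approx(alpha):
    # z_{1-alpha} ~ sqrt(-2 log(sqrt(2 pi) alpha) - log(-2 log(sqrt(2 pi) alpha)))
    t = -2 * math.log(math.sqrt(2 * math.pi) * alpha)
    return math.sqrt(t - math.log(t))
```

Already at $\alpha = 10^{-4}$ the approximation is within a fraction of a percent of the exact quantile, and the accuracy improves as $\alpha$ shrinks.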

Theorem 5.4.3. Suppose there is no constant $c > 0$ such that
\[
P\left(x_i : f_0(x_i) < c f_1(x_i)\ \big|\ M_0^i\right) = 0
\]
(i.e., the likelihood ratio $f_0/f_1$ is not bounded away from zero under $M_0^i$); then
\[
\lim_{\alpha \to 0} v_i(\alpha) = 0\,.
\]

Proof. Under the null model, $v_i$, the average of the null posterior probability over the rejection region at the $i$-th endpoint, is bounded above by
\[
v_i = \frac{1}{\alpha}\int_{x_i : f_0 < k_i(\alpha) f_1} \frac{1}{1 + f_1/f_0}\, f_0\, dx_i \;<\; \frac{k_i(\alpha)}{1 + k_i(\alpha)}\,\frac{1}{\alpha}\int_{x : f_0 < k_i(\alpha) f_1} f_0\, dx_i \;=\; \frac{k_i(\alpha)}{1 + k_i(\alpha)}\,.
\]

Note that $v_i > 0$ and the upper bound above is $k_i(\alpha)/(1+k_i(\alpha))$; thus, if $\lim_{\alpha\to 0} k(\alpha) = 0$, these bounds force $v_i(\alpha)$ to 0, i.e. $\lim_{\alpha\to 0} v_i(\alpha) = 0$. To prove $\lim_{\alpha\to 0} k(\alpha) = 0$, let

\[
G(c) = \int_{x_i : f_0 < c f_1} f_0\, dx_i\,,
\]
which has several properties by definition: (1) $G(0) = 0$; (2) $G$ is monotone increasing; (3) $\alpha = G(k(\alpha))$.

Since
\[
\lim_{\alpha\to 0} \alpha = \lim_{\alpha\to 0} G(k(\alpha)) = 0
\]
and, by assumption, $G(c)$ can be zero only when $c = 0$, it follows that $\lim_{\alpha\to 0} k(\alpha) = 0$.

The Gaussian distributions of Example 5.3.4 satisfy the assumption of Theorem 5.4.3 (consistent with $v_i \to 0$ there), whereas for the exponential distributions of Section 5.3.1 the likelihood ratio is bounded below by $\lambda_0/\lambda_1$, so the theorem does not apply and $v_i$ need not vanish.

Bibliography

Abramovich, F. and Angelini, C. (2006), “Bayesian maximum a posteriori multiple testing procedure,” Sankhyā: The Indian Journal of Statistics, pp. 436–460.

Abramovich, F., Benjamini, Y., Donoho, D. L., Johnstone, I. M., et al. (2006), “Adapting to unknown sparsity by controlling the false discovery rate,” The Annals of Statistics, 34, 584–653.

Alosh, M., Bretz, F., and Huque, M. (2014), “Advanced multiplicity adjustment methods in clinical trials,” Statistics in medicine, 33, 693–713.

Banks, D. et al. (2011), “Reproducible research: A range of response,” Statistics, Politics, and Policy, 2, 4.

Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 289–300.

Berger, J. O. (1985), Statistical Decision Theory and Bayesian Analysis, Springer Science & Business Media.

Berger, J. O., Brown, L. D., and Wolpert, R. L. (1994), “A unified conditional frequentist and Bayesian test for fixed and sequential simple hypothesis testing,” The Annals of Statistics, pp. 1787–1807.

Berger, J. O., Wang, X., and Shen, L. (2014), “A Bayesian approach to subgroup identifi- cation,” Journal of biopharmaceutical statistics, 24, 110–129.

Bogdan, M., Ghosh, J. K., Ochman, A., and Tokdar, S. T. (2007), “On the Empirical Bayes approach to the problem of multiple testing,” Quality and Reliability Engineering International, 23, 727–739.

Bogdan, M., Ghosh, J. K., Tokdar, S. T., et al. (2008a), “A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing,” in Beyond parametrics in interdisciplinary research: Festschrift in honor of Professor Pranab K. Sen, pp. 211–230, Institute of Mathematical Statistics.

Bogdan, M., Frommlet, F., Biecek, P., Cheng, R., Ghosh, J. K., and Doerge, R. (2008b), “Extending the Modified Bayesian Information Criterion (mBIC) to dense markers and multiple interval mapping,” Biometrics, 64, 1162–1169.

99 Bogdan, M., Chakrabarti, A., Frommlet, F., and Ghosh, J. K. (2011), “Asymptotic Bayes- optimality under sparsity of some multiple testing procedures,” The Annals of Statistics, pp. 1551–1579.

Bretz, F., Maurer, W., Brannath, W., and Posch, M. (2009), “A graphical approach to sequentially rejective multiple test procedures,” Statistics in medicine, 28, 586.

Cai, T., Liu, W., and Xia, Y. (2013), “Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings,” Journal of the American Statistical Association, 108, 265–277.

Cai, T. T. and Xia, Y. (2014), “High-dimensional sparse MANOVA,” Journal of Multi- variate Analysis, 131, 174–196.

Casella, G. and Berger, R. L. (1987), “Reconciling Bayesian and frequentist evidence in the one-sided testing problem,” Journal of the American Statistical Association, 82, 106–111.

Chipman, H., George, E. I., McCulloch, R. E., Clyde, M., Foster, D. P., and Stine, R. A. (2001), “The practical implementation of Bayesian model selection,” Lecture Notes–Monograph Series, pp. 65–134.

Clyde, M. A., Ghosh, J., and Littman, M. L. (2011), “Bayesian adaptive sampling for variable selection and model averaging,” Journal of Computational and Graphical Statistics, 20.

Cui, W. and George, E. I. (2008), “Empirical Bayes vs. fully Bayes variable selection,” Journal of Statistical Planning and Inference, 138, 888–900.

Dass, S. C. and Berger, J. O. (2003), “Unified conditional frequentist and Bayesian testing of composite hypotheses,” Scandinavian Journal of Statistics, 30, 193–210.

Datta, J., Ghosh, J. K., et al. (2013), “Asymptotic properties of Bayes risk for the horseshoe prior,” Bayesian Analysis, 8, 111–132.

Dmitrienko, A., Tamhane, A. C., and Bretz, F. (2009), Multiple testing problems in pharmaceutical statistics, CRC Press.

Dudoit, S. and Van Der Laan, M. J. (2007), Multiple testing procedures with applications to genomics, Springer Science & Business Media.

Durrett, R. (2010), Probability: Theory and Examples, vol. 3, Cambridge University Press.

Dutta, R., Bogdan, M., and Ghosh, J. K. (2015), “Model Selection and Multiple Testing- A Bayesian and Empirical Bayes Overview and some New Results,” arXiv preprint arXiv:1510.00547.

Efron, B. (2004), “Large-scale simultaneous hypothesis testing,” Journal of the American Statistical Association, 99.

Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001), “Empirical Bayes analysis of a microarray experiment,” Journal of the American Statistical Association, 96, 1151–1160.

Ferguson, T. S. (2003), “U-statistics,” Notes for Statistics.

Ge, Y., Dudoit, S., and Speed, T. P. (2003), “Resampling-based multiple testing for microarray data analysis,” Test, 12, 1–77.

Genovese, C. R., Lazar, N. A., and Nichols, T. (2002), “Thresholding of statistical maps in functional neuroimaging using the false discovery rate,” Neuroimage, 15, 870–878.

George, E. I. and McCulloch, R. E. (1993), “Variable selection via Gibbs sampling,” Journal of the American Statistical Association, 88, 881–889.

Gilbert, P. B., Berger, J. O., Stablein, D., Becker, S., Essex, M., Hammer, S. M., Kim, J. H., and DeGruttola, V. G. (2011), “Statistical interpretation of the RV144 HIV vaccine efficacy trial in Thailand: a case study for statistical issues in efficacy trials,” Journal of Infectious Diseases, 203, 969–975.

Goeman, J. J. and Solari, A. (2014), “Multiple hypothesis testing in genomics,” Statistics in Medicine, 33, 1946–1978.

Guindani, M., Müller, P., and Zhang, S. (2009), “A Bayesian discovery procedure,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 905–925.

Guo, M. and Heitjan, D. F. (2010), “Multiplicity-calibrated Bayesian hypothesis tests,” Biostatistics, 11, 473–483.

Heller, R., Yekutieli, D., et al. (2014), “Replicability analysis for genome-wide association studies,” The Annals of Applied Statistics, 8, 481–498.

Holm, S. (1979), “A simple sequentially rejective multiple test procedure,” Scandinavian Journal of Statistics, 6, 65–70.

Huque, M. F. and Alosh, M. (2008), “A flexible fixed-sequence testing method for hierarchically ordered correlated multiple endpoints in clinical trials,” Journal of Statistical Planning and Inference, 138, 321–335.

Ioannidis, J. P. (2005), “Why most published research findings are false,” Chance, 18, 40–47.

Karr, A. F., Sanil, A. P., and Banks, D. L. (2006), “Data quality: A statistical perspective,” Statistical Methodology, 3, 137–173.

Malloy, M. and Nowak, R. (2011), “Sequential analysis in high-dimensional multiple testing and sparse recovery,” in 2011 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 2661–2665, IEEE.

Maurer, W., Hothorn, L., and Lehmacher, W. (1995), “Multiple comparisons in drug clinical trials and preclinical assays: a-priori ordered hypotheses,” Biometrie in der chemisch-pharmazeutischen Industrie, 6, 3–18.

Miller, C. J., Genovese, C., Nichol, R. C., Wasserman, L., Connolly, A., Reichart, D., Hopkins, A., Schneider, J., and Moore, A. (2001), “Controlling the false-discovery rate in astrophysical data analysis,” The Astronomical Journal, 122, 3492.

Muller, P., Parmigiani, G., and Rice, K. (2006), “FDR and Bayesian multiple comparisons rules.”

Noble, W. S. (2009), “How does multiple testing correction work?” Nature Biotechnology, 27, 1135–1137.

O’Brien, P. C. and Fleming, T. R. (1979), “A multiple testing procedure for clinical trials,” Biometrics, pp. 549–556.

Scott, J. G. (2009), “Nonparametric Bayesian multiple testing for longitudinal performance stratification,” The Annals of Applied Statistics, pp. 1655–1674.

Scott, J. G. and Berger, J. O. (2006), “An exploration of aspects of Bayesian multiple testing,” Journal of Statistical Planning and Inference, 136, 2144–2162.

Scott, J. G. and Berger, J. O. (2010), “Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem,” The Annals of Statistics, 38, 2587–2619.

Sellke, T., Bayarri, M., and Berger, J. O. (2001), “Calibration of p values for testing precise null hypotheses,” The American Statistician, 55, 62–71.

Siegmund, D. (1985), Sequential Analysis: Tests and Confidence Intervals, Springer Science & Business Media.

Storey, J. D. (2003), “The positive false discovery rate: A Bayesian interpretation and the q-value,” Annals of Statistics, 31, 2013–2035.

Storey, J. D. and Tibshirani, R. (2003), “Statistical significance for genomewide studies,” Proceedings of the National Academy of Sciences, 100, 9440–9445.

Streiner, D. L. (2015), “Best (but oft-forgotten) practices: the multiple problems of multiplicity – whether and how to correct for many statistical tests,” The American Journal of Clinical Nutrition, 102, 721–728.

Tony Cai, T., Liu, W., and Xia, Y. (2014), “Two-sample test of high dimensional means under dependence,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 349–372.

Wang, R., Lagakos, S. W., Ware, J. H., Hunter, D. J., and Drazen, J. M. (2007), “Statistics in medicine – reporting of subgroup analyses in clinical trials,” New England Journal of Medicine, 357, 2189–2194.

Westfall, P. H. and Krishen, A. (2001), “Optimally weighted, fixed sequence and gatekeeper multiple testing procedures,” Journal of Statistical Planning and Inference, 99, 25–40.

Westfall, P. H. and Young, S. S. (1993), Resampling-based multiple testing: Examples and methods for p-value adjustment, vol. 279, John Wiley & Sons.

White, J. T. and Ghosal, S. (2014), “Multiple testing approaches for removing background noise from images,” in Topics in Nonparametric Statistics, pp. 95–104, Springer.

Whitehead, J. (1997), The design and analysis of sequential clinical trials, John Wiley & Sons.

Wiens, B. L. (2003), “A fixed sequence Bonferroni procedure for testing multiple endpoints,” Pharmaceutical Statistics, 2, 211–215.

Biography

Shih-Han Chang was born on August 31, 1987, in Taipei, Taiwan. He married Dr. Chien-Kuang Ding in 2015. Chang received his B.Sc. in Mathematics from National Taiwan University in 2010, his M.A. in Mathematics from Duke University in 2013, and his Ph.D. in 2015 under the supervision of Professor James Berger. Under his advisor's mentorship, he developed Bayesian procedures for the false positive probability of mutually exclusive hypotheses under data dependence and investigated the asymptotic behavior of Bayesian multiple testing and sequential endpoint tests.

During summers in graduate school, Chang worked at Facebook (Menlo Park, CA), Goldman Sachs (London, UK), and Verisk Analytics (San Francisco, CA), in roles including Data Scientist and Risk Management Associate. He is interested in machine learning and is motivated to pursue a career as a data scientist. He was on the Dean's List with Honors at National Taiwan University and was awarded the Study Abroad Scholarship from the Ministry of Education in Taiwan. During his graduate studies at Duke, he represented the university at the 2015 National College Table Tennis Championships and participated in the 2015 Triangle Healthcare Challenge, where his team won both the Grand Prize and the Validic mHealth prize.
