
arXiv:2011.01343v1 [math.ST] 2 Nov 2020

p-VALUE PEEKING AND ESTIMATING EXTREMA

Akshay Balsubramani
Stanford University
[email protected]

ABSTRACT

A pervasive issue in statistical hypothesis testing is that reported p-values are biased downward by "peeking" at data – the practice of reporting only progressively extreme values of the test statistic as more data samples are collected. We develop principled mechanisms to estimate such running extrema of test statistics, which directly address the effect of peeking in some general scenarios.

1 THE PROBLEM OF PEEKING

Consider a scientist trying to test a hypothesis on some huge population of samples X_1, . . . , X_n. A test statistic f is estimated by drawing a random sample of the data (say X_1, . . . , X_t) and using it to compute the conditional expectation E[f(X_1, . . . , X_n) | X_1, . . . , X_t] at any time t. Assuming a null model for the data, a p-value P_t is calculated each time. A pragmatic practitioner with ample computing resources is primarily limited by the availability of data, so while repeatedly testing all data gathered so far, it is common to "peek" at the reported p-value: gathering more samples until one is low enough to be significant (say at time τ), and reporting the extremal value min{P_1, . . . , P_τ}, resulting in the reported p-value having a downward bias. This holds irrespective of the details of the peeking procedure.

Peeking is a form of p-value hacking that is widespread in empirical science for appealing reasons. It has long been argued that the statistician's opinion should not influence the degree of evidence against the null – "the rules governing when data collection stops are irrelevant to data interpretation" (Edwards et al., 1963) – and that collecting more data, and hence evidence, should always help, not invalidate, previous results. However, standard p-value analyses "depend on the intentions of the investigator" (Nickerson, 2000). It can be proven that for many common tests, repeating the test long enough will lead the scientist to a "foregone conclusion" (Anscombe, 1954) – collecting more data after an apparently significant test result can be costly, and of seemingly questionable benefit. The lamentable conclusion is that peeking makes it much more likely to falsely report significance under the null hypothesis.

The basic vulnerability of many statistical tests to peeking is that they measure evidence only at the fixed time a test is computed, which is easily distorted by peeking. This problem has been addressed by existing theory on the subject. A line of work by Vovk and coauthors (Vovk, 1993; Shafer et al., 2011; Vovk and Wang, 2019) develops the idea of correcting p-values uniformly over time using a "test martingale," and contains further historical references on this idea. Viewed within the context of Bayes factors and likelihood ratios, this has also drawn more recent attention for its robustness to stopping (Grünwald, 2018; Grünwald et al., 2019). Such work is based on a martingale-based framework for analyzing functions of a test statistic under the null. The corrected p-value is valid for all times, not just the time it is computed, so it allows rejection of the null at a significance level δ irrespective of the peeker's choice of stopping rule.

We build on this to introduce a family of peeking-robust sequential hypothesis tests in Sec. 3 and 4, in such scenarios as described in Section 2. We develop sequential mechanisms for estimating extremal values of test statistics; in a certain sense, this allows us to peer into the future, giving a null model for the future results of peeking. These diagnostics use past information to track the risk of future peeking quantitatively, and may be of independent interest (e.g., the general random walk decomposition of Theorem 12). Section 5 discusses them at length in the context of several previous lines of work. Most proofs are deferred to the appendix.

2 SETUP: ALWAYS VALID p-VALUES

Recalling our introductory discussion, a common testing scenario involving a statistic f tests a sample using the conditional mean over the sample: N_t := E[f(X_1, . . . , X_n) | X_{1:t}]. The stochastic process N_t is a martingale because ∀t: E[(N_t − N_{t−1}) | X_{1:(t−1)}] = 0 (Durrett, 2010). Similarly, a supermartingale has differences with conditional mean ≤ 0. A more general and formal definition conditions on the canonical filtration F (see Appendix A).

A p-value is a random variable P produced by a statistical test such that under the null, Pr(P ≤ s) ≤ s for all s > 0. We will discuss this in terms of stochastic dominance of random variables.

Definition 1. A real-valued random variable X (first-order) stochastically dominates another real r.v. Y (written X ⪰ Y) if either of the following equivalent statements is true (Rockafellar and Royset, 2014): (a) For all c ∈ R, Pr(X ≥ c) ≥ Pr(Y ≥ c). (b) For any nondecreasing function h, E[h(X)] ≥ E[h(Y)]. Similarly, define X ⪯ Y if −X ⪰ −Y. If X ⪰ Y and X ⪯ Y, then X =_d Y.

In these terms, a p-value P satisfies P ⪰ U, with U a Uniform([0, 1]) random variable. This can be described as the quantile function of the test's statistic under the null hypothesis.

The peeker can choose any random time τ without foreknowledge, to report the value they see as final – they choose a stopping time τ (see Appendix A for formal definitions) instead of pre-specifying a fixed time t. So a peeking-robust p-value H_t requires that for all stopping times τ, H_τ ⪰ U. As τ could be any fixed time, this condition is more strict than the condition on P for a fixed t. H is an inflated process that compensates for the downward bias of peeking.

How is the stochastic process H defined? There is one common recipe: define H_t = 1/M_t, using a nonnegative discrete-time (super)martingale M_t with M_0 = 1. This guarantees H is a robust p-value process, i.e. H_τ ⪰ U for stopping times τ.
(The reason why is briefly stated here: the expectation E[M_τ] is controlled at any stopping time τ by the supermartingale optional stopping theorem (Theorem 0), so E[M_τ] ≤ E[M_0] = 1. Therefore, using Markov's inequality on M_τ, we have Pr(H_τ ≤ s) = Pr(M_τ ≥ 1/s) ≤ s.)

Such a "test [super]martingale" M_t turns out to be ubiquitous in studying sequential inference procedures (Shafer et al., 2011; Vovk and Wang, 2019), and is effectively necessary for such inference (Ramdas et al., 2020); see Appendix B. Therefore, our analysis focuses on a nonnegative discrete-time supermartingale M_t with M_0 = 1. We also use the cumulative maximum S_t := max_{s≤t} M_s and the lookahead maximum S_{≥t} := max_{s≥t} M_s.
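This recipe can be illustrated with a minimal simulation, not from the paper: a Bernoulli likelihood-ratio test martingale (the alternative parameter q = 0.6, the horizon, and all other numerical settings are illustrative assumptions). Even an adversarial peeker who reports the smallest H_t = 1/M_t they ever see cannot exceed the level-α false-rejection guarantee under the null:

```python
import random

def lr_martingale_path(T, q=0.6):
    # Likelihood-ratio test martingale under a Bernoulli(0.5) null:
    # M_t = prod_i [ q^{x_i} (1 - q)^{1 - x_i} / 0.5 ] for a fixed alternative q.
    # Under the null each factor has mean 1, so M_t is a nonnegative
    # martingale with M_0 = 1.
    M, path = 1.0, []
    for _ in range(T):
        x = random.random() < 0.5            # data generated under the null
        M *= (q if x else 1.0 - q) / 0.5
        path.append(M)
    return path

def peeked_p_value(path):
    # An adversarial peeker reports the smallest H_t = 1/M_t seen so far,
    # i.e. 1 / max_t M_t.
    return 1.0 / max(path)

random.seed(0)
trials, alpha = 20000, 0.05
rate = sum(peeked_p_value(lr_martingale_path(200)) <= alpha
           for _ in range(trials)) / trials
# H_tau dominating Uniform([0,1]) implies Pr(min_t H_t <= alpha) <= alpha.
print(rate)
```

The guarantee holds for any stopping rule, so taking the minimum over the whole path is the worst case a peeker can achieve.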

3 WARM-UP: HAS THE ULTIMATE MAXIMUM BEEN ATTAINED?

In the peeking scenario, it suffices to consider times until τ_F := max{s ≥ 0 : M_s = S_s}, the time of the final attained maximum, because no peeker can report a greater value than they see at this time. However, τ_F is not a stopping time because it involves occurrences in the future, so traditional martingale methods do not study it.

Studying τ_F is a useful introduction to the main results of this paper. We describe τ_F by establishing a "multiplicative representation" of a nonnegative discrete-time (super)martingale M_t (with M_0 = 1) in terms of its maxima.

Theorem 2 (Bounding future extrema with the present). Define the supermartingale Z_t := Pr(τ_F ≥ t | F_t). Then with U a standard Uniform([0, 1]) random variable:

(a) S_{≥t} ⪯ M_t/U. Therefore, S_∞ ⪯ 1/U, and ∀t such that M_t > 0, S_∞ ⪯ S_t max(1, M_t/(S_t U)).

(b) Z_t ≤ M_t/S_t, with equality if M is a martingale.

(c) Define Q_t := Σ_{i=1}^t (M_i − M_{i−1})/S_i and L_t := Σ_{q=1}^t M_{q−1} (1/S_{q−1} − 1/S_q). Then the decomposition Z_t ≤ 1 + Q_t − L_t holds, with equality for martingale M. Furthermore:
• Q is a (super)martingale if M is.
• L is a nondecreasing process which only changes when M hits a new maximum.

Z is called the Azéma supermartingale of M (Azéma, 1973). Note that M_{t−1} ≤ S_{t−1}, so that

L_t ≤ Σ_{q=1}^t S_{q−1} (1/S_{q−1} − 1/S_q) = Σ_{q=1}^t (1 − S_{q−1}/S_q) ≤ Σ_{q=1}^t log(S_q/S_{q−1}) = log S_t   (1)

where we use the inequality 1 − 1/x ≤ log x for positive x. This can be quite tight (L_t ≈ log S_t) when the steps are small relative to S_{t−1}, so that M_{t−1} is not much lower than S_{t−1} at the times L_t changes. This decomposition is intimately connected with log S_t, as we will see that the martingale Q_t is effectively equal to E[log S_∞ | F_t] − 1 (Theorem 6).

Notably, M_t/S_t can be calculated pathwise, so a natural question is if it can be used as a peeking-robust statistic, i.e. if we can reason about its peeked version

R_t := min_{s≤t} M_s/S_s ≥ min_{s≤t} Z_s

which is a nonincreasing process. The following result shows that R_t can be considered a valid p-value at any time horizon.

Theorem 3 (An alternative p-value). With U denoting a standard Uniform([0, 1]) random variable,

(a) For any stopping time τ ≤ τ_F, R_τ ⪰ U.

(b) Define ρ_F := max{t ≤ τ_F : Z_t = min_{u≤τ_F} Z_u}. Then R_t ≥ Pr(ρ_F > t | F_t).
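The pathwise decomposition of Theorem 2(c) and the bound (1) can be checked numerically. The sketch below is illustrative only: the multiplicative martingale with fair steps in {0.5, 1.5} is an assumed toy example, not from the paper. It verifies the identity M_t/S_t = 1 + Q_t − L_t and the inequality L_t ≤ log S_t along a simulated path:

```python
import math
import random

def multiplicative_martingale(T, rng):
    # Toy nonnegative martingale: M_0 = 1 and M_t = M_{t-1} * xi_t,
    # with xi_t uniform on {0.5, 1.5}, so E[xi_t] = 1.
    M = [1.0]
    for _ in range(T):
        M.append(M[-1] * rng.choice([0.5, 1.5]))
    return M

def decomposition(M):
    # Running maximum S_t, and the Theorem 2(c) processes:
    # Q_t = sum_i (M_i - M_{i-1}) / S_i,
    # L_t = sum_q M_{q-1} (1/S_{q-1} - 1/S_q).
    S, Q, L = M[0], 0.0, 0.0
    for t in range(1, len(M)):
        S_prev, S = S, max(S, M[t])
        Q += (M[t] - M[t - 1]) / S
        L += M[t - 1] * (1.0 / S_prev - 1.0 / S)
    return S, Q, L

rng = random.Random(1)
M = multiplicative_martingale(500, rng)
S, Q, L = decomposition(M)
assert abs(M[-1] / S - (1 + Q - L)) < 1e-9   # M_t/S_t = 1 + Q_t - L_t
assert L <= math.log(S) + 1e-9               # inequality (1): L_t <= log S_t
print(Q, L, math.log(S))
```

Since M is an exact martingale here, the decomposition holds with equality, as the theorem states.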

4 ESTIMATING EXTREMA OF MARTINGALES

For fixed sample sizes, any statistic T with null distribution µ can be computed from its p-value by applying the statistic's inverse complementary CDF µ̄^{−1} to the p-value P. In this way, we can think of any distribution µ in terms of a nondecreasing function g(x) := µ̄^{−1}(1/x) for x ≥ 1, so that g(1/P) corresponds to the statistic T. In this prototypical case, T ∼ µ. Similarly, given a martingale M associated with a robust p-value process H, the equivalent statistic g(M) = g(1/H) is dominated by µ.

Assume M is a martingale and suppose we test a statistic g^µ(M_t) with a process A_t. The obvious choice A_t = g^µ(M_t) is prone to peeking. We instead inoculate A against future peeking by maximizing over the entire trajectory of A, and using that as a test statistic. We directly estimate the extreme value max_t g(M_t) = g(S_∞) – a quantity robust to peeking – with the process (martingale) E[g(S_∞) | F_t].¹

This quantity has a natural motivation, but it depends on the future through S_∞, and confounds attempts at estimation with fixed-sample techniques. Nevertheless, we show how to efficiently compute this as a stochastic process (Theorem 4), and prove that its null distribution is µ under a "good" stopping rule (Theorem 9). This characterization leads to results which are more generally novel (Section 4.4).

We also study the interplay between the statistic E[g(S_∞) | F_t] and its own "peeked" cumulative maximum max_t E[g(S_∞) | F_t], characterizing it in terms of µ (Theorem 9, Theorem 10).

4.1 ESTIMATING THE RUNNING EXTREMUM

We can use the distributional characterization of Theorem 2 to provide insight into the statistic E[g(S_∞) | F_t] and ways to compute it.

¹ If the peeker can be assumed to have a limited waiting period of T samples, S_∞ can be replaced by S_T in this analysis.

Theorem 4. For any nondecreasing function g, denote G(s) := ∫_0^1 g(s/u) du and its derivative G′(s) := dG(s)/ds = ∫_s^∞ (dx/x²) (g(x) − g(s)). Then G is continuous, concave, and nondecreasing. Also:

(a) E[g(S_∞) | F_t] ≤ Y_t := (1 − M_t/S_t) g(S_t) + M_t ∫_{S_t}^∞ (g(x)/x²) dx
(b)     = (1 − M_t/S_t) g(S_t) + (M_t/S_t) G(S_t)
(c)     = G(S_t) − (S_t − M_t) G′(S_t)
(d)     = g(S_t) + M_t G′(S_t)

with equality when M is a martingale. Furthermore, g ≥ 0 ⇒ Y ≥ 0.

Theorem 4(a) shows exactly which choices of g are appropriate, as E[g(S_∞)] can only be bounded if g(x)/x² is integrable away from zero. This paper assumes this hereafter:

Assumption 1. g(x)/x² has a finite integral on any closed interval away from zero.

Theorem 4 characterizes the test statistic Y, the Azéma-Yor (AY) process of M with respect to g (Azéma and Yor, 1979). Thm. 4(b) can be interpreted as an expectation over two outcomes, using Theorem 2(b). With probability 1 − M_t/S_t, the cumulative maximum is not exceeded in the future (τ_F ≤ t), so g(S_∞) = g(S_t). Alternatively, with probability M_t/S_t, the cumulative maximum is exceeded in the future (τ_F > t), and the conditional expectation of g(S_∞) in this case is G(S_t), using Theorem 2 to get a precise idea of the lookahead maximum from the present.

The AY process Y, constructed by Theorem 4 using any g, has some remarkable properties that further motivate its use.

Lemma 5 (Properties of AY processes). Define the Bregman divergence D_F(a, b) := F(a) − F(b) − (a − b)F′(b) ≥ 0 for any convex function F. Any AY process Y defined as in Theorem 4 is a supermartingale. The following relations hold pathwise for all t:

a) Y_t ≥ g(S_t)
b) Y_t − Y_{t−1} = (M_t − M_{t−1}) G′(S_{t−1}) − D_{−G}(S_t, S_{t−1})
c) max_{s≤t} Y_s = G(S_t) ≥ Y_t ≥ G(M_t)
d) For any stochastic process A, if A_u ≥ G(M_u) for all u, then max_{s≤t} A_s ≥ max_{s≤t} Y_s.
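For the concrete choice g(x) = log x, the forms (b)-(d) of Theorem 4 all have closed forms, since then G(s) = log(s) + 1 and G′(s) = 1/s. The sketch below (a toy martingale assumed for illustration; not from the paper) confirms that the three forms agree pathwise:

```python
import math
import random

def ay_forms_check(M):
    # Azéma-Yor process of M for g(x) = log(x): per Theorem 4,
    # G(s) = log(s) + 1 and G'(s) = 1/s, so forms (b), (c), (d) must agree.
    S = 1.0
    for Mt in M:
        S = max(S, Mt)
        g, G, Gp = math.log(S), math.log(S) + 1.0, 1.0 / S
        form_b = (1.0 - Mt / S) * g + (Mt / S) * G
        form_c = G - (S - Mt) * Gp
        form_d = g + Mt * Gp
        assert abs(form_b - form_c) < 1e-9 and abs(form_c - form_d) < 1e-9
    return form_d                       # Y_t = log(S_t) + M_t/S_t for this g

rng = random.Random(2)
M = [1.0]
for _ in range(300):
    M.append(M[-1] * rng.choice([0.5, 1.5]))   # toy nonnegative martingale
Y_T = ay_forms_check(M)
print(Y_T)
```

For this g, the AY process simplifies to Y_t = log(S_t) + M_t/S_t, which is the quantity appearing in Theorem 6 below.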

4.2 CONSEQUENCES AND EXAMPLES

Theorem 6. Define Q_t := Σ_{i=1}^t (M_i − M_{i−1})/S_i as in Theorem 2(c). Then Q_t ≤ E[log(S_∞) | F_t] − 1. Here the inequality is as tight as that in (1).

Proof. First, note that 1 + Q_t = Z_t + L_t from Theorem 2(c), since Z_t = M_t/S_t for martingale M. Using Theorem 4(c) with g(x) = log(x), we have G(x) = log(x) + 1 and G′(x) = 1/x, so

E[log(S_∞) | F_t] = log(S_t) + 1 − (S_t − M_t)/S_t = log(S_t) + M_t/S_t ≥ L_t + Z_t = 1 + Q_t

where the inequality uses (1).

Theorem 4(d) implies a simple formula for the mean of the ultimate maximum E[g(S_∞)].

Corollary 7. With g and G defined as in Theorem 4, E[g(S_∞)] = g(1) + G′(1).
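Corollary 7 can be sanity-checked by Monte Carlo. With g = log it predicts E[log S_∞] = g(1) + G′(1) = 0 + 1 = 1. The toy martingale below (step sizes, absorption threshold, and sample counts are all assumed for illustration) takes discrete steps, so it overshoots each level it crosses and its crossing probabilities fall short of the continuous-path bound; the estimate therefore lands just below 1:

```python
import math
import random

def log_ultimate_max(rng, cap=20000):
    # Run a toy martingale (fair multiplicative steps in {0.8, 1.2}) until it
    # is effectively absorbed near 0, recording log of its ultimate maximum.
    M, S = 1.0, 1.0
    for _ in range(cap):
        M *= rng.choice([0.8, 1.2])
        if M > S:
            S = M
        if M < 1e-8:
            break
    return math.log(S)

rng = random.Random(3)
n = 3000
est = sum(log_ultimate_max(rng) for _ in range(n)) / n
# Corollary 7 with g = log predicts E[log S_inf] = 1; the discrete-step
# estimate sits a little below that due to overshoot.
print(est)
```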

4.3 BOUNDING THE NULL DISTRIBUTION

Next, we characterize the null distribution of the test statistic process Y .

Our stated motivation for g in Sec. 4.1 involves a distribution µ, which plays the role of the null in the fixed-sample case. We proceed to specify a stopping time τ^µ such that the stopped test statistic satisfies the same null guarantee as the fixed-sample one: Y_{τ^µ} ⪯ µ. Our development depends on some properties of µ.

Definition 8. A real-valued distribution µ has a complementary CDF µ̄(x) := Pr_{X∼µ}(X ≥ x), a tail quantile function µ̄^{−1}(ξ) := min{x : µ̄(x) < ξ}, and barycenter function ψ_µ(x) = E_µ[X | X ≥ x]. Its superquantile function is SQ^µ(ξ) := ψ_µ(µ̄^{−1}(ξ)) = (1/ξ) ∫_0^ξ µ̄^{−1}(λ) dλ, and its Hardy-Littlewood transform is the distribution µ^{HL} := SQ^µ(U) for a [0, 1]-uniform random variable U (Carraro et al., 2012; Rockafellar and Royset, 2014). µ is associated with a nondecreasing function g^µ(x) = µ̄^{−1}(1/x) with corresponding future loss potential G^µ(x) := ∫_0^1 g^µ(x/u) du = SQ^µ(1/x).

(Hereafter, superscripts of µ will be omitted when clear from context.) The characterization provided by Theorem 4 precisely describes the mediating function g's effect on the distribution of the given null process Y, fully specifying its distribution.

Theorem 9. Fix a µ and define τ^µ := min{t : g(S_t) ≥ Y_t}. Then max_{s≤τ^µ} Y_s ⪯ µ^{HL}, and Y_{τ^µ} ⪯ µ.

Theorem 10 [see also Gilat and Meilijson (1988)]. For any distribution µ, nonnegative martingale A, and stopping time τ, if A_τ ⪯ µ, then max_{s≤τ} A_s ⪯ µ^{HL}.
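Definition 8 becomes concrete for µ = Exponential(1), where every object has a standard closed form (these closed forms are textbook facts, not results of the paper; the Monte Carlo checks and constants are illustrative assumptions):

```python
import math
import random

# Worked example of Definition 8 for mu = Exponential(1):
#   complementary CDF:  mu_bar(x) = exp(-x)
#   tail quantile:      mu_bar_inv(xi) = -log(xi)
#   barycenter:         psi(x) = E[X | X >= x] = x + 1
#   superquantile:      SQ(xi) = psi(mu_bar_inv(xi)) = 1 - log(xi)
#   g(x) = mu_bar_inv(1/x) = log(x),  G(x) = SQ(1/x) = 1 + log(x)
def mu_bar_inv(xi):
    return -math.log(xi)

def SQ(xi):
    return 1.0 - math.log(xi)

rng = random.Random(4)
xi = 0.25
q = mu_bar_inv(xi)                          # upper-tail quantile: mu_bar(q) = xi
samples = [rng.expovariate(1.0) for _ in range(200000)]
tail = [x for x in samples if x >= q]
# SQ(xi) is the mean of mu above its tail quantile:
assert abs(sum(tail) / len(tail) - SQ(xi)) < 0.05

# Hardy-Littlewood transform: mu_HL = SQ(U) for U uniform on (0, 1];
# here SQ(U) = 1 - log(U), with mean 1 + E[-log(U)] = 2.
hl_mean = sum(SQ(1.0 - rng.random()) for _ in range(200000)) / 200000
assert abs(hl_mean - 2.0) < 0.05
print(sum(tail) / len(tail), hl_mean)
```

Note that for this µ the associated g and G are exactly the g = log and G = 1 + log used in Theorem 6, so the exponential null is the natural companion to the logarithmic AY process.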

4.4 UNIVERSALITY

Having derived the AY process for any nonnegative supermartingale M, we have introduced a number of perspectives on its favorable properties and usefulness as a test statistic (Thm. 4, Lemma 5). This section casts those earlier developments more powerfully, with a converse result: any stochastic process can be viewed as an AY-like process. We know this to be only a loose solution because Y is a strict supermartingale even when M is a martingale (by Lemma 5). Instead, a recentered version of this process is appropriate, satisfying two important difference equations pathwise.

Lemma 11. Given any process M and continuous concave nondecreasing nonnegative G, there is an a.s. unique process B with B_0 = G(1) such that for all t,

B_t − B_{t−1} = (M_t − M_{t−1}) G′(S_{t−1})   and   max_{s≤t} B_s − B_t = (S_t − M_t) G′(S_t)   (2)

Due to (2), if M is a nonnegative (super)martingale respectively, so is B. For t ≥ 0, B_t is defined by

B_t := G(S_t) − (S_t − M_t) G′(S_t) + Σ_{s=1}^t D_{−G}(S_s, S_{s−1})   (3)

Lemma 11 says that B, a bias-corrected version of Y (w.r.t. M), is a "damped" version of M with variation modulated by the positive nonincreasing function G′(S_{t−1}). This result couples the entire evolutions of M and B, so after fixing initial conditions we can derive a unique decomposition of any process B in terms of a martingale M and its cumulative maximum S.

Theorem 12 (Martingale-max (MM) Decomposition). Fix any continuous, concave, strictly increasing, nonnegative G. Any process B with B_0 = G(1) can be uniquely (a.s.) decomposed in terms of a "variation process" M and its running maximum S, such that M_0 = S_0 = 1 and (2) holds. The processes M_t and S_t are defined for any t ≥ 1 inductively by

M_t = 1 + Σ_{s=1}^t (B_s − B_{s−1}) / G′(S_{s−1}),   S_t = max_{s≤t} M_s   (4)

If B is a (super)martingale respectively, so is M.

This depends on an attenuation function G, decomposing the input B into a variation process M and its cumulative maximum S, which (as a nondecreasing process) functions as an "intrinsic time" quantity. Thm. 12 vastly expands the scope of these analytical tools for AY processes, making them applicable to stochastic processes more generally and readily allowing manipulation of cumulative maxima.
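Equation (4) is directly implementable. The sketch below (a toy martingale and the choice G(s) = log(s) + 1 are assumed for illustration) builds B from a known M via the first difference equation of (2), then verifies that the MM decomposition recovers M exactly:

```python
import random

def mm_decompose(B, Gp):
    # Theorem 12, equation (4): reconstruct the variation process M and its
    # running maximum S via M_t = 1 + sum_s (B_s - B_{s-1}) / G'(S_{s-1}),
    # where S is the running maximum of the reconstructed M itself.
    M, S = [1.0], [1.0]
    for t in range(1, len(B)):
        M.append(M[-1] + (B[t] - B[t - 1]) / Gp(S[-1]))
        S.append(max(S[-1], M[-1]))
    return M, S

Gp = lambda s: 1.0 / s              # G(s) = log(s) + 1, so G'(s) = 1/s
rng = random.Random(5)
M = [1.0]
for _ in range(400):
    M.append(M[-1] * rng.choice([0.5, 1.5]))   # toy nonnegative martingale

# Build B from M using the first difference equation of (2), with B_0 = G(1) = 1:
B, S = [1.0], [1.0]
for t in range(1, len(M)):
    B.append(B[-1] + (M[t] - M[t - 1]) * Gp(S[-1]))
    S.append(max(S[-1], M[t]))

M_hat, S_hat = mm_decompose(B, Gp)
assert all(abs(a - b) < 1e-6 * max(1.0, abs(a)) for a, b in zip(M, M_hat))
print(M[-1], M_hat[-1])
```

The inversion works because the reconstructed running maximum coincides with the true one inductively: at each step the increment of B is divided by the same G′ value that produced it.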

4.5 MAX-PLUS DECOMPOSITIONS

We can also cast the scenario of Section 4 in terms of the quantity G(M_t). This is a supermartingale if M is (Durrett, 2010), and many supermartingales can be written in such a form. By Theorem 4, G(M_t) = E_U[g(M_t/U)] ≥ E[g(S_{≥t}) | F_t] = E[max_{s≥t} g(M_s) | F_t], where the inequality is by the stochastic dominance relation in Theorem 2. In our scenario, this can be viewed without further restrictions as a unique decomposition of G(M_t), following the continuous-time development (El Karoui and Meziou (2008), Prop. 5.8).

Theorem 13 (Max-plus (MP) Decomposition). Fix any continuous, concave, strictly increasing, nonnegative G. For any nonnegative martingale M with M_0 = 1, there is an a.s. unique process L_t such that G(M_t) = E[max_{s≥t} L_s | F_t], with equality for martingale M. This can be written as L_t := g(M_t) for the nondecreasing function g(x) := G(x) − x G′(x). Also, there is an a.s. unique supermartingale Y with Y_t ≥ G(M_t) for all t pathwise.

5 DISCUSSION

5.1 SEQUENTIAL TESTING

Treating the sample size as a random stopping time is central to the area of sequential testing. Much work in this area has focused on the likelihood-ratio martingale of a distribution f for the data under a null distribution g: M_t := Π_{i=1}^t f(X_i)/g(X_i). A prototypical example is the Sequential Probability Ratio Test (SPRT, from Wald and Wolfowitz (1948)), which is known to stop optimally quickly given particular type I and type II error constraints. The likelihood-ratio martingale has been explored for stopping in other contexts as well (Darling and Robbins, 1968; Robbins and Siegmund, 1970; Berger et al., 1997), including for composite hypotheses (Wasserman et al., 2020; Grünwald et al., 2019). These all deal with specific situations in which the martingale formulation allows for tests with anytime guarantees.

Frequentist or nonparametric perspectives on sequential testing typically contend with law-of-the-iterated-logarithm (LIL) behavior. For example, the work of Balsubramani and Ramdas (2016) presents sequential nonparametric two-sample tests in a framework related to ours. Such work requires changing the algorithm itself to be a sequential test in an appropriate setting, with a specified level of α. The setting of p-values is in some sense dual to this, as explored in recent work (Howard et al., 2018; Shin et al., 2020). Sequential testing involves specifying a type I error a priori (and sometimes also type II, e.g. for the SPRT), while what we are reporting is a minimum significance level at which the data show a deviation from the null. This is exactly analogous to the relationship between Neyman-Pearson hypothesis testing and Fisher-style significance testing – the method of this paper can be considered a robust Fisher-style significance test under martingale nulls, just as sequential testing builds on the Neyman-Pearson framework. Similarly, we do not analyze any alternative hypothesis, which would affect the power of the test (though the choice of test statistic governs the power).
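The SPRT discussed above can be sketched in a few lines. Everything numerical here is an illustrative assumption, not from the paper: the Gaussian null and alternative, and the thresholds A = 19 and B = 1/19, which are Wald's classical approximations for roughly 5% type I and type II error rates:

```python
import math
import random

def sprt(stream, f, g, A, B):
    # Sequential probability ratio test sketch: track the likelihood-ratio
    # martingale M_t = prod_i f(X_i)/g(X_i), with g the null density, and
    # stop the first time M_t exits the interval [B, A].
    M = 1.0
    for t, x in enumerate(stream, 1):
        M *= f(x) / g(x)
        if M >= A:
            return t, "reject null"
        if M <= B:
            return t, "accept null"
    return t, "undecided"

# Gaussian mean test: null N(0,1) vs. alternative N(1,1), data from the null.
density = lambda x, m: math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
rng = random.Random(7)
runs, rejections = 2000, 0
for _ in range(runs):
    stream = (rng.gauss(0.0, 1.0) for _ in range(5000))
    _, decision = sprt(stream, f=lambda x: density(x, 1.0),
                       g=lambda x: density(x, 0.0), A=19.0, B=1 / 19.0)
    rejections += decision == "reject null"
print(rejections / runs)   # type I error; Wald's bound gives <= 1/A, about 0.053
```

The type I guarantee here is exactly the optional-stopping argument of Section 2 applied to the likelihood-ratio martingale: Pr(M ever reaches A under the null) ≤ 1/A.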

5.2 TECHNICAL TOOLS

The particulars of computing H-values are direct algorithmic realizations of the proof of Balsubramani (2014), which also shows that these H-values are as tight as possible within a constant factor on the probability. The broader martingale mixture argument has been studied in detail in an inverted form, as a uniform envelope on the base martingale M (Robbins, 1952; Robbins and Siegmund, 1970).

In testing maxima, we are guided by the framework fundamentally linking the SQ(·) function and the maxima of stochastic processes. SQ has been used in much the same time-uniform context (Blackwell and Dubins (1963), Thm. 3a), and seminal continuous-time contributions showed that this can control the maximum of a continuous martingale in general settings (Dubins and Gilat, 1978; Azéma and Yor, 1979). Related work also includes the continuous (super)martingale "multiplicative representations" of Nikeghbali and Yor (2006), whose techniques we repurpose. The modern usage

crucially involves a variational characterization of SQ (Rockafellar and Uryasev, 2000) that would be an interesting avenue to future methods (Rockafellar and Royset, 2014).

Many stopping-time issues in this paper have been studied for Brownian motion, and some for martingales in continuous time under regularity conditions. Stopping Brownian motion to induce a given stopped distribution has been well studied in probability, as the Skorokhod embedding problem (Obłój, 2004). AY processes were originally proposed as a continuous solution of the Skorokhod problem (Azéma and Yor, 1979), analogous to our discrete-time results on the null distribution of our AY test statistic, for which we adapted techniques from previous work (Gilat and Meilijson, 1988; Carraro et al., 2012). The difference equation of Lemma 11 has been studied in the context of future maxima since Bachelier (1906). To our knowledge the MM decomposition is novel, though in continuous time the AY process can be inverted directly (El Karoui and Meziou, 2008).

5.3 FUTURE WORK

The importance of peeking has long been recognized in the practice of statistical testing (Robbins, 1952; Armitage et al., 1969; Nickerson, 2000; Wagenmakers, 2007; Simmons et al., 2011), mostly in a negative light. The statistician typically does not know their sampling plan in advance, which is necessary for standard hypothesis tests. The stopping rule is subject to many sources of variation: for example, it could be unethical to continue sampling when a significant effect is detected in a clinical trial (Ioannidis, 2008), or the experimenter could run out of resources to gather more data. Solutions to this problem are often semi-heuristic and generally involve "spending a budget of α," the willingness to wrongly reject the null, over time. Such methods are widely used (Peto et al., 1977; Pocock, 1977; Sagarin et al., 2014) but are not uniformly robust to sampling strategies, and their execution suffers from many application-specific complexities arising from assumptions about the possible stopping times employed by the peeker (Pocock, 2005). We hope to have presented general and useful theory to address this state of affairs. A main open problem of interest here is applying these results to design and deploy new hypothesis tests.

REFERENCES

Francis J Anscombe. Fixed-sample-size analysis of sequential observations. Biometrics, 10(1):89–100, 1954.

Peter Armitage, CK McPherson, and BC Rowe. Repeated significance tests on accumulating data. Journal of the Royal Statistical Society. Series A (General), pages 235–244, 1969.

Jacques Azéma. Théorie générale des processus et retournement du temps. Annales scientifiques de l'École Normale Supérieure, 4e série, 6(4):459–519, 1973.

Jacques Azéma and Marc Yor. Une solution simple au problème de Skorokhod. Séminaire de Probabilités XIII, pages 90–115, 1979.

Louis Bachelier. Théorie des probabilités continues. J. Math. Pures Appl., 6(II):259–327, 1906.

Akshay Balsubramani. Sharp finite-time iterated-logarithm martingale concentration. arXiv preprint arXiv:1405.2639, 2014.

Akshay Balsubramani and Aaditya Ramdas. Sequential nonparametric testing with the law of the iterated logarithm. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pages 42–51. AUAI Press, 2016.

James O Berger, Ben Boukai, and Yinping Wang. Unified frequentist and Bayesian testing of a precise hypothesis. Statistical Science, 12(3):133–160, 1997.

David Blackwell and Lester E. Dubins. A converse to the dominated convergence theorem. Illinois J. Math., 7(3):508–514, 1963. URL https://projecteuclid.org:443/euclid.ijm/1255644957.

Haydyn Brown, David Hobson, and Leonard CG Rogers. Robust hedging of barrier options. Mathematical Finance, 11(3):285–314, 2001.

Laurent Carraro, Nicole El Karoui, and Jan Obłój. On Azéma–Yor processes, their optimal properties and the Bachelier–drawdown equation. The Annals of Probability, 40(1):372–400, 2012.

DA Darling and Herbert Robbins. Some nonparametric sequential tests with power one. Proceedings of the National Academy of Sciences of the United States of America, 61(3):804, 1968.

Lester E Dubins and David Gilat. On the distribution of maxima of martingales. Proceedings of the American Mathematical Society, 68(3):337–338, 1978.

Rick Durrett. Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition, 2010. ISBN 978-0-521-76539-8. doi: 10.1017/CBO9780511779398. URL http://dx.doi.org/10.1017/CBO9780511779398.

Ward Edwards, Harold Lindman, and Leonard J Savage. Bayesian statistical inference for psychological research. Psychological Review, 70(3):193, 1963.

Nicole El Karoui and Asma Meziou. Max-plus decomposition of supermartingales and convex order. Application to American options and portfolio insurance. The Annals of Probability, 36(2):647–697, 2008.

David Gilat and Isaac Meilijson. A simple proof of a theorem of Blackwell and Dubins on the maximum of a uniformly integrable martingale. Séminaire de probabilités de Strasbourg, 22:214–216, 1988.

Peter Grünwald. Safe probability. Journal of Statistical Planning and Inference, 195:47–63, 2018.

Peter Grünwald, Rianne de Heide, and Wouter Koolen. Safe testing. arXiv preprint arXiv:1906.07801, 2019.

Steven R Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Uniform, nonparametric, non-asymptotic confidence sequences. arXiv preprint arXiv:1810.08240, 2018.

John PA Ioannidis. Why most discovered true associations are inflated. Epidemiology, 19(5):640–648, 2008.

Olav Kallenberg. Foundations of modern probability. Springer Science & Business Media, 2006.

Raymond S Nickerson. Null hypothesis significance testing: a review of an old and continuing controversy. Psychological methods, 5(2):241, 2000.

Ashkan Nikeghbali. Non-stopping times and stopping theorems. Stochastic Processes and their Applications, 117(4):457–475, 2007.

Ashkan Nikeghbali and Eckhard Platen. A reading guide for last passage times with financial appli- cations in view. Finance and Stochastics, 17(3):615–640, 2013.

Ashkan Nikeghbali and Marc Yor. A definition and some characteristic properties of pseudo- stopping times. the Annals of Probability, 33(5):1804–1824, 2005.

Ashkan Nikeghbali and Marc Yor. Doob’s maximal identity, multiplicative decompositions and enlargements of filtrations. Illinois Journal of Mathematics, 50(1-4):791–814, 2006.

Jan Obłój. The skorokhod embedding problem and its offspring. Probability Surveys, 1:321–392, 2004.

R Peto, MC Pike, Philip Armitage, Norman E Breslow, DR Cox, SV Howard, N Mantel, K McPherson, J Peto, and PG Smith. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. British Journal of Cancer, 35(1):1, 1977.

Stuart J Pocock. Group sequential methods in the design and analysis of clinical trials. Biometrika, 64(2):191–199, 1977.

Stuart J Pocock. When (not) to stop a clinical trial for benefit. Journal of the American Medical Association, 294(17):2228–2230, 2005.

Aaditya Ramdas, Johannes Ruf, Martin Larsson, and Wouter Koolen. Admissible anytime-valid sequential inference must rely on nonnegative martingales. arXiv preprint arXiv:2009.03167, 2020.

Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527–535, 1952.

Herbert Robbins and David Siegmund. Boundary crossing for the Wiener process and sample sums. Ann. Math. Statist., 41:1410–1429, 1970. ISSN 0003-4851.

R Tyrrell Rockafellar and Johannes O Royset. Random variables, monotone relations, and convex analysis. Mathematical Programming, 148(1-2):297–331, 2014.

R Tyrrell Rockafellar and Stanislav Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.

Brad J Sagarin, James K Ambler, and Ellen M Lee. An ethical approach to peeking at data. Perspectives on Psychological Science, 9(3):293–304, 2014.

Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, Bayes factors and p-values. Statistical Science, 26(1):84–101, 2011. doi: 10.1214/10-STS347. URL http://dx.doi.org/10.1214/10-STS347.

Jaehyeok Shin, Aaditya Ramdas, and Alessandro Rinaldo. Nonparametric iterated-logarithm extensions of the sequential generalized likelihood ratio test. arXiv preprint arXiv:2010.08082, 2020.

Joseph P Simmons, Leif D Nelson, and Uri Simonsohn. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22:1359–1366, 2011.

Jean Ville. Etude critique de la notion de collectif. Bull. Amer. Math. Soc, 45(11):824, 1939.

Vladimir Vovk and Ruodu Wang. Combining e-values and p-values. arXiv preprint arXiv:1912.06116, 2019.

Vladimir G Vovk. A logic of probability, with application to the foundations of statistics.
Journal of the Royal Statistical Society. Series B (Methodological), pages 317–351, 1993.

Eric-Jan Wagenmakers. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5):779–804, 2007.

Abraham Wald and Jacob Wolfowitz. Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics, pages 326–339, 1948.

Larry Wasserman, Aaditya Ramdas, and Sivaraman Balakrishnan. Universal inference. Proceedings of the National Academy of Sciences, 117(29):16880–16890, 2020.

A PROOFS OF RESULTS

A.1 PRELIMINARIES

In our setting, a stopping time is an adapted real function of the past (sub-)σ-algebra F_t (see the works (Durrett, 2010; Kallenberg, 2006) for more theoretical background). The central result about stopping times, which is the basis of this paper's development, is the optional stopping theorem.

Theorem 0 (Optional Stopping for Nonnegative Supermartingales ((Durrett, 2010), Theorem 5.7.6)). Let M be a nonnegative supermartingale. Then if τ > s is a (possibly infinite) stopping time, E[M_τ | F_s] ≤ M_s, with equality when M is a martingale.

This is typically useful for bounding probabilities pathwise, after applying Markov's inequality on a particular choice of the stopped process M_τ.

Lemma 0 (Ville (1939)). If M is a nonnegative supermartingale, for any c > 0, Pr(max_{t≥s} M_t ≥ M_s/c | F_s) ≤ c, with equality for martingale M with lim_{t→∞} M_t → 0 a.s.

Proof of Lemma 0. Consider the stopped M_τ for τ := min{t ≥ s : M_t ≥ M_s/c}. By Theorem 0:

M_s ≥ E[M_τ | F_s] = E[M_τ | τ < ∞, F_s] Pr(τ < ∞ | F_s) + E[M_τ | τ = ∞, F_s] Pr(τ = ∞ | F_s)
    ≥ (M_s/c) Pr(τ < ∞ | F_s)

proving that Pr(τ < ∞ | F_s) ≤ c, which is the result. The equality case uses the same proof, replacing the inequalities by equalities.

A.2 DEFERRED PROOFS

Here are full proofs of all the results we introduce in this paper.

Proof of Theorem 2. (a) By Lemma 0, $\forall s \in (0,1)$,
$$\Pr\left( \frac{M_t}{S_{\ge t}} \le s \,\Big|\, \mathcal{F}_t \right) = \Pr\left( S_{\ge t} \ge \frac{M_t}{s} \,\Big|\, \mathcal{F}_t \right) \le s = \Pr(\mathcal{U} \le s),$$
proving that $\frac{M_t}{S_{\ge t}} \succeq \mathcal{U}$. Taking the nondecreasing function $x \mapsto \max(S_t, x)$ of both sides gives the result.

(b) By Lemma 0 on $\{M_s\}_{s=t}^{\infty}$, $Z_t = \Pr(\tau_F \ge t \mid \mathcal{F}_t) = \Pr\left( \max_{s \ge t} M_s \ge S_t \,\Big|\, \mathcal{F}_t \right) \le \frac{M_t}{S_t}$.

(c) By part (b), we have $Z_t - Z_0 \le \frac{M_t}{S_t} - 1$. We can write
$$\frac{M_t}{S_t} - 1 = \sum_{q=1}^{t} \left( \frac{M_q}{S_q} - \frac{M_{q-1}}{S_{q-1}} \right) = \sum_{q=1}^{t} \left[ \frac{M_q - M_{q-1}}{S_q} + \left( \frac{M_{q-1}}{S_q} - \frac{M_{q-1}}{S_{q-1}} \right) \right] = \sum_{i=1}^{t} \frac{M_i - M_{i-1}}{S_i} - \sum_{q=1}^{t} M_{q-1} \left( \frac{1}{S_{q-1}} - \frac{1}{S_q} \right)$$

Theorem 2 adapts continuous-time results from Nikeghbali and Yor (2005; 2006).

Proof of Theorem 3. (a) Define $\tau(u) := \min\{t : Z_t \le u\}$ for $u \in (0,1)$. Then $\forall \tau \le \tau_F$,
$$\Pr(R_\tau \le u) = \Pr(\tau(u) \le \tau) \le \Pr(\tau(u) \le \tau_F) = \mathbb{E}\left[ \Pr\left( \tau_F \ge \tau(u) \mid \mathcal{F}_{\tau(u)} \right) \right] = \mathbb{E}\left[ Z_{\tau(u)} \right] \le u$$
where the last equality is by definition of $Z$, and the last inequality is by definition of $\tau(u)$.

(b) For any $t$, $\mathbb{1}(\rho_F > t) = \mathbb{1}(\exists\, t < \tau \le \tau_F : Z_\tau \le R_{\tau_F}) \le \mathbb{1}(\exists\, t < \tau \le \tau_F : Z_\tau \le R_t)$. Taking $\mathbb{E}[\cdot \mid \mathcal{F}_t]$ on both sides and defining $\tau_I := \min\{\tau > t : Z_\tau \le R_t\}$,
$$\Pr(\rho_F > t \mid \mathcal{F}_t) \le \Pr\left( \left[ \exists\, t < \tau \le \tau_F : Z_\tau \le R_t \right] \mid \mathcal{F}_t \right) = \Pr(\tau_I \le \tau_F \mid \mathcal{F}_t) \stackrel{(a)}{=} \mathbb{E}\left[ Z_{\tau_I} \mid \mathcal{F}_t \right] \stackrel{(b)}{\le} R_t$$
where (a) and (b) are respectively by definition of $Z_t$ and $\tau_I$.

Proof of Theorem 4. We condition on whether $\tau_F \le t$. Using Theorem 2 (i.e., for a uniform random variable $\mathcal{U}$, $S_{\ge t} \preceq \frac{M_t}{\mathcal{U}}$) and the monotonicity of $g$,
$$\mathbb{E}[g(S_\infty) \mid \mathcal{F}_t] = \mathbb{E}\left[ g\left( \max(S_t, S_{\ge t}) \right) \mid \mathcal{F}_t \right] \le \mathbb{E}\left[ g\left( \max\left( S_t, \frac{M_t}{\mathcal{U}} \right) \right) \Big|\, \mathcal{F}_t \right] =: Y_t \qquad (5)$$
The rest of the proof consists of writing the right-hand side of (5) in equivalent forms. To prove parts (a) and (b), observe that

$$Y_t = \mathbb{E}\left[ g\left( \max\left( S_t, \frac{M_t}{\mathcal{U}} \right) \right) \right] = \int_0^{M_t/S_t} g\left( \frac{M_t}{s} \right) ds + \left( 1 - \frac{M_t}{S_t} \right) g(S_t)$$
$$\stackrel{(\chi 1)}{=} M_t \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx + \left( 1 - \frac{M_t}{S_t} \right) g(S_t) \stackrel{(\chi 2)}{=} \frac{M_t}{S_t} \int_0^{1} g\left( \frac{S_t}{u} \right) du + \left( 1 - \frac{M_t}{S_t} \right) g(S_t)$$
where $(\chi 1)$ uses the change of variables $x := M_t/s$, and $(\chi 2)$ uses the change of variables $u := s S_t / M_t$.

To prove (c), start from $(\chi 1)$:
$$Y_t = \left( 1 - \frac{M_t}{S_t} \right) g(S_t) + M_t \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx = \frac{1}{S_t} (S_t - M_t) g(S_t) + S_t \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx - (S_t - M_t) \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx$$
$$\stackrel{(\chi 3)}{=} \mathcal{G}(S_t) - (S_t - M_t) \left( \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx - \frac{g(S_t)}{S_t} \right) = \mathcal{G}(S_t) - (S_t - M_t) \int_{S_t}^{\infty} \frac{dx}{x^2} \left( g(x) - g(S_t) \right)$$
where $(\chi 3)$, like $(\chi 2)$, uses the change of variables $u := S_t / x$ to construct $\mathcal{G}(S_t) = S_t \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx$.

To prove (d), start from part (a) of the result:
$$Y_t = \left( 1 - \frac{M_t}{S_t} \right) g(S_t) + M_t \int_{S_t}^{\infty} \frac{g(x)}{x^2} \, dx = g(S_t) + M_t \int_{S_t}^{\infty} \frac{dx}{x^2} \left( g(x) - g(S_t) \right)$$
This proves that $Y_t = g(S_t) + M_t \mathcal{G}'(S_t)$, which is $\ge 0$ if $g \ge 0$. $\mathcal{G}$ is continuous because $g$ is. The concavity and monotonicity of $\mathcal{G}$ are because $\mathcal{G}'(s) = \int_{s}^{\infty} \frac{dx}{x^2} (g(x) - g(s))$ is never negative, and is monotone nonincreasing due to the monotonicity of $g$.

This also shows that $g(S_t) = \mathcal{G}(S_t) - S_t \mathcal{G}'(S_t)$ (previously proved with real analysis, in Carraro et al. (2012), Lemma 4.4).
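The equivalence of these forms can be verified numerically for a concrete nondecreasing $g$. The sketch below (an illustrative check, not part of the paper; the choices $g(x) = \sqrt{x}$, $M_t = 1$, $S_t = 4$ are ours) compares direct quadrature of $\mathbb{E}[g(\max(S_t, M_t/\mathcal{U}))]$ against the $(\chi 1)$ closed form, which for $g(x) = \sqrt{x}$ reads $(1 - M/S)\sqrt{S} + 2M/\sqrt{S}$ since $\int_S^\infty x^{-3/2} dx = 2/\sqrt{S}$.

```python
import math

def y_direct(g, M, S, n=200000):
    """E[g(max(S, M/U))] for U ~ Uniform(0,1), by midpoint quadrature over u."""
    return sum(g(max(S, M / ((i + 0.5) / n))) for i in range(n)) / n

def y_chi1(M, S):
    """(chi1) form specialized to g(x) = sqrt(x):
    (1 - M/S)*g(S) + M*int_S^inf g(x)/x^2 dx = (1 - M/S)*sqrt(S) + 2*M/sqrt(S)."""
    return (1 - M / S) * math.sqrt(S) + 2 * M / math.sqrt(S)

M, S = 1.0, 4.0
a, b = y_direct(math.sqrt, M, S), y_chi1(M, S)
print(a, b)  # both are approximately 2.5
```

The small residual gap comes from the integrable singularity of $g(M/u)$ at $u = 0$, which the midpoint rule resolves only approximately.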

Proof of Lemma 5. (a) $Y_t \stackrel{(a)}{=} g(S_t) + M_t \mathcal{G}'(S_t) \ge g(S_t)$, where (a) uses Theorem 4(d).

(b) When $M_t \ne S_t$ and therefore $S_{t-1} = S_t$, then $Y_t - Y_{t-1} = (M_t - M_{t-1}) \mathcal{G}'(S_{t-1})$. When $M_t = S_t$,
$$Y_t - Y_{t-1} = \mathcal{G}(S_t) - \mathcal{G}(S_{t-1}) + (S_{t-1} - M_{t-1}) \mathcal{G}'(S_{t-1})$$
$$= \mathcal{G}(S_t) - \mathcal{G}(S_{t-1}) + (M_t - M_{t-1}) \mathcal{G}'(S_{t-1}) + (S_{t-1} - S_t) \mathcal{G}'(S_{t-1})$$
$$= (M_t - M_{t-1}) \mathcal{G}'(S_{t-1}) + \left( \mathcal{G}(S_t) - \mathcal{G}(S_{t-1}) - (S_t - S_{t-1}) \mathcal{G}'(S_{t-1}) \right)$$
$$= (M_t - M_{t-1}) \mathcal{G}'(S_{t-1}) - D_{-\mathcal{G}}(S_t, S_{t-1})$$
which also shows that $Y$ is a supermartingale whenever $M$ is.

(c) To prove the equality, define the times at which $M_t$ sets cumulative record maxima ($M_t = S_t$) as $\tau_1 < \tau_2 < \cdots \in \{t : M_t = S_t\}$, where $Y_{\tau_i} = \mathcal{G}(S_{\tau_i})$. For $v \in [\tau_i, \tau_{i+1})$, by definition of $Y$, $Y_v = \mathcal{G}(S_v) + (M_v - S_v) \mathcal{G}'(S_v) \le \mathcal{G}(S_v) = \mathcal{G}(S_{\tau_i})$, with equality exactly at each $\tau_i$. Therefore, $\tau_1, \tau_2, \ldots$ are also precisely the times $Y_t$ sets cumulative record maxima, and $\max_{s \le t} Y_s = \mathcal{G}(S_t)$ for all $t$.

Now we prove the inequalities. By concavity of $\mathcal{G}$ (Theorem 4), we have $Y_t = \mathcal{G}(S_t) + (M_t - S_t) \mathcal{G}'(S_t) \ge \mathcal{G}(M_t)$. Also by monotonicity of $\mathcal{G}$, $(M_t - S_t) \mathcal{G}'(S_t) \le 0$, so $Y_t \le \mathcal{G}(S_t)$.

(d) $\mathcal{G}$ is nondecreasing and $A_t \ge \mathcal{G}(M_t)$, so that $\max_{s \le t} A_s \ge \max_{s \le t} \mathcal{G}(M_s) = \mathcal{G}(S_t) = \max_{s \le t} Y_s$, using part (b) for the last equality.

Proof of Theorem 9. a) Using the definition of $\mathcal{G}^\mu$ and Lemma 5, we deduce $\max_{s \le t} Y_s = \mathcal{G}^\mu(S_t) \le \mathcal{G}^\mu(S_\infty) = \mathrm{SQ}^\mu\left( \frac{1}{S_\infty} \right)$. As the $\mathrm{SQ}$ function is nondecreasing, using Thm. 2(a), we get $\mathrm{SQ}^\mu\left( \frac{1}{S_\infty} \right) \preceq \mathrm{SQ}^\mu(\mathcal{U}) \sim \mu^{HL}$.

b) The stopping event is equivalent to $Y_{\tau^\mu} \le g(S_{\tau^\mu}) \le g(S_\infty) = \bar{\mu}^{-1}\left( \frac{1}{S_\infty} \right) \preceq \bar{\mu}^{-1}(\mathcal{U})$. By definition of the tail quantile function, $\bar{\mu}^{-1}(\mathcal{U})$ has distribution $\mu$.

To prove Theorem 10, we use a variational characterization of $\bar{\mu}^{HL}$.

Proposition 14 (Prop. 4.10(c), Carraro et al. (2012)).
$$\bar{\mu}^{HL}(y) = \min_{z > 0} \mathbb{E}_{X \sim \mu}\left[ \frac{1}{z} \left( X - (y - z) \right)^+ \right]$$

Proof of Theorem 10. We adapt an argument from Brown et al. (2001), via Carraro et al. (2012). Define $Q := \max_{s \le t} A_s$. Let $K$

Proof of Lemma 11. It suffices to prove that if $B_t$ is defined as specified, its differences have the specified properties, which together with the initial conditions define the process uniquely almost surely. By part (a) and then parts (b,c) of Lemma 5, $B_t - B_{t-1} = Y_t - Y_{t-1} + D_{-\mathcal{G}}(S_t, S_{t-1}) = (M_t - M_{t-1}) \mathcal{G}'(S_{t-1})$. So $B_t = Y_t + \sum_{s=1}^{t} D_{-\mathcal{G}}(S_s, S_{s-1})$. Taking the cumulative maximum of both sides, $\max_{s \le t} B_s = \max_{s \le t} Y_s + \sum_{s=1}^{t} D_{-\mathcal{G}}(S_s, S_{s-1}) \stackrel{(a)}{=} \mathcal{G}(S_t) + B_t - Y_t = B_t + (S_t - M_t) \mathcal{G}'(S_t)$, where (a) uses Lemma 5 and the definition of $B$.

Proof of Theorem 12. Recall $\mathcal{G}' \ge 0$ by concavity of $\mathcal{G}$. The definitions in (4) imply (2), with initial conditions for $M_0, S_0$ specified. Given $B_t$ (w.p. 1) and $\mathcal{F}_{t-1}$, this unambiguously specifies $M_t$, and hence $S_t$. The decomposition is therefore a.s. unique (as can also be shown by contradiction).

A.3 NOTES ON THE DEFINITIONS

Definition 1. (a) $\implies$ (b) follows by computing the expectations on each side of (b) by sampling a uniform $[0,1]$ r.v. $\mathcal{U}$ and applying (a) on this variable. (b) $\implies$ (a) follows by setting $h(s) = \mathbb{1}(s \ge c)$ for any $c$.

Definition 8. We prove the form of $\mathcal{G}^\mu$:
$$\int_0^1 g^\mu\left( \frac{x}{u} \right) du = \int_0^1 \bar{\mu}^{-1}\left( \frac{u}{x} \right) du = x \int_0^{1/x} \bar{\mu}^{-1}(\lambda) \, d\lambda = \mathrm{SQ}^\mu\left( \frac{1}{x} \right).$$
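This chain of equalities evaluates in closed form for a concrete $\mu$. Taking $\mu = \mathrm{Exp}(1)$ (our illustrative choice, not from the paper): $\bar{\mu}(y) = e^{-y}$, the tail quantile is $\bar{\mu}^{-1}(\lambda) = \ln(1/\lambda)$, so $g^\mu(s) = \bar{\mu}^{-1}(1/s) = \ln s$, and both ends of the chain equal $1 + \ln x$. The sketch below checks this numerically.

```python
import math

def lhs(x, n=100000):
    """int_0^1 g^mu(x/u) du with g^mu(s) = ln(s), i.e. mu = Exp(1); midpoint rule."""
    return sum(math.log(x * n / (i + 0.5)) for i in range(n)) / n

def rhs(x):
    """SQ^mu(1/x) = x * int_0^{1/x} ln(1/lam) dlam = 1 + ln(x) for mu = Exp(1)."""
    return 1.0 + math.log(x)

checks = {x: (lhs(x), rhs(x)) for x in (2.0, 5.0, 10.0)}
for x, (a, b) in checks.items():
    print(x, a, b)  # the two sides agree to several decimal places
```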

B PROTOTYPICAL EXAMPLE: z-TEST AND SUB-GAUSSIAN STATISTICS

For sub-Gaussian statistics, characterizations of the null distribution often rest ultimately on the Central Limit Theorem (CLT). We therefore use the z-test as a prototypical example with which to introduce the relevant concentration behavior.

B.1 A p-VALUE FOR A FIXED TIME

The z-test's statistic, appropriately normalized, is a sum of standard normal random variables, $M_t = \sum_{i=1}^{t} Z_i$, and its moment-generating function (m.g.f.) at any $\lambda \in \mathbb{R}$ is $\mathbb{E}\left[ e^{\lambda M_t} \right] = e^{\frac{\lambda^2}{2} t}$. So the variable $Y_t^\lambda := e^{\lambda M_t - \frac{\lambda^2}{2} t}$ has mean 1, and Markov's inequality tells us that $A_t^\lambda := \frac{1}{Y_t^\lambda}$ meets the above definition of a p-value, i.e. $\Pr\left( A_t^\lambda \le s \right) \le s \ \forall s$.

All this holds for any $\lambda$, so the best p-value at a fixed time is $A_t = \min_\lambda A_t^\lambda = e^{-M_t^2 / 2t}$, recovering the well-known Gaussian tail. Peeking can be disastrous in this canonical scenario, leading to a profusion of false positives (indeed, classical results (Armitage et al., 1969) prove that a peeker willing to wait for the z-test as long as necessary can report any desired $p < 1$ w.p. 1).
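The contrast between the two regimes can be simulated directly. The following sketch (illustrative parameters of our choosing, not from the paper) runs Gaussian random walks under the null and compares the rejection rate of $A_T$ at a single fixed time $T$ with that of a peeker who monitors $A_t$ at every $t \le T$.

```python
import math
import random

def simulate(n_paths=3000, T=1000, alpha=0.05, seed=1):
    """Under the null (standard normal increments), compare the fixed-time
    p-value A_T = exp(-M_T^2/(2T)) with the peeked p-value min_{t<=T} A_t."""
    rng = random.Random(seed)
    thresh = 2.0 * math.log(1.0 / alpha)  # A_t <= alpha  iff  M_t^2 / t >= thresh
    fixed_rej = peek_rej = 0
    for _ in range(n_paths):
        m, peeked = 0.0, False
        for t in range(1, T + 1):
            m += rng.gauss(0.0, 1.0)
            if m * m >= thresh * t:
                peeked = True
        if m * m >= thresh * T:
            fixed_rej += 1
        if peeked:
            peek_rej += 1
    return fixed_rej / n_paths, peek_rej / n_paths

fixed_rate, peek_rate = simulate()
print(fixed_rate, peek_rate)  # fixed-time rate stays below alpha; peeked rate is far higher
```

The fixed-time test is conservative at level $\alpha$, while the peeked rate grows with $T$ and, by the Armitage et al. (1969) result above, would approach 1 as $T \to \infty$.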

B.2 INOCULATION AGAINST PEEKING BY MIXING DISTRIBUTIONS

To devise such a peeking-robust $H$, recall the distribution of $M$ as specified by its m.g.f. at all times: $\mathbb{E}\left[ e^{\lambda M_t - \frac{\lambda^2}{2} t} \right] = 1 \ \forall \lambda$. So for any distribution $\Gamma$, we have $1 = \mathbb{E}\left[ W_t^\Gamma \right]$ for the mixed process $W_t^\Gamma := \mathbb{E}_{\lambda \sim \Gamma}\left[ e^{\lambda M_t - \frac{\lambda^2}{2} t} \right]$, and the process $W_t^\Gamma$ is a nonnegative martingale. Its expectation is controlled at any stopping time $\tau$ by the optional stopping theorem (Theorem 0), so $\mathbb{E}\left[ W_\tau^\Gamma \right] = \mathbb{E}\left[ W_0^\Gamma \right] = 1$. Therefore, defining $H_\tau^\Gamma := \frac{1}{W_\tau^\Gamma}$ and using Markov's inequality on $W_\tau^\Gamma$, we have $\Pr\left( H_\tau^\Gamma \le s \right) = \Pr\left( W_\tau^\Gamma \ge \frac{1}{s} \right) \le s$, so $H^\Gamma$ behaves like a p-value despite the arbitrariness of the stopping time $\tau$. This is true regardless of the distribution $\Gamma$, which controls how the reported $H$ varies over each sample path (Balsubramani, 2014; Howard et al., 2018).

Such pathwise variation is unavoidably $\Omega(\sqrt{t \log \log t})$, the content of a fundamental theorem of probability – the law of the iterated logarithm (LIL). Proofs of the asymptotic (Robbins and Siegmund, 1970) and finite-time LIL (Balsubramani, 2014) have used its relationship with mixed processes like $W_\tau^\Gamma$, and that line of work has explored how best to choose $\Gamma$ (Howard et al., 2018).
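For the particular choice $\Gamma = N(0, \rho^2)$ (our illustrative choice), the mixture integral has a standard Gaussian-integral closed form, $W_t^\Gamma = (1 + \rho^2 t)^{-1/2} \exp\left( \rho^2 M_t^2 / (2(1 + \rho^2 t)) \right)$. The sketch below (parameters are ours, not the paper's) checks empirically that even a peeker who stops the first time $H_t^\Gamma \le \alpha$ rejects a true null at rate at most $\alpha$.

```python
import math
import random

def h_value(m, t, rho2=1.0):
    """H_t = 1/W_t for the Gaussian-mixture martingale with Gamma = N(0, rho2):
    W_t = (1 + rho2*t)**-0.5 * exp(rho2*m^2 / (2*(1 + rho2*t)))."""
    return math.sqrt(1 + rho2 * t) * math.exp(-rho2 * m * m / (2 * (1 + rho2 * t)))

def peeked_rejection_rate(alpha=0.05, T=1000, n_paths=3000, seed=2):
    """Fraction of null paths on which a peeker ever observes H_t <= alpha.
    Uses the equivalent threshold m^2 >= (1+t) * (2*log(1/alpha) + log(1+t))
    (for rho2 = 1) to avoid recomputing H_t at every step."""
    rng = random.Random(seed)
    thr = [(1 + t) * (2 * math.log(1 / alpha) + math.log(1 + t)) for t in range(1, T + 1)]
    rej = 0
    for _ in range(n_paths):
        m = 0.0
        for t in range(1, T + 1):
            m += rng.gauss(0.0, 1.0)
            if m * m >= thr[t - 1]:
                rej += 1
                break
    return rej / n_paths

rate = peeked_rejection_rate()
print(rate)  # empirically at most alpha = 0.05, despite unlimited peeking
```

Compare this with the uncorrected z-test of Appendix B.1, where the same peeking strategy inflates the false positive rate well beyond $\alpha$.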

B.3 ROBUST p-VALUES FOR SUB-GAUSSIAN STATISTICS

Despite their generality, (super)martingales whose increments are sub-Gaussian follow the same concentration behavior as $M_t$, the Gaussian random walk of the z-statistic we have discussed. The recipe for $H$-values is much the same for these generalizations, where $W_t = \mathbb{E}_{\lambda \sim \Gamma}\left[ e^{\lambda M_t - \frac{\lambda^2}{2} V_t} \right]$ is a (super)martingale for different values of $\lambda$, with $V_t$ being the martingale's cumulative variance process. So $H_t^\Gamma := \frac{1}{\mathbb{E}_{\lambda \sim \Gamma}\left[ e^{\lambda M_t - \frac{\lambda^2}{2} V_t} \right]}$ satisfies $\Pr\left( H_\tau^\Gamma \le s \right) \le s$ for any stopping time $\tau$ – by the argument of Appendix B.2. This makes it a robust p-value.²

²The guarantees on $H$ hold even at the time of the ultimate minimum of $H$ (see Section 3). This is not a stopping time, as it depends on future events, and was originally termed an “honest time” (Nikeghbali, 2007; Nikeghbali and Platen, 2013). Following this, a robust p-value is also “honest.”
