Arxiv:1902.02774V3 [Stat.ME] 15 Feb 2021

Confidence Intervals for Nonparametric Empirical Bayes Analysis

Nikolaos Ignatiadis, Stefan Wager

September 2021

arXiv:1902.02774v4 [stat.ME] 8 Sep 2021

Abstract

In an empirical Bayes analysis, we use data from repeated sampling to imitate inferences made by an oracle Bayesian with extensive knowledge of the data-generating distribution. Existing results provide a comprehensive characterization of when and why empirical Bayes point estimates accurately recover oracle Bayes behavior. In this paper, we develop flexible and practical confidence intervals that provide asymptotic frequentist coverage of empirical Bayes estimands, such as the posterior mean or the local false sign rate. The coverage statements hold even when the estimands are only partially identified or when empirical Bayes point estimates converge very slowly.

Keywords: Empirical Bayes, mixture models, local false sign rate, partial identification, bias-aware inference

1 Introduction

Empirical Bayes methods enable frequentist estimation that emulates a Bayesian oracle. Suppose we observe $Z$ generated as below, and want to estimate $\theta_G(z)$,

$$\mu \sim G, \quad Z \sim p(\cdot \mid \mu); \qquad \theta_G(z) = \mathbb{E}_G\left[h(\mu) \mid Z = z\right], \tag{1}$$

for some known function $h(\cdot) \in \mathbb{R}$. Given knowledge of $G$, $\theta_G(z)$ can be directly evaluated via Bayes' rule. An empirical Bayesian does not know $G$, but seeks an approximately optimal estimator $\hat{\theta}(z) \approx \theta_G(z)$ using independent draws $Z_1, Z_2, \dots, Z_n$ from the distribution (1).

The empirical Bayes approach was first introduced by Robbins [1956] and has proven to be successful in a wide variety of settings with repeated observations of similar phenomena, such as genomics [Efron et al., 2001, Love et al., 2014], education [Lord, 1969, Gilraine et al., 2020] and actuarial science [Bühlmann and Gisler, 2006]. Table 1 provides concrete applications of model (1) for these subject areas.
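To make the oracle computation in (1) concrete, the following minimal sketch evaluates $\theta_G(z)$ by Bayes' rule when $G$ is known. It assumes, purely for illustration, a Gaussian likelihood $Z \mid \mu \sim \mathcal{N}(\mu, 1)$ and a hypothetical two-point prior; the function name `oracle_theta` and all numerical values are illustrative choices and do not come from the paper.

```python
import math

def oracle_theta(z, atoms, weights, h=lambda mu: mu):
    """Oracle Bayes estimand E_G[h(mu) | Z = z] for a discrete prior G.

    Assumes (for illustration only) Z | mu ~ N(mu, 1) and a prior G that
    puts mass weights[j] on atoms[j].
    """
    # Unnormalized posterior mass of each atom: prior weight times likelihood.
    # The Gaussian constant 1/sqrt(2*pi) cancels in the ratio below.
    post = [w * math.exp(-0.5 * (z - mu) ** 2) for mu, w in zip(atoms, weights)]
    total = sum(post)
    return sum(p * h(mu) for p, mu in zip(post, atoms)) / total

# Hypothetical prior: mu = -1 or mu = +1 with equal probability.
atoms, weights = [-1.0, 1.0], [0.5, 0.5]

# h(mu) = mu gives the posterior mean after observing z = 1.
posterior_mean = oracle_theta(1.0, atoms, weights)

# h(mu) = 1(mu <= 0) gives the posterior probability that mu is non-positive.
prob_nonpositive = oracle_theta(1.0, atoms, weights, h=lambda mu: float(mu <= 0))
```

For this particular symmetric prior the posterior mean has the closed form $\tanh(z)$, which gives a quick sanity check. An empirical Bayesian cannot run this computation directly, because `atoms` and `weights` (that is, $G$) are unknown.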
In all examples, the posterior mean $\theta_G(z) = \mathbb{E}_G[\mu \mid Z = z]$ is a statistic of interest, as it describes the (mean squared error) optimal shrinkage rule for estimating $\mu$. In the genomics application, it is also of interest to determine the local false-sign rate $\theta_G(z) = \mathbb{P}_G[\mu z \le 0 \mid Z = z]$, i.e., the posterior probability that $\mu$ has a different sign than $Z$.

As elaborated later, there is by now a large literature proposing a suite of estimators $\hat{\theta}(z)$ for $\theta_G(z)$. Many of these estimators have theoretical guarantees under nonparametric specification of $G$, say $G \in \mathcal{G}$, where $\mathcal{G}$ is a convex class of distributions. The goal of this paper is to move past point estimation, and develop nonparametric confidence intervals for $\theta_G(z)$, i.e., intervals with the following property:

$$\mathcal{I}_\alpha(z) = \left[\hat{\theta}_\alpha^-(z),\ \hat{\theta}_\alpha^+(z)\right], \qquad \liminf_{n \to \infty} \mathbb{P}_G\left[\theta_G(z) \in \mathcal{I}_\alpha(z)\right] \ge 1 - \alpha \ \text{ for all } G \in \mathcal{G}. \tag{2}$$

Subject            i         Z_i                                               mu_i                       Z_i | mu_i ~
-----------------  --------  ------------------------------------------------  -------------------------  ----------------
Actuarial science  Contract  Number of insurance claims                        Risk profile               Poisson(mu_i)
Education          Student   Score in test with N multiple choice questions    Latent ability             Binom(N, mu_i)
Genomics           Gene      t-statistic comparing expression between          Standardized effect size   N(mu_i, 1)
                             conditions

Table 1: Example applications for empirical Bayes inference in model (1).

Despite widespread use of empirical Bayes methods, the problem has received surprisingly little attention. In fact, we are not aware of confidence intervals with property (2) beyond two special cases: one proposal by Lord and Cressie [1975] for inference about the posterior mean in the binomial model and another by Robbins [1980] for the same task in the Poisson model.

1.1 Motivating application: Predicting automobile insurance claims

To motivate our interest in confidence intervals of the form (2), we revisit the historical work of Bichsel [1964]. Bichsel developed a theoretical framework for assigning automobile insurance premium rates, in a way that accounts for the claims experience of each individual.
He analyzed a dataset (Table 2) of claims made in the year 1961 by holders of a Swiss automobile insurance policy. Bichsel posited that $Z_i(t)$, the number of claims made in year $t$ by the $i$-th insurance holder, is distributed as $\text{Poisson}(\mu_i)$, where $\mu_i$ is $i$'s latent risk. $\mu_i$ is a random draw from a distribution $G$ that captures the heterogeneity of the insurance portfolio. Bichsel further assumed that the numbers of claims $Z_i(t), Z_i(t')$ in different years $t \ne t'$ are i.i.d. conditionally on $\mu_i$. Given these assumptions, Bichsel sought to estimate the expected number of claims in the next year, among all insurance holders that made $Z_i(1961) = z$ claims in 1961,

$$\theta_G(z) = \mathbb{E}_G\left[Z(1962) \mid Z(1961) = z\right] = \mathbb{E}_G\left[\mu \mid Z(1961) = z\right]. \tag{3}$$

Bichsel reasoned that if $\theta_G(z)$ were known, it could be used by the insurance company for policy decisions, such as increasing or decreasing the premium of a policy holder with $z$ claims in 1961. Since $G$, and consequently $\theta_G(z)$, were not known to Bichsel, he considered an empirical Bayes approach.¹

¹ $\theta_G(z)$ is a property of $G$, i.e., of the portfolio heterogeneity. The goal is to best assess how many claims will be made across all individuals in the portfolio that made $z$ claims in 1961, and not to reason about the risk $\mu_i$ of any individual policy holder with $Z_i(1961) = z$. The problem of forming intervals containing the true $\mu_i$ (for individuals) is of scientific importance, see e.g., Morris [1983], Laird and Louis [1987], Armstrong, Kolesár, and Plagborg-Møller [2020], Koenker [2020] for some proposals; however it is not the problem we consider in this work.

z     #{Z_i = z}   NPMLE estimate of theta_G(z)   F-localization I_alpha(z)   AMARI I_alpha(z)
----  -----------  -----------------------------  --------------------------  -----------------
0     103704       0.14                           0.13 – 0.14                 0.13 – 0.14
1     14075        0.25                           0.23 – 0.27                 0.24 – 0.26
2     1766         0.44                           0.36 – 0.53                 0.38 – 0.49
3     255          0.69                           0.48 – 0.94                 0.53 – 0.91
4     45           0.82                           0.52 – 1.64                 0.58 – 1.39
≥ 5   8

Table 2: Empirical Bayes confidence intervals in an actuarial application.
The first two columns report the number of claims $z$ made in 1961 by 119853 holders of a Swiss automobile insurance policy [Bichsel, 1964]. The next column reports the nonparametric maximum likelihood (NPMLE) point estimates of the posterior mean $\theta_G(z) = \mathbb{E}_G[\mu \mid Z = z]$ under model (1) with $Z \mid \mu \sim \text{Poisson}(\mu)$. The last two columns report the two types of confidence intervals developed in this work: the F-localization intervals (Section 2) and AMARI intervals (Section 4).

The problem of point estimation for $\theta_G(z)$ is well understood. One popular nonparametric solution² is to first estimate $G$ through the nonparametric maximum likelihood estimator (NPMLE) of Kiefer and Wolfowitz [1956] and Simar [1976]: one estimates $\hat{G}$ as the maximizer of the marginal log-likelihood

$$\sum_i \log\left(f_G(Z_i)\right), \qquad f_G(z) = \int \exp(-\mu)\, \mu^z / z! \; dG(\mu),$$

among all possible prior distributions $G$. Then, with $\hat{G}$ in hand, one estimates $\theta_G(z)$ through the plug-in principle, i.e., $\hat{\theta}(z) = \theta_{\hat{G}}(z)$ (shown in the third column of Table 2). However, insofar as $\theta_G(z)$ may be used for policy decisions of the insurance company, it is also important to assess the uncertainty in estimating it.

In this paper, we develop two complementary approaches that address the problem of inference for empirical Bayes estimands, and enable the construction of intervals with the property (2) under the general model (1). The last two columns of Table 2 show the two confidence intervals that we propose for Bichsel's data. The assumption we make in forming these intervals is that $G$ is supported on $[0, 5]$. The 'F-localization' intervals (fourth column of Table 2) have simultaneous coverage for all $z$, while the 'AMARI' intervals (fifth column) have pointwise coverage. We next provide a high-level overview of our two constructions.
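The plug-in recipe described above (estimate $G$ by maximum likelihood, then evaluate the posterior mean under the fitted prior) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the prior is restricted to weights on a fixed grid over $[0, 5]$, the weights are fitted with plain EM iterations rather than a dedicated NPMLE solver, and the eight observations with $z \ge 5$ are treated as a single censored bin. The names `fit_npmle_em` and `plug_in_posterior_mean` are hypothetical, and the output need not reproduce the table to the last digit.

```python
import numpy as np
from math import factorial

# Claim counts from Table 2: z = 0, ..., 4 and the censored bin z >= 5.
counts = np.array([103704, 14075, 1766, 255, 45, 8], dtype=float)

# Assumption: approximate priors supported on [0, 5] by weights on a fixed grid.
grid = np.linspace(0.0, 5.0, 201)

def poisson_pmf(z, mu):
    """Poisson(mu) probability of z, vectorized over mu (0**0 == 1 in NumPy)."""
    return np.exp(-mu) * mu ** z / factorial(z)

# lik[k, j] = P(observation falls in bin k | mu = grid[j]).
pmf = np.vstack([poisson_pmf(z, grid) for z in range(5)])
tail = np.clip(1.0 - pmf.sum(axis=0), 0.0, None)  # P(Z >= 5 | mu)
lik = np.vstack([pmf, tail])

def fit_npmle_em(lik, counts, n_iter=5000):
    """EM updates of the grid weights, started from the uniform distribution."""
    w = np.full(lik.shape[1], 1.0 / lik.shape[1])
    for _ in range(n_iter):
        marginal = lik @ w  # fitted probability of each bin
        w = w * (lik.T @ (counts / marginal)) / counts.sum()
    return w

def plug_in_posterior_mean(z, w):
    """E[mu | Z = z] under the fitted grid prior (plug-in principle)."""
    p = poisson_pmf(z, grid)
    return float((w * grid * p).sum() / (w * p).sum())

weights = fit_npmle_em(lik, counts)
theta_hat = [plug_in_posterior_mean(z, weights) for z in range(5)]
```

A fixed-grid EM is slow to converge compared with convex-optimization solvers, but it needs only NumPy and keeps the weights on the probability simplex at every step, which makes it convenient for a small illustration.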
1.2 Empirical Bayes confidence intervals

In our approach the data analyst first specifies (1), i.e., $\theta_G(z)$, the empirical Bayes estimand of interest (e.g., the posterior mean (3)) and the conditional distribution of $Z$ given $\mu$ (e.g., $\text{Poisson}(\mu)$), which we represent by its conditional density $p(\cdot \mid \mu)$ with respect to a $\sigma$-finite measure $\lambda$ on a subset of $\mathbb{R}$ (e.g., the counting measure on $\mathbb{N}_{\ge 0}$). We also require the data analyst to specify a convex class of priors $\mathcal{G}$ such that $G \in \mathcal{G}$. For example, for our analysis of Bichsel's data in Table 2, we assumed that $G \in \mathcal{G} = \mathcal{P}([0, 5])$, where:³

$$\mathcal{P}(\mathcal{K}) := \{G \text{ distribution} : \operatorname{support}(G) \subset \mathcal{K}\} \quad \text{for } \mathcal{K} \subset \mathbb{R}. \tag{4}$$

² In his work, Bichsel modeled $G$ parametrically as a Gamma distribution with unknown parameters.

³ We provide more guidance for choosing $\mathcal{G}$ in Section 8.

1.2.1 F-localization

Our first confidence interval construction is based on the notion of F-localization. The key idea is to construct a confidence set for the marginal distribution of $Z$ and then determine all $G \in \mathcal{G}$ consistent with this confidence set. Let us denote the marginal distribution of $Z$ by $F_G$ and its $d\lambda$-density by $f_G$, i.e.,

$$f_G(z) = \int p(z \mid \mu)\, dG(\mu), \qquad F_G(t) = \mathbb{P}_G[Z \le t] = \int \mathbf{1}(z \le t)\, f_G(z)\, d\lambda(z). \tag{5}$$

We then define an F-localization as an (asymptotic) $1 - \alpha$ confidence set $\mathcal{F}_n(\alpha)$ of distributions, i.e., a set such that

$$\liminf_{n \to \infty} \left\{ \mathbb{P}_G\left[F_G \in \mathcal{F}_n(\alpha)\right] - (1 - \alpha) \right\} \ge 0. \tag{6}$$

With an F-localization $\mathcal{F}_n(\alpha)$ in hand, and deferring the construction of such to Section 2, we can form confidence intervals $\mathcal{I}_\alpha(z) = [\hat{\theta}_\alpha^-(z), \hat{\theta}_\alpha^+(z)]$ for $\theta_G(z)$ by letting

$$\hat{\theta}_\alpha^-(z) = \inf\left\{\theta_G(z) \mid G \in \mathcal{G}(\mathcal{F}_n(\alpha))\right\}, \qquad \hat{\theta}_\alpha^+(z) = \sup\left\{\theta_G(z) \mid G \in \mathcal{G}(\mathcal{F}_n(\alpha))\right\}, \tag{7}$$

where

$$\mathcal{G}(\mathcal{F}) = \left\{G \in \mathcal{G} : F_G \in \mathcal{F}\right\}. \tag{8}$$

The intervals (7) satisfy (2), since $\mathbb{P}_G\left[\theta_G(z) \in [\hat{\theta}_\alpha^-(z), \hat{\theta}_\alpha^+(z)]\right] \ge \mathbb{P}_G\left[F_G \in \mathcal{F}_n(\alpha)\right]$; and the same argument also demonstrates that coverage holds simultaneously over all possible empirical Bayes estimands $\theta_G(z) = \mathbb{E}_G[h(\mu) \mid Z = z]$, where both $z$ and $h$ can vary.
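The optimization in (7) can be illustrated numerically. The sketch below makes several assumptions that are not taken from the text above: it uses a Dvoretzky-Kiefer-Wolfowitz (DKW) band around the empirical distribution function as the F-localization (the paper defers its own construction to Section 2), it checks the band only at $t = 0, \dots, 4$, it replaces $\mathcal{P}([0, 5])$ by weights on a finite grid, and it solves the resulting linear-fractional programs with the Charnes-Cooper transformation and `scipy.optimize.linprog`. The function name `f_localization_interval` is hypothetical, and the numbers it returns are not expected to match Table 2 exactly.

```python
import numpy as np
from math import factorial
from scipy.optimize import linprog

# Claim counts from Table 2 (z = 0, ..., 4 and the bin z >= 5).
counts = np.array([103704, 14075, 1766, 255, 45, 8], dtype=float)
n = counts.sum()
alpha = 0.05
grid = np.linspace(0.0, 5.0, 201)  # assumed discretization of [0, 5]

def poisson_pmf(z, mu):
    """Poisson(mu) probability of z, vectorized over mu."""
    return np.exp(-mu) * mu ** z / factorial(z)

pmf = np.vstack([poisson_pmf(z, grid) for z in range(5)])
cdf = np.cumsum(pmf, axis=0)        # cdf[t, j] = P(Z <= t | mu = grid[j])
ecdf = np.cumsum(counts[:5]) / n    # empirical CDF at t = 0, ..., 4
eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))  # DKW half-width (assumed choice)

def f_localization_interval(z):
    """Bounds on E_G[mu | Z = z] over grid priors whose CDF lies in the band."""
    m = grid.size
    # Charnes-Cooper variables: y = s * pi (m entries) followed by the scalar s.
    numerator = np.append(grid * pmf[z], 0.0)
    A_ub = np.vstack([
        np.hstack([cdf, -(ecdf + eps)[:, None]]),   # F_G(t) <= ecdf(t) + eps
        np.hstack([-cdf, (ecdf - eps)[:, None]]),   # F_G(t) >= ecdf(t) - eps
    ])
    b_ub = np.zeros(A_ub.shape[0])
    A_eq = np.vstack([
        np.append(np.ones(m), -1.0),   # weights sum to one: sum(y) = s
        np.append(pmf[z], 0.0),        # denominator f_G(z) rescaled to 1
    ])
    b_eq = np.array([0.0, 1.0])
    ends = []
    for sign in (1.0, -1.0):           # minimize, then maximize
        res = linprog(sign * numerator, A_ub=A_ub, b_ub=b_ub,
                      A_eq=A_eq, b_eq=b_eq, method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        ends.append(sign * res.fun)
    return ends[0], ends[1]

intervals = {z: f_localization_interval(z) for z in range(3)}
```

Because the same band is reused for every $z$, all intervals produced this way rest on the single event $F_G \in \mathcal{F}_n(\alpha)$, which mirrors the simultaneous-coverage argument given for (7).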
