Methods for Ranking and Selection in Large-Scale Inference

Nicholas Henderson

Department of Statistics, University of Wisconsin – Madison

[Figure: a single measurement on a −6 to 6 scale, showing the effect size θ, its estimate θ̂, and the interval θ̂ ± 2 SE]

In this talk, we will look at effect sizes θ and their estimates θ̂.

2 Multiple effect size estimates

[Figure: interval estimates of the parameters θ1, . . . , θ10 on a common −6 to 6 scale]

3 Rank Ordering Effects

[Figure: the intervals ordered by increasing θ (what we want)]

4 Rank Ordering Effects

[Figure: the intervals ordered by increasing estimate (what we get)]

5 Large Scale

[Figure: many interval estimates on the −6 to 6 scale]

- regression effect
- variance effect

6 Large Scale

[Figure: the estimates ordered by increasing estimate/SE]

7 Type 2 Diabetes (T2D) GWAS (Morris et al., 2012)

- cases/controls: 22,669/58,119
- many T2D-associated loci, but of small effect (3,371 SNPs shown)
- How to rank order?

8 Gene-Set Enrichment (Hao et al., 2013)

- list of 984 human genes linked to influenza-virus replication
- overlap of this list with annotated gene sets from the Gene Ontology (5,719 gene sets)
- How to rank order?

[Figure: proportion of each set detected vs. set size N, log scales, N from 10 to 1,000]

9 Connection to Large-Scale Inference

- Test statistics T1, . . . , Tn with Ti | θi ∼ f(t|θ)
- Unobserved effect sizes: θ1, . . . , θn
- Large-scale multiple testing: consider H0 : θi = 0, for i = 1, . . . , n.

Our Focus

- Non-sparse cases: a substantial fraction of the θi are not zero.
- Ranking/prioritizing the non-null cases.
- Differing levels of information across units

10 Overview

Objective We have data from a large number of measurement or inference units, and we would like to identify the most important units by some measure.

Typical setup:

- n units
- data: D1, . . . , Dn
- signals or effect sizes: (θ1, . . . , θn)
- parametric model: p(Di | θi)

11 Type 2 Diabetes (T2D) GWAS

- unit = SNP
- n = 3,371
- θi = log{ odds(T2D | Ai) / odds(T2D | Ai^c) }
- Di = (θ̂i, σ̂i)

12 NBA: Free-Throw Percentages

- 461 NBA players (2013–2014)
- free-throw percentages (Makes/Attempts)
- lots of variation in number of attempts

[Figure: Makes/Attempts vs. # free throw attempts, 1 to 1,000 on a log scale; unit = basketball player]

Di = (mi, Xi) = (# attempts, # makes)

Xi | θi, mi ∼ Binomial(θi, mi)

13 Motivating model

 2    X1 σ1 θ1 . . . Data: . .  Signals:  .  2 Xn σn θn

Xi = θˆi estimate of θi ,

σi - precision of Xi . 2 The {σi } often differ substantially across units. normal/normal model 2 2 Xi |θi , σi ∼ N(θi , σi ) θi ∼ N(0, 1) 2 σi ∼ g(·)
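To make the setup concrete, here is a minimal sketch (mine, not from the talk) that simulates data from this normal/normal model; the lognormal choice for g(·) is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

theta = rng.normal(0.0, 1.0, size=n)       # signals: theta_i ~ N(0, 1)
sigma2 = rng.lognormal(0.0, 1.0, size=n)   # unit-level variances sigma_i^2 ~ g (assumed lognormal here)
X = rng.normal(theta, np.sqrt(sigma2))     # estimates: X_i | theta_i, sigma_i^2 ~ N(theta_i, sigma_i^2)
```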

14 How to rank?

Xi | θi, σi² ∼ N(θi, σi²),  θi ∼ N(0, 1),  σi² ∼ g(·)

- MLE: sort by θ̂i = Xi
- p-value: sort by pi = 1 − Φ(Xi/σi), or equivalently, by Xi/σi
- posterior mean: sort by θ̂i^PM = Xi/(σi² + 1)
- posterior quantile: sort by θ̂i^Q = θ̂i^PM − 1.96 × √(σi²/(σi² + 1))

Each of the above methods produces the same ranking whenever σ1 = σ2 = . . . = σn.
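As a sketch (mine, not the speaker's), the four sorting variables can be computed side by side; with equal variances their orderings coincide, as noted above.

```python
import numpy as np

def ranking_variables(X, sigma2):
    """Sorting variables for the normal/normal model with theta_i ~ N(0, 1)."""
    sigma = np.sqrt(sigma2)
    mle = X                                  # MLE: X_i
    z = X / sigma                            # sorting by p-value is equivalent to sorting by X_i / sigma_i
    pm = X / (sigma2 + 1.0)                  # posterior mean
    pq = pm - 1.96 * np.sqrt(sigma2 / (sigma2 + 1.0))  # lower posterior quantile
    return mle, z, pm, pq

# with equal variances, all four criteria give the same ordering
X = np.array([0.5, -1.0, 2.0, 1.2])
orders = [np.argsort(-v) for v in ranking_variables(X, np.ones(4))]
assert all((o == orders[0]).all() for o in orders)
```

With unequal variances the criteria disagree: the posterior mean shrinks high-variance units toward zero while the MLE does not.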

15 Variances given selection: sorting by p-value

[Figure: density of σ among the selected units, p(σi | pvali ≤ p.05), where {i : pvali ≤ p.05} is the top 5% by p-value, compared with the marginal density p(σi); σ from 0 to 6]

16 Sort by MLE

[Figure: density of σ among the selected units, p(σi | Xi ≥ x.05), where {i : Xi ≥ x.05} is the top 5% by MLE, compared with the marginal density p(σi)]

17 Sort by Posterior Mean

[Figure: density of σ among the selected units, p(σi | PMi ≥ pm.05), where {i : PMi ≥ pm.05} is the top 5% by posterior mean, compared with the marginal density p(σi)]

18 Bayesian approaches

- It has often been suggested to use Bayes estimates for ranking.
- These shrink units with high variance.
  - posterior mean: good for point estimation
  - posterior expected rank: treats all ranks equally
  - posterior tests: good for hypothesis testing
- Can we do any better?

19 Threshold functions

tα(σ²) - threshold function

- Place unit i in the top α-fraction if Xi ≥ tα(σi²).
- A unit is selected if its effect-size estimate Xi is sufficiently large relative to the associated precision.

Size constraint: on average, the proportion of units selected through Xi ≥ tα(σi²) must equal α:

P(Xi ≥ tα(σi²)) = α, for each α ∈ (0, 1).

20 Threshold functions: visualizing tradeoffs

- We can equate most methods for sorting units with a family T of threshold functions, T = {tα(·) : α ∈ (0, 1)}.
- e.g. the posterior mean, θ̂i^PM = Xi/(σi² + 1):

  Xi/(σi² + 1) ≥ uα  ⟹  tα(σ²) = uα(σ² + 1)

- Plots of tα(·) show how each method trades off observed signal versus estimation precision.
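A quick numerical check of this correspondence (my sketch, with an assumed lognormal distribution for σ²): pick uα so that the posterior-mean rule selects an α-fraction on average, and verify that the induced threshold tα(σ²) = uα(σ² + 1) selects the same fraction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma2 = rng.lognormal(0.0, 1.0, size=n)       # assumed distribution for sigma_i^2
X = rng.normal(0.0, np.sqrt(sigma2 + 1.0))     # marginally X_i ~ N(0, sigma_i^2 + 1)
pm = X / (sigma2 + 1.0)                        # posterior means

alpha = 0.05
u_alpha = np.quantile(pm, 1.0 - alpha)         # cutoff so that P(PM_i >= u_alpha) = alpha
t_alpha = u_alpha * (sigma2 + 1.0)             # threshold function t_alpha(sigma^2) = u_alpha (sigma^2 + 1)
frac = np.mean(X >= t_alpha)                   # fraction selected via X_i >= t_alpha(sigma_i^2)
```

Because Xi ≥ uα(σi² + 1) holds exactly when PMi ≥ uα, `frac` comes out at (essentially) α.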

21 Threshold functions: T2D

22 Threshold functions: visualizing tradeoffs

Threshold functions associated with various ranking criteria, normal/normal model

criteria          ranking variable                                    threshold function tα(σ²)
MLE               Xi                                                  uα
PV, H0: θi = 0    Xi/σi                                               uα σ
PV, H0: θi = c    (Xi − c)/σi                                         c + uα σ
PM                Xi/(σi² + 1)                                        uα(σ² + 1)
PER               P(θ ≤ θi | Xi, σi²)                                 uα √((σ² + 1)(2σ² + 1))
BF                1(Xi > 0) · p(Xi | σi, θi ≠ 0)/p(Xi | σi, θi = 0)   √( σ²(σ² + 1) [ uα + log((σ² + 1)/σ²) ] )

23 From thresholds to ranks

Assign ranks ri by sweeping through the family T:

ri = inf{α ∈ (0, 1) : Xi ≥ tα(σi²)}.

24 Agreement

The overlap between the "true" top α-fraction {i : θi ≥ θα} and the reported top α-fraction {i : Xi ≥ tα(σi²)} is

(1/n) Σi 1{θi ≥ θα} 1{Xi ≥ tα(σi²)},

where P{θi ≥ θα} = α.

The expected overlap is the agreement:

Agreement(α) = P(θi ≥ θα, Xi ≥ tα(σi²))    (the "limiting overlap")

Because P(Xi ≥ tα(σi²)) = α, this is a "fair" way to compare procedures.
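For concreteness, the finite-sample overlap can be computed directly; this sketch (mine, with a lognormal σ² assumed) evaluates it for two simple selection rules.

```python
import numpy as np

def overlap(theta, selected, alpha):
    """Empirical overlap between the true top alpha-fraction of theta and a selected set."""
    theta_alpha = np.quantile(theta, 1.0 - alpha)   # so that roughly P(theta_i >= theta_alpha) = alpha
    return np.mean((theta >= theta_alpha) & selected)

rng = np.random.default_rng(2)
n = 100_000
theta = rng.normal(size=n)
sigma2 = rng.lognormal(size=n)
X = rng.normal(theta, np.sqrt(sigma2))

alpha = 0.05
pm = X / (sigma2 + 1.0)
sel_mle = X >= np.quantile(X, 1.0 - alpha)      # top 5% by MLE
sel_pm = pm >= np.quantile(pm, 1.0 - alpha)     # top 5% by posterior mean
a_mle = overlap(theta, sel_mle, alpha)
a_pm = overlap(theta, sel_pm, alpha)
```

Both rules report an α-fraction, so their overlaps are directly comparable; the agreement of any such rule can never exceed α.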

25 Maximizing agreement

- Assume: positive joint density for (Xi, σi², θi)
- θi = E(Xi | θi, σi²)
- σi² = var(Xi | θi, σi²)

The family T* = {tα*} is optimal if for any α ∈ (0, 1):

P(θi ≥ θα, Xi ≥ tα*(σi²)) ≥ P(θi ≥ θα, Xi ≥ tα(σi²))    (1)

Theorem 1 (Necessary condition): a necessary condition for tα* to be optimal as in (1) is that it satisfies

P{θi ≥ θα | Xi = tα*(σ²), σi² = σ²} = cα.

26 T2D data: maximal agreement

27 Normal/Normal

Threshold functions associated with various ranking criteria, normal/normal model

2 criteria ranking variable threshold function tα(σ ) MLE Xi uα PV H0 : θi = 0 Xi /σi uασ PV H0 : θi = c (Xi − c)/σi c + uασ 2 2 PM Xi /(σi + 1) uα(σ + 1) 2 p 2 2 PER P(θi ≤ θ|Xi , σ ) uα (σ + 1)(2σ + 1) i r 2 n 2 o P(Xi |σi ,θi 6=0) 2 2 (σ +1) BF 1(Xi > 0) 2 σ (σ + 1) uα + log 2 P(Xi |σi ,θi =0) σ 2 p 2 2 max agreement r-value θα(σ + 1) − uα σ (σ + 1)

28 Local Tail Probabilities

Vα(Xi, σi²) = P(θi ≥ θα | Xi, σi²)  and  P(θi ≥ θα) = α

Theorem 2: if Vα(x, σ²) is right-continuous and non-decreasing in x for every (α, σ²), and λα is chosen so that

P(Vα(Xi, σi²) ≥ λα) = α,

then the optimal family is given by

tα*(σ²) = inf{x : Vα(x, σ²) ≥ λα}.
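In the normal/normal model Vα has a closed form: the posterior of θi is N(Xi/(σi²+1), σi²/(σi²+1)) and θα = Φ⁻¹(1 − α), so Vα(x, σ²) = 1 − Φ((θα − pm)/sd). A sketch (mine; lognormal σ² assumed) that also checks the defining property P(Vα ≥ λα) = α:

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()

def V(alpha, x, sigma2):
    """V_alpha(x, sigma^2) = P(theta_i >= theta_alpha | X_i = x, sigma_i^2), normal/normal model."""
    theta_alpha = N01.inv_cdf(1.0 - alpha)       # upper-alpha quantile of theta ~ N(0, 1)
    pm = x / (sigma2 + 1.0)                      # posterior mean
    sd = (sigma2 / (sigma2 + 1.0)) ** 0.5        # posterior standard deviation
    return 1.0 - N01.cdf((theta_alpha - pm) / sd)

rng = np.random.default_rng(3)
n = 50_000
sigma2 = rng.lognormal(size=n)
X = rng.normal(0.0, np.sqrt(sigma2 + 1.0))      # marginal distribution of the X_i

alpha = 0.10
v = np.array([V(alpha, x, s2) for x, s2 in zip(X, sigma2)])
lam = np.quantile(v, 1.0 - alpha)               # lambda_alpha estimated as a sample quantile
```

The upper-α quantile of the Vα values estimates λα, and by construction about an α-fraction of units satisfies Vα ≥ λα.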

29 Crossing?

Do functions from the optimal family cross?

Theorem 3 (Crossing): no functions in the family {tα*(σ²) : α ∈ (0, 1)} "cross" as long as ∂tα*(σ²)/∂α < 0. This property holds in the normal/normal model for any distribution of σ².

30 Optimal ranking variable

For T = {tα}, assign percentile ranks by sweeping through the family:

r(Xi, σi²) = inf{α : Xi ≥ tα(σi²)}.

For the family {tα*} maximizing agreement, this is

r(Xi, σi²) = inf{α ∈ (0, 1) : Vα(Xi, σi²) ≥ λα},  with Vα(Xi, σi²) = P(θi ≥ θα | Xi, σi²):

the "r-value".

If the family T* = {tα*} has no crossings, then

r(Xi, σi²) ≤ α  if and only if  Vα(Xi, σi²) ≥ λα.

Recall that P{Vα(Xi, σi²) ≥ λα} = α, so

P(r(Xi, σi²) ≤ α) = P(Vα(Xi, σi²) ≥ λα) = α,

i.e. the r-values are uniformly distributed.

31 r-values ∼ Unif(0,1)

[Figure: histogram of the r-values for the NBA data; approximately flat on (0, 1)]

32 Interpretation

"r-value": r(Xi, σi²) = inf{α : Vα(Xi, σi²) ≥ λα}

{i : Vα(Xi, σi²) ≥ λα} is the top α-fraction of units when ranking by Vα(Xi, σi²).

An r-value of α may be interpreted as the smallest α at which the unit is placed in the top α-fraction of units when ranking by Vα(Xi, σi²).

33 Another View

Constrained loss for classifying units as in the top α-fraction or not:

Lα(a, θ) = Σi 1{ai ≤ α, θi ≥ θα},  s.t. ai ∈ [0, 1],  Σi 1{ai ≤ α} = nα.

Constrained Bayes Rule: choose the top α-fraction among {Vα(X1, σ1²), . . . , Vα(Xn, σn²)}.

- To avoid dependence on a particular α, the r-value compares Vα(Xi, σi²) with the quantile curve λα.
- Considers posterior probabilities Vα(Xi, σi²) = P(θi ≥ θα | Xi, σi²) for all thresholds {θα}.
- Does not require a pre-defined threshold of interest.

34 General form

data for unit i:                  Di
local posterior tail probability: Vα(Di) = P(θi ≥ θα | Di)
distribution function of Vα:      Hα(v) = P(Vα(Di) ≤ v)
marginal upper quantile:          λα = Hα⁻¹(1 − α)
r-value:                          r(Di) = inf{α : Vα(Di) ≥ λα}

35 Percentiling procedure

The r-value is a percentiling procedure in the sense that

P(r(Di) ≤ α) ≤ α  for all α ∈ (0, 1).

- On average, the proportion of units claimed to lie in the top α-fraction is at most α.
- R-values are not explicit ranks {1, . . . , n} nor explicit percentiles {1/(n+1), . . . , n/(n+1)} (though likely to perform the same in large-scale problems).

36 Percentiles and Agreement

r(Di) = inf{α : Vα(Di) ≥ λα}

The r-value is uniquely determined if the function gDi(α) = Vα(Di) − λα has exactly one root in (0, 1) for each Di.

Theorem 4: if r(Di) is uniquely determined, then the r-value, as a percentiling procedure, maximizes agreement for any α ∈ (0, 1). That is,

P(r(Di) ≤ α, θi ≥ θα) ≥ P(π(Di) ≤ α, θi ≥ θα)

for any other procedure π(Di) such that P{π(Di) ≤ α} ≤ α.

37 r-value vs. other procedures

- The r-value procedure tends not to favor small-variance units as strongly as the posterior mean or the posterior expected rank.
- In the motivating normal/normal model, the conditional distribution of the variances among selected units typically matches the unconditional distribution more closely.

38 Conditional density of σ

[Figure: four panels comparing the conditional density of σ among selected units with the marginal density p(σ), for selection by r-value, posterior mean, posterior expected rank, and p-value]

39 r-values vs. other methods (GSEA example)

[Figure: comparison of the top gene sets selected by MLE, p-value, and posterior mean; effect scale (X−R)/(X+R), set size N]

40 r-values vs. others (gene expression; Pyeon et al., 2007)

[Figure: three panels of effect-size estimate vs. standard error; in each, the top units selected by MLE, p-value, and posterior mean, respectively, are highlighted]

41 Interval Estimation and Uncertainty Quantification

- Though the r-value optimizes agreement, there is often substantial uncertainty in the reported percentiles.
- The population percentile ρi of θi is its percentile in the population of {θi}:

  ρi = 1 − F(θi),  θi ∼ F.

- If P(c1(Di) ≤ θi ≤ c2(Di) | Di) = 1 − γ, then a 100 × (1 − γ)% credible interval for ρi is

  Cρ(Di) = [1 − F(c2(Di)), 1 − F(c1(Di))].

- For a correct model, the empirical coverage should be close to 1 − γ:

  (1/n) Σi 1{ρi ∈ Cρ(Di)} → P(ρi ∈ Cρ(Di)) = 1 − γ  a.s.

42 Credible Intervals: Normal/Normal Simulation

[Figure: credible intervals for the population percentile ρi plotted against each unit's r-value]
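In the normal/normal case the mapping from a credible interval for θi to one for ρi is explicit, since F = Φ. A sketch (mine):

```python
from statistics import NormalDist

N01 = NormalDist()

def percentile_interval(x, sigma2, gamma=0.05):
    """Map an equal-tailed (1 - gamma) credible interval for theta_i into a
    credible interval for its population percentile rho_i = 1 - F(theta_i),
    in the normal/normal model (so F = Phi, the standard normal cdf)."""
    pm = x / (sigma2 + 1.0)                       # posterior mean of theta_i
    sd = (sigma2 / (sigma2 + 1.0)) ** 0.5         # posterior standard deviation
    z = N01.inv_cdf(1.0 - gamma / 2.0)
    c1, c2 = pm - z * sd, pm + z * sd             # credible interval for theta_i
    return 1.0 - N01.cdf(c2), 1.0 - N01.cdf(c1)   # interval for rho_i (endpoints flip)

lo, hi = percentile_interval(x=2.5, sigma2=1.0)   # a unit with a large estimate
```

A unit with the same estimate but a larger sampling variance gets a wider percentile interval.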

43 Uncertainty in Percentiles: NBA Data

[Figure: credible intervals for each player's population percentile, plotted against the r-value, for the top 20 players: Brian Roberts, Ryan Anderson, Danny Granger, Kyle Korver, Mike Harris, J.J. Redick, Ray Allen, Mike Muscala, Dirk Nowitzki, Reggie Jackson, Kevin Martin, Gary Neal, D.J. Augustin, Stephen Curry, Patty Mills, Courtney Lee, Steve Nash, Greivis Vasquez, Robbie Hummel, Mo Williams]

44 Computing r-values

Definition: ri = inf{α : Vα(Di) ≥ λα}.

(1) Model-based approach: solve for λα in

∫ 1{Vα(D) ≥ λα} p(D) dD = α.

(2) Empirical-distribution approach:

λ̂α,n = Ĥα,n⁻¹(1 − α),  where Ĥα,n(v) = n⁻¹ Σi 1{Vα(Di) ≤ v}.

Then compute

r̂i = inf{α : Vα(Di) ≥ λ̂α,n},  or  r̂i = inf{α : V̂α,n(Di) ≥ λ̂α,n}.

45 Computing r-values

Two main objects

- Vα(Di) = P{θi ≥ θα | Di}
- the marginal quantile curve λα, where P{Vα(Di) ≥ λα} = α

Steps:

(1) Select a grid 0 < α1 < . . . < αT < 1.
(2) Compute Vαj(Di) for all i and j.
    - conjugate prior: Vαj(Di) known
    - MCMC output: Vαj(Di) estimated
(3) λ̂′αj,n = the (1 − αj)-th sample quantile of Vαj(D1), . . . , Vαj(Dn).
(4) Compute λ̂α,n by interpolating (α1, λ̂′α1,n), . . . , (αT, λ̂′αT,n).
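Putting the steps together for the normal/normal model (a sketch under my assumptions: closed-form Vα, a uniform grid in place of a finer interpolation):

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()

def V(alpha, x, sigma2):
    """V_alpha(D_i) = P(theta_i >= theta_alpha | X_i, sigma_i^2), normal/normal model."""
    theta_alpha = N01.inv_cdf(1.0 - alpha)
    pm = x / (sigma2 + 1.0)
    sd = (sigma2 / (sigma2 + 1.0)) ** 0.5
    return 1.0 - N01.cdf((theta_alpha - pm) / sd)

def r_values(X, sigma2, grid=None):
    """Grid approximation of r_i = inf{alpha : V_alpha(D_i) >= lambda_alpha}."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    r = np.ones(len(X))                         # units that never cross keep r-value ~ 1
    for a in grid:                              # sweep the grid from small alpha upward
        v = np.array([V(a, x, s2) for x, s2 in zip(X, sigma2)])
        lam = np.quantile(v, 1.0 - a)           # step (3): (1 - alpha)-th sample quantile
        newly = (v >= lam) & (r == 1.0)
        r[newly] = a                            # record the first crossing
    return r

rng = np.random.default_rng(4)
n = 500
theta = rng.normal(size=n)
sigma2 = rng.lognormal(size=n)
X = rng.normal(theta, np.sqrt(sigma2))
r = r_values(X, sigma2)
```

By construction, roughly an α-fraction of units receives r̂i ≤ α.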

46 Computing r-values

- n = 10
- Vα(D1) = P(θ1 ≥ θα | D1)

[Figure: the curve Vα(D1) as a function of α ∈ (0, 1)]

47 Computing r-values

- n = 10
- six grid points α1, . . . , α6

[Figure: the ten curves Vα(Di) evaluated over the grid]

48 Computing r-values

- λ̂′αj = the (1 − αj)-th sample quantile of Vαj(D1), . . . , Vαj(D10)

[Figure: the sample quantiles λ̂′αj marked at each grid point]

49 Computing r-values

[Figure: the quantile curve λ̂α obtained by interpolating the λ̂′αj]

50 Computing r-values

r̂i = inf{α : Vα(Di) ≥ λ̂α,n}

[Figure: r̂4 read off where the curve Vα(D4) first crosses λ̂α; similarly for Vα(D9)]

51 NBA example

- Beta prior
- θ: probability of making a free throw

[Figure: Beta prior density on free-throw ability θ, with the upper quantile θα marked for α = 0.05]

52 NBA example

- Binomial likelihood
- Beta posteriors

[Figure: Beta posterior densities of free-throw ability across players]
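The conjugate Beta/Binomial setup makes Vα(Di) easy to approximate by posterior simulation. A sketch (mine): the Beta(20, 8) prior and the two players' counts below are illustrative stand-ins, not fitted values from the talk.

```python
import numpy as np

rng = np.random.default_rng(5)
A, B = 20.0, 8.0                 # hypothetical Beta prior on free-throw ability

def V_beta(alpha, makes, attempts, draws=200_000):
    """Monte Carlo estimate of V_alpha(D_i) = P(theta_i >= theta_alpha | D_i)
    for a Beta(A, B) prior with a Binomial likelihood (conjugate Beta posterior)."""
    theta_alpha = np.quantile(rng.beta(A, B, draws), 1.0 - alpha)   # upper-alpha prior quantile
    posterior = rng.beta(A + makes, B + attempts - makes, draws)    # Beta posterior draws
    return float(np.mean(posterior >= theta_alpha))

# many attempts at a high percentage is stronger evidence of a top-alpha ability
v_many = V_beta(0.05, makes=125, attempts=133)
v_few = V_beta(0.05, makes=14, attempts=14)
```

Even a perfect record on 14 attempts carries less evidence than 125 of 133, which is why the r-value does not simply reward the highest raw percentage.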

53 NBA example

54 Top 10

Leading free-throw shooters, 2013-2014 regular season of the National Basketball Association. From 461 players who attempted at least one free throw, shown are the top 10 players as inferred by r-value.

player          yi   ni   FTP    PM     RV     Q.R  MLE.R  PM.R  RV.R
Brian Roberts   125  133  0.940  0.913  0.002    1     17     1     1
Ryan Anderson    59   62  0.952  0.898  0.003    –     15     2     2
Danny Granger    63   67  0.940  0.893  0.005    –     16     3     3
Kyle Korver      87   94  0.926  0.892  0.008    –     19     4     4
Mike Harris      26   27  0.963  0.866  0.010    –     14    15     5
J.J. Redick      97  106  0.915  0.886  0.011    –     22     6     6
Ray Allen       105  116  0.905  0.880  0.016    –     25     8     7
Mike Muscala     14   14  1.000  0.844  0.017    –      7    34     8
Dirk Nowitzki   338  376  0.899  0.891  0.018    2     30     5     9
Trey Burke      102  113  0.903  0.877  0.018    –     28     9    10

55 Sampling Performance

Estimated r-values r̂n(Di) are typically computed with estimates F̂n of F, θ̂α,n = F̂n⁻¹(1 − α) of θα, and λ̂α,n of λα:

V̂α,n(Xi, σi²) = ∫[θ̂α,n, ∞) p(Xi | θ, σi²) dF̂n(θ) / ∫(−∞, ∞) p(Xi | θ, σi²) dF̂n(θ)

r̂n(Xi, σi²) = inf{α : V̂α,n(Xi, σi²) ≥ λ̂α,n}

(A1) Triples (θi, Xi, σi²) are i.i.d. from a joint distribution; θi and σi² are independent, with positive densities f and g.
(A2) From data {Di = (Xi, σi²) : i = 1, 2, . . . , n}, the estimator F̂n is invariant under permutations of the units, and F̂n converges weakly to F almost surely as n → ∞.
(A3) The density p(x | θ, σ²) is
  (i) continuous: p(x | θ, σ²) is continuous in x for each (θ, σ²) and continuous in θ for each (x, σ²);
  (ii) bounded: there is a continuous function K(σ²) such that 0 < p(x | θ, σ²) ≤ K(σ²) for all arguments;
  (iii) monotone: for any x1 > x0 and σ² > 0, p(x1 | θ, σ²)/p(x0 | θ, σ²) is increasing in θ.
(A4) The family tα*(σ²) = inf{x : Vα(x, σ²) ≥ λα} has no crossings.

56 Sampling Performance

Theorem 5: assume conditions A1–A4 are satisfied and λ̂α,n = Ĥα,n⁻¹(1 − α). Then, as n → ∞ and for any α ∈ (0, 1),

λ̂α,n →P λα,

(1/n) Σi 1{r̂n^δ(Di) ≤ α} →P P(r^δ(Di) ≤ α),

and

(1/n) Σi 1{r̂n(Di) ≤ α} 1{θi ≥ θα} ≥ Aα* + oP(1),

where Aα* = P(r(Di) ≤ α, θi ≥ θα) is the optimal agreement and

r^δ(Di) = min[ inf{α ∈ [δ, 1] : Vα(Di) ≥ λα}, 1 − δ ].

57 Simulation

- normal/normal model: Xi | θi, σi² ∼ N(θi, σi²), θi ∼ N(0, 1)
- n = 1,000
- 2,000 replications

[Figure: agreement/α vs. α ∈ (0, 0.10), comparing the r-value, posterior mean (PM), posterior expected rank (PER), and MLE]
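A stripped-down version of this experiment (mine: it compares only MLE and PM, omits the r-value and PER, and assumes a lognormal σ²) illustrates the gap that the full simulation quantifies:

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n, alpha = 200, 1000, 0.05
agree = {"MLE": [], "PM": []}

for _ in range(reps):
    theta = rng.normal(size=n)
    sigma2 = rng.lognormal(size=n)                     # assumed variance distribution
    X = rng.normal(theta, np.sqrt(sigma2))
    top_true = theta >= np.quantile(theta, 1.0 - alpha)
    for name, v in (("MLE", X), ("PM", X / (sigma2 + 1.0))):
        sel = v >= np.quantile(v, 1.0 - alpha)         # report the top alpha-fraction
        agree[name].append(np.mean(top_true & sel) / alpha)

mean_mle = float(np.mean(agree["MLE"]))
mean_pm = float(np.mean(agree["PM"]))
```

With heterogeneous variances, selecting on the raw estimates over-represents high-variance units, so the shrinkage rule achieves higher average agreement.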

58 Varying distribution of σ²

[Figure: agreement/α vs. α for cv(σ²) ∈ {0.32, 1, 1.41, 4}, comparing r-value, PM, PER, and MLE]

59 Varying number of units

[Figure: agreement/α vs. α for n ∈ {50, 200, 1,000, 10,000}, comparing r-value, PM, PER, and MLE]

60 Dependence: ρ = Corr(θi, σi²)

[Figure: agreement/α vs. α for ρ ∈ {0.19, 0.24, 0.55, 0.71}, comparing r-value, PM, PER, and MLE]

61 Heavy-tailed distribution of θi

[Figure: agreement/α vs. α for heavy-tailed θi with df ∈ {2, 3, 4, 6}, comparing r-value, PM, PER, and MLE]

62 Summary

- ranking to maximize agreement
- large-scale, non-sparse settings
- empirical Bayes inference
- promising performance

63 Thanks
