
Confidence Distribution (CD) and a unifying framework for BFF inferences (Part I: An Introduction to CD & Part II: CD is an effective inference tool for fusion learning)

Min-ge Xie (Part I) Regina Liu (Part II)

Department of Statistics & Biostatistics, Rutgers University

International FocuStat Workshop on CDs and Related Themes, Oslo, Norway, 2015. Research supported in part by grants from NSF.

First peek of Bayesian/Fiducial/Frequentist (BFF) inferences

Question: What do the following BFF statistical tools/concepts have in common?

Bayesian posterior distribution (B)

Fiducial distribution (Fi)

Likelihood function, or normalized likelihood L(θ|data) / ∫ L(θ|data) dθ – assume the integral exists (Fr/B)

Bootstrap distribution (Fr)

p-value function (Fr)

One-sided test H0: θ = t vs H1: θ > t =⇒ p-value p = p(t); varying t over Θ =⇒ p(t) is called a p-value function.

Belief/plausibility functions (Dempster-Shafer/IMs) (Fi/Fr) ······

A common name for such sample-dependent functions? — Confidence distributions (CDs)

In the talks –

- A more formal treatment of the CD concept; combining information; examples as effective tools; relations to fiducial and Bayes

- Stress added values – provide solutions for problems whose solutions were previously unavailable, unknown, or not easily available.

First peek of Bayesian/Fiducial/Frequentist (BFF) inferences

They all —

- are sample-dependent (distribution) functions on the parameter space;
- can often be used to make statistical inferences; in particular, they can be used to construct confidence (credible?) intervals of all levels, exactly or asymptotically.

Estimation viewpoint – Parameter estimation & confidence distribution (CD)

Statistical inference:
- Point estimate
- Interval estimate
- Distribution estimate (e.g., confidence distribution)

Example: X1, ..., Xn i.i.d. from N(µ, 1)
- Point estimate: x̄n = (1/n) Σᵢ xᵢ
- Interval estimate: (x̄n − 1.96/√n, x̄n + 1.96/√n)
- Distribution estimate: N(x̄n, 1/n)

- The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.

One-sided test: K0: µ = µ0 vs Ka: µ > µ0

p(µ0) = P(X̄ > x̄n) = 1 − Φ(√n (x̄n − µ0)) = Φ(√n (µ0 − x̄n)).

Varying µ0 ∈ Θ =⇒ the cumulative distribution function of N(x̄n, 1/n)!

Suppose n = 100 and we observe x̄n = 0.3:

µ0:      −0.1     0        0.1      0.1355   0.2      0.3    0.4645   0.496
p(µ0):   .00002   .00135   .02275   .05      .15866   .50    .95      .975

[Figure: the p-value function p(µ0) plotted over µ0]
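The p-value function above is simple to evaluate directly. A minimal sketch (illustrative, not from the talk) that reproduces the table's entries using only the standard normal CDF:

```python
# p-value function p(mu0) for K0: mu = mu0 vs Ka: mu > mu0,
# with X1, ..., Xn i.i.d. N(mu, 1): p(mu0) = Phi(sqrt(n) * (mu0 - x_bar)).
# Varying mu0 traces out the CDF of N(x_bar, 1/n).
from math import sqrt
from statistics import NormalDist

def p_value_function(mu0: float, x_bar: float, n: int) -> float:
    return NormalDist().cdf(sqrt(n) * (mu0 - x_bar))

# Reproduce the table: n = 100, observed x_bar = 0.3
n, x_bar = 100, 0.3
for mu0 in (-0.1, 0.0, 0.1, 0.1355, 0.2, 0.3, 0.4645, 0.496):
    print(f"{mu0:7.4f}  {p_value_function(mu0, x_bar, n):.5f}")
```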

Example: Many ways to obtain the "distribution estimate" N(x̄n, 1/n)

Methods 1 & 2: Bayes & fiducial (omitted here ...)
Method 3: p-value method (the construction above)


Example: Many ways to obtain the "distribution estimate" N(x̄n, 1/n)

Method 4: Normalizing the likelihood

Likelihood function:

L(µ|data) = ∏ f(xᵢ|µ) = C e^{−(1/2) Σ (xᵢ − µ)²} = C e^{−(n/2)(x̄ − µ)² − (1/2) Σ (xᵢ − x̄)²}

Normalized with respect to µ:

L(µ|data) / ∫ L(µ|data) dµ = e^{−(n/2)(x̄ − µ)²} / √(2π/n)

It is the density of N(x̄n, 1/n)!

Suppose n = 4 and we observe a sample with mean x̄obs = 2.945795: [Figure: the normalized likelihood density; Area = 1]
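Method 4 can be checked numerically: normalizing the likelihood over µ recovers the N(x̄n, 1/n) density. A sketch with a hypothetical sample of size n = 4, chosen only so that its mean matches the slide's x̄obs = 2.945795 (the actual data are not given):

```python
# Numerically normalize the N(mu, 1) likelihood and compare with the
# N(x_bar, 1/n) density; the two curves agree up to integration error.
from math import exp, pi, sqrt

x_obs = [2.1, 3.4, 2.9, 3.383180]   # hypothetical data, mean = 2.945795
n = len(x_obs)
x_bar = sum(x_obs) / n

def likelihood(mu: float) -> float:
    return exp(-0.5 * sum((x - mu) ** 2 for x in x_obs))

# Normalize by a Riemann sum over a wide grid of mu values
dmu = 12 / 20000
grid = [x_bar - 6 + dmu * i for i in range(20001)]
norm_const = sum(likelihood(mu) for mu in grid) * dmu

def normalized_likelihood(mu: float) -> float:
    return likelihood(mu) / norm_const

def normal_density(mu: float) -> float:   # density of N(x_bar, 1/n)
    return sqrt(n / (2 * pi)) * exp(-0.5 * n * (mu - x_bar) ** 2)

print(normalized_likelihood(x_bar), normal_density(x_bar))
```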

Inference using CD: Point estimators, intervals, p-values & more

Review on CD (Xie & Singh 2013, Int. Stat. Rev.)

Cox (2013, Int. Stat. Rev.): The CD approach is “to provide simple and interpretable summaries of what can reasonably be learned from data (and an assumed model).”

Efron (2013, Int. Stat. Rev.): The CD development is “a grounding process” to help solve “perhaps the most important unresolved problem in statistical inference” on “the use of Bayes theorem in the absence of prior information.”

Confidence distribution

Efron (1998, Statist. Sci.): “... but here is a safe prediction for the 21st century: ... I believe there is a good chance that ... something like fiducial inference will play an important role ... Maybe Fisher’s biggest blunder will become a big hit in the 21st century!”

o Bootstrap distributions are “distribution estimators” and “confidence distributions.”

Confidence distribution has a long history

Long history: an uncertainty measure on the parameter space without any prior
- Fraser (2011) suggested the seed idea (alas, the fiducial idea) can be traced back to Bayes (1763) and Fisher (1922).
- The term “confidence distribution” appeared as early as 1937, in a letter from E.S. Pearson to W.S. Gosset (Student); the first use of the term in a formal publication is Cox (1958).

No precise yet general definition of a CD in the classical literature
- The most commonly used approach is to invert the upper limits of a whole set of lower-side confidence intervals, often in some special examples (e.g., Efron 1993; Cox 2006).

Little attention to CDs in the past
- Historic connection to the fiducial distribution, which is largely considered “Fisher’s biggest blunder” (see Efron 1998).
- The possible utilities of CDs, especially in applications, had not been seriously examined.

Renewed interest

Recent developments:
- Entirely within the frequentist school — no attempt to derive a new “paradox free” fiducial theory (not possible!).
- Focus on providing tools for problems whose solutions were previously unavailable or unknown.

Renewed interest

21st century developments of confidence distributions & related topics:
- Confidence distributions (Schweder & Hjort 2002, 2015; Singh, Xie & Strawderman 2005, 2007; Xie & Singh 2013)
- Bayes/objective Bayes (Berger 2006; Dongchu’s book with Jim 2015(?), etc.)
- Dempster-Shafer calculus (Dempster 2008); inferential models (Liu & Martin 2009, 2012, 2014)
- Generalized inference/generalized fiducial inference (Weerahandi 1989, 1993; Hannig 2009, 2013; Hannig & Lee 2009)

(* The above cites only some ‘review’ papers and books.)

Current research focus — while paying attention to new developments of theoretical frameworks:
- Emphasize applications — focus on providing frequentist inference tools for problems whose solutions were previously unavailable or unknown.

Can these all lead to a brighter new future?

Now, with the emerging BFF,

Can we really unify the foundation of statistical inferences?

“Can our desire to find a unification of Bayesian, Frequentist and Fiducial (BFF) perspectives do the same trick, allowing all of us to thrive under one roof as BFFs (Best Friends Forever)?” – Meng (2014, IMS Bulletin)

Classical notion: confidence distribution

Classical CD function (e.g., Efron 1993, Cox 2006):

Let (−∞, ξn(α)] be a 100α% lower side confidence interval for θ, at every level α ∈ (0, 1)

Assume ξn(α) = ξn(Xn, α) is continuous and increasing in α. ⇒ The inverse function

Hn(·) = ξn⁻¹(·)

is a CD in the usual Fisherian sense.

Remarks: (a) The CD definition does not involve fiducial reasoning. (b) But a CD is often interpreted as a sort of distribution of θ — where the fiducial argument is involved.

Formal Definition: Confidence Distribution

Θ = parameter space of θ; X = sample space of the data Xn = (X1, ..., Xn).

Definition

A function H(·) = H(·, Xn) on Θ × X → [0, 1] is called a confidence distribution (CD) for a parameter θ if it satisfies two requirements:

R1) For each given Xn ∈ X, H(·) is a cumulative distribution function on Θ;

R2) At the true parameter value θ = θ0, H(θ0) ≡ H(θ0, Xn), as a function of the sample Xn, follows the uniform distribution U(0, 1). Equivalently, H(θ0, Xn) satisfies

P_{Xn}(H(θ0, Xn) ≤ α) = P_{Xn}(θ0 ≤ H⁻¹(α)) = α,  for any 0 < α < 1.

Also, the function H(·) is an asymptotic confidence distribution (aCD) if the U(0, 1) requirement holds only asymptotically.
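Requirement R2 can be checked by simulation for the normal-mean CD H(θ, Xn) = Φ(√n (θ − x̄n)): at the true θ0, H(θ0, Xn) should be Uniform(0, 1) over repeated samples. A sketch with illustrative values of θ0 and n (assumptions, not from the talk):

```python
# Simulation check of R2: H(theta0, Xn) ~ U(0, 1) over repeated samples,
# for the normal-mean CD H(theta, Xn) = Phi(sqrt(n) * (theta - x_bar)).
import random
from math import sqrt
from statistics import NormalDist

random.seed(7)
Phi = NormalDist().cdf
theta0, n, reps = 1.5, 25, 20000

draws = []
for _ in range(reps):
    x_bar = theta0 + random.gauss(0, 1 / sqrt(n))   # x_bar ~ N(theta0, 1/n)
    draws.append(Phi(sqrt(n) * (theta0 - x_bar)))

mean = sum(draws) / reps
below_half = sum(d < 0.5 for d in draws) / reps
print(round(mean, 3), round(below_half, 3))   # both near 0.5 for U(0, 1)
```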

Remark: The definition is consistent with the classical CD definition.

An analogy in point estimation: a consistent [or unbiased] estimator/estimate θ̂
R1) It (θ̂) is a point (estimator/estimate) on the parameter space.
R2) It (θ̂) tends to the true θ0 as n → ∞ [or E θ̂ = θ0].

- Performance measure for a CD: frequentist probability coverage.
o Simple and intuitive interpretation — used many times, the majority of conclusions (e.g., at 95%) are correct.

Formal Definition: Confidence Distribution

Translation of the two requirements:
R1) H(θ) is a distribution function on the parameter space.
R2) H(θ) carries “correct” information about θ0 — it ensures the coverage rates of confidence intervals of all levels.

An analogy in point estimation
- The definition of a consistent estimator describes a certain required property.
- An MLE or M-estimator is obtained by a certain procedure (e.g., maximizing a criterion/solving estimating equations).

Descriptive definition versus procedures

My understanding — CD, fiducial distributions, and many other related approaches are closely related, but:
- The confidence distribution definition, from a “behaviorist”/pragmatic viewpoint, only describes a certain required property.
- Fisher’s fiducial distribution is based on inductive reasoning, which often leads to a specific procedure (e.g., solving structural equations).

                           Descriptive                Procedure-wise
Point estimation           Consistent estimator       MLE; M-estimation; ......
Distributional inference   Confidence distribution    Fiducial distribution; p-value function;
                           (distribution estimator)   bootstrap; Bayesian posterior; ......

Bayes posterior/fiducial distribution as confidence distribution

Fiducial distributions are often CDs/asymptotic CDs:
o “Statistical methods designed using the fiducial reasoning have typically very good statistical properties as measured by their repeated sampling (frequentist) performance” (Hannig 2009).

Bayes posteriors are approximate/asymptotic CDs:
o 1st-order results: as the sample size increases, the prior information disappears and the posterior distribution is asymptotically normal (normalized likelihood) (Bernstein-von Mises theorem; Le Cam 1953, 1958).
o 2nd-order results: on many occasions, Jeffreys/matching/reference priors can lead to posteriors with frequentist coverage at a rate of O(1/n) (Berger & Bernardo 1992; Fraser 2011).
o Objective Bayesian: these developments emphasize, and provide tools for, finding matching priors so that the corresponding posteriors have a correct frequentist coverage property.

Two-page intro of Fisher fiducial distribution

Model/structure equation: Normal sample X̄ ∼ N(θ, 1/n):

X̄ = θ + U, where U ∼ N(0, 1/n).   (1)

Fiducial argument — the equivalent equation is θ = X̄ − U. Thus, when we observe X̄ = x̄,

θ = x̄ − U.   (2)

U is unobserved, but we know from the model that U ∼ N(0, 1/n). Thus, by (2), we have θ ∼ N(x̄, 1/n) =⇒ The fiducial distribution is N(x̄, 1/n)!

Remark: In this example, N(x̄, 1/n) is in fact a very good function for making inference about θ!
o N(x̄, 1/n) is a Bayesian posterior (flat prior) and a confidence distribution for θ (obtained by either a test (p-value function) or the normalized likelihood ...).

Two-page intro of Fisher fiducial distribution

Two fundamental problems

“Hidden subjectivity” (Dempster, 1963; Martin & Liu, 2013) — we “continue to regard U as a random sample” from N(0, 1/n) even “after X = x is observed.”
– In particular, U | X̄ = x̄ ≁ U, by equation (2). (X̄ and U are completely dependent — given one, the other is also completely determined.)

Fisher’s interpretation and constraints — the fiducial distribution is an inherent distribution of θ; it is unique, is optimal, always exists, ...
– Parameter θ is a fixed (non-random) quantity; it cannot have a distribution.
– Many paradoxes: e.g., such a fiducial distribution cannot be found outside the location/scale family; manipulation of fiducial distributions can cause problems; etc.

CD — a key difference from the classical/Fisher fiducial distribution

Confidence distribution is a purely frequentist concept and is not the classical fiducial distribution!
- It is viewed as an estimator for the parameter of interest, not an inherent distribution of the parameter.
- It is a clean and coherent frequentist concept, similar to a point estimator.
– No paradoxes: it frees itself from those restrictive, if not controversial, constraints set forth by Fisher on a fiducial distribution, e.g., uniqueness, existence, manipulation of CDs/fiducial distributions ...
⇒ More elaboration — example of two normal means

Manipulation of fiducial distributions or CDs

Example (ratio of two normal means) — the Creasy-Fieller problem
- x1, ···, xn ∼ N(µ1, 1) =⇒ THE fiducial distribution of µ1 (a CD) is N(x̄, 1/n)
- y1, ···, ym ∼ N(µ2, 1) =⇒ THE fiducial distribution of µ2 (a CD) is N(ȳ, 1/m)

Question: How to estimate µ1/µ2? (assume µ2 ≠ 0)

Issue for classical fiducial inference: manipulating N(x̄, 1/n) & N(ȳ, 1/m) in the usual ways does not lead to a fiducial distribution of µ1/µ2!
– It is a very strange distribution that does not have proper coverage.
⇒ A fiducial distribution is not a proper probability distribution ⇒ “epistemic” probability?

Not an issue for CDs (under the new interpretation): manipulating N(x̄, 1/n) & N(ȳ, 1/m) can yield a “distribution estimator” for µ1/µ2, though it may lose the exact coverage property.
– The asymptotic coverage is still okay, so it is still an asymptotic CD.
– Similar to the point estimator x̄/ȳ, it loses unbiasedness but is still consistent (“asymptotically unbiased”).
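The "still an asymptotic CD" claim can be probed by Monte Carlo: draw µ1* ∼ N(x̄, 1/n) and µ2* ∼ N(ȳ, 1/m) independently, use quantiles of µ1*/µ2* as interval endpoints, and check coverage over repetitions. All numbers below (µ1 = 1, µ2 = 2, n = m = 100) are illustrative assumptions:

```python
# Monte Carlo sketch: the ratio of independent draws from the two CDs
# gives a "distribution estimator" for mu1/mu2 whose 95% intervals have
# roughly correct coverage when mu2 is away from 0 (an asymptotic CD).
import random
from math import sqrt

random.seed(3)
mu1, mu2, n, m = 1.0, 2.0, 100, 100
reps, draws = 2000, 400
covered = 0
for _ in range(reps):
    x_bar = mu1 + random.gauss(0, 1 / sqrt(n))
    y_bar = mu2 + random.gauss(0, 1 / sqrt(m))
    ratios = sorted((x_bar + random.gauss(0, 1 / sqrt(n))) /
                    (y_bar + random.gauss(0, 1 / sqrt(m)))
                    for _ in range(draws))
    lo, hi = ratios[int(0.025 * draws)], ratios[int(0.975 * draws) - 1]
    covered += lo <= mu1 / mu2 <= hi
print(covered / reps)   # close to, but not exactly, 0.95
```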

Why confidence distribution (& related inference)?

Broad, informative, flexible, effective ...
- Confidence distribution versus confidence interval — a CD is more “flexible” and provides a “good summary” (more informative) of the sample data and model (Cox 1958, 2013).
- A possible unifying platform for inference (Bayesian, fiducial, frequentist).
- Supports new methodology developments beyond conventional approaches:
o New prediction approaches
o New testing methods
o New simulation schemes
o Combining information from diverse sources (fusion learning, meta-analysis, etc.)

Combination methodology in science

I. Combination of point estimators/single values
o Different versions of weighted sums of point estimators
o A variety of forms of combining p-values (e.g., Fisher 1932; Stouffer et al. 1949)

II. Combination of interval estimators
o The approach of combining intervals (risk difference) by Tian et al. (2009)

III. Combination of “distribution estimators”/functions
o Multiplication of likelihood functions (frequentist)
o Bayes formula (Bayesian)
o Combination of “distribution estimators” (confidence distributions)

Combine independent CDs – a general recipe

Setup: a common θ across k independent studies; Hi(θ) is a CD from the i-th study.

For any given function gc(u1, ..., uk) on [0, 1]^k → ℝ that is monotonic in each coordinate, Singh et al. (2005) prove that the combined function

H^(c)(θ) = Gc{gc(H1(θ), ..., Hk(θ))}

is a CD for θ,

– where the function Gc(t) = Pr(gc(U1, ..., Uk) ≤ t) is completely determined by the given gc function,

o and U1, ..., Uk are independent U[0, 1] random variables.

A simple example (taking gc(u1, ..., uk) = Φ⁻¹(u1) + ··· + Φ⁻¹(uk)):

H^(c)(θ) = Φ( [Φ⁻¹(H1(θ)) + ··· + Φ⁻¹(Hk(θ))] / √k )
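A minimal sketch of the recipe for this normal combining function: since the Φ⁻¹(Ui) are i.i.d. standard normal, Gc is the CDF of N(0, k), which yields the closed form above. The two input CDs below are illustrative normal-mean CDs, not from the talk:

```python
# Singh et al. (2005) recipe with gc(u1,...,uk) = sum of Phi^{-1}(ui),
# for which Gc = CDF of N(0, k):
# H_c(theta) = Phi( sum_i Phi^{-1}(H_i(theta)) / sqrt(k) ).
from math import sqrt
from statistics import NormalDist

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

def combine_cds(theta: float, cds) -> float:
    """Combined CD value H_c(theta) from a list of CD functions."""
    k = len(cds)
    return Phi(sum(Phi_inv(H(theta)) for H in cds) / sqrt(k))

def normal_mean_cd(x_bar: float, n: int):
    return lambda theta: Phi(sqrt(n) * (theta - x_bar))

H1 = normal_mean_cd(0.1, 100)   # study 1: x_bar = 0.1, n = 100
H2 = normal_mean_cd(0.3, 100)   # study 2: x_bar = 0.3, n = 100
print(combine_cds(0.2, [H1, H2]))   # ~0.5: combined CD centered at 0.2
```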

Unifying framework for combining information

A unifying platform for almost all existing methods of information combination:

Classical approaches of combining p-values (from Marden, 1991):
- Fisher method
- Stouffer (normal) method
- Tippett (min) method
- Max method
- Sum method

Model-based meta-analysis approaches (from Normand, 1999, Table IV):
- Fixed-effects model: MLE method
- Fixed-effects model: Bayesian method
- Random-effects model: method of moments
- Random-effects model: REML method
- Random-effects model: Bayesian method (normal prior on θ and fixed τ)
(Xie, Singh & Strawderman, 2011)

Others (2 × 2 tables):
- Tian et al. (2009, Biostat.) approach of combining intervals (risk difference)
- Mantel-Haenszel (MH) method (odds ratio)
- Peto method (odds ratio)
(Yang et al., 2013; Yang, 2013)

Combining functions:
- Multiplication of likelihood functions
- Bayesian formula
(Singh, Xie & Strawderman 2005; Xie et al. 2013)

Additional developments on the CD combination framework

Leads to a new, single, unified computing algorithm for meta-analysis:
o gmeta (R package) — a single R function for a variety of meta-analysis approaches (Yang et al. 2013)

Offers new methods beyond conventional approaches:
o Robust meta-analysis approach (Xie et al. 2011, JASA)
o Approach to incorporate expert opinions (Xie et al. 2013, AOAS)
o Exact meta-analysis of rare-event studies (Liu et al. 2014, JASA)
o Meta-analysis of heterogeneous studies (Liu et al. 2015, JASA)
o Split-and-conquer method for big data (Chen & Xie 2014, Sinica)
o “Non-parametric” meta-analysis (Claggett et al. 2014, JASA)

Combining BFF inferences across paradigms

A nice feature of the CD combination method — it does not require any information about how the individual input CDs are obtained.
=⇒ It provides a theoretical framework for combining estimators from different statistical paradigms – Bayesian, fiducial, frequentist ...

Combining BFF inferences across paradigms – illustrative example

Four independent studies/trials to examine the correlation coefficient ρ of two normal random variables

o e.g., the type of cd4/HIV data studied in DiCiccio & Efron (1996)

Study-I: Fisher’s Z method (Fisher 1915; Singh et al. 2005) – fiducial/frequentist

H1(ρ) = 1 − Φ( √(n − 3) [ (1/2) log((1 + ρ̂)/(1 − ρ̂)) − (1/2) log((1 + ρ)/(1 − ρ)) ] )

where ρ̂ = the sample correlation coefficient and n = the sample size.
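Study-I's CD can be written with atanh, H1(ρ) = 1 − Φ(√(n−3)[atanh(ρ̂) − atanh(ρ)]), and inverted in closed form for intervals. A sketch with illustrative values (ρ̂ = 0.7 and n = 20 are assumptions, not the cd4/HIV data):

```python
# Fisher Z confidence distribution for a correlation coefficient and its
# equal-tailed interval, obtained by inverting the CD in closed form.
from math import atanh, tanh, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def fisher_z_cd(rho: float, rho_hat: float, n: int) -> float:
    return 1 - Phi(sqrt(n - 3) * (atanh(rho_hat) - atanh(rho)))

def fisher_z_ci(rho_hat: float, n: int, level: float = 0.95):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z / sqrt(n - 3)
    return tanh(atanh(rho_hat) - half), tanh(atanh(rho_hat) + half)

rho_hat, n = 0.7, 20
print(fisher_z_cd(rho_hat, rho_hat, n))   # 0.5: the CD median is rho_hat
print(fisher_z_ci(rho_hat, n))
```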

Study-II: Bootstrap BCa method (DiCiccio & Efron, 1996) – frequentist

H2(ρ) = Φ( (Φ⁻¹(Ĝ(ρ)) − ẑ) / (1 + â (Φ⁻¹(Ĝ(ρ)) − ẑ)) − ẑ )

where Ĝ = the estimated distribution of the bootstrap samples ρ* and (â, ẑ) are correction terms.

Combining BFF inferences across paradigms – illustrative example

(continued ...)

Study-III: Profile likelihood – frequentist

H3(ρ) = e^{ℓprof(ρ)} / ∫₋₁¹ e^{ℓprof(ρ)} dρ

where ℓprof(ρ) = the log-profile likelihood function of ρ (assuming σ1 ≡ σ2).

Study-IV: Bayesian with arc-sine prior (Fosdick & Raftery, 2012) – Bayesian

fpost(ρ | data) ∝ f(data | ρ) πarc-sine(ρ)

(Assuming the marginal means and variances (µ1, µ2, σ1, σ2) are known.)

Fact: Each study makes its own inference conclusion individually.

Question: Can we combine the inferences from the four independent studies, given that ρ is the same in all of them?

(Thanks to Chengrui Li for the example.)

Combining BFF inferences across paradigms – numerical results

Table: Inference on the correlation coefficient; four independent bivariate normal trials

Combining inferences across paradigms – performance (coverage)

Table: Inference on the correlation coefficient; four independent bivariate normal trials; 1000 repetitions

                                    Coverage rate        Mean length (& SD)
                                    (1000 repetitions)   of 95% CIs
Study-I: Fisher’s Z method          0.943                0.294 (0.0559)
Study-II: Bootstrap BCa             0.948                0.293 (0.0667)
Study-III: Profile likelihood       0.946                0.298 (0.0533)
Study-IV: Bayesian/arc-sine prior   0.957                0.252 (0.0474)
Combination (via CDs)               0.945                0.142 (0.0224)

To be continued ... Part II & III

Part II: CD is an effective inference tool for fusion learning
– Provide solutions for problems whose solutions were previously unavailable, unknown, or not easily available (added values):
– Incorporating external information in analyses of clinical trials
– Exact meta-analysis for 2×2 tables with rare events
– Efficient network meta-analysis: a CD approach
– Meta-analysis of heterogeneous studies using only summary statistics
– Nonparametric combining of inferences using CDs, the bootstrap & data depth

Part III: Additional topics (perhaps in the future/not in this workshop ...)
– Confidence curve (CC)/confidence net (CN)
– Multivariate CD
– A myth about epistemic probability (“not a proper distribution function”)
– A second look at CD in connection with the bootstrap, fiducial, IM models, and objective Bayes/Bayes ...

In planning — BFF-No. 3 and Fusion Learning workshop
Location: Rutgers University (New Jersey)
Time: April 2016
You are all invited!
