! "#$$% &  " ' ( '))***+ ,   +) )  

-  .,/,   01* ,* 0 

 2 (32 430 ,45++6 78 5+9+6

8           !" # $ %& '( $ ) (*+, -.-'$/ ' $)/+ (// / ($ +( /   *0 0  12! 1! 34 *  5 6 ! 207 ! !0 

8  :            

  !" !!

#$$ ! %      # &'   ($

)' " !$ *  !

#$%+

! !$

,  !- !!

, !$$ !$

./ 0,   1!$$! $! '1  !   ! 1   $ !$ *  !2 1  ! 3*  !, 4!!

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, Review. http://dx.doi.org/./..

Generalized Fiducial Inference: A Review and New Results

Jan Hannig^a, Hari Iyer^b, Randy C. S. Lai^{c,d}, and Thomas C. M. Lee^c

^a Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA; ^b Statistical Engineering Division, National Institute of Standards and Technology, Gaithersburg, MD, USA; ^c Department of Statistics, University of California, Davis, Davis, CA, USA; ^d Department of Mathematics & Statistics, University of Maine, Orono, ME, USA

ABSTRACT
R. A. Fisher, the father of modern statistics, proposed the idea of fiducial inference during the first half of the 20th century. While his proposal led to interesting methods for quantifying uncertainty, other prominent statisticians of the time did not accept Fisher's approach as it became apparent that some of Fisher's bold claims about the properties of fiducial distribution did not hold up for multi-parameter problems. Beginning around the year 2000, the authors and collaborators started to reinvestigate the idea of fiducial inference and discovered that Fisher's approach, when properly generalized, would open doors to solve many important and difficult inference problems. They termed their generalization of Fisher's idea as generalized fiducial inference (GFI). The main idea of GFI is to carefully transfer randomness from the data to the parameter space using an inverse of a data-generating equation without the use of Bayes' theorem. The resulting generalized fiducial distribution (GFD) can then be used for inference. After more than a decade of investigations, the authors and collaborators have developed a unifying theory for GFI, and provided GFI solutions to many challenging practical problems in different fields of science and industry. Overall, they have demonstrated that GFI is a valid, useful, and promising approach for conducting statistical inference. The goal of this article is to deliver a timely and concise introduction to GFI, to present some of the latest results, as well as to list some related open research problems. It is the authors' hope that their contributions to GFI will stimulate the growth and usage of this exciting approach for statistical inference. Supplementary materials for this article are available online.

ARTICLE HISTORY
Received February; Revised January

KEYWORDS
Approximate Bayesian computations; Data-generating equation; Fiducial inference; Jacobian calculation; Model selection; Uncertainty quantification

1. Introduction

The origin of fiducial inference can be traced back to R. A. Fisher (1922, 1925, 1930, 1933, 1935) who introduced the concept of a fiducial distribution for a parameter, and proposed the use of this fiducial distribution, in place of the Bayesian posterior distribution, for interval estimation of the parameter. In simple situations, especially in one parameter families of distributions, Fisher's fiducial intervals turned out to coincide with classical confidence intervals. For multi-parameter families of distributions, the fiducial approach led to confidence sets whose frequentist coverage probabilities were close to the claimed confidence levels but they were not exact in the repeated sampling frequentist sense. Fisher's proposal led to major discussions among the prominent statisticians of the mid-20th century (e.g., Jeffreys 1940; Stevens 1950; Tukey 1957; Lindley 1958; Fraser 1961a,b, 1966, 1968; Dempster 1966, 1968). Many of these discussions focused on the nonexactness of the confidence sets and also nonuniqueness of fiducial distributions. The latter part of the 20th century has seen only a handful of publications (Dawid, Stone, and Zidek 1973; Wilkinson 1977; Dawid and Stone 1982; Barnard 1995; Salome 1998) as the fiducial approach fell into disfavor and became a topic of historical interest only.

Since the mid-2000s, there has been a revival of interest in modern modifications of fiducial inference. This increase of interest demonstrated itself in both the number of different approaches to the problem and the number of researchers working on these problems, and is leading to an increasing number of publications in premier journals. The common thread for these approaches is a definition of inferentially meaningful probability statements about subsets of the parameter space without the need for subjective prior information.

These modern approaches include Dempster–Shafer theory (Dempster 2008; Edlefsen, Liu, and Dempster 2009) and a recent (since 2010) related approach called inferential models (Martin, Zhang, and Liu 2010; Zhang and Liu 2011; Martin and Liu 2013, 2015a,c), which aims at provably conservative and efficient inference. While their philosophical approach to inference is different from ours, the resulting solutions are often mathematically closely related to the fiducial solutions presented here. Interested readers can learn about the inferential models approach to inference from the book by Martin and Liu (2015b). A somewhat different approach termed confidence distributions looks at the problem of obtaining an inferentially meaningful distribution on the parameter space from a purely frequentist point of view (Xie and Singh 2013). One of the main contributions of this approach is fusion learning: its ability to combine information from disparate sources with deep implications for meta analysis (Schweder and Hjort 2002; Singh, Xie, and Strawderman 2005; Xie, Singh, and Strawderman 2011; Hannig and

CONTACT Jan Hannig [email protected], Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC. Color versions of one or more of the figures in the article can be found online at www.tandfonline.com//r/JASA. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. © American Statistical Association

Xie 2012; Xie et al. 2013). Another related approach is based on higher order likelihood expansions and implied data-dependent priors (Fraser 2004, 2011; Fraser, Reid, and Wong 2005; Fraser and Naderi 2008; Fraser, Fraser, and Staicu 2009; Fraser et al. 2010). Objective Bayesian inference, which aims at finding nonsubjective model-based priors, is also part of this effort. Examples of recent breakthroughs related to reference prior and model selection are Bayarri et al. (2012), Berger (1992), Berger and Sun (2008), Berger, Bernardo, and Sun (2009), and Berger, Bernardo, and Sun (2012). Objective Bayesian inference is a very well-developed field but there is room for fiducial inference for many reasons. It often provides a good alternative both in terms of performance and ease of use, and the generalized fiducial distribution is never improper. Moreover, generalized fiducial inference and its various cousins are rapidly evolving and have the potential for uncovering deep and fundamental insights behind statistical inference. Finally, there are several other recent fiducial related works including Wang (2000), Xu and Li (2006), Veronese and Melilli (2015), and Taraldsen and Lindqvist (2013) who show how fiducial distributions naturally arise within a decision theoretical framework.

Arguably, generalized fiducial inference (GFI) has been on the forefront of the modern fiducial revival. It is motivated by the work of Tsui and Weerahandi (1989, 1991) and Weerahandi (1993, 1994, 1995) on generalized confidence intervals and the work of Chiang (2001) on the surrogate variable method for obtaining confidence intervals for variance components. The main spark came from the realization that there was a connection between these new procedures and fiducial inference. This realization evolved through a series of works (Iyer, Wang, and Mathew 2004; Patterson, Hannig, and Iyer 2004; Hannig, Iyer, and Patterson 2006b; Hannig 2009).

GFI defines a data-dependent measure on the parameter space by carefully using an inverse of a deterministic data-generating equation without the use of Bayes' theorem. The resulting generalized fiducial distribution (GFD) is a data-dependent distribution on the parameter space. GFD can be viewed as a distribution estimator (as opposed to a point or interval estimator) of the unknown parameter of interest. The resulting GFD when used to define approximate confidence sets is often shown in simulations to have very desirable properties, for example, conservative coverages but shorter expected lengths than competing procedures (E, Hannig, and Iyer 2008).

The strengths and limitations of the generalized fiducial approach are becoming better understood, see, especially, Hannig (2009, 2013). In particular, the asymptotic exactness of fiducial confidence sets, under fairly general conditions, was established in Hannig (2013), Hannig, Iyer, and Patterson (2006b), and Sonderegger and Hannig (2014). Higher order asymptotics of GFI was studied in Majumder and Hannig (2015). GFI has also been extended to prediction problems in Wang, Hannig, and Iyer (2012a). Model selection was introduced into the GFI paradigm in Hannig and Lee (2009). This idea was then further explored in the classical setting in Wandler and Hannig (2011) and in ultra high-dimensional regression in Lai, Hannig, and Lee (2015).

GFI has been proven useful in many practical applications. Earlier examples include bioequivalence (McNally, Iyer, and Mathew 2003; Hannig et al. 2006a), problems of metrology (Hannig, Wang, and Iyer 2003; Wang and Iyer 2005, 2006a,b; Hannig, Iyer, and Wang 2007; Wang, Hannig, and Iyer 2012b), and interlaboratory experiments and international key comparison experiments (Iyer, Wang, and Mathew 2004). It has also been applied to derive confidence procedures in many important statistical problems, such as variance components (E, Hannig, and Iyer 2008; Cisewski and Hannig 2012), maximum mean of a multivariate normal distribution (Wandler and Hannig 2011), multiple comparisons (Wandler and Hannig 2012a), extreme value estimation (Wandler and Hannig 2012b), mixture of normal and Cauchy distributions (Glagovskiy 2006), wavelet regression (Hannig and Lee 2009), and logistic regression and binary response models (Liu and Hannig 2016).

One main goal of this article is to deliver a concise introduction to GFI. Our intention is to provide a single location where the various developments of the last decade can be found. As a second goal of this article, some original work and refined results on GFI are also presented. Specifically, they are Definition 1 and Theorems 1, 3, and 4.

The rest of this article is organized as follows. Starting from Fisher's fiducial argument, Section 2 provides a complete description of GFI, including some new results. The issue of model selection within the GFI framework is discussed in Section 3. Section 4 concerns the use of GFI for discrete and discretized data, and Section 5 offers some practical advice on how to handle common computational challenges when applying GFI. Lastly, Section 5.1 provides some concluding remarks while technical details are relegated to the online appendix. The following website http://anson.ucdavis.edu/~tcmlee/GFiducial.html contains computer code for many of the methods in this review.

2. The Switching Principle: Fisher's "Fiducial Argument" Extended

The idea underlying GFI is motivated by our understanding of Fisher's fiducial argument. GFI begins with expressing the relationship between the data, Y, and the parameters, θ, as

    Y = G(U, θ),    (1)

where G(·, ·) is a deterministic function termed the data-generating equation, and U is the random component of this data-generating equation whose distribution is independent of parameters and completely known.

The data Y are assumed to be created by generating a random variable U and plugging it into the data-generating Equation (1). For example, a single observation from the N(μ, 1) distribution can be written as Y = μ + U, where θ = μ and U is a N(0, 1) random variable.

For simplicity, this subsection only considers the simple case where the data-generating Equation (1) can be inverted and the inverse Q_y(u) = θ exists for any observed y and for any arbitrary u. Fisher's fiducial argument leads one to define the fiducial distribution for θ as the distribution of Q_y(U*) where U* is an independent copy of U. Equivalently, a sample from the fiducial distribution of θ can be obtained by generating U*_i, i = 1, ..., N and using θ*_i = Q_y(U*_i). Estimates and confidence intervals for

θ can be obtained based on this sample. In the N(μ, 1) example, Q_y(u) = y − u and the fiducial distribution is therefore the distribution of y − U* ∼ N(y, 1).

Example 1. Consider the mean and sample variance Y = (Ȳ, S²) computed from n independent N(μ, σ²) random variables, where μ and σ² are parameters to be estimated. A natural data-generating equation for Y is

    Ȳ = μ + σU₁ and S² = σ²U₂,

where U₁, U₂ are independent with U₁ ∼ N(0, n⁻¹) and U₂ ∼ Gamma((n − 1)/2, (n − 1)/2).

The inverse Q_y(u) = (ȳ − su₁/√u₂, s²/u₂). Consequently, for any observed value ȳ and s, and an independent copy of U, denoted as U*, the distribution of ȳ − sU₁*/√U₂* is the marginal fiducial distribution of μ. The equal tailed set of 95% fiducial probability is (ȳ − ts/√n, ȳ + ts/√n) where t is the 0.025 critical value of the t-distribution with n − 1 degrees of freedom, which is the classical 95% confidence interval for μ.

Remark 1. The generalized fiducial distribution is a data-dependent measure on the parameter space. It is mathematically very similar to Bayesian posteriors and can be used in practice in a similar fashion. For example, the median (or mean) of the GFD can be used as a point estimator. More importantly, certain sets of fiducial probability 1 − α can be used as approximate (1 − α)100% confidence sets, see Theorem 3. A nested collection of such approximate confidence sets at all confidence levels can also be inverted for use as an approximate p-value (Hannig 2009; Xie and Singh 2013).

A useful graphical tool for visualizing GFDs is the confidence curve of Birnbaum (1961). If R(θ|x) is the distribution (or survival) function of a marginal fiducial distribution, the confidence curve is defined as CV(θ) = 2|R(θ|x) − 0.5|. On a plot of CV(θ) versus θ, a line across the height (y-axis) of α, for any 0 < α < 1, intersects with the confidence curve at two points, and these two points correspond (on the x-axis) to an α level, equal tailed, two-sided confidence interval for θ. Thus, a confidence curve is a graphical device that shows confidence intervals of all levels. The minimum of a confidence curve is the median of the fiducial distribution, which is the recommended point estimator. Figure 1 shows an example of a confidence curve for a dataset generated from the U(θ, θ²) distribution of Example 4.

Remark 2. We have made a conscious choice to eschew philosophical controversies throughout the development of GFI. However, we find it inevitable to make at least some philosophical comments at this point:
1. The idea behind GFD is very similar to the idea behind the likelihood function: what is the chance of observing my data if any given parameter was true. The added value of GFD is that it provides the likelihood function with an appropriate Jacobian, obtaining a proper probability distribution on the parameter space, see (4) below.
2. GFD does not presume that the parameter is random. Instead it should be viewed as a distribution estimator (rather than a point or interval estimator) of the fixed true parameter. To validate this distribution estimator in a specific example, we then typically demonstrate good small sample performance by simulation and prove good large sample properties by asymptotic theorems.
3. From a Bayesian point of view, Bayes' theorem updates the distribution of U after the data are observed. However, when no prior information is present, changing the distribution of U only by restricting it to the set "there is at least one θ solving the equation y = G(U, θ)" seems to us as a reasonable choice (see the next section). Arguably, this so-called "continuing to regard" assumption has been behind most of the philosophical controversies surrounding fiducial inference in the past.

2.1. A Refined Definition of Generalized Fiducial Distribution

The inverse to Equation (1) does not exist for two possible reasons. Either there is more than one θ for some value of y and u, or there is no θ satisfying y = G(u, θ). The first situation can be dealt with by using the mechanics of Dempster–Shafer calculus (Dempster 2008). A more practical solution is to select one of the several solutions using a possibly random mechanism. In Section 4, we will review theoretical results that showed that the uncertainty due to multiple solutions has, in many parametric problems, only a second-order effect on statistical inference.

For the second situation, Hannig (2009) suggested removing the values of u for which there is no solution from the sample space and then renormalizing the probabilities, that is, using the distribution of U conditional on the event U_y = {u : y = G(u, θ), for some θ}. The rationale for this choice is that we know that the observed data y were generated using some fixed unknown θ₀ and u₀, that is, y = G(θ₀, u₀). The values of u for which y = G(·, u) does not have a solution could not be the true u₀, hence only the values of u for which there is a solution should be considered in the definition of the generalized fiducial distribution. (An exception to this suggestion is in cases where the parameter space is in some way constrained. In this case, it is often beneficial to extend the parameter space, perform the inversion in the extended space, and then project to the boundary of the constrained parameter space. A good example of such a situation is the variance component model where variances are constrained to be greater than or equal to zero (E, Hannig, and Iyer 2009; Cisewski and Hannig 2012).) However, U_y, the set of u for which the solution exists, has probability zero for most problems involving absolutely continuous random variables. Conditioning on such a set of probability zero will therefore lead to nonuniqueness due to the Borel paradox (Casella and Berger 2002, sec. 4.9.3).

Hannig (2013) proposed an attractive interpretation of the conditional distribution by a limit of discretizations. Here, we generalize this approach slightly. Throughout this manuscript, U* denotes an independent copy of U and θ* denotes a random variable taking values in the parameter space Θ.

To define GFD, we need to interpret the ill-defined conditional distribution of U | U ∈ U_y. To do that we "fatten up" the manifold U_y by ε so that the enlarged set U_{y,ε} = {u : ‖y − G(u, θ)‖ ≤ ε, for some θ} has positive probability and the conditional distribution of U | U ∈ U_{y,ε} is well defined. Finally, the fattening needs to be done in a consistent way so that the

limit of conditional distributions as ε → 0 is well defined. This leads to the following definition:

Definition 1. A probability measure on the parameter space Θ is called a generalized fiducial distribution (GFD) if it can be obtained as a weak limit:

    lim_{ε→0} [ argmin_{θ*} ‖y − G(U*, θ*)‖  |  min_{θ*} ‖y − G(U*, θ*)‖ ≤ ε ].    (2)

If there are multiple minimizers argmin_{θ*} ‖y − G(U*, θ*)‖, one selects one of them (potentially at random). Notice that the conditioning in (2) is modifying the distribution of U* to only consider values for which an approximate inverse to G exists.

Remark 3. Definition 1 illuminates the relationship between GFD and Approximate Bayesian Computations (ABC; Beaumont, Zhang, and Balding 2002). In an idealized ABC, one first generates an observation θ* from the prior, then generates a new sample using a data-generating equation y* = G(U*, θ*) and compares the generated data with the observed data y. If the observed and generated datasets are close (e.g., ‖y − y*‖ ≤ ε), the generated θ* is accepted, otherwise it is rejected and the procedure is repeated. If the measure of closeness is a norm, it is easy to see that when ε → 0 the weak limit of the ABC distribution is the posterior distribution.

On the other hand, when defining GFD one generates U*, finds a best-fitting θ* = argmin_{θ*} ‖y − G(U*, θ*)‖, computes y* = G(U*, θ*), again accepts if ‖y − y*‖ ≤ ε, and rejects otherwise. In either approach, an artificial dataset y* = G(U*, θ*) is generated and compared to the observed data. The main difference is that the Bayes posterior simulates the parameter θ* from the prior while GFD uses the best-fitting parameter.

Remark 4. The GFD defined in (2) is not unique as it depends on the data-generating Equation (1), the norm used in (2), and the minimizer chosen. Let U* be an independent copy of U and let, for any measurable set A, V[A] be a rule selecting a possibly random element of the closure of the set Ā. When the probability P(∃ θ*, y = G(U*, θ*)) > 0 then the limit (2) is the conditional distribution

    V[{θ : y = G(U*, θ)}] | {∃ θ*, y = G(U*, θ*)}.

This is an older definition of GFD that can be found in Hannig (2009, 2013). The next subsection offers a useful computational formula for evaluating (2).

2.2. A User Friendly Formula for Generalized Fiducial Distribution

While Definition (2) for GFD is conceptually appealing and very general, it is not immediately clear how to compute the limit in many practical situations. In a less general setup using the l∞ norm, Hannig (2013) derived a closed form of the limit in (2) applicable to many practical situations. Here, we provide a generalization of this result, which is applicable in most situations where the data follow a continuous distribution.

Assume that the parameter θ ∈ Θ ⊂ R^p is p-dimensional and the data x ∈ R^n are n-dimensional. The following theorem provides a useful computational formula.

Theorem 1. Suppose Assumptions A.1 to A.3 stated in Appendix A hold. Then the limiting distribution in (2) has a density

    r(θ|y) = f(y, θ)J(y, θ) / ∫_Θ f(y, θ')J(y, θ') dθ',    (3)

where f(y, θ) is the likelihood and the function

    J(y, θ) = D( (d/dθ) G(u, θ) |_{u = G⁻¹(y,θ)} ).    (4)

If (i) n = p then D(A) = |det(A)|. Otherwise the function D(A) depends on the norm used; (ii) the l∞ norm gives D(A) = Σ_{i=(i₁,...,i_p)} |det((A)_i)|; (iii) under an additional Assumption A.4 the l₂ norm gives D(A) = (det AᵀA)^{1/2}.

Figure . Confidence curve of a GFD for parameter θ based on sample of size  from U(θ, θ2 ) with θ = 100. The minimum and maximum values of the sample used to generate the GFD are . and ., respectively. The interval between the two points where the dotted line intersects the confidence curve (98.49, 105.92) is the approximate % confidence interval. The minimum of the confidence curve is the median of the generalized fiducial distribution. Its value of . provides a natural point estimator. 1350 J. HANNIG ET AL. n = ( ≤ of the p × p nonzero submatrix: In (ii) the sum spans over p of p-tuples of indexes i 1 < ···< ≤ ) × ( ) i1 ip n .Foranyn p matrix A,thesubmatrix A i × = ( ,..., ) d is the p p matrix containing the rows i i1 ip of A. J(z, θ) = det S(G(u, θ)) . (7) θ − Thereisaslightabuseofnotationin(3)asr(θ|y) is not a d u=G 1 (y,θ) conditional density in the usual sense. Instead, we are using this Next, denote the solution of the equation s = S(G(u, θ)) by notation to remind the reader that the fiducial density depends Qs(u) = θ. A straightforward calculation shows that the fiducial on the fixed observed data. density (3)with(7) is the conditional distribution of Qs(U ) | Cases (i) and (ii) are a simple consequence of results in Han- A(U ) = a,theGFDbasedonS conditional on the observed nig (2013). The formula in (iii) was independently proposed in ancillary A = a,seeBirnbaum(1962) and Iyer and Patterson Fraser et al. (2010) based on arguments related to tangent expo- (2002). nential families without being recognized as a fiducial distribu- tion.TheproofisinAppendixA.Theeaseofuseof(4)willbe demonstrated on several examples in the next subsection. The 2.3. Two Examples rest of this subsection discusses the effects of various transfor- In this section, we will consider two examples, linear regression mations. and uniform distribution. In the first case, the GFD is the same as Remark 5. 
Just like posterior computed using Jeffreys’ prior, Bayes posterior with respect to the independence Jeffreys’ prior GFD is invariant under smooth reparameterizations. whileinthesecondtheGFDisnotaBayesposteriorwithrespect This assertion has been shown for smooth transformation by to any prior (that is not data dependent). chainruleinHannig(2013). However, this property is general Example 3 (Linear Regression). Express linear regression using and follows directly from (2), since for an appropriate selection the data-generating equation of minimizers and any one-to-one function θ = φ(η) Y = G(U, θ) = Xβ + σU, φ arg min y − G(U ,φ(η )) = arg min y − G(U , θ ). where Y is the dependent variables, X is the design matrix, θ = η θ (β, σ ) are the unknown parameters, and U is a random vector with known density f (u) independent of any parameters. Remark 6. GFD could change with transformations of the data- d ( , θ) = To compute GFD, simply notice that dθ G U generating equation. (X,U ), U = σ −1(y − Xβ). From here the Jacobian in Assume that the observed dataset has been transformed (4)usingthel∞ norm simplifies to with a one-to-one smooth transformation Z = T(Y ). Using −1 the chain rule, we see that the GFD based on this new data- J∞(y, θ) = σ |det (X,Y ) | = ( ) i generating equation and with observed data z T y is the i=(i1,...,ip ) ≤ <···< ≤ density (3)withtheJacobianfunction(4) simplified to 1 i1 ip n and the density of GFD is d d ( , θ) = ( ) · ( , θ) . − − − JT z D T y G u (5) (β, σ| ) ∝ σ n 1 (σ 1( − β)). θ − r y f Y X dy d u=G 1 (y,θ) This coincides with the Bayesian solution using the indepen- −1( ) Notice that for simplicity we write y instead of T z in (5). dence Jeffreys’ prior (Yang and Berger 1997). For completeness, we recall the well-known fact that the like- The J function has a more compact form when using the lihood based on z = T(y) satisfies l2 norm. 
In particular by Cauchy–Binet formula, we see that (( , − β) ( , − β)) β −1 det X y X X y X is invariant in .Byselecting dT β = (X X )−1X y,weimmediatelyobtain fT (z|θ) = f (y|θ) det . (6) Downloaded by [University North Carolina - Chapel Hill] at 08:05 14 August 2017 dy −1 1 1 J2(y, θ) = σ | det(X X )| 2 RSS 2 , Thesecondtermintheright-handsideof(6)isaconstantand does not affect the GFD with the exception of model selection whereRSSistheresidualsumofsquares.AsthetwoJacobian considerations in Section 3. functions differ only by a constant, the GFD is unchanged. = Ascanbeseenfromtheabovecalculation,GFDwillusually As a special case, the GFD for the location-scale model X 1 ( , θ) = σ −1 | − | change with transformation of the data. An important exception ,thel∞ Jacobian is J∞ y i< j Yi Yj while the l2 ( , θ) = σ −1 σˆ σˆ is when the number of observations and number of parameters Jacobian becomes J2 y n n,where n is the maximum are equal, that is, n = p. Indeed, by careful evaluation of (4), likelihood estimator of σ . = ( ) ( , θ) ( |θ) = (5), and (6), we see that for z T y we have J y fY y Example 4 (Uniform U{a(θ ) − b(θ ), a(θ ) + b(θ )}). As a sec- ( , θ) ( |θ) JT z fT z and the GFD is unchanged. ond example, we will study a very irregular model. The refer- ence prior for this model is complicated and has been obtained Example 2. Consider the following important transformation. = ( , ) as Theorem 8 in Berger, Bernardo, and Sun (2009). Let Z S A be one-to-one smooth transformation, where Express the observed data using the following data- S is a p-dimensional statistic and A is an ancillary statistic. Let generating equation s = S(y) and a = A(y) betheobservedvalues.SincedA/d = 0, the function D in (5) is the absolute value of the determinant Yi = a(θ ) + b(θ )Ui, Ui iid U (−1, 1). JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1351
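When the error density f in Example 3 is Gaussian, the GFD r(β, σ|y) ∝ σ^{−n−1} f(σ⁻¹(y − Xβ)) can be sampled directly, since it factors in the usual independence-Jeffreys way: σ² | y ∼ RSS/χ²_{n−p} and β | σ, y ∼ N(β̂, σ²(XᵀX)⁻¹). The following Monte Carlo sketch (our own illustration under the normality assumption; variable names are ours, not the paper's) draws from the GFD and forms an equal-tailed fiducial interval:

```python
import numpy as np

# Monte Carlo sketch (assuming normal errors) of the GFD in Example 3:
# with f Gaussian, r(beta, sigma | y) ∝ sigma^(-n-1) f(sigma^-1 (y - X beta))
# factors as   sigma^2 | y ~ RSS / chi^2_{n-p}
#       and    beta | sigma, y ~ N(beta_hat, sigma^2 (X'X)^-1),
# i.e., the familiar independence-Jeffreys posterior.

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design matrix
beta_true, sigma_true = np.array([1.0, -2.0]), 0.5
y = X @ beta_true + sigma_true * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)

N = 20_000
sigma2_star = rss / rng.chisquare(n - p, size=N)        # sigma^2 draws from the GFD
L = np.linalg.cholesky(XtX_inv)
beta_star = beta_hat + np.sqrt(sigma2_star)[:, None] * (rng.normal(size=(N, p)) @ L.T)

# Equal-tailed 95% fiducial interval for the slope
lo, hi = np.quantile(beta_star[:, 1], [0.025, 0.975])
print(f"95% fiducial interval for the slope: ({lo:.3f}, {hi:.3f})")
```

Because the GFD here coincides with a proper Bayes posterior, standard posterior-sampling machinery applies unchanged; the fiducial interpretation only changes how the draws are justified, not how they are computed.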

Simple computations give (d/dθ) G(u, θ) = a'(θ) + b'(θ)U with U = b⁻¹(θ)(Y − a(θ)). If a'(θ) > |b'(θ)|, (4) simplifies to

    J₁(y, θ) = Σᵢ |a'(θ) + {log b(θ)}'{yᵢ − a(θ)}|
             = n[a'(θ) − a(θ){log b(θ)}' + ȳ_n{log b(θ)}'].    (8)

We used a'(θ) > |b'(θ)| only to show that the terms inside the absolute values above are all positive. However, we remark that under this assumption both a(θ) − b(θ) and a(θ) + b(θ) are strictly increasing, continuous functions of θ.

With the above, the GFD is then

    r(θ|y) ∝ [a'(θ) − a(θ){log b(θ)}' + ȳ_n{log b(θ)}'] / b(θ)ⁿ × I_{{a(θ)−b(θ) < y_(1), y_(n) < a(θ)+b(θ)}}(θ).    (9)

As an alternative fiducial solution, consider a transformation to the minimal sufficient and ancillary statistics inspired by Example 2: Z = {h₁(Y_(1)), h₂(Y_(n)), (Y − Y_(1))/(Y_(n) − Y_(1))}. We selected the transformations hᵢ so that their inverses satisfy h₁⁻¹(θ) = E Y_(1) = a(θ) − b(θ)(n − 1)/(n + 1) and h₂⁻¹(θ) = E Y_(n) = a(θ) + b(θ)(n − 1)/(n + 1). There are only two nonzero terms in (5) and consequently

    J₂(y, θ) = (w₁ + w₂)[a'(θ) − a(θ){log b(θ)}'] + (w₁y_(1) + w₂y_(n)){log b(θ)}',    (10)

where w₁ = h₁'(y_(1)) and w₂ = h₂'(y_(n)).

We performed a simulation study for the particular case of U(θ, θ²), that is, a(θ) = (θ² + θ)/2 and b(θ) = (θ² − θ)/2. For this model, the likelihood is

    f(y|θ) = {θ(θ − 1)}⁻ⁿ I_{(√y_(n), y_(1))}(θ)

and the Jacobians are

    J₁(y, θ) = n[ȳ(2θ − 1) − θ²] / [θ(θ − 1)]  and
    J₂(y, θ) = [(w₁y_(1) + w₂y_(n))(2θ − 1) − (w₁ + w₂)θ²] / [θ(θ − 1)],

where

    w₁ = (1 + n) / √(n² + 4(1 + n)y_(1)),  w₂ = (1 + n) / √(1 + 4n(1 + n)y_(n)).

An example of a confidence curve for a GFD based on (10) is in Figure 1.

We compared the performance of the two fiducial distributions to the Bayesian posteriors with the reference prior

    π(θ) ∝ [(2θ − 1)/(θ(θ − 1))] e^{ψ(2θ/(2θ−1))}

(Berger, Bernardo, and Sun 2009), where ψ is the digamma function defined by ψ(z) = (d/dz) log Γ(z) for z > 0, and the flat prior π(θ) = 1.

For all the combinations of n = 1, 2, 3, 4, 5, 10, 20, 100, 250 and θ = 1.5, 2, 5, 10, 25, 50, 100, 250 we analyzed 16,000 independent datasets. Based on this we found the empirical coverage of the 2.5%, 5%, 50%, 95%, 97.5% upper confidence bounds. The results are summarized in Figure 2. We observed that the simple GFD (denoted by F1 in the figures), the alternative GFD based on minimal sufficient statistics (F2), and the reference prior Bayes posterior (BR) maintain stated coverage for all parameter settings. However, the flat prior Bayes posterior (B1) does not have a satisfactory coverage, with the worst departures from stated coverage observed for small n and large θ.
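The coverage part of the simulation above can be sketched in a few lines for the simple GFD F1. The following Monte Carlo check (our own code, not the authors' simulation study, and with far fewer replications) evaluates the density proportional to J₁(y, θ)f(y|θ) ∝ [ȳ(2θ − 1) − θ²]/{θ(θ − 1)}^{n+1} on a grid over the support (√y_(n), y_(1)), forms equal-tailed 95% fiducial intervals, and estimates their empirical coverage:

```python
import numpy as np

# Sketch of the simple GFD F1 for Example 4's U(theta, theta^2) model:
#   r(theta|y) ∝ J1(y,theta) f(y|theta)
#              ∝ [ybar (2 theta - 1) - theta^2] / (theta (theta - 1))^(n+1)
# on the support (sqrt(y_(n)), y_(1)).  We estimate the empirical coverage
# of equal-tailed 95% fiducial intervals over repeated datasets.

rng = np.random.default_rng(2)
theta0, n, reps = 10.0, 10, 500
hits = 0
for _ in range(reps):
    y = rng.uniform(theta0, theta0**2, size=n)
    lo_sup, hi_sup = np.sqrt(y.max()), y.min()      # support of the GFD
    grid = np.linspace(lo_sup, hi_sup, 4001)[1:-1]  # open interval: skip endpoints
    ybar = y.mean()
    log_r = (np.log(ybar * (2 * grid - 1) - grid**2)
             - (n + 1) * np.log(grid * (grid - 1)))
    w = np.exp(log_r - log_r.max())                 # unnormalized density on the grid
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    lo = np.interp(0.025, cdf, grid)                # equal-tailed 95% interval
    hi = np.interp(0.975, cdf, grid)
    hits += lo <= theta0 <= hi
coverage = hits / reps
print(coverage)   # should be close to the nominal 0.95
```

Even this crude grid version reproduces the qualitative finding of the study: the simple fiducial intervals attain roughly their nominal coverage for moderate n.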

Figure . Boxplots of empirical coverages of the 2.5%, 5%, 50%, 95%, 97.5% upper confidence bounds for the simple GFD (F), GFD based on minimal sufficient statistics (F), reference Bayes (BR), and flat prior Bayes (B) for all  parameter settings. The broken lines provide bounds on random fluctuations of the empirical coverages showing that F, F, and BR maintain stated coverage while B does not. 1352 J. HANNIG ET AL.

Figure 3. Boxplots of the log10(MAD) of the GFD based on minimal sufficient statistics (F2), reference Bayes (BR), and flat prior Bayes (B1) minus the log10(MAD) of the simple fiducial (F1), averaged over 16,000 simulated datasets. Each parameter combination provides one data point for the boxplots. Positive values mean that GFD F1 concentrates closer to the truth; consequently, F2 is the best in this metric.

For each dataset, we have also measured the mean absolute deviation from the true parameter (MAD) of each of the GFDs and posteriors, that is, MAD = ∫ |θ − θ_0| r(θ|y) dθ. To aid in comparison, we compute the difference of the log_10(MAD) of F2, BR, and B1 minus the log_10(MAD) of F1 on each dataset. A positive (resp., negative) value of the difference signifies that the posterior is concentrated further from (resp., closer to) the truth than the simple fiducial. These relative MADs are then averaged across the 16,000 simulated datasets for all the parameter settings and reported in Figure 3. We observe that F2 is better than F1, which is better than BR, though the absolute value of the difference is relatively small. B1 is not competitive.

2.4. Theoretical Results

This section discusses asymptotic properties of GFI. We hope that the material included here will be useful for the study of GFD in future practical problems.

First, we present a Bernstein–von Mises theorem for GFD, which provides theoretical guarantees of asymptotic normality and asymptotic efficiency. It also guarantees, in conjunction with Theorem 3, that appropriate sets of fiducial probability 1 − α are indeed approximate 1 − α confidence sets. An early version of this theorem can be found in Hannig (2009). Here, we will state a more general result due to Sonderegger and Hannig (2014).

Assume that we are given a random sample of independent observations Y_1, ..., Y_n with data-generating equation Y_i = G(θ, U_i), with U_i iid U(0, 1). This data-generating equation leads to the Jacobian function (4) that is a U-statistic. This realization makes the GFD amenable to theoretical study.

Asymptotic normality of statistical estimators usually relies on a set of technical assumptions, and GFD is no exception. To succinctly state the theorem, we denote the rescaled density of the GFD by r*(s|y) = n^{-1/2} r(n^{-1/2}s + θ̂ | y), where θ̂ is the consistent maximum likelihood estimator (MLE).

Theorem 2 (Sonderegger and Hannig 2014). Under Assumptions B.1 to B.4 in Appendix B,

∫_{R^p} | r*(s|y) − {det I(θ_0)}^{1/2} (2π)^{-p/2} e^{−s^T I(θ_0) s / 2} | ds → 0 in probability.

One application of GFD is to take sets of 1 − α fiducial probability and use them as approximate confidence intervals. Next we state conditions under which this is valid.

Assumption 1. Let us consider a sequence of datasets Y_n generated using fixed parameters θ_n ∈ Θ_n with corresponding data-dependent measures R_{n,Y_n} (such data-dependent measures can be, for example, GFDs, Bayes posteriors, or confidence distributions) on the parameter space Θ_n. We will assume that these converge to a limiting fiducial model in the following way:
1. There is a sequence of measurable functions t_n of Y_n so that t_n(Y_n) converges in distribution to some random variable T.
2. (a) The T from Part 1 can be decomposed into T = (T_1, T_2) and there is a limiting data-generating equation T_1 = H_1(V_1, ξ), T_2 = H_2(V_2), where V = (V_1, V_2) has a fully known distribution independent of the parameter ξ ∈ Ξ. The distribution of T is obtained from the limiting data-generating equation using ξ_0.
   (b) The equation H_1 is one-to-one if viewed as a function (possibly implicit) for any combination of ξ, v_1, t_1, where one is held fixed, one taken as a dependent, and one taken as an independent variable.

The equation H_2 is one-to-one. Consequently, the limiting GFD defined by (2) is the conditional distribution Q_{t_1}(V_1) | {H_2(V_2) = t_2}, where Q_{t_1}(v_1) = ξ is the solution of t_1 = H_1(v_1, ξ). Denote this conditional measure by R_t.
   (c) For any open set C ⊂ Ξ and limiting data t, the limiting fiducial probability of the boundary is R_t(∂C) = 0.
3. There are homeomorphic injective mappings Ξ_n from Θ_n into Ξ so that
   (a) Ξ_n(θ_{n,0}) = ξ_0;
   (b) for any sequence of data t_n(y_n) → t, the transformed fiducial distribution measures converge weakly, R_{n,y_n} ∘ Ξ_n^{-1} → R_t.

Theorem 3. Suppose Assumption 1 holds. Fix a desired coverage 0 < α < 1. For any observed data y_n, select an open set C_n(y_n) satisfying: (i) R_{n,y_n}(C_n(y_n)) = α; (ii) t_n(y_n) → t implies Ξ_n(C_n(y_n)) → C(t); (iii) the set V_{t_2} = {(v_1, v_2) : Q_{t_1}(v_1) ∈ C(t) and t_2 = H_2(v_2)} is invariant in t_1. Then the sets C_n(y_n) are asymptotic α confidence sets.

The theorem provides a condition on how various sets of fixed fiducial probability need to be linked together across different observed datasets to make up a valid confidence set. To understand the key Condition (iii), notice that it assumes the sets C(t) are obtained by de-pivoting a common set V_{t_2}. In particular, if the limiting data-generating equation T_1 = H_1(V_1, ξ) has a group structure, Condition (iii) is equivalent to assuming the sets C(t) are group invariant in t_1. The conditions on the limiting data-generating equation were partially inspired by results for Inferential Models of Martin and Liu (2015a). The proof of Theorem 3 is in Appendix C. Also, this corollary follows immediately:

Corollary 1. Any model that satisfies the assumptions of Theorem 2 satisfies Assumption 1. In particular, for any fixed interior point θ_0 ∈ Θ°, the limiting data-generating equation is T = ξ + V, where the random vector V ∼ N(0, I(θ_0)^{-1}). The transformations are t_n(y_n) = n^{1/2}(θ̂_n − θ_0), Ξ_n(θ) = n^{1/2}(θ − θ_0), and ξ_0 = 0. Any collection of sets C_n(y_n) that in the limit becomes location invariant will form asymptotic confidence intervals.

Most of the theoretical results for GFI in the literature were derived in regular statistical problems and are covered by Corollary 1. Notice that in the regular case the limiting data-generating equation has no ancillary part. The next example shows that the ancillary part in Theorem 3 is needed in some nonregular cases.

Example 5 (Example 4 continued). Recall that Y_i = a(θ) + b(θ)U_i, i = 1, ..., n, where the U_i are iid U(−1, 1). We assume that a'(θ) > |b'(θ)| for θ ∈ Θ so that the GFD R_{n,y_n} has a density given by (9). Fix an interior point θ_0 ∈ Θ°. To verify the conditions of Theorem 3, we need to define the limiting data-generating equation and the transformations t_n and Ξ_n. We start with the limiting data-generating process:

T_1 = ξ + V_1,  T_2 = V_2,

where V_1 = (E_1 − E_2)/2 and V_2 = (E_1 + E_2)/2, with E_1, E_2 independent, E_1 ∼ exp[{a'(θ_0) − b'(θ_0)}/{2b(θ_0)}] and E_2 ∼ exp[{a'(θ_0) + b'(θ_0)}/{2b(θ_0)}]. The density of the limiting GFD is therefore proportional to

r(ξ|t) ∝ e^{−ξ{log b(θ_0)}'} I_{(t_1−t_2, t_1+t_2)}(ξ).

The fact that Assumption 1, Part 2, is satisfied follows immediately. Next, define the transformations

t_n(y) = n · ( [y_(1) − {a(θ_0) − b(θ_0)}] / {a'(θ_0) − b'(θ_0)},  [a(θ_0) + b(θ_0) − y_(n)] / {a'(θ_0) + b'(θ_0)} )^T,
Ξ_n(θ) = n(θ − θ_0).

Simple calculations show that Assumption 1, Parts 1 and 3, are satisfied with ξ_0 = 0. Finally, notice that any collection of sets of fiducial probability α that in the limit becomes location invariant in t_1 (such as one-sided or equal-tailed intervals) are asymptotic α confidence intervals.

2.5. Practical Use of GFI

From a practical point of view, GFI is used in a way similar to a posterior computed using a default (objective) prior, such as a probability matching, reference, or flat prior. The main technical difference is that the objective prior is replaced by the data-dependent Jacobian (4). This data dependence can in some examples lead to the existence of a second-order matching GFD even when only first-order matching is available with non-data-dependent priors (Majumder and Hannig 2015). Some have argued (Welch and Peers 1963; Martin and Walker 2014) that data-dependent priors are essential in achieving superior frequentist properties in complex statistical problems.

First, we suggest using a set of fiducial probability 1 − α and of a good shape (such as one-sided or equal-tailed) as an approximate 1 − α confidence interval; see Theorem 3. Next, the mean or median of the GFD can be used for point estimation.

GFDs can also be used for predicting future observations. This is done by plugging a random variable having the GFD (2) for the parameter into the data-generating equation for the new observations. This approach produces a predictive distribution that accommodates in a natural way both the uncertainty in the parameter estimation and the randomness of the future data. More details are in Wang, Hannig, and Iyer (2012a).

GFDs are rarely available in closed form. Therefore, we often need to use a Markov chain Monte Carlo (MCMC) method such as a Metropolis–Hastings or Gibbs sampler to obtain a sample from the GFD. While the basic issues facing implementation of the MCMC procedures are similar for both Bayesian and generalized fiducial problems, there are specific challenges related to generalized fiducial procedures. We discuss some computational issues in Section 5.
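As a concrete illustration of such a sampler, consider the normal model Y_i = μ + σU_i with U_i iid N(0, 1). For fixed data, the Jacobian (4) is proportional to 1/σ, so the GFD is proportional to σ^{−(n+1)} exp{−Σ(y_i − μ)²/(2σ²)}, Fisher's classical fiducial for the normal. The random-walk Metropolis–Hastings sketch below is our own minimal illustration (the function names are ours, and it is not code from the papers cited above); it samples (μ, log σ), including the change-of-variables term for log σ.

```python
import math
import random

def log_gfd_normal(mu, sigma, y):
    """log of r(mu, sigma | y) up to a constant:
    sigma^{-(n+1)} exp{-sum (y_i - mu)^2 / (2 sigma^2)}."""
    n = len(y)
    return -(n + 1) * math.log(sigma) - sum((v - mu) ** 2 for v in y) / (2 * sigma ** 2)

def log_target(mu, ls, y):
    # target on (mu, log sigma): add + ls for the Jacobian of sigma = exp(ls)
    return log_gfd_normal(mu, math.exp(ls), y) + ls

def mh_gfd_sample(y, iters=20000, step=0.3, seed=7):
    """Random-walk Metropolis-Hastings on (mu, log sigma)."""
    rng = random.Random(seed)
    mu, ls = sum(y) / len(y), 0.0
    lp = log_target(mu, ls, y)
    out = []
    for _ in range(iters):
        mu_new = mu + rng.gauss(0.0, step)
        ls_new = ls + rng.gauss(0.0, step)
        lp_new = log_target(mu_new, ls_new, y)
        # symmetric proposal: accept with probability min(1, exp(lp_new - lp))
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            mu, ls, lp = mu_new, ls_new, lp_new
        out.append((mu, math.exp(ls)))
    return out

rng = random.Random(0)
y = [rng.gauss(10.0, 2.0) for _ in range(50)]
draws = mh_gfd_sample(y)[5000:]          # discard burn-in
mu_mean = sum(m for m, _ in draws) / len(draws)
sig_mean = sum(s for _, s in draws) / len(draws)
```

Equal-tailed fiducial intervals are then read off from the sorted draws, exactly as one would do with posterior samples.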

3. Model Selection in GFI

Hannig and Lee (2009) introduced model selection into the GFI paradigm in the context of wavelet regression. The presentation here is reexpressed using definition (2). There are two main ingredients needed for an effective fiducial model selection. The first is to include the model as one of the parameters, and the second is to include penalization in the data-generating equation.

Consider a finite collection of models M. The data-generating equation is

Y = G(M, θ_M, U), M ∈ M, θ_M ∈ Θ_M,   (11)

where Y is the observations, M is the model considered, θ_M are the parameters associated with model M, and U is a random vector of fully known distribution independent of any parameters. Denote the number of parameters in the model M by |M|.

Similar to MLE, an important issue needing to be solved is that GFI tends to favor models with more parameters over ones with fewer parameters. Therefore, an outside penalty accounting for our preference toward parsimony needs to be incorporated in the model. See Appendix D for more details.

In Hannig and Lee (2009), a novel way of adding a penalty into the GFI framework is proposed. In particular, for each model M they proposed augmenting the data-generating Equation (11) by

0 = P_k, k = 1, ..., min(|M|, n),   (12)

where the P_k are iid continuous random variables with f_P(0) = q independent of U, and q is a constant determined by the penalty. (Based on ideas from the minimum description length principle, Hannig and Lee (2009) recommended using q = n^{-1/2} as the default penalty.) Notice that the number of additional equations is the same as the number of unknown parameters in the model.

For the augmented data-generating equation, we have the following theorem. This theorem has never been published before, but it does implicitly appear in Hannig and Lee (2009) and Lai, Hannig, and Lee (2015). For completeness, we provide a proof in Appendix D.

Theorem 4. Let us suppose the identifiability Assumption D.1 in Appendix D holds and that each of the models satisfies the assumptions of Theorem 1 (in particular |M| ≤ n). Then the marginal generalized fiducial probability of model M is

r(M|y) = q^{|M|} ∫_{Θ_M} f_M(y, θ_M) J_M(y, θ_M) dθ_M / Σ_{M' ∈ M} q^{|M'|} ∫_{Θ_{M'}} f_{M'}(y, θ_{M'}) J_{M'}(y, θ_{M'}) dθ_{M'},   (13)

where f_M(y, θ_M) is the likelihood and J_M(y, θ_M) is the Jacobian function computed using (4) for each fixed model M.

Remark 7. The quantity r(M|y) can be used for inference in the usual way. For example, the fiducial factor, the ratio r(M_1|y)/r(M_2|y), can be used in the same way as a Bayes factor. As discussed in Berger and Pericchi (2001), one of the issues with the use of improper priors in Bayesian model selection is the presence of an arbitrary scaling constant. While this is not a problem when a single model is considered, because the arbitrary constant cancels, it becomes a problem for model selection. An advantage of GFD is that the Jacobian function (4) comes with a scaling constant attached to it. In fact, the fiducial factors are closely related to the intrinsic factors of Berger and Pericchi (1996, 2001). This can be seen from the fact that for the minimal training sample (n = |M|), we usually have ∫_{Θ_M} f_M(y, θ_M) J_M(y, θ_M) dθ_M = 1.

Similarly, the quantity r(M|y) can also be used for fiducial model averaging, much akin to Bayesian model averaging (Hoeting et al. 1999).

We illustrate the use of this model selection on two examples, wavelet regression (Hannig and Lee 2009) and ultrahigh-dimensional regression (Lai, Hannig, and Lee 2015).

3.1. Wavelet Regression

Suppose n observed equispaced data points {x_i}_{i=1}^n satisfy the following model

X_i = g_i + ε_i,

where g = (g_1, ..., g_n)' is the true unknown regression function, the ε_i are independent normal random variables with mean 0 and variance σ², and n = 2^{J+1} is an integer power of 2.

Most wavelet regression methods consist of three steps. The first step is to apply a forward wavelet transform to the data and obtain the empirical wavelet coefficients y = Hx. Here, H is the discrete wavelet transform matrix. The second step is to apply a shrinkage operation to y to obtain an estimate d̂ for the true wavelet coefficients d = Hg. Lastly, the regression estimate ĝ = (ĝ_1, ..., ĝ_n)' for g is computed via the inverse discrete wavelet transform: ĝ = H^{-1}d̂. The second step of wavelet shrinkage is important because it is the step where the statistical estimation is performed. Hannig and Lee (2009) used GFI to perform the second step. Apparently this is the first published work where Fisher's fiducial idea is applied to a nonparametric problem.

Due to the orthonormality of the discrete wavelet transform matrix H, a model for the empirical wavelet coefficients is Y = d + σU, with U being an n-dimensional vector of independent N(0, 1) random variables. The assumption of sparsity implies that many of the entries in the vector d are zero. This allows us to cast this as a model selection problem, where the model M is the list of nonzero entries. The data-generating Equation (11) becomes

Y_k = d_k + σU_k, k ∈ M;  Y_k = σU_k, k ∉ M.

Notice that θ_M = {σ², d_k, k ∈ M}. As discussed above, we augment the data-generating equations by (12) with q = n^{-1/2}. It follows from Theorem 4 that the GFD has generalized density proportional to

r(σ², d, M) ∝ (σ²)^{-(n/2+1)} [ Σ_{j∉M} |y_j| / (n − |M|) ] exp{ −|M| log n / 2 − [ Σ_{k∈M} (d_k − y_k)² + Σ_{i∉M} y_i² ] / (2σ²) } Π_{i∉M} δ_0(d_i),   (14)

where δ_0(s) is the Dirac function, that is, ∫_A δ_0(s) ds = 1 if 0 ∈ A and 0 otherwise. The term 1/(n − |M|) is an additional normalization term introduced to account for the number of elements in the sum above it.

The normalizing constant in (14) cannot be computed in a closed form, so a sample from r(σ², d, M) will have to be simulated using MCMC techniques. Note that the GFD is defined in the wavelet domain. Hannig and Lee (2009) used the inverse wavelet transform to define a GFD on the function domain.

Additionally, Hannig and Lee (2009) also assumed that M satisfies a tree condition (Lee 2002). This condition states that if a coefficient is thresholded, all its descendants have to be thresholded too; the exact formulation is in Hannig and Lee (2009). This constraint greatly reduces the search space and allows for both efficient calculations and clean theoretical results. In the article, they reported a simulation study showing small sample performance superior to the alternative methods considered, and proved a theorem guaranteeing asymptotic consistency of the fiducial model selection.
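Returning to Theorem 4, the model probabilities (13) can be worked out exactly in a toy problem. Consider M0: Y_i = U_i versus M1: Y_i = μ + U_i with U_i iid N(0, 1). For a pure location parameter the ℓ2 version of the Jacobian (4) is the constant √n, and the Gaussian integral in the numerator of (13) is available in closed form. The sketch below is our own worked example under those assumptions (with the default penalty q = n^{-1/2}); it is not code from Hannig and Lee (2009).

```python
import math
import random

def model_probs(y):
    """Marginal fiducial probabilities (13) for M0: mu = 0 vs M1: mu free,
    N(0,1) errors, l2 Jacobian sqrt(n) for the location model, q = n^{-1/2}."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)     # sum of squares about the mean
    sumsq = sum(v * v for v in y)
    # M0 has no parameters: numerator is just the likelihood at mu = 0
    log_num0 = -0.5 * n * math.log(2 * math.pi) - 0.5 * sumsq
    # M1: q * integral f(y|mu) * sqrt(n) dmu, with the integral in closed form
    log_num1 = (-0.5 * (n - 1) * math.log(2 * math.pi)
                - 0.5 * math.log(n) - 0.5 * ss)
    m = max(log_num0, log_num1)              # stabilize before exponentiating
    w0, w1 = math.exp(log_num0 - m), math.exp(log_num1 - m)
    return w0 / (w0 + w1), w1 / (w0 + w1)

random.seed(3)
y_signal = [3.0 + random.gauss(0.0, 1.0) for _ in range(30)]
r0, r1 = model_probs(y_signal)
```

With a clear signal the probability of M1 is essentially 1; replacing the data with draws centered at 0 moves mass back toward M0, reflecting the n^{-1/2} penalty.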
3.2. Ultra High-Dimensional Regression

Lai, Hannig, and Lee (2015) extended the ideas of fiducial model selection to the ultrahigh-dimensional regression setting. The most natural data-generating equation for this model is

Y = G(M, β_M, σ², Z) = X_M β_M + σZ,

where Y represents the observations, M is the model considered (the collection of parameters that are nonzero), X_M is the design matrix for model M, β_M ∈ R^{|M|} and σ > 0 are parameters, and Z is a vector of iid standard normal random variables.

For computational expediency, they suggested using a sufficient-ancillary transformation that yields the same Jacobian function as the l_2 Jacobian discussed in Section 2.3. The Jacobian function used is

J_M(y, θ_M) = σ^{-1} |det(X_M' X_M)|^{1/2} RSS_M^{1/2},

where RSS_M is the residual sum of squares of model M.

The standard minimum description length (MDL) penalty n^{-|M|/2} was not designed to handle ultrahigh-dimensional problems. Inspired by the extended Bayesian information criterion (EBIC) penalty of Chen and Chen (2008), Lai, Hannig, and Lee (2015) proposed extending the penalty by modifying (12) to

0 = B_{|M|},  0 = P_k, k = 1, ..., |M|,

where |M| is the dimension of M, B_m is a Bernoulli(1 − r_m) random variable that penalizes for the number of models that have the same size m, and the P_i are iid continuous random variables with f_P(0) = q independent of B_m that penalize for the size of models. Following the recommendation of Hannig and Lee (2009), we select q = n^{-1/2}. Additionally, we select r_m = C(p, m)^{-γ}, where p is the number of parameters in the full model and C(p, m) is the binomial coefficient. The second choice is to penalize for the fact that there is a large number of models that all have the same size. The most natural choice is γ = 1, for which r_m is the probability of randomly selecting a model M from all models of size m. However, to match the EBIC penalty of Chen and Chen (2008), we allow for other choices of γ.

We assume that for any size m, the residual vectors {I − X_M(X_M'X_M)^{-1}X_M'}y / RSS_M^{1/2} are distinct for all the models M of size m, so that the identifiability Assumption D.1 is satisfied. Theorem 4 implies

r(M|y) ∝ R_γ(M) = Γ((n − |M|)/2) (π RSS_M)^{-(n−|M|−1)/2} n^{-(|M|+1)/2} C(p, |M|)^{-γ}.   (15)

Similar to the tree constraint of the previous subsection, Lai, Hannig, and Lee (2015) additionally reduced the number of models by constructing a class of candidate models, denoted M̃. This M̃ should satisfy the following two properties: the number of models in M̃ is small, and it contains the true model and the models that have nonnegligible values of R_γ(M). To construct M̃, they first apply the sure independence screening (SIS) procedure of Fan and Lv (2008), then apply LASSO and/or SCAD to those predictors that survived SIS, and take all the models that lie on the solution path as M̃. Note that constructing M̃ in this way will ensure the true model is captured in M̃ with high probability (Fan and Lv 2008).

Lai, Hannig, and Lee (2015) showed good properties of the GFI solution both by simulation and by theoretical considerations. In particular, they proved a consistency theorem under the following conditions. Let M be any model, M_0 be the true model, and H_M be the projection matrix of X_M, that is, H_M = X_M(X_M'X_M)^{-1}X_M'. Define Δ_M = ||H_M μ − μ||², where μ = E(Y) = X_{M_0}β_{M_0}. Throughout this subsection, we assume the following identifiability condition holds:

lim_{n→∞} min{ Δ_M / (|M_0| log p) : M_0 ⊄ M, |M| ≤ k|M_0| } = ∞   (16)

for some fixed k > 1. Condition (16) is closely related to the sparse Riesz condition (Zhang and Huang 2008).

Let M be the collection of models such that M = {M : |M| ≤ k|M_0|} for some fixed k. The restriction |M| ≤ k|M_0| is imposed because in practice we only consider models with size comparable to that of the true model.

If p is large, a variable screening procedure to reduce the size is still needed. This variable screening procedure should result in a class of candidate models M̃, which satisfies

P(M_0 ∈ M̃) → 1 and log m_j = o(j log n),   (17)

where M̃_j contains all models in M̃ that are of size j, and m_j is the number of models in M̃_j. The first condition in (17) guarantees that the model class contains the true model, at least asymptotically. The second condition in (17) ensures that the size of the model class is not too large. The authors report small sample performance preferable to competing methods as determined by a simulation study and prove asymptotic consistency of the fiducial model selection algorithm.

4. GFI for Discrete and Interval Data

Most of the material presented in Sections 2 and 3 was developed for exactly observed continuous distributions. This section discusses discrete and discretized observations.

When the observations are discrete, there is no problem with the Borel paradox and the limiting distribution in (2) can be easily computed; see Remark 4. In particular, if we define Q_y(u) = {θ : y = G(u, θ)}, the GFD is the conditional distribution

V[Q_y(U*)] | {Q_y(U*) ≠ ∅},   (18)

where V[A] selects a (possibly random) element of the closure of the set Ā and U* is an independent copy of U. If A = (a, b) is a finite interval, then we recommend a rule that selects one of the endpoints a or b at random, independent of U* (Hannig 2009). This selection maximizes the variance of the GFD, has also been called "half correction" (Efron 1998; Schweder and Hjort 2002; Hannig and Xie 2012), and is closely related to the well-known continuity correction used in normal approximations.

4.1. Some Common Discrete Distributions

In this subsection, we compute the GFDs for parameters of several popular discrete distributions.

Example 6. Let X be a random variable with distribution function F(y|θ). Assume there is Y so that P_θ(Y ∈ Y) = 1 for all θ, and for each fixed y ∈ Y the distribution function is either a nonincreasing function of θ spanning the whole interval (0, 1), or a constant equal to 1. Similarly, the left limit F(y−|θ) is also either a nonincreasing function of θ spanning the whole interval (0, 1), or a constant equal to 0.

Define the near inverse F^−(a|θ) = inf{y : F(y|θ) ≥ a}. It is well known (Casella and Berger 2002) that if U ∼ U(0, 1), then Y = F^−(U|θ) has the correct distribution, and we use this association as a data-generating equation.

Next, it follows that both Q_y^+(u) = sup{θ : F(y|θ) = u} and Q_y^−(u) = inf{θ : F(y−|θ) = u} exist and satisfy F(y|Q_y^+(u)) = u and F(y−|Q_y^−(u)) = u. Consequently, if U* is an independent copy of U,

P(Q_y^+(U*) ≤ t) = 1 − F(y|t) and P(Q_y^−(U*) ≤ t) = 1 − F(y−|t).

Finally, notice that for all u ∈ (0, 1) the function F^−(u|θ) is nondecreasing in θ and the closure of the inverse image is Q̄_y(u) = {Q_y^−(u), Q_y^+(u)}. Since the condition in (18) has probability 1, there is no conditioning and the half-corrected GFD has distribution function

R(θ|y) = 1 − [F(y|θ) + F(y−|θ)]/2.

If either of the distribution functions is constant, we interpret it as a point mass at the appropriate boundary of the parameter space.

An analogous argument shows that if the distribution function and its left limit were increasing in θ, then the half-corrected GFD would have distribution function

R(θ|y) = [F(y|θ) + F(y−|θ)]/2.

Using this result, we provide a list of the half-corrected GFDs for three well-known discrete distributions. Here, we understand Beta(0, n + 1) and Beta(x + 1, 0) as the degenerate distributions (Dirac measures) on 0 and 1, respectively. Similarly, we understand Gamma(0, 1) as the degenerate distribution (Dirac measure) on 0.
- X ∼ Binomial(n, p) with n known. The GFD is the 50–50 mixture of Beta(x + 1, n − x) and Beta(x, n − x + 1) distributions; see Hannig (2009).
- X ∼ Poisson(λ). The GFD is the 50–50 mixture of Gamma(x + 1, 1) and Gamma(x, 1) distributions; see Dempster (2008).
- X ∼ Negative Binomial(r, p) with r known. The GFD is the 50–50 mixture of Beta(r, x − r + 1) and Beta(r, x − r) distributions; see Hannig (2014).

Example 7. Next we consider Y ∼ Multinomial(n, p_1, ..., p_k), where n is known and p_i ≥ 0, Σ_{i=1}^k p_i = 1 are unknown. When the categories of the multinomial have a natural ordering, Hannig (2009) suggested writing Y = Σ_{i=1}^n X_i, q_l = Σ_{i=1}^l p_i, and modeling each X_i through the data-generating equation

X_i = ( I_{(0,q_1)}(U_i), I_{[q_1,q_2)}(U_i), ..., I_{[q_{k−1},1)}(U_i) ), i = 1, ..., n,

where U_1, ..., U_n are iid U(0, 1) random variables. Denote the first quadrant Q = {q : 0 ≤ q_1 ≤ ... ≤ q_{k−1} ≤ 1}. Hannig (2009) showed that the GFD (18) for q is given by

V[ {q ∈ Q : U*_{(Σ_{j=1}^i y_j)} ≤ q_i ≤ U*_{(1 + Σ_{j=1}^i y_j)}, i = 1, ..., k − 1} ],

where y_j is the jth component of the observed y and U*_{(j)} is the jth order statistic of U*_1, ..., U*_n, an independent copy of U. The GFD for p is then obtained by a simple transformation. Hannig (2009) showed good asymptotic and small sample properties of this GFD.

A drawback of the solution above is its dependence on the ordering of the categories. Lawrence et al. (2009) provided a solution that does not rely on a potentially arbitrary ordering of the categories. Their approach starts from analyzing each coordinate of Y individually. As can be seen from Example 6, the fiducial inversion of each coordinate when ignoring the others gives a relationship U_i ≤ p_i ≤ 1, where the U_i ∼ Beta(y_i, 1) are independent. Additionally, the fact that Σ_{i=1}^k p_i = 1 imposes the condition Σ_{i=1}^k U_i ≤ 1. Consider the following random vector with its distribution taken as the conditional distribution

(W_0, W_1, ..., W_k) ∼ (1 − U_1 − ··· − U_k, U_1, ..., U_k) | {U_1 + ··· + U_k ≤ 1}.

A straightforward calculation shows that the vector W follows the Dirichlet(1, y_1, ..., y_k) distribution. Writing Q_y(w) = {p : w_i ≤ p_i, i = 1, ..., k}, the GFD is V[Q_y(W)].

Denote by e_i, i = 1, ..., k the coordinate unit vectors in R^k. Notice that the set Q_y(w) is a simplex with vertices {(w_1, ..., w_k) + e_i w_0, i = 1, ..., k}. The selection rule V analogous to the half correction selects each vertex with equal probability, and the GFD is an equal probability (1/k) mixture of Dirichlet(Y_1 + 1, Y_2, ..., Y_k), ..., Dirichlet(Y_1, Y_2, ..., Y_k + 1).
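The binomial case above is easy to simulate: the half-corrected GFD for p is the 50–50 mixture of Beta(x + 1, n − x) and Beta(x, n − x + 1), and its cdf is exactly R(p|x) = 1 − {F(x|p) + F(x − 1|p)}/2 from Example 6, with F the Binomial(n, p) cdf. The following Python sketch (our illustration; it assumes 0 < x < n so both beta components are nondegenerate) samples the mixture, forms an equal-tailed 95% fiducial interval, and checks the mixture cdf against the half-corrected formula.

```python
import math
import random

def sample_binomial_gfd(x, n, size, rng):
    """Draws from the half-corrected GFD of p given X = x successes in n trials:
    a 50-50 mixture of Beta(x+1, n-x) and Beta(x, n-x+1); assumes 0 < x < n."""
    out = []
    for _ in range(size):
        if rng.random() < 0.5:
            out.append(rng.betavariate(x + 1, n - x))
        else:
            out.append(rng.betavariate(x, n - x + 1))
    return out

def half_corrected_cdf(p, x, n):
    """R(p|x) = 1 - {F(x|p) + F(x-1|p)}/2 with F the Binomial(n, p) cdf."""
    def binom_cdf(k):
        return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
                   for j in range(k + 1))
    return 1.0 - 0.5 * (binom_cdf(x) + binom_cdf(x - 1))

rng = random.Random(42)
x, n = 7, 20
draws = sorted(sample_binomial_gfd(x, n, 20000, rng))
lo, hi = draws[499], draws[19499]            # equal-tailed 95% fiducial interval
emp_cdf_04 = sum(d <= 0.4 for d in draws) / len(draws)
```

The agreement between the empirical mixture cdf and R(p|x) is a direct numerical check of the identity I_p(a, b) = P{Binomial(n, p) ≥ a} relating beta and binomial tail probabilities.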

4.2. Median Lethal Dose (LD50)

Consider an experiment involving k dose levels x_1, x_2, ..., x_k. Each dose level x_i is administered to n_i subjects with y_i positive responses, i = 1, 2, ..., k. Assume that the relationship between dose level x_i and the probability p_i of a positive response can be represented by the logistic-linear model

logit(p_i) = β_0 + β_1 x_i = β_1(x_i − μ),

where μ = −β_0/β_1 represents the median lethal dose (LD50) and logit(p_i) = log{p_i/(1 − p_i)}. The parameter LD50 is frequently of interest in many applied fields. Examples include a measure of toxicity of a compound in a species in quantal bioassay experiments and a measure of difficulty in item response models.

There are three classical methods for estimating LD50: the delta method, Fieller's method, and the likelihood ratio method. If the dose–response curve is steep relative to the spread of doses, then there may be no dose groups, or at most one dose group, with observed mortalities strictly between 0% and 100%. In such cases, the maximum likelihood estimator of β_1 is not calculable, and the delta method and Fieller's method fail to provide a confidence set. Furthermore, when the standard Wald test does not reject the null hypothesis β_1 = 0, Fieller's confidence sets are either the entire real line or unions of disjoint intervals. Likewise, if the null hypothesis could not be rejected by the likelihood ratio test, the likelihood ratio confidence sets are either the entire real line or unions of disjoint intervals.

E, Hannig, and Iyer (2009) proposed a generalized fiducial solution that does not suffer from these issues. They based their inference on the following data-generating equation: Let Y_{ij}, i = 1, ..., k, j = 1, ..., n_i, denote the jth subject's response to the dose level x_i. Since Y_{ij} follows a Bernoulli distribution with success probability p_i = antilogit(β_0 + β_1 x_i),

Y_{ij} = I_{(0, antilogit(β_0 + β_1 x_i))}(U_{ij}), j = 1, ..., n_i, i = 1, ..., k.

Here (β_0, β_1) are unknown parameters and the U_{ij} are independent standard uniform random variables.

The GFD is well defined using (18), and E, Hannig, and Iyer (2009) proposed to use a Gibbs sampler to implement it. They performed a thorough simulation study showing that the generalized fiducial method compares favorably to the classical methods in terms of coverage and median length of the confidence interval for LD50. Moreover, the generalized fiducial method performed well even in situations where the classical methods fail. They also proved that the fiducial confidence intervals give asymptotically correct coverage and that the effect of discretization is negligible in the limit.

4.3. Discretized Observations

In practice, most datasets are rounded off in some manner, say, by a measuring instrument or by storage on a computer. Mathematically speaking, we do not know the exact realized value Y = y. Instead we only observe an occurrence of the event {Y ∈ A_y}, for some multivariate interval A_y = [a, b) containing y and satisfying P_{θ_0}(Y ∈ A_y) > 0, where Y = G(U*, θ_0) is an independent copy of Y.

For example, if the exact value of the random vector Y was y = (π, e, 1.28) and due to instrument precision all the values were rounded to one decimal place, our observation would be the event A_y = [3.1, 3.2) × [2.7, 2.8) × [1.2, 1.3).

Since P_{θ_0}(Y ∈ A_y) > 0, the arguments in Remark 4 still apply and the formula (18) remains valid with Q_y(u) = {θ : G(u, θ) ∈ Ā_y}, where Ā_y is the closure of A_y.

Hannig (2013) proved a fiducial Bernstein–von Mises theorem for discretized data. He assumed that we observe discretized iid observations with a distribution function F(y|θ). He set F^−(a|θ) = inf{y : F(y|θ) ≥ a} and assumed the data-generating equation

Y_i = F^−(U_i | θ), i = 1, ..., n,

where the Y_i are random variables, θ ∈ Θ is a p-dimensional parameter, and the U_i are iid U(0, 1).

We restate the main theorem of Hannig (2013) in the language of this review article:

Theorem 5 (Hannig 2013). Suppose Assumption E.1 in Appendix E holds. Then the GFD defined by (18) has the same asymptotically normal distribution and satisfies Assumption 1 regardless of the choice of V[·]. Consequently, any collection of sets C_n(y_n) that in the limit becomes location invariant will form asymptotically correct confidence intervals.

4.4. Linear Mixed Models

Despite the long history of inference procedures for normal linear mixed models, a well-performing, unified inference method is lacking. Analysis of variance (ANOVA)-based methods offer what tend to be model-specific solutions. Bayesian methods allow for solutions to very complex models, but determining an appropriate prior distribution can be confusing.

Cisewski and Hannig (2012) proposed the use of GFI for discretized linear mixed models that avoids the issues mentioned above. They started with the following data-generating equation:

Y = Xβ + Σ_{i=1}^r Σ_{j=1}^{l_i} σ_i V_{i,j} U_{i,j},

where X is a known n × p fixed-effects design matrix, β is the p × 1 vector of fixed effects, V_{i,j} is the n × 1 design vector for level j of random effect i, l_i is the number of levels of random effect i, σ_i² is the variance of random effect i, and the U_{i,j} are independent and identically distributed standard normal random variables.

To compute the GFD in (18), Cisewski and Hannig (2012) designed a computationally efficient modification of the sequential Monte Carlo (SMC) algorithm (Doucet, De Freitas, and Gordon 2001; Del Moral, Doucet, and Jasra 2006; Douc and Moulines 2008). The fiducial implementation includes a custom designed resampling and modification step that greatly improves the efficiency of the SMC algorithm for this model.

Cisewski and Hannig (2012) performed a thorough simulation study showing that the proposed method yields confidence interval estimation for all parameters of balanced and unbalanced normal linear mixed models. The fiducial intervals were as good as or better than the best tailor-made ANOVA-based solutions for the simulation scenarios covered. In addition, for the models considered by Cisewski and Hannig (2012)

5. Computational Issues

This section presents some of the computational challenges involved when applying GFI in practice, together with some possible solutions to these challenges.

For any given model, we recall that the GFD is defined as the weak limit in (2), and that under fairly general conditions the weak limit has a density r(θ|y) given in (3). This density can often be used directly to form estimates and asymptotic confidence intervals for the model parameters, in a similar manner as the density of the posterior distribution in the Bayesian paradigm. Standard sampling techniques such as MCMC, importance sampling, or sequential Monte Carlo have been successfully implemented; see, for example, Hannig et al. (2006a), Hannig (2009), Hannig and Lee (2009), Wandler and Hannig (2012b), and Cisewski and Hannig (2012).

The exact form of the generalized fiducial density can, however, be hard to compute. For this reason, Hannig, Lai, and Lee (2014) presented a computationally tractable solution for conducting generalized fiducial inference without knowing the exact closed form of the generalized fiducial distribution.

5.1. Evaluating the Generalized Fiducial Density via Subsampling

In some situations, even the denominator of the density r(θ|y) becomes too complicated to evaluate directly, particularly so when the l∞ norm is used in (2). In such situations, the function D(·) in (4) is a sum over all possible tuples of length p, that is,

    D(A) = Σ_{i = (i_1, …, i_p)} |det (A)_i|.

If we have n observations, there are in total (n choose p) possible tuples. If the sum cannot be simplified analytically, one is obliged to compute all (n choose p) terms. Such computations can become prohibitively expensive even for moderate n and p. Appropriate approximations are therefore required to evaluate the density efficiently.

If the observations are iid and the l∞ norm is used, D((d/dθ) G(u, θ)|_{u = G^{-1}(y, θ)}) is a U-statistic. Given the strong dependency of the terms in D(·), it seems possible to use far fewer than (n choose p) terms in the approximation without loss of accuracy. Blom (1976) showed that an incomplete U-statistic based on a random selection of K subsamples behaves very similarly to the complete U-statistic when n and K are large. On the basis of this result, Hannig (2009) and its follow-up articles suggest replacing D(·) with

    D̂(A; I_K) = Σ_{i ∈ I_K} |det (A)_i|,

where I_K is a random selection of K different p-tuples. Numerical simulations confirm that this approximation is very promising for a wide range of applications. In practice, a common choice of K is on the order of hundreds. One should choose K keeping in mind that a small K may fail to yield a good enough approximation, while an unnecessarily large K makes the computation too expensive.

In most algorithms, such as an MCMC sampler, the density is repeatedly evaluated at different values of θ. We recommend keeping the same choice of I_K for all values of θ to improve the stability of the algorithm.

The above discussion also applies to the generalized fiducial density (13) when model selection is involved.
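As a quick illustration of the incomplete-sum idea (a sketch with NumPy, not the authors' implementation; the matrix A and the values of n, p, and K are illustrative), the following compares the complete sum D(A) over all (n choose p) p-tuples with the subsampled D̂(A; I_K), drawing I_K once so that, as recommended above, it can be reused across evaluations at different θ:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def D_full(A):
    """Complete sum of |det| over all p-row subsets of the n x p matrix A."""
    n, p = A.shape
    return sum(abs(np.linalg.det(A[list(idx), :]))
               for idx in itertools.combinations(range(n), p))

def D_hat(A, I_K):
    """Incomplete version: sum |det| only over the tuples in I_K."""
    return sum(abs(np.linalg.det(A[list(idx), :])) for idx in I_K)

n, p, K = 30, 2, 200
A = rng.standard_normal((n, p))  # illustrative stand-in for the Jacobian rows

# Draw I_K once; when evaluating the density at many theta values,
# keep this same I_K for stability of the algorithm.
all_tuples = list(itertools.combinations(range(n), p))
I_K = [all_tuples[j] for j in rng.choice(len(all_tuples), size=K, replace=False)]

# Rescaling by C(n, p) / K makes the incomplete sum an unbiased estimate of
# the complete one; inside the density ratio the constant factor cancels
# when the same I_K is reused, so in practice the rescaling is optional.
estimate = D_hat(A, I_K) * len(all_tuples) / K
```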

6. Concluding Remarks and Open Problems

After many years of investigations, the authors and collaborators have demonstrated that GFI is a useful and promising approach for conducting statistical inference. GFI has been validated by asymptotic theory and by simulation in numerous small-sample problems. In this article, we have summarized the latest theoretical and methodological developments and applications of GFI. To conclude, we list some open and important research problems about GFI.

1. As mentioned earlier, the choice of data-generating equation G in (1) is not unique for many problems. Based on our practical experience gained from simulations, GFD-based intervals are usually conservative and often quite short compared to competing methods for small sample sizes. This property is not well understood, as traditional asymptotic tools (including higher-order asymptotics) do not explain it. Understanding this nonasymptotic phenomenon will likely help both with a deeper understanding of GFI and with the optimal choice of G. Although our numerical experience suggests that different choices of G lead to only small differences in practical performance, it would still be important to develop an objective method for choosing G.

2. As an interesting alternative, one could modify the GFD definition (2) by adding a penalty term p(θ) to encourage sparse solutions, that is, taking the weak limit, as ε → 0, of

    arg min_θ { ‖y − G(U*, θ)‖ + p(θ) }   given   min_θ ‖y − G(U*, θ)‖ ≤ ε.   (19)

For example, in the context of linear regression with an l1 penalty p(·), just as the lasso (Tibshirani 1996) and the Dantzig selector (Candes and Tao 2007) do, (19) will lead to sparse solutions. We stress that while obtaining sparse point estimators through a minimization problem has become a standard technique, (19) produces sparse distributions on the parameter space, also as a result of optimization. This is different from sparse posterior distributions obtained as a result of sparsity priors. The hope is that this approach will lead to computationally efficient ways of quantifying uncertainty in model selection procedures.

3. One possible way to gain a deeper philosophical understanding of GFI is to find a general set of conditions under which GFI is, in some sense, an optimal data-dependent distribution on the parameter space (assuming such a set exists). The work of Taraldsen and Lindqvist (2013), which provides an initial result on a connection between decision theory and fiducial inference, would be a good starting point.

4. It would be interesting to investigate the performance of GFI when the data-generating equation is misspecified. For example, what would happen to the empirical confidence interval coverages if N(0, 1) is used as the random component when the truth is in fact a t-distribution with 3 degrees of freedom?

Lastly, we hope that our contributions to GFI will stimulate the growth, usage, and interest in this exciting approach to statistical inference in various research and application communities.

Supplementary Materials

The online supplementary materials contain the appendices for the article, and code for many of the methods in this review.

Acknowledgment

The authors are thankful to Yifan Cui, who found numerous typos in an earlier version of this article. They are also most grateful to the reviewers and the associate editor for their constructive and insightful comments and suggestions.

Funding

Hannig was supported in part by the National Science Foundation under Grant Nos. 1016441, 1633074, and 1512893. Lee was supported in part by the National Science Foundation under Grant Nos. 1209226, 1209232, and 1512945.

References

Barnard, G. A. (1995), "Pivotal Models and the Fiducial Argument," International Statistical Review, 63, 309–323.
Bayarri, M. J., Berger, J. O., Forte, A., and García-Donato, G. (2012), "Criteria for Bayesian Model Choice With Application to Variable Selection," The Annals of Statistics, 40, 1550–1577.
Beaumont, M. A., Zhang, W., and Balding, D. J. (2002), "Approximate Bayesian Computation in Population Genetics," Genetics, 162, 2025–2035.
Berger, J. O. (1992), "On the Development of Reference Priors" (with discussion), Bayesian Statistics, 4, 35–60.
Berger, J. O., Bernardo, J. M., and Sun, D. (2009), "The Formal Definition of Reference Priors," The Annals of Statistics, 37, 905–938.
——— (2012), "Objective Priors for Discrete Parameter Spaces," Journal of the American Statistical Association, 107, 636–648.
Berger, J. O., and Pericchi, L. R. (1996), "The Intrinsic Bayes Factor for Model Selection and Prediction," Journal of the American Statistical Association, 91, 109–122.
——— (2001), "Objective Bayesian Methods for Model Selection: Introduction and Comparison," in Model Selection (IMS Lecture Notes Monogr. Ser., Vol. 38), ed. P. Lahiri, Beachwood, OH: Inst. Math. Statist., pp. 135–207.
Berger, J. O., and Sun, D. (2008), "Objective Priors for the Bivariate Normal Model," The Annals of Statistics, 36, 963–982.
Birnbaum, A. (1961), "On the Foundations of Statistical Inference: Binary Experiments," The Annals of Mathematical Statistics, 32, 414–435.
——— (1962), "On the Foundations of Statistical Inference," Journal of the American Statistical Association, 57, 269–326.
Blom, G. (1976), "Some Properties of Incomplete U-Statistics," Biometrika, 63, 573–580.
Candes, E., and Tao, T. (2007), "The Dantzig Selector: Statistical Estimation When p is Much Larger Than n," The Annals of Statistics, 35, 2313–2351.
Casella, G., and Berger, R. L. (2002), Statistical Inference (2nd ed.), Pacific Grove, CA: Wadsworth and Brooks/Cole Advanced Books and Software.
Chen, J., and Chen, Z. (2008), "Extended Bayesian Information Criteria for Model Selection With Large Model Spaces," Biometrika, 95, 759–771.
Chiang, A. K. L. (2001), "A Simple General Method for Constructing Confidence Intervals for Functions of Variance Components," Technometrics, 43, 356–367.
Cisewski, J., and Hannig, J. (2012), "Generalized Fiducial Inference for Normal Linear Mixed Models," The Annals of Statistics, 40, 2102–2127.
Dawid, A. P., and Stone, M. (1982), "The Functional-Model Basis of Fiducial Inference," The Annals of Statistics, 10, 1054–1074.
Dawid, A. P., Stone, M., and Zidek, J. V. (1973), "Marginalization Paradoxes in Bayesian and Structural Inference," Journal of the Royal Statistical Society, Series B, 35, 189–233.
Del Moral, P., Doucet, A., and Jasra, A. (2006), "Sequential Monte Carlo Samplers," Journal of the Royal Statistical Society, Series B, 68, 411–436.
Dempster, A. P. (1966), "New Methods for Reasoning Towards Posterior Distributions Based on Sample Data," The Annals of Mathematical Statistics, 37, 355–374.
——— (1968), "A Generalization of Bayesian Inference" (with discussion), Journal of the Royal Statistical Society, Series B, 30, 205–247.
——— (2008), "The Dempster–Shafer Calculus for Statisticians," International Journal of Approximate Reasoning, 48, 365–377.
Douc, R., and Moulines, E. (2008), "Limit Theorems for Weighted Samples With Applications to Sequential Monte Carlo Methods," The Annals of Statistics, 36, 2344–2376.
Doucet, A., De Freitas, N., and Gordon, N. (2001), Sequential Monte Carlo Methods in Practice, New York: Springer.
E, L., Hannig, J., and Iyer, H. K. (2008), "Fiducial Intervals for Variance Components in an Unbalanced Two-Component Normal Mixed Linear Model," Journal of the American Statistical Association, 103, 854–865.
——— (2009), "Applications of Generalized Fiducial Inference," Ph.D. dissertation, Colorado State University, Fort Collins, CO.
Edlefsen, P. T., Liu, C., and Dempster, A. P. (2009), "Estimating Limits From Poisson Counting Data Using Dempster–Shafer Analysis," The Annals of Applied Statistics, 3, 764–790.
Efron, B. (1998), "R. A. Fisher in the 21st Century," Statistical Science, 13, 95–122.
Fan, J., and Lv, J. (2008), "Sure Independence Screening for Ultrahigh Dimensional Feature Space," Journal of the Royal Statistical Society, Series B, 70, 849–911.
Fisher, R. A. (1922), "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Series A, 222, 309–368.
——— (1925), "Theory of Statistical Estimation," Proceedings of the Cambridge Philosophical Society, 22, 700–725.
——— (1930), "Inverse Probability," Proceedings of the Cambridge Philosophical Society, xxvi, 528–535.
——— (1933), "The Concepts of Inverse Probability and Fiducial Probability Referring to Unknown Parameters," Proceedings of the Royal Society of London, Series A, 139, 343–348.
——— (1935), "The Fiducial Argument in Statistical Inference," The Annals of Eugenics, VI, 91–98.
Fraser, A. M., Fraser, D. A. S., and Staicu, A.-M. (2009), "The Second Order Ancillary: A Differential View With Continuity," Bernoulli, 16, 1208–1223.
Fraser, D., and Naderi, A. (2008), "Exponential Models: Approximations for Probabilities," Biometrika, 94, 1–9.
Fraser, D., Reid, N., and Wong, A. (2005), "What a Model With Data Says About Theta," International Journal of Statistical Science, 3, 163–178.
Fraser, D. A. S. (1961a), "On Fiducial Inference," The Annals of Mathematical Statistics, 32, 661–676.
——— (1961b), "The Fiducial Method and Invariance," Biometrika, 48, 261–280.
——— (1966), "Structural Probability and a Generalization," Biometrika, 53, 1–9.
——— (1968), The Structure of Inference, New York-London-Sydney: Wiley.
——— (2004), "Ancillaries and Conditional Inference," Statistical Science, 19, 333–369.
——— (2011), "Is Bayes Posterior Just Quick and Dirty Confidence?" Statistical Science, 26, 299–316.
Fraser, D. A. S., Reid, N., Marras, E., and Yi, G. Y. (2010), "Default Priors for Bayesian and Frequentist Inference," Journal of the Royal Statistical Society, Series B, 72.
Glagovskiy, Y. S. (2006), "Construction of Fiducial Confidence Intervals for the Mixture of Cauchy and Normal Distributions," Master's thesis, Department of Statistics, Colorado State University, Fort Collins, CO.
Hannig, J. (2009), "On Generalized Fiducial Inference," Statistica Sinica, 19, 491–544.
——— (2013), "Generalized Fiducial Inference via Discretization," Statistica Sinica, 23, 489–514.
——— (2014), Discussion of "On the Birnbaum Argument for the Strong Likelihood Principle" by D. G. Mayo, Statistical Science, 29, 254–258.
Hannig, J., E, L., Abdel-Karim, A., and Iyer, H. K. (2006a), "Simultaneous Fiducial Generalized Confidence Intervals for Ratios of Means of Lognormal Distributions," Austrian Journal of Statistics, 35, 261–269.
Hannig, J., Iyer, H. K., and Patterson, P. (2006b), "Fiducial Generalized Confidence Intervals," Journal of the American Statistical Association, 101, 254–269.
Hannig, J., Iyer, H. K., and Wang, J. C.-M. (2007), "Fiducial Approach to Uncertainty Assessment: Accounting for Error Due to Instrument Resolution," Metrologia, 44, 476–483.
Hannig, J., Lai, R. C. S., and Lee, T. C. M. (2014), "Computational Issues of Generalized Fiducial Inference," Computational Statistics and Data Analysis, 71, 849–858.
Hannig, J., and Lee, T. C. M. (2009), "Generalized Fiducial Inference for Wavelet Regression," Biometrika, 96, 847–860.
Hannig, J., Wang, C. M., and Iyer, H. K. (2003), "Uncertainty Calculation for the Ratio of Dependent Measurements," Metrologia, 4, 177–186.
Hannig, J., and Xie, M. (2012), "A Note on Dempster–Shafer Recombinations of Confidence Distributions," Electronic Journal of Statistics, 6, 1943–1966.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999), "Bayesian Model Averaging: A Tutorial" (with discussion), Statistical Science, 14, 382–417; corrected version available at http://www.stat.washington.edu/www/research/online/hoetingl999.pdf.
Iyer, H. K., and Patterson, P. (2002), "A Recipe for Constructing Generalized Pivotal Quantities and Generalized Confidence Intervals," Technical Report 10, Department of Statistics, Colorado State University, Fort Collins, CO.
Iyer, H. K., Wang, C. M. J., and Mathew, T. (2004), "Models and Confidence Intervals for True Values in Interlaboratory Trials," Journal of the American Statistical Association, 99, 1060–1071.
Jeffreys, H. (1940), "Note on the Behrens-Fisher Formula," The Annals of Eugenics, 10, 48–51.
Lai, R. C. S., Hannig, J., and Lee, T. C. M. (2015), "Generalized Fiducial Inference for Ultra-High Dimensional Regression," Journal of the American Statistical Association, 110, 760–772.
Lawrence, E., Liu, C., Vander Wiel, S., and Zhang, J. (2009), "A New Method for Multinomial Inference Using Dempster-Shafer Theory."
Lee, T. C. M. (2002), "Tree-Based Wavelet Regression for Correlated Data Using the Minimum Description Length Principle," Australian and New Zealand Journal of Statistics, 44, 23–39.
Lindley, D. V. (1958), "Fiducial Distributions and Bayes' Theorem," Journal of the Royal Statistical Society, Series B, 20, 102–107.
Liu, Y., and Hannig, J. (2016), "Generalized Fiducial Inference for Binary Logistic Item Response Models," Psychometrika, 81, 290–324.
Majumder, P. A., and Hannig, J. (2015), "Higher Order Asymptotics for Generalized Fiducial Inference," unpublished manuscript.
Martin, R., and Liu, C. (2013), "Inferential Models: A Framework for Prior-Free Posterior Probabilistic Inference," Journal of the American Statistical Association, 108, 301–313.
——— (2015a), "Conditional Inferential Models: Combining Information for Prior-Free Probabilistic Inference," Journal of the Royal Statistical Society, Series B, 77, 195–217.
——— (2015b), Inferential Models: Reasoning with Uncertainty (Vol. 145), Boca Raton, FL: CRC Press.
——— (2015c), "Marginal Inferential Models: Prior-Free Probabilistic Inference on Interest Parameters," Journal of the American Statistical Association, 110, 1621–1631.
Martin, R., and Walker, S. G. (2014), "Asymptotically Minimax Empirical Bayes Estimation of a Sparse Normal Mean Vector," Electronic Journal of Statistics, 8, 2188–2206.
Martin, R., Zhang, J., and Liu, C. (2010), "Dempster-Shafer Theory and Statistical Inference With Weak Beliefs," Statistical Science, 25, 72–87.
McNally, R. J., Iyer, H. K., and Mathew, T. (2003), "Tests for Individual and Population Bioequivalence Based on Generalized p-Values," Statistics in Medicine, 22, 31–53.
Patterson, P., Hannig, J., and Iyer, H. K. (2004), "Fiducial Generalized Confidence Intervals for Proportion of Conformance," Technical Report 2004/11, Colorado State University, Fort Collins, CO.
Salome, D. (1998), "Statistical Inference via Fiducial Methods," Ph.D. dissertation, University of Groningen, Groningen, The Netherlands.
Schweder, T., and Hjort, N. L. (2002), "Confidence and Likelihood," Scandinavian Journal of Statistics, 29, 309–332.
Singh, K., Xie, M., and Strawderman, W. E. (2005), "Combining Information From Independent Sources Through Confidence Distributions," The Annals of Statistics, 33, 159–183.
Sonderegger, D., and Hannig, J. (2014), "Fiducial Theory for Free-Knot Splines," in Contemporary Developments in Statistical Theory: A Festschrift in Honor of Professor Hira L. Koul, eds. S. Lahiri, A. Schick, A. SenGupta, and T. N. Sriram, New York: Springer, pp. 155–189.
Stevens, W. L. (1950), "Fiducial Limits of the Parameter of a Discontinuous Distribution," Biometrika, 37, 117–129.
Taraldsen, G., and Lindqvist, B. H. (2013), "Fiducial Theory and Optimal Inference," The Annals of Statistics, 41, 323–341.
Tibshirani, R. (1996), "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, 58, 267–288.
Tsui, K.-W., and Weerahandi, S. (1989), "Generalized p-Values in Significance Testing of Hypotheses in the Presence of Nuisance Parameters," Journal of the American Statistical Association, 84, 602–607.
——— (1991), Correction: "Generalized p-Values in Significance Testing of Hypotheses in the Presence of Nuisance Parameters," Journal of the American Statistical Association, 86, 256.
Tukey, J. W. (1957), "Some Examples With Fiducial Relevance," The Annals of Mathematical Statistics, 28, 687–695.
Veronese, P., and Melilli, E. (2015), "Fiducial and Confidence Distributions for Real Exponential Families," Scandinavian Journal of Statistics, 42, 471–484.
Wandler, D. V., and Hannig, J. (2011), "Fiducial Inference on the Maximum Mean of a Multivariate Normal Distribution," Journal of Multivariate Analysis, 102, 87–104.
——— (2012a), "A Fiducial Approach to Multiple Comparisons," Journal of Statistical Planning and Inference, 142, 878–895.
——— (2012b), "Generalized Fiducial Confidence Intervals for Extremes," Extremes, 15, 67–87.
Wang, J. C.-M., Hannig, J., and Iyer, H. K. (2012a), "Fiducial Prediction Intervals," Journal of Statistical Planning and Inference, 142, 1980–1990.
——— (2012b), "Pivotal Methods in the Propagation of Distributions," Metrologia, 49, 382–389.
Wang, J. C.-M., and Iyer, H. K. (2005), "Propagation of Uncertainties in Measurements Using Generalized Inference," Metrologia, 42, 145–153.
——— (2006a), "A Generalized Confidence Interval for a Measurand in the Presence of Type-A and Type-B Uncertainties," Measurement, 39, 856–863.
——— (2006b), "Uncertainty Analysis for Vector Measurands Using Fiducial Inference," Metrologia, 43, 486–494.
Wang, Y. H. (2000), "Fiducial Intervals: What Are They?" The American Statistician, 54, 105–111.
Weerahandi, S. (1993), "Generalized Confidence Intervals," Journal of the American Statistical Association, 88, 899–905.
——— (1994), Correction: "Generalized Confidence Intervals," Journal of the American Statistical Association, 89, 726.
——— (1995), Exact Statistical Methods for Data Analysis, Springer Series in Statistics, New York: Springer-Verlag.
Welch, B. L., and Peers, H. W. (1963), "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods," Journal of the Royal Statistical Society, Series B, 25, 318–329.
Wilkinson, G. N. (1977), "On Resolving the Controversy in Statistical Inference," Journal of the Royal Statistical Society, Series B, 39, 119–171.
Xie, M., Liu, R. Y., Damaraju, C. V., and Olson, W. H. (2013), "Incorporating External Information in Analyses of Clinical Trials With Binary Outcomes," The Annals of Applied Statistics, 7, 342–368.
Xie, M., and Singh, K. (2013), "Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review," International Statistical Review, 81, 3–39.
Xie, M., Singh, K., and Strawderman, W. E. (2011), "Confidence Distributions and a Unified Framework for Meta-Analysis," Journal of the American Statistical Association, 106, 320–333.
Xu, X., and Li, G. (2006), "Fiducial Inference in the Pivotal Family of Distributions," Science in China, Series A, 49, 410–432.
Yang, R., and Berger, J. O. (1997), "A Catalogue of Noninformative Priors," Technical Report ISDS 97-42, Duke University, Durham, NC.
Zhang, C., and Huang, J. (2008), "The Sparsity and Bias of the Lasso Selection in High-Dimensional Linear Regression," The Annals of Statistics, 36, 1567–1594.
Zhang, J., and Liu, C. (2011), "Dempster-Shafer Inference With Weak Beliefs," Statistica Sinica, 21, 475–494.