Measuring Nuisance Parameter Effects in Bayesian Inference
Alastair Young, Imperial College London, WHOA-PSI-2017

Acknowledgements: Tom DiCiccio, Cornell University; Daniel Garcia Rasines, Imperial College London; Todd Kuffner, WUSTL.

Background

- Inferences drawn from a Bayesian data analysis depend on the assumed prior distribution.
- A substantial Bayesian robustness literature quantifies the degree to which posterior quantities depend on the prior.
- This is especially important with hierarchical models, where interpretability deeper in the hierarchy becomes challenging.

Approaches

- Global: all possible inferences arising from a class of priors are considered, in the hope that the resulting class of inferences is small, so that inference is robust to the prior. Limited by the difficulty of computing the range of a posterior quantity as the prior varies.
- Local: based on infinitesimal perturbations to the prior, via differentiation of posterior quantities with respect to the prior.
Typical inferential problem

Let $Y = \{Y_1, \ldots, Y_n\}$ be a random sample from an underlying distribution $F_\theta$, indexed by the $(d+1)$-dimensional parameter $\theta = (\theta_1, \ldots, \theta_{d+1}) = (\psi, \chi)$. Here $\psi = \theta_1$ is a scalar parameter of interest, and $\chi$ is a $d$-dimensional nuisance parameter.

Key question

How sensitive is the (marginal) posterior inference on $\psi$ to the presence in the model of the nuisance parameter $\chi$ and to the assumed form of the prior distribution?

Purpose

- Motivated by corresponding frequentist ideas, provide machinery, based on Bayesian asymptotics, to measure the extent of the influence that nuisance parameters have on the inference.
- Offer a direct, practical interpretation of the measures of influence in terms of posterior probabilities (in `full' and `reduced' models).

Key (frequentist) statistic

For testing $H_0: \psi = \psi_0$ against a one-sided alternative, the signed root likelihood ratio statistic is
\[
R \equiv R(\psi_0) = \mathrm{sgn}(\hat\psi - \psi_0)\left[2\{l(\hat\theta) - l(\psi_0, \hat\chi_0)\}\right]^{1/2},
\]
where $l(\theta) = l(\psi, \chi)$ is the log-likelihood function, $\hat\theta = (\hat\psi, \hat\chi)$ is the MLE of $\theta$, and $\hat\chi_0$ is the constrained MLE of $\chi$ given the value $\psi = \psi_0$ of the interest parameter $\psi$.

Bayesian posterior approximation: `full model'

For a continuous prior density $\pi(\theta) = \pi(\psi, \chi)$, the posterior tail probability given the data $Y = (Y_1, \ldots, Y_n)$ is
\[
P(\psi \ge \psi_0 \mid Y) = \Phi(R^*) + O(n^{-3/2}),
\]
where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution,
\[
R^* \equiv R^*(\psi_0) = R + R^{-1} \log(T/R),
\qquad
T \equiv T(\psi_0) = l_\psi(\psi_0, \hat\chi_0)\,
\frac{|-l_{\chi\chi}(\psi_0, \hat\chi_0)|^{1/2}}{|-l_{\theta\theta}(\hat\theta)|^{1/2}}\,
\frac{\pi(\hat\theta)}{\pi(\psi_0, \hat\chi_0)},
\]
and $\hat\psi - \psi_0$ is of order $O(n^{-1/2})$.
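As a concrete illustration of the signed root statistic, here is a minimal numerical sketch in a toy model of our own choosing (not from the slides): $Y_i \sim N(\psi, \chi)$ with mean $\psi$ the interest parameter and variance $\chi$ the nuisance parameter, for which $R(\psi_0) = \mathrm{sgn}(\hat\psi - \psi_0)\{n \log(\hat\chi_0/\hat\chi)\}^{1/2}$ in closed form.

```python
import numpy as np

def signed_root(y, psi0):
    """Signed root likelihood ratio statistic R(psi0) for the toy model
    Y_i ~ N(psi, chi): interest parameter psi (mean), nuisance chi (variance)."""
    n = len(y)
    psi_hat = y.mean()
    chi_hat = ((y - psi_hat) ** 2).mean()    # unconstrained MLE of chi
    chi0_hat = ((y - psi0) ** 2).mean()      # constrained MLE of chi given psi = psi0
    lr = n * np.log(chi0_hat / chi_hat)      # 2{l(theta_hat) - l(psi0, chi0_hat)}
    return np.sign(psi_hat - psi0) * np.sqrt(lr)

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=50)
print(signed_root(y, psi0=1.0))
```

As expected, $R$ vanishes at $\psi_0 = \hat\psi$ and is positive (negative) for $\psi_0$ below (above) $\hat\psi$.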
A factorization

We may factorize $T = T_{\mathrm{INF}} T_{\mathrm{NP}}$, with
\[
T_{\mathrm{INF}} \equiv T_{\mathrm{INF}}(\psi_0) = l_\psi(\psi_0, \hat\chi_0)\,
\{-l^{\psi\psi}(\hat\theta)\}^{1/2}\,
\frac{\pi_\psi(\hat\psi)}{\pi_\psi(\psi_0)},
\qquad
T_{\mathrm{NP}} \equiv T_{\mathrm{NP}}(\psi_0) =
\frac{|-l_{\chi\chi}(\psi_0, \hat\chi_0)|^{1/2}}{|-l_{\chi\chi}(\hat\theta)|^{1/2}}\,
\frac{\pi_{\chi|\psi}(\hat\theta)}{\pi_{\chi|\psi}(\psi_0, \hat\chi_0)},
\]
with $\pi_\psi(\psi)$ the marginal prior density for $\psi$, $\pi_{\chi|\psi}(\theta)$ the conditional prior density for $\chi$ given $\psi$, and $l^{\psi\psi}(\hat\theta)$ the $(\psi, \psi)$ element of $\{l_{\theta\theta}(\hat\theta)\}^{-1}$.

Properties

For Bayesian inference,
\[
R^* = R + Z_{\mathrm{NP}} + Z_{\mathrm{INF}},
\]
where $Z_{\mathrm{NP}} = R^{-1}\log T_{\mathrm{NP}}$ and $Z_{\mathrm{INF}} = R^{-1}\log(T_{\mathrm{INF}}/R)$.

This decomposition has two important properties:

- Both $T_{\mathrm{INF}}$ and $T_{\mathrm{NP}}$ are invariant under interest-respecting transformations, hence $Z_{\mathrm{NP}}$ and $Z_{\mathrm{INF}}$ also have this property.
- $T_{\mathrm{NP}}$ does not appear in the tail probability formula when nuisance parameters are absent, so $Z_{\mathrm{NP}}$ also vanishes in this case.
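As a consistency check on the factorization (our own derivation, not from the slides), note that $\pi(\theta) = \pi_\psi(\psi)\,\pi_{\chi|\psi}(\theta)$ and, by the Schur-complement determinant identity, $|-l_{\theta\theta}(\hat\theta)| = |-l_{\chi\chi}(\hat\theta)|\,\{-l^{\psi\psi}(\hat\theta)\}^{-1}$. Multiplying the two factors then recovers $T$:

```latex
\[
T_{\mathrm{INF}} T_{\mathrm{NP}}
= l_\psi(\psi_0,\hat\chi_0)
  \frac{|-l_{\chi\chi}(\hat\theta)|^{1/2}}{|-l_{\theta\theta}(\hat\theta)|^{1/2}}
  \frac{\pi_\psi(\hat\psi)}{\pi_\psi(\psi_0)}
  \cdot
  \frac{|-l_{\chi\chi}(\psi_0,\hat\chi_0)|^{1/2}}{|-l_{\chi\chi}(\hat\theta)|^{1/2}}
  \frac{\pi_{\chi|\psi}(\hat\theta)}{\pi_{\chi|\psi}(\psi_0,\hat\chi_0)}
= l_\psi(\psi_0,\hat\chi_0)
  \frac{|-l_{\chi\chi}(\psi_0,\hat\chi_0)|^{1/2}}{|-l_{\theta\theta}(\hat\theta)|^{1/2}}
  \frac{\pi(\hat\theta)}{\pi(\psi_0,\hat\chi_0)}
= T.
\]
```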
Relationship with posterior mean

The posterior mean of $R$ satisfies
\[
E\{R(\psi) \mid Y\} = -(g_{\mathrm{NP}} + g_{\mathrm{INF}}) + O(n^{-1}),
\]
where
\[
g_{\mathrm{NP}} = \lim_{\psi_0 \to \hat\psi} Z_{\mathrm{NP}}(\psi_0),
\qquad
g_{\mathrm{INF}} = \lim_{\psi_0 \to \hat\psi} Z_{\mathrm{INF}}(\psi_0).
\]
Standard calculations show that $Z_{\mathrm{NP}}(\psi_0) = g_{\mathrm{NP}} + O(n^{-1})$ and $Z_{\mathrm{INF}}(\psi_0) = g_{\mathrm{INF}} + O(n^{-1})$, for $\psi_0$ in an $O(n^{-1/2})$ neighborhood of $\hat\psi$.

Key observation

$Z_{\mathrm{INF}}$ agrees, to error of order $O(n^{-1})$, with the value it would take in the reduced problem in which the nuisance parameter $\chi$ is set equal to the maximum likelihood estimate $\hat\chi$ and Bayesian inference is considered only for the scalar parameter $\psi$, based on the prior density $\pi_\psi(\psi)$.

It is apparent that $Z_{\mathrm{INF}}$, at least to error of order $O(n^{-1})$, does not account for the presence of nuisance parameters. Nuisance parameters are accounted for by $Z_{\mathrm{NP}}$.
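The quantities $Z_{\mathrm{NP}}$ and $Z_{\mathrm{INF}}$ are computable directly from likelihood quantities. A minimal sketch, again in our own toy model $Y_i \sim N(\psi, \chi)$ with a flat prior $\pi(\psi,\chi) \propto 1$ (so every prior ratio in $T_{\mathrm{NP}}$, $T_{\mathrm{INF}}$ equals one); the closed forms in the comments follow from direct differentiation of the normal log-likelihood and are our own derivation:

```python
import numpy as np

def decomposition(y, psi0):
    """R, Z_NP, Z_INF for Y_i ~ N(psi, chi), flat prior: psi interest, chi nuisance."""
    n = len(y)
    psi_hat = y.mean()
    chi_hat = ((y - psi_hat) ** 2).mean()    # unconstrained MLE of chi
    chi0_hat = ((y - psi0) ** 2).mean()      # constrained MLE of chi given psi0
    R = np.sign(psi_hat - psi0) * np.sqrt(n * np.log(chi0_hat / chi_hat))
    # For this model: -l_{chi chi}(psi0, chi0_hat) = n / (2 chi0_hat^2), so
    # T_NP = chi_hat / chi0_hat; and since l_{psi chi}(theta_hat) = 0,
    # {-l^{psi psi}(theta_hat)}^{1/2} = sqrt(chi_hat / n), giving
    # T_INF = [n (psi_hat - psi0) / chi0_hat] * sqrt(chi_hat / n).
    T_NP = chi_hat / chi0_hat
    T_INF = np.sqrt(n) * (psi_hat - psi0) * np.sqrt(chi_hat) / chi0_hat
    Z_NP = np.log(T_NP) / R
    Z_INF = np.log(T_INF / R) / R
    return R, Z_NP, Z_INF

rng = np.random.default_rng(1)
y = rng.normal(size=40)
R, Z_NP, Z_INF = decomposition(y, psi0=y.mean() - 0.3)
print(R, Z_NP, Z_INF, Z_NP / Z_INF)
```

The ratio $Z_{\mathrm{NP}}/Z_{\mathrm{INF}}$ printed at the end is the candidate influence measure discussed next.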
Proposal

Candidate measures to assess the extent of the influence that the nuisance parameters have on the Bayesian inference might be the ratios $g_{\mathrm{NP}}/g_{\mathrm{INF}}$ and $Z_{\mathrm{NP}}/Z_{\mathrm{INF}}$.

We offer a practical interpretation of these ratios in terms of posterior probabilities, with asymptotics guiding the relevant finite-sample quantity to examine.

Details

Let the reference value $\hat\psi_{1-\alpha}$ be the asymptotic $1-\alpha$ posterior percentage point of $\psi$, given by $R(\hat\psi_{1-\alpha}) = z_\alpha$, where $z_\alpha$ is the $\alpha$-quantile of the standard normal distribution.

The relative difference in the percentile bias of the asymptotic percentage point $\hat\psi_{1-\alpha}$ between the full and the reduced problems is
\[
\frac{\{P(\psi \le \hat\psi_{1-\alpha} \mid Y) - (1-\alpha)\}
      - \{P_{\mathrm{Reduced}}(\psi \le \hat\psi_{1-\alpha} \mid Y) - (1-\alpha)\}}
     {P_{\mathrm{Reduced}}(\psi \le \hat\psi_{1-\alpha} \mid Y) - (1-\alpha)}
= \frac{\exp\{z_{1-\alpha} Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})\} - 1}
       {1 - \exp\{-z_{1-\alpha} Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})\}},
\]
to error of order $O(n^{-1})$. In examples, such as high-dimensional linear regression, this approximation is found to be highly accurate.
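Computing the reference value is a one-dimensional root-finding problem. A minimal sketch, once more in our own toy model $Y_i \sim N(\psi, \chi)$ (mean of interest, variance nuisance), where $R(\psi_0)$ is strictly decreasing in $\psi_0$ and so $\hat\psi_{1-\alpha}$ can be found by bisection:

```python
import numpy as np

def R(y, psi0):
    """Signed root statistic for the toy model Y_i ~ N(psi, chi)."""
    n = len(y)
    psi_hat = y.mean()
    chi_hat = ((y - psi_hat) ** 2).mean()
    chi0_hat = ((y - psi0) ** 2).mean()
    return np.sign(psi_hat - psi0) * np.sqrt(n * np.log(chi0_hat / chi_hat))

def reference_value(y, z_alpha, tol=1e-10):
    """Solve R(psi) = z_alpha by bisection; R is strictly decreasing in psi here."""
    lo, hi = y.mean() - 10 * y.std(), y.mean() + 10 * y.std()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R(y, mid) > z_alpha:
            lo = mid          # R still too large: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(2)
y = rng.normal(size=30)
z_05 = -1.6448536269514722    # 0.05-quantile of the standard normal
psi_95 = reference_value(y, z_05)
print(psi_95, R(y, psi_95))
```

Since $z_{0.05} < 0$, the resulting $\hat\psi_{0.95}$ lies above $\hat\psi$, as expected for an upper posterior percentage point.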
Simplifications

To error of order $O(n^{-1})$ we have
\[
\frac{\exp\{z_{1-\alpha} Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})\} - 1}
     {1 - \exp\{-z_{1-\alpha} Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})\}}
= \frac{Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})}{Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})}
= \frac{g_{\mathrm{NP}}}{g_{\mathrm{INF}}},
\]
the first equality following by Taylor expansion of the exponentials, since $Z_{\mathrm{NP}}(\hat\psi_{1-\alpha})$ and $Z_{\mathrm{INF}}(\hat\psi_{1-\alpha})$ are of order $O(n^{-1/2})$.

The entire prior density is of the form
\[
\pi(\psi, \chi, \beta) = \pi_{\psi,\chi|\beta}(\psi, \chi \mid \beta)\, \pi_\beta(\beta),
\]
while the log-likelihood function depends only on $(\psi, \chi)$.