Accelerated Gibbs Sampling of Normal Distributions Using Matrix Splittings and Polynomials
Bernoulli 23(4B), 2017, 3711–3743
DOI: 10.3150/16-BEJ863

COLIN FOX1 and ALBERT PARKER2

1Department of Physics, University of Otago, Dunedin, New Zealand. E-mail: [email protected]
2Center for Biofilm Engineering, Department of Mathematical Sciences, Montana State University, Bozeman, MT, USA. E-mail: [email protected]

Standard Gibbs sampling applied to a multivariate normal distribution with a specified precision matrix is equivalent in fundamental ways to the Gauss–Seidel iterative solution of linear equations in the precision matrix. Specifically, the iteration operators, the conditions under which convergence occurs, and geometric convergence factors (and rates) are identical. These results hold for arbitrary matrix splittings from classical iterative methods in numerical linear algebra, giving easy access to mature results in that field, including existing convergence results for antithetic-variable Gibbs sampling, REGS sampling, and generalizations. Hence, efficient deterministic stationary relaxation schemes lead to efficient generalizations of Gibbs sampling. The technique of polynomial acceleration that significantly improves the convergence rate of an iterative solver derived from a symmetric matrix splitting may be applied to accelerate the equivalent generalized Gibbs sampler. Identicality of error polynomials guarantees convergence of the inhomogeneous Markov chain, while equality of convergence factors ensures that the optimal solver leads to the optimal sampler. Numerical examples are presented, including a Chebyshev accelerated SSOR Gibbs sampler applied to a stylized demonstration of low-level Bayesian image reconstruction in a large 3-dimensional linear inverse problem.

Keywords: Bayesian inference; Gaussian Markov random field; Gibbs sampling; matrix splitting; multivariate normal distribution; non-stationary stochastic iteration; polynomial acceleration

1. Introduction

The Metropolis–Hastings algorithm for MCMC was introduced to mainstream statistics around 1990 (Robert and Casella [48]), though prior to that the Gibbs sampler provided a coherent approach to investigating distributions with Markov random field structure (Turčin [60], Grenander [32], Geman and Geman [25], Gelfand and Smith [23], Besag and Green [11], Sokal [58]). The Gibbs sampler may be thought of as a particular Metropolis–Hastings algorithm that uses the conditional distributions as proposal distributions, with acceptance probability always equal to 1 (Geyer [26]).

In statistics, the Gibbs sampler is popular because of its ease of implementation (see, e.g., Roberts and Sahu [51]), when conditional distributions are available in the sense that samples may be drawn from the full conditionals. However, the Gibbs sampler is not often presented as an efficient algorithm, particularly for massive models. In this work, we show that generalized and accelerated Gibbs samplers are contenders for the fastest sampling algorithms for normal target distributions, because they are equivalent to the fastest algorithms for solving systems of linear equations.
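To make the pairing stated in the abstract concrete, consider a zero-mean target N(0, A^{-1}) with precision matrix A. The following minimal sketch is ours, not code from the paper; the function names and the use of NumPy are our own assumptions. A forward sweep of component-wise Gibbs sampling is exactly a forward Gauss–Seidel sweep for Ax = 0 plus an injected noise term:

```python
import numpy as np

def gauss_seidel_sweep(A, b, x):
    """One forward Gauss-Seidel sweep for A x = b, with A symmetric positive definite."""
    n = len(x)
    for i in range(n):
        # Solve the i-th equation for x[i], using the latest values of the other components.
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def gibbs_sweep(A, x, rng):
    """One forward sweep of component-wise Gibbs sampling targeting N(0, A^{-1}).

    The full conditional of x_i given the rest is N(mu_i, 1/A_ii) with
    mu_i = -(sum_{j != i} A_ij x_j) / A_ii, so each update is the
    Gauss-Seidel update for b = 0 plus scaled standard normal noise.
    """
    n = len(x)
    for i in range(n):
        mu = -(A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        x[i] = mu + rng.standard_normal() / np.sqrt(A[i, i])
    return x
```

Iterating gibbs_sweep from any starting vector produces a Markov chain whose iteration operator, convergence condition, and geometric convergence factor coincide with those of gauss_seidel_sweep; this is the sense of equivalence the paper develops.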
Almost all current MCMC algorithms, including Gibbs samplers, simulate a fixed transition kernel that induces a homogeneous Markov chain that converges geometrically in distribution to the desired target distribution. In this respect, modern variants of the Metropolis–Hastings algorithm are unchanged from the Metropolis algorithm as first implemented in the 1950s. The adaptive Metropolis algorithm of Haario et al. [34] (see also Roberts and Rosenthal [50]) is an exception, though it converges to a geometrically convergent Metropolis–Hastings algorithm that bounds convergence behaviour.

We focus on the application of Gibbs sampling to drawing samples from a multivariate normal distribution with a given covariance or precision matrix. Our concern is to develop generalized Gibbs samplers with optimal geometric, or better than geometric, distributional convergence by drawing on ideas in numerical computation, particularly the mature field of computational linear algebra. We apply the matrix-splitting formalism to show that fixed-scan Gibbs sampling from a multivariate normal is equivalent in fundamental ways to a stationary linear iterative solver applied to systems of equations in the precision matrix. Stationary iterative solvers are now considered to be very slow precisely because of their geometric rate of convergence, and are no longer used for large systems. However, they remain a basic building block in the most efficient linear solvers.

By establishing equivalence of error polynomials, we provide a route whereby acceleration techniques from numerical linear algebra may be applied to Gibbs sampling from normal distributions. The fastest solvers employ non-stationary iterations, hence the equivalent generalized Gibbs sampler induces an inhomogeneous Markov chain. Explicit calculation of the error polynomial guarantees convergence, while control of the error polynomial gives optimal performance.

The adoption of the matrix-splitting formalism gives the following practical benefits in the context of fixed-scan Gibbs sampling from normal targets:

1. a one-to-one equivalence between generalized Gibbs samplers and classical linear iterative solvers;
2. rates of convergence and error polynomials for the Markov chain induced by a generalized Gibbs sampler;
3. acceleration of the Gibbs sampler to induce an inhomogeneous Markov chain that achieves the optimal error polynomial, and hence has the optimal convergence rate for expectations and in distribution;
4. numerical estimates of the convergence rate of the (accelerated) Gibbs sampler in a single chain, and a priori estimates of the number of iterations to convergence;
5. access to preconditioning, whereby the sampling problem is transformed into an equivalent problem for which the accelerated Gibbs sampler has an improved convergence rate.

Some direct linear solvers have already been adapted to sampling from multivariate normal distributions, with Rue [52] demonstrating the use of solvers based on Cholesky factorization to allow computationally efficient sampling. This paper extends the connection to the iterative linear solvers. Since iterative methods are the most efficient for massive linear systems, the associated samplers will be the most efficient for very high-dimensional normal targets.
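The matrix-splitting formalism can be indicated in a few lines of code. In the sketch below (ours, with assumed function names), a splitting A = M − N of the precision matrix turns the solver iteration x ← M^{-1}(Nx + b) into a sampler by replacing b with a noise vector drawn with covariance M^T + N, the choice that leaves N(0, A^{-1}) invariant for a convergent splitting; for the Gauss–Seidel splitting, M is the lower triangle of A (including the diagonal) and this reduces to the component-wise sweep shown earlier:

```python
import numpy as np

def splitting_sampler_step(M, N, y, rng):
    """One step of a generalized Gibbs sampler for the splitting A = M - N.

    Draws c ~ N(0, M^T + N) and returns the solution of M y' = N y + c.
    Requires M^T + N (= M + M^T - A) to be positive definite; the iteration
    converges in distribution iff the spectral radius of M^{-1} N is less
    than 1, mirroring the convergence condition of the linear solver.
    """
    noise_cov = M.T + N
    # In practice one exploits the structure of M^T + N (e.g., diagonal for
    # Gauss-Seidel) rather than forming a dense Cholesky factor as done here.
    c = np.linalg.cholesky(noise_cov) @ rng.standard_normal(len(y))
    return np.linalg.solve(M, N @ y + c)

# Example (Gauss-Seidel splitting of a precision matrix A):
#   M = np.tril(A); N = M - A
#   y = splitting_sampler_step(M, N, y, np.random.default_rng())
```

Whether such a step is cheap depends entirely on how cheaply M y' = N y + c can be solved, which is the same consideration that governs the choice of splitting for the solver.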
1.1. Context and overview of results

The Cholesky factorization is the conventional way to produce samples from a moderately sized multivariate normal distribution (Rue [52], Rue and Held [53]), and is also the preferred method for solving moderately sized linear systems. For large linear systems, iterative solvers are the methods of choice due to their inexpensive cost per iteration and small computer memory requirements.

Gibbs samplers applied to normal distributions are essentially identical to stationary iterative methods from numerical linear algebra. This connection was exploited by Adler [1], and independently by Barone and Frigessi [8], who noted that the component-wise Gibbs sampler is a stochastic version of the Gauss–Seidel linear solver, and accelerated the Gibbs sampler by introducing a relaxation parameter to implement the stochastic version of the successive over-relaxation (SOR) variant of Gauss–Seidel. This pairing was further analyzed by Goodman and Sokal [30].

This equivalence is depicted in panels A and B of Figure 1. Panel B shows the contours of a normal density π(x), and a sequence of coordinate-wise conditional samples taken by the Gibbs sampler applied to π. Panel A shows the contours of the quadratic −log π(x) and the Gauss–Seidel sequence of coordinate optimizations or, equivalently, solves of the normal equations ∇ log π(x) = 0. Note how in Gauss–Seidel the step sizes decrease towards convergence, which is a tell-tale sign that convergence (in value) is geometric. In Section 4, we will show that the iteration operator is identical to that of the Gibbs sampler in panel B, and hence the Gibbs sampler also converges geometrically (in distribution). Slow convergence of these algorithms is usually understood in terms of the same intuition: high correlations correspond to long narrow contours, and lead to small steps in coordinate directions, so that many iterations are required to move appreciably along the long axis of the target function.

Roberts and Sahu [51] considered forward then backward sweeps of coordinate-wise Gibbs sampling, with a relaxation parameter, to give a sampler they termed the REGS sampler. This is a stochastic version of the symmetric SOR (SSOR) iteration, which comprises forward then backward sweeps of SOR (a sketch of both the SOR and REGS sweeps appears at the end of this section).

The equality of iteration operators and error polynomials, for these pairs of fixed-scan Gibbs samplers and iterative solvers, allows existing convergence results in numerical analysis texts (for example, Axelsson [5], Golub and Van Loan [29], Nevanlinna [45], Saad [54], Young [64]) to be used to establish convergence results for the corresponding Gibbs sampler. Existing results for rates of distributional convergence of fixed-sweep Gibbs samplers (Adler [1], Barone and Frigessi [8], Liu et al. [39], Roberts and Sahu [51]) may be established this way.

The methods of Gauss–Seidel, SOR, and SSOR give stationary linear iterations that were used as linear solvers in the 1950s, and are now considered very slow. The corresponding fixed-scan Gibbs
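The SOR and REGS sweeps referenced above admit the same kind of sketch as before (again ours, not the paper's code). Adler's SOR sampler relaxes each conditional update by a factor ω ∈ (0, 2) and rescales the noise by √(ω(2 − ω)) so that N(0, A^{-1}) is preserved; the REGS (stochastic SSOR) sampler is a forward sweep followed by a backward sweep:

```python
import numpy as np

def sor_gibbs_sweep(A, x, omega, rng, reverse=False):
    """One sweep of Adler's SOR Gibbs sampler for N(0, A^{-1}), 0 < omega < 2.

    Each component is over-relaxed a fraction omega of the way past its
    conditional mean, and the noise is rescaled by sqrt(omega * (2 - omega))
    so the target is left invariant; omega = 1 recovers the plain
    Gibbs/Gauss-Seidel sweep.
    """
    n = len(x)
    order = reversed(range(n)) if reverse else range(n)
    for i in order:
        mu = -(A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        noise = np.sqrt(omega * (2.0 - omega) / A[i, i]) * rng.standard_normal()
        x[i] = (1.0 - omega) * x[i] + omega * mu + noise
    return x

def regs_sweep(A, x, omega, rng):
    """One REGS (stochastic SSOR) iteration: a forward then a backward SOR sweep."""
    x = sor_gibbs_sweep(A, x, omega, rng, reverse=False)
    return sor_gibbs_sweep(A, x, omega, rng, reverse=True)
```

Invariance follows from a one-line variance calculation: if x_i has conditional variance σ_i² = 1/A_ii, the update leaves it with variance (1 − ω)²σ_i² + ω(2 − ω)σ_i² = σ_i². The symmetric forward-backward sweep of regs_sweep is what makes Chebyshev polynomial acceleration, discussed later in the paper, applicable.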