Spao-temporal dependence: a blessing and a curse for computaon and inference (illustrated by composional data modeling) (and with an introducon to NIMBLE)

Christopher Paciorek UC Berkeley Stacs

Joint work with: The PalEON project team (hp://paleonproject.org) The NIMBLE development team (hp://-nimble.org)

NSF-CBMS Workshop on Bayesian Spaal Stascs August 2017 Funded by various NSF grants to the PalEON and NIMBLE projects PalEON Project

Goal: Improve the predicve capacity of terrestrial ecosystem models

Friedlingstein et al. 2006 J. Climate

“This large variaon among carbon-cycle models … has been called ‘uncertainty’. I prefer to call it ‘ignorance’.” - Prence (2013) Grantham Instute

Crical issue: model parameterizaon and representaon of decadal- to centennial-scale processes are poorly constrained by data Approach: use historical and fossil data to esmate past vegetaon and climate and use this informaon for model inializaon, assessment, and improvement Spao-temporal dependence: a blessing 2 and a curse for computaon and inference Fossil Pollen Data

r Berry Pond West s n te Berry Pond, W Massachusetts e at oll eed P m t c D u al: ni h s & w co r A h c s r a c e orga Ye Pine HemlockBir Oak HickoryBe ChestnGra Cha % European settlement 2000

1900

1800

1700

1600

1500

1400

1300

1200

1100 Onset of Little Ice Age 1000

20 20 40 20 20 40 20 1500 60

Spao-temporal dependence: a blessing 3 and a curse for computaon and inference Selement-era Land Survey Data Survey grid in Wisconsin Surveyor notes

Raw oak tree proporons (on a grid in the western poron and in irregular township areas in the eastern poron)

Spao-temporal dependence: a blessing 4 and a curse for computaon and inference Outline • Applicaon 1: Spaal smoothing of composional data – Seng: Mulvariate data, high-dimensional quanes, non-conjugate models – A hierarchical mulnomial probit model with CAR spaal process – Data augmentaon – How much smoothness (in space)? – Computaonal implicaons • Computaonal tools – Overview of current soware – Introducon to NIMBLE • Applicaon 2: Temporal predicon of biomass from composional data – How much smoothness (in me)? – A hierarchical sck-breaking composional model with Generalized Pareto nonstaonary temporal smoothing – Default MCMC and computaonal challenges – Customized MCMC using NIMBLE • Concluding thoughts Spao-temporal dependence: a blessing 5 and a curse for computaon and inference Applicaon 1: Spaal smoothing of composional data

• Mulvariate: ~20 taxa (species) • Sum-to-one constraint on proporons • 8 km by 8 km grid: • ~10,000 grid points • 1.3 million trees (> 20 cm diameter) in total • ~125 trees per grid cell

Spao-temporal dependence: a blessing 6 and a curse for computaon and inference Applicaon 1: Should we model spaal dependence?

• Yes: • We want to esmate composion at all locaons. • We want to smooth over noise at observed locaons. • We are interested in joint inference for mulple locaons, so we need to account for posterior covariance. • No: • We would need to model the spaal dependence, with the resulng computaonal implicaons.

Spao-temporal dependence: a blessing 7 and a curse for computaon and inference Applicaon 1: Should we model mulvariate dependence?

• Yes: • Taxa do show correlated abundance (taxa have similaries in their ecological characteriscs). • If joint inference on mulple tree species is desired, need mulvariate correlaon structure to properly characterize given our actual knowledge. • No: • Dependence varies by locaon (nonstaonarity) • E.g., hemlock/beech posively correlated in general, but beech not present in some locaons where hemlock appears (different western range limits) • Would require more complex model • Locaons with data have data for all taxa • Imputaon is only spaal not mulvariate • With no measurement error and separable covariance, kriging predicon for a taxon depends only on data from that taxon at other locaons • Inference not focused on mul-taxon funconals

Spao-temporal dependence: a blessing 8 and a curse for computaon and inference Applicaon 1: Spaal smoothing of composional data

• Mulvariate: ~20 taxa (species) • Sum-to-one constraint on proporons • 8 km by 8 km grid: • ~10,000 grid points • 1.3 million trees (> 20 cm diameter) in total • ~125 trees per grid cell

Model overview: • Mulnomial likelihood (no over-dispersion) • One spaal process per taxon • Sum-to-one constraint based on a mulnomial probit specificaon • Otherwise, no mulvariate structure • Spaal process hyperparameters

Spao-temporal dependence: a blessing 9 and a curse for computaon and inference Applicaon 1: Standard Spaal Mulnomial Logit Model

A spaal mulnomial logit model: y Multi(n , ✓(s )) i ⇠ i i exp(gp(si)) ✓p(si)= k exp(gk(si)) g ( ) GP( ) p · ⇠ P p for locaon i and taxon p.

Computaonal implicaons: • No conjugacy! • Can’t integrate analycally over the latent processes • How propose good values of each g process?

Consider McCulloch and Rossi (1994) mulnomial extension of Albert and Chib (1993) data augmentaon (DA) trick for probit regression.

Spao-temporal dependence: a blessing 10 and a curse for computaon and inference Applicaon 1: Spaal Mulnomial Probit Model with Data Augmentaon A spaal mulnomial probit model:

yij = p i↵ wijp = max wijk k w (g (s ), 1) ijp ⇠ N p i g ( ) GP( ) p · ⇠ p for locaon i, tree j, and taxon p.

Computaonal implicaons: • Data augmentaon version allows conjugate updates of each g process • But! Introduce new level in model – higher dimensional and with potenal for cross-level dependence to impede MCMC performance

Spao-temporal dependence: a blessing 11 and a curse for computaon and inference Applicaon 1: How much smoothness?

Applicaon is based on 8 km grid, so CAR style (i.e., Markov random field) models a natural choice. How smooth spaally? • First order (simple neighborhood) CAR models: not smooth spaally.

yij = p i↵ wijp = max wijk k wijp (gp(si), 1) ⇠ N 2 gp (0, pQ) (ICAR) ⇠ N

• Second order (thin-plate spline) CAR models: very smooth spaally. • Lindgren et al (2011) SPDE approximaon to Matern-based Gaussian process: range parameter and limited control over differenability parameter.

Spao-temporal dependence: a blessing 12 and a curse for computaon and inference Applicaon 1: Smoothness and computaon

• Sparse precision matrices • Very computaonally efficient for conjugate updates • Without conjugacy not clear how to generate good proposals for enre spaal field for a taxon, so computaonal efficiency of limited relevance • Locaon-specific updates would mix poorly when there is strong spaal dependence • Simple CAR models may show reasonable mixing for spaal process values with fixed hyperparameters because of lesser spaal smoothness • Cross-level dependence from separate updates of latent data values, spaal process values, spaal hyperparameters • Updates of spaal process and hyperparameters not directly informed by data

Spao-temporal dependence: a blessing 13 and a curse for computaon and inference Applicaon 1: MCMC design

yij = p i↵ wijp = max wijk k wijp (gp(si), 1) ⇠ N 2 g (0, Q) (ICAR) p ⇠ N p • Cross-level dependence from separate updates of latent data values, spaal process values, spaal hyperparameters

• Adequate performance required joint (cross-level) updates of {gp, σp}: • Metropolis proposal for σp with conjugate proposal for gp • Equivalent to marginalizing over gp but avoids correlated truncated normal density for w

Spao-temporal dependence: a blessing 14 and a curse for computaon and inference Applicaon 1: MCMC implementaon

yij = p i↵ wijp = max wijk k wijp (gp(si), 1) ⇠ N 2 g (0, Q) (ICAR) p ⇠ N p • Overall MCMC wrien in R • Truncated normal computaons done in C++ via Rcpp (can also use openMP for parallelizaon)

• Joint {gp, σp} samples done in R using sparse matrix computaons with spam package (which uses Fortran) • Even with customizaon, MCMC takes order of two weeks • Computaon pre-dates NIMBLE but NIMBLE designed to allow users to set up customized MCMC sampling for components of models

• E.g., the joint {gp, σp} sampling could be coded as a user-defined sampler in NIMBLE (and NIMBLE provides such a sampler for some such situaons) Spao-temporal dependence: a blessing 15 and a curse for computaon and inference Applicaon 1: MCMC performance 0.26 0.22 0.80 0.24 0.20 0.75 Ash Beech 0.22 Basswood 0.18 0.70 0.20 0.65 0.16 0 50 150 250 0 50 150 250 0 50 150 250 iteration iteration iteration 1.0 0.20 0.8 0.54 0.6 Gum Birch Cedar 0.18 0.50 0.4 0.2 0.46 0.16 0 50 150 250 0 50 150 250 0 50 150 250 iteration iteration iteration

Trace plots for taxon-specific hyperparameters

Spao-temporal dependence: a blessing 16 and a curse for computaon and inference Applicaon 1: Results

Model selecon: • First order CAR and Lindgren GP approximaon have similar performance but GP approximaon has anomalies at the spaal boundaries. • Second order (thin plate spline) CAR too smooth.

Predicon: hp://gandalf.berkeley.edu:3838/paciorek/setVegComp

Spao-temporal dependence: a blessing 17 and a curse for computaon and inference Bayesian soware landscape Hand-coded algorithms: • R, Python: fast to develop and easy to share, but slow computaon • C++, Rcpp: slower to develop and harder to share, but fast computaon • Julia: fast to develop and fast computaonally but less widely used

Black-box MCMC engines: • JAGS: single variable samplers with a focus on conjugate samplers • : Hamiltonian MC, variaonal Bayes • PyMCMC3: flexible sampler choice, Hamiltonian MC, variaonal Bayes

NIMBLE: • Customizable MCMC and other algorithms plus a system for programming algorithms for hierarchical models in R

Spao-temporal dependence: a blessing 18 and a curse for computaon and inference Applicaon 1: Soware needs • Exploit sparsity • Flexibility in choosing samplers for parts of the model • Joint sampling of spaally-dependent process values • Customize joint sampling of hyperparameters and spaal process to improve mixing • Use compiled code for computaonal bolenecks

Notes: • NIMBLE can’t do all of this yet (no sparse matrices right now), but designed for such flexibility • Would be interesng to compare performance of my customized sampling to Stan’s HMC

Spao-temporal dependence: a blessing 19 and a curse for computaon and inference Exisng soware

Model Algorithm

Y(1) Y(2) Y(3)

X(1) X(2) X(3)

e.g., BUGS (WinBUGS, OpenBUGS, JAGS), INLA, Stan, various R packages NIMBLE: The Goal

Model Algorithm language

Y(1) Y(2) Y(3)

X(1) X(2) X(3) + = Divorcing Model Specificaon from Algorithm

MCMC Flavor 1 Your new method

Y(1) Y(2) Y(3) MCMC Flavor 2 Data cloning

X(1) X(2) X(3) Parcle Filter MCEM

Quadrature Importance Sampler Maximum likelihood

Spao-temporal dependence: a blessing 22 and a curse for computaon and inference NIMBLE’s goals

– Retaining BUGS compability – Providing a variety of standard algorithms – Allowing developers to add new algorithms (including modular combinaon of algorithms) – Allowing users to operate within R – Providing speed via compilaon to C++, with R wrappers

Spao-temporal dependence: a blessing 23 and a curse for computaon and inference NIMBLE System Summary

R objects + R under the hood stascal model (BUGS code) + algorithm (nimbleFuncon)

R objects + C++ under the hood

² We generate C++ code, ² compile and load it, ² provide interface object.

Spao-temporal dependence: a blessing 24 and a curse for computaon and inference NIMBLE

1. Model specificaon

BUGS language è R/C++ model object

2. Algorithm

MCMC, Parcle Filter/Sequenal MC, etc.

3. Programming algorithms NIMBLE within R è R/C++ algorithm object

Spao-temporal dependence: a blessing 25 and a curse for computaon and inference The Success of R

Spao-temporal dependence: a blessing 26 and a curse for computaon and inference NIMBLE: Programming with Models

liersCode <- nimbleCode( { for(j in 1:G) { for(I in 1:N) { r[i, j] ~ dbin(p[i, j], n[i, j]); p[i, j] ~ dbeta(a[j], b[j]); } You give NIMBLE: mu[j] <- a[j]/(a[j] + b[j]); theta[j] <- 1.0/(a[j] + b[j]); a[j] ~ dgamma(1, 0.001); b[j] ~ dgamma(1, 0.001); } )

> liersModel$a[1] <- 5 # set values in model > simulate(liersModel, ‘p') # simulate from prior You get this: > p_deps <- liersModel$getDependencies(‘p’) # model structure > calculate(liersModel, p_deps) # calculate probability density > getLogProb(pumpModel, ‘r') NIMBLE also extends BUGS: mulple parameterizaons, named parameters, and user-defined distribuons and funcons.

Spao-temporal dependence: a blessing 27 and a curse for computaon and inference User Experience: Specializing an Algorithm to a Model liersModelCode <- modelCode({ sampler_slice <- nimbleFuncon( for(j in 1:G) { setup = funcon((model, mvSaved, control) { for(I in 1:N) { calcNodes <- model$getDependencies(control$targetNode) r[i, j] ~ dbin(p[i, j], n[i, j]); discrete <- model$getNodeInfo()[[control$targetNode]]$isDiscrete() p[i, j] ~ dbeta(a[j], b[j]); […snip…] } run = funcon() { mu[j] <- a[j]/(a[j] + b[j]); u <- getLogProb(model, calcNodes) - rexp(1, 1) theta[j] <- 1.0/(a[j] + b[j]); x0 <- model[[targetNode]] a[j] ~ dgamma(1, 0.001); L <- x0 - runif(1, 0, 1) * width b[j] ~ dgamma(1, 0.001); […snip….] }) …

> liersMCMCconf <- configureMCMC(liersModel) > liersMCMCconf$printSamplers() […snip…] [3] RW sampler; targetNode: b[1], adapve: TRUE, adaptInterval: 200, scale: 1 [4] RW sampler; targetNode: b[2], adapve: TRUE, adaptInterval: 200, scale: 1 [5] conjugate_beta sampler; targetNode: p[1, 1], dependents_dbin: r[1, 1] [6] conjugate_beta sampler; targetNode: p[1, 2], dependents_dbin: r[1, 2] [...snip...] > liersMCMCconf$addSampler(‘a[1]’, ‘slice’, list(adaptInterval = 100)) > liersMCMCconf$addSampler(‘a[2]’, ‘slice’, list(adaptInterval = 100)) > liersMCMCconf$addMonitors(‘theta’) > liersMCMC <- buildMCMC(liersMCMCspec) > liersMCMC_Cpp <- compileNimble(liersMCMC, project = liersModel) > liersMCMC_Cpp$run(20000)

Spao-temporal dependence: a blessing 28 and a curse for computaon and inference NIMBLE

1. Model specificaon

BUGS language è R/C++ model object

2. Algorithm library

MCMC, Parcle Filter/Sequenal MC, MCEM, etc.

3. Programming algorithms NIMBLE programming language within R è R/C++ algorithm object

Spao-temporal dependence: a blessing 29 and a curse for computaon and inference NIMBLE’s algorithm library

– MCMC samplers: • Conjugate, adapve Metropolis, adapve blocked Metropolis, slice, ellipcal slice sampler, parcle MCMC, specialized samplers for parcular distribuons (Dirichlet, CAR) • Flexible choice of sampler for each parameter • User-specified blocks of parameters – Sequenal Monte Carlo (parcle filters) • Various flavors – MCEM – Write your own

Spao-temporal dependence: a blessing 30 and a curse for computaon and inference NIMBLE

1. Model specificaon

BUGS language è R/C++ model object

2. Algorithm library

MCMC, Parcle Filter/Sequenal MC, etc.

3. Algorithm specificaon NIMBLE programming language within R è R/C++ algorithm object

Spao-temporal dependence: a blessing 31 and a curse for computaon and inference NIMBLE: Programming With Models

We want:

• High-level processing (model structure) in R

• Low-level processing in C++

Spao-temporal dependence: a blessing 32 and a curse for computaon and inference NIMBLE: Programming With Models sampler_myRW <- nimbleFuncon( setup = funcon(model, mvSaved, targetNode, scale) { calcNodes <- model$getDependencies(targetNode) }, run = funcon() { model_lp_inial <- calculate(model, calcNodes) proposal <- rnorm(1, model[[targetNode]], scale) 2 kinds of model[[targetNode]] <<- proposal funcons model_lp_proposed <- calculate(model, calcNodes) log_MH_rao <- model_lp_proposed - model_lp_inial

if(decide(log_MH_rao)) jump <- TRUE else jump <- FALSE # …. Various bookkeeping operaons … # })

Spao-temporal dependence: a blessing 33 and a curse for computaon and inference NIMBLE: Programming With Models sampler_myRW <- nimbleFuncon( setup = funcon(model, mvSaved, targetNode, scale) { query model calcNodes <- model$getDependencies(targetNode) structure }, ONCE run = funcon() { model_lp_inial <- calculate(model, calcNodes) proposal <- rnorm(1, model[[targetNode]], scale) model[[targetNode]] <<- proposal model_lp_proposed <- calculate(model, calcNodes) log_MH_rao <- model_lp_proposed - model_lp_inial

if(decide(log_MH_rao)) jump <- TRUE else jump <- FALSE # …. Various bookkeeping operaons … # })

Spao-temporal dependence: a blessing 34 and a curse for computaon and inference NIMBLE: Programming With Models sampler_myRW <- nimbleFuncon( setup = funcon(model, mvSaved, targetNode, scale) { calcNodes <- model$getDependencies(targetNode) }, run = funcon() { model_lp_inial <- calculate(model, calcNodes) proposal <- rnorm(1, model[[targetNode]], scale) the actual model[[targetNode]] <<- proposal (generic) model_lp_proposed <- calculate(model, calcNodes) algorithm log_MH_rao <- model_lp_proposed - model_lp_inial

if(decide(log_MH_rao)) jump <- TRUE else jump <- FALSE # …. Various bookkeeping operaons … # })

Spao-temporal dependence: a blessing 35 and a curse for computaon and inference The NIMBLE compiler (run code) Feature summary: • R-like matrix algebra (using Eigen library) • R-like indexing (e.g. X[1:5,]) • Use of model variables and nodes • Model calculate (logProb) and simulate funcons • Sequenal integer iteraon • If-then-else, do-while • Access to much of Rmath.h (e.g. distribuons) • Automac R interface / wrapper • Call out to your own C/C++ or back to R • Many improvements / extensions planned Spao-temporal dependence: a blessing 36 and a curse for computaon and inference NIMBLE: What can I program? • Your own distribuon for use in a model • Your own funcon for use in a model • Your own MCMC sampler for a variable in a model • A new MCMC sampling algorithm for general use • A new algorithm for hierarchical models • An algorithm that composes other exisng algorithms (e.g., MCMC-SMC combinaons)

Spao-temporal dependence: a blessing 37 and a curse for computaon and inference NIMBLE: What can I program? • Your own distribuon for use in a model • Your own funcon for use in a model • Your own MCMC sampler for a variable in a model • A new MCMC sampling algorithm for general use • A new algorithm for hierarchical models • An algorithm that composes other exisng algorithms (e.g., MCMC-SMC combinaons)

Spao-temporal dependence: a blessing 38 and a curse for computaon and inference Status of NIMBLE and Next Steps

• First release was June 2014 with regular releases since. Lots to do: – Improve the user interface and speed up compilaon – Refinement/extension of the NIMBLE programming language • e.g., automac differenaon, parallelizaon, sparse matrices – Addional algorithms wrien in NIMBLE DSL • e.g., normalizing constant calculaons, Laplace approximaons, HMC and other samplers • Bayesian nonparametrics with Claudia Wehrhahn Cortes and Abel Rodriguez (UCSC)

• Interested? – Announcements: nimble-announce Google site – User support/discussion: nimble-users Google site – Write an algorithm using NIMBLE! – Help with development of NIMBLE: email [email protected] or see github.com/nimble-dev

Spao-temporal dependence: a blessing 39 and a curse for computaon and inference Applicaon 2: Predicng biomass from composional data

Calibraon: at selement me we have biomass esmates (based on survey data and a

spaal model) and pollen composion (from sediment cores) Biomass Point Estiamtes at Settlement

Biomass esmates at ponds 60 50 40 30 Frequency 20 10

●● 0 ● ●● ●

● ● 0 ● 10 20 30 40 50 75

● 100 125 150 ● ●●● ●● ● Biomass (Mg/ha) ●●●●●●●●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ●●● ●●● ●● ● ●● ● ●● ● ● ●● ● ● ● ●● ●●● ● ● ●●● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●

● ●

Predicon: based on calibraon model and pollen composion over me, predict biomass

r Berry Pond West s n te Berry Pond, W Massachusetts e at oll eed P m t c D u al: ni h s & w co r A h c s r a c e orga Ye Pine HemlockBir Oak HickoryBe ChestnGra Cha % European settlement 2000

1900

1800

1700

1600

1500

1400

1300

1200

1100 Onset of Little Ice Age 1000

20 20 40 20 20 40 20 1500 60

Spao-temporal dependence: a blessing 40 and a curse for computaon and inference Applicaon 2: Calibraon model • Pollen proporon for each taxon determined by transformaon of a flexible (spline) funcon of biomass • shape1 and shape2 parameters of beta distribuon are splines of biomass • Primary calibraon parameters are spline coefficients • Mulnomial likelihood for pollen counts given modeled proporons • Fit in NIMBLE (could be fit in various other packages)

Spao-temporal dependence: a blessing 41 and a curse for computaon and inference Applicaon 2: Calibraon model fit

PINUSX prairie

● ● ● ● ●● ● ● ●● ● ●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ●● ● ●●●● ●● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ●● ● ●●●●● ●●●●● ●●●● ● ● ●●● ● ● ●●●●●●●● ● ●●●●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●● ●● ●●● ● ●●● ●● ● ●● ●●●● ● ● ●●●●●●●●●●●● ●●● ● ● ●● ● ●● ●●●●●●●● ●●●●●●●● ● ●●●●● pollen prop ● ● ●●●●● ●● pollen prop ●●●● ● ● ●●● ● ●● ●●●●●●●● ● ● ●●● ●●● ●● ● ● ● ● ● ● ●●● ● ●● ●●● ●● ●● ●● ●● ●●●●●●●● ● ●●● ● ● ●●● ● ● ●● ●●● ● ● ● ●●●●●●●● ● ●●●● ● ●●● ● ● ● ●●●●●●● ● ●●●●●●●●● ● ●●●●●●●● ● ● ●●●● ●● ●● ● ●●●●●●● ● ● ● ●●●●●● ● ●●●●● ● ●● ● ●●● ●●●●●●●●●●● ● ●● ● ●●●● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ● ● ● ● ● ●● ●● 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.2 0.4 0.6 0.8 0 20 40 60 80 100 140 0 20 40 60 80 100 140

biomass biomass

QUERCUS other herbs

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ●●●●● ● ● ●●●●● ●●●● ● ●● ●● ● ● ● ● ● pollen prop ●● ● pollen prop ●● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●●●●● ● ● ●●●● ● ●● ● ● ● ●● ●●●● ● ●● ●●● ● ● ● ●●●● ● ● ●● ●● ● ● ● ● ●● ● ●● ●●● ● ● ●● ●● ●●●●●●●●● ● ● ●●●● ● ●● ● ● ● ● ●●●● ● ● ●● ● ● ●●●● ● ●●●●●● ●●●● ●●●●●●●●●●●●● ●●●●● ●●●●●●●●●● ● ● ● ● ● ●●●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ● ● ●●●● ● ●●●●●●●●●● ● ● ● ●●●● ● ● ●●● ●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●●●●●●● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ● ●● ●●● ●● ● ● ● ●●●●● ● ● ●●● ● ●●●●● ●●●●● ●●●●●● ●●●●●●●● ●● ●● ● ● ● ●● ● ● ● ●● ●● ●● ●● ● ● ● ● ● ● ●● ●● ● 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 0.5

0 20 40 60 80 100 140 0 20 40 60 80 100 140

biomass biomass Mean and variability of modeled pollen proporons across ponds vary with biomass

Spao-temporal dependence: a blessing 42 and a curse for computaon and inference Applicaon 2: Predicon Model

taxon k

σ β1k , β2k

bt α1tk, α2tk

Z(bt) ptk

Ytk me t

Spao-temporal dependence: a blessing 43 and a curse for computaon and inference Applicaon 2: Predicon Model for(t in 1:nTimes) pollen likelihood Y[t, 1] ~ dbetabin(alpha1[t, 1], alpha2[t, 1], n[t]) for(k in 2:(nTaxa-1)) { Y[t, k] ~ dbetabin(alpha1[t, k], alpha2[t, k], n[t]-sum(Y[t, 1:(k-1)]) for (k in 1:nTaxa) for(t in 1:nTimes) { alpha1[t, k] <- exp(Zb[t, 1:nKnots] %*% beta1[1:nKnots, k]) alpha2[t, k] <- exp(Zb[t, 1:nKnots] %*% beta2[1:nKnots, k]) latent } predictor for( t in 1:nTimes) Zb[t, 1:nKnots] <- bspline(b[t], knots[1:nKnots]) for(t in 2:nTimes) b[t] ~ dnorm(b[t-1]), sd = sigma) biomass evoluon sigma ~ dunif(0, 10) # Gelman (2006) b[1] ~ dunif(0, 400) hyperpriors

Spao-temporal dependence: a blessing 44 and a curse for computaon and inference Applicaon 2: Biomass predicon at one site

Calibraon sites and predicon site (red) Biomass over me 50

100 forest woodland ● ● 80 ● ● ● ● ●● ● ● 48 ● ● ● ● ● ● ● ● ● ● ● ●●●●● ●●● ● ●● ● ● ●● ● ● ● ●

● 60 ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●●● ● ● ●●● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ●●● ● ●●●●● ●●●●●● ● ● ●● ● ● ●●

46 ● ● ● ●●● ● ● ● ● biomass ● ● ● 40 ● ●● ●●● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● 20 grassland ●●●● ● ●● ● ● ● ● ● ●

44 ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● 0 ●

● ●

42 −8000 −6000 −4000 −2000 0 2000

−95 −90 −85 year Key ecological queson: how does biomass (carbon storage) evolve over me? Stascal queson: how to model temporal process? Smoothness? • Discrete first-order autoregressive (i.e., CAR) model is not smooth • Discrete second-order autoregressive (i.e., thin plate spline) is very smooth • Nonstaonarity? Spao-temporal dependence: a blessing 45 and a curse for computaon and inference Applicaon 2: Generalized Pareto / Trend filtering

• Discrete autoregressive model is a model (prior) for temporal contrasts (in biomass) • Nonstaonarity could be achieved by seng some contrasts to zero • Reversible jump • L1 prior (Laplace / double exponenal) a la the Lasso • Generalized Pareto extends the Laplace prior based on extensive work on properes of shrinkage priors (Carvalho et al (2010), Tansey et al. (2016), Taddy (2013)) • Looks like Laplace prior but with faer tails • Could consider first-order (piecewise constant model), second-order (piecewise linear), third-order (piecewise quadrac) contrasts

Spao-temporal dependence: a blessing 46 and a curse for computaon and inference Applicaon 2: Generalized Pareto / Trend filtering • Marginalized model (third order) bt GenPar(3bt 1 3bt 2 + bt 3, , ) ⇠ • Sparsity-inducing prior and modeling of contrasts produces very complicated and oen very strong temporal dependence • Hard to make good MCMC proposals

• Model (third order) with data augmentaon

bt (3bt 1 3bt 2 + bt 3, !t) ⇠ N ! Exp(2/2) t ⇠ t Ga( , ) t ⇠ • Now have normal prior for b1:T but no conjugacy so sll hard to find good proposals • And we have addional hierarchical levels that can impede MCMC

mixing Spao-temporal dependence: a blessing 47 and a curse for computaon and inference Applicaon 2: MCMC performance

Mixing with data augmentaon using Mixing in marginalized model using default NIMBLE MCMC HMC in Stan 130 130 125 125 120 120 115 biomass at − 5000 years biomass at − 5000 years 115 110 chain 1 chain 2 110

0 200 400 600 800 1000 0 200 400 600 800 1000 MCMC iteration MCMC iteration (recall non-differenable spike at zero from generalized Pareto)

Spao-temporal dependence: a blessing 48 and a curse for computaon and inference Applicaon 2: MCMC performance (2)

Stan-based posterior correlaons of biomass process values

chain 1 chain 2

1.0 1.0 1.0 1.0 0.8 0.8 0.5 0.5 0.6 0.6

0.0 0.0 0.4 0.4

−0.5 −0.5 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

Spao-temporal dependence: a blessing 49 and a curse for computaon and inference Applicaon 2: Customized block sampling in NIMBLE

1. Use data augmentaon with normal approximaon to likelihood [y|b] at each point to provide approximately conjugate proposals for biomass process • Simple to approximate with mode and curvature of likelihood 2. Joint updates for : bivariate random walk for !t, t,bt l:t+l hyperparameters and approximate conjugate update for biomass process values • Joint updang of hyperparameters and process addresses cross-level dependency • Joint updang of mulple biomass values addresses temporal dependency • Local neighborhood updates for biomass reduce computaon and avoid high-dimensional approximate conjugacy

Sampling done in NIMBLE using a user-defined sampler, combined with standard samplers for other model parameters.

Spao-temporal dependence: a blessing 50 and a curse for computaon and inference Applicaon 2: Customized MCMC performance

Mixing with data augmentaon using customized NIMBLE MCMC 135 1.0 1.0

120 0.8 0.8 0.6 125 0.6 80 0.4 0.2 sigma 0.4

115 0.0 40

0.2 −0.2 −0.4 biomass at − 3000 years 0.0 0 105 0.0 0.4 0.8 0 2000 4000 0 2000 4000 MCMC iteration MCMC iteration

Spao-temporal dependence: a blessing 51 and a curse for computaon and inference Applicaon 2: Inial results 150 ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●

● ● 100 biomass

50 ● ● ● ● ● ●

● pointwise MLE 0

−8000 −6000 −4000 −2000 0 year Spao-temporal dependence: a blessing 52 and a curse for computaon and inference Concluding thoughts • The spao(-temporal) dependence we need for smoothing/predicon can greatly affect algorithm performance. • Blocked sampling can address dependence but good proposals can be hard to find, parcularly with: – non-conjugate models and – dependence across model levels. • Even with algorithm advances, computaonal limitaons sll greatly limit our ability to fit rich model structures. • NIMBLE provides a plaorm for – customizing algorithms for parcular models and – developing general-purpose algorithms for hierarchical models.

Spao-temporal dependence: a blessing 53 and a curse for computaon and inference PalEON Acknowledgements

• Pollen-biomass Collaborators: Ann Raiho, Jason McLachlan (Notre Dame Biology) • PalEON invesgators: Jason McLachlan (Notre Dame, PI), Mike Dietze (Boston U.), Andrew Finley (Michigan State), Amy Hessl (West Virginia), Phil Higuera (Idaho), Mevin Hooten (USGS/Colorado State), Steve Jackson (USGS/Arizona), Dave Moore (Arizona), Neil Pederson (Harvard Forest), Jack Williams (Wisconsin), Jun Zhu (Wisconsin) • NSF Macrosystems Program

Spao-temporal dependence: a blessing 54 and a curse for computaon and inference NIMBLE Acknowledgements

NIMBLE development team: • Perry de Valpine (PI) UC Berkeley Environmental Science, Policy and Management (ESPM) • Daniel Turek Williams College • Nick Michaud UC Berkeley Stascs and ESPM • Fritz Obermeyer UC Berkeley Stascs and ESPM • Duncan Temple Lang UC Davis Stascs • and various development team alumni

NIMBLE can be installed from CRAN in the usual way for an R package, and a full website with link to the User Manual is at hp://r-nimble.org.

Spao-temporal dependence: a blessing 55 and a curse for computaon and inference

References

• Albert, James H., and Siddhartha Chib. "Bayesian analysis of binary and polychotomous response data." Journal of the American stascal Associaon 88.422 (1993): 669-679.

• Carvalho, Carlos M., Nicholas G. Polson, and James G. Sco. "The horseshoe esmator for sparse signals." Biometrika 97.2 (2010): 465-480.

• de Valpine, Perry, et al. "Programming with models: wring stascal algorithms for general model structures with NIMBLE." Journal of Computaonal and Graphical Stascs 26.2 (2017): 403-413.

• Lindgren, Finn, Håvard Rue, and Johan Lindström. "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochasc paral differenal equaon approach." Journal of the Royal Stascal Society: Series B (Stascal Methodology) 73.4 (2011): 423-498.

• McCulloch, Robert, and Peter E. Rossi. "An exact likelihood analysis of the mulnomial probit model." Journal of Econometrics 64.1 (1994): 207-240.

• Taddy, Ma. "Mulnomial inverse regression for text analysis." Journal of the American Stascal Associaon 108.503 (2013): 755-770.

• Tansey, Wesley, Athey, A., Reinhart, A., and Sco, J. G. "Mulscale spaal density smoothing: an applicaon to large-scale radiological survey and anomaly detecon." Journal of the American Stascal Associaon just-accepted (2017).

Spao-temporal dependence: a blessing 56 and a curse for computaon and inference