A JOINT SPATIO-TEMPORAL MODEL OF OPIOID ASSOCIATED DEATHS AND TREATMENT ADMISSIONS IN OHIO
BY
YIXUAN JI
A Thesis Submitted to the Graduate Faculty of
WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES
in Partial Fulfillment of the Requirements
for the Degree of
MASTER OF ARTS
Mathematics and Statistics
May 2019
Winston-Salem, North Carolina
Approved By:
Staci Hepler, Ph.D., Advisor
Miaohua Jiang, Ph.D., Chair
Robert Erhardt, Ph.D.

Acknowledgments
First and foremost, I would like to express my sincere gratitude and appreciation to my advisor, Dr. Staci A. Hepler. Without your continuous support and inspiration, I could not have completed this thesis. Thank you so much for your patience, motivation, enthusiasm, and immense knowledge, which made this research even more enjoyable. In addition to my advisor, I would also like to thank Dr. Miaohua Jiang and Dr. Robert Erhardt for their constant encouragement and advice, and for serving on my thesis committee. Furthermore, this thesis benefited greatly from the courses I took, including Bayesian statistics from Dr. Hepler, factor analysis and Markov chains from Dr. Berenhaut, the Poisson linear model from Dr. Erhardt’s Generalized Linear Model class, R programming from Dr. Nicole Dalzell, and reasoning skills from Dr. John Gemmer’s Real Analysis class. Additionally, I would like to extend my thanks to Dr. Ellen Kirkman, Dr. Stephen Robinson and Dr. Sarah Raynor for their special attention and support to me as an international student. I also appreciate the help and companionship of all my fellow graduate students, faculty and staff in the Mathematics and Statistics Department. Last but not least, I would like to thank my family for their sincere emotional support throughout my life.
Table of Contents
Acknowledgments
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Opioid Misuse in Ohio
  1.2 Data Collection
  1.3 Overview Of Research
Chapter 2 Background Information
  2.1 Introduction to Bayesian Statistics
  2.2 Markov Chain Monte Carlo
    2.2.1 Metropolis-Hastings Algorithm
    2.2.2 Gibbs Sampler
  2.3 Spatial Statistics
    2.3.1 CAR Model
    2.3.2 Spatio-temporal
Chapter 3 Models
  3.1 Models
  3.2 Priors and Computation
  3.3 Model Selection
Chapter 4 Conclusions
  4.1 Model Comparison
  4.2 Effects on Death Rates
  4.3 Effects on Treatment Rates
  4.4 Spatial Factors
  4.5 Scaled Loadings
  4.6 Independent Error Term
Chapter 5 Discussion
  5.1 Censored Observations
  5.2 Covariates
Bibliography
Appendix A Description of covariates
Appendix B Trace Plots
  B.0.1 R code
  B.0.2 Trace Plots
Appendix C Relevant R code
  C.0.1 Model
  C.0.2 Map of Ohio
Curriculum Vitae
Abstract
Opioid misuse is a major public health issue in the United States, and in 2017 Ohio had the second highest age-adjusted drug overdose death rate. In this thesis, we consider a joint spatio-temporal model of county-level surveillance data on deaths and treatment admissions using a latent spatial factor model. Our main goal is to estimate a common spatial factor, which summarizes the underlying joint burden of opioid misuse across space and time. The results are intended to provide a valuable tool for allocating resources across the state in a timely manner.
Keywords: opioid epidemic, factor model, conditional autoregressive, spatio-temporal, resource allocation
List of Figures
1.1 Observed death rates
1.2 Observed treatment rates
1.3 Observed adolescent poverty
4.1 Estimated variance ratio
4.2 Posterior mean rate for death and treatment
4.3 Diagram of the joint spatio-temporal model
4.4 Spatial Factors
4.5 Spatial Loadings
4.6 Error Term D
4.7 Error Term T
B.1 Trace Plot

List of Tables
1.1 Diagnostic Codes used to define treatment
4.1 Posterior mean of intercepts β1, β3 and slopes β2, β4, with marginal coefficients exp(β2), exp(β4), for death rates and treatment rates respectively
4.2 Posterior standard deviation of intercepts β1, β3 and slopes β2, β4 for death rates and treatment rates respectively
A.1 Covariates
Chapter 1: Introduction
1.1 Opioid Misuse in Ohio
Poisoning is one of the leading causes of unintentional injury death in the United States, second only to motor vehicle crashes [9]. Almost all poisoning deaths result from drugs, and most drug poisonings are attributed to the abuse of prescription and illegal drugs. Overdose death rates vary by type of drug. Cocaine and heroin used to be the principal causes; however, deaths related to opioids began increasing at the beginning of the 21st century and came to account for a large number of deaths [45]. Of note, concomitant use of benzodiazepines is implicated in the majority of prescription opioid-related overdose deaths [21].
Opioids are widely used for managing pain from acute injuries, surgeries, and advanced cancer. However, opioid misuse is, in some circumstances, associated with an elevated risk of early death, legal problems, and infectious diseases such as hepatitis C and human immunodeficiency virus (HIV). Accidental drug overdose, suicide, trauma and HIV contribute to the elevated risk of premature mortality among people who misuse opioids [11]. Opioid abuse is currently a national public health epidemic, and the situation continues to worsen across the United States. Opioid-related overdoses account for over 60% of drug overdose deaths in the United States, representing a fourfold increase within the past 15 years [7]. Treatment admissions for prescription opioid dependence rose more than fivefold since 2000 [35].
In particular, the drug overdose death rate in Ohio is above the national average. Nationally, opioid overdose deaths reached 42,249 in 2016 (66.4% of all drug overdose deaths; 13.3 per 100,000 population), a 27.9% rate increase from 2015 [37]. Of note, Ohio ranked among the top five states for opioid-related overdose deaths. Ohio had the third highest opioid overdose death rate in the country in 2016 (32.9 deaths per 100,000), up from 5.9 deaths per 100,000 in 2009 [14].
Fentanyl is an opioid used as a pain medication; however, it has also emerged as an unprecedented threat to public health [17]. There have been numerous reports of inadvertent overuse, intentional misuse, and outbreaks of overdoses involving fentanyl. In 2017, fentanyl and related drugs such as carfentanil were involved in 70.7 percent of overdose deaths in Ohio, while fentanyl was involved in 58.2 percent in 2016, 37.9 percent in 2015, and 19.9 percent in 2014 [28]. The highest rates were observed in southwestern Ohio [26].
In this thesis, our overall goal is to quantify the burden of the opioid epidemic at the county level. Illicit activity is usually modeled using survey data. However, county-level survey data are unavailable or, at best, sparse. Traditional public health surveys of hard-to-reach and stigmatized populations are prone to underreporting and can be prohibitively expensive [29]. Complex survey designs, such as respondent-driven sampling [18], can address these problems, but they are not commonly used because of their time-intensive nature [31]. In Ohio, surveillance data on opioid associated deaths and treatment admissions are routinely collected at the county level. In this research, we develop a multivariate spatio-temporal factor model to jointly model the available county-level surveillance data. The factor model framework synthesizes the information in the multiple outcomes, providing an overall measure of the burden of the opioid epidemic at the county level [19].
1.2 Data Collection
In this thesis, we use two types of surveillance data routinely collected by the state of Ohio: opioid associated deaths and treatment admissions. We obtained these surveillance data to fit a joint spatio-temporal model of opioid associated deaths and treatment admissions for all of Ohio’s 88 counties for the most recent available years, 2007-2016. Death data were collected from the Ohio Public Health Data Warehouse Ohio Resident Mortality Data, and these counts are publicly accessible from the Ohio Department of Health website (http://publicapps.odh.ohio.gov/EDW/DataCatalog). All resident deaths where poisoning from any opiate is mentioned on the death certificate are included in the death counts. Each death is assigned to the decedent’s county of residence in Ohio, regardless of the place of death. We searched the underlying cause of death dataset for deaths with an International Classification of Diseases, Tenth Revision (ICD-10) code of T40.0-T40.4 or T40.6, defined as “poison from any opiate.” Population estimates were collected from the Centers for Disease Control and Prevention (CDC) under the category of demographics. Observed death rates per 100,000 residents for each county are shown in Figure 1.1, which shows clearly higher death rates across southern Ohio. We also notice that high death rates were clustered in southwestern Ohio in the latest 5 years, whereas they were primarily concentrated in southeastern Ohio from 2007 to 2011.
Figure 1.1: Observed death rates
Treatment admission data were collected from the Ohio Department of Mental Health and Drug Addiction Services through a data use agreement. Treatment admissions were identified using the diagnostic codes listed in Table 1.1 and include any residential, intensive outpatient, or outpatient treatment for opioid misuse. Patients who present to hospitals or to any other medical facility for treatment of an overdose or of complications from opioid misuse are excluded from our counts. When a patient had several admissions, we counted the patient only once. In this data set, treatment counts for people under and over age 21 were collected separately; in this thesis we use only the total count, which is the sum of the two groups. Because the data are yearly county-level counts, some treatment counts for both the under-21 and over-21 groups are suppressed (censored) according to state policy. Suppressed counts are marked “?” in the raw data set, indicating that the count was under 10, and we imputed a value of 5 for all censored counts. Treatment rates per 100,000 residents are shown in Figure 1.2, where treatment rates are consistently highest in southeastern Ohio from 2007 to 2016. Many treatment resources have also been directed to this area [41], which we discuss in a later section.

ICD-9 Diagnostic Codes
304.00  Opiate Type Dependence, Unspecified Use
304.01  Opiate Type Dependence, Continuous Use
304.02  Opiate Type Dependence, Episodic
304.03  Opiate Type Dependence, in Remission
304.70  Combinations of Opioid Type Drug with Any Other, Unspecified Use
304.71  Combinations of Opioid Type Drug with Any Other, Continuous Use
304.72  Combinations of Opioid Type Drug with Any Other, Episodic
304.73  Combinations of Opioid Type Drug with Any Other, in Remission
305.50  Opioid Abuse, Unspecified Use
305.51  Opioid Abuse, Continuous Use
305.52  Opioid Abuse, Episodic
305.53  Opioid Abuse, in Remission

DSM-IV-TR Diagnostic Codes
292.00  Opioid Withdrawal
292.11  Opioid-Induced Psychotic Disorder, with Delusions
292.12  Opioid-Induced Psychotic Disorder, with Hallucinations
292.81  Opioid Intoxication Delirium
292.84  Opioid-Induced Mood Disorder
292.89  Opioid Intoxication
292.89  Opioid-Induced Sexual Dysfunction
292.89  Opioid-Induced Sleep Disorder
292.90  Opioid-Related Disorder, NOS
304.00  Opioid Dependence
305.50  Opioid Abuse

Table 1.1: Diagnostic Codes used to define treatment
Figure 1.2: Observed treatment rates
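The count preparation described above can be sketched as follows. This is a minimal illustration, assuming the raw table stores each suppressed cell as the string "?" and that a population figure is available for each county-year; the county names, counts, and populations are hypothetical toy values, not data from this thesis.

```python
# Hypothetical toy records: (county, year, under-21 count, 21-and-over count, population).
# Suppressed cells appear as "?" in the raw data (count below 10).
RAW = [
    ("County A", 2016, "?", "48", 75_000),
    ("County B", 2016, "12", "?", 40_000),
    ("County C", 2016, "?", "?", 20_000),
]

IMPUTED_VALUE = 5  # value imputed for every censored (suppressed) cell


def parse_count(cell: str) -> int:
    """Return the count, imputing IMPUTED_VALUE for a suppressed "?" cell."""
    return IMPUTED_VALUE if cell.strip() == "?" else int(cell)


def rate_per_100k(count: int, population: int) -> float:
    """Convert a count to a rate per 100,000 residents."""
    return count / population * 100_000


def treatment_rates(raw):
    """Sum the two age groups after imputation; return rates keyed by (county, year)."""
    rates = {}
    for county, year, under21, over21, population in raw:
        total = parse_count(under21) + parse_count(over21)
        rates[(county, year)] = rate_per_100k(total, population)
    return rates


if __name__ == "__main__":
    for key, rate in treatment_rates(RAW).items():
        print(key, round(rate, 1))
```

Imputation is applied to each age group before summing, matching the text's use of the total count; a county with both groups suppressed therefore receives a total of 10.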
We also collected county-level covariate information about the social and environmental characteristics of each county to help explain death and treatment rates. We acquired population estimates and environmental variables from the Centers for Disease Control and Prevention (CDC). The CDC’s National Environmental Public Health Tracking Network provides health and environmental data from national, state, and city sources (https://ephtracking.cdc.gov). According to previous research [30], people living in rural counties were more likely to misuse opioids than their urban counterparts. We selected the percentage of adolescents living in poverty as our covariate to account for socioeconomic status in this analysis. Adolescent poverty is shown in Figure 1.3, which shows consistently higher rates of adolescent poverty in the eastern and southeastern parts of the state. In general, many other social and environmental factors, such as demographics and chronic disease conditions, affect the opioid epidemic. We consider only this one covariate because of the limited availability of yearly county-level information for non-Census years.
Figure 1.3: Observed adolescent poverty
1.3 Overview Of Research
Factor models are used to investigate the shared latent patterns in multivariate data. The method assumes that the outcomes share a common latent variable because they conceptually measure similar things, which allows us to describe multiple outcomes using the common factor. Recently, much research in public health and health care has focused on jointly analyzing several potentially related diseases, and there has been increasing work on joint mapping analyses to investigate shared risk factors. These analyses take advantage of observed patterns and possible risk factors to estimate the shared risk underlying many outcomes, and a factor model is particularly useful for this purpose. For example, Wang and Wall [44] used factor models to study the spatial characteristics of different types of cancer. Neeley [27] developed a spatial confirmatory factor analysis to quantify a latent, unobserved measure of climate using observations from different climate models. Latent factor models have also been used to estimate common trends in multivariate time series. Tzala [43] uncovered the spatial and temporal patterns of latent factors underlying cancer data in Greece.
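To illustrate why a shared factor ties outcomes together, the sketch below simulates two outcomes that each load on one latent factor plus independent noise. It is a generic toy simulation, not the model developed in this thesis; the loadings and noise level are arbitrary assumptions. With a unit-variance factor f, loadings λ1 and λ2, and noise variance τ², the implied correlation between the outcomes is λ1λ2 / sqrt((λ1² + τ²)(λ2² + τ²)), so the common factor alone induces the dependence between outcomes.

```python
import numpy as np


def simulate_factor_model(n, loadings, noise_sd, seed=0):
    """Simulate y_ik = lambda_k * f_i + e_ik with f_i ~ N(0, 1) and e_ik ~ N(0, noise_sd^2)."""
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    factor = rng.standard_normal(n)                       # shared latent factor
    noise = noise_sd * rng.standard_normal((n, loadings.size))
    return factor[:, None] * loadings[None, :] + noise


def implied_correlation(l1, l2, noise_sd):
    """Correlation between two outcomes induced by the common factor."""
    v = noise_sd ** 2
    return l1 * l2 / np.sqrt((l1 ** 2 + v) * (l2 ** 2 + v))


y = simulate_factor_model(200_000, [1.0, 0.8], noise_sd=0.5)
empirical = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
print(empirical, implied_correlation(1.0, 0.8, 0.5))
```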
In this thesis, opioid associated deaths and treatment admissions are the two outcomes of the opioid epidemic that we study. Both reflect the underlying epidemic, but from different perspectives. Instead of modeling a single outcome, we model deaths and treatment jointly to provide more information about the overall burden of the opioid epidemic in each county. It is therefore essential to account for associations between the outcomes within each county, which improves estimation of the underlying rates by modeling death and treatment together [25]. In addition, we can include county characteristics that may be related to the burden in a given county [44]. More general models, such as the multivariate conditional autoregressive model, could be fit instead of a factor model, but our interest is in the underlying latent process driving these outcomes rather than in any single outcome, so a factor model accomplishes the overall goal of our research.
We use areal models for spatial data to help quantify the local scope of the opioid epidemic in Ohio counties. A common approach to modeling areal data is to fit conditional autoregressive (CAR) models [4]. By assuming that our common spatial factor follows a spatio-temporal CAR model, we account for spatial and temporal dependence in the latent opioid burden. By combining a factor model with a CAR model, we account not only for spatial and temporal dependence but also for dependence between the outcomes.
In this thesis, we consider a joint spatio-temporal model of opioid associated deaths and treatment admissions in Ohio. We have bivariate count observations of opioid associated deaths and treatment admissions for all 88 Ohio counties for each year from 2007 to 2016. The overarching goal of this research is to analyze the surveillance data within a Bayesian paradigm and to estimate the underlying latent common spatial factor after accounting for the socioeconomic properties of each county. This factor measures the unobserved burden of the opioid epidemic for each county in each year. By evaluating the relative burden across the state, we can help policy makers and public health professionals target the counties most in need of intervention. The work of Hepler and Kline [22] has a similar goal, but they considered only the spatial case: they fit a joint spatial model in Ohio using counts aggregated over 2013-2015, a framework that ignores an important feature, namely temporal structure. In this thesis, we use counts broken down by year so that we can assess both the spatial and temporal patterns of the opioid epidemic in Ohio. By modeling trends over space and time, we can respond to changes in the epidemic in the state in a timely manner. The model can also give policy makers a valuable tool for tracking outcomes after interventions at critical times.
The rest of this thesis is organized as follows. Chapter 2 presents the background needed to analyze the data. Chapter 3 describes the statistical methods and models, and Chapter 4 presents the results. Finally, Chapter 5 discusses the findings and future work.
Chapter 2: Background Information
This chapter sets up the background information used in our research. We take a Bayesian approach to a bivariate areal data analysis at the county level. Section 2.1 covers basic Bayesian theory. Section 2.2 reviews the basics of the iterative Markov chain Monte Carlo process and the necessary theorems. The code created for this research is based on two algorithms: the Metropolis-Hastings algorithm, covered in Section 2.2.1, and the Gibbs sampler, covered in Section 2.2.2. Both algorithms are used to update the model parameters. Section 2.3 covers the basics of spatial statistics from two aspects: the conditional autoregressive model in Section 2.3.1 and spatio-temporal models in Section 2.3.2.
2.1 Introduction to Bayesian Statistics
Bayesian statistics is named after Thomas Bayes, whose theorem was published in 1763 [3]. Whereas classical, or frequentist, statistics assumes that parameters θ are unknown constants, the Bayesian paradigm treats parameters as random variables and quantifies uncertainty about them through a probability distribution. Beliefs about the possible values of θ are summarized by a probability distribution π on the parameter space Θ. The prior distribution π(θ) quantifies the beliefs regarding θ before observing any data. After observing data x, the beliefs regarding θ are updated, and inference is based on the resulting posterior distribution of θ conditional on x, denoted π(θ|x). According to Bayes’ Theorem,
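$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}. \tag{2.1}$$
Note that f(x|θ) is the likelihood function and m(x) is the marginal distribution of x, where
$$m(x) = \begin{cases} \sum_{\theta} f(x \mid \theta)\,\pi(\theta) & \text{if } \theta \text{ is discrete}, \\ \int_{\Theta} f(x \mid \theta)\,\pi(\theta)\,d\theta & \text{if } \theta \text{ is continuous}. \end{cases} \tag{2.2}$$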
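As a small numerical illustration of (2.1) and (2.2), not part of the thesis analysis, the sketch below assumes five hypothetical Poisson counts x with a Gamma(a, b) prior (shape a, rate b) on the rate θ. It evaluates f(x|θ)π(θ) on a grid, approximates m(x) by a Riemann sum, and normalizes. Because this prior is conjugate to the Poisson likelihood, the exact posterior is Gamma(a + Σx_i, b + n), which provides a check; working on the log scale avoids underflow.

```python
import math

import numpy as np

a, b = 2.0, 1.0                          # hypothetical Gamma(shape=a, rate=b) prior
x = np.array([3, 5, 4, 6, 2])            # hypothetical Poisson counts
n, s = x.size, int(x.sum())

theta = np.linspace(1e-6, 15.0, 20_001)  # grid over the parameter space
dtheta = theta[1] - theta[0]


def log_gamma_pdf(t, shape, rate):
    """Log density of Gamma(shape, rate) evaluated at t."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * np.log(t) - rate * t


log_prior = log_gamma_pdf(theta, a, b)
# Poisson log-likelihood: sum_i [x_i log(theta) - theta - log(x_i!)]
log_lik = s * np.log(theta) - n * theta - sum(math.lgamma(xi + 1) for xi in x)

unnormalized = np.exp(log_lik + log_prior)  # f(x | theta) * pi(theta)
m_x = unnormalized.sum() * dtheta           # Riemann approximation of m(x), eq. (2.2)
posterior = unnormalized / m_x              # pi(theta | x), eq. (2.1)

exact = np.exp(log_gamma_pdf(theta, a + s, b + n))
post_mean = (theta * posterior).sum() * dtheta
print(post_mean, (a + s) / (b + n))         # grid vs. exact posterior mean
```

The grid approach only works in low dimensions, which is why the following sections turn to simulation.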
Bayesian inference is based entirely on the posterior distribution. The overall purpose of inference is to support decision makers, so we need an overall criterion for evaluating decision procedures that lets us compare them and rule out absurd ones; this is the subject of decision theory. In decision theory, we specify a loss function L(θ, d) that depends on the model parameters and assesses the consequences of each decision d. The most common criterion is the quadratic loss, also called squared error loss, proposed by Legendre [24] and Gauss [15]:
$$L(\theta, d) = (\theta - d)^2, \tag{2.3}$$
where d ∈ D is a decision about θ ∈ Θ based on the observation x ∈ X. Quadratic loss penalizes large deviations heavily.
To compare decision rules under a given loss function, the frequentist approach considers the risk, or average loss,
$$R(\theta, \delta) = E_{\theta}[L(\theta, \delta(x))] = \int_{\mathcal{X}} L(\theta, \delta(x))\, f(x \mid \theta)\, dx, \tag{2.4}$$
where δ(x) is the decision rule. In the Bayesian framework, the approach to decision theory is to define the integrated risk.
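Definition 1. The integrated risk is the frequentist risk averaged over the values of θ according to their prior distribution,
$$\begin{aligned} r(\pi, \delta) &= E^{\pi}[R(\theta, \delta)] = \int_{\Theta} R(\theta, \delta)\,\pi(\theta)\,d\theta \\ &= \int_{\Theta} \int_{\mathcal{X}} L(\theta, \delta(x))\, f(x \mid \theta)\,dx\,\pi(\theta)\,d\theta. \end{aligned} \tag{2.5}$$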
The integrated risk provides a constructive tool for determining a Bayes estimator.
Definition 2. A Bayes estimator associated with prior π(θ) and loss function L(θ, δ(x)) is any estimator $\delta^{\pi}$ that minimizes r(π, δ). The value $r(\pi, \delta^{\pi})$ is called the Bayes risk.
The Bayes decision rule minimizes the integrated risk, so the Bayes risk is the smallest integrated risk attainable for the given prior and loss. Quadratic loss is used extensively in part because it agrees with non-decision-theoretic inference based on the posterior mean: the Bayes estimator under quadratic loss is the posterior mean.
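Proposition 2.1. The Bayes estimator $\delta^{\pi}$ associated with the prior distribution π and the quadratic loss (2.3) is the posterior expectation
$$\delta^{\pi}(x) = E^{\pi}[\theta \mid x] = \frac{\int_{\Theta} \theta\, f(x \mid \theta)\,\pi(\theta)\,d\theta}{\int_{\Theta} f(x \mid \theta)\,\pi(\theta)\,d\theta}. \tag{2.6}$$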
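A short supporting derivation for Proposition 2.1: for each fixed x, minimizing the integrated risk (2.5) over δ reduces to minimizing the posterior expected loss (a standard result, taken here as given). Under quadratic loss that expected loss decomposes as follows, where the cross term vanishes because $E^{\pi}[\theta - E^{\pi}[\theta \mid x] \mid x] = 0$:

```latex
\begin{aligned}
E^{\pi}\big[(\theta - d)^2 \mid x\big]
  &= E^{\pi}\Big[\big(\theta - E^{\pi}[\theta \mid x]\big)^2 \,\Big|\, x\Big]
     + \big(E^{\pi}[\theta \mid x] - d\big)^2 \\
  &= \operatorname{Var}^{\pi}(\theta \mid x) + \big(E^{\pi}[\theta \mid x] - d\big)^2 .
\end{aligned}
```

The first term does not depend on d, so the expected loss is minimized at $d = E^{\pi}[\theta \mid x]$, which is (2.6).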
Quadratic loss is not the only loss function whose Bayes estimator is the posterior mean, but Bayes estimators derived under quadratic loss are comparatively robust. Moreover, quadratic loss is particularly appealing for bounded parameter spaces when choosing a more subjective loss is impossible. We therefore use quadratic loss throughout this thesis.
2.2 Markov Chain Monte Carlo
In practice, the posterior distribution rarely takes a known form. However, we can approximate it by simulating draws from it using Markov chain Monte Carlo (MCMC).
As mentioned in the previous section, our Bayes estimator of every random quantity is its posterior mean. Since the posterior does not take a recognizable form, we cannot compute the posterior mean analytically. However, we can estimate it with a Monte Carlo estimator, which estimates an expected value by a sample mean. As the sample size grows, the sample mean converges in probability to the true expected value. In the Bayesian setting, Monte Carlo integration applies whenever we can generate draws from the posterior distribution, and MCMC is used to generate these draws. For the Monte Carlo estimator to converge to the true expected value, the Markov chain must satisfy certain properties, which are outlined below for the discrete state space case. The continuous state space case follows analogously; the corresponding definitions can be found in Lamperti’s Stochastic Processes [23].
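The sketch below illustrates the Monte Carlo estimator in the easy case where the posterior can be sampled directly. It assumes a hypothetical Gamma(22, 6) posterior (shape 22, rate 6), whose exact mean is 22/6; the sample mean of independent draws approaches this value as the number of draws grows. MCMC replaces the independent draws with a Markov chain when direct sampling is not possible.

```python
import numpy as np

shape, rate = 22.0, 6.0      # hypothetical posterior Gamma(shape, rate)
exact_mean = shape / rate


def mc_posterior_mean(n, rng):
    """Monte Carlo estimate of E[theta | x] from n independent posterior draws."""
    draws = rng.gamma(shape, scale=1.0 / rate, size=n)  # NumPy uses scale = 1 / rate
    return draws.mean()


rng = np.random.default_rng(2019)
for n in (100, 10_000, 1_000_000):
    estimate = mc_posterior_mean(n, rng)
    print(f"n={n:>9,d}  estimate={estimate:.4f}  error={abs(estimate - exact_mean):.4f}")
```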
Let P be a k × k matrix with elements $\{P_{i,j} : i, j = 1, \ldots, k\}$. A random walk is a sequence of random variables $X = (X_0, X_1, X_2, \ldots)$ starting at some fixed node $X_0$ and taking values in a finite state space $S = \{s_1, s_2, \ldots, s_k\}$. One common type of random walk is a time-homogeneous Markov chain on S, i.e.
$$P(X_{n+1} = s_j \mid X_0 = s_{i_0}, X_1 = s_{i_1}, \ldots, X_{n-1} = s_{i_{n-1}}, X_n = s_i) = P(X_{n+1} = s_j \mid X_n = s_i) = P_{i,j}. \tag{2.7}$$
P is called the transition matrix, and its elements are called transition probabilities. This property means that the conditional distribution of $X_{n+1}$ given $(X_0, \ldots, X_n)$ depends only on $X_n$. Markov chains have the following features, which are necessary for the computational methods outlined in the following sections.
Definition 3. A Markov chain $(X_0, X_1, \ldots)$ with state space $S = \{s_1, \ldots, s_k\}$ and transition matrix P is said to be irreducible if $s_i \leftrightarrow s_j$ (each state can be reached from the other) for all $s_i, s_j \in S$. Otherwise the chain is said to be reducible.
In other words, the chain is irreducible if for all $1 \le i, j \le k$ and $M \ge 0$ there exists $N \ge 0$ such that
$$P(X_{M+N} = s_j \mid X_M = s_i) > 0. \tag{2.8}$$
Definition 4. The period of a state $s_i \in S$ is the greatest common divisor of all numbers of steps in which the chain, starting from $s_i$, can return to $s_i$, i.e.
$$\gcd\{t \ge 1 : P(X_t = s_i \mid X_0 = s_i) > 0\}. \tag{2.9}$$
We say that a Markov chain is aperiodic if every state has period 1.
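For a small finite chain, Definitions 3 and 4 can be checked directly from powers of the transition matrix. The sketch below is a brute-force check intended only for tiny illustrative matrices (the three examples are assumptions): $s_j$ is reachable from $s_i$ if some power $P^n$ has a positive (i, j) entry, and the period of $s_i$ is the gcd of the return times n with $(P^n)_{ii} > 0$. For an irreducible chain with k states, return times up to 3k are enough to recover that gcd.

```python
from math import gcd

import numpy as np


def reachable(P):
    """Boolean matrix R with R[i, j] True if s_j can be reached from s_i in >= 0 steps."""
    k = P.shape[0]
    A = (P > 0).astype(int)
    R = np.eye(k, dtype=int)
    step = np.eye(k, dtype=int)
    for _ in range(k - 1):                # paths of length <= k - 1 suffice
        step = (step @ A > 0).astype(int)
        R = R | step
    return R.astype(bool)


def is_irreducible(P):
    return bool(reachable(P).all())


def period(P, i, max_steps=None):
    """gcd of return times to state i up to max_steps (0 if the chain never returns)."""
    k = P.shape[0]
    max_steps = 3 * k if max_steps is None else max_steps
    A = (P > 0).astype(int)
    step = np.eye(k, dtype=int)
    g = 0
    for n in range(1, max_steps + 1):
        step = (step @ A > 0).astype(int)  # support of P^n
        if step[i, i]:
            g = gcd(g, n)
    return g


P_cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # deterministic 3-cycle
P_lazy = 0.5 * np.eye(3) + 0.5 * P_cycle                            # add self-loops
P_reducible = np.array([[1.0, 0.0], [0.5, 0.5]])                    # s_2 unreachable from s_1

print(is_irreducible(P_cycle), period(P_cycle, 0))   # irreducible, period 3
print(is_irreducible(P_lazy), period(P_lazy, 0))     # irreducible, aperiodic
print(is_irreducible(P_reducible))                   # reducible
```

Adding self-loops ("laziness") is a simple way to make a periodic chain aperiodic without changing which states communicate.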
Definition 5. Let $(X_0, X_1, \ldots)$ be a Markov chain with state space $\{s_1, \ldots, s_k\}$ and transition matrix P. A row vector $\pi = (\pi_1, \ldots, \pi_k)$ is said to be a stationary distribution for the Markov chain if it satisfies
(1) $\pi_i \ge 0$ for $i = 1, \ldots, k$, and $\sum_{i=1}^{k} \pi_i = 1$, and
(2) $\pi P = \pi$, meaning that $\sum_{i=1}^{k} \pi_i P_{i,j} = \pi_j$ for $j = 1, \ldots, k$.
Theorem 2.2. Any irreducible and aperiodic Markov chain has exactly one stationary distribution.
Theorem 2.2 establishes the existence and uniqueness of stationary distributions. In Markov chain Monte Carlo, we construct an irreducible and aperiodic Markov chain whose stationary distribution is the posterior distribution we wish to simulate from [34]. Suppose we can construct an irreducible and aperiodic Markov chain $(X_0, X_1, \ldots)$ whose unique stationary distribution is π. Then, whatever the initial distribution, the distribution of $X_n$ converges to π as n tends to infinity. Because π is the posterior distribution, its mean is exactly our Bayes estimator under quadratic loss.
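The sketch below computes the stationary distribution of a small, hypothetical irreducible and aperiodic transition matrix in two ways: by solving πP = π together with Σπ_i = 1 (Definition 5), and by pushing an arbitrary initial distribution forward through the chain, which illustrates the convergence that MCMC relies on. For this particular matrix the exact answer is π = (5/21, 9/21, 7/21).

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1). All entries are positive,
# so the chain is irreducible and aperiodic.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])


def stationary_distribution(P):
    """Solve pi (P - I) = 0 subject to sum(pi) = 1 as a least-squares system."""
    k = P.shape[0]
    A = np.vstack([(P - np.eye(k)).T, np.ones(k)])  # k balance equations + normalization
    b = np.concatenate([np.zeros(k), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def iterate_distribution(mu0, P, n_steps):
    """Distribution of X_n when X_0 ~ mu0 (row vector times P^n)."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(n_steps):
        mu = mu @ P
    return mu


pi = stationary_distribution(P)
mu_100 = iterate_distribution([1.0, 0.0, 0.0], P, 100)  # start deterministically in s_1
print(pi, mu_100)
```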
2.2.1 Metropolis-Hastings Algorithm
The Metropolis-Hastings Algorithm is one of the most common MCMC algorithms.
The main idea is to generate the next state $X_{t+1}$ given $X_t$ using a proposal density g, in such a way that the chain targets the posterior distribution f. The first step is to draw a candidate point Y from the proposal density $g(\cdot \mid X_t)$. If the candidate is accepted, the chain moves to Y at time t + 1, i.e., $X_{t+1} = Y$; otherwise the chain stays at its current state, i.e., $X_{t+1} = X_t$. The most commonly used proposals are normal or uniform distributions centered at the current state; because these are symmetric, $g(X_t \mid Y) = g(Y \mid X_t)$ and the proposal terms cancel in the acceptance ratio, which simplifies the calculations. For instance, if the proposal density is normal, then $g(\cdot \mid X_t) = \text{Normal}(X_t, \sigma^2)$ for some known $\sigma^2$. Under mild regularity conditions, the resulting chain is irreducible and aperiodic [32], with stationary distribution given by the posterior distribution.
The Metropolis-Hastings algorithm is as follows:
(1) Choose a proposal distribution $g(\cdot \mid X_t)$ that satisfies the regularity conditions above.
(2) Set t = 0 and generate $X_0$ from an initial distribution.
(3) Generate a candidate point Y from $g(\cdot \mid X_t)$.
(4) Generate U from Uniform(0, 1).
(5) If
$$U \le \frac{f(Y)\, g(X_t \mid Y)}{f(X_t)\, g(Y \mid X_t)},$$
accept Y and set $X_{t+1} = Y$; otherwise set $X_{t+1} = X_t$.
(6) Increment t.
(7) Repeat (3) to (6) until the chain has converged to its stationary distribution and enough draws have been collected.
Of note, the acceptance probability in step (5) is
$$p(X_t, Y) = \min\left\{ \frac{f(Y)\, g(X_t \mid Y)}{f(X_t)\, g(Y \mid X_t)},\ 1 \right\}.$$
With this acceptance probability, the stationary distribution of the Metropolis-Hastings chain is exactly the target distribution f, which in Bayesian statistics is the desired posterior distribution.
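A minimal random-walk Metropolis-Hastings sketch following steps (1)-(7), with a normal proposal centered at the current state so that the g terms cancel in the acceptance ratio. The target is an assumption for illustration: an unnormalized Gamma(22, 6) density on θ > 0, which shows that the normalizing constant m(x) is never needed. The log scale avoids numerical underflow; the proposal standard deviation, chain length, and burn-in are arbitrary choices.

```python
import numpy as np


def log_target(theta, shape=22.0, rate=6.0):
    """Unnormalized log density of a hypothetical Gamma(shape, rate) posterior."""
    if theta <= 0:
        return -np.inf                    # zero density outside the support
    return (shape - 1) * np.log(theta) - rate * theta


def metropolis_hastings(log_f, x0, n_iter, proposal_sd, seed=0):
    """Random-walk Metropolis-Hastings; returns the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter + 1)
    chain[0] = x = x0                     # step (2): x0 must lie in the support
    log_fx = log_f(x)
    accepted = 0
    for t in range(n_iter):
        y = rng.normal(x, proposal_sd)    # step (3): candidate from g(. | X_t)
        log_ratio = log_f(y) - log_fx     # symmetric proposal: g terms cancel
        # steps (4)-(5): accept with probability min(ratio, 1)
        if log_ratio >= 0 or rng.uniform() < np.exp(log_ratio):
            x, log_fx = y, log_fx + log_ratio
            accepted += 1
        chain[t + 1] = x                  # otherwise X_{t+1} = X_t
    return chain, accepted / n_iter


chain, acc_rate = metropolis_hastings(log_target, x0=1.0, n_iter=50_000, proposal_sd=1.0)
burn_in = 1_000
print(chain[burn_in:].mean(), acc_rate)   # compare with the exact mean 22 / 6
```

In practice the proposal standard deviation is tuned so that the acceptance rate is neither near 0 (the chain rarely moves) nor near 1 (the steps are too small).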
2.2.2 Gibbs Sampler
The Gibbs sampler is the second MCMC method used in this research; it was named by Geman and Geman [16]. It is a special case of the Metropolis-Hastings sampler that is well suited to multivariate posterior distributions. The chain is generated by sampling from the full conditional distributions of the target distribution, and every candidate point is therefore accepted. Suppose that $X = (X_1, X_2, \ldots, X_d)$ is a random vector in $\mathbb{R}^d$ for some d > 1, with full conditional densities $f_1, f_2, \ldots, f_d$. Define the (d − 1)-dimensional random vectors
$$X_{(-j)} = (X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_d),$$
so that
$$X_j \mid X_{(-j)} \sim f_j(X_j \mid X_{(-j)}),$$
where $f_j(X_j \mid X_{(-j)})$ is the univariate full conditional density of $X_j$ given $X_{(-j)}$. The Gibbs sampler then generates the chain by iteratively sampling from these d conditional densities.
We write X(t) for $X_t$ in the following Gibbs sampling algorithm:
(1) Initialize X(0) at time t = 0, and set t = 1.
(2) Set x = X(t − 1).
(3) For j = 1, ..., d,
  (a) Generate $X_j^*(t)$ from $f_j(X_j \mid x_{(-j)})$.
  (b) Update $x_j = X_j^*(t)$.
(4) Set $X(t) = (X_1^*(t), \ldots, X_d^*(t))$.
(5) Increment t.
(6) Repeat (2) to (5) for t = 1, 2, ....
Gibbs sampling is often faster and computationally cheaper than a general Metropolis-Hastings sampler because every draw is accepted. In the Gibbs sampler, we do not sample from the joint posterior distribution directly; rather, we simulate samples by sweeping through all the full conditional distributions, one random variable at a time. Because we initialize the algorithm with arbitrary values, the samples from early iterations may not be representative of the actual posterior distribution. For this reason, MCMC algorithms are typically run for a large number of iterations, and the samples from the early iterations are discarded. The discarded iterations are often referred to as the “burn-in” period [6].
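A minimal Gibbs sampler sketch for a standard bivariate normal target with correlation ρ. The target is an illustrative assumption, chosen because both full conditionals are known in closed form: $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1 - \rho^2)$, and symmetrically for $X_2$. Every draw is accepted, each coordinate update uses the most recent value of the other coordinate as in step (3), and the first burn_in iterations are discarded.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_iter, burn_in, x0=(10.0, -10.0), seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)       # sd of each full conditional
    x1, x2 = x0                             # deliberately poor starting values
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        x1 = rng.normal(rho * x2, cond_sd)  # draw X1 | X2 = x2
        x2 = rng.normal(rho * x1, cond_sd)  # draw X2 | X1 = x1 (uses the new x1)
        samples[t] = x1, x2
    return samples[burn_in:]                # discard the burn-in period


draws = gibbs_bivariate_normal(rho=0.8, n_iter=60_000, burn_in=1_000)
print(draws.mean(axis=0), np.corrcoef(draws.T)[0, 1])
```

Strong correlation between coordinates makes successive Gibbs draws highly autocorrelated, which is one reason long runs are needed.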
Metropolis and Gibbs steps can be combined by performing a Metropolis update for a particular full conditional within a Gibbs sampler. This approach is known as Metropolis-within-Gibbs. I will discuss the computational details for the research of this thesis in Section 3.2.
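A minimal Metropolis-within-Gibbs sketch under illustrative assumptions that are not the model or the OpenBUGS implementation of this thesis: data $y_i \sim N(\mu, \sigma^2)$ with a flat prior on μ and a flat prior on log σ. The full conditional of μ is $N(\bar{y}, \sigma^2/n)$, so μ gets an exact Gibbs draw, while log σ is updated with a random-walk Metropolis step. The data are simulated, and the step size and run length are arbitrary choices.

```python
import numpy as np


def metropolis_within_gibbs(y, n_iter, step_sd=0.1, burn_in=1_000, seed=0):
    """Gibbs update for mu, random-walk Metropolis update for log(sigma)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, ybar = y.size, y.mean()
    mu, log_sigma = ybar, np.log(y.std())   # starting values

    def log_cond_log_sigma(eta, mu):
        # log full conditional of eta = log(sigma) under a flat prior on eta
        ss = np.sum((y - mu) ** 2)
        return -n * eta - ss / (2.0 * np.exp(2.0 * eta))

    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Gibbs step: mu | sigma, y ~ N(ybar, sigma^2 / n)
        mu = rng.normal(ybar, np.exp(log_sigma) / np.sqrt(n))
        # Metropolis step: symmetric random walk on log(sigma)
        proposal = log_sigma + rng.normal(0.0, step_sd)
        log_ratio = log_cond_log_sigma(proposal, mu) - log_cond_log_sigma(log_sigma, mu)
        if log_ratio >= 0 or rng.uniform() < np.exp(log_ratio):
            log_sigma = proposal
        out[t] = mu, np.exp(log_sigma)
    return out[burn_in:]


data_rng = np.random.default_rng(42)
y = data_rng.normal(2.0, 1.5, size=200)      # simulated data: mu = 2, sigma = 1.5
draws = metropolis_within_gibbs(y, n_iter=20_000)
print(draws.mean(axis=0))                    # posterior means of (mu, sigma)
```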
The MCMC algorithm was implemented in OpenBUGS. BUGS is a software package for performing Bayesian inference Using Gibbs Sampling. The user specifies a statistical model simply by stating the relationships between related variables. The software includes an expert system, which determines an appropriate MCMC scheme based on the Gibbs sampler for analysing the specified model. BUGS also offers a wide range of output types for users to choose from. OpenBUGS is an open-source version of the package, and it can be run, and its output analysed, from within R [39].
2.3 Spatial Statistics
“Everything is related to everything else, but near things are more related than distant things.” Waldo Tobler [42]
Spatial statistics methods use location information to quantify patterns in data. The key assumption of spatial statistics is that nearby geo-referenced units are associated in some way. Ignoring spatial dependence can result in inaccurate standard errors and less precise predictions. The development of geographic information systems (GIS) has played an important role in the application of spatial statistics, since there is now an abundance of geo-referenced data. For instance, spatial statistics methods are widely used in public health, climate science, and many other application areas. Spatial data come in several types depending on the field and application, and different statistical methods capture the important features of the underlying spatial process for each type. Areal data involve aggregated quantities for each areal unit within some relevant spatial partition of a given region, such as census tracts within a city or counties within a state. In this thesis, we consider areal data.
2.3.1 CAR Model
Two very popular models for analyzing areal data are the simultaneous and conditional autoregressive models (abbreviated SAR and CAR), originally developed by Whittle [46] and [4], respectively. The SAR model is computationally convenient for likelihood methods, while the CAR model is computationally convenient for Gibbs sampling in conjunction with Bayesian model fitting [8].
The CAR model is specified by assuming that the conditional distribution of the variable at an areal unit (in our case, a county), given the values at the neighboring counties, is normal. This specification yields a multivariate normal joint distribution for all counties.
More specifically, let $Y(s_i)$ denote the random variable at location $s_i$ and $\mathbf{Y}_{-s_i}$ the vector of all observations except $Y(s_i)$. For each location, we assume the conditional distribution is normal, with conditional mean and variance

$$E[Y(s_i) \mid \mathbf{Y}_{-s_i}] = \rho \sum_{j=1}^{n} \frac{w_{ij}}{w_{i+}}\, Y(s_j), \qquad \mathrm{Var}[Y(s_i) \mid \mathbf{Y}_{-s_i}] = \frac{\sigma^2}{w_{i+}}, \tag{2.10}$$

where $W = (w_{ij})$ is an adjacency matrix and the parameter $\rho$ controls the strength of the spatial association. The entries $w_{ij}$ are nonzero (equal to 1) only if location $s_j$ is in the neighborhood set of $s_i$, and $w_{i+} = \sum_j w_{ij}$ is the total number of neighbors of county $i$. We take $0 \le \rho < 1$, which guarantees that $D - \rho W$ below is positive definite: $\rho = 0$ corresponds to spatial independence, values of $\rho$ close to 1 indicate strong spatial dependence across the state, and the limiting case $\rho = 1$ is discussed below. The joint distribution can then be determined from the set of full conditional distributions by using the Hammersley-Clifford theorem [4]. Letting $D = \mathrm{diag}(w_{i+})$, the joint distribution induced by the conditional distributions is

$$\mathbf{Y} \sim N\big(\mathbf{0},\ \sigma^2 (D - \rho W)^{-1}\big). \tag{2.11}$$
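As a numerical sanity check (not part of the thesis analysis), the sketch below builds the precision matrix $(D - \rho W)/\sigma^2$ implied by (2.11) on a hypothetical five-unit path graph, a stand-in for the Ohio county map, and confirms that the full conditionals of that joint distribution have mean $\rho \sum_j (w_{ij}/w_{i+}) Y(s_j)$ and variance $\sigma^2 / w_{i+}$. The helper names, the graph, $\rho = 0.8$, $\sigma^2 = 2$ and the vector `y` are illustrative assumptions.

```python
import numpy as np

def path_adjacency(n):
    """Hypothetical toy neighbourhood: n areal units on a line, unit i borders i-1 and i+1."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def car_precision(W, rho, sigma2):
    """Precision matrix (D - rho*W) / sigma^2, the inverse of the covariance in (2.11)."""
    D = np.diag(W.sum(axis=1))
    return (D - rho * W) / sigma2

def conditional_moments(Q, y, i):
    """Full conditional mean and variance of y_i under a zero-mean Gaussian with precision Q."""
    others = np.arange(len(y)) != i
    mean = -Q[i, others] @ y[others] / Q[i, i]
    var = 1.0 / Q[i, i]
    return mean, var

W = path_adjacency(5)
rho, sigma2 = 0.8, 2.0
Q = car_precision(W, rho, sigma2)
y = np.array([1.0, -0.5, 2.0, 0.3, -1.2])
w_plus = W.sum(axis=1)

# With 0 <= rho < 1 the precision is positive definite, so the joint is proper.
assert np.all(np.linalg.eigvalsh(Q) > 0)

for i in range(len(y)):
    mean, var = conditional_moments(Q, y, i)
    # Neighbour average scaled by rho, and sigma^2 divided by the number of neighbours.
    assert np.isclose(mean, rho * (W[i] @ y) / w_plus[i])
    assert np.isclose(var, sigma2 / w_plus[i])
```

The check makes explicit why the conditional variance shrinks for counties with many neighbors: the diagonal of the precision matrix is $w_{i+}/\sigma^2$.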
The intrinsic conditional autoregressive (ICAR) model is the special case $\rho = 1$. The conditional expectation of $Y(s_i)$ given $\mathbf{Y}_{-s_i}$ is then simply the average of its neighbors' values, an interpretation that makes this special case popular in applications. However, the matrix $D - W$ is singular (each of its rows sums to zero), so the joint distribution in the ICAR model is improper: the precision matrix is not of full rank and the covariance matrix $\sigma^2 (D - W)^{-1}$ does not exist [5]. Although this joint distribution is not a valid probability distribution, it can be used as a prior model since it yields a valid posterior distribution, provided a centering constraint such as $\sum_{i=1}^{n} Y(s_i) = 0$ is enforced [44].
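The impropriety of the ICAR model and the role of the centering constraint can be seen numerically. The sketch below, on a hypothetical six-unit path graph, checks that $D - W$ has rank $n - 1$ with the constant vector in its null space, and then draws one realization restricted to the sum-to-zero constraint by sampling only along the eigenvectors with nonzero eigenvalues. This eigen-decomposition sampler is an illustrative choice assumed here, not the scheme OpenBUGS uses; it relies on the neighbourhood graph being connected.

```python
import numpy as np

def path_adjacency(n):
    """Hypothetical toy neighbourhood: n areal units on a line."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def icar_precision(W, sigma2=1.0):
    """ICAR precision (D - W) / sigma^2, the rho = 1 case; singular by construction."""
    return (np.diag(W.sum(axis=1)) - W) / sigma2

def sample_icar_centered(W, sigma2, rng):
    """Draw one ICAR realization satisfying sum(y) = 0.

    For a connected graph the only zero eigenvalue of D - W belongs to the
    constant vector, so sampling along the remaining eigenvectors gives a
    draw from the ICAR distribution restricted to the centering constraint.
    """
    vals, vecs = np.linalg.eigh(icar_precision(W, sigma2))
    keep = vals > 1e-10            # drop the null (constant) direction
    z = rng.standard_normal(keep.sum())
    return vecs[:, keep] @ (z / np.sqrt(vals[keep]))

W = path_adjacency(6)
Q = icar_precision(W)
assert np.linalg.matrix_rank(Q) == 5           # n - 1: not of full rank
assert np.allclose(Q @ np.ones(6), 0.0)        # every row sums to zero
y = sample_icar_centered(W, sigma2=1.0, rng=np.random.default_rng(1))
assert abs(y.sum()) < 1e-8                     # centering constraint holds
```

Restricting to the sum-to-zero subspace removes exactly the one direction (a common shift of all counties) that the improper density leaves unidentified.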
2.3.2 Spatio-temporal
Spatio-temporal data are spatial data with an additional temporal structure. They share the same important statistical characteristic: nearby observations tend to be more alike than those far apart, with respect to both space and time.
For the temporal dependence, we assume a vector autoregressive (VAR) model for years $t = 1, \ldots, T$. A VAR model is a stochastic process model used to capture the linear interdependencies among multiple time series [38]. VAR models generalize the univariate autoregressive (AR) model by allowing for more than one evolving variable. Let $\mathbf{y}_t$ be a $k$-vector whose $i$th element $y_{i,t}$ is the observation at time $t$ of the $i$th variable. The VAR($p$) model is

$$\mathbf{y}_t = \mathbf{c} + A_1 \mathbf{y}_{t-1} + \cdots + A_p \mathbf{y}_{t-p} + \boldsymbol{\epsilon}_t, \qquad t = 1, \ldots, T,$$

where the observation $i$ periods back, $\mathbf{y}_{t-i}$, is called the $i$th lag of $\mathbf{y}$, $\mathbf{c}$ is a $k$-vector of constants, each $A_i$ is a time-invariant $k \times k$ matrix, and $\boldsymbol{\epsilon}_t$ is a $k$-vector of error terms.
In this thesis, we consider a VAR(1) model for the temporal dependence:

$$\mathbf{y}_t = \mathbf{c} + A_1 \mathbf{y}_{t-1} + \boldsymbol{\epsilon}_t, \qquad t = 1, \ldots, T.$$
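To make the VAR recursion concrete, here is a minimal forward simulator, assuming Gaussian errors $\boldsymbol{\epsilon}_t \sim N(\mathbf{0}, \Sigma)$ and zero pre-sample values unless given; both are assumptions, since the text only calls $\boldsymbol{\epsilon}_t$ a vector of error terms. The function name `simulate_var` and the example coefficients are hypothetical. Passing a single matrix in `A_list` gives the VAR(1) case used in this thesis.

```python
import numpy as np

def simulate_var(c, A_list, Sigma, T, rng, y_init=None):
    """Simulate a VAR(p): y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + eps_t.

    c      : (k,) constant vector
    A_list : list of p time-invariant (k, k) matrices [A_1, ..., A_p]
    Sigma  : (k, k) covariance of the Gaussian errors eps_t (assumed Gaussian)
    y_init : p pre-sample vectors, oldest first; zeros if omitted (assumption)
    """
    c = np.asarray(c, dtype=float)
    k, p = len(c), len(A_list)
    history = ([np.zeros(k) for _ in range(p)] if y_init is None
               else [np.asarray(v, dtype=float) for v in y_init])
    out = np.empty((T, k))
    for t in range(T):
        mean = c.copy()
        for lag, A in enumerate(A_list, start=1):
            mean += A @ history[-lag]      # history[-lag] holds y_{t-lag}
        out[t] = rng.multivariate_normal(mean, Sigma)
        history.append(out[t])
    return out

# VAR(1) with two series; the first series depends on the lagged second one.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.2],
               [0.0, 0.7]])
y = simulate_var(c=[1.0, 0.0], A_list=[A1], Sigma=0.1 * np.eye(2), T=10, rng=rng)
```

The off-diagonal entries of $A_1$ are what distinguish a VAR from separate univariate AR models: they let one series' past move another series' present.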
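For the spatial dependence, we again assume a CAR model. For the first year, $\mathbf{Y}_1$ follows the CAR model, and for $t = 2, \ldots, T$,

$$\mathbf{Y}_t \sim N\big(\eta \mathbf{Y}_{t-1},\ \sigma^2 (D - W)^{-1}\big),$$

where, because $D - W$ is singular, $(D - W)^{-1}$ is shorthand for the improper ICAR specification with precision matrix $(D - W)/\sigma^2$. That is, the mean in year $t$ is $\eta$ times the vector of values in the previous year, so $\eta$ controls the strength of the temporal dependence.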
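A forward simulation shows how the spatial and temporal pieces combine: each year is the previous year scaled by $\eta$ plus a fresh ICAR innovation. The sketch below assumes a hypothetical 3 × 3 grid of areal units with rook adjacency, an ICAR draw (centered to sum to zero) for the first year, and the eigen-decomposition sampler as an illustrative device; none of these names or values come from the thesis, and this is not how the model is fitted in OpenBUGS.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Hypothetical rook-adjacency matrix for an nrow x ncol grid of areal units."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W

def sample_icar_centered(W, sigma2, rng):
    """One ICAR draw with precision (D - W)/sigma^2, restricted to sum(y) = 0."""
    vals, vecs = np.linalg.eigh((np.diag(W.sum(axis=1)) - W) / sigma2)
    keep = vals > 1e-10                      # drop the constant (null) direction
    z = rng.standard_normal(keep.sum())
    return vecs[:, keep] @ (z / np.sqrt(vals[keep]))

def simulate_car_ar1(W, eta, sigma2, T, rng):
    """Y_1 ~ ICAR; Y_t = eta * Y_{t-1} + ICAR innovation for t = 2, ..., T."""
    Y = np.empty((T, W.shape[0]))
    Y[0] = sample_icar_centered(W, sigma2, rng)
    for t in range(1, T):
        Y[t] = eta * Y[t - 1] + sample_icar_centered(W, sigma2, rng)
    return Y

Y = simulate_car_ar1(grid_adjacency(3, 3), eta=0.9, sigma2=1.0, T=10,
                     rng=np.random.default_rng(2))
```

Because every innovation sums to zero, each simulated year does too, so the centering constraint is preserved over time; $\eta$ near 1 produces maps that change slowly from year to year.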
Chapter 3: Models
In this chapter, we describe a joint spatio-temporal model for the bivariate counts of opioid associated deaths and treatment admissions in Ohio's 88 counties from 2007 to 2016, combining ideas from several approaches that appeared in the first two chapters. In Section 3.1, we introduce the common spatial factor model. In Section 3.2, we give the computational details in the Bayesian framework. Then we cover Bayesian model selection methods in Section 3.3.
3.1 Models
Factor analysis is usually used to explore the underlying structure of multivariate data sets, and it is particularly useful when we are interested not in any single observed variable but in the meaningful common patterns underlying them [2]. Generally speaking, these underlying patterns are difficult or impossible to measure directly in practice. In this thesis, the overall goal is to estimate such a common factor, hypothesized to be responsible for the interrelations among the observed variables.
An early application of latent factor models to epidemiological mapping was Yanai [47], who used a standard factor analysis to study the spatial and temporal structure of multiple cancer sites in Japan. Wall and Wang [44] then developed a common spatial factor model that incorporates spatial dependence to improve estimation in a geographical context. They analyzed multivariate spatial data on cancer-specific mortality in Minnesota, USA, within both the frequentist and Bayesian frameworks.
Our model extends the generalized common spatial factor model [44]. In multivariate spatial data, there are two types of correlation: the correlation between variables at the same location, and the correlation of each variable across locations. We assume that both types of correlation arise through a common spatial factor, or latent variable, which shares spatial trends across the models for each outcome. It is a shared underlying factor that captures the spatial characteristics of a location that are associated with multiple outcomes [12].
The generalized spatial factor model framework is as follows. Let $Y_{ij}$ be the $j$th random variable ($j = 1, \ldots, p$) observed at location $s_i$ ($i = 1, \ldots, n$). We assume that, conditional on the fixed and random effects, the $Y_{ij}$ are independent and have a non-normal distribution from an exponential family with mean parameter $\theta_{ij}$ and a possibly separate variance parameter $\sigma_j^2$. The model can be written as
$$g(\theta_{ij}) = O_{ij} + \beta_j + \lambda_j f_i \tag{3.1}$$
where $g(\cdot)$ denotes an appropriate link function, $O_{ij}$ is a known offset determined by the specific data, $f_i$ is the underlying common spatial factor at location $s_i$, $\beta_j$ is an unknown outcome-specific intercept, and the unknown slopes $\lambda_j$ are called factor loadings. All spatial and between-outcome dependence is captured through the random effects.
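The sketch below instantiates (3.1) with a log link and Poisson outcomes, for which there is no separate variance parameter $\sigma_j^2$. All inputs are hypothetical: 50 locations, two outcomes, arbitrary offsets, intercepts and loadings, and a factor $f_i$ drawn i.i.d. for brevity even though in the model it is spatially structured. It shows that, once the offsets are removed, both linear predictors are linear in the same $f_i$, so all between-outcome dependence flows through the shared factor.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 2                                   # hypothetical: 50 locations, 2 outcomes

# Hypothetical inputs, not estimates from the thesis data.
O = np.log(rng.uniform(20, 200, size=(n, p)))  # offsets, e.g. log expected counts
beta = np.array([-0.2, 0.1])                   # outcome-specific intercepts beta_j
lam = np.array([0.8, 0.5])                     # factor loadings lambda_j
f = rng.standard_normal(n)                     # common factor f_i (i.i.d. here only
                                               # for brevity; spatial in the model)

# (3.1) with the log link: log(theta_ij) = O_ij + beta_j + lambda_j * f_i
eta = O + beta + np.outer(f, lam)
theta = np.exp(eta)
Y = rng.poisson(theta)                         # conditionally independent Poisson counts

# Removing the offsets leaves beta_j + lambda_j * f_i: both columns are
# linear in the same f, so they are perfectly correlated (loadings share a sign).
resid = eta - O
assert np.isclose(np.corrcoef(resid[:, 0], resid[:, 1])[0, 1], 1.0)
```

With loadings of opposite sign the same check would give a correlation of $-1$; adding outcome-specific error terms, as in (3.3), weakens this to a partial dependence.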
In our application, we have bivariate counts of opioid associated deaths and treatment admissions for each of Ohio's 88 counties from 2007 to 2016. We assume that, conditional on the fixed and random effects, the bivariate count data are independent Poisson random variables. For each county $i = 1, \ldots, 88$ and year $t = 1, \ldots, 10$, let $Y_{ti}^D$ be the count of deaths and $Y_{ti}^T$ be the count of treatment admissions. Then our model assumes

$$Y_{ti}^D \mid \lambda_{ti}^D \overset{\text{ind}}{\sim} \mathrm{Poisson}(\lambda_{ti}^D), \qquad Y_{ti}^T \mid \lambda_{ti}^T \overset{\text{ind}}{\sim} \mathrm{Poisson}(\lambda_{ti}^T), \tag{3.2}$$

where the Poisson means $\lambda_{ti}^D$ and $\lambda_{ti}^T$ are the relative risks of death and treatment in county $i$ in year $t$ multiplied by the expected numbers of deaths and treatment admissions in county $i$ in year $t$ [10].
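Because of the conditional independence in (3.2), the log-likelihood of the data given the Poisson means is a plain sum of Poisson log-probabilities over both outcomes, all counties and all years. A minimal sketch, assuming counts and means stored as arrays indexed `[t, i]` (a hypothetical layout) and strictly positive means; the function names and the tiny data set are illustrative only.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, lam):
    """Sum of log Poisson pmfs, y*log(lam) - lam - log(y!); requires lam > 0."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_factorial = np.vectorize(lgamma)(y + 1.0)
    return float(np.sum(y * np.log(lam) - lam - log_factorial))

def bivariate_loglik(y_death, lam_death, y_treat, lam_treat):
    """Log-likelihood under (3.2): conditional independence makes it a sum
    over both outcomes and every (year, county) cell."""
    return poisson_loglik(y_death, lam_death) + poisson_loglik(y_treat, lam_treat)

# Tiny hypothetical example: 2 years x 3 counties.
y_D = np.array([[3, 0, 5], [4, 1, 6]])
y_T = np.array([[20, 8, 30], [25, 9, 28]])
lam_D = np.array([[2.5, 0.8, 4.0], [3.5, 1.2, 5.5]])
lam_T = np.array([[22.0, 7.5, 29.0], [24.0, 10.0, 27.0]])
ll = bivariate_loglik(y_D, lam_D, y_T, lam_T)
```

All dependence between deaths and treatment admissions therefore has to enter through the means $\lambda_{ti}^D$ and $\lambda_{ti}^T$, which is what the log-linear model that follows is designed to do.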
We use the canonical log link function for the Poisson mean. That is,
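$$\begin{aligned}
\log(\lambda_{ti}^D) &= \log(P_{ti}^D) + X_{ti}^D \beta_t^D + \alpha_{ti}^D U_{ti} + \epsilon_{ti}^D, \\
\log(\lambda_{ti}^T) &= \log(P_{ti}^T) + X_{ti}^T \beta_t^T + \alpha_{ti}^T U_{ti} + \epsilon_{ti}^T,
\end{aligned} \tag{3.3}$$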
where the offsets $P_{ti}^D$ and $P_{ti}^T$ are the population counts of county $i$ in year $t$, $X_{ti}^D$ is the vector of explanatory variables with fixed effects $\beta_t^D$ for death, and $X_{ti}^T$ is the vector of explanatory variables with fixed effects $\beta_t^T$ for treatment in county $i$ in year $t$.
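The following sketch evaluates the Poisson means in (3.3) for one year across the 88 counties, with the same factor vector `U` entering both outcomes through different loadings. Everything besides the structure of (3.3) is a hypothetical assumption: the populations, the single standard-normal covariate, the coefficient values, the scalar loadings (in the model $\alpha_{ti}^D$ and $\alpha_{ti}^T$ vary by county, and arrays broadcast the same way), and the error scale.

```python
import numpy as np

def poisson_mean(P, X, beta, alpha, U, eps):
    """Poisson mean from (3.3) for one outcome in one year, vectorised over counties:
    lambda_i = exp(log P_i + X_i beta + alpha_i * U_i + eps_i)."""
    eta = np.log(P) + X @ beta + alpha * U + eps
    return np.exp(eta)

rng = np.random.default_rng(4)
n = 88                                                      # Ohio counties
P = rng.integers(10_000, 1_000_000, size=n)                 # hypothetical populations
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # intercept + hypothetical covariate
U = rng.standard_normal(n)                                  # shared factor, same for both outcomes

lam_death = poisson_mean(P, X, np.array([-9.0, 0.2]), 0.5, U, 0.1 * rng.standard_normal(n))
lam_treat = poisson_mean(P, X, np.array([-6.0, 0.1]), 0.3, U, 0.1 * rng.standard_normal(n))
y_death = rng.poisson(lam_death)
y_treat = rng.poisson(lam_treat)
```

Because the offset enters on the log scale with coefficient 1, $\exp(X_{ti}\beta_t + \alpha_{ti} U_{ti} + \epsilon_{ti})$ is a per-capita rate; for example, with an intercept-only design and $\beta = \log(0.001)$, populations of 1000 and 2000 give means of 1 and 2.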
Here, $U_{ti}$ is the generalized common spatial factor for our bivariate Poisson outcomes. We can interpret the spatial factor $U_{ti}$ as the “joint burden” related to both death and treatment that is unexplained by the selected covariates. The factor loadings for death and treatment are $\alpha_{ti}^D$ and $\alpha_{ti}^T$, respectively; they describe how strongly each outcome depends on the common spatial factor $U_{ti}$. The independent error terms $\epsilon_{ti}^D$ and $\epsilon_{ti}^T$ capture independent deviations that are outcome- and county-specific.
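A short derivation may make the role of the loadings concrete. Treating the offsets, covariates, fixed effects and loadings as fixed, and assuming that $\epsilon_{ti}^D$, $\epsilon_{ti}^T$ and $U_{ti}$ are mutually independent, the constant terms drop out of the covariance between the two log means:

$$
\begin{aligned}
\operatorname{Cov}\big(\log\lambda_{ti}^D,\ \log\lambda_{ti}^T\big)
&= \operatorname{Cov}\big(\alpha_{ti}^D U_{ti} + \epsilon_{ti}^D,\ \alpha_{ti}^T U_{ti} + \epsilon_{ti}^T\big) \\
&= \alpha_{ti}^D \alpha_{ti}^T \operatorname{Var}(U_{ti}).
\end{aligned}
$$

Under these assumptions, the sign of $\alpha_{ti}^D \alpha_{ti}^T$ determines whether the shared factor moves the two outcomes in the same or opposite directions, and its magnitude scales the dependence the factor induces.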
3.2 Priors and Computation
In the Bayesian paradigm, we need to assign prior distributions to all unknown parameters in the model. We expect the spatial factor and the factor loadings to exhibit spatial autocorrelation within each year, so we assume an ICAR model for these quantities within each year $t$. We also expect the factor and loadings in year $t$ to be associated with their values in the previous year. To capture this temporal dependence, we assume a vector autoregressive process of order 1, so that the mean in year $t$ is a function of the previous year's value.
More specifically, we assume the following model for the spatial factor $U_t$: