
A JOINT SPATIO-TEMPORAL MODEL OF OPIOID ASSOCIATED DEATHS AND TREATMENT ADMISSIONS IN OHIO

BY

YIXUAN JI

A Thesis Submitted to the Graduate Faculty of

WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES

in Partial Fulfillment of the Requirements

for the Degree of

MASTER OF ARTS

Mathematics and

May 2019

Winston-Salem, North Carolina

Approved By:

Staci Hepler, Ph.D., Advisor

Miaohua Jiang, Ph.D., Chair

Robert Erhardt, Ph.D.

Acknowledgments

First and foremost, I would like to express my sincere gratitude and appreciation to my advisor, Dr. Staci A. Hepler. Without your continuous support and inspiration, I could not have completed this thesis research. Thank you so much for your patience, motivation, enthusiasm, and immense knowledge, which made this research even more enjoyable. In addition to my advisor, I would also like to thank Dr. Miaohua Jiang and Dr. Robert Erhardt for their constant encouragement and advice, and for serving on my thesis committee. Furthermore, this thesis benefited greatly from various courses that I took, including Bayesian statistics from Dr. Hepler, factor analysis and Markov chains from Dr. Berenhaut, Poisson linear models from Dr. Erhardt’s Generalized Linear Model class, R programming skills from Dr. Nicole Dalzell, and reasoning skills from Dr. John Gemmer’s Real Analysis class. Additionally, I would like to extend my thanks to Dr. Ellen Kirkman, Dr. Stephen Robinson, and Dr. Sarah Raynor for their special attention and support to me as an international student. Last but not least, I would like to thank my family for their sincere support throughout my life. Moreover, I appreciate the help and companionship of all my fellow graduate students, faculty, and staff in the Mathematics and Statistics Department.

Table of Contents

Acknowledgments
Abstract
List of Figures
List of Tables

Chapter 1 Introduction
1.1 Opioid Misuse in Ohio
1.2 Data Collection
1.3 Overview Of Research

Chapter 2 Background Information
2.1 Introduction to Bayesian Statistics
2.2 Markov Chain Monte Carlo
2.2.1 Metropolis-Hastings Algorithm
2.2.2 Gibbs Sampler
2.3 Spatial Statistics
2.3.1 CAR Model
2.3.2 Spatio-temporal

Chapter 3 Models
3.1 Models
3.2 Priors and Computation
3.3 Model Selection

Chapter 4 Conclusions
4.1 Model Comparison
4.2 Effects on Death Rates
4.3 Effects on Treatment Rates
4.4 Spatial Factors
4.5 Scaled Loadings
4.6 Independent Error Term

Chapter 5 Discussion
5.1 Censored Observations
5.2 Covariates

Bibliography

Appendix A Description of covariates

Appendix B Trace Plots
B.0.1 R code
B.0.2 Trace Plots

Appendix C Relevant R code
C.0.1 Model
C.0.2 Map of Ohio

Curriculum Vitae

Abstract

Opioid misuse is a major public health issue in the United States, and in 2017 Ohio had the second highest age-adjusted drug overdose rate. In this thesis, we consider a joint spatio-temporal model of county-level surveillance data on deaths and treatment admissions using a latent spatial factor model. Our main goal is to estimate a common spatial factor, which offers a summary of the underlying joint burden of opioid misuse across space and time. The result is intended to provide a valuable tool for allocating resources across the state in a timely manner.

Keywords: opioid epidemic, factor model, conditional autoregressive, spatio-temporal, resource allocation

List of Figures

1.1 Observed death rates
1.2 Observed treatment rates
1.3 Observed adolescent poverty
4.1 Estimated variance ratio
4.2 Posterior mean rate for death and treatment
4.3 Diagram of the joint spatio-temporal model
4.4 Spatial Factors
4.5 Spatial Loadings
4.6 Error Term D
4.7 Error Term T
B.1 Trace Plot

List of Tables

1.1 Diagnostic Codes used to define treatment
4.1 Posterior mean of intercepts β1, β3 and slopes β2, β4 with marginal coefficients e^{β2}, e^{β4} for death rates and treatment rates respectively
4.2 Posterior standard deviation of intercepts β1, β3 and slopes β2, β4 for death rates and treatment rates respectively
A.1 Covariates

Chapter 1: Introduction

1.1 Opioid Misuse in Ohio

Poisoning is one of the leading causes of unintentional injury death in the United States, second only to motor vehicle crashes [9]. Almost all poisoning deaths result from drugs, and most drug poisonings are attributed to the abuse of prescription and illegal drugs. Overdose death rates vary by type of drug. Cocaine and heroin used to be the principal causes. However, deaths related to opioids began increasing and accounted for a large number of deaths at the beginning of the 21st century [45]. Of note, concomitant use of benzodiazepines is implicated in the majority of prescription opioid-related overdose deaths [21].

Opioids are widely used for managing pain from acute injuries, surgeries, and advanced cancer. However, opioid misuse in some circumstances is associated with an elevated risk of early death, legal problems, and infectious diseases such as hepatitis C and human immunodeficiency virus (HIV). The risk of premature mortality related to opioid misuse is attributed to accidental drug overdose, suicide, trauma, and HIV [11]. Opioid abuse is currently a national public health epidemic, and conditions related to opioid misuse and abuse continue to deteriorate in the United States. Opioid-related drug overdoses account for over 60% of drug overdose deaths within the United States, representing a fourfold increase within the past 15 years [7]. Treatment admissions for prescription opioid dependence have risen more than fivefold since 2000 [35].

In particular, the drug overdose death rate in Ohio is above the national average. Nationally, opioid overdose deaths reached 42,249 (66.4% of all drug overdose deaths) in 2016 (13.3 per 100,000 population), a 27.9% rate increase from 2015 [37]. Of note, Ohio ranked among the top five states for opioid-related overdose deaths. Ohio had the third highest opioid overdose death rate (32.9 deaths per 100,000) in the country in 2016, up from 5.9 deaths per 100,000 in 2009 [14].

Fentanyl is an opioid used as a pain medication. However, it has also emerged as an unprecedented threat to public health [17]. There have been numerous reports of inadvertent overuse, intentional misuse, and outbreaks of overdoses with fentanyl. In 2017, fentanyl and related drugs like carfentanil were involved in 70.7 percent of overdose deaths in Ohio, compared with 58.2 percent in 2016, 37.9 percent in 2015, and 19.9 percent in 2014 [28]. The highest rates were observed in the Southwestern portion of Ohio [26].

In this thesis, our overall goal is to quantify the burden of the opioid epidemic at the county level. Modeling illicit activity is usually accomplished through survey data. However, survey data at the county level are unavailable or sparse at best. Traditional public health surveys of hard-to-reach and stigmatized populations are likely to suffer from underreporting and to be prohibitively expensive [29]. Complex survey designs, like respondent driven sampling [18], are sometimes used to deal with these problems, but they are not common because of their time-intensive nature [31]. In Ohio, surveillance data on opioid associated deaths and treatment admissions are routinely collected at the county level. In this research, we develop a multivariate spatio-temporal factor model to jointly model the available county level surveillance data. The factor model framework synthesizes the information in the multiple outcomes, providing an overall measure of the burden of the opioid epidemic at the county level [19].

1.2 Data Collection

In this thesis, we use two types of surveillance data routinely collected by the state of Ohio: opioid associated deaths and treatment admissions. We obtained these surveillance data to fit a joint spatio-temporal model of opioid associated deaths and treatment admissions for all of Ohio’s 88 counties for the most recent available years, 2007-2016. Death data were collected from the Ohio Public Health Data Warehouse Ohio Resident Mortality Data, and these counts are publicly accessible from the Ohio Department of Health website (http://publicapps.odh.ohio.gov/EDW/DataCatalog). All resident deaths where poisoning from any opiate is mentioned on the death certificate are included in the death counts. Each death is attributed to the Ohio county where the decedent resided, regardless of the place of death. We searched the underlying cause of death dataset for deaths with an International Classification of Diseases-10 (ICD-10) code T40.0-T40.4 or T40.6, defined as “poison from any opiate.” Population estimates were collected from the Centers for Disease Control and Prevention (CDC) under the category of demographics. Observed death rates per 100,000 residents for each county are shown in Figure 1.1. Figure 1.1 shows clearly higher death rates across Southern Ohio. We also notice that high death rates clustered in Southwestern Ohio in the latest 5 years, whereas they were primarily centered in Southeastern Ohio from 2007 to 2011.

Figure 1.1: Observed death rates

Treatment admission data were collected from the Ohio Department of Mental Health and Drug Addiction Services through a data use agreement. Treatment admissions were identified according to the diagnostic codes listed in Table 1.1 and included any residential, intensive outpatient, or outpatient treatment for opioid misuse. Patients who presented to hospitals or to any other medical facility for overdose or complications from opioid misuse are excluded from our counts. A patient with several admissions was counted only once. In this data set, treatment data for people under and over age 21 were collected separately, but in this thesis we use only the total counts, that is, the sum of the two age groups. Because the data are yearly county-level counts, some counts in both the under-21 and over-21 groups are suppressed (censored) according to state policy. Counts marked as “?” in the raw data set were under 10. We therefore imputed a value of 5 for every censored count.

ICD-9 Diagnostic Codes
304.00  Opiate Type Dependence, Unspecified Use
304.01  Opiate Type Dependence, Continuous Use
304.02  Opiate Type Dependence, Episodic
304.03  Opiate Type Dependence, in Remission
304.70  Combinations of Opioid Type Drug with Any Other, Unspecified Use
304.71  Combinations of Opioid Type Drug with Any Other, Continuous Use
304.72  Combinations of Opioid Type Drug with Any Other, Episodic
304.73  Combinations of Opioid Type Drug with Any Other, in Remission
305.50  Opioid Abuse, Unspecified Use
305.51  Opioid Abuse, Continuous Use
305.52  Opioid Abuse, Episodic
305.53  Opioid Abuse, in Remission

DSM-IV-TR Diagnostic Codes
292.00  Opioid Withdrawal
292.11  Opioid-Induced Psychotic Disorder, with Delusions
292.12  Opioid-Induced Psychotic Disorder, with Hallucinations
292.81  Opioid Intoxication Delirium
292.84  Opioid-Induced Mood Disorder
292.89  Opioid Intoxication
292.89  Opioid-Induced Sexual Dysfunction
292.89  Opioid-Induced Sleep Disorder
292.90  Opioid-Related Disorder, NOS
304.00  Opioid Dependence
305.50  Opioid Abuse

Table 1.1: Diagnostic Codes used to define treatment

Treatment rates per 100,000 residents are shown in Figure 1.2, where we can see that treatment rates are consistently highest in Southeastern Ohio from 2007 to 2016. Many treatment resources have also been placed in this area [41], which we discuss in a later section.
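The handling of censored counts and the conversion to rates per 100,000 residents can be sketched as follows. This is only an illustration in Python: the county names, counts, and populations are made up, and the helper names `impute_count` and `rate_per_100k` are hypothetical rather than taken from the thesis code.

```python
# Hypothetical raw records: (county, treatment count as text, population).
# All values below are invented for illustration only.
CENSOR_MARK = "?"      # marker used for suppressed counts (fewer than 10)
IMPUTED_VALUE = 5      # value substituted for every censored count


def impute_count(raw):
    """Return the count as an int, replacing the censoring marker with 5."""
    return IMPUTED_VALUE if raw.strip() == CENSOR_MARK else int(raw)


def rate_per_100k(count, population):
    """Convert a count to a rate per 100,000 residents."""
    return 100000 * count / population


records = [("County A", "120", 400000), ("County B", "?", 50000)]
rates = {name: rate_per_100k(impute_count(raw), pop) for name, raw, pop in records}
```

Here the censored count for the second county is replaced by 5 before the rate is computed, so every county-year has a usable rate.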

Figure 1.2: Observed treatment rates

We also collected county level covariate information about the social and environmental characteristics of each county to explain death and treatment rates. We acquired population estimates and environmental variables from the Centers for Disease Control and Prevention (CDC). CDC’s National Environmental Public Health Tracking Network provides health data and environment data from national, state, and city sources (https://ephtracking.cdc.gov). According to previous research [30], people living in rural counties were more engaged in the misuse of opioids compared to their urban counterparts. We selected the percentage of adolescents living in poverty as our covariate to account for socioeconomic status in this analysis. Adolescent poverty is shown in Figure 1.3, where we notice consistently higher rates of adolescent poverty in the Eastern and Southeastern parts of the state. In general, many other social and environmental factors affect the opioid epidemic, such as demographics and chronic disease conditions. We consider only this one covariate given the limited availability of yearly county level information for non-Census years.

Figure 1.3: Observed adolescent poverty

1.3 Overview Of Research

Factor models are used to investigate the shared latent patterns of multivariate data. The method assumes the outcomes share a common latent variable because they measure conceptually similar things, which allows us to describe multiple outcomes using the common factor. Recently, much research in public health and health care has focused on jointly analysing several potentially related diseases. As a result, there has been increasing work on joint mapping analyses that investigate shared risk factors. Such analyses take advantage of observed patterns and possible risk factors to estimate the risk factors shared by many outcomes, and a factor model is particularly useful for this purpose. An example is the work of Wang and Wall [44], who used factor models to study spatial characteristics of different types of cancers. Neeley [27] developed a spatial confirmatory factor analysis to quantify a latent unobserved measure of climate using observations from different climate models. Latent factor models have also been used for estimating common trends in multivariate time series. Tzala [43] uncovered the spatial and temporal patterns of latent factors underlying cancer data in Greece.

In this thesis, opioid associated deaths and treatment admissions are two outcomes of the opioid epidemic. They both tell stories about the underlying epidemic, but from different aspects. Instead of modeling a single outcome, we model death and treatment jointly to provide more information about the overall burden of the opioid epidemic in each county. It is therefore essential to measure associations between the outcomes for each county, since modeling across rates of death and treatment improves estimation of the underlying outcomes [25]. In addition, we can include properties of a county that may be related to the burden in that county [44]. There are more general models we could fit instead of factor models, such as a multivariate conditional autoregressive model, but our interest is in the underlying latent process driving these outcomes rather than any single outcome, and so a factor model lets us accomplish the overall goal of our research.

We use areal models for spatial data to help quantify the local scope of the opioid epidemic in Ohio counties. A common approach to modeling areal data is to fit conditional autoregressive (CAR) models [4]. By assuming our common spatial factor follows a spatio-temporal CAR model, we account for spatial and temporal dependence in the latent opioid burden. By combining a factor model with a CAR model, we account not only for the spatial and temporal dependence but also for dependence between the outcomes.

In this thesis we consider a joint spatio-temporal model of opioid associated deaths and treatment admissions in Ohio. We have bivariate count observations of opioid associated deaths and treatment admissions for all 88 Ohio counties for each year from 2007-2016. The overarching goal of this research is to analyze surveillance data within the Bayesian paradigm and estimate the underlying latent common spatial factor after accounting for socioeconomic properties of a county. We can then measure the unobserved burden of the opioid epidemic for each county in each year. By evaluating the relative burden across the state, we can help policy makers and public health professionals target the counties that are most in need of intervention. The work of Hepler and Kline [22] has a similar goal, but they consider only the spatial case. They fit a joint spatial model in Ohio using counts aggregated over 2013-2015. However, that framework ignored an important feature, namely temporal structure. In this thesis, we obtain counts that are broken down by year so that we can assess both spatial and temporal patterns associated with the opioid epidemic in Ohio. By modelling trends over space and time, we can better respond to changes in the epidemic across the state in a timely manner. The model can also provide policy makers with a valuable tool to track results after interventions at critical times.

The general layout of this thesis is as follows. Chapter 2 presents the prerequisite knowledge needed to analyze the data. Chapter 3 contains the statistical methods and models, with the results described in Chapter 4. Finally, we discuss the findings and future work in Chapter 5.

Chapter 2: Background Information

This chapter sets up the background information used in our research. Our research takes a Bayesian approach to conduct a bivariate areal data analysis at the county level. Section 2.1 covers some basic Bayesian theory. Section 2.2 reviews the basics of the iterative process of Markov chain Monte Carlo and the necessary theorems. The code created for this research is based on two algorithms: the Metropolis-Hastings algorithm, covered in Section 2.2.1, and the Gibbs sampler, covered in Section 2.2.2. Both algorithms are used to update the model parameters. In Section 2.3, we cover the basics of spatial statistics from two aspects: the conditional autoregressive model in Section 2.3.1 and spatio-temporal models in Section 2.3.2.

2.1 Introduction to Bayesian Statistics

Bayesian statistics is named after Thomas Bayes, whose work on what is now called Bayes’ theorem was published in 1763 [3]. Whereas classical or frequentist statistics assumes parameters θ are unknown constants, the Bayesian paradigm assumes parameters are random variables, and uncertainty in parameters is quantified through a probability distribution. Beliefs about the possible values of a parameter are summarized by a probability distribution π on the support Θ of θ. The prior distribution π(θ) quantifies the beliefs regarding θ prior to observing any data. After observing data x, the beliefs regarding θ are updated and inference is based on the resulting posterior distribution of θ conditional on x, denoted π(θ|x). According to Bayes’ Theorem,

\[ \pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}. \tag{2.1} \]

Note that f(x|θ) is the likelihood function and m(x) is the marginal distribution of x, where

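\[
m(x) =
\begin{cases}
\sum_{\theta \in \Theta} f(x \mid \theta)\,\pi(\theta) & \text{if } \theta \text{ is discrete,} \\
\int_{\Theta} f(x \mid \theta)\,\pi(\theta)\,d\theta & \text{if } \theta \text{ is continuous.}
\end{cases}
\tag{2.2}
\]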
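As a small numerical illustration of (2.1) and the discrete case of (2.2), the following Python sketch updates a prior over three candidate values of θ after observing binomial data. The binomial likelihood, the grid of θ values, the uniform prior, and the function name `posterior_discrete` are assumptions chosen for illustration; they are not part of the model used in this thesis.

```python
from math import comb


def posterior_discrete(thetas, prior, x, n):
    """Posterior pi(theta | x) on a discrete grid for a Binomial(n, theta) likelihood."""
    likelihood = [comb(n, x) * t ** x * (1 - t) ** (n - x) for t in thetas]
    joint = [lik * p for lik, p in zip(likelihood, prior)]
    m_x = sum(joint)                     # marginal m(x): discrete case of (2.2)
    return [j / m_x for j in joint]      # Bayes' theorem (2.1)


thetas = [0.2, 0.5, 0.8]                 # illustrative support of theta
prior = [1 / 3, 1 / 3, 1 / 3]            # uniform prior over the three values
post = posterior_discrete(thetas, prior, x=7, n=10)
```

With 7 successes in 10 trials, the posterior mass shifts toward the largest value of θ, while the three posterior probabilities still sum to one.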

Bayesian inference is entirely based on the posterior distribution. The overall purpose of inference is to provide a decision for decision makers. We therefore need an evaluation criterion for decision procedures in order to compare different decisions and avoid absurd solutions; this is the subject of decision theory. In decision theory, we usually specify a loss function L(θ, d) that assesses the consequences of each decision and depends on the parameters of the model. The quadratic loss, proposed by Legendre [24] and Gauss [15], is the most common evaluation criterion. It is also called squared error loss,

\[ L(\theta, d) = (\theta - d)^2, \tag{2.3} \]

where d ∈ D is a decision related to θ ∈ Θ based on the observation x ∈ X. The quadratic loss penalizes large deviations heavily.

To make an effective comparison according to the loss function, the frequentist approach is to consider the risk, or average loss,

\[ R(\theta, \delta) = E_{\theta}[L(\theta, \delta(x))] = \int_{X} L(\theta, \delta(x))\, f(x \mid \theta)\,dx, \tag{2.4} \]

where δ(x) is the decision rule. In the Bayesian framework, the approach to decision theory is to define the integrated risk.

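Definition 1. The integrated risk is the frequentist risk averaged over the values of θ according to their prior distribution,

\[
\begin{aligned}
r(\pi, \delta) &= E^{\pi}[R(\theta, \delta)] \\
&= \int_{\Theta} R(\theta, \delta)\,\pi(\theta)\,d\theta \\
&= \int_{\Theta} \int_{X} L(\theta, \delta(x))\, f(x \mid \theta)\,dx\,\pi(\theta)\,d\theta.
\end{aligned}
\tag{2.5}
\]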

Then we can find a constructive tool for the determination of a Bayes estimator.

Definition 2. A Bayes estimator associated with prior π(θ) and loss function L(θ, δ(x)) is any estimator $\delta^{\pi}$ that minimizes r(π, δ). The value $r(\pi, \delta^{\pi})$ is called the Bayes risk.

The Bayes decision rule gives us a method for minimizing the overall risk, and the Bayes risk is the best we can do. In fact, quadratic loss is extensively used because it agrees with non-decision-theoretic inference based on the posterior mean. That is, the Bayes estimators associated with the quadratic loss are the posterior means.

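Proposition 2.1. The Bayes estimator $\delta^{\pi}$ associated with the prior distribution π and with the quadratic loss (2.3) is the posterior expectation

\[ \delta^{\pi}(x) = E^{\pi}[\theta \mid x] = \frac{\int_{\Theta} \theta\, f(x \mid \theta)\,\pi(\theta)\,d\theta}{\int_{\Theta} f(x \mid \theta)\,\pi(\theta)\,d\theta}. \tag{2.6} \]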
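The ratio of integrals in (2.6) can be checked numerically in a case where the answer is known. The sketch below is an illustration only and assumes a Binomial(n, θ) likelihood with a Beta(a, b) prior, for which the posterior is Beta(a + x, b + n − x) with mean (a + x)/(a + b + n); neither this example nor the function name `posterior_mean_grid` comes from the thesis itself.

```python
def posterior_mean_grid(x, n, a, b, grid_size=20000):
    """Approximate (2.6) with a midpoint rule on (0, 1).

    Assumes a Binomial(n, theta) likelihood and a Beta(a, b) prior.
    """
    numerator = 0.0
    denominator = 0.0
    for k in range(grid_size):
        theta = (k + 0.5) / grid_size
        # f(x | theta) * pi(theta), up to constants that cancel in the ratio
        weight = theta ** (x + a - 1) * (1 - theta) ** (n - x + b - 1)
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


x, n, a, b = 7, 10, 2.0, 2.0
numeric = posterior_mean_grid(x, n, a, b)
closed_form = (a + x) / (a + b + n)      # mean of the Beta(a + x, b + n - x) posterior
```

The two values agree to several decimal places, which is the sense in which the posterior mean is the Bayes estimator under quadratic loss in this toy setting.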

Note that the quadratic loss is not the only loss satisfying this property, but the Bayes estimators are robust with respect to the quadratic form. Moreover, the quadratic loss is particularly useful in the setting of bounded parameter spaces when the choice of a more subjective loss is impossible. Therefore, I use the quadratic loss function in this thesis.

2.2 Markov Chain Monte Carlo

In practice, it is rarely the case that the posterior distribution takes a known form. However, we can approximate the posterior distribution by simulating draws from this distribution using Markov Chain Monte Carlo (MCMC).

As mentioned in the previous section, our Bayes estimator of all random quantities will be the posterior mean. Since the posterior does not take a recognizable form, we cannot compute the posterior mean analytically. However, we can estimate this quantity with a Monte Carlo estimator. The Monte Carlo estimator of an expected value is the sample mean. As the sample size grows, the sample mean converges in probability to the true expected value. From a Bayesian perspective, Monte Carlo integration can be applied if observations can be generated from the posterior distribution. MCMC is used to generate these samples from the posterior distribution. For the Monte Carlo estimator to converge to the true expected value, the Markov chain must satisfy some properties that are outlined below for the discrete state space case. The continuous state space case follows analogously, and the corresponding definitions can be found in Stochastic Processes by Lamperti [23].
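A minimal sketch of the Monte Carlo estimator, assuming for illustration that the posterior is a Beta(9, 5) distribution from which we can draw directly (in the models of this thesis such direct draws are not available, which is why MCMC is needed). The seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(2019)               # fixed seed so the sketch is reproducible
a, b = 9.0, 5.0                         # illustrative Beta(9, 5) "posterior"
n_draws = 100000

draws = [rng.betavariate(a, b) for _ in range(n_draws)]
mc_estimate = sum(draws) / n_draws      # Monte Carlo estimator: the sample mean
exact_mean = a / (a + b)                # known mean of a Beta(a, b) distribution
```

With this many draws the sample mean should lie very close to the exact mean a/(a + b); with fewer draws the estimate is noisier.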

Let P be a k × k matrix with elements $\{P_{i,j} : i, j = 1, \ldots, k\}$. A random walk is a sequence of random variables $X = (X_0, X_1, X_2, \ldots)$, starting at some fixed node $X_0$ and taking values in the fixed finite state space $S = \{s_1, s_2, \ldots, s_k\}$. One common type of random walk is a time-homogeneous Markov chain on S, i.e.

\[
\begin{aligned}
& P(X_{n+1} = s_j \mid X_0 = s_{i_0}, X_1 = s_{i_1}, \ldots, X_{n-1} = s_{i_{n-1}}, X_n = s_i) \\
&\quad = P(X_{n+1} = s_j \mid X_n = s_i) \\
&\quad = P_{i,j}.
\end{aligned}
\tag{2.7}
\]

P is the transition matrix, and its elements are called transition probabilities. This property means the conditional distribution of $X_{n+1}$ given $(X_0, \ldots, X_n)$ depends only on $X_n$. For a Markov chain, the following definitions and properties will be necessary for the computational methods outlined in the following sections.

Definition 3. A Markov chain $(X_0, X_1, \ldots)$ with state space $S = \{s_1, \ldots, s_k\}$ and transition matrix P is said to be irreducible if for all $s_i, s_j \in S$ we have that $s_i \leftrightarrow s_j$. Otherwise the chain is said to be reducible.

In other words, a random walk on a state space is said to be irreducible if for all 1 ≤ i, j ≤ k and M ≥ 0 there exists N ≥ 0 such that

\[ P(X_{M+N} = s_j \mid X_M = s_i) > 0. \tag{2.8} \]

Definition 4. The period of a state $s_i \in S$ is the greatest common divisor of all possible numbers of steps in which, starting from $s_i$, the chain can return to $s_i$, i.e.

\[ \gcd\{t \geq 1 : P(X_t = s_i \mid X_0 = s_i) > 0\}. \tag{2.9} \]

We say a random walk is aperiodic if the period of every state is 1.

Definition 5. Let $(X_0, X_1, \ldots)$ be a Markov chain with state space $\{s_1, \ldots, s_k\}$ and transition matrix P. A row vector $\pi = (\pi_1, \ldots, \pi_k)$ is said to be a stationary distribution for the Markov chain if it satisfies

(1) $\pi_i \geq 0$ for $i = 1, \ldots, k$, and $\sum_{i=1}^{k} \pi_i = 1$, and

(2) $\pi P = \pi$, meaning that $\sum_{i=1}^{k} \pi_i P_{i,j} = \pi_j$ for $j = 1, \ldots, k$.

Theorem 2.2. Any irreducible and aperiodic Markov chain has exactly one stationary distribution.

Theorem 2.2 concerns the existence and uniqueness of stationary distributions. In Markov chain Monte Carlo, we need to generate an irreducible and aperiodic Markov chain whose stationary distribution is the posterior distribution [34]. Suppose we can construct an irreducible and aperiodic Markov chain $(X_0, X_1, \ldots)$ whose unique stationary distribution is π. If we run the chain from any initial distribution, the distribution of $X_n$ converges to the stationary distribution π as n tends to infinity. The mean of the stationary distribution is then exactly our Bayes estimator under the quadratic loss function.
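The convergence described above can be illustrated on a small chain. The three-state transition matrix below is a made-up example (irreducible, and aperiodic because states can remain where they are), and the helper names `step` and `stationary` are hypothetical. Repeatedly multiplying an initial distribution by P approaches the same stationary vector regardless of the starting state.

```python
def step(dist, P):
    """One step of the chain: the row vector dist multiplied by P."""
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]


def stationary(P, start, n_steps=200):
    """Approximate the stationary distribution by iterating dist <- dist P."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Illustrative transition matrix; each row sums to one.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

pi_from_first = stationary(P, [1.0, 0.0, 0.0])   # chain started in state 1
pi_from_last = stationary(P, [0.0, 0.0, 1.0])    # chain started in state 3
```

Both starting points lead to approximately (0.25, 0.5, 0.25), which satisfies πP = π for this matrix.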

2.2.1 Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is one of the most common MCMC algorithms. The main idea is to generate the next state $X_{t+1}$ given $X_t$ using a proposal density g and a target posterior distribution f. The first step is to draw a candidate point Y from the proposal density $g(\cdot \mid X_t)$. If the candidate is accepted, the chain moves to Y at time t + 1, i.e., $X_{t+1} = Y$; otherwise the chain stays at its current state, so that $X_{t+1} = X_t$. The most commonly used proposals are the normal and uniform distributions, since both are symmetric about the current state, which makes the calculations simpler. For instance, if the proposal density is normal, then $g(\cdot \mid X_t)$ is the $\text{Normal}(\mu_t = X_t, \sigma^2)$ density for some known $\sigma^2$. The resulting chain is irreducible and aperiodic [32], with stationary distribution given by the posterior distribution.

The Metropolis-Hastings algorithm is as follows:

(1) Choose a proposal distribution $g(\cdot \mid X_t)$ that satisfies the regularity conditions above.

(2) Set t = 0 and generate $X_0$ from an initial distribution.

(3) Generate a candidate point Y from $g(\cdot \mid X_t)$.

(4) Generate U from Uniform(0,1).

(5) If

\[ U \leq \frac{f(Y)\,g(X_t \mid Y)}{f(X_t)\,g(Y \mid X_t)}, \]

accept Y and set $X_{t+1} = Y$; otherwise set $X_{t+1} = X_t$.

(6) Increment t.

(7) Repeat (3) to (6) until the chain has converged to a stationary distribution.
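The steps above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in this thesis: the target is assumed to be a Normal(3, 1) density known only up to a constant, the proposal is a normal random walk, and the seed, chain length, and number of discarded early iterations are arbitrary choices.

```python
import math
import random


def metropolis_hastings(log_f, x0, n_iter, proposal_sd, rng):
    """Random-walk Metropolis-Hastings with a Normal(x_t, proposal_sd^2) proposal."""
    chain = [x0]
    x = x0
    for _ in range(n_iter):
        y = rng.gauss(x, proposal_sd)       # step (3): candidate from g(.|x_t)
        u = rng.random()                    # step (4): U ~ Uniform(0, 1)
        # Step (5): the proposal is symmetric, so g(x|y) / g(y|x) = 1 and the
        # ratio reduces to f(y) / f(x); it is compared on the log scale.
        if u == 0.0 or math.log(u) <= log_f(y) - log_f(x):
            x = y                           # accept the candidate
        chain.append(x)                     # otherwise the chain stays at x
    return chain


def log_target(x):
    """Unnormalized log density of the illustrative Normal(3, 1) target."""
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(2019)
chain = metropolis_hastings(log_target, x0=0.0, n_iter=20000, proposal_sd=1.0, rng=rng)
kept = chain[2000:]                         # drop the early iterations
estimate = sum(kept) / len(kept)            # Monte Carlo estimate of the target mean
```

Because only the ratio f(Y)/f(X_t) enters step (5), the normalizing constant of the target is never needed, which is what makes the algorithm usable for posterior distributions.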

Of note, the acceptance probability in step (5) is

\[ p(X_t, Y) = \min\left\{ \frac{f(Y)\,g(X_t \mid Y)}{f(X_t)\,g(Y \mid X_t)},\; 1 \right\}. \]

The stationary distribution of the Metropolis-Hastings chain is then exactly the target distribution f, which in Bayesian statistics is the desired posterior distribution.

2.2.2 Gibbs Sampler

The Gibbs sampler is the second method used in Markov chain Monte Carlo and was named by Geman and Geman [16]. It is a special case of the Metropolis-Hastings sampler that is often applied to multivariate posterior distributions. We generate a chain by sampling from the full conditional distributions of the target distribution, and every candidate point is therefore accepted. Let $X = (X_1, X_2, \ldots, X_d)$ be a random vector in $\mathbb{R}^d$ for some d > 1 with full conditional densities $f_1, f_2, \ldots, f_d$. Define the (d − 1)-dimensional random vectors

\[ X_{(-j)} = (X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_d) \]

and

\[ X_j \mid X_{(-j)} \sim f_j(X_j \mid X_{(-j)}), \]

where $f_j(X_j \mid X_{(-j)})$ is the corresponding univariate full conditional density of $X_j$ given $X_{(-j)}$. The Gibbs sampler then generates the chain by iteratively sampling from the d conditional densities.

We write X(t) for $X_t$ in the following Gibbs sampling algorithm:

(1) Initialize X(0) at time t = 0.

(2) Set x = X(t − 1), starting with t = 1.

(3) For j = 1, ..., d,

(a) Generate $X_j^*(t)$ from $f_j(X_j \mid x_{(-j)})$.

(b) Update $x_j = X_j^*(t)$.

(4) Set $X(t) = (X_1^*(t), \ldots, X_d^*(t))$.

(5) Increment t.

(6) Repeat (2) to (5) for t = 1, 2, ....
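As an illustration of the algorithm above, the following Python sketch runs a Gibbs sampler for a standard bivariate normal distribution with correlation ρ, for which each full conditional is the univariate normal Normal(ρ·x_other, 1 − ρ²). The target, the value ρ = 0.6, the seed, and the number of discarded early iterations are all assumptions made for the example and are unrelated to the model of this thesis.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, rng, start=(0.0, 0.0)):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    x = list(start)                            # step (1): initialize X(0)
    cond_sd = math.sqrt(1.0 - rho ** 2)        # sd of each full conditional
    chain = []
    for _ in range(n_iter):
        # Step (3): sweep through the coordinates, always conditioning on the
        # most recent value of the other coordinate.
        x[0] = rng.gauss(rho * x[1], cond_sd)  # X1 | X2 ~ Normal(rho * x2, 1 - rho^2)
        x[1] = rng.gauss(rho * x[0], cond_sd)  # X2 | X1 ~ Normal(rho * x1, 1 - rho^2)
        chain.append((x[0], x[1]))             # step (4): store X(t)
    return chain


rng = random.Random(7)
chain = gibbs_bivariate_normal(rho=0.6, n_iter=20000, rng=rng)
kept = chain[1000:]                            # drop the early iterations
mean_x1 = sum(p[0] for p in kept) / len(kept)
mean_x1x2 = sum(p[0] * p[1] for p in kept) / len(kept)   # estimates rho here
```

Every draw is kept because it comes directly from a full conditional, so there is no accept or reject step; the retained draws should have mean near zero and average product near ρ.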

Gibbs sampling is often faster and computationally cheaper than the Metropolis-Hastings sampler since every draw is accepted at each iteration. In the Gibbs sampler, we are not directly sampling from the joint posterior distribution itself. Rather, we simulate samples by sweeping through all the posterior conditionals, one random variable at a time. Because we initialize the algorithm with random values, the samples simulated at early iterations may not be representative of the actual posterior distribution. For this reason, MCMC algorithms are typically run for a large number of iterations. Because samples from the early iterations are not from the target posterior, it is common to discard these samples. The discarded iterations are often referred to as the “burn-in” period [6].

Metropolis and Gibbs can be combined by doing a Metropolis update for a certain full conditional within a Gibbs sampler. This sort of approach is known as Metropolis-within-Gibbs. I will discuss the computational details for the research of this thesis in Section 3.2.

The MCMC algorithm was implemented in OpenBUGS. BUGS (Bayesian inference Using Gibbs Sampling) is a software package for performing Bayesian inference. The user specifies a statistical model by simply stating the relationships between related variables. The software includes an expert system, which determines an appropriate MCMC scheme based on the Gibbs sampler for analysing the specified model. BUGS also offers a wide range of output types from which users are free to choose. OpenBUGS is an open-source version of the package, and we can run OpenBUGS and analyse its output from within R [39].

2.3 Spatial Statistics

“Everything is related to everything else, but near things are more related than distant things.” Waldo Tobler [42]

Spatial statistics methods use location information to quantify patterns among data. The assumption of spatial statistics is that nearby geo-referenced units are associated in some way. Ignoring spatial dependence results in inaccurate standard errors and less precise predictions. The development of the GIS community plays an important role in the application of spatial statistics, since there is now an abundance of geo-referenced data. For instance, spatial statistics methods are widely used in public health, climate science, and many other application areas. There are a variety of spatial data types depending on the field and application, and different statistical methods capture the important features of the underlying spatial process for each type of data. Areal data involve aggregated quantities for each areal unit within some relevant spatial partition of a given region, such as census tracts within a city, or counties within a state. In this thesis, we consider areal data.

2.3.1 CAR Model

Two very popular models for analyzing areal data are the simultaneously and conditionally autoregressive models (abbreviated SAR and CAR), originally developed by Whittle [46] and Besag [4], respectively. The SAR model is computationally convenient for use with likelihood methods, while the CAR model is computationally convenient for Gibbs sampling used in conjunction with Bayesian model fitting [8].

The CAR model is specified by assuming the conditional distribution of the vari- able at an areal unit (in our case, county), conditional on the neighboring counties, follow a specific normal distribution. This specification yields a multivariate normal joint distribution for all counties.

More specifically, let $Y(s_i)$ denote the random variable at location $s_i$ and $Y_{-s_i}$ the vector of all observations except $Y(s_i)$. For each location, we assume the conditional distribution is normal, with conditional mean and variance

$$E[Y(s_i) \mid Y_{-s_i}] = \rho \sum_{j=1}^{n} \frac{w_{ij}}{w_{i+}} Y(s_j), \qquad \mathrm{Var}[Y(s_i) \mid Y_{-s_i}] = \frac{\sigma^2}{w_{i+}}, \tag{2.10}$$

where $W = (w_{ij})$ denotes an adjacency matrix and the parameter $\rho$ controls the strength of the spatial association. The weight $w_{ij}$ is nonzero only if location $s_j$ is in the neighborhood set of $s_i$, and $w_{i+}$ is the total number of neighbors of county $i$. The parameter $\rho$ lies between 0 and 1: $\rho = 0$ corresponds to spatial independence, and $\rho = 1$ indicates strong spatial dependence across the state. The joint distribution can be determined from the set of full conditional distributions by using the Hammersley-Clifford theorem [4]. Let $D = \mathrm{diag}(w_{i+})$; then the joint distribution induced by the conditional distributions is

$$Y \sim N\left(0, \sigma^2 (D - \rho W)^{-1}\right). \tag{2.11}$$
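The link between the joint distribution (2.11) and its full conditionals can be checked numerically. The sketch below is only an illustration: the four-county chain graph and the values of `rho` and `sigma2` are made up, and it relies on the standard Gaussian fact that, for a zero-mean normal vector with precision matrix $Q$, the full conditional of component $i$ has variance $1/Q_{ii}$ and mean $-\sum_{j \neq i} Q_{ij} y_j / Q_{ii}$. With $Q = (D - \rho W)/\sigma^2$ this gives conditional mean $\rho \sum_j (w_{ij}/w_{i+}) y_j$ and conditional variance $\sigma^2 / w_{i+}$.

```python
# Illustrative 4-county chain graph (hypothetical; not the Ohio adjacency).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rho, sigma2 = 0.9, 2.0  # assumed values, chosen only for illustration


def car_precision(W, rho, sigma2):
    """Precision matrix Q = (D - rho * W) / sigma2 with D = diag(w_{i+})."""
    n = len(W)
    return [
        [((sum(W[i]) if i == j else 0.0) - rho * W[i][j]) / sigma2 for j in range(n)]
        for i in range(n)
    ]


def full_conditional(Q, y, i):
    """Mean and variance of Y_i given the rest, for a zero-mean Gaussian with precision Q."""
    var = 1.0 / Q[i][i]
    mean = -var * sum(Q[i][j] * y[j] for j in range(len(y)) if j != i)
    return mean, var


Q = car_precision(W, rho, sigma2)
y = [0.5, -1.0, 2.0, 0.3]
cond_mean, cond_var = full_conditional(Q, y, 1)

# Direct evaluation of the CAR conditional for county 1 (neighbors 0 and 2).
direct_mean = rho * (y[0] + y[2]) / 2
direct_var = sigma2 / 2
```

Here `cond_mean` and `direct_mean` agree, as do `cond_var` and `direct_var`, which is the sense in which the conditional specification and the joint precision matrix describe the same model.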

The Intrinsic Conditional Autoregressive (ICAR) model is the special case $\rho = 1$. The expected value of $Y(s_i) \mid Y_{-s_i}$ is then the average of the values at the neighbors of $s_i$, which makes this special case popular in applications. The joint distribution under the ICAR model is improper because the matrix $D - W$ is singular, so the precision matrix is not of full rank and cannot be inverted to give a covariance matrix [5]. Although this joint distribution is not a valid probability distribution, it can be used as a prior model, since it yields a valid posterior distribution provided a centering constraint, such as $\sum_{i=1}^{n} Y_i = 0$, is enforced [44].
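The impropriety of the ICAR model can be seen on a small example. The following sketch uses a made-up four-node graph (an assumption for illustration only). It shows that every row of $D - W$ sums to zero, so the vector of ones lies in the null space and the matrix is singular, and that the quadratic form $y^\top (D - W) y$ is unchanged when a constant is added to every component, which is why a centering constraint is needed to pin down the overall level.

```python
# Hypothetical adjacency matrix for four areal units.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def icar_precision(W):
    """Return D - W, where D is the diagonal matrix of neighbor counts."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]


def quad_form(Q, y):
    """Evaluate y' Q y."""
    n = len(y)
    return sum(y[i] * Q[i][j] * y[j] for i in range(n) for j in range(n))


def center(y):
    """Impose the sum-to-zero constraint by subtracting the mean."""
    m = sum(y) / len(y)
    return [v - m for v in y]


Q = icar_precision(W)
row_sums = [sum(row) for row in Q]      # all zero, so Q times the ones vector is 0
y = [1.0, -0.5, 2.0, 0.25]
shifted = [v + 10.0 for v in y]         # same spatial pattern, different overall level
qf_original = quad_form(Q, y)
qf_shifted = quad_form(Q, shifted)      # equal to qf_original up to rounding
y_centered = center(y)
```

Because the quadratic form equals the sum of squared differences between neighboring values, it only measures contrasts between neighbors and carries no information about the overall mean.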

2.3.2 Spatio-temporal

Spatio-temporal data are spatial data combined with temporal structure. Spatio-temporal data share the same important statistical characteristic: nearby observations tend to be more alike than those far apart, with respect to both space and time.

For the temporal dependence, we assume a vector autoregressive (VAR) model for years $t = 1, \dots, T$. A VAR model is a stochastic process model used to capture the linear interdependencies among multiple time series [38]. VAR models generalize the univariate autoregressive (AR) model by allowing for more than one evolving variable. Let $y_t$ be a $k$-vector whose $i$th element, $y_{i,t}$, is the observation at time $t$ of the $i$th variable. The VAR($p$) model is

$$y_t = c + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t, \qquad t = 1, \dots, T,$$

where the observation $i$ periods back, $y_{t-i}$, is called the $i$th lag of $y$, $c$ is a $k$-vector of constants, each $A_i$ is a time-invariant $k \times k$ matrix, and $\epsilon_t$ is a $k$-vector of error terms.

In this thesis, we consider a VAR(1) model for the temporal dependence:

$$y_t = c + A_1 y_{t-1} + \epsilon_t, \qquad t = 1, \dots, T.$$
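The VAR(1) recursion can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the function name `simulate_var1`, the coefficient values, and the use of independent $N(0, \mathrm{sd}^2)$ errors are all illustrative choices, whereas the spatial version used later replaces the independent errors with CAR-structured ones.

```python
import random


def simulate_var1(c, A, y0, T, sd=1.0, seed=0):
    """Simulate y_t = c + A y_{t-1} + eps_t for t = 1..T with independent N(0, sd^2) errors."""
    rng = random.Random(seed)
    k = len(c)
    path = [list(y0)]
    for _ in range(T):
        prev = path[-1]
        nxt = [
            c[i] + sum(A[i][j] * prev[j] for j in range(k)) + rng.gauss(0.0, sd)
            for i in range(k)
        ]
        path.append(nxt)
    return path


# A two-variable example with made-up coefficients.
path = simulate_var1(c=[0.0, 0.0], A=[[0.5, 0.1], [0.0, 0.3]], y0=[1.0, 1.0], T=10, seed=1)

# Setting sd=0 removes the noise, which makes the recursion easy to follow by hand.
noiseless = simulate_var1(c=[1.0, 0.0], A=[[0.5, 0.0], [0.1, 0.2]], y0=[0.0, 0.0], T=2, sd=0.0)
```

In the noiseless run the first step gives $y_1 = c$, and the second step adds $A_1 y_1$ to $c$, which shows how each value is built from the previous one.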

For the spatial dependence, we again assume a CAR model. For the first year, $Y_1$ follows a CAR model. For $t = 2, \dots, T$,

$$Y_t \sim N\left(\eta Y_{t-1}, \sigma^2 (D - W)^{-1}\right).$$

That is, the mean of the multivariate normal distribution at time $t$ is $\eta$ times the variable at the previous time period.

Chapter 3: Models

In this chapter, we describe a joint spatio-temporal model for bivariate counts of opioid associated deaths and treatment admissions in 88 counties from 2007 to 2016, combining ideas from several approaches introduced in the first two chapters. In Section 3.1, we introduce the common spatial factor model. In Section 3.2, we give the computational details within the Bayesian framework. We then cover Bayesian model selection methods in Section 3.3.

3.1 Models

Factor analysis is usually used to explore the underlying structure of multivariate data sets, and it is particularly useful when we are not interested in any existing variable but in underlying, meaningful common patterns [2]. Generally speaking, these underlying patterns are difficult or impossible to measure in practice. In this thesis, the overall goal is to estimate such a common factor, hypothesized to be responsible for the interrelations between the observed variables.

An early application of latent factor models to mapping was Yanai [47], who used a standard factor analysis to study the spatial and temporal structure of multiple cancer sites in Japan. Wall and Wang [44] then developed a common spatial factor model that incorporates spatial dependence to improve estimation in a geographical context. They also analyzed multivariate spatial data on cancer-specific mortality in Minnesota, USA, within both the frequentist and Bayesian frameworks.

Our model extends a generalized common spatial factor model [44]. In multivariate spatial data, there exist two types of correlation: one is the correlation between variables at the same location, and the other is the correlation of each variable across locations. We assume that both types of correlation are related to a common spatial factor, or latent variable. That is, the common spatial factor shares spatial trends across the models for each outcome. It is a shared underlying factor that captures spatial characteristics of a location that are associated with multiple outcomes [12].

The generalized spatial factor model framework is as follows. Let $Y_{ij}$ be the $j$th random variable ($j = 1, \dots, p$) observed at location $s_i$ ($i = 1, \dots, n$). We assume that, conditional on the fixed and random effects, the $Y_{ij}$ are independent and have a non-normal distribution from an exponential family with mean parameter $\theta_{ij}$ and a possibly separate variance parameter $\sigma_j^2$. The model can be written as

$$g(\theta_{ij}) = O_{ij} + \beta_j + \lambda_j f_i, \tag{3.1}$$

where $g(\cdot)$ denotes an appropriate link function, $O_{ij}$ is a known offset determined by the specific data, $f_i$ is the underlying common spatial factor at location $s_i$, $\beta_j$ is an unknown intercept, and the unknown slopes $\lambda_j$ are called 'factor loadings'. All spatial and between-outcome dependence is captured through the random effects.

In our application, we have bivariate counts of opioid associated deaths and treatment admissions for 88 counties from 2007 to 2016. We assume that, conditional on the fixed and random effects, the bivariate counts are independent Poisson random variables. For each county $i = 1, \dots, 88$ and year $t = 1, \dots, 10$, let $Y_{ti}^D$ be the count of deaths and $Y_{ti}^T$ be the count of treatment admissions. Our model assumes

$$\begin{aligned} Y_{ti}^D \mid \lambda_{ti}^D &\overset{ind}{\sim} \mathrm{Poisson}(\lambda_{ti}^D), \\ Y_{ti}^T \mid \lambda_{ti}^T &\overset{ind}{\sim} \mathrm{Poisson}(\lambda_{ti}^T), \end{aligned} \tag{3.2}$$

where the Poisson means $\lambda_{ti}^D$ and $\lambda_{ti}^T$ represent the product of the relative risk of death and of treatment, respectively, in county $i$ in year $t$ and the expected number of deaths and of treatment admissions in county $i$ in year $t$ [10].

We use the canonical log link function for the Poisson mean. That is,

$$\begin{aligned} \log(\lambda_{ti}^D) &= \log(P_{ti}^D) + X_{ti}^D \beta_t^D + \alpha_{ti}^D U_{ti} + \epsilon_{ti}^D, \\ \log(\lambda_{ti}^T) &= \log(P_{ti}^T) + X_{ti}^T \beta_t^T + \alpha_{ti}^T U_{ti} + \epsilon_{ti}^T, \end{aligned} \tag{3.3}$$

where the offsets $P_{ti}^D$ and $P_{ti}^T$ are the population counts of county $i$ during year $t$, $X_{ti}^D$ is the vector of explanatory variables with fixed effects $\beta_t^D$ for death, and $X_{ti}^T$ is the vector of explanatory variables with fixed effects $\beta_t^T$ for treatment, in county $i$ in year $t$.

Here, $U_{ti}$ is the generalized common spatial factor for our bivariate Poisson outcomes. We can interpret the spatial factor $U_{ti}$ as the "joint burden" related to both death and treatment that is unexplained by the selected covariates. The factor loadings for death and treatment are $\alpha_{ti}^D$ and $\alpha_{ti}^T$, respectively. The loadings explain how much influence each outcome has on the common spatial factor $U_{ti}$. The independent error terms, $\epsilon_{ti}^D$ and $\epsilon_{ti}^T$, capture independent deviations that are outcome and county specific.
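To make the role of each term in the log-linear model (3.3) concrete, the sketch below evaluates a Poisson mean on the natural scale, $\lambda = P \exp(X\beta + \alpha U + \epsilon)$. All numerical inputs (population, covariate value, coefficients, loading, and factor) are hypothetical values chosen for illustration, and the function name `poisson_mean` is not from the thesis code.

```python
import math


def poisson_mean(pop, x, beta, alpha, u, eps):
    """Poisson mean implied by the log link: pop * exp(x'beta + alpha * u + eps)."""
    linear = sum(xk * bk for xk, bk in zip(x, beta)) + alpha * u + eps
    return pop * math.exp(linear)


# Hypothetical county of 50,000 people with an intercept and one covariate.
inputs = dict(pop=50_000, x=[1.0, 0.20], beta=[-9.0, 1.5], alpha=1.2, eps=0.0)
lam_with_factor = poisson_mean(u=0.3, **inputs)
lam_baseline = poisson_mean(u=0.0, **inputs)

# The spatial factor acts multiplicatively on the mean through exp(alpha * u).
factor_effect = lam_with_factor / lam_baseline
```

The ratio `factor_effect` equals $\exp(\alpha U)$, so a larger loading amplifies the effect of the same latent burden on that outcome.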

3.2 Priors and Computation

In the Bayesian paradigm, we need to assign prior distributions to all unknown parameters in the model. We believe that, within each year, the spatial factor and the factor loadings are spatially autocorrelated, so we assume an ICAR model for these quantities within each year $t$. We also believe that the factor and loadings in year $t$ are associated with their values in the previous year. To capture this temporal dependence, we assume a vector autoregressive process of order 1, so that the mean in year $t$ is a function of the previous year's value.

More specifically, we assume the following model for the spatial factor $U_t$:

$$\begin{aligned} U_t &\sim N\left(0, \tau_u^2 (D - W)^{-1}\right) && \text{when } t = 2007, \\ U_t &\sim N\left(\eta U_{t-1}, \tau_u^2 (D - W)^{-1}\right) && \text{when } t = 2008, \dots, 2016, \end{aligned} \tag{3.4}$$

where we place a flat prior on $\eta$ over $(0, 1)$. This reflects the prior belief that the temporal dependence is positive, and the upper bound ensures that the time series process is stationary:

$$\eta \sim \mathrm{Uniform}(0, 1). \tag{3.5}$$

As discussed above, there are identifiability issues, and not all random quantities in equation (3.3) can be estimated without imposing some restrictions on the factor and loadings. To see this, observe that the latent variable $U_{ti}$ can be given any scale. That is, $\alpha_{ti} U_{ti} = (c\,\alpha_{ti}) \times \left(\tfrac{1}{c} U_{ti}\right)$ for any $c \neq 0$. In other words, both the factors and the loadings can be arbitrarily rescaled while the model stays the same [13]. To ensure our model is identifiable, we fix the loadings for one outcome to be constant across space and time. We use treatment as this reference outcome; that is, we assume $\alpha_{ti}^T = 1$ for all $t$ and $i$. The loadings for death are assumed to follow a mean-one ICAR model,

$$\alpha_t^D \sim N\left(1, \tau_\alpha^2 (D - W)^{-1}\right). \tag{3.6}$$

In this prior distribution, we constrain the mean of $\alpha_{ti}^D$ to be 1 for each year, i.e., $\frac{1}{88}\sum_{i=1}^{88} \alpha_{ti}^D = 1$ for $t = 2007, \dots, 2016$. This constraint removes the identifiability problem in the ICAR model. With this centering constraint, a loading equal to 1 in a county means that death has a similar effect on the latent factor as treatment in that county. The CAR priors were specified in OpenBUGS using the car.normal distribution [39].
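The scale ambiguity and the effect of the mean-one constraint can be demonstrated in a few lines. This is an illustrative sketch with made-up loadings and factor values; the helper names `rescale` and `to_mean_one` are hypothetical.

```python
def rescale(alpha, u, c):
    """Multiply every loading by c and divide every factor value by c (c != 0)."""
    return [c * a for a in alpha], [v / c for v in u]


def to_mean_one(alpha, u):
    """Pick the rescaling that makes the loadings average to one."""
    c = len(alpha) / sum(alpha)
    return rescale(alpha, u, c)


alpha = [0.5, 1.5, 4.0]   # hypothetical loadings for three counties
u = [2.0, -1.0, 0.5]      # hypothetical factor values

alpha_new, u_new = to_mean_one(alpha, u)
products_before = [a * v for a, v in zip(alpha, u)]
products_after = [a * v for a, v in zip(alpha_new, u_new)]
mean_loading = sum(alpha_new) / len(alpha_new)
```

The products $\alpha_{ti} U_{ti}$ that enter the likelihood are identical before and after rescaling, so the data cannot distinguish the two versions. Fixing the mean of the loadings selects one representative from this family.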

The regression coefficients in the vectors $\beta^D$ and $\beta^T$ are assigned independent, non-informative prior distributions that are uniform on the real line. The error terms $\epsilon_{ti}^D$ and $\epsilon_{ti}^T$ are assumed to be independent and normally distributed with mean zero and variances $\sigma_D^2$ and $\sigma_T^2$, respectively:

$$\epsilon_{ti}^D \overset{iid}{\sim} N(0, \sigma_D^2), \qquad \epsilon_{ti}^T \overset{iid}{\sim} N(0, \sigma_T^2). \tag{3.7}$$

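The variance parameters $\tau_u^2$, $\tau_\alpha^2$, $\sigma_D^2$ and $\sigma_T^2$ have inverse gamma prior distributions with shape and scale parameters of 0.5. The posterior distribution of all unobserved quantities is then

$$\begin{aligned} \pi(U, \alpha^D, \epsilon^D, \epsilon^T, \beta^D, \beta^T, \tau_u^2, \tau_\alpha^2, \sigma_D^2, \sigma_T^2, \eta \mid Y^D, Y^T) \propto{} & L(\lambda^D, \lambda^T \mid Y^D, Y^T)\, f(U \mid \tau_u^2, \eta)\, f(\alpha^D \mid \tau_\alpha^2)\, f(\epsilon^D \mid \sigma_D^2)\, f(\epsilon^T \mid \sigma_T^2) \\ & \times \pi(\beta^D)\, \pi(\beta^T)\, \pi(\tau_u^2)\, \pi(\tau_\alpha^2)\, \pi(\sigma_D^2)\, \pi(\sigma_T^2)\, \pi(\eta), \end{aligned} \tag{3.8}$$

where $L(\cdot)$ denotes the likelihood function, $f(\cdot)$ denotes the joint distributions of the random effects, and $\pi(\cdot)$ denotes the prior distributions.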
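As a worked example of how one factor of the posterior simplifies, consider the error variance $\sigma_D^2$. The derivation below is a sketch that assumes the inverse gamma prior is parameterized by shape $a = 0.5$ and scale $b = 0.5$, with density proportional to $(\sigma^2)^{-(a+1)} \exp(-b/\sigma^2)$, and that there are $88 \times 10 = 880$ death error terms. Under these assumptions the full conditional is again inverse gamma, so it can be sampled directly.

```latex
\begin{aligned}
\pi(\sigma_D^2 \mid \epsilon^D)
  &\propto \prod_{t=1}^{10} \prod_{i=1}^{88}
     (\sigma_D^2)^{-1/2}
     \exp\!\left\{ -\frac{(\epsilon_{ti}^D)^2}{2\sigma_D^2} \right\}
     \times (\sigma_D^2)^{-(0.5 + 1)} \exp\!\left\{ -\frac{0.5}{\sigma_D^2} \right\} \\
  &= (\sigma_D^2)^{-(0.5 + 440 + 1)}
     \exp\!\left\{ -\frac{1}{\sigma_D^2}
       \left( 0.5 + \frac{1}{2} \sum_{t=1}^{10} \sum_{i=1}^{88} (\epsilon_{ti}^D)^2 \right) \right\}, \\
\sigma_D^2 \mid \epsilon^D
  &\sim \text{Inverse-Gamma}\!\left( 0.5 + 440,\;
       0.5 + \frac{1}{2} \sum_{t=1}^{10} \sum_{i=1}^{88} (\epsilon_{ti}^D)^2 \right).
\end{aligned}
```

The same argument applies to $\sigma_T^2$ with $\epsilon^T$ in place of $\epsilon^D$.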

A Metropolis-within-Gibbs algorithm is used to explore the posterior distribution. We use Gibbs updates to simulate the variance parameters, $\tau_D^2$ and $\tau_T^2$, and Metropolis-Hastings updates to sample the other quantities. Adaptive MCMC is used to ensure adequate mixing within all Metropolis-Hastings updates [33]. The results presented in the next chapter were obtained from a run of 10,000 iterations with a burn-in of 5,000 iterations. We chose 0.1 as the initial value of the variance parameters $\tau_u^2$, $\tau_\alpha^2$, $\sigma_D^2$ and $\sigma_T^2$, set $\beta = [-5, -3, 0.01, 0.01]^\top$ based on the coefficients from a frequentist Poisson regression fit, and started all remaining parameters at 0. The chain output was thinned, with only every tenth draw being stored. In addition, we checked convergence with trace plots; randomly selected trace plots are shown in Appendix B. The full MCMC algorithm was implemented with the 'R2OpenBUGS' package in R, using a single core of a 128GB node on the Wake Forest High Performance Computing DEAC Cluster. It took approximately 1 hour to complete, and the code is shown in Appendix C. We use the posterior mean as our point estimate, following the background in Section 2.1.
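The post-processing of a chain into a posterior mean can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes that thinning is applied to the draws remaining after burn-in, and the placeholder chain simply stands in for 10,000 stored iterations of a single parameter.

```python
def posterior_mean(chain, burn_in=5_000, thin=10):
    """Posterior mean from the draws kept after discarding burn-in and thinning."""
    kept = chain[burn_in::thin]
    return sum(kept) / len(kept), len(kept)


# Placeholder chain: the first half mimics burn-in, the second half the stationary part.
chain = [0.0] * 5_000 + [2.0] * 5_000
estimate, n_kept = posterior_mean(chain)
```

With 10,000 iterations, a burn-in of 5,000, and every tenth draw retained, 500 draws contribute to each posterior summary under this assumption.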

3.3 Model Selection

In classical approaches, the Akaike information criterion (AIC) [1] and the Bayesian information criterion (BIC) [36] are commonly used for model selection. The AIC and BIC are easy to compute but require specification of the number of parameters in the model. In a Bayesian hierarchical model, however, the number of parameters is not clearly defined [40], and the effective degrees of freedom can be much smaller than the nominal number of parameters [20]. Therefore, we conducted model checking and comparison via the Deviance Information Criterion (DIC), which extends the AIC [40]. Spiegelhalter et al. defined DIC as a classical estimate of fit plus twice the effective number of parameters, i.e.,

$$\mathrm{DIC} = D(\bar{\theta}) + 2 p_D = \bar{D} + p_D, \tag{3.9}$$

where $\bar{D}$ is the average of $D(\theta)$ over all MCMC samples of $\theta$, and $D(\theta) = -2 \log p(y \mid \theta) + C$ is called the 'Bayesian deviance'. Here $p(y \mid \theta)$ is the likelihood function of the observed data given the parameter $\theta$, and $C$ is a constant that cancels out in all calculations that compare different models and therefore does not need to be known. The 'effective' number of parameters is defined as $p_D = \bar{D} - D(\bar{\theta})$, where $\bar{\theta}$ is the average of the MCMC samples of $\theta$. That is, $p_D$ can be considered as the 'mean deviance minus the deviance of the means'.

According to the definition of DIC, we need only the MCMC samples and the likelihood function of the observed data to calculate DIC. Similar to AIC and BIC, the value of DIC by itself has no well-defined meaning. In model selection, however, a smaller DIC across models implies a better fit.
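The DIC calculation can be sketched for a toy Poisson model. The example below is an illustration under stated assumptions: the parameter $\theta$ is taken to be the vector of Poisson means, the constant $C$ is set to zero, and the counts and "MCMC draws" are made-up numbers rather than output from the model in this thesis.

```python
import math


def poisson_deviance(y, lam):
    """D(theta) = -2 log p(y | theta) for independent Poisson counts, with C = 0."""
    loglik = sum(yi * math.log(li) - li - math.lgamma(yi + 1) for yi, li in zip(y, lam))
    return -2.0 * loglik


def dic(y, lam_draws):
    """DIC from a list of draws, each draw being a list of Poisson means."""
    n_draws = len(lam_draws)
    d_bar = sum(poisson_deviance(y, lam) for lam in lam_draws) / n_draws
    lam_bar = [sum(col) / n_draws for col in zip(*lam_draws)]   # posterior mean of theta
    d_at_mean = poisson_deviance(y, lam_bar)
    p_d = d_bar - d_at_mean                                     # effective number of parameters
    return {"D_bar": d_bar, "D_at_mean": d_at_mean, "p_D": p_d, "DIC": d_bar + p_d}


y = [3, 0, 5]                                                   # made-up counts
draws = [[2.5, 0.5, 4.0], [3.5, 0.8, 6.0], [3.0, 0.4, 5.5]]     # made-up posterior draws
result = dic(y, draws)
degenerate = dic(y, [draws[0], draws[0]])                       # no posterior spread
```

When all draws coincide, the mean deviance equals the deviance at the mean and $p_D$ is zero. Posterior spread makes $p_D$ positive, which is how DIC penalizes model complexity.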

Chapter 4: Conclusions

In this chapter, we summarize results based on posterior samples from 10,000 iterations, keeping every 10th sample because of high autocorrelation. First, we assess the fit of the model using the variance ratio in Section 4.1. We discuss the effects on death rates and treatment rates in Sections 4.2 and 4.3, respectively. We then move to the spatial factors in Section 4.4 and the scaled loadings in Section 4.5. Finally, in Section 4.6, we examine the independent error terms to check that the model has captured the spatial dependence.

4.1 Model Comparison

We compare our joint spatio-temporal model discussed in Chapter 3 with a model with no time dependence, i.e., one with a conditional autoregressive distribution on the spatial factor $U_{ti}$ only. In Bayesian statistics, the deviance information criterion (DIC) is commonly used for model selection, as mentioned in Section 3.3. In our model, however, we assigned zero-mean normal distributions to the independent random errors $\epsilon^D$ and $\epsilon^T$, which means they are updated through the MCMC algorithm as well. The deviance in our model then measures the patterns left over after accounting for the covariates, spatial factor, and error terms. Because the error terms in both models can absorb the leftover deviance, the two models cannot be compared directly by DIC. Moreover, our main goal in this thesis is to estimate the spatial factor, so DIC is not necessarily the best criterion to consider.

We instead use a measure called the "variance ratio" to compare model A and model B in the Bayesian framework. It is a commonly used measure of the relative efficiency of model A compared to model B in terms of estimating the factor:

$$\text{Variance Ratio} = \frac{\widehat{\mathrm{Var}}(U_{ti}^A)}{\widehat{\mathrm{Var}}(U_{ti}^B)}, \tag{4.1}$$

where $U_{ti}^A$ and $U_{ti}^B$ denote the spatial factor at location $i$ in time $t$ under models A and B, and $\widehat{\mathrm{Var}}(U_{ti})$ is the estimated posterior variance, which captures the posterior uncertainty in the common spatial factor under each model. The variance ratio thus compares the variance of the common spatial factor under model A with that under model B.

From the definition of the variance ratio, if models A and B perform equally well in estimating the spatial factor, the ratio should be 1. If model B has the larger estimated variance of the spatial factor, the ratio will be smaller than 1; otherwise, the ratio will be larger than 1. In our research, model A is our joint spatio-temporal model discussed in Chapter 3, while model B places only a conditional autoregressive distribution on the spatial factor $U_{ti}$, i.e.,

$$U_t \sim N\left(0, \tau_u^2 (D - W)^{-1}\right). \tag{4.2}$$

Under model B, the spatial factor $U_{ti}$ is assumed to have no relationship with its past value $U_{(t-1)i}$ for $t = 2008, \dots, 2016$ [22]. The simpler model B is therefore a spatial-only model for 88 counties over 10 years.
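Given stored posterior draws of one spatial factor value under each model, the variance ratio is straightforward to compute. The sketch below uses made-up draws for a single county-year; the sample variance from Python's `statistics` module stands in for the estimated posterior variance, which is an assumption about how the estimate is formed.

```python
from statistics import variance


def variance_ratio(draws_a, draws_b):
    """Estimated posterior variance of U_ti under model A divided by that under model B."""
    return variance(draws_a) / variance(draws_b)


# Made-up posterior draws of one U_ti under the two models.
draws_a = [0.9, 1.1, 1.0, 1.2, 0.8]   # spatio-temporal model (A)
draws_b = [0.8, 1.2, 1.0, 1.4, 0.6]   # spatial-only model (B), twice the spread
ratio = variance_ratio(draws_a, draws_b)
```

In this toy case the draws under model B are twice as dispersed around the same mean, so the ratio is 0.25, indicating that model A estimates the factor more precisely.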

Figure 4.1 shows the estimated variance ratio, where the midpoint of the color scale is 1, blue represents a ratio smaller than 1, and red a ratio larger than 1. From Figure 4.1, the average ratio is around 0.5, which means our model reduces the variance of the spatial factor by about half. This results from the improvement in model fit due to the introduced temporal structure. However, we do see larger variance clustered around the Cincinnati Metropolitan Area in southwestern Ohio from 2015. We discuss the details in Section 5.2.

Figure 4.1: Estimated variance ratio

The other noticeable feature of Figure 4.1 is that the average estimated variance ratio in 2016 is around 1, which means our model offers little improvement in the variance of the spatial factor for that year. In the Bayesian model, the spatial factor in a given year is informed by both past and future values. However, deaths and treatment admissions data for 2017 are not included in our data, so the spatial factor in 2016 can only be estimated from the observations in 2016 and 2015. This lack of information is the main reason the variance ratio is around 1 in 2016.

In all, our joint spatio-temporal model can be thought of as being a better fit compared to the joint spatial-only model in terms of variance ratio, the measurement we defined above.

4.2 Effects on Death Rates

Figure 4.2: Posterior mean rate for death and treatment

The posterior mean estimates of the rate ratios for death are displayed in Figure 4.2(a), and we see a clear upward trend in the posterior mean estimates of $\beta_0^D$ and $\beta_1^D$ over the years. Table 4.1 shows that the posterior mean estimate of the intercept term increases from -10.81 in 2007 to -8.78 in 2016, indicating that the overall posterior mean estimates of the log mortality, $\log(\hat{\lambda}_{ti}^D)$, increase. Recall that $\lambda_{ti}^D$ is the mean death count in county $i$ at time $t$; the intercept can then be interpreted as the statewide level when the covariate is zero, meaning there is no adolescent poverty. This is reasonable because opioid related deaths have increased drastically in Ohio in recent years.

Each 1% increase in the proportion of adolescent poverty is associated with a 0.0003% multiplicative change in mortality in 2007. By 2016, each 1% increase in the proportion of adolescent poverty is associated with a 0.0015% multiplicative change in mortality. These results indicate that richer areas suffer more from opioid related death, which is consistent with what we see, because death is worse in the Cincinnati area. The trend shows that the poorer areas are getting worse over time. This is also consistent with our belief that the stronger, deadlier drugs started in the cities and have spread to the more rural areas.

4.3 Effects on Treatment Rates

Figure 4.2(b) shows the posterior mean estimates of the intercept and slope for treatment admissions from 2007 to 2016, and Table 4.2 shows the posterior standard deviations of the intercepts β1, β3 and slopes β2, β4 for death rates and treatment rates, respectively. In general, we observe a sharp decrease from 2007 to 2010 and a substantial increase after 2011. The lowest estimates appear in 2010 and 2011, when the statewide average treatment admissions reach their minimum. We see about 9 times more treatment admissions for every 1% increase in the proportion of adolescent poverty in 2011, compared with more than 100 times more treatment admissions for every 1% increase in the proportion of adolescent poverty in 2007 and 2016. In 2010 and 2011, the government began putting resources into battling the opioid epidemic, especially into creating pathways to treatment and recovery. The Governor authorized the expanded use of medication-assisted treatment in combination with traditional counseling in response to the opiate crisis. All these resources resulted in an increase in treatment rates following 2011. Since most of the resources were added to rural, poorer areas, the difference between rich and poor counties was also small in 2010 and 2011, but the difference grew following 2011. In 2013, Ohio had a Medicaid expansion, making treatment accessible to more people in poverty-stricken areas, and drug users intercepted by law enforcement were directed to treatment through the Ohio Department of Mental Health and the Ohio State Highway Patrol. This may also explain the sharp increase in treatment rates over the last few years.

| Year | β1 | β2 | β3 | β4 | e^β2 | e^β4 |
|------|--------|-------|-------|-------|--------|--------|
| 2007 | -10.81 | -8.25 | 3.84 | 4.71 | 0.0003 | 111.51 |
| 2008 | -10.03 | -7.87 | 0.57 | 3.91 | 0.0004 | 49.95 |
| 2009 | -9.79 | -7.54 | 0.49 | 3.05 | 0.0005 | 21.06 |
| 2010 | -9.43 | -7.28 | -0.45 | 2.79 | 0.0007 | 16.27 |
| 2011 | -9.68 | -6.95 | 1.11 | 2.199 | 0.0010 | 8.90 |
| 2012 | -9.75 | -6.80 | 1.67 | 2.29 | 0.0011 | 9.88 |
| 2013 | -9.47 | -6.73 | 1.52 | 2.83 | 0.0012 | 16.90 |
| 2014 | -9.27 | -6.71 | 1.93 | 3.66 | 0.0012 | 39.01 |
| 2015 | -8.98 | -6.53 | 1.66 | 4.34 | 0.0015 | 77.01 |
| 2016 | -8.78 | -6.53 | 1.90 | 5.09 | 0.0015 | 162.74 |

Table 4.1: Posterior mean of intercepts β1, β3 and slopes β2, β4 with marginal coefficients e^β2, e^β4 for death rates and treatment rates respectively.

| Year | β1 | β2 | β3 | β4 |
|------|------|------|------|------|
| 2007 | 0.25 | 0.12 | 1.30 | 0.63 |
| 2008 | 0.28 | 0.15 | 1.46 | 0.79 |
| 2009 | 0.27 | 0.15 | 1.24 | 0.68 |
| 2010 | 0.27 | 0.15 | 1.18 | 0.66 |
| 2011 | 0.27 | 0.15 | 1.12 | 0.63 |
| 2012 | 0.26 | 0.14 | 1.09 | 0.61 |
| 2013 | 0.25 | 0.14 | 1.08 | 0.63 |
| 2014 | 0.24 | 0.13 | 1.08 | 0.60 |
| 2015 | 0.22 | 0.13 | 1.06 | 0.64 |
| 2016 | 0.21 | 0.14 | 1.05 | 0.71 |

Table 4.2: Posterior standard deviation of intercepts β1, β3 and slopes β2, β4 for death rates and treatment rates respectively.

4.4 Spatial Factors

The diagram of our joint spatio-temporal model is shown in Figure 4.3. The main goal of this thesis is to estimate the factor after accounting for the percentage of adolescent poverty and to study how the latent burden of opioid misuse varies spatially and over time.

Figure 4.3: Diagram of the joint spatio-temporal model

The posterior mean estimates of the common spatial factor, $\hat{U}_{ti}$, are shown in Figure 4.4. The common spatial factor reflects the common geographical trend as well as the joint association between treatment and death after accounting for economic status. Therefore, the estimated spatial factor can be interpreted as the "joint burden" for a county, reflecting what the death and treatment outcomes related to opioid use have in common. The posterior mean of the spatial factor thus represents an estimate of the underlying unobserved burden of opioid misuse. In addition, high values of the spatial factor reflect a higher burden of opioid misuse, since the spatial factor is positively associated with both outcomes; the factor is subject to the constraint $\sum_{i=1}^{88} U_{ti} = 0$. Since the constraint is imposed separately for each year, the spatial factor is centered within each year. We can therefore only compare the spatial factor across different counties in the same year; it is not meaningful to compare the estimated spatial factor for the same county across different years.

In general, one can notice clear within-county heterogeneity over the ten years in Figure 4.4. The map indicates a zone of high-burden areas in southern Ohio, especially in the southeastern portion. These findings are also supported by the maps of death rates and treatment rates, where southern Ohio suffers from high death rates and excess treatment rates are found in the southeastern parts of Ohio over the ten years. Conversely, we see lower values of the spatial factor in northwestern and eastern parts of the state over the years.

Figure 4.4: Spatial Factors

Since the common spatial factor summarizes both death and treatment information after accounting for adolescent poverty, differences in poverty between counties have already been accounted for. Thus, beyond what is explained by poverty, counties in southeastern Ohio still carry a high burden of opioid misuse, but the burden has become less extreme relative to the rest of Ohio in the latest 5 years. That may be because efforts were made to direct resources to that area. In 2011, the establishment of the Governor's Cabinet Opiate Action Team (GCOAT) was announced to fight opiate abuse in Ohio. GCOAT was created to implement a multifaceted strategy, for example to promote the responsible use of opioids and reduce the supply of opioids. In 2013, the new Southern Ohio Addiction Treatment Center was established in Jackson County, addressing a gap in local services for individuals. It could also be that the rest of the state got worse in the latest 5 years, so that southeastern Ohio no longer shows up as much worse. Taken together, these possible reasons suggest that effective measures were put in place in southeastern Ohio.

4.5 Scaled Loadings

In Figure 4.5, we display the posterior mean estimates of the scaled loadings, $\alpha_{ti}^D$. The scaled loadings act as coefficients that describe how the common spatial factor influences each of the bivariate outcomes. Thus $\alpha_{ti}^D$ and $\alpha_{ti}^T$ in Equation 3.3 represent the coefficients for death and treatment, respectively. As discussed in Section 3.2, we fixed the loadings for treatment as constant and compared them with $\alpha_{ti}^D$.

After centering and rescaling $\alpha_{ti}^D$, a loading equal to zero means that the coefficients of the spatial factor are the same for both outcomes. That is, death and treatment contribute about equally to the common spatial factor. For positive loadings, death is more influential than treatment in that county. If the loading is less than zero, treatment is the stronger contributor to the spatial factor. From Figure 4.5, we note that death loads positively on the common spatial factor in western Ohio, while treatment is more extreme in the eastern part of the state in 2007 and 2008. However, the loadings in mid-eastern Ohio are always positive after 2009. The scaled loadings in southeastern Ohio are blue in all ten years, which means the high treatment rates play an important role in estimating the common spatial factor there. This is consistent with the fact that many treatment centers were added to southern Ohio after 2010. In 2015, a large influx of fentanyl suddenly appeared in the Greater Cincinnati Metro Area, which explains why deaths have skyrocketed in southwestern Ohio. This is also likely why we see red in this region for the loadings in recent years.

Figure 4.5: Spatial Loadings

4.6 Independent Error Term

Figure 4.6: Error Term D

Figures 4.6 and 4.7 show the posterior mean estimates of the independent error terms for death and treatment, $\hat{\epsilon}_{ti}^D$ and $\hat{\epsilon}_{ti}^T$, respectively. These terms can also be called independent latent factors or nugget effects [27]. We can interpret them as the deviations of the observed values from what we expect given the covariate and spatial factor in the model. If the estimated posterior means of the error terms show no spatial pattern, then the rest of the model is capturing the spatial trends. If we observe obvious spatial dependence in the estimates, our model is failing to capture the spatial structure, because there is still spatial dependence left over. When the magnitude of the independent error terms is small, the difference between the data and what the covariate and spatial structure predict is small, which is further evidence that the model fits well.

Figure 4.7: Error Term T

It is apparent from Figures 4.6 and 4.7 that there is no spatial dependence in the error terms. These findings indicate that there is no remaining spatial structure in the data that is not explained by the common spatial factor and covariate. Comparing Figure 4.6 with Figure 4.7, we see that the magnitude of the independent errors for death is larger than that for treatment. This indicates that the model fits the treatment data better than the death data, because deaths are higher than expected based on the rest of the model.

Chapter 5: Discussion

We constructed a joint spatio-temporal model that combines two sources of information, opioid associated deaths and treatment admissions, to estimate the underlying burden of the opioid epidemic in Ohio. There are several potential directions in which this research could be extended, based on our data or current developments. We discuss the censored observations and the covariates in the following sections.

5.1 Censored Observations

As mentioned above, treatment admission counts from 1 to 10 are censored because of state policy. Among the 880 observations for 88 counties over 10 years, 31 observations are censored for both adolescents and adults, 510 are censored for adolescents only, and 10 for adults only. In our research, we imputed 5 for each censored count. Specifically, we imputed 5 for observations censored in either adolescent treatment or adult treatment, and 10 in total for observations censored in both treatment under 21 and treatment over 21.
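The imputation rule described above can be written as a short helper. This is a hypothetical sketch: the function name and the convention of marking a censored component with `None` are assumptions made for illustration, not part of the original code.

```python
def imputed_total(adolescent, adult, fill=5):
    """Total treatment count, with each censored component (None) replaced by `fill`."""
    return sum(fill if part is None else part for part in (adolescent, adult))


only_adolescent_censored = imputed_total(None, 40)   # adult count of 40 is observed
both_censored = imputed_total(None, None)
nothing_censored = imputed_total(12, 30)
```

A county-year with both components censored therefore receives a total of 10, while a county-year with one censored component receives the observed component plus 5.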

However, following censored data analysis, we can address this situation by incorporating the information into the likelihood. For counties with only the adolescent count under 10, the true total lies between the adult count plus 1 and the adult count plus 9. For counties with only the adult count under 10, the true total treatment count lies between the adolescent count plus 1 and the adolescent count plus 9. In fact, whenever only the adult count is censored, the adolescent count is 0 in our data, so in that case the true total treatment count lies between 1 and 9. Similarly, for counties with both adolescent and adult counts censored, the true total must lie between 2 and 18. To transfer this information to the likelihood, we can use an interval for the censored count.

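Then, our likelihood function will be

$$\begin{aligned} L(\lambda^T, \lambda^D \mid Y^T, Y^D) = \prod_{t=2007}^{2016} \Bigg\{ & \prod_{i=1}^{88} f(Y_{ti}^D \mid \lambda_{ti}^D) \times \prod_{i=1}^{88} \left[ f(Y_{ti}^T \mid \lambda_{ti}^T) \right]^{I(c_{ti}=0)} \left[ F(Y_{ti}^T + 9 \mid \lambda_{ti}^T) - F(Y_{ti}^T \mid \lambda_{ti}^T) \right]^{I(c_{ti}=1)} \\ & \times \left[ F(9 \mid \lambda_{ti}^T) - F(0 \mid \lambda_{ti}^T) \right]^{I(c_{ti}=2)} \left[ F(18 \mid \lambda_{ti}^T) - F(1 \mid \lambda_{ti}^T) \right]^{I(c_{ti}=3)} \Bigg\}, \end{aligned} \tag{5.1}$$

where $f(\cdot \mid \lambda_{ti}^T)$ and $F(\cdot \mid \lambda_{ti}^T)$ are the probability mass function and cumulative distribution function, respectively, of a Poisson random variable with mean $\lambda_{ti}^T$, $f(\cdot \mid \lambda_{ti}^D)$ is the probability mass function of a Poisson random variable with mean $\lambda_{ti}^D$, and $I(\cdot)$ is an indicator function. When $c_{ti} = 1$, $Y_{ti}^T$ denotes the observed adult count. The censoring indicator $c_{ti}$ is defined as

$$c_{ti} = \begin{cases} 0, & \text{total count observed} \\ 1, & \text{adolescent count censored} \\ 2, & \text{adult count censored} \\ 3, & \text{both adolescent and adult count censored} \end{cases} \tag{5.2}$$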

We ordered the counties according to $c_{ti}$ for a fixed year before passing the data to the model, since OpenBUGS has a limited set of built-in functions. However, the censoring indicator for county $i$ is not the same in every year in the actual computation. We hope to learn more about the OpenBUGS language and find a way to tackle this computational issue to make our model more accurate.
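Outside OpenBUGS, the interval-censored treatment contribution to the likelihood can be evaluated directly for each county-year, with no need to order the counties by censoring type. The following is a minimal sketch in plain Python, not the thesis implementation: the function names are hypothetical, the Poisson distribution functions are computed from first principles, and the censoring codes follow the definition of $c_{ti}$ in (5.2).

```python
import math


def pois_pmf(k, lam):
    """Poisson probability mass function, computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def pois_cdf(k, lam):
    """Poisson cumulative distribution function P(Y <= k)."""
    return sum(pois_pmf(j, lam) for j in range(k + 1))


def treatment_loglik(y_obs, lam, c):
    """Log-likelihood contribution of one county-year for treatment admissions.

    y_obs is the observed (uncensored) part of the count; it is ignored when c is 2 or 3.
    """
    if c == 0:      # total observed
        p = pois_pmf(y_obs, lam)
    elif c == 1:    # adolescent part censored: total in [y_obs + 1, y_obs + 9]
        p = pois_cdf(y_obs + 9, lam) - pois_cdf(y_obs, lam)
    elif c == 2:    # adult part censored, adolescent count is 0: total in [1, 9]
        p = pois_cdf(9, lam) - pois_cdf(0, lam)
    elif c == 3:    # both parts censored: total in [2, 18]
        p = pois_cdf(18, lam) - pois_cdf(1, lam)
    else:
        raise ValueError("c must be 0, 1, 2 or 3")
    return math.log(p)


# Example with made-up values: observed adult count 20, adolescent count censored.
ll_censored = treatment_loglik(y_obs=20, lam=25.0, c=1)
```

Each censored case replaces the probability of a single count with the probability that the total falls in the interval implied by the censoring rule, so every observation contributes according to what is actually known about it.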

5.2 Covariates

In our research, we consider only one covariate, since it is difficult to find data covering all ten years. Most social and environmental characteristics are collected only for the major counties with large populations from 2007 to 2016. Imputing the missing data for small counties based on the existing values for large counties would bias the data. In reality, however, death and treatment cannot be related to only one explanatory variable. In the future, we can extend our model by adding more covariates, such as covariates that do not change over time: for instance, an indicator of rural versus urban, or an indicator of a county having a healthcare shortage.

As discussed above, our model comparison criterion did not work well for the Cincinnati Metropolitan Area from 2013. We could include a covariate indicating the counties that are considered part of the Cincinnati Metropolitan Area in Southwestern Ohio, as this is one of the hardest hit regions in the state. There were cases of people purchasing prescribed fentanyl on the streets in the Greater Cincinnati Metropolitan Area in 2015. Since deaths due to fentanyl jumped suddenly in 2015, the temporal structure is not well defined in this area. If we include an indicator of the Cincinnati Metropolitan Area, the spatial factor would capture the underlying burden more accurately.
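A time-invariant indicator such as this one has to be repeated for every year so that it lines up with the stacked county-by-year ordering used in our model code (all counties for the first year, then all counties for the second year, and so on). The following is a minimal sketch with made-up county labels and a made-up membership list; the names `county`, `metro`, and `metro_stacked` are hypothetical.

```r
# Hypothetical example: 4 counties and 3 years, stacked year by year,
# so county i in year t sits at position n_county * (t - 1) + i.
n_county <- 4
n_year   <- 3
county   <- c("A", "B", "C", "D")   # made-up county labels
metro    <- c("B", "C")             # made-up metropolitan membership

# One indicator value per county, then repeated once per year
metro_one_year <- as.integer(county %in% metro)
metro_stacked  <- rep(metro_one_year, times = n_year)
metro_stacked
```

The stacked vector could then be supplied to the model as an additional covariate in the same way as the existing covariate x.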

Overall, we have demonstrated the utility of using a spatio-temporal model to synthesize multiple outcomes related to the opioid epidemic and of estimating the latent factor to track the underlying county-level burden. We hope this analysis provides policy makers with relative comparisons as they reallocate finite resources to make the biggest possible impact, as well as some insight into regions that should be considered for additional resources. In addition, this modeling approach provides a foundation for modeling the epidemic. It can easily be extended to incorporate additional outcomes as data become available from the state surveillance systems in the future. It can also be applied to additional states and to other diseases or disorders for which similar surveillance measures are collected. In conclusion, we hope this approach provides a coherent, model-based method that policy makers can use to make evidence-based decisions by bringing various pieces of information together.


Appendix A: Description of covariates

| Variable | Description |
|---|---|
| Median Age | Median age of residents in the county |
| Number of Interstates | Number of major interstates that pass through the county |
| Health Professional Shortage Area (HPSA) | Indicator of a shortage of mental health facilities |
| Aged 18-64 on Disability | Percent of population aged 18 to 64 on disability |
| Cincinnati Metropolitan | Indicator of belonging to the Cincinnati Metropolis |
| Single Female Households | Percent of single female households |
| High Intensity Drug Trafficking Area | Indicator of belonging to this DEA classification |
| Food Stamps | Percent of county on food stamps |
| Percent White | Percent of population classified white by Census |
| Percent Hispanic | Percent of population classified Hispanic by Census |
| Unemployment Rate | Percent of population receiving unemployment benefits |
| Median Household Income | Median total income of all households |
| SSI | Percent receiving Supplemental Security Income |
| Health Insurance | Percent of population with health insurance |
| Public Insurance | Percent of population with public health insurance |
| Below Poverty | Percent of county with incomes below poverty line |
| Bachelors Degree | Percent with at least a bachelor's degree |
| Percent Rural | Percent of population living in a rural area |
| Grandparents | Percent of households with minors being raised by grandparents |

Table A.1: Covariates

Appendix B: Trace Plots

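B.0.1 R code

set.seed(4222019)
# Upper bound for the sampled parameter indices, as in the original run
# (dim(out3$sims.array)[3] would give the total number of monitored parameters)
n<-length(out3$sims.array)/(out3$n.iter-out3$n.burnin)
# Randomly select five parameters and draw their trace plots
index<-sample(n,5)
for (i in 1:5){
  matplot(out3$sims.array[,,index[i]],type = "l",
          ylab = "Random Select Parameter")
}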

B.0.2 Trace Plots

Figure B.1: Trace Plot

Appendix C: Relevant R code

This appendix contains the primary code for our model.

C.0.1 Model

library(R2OpenBUGS)

############################################################
###########            Load data               ############
############################################################
TDbyyear<-read.csv(file="CorrectData.csv",header=T)
summary(TDbyyear)

W<-read.csv(file = "OhioAdjacency.csv",header = FALSE)
ohio<-read.csv("treatment_death_data.csv",header=T)

############################################################
###########          Set up for BUGS           ############
############################################################
n<-nrow(W)
TT = length(unique(TDbyyear$Year))
num<-colSums(W)
adj<-NULL

for(j in 1:n){
  adj<-c(adj,which(W[j,]==1))
}
adj<-as.vector(adj)
num<-as.vector(num)
weights<-1+0*adj

sink("mod.txt")
cat("
model {
  for (t in 2007:2007){
    for (i in 1:88){
      y1[i] ~ dpois(lambda1[i])
      y2[i] ~ dpois(lambda2[i])
      log(lambda1[i])<-log(z[i])+ beta1[t-2006]+ beta3[t-2006]*x[i] +
        (1+alpha[i])*U[i]+ V1[i]
      log(lambda2[i])<-log(z[i])+ beta2[t-2006]+ beta4[t-2006]*x[i] +
        U[i]+ V2[i]
      V1[i] ~ dnorm(0,tau.V1)
      V2[i] ~ dnorm(0,tau.V2)
    }

    # ICAR prior for the spatial random effects and loadings alpha
    U[1:n] ~ car.normal(adj[], weights[], num[], tau.U)
    alpha[1:n] ~ car.normal(adj[], weights[], num[], tau.alpha)

    #Priors
    beta1[t-2006] ~ dflat()
    beta2[t-2006] ~ dflat()
    beta3[t-2006] ~ dflat()
    beta4[t-2006] ~ dflat()
  }
  for (t in 2008:2007+TT-1){
    for (i in 1:88){
      y1[88*(t-2007)+i] ~ dpois(lambda1[88*(t-2007)+i])
      y2[88*(t-2007)+i] ~ dpois(lambda2[88*(t-2007)+i])
      log(lambda1[88*(t-2007)+i])<-log(z[i])+ beta1[t-2006] +
        beta3[t-2006]*x[88*(t-2007)+i] +
        (1+alpha[88*(t-2007)+i])*
        (U[88*(t-2007)+i]+mu[88*(t-2007)+i]) +V1[88*(t-2007)+i]
      log(lambda2[88*(t-2007)+i])<-log(z[i])+ beta2[t-2006] +
        beta4[t-2006]*x[88*(t-2007)+i] +
        (U[88*(t-2007)+i]+ mu[88*(t-2007)+i]) + V2[88*(t-2007)+i]
      V1[88*(t-2007)+i] ~ dnorm(0,tau.V1)
      V2[88*(t-2007)+i] ~ dnorm(0,tau.V2)
      mu[88*(t-2007)+i]<-eta*U[88*(t-2008)+i]
    }

    # ICAR prior for the spatial random effects and loadings alpha
    U[(88*(t-2007)+1):(88*(t-2007)+n)] ~
      car.normal(adj[], weights[], num[], tau.U)
    alpha[(88*(t-2007)+1):(88*(t-2007)+n)] ~
      car.normal(adj[], weights[], num[], tau.alpha)

    #Priors
    beta1[t-2006] ~ dflat()
    beta2[t-2006] ~ dflat()
    beta3[t-2006] ~ dflat()
    beta4[t-2006] ~ dflat()
  }
  tau.V1 ~ dgamma(0.5,0.5)
  tau.V2 ~ dgamma(0.5,0.5)
  tau.U ~ dgamma(0.5,0.5)
  tau.alpha ~ dgamma(0.5,0.5)
  eta ~ dunif(0,1)
}
", fill = TRUE)
sink()

############################################################
#########              Call BUGS                 ###########
############################################################
data<-list(y1=TDbyyear$AllDeaths,y2=TDbyyear$TreatmentTotal,
           x=TDbyyear$AdolescentPoverty,z=TDbyyear$TotalPopulation,
           n=n,TT=TT,num=num,adj=adj,weights=weights)

inits<- function(){
  list(beta1=rep(-5,TT),beta2=rep(-3,TT),beta3=rep(0.01,TT),
       beta4=rep(0.01,TT),U=rep(0,n*TT),V1=rep(0,n*TT),V2=rep(0,n*TT),
       alpha=rep(0,n*TT),tau.V1=0.1,tau.V2=0.1,tau.U=0.1,tau.alpha=0.1,
       eta=.9)
}
keepers <- c("beta1","beta2","beta3","beta4","V1","V2","tau.V1","tau.V2",
             "tau.U","alpha","tau.alpha","U","eta")
ptm = proc.time()
out3<- bugs(
  data=data,
  inits=inits,
  parameters.to.save=keepers,
  model.file="mod.txt",
  n.iter=10000,
  n.chains=1,
  n.thin=10
)
proc.time()-ptm
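Before calling BUGS, it can be useful to check that the adjacency inputs built in the set-up step are mutually consistent. The following is a minimal sketch on a toy four-region adjacency matrix rather than the Ohio matrix; it rests on our understanding that car.normal expects a symmetric neighbourhood structure and that the length of adj must equal the sum of num. The toy matrix is made up for illustration.

```r
# Toy adjacency matrix for 4 regions on a line (1-2-3-4); illustrative only.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# Same construction as in the set-up code above, written without a loop
n   <- nrow(W)
num <- colSums(W)
adj <- unlist(lapply(1:n, function(j) which(W[j, ] == 1)))
weights <- rep(1, length(adj))

# Consistency checks on the inputs passed to car.normal
stopifnot(isSymmetric(W))            # neighbour relation is symmetric
stopifnot(all(diag(W) == 0))         # no region is its own neighbour
stopifnot(sum(num) == length(adj))   # adj lists every neighbour exactly once
stopifnot(all(num > 0))              # no region without neighbours
```

Running the same four checks on the matrix read from OhioAdjacency.csv (after converting it with as.matrix) would flag a mis-specified adjacency file before the sampler starts.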

C.0.2 Map of Ohio

TDbyyear$death_rate=TDbyyear$AllDeaths/TDbyyear$TotalPopulation*100000
TDbyyear$treatment_rate=TDbyyear$TreatmentTotal/
  TDbyyear$TotalPopulation*100000

#### load the packages required
library('rgeos')
library('spdep')
library('rgdal')
library('maptools')
library('shapefiles')
library('ggmap')

# Posterior means from the fitted model
U<-out3$mean$U
beta1<-out3$mean$beta1
beta2<-out3$mean$beta2
beta3<-out3$mean$beta3
beta4<-out3$mean$beta4
alpha<-out3$mean$alpha
V1<-out3$mean$V1
V2<-out3$mean$V2
# Ratio of the posterior variances of U from two fits
# (out4 is a second bugs() fit that is not defined in this appendix)
VAR1<-(out3$sd$U)^2
VAR2<-(out4$sd$U)^2
var_ratio<-VAR1/VAR2

# Reshape each stacked vector (88 counties x 10 years) into an
# 88 x 10 matrix with one column per year
u<-matrix(NA,nrow=88,ncol=10)
a<-matrix(NA,nrow=88,ncol=10)
d<-matrix(NA,nrow=88,ncol=10)
t<-matrix(NA,nrow=88,ncol=10)
x<-matrix(NA,nrow=88,ncol=10)
v1<-matrix(NA,nrow=88,ncol=10)
v2<-matrix(NA,nrow=88,ncol=10)
var_r<-matrix(NA,nrow=88,ncol=10)
for (i in 1:10){
  u[,i]<-U[(88*i-87):(88*i)]
  a[,i]<-alpha[(88*i-87):(88*i)]
  d[,i]<-TDbyyear$death_rate[(88*i-87):(88*i)]
  t[,i]<-TDbyyear$treatment_rate[(88*i-87):(88*i)]
  x[,i]<-TDbyyear$AdolescentPoverty[(88*i-87):(88*i)]
  v1[,i]<-V1[(88*i-87):(88*i)]
  v2[,i]<-V2[(88*i-87):(88*i)]
  var_r[,i]<-var_ratio[(88*i-87):(88*i)]
}
u<-data.frame(u)
a<-data.frame(a)
d<-data.frame(d)
t<-data.frame(t)
x<-data.frame(x)
v1<-data.frame(v1)
v2<-data.frame(v2)
var_r<-data.frame(var_r)

####factor in dataframe change to numeric
u2<-data.frame(lapply(u, function(x) as.numeric(as.character(x))))
a2<-data.frame(lapply(a, function(x) as.numeric(as.character(x))))
d2<-data.frame(lapply(d, function(x) as.numeric(as.character(x))))
t2<-data.frame(lapply(t, function(x) as.numeric(as.character(x))))
x2<-data.frame(lapply(x, function(x) as.numeric(as.character(x))))
v1<-data.frame(lapply(v1, function(x) as.numeric(as.character(x))))
v2<-data.frame(lapply(v2, function(x) as.numeric(as.character(x))))
var_r<-data.frame(lapply(var_r, function(x) as.numeric(as.character(x))))
outresult<-data.frame(cbind(as.vector(ohio$County),u2,a2,d2,t2,x2,
                            v1,v2,var_r))
colnames(outresult)<-c("County",paste0("U",1:10),paste0("alpha",1:10),
                       paste0("death",1:10),paste0("treatment",1:10),
                       paste0("x",1:10),paste0("v1",1:10),
                       paste0("v2",1:10),paste0("var_r",1:10))

### read in the data
count.data=read.csv('treatment_death_data0.csv')
count.data$id=count.data$GeoID-39000
count.data<-cbind(count.data,outresult)

#Read in Ohio county shape file
#get shapes
oh.map=readOGR(dsn='Shapes',layer='cb_2014_us_county_500k')

#convert to data frame
oh.map.data=fortify(oh.map[oh.map$STATEFP=='39',],region='COUNTYFP')
oh.map.data$id2=as.numeric(oh.map.data$id)
oh.map.data.merge=merge(oh.map.data,count.data,by.x="id2",by.y="id")

##Spatial Factor
U1.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=U1),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
U1.load.map

#Loadings
alpha1.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=alpha1),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
alpha1.load.map

#death
d1.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=death1),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
d1.load.map

#treatment rate
t1.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=treatment1),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
t1.load.map

#Poverty
x1.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=x1),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
x1.load.map

#error term
v11.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=v11),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
v11.load.map
v21.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=v21),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=0,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
v21.load.map

#variance ratio
var1.load.map=ggplot()+
  geom_polygon(data=oh.map.data.merge,aes(x=long,y=lat,group=id,
    fill=var_r1),color='black',alpha=.8,size=.3)+
  scale_fill_gradient2(name="",midpoint=1,low='blue',high='red')+
  coord_map()+
  theme_nothing(legend=T)+
  ggtitle("2007")+theme(plot.title = element_text(hjust = 0.5))
var1.load.map

#10 years
# The maps for years 2 to 10 (e.g. U2.load.map) are not built in this listing
library(ggpubr)
ggarrange(U1.load.map, U2.load.map,U3.load.map,U4.load.map,U5.load.map,
          U6.load.map,U7.load.map,U8.load.map,U9.load.map,U10.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(alpha1.load.map, alpha2.load.map,alpha3.load.map,
          alpha4.load.map,alpha5.load.map,alpha6.load.map,alpha7.load.map,
          alpha8.load.map,alpha9.load.map,alpha10.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(d1.load.map, d2.load.map,d3.load.map,d4.load.map,d5.load.map,
          d6.load.map,d7.load.map,
          d8.load.map,d9.load.map,d10.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(t1.load.map, t2.load.map,t3.load.map,t4.load.map,t5.load.map,
          t6.load.map,t7.load.map,
          t8.load.map,t9.load.map,t10.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(x1.load.map, x2.load.map,x3.load.map,x4.load.map,x5.load.map,
          x6.load.map,x7.load.map,
          x8.load.map,x9.load.map,x10.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(v11.load.map, v12.load.map,v13.load.map,v14.load.map,
          v15.load.map,v16.load.map,
          v17.load.map,v18.load.map,v19.load.map,v110.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(v21.load.map, v22.load.map,v23.load.map,v24.load.map,
          v25.load.map,v26.load.map,
          v27.load.map,v28.load.map,v29.load.map,v210.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)
ggarrange(var1.load.map, var2.load.map,var3.load.map,var4.load.map,
          var5.load.map,var6.load.map,
          var7.load.map,var8.load.map,var9.load.map,var10.load.map,
          common.legend = TRUE, legend="right",
          ncol = 5, nrow = 2)

##########beta
beta1<-out3$mean$beta1
beta2<-out3$mean$beta2
plot(1:10,beta1,type="l",ylim=c(-12,-5))
lines(1:10,beta2,col=2)
beta3<-out3$mean$beta3
beta4<-out3$mean$beta4
plot(1:10,beta3,type="l",ylim=c(-1,5))
lines(1:10,beta4,col=2)

# Note: in the model code, beta1 and beta3 are the intercept and slope for
# deaths (y1), and beta2 and beta4 are the intercept and slope for
# treatment (y2).
par(oma=c(4,1,1,1),mfrow=c(1,2),xpd=NA)
plot(2007:2016,beta1,type="l",xaxt = "n",main="(a) Death",cex.lab=1.5,
     cex.axis=1.5,cex.main=1.5,lwd=2,ylim=c(-12,-5),
     ylab="Posterior mean rate ratios for death",xlab="Year")
lines(2007:2016,beta2,col=2)
axis(1, at = 2007:2016)
plot(2007:2016,beta3,type="l",xaxt = "n",cex.lab=1.5,
     main="(b) Treatment",cex.axis=1.5,cex.main=1.5,lwd=2,ylim=c(-1,5),
     ylab="Posterior mean rate ratios for treatment",xlab="Year")
lines(2007:2016,beta4,col=2)
axis(1, at = 2007:2016)
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0),
    mar = c(0, 0, 0, 0), new = TRUE)
plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n")
legend("bottom",legend=c("Intercept", "Slope"),col=1:2,bty="n",
       xpd = TRUE, horiz = TRUE, inset = c(0,0),cex = 1.5,lwd = 1.5,
       lty = 1)
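The year-by-year reshaping loop in C.0.2 relies on the stacked ordering of the posterior means (88 counties for 2007, then 88 counties for 2008, and so on). The following is a minimal sketch, using simulated values as a stand-in for out3$mean$U, which checks that the loop gives the same result as a single column-wise call to matrix(); the names `n_county`, `n_year`, `u_loop`, and `u_direct` are hypothetical.

```r
set.seed(1)
n_county <- 88
n_year   <- 10
U <- rnorm(n_county * n_year)   # simulated stand-in for out3$mean$U

# Reshape as in C.0.2: one column per year
u_loop <- matrix(NA, nrow = n_county, ncol = n_year)
for (i in 1:n_year) {
  u_loop[, i] <- U[(n_county * i - (n_county - 1)):(n_county * i)]
}

# matrix() fills column by column, so it reproduces the loop directly
u_direct <- matrix(U, nrow = n_county, ncol = n_year)
stopifnot(identical(u_loop, u_direct))
```

Because both forms agree, the eight loops over stacked vectors could be replaced by one matrix() call per quantity without changing the maps.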

Curriculum Vitae

Yixuan Ji

UNDERGRADUATE STUDY:
Xiamen University, Xiamen, China
B.S., Economics Statistics, June 2017
Korea University, Seoul, Korea

GRADUATE STUDY:
Wake Forest University, Winston-Salem, North Carolina
M.A., Statistics, May 2019

Honors and Awards

1. Best Graduating Student in Statistics at WFU (2018-2019)

2. Member, Pi Mu Epsilon (National Mathematics Honor Society; April 2018)

3. Teaching Assistantship in 2018 granted by Wake Forest University

4. Member, American Mathematical Society (2018)

5. Summer Research Fellowship at WFU (2018)

6. Tuition Scholarship in 2017 granted by Wake Forest University

7. Dean’s List at XMU (2015)

8. First Prize of Academic Scholarship at XMU (2014-2015)
