
Evaluating Health Policy Effect with Generalized Linear Model and Generalized Estimating Equation Model

A Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Actuarial and Quantitative Risk Management in the Graduate School of The Ohio State University

By

Chen Zhao, B.S.

Graduate Program in Department of Mathematics

The Ohio State University

2020

Master’s Examination Committee:

Dr. Chunsheng Ban, Co-Advisor

Dr. Bo Lu, Co-Advisor

© Copyright by

Chen Zhao

2020

Abstract

Under the Affordable Care Act (ACA) of 2010, Medicaid eligibility was expanded to cover people with annual incomes below 138% of the Federal Poverty Level (FPL). Each state may decide whether or not to join the program, and so far more than 30 states have implemented the Medicaid expansion. Meanwhile, rural counties face a general trend of losing medical care workforce supply because of the lack of nearby hospitals and low population. In this thesis, I examine the association between the Medicaid expansion and the change in health care workforce supply in rural counties. To answer this question, I use the difference-in-differences (DD) model. The effect of the Medicaid expansion on rural health care workforce supply is explored by analyzing the change in the number of Primary Care Physicians and Nurse Practitioners before and after the expansion. I use the Generalized Linear Model (GLM) and the Generalized Estimating Equation (GEE) to construct DD estimates and standard errors, and compare the fitting results of both. Finally, I also compare the model fit under the Poisson and negative binomial distributions.

This is dedicated to my family.

Acknowledgments

I would like to express my deep gratitude to my advisor, Dr. Chunsheng Ban, for his encouragement and suggestions, which have always given me a sense of direction in the pursuit of knowledge. The completion of this project would not have been possible without the participation and assistance of my research mentor, Dr. Bo Lu, and I greatly appreciate all of his guidance on my study. It has also been an honor to join Dr. Yi Xu's research group, and I cannot thank her enough for her support and encouragement.

I deeply appreciate everyone's advice and guidance in support of this project.

Vita

November 14, 1995 ...... Born - Beijing, China

2018 ...... B.S., Beijing Technology and Business University

2018-present ...... Graduate Student, The Ohio State University

Fields of Study

Major Field: Quantitative Risk Management and Actuarial Science

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 The introduction of Causal Inference
       1.1.1 Average Treatment Effect
       1.1.2 Randomized Experiment
       1.1.3 Estimated Causal Effect in Regression
       1.1.4 Causal Inference in Observational Study
   1.2 Health Policy Background
   1.3 Method
       1.3.1 Instrumental Variables
       1.3.2 Regression Discontinuity

2. Methods
   2.1 Difference-in-Differences
       2.1.1 Framework of difference-in-differences
       2.1.2 Difference-in-differences with Regression
   2.2 Generalized Linear Model
       2.2.1 Framework of General Linear Model
       2.2.2 Setting of Generalized Linear Model
       2.2.3 Robust Variance Estimation
   2.3 Generalized Estimating Equation

3. Data Analysis
   3.1 Data
   3.2 Model
   3.3 Results

4. Conclusion

Appendices
   A. Stata commands and results

Bibliography

List of Tables

3.1 The number of rural counties in each state
3.2 Status of State Action on the Medicaid Expansion Decision
3.3 State-level Characteristics
3.4 Medicaid expansion effect on Primary Care Physician and Nurse Practitioner using GLM and GEE
3.5 Medicaid expansion effect on Primary Care Physician and Nurse Practitioner using GEE with Negative Binomial distribution

List of Figures

2.1 Difference-in-differences parallel trend. Note: Adapted from [10]
3.1 the Primary Care Physician in rural county over year
3.2 Distribution of Nurse Practitioner in rural county over year
3.3 Distribution of Primary Care Physician in rural county
3.4 Distribution of Nurse Practitioner in rural county
3.5 Distribution of PCP density by year
3.6 Distribution of NP density by year

Chapter 1: Introduction

1.1 The introduction of Causal Inference

From a very general perspective, causal inference is a field related to statistics, computer science, psychology, economics, and many other disciplines. One can think of it as the quantitative study of causality, which has played a critical role in the development of social science research. For example, researchers have asked whether schooling has a causal connection to future income. Causal inference helps us understand the impact of certain variables on outcomes; for instance, we can infer what happens to average income for every additional year of schooling.

Unlike an association, which is computed from the data alone, a causal relation concerns the data-generating process [16]. In causal inference, we denote the treatment by $A$ and the outcome by $Y$. To understand causality, we have to define the potential outcomes, which are the values of the outcome that an experimental unit would take under each of the different treatments [19]. Here, I simplify the treatment to two levels for individual $i$ ($A_i = 0$ is without treatment, $A_i = 1$ is with treatment). The potential outcome for subject $i$ under treatment is $Y_i^{A=1}$, and the potential outcome without treatment is $Y_i^{A=0}$. The observed outcome $Y_i$ for individual $i$ can be expressed in terms of the potential outcomes as

$$Y_i = A_i Y_i^{A=1} + (1 - A_i) Y_i^{A=0} \tag{1.1}$$

By comparing the potential outcomes under the two treatment statuses, $Y_i^{A=1} - Y_i^{A=0}$, we can identify the causal effect of treatment $A$ on outcome $Y$ for subject $i$.

1.1.1 Average Treatment Effect

The fundamental problem of causal inference is that we cannot observe all potential outcomes for the same individual. We observe only the potential outcome under the treatment actually assigned; the others are missing. Inference about the causal effect of the treatment is therefore difficult [18].

In general, we consider the Average Treatment Effect (ATE), which is the average causal effect in the population. When people are assigned to a treatment group ($A_i = 1$) and a control group ($A_i = 0$), we can try to learn the causal relationship between the treatment and the outcome by comparing the average outcomes of the two groups:

$$\begin{aligned} E(Y_i \mid A_i = 1) - E(Y_i \mid A_i = 0) ={}& E(Y_i^{A=1} \mid A_i = 1) - E(Y_i^{A=0} \mid A_i = 1) \\ &+ E(Y_i^{A=0} \mid A_i = 1) - E(Y_i^{A=0} \mid A_i = 0) \end{aligned} \tag{1.2}$$

The first term on the right-hand side, $E(Y_i^{A=1} \mid A_i = 1) - E(Y_i^{A=0} \mid A_i = 1)$, is the average causal effect in the treatment group. The remaining term, $E(Y_i^{A=0} \mid A_i = 1) - E(Y_i^{A=0} \mid A_i = 0)$, is called the selection bias. The selection bias is generated by differences in pre-treatment covariates between the two groups. Because of the selection bias, we cannot directly conclude a causal relationship from the observed outcomes. Thus, many causal inference methods aim to eliminate the selection bias.

1.1.2 Randomized Experiment

One way to solve the selection bias problem is to use randomization in treatment assignment. In a randomized experiment, each individual is assigned to a group at random, so the treatment $A$ is independent of the potential outcomes $Y_i^{A=0}$ and $Y_i^{A=1}$. Thus, the selection bias in (1.2) disappears:

$$E(Y_i^{A=0} \mid A_i = 1) - E(Y_i^{A=0} \mid A_i = 0) = E(Y_i^{A=0}) - E(Y_i^{A=0}) = 0 \tag{1.3}$$

The purpose of randomization is to ensure that the observed results depend only on the treatment, because the treatment group and the control group are then drawn from the same distribution. It is impossible to guarantee that all other variables are identical for the individuals we pick for the treated group. However, since we pick randomly from the same distribution, the differences between the two groups are due to chance alone, and systematic differences disappear on average. Therefore, the difference in average outcomes between the two groups is determined solely by the treatment.
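The role of randomization can be illustrated with a small simulation. The sketch below is not part of the thesis analysis: the data-generating process, the constant treatment effect `TRUE_EFFECT`, and the helper `naive_difference` are all assumptions made for illustration. It builds observed outcomes from potential outcomes as in equation (1.1) and compares the difference in group means under self-selection with the same difference under coin-flip assignment.

```python
import random

random.seed(0)
N = 20000
TRUE_EFFECT = 2.0  # hypothetical constant treatment effect (assumption)


def mean(xs):
    return sum(xs) / len(xs)


def naive_difference(assign):
    """Difference in observed group means, E(Y|A=1) - E(Y|A=0)."""
    treated, control = [], []
    for _ in range(N):
        x = random.gauss(0.0, 1.0)        # pre-treatment covariate
        y0 = x + random.gauss(0.0, 1.0)   # potential outcome without treatment
        y1 = y0 + TRUE_EFFECT             # potential outcome with treatment
        a = assign(x)
        y = a * y1 + (1 - a) * y0         # observed outcome, as in equation (1.1)
        (treated if a == 1 else control).append(y)
    return mean(treated) - mean(control)


# Self-selection: units with a larger covariate are more likely to be treated.
confounded = naive_difference(lambda x: 1 if x + random.gauss(0.0, 1.0) > 0 else 0)
# Randomization: a fair coin, independent of the potential outcomes.
randomized = naive_difference(lambda x: 1 if random.random() < 0.5 else 0)

print(f"self-selected: {confounded:.2f}, randomized: {randomized:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect, because treated units have systematically higher untreated outcomes, while the randomized comparison stays close to `TRUE_EFFECT`.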

1.1.3 Estimated Causal Effect in Regression

In the previous sections, the causal effect in (1.2) was expressed through the conditional expectation function (CEF). According to the regression-CEF theorem [1], the best linear approximation to the CEF is the linear regression model.

Here, we set up our causal effect model as

$$Y_i = \alpha + \rho A_i + \beta X_i + \varepsilon_i \tag{1.4}$$

where $\rho$ is the causal effect, $X_i$ denotes the vector of all other variables related to the response variable $Y_i$, and $\varepsilon_i \sim N(0, \sigma^2)$ is the random error term.

In order to remove selection bias, we need to control for the covariates $X_i$ so that they are balanced across treatment groups. Thus, one key assumption that gives regression a causal interpretation is the conditional independence assumption [5]

$$E(Y_i^{A} \mid X_i, A_i = 1) = E(Y_i^{A} \mid X_i, A_i = 0) = E(Y_i \mid X_i, A_i) \tag{1.5}$$

Given the covariates $X_i$, the expected potential outcomes do not depend on the treatment, which means that there is no systematic difference in potential outcomes across the treatment groups. This ensures that the causal variable of interest is independent of potential outcomes so that the groups being compared are truly comparable [1]. When we estimate the causal effect based on the above equation, we need to control for all relevant covariates so that the conditional independence assumption is satisfied. In real-life problems, however, this may not be possible.

1.1.4 Causal Inference in Observational Study

Although randomized experiments are the most effective way to infer causality, they are difficult to implement in the real world, especially in social science. First, randomized experiments are costly in both money and time. For example, the Tennessee Student-Teacher Achievement Ratio (STAR) project was a randomized experiment designed to investigate whether class size affects student achievement [12]. This experiment cost about 12 million dollars and lasted four years. Second, randomized experiments raise ethical issues. The experimental units are required to ignore the experimental conditions in a randomized experiment, which leads to legal and constitutional problems such as those listed in Winich's work [22].

Therefore, researchers use observational studies, in which individuals can self-select into the treatment group or the control group, to make causal inferences. When an observational study is used to estimate a causal effect, the treatment group and the control group will differ in ways other than the treatment itself.

For example, suppose we want to examine the effect of hospitalization on health status. Hospital treatment is the treatment variable of interest, and the response variable is a measure of health status. In an observational study, the treatment group consists of patients in the hospital, and the control group consists of people who have not recently been hospitalized. Because people in the treatment group come to the hospital because of their illness, the treatment group has characteristics (covariates) that affect the response variable; these are called confounders. Because confounders affect both the response variable and the treatment variable, they create selection bias, and the estimated causal effect is no longer accurate.

In order to examine causal effects in an observational study, Hernán and Robins list three identifiability conditions under which an observational study can be conceptualized as a conditionally randomized experiment [8]:

1. Conditional Exchangeability: $Y^A \perp\!\!\!\perp A \mid X$; the treatment $A$ is independent of the potential outcome $Y^A$ given the covariates $X$.

2. Positivity: $P(A \mid X) > 0$ for all $A$; the probability of any assigned treatment must be greater than zero.

3. Consistency: $Y_i = A_i Y_i^{A=1} + (1 - A_i) Y_i^{A=0}$; the observed outcome is determined only by the treatment status. This means that for the same covariates, the outcome of treatment is fixed.

1.2 Health Policy Background

Causal inference has many real-life applications, such as clinical trials and economics. In this thesis, I mainly apply causal inference to health care policy, in particular the Medicaid expansion of 2014.

Medicaid is a government-sponsored insurance program that was signed into law by President Lyndon Johnson in 1965 [2]. The program mainly covers medical expenses for low-income families and individuals, including nursing home care, long-term care services, and home health care services.

One provision of the 2010 Affordable Care Act (ACA) extended the scope of Medicaid in 2014 to people with incomes below 138% of the Federal Poverty Level (FPL) [7]. Each state may decide whether or not to join the expansion. By 2017, 31 states and Washington, DC had joined the program. Meanwhile, the number of beneficiaries increased from 56.5 million to 73 million [6], with childless adults being the main group of new beneficiaries. After enrollment, many people gain access to primary care and preventive services. Some people with chronic diseases who had never been treated because they could not afford medical care have seen their health improve since joining the Medicaid program.

With the rapid growth in the number of new enrollees, the demand for prospective medical assistance has increased significantly. In rural areas, the lack of nearby hospitals, poor physician supply, and low population have led to a general trend of losing medical care staff and a shrinking medical workforce. We want to study whether the Medicaid expansion has an impact on this declining trend in the medical workforce. In this thesis, we concentrate on the number of Primary Care Physicians and Nurse Practitioners in rural counties. Because they serve as the entry point to the health care delivery system, a sufficient supply of this workforce is important to meet the increase in new enrollees [9].

1.3 Method

For practical problems, especially in the social sciences, it is often infeasible to run randomized experiments or to include in the model all the confounders related to the response variable. However, many techniques used in econometrics and the social sciences attempt to mimic an experimental design. Three methods are commonly used for causal inference in observational studies in social science. I introduce two of them, Instrumental Variables and Regression Discontinuity, in this chapter, and discuss the third, Difference-in-Differences, in the next chapter.

1.3.1 Instrumental Variables

As discussed above, randomized experiments are a good way to estimate the causal effect. Because the treatment is randomly assigned to each individual, it is unrelated to all other variables that might affect the outcome. In this case, the variation is called exogenous variation. In observational data, however, pre-treatment characteristics that affect the outcome differ between the treatment group and the control group; this is called endogenous variation. The instrumental variable method can tackle endogenous variation when estimating the causal effect. An instrumental variable is a variable that is correlated with the independent variable and uncorrelated with the error term [3].

The instrumental variables method relies on two assumptions [1]:

1. The instrument is as good as randomly assigned (i.e. independent of potential outcomes, conditional on covariates).

2. The instrument has no effect on outcomes other than through the treatment (the exclusion restriction).

Here, we denote the instrumental variable by $Z_i$ and let $A_i$ be the treatment variable for outcome $Y_i$. We have

$$A_i = \beta_{10} X_i + \beta_{11} Z_i + \varepsilon_{1i}$$

$$Y_i = \beta_{20} X_i + \beta_{21} Z_i + \varepsilon_{2i}$$

The first equation measures the first-stage effect of the instrument $Z_i$ on $A_i$, holding the covariates $X_i$ fixed. The second equation describes how the outcome $Y_i$ is generated by the instrument $Z_i$, which acts through the treatment $A_i$. Under the exclusion restriction, combining the two equations gives the causal effect of the treatment $A_i$ on $Y_i$ as the ratio $\beta_{21} / \beta_{11}$.
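The ratio of the two coefficients can be checked on simulated data. The following is a minimal sketch under stated assumptions, not the thesis's own computation: there are no covariates $X_i$, the unobserved confounder `u`, the coefficient values, and the helper `slope` are all hypothetical, and the two regressions are simple one-variable least-squares fits.

```python
import random

random.seed(1)
N = 50000
TRUE_EFFECT = 1.5  # hypothetical causal effect of A on Y (assumption)


def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


z, a, y = [], [], []
for _ in range(N):
    u = random.gauss(0.0, 1.0)               # unobserved confounder
    zi = random.gauss(0.0, 1.0)              # instrument, independent of u
    ai = 0.8 * zi + u + random.gauss(0.0, 1.0)
    yi = TRUE_EFFECT * ai + 2.0 * u + random.gauss(0.0, 1.0)
    z.append(zi)
    a.append(ai)
    y.append(yi)

beta11 = slope(z, a)           # first stage: effect of Z on A
beta21 = slope(z, y)           # reduced form: effect of Z on Y
iv_estimate = beta21 / beta11  # ratio estimator of the effect of A on Y
naive = slope(a, y)            # direct regression, biased by the confounder

print(f"naive: {naive:.2f}, IV ratio: {iv_estimate:.2f}")
```

In this setup the direct regression of the outcome on the treatment is pulled away from `TRUE_EFFECT` by the confounder, whereas the ratio of the reduced-form slope to the first-stage slope recovers it approximately.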

1.3.2 Regression Discontinuity

In some cases, an individual's treatment is determined by whether a variable lies above or below a threshold. Although participants are divided into different groups, randomization is infeasible. We can estimate the average treatment effect by comparing the response variable for individuals lying just above and just below the cutoff, so the estimate of the treatment effect is based on these subsets of the population. This method is called Regression Discontinuity.

For example, high school students are awarded National Merit Scholarships based on their PSAT scores. In 1960, Thistlethwaite and Campbell studied whether the awarded students were more likely to complete their college studies [20]. The study used a regression discontinuity model, which assumes that individuals whose scores are close to the threshold are very similar. Thus, there is no systematic difference between the treatment group and the control group, and the treatment effect can be estimated by comparing the outcomes of individuals just above the threshold with those of individuals just below it.
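The comparison around the cutoff can be sketched as follows. This is an illustrative simplification with assumed values (the cutoff, the bandwidth, the outcome model, and `TRUE_JUMP` are all hypothetical): it compares plain means inside a narrow window, whereas applied work would usually fit a regression on each side of the threshold.

```python
import random

random.seed(3)
CUTOFF = 50.0     # hypothetical threshold on the running variable
BANDWIDTH = 2.0   # hypothetical half-width of the comparison window
TRUE_JUMP = 1.0   # hypothetical treatment effect at the cutoff

above, below = [], []
for _ in range(50000):
    score = random.uniform(0.0, 100.0)       # running variable, e.g. a test score
    treated = 1 if score >= CUTOFF else 0    # treatment is set by the threshold
    y = 0.02 * score + TRUE_JUMP * treated + random.gauss(0.0, 0.5)
    if abs(score - CUTOFF) <= BANDWIDTH:     # keep only units close to the cutoff
        (above if treated else below).append(y)

rd_estimate = sum(above) / len(above) - sum(below) / len(below)
print(f"estimated jump at the cutoff: {rd_estimate:.2f}")
```

Because the outcome also trends with the score, the local difference in means carries a small bias that shrinks as the window narrows; this is the trade-off behind the choice of bandwidth.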

Chapter 2: Methods

In this chapter, I provide some theoretical details of the difference-in-differences method, which I apply to the Area Health Resource File (AHRF) to analyze the causal effect of the Medicaid expansion on the rural health care workforce. I first review the framework of the difference-in-differences (DD) model. In the following sections, I discuss the Generalized Linear Model (GLM) and robust variance estimation, and present the framework of the Generalized Estimating Equation (GEE).

2.1 Difference-in-Differences

We use a part of the AHRF that contains health care workforce and demographic characteristics of 644 counties observed in each of 7 years to estimate the effect of the Medicaid expansion on the health care workforce. The treatment of interest is the Medicaid expansion status of each state. We focus on the health care workforce outcomes before and after treatment, and use the difference-in-differences model to estimate the causal effect of the Medicaid expansion.

2.1.1 Framework of difference-in-differences

I now explain the idea behind the difference-in-differences model. We focus on the particular time $T$ at which the treatment takes place, because we know that some changes may happen at this time. We have observational panel data on the response variable $Y_{iat}$ for the treated group ($a = 1$), before and after treatment, and similarly for the control group ($a = 0$). When we compare the average outcomes of the treatment group and the control group before the treatment, there may be a difference, $E(Y_{iat} \mid a = 1, t < T) - E(Y_{iat} \mid a = 0, t < T)$, between the groups. The reason may be that the characteristics of subjects in the treatment group differ from those in the control group, which is known as selection bias. Therefore, we cannot identify the causal effect simply by comparing the groups after treatment, because the groups already differ before treatment.

To address this problem, we eliminate the selection bias by subtracting the pre-treatment difference from the post-treatment difference. The result is the causal effect without the pre-treatment selection bias. The main assumption of difference-in-differences is the parallel trend: in the absence of treatment, the difference between the treatment and control groups is constant over time. As shown in Figure 2.1, the trends in outcomes would be the same for the treatment and control groups if the treatment were not implemented. Although there is selection bias before treatment, we assume that it remains the same before and after treatment, so it can be eliminated by subtraction. This is the difference-in-differences method.

Figure 2.1: Difference-in-differences parallel trend. Note: Adapted from [10]

In our study, the treatment is the Medicaid expansion policy, and the outcomes are the numbers of Primary Care Physicians and Nurse Practitioners; we want to capture the impact of the Medicaid expansion through changes in the number of doctors and nurses in rural counties. We can determine the effect of the Medicaid expansion on these outcomes by comparing the pre-expansion period with the post-expansion period. We assume that each state would follow the same trend if it were not exposed to the Medicaid expansion. Therefore, we can use the difference-in-differences design to evaluate the causal effect of the Medicaid expansion policy on the health care workforce.

To see this, we define

$Y_{ist}^{A=1}$ = number of Primary Care Physicians of county $i$ in state $s$ at year $t$ if the state implements the Medicaid expansion

$Y_{ist}^{A=0}$ = number of Primary Care Physicians of county $i$ in state $s$ at year $t$ if the state does not implement the Medicaid expansion

According to the parallel trend assumption, we can define the average potential outcome of a non-expansion county by

$$E(Y_{ist}^{A=0} \mid s, t) = \beta_0 + \beta_1 \text{state}_i + \beta_2 \text{time}_t \tag{2.1}$$

Here, $\text{state}_i$ is the dummy variable indicating whether county $i$ is in a Medicaid expansion state or a non-expansion state, and $\text{time}_t$ is the dummy variable indicating whether the observation falls in the post-expansion period. From the above equation, the average health care workforce is determined by the state effect $\beta_1$ and the common time trend $\beta_2$. Thus, when the time $t$ is fixed, the difference in average outcomes is determined only by state characteristics, and within each state, the change in the outcome is due to a constant time effect.

We assume the Medicaid expansion has a fixed effect $\beta$ on the health care workforce, so the average potential outcome of an expansion county is

$$E(Y_{ist}^{A=1} \mid s, t) = \beta_0 + \beta_1 \text{state}_i + \beta_2 \text{time}_t + \beta \tag{2.2}$$

Here $\beta = E(Y_{ist}^{A=1} \mid s, t) - E(Y_{ist}^{A=0} \mid s, t)$ is the causal effect, and we have

$$Y_{ist} = \gamma_s + \lambda_t + \rho A_{st} + \varepsilon_{ist}$$

In this equation, $A_{st}$ is the dummy variable that equals one for observations in a treatment state after the Medicaid expansion. This is the difference-in-differences model for the effect of the Medicaid expansion on the health care workforce: the outcome $Y_{ist}$ is determined by the state effect $\gamma_s$, the time effect $\lambda_t$, the policy effect $\rho$, and the $N(0, \sigma^2)$ random error term $\varepsilon_{ist}$.

2.1.2 Difference-in-differences with Regression

Next, I review how to use a regression model to fit the difference-in-differences model. We generate a dummy variable $T_{it}$ that equals one if county $i$ is in a Medicaid expansion state and the time period is post-treatment, and zero otherwise. $\text{state}_i$ is the dummy variable indicating whether county $i$ is in a Medicaid expansion state, and $\text{time}_t$ is the dummy variable that equals one after the Medicaid expansion and zero before it. Then we have the regression framework

$$Y_{it} = \beta_0 + \beta_1 \text{state}_i + \beta_2 \text{time}_t + \beta_3 T_{it} + \varepsilon_{it} \tag{2.3}$$

Equation (2.3) is the difference-in-differences model for the effect of the Medicaid expansion on the health care workforce: the outcome $Y_{it}$ is determined by the state effect $\beta_1$, the time effect $\beta_2$, the policy effect $\beta_3$, and the $N(0, \sigma^2)$ random error term $\varepsilon_{it}$.

There are two advantages of using the regression model. First, we can obtain standard errors for every coefficient and test their statistical significance. Second, in a regression we can control for covariates to obtain a more accurate estimate of the causal effect.
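The link between the regression coefficient on the interaction dummy and the difference of differences in group means can be verified numerically. The sketch below uses simulated data with hypothetical parameter values (`B0` to `B3`) and no covariates. With only the two dummies and their interaction, the least-squares coefficients can be written directly from the four cell means, and the code checks this by confirming that the residuals satisfy the normal equations.

```python
import random

random.seed(2)

# Hypothetical coefficients of a two-group, two-period regression (assumptions).
B0, B1, B2, B3 = 5.0, 1.0, -0.5, 0.8

rows = []  # each row is (state dummy, time dummy, outcome)
for _ in range(4000):
    state = random.randint(0, 1)
    time = random.randint(0, 1)
    y = B0 + B1 * state + B2 * time + B3 * state * time + random.gauss(0.0, 1.0)
    rows.append((state, time, y))


def cell_mean(s, t):
    ys = [y for (si, ti, y) in rows if si == s and ti == t]
    return sum(ys) / len(ys)


m00, m01 = cell_mean(0, 0), cell_mean(0, 1)
m10, m11 = cell_mean(1, 0), cell_mean(1, 1)

# Difference-in-differences from the four cell means.
dd = (m11 - m10) - (m01 - m00)

# Candidate regression coefficients expressed through the same cell means.
b0, b1, b2, b3 = m00, m10 - m00, m01 - m00, dd

# Least squares requires residuals orthogonal to every regressor
# (1, state, time, state*time); accumulate X'residual to check this.
normal_eqs = [0.0, 0.0, 0.0, 0.0]
for s, t, y in rows:
    resid = y - (b0 + b1 * s + b2 * t + b3 * s * t)
    for k, xk in enumerate((1, s, t, s * t)):
        normal_eqs[k] += xk * resid

print(f"DD estimate: {dd:.3f}")
```

Since the normal equations hold up to rounding error, the interaction coefficient from the regression is exactly the difference-in-differences of means in this covariate-free case; adding covariates breaks this identity, which is where the regression form becomes useful.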

2.2 Generalized Linear Model

2.2.1 Framework of General Linear Model

First I discuss the general linear model,

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i \tag{2.4}$$

where $y_i$ is the dependent variable, $x_{1i}, x_{2i}, \cdots, x_{pi}$ are the independent variables explaining $y_i$, and $\varepsilon_i$ is the error term with $\varepsilon_i \sim N(0, \sigma^2)$ and constant $\sigma$. In matrix form, this is $Y = X\beta + \varepsilon$.

In this setting, let $Y$ be the $n \times 1$ outcome vector with $Y \sim N(\mu, \Sigma)$, where $\mu = X\beta$ and $\Sigma = \sigma^2 I$, and let $X$ be the $n \times p$ covariate matrix. We can estimate the coefficient vector $\hat\beta$ by maximum likelihood estimation. Up to an additive constant, the log-likelihood function here is [21]

$$L = -\frac{1}{2} (Y - \mu)' \Sigma^{-1} (Y - \mu) + \text{const} \tag{2.5}$$

The first derivative of the log-likelihood function is

$$\frac{\partial L}{\partial \beta} = X' \Sigma^{-1} (Y - \mu) \tag{2.6}$$

This term is also known as the score equation, and we can estimate the coefficient $\hat\beta$ by setting the score equation equal to zero, i.e. $\hat\beta = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} Y$. Because we assume $\Sigma = \sigma^2 I$, this simplifies to $\hat\beta = (X'X)^{-1} X'Y$.

2.2.2 Setting of Generalized Linear Model

The key assumptions of the general linear model are normality and independence of the errors. However, in many problems the error term follows a distribution other than the normal distribution. We can extend the idea of the general linear model to the Generalized Linear Model (GLM).

First we define the linear predictor

$$\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} \tag{2.7}$$

Let $\eta_i$ be a function of the mean outcome $\mu_i = E(Y_i)$; this function is called the link function $g$:

$$g(\mu_i) = \eta_i \tag{2.8}$$

Through the link function, the mean of the response variable is related to the linear model, which means that we can predict the response variable $Y$ from the linear model via the link function. The difference between the general linear model and the generalized linear model is that the GLM allows the response variable to follow different distributions. Thus, we no longer require the error term to follow the normal distribution, and we can specify another distribution, such as the Poisson distribution.

We can also define the variance function $V$, which describes how the variance $\text{Var}(Y_i)$ depends on the mean:

$$\text{Var}(Y_i) = \phi V(\mu_i) \tag{2.9}$$

The constant $\phi$ is the dispersion parameter.

Here, I review the generalized linear model for the Poisson distribution, because the Poisson family is very useful, especially for modeling count data.

In the Poisson regression model, we use the independent variables $X$ to fit the count response variable $y$. Assume that $y \sim \text{Poisson}(\mu)$ with parameter $\mu > 0$. The probability of observing $y$ is

$$p(y) = \frac{\mu^y e^{-\mu}}{y!} \quad \text{for } y = 0, 1, 2, 3, \ldots$$

As is well known, the mean and the variance of the Poisson distribution are both $\mu$. To connect $\mu$ with $X$, we use the log link, whose inverse is the exponential function; the exponential function keeps $\mu$ positive, that is,

$$\mu_i = e^{X_i \beta}$$

so $\log \mu_i$ is modeled as a linear function of $X_i$:

$$\log \mu_i = X_i \beta$$

To estimate the parameter $\beta$, we use maximum likelihood estimation. The log-likelihood function for Poisson regression is [4]

$$\log L(\beta) = \sum_{i=1}^{n} \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)$$

Differentiating the log-likelihood function and setting the derivative to zero yields the estimated parameter $\hat\beta$.
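Setting the derivative of the Poisson log-likelihood to zero has no closed-form solution, so the estimate is found iteratively. The sketch below is one possible illustration, not the routine used in the thesis: it assumes a single covariate, simulated data with hypothetical coefficients `TRUE_B0` and `TRUE_B1`, and a plain Newton-Raphson update written out for the two-parameter case. The helper names `poisson_draw` and `fit_poisson` are introduced here for illustration only.

```python
import math
import random

random.seed(5)
TRUE_B0, TRUE_B1 = 0.5, 0.3  # hypothetical coefficients (assumptions)


def poisson_draw(mu):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1


xs = [random.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [poisson_draw(math.exp(TRUE_B0 + TRUE_B1 * x)) for x in xs]


def fit_poisson(xs, ys, iters=25):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            u0 += y - mu            # score with respect to b0
            u1 += (y - mu) * x      # score with respect to b1
            i00 += mu               # information matrix entries
            i01 += mu * x
            i11 += mu * x * x
        det = i00 * i11 - i01 * i01
        # beta <- beta + I^{-1} U, with the 2x2 inverse written out
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


b0_hat, b1_hat = fit_poisson(xs, ys)
print(f"b0: {b0_hat:.3f}, b1: {b1_hat:.3f}")
```

The score terms `u0` and `u1` are the derivatives of the log-likelihood with respect to the two coefficients, so the loop stops changing once they are numerically zero, which is the condition described in the text.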

2.2.3 Robust Variance Estimation

We use panel data, which contain repeated observations on the same county at several times. The main problem is that observations within a cluster are correlated, which leads to biased statistical inference. Because it is inappropriate to assume that the observations are independent of one another, the usual estimate of the variance of the coefficient $\hat\beta$ is no longer reliable.

In order to estimate the variance of the estimated coefficient, we first write the GLM in matrix form

$$Y = g^{-1}(X\beta) + \varepsilon \tag{2.10}$$

where $X$ is the $nT \times k$ matrix; we have observations of $n$ subjects at $T$ times, and each observation has $k$ characteristics.

We extend the log-likelihood function to the GLM with link function $g$ and a proper distribution of $\varepsilon$. The estimated variance of the estimated coefficient is the negative inverse of the matrix of second derivatives of the log-likelihood function,

$$\text{Var}(\hat\beta) = -\left( \frac{\partial^2 L}{\partial \hat\beta \, \partial \hat\beta'} \right)^{-1} = s^2 (X'X)^{-1} \tag{2.11}$$

Here, $s^2$ is the variance of the residuals, and the estimated standard error of each parameter is the square root of the corresponding main diagonal element of $\text{Var}(\hat\beta)$.

Because the outcomes within the same cluster are correlated with each other, the variance of the estimated coefficient given by the above formula is incorrect. The robust estimator was developed to estimate parameter standard errors with clustered data:

$$\text{Var}_R(\hat\beta) = (X'X)^{-1} \left[ \sum_{i=1}^{n} X_i' \hat u_i \hat u_i' X_i \right] (X'X)^{-1} \tag{2.12}$$

where $X_i$ is the $T \times k$ covariate matrix for subject $i$ and $\hat u_i = Y_i - X_i \hat\beta$ is the vector of residuals for subject $i$. The robust standard error of each estimated parameter is the square root of the corresponding diagonal element of $\text{Var}_R(\hat\beta)$. The idea of this robust estimator is to weight each observation's contribution, first accounting for the variability within each cluster and then summing across clusters for the final adjustment [23].
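To make the cluster-robust formula concrete, the sketch below computes both the usual and the robust standard error of a slope in a linear model fitted to simulated clustered data. Everything here is an assumption for illustration: the number of clusters `G`, the panel length `T`, the shared cluster shock, and the coefficient values. No finite-sample correction is applied, although statistical packages often add one.

```python
import math
import random

random.seed(4)
G, T = 200, 7  # hypothetical: 200 clusters observed 7 times each

clusters = []
for _ in range(G):
    x = random.gauss(0.0, 1.0)  # covariate, constant within a cluster
    c = random.gauss(0.0, 1.0)  # shared shock inducing within-cluster correlation
    clusters.append([(x, 1.0 + 0.5 * x + c + random.gauss(0.0, 1.0)) for _ in range(T)])

obs = [pair for cl in clusters for pair in cl]
n = len(obs)
sx = sum(x for x, _ in obs)
sxx = sum(x * x for x, _ in obs)
sy = sum(y for _, y in obs)
sxy = sum(x * y for x, y in obs)

# Bread: (X'X)^{-1} for the design matrix with columns (1, x).
det = n * sxx - sx * sx
bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
b0 = bread[0][0] * sy + bread[0][1] * sxy
b1 = bread[1][0] * sy + bread[1][1] * sxy

# Meat: sum over clusters of (X_i' u_i)(X_i' u_i)', equal to X_i' u_i u_i' X_i.
meat = [[0.0, 0.0], [0.0, 0.0]]
rss = 0.0
for cl in clusters:
    s = [0.0, 0.0]
    for x, y in cl:
        u = y - b0 - b1 * x
        rss += u * u
        s[0] += u
        s[1] += x * u
    for r in range(2):
        for q in range(2):
            meat[r][q] += s[r] * s[q]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


var_robust = matmul(matmul(bread, meat), bread)
naive_se = math.sqrt(rss / (n - 2) * bread[1][1])  # from s^2 (X'X)^{-1}
robust_se = math.sqrt(var_robust[1][1])            # from the sandwich formula

print(f"slope: {b1:.3f}, naive SE: {naive_se:.4f}, robust SE: {robust_se:.4f}")
```

With a covariate that does not vary within a cluster, as with a state-level policy indicator, the usual standard error treats the repeated observations as independent information and comes out too small, while the robust version is noticeably larger.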

2.3 Generalized Estimating Equation

Another popular method for handling clustered data is the Generalized Estimating Equation (GEE), developed by Liang and Zeger [13]. The general idea of GEE is to assume a structure for the correlation within each cluster and to use this assumption when estimating $\beta$.

Suppose we have panel data consisting of $K$ subjects, where subject $i$ ($i = 1, 2, \ldots, K$) has $n_i$ observations denoted by $Y_{ij}$ ($j = 1, 2, \ldots, n_i$). Let $\mu_i$ denote the mean outcome for subject $i$. We assume that outcomes are independent across subjects and correlated within the same subject. The relationship between $\mu_i$ and the covariates $X_i$ can be written as

$$g(\mu_i) = X_i \beta_{GEE} \tag{2.13}$$

where $g(\cdot)$ is the known link function determined by the distribution of the outcome.

We can estimate $\beta_{GEE}$ by setting the 'quasi-score' equation to zero:

$$Q(\beta_{GEE}) = \sum_{i=1}^{K} \left( \frac{\partial \mu_i}{\partial \beta} \right)' V_i^{-1} (Y_i - \mu_i) = 0 \tag{2.14}$$

where the working covariance matrix $V_i$ of $Y_i$ is

$$V_i = \frac{A_i^{1/2} R_i(\alpha) A_i^{1/2}}{\phi} \tag{2.15}$$

Here, $\phi$ is the dispersion parameter, and $A_i$ is the diagonal matrix whose $j$th diagonal element is the variance function $V(\mu_{ij})$ for subject $i$. $R_i(\alpha)$ is the working correlation matrix, which describes the correlation pattern of the repeated measures for subject $i$.

In the GEE model, the response variable uses the same link function $g$ and linear predictor $X_i \beta_{GEE}$ as in the GLM. Here we use quasi-likelihood estimation to estimate the parameters, since we no longer assume a normal distribution.
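As an illustration of what the working correlation matrix can look like, the block below writes out three commonly used structures for a subject with four repeated measures. These are standard textbook choices shown only as examples; this passage does not state which structure is adopted in the analysis, and the dimension four is an arbitrary assumption.

```latex
% Independence: repeated measures treated as uncorrelated
R_i = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

% Exchangeable: every pair of measures shares the same correlation \alpha
R_i(\alpha) = \begin{pmatrix} 1 & \alpha & \alpha & \alpha \\ \alpha & 1 & \alpha & \alpha \\ \alpha & \alpha & 1 & \alpha \\ \alpha & \alpha & \alpha & 1 \end{pmatrix}

% AR(1): correlation decays with the time lag, \mathrm{corr}(Y_{ij}, Y_{ik}) = \alpha^{|j-k|}
R_i(\alpha) = \begin{pmatrix} 1 & \alpha & \alpha^2 & \alpha^3 \\ \alpha & 1 & \alpha & \alpha^2 \\ \alpha^2 & \alpha & 1 & \alpha \\ \alpha^3 & \alpha^2 & \alpha & 1 \end{pmatrix}
```

With the independence structure, the quasi-score equation reduces to the score equation of the corresponding GLM, so the point estimates coincide and only the standard errors differ.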

Chapter 3: Data Analysis

3.1 Data

We use the county-level Area Health Resource File (AHRF) to analyze the impact of the Medicaid expansion on the health care workforce. This dataset is collected by the Health Resources and Services Administration, Bureau of Health Professions. It contains more than 6,000 variables on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics [17].

82% of rural counties are identified as Medically Underserved Areas (MUAs) [15], and there is a general downward trend in the number of health care providers in rural counties. We therefore want to study whether the Medicaid expansion has an impact on the medical workforce in rural counties, and our target population is the rural counties in the United States. We exclude non-rural counties according to the Rural-Urban Continuum Codes [14] and include 644 rural counties with codes 8 (completely rural or less than 2,500 urban population, adjacent to a metro area) and 9 (completely rural or less than 2,500 urban population, not adjacent to a metro area). Table 3.1 below lists the number of rural counties studied in each state.

State  Count    State  Count    State  Count
AK        17    ME         2    OR         5
AL        11    MI        14    PA         4
AR        13    MN        19    SC         1
CA         4    MO        30    SD        42
CO        20    MS        21    TN        16
FL         2    MT        29    TX        49
GA        22    NC        16    UT         5
IA        20    ND        37    VA        21
ID        10    NE        51    VT         3
IL        10    NM         6    WA         5
IN         5    NV         4    WI        13
KS        42    NY         1    WV        11
KY        36    OH         2    WY         4
LA         5    OK        16    Total    644

Table 3.1: The number of rural counties in each state
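The county selection described above (keeping Rural-Urban Continuum Codes 8 and 9 and counting the retained counties by state) can be sketched as follows. This is a hedged illustration only: the record layout, the field names `county`, `state`, and `rucc`, and the sample rows are hypothetical and do not come from the AHRF.

```python
RURAL_CODES = {8, 9}  # completely rural: adjacent (8) or not adjacent (9) to a metro area

def select_rural(counties):
    """Keep only the records whose Rural-Urban Continuum Code is 8 or 9."""
    return [c for c in counties if c["rucc"] in RURAL_CODES]

def count_by_state(counties):
    """Number of retained counties per state, as tabulated in Table 3.1."""
    counts = {}
    for county in counties:
        counts[county["state"]] = counts.get(county["state"], 0) + 1
    return counts

# Hypothetical records, for illustration only
sample = [
    {"county": "A", "state": "OH", "rucc": 9},
    {"county": "B", "state": "OH", "rucc": 2},
    {"county": "C", "state": "NE", "rucc": 8},
    {"county": "D", "state": "NE", "rucc": 9},
    {"county": "E", "state": "TX", "rucc": 6},
]
rural = select_rural(sample)
print(count_by_state(rural))
```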

Because Medicaid expansion is decided by individual states, the expansion date varies from state to state. Twenty-four states and Washington, D.C. implemented Medicaid expansion in January 2014, and 7 more states implemented it between April 2014 and July 2016. The remaining 19 states had not implemented it as of September 2017 (table 3.2).

States that expanded Medicaid    States that expanded Medicaid       Non-expansion States
as of January 2014 (N=25)        between April 2014 and July 2016    as of May 2017 (N=19)
                                 (N=7)

Arizona                          4/14: Michigan                      Alabama
Arkansas                         8/14: New Hampshire                 Florida
California                       1/15: Pennsylvania                  Georgia
Colorado                         2/15: Indiana                       Idaho
Connecticut                      9/15: Alaska                        Kansas
Delaware                         1/16: Montana                       Maine (1/10/2019)
District of Columbia             7/16: Louisiana                     Mississippi
Hawaii                                                               Missouri
Illinois                                                             Nebraska
Iowa                                                                 North Carolina
Kentucky                                                             Oklahoma
Maryland                                                             South Carolina
Massachusetts                                                        South Dakota
Minnesota                                                            Tennessee
Nevada                                                               Texas
New Jersey                                                           Utah
New Mexico                                                           Virginia (1/1/2019)
New York                                                             Wisconsin
North Dakota                                                         Wyoming
Ohio
Oregon
Rhode Island
Vermont
Washington
West Virginia

Table 3.2: Status of State Action on the Medicaid Expansion Decision

Our outcome variables are the density of Primary Care Physicians and the density of Nurse Practitioners, where density is defined as the number of providers relative to the size of a county's population (per 1,000 population) [11]. We use data from 2011 to 2017, for a total of 4508 county-year observations. Figure 3.1 and figure 3.2 show that the number of Primary Care Physicians in rural areas has remained roughly the same since the Medicaid expansion, while the number of Nurse Practitioners in rural areas has grown faster after the Medicaid expansion. In the following sections, I present the statistical analysis of the Medicaid expansion effect on Primary Care Physicians and Nurse Practitioners.
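The two quantities in this paragraph, the density per 1,000 population and the size of the balanced panel, are simple enough to verify by hand. The sketch below does so; the function name `density_per_1000` and the example county are hypothetical.

```python
def density_per_1000(provider_count, population):
    """Providers per 1,000 residents of a county."""
    return 1000.0 * provider_count / population

# Panel size: 644 rural counties observed in each year from 2011 to 2017
n_counties = 644
years = list(range(2011, 2018))
n_obs = n_counties * len(years)
print(n_obs)  # 4508

# Hypothetical county with 3 providers and 6,000 residents
print(density_per_1000(3, 6000))
```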

3.2 Model

I will use the difference-in-differences design to investigate the effect of Medicaid expansion on the health care workforce. We implement a linear regression model with dummy variables as follows:

\log Y_{it} = \alpha + \tau\,\mathrm{Year}_t + \gamma\,\mathrm{Policy}_{it} + \beta X_{it} + \varepsilon_{it} \qquad (3.1)

where Y_it denotes the density of Primary Care Physicians (PCP) or the density of Nurse Practitioners (NP) for county i in year t. Year_t is the time trend with constant effect τ, Policy_it is the dummy variable for Medicaid expansion, and γ is the effect of interest. X_it includes county-level controls: poverty rate, unemployment rate, and the percentage of the population that is African American. These covariates are summarized in table 3.3. There are 4508 observations (7 years of data for 644 counties), but only 4478 observations for Nurse Practitioners because 30 data points are missing in 2011.
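To make the role of the Policy dummy concrete, the following sketch builds it from state expansion dates. It is an illustration under assumptions: the start dates for the five states shown are copied from Table 3.2, but the rule for partial years (a year counts as treated as soon as expansion begins in any month, unless `require_full_year` is set) is a hypothetical choice, since the text does not state how mid-year expansions are coded.

```python
# (year, month) in which expansion took effect, for a few states from Table 3.2.
# States absent from this mapping are treated as non-expansion states.
EXPANSION_START = {
    "KY": (2014, 1),
    "MI": (2014, 4),
    "PA": (2015, 1),
    "MT": (2016, 1),
    "LA": (2016, 7),
}

def policy_dummy(state, year, require_full_year=False):
    """Return 1 if Medicaid expansion is counted as in effect for `state` in `year`."""
    start = EXPANSION_START.get(state)
    if start is None:
        return 0  # non-expansion state (for example TX in Table 3.2)
    start_year, start_month = start
    if require_full_year and start_month > 1:
        start_year += 1  # first calendar year fully covered by the expansion
    return 1 if year >= start_year else 0

years = range(2011, 2018)
for state in ("KY", "LA", "TX"):
    print(state, [policy_dummy(state, y) for y in years])
```

Under either coding rule the dummy is 0 in every year for non-expansion states, so γ in (3.1) is identified from the change within expansion states relative to the common time trend.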

Figure 3.1: Primary Care Physicians in rural counties by year

Variable                          Obs     Mean       Std.Dev.    Min    Max
PCP Count                         4508    2.707      3.575       0      43
NP Count                          4478    2.901      3.291       0      24
Population                        4508    7319.96    5772.282    71     35860
Percentage of African American    4508    5.785      14.198      0      85.7
Poverty Rate                      4508    17.201     7.331       3.6    56.7
Unemployment Rate                 4508    6.114      3.178       1.2    23.7

Table 3.3: County-level Characteristics

Figure 3.2: Distribution of Nurse Practitioners in rural counties by year

Figure 3.3: Distribution of Primary Care Physicians in rural counties

Figure 3.4: Distribution of Nurse Practitioners in rural counties

The plots (figure 3.3 and figure 3.4) show that workforce density in rural counties has a right-skewed distribution, so the basic assumption of normality seems implausible here. In addition, the observations in the dataset are clustered at the county level. As mentioned in Chapter 2, there are two general methods for conducting causal inference with panel data. I therefore use the Generalized Linear Model and the Generalized Estimating Equation with the Poisson family and log link function, and compare the results from the two methods.

3.3 Results

The size of the health care workforce depends heavily on population: more populous counties have more doctors. To put the health care workforce on a common scale, we use the density of Primary Care Physicians and the density of Nurse Practitioners as the response variables. In the statistical software, this is done with an offset term. Because we use GLM and GEE in the Poisson family with log link function, we can define the offset by

\log E[Y \mid X] = X'\beta + \log P

\log E[Y \mid X] - \log P = X'\beta

\log E\left[\frac{Y}{P} \,\middle|\, X\right] = X'\beta

The offset term log P is forced to have coefficient 1, so we can move it to the left-hand side to obtain the term Y/P, which is the health care workforce density by definition. We use the offset term because the numbers of Primary Care Physicians and Nurse Practitioners are counts that can follow the Poisson distribution, whereas the densities are not.
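The effect of the offset can be checked numerically. The sketch below fits a Poisson model with log link, an intercept, one binary covariate, and the offset log P by Newton-Raphson. It is a minimal illustration on made-up data, not the Stata fit used in this thesis; the function name and the data are hypothetical. With a single binary covariate the fitted rates must equal the group totals of Y divided by the group totals of P, which confirms that the model is a model for the density Y/P.

```python
import math

def fit_poisson_offset(y, x, pop, iters=25):
    """Newton-Raphson for log E[Y] = b0 + b1*x + log(pop).

    The offset log(pop) enters with its coefficient fixed at 1.
    """
    b0 = math.log(sum(y) / sum(pop))  # start at the pooled rate
    b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi, pi in zip(y, x, pop):
            mu = pi * math.exp(b0 + b1 * xi)
            g0 += yi - mu               # score for b0
            g1 += (yi - mu) * xi        # score for b1
            h00 += mu                   # information matrix entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical provider counts, policy indicator, and county populations
y = [2, 5, 1, 4, 6, 3]
x = [0, 0, 0, 1, 1, 1]
pop = [4000, 9000, 2500, 5000, 8000, 3500]

b0, b1 = fit_poisson_offset(y, x, pop)
print("rate without policy per 1,000:", 1000 * math.exp(b0))
print("rate with policy per 1,000:   ", 1000 * math.exp(b0 + b1))
print("rate ratio exp(b1):           ", math.exp(b1))
```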

Table 3.4 reports GLM and GEE estimates of the effects of Medicaid expansion on health care workforce outcomes. The dependent variables are Primary Care Physician density and Nurse Practitioner density, and the model also includes the Year trend and three county-level covariates: African American percentage, poverty rate, and unemployment rate. In column 1, the GEE model shows that Medicaid expansion increases Primary Care Physician density in rural areas, and column 2 shows an effect in the same direction for Nurse Practitioners. Moreover, the GEE coefficient estimates for the policy effect are statistically significant, with z values of 2.38 and 2.10. In contrast, the GLM estimates suggest that Medicaid expansion leads to lower Primary Care Physician density in rural areas (column 3) and to a substantially larger increase in the Nurse Practitioner workforce than the GEE model does (column 4), but neither GLM estimate is statistically significant.

                               (1) GEE      (2) GEE      (3) GLM      (4) GLM
                               log PCP      log NP       log PCP      log NP
                               density      density      density      density

Policy                         0.0250       0.0243       -0.0117      0.0945
                               (2.38)       (2.10)       (-0.12)      (1.20)

Year trend                     -0.00745     0.0901       -0.00658     0.0861
                               (-3.16)      (30.60)      (-0.48)      (5.95)

African American Percentage    -0.0110      -0.00416     -0.00575     -0.00651
                               (-5.87)      (-3.08)      (-1.22)      (-1.90)

Poverty rate                   -0.00244     0.0118       -0.0253      0.0192
                               (-1.35)      (6.79)       (-4.21)      (3.68)

Unemployment rate              0.00416      -0.000231    0.0108       0.00201
                               (1.58)       (-0.07)      (0.75)       (0.15)

Constant                       -0.887       -1.598       -0.542       -1.726
                               (-19.67)     (-31.17)     (-3.46)      (-10.08)

Observations                   4508         4478         4508         4478

t statistics in parentheses

Table 3.4: Medicaid expansion effect on Primary Care Physician and Nurse Practitioner using GLM and GEE
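Because the models use a log link, the Policy coefficients in Table 3.4 are changes in log density and are easier to read as percentage changes. The sketch below converts the four Policy estimates and attaches two-sided p-values to the reported statistics. It assumes a standard normal reference distribution, which is an approximation chosen for illustration; the helper names are hypothetical.

```python
import math

def percent_change(coef):
    """Percentage change in the outcome implied by a coefficient on the log scale."""
    return 100.0 * (math.exp(coef) - 1.0)

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Policy coefficients and test statistics copied from Table 3.4
estimates = {
    "GEE PCP": (0.0250, 2.38),
    "GEE NP": (0.0243, 2.10),
    "GLM PCP": (-0.0117, -0.12),
    "GLM NP": (0.0945, 1.20),
}
for name, (coef, z) in estimates.items():
    print(f"{name}: {percent_change(coef):+.2f}% change in density, "
          f"p = {two_sided_p(z):.3f}")
```

Read this way, the GEE estimates correspond to increases of roughly 2.5% in both densities, while the GLM point estimates are far less precise.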

In the above analysis, we assume the outcomes follow the Poisson distribution, which is suitable for count outcomes observed over a period of time. The Poisson distribution has the property that its mean equals its variance, that is, its dispersion parameter equals 1. However, table 3.3 shows that the means and variances of Primary Care Physicians and Nurse Practitioners are not equal: the standard deviations imply variances of about 12.78 and 10.83, against means of 2.707 and 2.901. Thus, I fit the outcomes again with the Negative Binomial distribution in the GEE model and compare the results with those from the Poisson distribution.
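The comparison of mean and variance can be reproduced directly from the summary statistics in Table 3.3. The sketch below computes the variance-to-mean ratio for both counts; a ratio near 1 would be consistent with the Poisson assumption. Note that this is a marginal check that ignores the covariates, so it only suggests overdispersion and is not a formal test.

```python
# (mean, standard deviation) of the provider counts, copied from Table 3.3
summary = {
    "PCP": (2.707, 3.575),
    "NP": (2.901, 3.291),
}

def variance_to_mean(mean, sd):
    """Ratio of the sample variance to the sample mean; about 1 under a Poisson model."""
    return sd * sd / mean

ratios = {name: variance_to_mean(mean, sd) for name, (mean, sd) in summary.items()}
for name, ratio in ratios.items():
    print(f"{name}: variance / mean = {ratio:.2f}")
```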

First, I explain how the dispersion parameter is defined in the Negative Binomial distribution. Let x denote the number of successes in a sequence of independent, identically distributed Bernoulli trials before r failures occur, and let p be the success probability of each trial. The mean is \mu = \frac{pr}{1-p} and the variance is \sigma^2 = \frac{pr}{(1-p)^2}, so that

p = \frac{\mu}{\mu + r} \qquad (3.2)

1 - p = \frac{r}{r + \mu} \qquad (3.3)

\sigma^2 = \frac{\mu^2}{r} + \mu \qquad (3.4)

Equation (3.4) gives the relationship between the mean and the variance in the Negative Binomial distribution, and we denote the dispersion parameter by \alpha = \frac{1}{r}. In Stata, the negative binomial family in the GEE command lets us set the value of the dispersion parameter ourselves. Therefore, I first obtained the dispersion parameter from a negative binomial regression (Appendix A) and then supplied the obtained value to the GEE model.
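The step from the two moments to (3.4) can be written out explicitly. The derivation below uses only the definitions of the mean and variance given above together with (3.3); it is a worked check of the algebra rather than new material.

```latex
\begin{align*}
\sigma^2 &= \frac{pr}{(1-p)^2}
          = \frac{\mu}{1-p}
          && \text{since } \mu = \frac{pr}{1-p} \\
         &= \mu \cdot \frac{r+\mu}{r}
          && \text{by (3.3)} \\
         &= \mu + \frac{\mu^2}{r}
          = \mu + \alpha\mu^2,
          && \alpha = \frac{1}{r}.
\end{align*}
```

As r grows, α tends to 0 and the variance approaches µ, which is the Poisson case.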

Table 3.5 shows that the estimated impact of the Medicaid expansion on Primary Care Physicians is smaller under the negative binomial distribution (0.0161, against 0.0250 under the Poisson distribution) and is no longer statistically significant. The estimated time trend is also smaller in magnitude (-0.00367 against -0.00745). Combined with the distribution of the outcome variables (Figure 3.3 and Figure 3.5), this indicates that there is no significant difference between the treatment group and the control group before and after the Medicaid expansion, and that for the control group the time trend did not make much difference over the years. For Nurse Practitioners, the estimated effects of policy and time in the GEE model with the negative binomial distribution are also slightly smaller than in the Poisson model (0.0225 against 0.0243 for policy, and 0.0898 against 0.0901 for the year trend).

                               (1)            (2)
                               log PCP        log NP
                               density        density

Policy                         0.0161         0.0225
                               (0.95)         (1.31)

Year trend                     -0.00367       0.0898
                               (-1.02)        (22.62)

African American Percentage    -0.00861       -0.00420
                               (-3.45)        (-2.13)

Poverty rate                   -0.00504       0.0101
                               (-1.73)        (3.76)

Unemployment rate              0.00635        0.00117
                               (1.52)         (0.26)

Constant                       -0.887         -1.568
                               (-12.97)       (-22.36)

Observations                   4508           4478

t statistics in parentheses

Table 3.5: Medicaid expansion effect on Primary Care Physician and Nurse Practitioner using GEE with Negative Binomial distribution

Figure 3.5: Distribution of PCP density by year

Figure 3.6: Distribution of NP density by year

From Stata, we also obtain a dispersion parameter of 0.31 in the Primary Care Physician model, with a χ2 value of 1297.24, and 0.21 in the Nurse Practitioner model, with a χ2 value of 673.51 (Appendix A). Both are significant, which means that the dispersion parameter differs significantly from 0, the value at which (3.4) reduces to the Poisson variance, and that the negative binomial distribution model is more appropriate than the Poisson distribution model.
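To see how strong this evidence is, the reported χ2 values can be turned into p-values. The sketch below assumes that they are likelihood-ratio statistics for the dispersion parameter lying on the boundary of its parameter space, so that the reference distribution is an equal mixture of a point mass at 0 and a chi-square with one degree of freedom. That assumption is not stated in the text and should be checked against the output in Appendix A; the function name is hypothetical.

```python
import math

def chibar2_01_pvalue(stat):
    """p-value under a 50:50 mixture of a point mass at 0 and chi-square(1).

    For one degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    """
    if stat <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

# Statistics reported in the text for the two negative binomial regressions
for name, stat in (("PCP", 1297.24), ("NP", 673.51)):
    print(f"{name}: chi2 = {stat}, p = {chibar2_01_pvalue(stat):.3g}")
```

Both p-values are far below any conventional threshold, in line with the conclusion that the Poisson variance assumption is rejected.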

Chapter 4: Conclusion

In this thesis, I conduct causal inference on the effect of Medicaid expansion on the number of Primary Care Physicians and Nurse Practitioners in rural counties. Causal inference differs from our previous quantitative research methods in that it is based on the concept of potential outcomes. This requires observing the outcomes under different treatments for the same individual at the same time, which is difficult in real life, especially in observational research on the effects of health policy. Several strategies can address this problem, such as Instrumental Variables, Regression Discontinuity, and Difference-in-Differences. Many states implemented Medicaid expansion in 2014, and other states have implemented it since. The purpose of Medicaid expansion is to enable more people to receive medical treatment, so we want to know whether there are enough medical providers to treat a significantly larger number of new enrollees. The number of medical providers in rural counties had previously shown a downward trend, and Primary Care Physicians and Nurse Practitioners serve as the entry point of the health care delivery system. Therefore, we mainly studied the impact of Medicaid expansion on the number of doctors and nurses in rural counties.

Chapter 1 reviews causal inference methods and the background of Medicaid expansion, including how causality is defined and some methods of causal inference in observational studies. Chapter 2 discusses the difference-in-differences method and the use of GLM and GEE to deal with correlation in panel data. In chapter 3, I used the AHRF dataset to analyze the number of Primary Care Physicians and Nurse Practitioners in rural counties, and included some county-level variables as covariates. In the end, we found that the GEE model with the negative binomial distribution better fits both the number of Primary Care Physicians and the number of Nurse Practitioners.

Appendix A: Stata commands and results

I list the Stata commands and output for the difference-in-differences models.

Bibliography

[1] Joshua D Angrist and Jörn-Steffen Pischke. Mostly harmless econometrics: An empiricist's companion. Princeton University Press, 2008.

[2] Edward Berkowitz. Medicare and Medicaid: The past as prologue. Health Care Financing Review, 27(2):11, 2005.

[3] Roger J Bowden and Darrell A Turkington. Instrumental variables, volume 8. Cambridge university press, 1990.

[4] Sachin Date. An Illustrated Guide to the Poisson Regression Model, 2019.

[5] Rajeev H Dehejia and Sadek Wahba. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448):1053–1062, 1999.

[6] Centers for Medicare & Medicaid Services. November 2019 Medicaid & CHIP Enrollment Data Highlights, 2019.

[7] HealthCare.gov. Federal Poverty Level, 2020.

[8] MA Hernán and JM Robins. Causal inference: What if. Boca Raton: Chapman & Hall/CRC, 2020.

[9] Adam N Hofer, Jean Marie Abraham, and Ira Moscovice. Expansion of coverage under the patient protection and affordable care act and primary care utilization. The Milbank Quarterly, 89(1):69–89, 2011.

[10] Andy (https://stats.stackexchange.com/users/26338/andy). What is difference-in-differences? Cross Validated. URL:https://stats.stackexchange.com/q/125266 (version: 2018-08-17).

[11] Primary Health Care Performance Initiative. Physician density, 2018.

[12] Alan B Krueger. Experimental estimates of education production functions. Quarterly Journal of Economics, 114(2):497–532, 1999.

[13] Kung-Yee Liang and Scott L Zeger. Longitudinal data analysis using generalized linear models. Biometrika, 73(1):13–22, 1986.

[14] U.S. Department of Agriculture. Rural-Urban Continuum Codes, 2013.

[15] National Advisory Committee on Rural Health and Human Services. The 2008 Report to the Secretary: Rural Health and Human Services Issues, 2008.

[16] Maya L Petersen and Mark J van der Laan. Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology (Cambridge, Mass.), 25(3):418, 2014.

[17] Health Resources and Services Administration. Area Health Resources Files, 2019.

[18] D.B. Rubin. Causal inference. In Penelope Peterson, Eva Baker, and Barry McGaw, editors, International Encyclopedia of Education (Third Edition), pages 66–71. Elsevier, Oxford, third edition, 2010.

[19] Donald B Rubin. Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, pages 34–58, 1978.

[20] Donald L Thistlethwaite and Donald T Campbell. Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6):309, 1960.

[21] Rebecca Willett. 10. Linear Models and Maximum Likelihood Estimation, 2017.

[22] Bruce J Winick. Legal limitations on correctional therapy and research. Minn. L. Rev., 65:331, 1980.

[23] Christopher Zorn. Comparing GEE and robust standard errors for conditionally dependent data. Political Research Quarterly, 59(3):329–341, 2006.
