
Econ 8208- Some Bayesian Econometrics

Patrick Bajari

Motivation

Next we shall begin to discuss Gibbs sampling and Markov Chain Monte Carlo (MCMC). These are methods from Bayesian statistics and econometrics. We shall emphasize the practical value of these approaches over philosophical issues in Bayesian vs. classical estimation. However, it is important to review a couple of basic results first. Please read Section 3.8 carefully.


In Bayesian analysis, we normally start by specifying a likelihood p(y | θ) and a prior p(θ). The likelihood depends on the parameter θ. Note that this is different from GMM, where we only specify moment equations. We use Bayes' Theorem to learn about the posterior distribution of the parameters conditional on the data:

$$p(\theta \mid y) = \frac{p(\theta)\,p(y \mid \theta)}{\int p(y \mid \theta)\,p(\theta)\,d\theta}$$


In many models, the posterior is a very complicated function and cannot be expressed analytically. However, it turns out that in many models of interest it is possible to simulate the posterior distribution. This is often possible even when classical methods of inference cannot be done. That is, we will draw pseudo-random deviates θ^(1), ..., θ^(S) from p(θ | y).


We will often be interested in functions g(θ). For example, g(θ) might be profit, consumer surplus, or other functions of demand parameters. The posterior expectation of g(θ) can be simulated as:

$$\frac{1}{S}\sum_s g\!\left(\theta^{(s)}\right) \;\xrightarrow{p}\; \int g(\theta)\,p(\theta \mid y)\,d\theta$$

We may also be interested in maximizing expected utility/profit (or minimizing loss, to statisticians) of some function g(a, θ), where a is an action:

$$\max_a \frac{1}{S}\sum_s g\!\left(a, \theta^{(s)}\right)$$
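A minimal sketch of both computations, assuming posterior draws are already in hand (here they are faked with a normal random number generator, and both g and the payoff function are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior draws theta^(1), ..., theta^(S); in practice these
# come from a sampler targeting p(theta | y), as developed below.
S = 10_000
theta_draws = rng.normal(loc=1.0, scale=0.5, size=S)

# Posterior expectation of g(theta): (1/S) * sum_s g(theta^(s))
g = lambda theta: theta ** 2                  # hypothetical function of interest
E_g = np.mean(g(theta_draws))

# Expected-payoff maximization over actions: max_a (1/S) * sum_s g(a, theta^(s))
g_a = lambda a, theta: -(a - theta) ** 2      # hypothetical payoff function
actions = np.linspace(0.0, 2.0, 201)          # grid of candidate actions
a_star = actions[np.argmax([np.mean(g_a(a, theta_draws)) for a in actions])]
print(E_g, a_star)
```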


Bayesian analysis requires the specification of a prior p(θ). This choice is frequently criticized as ad hoc. However, the authors note that under weak regularity conditions, asymptotically:

$$p(\theta \mid y) \approx N\!\left(\hat\theta_{MLE},\; H_{\hat\theta_{MLE}}^{-1}\right)$$

Asymptotically, the likelihood swamps the prior.


Some practical advantages of Bayesian methods:

1. Simulation of complicated models (especially with latent variables)
2. Exact inference (asymptotic approximations are not used)
3. Link to decision theory

Bayesian Analysis of Regression

An important case to study will be the normal model. We shall show that a key step in estimating many rich models is to analyze the normal regression model. Consider the following multiple regression model:

$$y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad y \sim N(X\beta, \sigma^2 I)$$

In the above, y is the vector formed from the y_i and X is the matrix with ith row x_i'.


Given the assumptions above, the likelihood for y is:

$$p(y \mid X, \beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right)$$

In Bayesian econometrics, it is convenient to work with prior distributions that are conjugate. That is, the posterior is in the same "class" of distributions as the prior. The posterior is always proportional to the prior times the likelihood. If we are clever in choosing the right density for the prior, the posterior will be analytically convenient to work with. Our concerns are mainly computational; we do not necessarily take the parametric form of the prior seriously.


It will be helpful to rewrite the likelihood function as follows. First note that by substituting $y = X\hat\beta + (y - X\hat\beta)$:

$$(y - X\beta)'(y - X\beta) = \left(y - X\hat\beta\right)'\left(y - X\hat\beta\right) + \left(\beta - \hat\beta\right)'X'X\left(\beta - \hat\beta\right) = \nu s^2 + \left(\beta - \hat\beta\right)'X'X\left(\beta - \hat\beta\right)$$

where $\hat\beta$ is the OLS estimate and $\nu = n - k$.


We can then write the likelihood as:

$$p(y \mid X, \beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{\nu s^2}{2\sigma^2} - \frac{1}{2\sigma^2}\left(\beta - \hat\beta\right)'X'X\left(\beta - \hat\beta\right)\right)$$

$$= (\sigma^2)^{-\nu/2} \exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right) \times (\sigma^2)^{-(n-\nu)/2} \exp\!\left(-\frac{1}{2\sigma^2}\left(\beta - \hat\beta\right)'X'X\left(\beta - \hat\beta\right)\right)$$

We will specify the prior in the form:

$$p(\beta, \sigma^2) = p(\sigma^2)\,p(\beta \mid \sigma^2)$$

Looking at the above, we guess that a prior of the form $p(\theta) \propto \theta^{-\lambda}\exp(-\delta/\theta)$ (an inverse gamma) will be conjugate for $\sigma^2$, and that a normal prior will be conjugate for $\beta$.


The exact form we shall use is:

$$p(\sigma^2) \propto (\sigma^2)^{-(\nu_0/2 + 1)} \exp\!\left(-\frac{\nu_0 s_0^2}{2\sigma^2}\right)$$

We can think of this as a posterior arising from a sample of size ν₀ with sufficient statistic s₀². The natural conjugate prior on β is normal:

$$p(\beta \mid \sigma^2) \propto (\sigma^2)^{-k/2} \exp\!\left(-\frac{1}{2\sigma^2}\left(\beta - \bar\beta\right)'A\left(\beta - \bar\beta\right)\right)$$

This is a normal with mean $\bar\beta$ and precision (inverse of variance) A.


A key insight is that the posterior has the form:

$$p(\beta, \sigma^2 \mid y) \propto p(y \mid X, \beta, \sigma^2)\,p(\sigma^2)\,p(\beta \mid \sigma^2)$$

After some tedious (but not deep) algebra, we find:

$$p(\beta, \sigma^2 \mid y) \propto (\sigma^2)^{-k/2} \exp\!\left(-\frac{1}{2\sigma^2}\left(\beta - \tilde\beta\right)'\left(X'X + A\right)\left(\beta - \tilde\beta\right)\right) \times (\sigma^2)^{-\left((n+\nu_0)/2 + 1\right)} \exp\!\left(-\frac{n s^2 + \nu_0 s_0^2}{2\sigma^2}\right)$$

$$\tilde\beta = \left(X'X + A\right)^{-1}\left(X'X\hat\beta + A\bar\beta\right)$$


Clearly the posterior for β conditional on y, X, σ² is normal. The posterior mean is a weighted average depending on the prior precision A and the precision from the data X'X. It also depends on the prior mean β̄ and the classical estimator β̂. As the number of observations becomes large, the posterior mean converges to the OLS estimate β̂. Convincing yourself of this derivation would be good preparation for the prelims.
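As a quick numerical illustration of this shrinkage formula, here is a numpy sketch that computes $\tilde\beta = (X'X + A)^{-1}(X'X\hat\beta + A\bar\beta)$; the simulated design and the prior settings are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated regression data (illustrative)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
beta_bar = np.zeros(k)                         # prior mean
A = 5.0 * np.eye(k)                            # prior precision

# Posterior mean: weighted average of prior mean and OLS estimate
beta_tilde = np.linalg.solve(X.T @ X + A, X.T @ X @ beta_hat + A @ beta_bar)
print(beta_hat, beta_tilde)   # beta_tilde is shrunk toward beta_bar
```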



Derivation of the posterior for β is done by "completing the square". Since this is a common operation in Bayesian econometrics, it is worth reviewing.


A key insight is that the posterior has the form:

$$p(\beta, \sigma^2 \mid y) \propto p(y \mid X, \beta, \sigma^2)\,p(\sigma^2)\,p(\beta \mid \sigma^2)$$

$$\propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right) \times (\sigma^2)^{-k/2} \exp\!\left(-\frac{1}{2\sigma^2}\left(\beta - \bar\beta\right)'A\left(\beta - \bar\beta\right)\right) \times (\sigma^2)^{-(\nu_0/2 + 1)} \exp\!\left(-\frac{\nu_0 s_0^2}{2\sigma^2}\right)$$


We guess that β is going to be normally distributed. We strive to rewrite the equation below as follows:

$$(y - X\beta)'(y - X\beta) + \left(\beta - \bar\beta\right)'A\left(\beta - \bar\beta\right) = \left(\beta - \tilde\beta\right)'V\left(\beta - \tilde\beta\right) + \text{junk not involving } \beta$$

If we can write it in this form, then $\tilde\beta$ is the posterior mean and V the precision matrix.


$$(y - X\beta)'(y - X\beta) + \left(\beta - \bar\beta\right)'A\left(\beta - \bar\beta\right) = (v - W\beta)'(v - W\beta)$$

$$v = \begin{pmatrix} y \\ U\bar\beta \end{pmatrix}, \qquad W = \begin{pmatrix} X \\ U \end{pmatrix}, \qquad A = U'U$$


Define $\tilde\beta$ as the OLS projection of v on W:

$$\tilde\beta = (W'W)^{-1}W'v = \left(X'X + A\right)^{-1}\left(X'X\hat\beta + A\bar\beta\right)$$

Then an application of the plus/minus trick shows:

$$(v - W\beta)'(v - W\beta) = \left(\beta - \tilde\beta\right)'W'W\left(\beta - \tilde\beta\right) + \text{junk not involving } \beta$$

$$= \left(\beta - \tilde\beta\right)'\left(X'X + A\right)\left(\beta - \tilde\beta\right) + \text{junk not involving } \beta$$
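The augmented-regression trick is easy to verify numerically. The sketch below (simulated data and an illustrative prior) stacks $(y, U\bar\beta)$ on $(X, U)$ and checks that OLS on the augmented system reproduces the shrinkage estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=n)

beta_bar = np.array([0.5, 0.5])     # prior mean (illustrative)
A = 4.0 * np.eye(k)                 # prior precision
U = np.linalg.cholesky(A).T         # A = U'U, with U upper triangular

# Augmented system: v = (y, U beta_bar), W = (X, U)
v = np.concatenate([y, U @ beta_bar])
W = np.vstack([X, U])

beta_tilde_aug = np.linalg.solve(W.T @ W, W.T @ v)       # OLS of v on W
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_tilde = np.linalg.solve(X.T @ X + A,
                             X.T @ X @ beta_hat + A @ beta_bar)
print(np.allclose(beta_tilde_aug, beta_tilde))           # True
```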

Markov Chain Monte Carlo Methods

We will often begin with an unnormalized posterior density π(θ) (i.e. known up to a constant). The dimension of θ may be high. For example, we may be studying a discrete choice model with 2000 consumers and 5 controls. There are 10,000 random coefficients (which we treat as parameters). There are also hyperparameters linking the random coefficients. Importance sampling and classical simulators may not be useful in this setting.


Our goal will be to construct a Markov chain F to "simulate the posterior"

We generate a sequence of R pseudo-random deviates θ₀, ..., θ_R. However, they will not be iid.

Instead we will draw them from a chain θ_{r+1} ∼ F(θ_r), starting from an arbitrary point θ₀.


We will want the invariant, or long-run, distribution of the Markov chain to be π. If this is the case, then for a function h of interest:

$$\frac{1}{R}\sum_r h(\theta_r) \;\xrightarrow{p}\; \int h(\theta)\,\pi(\theta)\,d\theta$$

Some Markov Chain Basics

Restrict attention to finite support. Let the state space be $S = \{\theta^1, \ldots, \theta^d\}$.

We consider a sequence of random variables $\theta_0, \theta_1, \theta_2, \ldots$ with

$$\Pr\!\left[\theta_{r+1} = \theta^j \mid \theta_r = \theta^i\right] = p_{ij}$$

and initial distribution π₀(θ).


The distribution of θ1 is:

$$\Pr\!\left[\theta_1 = \theta^j\right] = \sum_i \pi_0(\theta^i)\,p_{ij}, \qquad \pi_1 = \pi_0 P$$

The distribution over states is a row vector of probabilities. After r iterations:

$$\pi_r = \pi_0 P^r$$


In practice, it is useful to restrict attention to chains that don't get "stuck".

Assume that p_{ij} > 0 for all i, j. Then there exists a stationary distribution:

$$\pi = \lim_{r \to \infty} \pi_0 P^r$$

Note that π satisfies π = πP. We also call π the invariant distribution. It characterizes the long-run behavior of the chain after the impact of π₀ wears off.
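A small numerical illustration of these facts, using a made-up 3-state transition matrix with all $p_{ij} > 0$: iterating $\pi_r = \pi_0 P^r$ converges to a distribution satisfying $\pi = \pi P$.

```python
import numpy as np

# Illustrative transition matrix with p_ij > 0 for all i, j
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

pi = np.array([1.0, 0.0, 0.0])      # arbitrary initial distribution pi_0
for _ in range(1000):               # pi_r = pi_0 P^r
    pi = pi @ P

print(pi)                           # the stationary distribution
print(np.allclose(pi, pi @ P))      # pi = pi P: True
```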


It is also useful to look at time reversibility to find the invariant distribution. Roughly speaking, time reversibility means that a transition from i to j is as likely as a transition from j to i.

$$\Pr\!\left[\theta_r = \theta^j \mid \theta_{r+1} = \theta^{i_1}, \theta_{r+2} = \theta^{i_2}, \ldots, \theta_{r+s} = \theta^{i_s}\right] = \frac{\Pr\!\left[\theta_r = \theta^j, \theta_{r+1} = \theta^{i_1}, \ldots, \theta_{r+s} = \theta^{i_s}\right]}{\Pr\!\left[\theta_{r+1} = \theta^{i_1}, \ldots, \theta_{r+s} = \theta^{i_s}\right]}$$

$$= \frac{\Pr\!\left[\theta_r = \theta^j\right]\Pr\!\left[\theta_{r+1} = \theta^{i_1} \mid \theta_r = \theta^j\right]\Pr\!\left[\theta_{r+2} = \theta^{i_2}, \ldots, \theta_{r+s} = \theta^{i_s} \mid \theta_r = \theta^j, \theta_{r+1} = \theta^{i_1}\right]}{\Pr\!\left[\theta_{r+1} = \theta^{i_1}\right]\Pr\!\left[\theta_{r+2} = \theta^{i_2}, \ldots, \theta_{r+s} = \theta^{i_s} \mid \theta_{r+1} = \theta^{i_1}\right]}$$

The second equality follows from the multiply/divide trick.


However, since we have a Markov chain:

$$\Pr\!\left[\theta_{r+2} = \theta^{i_2}, \ldots, \theta_{r+s} = \theta^{i_s} \mid \theta_r = \theta^j, \theta_{r+1} = \theta^{i_1}\right] = \Pr\!\left[\theta_{r+2} = \theta^{i_2}, \ldots, \theta_{r+s} = \theta^{i_s} \mid \theta_{r+1} = \theta^{i_1}\right]$$

Hence:

$$\Pr\!\left[\theta_r = \theta^j \mid \theta_{r+1} = \theta^{i_1}, \theta_{r+2} = \theta^{i_2}, \ldots, \theta_{r+s} = \theta^{i_s}\right] = \frac{\Pr\!\left[\theta_r = \theta^j\right]\Pr\!\left[\theta_{r+1} = \theta^{i_1} \mid \theta_r = \theta^j\right]}{\Pr\!\left[\theta_{r+1} = \theta^{i_1}\right]}$$

$$p^*_{ij} = \frac{\pi_j\,p_{ji}}{\pi_i}$$

where $p^*_{ij}$ denotes the transition probability of the reverse chain.


Time reversibility means $p^*_{ij} = p_{ij}$. This in turn implies that:

$$p_{ij} = \frac{\pi_j\,p_{ji}}{\pi_i} \quad\Longleftrightarrow\quad \pi_i\,p_{ij} = \pi_j\,p_{ji}$$


The above shows that if π is the stationary distribution of a time-reversible chain, then π_i p_{ij} = π_j p_{ji}. We can also prove the result in the other direction.


Suppose that the chain is reversible with respect to ω, i.e. $\omega_i p_{ij} = \omega_j p_{ji}$ for all i, j. Then:

$$\sum_i \omega_i\,p_{ij} = \sum_i \omega_j\,p_{ji} = \omega_j \sum_i p_{ji} = \omega_j$$

Hence ωP = ω. All of this can be generalized to the continuous case.

Gibbs Sampling

A very important method for simulating the posterior distribution is called Gibbs sampling. Suppose that we have a distribution f(θ).

Let us "block" θ into p groups: (θ₁, θ₂, ..., θ_p).

Let f₁(θ₁ | θ₂, ..., θ_p) denote the conditional distribution of θ₁ given θ₂, ..., θ_p. Define f₂, ..., f_p analogously.


We will then simulate the Markov chain $\{(\theta_{1r}, \theta_{2r}, \ldots, \theta_{pr})\}_{r=1}^{R}$. Starting from an arbitrary value, simulate using the following steps:

$$\begin{aligned}
\text{step } 1:\;& \theta_{1,r+1} \sim f_1(\theta_1 \mid \theta_{2,r}, \ldots, \theta_{p,r})\\
\text{step } 2:\;& \theta_{2,r+1} \sim f_2(\theta_2 \mid \theta_{1,r+1}, \theta_{3,r}, \ldots, \theta_{p,r})\\
\text{step } 3:\;& \theta_{3,r+1} \sim f_3(\theta_3 \mid \theta_{1,r+1}, \theta_{2,r+1}, \theta_{4,r}, \ldots, \theta_{p,r})\\
&\;\;\vdots\\
\text{step } p:\;& \theta_{p,r+1} \sim f_p(\theta_p \mid \theta_{1,r+1}, \theta_{2,r+1}, \ldots, \theta_{p-1,r+1})
\end{aligned}$$

Return to step 1.

Note that you update the values of each block at each step.
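A minimal sketch of the two-block case for a toy target, a bivariate normal with correlation ρ, where both full conditionals are known normals (the target is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, R = 0.8, 5000

theta1, theta2 = 0.0, 0.0           # arbitrary starting point
draws = np.empty((R, 2))
for r in range(R):
    # step 1: theta1 | theta2 ~ N(rho * theta2, 1 - rho^2)
    theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho ** 2))
    # step 2: theta2 | theta1 ~ N(rho * theta1, 1 - rho^2), using the NEW theta1
    theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho ** 2))
    draws[r] = theta1, theta2

print(draws.mean(axis=0))           # approximately (0, 0)
print(np.corrcoef(draws.T)[0, 1])   # approximately rho
```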


Gibbs sampling is useful for a couple of reasons:

1. You can do it in many models of interest
2. It is simple to do
3. It is highly accurate in many applications of interest
4. When combined with data augmentation (which we will discuss later), it is very powerful for the analysis of latent variable problems


The posterior distribution can be simulated using Gibbs sampling. This is because the posterior is the invariant, or long-run, distribution of the chain. This can be shown using proof by induction. Consider a two-block Gibbs sampler, and let π(θ) denote the posterior.


Suppose that we run the following Gibbs sampler:

$$\theta_{2,r+1} \sim \pi_{2\mid 1}\!\left(\cdot \mid \theta_{1,r}\right), \qquad \theta_{1,r+1} \sim \pi_{1\mid 2}\!\left(\cdot \mid \theta_{2,r+1}\right)$$

Suppose by the induction hypothesis that θ_r ∼ π(·). Then:

$$\theta_{1,r} \sim \pi_1(\cdot) = \int \pi(\theta_1, \theta_2)\,d\theta_2$$

The distribution of θ_{2,r+1} is then:

$$\int \pi_{2\mid 1}(\theta_2 \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1 = \pi_2(\theta_2)$$

Hence θ_{2,r+1} is a draw from the invariant distribution. A symmetric argument shows θ_{1,r+1} is a draw from the invariant distribution.

Application to OLS

Returning to our OLS example, we can Gibbs sample this model as follows:

$$\beta \mid \sigma^2 \sim N\!\left(\tilde\beta,\; \sigma^2\left(X'X + A\right)^{-1}\right)$$

$$\sigma^2 \mid \beta \sim \frac{\nu_1 s_1^2}{\chi^2_{\nu_1}}, \qquad \nu_1 = \nu_0 + n, \qquad s_1^2 = \frac{\nu_0 s_0^2 + n s^2}{\nu_0 + n}$$

While it is a bit tedious to set up this simulator, it is not difficult technically. Also, it will typically be highly accurate and fast. The text also gives examples of SUR and other related models.
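A sketch of this Gibbs sampler on simulated data is below. The priors are illustrative, and one detail is an implementation choice on my part: $s_1^2$ is rebuilt from the residuals at the current β draw each iteration, which is one common way to implement the $\sigma^2 \mid \beta$ step.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated regression data (illustrative)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Illustrative prior hyperparameters
beta_bar, A = np.zeros(k), np.eye(k)
nu0, s0_sq = 3.0, 1.0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
nu1 = nu0 + n
sigma_sq = 1.0                                   # arbitrary starting value
R = 2000
out = np.empty((R, k + 1))
for r in range(R):
    # Draw beta | sigma^2 ~ N(beta_tilde, sigma^2 (X'X + A)^{-1})
    V = np.linalg.inv(X.T @ X + A)
    beta_tilde = V @ (X.T @ X @ beta_hat + A @ beta_bar)
    beta = rng.multivariate_normal(beta_tilde, sigma_sq * V)
    # Draw sigma^2 | beta ~ nu1 s1^2 / chi^2_nu1, s1^2 from current residuals
    resid = y - X @ beta
    s1_sq = (nu0 * s0_sq + resid @ resid) / nu1
    sigma_sq = nu1 * s1_sq / rng.chisquare(nu1)
    out[r] = np.append(beta, sigma_sq)

print(out[R // 2:].mean(axis=0))                 # posterior means after burn-in
```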

Conjugacy for Covariance Matrices

Consider y_i, an m × 1 random normal vector: y_i ∼ N(0, Σ). Let there be i = 1, ..., n observations:

$$p(y_1, \ldots, y_n \mid \Sigma) \propto \prod_i |\Sigma|^{-1/2} \exp\!\left(-\frac{1}{2}y_i'\Sigma^{-1}y_i\right) = |\Sigma|^{-n/2} \exp\!\left(-\frac{1}{2}\sum_i y_i'\Sigma^{-1}y_i\right)$$

Next note that:

$$\sum_i y_i'\Sigma^{-1}y_i = \sum_i \operatorname{tr}\!\left(y_i y_i'\Sigma^{-1}\right) = \operatorname{tr}\!\left(S\Sigma^{-1}\right), \qquad S = \sum_i y_i y_i'$$


Use the notation etr(·) for the exponential-of-the-trace function:

$$p(y_1, \ldots, y_n \mid \Sigma) \propto |\Sigma|^{-n/2}\,\operatorname{etr}\!\left(-\frac{1}{2}S\Sigma^{-1}\right)$$

This suggests a natural conjugate form as follows:

$$p(\Sigma \mid \nu_0, V_0) \propto |\Sigma|^{-(\nu_0 + m + 1)/2}\,\operatorname{etr}\!\left(-\frac{1}{2}V_0\Sigma^{-1}\right)$$

This is an inverted Wishart distribution.


You can interpret V₀ as the location of the prior (i.e. your prior beliefs about the value of Σ).

The term ν₀ controls the spread of the distribution.

SUR Model

Next consider the Seemingly Unrelated Regression (SUR) model. This is a set of m regressions that are related through correlated error terms:

$$y_i = X_i\beta_i + \varepsilon_i, \qquad i = 1, \ldots, m, \qquad k = 1, \ldots, n$$

In an unfortunate notational choice, k indexes the observations.


For all k, (ε_{k,1}, ε_{k,2}, ..., ε_{k,m}) ∼ N(0, Σ). The SUR model is a well-known framework in simultaneous equations. It specifies that the m equations are related through a set of correlated error terms. However, they are not connected across k.


Stack equations as follows:

y = X β + ε

$$\varepsilon \sim N(0, \Sigma \otimes I_n), \qquad \varepsilon' = (\varepsilon_1', \ldots, \varepsilon_m'), \qquad y' = (y_1', \ldots, y_m')$$

$$X = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & X_m \end{pmatrix}, \qquad \beta' = \left(\beta_1', \ldots, \beta_m'\right)$$


We will work with conditionally conjugate priors

$$p(\beta, \Sigma) = p(\beta)\,p(\Sigma)$$

$$\beta \sim N\!\left(\bar\beta, A^{-1}\right), \qquad \Sigma \sim IW(\nu_0, V_0)$$


We use a standard trick from the theory of GLS: multiply by a transformation of the variance matrix that makes our system iid.

Consider the decomposition Σ = U′U (e.g. from a Cholesky factorization). Multiply both sides of our linear equation y = Xβ + ε by $\left(U^{-1}\right)' \otimes I_n$:

$$\tilde y = \tilde X\beta + \tilde\varepsilon$$

$$\tilde y = \left(\left(U^{-1}\right)' \otimes I_n\right)y, \qquad \tilde X = \left(\left(U^{-1}\right)' \otimes I_n\right)X$$

As in the theory of GLS, we are now homoskedastic.


Arguing as in our linear model, we can show that:

$$\beta \mid \Sigma, y, X \sim N\!\left(\tilde\beta,\; \left(\tilde X'\tilde X + A\right)^{-1}\right)$$

$$\tilde\beta = \left(\tilde X'\tilde X + A\right)^{-1}\left(\tilde X'\tilde X\,\hat\beta_{GLS} + A\bar\beta\right)$$

where $\hat\beta_{GLS}$ is the GLS estimator. Note that once again we have a Bayesian "shrinkage" estimator.


The posterior of Σ | β is inverted Wishart:

$$\Sigma \mid \beta \sim IW\!\left(\nu_0 + n,\; S + V_0\right)$$

$$S = E'E, \qquad E = (\varepsilon_1, \ldots, \varepsilon_m), \qquad \varepsilon_i = y_i - X_i\beta_i$$

That is, the posterior involves a sum of squared errors plus V₀, where you centered the prior.


To Gibbs sample, we draw a pseudo random sequence

(Σ₀, β₀), (Σ₁, β₁), ..., (Σ_R, β_R). We use the following Markov chain:

1. Start with an arbitrary point of support, (Σ₀, β₀). Then for r < R:
2. Draw $\beta_{r+1} \mid \Sigma_r \sim N\!\left(\tilde\beta,\; \left(\tilde X'\tilde X + A\right)^{-1}\right)$
3. Draw $\Sigma_{r+1} \mid \beta_{r+1} \sim IW(\nu_0 + n,\; S + V_0)$
4. Return to step 2.
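A compact sketch of this chain for a simulated two-equation system is below, using scipy's invwishart for step 3. The data-generating process and prior settings are illustrative assumptions:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(5)

# Two-equation SUR with simulated data (illustrative)
n, m, k = 100, 2, 2                              # obs, equations, regressors/eq.
X_blocks = [rng.normal(size=(n, k)) for _ in range(m)]
errs = rng.multivariate_normal(np.zeros(m), [[1.0, 0.5], [0.5, 1.0]], size=n)
y_blocks = [X_blocks[i] @ np.ones(k) + errs[:, i] for i in range(m)]

# Stack: y = (y_1', ..., y_m')', X block diagonal, eps ~ N(0, Sigma kron I_n)
X = np.block([[X_blocks[0], np.zeros((n, k))],
              [np.zeros((n, k)), X_blocks[1]]])
y = np.concatenate(y_blocks)

beta_bar, A = np.zeros(m * k), 0.1 * np.eye(m * k)   # beta ~ N(beta_bar, A^{-1})
nu0, V0 = m + 2, np.eye(m)                           # Sigma ~ IW(nu0, V0)

Sigma = np.eye(m)                                    # arbitrary starting value
for r in range(1000):
    # GLS transform: premultiply by (U^{-1})' kron I_n, where Sigma = U'U
    U = np.linalg.cholesky(Sigma).T
    T = np.kron(np.linalg.inv(U).T, np.eye(n))
    y_t, X_t = T @ y, T @ X
    # Step 2: beta | Sigma ~ N(beta_tilde, (X~'X~ + A)^{-1})
    V = np.linalg.inv(X_t.T @ X_t + A)
    beta = rng.multivariate_normal(V @ (X_t.T @ y_t + A @ beta_bar), V)
    # Step 3: Sigma | beta ~ IW(nu0 + n, S + V0), with S = E'E from residuals
    E = np.column_stack([y_blocks[i] - X_blocks[i] @ beta[i*k:(i+1)*k]
                         for i in range(m)])
    Sigma = invwishart.rvs(df=nu0 + n, scale=E.T @ E + V0, random_state=rng)
```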

Hierarchical Models

In a Bayesian framework, it is common to work with hierarchical models. As we shall show, this allows for extremely rich modeling and a simple way to form Gibbs samplers in many applications. We next explain how these models can be analyzed by piecing together a graph. The authors show that you can conceptualize simulating from the model as a "Directed Acyclic Graph" (DAG):

$$p(\theta) \to p(y \mid \theta)$$

$$\theta \to y$$


In a hierarchical model, we use a sequence of conditional distributions for θ. For example, with two conditional distributions:

$$p(\theta_2) \to p(\theta_1 \mid \theta_2) \to p(y \mid \theta_1)$$

$$\theta_2 \to \theta_1 \to y$$

Or working with joint and marginal distributions:

$$p(\theta_1) = \int p(\theta_1 \mid \theta_2)\,p(\theta_2)\,d\theta_2$$


θ₂ and y are independent after conditioning on θ₁:

$$p(\theta_1, \theta_2, y) = p(\theta_2)\,p(\theta_1 \mid \theta_2)\,p(y \mid \theta_1)$$

Notice that there is no term that involves both θ₂ and y. This suggests the following Gibbs sampler:

$$\theta_2 \mid \theta_1, \qquad \theta_1 \mid \theta_2, y$$


The authors note the following other types of structures in hierarchical models:

$$\theta_2 \to \theta_1, \qquad \theta_3 \to \theta_1, \qquad \theta_3 \to \theta_2$$


There are three rules for "reading" the dependence. A node depends on:

1. any node it points to
2. any node that points to it
3. any node that points to the node directly "downstream"

Hierarchical Linear Model

We illustrate these principles using a regression model with random coefficients:

$$y_i = X_i\beta_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{iid } N\!\left(0, \sigma_i^2 I_{n_i}\right)$$

Note that both β_i and σ_i² are specific to the unit i.

$$\beta_i = \Delta' z_i + v_i, \qquad v_i \sim \text{iid } N\!\left(0, V_\beta\right), \qquad \sigma_i^2 \sim \frac{\nu_i s_{0i}^2}{\chi^2_{\nu_i}}$$

Here z_i is a set of covariates which could influence the β_i in the above equation. We model our prior equation by equation on σ_i².


Note that this model is hierarchical in the sense that β_i depends on V_β and Δ. Also, σ_i² depends on ν_i and s_{0i}². The directed graph for this model is on page 71 of the text.


To finish specifying the model, we must give priors for the hyperparameters V_β and Δ:

$$V_\beta \sim IW(\nu, V), \qquad \operatorname{vec}(\Delta) \sim N\!\left(\operatorname{vec}(\bar\Delta),\; V_\beta \otimes A^{-1}\right)$$


To summarize, the model can be written as the following set of conditional distributions:

$$y_i \mid X_i, \beta_i, \sigma_i^2$$
$$\beta_i \mid z_i, \Delta, V_\beta$$
$$\sigma_i^2 \mid \nu_i, s_{0i}^2$$
$$V_\beta \mid \nu, V$$
$$\Delta \mid V_\beta, \bar\Delta, A$$

This can be expressed as the directed graph on page 71.


This generates the following Gibbs sampler:

$$\beta_i \mid y_i, X_i, \Delta, z_i, V_\beta, \sigma_i^2$$
$$\sigma_i^2 \mid y_i, X_i, \beta_i, \nu_i, s_{0i}^2$$
$$V_\beta \mid \{\beta_i\}, \nu, V, Z, \Delta, A$$
$$\Delta \mid \{\beta_i\}, V_\beta, Z, A, \bar\Delta$$

Note that most of these will be normally distributed. The variance parameters will have inverse Wishart/inverse gamma distributions.

Data Augmentation

Next we explore Gibbs sampling and data augmentation. Essentially, this is an approach to handling latent variables. As we shall show, it is an extremely powerful technique in practice. The authors illustrate it using a binary probit example.


Consider the binary probit model:

$$z_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)$$

$$y_i = \begin{cases} 0 & \text{if } z_i < 0 \\ 1 & \text{otherwise} \end{cases}$$

We observe (yi , Xi ) while zi is latent


We use a normal prior for β: β ∼ N(0, A⁻¹). We treat the parameter vector as θ = (β, z), where z collects all of the latent z_i. The directed graph for this model is:

$$\beta \to z \to y$$


The posterior distribution can be simulated using the Gibbs sampler:

$$z \mid \beta, X, y$$
$$\beta \mid z, X$$

z_i is distributed as a truncated normal with "mean" parameter x_i'β and standard deviation 1.

z_i is truncated above at 0 if y_i = 0 and below at 0 if y_i = 1. The GHK simulator of Chapter 2 can be used to simulate the truncated normal.


The posterior distribution for β satisfies:

$$\beta \mid z, X \sim N\!\left(\tilde\beta,\; \left(X'X + A\right)^{-1}\right), \qquad \tilde\beta = \left(X'X + A\right)^{-1}\left(X'z + A\bar\beta\right)$$
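Putting the two conditionals together gives the data-augmentation Gibbs sampler for the probit. The sketch below uses scipy's truncnorm for the z | β step (in place of GHK) on simulated data; the prior settings are illustrative:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(6)

# Simulated binary probit data (illustrative)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

A = 0.01 * np.eye(k)                  # prior precision: beta ~ N(0, A^{-1})
V = np.linalg.inv(X.T @ X + A)
beta = np.zeros(k)                    # arbitrary starting point

R = 2000
draws = np.empty((R, k))
for r in range(R):
    # z_i | beta: truncated normal, mean x_i'beta, sd 1; truncated below at 0
    # if y_i = 1 and above at 0 if y_i = 0 (bounds standardized for truncnorm)
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    # beta | z ~ N((X'X + A)^{-1} X'z, (X'X + A)^{-1})  (prior mean is zero)
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    draws[r] = beta

print(draws[R // 2:].mean(axis=0))    # compare with beta_true
```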

Semiparametric Bayes

The models that we have talked about thus far rely on fairly strict parametric assumptions. We can weaken these assumptions in some cases through the use of mixture distributions. A mixture of normal distributions can approximate an arbitrary distribution quite well. In practice, a handful of mixture components is often adequate.


The basic mixture model can be written as follows:

$$y_i \sim N\!\left(\mu_{ind_i}, \Sigma_{ind_i}\right), \qquad ind_i \sim \text{Multinomial}(pvec)$$

In the above, y_i is a p-dimensional vector and pvec is the vector of probabilities determining which normal component is drawn.


A set of convenient conditionally conjugate priors is:

$$pvec \sim \text{Dirichlet}(\alpha)$$
$$\mu_k \sim N\!\left(\bar\mu,\; \Sigma_k\,a^{-1}\right), \qquad k = 1, \ldots, K$$
$$\Sigma_k \sim IW(\nu, V)$$

The Dirichlet density can be written as:

$$f(x_1, \ldots, x_K; \alpha_1, \ldots, \alpha_K) = \frac{1}{B(\alpha)}\prod_{i=1}^{K} x_i^{\alpha_i - 1}, \qquad B(\alpha) = \frac{\prod_{i=1}^{K}\Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i=1}^{K}\alpha_i\right)}$$


One convenient property to note is that:

$$E[x_i] = \frac{\alpha_i}{\sum_{i=1}^{K}\alpha_i}$$

Clearly, this can be used as part of a hierarchical model. For example, you could put mixtures on the random coefficients. One paper we will probably discuss uses a mixture of normals on the random coefficients in a multinomial probit. This results in a semiparametric distribution of random coefficients. Also, you could form a semiparametric error term in this fashion. The text provides the Gibbs samplers for this hierarchical part of the model. The text notes that there are identification issues, however.
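A quick check of the mean formula with numpy's Dirichlet generator (the α values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = np.array([2.0, 5.0, 3.0])            # arbitrary Dirichlet parameters

draws = rng.dirichlet(alpha, size=100_000)   # draws of pvec
print(draws.mean(axis=0))                    # ~ alpha / alpha.sum()
print(alpha / alpha.sum())                   # [0.2, 0.5, 0.3]
```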

Metropolis Algorithm

There are many problems in which the distributions may be more complicated than a simple normal, Wishart, or other form for which random number generators are readily available. We will produce a time-reversible chain whose invariant distribution is the posterior. The chain will also be fairly simple to simulate.


Here is a discrete version of the algorithm. Let π denote the posterior.

1. Start in state i: θ₀ = θⁱ.
2. Draw a candidate state j with probability q_{ij} (a multinomial draw).
3. Compute $\alpha = \min\!\left\{1, \frac{\pi_j q_{ji}}{\pi_i q_{ij}}\right\}$.
4. With probability α, set θ₁ = θʲ (move); otherwise θ₁ = θⁱ (don't move).
5. Repeat.
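A minimal sketch of the discrete algorithm, targeting a made-up three-state π with a uniform (symmetric) proposal q:

```python
import numpy as np

rng = np.random.default_rng(8)

pi = np.array([0.1, 0.3, 0.6])      # target distribution (known up to scale)
d = len(pi)
Q = np.full((d, d), 1.0 / d)        # proposal probabilities q_ij

R, state = 100_000, 0               # start in an arbitrary state
counts = np.zeros(d)
for r in range(R):
    j = rng.choice(d, p=Q[state])   # draw candidate state j with prob q_ij
    # alpha = min{1, pi_j q_ji / (pi_i q_ij)}
    alpha = min(1.0, pi[j] * Q[j, state] / (pi[state] * Q[state, j]))
    if rng.random() < alpha:        # move with probability alpha
        state = j
    counts[state] += 1

print(counts / R)                   # approximately pi
```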


We wish to show that the algorithm is time reversible with respect to π

This means π_i p_{ij} = π_j p_{ji}, where p_{ij} is the probability of transitioning from i to j. This can be proven as follows:

$$\pi_i\,p_{ij} = \pi_i q_{ij}\min\!\left\{1, \frac{\pi_j q_{ji}}{\pi_i q_{ij}}\right\} = \min\{\pi_i q_{ij},\, \pi_j q_{ji}\}$$

$$\pi_j\,p_{ji} = \pi_j q_{ji}\min\!\left\{1, \frac{\pi_i q_{ij}}{\pi_j q_{ji}}\right\} = \min\{\pi_i q_{ij},\, \pi_j q_{ji}\}$$


You can extend these ideas to continuous distributions. Let π once again be the posterior, known up to a factor of proportionality. The version with normal proposals works as follows:

1. Start with θ₀.
2. Draw $\tilde\theta = \theta_0 + \varepsilon$, $\varepsilon \sim N(0, s^2\Sigma)$.
3. Compute $\alpha = \min\!\left\{1, \frac{\pi(\tilde\theta)}{\pi(\theta_0)}\right\}$.
4. With probability α, set $\theta_1 = \tilde\theta$ (move); otherwise $\theta_1 = \theta_0$ (stay).
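A sketch of this random-walk version, targeting an illustrative standard normal kernel and working on the log scale for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(9)

# Unnormalized target pi(theta): a standard normal kernel (illustrative)
log_pi = lambda th: -0.5 * th @ th

s, Sigma = 0.5, np.eye(2)           # step scale and proposal covariance
theta = np.zeros(2)                 # arbitrary starting point

R, accepts = 50_000, 0
draws = np.empty((R, 2))
for r in range(R):
    # Candidate: theta~ = theta_0 + eps, eps ~ N(0, s^2 Sigma)
    cand = theta + rng.multivariate_normal(np.zeros(2), s ** 2 * Sigma)
    # Accept with prob alpha = min{1, pi(theta~)/pi(theta_0)} (log scale)
    if np.log(rng.random()) < log_pi(cand) - log_pi(theta):
        theta, accepts = cand, accepts + 1
    draws[r] = theta

print(accepts / R)                  # tune s so this is roughly .3-.5
```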


It is useful to start with a guess at the MLE and use the variance matrix estimate to form Σ. Many like to tune s so that the acceptance rate is about .3-.5. The text describes more elaborate procedures. The text shows how to use this in a random coefficients logit model. The chain may exhibit high autocorrelation, inhibiting convergence. The authors use "Metropolis within Gibbs" to study the random coefficients logit.

6 Target Marketing

• In "The Value of Purchase History Data in Target Marketing," Rossi et al. attempt to estimate household-level preference parameters.

• This is of interest as a marketing problem.

• The CMI checkout coupon uses purchase information to customize coupons to a particular household.

• In principle, the entire purchase history (from consumer loyalty cards) could be used to customize coupons (and hence prices).

• If a household-level preference parameter can be forecast with high precision, this is essentially first-degree price discrimination!

• Even with short purchase histories, they find that profits are increased 2.5-fold through the use of purchase data compared to blanket couponing strategies.

• Even one observation can boost profits from couponing by 50%.

• This application is of interest to economists as well.

• The methods in this paper allow us to account for consumer heterogeneity in a very rich manner.

• This might be useful for examining the distribution of welfare consequences of a policy intervention (e.g. a merger or market regulation).

• Beyond that, these methods demonstrate the power of Bayesian methods in latent variable problems.

7 Random Coefficients Model

• Multinomial probit with panel data on household-level choices:

$$y_{h,t} = X_{h,t}\beta_h + \varepsilon_{h,t}, \qquad \varepsilon_{h,t} \sim N(0, \Lambda)$$

$$\beta_h = \Delta z_h + v_h, \qquad v_h \sim N(0, V_\beta)$$

• Households h = 1, ..., H and time t = 1, ..., T

• X_{h,t} are covariates and z_h are demographics

• Note that the household-specific random coefficients β_h remain fixed over time

• I_{h,t} is the observed choice

• The posterior distributions are derived in Appendix A.

• Formally, the derivations are very close to our multinomial probit model above.

• Gibbs sampling is used to simulate the posterior distribution of Λ, Δ, V_β

8 Predictive Distributions

• The authors wish to give different coupons to different households.

• A rational (Bayesian) decision maker would form her beliefs about household h's preference parameters given her posterior about the model parameters.

• This will involve, as we show below, forming a predictive distribution for β_h given the econometrician's information set.

• As a first case, suppose that the econometrician only knew z_h, the demographics of household h.

• From our model, p(β_h | z_h, Δ, V_β) is N(Δz_h, V_β).

• Given the posterior p(Δ, V_β | Data), the econometrician's predictive distribution for β_h is:

$$p(\beta_h \mid z_h, \text{Data}) = \int p(\beta_h \mid z_h, \Delta, V_\beta)\,p(\Delta, V_\beta \mid \text{Data})\,d(\Delta, V_\beta)$$

• We can simulate p(β_h | z_h, Data) using the Gibbs sampling output, given our posterior simulations Δ^(s), V_β^(s), s = 1, ..., S:

$$\frac{1}{S}\sum_s p\!\left(\beta_h \mid z_h, \Delta^{(s)}, V_\beta^{(s)}\right)$$

• We could draw random β_h from p(β_h | z_h, Data).

• For each Δ^(s), V_β^(s), draw β_h^(s) from p(β_h | z_h, Δ^(s), V_β^(s)).

• Given β_h^(s), s = 1, ..., S, we could then simulate purchase probabilities.

• Draw ε_{h,t}^(s) from ε_{h,t} ∼ N(0, Λ^(s)).

• The posterior purchase probability for j, given X_{h,t} and z_h, is:

$$\frac{1}{S}\sum_s \mathbf{1}\!\left\{X_{jht}\beta_h^{(s)} + \varepsilon_{jht}^{(s)} > X_{j'ht}\beta_h^{(s)} + \varepsilon_{j'ht}^{(s)} \;\text{ for all } j' \neq j\right\}$$
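• A sketch of this simulation is below, with stand-in posterior draws Δ^(s), V^(s), Λ^(s); in the paper these come from the Gibbs sampler, while here the dimensions, brand attributes, and posterior draws are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(10)

# Stand-in posterior draws Delta^(s), V_beta^(s), Lambda^(s); in the paper
# these come from the Gibbs sampler for the random coefficients probit.
S, J, kz, kx = 1000, 3, 2, 3                  # draws, brands, dim(z), dim(X)
Delta_s = 0.1 * rng.normal(size=(S, kx, kz))
V_s = np.stack([np.eye(kx)] * S)
Lambda_s = np.stack([np.eye(J)] * S)

z_h = np.array([1.0, 0.5])                    # household h's demographics
X_ht = rng.normal(size=(J, kx))               # brand attributes (hypothetical)

probs = np.zeros(J)
for s in range(S):
    # Draw beta_h^(s) from p(beta_h | z_h, Delta^(s), V^(s)) = N(Delta z_h, V)
    beta_h = rng.multivariate_normal(Delta_s[s] @ z_h, V_s[s])
    # Draw eps_ht^(s) ~ N(0, Lambda^(s)); record the utility-maximizing brand
    eps = rng.multivariate_normal(np.zeros(J), Lambda_s[s])
    probs[np.argmax(X_ht @ beta_h + eps)] += 1

print(probs / S)   # posterior purchase probabilities for each brand j
```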

• This would allow us to simulate the purchase response to different couponing strategies for a specific household h.

• The paper runs through different couponing strategies given different information sets (e.g. full or choice-only information sets).

• The key ideas are similar: form a predictive distribution for h's preferences and simulate purchase behavior in an analogous fashion.

• In the case of a full purchase information history, we could use the raw Gibbs output, since the Markov chain will simulate β_h^(s), s = 1, ..., S.

• This could then be used to simulate choice behavior as in the example above (given draws of ε_{h,t}^(s)).

9 Data

• AC Nielsen scanner panel data for tuna in Springfield, Missouri.

• 400 households, 1.5 years, 1-61 purchases.

• Brands and covariates are in Table 2.

• Demographics are in Table 3.

• Table 4: delta coefficients.

• Poorer people prefer private label.

• Goodness of fit is moderate for the demographic coefficients.

• Figures 1 and 2: household-level coefficient estimates with different information sets.

• Table 5: returns to different marketing strategies.

• Bottom line: you gain .5 to 1.0 cents per customer through better estimates.

• With a lot of customers, this could be quite profitable.