Affine Term Structure Models I: Term-Premia and Stochastic

Pierre Collin-Dufresne UC Berkeley

Lectures given at Copenhagen Business School June 2004

Contents

1 Why are Dynamic Term Structure Models useful?

2 Definitions

3 Term structure in a deterministic world

4 Traditional Expectations Hypothesis

5 Short rate models

5.1 One-factor models

5.1.1 One-factor Vasicek model (PDE approach)

5.1.2 One-factor Cox-Ingersoll and Ross (CIR) Model

5.1.3 One-factor affine models

5.1.4 Interpretation of the Market price of Risk: equilibrium model

5.1.5 Specification of risk-premia and predictability in returns

5.1.6 Fitting the Term structure in a one-factor Gaussian model

5.1.7 Shortcomings of one-factor short rate models

5.1.8 Empirical Evidence

5.2 Multi-factor short-rate models

5.2.1 Simple Generalization: Independent state variables

5.2.2 Affine Term Structure Models - Duffie and Kan (DK 1996)

5.2.3 Tractability of Affine framework

5.2.4 Problems with Latent Variable specification

5.2.5 Why rotate from latent variables to observables?

5.2.6 Example of Identification: 2-factor Gaussian Model

5.2.7 Model-Insensitive Estimation of the State Variables

5.2.8 Canonical Representation and Maximality: the A1(3) model

5.3 Unspanned Stochastic volatility

5.3.1 Empirical Evidence

5.3.2 USV Affine Models

5.3.3 Maximal A1(3) model with USV

5.3.4 Maximal A1(4) model with USV

5.3.5 Specification of Risk-Premia

5.4 Estimation of Affine models with and without USV

6 Recent Developments

1 Why are Dynamic Term Structure Models useful?

• Macroeconomics: The Term Structure (TS) is a major indicator of economic activity. Models can be used to learn about and forecast the macroeconomy, and to understand and help guide monetary policy.
• The TS affects the valuation of all assets (discounting). Dynamic models are useful to value TS derivatives and to manage risk (i.e., volatility). They also help us understand the risk-return trade-off of bonds (term premia).
• Interest rate risk affects all economic agents:
1. Households: e.g., mortgages (prepayment options).
2. Firms: financing and investment decisions.
• Interest rate derivatives are the biggest component of the derivatives market:

OTC (trillion $)   Swaps   Options   Forwards   Total
Interest rate      58.5    12.3      8.6        79.4
Currency           3.9     3.1       12.4       19.4
Other                                           5.1

Source: Swaps Monitor 2000

• Most trading occurs over the counter (85% of the outstanding notional).

Derivatives (trillion $)                1998   2002
Over the counter: notional amount       80     142
Over the counter: gross market value    3      6
Exchange traded: notional amount        14     24

Source: Bank of International Settlements

• Volume in exchange-traded options and futures is mainly in equity and interest rate contracts, followed by FX and commodity contracts.

2 Definitions

• $P(t,T)$ is the price at time $t$ of a zero-coupon bond that pays one dollar at maturity $T$ ($P(T,T) = 1$).

• The continuously compounded yield $Y(t,T)$ is defined by $P(t,T) = e^{-Y(t,T)(T-t)}$.

• The term structure of interest rates is the mapping $(T-t) \mapsto Y(t,T)$.

• Define the time-$t$ forward rate $f(t,T_1,T_2)$ for borrowing/lending between $T_1$ and $T_2$ in the future by:

$$P(t,T_1)\,e^{-f(t,T_1,T_2)(T_2-T_1)} = P(t,T_2).$$

• The instantaneous forward rate $f(t,T) = \lim_{T_2 \to T} f(t,T,T_2)$ is given by $f(t,T) = -\partial_2 \ln P(t,T) = Y(t,T) + (T-t)\,\partial_2 Y(t,T)$ (assuming differentiability of the term structure). Note also the relation

$$P(t,T) = e^{-\int_t^T f(t,s)\,ds}. \qquad (1)$$

• The instantaneous short rate is $r(t) = f(t,t) = Y(t,t)$.
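The definitions above can be checked numerically. The sketch below works at $t = 0$ and assumes a hypothetical discount curve $P(0,T) = e^{-(0.02T + 0.001T^2)}$, which is not taken from the text. It computes $Y(0,T)$ and a discrete forward rate, obtains the instantaneous forward by a central difference, and then rebuilds $P(0,T)$ from relation (1).

```python
import math

# Hypothetical discount curve (illustration only): Y(0, T) = 0.02 + 0.001 T,
# so P(0, T) = exp(-(0.02 T + 0.001 T^2)).
def discount(T):
    return math.exp(-(0.02 * T + 0.001 * T * T))

def zero_yield(P, T):
    """Continuously compounded yield Y(0, T) from P(0, T) = exp(-Y T)."""
    return -math.log(P(T)) / T

def forward_rate(P, T1, T2):
    """Forward rate f(0, T1, T2) from P(0, T1) exp(-f (T2 - T1)) = P(0, T2)."""
    return math.log(P(T1) / P(T2)) / (T2 - T1)

def inst_forward(P, T, h=1e-5):
    """Instantaneous forward f(0, T) = -d ln P(0, T) / dT (central difference)."""
    return -(math.log(P(T + h)) - math.log(P(T - h))) / (2 * h)

def price_from_forwards(f, T, n=1000):
    """Rebuild P(0, T) = exp(-int_0^T f(0, s) ds) with the midpoint rule."""
    h = T / n
    return math.exp(-h * sum(f((k + 0.5) * h) for k in range(n)))

T = 5.0
print(zero_yield(discount, T))        # Y(0, 5)
print(inst_forward(discount, T))      # f(0, 5) = Y + T dY/dT
print(price_from_forwards(lambda s: inst_forward(discount, s), T), discount(T))
```

For this curve the exact values are $Y(0,5) = 0.025$ and $f(0,5) = 0.02 + 0.002 \cdot 5 = 0.03$. The finite-difference value should match the latter up to rounding.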

3 Term structure in a deterministic world

In a deterministic world, no-arbitrage implies that all bonds earn the same return over any finite period: $1/P(t,T_1) = P(T_1,T_2)/P(t,T_2)$.

Taking the limit $T_1 \to t$, it also follows that all bonds earn the same instantaneous rate of return:¹

$$\frac{dP(t,T)}{P(t,T)} = r(t)\,dt \quad \forall T,$$

or equivalently

$$P(t,T) = e^{-\int_t^T r(s)\,ds}.$$

Further, we have:

$$f(t,s) = r(s) \qquad (2)$$

$$Y(t,T) = \frac{1}{T-t}\int_t^T r(s)\,ds \qquad (3)$$

$$f(t,T_1,T_2) = Y(T_1,T_2) \qquad (4)$$

If we specify the simple mean-reverting short-rate model $dr(t) = \kappa(\theta - r(t))\,dt$, then $r(s) = e^{-\kappa(s-t)}r(t) + (1 - e^{-\kappa(s-t)})\theta$ and

$$P(t,T) = e^{-\int_t^T r(s)\,ds} = e^{-r_t B_\kappa(T-t) - \theta(T-t-B_\kappa(T-t))}, \qquad (5)$$

where we have defined

$$B_\kappa(\tau) = \int_0^\tau e^{-\kappa s}\,ds = \frac{1 - e^{-\kappa\tau}}{\kappa}.$$

¹ To prove it, take the log of both sides and differentiate with respect to $T_2$. This gives $\ln P(t,T_1) = -\int_0^{T_1} r(s)\,ds + \psi(t)$. Solve for $\psi(t)$.

We can also prove this by the ‘PDE’ approach:

In particular, assume $P \equiv P(t, r(t))$. Then $dP(t) = P_r\,dr + P_t\,dt = (P_r\,\kappa(\theta - r) + P_t)\,dt$. By no arbitrage, $dP = rP\,dt$. Thus $P$ satisfies

$$P_r\,\kappa(\theta - r) + P_t = rP$$

subject to $P(T, r) = 1$. If we guess that the bond price is exponential-affine in $r(t)$, we obtain the same result as above.

For this ‘one-factor’ model the term structure is:

$$Y(t,T) = \theta - (\theta - r_t)\,\frac{B_\kappa(T-t)}{T-t}. \qquad (6)$$

The term structure is increasing iff $r_t < \theta$, because $B_\kappa(\tau)/\tau$ is decreasing in $\tau$.
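The sketch below gives a quick numerical check of (5) and (6), using hypothetical parameter values. It compares the closed-form price with a direct midpoint-rule integral of the deterministic path $r(s)$. It then confirms that the yield curve (6) increases when $r_t < \theta$, decreases when $r_t > \theta$, and tends to $\theta$ at long maturities.

```python
import math

def B_kappa(kappa, tau):
    """B_kappa(tau) = (1 - exp(-kappa tau)) / kappa."""
    return (1.0 - math.exp(-kappa * tau)) / kappa

def price_closed_form(r_t, kappa, theta, tau):
    """Equation (5): P = exp(-r_t B - theta (tau - B))."""
    b = B_kappa(kappa, tau)
    return math.exp(-r_t * b - theta * (tau - b))

def price_by_integration(r_t, kappa, theta, tau, n=2000):
    """exp(-int r) along r = e^{-kappa s} r_t + (1 - e^{-kappa s}) theta (midpoint rule, s = time since t)."""
    h = tau / n
    total = 0.0
    for k in range(n):
        decay = math.exp(-kappa * (k + 0.5) * h)
        total += decay * r_t + (1.0 - decay) * theta
    return math.exp(-total * h)

def yield_curve(r_t, kappa, theta, tau):
    """Equation (6): Y = theta - (theta - r_t) B_kappa(tau) / tau."""
    return theta - (theta - r_t) * B_kappa(kappa, tau) / tau

maturities = [0.5, 1, 2, 5, 10, 30]
print([round(yield_curve(0.02, 0.5, 0.05, m), 4) for m in maturities])  # r_t < theta
```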

The long rate is $\lim_{T\to\infty} Y(t,T) = \theta$. More generally, we could assume an affine multi-factor short rate model:

$$r(t) = \sum_{j=1}^n \delta_j X_j(t),$$

where

$$dX_i(t) = \Big(\mu_0^i + \sum_j \mu_j^i X_j(t)\Big)dt \qquad \forall i = 1, \ldots, n.$$

Bond prices can easily be solved for as above (exercise).²

Modern term structure theory basically adds ‘noise’ (Brownian motion and/or jumps):

• Short-rate models basically ‘add’ a stochastic term to the short-rate dynamics and try to derive bond prices using no-arbitrage considerations.

² Richard (1978) offers further discussion as well as introducing inflation and a nominal term structure model.

For example, Vasicek (1977) considers $dr(t) = \kappa(\theta - r(t))\,dt + \sigma\,dw(t)$. Such models are useful for predicting/explaining the cross-section of observed bond prices on a given date.
• In contrast, bond or forward rate models take the observed term structure as given, and add a stochastic term to the bond or forward rate dynamics in a manner consistent with absence of arbitrage. The most common use of this framework is for pricing fixed income derivatives. This (HJM) approach is analogous to pricing equity options (e.g., Black-Scholes) taking the stock price as given.

Remarks:

1. Short rate vs. bond vs. forward rate models: while the modeling approaches seem different, we will show below that there is a one-to-one correspondence between them.
2. Note that since $f(t,T) = r(T)$ in a deterministic world, forward rates are constant, i.e. $df(t,T) = 0\,dt$. This explains why, in a risky world, forward rate dynamics will be primarily determined by the volatility structure.
3. There has been some debate in the literature (fueled by a somewhat controversial example in CIR (1985b)) about the need to distinguish:
• Equilibrium models derived from fundamental principles with rational maximizing agents within a GE economy (Cox, Ingersoll and Ross (1985), Lucas (1978)), and
• Arbitrage-free (HJM) models, which rest solely on the principle of absence of arbitrage (Heath, Jarrow and Morton (1992)).
It is now well understood that this distinction is moot.

4 Traditional Expectations Hypothesis

Traditional theories of the term structure are described in Cox, Ingersoll and Ross (1981).

Typically, these theories start from the relations that hold in a deterministic world (above) and apply an expectation.

CIR distinguish:

• Unbiased expectation hypothesis: $f(t,T) = E_t[r(T)]$
• Local expectation hypothesis: $P(t,T) = E_t\big[e^{-\int_t^T r(s)\,ds}\big]$ (i.e., $E_t[dP(t,T)] = r(t)P(t,T)\,dt$)
• Return to maturity expectation hypothesis: $1/P(t,T) = E_t\big[e^{\int_t^T r(s)\,ds}\big]$
• Yield to maturity expectation hypothesis: $Y(t,T) = \frac{1}{T-t}E_t\big[\int_t^T r(s)\,ds\big]$

CIR (1981) have shown that these assumptions are mutually incompatible (essentially because of Jensen's inequality).
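The sketch below makes the Jensen's inequality point concrete. For illustration only, it assumes that the integrated short rate $I = \int_t^T r(s)\,ds$ is Gaussian with mean $m$ and variance $v$, as it turns out to be in the Vasicek model introduced below. Under that assumption, three of the hypotheses above each pin down a different bond price.

```python
import math

def implied_prices(m, v):
    """Bond prices implied by three expectations hypotheses, assuming the
    integrated short rate I = int_t^T r(s) ds is Normal(mean m, variance v)."""
    local = math.exp(-m + 0.5 * v)          # local EH: P = E[exp(-I)]
    return_to_mat = math.exp(-m - 0.5 * v)  # return-to-maturity EH: 1/P = E[exp(I)]
    yield_to_mat = math.exp(-m)             # yield-to-maturity EH: Y (T - t) = E[I]
    return local, return_to_mat, yield_to_mat

local, rtm, ytm = implied_prices(m=0.25, v=0.01)
print(local, ytm, rtm)
```

With $v > 0$ the prices are ordered local > yield-to-maturity > return-to-maturity. They coincide only when $v = 0$, which is the deterministic world.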

As we shall see below, ruling out arbitrage opportunities implies that only the local expectation hypothesis holds, and only when the expectation is computed under the so-called risk-neutral measure.³

³ Of course, it is possible to construct examples where one or the other expectation hypothesis holds, but generically they are inconsistent with the absence of arbitrage.

5 Short rate models

5.1 One-factor models

• The short rate $r(t)$ is the only state variable. Its dynamics are specified as a one-factor Markov process:

$$dr_t = \mu_r(t,r_t)\,dt + \sigma_r(t,r_t)\,dw_t$$

• If we can only borrow or lend at the risk-free rate, then markets are incomplete.

• Suppose, however, that we can trade in multiple zero-coupon bonds. Then absence of arbitrage (AoA) implies cross-sectional restrictions, as we show below.

• Assume $P(t,T_i) = P^i(t,r_t)$ is twice continuously differentiable in $r$ (and once in $t$). Construct a portfolio $V(t) = n_1P^1(t) + n_2P^2(t)$ that is self-financing:

$$dV_t = n_1(t)\,dP^1(t) + n_2(t)\,dP^2(t)$$
$$= \Big[n_1\Big(\tfrac12 P^1_{rr}\sigma_r^2 + P^1_r\mu_r + P^1_t\Big) + n_2\Big(\tfrac12 P^2_{rr}\sigma_r^2 + P^2_r\mu_r + P^2_t\Big)\Big]dt + \big(n_1P^1_r + n_2P^2_r\big)\sigma_r\,dw_t.$$

The first equality comes from the self-financing condition, and the second from Itô's lemma.

Suppose we choose the weights $\{n_1, n_2\}$ such that the portfolio is locally risk-free. That is, we choose $\{n_1, n_2\}$ so that:

$$n_1P^1_r + n_2P^2_r = 0 \qquad (7)$$

By AoA this portfolio must earn the risk-free rate of return:

$$n_1\Big(\tfrac12 P^1_{rr}\sigma_r^2 + P^1_r\mu_r + P^1_t\Big) + n_2\Big(\tfrac12 P^2_{rr}\sigma_r^2 + P^2_r\mu_r + P^2_t\Big) = r\,(n_1P^1 + n_2P^2) \qquad (8)$$

Combining equations (7) and (8), we find

$$\frac{1}{P^1_r}\Big(\tfrac12 P^1_{rr}\sigma_r^2 + P^1_r\mu_r + P^1_t - rP^1\Big) = \frac{1}{P^2_r}\Big(\tfrac12 P^2_{rr}\sigma_r^2 + P^2_r\mu_r + P^2_t - rP^2\Big).$$

Therefore, there must exist $\gamma(t)$ independent of maturities such that:

$$\gamma(t) = \frac{\tfrac12 P_{rr}\sigma_r^2 + P_r\mu_r + P_t - rP}{\sigma_r P_r}. \qquad (9)$$

We obtain the fundamental PDE for bond prices:

$$\tfrac12 P_{rr}\sigma_r^2 + P_r(\mu_r - \gamma\sigma_r) + P_t - rP = 0 \qquad (10)$$

subject to the boundary condition $P(T,T) = 1$.

Remark: Arbitrage and Convexity:

Suppose you have duration-matched portfolios ($P^1_r = P^2_r$) with equal value ($P^1 = P^2$). Then the PDE above indicates that if $P^1_{rr} > P^2_{rr} > 0$, then $P^1_t < P^2_t < 0$. Convexity is not free! (This is similar to Gamma for options.)

• Equivalent martingale measure. Define a process $w^Q(t)$ by $dw^Q_t = dw_t + \gamma\,dt$. Then we obtain

$$dr = (\mu_r - \gamma\sigma_r)\,dt + \sigma_r\,dw^Q_t \qquad (11)$$
$$\equiv \mu^Q_r\,dt + \sigma_r\,dw^Q_t. \qquad (12)$$

Equation (10) can be rewritten as

$$\tfrac12 P_{rr}\sigma_r^2 + P_r\mu^Q_r + P_t - rP = 0. \qquad (13)$$

The instantaneous expected return on any zero-coupon bond is equal to the risk-free rate when the expectation is taken with respect to a different measure $Q$. Under $Q$, the dynamics of the short rate are given by (12) above, with $w^Q(t)$ a standard Brownian motion.

Under technical conditions on γ, Girsanov’s theorem shows that there exists a measure Q equivalent to P under which wQ defined above is indeed a standard Brownian motion.

If these technical conditions are satisfied, then discounted bond prices are martingales under the $Q$-measure and we obtain:

$$P(t,T) = E^Q\big[e^{-\int_t^T r(s)\,ds}\,\big|\,\mathcal{F}(t)\big].$$

$Q$ is the so-called risk-neutral equivalent martingale measure (EMM) (Harrison and Kreps (1979)), the existence of which guarantees the AoA.
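The risk-neutral pricing formula can be illustrated by Monte Carlo. The sketch below estimates $E^Q[e^{-\int_0^\tau r(s)\,ds}]$ under Vasicek $Q$-dynamics $dr = \kappa(\theta^Q - r)\,dt + \sigma\,dw^Q$ (see Section 5.1.1) and compares the estimate with a closed-form price. The closed form for $A(\tau)$ used here is one term-by-term evaluation of the integral (18) and should be treated as an assumption. The parameters, Euler step and path count are arbitrary choices.

```python
import math
import random

def vasicek_price(r0, kappa, theta_q, sigma, tau):
    """Closed-form Vasicek price exp(A - B r0); A is (18) integrated term by term."""
    B = (1 - math.exp(-kappa * tau)) / kappa
    A = (theta_q - sigma**2 / (2 * kappa**2)) * (B - tau) - sigma**2 * B**2 / (4 * kappa)
    return math.exp(A - B * r0)

def mc_price(r0, kappa, theta_q, sigma, tau, n_paths=5000, n_steps=100, seed=0):
    """Estimate E^Q[exp(-int_0^tau r ds)]: Euler scheme for the Q-dynamics,
    trapezoid rule for the time integral of r along each path."""
    rng = random.Random(seed)
    dt = tau / n_steps
    sqdt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        r, integral = r0, 0.0
        for _ in range(n_steps):
            r_next = r + kappa * (theta_q - r) * dt + sigma * sqdt * rng.gauss(0.0, 1.0)
            integral += 0.5 * (r + r_next) * dt
            r = r_next
        total += math.exp(-integral)
    return total / n_paths

print(mc_price(0.03, 0.5, 0.05, 0.01, 2.0), vasicek_price(0.03, 0.5, 0.05, 0.01, 2.0))
```

The two numbers should agree to within Monte Carlo and discretization error, which here is of the order of $10^{-4}$.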

Remark: The argument used above to derive the ‘fundamental’ PDE (13) is not specific to bond prices. It applies to any European-style contingent claim whose payoff is a function solely of the short rate. Pricing a different contingent claim only changes the boundary conditions associated with the PDE. Further, for any such contingent claim the risk-neutral pricing formula holds:

$$C(t, r_t) = E^Q\big[e^{-\int_t^T r(s)\,ds}\,C(T)\,\big|\,\mathcal{F}(t)\big].$$

Standard short rate models in the literature are:

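Author | short rate | risk premium
Merton | $dr = \theta\,dt + \sigma\,dw$ | $\gamma$ (const.)
Vasicek | $dr = \kappa(\theta - r)\,dt + \sigma\,dw$ | $\gamma$ (const.)
Cox, Ingersoll, Ross | $dr = \kappa(\theta - r)\,dt + \sigma\sqrt{r}\,dw$ | $\gamma\sqrt{r}$
Dothan | $dr = \kappa r\,dt + \sigma r\,dw$ | $\gamma$
Duffie & Kan | $dr = \kappa(\theta - r)\,dt + \sqrt{\sigma_1 + \sigma_2 r}\,dw$ | $\gamma\sqrt{\sigma_1 + \sigma_2 r}$
Ahn & Gao | $dr = \kappa(\theta - r)r\,dt + \sigma r^{1.5}\,dw$ | $\gamma_1/\sqrt{r} + \gamma_2\sqrt{r}$
Ho and Lee | $dr = \theta(t)\,dt + \sigma\,dw$ | 0
Hull & White (extended Vas.) | $dr = (\phi(t) - \kappa r)\,dt + \sigma\,dw$ | 0
Hull & White (extended CIR) | $dr = (\phi(t) - r)\,dt + \sigma\sqrt{r}\,dw$ | 0
Black & Karasinski | $d\log r = \kappa(t)(\theta(t) - \log r)\,dt + \sigma(t)\,dw$ | 0
Quadratic (Jamshidian) | $r = \theta + \sigma_1 x + \sigma_2 x^2$, where $dx = -\kappa x\,dt + dw$ | 0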

Remarks:

1. All models with zero market price of risk are, in fact, defined under the risk-neutral measure. They have time-dependent parameters which can be chosen to fit the observed term structure at the initial date.
2. The commonly used Black and Karasinski model (which nests a continuous-time version of the Black, Derman and Toy model) generates infinite values for Eurodollar futures (Hogan and Weintraub (1993)).
3. The Ahn and Gao (1999) model in fact corresponds to defining the short rate as the inverse of a square-root process.

4. The quadratic model is a restricted two-factor affine model. Indeed, setting $y_t = x_t^2$, Itô's lemma gives

$$dy_t = (1 - 2\kappa y_t)\,dt + 2x_t\,dw_t.$$

Thus $(x_t, y_t)$ jointly form an affine two-factor model, since the variance of $dy_t$ is $4x_t^2\,dt = 4y_t\,dt$. The quadratic model, however, restricts the initial value of the state vector to satisfy $y_0 = x_0^2$ (which, along with the dynamics, ensures that $y_t = x_t^2$ a.e.).

For illustration, we first introduce the Gaussian Vasicek model and then its generalization to the affine case.

5.1.1 One-factor Vasicek model (PDE approach)

Vasicek (1977) assumes that the short rate follows

$$dr_t = \kappa(\theta - r_t)\,dt + \sigma\,dw_t$$

with a constant market price of risk $\gamma$:

$$\gamma = \frac{E\big[\frac{dP(t,T)}{P(t,T)}\big]/dt - r_t}{\sigma\,\frac{P_r(t,T)}{P(t,T)}}.$$

Under the risk-neutral measure the short-rate process is thus

$$dr_t = \kappa(\theta^Q - r_t)\,dt + \sigma\,dw^Q_t, \qquad \theta^Q = \theta - \frac{\sigma}{\kappa}\gamma.$$

Then bond prices are solutions to the PDE

$$\tfrac12 P_{rr}\sigma^2 + P_r\,\kappa(\theta^Q - r) + P_t - rP = 0 \qquad (14)$$

s.t. $P(T,T) = 1$. Guessing a solution of the form $P(r,t,T) = e^{A(T-t) - B(T-t)r}$, we find that $A$, $B$ solve the system of ODEs:

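$$B' = -\kappa B + 1 \qquad (15)$$

$$A' = \tfrac12\sigma^2 B^2 - \kappa\theta^Q B \qquad (16)$$

with $A(0) = B(0) = 0$. The solution is easily computed (Mathematica...):

$$B(\tau) = \frac{1 - e^{-\kappa\tau}}{\kappa} \qquad (17)$$

$$A(\tau) = \int_0^\tau \Big(\tfrac12\sigma^2 B(s)^2 - \kappa\theta^Q B(s)\Big)ds \qquad (18)$$

The integral can also be computed in closed form (Mathematica).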

We obtain the yield from $P(t,T) = \exp(-Y(t,T)(T-t))$:

$$Y(t,T) = \frac{1}{T-t}\Big(r_t B(T-t) + \kappa\theta^Q\int_t^T B(T-u)\,du - \frac{\sigma^2}{2}\int_t^T B(T-u)^2\,du\Big) \qquad (19)$$

Setting $\sigma = 0$ we recognize the deterministic term structure of equation (6).
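The ODE system (15)-(16) can also be integrated numerically. The sketch below uses a hand-written classical RK4 step with hypothetical parameter values. It compares the result with (17) and with a closed-form evaluation of (18), $A(\tau) = (\theta^Q - \frac{\sigma^2}{2\kappa^2})(B(\tau) - \tau) - \frac{\sigma^2 B(\tau)^2}{4\kappa}$. We obtained that closed form by integrating (18) term by term, so it is worth re-deriving before relying on it.

```python
import math

def rk4_vasicek(kappa, theta_q, sigma, tau, n=1000):
    """Integrate (15)-(16): B' = 1 - kappa B, A' = sigma^2 B^2 / 2 - kappa theta_q B,
    from A(0) = B(0) = 0 to tau with classical RK4 (the right-hand side depends only on B)."""
    def rhs(b):
        return 0.5 * sigma**2 * b * b - kappa * theta_q * b, 1.0 - kappa * b
    h = tau / n
    a = b = 0.0
    for _ in range(n):
        k1 = rhs(b)
        k2 = rhs(b + 0.5 * h * k1[1])
        k3 = rhs(b + 0.5 * h * k2[1])
        k4 = rhs(b + h * k3[1])
        a += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        b += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return a, b

def closed_form(kappa, theta_q, sigma, tau):
    """(17) and a term-by-term evaluation of (18)."""
    B = (1 - math.exp(-kappa * tau)) / kappa
    A = (theta_q - sigma**2 / (2 * kappa**2)) * (B - tau) - sigma**2 * B**2 / (4 * kappa)
    return A, B

print(rk4_vasicek(0.3, 0.06, 0.015, 10.0))
print(closed_form(0.3, 0.06, 0.015, 10.0))
```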

Introducing uncertainty adds two components:

• a Jensen's inequality effect, which tends to lower long-term yields,
• a risk-premium effect (if $\gamma \neq 0$), which can go either way.

The bond price return is given by:

$$\frac{dP(t,T)}{P(t,T)} = r_t\,dt - \sigma B(T-t)\,dw^Q_t \qquad (20)$$

By definition the term premium, i.e. the expected return on longer-maturity bonds in excess of the risk-free rate, is:

$$E\Big[\frac{dP(t,T)}{P(t,T)}\Big]/dt - r_t = \gamma\sigma\,\frac{P_r(t,T)}{P(t,T)} = -\gamma\sigma B(T-t).$$

If $\gamma < 0$ then term premia are positive (and $\theta^Q > \theta$).

In the Vasicek model we can infer term premia from the average slope of the term structure:

Note that

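$$\lim_{T\to t} Y(t,T) = r_t,$$

$$\lim_{T\to\infty} Y(t,T) = \theta^Q - \frac{\sigma^2}{2\kappa^2} = \theta - \frac{\gamma\sigma}{\kappa} - \frac{\sigma^2}{2\kappa^2}.$$

In particular, since the short rate reverts to $\theta$ on average under the physical measure, the average long-term slope of the term structure is:

$$\theta^Q - \frac{\sigma^2}{2\kappa^2} - \theta = -\frac{\gamma\sigma}{\kappa} - \frac{\sigma^2}{2\kappa^2}.$$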

For example, if on average the TS slope is increasing we deduce that:

$$\gamma < -\frac{\sigma}{2\kappa}.$$
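The long-maturity limit and the slope condition can be checked with a short sketch. It uses hypothetical parameters and the same term-by-term closed form of (18) as before. The sketch evaluates the yield (19) at a very long maturity with $r_t = \theta$ and compares it with $\theta$ plus the average slope $-\gamma\sigma/\kappa - \sigma^2/(2\kappa^2)$. It also confirms that the slope is positive exactly when $\gamma < -\sigma/(2\kappa)$.

```python
import math

def vasicek_yield(r, kappa, theta, sigma, gamma, tau):
    """Yield (19), with theta_q = theta - sigma * gamma / kappa and a
    term-by-term evaluation of the integral (18)."""
    theta_q = theta - sigma * gamma / kappa
    B = (1 - math.exp(-kappa * tau)) / kappa
    A = (theta_q - sigma**2 / (2 * kappa**2)) * (B - tau) - sigma**2 * B**2 / (4 * kappa)
    return (-A + B * r) / tau

def average_long_slope(kappa, sigma, gamma):
    """Long yield minus the physical mean theta: -gamma sigma/kappa - sigma^2/(2 kappa^2)."""
    return -gamma * sigma / kappa - sigma**2 / (2 * kappa**2)

def term_premium(kappa, sigma, gamma, tau):
    """Instantaneous expected excess return on a tau-maturity bond: -gamma sigma B(tau)."""
    return -gamma * sigma * (1 - math.exp(-kappa * tau)) / kappa

kappa, theta, sigma, gamma = 0.2, 0.05, 0.01, -0.2   # hypothetical values
print(vasicek_yield(theta, kappa, theta, sigma, gamma, 1e5) - theta)
print(average_long_slope(kappa, sigma, gamma))
```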

Remark:

• The zero-coupon bond price process under the risk-neutral measure in the Vasicek model (eq. 20) is similar to the Merton (1976) stock price model with stochastic interest rates. As a result, it is possible to derive a similar closed-form solution for options on zero-coupon bonds.
• Following Jamshidian (1991), it is also possible to obtain closed-form solutions for coupon bond options.

Conclusions about one-factor model

• In a deterministic world:
– AoA ⇒ all bonds have the same (instantaneous) return.
– Today's yield curve embeds all the information about all future rates.
• In a one-factor risky world:
– AoA ⇒ there exists a market price of risk (equal to the term premium on a T-maturity bond normalized by the diffusion of that bond).
– Under regularity conditions on the MPR, there exists an EMM under which all bond prices have the same instantaneous rate of return (i.e., the local expectations hypothesis holds under the risk-neutral measure).
• Various short-rate models differ in terms of the assumptions they make on any two of:
– the dynamics of the short rate under the historical measure (determines the time series),
– the dynamics of the short rate under the risk-neutral measure (determines the cross section),
– the market price of risk (determines the difference between the physical and risk-neutral drift, i.e. the risk premium).

5.1.2 One-factor Cox-Ingersoll and Ross (CIR) Model

One shortcoming of the Vasicek model is that it allows for negative interest rates. Instead, CIR (1985b) assume that the dynamics of the short rate is:

$$dr_t = \kappa(\theta - r_t)\,dt + \sigma\sqrt{r_t}\,dw_t$$

The advantage of this model is that it allows for time-varying volatility and guarantees nonnegative interest rates if $\kappa\theta > 0$. Feller (1951) also shows that if $2\kappa\theta > \sigma^2$ then $r_t > 0$ a.e.

Further, there exists a market price of risk process $\gamma(r_t)$ such that:

$$\gamma(r_t) \equiv \frac{E\big[\frac{dP(t,T)}{P(t,T)}\big]/dt - r_t}{\sigma\sqrt{r_t}\,\frac{P_r(t,T)}{P(t,T)}} = \gamma\sqrt{r_t}.$$

Under the risk-neutral measure the short-rate process is

$$dr_t = \kappa^Q(\theta^Q - r_t)\,dt + \sigma\sqrt{r_t}\,dw^Q_t$$

with $\kappa^Q = \kappa + \sigma\gamma$ and $\theta^Q = \theta\kappa/\kappa^Q$. Then bond prices are solutions to the PDE

$$\tfrac12 P_{rr}\sigma^2 r + P_r\,\kappa^Q(\theta^Q - r) + P_t - rP = 0 \qquad (21)$$

with boundary condition $P(T,T) = 1$. Guessing that the solution is of the form $P(r,t,T) = e^{A(T-t) - B(T-t)r}$, we find that $A$, $B$ solve the system of ODEs:

$$B' = -\kappa^Q B - \frac{\sigma^2}{2}B^2 + 1 \qquad (22)$$

$$A' = -\kappa^Q\theta^Q B \qquad (23)$$

with boundary conditions $A(0) = B(0) = 0$. The solution is easily computed (Mathematica will do it for you...).

The bond price follows:

$$\frac{dP(t,T)}{P(t,T)} = r_t\,dt - \sigma\sqrt{r_t}\,B(T-t)\,dw^Q_t.$$

The risk premium on the bond is given by:

$$E\Big[\frac{dP(t,T)}{P(t,T)}\Big]/dt - r_t = \gamma\sigma r_t\,\frac{P_r(t,T)}{P(t,T)} = -\gamma\sigma r_t B(T-t).$$

Note that under the physical measure

$$\frac{dP(t,T)}{P(t,T)} = r_t\big(1 - \gamma\sigma B(T-t)\big)dt - \sigma\sqrt{r_t}\,B(T-t)\,dw_t.$$

We see that when $r \to 0$ the volatility tends to zero, and the expected return also tends to zero because the compensation for risk (the risk premium) is proportional to the interest rate. We return to that issue shortly.

5.1.3 One-factor affine models

One-factor affine models have the particularity that:

• the drift and the variance of the short rate process are affine in the short rate, and
• bond prices are exponentially affine in the short rate (or, alternatively stated, yields are affine).

In fact, as we show below, there is an ‘equivalence’ between the two statements.

First, assume the short rate is affine under the risk-neutral measure:

$$dr_t = \kappa^Q(\theta^Q - r_t)\,dt + \sqrt{\sigma_1 + \sigma_2 r_t}\,dw^Q_t$$

Then bond prices are solutions to the PDE

$$\tfrac12 P_{rr}(\sigma_1 + \sigma_2 r) + P_r\,\kappa^Q(\theta^Q - r) + P_t - rP = 0 \qquad (24)$$

with boundary condition $P(T,T) = 1$. Guessing that the solution is of the form $P(r,t,T) = e^{A(T-t) - B(T-t)r}$, we find that $A$, $B$ solve the system of ODEs:

$$B' = -\tfrac12\sigma_2 B^2 - \kappa^Q B + 1 \qquad (25)$$

$$A' = \tfrac12\sigma_1 B^2 - \kappa^Q\theta^Q B \qquad (26)$$

with boundary condition $A(0) = B(0) = 0$. The solution is easily computed (Mathematica will do it for you...):

$$B(\tau) = \frac{2(e^{\eta\tau} - 1)}{(\eta + \kappa^Q)(e^{\eta\tau} - 1) + 2\eta} \qquad (27)$$

$$A(\tau) = \int_0^\tau \Big(\tfrac12\sigma_1 B(s)^2 - \kappa^Q\theta^Q B(s)\Big)ds \qquad (28)$$

where $\eta = \sqrt{(\kappa^Q)^2 + 2\sigma_2}$. The integral can also be computed in closed form (Mathematica).

For the converse, suppose that bond prices are of the form $P(t,T) = e^{A(T-t) - B(T-t)r_t}$. Plugging this into the fundamental PDE

$$\tfrac12 P_{rr}\sigma_r^2 + P_r\mu^Q_r + P_t - rP = 0, \qquad (29)$$

we obtain that

$$\tfrac12 B(\tau)^2\sigma_r^2 - B(\tau)\mu^Q_r + \big(-A'(\tau) + B'(\tau)r\big) - r = 0$$

must hold for any $\tau$. Using two maturities (e.g., $\tau_1 \neq \tau_2$), we get a system of two equations in two unknowns. Provided the equations are not redundant, this system yields a solution for $\mu^Q_r$ and $\sigma_r^2$ that is affine in $r$.
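Equation (27) can be checked against a direct numerical solution of the Riccati system (25)-(26). The sketch below uses hypothetical parameters and a hand-written RK4 step. It also confirms that (27) collapses to the Vasicek $B(\tau) = (1 - e^{-\kappa\tau})/\kappa$ when $\sigma_2 = 0$. Because (28) is left as an integral in the text, $A(\tau)$ is only obtained numerically here.

```python
import math

def B_closed(kappa_q, sigma2, tau):
    """Equation (27), with eta = sqrt(kappa_q^2 + 2 sigma2)."""
    eta = math.sqrt(kappa_q**2 + 2 * sigma2)
    e = math.expm1(eta * tau)  # e^{eta tau} - 1, computed accurately
    return 2 * e / ((eta + kappa_q) * e + 2 * eta)

def AB_rk4(kappa_q, theta_q, sigma1, sigma2, tau, n=2000):
    """RK4 for (25)-(26): B' = 1 - kappa_q B - sigma2 B^2 / 2,
    A' = sigma1 B^2 / 2 - kappa_q theta_q B, with A(0) = B(0) = 0."""
    def rhs(b):
        return (0.5 * sigma1 * b * b - kappa_q * theta_q * b,
                1.0 - kappa_q * b - 0.5 * sigma2 * b * b)
    h = tau / n
    a = b = 0.0
    for _ in range(n):
        k1 = rhs(b)
        k2 = rhs(b + 0.5 * h * k1[1])
        k3 = rhs(b + 0.5 * h * k2[1])
        k4 = rhs(b + h * k3[1])
        a += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        b += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return a, b

def bond_price(r, kappa_q, theta_q, sigma1, sigma2, tau):
    """P = exp(A - B r) with A, B from the numerical Riccati solution."""
    a, b = AB_rk4(kappa_q, theta_q, sigma1, sigma2, tau)
    return math.exp(a - b * r)

print(AB_rk4(0.4, 0.05, 0.0, 0.02, 10.0)[1], B_closed(0.4, 0.02, 10.0))
```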

Remarks

• Admissibility: Inspection of the SDE followed by the short rate shows that not all parameter values are admissible. Indeed, the SDE is only well-defined if $\sigma_1 + \sigma_2 r \ge 0$. Conditions that ensure this are easily obtained, however. Set $y_t = \sigma_1 + \sigma_2 r_t$. Itô's lemma gives $dy_t = \kappa^Q(\sigma_2\theta^Q + \sigma_1 - y_t)\,dt + \sigma_2\sqrt{y_t}\,dw^Q_t$. For $y_t$ to remain nonnegative, all we need is $\sigma_2\theta^Q + \sigma_1 \ge 0$. Indeed, $y$ is continuous and has a nonnegative drift when it hits zero, so it can never cross zero. Furthermore, if $\kappa^Q(\sigma_2\theta^Q + \sigma_1) > \frac12\sigma_2^2$, then it can be shown that $y_t > 0$ a.s. (e.g., zero is an entrance boundary for $y$, Feller (1951)).
• Affine models are particularly tractable because they admit exponential-affine solutions for the ‘extended transform’

$$E^Q_t\Big[e^{-\int_t^T r(s)\,ds}\,e^{i\lambda r(T)}\Big] = e^{M_0(T-t) + M_1(T-t)r(t)},$$

where $M_0$, $M_1$ are two deterministic functions that satisfy a system of Riccati ODEs. As shown by Heston (1993), Chacko and Das (2002), Duffie, Pan and Singleton (2000), and CD and Goldstein (2002), this can be used to price derivatives in closed form using inverse Fourier transform techniques.
• Duffie, Filipovic and Schachermayer (2001) use the exponentially affine transform as a mathematical definition of affine processes. The larger class may include jumps (of Poisson and/or Levy type).

5.1.4 Interpretation of the Market price of Risk: equilibrium model

Following CIR (1985a) or Lucas (1978), consider a simple representative agent economy with time-separable utility $u(x) = \log x$, where aggregate output is given by:

$$\frac{dc_t}{c_t} = (\mu_t + \beta_t^2)\,dt + \beta_t\,dw(t)$$

Then the equilibrium state price density is $\Pi_t = 1/c_t$. Applying Itô's lemma we obtain:

$$\frac{d\Pi_t}{\Pi_t} = -\mu_t\,dt - \beta_t\,dw(t) = -r_t\,dt - \gamma_t\,dw(t).$$

The second equation defines the risk-free rate and the market price of risk, by definition of a state price density (here $r_t = \mu_t$ and $\gamma_t = \beta_t$).

This simple example shows that any arbitrage-free price system $(r_t, \gamma_t)$ can be supported by some ‘equilibrium’ model.

In particular, if we choose $\mu_t$ to follow an Ornstein-Uhlenbeck process and $\beta_t$ to be constant, then we obtain the Vasicek model.

If we choose µt to follow a square-root process and βt = γ√µt then we obtain CIR (1985b).

CIR (1985) argued that the only way to specify a sensible risk-premium is to show that it is supported by a General equilibrium model. The above shows that it is sufficient that there exists an EMM.

Sufficient conditions for the market price of risk to be ‘acceptable’ are that it satisfies the Novikov condition.

For the case where the short rate process is of the diffusion type (which covers the case of a general Markov process), necessary and sufficient conditions are given in theorem 7.19 p.294 in Liptser and Shiryaev (1974). (Basically it is sufficient to verify that

$$P\Big(\int_0^T \|\gamma_t\|^2\,dt < \infty\Big) = Q\Big(\int_0^T \|\gamma_t\|^2\,dt < \infty\Big) = 1,$$

where the $P$ and $Q$ measures are defined on the canonical space using the measure induced by the process.)⁴

The discussion above indicates that we need not restrict ourselves to the simple risk-premium structure given by CIR and Vasicek. In fact, Duffee (2002) and Dai and Singleton (2002) show that to capture the observed predictability in bond returns it is necessary to allow for more general risk premia.

5.1.5 Specification of risk-premia and predictability in bond returns

There are three different kinds of risk premia that are considered in the literature:

• First generation
– Vasicek: $\gamma$ constant.
– CIR: $\gamma(r) = \gamma\sqrt{r}$.
– Affine: $\gamma(r) = \gamma\sqrt{\sigma_0 + \sigma_1 r}$.
• Second generation affine (Duffee (2002), DS (2002)):
– Vasicek: $\gamma(r) = \gamma_0 + \gamma_r r$.

⁴ Suppose $w$ is a BM on $(\Omega, \mathcal{F}, P)$ and $\gamma$ satisfies $P(\int_0^T \|\gamma_t\|^2\,dt < \infty) = 1$. Define the exponential supermartingale $\xi_t = \exp\big(-\frac12\int_0^t \|\gamma_s\|^2\,ds - \int_0^t \gamma_s\,dw_s\big)$. $\xi$ is a martingale iff $E[\xi_T] = 1$. Define the measure $Q$ by $Q(F) = E[1_F\,\xi_T]$. Note that for $F = \{\int_0^T \|\gamma_t\|^2\,dt < \infty\}$ we have $Q(F) = E[\xi_T]$. Therefore $\xi$ is a martingale iff $Q(\int_0^T \|\gamma_t\|^2\,dt < \infty) = 1$. If $\gamma$ is a process of the diffusion type (in the sense of Liptser and Shiryaev), then the latter condition can easily be checked.

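– CIR: $\gamma(r) = \gamma_0\sqrt{r} + \gamma_r/\sqrt{r}$.
– Affine: $\gamma(r) = \gamma_0\sqrt{\sigma_0 + \sigma_1 r} + \gamma_r/\sqrt{\sigma_0 + \sigma_1 r}$.
• Second generation semi-affine (Duarte (2002)):
– CIR: $\gamma(r) = \gamma_0 + \gamma_r\sqrt{r}$.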

All models have affine dynamics under the risk-neutral measure. This is essential to obtain explicit solutions for bond prices (useful for implementation/estimation).

However, the ‘semi-affine’ models have non-affine dynamics under the physical measure.

The disadvantage is that one cannot compute moments of the state variables under the physical measure in closed form, which can make estimation somewhat less simple.

Why are such extended models of risk-premia useful?

The main difference can be seen from the definition of excess expected return:

$$E\Big[\frac{dP(t,T)}{P(t,T)}\Big]/dt - r_t = \gamma(r_t)\,\sigma\,\frac{P_r(t,T)}{P(t,T)} = -\gamma(r_t)\,\sigma B(T-t)$$

The traditional risk-premium specifications imply that the sign of the risk premium on bonds is constant.

This is however hard to reconcile with the empirical evidence (Duffee (2002) Table I):

• Predictability in (long-term) bond returns (violation of the expectations hypothesis):

Regression of Treasury bond returns on the previous month's term structure slope and volatility shows a positive relation between long-term bond returns and the slope of the TS, as well as volatility.
• While the average excess return to Treasury bonds in sample is small, the slope of the TS predicts large variation in excess returns. This requires time variation and sign switching in risk premia (Fama French (1993)).

Note, however, that traditional risk-premium specifications predict either that risk premia are constant, or that they are positive and increasing in the short rate (i.e., decreasing with slope).

In either case, they impose that compensation for risk is a fixed multiple of volatility. In particular, risk premia cannot switch sign over time.

More general structure of risk-premia breaks this link and allows for changes in sign in risk-premia.

Duffee (2002) and Dai and Singleton (2002) show that this is necessary to capture predictability and improve out-of-sample forecasts using affine models.
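The sketch below shows why the extended specification matters. It evaluates the expected excess return $-\gamma(r)\sigma B(\tau)$ over a grid of short rates for two cases: a constant Vasicek $\gamma$, and a second-generation $\gamma(r) = \gamma_0 + \gamma_r r$. The loading $B(\tau) = (1 - e^{-\kappa\tau})/\kappa$ and all numerical values are hypothetical illustration choices.

```python
import math

def excess_return(gamma_fn, r, sigma, kappa, tau):
    """Expected excess bond return -gamma(r) * sigma * B(tau), with the
    Vasicek loading B(tau) = (1 - exp(-kappa tau)) / kappa."""
    B = (1 - math.exp(-kappa * tau)) / kappa
    return -gamma_fn(r) * sigma * B

def gamma_first_gen(r):
    """First generation (Vasicek): constant market price of risk; r is unused."""
    return -0.3

def gamma_second_gen(r):
    """Second generation (Vasicek): gamma0 + gamma_r * r, hypothetical coefficients."""
    return 0.4 - 12.0 * r

rates = [0.01 + 0.005 * k for k in range(15)]   # short rates from 1% to 8%
first = [excess_return(gamma_first_gen, r, 0.01, 0.2, 5.0) for r in rates]
second = [excess_return(gamma_second_gen, r, 0.01, 0.2, 5.0) for r in rates]
print(min(first), max(first))    # one sign throughout
print(min(second), max(second))  # changes sign near r = 0.4 / 12
```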

Remarks

• On absence of arbitrage and viability in the second generation CIR model. Suppose the short rate is given by

$$dr_t = \kappa(\theta - r_t)\,dt + \sigma\sqrt{r_t}\,dw_t$$

and the risk premium is $\gamma(r) = \gamma_0\sqrt{r} + \gamma_r/\sqrt{r}$. Then the short-rate process under the risk-neutral measure is given by

$$dr_t = \kappa^Q(\theta^Q - r_t)\,dt + \sigma\sqrt{r_t}\,dw^Q_t,$$

where $\kappa^Q = \kappa + \gamma_0\sigma$ and $\kappa^Q\theta^Q = \kappa\theta - \gamma_r\sigma$. CIR (1985b) use this example to suggest that, absent an equilibrium model to justify it, the resulting economy may not be viable. Their argument is as follows. The bond price process is given by:

$$\frac{dP(t,T)}{P(t,T)} = r_t\,dt - \sigma\sqrt{r_t}\,B(T-t)\,dw^Q_t$$
$$= \Big(-\gamma_r\sigma B(T-t) + r_t\big(1 - \gamma_0\sigma B(T-t)\big)\Big)dt - \sigma\sqrt{r_t}\,B(T-t)\,dw_t.$$

There is a potential arbitrage if/when the short rate hits zero, as long as $\gamma_r \neq 0$: the bond is then locally riskless but earns a drift different from $r_t = 0$. Feller's condition states that if $2\kappa\theta > \sigma^2$ then the short rate remains strictly positive almost surely. The solution to CIR's ‘puzzle’ is that the risk-neutral measure is equivalent to the physical measure (for this choice of risk premium) if and only if the Feller condition holds under both measures (i.e., $2\kappa\theta > \sigma^2$ and $2\kappa^Q\theta^Q > \sigma^2$). This follows directly from theorem 7.19 p.294 in Liptser and Shiryaev (1974). Of course, in that case CIR's arbitrage is not implementable!
• On the estimation of the risk-premium parameters. In traditional affine models, the cross-section of the data helps pin down the expected return parameters, because the physical and risk-neutral drifts share common parameters. With extended risk premia, all parameters of the physical drift differ from those of the risk-neutral drift. This may create substantial biases in estimation due to the near unit root behavior of short rates in sample (Duffee and Stanton (2003)).
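The equivalence criterion in the remark can be coded directly. The sketch derives the risk-neutral drift by subtracting $\sigma\sqrt{r}\,\gamma(r) = \gamma_0\sigma r + \gamma_r\sigma$ from $\kappa(\theta - r)$. This gives $\kappa^Q = \kappa + \gamma_0\sigma$ and $\kappa^Q\theta^Q = \kappa\theta - \gamma_r\sigma$. The sketch then checks the Feller condition under both measures, as described above; the case of interest is $\gamma_r \neq 0$. Parameter values are hypothetical.

```python
def risk_neutral_drift(kappa, theta, sigma, gamma0, gamma_r):
    """Return (kappa_q, kappa_q * theta_q) for the second-generation CIR model:
    kappa (theta - r) - sigma sqrt(r) gamma(r) = (kappa theta - gamma_r sigma) - (kappa + gamma0 sigma) r."""
    return kappa + gamma0 * sigma, kappa * theta - gamma_r * sigma

def feller(kappa_theta, sigma):
    """Feller condition 2 kappa theta > sigma^2 (short rate stays strictly positive)."""
    return 2 * kappa_theta > sigma**2

def measures_equivalent(kappa, theta, sigma, gamma0, gamma_r):
    """Criterion quoted in the text (relevant when gamma_r != 0):
    the Feller condition must hold under both P and Q."""
    _, kq_thetaq = risk_neutral_drift(kappa, theta, sigma, gamma0, gamma_r)
    return feller(kappa * theta, sigma) and feller(kq_thetaq, sigma)

print(measures_equivalent(0.5, 0.05, 0.1, -0.5, 0.01))   # Feller holds under P and Q
print(measures_equivalent(0.5, 0.05, 0.1, -0.5, 0.21))   # Feller fails under Q
```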

5.1.6 Fitting the Term structure in a one-factor Gaussian model

To fit the initial term structure perfectly, an idea suggested by Cox, Ingersoll and Ross and extended by Hull and White (1990) is to make one or more parameters deterministic (appropriately chosen) functions of time (under the Q measure).

Of course, this model does not deliver implications about the cross- section of bond prices, but rather should be used to price derivatives relative to the observed term structure.

The simplest example has only one time-dependent parameter:

$$dr_t = (\phi(t) - \kappa r_t)\,dt + \sigma\,dw_t$$

Denote the observed term structure by $\hat P(0,T)$. We want to pick the function $\phi(\cdot)$ such that $P(0,T) = \hat P(0,T)$ $\forall T$, or equivalently such that $f(0,T) = \hat f(0,T)$ $\forall T$. For the Vasicek model the latter obtains if:

$$-\sigma^2\int_0^T B(s,T)\,b(s,T)\,ds + b(0,T)\,r_0 + \int_0^T b(s,T)\,\phi(s)\,ds = \hat f(0,T) \qquad (30)$$

Differentiating and combining with the above, we obtain:

$$\phi(T) = \partial_2\hat f(0,T) + \kappa\hat f(0,T) + \sigma^2 B_{2\kappa}(0,T) \qquad (31)$$

where we have defined $B_x(s,t) = \int_s^t b_x(s,u)\,du$ and $b_x(s,t) = e^{-\int_s^t x(u)\,du}$, and $B$, $b$ without subscript stand for $B_\kappa$, $b_\kappa$.

Remark:

• As we show below, this so-called extended Vasicek model is in fact identical to a so-called HJM model with the following forward rate dynamics:

$$df(t,T) = \sigma^2 b(t,T)B(t,T)\,dt + \sigma b(t,T)\,dw^Q(t)$$

with initial condition $f(0,T) = \hat f(0,T)$ $\forall T$.
• It is also equivalent to a bond price model with the following dynamics:

$$\frac{dP(t,T)}{P(t,T)} = r(t)\,dt - \sigma B(t,T)\,dw^Q(t)$$

with initial condition $P(0,T) = \hat P(0,T)$ $\forall T$.
• The HJM approach basically corresponds to relaxing the time-homogeneity of short rate models (and possibly the Markovian structure of the short rate).
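The sketch below checks the fitting formula (31) numerically. It assumes a hypothetical smooth forward curve $\hat f(0,T) = 0.05 - 0.02e^{-0.5T}$ (so $r_0 = \hat f(0,0) = 0.03$) and hypothetical values of $\kappa$ and $\sigma$. It computes $\phi$ from (31), plugs it into the left-hand side of (30), and confirms that the model forward curve reproduces $\hat f(0,T)$. Here $b(s,t) = e^{-\kappa(t-s)}$ and $B(s,t) = (1-e^{-\kappa(t-s)})/\kappa$, which is the constant-$\kappa$ case of the definitions above.

```python
import math

KAPPA, SIGMA = 0.1, 0.01   # hypothetical parameters

def f_hat(T):
    """Hypothetical observed forward curve f_hat(0, T)."""
    return 0.05 - 0.02 * math.exp(-0.5 * T)

def df_hat(T):
    """Its maturity derivative, d f_hat(0, T) / dT."""
    return 0.01 * math.exp(-0.5 * T)

R0 = f_hat(0.0)  # the short rate must start at r_0 = f_hat(0, 0)

def b(s, t):
    return math.exp(-KAPPA * (t - s))

def B(s, t):
    return (1 - math.exp(-KAPPA * (t - s))) / KAPPA

def phi(T):
    """Equation (31), with B_{2 kappa}(0, T) = (1 - e^{-2 kappa T}) / (2 kappa)."""
    return df_hat(T) + KAPPA * f_hat(T) + SIGMA**2 * (1 - math.exp(-2 * KAPPA * T)) / (2 * KAPPA)

def model_forward(T, n=4000):
    """Left-hand side of (30), both integrals by the midpoint rule."""
    h = T / n
    grid = [(k + 0.5) * h for k in range(n)]
    convexity = h * sum(B(s, T) * b(s, T) for s in grid)
    drift = h * sum(b(s, T) * phi(s) for s in grid)
    return -SIGMA**2 * convexity + b(0.0, T) * R0 + drift

for T in (1.0, 5.0, 10.0):
    print(T, model_forward(T), f_hat(T))
```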

5.1.7 Shortcomings of one-factor short rate models

1. All bond prices are instantaneously perfectly correlated.
2. Hedging of any fixed income derivative can be done ‘equally well’ with a bond of any arbitrary maturity.
3. A time-homogeneous model restricts the shape of the current term structure.
4. It restricts the shape of future term structures.
5. It restricts volatility and correlation structures (at most functions of the short rate).
6. When pricing derivatives, there are two sources of errors: (i) calibration of bond prices, (ii) pricing of derivatives relative to bonds.

The last problem can be somewhat alleviated by the fitting ‘trick’ shown above. However, unless the model is re-fitted continuously, the possible shapes of future term structures will be restricted within the model (this is a generic inconsistency of the ‘HJM approach’).

5.1.8 Empirical Evidence

• Factor analysis of the term structure reveals at least three factors: level, slope and curvature (Litterman and Scheinkman (91)), and possibly four (Knez, Litterman and Scheinkman (94)).
• Time series analysis of the short rate shows that interest rate volatility is stochastic (Brenner, Harjes and Kroner (95), Andersen and Lund (97), Benzoni et al. (2003)).
• Factor analysis using derivative data suggests that term structure factors are not sufficient to explain the dynamics of fixed income derivatives (Collin-Dufresne and Goldstein (02), Heidari and Wu (02)).

Two routes are followed to make models more consistent with the data:

• Allow for multiple factors while remaining in the time-homogeneous short rate model setup. This approach delivers cross-sectional restrictions for bonds and is better suited for empirical work and for investigating the risk-return trade-off in bond markets.
• Directly model bond prices or forward rates in an arbitrage-free way, taking the observed term structure as given. This is the Heath-Jarrow-Morton approach used by practitioners. It focuses on pricing derivatives relative to bond prices, and not on pricing bonds across maturities. In general, this approach focuses exclusively on the risk-neutral distribution of the term structure. There exists a corresponding (in general, non-time-homogeneous) short rate process consistent with any HJM model.

5.2 Multi-factor short-rate models

5.2.1 Simple Generalization: Independent state variables

The simplest generalization is to write the short rate as a sum of independent factors, $r_t = \delta_0 + \sum_{i=1}^n X^i_t$, where, for example,

$$dX^i_t = \kappa_i(\theta_i - X^i_t)\,dt + \sqrt{\sigma^i_1 + \sigma^i_2 X^i_t}\,dZ_i(t)$$

and $dZ_i\,dZ_j = 0$ $\forall i \neq j$. Obviously,

$$P(t,T) = E^Q\big[e^{-\int_t^T r_s\,ds}\,\big|\,\mathcal{F}_t\big] = e^{-\int_t^T \delta_0\,ds}\,\prod_{i=1}^n P^i(t,T),$$

where $P^i(t,T) = e^{A_i(T-t) - B_i(T-t)X^i_t}$ is the one-factor bond ‘price’.
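The sketch below illustrates the product formula for two independent Vasicek-type factors ($\sigma^i_2 = 0$) with hypothetical parameters. As a cross-check, it also prices the bond from the Gaussian distribution of $\int_t^T r_s\,ds$ under $Q$. With independent factors the means and variances add, and $P = e^{-m + v/2}$. The factor variance $\frac{\sigma^2}{\kappa^2}(\tau - 2B_\kappa(\tau) + B_{2\kappa}(\tau))$ is a standard Ornstein-Uhlenbeck result and is not stated in the text.

```python
import math

def vasicek_AB(kappa, theta_q, sigma, tau):
    """One-factor Vasicek loadings with P = exp(A - B x); A is (18) integrated term by term."""
    B = (1 - math.exp(-kappa * tau)) / kappa
    A = (theta_q - sigma**2 / (2 * kappa**2)) * (B - tau) - sigma**2 * B**2 / (4 * kappa)
    return A, B

def price_product(delta0, factors, x, tau):
    """P = exp(-delta0 tau) * prod_i P^i, each P^i a one-factor Vasicek price."""
    price = math.exp(-delta0 * tau)
    for (kappa, theta_q, sigma), xi in zip(factors, x):
        A, B = vasicek_AB(kappa, theta_q, sigma, tau)
        price *= math.exp(A - B * xi)
    return price

def price_gaussian(delta0, factors, x, tau):
    """Cross-check: int r ds is Gaussian under Q with mean m and variance v,
    so P = exp(-m + v / 2); independence lets means and variances add."""
    m, v = delta0 * tau, 0.0
    for (kappa, theta_q, sigma), xi in zip(factors, x):
        B = (1 - math.exp(-kappa * tau)) / kappa
        B2 = (1 - math.exp(-2 * kappa * tau)) / (2 * kappa)
        m += theta_q * tau + (xi - theta_q) * B
        v += sigma**2 / kappa**2 * (tau - 2 * B + B2)
    return math.exp(-m + 0.5 * v)

factors = [(0.8, 0.01, 0.012), (0.05, 0.02, 0.006)]   # (kappa, theta_q, sigma), hypothetical
print(price_product(0.01, factors, [0.015, -0.005], 5.0),
      price_gaussian(0.01, factors, [0.015, -0.005], 5.0))
```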

5.2.2 Affine Term Structure Models - Duffie and Kan (DK 1996)

Introduce a ‘latent’ set of N state variables $\{X\}$ with dynamics:

$$dX(t) = \mathcal{K}^Q\left(\theta^Q - X(t)\right)dt + \Sigma\sqrt{S(t)}\,dZ^Q(t) \qquad (32)$$
where $\mathcal{K}^Q$ and $\Sigma$ are $(N \times N)$ matrices, and $S$ is a diagonal matrix with components

$$S_{ii}(t) = \alpha_i + \beta_i^\top X(t). \qquad (33)$$

The spot rate is specified as an affine function of X:

$$r(t) = \delta_0 + \delta_x^\top X(t). \qquad (34)$$

Due to the Markov structure, all variables of interest can be written as $V(\{X(t)\})$. Note that this specification imposes that the

1) risk-neutral drift $\mathcal{K}^Q(\theta^Q - X(t))$,
2) covariance matrix $\Sigma\sqrt{S(t)}\sqrt{S(t)}\Sigma^\top = \Sigma S(t)\Sigma^\top$,
3) spot rate $r(t) = \delta_0 + \delta_x^\top X(t)$
are all affine (i.e., linear plus constant) functions of $\{X\}$. These strong restrictions are imposed in order to obtain tractability. Whether or not these models can capture empirical observations is an open question.

For the most part, whenever the affine model has been found to fail in some respect, a modification or generalization of the framework has been proposed that improves the empirical fit while maintaining tractability.

5.2.3 Tractability of Affine framework:

Many fixed income securities (e.g., caps, swaptions) admit tractable solutions. In particular, bond prices take a simple exponential-affine form (with $\tau \equiv T - t$):
$$P(t,\tau) = e^{A(\tau) - B(\tau)^\top X(t)}, \qquad (35)$$
where $A(\tau)$ and $B(\tau)$ satisfy the (deterministic) ODEs (with initial conditions):

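$$\frac{dA(\tau)}{d\tau} = -\theta^{Q\top}\mathcal{K}^{Q\top}B(\tau) + \frac{1}{2}\sum_{i=1}^N\left[\Sigma^\top B(\tau)\right]_i^2\alpha_i - \delta_0, \qquad A(0) = 0,$$
$$\frac{dB(\tau)}{d\tau} = -\mathcal{K}^{Q\top}B(\tau) - \frac{1}{2}\sum_{i=1}^N\left[\Sigma^\top B(\tau)\right]_i^2\beta_i + \delta_x, \qquad B(0) = 0,$$
which can be quickly solved numerically for all maturities $\tau$ on a computer.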
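Below is a minimal numerical sketch of this step. It assumes a fixed-step classical Runge–Kutta (RK4) integrator; any ODE solver would do. The array layout is our own convention: `beta` stores $\beta_i^\top$ as its i-th row, and `theta @ K.T @ B` implements $\theta^{Q\top}\mathcal{K}^{Q\top}B$. With a single Gaussian factor ($\alpha = 1$, $\beta = 0$) the output should reproduce the Vasicek closed form. With a single square-root factor ($\alpha = 0$, $\beta = 1$) it should reproduce the CIR one. Both give convenient sanity checks. Parameter values are hypothetical.

```python
import numpy as np

def riccati_rhs(B, K, theta, Sigma, alpha, beta, delta0, delta_x):
    """dA/dtau and dB/dtau for P = exp(A - B'X) in the affine model (32)-(34)."""
    sb2 = (Sigma.T @ B) ** 2                      # entries [Sigma' B]_i^2
    dA = -theta @ K.T @ B + 0.5 * np.sum(sb2 * alpha) - delta0
    dB = -K.T @ B - 0.5 * beta.T @ sb2 + delta_x  # beta.T @ v = sum_i v_i beta_i
    return dA, dB

def solve_AB(tau, K, theta, Sigma, alpha, beta, delta0, delta_x, n_steps=1000):
    """Integrate from A(0) = 0, B(0) = 0 up to maturity tau with classical RK4."""
    A, B = 0.0, np.zeros(len(delta_x))
    h = tau / n_steps
    f = lambda b: riccati_rhs(b, K, theta, Sigma, alpha, beta, delta0, delta_x)
    for _ in range(n_steps):
        a1, k1 = f(B)                   # the right-hand side does not depend on A
        a2, k2 = f(B + 0.5 * h * k1)
        a3, k3 = f(B + 0.5 * h * k2)
        a4, k4 = f(B + h * k3)
        A += h * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        B = B + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return A, B

# One-factor Gaussian case (hypothetical parameters): r = X, dX = kappa(theta - X)dt + sigma dZ
kappa, theta, sigma, tau = 0.3, 0.04, 0.015, 7.0
A, B = solve_AB(tau, np.array([[kappa]]), np.array([theta]), np.array([[sigma]]),
                np.array([1.0]), np.array([[0.0]]), 0.0, np.array([1.0]))
```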

Bond yields, defined by $Y(t,\tau) = -\frac{1}{\tau}\log P(t,\tau)$, are linear in $\{X\}$:
$$Y(t,\tau) = -\frac{A(\tau)}{\tau} + \frac{B(\tau)^\top}{\tau}X(t). \qquad (36)$$
Two important points:

1. Empirically, by observing yields $\{Y\}$, one can ‘back out’ the latent variables $\{X\}$ if the parameter vector $\{\Theta\}$ (and hence $A(\tau)$ and $B(\tau)^\top$) is known.
⇒ Empirically, one usually assumes that N yields are measured without error.
⇒ Below, we demonstrate that this assumption is strongly rejected empirically.
2. Theoretically, this seems to suggest that one can ‘rotate’ the state vector (and its dynamics) from the latent variables $\{X\}$ to the observed yields $\{Y\}$.
⇒ In fact, one cannot do so in general and maintain tractability,
⇒ because $A(\tau)$ and $B(\tau)^\top$ are not known in closed form.
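Point 1 can be illustrated with a short sketch. Once $A(\tau_j)$ and $B(\tau_j)$ are known (e.g. from the ODEs above), equation (36) is a linear system in $X(t)$. The sketch assumes N yields observed without error. It uses two independent mean-reverting factors with Vasicek-type loadings purely as hypothetical inputs, and the values of `A` are made up for illustration.

```python
import numpy as np

def invert_state(yields, taus, A, B):
    """Back out X(t) from N yields observed without error, using eq. (36).

    yields, taus, A: length-N arrays; B: (N, N) array whose row j is B(tau_j)'.
    """
    taus = np.asarray(taus, dtype=float)
    loadings = B / taus[:, None]     # row j: B(tau_j)' / tau_j
    intercepts = -A / taus           # -A(tau_j) / tau_j
    return np.linalg.solve(loadings, yields - intercepts)

# Hypothetical inputs: two maturities, two independent mean-reverting factors
taus = np.array([2.0, 10.0])
kappas = np.array([0.8, 0.05])
B = (1 - np.exp(-np.outer(taus, kappas))) / kappas   # B_i(tau_j) = (1 - e^{-kappa_i tau_j}) / kappa_i
A = np.array([-0.001, -0.02])                        # made-up A(tau_j) values
X_true = np.array([0.01, 0.03])
yields = -A / taus + (B @ X_true) / taus
X_hat = invert_state(yields, taus, A, B)
```

A different parameter vector (hence different `A`, `B`) maps the same observed yields to a different `X_hat`. This is why the latent state is only defined once the parameters are fixed.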

5.2.4 Problems with Latent Variable specification:

• Admissibility. The SDEs must be well-defined. For example, if $X_1$ is Gaussian it cannot show up under any square root, so we must set all $\beta_{i,1} = 0$. Dai and Singleton (2000) have defined the $A_m(n)$ families: n is the number of state variables, and m is the number of state variables driving conditional variances (those that show up ‘under the square root’).
• Identification and maximal models. Not all parameters are identified from an econometric point of view (because we observe yields, not the state variables). This leads DS (2000) to define maximal models: those with the maximum number of separately identifiable parameters.
• Invariant transformations. Invariant transformations consist in changing the state variables and/or rotating the Brownian motion vector without affecting the dynamics of the short rate, thus leaving bond prices unchanged.

Dai and Singleton (2000) propose a canonical representation for affine models in terms of latent variables that is admissible, and maximal.

Instead, we propose an alternative canonical representation in terms of observable state variables.

5.2.5 Why rotate from latent variables to observables?

• Latent variables are identified only after the parameter vector is known.
Ex: 1-factor model: $r(t) = \delta_0 + \delta_x X(t) \Rightarrow X(t) = \frac{1}{\delta_x}\left(r(t) - \delta_0\right)$.
$r(t)$ (= the yield at zero maturity) is identified independently of the parameters, but the definition of $X(t)$ changes every time a new parameter vector is tested.
⇒ It is difficult to come up with a reasonable first guess for $\{\Theta\}$.
⇒ This may increase the number of local maxima.
• It is difficult to identify whether a model is ‘maximal’ (Dai and Singleton (2000)).
⇒ Models which appear to be well-specified in fact contain some parameters which are not identifiable.

Q? Is there a tractable way to rotate from latent variables to observ- ables?

A! Yes! But rather than yields of finite maturity, need Taylor series expansion of yields around zero maturity:

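$$Y(t,\tau) = Y(t,0) + \tau\,\partial_\tau Y(t,\tau)\big|_{\tau=0} + \frac{\tau^2}{2!}\,\partial^2_\tau Y(t,\tau)\big|_{\tau=0} + \ldots$$
$$A(\tau) = A(0) + \tau\,\partial_\tau A(\tau)\big|_{\tau=0} + \frac{\tau^2}{2!}\,\partial^2_\tau A(\tau)\big|_{\tau=0} + \ldots$$
$$B(\tau) = B(0) + \tau\,\partial_\tau B(\tau)\big|_{\tau=0} + \frac{\tau^2}{2!}\,\partial^2_\tau B(\tau)\big|_{\tau=0} + \ldots$$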

Q? Why does this lead to tractability?

A! Because the ODEs that define $A(\tau)$ and $B(\tau)$ are effectively expansions about $\tau = 0$.

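Using $A(0) = B(0) = 0$ and collecting terms of the same order in $\tau$, we find:
$$Y^n(t) \equiv \partial^n_\tau Y(t,\tau)\big|_{\tau=0} = \frac{1}{n+1}\left(-\partial^{n+1}_\tau A(\tau)\big|_{\tau=0} + \sum_{i=1}^N \partial^{n+1}_\tau B_i(\tau)\big|_{\tau=0}\,X_i(t)\right), \qquad \forall n = 0, 1, 2, \ldots$$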

The first few Taylor series components have simple interpretations:
$$Y^0(t) = r(t), \qquad Y^1(t) = \tfrac{1}{2}\mu^Q(t), \qquad Y^2(t) = \tfrac{1}{3}\left(\frac{E^Q_t[d\mu^Q(t)]}{dt} - V(t)\right),$$
where $\mu^Q = E^Q[dr(t)]/dt$ and $V(t) = (dr(t))^2/dt$.
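These interpretations can be checked numerically in a model where everything is known in closed form. The sketch below assumes one-factor Vasicek dynamics under Q, $dr = \kappa(\theta - r)dt + \sigma dZ^Q$. Then $\mu^Q = \kappa(\theta - r)$, $E^Q_t[d\mu^Q]/dt = -\kappa\mu^Q$ and $V = \sigma^2$. The sketch estimates $Y^1$ and $Y^2$ by one-sided finite differences of the closed-form yield at small maturities. Parameter values are hypothetical.

```python
import math

def vasicek_yield(r, tau, kappa, theta, sigma):
    """Closed-form Vasicek yield Y(t, tau) = -log P(t, tau) / tau."""
    b = -math.expm1(-kappa * tau) / kappa
    a = (b - tau) * (theta - sigma**2 / (2 * kappa**2)) - sigma**2 * b**2 / (4 * kappa)
    return -(a - b * r) / tau

def taylor_components(r, kappa, theta, sigma, h=1e-3):
    """One-sided finite-difference estimates of Y^1 and Y^2 at tau = 0 (Y^0 = r)."""
    y_h = vasicek_yield(r, h, kappa, theta, sigma)
    y_2h = vasicek_yield(r, 2 * h, kappa, theta, sigma)
    y1 = (4 * y_h - y_2h - 3 * r) / (2 * h)   # second-order accurate first derivative
    y2 = (y_2h - 2 * y_h + r) / h**2          # first-order accurate second derivative
    return y1, y2

r, kappa, theta, sigma = 0.03, 0.3, 0.04, 0.015
y1, y2 = taylor_components(r, kappa, theta, sigma)
mu_q = kappa * (theta - r)                    # E^Q[dr]/dt
y1_theory = 0.5 * mu_q                        # Y^1 = mu^Q / 2
y2_theory = (-kappa * mu_q - sigma**2) / 3    # Y^2 = (E^Q[d mu^Q]/dt - V) / 3
```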

We call a representation of the state vector canonical if it is written in terms of:

• Taylor series components of the term structure at zero (i.e., the $Y^n$),
• the quadratic co-variations of the $Y^n$ (i.e., $V^{i,j} = dY^i\,dY^j/dt$).

Advantages:

1. Factors are intuitive and have a physical interpretation (level, slope, curvature, spot rate volatility, etc.).
2. Theoretical observability of the factors guarantees that the model is ‘Q-maximal’: all parameters are identifiable from fixed income securities.
3. ‘Q-maximality’ is independent of the risk-premium specification (unlike DS 2000).
4. Model-insensitive estimates of the state vector are readily available:
(a) the state vector is empirically ‘observable’;
(b) one can use, e.g., OLS to estimate a ‘first guess’ at the parameter vector.
5. The model remains affine and tractable.
6. It identifies which parameters are identifiable from bond prices alone under ‘USV’.

5.2.6 Example of Identification: 2-factor Gaussian Model

The ‘maximum’ 2-factor Gaussian model under the Q measure:
$$dr_t = (\alpha_r + \beta_{rr}r_t + \beta_{rx}x_t)\,dt + \sigma_r\,dZ^Q_{r,t}$$
$$dx_t = (\alpha_x + \beta_{xr}r_t + \beta_{xx}x_t)\,dt + \sigma_x\,dZ^Q_{x,t}$$

‘Canonical representation’ obtained by rotating from $(r_t, x_t)$ to $(r_t, \mu^Q_t)$, where $\mu^Q_t = \alpha_r + \beta_{rr}r_t + \beta_{rx}x_t$.

Therefore

$$dr_t = \mu^Q_t\,dt + \sigma_r\,dZ^Q_{r,t}$$
$$d\mu^Q_t = (\gamma_\mu - \kappa_{\mu r}r_t - \kappa_{\mu\mu}\mu^Q_t)\,dt + \sigma_\mu\,dZ^Q_{\mu,t}$$

• Only 6 parameters in the canonical representation (instead of 9).
• $\mu^Q_t$ is observable (it is twice the slope of the term structure at zero).
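The rotation can be written out explicitly. Differentiate $\mu^Q_t$ and substitute $x_t = (\mu^Q_t - \alpha_r - \beta_{rr}r_t)/\beta_{rx}$, which requires $\beta_{rx} \neq 0$. This gives $\gamma_\mu = \beta_{rx}\alpha_x - \beta_{xx}\alpha_r$, $\kappa_{\mu r} = \beta_{rr}\beta_{xx} - \beta_{rx}\beta_{xr}$ and $\kappa_{\mu\mu} = -(\beta_{rr} + \beta_{xx})$.

The sketch below computes this mapping. It assumes that the 9 original parameters include a correlation $\rho_{rx}$ between $Z_r$ and $Z_x$. Under that assumption the 6 canonical parameters are $(\sigma_r, \gamma_\mu, \kappa_{\mu r}, \kappa_{\mu\mu}, \sigma_\mu, \rho_{r\mu})$. Function and argument names and the parameter values are our own illustrations.

```python
import math

def canonical_2f(alpha_r, beta_rr, beta_rx, sigma_r,
                 alpha_x, beta_xr, beta_xx, sigma_x, rho_rx):
    """Map the Q-parameters of (r, x) to those of the canonical (r, mu^Q) representation."""
    if beta_rx == 0:
        raise ValueError("beta_rx = 0: mu^Q does not depend on x, rotation not invertible")
    gamma_mu = beta_rx * alpha_x - beta_xx * alpha_r
    kappa_mur = beta_rr * beta_xx - beta_rx * beta_xr   # determinant of the drift matrix
    kappa_mumu = -(beta_rr + beta_xx)                   # minus its trace
    # d mu^Q = beta_rr dr + beta_rx dx, so its shock mixes Z_r and Z_x
    var_mu = ((beta_rr * sigma_r) ** 2 + (beta_rx * sigma_x) ** 2
              + 2 * beta_rr * beta_rx * sigma_r * sigma_x * rho_rx)
    sigma_mu = math.sqrt(var_mu)
    rho_rmu = (beta_rr * sigma_r**2 + beta_rx * sigma_r * sigma_x * rho_rx) / (sigma_r * sigma_mu)
    return {"sigma_r": sigma_r, "gamma_mu": gamma_mu, "kappa_mur": kappa_mur,
            "kappa_mumu": kappa_mumu, "sigma_mu": sigma_mu, "rho_rmu": rho_rmu}

# Hypothetical parameter values
orig = dict(alpha_r=0.01, beta_rr=-0.2, beta_rx=0.5, sigma_r=0.01,
            alpha_x=0.002, beta_xr=0.1, beta_xx=-0.6, sigma_x=0.008, rho_rx=-0.3)
canon = canonical_2f(**orig)
```

Three drift parameters disappear because only the trace and determinant of the drift matrix (plus one intercept combination) affect the dynamics of $(r, \mu^Q)$.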

5.2.7 Model-Insensitive Estimation of the State Variables

• Simulate a two-factor $A_2(2)$ model (parameters from Duffie and Singleton (1997)).

• Sample 10 years of weekly data for maturities $\{0.5, 1, 2, 5, 7, 10\}$ years.

• Add i.i.d. noise with either 2bp or 5bp standard errors.

• Estimate the level ($Y^0 = r$) and slope ($Y^1 = \mu^Q/2$) of the term structure (at $\tau = 0$) using quadratic and cubic polynomials fitted with OLS.

• Regress the true values from the simulation on the estimates obtained from the polynomial fits (see Table 1 of CD, Goldstein, Jones (2003)):

$$\text{true } r_t = \alpha + \beta \times \text{estimated } r_t + \epsilon_t,$$
$$\text{true } \mu^Q_t = \alpha' + \beta' \times \text{estimated } \mu^Q_t + \epsilon'_t.$$
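The polynomial step can be sketched as follows, assuming yields are observed on the maturity grid above at a single date. Since $Y^1 = \mu^Q/2$, the fitted slope at zero is doubled to obtain $\mu^Q$. The synthetic curve and the 2bp noise in the usage lines are hypothetical. `numpy.polynomial.polynomial.polyfit` performs the least-squares fit and returns coefficients from lowest to highest order.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def short_end_estimates(maturities, yields, degree=3):
    """OLS polynomial fit of the yield curve in maturity; returns (r_hat, mu_hat)."""
    coefs = npoly.polyfit(maturities, yields, degree)  # coefs[k] multiplies tau**k
    r_hat = coefs[0]          # level at tau = 0:  Y^0 = r
    mu_hat = 2.0 * coefs[1]   # slope at tau = 0:  Y^1 = mu^Q / 2
    return r_hat, mu_hat

maturities = np.array([0.5, 1, 2, 5, 7, 10])
rng = np.random.default_rng(0)
# Hypothetical "true" curve plus 2bp i.i.d. measurement noise
true_curve = 0.03 + 0.004 * maturities - 0.0003 * maturities**2 + 0.00001 * maturities**3
noisy = true_curve + 0.0002 * rng.standard_normal(maturities.size)
r_hat, mu_hat = short_end_estimates(maturities, noisy, degree=3)
```

Repeating the fit date by date produces the estimated series that enter the regressions above.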

5.2.8 Canonical Representation and Maximality: the A1(3) model

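• Consider the 3-factor model of the short rate $r = Y_1 + Y_2 + Y_3$ with:
$$dY_1 = \kappa_{11}(\theta_1 - Y_1)\,dt + \sigma_{11}\sqrt{Y_1}\,dZ_1$$
$$dY_2 = (\kappa_{21}Y_1 + \kappa_{22}Y_2 + \kappa_{23}Y_3)\,dt + \sigma_{21}\sqrt{Y_1}\,dZ_1 + \sigma_{22}\sqrt{\alpha_2 + \beta_2 Y_1}\,dZ_2 + \sigma_{23}\sqrt{\alpha_3 + \beta_3 Y_1}\,dZ_3$$
$$dY_3 = (\kappa_{31}Y_1 + \kappa_{32}Y_2 + \kappa_{33}Y_3)\,dt + \sigma_{31}\sqrt{Y_1}\,dZ_1 + \sigma_{32}\sqrt{\alpha_2 + \beta_2 Y_1}\,dZ_2 + \sigma_{33}\sqrt{\alpha_3 + \beta_3 Y_1}\,dZ_3$$
• Rotating to $(r, \mu^Q, V)$ we obtain:
$$dV_t = (\gamma_V - \kappa_V V_t)\,dt + \sigma_V\sqrt{V_t - \psi_1}\,dZ_1$$
$$dr_t = \mu^Q_t\,dt + \sigma_1\sqrt{V_t - \psi_1}\,dZ_1 + \sqrt{\sigma_2^2 V_t - \psi_2}\,dZ_2 + \sqrt{\sigma_3^2 V_t - \psi_3}\,dZ_3$$
$$d\mu^Q_t = (m_0 + m_r r_t + m_\mu\mu^Q_t + m_V V_t)\,dt + \nu_1\sqrt{V_t - \psi_1}\,dZ_1 + \nu_2\sqrt{\sigma_2^2 V_t - \psi_2}\,dZ_2 + \nu_3\sqrt{\sigma_3^2 V_t - \psi_3}\,dZ_3,$$
where, by definition of $V_t$: $\sigma_1^2 + \sigma_2^2 + \sigma_3^2 = 1$ and $\sigma_1^2\psi_1 + \psi_2 + \psi_3 = 0$.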

There are also a few admissibility restrictions (see CDGJ 2003).
⇒ The maximal model has 14 parameters instead of 19 (confirms DS (2000)).

5.3 Unspanned Stochastic volatility

• Numerous multifactor stochastic volatility models of the TS: Fong and Vasicek (91), Longstaff and Schwartz (92), Chen and Scott (93), Balduzzi et al. (96), Chen (96), DS (00), . . .
• All of these models fall within the ‘affine class’.

⇒ In general, stochastic volatility can be reinterpreted as changes in yields (Duffie and Kan (DK 96)).
⇒ If enough bonds with different maturities are traded, markets are complete.
⇒ All derivatives can be hedged with bonds alone.

Example: Term Structure Stochastic Volatility Model

• Term structure derivatives: Longstaff and Schwartz (1992):
$$dr = \kappa_r(\theta_r - r)\,dt + \sqrt{V}\,dz^Q_1$$
$$dV = \kappa_V(\theta_V - V)\,dt + \sigma\sqrt{V}\,dz^Q_2$$

Note: $P^T(t) = P^T(t, r_t, V_t) = e^{A(T-t) + B(T-t)r_t + C(T-t)V_t}$
– Longstaff and Schwartz (1992)
– More generally, Duffie and Kan (1996)

⇒ Volatility risk can be hedged with an appropriate position in any two bonds.
⇒ Volatility plays a dual role (cross-section and time series):
1) it is a linear combination of yields;
2) it is the (instantaneous) variance of the spot rate.
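Below is a minimal sketch of this hedging argument. It assumes bond prices of the exponential-affine form in the note above, $P = e^{A + Br + CV}$, so that $\partial P/\partial r = BP$ and $\partial P/\partial V = CP$. The loadings, prices and target sensitivities are hypothetical numbers, not output of the Longstaff–Schwartz model. The linear system is solvable whenever the two bonds have different ratios $B/C$.

```python
import numpy as np

def two_bond_hedge(prices, B, C, target_dr, target_dV):
    """Holdings (n1, n2) in two zero-coupon bonds that match a claim's r- and V-sensitivities.

    Uses dP/dr = B * P and dP/dV = C * P for P = exp(A + B r + C V).
    """
    prices, B, C = (np.asarray(v, dtype=float) for v in (prices, B, C))
    sens = np.vstack([B * prices, C * prices])   # row 0: dP/dr, row 1: dP/dV (one column per bond)
    return np.linalg.solve(sens, np.array([target_dr, target_dV]))

# Hypothetical 2y and 10y bonds; target: no rate exposure, unit exposure to V
prices = [0.94, 0.62]
B = [-1.8, -6.5]         # prices fall when r rises
C = [0.3, 4.0]           # volatility loadings differ in shape across maturities
holdings = two_bond_hedge(prices, B, C, target_dr=0.0, target_dV=1.0)
```

The same construction is impossible under USV, where no combination of bond prices loads on $V$.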

Contrast: Equity Stochastic Volatility Model

• Equity derivatives: Heston (1993):
$$\frac{dS}{S} = r\,dt + \sqrt{V}\,dz^Q_1$$
$$dV = \kappa(\theta - V)\,dt + \sigma\sqrt{V}\,dz^Q_2$$

– $dz^Q_1$ drives innovations in the stock price

– $dz^Q_2$ drives innovations in volatility

⇒ Volatility risk cannot be hedged by any portfolio of stock and bond.

Difference: Heston models a traded asset and its volatility, whereas the short rate model models the short rate and its volatility, and derives bond prices from them.

Q? If one can trade in a large number (possibly infinite) of bonds, are markets necessarily complete?

Q? Or does there exist a (multi-factor) short rate model where bond markets are incomplete (i.e., fixed income derivatives are not spanned by bond prices)?

From an empirical perspective this may be a desirable feature, since it appears hard in practice to hedge fixed-income derivatives (especially those highly sensitive to volatility risk, such as straddles) with positions in bonds only (CDG (2001)).5

5 There is also mounting evidence that existing models, while adequately fitting the TS, do not explain the dynamics of derivatives such as caps and floors (Longstaff et al. (00), Jagannathan and Sun (99)).

A!: CDG (2002) give necessary and sufficient conditions for USV in an affine model:

• Two-dimensional Markov models cannot display USV (and be arbitrage-free).
• Three-dimensional (non-Gaussian) affine models (with two or three factors) can display USV if parameters satisfy certain restrictions.

5.3.1 Empirical Evidence

• Data:
- Monthly implied volatilities on caps and floors from Feb 95 to Dec 00 (∼ minimizes the effect of ‘stale quotes’).
- Swap rates for all available (10) maturities + 6-month LIBOR rate.

- Three currencies: US($), UK(£), JP(¥).

• Methodology:
- Construct portfolios of at-the-money cap and floor implied volatilities (∼ Black's formula).
- Estimate 1-month straddle returns (long both cap and floor) (∼ interpolate ATM cap and floor implied volatilities).
⇒ Straddles are ‘delta’ neutral (volatility-sensitive).
⇒ 1-month returns minimize interpolation error.

- OLS regression of straddle returns on swap rate changes

- Principal component analysis of residuals

• Results:
- Low R² in the regression: even for the 10-factor model, as little as ∼10% of the variation is explained.

- High correlation of the residuals: the first (two) component(s) explain ∼85% (98%) of the remaining variation.

• Implications for stochastic volatility models of the term structure:

- The principal factor driving volatility should not drive TS level.

- does not span all term structure risk. Fixed income derivatives are not redundant securities. ⇒
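The two-step methodology (OLS of straddle returns on swap-rate changes, then principal components of the residuals) can be sketched as below. The data are synthetic and purely illustrative. They are generated with a common ‘volatility’ shock that is independent of swap-rate changes, which is the situation the USV hypothesis describes. Sample sizes only loosely mimic the monthly sample above.

```python
import numpy as np

def spanning_regressions(straddle_ret, swap_changes):
    """OLS of each straddle series on swap-rate changes; PCA of the residual covariance.

    Returns per-series R^2 and the fraction of residual variance explained by each PC.
    """
    T = swap_changes.shape[0]
    X = np.column_stack([np.ones(T), swap_changes])                  # intercept + regressors
    coefs, *_ = np.linalg.lstsq(X, straddle_ret, rcond=None)
    resid = straddle_ret - X @ coefs
    r2 = 1.0 - resid.var(axis=0) / straddle_ret.var(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(resid, rowvar=False))[::-1]  # descending order
    return r2, eigvals / eigvals.sum()

rng = np.random.default_rng(1)
T, n_swaps, n_straddles = 70, 8, 7
swap_changes = rng.standard_normal((T, n_swaps))
vol_shock = rng.standard_normal(T)                                   # not spanned by swap changes
straddle_ret = (0.2 * swap_changes @ rng.standard_normal((n_swaps, n_straddles))
                + np.outer(vol_shock, np.linspace(0.5, 1.5, n_straddles))
                + 0.1 * rng.standard_normal((T, n_straddles)))
r2, pc_share = spanning_regressions(straddle_ret, swap_changes)
```

In this synthetic setting the first residual component absorbs most of the unexplained variance, mirroring the pattern reported in the tables below.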

Maturity   US R²   US Adj. R²   UK R²   UK Adj. R²   JP R²   JP Adj. R²
1          0.215   0.085        0.27    0.134        0.229   0.044
2          0.316   0.202        0.177   0.023        0.239   0.057
3          0.349   0.241        0.155   -0.002       0.305   0.139
4          0.341   0.232        0.162   0.006        0.464   0.336
5          0.385   0.283        0.149   -0.01        0.468   0.34
7          0.439   0.346        0.12    -0.044       0.513   0.396
10         0.478   0.391        0.097   -0.071       0.398   0.254

Table 1: R² and adjusted R² of the regression of straddle returns with maturities 1Y, 2Y, 3Y, 4Y, 5Y, 7Y, 10Y on the changes in swap rates for all available maturities ({0.5, 1, 2, 3, 4, 5, 7, 10} for US data and {0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} for UK and JP data). Although multicollinearity is evident in the regressors, the R² represents an upper bound on the proportion of the variance of straddle returns that can be hedged by trading in swaps.

              US Residuals               UK Residuals               JP Residuals
Eigenvector   Eigenvalue   Share expl.   Eigenvalue   Share expl.   Eigenvalue   Share expl.
1             0.07184      0.87603       0.07751      0.84485       0.157        0.83438
2             0.00865      0.10546       0.01215      0.13245       0.02217      0.11782
3             0.00091      0.01113       0.00139      0.01511       0.00482      0.02561
4             0.00035      0.0043        0.00038      0.00411       0.00235      0.01251
5             0.00014      0.00169       0.00018      0.00193       0.00101      0.00536
6             0.00009      0.00109       0.00008      0.00091       0.00051      0.0027
7             0.00002      0.0003        0.00006      0.00064       0.00031      0.00163

Table 2: Eigenvalues of the principal component decomposition of the covariance matrix of residuals, ordered by magnitude of the eigenvalue ("Share expl." is the fraction of residual variance explained). Note that over 80% of the variation is captured by the first principal component.

5.3.2 USV Affine Models

• In a d-factor model, USV occurs if the rank of the diffusion matrix of any vector of bond prices is less than d.
⇒ The bond market is incomplete.
• In an N-dimensional, d-factor affine model,
$$P^T(t) = \exp\left(A(T-t) + \sum_{i=1}^N B_i(T-t)X_i(t)\right),$$
where $A(\tau), B_1(\tau), \ldots, B_N(\tau)$ are continuous deterministic functions which satisfy a system of ODEs (DK96).

• USV is equivalent to: $\exists\, \beta_1, \ldots, \beta_d$, not all zero, such that
$$\sum_{i=1}^d \beta_i B_i(\tau) = 0 \qquad \forall \tau \geq 0.$$
We refer to Collin-Dufresne and Goldstein (2001) for details of the analysis and further discussion of this question. Below we summarize the main results.

• First, it can be shown that no two-dimensional model can display unspanned stochastic volatility.

In other words, if the state of the term structure can be described with two state variables, the short rate and its volatility, that follow a joint Markov process, then, if sufficiently many zero-coupon bonds are traded, all fixed-income securities can be hedged by trading only in bonds. Intuition for this result can be gained by thinking in terms of ‘duration’ and ‘convexity.’ Bonds with different maturities have different convexity, and thus react differently to a volatility shock. For a shock in volatility to leave bond prices unchanged, the short rate has to adjust to compensate for the changes due to volatility. But the short rate cannot accommodate both duration and convexity differences across bonds. Indeed, this would imply that duration is proportional to convexity, which is inconsistent with an arbitrage-free model of the term structure.

Sketch of proof:

By definition, a bivariate model exhibiting USV would imply that bond prices are functions of only the time-to-maturity and the spot rate, and independent of the spot rate volatility V: $P^T(t, r_t, V_t) = P^T(t, r_t)$. This in turn implies that bond prices must satisfy
$$\frac{1}{2}P^T_{rr}(t,r)\,\sigma_r^2(t,r,V) + P^T_r(t,r)\,\mu_r(t,r,V) = r\,P^T(t,r) - P^T_t(t,r) \qquad \forall T. \qquad (37)$$
Note that the right-hand side of equation (37) is a function only of r, while the left-hand side is a function of both V and r. Since it is not possible for the ratio of duration and convexity to be constant across maturities, there is no way for the left-hand side to be independent of V, unless the spot rate process itself is one-factor Markov ($\mu_r(t,r,V) = \mu_r(t,r)$, $\sigma_r(t,r,V) = \sigma_r(t,r)$), which also precludes USV. Note that this rules out USV in models like Longstaff and Schwartz (92) and Fong and Vasicek (91).

• However, three-dimensional affine models (with two or three factors) can display USV if parameters satisfy certain restrictions. In fact, we provide below necessary and sufficient conditions for trivariate affine models to display USV.

First, it is tedious but easy to show that any trivariate affine USV model can be rewritten in terms of the following state variables:

1) the spot rate, r,
2) the drift of the spot rate, $\mu = \frac{1}{dt}E[dr]$,
3) the variance of the spot rate, $V = \frac{1}{dt}E[(dr)^2]$.

This allows us to limit our search for trivariate models that exhibit USV to models that support bond price formula of the form

$$P(s, r_s, \mu_s) = e^{M_0(T-s) + M_1(T-s)r_s + M_2(T-s)\mu_s}. \qquad (38)$$
Therefore, it is convenient to define
$$\frac{1}{dt}E[d\mu] = m^\mu_0 + m^\mu_r r + m^\mu_\mu \mu + m^\mu_V V \qquad (39)$$
$$\frac{1}{dt}E[(d\mu)^2] = \sigma^\mu_0 + \sigma^\mu_r r + \sigma^\mu_\mu \mu + \sigma^\mu_V V \qquad (40)$$
$$\frac{1}{dt}E[dr\,d\mu] = c^{r,\mu}_0 + c^{r,\mu}_r r + c^{r,\mu}_\mu \mu + c^{r,\mu}_V V \qquad (41)$$
By applying Itô's lemma to equation (38), and then collecting terms of order constant, r, and μ, we find that the time-dependent coefficients are defined through
$$M_0' = m^\mu_0 M_2 + \frac{\sigma^\mu_0}{2}M_2^2 + c^{r,\mu}_0 M_1 M_2 \qquad (42)$$
$$M_1' = m^\mu_r M_2 + \frac{\sigma^\mu_r}{2}M_2^2 + c^{r,\mu}_r M_1 M_2 - 1 \qquad (43)$$
$$M_2' = m^\mu_\mu M_2 + \frac{\sigma^\mu_\mu}{2}M_2^2 + c^{r,\mu}_\mu M_1 M_2 + M_1, \qquad (44)$$
and satisfy the boundary conditions

$$M_0(0) = 0, \quad M_1(0) = 0, \quad M_2(0) = 0. \qquad (45)$$
Furthermore, by collecting terms of order V, we find that this model supports USV if and only if, for all maturities τ, the following condition holds:
$$0 = m^\mu_V M_2(\tau) + \frac{\sigma^\mu_V}{2}M_2^2(\tau) + c^{r,\mu}_V M_1(\tau)M_2(\tau) + \frac{1}{2}M_1^2(\tau). \qquad (46)$$
• It then follows that necessary and sufficient conditions for the model to display USV are:

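either
$$m^\mu_r = -\left(2c^{r,\mu}_\mu + 2(c^{r,\mu}_V)^2 + \frac{c^{r,\mu}_r}{c^{r,\mu}_V}\right), \quad m^\mu_\mu = 3c^{r,\mu}_V, \quad m^\mu_V = 1,$$
$$\sigma^\mu_r = -2c^{r,\mu}_\mu\left(c^{r,\mu}_\mu + (c^{r,\mu}_V)^2 + \frac{c^{r,\mu}_r}{c^{r,\mu}_V}\right), \quad \sigma^\mu_\mu = 4c^{r,\mu}_V c^{r,\mu}_\mu + 2c^{r,\mu}_r, \quad \sigma^\mu_V = c^{r,\mu}_\mu + (c^{r,\mu}_V)^2 + \frac{c^{r,\mu}_r}{c^{r,\mu}_V},$$
or
$$m^\mu_r = -2(c^{r,\mu}_V)^2 - c^{r,\mu}_\mu, \quad m^\mu_\mu = 3c^{r,\mu}_V, \quad m^\mu_V = 1,$$
$$\sigma^\mu_r = -2c^{r,\mu}_V\left(c^{r,\mu}_r + 2c^{r,\mu}_\mu c^{r,\mu}_V\right), \quad \sigma^\mu_\mu = 4c^{r,\mu}_r + 6c^{r,\mu}_\mu c^{r,\mu}_V, \quad \sigma^\mu_V = (c^{r,\mu}_V)^2. \qquad (47)$$
Remarks: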

• First, there are two sets of parameter restrictions that generate USV because equation (46) is a quadratic equation in $M_1(\cdot)$ or $M_2(\cdot)$. This in turn generates two possible solutions for $M_1(\cdot)$ in terms of $M_2(\cdot)$.
• Second, these two sets of restrictions reduce to the same set if and only if $c^{r,\mu}_V \neq 0$ and $c^{r,\mu}_r + c^{r,\mu}_\mu c^{r,\mu}_V = 0$. This condition obtains, for example, when the covariance between the short rate and its drift depends only on the volatility, an important special case (basically, the so-called A1(3) models) which we examine below.
• Finally, we note that several of the USV restrictions presented above occur naturally once we limit the class of models to those which are admissible: that is, those which restrict the ‘square-root’ state variables to be non-negative (see below).

To provide some intuition for the proof of this result: sufficiency obtains because the right-hand side of equation (46) can be shown to be identically zero when either of the two sets of parameter conditions holds. Necessity obtains because, if any one of the conditions is not satisfied, we can show, by taking repeated time-derivatives of the system of ODEs (Riccati equations) evaluated at τ = 0, that equation (46) cannot hold.

We note that the models of both Chen (1996) and Balduzzi, Das, Foresi and Singh (1996) cannot satisfy these necessary restrictions, and thus cannot display USV. Also, clearly, the A0(3) class of models of Dai and Singleton (2000) cannot exhibit USV (or incomplete bond

markets). However, Dai and Singleton’s (2000) maximal A1(3), A2(3) and A3(3) models do have the flexibility to exhibit USV. In fact, using our canonical representation for the maximal A1(3) model and applying the necessary and sufficient conditions for USV, we may derive the maximal model displaying USV.

5.3.3 Maximal A1(3) model with USV

Necessary and sufficient conditions for USV in the A1(3) case (CDG (2002)):
$$m_r = -2c_V^2, \qquad m_\mu = 3c_V, \qquad m_V = 1, \qquad \sigma^\mu_V = (c_V)^2,$$
where
$$dr\,d\mu^Q = (c_0 + c_V V)\,dt, \qquad (d\mu^Q)^2 = (\sigma^\mu_0 + \sigma^\mu_V V)\,dt.$$

The maximal model with USV is:
$$dV_t = (\gamma_V - \kappa_V V_t)\,dt + \sigma_V\sqrt{V_t - \psi_1}\,dZ_1$$
$$dr_t = \mu_t\,dt + \sigma_1\sqrt{V_t - \psi_1}\,dZ_1 + \sqrt{-\psi_2}\,dZ_2 + \sqrt{(1 - \sigma_1^2)V_t + \sigma_1^2\psi_1 + \psi_2}\,dZ_3$$
$$d\mu_t = (m_0 - 2c_V^2 r_t + 3c_V\mu_t + V_t)\,dt + c_V\sigma_1\sqrt{V_t - \psi_1}\,dZ_1 + \nu_2\sqrt{-\psi_2}\,dZ_2 + c_V\sqrt{(1 - \sigma_1^2)V_t + \sigma_1^2\psi_1 + \psi_2}\,dZ_3,$$
where

for stationarity: $\kappa_V > 0$, $c_V < 0$;
for admissibility: $\gamma_V - \kappa_V\psi_1 > 0$, $-\psi_2 > 0$, $1 > \sigma_1^2$, $\psi_1 + \psi_2 > 0$.
USV imposes 5 restrictions (the model has 9 parameters under the Q measure):

γv , κv , σv , ψ1, σ1, cV , ψ2, m0, ν2 (48)

Corresponding Bond prices are given by:

$$P(t,T) = \exp\left(A(T-t) - B_r(T-t)\,r_t - B_\mu(T-t)\,\mu^Q_t\right), \qquad (49)$$
where
$$B_r(\tau) = \frac{-3 + 4e^{c_V\tau} - e^{2c_V\tau}}{2c_V}, \qquad B_\mu(\tau) = \frac{(1 - e^{c_V\tau})^2}{2c_V^2},$$
$$A(\tau) = \frac{1}{96c_V^5}\Big[-3e^{4c_V\tau}(2c_Vc_0 - \sigma^\mu_0) + 16e^{3c_V\tau}(3c_Vc_0 - \sigma^\mu_0) + 25\sigma^\mu_0 - 48e^{c_V\tau}(-5c_Vc_0 - 2c_V^2m_0 + \sigma^\mu_0) + 12e^{2c_V\tau}\big(2c_V(-6c_0 - c_Vm_0) + 3\sigma^\mu_0\big) + 6c_V\big(-23c_0 - 12c_Vm_0 + 2(-6c_Vc_0 - 4c_V^2m_0 + \sigma^\mu_0)\tau\big)\Big].$$
• Note that the USV model is a two-factor model of the cross-section of bond prices, but a three-factor model of the time series of bonds.
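The closed forms in (49) can be checked against their ODEs. Write $P = e^{A - B_r r - B_\mu\mu^Q}$, so that $B_r = -M_1$ and $B_\mu = -M_2$ in the notation of (38). Imposing the A1(3) USV restrictions, the loadings must satisfy:

- $B_r' = 1 - 2c_V^2B_\mu$
- $B_\mu' = 3c_VB_\mu + B_r$
- $A' = -m_0B_\mu + \tfrac12\sigma^\mu_0B_\mu^2 + c_0B_rB_\mu$
- the USV condition $-B_\mu + \tfrac12c_V^2B_\mu^2 + c_VB_rB_\mu + \tfrac12B_r^2 = 0$

The sketch below checks these relations by central finite differences. Parameter values are hypothetical.

```python
import math

def usv_a13_coefficients(tau, c_V, m0, c0, sigma0):
    """A(tau), B_r(tau), B_mu(tau) of eq. (49) for the A1(3) USV model."""
    e = math.exp(c_V * tau)
    B_r = (-3 + 4 * e - e**2) / (2 * c_V)
    B_mu = (1 - e) ** 2 / (2 * c_V**2)
    A = (-3 * e**4 * (2 * c_V * c0 - sigma0)
         + 16 * e**3 * (3 * c_V * c0 - sigma0)
         + 25 * sigma0
         - 48 * e * (-5 * c_V * c0 - 2 * c_V**2 * m0 + sigma0)
         + 12 * e**2 * (2 * c_V * (-6 * c0 - c_V * m0) + 3 * sigma0)
         + 6 * c_V * (-23 * c0 - 12 * c_V * m0
                      + 2 * (-6 * c_V * c0 - 4 * c_V**2 * m0 + sigma0) * tau)) / (96 * c_V**5)
    return A, B_r, B_mu

def ode_residuals(tau, c_V, m0, c0, sigma0, h=1e-5):
    """Residuals of the loading ODEs (central differences) and of the USV condition."""
    A_p, Br_p, Bm_p = usv_a13_coefficients(tau + h, c_V, m0, c0, sigma0)
    A_m, Br_m, Bm_m = usv_a13_coefficients(tau - h, c_V, m0, c0, sigma0)
    A, B_r, B_mu = usv_a13_coefficients(tau, c_V, m0, c0, sigma0)
    dA, dBr, dBm = (A_p - A_m) / (2 * h), (Br_p - Br_m) / (2 * h), (Bm_p - Bm_m) / (2 * h)
    return (dBr - (1 - 2 * c_V**2 * B_mu),
            dBm - (3 * c_V * B_mu + B_r),
            dA - (-m0 * B_mu + 0.5 * sigma0 * B_mu**2 + c0 * B_r * B_mu),
            -B_mu + 0.5 * c_V**2 * B_mu**2 + c_V * B_r * B_mu + 0.5 * B_r**2)

params = dict(c_V=-0.5, m0=0.01, c0=0.0005, sigma0=0.0002)   # hypothetical values
residuals = ode_residuals(3.0, **params)
```

The last residual is the coefficient on $V$ in the pricing equation. Its vanishing is what makes bond prices independent of $V$.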

• $\gamma_V$, $\kappa_V$ are not identifiable from bond prices alone: in contrast to the claim of DS, ‘maximality’ must be defined relative to all fixed income derivatives.
• This also implies that the model can be ‘extended’ to allow for a very simple two-step calibration procedure to fixed-income derivatives (such as at-the-money caps/floors). First, as in Hull and White (1990), the parameter $m_0$ can be made time-dependent to fit the term structure of forward rates. Second, the parameters $\gamma_V$ and $\kappa_V$ can be made time-dependent to fit the term structure of cap volatilities, without affecting the initial calibration of the term structure of forward rates.
• Note: even if volatility is an arbitrary Markov process, we obtain the same affine yields, even though the dynamics of the state vector are not affine!

5.3.4 Maximal A1(4) model with USV

Similarly, we derive the maximal 4-factor USV model, which has state variables $(r, \mu, V, \theta)$ where $\theta = 3Y^2$ (local curvature):

$$dV_t = (\gamma_V - \kappa_V V_t)\,dt + \sigma_V\sqrt{V_t - \psi_1}\,dZ^Q_1(t)$$
$$dr_t = \mu^Q_t\,dt + \sigma_1\sqrt{V_t - \psi_1}\,dZ^Q_1(t) + \sqrt{(1 - \sigma_1^2)V_t + \sigma_1^2\psi_1 + \psi_3 + \psi_4}\,dZ^Q_2(t) + \sqrt{-\psi_3}\,dZ^Q_3(t) + \sqrt{-\psi_4}\,dZ^Q_4(t)$$
$$d\mu^Q_t = (\theta^Q_t + V_t)\,dt + c_{r\mu}\sigma_1\sqrt{V_t - \psi_1}\,dZ^Q_1(t) + c_{r\mu}\sqrt{(1 - \sigma_1^2)V_t + \sigma_1^2\psi_1 + \psi_3 + \psi_4}\,dZ^Q_2(t) + \nu_3\sqrt{-\psi_3}\,dZ^Q_3(t) + \nu_4\sqrt{-\psi_4}\,dZ^Q_4(t)$$
$$d\theta^Q_t = \left(a_0 - 2c_{r\mu}^2(3c_{r\mu} - a_\theta)\,r_t + (7c_{r\mu}^2 - 3c_{r\mu}a_\theta)\,\mu^Q_t + a_\theta\theta^Q_t + 3c_{r\mu}V_t\right)dt + c_{r\mu}^2\sigma_1\sqrt{V_t - \psi_1}\,dZ^Q_1(t) + c_{r\mu}^2\sqrt{(1 - \sigma_1^2)V_t + \sigma_1^2\psi_1 + \psi_3 + \psi_4}\,dZ^Q_2(t) + \eta_3\sqrt{-\psi_3}\,dZ^Q_3(t) + \eta_4\sqrt{-\psi_4}\,dZ^Q_4(t).$$

Note: the A1(4) USV model has a total of 14 risk-neutral parameters

(γV , κV , σV , ψ1, ψ3, ψ4, ν3, ν4, η3, η4, σ1, a0, crµ, aθ), as opposed to 22 for the unrestricted model.

The zero coupon bond price is given by:

$$P(t,T) = \exp\left(A(T-t) - B_r(T-t)\,r_t - B_\mu(T-t)\,\mu^Q_t - B_\theta(T-t)\,\theta^Q_t\right), \qquad (50)$$
where the deterministic functions $A(\tau)$, $B_r(\tau)$, $B_\mu(\tau)$, and $B_\theta(\tau)$ are obtained in closed form.

• Note that the A1(4) USV model is a 3-factor Gaussian model of the cross-section of bond prices, but a 4-factor model of the time series of bonds.

5.3.5 Specification of Risk-Premia

We specify second-generation affine risk premia so that the dynamics of the state vector for the unrestricted A1(3) model under the historical measure are:

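$$dV_t = \left[(\gamma_V + \lambda_{V0} - \lambda_V\psi_1) - (\kappa_V - \lambda_V)V_t\right]dt + \sigma_V\sqrt{V_t - \psi_1}\,dZ_1(t)$$
$$dr_t = \left[\lambda_{r0} + \lambda_{rr}r_t + (1 + \lambda_{r\mu})\mu^Q_t + \lambda_{rV}V_t\right]dt + \sigma_1\sqrt{V_t - \psi_1}\,dZ_1(t) + \sqrt{\sigma_2^2V_t - \psi_2}\,dZ_2(t) + \sqrt{\sigma_3^2V_t - \psi_3}\,dZ_3(t)$$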

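$$d\mu^Q_t = \left[(m_0 + \lambda_{\mu0}) + (m_r + \lambda_{\mu r})r_t + (m_\mu + \lambda_{\mu\mu})\mu^Q_t + (m_V + \lambda_{\mu V})V_t\right]dt + \nu_1\sqrt{V_t - \psi_1}\,dZ_1(t) + \nu_2\sqrt{\sigma_2^2V_t - \psi_2}\,dZ_2(t) + \nu_3\sqrt{\sigma_3^2V_t - \psi_3}\,dZ_3(t)$$
• All drift parameters in the $r_t$, $\mu^Q_t$, $V_t$ dynamics are risk-adjusted.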

• This slightly extends Duffee, but we need to guarantee that the Feller condition holds under both the physical and the risk-neutral measures for the existence of an EMM (Liptser and Shiryaev).
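Below is a minimal sketch of this check, assuming the second-generation specification of $dV_t$ above. Under Q, the shifted process $U = V - \psi_1$ has drift $(\gamma_V - \kappa_V\psi_1) - \kappa_VU$. Under P, the constant becomes $\gamma_V + \lambda_{V0} - \lambda_V\psi_1 - (\kappa_V - \lambda_V)\psi_1 = \gamma_V - \kappa_V\psi_1 + \lambda_{V0}$. The Feller condition, $2\times(\text{constant}) \geq \sigma_V^2$, must hold in both cases. Parameter values are hypothetical.

```python
def feller_holds(gamma_V, kappa_V, sigma_V, psi1, lam_V0=0.0, lam_V=0.0):
    """Feller condition for U = V - psi1, dU = (c - k U) dt + sigma_V sqrt(U) dZ.

    With lam_V0 = lam_V = 0 this is the Q-measure check; otherwise the P-measure
    drift (gamma_V + lam_V0 - lam_V * psi1) - (kappa_V - lam_V) V is used.
    """
    gamma = gamma_V + lam_V0 - lam_V * psi1
    kappa = kappa_V - lam_V
    return 2.0 * (gamma - kappa * psi1) >= sigma_V**2

q_params = dict(gamma_V=0.5, kappa_V=1.0, sigma_V=0.6, psi1=0.1)   # hypothetical
ok_Q = feller_holds(**q_params)                                    # risk-neutral check
ok_P = feller_holds(**q_params, lam_V0=-0.3, lam_V=0.4)            # physical check
```

In this example the condition holds under Q but fails under P. Only $\lambda_{V0}$ moves the relevant constant, so a negative $\lambda_{V0}$ alone can break it.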

5.4 Estimation of Affine models with and without USV

Below we present a summary of the results in CD, Goldstein and Jones (2004) about the estimation and comparative tests of various three and four factor affine models.

Empirical Methodology

• Use weekly swap rate data (maturities {2, 3, 4, 5, 7, 10}) and six-month LIBOR from Jan. 7, 1988 to Nov. 27, 2002.
• Adjust the LIBOR quote for non-synchronicity with USD swap rates.
• Estimate the unrestricted model using Quasi Maximum Likelihood (QML), similar to Chen and Scott (1993), Pearson and Sun (1994).

– The standard procedure fits 3 specific yields perfectly to invert for the state variables.
– The remaining yields are observed with ‘measurement’ errors.

– The log-likelihood is a combination of the transition density of the state variables and the (Gaussian) likelihood for the errors.
– When the transition density is not known explicitly, use a Gaussian (QML) approximation based on the exact first two moments, which can be computed explicitly (Fisher and Gilles (1996), Duffee (2002)).

‘Improvements’ to QML estimation

• Use principal components instead of yields to invert for the state and the ‘errors’:
– guarantees a perfect fit of the first three PCs, which explain over 95% of the variance of yields (Litterman and Scheinkman (1991));
– ‘orthogonalizes’ the (unconditional) matrix of measurement errors;
– dispenses with the arbitrariness of which yields are fitted exactly;
– retains the simplicity of inversion for the state (PCs are linear in the state variables).

• We tested a cumulant expansion approximation to the transition density, based on explicit higher-order moments, to improve estimation of the transition density.
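The PC-based inversion can be sketched as follows. The sketch assumes yields are exactly affine in a 3-dimensional state, $Y_t = a + HX_t$ (stacking (36) across maturities). It also assumes the PC weights come from the eigenvectors of the unconditional yield covariance. The loadings `H`, intercepts `a` and simulated states are hypothetical; in practice $a$ and $H$ come from the model's $A(\tau)$ and $B(\tau)$.

```python
import numpy as np

def pc_weights(yield_panel, k=3):
    """Eigenvectors of the yield covariance for the k largest eigenvalues (as columns)."""
    vals, vecs = np.linalg.eigh(np.cov(yield_panel, rowvar=False))
    return vecs[:, np.argsort(vals)[::-1][:k]]

def state_from_pcs(yields_t, W, a, H):
    """Invert X_t from the first k PCs, using PC_t = W'a + (W'H) X_t."""
    pcs = W.T @ yields_t
    return np.linalg.solve(W.T @ H, pcs - W.T @ a)

taus = np.array([0.5, 2, 3, 4, 5, 7, 10])
H = np.column_stack([np.ones_like(taus),                         # hypothetical level loading
                     (1 - np.exp(-0.5 * taus)) / (0.5 * taus),   # slope-type loading
                     np.exp(-0.3 * taus)])                       # curvature-type loading
a = 0.002 * np.log1p(taus)                                       # hypothetical intercepts
rng = np.random.default_rng(2)
states = 0.01 * rng.standard_normal((500, 3))
panel = a + states @ H.T                                         # yields without error
W = pc_weights(panel)
X_hat = state_from_pcs(panel[0], W, a, H)
```

Because PCs are linear in yields, the inversion stays a 3×3 linear solve, whichever maturities are available.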

Noninvertibility of Yields

When the state vector cannot be inverted from bond prices, use a simulated QML approach based on the Efficient Importance Sampler of Richard and Zhang (1996,97) (see also Sandmann and Koopman (1998)), Pennacchi (1991), Brandt and He (2002).

Ex 1): USV implies V cannot be determined from bond yields.

Ex 2): If all yields are assumed to be measured with error, then the state vector is not invertible from yields.

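Let $\mathcal{P} = \{\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_T\}$ denote the time series of PCs of the yield curve. The likelihood function $p(\mathcal{P}|\theta)$ may be written as the integral
$$p(\mathcal{P}|\theta) = \int p(\mathcal{P}, \mathcal{V}|\theta)\,d\mathcal{V},$$
where $\mathcal{V} = \{V_1, V_2, \ldots, V_T\}$ denotes the time series of the variance process. The integral is evaluated using simulation.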

From ‘importance sampling’, an approximate auxiliary model $p_a(\mathcal{V}|\mathcal{P},\theta)$ is specified:

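$$\int p(\mathcal{P},\mathcal{V}|\theta)\,d\mathcal{V} = \int \frac{p(\mathcal{P},\mathcal{V}|\theta)}{p_a(\mathcal{V}|\mathcal{P},\theta)}\,p_a(\mathcal{V}|\mathcal{P},\theta)\,d\mathcal{V} = E_a\left[\frac{p(\mathcal{P},\mathcal{V}|\theta)}{p_a(\mathcal{V}|\mathcal{P},\theta)}\right], \qquad (51)$$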

The closer the auxiliary model is to the actual model, the less sensitive the ratio is to the simulated variance path, and the more quickly the expectation will converge.

The EIS approach essentially chooses the auxiliary density $p_a(\mathcal{V}|\mathcal{P},\theta)$ (within a certain parametric class) to minimize the variation in $\ln p(\mathcal{V}|\mathcal{P},\theta) - \ln p_a(\mathcal{V}|\mathcal{P},\theta)$.
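Equation (51) and the role of the auxiliary density can be illustrated on a toy problem where the integral is known. The sketch below is not the EIS algorithm itself. It assumes a one-period Gaussian model, $v \sim N(0,1)$ and $y \mid v \sim N(v,1)$, so that $p(y) = N(y; 0, 2)$ and the exact posterior is $N(y/2, 1/2)$.

Using the prior as auxiliary density gives a noisy estimate. Using the exact posterior makes the ratio in (51) constant, so the estimator has zero variance. EIS tries to get close to the latter within a parametric class.

```python
import numpy as np

def norm_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def is_estimate(y, aux_mean, aux_var, n_draws=20_000, seed=0):
    """Importance-sampling estimate of p(y) = int p(y, v) dv with auxiliary N(aux_mean, aux_var)."""
    rng = np.random.default_rng(seed)
    v = aux_mean + np.sqrt(aux_var) * rng.standard_normal(n_draws)
    ratio = norm_pdf(v, 0.0, 1.0) * norm_pdf(y, v, 1.0) / norm_pdf(v, aux_mean, aux_var)
    return ratio.mean(), ratio.std() / np.sqrt(n_draws)   # estimate and its standard error

y = 1.3
exact = norm_pdf(y, 0.0, 2.0)
est_prior, se_prior = is_estimate(y, 0.0, 1.0)           # auxiliary = prior
est_post, se_post = is_estimate(y, y / 2, 0.5)           # auxiliary = exact posterior
```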

Test Five Specifications

1. Unrestricted A1(3) model, assuming 2 PCs are observed without error (“2PC”):
• simulate paths of V
• “invert” V and the 2 PCs for r, $\mu^Q$, and V
2. Unrestricted A1(3) model, assuming 3 PCs are observed without error (“3PC”):
• “invert” the PCs for r, $\mu^Q$, and V
3. A1(3) model with USV restrictions, assuming 2 PCs are observed without error (“USV”):
• “invert” the PCs for r and $\mu^Q$
• simulate paths of V
4. Unrestricted A1(2) model, assuming 2 PCs are observed without error (“A1(2)”):
• “invert” the PCs for r and V (there is no $\mu^Q$)
5. Unrestricted A1(4) model, assuming 3 PCs are observed without error (“A1(4)USV”):
• simulate paths of V
• “invert” V and the 3 PCs for r, $\mu^Q$, θ, and V

– Models 2, 3, and 4 are restricted versions of 1 (Table 4).

– Model 5 nests Model 3.

Empirical Results

• Ignoring QML approximation error, all restricted three-factor models are rejected by the LR test.
• All models predict the short rate and slope accurately (Table 7).
• Only A1(3) and A1(4)USV capture the dynamics of curvature (Figure 1, Table 7).
• But A1(3) (unrestricted and 3PC) predicts volatility that is negatively correlated with model-independent GARCH volatility, as well as with the volatility extracted from the short rate implied by the model itself! (Figure 2, Table 7)
⇒ A1(3) cannot capture both curvature and the dynamics of short rate volatility (V plays a double role in the unrestricted A1(3) model).
• A1(3) USV does a better job at capturing volatility dynamics, but not quite as good a job for curvature (Table 7, Figure 2).
• Only A1(4) USV can capture both the time-series properties of short rate volatility and the dynamics of the TS curvature factor.
• The superiority of the A1(4) model is confirmed by out-of-sample yield changes (Table 9) and squared yield changes (Table 10), as well as by the predictability regression (Figure 3) and the maturity/volatility relation (Table 4).

Conclusion

• We propose a canonical representation for affine models in which:
– the state variables have simple physical interpretations such as level, slope and curvature at the short end,
– their dynamics remain affine and tractable,
– the model is by construction ‘maximal’ in the sense of Dai and Singleton (00),
– model-insensitive estimates of the state variables are readily available.

• We offer a complete characterization of the ‘maximal’ A1(3) and A1(4) USV models.
• Empirical estimation of the various models shows:
– USV restrictions do not significantly affect the cross-sectional fit, but
– they substantially improve the time-series properties of the model (in- and out-of-sample forecasts of squared changes in yields).
– Even though USV is nested within the unrestricted model, imposing the USV restriction explicitly improves the estimated time series of volatility, because USV breaks the dual role played by volatility in the unrestricted model.
– To capture the dynamics of level, slope, and curvature, as well as stochastic short rate volatility, one needs four distinct factors (A1(4)).

6 Recent Developments

• Joint models of term structure and derivatives:
Derivatives convey more information about higher-order moments of the TS.
– Relative pricing of caps and swaptions? Jagannathan, Kaplin and Sun (2000) and Longstaff, Santa-Clara and Schwartz (2000) document mispricing of caps relative to swaptions in a Gaussian string model. CDG (2002) show that the relative price of caps and swaptions can be seen as a proxy for stochastic correlation. Han (2003) and Li (2003) find some empirical support for stochastic correlation.
• Brownian field ‘string’ models can:
1. take into account information in observation ‘errors’ (no low dimensionality in the variance-covariance matrix of forward rates),
2. use information on both derivatives and the term structure consistently,
3. deliver a unique optimal bond portfolio choice,
4. handle the predictability factor in bond returns of Cochrane and Piazzesi (2004).

– Models: Kennedy (1994), Goldstein (2000), Santa-Clara and Sornette (2001), CDG (2002).
– Estimation: Li (2003), Han (2003), Bester (2004).
• Combining affine models with macroeconomic information:
1. Specification of a latent short rate model consistent with the behavior of the Fed Funds target rate.

(a) Balduzzi, Bertola, Foresi (1996) model the short rate as mean-reverting around the target rate.
(b) Babbs and Selby (1991) model the German Bundesbank discount rate.
(c) Piazzesi (2001) builds in jumps in target rates around FOMC meeting dates (both scheduled and unscheduled meetings).
⇒ Allows one to study the reaction of the term structure (long yields) to short rate movements.
2. Combine macro variables and latent variables for the short rate. Typically, use a Taylor rule of the form

$$r_t = \delta_0 + \delta_M^\top M_t + \delta_Z^\top Z_t$$

where $M_t$ is a vector of macro variables (inflation, output, . . . ) and $Z_t$ is a vector of latent variables (orthogonal to $M_t$). Typically, one assumes Gaussian dynamics and estimates the P-measure coefficients from a VAR. With an assumption on risk premia, one obtains an arbitrage-free model of the term structure that contains explicit information on macro variables.
(a) Makes predictions about responses of the term structure to changes in macro variables (Ang and Piazzesi (2003)).
(b) Makes predictions about future macro variables based on information in the yield curve, if one allows for feedback between interest rates and macro variables (Hordahl, Tristani and Vestin (2003), Duffee (2004), Ang, Piazzesi and Wei (2003)).

References

[1] D.-H. Ahn and B. Gao. A parametric non-linear model of the term structure. The Review of Financial Studies, 15 no 12:721– 762, 1999. [2] P. Balduzzi, S. Das, and S. Foresi. A simple approach to three factor affine term structure models. Journal of Fixed Income, 6:43–53, 1996. [3] G. Chacko and S. Das. Pricing interest rate derivatives: A gen- eral approach. The Review of Financial Studies, 15:195–241, 2002. [4] L. Chen. Stochastic mean ans stochastic volatility– a three factor model of the term structure of interest rates and its application to pricing of interest rate derivatives. Blackwell Publishers, Oxford, U.K., 1996. [5] P. Collin-Dufresne, R. Goldstein, and C. Jones. Identification and estimation of ‘maximal’ affine term structure models: An application to stochastic volatility. Carnegie Mellon Working paper, 2002. [6] P. Collin-Dufresne and R. S. Goldstein. Do bonds span the fixed- income markets? theory and evidence for unspanned stochastic volatility. Journal of Finance, VOL. LVII NO. 4, 2002. [7] P. Collin-Dufresne and R. S. Goldstein. Generalizing the affine framework to HJM and random field models. Carnegie Mellon University Working Paper, 2002. [8] P. Collin-Dufresne and R. S. Goldstein. Pricing swaptions in an affine framework. Journal of Derivatives, 10:9–26, 2002.

[9] J. C. Cox, J. E. Ingersoll Jr., and S. A. Ross. A reexamination of the traditional hypotheses about the term structure of interest rates. Journal of Finance, 36:769–799, 1981.
[10] J. C. Cox, J. E. Ingersoll Jr., and S. A. Ross. A theory of the term structure of interest rates. Econometrica, 53:385–407, 1985b.
[11] Q. Dai and K. Singleton. Expectation puzzles, time-varying risk premia and dynamic models of the term structure. Forthcoming, Journal of Financial Economics, 2002.
[12] Q. Dai and K. J. Singleton. Specification analysis of affine term structure models. Journal of Finance, 55:1943–1978, 2000.
[13] G. R. Duffee. Term premia and interest rate forecasts in affine models. Journal of Finance, 57(1), 2002.
[14] D. Duffie, D. Filipovic, and W. Schachermayer. Affine processes and applications to finance. Working Paper, Stanford University, 2001.
[15] D. Duffie and R. Kan. A yield-factor model of interest rates. , 6:379–406, 1996.
[16] D. Duffie, J. Pan, and K. Singleton. Transform analysis and option pricing for affine jump-diffusions. Econometrica, 68:1343–1376, 2000.
[17] W. Feller. Two singular diffusion problems. Annals of Mathematics, 54:173–182, 1951.
[18] R. S. Goldstein. The term structure of interest rates as a random field. The Review of Financial Studies, 13(2):365–384, 2000.
[19] D. Heath, R. Jarrow, and A. Morton. Bond pricing and the term structure of interest rates: A new methodology for contingent claims evaluation. Econometrica, 60:77–105, 1992.

[20] S. L. Heston. A closed form solution for options with stochastic volatility. Review of Financial Studies, 6:327–343, 1993.
[21] M. Hogan and K. Weintraub. The lognormal interest rate model and Eurodollar futures. Working Paper, Citibank, New York, 1993.
[22] J. Hull and A. White. Pricing securities. The Review of Financial Studies, 3(4):573–592, 1990.
[23] R. Jagannathan, A. Kaplin, and S. Sun. An evaluation of multi-factor CIR models using LIBOR, swap rates, and cap and swaption prices. Working Paper, Northwestern University, 2000.
[24] F. Jamshidian. An exact formula. Journal of Finance, 44(1):205–209, 1989.
[25] F. Jamshidian. Contingent claim evaluation in the Gaussian interest rate model. Research in Finance, 9:131–170, 1991.
[26] F. Jamshidian. Bond, futures and option evaluation in the quadratic Gaussian interest rate model. Applied Mathematical Finance, 3(2):93–115, 1995.
[27] D. Kennedy. The term structure of interest rates as a Gaussian random field. Mathematical Finance, 4:247–258, 1994.
[28] F. Longstaff, P. Santa-Clara, and E. S. Schwartz. The relative valuation of caps and swaptions: Theory and empirical evidence. Journal of Finance, 56:2067–2109, 2001.
[29] R. E. Lucas. Asset prices in an exchange economy. Econometrica, 46:1426–1446, 1978.
[30] R. C. Merton. Theory of rational option pricing. Bell Journal of Economics and Management Science, 4:141–183, 1973.
[31] S. F. Richard. An arbitrage model of the term structure of interest rates. Journal of Financial Economics, 6:33–57, 1978.

[32] P. Santa-Clara and D. Sornette. The dynamics of the forward interest rate curve with stochastic string shocks. Review of Financial Studies, 14, 2001.
[33] O. Vasicek. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, 1977.
