27 Week 9. Term Structure I Expectations Hypothesis and Bond Risk Premia — Notes

27 Week 9. Term structure I Expectations Hypothesis and Bond Risk Premia — Notes

1. Background: We’re back to regressions +1 =  +  + +1 to forecast returns. We will connect them to portfolios, a job that is just beginning.

2. Deﬁnitions

3. Expectation hypothesis — 3 statements.

4. Bond risk premia (how expectations fails) — Fama/Bliss; Cochrane/Piazzesi.

5. Foreign exchange, expectations and carry trade. The link between forecasts and portfolios.

27.1 Deﬁnitions

1. Discount bond: A promise to pay $1 at time  + . (This is the same thing as a zero coupon bond.)

2. Price ()  =priceof year discount bond at time t Example.  (2) =0.9 10% interest rate  ⇐⇒ () distinguishes maturity from raising a number to a power.

()  =logpriceof-year discount bond at time .

Example: ln(09) = 010536 01 or “10% discount − ≈− 3. Small letters = log of large letters. (ln). Note

ln(1 + ) ; ≈  1+ for small . ≈

Thus (11) 01 = 10 % return. Why logs? The two year return is 12.Thetwoyear ≈ log return is 1 + 2. Adding is a lot easier than multiplying.

4. Yield.

(a) Concept: a discount rate for bonds. (b) Deﬁnition (words) What constant discount rate would make sense of the bond price, if we knew the future for sure and there were no default?

(c) Definition (formula). For any bond with  =cashflow at time ,  solves   =      X You have to find this numerically.

522 (d) For discount bonds, 1 1 () () () −   =     () ↔ ≡  h i h i 1 () ()  ≡−  01 Example: (2) = 01 (2) = =005 “5% discount per year.”  − →  2 (e) Point: yield allows you to compare bonds of diﬀerent maturity and coupon on an even basis. (It’s like implied volatility for options.) It’s just another more convenient way to rewrite the price. It is not the expected return you will earn for one year! It is the compound average yield to maturity if the bond does not default — but not the expected return.

(f) 1234Time

-0.5

5% yield – “average” price rise

-0.10

-0.15

Actual price path -0.20

Price

5. Forward rate.

(a) Deﬁnition: The rate at which you can contract today to borrow from time  +  1and − pay back at time  + . (b) Fun fact: you can synthesize this forward loan by shorting  +  1 zeros and buying −  +  zeros16.Hence,wecanﬁnd the forward rate from today’s zero rates.

( 1) () ( 1 )  −  =  − →   ≡ ()  () ( 1 ) ( 1) ()  =  − →  −    ≡  −  (c) Example. (3) (2) (2 3)  = 015;  = 010  → =005 = 5%  −  − →  6. Holding period return.

16 ( 1) Move the money from  +  1to andthenbackto +  $1 at  +  1 transforms to $ − dollars at . () − ( 1) ( 1) −() $1 at  transforms to $1 at + .Thus,$ − transforms to  −  at  + 

523 (a) Deﬁnition (words). Buy an -year bond at time  and sell it — now an  1yearbond − —attime +1. (b) Deﬁnition (equations) ( 1) () +1− +1 = ()  () ( 1) ()   −   +1 ≡ +1 −  (Note:  for “holding period return” in Asset Pricing.  is enough though. ) (c) Excess log returns (over the risk free rate)

() () (1) +1 ≡ +1 −  Note: this is convenient but tricky. It’s not really a zero cost portfolio since it’s logs!

7. Why we focus on zeros. Zero coupon bonds are the a la carte menu. Coupon bonds are a bundle of zeros.  =cashﬂow at time . Then,

Coupon bond = CF1 1 year zero + + CF2 2 year zero + ... × × Hence, (1) (2) Coupon bond price =  CF1 +  CF2 +  × × 8. Summary (you need to know these):

() ()  =log( )e.g.-0.1 1 () = ()  −  () ( 1) ()  =  −  () ( 1) − ()  =  −  +1 +1 −  () = () (1) +1 +1 −  9. Notation: ()denotesmaturity.  denotes time, when each item is observed. If you want to be really clear, use arrows for buy and sell dates. For example, buying an n period bond, holding it for a year, and selling it as an n-1 period bond gives a return

() (  1) +1 =  →+1− →

27.2 Expectations hypothesis

1. The yield curve: a plot of bond yields as a function of maturity. (Forward curve: forward ratesasafunctionofmaturity)

(a) Why does the yield curve sometimes slope up and sometimes slope down? (b) Expectations hypothesis: An upward sloping curve means people think short term interest rates will rise in the future.

524 2. Today’s yield curve (Nominal and real = TIPS)17

Be careful though, the x axis is stretched so it has diﬀerent time intervals. This is a common presentation, and useful for many purposes, but it gives a misleading sense of upward curvature at the short end. Here is a yield curve with maturity evenly spaced, and adding forward rates

Treasury yield curve 8/16/2013 4.5

3.5

2.5

Percent 2

1.5

0.5 Yield forward 0 0 5 10 15 20 25 30 Maturity, years

3. Why are yields diﬀerent for bonds with diﬀerent maturity? Three equivalent statements:

(a) Long maturity yield = average of expected future short rates (plus risk premium) (b) Forward rate = expected future spot rate (plus risk premium)

17http://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/Historic-Yield-Data- Visualization.aspx

525 (c) Expected holding period returns should be equal across maturities (plus risk premium)

Thus, the yield curve rises if people expect short rates to be higher in the future. The decline in future bond prices cuts oﬀ the great yield, so you don’t earn any extra in long bonds for the ﬁrst year. 4. Example: Today’s yield curve (under EH) implies that markets expect interest rates to start rising 1 year form now, recover to about 4.8% by year 6 and then stay there. 5. Big picture: These are all relatives of the random walk for stock prices, and (approximately) risk neutral arbitrageurs. Alternative ways of getting money from one date to another must have the same expected value (plus risk premium). It’s an obvious principle with many sur- prising implications.

Hold long term Forward rate for one period Hold N period zero

0 1 2 ... N-1 N

Roll over 1 period bonds Spot rate Short rate

27.2.1 Yields and future interest rates

1. Two ways of getting money from now (0) to N should give same expected return. Return from 0 to  : Buy N year zero Rollover () () (1) (1) (1) (1) 0  =  =  0 + 1 + 2 +  1 → − − 2. Expected return should be the same → () 1 (1) (1) (1) (1) 0 =  0 + 1 + 2 +  1 (+risk premium)  − ³ ´ 3. Long yield = average of expected future short rates (plus risk premium). 4. Note: some people do this in levels rather than logs, Buy N year zero Rollover   = 1 =  ()     =  (1) (1) (1) 0   () 0 1 2  1 0 1  1 → − − h i 1 () (1) (1) (1)  0 =  0 1  1 (+risk premium) − ∙³ ´ ¸ (ln()) =ln(()) so the level and log statements are not exactly equivalent. They are 6 separate embodiments of the “Expectations hypothesis.” We — like all modern term structure models — use logs. (The diﬀerence is only in small 122 terms)

526 5. Question: What does a yield curve that rises with maturity mean? Answer: That short rates are expected to rise in the future.

27.2.2 Forward Rate

1. Two ways of getting money from N to N+1 should give same expected return. Why? Lock in now, or wait and borrow/lend at the spot rate.

2. Forward rate = expected future spot rate (plus risk premium)

lock in = E(wait and use spot) (+risk premium) () (1)  =  + 1 (+risk premium) − h i 3. Forward rates reveal “market’s forecast of interest rates” directly.

4. What does a rising forward curve mean? That short rates will be higher in the future.

27.2.3 Holding period returns

1. Two ways of getting money from  to +1 should be the same. Expected holding period returns should be equal across maturities (plus risk premium)

hold N period bond 1 year = hold 1 period bond for 1 year () (1) (1)   +1 =   +1 =  (+risk premium) → → h () i h () i   +1 =   +1 =0(+riskpremium) → → h i h i 2. This insight is again algebraically equivalent to the others.

27.2.4 Example

1. Example:

(a) One year yield = 5%. Two year yield = 10%. Prices: (1) = 005 (2) = 020.  −  − (b) Does this mean you expect to earn more on two year bonds? No — we should expect (1) (2) to earn the same for the ﬁrst year. Thus, the EH tells us what  +1 is: (+1)= (1) (2) (1) (1) (1)    =  =005. This means ( )= 015 or (³ )=0´ 15. +1 −   +1 − +1 ³ (1) ´ (c) (+1)=015: We expect the short rate to rise in the future, so that you earn the same on long term bonds as short term bonds this year, despite the higher initial yield on long term bonds.

527 0

5% yeld −0.05

−0.1 10% yield log price

15% forward 15% yield

−0.15

5% return −0.2 0 1 2 time

(d) Equivalently: i. Long yield (10%) = average of expected future short yields (5%, 15%). ii. The expected two year return is the same from rolling over (5% + 15%) as from holding the two year bond (20%). iii. The expected one-year return is the same (5%) whether you hold the one period bond or the two period bond. iv. Forward rate = (1) (2) = 005 ( 020) = 015 = expected one year rate one  −  − − − year out (15%). (e) Hopefully the graph emphasizes how diﬀerent statements of the expectations hypothesis (1) are equivalent. There is one thing we don’t know ahead of time — where +1 will end up (green triangle in the middle of the graph). As you move it up and down, this year’s return, next year’s one-year yield both change together. (f) Note the green line is how the expectations hypothesis says things will work out. That’s not necessarily the true expected path, and it certainly isn’t the actual outcome!

27.2.5 Exchange rates

1. Notation:

 = Exchangerate(Euros/Dollar)attimet   = US interest rate (in Dollars)   = Foreign interest rate (in, e.g. Euros)

2. Should you invest in Euro bonds or dollar bonds? The realized dollar return from Euro bonds is the Euro return (interest rate) less any depreciation of the Euro

 $+1 euro+1 euro$    +1 = = =  +1 → $ euro × euro+1$+1 → × +1    +1 =  +1 +  +1 → → − 3. The expectations hypothesis says:

528 (a) The expected dollar rate of return on dollars vs. Euros should be the same (plus risk premium)     +1 =   +1 +   (+1)+riskpremium → → − ³ ´ ³ ´ (b) In particular, for one-period interest rates

(1) (1)  =  +   (+1)   − (c) Rearranging, the interest spread should equal the expected depreciation

(1) (1)   =  (+1)   −  − If Euro rates are higher, we expect the Euro to depreciate, i.e. more Euros per dollar in the future.

Often people use  or  for interest rates, and call ∗ or ∗ the foreign rate.

∗  =  (+1)   − − ∗  =  (+1)   − − (Note: if you express exchange rates in dollars / euro, you get the opposite sign,

∗  =   (+1)  − − Euros are in fact quoted in dollars per euro. Yen are quoted as yen per dollar. Keep the sign straight!)

4. Example: Just before the 1997 Asian currency crashes, crash, local rates were 20%, dollar rates are 5%. This is a sense in which high interest rates correspond to high (about to decline) exchange rates. (And have nothing to do with “speculators,” “tight monetary policy,” etc. )

5. Forward rate statements

st+1

it it*

(a) There are forward markets for currencies. Thus, i. By arbitrage, going around the box gets zero.

 +  ∗  =0 −  −

529 ii. “Covered interest parity” The forward-spot spread = the interest diﬀerential

∗  =    − − you “cover” the exchange rate change and have no risk. iii. “Uncovered interest parity.” We can express the expectations hypothesis (or trade its violations!) as the forward-spot spread equals expected appreciation

 (+1)  = ∗  =   −  − − This is “uncovered interest parity” because you do not “cover” the exchange rate risk. iv. Note: This is still a “hypothesis!” Yes, it’s false, at least in the short run. “Covered interest parity” is arbitrage, a much stronger hypothesis.

27.3 Yield curve summary

1. Buy a long term bond, hold to  vs. rollover interest rate. Long yield = average of expected future short rates (plus risk premium)

() 1 0 =  (0 + 1 + 2 +  1)+(riskpremium)  − A rising yield curve means interest rates are expected to rise in the future.

2. Lock in forward rate vs. wait, borrow/lend at spot rate Forward rate = expected future spot rate (plus risk premia)

( +1) (1)  → = (+)+(riskpremium)

A rising forward curve means interest rates are expected to rise in the future.

3. Hold a short term bond for one period, vs. hold a long-term bond for one period? Expected holding period returns should be equal across maturities (plus risk premia)

() (1)   +1 =  +(riskpremium) → h i 4. Exchange rates Us Rate = Foreign rate + expected depreciation of foreign currency (plus risk premium)    =  +  +1+(riskpremium)   − 5. Many more. For example, expected two-year returns should be the same. Two-year forward rate ( +3to + 5) should equal expected two-year yield in year  + 3. The two-year Euro vs. Us yield spread equals the two-year expected depreciation.

6. Fact: 1-3 are algebraically equivalent. If any one holds, then all the others do too.

530 27.4 Risk premia

1. What risk premium do we expect? Which end of these ways to get money from x to y is riskier?

(a) Think of one year holding period. It seems that the long term bond is riskier, long yields should be higher to compensate investors for risk. (This is the usual idea). (b) How about a 10 year holding period? For a 10 year horizon, it seems the 10 year bond is safer. Rolling over short bonds, interest rates could rise and decline. That suggests that 10 year yields should be lower—yield curves should decline on average! (c) An important lesson here: for most investors, the riskless asset is a long term bond! The coupon only part of a TIP should be the bedrock riskless asset in a portfolio, NOT “cash” (d) What if variation in interest rates is all due to changes in inflation not changes in real interest rates?Wheninflation rises, the short term rates rise 1-1. In this case, the real 10 year return is safer if you roll over nominal short term bonds (money market) rather than hold nominal 10 year bonds. (e) An interesting observation: In the 19th century, under the gold standard, there was little long-term inflation risk. Interest rate variation was real interest rates. The yield curve sloped downward on average, long-term bonds are safer for long-term investors. (f) (Long-term investors who look for quarterly performance in their bond portfolios are nuts. We know that bond yields rise when bond prices fall, so the long-term value is unaffected by short-term losses.) (g) Once it exists, the risk premium may vary over time. All risk premia go up in bad times, and the pattern of real/nominal, and demand duration/supply duration may change.

Summary: The sign of risk premium can go either way; it depends on investor’s horizon relative to supply of bonds, and whether real interest rates or inﬂation are the source of interest rate risk. Thus, we’ll (for now) let the data speak on risk premium size and sign rather than impose any theory.

2. Theory II: as always,

  (+1)=(+1+1)  =  +1 ∆+1 ()  ¡    ∆¢+1 ≈− +1 ³ ´ The risk premium can go either way — do interest rate surprises come in good times or bad times? (In the last equation I make the approximation  (+1) = () (+1) (1) = +1 ≈ +1 +1 −  −  () +( +1)(+1) (1) in order to express the covariance with yields) − +1  −  (a) Intuition: interest rates seem to go down in bad times, like Fall 2008, and up in good times. That means long term bonds do well in bad times. That’s an extra reason they should oﬀer lower returns than short-term bonds, and the real yield curve should slope down.

531 (b) Inflation mucks up the picture for nominal bonds. If inflation comes in good times (the usual Keynesian assumption), then again interest rates rise in good times. But if it’s stagflation, inflation in bad times, then long term bonds do worse in bad times, and this could justify an upward sloping yield curve.

3. We’ll just look at the risk premium empirically.

4. Deﬁnition of “expectations hypothesis”

(a) Strict deﬁnition: “Expectations hypothesis” means no risk premium. (b) The usual deﬁnition: Expectations hypothesis means a risk premium that is small and constant over time. (Otherwise the theory is totally vacuous!) (c) “Expectations” is not quite the same as “risk neutrality” because we took logs. It’s risk neutrality + 1/22. Fortunately 2 is usually quite small.

27.5 Empirical evaluation of yield curves and risk premia

1. Facts Yields of 1−5 year zeros and fed funds

1960 1970 1980 1990 2000 2010

Yield spreads y(n)−y(1)

−1

−2

1960 1970 1980 1990 2000 2010

532 1−5 year forwards 16 14 12 10 8 6 4 2

1960 1970 1980 1990 2000 2010

forward spreads f(n)−y(1) 4

−2

−4

1960 1970 1980 1990 2000 2010

Annual returns on 1−5 year bonds 30 25 20 15 10 5 0 −5

1960 1970 1980 1990 2000 2010

Annual excess returns on 2−5 yerar bonds 15

−5

−10

−15 1960 1970 1980 1990 2000 2010

(a) The dominant movement is “level shift” — all yields up and down together. (b) The yield curve does change shape. Sometimes rising, sometimes falling. This is the “slope” factor. There is a regular business cycle pattern: inverted at peaks, rising at troughs.

533 (c) When the yield/forward curve is upward sloping, interest rates subsequently rise; when the yield curve is inverted, interest rates do subsequently fall. The expectations hypothesis looks fairly reasonable. (d) In 2003-04, the expectations hypothesis forecast big increases in rates. In 2004 and 05 the 1 year rate did rise! Actually 1 year rates rose more than forecast. The 5 year rate is dead on the forecast.

yields and forwards 5

12/05

3 12/04 percent

12/03 1

0 1 2 3 4 5 maturity in years

Solid=forwardcurve.Dash=yieldcurve. (e) However, if you look closely you can see the Fama-Bliss EH failure too. The two-year rate is above the one year rate “on the way down” and through a typical regression. Rates rise eventually, but when the two year rate is higher than the one year rate, one year rates should rise next year. (f) Is there a risk premium on average?

Interest rate data 1964:01-2012:12 Maturity  12345  () 5.68 5.90 6.08 6.23 6.34  () h(1)i - 0.21 0.39 0.54 0.65 −  h() (1)i - 0.50 0.91 1.24 1.36 − h tstatistici - (1.94) (1.91) (1.86) (1.66)  () (1) - 1.80 3.31 4.60 5.67 − h “Sharpe”i - 0.28 0.28 0.27 0.24

There does seem to be a small average risk premium for long term bonds. But it seems small on average. Compare Sharpe to the 0.5 of market portfolio. This is way inside the Mean-Variance frontier, and  0. Expectations usually means “plus a constant ≈ (small) risk premium” and this is that constant risk premium. Note that long term bonds look awful by one-period measures. Why do people hold them? Answer: they don’t care about one year mean and variance!

534 2. Expectations failures—Fama Bliss. (Updated 1964-2012)

() (1) (1) +1 = + 1  = () (1) (−) − (1)  +    + +1  +    + +1  −   −  n ³ () ´ 2 ³ () ´ 2 2 0.83 0.27 0.11 0.17 0.27 0.01 3 1.14 0.35 0.13 0.53 0.33 0.05 4 1.38 0.43 0.15 0.84 0.26 0.14 5 1.05 0.49 0.07 0.92 0.17 0.17 forecasting one year returns forecasting one year rates on n-year bonds n years from now

3. Wake up. This is the central table.

(a) Left hand panel: “Expected returns on all bonds should be the same” Run a standard → regression to see if anything forecasts the diﬀerence in return. Run

() () (1)  =   =  +  + +1 +1 +1 −  We should see  =0. i. Over one year, expected excess returns move one for one with the   spread! − (The coefficient that “should be” zero.) The “fallacy of yield” is right (well, “fallacy of forward rates”) at a one-year horizon. ii. How do we reconcile this with the first table? Long bonds don’t earn much more on average.Buttherearetimes when long bonds expect to earn more, and other times when they expect to earn less.   is sometimes positive, sometimes negative. − (b) Right hand panel, row 1. (2) (1) (1) i. If  is 1% higher than  , we should see +1 rise 1% higher. We should see  = 1. Instead, we see  =0! A forward rate 1% higher than the spot rate should mean the spot rate rises. Instead, it means that the 2 year bond earns 1% more over the next year on average. ii. 083 + 017 = 1 is not by chance. Mechanically,thetwocoefficients in the first row add up. If   does not forecast ∆,itmust forecast returns.18 See the plot of − (1) bonds over time for intuition. If we move the +1 up we increase expected returns (2) (1) +1 and decrease the future yield +1. (c) Right hand panel, higher rows

18Why? Note that the forward-spot spread equals the change in yield plus the excess return on the two year bond.

 (2) (1) = (1) (2) + (1)  −   −   = (1) (2) + (1) + (1) + (1) +1 −   − +1  ³ (2) (1) ´(1) ³ (1) ´ =   +   +1 −  +1 −  In this precise way, a forward rate higher than³ the spot rate´ must³ imply´ a high return on two year bonds or an increase in the on year rate. Now, if  =  + , regress both sides on  andyouget1=regressionofAonFplus regression of B on F. Doesn’t this remind you that dividend yields must forecast dividend growth or returns? It’s the same idea.

535 i. At longer horizons, the   spread does start to forecast  year changes in yields. − (Correspondingly, it ceases to forecast  year returns — not shown.) ii. Once again, expectations does seem to work “in the long run.” (d) Note: higher rows do not add up to 1. Why? There are “complementary” regressions that add up to 1, I just didn’t show them. Time

‘safe’ 1 year return

Higher 1 year Price rate in year 1-2

= lower return of 2 year bond in 0- 1

Time Time

= Higher 3 year =Higher 1 year rate in 1-4 rate in 3-4 Lower return of 4 Lower return of 4 year bond from 0-1 year bond from 0-3

4. Q: Why do we run (1) (1) (2) (1)   =  +    + +1? +1 −   −  (2) ³(1) ´ The expectations hypothesis says  =  +1 , so why not run

(1) h (2)i +1 =  +  + +1?

A1:That’s valid but not as strong a test. An analogy:  = temperature. Just reporting Forecast =Temp makes you look like a good forecaster!

+1 =0+1  + +1 × +1 =0+1  + +1 × But this won’t work for (+1 ). Being able to forecast changes is a more powerful test − thanbeingabletoforecastlevels of a slow-moving series like temp, or yield. It’s much better to run +1  =0+1 ( )++1 − × − In short, there’s nothing wrong with doing it with levels, but diﬀerences is a more powerful test A2: Does a forward spread forecast a rate rise is a diﬀerent and more powerful question than, “does a high forward rate forecast a high interest rate?”

536 5. Interpretation.

(a) Dividend yield “should” forecast dividend growth and not returns. Dividend yield does forecast excess returns and not dividend growth. The two forecasts add up mechanically. (b) Yield spread “should” forecast short rates, and not excess returns. Yield spreads do forecast excess returns and not yield changes. The two forecasts add up mechanically. () (1) (c) The usual logic if forward rates are set so that  = + 1 then, on average, () (1) − following a time of high  , we should see a higher + 1. We can check by running this regression. − (d) “Sluggish adjustment.” For stocks on D/P; bonds on F-S and exchange rates on domestic- foreign interest rate, an expected oﬀsetting adjustment doesn’t happen. (However, outcomes aren’t adjusting to prices, prices reﬂect these expectations of outcomes, so while “sluggish adjustment” is a good way to keep track of the facts, it’s not a good way to think of the mechanism.) i. Stocks: P/D high? D should rise. It doesn’t. “Buy yield” (“value”). ii. Bonds:   high?  should rise. It doesn’t (at least for a few years). “Buy yield” −

If the current yield curve is as plotted in the left hand panel, the right hand panel gives the forecast of future one year interest rates. This is based on the right hand panel of the above table. The dashed line in the right hand panel gives the forecast from the expectations hypothesis, in which case forward rates today are the forecast of future spot rates. (e) There are big risk premia and they vary over time. There are times when you expect much better returns on long term bonds, and other times when you expect much better returns on short term bonds. (An upward sloping yield curve results in rising rates, but not soon enough — you earn the yield in the meantime). High expected returns happen in bad times. Interpretation: This looks like business-cycle related variation in risk, expected returns. ( varies over time). Who wants to hold long-term bonds in the bottom of a recession? (f) Fama-Bliss are really about “why do forward rates move today?” As the dividend yield regression is about “why do stock prices move today?” Answer: Prices today reﬂect big risk premiums, not expected changes in rates. They started with the regression of yield

537 changes on forward rates, and the returns were the complementary part. They really aren’t about “how do we forecast bond returns?” which is what CP do.

27.5.1 Exchange rates

1. Exchange rates: If foreign interest rates are higher than US rates, the foreign currency should depreciate. It doesn’t (at least for a few years). The “buy yield” advice works again. At a verbal level, this makes some sense. Domestic interest rates are low in bad times, so these are times when people are not willing to take foreign exchange risk. .

2. Numbers:

(a) From Asset Pricing summary.

DM $ U SF Mean appreciation -1.8 3.6 -5.0 -3.0 Mean interest diﬀerential -3.9 2.1 -3.7 -5.9  1975-1989 -3.1 -2.0 -2.1 -2.6 2 .026 .033 .034 .033  1976-1996 -0.7 -1.8 -2.4 -1.3

i. The first row gives the average appreciation of the dollar against the indicated currency, in percent per year. The second row gives the average interest differential — foreign interest rate less domestic interest rate, measured as the forward premium — the 30 day forward rate less the spot exchange rate. On average, the expectations hypothesis seems to work; countries that have high interest differentials depreciate. ii. The third through fifth rows give the coefficients and 2 in a regression of exchange rate changes on the interest differential = forward premium,

  +1  =  + ( )++1 =  + (  )++1 − −  −  where  = log spot exchange rate,  =forwardrate, = foreign interest rate,  = domestic interest rate. Source: Hodrick (1999) and Engel (1996). (This is one also works better in the long run, not shown). The coeﬃcient that should be +1 is -2 or more! (b) Is the foreign rate  high? The exchange rate should fall. It doesn’t — it actually goes the wrong way. This is more than sluggish adjustment.

3. Picture: as above for yields.

538

US TB 16 UK TB

0 1975 1980 1985 1990 1995 2000 2005

8 UK − US TB

−2

−4

1975 1980 1985 1990 1995 2000 2005

$ / Lb 2.4 2.2 2 1.8 1.6 1.4 1.2

1975 1980 1985 1990 1995 2000 2005

(a) Macro: When UK  US interest rates, we expect the Lb to be higher than the dollar, and then to fall. We do see a nice correlation between the two series — high interest rates do correspond to high exchange rates; and as interest rates mean-revert, exchange rates do fall back. (b) When UK  US interest rates, we expect $/Lb to fall. You can see it does in many episodes — 1975, 1981, 1990-1993, 1998 (?) 2004. Yet it takes longer than 3 months (these are 3 month rates), so there is a time of proﬁt

539 (c) More recent data: See optional paper by Jurek Table 1: Regressions still are being run on a country by country basis, with similar results.

+1  =  + ( )++1 − −  = 1 but big standard errors tiny 2.Thetiny2 makesthislookawfullythinasa − trading strategy. (The other optional papers also have updated regressions. The pattern has not changed.)

27.6 Cochrane-Piazzesi update Bottom line

1. FB run () () (1) ()  =  + (  )+ +1  −  +1 CP ask “what happens if you use all the right hand variables to improve forecasts?” () (1) (2) (3) (4) (5) () +1 =  + 1 + 2 + 3 + 3 + 5 + +1

(1) (2) (3) (4) (5) 0 Using a vector notation, where       ,wecanwritethatas ≡      h() () i +1 =  + 0  + +1 2. Results:

(a) 2 rises up to 44%, up from the Fama-Bliss / Campbell Shiller 15%

(b) The pattern of the 15 are the same when forecasting each maturity .

(c) A single “factor” 0 forecasts bonds of all maturities. We see high expected returns for all bonds in “bad times.” (d) This factor with tent-shaped coeﬃcients  is correlated with slope but is not the slope of the yield curve. The improvement comes because it tells you when to bail out — when rates will rise in an upward-slope environment. Basic regression

Unrestricted

4 5 4 2 3 0 2

−2 1 2 3 4 5 Restricted

4 5 2 4 3 0 2 −2 1 2 3 4 5

540 () (1) (2) (5) () +1 =  + 1 + 2 +  + 5 + +1

Regressions of bond excess returns on all forward rates, not just matched   as in Fama-Bliss. • − The plot is a plot of The same linear combination of forward rates forecasts all maturities’ returns. Just stretch • the pattern more to get longer term bonds! To see the point, what would Fama-Bliss coefficients look like? Answer: For each () you’d • see a single blip up at  () andasingledipdownat(1). They would be different for each different . Adapting the numbers from the “Expectations failures—Fama Bliss. (Updated 1964-2012)” table in a different format,

() () (1) +1 =  +    + +1 (1) (2) (3) (4)− (5) n    ³  ´ 2 -0.83 0.83 3 -1.14 1.14 4 -1.38 1.38 5 -1.05 1.05

How could the coefficients be so utterly different? Answer: all forward rates are highly correlated, so huge differences in coefficients don’t actually make that huge a difference in forecast. For example, if all forward rates moved exactly together then a 1.0 coefficient on the 4 year rate would give exactly the same result as a 1.0 coefficient on the 5 year rate.

A single factor for expected bond returns

We can caputre the pattern screaming at us from the data by

5 () (1) (2) (5) () 1 +1 =  0 + 1 + 2 +  + 5 + +1;  =1 4 =2 ³ ´ X

One common combination of forward rates   tells you where all expected returns are going • 0 at any date. Then stretch it up more for long term bonds with  Two step estimation; ﬁrst estimate  by ﬁnding where the average (portfolio of all bonds) is • going, then estimate  to see “if average bonds are going up 1%, how much does this maturity go up?”

5 1 () (1) (1 2) (4 5) +1 = +1 = 0 + 1 + 2 → +  + 5 → + +1 = 0 + +1 4 =2 X Then () () +1 =  0 + +1 Results: ¡ ¢

541 Table 1 Estimates of the single-factor model

A. Estimates of the return-forecasting factor, +1 =   +¯+1 0 2 2 0 1 2 3 4 5   (5) OLS estimates 3.24 2.14 0.81 3.00 0.80 2.08 0.35 105.5 − − − B. Individual-bond regressions Restricted Unrestricted () () () () +1 =  > + +1 +1 = 0  + +1 2 2    ³ ´  2 0.47 0.31 0.32 3 0.87 0.34 0.34 4 1.24 0.37 0.37 5 1.43 0.34 0.35

 capture tent shape. •  increase steadily with maturity, stretch the tent shape out. • The restricted model  almost perfectly matches unrestricted coefficients. (This is the point • of the graph) 2 =034 037 up from 015 017. And we’ll get to 044! • − − There is a very significant rejection of  =0 • The 2 are almost the same for the single-factor restriction as for the unrestricted regressions. • The restriction looks good in the graph. See paper version of Table 1 for standard errors, joint tests including small sample, unit roots, • etc. Bottom line: it’s highly significant; Expectations is rejected, the improvement on FB/3 factor models is statistically significant.

Stock Return Forecasts

Does our bond return forecasting variable also forecast stock returns?

Table 3. Forecasts of excess stock returns (VWNYSE) +1 =  +  + +1   (t)  (t) (5) (1) (t) 2 > − 1.73 (2.20) 0.07 3.56 (1.80) 3.29 (1.48) 0.08 1.87 (2.38) 0.58 ( 0.20) 0.07 1.49 (2.17) 2.64 (1.39)− − 0.10 MA > 2.11 (3.39) 0.12 MA   2.23 (3.86) 1.95 (1.02) 1.41 ( 0.63) 0.15 > − − The 5 year bond had  =143. Thus, 173 211 is what you expect for a perpetuity. • −   does better than D/P and term spread: It drives out spread; It survives with D/P • 0

542 A common term risk premium in stocks, bonds! Times when expected bond returns are high • are also times when expected stock returns are high. That we see a pervasive change in expected returns across maturities and now across stocks and bonds is reassurance that it’s real, not fads or measurement errors.

This is the beginning of a larger project.  forecasts stocks,   forecasts bonds,   • − − ∗ forecasts fx. What is the common element in the right hand side? Here we found that we don’t need a separate forecaster  () (1) for each return () , there is in fact only “one”  −  +1 right hand variable that matters. Next, is there a similar “common factor” across stocks, bonds, foreign exchange, etc.?

See optional notes at the end for much more on CP •

27.6.1 FX Update

1. Better than regressions, let’s look at returns of portfolios — If you put your money into all high interest rate currencies and beneﬁtfromdiversiﬁcation? Jurek Table II panel B. Transforming the regressions to portfolios shows a big mean, a big t statistic, and a very attractive Sharpe ratio. (But again, this stops before the crash)

2. Even better, why not sort into portfolios just like Fama French? See Lustig, Roussanov, Verdelhan Table 1. Putting assets into portfolios and looking at the average return is the same thing as running forecasting regressions. However, it lets you see the economic impor- tance of the regression in a way that forecasting regressions do not show. E(R), Sharpe = E(R)/() can be very big across portfolios even when the forecast R2 is very small.

Country – time – regression view Portfolio view

f-s = i – i* return

543 3. Now that they’re in portfolios how about Fama and French’s second idea. Do average returns line up with factor betas? Here’s what they did

(a) LRV Table 3.

The columns here are weights by which you construct “factors.” The ﬁrst is “everyone

544 against the dollar” like rmrf. The second one is “buy the high interest spread countries, short the low interest spread countries” like hml. We’ll study the method for producing these numbers next week. For now (b) See LRV Table 4.

Expected returns line up in betas on a high-low factor. The high interest rate currencies all depreciate at the same time, and low interest rate currencies all appreciate at the same time. (We’ll come back to the factor stuﬀ later) This table is a bit hard to read. The betas in the bottom half line up with average returns in Table 1. The top half gives a cross-sectional regression of average returns on betas, done three ways (the mechanics aren’t that interesting). (c) In words, you earn returns in the carry trade by taking the risk that all the high interest rate countries will depreciate together, and all the low interest rate countries will appreciate together. You seem not to earn returns by taking the risk that all countries appreciate relative to the dollar. But this might change...

4. Does this trade suﬀer Peso problems, like writing a big put? It certainly fell apart in the ﬁnancial crisis, per Jurek Figure 1.

545 27.7 Updates and new directions

A summary and attempt to see where we’re going.

1. So far you’ve seen bond-by-bond or country-by-country regressions. What’s going on now is an eﬀort to “put this all together.” In many ways that also amounts to “see if this is useful in investing,” and if so how. Some directions

2. Integrate the country and portfolio view.

(a) Regressions still are being run on a country by country basis, with similar results.

    ∆ =  + (  )+ +1  −  +1  = 1 but big standard errors tiny 2.Thetiny2 makesthislookawfullythinasa − trading strategy... (b) A “pooled regression.” If you assume all countries have the same a and b you can put them in one big regression and get a better estimate. (c) “Pooled, FE” lets each country have its own intercept,

       =  + (  )+ +1 −   −  +1 This is the same as the country by country regressions, forcing them only to have the same slope coeﬃcient . In this regression, we look only at how much variation over

546 time in forward spreads or interest diﬀerentials corresponds to exchange rate changes, i.e. “if the forward spread for this country is unusually high relative to this country’s normal value what happens to exchange rates?” (d) “XS” lets each time have its own intercept,

       =  + (  )+ +1 −   −  +1 Then we’re only looking at how much variation across countries in forward spreads or interest differentials corresponds to exchange rate changes, i.e. “if the forward spread for this country is unusually high relative to the forward spreads of other countries this year what happens to exchange rates?” This is the regression counterpart of the portfoio view. (e) The coefficients are different. The XS 2 is a lot higher, suggesting that portfolios will do better. Hassan and Mano (optional paper) are sorting this out...

3. More right hand variables.

(a) FB run () =  + ( () (1))+() +1  −  +1 and the FX regressions are

() =  + ( () ())+() +1  −  +1 (b) CP ask “what happens if you use more right hand variables to improve forecasts?”

() (1) (2) (3) (5) () +1 =  + 1 + 2 + 3 + 5 +  + +1

and “what if you forecast stock returns using bond variables?”

 +1 =  + () + (0)++1

     ∆ =  + (  )+   + +1? +1  −   −  ³ ´ Maybe a CP like factor emerges across countries?

(e) Do interest spreads forecast bond returns? Does 0 forecast currency returns? How many common factor are there in expected returns across all asset classes?

4. Factors Stock average returns correspond to rmrf, hml, smb. Bond average returns correspond to level shocks. (it turns out, we didn’t do this in class, but it’s in “decomposing the yield curve” if you get interested) FX returns correspond to LRV’s slope shock. Put all this together too!

547 27.8 Bond expectations / risk premia summary

1. Expectations hypothesis (in logs). Big picture: the random walk background.

(a) Forward rate = expected future spot rate (b) Long term bond yield = expected future short yields (c) You expect to earn the same amount on bonds of any maturity over the next year (d) A rising term structure (yield higher with maturity, like right now) implies that interest rates will rise in the future, not that you make more money holding long term bonds.

2. Empirical evidence — averages

(a) On average, the yield curve slopes slightly upwards, and returns are slightly higher on long term bonds. But poor Sharpe ratios.

3. Empirical evidence — regressions. Big picture:

excess return +1 =  +  + +1 → As with stocks, are there times when one kind of bond is expected to do better than another?

4. Fama-Bliss

(a) Right hand variable: For the  period bond, use the forward rate ( 1 )lessspot − → rate. (b) Fact:  1. At a one year horizon, a forward rate 1% higher than the 1 year rate means ≈ you expect to earn 1% more on long term bonds. (c) Bond math:   mechanically means either return or rise in one year yield. At a one − year horizon, a forward rate 1% higher than the 1 year rate does not signal a rise in interest rates! (d) At a 5 year horizon, a forward rate 1% higher than the 1 year rate does signal a 1% rise in interest rates. “Sluggish adjustment” Interest rates are Waiting For Godot.

5. Empirical evidence — Cochrane-Piazzesi

(a) Central innovation — forecast returns with  = all forward rates, not just the matched forward spot spread. (b) A common “factor” (linear combination of yields and forward rates) forecasts bond returns of all maturities. Long bond expected returns move more than short maturity expected returns. (This is the most important ﬁnding, and holds even if the 2 improvement and tent are not important.) (c) R2 rises to 0.35-0.44 from FB 0.15. (d) Expected bond returns are high in bad times — until the inevitable rise in interest rates happens. Then you get killed in long term bonds. Watch out! (e) “Signal” is curvature in the forward rate curve. This is not the usual “curvature” factor in the yield curve — more curved at the long end.

(f) Big question: What the heck is 0? What does the tent shape mean?

548 28 Term Structure II. Interest Rates Factor Models notes

The “Investments notes” contain a matrix review. If you forgot how to multiply matrices you • should go read that now.

28.0.1 Motivation and idea

Yields of 1−5 year zeros and fed funds

1960 1970 1980 1990 2000 2010

Yield spreads y(n)−y(1)

−1

−2

1960 1970 1980 1990 2000 2010

Look at the graph. Look at the movie.19 Most movements in 1-5 year bonds are a) “level • shifts” (top graph) b) changes in slope of the term structure (bottom graph). How can we capture this behavior?

How about • ()  =  + level + slope +(otherstuff). If (1)  1 2 (2) − ⎡  ⎤ 1 1 (3) ⎡ ⎤ ⎡ − ⎤ ⎢  ⎥ = 1 level + 0 slope +(otherstuff) ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ (4) ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎢  ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ (5) ⎥ ⎢ 1 ⎥ ⎢ 2 ⎥ ⎢  ⎥ ⎢ ⎥ ⎢ ⎥ ⎢  ⎥ ⎣ ⎦ ⎣ ⎦ Then a 1% change⎣ in “level”⎦ moves all yields up 1%. A change in “slope” moves the short rates down and the long rates up; it changes the slope of the yield curve. Two “factors” describe the movements of five yields.

19lecture9.m

549 Analogy: This is a lot like the FF3F model. •   +1 =  + +1 + +1 + +1 + +1

We express each yield (return) as a sum of coefficients (betas) specific to that security times “factors” (level, slope; rmrf, hml, smb) common to all securities. In turn the factors are specific combinations (portfolios) of the same underlying securities. rmrf is the sum of all returns, hml is long value stocks short growth stocks etc.; level is an average of all yields, slope is long high maturity yields and short low maturity yields etc.

We’re describing variance, the movement of ex-post yields, not yet expected returns,etc.Now • we do care about time series regressions, R2 signiﬁcance of coeﬃcients, etc. And for now, we don’t care about intercepts. We’ll tie together means and variances in a bit.

28.0.2 A simple approach to a factor model

Why don’t we try exactly the FF approach? Let’s deﬁne a level factor as the average of all • yields (like rmrf) and the slope factor as a high-low portfolio just like hml

1 (1) (2) (3) (4) (5) level  +  +  +  +  ≡ 5      1 h (4) (5) 1 (1) (2) i slope =  +   +   2   − 2   h i h i Clearly, when all yields go up, level will rise. When the yield curve slopes up slope will be a big number, and when it slopes down, it will be a negative number.

Now, we can just run regressions to ﬁnd the “betas.” • () ()  =  +  level +  slope +   × ×  I did it, 2     (1) 0.09 0.9998 -0.77 0.97 (2) -0.08 1.0003 -0.28 0.97 (3) -0.04 0.9999 0.10 0.97 (4) -0.02 1.0008 0.38 0.98 (5) 0.04 0.9993 0.56 0.98 This ought to look pretty familiar and just about what you expected. (The units are%, / so the intercepts are truly tiny, i.e. 9 bp for (1) These are yields not returns, so the intercepts are not “alphas.” )

The 2 are very high here. The factor model is almost a perfect ﬁt. There are really only • two portfolios that change, and then everything else is just a linear combination of these. Here is a plot of actual yields and

() ﬁt  =  level +  slope  × × The lines are indistinguishable, meaning that the two-factor model is an excellent ﬁt.

550 Actual Yields

0 1960 1970 1980 1990 2000 2010

Fitted Yields −− Two factor model

0 1960 1970 1980 1990 2000 2010

The good ﬁt means that we are very close to an “arbitrage pricing theory.” Once where you • know where level and slope are (or any two yields) on any date, level and slope,youknow exactly where all the other yields must be! We will end up with “no-arbitrage term structure models” in which there are exactly n factors, and the rest are priced by arbitrage.

Principal components analysis allows you to construct factors like this very easily, without • having to “guess” the factors.

Notice the two steps in constructing a factor model. Any factor model. • 1. How do you form the factors (level, slope) from yields or other observable variables? level = ,slope =  2. What are the loadings or betas; if the factor moves, how much does each bond move? ()  = level + slope

28.1 Factor analysis

That’s nice, but ad-hoc. Can you do better — get a closer fit—withadifferent construction • of level and slope? What’s the best way to form another factor (curvature) to get a better fit?

Objective: • ()  =  + level + slope + curve +(otherstuﬀ, as small as possible)

551 How do you construct factors that are uncorrelatedwitheachother,soastomaximize2 / minimize the variance of the error.

The answer is: the eigenvalue decomposition/principal components. (OLS regression picks • the    to minimize the variance of the error given the factors. The question here is, how do you pick the factors so that after you have run the regressions you get the smallest possible error?)

Here’s the procedure. First a very short version for people comfortable with matrices • 1. Form the covariance matrix of yields. Using the notation

(1)  (2) ⎡  ⎤  = (3) ⎢  ⎥ ⎢ ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ ⎥ ⎣ ⎦ Σ = (0) This is, for the Fama Bliss data, a 5 x 5 matrix. 2. Take its eigenvalue decomposition. (The matlab function eig does this, and you don’t have to know anything about how it works. If you’re curious see the appendix to the notes.) This decomposition forms three matrices from Σ,

Σ = Λ0

20 in which Λ is diagonal, and 0 = 0 = . 3. That’s it. In matlab, there are two steps. With yields =a  matrix as usual, × form Sigma = cov(100*yields) andthenform[Q,L] = eig(Sigma). That’s all you have to do. What does this mean? 4. The columns of  represent the weights by which you form factors from yields,

 = 0

5. The columns of  also represent the loadings or betas, which answer “how much does  move when a factor moves,”  = 

Proof:  is deﬁned as  = 0,so =(0) = .

6. The factors  are uncorrelated with each other, and the diagonals of Λ are their variances,

(0 )=Λ

Proof: (0 )=(00)=0(0) = 0Σ0 = Λ 7. We form approximate (but often a very good approximation) factor models by leaving out factors with small variances , or just considering them as part of an error term.

20 0 = 0 =  comes from the fact that Σ is a symmetric matrix. In general, the eigenvalue decomposition is 1 Σ = Λ− . Covariance matrices are symmetric.

552 8. Once you’ve formed the factors , A regression of  on  also gives the loadings . Since the factors are uncorrelated with each other, if you leave out some factors, regressions with the remaining factors also recover the remaining columns of .

Now, a more detailed version of the same thing. This just amounts to writing out the matrices • and looking at them.

1. Sigma = cov(yields); Formthecovariancematrixofyields

(1)  (2)  = ⎡  ⎤ (3) ⎢  ⎥ ⎢  ⎥ ⎣ ⎦ Σ (0) ≡  I’ll do three yields in the examples, so Σ is a 3 3 matrix. The FB data has 5 yields so × this is a 5 5 matrix. In general you’d do this with many, many maturities. × 2. [Q,L] = eig(Sigma); Use a computer to ﬁnd the eigenvalue decomposition of Σ,

Σ = Λ0

Λ is a diagonal matrix 1 00 Λ = ⎡ 0 2 0 ⎤ 003 ⎢ ⎥ and write the  matrix in terms of its⎣ columns as ⎦

|||  = ⎡ 1 2 3 ⎤ ⎢ |||⎥ ⎣ ⎦ You do not need to know what an eigenvalue is, or how to compute this decomposition by hand (essentially impossible). Just trust and verify that the computer does it right. 3. The eigenvalue decomposition looks like this:

Σ = Λ0

1 00  ||| − 10 − = 1 2 3 0 2 0  ⎡ ⎤ ⎡ ⎤ ⎡ − 20 − ⎤ 003 30 ⎢ |||⎥ ⎢ ⎥ ⎢ − − ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ The property of the Q matrix 0 = 0 =  means  100 − 10 − |||  1 2 3 = 010  ⎡ − 20 − ⎤ ⎡ ⎤ ⎡ ⎤ 30 001 ⎢ − − ⎥ ⎢ |||⎥ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦

553 4. Now, the columns of  tell you how to construct the factors  from the yields .You can form the factors by  = 0 This means (1) (1)  10  (2) − − (2) ⎡  ⎤ = ⎡ 20 ⎤ ⎡  ⎤ (3) − − (3) 30 ⎢  ⎥ ⎢ − − ⎥ ⎢  ⎥ ⎢ ⎥ ⎣ ⎦ ⎢ ⎥ for instance, ⎣ ⎦ ⎣ ⎦ (1)  (1) (2)  = 10 ⎡  ⎤  − − (3) h i ⎢  ⎥ ⎢  ⎥ For example: if ⎣ ⎦

10 = 131313 then this operation constructs the “level”h factor. i

(1)  (1) (2) 1 (1) (2) (3)  = 131313 ⎡  ⎤ =  +  +   (3) 3 h i ⎢  ⎥ ³ ´ ⎢  ⎥ ⎣ ⎦ 5. The factors are uncorrelated with each other, and the Λ diagonals tell you the variance of each factor

(0 )=Λ this means

(1) 2 (1) (1) (2) (1) (3)   ( ) (  ) (  ) 1 00 (2) (1) (2) (3) 2 (2) (2) (3)  ⎛⎡  ⎤    ⎞ = ⎡  ( ) (  ) ⎤ = ⎡ 0 2 0 ⎤ (3) · 2 (3) 00 ⎜⎢  ⎥ h i⎟  ( ) 3 ⎜⎢  ⎥ ⎟ ⎢ ·· ⎥ ⎢ ⎥ ⎝⎣ ⎦ ⎠ ⎣ ⎦ ⎣ ⎦ 6. The columns of  are also the loadings, they tell you how much each  moves when a factor  moves.  =  This means

(1) (1)   (2) ||| (2) | (1) | (2) | (2) ⎡  ⎤ = ⎡ 1 2 3 ⎤ ⎡  ⎤ = ⎡ 1 ⎤  + ⎡ 2 ⎤  + ⎡ 3 ⎤  (3) (3) ⎢  ⎥ ⎢  ⎥ ⎢  ⎥ ⎢ |||⎥ ⎢  ⎥ ⎢ | ⎥ ⎢ | ⎥ ⎢ | ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ or () () (1) () (2) () (3)  = 1  + 2  + 3 

 =  is a factor model—the columns of  are the “loadings” or the “betas”; the  and  of the ﬁrst equation.

554 7. To form a factor model, leave out the factors with really small eigenvalues,eitherignoring them or treating them as small errors For example, () () (1) () (2)  = 1  + 2  +  () () (1) () (2)  =˜ 1  + 2  (1) (2) Since the factors are uncorrelated with each other,  is uncorrelated with  and  so  are still regression coeﬃcients.

Asimpleexample: Suppose the covariance matrix is • 2 (1) (1) (2)      21 (  )= =  0 (1) (2) 2 (2) 12 ⎡  ³ ´ ³  ´ ⎤ " # Now, ﬁnd its eigenvalue decomposition:⎣ ³ ´ ³ ´ ⎦

>>Sigma=[21;12]; >> [Q,L] = eig(Sigma); >> disp(Q); -0.71 0.71 0.71 0.71 >> disp(L); 1.00 0 03.00

21 071 071 10 071 071 = − − " 12# " 071 071 #" 03#" 071 071 # (note 071 = 1√2) Now check that this worked.

>> disp(Q*L*Q’); 2.00 1.00 1.00 2.00 >>disp(Q’*Q); 1.00 0 01.00 >> disp(Q*Q’); 1.00 0 01.00

Example continued: We interpret the result as a factor model. • 1. The factors are two random variables (1) and (2) with 2 (1) (1) (2)      10 (  )= =  0 (1) 2 2 (2) 03 ⎡  ³ ´ ³  ´ ⎤ " # ⎣ ³ ´ ³ ´ ⎦ (2) is more volatile, and the two x’s are uncorrelated with each other.

555 2. We think of  as generated from the factors by

 =  (1) 071 071 (1)  =  (2) −071 071 (2) "  # " #"  # (1) 071 071  = (1) + (2) (2) −071  071  "  # " # " #

3. What does this mean? (2) is the more volatile factor, with 2 (2) =3.Whenitmoves, both  move together (“level”). (1) is a less volatile factorh withi2 (1) =1.When it moves, (1) goes down and (2) goes up (“slope”). The  were correlatedh i with each other. We capture this correlation here with a larger “common component” (2) and a smaller component that sends them in different directions. 4. Language The  are factors and the columns  are the factor loadings since they tell you how much each  moves when an  moves. The  are also “betas”, regression coefficients of  on the factors .Thefirst, smaller factor is a “slope factor” and the second larger factor is a “level factor.” This statement just describes the loadings. 5. Constructing a time-series of the factors. The  are constructed from the  by .

 =  1  = −  = 0 (1) (1)  10  (2) = −− −− (2)  20  "  # " −− −− #"  # (1) 071 071 (1) 071(1) +071(2)  =  =   (2) −071 071 (2) − (1) (2) "  # " #"  # 071 +071

( is usually not symmetric!) Note that the same columns of  that describe how  responds to each of the x, also describe how to construct factors  from the data  We recover the “level” factor from an average of the two yields. We recover the “slope” factor from long-short yield spread!

28.1.1 Unit variance factors — Optional

Areﬁnement: unit variance factors. Now we have • (1) 071 071  = (1) + (2); (2) −071  071  "  # " # " # 2 (1) 2 (2)   =1;  =3 ³ ´ ³ ´ As you can see, the second factor accounts for more of the movement of . Rather than express this fact by remembering that the second  is more volatile, let’s express it as bigger loadings on the second factor. Let’s just rescale the factors by their standard deviations to express the model in terms of factors with unit variance.

556 Deﬁne new factors () where () =  () ().The will have the same (one) variance. • h i (1)  071 (1) (1) 071 (2) (2) (2) = −    +    ; "  # " 071 # " 071 #  ³ h i ´ ³ h i ´ (1)  071 (1) (1) 071 (2) (2) (2) = −    +    ; "  # Ã" 071 # ! Ã" 071 # !  h i h i (1) 071 122  = (1) + (2); (2) −071  122  "  # " # " # (1) (2) (1) (2)   =   =1;    =0 ³ ´ ³ ´ ³ ´ (I used  (1) =1 (2) = √3and√3√2=122) h i h i In this way you can see more easily that the second factor is “bigger.” A typical unit movement • in (2) moves both (1) and (2) up by 1.22. A typical unit movement in (1) moves (1) up by 0.71 and (2)down by 0.71.

The matrix version of this calculation: •  = ; (0)=Λ 1  = Λ 2  1  = Λ 2 ; (0)=

1³ ´ Thus, loadings are the columns of Λ 2 rather than just . Powers of a diagonal matrix are easy, √1 00 1 Λ 2 = ⎡ 0 √2 0 ⎤ 00√3 ⎢ ⎥ and multiplying a matrix by a diagonal⎣ is easy, ⎦

√1 00 1 ||| ||| Λ 2 = ⎡ 1 2 3 ⎤ ⎡ 0 √2 0 ⎤ = ⎡ 1√1 2√2 3√3 ⎤  00√3 ⎢ |||⎥ ⎢ ⎥ ⎢ |||⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ We can recover the  with 1  = Λ− 2 0

Another view, “normalization.” If you have a factor model • (1) (2)  = 1 + 2 there is always a bit of arbitrariness. You can always make the factor bigger if you make the loading smaller. This is the same as 1  =(2  ) (1) +  (2)  1 2  2  × µ × ¶ 1  =(  ) (1) +  (2)  1   2  × µ × ¶ So what’s the right scale?

557 1. The original eigenvalue decomposition picks the “size” from 10 1 = 1 meaning that the  () 2 sum of squared loadings is one, =1 1 =1. 2. That’s nice, but not particularlyP intuitive.³ This´ new idea chooses a scaling factor so that the variance of the factors is one. By choosing  = ((1))wehave 1 (1) =1.   ×  3. You’re free to choose other scaling factors if they are more convenient.³ ´

28.1.2 A real example: yields

A real example. Yields! This also shows you that all this talk is only about 2-3 lines of matlab •

Sigma = cov(100*yields); [Q,L] = eig(Sigma); loads = Q*L^0.5; disp(’standard deviation of factors’); disp(diag(L)’.^0.5); (my names)2-5 zigzag curve slope level 0.06 0.07 0.10 0.58 5.80 disp(Q) 0.06 0.15 -0.47 -0.74 0.46 -0.35 -0.55 0.56 -0.21 0.46 0.70 0.32 0.44 0.12 0.45 -0.59 0.57 -0.03 0.36 0.44 0.19 -0.49 -0.52 0.51 0.43 plot(loads) plot(Q)

558 loadings with σ = 1 4 level slope 3 curve zigzag 2 2−5

−1 1 2 3 4 5

loadings 1 level slope 0.5 curve zigzag 2−5 0

−0.5

−1 1 2 3 4 5

1. Graph interpretation: If factor x moves, how much do yields of maturity 1,2,3,4,5 move? Almost all variation in yields is due to “level” shifts (5.80% ). Slope and curve are almost all the rest. 2. Graph interpretation: What combination of (1)() produces each factor? Sensibly, “level” is an average of all yields, slope is long - short, etc. (1) (1) (2) ||| (2) Interpretation 1: ⎡  ⎤ = ⎡ 1 2 3 ⎤ ⎡  ⎤ (3) (3) ⎢ ⎥ ⎢ |||⎥ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ (1) (1)  1  (2) − − (2) Interpretation 2: ⎡  ⎤ = ⎡ 2 ⎤ ⎡  ⎤ (3) − − (3)  3  ⎢ ⎥ ⎢ − − ⎥ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ The names “level” “slope” and “curvature” come entirely as verbal descriptions of the shapes of these graphs.

3. The standard deviations of the factors are 5.8%, 0.58% and then less than 10 basis points. The factors past the ﬁrst two are really small!

1 4. The top graph Λ 2 tells us “what does the typical (1  ) movement of each factor do to the yield curve?” The bottom graph tells us “What does a unit movement of each factor (very unlikely for the small factors) do to the yield curve?” The bottom graph () is better for eyeballing the shape of the factor loadings without worrying about which is more important than the other. Use it only in conjunction with the . The top graph reminds us that the smaller factors are tiny.

559 28.1.3 Dropping factors

Dropping some factors. As you can see, some of the factors are very small. You capture • almost all movements in yields by keeping just the ﬁrst few factors.

(1)  (2) ⎡  ⎤ (3) ⎢  ⎥ 1 level + 2 slope + 3 curve ⎢ ⎥ ≈ × × × ⎢ (4) ⎥ ⎢  ⎥ ⎢ (5) ⎥ ⎢  ⎥ ⎢  ⎥ ⎣ ⎦

actual yields

0.1

0 1960 1970 1980 1990 2000 2010 level only

0.1

0 1960 1970 1980 1990 2000 2010 level and slope

0.1

0 1960 1970 1980 1990 2000 2010 level, slope and curve

0.1

0 1960 1970 1980 1990 2000 2010

560 levels only 0.07

0.06

0.05

0.04

0.03

0.02

0.01 2000 2001 2002 2003 2004 2005 2006 2007

level and slope 0.07

0.06

0.05

0.04

0.03

0.02

0.01 2000 2001 2002 2003 2004 2005 2006 2007

561 level slope and curve 0.07

0.06

0.05

0.04

0.03

0.02

0.01 2000 2001 2002 2003 2004 2005 2006 2007

Movements in yields can be captured very well by movements in the ﬁrst two - three factors alone.

Dropping factors II. Note that the factors are uncorrelated with each other. ( )=Λ. • 0 Thus, the left out factors are uncorrelated with the factors you keep in.

() () () ()   level +  slope +  curve +(leftout)  ≈ 1 × 2 × 3 × Therefore, this is a regression equation! This is a way of ﬁnding a regression model like FF3F when you don’t know what to use on the right hand side.

Notice the analogy to FF3F: three factors (market, hml, smb) account for almost all return • variation (2 above 90%). The factors are constructed as weighted combinations of the same securities.

2 and variance. Look, for example at (1). Now, the factors are uncorrelated so • 2 (1) (1)2 2 (1) (1)2 2 (2) (1) 2 (3)  ( )=1   + 2  ( )+3  ( )+ (1)2 ³ (1)2´ (1) = 1 1 + 2 2 + 3 3 + 

(1) (2 is the ﬁrst or top element of the second column of  and so forth.) The fact that the  get small after 3 or so means that the error term from leaving out 4 and 5 will be very small —high2

We often say that   gives the “percent of the variance explained by the th factor.” • P Ld= diag(L); disp(Ld’); disp(Ld’/sum(Ld)); fractions of variance 0.00 0.01 0.01 0.33 33.62 0.00 0.00 0.00 0.01 0.99

562 28.1.4 Some uses of a factor model

We’re talking about bonds and yields here, but you can just as easily do the same thing for • stocks and for returns.

Σ = ( 0)

Σ = Λ0

 = 0

nowwehavethefactors

 =  (: 1:3) +smallerror ≈ This is the heart of Barra, commercial risk management models.

Often •

()=(: 1:3) ()+(small error) = = (: 1:3) ()+ alpha

and the model ends up describing means as well.

Uses 1: multifactor immunization. • 1. Recall the basic duration hedge: You choose 1 security so that the portfolio value doesn’t change if level changes. The problem with this is, if the slope changes, you are exposed to risk. 2. Multifactor immunization: You can instead choose 3 securities so that the portfolio value doesn’t change if level, slope and curvature change. Since there are so few additional changes, the portfolio is very well immunized.

Uses 2: hedging/manager evaluation. Like FF3F for bonds. We now have a 3 factor model • for yields, use just like FF3F 3 factor model. If your portfolio has no  on anything, then it will have very little variance, as no    gives no variance. This lets you separate “directional bet” on factors from “mispricing” () of bonds.

As an analogy, back when we did •   +1 =  + +1 + +1 + +1 + +1

we were focused on means and alphas, but there was an important lesson for variance and risk management too.

1. For example, we could think of the variance (risk) of the portfolio as coming from its various betas (assuming for simplicity that the factors are uncorrelated)

2  2 2 2 2 2 2 2  ( )=  ()+  ()+  ()+ ()

2. This “factor model” was also useful for risk management. For example, if we want to hold an asset , the model can tell us “what will happen to the asset if the market goes down another 10%.”

563 3. That information is especially useful in forming a portfolio. For example, if you have two securities  and , the variance of an equally-weighted portfolio is

( + )=2()+2()+2()

Well, modeling covariances is really tough. Instead, though, what if you use the factor model? Then

  2 2 2 2 2 2 2  2    ( + )=( + )  ()+(+)  ()+( + )  ()+ ( )+ ( )+2(  )

Once you know what each beta is (covariance with the factors) then you can quickly ﬁgure out the variance of the portfolio, especially if the  are small and uncorrelated with each other. You can also quickly ﬁgure out how much of  to buy in order to hedge  and make this portfolio variance small. This is why risk management is practically all conducted in the context of factor models. (e.g. Barra) 4. All of this is much easier if the errors are small. For bonds, the errors are essentially zero, which makes these uses of a factor model even more compelling.

28.1.5 Summary

1. Find the eigenvalue decomposition of yield covariance matrix,

Λ0 = (0)

2. The columns of  tell you how much yields of each maturity move when that factor moves — they are “factor loadings”. Example: “First factor” is typically a “level factor” so 1 = 1515151515 0. This means “if factor 1 moves 1, all yields go up by 1/5.” Theseh are also regression coeifficients of yields on factors. 3. The columns of  also tell you how much to weight each yield when you construct the factor from the yields. Example: you find the first factor at any point in time by taking the average of the 5 yields.

4. The elements of diagonal Λ are the variance of each of the (uncorrelated with each other) factors.

5. A “factor model” comes from ignoring the very small factors — small Λ.

6. A factor model for yields also implies a factor model for prices or forward rates.Youcanalso directly factor-analyze prices, forward rates, returns, changes in yields, prices, forward rates or returns. The results are very similar.

28.2 Simplest term structure model — expectations hypothesis

Now, a third approach to a factor model. In this case we will “derive” a factor model. Here, • I’ll use the expectations hypothesis to connect bonds of various maturity. That’s a shortcut, real term structure models start with  = (), but that adds a bit of algebra and not much clarity. The logic is the same.

564 Here is the simplest model, which shows you the logic. We specify a “short rate process” and • use the expectations hypothesis to get other maturities.

Start with an AR(1) for the short rate, • (1) (1)   = ( )++1 +1 −  − ³ ´ Now, let’s ﬁnd all the other forward rates assuming the expectations hypothesis. It’s easy to • calculate forecasts of an AR(1)!:

(2) (1) (1)  = ( )= + ( )  +1  − (3) (1) 2 (1)  = ( )= +  ( )  +2  − () (1)  1 (1)  = (+ 1)= +  − ( ) − −

Yields are just a little more complicated 1 (2) =  ((1) )+(1)  2  +1  1 h i =  + ((1) )+(1) 2  −  1+h  i (2)  = (1)   − 2  −  ³ ´ 2  () 1+ +  +  (1) 1  (1)   =   = −    −   − (1 )  − ³ ´ − ³ ´ Alternatively, calculate yields from the forward rates 1 1 (2) = (2) = (2) (1) + (1)  −2  −2  −   1 ³ ´ (2) = (1) +  (2)  2   1 ³ ´ (2) = (1) +  + ((1) )  2   − 1+³  ´ (2)  = (1)   − 2  − ³ ´ and so on.

Look what we have • 1. This is a one-factor model with the one-year rate as the single factor. 2.Themodelcanproducediﬀerent shapes of the forward and yield curve, including upward and downward sloping. The curve is upward sloping when short rates lower than average (1) ( ) and thus are expected to rise. This captures a lot of reality!

565 3. You want more complex shapes or more factors? Move past the simple AR(1)! An example:

 =  1 +  −  =  1 +  −

(1)  =  +  (2) (1)  = +1 =  +  () (1)  1  = + 1 =  +  −  −

 is a “level” factor and  is a “slope” factor.

() (1)  1   =  − 1   −  − ³ ´ How is this diﬀerent from a real term structure model? Real models start with  () = •  (+1+2+ 1), but otherwise follow the same logic. The expectations hypothesis × is only an approximation. Doing it right, real models add extra 122 terms. For example, the “single-factor Vasicek” model looks just like our model but with some extra terms.

(1) (1)   =    + +1 +1 −  − ³ ´ ³ ´ 1  (2)  =  (1)  +  2   2  − − − ∙ ¸ ³ ´ ³ ´ 1  (3)  = 2 (1)  (1 + )2 + (1 + ) 2  −  − − 2  ³ ´ ³ ´ ∙ ¸

What do we use models like this for? When you do it right, you are sure there are no arbitrage • opportunities. With a single factor, every term structure derivative depends only on the single factor. Then you can price omitted maturities or term structure options. The idea is exactly the same as Black and Scholes’ idea to price stock options from the “single factor” of the underlying stock, by arbitrage.

566 29 Term structure extras

The following is not required material either in class problem set or ﬁnal. In previous years I did two lectures on interest rates, and some of this is the second lecture that you’re missing. If you’re interested at some point, you might ﬁnd it useful. If you’re having enough problems keeping up with what we actually did, skip it!

29.1 CP extras

We do a lot of extra work to convince you it’s real and robust. This is the “notes” version of material in the paper, and some extensions. I won’t cover this in class unless by miracle I have extra time.

29.1.1 More lags

4 i=0 3 i=1 i=2 2 i=3

0 Coefficient −1

−2

−3 1 2 3 4 5 Maturity

() () +1 =  + 0   + +1 − 3 ()  () +1 =  + 0  + +1 − =1 X More lags are signiﬁcant, with the same pattern. • Checking individual lags reassures us it’s not just measurement error, i.e. •

+1  =  +  + +1 − if  is measured with error, you’ll see something. But

+1  =  +  112 + +1 − − ﬁxes this problem.

567 Spuriously high price

Large return at t Low return at t+1

The pattern suggests moving averages •

+1 =  + 0 ( +  1 +  2 + ) +1 − −

 12346 2 0.35 0.41 0.43 0.44 0.43

Interpretation: Yields or forwards at  should should carry all information about the future. If • the lags enter, there must be a little measurement error.  change slowly over time, so  112 is − informative about the true .Movingaveragesareagoodwaytoenhanceaslow-moving“signal” buried in high-frequency “noise.”

29.1.2 History

5 0 Fama−Bliss

5 γ ′ f 0 −5 5 0 −5 Ex−post returns

10 Yields 5 0 1965 1970 1975 1980 1985 1990 1995 2000

  predictions are consistent in many episodes • 0 568   and slope are correlated. Both show a rising yield curve but no rate rise • 0   improvement in many episodes.   says get out in 1984, 1987, 1994, 2004. What’s the • 0 0 signal....?

14 Dec 1981 12 Apr 1983 10 Aug 1985

8 Apr 1992

6 Aug 1993

Forward rate (%) Feb 2002 Dec 2003 4

1 2 3 4 5 Maturity

Green: CP say go. Red: FB say go, CP say no. • Tent-shaped coeﬃcients interact with tent-shaped forward curve to produce the signal. • CP: in the past, tent-shape often came with upward slope. Others saw upward slope, thought • that was the signal. But an upward slope without a tent does not work. The tent is the real signal.

29.1.3 Real time

569 10

5 Full sample

5,−5 Real time

1970 1975 1980 1985 1990 1995 2000

Regression forecasts ˆ>. “Real-time” re-estimates the regression at each  from 1965 to .(Note: Just because we don’t have data before 1964 doesn’t mean people don’t know what’s going on. Out of sample is not crucial, but it is interesting.)

29.1.4 Macro

Return forecast Unemployment

1965 1970 1975 1980 1985 1990 1995 2000

Is it real, a time-varying risk premium? Or is it some new psychological “eﬀect,” an unexploited • proﬁt opportunity? Here,   is correlated with business cycles, and lower frequency. (Level, not growth.) Suggests • 0 a “business cycle related risk premium.”

570 It’s also significant that the same signal predicts all bonds, and predicts stocks. If “overlooked” • it is common to a lot of markets! The Point of history, real time and macro: this is the kind of analysis you do to make sure • you’re not finding a fish. 1/20 t statistics look good. You need to persuade yourself.

29.1.5 Failures and spread trades

What this is about (so far): when, overall is there a risk premium (high expected returns) in long • term vs. short term bonds. “Trade” is just betting on long vs. short maturity, “betting on interest rate movements.” What this is not about (so far). Much ﬁxed income “arbitrage” involves relative pricing, small • deviations from the yield curve. “Trade” might be short 30 year, long 29.5 year. A hint of spread trades If the one-factor model is exactly right, then deviations from the single-factor model should not be predictable. (2) (2) (2)  2+1 =  +00 + +1 =  +00 + +1 +1 − (Why?

(2) (2) (2) +1 =  + 2 0 + +1 (2) +1 =  + 0 +¡ +1¢ multiply the second by 2 and subtract.)

Table 7. Forecasting the failures of the single-factor model A. Coeﬃcients and t-statistics Right hand variable (1) (2) (3) (4) (5) Left hand var. const.      (2) +1 2+1 0.11 0.20 0.80 0.30 0.66 0.40 (t-stat)− (−0.75) (−1.43) (2.19) (−0.90) (−1.94) (1.68) (3) − − − − +1 3+1 0.14 0.23 1.28 2.36 1.01 0.30 (t-stat)− (1.62) (2.22) (−5.29) (11.24) (−4.97) (−2.26) (4) − − − +1 4+1 0.21 0.20 0.06 1.18 1.84 0.82 (t-stat)− (2.33) (2.39) (−0.33) (−8.45) (9.13) (−5.48) (5) − − − +1 5+1 0.24 0.23 0.55 0.88 0.17 0.72 (t-stat)− (−1.14) (−1.06) (1.14) (−2.01) (−0.42) (2.61) − − − − B. Regression statistics 2 2 () () Left hand var.   (5) (˜>) (lhs) ( >) (+1) (2) +1 2+1 0.15 41 0.17 0.43 1.12 1.93 (3) − +1 3+1 0.37 151 0.21 0.34 2.09 3.53 (4) − +1 4+1 0.33 193 0.18 0.30 2.98 4.90 (5) −  5+1 0.12 32 0.21 0.61 3.45 6.00 +1 − Pattern: if () is a little out of line with the others (low price), then () is good relative to • all the others.

571 No common factor. Bond-speciﬁc mean-reversion. • This is tiny. 17-21 bp, compare to 200-600 bp returns. • The single-factor   accounts for all the economically important variation in expected returns • 0 But the left hand side is tiny too, so tiny/tiny = good 2 • Tiny isn’t so tiny if you leverage up like crazy! • But... measurement error looks the same. •

29.1.6 Latest data

CP Real−time forecast average (across mat) returns with spreads 8

−2

−4

forecast CP Sample actual −6 1990 1995 2000 2005 2010 2015

CP Real−time forecast average (across mat) returns 8

−2

−4

forecast CP Sample actual −6 1990 1995 2000 2005 2010 2015

572 FB Real−time Forecast average (across mat) returns with spreads 8

−2

−4

forecast CP Sample actual −6 1990 1995 2000 2005 2010 2015

Out of sample looked pretty good until the financial crisis. CP got the low level right and matched the ups and downs a lot better than the FB forecast. We both missed the great returns to long-term bond holders before the financial crisis (big negative spike). We got the big negative spike in returns with the crisis. We’ve been matching the ups and downs pretty well since then, though off on the overall level. Of coure, the Fama-Bliss forecast has been doing spectacularly well since the crisis, so the correlation of yield spread with tent is the big story, and in fact tent is hurting relative to yield spread. However, none of this bother us that much. The main point of the CP model is that there is a single-factor model of expected bond returns. It does not really matter whether that single factor is recovered by a “slope” operator, a “tent” operator, or includes additional variables. More on this point, with equations, after we discuss a bit what “factor models” means.

29.1.7 CP factor vs. level slope and curvature

The CP return-forecasting variable 0 is a “factor” — it’s a linear combination of forward rates or yields. Here, we express it as a function of yields to compare it to the level slope and curvature factors. This is here as an interesting exercise using factor models.

573 A. Expected return factor γ* B. Yield factors 15 10 0.5 level 5 0 0 slope −5 −0.5 −10 curvature 1 2 3 4 5 1 2 3 4 5 Yield maturity Yield maturity C. Return Predictions D. Forward rate forecasts 15 4 10 level, slope, 2 5 & curve 0 level & slope 0 −5 −10 −2 1 2 3 4 5 1 2 3 4 5 Yield maturity Forward rate maturity

Panel A: Panels A and B are the loadings,howyouconstruct0 and yield factors from underlying yields. This is the tent, expressed as a function of yields. We can express 0 as a function of yields too. 0 = ∗0;Whatyield curve signals high returns on long term bonds (not forward curve)? A:  Slope plus 4-5 spread. ∗ ≈ Panel B: 0 has nothing to do with level slope and curvature slope and curvature (0 is curved at the long, not short end). Panel C: We can forecast returns using factors at time t, to try to summarize the information in the entire yield curve. +1 =  +   +   + +1 Then, from the construction × ×  = 0, etc, you can figure out the implied regression coefficients of +1 on . The graph plots those implied coefficients. Moral: You can’t approximate the forecasting power of 0 well by running +1 on level, slope, and curvature factors. Moral 1 Term structure models need L, S, C to get yield behavior and   to capture the • 0 information in the yield curve about expected returns. Adding   will not help much to hit yields (pricing errors) but it will help to get forecasts • 0 right

Moral 2. You can’t ﬁrst reduce to L, S, C, then examine +1 Reason #1 this was missed. • → Panel D: The tent pattern is stable as we add forward rates, a guard against multicollinearity. (Not really related, but we had a blank panel to ﬁll!)

574 29.2 What Is and Is not Important about CP

What is important, and very robust: A single factor model for expected returns. What is not • important: tent vs slope, or the ability of additional variables to forecast returns OK. Now, with multiple securities and forecasts expected returns move around over time too. • () +1 is a random variable at time t. So, how correlated is variation in expected return of (2) (3) one bond with another, what is ((+1)(+1))?Stopandwrapyouyourmind around the concept that expected returns vary over time and can be correlated with each other. For example, can the expected return of a two year bond go up and a three year bond go down? Can the expected return on stocks rise and the expected return on bonds fall?

1. The real, end (I think) enduring point of the CP model is that there is a near-perfect single-factor model of all bond expected returns. Expected returns on all bond maturities rise and fall together. It does not happen that expected return on one bond goes up while the expected return on another bond goes down. In equations () (+1)=(factor) () () 2  ( )( ) =   (factor) +1 +1 × it’s like a sigle colum ofh asingle, and zerosi everywhere else. 2. For counterexample, the Fama-Bliss regressions suggest that there are four factors in the four expected returns. If (2) (2) (1)   = 2 + 2(  ) +1  −  ³ (3) ´ (3) (1)   = 3 + 3(  ) +1  −  ³ ´  (2) (3) Then it is perfectly possible for  +1 and  +1 to go their separate ways. 3. Now, (a little more subtle) even the³ FB´ regressions³ could´ have a one-factor model of expected returns if ( (2) (1))and( (3) (1)) were perfectly correlated: If there were a  −   −  one-factor model of forward rates then FB regressions would imply a one-factor model of expected returns. But the fact is that’s not true. There are at least three forwardrate factors (level, slope, and curvature) so FB regressions imply a three-factor model of expected returns.

What’s not important about CP: The tent vs. the slope. It’s too easy to get swept up in the • 2 contest, and whether out of sample the average FB forecast does better or worse than CP. Suppose when all is said and done, we decide that the CP factor really is slope, i.e. something like () (1) (2) (4) (5) ( )= +  2 1 +  +2 +1 −  −    rather than ³ ´ () (1) (2) (3) (4) (5) ( )= +  2 +1 +2 +1 2  +1 −     −  We will be perfectly happy. The exact³ shape of the coeﬃcients that recover´ the factor is not well identiﬁed.

575 Similarly, it is likely that in the end other variables will help to forecast returns. The inter- • esting question in the end is whether those variables leave the single factor structure intact, i.e., say () (1) (2) (4) (5) ( )= +  2 1 +  +2 +  +1 −  −    The interesting question is not whether³ such variables exist, i.e. if  = 0. We´ will be perfectly happy with  =0 6 Why we’re excited about CP. What is the factor structure of expected returns more broadly? • Is there some combination of  and  that forecasts both stocks and bonds? Is it true that

 stocks:  =  (  +  ) +1 × × () bonds:  =  (  +  ) +1 × × If not, is there at least “factor structure” whereby a few factors describe most of the variance of expected returns across asset classes? The pictures in “Discount rates” of lots of prices moving together, and my interpretation of a big common discount rate shock, certainly suggests it!

Uniting the models.Sofar,wehave • () ( )= (cp factor) +1 × (So far, this is all just “describe,” not “explain”) In “Decomposing the yield curve,” Piazzesi and I look for factors to account for expected returns. Expected returns should correspond to covariance with something, no? The answer is

() () ( )=(  level+1) (cp factor) +1 +1 ×

The  factor describes how expected returns vary through time.The coeﬃcients that tell you how much each bond expected return loads on the factor is given by the covariance of return with the level factor!

29.2.1 Treasury curves during the crash

Some of this comes down to weirdness in the treasury yield curve during the crisis. In Dec 2008 there was a huge demand for treasuries that could be repoed to ﬁnance other positions. There was also a separaton into two yield curves, one for treasuries with 6% couponds and one for the others, documented in the Pancost extra reading below. This is leading to very weird treasury yield curves, which I think are beyond the FB data construction technique to make sense of .

576 yields and forwards in the crash 5 12/06

12/07 3 percent

12/08

0 1 2 3 4 5 maturity in years

Forwards in the crash

3 percent

06/08 07/08 08/08 2 09/08

10/08

1 11/08

12/08

0 1 2 3 4 5 maturity in years

Here are the underlying data. Each bond is a blue dot. The red line is the FB ﬁt to the yield curve. You can see it’s pretty hopeless lately because there’s so much noise in blue dots In 06 it was all sensible

Treasury and Fama Bliss yield curve Dec 29 2006 5.5

4.5

3.5

3 Yield 2.5

1.5

0.5

0 0 2 4 6 8 10 12 14 16 18 20 Duration

577 In Dec 07 things were getting worse. The massive upward curve causes the CP factor to say returns should be extremely negative.

Treasury and Fama Bliss yield curve Dec 30 2007 5

4.5

3.5

2.5 Yield

1.5

0.5

0 0 2 4 6 8 10 12 14 16 18 20 Duration

Now it’s just a mess. The zig zag in yields gives rise to the weird forward rate pattern. Look at these huge spreads for bonds of the same duration! I think that’s the on the run (and deliverable to short/repo) vs. oﬀ the run spread.

Treasury and Fama Bliss yield curve Dec 30 2008 5

4.5

3.5

2.5 Yield

1.5

0.5

0 0 2 4 6 8 10 12 14 16 18 20 Duration

Moral: In future research, I need to ﬁnd better ways to extract the yield curve from data like this. Aaron Pancost (optional reading) dug in to treasury curves in Dec 2008, and found another nifty “arbitrage” in the ﬁnancial crisis

578 579 According to Aaron, UBS had a huge portfolio of old high coupon bonds and dumped them in Dec 2008. It’s an an interesting “downward sloping demand” “apparent arbitrage” “no, a stop loss order is not a put option,” “ﬁre sale,” etc. story. Quotes from an un-named source:

Since I’m sitting on the xxx fixed income trading floor today, I walked over and talked with one of the main Treasury bond traders who was here during the 2008 crisis period. Here’s his take. 1. The numbers are real. There really was a large disconnect in the market during this period, although a 40 basis point mispricing was pretty small relative to some of the other strange stuff we were seeing back then. 2.TheproximatecauseforthestrangepricinghereisUBS.UBSwasthelargest dealer in the Treasury market at that time given their willingness to commit balance sheet. When UBS along with everybody else started losing money, they pulled back their balance sheet commitments and started selling big chunks of their Treasury portfolio. 3. Given that these older bonds had always been a bit on the cheap side given their illiquidity, UBS had put together a large position in these older bonds. Typically, these bonds were about 5bps cheap with gusts up to 15bps. When UBS suddenly had to start unwinding their balance sheet commitments, these bonds were differentially affected since UBS was selling them much more than they were the notes simply because that’s what their inventory was. [JC: I.e. following my advice that the trader who doesn’t want liquidity should buy the cheap illiquid asset, but forgetting my advice about the “average investor,” and they might indeed be the trader who wants liquidity! You can’t leverage a liquidity-providing position!]

580 4. Also, these longer bonds were traded primarily on the STRIPS desk which isn’t the same as the usual Treasury desk. This opened the door to some disconnects between the markets. 5. Anecdotally, I’m told that UBS lost upwards of 200MM on the unwind of these long bond positions. 6. Academics will ask, why wasn’t this stuﬀ arbed? But you have to remember the environment when nobody was willing to commit balance sheet or take the mark to market risk. There were far larger arbs in the market than this 40 basis point mispricing. There wasn’t any risk capital for something this small.

29.3 Factor models in FX

We return to Lustig Roussanov and Verhdelahn and review the factor models in the 1-6 • portfolios.

(Possibly) we return to Brant and Kavajecz and review the factor model in daily yield changes. •

29.4 Expectations hypothesis vs. risk neutrality

1. Why is it a “hypothesis” and not “risk neutral?” We are used to doing ﬁrst approximations of this sort by arguments that speculators are pretty risk neutral, so expected returns of one asset are equal to those of another.

2. An easy but not quite right version

(a) Start as always with (1)  = (+1) (2) (1)  = (+1+1) (b) Add risk neutrality,

   =   +1 − +1 −  µ  ¶   =0:+1 = −

 ()+ 1 2() (c) So, using  =  2

(2)  (1)  = − (+1) (2)  (1)   = −  +1 (2) (1) 1 2 (1)    +  ( )   = −  +1 2  +1 (2) (1) 1 2 (1)  =  +  +  ( )  − +1 2  +1 (2) (1) 1 2 (1) 2 =  +   ( )  +1 − 2  +1

581 (d) What about ?

(1)    = (− )=− (1)  =  (1) −  = 

(e) So, 1 1 (2) = (1) +  (1) + 2((1) )  2 +1  +1 4  +1 h i As you see, the “real” answer has a 1/2 2 term in as well. This is usually small.  =001 and ()=001 so 2()=0012. (f) As you will notice, I played a bit of a trick here. If people are really risk neutral in this sense, then

(1) +1 =  2 (1)  (+1)=0

Interest rates never change at all! Ok, we do get the expectations hypotheisis, but it’s pretty boring.

3. Doing it right — market prices of risk.

(a) We could put in  shocks or think about nominal bonds, but that turns out not to be productive. So we’ll have to think of “risk neutral” as meaning “interest rate risk has a beta of zero,” which term structure people say is “the market price of interest rate risk is zero.” (b) The aswer 1 1 (2) = (1) +  (1) + 2 (1) + (∆ (1) )  2   +1 2 +1 +1 +1 h i h i The ﬁrst term is the expectations hypothesis! The last term is familiar: it’s the beta of interest rates on consumption growth. It’s the “market price of interest rate risk.” So, the expectations hypothesis results if the market price of interest rates is zero. Except... except for the annoying 122 term. The expectations hypothesis is not exactly the same thingaszeropriceofinterestrateriskbecauseoftheannoying1/22 terms. (c) How to do it? It’s really only a few lines of algebra. Here we go. I’ll let people be risk averse, and I’ll let the conditional mean of consumption growth ∆+1 wander around. To keep it simple, I’ll keep the conditional variance of consumption growth 2 2 2 constant  (∆+1)= (∆+1)= (∆)    =   +1 − =   ∆+1 +1 −  − − µ  ¶ 1 2 2 (1)  ∆+1  (∆+1)+   (∆)  = (+1)= − − = − − 2

(1) ³ 1 2 2 ´  =   (∆+1)+   (∆)  − − 2

582 1 2 2 (2) (1)  ∆+1  +1(∆+2)+   (∆)  = (+1+1)= − − − − 2 1 2 2 2 ∆+1 ³+1(∆+2)+   (∆) ´ =  − − − 2 ³ ´  ()+ 1 2() Once again, apply  =  2

1 2 2 1 2 2 1 2 2 1 2 (2) 2 (∆+1) (∆+2)+   (∆)+   (∆)+   (+1(∆+2))+ 2 (∆+1+1(∆+2))  = − − − 2 2 2 2 

Yes, +1 (∆+2) is a random variable at  + 1, with a variance and covariance as you look at it from period zero. Taking logs, rearranging and simplifying a bit 1 (2) =   (∆ )+ 22 (∆)   +1 2 ∙− − ¸ 1 +   (∆ )+ 22 (∆)  +2 2 ∙− − ¸ 1 + 22 ( (∆ )) + 2(∆  (∆ )) 2 +1 +2 +1 +1 +2 Now,

2 (1) (1)  (∆+1+1 (∆+2)) = (∆+1  )=(∆+1 ) − +1 +1 substituting the terms in brackets as well, and using  in place of ,

(2) (1) (1) 1 2 2 (1) 2 =   +   [+1 (∆+2)] + (∆+1 ) −  −  − +1 2 +1 1 1 (2) = (1) +  (1) + 2 (1) + (∆ (1) )  2   +1 2 +1 +1 +1 h i h i 30 Term Structure III Term structure models notes

Reminders 1 Two steps in everything we’re doing •

1. Form the factors (level, slope) from yields (returns, etc) level = ,slope =  () 2. Loadings or betas;  = level + slope

Reminder 2 • 1. Our ﬁrst attempt: a) Form factors from intuition, b) Run regressions to ﬁnd loadings

1 (1) (2) (3) (4) (5) level  +  +  +  +  ≡ 5      1 h (4) (5) 1 (1) (2) i slope =  +   +   2   − 2   h i h i

() ()  =  +  level +  slope +   × × 

583 2. Our second attempt. Use principal components to maximize 2.  answers both questions.

Λ0 = (0)  = 0

 = 

Leave out columns of  corresponding to small eigenvalues — small factors. 3. Now: More ways to reﬁne the same two steps.

Purpose: • 1. The factor model was nice, but it could only describe bonds that we included from the beginning. 2. How do you interpolate the yield curve across maturities? What should the yield be of a 2.5 year bond? What should be the yield be of a 6 year bond? How do we ﬁnd a good set of ()()() to calculate, for example,

(25) (25) (25) (25)  =  level +  slope +  curve  × × × 3. How do you price term structure options, other derivatives? How do you “Interpolate” (or extrapolate) across instruments, as Black and Scholes do for stock and bond? Exam- ple: what price should we charge for special products — ARM mortgages with an interest rate cap? This is a big business. 4. In both cases, we want to extend in a way that doesn’t introduce arbitrage opportunities or huge Sharpe ratios. 5. Example: if (2) =2%and(3) =3%,maybeyouﬁt (25) =25%. But are we sure this price doesn’t give an arbitrage opportunity or a huge Sharpe ratio to a hedge fund if they could do fancy long-short strategies between 2, 2.5 and 3 year bonds? If 253% was right, they’ll bury you.

30.1 Real, “arbitrage free” term structure models

Still to do: • 1. This model has no average slope, and no risk premium — (())areallthesame. That isn’t a surprise, we imposed the expectations hypothesis! A better model has to incorporate risk premiums. 2. One nagging problem — are we sure we haven’t let in arbitrage opportunities that clever options traders might exploit?

Approach:  = () of course. This is equivalent to “risk-neutral” “arbitrage-free” ap- • proaches.

Idea: •

584 1. No arbitrage opportunities we can generate prices with some marginal utility 0. ⇐⇒ 2.  allows you to introduce risk premia, i.e. recall ()=()() 3. The ideal approach would be to write, say,

   =  +1 − +1  µ  ¶ Then

   (1) =   +1 − 1    " µ  ¶ × #    (2) =  2 +2 − 1    " µ  ¶ × # etc. This would tie bond prices, returns to macroeconomic events more than our hand waving so far. Alas, these don’t work in practically useful ways yet. 4. What current term structure models do: (a) Write a statistical model for , for example,

+1 =  +  + +1

(b) Find bond prices for any maturity from

(1)  =  (+1 1) =  +  (2) × (1)  =  (+1+2 1) = (+1 )=(+1 ( + +1))  × +1 and so forth. (This is equivalent to “short rate process” plus “market price of risk.” ( ) generates “market price of risk”). (c) Calibrate parameters   2()tofit bond prices (or other securities) in hand (for example, choose a few maturities to match, or better, match “level” “slope” and “curvature” factors). (d) Use the  to find prices of all other bonds, swaps, options, etc. 5. This should remind you of the problem set in which you found an  to price stock and bond, and then used it to value an option. We’re doing the same thing — finding an  that prices a few bonds, and then using that  to try to price all bonds and bond options.

30.1.1 Simplest term structure model, “discrete time Vasicek”

It’s the simplest example and shows the ideas. • Where we’re going: (the answer when we’re all done) A “short rate equation” plus “one- • factor arbitrage-free model of the term structure”

(1) (1)   =    + +1 +1 −  − ³ ´ ³ ´

585 1  (2)  =  (1)  +  2   2  − − − ∙ ¸ ³ ´ ³ ´ 1  (3)  = 2 (1)  (1 + )2 + (1 + ) 2  −  − − 2  ³ ´ ³ ´ ∙ ¸

(1 + ) 1 1 (2)  = (1)  +  2  − 2  − − 2 2  ³ ´ ³ ´ µ ¶ (1 +  + 2) 1 1 (3)  = (1)  1+(1+)2 +  [1 + (1 + )] 2  − 3  − − 3 2  ³ ´  ³ ´ ½ h i ¾ Intuition: the ﬁrst equation tells you where interest rates are going over time. The second and third sets of equations tell you where each forward rate is at any date, depending only on where the short rate is on that date. This is just like the last one-factor model with a bit more complex constant term. Construction • 1. Construction. Suppose  follows the time series model

(+1 )= ( )++1 − − 1 2 2 ln +1 =    +1 − − 2  − 2 To be speciﬁc,  are iid Normal.    are free parameters; we’ll pick these to make the model ﬁtaswellaspossible.(Moreintuition on this model below.) The idea is that we can’t see  but market participants can; it carries the information about the conditional mean of their discount factors/marginal utility growth. (Again, view this as analogous to constructing a discount factor to price stock and bond, rather than as a deeply-founded idea.) 2. First bond price. (1) ln +1  =  (+1)=  (a) Math fact21:If is normal with mean  and variance³ ´2,then

 + 1 2  ( )= 2  (35) Intuition:  is curved up, so () () (like risk aversion but backwards). (b) Applying that fact (watch this line carefully!) (1) 1 2 1 2 2 1 2 2 [ln(+1)]+  [ln(+1)]    +     =  2 = − − 2 2 = − Take logs, (1)  =ln (+1)=  (1) −  =  2 2 Now you see why I set up the problem with the 12  to begin with! 21Wheredoesthiscomefrom? 2 1 ( )  1 ∞  −  ( )=  − 2 2  2 Z−∞ + 1 2 Complete the square and you can reexpress this as  2 times the integral of a normal, which is 1.

586 (c) The one year interest rate “reveals” the latent (not observed) state variable x.So, we can write the ﬁrst equation

(1) (1)   =    + +1 +1 −  − ³ ´ ³ ´ just as we did with the expectations-hypothesis version above. 3. On to the next price. Pay attention to this algebra

(2) (1)  =  +1+1 ³ ´(1) (1) ln(+1)  ln(+1)+ =    +1 =   +1 µ ¶ µ ¶ 1 2 2    +1 +1 =  − 2  − − − ³ ´ Now we use the model for ,

+1 =  +  ( )++1 −

+1 +  =2 +(1+)( )++1 − substituting,

1 2 2 (2)   +1 2 (1+)( ) +1  =  − 2 − − − − − 1 2 2 (2) ³ 2 (1+)( )   (1+)+1 ´  =  − − − − 2 − ³ ´ Using (35) again,

1 2 2 1 2 2 (2) 2 (1+)( )   + (1+)   = − − − − 2 2 1 2 (2) 2 (1+)( )+( +)  = − − − 2 (2) 1 2  = 2 (1 + )( )+( + )  − − − 2  4. From prices, we ﬁnd yields and forwards,

1 (1 + ) 1 1 (2) = (2) =  + ( ) +  2  2  2  2 2  − − − µ ¶ (1 + ) 1 1 (2)  = (1)  +  2  − 2  − − 2 2  ³ ´ ³ ´ µ ¶

 (2) = (1) (2)   −  1 2 =  ( )+2 +(1+)( ) ( + ) − − − − − 2  1  (2) =  +  (1)  ( + )2   − − 2  ³ ´1  (2)  =  (1)  ( + )2  −  − − 2  ³ ´ ³ ´

587 Next price • (3) (2)  =  +1+1  You don’t really want to do it, right? ³ ´

Result: • (1) (1)   =    + +1 +1 −  − ³ ´ ³ ´ 1  (2)  =  (1)  ( + )2  −  − − 2  ³ ´ ³ ´ 1  (3)  = 2 (1)  (1 + )2 + (1 + ) 2   2  − − − ∙ ¸ ³ ´ ³ ´ 1 2  (4)  = 3 (1)  1+ + 2 + (1 +  + 2) 2  −  − − 2  ³ ´  ³ ´ ∙ ³ ´ ¸

(1 + ) 1 1 (2)  = (1)  +  2  − 2  − − 2 2  ³ ´ ³ ´ µ ¶ (1 +  + 2) 1 1 (3)  = (1)  1+(1+)2 +  [1 + (1 + )] 2  − 3  − − 3 2  ³ ´  ³ ´ ½ h i ¾

(1) 1. As before, we have a “short rate process” plus a “one factor model” When  moves, all the other yields move together, longer ones move less than shorter ones. (1) 2. This is just like the EH model, with a different set of constants. Since the + are the same as in my expectations model, the new set of constants are ariskpremiumin () (1) the term structure — If  = + 1+stuff,well,stuff is a risk premium! − Let’s look at the results. • 1. I chose some parameters to fit the FB zero coupon bond data22 () (1) 2. I plot  for a bunch of assumed values for  Thesearethepossibleyieldcurvesat any date according to the model. The dashed lines represent the expectations hypoth- (1) esis, (+), calculated using the AR(1) model, and therefore also the forward rates that were produced by my simple expectations model.

22 (1) (1) I ran a regression of +1 on  to get ; I took the variance of errors from that regression to get ;Itookthe (1) mean  =   . Finally, I picked the market price of risk  to ﬁt the average 5 year forward spread: ³ ´ 1 2  (5) =  + 4 (1)  1+ + 2 + 3 + (1 +  + 2 + 3) 2   − − 2  ³ (5) ´ h ¡ ¢ i    − 1 2 3  = 2 3 2 1+ +  +  −(1 + ³+  ´+  ) − 2 ¡ ¢

588 forwards, λ = −13.2, ρ = 0.84, 100 x σ = 1.58 forwards and expectations 0.1 0.1

0.08 0.08

0.06 0.06

percent 0.04 percent 0.04

0.02 0.02

0 0 0 2 4 6 8 10 0 2 4 6 8 10 maturity maturity

yields yields and expectations 0.1 0.1

0.08 0.08

0.06 0.06

percent 0.04 percent 0.04

0.02 0.02

0 0 0 2 4 6 8 10 0 2 4 6 8 10 maturity maturity

3. Cool! This captures some basic patterns; yields are upward sloping when lower, downward sloping when higher. The substantial risk premium I estimated to match the average upward slope does introduce a substantial deviation of the model from expectations at the long end. (1) () 4. I use the history of  ,andﬁnd the model-implied 

589 Simulation of Vasicek yield model

10 yields

0 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010

1−5 year zero coupon yields 15

10 yields

0 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 It looks pretty good! Again, the spread is higher when yield is lower. 5. But you can see it’s not perfect. In the model, the spread reflects how far yields are from the overall mean of about 6%. In the data, the spread reflects how far yields are from some sort of “nearby” mean. For example, the spread is off in 1973, 1981, and 1994. To see this more clearly,

590 Simulation of Vasicek yield spreads

1.5 1 0.5 0 −0.5

yield spread −1 −1.5 −2

1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010

1−5 year zero coupon yield spreads

yield spread −1

−2

1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 We really need a two, or three-factor model.

A two-factor Vasicek model: Make one factor move quickly (slope, business cycle), and make • the other one move slowly (level, inﬂation)

 (+1 )= ( )++1 − −  (+1 )= ( )+ − − +1 1 2 2 1 2 2   ln +1 =       ( + )   −2   − 2   − − +1 − +1

(1)  = log  (+1)=( + )  − ...here we go! This will give “level” and “slope” factors. (Or just interpret “x” above as a vector and “”asamatrix.)

What about returns?Wecanﬁnd the model for returns directly from the model for yields • and prices

(2) = (1) (2) +1 +1 −  (3) = (2) (3) +1 +1 − 

591 After some algebra23, we can write the answer as

(2) 1 2   = +   (36) +1 − 2  ³ ´ µ ¶ (3) 1 2 2   = (1 + ) +  (1 + )  +1 − 2  ³ ´ ∙ ¸ (4) 1 2 2 2 2   = 1+ +  + (1 +  +  )  +1 − 2  ³ ´ ∙ ³ ´ ¸

(2) (2)  =   +1 (37) +1 +1 − (3) ³ (2) ´  =   (1 + )+1 +1 +1 − (4) ³ (4) ´ 2  =   (1 +  +  )+1 +1 +1 − ³ ´ What does this mean?

1. Expected returns are constant over time. There is a constant risk premium which increases for longer-term bonds. See the risk premiums in yields, forwards and returns. 2. The sign and size of the risk premium depend on . 0 means long term bonds earn less than short term bonds, as above, and vice versa. The rather complex 2 terms are just numbers — they don’t vary over time. That’s the point. They are the same as the 2 terms in forward rates, which are also controlled by . 3. Returns follow a one-factor model. (Equations (37).) When there is a positive shock to interest rates +1  0, returns on all bonds suﬀer. Long term bond returns suﬀer more, by the (1 +  + ) factor.

Comments: risk premiums, returns, risk neutral probabilities. • 23

(1) =  (1) (1)   − −  − ³ ´ 1 (2) = 2 (1 + ) (1)  + +  2  − −  − 2  ³ ´ ³ 1´ (3) = 3 (1 +  + 2) (1)  + 1+(1+)2 +  (1 + (1 + )) 2  − −  − 2  ³ ´ n ¡ ¢ o (2) = (1) (2) +1 +1 −  1 (2) =  (1)  2 (1 + ) (1)  + +  2 +1 − − +1 − − − −  − 2  ³ ´ h ³ 1 ´ ³ ´ i (2) =  (1)  +(1+) (1)  +  2 +1 − +1 −  − − 2  (2) ³ (1) ´ ³ (1) ´ ³ 1 ´ 2  =     +(1+)   +   +1 −  −  − − 2  (2) (1)³ ´1 2 ³ ´ ³ ´  =  +   +   +1  − − 2  (2) (2) (1) ³ 1 ´ 2  =   = +   +1 +1 −  − 2  ³ ´ and so forth. Isn’t this fun?

592 1. Expected returns and betas are still here24!

() 1 2 () () () (1)   +  ( )= ( +1) = (  ) +1 2 +1 − +1 − +1 +1 ³ ´ Expected return is earned as a compensation for exposure to interest rate shocks. When interest rates rise, bond returns go down. The more a return is negative when interest rates rise, the greater the bond must pay in expected return, and the larger , the greater this eﬀect.  is the “market price of interest rate risk”. (The annoying 122 term comes from levels vs. logs and is usually quite small.) 2. We can write the risk premium in forward rates similarly

() 1 2 () (1) () (1)   +  (+1)= + 1  (+1+1) − 2 − − − ³ ´ ³ ´ If expected returns are higher on long-term bonds, 0, then forward rates are higher than expected future spot rates (plus the usual annoying 122 term.) 3. What does  mean? How much risk premium should we expect? What sign should it be? What economic forces drive the risk premium? This model helps us to organize our discussion about safety of long vs. short bonds. We started with

1 2 2 (1) (1) ln +1 = ∆+1 =    +1 =     − − − 2  − − +1 − +1 h i  tells you whether people are happier or unhappier when interest rates rise. If 0, that means people are happier (consumption rises) with higher interest rates. 0 means that people are unhappy (consumption declines) with higher interest rates. 4. Which is it? . (1) (a) My first guess: a lower interest rate +1means higher  (bad state), which means 0 sign and negative premium. This is a typical result — models like this tend to predict downward sloping yield curves, since “long term bonds are safer for long term investors.” Real yield curves (TIPS) do show this pattern, and yield curves did slope downward under the Gold standard. (b) Nominal yield curve data so far show the opposite pattern, with typically upward- sloping curves. 0 might make sense if we view high nominal rates as a sign of inflation — if the “level” factor corresponds to expected inflation, since inflation occurs in “bad times.” 24Why? Just plug in from (37),

2 (2) 2  +1 = 

2 ³ (3) ´ 2 2  +1 =(1+) 

2 ³ (4) ´ 2 2 2  +1 =(1+ +  )  ³ ´ (2) 2   +1 = 1 +1 −  ³ (3) ´ 2   +1 = (1 + ) +1 −  ³ (4) ´ 2 2   +1 = (1 +  +  )  +1 −  Then look at the two terms on the right of³ (36). ´

593 (c) News reports seem to think that lowering rates is good for the economy. How- ever, there is supply and demand. Higher rates are a sign of a humming economy (demand); lower rates might help an otherwise sluggish economy (supply). News reports always forget there is both supply and demand. (d)  is a free parameter, so you can set any risk premium you want. Practitioner uses of these models never even stop a moment to think if the parameters make economic sense. 5. Part of the risk premium stays even with  = 0, risk neutrality This is “the effect of convexity,” the fact that “Expectations hypothesis is not the same as risk neutrality.”  ()+ 1 2 It stems from ()=( )= 2 . You care about actual returns ,notlog returns . This is another force for typical downward slope of many term structure models. However, with  =0the122 terms are typically very small, so there is in practice little difference between expectations and risk neutrality. (At most bond returns have 10% annual volatility, so 122 =01022=0005 = 05%) 6. The risk premium is constant over time — not like Fama-Bliss or Cochrane Piazzesi. To get a time-varying risk premium, we have to add time-varying market price of risk, () . Then you would have  +1 varying over time with . Cochrane and Piazzesi “Decomposing the yield curve”³ shows´ you how to do this. 7. “Risk neutral probabilities” You will hear about these in fixed income and options classes. Here’s the idea. Look at the one-factor model we wound up with,

(1) (1)   =    + +1 +1 −  − ³ ´ ³ ´ 1  (2) =  +  (1)  +  2   − − 2  ³ ´ µ ¶ Now, suppose you write a new model with a diﬀerent mean

(1) (1)  ∗ =   ∗ + +1 +1 −  − ³ ´ ³ ´ 2  ∗ =   − 1  − but you set  = 0. You’d get exactly the same forward rate, prices, yields.

(2) (1) 1 2  ∗ =   ∗ ( +0)  −  − − 2  ³ ´ ³ ´

2 2 1  (2)    =  (1)    2  − − 1   − − 1  − 2  Ã − ! Ã Ã − !! µ ¶ 2 2 1  (2)  =  (1)    +    2   1  1  2  − − − − µ ¶ ³ ´ 1− −  (2)  =  (1)  +  2  −  − − 2  ³ ´ µ ¶ Thus we get the same answer if we “transform to risk-neutral probabilities which alter the drift, and then use risk-neutral valuation formula” We can write the whole model

594 this way.

(1) (1)  ∗ =   ∗ + +1 +1 −  − ³ (2) ´ ³ (1) ´ 1 2  ∗ =   ∗ ( )  −  − − 2  ³ ´ ³ ´ (3) 2 (1) 1 2 2  ∗ =   ∗ (1 + )   −  − − 2  ³ ´ ³ ´ ∙ ¸ (4) 3 (1) 1 2 2 2  ∗ =   ∗ 1+ +    −  − − 2  ³ ´ ³ ´ ∙ ³ ´ ¸ The advantage of this approach is that we get prettier formulas that hide .It’salso algebraically easier once you find . The disadvantage: we can’t directly describe the (1) (1) actual dynamics. We can’t fitthefirst equation by running a regression of +1 on  . This approach is common in interpolation or option pricing exercises in which you don’t particularly care about the dynamics, or in which you brazenly fitdifferent (inconsistent) dynamics at different dates in order to fit the cross section better.

Additional comments on model construction . • 1. Why do we need a more complex model? Why can’t we just use an AR(1) for ?Ifwe need more, why not an ARMA model? It turns out this time series model is an extremely useful model for many uses in ﬁnance and a natural for when an AR(1) isn’t enough. If youhaveanAR(1), +1 =  + +1 then you know how to ﬁnd its forecasts,

 + =  

Here’s what an AR(1) and its forecast look like

ar(1) and its forecast 6

−1

−2

−3 0 10 20 30 40 50 60 70 80 90 100 You can think of our model as an AR(1) plus some noise,

+1 =  + +1

+1 =  + +1

595 Here’s what that looks like

ar(1) plus noise 15

−5

−10

−15 0 20 40 60 80 100 120 140 160 180 200 This captures a lot of things we do.  captures an expected value that moves slowly over time, and then  is the outcome. For example, we thought that expected returns move slowly over time in a way captured by d/p,

+1 =() + +1

()+1 = () + +1 Here’s the key: we can still easily compute expected values in this model.

+1 =  + +1

+1 =  + +1

(+1)=

(+2)= 2 (+3)=  and so on. As in the expectations model, we’re going to need to calculate these. Note  that we don’t have + =  .The series is very jumpy. We could not model a series that is both very jumpy and has a slow-moving mean with an AR(1). This model turns out to be equivalent to an ARMA(1,1) with roots that nearly cancel, but this representation of it (in terms of x and y) is much easier to deal with .

2. Why did we need logs? If we use an AR(1) or other time series model for ,thenwesee    0 sometimes, and we know (+1)−  0. Equivalently, the point is to keep out arbitrage opportunities “arbitrage-free term structure model.” There is a deep theorem: A set of prices p and payoﬀs x don’t have arbitrage opportunities if and only if they can be represented by some positive +1  0and = (+1+1). Thus, we model log , this keeps 0. It also anticipates that we want answers in terms of log prices, yields, etc. 3. I could have written the model without the mysterious sas (1) (1)   =    + +1 +1 −  − ³ ´ ³1 2 2 ´ (1) ln +1 =    +1 −2  −  −

596 “a short rate process plus a market price of risk.” Then, I would have checked that the (1) (1)  the model produces is the same  I started with, i.e.

(1) 1 2 2 (1) (1)    +1  = ln  (+1)= ln  − 2 − − =  − − µ ¶ Take your pick — is this more or less confusing than starting with an  you “can’t see” (1) and then showing that it turns out to be  ?

30.1.2 A tour of term structure models.

1. What do we do next? Just more complex time-series model for +1 in order to ﬁt the data better! The algebra gets worse, but you don’t do anything at all diﬀerent conceptually.

2. Multi-factor models, with “level shifts” that move the overall mean (inﬂation?), plus “slope shifts” when yield is temporarily above or below that overall mean. Something like

  = ⎡  ⎤  ⎢ ⎥ ⎣ ⎦ +1 =  +  + +1; +1 (0) ∼ N The discount factor is related to these state variables by 1  =exp       +1 0  2 0 0 +1 µ− − − ¶ () ( 1)  =  +1+1− This is exactly what we just did, but with vectors³ and matrices.´

3. Time-varying risk premia. How do we match Fama-Bliss, CP evidence for time-varying risk premiums? Well, we saw

() 1 2 () () (1)  +  ( )= (  ) +1 2 +1 − +1 +1 () If you want +1 to vary over time, you need  to vary over time. An example (this is a scalar version of the CP “decomposing the yield curve.”)

(1) (1) 2  =  +  + +1; +1 (0 ) (38) +1  ∼ N 

(1) 1 2 2 ln +1 =    +1 (39) −  − 2   − (1)  = 0 + 1 

(1) If  rises, then  rises, and then +1 is multiplied by a bigger number. We ﬁnd bond prices as usual:

(1)  = (+1) () ( 1)  =  +1+1− ³ ´ 597 4. Time-varying volatility matches the fact that interest rates are sometimes more, and sometimes less volatile. It lets you capture changing option prices when volatility rises and falls

 =  1 + √ − 1 2 ln +1 =   √+1 −2  − − Now, when  varies, interest rates also become more or less volatile. (“Discrete time Cox Ingersoll Ross”)

5. Continuous time. It’s usually easier to write all this stuﬀ in continuous time.

 = ( ) +  −  =     − −   () =  +    µ  ¶ But it’s the same idea.

6. Don’t lose the big picture: Again, it’s just a time series model for  then take expectations.

(a) Art: what are good models that ﬁt the data well and are used in practice? Algebra:

cranking out these 0 (it’s easier in continuous time) (b) ....See Veronesi/Heaton!

30.1.3 How would you use these models?

1. Price excluded maturities, “interpolate the yield curve.” Fit the model ( 2)todataon (1) (3) (4) (2) (2)    , ﬁgure out what  should be. Is  underpriced? Hence, price coupon bonds. 2. Price an option.

(1) ¯ (a) Example: A cap. Pays a stream of “dividends”  1,exceptitpaysonly 1if (1) ¯ − −    .

(1) (1)  =  +1 max  1 ¯ 1 + +1+2 max  1 ¯ 1 +   − − +1 − − h ³ ´ ³ í (Interest rates are lognormal, so each one is a mini Black-Scholes call option.) (b) Example 2: What is the option to refinance a 30 year fixed mortgage worth? How much lower would the rate be if you didn’t have the option to refinance? You often can’t find pretty formulas for this sort of stuff, but it’s easy enough to program a computer.

3. Risk management, hedge movements in yields. My name is China. I hold 800 billion long U.S. Treasuries. How many interest rate swaps do I need to avoid interest rate risk? Multifactor models are better than simple duration or convexity hedges.

598 30.2 Factor model review.

1. Bond yields, changes in yields, returns, etc. are often summarized by factor models,inwhich movements in all  bonds are summarized by movements in a few factors,

() () () ()  =  level +  slope +  curve (+error)  1 × 2 × 3 × 2. Level, slope and curve are themselves special linear combinations of yields.

3. The eigenvalue decomposition of the yield covariance matrix gives you a very easy way to ﬁt (1) (2) 0 a factor model. If  =    , h i Σ = ( 0); Σ = Λ0

Then  = 0 lets you construct factors, and  =  is the factor model! Directions: 1) Sigma = cov(yields) 2) [Q,L]=eig(Sigma) 3) Plot Q 4) interpret.

4. This is just like FF3F regressions for stocks, 0 produces the orthogonalized factors (hml, smb) for you.

5. “Term structure models” CIR, etc. are just a speciﬁcation of a time series process for a discount factor, like  =  1 +  − and then (1) (2) (1)  = (+1);  = (+1+1) etc. (This is the same thing as a “short rate process” plus a “market price of risk.”) Since the time series process for the discount factor runs oﬀ a few “factors” ,theyproduce factor models for yields.

6. The “single-factor Vasicek model” is

(a) (1)  is the “single factor,” 1 2  are the “loadings” which measure  − ³how much´ each forward moves whenh the factor moves.i 2 (1) (1) (b)    = + 1 is the expectations hypothesis component, and the rest is the − − risk³ premium.´

599 (c) This model is formed from the assumption

+1  =  ( )++1 − − 1 2 2 ln +1 =    +1 −2  − − () ( 1)  =  +1+1− h i 7. Returns and yields in the single-factor Vasicek model also follow “one-factor” models

(a) Yields

(1 + ) 1 1 (2)  = (1)  +  2  − 2  − − 2 2  ³ ´ µ ¶ (1 +  + 2) 1 1 (3)  = (1)  1+(1+)2 +  [1 + (1 + )] 2  − 3  − − 3 2  ³ ´ ½ h i ¾ (b) Returns

1  (2) = +  2  +1 2  − µ ¶ ³ ´ 1  (3) = (1 + )2 + (1 + ) 2  +1 2  − ∙ ¸ ³ ´ 2 (3) 1 2 2 2   = 1+ +  + (1 +  +  )  +1 − 2  ³ ´ ∙ ³ ´ ¸

(2) (2)  =   +1 +1 +1 − (3) ³ (2) ´  =   (1 + )+1 +1 +1 − (4) ³ (4) ´ 2  =   (1 +  +  )+1 +1 +1 − ³ ´ 30.3 Appendix to factor model notes

(This is highly optional and pretty heavy algebra. It’s just a reference for the curious).

30.3.1 Eigenvalue-based factor/principal components models

(This is a background reference only, for people who know the math and want to dig deeper. You have to know what an eigenvalue is) Summary of the procedure Given an  1 vector of random variables ,with( )=Σ,weformΛ = ()by × 0 0 the eigenvalue decomposition. If  is a   matrix of data on , then [Q,L] = eig(cov(y)) in × matlab. Λ is diagonal and  is orthonormal, 0 = 0 = .

We form “factors” by  = 0The columns of  thus express how to construct factors  from the data on .

600 (1) (2) We can then write  = ,i.e.  = 1 + 2 + where 1 = (: 1) denotes the ﬁrst column of .Thecolumnsof thus also give “loadings” that describe how each  moves if one of the factors  moves.

We have ( 0)=0Λ0 = Λ,i.e.the are uncorrelated with each other. If some of the diagonals Λ are zero, then we express all movements in  by reference to only a few underlying factors. For example, if only the first Λ is nonzero, then we can express  = 11 In practice, we often find that many of the diagonals of Λ are very small, so setting them to zero and fitting  with only a few factors leads to an excellent approximation. Since the factors are uncorrelated, if we ignore some factors, the loadings on the remaining ones are the same as if we ran regressions,

(1) (2) (3)  = 1 + 2 +[3 ] (1) (2)  = 1 + 2 + 

Derivation Factors constructed in this way solve in turn the question “what linear combinations of  has maximum variance, subject to the constraint that the sum of squared weights is one and each linear combination is orthogonal to the previous ones?” In equations, each column  of  satisﬁes

max  0 s.t. 0 =1,0 =01 £ ¡ ¢¤ Why is this an interesting question? “What linear combination 0 of the  has the highest variance?” would be too easy a question. Just make  big. The constraint 0 =1meansthesum of squared  must be equal to one, so you can’t boost variance by making  big. You have to ﬁnd the right pattern of  across the elements of . Why is this the answer? Let’s look at the ﬁrst maximization — what linear combination of the  has the largest variance, if you constrain the sum of squared weights to be less than one?

max (0)s.t.0 =1  { } Forming a Lagrangian,  = 0Σ 0 − The ﬁrst order condition ()is Σ =  This is an eigenvalue problem! The answer to this question is then, choose  as an eigenvector of the matrix Σ. Now, which one? Let’s see what variance of  we get out of all this

()=(0)=0Σ = 0 = 0

Aha! The eigenvalue gives us the the variance of . So, the answer to our maximization is, choose the eigenvector  corresponding to the largest eigvenvalue . In sum, to ﬁnd the linear combination of  with largest variance, and sum of squared weights equal one, we choose as weights  the eigenvector corresponding to the largest eigenvalue of the covariance matrix of .  is a matrix of eigenvectors, so you’re done. Eigenvectors are orthogonal

601 0 = 0 so the eigenvectors corresponding to successively smaller eigenvalues answer the question for the remaining factors. Now, what does this have to do with 2? Suppose we leave out some factors

(1) (2) (3)  = 1 + 2 +[3 ]

Since (1) and (2) have maximum variance, this means (3) and beyond have minimum variance. In short we have found a factor model that for each choice of how many factors to use maximizes the 2 in these regressions. Rotation There are lots of equivalent ways to write any factor model. If we have

 =  then if  is any orthogonal (rotation) matrix, i.e. any matrix with 0 = 0 = ,wecandeﬁne new factors  = 0.Then = 

 = ()=()

Now the columns of  give us new loadings on the new factors, which are constructed by  = 0 = 00 =()0  This works even better if we use unit variance shocks.

1 1 1  = Λ 2 Λ− 2  = Λ 2  ³ 1 ´³1 ´ ³ ´ (0)=Λ− 2 ΛΛ− 2 = 

Now if we rotate,  =  = 0 we have 1 1  = Λ 2  = Λ 2 0 ³ ´ ³ ´ (0)=(00)=0 =  In sum, if we rotate unit-variance factors, they are still uncorrelated with each other and still have unit variance. You’re free to recombine factors in any way you want to make them look pretty.

30.3.2 How do you really solve the one-factor Vasicek model?

Guess ()  =   ( )  − −

602 then

() ( 1)  =  +1+1−

³ 1´ 2 2   ( )=log exp    +1 exp ( 1  1 (+1 )) 2 − − − − µ µ− − − ¶ − − ¶ 1 2 2 =log exp    +1 +  1  1 ( )  1+1 2 − − − µ µ− − − − − − ¶¶ 1 2 2 =log exp  +  1 (1 +  1)( )   +1  1+1 − − 2 − µ µ− − − − − − ¶¶ 1 2 2   ( )=  +  1 (1 +  1)( )+  1 +  1  − − − 2 − − − − − µ − ¶

The constant and the term multiplying  must separately be equal. Thus,

 =1+ 1 −

1 2 2  =  +  1 +  1 +  1  − − 2 − µ − ¶ That’s easy to solve

0 =0

1 =1

2 =1+ 2 3 =1+ +   1  − 1   =   = −  1  =0 X −

0 =0

1 =  − 1  = 2 +  + 2 2 2  − µ ¶ 1  = 3 + (1 +  )  + (1 +  )2 2 3 2  − ∙ ¸ 2 2 1 2 2 4 = 4 + 1+ +   + 1+ +   − 2  ∙³ ´ ³ ´ ¸ Or just 1 2 2  =  +  1 +  1  − 2 − ∙ − ¸ You see the pattern from here

603