Pricing, Hedging and Testing Risky Assets in Financial Markets

by

Yu Ren

A thesis submitted to the Department of Economics

in conformity with the requirements for

the degree of Doctor of Philosophy

Queen’s University

Kingston, Ontario, Canada

June, 2008

Copyright © Yu Ren, 2008

ISBN: 978-0-494-44600-3


Abstract

State price density (SPD) and stochastic discount factor (SDF) are important elements in asset pricing. In this thesis, I first propose to use projection pursuit regression (PPR) and local polynomial regression (LPR) to estimate the SPD of interest rates nonparametrically. By using a similar approach, I also estimate the delta values of interest rate options and discuss how to delta-hedge these options.

Unlike the SPD, which is measured in a risk-neutral economy, the SDF is implied by an asset pricing model. It displays which prices are reasonable given the returns in the current period. Hansen and Jagannathan (1997) develop the Hansen-Jagannathan distance (HJ-distance) to measure the pricing errors produced by an SDF. While the HJ-distance has several desirable properties, Ahn and Gadarowski (2004) find that the specification test based on the HJ-distance overrejects correct models too severely in commonly used sample sizes to provide a valid test. This thesis proposes to improve the finite sample properties of the HJ-distance test by applying the shrinkage method (Ledoit and Wolf, 2003) to compute its weighting matrix.

Co-Authorship

Chapter 5 is co-authored with Professor Katsumi Shimotsu at Queen’s University.

Dedication

To my family for their encouragement, understanding and unstinting belief.

Acknowledgements

I am grateful to Katsumi Shimotsu for his guidance. I would like to thank Susumu Imai, James MacKinnon, Chris Ferrall, Frank Milne and Allan Gregory for helpful discussions and comments. I also benefited from the annual meetings of the Canadian Economics Association (2006), the Canadian Econometric Study Group (2007) and the Midwest Econometric Group (2007).


Contents

Abstract

Co-Authorship

Dedication

Acknowledgements

List of Tables

List of Figures

Chapter 1 Introduction

Chapter 2 Literature Review

Chapter 3 Estimation of State Price Densities for Interest Rate Options

3.1 Construction of the Estimator

3.1.1 Financial Market Theory

3.1.2 Nonparametric Estimation

3.1.3 Asymptotic Properties

3.1.4 Implementing the model

3.2 Simulations

3.2.1 Learning the Black-Scholes

3.2.2 Learning the CIR

3.3 Application to the Interest Rate Options

3.3.1 Data

3.3.2 Accuracy of SPD

3.3.3 Pricing Interest Rate Options

3.3.4 Dynamics of SPDs

3.3.5 Economic VAR

3.3.6 In-Sample and Out-of-Sample Price Forecasts

3.4 Conclusion

Chapter 4 Delta Hedging on Interest Rate Options

4.1 Delta Hedging Strategy

4.1.1 Delta Hedging Stocks

4.1.2 Delta Hedging Interest Rate Options

4.2 Delta Estimation

4.2.1 My Model

4.2.2 Alternative Models

4.3 Measuring Hedging Performance

4.4 Simulations

4.4.1 Stochastic Process for Underlying Values

4.4.2 Data Simulation

4.4.3 Estimation and Hedging Performance

4.5 Empirical Analysis

4.5.1 Data

4.5.2 Hedging Interest Rate Options

4.6 Conclusion

Chapter 5 Hansen-Jagannathan Distance Test

5.1 Hansen-Jagannathan distance

5.2 Finite sample properties of the HJ-distance test

5.2.1 Simulation design

5.2.2 Simulation results

5.3 Improved estimation of covariance matrix by shrinkage

5.3.1 Shrinkage method and the HJ-distance

5.3.2 Optimal shrinkage intensity

5.3.3 Estimation of the optimal shrinkage intensity

5.4 Simulation results with the shrinkage method

5.5 Conclusion

Chapter 6 Conclusion

Bibliography

Appendix A

A.1 Mathematical details for Chapter 2 and Chapter 3

A.1.1 Framework for Local Polynomial Regression

A.1.2 Proofs

A.1.3 Bootstrap the confidence interval

A.2 Model Specification for Chapter 3

A.2.1 Simple model

A.2.2 Fama-French and Premium-Labor models

Appendix B Figures

List of Tables

3.1 Mean squared error (in percentage) between the true values and the estimators (In Sample)

3.2 Mean squared error (in percentage) between the true values and the estimators (Out of Sample)

3.3 Mean squared error (in percentage) between the true values and the estimators (In Sample)

3.4 Mean squared error (in percentage) between the true values and the estimators (Out of Sample)

3.5 Statistical Properties of the Interest Rate Option (IRX) in 1993

3.6 Estimation Error of SPDs

3.7 Mean Squared Error (in percentage)

3.8 Summary of IRX with its estimated prices

3.9 Statistical properties of SPD estimators

3.10 Notations of CDFs

3.11 Hypotheses

3.12 E-VAR (I)

3.13 E-VAR (II)

3.14 MSE (in percentage) for estimated call option prices

4.1 Ratios between Sum of Squared Estimation Errors and Sum of Squared True Values of Delta

4.2 Mean Square Hedging Error: with Interest Rate Options

4.3 Delta of Interest Rate Options

4.4 Mean Square Delta Hedging Error for 21 days in 1993

4.5 Mean Square Delta Hedging Error for 42 days in 1993

4.6 Mean Square Delta Hedging Error for 63 days in 1993

5.1 Rejection frequencies of the specification test using the HJ-distance

5.2 Rejection frequencies of the specification test using the HJ-distance with the exact weighting matrix G

5.3 Rejection frequencies of the specification test using the HJ-distance with shrinkage estimation of G

5.4 Summary statistics of $\hat{\alpha}$

5.5 The mean squared error of the HJ-distance from two estimation methods of G

5.6 Rejection frequencies of the HJ-distance test using five factors to estimate G

5.7 Rejection frequencies of the HJ-distance test using two factors to estimate G

List of Figures

B.1 Pricing error for out-of-the-money interest rate call options

B.2 Pricing error for at-the-money interest rate call options

B.3 Pricing error for in-the-money interest rate call options

B.4 Pricing error for out-of-the-money interest rate put options

B.5 Pricing error for at-the-money interest rate put options

B.6 Pricing error for in-the-money interest rate put options

B.7 Forecasting error for out-of-the-money interest rate call options

B.8 Forecasting error for at-the-money interest rate call options

B.9 Forecasting error for in-the-money interest rate call options

B.10 SPDs of the interest rate

B.11 CDFs of F11 and F21

B.12 Estimates of η

B.13 The density functions for $\hat{\alpha}$ in the Simple Model of 100 Portfolios

B.14 The density functions for $\hat{\alpha}$ in a misspecified Simple Model of 100 Portfolios

Chapter 1

Introduction

The state price density (SPD) is an important element in asset pricing. It is also called the pricing kernel or the equivalent martingale measure. It is the probability density function of a risky asset in a risk-neutral economy. The estimation of SPDs has attracted increasing attention in recent years because it is very useful in pricing financial derivatives, the financial instruments whose values depend on other risky assets. Once the SPD of the underlying risky asset is known, a derivative can be priced as its discounted expected payoff under the SPD. Since SPDs are defined in a risk-neutral economy, they cannot be observed in the market and have to be estimated.

In this thesis, I estimate the SPD of interest rates. I choose this topic because few factors affect investments more than interest rates. Two of the most closely watched are the benchmark rates on short-term and long-term United States Treasury securities, which reflect changes in general economic conditions, inflationary expectations, monetary and fiscal policies and the value of the U.S. dollar. Other interest rates, including bank prime lending rates, home mortgage rates and corporate and rates, tend to respond to trends in the Treasury markets. Interest rate derivatives are widely used as efficient tools for hedging and speculating against changes in interest rates. Today, interest rate markets are among the largest and most liquid state-contingent security markets, with daily trading volumes of trillions of U.S. dollars. Therefore, the accurate and efficient pricing of interest rate derivatives is of enormous practical importance. In practice, interest rate derivatives can be designed to be very complicated. If the SPDs of the interest rates were known, the prices of these derivatives would not be so elusive.

Breeden and Litzenberger (1978) proved that the SPD of any risky asset is equal to the second order partial derivative of the price of a European style call option written on this asset with respect to the strike price. So my problem reduces to the estimation of these second order partial derivatives. Usually the prices of European style options are affected by five factors: the current underlying value, the strike price, the time-to-maturity, the discount rate and the volatility. When the discount rate has little impact on the prices and the volatility is regarded as a function of the current underlying value, I can reasonably ignore these two factors in nonparametric estimation.

Even for a three-dimensional nonparametric regression, the Nadaraya-Watson kernel estimator, the most popular nonparametric estimator, still needs a large sample to provide reliable estimates. Of course, I can collect more data to offset the negative effect of the high dimension. But a large sample usually requires a long time span, and a long time span increases the chance of structural breaks in the sample. Here, I propose to use projection pursuit regression (PPR) to alleviate the problem of high-dimensional estimation. The basic idea behind this technique is to first project the high-dimensional regressors onto a low-dimensional space and then run a low-dimensional nonparametric regression. This places emphasis on lower dimensional features, which can be estimated accurately with relatively low variance. Another nice feature of PPR is that the estimation process can be iterated until the improvement becomes small.

Since the second order partial derivatives of a nonparametric estimator contain a large bias, I adopt local polynomial regression (LPR) to estimate them instead of differentiating the kernel estimator of the option prices directly. Unlike the Nadaraya-Watson kernel estimator, which is basically a local constant estimator, LPR fits the data locally with a polynomial function. The order of this polynomial depends on the order of the derivative to be estimated, and the polynomial allows much more variability in the shape of the estimator. This flexibility makes the estimators more reliable and accurate than the Nadaraya-Watson kernel estimators. In this thesis, I combine PPR with LPR to estimate SPDs from interest rate options.

I use a linear combination to project the high-dimensional factors into $\mathbb{R}^1$, and then run a univariate nonparametric regression by LPR to approximate the option prices and estimate the derivatives of the univariate function. Once this has been accomplished, SPDs can be obtained through the chain rule. The main results are based on simulated data and on the short-term interest rate options (IRX) traded on the Chicago Board Options Exchange (CBOE). For the purpose of comparison, I also report the results obtained with the model in Aït-Sahalia and Lo (1998), a representative multivariate nonparametric model. The findings reveal that the estimators proposed in this thesis are superior to this alternative.

By using a similar approach, I go further and estimate the first order partial derivatives of interest rate option prices with respect to the underlying values, which are called delta in finance. The delta hedging strategy is frequently used in financial markets. It reduces the investment risk that comes from fluctuations of the underlying value; hedging simply means making this risk as low as possible. Basically, investors construct a portfolio that is long in the target financial derivative and short in the hedging tool. The amount of the hedging tool is equal to the ratio of the delta values of the two instruments in this portfolio. If the portfolio is formed this way, its net delta is zero and its price is more stable than that of either instrument alone. Usually, for financial derivatives written on stocks, indexes or other assets traded directly in the market, the hedging tool can be the underlying risky asset, because the delta of the risky asset is always equal to 1. So the hedging ratio depends only on the delta value of the target derivative. However, for interest rate options, I need to modify the delta hedging strategy a little because interest rates are not directly traded in the market. Naturally, interest rate options themselves are good candidates for the hedging tool. So I use one interest rate option to delta-hedge another interest rate option. Compared with the traditional hedging strategy, this method has two advantages. First, there is no strict requirement on the amount of money: I can hedge with an amount comparable to the value being hedged. Second, the hedging can be performed in the same market, which eliminates certain information lags. However, unlike delta hedging with stock options, I have to estimate two delta values in order to obtain the hedging ratio, since the denominator of the delta ratio is no longer equal to 1. This drives me to find a more accurate model to estimate the delta of interest rate options.
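To make the hedging ratio concrete, the following is a minimal Python sketch under a first-order (delta) approximation. The function names `hedge_ratio` and `portfolio_delta` and the delta values are hypothetical and chosen only for illustration; they are not estimates from this thesis.

```python
def hedge_ratio(delta_target: float, delta_hedge: float) -> float:
    """Units of the hedging tool to short per unit of the target held long."""
    if delta_hedge == 0:
        raise ValueError("the hedging tool must have a non-zero delta")
    return delta_target / delta_hedge


def portfolio_delta(delta_target: float, delta_hedge: float, units_short: float) -> float:
    """Net delta of a portfolio long one target and short `units_short` hedging tools."""
    return delta_target - units_short * delta_hedge


# Hypothetical deltas of two interest rate options written on the same rate.
delta_a, delta_b = 0.5, 0.25
n = hedge_ratio(delta_a, delta_b)            # options B to short per option A
net = portfolio_delta(delta_a, delta_b, n)   # net delta of the hedged portfolio

# When the hedging tool is the underlying asset itself, its delta is 1 and
# the ratio reduces to the delta of the target derivative.
n_stock = hedge_ratio(0.7, 1.0)
```

With an option as the hedging tool, both the numerator and the denominator of the ratio have to be estimated, so an error in either delta feeds directly into the hedge.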

With some modification, the model used in the estimation of the SPD can be employed to estimate the delta values of interest rate options. I also try the PPR model of Hutchinson et al. (1994) and the Nadaraya-Watson kernel model. Hutchinson et al. (1994) use PPR in pricing index options, but they do not estimate the delta through LPR; they obtain the delta values by differentiating the kernel function directly. By comparing my method with these two alternatives, I can highlight the roles of PPR and LPR in the estimation. The comparisons in the simulations and the empirical analysis suggest that the delta hedging strategy is more attractive when my model is used to estimate the delta values.

The SPD prices financial derivatives through probabilities in the risk-neutral economy. The stochastic discount factor (SDF) describes portfolio prices from another angle: the prices of financial derivatives can be represented as the expectation of the product of their payoffs and the SDF. Usually, the SDF is implied by a factor model. If the model were the true data generating process (DGP) of returns, the SDF could price the returns perfectly. In reality, asset pricing models are at best approximations, which implies that in general no SDF can price portfolios perfectly. Therefore, it is important to construct a measure of the pricing errors produced by SDFs so that we are able to compare and evaluate them. For this purpose, Hansen and Jagannathan (1997) develop the Hansen-Jagannathan distance (HJ-distance). This measure is a quadratic form of the pricing errors weighted by the inverse of the second moment matrix of returns. Intuitively, the HJ-distance equals the maximum pricing error generated by a model for portfolios with unit second moment. It is also the least-squares distance between a stochastic discount factor and the family of SDFs that price portfolios correctly.
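In symbols, the verbal definition above can be sketched as follows. This is the standard formulation; the notation ($m_t$ for the SDF, $R_t$ for the vector of $N$ gross returns, $1_N$ for a vector of ones, $g$ for the pricing errors, $G$ for the second moment matrix and $x$ for portfolio weights) is assumed here for illustration and may differ from the notation used in Chapter 5.

```latex
g = E[m_t R_t] - 1_N, \qquad
G = E[R_t R_t'], \qquad
\delta = \sqrt{g' G^{-1} g}
       = \max_{x:\; E[(x'R_t)^2] = 1} \left| x' g \right| .
```

The last equality is the "maximum pricing error for portfolios with unit second moment" interpretation: $x'g$ is the pricing error of the portfolio with weights $x$, and $E[(x'R_t)^2] = x'Gx$ is its second moment.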

The HJ-distance has several desirable properties in comparison to the J-statistic of Hansen (1982). First of all, it does not reward variability of SDFs. The weighting matrix used in the HJ-distance is the inverse of the second moment matrix of portfolio returns and is independent of pricing errors, while the Hansen statistic uses the inverse of the second moment of the pricing errors as the weighting matrix and rewards models with high variability of pricing errors. Second, as Jagannathan and Wang (1996) point out, the weighting matrix of the HJ-distance remains the same across various pricing models, which makes it possible to compare the performance of competing SDFs by the relative values of their HJ-distances. Unlike the Hansen statistic, the HJ-distance does not follow a chi-squared distribution asymptotically. Instead, Jagannathan and Wang (1996) show that, for linear factor models, the HJ-distance is asymptotically distributed as a weighted chi-squared distribution. In addition, they suggest a simulation method to develop the empirical p-value of the HJ-distance statistic.

However, Ahn and Gadarowski (2004) find that the specification test based on the HJ-distance severely overrejects correct models in commonly used sample sizes, whereas the Hansen test only mildly overrejects correct models. Ahn and Gadarowski (2004) attribute this overrejection to poor estimation of the pricing error variance matrix, which occurs because the number of assets is large relative to the number of time-series observations. They report that the rejection probability reaches as high as 75% for a nominal 5% level test, demonstrating a serious need to improve the finite sample properties of the HJ-distance test.

In this thesis, I propose to improve the finite sample properties of the HJ-distance test via more accurate estimation of the weighting matrix, which is the inverse of the second moment matrix of portfolio returns. I justify my method by showing that poor estimation of the weighting matrix contributes significantly to the poor small sample performance of the HJ-distance test. When the exact second moment matrix is used, the rejection frequency becomes comparable to its nominal size.¹

Of course, the true covariance matrix is unknown. I employ the shrinkage method of Ledoit and Wolf (2003) to obtain a more accurate estimate of the covariance matrix. The basic idea behind shrinkage estimation is to take an optimally weighted average of the sample covariance matrix and the covariance matrix implied by a possibly misspecified structural model. The structural model provides a covariance matrix estimate that is biased but has a small estimation error because few parameters need to be estimated. The sample covariance matrix provides another estimate, which has a small bias but a large estimation error. Shrinkage estimation balances the trade-off between estimation error and bias by taking a weighted average of these two estimates.

¹ Jobson and Korkie (1980) also report poor performance of the sample covariance matrix as an estimate of the population covariance when the sample size is not large enough compared with the dimension of the portfolio.

In this shrinkage method, one needs to choose a structural model serving as the shrinkage target. Here, because testing a SDF is the purpose of the HJ-distance test, a natural choice of the structural model is the asset pricing model whose SDF is tested by the HJ-distance test. The optimally weighted average is constructed by minimizing the distance between the weighted covariance matrix and the true covariance matrix, and the optimal weight can be estimated consistently from the data.

I allow for both possibilities: the target model may be correctly specified or misspecified. In the former case, the shrinkage target is asymptotically unbiased. In the latter case, the shrinkage target is biased, but the estimated weight on the shrinkage target converges to zero in probability as the sample size tends to infinity. Therefore, the proposed covariance matrix estimate is consistent in both cases.
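As a rough illustration of the weighted average behind shrinkage estimation, consider the following Python sketch. The function name `shrink`, the 2 x 2 matrices and the weight `alpha = 0.5` are hypothetical. In particular, the weight is simply fixed here, whereas the method proposed in this thesis estimates the optimal weight from the data.

```python
def shrink(sample, target, alpha):
    """Return alpha * target + (1 - alpha) * sample, element by element.

    `sample` is the sample covariance matrix, `target` is the covariance
    matrix implied by a structural model, and `alpha` in [0, 1] is the
    weight placed on the target (the shrinkage intensity).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * f + (1.0 - alpha) * s for s, f in zip(s_row, f_row)]
        for s_row, f_row in zip(sample, target)
    ]


# Hypothetical inputs: a noisy sample covariance matrix and a structured
# target (here a diagonal matrix, chosen only for illustration).
sample_cov = [[4.0, 2.0], [2.0, 9.0]]
target_cov = [[4.0, 0.0], [0.0, 9.0]]

shrunk = shrink(sample_cov, target_cov, alpha=0.5)
```

Setting `alpha = 0` returns the sample covariance matrix and `alpha = 1` returns the target, so a weight that converges to zero under misspecification leaves the sample estimate in the limit.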

Using this covariance matrix estimate greatly improves the finite sample performance of the HJ-distance test. I use data sets similar to those of Ahn and Gadarowski (2004). With 25 portfolios, the rejection frequencies are close to the nominal size even for a sample size of 160. With 100 portfolios, the rejection frequency is sometimes far from the nominal size, but it is much closer than when the sample covariance matrix is used.

The layout of the thesis is as follows. In Chapter 2, I review the literature relating to the state price density and the stochastic discount factor. In Chapter 3, I propose a new model to estimate the SPD of interest rates. In Chapter 4, I discuss the delta-hedging strategy on interest rate options. In Chapter 5, I propose a method to improve the finite sample performance of the HJ-distance test on SDFs. Chapter 6 concludes.

Chapter 2

Literature Review

The first research on state price densities dates back to Breeden and Litzenberger (1978), who proved that the SPD of any risky asset is equal to the second order partial derivative of the price of a European style call option¹ written on this asset with respect to the corresponding strike price. Given this breakthrough, research on the estimation of SPDs from option prices has been very active in recent years. Several dozen methods have appeared in the recent literature and, in general, they fall into two categories: parametric and nonparametric methods. Parametric methods make strong assumptions on the dynamics of the underlying asset and derive an explicit formula for the option prices based on those dynamics. SPDs can then be obtained by taking the partial derivatives of the pricing formula, as exemplified in Merton (1973), Vasicek (1977), Black and Scholes (1978), Courtadon (1982), Cox, Ingersoll and Ross (1985), Duffie and Kan (1996), Aït-Sahalia (1996), Chacko and Das (2002) and Jarrow, Li and Zhao (2007). Yet these kinds of parametric methods are not always the most reliable; they are valid only when the assumed motion of the underlying asset correctly describes the data within a certain time period, and they suffer gravely from misspecification. Given these difficulties, many have turned to the other category, nonparametric models. Nonparametric models do not require any assumption on the process of the underlying asset, a convenience that saves a great deal of effort in describing the motion of the underlying asset properly while, at the same time, avoiding the error caused by misspecification. Aït-Sahalia and Duarte (2003) and Yatchew and Härdle (2006) set up univariate nonparametric regressions to estimate the second order partial derivatives of the option prices with respect to the strike prices. Basically, they focused on one-day data, regarded the option price as a function of the strike price only, and obtained SPDs from the univariate regressions. Although their methods were successful for index options, they cannot be applied to interest rate option markets because, within one day, interest rate options do not have many observations across strike prices. So I have to focus on multivariate nonparametric regression models. Many recent papers discuss how to estimate option prices through multivariate nonparametric regression models, among them Hutchinson et al. (1994), Jackwerth and Rubinstein (1996), Ghysels et al. (1997), Aït-Sahalia and Lo (1998), Broadie et al. (2000a, b), Garcia and Gencay (2000), Cont (2001), Cont and da Fonseca (2002), Daglish (2003) and Härdle et al. (2002). But multivariate nonparametric methods are limited by the so-called "curse of dimensionality", which requires a large data set for high-dimensional estimation. Moreover, what I am really interested in is the second order derivatives of the option prices with respect to the strike prices, and the derivatives of a nonparametric estimator need not be good estimators of the derivatives of the function being estimated. If I take the partial derivatives directly from the kernel estimators, the estimated SPDs can be very misleading. Thus the problem of estimating SPDs of interest rates has not been solved properly in the literature. None of the existing methods can be adapted directly to my case, and I have to design another method to address the main difficulties in the estimation of SPDs for interest rates.

¹ Options are one kind of financial derivative, which give the buyer the right, but not the obligation, to engage in a future transaction on some underlying security at a specific price. European style options cannot be exercised until the expiry date, while American style options can be exercised at any time before the expiry date.

After I set up the model for the estimation of the SPD, I can use a similar approach to estimate the delta value for hedging. There is a large recent literature on hedging financial derivatives. Garcia and Gencay (2000) discuss pricing and hedging derivatives with a neural network and a homogeneity hint. Villaverde (2004) hedges European and Barrier options in a discrete time and discrete space setting by using stochastic optimization to minimize the mean downside hedge error under transaction costs. Clewlow and Hodges (1997) examine the problem of delta-hedging portfolios under transaction costs. Crepey (2004) compares the profit and loss arising from the delta-neutral dynamic hedging of options, using two different models to derive the values for the delta of the option. Bakshi and Kapadia (2003) investigate the correspondence between the volatility risk premium and the mean delta-hedged portfolio returns. Fink (2003) presents an alternative static hedging methodology, denoted the generalized static hedge, that appears to perform more reliably. Sercu and Wu (2000) compare the performances of various hedge ratios for three-month currency exposures, and find that the price-based hedge ratios generally perform better than the regression-based ones. Windcliff, Forsyth and Vetzal (2006) explore the valuation and hedging of discretely observed volatility derivatives using three different models for the price of the underlying asset: geometric Brownian motion with constant volatility, a volatility surface, and jump-diffusion. All the above papers discuss hedging derivatives on a stock or an index, and none of them mentions the hedging of interest rate derivatives. The special characteristics of interest rate derivatives prevent me from adopting a model from this literature directly.

As SPDs are not observable in the markets, I have to build a model to estimate them. Stochastic discount factors (SDFs), in contrast, are implied by asset pricing models, and researchers are mostly interested in testing the validity of an SDF in asset pricing. Therefore, the HJ-distance has already been applied widely in financial studies. Typically, when a new model is proposed, the HJ-distance is employed to compare the new model with alternative ones. The new model is supported if it offers small pricing errors. This type of comparison has been adopted in many recent papers. For instance, by using the HJ-distance, Jagannathan and Wang (1998) discuss cross sectional regression models; Kan and Zhang (1999) study asset pricing models when one of the proposed factors is in fact useless; Campbell and Cochrane (2000) explain why the CAPM and its extensions are better at approximating asset pricing models than the standard consumption-based asset pricing theory; Hodrick and Zhang (2001) evaluate the specification errors of several empirical asset pricing models that have been developed as potential improvements on the CAPM; Lettau and Ludvigson (2001) explain the cross section of average stock returns; Jagannathan and Wang (2002) compare the SDF method with the Beta method in estimating risk premium; Vassalou (2003) studies models that include a factor that captures news related to future Gross Domestic Product (GDP) growth; Jacobs and Wang (2004) investigate the importance of idiosyncratic consumption risk for the cross sectional variation in asset returns; Vassalou and Xing (2004) compute default measures for individual firms; Huang and Wu (2004) analyze the specifications of option pricing models based on time-changed Levy processes; and Parker and Julliard (2005) evaluate the consumption capital asset pricing model in which an asset’s expected return is determined by its equilibrium risk to consumption. Some other works test econometric specifications using the HJ-distance, including Bansal and Zhou (2002) and Shapiro (2002); Dittmar (2002) uses the HJ-distance to estimate nonlinear pricing kernels in which the risk factor is endogenously determined and preferences restrict the definition of the pricing kernel. This literature shows the importance of improving the HJ-distance test in finite samples.

Chapter 3

Estimation of State Price Densities for Interest Rate Options

This chapter describes a new model for estimating the state price density (SPD) of interest rates. Estimating the SPD of interest rates differs from estimating the SPD of other risky assets, mainly because the motion of interest rates has a mean reversion property. When rates rise above the mean, there is a drift to push them down, and when they fall below, there is a force that drives them back up. This characteristic makes it difficult to estimate the SPD of interest rates.

3.1 Construction of the Estimator

3.1.1 Financial Market Theory

The fair price of a European style call option is equal to the discounted expected payoff, $e^{-\gamma(T-t)}E^{*}[\max(S_T - E, 0)]$, where $t$ is the current date, $T$ is the maturity date, $S_T$ is the underlying value at $T$, $E$ is the strike price or exercise price¹ and $\gamma$ is the current discount rate. The expectation is calculated under the risk neutral density. If I write this expectation in integral form, I have

$$C_t(S_t, E, \tau, \gamma) = e^{-\gamma\tau}\int_0^{+\infty} \max(S_T - E, 0)\, f(S_T \mid S_t, \tau, \gamma, \delta)\, dS_T, \tag{3.1}$$

where $\tau = T - t$ and $\delta$ is the dividend rate. The SPD is defined to be $f(S_T \mid S_t, \tau, \gamma, \delta)$ in the above integral. It assigns probability densities to the various values of the asset at the time of expiration given the current asset price, the time to expiry, the current risk-free interest rate and the corresponding dividend yield of the asset.

¹ It is easy to distinguish the strike price $E$ from the expectation $E$.

Breeden and Litzenberger (1978) show that the SPDs can be derived by taking the second order partial derivatives of the call option prices with respect to the strike prices. Specifically,

$$f(S_T \mid S_t, \tau, \gamma, \delta) = e^{\gamma\tau} \left.\frac{\partial^2 C(S_t, E, \tau, \gamma, \delta)}{\partial E^2}\right|_{E = S_T}. \tag{3.2}$$
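The step from (3.1) to (3.2) can be sketched as follows, assuming that $f$ is continuous in $S_T$ so that the integral can be differentiated with respect to its lower limit. The payoff $\max(S_T - E, 0)$ vanishes for $S_T \le E$, so the integral in (3.1) effectively starts at $E$:

```latex
\begin{aligned}
C_t &= e^{-\gamma\tau}\int_E^{+\infty}(S_T - E)\, f(S_T \mid S_t,\tau,\gamma,\delta)\, dS_T, \\
\frac{\partial C_t}{\partial E} &= -e^{-\gamma\tau}\int_E^{+\infty} f(S_T \mid S_t,\tau,\gamma,\delta)\, dS_T, \\
\frac{\partial^2 C_t}{\partial E^2} &= e^{-\gamma\tau}\, f(E \mid S_t,\tau,\gamma,\delta).
\end{aligned}
```

In the first derivative the boundary term is zero because the integrand $(S_T - E)f$ vanishes at $S_T = E$. Multiplying the last line by $e^{\gamma\tau}$ and evaluating at $E = S_T$ gives (3.2).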

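For example, if the underlying asset follows a geometric Brownian motion, the call option price can be obtained by the Black-Scholes formula

$$C_t(S_t, E, \tau, \gamma) = S_t e^{-\delta\tau}\Phi(d_1) - E e^{-\gamma\tau}\Phi(d_2), \tag{3.3}$$

where $\Phi$ is the standard normal cumulative distribution function, $d_1 = \frac{\ln(S_t/E) + (\gamma - \delta + \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}$ and $d_2 = d_1 - \sigma\sqrt{\tau}$. In this case, the corresponding SPD is a lognormal density with mean $(\gamma - \delta - \sigma^2/2)\tau$ and variance $\sigma^2\tau$ for $\ln(S_T/S_t)$:

$$f(S_T \mid S_t, \tau, \gamma, \delta) = \frac{1}{S_T\sqrt{2\pi\sigma^2\tau}} \exp\left\{-\frac{[\ln(S_T/S_t) - (\gamma - \delta - \sigma^2/2)\tau]^2}{2\sigma^2\tau}\right\}. \tag{3.4}$$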
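The relation between (3.2), (3.3) and (3.4) can be checked numerically. The Python sketch below differentiates the Black-Scholes price twice with respect to the strike by a central finite difference and compares the result with the lognormal density. The function names, the step size `h` and the parameter values are assumptions made for this illustration only; the finite difference is a simple stand-in and is not the estimator proposed in this chapter.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, e, tau, gamma, delta, sigma):
    """Black-Scholes call price as in (3.3)."""
    d1 = (math.log(s / e) + (gamma - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * math.exp(-delta * tau) * norm_cdf(d1) - e * math.exp(-gamma * tau) * norm_cdf(d2)


def spd_lognormal(s_T, s, tau, gamma, delta, sigma):
    """Closed-form SPD as in (3.4)."""
    var = sigma ** 2 * tau
    z = math.log(s_T / s) - (gamma - delta - 0.5 * sigma ** 2) * tau
    return math.exp(-z * z / (2.0 * var)) / (s_T * math.sqrt(2.0 * math.pi * var))


def spd_finite_difference(s_T, s, tau, gamma, delta, sigma, h=0.05):
    """SPD from (3.2), with the second derivative in E replaced by a
    central finite difference evaluated at E = s_T."""
    def c(e):
        return bs_call(s, e, tau, gamma, delta, sigma)
    second = (c(s_T + h) - 2.0 * c(s_T) + c(s_T - h)) / (h * h)
    return math.exp(gamma * tau) * second


# Hypothetical parameter values, chosen only for illustration.
s0, tau, gamma, delta, sigma = 100.0, 0.5, 0.05, 0.02, 0.2
for s_T in (90.0, 100.0, 110.0):
    print(s_T,
          spd_finite_difference(s_T, s0, tau, gamma, delta, sigma),
          spd_lognormal(s_T, s0, tau, gamma, delta, sigma))
```

With exact prices the two columns should agree closely. With noisy market prices, differencing twice amplifies the noise, which is one way to see why the derivatives need a dedicated estimator.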

3.1.2 Nonparametric Estimation

In recent years, nonparametric estimation has attracted more and more attention

in the estimation of SPDs, although it tends to be limited by sample sparsity and poor estimators of the derivatives. To alleviate these two problems, I will use projection pursuit regression (PPR) combined with local polynomial regression (LPR) to estimate the SPDs.

Given the data set $X = (X_1, X_2, \cdots, X_n)'$ and $Y = (Y_1, Y_2, \cdots, Y_n)'$, where $X_i$ is a $k \times 1$ factor vector and $Y_i$ is a scalar, my model is

$$Y_i = m(X_i) + \sigma(X_i)\varepsilon_i, \tag{3.5}$$

where $i = 1, 2, \cdots, n$ and $\varepsilon_i$ is i.i.d. with mean zero. The aim is to estimate the conditional expectation $m(X_0) = E(Y \mid X_0)$ and its partial derivatives from the observations $\{(X_i, Y_i)\}$.

Suppose $a_j$ is a $k \times 1$ unit vector. I approximate the surface $m(X_0)$ by a finite sum of ridge functions:

$$m(X_0) \sim \sum_{j=1}^{J} g_j(X_0 a_j). \tag{3.6}$$

The ridge functions are determined iteratively. Specifically, assume I have already determined the first $l-1$ directions $a_j$ and functions $g_j$ ($j = 1, 2, \cdots, l-1$). Let

$$r_i = Y_i - \sum_{j=1}^{l-1} g_j(X_i a_j) \tag{3.7}$$

be the residuals of this approximation. Suppose $a \in R^k$ is a unit vector. I plot $r_i$ against $X_i a$, and fit a smooth curve $g$ to this scatter plot.

The function $g$ in the $l$-th step is defined to be

$$g_l(u) = E[z_l(X) \mid X a_l = u], \tag{3.8}$$

and $a_l$ is chosen to minimize

$$D(a_l) = E[(r_i - g_l(X_i a_l))^2] \tag{3.9}$$

over all possible choices of direction a. The minimizing direction al and the corre-

sponding smooth function gl are then inserted as the next term into the approximating

sum. This process is iterated until the improvement becomes small.

There are many choices for estimating the smooth function $g_j$. I propose using

local polynomial regression (LPR) for two reasons: first, LPR delivers a small bias

for the estimators as compared with traditional Nadaraya-Watson kernel estimation,

and second, LPR produces good estimators on the derivatives of the function being

estimated (Fan and Gijbels (1996)).

Specifically, for $a_l$, denote $X_i a_l = u_i$ and $X_0 a_l = u_0$. A polynomial is fitted locally by a weighted least squares regression:

$$\min_{\beta_p} \sum_{i=1}^{n} \Big\{ r_i - \sum_{p=0}^{P} \beta_p (u_i - u_0)^p \Big\}^2 K_h(u_i - u_0), \tag{3.10}$$

where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is the Gaussian kernel and $h$ is a bandwidth. The solution to the least squares problem is denoted by $\hat\beta_p$ ($p = 0, \ldots, P$). The $v$-th order derivative of $g$, $g^{(v)}(u_0)$ ($v = 0, \ldots, P$), can then be estimated by

$$\hat g^{(v)}(u_0) = v!\,\hat\beta_v. \tag{3.11}$$

For the choice of the bandwidth and the derivation of the estimator $\hat\beta_p$, please refer to the Appendix.
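The weighted least squares problem in Equation (3.10) and the derivative estimator in Equation (3.11) can be sketched in a few lines. This is an illustrative sketch only: the bandwidth is passed in by hand rather than chosen optimally as in the Appendix, the normal equations are solved with a small hand-written elimination routine, and the names `solve_linear` and `local_polynomial` are hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def local_polynomial(u, r, u0, h, P):
    """Return [g(u0), g'(u0), ..., g^(P)(u0)] from the local fit in Equation (3.10)."""
    # Gaussian kernel weights K_h(u_i - u0) = K((u_i - u0) / h) / h
    w = [math.exp(-0.5 * ((ui - u0) / h) ** 2) / (h * math.sqrt(2.0 * math.pi)) for ui in u]
    # Normal equations (Z'WZ) beta = Z'W r, where Z has entries (u_i - u0)^p
    A = [[sum(wi * (ui - u0) ** (p + q) for wi, ui in zip(w, u)) for q in range(P + 1)]
         for p in range(P + 1)]
    b = [sum(wi * ri * (ui - u0) ** p for wi, ri, ui in zip(w, r, u)) for p in range(P + 1)]
    beta = solve_linear(A, b)
    # Equation (3.11): the v-th derivative estimate is v! times beta_v
    return [math.factorial(v) * beta[v] for v in range(P + 1)]
```

As a sanity check, fitting a local quadratic ($P = 2$) to noise-free data generated from $g(u) = 1 + 2u + 3u^2$ should recover $g(0.5) = 2.75$, $g'(0.5) = 5$ and $g''(0.5) = 6$ up to rounding, since the true function lies in the fitted polynomial class.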

Estimating $g_j$ by $\hat g_j$, I can then estimate $D(a_l)$ by calculating the sum of squared residuals relative to this $\hat g$:

$$\hat D(a_l) = n^{-1} \sum (r_i - \hat g(X_i a_l))^2, \tag{3.12}$$

which means the estimator of $a_l$ is

$$\hat a_l = \arg\min_{a_l} \hat D(a_l) = \arg\min_{a_l} n^{-1} \sum (r_i - \hat g(X_i a_l))^2. \tag{3.13}$$

After I obtain all the estimators $\{\hat a_j\}_{j=1}^{J}$, $\{\hat g_j\}_{j=1}^{J}$ and $\{\hat g_j^{(v)}\}_{j=1}^{J}$ ($v = 1, 2, \cdots, P$), the estimator of $m$ is

$$\hat m(X) = \sum_{j=1}^{J} \hat g_j(X \hat a_j). \tag{3.14}$$

The estimates of the first and the second order derivatives of $m$ with respect to the $q$-th ($q = 1, 2, \cdots, k$) factor are

$$\frac{\partial \hat m}{\partial X_q} = \sum_{j=1}^{J} \hat g_j^{(1)}(X \hat a_j)\, \hat a_{jq}, \tag{3.15}$$

and

$$\frac{\partial^2 \hat m}{\partial X_q^2} = \sum_{j=1}^{J} \hat g_j^{(2)}(X \hat a_j)\, \hat a_{jq}^2. \tag{3.16}$$
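Equations (3.14) to (3.16) are an application of the chain rule to a sum of ridge functions. The sketch below illustrates this with a hypothetical fitted model of two ridge terms in two factors; the directions and the functions standing in for the local polynomial estimates are invented for illustration, chosen only because their derivatives are known in closed form.

```python
import math

# Hypothetical fitted model with J = 2 ridge terms in k = 2 factors.
# Each term is (direction a_j, g_j, first derivative of g_j, second derivative of g_j).
terms = [
    ((0.6, 0.8), math.sin, math.cos, lambda u: -math.sin(u)),
    ((0.8, -0.6), lambda u: u**3, lambda u: 3 * u**2, lambda u: 6 * u),
]


def index(x, a):
    """Projection of the factor vector x on the direction a."""
    return sum(xq * aq for xq, aq in zip(x, a))


def m_hat(x):
    """Equation (3.14): sum of the ridge functions."""
    return sum(g(index(x, a)) for a, g, _, _ in terms)


def dm_hat(x, q):
    """Equation (3.15): first derivative with respect to the q-th factor."""
    return sum(g1(index(x, a)) * a[q] for a, _, g1, _ in terms)


def d2m_hat(x, q):
    """Equation (3.16): second derivative with respect to the q-th factor."""
    return sum(g2(index(x, a)) * a[q] ** 2 for a, _, _, g2 in terms)
```

Comparing `dm_hat` and `d2m_hat` with finite differences of `m_hat` at any point is a quick way to confirm that the direction components enter linearly in the first derivative and squared in the second.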

3.1.3 Asymptotic Properties

Before I look at the asymptotic properties of the estimators, let me first introduce some notation. Define $\mu_j = \int u^j K(u)\,du$ and $\nu_j = \int u^j K^2(u)\,du$, and let the typical $(j, l)$ elements of the matrices $B$, $\tilde B$ and $B^{*}$ be $B_{j,l} = (\mu_{j+l})$, $\tilde B_{j,l} = (\mu_{j+l+1})$ and $B^{*}_{j,l} = (\nu_{j+l})$, where $0 \le j, l \le P$, and $c_p = (\mu_{p+1}, \cdots, \mu_{2p+1})'$, $\tilde c_p = (\mu_{p+2}, \cdots, \mu_{2p+2})'$. Denote $f$ as the density function of $X_i a$. Define

$$a_0 = \arg\min_a D(a) = \arg\min_a E[\{r_i - g(X_i a)\}^2], \tag{3.17}$$

and

$$\hat a = \arg\min_a \hat D(a) = \arg\min_a n^{-1} \sum_{i=1}^{n} \{r_i - \hat g(X_i a)\}^2. \tag{3.18}$$

I assume that the minimum is attained uniquely on $A$, except for the sign change, and that the first three directional derivatives of $f$ and $z$ exist and are continuous uniformly in $x \in R^k$ and in all directions; that $f$ vanishes outside a compact set, and is bounded away from 0 on $A^{\epsilon}$ for some $\epsilon > 0$; and that $K$ is the Gaussian kernel. Let $a_0$ give a local minimum of $D(a)$ and let $\hat a$ be a value which minimizes $\hat D(a)$. I use $P_0$, $P_1$ and $P_2$ to denote the order of local polynomial estimation for $g_j$ and its first and second order derivatives.

Theorem 1 Let $a_0$ and $\hat a$ be defined as in Equation (3.17) and Equation (3.18). Then $\hat a$ converges to $a_0$ at the rate of $(nh_0)^{-1/2}$, where $h_0$ is the optimal bandwidth of the local polynomial regression for estimating $m(X_0)$ and has the order of $O_p(n^{-1/(2P_0+3)})$.

Proof: see the Appendix.

Theorem 2 The conditional moment estimator, $\hat m(X_0)$, converges to its true value at the rate of $(nh_0)^{-1/2}$. The estimator of the conditional first order partial derivative with respect to the $q$-th regressor, $\hat m_q^{(1)}(X_0)$, converges to its true value at the rate of $(nh_1^3)^{-1/2}$; and the estimator of the second order derivative, $\hat m_q^{(2)}(X_0)$, converges to its true value at the rate of $(nh_2^5)^{-1/2}$, where $h_0$, $h_1$ and $h_2$ are the corresponding optimal bandwidths for LPR in Equation (3.10), with $h_1 \sim O_p(n^{-1/(2P_1+3)})$ and $h_2 \sim O_p(n^{-1/(2P_2+3)})$.

Proof: see the Appendix.

3.1.4 Implementing the model

The previous two sections describe the theoretical technique used in this chapter. In this section, I explain how to apply this model to the interest rate option data.

It is commonly known that there are five factors that affect option prices: the current underlying value $S_i$, the strike price $E_i$, the time to maturity $\tau_i$, the discount rate $\gamma_i$ and the volatility $\sigma_i$. Usually, the discount factor, $e^{-\gamma_i\tau_i}$, is relatively stable, while the volatility can be assumed to be an unknown function of $S_i$. Therefore, I can exclude these two factors from the regressors.

The model I am going to use is

$$C_i \sim \sum_j g_j(a_{1j}S_i + a_{2j}E_i + a_{3j}\tau_i + a_{4j}S_i^2 + a_{5j}E_i^2 + a_{6j}\tau_i^2 + a_{7j}S_iE_i + a_{8j}E_i\tau_i + a_{9j}S_i\tau_i). \tag{3.19}$$

This model maintains all the assumptions for the theorems in the previous section, so the SPD estimators are

$$\frac{\partial^2 \hat C_i}{\partial E_i^2} = \sum_j \big[ \hat g_j^{(2)}(\cdot)(\hat a_{2j} + 2\hat a_{5j}E_i + \hat a_{7j}S_i + \hat a_{8j}\tau_i)^2 + \hat g_j^{(1)}(\cdot)(2\hat a_{5j}) \big]. \tag{3.20}$$

As I have already derived the asymptotic properties of the partial derivative estimators for each individual regressor, it is not hard to see that my SPD estimators converge to the true values asymptotically at the rate of $n^{-2/9}$ if I use the optimal bandwidth and the optimal polynomial function.
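The stated rate can be traced back to Theorem 2. As a worked check, the rate $n^{-2/9}$ follows from the second derivative rate $(nh_2^5)^{-1/2}$ and the bandwidth order $h_2 \sim n^{-1/(2P_2+3)}$ when the local polynomial for the second derivative is cubic. The value $P_2 = 3$ is inferred here from the stated rate; the text itself only refers to the optimal polynomial.

```latex
\begin{align*}
h_2 &\sim n^{-1/(2P_2+3)} = n^{-1/9} \qquad (P_2 = 3), \\
n h_2^{5} &\sim n^{1 - 5/9} = n^{4/9}, \\
(n h_2^{5})^{-1/2} &\sim n^{-2/9}.
\end{align*}
```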

The reason I incorporate the quadratic forms of the factors into the regressors is that if I only use the linear combination of $S_i$, $E_i$ and $\tau_i$ as the regressors, I do not allow for much variation in the shape of the second order derivatives. This can be seen in the following model

$$C_i \sim \sum_j g_j(a_{1j}S_i + a_{2j}E_i + a_{3j}\tau_i), \tag{3.21}$$

where $j$ is the index for the ridge function $g$. I estimate $\{g_j, a_j\}$ by iterations, where each iteration can be expressed as

$$r_i = g_j(a_{1j}S_i + a_{2j}E_i + a_{3j}\tau_i) + e_i. \tag{3.22}$$

Based on this setup, the estimators of the first and second order partial derivatives with respect to $E_i$ and $S_i$ are

$$\frac{\partial \hat r_i}{\partial E_i} = \hat g_j^{(1)}(\cdot)\,\hat a_{2j}, \tag{3.23}$$

$$\frac{\partial \hat r_i}{\partial S_i} = \hat g_j^{(1)}(\cdot)\,\hat a_{1j}, \tag{3.24}$$

$$\frac{\partial^2 \hat r_i}{\partial E_i^2} = \hat g_j^{(2)}(\cdot)\,\hat a_{2j}^2, \tag{3.25}$$

$$\frac{\partial^2 \hat r_i}{\partial S_i^2} = \hat g_j^{(2)}(\cdot)\,\hat a_{1j}^2. \tag{3.26}$$

If I divide both sides of Equation (3.23) and Equation (3.25) by Equation (3.24) and Equation (3.26) respectively, the ratio of the estimators for the first order partial derivatives is $\hat a_{2j}/\hat a_{1j}$ and the ratio for the second order derivatives is $\hat a_{2j}^2/\hat a_{1j}^2$, which is the squared value of the previous one. But it is uncommon for the two ratios to have such a relationship. This suggests that I impose strong restrictions on the estimators of the partial derivatives, which would jeopardize the performance of the model.

Thus, to relax this constraint, I can add some quadratic forms into the regressors. For example, I can incorporate $S_i^2$, $E_i^2$, $\tau_i^2$, $S_iE_i$, $S_i\tau_i$ and $E_i\tau_i$ into the explanatory variables. In this case, the model becomes

$$C_i \sim \sum_j g_j(a_{1j}S_i + a_{2j}E_i + a_{3j}\tau_i + a_{4j}S_i^2 + a_{5j}E_i^2 + a_{6j}\tau_i^2 + a_{7j}S_iE_i + a_{8j}E_i\tau_i + a_{9j}S_i\tau_i). \tag{3.27}$$

After the estimation of $\{\hat g_j, \hat a_j\}$, the SPD estimators can be obtained by

$$\frac{\partial^2 \hat C_i}{\partial E_i^2} = \sum_j \big[ \hat g_j^{(2)}(\cdot)(\hat a_{2j} + 2\hat a_{5j}E_i + \hat a_{7j}S_i + \hat a_{8j}\tau_i)^2 + \hat g_j^{(1)}(\cdot)(2\hat a_{5j}) \big]. \tag{3.28}$$

Note that increasing the number of the regressors in the model increases the computational burden. For this reason, I will not consider the cubic forms of the factors.
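The second derivative with the quadratic index can be written out directly. The sketch below is illustrative only: the coefficient vector and the ridge function are hypothetical, and the ridge derivatives are supplied in closed form instead of being estimated by local polynomial regression. Each ridge term contributes $g''(u)(\partial u/\partial E)^2 + g'(u)\,\partial^2 u/\partial E^2$, where $u$ is the quadratic index.

```python
import math


def index(S, E, tau, a):
    """Quadratic index inside each ridge function; a = (a1, ..., a9)."""
    a1, a2, a3, a4, a5, a6, a7, a8, a9 = a
    return (a1 * S + a2 * E + a3 * tau + a4 * S**2 + a5 * E**2 + a6 * tau**2
            + a7 * S * E + a8 * E * tau + a9 * S * tau)


def d2C_dE2(S, E, tau, terms):
    """Second derivative of the fitted price in the strike.

    terms is a list of (a, g1, g2): the coefficient vector and the first and
    second derivatives of the ridge function (hypothetical inputs here).
    """
    total = 0.0
    for a, g1, g2 in terms:
        u = index(S, E, tau, a)
        # du/dE = a2 + 2*a5*E + a7*S + a8*tau  (zero-based positions 1, 4, 6, 7)
        du_dE = a[1] + 2.0 * a[4] * E + a[6] * S + a[7] * tau
        # d2u/dE2 = 2*a5
        total += g2(u) * du_dE**2 + g1(u) * 2.0 * a[4]
    return total
```

With a linear index ($a_5 = a_7 = a_8 = 0$) the second term vanishes and `du_dE` is constant, which is the restriction that the quadratic terms are meant to relax.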

3.2 Simulations

To test the performance of my SPD estimators, I conduct two types of Monte Carlo simulations. In the first, I want to check how well my method learns the Black-Scholes formula. Although one might not expect the interest rate to follow the geometric Brownian motion precisely, my estimator should be able to approximate the Black-Scholes formula when I assume it determines the option prices. In the second simulation, I assume the interest rate follows the CIR model (Cox, Ingersoll and Ross (1985)), which describes the interest rates more realistically, and compare my estimators with the semi-parametric estimators in Aït-Sahalia and Lo (1998) (AL estimators).$^{2}$

In their simulations, Aït-Sahalia and Lo (1998) calibrate their model using characteristics of the S&P 500 options market. I calibrate my experiments using the IRX, the short term interest rate options, traded on the Chicago Board Options Exchange (CBOE) in 1993.

3.2.1 Learning the Black-Scholes

First, I simulate the data. I draw the initial underlying values of the option contracts, which are 1000 times the underlying interest rates, from the uniform distribution on [28, 32]. Then, I assign the six strike prices, 25, 27.125, 30, 32.125, 35 and 37.125, to each contract. Next, I pick the time-to-maturity for each option contract randomly from 5 to 200 days. After simulating N options, I assume the interest rates follow the geometric Brownian motion with a volatility of 0.075. Setting the discount rate at 3% annually, I derive the 'true' prices of the simulated options using the Black-Scholes formula.

With this simulated data as the input for the estimation model, I employ my method to estimate the SPDs at the values equal to the strike prices, using the mean squared error (MSE) divided by the mean squared 'true' SPD to gauge the accuracy of the estimators. Henceforth, I will refer to this measure as MSE in percentage. I repeat the whole procedure 1000 times, and report the mean and the standard deviation of MSE in percentage in these 1000 simulations in Table 3.1. I also report the performance of the AL estimators in this table. It is not surprising to see that the AL estimators can produce a smaller estimation error than mine, since their estimation is based on the Black-Scholes model.

2 Aït-Sahalia and Lo (1998) first derive the implied volatilities from the data and set them as the dependent variable. Then, they choose the futures prices, the strike prices and the time-to-maturity as the regressors to derive the Nadaraya-Watson kernel estimator of the volatility. Finally, they plug the estimated volatility back into the Black-Scholes formula along with the other parameters to estimate the option prices and SPDs.
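The simulation design and the accuracy measure described above can be sketched as follows. This is one possible reading of the design, not the original code: each drawn underlying value is paired with all six strikes, each option gets its own maturity, days are converted to years with a 365-day convention, and the dividend rate is set to zero. All of these choices, and the function names, are assumptions.

```python
import math
import random
from statistics import NormalDist

STRIKES = [25, 27.125, 30, 32.125, 35, 37.125]
SIGMA, GAMMA = 0.075, 0.03
DAYS_PER_YEAR = 365  # assumption: the day-count convention is not stated in the text


def bs_call(S, E, tau, gamma=GAMMA, sigma=SIGMA):
    """Black-Scholes call price with a zero dividend rate (assumption)."""
    d1 = (math.log(S / E) + (gamma + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    Phi = NormalDist().cdf
    return S * Phi(d1) - E * math.exp(-gamma * tau) * Phi(d2)


def simulate_options(n_options, seed=0):
    """Return a list of (S, E, tau, price) tuples for n_options simulated calls."""
    rng = random.Random(seed)
    data = []
    while len(data) < n_options:
        S = rng.uniform(28, 32)          # 1000 times the underlying interest rate
        for E in STRIKES:
            tau = rng.randint(5, 200) / DAYS_PER_YEAR
            data.append((S, E, tau, bs_call(S, E, tau)))
    return data[:n_options]


def mse_in_percentage(estimates, truths):
    """MSE of the estimates divided by the mean squared true value."""
    mse = sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)
    return mse / (sum(t * t for t in truths) / len(truths))
```

A call such as `simulate_options(600)` would produce one simulated data set of the smallest size used in the experiments; the estimation step itself is not reproduced here.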

In order to give a clearer view of the performances of these two methods, I estimate SPDs at the same points for each set of simulated data. As these points are not necessarily contained in the simulated data, I can regard this estimation as an out-of-sample estimation. Focusing on the SPDs at (29, 29.01, 29.02, ···, 30.99, 31)' after 84 days given that the current underlying value of the option contract is 30, I use the same simulated data set to estimate the SPDs of the 'benchmark points' and calculate MSE in percentage, repeating this simulation 1000 times. The average MSE in percentage is summarized in Table 3.2.

Table 3.1 and Table 3.2 show how the bias and the variance of the estimators of the prices and SPDs decrease when the number of observations increases. Moreover, the SPDs converge at a slower rate than the option prices, which is consistent with Theorem 1 and Theorem 2.

Table 3.1: Mean squared error (in percentage) between the true values and the estimators (In Sample)

[Table layout: rows are the numbers of data points N = 600, 1200, 2400 and 4800; columns are the option price and SPD estimators for PPR+LPR and for the semi-parametric method; estimation is in-sample.]

I use the Black-Scholes formula to determine the theoretical SPDs, and use my method and the semi-parametric method in Aït-Sahalia and Lo (1998) to estimate the SPDs. I collect the ratio of MSE of the estimators over the mean square true values, and replicate 1000 simulations. The table reports the mean and the standard deviation (in the brackets) of the ratios in the 1000 simulations.

Table 3.2: Mean squared error (in percentage) between the true values and the estimators (Out of Sample)

[Table layout: rows are the numbers of data points N = 600, 1200, 2400 and 4800; columns are the option price and SPD estimators for PPR+LPR and for the semi-parametric method; estimation is out-of-sample.]

I use the Black-Scholes formula to determine the theoretical SPDs, and use my method and the semi-parametric method in Aït-Sahalia and Lo (1998) to estimate the SPDs. I collect the ratio of MSE of the estimators over the mean square true values, and replicate 1000 simulations. The table reports the mean and the standard deviation (in the brackets) of the ratios in the 1000 simulations.

From these two tables, I can see the AL estimation is far superior to my model when the interest rates are assumed to follow the geometric Brownian motion. However, in reality, the interest rates possess the mean-reversion property: as the interest rates rise above the mean level, there is a negative drift that pulls the rates down; yet when the interest rates fall below it, there is a positive force that drives them back up. Thus it is hard to believe that a geometric Brownian motion fits the interest rates very well. For this reason, I am going to try another model, which describes the interest rates more realistically, to examine the changes in the performance of the SPD estimators.

3.2.2 Learning the CIR

In this section, I assume that the interest rate follows the CIR model, which captures the basic mean-reversion property of interest rates. The model as constructed by Cox, Ingersoll and Ross (1985) for interest rate dynamics is as follows:

$$dS_t = \alpha_0(\alpha_1 - S_t)\,dt + \sigma S_t^{1/2}\,dW_t, \tag{3.29}$$

where $\{W_t\}$ is a standard one-dimensional Brownian motion and $\alpha_0$, $\alpha_1$ are some constants. It has been shown that the transition density is determined by the fact that given $S_t = S_0$, $2cS_{t+\Delta}$ has a noncentral $\chi^2$ distribution with degrees of freedom $2(2\alpha_0\alpha_1/\sigma^2 - 1) + 2$ and noncentrality parameter $2cS_0\exp(-\alpha_0\Delta)$, where $c = 2\alpha_0/[\sigma^2(1 - \exp(-\alpha_0\Delta))]$ and $\Delta$ is the time interval.
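The transition law can be used to draw the interest rate at a future date exactly, without discretizing the stochastic differential equation. The sketch below is an illustration under stated assumptions: it uses the standard CIR result that the noncentrality parameter is $2cS_0\exp(-\alpha_0\Delta)$, it samples the noncentral $\chi^2$ variable as a shifted squared normal plus a central $\chi^2$ variable (valid only when the degrees of freedom exceed one), and the function names are hypothetical. The rate is kept in decimal units (0.03 for 3%), not multiplied by 1000.

```python
import math
import random


def cir_transition_sample(S0, alpha0, alpha1, sigma, delta, rng):
    """Draw S_{t+delta} given S_t = S0 from the exact CIR transition law."""
    decay = math.exp(-alpha0 * delta)
    c = 2.0 * alpha0 / (sigma**2 * (1.0 - decay))
    df = 4.0 * alpha0 * alpha1 / sigma**2   # equals 2(2*alpha0*alpha1/sigma^2 - 1) + 2
    nc = 2.0 * c * S0 * decay               # noncentrality parameter
    if df <= 1.0:
        raise ValueError("this sampler assumes more than one degree of freedom")
    # Noncentral chi-square(df, nc) = (Z + sqrt(nc))^2 + central chi-square(df - 1)
    z = rng.gauss(0.0, 1.0)
    central = rng.gammavariate((df - 1.0) / 2.0, 2.0)  # chi-square with df - 1 degrees
    return ((z + math.sqrt(nc)) ** 2 + central) / (2.0 * c)


def cir_conditional_mean(S0, alpha0, alpha1, delta):
    """Conditional mean of S_{t+delta} implied by the transition law."""
    decay = math.exp(-alpha0 * delta)
    return S0 * decay + alpha1 * (1.0 - decay)
```

With $\alpha_0 = 0.21459$, $\alpha_1 = 0.032$ and $\sigma = 0.075$, the degrees of freedom are about 4.9, so the sampler applies, and the average of many draws should be close to `cir_conditional_mean`.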

I simulate the option data in the same manner as before, but price the options by assuming the interest rate follows the CIR model after the option contracts are written. Based on the conditional density of the interest rate implied by the CIR model,

I can obtain the ‘true’ prices and the SPDs. Here, I set α0 = 0.21459, α1 = 0.032

and σ = 0.075. Then, I use the simulated data to derive my SPD estimators and AL

estimators. Table 3.3 and Table 3.4 report the statistics of the MSE in percentage for

in-sample and out-of-sample in 1000 simulations.

Table 3.3 and Table 3.4 show that, when the underlying asset does not follow the

geometric Brownian motion, my SPD estimators and option prices are superior to the

AL estimators. Moreover, my results in Table 3.3 and Table 3.4 are comparable with

those in Table 3.1 and Table 3.2. The difference between the performance of simulation

in the two cases can be explained by the complexity of the stochastic motion of the

interest rate.

The simulations I have done suggest that if I know the motion of the interest rates

up to a handful of unknown parameters, the parametric or semi-parametric method is

more promising than mine. But, when the motion of the interest rates is not certain,

my method is the better choice. Moreover, no matter what law of motion the interest

rates follow, my results are quite comparable, which reveals the robustness of my SPD

estimators.

Table 3.3: Mean squared error (in percentage) between the true values and the estimators (In Sample)

[Table layout: rows are the numbers of data points N = 600, 1200, 2400 and 4800; columns are the option price and SPD estimators for PPR+LPR and for the semi-parametric method; estimation is in-sample.]

I use the CIR model to determine the theoretical SPDs, and use my method and the semi-parametric method in Aït-Sahalia and Lo (1998) to estimate the SPDs. I collect the ratio of MSE of the estimators over the mean square true values, and replicate 1000 simulations. The table reports the mean and the standard deviation (in the brackets) of the ratios in the 1000 simulations.

Table 3.4: Mean squared error (in percentage) between the true values and the estimators (Out of Sample)

[Table layout: rows are the numbers of data points N = 600, 1200, 2400 and 4800; columns are the option price and SPD estimators for PPR+LPR and for the semi-parametric method; estimation is out-of-sample.]

I use the CIR model to determine the theoretical SPDs, and use my method and the semi-parametric method in Aït-Sahalia and Lo (1998) to estimate the SPDs. I collect the ratio of MSE of the estimators over the mean square true values, and replicate 1000 simulations. The table reports the mean and the standard deviation (in the brackets) of the ratios in the 1000 simulations.

3.3 Application to the Interest Rate Options

In this section I use the tools I have described to analyze the data on short term interest rate options (IRX) traded on the CBOE. The sample period is from January 4, 1993 to December 31, 1993.

3.3.1 Data

Interest rate options are European-style, cash-settled options on the spot yield of

U.S. Treasury securities. The options on the short-term rate (ticker symbol IRX) are based on the annualized discount rate of the most recently auctioned 13-week Treasury bill. The 13-week T-bill yield is the recognized benchmark of short-term interest rates.

These bills are issued by the U.S. Treasury in auctions conducted weekly by the Federal

Reserve Bank. Underlying values for the option contracts are 1000 times the underlying interest rates. For example, an annualized discount rate of 5.5% on the newly auctioned

13-week Treasury bills would place the underlying value for the option on short-term rates (IRX) at 55.00.

To eliminate data errors and ensure that closing option prices are representative of market conditions at the end of the trading day, I use several screening criteria. In my sample, observations with a time-to-maturity of less than six days, a price lower than 0.125, or that violate the no-arbitrage lower bound ($C_t \ge \max(0, S_t - e^{-\gamma\tau}E)$) are dropped. Table 3.5 describes the main features of my data set.
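The three screening criteria translate into a simple filter. The sketch below is illustrative: the function name, the argument layout and the 365-day convention used to convert days into years are assumptions, and the discount rate must be supplied by the user.

```python
import math


def passes_screen(price, S, E, tau_days, gamma, days_per_year=365):
    """Return True if an observation survives the three screening criteria."""
    tau = tau_days / days_per_year
    # No-arbitrage lower bound for a call: max(0, S - exp(-gamma * tau) * E)
    lower_bound = max(0.0, S - math.exp(-gamma * tau) * E)
    return tau_days >= 6 and price >= 0.125 and price >= lower_bound
```

For instance, a call priced at 2.25 with an underlying value of 30.88, a strike of 30 and 53 days to maturity passes at a 3% discount rate, while the same contract with only 5 days to maturity is dropped.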

Table 3.5: Statistical Properties of the Interest Rate Option (IRX) in 1993

                  Interest Rate   Time-to-maturity   Strike Price
mean              29.8173         83.1936            30.8740
std deviation     0.7869          48.6029            4.0116
min               28              6                  25
5% percentile     28.63           19                 25
10% percentile    29              27                 25
50% percentile    29.88           72                 30
90% percentile    30.88           159                37.125
95% percentile    31              172                37.125
max               31.25           189                42.125

3.3.2 Accuracy of SPD

The annual data comprise four quarters, the first three containing roughly 1000 data points each, while the last quarter contains only around 300. Employing my method to estimate the SPDs for each quarter, I compare my estimators with the AL estimators. Following Aït-Sahalia and Lo (1998), I use the standardized butterfly prices as discrete approximations of the SPD values. The standardized butterfly price represents the price of the portfolio created by buying two call options and selling one call option. These three options have the same underlying asset and the same time-to-maturity, but different strike prices. Specifically, given a call option $(S_0, E_0, \tau_0)$, I try to find another two options which have the same underlying value and the same time-to-maturity but different strike prices, denoted by $(S_0, E_0 + \Delta E_1, \tau_0)$ and $(S_0, E_0 - \Delta E_2, \tau_0)$. Using $C_0$, $C_1$ and $C_2$ to denote the corresponding prices of the above three options, by the Taylor expansion I have

$$C_1 \approx C_0 + \frac{\partial C_0}{\partial E_0}\Delta E_1 + 0.5\,\frac{\partial^2 C_0}{\partial E_0^2}\Delta E_1^2, \tag{3.30}$$

and

$$C_2 \approx C_0 - \frac{\partial C_0}{\partial E_0}\Delta E_2 + 0.5\,\frac{\partial^2 C_0}{\partial E_0^2}\Delta E_2^2. \tag{3.31}$$

Multiplying Equation (3.30) by $\Delta E_2$ and Equation (3.31) by $\Delta E_1$ and adding them eliminates the first order term, so the standardized butterfly price is given by

$$\frac{\partial^2 C_0}{\partial E_0^2} = \frac{2[\Delta E_2 C_1 + C_2 \Delta E_1 - (\Delta E_2 + \Delta E_1)C_0]}{\Delta E_1^2 \Delta E_2 + \Delta E_2^2 \Delta E_1}. \tag{3.32}$$

The standardized butterfly price is available only for a small fraction of the options because it imposes very strict requirements on the strike prices of $C_0$, $C_1$ and $C_2$.
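The butterfly approximation follows from the two Taylor expansions in Equations (3.30) and (3.31): weighting the first by $\Delta E_2$ and the second by $\Delta E_1$ and adding them removes the first derivative, leaving the second derivative multiplied by $0.5(\Delta E_1^2\Delta E_2 + \Delta E_2^2\Delta E_1)$. A minimal sketch, with a hypothetical function name, is:

```python
def butterfly_second_derivative(C0, C1, C2, dE1, dE2):
    """Approximate the second derivative of the call price in the strike at E0.

    C1 is the price at strike E0 + dE1 and C2 the price at strike E0 - dE2.
    The first-order terms of the two Taylor expansions cancel in the numerator.
    """
    numerator = dE2 * C1 + dE1 * C2 - (dE1 + dE2) * C0
    denominator = 0.5 * (dE1**2 * dE2 + dE2**2 * dE1)
    return numerator / denominator
```

The approximation is exact for a price function that is quadratic in the strike. For example, with $C(E) = E^2$ and the unevenly spaced strikes 27.125, 30 and 32.125, the function returns 2 up to rounding.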

Treating the standardized butterfly price as a benchmark, I measure the accuracy of the SPD estimators by MSE in percentage. For the purpose of comparison, I also report the AL estimators. Table 3.6 summarizes these results.

Table 3.6: Estimation Error of SPDs

            PPR+LPR   AL
Jan-Mar     0.2795    0.4490
Apr-Jun     0.2348    0.4071
Jul-Sep     0.1515    0.4293
Oct-Dec     0.0757    0.4565

I use the data in 1993, and estimate the SPDs implied in them. I treat the standardized butterfly price as a benchmark, and calculate the MSE in percentage for the estimation.

Table 3.6 reveals that my estimators deliver a smaller MSE than the AL estimators. I should note that a butterfly price gives the probability density at only one specific point, and its derivation has very strict requirements. I cannot use the butterfly prices to approximate all the SPDs necessary for pricing. However, in this instance, the butterfly prices partially reflect the accuracy of the SPD estimators.

3.3.3 Pricing Interest Rate Options

One of the aims of estimating SPDs is to price the interest rate derivatives. Theoretically, I can consider the prices of the options as compensation for the uncertain payoffs at the expiration day, which can be obtained numerically by the integral of the payoff with respect to SPDs. Since I only estimate SPDs at discrete points, the estimated price for the interest rate call option, $\hat P$, can be expressed by

$$\hat P = e^{-\gamma\tau} \sum_{S_T} \max(S_T - E, 0)\,\hat f(S_T \mid S_t, \tau), \tag{3.33}$$

where $S_T$ is the underlying value at the expiration day, and $\hat f(S_T \mid S_t, \tau)$ is the corresponding estimated SPD at the point $S_T$ given $S_t$ and $\tau$.
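The discrete pricing formula can be illustrated with a density whose price is known in closed form. In the sketch below, the lognormal SPD of the Black-Scholes model stands in for the estimated SPD, and the grid spacing is used to turn density values into probability weights; this weighting, the evenly spaced grid, the zero dividend rate and the function names are assumptions made for the illustration.

```python
import math
from statistics import NormalDist


def lognormal_spd(ST, S, tau, gamma, sigma):
    """Lognormal SPD with zero dividend rate, used here in place of an estimated SPD."""
    mean = (gamma - 0.5 * sigma**2) * tau
    var = sigma**2 * tau
    z = math.log(ST / S) - mean
    return math.exp(-z * z / (2.0 * var)) / (ST * math.sqrt(2.0 * math.pi * var))


def price_call_from_spd(grid, spd, E, gamma, tau):
    """Discounted sum of payoff times SPD over an evenly spaced grid of S_T values."""
    step = grid[1] - grid[0]  # assumption: even spacing, so step converts densities to weights
    expected_payoff = sum(max(s - E, 0.0) * f * step for s, f in zip(grid, spd))
    return math.exp(-gamma * tau) * expected_payoff


def bs_call(S, E, tau, gamma, sigma):
    """Black-Scholes price with zero dividend rate, used only as a reference value."""
    d1 = (math.log(S / E) + (gamma + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    Phi = NormalDist().cdf
    return S * Phi(d1) - E * math.exp(-gamma * tau) * Phi(d2)
```

On a grid from 20 to 45 with a step of 0.01, pricing an at-the-money call with an underlying value of 30, 84 days to maturity, a 3% discount rate and a volatility of 0.075 should reproduce the Black-Scholes price closely.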

As an empirical example, I obtain the SPD estimates from the first six months of 1993, and use them to price the call options for the same time period, leaving the third and the fourth quarters untouched for the time being as they will be used later as the forecasting window. Since the SPD estimators are based on the second order derivatives of the option prices with respect to the strike prices, the number of observations on the strike prices is crucial to the accuracy of these estimates. In the data, only the strike prices between 25 and 40 have more than 100 observations, so I focus on estimating the SPDs between 25 and 40, which are part of the true SPD function. Because there is little chance that the interest rate will go below 25 or above 40, as evidenced by Table 3.5, I am confident that my SPDs cover most of the cases I need to consider; the part I ignore has a much smaller probability density, and thus has very little effect on pricing.

Since the SPDs to be estimated fall between 25 and 40, the call options whose strike prices are between 25 and 37.125 can be priced properly. For options with a strike price of 40, I assume their price equals zero, which is never true in practice; thus in order to evaluate my method fairly, I do not consider such options.

After obtaining the SPD estimators from my model, I adopt the pricing formula (3.33) to estimate the price of each option. I calculate the mean squared error in percentage to measure the pricing accuracy; it turns out to be 0.2468 for the first half of 1993. Using the AL method to estimate the SPDs and the option prices instead increases the MSE in percentage to 0.4563. In order to examine the performance of the estimation methods in greater detail, I group the option data according to their moneyness: an out-of-the-money option is one for which the ratio of the underlying value over the strike price is less than 0.97, while an in-the-money option is one for which the ratio is higher than 1.03. An at-the-money option is between these two categories. I report the MSE in percentage for each group, and these results are summarized in the first panel of Table 3.7. The pricing errors for the three groups are plotted in Figure B.1, Figure B.2 and Figure B.3, where the blue circle denotes my pricing error and the red star denotes the pricing error for the AL model. Table 3.7 and the figures demonstrate that my SPD estimators can reduce the pricing error by about 30% compared with the AL estimators. As introduced earlier, the trading volume of interest rate derivatives is trillions of U.S. dollars. Thus even a small reduction in the pricing error would benefit investors considerably in such a sizeable derivative market. What makes my model even more attractive is that it relaxes the assumption on the motion of interest rates.

Table 3.7: Mean Squared Error (in percentage)

[Table layout: three panels (pricing interest rate call options, pricing interest rate put options, forecasting interest rate call options); rows are PPR+LPR and semi-parametric; columns are out-of-the-money, at-the-money and in-the-money.]

I use my method and the semi-parametric method in Aït-Sahalia and Lo (1998) to estimate the SPDs in the first half of 1993, and then price the interest rate call options and the interest rate put options in that period. I also do the one-day-ahead forecasting for the call options. This table summarizes MSE in percentage.
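The moneyness grouping used for the call options can be written as a small classifier. The thresholds 0.97 and 1.03 are those stated in the text; the function name and the returned labels are illustrative.

```python
def moneyness_group(S, E):
    """Classify a call option by the ratio of the underlying value to the strike price."""
    ratio = S / E
    if ratio < 0.97:
        return "out-of-the-money"
    if ratio > 1.03:
        return "in-the-money"
    return "at-the-money"
```

With an underlying value of 30, a strike of 32.125 falls in the out-of-the-money group, a strike of 30 in the at-the-money group and a strike of 27.125 in the in-the-money group.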

In order to give an even clearer picture of the accuracy of my estimated prices, I track a specific option traded on January 4th, 1993 for several trading days. This option had an underlying value of 30.88, a time to maturity of 53 days, and a strike price of 30 on January 4th. Table 3.8 shows the summary for this option; the first column is the trading day, the second is the time to maturity, and the third column is the value of 1000 times the short term interest rate, which fluctuates daily. The fourth column lists the closing market price for each trading day and the last two columns are the estimators of the prices for the call options. $\hat P$ and $\tilde P$ denote the estimated prices by my method and the AL method, respectively. This table shows clearly that my estimators are closer to the market prices than the AL estimators for most trading days, with the exception of January 11th.

SPDs can be used not only to price call options, but also to price other interest rate derivatives, which is one of the main reasons SPDs are so attractive. Collecting the SPDs estimated from the call options for the first half of 1993, I use the same pricing procedure to price the put options of that period, and adopt the same selection rules as mentioned at the beginning of this section, focusing on the estimation of the prices for the put options whose strike prices belong to [27.125, 40]. As with the call options, I divide the put options into three groups according to their moneyness. The second panel of Table 3.7 reports the MSE in percentage for the two methods. Associated with these statistics are Figure B.4, Figure B.5 and Figure B.6. The blue circle denotes my pricing error, and the red star denotes the pricing error for the AL model.

Table 3.8: Summary of IRX call option with its estimated prices

Date         Time-to-maturity   $S_t$   Market Price   $\hat P$   $\tilde P$
1993/01/04   53                 30.88   2.25           2.2396     1.4446
1993/01/05   52                 31      2.203          2.2742     1.4806
1993/01/06   51                 31      2.25           2.2391     1.4906
1993/01/07   50                 31      2.344          2.2042     1.5067
1993/01/08   49                 30.63   1.7815         1.9172     1.3854
1993/01/11   48                 30.63   1.3125         1.881      1.3842
1993/01/12   47                 30      1.3125         1.318      1.0537
1993/01/13   46                 30      1.2815         1.2877     1.0296
1993/01/14   45                 29.88   1.1565         1.1553     0.9473
1993/01/15   44                 29.63   0.7505         0.8225     0.8716
1993/01/19   42                 29.88   1.0935         1.0731     0.8719
1993/01/21   40                 29.88   1.0625         1.0207     0.8347

I track a call option traded on January 4th, 1993 for several trading days. This option has an underlying value of 30.88 and a time-to-maturity of 53 days on January 4th. The strike price for this option contract is 30. The first column is the trading day. The second column is the time-to-maturity. The third column is the value of 1000 times the short term interest rate. The fourth column lists the market prices I observe. The last two columns are the estimators of the prices for the call options. $\hat P$ and $\tilde P$ denote the estimated prices by my method and the AL method respectively.

Many investors are also concerned with out-of-sample forecasting and the use of recent data to estimate the current prices. Using the latest data over 120 trading days, I estimate the SPDs of the interest rate and use them to price the interest rate options for the current trading day. Selecting the second half of 1993 as a forecasting window, I run a one-day-ahead forecast for the third and the fourth quarters of 1993; the third panel in Table 3.7 and Figures B.7-B.9 summarize these results. The three panels in Table 3.7 show that regardless of whether I use my SPD estimators for pricing or forecasting, the MSE using my method is always smaller than the MSE when the AL method is utilized, which implies that my SPD estimators are more reliable.
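The one-day-ahead forecasting scheme amounts to a rolling window over trading days. The sketch below shows only the window bookkeeping, under the assumption that the 120 trading days immediately preceding the forecast day form the estimation sample; the function name and arguments are hypothetical and the estimation step itself is omitted.

```python
def rolling_windows(trading_days, window=120, first_forecast_index=None):
    """Yield (training_days, forecast_day) pairs for one-day-ahead forecasting."""
    start = window if first_forecast_index is None else max(window, first_forecast_index)
    for t in range(start, len(trading_days)):
        # The training sample ends the day before the forecast day.
        yield trading_days[t - window:t], trading_days[t]
```

Each training sample excludes the day being forecast, so the prices for that day are never used to estimate the SPDs that price them.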

In addition, the estimation accuracy increases with the moneyness. One possible explanation is the large probability mass to the right of the strike prices for the in-the-money options, which reduces the effect of the bias of the SPD estimators. Moreover, the prices of the in-the-money options are higher than in the other two cases. The high prices render the MSE in percentage small for the same estimation error. I also find that the MSE of forecasting is smaller than the MSE in pricing call or put options. That is because, when I do forecasting, I run one-day-ahead estimation. Thus, I encounter the boundary points less frequently than I did in the in-sample pricing.

In addition to the European options, two types of interest rate derivatives that are frequently traded are interest rate caps and swaptions. Interest rate caps are offered by financial institutions in the over-the-counter market and set a lower and/or upper limit for the rate to be charged. Swaptions give the holder the right to enter into a certain interest rate swap between fixed rates and floating rates at a certain time in the future. In pricing these derivatives, I can treat the upper (lower) limits in the caps or the fixed rates in the swaptions as the strike prices in the European-style options, and apply the same pricing method without much effort. Due to a shortage of data, I do not check the validity of my method as applied in these areas, but this would be an area to explore in future research.

3.3.4 Dynamics of SPDs

When the estimators of SPDs are used to price interest rate derivatives, what I care about is not one probability density at a specific state. Instead, given the current value St and the time-to-maturity τ, I am interested in estimating the SPDs for the consecutive values of ST. These estimated SPDs give us a rough picture of the shape of f(ST |St, τ), which in turn can serve as the basis for pricing the interest rate derivatives. What fascinates me in particular is how the shape of f(ST |St, τ) evolves over St and τ.

To analyze this, I use the data from the first half of 1993, during which the underlying value of IRX varies in the range of 28 to 31.25. I assume that the underlying value for my experiment is initially 29 and let the time-to-maturity be 42, 84 and 126 days, which correspond to the short-term, medium-term and long-term maturity. For each maturity, I plot the SPDs with ST varying from 25 to 40, and use the bootstrap to get the confidence intervals. 3 I then repeat the whole process, replacing the current interest rate with 30 and 31. The estimators of SPDs are plotted against ST, and the typical shapes are shown in Figure B.10 in the appendix, where the solid line is the estimated SPD and the dashed lines are its 95% confidence interval. To quantify these figures, I calculate the mean, the variance, the skewness and the kurtosis of ST under the measure of the estimated SPDs, and summarize them in Table 3.9.

3 The procedure of bootstrapping the confidence interval is provided in the Appendix.

From the table, I can see that the SPD estimators have similar bell shapes, and that their curves shift to the right as the time-to-maturity increases. This implies the interest rate is more likely to reach a high level in the long run than in the short run, which reflects that investors expect the interest rate will go up in the future.

To explore further the connection between the movements of SPDs and the prices of the interest rate derivatives, consider two cumulative distribution functions (CDFs) of ST in a risk-neutral economy, F1 and F2, and ask whether the first first-order stochastically dominates the second. If F1 first-order stochastically dominates F2, then, when the payoff function increases with ST, I can quickly deduce that the price under F1 is higher than the price under F2. Thus, first-order stochastic dominance provides a convenient route from the SPDs to the prices.

Table 3.9: Statistical properties of SPD estimators

            Mean                              Variance
            τ = 42     τ = 84     τ = 126     τ = 42     τ = 84     τ = 126
St = 29     29.4824    33.8775    34.6798     2.8407     20.8195    12.8508
St = 30     34.0746    34.9832    39.6936     19.6017    16.3531    15.4514
St = 31     32.7858    29.5311    28.6709     5.1890     15.1174    9.5829

            Skewness                          Kurtosis
            τ = 42     τ = 84     τ = 126     τ = 42     τ = 84     τ = 126
St = 29     -0.4280    -1.4112    -0.6509     2.4953     2.1642     1.4616
St = 30     -1.3006    -1.5906    -1.0388     1.9274     2.8227     1.1243
St = 31     -1.8112    1.2911     1.7361      3.8288     1.7591     3.7216

The mean, variance, skewness and kurtosis of the interest rate at the expiry date under the measure of the estimated SPDs. The SPDs are estimated on the condition of St and τ.

Let us first look at two specific CDFs: fixing St = 29, and letting τ take the values of 42 and 84, I use F11 and F21 to denote the CDFs of ST under the conditions (τ = 42, St = 29) and (τ = 84, St = 29) respectively. I restrict my attention to ST located in the interval [25, 37.125], and discretize [25, 37.125] with a grid of width 0.01. The SPD, fˆ, at each grid point is estimated based on the data in the first half of 1993. The CDFs can be obtained by

\[ \hat{F}_{i1}(S_T) = 0.01 \sum_{S \le S_T} \hat{f}_{i1}(S), \tag{3.34} \]

where the sum runs over the grid points S, and the hypotheses to be tested are

\[ H_0 : \eta = \inf_{S_T} (F_{11} - F_{21}) \ge 0, \tag{3.35} \]

\[ H_1 : \eta = \inf_{S_T} (F_{11} - F_{21}) < 0. \tag{3.36} \]

Because it is very hard to derive the distribution analytically, I use the bootstrap method as described in the previous section to derive the 95% percentile confidence interval of η and check whether I can reject the null hypothesis or not.
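The construction of the discretized CDFs and of the statistic η can be sketched as follows. This is only a minimal illustration: the two normal densities are hypothetical stand-ins for the estimated SPDs (the thesis estimates them nonparametrically), and the function names are mine.

```python
import numpy as np

def cdf_from_density(density, step):
    """Riemann-sum CDF on a regular grid: F(S_T) = step * sum_{S <= S_T} f(S)."""
    return step * np.cumsum(density)

def fosd_statistic(cdf_a, cdf_b):
    """eta = inf over the grid of (F_a - F_b); eta >= 0 is the null hypothesis."""
    return float(np.min(cdf_a - cdf_b))

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Grid on [25, 37.125] with width 0.01, as in the text.
step = 0.01
grid = np.arange(25.0, 37.125 + 1e-9, step)

# Hypothetical placeholders for the estimated SPDs (assumption, not thesis output).
f_short = normal_pdf(grid, 30.0, 1.0)   # stands in for the tau = 42 density
f_long = normal_pdf(grid, 32.0, 1.0)    # stands in for the tau = 84 density

F_short = cdf_from_density(f_short, step)
F_long = cdf_from_density(f_long, step)

eta = fosd_statistic(F_short, F_long)          # close to zero: null not contradicted
eta_reversed = fosd_statistic(F_long, F_short)  # clearly negative
```

In the thesis the sampling distribution of η is obtained by the bootstrap; the sketch only shows the point estimate for one pair of densities.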

By the same procedure, I test additional hypotheses on the relations between different pairs of (τ, St). The notations and the hypotheses are summarized in Table 3.10 and Table 3.11.

It turns out that none of the hypotheses can be rejected at the 0.05 level. The typical shape of η is shown in Figure B.12, where the estimates of F11 − F21 are plotted along with their confidence interval. The results support the claim that the CDFs of ST with higher St or longer τ first-order stochastically dominate the ones with lower St or shorter τ. Therefore, for any nondecreasing payoff function, 4 the prices of the interest rate derivatives considered in the experiments increase with St and τ.

4 Interest rate options are one kind of interest rate derivative, with the payoff function max[ST − E, 0]. The payoff functions of other interest rate derivatives, such as interest rate caps and floors, can be quite different from those of interest rate options.

Table 3.10: Notations of CDFs

Conditions             Notations of CDFs
(τ = 42, St = 29)      F11
(τ = 84, St = 29)      F21
(τ = 126, St = 29)     F31
(τ = 42, St = 30)      F12
(τ = 84, St = 30)      F22
(τ = 126, St = 30)     F32
(τ = 42, St = 31)      F13
(τ = 84, St = 31)      F23
(τ = 126, St = 31)     F33

I use Fij to denote the CDFs with corresponding SPDs for nine different current conditions.

Table 3.11: Hypotheses

H0     F11 − F21 ≥ 0     F21 − F31 ≥ 0
       F12 − F22 ≥ 0     F22 − F32 ≥ 0
       F13 − F23 ≥ 0     F23 − F33 ≥ 0
       F11 − F12 ≥ 0     F12 − F13 ≥ 0
       F21 − F22 ≥ 0     F22 − F23 ≥ 0
       F31 − F32 ≥ 0     F32 − F33 ≥ 0

I test twelve hypotheses of the relations between two CDFs.

3.3.5 Economic VAR

In economics and finance, Value-at-Risk (VAR) is a measure of how much the market value of an asset or portfolio of assets is likely to decrease over a certain time period, and represents the maximum amount at risk to be lost from an investment at a particular confidence level. In Aït-Sahalia and Lo (2000), economic VAR (henceforth E-VAR) is proposed to measure the risk exposure of a risky asset or portfolio. Compared with the traditional statistical VAR, E-VAR is calculated based on Arrow-Debreu prices, which are the discrete format of SPDs, and provides a more economic valuation of the uncertainty.

For example, a conventional VAR statistic might suggest a 5% probability of losing 20 million on a 100 million investment over the next two weeks, which seems to be a big risk exposure. But if this 20% loss happens only when other similar investments suffer a loss of 40% or even more, such risk is mild after all.

To calculate the E-VAR of interest rate options at time t, I calculate the price gap of one option contract on two different trading days. Standing at time t, the change of τ is certain, so what I need to figure out is how low the interest rate could be at time t + ∆t at a confidence level of 95%. So I use the estimated SPDs to find a $\tilde{S}_{t+\Delta t}$ such that $P(S_{t+\Delta t} < \tilde{S}_{t+\Delta t} \mid S_t) = 5\%$. Then I can derive the market value of the option by assuming that the interest rate is $\tilde{S}_{t+\Delta t}$ and the time-to-maturity is τ − ∆t. In order to make the comparison possible, I gauge the E-VAR for a one dollar investment.
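The E-VAR calculation described above amounts to a quantile lookup followed by a revaluation. The sketch below is a hedged illustration: the normal density and the pricing function `toy_price` are hypothetical placeholders for the estimated SPD and the estimated option pricing function, and expressing the loss as a fraction of the current price is my reading of "for a one dollar investment".

```python
import numpy as np

def lower_quantile(grid, density, level=0.05):
    """Smallest grid point at which the Riemann-sum CDF reaches `level`."""
    step = grid[1] - grid[0]
    cdf = np.cumsum(density) * step
    cdf = cdf / cdf[-1]  # renormalise (assumption: guards against tail truncation)
    return float(grid[int(np.searchsorted(cdf, level))])

def evar_per_dollar(price_fn, s_now, tau, dt, s_stressed):
    """Loss per dollar invested when the option is revalued at the stressed
    rate s_stressed with time-to-maturity tau - dt."""
    price_now = price_fn(s_now, tau)
    price_stressed = price_fn(s_stressed, tau - dt)
    return (price_now - price_stressed) / price_now

# Hypothetical density of the rate two weeks ahead (placeholder for the SPD).
grid = np.linspace(25.0, 37.0, 1201)
density = np.exp(-0.5 * (grid - 30.0) ** 2) / np.sqrt(2.0 * np.pi)
s_tilde = lower_quantile(grid, density, level=0.05)

# Hypothetical pricing function: intrinsic value plus a linear time value.
def toy_price(s, tau):
    return max(s - 30.0, 0.0) + 0.01 * tau

loss = evar_per_dollar(toy_price, s_now=31.0, tau=50.0, dt=10.0, s_stressed=30.5)
```

With these made-up inputs the current price is 1.5, the stressed price is 0.9, and the loss per dollar is therefore 0.4.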

I must point out that I only consider the direct risk due to the movement of the interest rate. I do not consider any other indirect risk at this stage. The E-VAR tells us how much I could lose at most from a downward movement of the interest rate and the shortening of the time-to-maturity. Adopting this procedure, I calculate the risks of the interest rate call options in the first half of 1993 over the next two weeks, and group the options by their moneyness and the maturity days. The average E-VAR for each group is reported in Table 3.12. The numbers in the brackets are the standard deviations of the E-VAR for the corresponding group. I can see that the numbers in the first row of Table 3.12 are larger than the ones in the second or the third row, which implies that interest rate options that have less time to mature bear more risk than others. In addition, the out-of-the-money options are much riskier than the in-the-money or at-the-money options.

Therefore, if investors hold interest rate options that have a short time to mature or are in the out-of-the-money position, they should be particularly cautious with regard to the risk behind this investment. I also report the E-VAR calculated from the AL model in Table 3.13. The difference between Table 3.13 and Table 3.12 implies that the E-VAR derived by the AL method may lead to inaccurate information on the investment risk.

Table 3.12: E-VAR (I)

Time-to-maturity             out-of-the-money    at-the-money    in-the-money
less than 84 days            0.9186              0.8095          0.4765
                             (0.1197)            (0.0578)        (0.1590)
between 84 and 126 days      0.7393              0.4585          0.2806
                             (0.2543)            (0.3201)        (0.1741)
longer than 126 days         0.3346              0.2312          0.1471
                             (0.4672)            (0.4072)        (0.2924)

I estimate the economic Value-at-Risk for a one dollar investment by my method, and divide the results according to the moneyness and the maturity days of the options. The average E-VAR for each group is reported in this table. The numbers in the brackets are the standard deviations of the E-VAR for the corresponding group.

Table 3.13: E-VAR (II)

Time-to-maturity             out-of-the-money    at-the-money    in-the-money
less than 84 days            0.8160              0.8944          0.6720
                             (0.2774)            (0.0825)        (0.1394)
between 84 and 126 days      0.6557              0.5832          0.5278
                             (0.2717)            (0.3181)        (0.2436)
longer than 126 days         0.2185              0.2750          0.2282
                             (0.4044)            (0.4429)        (0.4020)

I estimate the economic Value-at-Risk for a one dollar investment by the semi-parametric method, and divide the results according to the moneyness and the maturity days of the options. The average E-VAR for each group is reported in this table. The numbers in the brackets are the standard deviations of the E-VAR for the corresponding group.

3.3.6 In-Sample and Out-of-Sample Option Price Forecasts

When I am only interested in the prices of the call options, I do not have to estimate the SPDs. Instead, I can estimate the prices directly by local polynomial regression after I obtain the estimates of the projection directions. The accuracy of this kind of estimation is another important aspect to consider.

Aït-Sahalia and Lo (1998) conduct the first moment estimation as well. However, since my method does not impose any assumption on the motion of the interest rate, it is expected to work better than theirs. Table 3.14 reports the in-sample and out-of-sample MSE in percentage for the two methods. The vertical time column shows the time period upon which the estimation is based. The horizontal time periods are the prediction windows. The diagonal of the table contains the in-sample MSE in percentage, and the entries above the diagonal are the out-of-sample MSE in percentage.

Table 3.14 shows my method delivers much smaller in-sample and out-of-sample estimation error with the exception of the long term prediction for the last quarters. My prediction estimation error is more stable than the AL estimators, which corroborates the robustness of my method.

3.4 Conclusion

In this chapter, I propose to combine projection pursuit regression with local polynomial regression to estimate state price densities for interest rate options. This method does not rely on any assumption on the motion of the interest rate, and alleviates the problem of high dimensional estimation in nonparametric methods. Compared with other alternative estimators, the estimated SPDs in this chapter are more reliable and robust. Moreover, this method is not restricted to the interest rate markets and can be adjusted to estimate the partial derivatives in other fields. For example, I can estimate the delta, gamma, theta, rho and vega of the options for the hedging management.

Table 3.14: MSE (in percentage) for estimated call option prices

                 Jan-Mar    Apr-Jun    Jun-Sep    Oct-Dec
PPR+LPR
  Jan-Mar        0.0142     0.0716     0.0196     0.0635
  Apr-Jun                   0.0095     0.0090     0.0738
  Jun-Sep                              0.0049     0.0475
  Oct-Dec                                         0.0013
Semi-parametric model
  Jan-Mar        0.0159     0.1282     0.0257     0.0814
  Apr-Jun                   0.0102     0.0458     0.096
  Jun-Sep                              0.0086     0.0630
  Oct-Dec                                         0.0193

This table reports the in-sample and the out-of-sample MSE of the interest rate call option prices estimated by my method and by the semi-parametric model in Aït-Sahalia and Lo (1998). The vertical time column shows the time period on which the estimation is based. The horizontal time periods are the prediction windows. The diagonal of the table contains the in-sample MSE in percentage, and the entries above the diagonal are the out-of-sample ones.

Chapter 4

Delta Hedging on Interest Rate Options

This chapter discusses the delta hedging strategy for interest rate options. Interest rate options are financial derivatives written on interest rates. They have been among the most liquid interest rate derivatives in the world. The most popular interest rate options offered by financial institutions are interest rate caps and interest rate floors in the over-the-counter markets. They provide insurance against interest rates rising above or falling below certain levels. The buyer of interest rate caps (floors) receives money at the end of each period in which an interest rate is higher (lower) than the agreed strike price. Another important type of interest rate option, traded on the exchange markets, is the interest rate option of the Chicago Board Options Exchange. These are European-style, cash-settled options on the spot yield of U.S. Treasury securities. The global market for interest rate options was notionally valued by the Bank for International Settlements at $52,275 billion as of June 2007. The huge investment in interest rate options makes hedging investment risks in this market important.

4.1 Delta Hedging Strategy

4.1.1 Delta Hedging Stocks

The delta of a financial derivative is defined to be the first-order partial derivative of the price of the financial instrument with respect to the underlying value. Investors try to construct a portfolio for which the net value of delta is equal to zero, so that the price of this portfolio is not sensitive to small variations in the underlying value. Thus, they can reduce their investment risk. The portfolio usually consists of the financial derivative to be hedged and the underlying risky asset traded in the market. For simplicity, I call the financial derivatives to be hedged the targets, and the underlying assets the hedging tools. To accomplish delta hedging, investors hold the opposite position in the hedging tool in an amount equal to the ratio of the delta values of the two financial instruments in the portfolio. The numerator of this ratio is the delta of the target, and the denominator is the delta of the hedging tool. For a risky asset, the delta is easy to obtain, and is always equal to 1. The delta of the financial derivative, however, varies over time. So the hedging ratio needs to be rebalanced frequently.

To explain how delta hedging works, let me introduce some notation. I use Pt and Ct to denote the prices of the portfolio and the target respectively at time t, ∆t to denote the delta of the target, and St to denote the underlying value. According to the definition of the delta,

\[ \Delta_t = \frac{\partial C_t}{\partial S_t}. \tag{4.1} \]

The portfolio is constructed by holding one share of the target and shorting ∆t shares of the hedging tool. So the price of the portfolio for hedging purposes is

\[ P_t = C_t - \Delta_t S_t. \tag{4.2} \]

The net delta value of this portfolio is

\[ \frac{\partial P_t}{\partial S_t} = \frac{\partial C_t}{\partial S_t} - \Delta_t \frac{\partial S_t}{\partial S_t} = 0. \]

That is, the first-order partial derivative of the portfolio price with respect to the underlying value is zero.

In the next period, the price of the portfolio will become Pt+1, which can be approximated with the equation

\[
\begin{aligned}
P_{t+1} &= P_t + \frac{\partial P_t}{\partial S_t}(S_{t+1} - S_t) + 0.5\,\frac{\partial^2 P_t}{\partial S_t^2}(S_{t+1} - S_t)^2 + o((S_{t+1} - S_t)^2) \\
        &= P_t + 0.5\,\frac{\partial^2 P_t}{\partial S_t^2}(S_{t+1} - S_t)^2 + o((S_{t+1} - S_t)^2).
\end{aligned}
\]

If St+1 does not differ much from St, Pt+1 should be very close to Pt. Thus, investors can reduce the investment risk of a small fluctuation in St by delta hedging.
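A small numerical check may make the second-order argument concrete. The sketch below uses the Black-Scholes call formula purely as a convenient stand-in for the pricing function Ct (an assumption for illustration, not the model estimated in this thesis); all parameter values are made up.

```python
import math

def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _d1(S, K, r, sigma, tau):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def call_price(S, K, r, sigma, tau):
    d1 = _d1(S, K, r, sigma, tau)
    d2 = d1 - sigma * math.sqrt(tau)
    return S * _norm_cdf(d1) - K * math.exp(-r * tau) * _norm_cdf(d2)

def call_delta(S, K, r, sigma, tau):
    return _norm_cdf(_d1(S, K, r, sigma, tau))

def call_gamma(S, K, r, sigma, tau):
    d1 = _d1(S, K, r, sigma, tau)
    pdf = math.exp(-0.5 * d1 ** 2) / math.sqrt(2.0 * math.pi)
    return pdf / (S * sigma * math.sqrt(tau))

def hedged_change(S, dS, K, r, sigma, tau):
    """Change in P = C - delta * S when S moves by dS, with delta frozen at S.
    Returns (hedged change, unhedged change of the option alone)."""
    delta = call_delta(S, K, r, sigma, tau)
    unhedged = call_price(S + dS, K, r, sigma, tau) - call_price(S, K, r, sigma, tau)
    return unhedged - delta * dS, unhedged

# Illustrative (hypothetical) inputs.
hedged, unhedged = hedged_change(S=30.0, dS=0.1, K=30.0, r=0.03, sigma=0.2, tau=0.25)
second_order = 0.5 * call_gamma(30.0, 30.0, 0.03, 0.2, 0.25) * 0.1 ** 2
```

The hedged change is an order of magnitude smaller than the unhedged one and is close to the second-order term 0.5 Γ (St+1 − St)² from the expansion.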

4.1.2 Delta Hedging Interest Rate Options

The delta hedging method for stocks cannot be applied to interest rate options directly. The main reason is that interest rates are not traded in the market. Therefore, hedging interest rate options requires the use of a different hedging tool. The candidates considered here are interest rate options.

If I choose interest rate options as the hedging tool, I will use one interest rate option to hedge another interest rate option. Since the delta values of the interest rate options are not constant, in order to obtain the hedging ratio, I have to use estimates for both the numerator and the denominator for the ratio.

I use $C_t^c$ and $\Delta_t^c$ to denote the price of the option to be hedged and its delta value at time t, and use $C_t^h$ and $\Delta_t^h$ to denote the corresponding quantities for the hedging tool. Then the portfolio I need is

\[ P_t = C_t^c - \frac{\Delta_t^c}{\Delta_t^h}\, C_t^h. \tag{4.3} \]

The advantage of this method is that these two options can be in two different positions, which can alleviate the funding requirement for the hedging. The gamma of the portfolio in delta hedging is

\[ \Gamma_t = \Gamma_t^c - \frac{\Delta_t^c}{\Delta_t^h}\, \Gamma_t^h. \]

This gamma value depends on both options in the portfolio. When $\Delta_t^c$ and $\Delta_t^h$ are comparable, the gamma can be reduced.

However, I have to estimate the delta accurately, since neither $\Delta_t^c$ nor $\Delta_t^h$ can be observed. The next section of this chapter uses a model that can estimate delta more accurately than the models in the existing literature. Based on that, I can further discuss how to perform delta hedging for interest rate options.
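As a short check of (4.3), and assuming the hedge ratio is held fixed between rebalancing dates, differentiating the portfolio once and twice with respect to the underlying value gives the delta-neutrality condition and the gamma expression above:

```latex
\begin{aligned}
\frac{\partial P_t}{\partial S_t}
  &= \Delta_t^c - \frac{\Delta_t^c}{\Delta_t^h}\,\Delta_t^h = 0, \\
\frac{\partial^2 P_t}{\partial S_t^2}
  &= \Gamma_t^c - \frac{\Delta_t^c}{\Delta_t^h}\,\Gamma_t^h = \Gamma_t .
\end{aligned}
```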

4.2 Delta Estimation

4.2.1 My Model

I use the same nonparametric estimation model as in my previous chapter, which combines projection pursuit regression (PPR) and local polynomial regression (LPR).

Here I describe the model briefly. For more details, please refer to Chapter 3.

Since the state-price densities involve the estimation of second-order partial derivatives, the estimation of delta, a first-order partial derivative, is relatively easy. It is commonly known that there are five major factors affecting option prices: the current underlying value St, the strike price Et, the time to maturity τt, the discount rate γt and the volatility σt. Usually, the discount factor, $e^{-\gamma_t \tau_t}$, is relatively stable, and the volatility is assumed to be an unknown function of St. Therefore, I can exclude these two factors from the regressors.

Here I also incorporate the quadratic form of the factors:

\[ C_t \sim \sum_j g_j\left(a_{1j}S_t + a_{2j}E_t + a_{3j}\tau_t + a_{4j}S_t^2 + a_{5j}E_t^2 + a_{6j}\tau_t^2 + a_{7j}S_tE_t + a_{8j}E_t\tau_t + a_{9j}S_t\tau_t\right). \tag{4.4} \]

After the estimation of $\{\hat{g}_j, \hat{a}_j\}$, the delta estimators can be obtained with the equation

\[ \frac{\partial \hat{C}_t}{\partial S_t} = \sum_j \left[\hat{g}_j^{(1)}(\cdot)\,(\hat{a}_{1j} + 2\hat{a}_{4j}S_t + \hat{a}_{7j}E_t + \hat{a}_{9j}\tau_t)\right]. \tag{4.5} \]
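The chain rule behind (4.5) can be illustrated with a small sketch. The ridge functions, their derivatives and the coefficient matrix `A` below are hypothetical placeholders (in the thesis they are estimated by PPR and LPR); the point is only that differentiating the quadratic index of (4.4) with respect to St yields the factor a1j + 2 a4j St + a7j Et + a9j τt.

```python
import numpy as np

def index_features(S, E, tau):
    """The nine regressors of (4.4): linear, squared and cross terms."""
    return np.array([S, E, tau, S ** 2, E ** 2, tau ** 2, S * E, E * tau, S * tau])

def ppr_price(S, E, tau, A, ridge_fns):
    """sum_j g_j(a_j . x); A holds one row of nine coefficients per ridge function."""
    z = A @ index_features(S, E, tau)
    return sum(g(zj) for g, zj in zip(ridge_fns, z))

def ppr_delta(S, E, tau, A, ridge_derivs):
    """Chain rule: sum_j g_j'(z_j) * (a1j + 2 a4j S + a7j E + a9j tau)."""
    z = A @ index_features(S, E, tau)
    dz_dS = A[:, 0] + 2.0 * A[:, 3] * S + A[:, 6] * E + A[:, 8] * tau
    return sum(dg(zj) * d for dg, zj, d in zip(ridge_derivs, z, dz_dS))

# Hypothetical smooth ridge functions and their first derivatives.
ridge_fns = [np.tanh, lambda z: np.log1p(np.exp(z))]
ridge_derivs = [lambda z: 1.0 - np.tanh(z) ** 2, lambda z: 1.0 / (1.0 + np.exp(-z))]

# Hypothetical projection coefficients (small so the indices stay moderate).
A = np.random.default_rng(0).normal(scale=1e-3, size=(2, 9))

S, E, tau = 30.0, 30.0, 0.2
delta = ppr_delta(S, E, tau, A, ridge_derivs)
```

A central finite difference of `ppr_price` in `S` reproduces `delta`, which is a convenient sanity check when implementing the derivative of a fitted model.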

4.2.2 Alternative Models

For purposes of comparison, I use another two models to estimate the delta of interest rate options: the PPR model mentioned in Hutchinson (1996) and the Nadaraya-Watson kernel estimator. Hutchinson (1996) applies PPR to option pricing. The basic idea in this model is the same as the PPR setup in my model, except that they still use a local constant, instead of a local linear, estimator for the option prices. Furthermore, they obtain the estimates for the delta by taking the partial derivative of the kernel function. So when I put my model together with the model in Hutchinson (1996), I can clearly show the importance of LPR. Another difference is that Hutchinson (1996) only uses the moneyness and the time to maturity as regressors. This model adopts the assumption that the option price is homogeneous of degree one in the underlying value and the strike price. This assumption can reduce the dimension of the nonparametric estimation, but, as pointed out by Hutchinson (1996), it is not justified yet. Using the same notation as in my model, I can write their model as

\[ C_t \sim \sum_j g_j\left(a_{1j}(S_t/E_t) + a_{2j}\tau_t\right), \tag{4.6} \]

and

\[ \frac{\partial \hat{C}_t}{\partial S_t} = \sum_j \left[\hat{g}_j^{(1)}(\cdot)\,(\hat{a}_{1j}/E_t)\right]. \tag{4.7} \]
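For readers unfamiliar with the local constant approach, the following one-dimensional sketch shows a Nadaraya-Watson fit with a Gaussian kernel and the derivative obtained by differentiating the kernel weights. It is a simplified illustration under my own assumptions (single regressor, Gaussian kernel, fixed bandwidth), not the multivariate specification used in the comparison.

```python
import numpy as np

def nw_fit_and_slope(x0, x, y, h):
    """Nadaraya-Watson estimate at x0 and its derivative with respect to x0."""
    u = (x0 - x) / h
    k = np.exp(-0.5 * u ** 2)   # Gaussian kernel weights (constant cancels)
    dk = -(u / h) * k           # derivative of each weight with respect to x0
    num, den = np.sum(k * y), np.sum(k)
    dnum, dden = np.sum(dk * y), np.sum(dk)
    fit = num / den
    slope = (dnum * den - num * dden) / den ** 2   # quotient rule
    return float(fit), float(slope)

# Toy data: y = x^2 on an even grid, so the true slope at x0 = 1 is 2.
x = np.linspace(0.0, 2.0, 201)
y = x ** 2
fit, slope = nw_fit_and_slope(1.0, x, y, h=0.1)
```

On this noiseless example the slope is recovered well, while the level carries the usual smoothing bias of order h²; with noisy, multivariate data the derivative of a local constant fit is typically much less stable.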

4.3 Measuring Hedging Performance

The purpose of estimating the delta is to hedge the risk of the investment in interest rate options. However, the hedging cannot be done perfectly because the estimates of the delta values contain estimation errors. In addition, it is infeasible to hedge options continuously. So the second-best choice is to look for a method or model which can make the hedging errors as low as possible. Thus, I need a measure to gauge the hedging errors for different hedging tools.

I assume the investor is in the long position of one target interest rate option and in the short position of another hedging bond or option, and that this investment is financed by borrowing or lending in the risk-free market. After the initial investment, the investor rebalances the portfolio according to the hedging ratio in the subsequent periods. Again, the rebalancing is financed in the risk-free market. That means initially the investor has nothing, and, if the hedging is perfect and there is no arbitrage in the market, the investor should get zero profit at the expiration day of the target option. What I am curious about is how far this portfolio is from zero profit at the expiration day of the target option.

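Let me use $V_t^c$ to denote the market value of the target option at time t. Similarly, $V_t^h$ and $V_t^f$ denote the market values of the hedging option and the risk-free bond at time t respectively. At time 0,

\[ V_0^f = -(V_0^c + V_0^h), \tag{4.8} \]

and the value of the portfolio, $V_0$, is

\[ V_0 = V_0^c + V_0^h + V_0^f, \tag{4.9} \]

and $V_t^h$ and $V_t^f$ are balanced daily with the hedging ratio. At time T, I want to know

\[ \eta = e^{-rT}\,|V_T|. \tag{4.10} \]

$C_0$, $C_0^h$, $\Delta_0$ and $\Delta_0^h$ denote the price of the option to be hedged, the price of the hedging tool, and their delta values respectively. If I use one interest rate option to hedge another interest rate option, then

\[ V_0^c = C_0, \tag{4.11} \]

\[ V_0^h = -\frac{\Delta_0}{\Delta_0^h}\, C_0^h, \tag{4.12} \]

\[ V_0^f = -(V_0^c + V_0^h). \tag{4.13} \]

Thus,

\[ V_0 = V_0^c + V_0^h + V_0^f = 0. \tag{4.14} \]

At any time t, the value of the portfolio is given by

\[ V_t = V_t^c + V_t^h + V_t^f, \tag{4.15} \]

where, keeping the short position in the hedging option as in (4.12),

\[ V_t^c = C_t, \tag{4.16} \]

\[ V_t^h = -\frac{\Delta_t}{\Delta_t^h}\, C_t^h, \tag{4.17} \]

\[ V_t^f = e^{r\tau}\, V_{t-\tau}^f + \left( \frac{\Delta_t}{\Delta_t^h} - \frac{\Delta_{t-\tau}}{\Delta_{t-\tau}^h} \right) C_t^h. \tag{4.18} \]

Here τ is the time between two rebalancing dates, and the last term in (4.18) is the cash received from (or paid for) adjusting the short position in the hedging option, so that the portfolio is self-financing.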
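The bookkeeping behind the hedging error η can be sketched as below. The sketch assumes the sign convention in which the hedging option is held short in the amount of the delta ratio, so that the bond account receives the cash generated when the short position is increased; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def hedging_error(c_target, c_hedge, delta_target, delta_hedge, r, dt):
    """Discounted absolute terminal value of a self-financing hedged portfolio.

    All inputs are sequences over the rebalancing dates 0..T; r is the
    continuously compounded risk-free rate and dt the time between dates.
    """
    c_target = np.asarray(c_target, dtype=float)
    c_hedge = np.asarray(c_hedge, dtype=float)
    ratio = np.asarray(delta_target, dtype=float) / np.asarray(delta_hedge, dtype=float)

    v_hedge = -ratio * c_hedge                 # short position in the hedging option
    v_bond = -(c_target[0] + v_hedge[0])       # start from zero wealth
    for t in range(1, len(c_target)):
        # The bond accrues interest, then absorbs the cash from rebalancing.
        v_bond = np.exp(r * dt) * v_bond + (ratio[t] - ratio[t - 1]) * c_hedge[t]

    v_T = c_target[-1] + v_hedge[-1] + v_bond
    horizon = dt * (len(c_target) - 1)
    return float(np.exp(-r * horizon) * abs(v_T))

# Toy example (made-up numbers): the target gains 0.3, the short hedge loses 0.1.
err = hedging_error(c_target=[1.0, 1.3], c_hedge=[0.5, 0.6],
                    delta_target=[0.5, 0.6], delta_hedge=[0.5, 0.5],
                    r=0.0, dt=1.0 / 252)
```

With a zero interest rate, the terminal value in the toy example is just the gain on the target minus the loss on the short hedge, 0.3 − 0.1 = 0.2, which is a simple way to confirm that the recursion is self-financing.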

4.4 Simulations

To run my Monte Carlo simulations, I use three different models to simulate the option contracts. The models I am going to use are the CIR (Cox, Ingersoll, Ross) model, the Vasicek model, and the Black-Scholes model. I assume they describe the motions of interest rates in the risk-neutral economy properly with some appropriate parameters. Based on these models, I can simulate the theoretical prices and the delta values for the options. After that, I employ three different methods to estimate the deltas, and compare the estimators with the theoretical values. The three methods for the estimation are my model as described in this chapter, the PPR in Hutchinson (1994) and the Nadaraya-Watson kernel model.

4.4.1 Stochastic Process for Underlying Values

I first use the CIR model to describe the motion of the interest rates. In the CIR model, the dynamics of the underlying value can be expressed by

\[ dS_t = \alpha_0(\alpha_1 - S_t)\,dt + \sigma S_t^{1/2}\,dW_t, \tag{4.19} \]

where {Wt} is a standard one-dimensional Brownian motion and α0 = 0.21459, α1 = 0.032 and σ = 0.0075. The CIR model captures the basic mean-reversion property of interest rates. The transition density is determined by the fact that, given St = S0, $2cS_{t+\Delta}$ has a noncentral $\chi^2$ distribution with degrees of freedom $2(2\alpha_0\alpha_1/\sigma^2 - 1) + 2$ and noncentrality parameter $2cS_0\exp(-\alpha_0\Delta)$, where $c = 2\alpha_0/[\sigma^2(1 - \exp(-\alpha_0\Delta))]$ and ∆ is the time interval.

The Vasicek model is very similar to the CIR model except that it does not allow for heteroskedasticity of the interest rates. It can be expressed by the following:

\[ dS_t = \alpha_0(\alpha_1 - S_t)\,dt + \sigma\,dW_t, \tag{4.20} \]

where I choose α0 = 0.21459, α1 = 0.032 and σ = 0.001. If ρ is defined to be exp(−α0τ), the transition density is a normal distribution with a mean of α1 + (St − α1)ρ and a variance of $\sigma^2(1 - \rho^2)/(2\alpha_0)$.

The Black-Scholes model is different from the previous two models. It assumes that the stochastic process of the underlying value is

\[ dS_t = \alpha_0 S_t\,dt + \sigma S_t\,dW_t, \tag{4.21} \]

where σ is chosen to be 0.05. The Black-Scholes model does not capture the mean-reversion property, as it is not designed specifically for interest rates. But, due to its popularity, I still consider this model here.
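Because all three processes have known transition laws, paths can be drawn exactly rather than by discretizing the stochastic differential equations. The sketch below is one way to do this with NumPy; it uses the standard CIR transition with noncentrality parameter 2cS0 exp(−α0∆), and it assumes that rates are expressed in decimals and time in years with ∆ = 1/252, neither of which is stated explicitly in the text.

```python
import numpy as np

def step_cir(S, dt, a0, a1, sigma, rng):
    """Exact CIR transition: 2c * S_{t+dt} is noncentral chi-square."""
    c = 2.0 * a0 / (sigma ** 2 * (1.0 - np.exp(-a0 * dt)))
    df = 4.0 * a0 * a1 / sigma ** 2           # equals 2(2 a0 a1 / sigma^2 - 1) + 2
    nonc = 2.0 * c * S * np.exp(-a0 * dt)
    return rng.noncentral_chisquare(df, nonc) / (2.0 * c)

def step_vasicek(S, dt, a0, a1, sigma, rng):
    """Exact Vasicek transition: normal with mean-reverting mean."""
    rho = np.exp(-a0 * dt)
    mean = a1 + (S - a1) * rho
    sd = np.sqrt(sigma ** 2 * (1.0 - rho ** 2) / (2.0 * a0))
    return rng.normal(mean, sd)

def step_black_scholes(S, dt, a0, sigma, rng):
    """Exact geometric Brownian motion transition (lognormal)."""
    z = rng.standard_normal(np.shape(S))
    return S * np.exp((a0 - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z)

def simulate_path(step, S0, n_days, dt, rng, **params):
    """Daily path of length n_days + 1 generated with one of the step functions."""
    path = np.empty(n_days + 1)
    path[0] = S0
    for t in range(n_days):
        path[t + 1] = step(path[t], dt, rng=rng, **params)
    return path

rng = np.random.default_rng(0)
dt = 1.0 / 252                       # assumption: one trading day, time in years
S0 = 0.03225                         # assumption: rate in decimals
cir_draws = step_cir(np.full(20000, S0), dt, 0.21459, 0.032, 0.0075, rng)
vas_draws = step_vasicek(np.full(20000, S0), dt, 0.21459, 0.032, 0.001, rng)
path = simulate_path(step_vasicek, S0, 253, dt, rng, a0=0.21459, a1=0.032, sigma=0.001)
```

Under both mean-reverting models the conditional mean of the next value is α1 + (S0 − α1) exp(−α0∆), which gives a quick check on the simulated draws.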

4.4.2 Data Simulation

I use the stochastic processes to simulate the data. Although the processes are quite different, the basic ideas for the simulation are the same. I set the initial underlying value to 32.25, and simulate the underlying values for 253 days according to one of the stochastic processes. Then, for each simulated underlying value, I set up four maturity dates for the option contracts, which are the 35th day, the 100th day, the 165th day and the 235th day, and set up three strike prices, 27.125, 30 and 32.125. I do not follow the conventions of the Chicago Board Options Exchange rigorously, but I can still capture the most salient features of interest rate options this way. In addition, I get a number of data points similar to what I am going to use in the empirical analysis.

After simulating the option contracts, I use the transition density function to derive the option prices. Given the current underlying value, the strike price and the expiration day, I simulate the underlying value at the expiration day 5000 times and regard the mean value of the payoffs of the option as the theoretical price. Furthermore, I obtain the theoretical value of the delta for each option contract.
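The pricing step can be sketched as follows for the Vasicek case. The mean payoff over 5000 terminal draws follows the description above; the way the delta is obtained here, a central finite difference with common random numbers, is my own assumption, since the text does not say how the theoretical delta is computed. Rates and strikes are expressed in decimals for this illustration.

```python
import math
import numpy as np

def vasicek_terminal(S0, tau, n, rng, a0=0.21459, a1=0.032, sigma=0.001):
    """n draws of S_T from the Vasicek normal transition density."""
    rho = math.exp(-a0 * tau)
    sd = math.sqrt(sigma ** 2 * (1.0 - rho ** 2) / (2.0 * a0))
    return rng.normal(a1 + (S0 - a1) * rho, sd, size=n)

def mc_call_price(S0, strike, tau, seed, n_draws=5000):
    """Mean simulated payoff max(S_T - E, 0), used as the theoretical price."""
    rng = np.random.default_rng(seed)
    S_T = vasicek_terminal(S0, tau, n_draws, rng)
    return float(np.mean(np.maximum(S_T - strike, 0.0)))

def mc_call_delta(S0, strike, tau, seed, bump=1e-4, n_draws=5000):
    """Central difference in S0; reusing the seed gives both legs the same draws."""
    up = mc_call_price(S0 + bump, strike, tau, seed, n_draws)
    down = mc_call_price(S0 - bump, strike, tau, seed, n_draws)
    return (up - down) / (2.0 * bump)

# Hypothetical contract: rate 3.225%, strike 3.0%, 100 trading days to expiry.
price = mc_call_price(0.03225, 0.030, 100 / 252, seed=0)
delta = mc_call_delta(0.03225, 0.030, 100 / 252, seed=0)
```

For this deep in-the-money example the price is close to the conditional mean of ST minus the strike, and the delta is close to ρ = exp(−α0τ), the sensitivity of the conditional mean to the current rate.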

4.4.3 Estimation and Hedging Performance

After simulating the data, I use three methods to estimate the delta values. The Nadaraya-Watson kernel estimator is one of the most popular nonparametric estimators. But due to the curse of dimensionality, it performs well in the multivariate case only with a large sample size. The PPR model in Hutchinson (1994) has a similar setup to the PPR part in my model, except that it derives the delta by differentiating the kernel function directly. By comparing my model with these two models, I can highlight the roles of PPR and LPR in estimating the partial derivatives of a multivariate function.

I simulate the data 1000 times. Each time, I use the three estimation models to estimate the delta values, and collect the ratios of the mean squared estimation errors over the mean squared true values. Table 4.1 reports the mean and the standard deviation of the ratios delivered by the three methods for three different underlying processes of the interest rates. It shows that my method produces much smaller estimation errors than the other two, no matter what kind of stochastic process is used for the simulation.

Next, I use the estimated delta to conduct the delta hedging on the option contracts. In my simulated data, there are four maturity dates and three strike prices. In order to alleviate the boundary problem of nonparametric estimation, I focus on the options which have a strike price of 30 and expire on the 100th day, and I use other options which have the same strike price but expire on the 165th day as the hedging tools.

Table 4.1: Ratios between Sum of Squared Estimation Errors and Sum of Squared True Values of Delta

                         PPR+LPR    Hutchinson    Nadaraya-Watson
CIR model
  mean                   0.0317     0.1312        0.2683
  standard deviation     0.0465     0.0153        0.0505
Vasicek model
  mean                   0.0277     0.1110        0.2687
  standard deviation     0.0171     0.0136        0.0505
Black-Scholes model
  mean                   0.0144     0.1756        0.2630
  standard deviation     0.0306     0.0398        0.0499

This table summarizes the means and the standard deviations of the ratios between the mean squared estimation errors and the mean squared true values in 1000 simulations. The CIR model, the Vasicek model and the Black-Scholes model are employed here to simulate the data. My model in this chapter, the PPR model in Hutchinson (1994) and the Nadaraya-Watson kernel model are used to estimate the delta values.

I measure the hedging performance as described in the previous section, and repeat the simulation 1000 times. Table 4.2 summarizes the mean square hedging error (MSHE) over the 1000 simulations for the different simulation models and the different estimation methods. In addition, in order to reveal the effect of discrete hedging, I also show the MSHE obtained by using the actual theoretical delta values. It turns out that the MSHE of my method is smaller than that delivered by the Hutchinson model or the Nadaraya-Watson model, and closer to the MSHE obtained when the true delta values are plugged into the hedging.

Table 4.2: Mean Square Hedging Error: with Interest Rate Options

Hedging Window     True Value    PPR+LPR    Hutchinson    Nadaraya-Watson
CIR model
  21 days          0.0008        0.0273     0.0798        0.0401
  42 days          0.0045        0.0468     0.1654        0.1100
  63 days          0.0139        0.0726     0.3284        0.1664
  99 days          0.0360        0.1451     0.7224        0.2335
Vasicek model
  21 days          0.0008        0.0114     0.0680        0.0315
  42 days          0.0044        0.0320     0.1517        0.0896
  63 days          0.0137        0.0563     0.3062        0.1354
  99 days          0.0426        0.1235     0.7722        0.2053
Black-Scholes model
  21 days          0.0001        0.0035     0.0921        0.0623
  42 days          0.0006        0.0202     0.3695        0.1575
  63 days          0.0018        0.0342     0.8052        0.2271
  99 days          0.0058        0.0677     1.8879        0.2807

This table reports the mean square hedging error for different simulation models and estimation methods. ‘True Value’ here means the actual theoretical values of the delta in the simulated data.

4.5 Empirical Analysis

In this section I use the data on short-term interest rate options (IRX) traded on the CBOE to analyze how to do delta hedging in practice. The sample period is from January 4, 1993 to December 31, 1993.

4.5.1 Data

Interest rate options are European-style, cash-settled options on the spot yield of U.S. Treasury securities. The options on the short-term rate (ticker symbol IRX) are based on the annualized discount rate on the most recently auctioned 13-week Treasury bill. The 13-week T-bill yield is the recognized benchmark for short-term interest rates. These bills are issued by the U.S. Treasury in auctions conducted weekly by the Federal Reserve Bank. Underlying values for the option contracts are 1000 times the underlying interest rates. For example, an annualized discount rate of 5.5% on the newly auctioned 13-week Treasury bills would place the underlying value for the option on short-term rates (IRX) at 55.00.

To eliminate data errors and ensure that closing option prices are representative of market conditions at the end of the trading day, I drop from my sample the observations with a time-to-maturity of less than six days and a price lower than 0.125.

4.5.2 Hedging Interest Rate Options

I use PPR-LPR to estimate the delta of the interest rate options in 1993, and group the results according to the moneyness and the maturity dates. In Table 4.3, I report the mean and the standard deviation of the delta values in each group. It seems the interest rate options which take a short amount of time or a long amount of time to mature tend to have higher delta values than the options which take a medium amount of time to mature. This reflects the fact that investors expect the motions of the interest rates will show mean reversion properties.

I then conduct the delta hedging for every option in the data. One option contract can have a different underlying value and a different time-to-maturity on each trading day before the expiration. I treat every trading day as a starting point and hedge the option contract for a fixed time interval. That means the hedging result for the same option contract depends on when the hedging begins and how long it lasts.

I also use different hedging tools, and try to find out which tool is the best for hedging the interest rate options. As shown in Table 4.4, the target option has the strike price of E and the time-to-maturity of τ, and the hedging option has the strike price of Eh and the time-to-maturity of τh. Eh and E can be the same, but when they are not, I

Table 4.3: Delta of Interest Rate Options

               Out-of-the-money   At-the-money   In-the-money
Short Term     0.3657             0.4526         0.4668
               (0.2765)           (0.2859)       (0.3576)
Medium Term    0.3107             0.1882         0.2804
               (0.2368)           (0.2484)       (0.3247)
Long Term      0.8063             0.9816         0.9992
               (0.2566)           (0.0452)       (0.0091)

This table reports the statistical properties for the estimated delta values. ‘Short Term’ means a time-to-maturity of less than 84 days, ‘Long Term’ means a time-to-maturity longer than 126 days, and ‘Medium Term’ indicates a time-to-maturity between these two.

choose Eh to be as close to E as possible. I conduct several experiments. First, for each option to be hedged, the hedging tool is the option which is traded on the same day as the target and has the longest time to maturity. The results are in the first panel of Table 4.4. In this panel, different strike prices for the hedging tool have also been explored. Then, I try the option in which τh = τ, as shown in the second panel of Table 4.4. All the experiments are repeated for three different delta estimators. Table 4.5 and Table 4.6 report the hedging performance for 42 days and 63 days respectively.

It is clear that, among all the hedging experiments reported in the tables, the hedging tool which has the same time-to-maturity as the target option and a smaller strike price delivers the smallest hedging errors when my method is employed to estimate the delta values. Unlike in the simulations, the MSHE does not increase with the hedging horizon, since in the actual data some options have less than 42 days to mature and are excluded from further calculations. In other words, the data points for different hedging periods are different.

4.6 Conclusion

This chapter examines delta hedging on interest rate options. I use three different models to estimate the delta values and two different hedging tools to conduct the hedging. I compare the hedging performances in the simulations and the empirical data. I find the model I propose in Chapter 3 provides more accurate estimates for the delta values of the interest rate options, and it is more effective to use the interest rate options as the hedging tool than other interest rate derivatives.

Table 4.4: Mean Square Delta Hedging Error for 21 days in 1993

Hedging with Interest Rate Options

                   PPR+LPR   Hutchinson   Nadaraya-Watson
longest τh
  Eh < E           0.3021    0.4489       1.2383
  Eh = E           0.2395    0.3524       1.0511
  Eh > E           0.3169    0.9055       1.2835
τh fixed at τ
  Eh < E           0.0859    0.2603       0.4103
  Eh > E           0.2610    0.6165       0.3916

This table summarizes the hedging errors for the interest rate options for 21 days by different hedging tools. The hedging tools employed here are the interest rate options and the treasury bills. Interest rate options with different times-to-maturity and strike prices are tried in this table. τh and Eh denote the time-to-maturity and the strike price of the hedging options. τ and E denote the time-to-maturity and the strike price of the target options.

Table 4.5: Mean Square Delta Hedging Error for 42 days in 1993

Hedging with Interest Rate Options

                   PPR+LPR   Hutchinson   Nadaraya-Watson
longest τh
  Eh < E           0.2126    0.5183       1.4570
  Eh = E           0.1522    0.2251       0.9686
  Eh > E           0.2896    1.2172       1.3759
τh fixed at τ
  Eh < E           0.1107    0.5276       0.6979
  Eh > E           0.3359    1.5455       0.8738

This table summarizes the hedging errors for the interest rate options for 42 days by different hedging tools. The hedging tools employed here are the interest rate options and the treasury bills. Interest rate options with different times-to-maturity and strike prices are tried in this table. τh and Eh denote the time-to-maturity and the strike price of the hedging options. τ and E denote the time-to-maturity and the strike price of the target options.

Table 4.6: Mean Square Delta Hedging Error for 63 days in 1993

Hedging with Interest Rate Options

                   PPR+LPR   Hutchinson   Nadaraya-Watson
longest τh
  Eh < E           0.2310    0.8579       1.6865
  Eh = E           0.1457    0.1796       1.0514
  Eh > E           0.2801    1.4485       1.3635
τh fixed at τ
  Eh < E           0.1294    0.9391       1.0568
  Eh > E           0.2293    1.9467       1.0565

This table summarizes the hedging errors for the interest rate options for 63 days by different hedging tools. The hedging tools employed here are the interest rate options and the treasury bills. Interest rate options with different times-to-maturity and strike prices are tried in this table. τh and Eh denote the time-to-maturity and the strike price of the hedging options. τ and E denote the time-to-maturity and the strike price of the target options.

Chapter 5

Hansen-Jagannathan Distance Test

This chapter will show a method to improve the finite sample performance of the HJ-distance test. The HJ-distance test is widely used as a measure to evaluate the stochastic discount factor implied by an asset pricing model. However, for the sample sizes we usually encounter, the rejection frequencies of this test are much higher than the nominal numbers. We are going to find a method to remedy this problem.

5.1 Hansen-Jagannathan distance

Hansen and Jagannathan (1997) develop a measure of degree of misspecification of an asset pricing model. This measure, called the HJ-distance, is defined as the least squares distance between the stochastic discount factor associated with an asset pricing model and the family of stochastic discount factors that price all the assets correctly.

Hansen and Jagannathan (1997) show that the HJ-distance is also equal to the maximum pricing errors generated by a model on the portfolios whose second moments of returns are equal to one.

Consider a portfolio of $N$ primitive assets, and let $R_t$ denote the $t$-th period gross returns of these assets. $R_t$ is a $1 \times N$ vector. A valid stochastic discount factor (SDF), $m_t$, satisfies $E(m_t R_t') = 1_N$, where $1_N$ is an $N$-vector of ones. If an asset pricing model implies a stochastic discount factor $m_t(\delta)$, where $\delta$ is a $K \times 1$ unknown parameter, then the HJ-distance corresponding to this asset pricing model is given by

$$HJ(\delta) = \sqrt{E[w_t(\delta)]' \, G^{-1} \, E[w_t(\delta)]},$$

where $w_t(\delta) = R_t' m_t(\delta) - 1_N$ denotes the pricing errors and $G = E(R_t' R_t)$.

We follow Ahn and Gadarowski (2004) and focus on linear factor pricing models. Linear factor pricing models imply an SDF of the linear form $m_t(\delta) = \tilde{X}_t \delta$, where $\tilde{X}_t = [1 \;\; X_t]$ is a $1 \times K$ vector of factors including 1; see Hansen and Jagannathan (1997). Note that linear factor pricing models can accommodate nonlinear functions of factors because $\tilde{X}_t$ may contain polynomials of factors, so the linearity assumption here is not very restrictive. For example, Bansal, Hsieh, and Viswanathan (1993), Chapman (1997), and Dittmar (2002) consider nonlinear factor models of this type. In addition, many successful asset pricing models are in linear forms.1

1 For example, the Sharpe (1964)-Lintner (1965)-Black (1972) CAPM, the Breeden (1979) consumption CAPM, the Adler and Dumas (1983) international CAPM, the Chen, Roll, and Ross (1986) five macro factor model, and the Fama-French (1992, 1996) three factor model.

The HJ-distance can be estimated by its sample analogue

$$HJ_T(\delta) = \sqrt{w_T(\delta)' \, G_T^{-1} \, w_T(\delta)},$$

where $w_T(\delta) = T^{-1}\sum_{t=1}^{T} w_t(\delta) = D_T\delta - 1_N$, $D_T = T^{-1}\sum_{t=1}^{T} R_t'\tilde{X}_t$ and $G_T = T^{-1}\sum_{t=1}^{T} R_t'R_t$. Following Jagannathan and Wang (1996), the parameter $\delta$ is estimated by minimizing the sample HJ-distance $HJ_T(\delta)$, giving the estimate $\delta_T$ as

$$\delta_T = (D_T' G_T^{-1} D_T)^{-1} D_T' G_T^{-1} 1_N.$$

The estimator $\delta_T$ is equivalent to a GMM estimator with the moment condition $E[w_t(\delta)] = 0$ and the weighting matrix $G_T^{-1}$.
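The sample quantities above map directly onto a few matrix operations. The following is a minimal NumPy sketch, not the code used in the thesis; the function name `hj_distance` and the array layout (rows are periods, `X_tilde` already contains the column of ones) are assumptions made for illustration.

```python
import numpy as np

def hj_distance(R, X_tilde):
    """Sample HJ-distance for a linear SDF m_t = X_tilde_t @ delta.

    R       : (T, N) array of gross returns (assumed layout).
    X_tilde : (T, K) array of factors, including a column of ones.
    Returns (delta_T, HJ_T(delta_T)).
    """
    T, N = R.shape
    ones = np.ones(N)
    D_T = R.T @ X_tilde / T            # N x K, sample analogue of E(R_t' X~_t)
    G_T = R.T @ R / T                  # N x N, sample second moment of returns
    G_inv_D = np.linalg.solve(G_T, D_T)
    # delta_T = (D' G^{-1} D)^{-1} D' G^{-1} 1_N
    delta_T = np.linalg.solve(D_T.T @ G_inv_D, G_inv_D.T @ ones)
    w_T = D_T @ delta_T - ones         # average pricing errors
    hj = np.sqrt(w_T @ np.linalg.solve(G_T, w_T))
    return delta_T, hj
```

Linear systems are solved with `np.linalg.solve` rather than by forming explicit inverses, which is usually the more stable choice when $G_T$ is close to singular.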

Jagannathan and Wang (1996) prove that, under the hypothesis that the SDF prices the returns correctly, the sample HJ-distance follows

$$T[HJ_T(\delta_T)]^2 \to_d \sum_{j=1}^{N-K} \lambda_j \upsilon_j,$$

where $\upsilon_1, \ldots, \upsilon_{N-K}$ are independent $\chi^2(1)$ random variables, and $\lambda_1, \ldots, \lambda_{N-K}$ are the nonzero eigenvalues of the following matrix:

$$\Lambda = \Omega^{1/2} G^{-1/2} \left[ I_N - (G^{-1/2})' D (D' G^{-1} D)^{-1} D' G^{-1/2} \right] (G^{-1/2})' (\Omega^{1/2})'.$$

Here $\Omega = E[w_t(\delta) w_t(\delta)']$ denotes the variance of pricing errors, and $D = E(R_t'\tilde{X}_t)$. It can be proved that $\Lambda$ is positive semidefinite with rank $N - K$. $G_T$ and $D_T$ can be used to estimate $G$ and $D$ consistently. Under the hypothesis that the SDF prices the returns correctly, $\Omega$ can be estimated consistently by $\Omega_T = T^{-1}\sum_{t=1}^{T} w_t(\delta_T) w_t(\delta_T)'$.

$\delta_T$ is not as efficient as the optimal GMM estimator that uses $\Omega_T^{-1}$ (the optimal weighting matrix) as the weighting matrix, defined as

$$\delta_{OPT,T} = (D_T' \Omega_T^{-1} D_T)^{-1} D_T' \Omega_T^{-1} 1_N.$$

Associated with $\delta_{OPT,T}$ and $\Omega_T^{-1}$ is the J-statistic of Hansen (1982)

$$J_T(\delta_{OPT,T}) = T \, w_T(\delta_{OPT,T})' \, \Omega_T^{-1} \, w_T(\delta_{OPT,T}),$$

which is widely used for specification testing. Under the null hypothesis that the SDF prices the returns correctly, Hansen’s J-statistic is asymptotically $\chi^2$-distributed with $N - K$ degrees of freedom.
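For comparison with the HJ-distance, the two-step construction of the J-statistic can be sketched as follows. This is an illustrative sketch under the same assumed data layout as before; `j_statistic` is a hypothetical name, and $\Omega_T$ is evaluated at the first-step estimate $\delta_T$ as in the text.

```python
import numpy as np

def j_statistic(R, X_tilde):
    """Two-step optimal GMM estimate and Hansen's J-statistic (sketch).

    R       : (T, N) gross returns (assumed layout).
    X_tilde : (T, K) factors including a column of ones.
    """
    T, N = R.shape
    ones = np.ones(N)
    D_T = R.T @ X_tilde / T
    G_T = R.T @ R / T
    # Step 1: HJ-distance estimator delta_T, weighting matrix G_T^{-1}.
    G_inv_D = np.linalg.solve(G_T, D_T)
    delta_T = np.linalg.solve(D_T.T @ G_inv_D, G_inv_D.T @ ones)
    # Pricing errors w_t(delta_T)', stacked as a T x N array.
    m = X_tilde @ delta_T
    W = R * m[:, None] - 1.0
    Omega_T = W.T @ W / T
    # Step 2: optimal GMM estimator, weighting matrix Omega_T^{-1}.
    O_inv_D = np.linalg.solve(Omega_T, D_T)
    delta_opt = np.linalg.solve(D_T.T @ O_inv_D, O_inv_D.T @ ones)
    w_T = D_T @ delta_opt - ones
    J = T * (w_T @ np.linalg.solve(Omega_T, w_T))
    return delta_opt, J
```

Under the null, `J` would be compared with a $\chi^2(N-K)$ critical value.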

The HJ-distance has several desirable properties over the J-statistic. First, it does not reward the variability of SDFs. The weighting matrix used in the HJ-distance is the second moment of portfolio returns and independent of pricing errors. On the other hand, the J-statistic uses the inverse of the second moment of the pricing errors as the weighting matrix and hence rewards models with high variability of pricing errors.

Second, as Jagannathan and Wang (1996) point out, the weighting matrix of the HJ-distance remains the same across various pricing models, which makes it possible to compare the performances of competing SDFs by the relative values of their HJ-distances for a given dataset.

5.2 Finite sample properties of the HJ-distance test

In this section, we investigate the finite sample performances of the specification test based on the HJ-distance (henceforth the HJ-distance test) following the settings of

Ahn and Gadarowski (2004).

5.2.1 Simulation design

We simulate three sets of data comparable to those in Ahn and Gadarowski (2004). The first set is a simple three-factor model with independent factor loadings, where the scale of expected returns and variability of the factors are roughly matched to those of the actual market-wide returns. The statistical properties of the factors and idiosyncratic errors are set to be identical to those in Ahn and Gadarowski (2004). We refer to this model as the Simple model henceforth. The second set of data is calibrated to resemble the statistical properties of the three-factor model in Fama-French (1992).

The third set of data is calibrated based on the Premium-Labor model in Jagannathan and Wang (1996). The details of the data generation are provided in the Appendix.

We simulate each set of the data with 1000 replications. For each replication, we calculate the HJ-distance and test the null hypothesis that the stochastic discount factor implied by the DGP prices portfolio returns correctly. Since the stochastic discount factors are derived from the true DGPs, the actual rejection frequency is supposed to be close to the nominal level. The critical values of the HJ-distance test are calculated following the algorithm by Jagannathan and Wang (1996). First, draw $M \times (N-K)$ independent random variables $v_{ij}$ from the $\chi^2(1)$ distribution. Next, calculate $u_j = \sum_{i=1}^{N-K} \lambda_i v_{ij}$ ($j = 1, \ldots, M$). Then the empirical p-value of the HJ-distance is

$$p = M^{-1}\sum_{j=1}^{M} I\left(u_j \ge T[HJ_T(\delta_T)]^2\right),$$

where $I(\cdot)$ is an indicator function which equals one if the expression in the brackets is true and zero otherwise. In our simulation, we set $M = 5{,}000$.
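The simulation step for the p-value can be sketched in a few lines. This is an illustrative sketch only: the function name `hj_pvalue`, the `seed` argument, and the assumption that the eigenvalues $\lambda_i$ have already been estimated are not part of the original text.

```python
import numpy as np

def hj_pvalue(stat, lambdas, M=5000, seed=0):
    """Empirical p-value of the HJ-distance test by simulation.

    stat    : the test statistic T * HJ_T(delta_T)**2.
    lambdas : the N-K nonzero eigenvalues (assumed to be estimated already).
    M       : number of simulated draws of the weighted chi-square sum.
    """
    rng = np.random.default_rng(seed)
    lambdas = np.asarray(lambdas, dtype=float)
    # M x (N-K) independent chi-square(1) draws v_ij.
    v = rng.chisquare(df=1, size=(M, lambdas.size))
    # u_j = sum_i lambda_i * v_ij, one value per simulated draw j.
    u = v @ lambdas
    # Share of simulated draws at least as large as the statistic.
    return float(np.mean(u >= stat))
```

The null is rejected at level $a$ when the returned p-value falls below $a$.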

5.2.2 Simulation results

Table 5.1 summarizes the results from this simulation with 25 and 100 portfolios and T = 160, 330, 700. Panel A of this table corresponds to Table 1 of Ahn and Gadarowski (2004), while Panels B and C correspond to Table 3 of Ahn and Gadarowski (2004). The first column in each panel is the significance level of the tests. The other columns report the actual rejection frequencies for different numbers of observations. The results are comparable to those in Ahn and Gadarowski (2004).

The HJ-distance test overrejects the correct null under all combinations of the DGPs, the number of portfolios, and sample sizes. In the Simple model and Fama-French model, the size distortion is noticeable except for the combination of T = 700 and 25 portfolios. The size distortion is particularly large with 100 portfolios but improves as T increases. In the Premium-Labor model, the HJ-distance test is severely oversized both with 25 and 100 portfolios and for all sample sizes. As suggested by Ahn and Gadarowski (2004), these excessive rejection frequencies for the HJ-distance may be due to a feature of the data based on the Premium-Labor model not present in the other data, possibly the temporal dependence of the factors.

Ahn and Gadarowski (2004) investigate the source of this overrejection and find that one of its sources is the poor estimation of the variance matrix of the pricing errors, Ω. They find that repeating their simulations using the exact pricing error variance matrix Ω removes most of the upward bias in the size of the HJ-distance test. However, the exact pricing error matrix is unknown, and hence it is impossible to use this method in practice and the problem of overrejection has remained unsolved.

We examine other possible sources of overrejection. It is well-known that the accuracy of the weighting matrix has a significant effect on the finite sample property of the GMM-based Wald tests (e.g., Burnside and Eichenbaum, 1996). We conjecture

Table 5.1: Rejection frequencies of the specification test using the HJ-distance

Number of Observations     T = 160   T = 330   T = 700

(A) Simple Model
25 Portfolios
  1%                       4.5       2.4       1.9
  5%                       15.1      8.7       7.7
  10%                      23.8      16.4      13.4
100 Portfolios
  1%                       99.6      51.3      11.7
  5%                       99.9      71.8      27.4
  10%                      99.9      81.3      39.3

(B) Fama-French Model
25 Portfolios
  1%                       5.8       3.3       1.1
  5%                       15.1      10.6      7.1
  10%                      23.9      18.9      12.8
100 Portfolios
  1%                       99.8      53.7      13.8
  5%                       100.0     76.0      30.8
  10%                      100.0     84.3      44.6

(C) Premium-Labor Model
25 Portfolios
  1%                       14.9      11.3      9.2
  5%                       31.9      26.0      19.5
  10%                      42.7      34.4      29.0
100 Portfolios
  1%                       99.7      79.1      36.8
  5%                       99.9      90.1      59.1
  10%                      99.9      94.3      69.6

This table shows the rejection rates over 1000 trials using the p-values for the HJ-distance. For Panel (A), factors and returns are simulated to make the mean and variance of gross returns roughly consistent with historical data in the US stock market. For Panels (B) and (C), factors and returns are simulated using either the Fama-French (1993) model or the Premium-Labor model per Jagannathan and Wang (1996).

another possible source of the overrejection is the poorly estimated weighting matrix. Jagannathan and Wang (1996) use $T^{-1}\sum_{t=1}^{T} R_t'R_t = \widehat{Var}(R_t) + \hat{E}(R_t)'\hat{E}(R_t)$ as an estimate of $G = E(R_t'R_t) = Var(R_t) + E(R_t)'E(R_t)$. While $E(R_t)$ can be estimated accurately by the sample mean for the sample size of our interest, the sample covariance matrix can be a very inaccurate estimate of $Var(R_t)$ when the number of observations is not large enough relative to the number of portfolios, as pointed out by Jobson and Korkie (1980). In our case, with 25 portfolios, $G$ has $(26 \times 25)/2 = 325$ distinct elements. Consequently, the poor estimation of $G$ may be another main reason for the poor small sample performance of the HJ-distance test.
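The decomposition of the second moment matrix and the element count can be checked numerically. The sketch below is illustrative; the function names are hypothetical, and the covariance uses the divisor $T$ (an assumption) so that the identity holds exactly in the sample.

```python
import numpy as np

def second_moment_parts(R):
    """Split G_T = T^{-1} sum R_t' R_t into covariance and mean parts.

    R : (T, N) array of gross returns (assumed layout).
    Returns (G_T, S, mean_outer) with G_T = S + mean_outer, where S is the
    sample covariance with divisor T (assumed here so the identity is exact).
    """
    T = R.shape[0]
    m = R.mean(axis=0)
    G_T = R.T @ R / T
    S = (R - m).T @ (R - m) / T
    return G_T, S, np.outer(m, m)

def n_distinct_elements(N):
    """Number of distinct elements of a symmetric N x N matrix."""
    return N * (N + 1) // 2
```

With 100 portfolios the same count gives 5,050 distinct elements, which is large relative to sample sizes of a few hundred observations.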

We confirm this conjecture by repeating the simulations in Table 5.1 but replacing $G_T$ with the exact second moment matrix $G$. Table 5.2 shows the resulting rejection frequencies of the HJ-distance test. We approximate $G$ by the sample second moment matrix from 10,000 time-series observations. In all cases, the rejection rates of the HJ-distance test improve dramatically. The HJ-distance test now has good small sample properties in the Simple model and the Fama-French model. In particular, with 25 portfolios, the actual size is close to the nominal size for all $T$. Comparing it with Table 5.1 suggests that the improvement of the size of the original HJ-distance test with large $T$ occurs mainly through a more accurate estimation of $G$. In the Premium-Labor model, there still remains size distortion, but its magnitude is much smaller than those in Panel C of Table 5.1.

Table 5.2: Rejection frequencies of the specification test using the HJ-distance with the exact weighting matrix G

Number of Observations     T = 160   T = 330   T = 700

(A) Simple Model
25 Portfolios
  1%                       0.9       1.2       1.3
  5%                       5.4       5.8       6.8
  10%                      12.4      11.8      12.1
100 Portfolios
  1%                       0.8       1.2       1.2
  5%                       4.4       5.7       5.7
  10%                      11.6      10.9      10.9

(B) Fama-French Model
25 Portfolios
  1%                       1.0       1.2       0.7
  5%                       4.9       5.7       5.0
  10%                      9.9       11.8      11.9
100 Portfolios
  1%                       1.1       1.6       1.3
  5%                       7.4       7.4       5.8
  10%                      16.8      14.7      13.6

(C) Premium-Labor Model
25 Portfolios
  1%                       4.5       6.7       6.9
  5%                       15.2      20.3      16.3
  10%                      24.3      29.7      25.9
100 Portfolios
  1%                       3.1       7.3       8.2
  5%                       14.2      22.3      23.0
  10%                      26.5      35.1      36.3

This table shows the rejection rates over 1000 trials using the p-value of the HJ-distance, but approximating the weighting matrix, G, by the sample second moment matrix from 10,000 time-series observations.

5.3 Improved estimation of covariance matrix by shrinkage

The simulation evidence in the previous section reveals that the finite sample performance of the HJ-distance test improves significantly when one employs a better estimate of the second moment matrix of portfolio returns, or equivalently, a better estimate of the covariance matrix of portfolio returns. In this section, we explore the possibility of improved estimation of the portfolio covariance matrix by the shrinkage method following the approach of Ledoit and Wolf (2003).

5.3.1 Shrinkage method and the HJ-distance

The shrinkage method dates back to the seminal paper by Stein (1956). The basic idea behind the shrinkage method is to balance the trade-off between bias and variance by taking a weighted average of two estimators. If one estimator is unbiased but has a large variance while the other estimator is biased but has a small variance, then taking a properly weighted average of the two estimators can outperform both estimators in terms of accuracy (mean squared error). The biased estimator is called the shrinkage target to which the unbiased estimator with a large variance is shrunk.

In our context, the sample covariance matrix is an unbiased estimator of the true covariance matrix but has a large variance. Note that the purpose of the HJ-distance test is to test if an SDF can price the returns correctly. Therefore, a natural choice of the shrinkage target is the covariance matrix implied by the factor model which implies the SDF of interest. Factor pricing models explain asset returns in terms of a few factors and uncorrelated residuals, thereby imposing a low-dimensional factor structure on the returns. Since the parameters of a factor model can be estimated with a small variance, the estimate of the asset covariance matrix implied by the factor model has a small variance, although it is a biased estimate when the factor model is misspecified.

One might argue for using the asset covariance matrix implied by the factor model alone, without combining it with the sample covariance. We advocate the shrinkage method in this chapter because the HJ-distance test is often used to compare the fit of different SDFs. Comparing different SDFs by the HJ-distance requires one to use the same weighting matrix across all candidate SDFs, but one does not know which SDF is the correct SDF a priori. Using the shrinkage method allows one to use the same weighting matrix across different SDFs without assuming one particular SDF is the correct one.

5.3.2 Optimal shrinkage intensity

The shrinkage method assigns weight α to a covariance matrix implied by a factor model and weight 1 − α to the sample covariance matrix. Using the shrinkage method requires the determination of α, which is called the shrinkage intensity. Ledoit and

Wolf (2003) derive the analytical formula for the optimal α and discuss its estimation when the shrinkage target is a single-factor model and is a misspecified model of asset returns. We extend their method to multiple-factor shrinkage targets as well as to the case where the shrinkage target is the correct model of asset returns.

As in Section 5.1, let $R_t$ denote a $1 \times N$ vector of the $t$-th period gross returns of $N$ assets, and let $X_t$ denote a $1 \times K^*$ vector of factors not including a constant, where $K^* = K - 1$. Let $R_{ti}$ denote the $t$-th period gross return of the $i$-th asset, so that $R_t = (R_{t1}, \ldots, R_{tN})$.

Suppose the following $K^*$-factor linear asset pricing model is used to construct the shrinkage target. It is not necessary that the model generates the actual stock returns.

$$R_{ti} = \mu_i + X_t\beta_i + \varepsilon_{ti}, \quad t = 1, \ldots, T, \tag{5.1}$$

where $\beta_i$ is a $K^* \times 1$ vector of slopes for the $i$-th asset, and $\varepsilon_{ti}$ is the mean-zero idiosyncratic error for asset $i$ in period $t$. $\varepsilon_{ti}$ has a constant variance $\delta_{ii}$ across time, and is uncorrelated with $\varepsilon_{tj}$ for $j \ne i$ and with the factors. The model (5.1) may be the asset pricing model corresponding to the SDF we test, but any other linear factor model can be used. Let $\beta = (\beta_1, \ldots, \beta_N)$ denote the $K^* \times N$ matrix of the slopes, $\mu = (\mu_1, \ldots, \mu_N)$ be the $1 \times N$ vector of the intercepts, and $\varepsilon_t = (\varepsilon_{t1}, \ldots, \varepsilon_{tN})$. Then the factor model (5.1) is written as

$$R_t = \mu + X_t\beta + \varepsilon_t, \quad Var(\varepsilon_t) = \Delta = \mathrm{diag}(\delta_{ii}), \quad t = 1, \ldots, T. \tag{5.2}$$

We impose the following assumptions on the stock returns and factors.

Assumption 1 Stock returns $R_t$ and factors $X_t$ are independently and identically distributed over time.

Assumption 2 $R_t$ and $X_t$ have finite fourth moment, and $Var(R_t) = \Sigma$.

The iid assumption is used in Ledoit and Wolf (2003). We may allow $R_t$ and $X_t$ to be heteroskedastic and/or serially correlated by assuming they satisfy conditions such as mixing or near-epoch dependence without affecting the logic underlying our argument. We use the iid assumption because it is an acceptable first-cut approximation and relaxing it adds substantial notational complexity.

The asset pricing model (5.2) implies the following covariance matrix of $R_t$:

$$\Phi = \beta' \, Var(X_t) \, \beta + \Delta.$$

We can estimate $\Phi$ by estimating its components. Regressing the $i$-th portfolio returns on an intercept and the factors, we obtain the least squares estimate of $\beta_i$ and the residual variance estimate. Let $b_i$ and $d_{ii}$ denote these estimates of $\beta_i$ and $\delta_{ii}$, respectively. Let $b = (b_1, \ldots, b_N)$ and $D = \mathrm{diag}(d_{ii})$, then the estimate of $\Phi$ is

$$F = b' \, \widehat{Var}(X_t) \, b + D, \tag{5.3}$$

where $\widehat{Var}(X_t)$ is the sample covariance matrix of the factors.
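The construction of the shrinkage target $F$ in (5.3) can be sketched as below. This is an illustrative sketch: the name `factor_cov` is hypothetical, and all variances use the divisor $T$ (an assumption, chosen so that the diagonal of $F$ coincides with the sample variances of the returns).

```python
import numpy as np

def factor_cov(R, X):
    """Factor-model-implied covariance F = b' Var_hat(X) b + D, as in (5.3).

    R : (T, N) gross returns; X : (T, K*) factors without a constant
    (assumed layout). Divisor T is used for all variances (assumption).
    """
    T = R.shape[0]
    Xc = X - X.mean(axis=0)            # demeaning absorbs the intercept
    Rc = R - R.mean(axis=0)
    # OLS slopes for all assets at once: column i of b is b_i.
    b = np.linalg.solve(Xc.T @ Xc, Xc.T @ Rc)
    resid = Rc - Xc @ b
    d = (resid ** 2).sum(axis=0) / T   # residual variance estimates d_ii
    Sxx = Xc.T @ Xc / T                # sample covariance of the factors
    return b.T @ Sxx @ b + np.diag(d)
```

Because the OLS residuals are orthogonal to the fitted values, the diagonal of the returned matrix equals the sample variances of the returns, while the off-diagonal elements are restricted by the factor structure.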

We estimate $\Sigma$ by a weighted average of $F$ and the sample covariance of $R_t$, $S$, with the weight (shrinkage intensity) $\alpha$ assigned to the shrinkage target $F$. We choose the shrinkage intensity $\alpha$ so that it minimizes a risk function. Let $\|Z\|$ be the Frobenius norm of an $N \times N$ matrix $Z$, so

$$\|Z\|^2 = \mathrm{Trace}(Z'Z) = \sum_{i=1}^{N}\sum_{j=1}^{N} z_{ij}^2.$$

Following Ledoit and Wolf (2003), we use the following risk function

$$Q(\alpha) = E[L(\alpha)],$$

where $L(\alpha)$ is a quadratic measure of the distance between the true and estimated covariance matrices

$$L(\alpha) = \|\alpha F + (1-\alpha)S - \Sigma\|^2.$$

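Let $s_{ij}$, $f_{ij}$, $\sigma_{ij}$, and $\phi_{ij}$ denote the $(i,j)$-th element of $S$, $F$, $\Sigma$, and $\Phi$, respectively. It follows that

$$\begin{aligned}
Q(\alpha) &= \sum_{i=1}^{N}\sum_{j=1}^{N} E(\alpha f_{ij} + (1-\alpha)s_{ij} - \sigma_{ij})^2 \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} \left\{ Var(\alpha f_{ij} + (1-\alpha)s_{ij}) + [E(\alpha f_{ij} + (1-\alpha)s_{ij} - \sigma_{ij})]^2 \right\} \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} \left\{ \alpha^2 Var(f_{ij}) + (1-\alpha)^2 Var(s_{ij}) + 2\alpha(1-\alpha)Cov(f_{ij}, s_{ij}) + \alpha^2(\phi_{ij} - \sigma_{ij})^2 \right\}.
\end{aligned}$$

The optimal $\alpha$ can be derived by differentiating $Q(\alpha)$ with respect to $\alpha$. The second order condition is satisfied since $Q(\alpha)$ is convex. Solving the first order condition for $\alpha$ gives the optimal $\alpha$ as

$$\alpha^* = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} Var(s_{ij}) - \sum_{i=1}^{N}\sum_{j=1}^{N} Cov(f_{ij}, s_{ij})}{\sum_{i=1}^{N}\sum_{j=1}^{N} Var(f_{ij} - s_{ij}) + \sum_{i=1}^{N}\sum_{j=1}^{N} (\phi_{ij} - \sigma_{ij})^2},$$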

which is the same as (3) in Ledoit and Wolf (2003). Multiplying both the numerator and the denominator by $T$, we obtain

$$\alpha^* = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} Var(\sqrt{T}s_{ij}) - \sum_{i=1}^{N}\sum_{j=1}^{N} Cov(\sqrt{T}f_{ij}, \sqrt{T}s_{ij})}{\sum_{i=1}^{N}\sum_{j=1}^{N} Var(\sqrt{T}f_{ij} - \sqrt{T}s_{ij}) + T\sum_{i=1}^{N}\sum_{j=1}^{N} (\phi_{ij} - \sigma_{ij})^2}. \tag{5.4}$$

As in Ledoit and Wolf (2003), define $\pi = \sum_{i=1}^{N}\sum_{j=1}^{N} AsyVar[\sqrt{T}s_{ij}]$, $\rho = \sum_{i=1}^{N}\sum_{j=1}^{N} AsyCov[\sqrt{T}f_{ij}, \sqrt{T}s_{ij}]$, and let $\gamma = \sum_{i=1}^{N}\sum_{j=1}^{N} (\phi_{ij} - \sigma_{ij})^2$ denote the measure of the misspecification of the factor model (5.2).

Define $\eta = \sum_{i=1}^{N}\sum_{j=1}^{N} AsyVar[\sqrt{T}(f_{ij} - s_{ij})]$. We consider the limit of $\alpha^*$ as $T \to \infty$ in two cases separately, depending on whether $\Phi = \Sigma$. First, consider the case where $\Phi \ne \Sigma$. Since $S$ is consistent while $F$ is not, the optimal shrinkage intensity $\alpha^*$ converges to 0 as $T \to \infty$. Ledoit and Wolf (2003) prove in their Theorem 1 that

$$T\alpha^* \to \frac{\pi - \rho}{\gamma}, \quad \text{as } T \to \infty. \tag{5.5}$$

When $\Phi = \Sigma$, both $S$ and $F$ are consistent for $\Sigma$, but they have different variances. In this case, the optimal shrinkage intensity $\alpha^*$ converges to a non-degenerate limit

$$\alpha^* \to \frac{\pi - \rho}{\eta}, \quad \text{as } T \to \infty. \tag{5.6}$$

This case is not considered in Ledoit and Wolf (2003), but the proof of (5.6) follows from the proof of Theorem 1 of Ledoit and Wolf (2003, pp. 610-611). From (5.5) and (5.6), the shrinkage estimate $\alpha F + (1-\alpha)S$ is consistent for $\Sigma$ under both $\Phi \ne \Sigma$ and $\Phi = \Sigma$. Note that $(\pi - \rho)/\eta$ does not necessarily equal 1. $(\pi - \rho)/\eta = 1$ if $F$ is an asymptotically efficient estimator of $\Sigma$.
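The closed form for the optimal intensity derived in this subsection can be checked numerically in a stylized setting. The sketch below treats the four sums over $(i,j)$ as given scalars; the function names and the example numbers are purely hypothetical and serve only to illustrate that the formula minimizes the quadratic risk.

```python
import numpy as np

def risk(alpha, var_f, var_s, cov_fs, bias_sq):
    """Q(alpha) with the sums over (i, j) collapsed to scalars (illustration)."""
    return (alpha ** 2 * var_f + (1 - alpha) ** 2 * var_s
            + 2 * alpha * (1 - alpha) * cov_fs + alpha ** 2 * bias_sq)

def optimal_alpha(var_f, var_s, cov_fs, bias_sq):
    """Closed-form minimizer: (Var(s) - Cov(f,s)) / (Var(f - s) + bias^2)."""
    var_diff = var_f + var_s - 2 * cov_fs      # Var(f - s)
    return (var_s - cov_fs) / (var_diff + bias_sq)

# Hypothetical inputs: a low-variance but biased target, a noisy sample estimate.
a_star = optimal_alpha(var_f=1.0, var_s=4.0, cov_fs=0.5, bias_sq=0.5)
grid = np.linspace(0.0, 1.0, 1001)
q_grid = risk(grid, 1.0, 4.0, 0.5, 0.5)
```

In this example the risk at `a_star` is no larger than at any grid point, and in particular it is below the risk of using the sample estimate alone (`alpha = 0`).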

5.3.3 Estimation of the optimal shrinkage intensity

Since $\pi$, $\rho$, $\eta$ and $\gamma$ in the formula for $\alpha^*$ are unobservable, we must find estimators for them. Define $\pi_{ij} = AsyVar[\sqrt{T}s_{ij}]$, $\rho_{ij} = AsyCov[\sqrt{T}f_{ij}, \sqrt{T}s_{ij}]$, $\gamma_{ij} = (\phi_{ij} - \sigma_{ij})^2$, and $\eta_{ij} = AsyVar[\sqrt{T}(f_{ij} - s_{ij})]$, so that $\pi = \sum_{i=1}^{N}\sum_{j=1}^{N} \pi_{ij}$, $\rho = \sum_{i=1}^{N}\sum_{j=1}^{N} \rho_{ij}$, $\gamma = \sum_{i=1}^{N}\sum_{j=1}^{N} \gamma_{ij}$, and $\eta = \sum_{i=1}^{N}\sum_{j=1}^{N} \eta_{ij}$. In the following, we present consistent estimates of these quantities and show the asymptotic behavior of our estimate of $\alpha^*$.

5.3.3.1 πij and γij

From Lemma 1 of Ledoit and Wolf (2003), a consistent estimator for $\pi_{ij}$ is given by

$$p_{ij} = \frac{1}{T}\sum_{t=1}^{T} \left[ (R_{ti} - m_i)(R_{tj} - m_j) - s_{ij} \right]^2,$$

where $m_i = T^{-1}\sum_{t=1}^{T} R_{ti}$ is the sample average of the return of the $i$-th asset. Define $c_{ij} = (f_{ij} - s_{ij})^2$, then $c_{ij} \to_p \gamma_{ij}$ follows from Lemma 3 of Ledoit and Wolf (2003).
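Both estimators in this subsection are element-wise operations on the demeaned returns. The following sketch computes the full $N \times N$ arrays of $p_{ij}$ and $c_{ij}$; the function name is hypothetical, and $s_{ij}$ is computed with the divisor $T$ (an assumption).

```python
import numpy as np

def pi_and_gamma_hat(R, F):
    """Element-wise estimates p_ij of pi_ij and c_ij of gamma_ij.

    R : (T, N) gross returns (assumed layout); F : (N, N) shrinkage target.
    The sample covariance S uses divisor T (assumption).
    """
    T = R.shape[0]
    Rc = R - R.mean(axis=0)
    S = Rc.T @ Rc / T
    # prod[t, i, j] = (R_ti - m_i) * (R_tj - m_j)
    prod = Rc[:, :, None] * Rc[:, None, :]
    p = ((prod - S) ** 2).mean(axis=0)     # p_ij
    c = (F - S) ** 2                       # c_ij
    return p, c
```

The intermediate array has size $T \times N \times N$, which is manageable for 25 or 100 portfolios but may call for a loop over $t$ in larger problems.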

5.3.3.2 ρij

When $i = j$, note that $f_{ii} = s_{ii}$. Thus we can use $p_{ii}$ to estimate $\rho_{ii}$. When $i \ne j$, first define $M = I - T^{-1}11'$, where $I$ is a $T \times T$ identity matrix and $1$ is a $T \times 1$ vector of ones. Collect the factors into a $T \times K^*$ matrix $X$: $X = (X_1', \ldots, X_T')'$. We use $R_{\cdot i}$ to denote a $T \times 1$ vector of the $i$-th asset return. Recall that (see (5.3))

$$F = b' \, \widehat{Var}(X_t) \, b + D,$$

where $b = (b_1, \ldots, b_N)$, and $b_i$ is given by $b_i = (X'MX)^{-1}X'MR_{\cdot i}$. $D$ is a diagonal matrix of residual variance estimates.

Define $S_{xi} = T^{-1}R_{\cdot i}'MX$, which is the $1 \times K^*$ sample covariance vector between $X_t$ and $R_{ti}$, and define $S_{xx} = T^{-1}X'MX$, which is the $K^* \times K^*$ sample covariance matrix of $X_t$. Then we can express $f_{ij}$ for $i \ne j$ as

$$f_{ij} = R_{\cdot i}'MX(X'MX)^{-1} \, T^{-1}(X'MX) \, (X'MX)^{-1}X'MR_{\cdot j} = S_{xi}(S_{xx})^{-1}(S_{xj})'. \tag{5.7}$$

Let $\bar{X} = T^{-1}\sum_{t=1}^{T} X_t$ be the $1 \times K^*$ vector of the sample average of the factors, $\sigma_{xj}$ the $1 \times K^*$ covariance vector between $X_t$ and $R_{tj}$, and $\sigma_{xx}$ the $K^* \times K^*$ covariance matrix of $X_t$.

The following lemma provides a consistent estimator of $\rho_{ij}$. Recall that $s_{ij}$ denotes the $(i,j)$-th element of $S$ and is equal to the sample covariance between $R_{ti}$ and $R_{tj}$.

Lemma 1 A consistent estimator of $\rho_{ij}$ is given by $r_{ij}$, defined as follows: for $i = j$, set $r_{ii} = p_{ii}$, and for $i \ne j$, set $r_{ij}$ as

$$r_{ij} = Z_i S_{xx}^{-1}(S_{xj})' + S_{xi}S_{xx}^{-1}(Z_j)' - S_{xi}S_{xx}^{-1}Z_x S_{xx}^{-1}(S_{xj})',$$

where $Z_i$ and $Z_x$ are consistent estimates of $AsyCov[\sqrt{T}S_{xi}, \sqrt{T}s_{ij}]$ and $AsyCov[\sqrt{T}S_{xx}, \sqrt{T}s_{ij}]$, respectively, and they take the form

$$Z_i = T^{-1}\sum_{t=1}^{T} \left[ (R_{ti} - m_i)(X_t - \bar{X}) - S_{xi} \right] \left[ (R_{ti} - m_i)(R_{tj} - m_j) - s_{ij} \right],$$

$$Z_x = T^{-1}\sum_{t=1}^{T} \left[ (X_t - \bar{X})'(X_t - \bar{X}) - S_{xx} \right] \left[ (R_{ti} - m_i)(R_{tj} - m_j) - s_{ij} \right].$$

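Proof For $i = j$, the stated result follows from $f_{ii} = s_{ii}$. For $i \ne j$, from (5.7), $f_{ij}$ converges to $\sigma_{xi}\sigma_{xx}^{-1}(\sigma_{xj})'$ in probability. Expanding $\sqrt{T}f_{ij}$ around $\sqrt{T}\sigma_{xi}\sigma_{xx}^{-1}(\sigma_{xj})'$ gives

$$\begin{aligned}
\sqrt{T}f_{ij} = {} & \sqrt{T}\sigma_{xi}\sigma_{xx}^{-1}(\sigma_{xj})' + \sqrt{T}(S_{xi} - \sigma_{xi})\sigma_{xx}^{-1}(\sigma_{xj})' + \sqrt{T}\sigma_{xi}\sigma_{xx}^{-1}(S_{xj} - \sigma_{xj})' \\
& - \sigma_{xi}\sigma_{xx}^{-1}\sqrt{T}(S_{xx} - \sigma_{xx})\sigma_{xx}^{-1}(\sigma_{xj})' + o_p(1),
\end{aligned}$$

where the last term follows from $\partial(X(\theta)^{-1})/\partial\theta = -X(\theta)^{-1}(\partial X(\theta)/\partial\theta)X(\theta)^{-1}$. It follows that

$$\begin{aligned}
AsyCov\left[\sqrt{T}f_{ij}, \sqrt{T}s_{ij}\right] = {} & AsyCov\left[\sqrt{T}S_{xi}, \sqrt{T}s_{ij}\right]\sigma_{xx}^{-1}(\sigma_{xj})' \\
& + \sigma_{xi}\sigma_{xx}^{-1}AsyCov\left[\sqrt{T}(S_{xj})', \sqrt{T}s_{ij}\right] \\
& - \sigma_{xi}\sigma_{xx}^{-1}AsyCov\left[\sqrt{T}S_{xx}, \sqrt{T}s_{ij}\right]\sigma_{xx}^{-1}(\sigma_{xj})'.
\end{aligned}$$

Since $(X_t, R_t)$ is iid, the three asymptotic covariances on the right-hand side are estimated consistently by $Z_i$, $(Z_j)'$, and $Z_x$, respectively. The required result follows because $S_{xj}$ and $S_{xx}$ are consistent estimates of $\sigma_{xj}$ and $\sigma_{xx}$. □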
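The estimator in Lemma 1 can be sketched with explicit loops over asset pairs. The function name `rho_hat`, the array layout, and the use of the divisor $T$ throughout are assumptions made for illustration; this is not the implementation used for the simulations.

```python
import numpy as np

def rho_hat(R, X):
    """Estimates r_ij of rho_ij following Lemma 1 (illustrative sketch).

    R : (T, N) gross returns; X : (T, K*) factors without a constant.
    """
    T, N = R.shape
    Xc = X - X.mean(axis=0)
    Rc = R - R.mean(axis=0)
    S = Rc.T @ Rc / T
    Sxx = Xc.T @ Xc / T
    Sx = Rc.T @ Xc / T                       # row i is S_xi
    Sxx_inv = np.linalg.inv(Sxx)
    A = Sx @ Sxx_inv                         # row i is S_xi Sxx^{-1}
    # XX[t] = (X_t - Xbar)'(X_t - Xbar) - Sxx
    XX = Xc[:, :, None] * Xc[:, None, :] - Sxx
    r = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            e = Rc[:, i] * Rc[:, j] - S[i, j]    # second bracket in Z_i, Z_x
            if i == j:
                r[i, i] = np.mean(e ** 2)        # r_ii = p_ii
                continue
            Zi = ((Rc[:, [i]] * Xc - Sx[i]) * e[:, None]).mean(axis=0)
            Zj = ((Rc[:, [j]] * Xc - Sx[j]) * e[:, None]).mean(axis=0)
            Zx = (XX * e[:, None, None]).mean(axis=0)
            r[i, j] = (Zi @ Sxx_inv @ Sx[j] + Sx[i] @ Sxx_inv @ Zj
                       - A[i] @ Zx @ A[j])
    return r
```

A useful sanity check is the degenerate case in which returns are exact linear functions of the factors: then $f_{ij} = s_{ij}$ for all pairs, and the sketch returns $r_{ij} = p_{ij}$.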

5.3.3.3 ηij

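A similar analysis gives the following lemma. Its proof follows from the proof of Lemma 1 and is hence omitted. Let $\{A\}_{kl}$ denote the $(k,l)$-th element of matrix $A$, and let $\{a\}_k$ denote the $k$-th element of vector $a$.

Lemma 2 A consistent estimator of $\eta_{ij}$ is given by $h_{ij} = w_{ij} + p_{ij} - 2r_{ij}$, where $w_{ij}$ is a consistent estimator of $AsyVar[\sqrt{T}f_{ij}]$. For $i = j$, we set $w_{ii} = p_{ii}$. For $i \ne j$, $w_{ij}$ is given by

$$\begin{aligned}
w_{ij} = {} & S_{xj}S_{xx}^{-1}Z^a_{ii}S_{xx}^{-1}(S_{xj})' + S_{xi}S_{xx}^{-1}Z^a_{jj}S_{xx}^{-1}(S_{xi})' + 2S_{xi}S_{xx}^{-1}Z^a_{ji}S_{xx}^{-1}(S_{xj})' \\
& + \sum_{k=1}^{K^*}\sum_{l=1}^{K^*} \{S_{xi}S_{xx}^{-1}\}_k \{S_{xj}S_{xx}^{-1}\}_l \, S_{xi}S_{xx}^{-1}Z^b_{kl}S_{xx}^{-1}(S_{xj})' \\
& - 2\sum_{k=1}^{K^*}\sum_{l=1}^{K^*} \{S_{xi}S_{xx}^{-1}\}_k \{S_{xj}S_{xx}^{-1}\}_l \left[ Z^c_{i,kl}S_{xx}^{-1}(S_{xj})' + Z^c_{j,kl}S_{xx}^{-1}(S_{xi})' \right],
\end{aligned}$$

where $Z^a_{ij}$, $Z^b_{kl}$, $Z^c_{i,kl}$ are consistent estimates of $AsyCov[\sqrt{T}S_{xi}, \sqrt{T}S_{xj}]$, $AsyCov[\sqrt{T}S_{xx}, \sqrt{T}\{S_{xx}\}_{kl}]$, and $AsyCov[\sqrt{T}S_{xi}, \sqrt{T}\{S_{xx}\}_{kl}]$, respectively, and they take the form

$$Z^a_{ij} = T^{-1}\sum_{t=1}^{T} \left[ (R_{ti} - m_i)(X_t - \bar{X})' - (S_{xi})' \right] \left[ (R_{tj} - m_j)(X_t - \bar{X}) - S_{xj} \right],$$

$$Z^b_{kl} = T^{-1}\sum_{t=1}^{T} \left[ (X_t - \bar{X})'(X_t - \bar{X}) - S_{xx} \right] \left[ \{(X_t - \bar{X})'(X_t - \bar{X}) - S_{xx}\}_{kl} \right],$$

$$Z^c_{i,kl} = T^{-1}\sum_{t=1}^{T} \left[ (R_{ti} - m_i)(X_t - \bar{X}) - S_{xi} \right] \left[ \{(X_t - \bar{X})'(X_t - \bar{X}) - S_{xx}\}_{kl} \right].$$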

5.3.3.4 Estimate of α∗ and its asymptotic behavior

We construct an estimate of the optimal shrinkage intensity by replacing the unknowns in $\alpha^*$ in (5.4) with their estimates:

$$\hat{\alpha} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} p_{ij} - \sum_{i=1}^{N}\sum_{j=1}^{N} r_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{N} h_{ij} + T\sum_{i=1}^{N}\sum_{j=1}^{N} c_{ij}}. \tag{5.8}$$
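Given the four $N \times N$ arrays of element-wise estimates, the estimated intensity in (5.8) is a ratio of sums. A minimal sketch, with a hypothetical function name and with the arrays assumed to have been computed beforehand:

```python
import numpy as np

def shrinkage_intensity(p, r, h, c, T):
    """Estimated shrinkage intensity as in (5.8).

    p, r, h, c : (N, N) arrays holding p_ij, r_ij, h_ij and c_ij
                 (assumed to be computed beforehand).
    T          : number of time-series observations.
    """
    p, r, h, c = (np.asarray(a, dtype=float) for a in (p, r, h, c))
    return (p.sum() - r.sum()) / (h.sum() + T * c.sum())
```

The factor `T` multiplying the sum of `c` is what drives the estimate toward zero in large samples when the shrinkage target is misspecified.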

We analyze the asymptotic behavior of $\hat{\alpha}$ for the following two cases:

Case 1. $\Phi \ne \Sigma$.

Case 2. The stock returns are generated by the factor model (5.2), and $\varepsilon_t$ is independently and identically distributed over time with finite fourth moment.

These two cases cover most situations of practical interest. They leave out only a small case in which $\Phi = \Sigma$ but the stock returns are not generated by the factor model (5.2).

The following lemma shows that, in Case 1, $T\hat{\alpha}$ converges in probability to the limit of $T\alpha^*$, while in Case 2, $\hat{\alpha}$ converges to a random variable $\alpha^0$ which is smaller than $\alpha^*$. Since $0 < \alpha^0 < \alpha^*$ and $Q(\alpha)$ is convex, the shrinkage estimator has a smaller risk than the sample covariance matrix. The simulations in the following section show that using the shrinkage estimator leads to a substantial improvement of the finite sample performance of the HJ-distance test.

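Lemma 3 As $T \to \infty$, we have

$$T\hat{\alpha} \to_p \frac{\pi - \rho}{\gamma} = \lim_{T \to \infty} T\alpha^*, \quad \text{in Case 1},$$

$$\hat{\alpha} \to_d \alpha^0 = \frac{\pi - \rho}{\eta + \xi} < \alpha^*, \quad \text{in Case 2},$$

where $\xi = \sum_{i=1, i \ne j}^{N}\sum_{j=1}^{N} (\xi_{ij})^2$ and $\{\xi_{ij}\}_{i,j=1,\cdots,N, i \ne j}$ are jointly normally distributed with mean zero.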

Indeed, an estimate of $\alpha^*$ that is consistent in both Case 1 and Case 2 is given by

$$\tilde{\alpha} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} p_{ij} - \sum_{i=1}^{N}\sum_{j=1}^{N} r_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{N} h_{ij} + T^a\sum_{i=1}^{N}\sum_{j=1}^{N} c_{ij}}, \quad a \in (0, 1). \tag{5.9}$$

By downweighting $\sum_{i=1}^{N}\sum_{j=1}^{N} c_{ij}$, this estimate favors the possibility that $\Phi = \Sigma$. From the proof of Lemma 3, it follows straightforwardly that $\tilde{\alpha} \to_p 0$ in Case 1 and $\tilde{\alpha} \to_p (\pi - \rho)/\eta$ in Case 2. However, $\tilde{\alpha}$ converges to 0 at a slower rate than $\alpha^*$ in Case 1. This reflects a trade-off between the consistency in both cases and the higher-order consistency in Case 1. Our preference for $\hat{\alpha}$ over $\tilde{\alpha}$ and our choice of a conservative position regarding this trade-off seem appropriate, because we expect that simple factor models are used as a shrinkage target in practice and those models are neither likely to nor meant to provide a complete description of the observed data.

Proof

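In Case 1, rewrite $T\hat{\alpha}$ as

\[
T\hat{\alpha} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} p_{ij} - \sum_{i=1}^{N}\sum_{j=1}^{N} r_{ij}}{(1/T)\sum_{i=1}^{N}\sum_{j=1}^{N} h_{ij} + \sum_{i=1}^{N}\sum_{j=1}^{N} c_{ij}}.
\]

Then the stated result follows from $p_{ij} \to_p \pi_{ij}$ and $c_{ij} \to_p \gamma_{ij}$ (Ledoit and Wolf (2003), Lemmas 1 and 3), and Lemmas 1 and 2.

In Case 2, it follows from $p_{ij} \to_p \pi_{ij}$ and Lemmas 1 and 2 that

\[
\hat{\alpha} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} p_{ij} - \sum_{i=1}^{N}\sum_{j=1}^{N} r_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{N} h_{ij} + T\sum_{i=1}^{N}\sum_{j=1}^{N} c_{ij}} = \frac{\pi - \rho + o_p(1)}{\eta + o_p(1) + \sum_{i=1}^{N}\sum_{j=1}^{N} T c_{ij}}.
\]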

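We proceed to derive the asymptotic distribution of $T c_{ij} = T (f_{ij} - s_{ij})^2$. Recall $c_{ij} = 0$ for $i = j$. For $i \neq j$, we have, from the definition of $s_{ij}$ and (5.7),

\[
s_{ij} = T^{-1} R_{\cdot i}' M R_{\cdot j}, \qquad f_{ij} = T^{-1} R_{\cdot i}' M X (X' M X)^{-1} X' M R_{\cdot j}. \tag{5.10}
\]

Define $\varepsilon_{\cdot i} = (\varepsilon_{1i}, \dots, \varepsilon_{Ti})'$, and rewrite the model (5.2) as $R_{\cdot i} = \mu_i 1 + X\beta_i + \varepsilon_{\cdot i}$ for $i = 1, \dots, N$. Substituting this into (5.10), we can express the difference between $f_{ij}$ and $s_{ij}$ as

\[
\begin{aligned}
f_{ij} - s_{ij} &= T^{-1} (X\beta_i + \varepsilon_{\cdot i})' M X (X' M X)^{-1} X' M (X\beta_j + \varepsilon_{\cdot j}) - T^{-1} (X\beta_i + \varepsilon_{\cdot i})' M (X\beta_j + \varepsilon_{\cdot j}) \\
&= T^{-1} \varepsilon_{\cdot i}' M X (X' M X)^{-1} X' M \varepsilon_{\cdot j} - T^{-1} \varepsilon_{\cdot i}' M \varepsilon_{\cdot j} \\
&= T^{-1} (T^{-1/2} \varepsilon_{\cdot i}' M X)(T^{-1} X' M X)^{-1} (T^{-1/2} X' M \varepsilon_{\cdot j}) - T^{-1} \varepsilon_{\cdot i}' M \varepsilon_{\cdot j}.
\end{aligned}
\]

Since $T^{-1/2} \varepsilon_{\cdot i}' M X = T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti} X_t - (T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti})(T^{-1} \sum_{t=1}^{T} X_t) = O_p(1)$, $T^{-1} X' M X \to_p \sigma_{xx}$, and $T^{-1/2} \varepsilon_{\cdot i}' M \varepsilon_{\cdot j} = T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti}\varepsilon_{tj} - (T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti})(T^{-1} \sum_{t=1}^{T} \varepsilon_{tj}) = T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti}\varepsilon_{tj} + o_p(1)$, it follows that

\[
\sqrt{T} (f_{ij} - s_{ij}) = -T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti}\varepsilon_{tj} + o_p(1).
\]

Since $\varepsilon_{ti}\varepsilon_{tj}$ is iid with mean 0 and finite variance, the $(N^2 - N) \times 1$ vector $\{T^{-1/2} \sum_{t=1}^{T} \varepsilon_{ti}\varepsilon_{tj}\}_{i,j=1,\dots,N,\, i \neq j}$ converges in distribution to a normally distributed random vector, and hence so does $\{\sqrt{T}(f_{ij} - s_{ij})\}_{i \neq j}$. □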
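
The key approximation in the proof can be checked numerically. The sketch below simulates a Case 2 setting (returns generated by the factor model with mutually independent errors) and compares √T (f_ij − s_ij) with −T^(−1/2) Σ_t ε_ti ε_tj for one pair of assets. The dimensions, loadings and seed are hypothetical choices made only for illustration.

```python
import numpy as np


def scaled_gap(T, N=4, K=3, seed=0):
    """Return sqrt(T)*(f_ij - s_ij) and -T^{-1/2} sum_t e_ti e_tj for (i, j) = (0, 1)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, K))            # factors
    beta = rng.standard_normal((K, N))         # loadings (hypothetical values)
    eps = rng.standard_normal((T, N))          # errors, independent across assets
    R = 0.5 + X @ beta + eps                   # returns generated by the factor model

    Rc = R - R.mean(axis=0)                    # M R: demeaned returns
    Xc = X - X.mean(axis=0)                    # M X: demeaned factors
    S = Rc.T @ Rc / T                          # s_ij as in (5.10)
    B = np.linalg.solve(Xc.T @ Xc, Xc.T @ Rc)  # (X'MX)^{-1} X'MR
    F = Rc.T @ Xc @ B / T                      # f_ij as in (5.10), used for i != j

    lhs = np.sqrt(T) * (F[0, 1] - S[0, 1])
    rhs = -(eps[:, 0] * eps[:, 1]).sum() / np.sqrt(T)
    return lhs, rhs


for T in (200, 2000, 20000):
    lhs, rhs = scaled_gap(T)
    print(T, lhs, rhs, abs(lhs - rhs))
```

The gap between the two quantities is the o_p(1) remainder, so it should become small as T grows, while each quantity itself remains of order one.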

The following theorem is a simple consequence of Lemma 3:

Theorem 3 Define the shrinkage covariance matrix estimate $\hat{\Sigma}$ as

\[
\hat{\Sigma} = \hat{\alpha} F + (1 - \hat{\alpha}) S.
\]

Then $\hat{\Sigma} \to_p \Sigma$ as T → ∞, because if Φ ≠ Σ then $\hat{\alpha} \to_p 0$ and if Φ = Σ then both F and S are consistent for Σ.

5.4 Simulation results with the shrinkage method

With the shrinkage covariance matrix estimate $\hat{\Sigma}$ constructed in the previous section, we define the shrinkage estimate of the second moment of the asset returns as

\[
\hat{G} = \hat{\Sigma} + \left(\frac{1}{T} R' 1\right)\left(\frac{1}{T} R' 1\right)'.
\]

In this section, we examine the finite sample performance of the HJ-distance test when the inverse of Gˆ is used as the weighting matrix. The other settings of the Monte Carlo experiments are the same as in Section 2. In order to avoid overshrinkage or negative shrinkage, we set 0 and 1 as the lower and upper bound for αˆ.
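
The construction of the weighting matrix can be sketched as follows. This is only an illustration under stated assumptions: the off-diagonal elements of F follow (5.10), the diagonal of F is set equal to that of S (consistent with c_ij = 0 for i = j), and the intensity `alpha` is supplied by the caller rather than estimated. The function names are hypothetical.

```python
import numpy as np


def factor_target(R, X):
    """Return (F, S): factor-model target and sample covariance, both with 1/T scaling."""
    T = R.shape[0]
    Rc = R - R.mean(axis=0)                    # demeaned returns, T x N
    Xc = X - X.mean(axis=0)                    # demeaned factors, T x K
    S = Rc.T @ Rc / T
    B = np.linalg.solve(Xc.T @ Xc, Xc.T @ Rc)  # OLS slopes, K x N
    F = B.T @ (Xc.T @ Xc / T) @ B              # off-diagonal elements as in (5.10)
    np.fill_diagonal(F, np.diag(S))            # assumed: f_ii = s_ii
    return F, S


def shrinkage_second_moment(R, X, alpha):
    """G_hat = Sigma_hat + mean*mean', with alpha truncated to [0, 1]."""
    alpha = float(np.clip(alpha, 0.0, 1.0))    # avoid over- or negative shrinkage
    F, S = factor_target(R, X)
    Sigma_hat = alpha * F + (1.0 - alpha) * S
    m = R.mean(axis=0)                         # (1/T) R'1
    return Sigma_hat + np.outer(m, m)
```

With `alpha = 0` the function returns the usual sample second moment R'R/T, so the unshrunk HJ-distance weighting matrix is a special case.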

Table 5.3 reports the rejection frequencies of the HJ-distance test with Gˆ. Compared with Table 5.1, we find that the rejection frequencies improve in all cases. For example, for the Simple model and Fama-French model with 25 portfolios, the rejection frequency of the HJ-distance test in Table 5.1 is more than twice the nominal level for T = 160 and 330, whereas the rejection frequencies in Table 5.3 are close to the nominal level for all T. With 100 portfolios, the HJ-distance test with Gˆ still tends to overreject the correct null, but the degree of overrejection is much smaller than in Table 5.1.

Kan and Zhou (2004) derive the exact distribution of the HJ-distance under the normality assumption. Their Tables I and III report the rejection frequency of the asymptotic HJ-distance test and that of the feasible version of their exact test, respectively. We compare their Table III with the results for the Fama-French model in our Table 5.3.² With 25 portfolios, the HJ-distance test with shrinkage performs as well as the exact test, and the actual sizes of both tests are close to the nominal size. With 100 portfolios, the exact test performs substantially better than the shrinkage version. This is probably due to a poor chi-squared approximation. However, very few applications use as many as 100 portfolios; of the applications of the HJ-distance tests surveyed in the Introduction, all of them but Jagannathan and Wang (1996) use fewer than 25 portfolios. Therefore, we may conclude that the HJ-distance test with shrinkage performs as well as the exact test for most portfolio sizes of practical interest.

Table 5.4 reports the summary statistics of the estimated optimal shrinkage intensity αˆ. Figure B.13 shows the kernel density estimate of αˆ for the Simple model. This corresponds to the case where Φ = Σ in Lemma 3. The results with the other factor models are similar and thus not reported here. From Table 5.4 and Figure B.13, we can see that αˆ is concentrated between 0.8 and 1, so the estimated covariance matrices are much closer to F than to the sample covariance matrix when Φ = Σ. Figure B.14 shows the kernel density estimate of αˆ when the data are simulated from the Simple model with 100 portfolios, but only two of the three factors are used in constructing F. This corresponds to the case where Φ ≠ Σ in Lemma 3. Figure B.14 shows that αˆ is converging to zero, corroborating Lemma 3.

2 Although Kan and Zhou (2004) describe their factors as the Premium-Labor factors when K = 3, the rejection frequencies of the asymptotic test in their Table I for K = 3 are too small compared with the results with the Premium-Labor model in our Table 5.1 and Table 3 of Ahn and Gadarowski (2004). In fact, their results for K = 3 in Table I are more compatible with our results with the Fama-French model.

Table 5.3: Rejection frequencies of the specification test using the HJ-distance with shrinkage estimation of G

Number of Observations              T=160    T=330    T=700
(A) Simple Model
    25 Portfolios        1%           1.6      1.3      0.8
                         5%           6.6      6.8      5.4
                         10%         13.4     12.8     10.4
    100 Portfolios       1%           3.9      1.3      0.9
                         5%          15.1      7.2      5.3
                         10%         28.4     13.7     11.1
(B) Fama-French Model
    25 Portfolios        1%           1.3      1.2      0.7
                         5%           5.8      5.1      4.0
                         10%          9.9      9.8      9.6
    100 Portfolios       1%          23.3      7.7      2.8
                         5%          49.9     22.9     11.2
                         10%         64.3     33.6     20.5
(C) Premium-Labor Model
    25 Portfolios        1%           6.6      9.7      5.4
                         5%          18.8     20.8     13.5
                         10%         28.7     32.4     23.4
    100 Portfolios       1%          31.5     19.5     11.2
                         5%          59.0     42.4     28.8
                         10%         73.0     58.4     40.0

This table shows the rejection rates (in percent) over 1000 trials based on the p-value of the HJ-distance test, with the weighting matrix G estimated by the shrinkage method, which averages the sample covariance and the structured covariance with an optimal weight.

One important feature of the shrinkage method is that it provides a better estimate of the HJ-distance itself. Table 5.5 reports the MSE of the HJ-distance with two estimates of G relative to the HJ-distance computed with the true value of G. The MSE of the HJ-distance with shrinkage is substantially smaller than, and in most cases less than half of, the MSE of the HJ-distance with the sample covariance. Therefore, the shrinkage method provides a more accurate comparison of the HJ-distance across different models. Note that this feature is not present with the exact distribution approach.
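
To make the role of the weighting matrix concrete, the sketch below computes the sample HJ-distance of a linear SDF for a given G. It assumes gross returns and the standard linear specification m_t = Y_t'λ with Y_t = (1, factors)'; the definition used elsewhere in the thesis lies outside this section, so the function below is an illustration rather than the exact procedure behind Table 5.5. All names are hypothetical.

```python
import numpy as np


def hj_distance(R, Y, G=None):
    """Sample HJ-distance of the linear SDF m_t = Y_t' lam for gross returns R (T x N).

    Y is T x (K+1) with a leading column of ones. G is the N x N weighting
    matrix; by default the sample second moment R'R/T is used.
    """
    T, N = R.shape
    if G is None:
        G = R.T @ R / T
    D = R.T @ Y / T                              # sample E[R Y'], N x (K+1)
    ones = np.ones(N)
    Ginv_D = np.linalg.solve(G, D)
    Ginv_1 = np.linalg.solve(G, ones)
    lam = np.linalg.solve(D.T @ Ginv_D, D.T @ Ginv_1)  # minimizes g' G^{-1} g
    g = D @ lam - ones                           # sample pricing errors E[m R] - 1
    delta2 = g @ np.linalg.solve(G, g)
    return float(np.sqrt(max(delta2, 0.0))), lam
```

Passing a shrinkage estimate of the second moment as `G` instead of the default changes only the weighting, not the moment conditions, which is why the same function can be used to compare the two estimates of G.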

We are also interested in the sensitivity of the shrinkage method to the overspecification and/or underspecification of the factor model used in constructing F. Tables 5.6 and 5.7 report the results of the following simulation experiment. For each model, we conduct the HJ-distance test as in Table 5.3 but we use an overspecified or underspecified factor model to estimate the factor model (5.1) and construct the shrinkage target F. For the overspecified case, we generate two additional factors with the same statistical properties as the original factors, and use the five-factor model to estimate F. In the underspecified case, we pick two arbitrary factors from the original three factors, and use the two-factor model to estimate F. In the overspecified case, F is still consistent for Σ but suffers from extra sampling error, while in the underspecified case F is inconsistent for Σ and the shrinkage estimate should converge to the sample covariance.

Table 5.4: Summary statistics of αˆ

Number of Observations                        T=160     T=330     T=700
(A) Simple Model
    25 Portfolios      mean                  0.8290    0.8757    0.8981
                       standard deviation    0.1287    0.1040    0.0895
    100 Portfolios     mean                  0.8180    0.8722    0.8951
                       standard deviation    0.0953    0.0679    0.0532
(B) Fama-French Model
    25 Portfolios      mean                  0.9280    0.9462    0.9605
                       standard deviation    0.0780    0.0631    0.0533
    100 Portfolios     mean                  0.6324    0.6443    0.6514
                       standard deviation    0.0909    0.0875    0.0844
(C) Premium-Labor Model
    25 Portfolios      mean                  0.8120    0.8152    0.8199
                       standard deviation    0.0832    0.0780    0.0753
    100 Portfolios     mean                  0.7304    0.7326    0.7340
                       standard deviation    0.0297    0.0238    0.0206

This table shows the mean and the standard deviation of the estimated optimal shrinkage intensity αˆ for each model.

Table 5.5: The mean squared error of the HJ-distance from two estimation methods of G

Number of Observations                        T=160      T=330            T=700
(A) Simple Model
    25 Portfolios      sample covariance      0.0084     0.0017           0.00046
                       shrinkage estimation   0.0033     0.0009           0.00032
    100 Portfolios     sample covariance      0.5630     0.0394           0.0040
                       shrinkage estimation   0.0424     0.0067           0.0009
(B) Fama-French Model
    25 Portfolios      sample covariance      0.00440    4.9618 × 10^-4   6.4341 × 10^-5
                       shrinkage estimation   0.00098    1.1522 × 10^-4   1.4838 × 10^-5
    100 Portfolios     sample covariance      0.5606     0.0385           0.0036
                       shrinkage estimation   0.0511     0.0077           0.0009
(C) Premium-Labor Model
    25 Portfolios      sample covariance      0.0049     7.4934 × 10^-4   1.1823 × 10^-4
                       shrinkage estimation   0.0009     1.6685 × 10^-4   2.9242 × 10^-5
    100 Portfolios     sample covariance      0.5139     0.0366           0.0038
                       shrinkage estimation   0.0331     0.0054           0.0007

This table compares the HJ-distances when the sample covariance or the shrinkage covariance is used as the weighting matrix. The reported values are mean squared errors.

Table 5.6 reports the results with the overspecified target factor model. The rejection frequencies reported in Table 5.6 are close to those in Table 5.3, and using an overspecified shrinkage target causes little deterioration in the performance of the HJ-distance test. On the other hand, the results with the underspecified target factor model reported in Table 5.7 are substantially worse than those in Table 5.3, except for the Premium-Labor model. However, they are still better than those in Table 5.1, in particular with the Fama-French and Premium-Labor models. Therefore, when conducting the HJ-distance test, a researcher can benefit significantly from using the shrinkage method with a possibly overspecified shrinkage target to estimate G.

5.5 Conclusion

The HJ-distance test rejects correct SDFs too often in finite samples, which limits its practical use. We find that one reason for this phenomenon is a poorly estimated covariance matrix of the asset returns. We propose to use the shrinkage method to construct an improved estimate of this matrix.

The sample covariance matrix is often used to estimate the covariance matrix of asset returns. When the number of portfolios is large, however, this estimate suffers from a large estimation error. The shrinkage method uses another estimate that imposes some structure onto this high-dimensional estimation problem, and combines it optimally with the sample covariance matrix. Our simulation results show that the shrinkage method significantly mitigates the overrejection problem of the HJ-distance test.

Table 5.6: Rejection frequencies of the HJ-distance test using five factors to estimate G

Number of Observations              T=160    T=330    T=700
(A) Simple Model
    25 Portfolios        1%           1.3      1.3      0.8
                         5%           6.6      6.4      5.3
                         10%         12.8     12.3     10.4
    100 Portfolios       1%           3.5      1.3      0.9
                         5%          14.9      7.1      5.2
                         10%         28.0     13.2     11.0
(B) Fama-French Model
    25 Portfolios        1%           1.2      1.2      0.7
                         5%           5.4      4.9      3.9
                         10%          9.6      9.6      9.4
    100 Portfolios       1%          23.4      7.8      2.5
                         5%          49.7     22.3     10.8
                         10%         63.0     33.7     20.4
(C) Premium-Labor Model
    25 Portfolios        1%           6.3      9.6      5.4
                         5%          17.9     20.5     13.4
                         10%         28.8     31.4     23.2
    100 Portfolios       1%          28.5     21.0     11.6
                         5%          55.5     44.9     30.0
                         10%         70.4     57.0     42.0

This table shows the rejection rates (in percent) over 1000 trials based on the p-value of the HJ-distance test. We simulate the factors and returns in the same way as in Table 5.1, but we use two more factors in the structured model. They are simulated with the same statistical properties as the first two of the original three factors.

Table 5.7: Rejection frequencies of the HJ-distance test using two factors to estimate G

Number of Observations              T=160    T=330    T=700
(A) Simple Model
    25 Portfolios        1%           4.2      1.8      1.2
                         5%          13.1      9.8      7.1
                         10%         23.0     17.2     11.8
    100 Portfolios       1%          97.2     45.0     11.0
                         5%          99.6     69.8     28.4
                         10%         99.9     81.3     41.4
(B) Fama-French Model
    25 Portfolios        1%           4.0      1.9      1.1
                         5%          11.7      7.5      5.6
                         10%         18.6     14.8     10.3
    100 Portfolios       1%          17.3      5.4      2.3
                         5%          39.3     17.7     10.0
                         10%         54.4     29.4     17.8
(C) Premium-Labor Model
    25 Portfolios        1%           6.6      9.6      5.5
                         5%          18.3     20.7     13.4
                         10%         28.7     32.1     23.2
    100 Portfolios       1%          32.5     20.1     10.9
                         5%          60.3     42.7     28.2
                         10%         73.8     58.3     39.6

This table shows the rejection rates (in percent) over 1000 trials based on the p-value of the HJ-distance test. We simulate the factors and returns in the same way as in Table 5.1, but we use only the first two factors in the structured model.

Chapter 6

Conclusion

In this thesis, I first propose to combine projection pursuit regression with local polynomial regression to estimate the state-price densities in the interest rate option market. This method does not rely on any assumption about the dynamics of the interest rate, and alleviates the problem of high-dimensional estimation in nonparametric methods. Compared with alternative estimators, the estimated SPDs in this thesis are more reliable and robust. This method is not restricted to the interest rate markets and can be adapted to estimate partial derivatives in other fields. For example, it can be used to estimate the delta, gamma, theta, rho and vega of options for hedging purposes. As an illustration, this thesis also examines delta hedging of interest rate options. I use three different models to estimate the delta values and two different hedging tools to conduct the hedging. I compare the hedging performances in simulations and in empirical data. I find that my model provides more accurate estimates of the delta values of the interest rate options, and that using interest rate options as the hedging tool can be more effective than using other interest rate derivatives.

Besides the SPD, the stochastic discount factor (SDF) is widely used in asset pricing, and the HJ-distance test is designed to test the validity of an SDF. However, the HJ-distance test rejects correct SDFs too often in finite samples, which limits its practical use. I find that one reason for this phenomenon is a poorly estimated covariance matrix of the asset returns. I propose to use the shrinkage method to construct an improved estimate of this matrix. The sample covariance matrix is often used to estimate the covariance matrix of asset returns. When the number of portfolios is large, however, this estimate suffers from a large estimation error. The shrinkage method uses another estimate that imposes some structure onto this high-dimensional estimation problem, and combines it optimally with the sample covariance matrix. My simulation results show that the shrinkage method significantly mitigates the overrejection problem of the HJ-distance test.

A few questions remain to be addressed in future research. First, it would be a vast improvement if I could come up with a fast and convenient way to find the projection direction. Second, in this thesis I project high-dimensional regressors using a linear combination. Other methods of projection are also worth examining. A linear combination is easy to compute, but it does not allow much variety in the regressors. It covers some properties of the shape of the original functions. Third, it would be interesting to test the hypothesis that some other variables may be potential factors affecting the SPDs of the interest rate options.

For the HJ-distance test, future research can proceed in the following directions. First, the shrinkage method mitigates but does not completely solve the overrejection problem of the HJ-distance test, in particular when the portfolio size is as large as 100. A further improvement would be desirable. Second, it would be interesting to investigate how to choose the shrinkage target optimally and how to obtain a better estimate of the optimal shrinkage intensity. Third, the estimation of the covariance matrix plays an important role in many tests in empirical finance. It would be worthwhile to examine whether the method proposed in this thesis can improve the finite sample properties of those tests.

Bibliography

[1] M. Adler and B. Dumas, International portfolio choice and corporate finance: a synthesis, Journal of Finance, 38, 925-984, 1983.

[2] S.C. Ahn and C. Gadarowski, Small sample properties of the GMM specification test based on the Hansen-Jagannathan distance, Journal of Empirical Finance, 11, 109-132, 2004.

[3] Y. Aït-Sahalia, Nonparametric pricing of interest rate derivative securities, Econometrica, 64, 527-560, 1996.

[4] Y. Aït-Sahalia and J. Duarte, Nonparametric option pricing under shape restrictions, Journal of Econometrics, 116, 9-47, 2003.

[5] Y. Aït-Sahalia and A. Lo, Nonparametric estimation of state-price densities implicit in financial asset prices, Journal of Finance, 53, 499-547, 1998.

[6] Y. Aït-Sahalia and A. Lo, Nonparametric risk management and implied risk aversion, Journal of Econometrics, 94, 9-51, 2000.

[7] G. Bakshi and N. Kapadia, Delta-hedged gains and negative market volatility risk premium, The Review of Financial Studies, 16, 527-566, 2003.

[8] R. Bansal, D.A. Hsieh and S. Viswanathan, A new approach to international arbitrage pricing, Journal of Finance, 48, 1719-1747, 1993.

[9] R. Bansal and H. Zhou, Term structure of interest rate with regime shifts, Journal of Finance, 57, 1997-2043, 2002.

[10] F. Black, Capital market equilibrium with restricted borrowing, Journal of Business, 45, 444-54, 1972.

[11] F. Black and M. Scholes, The pricing of options and corporate liabilities, Journal of Political Economy, 81, 637-659, 1973.

[12] D.T. Breeden, An intertemporal asset pricing model with stochastic consumption and investment opportunities, Journal of Financial Economics, 7, 265-296, 1979.

[13] D. Breeden and R. Litzenberger, Prices of state-contingent claims implicit in option prices, Journal of Business, 51, 621-651, 1978.

[14] M. Broadie, J. Detemple, E. Ghysels and O. Torrès, Nonparametric estimation of American options exercise boundaries and call prices, Journal of Economic Dynamics and Control, 24, 1829-1857, 2000a.

[15] M. Broadie, J. Detemple, E. Ghysels and O. Torrès, American options with stochastic dividends and volatility: a nonparametric investigation, Journal of Econometrics, 94, 53-92, 2000b.

[16] C. Burnside and M. Eichenbaum, Small-sample properties of GMM-based Wald tests, Journal of Business & Economic Statistics, 7, 265-296, 1996.

[17] A. Buraschi and A. Jiltsov, Model uncertainty and option markets with heterogeneous beliefs, Journal of Finance, 61, 2841-2897, 2006.

[18] J. Y. Campbell and J.H. Cochrane, Explaining the poor performance of consumption-based asset pricing models, Journal of Finance, 55, 2863-2878, 2000.

[19] G. Chacko and S. Das, Pricing interest rate derivatives: a general approach, The Review of Financial Studies, 14, 195-241, 2002.

[20] D.A. Chapman, Approximating the asset pricing kernel, Journal of Finance, 52, 1383-1410, 1997.

[21] N.F. Chen, R. Roll and S.A. Ross, Economic forces and the stock market, Journal of Business, 59, 383-403, 1986.

[22] L. Clewlow and S. Hodges, Optimal delta hedging under transactions costs, Journal of Economic Dynamics and Control, 21, 1353-1376, 1997.

[23] R. Cont, Empirical properties of asset returns, stylized facts and statistical issues, Quantitative Finance, 1, 223-236, 2001.

[24] R. Cont and J. da Fonseca, Dynamics of implied volatility surfaces, Quantitative Finance, 2, 45-60, 2002.

[25] G. Courtadon, The pricing of options on default-free bonds, Journal of Financial and Quantitative Analysis, 17, 75, 1982.

[26] J. Cox, J. Ingersoll and S. Ross, A theory of the term structure of interest-rates, Econometrica, 53(2), 385-407, 1985.

[27] S. Crepey, Delta-hedging vega risk?, Quantitative Finance, 4, 559-579, 2004.

[28] T. Daglish, A pricing and hedging comparison of parametric and non-parametric approaches for American index options, Journal of Financial Econometrics, 1(3), 327-364, 2003.

[29] R.F. Dittmar, Nonlinear pricing kernels, kurtosis preference, and evidence from the cross section of equity returns, Journal of Finance, 57, 369-403, 2002.

[30] D. Duffie and R. Kan, A yield-factor model of interest rates, Mathematical Finance, 6, 379-406, 1996.

[31] E.F. Fama and K.R. French, The cross-section of expected stock returns, Journal of Finance, 47, 427-466, 1992.

[32] E.F. Fama and K.R. French, Multifactor explanations of asset pricing anomalies, Journal of Finance, 51, 55-84, 1996.

[33] J. Fan, A Selective Overview of Nonparametric Methods in Financial Econometrics, Statistical Science, 20, 317-227, 2005.

[34] J. Fan and I. Gijbels, Local polynomial modelling and its applications, Chapman & Hall, 1996.

[35] J. Fink, An examination of the effectiveness of static hedging in the presence of , The Journal of Futures Markets, 23, 859-890, 2003.

[36] R. Garcia and R. Gencay, Pricing and hedging derivative securities with neural networks and a homogeneity hint, Journal of Econometrics, 94, 93-115, 2000.

[37] T. Gasser, H.-G. Müller and V. Mammitzsch, Kernels for nonparametric curve estimation, Journal of the Royal Statistical Society, 47, 238-252, 1985.

[38] E. Ghysels, V. Patilea, E. Renault and O. Torres, Nonparametric methods and option pricing, Discussion paper, 9775, 1997.

[39] A. Gupta and M. Subrahmanyam, An empirical examination of the convexity bias in the pricing of interest rate swaps, Journal of Financial Economics, 55, 239-279, 2000.

[40] P. Hall, On Projection Pursuit Regression, The Annals of Statistics, 17, 573-588, 1989.

[41] L.P. Hansen, Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054, 1982.

[42] L.P. Hansen and R. Jagannathan, Assessing specific errors in stochastic discount factor models, Journal of Finance, 52, 557-590, 1997.

[43] W. Härdle, T. Kleinow and G. Stahl, Applied Quantitative Finance: Theory and Computational Tools, Springer, Berlin, 2002.

[44] R.J. Hodrick and X.Y. Zhang, Evaluating the specification errors of asset pricing models, Journal of Financial Economics, 62, 327-376, 2001.

[45] J.Z. Huang and L. Wu, Specification analysis of option pricing models based on time-changed Lévy process, Journal of Finance, 59, 1405-1439, 2004.

[46] J. Hutchinson, A. Lo and T. Poggio, A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks, Journal of Finance, 49, 851-889, 1994.

[47] J. Jackwerth and M. Rubinstein, Recovering probability distributions from option prices, Journal of Finance, 51, 1611-1632, 1996.

[48] K. Jacobs and K.Q. Wang, Idiosyncratic consumption risk and the cross section of asset returns, Journal of Finance, 59, 2211-2252, 2004.

[49] R. Jagannathan and Z. Wang, The conditional CAPM and the cross-section of expected returns, Journal of Finance, 51, 3-53, 1996.

[50] R. Jagannathan and Z. Wang, An asymptotic theory for estimating beta-pricing models using cross-sectional regression, Journal of Finance, 53, 1285-1309, 1998.

[51] R. Jagannathan and Z. Wang, Empirical evaluation of asset-pricing models: A comparison of the SDF and beta methods, Journal of Finance, 57, 2337-2367, 2002.

[52] R. Jarrow, H. Li and F. Zhao, Interest rate caps smile too! But can the market models capture the smile?, Journal of Finance, 62, 345-382, 2007.

[53] J.D. Jobson and B. Korkie, Estimation for Markowitz efficient portfolios, Journal of the American Statistical Association, 75, 544-554, 1980.

[54] R. Kan and C. Zhang, GMM tests of stochastic discount factor models with useless factors, Journal of Financial Economics, 54, 103-127, 1999.

[55] R. Kan and G. Zhou, Hansen-Jagannathan Distance: Geometry and Exact Distribution, Working paper, University of Toronto, 2004.

[56] O. Ledoit and M. Wolf, Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, Journal of Empirical Finance, 10, 603-621, 2003.

[57] M. Lettau and S. Ludvigson, Resurrecting the (C)CAPM: a cross-sectional test when risk premia are time-varying, Journal of Political Economy, 109, 1238-1287, 2001.

[58] J. Lintner, The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, Review of Economics and Statistics, 47, 13-37, 1965.

[59] R. Merton, Rational theory of option pricing, Bell Journal of Economics and Management Science, 4, 141-183, 1973.

[60] J. Parker and C. Julliard, Consumption risk and the cross section of expected returns, Journal of Political Economy, 113, 185-222, 2005.

[61] S. Ross, The arbitrage theory of capital asset pricing, Journal of Economic Theory, 13, 341-360, 1976.

[62] P. Sercu and X. Wu, Cross- and delta-hedges: Regression- versus price-based hedge ratios, Journal of Banking & Finance, 24, 737-757, 2000.

[63] J. Shanken, On the estimation of beta-pricing models, Review of Financial Studies, 5, 1-34, 1992.

[64] A. Shapiro, The investor recognition hypothesis in a dynamic equilibrium: theory and evidence, The Review of Financial Studies, 15, 97-141, 2002.

[65] W.F. Sharpe, Capital asset prices: a theory of market equilibrium under conditions of risk, Journal of Finance, 19, 425-42, 1964.

[66] C. Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, Berkeley, CA: Proceedings of the Third Berkeley Symposium on Mathematical and Statistical Probability, University of California, Berkeley, 1956.

[67] O. Vasicek, Equilibrium characterization of term structure, Journal of Financial Economics, 5, 177, 1977.

[68] M. Vassalou, News related to future GDP growth as a risk factor in equity returns, Journal of Financial Economics, 68, 47-73, 2003.

[69] M. Vassalou and Y.H. Xing, Default risk in equity returns, Journal of Finance, 59, 831-868, 2004.

[70] M. Villaverde, Hedging European and barrier options using stochastic optimization, Quantitative Finance, 4, 549-557, 2000.

[71] H. Windcliff, P. Forsyth and K. Vetzal, Pricing methods and hedging strategies for volatility derivatives, Journal of Banking & Finance, 30, 409-431, 2006.

[72] A. Yatchew and W. Härdle, Nonparametric state price density estimation using constrained least squares and the bootstrap, Journal of Econometrics, 133, 579-599, 2006.

Appendix A

A.1 Mathematical details for Chapter 2 and Chapter 3

A.1.1 Framework for Local Polynomial Regression

A.1.1.1 Estimate β

Suppose I have already determined the first $l - 1$ terms of $\{\hat{a}_j\}$ and $\{\hat{g}_j\}$, and define

\[
\hat{r}_i = Y_i - \sum_{j=1}^{l-1} \hat{g}_j(X_i \hat{a}_j).
\]

For each given estimator $\hat{a}_l$, define $\hat{u}_i = X_i \hat{a}_l$. Then I have the bivariate data $(\hat{u}_i, \hat{r}_i)$. Where no ambiguity arises, I use the notation $(u_i, r_i)$ in place of $(\hat{u}_i, \hat{r}_i)$ in the appendix. I assume $r_i = z(u_i) + \sigma(u_i)\varepsilon_i$, where $E(\varepsilon_i) = 0$ and $Var(\varepsilon_i) = 1$. My interest is to estimate the regression function $z(u_0) = E(r | u = u_0)$ and its $v$-th order derivative $z^{(v)}(u_0)$. Local polynomial regression is set up as:

\[
\min_{\beta} \sum_{i=1}^{n} \Big\{ r_i - \sum_{p=0}^{P} \beta_p (u_i - u_0)^p \Big\}^2 K_h(u_i - u_0), \tag{A.1}
\]

where $K_h$ is a kernel function assigning weights to each datum point and $K_h(\cdot) = K(\cdot/h)/h$. $h$ is a bandwidth controlling the size of the local neighborhood. The optimal polynomial order $P$ is chosen to be $v + 1$.¹

It is more convenient to write local polynomial regression in a matrix form. Denote

\[
U = \begin{pmatrix} 1 & (u_1 - u_0) & \cdots & (u_1 - u_0)^P \\ \vdots & \vdots & & \vdots \\ 1 & (u_n - u_0) & \cdots & (u_n - u_0)^P \end{pmatrix}, \qquad R = \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix}.
\]

1 See Fan and Gijbels (1996).

Further, let $W$ be the $n \times n$ diagonal matrix of the weights:

\[
W = \mathrm{diag}\{K_h(u_i - u_0)\}.
\]

Then local polynomial regression becomes

\[
\min_{\beta} (R - U\beta)' W (R - U\beta)
\]

with $\beta = (\beta_0, \beta_1, \cdots, \beta_P)'$. The solution vector is $\hat{\beta} = (U' W U)^{-1} U' W R$.
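
The weighted least squares solution above can be sketched in a few lines. The function name is hypothetical and the Gaussian kernel is assumed; the code is an illustration of the formula, not the implementation used for the results in this thesis.

```python
import numpy as np


def local_poly_fit(u, r, u0, h, P=1):
    """Solve (A.1) at the point u0 and return beta_hat = (U'WU)^{-1} U'WR."""
    u = np.asarray(u, dtype=float)
    r = np.asarray(r, dtype=float)
    d = u - u0
    U = np.vander(d, P + 1, increasing=True)                 # columns 1, d, ..., d^P
    w = np.exp(-0.5 * (d / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)  # K_h(u_i - u0)
    UtW = U.T * w                                            # U'W without forming the n x n matrix
    return np.linalg.solve(UtW @ U, UtW @ r)


# Example: a noiseless quadratic is reproduced exactly when P = 2.
u = np.linspace(0.0, 1.0, 50)
r = 1.0 + 2.0 * u + 3.0 * u ** 2
beta = local_poly_fit(u, r, u0=0.5, h=0.2, P=2)
print(beta)   # beta[0] estimates z(u0); v! * beta[v] estimates the v-th derivative
```

Since the fitted polynomial is a Taylor expansion around u0, the v-th derivative of z at u0 is estimated by v! times the (v+1)-th element of the solution vector.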

A.1.1.2 Kernel and Bandwidth Selection

The local neighborhood is determined by a bandwidth $h$ and a kernel function $K$. A bandwidth of $h = 0$ amounts to interpolating the data points (the most complex model), while a bandwidth of infinity corresponds to fitting a polynomial of degree $P$ globally (the simplest model). The complexity of the model is thus controlled by the bandwidth. I choose a bandwidth which minimizes the weighted mean integrated squared error

\[
\int \Big\{ \big(E[\hat{z}^{(v)}(u_i)] - z^{(v)}(u_i)\big)^2 + Var[\hat{z}^{(v)}(u_i)] \Big\} w(u_i)\, du_i .
\]

It turns out that

\[
h_{opt} = C_{v,P} \left[ \frac{\int \sigma^2(u_i) w(u_i) / f(u_i)\, du_i}{n \int [z^{(P+1)}(u_i)]^2 w(u_i)\, du_i} \right]^{1/(2P+3)} .
\]

The constant $C_{v,P}$ depends on the choice of kernel and the order of the polynomial. For the Gaussian kernel, $K(u) = \exp(-u^2/2)/\sqrt{2\pi}$, which will be used throughout the paper, the related constants are $C_{0,1} = 0.776$, $C_{1,2} = 0.884$ and $C_{2,3} = 1.006$. The formula for the optimal bandwidth involves some unknown parameters, which I need to replace by their corresponding estimators. A simple way to do so is to fit a polynomial of order $P + 3$ globally to $z(u)$,

\[
\check{z}(u_i) = \check{\alpha}_0 + \cdots + \check{\alpha}_{P+3} u_i^{P+3}, \tag{A.2}
\]

and estimate $\check{\sigma}^2(u_i)$ and $\check{z}^{(P+1)}$ by OLS. For the global optimal bandwidth, a typical choice of weighting function would be $w(u) = w_0(u) f(u)$, where $w_0(u)$ is 1 for all $u$ between the mean of the $u_i$ minus 1.5 times the standard deviation of the $u_i$ and the mean plus 1.5 times the standard deviation, and 0 for $u$ outside this interval. In this case, $\int w_0(u)\, du = 3\sqrt{Var(u)}$, estimated by the sample moment. So the estimated global optimal bandwidth is

\[
\hat{h}_{opt} = C_{v,P} \left[ \frac{ssr \times \int w_0(u)\, du}{n \sum_{i=1}^{n} [\check{z}^{(P+1)}(u_i)]^2 w_0(u_i)} \right]^{1/(2P+3)},
\]

where $ssr$ is the sum of squared residuals from the regression (A.2). From the above formula, I can see that the optimal bandwidth depends not only on the data but also on the order of the derivative to be estimated. So even for the same data set, when I estimate the conditional moment and its partial derivatives, I should use different bandwidths. For convenience of notation, I use $h_0$, $h_1$ and $h_2$ to denote the optimal bandwidths for estimating the conditional moment and the first and the second order partial derivatives.
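
The plug-in rule above can be sketched as follows. The function and variable names are hypothetical, the constants are the Gaussian-kernel values quoted in the text, and P = v + 1 is imposed; this is an illustration of the formula only.

```python
import numpy as np

# Gaussian-kernel constants C_{v,P} quoted in the text
C_GAUSS = {(0, 1): 0.776, (1, 2): 0.884, (2, 3): 1.006}


def rot_bandwidth(u, r, v=0):
    """Estimated global bandwidth for the v-th derivative, with P = v + 1."""
    u = np.asarray(u, dtype=float)
    r = np.asarray(r, dtype=float)
    n = u.size
    P = v + 1
    coef = np.polyfit(u, r, P + 3)                  # global pilot fit (A.2)
    ssr = np.sum((r - np.polyval(coef, u)) ** 2)    # sum of squared residuals
    deriv = np.polyval(np.polyder(coef, P + 1), u)  # pilot (P+1)-th derivative at each u_i
    sd = u.std()
    w0 = (np.abs(u - u.mean()) <= 1.5 * sd).astype(float)  # indicator weight w_0
    int_w0 = 3.0 * sd                               # integral of w_0
    denom = n * np.sum(deriv ** 2 * w0)
    return C_GAUSS[(v, P)] * (ssr * int_w0 / denom) ** (1.0 / (2 * P + 3))


rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, 300)
r = np.sin(4.0 * u) + 0.1 * rng.standard_normal(300)
print(rot_bandwidth(u, r, v=0), rot_bandwidth(u, r, v=1))
```

Calling the function with different values of `v` on the same data returns different bandwidths, in line with the remark that the conditional moment and its derivatives call for separate bandwidths.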

A.1.2 Proofs

Fan and Gijbels (1996) and Hall (1989) have provided a number of theorems for LPR and PPR respectively. They are very helpful for the proofs of the asymptotic properties in this paper. I first summarize them briefly and state them as lemmas.

A.1.2.1 Notations

Recall the notations:

• $K$: the kernel used in LPR.

• $\mu_j = \int u^j K(u)\, du$; $\nu_j = \int u^j K^2(u)\, du$.

• Matrices: $B_{j,l} = (\mu_{j+l})$, $\tilde{B}_{j,l} = (\mu_{j+l+1})$, $B^{*} = (\nu_{j+l})$, where $0 \le j, l \le P$.

• Vectors: $c_p = (\mu_{p+1}, \cdots, \mu_{2p+1})'$, $\tilde{c}_p = (\mu_{p+2}, \cdots, \mu_{2p+2})'$.

A.1.2.2 Theorems in Hall (1989)

Suppose $X_i$ is an $R^k$-valued random variable and $r_i$ is an $R^1$-valued random variable, and the density function of $X$ is $\psi$. Denote $f(\cdot)$ as the p.d.f. of $u$. I have

\[
D(a) = E[\{r_i - g(X_i a)\}^2],
\]

\[
\hat{D}(a) = n^{-1} \sum_{i} (r_i - \hat{g}(X_i a))^2,
\]

and

\[
\hat{g}(u_0) = \frac{1}{n \tilde{h} \hat{f}(u_0)} \sum_{i=1}^{n} r_i \tilde{K}\left(\frac{u_0 - X_i a}{\tilde{h}}\right),
\]

where $u_0 = X_0 a$. $\tilde{K}$ and $\tilde{h}$ are the kernel and the bandwidth in estimating $\hat{g}$. I assume $\tilde{K}$ fulfills the condition that $\int_{-\infty}^{\infty} u^j \tilde{K}(u)\, du$ is equal to 1 for $j = 0$, and equal to 0 for $1 \le j \le \lambda - 1$. In other words, $\tilde{K}$ is assumed to have an order of $(0, \lambda)$.²

2 See the definition of the order of kernels in Gasser, Müller and Mammitzsch (1985).

Lemma 4 [Theorem 4.2 in Hall (1989)] Assume that the first three directional derivatives of $\psi$ and $z$ exist in $A$ and are continuous uniformly in $X \in R^k$ and in all directions; that $\psi$ vanishes outside a compact set, and is bounded away from 0 on $A$ for some $\epsilon > 0$; and that $\tilde{K}$ is Hölder continuous and compactly supported. Let $a_0$ give a local minimum of $D(a)$ and let $\hat{a}$ be a value of $a$ which minimizes $\hat{D}(a)$. Then $\hat{a}$ converges to $a_0$ at the rate $\tilde{h}^{\lambda}$.
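
For intuition, the empirical criterion $\hat{D}(a)$ can be sketched for two regressors as below. The sketch assumes that $\hat{f}$ is the usual kernel density estimate, so that $\hat{g}$ reduces to a kernel-weighted average, and it uses a Gaussian kernel and a simple grid search over directions; these choices and all function names are illustrative assumptions rather than Hall's (1989) exact setting.

```python
import numpy as np


def g_hat(u0, u, r, h):
    """Kernel regression estimate of E(r | u = u0) with a Gaussian kernel."""
    k = np.exp(-0.5 * ((u0[:, None] - u[None, :]) / h) ** 2)  # K((u0 - u_i)/h)
    return (k @ r) / k.sum(axis=1)                            # density estimate cancels n*h


def D_hat(a, X, r, h):
    """Empirical criterion n^{-1} sum_i (r_i - g_hat(X_i a))^2 for direction a."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)                                 # directions have unit length
    u = X @ a
    return np.mean((r - g_hat(u, u, r, h)) ** 2)


def best_direction_2d(X, r, h, n_grid=180):
    """Grid search for the minimizer of D_hat over directions in the plane."""
    angles = np.linspace(0.0, np.pi, n_grid, endpoint=False)
    cands = np.column_stack([np.cos(angles), np.sin(angles)])
    losses = [D_hat(a, X, r, h) for a in cands]
    return cands[int(np.argmin(losses))]
```

In a single-index example where r depends on X only through its first component, the criterion is expected to be smallest for directions close to (1, 0), up to sign.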

A.1.2.3 Theorems in Fan and Gijbels (1996)

Consider the bivariate data $(u_1, r_1), \ldots, (u_n, r_n)$.

Lemma 5 [Theorem 3.1 in Fan and Gijbels (1996)] Assume that $f(u_0) > 0$ and that $f(\cdot)$ and $z^{(P+1)}$ are continuous in a neighborhood of $u_0$. Further, assume that $h$ is the bandwidth in Equation (A.1), $h \to 0$ and $nh \to \infty$. Then the asymptotic conditional variance of $\hat{z}_v(u_0)$ is given by

$$Var\{\hat{z}_v(u_0)\} = e_{v+1}' B^{-1} B^{*} B^{-1} e_{v+1} \frac{v!^2 \sigma^2}{f(u_0)\, n h^{1+2v}} + o_p\left(\frac{1}{n h^{1+2v}}\right).$$

The asymptotic conditional bias when $P - v$ is odd³ is given by

$$Bias\{\hat{z}_v(u_0)\} = e_{v+1}' B^{-1} c_P \frac{v!}{(P+1)!} z^{(P+1)}(u_0) h^{P+1-v} + o_p(h^{P+1-v}),$$

where $e_{v+1}$ is a $(v+1) \times 1$ vector. The $(v+1)$-th element is 1, and all the others are zero. Namely, $e_{v+1} = (0, 0, \cdots, 1)'$.
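The kernel-dependent constants in Lemma 5 can be evaluated directly once a kernel is fixed. The following is a minimal sketch, assuming an Epanechnikov kernel (the lemma does not prescribe one) and treating the unit vector as having length $P + 1$ so that it conforms with $B$; the function names `moment`, `mu`, `nu` and `lemma5_constants` are hypothetical helpers, not part of the original derivation.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1] (an illustrative choice).
K = Polynomial([0.75, 0.0, -0.75])

def moment(j, weight):
    """Integrate u^j * weight(u) over [-1, 1] exactly via the antiderivative."""
    monomial = Polynomial([0.0] * j + [1.0])
    anti = (monomial * weight).integ()
    return anti(1.0) - anti(-1.0)

def mu(j):
    return moment(j, K)        # mu_j = int u^j K(u) du

def nu(j):
    return moment(j, K * K)    # nu_j = int u^j K^2(u) du

def lemma5_constants(v, P):
    """Return (e' B^-1 B* B^-1 e, e' B^-1 c_P) for derivative order v, degree P."""
    idx = range(P + 1)
    B = np.array([[mu(j + l) for l in idx] for j in idx])
    B_star = np.array([[nu(j + l) for l in idx] for j in idx])
    c_P = np.array([mu(P + 1 + j) for j in idx])
    e = np.zeros(P + 1)
    e[v] = 1.0                 # unit vector selecting the v-th coefficient
    B_inv = np.linalg.inv(B)
    var_const = e @ B_inv @ B_star @ B_inv @ e
    bias_const = e @ B_inv @ c_P
    return var_const, bias_const

# Local linear estimation of the regression function itself: v = 0, P = 1.
var_const, bias_const = lemma5_constants(v=0, P=1)
print(var_const, bias_const)
```

For $v = 0$ and $P = 1$ the two constants reduce to $\nu_0$ and $\mu_2$, which equal 0.6 and 0.2 for this kernel.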

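A.1.2.4 Further Discussion about univariate LPR

I continue the discussion of the LPR estimator in Fan and Gijbels (1996). Consider the bivariate data $(u_1, r_1), \ldots, (u_n, r_n)$. I adopt the notation $B_{j,l} = (\mu_{j+l})$, $\tilde{B}_{j,l} = (\mu_{j+l+1})$, $B^{*} = (\nu_{j+l})$ and $B_{n,j} = \sum_{i=1}^{n} K_h(u_i - u_0)(u_i - u_0)^j$. The estimator of LPR is

$$\hat{\beta}_v = e_{v+1}' \hat{\beta} = e_{v+1}' B_n^{-1} U' W R.$$

Defining $\varphi = (u_1 - u_0)/h$, I have

$$\begin{aligned}
B_{n,j} &= E B_{n,j} + O_p\left(\sqrt{Var(B_{n,j})}\right) \\
&= n h^j \int \varphi^j K(\varphi) f(u_0 + h\varphi)\,d\varphi + O_p\left(\sqrt{n \int (u_1 - u_0)^{2j} K^2\left(\frac{u_1 - u_0}{h}\right) \frac{1}{h^2} f(u_1)\,du_1}\right) \\
&= n h^j f(u_0)\mu_j + n h^j f'(u_0)\mu_{j+1} h + n h^j O(h^2) + O_p\left(\sqrt{n h^{2j-1}}\right) \\
&= n h^j f(u_0)\mu_j \{1 + o_p(1)\}.
\end{aligned}$$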

3 Fan and Gijbels (1996) also report the bias when $P - v$ is even.

The last equality holds because the optimal bandwidth is of order $n^{-1/(2P+3)}$. So $B_n = n f(u_0) H B H \{1 + o_p(1)\}$, where $H = diag(1, h, \cdots, h^P)$. Plugging $B_n$ back into the formula, I have

$$\hat{\beta}_v = \frac{1 + o_p(1)}{n h^{v+1} f(u_0)} \sum_{i=1}^{n} K_v^{*} r_i,$$

where

$$K_v^{*}(t) = e_{v+1}' B^{-1} (1, t, \cdots, t^P)' K(t) = \left(\sum_{l=0}^{P} B^{vl} t^l\right) K(t)$$

with $B^{-1} = (B^{jl})$. I refer to $K_v^{*}$ as the equivalent kernel. The equivalent kernel $K_v^{*}$ has an order of $(v, P+1)$.⁴
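The order property of the equivalent kernel can be checked numerically: its $q$-th moment should be 1 for $q = v$ and 0 for every other $q \le P$. The sketch below assumes an Epanechnikov kernel and the case $v = 2$, $P = 3$; both are illustrative choices, and `integral`, `monomial` and `equivalent_kernel` are hypothetical helper names.

```python
import numpy as np
from numpy.polynomial import Polynomial

K = Polynomial([0.75, 0.0, -0.75])  # Epanechnikov kernel on [-1, 1] (illustrative)

def integral(poly):
    """Exact integral of a polynomial over [-1, 1], the support of K."""
    anti = poly.integ()
    return anti(1.0) - anti(-1.0)

def monomial(j):
    return Polynomial([0.0] * j + [1.0])

def equivalent_kernel(v, P):
    """Polynomial form of K*_v(t) = (sum_l B^{vl} t^l) K(t) on the support of K."""
    B = np.array([[integral(monomial(j + l) * K) for l in range(P + 1)]
                  for j in range(P + 1)])
    row = np.linalg.inv(B)[v]        # (B^{v0}, ..., B^{vP}), row v of B^{-1}
    return Polynomial(row) * K

P, v = 3, 2
K_star = equivalent_kernel(v, P)

# Moments int t^q K*_v(t) dt for q = 0, ..., P.
moments = [integral(monomial(q) * K_star) for q in range(P + 1)]
print(np.round(moments, 10))
```

The check works because the $q$-th moment equals $\sum_l B^{vl}\mu_{q+l}$, which is the $(v, q)$ entry of $B^{-1}B$.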

A.1.2.5 Proof of Theorem 1

I use $a_0$ to denote the true value of the projection direction and define $z(u_0) = E(r \mid X_0 a = u_0)$. The true value $a_0$ is

$$a_0 = \arg\min_a E[r_i - z(u_i)]^2.$$

Given the projection direction $a$, if I use LPR to estimate the conditional expectation, then $v = 0$, $P = 1$ and $\hat{z}(u) = \hat{\beta}_0$. So

$$\hat{z}(u) = \frac{1}{n h \hat{f}(u)} \sum_{i=1}^{n} K_0^{*}\left(\frac{u - X_i a}{h}\right) r_i \{1 + o_p(1)\},$$

where

$$K_0^{*}(t) = e_1' B^{-1} (1, t)' K(t) = \left(\sum_{l=0}^{1} B^{0l} t^l\right) K(t).$$

The equivalent kernel $K_0^{*}$ has an order of $(0, 2)$. The estimated projection direction, $\hat{a}$, is

$$\hat{a} = \arg\min_a n^{-1} \sum_i (r_i - \hat{z}(u))^2.$$

This setup is the same as in Hall (1989) except that I use a different kernel, $K_0^{*}$. I keep the same assumptions as in Lemma 4, and define

$$\tilde{a} = \arg\min_a n^{-1} \sum_i \left(r_i - \frac{1}{n h \hat{f}(u)} \sum_{i=1}^{n} K_0^{*} r_i\right)^2.$$

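4 This is shown in Fan and Gijbels (1996).

According to Lemma 4, $\sqrt{nh}(\tilde{a} - a_0) \to_p 0$ since $h \sim O(n^{-1/5})$ and $(nh)^{-1/2} \sim O(n^{-2/5})$. As discussed before,

$$\begin{aligned}
\hat{z}(u) &= \frac{1}{n h \hat{f}(u)} \sum_{i=1}^{n} K_0^{*}\left(\frac{u - a' X_i}{h}\right) r_i \{1 + o_p(1)\} \\
&= \frac{1}{n h \hat{f}(u)} \sum_{i=1}^{n} K_0^{*} r_i + o_p(1/\sqrt{nh}).
\end{aligned}$$

The second equality is proved by Hall (1989). Since

$$\sqrt{nh}\left[\hat{z}(u) - \frac{1}{n h \hat{f}(u)} \sum_{i=1}^{n} K_0^{*} r_i\right] \to_p 0,$$

I have

$$\begin{aligned}
\sqrt{nh}\,[\hat{a} - \tilde{a}] &= \arg\min_a n^{-1} \sum_i \left[\sqrt{nh}\,(r_i - \hat{z}(u))\right]^2 \\
&\quad - \arg\min_a n^{-1} \sum_i \left[\sqrt{nh}\left(r_i - \frac{1}{n h \hat{f}(u)} \sum_{i=1}^{n} K_0^{*} r_i\right)\right]^2 \\
&\to_p 0.
\end{aligned}$$

So

$$\sqrt{nh}(\hat{a} - a_0) = \sqrt{nh}\,[(\hat{a} - \tilde{a}) + (\tilde{a} - a_0)] \to_p 0.$$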
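To make the estimator $\hat{a}$ in this proof concrete, the sketch below minimizes the sample criterion over a grid of directions in $R^2$, using a local linear fit ($v = 0$, $P = 1$) for the conditional expectation. It is only an illustration under stated assumptions: the data are simulated, the kernel is Gaussian, the direction is parameterized by an angle and searched on a grid, and the names `local_linear_fit` and `criterion` are hypothetical. None of these choices comes from the thesis.

```python
import numpy as np

def local_linear_fit(u, r, h):
    """Local linear estimate of E[r | u] at every sample point (Gaussian kernel)."""
    d = u[None, :] - u[:, None]          # d[k, i] = u_i - u_k
    w = np.exp(-0.5 * (d / h) ** 2)      # kernel weights
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    t0, t1 = (w * r).sum(1), (w * d * r).sum(1)
    # Intercept of the weighted least-squares line at each evaluation point.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)

def criterion(theta, X, r, h):
    """Sample analogue of D(a): mean squared residual along direction a(theta)."""
    a = np.array([np.cos(theta), np.sin(theta)])
    u = X @ a
    return np.mean((r - local_linear_fit(u, r, h)) ** 2)

# Simulated single-index data with true direction angle theta0 (an assumption).
rng = np.random.default_rng(0)
n, theta0 = 400, 0.6
X = rng.normal(size=(n, 2))
a0 = np.array([np.cos(theta0), np.sin(theta0)])
r = np.sin(2.0 * (X @ a0)) + 0.1 * rng.normal(size=n)

h = n ** (-1 / 5)                        # bandwidth rate used in the proof
# Directions a and -a give the same fit, so angles in [0, pi) suffice.
grid = np.linspace(0.0, np.pi, 90, endpoint=False)
values = [criterion(t, X, r, h) for t in grid]
theta_hat = grid[int(np.argmin(values))]
print(theta_hat)
```

A grid search is used here only because it is easy to verify; a numerical optimizer would be the natural choice in higher dimensions.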

A.1.2.6 Proof of Theorem 2

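If I use $K_0^{*}$, $K_1^{*}$ and $K_2^{*}$ to denote the equivalent kernels for estimating the conditional moment and its first and second order derivatives, I have

• $u_0 = X_0 a$, $\hat{u}_0 = X_0 \hat{a}$

• $\hat{f}(u) = (n h_0)^{-1} \sum K_0^{*}\left(\frac{u - X_i a}{h_0}\right)$

• For $\forall X$, $m(u) = E(r_i \mid Xa = u)$, $\hat{m}(u) = \frac{1}{n h_0 \hat{f}(u)} \sum_{i=1}^{n} K_0^{*}\left(\frac{u - X_i a}{h_0}\right) r_i \{1 + o_p(1)\}$, $h_0 \sim O_p(n^{-1/5})$

• For $\forall X$, $m^{(1)}(u) = m'(u)$, $\hat{m}^{(1)}(u) = \frac{1}{n h_1^2 \hat{f}(u)} \sum_{i=1}^{n} K_1^{*}\left(\frac{u - X_i a}{h_1}\right) r_i \{1 + o_p(1)\}$, $h_1 \sim O_p(n^{-1/7})$

• For $\forall X$, $m^{(2)}(u) = m''(u)$, $\hat{m}^{(2)}(u) = \frac{1}{n h_2^3 \hat{f}(u)} \sum_{i=1}^{n} K_2^{*}\left(\frac{u - X_i a}{h_2}\right) r_i \{1 + o_p(1)\}$, $h_2 \sim O_p(n^{-1/9})$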
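As a quick arithmetic check on the three bandwidth rates, the following worked lines assume that the derivative of order $v$ is estimated with a local polynomial of degree $P = v + 1$ (an assumption made here for illustration, consistent with $v = 0$, $P = 1$ in the proof of Theorem 1) and apply the $1/(2P+3)$ bandwidth order noted in A.1.2.4:

```latex
\begin{align*}
v = 0,\ P = 1 &: \quad h_0 \sim n^{-1/(2\cdot 1 + 3)} = n^{-1/5},\\
v = 1,\ P = 2 &: \quad h_1 \sim n^{-1/(2\cdot 2 + 3)} = n^{-1/7},\\
v = 2,\ P = 3 &: \quad h_2 \sim n^{-1/(2\cdot 3 + 3)} = n^{-1/9},
\qquad \sqrt{n h_2^{5}} \sim n^{(1 - 5/9)/2} = n^{2/9}.
\end{align*}
```

The last expression is the normalizing rate that appears in the limit for the second derivative.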

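Here I provide the proof for the asymptotic property of the second order derivative. The other two can be proved by similar methods. Write

$$\hat{m}^{(2)}(\hat{u}_0) = \hat{f}^{-1} (n h_2^3)^{-1} \sum K_{2i}^{*} z_i + o_p((n h_2^3)^{-1}), \quad \text{where } K_{2i}^{*} = K_2^{*}\left(\frac{\hat{u}_0 - X_i a}{h_2}\right).$$

Then,

$$\sqrt{n h_2^5}\,\{\hat{m}^{(2)}(\hat{u}_0) - E[\hat{m}^{(2)}(\hat{u}_0)]\} = \hat{f}^{-1}\left[(n h_2)^{-1/2} \sum K_{2i}^{*} \epsilon_i\right] + o_p((n h_2)^{-1/2}).$$

Since the function $\hat{f}$ converges to $f$ and $K_{2i}^{*}\epsilon_i$ is i.i.d. by assumption, the central limit theorem tells us

$$n^{-1/2} \sum h_2^{-1/2} K_{2i}^{*} \epsilon_i \to_d N\left(0, \lim E\left((K_{2i}^{*})^2 \epsilon_i^2 h_2^{-1}\right)\right).$$

Under the appropriate conditions, $\lim_{n \to \infty} E\left((K_{2i}^{*})^2 \epsilon_i^2 h_2^{-1}\right) = f^{-1}(u_0)\sigma^2 \int (K_{2i}^{*})^2\,d\phi$, where $\phi = \frac{u_0 - X_i a}{h_2}$. So I have

$$\sqrt{n h_2^5}\,\{\hat{m}^{(2)}(\hat{u}_0) - E[\hat{m}^{(2)}(\hat{u}_0)]\} \to_d N\left(0, f^{-1}(u_0)\sigma^2 \int (K_{2i}^{*})^2\,d\phi\right).$$

On the other hand, by Lemma 4, $\{E[\hat{m}^{(2)}(\hat{u}_0)] - m^{(2)}(\hat{u}_0)\}$ converges at the rate $O_p(h_2^2)$. Since $h_2 \sim O_p(n^{-1/9})$, $\sqrt{n h_2^5} \sim O_p(n^{2/9})$. Therefore,

$$\begin{aligned}
\sqrt{n h_2^5}\,\{\hat{m}^{(2)}(\hat{u}_0) - m^{(2)}(\hat{u}_0)\} &= \sqrt{n h_2^5}\,\{\hat{m}^{(2)}(\hat{u}_0) - E[\hat{m}^{(2)}(\hat{u}_0)]\} \\
&\quad + \sqrt{n h_2^5}\,\{E[\hat{m}^{(2)}(\hat{u}_0)] - m^{(2)}(\hat{u}_0)\} \\
&= O_p(1) + O_p(1).
\end{aligned}$$

Note that

$$m^{(2)}(u_0) - \hat{m}^{(2)}(\hat{u}_0) = \left[m^{(2)}(u_0) - m^{(2)}(\hat{u}_0)\right] + \left[m^{(2)}(\hat{u}_0) - \hat{m}^{(2)}(\hat{u}_0)\right].$$

According to Lemma 5, the first term on the right-hand side satisfies

$$\sqrt{n h_2^5}\,(m^{(2)}(u_0) - m^{(2)}(\hat{u}_0)) = h_2^2 \sqrt{n h_2}\,[m^{(2)}(u_0) - m^{(2)}(\hat{u}_0)] = h_2^2\, o_p(1).$$

This completes the proof of Theorem 2:

$$\sqrt{n h_2^5}\,\{\hat{m}^{(2)}(\hat{u}_0) - m^{(2)}(u_0)\} = O_p(1) + h_2^2\, o_p(1).$$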

A.1.3 Bootstrap the confidence interval

The steps of bootstrapping the confidence interval are:

1) Use the optimal bandwidth $h_0$ to estimate the option prices $\hat{m}(u_i)$ and collect the residuals $\hat{\varepsilon}_1, \cdots, \hat{\varepsilon}_n$.

2) Use the optimal bandwidth $h_2$ to estimate the SPDs $\hat{m}^{(2)}(u_i)$.

3) Use a large bandwidth $\tilde{h}_0$ to estimate the option prices. Denote the estimators by $\tilde{m}(u_i)$.

4) Use a large bandwidth $\tilde{h}_2$ to estimate the option SPDs. Denote the estimators by $\tilde{m}^{(2)}(u_i)$.

5) Construct the bootstrap sample data set $(u_1, y_1^B), (u_2, y_2^B), \cdots, (u_n, y_n^B)$, where $y_i^B = \tilde{m}(u_i) + \varepsilon_i^B$ and $\varepsilon_i^B$ is obtained by sampling from $\hat{\varepsilon}_1, \cdots, \hat{\varepsilon}_n$ by wild bootstrap.

6) Use the bootstrapped data set and $h_2$ to estimate $\hat{m}_{h_2}^{(2)B}$. Repeat the whole process 199 times.

7) Obtain the 0.025 and 0.975 quantiles of $\{\hat{m}_{h_2}^{(2)B} - \tilde{m}^{(2)}(u_i)\}$, $c_1$ and $c_2$.

8) The 95% point-wise confidence interval for $m^{(2)}$ is $[\hat{m}^{(2)}(u_i) - c_2, \hat{m}^{(2)}(u_i) - c_1]$.
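The eight steps above can be sketched in a few lines. This is a minimal illustration under several assumptions that are not in the text: synthetic data in place of option prices, a Gaussian kernel, local linear fits for the price and local cubic fits for the second derivative, fixed stand-in values for the optimal bandwidths, "large" bandwidths set to 1.5 times those values, and Rademacher weights for the wild bootstrap. The helper name `local_poly` is hypothetical.

```python
import numpy as np
from math import factorial

def local_poly(u, y, grid, h, degree, deriv):
    """Local polynomial estimate of the deriv-th derivative on grid (Gaussian kernel)."""
    out = np.empty(len(grid))
    for k, u0 in enumerate(grid):
        t = (u - u0) / h                                  # scaled distances
        w = np.exp(-0.5 * t**2)                           # kernel weights
        U = np.vander(t, degree + 1, increasing=True)     # columns 1, t, ..., t^degree
        beta = np.linalg.solve(U.T @ (U * w[:, None]), U.T @ (w * y))
        out[k] = factorial(deriv) * beta[deriv] / h**deriv
    return out

rng = np.random.default_rng(1)
n = 200
u = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.exp(u) + 0.05 * rng.normal(size=n)     # synthetic stand-in for option prices
grid = np.linspace(-0.5, 0.5, 11)             # points at which the band is computed

h0, h2 = 0.25, 0.45                           # stand-ins for the optimal bandwidths
h0_big, h2_big = 1.5 * h0, 1.5 * h2           # "large" bandwidths (factor is assumed)

resid = y - local_poly(u, y, u, h0, degree=1, deriv=0)          # step 1
m2_hat = local_poly(u, y, grid, h2, degree=3, deriv=2)          # step 2
m_tilde = local_poly(u, y, u, h0_big, degree=1, deriv=0)        # step 3
m2_tilde = local_poly(u, y, grid, h2_big, degree=3, deriv=2)    # step 4

n_boot = 199
diffs = np.empty((n_boot, len(grid)))
for b in range(n_boot):
    signs = rng.choice([-1.0, 1.0], size=n)   # Rademacher wild-bootstrap weights
    y_boot = m_tilde + resid * signs          # step 5
    m2_boot = local_poly(u, y_boot, grid, h2, degree=3, deriv=2)  # step 6
    diffs[b] = m2_boot - m2_tilde

c1, c2 = np.quantile(diffs, [0.025, 0.975], axis=0)   # step 7
ci_lower, ci_upper = m2_hat - c2, m2_hat - c1         # step 8
print(np.column_stack([grid, ci_lower, m2_hat, ci_upper]))
```

Regressors are scaled by the bandwidth before solving the weighted least-squares problem, which keeps the small normal-equation system well conditioned; the derivative is then recovered by dividing by $h^{v}$.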

A.2 Model Specification for Chapter 3

A.2.1 Simple model

The Simple model is generated by following the procedure of Ahn and Gadarowski (2004). Specifically, the data are generated by the following data generating process:

$$R_{ti} = \mu + X_{t1}\beta_{1i} + X_{t2}\beta_{2i} + X_{t3}\beta_{3i} + e_{ti},$$

where $i$ is the index of individual portfolio returns, and $t$ is the index of time. $R_{ti}$ is the gross return of portfolio $i$ at time $t$. $X_{tj}$ ($j$ = 1, 2 and 3) is the common factor at time $t$, drawn from a normal distribution with mean equal to 0.0022 and variance equal to $6.944 \times 10^{-5}$. $\beta_{ki}$ ($k$ = 1, 2 and 3) is the corresponding beta of factor $X_k$ for portfolio $i$, drawn from the uniform distribution $U[0, 2]$. $e_{ti}$ is the idiosyncratic error, which is normally distributed with mean zero and variance $6.944 \times 10^{-5}$. $\mu$, $\beta$ and $X$ are chosen at values which make the mean and variance of gross returns roughly consistent with historical data in the US stock market.
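The data generating process above translates directly into a short simulation. In the sketch below the distributions and parameter values are those stated in the text, while the value of $\mu$ is not given in this section, so `mu=1.0` is a placeholder assumption; the function name `simulate_simple_model` is likewise hypothetical.

```python
import numpy as np

def simulate_simple_model(T, N, mu=1.0, seed=0):
    """Draw a T x N panel of gross returns from the three-factor Simple model.

    mu is a placeholder: the section does not state its value.
    """
    rng = np.random.default_rng(seed)
    factor_mean, variance = 0.0022, 6.944e-5
    # Common factors X_tj, one column per factor.
    X = rng.normal(factor_mean, np.sqrt(variance), size=(T, 3))
    # Betas beta_ki, drawn once per portfolio from U[0, 2].
    beta = rng.uniform(0.0, 2.0, size=(3, N))
    # Idiosyncratic errors e_ti.
    e = rng.normal(0.0, np.sqrt(variance), size=(T, N))
    R = mu + X @ beta + e
    return R, X, beta

R, X, beta = simulate_simple_model(T=330, N=100)
print(R.shape, R.mean())
```

With betas averaging about one, the three factors add roughly 0.0066 to the mean gross return on top of whatever value is chosen for $\mu$.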

A.2.2 Fama-French and Premium-Labor models

I follow the procedure of Ahn and Gadarowski (2004) to generate data sets calibrated to resemble the statistical properties of the Fama-French and Premium-Labor models. First, I collect 330 time-series observations of monthly returns of the Fama-French portfolios and the Fama-French factors between July 1963 and December 1990.⁵ For the Premium-Labor model, I follow the steps in Jagannathan and Wang (1996) to obtain the portfolio returns and the factors.

5 URL is http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

Second, I apply the two-pass estimation following Shanken (1992). Specifically, I regress the portfolio returns on the corresponding factors by OLS, obtain the estimates of βki, and collect the residuals. I then compute the diagonal sample covariance matrix of the residuals. Subsequently, I run the following cross sectional regression:

E[Rti] = µ + (E(Xt1) + η1)β1i + (E(Xt2) + η2)β2i + (E(Xt3) + η3)β3i.

This gives the estimates of the risk-free rate, $\mu$, and the factor-mean adjusted risk prices, $\eta_k$. Finally, I simulate the factors from a normal distribution with mean and covariance equal to the sample mean and the sample covariance matrix of the actual data on the corresponding factors. The error terms, $e_{ti}$, are drawn from a normal distribution with mean zero and variance equal to the sample covariance of the residuals. The calibrated portfolio returns are generated by the following equation:

Rti = µ + (Xt1 + η1)β1i + (Xt2 + η2)β2i + (Xt3 + η3)β3i + eti
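The two-pass calibration and the final simulation step can be sketched as follows. This is an illustration under assumptions rather than the procedure actually coded for the thesis: the observed returns and factors are replaced by synthetic data with a known `eta_true`, the intercept value 1.0 is arbitrary, and the function names `two_pass` and `simulate` are hypothetical.

```python
import numpy as np

def two_pass(R, X):
    """Two-pass estimates: time-series betas, then cross-sectional mu and eta."""
    T, N = R.shape
    Z = np.column_stack([np.ones(T), X])                  # intercept + factors
    coef, *_ = np.linalg.lstsq(Z, R, rcond=None)          # first pass: OLS per portfolio
    beta = coef[1:]                                       # K x N matrix of beta_ki
    resid = R - Z @ coef
    resid_var = resid.var(axis=0, ddof=Z.shape[1])        # diagonal residual covariance
    W = np.column_stack([np.ones(N), beta.T])             # second pass regressors
    gamma, *_ = np.linalg.lstsq(W, R.mean(axis=0), rcond=None)
    mu = gamma[0]
    eta = gamma[1:] - X.mean(axis=0)                      # slope_k = E(X_k) + eta_k
    return mu, eta, beta, resid_var

def simulate(mu, eta, beta, resid_var, X, T, rng):
    """Generate R_ti = mu + sum_k (X_tk + eta_k) beta_ki + e_ti from calibrated inputs."""
    factors = rng.multivariate_normal(X.mean(axis=0), np.cov(X, rowvar=False), size=T)
    errors = rng.normal(0.0, np.sqrt(resid_var), size=(T, len(resid_var)))
    return mu + (factors + eta) @ beta + errors

# Synthetic stand-in for the observed data: 330 months, 25 portfolios, 3 factors.
rng = np.random.default_rng(0)
T, N, K = 330, 25, 3
eta_true = np.array([0.004, 0.002, 0.003])
X_obs = rng.normal(0.0022, np.sqrt(6.944e-5), size=(T, K))
beta_true = rng.uniform(0.0, 2.0, size=(K, N))
noise = rng.normal(0.0, np.sqrt(6.944e-5), size=(T, N))
R_obs = 1.0 + (X_obs + eta_true) @ beta_true + noise

mu_hat, eta_hat, beta_hat, resid_var = two_pass(R_obs, X_obs)
R_sim = simulate(mu_hat, eta_hat, beta_hat, resid_var, X_obs, T, rng)
print(mu_hat, eta_hat, R_sim.shape)
```

Because the cross-sectional slope on $\beta_{ki}$ estimates $E(X_{tk}) + \eta_k$, the sketch recovers $\eta_k$ by subtracting the sample factor mean from that slope.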

The risk-adjusted prices are incorporated so that the simulated portfolio returns are close to the true data.

Appendix B

Figures

Figure B.1: Pricing error for out-of-the-money interest rate call options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Pricing error for the interest rate call options in the first half of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for out-of-the-money options.

Figure B.2: Pricing error for at-the-money interest rate call options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Pricing error for the interest rate call options in the first half of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for at-the-money options.

Figure B.3: Pricing error for in-the-money interest rate call options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Pricing error for the interest rate call options in the first half of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for in-the-money options.

Figure B.4: Pricing error for out-of-the-money interest rate put options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Pricing error for the interest rate put options in the first half of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for out-of-the-money options.

Figure B.5: Pricing error for at-the-money interest rate put options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Pricing error for the interest rate put options in the first half of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for at-the-money options.

Figure B.6: Pricing error for in-the-money interest rate put options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Pricing error for the interest rate put options in the first half of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for in-the-money options.

Figure B.7: Forecasting error for out-of-the-money interest rate call options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Forecasting error for the interest rate call options in the third quarter of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for out-of-the-money options.

Figure B.8: Forecasting error for at-the-money interest rate call options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Forecasting error for the interest rate call options in the third quarter of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for at-the-money options.

Figure B.9: Forecasting error for in-the-money interest rate call options

[Plot omitted. Vertical axis: Pricing Error; horizontal axis: Time-to-Maturity.]

Forecasting error for the interest rate call options in the third quarter of 1993. The blue circle denotes the pricing error in our model, and the red star denotes the pricing error in the semi-parametric model of Aït-Sahalia and Lo (1998). The figure shows the pricing error for in-the-money options.

Figure B.10: SPDs of the interest rate

[Plot omitted. Vertical axis: SPD; horizontal axis: Cash Price at Expiration.]

This figure shows the SPDs of the interest rate after 42 days given that $S_t = 30$. The red dashed lines are the 95% point-wise confidence interval.

Figure B.11: CDFs of F11 and F21

[Plot omitted. Vertical axis: CDF; horizontal axis: Cash Price at Expiration.]

This figure shows the CDFs of F11 and F21. The red dashed line indicates F21 and the blue line indicates F11.

Figure B.12: Estimates of η

[Plot omitted. Vertical axis: Difference between two CDFs; horizontal axis: Cash Price at Expiration.]

This figure shows the estimates of η and the associated 95% confidence level.

[Density plots omitted. Three panels: T=160, T=330 and T=700. Vertical axis: density; horizontal axis ranges from 0.7 to 1.]

Figure B.13: The density functions for $\hat{\alpha}$ in the Simple Model of 100 Portfolios

[Density plots omitted. Three panels: T=160, T=330 and T=700. Vertical axis: density; horizontal axis ranges from 0 to 0.1.]

Figure B.14: The density functions for $\hat{\alpha}$ in a misspecified Simple Model of 100 Portfolios