Model-Free Volatility Prediction or Can the Stock Market be Linearized?

Dimitris N. Politis University of California, San Diego

Financial returns data

• Data: $X_1, \ldots, X_n$ from financial returns $\{X_t, t \in \mathbb{Z}\}$
• e.g. percentage returns of a stock price, stock index or foreign exchange rate
• returns are assumed: strictly stationary, mean zero¹ and symmetrically distributed
◦ financial returns exhibit heavy tails and volatility clustering.
• Bachelier (1900); Fama (1965), Mandelbrot (1963).

¹ centered, detrended, etc.

Figure 1: (a) Plot of the daily returns of the Yen/Dollar exchange rate from December 31, 1987 up to August 1, 2002; (b) Plot of the daily returns of the S&P500 stock index from October 1, 1983 to August 30, 1991; (c) Plot of the daily returns of the IBM stock price from February 1, 1984 to December 31, 1991.

ARCH models

• Engle’s (1982) ARCH(p) model:

$$X_t = Z_t \sqrt{a + \sum_{i=1}^{p} a_i X_{t-i}^2} \tag{1}$$

• original assumption: $\{Z_t\}$ is i.i.d. $N(0, 1)$
• But residuals $\{\hat{Z}_t\}$ from a fitted ARCH(p) model appear non-normal (heavy-tailed)

◦ ARCH models with heavy-tailed errors, e.g. $\{Z_t\} \sim t_k$ with degrees of freedom $k$ empirically chosen: ad hoc! cf. Bollerslev et al. (1992), Shephard (1996)

Sample Kyrtosis

• Define $K_i^j(Y)$ = the empirical (sample) kyrtosis of the dataset $\{Y_i, Y_{i+1}, \ldots, Y_j\}$
• Under model (1), the residuals

$$\hat{Z}_t = \frac{X_t}{\sqrt{\hat{a} + \sum_{i=1}^{p} \hat{a}_i X_{t-i}^2}} \tag{2}$$

ought to behave like i.i.d. $N(0, 1)$
• But typically: $3 \ll K_1^n(\hat{Z}) \ll K_1^n(X)$

◦ Question: is there a specification of $\hat{a}, \hat{a}_1, \hat{a}_2, \ldots$ that will yield $K_1^n(\hat{Z}) \simeq 3$?
◦ Answer: not in general.
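As a point of reference for the kyrtosis comparison, the sketch below simulates a toy ARCH(1) series in which model (1) holds exactly, then computes the sample kyrtosis of the returns and of the residuals of Eq. (2). The coefficient values, the sample size and the helper names are illustrative assumptions, and the true coefficients stand in for fitted ones. In this idealized setting the residuals are Gaussian by construction, which is exactly what real returns data typically fail to show.

```python
import math
import random

def sample_kurtosis(y):
    """Empirical kurtosis: fourth central moment divided by squared variance."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return m4 / (m2 * m2)

def simulate_arch1(n, a, a1, seed=0):
    """Simulate X_t = Z_t * sqrt(a + a1 * X_{t-1}^2) with Z_t i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = rng.gauss(0.0, 1.0) * math.sqrt(a + a1 * prev * prev)
        x.append(prev)
    return x

a, a1 = 1.0, 0.5                      # illustrative values, not estimates
x = simulate_arch1(20000, a, a1)
# Residuals as in Eq. (2), using the true coefficients in place of fitted ones
z_hat = [x[t] / math.sqrt(a + a1 * x[t - 1] ** 2) for t in range(1, len(x))]

k_x = sample_kurtosis(x)
k_z = sample_kurtosis(z_hat)
print(f"K(X) = {k_x:.2f}, K(Z_hat) = {k_z:.2f}")
```

The same `sample_kurtosis` helper could be applied to residuals from an ARCH model fitted to actual returns to check the inequality stated above.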

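Studentization idea

The residual

$$\hat{Z}_t = \frac{X_t}{\sqrt{\hat{a} + \sum_{i=1}^{p} \hat{a}_i X_{t-i}^2}}$$

may be seen as an attempt to ‘studentize’ the return $X_t$ by dividing with a time-localized estimate of standard deviation.

• Why exclude the value of $X_t$ from an empirical (causal) estimate of the standard deviation of $X_t$?

Include an $X_t^2$ term (with its own coefficient, $a_0 \geq 0$) in the studentization, and define the new empirical ratio

$$\hat{W}_t = \frac{X_t}{\sqrt{\hat{a} + \hat{a}_0 X_t^2 + \sum_{i=1}^{p} \hat{a}_i X_{t-i}^2}} \tag{3}$$

◦ Question: is there a specification of $\hat{a}, \hat{a}_0, \hat{a}_1, \ldots$ that will yield $K_1^n(\hat{W}) \simeq 3$?
◦ Answer: in general ... YES!

$\hat{a} = 0$, $\hat{a}_0 = 1$, $\hat{a}_1 = \hat{a}_2 = \cdots = 0$ implies $\hat{W}_t = \mathrm{sign}(X_t)$ and thus $K_1^n(\hat{W}) = 1$. But $\hat{a}_0 = 0$ implies $K_1^n(\hat{W}) > 3$, since then $\hat{W}_t = \hat{Z}_t$. By the Intermediate Value Theorem, some specification in between yields $K_1^n(\hat{W}) \simeq 3$.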
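The Intermediate Value Theorem argument can be tried numerically. The sketch below is a deliberately stripped-down case and not the specification used in the talk: it has no lagged terms, and the constant is tied to $a_0$ through $a = (1 - a_0) v$, where $v$ is the sample mean of $X_t^2$. Both choices are assumptions made only so that a single parameter moves the ratio from the raw returns ($a_0 = 0$) to $\mathrm{sign}(X_t)$ ($a_0 = 1$). The data are simulated Student t values with 5 degrees of freedom, and bisection locates an $a_0$ whose ratio has kyrtosis 3.

```python
import math
import random

def kurtosis(y):
    """Empirical kurtosis: fourth central moment divided by squared variance."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return m4 / (m2 * m2)

def ratio(x, a0, v):
    """W_t = X_t / sqrt((1 - a0) * v + a0 * X_t^2): toy case with no lagged terms."""
    return [xt / math.sqrt((1.0 - a0) * v + a0 * xt * xt) for xt in x]

rng = random.Random(1)
# Heavy-tailed toy data: N(0, 1) over sqrt(chi-square_5 / 5), i.e. Student t with 5 d.f.
x = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.5, 2.0) / 5.0)
     for _ in range(5000)]
v = sum(xt * xt for xt in x) / len(x)

k_at_0 = kurtosis(ratio(x, 0.0, v))   # rescaled returns: same kurtosis as x
k_at_1 = kurtosis(ratio(x, 1.0, v))   # W_t = sign(X_t)

# Bisection on a0: kurtosis is above 3 at a0 = 0 and below 3 at a0 = 1
lo, hi = 0.0, 1.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if kurtosis(ratio(x, mid, v)) > 3.0:
        lo = mid
    else:
        hi = mid
a0_star = 0.5 * (lo + hi)
k_star = kurtosis(ratio(x, a0_star, v))
print(f"a0 = {a0_star:.4f} gives kurtosis {k_star:.4f}")
```

Bisection only needs continuity and a sign change, so it does not rely on the kyrtosis being monotone in $a_0$.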

An Implicit ARCH Model

The residual

$$\hat{W}_t = \frac{X_t}{\sqrt{\hat{a} + \hat{a}_0 X_t^2 + \sum_{i=1}^{p} \hat{a}_i X_{t-i}^2}}$$

corresponds to a true equation of the type:

$$W_t = \frac{X_t}{\sqrt{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2}} \tag{4}$$

leading to the Implicit ARCH Model:

$$X_t = W_t \sqrt{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2} \tag{5}$$

where $W_t \sim$ i.i.d. $N(0, 1)$, almost!
◦ Eq. (4) implies that $|W_t| \leq 1/\sqrt{a_0}$.

Hence, Wt ∼ i.i.d. from a truncated N(0, 1).
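The bound on $W_t$ follows in one line from Eq. (4). The step below assumes $a \geq 0$, $a_i \geq 0$ and $a_0 > 0$ (nonnegativity of the coefficients is an assumption here, in line with their role inside a variance estimate), and takes $X_t \neq 0$; when $X_t = 0$ the bound holds trivially because $W_t = 0$.

```latex
W_t^2 \;=\; \frac{X_t^2}{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2}
\;\le\; \frac{X_t^2}{a_0 X_t^2} \;=\; \frac{1}{a_0}
\quad\Longrightarrow\quad
|W_t| \;\le\; \frac{1}{\sqrt{a_0}} .
```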

A heavy-tailed distribution for ARCH residuals

Eq. (5) is implicit; solving for $X_t$ yields:

$$X_t = U_t \sqrt{a + \sum_{i=1}^{p} a_i X_{t-i}^2} \tag{6}$$

where

$$U_t = \frac{W_t}{\sqrt{1 - a_0 W_t^2}}. \tag{7}$$

Recall $W_t \sim$ i.i.d. from $N(0, 1)$ truncated to $\pm 1/\sqrt{a_0}$.

◦ Hence, $U_t \sim$ i.i.d. with density $f(u; a_0, 1)$ given (for all $u \in \mathbb{R}$) by:

$$f(u; a_0, 1) = \frac{(1 + a_0 u^2)^{-3/2} \exp\left(-\frac{u^2}{2(1 + a_0 u^2)}\right)}{\sqrt{2\pi}\,\left(\Phi(1/\sqrt{a_0}) - \Phi(-1/\sqrt{a_0})\right)}$$
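The density above is straightforward to evaluate. The sketch below codes it with the standard library only, using the identity $\Phi(c) - \Phi(-c) = \mathrm{erf}(c/\sqrt{2})$, and checks numerically that it integrates to about one for $a_0 = 0.1$. The function names, the integration range and the step count are illustrative choices, not part of the original material.

```python
import math

def f_density(u, a0):
    """Density f(u; a0, 1) of U = W / sqrt(1 - a0 W^2), W a truncated N(0, 1)."""
    if a0 == 0.0:
        norm = 1.0                                  # no truncation in the Gaussian case
    else:
        # Phi(c) - Phi(-c) = erf(c / sqrt(2)) with c = 1 / sqrt(a0)
        norm = math.erf(1.0 / math.sqrt(2.0 * a0))
    q = 1.0 + a0 * u * u
    return q ** -1.5 * math.exp(-u * u / (2.0 * q)) / (math.sqrt(2.0 * math.pi) * norm)

def integrate(a0, half_width=200.0, steps=400_000):
    """Midpoint-rule integral of f(.; a0, 1) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    return h * sum(f_density(-half_width + (i + 0.5) * h, a0) for i in range(steps))

total = integrate(0.1)
phi0 = 1.0 / math.sqrt(2.0 * math.pi)               # standard normal density at 0
print(f"integral over [-200, 200] = {total:.6f}")
```

Because the tails decay only like $|u|^{-3}$, the integration range has to be wide; a short range would leave a visible share of the mass out.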

Some properties of the new distribution

$$f(u; a_0, 1) = \frac{(1 + a_0 u^2)^{-3/2} \exp\left(-\frac{u^2}{2(1 + a_0 u^2)}\right)}{\sqrt{2\pi}\,\left(\Phi(1/\sqrt{a_0}) - \Phi(-1/\sqrt{a_0})\right)}$$

◦ $a_0 \geq 0$ is a shape parameter capturing the degree of heavy tails

◦ If $a_0 = 0$, then $f(u; 0, 1) = \phi(u)$, i.e., $N(0, 1)$

◦ If $a_0 > 0$, then $f(u; a_0, 1)$ has finite moments only up to (almost) order two²

i.e., if $U \sim f(u; a_0, 1)$ with $a_0 > 0$, then $E|U|^d < \infty$ for all $d \in [0, 2)$ but $E|U|^d = \infty$ for all $d \in [2, \infty)$
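The moment statement can be read off the tail of the density. As a worked step: for large $|u|$ the factor $(1 + a_0 u^2)^{-3/2}$ behaves like $a_0^{-3/2} |u|^{-3}$, while the exponent $u^2 / (2(1 + a_0 u^2))$ tends to $1/(2a_0)$, so the density has a power-law tail of order $|u|^{-3}$.

```latex
f(u; a_0, 1) \;\sim\;
\frac{a_0^{-3/2}\, e^{-1/(2 a_0)}}
     {\sqrt{2\pi}\,\bigl(\Phi(1/\sqrt{a_0}) - \Phi(-1/\sqrt{a_0})\bigr)}\; |u|^{-3}
\quad \text{as } |u| \to \infty,
\qquad\text{hence}\qquad
E|U|^d < \infty \iff \int_1^\infty u^{\,d-3}\, du < \infty \iff d < 2 .
```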

² As in the $t_2$ distribution, i.e., Student’s t with 2 d.f.

Figure 2: (a) Standard normal density (shaded) vs. f(u;0.1, 1); (b) Standard normal density (shaded) vs. t with 5 degrees of freedom; (c) t with 5 degrees of freedom (shaded) vs. f(u;0.1, 1); (d) t with 1 degree of freedom (shaded) vs. f(u;0.5, 1).

A scale/shape family and MLE

◦ the density $f(u; a_0, 1)$ can be scaled to create a two-parameter family of densities with typical member

$$f(x; a_0, s) = \frac{1}{s} f\left(\frac{x}{s}; a_0, 1\right) \quad \text{for all } x \in \mathbb{R}$$

◦ (a0,s): shape and scale parameters

 Ft−1= {Xs, 0

The likelihood of the data $X = (X_1, \ldots, X_n)$ conditionally on $\mathcal{F}_p$ (also called the ‘pseudo-likelihood’) is given by:

$$L(a, a_0, a_1, \ldots, a_p | X) = \prod_{t=p+1}^{n} f(X_t; a_0, s_t).$$
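In practice one would work with the logarithm of this product. The sketch below evaluates the log pseudo-likelihood under the assumption that $s_t^2 = a + \sum_{i=1}^{p} a_i X_{t-i}^2$, which is the scale implied by Eq. (6); the function names and the sample numbers are made up for illustration. A numerical optimizer could then be run over the parameters, a step not shown here.

```python
import math

def f_density(x, a0, s=1.0):
    """Scaled density f(x; a0, s) = f(x / s; a0, 1) / s."""
    u = x / s
    norm = 1.0 if a0 == 0.0 else math.erf(1.0 / math.sqrt(2.0 * a0))
    q = 1.0 + a0 * u * u
    return q ** -1.5 * math.exp(-u * u / (2.0 * q)) / (math.sqrt(2.0 * math.pi) * norm * s)

def log_pseudo_likelihood(x, a, a0, coeffs):
    """Sum of log f(X_t; a0, s_t) over t = p+1..n, with coeffs = [a_1, ..., a_p].

    Assumes s_t^2 = a + sum_i a_i X_{t-i}^2 (the scale in Eq. (6)).
    """
    p = len(coeffs)
    total = 0.0
    for t in range(p, len(x)):                      # 0-based index t is X_{t+1}
        s2 = a + sum(coeffs[i] * x[t - 1 - i] ** 2 for i in range(p))
        total += math.log(f_density(x[t], a0, math.sqrt(s2)))
    return total

returns = [0.4, -1.1, 0.3, 2.0, -0.2, 0.1, -0.7]   # made-up numbers
ll = log_pseudo_likelihood(returns, a=0.5, a0=0.1, coeffs=[0.3, 0.2])
print(f"log pseudo-likelihood = {ll:.4f}")
```

With $a_0 = 0$ and no lagged terms the function reduces to the ordinary Gaussian log-likelihood, which gives a quick sanity check.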

GARCH(1,1) model and approximate MLE

$$X_t = s_t U_t \quad \text{with} \quad s_t^2 = C + A X_{t-1}^2 + B s_{t-1}^2 \tag{8}$$

and

$$U_t \sim \text{i.i.d. } f(u; a_0, 1) \tag{9}$$

• The above GARCH(1,1) model given by (8)–(9) has four parameters³: $A$, $B$, $C$ and $a_0$.
• The GARCH model (8) is equivalent to an ARCH model with $p = \infty$ and the following identifications:

$$a = \frac{C}{1 - B}, \quad \text{and} \quad a_i = A B^{i-1} \text{ for } i = 1, 2, \ldots$$

◦ Approximation: $a_i \simeq 0$ for all $i \geq$ some finite $p_0$.

◦ So the GARCH(1,1) model (8) is approximately equivalent to the ARCH model (6) with $p = p_0$. The MLEs of $A$, $B$, $C$ and $a_0$ can then be obtained by maximizing $L(a, a_0, a_1, \ldots, a_{p_0} | X)$ with respect to the four free parameters $a_0$, $A$, $B$, $C$, noting that $a, a_1, \ldots, a_{p_0}$ are simple functions of $A$, $B$, $C$.
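The GARCH-to-ARCH identification can be checked numerically. The sketch below maps $(A, B, C)$ to the ARCH coefficients, runs the GARCH recursion on a made-up return sequence, and compares the result with the truncated ARCH sum. The parameter values are the Yen/Dollar estimates under $f(\cdot; a_0, 1)$ quoted in Table 2; the truncation point `p0 = 300`, the starting value of the recursion and the function names are illustrative assumptions.

```python
def garch_to_arch(A, B, C, p0):
    """Return (a, [a_1, ..., a_p0]) with a = C / (1 - B) and a_i = A * B**(i - 1)."""
    return C / (1.0 - B), [A * B ** (i - 1) for i in range(1, p0 + 1)]

def garch_s2(x, A, B, C, s2_init):
    """Iterate s^2 <- C + A * X_prev^2 + B * s^2 over x; return all updated values."""
    s2, out = s2_init, []
    for x_prev in x:
        s2 = C + A * x_prev ** 2 + B * s2
        out.append(s2)
    return out

A, B, C = 0.028, 0.938, 8.38e-07
a, coeffs = garch_to_arch(A, B, C, p0=300)

# Made-up returns of magnitude up to 0.01, for illustration only
x = [0.01 * ((-1) ** t) * (1 + (t % 7)) / 7 for t in range(400)]

# Squared volatility for the period following the last return, two ways
s2_garch = garch_s2(x, A, B, C, s2_init=a)[-1]
s2_arch = a + sum(coeffs[i] * x[-1 - i] ** 2 for i in range(len(coeffs)))
print(f"GARCH recursion: {s2_garch:.6e}   truncated ARCH: {s2_arch:.6e}")
```

Starting the recursion at $a = C/(1-B)$ makes the two expressions agree up to the truncated tail, which is of order $B^{p_0}$.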

³ The same number of parameters (four) characterizes the GARCH(1,1) model with t–errors; the number of degrees of freedom for the best-fitting t distribution represents the fourth parameter.

GARCH model: $X_t = s_t U_t$ and $s_t^2 = C + A X_{t-1}^2 + B s_{t-1}^2$

                          $\hat{a}_0$   $\hat{A}$   $\hat{B}$   $\hat{C}$
Yen/Dollar–N(0, 1)        N/A           0.062       0.898       2.29e-06
Yen/Dollar–t distr.       N/A           0.027       0.923       8.95e-07
Yen/Dollar–f(·; a0, 1)    0.089         0.028       0.938       8.38e-07
S&P500–N(0, 1)            N/A           0.104       0.834       6.63e-06
S&P500–t distr.           N/A           0.022       0.927       1.83e-06
S&P500–f(·; a0, 1)        0.081         0.023       0.936       1.96e-06
IBM–N(0, 1)               N/A           0.104       0.807       1.72e-05
IBM–t distr.              N/A           0.027       0.913       5.65e-06
IBM–f(·; a0, 1)           0.066         0.029       0.912       6.32e-06

Table 2. MLEs in the GARCH(1,1) model under three possible distributional assumptions for the GARCH errors: the N(0, 1), the t (with estimated degrees of freedom), and the new f(·; a0, 1) density.

Figure 3: (Yen/Dollar example) (a) Correlogram of the returns series $\{X_t\}$; (b) Correlogram of the squared returns $\{X_t^2\}$; (c) Correlogram of the optimal Simple NoVaS series $\{W_{t,9}^S\}$.

Problem: Predict $X_t^2$ based on $\mathcal{F}_{t-1} = \{X_1, \ldots, X_{t-1}\}$

• To predict the squared returns $X_t^2$, a natural thing to try is to fit a linear model to the series $X_t^2$.
• e.g. the AIC criterion yields AR order $r = 26$ for Yen/Dollar of Figure 3 (b); this is too large!
• Correlogram of $X_t^2$ is misleading: spuriously large estimates at high lags.
• Problem: confidence bands are misleading due to non-linearity and heavy tails of the squared returns, and so is AIC due to non-normality.
◦ GARCH(1,1) by-passes order selection: it is like fitting ARMA(1,1) to the squared returns $X_t^2$.

Prediction of squared returns via GARCH

GARCH model: $X_t = s_t U_t$ and $s_t^2 = C + A X_{t-1}^2 + B s_{t-1}^2$

But the GARCH predictor $\hat{s}_t^2 = \hat{C} + \hat{A} X_{t-1}^2 + \hat{B} \hat{s}_{t-1}^2$ (estimated squared volatility) also has poor performance.⁴

Predictor type                                    Yen/Dollar   S&P500   IBM
AR model for $X_t^2$ with AIC                     0.971        1.125    1.108
$\hat{s}_t^2$: GARCH(1,1) with normal errors      1.005        1.164    1.140
$\hat{s}_t^2$: GARCH(1,1) with t–errors           1.025        1.151    1.139

Table 3. Entries give the empirical MSE of prediction of squared returns relative to benchmark (= sample variance).

• Note: $s_t^2 = E(X_t^2 | \mathcal{F}_{t-1})$ iff $U_t \sim$ i.i.d. $N(0, 1)$

Despite the fact that the residuals $U_t$ appear non-normal, the (conditional) mean’s optimality is typically based on an MSE criterion. But the latter presupposes the MSE is finite, i.e., that the $X_t$ series has a finite 4th moment.
• Do financial returns have finite 4th moment?

⁴ Andersen and Bollerslev (1998) quote “poor out-of-sample forecasting performance vis-a-vis daily squared returns”, and “numerous studies have suggested that ARCH and stochastic volatility models provide poor volatility forecasts”.

Figure 4: Plot of (a) $V_1^k(X)$ and (b) $K_1^k(X)$ as a function of $k$. Top: daily returns of Yen/Dollar exchange rate 1-1-1988 to 8-1-2002. Bottom: daily returns of the S&P500 index spanning the period 1-1-1928 to 8-30-1991. Here $V_i^j(Y)$ and $K_i^j(Y)$ denote the empirical (sample) variance and kyrtosis of a general dataset $\{Y_i, Y_{i+1}, \ldots, Y_j\}$.

Do financial returns have finite 4th moment?

NO! The SLLN seems to break down for the 4th moment.
• But they appear to have a finite 2nd moment (or, at least, ‘almost’ finite 2nd moment).⁵

⇒ MSE criterion, i.e., $L_2$, is inapplicable.

• Use $L_1$ criterion and Mean Absolute Deviation.
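The diagnostic behind Figure 4 is a running computation of $V_1^k(X)$ and $K_1^k(X)$ for growing $k$. The sketch below shows one way to obtain both curves in a single pass using cumulative power sums; the function name and the simulated Gaussian input are illustrative assumptions. For data with a finite 4th moment the kyrtosis curve settles down as $k$ grows, whereas repeated jumps that never die out are the symptom described above.

```python
import random

def running_moments(x):
    """Return (V, K): sample variance and kurtosis of x[0:k] for k = 2, ..., n."""
    s1 = s2 = s3 = s4 = 0.0
    var_k, kurt_k = [], []
    for k, v in enumerate(x, start=1):
        s1 += v
        s2 += v ** 2
        s3 += v ** 3
        s4 += v ** 4
        if k < 2:
            continue
        m = s1 / k
        # Central moments from raw moments
        m2 = s2 / k - m * m
        m4 = s4 / k - 4.0 * m * s3 / k + 6.0 * m * m * s2 / k - 3.0 * m ** 4
        var_k.append(m2)
        kurt_k.append(m4 / (m2 * m2))
    return var_k, kurt_k

rng = random.Random(0)
gauss = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # toy data with all moments finite
var_k, kurt_k = running_moments(gauss)
print(f"final variance {var_k[-1]:.3f}, final kurtosis {kurt_k[-1]:.3f}")
```

Applying `running_moments` to an actual returns series and plotting the two lists against $k$ would give curves of the kind shown in Figure 4.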

⁵ Fan and Yao (2003), Kokoszka et al. (2003): $\max(1, E U_t^4)(A + B) < 1 \Leftrightarrow E X_t^4 < \infty$.

$L_1$ prediction of squared returns

◦ $L_1$–optimal predictor of $X_t^2$ is $\widetilde{X_t^2}$, i.e., the conditional median of the distribution of $X_t^2$ given $\mathcal{F}_{t-1}$
◦ Since $X_t = s_t U_t$, it follows that $\widetilde{X_t^2} = m_2 s_t^2$ where $m_2$ is the median of the common distribution of $U_t^2$.

◦ When $U_t$ is assumed $N(0, 1)$, $f(u; 0.1, 1)$ or $t_5$, the corresponding values of $m_2$ are 0.455, 0.475, 0.528
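Two of the quoted $m_2$ values can be recomputed with the standard library. For $N(0, 1)$, the median of $U^2$ is the squared 75% quantile of $U$. For $f(u; a_0, 1)$, the sketch uses the representation $U = W/\sqrt{1 - a_0 W^2}$ with $W$ a truncated normal: since $U^2$ is increasing in $W^2$, the median of $U^2$ is obtained from the median of $|W|$. The function names are illustrative; the $t_5$ value is not recomputed because the standard library has no Student t quantile function.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def m2_normal():
    """Median of U^2 when U ~ N(0, 1): the squared 75% quantile of U."""
    return STD_NORMAL.inv_cdf(0.75) ** 2

def m2_new_density(a0):
    """Median of U^2 when U = W / sqrt(1 - a0 W^2), W ~ N(0, 1) truncated to |W| <= 1/sqrt(a0)."""
    c = 1.0 / math.sqrt(a0)
    mass = 2.0 * STD_NORMAL.cdf(c) - 1.0          # P(|Z| <= c) for Z ~ N(0, 1)
    w = STD_NORMAL.inv_cdf(0.5 + 0.25 * mass)     # median of |W|
    return w * w / (1.0 - a0 * w * w)             # map W^2 to U^2

m2_n = m2_normal()
m2_f = m2_new_density(0.1)
print(f"m2 for N(0,1): {m2_n:.4f}   m2 for f(u; 0.1, 1): {m2_f:.4f}")
```

Both results should land close to the values 0.455 and 0.475 quoted above.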

⇒ Practical estimator: $\widehat{X_t^2} = m_2 \hat{s}_t^2$ (which is about one half of the common estimator)

                          benchmark   $\hat{s}_t^2$   $m_2 \hat{s}_t^2$
Yen/Dollar–N(0, 1)        0.0697      0.0646          0.0545
Yen/Dollar–t distr.       ”           0.0550          0.0541
Yen/Dollar–f(·; a0, 1)    ”           0.0567          0.0540
S&P500–N(0, 1)            0.3343*     0.1042          0.0919
S&P500–t distr.           ”           0.0947          0.0920
S&P500–f(·; a0, 1)        ”           0.0975          0.0918
IBM–N(0, 1)               0.1692      0.1918          0.1500
IBM–t distr.              ”           0.1571          0.1455
IBM–f(·; a0, 1)           ”           0.1609          0.1454

Table 4a. Entries represent the Mean Absolute Deviation (multiplied by 1,000) for three predictors of squared returns. Predictions were carried out over the 2nd half of each dataset, with coefficients estimated from the 1st half.
* This value is so high because the crash of 1987 is present in the 2nd half of the S&P500 dataset.

                          benchmark   $\hat{s}_t^2$   $m_2 \hat{s}_t^2$
Yen/Dollar–N(0, 1)        0.0697      0.0650          0.0545
Yen/Dollar–t distr.       ”           0.0554          0.0540
Yen/Dollar–f(·; a0, 1)    ”           0.0574          0.0539
S&P500–N(0, 1)            0.3343      0.1135          0.0942
S&P500–t distr.           ”           0.0948          0.0920
S&P500–f(·; a0, 1)        ”           0.0978          0.0919
IBM–N(0, 1)               0.1692      0.1815          0.1472
IBM–t distr.              ”           0.1545          0.1453
IBM–f(·; a0, 1)           ”           0.1577          0.1453

Table 4b. As before, i.e. predictions carried out over the 2nd half of each dataset, but GARCH coefficients estimated from the whole of the dataset.⁶

⁶ By contrast to the conservative entries of Table 4a, the entries of Table 4b are over-optimistic as the GARCH coefficients used have unrealistic accuracy; therefore, the truth should lie somewhere in-between Table 4a and Table 4b. Nevertheless, the two tables are similar enough to suggest that the effect of the accuracy of the GARCH coefficients is not so prominent, and Table 4b leads to the same conclusions as Table 4a.

Moral

• ARCH/GARCH models do have predictive validity for the squared returns.
• This is particularly true when a heavy-tailed distribution is assumed for the GARCH residuals, with the f(·; a0, 1) distribution appearing to have a slight edge over the popular t distribution.
• To appreciate and take advantage of this one must: (a) use a more meaningful measure of prediction performance such as $L_1$ loss, and (b) use the optimal predictor $m_2 \hat{s}_t^2$ in the $L_1$ case.

A Model-free viewpoint

Recall the studentized empirical ratio (3)

$$W_t = \frac{X_t}{\sqrt{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2}}$$

It was claimed that there is a specification of $a, a_0, a_1, \ldots$ that will approximately normalize $W_t$.
◦ Look for this specification without resort to a model.
◦ Once this specification is found, the mapping of $\{X_t, t \in \mathbb{Z}\}$ to $\{W_t, t \in \mathbb{Z}\}$ is a normalizing and variance–stabilizing transformation: NoVaS.

NoVaS transformation

$\{X_t, t \in \mathbb{Z}\} \longleftrightarrow \{W_t, t \in \mathbb{Z}\}$ where

$$W_t = \frac{X_t}{\sqrt{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2}}$$

• NoVaS—domain is easier to work with.

•{Wt,t∈ Z} is by construction (approximately) Normal with no .
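The transformation itself takes only a few lines of code. The sketch below computes $W_t$ for every time point that has $p$ past values available; the function name, the argument layout (`coeffs = [a_0, ..., a_p]`) and the sample numbers are illustrative assumptions.

```python
import math

def novas_transform(x, a, coeffs):
    """W_t = X_t / sqrt(a + a_0 X_t^2 + sum_{i=1}^p a_i X_{t-i}^2), coeffs = [a_0, ..., a_p].

    Returns W for 0-based t = p, ..., n-1, i.e. the times with p past values available.
    """
    p = len(coeffs) - 1
    w = []
    for t in range(p, len(x)):
        denom2 = a + sum(coeffs[i] * x[t - i] ** 2 for i in range(p + 1))
        w.append(x[t] / math.sqrt(denom2))
    return w

x = [0.5, -1.2, 0.3, 2.5, -0.4, 0.9]                 # made-up returns
w = novas_transform(x, a=0.0, coeffs=[1/3, 1/3, 1/3])
print([round(v, 3) for v in w])
```

Because the current value $X_t$ enters the denominator, every $W_t$ satisfies $|W_t| \leq 1/\sqrt{a_0}$, here $\sqrt{3}$, however extreme the return.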

Figure 5: Plots of the Simple NoVaS transformed series corresponding to the three datasets of Figure 1. The variance stabilization effect is quite apparent; in particular, note that the market crash of October 1987 is hardly (if at all) noticeable in Figure 5 (b) and (c). A comparison with Figure 1 is quite striking.

Figure 6: Histograms and Q-Q plots for the three returns series.

Figure 7: Histograms and Q-Q plots for the three NoVaS series of Figure 5.

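Spectral and bispectral density

For a mean zero time series $\{Y_t, t \in \mathbb{Z}\}$ define:

• The second and third order cumulant functions $\gamma(k) = E Y_t Y_{t+k}$ and $\Gamma(j, k) = E Y_t Y_{t+j} Y_{t+k}$.
• The spectral and bispectral density functions:

$$f(w) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} \gamma(k) e^{-iwk}, \quad \text{and} \quad F(w_1, w_2) = (2\pi)^{-2} \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \Gamma(j, k) e^{-i w_1 j - i w_2 k}.$$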

Linearity and joint normality

• Also define the normalized bispectrum as:

$$K(w_1, w_2) = \frac{|F(w_1, w_2)|^2}{f(w_1) f(w_2) f(w_1 + w_2)}.$$

◦ If the time series is linear, then $K(w_1, w_2)$ is constant, equal to $\Gamma(0, 0)^2 / (2\pi \gamma(0)^3)$ for all $w_1, w_2$.
◦ If the time series is (jointly) normal, then $K(w_1, w_2) \equiv 0$ for all $w_1, w_2$.

Can test a time series for linearity or joint normality using the bispectrum; cf. Subba Rao/Gabr, Hinich, etc.

◦ NoVaS transforms to linearity and joint normality.

Figure 1: Estimated bispectrum $\hat{K}(\omega_1, \omega_2)$ vs. $(\omega_1, \omega_2)$ for the S&P500 returns.

Figure 2: Estimated bispectrum $\hat{K}(\omega_1, \omega_2)$ vs. $(\omega_1, \omega_2)$ for the S&P500 NoVaS series.

Prediction of squared returns using NoVaS

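The NoVaS transformation

$$W_t = \frac{X_t}{\sqrt{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2}}$$

can be inverted:

$$X_t^2 = \frac{W_t^2}{1 - a_0 W_t^2} \left(a + \sum_{i=1}^{p} a_i X_{t-i}^2\right)$$

◦ $L_1$–optimal predictor of $X_t^2$ is

$$\widetilde{X_t^2} = \mu_2 \left(a + \sum_{i=1}^{p} a_i X_{t-i}^2\right) \quad \text{where} \quad \mu_2 = \mathrm{Median}\left(\frac{W_t^2}{1 - a_0 W_t^2} \,\Big|\, \mathcal{F}_{t-1}\right).$$

• If $W_t$ is (approximately) uncorrelated,⁷ then estimate $\mu_2$ by the sample median of $\left\{\frac{W_k^2}{1 - a_0 W_k^2};\ k = 1 + p, \ldots, t - 1\right\}$.

• If $W_t$ is found correlated, then Gaussian theory can be used to get the predictive distribution of $W_t | \mathcal{F}_{t-1}$.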
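For the uncorrelated case, the predictor can be sketched as follows: compute the ratios $W_k^2 / (1 - a_0 W_k^2)$ over the available history, take their sample median as the estimate of $\mu_2$, and multiply by the weighted sum of past squared returns. The function name, argument layout and test data are illustrative assumptions, and the correlated case is not covered.

```python
from statistics import median

def novas_l1_forecast(x, a, coeffs):
    """One-step L1 forecast of the next squared return from returns x[0..n-1].

    coeffs = [a_0, a_1, ..., a_p]; mu_2 is estimated by a sample median,
    which presumes the W series is (approximately) uncorrelated.
    """
    a0, p = coeffs[0], len(coeffs) - 1
    ratios = []
    for t in range(p, len(x)):
        past = a + sum(coeffs[i] * x[t - i] ** 2 for i in range(1, p + 1))
        w2 = x[t] ** 2 / (past + a0 * x[t] ** 2)       # W_t^2
        ratios.append(w2 / (1.0 - a0 * w2))            # algebraically X_t^2 / past
    mu2_hat = median(ratios)
    n = len(x)
    past_next = a + sum(coeffs[i] * x[n - i] ** 2 for i in range(1, p + 1))
    return mu2_hat * past_next

# Toy check: returns of constant magnitude 1 should yield a forecast of 1
flat = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
forecast = novas_l1_forecast(flat, a=0.0, coeffs=[0.5, 0.5])
print(forecast)
```

The comment on `ratios` reflects the inversion formula: dividing $W_t^2$ by $1 - a_0 W_t^2$ removes the contribution of $X_t^2$ from the denominator again.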

⁷ See e.g. Figure 3 (c).

Figure 8: (Yen/Dollar example) (a) Correlogram of the returns series $\{X_t\}$; (b) Correlogram of the squared returns $\{X_t^2\}$; (c) Correlogram of the optimal Simple NoVaS series $\{W_{t,9}^S\}$.

$\lambda_i$                       -4     -1     -0.5   0      0.5    1      4
KYRT($\tilde{W}_{t,12,i}$)        2.99   3.11   3.14   3.01   2.97   3.01   3.04

Table: Sample kyrtosis of $\tilde{W}_{t,i} = W_t + \lambda_i W_{t-1}$ for different values of $\lambda_i$.

• Wt appears to be approximately a Gaussian process.

Choosing the parameters of NoVaS

• Choose the order $p$ and $a, a_0, \ldots, a_p$ with the twin goals⁸ of normalization and variance–stabilization of the transformed series $\{W_t\}$.
• Recall that in the NoVaS series

$$W_t = \frac{X_t}{\sqrt{a + a_0 X_t^2 + \sum_{i=1}^{p} a_i X_{t-i}^2}}$$

the denominator has the interpretation of a local estimate of standard deviation of $X_t$.
• the simplest such local estimates are: a moving average, and an exponentially weighted average, of the past $X_t^2$ values.
• both simple schemes involve the choice $a = 0$.

– ‘Simple NoVaS’: $a_i = 1/(p + 1)$ for all $0 \leq i \leq p$.
– ‘Exponential NoVaS’: $a_i = c' e^{-ci}$ for all $0 \leq i \leq p$.
• For variance–stabilization, we require $\sum_{i=0}^{p} a_i = 1$.

⁸ Secondarily, the NoVaS parameters may be further optimized with a specific criterion in mind, e.g., optimal volatility prediction.

Figure 9: Illustration of the simple NoVaS algorithm for the Yen/Dollar dataset: plot of $KYRT_n(W_{t,p}^S)$ as a function of $p$; the solid line indicates the Gaussian kyrtosis of 3.

Simple NoVaS

Target: normalize the NoVaS series $W_t$; for example, ensure that the kyrtosis $K_1^n(W)$ is close(st) to 3.

• Simple NoVaS: $a_i = 1/(p + 1)$ for all $0 \leq i \leq p$.
– Pick $p$ to get $K_1^n(W) \simeq 3$.
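The search over $p$ can be sketched as a plain scan: for each candidate $p$, form the Simple NoVaS series with a moving window of squared returns and keep the $p$ whose kyrtosis is closest to 3. The data below are simulated from a toy ARCH(1) model with illustrative coefficients, and the function names and the upper limit `p_max = 60` are assumptions made for the example.

```python
import math
import random

def kurtosis(y):
    """Empirical kurtosis: fourth central moment divided by squared variance."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return m4 / (m2 * m2)

def simple_novas(x, p):
    """W_t = X_t / sqrt(mean of X_t^2, ..., X_{t-p}^2) for 0-based t = p..n-1 (a = 0)."""
    sq = [v * v for v in x]
    window = sum(sq[: p + 1])
    w = []
    for t in range(p, len(x)):
        if t > p:
            window += sq[t] - sq[t - p - 1]   # slide the window by one step
        w.append(x[t] / math.sqrt(window / (p + 1)))
    return w

def pick_p(x, p_max):
    """Return the p in 1..p_max whose Simple NoVaS series has kurtosis closest to 3."""
    return min(range(1, p_max + 1),
               key=lambda p: abs(kurtosis(simple_novas(x, p)) - 3.0))

# Toy ARCH(1) returns: X_t = Z_t * sqrt(1 + 0.5 * X_{t-1}^2)
rng = random.Random(2)
x, prev = [], 0.0
for _ in range(3000):
    prev = rng.gauss(0.0, 1.0) * math.sqrt(1.0 + 0.5 * prev * prev)
    x.append(prev)

p_best = pick_p(x, p_max=60)
k_best = kurtosis(simple_novas(x, p_best))
print(f"chosen p = {p_best}, kurtosis = {k_best:.3f}")
```

Since $p$ is an integer, the kyrtosis can only be brought close to 3, not exactly onto it; the continuous parameter $c$ of Exponential NoVaS does not have this limitation.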

Figure 10: Illustration of the Exponential NoVaS algorithm for the Yen/Dollar dataset: plot of $KYRT_n(W_{t,c}^E)$ as a function of $c$; the solid line indicates the Gaussian kyrtosis of 3.

Exponential NoVaS

• Exponential NoVaS: $a_i = c' e^{-ci}$ for all $0 \leq i \leq p$.
– Pick $c$ and $p$ only; $c'$ is identified from $\sum_i a_i = 1$.
– Only $c$ is important here:
(i) pick a huge starting value for $p$, e.g., $n/4$;
(ii) pick $c$ to get $K_1^n(W) \simeq 3$;
(iii) trim the value of $p$: if $a_i < \epsilon$, then let $a_i = 0$, and let $p$ = smallest $i$ such that $a_i < \epsilon$;
(iv) finally, renormalize the $a_i$s so that $\sum_i a_i = 1$.
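Steps (i), (iii) and (iv) amount to building, trimming and renormalizing a list of exponential weights. The sketch below is one reading of those steps: the weights are normalized to sum to one, those below a threshold are dropped, and the survivors are renormalized, so that the final $p$ is the index of the last surviving weight. The function name and the threshold value 0.01 are assumptions; the search for $c$ in step (ii) is not included.

```python
import math

def exponential_novas_coeffs(c, p_start, eps):
    """Weights a_i proportional to exp(-c * i), i = 0..p, trimmed at eps and renormalized.

    Assumes eps is smaller than the largest weight a_0, so at least one weight survives.
    """
    raw = [math.exp(-c * i) for i in range(p_start + 1)]
    total = sum(raw)
    coeffs = [r / total for r in raw]            # normalizing constant is 1 / total
    # Step (iii): weights decrease in i, so this cuts off the tail below eps
    kept = [ai for ai in coeffs if ai >= eps]
    # Step (iv): renormalize the surviving weights to sum to one
    kept_total = sum(kept)
    return [ai / kept_total for ai in kept]

coeffs = exponential_novas_coeffs(c=0.0985, p_start=1000, eps=0.01)
p = len(coeffs) - 1
print(f"p after trimming = {p}, sum of weights = {sum(coeffs):.6f}")
```

With the assumed threshold of 0.01, this reading returns $p = 22$ for $c = 0.0985$ and $p = 10$ for $c = 0.0113$, which is consistent with the values quoted in the caption of Figure 11.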

Figure 11: Plot of the exponential coefficients $a_i$ versus the index $i = 1, \ldots, p$ for the two values of $c$ suggested by Figure 10; note that $c \simeq 0.0113$ corresponds to $p = 10$, while $c \simeq 0.0985$ corresponds to $p = 22$.

Comparison of predictors using MAD⁹

Predictor type                                         Yen/Dollar   S&P500   IBM
AR model for $X_t^2$ with AIC                          0.963        0.912    0.941
$\hat{s}_t^2$: GARCH(1,1) with normal errors           0.971        0.982    0.980
$\hat{s}_t^2$: GARCH(1,1) with t–errors                0.821        0.818    0.864
$m_2 \hat{s}_t^2$: GARCH(1,1) with normal errors       0.805        0.817    0.829
$m_2 \hat{s}_t^2$: GARCH(1,1) with t–errors            0.793        0.799    0.831
Simple NoVaS                                           0.800        0.764    0.834
Exponential NoVaS                                      0.787        0.754    0.820

Entries are relative to benchmark (= sample variance).

NoVaS predictor: $\widehat{X_t^2} = \hat{\mu}_2 \left(a + \sum_{i=1}^{p} a_i X_{t-i}^2\right)$ where $\hat{\mu}_2$ = sample median of $\left\{\frac{W_k^2}{1 - a_0 W_k^2};\ k = 1 + p, \ldots, t - 1\right\}$.
◦ Simple NoVaS ≈ GARCH(1,1) with t–errors
◦ Exponential NoVaS is best overall!
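The entries of the table are ratios of Mean Absolute Deviations. A minimal sketch of that computation follows; it assumes that the benchmark predictor is the mean of the squared returns over the same evaluation window (the sample variance of mean-zero returns), a detail not spelled out above, and the function names and numbers are made up.

```python
def mean_absolute_deviation(actual, predicted):
    """Average absolute prediction error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_mad(squared_returns, predictions):
    """MAD of the predictions divided by the MAD of the constant benchmark predictor."""
    n = len(squared_returns)
    benchmark = sum(squared_returns) / n          # sample variance for mean-zero returns
    mad_benchmark = mean_absolute_deviation(squared_returns, [benchmark] * n)
    return mean_absolute_deviation(squared_returns, predictions) / mad_benchmark

squared = [1.0, 4.0, 9.0, 0.0]                    # made-up squared returns
ratio_value = relative_mad(squared, [2.0, 3.0, 8.0, 1.0])
print(f"relative MAD = {ratio_value:.3f}")
```

A value below 1 means the predictor beats the benchmark in the $L_1$ sense, which is how the table entries are to be read.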

⁹ GARCH fit and predictions using whole dataset.

General Exponential NoVaS

• Exponential NoVaS with choice of $\alpha$ optimized for volatility prediction.
• All $(c, \alpha)$ combinations in Table 5 are equally successful in normalizing the NoVaS transformation.¹⁰

$\alpha$   Yen/Dollar   S&P500    IBM
0.00       0.787        0.754     0.820
           (0.098)      (0.084)   (0.069)
0.05       0.786        0.750     0.815
           (0.109)      (0.095)   (0.075)
0.10       0.785        0.746     0.811
           (0.120)      (0.108)   (0.080)
0.20       0.785        0.739     0.806
           (0.140)      (0.135)   (0.098)
0.30       0.784        0.733     0.803
           (0.183)      (0.195)   (0.127)
0.40       0.783        0.730     0.797
           (0.260)      (0.300)   (0.180)
0.50       0.787        0.733     0.789
           (0.410)      (0.520)   (0.290)
0.60       0.787        N/A       0.787
           (0.813)      (N/A)     (0.580)
0.65       N/A          N/A       0.788
           (N/A)        (N/A)     (0.990)

Table 5.¹¹

¹⁰ The N/A entries in Table 5 indicate values of $\alpha$ that are too big for the kyrtosis matching to be successful.
¹¹ Entries give the empirical Mean Absolute Deviation (MAD) of prediction of squared returns relative to benchmark using the General Exponential NoVaS with parameter $\alpha$; below each entry in parentheses is the optimal exponent $c$ from kyrtosis matching.

References

1. D. N. Politis, ‘A normalizing and variance-stabilizing transformation for financial time series’, in Recent Advances and Trends in Nonparametric Statistics, (M.G. Akritas and D.N. Politis, Eds.), Elsevier (North Holland), 2003, pp. 335-347.

2. D. N. Politis, ‘Model-free volatility prediction’. UCSD Dept. of Economics Discussion Paper 2003-16.

3. D. N. Politis, ‘A heavy-tailed distribution for ARCH residuals with application to volatility prediction’, Annals of Economics and Finance, vol. 5, pp. 283-298, 2004.

4. D. N. Politis, ‘Can the Stock Market be Linearized?’. UCSD Dept. of Economics Discussion Paper 2006-03.
