An alternative route to performance hypothesis testing

Received (in revised form): 7th November, 2003

Bernd Scherer heads Research for Deutsche in Europe. Before joining Deutsche, he worked at Morgan Stanley, J.P. Morgan and Schroders, where he headed global fixed income research. He has published several books on asset management and is a lecturer in at the University of Augsburg.

Head of Investment Solutions, Deutsche Asset Management, Mainzer Landstr. 178–190, 603227 Frankfurt, Germany Tel: +49 69717063461; Fax: +49 15112100340; e-mail: [email protected]

Abstract

A wide variety of risk–return ratios are routinely reported in sales pitches as well as academic publications. Little attempt has been made, however, to look at the small sample distributions of these estimators in order to derive confidence bands. The reason for this has been the extreme difficulty of working out the required statistics for most risk–return ratios. Rather than following classical statistics, this paper relies on a general and robust method which not only provides confidence intervals for arbitrary risk–return ratios, sample sizes and distributions, but is also fairly easy to implement.

Keywords: hedge funds, bootstrapping, confidence interval, small sample

Introduction

Reported risk–return ratios relate average returns to alternative measures of risk, and hence involve the ratio of a random (owing to sampling error) numerator and denominator. As such, point estimates of these ratios are easy to calculate, but confidence intervals are much more difficult to arrive at. Confidence intervals are needed, however, for any kind of statistical inference and decision making. While Lo (2002) has shown that an asymptotic distribution exists for the Sharpe ratio, this result provides a special case rather than a general concept. Traditional methods only work well if the sampling distribution of the statistic in question is asymptotically normally distributed. There is little guidance on the small sample behaviour of risk adjusted performance measures or how many data points one needs to rely on for asymptotic results. Useful exceptions are Pedersen and Satchell (2000), Miller and Gehr (1978) and Jobson and Korkie (1981). Moreover, these analytical solutions are either extremely difficult to work out or simply do not exist for modifications of the popular Sharpe ratio which focus more on downside risk. An example is the well-known Sortino ratio (it relates average return to the volatility of downside returns). A general method is needed which provides confidence intervals for arbitrary risk–return ratios, sample sizes and distributions.

© Henry Stewart Publications 1479-179X (2004) Vol. 5, 1, 5–12 Journal of Asset Management

Bootstrapping theory as an alternative

Suppose a series of returns (total return minus risk-free rate) $r_1, r_2, \ldots, r_m$ is observed. Ex post risk–return ratios ($\hat{\sigma}$) are calculated as the ratio of average return per unit of risk. This paper will focus on Sharpe and Sortino ratios (Sharpe, 1994; Sortino and Price, 1994). The two ratios differ with respect to the employed risk measure. The Sharpe ratio applies a symmetric risk concept, equally penalising (squaring) downside and upside deviations from the sample mean return, while the Sortino ratio only includes negative performance in its calculation of squared returns. The Sortino ratio is included for three reasons. First, it provides a better capture of risk if returns are non-normally distributed, as is the case for hedge fund returns. Secondly, it is known to suffer more from estimation error, as it uses fewer data (extreme returns are by definition rare). Thirdly, no large sample approximations exist. The sample calculations for both ratios are given below.

$$\text{Sharpe ratio} = \frac{\frac{1}{m}\sum_{i=1}^{m} r_i}{\sqrt{\frac{1}{m-1}\sum_{i=1}^{m}(r_i - \bar{r})^2}} \qquad (1)$$

$$\text{Sortino ratio} = \frac{\frac{1}{m}\sum_{i=1}^{m} r_i}{\sqrt{\frac{1}{m-1}\sum_{i=1}^{m} I(r_i < 0)\,(r_i)^2}} \qquad (2)$$

where $I(r_i < 0)$ denotes the indicator function and $\bar{r}$ the sample mean return. High ratios are preferable (everything else equal) as they indicate a better return per unit of risk taken. If the small sample distribution of $\hat{\sigma}$ is far from normal, classical methods are biased and unreliable. In any case, the analytical formula for the large sample distribution of (1) is extremely hard to come by, while it is unknown for (2).

In order to overcome the above problem, bootstrapping techniques are relied on. Resampling treats the current sample as a good approximation of the true distribution (in the absence of further information, it is the best available). It then repeatedly draws from the empirical distribution to recalculate the statistic of interest many times to arrive at the bootstrap sampling distribution that can now be used for hypothesis testing.

Suppose one is given monthly returns on the HFR fund of funds index ranging from January 1990 to April 2003 (160 observations). The JPM one-month cash rate from DataStream is used to calculate a risk-free return. The bootstrapping procedure is as follows. Randomly draw 160 (original sample size) returns with replacement from the original sample. Calculate a new risk–return ratio $\hat{\sigma}^{*}_{b}$ based on the resampled returns. Repeat this procedure for $b = 1, \ldots, B$ times, arriving at $\hat{\sigma}^{*}_{1}, \ldots, \hat{\sigma}^{*}_{b}, \ldots, \hat{\sigma}^{*}_{B}$ resampled ratios. The bootstrap sampling distribution of $\hat{\sigma}^{*}_{b}$ can now be taken to judge whether the sampling distribution of $\hat{\sigma}$ in small samples is normal (and hence traditional approaches might not be so bad after all) or not. With $B = 1{,}000$, the following results are obtained (see Figure 1). While the Sharpe ratio is well behaved (all resampled realisations plot on a straight line, equalising hypothetical and empirical percentiles), the same cannot be said about the Sortino ratio. Deviations from normality are large at both ends. Normal (small sample) approximations look reasonable only for the traditional Sharpe ratio. They seem to be largely misleading, however, for the Sortino ratio. Taking the 2.5 per cent and 97.5 per cent percentiles, one can now arrive at a symmetric 95 per cent confidence band $CI(\hat{\sigma}^{*}_{2.5\%}, \hat{\sigma}^{*}_{97.5\%})$. Note the apparent non-normality of the sample distribution for the Sortino ratio in Figure 2.
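The simple bootstrap described above can be sketched in a few lines. The following is a minimal illustration in Python rather than the S-Plus code used for the paper; the function names, the linear-interpolation percentile and the simulated return series are assumptions made for this example only and are not the HFR data.

```python
import math
import random


def sharpe_ratio(returns):
    """Equation (1): mean return over the sample standard deviation."""
    m = len(returns)
    mean = sum(returns) / m
    variance = sum((r - mean) ** 2 for r in returns) / (m - 1)
    return mean / math.sqrt(variance)


def sortino_ratio(returns):
    """Equation (2): only negative returns enter the squared-return sum."""
    m = len(returns)
    mean = sum(returns) / m
    downside = sum(r ** 2 for r in returns if r < 0) / (m - 1)
    # Assumes at least one negative return; otherwise the ratio is undefined.
    return mean / math.sqrt(downside)


def percentile(values, p):
    """Empirical percentile for 0 <= p <= 1, using linear interpolation."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return (1 - weight) * ordered[lower] + weight * ordered[upper]


def bootstrap_ratios(returns, ratio, n_boot=1000, seed=0):
    """Draw len(returns) observations with replacement, n_boot times."""
    rng = random.Random(seed)
    m = len(returns)
    return [ratio(rng.choices(returns, k=m)) for _ in range(n_boot)]


def percentile_band(ratios, level=0.95):
    """Symmetric percentile band, e.g. the 2.5% and 97.5% percentiles."""
    tail = (1 - level) / 2
    return percentile(ratios, tail), percentile(ratios, 1 - tail)


# Hypothetical monthly excess returns (simulated, 160 observations).
data_rng = random.Random(42)
excess_returns = [data_rng.gauss(0.006, 0.02) for _ in range(160)]

for name, ratio in (("Sharpe", sharpe_ratio), ("Sortino", sortino_ratio)):
    resampled = bootstrap_ratios(excess_returns, ratio)
    low, high = percentile_band(resampled)
    print(f"{name}: estimate={ratio(excess_returns):.3f}, "
          f"95% band=({low:.3f}, {high:.3f})")
```

Sorting the resampled ratios and plotting them against standard normal quantiles would give a QQ-plot of the kind referred to as Figure 1.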


Figure 1 QQ-plot for bootstrapped sampling distribution (panels: Sharpe ratio and Sortino ratio; horizontal axes: quantiles of standard normal)

Figure 2 Bootstrapped sampling distributions


Figure 3 Sample histogram of $u_b = \frac{1}{Z}\sum_{z=1}^{Z} I(\hat{\sigma}^{**}_{bz} < \hat{\sigma})$ (horizontal axis: probability that double bootstrapped Sortino ratio falls below initial Sortino ratio)

As suspected, the Sortino ratio shows a much larger dispersion in resampled outcomes and hence estimation error. The corresponding values can be read off from the histograms in Figure 2. For the Sortino (Sharpe) ratio, the 95 per cent interval ranges from 0.11 to 0.92 (0.08–0.41). Both ratios are significantly different from zero.

Increasing the coverage ratio with the double bootstrap

So far, reliance has been on the 95 per cent interval from a simple bootstrap procedure. It is not guaranteed, however, that the 95 per cent confidence band calculated above indeed covers the true ratio with 95 per cent probability. The true ratio might fall into the estimated confidence band only 85 per cent of the time. One method of increasing the coverage probability is the double bootstrap, which can be thought of as bootstrapping the bootstrap. The double bootstrap as described by Nankervis (2002) involves the following calculations. Perform the simple bootstrap as described above. Save all $b = 1, \ldots, B$ resampled data sets as well as resampled ratios $\hat{\sigma}^{*}_{b}$. This is called the first stage resampling. For each of the $B$ resampled data sets, start a second round of $z = 1, \ldots, Z$ resamples, leading to a total of $BZ$ resamples denoted as $\hat{\sigma}^{**}_{bz}$. For each $\hat{\sigma}^{*}_{b}$, there exists a new set of $Z$ resampled ratios $\hat{\sigma}^{**}_{b1}, \ldots, \hat{\sigma}^{**}_{bZ}$. These are the second stage resamples. For each $\hat{\sigma}^{*}_{b}$, calculate the percentage of second stage resamples $\hat{\sigma}^{**}_{b1}, \ldots, \hat{\sigma}^{**}_{bZ}$ that fall below the original sample estimate of the risk–return ratio $\hat{\sigma}$, ie calculate

$$u_b = \frac{1}{Z}\sum_{z=1}^{Z} I(\hat{\sigma}^{**}_{bz} < \hat{\sigma}) \qquad (3)$$

Choose $B = 1{,}000$ and $Z = 200$. Under ideal conditions $u_b$ follows a uniform distribution. Figure 3 shows that this assumption is clearly violated for the double bootstrapped Sortino ratios $\hat{\sigma}^{**}_{b1}, \ldots, \hat{\sigma}^{**}_{bZ}$.
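The two resampling stages and Equation (3) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of Nankervis (2002) or the S-Plus code behind the paper: the function names, the nearest-rank quantile rule and the simulated data are hypothetical, and the demonstration uses far fewer resamples than $B = 1{,}000$ and $Z = 200$ to keep the run time short. The last helper applies the percentile adjustment that forms the final step of the procedure.

```python
import math
import random


def sortino_ratio(returns):
    """Equation (2); returns infinity if a sample has no negative return."""
    m = len(returns)
    mean = sum(returns) / m
    downside = sum(r ** 2 for r in returns if r < 0) / (m - 1)
    return mean / math.sqrt(downside) if downside > 0 else math.inf


def resample(data, rng):
    """One bootstrap sample: same size as the data, drawn with replacement."""
    return rng.choices(data, k=len(data))


def double_bootstrap_u(returns, ratio, n_first=200, n_second=50, seed=0):
    """Return the sample estimate, first stage ratios and u_b values."""
    rng = random.Random(seed)
    estimate = ratio(returns)
    first_stage, u = [], []
    for _ in range(n_first):
        sample_b = resample(returns, rng)          # first stage data set
        first_stage.append(ratio(sample_b))
        # Second stage: resample the resampled data set n_second times and
        # count how often the ratio falls below the original estimate.
        below = sum(ratio(resample(sample_b, rng)) < estimate
                    for _ in range(n_second))
        u.append(below / n_second)                 # Equation (3)
    return estimate, first_stage, u


def quantile(values, p):
    """Nearest-rank empirical quantile for 0 <= p <= 1 (an assumed rule)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


def calibrated_interval(first_stage, u, level=0.95):
    """Read the first stage band at the percentiles of u_b."""
    tail = (1 - level) / 2
    p_low, p_high = quantile(u, tail), quantile(u, 1 - tail)
    return quantile(first_stage, p_low), quantile(first_stage, p_high)


# Hypothetical monthly excess returns (simulated, 160 observations).
data_rng = random.Random(42)
excess_returns = [data_rng.gauss(0.006, 0.02) for _ in range(160)]

estimate, first_stage, u = double_bootstrap_u(excess_returns, sortino_ratio)
low, high = calibrated_interval(first_stage, u)
print(f"Sortino estimate={estimate:.3f}, adjusted band=({low:.3f}, {high:.3f})")
```

A histogram of the returned `u` values corresponds to the kind of plot shown in Figure 3; a roughly flat histogram would indicate that the simple percentile band already has close to nominal coverage.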


Formal tests such as the Kolmogorov–Smirnov test as well as the $\chi^2$ goodness-of-fit test provide p values close to 0 per cent. Hence, the null hypothesis that Figure 3 comes from a uniform distribution can be safely rejected. Finally, calculate the 2.5 per cent and 97.5 per cent percentiles of $u_b$. Use these values to adjust the first stage resample confidence band to $CI(\hat{\sigma}^{*}_{u2.5\%}, \hat{\sigma}^{*}_{u97.5\%})$. With the double bootstrap, the lower bound of the confidence interval rises to 0.18 (instead of 0.11), which represents the 8 per cent quantile (rather than the 2.5 per cent quantile). In effect, one gets $CI(\hat{\sigma}^{*}_{u7.9\%} = 0.18, \hat{\sigma}^{*}_{u96\%} = 0.96)$.

What if one needs to deal with autocorrelated data?

So far, it has been implicitly assumed that return data $r_1, r_2, \ldots, r_m$ have been drawn independently, ie neither returns nor volatilities are autocorrelated. Bootstrapping implicitly removes any time dependence by its very definition. All draws are unconditional on the result of the previous draw. After a strong positive draw, there is no mechanism that would favour another positive draw (in case of positive autocorrelation) or another large draw (in case of ARCH effects). Many financial time series (real estate, corporate bonds, high yield, hedge funds), however, show strong autocorrelation due to the illiquidity (infrequent trading) of the underlying instruments. Bootstrapping fails if one does not account for return dependence. For the data analysed so far, the autocorrelation function can be plotted as in Figure 4(a). The first-order autocorrelation coefficient amounts to 0.31 and is well outside the confidence limits (given by dotted line), ie it is statistically significant.

Figure 4 Autocorrelation for hedge fund series: (a) raw data, (b) filtered data (ACF against lag)
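The loss of time dependence under resampling can be illustrated with a short Python sketch. The series below is simulated with an assumed AR(1) coefficient of 0.31 and is not the HFR data. The limit $\pm 1.96/\sqrt{m}$ is the usual large sample white noise bound for an autocorrelation plot; that the dotted lines in Figure 4 are drawn this way is an assumption.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    m = len(series)
    mean = sum(series) / m
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum((series[t] - mean) * (series[t - lag] - mean)
                    for t in range(lag, m))
    return numerator / denominator


def white_noise_limit(m, z=1.96):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return z / math.sqrt(m)


# Hypothetical autocorrelated return series (simulated AR(1), 160 points).
rng = random.Random(7)
series = [0.0]
for _ in range(159):
    series.append(0.004 + 0.31 * series[-1] + rng.gauss(0.0, 0.015))

limit = white_noise_limit(len(series))
print(f"lag-1 autocorrelation of the series: {autocorrelation(series, 1):.3f}")
print(f"approximate 95% limits: +/-{limit:.3f}")

# Drawing with replacement ignores the ordering of the observations, so the
# resampled series no longer carries the dependence of the original one.
resampled = rng.choices(series, k=len(series))
print(f"lag-1 autocorrelation after resampling: "
      f"{autocorrelation(resampled, 1):.3f}")
```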


Figure 5 Lower confidence bound (2.5 per cent) on Sharpe ratio for raw and adjusted data (frequency histograms of the lower bound at the 2.5% level, for raw data and for filtered data)

One way to deal with this issue is to remove the autocorrelation using a simple filter that adjusts for the inherent AR(1) process (lag one is statistically significant, while lag two is not) and reapply the methods established in the previous sections to the transformed series. It is known that the first-order autocorrelation can be removed. Estimate the AR(1) coefficient from a linear regression

$$r_t = \theta_0 + \theta_1 r_{t-1} + \varepsilon_t \qquad (4)$$

Save the regression coefficient $\theta_1$ and create a new filtered series $r^{*}_{t}$ according to

$$r^{*}_{t} = \frac{1}{1-\theta_1}\, r_t - \frac{\theta_1}{1-\theta_1}\, r_{t-1} \qquad (5)$$

Obviously this represents a break with the non-parametric approach used so far. As a compromise, one might want to estimate Equation (4) using robust methods. The above procedure will leave the average return of a series constant, but effectively increase (decrease) its variance for positive (negative) autocorrelation, as described in Scherer (2002). The effect of this procedure can be seen in the right-hand autocorrelation plot in Figure 4(b). Virtually all significant autocorrelation (confidence bounds are given by dotted lines) has been removed. After the removal of autocorrelation, one can proceed as in the previous section.

Rather than focusing on a single series, all HFR series are used for the described frequency and data period (see the HFR website for a description of the underlying data series). Figure 5 plots the distribution of lower confidence bounds (2.5 per cent percentile) of Sharpe ratios for raw data as well as adjusted (filtered) data.
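Equations (4) and (5) translate into a short filtering routine. The sketch below is an illustration in Python under stated assumptions: Equation (4) is estimated by ordinary least squares (not the robust alternative mentioned above), the function names are hypothetical, and the input is a simulated AR(1) series, not an HFR index.

```python
import random


def fit_ar1(series):
    """Equation (4): OLS regression of r_t on a constant and r_{t-1}."""
    x, y = series[:-1], series[1:]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def ar1_filter(series, theta1):
    """Equation (5): filtered series, one observation shorter than the input."""
    return [(series[t] - theta1 * series[t - 1]) / (1 - theta1)
            for t in range(1, len(series))]


def lag1_autocorrelation(series):
    """Sample first-order autocorrelation."""
    m = len(series)
    mean = sum(series) / m
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum((series[t] - mean) * (series[t - 1] - mean)
                    for t in range(1, m))
    return numerator / denominator


def sample_variance(series):
    mean = sum(series) / len(series)
    return sum((x - mean) ** 2 for x in series) / (len(series) - 1)


# Hypothetical autocorrelated return series (simulated AR(1), 160 points).
rng = random.Random(7)
raw = [0.0]
for _ in range(159):
    raw.append(0.004 + 0.31 * raw[-1] + rng.gauss(0.0, 0.015))

theta0, theta1 = fit_ar1(raw)
filtered = ar1_filter(raw, theta1)
print(f"estimated theta1: {theta1:.3f}")
print(f"lag-1 autocorrelation raw/filtered: "
      f"{lag1_autocorrelation(raw):.3f} / {lag1_autocorrelation(filtered):.3f}")
print(f"variance raw/filtered: "
      f"{sample_variance(raw):.6f} / {sample_variance(filtered):.6f}")
```

The filtered series can then be passed to the same bootstrap routines as the raw series, which is how the comparison between raw and filtered confidence bounds would be produced.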


Figure 6 Lower confidence bound (2.5 per cent) on Sortino ratio for raw and adjusted data (frequency histograms of the lower bound at the 2.5% level, for raw data and for filtered data)

The Sharpe ratios of 5 out of 37 series (expected: 2.5% × 37 ≈ 1) are not statistically different from zero. This number increases to 11 if filtered data (with higher variance) are used. If this is combined with a suspicion of upward bias in hedge fund returns, many hedge fund styles fail to show statistically significant Sharpe ratios of even monthly returns. An almost identical picture can be found for the Sortino ratio in Figure 6. Again, the historical track record of the hedge fund industry as a whole leaves some doubt about the industry’s ability to create added value.

Summary

Renewed interest in the significance of risk–return ratios has been focusing on closed form solutions for the well-known Sharpe ratio. This paper provided a robust methodology for evaluating the properties of the Sharpe ratio’s sampling distribution, as well as how to derive confidence intervals without having to rely on asymptotic approximations (as in reality samples are small). It has also been shown that the sampling distribution of other risk–return measures for which asymptotic results do not exist might be highly non-normal. While the double bootstrap methodology leads to significantly refined confidence bands, a more realistic modelling of hedge fund returns leaves the case for hedge funds less optimistic than providers of these services might wish.

Notes

1 All calculations have been performed in S-Plus. For relevant code and further examples on the use of bootstrapping in management see Scherer and Martin (2003).

References

Jobson, J. and Korkie, B. (1981) ‘Performance Hypothesis Testing with the Sharpe and Treynor Measures’, Journal of Finance, 36, 889–908.
Lo, A. W. (2002) ‘The Statistics of Sharpe Ratios’, Financial Analysts Journal, July/August, 58 (4).


Miller, R. and Gehr, A. (1978) ‘Sample Bias and Performance Measures: A Note’, Journal of Financial and Quantitative Analysis, 13, 943–6.
Nankervis, J. (2002) ‘Stopping Rules for Double Bootstrap Confidence Intervals’, University of Surrey, http://www.bus.qut.edu.au/esam02/program/papers/Nankervis_John.pdf
Pedersen, C. and Satchell, S. (2000) ‘Small Sample Measures of Performance Measures in the Asymmetric Response Model’, Journal of Financial and Quantitative Analysis, 35(3), 425–50.
Scherer, B. (2002) Portfolio Construction and Risk Budgeting, Riskwater, London.
Scherer, B. and Martin, D. (2003) Portfolio Optimization using Nuopt for S-Plus, Springer, New York.
Sharpe, W. F. (1994) ‘The Sharpe Ratio’, Journal of Portfolio Management, Fall, 49–58.
Sortino, F. A. and Price, L. N. (1994) ‘Performance Measurement in a Downside Risk Framework’, Journal of Investing, Fall.
