Quantitative Finance, Vol. 5, No. 3, June 2005, 271–276

Pairs trading

ROBERT J. ELLIOTTy, JOHN VAN DER HOEK*z and WILLIAM P. MALCOLM§

yHaskayne School of Business, University of Calgary, Calgary, Alberta, Canada T2N 1N4 zDepartment of Applied Mathematics, The University of Adelaide, Adelaide, South Australia 5005, Australia §National ICT Australia, The Australian National University, Canberra, ACT, Australia 0200

(Received 27 December 2004; in final form 11 April 2005)

‘Pairs Trading’ is an used by many Funds. Consider two similar which trade at some spread. If the spread widens the high and buy the low stock. As the spread narrows again to some equilibrium value, a profit results. This paper provides an analytical framework for such an investment strategy. We propose a mean- reverting Gaussian Markov chain model for the spread which is observed in Gaussian noise. Predictions from the calibrated model are then compared with subsequent observations of the spread to determine appropriate investment decisions. The methodology has potential applications to generating wealth from any quantities in financial markets which are observed to be out of equilibrium.

Keywords: Pairs trading; Hedge funds; Spreads

1. Introduction discussion of pairs trading can be found in Gatev et al. (1999) and in two recent books by Vidyamurthy (2004) Pairs Trading is a trading or investment strategy used to and Whistler (2004). Reverre (2001) discusses a classical exploit financial markets that are out of equilibrium. study of pairs trading involving Royal Dutch and Shell Litterman (2003) explains the philosophy of Goldman stocks. Pairs trading is also regarded as a special form of Sachs Asset Management as one of assuming that while Statistical and is sometimes discussed under markets may not be in equilibrium, over time they move this topic. The idea of pairs trading can be applied to to a rational equilibrium, and the trader has an interest any equilibrium relationship in financial markets, or to to take maximum advantage from deviations from () portfolios of securities some held short equilibrium. Pairs Trading is a consisting and others held (see Nicholas (2000)). of a long in one security and a short position in another security in a predetermined ratio. If the two securities are stocks from the same financial sector (like two mining stocks), one may take this ratio to be unity. 2. The spread model This ratio may be selected in such a way that the resulting portfolio is market neutral, a portfolio with zero to 2.1. The state process the market portfolio. This portfolio is often called a spread. We shall model this spread (or the return process Consider a state process fxkj k ¼ 0, 1, 2, ...g where xk for this spread) as a mean-reverting process which we denotes the value of some (real) variable at time tk ¼ k calibrate from market observations. This model will for k ¼ 0, 1, 2, .... We assume that {xk} is mean reverting: allow us to make predictions for this spread. If observa- pffiffiffi x x ¼ða bx Þ þ " , ð1Þ tions are larger (smaller) than the predicted value (by kþ1 k k kþ1 some threshhold value) we take a long (short) position where 0, b > 0, a 2R(which we may assume is non- in the portfolio and we unwind the position and make negative without any loss of generality), and f"kg is iid a profit when the spread reverts. A brief history and Gaussian Nð0, 1Þ. Clearly, we assume that "kþ1 is inde- pendent of x0, x1, ..., xk. The process mean reverts to ¼ a=b with ‘strength’ b. Clearly, *Corresponding author. Email: [email protected]. 2 edu.au xk Nðk, kÞ, ð2Þ

Quantitative Finance ISSN 1469–7688 print/ISSN 1469–7696 online # 2005 Taylor & Francis http://www.tandf.co.uk/journals DOI: 10.1080/14697680500149370 272 R. J. Elliott et al. where occurs. An alternative would be to initiate a long trade hi a a only when yk exceeds x^kjk1 by some threshold value. A ¼ þ ð1 bÞk, ð3Þ k b 0 b corresponding short trade could be entered when y < x^ . and k kjk1 Various decisions have to be made by the trader. What 2 hi 2 2k 2 2k is a suitable pair of securities for pair trading? If our k ¼ 1 ð1 bÞ þ 0 ð1 bÞ : ð4Þ 1 ð1 bÞ2 estimates for B reveal 0 < B < 1, then this is consistent with the mean-reverting model we have described. It is easy to show that Comparing yk and x^kjk1 may or may not lead to a a ! as k !1, ð5Þ trade if thresholds must be met. How are thresholds set? k b See Vidyamurthy (2004) for some possibilities. When is and the pairs trade unwound? There are various possibilities: the next trading time (see Reverre (2001) example) or 2 2 when the spread corrects sufficiently. The price applica- k ! as k !1, ð6Þ 1 ð1 bÞ2 tion and data bases used are often proprietary in industry applications. The machinery we present then provides provided we have chosen > 0 and small so that some useful tools appropriate for pairs trading. j1 bj < 1. Another explicit strategy could make use of First- We can also write (1) as Passage Times results (see Finch (2004) and references xkþ1 ¼ A þ Bxk þ C"kþ1, ð7Þ cited therein) for the (standardized) Ornstein–Uhlenbeck pffiffiffi with A ¼ a 0, 0 < B ¼ 1 b<1 and C ¼ .We process could also regard x ffi XðkÞ where fXðtÞjt 0g satisfies pffiffiffi k dZðtÞ¼ZðtÞ dt þ 2 dWðtÞ: ð11Þ the stochastic differential equation dXðtÞ¼ða bXðtÞÞ dt þ dWðtÞ, ð8Þ Let where fWðtÞjt 0g is a standard Brownian motion (on T0,c ¼ infft 0, ZðtÞ¼0 j Zð0Þ¼cg, ð12Þ some probability space). which has a probability density function f0,c. It is known explicitly that rffiffiffi ! 2.2. The observation process 2 jcj e t c2 e 2t f0,cðtÞ¼ exp ð13Þ p ð1 e2tÞ3=2 2ð1 e2tÞ We assume that we have an observation process {yk}of {xk} in Gaussian noise: for t > 0. Now f0,c has a maximum value at t^ given by yk ¼ xk þ D !k, ð9Þ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 1 where f! g are iid Gaussian Nð0, 1Þ and independent of t^ ¼ ln 1 þ ðc2 3Þ2 þ 4c2 þ c2 3 : ð14Þ k 2 2 the f"kg in (1) and D > 0. We may assume that 0  C < D, which should be the case for small values of . We can also write (8) in the form We set Yk ¼ fy0, y1, ..., ykg which represents the dXðtÞ¼ðXðtÞÞ dt þ dWðtÞ, ð15Þ information from observing y0, y1, ..., yk. We will wish to compute the conditional expectation (filter): where ¼ b and ¼ a=b. When x^ ¼ E ½x jY Š, ð10Þ k k k Xð0Þ¼ þ c pffiffiffiffiffiffi , ð16Þ which are ‘best’ estimates of the hidden state process 2 through the observed process. In order to make the estimate (10), we will need to estimate ðA, B, C, DÞ or the most likely time T at which XðT Þ¼ is given by rather ðA, B, C2, D2Þ from the observed data. We shall 1 present various results for this below. T ¼ t^, ð17Þ where t^ is given by (14). Use the Ornstein–Uhlenbeck 2.3. The application process as an approximationpffiffiffi to (7) with a ¼ A=, b ¼ð1 BÞ= and ¼ C= with the calibrated values We shall regard {yk} as a model for the observed spread A, B, C. Choose ap valueffiffiffiffiffiffi of c > 0. Enter a pair trade of two securities at time tk. We assume the observed when yk þ cð= 2 Þ and unwind the trade at time spread is a noisy observation of some mean-reverting T later. A correspondingpffiffiffiffiffiffi pair trade would be performed state process {xk}. The {yk} could also model the returns when yk  cð= 2 Þ and unwound at time T later. of the spread portfolio as is often done in practice. Other methods based on up- and down-crossing results If yk > x^kjk1 ¼ E ½xkjYk1Š the spread is regarded as for AR(1) processes could also be considered. Corre- too large, and so the trader could take a long position sponding results like those for the Ornstein–Uhlenbeck in the spread portfolio and profit when a correction processes are not known. Pairs trading 273

3. Filtering and estimation results Step 1 (the E-step): Compute (with #~ ¼ #^ )  j

dP# We assume some underlying probability space ð, F, PÞ Qð#, #~Þ¼E ~ log Y : ð29Þ # dP N whose details need not concern the trader, except that P #~ represents the real world probability. Step 2 (the M-step): Find ^ #jþ1 2 arg max Qð#, #jÞ: ð30Þ #2 3.1. Kalman filtering In the literature there are basically two procedures to implement the EM-Algorithm. We have a state equation

xkþ1 ¼ A þ Bxk þ C"kþ1, ð18Þ 3.2.1. Shumway and Stoffer (1982) smoother and the observation equation approach. This method is described by Shumway and yk ¼ xk þ D !k, ð19Þ Stoffer (1982, 2000) and is an off-line calculation and for k ¼ 0, 1, 2, .... makes use of smoother estimators for the Kalman Filter. Given ðA, B, C, DÞ we can compute We define smoothers (for k  N ): x^ ¼ E ½x jY Š, ð31Þ k ¼ x^k  x^kjk ¼ E ½xkjYkŠð20Þ kjN ÂÃk N Âà ¼ ð ^ Þ2 Y ¼ ð ^ Þ2 ð Þ using the Kalman Filter (see Elliott et al. (1995) for a kjN E ÂÃxk xkjN N E xk xkjN , 32 reference probability style proof ). Let k1, kjN ¼ E ðxk x^kjN Þðxk1 x^k1jN Þ : ð33Þ Âà 2 Rk ¼ kjk  E ðxk x^kÞ Yk : ð21Þ These smoothers can be computed by

B j Then ðx^k, RkÞ are determine recursively as follows: k k J k ¼ , ð34Þ kþ1jk x^kþ1jk ¼ A þ Bk ¼ A þ Bx^kjk, ð22Þ Âà 2 2 x^kjN ¼ x^kjk þJk x^kþ1jN ðA þ Bx^kjkÞ , ð35Þ kþ1jk ¼ B kjk þ C , ð23Þ Âà ¼ þJ2 , ð36Þ 2 kjN kjk k kþ1jN ÂÃkþ1jk Kkþ1 ¼ kþ1jk=ðkþ1jk þ D Þ, ð24Þ k1, kjN ¼Jk1kjk þJkJ k1 k, kþ1jN Bkjk , ð37Þ x^ ¼ x^ ¼ x^ þK ½ y x^ Š, ð25Þ kþ1 kþ1jkþ1 kþ1jk kþ1 kþ1 kþ1jk ¼ Bð1 K Þ , ð38Þ 2 N1, NjN N N1jN1 Rkþ1 ¼ kþ1jkþ1 ¼ D Kkþ1 ¼ kþ1jk Kkþ1kþ1jk: ð26Þ where initial values for this backward recursion x^NjN and For initialization we could take x^ ¼ y and R ¼ D2. 0 0 0 NjN are obtained from the Kalman Filter along with 2 2 other estimates. Given #j ¼ðA, B, C , D Þ and initial Remark: As k !1, Rk converges (monotonically) to j1 2 2 2 2 2 2 values for the Kalman Filter x^0 ¼ x^0jN and the positive root R of B R þðC þ D B D ÞR j1 C2D2 ¼ 0 provided B2 6¼ 0, C2D2 6¼ 0. We cannot say 0j0 ¼ 0jN which are the smoothers from the ^ ^ ^ 2 ^ 2 previous step ( j 1). The updates #jþ1 ¼ðA, B, C , D Þ very much about limiting values of x^k except it is are computed as follows: exponentially forgetting of x^0. However, these comments are not very important as we will only assume the A^ ¼ , ð39Þ model (18) holds over a short time horizon for a given N 2 set of values on ðA, B, C, DÞ. N B^ ¼ , ð40Þ N 2  XN ^ 2 1 ^ ^ 2 3.2. Estimation of model C ¼ ðxk A BBxk1Þ YN , ð41Þ N k¼1 We now provide estimates for # ðA, B, C2, D2Þ based on XN Âà ^ 2 1 2 D ¼ ðyk xkÞ YN , ð42Þ observations y0, y1, ..., yN . We use the EM-Algorithm to N þ 1 find #^ by an iteration that provides a stationary value of k¼0 the likelihood function based on the observations. In fact, where let (see Elliott et al. (1995)) XN Âà XN hi  2 2 ¼ E x Y ¼ j þ x^ j , dP# k 1 N k 1 N k 1 N L ð#Þ¼E Y ð27Þ k¼1 k¼1 N 0 dP N 0 XN XN Âà be the likelihood function for # 2 . The maximum ¼ E ½xk1xkjYN Š¼ k1, kjN þ x^k1jN x^kjN , likelihood estimate solves k¼1 k¼1 XN ^ # ¼ arg max LN ð#Þ: ð28Þ ¼ x^ , #2 kjN k¼1 The EM-Algorithm is an iterative method to compute #^. XN ^ If #0 is an initial estimate, the EM-Algorithm provides ¼ x^k1jN ¼ x^NjN þ x^0jN , ^ #j, j ¼ 1, 2, ..., as a sequence of estimates. k¼1 274 R. J. Elliott et al.

2 2 and the right-hand sides of (41) and (42) are readily If E ¼ E ^ , which means using #^ ¼ðA, B, C , D Þ in #j j ^ ^ ^ ^ 2 ^ 2 computed in terms of smoothers: the dynamics (18), (19), then #jþ1 ¼ðA, B, C , D Þ is  XN given through ^ 2 1 2 ^2 ^2 ^2 2 C ¼ kjN þ x^kjN þ A þ B k1jN þ B ðx^k1jN Þ "#1"# c1 2 d1 c1 N ¼ ðI Þ H I k 1 ^ N c0 N N A ¼ 1 I N , ð43Þ  d2 d2 H N H N 2A^x^ þ 2A^B^x^ 2B^ 2B^x^ x^ , hi kjN k1jN k1, kjN kjN k1jN 1 d c ^ ¼ 1 ^ 1 ð Þ B d H N AI N , 44 hiH 2 1 XN hiN D^ 2 ¼ y2 2y x^ þ þ x^2 : 1 d d c c d N þ 1 k k kjN kjN kjN C^ 2 ¼ H 0 þ TA^2 þ H 2 B^2 2A^I 0 þ 2A^B^I 1 2B^H 1 , k¼0 T N N N N N The disadvantage of this algorithm is that, as new values ð45Þ hi of observations are given, the whole algorithm must be 1 d D^ 2 ¼ Y 2Jc þ H 0 : ð46Þ repeated off-line. However, if we have written a code for T þ 1 N N N this estimation based on N þ 1 observations y , y , ..., y , 0 1 N We now provide recurrences for computing the quantities then with y we simply provide the code with input Nþ1 in (43)–(46). Given # we use the Kalman Filter calcula- y , y , ..., y . The Shumway and Stoffer algorithm j 1 2 Nþ1 tions (22)–(26) to determine the values of and R , from has been widely tested. k k which we have (M ¼ 0, 1, 2)

dM M M M 2 H k ¼ ak þ bk k þ dk ½Rk þ kŠ, ð47Þ 3.2.2. Elliott and Krishnamurthy (1999) filter b approach. This approach to the implementation of the Jk ¼ ak þ bkk, ð48Þ EM-Algorithm uses filtered quantities and can be b I 0 ¼ s0 þ t0 , ð49Þ performed on-line. This was based on a new class of k k k k finite-dimensional recursive filters for linear dynamic b I 1 ¼ s1 þ t1 , ð50Þ systems, which can be adapted to equations (18) and (19). k k k k 2 The important advantages of this filter-based Yk ¼ Yk1 þ yk, ð51Þ EM-Algorithm compared with the (standard) smoother based EM-Algorithm include (i) substantially reduced where the various coefficients are determined as follows. memory requirements, and (ii) ease of parallel implemen- Set tation on a multiprocessor system (see Elliott and 1 B2 ¼ þ , ð52Þ Krishnamurthy (1997, 1999)). The details of this approach k R C 2 are discussed in Elliott et al. (in press), where computa- k 1 B2 tional issues and convergence are reported. ¼ , ð53Þ ^ 2 2 k C 2 As in section 3.2.1, we start with #j ¼ðA, B, C , D Þ and k  initial values for the Kalman Filter and the next estimate 1 AB S ¼ k , ð54Þ ^ ^ ^ ^ 2 ^ 2 k 2 #jþ1 ¼ðA, B, C , D Þ. k Rk C We introduce various quantities: then Xk Âà 0 2 d0 0 a0 ¼ 0, b0 ¼ 0, d 0 ¼ 1, H k ¼ xl , H k ¼ E H k Yk , 0 0 0 l¼0 0 0 0 0 2 1 akþ1 ¼ ak þ bkSk þ d k½Sk þ k Š, Xk Âà 1 d1 1 0 0 0 H k ¼ xlxl1, H k ¼ E H k Yk , bkþ1 ¼ bkk þ Sk þ 2d kkSk, l¼1 d 0 ¼ 1 þ d 02, ð55Þ Xk Âà kþ1 k k 2 2 d2 2 H k ¼ xl1, H ¼ E H k Yk , k 1 1 1 l¼0 a0 ¼ 0, b0 ¼ 0, d 0 ¼ 0, Xk Âà 1 1 1 1 2 1 b akþ1 ¼ ak þ bkSk þ dk ½S k þ k Š, Jk ¼ xl yl, Jk ¼ E Jk Yk , l¼0 1 1 1 bkþ1 ¼ bkk þ Sk þ 2d kkSk, Xk b Âà 1 1 2 0 0 0 d kþ1 ¼ k þ dk k, ð56Þ I k ¼ xl, I k ¼ E I k Yk , l¼0 2 2 2 Xk Âà a0 ¼ 0, b0 ¼ 0, d0 ¼ 0, 1 b1 1 I k ¼ xl1, I k ¼ E I k Yk , 2 2 2 2 2 1 akþ ¼ ak þ bkSk þðd k þ 1Þ½S k þ k Š, l¼0 1 2 2 2 Xk bkþ1 ¼ bkk þ 2ðd k þ 1ÞkSk, Y ¼ y2: k l 2 2 2 l¼0 d kþ1 ¼ð1 þ d kÞk, ð57Þ Pairs trading 275

0.5 where programmers are warned to distinguish here Estimate of A between superscript 2 and squared terms 0.4 0.3 a0 ¼ 0, b0 ¼ y0, 0.2 a ¼ a þ b S , kþ1 k k k 0.1 bkþ1 ¼ ykþ1 þ bkk, ð58Þ 0 0 50 100 150 0 0 EM Algorithm Pass Index s0 ¼ 0, t 0 ¼ 1, 0 0 0 0.95 skþ ¼ sk þ tkSk, 1 Estimate of B 0 0 0.9 tkþ1 ¼ 1 þ tkk, ð59Þ 0.85 1 1 s0 ¼ 0, t0 ¼ 0, 0.8 1 1 1 0.75 skþ1 ¼ sk þ tkSk, 1 1 0.7 tkþ1 ¼ 1 þ tkk: ð60Þ 0 50 100 150 ^ ^ EM Algorithm Pass Index Remarks: (a) Given #j the calculation of #jþ1 is com- puted by the steps: initialize the Kalman Filter with Figure 1. Convergence of the maximum likelihood estimates A 2 and B. 0 ¼ y0 and R0 ¼ D . If the k and Rk have been calculated, the various coefficients may now be calculated using (52)–(54) and then (55)–(60). Find kþ1 and Rkþ1 from the Kalman Filter equations. Continue until k ¼ N. 0.8 Then compute the quantities in (47)–(51) (at k ¼ N) and ^ ^ 0.6 then #jþ1 from (43)–(46). Some initial guess for #0 must be made, and then iterations are concluded when the 0.5 ^ ^ 0.4 values for #j have converged sufficiently. Call this #(N). 2 2 (b) If #^ðNÞ¼ðA^, B^, C^ , D^ Þ, we should check that A^ > 0 Estimate of C 0.2 and 0 < B^ < 1, else the pairs trading algorithm should 0 50 100 150 not be used with this data. (c) The procedure described EM Algorithm Pass Index in (a) could be regarded as an initialization, and need not 1.3 be repeated in subsequent steps, where only one iteration Estimate of D should suffice to update the coefficients in the model. 1.2 1.1 1 3.3. Implementation of the EM-Algorithm 0.9 0.8 We will assume that model (18), (19) holds over N 0 50 100 150 ^ periods. The values of #(N)andN are computed based EM Algorithm Pass Index on the observations y0, y1, ..., yN and a trade may be initiated as described in section 2, and possibly unwound Figure 2. Convergence of the maximum likelihood estimates C at t ¼ N þ 1 (or according to some other criterion). Based and D. ^ on #(N), Nþ1 is computed based on the data y1, y2, ..., yNþ1 (the most recent N þ 1 values with the To illustrate the typical performance of the Shumway c2 Kalman Filter initialized at 1 ¼ y1 and R1 ¼ D ðNÞ) and Stoffer EM algorithm, adapted to estimation of the and a trade initiated. #^ðN þ 1Þ is calculated with one set fA, B, C, Dg, we consider a simulation with parameter iteration using section 3.2.1 or 3.2.2 and using the values A ¼ 0:20, B ¼ 0:85, C ¼ 0:60 and D ¼ 0:80. Our Kalman Filter based on data y1, y2, ..., yNþ1. The observation set contained 100 points. To initialize the EM procedure is then repeated. algorithm, the following values were used: A ¼ 1:20, B ¼ 0:50, C ¼ 0:30 and D ¼ 0:70, with x^0j0 ¼ 0 and 0j0 ¼ 0:1. The EM algorithm was iterated 150 times. Figures 1 and 2 show convergence of the maximum 4. Numerical examples likelihood estimates of all parameters. Here we will provide some simulation and calibration results which demonstrate that the Shumway and Stoffer algorithm provides a consistent and robust References estimating algorithm for the model. Studies based on Elliott, R.J., Aggoun, L. and Moore, J.B., Hidden Markov Elliott and Krishnamurthy are given by Elliott et al. (in Models, 1995 (Springer: Berlin). press). Some initial experiments have also been performed Elliott, R.J. and Krishnamurthy, V., Exact finite-dimensional with real data with a hedge fund. filters for maximum likelihood parameter estimation of 276 R. J. Elliott et al.

continuous-time linear Gaussian systems. SIAM Journal of Litterman, B., Modern Investment Management—An Control and Optimization, 1997, 35, 1908–1923. Equilibrium Approach, chapter 1, 2003 (Wiley: New York). Elliott, R.J. and Krishnamurthy, V., New finite-dimensional Nicholas, J.G., Market Neutral Investing— Long/Short Hedge filters for parameter estimation of discrete-time linear Fund Strategies, 2000 (Bloomberg Professional Library, Gaussian models. IEEE Transactions of Automatic Control, Bloomberg Press: Princeton, NJ, USA). 1999, 44, 938–951. Reverre, S., The Complete Arbitrage Desk-book, chapter 10, Elliott, R.J., Malcolm, W.P. and van der Hoek, J., 2001 (McGraw Hill: New York). The numerical analysis of a filter based EM algorithm Shumway, R.H. and Stoffer, D.S., An approach to time series (in press). smoothing and forecasting using the EM algorithm. Journal Finch, S., Ornstein–Uhlenbeck process. Unpublished Note. of Time Series, 1982, 3, 253–264. Available online at: http://pauillac.inria.fr/algo/bsolve/ Shumway, R.H. and Stoffer, D.S., Time Series Analysis and Its constant/constant.html. Applications, 2000 (Springer: New York). Gatev, E.G., Goetzmann, W.N. and Rouwenhorst, K.G., Pairs Vidyamurthy, G., Pairs Trading—Quantitative Methods and trading: performance of a relative average arbitrage rule. Analysis, 2004 (Wiley: New York). NBER Working Paper 7032, National Bureau of Economic Whistler, M., Trading Pairs—Capturing Profits and Hedging Research Inc., 1999. Available online at: http://www.nber.org/ Risk with Strategies, 2004 (Wiley: papers/w7032. New York).