Enhancing a Pairs Trading strategy with the application of Machine Learning

Simão Nava de Moraes Sarmento

Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering

Supervisor(s): Prof. Nuno Cavaco Gomes Horta

Examination Committee
Chairperson: Prof. Teresa Maria Sá Ferreira Vazão Vasques
Supervisor: Prof. Nuno Cavaco Gomes Horta
Member of the Committee: Prof. Helena Isabel Aidos Lopes Tomás

September 2019

Declaration

I declare that this document is an original work of my own authorship and that it fulfills all the requirements of the Code of Conduct and Good Practices of the Universidade de Lisboa.

Acknowledgments

I would first like to thank my thesis supervisor, Prof. Nuno Horta, for his guidance during this work. His feedback was fundamental in delineating the aspects this thesis should focus on.

I would also like to thank my friends and my family, for their constant support during the course of this work, even when that implied spending less time with them.

Resumo

Pairs Trading é uma das estratégias mais importantes utilizadas por fundos de investimento. É particularmente interessante por ultrapassar o processo árduo de avaliação de títulos financeiros focando-se no preço relativo. Através da compra de um título financeiro relativamente subvalorizado e da venda de um sobrevalorizado, o investidor pode lucrar após a convergência dos preços do par. No entanto, com a afluência dos dados disponíveis, tornou-se mais difícil encontrar pares recompensadores. Neste trabalho investigamos dois problemas: (i) como encontrar pares rentáveis num espaço de procura limitado e (ii) como evitar longos períodos de queda devido a pares com divergência prolongada. Para gerir estas dificuldades, exploramos a aplicação de técnicas promissoras de Aprendizagem Automática. Propomos a integração de um algoritmo de aprendizagem não supervisionada, OPTICS, para lidar com o problema (i). Os resultados demonstram que a técnica sugerida é capaz de superar as técnicas comuns, sendo que o portfólio alcança um Sharpe ratio médio de 3.80, em comparação com 3.58 e 2.59 obtido com técnicas padrão. Para o problema (ii), introduzimos um modelo de investimento baseado em previsão, capaz de reduzir os períodos de queda em 75%, ainda que isso implique uma diminuição da rentabilidade. A estratégia proposta é testada utilizando um modelo ARMA, uma LSTM e um Codificador-Descodificador baseado em LSTM. Os resultados deste trabalho são simulados durante diversos períodos, entre Janeiro de 2009 e Dezembro de 2018, utilizando séries de preços de 5 minutos de um grupo de 208 ETFs relacionadas com commodities, e considerando custos transacionais.

Palavras-chave: Pairs Trading, Mercado Neutro, Aprendizagem Automática, Aprendizagem Profunda, Aprendizagem Não Supervisionada

Abstract

Pairs Trading is one of the most valuable market-neutral strategies used by hedge funds. It is particularly interesting as it overcomes the arduous process of valuing securities by focusing on relative pricing. By buying a relatively undervalued security and selling a relatively overvalued one, a profit can be made upon the pair's price convergence. However, with the growing availability of data, it has become increasingly harder to find rewarding pairs. In this work, we address two problems: (i) how to find profitable pairs while constraining the search space and (ii) how to avoid long decline periods due to prolonged divergent pairs. To manage these difficulties, the application of promising Machine Learning techniques is investigated in detail. We propose the integration of an Unsupervised Learning algorithm, OPTICS, to handle problem (i). The results obtained demonstrate that the suggested technique can outperform the common pairs search methods, achieving an average portfolio Sharpe ratio of 3.79, in comparison with 3.58 and 2.59 obtained by standard approaches. For problem (ii), we introduce a forecasting-based trading model capable of reducing the periods of portfolio decline by 75%, although this comes at the expense of decreased overall profitability. The proposed strategy is tested using an ARMA model, an LSTM and an LSTM Encoder-Decoder. This work's results are simulated during varying periods between January 2009 and December 2018, using 5-minute price data from a group of 208 commodity-linked ETFs, and accounting for transaction costs.

Keywords: Pairs Trading, Market Neutral, Machine Learning, Deep Learning, Unsupervised Learning

Contents

Declaration
Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
List of Acronyms

1 Introduction
1.1 Topic Overview
1.2 Objectives
1.3 Thesis Outline

2 Pairs Trading - Background and Related Work
2.1 Mean-Reversion and Stationarity
2.1.1 Augmented Dickey-Fuller test
2.1.2 Hurst exponent
2.1.3 Half-life of mean-reversion
2.1.4 Cointegration
2.2 Pairs selection
2.2.1 The minimum distance approach
2.2.2 The correlation approach
2.2.3 The cointegration approach
2.2.4 Other approaches
2.3 Trading execution
2.3.1 Threshold-based trading model
2.3.2 Other trading models in the literature
2.4 The application of Machine Learning in Pairs Trading
2.5 Conclusion

3 Proposed Pairs Selection Framework
3.1 Problem statement
3.2 Proposed framework
3.3 Dimensionality reduction
3.4 Unsupervised Learning
3.4.1 Problem requisites
3.4.2 Clustering methodologies
3.4.3 DBSCAN
3.4.4 OPTICS
3.5 Pairs selection criteria
3.6 Framework diagram
3.7 Conclusion

4 Proposed Trading Model
4.1 Problem statement
4.2 Proposed model
4.3 Model diagram
4.4 Time series forecasting
4.5 Autoregressive Moving Average
4.6 Artificial Neural Network models
4.6.1 Long Short-Term Memory
4.6.2 LSTM Encoder-Decoder
4.7 Artificial Neural Networks design
4.7.1 Hyperparameter optimization
4.7.2 Weight initialization
4.7.3 Regularization techniques
4.8 Conclusion

5 Implementation
5.1 Research design
5.2 Dataset
5.2.1 Exchange-Traded Funds
5.2.2 Data description
5.2.3 Data preparation
5.2.4 Data partition
5.3 Research Stage 1
5.3.1 Development of the pairs selection techniques
5.3.2 Trading setup
5.3.3 Test portfolios
5.4 Research Stage 2
5.4.1 Building the forecasting algorithms
5.4.2 Test conditions
5.5 Trading simulation
5.5.1 Portfolio construction
5.5.2 Transaction costs
5.5.3 Entry and exit points
5.6 Evaluation metrics
5.6.1 Return on Investment
5.6.2 Sharpe Ratio
5.6.3 Maximum Drawdown
5.7 Implementation environment
5.8 Conclusion

6 Results
6.1 Data cleaning
6.2 Pairs selection performance
6.2.1 Eligible pairs
6.2.2 Pairs selection rules
6.2.3 OPTICS application results
6.2.4 Trading performance
6.3 Forecasting-based trading model performance
6.3.1 Eligible pairs
6.3.2 Forecasting algorithms training
6.3.3 Forecasting performance
6.3.4 Trading performance
6.3.5 Implementation remarks
6.3.6 Alternative methods

7 Conclusions and Future Work
7.1 Conclusions
7.2 Future work

Bibliography

Appendix A List of ETFs
Appendix B Sharpe Ratio Scale Factors
Appendix C t-SNE Visualization
Appendix D Forecasting Performance Measures
D.1 Mean Squared Error
D.2 Mean Absolute Error
D.3 Root Mean Squared Error
Appendix E Forecasting Results
Appendix F Code
F.1 Code organization
F.2 Running the code
F.3 Deep Learning models graphs

List of Tables

2.1 Literature exploring the application of Machine Learning in Pairs Trading
3.1 Key points from the proposed pairs selection framework
4.1 Key points from the proposed trading model
5.1 Commodity categories considered in the dataset
5.2 Dataset partitions
5.3 Threshold-based model parameters
5.4 Transaction costs considered
5.5 Risk-free rates considered per test period
6.1 Data preprocessing results
6.2 Pairs selected for different clustering approaches
6.3 Pairs selection progress on an unrestricted pairs search environment
6.4 Clustering statistics when varying the number of principal components
6.5 Clustering statistics using a large number of principal components
6.6 Validation results for each pairs grouping technique
6.7 Test results for each pairs grouping technique
6.8 Pairs selected during research stage 2
6.9 Candidate pairs identified from Jan 2009 to Dec 2017
6.10 GCC Sector Breakdown
6.11 Forecasting results comparison
6.12 Trading results comparison using a 9-year-long formation period
6.13 Trading results for the standard trading model using a 3-year-long formation period
6.14 Forecasting algorithms training duration
A.1 List of ETFs (1)
A.2 List of ETFs (2)
B.1 Scale factors for time-aggregated Sharpe ratios when returns follow an AR(1) process
E.1 ARMA forecasting results
E.2 LSTM forecasting results
E.3 LSTM Encoder-Decoder forecasting results (1)
E.4 LSTM Encoder-Decoder forecasting results (2)

List of Figures

1.1 Price series which could potentially form profitable pairs
1.2 Price series of two constituents of a pair during 2009-2018
1.3 Exemplifying a Pairs Trading strategy execution
1.4 Research questions to be answered
2.1 Pairs search common techniques
2.2 Cointegration approach for selecting pairs
2.3 Applying the threshold-based model
3.1 Example of different cluster configurations
3.2 DBSCAN illustration of basic concepts
3.3 Clusters with varying density
3.4 Illustration of core-distance and reachability-distance in OPTICS
3.5 Extracting clusters using OPTICS
3.6 Pairs selection rules
3.7 Pairs selection diagram
4.1 Illustration of an untimely entry position
4.2 Local momentum forecasting-based strategy
4.3 Spread percentage change distributions
4.4 Proposed forecasting-based strategy
4.5 Proposed model application diagram
4.6 structure
4.7 Long Short-Term Memory cell
4.8 LSTM Encoder-Decoder
4.9 Early Stopping illustration
4.10 Application of dropout
5.1 Research design overview
5.2 Exemplification of Univariate, Quasi-Multivariate and Multivariate strategies
5.3 Data preprocessing steps
5.4 Period decomposition
5.5 Data partition periods
5.6 Test portfolios
5.7 Trading models interaction diagram
5.8 Market position definition
5.9 Calculation of transaction costs
6.1 Application of t-SNE to the clusters generated by OPTICS during Jan 2014 to Dec 2016
6.2 Price series composition of some clusters formed in the period from Jan 2014 to Dec 2016
6.3 Applying DBSCAN with varying ε to the formation period from Jan 2014 to Dec 2016
6.4 Spread from the five pairs identified between Jan 2009 and Dec 2018
6.5 Encoder-Decoder training loss history for DGP-UGL spread
6.6 Return series forecast
C.1 OPTICS clustering in the Jan 2012 - Dec 2014 formation period
C.2 OPTICS clustering in the Jan 2013 - Dec 2015 formation period
F.1 Graphs of the neural network models with the best forecasting score, produced with Keras

List of Acronyms

ADF Augmented Dickey-Fuller

ANN Artificial Neural Network

ARMA Autoregressive–moving-average

DBSCAN Density-based spatial clustering of applications with noise

ETF Exchange-Traded Fund

FNN Feedforward Neural Network

GARCH Generalized Autoregressive Conditional Heteroskedasticity

LSTM Long Short-Term Memory

MAE Mean Absolute Error

MDD Maximum Drawdown

MLP Multilayer Perceptron

MSE Mean Squared Error

OPTICS Ordering points to identify the clustering structure

PCA Principal Component Analysis

RNN Recurrent Neural Network

ROI Return on Investment

SR Sharpe Ratio

SSD Sum of Euclidean Squared Distances

t-SNE T-distributed Stochastic Neighbor Embedding

Chapter 1

Introduction

1.1 Topic Overview

Pairs Trading is a well-known investment strategy developed in the 1980s. It has been employed as an important long/short1 equity investment tool by hedge funds and institutional investors [1], and is the fundamental topic of this work. This strategy comprises two steps. First, it requires the identification of two securities, for example two stocks, whose price series display a similar behaviour, or simply seem to be linked to each other. Ultimately, this indicates that both securities are exposed to related risk factors and tend to react in a similar way. Figure 1.1 illustrates how this behaviour can be found in some popular stocks. In Figure 1.1(a), we may observe how the price series from two car manufacturers seem to be tied to each other. The same behaviour is also illustrated in Figure 1.1(b), this time with the price series of two of the biggest retailers in the United States. Two securities that verify an equilibrium relation between their price series can compose a pair2.

(a) Price series from two automobile manufacturers, Groupe Renault (RNL) and Peugeot S.A. (UG). (b) Price series from two retail corporations, Walmart Inc. (WML) and Target Corporation (TGT).

Figure 1.1: Price series which could potentially form profitable pairs.

Once the pairs have been identified, the investor may proceed with the strategy's second step. The underlying premise is that if two securities' price series have been moving closely together in the past, then this should persist in the future. Therefore, if an irregularity occurs, it should provide an interesting trade opportunity to profit from its correction. To find such opportunities, the spread3 between the two constituents of the pair must be continuously monitored. When a statistical anomaly is detected, a market position is entered. The position is exited upon an eventual spread correction. It is interesting to observe that this strategy relies on the relative value of two securities, regardless of their absolute value.

1 In the financial markets nomenclature, a long position entails buying a security, whereas a short position entails selling a security.
2 Each constituent of a pair is sometimes referred to as a pair's leg.

We proceed to introduce how the strategy may be applied using an example from this work. A more formal description concerning the trading setup is presented in section 2.3. For now, we assume that two securities (identified by the tickers PXI and PXE) have been previously identified as forming a potential pair. PXE and PXI are two different securities that track indices related to energy exploration in the United States, and thus it is not surprising that their prices tend to move together. To confirm that this is the case, the two price series, from 2009 to 2018, can be observed in Figure 1.2. The investor may calculate the mean value of the spread formed by the two constituents of the pair, as well as its standard deviation. These values describe the statistical behaviour known for that pair and which the investor expects to remain approximately constant in the future.

Figure 1.2: Price series of two constituents of a pair during 2009-2018.

In the subsequent period, the spread, defined as St = PXIt − PXEt, is normalized4 and cautiously monitored, as illustrated in Figure 1.3. Although it evolves around its mean, it displays some noticeable deviations. Depending on their magnitude, they may trigger a trade. For that purpose, the investor defines the long and short thresholds, which set the minimum required deviation to open a long or short position, respectively. A long position presumes the spread will widen, since its current value is below expected. Therefore, it entails buying PXI and selling PXE. Conversely, a short position presumes the spread will narrow, and thus the opposite transaction takes place. The positions are liquidated when the spread reverts to its expected value.
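To make the normalization step concrete, the sketch below z-scores the trading-period spread using the mean and standard deviation estimated over the formation period. It is a minimal illustration under stated assumptions, not the thesis implementation: the two date-indexed pandas price series and the formation_end date are placeholders.

```python
import pandas as pd

def normalized_spread(pxi: pd.Series, pxe: pd.Series,
                      formation_end: str) -> pd.Series:
    """Normalize the spread S_t = PXI_t - PXE_t of the trading period
    using the mean and standard deviation estimated in the formation period."""
    spread = pxi - pxe
    formation = spread.loc[:formation_end]     # history used to estimate mu and sigma
    mu, sigma = formation.mean(), formation.std()
    trading = spread.loc[formation_end:]       # subsequent period being monitored
    return (trading - mu) / sigma              # deviations measured in standard deviations
```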

Revisiting Figure 1.3, we can identify the market position by the green line, which takes the values -1, 0 or 1 depending on whether the current position is short, outside the market, or long, respectively. In the second half of February 2018, we witness the opening of a short position, when the spread deviates across the short threshold, and its subsequent closure when the spread reverts back to zero.

3 For now, the spread is defined as the difference between two securities' price series.
4 Normalization, in this case, corresponds simply to subtracting the mean and dividing by the standard deviation.

Figure 1.3: Exemplifying a Pairs Trading strategy execution.

The strategy just described presents diverse advantages. One incentive is overcoming the arduous process of characterizing the securities from a valuation point of view, which is a fundamental step in deciding to sell the overvalued securities and to buy the undervalued ones. By focusing on the idea of relative pricing, this issue is mitigated. Furthermore, this strategy is extremely robust to different market conditions. Regardless of the market direction (going up, down or sideways), if the asset the investor bought is performing relatively better than the one he sold, a profit can be made.

1.2 Objectives

Research in the field focuses mainly on traditional methods and statistical tools to improve the critical aspects of this strategy. Although Machine Learning techniques have gained momentum in recent years, reported Machine Learning based research specifically on Pairs Trading is sparse and lacks empirical analysis [2]. This leaves an opportunity to explore the integration of Machine Learning at different levels of a Pairs Trading framework.

The success of a Pairs Trading strategy highly depends on finding the right pairs. But with the increasing availability of data, more traders manage to spot interesting pairs and quickly profit from the correction of price discrepancies, leaving no margin for latecomers. If profitable pairs that are not being monitored by so many traders could be found, more lucrative trades would come along. The problem is that it can be extremely hard to find such opportunities. On the one hand, if the investor limits the search to securities within the same sector, as is commonly done, he is less likely to find pairs not yet being traded in large volumes. If, on the other hand, the investor does not impose any limitation on the search space, he might have to explore an excessive number of combinations and is more likely to find spurious relations.

To solve this issue, this work proposes the application of Unsupervised Learning to define the search space. It intends to group relevant securities (not necessarily from the same sector) in clusters, and to detect rewarding pairs within them that would otherwise be harder to identify, even for an experienced investor. We thus aim to answer the following: “Can Unsupervised Learning find more promising pairs?”.

One of the disadvantages typically pointed out in a Pairs Trading strategy is its lack of robustness. Since the spread might continue to diverge after a market position, which counts on its reversion, has been opened, the investor may see the portfolio value decline while the spread fails to converge. In this work, a trading model which makes use of time series forecasting is proposed. It intends to define more precise entry points in order to reduce the number of days the portfolio value declines due to diverging positions. For this purpose, the potential of Deep Learning is evaluated. Therefore, the second objective of this work is to answer the following question: “Can a forecasting-based trading model achieve a more robust performance?”

Figure 1.4: Research questions to be answered.

In the process of answering the research questions presented in Figure 1.4, this work provides two additional contributions to the literature. First, it analyzes the suitability of Exchange-Traded Funds (ETFs) linked with commodities in a Pairs Trading setting. A total of 208 ETFs are considered. Secondly, it examines the application of a Pairs Trading strategy using 5-minute frequency price series. This is particularly relevant given that most studies in the Pairs Trading literature use daily prices, with few exceptions [3–7]. Simulations are run during varying periods between January 2009 and December 2018.

1.3 Thesis Outline

This thesis is composed of a total of seven chapters. Chapter 2 introduces the background and state-of-the-art concerning Pairs Trading, along with some critical mathematical concepts needed to grasp the workings of this strategy. Chapters 3 and 4 describe the two implementations proposed in this work. The former describes in detail the proposed pairs selection framework (Research Stage 1), and the latter proposes a trading model that makes use of time series forecasting (Research Stage 2). Chapter 5 comprises practical information concerning the way this investigation is designed. The results obtained are illustrated in Chapter 6. Finally, Chapter 7 focuses on answering the two research questions that motivated this research work, emphasizing the contributions made.

Chapter 2

Pairs Trading - Background and Related Work

This chapter presents an overview of the fundamental concepts of Pairs Trading, along with a review of the related work in the field.

2.1 Mean-Reversion and Stationarity

Before delving into more specific details of Pairs Trading, there is a set of tools that should be introduced for a full comprehension of the related formalities. We start by introducing the concept of stationarity. A stochastic process is said to be stationary if its mean and variance are constant over time [8]. That said, a stationary time series is mean-reverting in nature, and fluctuations around the mean should have similar amplitudes. A stationary process can also be referred to with respect to its order of integration I(d), in which case a stationary time series is said to be an I(0) process, whereas a non-stationary process is I(1). The order of integration d is a summary statistic which reports the minimum number of differences required to obtain a stationary series.

Stationary series are particularly interesting for financial traders who take advantage of the stationary property by placing orders when the price of a security deviates considerably from its historical mean, considering the price will revert back. However, they are rarely found in the set of financial time series. In fact, price series are often non-stationary, and thus, in theory, completely unpredictable. This pattern should remind the reader of the spread illustrated in Figure 1.3. This is not a coincidence. As we will see, Pairs Trading is particularly interesting because it provides a way of obtaining an artificial stationary time series from the combination of two non-stationary time series.

It should be clarified that mean-reversion and stationarity are not the same mathematical concept. According to [9], the mathematical description of a mean-reverting time series is that the change of the time series in the next instant is proportional to the difference between the mean and the current value. In contrast, the mathematical description of a stationary time series is that the variance of the logarithmic value of the time series increases slower than that of a geometric random walk1. That is, its variance is a sub-linear function of time, rather than a linear function, as in the case of a geometric random walk. Some necessary tools to detect and model the aforementioned properties are now presented in detail.

2.1.1 Augmented Dickey-Fuller test

The Augmented Dickey-Fuller (ADF) test, proposed by Dickey and Fuller [11], is a hypothesis test designed to assess the null hypothesis that a unit root is present in a time series. The test assumes stationarity as the alternative hypothesis. A consecutive change in the time series can be modelled as

∆y(t) = λy(t − 1) + µ + βt + α1∆y(t − 1) + ··· + αk∆y(t − k) + εt,    (2.1)

where ∆y(t) ≡ y(t) − y(t − 1), µ is a constant, β is the coefficient on a time trend, k is the lag order of the autoregressive process and εt is the error term. The ADF test evaluates whether λ = 0. If the hypothesis is rejected, and λ ≠ 0, the change of the time series at time t depends on the value of the series at time t − 1, meaning that the series cannot be a simple random walk. The associated test statistic is the regression coefficient λ (calculated with y(t − 1) as the independent variable and ∆y(t) as the dependent variable) divided by the standard error of the regression fit, λ/SE(λ). Note that when mean-reversion is expected, λ/SE(λ) has a negative value. This value should be compared with the critical values corresponding to the distribution of the test statistic, which can be found tabulated and used to decide whether the null hypothesis can be rejected at a given probability level.
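For illustration only (this is not the thesis code), the ADF test can be run with the statsmodels implementation, which returns the test statistic, the p-value and the tabulated critical values; the input series is a placeholder.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def is_mean_reverting(series: np.ndarray, significance: float = 0.05) -> bool:
    """Run the ADF test and reject the unit-root null when the p-value
    falls below the chosen significance level."""
    adf_stat, p_value, used_lag, n_obs, critical_values, _ = adfuller(series, autolag="AIC")
    print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.4f}")
    print(f"Critical values: {critical_values}")
    return p_value < significance
```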

2.1.2 Hurst exponent

The Hurst exponent can be used to evaluate the stationarity of a time series. Thus, according to the definition presented in section 2.1, this metric assesses whether the time series’ speed of diffusion from its initial value is slower than that of a geometric random walk [12]. Formally, the speed of diffusion can be characterized by the variance as

Var(τ) = ⟨|z(t + τ) − z(t)|²⟩,    (2.2)

where z(t) is the logarithmic value of the time series at time t, τ is an arbitrary time lag and ⟨·⟩ denotes the average across time. Having defined an indicator for the speed of diffusion, it is still necessary to define the standard speed of a geometric random walk against which to compare. We may assume that its speed of diffusion can be approximated as

⟨|z(t + τ) − z(t)|²⟩ ∼ τ^(2H).    (2.3)

1The geometric random walk is the default stochastic process model used to describe stock market data. Describing this model falls out of the scope of this work, but the interested readers may refer to [10].

The symbol H represents the Hurst exponent. For a price series exhibiting a geometric random walk, H = 0.5, and (2.3) simply becomes

⟨|z(t + τ) − z(t)|²⟩ ∼ τ.    (2.4)

Yet, as H decreases towards zero, the speed of diffusion reduces, meaning the price series is more mean-reverting. Conversely, as H increases towards 1, the price series is increasingly trending. The Hurst exponent can thus be interpreted as an indicator of the degree of mean-reversion or trendiness.
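One common way to estimate H, consistent with (2.2)-(2.3), is to regress the logarithm of the lagged variance on the logarithm of the lag, whose slope equals 2H. The sketch below assumes a NumPy array of log-prices and an arbitrary maximum lag; it is an illustration rather than the exact estimator used later in this work.

```python
import numpy as np

def hurst_exponent(log_prices: np.ndarray, max_lag: int = 100) -> float:
    """Estimate H from the scaling <|z(t+tau) - z(t)|^2> ~ tau^(2H):
    the slope of log-variance versus log-lag equals 2H."""
    lags = np.arange(2, max_lag)
    variances = [np.var(log_prices[lag:] - log_prices[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0   # H < 0.5 mean-reverting, H = 0.5 random walk, H > 0.5 trending
```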

2.1.3 Half-life of mean-reversion

The half-life of mean-reversion is a measure of how long it takes for a time series to mean-revert. To calculate this metric, the discrete-time series from (2.1) can be transformed into its differential form, so that the changes in prices now become infinitesimal quantities. Additionally, the term β may be ignored since we are dealing with price series and the constant drift in price, if any, tends to be of a much smaller magnitude than its daily fluctuations. If, for further simplification, the lagged differences are also ignored in (2.1), the expression can be written as

dy(t) = (λy(t − 1) + µ)dt + dε. (2.5)

This expression describes an Ornstein-Uhlenbeck process, where dε represents some Gaussian noise [9]. Assuming λ has been calculated as described in section 2.1.1, the analytical solution for the expected value of y(t) is represented by

E(y(t)) = y0 exp(λt) − (µ/λ)(1 − exp(λt)).    (2.6)

From the analysis of (2.6), we may confirm that for λ > 0 the time series will not be mean-reverting, as stated in section 2.1.1. For a negative value of λ, the expected value of the time series decays exponentially to the value −µ/λ, with a half-life of decay equal to −log(2)/λ. This result implies that the expected duration of mean-reversion is inversely proportional to the absolute value of λ, meaning that a larger absolute value of λ is associated with faster mean-reversion. Therefore, −log(2)/λ can be used as a natural time scale to define many parameters of a trading strategy, avoiding the brute-force optimization of a free parameter.
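As a minimal sketch of this calculation (assuming a NumPy array holding the series and using statsmodels for the regression of section 2.1.1), λ can be estimated by regressing ∆y(t) on y(t − 1) and the half-life obtained as −log(2)/λ:

```python
import numpy as np
import statsmodels.api as sm

def half_life(series: np.ndarray) -> float:
    """Estimate lambda by regressing y(t) - y(t-1) on y(t-1) and
    return the half-life of mean-reversion, -log(2) / lambda."""
    y_lag = series[:-1]
    delta_y = np.diff(series)
    X = sm.add_constant(y_lag)            # include the intercept mu
    lam = sm.OLS(delta_y, X).fit().params[1]
    return -np.log(2) / lam               # only meaningful when lam < 0
```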

2.1.4 Cointegration

Cointegration was first proposed by the econometricians Engle and Granger [13]. A set of variables, each integrated of order d, is said to be cointegrated if there exists a linear combination of those variables with a lower order of integration, I(d − 1). In the context of this work, cointegration is verified if a set of I(1) variables (non-stationary) can be used to model an I(0) variable (stationary). Formally,

7 considering two time series, yt and xt which are both I(1), cointegration implies there exist coefficients, µ and β such that

yt − βxt = ut + µ, (2.7) where ut is a stationary series. This is particularly interesting as it provides a way to artificially create a stationary time series that can be used for trading. Finding a time series with the same properties amid raw financial time series is an extremely hard task.

Tests for cointegration identify stable, long-run relationships between sets of variables. The most common cointegration tests are the Engle-Granger two-step method and the Johansen test. The two-step Engle-Granger cointegration test proceeds as follows:

1. Test whether a unit root is present in the series yt and xt using the ADF test. If both series contain a unit root, proceed to step 2.

2. Run the regression defined in (2.7) using Ordinary Least Squares and save the residuals ût.

3. Test the residuals ût for a unit root, using the ADF test (or similar).

4. If the null hypothesis of a unit root in the residuals (the null of no cointegration) is rejected, meaning the residual series is stationary, the two variables are cointegrated.

A sketch of this two-step procedure is shown below.
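The following is a minimal sketch under stated assumptions, not the thesis implementation: it takes two NumPy price arrays and applies the ordinary ADF p-value to the residuals, whereas a rigorous test would rely on the residual-based critical values implemented, for instance, in statsmodels.tsa.stattools.coint.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def engle_granger(y: np.ndarray, x: np.ndarray, significance: float = 0.05):
    """Two-step Engle-Granger procedure: regress y on x by OLS, then test
    the residuals for a unit root; cointegration is accepted when that
    null is rejected."""
    X = sm.add_constant(x)                    # adds the intercept mu of (2.7)
    ols_fit = sm.OLS(y, X).fit()
    beta = ols_fit.params[1]                  # cointegration coefficient
    residuals = ols_fit.resid                 # u_t series
    p_value = adfuller(residuals, autolag="AIC")[1]
    return p_value < significance, beta
```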

A major issue with the Engle-Granger method is that the choice of the dependent variable may lead to different conclusions, as pointed out by Armstrong [14]. The Johansen test is particularly useful for detecting cointegrating vectors when working with more than two securities, but that falls outside the scope of this work.

Having described the fundamental mathematical concepts required to understand the workings of a Pairs Trading strategy, we proceed to describe how they can be applied in practice. As described in section 1.1, a Pairs Trading strategy is usually organized in two stages, the identification of the pairs and the trading itself. Following this line of reasoning, the next two sections go over each stage separately.

2.2 Pairs selection

The pairs selection stage encompasses two steps: (i) finding the appropriate candidate pairs and (ii) selecting the most promising ones.

Starting with (i), the investor should select the securities of interest (e.g. stocks, futures, ETFs, etc.) and from there start searching for possible pair combinations. In the literature, two methodologies are typically suggested for this stage: performing an exhaustive search for all possible combinations of the selected securities, or arranging them in groups, usually by sector, and constraining the combinations to pairs formed by securities within the same group. While the former may find more interesting, unusual pairs, the latter reduces the likelihood of finding spurious relations. For example, [15, 16] impose no restriction on the universe from which to select the pairs. Contrarily, [7, 17, 18] arrange the securities into category groups and select pairs within the same group. Figure 2.1 illustrates these two techniques in a two-dimensional reduced setting, with the characters indicating hypothetical categories.

(a) Searching for pairs without restrictions. (b) Searching for pairs within the same category.

Figure 2.1: Pairs search common techniques.

With respect to (ii), the investor must define what criteria should be used to select a pair combination. The most common approaches to select pairs are the distance, the cointegration and the correlation approaches, which we proceed to describe in more detail.

2.2.1 The minimum distance approach

The work from Gatev et al. [17] can be considered as the baseline for distance-based selection criteria. It proposes selecting pairs which minimize a distance criterion. The authors construct a cumulative total return index for each stock and normalize it to the first day of a 12-month formation period. This period is used for calculating the Sum of Euclidean Squared Distances (SSD) between the constructed time series. Then, the authors suggest ranking the pairs according to the minimum historic SSD.

However, according to Krauss [2], the choice of Euclidean squared distance as a selection metric is analytically suboptimal. To prove this statement, let $p_{i,t}$ and $p_{j,t}$ be realizations of the normalized price processes $P_i = (P_{i,t})_{t \in T}$ and $P_j = (P_{j,t})_{t \in T}$ of securities $i$ and $j$ composing a pair. Additionally, $s^2_{P_i - P_j}$ denotes the empirical spread variance and is expressed as

$$s^2_{P_i - P_j} = \frac{1}{T}\sum_{t=1}^{T}\left(p_{i,t} - p_{j,t}\right)^2 - \left(\frac{1}{T}\sum_{t=1}^{T}\left(p_{i,t} - p_{j,t}\right)\right)^2. \quad (2.8)$$

The average sum of squared distances, $ssd_{P_i,P_j}$, in the formation period is then given by

$$ssd_{P_i,P_j} = \frac{1}{T}\sum_{t=1}^{T}\left(p_{i,t} - p_{j,t}\right)^2 = s^2_{P_i - P_j} + \left(\frac{1}{T}\sum_{t=1}^{T}\left(p_{i,t} - p_{j,t}\right)\right)^2. \quad (2.9)$$

An optimal pair according to the SSD criterion would be one that minimizes (2.9). But this implies that a zero spread pair across the formation period would be considered optimal. Logically, this is not consistent with the idea of a potentially profitable pair, since a good candidate should have high spread variance and strong mean-reversion properties to provide trading opportunities. As this metric does not account for this requirement, it is likely to form pairs with low spread variance and limited profit potential.
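To make the criterion concrete, the sketch below normalizes each price series to its first observation (a cumulative return index) and ranks every pair combination by its SSD over the formation period. It assumes a pandas DataFrame with one column per security and is only an illustration of the baseline distance approach, not the code used in this work.

```python
import itertools
import pandas as pd

def rank_pairs_by_ssd(prices: pd.DataFrame, top_n: int = 20) -> list:
    """Normalize each price series to its first observation and rank all
    pair combinations by the sum of squared differences (SSD)."""
    normalized = prices / prices.iloc[0]                  # cumulative return index
    ssd = {}
    for a, b in itertools.combinations(normalized.columns, 2):
        ssd[(a, b)] = ((normalized[a] - normalized[b]) ** 2).sum()
    return sorted(ssd, key=ssd.get)[:top_n]               # smallest SSD first
```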

9 2.2.2 The correlation approach

The application of correlation on return levels to define a pairs selection metric is examined in [19]. The authors intend to construct a robust pair selection metric that finds promising pairs without penalizing individual price divergences, as happens in [17]. Therefore, they use the same dataset as in [17] for a direct comparison. The spread return of a pair composed of assets i and j is defined as the return on the asset bought minus the return on the asset sold, and its sample variance is given by

$$s^2_{R_i - R_j} = s^2_{R_i} + s^2_{R_j} - 2\hat{\rho}_{R_i,R_j}\sqrt{s^2_{R_i}}\sqrt{s^2_{R_j}}, \quad (2.10)$$

where $\hat{\rho}_{R_i,R_j}$ is the estimated return correlation. It can be inferred from (2.10) that, although imposing a high correlation between the two stocks' returns leads to a lower variance of the spread returns, the return time series of the individual stocks may still exhibit high variances.

The results using Pearson correlation show better performance, with a reported monthly average of 1.70% raw returns, almost twice as high as that obtained using the distance metric suggested by Gatev et al. [17]. However, it should be noted that, although correlation performed better than SSD as a selection metric, it is still not optimal, as two securities with correlated returns might not share an equilibrium relationship, and divergence reversions cannot be explained theoretically.
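For comparison with the distance sketch above, the snippet below ranks all pair combinations by the Pearson correlation of their return series. Again, the DataFrame of prices is a placeholder and this is only an illustration of the general idea behind [19], not its exact methodology.

```python
import itertools
import pandas as pd

def rank_pairs_by_return_correlation(prices: pd.DataFrame, top_n: int = 20) -> list:
    """Rank all pair combinations by the Pearson correlation of their returns."""
    returns = prices.pct_change().dropna()
    corr = returns.corr(method="pearson")
    scores = {(a, b): corr.loc[a, b]
              for a, b in itertools.combinations(returns.columns, 2)}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]   # highest correlation first
```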

2.2.3 The cointegration approach

The cointegration approach entails selecting pairs for which the two constituents are cointegrated. If two securities, Yt and Xt are found to be cointegrated, then by definition the series resulting from the linear combination

St = Yt − βXt,    (2.11)

where β is the cointegration factor, must be stationary. Naturally, defining the spread series this way is particularly convenient, since under these conditions it is expected to be mean-reverting. According to [2], the cointegration approach for pairs selection is a more rigorous framework when compared to the distance approach, mainly because cointegration finds econometrically sounder equilibrium relationships. The selection criterion follows the procedure illustrated in Figure 2.2. The most cited author in this field is Vidyamurthy [20]. In his book, a set of heuristics for cointegration strategies is proposed. Huck and Afawubo [21] perform a comparison study between the cointegration approach and the distance approach using S&P 500 constituents, under varying parameterizations, considering risk loadings and transaction costs. The authors find that the cointegration approach significantly outperforms the distance method, which corroborates the hypothesis that the cointegration approach identifies more robust relationships.

Figure 2.2: Cointegration approach for selecting pairs.

2.2.4 Other approaches

Although the three approaches just described are the most commonly mentioned in the literature, there are naturally other methodologies used for tackling this issue. We find it relevant to mention the strategy introduced by Ramos-Requena et al. [22]. The authors suggest ranking the pairs based on the spread’s Hurst exponent. They obtain superior results when compared with the classical approaches based on cointegration and correlation, especially as the number of pairs in the portfolio grows. This indicates this may be an interesting research direction.

The approaches described up to this point are statistically motivated. Nevertheless, this does not have to be the case. Some research work also focuses on specific pairs which tend to display an equilibrium that is theoretically founded. For instance, Dunis et al. [23] focus on the gasoline crack spread, which is defined as the differential between the price of crude oil and petroleum products extracted from it. In this scenario, there is an evident justification for the underlying equilibrium. The same happens with the corn/ethanol crush spread, which is examined in [24]. Here, the link can be explained by the fact that corn is the principal raw ingredient used to produce ethanol.

2.3 Trading execution

Having explored the most relevant techniques for selecting pairs, we proceed to examine the existing trading strategies. Unlike pairs selection, which can be found in diverse configurations, the trading strategy usually follows the threshold-based trading model described next.

2.3.1 Threshold-based trading model

In the classical framework, proposed by Gatev et al. [17], the criterion for opening a trade is based on the spread divergence. If the spread between the two price series composing a pair diverges by more than two historical standard deviations, a trade is opened. The trade is closed upon convergence to the mean, at the end of the trading period or when delisting2 occurs. This model should be familiar to the reader as it corresponds to the implementation introduced in section 1.1.

2Delisting is the removal of a listed security from a stock exchange.

Figure 2.3 is presented by Gatev et al. [17] to exemplify how this strategy proceeds. The illustration uses two stocks, Kennecott and Uniroyal. The two lines on top represent the normalized price series, and the bottom line indicates whether the position is open or closed, on a daily basis. As we may observe, positions are opened when the spread deviates from its expected value and closed on convergence. This example illustrates a favourable scenario, in which the identified pair provides five good trading opportunities. However, it could be the case that the spread does not converge, in which case a loss is incurred in the investor's portfolio.

Figure 2.3: Applying the threshold-based model. Source: [17].

This model can be described more formally as follows:

i Calculate the spread's (St = Yt − Xt) mean, µs, and standard deviation, σs, during the pair's formation period.

ii Define the model thresholds: the threshold that triggers a long position, αL, the threshold that triggers a short position, αS, and the exit threshold, αexit, that defines the level at which a position should be exited.

iii Monitor the evolution of the spread, St, and check whether any threshold is crossed.

iv In case αL is crossed, go long the spread by buying Y and selling X. If αS is triggered, short the spread by selling Y and buying X. Exit the position when αexit is triggered while a position is being held.

A minimal sketch of these rules is given below.
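The sketch translates rules (i)-(iv) into code and is an illustration only: the spread is assumed to be a pandas Series, µs and σs come from the formation period, and the threshold values (here expressed in standard deviations) are placeholders rather than the parameterization used later in this work.

```python
import pandas as pd

def threshold_positions(spread: pd.Series, mu: float, sigma: float,
                        alpha_long: float = -2.0, alpha_short: float = 2.0,
                        alpha_exit: float = 0.0) -> pd.Series:
    """Produce a position series from the threshold rules:
    1 = long the spread, -1 = short the spread, 0 = out of the market."""
    z = (spread - mu) / sigma                 # deviation in formation-period standard deviations
    position = 0
    positions = []
    for value in z:
        if position == 0:
            if value <= alpha_long:
                position = 1                  # spread unusually low: buy Y, sell X
            elif value >= alpha_short:
                position = -1                 # spread unusually high: sell Y, buy X
        elif position == 1 and value >= alpha_exit:
            position = 0                      # long position exited on reversion
        elif position == -1 and value <= alpha_exit:
            position = 0                      # short position exited on reversion
        positions.append(position)
    return pd.Series(positions, index=spread.index)
```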

It is important to note that the way the spread, St, is defined is tied to the technique used to identify the pairs. For example, when applying the minimum distance approach, the spread composed by the securities Y and X is simply defined as St = Yt − Xt. However, when using cointegration, the spread is defined in a slightly different manner, as St = Yt − βXt. This subtlety affects the way positions are set.

If the spread is defined as St = Yt − Xt, then entering a long position entails buying Y and selling X in the same number of shares, or investing the same capital in both, in case the investor wants to assume a dollar-neutral position. However, if the spread is defined as St = Yt − βXt, one must decide how to handle the cointegration factor. Different alternatives have been proposed in the literature, and there seems to be no consensus on the most appropriate one. According to Chan [9], the investor should invest in one share of Y and β shares of X. Rad et al. [25] suggest the investor should invest $1 in the long leg and determine the value of the short leg according to the cointegration factor. Other authors, such as Dunis et al. [7], simply neglect the cointegration factor to enforce a money-neutral position and invest the same amount in each leg.

2.3.2 Other trading models in the literature

The strategy just described is not concerned with optimizing the market entry points. There is no guarantee that the point immediately after crossing the threshold is an optimal entry point. In fact, the spread might continue to diverge before converging again.

Some efforts have emerged trying to propose more robust models. Techniques from different fields, such as stochastic control theory, statistical process modelling and Machine Learning, have been studied. Some examples in the literature include Elliott et al. [26], who model the spread with a mean-reverting Gaussian Markov chain, observed in Gaussian noise. The trading positions are set according to the relative difference between the series prediction and the observation. Dunis et al. [23] propose a similar framework, but this time based on Artificial Neural Networks. The results obtained by Machine Learning approaches have proved very promising and are explored in the next section in more detail.

2.4 The application of Machine Learning in Pairs Trading

Over the last few years, more research has become available on the application of Machine Learning in the field of finance [1]. With the advent of more computational power, it became easier to train complex Deep Neural Networks capable of generating very promising results. Nevertheless, the application of Machine Learning specifically in Pairs Trading is still scarce.

Dunis et al. [23, 24, 27] are the main authors applying Machine Learning to Pairs Trading, even though their work is limited to specific spreads. Dunis et al. [23] apply Artificial Neural Networks to model the gasoline crack spread. The authors train the neural networks to predict the percentage change in the spread, from one closing price to the next. The trading rules are then set by comparing the change forecast with fixed thresholds, according to

$$\begin{cases} \text{if } \Delta\tilde{S}_t > X, & \text{open or stay long the spread} \\ \text{if } \Delta\tilde{S}_t < -X, & \text{open or stay short the spread} \\ \text{if } -X < \Delta\tilde{S}_t < X, & \text{remain out of the spread,} \end{cases} \quad (2.12)$$

where $\Delta\tilde{S}_t$ is the model's predicted spread return and X is the value of the threshold filter (which the authors optimize in-sample). A Recurrent Neural Network (RNN) achieved profits of 15% on average before transaction costs. However, the impact of transaction costs is significant. Dunis et al. [24] apply Artificial Neural Networks in an adjusted framework to the corn/ethanol crush spread. In this case, the authors found more satisfying results, with a profit after transaction costs of approximately 20%.

Huck [28] also explores the integration of Machine Learning in Pairs Trading, by introducing a combined approach, built on Artificial Neural Networks. The methodology uses RNNs to generate a one-week ahead forecast, from which the predicted returns are calculated. A ranking algorithm is then applied to extract the undervalued and overvalued stocks. In the trading step, the top stocks of the ranking are bought and the bottom stocks sold. The results obtained are, once again, very satisfactory.

Table 2.1 summarizes what we consider to be the most relevant work applying Machine Learning in Pairs Trading. In addition to the studies just described, we include some additional work in the field [29, 30].

Table 2.1: Literature exploring the application of Machine Learning in Pairs Trading.

Apart from the studies presented, Machine Learning techniques applied in Pairs Trading remain fairly unexplored. Furthermore, the existing limitations suggest there might be open research opportunities in this direction. Lastly, all the applications focus on the supervised, predictive power of Machine Learning and seem to neglect how Unsupervised Learning could also be advantageous in this field.

2.5 Conclusion

In this chapter, we introduced the main research topics related to Pairs Trading. In section 2.2, we started by reviewing the standard procedures to search for pairs, which include searching for pairs within the same sector or searching over all possible combinations. Next, we analyzed the selection methods, with an emphasis on the correlation, distance and cointegration approaches. In section 2.3, we examined the threshold-based trading model and briefly motivated some more innovative trading techniques. Finally, we focused on the literature available exploring the application of Machine Learning in Pairs Trading.

Chapter 3

Proposed Pairs Selection Framework

In the previous chapter, we introduced the Pairs Trading investment scheme and its fundamental notions. In this chapter, we propose a new pairs selection framework as part of the first research stage, aiming to make the pairs search methodology more effective.

3.1 Problem statement

As the popularity of Pairs Trading grows, it is increasingly harder to find rewarding pairs. The scarcity of such promising pairs forces a search expansion to broader groups of securities, on the expectation that by considering a larger group, the likelihood of finding a good pair will increase. The simplest procedure commonly applied is to generate all possible candidate pairs by considering the combination of every security with every other security in the dataset, as represented in Figure 2.1(a). This results in a total of n(n − 1)/2 possible combinations, where n represents the number of available securities.

Two different problems arise immediately. First, the computational cost of testing mean-reversion for all the possible combinations increases drastically as more securities are considered. In a setting where securities are constantly being monitored to form new pairs, this is especially relevant. The second emerging problem is frequent when performing multiple hypothesis tests at once, and is referred to as the multiple comparisons problem. By definition, if 100 hypothesis tests are performed at a 5% significance level, the results should contain a false positive rate of 5%. To illustrate how this might be relevant, one might consider the hypothetical situation where 20 shares are being handled. There would be 20 × 19/2 = 190 possible combinations, which means approximately 9 results would be spurious (5% of 190). Since finding truly cointegrated pairs is uncommon among a randomly selected group of securities, this number is considerably high. Formally, the probability ᾱ of committing at least one type I error1 when performing m independent comparisons is given by

ᾱ = 1 − (1 − α)^m,    (3.1)

where α represents the significance level of a single test. The quantity ᾱ is known as the family-wise error rate (FWER).

1 In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis (false positive), while a type II error is the non-rejection of a false null hypothesis (false negative).
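To make (3.1) concrete, the snippet below evaluates the family-wise error rate for the 190 combinations of the example above; the function name is ours and the numbers follow directly from the formula.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - alpha) ** m

# With 190 candidate pairs tested at the 5% level, a false discovery is
# almost guaranteed: 1 - 0.95**190 is approximately 0.99994.
print(family_wise_error_rate(0.05, 190))
```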

The multiple comparisons problem cannot be eliminated entirely, but its impact can be mitigated. Two options consist of applying a multiple-testing correction, such as the Bonferroni correction, or running fewer statistical tests. Bonferroni is one of the most commonly used approaches for multiple comparisons. However, the Bonferroni correction is substantially conservative. In fact, Harlacher [31] found that the Bonferroni correction turns out to be too conservative for pairs selection and impedes the discovery of even truly cointegrated combinations. The author recommends the effective pre-partitioning of the considered asset universe with the purpose of reducing the number of feasible combinations and, therefore, the number of statistical tests.

This aspect might lead the investor to pursue the usual, more restrictive approach of comparing securities only within the same sector, as previously illustrated in Figure 2.1(b). This dramatically reduces the number of necessary statistical tests, consequently reducing the likelihood of finding spurious relations. Moreover, there are fundamental reasons to believe that securities within the same sector are more likely to move together, as they are exposed to similar underlying factors. This method also has the advantage of being particularly easy to execute. However, the simplicity of this process might also turn out to be a disadvantage. The more traders are aware of the pairs, the harder it is to find pairs not yet being traded in large volumes, leaving a smaller margin for profit.

This disequilibrium motivates the search for a methodology that lies between these two scenarios: an effective pre-partitioning of the universe of assets that does not limit the combination of pairs to relatively obvious solutions, while not requiring an excessive number of candidate combinations.

3.2 Proposed framework

In this work, we propose the application of an Unsupervised Learning algorithm, on the expectation that it infers meaningful clusters of assets from which to select the pairs. The motivation is to let the data explicitly manifest itself, rather than manually defining the groups each security should belong to. The proposed methodology encompasses the following steps:

1. Dimensionality reduction - find a compact representation for each security;
2. Unsupervised Learning - apply an appropriate clustering algorithm;
3. Select pairs - define a set of rules to select pairs for trading.

The three following sections will explore in detail how each of the previous steps is conducted.

3.3 Dimensionality reduction

In the quest for profitable pairs, we wish to find securities with the same systematic risk-exposure. According to the Arbitrage Pricing Theory2, these securities generate the same long-run expected return. Any deviations from the theoretical expected return can thus be seen as mispricing and serve as a guideline to place trades. To extract the common underlying risk factors for each security, the application of Principal Component Analysis (PCA) on the return series is proposed, as described in [33].

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables, the principal components. The transformation is defined such that the first principal component accounts for as much of the variability in the data as possible. Each succeeding component, in turn, has the highest variance possible under the constraint that it must be orthogonal to the preceding components. It is especially interesting to note that each component can be seen as representing a risk factor [34].

The PCA application goes as follows. First, the return series for a security i at time t, $R_{i,t}$, is obtained from the security price series $P_i$ as

$$R_{i,t} = \frac{P_{i,t} - P_{i,t-1}}{P_{i,t-1}}. \quad (3.2)$$

Next, the return series must be normalized, since PCA is sensitive to the relative scaling of the original variables. This is performed by subtracting the mean, $\bar{R}_i$, and dividing by the standard deviation, $\sigma_i$, as

$$Y_i = \frac{R_i - \bar{R}_i}{\sigma_i}. \quad (3.3)$$

From the normalized return series of all assets, the correlation matrix ρ is calculated, where each entry is determined by

$$\rho_{ij} = \frac{1}{T-1}\sum_{t=1}^{T} Y_{i,t}\,Y_{j,t}. \quad (3.4)$$

The role of matrix ρ will soon be made more clear. Note that the motivation for applying PCA on return series lies in the fact that a return correlation matrix is more informative for evaluating price co-movements. Using the price series instead could result in the detection of spurious correlations as a result of underlying time trends.

The next step consists of extracting the eigenvectors and eigenvalues to construct the principal components. The eigenvectors determine the directions of maximum variance, whereas the eigenvalues quantify the variance along the corresponding direction. Eigenvalues and eigenvectors can be determined using singular value decomposition (SVD). For that purpose, we stack the normalized return series for all n securities in a matrix A. By direct application of the SVD theorem on matrix A, it is decomposed as

A = USV^T.    (3.5)

2Arbitrage pricing theory (APT) is a general theory of asset pricing that holds that the expected returns of a financial asset can be modelled as a linear function of various factors or theoretical market indices. For more information, we redirect the reader to [32].

Matrix U is orthogonal, and its columns are the left singular vectors. Matrix S is diagonal and contains the singular values arranged in descending order along the diagonal, from the value corresponding to the highest variance to the least. Matrix V^T is the transpose of an orthogonal matrix, and its rows are the right singular vectors.

Note that by computing A^T A we obtain V S^2 V^T. But A^T A is, up to the factor T − 1, the correlation matrix ρ previously calculated, hence V can be determined. In this way, it is possible to find the eigenvectors of the data matrix A. At this point, we select the k eigenvectors corresponding to the k directions of maximum variance, where k represents the number of features used to describe the transformed data. The more eigenvectors are considered, the better the data is described. The matrix containing the selected eigenvectors in order of significance is called the feature vector.

Finally, the new dataset is obtained by multiplying the original matrix A by the feature vector, resulting in a matrix of size n × k. Please note that position (i, j) of the final matrix contains the result of the inner product between the normalized return series i and the eigenvector j. The matrix obtained is now reduced to the selected k features.

To conclude this section, the number of features, k, needs to be defined. A usual procedure consists in analyzing the proportion of the total variance explained by each principal component, and then using the number of components that explain a fixed percentage, as in [34]. In this work, however, a different approach is adopted. Because an Unsupervised Learning algorithm is applied using these data features, there should be a consideration for the data dimensionality. High data dimensionality presents a dual problem. The first is that in the presence of more attributes, the likelihood of finding irrelevant features increases. The second is the problem of the curse of dimensionality. This term was introduced by Bellman [35] to describe the problem caused by the exponential increase in volume associated with adding extra dimensions to Euclidean space. This has a tremendous impact when measuring the distance between apparently similar data points that suddenly become all very distant from each other. Consequently, the clustering procedure becomes very ineffective. According to Berkhin [36], the effect starts to be severe for dimensions greater than 15. Taking this into consideration, the number of PCA dimensions is upper bounded at this value and is chosen empirically. The implications of varying this parameter can be seen in section 6.2.3.
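As a compact sketch of the procedure described in this section (not the thesis code), scikit-learn can be used to obtain a bounded number of PCA features per security from standardized returns. The DataFrame of prices and the choice of 15 components are assumptions for illustration; each security is represented by its loadings on the retained components.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def compact_representation(prices: pd.DataFrame, n_components: int = 15) -> pd.DataFrame:
    """Return an (n_securities x n_components) matrix of PCA features
    computed from the standardized return series."""
    returns = prices.pct_change().dropna()                   # equation (3.2)
    standardized = StandardScaler().fit_transform(returns)   # equation (3.3), column-wise
    pca = PCA(n_components=n_components)
    pca.fit(standardized)                                    # components from the return correlation structure
    features = pca.components_.T                             # one row of loadings per security
    return pd.DataFrame(features, index=returns.columns)
```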

3.4 Unsupervised Learning

Having transformed each time series into a smaller set of features with the application of PCA, an Unsupervised Learning technique can be effectively applied. We proceed to describe the decision process followed in selecting the most appropriate algorithm.

3.4.1 Problem requisites

There exist several Unsupervised Learning clustering techniques capable of solving the task at hand. Thus, some problem-specific requisites are first defined to constrain the set of possibilities:

• No need to specify the number of clusters in advance;
• No need to group all securities;
• Strict assignment that accounts for outliers;
• No assumptions regarding the clusters' shape.

In more detail, first, no assumptions should be made concerning the appropriate number of clusters in the data, given that there is no prior information in this regard. By making the number of clusters data-driven, we introduce as little bias as possible. Secondly, it should not be imposed that every security is assigned to a group, because we should expect to find securities with very distinct price series and, ultimately, divergent PCA decompositions. From a clustering point of view, these securities will be identified as outliers. Therefore, we should ensure they do not interfere with the clustering procedure by selecting a clustering algorithm robust to outliers. Thirdly, the assignment should be strict; otherwise, the number of possible pair combinations increases, which conflicts with the initial goal. Finally, because there is no prior information indicating the clusters should be regularly shaped, the selected algorithm should not adopt this assumption.

3.4.2 Clustering methodologies

Having described the problem requisites, we may proceed to select an appropriate clustering type. A common scheme for referring to the different clustering algorithms consists of categorizing them into three broad groups: partitioning clustering, hierarchical clustering, and density-based clustering. The goal of this work is not to perform an exhaustive exploration of all the existing options but rather to find a sound approach consistent with the properties stated so far. Hence, we are not concerned with explaining all the different methodologies in detail but rather with providing some intuition on the decisions taken.

Partitioning algorithms, such as k-means [37], are not considered suitable, mainly for three reasons. The first is that these algorithms do not handle noisy data and outliers well. Secondly, clusters are forced to have convex shapes. The assumption that the data follows a normal distribution around the centre of the cluster might be too strong of an approximation at this stage, especially because the data dimensionality precludes the visualization of its distribution in space. Lastly, the obligation of specifying the number of clusters in advance is also undesirable in a scenario under which the model should have as few parameters as possible. One could argue that by performing a grid search on the number of clusters, an optimal partition should correspond to the configuration that minimizes a given criterion. This way, specifying a range of clusters would be sufficient. However, it is not evident which criterion leads to the selection of the most promising pairs. Clustering integrity metrics, such as the silhouette coefficient [38], sound appropriate. However, we empirically verified that greater cluster integrity is not necessarily strictly correlated with more promising pairs.

Hierarchical clustering algorithms have the advantage of providing a dynamic termination criterion. This lets the investor select a grouping with the desired level of granularity. Yet, this might add unnecessary bias from the investor. As a consequence, the selection can approximate the results obtained using standard search methods, which we aim to prevent. For this reason, hierarchical clustering would only be appropriate under the existence of an automatic termination criterion.

Finally, density-based clustering algorithms display some benefits in the context of this work. First, clusters can have arbitrary shapes, and thus no gaussianity assumptions need to be adopted regarding the shape of the data. Secondly, it is naturally robust to outliers because it does not group every point in the dataset. Lastly, it requires no specification of the number of clusters. Therefore we proceed to investigate how density-based clustering could be applied in this work.

3.4.3 DBSCAN

A density-based clustering algorithm interprets the clustering space as an open set in the Euclidean space that can be divided into a set of its connected components. The DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise) [39] is the most influential in this category. In [39], the authors illustrate, with Figure 3.1, that the reason why we can look at each of the databases and easily detect clusters of points is that there is a typical density of points within each cluster which is considerably higher than outside of the cluster.

Figure 3.1: Example of different cluster configurations. Source: [39].

To formalize this notion of clusters' dependence on density, the key idea relies on the concepts of density and connectivity. There are essentially two model parameters, ε and minPts. To understand the algorithm, we start by defining the underlying fundamental concepts with respect to ε and minPts.

Definition 3.4.1. ε-neighborhood: The ε-neighborhood of a point q is defined as

Nε(q) = {p ∈ X|d(q, p) ≤ ε}, (3.6) where d(q, p) represents the distance between q and p, and X is the set of all points.

Definition 3.4.2. Core point: A point q is a core point if it verifies

|Nε(q)| ≥ minPts,

where |Nε(q)| represents the number of points within the ε-neighborhood of q. Note also that the count used for minPts includes the point q itself.

Definition 3.4.3. Directly density-reachable: Point p is directly density-reachable from point q in a set of points X provided that p ∈ Nε(q) and q is a core point.

Definition 3.4.4. Density-reachable: A point p is density-reachable from a point q if there is a chain of objects p1, ..., pn, where p1 = q and pn = p such that pi+1 is directly density-reachable from pi.

Definition 3.4.5. Density-connected: A point p is density-connected to a point q if both points are density-reachable from a common core point.

Figure 3.2 illustrates the concepts just defined, when the parameter minPts takes the value of 5. Figure 3.2(a) describes a core point. Point q is a core point, given that it contains at least minPts points within the circle of radius ε, its ε-neighborhood. Point p1 belongs to the ε-neighborhood of q. Like p1, a point that is not a core point but is included in an ε-neighborhood is called a border point. A point that does not belong to any neighborhood, such as p2, is considered an outlier.

Figure 3.2(b) illustrates the concept of directly density-reachable and density reachable points. Point p1 is directly density-reachable from q. Furthermore, p2 is density-reachable from q, as both q and p1 are core points. However, q is not density reachable from p2, because p2 is not a core point. As just demonstrated, density-reachability is not symmetric in general. Only core objects can be mutually density-reachable.

Finally, Figure 3.2(c) illustrates the concept of density connection. Points p2 and p4 are density-connected, since both points are density reachable from q. Note that density-connectivity is a symmetric relation.

(a) Core points and outliers. (b) Density-reachable points. (c) Density-connected points.

Figure 3.2: DBSCAN illustration of basic concepts, with minPts = 5.

We may now characterize a cluster. If p is a core point, then it forms a cluster together with all points (core or non-core) that can be reached from itself. This implies that each cluster contains at least one core point. Non-core points can also be part of a cluster, but they form its edge since they cannot be used to reach more points. Thus, a cluster satisfies two properties: (i) All points within a cluster are

mutually density-connected and (ii) if a point is density-reachable from any point of the cluster, it must belong to the cluster. If a point does not belong to any cluster, it is considered noise. The DBSCAN execution can be described in a simplified manner as follows:

1. Find the points in the ε-neighborhood of every point and identify the core points with more than minPts neighbours.
2. Find the connected components of core points on the neighbour graph, ignoring all non-core points.
3. Assign each non-core point to a nearby cluster if the cluster is an ε-neighbor, otherwise assign it to noise.
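
As a concrete illustration of the steps above, the snippet below applies DBSCAN to the PCA feature matrix with scikit-learn. The parameter values and the variable `features` (standing in for the n × k matrix produced in section 3.3, here replaced by a random placeholder so the sketch runs on its own) are illustrative assumptions, not values recommended by this work.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Placeholder feature matrix with one row per security and k PCA features.
rng = np.random.default_rng(0)
features = rng.normal(size=(208, 5))

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(features)   # illustrative parameters

# Label -1 marks securities classified as noise; other labels are cluster ids.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"{n_clusters} clusters found; {n_noise} securities left unclustered")
```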

In spite of the DBSCAN advantages stated so far, there exist some caveats that should be noted. First, the algorithm is very sensitive to the parameters, namely the definition of ε. Moreover, even after finding a suitable ε, the algorithm assumes clusters are uniformly dense. This results in an inability to find clusters of varying density. Although the parameter ε may be well adapted to one given cluster density, it might be unrealistic for another cluster with a different density within the same dataset. This situation is depicted in Figure 3.3. Clusters A and B could eventually be found using the same ε, but in doing so, we would only identify cluster C and consequently neglect clusters C1, C2 and C3. On the other hand, if we adapt ε to find C1, C2 and C3, clusters A and B will be indistinguishable from noise.

Figure 3.3: Clusters with varying density. Source: [40].

The last drawback pointed out is particularly relevant in this work. Because there is a natural imbalance in the categories of the securities considered (described in Table 5.2), and assuming the category plays a relevant role in describing a security, clusters are expected to have different densities. The expectation of clusters with unbalanced densities motivates the adoption of OPTICS.

3.4.4 OPTICS

Ordering points to identify the clustering structure (OPTICS), proposed by Ankerst et al. [40], addresses the problem of detecting meaningful clusters in data of varying density. The algorithm is inspired by DBSCAN, with the addition of two new concepts that we proceed to define. In the following definitions, we maintain the nomenclature used so far.

Definition 3.4.6. Core-distance: Let minPts-distance(p) be the distance from a point p to its minPts-th nearest neighbor. Then, the core-distance of p is defined as

\text{core-dist}_{\varepsilon, minPts}(p) =
\begin{cases}
\text{Undefined}, & \text{if } |N_\varepsilon(p)| < minPts \\
minPts\text{-distance}(p), & \text{otherwise}
\end{cases} \quad (3.7)

The core-distance of a point p is, simply put, the smallest distance ε′ between p and a point in its ε-neighborhood such that p is a core point with respect to ε′. If p is not a core point in the first place, the core-distance is not defined.

Definition 3.4.7. Reachability-distance: The reachability-distance of p with respect to o, denoted reachability-dist_{ε,minPts}(p, o), is defined as

\text{reachability-dist}_{\varepsilon, minPts}(p, o) =
\begin{cases}
\text{Undefined}, & \text{if } |N_\varepsilon(o)| < minPts \\
\max\left(\text{core-dist}_{\varepsilon, minPts}(o),\; d(o, p)\right), & \text{otherwise}
\end{cases} \quad (3.8)

The reachability-distance of a point p with respect to a point o can be interpreted as the smallest distance such that p is directly density-reachable from o. This requires that o is a core point, meaning that the reachability-distance cannot be smaller than the core-distance of o; if it were, o would not be a core point with respect to that distance.

Figure 3.4 represents the concepts of core-distance and reachability-distance just described. The scenario represented defines minPts = 5 and ε = 6 mm. In the left image, the core-distance, represented as ε′, portrays the minimum distance that makes p a core point: for a radius ε′ < ε, there already exist five points (minPts) within the circle. The concept of reachability-distance is represented in the right image. The reachability-distance between q1 and p must ensure p is a core point, although the distance between the two points is actually smaller. The reachability-distance between q2 and p corresponds to the actual distance between the two points, because any distance larger than ε′ already ensures p is a core point.

The OPTICS implementation is fundamentally equivalent to DBSCAN, with the exception that instead of maintaining a set of known, but so far unprocessed cluster members, a priority queue is used. We do not include the pseudo-code here, but the interested readers may refer to [40].

Figure 3.4: Illustration of core-distance and reachability-distance in OPTICS. Source: [40].

The algorithm returns the points and the corresponding reachability-distances. The points returned are ordered in such a way that spatially closest points become neighbours in the ordering. From this information, a reachability plot can be constructed by arranging the ordered points on the x-axis and the reachability-distance on the y-axis, as shown at the bottom of Figure 3.5. The upper left plot represents the original dataset, whereas the upper right plot depicts the spanning tree produced by OPTICS. The colours simply aim to facilitate the visualization of the different clusters. Since points belonging to a cluster are close to each other, they have a low reachability-distance to their nearest neighbour. As a consequence, the clusters appear as valleys in the reachability plot, which are deeper for denser clusters. Additionally, the yellow points are considered noise, and no valley is found in their reachability plot.

Figure 3.5: Extracting clusters using OPTICS. Adapted from: [41].

Having described the reachability plot framework, we may now describe the cluster extraction procedure. It can be performed in two main ways. The first is to inspect the reachability plot and fix a suitable ε on the y-axis. This way, the chosen value is imposed on every cluster in the dataset, which simply yields the same results as applying DBSCAN with the same values for ε and minPts. The second alternative consists of letting an automatic procedure select the most suitable ε for each cluster. This is performed by defining the minimum steepness in the reachability plot that should constitute a cluster boundary. Ankerst et al. [40] describe in detail how this process can be accomplished, with the introduction of some additional concepts. From this work's point of view, the key idea to retain is that this procedure addresses the problem of finding the most suitable ε for each cluster. As a byproduct, it makes the clustering process closer to being parameterless, as the investor only needs to specify minPts. Note that, in practice, the automatic cluster extraction procedure requires information regarding the minimum steepness in the reachability plot necessary to detect a cluster boundary. However, the default value proposed by the library implementation is adopted, as it is independent of the data. Hence, the investor does not need to worry about this value.
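
For illustration, the snippet below sketches how OPTICS could be applied with scikit-learn, whose `xi` cluster extraction method plays the role of the automatic procedure described above (its default minimum steepness is 0.05). The value of `min_samples` and the placeholder feature matrix are assumptions made for the sake of the example.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Placeholder for the (n_securities x k) PCA feature matrix of section 3.3.
rng = np.random.default_rng(0)
features = rng.normal(size=(208, 5))

# cluster_method="xi" extracts clusters automatically from the reachability
# plot; xi (the minimum steepness) defaults to 0.05 in scikit-learn.
labels = OPTICS(min_samples=3, cluster_method="xi").fit_predict(features)

# Group securities by cluster label, discarding noise (-1).
clusters = {}
for idx, label in enumerate(labels):
    if label != -1:
        clusters.setdefault(label, []).append(idx)
print({label: len(members) for label, members in clusters.items()})
```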

3.5 Pairs selection criteria

Having generated the clusters of assets, it is still necessary to define a set of conditions for selecting the pairs to trade. It is critical that the pairs’ equilibrium persists. To enforce this, we propose the unification of methods applied in separate research work. According to the proposed criteria, a pair is selected if it complies with the four conditions described next:

1. The pair’s constituents are cointegrated. 2. The pair’s spread Hurst exponent reveals a mean-reverting character. 3. The pair’s spread diverges and converges within convenient periods. 4. The pair’s spread reverts to the mean with enough frequency.

We proceed to go over each step in more detail to describe the reasoning behind it. First, as described in section 2.2.3, cointegration methods proved to achieve good performance in pairs selection, as cointegration-based approaches identify econometrically sounder equilibrium relationships than distance/correlation-based methods. Thus, a pair is only deemed eligible for trading if the two securities that form the pair are cointegrated. To test this condition, we propose the application of the Engle-Granger test, due to its simplicity. One criticism Armstrong [14] points out regarding the Engle-Granger test is that the choice of the dependent variable may lead to different conclusions. To mitigate this issue, we propose that the Engle-Granger test is run for the two possible selections of the dependent variable and that the combination that generates the lowest t-statistic is selected.

Secondly, an additional validation step is suggested to provide more confidence in the mean-reversion character of the pairs' spread. It aims to constrain false positives, possibly arising as an effect of the multiple comparisons problem. The condition imposed is that the Hurst exponent associated with the spread of a given pair must be smaller than 0.5, assuring the process leans towards mean-reversion. This additional step ensures that undesirable data samples that made it through the cointegration test but are not mean-reverting can now be detected and discarded. This criterion is adapted from the work of Ramos-Requena et al. [22], in which the authors claim that cointegration per se is too restrictive a condition to impose and that it is rarely fulfilled (something that is also mentioned in [9]). For that reason, the authors propose ranking the pairs based on the Hurst exponent and choosing them accordingly. The perspective assumed in this work is different, as we still impose that the two legs forming a pair are cointegrated, but we use the Hurst exponent value as a second check to look for mean-reverting properties.

It should be noted that the Hurst exponent measures the degree of mean-reversion of a time series, and not its stationarity. Strictly speaking, this means that it cannot be interpreted as a direct replacement of the cointegration test. Although in practice, most stationary time series are also mean-reverting, there may be exceptions. As an example, we urge the reader to consider the process X which verifies

X_t = X_{t-1} for t > 0. Let X_0 take the value 1 with probability 0.5 and 0 otherwise. X is stationary but not mean-reverting, proving that stationarity does not imply mean-reversion. Nevertheless, these situations are rare, and for that reason, we neglect them in our approximation.

In third place, as discussed in section 2.1, the spread’s stationarity is a desirable property to look for in the selected pairs. However, a mean-reverting spread by itself does not necessarily generate profits. There must be coherence between the mean-reversion duration and the trading period. If a spread takes on average 400 days to mean-revert, but the trading period lasts one year, it is unlikely that a profitable situation is encountered. Likewise, a very short mean-reversion period is not desirable either. Since the half-life can be interpreted as an estimate of the expected time for the spread’s mean-reversion, we propose filtering out pairs for which the half-life is not coherent with the trading period.

Lastly, we enforce that every spread crosses its mean at least once per month, to provide enough liquidity. It should be noted that even though there is logically a (negative) correlation between the number of mean crosses and the half-life period, as more mean crosses are naturally associated with a shorter half-life, the two properties are not necessarily interchangeable. Adding this constraint does not merely reinforce the previous condition; it also discards pairs that, in spite of meeting the mean-reversion timing conditions, do not cross the mean, providing no opportunity to exit a position.

Having described the four stages proposed to select an eligible pair, we proceed to describe the parameters used at each stage. Figure 3.6 represents the four rules a pair must verify to be eligible for trading. First, it is imposed that pairs are cointegrated, using a p-value of 1%. Then, the spread's Hurst exponent, represented by H, should be smaller than 0.5. Additionally, the half-life period, represented by hl, should lie between one day and one year. Finally, it is imposed that the spread crosses its mean at least 12 times per year, which ideally corresponds to a minimum of one cross per month, on average.

Figure 3.6: Pairs selection rules.
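
The sketch below illustrates how the four rules in Figure 3.6 could be checked for a candidate pair. It uses the Engle-Granger test from statsmodels, while the Hurst exponent and half-life estimators are simplified, illustrative implementations; the helper names, the OLS-based spread construction and the constant `bars_per_day` (78 five-minute bars per trading session) are assumptions made for this example rather than the exact code used in this work.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

def hurst(spread: np.ndarray, max_lag: int = 100) -> float:
    """Rough Hurst exponent estimate from the scaling of lagged differences."""
    lags = range(2, max_lag)
    tau = [np.std(spread[lag:] - spread[:-lag]) for lag in lags]
    return float(np.polyfit(np.log(list(lags)), np.log(tau), 1)[0])

def half_life(spread: np.ndarray) -> float:
    """Half-life of mean reversion, from an AR(1) fit on the spread changes."""
    beta = sm.OLS(np.diff(spread), sm.add_constant(spread[:-1])).fit().params[1]
    return -np.log(2) / beta if beta < 0 else np.inf

def is_eligible(y: np.ndarray, x: np.ndarray, bars_per_day: int = 78) -> bool:
    # 1. Cointegration: Engle-Granger both ways, keeping the lowest t-statistic.
    t_yx, p_yx, _ = coint(y, x)
    t_xy, p_xy, _ = coint(x, y)
    if (p_yx if t_yx < t_xy else p_xy) > 0.01:
        return False
    # Spread from an OLS hedge ratio (simplified construction).
    b = sm.OLS(y, sm.add_constant(x)).fit().params[1]
    spread = y - b * x
    # 2. Mean-reverting character: Hurst exponent below 0.5.
    if hurst(spread) >= 0.5:
        return False
    # 3. Half-life between one day and one year (counted in 5-minute bars).
    hl = half_life(spread)
    if not (bars_per_day <= hl <= 252 * bars_per_day):
        return False
    # 4. At least 12 mean crossings per year, on average.
    centered = spread - spread.mean()
    crossings = int(np.sum(centered[1:] * centered[:-1] < 0))
    years = len(spread) / (252 * bars_per_day)
    return crossings / years >= 12
```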

3.6 Framework diagram

Up to this point, we described in detail the three building blocks of the proposed pairs selection framework. In this section, we intend to illustrate how they connect.

Figure 3.7 illustrates the progress along the three stages of the pairs selection framework. As we can observe, the initial state should comprise the price series for all the possible pairs' constituents. We assume this information is available to the investor. Then, by reducing the data dimensionality, each security may be described not just by its price series but also by the compact representation emerging from the application of PCA to the return series (State 1). Using this simplified representation, the OPTICS algorithm is capable of organizing the securities into clusters (State 2).

Finally, we may search for pair combinations within the clusters and select those that verify the rules described in Figure 3.6 (State 3).

Figure 3.7: Pairs selection diagram.

3.7 Conclusion

In this chapter, we started by motivating the exploration of a novel way of grouping financial securities to search for pairs. Next, we proposed a compact data-driven scheme to accomplish that goal and explored each of its stages. We described how the application of dimensionality reduction followed by a density-based clustering algorithm could help to tackle the previously stated issues. In particular, we presented the OPTICS method as being convenient for the conditions under study. We also defined a meticulous set of rules to define which pairs are eligible for trading. Some important take-away points from the proposed framework are summarized in Table 3.1.

Table 3.1: Key points from the proposed pairs selection framework.

Chapter 4

Proposed Trading Model

In the previous chapters, we described the background of Pairs Trading and proposed a new framework for selecting pairs. This chapter explores how the trading strategy may be improved, as part of research stage 2.

4.1 Problem statement

One downside of the threshold-based trading model, defined in section 2.3.1, is that the entry points are not defined with much precision. The only criterion to enter a position is crossing a predefined threshold, independently of the spread’s current direction. This can result in unpleasant portfolio decline periods, in case a pair continues to diverge, as depicted in Figure 4.1.

Figure 4.1: Illustration of an untimely entry position.

The solid red line in Figure 4.1 corresponds to the entry point defined by the model, which is triggered when the spread crosses the short threshold. Because the spread continues to diverge, a loss is incurred

in the portfolio. Gains will only manifest once the spread starts converging, at the point marked by the dashed red line, at the end of February. In addition, entering the position at the point where convergence begins provides a larger profit margin, since the spread's distance to the mean is larger. Hence, not optimizing the entry points not only causes longer portfolio decline periods, but it can also reduce the full potential one could extract from the spread convergence.

4.2 Proposed model

To create a more robust trading model, it would be interesting to make use of predicted future data when defining the market entry positions. Initially, we conceive three different solutions based on time series forecasting. From these three approaches, which we proceed to describe, we select the most suitable based on empirical testing.

One possible approach could be the one followed by Dunis et al. in [23, 24, 27]. This approach attempts to model the spread's predicted returns directly, which the authors define as

R_t = \frac{Y_t - Y_{t-1}}{Y_{t-1}} - \frac{X_t - X_{t-1}}{X_{t-1}}, \quad (4.1)

where X_t and Y_t represent the prices of the two legs composing the pair, at time t. This way, when the predicted return is higher than a predefined threshold (optimized in-sample), a position is entered.

Another possibility is to focus on modelling the spread series instead and use this information for trading. The investor may track the spread’s recent values and look for a pattern of local trendiness. In case an uptrend is detected, the investor can go long on the spread. Similarly, if a downtrend is identified, the investor can go short on the spread. Eventually, a position is exited once the predicted value for the next time instant contradicts the trend’s direction. This strategy is exemplified in Figure 4.2. The coloured triangles express the forecasting input, used to indicate whether the spread is expected to go up (positive forecast) or down (negative forecast). For the sake of illustration, the forecasting has an accuracy of 100%, enabling the positions to be exited at the exact right moment. Naturally, this is extremely hard to achieve in practice.

Figure 4.2: Local momentum forecasting-based strategy.

Both methodologies described, even if theoretically appealing, did not perform well in practice. In section 6.3.6, some details concerning why these two methodologies did not succeed are presented. From this point on, we focus on the approach that proved more reliable. This approach also focuses on forecasting the spread series, defined according to equation (2.11). In this case, it continuously monitors the percentage change between the spread at the current time and the predicted spread at the next time-step. When the absolute value of the predicted change is larger than a predefined threshold, a position is entered, on the expectation that the spread will suffer an abrupt movement from which the investor can benefit. We proceed to describe the workings of this methodology more formally. We define S_t and S^*_t as the true and predicted value of the spread at time t, respectively. The predicted change is then given by equation (4.2a). Having calculated the spread's predicted change, the position settlement conditions work as described in equation (4.2b), where α_L and α_S correspond to the long and short position trigger thresholds, respectively. Once a position is entered, it is held while the predicted spread direction persists and closed when it shifts.

\text{Predicted change:} \quad \Delta_{t+1} = \frac{S^*_{t+1} - S_t}{S_t} \times 100 \quad (4.2a)

\text{Market entry conditions:} \quad
\begin{cases}
\text{if } \Delta_{t+1} \ge \alpha_L, & \text{open long position} \\
\text{if } \Delta_{t+1} \le \alpha_S, & \text{open short position} \\
\text{otherwise}, & \text{remain outside the market}
\end{cases} \quad (4.2b)
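
The entry rule in (4.2) can be sketched as the following decision function. It is a simplified illustration: the threshold arguments are assumed to come from the selection procedure described next, and position management (exits, sizing, transaction costs) is omitted.

```python
def trading_signal(spread_now: float, spread_pred: float,
                   alpha_long: float, alpha_short: float) -> int:
    """Market entry decision following equation (4.2):
    +1 opens a long spread position, -1 a short one, 0 stays out."""
    delta = (spread_pred - spread_now) / spread_now * 100   # predicted % change
    if delta >= alpha_long:
        return 1
    if delta <= alpha_short:
        return -1
    return 0
```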

It is yet to be described how the thresholds (α_L, α_S) should be defined. A possible approach could consist of framing an optimization problem and trying to find the profit-maximizing values. However, this approach is rejected due to its risk of data-snooping and unnecessary added complexity. We propose a simpler, non-iterative, data-driven approach to define the thresholds. For each spread series, we start by obtaining f(x), the spread percentage change distribution during the formation period. The spread percentage change at time t is defined as

x_t = \frac{S_t - S_{t-1}}{S_{t-1}} \times 100.

From the distribution f(x), the sets of negative and positive percentage changes are considered separately, originating the distributions f^-(x) (illustrated in Figure 4.3(a)) and f^+(x) (illustrated in Figure 4.3(b)), respectively. Since the proposed model targets abrupt changes but also requires that they occur frequently enough, looking at the extreme quantiles seems adequate. As quantiles adapt to the spread volatility, the thresholds will ultimately be linked to the spread's properties. As Figure 4.3 illustrates, we recommend selecting the top decile and quintile1 from f^+(x) as candidates for defining α_L, and the bottom ones, from f^-(x), for defining α_S. Then, the quintile-based and decile-based thresholds are both tested in the validation set, and the best-performing combination is adopted. The reason why we also consider quintiles, and not just deciles, is that we fear the deciles might be too restrictive.

1A quintile is any of the four values that divide the sorted data into five equal parts, so that each part represents 1/5 of the sample or population. It takes any of the values Q_X(0.20), Q_X(0.40), ..., Q_X(0.80).

(a) Spread percentage change distribution between x(t) and x(t − 1), for negative values, f^-(x). (b) Spread percentage change distribution between x(t) and x(t − 1), for positive values, f^+(x).

Figure 4.3: Spread percentage change distributions.

The process just described can be presented more formally as

\{\alpha_S, \alpha_L\} = \underset{q}{\operatorname{argmax}}\; R^{val}(q), \quad q \in \left\{ \big(Q_{f^-(x)}(0.20),\, Q_{f^+(x)}(0.80)\big),\; \big(Q_{f^-(x)}(0.10),\, Q_{f^+(x)}(0.90)\big) \right\}. \quad (4.3)
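
A minimal sketch of the threshold selection in (4.3) is given below: the quintile- and decile-based candidate pairs are read off the formation-period percentage-change distribution, and the candidate achieving the best validation return would then be kept. The helper `validation_return` is hypothetical, standing in for the validation backtest.

```python
import numpy as np

def candidate_thresholds(formation_spread: np.ndarray):
    """Quintile- and decile-based (alpha_S, alpha_L) candidates from the
    formation-period spread percentage changes, as in equation (4.3)."""
    pct = np.diff(formation_spread) / formation_spread[:-1] * 100
    neg, pos = pct[pct < 0], pct[pct > 0]
    return [
        (np.quantile(neg, 0.20), np.quantile(pos, 0.80)),   # quintile-based
        (np.quantile(neg, 0.10), np.quantile(pos, 0.90)),   # decile-based
    ]

# The candidate with the best validation return would then be adopted, e.g.:
# alpha_s, alpha_l = max(candidate_thresholds(spread), key=validation_return)
```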

Figure 4.4 illustrates the application of this model. The coloured symbols are used to express the information provided by the forecasting algorithm. The triangles indicate that one of the thresholds has been triggered and the corresponding direction, whereas the squares only provide information concerning the predicted direction. Once again, for the sake of illustration, the forecasting has perfect accuracy, meaning the positions can be set in optimal conditions.

Figure 4.4: Proposed forecasting-based strategy.

As a final note, it is worth emphasizing that the methodology just proposed is fundamentally different from the benchmark trading strategy presented in section 2.3.1. Because the underlying motivation for entering a position is distinct in the two models, no relation is expected between the entry and exit points of the two strategies.

4.3 Model diagram

The proposed model can be applied in line with Figure 4.5. As depicted, the investor should start by training the forecasting algorithms to predict the spreads. Furthermore, the decile-based and quintile-based thresholds should be collected to integrate the trading model. Having fitted the forecasting algorithms

and obtained the two combinations for the thresholds (State 1), the model can be applied on the validation set. From the validation performance, the best threshold combination is selected (State 2). At this point, the model is finally ready to be applied on unseen data, from which the performance may be inferred (State 3).

Figure 4.5: Proposed model application diagram.

4.4 Time series forecasting

Motivated by the potential of applying a time series forecasting based trading strategy, we proceed to describe which alternatives may be more suitable for the task at hand. Different models have been applied in the literature for time series forecasting. These models can be divided into two major classes, parametric and non-parametric models.

Parametric models appeared as the earliest approach to time series prediction and assume that the underlying process has a particular structure, which can be described using a small number of parameters. In these approaches, the goal is to estimate the parameters of the model that best describe the stochastic process. However, these models' representation capacity has severe limitations [42], creating the need to search for a modelling approach with no structural assumptions.

Nonparametric modelling is especially attractive because it makes no assumptions about the underlying structure of the process. Artificial Neural Networks (ANN) are an example of non-parametric models. Several factors motivate their use. First, ANNs have been an object of attention in many different fields, which makes their application in this context an interesting case study. Furthermore, ANN-based models have shown very promising results in predicting financial time series data in general [1]. Finally, Machine Learning techniques are fairly unexplored in this field, as analyzed in section 2.4. This encourages their application in this work.

In conclusion, we propose the application of a benchmark parametric approach, along with other non-parametric models deemed relevant. Each of the proposed algorithms is detailed next.

4.5 Autoregressive Moving Average

As stated by Si and Yin [43], financial time series are inherently complex, highly noisy and nonlinear. These aspects naturally motivate the use of ANNs for forecasting, given their capability to address such conditions. However, the financial time series under analysis do not necessarily fit the characterization above. By construction, the spread series is stationary during the formation period. This is a requirement, given that a pair is only selected if both constituents are cointegrated, which consequently implies that its spread is stationary. Therefore, even though stationarity in the trading period cannot be guaranteed, because it corresponds to unseen data, the series is surely more likely to remain stationary. It is thus fair to ask whether a simpler model could perform this task just as effectively. Moreover, it would be interesting to determine whether the extra model complexity from neural networks is worthwhile. The application of an autoregressive moving-average (ARMA) model for forecasting is proposed. The use of an ARMA model also allows inferring to what extent the profitability of the strategy depends on the time series forecasting algorithm itself, or whether it is robust under different forecasting conditions.

The ARMA model was first described by Whittle [44]. This model describes a stationary stochastic process as the composition of two polynomials. The first polynomial, the autoregression (AR), regresses the variable at time t on its own lagged values. The second polynomial, the moving average (MA), models the prediction error as a linear combination of lagged error terms and the time series' expected value. We proceed to formally describe how each of the polynomials models a time series X_t. The AR(p) model is described as

X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \quad (4.4)

where p is the polynomial order, \varphi_1, \ldots, \varphi_p are the model parameters, c is a constant, and \varepsilon_t is a random variable representing white noise. The MA(q) model is described as

X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \quad (4.5)

where q is the polynomial order, \theta_1, \ldots, \theta_q represent the model parameters and \mu represents the mean value of X_t. The variables \varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q} correspond to white noise error terms at the corresponding time instants. Finally, the ARMA(p, q) model results from the combination of (4.4) and (4.5), and can be represented as

X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \quad (4.6)
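
For illustration, the ARMA benchmark could be fitted with statsmodels as sketched below; the order (p, q) = (2, 2) and the synthetic placeholder series are assumptions made so the sketch runs on its own, not the configuration adopted in this work.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# `spread_train` stands in for the formation-period spread; a synthetic
# stationary series is used here as a placeholder.
rng = np.random.default_rng(0)
spread_train = rng.normal(size=1000)

# An ARMA(p, q) model corresponds to an ARIMA(p, 0, q) specification.
arma = ARIMA(spread_train, order=(2, 0, 2)).fit()
next_spread = arma.forecast(steps=1)      # one-step-ahead spread prediction
print(float(next_spread[0]))
```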

4.6 Artificial Neural Network models

Although the simplicity of the ARMA model just described is advantageous from a practical point of view, it might also be interpreted as a limitation for learning more complicated data patterns. Artificial Neural Networks have proved particularly effective in this regard. From the vast number of existing ANN configurations, we proceed to select the two architectures deemed most appropriate for this work.

4.6.1 Long Short-Term Memory

One configuration considered suitable is the Long Short-Term Memory (LSTM). The LSTM is a type of Recurrent Neural Network (RNN); therefore, RNNs should be described first. Unlike standard feedforward neural networks (FNN), RNNs can process not only single data points but also entire sequences of data [45]. This means that while in a traditional FNN inputs are assumed to be independent of each other, in the case of an RNN a sequential dependency is assumed to exist among inputs, and previous states might affect the decision of the neural network at a different point in time, as represented in Figure 4.6. This is achieved with inner loops, allowing RNNs to keep a memory of their previous computations.

Figure 4.6: Recurrent Neural Network structure.

In Figure 4.6, W_i and b_i represent the neural network's weights and bias terms, respectively. The subscript i can take the value x for the input, h for the hidden layer and o for the output. The hidden layer's RNN cell, represented by the grey rectangle, is a block which repeats (unfolds) along the time dimension. This block receives an input and, together with the previous state, computes the next state and the output at the current time step. The two cell outputs can be described as

h_t = \sigma\left(W_h h_{t-1} + W_x x_t + b_h\right), \qquad o_t = \sigma\left(W_o h_t + b_o\right) \quad (4.7)

where σ represents the layer's activation function. A problem arising with the RNN architecture just presented is the difficulty in propagating the prediction error. To understand why, we first describe the standard backpropagation procedure in regular FNNs. In that case, backpropagation moves backwards from the final error through the outputs, weights, and inputs of each hidden layer, holding those weights accountable for a portion of the error by calculating their partial derivatives \partial E / \partial w_i, where E represents the error function. Those derivatives are then used to adjust the weights in whichever direction decreases the error. To train RNN models, the same process applies, by representing the network in its unfolded version, as in Figure 4.6. However, when propagating the error along the entire network's time dimension, the gradient might vanish or explode, depending on whether the derivative of the state update function decreases or increases the backpropagation error too drastically. In the first case, the learning process takes too long to complete, whereas the second case leads to numerical issues.

The problem described is partly mitigated by LSTMs [46], which selectively update the internal state as a function of the current state and the input, through a gating mechanism. The LSTM cell structure is represented in Figure 4.7.

Figure 4.7: Long Short-Term Memory cell. Source: [47].

In this topology, the internal state and the current input define the flow of information through the input, output and forget gates. The input gate defines the influence of the new input on the internal state. The forget gate defines which parts of the internal state are retained over time. Finally, the output gate defines which part of the internal state affects the output of the network. Formally, the equations describing the LSTM cell can be summarized as

\begin{aligned}
f_t &= \sigma\left(W_x^f x_t + W_h^f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_x^i x_t + W_h^i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_x^o x_t + W_h^o h_{t-1} + b_o\right) \\
c_t &= f_t \cdot c_{t-1} + i_t \cdot \sigma\left(W_x^c x_t + W_h^c h_{t-1} + b_c\right) \\
h_t &= o_t \cdot \sigma\left(c_t\right)
\end{aligned} \quad (4.8)

where f_t, i_t and o_t represent the forget, input and output gates, respectively. By selectively accepting new information and erasing past information, this topology is able to maintain the error flow across the network.
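
A minimal Keras sketch of an LSTM forecaster for one-step-ahead spread prediction is shown below. The window length, number of units and optimizer are placeholder choices made for the example; the configurations actually tuned in this work may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(seq_len: int = 24, n_units: int = 50) -> keras.Model:
    """One-step-ahead spread forecaster: a single LSTM layer and a linear output."""
    model = keras.Sequential([
        layers.Input(shape=(seq_len, 1)),   # window of `seq_len` past spread values
        layers.LSTM(n_units),
        layers.Dense(1),                    # predicted spread at the next time step
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Usage sketch: X has shape (samples, seq_len, 1) and y holds the next spread values.
# model = build_lstm(); model.fit(X, y, epochs=50, batch_size=256, validation_split=0.1)
```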

4.6.2 LSTM Encoder-Decoder

From a trading perspective, it might be beneficial to use information regarding the prediction not just of the next time instant, but also of later time steps. To accomplish that, one might use the data collected

up to time t to directly predict the next N time steps (t + 1, . . . , t + N) at once. An improvement may consist of using not just the data collected up to time t, but also benefiting from the newly predicted data. For instance, to predict the time series value at time t + 2, one would use the data collected up to t with the addition of the prediction for t + 1.

An LSTM Encoder-Decoder architecture is naturally fitted to such scenarios. This architecture is composed of two groups of LSTMs: one for reading the input sequence and encoding it into a fixed-length vector, the encoder, and a second for decoding the fixed-length vector and outputting the predicted sequence, the decoder. The two networks are trained jointly to minimize the forecasting loss with respect to a target sequence given the source sequence. The architecture of an LSTM-based Encoder-Decoder can be observed in Figure 4.8. A sequence x_1, . . . , x_t is fed sequentially to the Encoder LSTM, which aggregates the relevant information in the encoder vector. Then, this information is used to forecast the series for the next N time-steps. Naturally, it is expected that the error increases with N.

Figure 4.8: LSTM Encoder-Decoder.

In this multi-step forecasting scenario, the trading rules defined in section 4.2 need to be updated. The predicted change should be calculated N steps ahead. Likewise, the thresholds α_L and α_S should be calculated with respect to the distribution of the percentage change between x(t) and x(t − N).
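
The sketch below shows one simple way to realize an LSTM Encoder-Decoder in Keras, using a RepeatVector bridge between the encoder and the decoder; this particular construction is an assumption made for illustration, and the sequence length, horizon N and layer sizes are placeholders.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_encoder_decoder(seq_len: int = 24, horizon: int = 5,
                          n_units: int = 50) -> keras.Model:
    """Encode `seq_len` past spread values and decode the next `horizon` steps."""
    model = keras.Sequential([
        layers.Input(shape=(seq_len, 1)),
        layers.LSTM(n_units),                          # encoder: fixed-length summary
        layers.RepeatVector(horizon),                  # feed the summary to each decoding step
        layers.LSTM(n_units, return_sequences=True),   # decoder
        layers.TimeDistributed(layers.Dense(1)),       # one prediction per future step
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```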

4.7 Artificial Neural Networks design

The Artificial Neural Network models demand a laborious design process. In this section, we highlight some techniques adopted, which we consider especially relevant to describe.

4.7.1 Hyperparameter optimization

One of the disadvantages often pointed out regarding Deep Learning models is the cumbersome process associated with optimizing the network's configuration. Tuning the hyperparameters can have a tremendous influence on the results, but it is also very time-consuming. Thus, a compromise must be reached.

Due to the limited computational resources, the tuning of the LSTM models is constrained to the set of most relevant variables: (i) sequence length, (ii) number of hidden layers, and (iii) number of nodes in each hidden layer. We experiment with increasingly complex configurations until signs of overfitting are detected.

4.7.2 Weight initialization

Neural network weights are very often initialized randomly. With this initialization, it may take a fair amount of iterations to converge to the optimal weights. Moreover, this kind of initialization is prone to vanishing or exploding gradient problems. One way to reduce this problem is to select the random weight initialization carefully. In this work, we adopt the Glorot weight initialization, proposed by Glorot and Bengio [48]. The authors introduce a weight initialization scheme which factors the size of the network (number of input and output neurons) into the equation. The proposed methodology not only reduces the chances of running into gradient problems but also brings substantially faster convergence.

This initialization, also called Xavier initialization, suggests sampling the layer weights in a way that preserves the input variance, such that the variance remains constant as the information passes through each layer of the neural network. We proceed to describe how this can be accomplished. First, the variance is given by

Y = \sum_{N_{in}} W_i X_i \iff \operatorname{Var}(Y) = \operatorname{Var}\left(\sum_{N_{in}} W_i X_i\right), \quad (4.9)

where N_{in} represents the number of input nodes. If the input variables are assumed to be uncorrelated, we may consider

\operatorname{Var}\left(\sum_{N_{in}} W_i X_i\right) = \sum_{N_{in}} \operatorname{Var}\left(W_i X_i\right). \quad (4.10)

Finally, assuming the inputs and weights are uncorrelated and that the variances of both inputs and weights are identically distributed, (4.10) further simplifies according to

\sum_{N_{in}} \operatorname{Var}\left(W_i X_i\right) = \sum_{N_{in}} \operatorname{Var}\left(W_i\right) \operatorname{Var}\left(X_i\right) = N_{in} \left(\operatorname{Var}\left(X_i\right) \operatorname{Var}\left(W_i\right)\right). \quad (4.11)

Thus, to preserve the input variance, \operatorname{Var}(X_i), the weights' variance should be given by \frac{1}{N_{in}}. To accomplish this, a Gaussian sampling distribution with zero mean and a variance given by \frac{2}{N_{in}+N_{out}} is proposed. This corresponds to the harmonic mean between the forward propagation pass variance, \frac{1}{N_{in}}, and the backpropagation pass variance, \frac{1}{N_{out}}.
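
In Keras, this initialization is available out of the box, as sketched below; the layer size and seed are arbitrary illustrative values.

```python
from tensorflow.keras import layers, initializers

# Glorot (Xavier) normal initialization: zero-mean Gaussian with
# variance 2 / (N_in + N_out) for the layer's kernel.
dense = layers.Dense(
    units=50,
    kernel_initializer=initializers.GlorotNormal(seed=42),
)
# Keras actually defaults to GlorotUniform for Dense and LSTM kernels,
# so the Glorot scheme is applied even when not specified explicitly.
```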

4.7.3 Regularization techniques

The functions that define neural network topologies are generally non-convex [45]. Therefore, the training algorithm does not give us a formal guarantee of convergence to the optimal solution. The imposition of regularization terms in the optimization process helps to tackle this issue. Regularization refers to the introduction of additional constraints, potentially derived from problem-specific information, to solve an

ill-posed problem or avoid overfitting. More specifically, it helps to reduce the generalization error by making the gap between the errors observed on the training set and the validation set less evident. At the same time, by introducing additional information, the convergence rate can increase [45]. We proceed to describe two regularization techniques adopted in this work: early stopping and dropout.

Early stopping is likely the most commonly used form of regularization in Deep Learning, due both to its effectiveness and its simplicity [45]. The idea behind early stopping is as follows. When training large models with sufficient representational capacity to overfit, it is often visible that the training error decreases steadily over time, while the validation set error begins to rise again at some point. This behaviour is illustrated in Figure 4.9, in which the described pattern is particularly evident.

Figure 4.9: Early Stopping illustration.

By accounting for this pattern, a model with a better validation set error (and hopefully a better test set error) can be obtained. To achieve that, we should return to the parameter setting with the lowest validation error, identified by the black triangle in Figure 4.9. This can be accomplished by keeping a copy in memory of the model parameters that have so far generated the best validation error. When the training terminates, the algorithm returns the stored parameters, rather than the most recent ones. This process can be made more efficient if the system terminates the training procedure after detecting no improvement in the validation loss within p iterations after the best parameterization, rather than having to wait for the training to finish. The parameter p is usually called patience.

The second regularization technique adopted is dropout. Its fundamental idea is based on the premise that the risk of overfitting may be reduced by fitting different neural networks to the same dataset and using the average prediction of all networks as the final prediction. The view is that noise learned by a specific model architecture is attenuated when averaged over all models. However, the process of training multiple models can be very challenging, especially when dealing with such complex models. Dropout emerges as a more practical way to emulate this. It approximates the training of a large number of neural networks with different architectures in parallel, while only training one neural network. This is accomplished by randomly ignoring some layer nodes during the training period, which has the effect of treating the layers as if they had a different number of nodes and connectivity to the prior layer. Hence, each update during training is performed with a different perspective of the configured layer.
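
Both techniques are readily available in Keras, as sketched below; the dropout rate, patience and layer sizes are illustrative values only, and `X` and `y` stand for the training windows and targets prepared earlier.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(24, 1)),
    layers.LSTM(50),
    layers.Dropout(0.2),          # randomly ignore 20% of the units while training
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping: halt when the validation loss has not improved for `patience`
# epochs and restore the weights that achieved the best validation error.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                            restore_best_weights=True)

# model.fit(X, y, epochs=200, validation_split=0.1, callbacks=[early_stop])
```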

Figure 4.10 represents on the left side the standard neural network configuration and on the right side

one configuration that may be encountered during the training process.

(a) Standard Neural Network (b) Neural Network with dropout.

Figure 4.10: Application of dropout. Source:[49].

4.8 Conclusion

In this chapter, we started by emphasizing the motivation for exploring a new trading strategy. We advanced with the description of the proposed trading model. Next, we explored the potential forecasting algorithms to integrate into it; namely, we stated that the application of an ARMA model, a regular LSTM neural network and an LSTM Encoder-Decoder seem adequate in this context. In addition, we discussed some technical aspects concerning the design of the ANN-based models. Table 4.1 compiles some key points concerning the proposed trading model.

Table 4.1: Key points from the proposed trading model.

Chapter 5

Implementation

So far, we discussed the state-of-the-art in Pairs Trading (Chapter 2), and proposed a new framework for selecting pairs (Chapter 3) and a new trading model (Chapter 4). This chapter discusses some specific research implementation aspects relevant to testing the two recommended methods. It informs the reader about the employed dataset, the test setup for each research stage and the constructed trading environment.

5.1 Research design

In this work, we propose a methodology for each of the phases composing a Pairs Trading strategy: the pairs selection phase (Research Stage 1) and the trading phase (Research Stage 2). Thus, we design a dedicated research setup for each, as illustrated in Figure 5.1.

Figure 5.1: Research design overview.

One convenient research scheme consists of responding to this work's first research question, and then applying the optimal pairs clustering technique just found to answer the second research question. This is represented by the greyed arrow in between. The coloured blocks intend to facilitate the visualization of the Pairs Trading phase being tested.

5.2 Dataset

In this section, we present the dataset suggested for simulating the proposed implementation. We start by describing which type of securities we find appropriate to explore. Next, we describe the resulting dataset and how it is prepared for application. Finally, we detail how the data is partitioned in several periods, to be used in different experiments.

5.2.1 Exchange-Traded Funds

Exchange-Traded Funds (ETFs) are considered an interesting type of securities to explore in this work. An ETF is a security that tracks an index, a commodity or a basket of assets just like an index fund, but trades like a stock. According to Chan [9], the one advantage of trading ETF pairs instead of stock pairs is that once found to be cointegrated, ETF pairs are less likely to fall apart in out-of-sample data. This is because the fundamental economics of a basket of stocks changes much slower than that of a single stock. This robustness is very appealing in the context of Pairs Trading, in which the fundamental premise is that securities whose prices are cointegrated today should remain cointegrated in the future.

To further support the hypothesis that ETFs are convenient in a Pairs Trading setting, the works of Chen et al. [19] and Perlin [50] are important references. To better comprehend the authors' findings, some nomenclature should be introduced first. A strategy in which both constituents of the pair are single securities is called univariate, whereas if one side of the pair is formed by a weighted average of comoving securities, the strategy is named quasi-multivariate. Reasonably, when both sides of the spread are formed by a weighted average of a group of securities, the strategy is said to be multivariate, as represented in Figure 5.2.

(a) Univariate strategy (b) Quasi-multivariate strategy (c) Multivariate strategy

Figure 5.2: Exemplification of Univariate, Quasi-Multivariate and Multivariate strategies.

Proceeding with the analysis of the aforementioned work, [19] and [50] attempt to produce more robust mean-reverting time series by using a linear combination of stocks to form each leg of the spread, on the prospect that one outlier stock will not affect the spread so drastically. Specifically, Chen et al. [19] apply a Pairs Trading strategy based on Pearson correlation both in a univariate and a quasi-multivariate setup. In the former, the two most correlated stocks form a pair. In the latter, the pair is formed by a stock and an equal-weighted portfolio of the 50 most correlated stocks. Results show that the quasi-multivariate trading strategy obtains a better performance, whereas reducing the number of stocks in the weighted portfolio causes the returns to drop by up to two thirds. Perlin [50] also evaluates the efficiency of a univariate strategy against a quasi-multivariate strategy, for the 57 most liquid stocks in the Brazilian market. In the univariate scenario, the two most correlated stocks are identified as a pair, whereas in the quasi-multivariate scenario, one stock is paired with five partner stocks. The results obtained show that the quasi-multivariate strategy outperforms the univariate strategy, once again. Nevertheless, both studies mentioned in this paragraph see their results affected by the high transaction costs of purchasing multiple securities, which discourages the application of a multivariate strategy. In this work, we suggest using ETFs as a proxy to obtain the benefits of multivariate strategies more practically. The reasoning is that an ETF is fundamentally a weighted average of a set of securities, and thus trading ETFs may be seen as a more practical way of running a multivariate Pairs Trading strategy.

On top of the motivations described, the universe of ETFs has one extremely interesting particularity. The rising popularity of ETFs motivates the creation of new ETFs every year that just marginally differ from existing ones. These price series naturally form good potential pairs.

Finally, few research studies explore the application of Pairs Trading in the ETF universe, with some exceptions [34, 51], which makes it even more appealing to try the suggested approaches using ETFs.

5.2.2 Data description

This work focuses on a subset of ETFs that track single commodities, commodity-linked indexes or companies focused on exploring a commodity. Constraining the ETFs to this subset reduces the number of possible pairs, making the strategy computationally faster and leaving space for a careful analysis of the selected pairs. More specifically, the set of ETFs considered is composed only of commodity-linked ETFs that are available for trading in January 2019. To find this information, the data source [52] proved useful.

A total of 208 different ETFs verify the conditions just described. This universe of commodity-linked ETFs can be categorized according to the type of commodities being tracked. We propose five different categories, as shown in Table 5.1. Appendix A includes a table describing all the eligible ETFs.

It should be noted that by considering just the ETFs active throughout an entire period, survivorship bias is induced. By neglecting delisted ETFs, possible positions that would have had to be left abruptly are not being accounted for in the results. This is because no survivorship free dataset containing the

intended ETFs was found. To attenuate this impact, recent periods are considered whenever possible.

Table 5.1: Commodity categories considered in the dataset.

Having selected the universe of ETFs to consider for trading, we proceeded to retrieve the price series for each ETF. We considered price series data with 5-minute frequency. The motivation for using intraday data is three-fold. First, with finer granularity, the entry and exit points can be defined with more precision, leaving space for higher profit margins. Secondly, we may detect intraday mean-reversion patterns that could not be found otherwise. Lastly, it provides more data samples, allowing complex forecasting models to be trained with less risk of overfitting.

5.2.3 Data preparation

Before delving into the search for promising pairs, some data cleaning steps are implemented, as described in Figure 5.3.

Figure 5.3: Data preprocessing steps.

First, ETFs with any missing values are directly discarded. Then, we remove ETFs that do not verify the minimum liquidity requisites, to ensure the considered transaction costs associated with bid-ask spread are realistic1. The minimum liquidity requisites follow the criterion adopted in [17, 53], which discards all ETFs not traded during at least one day.

At last, some price series contain occasional outliers. In the literature, several techniques have been proposed to tackle the issue of outlier detection. Methodologies such as cluster analysis based techniques, Hidden Markov Models or other Machine Learning techniques could be considered for this purpose. Nonetheless, because these events are infrequent in the considered dataset, a simpler technique is proposed. For each price series, the return series is calculated. Then, every point for which

1Trading illiquid ETFs would result in higher bid-ask spreads which could dramatically impact the profit margins.

the percentage change is higher than 10%, which is extremely unlikely to occur in just 5 minutes, is considered an outlier. These situations are analyzed one by one. In case there is a single outlier in the time series of an ETF, this point is corrected manually by analyzing other data sources. When more outliers are present, the ETF is discarded.
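
A minimal pandas sketch of this rule is shown below; the function name and the example panel `prices_df` are assumptions, while the 10% threshold mirrors the description above.

```python
import pandas as pd

def flag_return_outliers(prices: pd.Series, threshold: float = 0.10) -> pd.Series:
    """Flag 5-minute bars whose absolute percentage change exceeds `threshold`.

    In the preprocessing described above, an ETF with a single flagged point
    would be corrected manually, and discarded if more points are flagged.
    """
    return prices.pct_change().abs() > threshold

# Usage sketch on a price panel `prices_df` (one column per ETF):
# outlier_counts = {etf: int(flag_return_outliers(prices_df[etf]).sum())
#                   for etf in prices_df.columns}
```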

Additionally, when training the forecasting-based models, the data is standardized according to

\bar{S}_t = \frac{S_t - \mu_s}{\sigma_s}, \quad (5.1)

where \bar{S}_t represents the result of standardizing the spread S_t. The resulting series has zero mean and unit variance. Standardizing the inputs can make the training of the neural networks faster and reduce the chances of getting stuck in local optima.

5.2.4 Data partition

For each trading simulation, the data must be partitioned into two periods: the formation period and the trading period. The formation period simulates the data available to the investor before engaging in any trade, and its role is three-fold. First, this period is used to find the most appealing candidate pairs. Secondly, a smaller portion of the data, the validation set, is used to simulate how the strategy would have performed in recent times. Thirdly, and only applicable to the forecasting-based models, a portion of the formation period data, the training set, is used to train the forecasting algorithm2. The trading period is used to simulate how the implemented trading model would perform on future unseen data. Figure 5.4 illustrates how these periods are arranged.

Figure 5.4: Period decomposition.

It is enticing to consider several partitions of the dataset, since analyzing the trading results for different conditions provides more confidence in the statistical significance of the results obtained. The typical procedure in classical Machine Learning problems is to perform k-fold cross-validation. However, time series (or other intrinsically ordered data) can be problematic for cross-validation, as one wants to avoid the look-ahead bias that may arise from training the model with data not available at a given historical date. To tackle this issue, a moving training and test window are used sequentially. Forward chaining

2To be more precise, the training set is used to train the ANN-based models, whereas the ARMA model is trained on the entire formation period, as described in section 6.3.2.

could also be applied, but in that case, the training and test periods' size is not fixed. Hence, the results can be affected by the varying data size.
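To make the partitioning scheme concrete, the sketch below generates fixed-size moving formation/trading windows; the function name, the pandas-based interface and the default window lengths are illustrative assumptions rather than the exact routine used in this work.

import pandas as pd

def sliding_windows(index: pd.DatetimeIndex, formation_years: int = 3, trading_years: int = 1):
    # Yield (formation, trading) index slices of fixed size, shifted forward in time.
    # The trading window always starts where the formation window ends, so no
    # future information leaks into the formation data.
    start = index.min()
    while True:
        formation_end = start + pd.DateOffset(years=formation_years)
        trading_end = formation_end + pd.DateOffset(years=trading_years)
        if trading_end > index.max():
            break
        yield (index[(index >= start) & (index < formation_end)],
               index[(index >= formation_end) & (index < trading_end)])
        start = start + pd.DateOffset(years=trading_years)  # slide the whole window forward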

The periods considered for simulating each research stage are illustrated in Figure 5.5. There are essentially two possible configurations: (i) the 3-year-long formation periods, and (ii) the 9-year-long formation period. In both cases, the second-to-last year is used for validating the performance, before running the strategy on the test set. We define a 1-year-long trading period, based on the findings of [18], which claim that profitability can be increased if the initial 6-month trading period proposed in [17] is extended to 1 year.

Figure 5.5: Data partition periods.

The 3-year-long formation period is adopted when using the threshold-based trading model. Although it is slightly longer than what is commonly found in the literature3, we decide to proceed on the basis that a longer period may identify more robust pairs. The 9-year-long formation period is used for simulating the forecasting-based trading model. In this case, more formation data is required to fit the forecasting algorithms. The first 8 years are used for training them, as indicated in Figure 5.5. In this case, the application of several test periods is not conceivable due to the computational burden of training the associated forecasting algorithms. The number of neural networks requiring training when using T periods can be calculated as

\sum_{i=1}^{T} M_i \times N_i, (5.2)

where M_i is the number of pairs found in the i-th formation period, and N_i is the number of model configurations to be tested in the i-th period (e.g. RNN, Encoder-Decoder, etc.). Therefore, studying several periods consumes a considerable amount of extra time. Because we want to pay special attention to the configuration of each model, we find it more prudent to use a single formation/test period4, and utilize the available time to optimize the Deep Learning architectures and corresponding parameters meticulously.

3[17, 18, 25] use a 1-year-long formation period. [7] makes use of a 3-month formation period. 4Although it is not mentioned explicitly, we suspect the computational effort is also one motivation that led [15, 23] to consider a single displacement for training, validation, and test sets.

One question that naturally arises is whether the threshold-based model used to compare with the forecasting-based model, in research stage 2 (see Figure 5.1), should be tested under the same period (2009-2018) or in the conditions previously applied (3-year-long formation period). We suggest conducting both scenarios to distinguish which part of the results is explained by the trading model itself and which part is affected by the period duration. For this reason, the period ranging from 2015 to 2019 is selected to test the benchmark model.

Table 5.2 summarizes the data partitions described in this section. The majority of ETFs found are relatively recent. Consequently, when longer periods are considered, the number of active ETFs drops significantly.

Table 5.2: Dataset partitions.

5.3 Research Stage 1

This section describes how we propose to evaluate the pairs selection framework introduced as part of research stage 1. We suggest comparing the three pairs’ search methods mentioned in this work: (i) searching for all combinations of pairs, (ii) searching for pairs within the same category and (iii) searching for pairs within clusters generated using OPTICS. Therefore, we proceed to detail the necessary steps to build this comparison setup.

5.3.1 Development of the pairs selection techniques

To compare the three pairs' search techniques, each approach is built from scratch. We also develop all the pairs selection rules proposed in section 3.5. For information concerning the code, the reader may refer to Appendix F.

We find it relevant to emphasize that grouping by category requires establishing a mapping between every ETF and the corresponding group. To accomplish this, we start by retrieving the available information regarding the segment of each ETF, found in [52]. The segment represents the precise market to which the fund provides exposure. From this information, a second mapping is constructed containing the correspondence between every segment and one of the five categories described in Table 5.1.

5.3.2 Trading setup

Since the main goal of this research stage is to compare the results from the pairs selection techniques relative to each other, we do not worry about optimizing the trading model. Therefore, the standard threshold-based model described in section 2.3.1 is applied, with the parameters specified in Table 5.3.

Table 5.3: Threshold-based model parameters.

The model parameters used are the ones suggested by Gatev et al. [17], which serve as a reference for several studies in the field. The spread's standard deviation, σs, and mean, µs, are calculated with respect to the entire formation period. The exit threshold is defined as the mean value, meaning that a position is closed only when the spread reverts to its mean5. We are aware that there exists some research work which explicitly explores the optimization of the trading thresholds, such as [54, 55]. Also, some authors, such as Dunis et al. [7], explore the application of sliding windows to constantly update µs and σs. Nevertheless, we stick to this framework for simplicity.
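The sketch below illustrates the mechanics of such a threshold-based model, assuming the common two-standard-deviation entry threshold; the actual thresholds used are those listed in Table 5.3, and the function shown is an illustrative simplification rather than the exact implementation.

import pandas as pd

def threshold_signals(spread: pd.Series, mu: float, sigma: float, entry: float = 2.0) -> pd.Series:
    # Position held in the spread at each time step: -1 (short), 0 (flat) or +1 (long).
    # A short position is opened when the spread rises above mu + entry*sigma,
    # a long position when it falls below mu - entry*sigma, and any open
    # position is closed once the spread reverts to its mean.
    position, positions = 0, []
    for value in spread:
        if position == 0:
            if value > mu + entry * sigma:
                position = -1
            elif value < mu - entry * sigma:
                position = 1
        elif (position == -1 and value <= mu) or (position == 1 and value >= mu):
            position = 0
        positions.append(position)
    return pd.Series(positions, index=spread.index)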

5.3.3 Test portfolios

We implement three different test portfolios resembling probable trading scenarios. Portfolio 1 considers all the pairs identified in the formation period. Portfolio 2 takes advantage of the feedback collected from running the strategy in the validation set by selecting only the pairs that had a positive outcome. Lastly, Portfolio 3 corresponds to the situation in which the investor is constrained to invest in a fixed number of k pairs. In such a case, we suggest selecting the top-k pairs according to the returns obtained in the

5 Note that the parameters are simplified when working with the standardized spread, because in that case µs=0 and σs=1.

validation set. We consider k = 10, as it stands in between the choices of [17], which uses k = 5 and k = 20. By testing different portfolio constructions, we may not only find the optimal clustering procedure but also evaluate its best condition of application. The three portfolios are represented in Figure 5.6.

Figure 5.6: Test portfolios.

It is worth emphasizing that even though selecting the pairs based on the performance on the validation set seems appropriate, there is no guarantee that the best performing pairs in the validation set will follow the same behaviour in the test set. The results will dictate the competence of this heuristic.

5.4 Research Stage 2

In this section, we describe the proposed implementation to gauge the potential of the forecasting-based trading model introduced, as part of research stage 2.

5.4.1 Building the forecasting algorithms

To compare the robustness provided by the standard threshold-based model with the proposed forecasting-based model, simulated using the ARMA model, an LSTM and an LSTM Encoder-Decoder, the three models are built from scratch. For more details concerning the implementation of these algorithms, the reader may refer to Appendix F. Due to the limited available resources and the complexity of training these models, there are severe constraints on the number of combinations we can experiment with. For this reason, the architecture of each model is tuned in a manual iterative process based on the feedback from the results obtained. The training is stopped when the results evidence signs of overfitting. Concerning the LSTM Encoder-Decoder architecture, we propose that it outputs two time-steps at a time. Forecasting the time series two steps in advance seems appropriate, as the precision is expected to deteriorate for more distant values.

A naive baseline defining a lower bound objective is set to support the training process. The persistent forecasting algorithm is selected for this purpose. It uses the value at the previous time step to predict the expected outcome at the next time step, according to

\text{Naive}: \; Y_{t+1} = Y_t. (5.3)

5.4.2 Test conditions

Comparing the forecasting-based models using the three different clustering techniques would be very costly. For this reason, as illustrated in Figure 5.1, we consider the application of the clustering methodology that proves most appealing. As for the test portfolios, we consider the option of applying the strategy exclusively to the pairs that obtained positive results during the validation period (Portfolio 2 in Figure 5.6). Relying on selecting the top 10 pairs also seemed an adequate choice. However, as we will see, it might be the case that the selected pairs are fewer than ten.

Figure 5.7 summarizes how the proposed methodology will be set side by side with the threshold-based model in the simulation covering the period of January 2009 to December 2018.

Figure 5.7: Trading models interaction diagram.

5.5 Trading simulation

In this section, we describe the details regarding the trading simulation. We go over the aspects considered in the construction of the portfolio, the transaction costs, and the position settlement conditions.

5.5.1 Portfolio construction

We start by noting that the portfolios studied in this work may have varying sizes. Nevertheless, we establish that all pairs should be equally weighted in the portfolio. This way, the portfolio returns can be obtained by simply averaging the performance of all pairs, with no need to be concerned about relative proportions of the initial investment.

A natural question that follows is how the capital is allocated for each pair. In theory, if the same value is being held in the long and short position of a pair, the capital earned from the short position can entirely

cover the cost of entering the long position, and hence no initial financing is needed6. This is called a self-financing portfolio. Even though this assumption is reasonable in theory, in practice it does not apply because collateral7 is always required for borrowing the security being shorted, making a zero initial investment unfeasible. Therefore, the necessary investment corresponds to the collateral, which we assume to be the value of the security being shorted. This amount allows the investor to enter a long position with that same value. By financing long positions with proceeds from short positions, the investor is leveraging their investment. This leveraging approach is typical of hedge funds seeking to increase the absolute return of the portfolio, and for that reason it is considered in this work.

To simplify the calculations, it is particularly useful to consider a one-dollar investment in each pair. This approach is undertaken by most authors in the field ([7, 16, 17, 25, 34]), thus making it more suitable for comparison purposes as well. It is worth noticing that this approximation relies on the assumption that an investor can buy fractions of ETF shares, as the security itself is most probably not valued at exactly one dollar. Nonetheless, in this work, we adopt this assumption, backed by the fact that the investor may always find the least common multiple between the two security prices that allows them to take an equal-sized position in both ETFs.

Contrary to some research work that intends to be dollar-neutral, such as Gatev et al. [17] and Dunis et al. [7], which invest $1 in the long position and $1 in the short position, the methodology followed in this work respects the cointegration ratio β between the two securities. Therefore, we propose that the value invested in X is β times the value invested in Y. In these conditions, to fix a $1 initial investment, it needs to be assured that neither the long nor the short position costs more than $1. Formally, the condition imposed is that max(leg1, leg2) = $1, where leg1 and leg2 represent the capital invested in each leg of the pair. On this basis, we construct the framework illustrated in Figure 5.8.

Figure 5.8: Market position definition.
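A minimal sketch of this sizing rule follows; it only captures the scaling constraint, while the assignment of each security to a leg and the handling of the sign of β follow Figure 5.8 and are simplified here.

def leg_sizes(beta: float) -> tuple:
    # Capital allocated to each leg so that the larger leg is exactly $1.
    # The position in X is beta times the position in Y, following the
    # cointegration ratio.
    value_y = 1.0
    value_x = abs(beta) * value_y
    scale = 1.0 / max(value_x, value_y)   # enforce max(leg1, leg2) = $1
    return value_x * scale, value_y * scale

# Example: for beta = 1.8, the X leg is worth $1.00 and the Y leg about $0.56.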

As the trading progresses, we consider that all the capital earned by a pair in the trading period is reinvested in the next trade. For instance, if the first time a pair trades it has a return of 5%, the second

6Assuming the capital from the security being shorted is earned immediately after shorting. 7Collateral is an asset that a lender accepts as security for a loan. If the borrower defaults, the lender can seize the collateral and resell it to recoup the losses.

time a trade is opened for that pair, the initial capital will be $1.05, and not $1 as before. This mechanism facilitates the calculation of the final return.

5.5.2 Transaction costs

We ensure that all the results presented in this work account for transaction costs. The transaction costs considered are based on estimates from Do and Faff [53]. The authors perform an in-depth study on the impact of transaction costs in Pairs Trading, providing conservative estimates of all the inherent costs. These estimates are also adopted by other authors in the field, such as Huck and Afawubo [21]. The costs considered can be divided into three components, as described in Table 5.48. The charge is described relative to the position size.

Table 5.4: Transaction costs considered.

Commission and market impact costs must be adapted to account for both assets in the pair. The costs associated with one transaction are calculated as represented in Figure 5.9.

Figure 5.9: Calculation of transaction costs.

8Basis point (bps) refers to a standard unit of measure in finance, 1 bps = 0.01%.

It should be noted that these costs are estimated by Do and Faff [53] between 2009 and 2012, and their work indicates the trend is clearly favourable, as transaction costs tend to decrease in more recent years. Thus, if the results are robust to the considered transaction costs, at the time of writing they would only improve.
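The sketch below shows how such a per-transaction charge can be applied to both legs of a pair; the bps arguments are placeholders to be filled with the estimates from Table 5.4, and the short-selling rental fee, which accrues over time rather than per transaction, is deliberately left out.

def transaction_cost(position_size: float, commission_bps: float,
                     market_impact_bps: float, legs: int = 2) -> float:
    # Cost of one transaction for a pair, charged relative to the position size.
    # Commission and market impact are applied to both legs, as in Figure 5.9.
    per_leg_rate = (commission_bps + market_impact_bps) / 10_000.0  # 1 bps = 0.01%
    return position_size * per_leg_rate * legs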

5.5.3 Entry and exit points

One important subtlety regarding the trading execution is related to the prices considered when entering a position. In both the threshold-based and forecasting-based models, the system continuously monitors the prices to verify if a given threshold is triggered, in which case, a position is set. But since a trading system can not act instantaneously, there might be a small deviation in the entry price inherent to the delay of entering a position. It should be discussed how this situation is handled in the proposed simulation environment.

In line with [7, 17, 18], we assume a conservative delay of one period to enter a position. Hence, if a position is triggered at t = i, we only enter the position at time t = i + 5min (recall the dataset considered has a frequency of 5 minutes). This way, we ensure the strategy is viable in practice. If it proves profitable under these conditions, then the results could only improve considering a shorter delay than five minutes, which is easily conceivable in practice.

Concerning the definition of exit points, this work does not include a stop-loss mechanism under any circumstances. This means a position is only exited if one of two conditions is verified: either the pair converges or the trading period ends. In the latter scenario, a position that was recently entered might deteriorate the total return, even though it could turn out to be profitable in the future. Nevertheless, this is the best approximation of how a hedge fund would announce its profits at the end of a given period.

5.6 Evaluation metrics

We proceed to describe the financial evaluation metrics deemed most appropriate to analyze the proposed strategies: the Return on Investment (ROI), the Sharpe Ratio (SR) and the Maximum Drawdown (MDD).

5.6.1 Return on Investment

By definition, the return on investment in its simplest form is given by the net profit divided by the initial investment, as follows:

\text{ROI} = \frac{\text{Net Profit}}{\text{Initial Investment}} \times 100. (5.4)

Recall that the portfolio construction designed in this work establishes that the required initial investment is always $1. This scenario has the advantage that the return at any point in time can also be interpreted as the net profit.

When calculating the returns of the entire portfolio, one question that occurs is whether the portfolio returns should be averaged over all the pairs selected for trading, or simply over the pairs that actually opened positions during the trading period. Gatev et al. [17] coined the former approach the return on committed capital (RCC), and the latter fully invested return (FII). This work considers the first approach. In spite of being more conservative, it accounts for the opportunity cost of hedge funds for having to commit capital to a strategy even if the strategy does not trade.

An additional subtlety should be noted. As Dunis et al. [7] claim, in a realistic scenario, the financial institution lending shares would pay interest on the collateral. Therefore this interest should be added to the strategy's net profit. However, we neglect this for ease of calculation, since it would represent only a very small portion of the net profit.

The returns, as presented, are calculated on a leveraged position, as explained in section 5.5.1. In the unleveraged case, the initial capital corresponds to the initial gross exposure (value of the long position plus the value of the short position), and hence the returns would be slightly reduced.

5.6.2 Sharpe Ratio

The Sharpe ratio is a risk-adjusted evaluation of return on investment, first developed by Nobel laureate William F. Sharpe [56]. It aims at measuring the excess return per unit of deviation in an investment, typically referred to as risk. The annual Sharpe ratio is calculated as

SR_{\text{year}} = \frac{R^{\text{port}} - R_f}{\sigma^{\text{port}}} \times \text{annualization factor}. (5.5)

We proceed to describe each of the elements composing (5.5). First, R^port represents the expected daily portfolio returns. This may be calculated as the mean value of the portfolio returns, given by

R_t^{\text{port}} = \sum_{i=1}^{N} \omega_i R_t^i, (5.6)

where R_t^i and ω_i represent the daily returns and the weight of the i-th pair in the portfolio of size N, respectively9. The risk-free rate, R_f, represents the expected rate of return of a hypothetical investment with no risk of financial loss. This is subtracted from the returns earned by a given strategy to account for the interest that same cash amount could be generating instead, with no risk. In this work, we follow the common practice of setting the risk-free rate equal to the interest paid on the 3-month US government treasury bill10. Table 5.5 illustrates the annualized risk-free rates considered for each period tested.

9For an equal-weighted portfolio, ω_i = 1/N.
10The risk-free rate is a theoretical number since technically all investments carry some form of risk. Although the government can default on its securities, the probability of this happening is very low.

The values are obtained by averaging the 3-Month Treasury bill rate during the corresponding period. The data is collected from [57].

Table 5.5: Risk-free rates considered per test period.

The values presented in Table 5.5 must be converted to expected daily returns, for coherence with (5.5). The term σ^port represents the volatility of the portfolio. This depends not only on the standard deviation of each security but also on the correlation among them, and is calculated as

\sigma^{\text{port}} = \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{N} \omega_i \,\mathrm{cov}(i,j)\, \omega_j} = \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{N} \omega_i \sigma_{i,j} \omega_j}. (5.7)

Finally, the annualization factor enables expressing the estimated daily Sharpe ratio in annual terms. Expressing the annualized Sharpe ratio is convenient as this is a common practice and is consistent with the duration of the test periods. A common approximation to calculate the annual Sharpe ratio from the daily Sharpe ratio estimate consists of multiplying it by √252, where 252 is the number of trading days in a year. However, as pointed out by Lo [58], this is possible only under very special circumstances. Depending on the serial correlation of the portfolio returns, this approximation can yield Sharpe ratio values that are considerably smaller (in the case of positive serial correlation) or larger (in the case of negative serial correlation). The only case in which the approximation is truly valid is if the portfolio returns are independently and identically distributed (IID). This is a restrictive condition that is often violated by financial data. Thus, for a more rigorous approximation, we adopt the correction factor proposed in [58]. In this case, the serial correlation of the portfolio returns is measured, and a scale factor is applied accordingly. The scale factors are described in Appendix B.
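As an illustration, the sketch below computes the annualized Sharpe ratio directly from a series of daily portfolio returns; the √252 default only holds under the IID assumption discussed above and would be replaced by the Lo correction factor in practice.

import numpy as np

def annualized_sharpe(daily_returns: np.ndarray, daily_rf: float,
                      annualization_factor: float = np.sqrt(252)) -> float:
    # Excess return per unit of volatility, scaled to annual terms (eq. 5.5).
    excess = daily_returns - daily_rf
    return excess.mean() / daily_returns.std(ddof=1) * annualization_factor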

One disadvantage of the SR is that the volatility measurement penalizes all volatility equally, regardless of whether it is upside (large positive returns) or downside volatility. Yet, large losses and large gains are not equally undesirable. A common alternative is the Sortino ratio, which is calculated just like the Sharpe ratio, with the exception that the volatility is calculated by taking only the standard deviation of the negative returns. Thus, the reader might question why the Sharpe ratio is adopted in this work instead of the Sortino ratio. The motivation is two-fold. First, the Sharpe ratio is widely used in the Pairs Trading literature. It is the metric of choice in similar research work [16, 17, 34], which facilitates the comparison of the results. Secondly, the Sharpe ratio is independent of leverage. To justify this claim, we may quickly infer that leveraging up the committed capital n times would result in multiplying both the numerator and the denominator in equation (5.5) by n, meaning the result remains unchanged11. This property is particularly convenient, given that the constructed portfolio is leveraged.

11This approximation neglects the cost of funding leverage, which is considered a minor factor.

55 5.6.3 Maximum Drawdown

While the Sharpe ratio adjusts the returns for the risk taken, the Maximum Drawdown (MDD) indicates the maximum observed fall from a peak to a trough of a time series, before a new peak is attained. In this work, the Maximum Drawdown is calculated with respect to the account balance during the trading period. More formally, if we define X(t), with t ≥ 0, as representing the total account balance, the MDD is calculated as

\mathrm{MDD}(T) = \max_{\tau \in (0,T)} \left[ \max_{t \in (0,\tau)} \frac{X(t) - X(\tau)}{X(t)} \right]. (5.8)
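A vectorized sketch of this computation over an account-balance series is shown below; it assumes the balance is available as a NumPy array sampled over the trading period.

import numpy as np

def max_drawdown(balance: np.ndarray) -> float:
    # Largest relative fall from a running peak, following (5.8).
    running_peak = np.maximum.accumulate(balance)
    drawdowns = (running_peak - balance) / running_peak
    return float(drawdowns.max())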

5.7 Implementation environment

The code developed in this work is implemented from scratch using Python. The motivation for doing so is the vast amount of available resources that facilitate the implementation of the chosen algorithms and the associated data analysis. Furthermore, Python is, at the time of writing, the language of choice for Machine Learning related projects, which makes it more suitable from a collaboration point of view as well. The code is publicly available in [59].

Some libraries are particularly useful in this work. First, scikit-learn proves helpful in the implementation of PCA and the OPTICS algorithm. Second, statsmodels provides an already implemented version of the ADF test, useful for testing cointegration. Lastly, we make use of Keras, a Deep Learning library that provides the building blocks of a Neural Network. This way, we do not need to concern ourselves with the lower-level details, since most of the features described in chapter 4 are already implemented.

Most of the simulation is run on a regular CPU (Intel Core i7 @ 3GHz). The exception is the training process of the Deep Learning models. These models involve a tremendous amount of matrix multiplications, which result in long processing times when using the CPU. Fortunately, these operations can be massively sped up by taking advantage of the parallelization capabilities of a GPU. In this work, a GPU provided by Google Colaboratory (NVIDIA Tesla T4) is used to train the proposed architectures.

5.8 Conclusion

This chapter began with an overview of how the research experiment as a whole is conducted. We proceeded with a detailed description of the proposed dataset. We then explored the research environment suggested for each research stage separately, describing the respective pairs selection conditions, trading setup, and test portfolios. Subsequently, we presented some important considerations from a simulation point of view, namely the way a portfolio is constructed, how transaction costs are accounted for, and the detail with which entry and exit points are defined. In addition, the most relevant metrics monitored in this work were also introduced. Finally, we described some relevant aspects concerning the implementation environment.

Chapter 6

Results

At this point, we had the opportunity to explore the workings of Pairs Trading and to present two new methodologies to strengthen this strategy. Furthermore, in the previous chapter, we delineated how the simulations should be conducted. This chapter compiles the results obtained. It starts by illustrating the dataset preprocessing outcome. It proceeds to describe the results achieved by the application of the proposed pairs selection framework (Research Stage 1) in section 6.2. Lastly, in section 6.3, the results concerning the new proposed forecasting-based trading model are analyzed (Research Stage 2).

6.1 Data cleaning

As a reminder, the initial set of securities considered in this work includes 208 different ETFs. Naturally, not all ETFs trade throughout every period considered. Furthermore, some violate the required conditions defined in section 5.2.3.

To illustrate the outcome of processing the original dataset, Table 6.1 describes the effect of each stage in the process. The first column represents the number of ETFs discarded due to missing values. This may be either because the ETF is not trading during the entire period or because there are sporadic missing values in the price series. The second column represents the number of ETFs, from the ones remaining, that do not verify the liquidity constraint imposed. The third column lists the number of ETFs containing outliers. On the left, the number of ETFs containing a single outlier in the price series is represented, in which case the value is corrected. On the right side, the number of ETFs for which the number of outliers is larger than one is detailed, in which case the ETF is discarded. Finally, the rightmost column indicates the percentage of initial ETFs analyzed to form pairs.

Table 6.1: Data preprocessing results.

6.2 Pairs selection performance

This section focuses on analyzing the results addressing the first research question: “Can Unsupervised Learning find more promising pairs?”. The periods under analysis correspond to those assigned to research stage 1 in Table 5.2.

6.2.1 Eligible pairs

We start by presenting the number of pairs selected by the system under the three different clustering methodologies. These values are represented in Table 6.2.

Table 6.2: Pairs selected for different clustering approaches.

As expected, the single cluster approach has a broader set of ETFs from which to select the pairs. Naturally, more pairs are selected using this setting. What is still left to assess is whether the pairs selected are all equally profitable.

We may observe that grouping the ETFs into five categories (according to the categories described in section 5.2.2) results in a reduction in the number of possible pair combinations. This reduction is not more pronounced due to the underlying imbalance across the categories considered. Since energy-linked ETFs represent nearly half of all ETFs, the combinations within this sector are still vast.

Finally, the number of possible pair combinations when using OPTICS is remarkably lower. The results are obtained using five principal components to describe the reduced data, as will be explained in section 6.2.3. Also, the only OPTICS parameter to be defined, minPts, is set to three, since we empirically verify this value provides a reasonable distribution of clusters. Although the number of clusters is higher than when grouping by category, their smaller size results in fewer combinations. This does not just reduce the computational burden of performing multiple statistical tests but also reduces the probability of finding bogus pairs. The OPTICS reliance on the data is evidenced by the variation of the number of clusters across different periods.
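A minimal sketch of this clustering step using scikit-learn is given below; the exact feature construction used in this work follows chapter 3, so the arrangement shown here (standardized returns compressed to five principal components per ETF) is one plausible, simplified interpretation.

import numpy as np
from sklearn.cluster import OPTICS
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def cluster_etfs(returns: np.ndarray, n_components: int = 5, min_pts: int = 3) -> np.ndarray:
    # `returns` has shape (n_timesteps, n_etfs); each ETF becomes one sample
    # described by its coordinates in the reduced space.
    standardized = StandardScaler().fit_transform(returns)
    features = PCA(n_components=n_components).fit_transform(standardized.T)
    # OPTICS assigns a cluster label to each ETF; -1 marks unclustered (noise) ETFs.
    return OPTICS(min_samples=min_pts).fit_predict(features)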

6.2.2 Pairs selection rules

In this section, we intend to dig deeper into the details of the pairs selection procedure. The proposed selection process comprises four criteria, as described in section 3.5. There are essentially two aspects we would like to address in more detail. First, we want to examine what portion of the pairs passed the cointegration test but missed the spread Hurst exponent criterion. This result can be used as a proxy to estimate the real impact caused by the multiple comparisons problem and the influence of the Hurst exponent in handling this situation. Secondly, we are also interested in verifying the portion of identified mean-reverting pairs that do not comply with the two remaining imposed conditions, to gauge the importance of establishing those constraints. This is accomplished by illustrating the number of pairs filtered out by each condition. Focusing on the scenario in which no clustering is performed seems more appropriate, as it results in more possible pair combinations and is, therefore, more representative.

Table 6.3 illustrates the pairs removed at each step of the selection procedure. It is important to note that the steps are performed sequentially. This means that each row represents the pairs eliminated from the subset resulting from the previous selection condition. Hence, it is expected that the majority of pairs are eliminated initially. Indeed, this is what we observe. Moreover, we can confirm the role of the Hurst exponent in eliminating pairs that passed the cointegration test but whose spread did not report a Hurst exponent lower than 0.5, thus not describing a mean-reverting process.

Table 6.3: Pairs selection progress on an unrestricted pairs search environment.

It can also be inferred from the results in Table 6.3 that some pairs are not eligible because their convergence duration is not compatible with the trading period, thus not verifying the half-life condition. Nevertheless, it is evident that the mean-crossing criterion is met for the entire subset of pairs that verified the previous conditions, which indicates that enforcing this last rule is redundant in this scenario.

In conclusion, we may say there is a visible influence of the Hurst exponent and the half-life constraints in filtering out pairs that would not have been discarded if only cointegration had been enforced.
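For illustration, the sketch below implements two of these filters on a spread series, assuming the standard variance-of-lagged-differences Hurst estimator and an AR(1)-based half-life; these are common formulations and not necessarily the exact estimators adopted in section 3.5.

import numpy as np

def hurst_exponent(spread: np.ndarray, max_lag: int = 100) -> float:
    # Slope of log(std of lagged differences) vs log(lag); values below 0.5
    # indicate mean-reverting behaviour.
    lags = np.arange(2, max_lag)
    tau = [np.std(spread[lag:] - spread[:-lag]) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

def half_life(spread: np.ndarray) -> float:
    # Half-life of mean reversion from an AR(1) fit of the spread changes.
    lagged = spread[:-1] - spread[:-1].mean()
    delta = np.diff(spread)
    beta = np.dot(lagged, delta - delta.mean()) / np.dot(lagged, lagged)
    return -np.log(2) / beta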

6.2.3 OPTICS application results

In this section, we intend to highlight some results from the application of the proposed technique to search for securities, based on the OPTICS algorithm. Namely, we present the outcome from varying the number of principal components used, and the composition of the clusters found. We also illustrate the application of DBSCAN in place of OPTICS, for comparison purposes.

Number of principal components

The first stage in the application of the proposed approach for clustering ETFs consists of defining an appropriate number of principal components to reduce the original data dimension. This procedure must achieve a balance between two desirable but incompatible features. On the one hand, more accurate representations may be obtained when the number of principal components increases, since the transformed data explains more of the original variance. On the other hand, increasing the number of components brings along more components expressing random fluctuation of prices, which should be neglected. In addition, it also contributes to the curse of dimensionality, as described in section 3.3.

Table 6.4 illustrates how the clusters and the pairs found vary as the number of principal components changes. The numbers described are obtained in the period ranging from January 2014 to December 2018. This period is chosen arbitrarily for illustration, since the behaviour is found to be consistent among the three periods used in this research stage. The last row highlights the pairs which contain constituents from distinct categories, thus emphasizing part of the diversity provided by this method. It is clear from the results disclosed in Table 6.4 that the impact of varying the number of principal components within the experimented interval is minimal. The demonstrated consistency regarding the number of clusters and their average size for a different number of principal components is a direct consequence of the robustness provided by the OPTICS algorithm. If the number of principal components is selected within an acceptable range, the OPTICS algorithm adapts its parameters automatically to best fit the data described by different dimensions. Obtaining the same result using DBSCAN is considerably harder, as the investor would have to find the best value for the parameter ε in each scenario.

Table 6.4: Clustering statistics when varying the number of principal components.

In the previous paragraph, we described how the clustering varies within a range of acceptable dimensions. As motivated at the end of section 3.3, we considered acceptable a number of features up to 15, to protect against the curse of dimensionality. We find it relevant to conduct an exploratory analysis concerning how the algorithm's performance is affected by considerably higher dimensions. For this purpose, we simulate the same scheme but now using more principal components. The results are presented in Table 6.5. Interestingly, the algorithm is still capable of finding pairs when using 50 and even 75 dimensions, but naturally with fewer clusters formed. However, for 100 dimensions, the algorithm is incapable of detecting any pair, as one would expect to witness eventually. It should be emphasized that we do not proceed to confirm the reliability of the identified pairs from a trading perspective in these conditions.

Table 6.5: Clustering statistics using a large number of principal components.

To conclude, we set the number of principal components to 5. As demonstrated above, this parameter should not distinctly affect the results obtained as long as it is within an acceptable dimension range. We adopt 5 dimensions since we find it adequate to represent the ETFs in a lower dimension, provided that there is no evidence favouring higher dimensions.

Clusters composition

In this section, we intend to validate the clusters formed by the OPTICS algorithm and get an insight into their composition. Illustrating the process for all the periods considered would be very extensive. Because the process is common across all the periods, we proceed with the analysis of the period of January 2014 to December 2017 in this section and include the information for the remaining periods in Appendix C.

To illustrate the clustering results it would be particularly appealing to show a representation of the generated clusters in the two-dimensional space. However, the data is described by five dimensions1. To tackle this issue, we propose the application of the T-distributed Stochastic Neighbor Embedding algorithm (t-SNE) [60]. Briefly, t-SNE can be described as a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space of two (or three) dimensions. The algorithm models each high-dimensional object using two (or three) dimensions

1Recall we selected five principal components when applying PCA.

in such a way that similar objects are modelled by nearby points and dissimilar objects by distant points with high probability2.
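The visualization can be produced along the lines of the sketch below, assuming the PCA features and the OPTICS labels are available as NumPy arrays; the plotting details (colour map, marker sizes) are illustrative choices.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def plot_clusters_2d(features: np.ndarray, labels: np.ndarray, random_state: int = 0) -> None:
    # Project the clustering features to two dimensions and colour points by cluster;
    # ETFs left unclustered by OPTICS (label -1) are drawn in grey.
    embedded = TSNE(n_components=2, random_state=random_state).fit_transform(features)
    noise = labels == -1
    plt.scatter(embedded[noise, 0], embedded[noise, 1], c="lightgrey", s=10)
    plt.scatter(embedded[~noise, 0], embedded[~noise, 1], c=labels[~noise], cmap="tab10", s=30)
    plt.title("OPTICS clusters embedded with t-SNE")
    plt.show()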

Using the technique just described, a two-dimensional map can be attained. Figure 6.1 illustrates the clusters formed during the formation period, in the transformed space. Each cluster is represented by a different colour. The ETFs not grouped are represented by the smaller grey circles, which were not labelled to facilitate the visualization. It should be noted that Figure 6.1 is not displaying the entire set of ETFs, as two of them are placed in distant positions, and their representation makes it harder to visualize the clusters with the desired level of granularity. The existence of such outliers, from a data distribution point of view, confirms the importance of using a clustering technique robust to such cases.

Figure 6.1: Application of t-SNE to the clusters generated by OPTICS during Jan 2014 to Dec 2016.

Although the clusters formed look plausible, based solely on the current evidence it may be premature to comment on their integrity. To get a better insight into the securities composing each cluster, we suggest analyzing the corresponding price series. For this purpose, we select four clusters for visualization. Figure 6.2 illustrates the logarithm of the price series of each ETF. The price series illustrated result from subtracting the mean of the original price series, to facilitate the visualization.

2For more information concerning the t-SNE algorithm, the interested reader may refer to the original paper [60].

(a) Normalized prices of ETFs grouped in Cluster 1. (b) Normalized prices of ETFs grouped in Cluster 2.

(c) Normalized prices of ETFs grouped in Cluster 3. (d) Normalized prices of ETFs grouped in Cluster 4.

Figure 6.2: Price series composition of some clusters formed in the period from Jan 2014 to Dec 2016.

Starting with Figure 6.2(a), it is interesting to observe how closely the series follow each other, making it hard to differentiate them. One question that emerges is whether there is a fundamental reason that justifies this behaviour. There is: the four securities identified not only belong to the same category, Precious Metals, but are also linked to Gold. This is the first evidence that the OPTICS-based approach can go one level deeper in grouping ETFs, as it is capable of detecting not just categories but specific segments. A similar situation is encountered in Figure 6.2(b). In this case, the securities identified also belong to the same segment, Equity US: MLPs, which is a component of the Energy sector.

Figures 6.2(c) and 6.2(d) demonstrate that the OPTICS clustering capabilities extend beyond selecting ETFs within the same segment. In cluster 3, we may find ETFs belonging to the Agriculture sector (CGW, FIW, PHO, and PIO), ETFs associated with Industrial Metals (LIT and REMX) and ETFs in the Energy sector (YMLI). There is a visible relation among the identified price series, even though they do not all belong to the same category. The same applies to cluster 4, which is composed of three different categories: Energy (KOL), Industrial Metals (PICK) and Broad Market (UCI).

To conclude, we confirm the clusters formed accomplish their intent of combining the desired traits from the two other techniques studied, namely a tendency to group subsets of ETFs from the same category while not impeding clusters containing ETFs from different categories.

Comparison with DBSCAN

As motivated at the end of section 3.4.3, the OPTICS algorithm is particularly advantageous in comparison with DBSCAN for two reasons: it handles well clusters of varying density within the same dataset, and it avoids the cumbersome process the investor would have to go through to find the appropriate value for ε. To illustrate how this is evident in practice, we proceed to represent the clusters that would have been formed using DBSCAN, during the same formation period, for a varying parameterization of ε in Figure 6.3.

(a) DBSCAN with ε = 0.1. (b) DBSCAN with ε = 0.15. (c) DBSCAN with ε = 0.2.

Figure 6.3: Applying DBSCAN with varying ε to the formation period from Jan 2014 to Dec 2016.

From the observation of the three scenarios in Figure 6.3, the first aspect that stands out is the high sensitivity of the algorithm to variations in the parameter ε. This aspect reinforces that the investor must be very cautious when choosing the parameterization. Furthermore, from the three configurations presented, how should the investor select ε? It might not be straightforward to decide which combination yields the most promising results, and leaving this decision to the investor might be too demanding. The OPTICS algorithm, however, allows the automation of this process.

6.2.4 Trading performance

In this section, we analyze how the selected pairs under the three distinct clustering conditions perform from a trading point of view. This enables validating which clustering technique generates the most promising portfolio.

We start by revealing the results obtained with the threshold-based trading model, as proposed in section 5.3.2, during the validation periods included in Table 6.6. The results presented in this table should be analyzed with caution because there is an implicit bias arising from the fact that the strategy is being examined in the same period used to identify the pairs being tested, which does not resemble a real trading scenario. Hence, the results are not truly representative of the real performance. If on the one hand, one might expect the results to be more satisfactory, due to the extra confidence on the robustness of the pairs being used, on the other hand, they may decline, since we cannot eliminate pairs that have not been profitable in the past, as there are no available records yet. Nevertheless, the validation records are relevant to support the construction of the test portfolios, which use these results as a heuristic for selecting pairs. For this reason, we decide to present them. One key idea to retain from Table 6.6 is that the strategy is profitable in every scenario, with a satisfactory margin. Therefore we can confirm the success of the pairs selection process in selecting profitable pairs up to this point.

Table 6.6: Validation results for each pairs grouping technique.

Table 6.7 unveils the results concerning the unseen data, the test periods. In this case, for each test period, we describe the results when adopting any of the three implemented test portfolios, described in section 5.3.3. As a quick reminder, portfolio 1 comprises the entire set of pairs identified in the formation period, portfolio 2 contains only the pairs that had a satisfactory performance in the validation period, and portfolio 3 comprises the ten pairs which achieved the best performance in the validation period. To aggregate the information more concisely, the average over all years and portfolios is described in the rightmost column. Note also that the three evaluation metrics described in section 5.6 are highlighted, to differentiate them from the remaining, less critical, descriptive statistics.

We can confirm the persisting profitability in any of the environments tested. This result further corroborates the idea that the pairs selection procedure is robust. Besides, we may assert that if an investor is focused on obtaining the highest possible ROI, regardless of the incurred risk, performing no clustering is particularly appealing. Except for 2015, when the OPTICS-based strategy performed slightly better in this sense, the no-clustering approach obtains higher ROI overall. However, if the investor is also concerned with the risk their portfolio is exposed to, the remaining indicators suggest that clustering using OPTICS may achieve a better trade-off. We proceed to analyze why.

Table 6.7: Test results for each pairs grouping technique.

The OPTICS-based approach is capable of generating the highest average portfolio Sharpe ratio, of 3.79, in comparison with 3.58 obtained when performing no clustering and 2.59 when grouping by category. It also shows more consistency with respect to the portion of profitable pairs in the portfolio, with an average of 86% profitable pairs, against 80% when grouping by category and 79% when performing no clustering at all. It is interesting to observe that the only circumstances under which 100% of the pairs in the portfolio turned out to be profitable correspond to the situation that makes use of the OPTICS algorithm. This happened during two different periods, 2015 and 2017, when using portfolio 3. This stresses the consistency provided by this clustering approach.

Concerning the maximum drawdowns, we notice that this metric evidences some instability when grouping by category or not grouping at all. The results indicate that when these strategies use the ten best-performing pairs from the validation period (portfolio 3), the portfolio maximum drawdown amplitude tends to increase markedly. This is especially evident in 2017, in which case the MDD has nearly the same amplitude as the ROI. In contrast, the OPTICS-based strategy proves capable of maintaining a favourable MDD in every period, with no irregularities. This is confirmed by the lower average MDD, which emphasizes the robustness provided by this technique.

Lastly, we can verify that since the OPTICS algorithm only tends to group pairs that seem more sound, the system is less likely to identify pairs that achieve a very good validation performance and see it deteriorate in the test period. This aspect may be validated from the analysis of the Sharpe ratio progression from the validation to the test results. While on the validation set (Table 6.6) the OPTICS-based approach is only able to outperform the other strategies in 2015, in the test set it surpasses the remaining techniques most of the time.

Curiously, regarding the two standard grouping techniques, we observe that grouping by category yields poorer results than those obtained when not grouping at all. This suggests that, in the absence of OPTICS, the investor might be better off not performing any grouping, instead of limiting pairs to be formed by securities from the same sector.

On a different note, the results indicate that there is no indisputable best way of constructing the test portfolio. Evaluating by the Sharpe ratio, the portfolio with the best performance depends on the period and the clustering technique being used. As every portfolio outperforms the other two at least once, the results are inconclusive. However, evaluating in terms of ROI, we may infer that using information from the validation performance for constructing the test portfolio is a useful heuristic after all, as portfolio 2 and 3 outperform portfolio 1 in every case.

6.3 Forecasting-based trading model performance

This section focuses on analyzing the results that let us answer the second research question: “Can a forecasting-based trading model achieve a more robust performance?”. To accomplish this, we evaluate how the forecasting-based model, introduced in chapter 4, performs when compared with the threshold-based model simulated so far. In doing so, we gradually describe how the simulation progresses, from the pairs identification to the trading itself.

6.3.1 Eligible pairs

We proceed to describe the pairs selected during this research stage. In this case, only the OPTICS-based clustering is applied to group the candidate ETFs. This approach is chosen due to its already demonstrated ability to form robust portfolios. The clustering is then followed by the regular pairs selection procedure, just as before. Table 6.8 presents some relevant statistics regarding the pairs found for both periods used in this second research stage.

Table 6.8: Pairs selected during research stage 2.

Only 5 pairs are deemed eligible for trading based on the formation period from Jan 2009 to Dec 2017, considerably fewer than those found in the other periods analyzed. This is not surprising since there are just 58 active ETFs throughout this period, as detailed in Table 5.2. In addition, the portion of cointegrated ETFs during such a long period must be smaller. Nonetheless, this does not represent a

threat since selecting fewer pairs can be desirable in this setting. Because training the Deep Learning forecasting models is computationally very expensive, the computation time can be greatly reduced by considering just a small number of pairs. This also provides a good opportunity to take a closer look at the identified ETFs, one by one. Table 6.9 lists the selected pairs, along with the corresponding segment, and their description. Both the segment and the description information are retrieved from [52].

Table 6.9: Candidate pairs identified from Jan 2009 to Dec 2017.

It is interesting to notice that two out of the five pairs combine ETFs from different categories. To get a better insight into the underlying motive, we analyze the DBA-GCC pair in more detail. As GCC is a Broad Market type of ETF, meaning that it follows the commodity market as a whole, we gather more information from [61] regarding its sector decomposition, described in Table 6.10.

Table 6.10: GCC Sector Breakdown.

By analyzing Table 6.10, one can observe that the major component driving GCC is Agriculture. Thus, an equilibrium between GCC and DBA can be interpreted as a consequence of major exposure to the

same sector. The deviations from the equilibrium can be potentially justified by market changes in the remaining sectors.

Revisiting Table 6.8, now focusing on the period of January 2015 to December 2018, we find 19 pairs eligible for trading. This number is consistent with the values found in the previous periods of the same duration. These pairs are used to gauge the performance of the benchmark trading model, applied in its standard interval range (4 years, as applied in section 6.2).

6.3.2 Forecasting algorithms training

The forecasting algorithms are trained to fit the spreads from the five pairs selected during the formation period of January 2009 to December 2017, represented in Figure 6.4. It is interesting to observe that the spreads look stationary, as expected. Moreover, there is a noticeable difference in the volatility among the time series, which further supports the idea that data-driven model parameters, as the proposed thresholds for the forecasting-based trading model, seem appropriate in these conditions.

Figure 6.4: Spread from the five pairs identified between Jan 2009 and Dec 2018.

Having described the training data, we advance to the forecasting architectures experimented with. The architectures are made more complex until there is no more evidence of improvement. A total of 31 forecasting model architectures are implemented, meaning a total of 155 models are trained (31 architectures × 5 spreads). From the 31 implemented architectures, 9 correspond to the configurations experimented with the ARMA model, 8 with the LSTM-based model and 14 with the LSTM Encoder-Decoder. The implemented configurations are described in Appendix E. The cost function that each forecasting algorithm aims to optimize is the Mean Squared Error (MSE). For details regarding the MSE as well as other forecast measures used in this work, please refer to Appendix D.

It is worth adding that although both the parametric and the non-parametric approaches use the formation period for training, the methodology differs slightly for each case. While the ARMA model uses the entire formation period to learn the combination of coefficients that minimizes the forecasting error, the LSTM and the LSTM Encoder-Decoder use only the training period for that purpose. In this last case, the validation set is considered separately to control overfitting. Through early-stopping, as described in section 4.7.3, the system continuously monitors the validation score and stops the training process if signs of overfitting are detected. Figure 6.5 illustrates one typical training scenario with an example from the LSTM Encoder-Decoder fitting the spread resulting from the pair DGP-UGL.

(a) Mean Squared Error during training. (b) Mean Absolute Error during training.

Figure 6.5: Encoder-Decoder training loss history for DGP-UGL spread.

In Figure 6.5(a), we may observe how the training and validation MSE evolve. As expected, they both decrease as the training progresses. But the algorithm seems to reach its optimal configuration at epoch 41. From that point on, there is no improvement in the validation MSE, despite the still decreasing training MSE. This suggests the algorithm is overfitting. The figure represents a total of 91 epochs (although not visible) because the early-stopping is configured with a patience of 50 epochs. Thus, the training continues for 50 epochs and stops at the 91st, since no better score is obtained in the meantime. It is also worth observing the image on the right, 6.5(b), which depicts the evolution of the Mean Absolute Error (MAE). Although the model is configured to minimize the MSE, it is visible that the MAE progression displays a similar pattern.
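In Keras, this behaviour corresponds to an early-stopping callback along the lines of the sketch below; whether the original implementation restores the weights of the best epoch is not stated, so that option is shown only as an illustrative choice, and the commented lines stand in for the actual model definition.

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once the validation loss (MSE) has not improved for 50
# consecutive epochs, keeping the weights of the best epoch observed.
early_stopping = EarlyStopping(monitor="val_loss", patience=50, restore_best_weights=True)

# model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=500, callbacks=[early_stopping])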

6.3.3 Forecasting performance

Having described the selected pairs and how the training is conducted for each forecasting approach, we may proceed with the analysis of the forecasting results obtained. An extensive list of results concerning the forecasting score of all the implemented architectures is presented in Appendix E. In this section, we focus on comparing the results from the best-performing architectures.

It is important to clarify that we consider the best architectures to be those which minimize the validation mean squared error. The reason for using the validation score as a reference is that it resembles the information available to the investor prior to starting to trade, the point at which this decision must be taken. The mean squared error is the target metric used for comparison purposes, as this is the loss function being minimized by the models. According to these criteria, the best-performing architectures are presented in Table 6.11. Also, the results described for each model correspond to the average forecasting performance across the five spreads' forecasts.

Table 6.11: Forecasting results comparison.

Inspecting the results from Table 6.11, we may validate that all the implemented models are capable of outperforming the naive implementation (defined in section 5.4.1) during the validation period. Curiously, we note that the LSTM-based models do not manage to surpass the ARMA model, at least with respect to the chosen metrics. Also, the results obtained in the test set show signs of overfitting, despite the efforts taken in that regard, as the LSTM-based models are no longer superior to the naive model.

These results naturally make us question the suitability of LSTMs under this work's conditions, which leads us to dig deeper into the literature. Although the fitness of different forecasting algorithms strongly depends on the dataset, the literature evidences that there is no consensus concerning the suitability of LSTMs for time series forecasting.

On the one hand, some research work supports the dominance of RNNs over simpler techniques such as multi-layer perceptron (MLP) or auto-regressive models. For instance, [62, 63] allege these networks are generally better than MLP networks, although they do suffer from long computational times. Also, [64] investigates whether LSTMs are superior to traditional algorithms, such as the ARIMA model, and the empirical studies conducted confirm the hypothesis. These types of considerations motivated the application of LSTM-based models in this work in the first place.

On the other hand, some authors, such as Gers et al. [65], claim that common time series problems found in the literature are often conceptually simpler than many tasks already solved by LSTMs, and that more often than not, they do not require RNNs at all, because a few recent events convey all relevant information about the next event. More specifically, the authors find evidence that the time window based MLP outperforms the LSTM approach on certain time series prediction benchmarks solvable by looking at a few recent inputs only, thus concluding that the LSTM's special capabilities are not necessary in such circumstances. Our results are aligned with this point of view. Although an MLP model is not implemented, the results clearly indicate that under this problem's conditions, the traditional ARMA model proves more robust with respect to the metrics identified in Table 6.11. This suggests the extra complexity provided by the LSTM-based models might be unnecessary. In the next section, we intend to analyze how this impacts trading profitability.

6.3.4 Trading performance

We start by recalling that the primary motivation for pursuing the forecasting-based pairs trading approach is to make the portfolio more robust, by reducing the number of days on which the portfolio sees its value decrease. To evaluate the effectiveness of the proposed methodology, we describe the results obtained by the forecasting-based trading model using the different forecasting algorithms, during the period of January 2009 to December 2018. The results, in Table 6.12, are set side by side with those obtained using the standard threshold-based model. It is important to stress once more that the results in the validation set are presented for completeness: the validation period's real purpose is to select the pairs that prove capable of generating profits, rather than to test the strategy itself.

Table 6.12: Trading results comparison using a 9-year-long formation period.

Focusing on the test results, which resemble a practical trading environment, we can verify that the period of portfolio decline is considerably shorter when using the forecasting-based models, in comparison with the standard trading model. The longest decline period corresponds to the LSTM Encoder-Decoder, at 21 days, which is still less than 25% of the standard trading model's total decline period of 87 days. However, that is not all there is to it, as this comes at the expense of a worse performance in terms of total return on investment and portfolio Sharpe ratio.
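As a reference for how this robustness metric can be obtained, the sketch below counts decline days from a daily series of portfolio values. The input format is an assumption and the function is a hypothetical helper, not part of the repository.

```python
# Hedged sketch: count the trading days on which the end-of-day portfolio
# value dropped relative to the previous day.
import pandas as pd


def days_of_decline(portfolio_value: pd.Series) -> int:
    """`portfolio_value` is assumed to be indexed by trading day."""
    daily_change = portfolio_value.diff().dropna()
    return int((daily_change < 0).sum())
```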

We suspect the long required formation period is also responsible for the visible profitability decline. Therefore, we proceed to analyze the standard trading model in a regular 3-year-long period, this time focusing on the profitability indicators, to have a baseline for comparison. The results are displayed in Table 6.13.

Table 6.13: Trading results for standard trading model using a 3-year-long formation period.

By comparison, the performance in the 10-year-long period seems greatly affected by the long required duration, suggesting the less satisfactory returns emerge not merely as a consequence of the trading model itself, but also due to the underlying time settings. Following this line of reasoning, if the forecasting-based model's performance increased in the same proportion as the standard trading model's when reducing the formation period, the results obtained could be much more satisfactory. The formation period may be reduced, for example, by using higher frequency data, thus maintaining the same number of data points.

On a different note, as suspected from the forecasting results presented in section 6.3.3, the LSTM-based models do not manage to surpass the ARMA-based trading model. Yet, the fact that the LSTM Encoder-Decoder clearly exceeds the simpler LSTM model indicates there is some potential in using multi-step forecasting in this setting. Thus, it is left for future research to evaluate how an adapted multi-step prediction using ARMA would perform.

6.3.5 Implementation remarks

From a practical point of view, it is relevant to describe the training duration associated with fitting the forecasting algorithms. Naturally, the duration increases with the complexity of the architectures experimented with. In our analysis, we focus on the models which obtained the best forecasting score. Table 6.14 introduces the training duration for two different test environments. In the first one, a regular CPU is used. In the second case, the K80 GPU provided by Google Colaboratory is exploited; in this case, the standard LSTM layers are replaced with a GPU-accelerated version, the cuDNNLSTM layers [66].

Table 6.14: Forecasting algorithms training duration.

The values presented in Table 6.14 correspond to the total duration of fitting the algorithms five times (once per pair). The ARMA model requires only the CPU, since its training period is relatively short. As for the LSTM-based models, with the number of epochs fixed at 500 and a batch size of 512, the GPU provides valuable gains.
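The layer swap mentioned above can be sketched as follows, assuming standalone Keras 2.x with a TensorFlow 1.x backend (where CuDNNLSTM is available); the builder function and its arguments are illustrative, not the repository's exact code.

```python
# Illustrative sketch: use the GPU-accelerated CuDNNLSTM layer when a GPU is visible,
# and fall back to the standard LSTM layer otherwise.
from keras.models import Sequential
from keras.layers import Dense, LSTM, CuDNNLSTM
from tensorflow.python.client import device_lib


def gpu_available() -> bool:
    return any(d.device_type == "GPU" for d in device_lib.list_local_devices())


def build_lstm_model(n_timesteps: int, n_features: int, units: int = 64):
    recurrent_layer = CuDNNLSTM if gpu_available() else LSTM
    model = Sequential([
        recurrent_layer(units, input_shape=(n_timesteps, n_features)),
        Dense(1),  # one-step-ahead spread forecast
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```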

6.3.6 Alternative methods

In section 4.2, two different methodologies to define more precise entry points were proposed. We ran some experiments until we had enough evidence to dismiss them. In this section, we intend to shed some light on why these approaches might not be adequate in this scenario.

One of the alternative approaches described consists of attempting to predict the return series and setting a position accordingly. This approach has been applied in the literature with relative success, which makes it a promising candidate [23, 24]. However, the results could not be replicated in this work's setting. When trying to forecast the return series, we verify that the signal is extremely hard to predict, given its constant change of direction. Because the signal is highly volatile and constantly crosses its mean, the neural network tends to be better off simply predicting the mean value, apart from some small deviations, as represented in Figure 6.6.

Figure 6.6: Return series forecast.

This phenomenon makes the training procedure more complicated. Eventually, we could regularize the loss function in order to avoid a near-constant prediction. However, we suspect that because the signal is constantly changing direction, the inherent delay in the forecast would still render the predictions untrustworthy.

The second alternative approach suggested in section 4.2 is to apply a strategy that follows the current spread trend and uses the spread forecast as a proxy to detect the trend reversion in advance, at which point the position is exited. Two problems arise with the application of this strategy. The first is that not all spreads are smooth enough to allow the identification of local trends. The second is that, even for those pairs whose spread describes the desirable behaviour, the spread variation is so small that a simple delay in detecting the trend reversion is sufficient to cause a relatively large drop in the accumulated profits. A conceptual sketch of this exit rule is shown below.
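For clarity only, the exit rule of this dismissed approach can be expressed as a short conceptual sketch; the function and series names are illustrative and do not correspond to code in the repository.

```python
# Conceptual sketch: stay with the current spread trend and flag an exit as soon
# as the one-step forecast points the other way. `spread` and `forecast` are
# aligned pandas Series.
import numpy as np
import pandas as pd


def trend_reversal_exits(spread: pd.Series, forecast: pd.Series) -> pd.Series:
    trend = np.sign(spread.diff())               # +1 while the spread rises, -1 while it falls
    predicted_move = np.sign(forecast - spread)  # direction implied by the forecast
    # Exit whenever the forecast contradicts the ongoing local trend.
    return (trend * predicted_move) < 0
```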

Chapter 7

Conclusions and Future Work

7.1 Conclusions

We start by revisiting the first research question: “Can Unsupervised Learning find more promising pairs?”. If an investor is focused on obtaining the highest possible return regardless of the incurred risk, searching across all pair combinations turns out to be particularly appealing. But when risk is taken into consideration, the OPTICS-based strategy proves more auspicious. It is capable of generating the highest average portfolio Sharpe ratio, of 3.79, in comparison with 3.58 obtained when performing no clustering and 2.59 when grouping by category. It also shows more consistency with respect to the portion of profitable pairs in the portfolio, with an average of 86% profitable pairs, against 80% when grouping by category and 79% when performing no clustering at all. Lastly, it achieves steadier portfolio drawdowns, maintaining the MDD values within the acceptable range even when the other two techniques display considerable deviations. Therefore, to respond to the initial question, we may conclude that Unsupervised Learning is able to identify more promising pairs.

We proceed to revisit the second research question this work aims to answer: “Can a forecasting-based trading model achieve a more robust performance?”. The results indicate that if robustness is evaluated by the number of days on which the portfolio value does not decline, then the proposed trading model does provide an improvement. During the period of January 2018 to December 2018, the forecasting-based models have a total of 2 (LSTM), 11 (ARMA) and 22 (LSTM Encoder-Decoder) days of portfolio decline, in comparison with the considerably more numerous 87 days obtained when using the standard threshold-based model. However, the overall portfolio profitability is reduced as a side effect. Even so, we find evidence that the 10-year period required to apply the forecasting-based model significantly harms the performance, suggesting its profitability could improve if it were used in a more convenient period. Thus, to answer the second research question, we may assert that although the forecasting-based model showed potential if applied in a more convenient setting, the decline in profits observed under these conditions does not compensate for the increased robustness.

Some additional contributions of this work are:

• Demonstrates that the application of Pairs Trading using commodity ETFs in an intra-day setting can be beneficial.
• Proves that Pairs Trading can still be profitable after transaction costs in recent years.
• Demonstrates that the Hurst exponent can be used to enforce mean-reversion.
• Evaluates three different heuristics for building portfolios of pairs and finds that selecting pairs based on the validation performance is advantageous.
• Provides access to all the code used in this work at [59], which may be used for exploring future directions.

7.2 Future work

Several possible directions of study could be taken as a follow-up to this work. It may be interesting to continue exploring the proposed forecasting-based trading model (1) or to define different schemes for a Pairs Trading strategy (2).

(1) Improving the proposed forecasting-based model:

• Add more features to predict the price variations, rather than constraining the features to lagged prices.
• Increase the data frequency (e.g. 1-minute frequency) to enable a reduction in the required formation period and consequently find more pairs. It should be noted that this will be very demanding from a computational point of view.
• Experiment with training the Artificial Neural Networks in a classification setting (a minimal sketch follows this list). Although regression is seemingly the most obvious solution, given that the output being predicted can take any real value, classification might bring other advantages. The reason is that the MSE loss is much harder to optimize than a more stable loss such as Softmax, and it is less robust, since outliers can introduce huge gradients. Classification has the additional benefit of providing a distribution over the regression outputs, not just a single output with no indication of its confidence.
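The classification idea referred to in the list above could look like the following sketch, assuming Keras; the flat-band threshold, layer sizes and function names are arbitrary placeholders, not a tested design.

```python
# Illustrative sketch: recast the forecast as a 3-class problem (down / flat / up)
# trained with a softmax output and cross-entropy instead of MSE.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.utils import to_categorical


def to_direction_classes(spread_changes: np.ndarray, flat_band: float = 1e-4) -> np.ndarray:
    labels = np.where(spread_changes > flat_band, 2,               # up
                      np.where(spread_changes < -flat_band, 0, 1))  # down / flat
    return to_categorical(labels, num_classes=3)


def build_direction_classifier(n_timesteps: int, n_features: int):
    model = Sequential([
        LSTM(32, input_shape=(n_timesteps, n_features)),
        Dense(3, activation="softmax"),  # class probabilities instead of a point forecast
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```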

(2) Other approaches:

• Construct a metric to rank pairs in the portfolio, instead of applying an equal weighting scheme. It could be based, for example, on a similarity measure between the securities' features resulting from the application of PCA, or on something more empirical, such as the relative performance in the validation period.
• Explore the market conditions in which the proposed strategy generates more returns, and investigate whether the investor can benefit from predicting such scenarios in advance.
• Combine commodity-linked ETFs and other security types (e.g. futures) in the same pair and see if the investor can benefit from the additional expected volatility.

Bibliography

[1] R. C. Cavalcante, R. C. Brasileiro, V. L. Souza, J. P. Nobrega, and A. L. Oliveira, “Computational intelligence and financial markets: A survey and future directions,” Expert Systems with Applications, vol. 55, pp. 194–211, 2016.
[2] C. Krauss, “Statistical arbitrage pairs trading strategies: Review and outlook,” Journal of Economic Surveys, vol. 31, no. 2, pp. 513–545, 2017.
[3] P. Nath, “High frequency pairs trading with us treasury securities: Risks and rewards for hedge funds,” Available at SSRN 565441, 2003.
[4] D. Bowen, M. C. Hutchinson, and N. O’Sullivan, “High-frequency equity pairs trading: transaction costs, speed of execution, and patterns in returns,” The Journal of Trading, vol. 5, no. 3, pp. 31–38, 2010.
[5] G. J. Miao, “High frequency and dynamic pairs trading based on statistical arbitrage using a two-stage correlation and cointegration approach,” International Journal of Economics and Finance, vol. 6, no. 3, pp. 96–110, 2014.
[6] I. D. L. Salvatierra and A. J. Patton, “Dynamic copula models and high frequency data,” Journal of Empirical Finance, vol. 30, pp. 120–135, 2015.
[7] C. L. Dunis, G. Giorgioni, J. Laws, and J. Rudy, “Statistical arbitrage and high-frequency data with an application to eurostoxx 50 equities,” Liverpool Business School, Working paper, 2010.
[8] R. Adhikari and R. K. Agrawal, “An introductory study on time series modeling and forecasting,” arXiv preprint arXiv:1302.6613, 2013.
[9] E. Chan, Algorithmic trading: winning strategies and their rationale. John Wiley & Sons, 2013, vol. 625.
[10] S. Vempala, “Geometric random walks: a survey.”
[11] D. A. Dickey and W. A. Fuller, “Distribution of the estimators for autoregressive time series with a unit root,” Journal of the American Statistical Association, vol. 74, no. 366a, pp. 427–431, 1979.
[12] H. E. Hurst, “Long-term storage capacity of reservoirs,” Trans. Amer. Soc. Civil Eng., vol. 116, pp. 770–799, 1951.
[13] R. F. Engle and C. W. Granger, “Co-integration and error correction: representation, estimation, and testing,” Econometrica: Journal of the Econometric Society, pp. 251–276, 1987.
[14] J. S. Armstrong, Principles of forecasting: a handbook for researchers and practitioners. Springer Science & Business Media, 2001, vol. 30.

[15] C. Krauss, X. A. Do, and N. Huck, “Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the s&p 500,” European Journal of Operational Research, vol. 259, no. 2, pp. 689–702, 2017.
[16] J. Caldeira and G. V. Moura, “Selection of a portfolio of pairs based on cointegration: A statistical arbitrage strategy,” Available at SSRN 2196391, 2013.
[17] E. Gatev, W. N. Goetzmann, and K. G. Rouwenhorst, “Pairs trading: Performance of a relative-value arbitrage rule,” The Review of Financial Studies, vol. 19, no. 3, pp. 797–827, 2006.
[18] B. Do and R. Faff, “Does simple pairs trading still work?” Financial Analysts Journal, vol. 66, no. 4, pp. 83–95, 2010. [Online]. Available: https://doi.org/10.2469/faj.v66.n4.1
[19] H. Chen, S. Chen, Z. Chen, and F. Li, “Empirical investigation of an equity pairs trading strategy,” Management Science, 2017.
[20] G. Vidyamurthy, Pairs Trading: quantitative methods and analysis. John Wiley & Sons, 2004, vol. 217.
[21] N. Huck and K. Afawubo, “Pairs trading and selection methods: is cointegration superior?” Applied Economics, vol. 47, no. 6, pp. 599–613, 2015. [Online]. Available: https://doi.org/10.1080/00036846.2014.975417
[22] J. Ramos-Requena, J. Trinidad-Segovia, and M. Sánchez-Granero, “Introducing hurst exponent in pair trading,” Physica A: Statistical Mechanics and its Applications, vol. 488, pp. 39–45, 2017.
[23] C. L. Dunis, J. Laws, and B. Evans, “Modelling and trading the gasoline crack spread: A non-linear story,” Derivatives Use, Trading & Regulation, vol. 12, no. 1-2, pp. 126–145, 2006.
[24] C. L. Dunis, J. Laws, P. W. Middleton, and A. Karathanasopoulos, “Trading and hedging the corn/ethanol crush spread using time-varying leverage and nonlinear models,” The European Journal of Finance, vol. 21, no. 4, pp. 352–375, 2015.
[25] H. Rad, R. K. Y. Low, and R. Faff, “The profitability of pairs trading strategies: distance, cointegration and copula methods,” Quantitative Finance, vol. 16, no. 10, pp. 1541–1558, 2016. [Online]. Available: https://doi.org/10.1080/14697688.2016.1164337
[26] R. J. Elliott, J. Van Der Hoek, and W. P. Malcolm, “Pairs trading,” Quantitative Finance, vol. 5, no. 3, pp. 271–276, 2005.
[27] C. L. Dunis, J. Laws, and B. Evans, “Modelling and trading the soybean-oil crush spread with recurrent and higher order networks: A comparative analysis,” in Artificial Higher Order Neural Networks for Economics and Business. IGI Global, 2009, pp. 348–366.
[28] N. Huck, “Pairs selection and outranking: An application to the s&p 100 index,” European Journal of Operational Research, vol. 196, no. 2, pp. 819–825, 2009.
[29] N. Huck, “Pairs trading and outranking: The multi-step-ahead forecasting case,” European Journal of Operational Research, vol. 207, no. 3, pp. 1702–1716, 2010.
[30] N. S. Thomaidis, N. Kondakis, and G. Dounias, “An intelligent statistical arbitrage trading system,” in SETN, 2006.
[31] M. Harlacher, “Cointegration based algorithmic pairs trading,” Ph.D. dissertation, University of St. Gallen, 2016.

[32] S. A. Ross, “The arbitrage theory of capital asset pricing,” in Handbook of the Fundamentals of Financial Decision Making: Part I. World Scientific, 2013, pp. 11–30.
[33] I. Jolliffe, Principal component analysis. Springer, 2011.
[34] M. Avellaneda and J.-H. Lee, “Statistical arbitrage in the us equities market,” Quantitative Finance, vol. 10, no. 7, pp. 761–782, 2010.
[35] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
[36] P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data. Springer, 2006, pp. 25–71.
[37] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281–297.
[38] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
[39] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise.”
[40] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “Optics: ordering points to identify the clustering structure,” in ACM SIGMOD Record, vol. 28, no. 2. ACM, 1999, pp. 49–60.
[41] “OPTICS algorithm,” https://en.wikipedia.org/wiki/OPTICS_algorithm, accessed: 2019-08-06.
[42] G. Chen, B. Abraham, and G. W. Bennett, “Parametric and non-parametric modelling of time series—an empirical study,” Environmetrics: The Official Journal of the International Environmetrics Society, vol. 8, no. 1, pp. 63–74, 1997.
[43] Y.-W. Si and J. Yin, “Obst-based segmentation approach to financial time series,” Engineering Applications of Artificial Intelligence, vol. 26, no. 10, pp. 2581–2596, 2013.
[44] P. Whittle, Hypothesis testing in time series analysis. Almqvist & Wiksells, 1951, vol. 4.
[45] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016.
[46] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[47] M. Xin, H. Zhang, M. Sun, and D. Yuan, “Recurrent temporal sparse for attention-based action recognition,” in 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp. 456–463.
[48] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[49] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[50] M. Perlin, “M of a kind: A multivariate approach at pairs trading,” 2007.
[51] A. Galenko, E. Popova, and I. Popova, “Trading in the presence of cointegration,” Journal of Alternative Investments, vol. 15, no. 1, p. 85, 2012.

[52] “Find the Right ETF - Tools, Ratings, News,” https://www.etf.com/, accessed: 2019-06-30.
[53] B. Do and R. Faff, “Are pairs trading profits robust to trading costs?” Journal of Financial Research, vol. 35, no. 2, pp. 261–287, 2012.
[54] A. Göncü and E. Akyıldırım, “Statistical arbitrage with pairs trading,” International Review of Finance, vol. 16, no. 2, pp. 307–319, 2016.
[55] C.-F. Huang, C.-J. Hsu, C.-C. Chen, B. R. Chang, and C.-A. Li, “An intelligent model for pairs trading using genetic algorithms,” Computational Intelligence and Neuroscience, vol. 2015, p. 16, 2015.
[56] W. F. Sharpe, “The sharpe ratio,” Journal of Portfolio Management, vol. 21, no. 1, pp. 49–58, 1994.
[57] “3-Month Treasury Bill: Secondary Market Rate,” https://fred.stlouisfed.org/series/TB3MS, accessed: 2019-07-11.
[58] A. W. Lo, “The statistics of sharpe ratios,” Financial Analysts Journal, vol. 58, no. 4, pp. 36–52, 2002.
[59] S. Moraes Sarmento, “Pairs trading,” https://github.com/simaomsarmento/PairsTrading, 2019.
[60] L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[61] “GCC Overview,” https://www.etf.com/GCC#overview, accessed: 2019-07-28.
[62] P. Tenti, “Forecasting foreign exchange rates using recurrent neural networks,” Applied Artificial Intelligence, vol. 10, no. 6, pp. 567–582, 1996.
[63] A. Olivier, Z. Jean-Luc, and M. Maurice, “Identification and prediction of non-linear models with recurrent neural network,” in New Trends in Neural Computation, J. Mira, J. Cabestany, and A. Prieto, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 1993, pp. 531–535.
[64] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “A comparison of arima and lstm in forecasting time series,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 1394–1401.
[65] F. A. Gers, D. Eck, and J. Schmidhuber, “Applying lstm to time series predictable through time-window approaches,” in Neural Nets WIRN Vietri-01. Springer, 2002, pp. 193–200.
[66] “Long short-term memory (lstm) — nvidia developer,” https://developer.nvidia.com/discover/lstm, accessed: 2019-08-12.

Appendix A

List of ETFs

This appendix includes the total universe of ETFs considered for trading in this work. For each ETF, the corresponding category and segment are presented.

Table A.1: List of ETFs (1).

Table A.2: List of ETFs (2).

Appendix B

Sharpe Ratio Scale Factors

This appendix contains the table describing the scale factor correction method proposed by Lo [58], adopted in this work. The correction factor depends on the returns' autocorrelation value ρ and on the aggregation value q, as described in Table B.1. In this work, q is set to 250, since daily data corresponding to one year is being aggregated. The parameter ρ is measured for each portfolio's returns. A code sketch of this correction follows Table B.1.

Table B.1: Scale factors for time-aggregated Sharpe ratios when returns follow an AR(1) process. Source:[58].
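For reference, the correction can also be computed directly. The sketch below assumes Lo's AR(1) formula, η(q) = q / sqrt(q + 2 Σ_{k=1}^{q-1} (q - k) ρ^k), which Table B.1 tabulates for selected (ρ, q) combinations; the example Sharpe value used at the end is purely hypothetical.

```python
# Hedged sketch of Lo's [58] scale factor for time-aggregated Sharpe ratios
# under AR(1) returns. With rho = 0 it reduces to the usual sqrt(q) annualisation.
def lo_scale_factor(rho: float, q: int = 250) -> float:
    tail = sum((q - k) * rho ** k for k in range(1, q))
    return q / (q + 2.0 * tail) ** 0.5


# Example: annualising a hypothetical daily Sharpe ratio of 0.20 with rho = 0.1.
annualised_sharpe = lo_scale_factor(rho=0.1, q=250) * 0.20
```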

Appendix C

t-SNE Visualization

This appendix includes an illustration of the resulting application of t-SNE to the clusters formed in the periods of January 2012 to December 2014 and January 2013 to December 2015. The period of January 2014 to December 2016 is considered for analysis in section 6.2.3.

Figure C.1: OPTICS clustering in the Jan 2012 - Dec 2014 formation period.

Figure C.2: OPTICS clustering in the Jan 2013 - Dec 2015 formation period.

Appendix D

Forecasting Performance Measures

This appendix contains some relevant information regarding the forecasting performance measures used in this work.

Various performance measures are proposed in the literature to estimate forecast accuracy and compare different models [8]. In this work, we analyse the results with respect to three measures. They are a function of the actual time series, represented by $y_t$, and of the forecasted values, $f_t$. We represent the prediction error at point $t$ by $e_t = y_t - f_t$.

D.1 Mean Squared Error

The MSE is defined as

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} e_t^2. \tag{D.1}$$

MSE is a measure of the average squared deviation of the predicted values from the real values. The application of the square function is particularly relevant to avoid the positive and negative errors cancelling each other out. This comes at the expense of penalizing large errors more heavily than small ones. Moreover, MSE is not as intuitive and easily interpretable as other measures, such as MAE.

D.2 Mean Absolute Error

The mean absolute error, MAE, is defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} |e_t|. \tag{D.2}$$

MAE measures the average absolute deviation between the prediction and target values. As in MSE, the positive and negative deviations do not cancel out, in this case because the absolute value is taken.

However, this comes at the expense of having an undefined derivative at 0, which makes the MAE less amenable to the techniques of mathematical optimization. While to optimize the MSE one can simply set the derivative equal to 0 and solve, optimizing the absolute error often requires more complex techniques.

D.3 Root Mean Squared Error

The Root Mean Squared Error corresponds to the square root of the MSE, calculated as

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}. \tag{D.3}$$

With the introduction of the square root, the error is transformed to the same scale as the targets, making its interpretation easier. It is worth noting that MSE and RMSE are equivalent in terms of their minimizers, meaning that every minimizer of MSE is also a minimizer of RMSE and vice versa, since the square root is a non-decreasing function. As MSE is mathematically simpler, it is more commonly used during the optimization process. A direct implementation of the three measures is given below.
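The three measures translate directly into code; in the sketch below, `y` and `f` stand for equally sized arrays of actual and forecasted values.

```python
# Direct transcription of (D.1)-(D.3).
import numpy as np


def mse(y: np.ndarray, f: np.ndarray) -> float:
    return float(np.mean((y - f) ** 2))


def mae(y: np.ndarray, f: np.ndarray) -> float:
    return float(np.mean(np.abs(y - f)))


def rmse(y: np.ndarray, f: np.ndarray) -> float:
    return float(np.sqrt(mse(y, f)))
```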

Appendix E

Forecasting Results

This appendix contains the results of the configurations implemented for each forecasting algorithm. It is important to point out that the forecasting errors presented correspond to the average error across the five pairs studied in this work. The model configurations which achieved the best validation mean squared error are emphasized in bold. The reason for considering the mean squared error in the validation set for comparison purposes is justified in section 6.3.3.

The results obtained for the different configurations of the ARMA model can be found in Table E.1. Table E.2 presents the forecasting scores obtained when using the LSTM forecasting model. Finally, the results concerning the LSTM Encoder Decoder are divided across Tables E.3 and E.4.

Table E.1: ARMA forecasting results.

Table E.2: LSTM forecasting results.

Table E.3: LSTM Encoder Decoder forecasting results (1).

Table E.4: LSTM Encoder Decoder forecasting results (2).

Appendix F

Code

This appendix explores some practical aspects concerning the code developed in this work, which is publicly accessible at [59]. The project is coded entirely in Python.

F.1 Code organization

The code repository on which this work is based is organized as follows:

• Classes:
  – class DataProcessor.py: contains the required functions to handle the data processing.
  – class ForecastingTrader.py: contains the required functions to implement the forecasting algorithms as well as the forecasting-based trading model.
  – class SeriesAnalyser.py: contains the required functions to perform time-series analysis, mainly used for pairs selection.
  – class Trader.py: contains functions related to the trading process, namely the standard trading model and the portfolio simulation.
• Notebooks:
  – PairsTrading-DataPreprocessing.ipynb: contains all the steps performed to generate the dataset.
  – PairsTrading-Clustering.ipynb: contains the code responsible for performing the pairs' clustering according to the three techniques examined in this work.
  – PairsTrading-FixedBeta NoClustering 201x 201x.ipynb: contains the simulation results for the standard model when performing no clustering (Research Stage 1).
  – PairsTrading-FixedBeta Sector 201x 201x.ipynb: contains the simulation results for the standard model when grouping by sector (Research Stage 1).
  – PairsTrading-FixedBeta OPTICS 201x 201x.ipynb: contains the simulation results for the standard model when clustering with OPTICS (Research Stage 1).
  – PairsTrading-FixedBeta 20xx 201x.ipynb: contains the simulation results for the standard model (Research Stage 2).
  – PairsTrading-FixedBeta Forecasting 2009 2019.ipynb: contains the simulation results for the forecasting-based model using ARMA, LSTM and LSTM Encoder-Decoder (Research Stage 2).
• Data:
  – commodity ETFs long updated.xlsx: contains the tickers for all commodity-linked ETFs available for trading in 2019 (some tickers are repeated).
  – intraday etfs.xlsx: contains only the price series of the commodity-linked ETFs active throughout 2009-2019.
  – intraday etfs 2014 2019.xlsx: contains the additional price series of commodity-linked ETFs active throughout 2014-2019, except the ones already included in the previous file.
  – Pickle files:
    * 20xx-201x: contain the list of pairs selected in that period. There is a pickle file for each clustering technique applied.
    * commodity ETFs xxx: auxiliary files used for processing the dataset.
    * ticker catgory dict: dictionary containing the proposed mapping for each ETF category.
    * ticker segment dict: dictionary containing the proposed mapping for each ETF segment.
• Training: this folder contains the Python files to run on the Google server, for training the Deep Learning architectures.
• Forecasting models: this folder contains the pickle files with the predictions for all the configurations experimented with (output collected from the training files).
• Drafts: this folder contains a set of code which was implemented as the work progressed but was not used in the final version. Although less documented, several functions may be useful for extending this work in other directions.

Note: The data folder on GitHub contains the link to the files, as they are too large to be uploaded directly on GitHub.

F.2 Running the code

All the results presented in this document are already displayed in the corresponding notebooks’ output. Nevertheless, if there is an interest in re-running the notebooks with different configurations, it should be ensured that:

1. The class files are in the same directory as the notebooks.

2. The pickle files are downloaded and inserted in the corresponding data directory.

F.3 Deep Learning model graphs

The Keras library employed in this work for building the Artificial Neural Network models provides a feature to describe the graph corresponding to a given architecture. Figure F.1(a) illustrates the graph corresponding to the LSTM implementation, and Figure F.1(b) represents the Encoder-Decoder's graph. A usage sketch of this feature follows Figure F.1.

(a) LSTM graph. (b) LSTM Encoder-Decoder graph.

Figure F.1: Graphs of the neural network models with the best forecasting score, produced with Keras.
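A minimal usage sketch of the Keras utility in question is shown below; `model` stands for any compiled Keras model, the output file name is illustrative, and the helper function is not part of the repository. The utility requires the pydot and graphviz packages to be installed.

```python
# Minimal sketch of exporting a model's layer graph, as used for Figure F.1.
from keras.utils import plot_model


def export_architecture_graph(model, file_name: str = "model_graph.png") -> None:
    """Write a PNG image of `model`'s layer graph."""
    plot_model(model, to_file=file_name, show_shapes=True, show_layer_names=True)
```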
