RO4
Using Machine Learning and Algorithmic Trading to Beat the U.S. Stock Market Index
by
Chih-yu LEE and Matthew Johann Siy UY
Advised by
Prof. David Paul ROSSITER
Submitted in partial fulfillment
of the requirements for COMP 4981
in the
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
2020-2021
Date of submission: April 14, 2021
Abstract
The project aims to develop investment strategies that can achieve higher returns with lower risks than the S&P 500 Index by applying Machine Learning techniques to traditional investment strategies. In total, we implemented and backtested three Machine Learning-based investing strategies: Harry Browne Permanent Portfolio with LSTM (HBLSTM), Stock Selection with
Natural Language Processing (SSNLP), and Pairs Trading with PCA and Clustering (PTPC). From
December 2004 to March 2021, the S&P 500 Index’s annual return was 9.85% with a Maximum
Drawdown (MDD) of 55.1%. On the other hand, two of our Machine Learning-based investment strategies, HBLSTM and SSNLP, were able to achieve both a significantly higher annual return and a lower MDD. HBLSTM had an annual return of 20.4% with an MDD of 18%, while SSNLP had an annual return of 14.8% and an MDD of 33.4%. Although PTPC had an annual return of merely 2.97% with an MDD of 31.6%, we observed that it acts as insurance that generates a positive return over time, as it performed well during crises. Finally, our Combined Portfolio had an annual return of 9.99% with an MDD of 7.8%, providing further evidence of the benefits of using
Machine Learning techniques in investing strategies.
Table of Contents
1 Introduction 12
1.1 Overview...... 12
1.2 Objectives...... 22
1.3 Literature Survey...... 24
2 Methodology 26
2.1 Design...... 26
2.2 Implementation...... 42
2.3 Testing...... 59
2.4 Evaluation...... 67
3 Discussion 80
3.1 Performance Comparison with S&P 500 Index...... 80
3.2 Performance during Black Swan Events...... 81
3.3 Limitations and Challenges...... 90
4 Conclusion 94
4.1 Summary of Achievements...... 94
4.2 Future Work...... 95
5 References 96
6 Appendix A: Meeting Minutes 99
6.1 Minutes of the 1st Project Meeting...... 99
6.2 Minutes of the 2nd Project Meeting...... 101
6.3 Minutes of the 3rd Project Meeting...... 103
6.4 Minutes of the 4th Project Meeting...... 104
6.5 Minutes of the 5th Project Meeting...... 105
6.6 Minutes of the 6th Project Meeting...... 106
6.7 Minutes of the 7th Project Meeting...... 107
6.8 Minutes of the 8th Project Meeting...... 108
6.9 Minutes of the 9th Project Meeting...... 109
6.10 Minutes of the 10th Project Meeting...... 111
6.11 Minutes of the 11th Project Meeting...... 112
6.12 Minutes of the 12th Project Meeting...... 113
6.13 Minutes of the 13th Project Meeting...... 114
6.14 Minutes of the 14th Project Meeting...... 115
7 Appendix B: Glossary 116
8 Appendix C: Project Planning 120
8.1 Distribution of Work...... 120
8.2 GANTT Chart...... 121
9 Appendix D: Required Hardware and Software 122
9.1 Hardware Requirements...... 122
9.2 Software Requirements...... 122
10 Appendix E: Additional Information 123
10.1 Brokerages...... 123
10.2 Quantopian...... 124
10.3 QuantConnect...... 125
11 Appendix F: Additional Information For Future Work 130
11.1 Reinforcement Learning...... 130
11.2 Stock Price Prediction in Active Investment by LSTM...... 133
11.3 Construction of Website...... 134
11.4 Custom Backtesting System...... 136
List of Tables
1 Table for BoW Model...... 31
2 Term Frequency Table for TF-IDF Model...... 31
3 Inverse Document Frequency Table for TF-IDF Model...... 31
4 Table for TF-IDF Model...... 32
5 Summary of Pairs Trading With PCA and Clustering...... 71
6 Summary of Combined Portfolio...... 79
7 Summary of All Models Performance...... 80
8 Distribution of Work...... 120
9 GANTT Chart...... 121
10 Hardware Requirements...... 122
11 Software Requirements...... 122
List of Figures
1 Illustration of an RNN [1]...... 19
2 Illustration of 4-Stage Developments...... 23
3 System Design...... 26
4 Illustration of LSTM Model Cells...... 29
5 Example of Correlated But Not Co-Integrated Stock Pairs...... 36
6 Illustration of 2-Dimension Data Before and After PCA...... 38
7 Illustration of The Axes Transformation After PCA...... 38
8 Illustration of QuantConnect Visualization Report (Page 1)...... 47
9 Illustration of QuantConnect Visualization Report (Page 2)...... 48
10 Illustration of QuantConnect Visualization Report (Page 3)...... 49
11 Upper Part of The Visualization Webpage...... 50
12 Lower Part of The Visualization Webpage...... 50
13 Basic Metrics of The Strategy...... 51
14 Heatmap of The Monthly Returns...... 52
15 Interactive Equity Curve...... 53
16 Interactive Equity Curve for The Recent 6 Years After Zooming in...... 53
17 12-Month Rolling Return...... 54
18 Drawdown Diagram...... 55
19 Drawdown Summary Table...... 55
20 Volatility Diagram...... 56
21 Rolling Beta...... 57
22 Weight Allocation...... 58
23 Code for Plotting Necessary Variables...... 63
24 The Custom Variable Tracker from The Self.Plot Code...... 63
25 QuantConnect Automated Trading Dashboard...... 64
26 QuantConnect Automated Trading Dashboard (Status = Launch)...... 65
27 QuantConnect Automated Trading Dashboard (Status = Runtime Error)..... 65
28 QuantConnect Automated Trading Dashboard (Status = Stopped)...... 65
29 Orders on QuantConnect Automated Trading Dashboard...... 66
30 Equity Value on QuantConnect Automated Trading Dashboard...... 66
31 Backtest Result of Harry Browne Permanent Portfolio from December 2004 to
August 2020...... 67
32 Backtest Result of Machine Learning Model-based Harry Browne Permanent Portfolio
from December 2004 to March 2021...... 68
33 Backtest Result of Stock Selection Using NLP from December 2004 to March 2021 69
34 Long-Short Exposure of Stock Selection Using NLP from December 2004 to March
2021...... 69
35 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from December
2004 to December 2008...... 70
36 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January
2009 to December 2012...... 70
37 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January
2013 to December 2016...... 71
38 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January
2017 to March 2021...... 71
39 Pairs Trading Result From 2007 to 2008 (1st Part of The Global Financial Crisis). 72
40 Pairs Trading Result From 2007 to 2008 (2nd Part of The Global Financial Crisis) 72
41 Pairs Trading Result In 2020 (The Coronavirus Crisis)...... 72
42 Equity Curve for Stock Selection Using NLP for Different Thresholds from December
2004 to March 2021...... 74
43 CAGR for Stock Selection Using NLP for Different Thresholds from December 2004
to March 2021...... 75
44 Sharpe Ratio for Stock Selection Using NLP for Different Thresholds from December
2004 to March 2021...... 75
45 Max Drawdown for Stock Selection Using NLP for Different Thresholds from December
2004 to March 2021...... 75
46 Summary Table for Pairs Trading With PCA and Clustering for Exit Z-Score =
+0.2 and -0.2...... 76
47 Summary Table for Pairs Trading With PCA and Clustering for Different Exit Z-Scores 76
48 Summary Table for Pairs Trading With PCA and Clustering for Different Entry
Z-Scores...... 77
49 Summary Table for Pairs Trading With PCA and Clustering for Different Frequency
for Co-integration Testing...... 77
50 Equity Curve for The Combined Portfolio with Different Weights from December
2004 to March 2021...... 78
51 Sharpe Ratio for The Combined Portfolio with Different Weights from December
2004 to March 2021...... 78
52 Max Drawdown for The Combined Portfolio with Different Weights from December
2004 to March 2021...... 78
53 Annualized Return for The Combined Portfolio with Different Weights from December
2004 to March 2021...... 78
54 Performance of Machine Learning-based Harry Browne Permanent Portfolio During
Global Financial Crisis...... 81
55 Performance of Machine Learning-based Harry Browne Permanent Portfolio During
COVID-19 Pandemic...... 82
56 Performance of NLP Stock Selection Strategy During Global Financial Crisis... 83
57 Performance of NLP Stock Selection Strategy During COVID-19 Pandemic.... 84
58 Performance of Machine Learning-based Harry Browne Permanent Portfolio During
Global Financial Crisis (2007-2008)...... 85
59 Performance of Machine Learning-based Harry Browne Permanent Portfolio During
Global Financial Crisis (2009 Onwards)...... 86
60 Performance of Machine Learning-based Harry Browne Permanent Portfolio During
COVID-19 Pandemic...... 87
61 Performance of Combined Portfolio During Global Financial Crisis...... 88
62 Performance of Combined Portfolio During COVID-19 Pandemic...... 89
63 Illustration of The Inability for LSTM to Predict Black Swan Events...... 90
64 Python Programming Interface on QuantConnect...... 125
65 Information of SPY...... 125
66 Image of The Price-to-Equity Ratio Over Year 2011 for Some Stocks...... 126
67 Information about Extracting Financial Reports on QuantConnect...... 126
68 Illustration of The Research Notebook...... 127
69 Illustration of The Backtest Report...... 127
70 List of Brokers Available on QuantConnect Real-Time Trading...... 128
71 The Full List of Supported Libraries on QuantConnect...... 128
72 List of Alternative Data Available...... 129
73 Illustration of RL [2]...... 130
74 Illustration of QuantConnect Settings...... 136
75 Quantity Table from The Custom Backtest System...... 137
76 Weight Table from The Custom Backtest System...... 137
1 Introduction
1.1 Overview
1.1.1 Algorithmic Trading (Algo Trading)
Algorithmic trading, also known as automated trading, is a method of executing orders that allows traders to establish specific rules for entering and exiting a trade based on variables such as time, price, and volume that, once programmed, can be automatically executed by a computer. The advantage of algo trading compared to manual trading involves improving order entry speed, minimizing irrational decisions, backtesting strategies, and preserving discipline. Because of its advantages, algorithmic trading has seen increasing popularity among different market participants.
Among buy-side institutional investors, like mutual funds and pension funds, algorithmic trading is often used to spread out the execution of large orders to avoid influencing prices, or to execute trades as quickly as possible or at the best possible price. Similarly, sell-side participants, such as brokerages and investment banks, often use algorithmic trading to automate trades and provide sufficient liquidity to the market. Finally, systematic traders, like hedge funds or quant funds that often use very complex trading strategies, find it much more efficient to program their trading rules and let the program trade automatically. According to Electronic Broking Services (EBS) [3], a wholesale electronic trading platform, 70% of orders on its platform now originate from algorithms, a large contrast from 2004, when all trading was manual. Additionally, a study in 2016 showed that
80% of transactions in the foreign exchange (Forex) Market are automated [4]. These examples highlight the rise of algorithmic trading throughout the entire financial industry.
1.1.2 Algorithmic Trading Strategies
Any algorithmic trading strategy is about identifying unexplored opportunities that are not only profitable but also provide a satisfactory risk-to-return ratio. Some common strategies include:
1. Trend-following Strategies
2. Arbitrage Opportunities
3. Index Fund Rebalancing
4. Statistical Model-based
5. Mean Reversion
6. Volume-weighted Average Price (VWAP)
The common thread among all these trading strategies is that they rely on patterns, trends, and knowledge that humans have learned about trading securities. However, the problem with using well-known strategies is that, over time, the alpha (see Section 7: Appendix B: Glossary), or excess return, generated by them declines because the market learns to adjust for them. As a result, there is constant demand for new trading ideas and strategies, which is why we developed new algorithmic trading strategies in this project.
1.1.3 Backtesting and Optimization
Backtesting focuses on testing and validating the automated trading strategy. This process includes checking the code to make sure that it works as expected. Additionally, it involves analyzing how the strategy performs over different market conditions and time frames, especially during black-swan events, such as the coronavirus outbreak in 2020.
Backtesting is an essential part of developing a viable algorithmic trading strategy, because it allows the developer to understand whether their trading strategy holds up across different market conditions. The underlying logic is that if a strategy has worked consistently over a long period of time, it is likely to continue working in the future, assuming the strategy is not revealed to the rest of the market.
Aside from this, backtesting provides developers with statistical figures that measure the overall risk-adjusted performance of their trading strategy. These metrics enable them to compare the performance of their strategy against others. Risk-adjusted performance is used because, in general, taking on more risk allows for higher potential returns. Thus, to compare one strategy fairly against another, it is critical to take into account the amount of risk each trading strategy bore.
Key Performance Indicators in Backtesting
In our project, we focused on two key performance indicators for backtesting.
1. Sharpe Ratio: the average return earned in excess of the risk-free rate per unit of volatility or total risk. (See Section 7: Appendix B: Glossary for the formula)
2. MAR Ratio: the ratio of annualized return to the Maximum Drawdown, the largest observed loss since the beginning of the backtest. Its reciprocal reflects the average time it takes to recover from the Maximum Drawdown. (See Section 7: Appendix B: Glossary for the formula)
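To make these two indicators concrete, here is a minimal sketch of how they can be computed from a series of daily returns (our own illustration, assuming 252 trading days per year, not the project's backtest code):

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods=252):
    """Annualized Sharpe Ratio: average excess return per unit of volatility."""
    excess = np.asarray(daily_returns) - risk_free_rate / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def mar_ratio(daily_returns, periods=252):
    """MAR Ratio: annualized return (CAGR) divided by the Maximum Drawdown."""
    equity = np.cumprod(1 + np.asarray(daily_returns))
    cagr = equity[-1] ** (periods / len(equity)) - 1
    peak = np.maximum.accumulate(equity)          # running high-water mark
    max_drawdown = np.max((peak - equity) / peak)
    return cagr / max_drawdown
```

The reciprocal of the MAR Ratio (Maximum Drawdown divided by annualized return) approximates the number of years needed, on average, to recover the worst loss.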
Avoid Overfitting
Backtesting enables developers to maximize the risk-adjusted performance of their trading strategy.
However, it is crucial to avoid overfitting the trading strategy to the historical data; otherwise, it would become useless for future trading purposes.
1.1.4 Live Execution
After backtesting our trading strategy, trading with real money was the next step. Our ultimate goal for creating an algorithmic trading strategy is to make money by trading in the financial markets. For this stage of the process, it was important to select an appropriate broker (see Section 10.1: Appendix E: Additional Information for details) and to implement mechanisms to manage both market risks and operational risks, such as potential hackers and technology downtime.
The importance of managing operational risks cannot be overstated. For example, one of the largest algorithmic trading firms, Knight Capital Group, lost hundreds of millions of dollars and was eventually acquired because one of its software developers forgot to copy a specific program onto one of its main servers [5].
1.1.5 Types of Investments
• Passive Investment
Passive investment is also known as the buy-and-hold strategy because it involves investing
in assets like stocks over a long period of time. The prime example of passive investing
is buying an index fund that tracks the price movement of an index, which is a group
or collection of stocks.
– S&P 500 Index
One of the most commonly followed equity indices is the S&P 500. It tracks the
stock performance of 500 of the largest companies listed on stock exchanges in the
United States.
– Exchange Traded Fund
An exchange traded fund (ETF) is a basket of securities (e.g., stocks, bonds) that
usually tracks an underlying index. ETFs are very similar to index funds, but unlike
index funds, ETFs are listed and traded on exchanges.
The most well-known example is the SPDR S&P 500 ETF (SPY), which tracks the
S&P 500 Index. ETFs can contain many types of assets, including stocks, commodities,
bonds, or a mixture of asset types.
– Harry Browne Permanent Portfolio
The permanent portfolio is an investment portfolio invented by Harry Browne in the 1980s
[6]. It comprises four components: Stocks, Bonds, Gold, and Cash. It is designed to
thrive over four major macroeconomic periods: Prosperity, Deflation, Inflation, and
Recession. In this project, we use the following four ETFs for tracking the trends of the
components in the portfolio:
1. SPDR S&P 500 ETF Trust (SPY): This is one of the most liquid ETFs on U.S.
exchanges, incepted in 1993. According to the information provided by ETF.com
[7], its average dollar volume is around $36.4 Billion. Additionally, its bid-ask spread
is about $0.01 (0.00%), which is relatively low compared to other S&P 500 Index ETFs.
2. iShares 20+ Year Treasury Bond ETF (TLT): It is a fund incepted in 2002 that
approximates the return of the long term U.S. Treasury bonds by tracking the
Barclays Capital U.S. 20+ Year Treasury Bond Index. In addition, its average
dollar volume is around $1.7 Billion, and its bid-ask spread is about $0.01 (0.01%).
3. SPDR Gold Shares (GLD): It is the largest ETF to invest directly in physical gold
and was incepted in 2004. It is intended to offer investors a means of participating in
the gold bullion market without the necessity of taking physical delivery of gold, by
simply trading a security on a regulated stock exchange [8]. Its value is determined
by the LBMA PM Gold Price, so it has a close relationship with Gold spot prices.
Moreover, its average dollar volume is around $2.0 Billion, and its bid-ask spread is
about $0.01 (0.01%). It is traded as SPDR Gold Trust (2840.HK) in Hong Kong.
4. iShares 1-3 Year Treasury Bond ETF (SHY): It represents the "Cash" component
in the originally proposed portfolio, since it can earn some profits during normal
periods while hedging against risk during a recession. It is an ETF incepted in
2002 that tracks the ICE U.S. Treasury 1-3 Year Bond Index, a market-weighted
index of debt issued by the U.S. Treasury with 1-3 years remaining to maturity.
Furthermore, its average dollar volume is around $0.5 Billion, and its bid-ask spread
is $0.01 (0.01%).
• Active Investment
Active investment involves trying to beat the performance of the entire stock market by
buying and selling stocks regularly based on a chosen trading strategy. The rationale behind
active investment is that by taking advantage of market inefficiencies, an investor can earn a
higher risk-adjusted return. The problem, however, is that most people and firms are unable
to see the flaws in their trading strategies and do not have insider information, which is
why it is so difficult for ordinary investors to outperform the market in the long run.
Nevertheless, we believe that by applying Machine Learning to algorithmic trading, it is
possible to create a trading system that consistently outperforms the market in the long
run, and in this project we attempt to do so on a short-term trial basis.
1.1.6 Machine Learning Applications in Trading
In the past decade, researchers have made large breakthroughs in the fields of Machine Learning and Artificial Intelligence. Artificial Intelligence (AI), at its core, is about enabling machines to carry out human tasks, such as recognizing speech and text.
Machine Learning (ML), on the other hand, is a sub-branch of AI that focuses on applying algorithms that teach machines how to learn by themselves and improve from experience without needing to be programmed directly to do so. Broadly speaking, Machine Learning has three main types of learning problems:
1. Supervised Learning
In Supervised Learning, machines are provided with a complete, clean, and labeled data set
from which they can learn. Supervised learning normally deals with two main types of
problems: classification and regression [9].
2. Unsupervised Learning
Unsupervised Learning is the opposite of supervised learning. Instead of being given
a complete labeled data set to learn from, machines look for patterns in the unlabeled
data sets they are given. As such, unsupervised learning is normally used for clustering
data and discovering hidden features.
3. Reinforcement Learning
Finally, Reinforcement Learning involves using a learning agent that learns the best way of
doing something through experience gained from interacting with its environment.
One of the areas where Machine Learning is becoming increasingly popular is algorithmic trading.
Traditionally, algorithmic trading is simply automating a strategy that has been developed by a human. However, the rise of Machine Learning has opened a whole new set of possibilities for the
field. This is because Machine Learning can identify potential new trading patterns or strategies that humans have never considered before.
1.1.7 Deep Learning
Another area in Machine Learning that shows a lot of promise in algorithmic trading is Deep
Learning. Deep Learning uses neural networks, which are algorithms modeled after the structure of the human brain.
One example of a neural network is the Recurrent Neural Network (RNN). As its name suggests, it applies the same function to every element of the input sequence [1]. Additionally, an RNN has an internal memory, which allows it to take the previous output as input when making a decision. Hence, unlike other neural networks, where all the inputs are independent of each other, RNNs stand out because their inputs are related. This enables RNNs to model time-series data, where each data point depends on the previous one, like daily stock prices.
Figure 1: Illustration of an RNN [1]
Disadvantages
Unfortunately, RNNs have a major drawback, namely the vanishing gradient problem: the gradients used for learning shrink as they are propagated back through the network, so the earlier layers (and earlier time steps) learn very little. This problem becomes worse as the number of layers in the architecture increases. As such, traditional RNNs cannot process very long series of input data.
Solution
One possible solution to the vanishing gradient problem in RNNs is the Long Short-Term Memory (LSTM) network, a modified version of the RNN that resolves the vanishing gradient problem and thereby makes it easier to retain past data in memory. For these reasons, LSTMs are suitable for predicting time-series data given a specific time lag, like the price of a stock three days from now.
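To illustrate the "specific time lag" framing, the sketch below (our own, not the report's implementation) shows how a price series is typically windowed into the (samples, lag, features) shape that an LSTM layer consumes; the commented-out Keras model is likewise only indicative:

```python
import numpy as np

def make_windows(prices, lag=3):
    """Turn a 1-D price series into (samples, lag, 1) windows and
    next-step targets, the input shape an LSTM layer expects."""
    X, y = [], []
    for i in range(len(prices) - lag):
        X.append(prices[i:i + lag])   # the `lag` most recent observations
        y.append(prices[i + lag])     # the value one step ahead
    X = np.asarray(X, dtype=float).reshape(-1, lag, 1)
    y = np.asarray(y, dtype=float)
    return X, y

# A matching Keras model would look roughly like this (layer sizes are
# illustrative, not the report's actual architecture):
# model = keras.Sequential([
#     keras.layers.LSTM(32, input_shape=(lag, 1)),
#     keras.layers.Dense(1),
# ])
# model.compile(loss="mse", optimizer="adam")
# model.fit(X, y)
```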
1.1.8 Natural Language Processing
One area in which AI and ML can be applied to trading is analyzing the financial reports filed by publicly-traded companies. Traditionally, analysts from financial institutions have to peruse hundreds of pages to forecast a company's future performance so they can assign a buy, hold, or sell rating. Not only is this process long and tedious, but humans are also prone to mistakes; studies have repeatedly shown that "expert" analysts are often wrong in their forecasts.
According to a study of 45 research papers, financial analysts' estimates were off by over 25% on average from 2003 to 2014 [10]. Moreover, the study also found that the analysts' predictions were, on average, consistently above the actual values, which suggests that analysts are prone to optimistic bias [10]. With Natural Language Processing (NLP), however, it is possible to perform sentiment analysis on financial statements and obtain a clearer, less biased outlook on a company.
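As a toy illustration of one building block of such analysis (our own sketch; the report's models include Bag-of-Words and TF-IDF, see Tables 1-4), the amount of change between two documents, such as consecutive filings, can be measured with a simple bag-of-words cosine similarity:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Bag-of-words cosine similarity between two documents. A low score
    between consecutive filings would flag the firm as a 'changer'."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Real pipelines would add tokenization, stop-word removal, and TF-IDF weighting on top of this raw word-count representation.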
1.1.9 Clustering
Clustering is a type of Machine Learning technique used in Unsupervised Learning problems. Most commonly, it involves grouping unlabeled examples or data points together based on their different characteristics. Its main advantage lies in its ability to identify patterns that are usually hidden or difficult to identify. It is this characteristic that makes clustering especially useful in Pairs Trading.
Pairs Trading
Pairs Trading is based on the idea of finding two stocks whose price movements are correlated, then longing (buying) one stock and shorting (selling) the other. The rationale is that if the price movements of two stocks are highly correlated, they usually move in the same direction (see Section 7: Appendix B: Glossary for the formula). Therefore, if one stock's price goes up, the other's price is likely to go up as well. Consequently, if the prices of the two stocks diverge over a certain period of time, the usual relationship has temporarily broken down, and there is a high likelihood that the prices will reconverge. By longing the stock that has underperformed and shorting the stock that has overperformed, we can take advantage of this temporary break in the trend.
How Clustering is Applicable in Pairs Trading
Clustering can be used to identify pairs of stocks that are highly correlated (e.g., in the same industry or with similar businesses). This addresses the largest problem in Pairs Trading, namely finding pairs of stocks that are suitable for it. Not all correlated pairs of stocks are automatically suitable for pairs trading; each correlated pair needs to be tested statistically for suitability, which is an extremely time-consuming and resource-intensive task. Given a trading universe of n stocks, it would take O(n²) pairwise tests to identify all suitable pairs.
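A sketch of how this can work in practice (our own illustration with arbitrary parameter values, not the report's implementation): reduce each stock's return history to a few PCA factor loadings, let DBSCAN group similar stocks, and emit only same-cluster pairs for statistical testing:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def candidate_pairs(returns, n_components=2, eps=0.5, min_samples=2):
    """Emit same-cluster stock pairs as candidates for cointegration tests.
    `returns` is (n_days, n_stocks); column i is stock i's daily returns."""
    # Each stock becomes one sample described by its PCA factor scores.
    loadings = PCA(n_components=n_components).fit_transform(returns.T)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(loadings)
    pairs = []
    for c in set(labels) - {-1}:            # label -1 marks DBSCAN noise
        members = np.where(labels == c)[0]
        pairs += [(int(i), int(j))
                  for k, i in enumerate(members) for j in members[k + 1:]]
    return pairs
```

Only the pairs returned here need the expensive statistical check, replacing the O(n²) exhaustive scan over all pairs.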
1.2 Objectives
The goal of this project was to create a trading system that outperforms the U.S. stock market, which is proxied by the S&P 500 Index, and maximizes the Sharpe Ratio or risk-adjusted return.
Additionally, we conducted live trading using the completed trading system.
Overall, our project focused on three different aspects of Machine Learning: Natural Language
Processing, Clustering, and Deep Learning. There were four stages of development:
1. Machine Learning Trading Strategy Stage
This stage mainly focused on developing the individual trading strategies, each one
incorporating a different Machine Learning area. To accomplish this, we set three goals:
(a) Implement Harry Browne’s Permanent Portfolio (or other passive investment strategies)
by using Deep Learning models and portfolio theories to determine the weights of the
components for monthly rebalancing.
(b) Develop trading strategies using Natural Language Processing (NLP) to analyze textual
contents of financial statements for picking which stocks to buy or sell.
(c) Conduct pairs trading using Principal Component Analysis (PCA) and Clustering (e.g.
K-Means, DBSCAN) to select stock pairs.
2. Strategy Combination Stage
This stage mainly focused on combining all of the individual trading strategies together into
one portfolio. In order to do this, we set out one goal:
(a) Determine the optimal weights of each strategy in the overall portfolio to maximize
the Sharpe Ratio via a grid search with a step size of 0.05.
3. Backtesting Stage
This stage mainly focused on backtesting the different trading strategies that were created.
In order to accomplish this, we set out two goals:
(a) Develop a backtest system to evaluate the performance of all the strategies above,
with visualizations that include several useful measures, such as Compounded Annual
Growth Rate (CAGR), Sharpe Ratio, Max Drawdown, Equity Curve, Rolling Volatility,
and so on. (See Section 7: Appendix B: Glossary for further information on important
measures)
(b) Conduct a backtest on all four trading strategies developed in Stage 1 to make sure that
they perform well and work as intended.
4. Live Trading Stage
This stage mainly focused on using our completed and backtested trading system to do live
trading. In order to accomplish this, we set out two goals:
(a) Execute the automated trading using the Interactive Brokers API to eliminate latency
issues or fat-finger errors caused by manual trading.
(b) Analyze and track the performance of live trading to see if functions work as intended.
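The weight grid search mentioned in Stage 2 can be sketched as follows (a simplified illustration under our own naming and conventions, not the project's actual code):

```python
import itertools
import numpy as np

def best_weights(strategy_returns, step=0.05, periods=252):
    """Exhaustively try every weight combination (given step size, summing
    to 1) and keep the one with the highest annualized Sharpe Ratio.
    `strategy_returns` is an (n_days, n_strategies) array of daily returns."""
    n = strategy_returns.shape[1]
    ticks = np.round(np.arange(0.0, 1.0 + step / 2, step), 2)
    best_combo, best_sharpe = None, -np.inf
    for combo in itertools.product(ticks, repeat=n):
        if abs(sum(combo) - 1.0) > 1e-6:     # weights must sum to 1
            continue
        portfolio = strategy_returns @ np.array(combo)
        sd = portfolio.std(ddof=1)
        if sd == 0:
            continue
        sharpe = np.sqrt(periods) * portfolio.mean() / sd
        if sharpe > best_sharpe:
            best_combo, best_sharpe = combo, sharpe
    return best_combo, best_sharpe
```

With a step of 0.05 the grid stays small for a handful of strategies, so exhaustive search is cheap and avoids the local optima a gradient method could hit.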
Figure 2: Illustration of 4-Stage Developments
1.3 Literature Survey
1.3.1 Harry Browne Permanent Portfolio
According to James Chen from Investopedia [6], the Harry Browne Permanent Portfolio generated an 8.65% annualized return with a 7.20% standard deviation (also called volatility) over the forty-year span from 1976 to 2016. Over the same period, the S&P 500, the major United States (U.S.) market index that tracks the performance of the top 500 companies in the U.S., had an annualized return of 8.15% with a volatility of 15.49% [11]. These statistics showcase two main characteristics of the Permanent Portfolio: profitability and stability.
For profitability, the Permanent Portfolio has an even higher annual growth rate than the S&P 500 Index. Although a 0.50% annual difference seems small, compounding widens it to a gap of about 465.31% over 40 years: with an initial capital of 1 million dollars, the Permanent Portfolio would end up with roughly 4.65 million dollars more than an S&P 500 investment after the 40-year horizon. For stability, the volatility of the Permanent Portfolio is merely 46% of that of the S&P 500. Assuming a log-normal model of stock price movements, the S&P 500 is roughly 10 times more likely than the Permanent Portfolio to drop more than 7% in a single year: a -7% return is about 2.15 standard deviations below the Permanent Portfolio's mean (a probability of about 1.5%), but only about 1 standard deviation below the S&P 500's mean (a probability of about 16%). This makes us much more confident in investing in the Permanent Portfolio, with a satisfactory gain while bearing acceptable risk.
According to the Lazy Portfolio ETF [12], the Maximum Drawdown of the Permanent Portfolio during the coronavirus pandemic was 1.68%; the S&P 500, on the other hand, suffered a Maximum Drawdown of 33.67% during the same period. Additionally, Chih-yu LEE, one of the authors, has conducted research on how to enhance the performance of the Permanent Portfolio; more detailed information can be found in the report [13].
1.3.2 Long Short-Term Memory (LSTM) or Recurrent Neural Network (RNN)
In their research paper, Chariton Chalvatzis and Dimitrios Hristu-Varsakelis [14] present an LSTM-based neural network for predicting asset prices, together with a successful trading strategy. From 2010 to 2018, their LSTM model and trading strategy outperformed a buy-and-hold strategy (BnH) on the S&P 500, Dow Jones Industrial Average (DJIA), NASDAQ, and Russell 2000 stock indices, with cumulative returns of 340%, 185%, 371%, and 360% respectively, versus BnH returns of 166%, 164%, 338%, and 182%.
1.3.3 Natural Language Processing
According to "Lazy Prices" by Lauren Cohen, Christopher J. Malloy, and Quoc Nguyen [15], "changes to the language and construction of financial reports" provide a very strong signal about a firm's future performance. Their study found that from 1995 to 2014, a portfolio that shorts "changers" and buys "non-changers" earned up to 188 basis points in monthly alphas (or 22% per annum).
1.3.4 Clustering in Pairs Trading
In recent decades, there have been several studies on the application of clustering to trading.
Hongxin He, Jie Chen, Huidong Jin, and Shu-Heng Chen [16] proposed a data mining approach combining K-Means Clustering with a regression model for trading, while Roshan Adusumilli [17] developed a pairs trading strategy with DBSCAN. In our research, we analyzed the performance of different clustering methods.
2 Methodology
2.1 Design
Figure 3: System Design
2.1.1 Rebalancing of Harry Browne Permanent Portfolio
As all of the Machine Learning-based strategies are categorized as active investments, we would like to invest our idle cash into a well-known passive investment strategy: the Harry Browne Permanent Portfolio. It comprises equity, bonds, and commodities, which diversifies its exposure across different asset classes; specifically, it invests equally in a stock index, treasury bonds, gold, and cash, with annual rebalancing. According to some research online [18; 19; 20], we discovered that neither the annual rebalancing frequency nor the equal-weight allocation is optimal in terms of risk-adjusted return metrics such as the Sharpe Ratio and the MAR Ratio. Therefore, we adjusted the rebalancing frequency and weight allocation for both the Machine Learning model-free and the Deep Learning methods.
Harry Browne Permanent Portfolio without Machine Learning
For the Machine Learning model-free approach, we changed the frequency to monthly rebalancing and allocated the weights by volatility targeting. With the emergence of electronic trading, the pace of the current financial market is much faster than it was when the Harry Browne Permanent Portfolio was introduced in the 1980s. Hence, we increased the rebalancing frequency to monthly to strike a balance between better capturing changes in market dynamics and reducing transaction costs.
On the other hand, volatility targeting adjusts the weight of each component according to its annualized standard deviation of daily returns (volatility or σ), and it is defined as follows:
w_i \propto \frac{1}{\sigma_i}, \qquad \sum_{i \in \text{universe}} w_i = 1
The universe is defined to be the set of four components, i.e. {SPY, TLT, GLD, SHY}, and the computation of w_i is repeated on the first business day of each month.
However, according to Clive Granger and Robert Engle, the 2003 Nobel Memorial Prize laureates in Economic Sciences, a big change in price tends to be followed by another big change, and vice versa [21; 22]. Therefore, we believe that recent volatility is more representative than older volatility. We therefore replace the simple moving average with an exponentially weighted moving average (EWMA) over a lookback period of N days when calculating volatility. The EWMA volatility is defined as:
\sigma_{\text{EWMA}} = \sqrt{252} \cdot \sqrt{\dfrac{\sum_{i=0}^{N-1} \lambda^{N-1-i} (r_i - \bar{r})^2}{\sum_{i=0}^{N-1} \lambda^{N-1-i}}}
λ is a weighting coefficient between 0 and 1, r_i is the return on day i given that the current day is day N, and \bar{r} is the average return over days 0 to N−1. We multiply by \sqrt{252} to annualize the volatility. After obtaining the new σ_EWMA, we can apply the same formula as before to derive the weight of each component correspondingly.
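As a sketch of the two formulas above, the EWMA volatility and the resulting inverse-volatility weights can be computed as follows. The decay factor λ is an assumption, since the report does not fix its value (0.94 is the classic RiskMetrics choice), and the function names are illustrative:

```python
import numpy as np

def ewma_volatility(daily_returns, lam=0.94):
    """Annualized EWMA volatility over a lookback window of daily returns.
    lam (lambda) is an assumed decay factor; the report leaves it unspecified."""
    r = np.asarray(daily_returns, dtype=float)
    n = len(r)
    weights = lam ** np.arange(n - 1, -1, -1)     # weight lam**(N-1-i) for day i
    var = np.sum(weights * (r - r.mean()) ** 2) / weights.sum()
    return np.sqrt(252.0 * var)                   # multiply by sqrt(252) to annualize

def volatility_target_weights(universe_returns, lam=0.94):
    """w_i proportional to 1 / sigma_i, normalized so the weights sum to 1.
    universe_returns: dict mapping ticker -> array of daily returns."""
    inv_vol = {etf: 1.0 / ewma_volatility(r, lam) for etf, r in universe_returns.items()}
    total = sum(inv_vol.values())
    return {etf: v / total for etf, v in inv_vol.items()}
```

Because a lower-volatility component receives a proportionally larger weight, the cash-like component (SHY) would typically dominate this allocation.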
Harry Browne Permanent Portfolio with Machine Learning
There are two possible applications of Machine Learning to passive investment: weight determination and price prediction. For the first, we could simply obtain the weight of each component from the Deep Learning model and invest accordingly. For price prediction, we incorporate historical price data and macroeconomic data to optimize prediction accuracy. After the model predicts the prices, we obtain the expected return and the variance-covariance matrix for each component. Then, we apply different portfolio theories, such as the Markowitz Mean-Variance Portfolio (MVP), Inverse Volatility Portfolio (IVP), Risk Parity Portfolio (RPP), and Maximum Decorrelation Portfolio (MDCP), to finalize the weight allocation for each component. According to these portfolio theories, the optimal weights can be computed once the expected returns and covariance matrix are given. Overall, we expect both Machine Learning-based strategies to beat the one proposed in the previous section, as Machine Learning models have more predictive power than volatility targeting.
2.1.2 Stock Price Prediction by Neural Networks such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)
The LSTM model is trained on price and economic data with the goal of predicting the price of a stock 1 month into the future. After the stock prices have been predicted, the following strategy is implemented: buy the stocks with the highest predicted returns and short the stocks with the lowest predicted returns.
Implementation of LSTM Cell
Figure 4: Illustration of LSTM Model Cells
The LSTM cell contains 3 gates: the forget gate, the input gate, and the output gate. The gates mimic the neurons in human brains by controlling the flow of information through the cell. Generally, LSTM is used to make predictions involving time series data, since there can be large time gaps between important events in a time series. As mentioned in the introduction, the LSTM model resolves the vanishing gradient problem that normal recurrent neural networks (RNNs) experience. The main difference between the two is that in an LSTM block, whenever information is given as an output, it can also remain in the block’s memory. This information can then be continuously fed back into each of the gates until the end of training. This is the main reason we chose the LSTM model over other neural networks: in order to predict stock prices, it is essential that the neural network is capable of remembering important information from long ago.
2.1.3 Stock Selection by Sentiment Analysis on 10-K Reports
Passive Investing Strategy Framework
Between 1995 and 2020, passive mutual funds and ETFs attracted over $5.2 trillion in cumulative net flows. In contrast, active funds only gained $1.8 trillion during the same period [23].
The rise of passive investing is driven by the belief that no one can beat the returns of the market consistently over a sufficiently long horizon. Because of this, plenty of investors prefer to simply track the performance of the S&P 500 (e.g. SPY ETF) or Russell 2000 index (e.g. IWM ETF).
SPY has assets under management (AUM) of around $316.46 billion [24], while IWM has an AUM of around $61.98 billion [25]. However, the problem with simply tracking an index is that one’s portfolio is exposed to every stock in that index. This is not ideal because it means investing in some companies that are not doing well or may be going bankrupt soon. With the assistance of NLP, we can identify such bad companies and short them instead of longing them. By longing good companies and shorting bad companies, we expect to outperform the market index.
Improving Strategy Using NLP
To improve the traditional passive strategy of buying the index, we analyze companies’ 10-K reports with NLP to determine whether they should be bought or sold. To analyze the 10-K reports, we use a combination of sentiment and similarity analysis to generate trading signals.
Before we start analyzing the 10-K reports, it is essential to get a vector representation of the text files. To do this, we transform the 10-K reports using Bag-of-Words (BoW) Model and Term
Frequency-Inverse Document Frequency (TF-IDF) Model.
Bag-of-Words (BoW) vs Term Frequency-Inverse Document Frequency (TF-IDF)
Suppose you were comparing two movie reviews:
1. This movie was absolutely terrible, terrible, terrible.
2. This was the worst movie I’ve ever watched.
In both the BoW and TF-IDF models, the first step is to create a vocabulary, which contains all the words that appear in the texts being compared. After that, we can create a vector representation for each text.

vocab = [“This”, “movie”, “was”, “absolutely”, “terrible”, “the”, “worst”, “ever”, “watched”, “I’ve”]
In the BoW Model, for every word in the vocab, check if it is present in the text. If it is present, place a 1 for that word; otherwise, place a 0.
Table 1: Table for BoW Model
        This  movie  was  absolutely  terrible  the  worst  ever  I've  watched
Text1    1     1      1       1          1       0     0      0     0      0
Text2    1     1      1       0          0       1     1      1     1      1
In the TF-IDF Model, the steps can be broken down into 2 parts: Term Frequency and Inverse
Document Frequency.
Term Frequency Calculation
Table 2: Term Frequency Table for TF-IDF Model
        This  movie  was  absolutely  terrible  the  worst  ever  I've  watched
Text1   1/7   1/7   1/7     1/7         3/7      0     0      0     0      0
Text2   1/8   1/8   1/8      0           0      1/8   1/8    1/8   1/8    1/8
Inverse Document Frequency Calculation
\text{idf}_t = \log \dfrac{\text{number of documents}}{\text{number of documents with term } t}
Table 3: Inverse Document Frequency Table for TF-IDF Model
        This  movie  was  absolutely  terrible  the  worst  ever  I've  watched
idf      0     0      0      0.3        0.3     0.3   0.3    0.3   0.3    0.3
31 TF-IDF Vector Representation
\text{tf-idf}_{t,d} = \text{tf}_{t,d} \times \text{idf}_t
Table 4: Table for TF-IDF Model
        This  movie  was  absolutely  terrible  the   worst  ever  I've  watched
Text1    0     0      0     3/70        9/70     0      0      0     0      0
Text2    0     0      0      0           0      3/80   3/80   3/80  3/80   3/80
Calculating Similarity Score
After getting the vector representations of each document, the next step is to find a similarity score between the two documents. Some of the most popular ways of computing text similarity scores include:
1. Jaccard Similarity
2. Cosine Similarity
The Jaccard Similarity metric compares two sets to see what proportion of members is shared between them relative to the total number of members in both sets. The Jaccard Similarity score ranges from 0 to 1; the higher the score, the more similar the two sets are.
The formula for Jaccard Similarity is:
J(A, B) = \dfrac{|A \cap B|}{|A \cup B|}
Notice that the BoW vector representation of a text document is equivalent to the set of all words present in the document. Hence, we can use the BoW vector representations to compute the Jaccard similarity coefficient between two documents.
In contrast, the Cosine Similarity metric measures the cosine of the angle between two n-dimensional vectors projected in a multi-dimensional space. The Cosine Similarity score ranges from 0 to 1; the higher the score, the more similar the orientation of the two vectors.

The Jaccard Similarity of the example above is equal to 3/10 = 0.3.

On the other hand, to get the Cosine Similarity score between two text documents, the TF-IDF vector representation should be used.
The formula for the Cosine Similarity is:
\text{similarity} = \cos(\theta) = \dfrac{A \cdot B}{\|A\| \, \|B\|} = \dfrac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
The Cosine Similarity of the example above is equal to 0.
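The worked example above can be reproduced with a short, dependency-free sketch; the tokenizer and the base-10 logarithm follow the tables above. The shared words all receive an idf of 0, which is why the cosine score is exactly 0:

```python
import re
import math

def tokens(text):
    # lowercase and keep apostrophes so "i've" stays a single token
    return re.findall(r"[a-z']+", text.lower())

t1 = tokens("This movie was absolutely terrible, terrible, terrible.")
t2 = tokens("This was the worst movie I've ever watched.")
docs = [t1, t2]

# Jaccard similarity on the word sets (equivalent to comparing BoW presence vectors)
a, b = set(t1), set(t2)
jaccard = len(a & b) / len(a | b)          # 3 shared words / 10 distinct words

# TF-IDF vectors over the combined vocabulary, then cosine similarity
vocab = sorted(a | b)

def tfidf(doc):
    vec = []
    for w in vocab:
        tf = doc.count(w) / len(doc)
        df = sum(1 for d in docs if w in d)
        vec.append(tf * math.log10(len(docs) / df))
    return vec

v1, v2 = tfidf(t1), tfidf(t2)
dot = sum(x * y for x, y in zip(v1, v2))
norms = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
cosine = dot / norms if norms else 0.0
```

Running this gives a Jaccard score of 0.3 and a cosine score of 0.0 for the two reviews.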
Although the Jaccard Similarity and Cosine Similarity results seem really bad, this is mainly because the texts were extremely short. In our project, 10-K reports are on average around 42,000 words in length [26].
Integrating Sentiment Analysis and Textual Similarity
While Cosine Similarity and Jaccard Similarity provide a relatively good starting point for analyzing the similarity between text documents, they are insufficient by themselves because they do not capture sentiment. Consider the previous example of the two movie reviews.
Notice that while they use different words, the meaning is almost the same.
To resolve this issue, instead of simply using all the words in a text as the vocabulary, it is better to use an existing financial dictionary in which each word carries a specified sentiment. Luckily, such a dictionary already exists and has been used in plenty of research papers and studies: the Loughran-McDonald Master Dictionary. It contains over 80,000 words, with flags indicating whether a word belongs to a particular sentiment class (i.e., negative, positive, uncertainty, litigious, or constraining). By using this dictionary, it is possible to measure the similarity of documents with respect to a particular sentiment. For example, if a company were suddenly facing legal troubles, its latest 10-K report would contain significantly more litigious words than before.
Aside from this, notice that there are words in the examples, like “this”, “the”, “was”, and “I’ve”, that don’t add much meaning to the sentences. These words are known as “stop words”, and they should be removed before conducting the similarity tests, since their presence doesn’t contribute to the overall similarity.
2.1.4 Pairs Trading by Principal Component Analysis (PCA) and Clustering
Pairs Trading Framework
As pairs trading is a common practice that lets investors maintain zero net market exposure while profiting from the convergence of the spread between two stocks, we believe that the stock selection process can be optimized by applying Machine Learning techniques. We process the price data with PCA and combine it with macroeconomic data and each firm’s fundamentals. Finally, we conduct co-integration tests to determine the most suitable stock pairs for pairs trading.
Specifically, the strategy involves identifying a pair of stocks whose price movements are correlated and monitoring the “spread” between the two stocks to determine when to buy and sell.
The spread can be defined as:
\text{Spread } S_t = Y_t - \beta X_t,
where Y_t is the price of stock Y at time t, X_t is the price of stock X at time t, and β is the hedge ratio between stocks X and Y.
The hedge ratio β is normally computed by regressing the historical price of Y against the historical price of X over a predetermined period of time as follows:
Y_t = \text{intercept} + \beta X_t + \epsilon_t
Using the spread, it is possible to generate trading signals. For example, if the spread between stocks Y and X today is significantly greater than the mean of the daily spreads over a preceding period, then the spread is likely to decrease in the near future, so we buy stock X and sell stock Y. The advantage of pairs trading is that it is a market-neutral strategy, which aims to avoid market-specific risk.
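A minimal sketch of the hedge-ratio regression and spread construction above, using only NumPy; in the full strategy the pair must additionally pass a co-integration test (e.g., the Engle-Granger test), which is omitted here to keep the sketch dependency-free:

```python
import numpy as np

def hedge_ratio(y_prices, x_prices):
    """Regress Y on X (with intercept) over the lookback window; the slope
    is the hedge ratio beta from Y_t = intercept + beta * X_t + eps_t."""
    beta, _intercept = np.polyfit(x_prices, y_prices, deg=1)
    return beta

def spread_series(y_prices, x_prices, beta):
    """Spread S_t = Y_t - beta * X_t for every day in the window."""
    return np.asarray(y_prices, dtype=float) - beta * np.asarray(x_prices, dtype=float)
```

If Y tracked X perfectly with Y = 5 + 2X, the fitted beta would be 2 and the spread would be constant at 5, which is exactly the stability pairs trading relies on.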
However, the pairs trading strategy will only work if the spread between stocks Y and X remains relatively constant over a long period of time. For this reason, correlation is not a good indicator for identifying pairs of stocks: correlation only measures whether the returns of the stocks move in the same direction; it doesn’t guarantee that the spread remains constant. An example of this is shown in Figure 5:
Figure 5: Example of Correlated But Not Co-Integrated Stock Pairs
Indeed, while stocks X and Y are correlated, their spread has dramatically increased over time, which means the pair is not suitable for pairs trading. To guarantee that the spread between stocks X and Y remains constant, it is necessary to check whether the price histories of the two stocks are co-integrated (see Section 7, Appendix B: Glossary, for the formula). The frequency of re-evaluating the co-integration relationship was later set to every 6 months, since this yielded the best backtesting results.
Main Problems with Pairs Trading
The main problem with pairs trading is finding suitable pairs of stocks. If every pair of stocks in the stock market is tested for co-integration, thousands of pairs can be found to be co-integrated. However, just because a pair of stocks is co-integrated doesn’t mean it is suitable for pairs trading: the underlying companies may be completely unrelated. Moreover, as there are around 1,500 U.S. stocks in the trading universe, the total number of pairs is O(n²), which is over 1 million possible pairs. As speed is an important factor for earning profits in the financial market, evaluating millions of pairs is not feasible in practice.
The only way to determine suitability is to backtest several pairs of stocks to see if they are profitable. Not only is this method very time-consuming, but it is also impractical because the process must be repeated in the future to make sure that the stock pairs remain profitable over time.
The reason it is so difficult to identify profitable pairs is the lack of an effective filtering method to determine which co-integrated pairs are suitable.
Using DBSCAN and PCA for Pairs Selection
One way to solve the aforementioned problem is to use clustering. By grouping stocks together into clusters and then testing which pairs of stocks in each cluster are co-integrated, it is possible to reduce the number of tradable pairs that need to be tested and increase the likelihood that the co-integrated pair is suitable.
There are plenty of popular clustering algorithms, such as K-Means and Hierarchical Clustering. However, for the purpose of clustering stocks, they are unsuitable because both fail to identify clusters of arbitrary shapes and varying densities. One way to solve this issue is to use density-based spatial clustering of applications with noise (DBSCAN) [27]. As its name suggests, it identifies clusters by looking at the local density of data points. The main advantage of DBSCAN over K-Means is that it does not require the number of clusters to be specified beforehand, which is crucial for clustering stocks since the number of clusters is unknown at the start. It only requires two arguments: the maximum distance between points to be considered in a cluster and the minimum number of points in a cluster.
While DBSCAN can serve as a way of filtering out the stock pairs, it is also important to select which data points to include when trying to form clusters. Our goal for clustering is to identify similar companies that have similar historical price movements, which reduces the number of false positives generated by the co-integration test. With this goal in mind, the following features were selected:
1. Market Cap
2. Debt-to-Equity Ratio (Financial Health in Quantopian)
3. Historical Price
Market Cap was selected because it is an important differentiator between companies. Consider a large and established company like Coca-Cola and a small soft drink company like Jones Soda. It would be unreasonable to do a pairs trade between both stocks. Even though both companies are in the soft-drinks industry, they are very different from each other. Coca-Cola has a very stable business model while Jones Soda is still experiencing a lot of volatility. As a result, there is a lack of stability in the relationship between both stocks, which makes them a poor pairing for pairs trading.
The Debt-to-Equity (D/E) Ratio is used to evaluate a company’s financial leverage and is important for determining the financial health of a company. Put simply, it measures the degree to which a company finances its operations through debt versus equity. The D/E ratio enables us to determine whether two companies are financing their operations in similar ways. It would be inappropriate to compare two companies with greatly different D/E ratios, since the company with higher debt relative to equity is at a higher risk of going bankrupt.
The Historical Price of the company is necessary to determine whether the price movements of two companies are similar. However, it is not appropriate to use the companies’ adjusted closing prices directly because there is too much noise. A better way is to conduct principal component analysis (PCA) on the price history of the stocks and then use the ‘principal components’ as features in DBSCAN.
PCA is a machine learning technique used to identify the principal components that explain the variation in the dataset through singular value decomposition (SVD). By conducting PCA on the price history of the stocks, it is possible to identify the main factors that explain the variation in the price movements of different stocks.
PCA Demonstration
Consider a dataset that contains the height and weight of a number of people. This dataset can be plotted as points in a plane, as Figure 6 shows:
Figure 6: Illustration of 2-Dimension Data Before and After PCA
PCA finds a new coordinate system in which every point in the original dataset has a new (x, y) value on the pc1 and pc2 axes. The axes do not correspond to anything physical; they are combinations of height and weight called “principal components”. The transformation is shown in Figure 7:
Figure 7: Illustration of The Axes Transformation After PCA
Application of PCA to Historical Price of Stocks
Returning to our application, we apply PCA to the price history dataset of the trading universe, which is composed of 1,500 stocks. We consider the dataset as representative of the stock market, where each stock represents one feature of the market. By using PCA, we can identify the key ‘principal components’ that explain most of the variation in the stock market. For example, the first principal component, which explains most of the variation in the price history, is the stock market index, and each stock’s coordinate on that principal component is its beta.
Hence, by using PCA we can reduce the noise when comparing the distance between the price histories of different stocks. Originally, the noise in the price data is difficult to separate out. After conducting PCA, however, the effect of the noise is mostly eliminated during dimensionality reduction, as the noise is orthogonal to the principal components.
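A sketch of how per-stock PCA features can be extracted via singular value decomposition; the function name and the use of raw SVD (rather than a library PCA implementation) are illustrative assumptions:

```python
import numpy as np

def pca_stock_features(returns, k=50):
    """Loadings of each stock on the top-k principal components of the
    return matrix, obtained via SVD. returns has shape (num_days,
    num_stocks); the result has shape (num_stocks, k) and serves as the
    clustering features. k = 50 follows the implementation in Section 2.2."""
    X = returns - returns.mean(axis=0)               # center each stock's series
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Row i of Vt is principal component i expressed over the stocks, so
    # column j of Vt[:k] holds stock j's loadings; scale by singular values.
    return (Vt[:k] * S[:k, None]).T
```

For stocks driven by a common market factor, the magnitude of the first feature grows with each stock's beta, matching the interpretation given above.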
2.1.5 Development of the Backtesting System and Automated Trading Execution
Backtesting System
After developing the strategies on QuantConnect, we built a custom backtesting system to obtain more performance measures, analyze a more comprehensive equity curve, and set benchmarks other than the S&P 500. Moreover, the custom backtesting system can cover securities that are not available on QuantConnect or other backtesting platforms. There are two ways to build such a backtesting system:
1. Import the backtesting package(s) such as Zipline, PyAlgoTrade, and bt.
(a) Advantages:
i. Zipline is developed by Quantopian, so it might be easier for us to move our work on Quantopian to the Zipline IPython Notebook.
ii. PyAlgoTrade is a fully documented and mature backtesting package that can support both live and paper trading.
iii. bt is specifically designed for asset weight allocation and portfolio rebalancing.
(b) Disadvantages:
i. The credibility of the data source might be relatively low: it is very important to conduct backtesting on accurate data; however, as the data is provided directly by the package, it is difficult for us to verify its correctness.
ii. It might take more time, or even be impossible, for us to customize the measures we need for evaluating strategy performance.
2. Collect the data from Interactive Brokers API, Bloomberg, or Thomson Reuters Eikon first,
and then build a backtest system accordingly.
(a) Advantages:
i. The credibility of the data source is relatively high: as Bloomberg and Thomson Reuters are financial data providers, their data is more reliable because it is examined by many financial experts in those companies.
ii. We can customize the measures we need for evaluating the strategy performance as
we build everything from scratch.
(b) Disadvantages:
i. It is inconvenient for conducting backtests: we need to define every measure ourselves and may spend most of the time on data cleaning.
ii. Data storage: we have to store the collected data locally, which might take up a lot of space.
Besides the backtesting system that generates the results, there is also a visualization system for us to understand the advantages and disadvantages of each strategy in detail.
Automated Trading Execution via the Interactive Brokers (IB) API
After implementing and backtesting our Machine Learning-based trading strategies, we will begin paper trading the Combined Portfolio Strategy, which combines our three individual Machine Learning strategies with optimal weights. The purpose of paper trading is to evaluate the performance of our Combined Portfolio Strategy on real market data. For paper trading, we used the Live Execution feature on QuantConnect, which allows us to connect to many brokers, including an Interactive Brokers (IB) Paper Account with 1 million U.S. Dollars (paper money) as the initial capital.
2.2 Implementation
As Quantopian had a comprehensive dataset of prices, fundamentals, and macroeconomic indicators for the U.S. equity and futures markets, we initially developed our strategies on Quantopian. However, after its shutdown in November 2020, we re-implemented our code and executed our trades on QuantConnect. In the following sections, we introduce our implementations on QuantConnect.
2.2.1 Implementation of Trading Strategies using QuantConnect
Weight Allocation for Machine Learning Model-free Harry Browne Permanent Portfolio
In passive investment, selecting profitable securities with proper allocation has the most significant effect on performance. In our project, we decided to adopt the Harry Browne Permanent Portfolio as the passive investment target for the following two reasons:
1. Compared to holding a 100% S&P 500 Index ETF or a 60/40 Portfolio (60% in stocks and 40% in bonds), the Permanent Portfolio invests in more asset classes, such as gold, to hedge the risk from stocks and bonds.
2. Compared to the Ray Dalio All Weather Portfolio (30% stocks, 55% bonds, 7.5% gold, and 7.5% commodities), the Permanent Portfolio does not invest in the commodity sector, which lost 9.37% annually over the past 10 years [28].
With the advancement of technology and the development of algorithmic trading, current market conditions are totally different from those of the 1980s, when Harry Browne proposed the Permanent Portfolio. Therefore, we made some changes to the original portfolio by implementing the following steps:
1. Set the starting date to December 1, 2004, as all four components had been incepted by then.
2. Compute the EWMA volatility for the components and calculate the weights accordingly.
3. Compute the difference (∆w_i) between the existing weight and the new weight to be allocated for each component.
4. The number of shares of ETF i to buy (or sell, if negative) is calculated as
   \dfrac{\Delta w_i \cdot \text{Total Portfolio Value (NAV)}}{\text{Current Price of ETF } i}.
5. Repeat the process for every month until March 31, 2021.
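Steps 3 and 4 above can be sketched as follows; fractional share counts are returned, and rounding to whole shares is omitted for simplicity:

```python
def rebalance_orders(nav, target_weights, current_weights, prices):
    """Number of shares of each ETF to buy (negative = sell) so the
    portfolio moves from its current weights to the target weights:
        shares_i = delta_w_i * NAV / price_i"""
    orders = {}
    for etf, w_target in target_weights.items():
        delta_w = w_target - current_weights.get(etf, 0.0)
        orders[etf] = delta_w * nav / prices[etf]
    return orders
```

For example, moving SPY from a 25% to a 40% weight in a $100,000 portfolio at a price of $400 per share requires buying 0.15 × 100,000 / 400 = 37.5 shares.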
Weight Allocation for Machine Learning-based Harry Browne Permanent Portfolio by Long Short-Term Memory (LSTM) Model
In the project, we applied an LSTM model to actively predict stock prices, instead of allocating weights simply based on volatility as in the Machine Learning model-free approach. Starting from December 2004, for every month we predicted the stock prices one month ahead using the existing data. Then, we calculated the corresponding expected returns and the variance-covariance matrix for the components of the Harry Browne Permanent Portfolio. To find the optimal weights for the passive investment, we implemented the following steps:
1. Split the existing stock price data for one of the ETFs (e.g., SPY) into training and testing sets with an 80/20 ratio.
2. Train the LSTM model to predict the stock price on the training data.
3. Validate the model result on the testing data.
4. Apply the trained model for the prediction of the future one-month stock movement.
5. Repeat Steps 1 to 4 for the other three components.
6. Calculate the expected monthly return by considering how much the price grows (or declines) over the one-month forecast.
7. Compute the variance of each ETF and the covariances between them, deriving a variance-covariance matrix based on the forecast price movements.
8. Input the expected monthly returns and the variance-covariance matrix into Markowitz mean-variance portfolio theory to obtain the weights.
9. Repeat Steps 1 to 8 for each month starting from December 2006 (after a 2-year warm-up period).
Note that in the first two years, we applied volatility targeting to determine the weights, as the existing data was not sufficient to make predictions.
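As an illustration of Step 8, one common closed-form solution of the Markowitz problem is the unconstrained tangency (maximum-Sharpe) portfolio, w ∝ Σ⁻¹μ. The report does not state which constraints were used, so this particular variant is an assumption:

```python
import numpy as np

def mean_variance_weights(expected_returns, cov):
    """Unconstrained maximum-Sharpe (tangency) weights, w proportional to
    inv(Sigma) @ mu, normalized to sum to 1. A sketch only: short positions
    are not ruled out, and the risk-free rate is assumed to be zero."""
    mu = np.asarray(expected_returns, dtype=float)
    sigma = np.asarray(cov, dtype=float)
    w = np.linalg.solve(sigma, mu)     # solves Sigma w = mu without inverting
    return w / w.sum()
```

With two uncorrelated assets (expected returns 2% and 1%, variances 0.04 and 0.01), the weights come out to 1/3 and 2/3: the second asset earns half the return at a quarter of the variance, so it receives the larger allocation.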
Stock Selection Using NLP
Currently, there are over 10,000 stocks with corresponding 10-K reports in the database of the SEC’s Electronic Data Gathering, Analysis, and Retrieval (EDGAR) website. To download all the necessary 10-K reports, we follow these steps:
1. Gather a list of tickers and central index keys for all the stocks you are interested in.
2. Use the Beautiful Soup package to scrape the URL addresses of all 10-K reports.
3. Use the Requests package to download all the documents in each 10-K filing. Note that there are multiple documents filed with every 10-K report, but only one of them contains the text of the 10-K annual report.
4. Check each document for its sequence and type to find the correct document.
After downloading the 10-K reports, they need to be cleaned and processed before they can be analyzed. It is important to note that all complete 10-K filings are in HTML format. Therefore, the first step is to remove all the HTML tags. The next step is to lowercase the entire text. After that, it is necessary to normalize the text data, because there are Unicode artifacts introduced when the data was encoded during the download. Finally, we can start computing the similarity scores between consecutive 10-K reports for a particular company.
To get the sentiment score, we convert each 10-K report into a Term Frequency-Inverse Document Frequency (TF-IDF) vector representation and a Bag-of-Words (BoW) vector representation, with the vocabulary being the Loughran-McDonald Master Dictionary.
QuantConnect doesn’t have 10-K report text data in its system. Because of this, the computation of the similarity scores between 10-K reports has to be done locally. From there, the similarity score dataset needs to be uploaded to Dropbox before it can be imported into QuantConnect.
Trading Strategy
1. Every day, check if there is a new 10-K report for a stock in the universe.
2. For each 10-K report, check the similarity score between the current report and the previous year’s report.
3. If the similarity score > 0.9, long the stock; otherwise, short the stock.
4. At the start of every year, compute the weight of each stock in the portfolio using market-value weighting.
Pairs Trading with PCA and Clustering
Pairs Selection
First of all, we needed to find suitable pairs for pairs trading. To find such stock pairs, we implemented the following steps:
1. Select the stock tickers of the top 1,500 U.S. stocks by market cap.
2. Extract the stock prices of these 1,500 stocks for the past 2 years.
3. Apply PCA on the stock prices and obtain the top 50 principal components.
4. Use the top 50 principal components, market cap, and financial health (D/E ratio) as the features.
5. Run DBSCAN with a chosen maximum distance (epsilon) and minimum number of samples per cluster.
6. For each cluster, evaluate all possible pairs within the same cluster.
7. Conduct the co-integration test on all possible pairs and select the 2 pairs with the lowest p-values.
8. Repeat Steps 1 to 7 every 6 months.
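For illustration, Step 5 can be implemented with a minimal DBSCAN over the feature matrix; in practice a library implementation (e.g., scikit-learn's) would be used, so this is only a sketch of the algorithm itself:

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise).
    points: (n, d) feature matrix (e.g., PCA loadings + market cap + D/E)."""
    n = len(points)
    labels = np.full(n, -1)
    # Pairwise Euclidean distances; a point counts as its own neighbor.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neighbors[i]) < min_samples:
            continue                      # not an unvisited core point
        stack = [i]                       # grow a new cluster from core point i
        visited[i] = True
        while stack:
            j = stack.pop()
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:   # only core points expand
                for k in neighbors[j]:
                    if not visited[k]:
                        visited[k] = True
                        stack.append(k)
        cluster += 1
    return labels
```

Two dense groups of points end up in separate clusters while an isolated point is labeled noise (-1), which is exactly how unpairable stocks get filtered out before the co-integration tests.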
Trading Strategy
In this section, we mainly focus on the trading strategy after the pairs have been selected. Once the pairs are selected, the next step is to check every day whether the daily spread is statistically significant relative to its 20-day moving average. We define a Z-score to assess the deviation:
Z\text{-score} = \dfrac{S_t - \bar{S}}{\sigma_{S,20}},

where S_t is the spread on day t, \bar{S} is the average spread over the past 20 days, and \sigma_{S,20} is the standard deviation of the spread over the past 20 days.
There are three important things to note:
1. Trading Signals:
(a) Buy spread if Z-score is below -0.5 (Buy stock Y and sell stock X)
(b) Sell spread if Z-score is above 0.5 (Sell stock Y and buy stock X)
2. Exit Position:
(a) Unwind position when Z-score hits +0.1 or -0.1.
(b) There is generally a trade-off between profit per trade and number of trades when setting the trading threshold level
3. Trade-offs to consider:
(a) A high threshold → higher profit per trade, but fewer trades
(b) A low threshold → more trades, but lower profit per trade
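The Z-score and the entry/exit rules above can be sketched with pandas rolling windows; the `spread` series here is simulated, and ENTRY/EXIT mirror the ±0.5 and ±0.1 thresholds.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
spread = pd.Series(np.cumsum(rng.normal(scale=0.1, size=300)))  # toy spread

mean20 = spread.rolling(20).mean()   # 20-day spread moving average
std20 = spread.rolling(20).std()     # 20-day spread standard deviation
zscore = (spread - mean20) / std20

ENTRY, EXIT = 0.5, 0.1
signal = pd.Series(0, index=spread.index)  # +1 long spread, -1 short, 0 flat
position = 0
for t in range(len(spread)):
    z = zscore.iloc[t]
    if np.isnan(z):          # the first 19 days have no full 20-day window
        continue
    if position == 0:
        if z < -ENTRY:
            position = 1     # buy spread: buy stock Y, sell stock X
        elif z > ENTRY:
            position = -1    # sell spread: sell stock Y, buy stock X
    elif abs(z) <= EXIT:
        position = 0         # unwind once the spread reverts to the mean
    signal.iloc[t] = position
```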
2.2.2 Visualization and Automated Trading System
Visualization System
The visualization system on QuantConnect is illustrated in Figures 8, 9, and 10. The report can be generated after a backtest has finished.
Figure 8: Illustration of QuantConnect Visualization Report (Page 1)
Figure 9: Illustration of QuantConnect Visualization Report (Page 2)
Figure 10: Illustration of QuantConnect Visualization Report (Page 3)
After developing the strategies on Quantopian and QuantConnect, we built a visualization system that demonstrates the key performance metrics more clearly than the ones provided by the platforms. Our custom visualization system uses interactive plots created with the Python library Plotly, embedded in an HTML webpage via the Python library Dash.
After obtaining the portfolio value (or Net Asset Value, NAV) from the trading strategies, we processed it to generate the following graphs with Plotly. Then, we embedded all the interactive plots into a webpage with Dash:
Figure 11: Upper Part of The Visualization Webpage
Figure 12: Lower Part of The Visualization Webpage
The visualization page can be split into the following 8 parts:
1. Basic Measures and Statistics
In this part, we included metrics such as CAGR, Sharpe Ratio, and Volatility. We also added the MAR Ratio, which is defined as CAGR divided by Max Drawdown. It indicates, on average, how long it takes to recover from the biggest drawdown: if the MAR Ratio is 0.5, it will on average take 1/0.5 = 2 years to recover from the max drawdown. Moreover, we summarized the total percentage of days, months, and years that earned money (i.e. had positive returns) to see whether the strategy could earn money most of the time.
Figure 13: Basic Metrics of The Strategy
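A hypothetical helper reproducing these basic measures from a daily NAV series (CAGR, annualized volatility, Sharpe Ratio, Max Drawdown, and the MAR Ratio as CAGR divided by Max Drawdown); the NAV used here is synthetic.

```python
import numpy as np

def basic_metrics(nav, periods_per_year=252):
    """nav: array of daily Net Asset Values of the strategy."""
    nav = np.asarray(nav, dtype=float)
    returns = nav[1:] / nav[:-1] - 1.0
    years = len(returns) / periods_per_year
    cagr = (nav[-1] / nav[0]) ** (1.0 / years) - 1.0
    vol = returns.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year)
    peak = np.maximum.accumulate(nav)
    mdd = ((peak - nav) / peak).max()          # max drawdown as a fraction
    mar = cagr / mdd if mdd > 0 else float("inf")
    return {"CAGR": cagr, "Vol": vol, "Sharpe": sharpe, "MDD": mdd, "MAR": mar}

# a synthetic NAV that doubles over 2 years: CAGR = 2^(1/2) - 1, about 41.4%
m = basic_metrics(np.linspace(1.0, 2.0, 2 * 252 + 1))
```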
2. Heatmap
We also constructed a heatmap to identify any specific month, or series of months, in which our portfolio performed differently throughout the backtesting period. For example, this portfolio performed poorly from February 2015 to December 2015.
Figure 14: Heatmap of The Monthly Returns
3. Equity Curve
For the equity curve, we added the standard Harry Browne Permanent Portfolio as another benchmark. As Figure 16 shows, the chart is interactive and supports zooming in on any period.
Figure 15: Interactive Equity Curve
Figure 16: Interactive Equity Curve for The Recent 6 Years After Zooming in
4. 12-Month Rolling Return
Although it is common to lose money in a month, losing money over a whole year is something we want to prevent. It is therefore important to understand the portfolio's performance over the most recent 12 months, as the annual return for each calendar year shown in Figure 14 might not be a sufficient indicator of how the portfolio performed recently. Hence, it is beneficial to visualize the performance over the trailing year. In Figure 17, this portfolio had an 8% loss from April 2008 to April 2009.
Figure 17: 12-Month Rolling Return
5. Drawdown
Since suffering losses makes people panic and unwilling to commit capital to the financial market, revealing the historical record of the maximum capital loss can significantly improve confidence in investing. Therefore, in Figure 18, we can observe all the large drawdowns that ever happened in the strategy and analyze their root causes after identifying when each drawdown occurred.
Figure 18: Drawdown Diagram
Moreover, we created a table listing the top 5 worst events throughout the backtesting period. For instance, this portfolio had an 18.6% drawdown from January 23, 2015 to January 11, 2016, which matches our observation in the heatmap.
Figure 19: Drawdown Summary Table
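The drawdown series underlying both the diagram and the summary table can be computed from the NAV's running peak; the NAV here is a small made-up example.

```python
import numpy as np

def drawdowns(nav):
    nav = np.asarray(nav, dtype=float)
    peak = np.maximum.accumulate(nav)       # running all-time high
    dd = (nav - peak) / peak                # <= 0; equals 0 at a new high
    trough = int(dd.argmin())               # day of the maximum drawdown
    start = int(nav[:trough + 1].argmax())  # the peak it fell from
    return dd, float(dd.min()), start, trough

nav = np.array([100.0, 110.0, 99.0, 104.5, 121.0, 96.8, 108.9, 133.1])
dd, mdd, start, trough = drawdowns(nav)
# worst episode: from day 4 (peak 121.0) to day 5 (trough 96.8), a 20% loss
```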
6. Volatility
As previously mentioned, it is important to know the volatility of the portfolio: a more volatile portfolio has a higher chance of losing a given amount of money in the coming period. In the project, we adopted 3- and 12-month windows for measuring volatility. Since volatility exhibits a clustering effect, examining both the short-term (3-month) and the long-term (12-month) volatility gives us a better picture of the market condition.
Figure 20: Volatility Diagram
7. 12-Month Rolling Beta
As the portfolio is susceptible to market movements, we measure beta to examine the portion of risk explained by the market. In the project, we adopted the 12-Month Rolling Beta as the indicator.
Figure 21: Rolling Beta
8. Weight Allocation
As allocating capital properly is the key factor in passive investment, it is a major concern for us to see how the allocation evolves over different periods of time. In this portfolio, we invested in SPY, TLT, GLD, and SHY.
Figure 22: Weight Allocation
Automated Trading
After all the strategy-wise parts were done, we executed the trades on an Interactive Brokers paper trading account with our auto-trading system. Specifically, we executed our Combined Portfolio Strategy, developed on QuantConnect, through Interactive Brokers using the live execution functions in QuantConnect. We started the 1-month paper trading period on April 14 to examine the performance of our strategies in the real-time market, and we will present the results during the oral presentation.
2.3 Testing
2.3.1 QuantConnect Platform Testing
For all of our trading strategies, we created them using the QuantConnect platform. More specifically, this means that we relied on the platform's backtesting system and financial data to implement our Machine Learning-based trading strategies and verify their performance. The advantage of this platform is convenience and ease of use: we did not have to develop our own backtesting system. Moreover, accurate financial data is costly and requires proper infrastructure for storage and quick access. On the other hand, the downside of relying on QuantConnect is that we have to trust that its backtesting system is properly implemented and that its data is accurate. Overall, we believe the benefits of using QuantConnect outweigh the possible costs, as examining strategies on accurate financial data is the top priority of the project.
For our trading strategies, we required the daily open, high, low, and close prices of the assets we traded to generate our trading signals. Hence, to ensure that the data we used was correct, we developed our own data-cleaning tool, which analyzed the adjusted closing price history of a stock using multiple data sources, including Bloomberg and Thomson Reuters. We concluded that while there were some minor discrepancies in the data, for the most part (about 99% of the data), the differences were negligible.
To verify that QuantConnect’s backtesting system works as described, we conducted testing according to the established practices of software testing including grey-box testing and unit testing.
This is because we only have partial information on the internal structure of QuantConnect’s backtesting system based on the documentation they provided for developers. For example, we developed basic trading algorithms that included buying and holding stocks at specific schedules.
From there, we verified the list of orders made in the backtest to see if it matched what we expected for that specific trading algorithm. Aside from order executions, another aspect that we tested involved the performance metric calculations. First, we calculated the expected performance metrics for simple trading algorithms, then we compared the results with QuantConnect's backtesting results. Our tests showed that QuantConnect's backtesting system worked as described in their documentation.
2.3.2 Individual Machine Learning Strategies Testing
Machine Learning-based Harry Browne Permanent Portfolio
Testing for the Machine Learning-based Harry Browne Permanent Portfolio strategy was done through a combination of unit testing and integration testing. First off, the functions used to process and manipulate the data have been debugged and verified to ensure that they work properly.
Aside from this, we assessed and trained our LSTM models using measures of prediction accuracy like root mean square error (RMSE).
Beyond developing the LSTM model, we also manually checked that the function used to compute the variance-covariance matrix required for Markowitz mean-variance portfolio theory produced correct results.
Next, we conducted unit testing on the program that allocates the portfolio weights for each ETF depending on optimal portfolio theory.
After we tested that all of the individual programs worked separately, we conducted integration testing within QuantConnect's backtesting system. We used the platform's log function to check that all programs still worked as intended. Finally, we also ensured that the orders were executed only on the appropriate schedule. This trading strategy is supposed to be rebalanced at the beginning of each month. However, we discovered that if the first weekday of the month was a holiday, the backtesting system would fail to execute a trade for that specific month. We solved this problem by using the holiday calendar within the pandas package to track which holidays fall on the first trading weekday of the month.
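A sketch of that fix with pandas' holiday utilities: `CustomBusinessMonthBegin` with `USFederalHolidayCalendar` yields the first trading weekday of each month even when day one is a holiday. (The federal calendar only approximates the NYSE calendar, e.g. it omits Good Friday, so this is illustrative rather than the exact production logic.)

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessMonthBegin

# first business day of each month, skipping weekends and U.S. holidays
cbmb = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
rebalance_days = pd.date_range("2021-01-01", "2021-12-31", freq=cbmb)
# Jan 1, 2021 (a Friday) is New Year's Day, so the January rebalance
# slips to Monday, Jan 4 instead of silently being skipped.
```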
Stock Selection Using NLP
Testing for the NLP Stock Selection strategy was done through a combination of unit testing and integration testing. In the beginning, we gathered the 10-K raw text data from the SEC Edgar
Database using their API. Afterward, we processed the data into a vector representation of the text document as described in section 2.2.1. Unit testing was conducted for all the functions used to complete this transformation to ensure that the final vector representation of the text is accurate.
After that, we stored the vector representation of each 10-K document on the school's web server for FYP projects. In the next stage, we computed the similarity score between successive 10-K reports for a particular company. To ensure that the signals were accurate, we tested the functions that compute the Jaccard and Cosine Similarity.
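The two similarity measures in a minimal form: Jaccard on sets of terms and cosine on the vector representation. The sample inputs below are made up.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two sets of terms: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

s = jaccard({"risk", "factors", "increased"}, {"risk", "factors", "reduced"})
c = cosine([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])   # parallel vectors score 1.0
```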
Lastly, we implemented the trading strategy described in section 2.2.1. For testing, we conducted integration testing between QuantConnect’s backtest system and the helper functions used to implement our trading strategy. We used the platform’s log function to check that all programs function as intended. In the end, we went over the order history to ensure that the orders made in the backtest followed our algorithm: Only execute a buy or sell order if, on the previous trading day, the company released their annual report. By implementing the strategy properly, we prevented look-ahead bias.
Pairs Trading with PCA and Clustering
We conducted unit testing for the functions that manipulated and processed the data to ensure that the signals they generated were correct. Specifically, we verified that, for our universe of stocks, the PCA and clustering algorithm identified groups of stocks that were similar in market cap, financial health, and industry. Aside from this, we tested our program to make sure that the trading logic described in Section 2.2.1 has been properly implemented.
Beyond this, we performed integration testing between the functions used with the QuantConnect’s backtesting system to ensure that the final output is correct. QuantConnect has a feature that allows us to log information during a backtest, which can be downloaded afterwards as a txt file.
We used this to ensure that the outputs of each function were correct. Finally, we also checked that orders were only executed when the appropriate signal was generated (i.e. when the spread was statistically significant).
2.3.3 Combined Portfolio Testing
After properly testing the individual strategies in the previous steps, we examined whether the combined portfolio, with equal weight on each strategy, produced the result we expected. However, the return of a combined portfolio is not simply the weighted sum of the returns of the individual strategies, due to the compounding effect over time and the differing holdings of each strategy. Therefore, we created three strategies that each simply buy and hold a different stock. Then, we combined the strategies and ensured the combination worked properly. Next, we recorded the holding of each stock in each strategy with the self.Plot function on QuantConnect (see Figures 23 and 24 for illustration). After viewing the corresponding holdings, we conducted unit testing and verified the correctness of the combined portfolio.
Figure 23: Code for Plotting Necessary Variables
Figure 24: The Custom Variable Tracker from The Self.Plot Code
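The compounding pitfall described above can be seen in a two-period toy example: with periodic rebalancing back to target weights, the combined NAV comes from compounding the weighted returns, which is not the same as a weighted sum of the individual NAVs.

```python
import numpy as np

nav1 = np.array([1.0, 1.1, 1.21])   # strategy A: +10% each period
nav2 = np.array([1.0, 0.9, 0.81])   # strategy B: -10% each period
w = np.array([0.5, 0.5])            # equal-weight combination

navs = np.vstack([nav1, nav2])
r = navs[:, 1:] / navs[:, :-1] - 1.0        # per-strategy period returns
port_r = w @ r                              # rebalanced portfolio return
combined_nav = np.concatenate([[1.0], np.cumprod(1.0 + port_r)])

# a naive weighted sum of NAVs is a buy-and-hold mix: it drifts once the
# strategies diverge, so the two curves disagree from period 2 onward
naive_nav = w @ navs
```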
2.3.4 Visualization System Testing
Unit testing was conducted as we created each graph on the page. Before embedding a graph, we fed it simulated data and checked whether it showed what we expected. For the summary statistics, we ensured that results such as CAGR, Sharpe Ratio, and MAR Ratio matched both the QuantConnect results and our own calculations based on Bloomberg data for a buy-and-hold S&P 500 Index ETF strategy. For the heatmap, we input simulated data with a clear pattern of growth and decline to check that it displayed properly. For the other graphs, we first compared them with the QuantConnect results, and then we also computed the corresponding numbers for each time step. Overall, all the code has been properly debugged and implemented.
2.3.5 Automated Trading System Testing
For the automated trading system on QuantConnect, we did testing for two parts:
1. Dashboard on QuantConnect:
An important aspect of trading is ensuring that you receive accurate, up-to-date trade statistics such as unrealized profit; otherwise, you cannot keep track of the overall performance of your strategy. To address this, we also performed end-of-day portfolio calculations using the trade data from Interactive Brokers and price data from Bloomberg.
Figure 25: QuantConnect Automated Trading Dashboard
Besides, there are three statuses during live trading: "launch" (when the algorithm first connects to the IB account), "runtime error" (when an error occurs in trading), and "stopped" (when the user pauses the connection). See Figures 26, 27, and 28 for details.
Figure 26: QuantConnect Automated Trading Dashboard (Status = Launch)
Figure 27: QuantConnect Automated Trading Dashboard (Status = Runtime Error)
Figure 28: QuantConnect Automated Trading Dashboard (Status = Stopped)
2. Trade executions through the code on QuantConnect:
Although we can automate execution using the Live Trading feature in QuantConnect, it is essential to still manually review each trade made by the algorithm to ensure it is working as intended. If we fail to do this, we could potentially lose a lot of money, since the backtest results would no longer be relevant. Our solution is to keep a record of all trades in an Excel file and, after 3 days, run a backtest on QuantConnect to check whether or not each trade should have been executed. Figure 29 shows what the orders look like, and Figure 30 illustrates the equity value of the strategy on a certain day.
Figure 29: Orders on QuantConnect Automated Trading Dashboard
Figure 30: Equity Value on QuantConnect Automated Trading Dashboard
To conclude, the testing of each aspect of the project has been completed, and all of the implementations worked properly as we proposed.
2.4 Evaluation
In this section, we evaluate our strategies' performance by answering the following six questions.
Note: In Questions 1 to 3, we discuss the profitability of the three Machine Learning-based strategies. As the return and Max Drawdown can be amplified by leverage, we place more emphasis on the Sharpe Ratio and the MAR Ratio, which are independent of financial leverage.
1. How is the performance of the Harry Browne Permanent Portfolio?
Machine Learning Model-free
On QuantConnect, we obtained an 18.07% Compounded Annual Growth Rate (CAGR), a Sharpe Ratio of 1.08, and a Max Drawdown of 26.10% over 188 months of backtesting from December 2004 to August 2020, with an initial capital of 10 million U.S. Dollars under 2x leverage.
Figure 31: Backtest Result of Harry Browne Permanent Portfolio from December 2004 to August 2020
Machine Learning Model-based (LSTM)
For the Machine Learning model-based strategy, we obtained an annualized return of 20.4%, a Sharpe Ratio of 1.25, and a Max Drawdown of 18.00% from December 2004 to March 2021.
Figure 32: Backtest Result of Machine Learning Model-based Harry Browne Permanent Portfolio from December 2004 to March 2021
Comparing the results, the strategy with stock price prediction from the LSTM has a higher annualized return (around 2% higher) and a smaller Max Drawdown over the past 16 years. We believe the LSTM's predictive power on stock prices helped us identify some downturns, as we did not invest in any asset with a negative expected return when constructing the Markowitz Mean-Variance Portfolio.
For example, the Machine Learning model-based strategy was almost unaffected by the coronavirus crash in February and March 2020. In contrast, the strategy based purely on volatility targeting had a monthly loss of 4.9% in February 2020. This indicates how predicting asset performance can improve results during a financial crisis.
2. How is the performance of the "Stock Selection Using NLP" strategy?
Figure 33: Backtest Result of Stock Selection Using NLP from December 2004 to March 2021
As shown in Figure 33, we reaped an annualized profit of 14.8%, a Sharpe Ratio of 1.06, and a Max Drawdown of 33.4% over the past 16 years since December 2004. During the same period, SPY generated a CAGR of 9.85%, a Sharpe Ratio of 0.62, and a 55.10% Max Drawdown. The Sharpe Ratio and the CAGR of the NLP stock selection strategy are 72% and 50% higher than those of SPY, respectively. Overall, this illustrates that the NLP-based stock selection technique consistently beats the U.S. stock index over a long period of time.
Figure 34: Long-Short Exposure of Stock Selection Using NLP from December 2004 to March 2021
Looking at the long-short exposure of the NLP stock selection trading strategy, the strategy appears to have relatively good predictive power. This is because, throughout most of the 16 years of backtesting, the strategy was shorting a certain portion of
stocks in the S&P 500. As a result, the strategy achieved the original goal of enhancing the
buy and hold market index strategy by identifying which stocks are likely to underperform
in the coming year.
3. How is the performance of the "Pairs Trading With PCA and Clustering" strategy?
Although we had finished the backtesting on Quantopian, after its shutdown we replicated the results on QuantConnect. Originally, we planned a complete backtest from December 2004 to March 2021; however, due to insufficient computational power and the limit on the total number of trades in a backtest, we split it into 4 parts:
(a) December 1st, 2004 to December 31st, 2008 (See Figure 35)
(b) January 1st, 2009 to December 31st, 2012 (See Figure 36)
(c) January 1st, 2013 to December 31st, 2016 (See Figure 37)
(d) January 1st, 2017 to March 31st, 2021 (See Figure 38)
Note that, although we had to split the whole duration into 4 periods due to the technical issues, we still needed to start every backtest on the first trading day of January or July in order to select pairs for trading.
Figure 35: Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from December 2004 to December 2008
Figure 36: Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January 2009 to December 2012
Figure 37: Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January 2013 to December 2016
Figure 38: Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January 2017 to March 2021
After performing the calculations on its equity values over time, the summary statistics of pairs trading are provided in Table 5.
Table 5: Summary of Pairs Trading With PCA and Clustering
CAGR    Sharpe Ratio    Max Drawdown
2.97%   0.22            31.6%
Although the overall summary statistics for Pairs Trading are poor, we noticed that the strategy has significant crisis alpha, as shown in Figures 39, 40, and 41. As it also had a positive return over time, holding a portion of this strategy could help offset the big drawdowns from other components during the crises.
Figure 39: Pairs Trading Result From 2007 to 2008 (1st Part of The Global Financial Crisis)
Figure 40: Pairs Trading Result From 2007 to 2008 (2nd Part of The Global Financial Crisis)
Figure 41: Pairs Trading Result In 2020 (The Coronavirus Crisis)
To conclude, the Pairs Trading strategy performs poorly as a standalone strategy. However, it is a beneficial component inside a portfolio, as it can offset the risk of the other "profitable" strategies during downturns.
4. Is there any discrepancy between the backtesting results from QuantConnect
and Quantopian? If yes, why and how do we solve it?
As we had already finished the Machine Learning model-free Harry Browne Permanent Portfolio strategy before the shutdown of Quantopian, we also obtained some results from Quantopian. On Quantopian, with 2x leverage, the strategy had a 19.22% Compounded Annual Growth Rate (CAGR), a Sharpe Ratio of 1.26, and a Max Drawdown of 21.83% over 188 months of backtesting, with an initial capital of 10 million U.S. Dollars, from December 2004 to August 2020.
It is clear that there are some discrepancies between the results from the two platforms, and we attribute them to two reasons:
(a) The transaction costs on QuantConnect may be closer to those in reality, as we assumed a $1 fixed cost per trade on Quantopian, so the returns from QuantConnect are lower.
(b) The two platforms use different data sources. Quantopian provided its own data, as it was a hedge fund; QuantConnect's data comes from a data provider named QuantQuote.
As the information provided by QuantConnect is more transparent (e.g., the data source and the source code of the backtesting system), we trust the results from QuantConnect more. Nonetheless, the results on both platforms showcase a strategy significantly better than putting all the capital into SPY.
5. How did we determine the hyper-parameters in the model?
Stock Selection Using NLP
In the strategy, for every new 10-K report we measured its sentiment similarity score with
the previous year’s 10-K report. The idea behind this is that if the sentiment similarity is
low then the company is undergoing a lot of uncertainty at the moment, which normally
negatively impacts the price of the stock in the coming year. So if a company’s recent 10-K
report has a similarity score greater than the threshold then we long the stock until next
year, otherwise we short the stock. However, the question is what the threshold should be in order to maximize the Sharpe ratio.
On QuantConnect, we conducted a grid search on the threshold from 0.50 to 1.00 with a step size of 0.01. As Figure 42 shows, the strategy's performance, in terms of the equity curve, can differ greatly depending on the threshold. The reason we started at 0.5 is that if the threshold is too low, the strategy becomes meaningless, as it will simply long all the stocks.
Figure 42: Equity Curve for Stock Selection Using NLP for Different Thresholds from December 2004 to March 2021
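The shape of that grid search, in outline: sweep the threshold from 0.50 to 1.00 in steps of 0.01 and keep the Sharpe-maximizing value. `backtest_sharpe` below is a hypothetical stand-in for a full QuantConnect backtest (a toy curve peaking near 0.83).

```python
import numpy as np

def backtest_sharpe(threshold):
    # hypothetical proxy for the Sharpe ratio a backtest would report;
    # in the project each value required a full QuantConnect backtest
    return 1.0 - 8.0 * (threshold - 0.83) ** 2

thresholds = np.round(np.arange(0.50, 1.00 + 1e-9, 0.01), 2)  # 51 values
sharpes = [backtest_sharpe(t) for t in thresholds]
best = thresholds[int(np.argmax(sharpes))]
```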
From Figures 43, 44, and 45, we finally determined to set the threshold at 0.83, as it had the highest return, one of the highest Sharpe Ratios, and one of the lowest Max Drawdowns. Besides, in terms of sensitivity it is stable, as the points next to it had roughly the same return, Sharpe Ratio, and Max Drawdown.
Figure 43: CAGR for Stock Selection Using NLP for Different Thresholds from December 2004 to March 2021
Figure 44: Sharpe Ratio for Stock Selection Using NLP for Different Thresholds from December 2004 to March 2021
Figure 45: Max Drawdown for Stock Selection Using NLP for Different Thresholds from December 2004 to March 2021
Pairs Trading With PCA and Clustering
In the pairs trading strategy, there are three hyper-parameters: the entry Z-score, the exit Z-score, and how often to re-evaluate the co-integration relationship. In the following paragraphs, we introduce them one by one:
(a) Exit Z-Score:
In terms of the Z-Scores, we first fixed the entry Z-Score at 1.0 in order to fine-tune the exit Z-Score, as we believed we could lose a lot of money by not exiting trades at the right time. For the exit Z-Score, we tried out different values and obtained the results shown in Figures 46 and 47:
Figure 46: Summary Table for Pairs Trading With PCA and Clustering for Exit Z-Score = +0.2 and -0.2
Figure 47: Summary Table for Pairs Trading With PCA and Clustering for Different Exit Z-Scores
By trying out different values, we finally fixed our exit Z-Score at +0.1 and -0.1 for unwinding the position, as it generated the best result.
(b) Entry Z-Score:
After fixing the exit Z-Score at +0.1 and -0.1, we evaluated the performance of entry Z-Scores of (+1.0, -1.0), (+0.9, -0.9), (+0.7, -0.7), and (+0.5, -0.5). It turned out that setting the entry Z-Score at +0.5 and -0.5 yielded the best result, as Figure 48 shows.
Figure 48: Summary Table for Pairs Trading With PCA and Clustering for Different Entry Z-Scores
(c) Frequency for Co-integration Testing:
Lastly, as the statistical relationships between stocks might change due to different market conditions or unexpected events, we needed to re-evaluate the pairs after a certain amount of time. Therefore, we considered 3-month, 6-month, 9-month, and 12-month re-evaluation, and it turned out that 6-month re-evaluation had the best result, as shown in Figure 49.
Figure 49: Summary Table for Pairs Trading With PCA and Clustering for Different Frequency for Co- integration Testing
6. How is the performance of the final combined portfolio? Is it better or worse
than the benchmark?
We conducted a grid search to find the optimal trio (w1, w2, 1 − w1 − w2) such that the combined portfolio yields the best risk-adjusted return, as measured by the Sharpe Ratio and the MAR Ratio. In the grid search, we set the step size to 0.05 for both w1 and w2, and the equity curves, Sharpe Ratio, Max Drawdown, and CAGR are shown in Figures 50, 51, 52, and 53.
Figure 50: Equity Curve for The Combined Portfolio with Different Weights from December 2004 to March 2021
Figure 51: Sharpe Ratio for The Combined Portfolio with Different Weights from December 2004 to March 2021
Figure 52: Max Drawdown for The Combined Portfolio with Different Weights from December 2004 to March 2021
Figure 53: Annualized Return for The Combined Portfolio with Different Weights from December 2004 to March 2021
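The weight search can be sketched as enumerating (w1, w2) on a 0.05 grid with w3 = 1 − w1 − w2 and ranking the mixes by Sharpe Ratio. `sharpe_of` is a hypothetical stand-in for backtesting the combined NAV of a given mix.

```python
import itertools
import numpy as np

def sharpe_of(w1, w2, w3):
    # hypothetical proxy; the real value comes from backtesting the mix
    return 1.6 - 0.5 * w1 - 0.4 * w2

grid = np.round(np.arange(0.0, 1.0 + 1e-9, 0.05), 2)   # 21 grid points
trios = [(w1, w2, round(1.0 - w1 - w2, 2))
         for w1, w2 in itertools.product(grid, grid)
         if w1 + w2 <= 1.0 + 1e-9]                     # weights sum to 1
ranked = sorted(trios, key=lambda t: sharpe_of(*t), reverse=True)
```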
As our goal is to find the optimal combined portfolio with high risk-adjusted returns, we ranked the grid search results by Sharpe Ratio and obtained the following top 20 results out of 200 possibilities:
Table 6: Summary of Combined Portfolio
Rank  w1    w2    w3    CAGR    Sharpe Ratio  Max Drawdown  MAR Ratio
1     0.10  0.00  0.90  5.37%   1.62          4.1%          1.31
2     0.05  0.05  0.90  6.17%   1.62          3.8%          1.62
3     0.15  0.00  0.85  6.14%   1.59          5.3%          1.16
4     0.10  0.05  0.85  6.95%   1.59          4.3%          1.62
5     0.00  0.05  0.95  5.38%   1.58          3.4%          1.58
6     0.05  0.00  0.95  4.59%   1.57          3.8%          1.21
7     0.15  0.05  0.80  7.68%   1.54          4.9%          1.57
8     0.20  0.00  0.80  6.78%   1.52          6.0%          1.13
9     0.20  0.05  0.75  8.46%   1.49          5.9%          1.43
10    0.25  0.00  0.75  7.44%   1.45          7.4%          1.01
11    0.25  0.05  0.70  9.16%   1.44          6.8%          1.35
12    0.30  0.00  0.70  8.37%   1.43          6.8%          1.23
13    0.30  0.05  0.65  9.99%   1.41          7.8%          1.28
14    0.35  0.00  0.65  9.07%   1.37          7.9%          1.15
15    0.40  0.05  0.55  11.65%  1.37          10.0%         1.17
16    0.35  0.05  0.60  10.64%  1.37          9.0%          1.18
17    0.40  0.00  0.60  9.93%   1.36          8.4%          1.18
18    0.45  0.05  0.50  12.44%  1.34          11.2%         1.11
19    0.45  0.00  0.55  10.61%  1.32          9.5%          1.12
20    0.50  0.00  0.50  11.58%  1.31          10.4%         1.11
For our Combined Portfolio Strategy, we chose the one with the 13th highest Sharpe Ratio. The reason for this choice is that we wanted to find the right balance between CAGR and Maximum Drawdown. Our goal is to beat the U.S. stock index while minimizing risks such as the MDD. However, we cannot achieve our minimum return requirement by selecting a model in the top 10 by Sharpe Ratio: these models already use 2x leverage, which makes it difficult to achieve a higher return with them, given that the maximum leverage accepted by most brokers in the market is 2x.
Note that although the result of the Machine Learning-based Harry Browne Permanent Portfolio is already impressive on its own, diversifying with the other strategies could indeed improve its Sharpe Ratio and MAR Ratio by 18.0% and 17.5% respectively.
3 Discussion
3.1 Performance Comparison with S&P 500 Index
Table 7: Summary of All Models Performance
Model                           CAGR    Sharpe Ratio  MAR Ratio  Max Drawdown
Baseline Model (S&P 500 Index)  9.85%   0.62          0.18       55.1%
Model 1 (HBLSTM)                20.40%  1.25          1.13       18.0%
Model 2 (SSNLP)                 14.80%  1.06          0.44       33.4%
Model 3 (PTPC)                  2.97%   0.22          0.09       31.6%
Model 4 (Combined Portfolio)    9.99%   1.41          1.28       7.8%
The goal of this project was to create a portfolio that utilizes machine learning to achieve higher returns with lower risks than the S&P 500 Index. We have successfully achieved this goal, as Models 1 and 2 both have a higher CAGR, Sharpe Ratio, and MAR Ratio, and a lower MDD, than the Baseline Model. Our results suggest that different machine learning techniques can be effective in enabling a trading strategy to outperform the U.S. stock market index. However, this was not true for all of our trading strategies: Model 3 failed to outperform the Baseline Model, with only a 2.97% annual return.
Despite this, we were still able to combine all three strategies into Model 4, which has the highest Sharpe Ratio and MAR Ratio among all our models. The weights of Model 4 are 30% in Model 1, 5% in Model 2, and 65% in Model 3. Although the annual return of Model 4 is just barely above that of the S&P 500 Index, its Maximum Drawdown is almost seven times smaller than the Baseline Model's. This means that, on average, it only takes about 9 months for the trading strategy to recover from its trough. From an investor's perspective, this is ideal because the waiting time for the portfolio to rebound from a downswing isn't too long.
For all these reasons, we believe that we did achieve our goal by combining Machine Learning models and traditional investment strategies to outperform the U.S. stock index.
3.2 Performance during Black Swan Events
In this section, we define Black Swan Events as events that caused a drawdown of more than 30% in the S&P 500 Index, and we identified two such events: (1) the 2007-2009 Global Financial Crisis, and (2) the 2020 Coronavirus Pandemic Crisis. In the following parts, we show the performance of each strategy during these two periods.
3.2.1 Machine Learning-based Harry Browne Permanent Portfolio
1. 2007-2009 Global Financial Crisis
Figure 54: Performance of Machine Learning-based Harry Browne Permanent Portfolio During Global Financial Crisis
2. 2020 Coronavirus Pandemic Crisis
Figure 55: Performance of Machine Learning-based Harry Browne Permanent Portfolio During COVID-19 Pandemic
As shown in Figures 54 and 55, the Machine Learning-based Harry Browne Permanent Portfolio significantly outperformed the benchmark index during both the Global Financial Crisis and the COVID-19 pandemic. This can be explained by diversification, as the Harry Browne Permanent Portfolio invests in both risky and safe assets. Additionally, the LSTM prediction model enables a dynamic weighting strategy that adapts to market conditions.
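As a sketch of how predicted returns can drive such dynamic weights, the unconstrained Markowitz tangency solution is proportional to the inverse covariance matrix times the expected returns. The numbers below are illustrative placeholders, not our model's actual forecasts:

```python
import numpy as np

def mean_variance_weights(mu, cov):
    """Unconstrained Markowitz tangency weights: w proportional to
    inv(cov) @ mu, normalized to sum to one. A sketch only; a real
    allocator would also impose long-only and leverage constraints."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Hypothetical 1-month expected returns for the four Permanent Portfolio
# components (stocks, long-term bonds, gold, cash proxy).
mu = np.array([0.010, 0.004, 0.005, 0.001])
cov = np.diag([0.04, 0.01, 0.02, 0.0001])  # simplified: uncorrelated assets

w = mean_variance_weights(mu, cov)
print(np.round(w, 3))  # the low-variance cash proxy dominates the allocation
```

As the forecasts and covariances change each period, the resulting weights shift accordingly, which is the mechanism behind the dynamic weighting described above.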
3.2.2 Stock Selection Using NLP
1. 2007-2009 Global Financial Crisis
Figure 56: Performance of NLP Stock Selection Strategy During Global Financial Crisis
2. 2020 Coronavirus Pandemic Crisis
Figure 57: Performance of NLP Stock Selection Strategy During COVID-19 Pandemic
As shown in Figure 56, during the Global Financial Crisis, the NLP Stock Selection Strategy significantly outperformed the benchmark index. Meanwhile, during the 2020 COVID-19 pandemic, as illustrated in Figure 57, the strategy performed slightly better than the U.S. stock market index during the market crash. However, during the recovery period, the trading strategy performed worse. We attribute the strategy's poor performance during the pandemic to poor timing: the NLP Stock Selection Strategy relies on annual 10-K reports to make judgments about a stock's performance in the coming year, but given the sudden nature of the pandemic, companies failed to adequately assess the dangers posed by COVID-19 in their annual reports.
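The year-over-year comparison of 10-K text that underlies the strategy can be illustrated with a simplified bag-of-words cosine similarity. This is a toy sketch only: our actual pipeline used trained sentiment models, and the excerpts below are invented:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two documents."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented excerpts standing in for consecutive 10-K risk sections.
last_year = "demand remains strong and liquidity is ample"
this_year = "demand remains strong and liquidity is ample"
changed = "demand has weakened and litigation risk increased materially"

print(round(cosine_similarity(last_year, this_year), 2))  # 1.0: unchanged report
print(round(cosine_similarity(last_year, changed), 2))    # low score: candidate short
```

A low similarity to last year's filing signals a meaningful change in disclosed risks, which the strategy treats as bearish.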
3.2.3 Pairs Trading with PCA and Clustering
1. 2007-2009 Global Financial Crisis
Figure 58: Performance of Pairs Trading with PCA and Clustering Strategy During Global Financial Crisis (2007-2008)
Figure 59: Performance of Pairs Trading with PCA and Clustering Strategy During Global Financial Crisis (2009 Onwards)
2. 2020 Coronavirus Pandemic Crisis
Figure 60: Performance of Pairs Trading with PCA and Clustering Strategy During COVID-19 Pandemic
As shown in Figure 58, during the first two years of the Global Financial Crisis, the Pairs Trading with PCA and Clustering Strategy outperformed the benchmark S&P 500 Index by a wide margin during the market crash. During the recovery stage, however, the strategy lagged the benchmark; this is expected, since a market-neutral strategy aims to generate positive alpha regardless of market conditions rather than to track a rising market. Its overall return, nonetheless, remained modest.
Meanwhile, during the 2020 COVID-19 pandemic, as illustrated in Figure 60, the strategy also outperformed the S&P 500 Index. Moreover, the strategy did not face any major downswings during this period. We therefore conclude that this strategy is a good crisis-alpha generator for hedging.
3.2.4 Combined Portfolio
1. 2007-2009 Global Financial Crisis
Figure 61: Performance of Combined Portfolio During Global Financial Crisis
2. 2020 Coronavirus Pandemic Crisis
Figure 62: Performance of Combined Portfolio During COVID-19 Pandemic
As shown in Figures 61 and 62, the Combined Portfolio outperformed the S&P 500 Index during both the Global Financial Crisis and the COVID-19 Pandemic. Aside from this, the strategy experienced no significant downside during these two periods. Overall, we can conclude that the Combined Portfolio earns profits even during crises.
3.3 Limitations and Challenges
Although the strategies perform well compared to the S&P 500 Index, there are still some limitations for each strategy that are worth discussing.
3.3.1 Machine Learning-based Harry Browne Permanent Portfolio
One possible limitation of using the LSTM model for stock price prediction is its inability to capture black swan events, in which stocks plummet by more than 30% or skyrocket by more than 20%. For example, as Figure 63 shows, the LSTM model captured neither the rise in January 2020 nor the plunge in March 2020 in the S&P 500 Index. As a result, the model may under- or over-estimate expected returns, underestimate variance, and consequently lose money.
Figure 63: Illustration of The Inability for LSTM to Predict Black Swan Events
3.3.2 Stock Selection Using NLP
For their backtesting system, QuantConnect imposes a maximum leverage limit of two. For our trading strategy, whenever the leverage ratio got close to two, a margin call would occur, and we were forced to liquidate some of our stock positions to raise cash and reduce the leverage. This negatively impacted the performance of the strategy by reducing our overall portfolio holdings, leading to lower returns. We encountered this problem because this investment strategy lacks a proper exit condition. Although we could avoid margin calls and forced liquidation by not using leverage, doing so would reduce the overall CAGR of the strategy.
Aside from this, notice that even though our strategy uses two times leverage, it still has a lower Max Drawdown, at 33%, than the S&P 500 Index's 55.10%. This suggests that the strategy is effective in mitigating systematic market risk by shorting certain stocks.
The reason we do not have an exit condition is that the research paper we based our strategy on did not have one. According to the authors, the market is unable to process the negative implications of a 10-K report with a low sentiment similarity score in the short term, so it is crucial to hold the stock position long enough for stock prices to reflect this "hidden" information. We can resolve this issue by creating a liquidation order for all stocks after a certain holding period has passed. By doing this, we can potentially reduce our risk exposure to market crashes and avoid margin calls, which can lead to non-optimal sell orders.
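The proposed fix can be sketched as simple holding-period bookkeeping (a standalone illustration; the live strategy would submit the actual orders through QuantConnect, and the 365-day horizon is an assumed parameter):

```python
import datetime as dt

class HoldingPeriodExit:
    """Bookkeeping for the proposed exit rule: liquidate each position once
    it has been held for `max_days` calendar days. Standalone illustration;
    the live strategy would submit the actual orders on QuantConnect."""

    def __init__(self, max_days=365):
        self.max_days = max_days
        self.entries = {}  # ticker -> entry date

    def open_position(self, ticker, date):
        self.entries[ticker] = date

    def positions_to_liquidate(self, today):
        """Return tickers whose holding period has expired and drop them."""
        expired = [t for t, d in self.entries.items()
                   if (today - d).days >= self.max_days]
        for t in expired:
            del self.entries[t]  # treated as liquidated
        return expired

exit_rule = HoldingPeriodExit(max_days=365)
exit_rule.open_position("AAA", dt.date(2020, 1, 2))
exit_rule.open_position("BBB", dt.date(2020, 6, 1))
due = exit_rule.positions_to_liquidate(dt.date(2021, 1, 15))
print(due)  # ['AAA'] has exceeded the one-year holding period
```

Running such a check at each rebalance would cap the holding period, reducing both leverage build-up and crash exposure.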
3.3.3 Pairs Trading with PCA and Clustering
For the Pairs Trading with PCA and Clustering strategy, we have two possible limitations:
1. Clustering:
We adopted DBSCAN as the clustering algorithm for the following reasons:
(a) No need to specify the number of clusters in advance: Unlike partitional clustering algorithms such as K-means, DBSCAN does not require us to specify the number of clusters. Even if we knew how many clusters the top 1,500 stocks belong to, the number of clusters would change over time due to ever-changing market conditions and the different stocks picked according to market capitalization; indeed, it is impossible to know the number of clusters at any given time step. Although we could use the Elbow Method to determine a proper K, doing so is inefficient in the fast-paced financial market, and the choice of K would no longer be machine-learning based but a subjective human judgment.
(b) Robustness to outliers: Unlike many clustering algorithms, such as K-means and Hierarchical Clustering with Single-Link Proximity, DBSCAN has a parameter specifying the minimum number of points in a cluster, so it can effectively eliminate the effect of outliers. As financial data is noisy compared to image or textual data, it is important to keep outliers outside the scope of the analysis.
However, there are still some disadvantages to using DBSCAN:
(a) Poor performance on data with varying density: Because the minimum number of points per cluster and the neighborhood distance are fixed rather than cluster-specific, DBSCAN works poorly when different clusters have different densities. In financial data, an example is the tech-stock cluster, where the market capitalizations of the top technology stocks can differ widely most of the time. With a fixed distance, we had to strike a balance between seizing opportunities in tech stocks and, if we increased the distance, spending more time evaluating the pairs within the cluster.
(b) Non-determinism for boundary points: As the only requirement is to find the point(s) satisfying the distance requirement, points lying on the boundary of two clusters might be grouped into different clusters at different times. Hence, the backtest result might differ from run to run.
2. Dimensionality Reduction:
Price data over the past two years can have around 500 dimensions, and clustering in such high-dimensional spaces can cause problems such as the curse of dimensionality [29]. However, PCA carries the implicit assumption that variance is highly correlated with information, so a limitation of PCA is that the dimensions it drops might contain important features. Therefore, before applying PCA, we should be confident that the remaining principal components are useful and meaningful for further analysis.
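To make the clustering behaviour discussed above concrete, here is a minimal, self-contained sketch of the DBSCAN idea (for illustration only; in practice a library implementation was used, and `eps` and `min_pts` below are arbitrary). It shows the two advantages claimed: the number of clusters is inferred from density, and sparse points are labelled as outliers (-1):

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, -1 meaning noise/outlier."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n)
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:      # not a core point: provisionally noise
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:      # noise reachable from a core point
                labels[j] = cluster  # ...becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels

# Two dense groups of 2-D stock feature vectors plus one distant outlier.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
       (20.0, 20.0)]
labels = dbscan(pts, eps=0.5, min_pts=3)
print(labels)  # two clusters found without specifying their number; outlier is -1
```

The boundary-point non-determinism noted in disadvantage (b) arises because a border point takes the label of whichever core point reaches it first.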
3.3.4 Combined Portfolio
In this section, we found an optimal fixed weight over the past 16 years by grid search. By doing so, we implicitly assume that all possible market conditions are represented in the past 16 years. However, this implicit assumption does not hold: three major black swan events in history, namely (1) the Great Depression of the 1920s-1930s, (2) Black Monday in 1987, and (3) the dot-com bubble in the early 2000s, are not captured in the data of the past 16 years. Besides, allocating capital to different strategies with fixed weights cannot be optimal, as the financial market is dynamic, ever-changing, and hard to predict. Therefore, we believe one limitation of our Combined Portfolio is its lack of flexibility to allocate different weights to different strategies at different times. Possible improvements include applying portfolio theory based on the returns and variance-covariance matrix, or implementing a reinforcement learning model and training an agent to allocate the capital.
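The fixed-weight search described above can be sketched as an exhaustive grid over weight combinations summing to one (the return series are invented, and the step size and the Sharpe-like objective are simplifications of our actual grid search):

```python
def sharpe_like(returns):
    """Mean over standard deviation of a return series (toy objective)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return mean / var ** 0.5 if var > 0 else float("-inf")

def grid_search_weights(strategy_returns, step=0.05):
    """Exhaustive search over fixed weights summing to one that maximize
    the Sharpe-like ratio of the combined return series."""
    n_steps = round(1 / step)
    best_weights, best_score = None, float("-inf")
    for w1 in range(n_steps + 1):
        for w2 in range(n_steps + 1 - w1):
            w3 = n_steps - w1 - w2
            weights = (w1 * step, w2 * step, w3 * step)
            combined = [sum(w * r for w, r in zip(weights, period))
                        for period in zip(*strategy_returns)]
            score = sharpe_like(combined)
            if score > best_score:
                best_weights, best_score = weights, score
    return best_weights, best_score

# Hypothetical monthly returns for the three strategies.
rets = [
    [0.02, -0.01, 0.03, 0.01],   # strategy 1: high return, volatile
    [0.01, 0.02, -0.02, 0.02],   # strategy 2
    [0.00, 0.01, 0.00, 0.01],    # strategy 3: low return, low risk
]
weights, score = grid_search_weights(rets)
print(weights, round(score, 3))
```

Because the weights are fitted once over the whole history, the search inherits exactly the in-sample limitation discussed above.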
4 Conclusion
4.1 Summary of Achievements
In our project, we developed three profitable strategies based on Machine Learning models:
1. Long Short-Term Memory (LSTM) Model
In our project, we implemented an LSTM to predict the future 1-month stock price. Then, we computed the expected return and the variance-covariance matrix for each asset in the Harry Browne Permanent Portfolio. The final step was to determine the weight of each asset based on the Markowitz Mean-Variance Portfolio Theory.
2. Natural Language Processing (NLP)
NLP was used to analyze companies' 10-K reports and determine which stocks are likely to underperform relative to the general U.S. stock market index in the coming year. We also used the sentiment similarity scores to long and short S&P 500 stocks, increasing the strategy's return and lowering its Max Drawdown relative to the passive strategy of buying and holding the S&P 500 Index.
3. Principal Component Analysis (PCA) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
After extracting two years of price data, we applied PCA to reduce the dimensionality of stock prices to 50 principal components, making it easier for DBSCAN, the clustering algorithm, to identify suitable pairs of stocks for pairs trading. Overall, DBSCAN clusters stocks into different groups according to the principal components, market capitalization, and the financial health of each firm. Then, we conducted the Cointegration Test to identify proper pairs for trading.
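The final cointegration step can be illustrated with a simplified Engle-Granger-style check (a sketch on simulated data; our project used a proper statistical test with critical values rather than this bare t-statistic):

```python
import numpy as np

def engle_granger_sketch(x, y):
    """Simplified Engle-Granger-style check: fit y = a + b*x by OLS, then
    run a Dickey-Fuller-style regression on the spread. A strongly negative
    t-statistic hints at mean reversion; a real test would compare it
    against proper critical values."""
    X = np.column_stack([np.ones_like(x), x])
    a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    spread = y - a - b * x
    lag, diff = spread[:-1], np.diff(spread)
    gamma = (lag @ diff) / (lag @ lag)  # slope of diff on lagged spread
    resid = diff - gamma * lag
    se = np.sqrt((resid @ resid) / (len(diff) - 1) / (lag @ lag))
    return b, gamma / se

rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(size=500))            # shared random-walk factor
x = common + rng.normal(scale=0.5, size=500)
y = 2.0 * common + rng.normal(scale=0.5, size=500)  # cointegrated with x

b, t_stat = engle_granger_sketch(x, y)
print(round(b, 2), round(t_stat, 1))  # hedge ratio near 2; t_stat strongly negative
```

A pair passing the test yields a mean-reverting spread, which is then traded when it deviates far from its mean.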
After developing the three Machine Learning-based trading strategies, we combined them into a Combined Portfolio with the optimal weights found by grid search, in order to maximize risk-adjusted return metrics such as the Sharpe Ratio and MAR Ratio while maintaining an annual return greater than that of the S&P 500 Index.
Aside from developing trading strategies, we also developed a visualization system with different types of graphs. It is useful for analyzing aspects of overall performance that are not captured by the visualization system on QuantConnect. Lastly, we also executed our Combined Portfolio through the automated system on QuantConnect, and began assessing its performance in the real-time market via an Interactive Brokers paper account with an initial capital of 1 million U.S. dollars, starting from April 14th.
4.2 Future Work
Although the project is complete, with better results than the U.S. stock market index after applying grid search to combine all three strategies we developed, we could extend our work in the following directions:
1. Reinforcement Learning-based Strategy Combination (See Section 11.1 Reinforcement Learning for further details): As mentioned in the Discussion, allocating capital to strategies dynamically according to macroeconomic conditions and the correlations between strategies could be a better way to combine them. Besides, building a Reinforcement Learning agent that learns from interacting with the environment is another way to optimize our final results. By building a proper environment for the agent to interact with and using the returns earned as feedback, we believe the overall portfolio would perform better.
2. Stock Price Prediction in Active Investment by LSTM (See Section 11.2 Stock Price Prediction in Active Investment by LSTM for further details): In our project, we used LSTM models only to predict the expected returns and variance-covariance matrix from the 1-month predicted values of the four components of the Harry Browne Permanent Portfolio. The technique could be extended to all stocks: after predicting the prices for the coming month, we could select stocks whose t-statistic (expected return over the standard deviation of returns) is significant in absolute value, going long if the t-statistic is positive and short if it is negative.
3. Construction of a Website for Investors to Use (See Section 11.3 Construction of Website for further details): After developing profitable strategies, it would be convenient for investors to access them via a website. By building a website, we could reach more people, encourage more people to invest, and channel that capital to firms, bringing social benefits through investing activities.
4. A Custom Backtesting System to Change the Trading Universe Outside the U.S. (See Section 11.4 Custom Backtesting System for further details): In this project, we implemented our strategies on QuantConnect, so our trading universe was restricted to U.S. equities. Although the U.S. market is mature for algorithmic trading strategies, we believe that building strategies in developing-country markets such as Vietnam and Indonesia, while challenging, might be more profitable due to market inefficiency. Besides, a custom backtesting system could best utilize the visualization system we built as one of its features.
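The selection rule described in extension 2 above can be sketched as follows (the tickers and predicted returns are invented, and the threshold of 2.0 is an assumed cut-off):

```python
import math

def t_stat(predicted_returns):
    """t-statistic of the mean predicted return against zero."""
    n = len(predicted_returns)
    mean = sum(predicted_returns) / n
    var = sum((r - mean) ** 2 for r in predicted_returns) / (n - 1)
    return mean / math.sqrt(var / n)

def select_positions(predictions, threshold=2.0):
    """Long tickers with t-stat above the threshold, short those below its negative."""
    longs, shorts = [], []
    for ticker, preds in predictions.items():
        t = t_stat(preds)
        if t > threshold:
            longs.append(ticker)
        elif t < -threshold:
            shorts.append(ticker)
    return longs, shorts

# Invented daily predicted returns for the coming month (three tickers).
predictions = {
    "AAA": [0.010, 0.012, 0.009, 0.011, 0.010],       # consistently positive
    "BBB": [-0.010, -0.011, -0.009, -0.012, -0.010],  # consistently negative
    "CCC": [0.020, -0.020, 0.010, -0.010, 0.000],     # noisy: no position
}
longs, shorts = select_positions(predictions)
print(longs, shorts)  # ['AAA'] ['BBB']
```

Only stocks whose predicted edge is large relative to its uncertainty receive a position; noisy predictions are left untraded.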
5 References
[1] A. Mittal. Understanding RNN and LSTM. (2020, September 17). [Online]. Available:
https://towardsdatascience.com/understanding-rnn-and-lstm-f7cdf6dfc14e
[2] H. Xu, J. Li, L. Liu, Y. Wang, H. Yuan, and X. Wang, “Generalizable control for quantum
parameter estimation through reinforcement learning,” npj Quantum Information, vol. 5,
no. 1, pp. 1–8, 2019.
[3] S. Chatterjee. How to train your machine: JPMorgan
FX algos learn to trade better. (2020, September 17).
[Online]. Available: https://www.reuters.com/article/us-jpm-trading-machines/
how-to-train-your-machine-jpmorgan-fx-algos-learn-to-trade-better-idUSKCN1S61JG
[4] A. Bigiotti and A. Navarra, Optimizing Automated Trading Systems, 01 2019, pp. 254–261.
[5] B. Tabbaa. The Rise and Fall of Knight Capital — Buy High, Sell Low. Rinse
and Repeat. (2020, September 17). [Online]. Available: https://medium.com/dataseries/
the-rise-and-fall-of-knight-capital-buy-high-sell-low-rinse-and-repeat-ae17fae780f6
[6] J. Chen. Permanent Portfolio. (2020, September 17). [Online]. Available: https:
//www.investopedia.com/terms/p/permanent-portfolio.asp
[7] ETF.com. Find the Right ETF - Tools, Ratings, News. (2020, September 17). [Online].
Available: http://www.etf.com/
[8] SPDR. Bringing the gold market to investors. (2020, September 17). [Online]. Available:
https://www.spdrgoldshares.com/
[9] S. Arora. Supervised vs Unsupervised vs Reinforcement. (2020, September 17). [Online].
Available: https://www.aitude.com/supervised-vs-unsupervised-vs-reinforcement/
[10] A. Stotz. Are financial analysts’ earnings forecasts accurate? (2020,
September 17). [Online]. Available: https://becomeabetterinvestor.net/
are-financial-analysts-earnings-forecasts-accurate/
[11] macrotrends. S&P 500 Historical Annual Returns. (2020, September 17). [Online]. Available:
https://www.macrotrends.net/2526/sp-500-historical-annual-returns
[12] Lazy-Portfolio-ETF. Harry Browne Permanent Portfolio: ETF allocation and returns.
(2020, September 17). [Online]. Available: http://www.lazyportfolioetf.com/allocation/
harry-browne-permanent/
[13] C.-y. LEE, "Enhanced passive investing - Harry Browne Permanent Portfolio with forex
strategies," 2020.
[14] C. Chalvatzis and D. Hristu-Varsakelis, “High-performance stock index trading: making
effective use of a deep lstm neural network,” arXiv preprint arXiv:1902.03125, 2019.
[15] L. Cohen, C. Malloy, and Q. Nguyen, “Lazy prices,” The Journal of Finance, vol. 75, no. 3,
pp. 1371–1415, 2020.
[16] H. He, J. Chen, H. Jin, S.-h. Chen, P. Wang, and T. Kuo, Trading Strategies Based on K-means
Clustering and Regression Models, 01 2007.
[17] R. Adusumilli. DBSCAN Clustering for Trading. (2020, September 17). [Online]. Available:
https://towardsdatascience.com/dbscan-clustering-for-trading-4c48e5ebffc8
[18] Seeking-Alpha. Enhancing Harry Browne’s Permanent Portfolio. (2020,
September 17). [Online]. Available: https://seekingalpha.com/article/
4247711-enhancing-harry-brownes-permanent-portfolio
[19] Logical-Invest. Enhanced Permanent Portfolio Strategy. (2020, September
17). [Online]. Available: https://logical-invest.com/app/strategy/pp/
enhanced-permanent-portfolio-strategy
[20] Optimized-Portfolio. Harry Browne Permanent Portfolio Review, ETFs, & Leverage.
(2020, September 17). [Online]. Available: https://www.optimizedportfolio.com/
permanent-portfolio/
[21] Z. Ding, C. W. Granger, and R. F. Engle, “A long memory property of stock market returns
and a new model,” Journal of Empirical Finance, vol. 1, no. 1, pp. 83–106, 1993. [Online].
Available: https://www.sciencedirect.com/science/article/pii/092753989390006D
[22] Z. Ding and C. W. Granger, “Modeling volatility persistence of speculative returns: A new
approach,” Journal of Econometrics, vol. 73, no. 1, pp. 185–215, 1996. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/0304407695017372
[23] K. Anadu, M. Kruttli, P. McCabe, and E. Osambela, “The shift from active to passive
investing: Risks to financial stability?” Financial Analysts Journal, vol. 76, no. 4, pp. 23–39,
2020.
[24] MarketWatch. SPDR S&P 500 ETF Trust Overview. (2021, February 12). [Online]. Available:
https://www.marketwatch.com/investing/fund/spy
[25] MarketWatch, iShares Russell 2000 ETF Overview. (2021, February 13). [Online]. Available:
https://www.marketwatch.com/investing/fund/iwm
[26] V. Monga and E. Chasan, “The 109,894-word annual report,” The Wall
Street Journal, Jun 2015. [Online]. Available: https://www.wsj.com/articles/
the-109-894-word-annual-report-1433203762
[27] M. Huang, Q. Bao, Y. Zhang, and W. Feng, “A hybrid algorithm for forecasting financial time
series data based on dbscan and svr,” Information, vol. 10, no. 3, p. 103, 2019.
[28] Lazy-Portfolio-ETF. Ray Dalio All Weather Portfolio: ETF allocation and returns.
(2021, April 13). [Online]. Available: http://www.lazyportfolioetf.com/allocation/
ray-dalio-all-weather/
[29] M. Köppen, “The curse of dimensionality,” in 5th Online World Conference on Soft Computing
in Industrial Applications (WSC5), vol. 1, 2000, pp. 4–8.
[30] Treelogic. Reinforcement Learning: Applications in Finance. (2020, September 17). [Online].
Available: https://financetalks.ie.edu/reinforcement-learning-applications-in-finance/
[31] A. Filos, “Reinforcement learning for portfolio management,” arXiv preprint
arXiv:1909.09571, 2019.
6 Appendix A: Meeting Minutes
6.1 Minutes of the 1st Project Meeting
1. Arrangement
• Date: April 29, 2020
• Time: 3:00pm
• Place: Room 3554
• Present: Chih-yu LEE and Prof. David Paul ROSSITER
• Absent: Matthew Johann Siy UY
• Recorder: Chih-yu LEE
2. Approval of minutes
• This was the first group meeting; thus, no minutes were required to be approved.
3. Report on progress
• As Chih-yu self-proposed this project, he was required to find other groupmate(s) to form an FYP group.
• He suggested having a topic regarding Machine Learning in finance.
4. Discussion items
• Machine Learning is a broad topic, so we should conduct literature reviews to understand the current state of the area.
• There are different types of Machine Learning and we would like to develop at least one strategy correspondingly:
– Supervised Learning
– Unsupervised Learning
– Reinforcement Learning
• The research directions could be:
– Develop a brand-new strategy by combining several Machine Learning models together.
– Assess existing works concerning investment and trading using Machine Learning
techniques, and boost its profitability by changing trading universe, model, or signal.
5. To-do lists before the next meeting
• Find at least one suitable teammate and form a group.
• Submit the outline of the project.
• Read several research papers regarding the Machine Learning application in finance.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 4pm on May 6, 2020 at Room 3554.
6.2 Minutes of the 2nd Project Meeting
1. Arrangement
• Date: May 6, 2020
• Time: 4:00pm
• Place: Room 3554
• Present: Chih-yu LEE and Prof. David Paul ROSSITER
• Absent: Matthew Johann Siy UY
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• Chih-yu tried to find some teammates through the Groupmate Finder; however, they had already formed a group of four and rejected him.
• He requested that Prof. ROSSITER recruit students by emailing those interested in doing an FYP under the professor's supervision.
• We have set up several directions of the project:
(a) Neural Networks for asset allocation
(b) Sentiment Analyses on financial statements for price prediction
(c) PCA and Clustering for pairs trading
4. Discussion items
• As data is an important issue in this project, we decided to use Quantopian as it provides the tick-wise price data as well as macroeconomic and corporate financial data.
• Chih-yu suggested we should spare some time for real-time trading to evaluate the performance of the project.
• Chih-yu also mentioned that he noticed many research papers in the area have some common mistakes:
(a) Confirmation Bias
(b) Look-ahead Bias
(c) Overfitting
(d) Survivorship Bias
• Chih-yu suggested the following to cope with the issues:
(a) Verify the conclusion by double checking with the derived statistical results (Confirmation
Bias)
(b) Leverage the information only on the day of decision making. Prevent parameters
tuning only for the financial crises. (Look-ahead Bias)
(c) Ensure that the result is reasonable in the financial perspective. For example, it is
nearly impossible to earn profits without taking risks. (Overfitting)
(d) Remember to add the delisted stocks back to the trading universe. (Survivorship
Bias)
5. To-do lists before the next meeting
• Find the suitable teammate(s).
• Read more research papers on Machine Learning application in investment and trading specifically.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be a talk with a new partner, Matthew, at 2pm on June 13, 2020 via Zoom.
6.3 Minutes of the 3rd Project Meeting
1. Arrangement
• Date: June 13, 2020
• Time: 2:00pm
• Place: Zoom
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• All previously discussed content was conveyed to Matthew, and he did not object to any of the previous minutes.
3. Report on progress
• Matthew proposed that he can take charge of price prediction sections as he has some previous experience.
• Besides writing some code on Quantopian, Chih-yu has also started developing his own backtest system in Python.
4. Discussion items
• We decided to add Reinforcement Learning to our outline, as it would be a brand-new strategy to allocate capital among the different Machine Learning-based strategies, each of which has its own weights.
• Chih-yu mentioned that he has received support from the Finance Department. Thus, he has minute-level price data from January 1996 to March 2020 for some popular Exchange Traded Funds like SPY.
5. To-do lists before the next meeting
• Finish the Gantt chart and discuss the job delegation.
• Start drafting the project proposal.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 2pm on August 24, 2020 via Zoom.
6.4 Minutes of the 4th Project Meeting
1. Arrangement
• Date: August 24, 2020
• Time: 2:00pm
• Place: Zoom
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• We have added the software engineering related goals in our project: building our own custom backtest system and visualization system.
• We also successfully connected to IB via its API and two third-party APIs. We could send orders at a pre-determined price and quantity.
4. Discussion items
• Chih-yu mentioned that he built a simple algorithm on the Permanent Portfolio and obtained a 19.22% return with a 21.83% Max Drawdown and a Sharpe Ratio of 1.26 by
2x leverage.
• We decided to change our benchmark to this portfolio.
5. To-do lists before the next meeting
• Finish the draft of the project proposal.
• Build the visualization system.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 11am on September 12, 2020 via Zoom.
6.5 Minutes of the 5th Project Meeting
1. Arrangement
• Date: September 12, 2020
• Time: 11:00am
• Place: Zoom
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• We have finalized our project proposal with one more goal: build an app or a website for users to deposit and withdraw money.
• We finished the draft of the proposal report.
4. Discussion items
• We should consider using LaTeX instead of Google Docs for the final version of the proposal submission.
• We should not exceed 30 pages and might elaborate more on the financial terms in the project.
5. To-do lists before the next meeting
• Finish the project proposal.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 11am on September 17, 2020 via Zoom.
6.6 Minutes of the 6th Project Meeting
1. Arrangement
• Date: September 17, 2020
• Time: 11:00am
• Place: Zoom
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• We finished writing the report in LaTeX.
4. Discussion items
• Prof. ROSSITER suggested several points for revising the proposal:
(a) Explain the terminologies such as Quantopian, Harry Browne Permanent Portfolio
and Interactive Brokers to the audience without assuming their background knowledge
in these terms.
(b) Try to add more things in Hardware Requirements.
(c) Find or recreate the bigger and clearer images for illustration.
(d) Break long paragraphs into several bullet points.
(e) Create a glossary instead of having everything in Overview.
(f) Be aware of format issues like sizing.
5. To-do lists before the next meeting
• Start building the first model.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 11am on September 24, 2020 at Room 3554.
6.7 Minutes of the 7th Project Meeting
1. Arrangement
• Date: September 24, 2020
• Time: 11:00am
• Place: Room 3554
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• Chih-yu started revising his Enhanced Harry Browne Permanent Portfolio and successfully obtained an almost 20% annualized return with slightly more than 20% Max Drawdown
over 15 years.
• Matthew attempted to train the NLP models on his local computer.
4. Discussion items
• Chih-yu suggested that Prof. ROSSITER apply for Interactive Brokers paper accounts for real-time trading.
• We decided to change the benchmark back to the S&P 500, as it is a common index that all investors can invest in. Therefore, we need to change our topic title to make it more relevant.
5. To-do lists before the next meeting
• Finalize the Harry Browne Permanent Portfolio.
• Start evaluating some 10-K reports by the NLP models.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 11:30am on October 6, 2020 at Room 3554.
6.8 Minutes of the 8th Project Meeting
1. Arrangement
• Date: October 6, 2020
• Time: 11:30am
• Place: Room 3554
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• Chih-yu started developing the pairs trading strategy on Quantopian.
• Matthew successfully obtained the sentiments and the changes for some corporations in the S&P 500 from the trained NLP model.
4. Discussion items
• Prof. ROSSITER replied that Interactive Brokers had requested more information before granting us a free paper trading account.
• We proposed two possible topics which are:
(a) Savior of earlier retirement - Using Machine learning and algorithmic trading to
improve traditional investment strategies
(b) Using Machine learning and algorithmic trading to beat the U.S. stock market index
• We finally decided to change our title to "Using Machine Learning and algorithmic trading to beat the U.S. stock market index", as it matches our objectives better.
5. To-do lists before the next meeting
• Backtest the pairs trading strategy.
• Evaluate all 10-K reports for the companies within S&P 500.
• Start developing the Machine Learning-based passive investment strategy.
• Provide the full information of the project for the paper account application.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 3:30pm on October 19, 2020 at Room 3554.
6.9 Minutes of the 9th Project Meeting
1. Arrangement
• Date: October 19, 2020
• Time: 3:30pm
• Place: Zoom (as Prof. ROSSITER was on sick leave that day)
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• Chih-yu finalized the pairs trading strategy on Quantopian; the strategy has an annualized return of 10.37%, a max drawdown of 14.30%, and a Sharpe Ratio of 1.25.
• Matthew successfully obtained the sentiments and the changes for all corporations in the S&P 500 from the trained NLP models.
• Chih-yu started applying the LSTM models on the components of Harry Browne Permanent Portfolio.
4. Discussion items
• Prof. ROSSITER mentioned that Interactive Brokers had not yet replied to his email containing the full project information, and he suggested that we find an alternative channel for real-time trading.
• Matthew mentioned that he had an Interactive Brokers account, so he might be able to create a paper account for our real-time trading if necessary.
• Prof. inquired about the implementation of the LSTM model on the Harry Browne Permanent Portfolio and was surprised by how well it fitted. Chih-yu said he would double-check whether there was any look-ahead bias in the prediction.
• Matthew said he could not find a way to integrate the results from his local computer into Quantopian, as Quantopian neither offered 10-K reports nor provided a way to upload his files.
• To solve Matthew's problems, we might need to move our work to another platform or start building our own backtesting and trading platform earlier.
5. To-do lists before the next meeting
• Assess whether it is necessary for us to find an alternative platform to finish our project.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 2pm on November 9, 2020 at Room 3554.
6.10 Minutes of the 10th Project Meeting
1. Arrangement
• Date: November 9, 2020
• Time: 2pm
• Place: Room 3554
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• There was no progress during the past two weeks, as Quantopian shut down with only one week's prior notice. We spent the time downloading the code we had written on Quantopian.
4. Discussion items
• We had to postpone our progress by one to two months, as it took time to find an alternative platform, learn the platform-specific languages, rewrite the Quantopian-specific syntax, and move our work to the new platform.
• We might abandon the plan to create a website.
• We might need to reduce the duration of the real-time trading, as we need more time to develop the strategies and combine them.
5. To-do lists before the next meeting
• Find a suitable alternative platform within two weeks and try to relocate our work onto it.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 2pm on November 23, 2020 at Room 3554.
6.11 Minutes of the 11th Project Meeting
1. Arrangement
• Date: November 23, 2020
• Time: 2pm
• Place: Room 3554
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• After testing different available platforms, we decided to adopt QuantConnect as our alternative, as it also solves Matthew's problems in stock selection using NLP.
• It is worth mentioning that QuantConnect is also written in Python and supports all the Machine Learning packages we might need.
4. Discussion items
• We needed to update the GANTT Chart.
• We planned to replicate all our work during the winter break.
5. To-do lists before the next meeting
• Replicate all the work done on Quantopian by the start of the 2021 Spring semester.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be at 3pm on February 2, 2021 on Zoom.
6.12 Minutes of the 12th Project Meeting
1. Arrangement
• Date: February 2, 2021
• Time: 3pm
• Place: Zoom (Chih-yu was not in Hong Kong)
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• All the work except pairs trading has been successfully replicated.
• Matthew's stock selection using NLP models can generate a 12.30% annualized return.
4. Discussion items
• As the pairs trading strategy involves the pipeline structure in Quantopian, we need to figure out how to rewrite it on QuantConnect.
• We will be working on the progress report, as there are only 10 days left.
5. To-do lists before the next meeting
• Finish the progress report by the deadline on February 12.
• Try to finish pairs trading strategy on QuantConnect as much as possible.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be in March 2021 at Room 3554, after Chih-yu returns to Hong Kong and finishes his quarantine.
6.13 Minutes of the 13th Project Meeting
1. Arrangement
• Date: March 3, 2021
• Time: 11:30am
• Place: Room 3554
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• The automatic trading system on QuantConnect works properly.
• Due to time constraints, we might not be able to construct the website.
• The pairs trading strategy did not perform well on QuantConnect, as QuantConnect does not support trading delisted stocks.
4. Discussion items
• We decided to remove the website construction from the to-do list, as we only have one month left before the final report.
5. To-do lists before the next meeting
• Start working on the Reinforcement Learning by creating some testing environment and agents.
• Finish the Machine Learning-based Harry Browne Permanent Portfolio.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be in April 2021 at Room 3554, as Matthew and Chih-yu have several midterms in March.
6.14 Minutes of the 14th Project Meeting
1. Arrangement
• Date: April 13, 2021
• Time: 10am
• Place: Zoom
• Present: Chih-yu LEE, Matthew Johann Siy UY, and Prof. David Paul ROSSITER
• Absent: None
• Recorder: Chih-yu LEE
2. Approval of minutes
• The minutes of last meeting were approved without amendment and objection.
3. Report on progress
• We will start doing automatic trading on April 14th.
• Due to the time constraint, we removed Reinforcement Learning from our project.
4. Discussion items
• As our computational power is limited, we could not build a suitable environment for the agent to learn within QuantConnect's 30-minute limit. Therefore, we decided to abandon Reinforcement Learning in our project.
• We finished our draft and discussed it with Prof. during the meeting.
• Prof. suggested that we add hyperlinks for QuantConnect and Quantopian.
• Prof. thought the abstract should be revised; we will send the revised version to him by the final report deadline.
5. To-do lists before the next meeting
• Finish the final report by April 14th.
• Prepare for the video trailer as well as the oral presentation.
6. Meeting adjournment and arrangement of the next meeting
• The next meeting will be in late April or early May at Room 3554 to discuss the oral presentation.
7 Appendix B: Glossary
As the project involves many financial concepts, the following terminologies are described in alphabetical order:
1. Alpha: It describes the performance of active investments. A positive alpha implies that the strategy is beating the market (or a certain benchmark). Since, in finance, the return gained from market growth is attributed to beta, alpha quantifies the excess return obtained through the investors' management.
2. Beta: It measures a security's sensitivity to market movements. As the expected return has a linear relationship with the market risk, the coefficient of that relationship is named Beta. It is frequently used for estimating the expected return and risk of a security.
3. Black-Swan Event: A Black-Swan Event is a situation that is unpredictable and beyond what is normally anticipated. Most of the time, it refers to events with severe consequences in the financial market, such as the housing market crash in the 2008 Global Financial Crisis and the 2020 coronavirus pandemic.
4. Bond: It is a kind of loan, typically issued by a corporation or government. The bond holder receives pre-specified interest from the issuer periodically and receives the borrowed principal (face value) on the maturity date.
5. Brokerage: It is a firm or an individual that acts as an intermediary between investors and exchanges, buying or selling securities on behalf of its clients.
6. Buy-side: It is a segment of the financial industry whose institutions make direct investments in the financial markets and earn money from investment gains. Notable examples are insurance firms, mutual funds, and hedge funds.
7. Correlation
It is a statistic measuring the extent to which two securities move in the same direction. By the Cauchy–Schwarz Inequality, it is bounded between -1 and 1. With r_1 denoting the return of Asset 1 and r_2 the return of Asset 2:

\[
\mathrm{corr} = \frac{\mathrm{cov}(r_1, r_2)}{\sqrt{\mathrm{var}(r_1) \cdot \mathrm{var}(r_2)}}
\]
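As an illustration, the correlation above can be computed from two return series with NumPy. This is a minimal sketch (the function name is ours, not from the project code):

```python
import numpy as np

def correlation(r1, r2):
    """Correlation of two return series: cov(r1, r2) / sqrt(var(r1) * var(r2))."""
    r1, r2 = np.asarray(r1, dtype=float), np.asarray(r2, dtype=float)
    cov = np.cov(r1, r2, ddof=1)[0, 1]  # sample covariance
    return cov / np.sqrt(np.var(r1, ddof=1) * np.var(r2, ddof=1))
```

For two assets that always move together the value reaches the +1 bound; for assets that always move oppositely it reaches -1.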
8. Derivatives
It is a contract (or agreement) between two or more parties whose value is determined by the agreed-upon financial assets (or securities). Derivatives are categorized into several groups; we describe futures and options below:
• Futures: A standardized legal agreement between two parties to trade assets at a certain price in the future. It reduces the risk of price fluctuation of the assets.
• Options: An option grants the holder the right, but not the obligation, to trade securities at a predetermined price at or by the expiration date. In short, it is a right to buy/sell assets with limited loss for the holder. The money paid to the writer (the counterparty) for this right is called the option premium.
9. Equity: It is another word for stock.
10. Factor Investing: It is an investment methodology that focuses on factors that can explain the variability and return of stock prices. For example, the market is a well-known factor moving prices; other examples are size, momentum, and value.
11. Foreign Exchange (FOREX): It is an alternative term for the market of foreign currencies.
12. Hedge Fund: It is a fund that adopts more complicated trading strategies and hedges its risk using derivatives. It is usually only available to institutional investors.
13. Investment Bank: It is a sector of the financial industry that provides services to other participants such as high-net-worth individuals, corporations, and governments. Usually, it helps corporations finance their projects for economic development.
14. Latency: It represents the delay in trade execution. For example, if we would like to buy Asset A at a price of 15, our trade may not be filled because investors with faster execution have already bought the asset. In the project, we would like to reduce its impact as much as possible.
15. Long: It is a financial term for buying or holding a positive amount (position) of an asset.
16. Market Risk: It is the chance of an investor experiencing losses due to factors that affect the
overall performance of the financial markets in which he or she is involved.
17. Max Drawdown
Denote N as the number of trading days and V_t as the portfolio value on day t, where t ∈ {0, 1, ..., N}. Drawdown (DD) measures the decline from a historical peak to the current level, and the Max Drawdown (MDD) on day t is the maximum observed loss from a peak to a trough of the portfolio from day 0 to day t. They are defined by:

\[
\mathrm{DD}_t = \frac{V_t}{\max\{V_0, \ldots, V_t\}} - 1, \qquad
\mathrm{MDD}_t = \min\{\mathrm{DD}_0, \ldots, \mathrm{DD}_t\}, \quad t \in \{0, \ldots, N\}
\]
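The running-peak definition of drawdown translates directly into code. A minimal sketch (the function name is ours):

```python
import numpy as np

def max_drawdown(values):
    """MDD over portfolio values V_0..V_N: the most negative DD_t."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)  # max{V_0, ..., V_t}
    drawdowns = values / running_peak - 1.0       # DD_t (always <= 0)
    return drawdowns.min()
```

For example, a portfolio moving 100 → 120 → 60 → 90 has an MDD of -50%, the fall from the 120 peak to the 60 trough.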
18. Mutual Fund: It is a fund in which investors put their money in exchange for some proportion of a collection of stocks, bonds, and other financial securities. One or more fund managers invest this money in financial securities.
19. Operational Risk: In this context, it can refer to electronic trading risk. As we run algorithms for trading, bugs in the algorithm could cause us to lose money. Any risk of not obtaining the desired results can be considered operational risk.
20. Order: It is an instruction to buy or sell an asset at a certain price. If the order is filled, the investor buys or sells the asset. If it cannot be filled immediately, it is placed on the order book until other participants are willing to trade in the opposite direction.
21. Pension Fund: It is a scheme supported by the government to pay for citizens’ retirement
plan.
22. Quant Fund: It is a fund that applies quantitative methods in its investment or trading, usually involving statistical models and algorithms.
23. Return (or Compounded Annual Growth Rate, CAGR)
Denote N as the number of trading days and V_t as the portfolio value on day t, where t ∈ {0, 1, ..., N}. CAGR is the annualized rate of return for an investment that grows over T year(s). We assume there are 252 trading days per year, so we can simply divide the number of trading days by 252 to get T:

\[
T = \frac{N}{252}, \qquad
\mathrm{CAGR} = \left(\frac{V_N}{V_0}\right)^{1/T} - 1
\]
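The CAGR formula can be sketched in a few lines (the function name is ours):

```python
def cagr(values, trading_days_per_year=252):
    """CAGR from daily portfolio values V_0..V_N (a list of length N + 1)."""
    n_days = len(values) - 1                # N trading days elapsed
    years = n_days / trading_days_per_year  # T = N / 252
    return (values[-1] / values[0]) ** (1 / years) - 1
```

For instance, growing from 100 to 121 over 504 trading days (two years) gives a CAGR of 10%, since sqrt(1.21) = 1.1.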
24. Risk-to-Return Ratio: It is the ratio of the risk taken to the return (profit) gained. Since more profit can be earned by taking more risk, this ratio is key to reflecting whether we have taken unnecessary risk.
25. Sell-side: It is another segment of the financial industry. Unlike the buy-side, sell-side professionals do not make direct investments. Instead, they provide information and recommendations to their clients for investments.
26. Sharpe Ratio
Its formula is as follows:

\[
\text{Sharpe Ratio} = \frac{\mathrm{CAGR} - R_f}{\sigma},
\]

where R_f is the annual risk-free rate and σ is the annualized volatility.
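As a one-line sketch of the ratio above (the function name is ours):

```python
def sharpe_ratio(cagr, annual_volatility, risk_free_rate=0.0):
    """Sharpe Ratio = (CAGR - Rf) / annualized volatility."""
    return (cagr - risk_free_rate) / annual_volatility
```

For example, a 12% CAGR with 8% annualized volatility and a 2% risk-free rate yields a Sharpe Ratio of 1.25.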
27. Short: It reflects that the investors have a negative position in the assets or that the investors
are going to sell the assets.
28. Slippage: It refers to the gap between the targeted price of a trade and the price at which it is executed. We would like to reduce slippage as much as possible in order to obtain the expected results.
29. Trade: It is an economic event involving buying and selling goods, especially financial assets in our context.
30. Volatility
Denote N as the number of trading days and V_t as the portfolio value on day t, where t ∈ {0, 1, ..., N}. Volatility is a statistical measure of the dispersion of returns, measured as the standard deviation of returns over a period. Usually, we calculate the volatility based on daily returns; thus, we multiply the result by the square root of 252 to get the Annualized Volatility. The formula is defined as follows:

\[
\sigma = \sqrt{\frac{\sum_{t=1}^{N} (R_t - \bar{R})^2}{N - 1}} \cdot \sqrt{252},
\]

where R_t is the daily realized return on day t and \bar{R} is the average of all realized returns over the N days.
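The annualization above can be sketched with NumPy's sample standard deviation (the function name is ours):

```python
import numpy as np

def annualized_volatility(daily_returns, trading_days_per_year=252):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    r = np.asarray(daily_returns, dtype=float)
    return r.std(ddof=1) * np.sqrt(trading_days_per_year)
```

Note `ddof=1`, matching the N - 1 denominator in the formula.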
31. Volume: It is the total number of shares traded during a period.
8 Appendix C: Project Planning
8.1 Distribution of Work
Note: In late October, Quantopian shut down suddenly. Therefore, we postponed our progress by one to two months in order to find an alternative, learn how to code on the new platform, and move and adapt our code from Quantopian. Fortunately, we discovered QuantConnect, an open-source trading platform that is also written in Python.
Therefore, the GANTT Chart shown in Table 9 is the updated version.
Table 8: Distribution of Work
Task | Chih-yu | Matthew
Conduct the literature survey | • | ◦
Formulate ideas and write the proposal reports | • | ◦
Type up the Monthly Reports | ◦ | •
Do the Progress Report | ◦ | •
Compile the Final Report | ◦ | •
Design the Presentation Poster | ◦ | •
Prepare for the presentation | ◦ | •
Produce the project video | ◦ | •
Collect data and create preliminary settings | • | ◦
Build a Deep Learning model for passive investment | • | ◦
Conduct research on Pairs Trading | • | ◦
Perform sentiment analysis on financial reports with NLP models | ◦ | •
Undertake research on price prediction by LSTM | ◦ | •
Evaluate performance among different RL models | • | ◦
Construct a custom backtest system | • | ◦
Create a data visualization system | • | ◦
Connect to IB Paper Account | • | ◦
Develop a website or mobile application | ◦ | •
Maintain the connection to the broker | • | ◦
Optimize the results | ◦ | •
8.2 GANTT Chart
Table 9: GANTT Chart
Tasks scheduled from July 2020 to May 2021 (the month-by-month bars appear graphically in the original chart):
• Conduct the literature survey
• Formulate ideas and write reports
• Create preliminary settings
• Build the model for passive investment strategy on Quantopian
• Build the model for passive investment strategy on QuantConnect
• Produce monthly and progress reports
• Do pairs trading by clustering on Quantopian
• Do pairs trading by clustering on QuantConnect
• Carry out sentiment analysis on 10-K reports by NLP
• Predict stock prices by neural networks
• Develop the RL model
• Set up a custom backtesting system and establish connection with IB for auto-trading
• Develop the app or website
• Execute trades in the real market
• Write the final report
• Produce the video presentation
9 Appendix D: Required Hardware and Software
9.1 Hardware Requirements
Table 10: Hardware Requirements
Item | Specification
Development PC | MacBook Pro (Retina, 13-inch, 2019, Two Thunderbolt 3 ports)
Development PC | MacBook Pro (Retina, 15-inch, 2018, Two Thunderbolt 3 ports)
AWS Server GPU | NVIDIA TITAN V GV100
Server RAM | 32GB
9.2 Software Requirements
Table 11: Software Requirements
Item | Version | Specification
Development OS | MacOS Catalina 10.15.6 | Environment for development
Git | 2.23.0 or after | Version control
Miniconda | 4.8.4 or after | Package control
IB Gateway | 980.4j | Stable and direct access to the IB trading system
Python | 3.6.10 or after | Programming language
Pytorch | 1.6.0 or after | Machine Learning library
Pandas | 1.1.2 or after | Data processing library
Numpy | 1.19.2 or after | Data processing library
ibapi | 9.76 | Python library for connecting to the IB brokerage account
dash | 1.16.1 | Python library for visualization output generation
plotly | 1.16.1 | Python library for creating interactive plots
React | 17.0 | JavaScript library for building UI components
Flask | 2.0 | Micro web framework for web application development
SQL | 5.5 | Database processing programming language
10 Appendix E: Additional Information
10.1 Brokerages
A brokerage firm plays the role of an intermediary between investors and securities exchanges.
Brokerages make money by charging commissions for each transaction that occurs on their platforms.
To help us decide which brokerage firm to select, we identified several key criteria:
1. Universe of trading assets: The number of markets and asset classes that can be traded on the platform.
2. Commission: The fee charged for each transaction conducted with the brokerage firm.
3. Access to historical data
4. A simple, well-documented API
5. Continuity and dependability: We don’t want a broker that’s going to go out of business or
discontinue its services.
After comparing several online brokerages, we chose Interactive Brokers (IB).
Not only does Interactive Brokers have some of the lowest commission fees, it also provides the largest universe of trading assets, giving traders access to markets across five continents, as well as one of the most comprehensive, fast, and convenient APIs for coding.
10.2 Quantopian
Quantopian was a U.S.-based company that aimed to provide services to investors around the globe to conduct quantitative analyses on U.S. securities. It operated as an open-source trading platform.
It had several features that would have been useful for our project:
1. Written in the Python programming language, which is very popular for machine learning
2. Tick-sized price data of all U.S. stocks since 2002
3. Financial health information about the corporations
4. Macroeconomic data
5. A well-designed notebook for research
6. Precise and accurate backtesting results
Unfortunately, Quantopian shut down in November 2020; therefore, we moved all our work to the alternative platform, QuantConnect, which cost us two months of migration.
10.3 QuantConnect
QuantConnect is a U.S.-based company that aims to provide services to investors around the globe to conduct quantitative analyses of equities, forex, cryptocurrencies, contracts for difference (CFDs), options, and futures. It not only serves as a platform provider but also sells licenses for some profitable strategies to investors. Also, it operates as an open-source trading platform. It has several features that were useful for our project:
1. Written in the Python programming language, which is very popular for machine learning
Figure 64: Python Programming Interface on QuantConnect
2. Tick-sized price data of all U.S. stocks since 1998
Figure 65: Information of SPY
3. Corporate fundamental data from Morningstar
Figure 66: Image of The Price-to-Equity Ratio Over Year 2011 for Some Stocks
4. Access to financial reports such as 10-K reports
Figure 67: Information about Extracting Financial Reports on QuantConnect
5. A well-designed notebook for research
Figure 68: Illustration of The Research Notebook
6. Comprehensive backtest reports
Figure 69: Illustration of The Backtest Report
7. Real-time trading execution
Figure 70: List of Brokers Available on QuantConnect Real-Time Trading
8. A comprehensive list of Machine Learning packages
Figure 71: The Full List of Supported Libraries on QuantConnect
9. Free alternative data such as news and economic data
Figure 72: List of Alternative Data Available
11 Appendix F: Additional Information For Future Work
11.1 Reinforcement Learning
11.1.1 Introduction to Reinforcement Learning
Reinforcement learning (RL) is an area of Machine Learning concerned with how learning agents take actions in an environment in order to maximize the reward they receive. The reward function is used to allow the agent to learn by itself the best way to do a specific task.
Figure 73: Illustration of RL [2]
Advantage
One of the greatest challenges in portfolio management is deciding how to allocate money across different trading strategies. This is where Reinforcement Learning comes in: it can learn the best way to allocate weights among the strategies, depending on the target we want to maximize. Possible targets are the Sharpe Ratio and the cumulative return.
11.1.2 Literature Survey
According to Finance Talks [30], reinforcement learning has been applied to several topics in finance: portfolio optimization, trade execution optimization, and market making. As we mainly focus on the first area, we found several relevant publications online.
Angelos Filos [31] suggested that a family of model-free agents outperformed in both the European stock market (Euro STOXX 50) and the U.S. stock market (S&P 500). As we focus on strategy-wise diversification, we expect the structure and underlying relationships among our strategies to differ from those among the individual stocks in his research. Moreover, we believe we could achieve a better result, as our non-Machine Learning baseline has already generated an annualized return of more than 9.2%.
11.1.3 Weight Allocation of Different Strategies by Reinforcement Learning (RL)
As diversification is known as the only free lunch in finance, we believe properly allocating capital among the good strategies we have developed will be crucial for mitigating our risks. With reinforcement learning, we would set the original weights as equal-weighted (or as the best allocation given by portfolio theory) and let the agent explore more profitable allocations.
Since Reinforcement Learning models can construct the portfolio dynamically, they offer a better approach to the rapidly changing financial market than static, traditional portfolio theory. Moreover, Reinforcement Learning is well suited to sequential decision making, and investment and trading require time to examine profitability. In our project, we would experiment with different value functions as well as several objectives to be maximized.
For the details of the reinforcement learning model, we consider the following settings under the
Markov Decision Process (MDP):
• States: The state of strategy i at time t (s_{i,t}) is defined as the sign of the return of strategy i from t − Δt to t (r_{i,t}). That is:

\[
s_{i,t} =
\begin{cases}
1 & \text{if } r_{i,t} > 0 \\
-1 & \text{if } r_{i,t} < 0
\end{cases}
\tag{1}
\]

• Action: The agent's action is the weight allocated to strategy i at time t (a_{i,t}). We simply discretize the real interval between 0 and 1 into 100 segments. That is,

\[
a_{i,t} \in \{0.00, 0.01, \ldots, 0.99, 1.00\}
\]

• Reward: The reward of strategy i at time t (R_{i,t}) is defined as the portfolio return from t to t + Δt, i.e.,

\[
R_{i,t} = a_{i,t}\, r_{i,t} + \frac{1 - a_{i,t}}{3} \sum_{j \neq i} r_{j,t}
\]

As we have four strategies, we simply allocate an equal weight of (1 − a_{i,t})/3 to each of the remaining three strategies. We repeat this for i = 1, 2, 3, and 4, and finally obtain the best result among all i.
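The proposed MDP's action space and reward can be sketched as follows. This is a hypothetical illustration, not code from the project; the names are ours:

```python
# Action space: the unit interval discretized into steps of 0.01.
ACTIONS = [round(k / 100, 2) for k in range(101)]  # {0.00, 0.01, ..., 1.00}

def reward(i, a_it, next_returns):
    """R_{i,t}: weight a_it on strategy i; the remaining (1 - a_it)
    is split equally among the other three strategies (four in total)."""
    others = sum(r for j, r in enumerate(next_returns) if j != i)
    return a_it * next_returns[i] + (1 - a_it) / 3 * others
```

Setting a_it = 1 concentrates the reward entirely on strategy i's return; a_it = 0 reduces it to the equal-weighted average of the other three.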
11.2 Stock Price Prediction in Active Investment by LSTM
11.2.1 Introduction to Active Investment
In active investment, the idea is basically to buy "winners" and sell "losers". To do this, we have identified 3 main methods of Active Investment Strategies: Securities Selection via NLP sentiment analysis on 10-K reports, Securities Selection via LSTM and RNN models, and the Pairs Trading Strategy via Clustering.
Regarding the implementation of our strategies, we will first develop them on the research notebooks at QuantConnect since they have access to many libraries and data sources, which would greatly ease the research process. After formulating the strategies, we will then backtest them on QuantConnect to see their performance.
11.2.2 Implementation of Stock Price Prediction
1. Split the existing stock price data for stock X in the universe into training and testing sets with an 80/20 ratio.
2. Predict the stock price on the training data by the LSTM model.
3. Validate the model result on the testing data.
4. Apply the trained model for the prediction of the future one-month stock movement.
5. Compute the t-stat of each stock's predicted price movement, where the t-stat is defined as

\[
t = \frac{\bar{r}_X - 0}{\sigma_X},
\]

where \bar{r}_X is the mean of the predicted daily returns and \sigma_X is the standard deviation of the predicted daily returns.
6. Track the stocks with a t-stat greater than 1.96 or smaller than -1.96, which implies at a 95% confidence level that their returns are significantly different from 0.
7. Repeat Step 1 to 6 for all stocks X in the trading universe.
8. Buy those stocks with t-stat greater than 1.96 and sell those with t-stat smaller than -1.96.
9. Repeat Step 1 to 8 on the first business day of every month.
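Steps 5, 6, and 8 above reduce to a simple filter on the predicted returns. A sketch (the function names are ours, not project code):

```python
import numpy as np

def t_stat(predicted_daily_returns):
    """t-stat of the predicted returns against zero: (mean - 0) / std."""
    r = np.asarray(predicted_daily_returns, dtype=float)
    return (r.mean() - 0.0) / r.std(ddof=1)

def trade_signal(predicted_daily_returns, threshold=1.96):
    """Buy above +1.96, sell below -1.96, otherwise stay out."""
    t = t_stat(predicted_daily_returns)
    if t > threshold:
        return "buy"
    if t < -threshold:
        return "sell"
    return "hold"
```

Predictions with a large mean relative to their dispersion clear the 1.96 threshold; noisy or near-zero predictions are filtered out.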
11.3 Construction of Website
11.3.1 Goal
For the front end, the goal is to create a simple and easy-to-use website that will allow an investor to do the following:
1. Register an account
2. Log-in and Log-out
3. Input Risk Profile
4. Withdraw and Deposit Money
5. Manage Portfolio Weights
6. Analyze Portfolio Growth
7. Present Charts
11.3.2 Additional Website Stage after Stage 4 (Live Trading Stage)
This stage mainly focused on integrating our trading system into a website in order to create a simple and convenient investing experience for clients. In order to accomplish this, we have set out two goals:
1. Develop the website and user interface.
2. Integrate the trading system into the website.
11.3.3 Implementation of Website
For creating the website, the process can be broken down into 3 parts: Front End, Back End, and
Database Management.
1. Front End
To create the Front End, the current plan is to use React.js (or simply React), an
open-source JavaScript library for building user interfaces. We chose React because it is
simple to use, scalable, and fast. For example, React allows developers to create large
dynamic web applications that can update data without reloading the page.
2. Back End
To create the Back End, the current plan is to use Flask. It was chosen because it has
access to a wide range of useful and convenient libraries. Aside from this, Flask is one of
the most widely used web frameworks, which makes it easier to learn from scratch thanks to
the vast number of online tutorials.
3. Database
To create the database, the current plan is to adopt MySQL. MySQL is a Relational Database
Management System (RDBMS) that uses Structured Query Language (SQL). Aside from
being free, MySQL is very convenient to use, as well as quick and efficient in retrieving large
amounts of data.
11.4 Custom Backtesting System
11.4.1 Implementation of a Backtesting System
In the project, we developed our strategies on QuantConnect, whose backtesting system is fully implemented and maintained by QuantConnect. On the platform, we can utilize the available data and run backtests within a specified start date and end date. We can also set our initial capital and what we would like to do repeatedly (e.g., daily or monthly).
Figure 74: Illustration of QuantConnect Settings
For our custom system, we could run the code on data obtained from Thomson Reuters or Bloomberg. The algorithm then generates two CSV files containing the quantity and weight for each security in the universe, and we can feed these two files into the visualization system we built.
Figure 75: Quantity Table from The Custom Backtest System
Figure 76: Weight Table from The Custom Backtest System