2004 to March 2021, the S&P 500 Index’S Annual Return Was 9.85% with a Maximum

Total Page:16

File Type:pdf, Size:1020Kb

2004 to March 2021, the S&P 500 Index’S Annual Return Was 9.85% with a Maximum RO4 1 Using Machine Learning and Algorithmic Trading to Beat the U.S. Stock Market Index by Chih-yu LEE and Matthew Johann Siy UY RO4 Advised by Prof. David Paul ROSSITER Submitted in partial fulfillment of the requirements for COMP 4981 in the Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2020-2021 Date of submission: April 14, 2021 2 Abstract The project aims to develop investment strategies that can achieve higher returns with lower risks than the S&P 500 Index by applying Machine Learning techniques to traditional investment strategies. In total, we implemented and backtested three Machine Learning-based investing strategies: Harry Browne Permanent Portfolio with LSTM (HBLSTM), Stock Selection with Natural Language Processing (SSNLP), and Pairs Trading with PCA and Clustering (PTPC). From December 2004 to March 2021, the S&P 500 Index's annual return was 9.85% with a Maximum Drawdown (MDD) of 55.1%. On the other hand, two of our Machine Learning-based investment strategies, HBLSTM and SSNLP, were able to achieve both a significantly higher annual return and a lower MDD. HBLSTM had an annual return of 20.4% with an MDD of 18%, while SSNLP had an annual return of 14.8% and an MDD of 33.4%. Although PTPC merely had an annual return of 2.97% with an MDD of 31.6%, we noticed that its role is insurance that generates positive return over time as it performed well during the crises. Finally, our Combined Portfolio had an annual return of 9.99% with an MDD of 7.8%, providing further evidence of the benefits of using Machine Learning techniques in investing strategies. 3 Table of Contents 1 Introduction 12 1.1 Overview......................................... 12 1.2 Objectives......................................... 22 1.3 Literature Survey..................................... 24 2 Methodology 26 2.1 Design........................................... 26 2.2 Implementation...................................... 42 2.3 Testing.......................................... 59 2.4 Evaluation......................................... 67 3 Discussion 80 3.1 Performance Comparison with S&P 500 Index..................... 80 3.2 Performance during Black Swan Events........................ 81 3.3 Limitations and Challenges............................... 90 4 Conclusion 94 4.1 Summary of Achievements................................ 94 4.2 Future Work....................................... 95 5 References 96 6 Appendix A: Meeting Minutes 99 6.1 Minutes of the 1st Project Meeting........................... 99 6.2 Minutes of the 2nd Project Meeting........................... 101 6.3 Minutes of the 3rd Project Meeting........................... 103 6.4 Minutes of the 4th Project Meeting........................... 104 4 6.5 Minutes of the 5th Project Meeting........................... 105 6.6 Minutes of the 6th Project Meeting........................... 106 6.7 Minutes of the 7th Project Meeting........................... 107 6.8 Minutes of the 8th Project Meeting........................... 108 6.9 Minutes of the 9th Project Meeting........................... 109 6.10 Minutes of the 10th Project Meeting.......................... 111 6.11 Minutes of the 11th Project Meeting.......................... 112 6.12 Minutes of the 12th Project Meeting.......................... 113 6.13 Minutes of the 13th Project Meeting.......................... 114 6.14 Minutes of the 14th Project Meeting.......................... 115 7 Appendix B: Glossary 116 8 Appendix C: Project Planning 120 8.1 Distribution of Work................................... 120 8.2 GANTT Chart...................................... 121 9 Appendix D: Required Hardware and Software 122 9.1 Hardware Requirements................................. 122 9.2 Software Requirements.................................. 122 10 Appendix E: Additional Information 123 10.1 Brokerages........................................ 123 10.2 Quantopian........................................ 124 10.3 QuantConnect...................................... 125 11 Appendix F: Additional Information For Future Work 130 11.1 Reinforcement Learning................................. 130 5 11.2 Stock Price Prediction in Active Investment by LSTM................ 133 11.3 Construction of Website................................. 134 11.4 Custom Backtesting System............................... 136 6 List of Tables 1 Table for BoW Model.................................. 31 2 Term Frequency Table for TF-IDF Model....................... 31 3 Inverse Document Frequency Table for TF-IDF Model................ 31 4 Table for TF-IDF Model................................. 32 5 Summary of Pairs Trading With PCA and Clustering................ 71 6 Summary of Combined Portfolio............................ 79 7 Summary of All Models Performance.......................... 80 8 Distribution of Work................................... 120 9 GANTT Chart...................................... 121 10 Hardware Requirements................................. 122 11 Software Requirements.................................. 122 List of Figures 1 Illustration of an RNN [1]................................ 19 2 Illustration of 4-Stage Developments.......................... 23 3 System Design...................................... 26 4 Illustration of LSTM Model Cells............................ 29 5 Example of Correlated But Not Co-Integrated Stock Pairs.............. 36 6 Illustration of 2-Dimension Data Before and After PCA............... 38 7 Illustration of The Axes Transformation After PCA................. 38 8 Illustration of QuantConnect Visualization Report (Page 1)............. 47 9 Illustration of QuantConnect Visualization Report (Page 2)............. 48 10 Illustration of QuantConnect Visualization Report (Page 3)............. 49 7 11 Upper Part of The Visualization Webpage....................... 50 12 Lower Part of The Visualization Webpage....................... 50 13 Basic Metrics of The Strategy.............................. 51 14 Heatmap of The Monthly Returns........................... 52 15 Interactive Equity Curve................................. 53 16 Interactive Equity Curve for The Recent 6 Years After Zooming in......... 53 17 12-Month Rolling Return................................ 54 18 Drawdown Diagram................................... 55 19 Drawdown Summary Table............................... 55 20 Volatility Diagram.................................... 56 21 Rolling Beta........................................ 57 22 Weight Allocation.................................... 58 23 Code for Plotting Necessary Variables......................... 63 24 The Custom Variable Tracker from The Self.Plot Code................ 63 25 QuantConnect Automated Trading Dashboard.................... 64 26 QuantConnect Automated Trading Dashboard (Status = Launch)......... 65 27 QuantConnect Automated Trading Dashboard (Status = Runtime Error)..... 65 28 QuantConnect Automated Trading Dashboard (Status = Stopped)......... 65 29 Orders on QuantConnect Automated Trading Dashboard.............. 66 30 Equity Value on QuantConnect Automated Trading Dashboard........... 66 31 Backtest Result of Harry Browne Permanent Portfolio from December 2004 to August 2020........................................ 67 32 Backtest Result of Machine Learning Model-based Harry Browne Permanent Portfolio from December 2004 to March 2021.......................... 68 33 Backtest Result of Stock Selection Using NLP from December 2004 to March 2021 69 8 34 Long-Short Exposure of Stock Selection Using NLP from December 2004 to March 2021............................................ 69 35 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from December 2004 to December 2008................................. 70 36 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January 2009 to December 2012................................. 70 37 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January 2013 to December 2016................................. 71 38 Backtest Result of Pairs Trading Strategy with PCA and DBSCAN from January 2017 to March 2021................................... 71 39 Pairs Trading Result From 2007 to 2008 (1st Part of The Global Financial Crisis). 72 40 Pairs Trading Result From 2007 to 2008 (2nd Part of The Global Financial Crisis) 72 41 Pairs Trading Result In 2020 (The Coronavirus Crisis)................ 72 42 Equity Curve for Stock Selection Using NLP for Different Thresholds from Decemeber 2004 to March 2021................................... 74 43 CAGR for Stock Selection Using NLP for Different Thresholds from Decemeber 2004 to March 2021...................................... 75 44 Sharpe Ratio for Stock Selection Using NLP for Different Thresholds from December 2004 to March 2021................................... 75 45 Max Drawdown for Stock Selection Using NLP for Different Thresholds from December 2004 to March 2021................................... 75 46 Summary Table for Pairs Trading With PCA and Clustering for Exit Z-Score = +0.2 and -0.2....................................... 76 47 Summary Table for Pairs Trading With PCA and Clustering for Different Exit Z-Scores 76 48 Summary Table for Pairs Trading With PCA and Clustering for Different Entry Z-Scores.......................................... 77 49 Summary Table for Pairs Trading With PCA and Clustering for Different Frequency for Co-integration Testing................................ 77 9 50 Equity Curve for The Combined Portfolio
Recommended publications
  • Algorithmic Approach to Options Trading
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 08 Issue: 05 | May 2021 www.irjet.net p-ISSN: 2395-0072 Algorithmic Approach to Options Trading Shubham Horambe1, Ketan Khanolkar1, Dhairya Dixit1, Huzaifa Shaikh1, Prof. Manasi U. Kulkarni2 1B. Tech Student, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India 2 Assistant Professor, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India ----------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Majority of the traders lose money whilst trading sitting in front of the screen for hours. Due to this continuous in options owing to market speculation, emotional trading and monitoring requirement, the performance of Manual Trading lack of risk management. The purpose of this research paper is is limited to the amount of time of the day the trader can to introduce an automated way of trading in options in order dedicate to trading. Other than this, Manual Trading also to tackle these problems. To achieve this, we have adopted suffers from time delays, which is the ability to execute some basic option strategies namely Covered Call, Protective trades in a small-time window. Also, for predicting Put and Covered Strangle. These strategies have been tested unprecedented non-random changes that take place in trade over a period of 5 years (2015-2019) for a set of stocks in the prices, a trader must delve into and analyze more and more U.S options market. information which is more powerful than information available on open-sources like news platforms. This is very Key Words: Options, Algorithmic Trading, Call Option, Put difficult to do for humans, as unlike computers, we can Option.
    [Show full text]
  • Front Office Middle Office Back Office
    HEDGE FUND TECHNOLOGY Front Office Page 2 Research and Analysis Page 4 Trade Execution and Management Page 8 Portfolio Management Middle Office Page 11 Data Management Page 13 Risk Management Page 15 Cash and Collateral Management Page 16 Securities Pricing and NAV Calculation Page 17 Investor Relations Page 18 Trade Reconciliation and Processing Back Office Page 19 Accounting Page 21 Compliance and Reporting Page 24 Administration Page 25 IT Security and Support HEDGE FUND ALERT: 5 Marine View Plaza, Suite 400, Hoboken NJ 07030. 201-659-1700 To start your FREE trial subscription, return to HFAlert.com and click on “Free Trial”. Or call 201-659-1700. You can also complete the Free Trial Coupon on the last page of this newsletter and fax it to 201-659-4141. Hedge Fund 2 ALERT HEDGE FUND TECHNOLOGY FRONT OFFICE: Research and Analysis FRONT OFFICE: Research and Analysis Vendor Product The Skinny AIM Software GAIN Corporate Actions DM Tracks various corporate events for companies in an investment portfolio, enabling aimsoftware.com the manager to react promptly to announcements. Austrian vendor founded in 1999. Arialytics Aria Aims to reduce the time it takes to identify, design and review trading algorithms. arialytics.com Created with new and emerging managers in mind. Allows users to quickly analyze big-data sets to identify patterns and predictors. Machine-learning component tweaks models in response to real-time activity. Rye, N.Y., vendor founded in 2010 by David Marra, previously a principal at Boston Consulting. Dataminr Dataminr Continually monitors and analyzes publicly available Twitter feeds for information dataminr.com about companies and investments.
    [Show full text]
  • Risk Management & Trading Conference
    RiskMathics, aware that the most important factors to develop and consolidate the Financial WHO SHOULD ATTEND? Markets are training and promoting a high level financial culture, will host for the third time in The Risk Management & Trading Conference is aim at Practitioners directly or indirectly involved Mexico: “The Risk Management & Trading Conference”, which will have the participation of in areas of trading, risk management, regulation, technology, and research & development of leading authorities who have key roles in the global financial industry. Stock Exchanges, Brokers, Brokerage Houses, Banks, Institutional Investors (Pension Funds, OBJECTIVES Mutual Funds, Insurance Companies, etc.), Hedge Funds, and Independent Investors. One of the primary objectives of this Conference is to provide through Workshops, Presentations It will be of particular relevance to: and Round Table Discussions the latest advances in Risk Management, Trading, Technology • Chief Executive Officers of financial institutions and intermediaries and Market Regulation, and to transmit all this knowledge by local and international authorities • Traders in the field. • Risk Managers Some other objectives of this Conference are to explain and show in detail the current • Consultants situation and where the Global Financial Industry is heading, advances in Pricing, and how • Regulators intermediaries and direct or indirect participants of markets need to be prepared to remain • Technology managers and staff competitive in spite of the new challenges and paradigms
    [Show full text]
  • Trading System Development
    Trading System Development An Interactive Qualifying Project submitted to the Faculty of WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Bachelor of Science By: Stephen Andrews John Bieber Ted Bieber Grant Espe Advisors: Professor Hossein Hakim Professor Michael Radzicki Date: 5/13/2018 Report submitted to: Professor Hossein Hakim Worcester Polytechnic Institute This report represents work of WPI undergraduate students submitted to the faculty as evidence of a degree requirement. WPI routinely publishes these reports on its web site without editorial or peer review. For more information about the projects program at WPI, see http://www.wpi.edu/Academics/Projects. Abstract The goal of this project was to develop a system of automated trading systems that achieves a greater return than comparable alternatives, such as the S&P 500. Over the past year, the inflation rate in the US was 2.1%, meaning that the buying power of $10,000 would decrease by $210. In order to counteract this, it is beneficial to identify a reliable investment vehicle to generate consistent returns. This team developed a system which achieved a 19.73% return on $400,000 initial capital when trading from April 7th, 2017 to April 4th, 2018, beating inflation by 17.63% and the S&P 500 by 9.3%. Trading on a mix of stocks and forex, the system made 634 trades and earned $0.27 for every dollar it risked, which compounds to $172.63 for every year that it was in the market. Its system quality, which is a metric for determining the overall quality and versatility of a trading system, was 2.85.
    [Show full text]
  • Successful Algorithmic Trading
    Contents I Introducing Algorithmic Trading 1 1 Introduction to the Book . 3 1.1 Introduction to QuantStart . 3 1.2 What is this Book? . 3 1.3 Who is this Book For? . 3 1.4 What are the Prerequisites? . 3 1.5 Software/Hardware Requirements . 4 1.6 Book Structure . 4 1.7 What the Book does not Cover . 5 1.8 Where to Get Help . 5 2 What Is Algorithmic Trading? . 7 2.1 Overview . 7 2.1.1 Advantages . 7 2.1.2 Disadvantages . 8 2.2 Scientific Method . 9 2.3 Why Python? . 9 2.4 Can Retail Traders Still Compete? . 10 2.4.1 Trading Advantages . 10 2.4.2 Risk Management . 11 2.4.3 Investor Relations . 11 2.4.4 Technology . 11 II Trading Systems 13 3 Successful Backtesting . 15 3.1 Why Backtest Strategies? . 15 3.2 Backtesting Biases . 16 3.2.1 Optimisation Bias . 16 3.2.2 Look-Ahead Bias . 16 3.2.3 Survivorship Bias . 17 3.2.4 Cognitive Bias . 17 3.3 Exchange Issues . 18 3.3.1 Order Types . 18 3.3.2 Price Consolidation . 18 3.3.3 Forex Trading and ECNs . 19 3.3.4 Shorting Constraints . 19 3.4 Transaction Costs . 19 3.4.1 Commission . 19 3.4.2 Slippage . 19 3.4.3 Market Impact . 20 3.5 Backtesting vs Reality . 20 4 Automated Execution . 21 4.1 Backtesting Platforms . 21 1 2 4.1.1 Programming . 22 4.1.2 Research Tools . 22 4.1.3 Event-Driven Backtesting . 23 4.1.4 Latency .
    [Show full text]
  • The Basics of Algorithmic Trading
    Trim Size: 6in x 9in k Chan c01.tex V2 - 12/11/2016 12:44 A.M. Page 1 CHAPTER 1 The Basics of Algorithmic Trading k n algorithmic trading strategy feeds market data (historical or live) into 1 k Aa computer (backtest or automated execution) program. The program then submits orders to the broker through an API, and receives order status notifications back from the broker. The flowchart in Figure 1.1 illustrates this process. Notice that I deliberately use the same box to indicate the computer pro- gram that generates backtest results and live orders: This is the best way to ensure we are trading the exact same model that we have backtested. In this chapter, I will discuss the latest services, products, and their ven- dors applicable to each of the blocks in Figure 1.1. In addition, I will describe my favorite performance metrics, the way to determine the optimal lever- age, and the simplest asset allocation method. Though I have touched on many (but notCOPYRIGHTED all) of these issues in my previous MATERIAL books, I have updated them here based on the state of the art. The FinTech industry has not been stand- ing still, nor has my understanding of issues ranging from brokers’ safety to subtleties of portfolio optimization. k Trim Size: 6in x 9in k Chan c01.tex V2 - 12/11/2016 12:44 A.M. Page 2 Live market Historical market data data Computer program Order Order status Backtest results Broker API Order Order status Broker’s server k 2 FIGURE 1.1 Algorithmic trading at a glance k ■ Historical Market Data For daily historical data in stocks and futures, I have been using CSI (csidata.com) for a long time.
    [Show full text]
  • Quant Trading Guide
    Quant Trading Guide Callum McDougall email [email protected] November 2020 Contents 0 Introduction 2 1 What is quant trading? 3 1.1 Common terminology........................................3 1.2 What does a quant trader do?....................................3 1.3 Quant Trading vs Quant Research.................................4 1.4 Common misconceptions.......................................4 1.4.1 Finance culture is terrible..................................4 1.4.2 The hours in finance are really bad.............................5 1.4.3 You need to be a maths genius...............................5 1.4.4 You need to be a really good coder.............................6 1.4.5 You need lots of finance experience.............................6 1.4.6 You need an amazing CV..................................7 1.4.7 If you make a small mistake in trading, you could lose your job.............7 1.4.8 Working in finance is morally wrong............................8 1.5 Good things about quant trading..................................8 1.6 Bad things about quant trading...................................9 1.7 What to do if you think you might be interested......................... 10 2 Quant Trading Firms & Internships 11 2.1 What is an internship like?..................................... 11 2.2 Internship structure......................................... 12 2.3 Quant trading firms......................................... 13 3 How to prepare for interviews 15 3.1 Solving problems........................................... 15 3.2 Types of interview
    [Show full text]
  • The 'Outsiders'
    July 12, 2017 The ‘Outsiders’ - Emerging Ecosystems Equity Research Platforms of Change From Grassroots to Moonshots Rise of the New Robert D. Boroujerdi Year one of a new administration and the (1) DIY Algorithms: Democratizing Finance. (212) 902-9158 [email protected] headlines have kept us busy. Market rotations Dorm rooms to desktops the crowdsourcing of Goldman Sachs & Co. LLC have been swift, the rise of passive unabated and alpha may be the next investment paradigm. the distinction between quant and fundamental (2) Edge Computing. The IT pendulum is once continues to fade. While this construct remains Christopher Wolf, CFA again shifting from centralized (i.e., cloud-based) (212) 934-4221 [email protected] our reality, we pause for a moment to look to decentralized computing in end-devices. Goldman Sachs & Co. LLC beyond the benchmark and peer into seven emerging platforms that exist in the periphery of (3) Big Data & Sports Analytics. From optical our investable universe yet merit close watching. tracking to next-gen wearables, the marriage of science and sport is increasingly ubiquitous. From DIY algorithms broadening the pool of alpha to the multi-billion dollar Dark Net (4) Is Voice the Death of Search? From Siri to economy and the rise of Voice & Digital Cortana, voice search could shake up the $100bn+ Assistants in Search, industries we know are advertising market while also driving its growth. silently reshaping. The blending of science, (5) The Hidden Order of Chaos Theory. What if technology and Sports Analytics is changing the complete disorder is mathematically impossible? nature of engagement between coach, athlete and fan, while the dual forces of AI and the IoT are (6) Alzheimer’s in Focus.
    [Show full text]