Exploration of a Trading Strategy System Based on Meta-Labeling and Hybrid Modeling Using the Sigtech Platform
Total Page:16
File Type:pdf, Size:1020Kb
Exploration of a trading strategy system based on meta-labeling and hybrid modeling using the SigTech Platform. Petri Nousiainen Master’s Thesis Master of Engineering - Big Data Analytics June 3, 2021 MASTER’S THESIS Arcada University of Applied Sciences Degree Programme: Master of Engineering - Big Data Analytics Identification number: 8277 Author: Petri Nousiainen Title: Exploration of a trading strategy system based on meta-labeling and hybrid modeling using the SigTech Platform. Supervisor (Arcada): Ph.D. Magnus Westerlund Commissioned by: Abstract: The thesis aims to study a machine learning (ML) supported trading system. The methodology is based on a process that is utilizing meta-labeling, thus provides labels for a secondary model, where losses and gains are labeled outcomes. The secondary model provides a prediction based on the primary model output correctness. The sec- ondary model predicts whether the primary model succeeds or fails at a particular prediction (a meta-prediction). A probability correctness measure of the direction pre- diction of the secondary model is used to size the position. They are "meta" labels because the original simple trading strategy predicts the ups and downs of the market (base predictions or labels). The metalabels predict whether those base predictions are correct or not. The research question is finding a trading strategy process that can predict the next trading opportunity. The system constructed is a binary classification that aims to de- termine profitable trades, either buying or selling opportunities. The proposed system is a hybrid model setup of a primary and secondary model. The process starts with processing raw price data given to a primary model, a trend following Donchian Channel technical analysis model. The primary model output is binary labeling used for the secondary model. The secondary model is the Random Forest machine learning algorithm that predicts if the next trade is a profit or not and with what probability. The prediction probability is used for order sizing. Finally, the rule for executing a trade is that both the primary and secondary models agree on the decision. The result shows that the study supports the hypothesis that machine learning can im- prove any trading strategy. The coded trading strategy system successfully predicts the profitable trades with probabilities utilizing meta-labeling and the designed hybrid model. Furthermore, the machine learning prediction of the secondary model signifi- cantly improves the performance of the primary model trading strategy. Keywords: Machine learning, feature selection, metalabeling, meta-labeling, labeling, random forest, trading strat- egy, investment strategy, hybrid process, hybrid model, research process, trading system, bet size, trading platform, binary classification, SigTech Number of pages: 92 Language: English Date of acceptance: 30.6.2021 CONTENTS 1 Introduction .................................. 14 1.1 Introduction .................................. 14 1.2 Background and Need ............................. 14 1.3 Purpose of the Study ............................. 16 1.4 Research Question and Hypothesis ...................... 16 1.5 Significance To The Field ........................... 16 1.6 Limitations .................................. 19 1.7 Ethical Considerations ............................. 20 2 Related ..................................... 22 2.1 Financial Theory ............................... 22 2.1.1 Information Efficiency ........................... 22 2.1.2 Efficient Markets Hypothesis ....................... 22 2.2 Literature Review ............................... 22 2.2.1 Technical Analysis ............................ 22 2.2.2 Trend Following .............................. 23 2.2.3 Trading Strategies ............................ 23 2.2.4 Fixed Fraction Asset Allocation ...................... 24 2.2.5 Futures Contracts ............................. 26 2.3 Body of the Review .............................. 26 2.3.1 Trading Strategy Process - Predicting Conditional Probability of Next Trade’s Profit ................................... 27 2.3.2 The Process of a Strategy in the SigTech Platform ............. 29 2.3.3 Another Machine Learning Supported Process ............... 31 2.3.4 Analysis ................................. 33 2.4 Summary ................................... 34 3 Methodology .................................. 37 3.1 Setting .................................... 37 3.1.1 Tool - Platform SigTech .......................... 38 3.2 Data and Features .............................. 41 3.3 Model .................................... 45 3.3.1 Strategy Process 1 - Tail Reaper Trading Strategy ............. 45 3.3.2 Strategy Process 2 - MNIST Example ................... 46 3.3.3 The Coded Trading System Based on The Tail-Reaper and MNIST Processes 48 3.4 Measurement Instruments ........................... 49 3.5 Procedure .................................. 51 3.5.1 Technical Analysis Indicators as Features ................. 51 3.5.2 Donchian Channel Technical Analysis Model ................ 51 3.5.3 Meta-Labeling .............................. 56 3.5.4 The Validation of Hybrid Model by Logic Gate ............... 57 3.6 Data Analysis ................................. 58 3.7 Conclusions ................................. 58 4 Algoritmic Descriptions ............................ 60 4.1 Random Forest Classifier ........................... 60 4.2 Feature Selection ............................... 61 5 Experiments .................................. 62 5.1 Primary Model ................................ 62 5.2 Labeling - Binary Classification ........................ 62 5.3 Secondary Model ............................... 62 6 Results ..................................... 70 6.1 Descriptive Statistics ............................. 71 6.2 Inferential Statistics .............................. 71 7 Conclusions .................................. 74 7.1 Discussion .................................. 75 7.2 Summary of Limitations ............................ 83 7.3 Recommendations For Future Research .................... 83 7.4 Concluding Discussion ............................ 84 References ...................................... 85 5 FIGURES Figure 1. Features Importance Table of The Meta Labeling Process - Tail Reaper Trading Strategy "............................ 30 Figure 2. Classification Tree Example by predictnow.ai - Tail Reaper Trading Strategy "................................. 30 Figure 3. Process of Meta Labeling - Hudson and Thames "A Toy Example"... 32 Figure 4. The Receiver Operating Characteristic (ROC) Curve Example...... 34 Figure 5. The Precision Recall Table of The Meta Labeling Process - "A Toy Example" Wikipedia contributors(2021d)................ 35 Figure 6. SigTech platform - the data browser view for searching data...... 39 Figure 7. SigTech platform - the list of data types................. 39 Figure 8. SigTech platform - the data instrument examples of equity index futures 40 Figure 9. SigTech platform - the list of data instruments and the reference data.. 40 Figure 10. SigTech platform - Data feature set 17 futures rolling front from 2000- 01-04 to 2021-04-21 including totally 5557 close prices......... 43 Figure 11. SigTech platform - E-mini SP500 rolling front future from 2000-01-04 to 2021-04-21 including totally 5557 close prices............. 43 Figure 12. E-mini SP500 rolling front future Volatilities from 2000-01-04 to 2021- 04-21 including totally 5557 close prices................. 44 Figure 13. Classification tree of the predictnow.ai.................. 45 Figure 14. Feature selection of Tail Reaper strategy................. 46 Figure 15. The process structure of MNIST example................ 47 Figure 16. The Trading Strategy System....................... 48 Figure 17. The Donchian Channel Trading Strategy Signals............ 54 Figure 18. The Random Forest Decision Tree.................... 60 Figure 19. The prediction results of the random forest algorithm of the secondary model - probability last ten days, and the maximum and minimum of the total period............................... 64 Figure 20. The prediction results of the random forest algorithm in the secondary model - accuracy, profit or not, and probability.............. 64 Figure 21. The secondary model random forest feature importance table...... 66 Figure 22. The cumulative return of the underlying, the primary model and the secondary model during test period of 917 days.............. 67 Figure 23. The Receiver Operating Characteristic (ROC) curve of the probability prediction................................. 67 Figure 24. The confusion matrix of the secondary model profit or not prediction.. 68 Figure 25. The confusion matrix of the profit or not prediction - extracting true positives.................................. 68 Figure 26. The Trading Strategy System....................... 79 Figure 27. The cumulative return of the underlying, the primary model and the secondary model during test period of 917 days.............. 81 TABLES Table 1. Definitions............................... 10 Table 2. Definitions............................... 11 Table 3. Definitions............................... 12 Table 4. Definitions............................... 13 Table 5. The seven ethical requirements for trustworthy AI........... 21 Table 6. Base Model Metrics.......................... 32 Table 7. Base Model Metrics.......................... 33 Table 8. Meta Label Metrics........................... 33 Table 9. Meta Label Metrics..........................