On Advances in Deep Learning with Applications in Financial Market Modeling

by

Xing Wang

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama August 8, 2020

Keywords: Deep learning, Stock prediction, Convolutional neural network, Stock2Vec, Deep Q-network, Exploration, Overestimation, Cross Q-learning, DQN Trading

Copyright 2020 by Xing Wang

Approved by

Alexander Vinel, Chair, Assistant Professor of Industrial and Systems Engineering

Jorge Valenzuela, Distinguished Professor of Industrial and Systems Engineering

Daniel F. Silva, Assistant Professor of Industrial and Systems Engineering

Erin Garcia, Industrial and Systems Engineering

Abstract

This dissertation focuses on advancing deep learning methods, with a particular focus on their application to financial trading. It is organized into two parts. The first part of this dissertation (Chapters 1-2) is concerned with the application of predictive modeling to stock market prediction. Chapter 1 presents the basics of machine learning and deep learning. In Chapter 2, we combine several recent advances in deep learning to build a hybrid model for forecasting stock prices, which gives us the ability to learn from various aspects of the related information. In particular, we take a deep look at representation learning and the temporal convolutional network for sequential modeling. With representation learning, we derive an embedding called Stock2Vec, which gives us insight into the relationships among different stocks, while the temporal convolutional layers are used to automatically capture effective temporal patterns both within and across series. Our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmark models.

In the second part of this dissertation (Chapters 3-6), we turn our focus to deep reinforcement learning. In Chapter 3, we provide the necessary mathematical and theoretical preliminaries of reinforcement learning, as well as several recent advances in deep Q-networks (DQNs) that we apply later. In Chapters 4 and 5, we aim at algorithmically improving the convergence of training in reinforcement learning, with theoretical analysis and empirical experiments. One prominent challenge in reinforcement learning is the tradeoff between exploration and exploitation. In deep Q-networks (DQNs), this is usually addressed by monotonically decreasing the exploration rate, which is often unsatisfactory. In Chapter 4, we propose to encourage exploration by resetting the exploration rate when it is necessary. Another severe problem in training deep Q-networks is the overestimation of Q-values. In Chapter 5, we propose to bootstrap the estimates from multiple agents, and refer to this learning paradigm as cross Q-learning. Our algorithm effectively reduces the overestimation and significantly outperforms state-of-the-art DQN training algorithms. In Chapter 6, we continue our study of DQNs with an application in a real financial trading environment, by training a DQN agent that provides trading strategies. Finally, we summarize this dissertation in Chapter 7 and discuss possible directions for future research.

Acknowledgments

The first person I am greatly indebted to during my Ph.D. studies is my advisor, Dr. Alexander Vinel. He is the kindest advisor a Ph.D. student could have asked for. Dr. Vinel fully supported every decision I made and gave me lots of freedom to grow as an independent researcher; he encouraged me at every difficult moment and took good care of me both academically and personally. I must express special thanks to Dr. Fadel Megahed and Dr. Daniel Silva. I benefited greatly from many discussions with them, gaining guidance and excellent insights; both of them have had a significant impact on my research. I would also like to thank the other committee members, Dr. Jorge Valenzuela, Dr. Erin Garcia, and Dr. Levent Yilmaz, for their time and effort in giving me invaluable feedback. Thanks also go to all my collaborators, Bing Weng, Lin Lu, Yijun Wang, and Waldyn Martinez, for their kind and brilliant help; it was a great pleasure to work with them. Looking back on this long journey, I cannot forget to express my sincere gratitude to my master's advisors and mentors in other fields, Dr. Alvin Lim, Dr. Yujin Chung, and Dr. Henry Kinnucan; I am fortunate to have been guided by them and have learned a lot from them as well. Last but not least, I thank my parents for their long-lasting support and love.

Table of Contents

Abstract ...... ii
Acknowledgments ...... iv
List of Figures ...... ix
List of Tables ...... xiii
1 Machine Learning and Deep Learning Preliminaries ...... 1
1.1 Introduction to Machine Learning ...... 1
1.2 Support Vector Machine ...... 2
1.3 Boosting ...... 4
1.3.1 Gradient Boosted Tree and XGBoost ...... 5
1.4 Bagging and Random Forest ...... 7
1.5 Neural Networks and Deep Learning ...... 7
1.5.1 Convolutional neural network ...... 8
1.5.2 Recurrent neural network ...... 9
1.6 Regularization ...... 11
1.7 Principal Component Analysis (PCA) and Robust PCA ...... 13
1.8 Summary ...... 16
2 Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network ...... 17
2.1 Introduction ...... 17
2.2 Related Work ...... 21
2.3 Methodology ...... 24
2.3.1 Problem Formulation ...... 24
2.3.2 A Distributional Representation of Stocks: Stock2Vec ...... 24
2.3.3 Temporal Convolutional Network ...... 27
2.3.4 The Hybrid Model ...... 30
2.4 Data Specification ...... 32
2.5 Experimental Results and Discussions ...... 35
2.5.1 Benchmark Models, Hyperparameters and Optimization Strategy ...... 35
2.5.2 Performance Evaluation Metrics ...... 37
2.5.3 Stock2Vec: Analysis of Embeddings ...... 38
2.5.4 Prediction Results ...... 42
2.6 Concluding Remarks and Future Work ...... 45
2.A Sector Level Performance Comparison ...... 47
2.B Performance comparison of different models for the one-day ahead forecasting on different symbols ...... 48
2.C Plots of the actual versus predicted prices of different models on the test data ...... 51
3 Reinforcement Learning Preliminaries ...... 61
3.1 Markov Decision Processes ...... 61
3.2 Value-based Reinforcement Learning ...... 62
3.3 Deep Q-Networks ...... 63
3.3.1 Double DQN ...... 64
3.3.2 Dueling DQN ...... 64
3.3.3 Bootstrapped DQN ...... 65
3.A A Simple Proof of Policy Invariance under Reward Transformation From Linear Programming Perspective ...... 66
3.A.1 Encoding MDP as LP ...... 67
3.A.2 Policy Invariance under Reward Transformation ...... 68
4 Re-anneal Decaying Exploration in Deep Q-Learning ...... 70
4.1 Introduction ...... 70
4.2 Exploration in DQN ...... 72
4.2.1 Exploration Strategies ...... 72
4.2.2 Exploration Decay ...... 74
4.3 Exploration Reannealing ...... 75
4.3.1 Local Optima in DQN ...... 75
4.3.2 Exploration Reannealing ...... 75
4.3.3 Defining Poor Local Optima ...... 77
4.3.4 Algorithm ...... 79
4.4 Experimental Results ...... 80
4.4.1 Testbed Setup ...... 80
4.4.2 Implementation of Exploration Reannealing ...... 82
4.4.3 Results ...... 83
4.5 Conclusions ...... 87
5 Cross Q-Learning in Deep Q-Networks ...... 88
5.1 Introduction ...... 88
5.2 Estimating the Maximum Expected Values ...... 92
5.2.1 (Single) Maximum Estimator ...... 92
5.2.2 Double Estimator ...... 93
5.2.3 Cross Estimator ...... 94
5.3 Convergence in the Limit ...... 95
5.4 Cross DQN ...... 97
5.5 Experimental Results ...... 102
5.5.1 CartPole ...... 103
5.5.2 Lunar Lander ...... 109
5.6 Conclusions and Future Work ...... 113
6 An Application of Deep Q-Network for Financial Trading ...... 115
6.1 Introduction and Related Work ...... 116
6.2 Problem Formulation for Trading ...... 116
6.2.1 State Space ...... 116
6.2.2 Action Space ...... 117
6.2.3 Reward Function ...... 118
6.3 Experiment ...... 120
6.3.1 Environment Setup ...... 120
6.3.2 DQN Agent Setup ...... 122
6.3.3 Results ...... 123
6.3.4 Effect of Transaction Cost ...... 126
6.4 Summary ...... 128
7 Conclusion ...... 129

List of Figures

2.1 Model Architecture of Stock2Vec...... 27

2.2 Visualization of a stack of 1D convolutional layers, non-causal vs. causal. . . . 28

2.3 Visualization of a stack of causal convolutional layers, non-dilated vs. dilated. . 30

2.4 Comparison between a regular block and a residual block. In the latter, the convolution is short-circuited...... 31

2.5 The full model architecture of hybrid TCN-Stock2Vec...... 32

2.6 Feature importance plot of XGBoost model...... 34

2.7 PCA on the learned embeddings for Sectors ...... 39

2.8 PCA on the learned Stock2Vec embeddings ...... 39

2.9 Nearest neighbors of Stock2Vec based on similarity between stocks...... 41

2.10 Boxplot comparison of the absolute prediction errors...... 43

2.11 AAPL daily price predictions over test period, 2019/08/16-2020/02/14...... 51

2.12 AAPL daily price predictions over test period, 2019/08/16-2020/02/14...... 51

2.13 AAPL daily price predictions over test period, 2019/08/16-2020/02/14...... 52

2.14 AAPL daily price predictions over test period, 2019/08/16-2020/02/14...... 52

2.15 AAPL daily price predictions over test period, 2019/08/16-2020/02/14...... 53

2.16 Showcase DAL of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 53

2.17 Showcase DIS of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 54

2.18 Showcase FB of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 54

2.19 Showcase GE of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 55

2.20 Showcase GM of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 55

2.21 Showcase GS of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 56

2.22 Showcase JNJ of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 56

2.23 Showcase JPM of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 57

2.24 Showcase MAR of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 57

2.25 Showcase KO of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 58

2.26 Showcase MCD of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 58

2.27 Showcase NKE of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 59

2.28 Showcase PG of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 59

2.29 Showcase VZ of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 60

2.30 Showcase WMT of predicted vs. actual daily prices of one stock over test period, 2019/08/16-2020/02/14...... 60

4.1 Lunar Lander Environment ...... 81

4.2 Performances measured during training. The upper two rows illustrate the total rewards during each episode and moving averages; (a) and (b) correspond to training without reannealing, while (c) and (d) are with exploration reannealing. The bottom row plots the varying ε values along training with reannealing. In all cases the left column corresponds to exploration decay rate ρ_decay = 0.99, and the right column corresponds to ρ_decay = 0.985...... 86

5.1 Separate and Shared Network Architecture ...... 100

5.2 Comparison of vanilla DQN, double DQN and cross DQNs of K = 5, K = 10 on CartPole...... 106

5.3 Comparison of cross DQNs of K = 5. Cross DQN with ensemble voting, with dueling DQN and voting, with bootstrapped DQN, and with both dueling & bootstrapped DQN on CartPole...... 107

5.4 Comparison of cross DQNs of K = 10. Cross DQN, with dueling DQN, with bootstrapped DQN, with both dueling & bootstrapped DQN on CartPole.... 108

5.5 Comparison of vanilla DQN, double DQN and cross DQNs of K = 5, K = 10 on LunarLander...... 110

5.6 Comparison of cross DQNs of K = 5. Cross DQN, with dueling DQN, with bootstrapped DQN, and with both dueling & bootstrapped DQN on LunarLander...... 111

5.7 Comparison of cross DQNs of K = 10. Cross DQN, with dueling DQN, with bootstrapped DQN, and with both dueling & bootstrapped DQN on LunarLander...... 112

6.1 Total rewards for each episode throughout training ...... 124

6.2 Effects of different transaction cost factor values on DQN in-sample policies . . 126

6.3 Effects of different transaction cost factor values on DQN in-sample policies . . 127

List of Tables

2.1 Description of technical indicators used in this study...... 33

2.2 Dataset summary...... 34

2.3 Average performance comparison...... 43

2.4 Sector level RMSE comparison ...... 47

2.5 Sector level MAE comparison ...... 47

2.6 Sector level MAPE (%) comparison ...... 47

2.7 Sector level RMSPE (%) comparison ...... 48

2.8 RMSE comparison of different models for the one-day ahead forecasting on different symbols ...... 48

2.9 MAE comparison of different models for the one-day ahead forecasting on different symbols ...... 49

2.10 MAPE (%) comparison of different models for the one-day ahead forecasting on different symbols ...... 49

2.11 RMSPE (%) comparison of different models for the one-day ahead forecasting on different symbols ...... 50

6.1 Some statistics of in-sample performance, DQN-derived portfolio vs. benchmark, 2017/10/08-2018/10/08 ...... 125

Chapter 1 Machine Learning and Deep Learning Preliminaries

1.1 Introduction to Machine Learning

Machine learning is the study of building models, or machines, that can learn from data. A commonly cited definition is that "a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [1]. Based on the nature of the tasks T, machine learning is generally divided into three subfields: supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, we are given a set of experience E that consists of N input-output pairs {(x_1, y_1), ..., (x_N, y_N)}, each of which represents a training example, in which the input x is described by a vector of features and the output y is often called the label. The task T is to find a model f that predicts the labels for a set of new test data. If the labels are discrete, f(x) can be interpreted as an estimate of the category that x belongs to, and the task is a classification task; if the labels are continuous, it is called a regression task. Linear regression may serve as a simple form of the supervised learning problem, in which our model is a linear transformation of the inputs, $f(x) = w^T x$, where $w$ is a vector of parameters. In the least-squares setting, we aim to minimize the mean squared error (MSE) over the training data:
$$\frac{1}{N} \sum_{i=1}^{N} \| y_i - w^T x_i \|^2. \qquad (1.1)$$

In contrast, in unsupervised learning problems the experience E does not include the labels; in other words, we are only given X, and the task T is to perform some transformation of, or obtain some insight from, the inputs. Examples include density estimation, dimensionality reduction, clustering, and representation learning. We will introduce reinforcement learning in detail in Chapter 3, but here we would like to highlight its distinction from supervised learning. In reinforcement learning, the actions can be seen as labels, but our task T is sequential decision making, instead of making a single decision as in supervised learning. Moreover, the experience E is collected by interacting with the environment throughout learning, whereas in supervised learning all the samples are given beforehand. This gives rise to the exploration-exploitation dilemma in reinforcement learning, which we address in Chapter 4.
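As a minimal illustration of the least-squares objective in Equation (1.1), the sketch below fits the weight vector $w$ on synthetic data by solving the normal equations with NumPy; the data dimensions and noise level are arbitrary choices for demonstration only.

```python
import numpy as np

# Synthetic regression data: N samples, d features (illustrative only).
rng = np.random.default_rng(0)
N, d = 200, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

# Least-squares fit: minimize (1/N) * sum_i ||y_i - w^T x_i||^2,
# solved here via the normal equations (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

mse = np.mean((y - X @ w_hat) ** 2)
print("estimated weights:", w_hat)
print("training MSE:", mse)
```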

1.2 Support Vector Machine

To explain the learning process from a statistical point of view, [2] proposed the VC learning theory, one of whose major components characterizes the construction of learning machines that enables them to generalize well. Based on that theory, Vapnik and his colleagues at Bell Laboratories developed the support vector machine (SVM) [3, 4], which has been shown to be one of the most influential supervised learning algorithms to date. The key insight of SVM is that the points closest to the separator, called the support vectors, are more important than the others. Assigning non-zero weights only to those support vectors while constructing the learning machine can lead to better generalization, and the resulting separator is called the maximum margin separator. [5] then extended the idea to regression problems by ignoring, when calculating the cost, the training points whose predictions deviate from the actual targets by less than a threshold ε. These points are also called support vectors, and the corresponding learning machine for the classification or regression task is called the support vector machine (SVM). The goal of training an SVM is to find a hyperplane that maximizes the margin, which is equivalent to minimizing the norm of the weight vector subject to the constraints that keep each training sample valid, i.e., the optimization problem can be written as
$$\begin{aligned}
\min_{w} \quad & \frac{1}{2}\|w\|^2 \\
\text{s.t.} \quad & y_i - w^T x_i - b \le \varepsilon, \\
& w^T x_i + b - y_i \le \varepsilon,
\end{aligned} \qquad (1.2)$$
where $x_i$ is a training sample with target $y_i$. We will not show the details here, but maximizing the Lagrangian dual is a much simpler quadratic programming problem. It is convex, and thus does not get stuck in local optima, and there are well-studied techniques to solve it, such as the sequential minimal optimization (SMO) algorithm [6], which is specialized for minimizing the SVM loss.

[4] also introduced the idea of the soft margin, which allows some misclassification by assigning the misclassified points a penalty proportional to their distance from the correct class. The allowance of softness in margins dramatically reduces the computational work of training an SVM, but more importantly, it captures the noisiness of real-world data and can yield a more generalizable model; in contrast, a hard margin results in zero errors on the training data, but the model is likely overfitting. Another key technique that makes SVM successful is the so-called "kernel trick", which maps the original, non-linearly-separable input into a higher-dimensional space in which the data become linearly separable, thus greatly expanding the hypothesis space [7].

However, SVM has its own disadvantages. Its performance is extremely sensitive to the selection of the kernel function as well as its parameters. Another major drawback of kernel machines is that the computational cost of training is high when the dataset is large [8]; they also suffer from the curse of dimensionality and may struggle to generalize well.
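As a hedged illustration of the ε-insensitive regression formulation in (1.2), the following sketch fits a kernelized support vector regressor with scikit-learn; the RBF kernel, ε, and regularization constant C shown here are arbitrary demonstration values, not settings used in this dissertation.

```python
import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression problem (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

# epsilon controls the width of the insensitive tube in (1.2);
# C trades off margin maximization against tube violations (soft margin);
# the RBF kernel realizes the "kernel trick" discussed above.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
model.fit(X, y)

print("number of support vectors:", len(model.support_))
print("prediction at x=0:", model.predict([[0.0]]))
```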

1.3 Boosting

Rooted in probably approximately correct (PAC) learning [9], [10] posed the question of whether a set of "weak" learners (i.e., learners that perform only slightly better than random guessing) can be combined to produce a learner with arbitrarily high accuracy. [11] and [12] then gave an affirmative answer by providing a boosting algorithm, and the most popular boosting algorithm, AdaBoost, was developed by [13]. AdaBoost addresses two fundamental questions in the idea of boosting: how to choose the distribution in each round, and how to combine the weak rules into a single strong learner [14]. It uses "importance weights" to force the learner to pay more attention to the examples with larger errors; that is, it iteratively fits a learner using the weighted data and updates the weights using the error from the fitted learner, and finally combines these weak learners through a weighted majority vote. Boosting is generally computationally efficient and has no difficult parameters to set; it (theoretically) guarantees the desired accuracy given sufficient data and a reliable base learner. In practice, however, the performance of boosting depends significantly on the sufficiency of data as well as the choice of base learner. Base learners that are too weak will fail to work, while overly complex base learners can result in overfitting. Boosting also appears susceptible to uniform noise [15], since it may over-emphasize the highly noisy examples in later rounds of training and overfit them.

As an "off-the-shelf" supervised learning method, the decision tree is the most common choice of base learner for boosting. It is one of the simplest models to train, yet powerful and easy to represent. It partitions the space of all joint predictor variable values into disjoint regions using greedy search, based either on the error or on the information gain. However, due to its greedy strategy, the results obtained by a decision tree can be unstable and have high variance, and thus often achieve lower generalization accuracy. One common way to improve its performance is boosting, which primarily reduces the bias as well as the variance [16].

1.3.1 Gradient Boosted Tree and XGBoost

The key idea of gradient boosting is to use gradient descent to find the optimal weak learner added at each iteration to the final additive model, which requires the objective function to be differentiable so that the gradient can be calculated. Consider additive boosting, and let $f_t$ denote the weak learner obtained at iteration $t$; the predicted value for input $x_i$ is formed by the additive model as
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i). \qquad (1.3)$$

The objective of the optimization problem at iteration $t$ is then
$$\min J^{(t)} = \sum_{i=1}^{n} L\big(y_i, \hat{y}_i^{(t)}\big) + \sum_{i=1}^{t} \Omega(f_i) \qquad (1.4)$$
$$\phantom{\min J^{(t)}} = \sum_{i=1}^{n} L\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \text{constant}, \qquad (1.5)$$
where $L$ denotes the loss function, $\Omega$ is the regularization penalty, and $n$ is the number of samples. We obtain Equation (1.5) because the weak learners from previous iterations are fixed in additive boosting. Let $g_i$ and $h_i$ denote the gradient and Hessian of the loss function with respect to the $i$-th sample's prediction, respectively, i.e.,
$$g_i = \frac{\partial L\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \hat{y}_i^{(t-1)}}, \qquad (1.6)$$
$$h_i = \frac{\partial^2 L\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}. \qquad (1.7)$$

Using a second-order Taylor expansion, we rewrite Equation (1.5) as
$$\min J^{(t)} = \sum_{i=1}^{n} \Big[ L\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + \text{constant} \qquad (1.8)$$
$$\phantom{\min J^{(t)}} = \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) \qquad (1.9)$$
after removing all the constants. This optimization problem can then be solved with the gradient and Hessian of each sample as inputs. For the gradient boosted tree in particular, an individual tree can be parameterized as
$$f_t(x) = w_{q(x)}, \qquad (1.10)$$
where $w \in \mathbb{R}^T$ denotes the vector of scores on the leaves, $T$ is the number of leaves, and $q: \mathbb{R}^d \to \{1, \cdots, T\}$ assigns sample $x$ to the corresponding leaf. The regularization term is defined as
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2. \qquad (1.11)$$

Equation (1.9) can then be written as
$$\min J^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \Big] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \qquad (1.12)$$
$$\phantom{\min J^{(t)}} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\lambda + \sum_{i \in I_j} h_i\Big) w_j^2 \Big] + \gamma T, \qquad (1.13)$$
where $I_j = \{i \mid q(x_i) = j\}$ is the index set of samples assigned to leaf $j$. We solve the problem iteratively with a greedy algorithm, i.e., coordinate descent over the leaves; each term in Equation (1.13) is then of quadratic form and has the simple analytic solution
$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad (1.14)$$
in which $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ for conciseness. Equation (1.14) shows how easily and efficiently the gradient boosted tree can be obtained within an iteration given the gradient and Hessian information.

XGBoost [17] refers to a software package that implements gradient boosting with extensive engineering effort. It was designed to be scalable, efficient, flexible and portable.

The common advantages of XGBoost include cache-aware access for memory efficiency, out-of-core computation, efficient handling of sparse data, a theoretically justified approximation for instance weights, built-in cross-validation for regularization and tree pruning, etc. The library is very user-friendly; training XGBoost models can be very fast and easily parallelized as well as distributed across a cluster.
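As a hedged illustration of the two pieces discussed above, the sketch below first computes the optimal leaf weight of Equation (1.14) directly from per-sample gradients and Hessians (using the squared-error loss, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$), and then fits an off-the-shelf XGBoost regressor; the hyperparameter values are placeholders for demonstration only.

```python
import numpy as np
import xgboost as xgb

# --- Leaf-weight formula (1.14) under squared-error loss ---
# For L(y, yhat) = 0.5 * (y - yhat)^2: g_i = yhat_i - y_i, h_i = 1.
y_leaf = np.array([3.0, 3.5, 4.0])      # targets of samples falling in one leaf
yhat_prev = np.zeros_like(y_leaf)        # predictions from previous rounds
g = yhat_prev - y_leaf                   # gradients
h = np.ones_like(y_leaf)                 # Hessians
lam = 1.0                                # L2 regularization on leaf weights
w_star = -g.sum() / (h.sum() + lam)      # Equation (1.14)
print("optimal leaf weight:", w_star)

# --- Fitting a gradient boosted tree ensemble with XGBoost ---
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)

model = xgb.XGBRegressor(
    n_estimators=100,     # number of boosting rounds (illustrative value)
    max_depth=3,
    learning_rate=0.1,
    reg_lambda=1.0,       # the lambda of Equation (1.11)
)
model.fit(X, y)
print("training R^2:", model.score(X, y))
```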

1.4 Bagging and Random Forest

The name "bagging" is an abbreviation of bootstrap aggregating, another ensemble meta-algorithm in machine learning. Bagging averages the predicted values (or aggregates votes with equal weights, as in classification tasks) over a collection of bootstrap subsets of the training samples (i.e., drawn by sampling with replacement), thus reducing the variance. The bagging estimate differs from the original estimate only if the latter is a nonlinear or adaptive function of the data [16]. Some supervised learning methods, such as the decision tree, are sensitive to outliers and have high variance, and thus are often combined with bagging to avoid an individual model overfitting the training data. Random forest [18] further improves on bagged decision trees. Because each decision tree greedily selects the optimal split over all features, the outputs of different trees are often highly correlated even with bagging, due to the similarity of the trees. Random forest alleviates this issue by searching over only a random subset of features at each split, resulting in less correlated outputs.
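A minimal sketch of bagged trees versus a random forest with scikit-learn is shown below; the dataset and hyperparameters are placeholders chosen only to illustrate the feature-subsampling idea (`max_features`), not settings used in this dissertation.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = X[:, 0] * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=400)

# Plain bagging: each tree (the default base estimator) sees a bootstrap sample
# of the rows but considers every feature at every split.
bagging = BaggingRegressor(n_estimators=100, random_state=0)

# Random forest: bootstrap samples plus a random subset of features per split,
# which decorrelates the individual trees.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    model.fit(X, y)
    print(name, "training R^2:", round(model.score(X, y), 3))
```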

1.5 Neural Networks and Deep Learning

Inspired by the complex biological neural system in our brain, artificial neurons were proposed by [19] using threshold logic. [20] and [21] independently discovered the backpropagation algorithm, which can train complex multi-layer networks effectively by computing the gradient of the objective function with respect to the weights; it has made complicated neural networks widely used since then, especially since the revival of the deep learning field around 2006, as parallel computing emerged rapidly. Neural networks have been shown to be the most successful machine learning models for stock market prediction, due to their ability to handle complex nonlinear systems over complex stock market data.

In a neural network, the features form the input x and are weighted and summed (z = w^T x); the information is then transformed by the functions in each neuron and propagated through layers, finally producing the desired output. If there are hidden layers between the input and output layers, the network is called "deep", and the hidden layers can distort the linearity of the weighted sum of the inputs so that the outputs become linearly separable. Theoretically, we can approximate any function that maps the input to the output if the number of neurons is not limited. This gives neural networks the ability to obtain higher accuracy in stock market prediction, where the underlying model is extremely complicated.

The functions in each neuron are called "activations", and they come in many different types. The most commonly used activation before the deep learning era was the sigmoid function, which is smooth and has an easy-to-express first-order derivative (in terms of the sigmoid function itself), and thus is convenient to train with backpropagation. Furthermore, its S-shaped, saturating curve is good for classification, though for regression this property might be a disadvantage. It is worth noting that the rectified linear unit (ReLU) [22], which takes the simple form f(z) = max(z, 0), has the advantage of a gradient that is constant (when z > 0) rather than vanishing, thus resulting in faster learning in networks with many layers. Also, sparsity of the activations arises when z < 0, which can reduce the complexity of the representation in large architectures. Both properties have made ReLU one of the dominant non-linear activation functions in the last few years, especially in the field of deep learning [23].
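The sketch below (illustrative only) contrasts the sigmoid and ReLU activations and their derivatives, which underlies the vanishing-gradient argument above: the sigmoid's derivative shrinks toward zero for large |z|, while the ReLU's derivative stays at one for z > 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                        # expressed via the sigmoid itself

def relu(z):
    return np.maximum(z, 0.0)                   # f(z) = max(z, 0)

def relu_grad(z):
    return (np.asarray(z) > 0).astype(float)    # constant gradient of 1 for z > 0

for z in [-6.0, -2.0, 0.5, 2.0, 6.0]:
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.4f}  relu'={float(relu_grad(z)):.1f}")
```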

1.5.1 Convolutional neural network

Convolutional neural networks (CNNs) are a special family of deep neural network models, and have been tremendously successful in practical applications, especially in the field of computer vision for processing 2D image data. In contrast to standard fully-connected neural networks, in which a separate weight describes the interaction between each input and output pair, a CNN shares its parameters across multiple mappings. This is achieved by constructing a set of kernels (or filters) of fixed size, generally much smaller than the input, each consisting of a set of trainable parameters; therefore, the number of parameters is greatly reduced. Each kernel is slid over the entire input to create a feature map. Multiple kernels are usually trained and used together, each specialized in capturing a specific feature of the data. Note that the so-called convolution operation is technically a cross-correlation in general; it generates linear combinations of a small subset of the input, thus focusing on local connectivity. Fundamentally, with CNNs we assume that the input data has some grid-like topology, and that the same pattern is characteristic of every location, i.e., the model enjoys the property of equivariance to translation [24].

The size of the output then depends not only on the size of the input but also on several settings of the kernels: the stride, the padding, and the number of kernels. The stride s denotes the interval between two consecutive convolution centers, and can be thought of as downsampling the output. With padding, we add values (most often zeros) at the boundary of the input, which is primarily used to control the output size; as we will show later, it can also be used to manage the starting position of the convolution operation on the input. The number of kernels adds another dimension to the output, and is often referred to as the number of channels.
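The following sketch (PyTorch, with values chosen arbitrarily for illustration) shows how the kernel size, stride, padding, and number of kernels determine the output shape of a 1D convolution of the kind used later in the temporal convolutional network.

```python
import torch
import torch.nn as nn

# A batch of 8 univariate series of length 32: (batch, channels, length).
x = torch.randn(8, 1, 32)

# 16 kernels (output channels) of size 3, stride 2, zero-padding 1.
conv = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=3, stride=2, padding=1)
y = conv(x)

# Output length = floor((L + 2*padding - kernel_size) / stride) + 1
#               = floor((32 + 2 - 3) / 2) + 1 = 16
print(y.shape)  # torch.Size([8, 16, 16])
```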

1.5.2 Recurrent neural network

Recurrent neural networks (RNNs) and their variants within the sequence-to-sequence (Seq2Seq) framework [25] have achieved great success in many sequential modeling tasks, such as machine translation [26], speech recognition [27], and natural language processing [28], and have been extended to autoregressive time series forecasting [29, 30] in recent years. However, RNNs suffer from several major problems. For instance, due to their inherent temporal nature (i.e., the hidden state is propagated through time), training cannot be parallelized; moreover, when trained with backpropagation through time (BPTT) [31], RNNs can severely suffer from the vanishing gradient problem and thus in practice cannot capture long-term dependencies [32]. More elaborate RNN architectures use gating mechanisms to alleviate the vanishing gradient problem; the long short-term memory (LSTM) [33] and its simplified variant, the gated recurrent unit (GRU) [34], are the two popular architectures commonly used in practice.

In an LSTM, a memory cell is used to store the state at time step $t$: a cell state vector $c_t$ and a hidden state vector $h_t$ are propagated over time as the information flow. Inside each memory cell, the flow is controlled by parameterized gates: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. The forward propagation of an LSTM cell can be summarized as follows:
$$f_t = \sigma\big(W_f [x_t, h_{t-1}] + b_f\big) \qquad (1.15)$$
$$i_t = \sigma\big(W_i [x_t, h_{t-1}] + b_i\big) \qquad (1.16)$$
$$o_t = \sigma\big(W_o [x_t, h_{t-1}] + b_o\big) \qquad (1.17)$$
$$\tilde{c}_t = \tanh\big(W_c [x_t, h_{t-1}] + b_c\big) \qquad (1.18)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (1.19)$$
$$h_t = o_t \odot c_t \qquad (1.20)$$

where $[x_t, h_{t-1}] \in \mathbb{R}^{d_h + d_x}$ denotes the concatenation of the current input $x_t \in \mathbb{R}^{d_x}$ and the previous hidden state $h_{t-1} \in \mathbb{R}^{d_h}$, the activation $\sigma$ is often a sigmoid function, and the operator $\odot$ denotes the Hadamard product. $W_f, W_i, W_o, W_c \in \mathbb{R}^{d_h \times (d_h + d_x)}$ and $b_f, b_i, b_o, b_c \in \mathbb{R}^{d_h}$ are the weight and bias parameters to be learned.

The GRU combines the forget gate and the input gate of the LSTM into a single "update gate" $z_t$, and is thus simpler and has fewer parameters to learn. To eliminate confusion due to renaming, we note that the reset gate $r_t$ and the hidden state $h_t$ correspond to the output gate and the cell state of the LSTM counterpart, respectively. The update of a GRU cell is very similar to that of the LSTM, and is summarized as follows:
$$z_t = \sigma\big(W_z [x_t, h_{t-1}] + b_z\big) \qquad (1.21)$$
$$r_t = \sigma\big(W_r [x_t, h_{t-1}] + b_r\big) \qquad (1.22)$$
$$\tilde{h}_t = \tanh\big(W_h [x_t, r_t \odot h_{t-1}] + b_h\big) \qquad (1.23)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \qquad (1.24)$$
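A minimal NumPy sketch of the GRU update in Equations (1.21)-(1.24) is given below; the dimensions and the random weight initialization are placeholders for illustration, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x_t, h_prev, params):
    """One GRU step following Eqs. (1.21)-(1.24)."""
    W_z, b_z, W_r, b_r, W_h, b_h = params
    xh = np.concatenate([x_t, h_prev])              # [x_t, h_{t-1}]
    z = sigmoid(W_z @ xh + b_z)                     # update gate    (1.21)
    r = sigmoid(W_r @ xh + b_r)                     # reset gate     (1.22)
    xrh = np.concatenate([x_t, r * h_prev])         # [x_t, r_t * h_{t-1}]
    h_tilde = np.tanh(W_h @ xrh + b_h)              # candidate      (1.23)
    return (1.0 - z) * h_prev + z * h_tilde         # new state      (1.24)

# Toy dimensions and random weights (illustrative only).
rng = np.random.default_rng(0)
d_x, d_h = 4, 8
params = tuple(
    m for _ in range(3) for m in (rng.normal(size=(d_h, d_h + d_x)) * 0.1, np.zeros(d_h))
)

h = np.zeros(d_h)
for t in range(5):                                  # run over a short sequence
    x_t = rng.normal(size=d_x)
    h = gru_cell(x_t, h, params)
print("hidden state after 5 steps:", h.round(3))
```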

1.6 Regularization

In statistics, weight decay is often called Ridge, or simply $L_2$ regularization. As arguably the most common regularizer, weight decay is imposed in most of our models, in which the training objective function is assumed to include a regularization term that penalizes the unknown parameters in terms of their $L_2$ norm, i.e.,
$$\hat{w} = \arg\min_w \Big\{ L(w) + \lambda \|w\|_2^2 \Big\}, \qquad (1.25)$$
where $\lambda \ge 0$ is a complexity parameter that controls the amount of regularization. It is equivalent to adding a hard constraint restricting the weights to a Euclidean ball, with the radius determined by the amount of weight decay; i.e., an equivalent way of rewriting (1.25) is
$$\hat{w} = \arg\min_w L(w) \quad \text{subject to} \quad \|w\|_2^2 \le t. \qquad (1.26)$$
Note that there is a one-to-one correspondence between $\lambda$ in (1.25) and $t$ in (1.26). We can also think of weight decay from a Bayesian perspective: it corresponds to using a symmetric multivariate normal distribution as the prior for the weights, i.e., $p(w) = \mathcal{N}(w \mid 0, \lambda^{-1} I)$; as a result, $-\log \mathcal{N}(w \mid 0, \lambda^{-1} I) \propto -\log \exp\!\big(-\tfrac{\lambda}{2}\|w\|_2^2\big) = \tfrac{\lambda}{2}\|w\|_2^2$.
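As a small, hedged illustration of the penalized form (1.25), the sketch below runs gradient descent on a least-squares loss with an added $L_2$ penalty, which amounts to applying a "weight decay" term proportional to $w$ at every step; the data, $\lambda$, and step size are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

lam, lr = 0.5, 0.01          # regularization strength and learning rate (illustrative)
w = np.zeros(5)
for _ in range(2000):
    grad_loss = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE loss L(w)
    w -= lr * (grad_loss + 2 * lam * w)          # the 2*lam*w term is the weight decay

# Closed-form ridge solution for comparison: (X^T X / n + lam I)^{-1} X^T y / n
w_ridge = np.linalg.solve(X.T @ X / len(y) + lam * np.eye(5), X.T @ y / len(y))
print("gradient descent:", w.round(3))
print("closed form     :", w_ridge.round(3))
```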

Just as with weight decay, the Lasso (Least Absolute Shrinkage and Selection Operator) estimate [35] is defined as
$$\hat{w} = \arg\min_w L(w) \quad \text{subject to} \quad \|w\|_1 \le t, \qquad (1.27)$$
and its equivalent Lagrangian form is written as
$$\hat{w} = \arg\min_w \Big\{ L(w) + \lambda \|w\|_1 \Big\}. \qquad (1.28)$$
In addition to a shrinkage effect as in Ridge, the Lasso also performs variable selection. Although the sparsity of the variables is naturally measured by the $L_0$ norm, the most straightforward optimization problem for imposing sparsity,
$$\arg\min_w \Big\{ L(w) + \lambda \|w\|_0 \Big\}, \qquad (1.29)$$
is intractable. The Lasso can be seen as a convex relaxation of (1.29), in which the $L_1$ norm replaces the non-convex $L_0$ norm to impose sparsity. While linear models with Lasso regularization are often solved with least angle regression (LAR) [36], more generally such a convex optimization problem can be solved efficiently using proximal gradient descent, such as the iterative thresholding algorithm (ITA) [37], as long as $L(w)$ is convex and has a Lipschitz continuous gradient while the other term is convex, which is often the case in the design of machine learning systems. In particular, the alternating direction method of multipliers (ADMM) [38] also often serves as the solver, especially for distributed model fitting with big data, or for deriving heuristics when the objective is non-convex. For more details on proximal gradient methods, readers can refer to [39].

A number of simple yet powerful regularization methods dedicated to preventing overfitting when training deep learning models have been proposed in recent years. Besides the general weight decay and Lasso, such methods include data augmentation [40], early stopping [41], dropout [42], etc. Dropout is a stochastic regularization method which imposes sparsity constraints through randomness. During training, dropout randomly masks out each element of a layer's output with a given dropout probability; this prevents units from excessively co-adapting, since the dropped-out neurons can no longer affect the retained units. At test time, predictions are obtained using the outputs of all neurons, scaled down by the dropout probability. Another way to interpret dropout is that it yields a very efficient form of model averaging in which the number of trained models is exponential in the number of units, and these models share the same parameters [43].

It has also been shown that a number of recent advances in deep learning optimization techniques have an implicit regularization effect, such as learning rate decay and batch normalization. Batch normalization [44] was proposed to resolve internal covariate shift, in which the distribution of the inputs of each layer changes during the training process, by normalizing layer inputs. Experimental studies have also shown that it induces both faster convergence and better generalization, by enabling large learning rates and preventing overfitting when training deep networks.
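As a hedged sketch of the proximal gradient (iterative soft-thresholding) approach mentioned above for the Lasso objective (1.28), the following code alternates a gradient step on the smooth squared-error loss with the soft-thresholding proximal operator of the $L_1$ penalty; the data, $\lambda$, and step size are arbitrary illustrative choices.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrinks each entry toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]                    # sparse ground truth (illustrative)
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 5.0                                        # L1 penalty weight in (1.28)
step = 1.0 / np.linalg.norm(X, 2) ** 2           # 1 / Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(500):                             # ISTA iterations
    grad = X.T @ (X @ w - y)                     # gradient of 0.5 * ||y - Xw||^2
    w = soft_threshold(w - step * grad, step * lam)

print("nonzero coefficients:", np.flatnonzero(w))
print("estimates:", w[np.flatnonzero(w)].round(3))
```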

1.7 Principal Component Analysis (PCA) and Robust PCA

Principal component analysis (PCA) in some ways founded multivariate data analysis and is probably the most commonly used multivariate technique. Its origin can be traced back to [45], who described the geometric view of this analysis as looking for lines and planes of closest fit to systems of points in space. [46] further developed this technique and coined the term "principal component". The goal of PCA is to extract and keep only the important information in the data. To achieve this, PCA projects the original data onto principal components (PCs), also called PC-scores, which are derived as linear combinations of the original variables such that the second-order reconstruction error is minimized. For normal variables with mean zero, the second-order covariance matrix contains all the information about the data. Thus the PCs provide the best linear approximation to the original data: the first PC is computed as the linear combination that captures the largest possible variance, the second PC is constrained to be orthogonal to the first PC while capturing the largest possible remaining variance, and so on. This process can be carried out through the singular value decomposition (SVD), and indeed PCA is often solved by truncated SVD. Since the variance depends on the scale of the variables, standardization (i.e., centering and scaling) is applied beforehand so that each variable has zero mean and unit standard deviation.

Let $X$ be the standardized data matrix; the covariance matrix can be obtained as $\Sigma = \frac{1}{n} X X^T$, which is symmetric and positive semi-definite. By the spectral theorem, we can write $\Sigma = Q \Lambda Q^T$, where $\Lambda$ is a diagonal matrix consisting of the ordered eigenvalues of $\Sigma$, and the column vectors of $Q$ are the corresponding orthonormal eigenvectors. The PCs can then be obtained as the columns of $Q\Lambda$. It can be shown [47] that the total variation is equal to the sum of the eigenvalues of the covariance matrix, $\sum_{i=1}^{p} \mathrm{Var}(\mathrm{PC}_i) = \sum_{i=1}^{p} \lambda_i = \mathrm{trace}(\Sigma)$, and the fraction $\sum_{i=1}^{k} \lambda_i / \mathrm{trace}(\Sigma)$ gives the cumulative proportion of the variance explained by the first $k$ PCs. In many cases, the first few PCs capture most of the variation, so the remaining components can be disregarded with only minor loss of information. It is also important to note that PCA derives orthogonal components which are uncorrelated with each other. The optimization problem can then be written as

$$\begin{aligned} \arg\min_{A} \quad & \|X - A\|_F^2 \\ \text{subject to} \quad & \mathrm{rank}(A) \le k, \end{aligned} \qquad (1.30)$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $A$ is the low-rank representation of the centered data $X$, with the truncated SVD solution $A = \sum_{i \le k} \sigma_i u_i v_i^T$.

However, just like other linear models, PCA is highly sensitive to outliers. Robust PCA [48, 49], also called Principal Component Pursuit, addresses this issue by considering an additional structure that is assumed to capture the sparse outliers, i.e., $X = A + Z + E$, where $Z$ is the matrix of sparse outliers and $E$ denotes the rest of the noise. The optimization problem then becomes
$$\begin{aligned} \min_{A} \quad & \mathrm{rank}(A) + \lambda \|Z\|_0 \\ \text{subject to} \quad & X = A + Z, \end{aligned} \qquad (1.31)$$
where $\lambda$ is the regularization weight for $Z$. Since solving the above problem is intractable, two convex relaxations are made. First, the nuclear norm of $A$, i.e., the sum of its singular values, $\|A\|_* = \sum_i \sigma_i(A)$, replaces the true rank (which can be seen as the $L_0$ norm of the singular values); this relaxation is often applied in solving the matrix completion problem. Second, as in the Lasso, the $L_1$ norm of $Z$ is used instead of the $L_0$ norm. We can then write the problem in augmented Lagrangian form as
$$L(A, Z, Y) = \|A\|_* + \lambda \|Z\|_1 + \langle Y, X - (A + Z) \rangle + \frac{\mu}{2} \|X - (A + Z)\|_F^2, \qquad (1.32)$$
where $Y$ and $\frac{\mu}{2}$ are the coefficients of the Lagrangian term and the augmentation, respectively, and $\langle \cdot, \cdot \rangle$ denotes the inner product. This convex problem can then be solved in the form of ADMM, with further improvements borrowed from related methods, as follows:

• Given $Z$ and $Y$, update $A$:
$$\arg\min_{A} \|A\|_* + \frac{\mu}{2} \Big\| X - A - Z + \frac{Y}{\mu} \Big\|_F^2.$$
Note that this step is very similar to the convex relaxation of the matrix completion problem, and can be solved efficiently by singular value thresholding (SVT) [50], i.e., soft-thresholding the singular values of the matrix $X - Z + \frac{Y}{\mu}$ by $\frac{1}{\mu}$.

• Given $A$ and $Y$, update $Z$:
$$\arg\min_{Z} \lambda \|Z\|_1 + \frac{\mu}{2} \Big\| X - A - Z + \frac{Y}{\mu} \Big\|_F^2.$$
While this step has the same form as the Lasso, it has a closed-form solution obtained by soft-thresholding the scalar entries by $\frac{\lambda}{\mu}$, i.e., $Z = \mathrm{sgn}\big(X - A + \frac{Y}{\mu}\big) \odot \max\big(\big|X - A + \frac{Y}{\mu}\big| - \frac{\lambda}{\mu}, 0\big)$.

• Given $A$ and $Z$, update $Y$ as in ADMM:
$$Y \leftarrow Y + \mu (X - A - Z).$$

The above procedure iterates until convergence. Note that robust PCA can be seen as a special case of imposing regularization on PCA.

1.8 Summary

In this chapter, we introduced several widely applied machine learning models, which serve as preliminaries for the later chapters, especially for our predictive modeling application in the stock market. The SVM model gives a general illustration of the linkage between modeling and learning theory. As we will show in Chapter 2, the CNN is one of the major components of our proposed model, while its RNN counterpart serves as an important benchmark. We introduced the ensemble models, boosting and bagging in particular, not only because we use the XGBoost and random forest models for additional benchmarking, but more importantly because the idea of combining useful and powerful components into a strong model that addresses a particular need is applied throughout this dissertation. We also discussed several regularization techniques, as different forms of regularization are applied in all of our models; although not elaborated in this dissertation, developing regularization through advances in optimization is one of our major research interests. Finally, we discussed PCA in detail, as we use this unsupervised learning technique not only to accomplish dimensionality reduction for visualization, but also to compare with another major component of our proposed model, namely the neural embedding, which is presented in Chapter 2.

Chapter 2 Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network

We propose a global hybrid deep learning framework to predict daily prices in the stock market. We use entity embedding for the stocks and train the models globally over the whole stock market. The trained embedding layer, Stock2Vec, reveals insights into the relationships among stocks, allowing us to incorporate market-wide information and thereby improve model performance. In addition, 1-D dilated causal convolutional layers are used for capturing the temporal patterns both within and across series from historical data, leading to more accurate predictions. Finally, the models are evaluated on S&P 500 data.

2.1 Introduction

In finance, the classic strong efficient market hypothesis (EMH) posits that stock prices follow a random walk and cannot be predicted [51]. Consequently, the well-known capital asset pricing model (CAPM) [52, 53, 54] serves as the foundation for portfolio management and asset pricing, among many other applications in financial engineering. The CAPM assumes a linear relationship between the expected return of an asset (e.g., a portfolio, an index, or a single stock) and its covariance with the market return; i.e., for a single stock, CAPM simply predicts its return $r_i$ within a certain market with the linear equation
$$r_i(t) = \alpha_i + \beta_i r_m(t),$$
where the Alpha ($\alpha_i$) describes the stock's ability to beat the market, also referred to as its "excess return" or "edge", and the Beta ($\beta_i$) is the sensitivity of the expected return of the stock to the expected market return ($r_m$). Both Alpha and Beta are often fitted using simple linear regression on historical return data. Under the efficient market hypothesis (EMH), the Alphas are entirely random with expected value zero, and cannot be predicted.

In practice, however, financial markets are more complicated than the idealized and simplified strong EMH and CAPM. Active traders and empirical studies suggest that the financial market is never perfectly efficient and thus stock prices, as well as the Alphas, can be predicted, at least to some extent. Based on this belief, stock prediction has long played a key role in numerous data-driven decision-making scenarios in the financial market, such as deriving trading strategies. Among the various methods for stock market prediction, the classical Box-Jenkins models [55], exponential smoothing techniques, and state space models [56] for time series analysis are the most widely adopted, in which the factors of autoregressive structure, trend, seasonality, etc. are estimated independently from the historical observations of each single series. In recent years, researchers as well as industry have deployed various machine learning models to forecast the stock market, such as k-nearest neighbors (kNN) [57, 58], hidden Markov models (HMM) [59, 60], support vector machines (SVM) [61, 62], artificial neural networks (ANN) [63, 64, 65, 66, 67], and various hybrid and ensemble methods [68, 69, 70, 68, 71], among many others. The literature has demonstrated that machine learning models typically outperform traditional statistical time series models, mainly for the following reasons: 1) less strict assumptions on the data distribution; 2) various model architectures can effectively learn complex linear and non-linear patterns from data; 3) sophisticated regularization techniques and procedures provide flexibility and strength in handling correlated input features and controlling overfitting, so that more features can be fed into the machine learning models. As the fluctuation of the stock market indeed depends on a variety of related factors, in addition to utilizing the

historical information of stock prices and volumes as in traditional technical analysis [72], recent research on stock market forecasting has been focusing on informative external sources of data, for instance, the accounting performance of the company [73], macroeconomic effects [74, 71], government intervention and political events [75], etc. With the increased popularity of web technologies and their continued evolution, the opinions of the public from relevant news [76] and social media texts [77, 78] have an increasing effect on stock movements, and various studies have confirmed that combining extensive crowd-sourced and/or financial news data facilitates more accurate prediction [79].

During the last decade, with the emergence of deep learning, various neural network models have been developed and achieved success in a broad range of domains, such as computer vision [80, 81, 82, 83] and natural language processing [84, 85, 86]. For stock prediction specifically, recurrent neural networks (RNNs) are the most commonly implemented deep learning models [87, 88]. Convolutional neural networks (CNNs) have also been utilized; however, most of that work transformed the financial data into images in order to apply 2D convolutions as in standard computer vision applications. For example, the authors of [89] converted technical indicator data into 2D images and classified the images with a CNN to predict trading signals. Alternatively, [90] directly used candlestick chart graphs (which use candles to visually represent the open, close, low and high prices, and denote the moves with different colors) as inputs to determine Buy, Hold and Sell behavior as a classification task, while similarly in [91] bar chart images were fed into a CNN. The authors of [92] used a 3D CNN-based framework to extract various sources of data, including different markets, for predicting the next day's direction of movement of five major stock indices, and showed significantly improved prediction performance compared to the baseline algorithms. There is also research combining RNNs and CNNs, in which the temporal patterns were learned by RNNs, while CNNs were used either for capturing the correlation between nearby series (in which the order matters if there are more than 2 series) or for learning from images; see [93, 94]. The deployment of CNNs in all these studies differs

significantly from ours, since we aim at capturing the temporal patterns without relying on two-dimensional convolutions. In [95], a 1D causal CNN was used to make predictions based on the history of closing prices only, while no other features were considered. Note that all of the aforementioned work has put its effort into learning more accurate Alphas, and most of the existing research focuses on deriving separate models for each stock, while only a few authors consider the correlation among different stocks over the entire market as a possible source of information. In other words, the Betas are often ignored. At the same time, since it is natural to assume that markets can have a nontrivial correlation structure, it should be possible to extract useful information from the group behavior of assets. Moreover, rather than the simplified linearity assumed in CAPM, the true Betas may exhibit more complicated nonlinear relationships between the stock and the market. In this chapter, we propose a new deep learning framework that leverages both the underlying Alphas and (nonlinear) Betas. In particular, our approach innovates in the following aspects:

1) From the model architecture perspective, we build a hybrid model that combines the advantages of both representation learning and deep networks. With representation learning, specifically, we use embedding in the deep learning model to derive implicit Betas, which we refer to as Stock2Vec; this not only gives us insight into the correlation structure among stocks, but also helps the model learn more effectively from the features, thus improving prediction performance. In addition, with recent advances in deep learning architecture, in particular the temporal convolutional network, we further refine the Alphas by letting the model automatically extract temporal information from the raw historical series.

2) From the data source perspective, unlike much time series forecasting work that learns directly from raw series, we generate technical indicator features supplemented with external sources of information such as online news. Our approach differs from most research built on machine learning models, since in addition to explicit hand-engineered temporal features, we use the raw series as augmented data input. More importantly, instead of training separate models on each single asset as in most stock market prediction research, we learn a global model on the available data over the entire market, so that the relationships among different stocks can be revealed.

The rest of this chapter is organized as follows. Section 2.2 reviews several recent advances related to our method, in particular deep learning and its applications in forecasting, as well as representation learning. Section 2.3 illustrates the building blocks and details of our proposed framework, specifically the Stock2Vec embedding and the temporal convolutional network, as well as how our hybrid models are built. Our models are evaluated on the S&P 500 stock price data and benchmarked against several others; Section 2.4 describes the sample data, while the evaluation results as well as the interpretation of Stock2Vec are presented in Section 2.5. Finally, we conclude our findings and discuss meaningful directions for future work in Section 2.6.

2.2 Related Work

Recurrent neural networks (RNNs) and their variants within the sequence-to-sequence (Seq2Seq) framework [25] have achieved great success in many sequential modeling tasks, such as machine translation [26], speech recognition [27], and natural language processing [28], with extensions to autoregressive time series forecasting [29, 30] in recent years. However, RNNs can suffer from several major challenges. Due to their inherent temporal nature (i.e., the hidden state is propagated through time), training cannot be parallelized. Moreover, trained with backpropagation through time (BPTT) [31], RNNs severely suffer from the vanishing gradient problem (i.e., the backpropagated gradients easily approach zero in deep models through the chain rule, and the model can hardly learn from the backpropagated loss through gradient descent), and thus often cannot capture long-term dependencies [32]. More elaborate RNN architectures use gating mechanisms to alleviate the vanishing gradient problem, with the long short-term memory (LSTM) [33] and its simplified variant, the gated recurrent unit (GRU) [34], being the two canonical architectures commonly used in practice. Another approach, convolutional neural networks (CNNs) [96], can easily be parallelized, and recent advances effectively eliminate the vanishing gradient issue and hence help build very deep CNNs. These works include the residual network (ResNet) [97] and its variants such as the highway network [98], DenseNet [99], etc. In the area of sequential modeling, 1D convolutional networks have offered an alternative to RNNs for decades [100]. In recent years, [101] proposed WaveNet, a dilated causal convolutional network, as an autoregressive generative model. Since then, multiple research efforts have shown that with a few modifications, certain convolutional architectures achieve state-of-the-art performance in the fields of audio synthesis [101], language modeling [102], machine translation [103], action detection [104], and time series forecasting [105, 106]. In particular, [107] abandoned the gating mechanism in WaveNet and proposed the temporal convolutional network (TCN). The authors benchmarked TCN against LSTM and GRU on several sequence modeling problems, and demonstrated that TCN exhibits substantially longer memory and achieves better performance.

Learning of distributed representations has also been extensively studied [108, 109, 110], with arguably the most well-known application being Word2Vec [28, 84, 85] in language modeling. Word embedding maps words and phrases into distributed vectors in a semantic space in which words with similar meanings are closer to each other, and some interesting relations among words can be revealed, such as

$$\text{King} - \text{Man} \approx \text{Queen} - \text{Woman}$$
$$\text{Paris} - \text{France} \approx \text{Rome} - \text{Italy}$$
as shown in [84]. Motivated by Word2Vec, neural embedding methods have been extended to other domains in recent years. The authors of [111] obtained item embeddings for recommender systems through a collaborative filtering neural model, and called it

Item2Vec, which is capable of inferring relations between items even when user information is not available. Similarly, [112] proposed Med2Vec, which learns medical concepts from the sequential order and co-occurrence of the concept codes within patients' visits, and showed higher prediction accuracy in clinical applications. In [113], the authors mapped every categorical feature into an "entity embedding" space for structured data and applied the approach successfully in a Kaggle competition; they also showed that the learned embedding of geographical entities coincides with the real map surprisingly well when projected to 2D space. In the field of stock prediction, the term "Stock2Vec" has already been used before. Specifically, [114] trained a word embedding that specializes in sentiment analysis on top of the original GloVe and Word2Vec language models; using such a "Stock2Vec" embedding and a two-stream GRU model to generate the input data from financial news and stock prices, the authors predicted the price direction of the S&P 500 index. The authors of [115] proposed another "Stock2Vec", which can also be seen as a specialized Word2Vec, trained using a co-occurrence matrix whose entries are the numbers of news articles that mention both stocks. The Stock2Vec model proposed here differs from these homonymic approaches and has its own distinct characteristics. First, our Stock2Vec is an entity embedding that represents the stock entities, rather than a word embedding that denotes the stock names with language modeling. While the difference between an entity embedding and a word embedding may seem subtle, more importantly, instead of training linguistic models on the co-occurrences of words, our Stock2Vec embedding is trained directly as features through the overall predictive model, with the direct objective of minimizing prediction errors, thus illustrating the relationships among entities, whereas the others are in effect fine-tuned subsets of the original Word2Vec language model. Particularly inspiring for our work are the entity embedding [113] and the temporal convolutional network [107].

2.3 Methodology

2.3.1 Problem Formulation

We focus on predicting the future values of stock market assets given the past. More formally, our input consists of a fully observable time series signal $y_{1:T} = (y_1, \cdots, y_T)$ together with another related multivariate series $X_{1:T} = (x_1, \cdots, x_T)$, in which $x_t \in \mathbb{R}^{n-1}$ and $n$ is the total number of series in the data. We aim at generating the corresponding target series $\hat{y}_{T+1:T+h} = (\hat{y}_{T+1}, \cdots, \hat{y}_{T+h}) \in \mathbb{R}^{h}$ as the output, where $h \geq 1$ is the prediction horizon in the future. To achieve this goal, we learn a sequence modeling network with parameters $\theta$ that provides a nonlinear mapping from the input state space to the predicted series, i.e., $\hat{y}_{T+1:T+h} = f(X_{1:T}, y_{1:T} \mid \theta)$, so that the distribution of our output is as close to the distribution of the true future values as possible. That is, we wish to find $\min_\theta \mathbb{E}_{X,y} \big[ \sum_{t=T+1}^{T+h} \mathrm{KL}(y_t \,\|\, \hat{y}_t) \big]$. Here, we use the Kullback-Leibler (KL) divergence to measure the difference between the distributions of the true future values $y_{T+1:T+h}$ and the predictions $\hat{y}_{T+1:T+h}$. Note that our formulation can easily be extended to multivariate forecasting, in which the output and the corresponding input become multivariate series $\hat{y}_{T+1:T+h} \in \mathbb{R}^{k \times h}$ and $y_{1:T} \in \mathbb{R}^{k \times T}$, respectively, where $k$ is the number of forecasting variables. The related input series is then $X_{1:T} \in \mathbb{R}^{(n-k) \times T}$, and the overall objective becomes $\min_\theta \mathbb{E}_{X_{1:T}, y_{1:T}} \big[ \sum_{t=T+1}^{T+h} \sum_{i=1}^{k} \mathrm{KL}(y_{i,t} \,\|\, \hat{y}_{i,t}) \big]$.

2.3.2 A Distributional Representation of Stocks: Stock2Vec

In machine learning, categorical variables, if not ordinal, are often one-hot encoded into a sparse representation, i.e.,

$$e : x \mapsto \delta(x, c),$$

where $\delta(x, c)$ is the Kronecker delta, in which each dimension represents a possible category. Let the number of categories of $x$ be $|C|$; then $\delta(x, c)$ is a vector of length $|C|$ whose only element equal to 1 is the one for which $x = c$, with all others being zero. Although one-hot encoding provides a convenient and simple way of representing categorical variables with numeric values for computation, it has several limitations. First of all, it does not place similar categories closer to one another in vector space: within one-hot encoded vectors, all categories are orthogonal to each other and thus totally uncorrelated, i.e., the encoding cannot provide any information on similarity or dissimilarity between categories. In addition, if $|C|$ is large, one-hot encoded vectors are high-dimensional and sparse, which means that the model has to involve a large number of parameters, resulting in inefficient computation. For the cross-sectional data that we use for the stock market, the number of pairwise interactions between stocks grows quadratically with the number of symbols we consider; for example, there are approximately $\binom{500}{2} \approx 0.1$ million pairwise interactions among the S&P 500 stocks. This number grows further as we add more features to describe the stock price performance. Therefore, trading on cross-sectional signals is remarkably difficult, and approximation methods are often applied. We would like to overcome the abovementioned issues by reducing the dimensionality of the categorical variables. Common (linear) dimensionality reduction techniques include principal component analysis (PCA) and singular value decomposition (SVD), which operate by keeping the first few eigenvectors or singular vectors corresponding to the largest eigenvalues or singular values. PCA and SVD make efficient use of the statistics of the data and have been proven effective in various fields, yet they do not scale well to big matrices (e.g., the computational cost is $O(n^3)$ for an $n \times n$ matrix), and they cannot adapt to minor changes in the data. In addition, the unsupervised transformations based on PCA or SVD do not use the target variable, and hence it is possible that the derived components that serve as surrogate predictors provide no suitable relationship with the target. Moreover, since PCA and SVD utilize only the first and second moments, they rely heavily on the assumption that the original data have an approximately Gaussian distribution, which also limits their effectiveness.

Neural embedding is another approach to dimensionality reduction. Instead of computing and storing global information about a big dataset as in PCA or SVD, neural embedding learning provides a way to learn iteratively on a supervised task directly. In this paper, we present a simple probabilistic method, Stock2Vec, that learns a dense distributional representation of stocks in a relatively lower-dimensional space, and is able to capture the correlations and other more complicated relations between stock prices as well. The idea is to design a model whose parameters are the embeddings. We call a mapping $\phi : x \to \mathbb{R}^D$ a $D$-dimensional embedding of $x$, and $\phi(x)$ the embedded representation of $x$. Suppose the transformation is linear; then the embedded representation can be written as
$$z = Wx = \sum_{c} w_c \, \delta_{x,c}.$$
The linear embedding mapping is equivalent to an extra fully-connected layer of a neural network, without nonlinearity, on top of the one-hot encoded input. Each output of the extra linear layer is then given as
$$z_d = \sum_{c} w_{c,d} \, \delta_{x,c} = w_d x,$$
where $d$ stands for the index in the embedding layer, and $w_{c,d}$ is the weight connecting the one-hot encoding layer to the embedding layer. The number of dimensions $D$ of the embedding layer is a hyperparameter that can be tuned based on experimental results, usually bounded between 1 and $|C|$. For our Stock2Vec, as we will describe in Section 2.5, there are 503 different stocks, and we map them into a 50-dimensional space. The assumption behind learning a distributional representation is that series with similar or opposite movements tend to be correlated with each other, which is consistent with the assumption of the CAPM that the return of a stock is correlated with the market return, which in turn is determined by all stocks' returns in the market. We will learn the embeddings as part of the neural network for the target task of stock prediction. In order to learn the intrinsic relations among different stocks, we train the deep learning model on data of

all symbols over the market, where each datum maintains the features for its particular symbol's own properties, including the symbol itself as a categorical feature, with the target being to predict the next day's price. The training objective is to minimize the mean squared error of the predicted prices as usual.
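To make the equivalence between the one-hot encoding and the embedding layer concrete, the following is a minimal PyTorch sketch (the variable names `num_stocks` and `embed_dim` are illustrative, though the 503 symbols and the 50-dimensional space match the numbers stated above); it is not the dissertation's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 503 symbols mapped into a 50-dimensional space.
num_stocks, embed_dim = 503, 50
embedding = nn.Embedding(num_stocks, embed_dim)

# nn.Embedding behaves like a one-hot encoding followed by a bias-free linear
# layer without nonlinearity: z_d = sum_c w_{c,d} * delta(x, c).
symbol_ids = torch.tensor([0, 17, 412])          # integer-encoded stock symbols
z = embedding(symbol_ids)                        # dense vectors, shape (3, 50)

# The same result via explicit one-hot encoding and a matrix multiplication.
one_hot = torch.nn.functional.one_hot(symbol_ids, num_stocks).float()
z_equiv = one_hot @ embedding.weight             # identical rows of the weight matrix
assert torch.allclose(z, z_equiv)
```

The embedding weights are trained jointly with the rest of the network, so the dense vectors end up encoding whatever similarity structure is useful for minimizing the prediction loss.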

Figure 2.1: Model architecture of Stock2Vec. Categorical inputs pass through embedding layers and continuous inputs through normalization; each branch is followed by a dense layer with dropout, batch normalization, and ReLU, the two branches are concatenated, and a final dense layer with dropout produces the output.

2.3.3 Temporal Convolutional Network

Sequential data often display long-term correlations and can be thought of as a 1D grid with samples taken at regular time intervals. CNNs have shown success in time series applications, in which the 1D convolution is simply an operation of sliding dot products between the input vector and the kernel vector. However, we make several modifications to traditional 1D convolutions according to recent advances. The detailed building blocks of our temporal CNN components are illustrated in the following subsections.

Causal Convolutions

As we mentioned above, in a traditional 1D convolutional layer the filters are slid across the input series, so that each output is connected to inputs both before and after it. As shown in Figure 2.2(a), by applying a filter of width 2 without padding, the predicted outputs $\hat{x}_1, \cdots, \hat{x}_T$ are generated from the input series $x_1, \cdots, x_T$. The most severe problem with this structure is that we use the future to predict the past, e.g., we have used $x_2$ to generate $\hat{x}_1$, which is not appropriate in time series analysis. To avoid this issue, causal convolutions are used, in which the output at time $t$ is convolved only with input data from the previous layer that are earlier than and up to time $t$. We achieve this by explicit zero padding of length (kernel size − 1) at the beginning of the input series; as a result, the outputs are effectively shifted by a number of time steps. In this way, the prediction at time $t$ is only allowed to connect to historical information, i.e., in a causal structure, so we have prohibited the future from affecting the past and avoided information leakage. The resulting causal convolution is visualized in Figure 2.2(b).
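The left-padding trick can be expressed in a few lines of PyTorch. The sketch below is illustrative (class and variable names are assumptions, not the dissertation's code), but it shows how padding by (kernel size − 1) × dilation on the past side keeps the output length equal to the input length while preventing any look-ahead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Causal 1-D convolution: left-pad by (kernel_size - 1) * dilation so that
    the output at time t only depends on inputs up to time t."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size, dilation=dilation)

    def forward(self, x):                    # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))     # pad only on the left (past) side
        return self.conv(x)                  # same temporal length as the input

x = torch.randn(8, 1, 252)                   # e.g., one trading year of daily prices
y = CausalConv1d(1, 16, kernel_size=2)(x)
print(y.shape)                               # torch.Size([8, 16, 252])
```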

Figure 2.2: Visualization of a stack of 1D convolutional layers, (a) standard (non-causal) vs. (b) causal.

Dilated Convolutions

Time series often exhibit long-term autoregressive dependencies. With neural network models, we therefore require the receptive field of the output neuron to be large, i.e., the output neuron should be connected to neurons that receive input data from many time steps in the past. A major disadvantage of the basic causal convolution described above is that, in order to obtain a large receptive field, either very large filters are required or many layers need to be stacked. With the former, the merit of the CNN architecture is lost, and with the latter, the model can become computationally intractable. Following [101], we instead adopt dilated convolutions in our model, defined as
$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \times i},$$
where $x \in \mathbb{R}^T$ is a 1-D input series, $f : \{0, \cdots, k-1\} \to \mathbb{N}$ is a filter of size $k$, $d$ is called the dilation rate, and $(s - d \times i)$ accounts for the direction of the past. In a dilated convolutional layer, filters are not convolved with the inputs in a simple sequential manner, but instead skip a fixed number ($d$) of inputs in between. By increasing the dilation rate multiplicatively with the layer depth (e.g., a common choice is $d = 2^j$ at depth $j$), we increase the receptive field exponentially, i.e., $2^{l-1} k$ inputs of the first layer can affect the output of the $l$-th hidden layer. Figure 2.3 compares non-dilated and dilated causal convolutional layers.
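A quick back-of-the-envelope sketch of the receptive-field growth under the doubling-dilation scheme (a small illustrative helper, not part of the original text):

```python
# Receptive field of a stack of dilated causal convolutions with kernel size k
# and dilation d = 2**j at depth j: each layer adds (k - 1) * d time steps.
def receptive_field(num_layers: int, kernel_size: int = 2) -> int:
    rf = 1
    for j in range(num_layers):
        rf += (kernel_size - 1) * 2 ** j
    return rf

# With kernel size 2 and dilations 1, 2, ..., 128 (eight doublings), the output
# already sees 256 past time steps -- more than a trading year of daily data.
print(receptive_field(8, kernel_size=2))   # 256
```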

Residual Connections

In traditional neural networks, each layer feeds only into the next. In a network with residual blocks, by utilizing skip connections, a layer may also short-cut to jump over several others. The residual network (ResNet) [97] has proven very successful and has become the standard way of building deep CNNs. The core idea of ResNet is the use of a shortcut connection that skips one or more layers and connects directly to later layers (the so-called identity mapping), in addition to the standard stacked-layer connection F. Figure 2.4 illustrates a residual block, which is the basic unit of ResNet. A residual block consists of the two branches mentioned above, and its output is then g(F(x) + x), where x denotes the input to the residual block and g is the activation function. By reusing activations from a previous layer until the adjacent layer learns its weights, CNNs can effectively avoid the problem of vanishing gradients. In our model, we implemented double-layer skips.
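As a rough sketch, a double-layer residual block built from the causal convolutions above could look as follows in PyTorch; the exact ordering of dropout, batch normalization, and ReLU, as well as the dropout probability, are assumptions guided by the description later in this chapter rather than the original code.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Double-layer residual block: output g(F(x) + x) with an identity shortcut."""
    def __init__(self, channels, kernel_size=2, dilation=1, dropout=0.01):
        super().__init__()
        pad = (kernel_size - 1) * dilation
        self.stack = nn.Sequential(                     # this is F(x)
            nn.ConstantPad1d((pad, 0), 0.0),            # causal (left) padding
            nn.Conv1d(channels, channels, kernel_size, dilation=dilation),
            nn.Dropout(dropout), nn.ReLU(), nn.BatchNorm1d(channels),
            nn.ConstantPad1d((pad, 0), 0.0),
            nn.Conv1d(channels, channels, kernel_size, dilation=dilation),
            nn.Dropout(dropout), nn.BatchNorm1d(channels),
        )
        self.activation = nn.ReLU()                     # this is g

    def forward(self, x):                               # x: (batch, channels, time)
        return self.activation(self.stack(x) + x)       # skip connection adds x back
```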

Figure 2.3: Visualization of a stack of causal convolutional layers, (a) non-dilated vs. (b) dilated (with dilation rates 1, 2, 4, and 8).

2.3.4 The Hybrid Model

Our overall prediction model is constructed as a hybrid, combining the Stock2Vec embedding approach with an advanced implementation of the temporal convolutional network (TCN), schematically represented in Figure 2.5. Compared with Figure 2.1, it contains an additional TCN module. However, instead of producing a final prediction output of size 1, we let the TCN module output a vector as a feature map, since the size of the temporal module's output can be easily and naturally controlled by a convolutional layer. As a result, it adds a new source of features, which contains information extracted from the temporal series. We then concatenate the temporal output vector with the learned Stock2Vec features. Note that the TCN module can be replaced by any other architecture that learns temporal patterns, for example an LSTM-type network.

Figure 2.4: Comparison between (a) a regular block and (b) a residual block. In the latter, the convolution is short-circuited.

Finally, a series of fully-connected layers (referred to as "head layers") is applied to the combined features, producing the final prediction output. Implementation details are discussed in Section 2.5.1. Note that in each TCN block, the convolutional layers use dropout in order to limit the influence that earlier data have on learning [42, 43], followed by a batch normalization layer [44]. The most widely used activation function, the rectified linear unit (ReLU) [22], is used after each layer except for the last one.

Figure 2.5: The full model architecture of hybrid TCN-Stock2Vec. The sequence input passes through stacked TCN blocks (dilated causal convolutions with dropout, ReLU, and batch normalization) to produce a temporal feature map; the categorical inputs pass through embedding layers and the continuous inputs through normalization, each followed by a dense layer with dropout, batch normalization, and ReLU; the three feature maps are concatenated and fed through dense head layers with dropout to produce the output.

2.4 Data Specification

The case study is based on daily trading data for 485 assets listed on the S&P 500 index, downloaded from Yahoo!finance for the period of 2015/01/01–2020/02/18 (out of 505 assets listed on https://en.wikipedia.org/wiki/List_of_S%26P_500_companies, two did not have data spanning the whole period). Following the literature, we use the next day's closing price as the target label for each asset, while the adjusted closing prices up until the current date can be used as inputs. In addition, we also use as augmented features the downloaded open/high/low prices and volume data to calculate some commonly used technical indicators that reflect price variation over time. In our study, eight commonly used technical indicators are selected, which are described in Table 2.1. As we discussed in Section 2.1, it has also been shown in the literature that assets' media exposure and the corresponding text sentiment are highly correlated with stock prices. To account for this, we acquired another set of features through the Quandl API. The database "FinSentS Web News Sentiment" (https://www.quandl.com/databases/NS1/) is used in this study. The queried dataset includes the daily number of news articles about each stock, as well as a sentiment score that measures the texts used in media, based on proprietary algorithms for web scraping and natural language processing. We further extracted several date/time related variables for each entry to explicitly capture seasonality; these features include month of year, day of month, day of week, etc. All of the above-mentioned features are dynamic features that are time-dependent. In addition, we gathered a set of static features that are time-independent. Static covariates (e.g., the symbol name, sector and industry category, etc.) can assist the feature-based learner in capturing series-specific information such as the scale level and trend of each series. The distinction between dynamic and static features is important for model architecture design, since it is unnecessary to process the static covariates with RNN cells or CNN convolution operations for capturing temporal relations (e.g., autocorrelation, trend, and seasonality).

Technical Indicator | Category | Description
Moving Average Convergence Divergence (MACD) | Trend | Reveals price change in strength, direction and trend duration
Parabolic Stop And Reverse (PSAR) | Trend | Indicates whether the current trend is to continue or to reverse
Bollinger Bands (BB) | Volatility | Forms a range of prices for trading decisions
Stochastic Oscillator (SO) | Momentum | Indicates turning points by comparing the price to its range
Rate Of Change (ROC) | Momentum | Measures the percent change of the prices
On-Balance Volume (OBV) | Volume | Accumulates volume on price direction to confirm price moves
Force Index (FI) | Volume | Measures the amount of strength behind price moves

Table 2.1: Description of technical indicators used in this study.
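As a small illustration of how one of the indicators in Table 2.1 can be derived from the downloaded price data, the sketch below computes MACD with pandas. The (12, 26, 9) spans are the conventional defaults and are an assumption here, since the text does not state the parameters used.

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD trend indicator: difference of two exponential moving averages,
    plus its signal line and their difference."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow                                 # e.g., trend_macd
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()   # e.g., trend_macd_signal
    return macd_line, signal_line, macd_line - signal_line          # e.g., trend_macd_diff
```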

Note that the features can also be split into categorical and continuous. Each of the categorical features is mapped to dense numeric vectors via embedding; in particular, the vectors embedded from the stock name as a categorical feature are called Stock2Vec. We scale all continuous features (as well as the next day's price as the target) to between 0 and 1, since it is widely accepted that neural networks are hard to train and are sensitive to input scale [116, 44], while some alternative approaches, e.g., decision trees, are scale-invariant [117]. It is important to note that we performed scaling separately on each asset, i.e., a linear transformation is performed so that the lowest and highest prices of asset A over the training period become 0 and 1, respectively. Also note that the scaling statistics are obtained with the training set only, which prevents leakage of information from the test set and avoids introducing look-ahead bias.
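A minimal sketch of this per-asset scaling, assuming a long-format pandas DataFrame with a hypothetical `symbol` column (the column names are illustrative, not from the original code): statistics are fit on the training period only and then reused for the validation and test periods.

```python
import pandas as pd

def fit_minmax_per_asset(train: pd.DataFrame, cols):
    """Per-asset min/max computed on the training period only."""
    return train.groupby("symbol")[cols].agg(["min", "max"])

def apply_minmax(df: pd.DataFrame, stats: pd.DataFrame, cols):
    """Scale each asset to [0, 1] using training-period statistics,
    so no information from later periods leaks into the transform."""
    out = df.copy()
    for col in cols:
        lo = df["symbol"].map(stats[(col, "min")])
        hi = df["symbol"].map(stats[(col, "max")])
        out[col] = (df[col] - lo) / (hi - lo)
    return out

# stats = fit_minmax_per_asset(train_df, ["Adj Close"])
# train_scaled = apply_minmax(train_df, stats, ["Adj Close"])
# test_scaled  = apply_minmax(test_df,  stats, ["Adj Close"])
```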

As an illustration, Figure 2.6 shows the 20 most important features for predicting the next day's stock price, according to the XGBoost model we trained for benchmarking.

Figure 2.6: Feature importance plot of the XGBoost model. The top features by F score include Adj Close, news_volume, Dayofyear, volatility_bbm, trend_macd, trend_macd_diff, and sentiment.

In our experiments, the data are split into training, validation and test sets. The last 126 trading days of data are used as the test set, covering the period from 2019/08/16 to 2020/02/18 and comprising 61000 samples. The remaining data are used for training the model, in which the last 126 trading days, from 2019/02/15 to 2019/08/15, are used as the validation set, while the first 499336 samples, covering the period from 2015/01/02 to 2019/02/14, form the training set. Table 2.2 provides a summary of the datasets used in this research.

Table 2.2: Dataset summary.

 | Training set | Validation set | Test set
Starting date | 2015/01/02 | 2019/02/15 | 2019/08/16
End date | 2019/02/14 | 2019/08/15 | 2020/02/18
Sample size | 499336 | 61075 | 61000

2.5 Experimental Results and Discussions

2.5.1 Benchmark Models, Hyperparameters and Optimization Strategy

In the computational experiments below, we compare the performance of seven models. Two models are based on time series analysis only (TS-TCN and TS-LSTM), two use static features only (random forest [18] and XGBoost [17]), one is the pure Stock2Vec model, and finally there are two versions of the proposed hybrid model (LSTM-Stock2Vec and TCN-Stock2Vec). This way we can evaluate the effect of different model architectures and data features. Specifically, we are interested in evaluating whether employing feature embedding leads to improvement (Stock2Vec vs. random forest and XGBoost) and whether a further improvement can be achieved by incorporating time-series data in the hybrid models. Random forest and XGBoost are ensemble models that deploy enhanced bagging and gradient boosting, respectively. We pick these two models since both have shown powerful predictive ability and achieved state-of-the-art performance in various fields. Both are tree-based models that are invariant to scale and perform splits on one-hot encoded categorical inputs, which makes them suitable for comparison with the embeddings in our Stock2Vec models. We built 100 bagging/boosting trees for these two models. The LSTM and TCN models are constructed on pure time series data, i.e., the inputs and outputs are single series, without any other features as augmented series. In what follows, we call these two models TS-LSTM and TS-TCN, respectively. The Stock2Vec model is a fully-connected neural network with embedding layers for all categorical features; it has exactly the same inputs as XGBoost and random forest. As we introduced in Section 2.3.4, our hybrid model combines the Stock2Vec model with an extra TCN module to learn the temporal effects. For comparison purposes, we also evaluated the hybrid model with LSTM as the temporal module. We call these TCN-Stock2Vec and LSTM-Stock2Vec, respectively. Our deep learning models are implemented in PyTorch [118].

In Stock2Vec, the embedding sizes are set to half of the original number of categories, thresholded at 50 (i.e., the maximum dimension of the embedding output is 50). These are just heuristics, as there is no common standard for choosing the embedding sizes. We concatenate the continuous input with the outputs from the embedding layers, followed by two fully-connected layers, with sizes of 1024 and 512, respectively. The dropout rates are set to 0.001 and 0.01 for the two hidden layers, respectively. For the RNN module, we implement a two-layer stacked LSTM, i.e., in each LSTM cell (denoting a single time step) there are two LSTM layers sequentially connected, and each layer consists of 50 hidden units. We need an extra fully-connected layer to control the output size of the temporal module, depending on whether we obtain the final prediction as in TS-LSTM (with output size 1) or a temporal feature map as in LSTM-Stock2Vec. We set the size of the temporal feature map to 30 in order to compress the information for both LSTM-Stock2Vec and TCN-Stock2Vec. In TCN, we use another convolutional layer to achieve the same effect. To implement the TCN module, we build a 16-layer dilated causal CNN as the component that focuses on capturing the autoregressive temporal relations from the series' own history. Each layer contains 16 filters, and each filter has a width of 2. Every two consecutive convolutional layers form a residual block, after which the previous inputs are added to the flow. The dilation rate increases exponentially along the stacked residual blocks, i.e., 1, 2, 4, 8, ..., 128, which allows our TCN component to capture autoregressive relations over more than half a year (there are 252 trading days in a year). Again, dropout (with probability 0.01), a batch normalization layer and ReLU activation are used in each TCN block. The MSE loss is used for all models. The deep learning models were trained using stochastic gradient descent (SGD) with a batch size of 128. In particular, the Adam optimizer [119] with an initial learning rate of $10^{-4}$ was used to train TS-TCN and TS-LSTM. To train Stock2Vec, we deployed the super-convergence scheme of [120] and used a cyclical learning rate over every 3 epochs, with a maximum value of $10^{-3}$. In the two hybrid models, while the weights of the head layers were randomly initialized as usual, we loaded the weights from the pre-trained Stock2Vec and TS-TCN/TS-LSTM for the corresponding modules. By doing this, we applied a transfer learning scheme [121, 122, 123], expecting the transferred modules to be able to effectively process features from the beginning. The head layers were trained for 2 cycles (each containing 2 epochs) with a maximum learning rate of $3 \times 10^{-4}$ while the transferred modules were frozen. After convergence, the entire network was fine-tuned for 10 epochs by a standard Adam optimizer with a learning rate of $10^{-5}$, during which an early stopping paradigm [41] was applied to retrieve the model with the smallest validation error. We selected the hyperparameters based upon the model performance on the validation set.
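The two-stage "freeze, then fine-tune" scheme can be sketched as follows. This is a rough illustration: it assumes the hybrid model exposes `tcn` and `embeddings` attributes as in the earlier sketch, omits the cyclical learning-rate schedule and early stopping, and shows a single pass over each loader; only the learning rates ($3 \times 10^{-4}$ and $10^{-5}$) are taken from the text.

```python
import torch

def train_hybrid(model, head_loader, full_loader, loss_fn=torch.nn.MSELoss()):
    # Stage 1: freeze the transferred Stock2Vec and TCN modules, train the head only.
    for module in (model.tcn, model.embeddings):
        for p in module.parameters():
            p.requires_grad = False
    head_opt = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=3e-4)
    for x_seq, x_cat, x_cont, y in head_loader:
        head_opt.zero_grad()
        loss_fn(model(x_seq, x_cat, x_cont), y).backward()
        head_opt.step()

    # Stage 2: unfreeze everything and fine-tune the whole network at a smaller rate.
    for p in model.parameters():
        p.requires_grad = True
    fine_opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for x_seq, x_cat, x_cont, y in full_loader:
        fine_opt.zero_grad()
        loss_fn(model(x_seq, x_cat, x_cont), y).backward()
        fine_opt.step()
```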

2.5.2 Performance Evaluation Metrics

To evaluate the performance of our forecasting models, four commonly used evaluation criteria are used in this study: (a) the root mean square error (RMSE), (b) the mean absolute error (MAE), (c) the mean absolute percentage error (MAPE), and (d) the root mean square percentage error (RMSPE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{H}\sum_{t=1}^{H}\left(y_t - \hat{y}_t\right)^2} \qquad (2.1)$$
$$\mathrm{MAE} = \frac{1}{H}\sum_{t=1}^{H}\left|y_t - \hat{y}_t\right| \qquad (2.2)$$
$$\mathrm{MAPE} = \frac{1}{H}\sum_{t=1}^{H}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \times 100 \qquad (2.3)$$
$$\mathrm{RMSPE} = \sqrt{\frac{1}{H}\sum_{t=1}^{H}\left(\frac{y_t - \hat{y}_t}{y_t}\right)^2} \times 100 \qquad (2.4)$$

where $y_t$ is the actual target value for the $t$-th observation, $\hat{y}_t$ is the predicted value for the corresponding target, and $H$ is the forecast horizon.

The RMSE is the most popular measure of the error rate of regression models; as $H \to \infty$, it converges to the standard deviation of the theoretical prediction error. However, the quadratic error may not be an appropriate evaluation criterion for all prediction problems, especially in the presence of large outliers. In addition, the RMSE depends on scale and is sensitive to outliers. The MAE considers the absolute deviation as the loss and is a more "robust" measure for prediction, since the absolute error is more sensitive to small deviations and much less sensitive to large ones than the squared error. However, since the training process of many learning models is based on a squared loss function, the MAE can be (logically) inconsistent with the model optimization criterion. The MAE is also scale-dependent, and thus not suitable for comparing prediction accuracy across different variables or time ranges. In order to achieve scale independence, the MAPE measures the error proportionally to the target value, while the RMSPE can be seen as the root mean squared version of the MAPE. The MAPE and RMSPE, however, are extremely unstable when the actual value is small (consider the case when the denominator is zero or close to zero). We will consider all four measures to obtain a more complete view of the models' performance given the limitations of each. In addition, we will compare the running time as an additional evaluation criterion.
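For concreteness, a small helper that computes Eqs. (2.1)-(2.4) over a forecast horizon (illustrative code only):

```python
import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Return RMSE, MAE, MAPE (%), and RMSPE (%) as in Eqs. (2.1)-(2.4)."""
    err = y_true - y_pred
    pct = err / y_true          # note: unstable when y_true is near zero
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(pct)) * 100),
        "RMSPE": float(np.sqrt(np.mean(pct ** 2)) * 100),
    }

# Example usage on a toy horizon of two days:
# forecast_metrics(np.array([100.0, 102.0]), np.array([101.0, 101.0]))
```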

2.5.3 Stock2Vec: Analysis of Embeddings

As we introduced in Section 2.3, the main goal of training the Stock2Vec model is to learn the intrinsic relationships among stocks, where similar stocks are close to each other in the embedding space, so that we can exploit the interactions in the cross-sectional data, or more specifically the market information, to make better predictions. To show that this is the case, we extract the weights of the embedding layers from the trained Stock2Vec model, map the weights down to a two-dimensional space using PCA, and visualize the entities to see what the embedding spaces look like. Note that besides Stock2Vec, we also learned embeddings for the other categorical features.
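A minimal sketch of this extraction-and-projection step is shown below; here a freshly initialized embedding layer stands in for the trained one, and the shapes (503 symbols, 50 dimensions) follow the numbers stated earlier.

```python
import torch.nn as nn
from sklearn.decomposition import PCA

# Placeholder for the trained Stock2Vec layer (in practice, taken from the model).
stock_embedding = nn.Embedding(503, 50)
stock_vectors = stock_embedding.weight.detach().cpu().numpy()   # shape (503, 50)

pca = PCA(n_components=2)
coords = pca.fit_transform(stock_vectors)                       # 2-D coordinates to plot
print(pca.explained_variance_ratio_.cumsum())                   # variance kept by the plot
```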

Figure 2.7(a) shows the first two principal components of the sector embeddings. Note that here the first two components account for close to 75% of the variance. We can generally observe that Health Care, Technology/Consumer Services and Finance occupy opposite corners of the plot, i.e., they represent unique sectors most dissimilar from one another. On the other hand, a collection of more traditional sectors (Public Utilities, Energy, Consumer Durables and Non-Durables, Basic Industries) are generally grouped closer together. The plot, then, allows for a natural interpretation which is in accordance with our intuition, indicating that the learned embedding can be expected to be reasonable.

Figure 2.7: PCA on the learned embeddings for sectors: (a) visualization of the learned sector embeddings, projected to 2-D space using PCA; (b) the cumulative explained variance ratio of the principal components.

Figure 2.8: PCA on the learned Stock2Vec embeddings: (a) visualization of Stock2Vec (colored by sector), projected to 2-D space using PCA; (b) the cumulative explained variance ratio of the principal components.

Similarly, from the trained Stock2Vec embeddings we can obtain a 50-dimensional vector for each individual stock. We visualize the learned Stock2Vec with PCA in Figure 2.8(a), and color each stock by the sector it belongs to. It is important to note that in this case the first two principal components account for less than 40% of the variance. In other words, the plotted groupings do not represent the learned information as well as in the previous case. Indeed, when viewed all together, individual assets do not exhibit readily discernible patterns. This is not necessarily an indicator of a deficiency of the learned embedding; rather, it suggests that two dimensions are not sufficient in this case. Nevertheless, useful insight can still be gained from the distributed representations; for instance, we can examine the similarities between stocks in the learned vector space, as we show below. To reveal some additional insight from the similarity distances, we sort the pairwise cosine distances (in the embedded space) between the stocks in ascending order. In Figure 2.9a, we plot the ticker "NVDA" (Nvidia) as well as its six nearest neighbors in the embedding space. The six companies that are closest to Nvidia, according to the learned embedding weights, are either of the same type as Nvidia (technology companies): Facebook, Akamai, Cognizant Tech Solutions, Charter; or fast growing during the past ten years (as was the case for Nvidia during the tested period): Monster, Discover Bank. Similarly, we plot the ticker of Wells Fargo ("WFC") and its six nearest neighbors in Figure 2.9b, all of which are either banks or companies that provide other financial services. These observations are yet another indicator that Stock2Vec can be expected to learn some useful information, and indeed is capable of coupling together insights from a number of unrelated sources.
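The nearest-neighbor query itself is a straightforward cosine-distance ranking in the embedding space; a small illustrative sketch (with hypothetical `stock_vectors` and `tickers` inputs) is given below.

```python
import numpy as np

def nearest_neighbors(stock_vectors: np.ndarray, tickers: list, query: str, k: int = 6):
    """Rank stocks by cosine distance to `query` in the embedding space.
    `stock_vectors[i]` is assumed to be the Stock2Vec vector of tickers[i]."""
    unit = stock_vectors / np.linalg.norm(stock_vectors, axis=1, keepdims=True)
    q = unit[tickers.index(query)]
    cosine_distance = 1.0 - unit @ q            # smaller means more similar
    order = np.argsort(cosine_distance)
    return [tickers[i] for i in order if tickers[i] != query][:k]

# e.g., nearest_neighbors(stock_vectors, tickers, "NVDA") -> its six closest symbols
```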

Figure 2.9: Nearest neighbors of Stock2Vec based on similarity between stocks. (a) Nearest neighbors of NVDA: 1) 'MNST', Monster, fast growing; 2) 'FB', Facebook, IT; 3) 'DFS', Discover Bank, fast growing; 4) 'AKAM', Akamai, IT; 5) 'CTSH', Cognizant Tech Solutions, IT; 6) 'CHTR', Charter, communication services. (b) Nearest neighbors of WFC: 1) 'ETFC', E-Trade, financial; 2) 'STT', State Street Corp., bank; 3) 'CMA', Comerica, bank; 4) 'AXP', Amex, financial; 5) 'PGR', Progressive Insurance, financial; 6) 'SPGI', S&P Global, Inc., financial & data.

The following points must be noted here. First, most of the nearest neighbors are not the closest points in the two-dimensional plots, due to the imprecision of the mapping into two dimensions. Secondly, although the nearest neighbors are meaningful for many companies, as the results either are in the same sector (or industry) or exhibit a similar stock price trend over the last few years, this insight does not hold for all companies, or the interpretation can be hard to discern. For example, the nearest neighbors of Amazon.com (AMZN) include transportation and energy companies (perhaps due to its heavy reliance on these industries for its operation) as well as technology companies. Finally, note that there exist many other visualization techniques for projecting high-dimensional vectors onto a 2D space that could be used here instead of PCA, for example t-SNE [124] or UMAP [125]. However, neither provided a visual improvement of the grouping effect over Figure 2.8(a), and hence we do not present those results here. Based on the above observations, Stock2Vec provides several benefits: 1) it reduces the dimensionality of the categorical feature space, so the computational performance is improved with a smaller number of parameters; 2) it maps the sparse high-dimensional one-hot encoded vectors onto a dense distributional vector space (with lower dimensionality), and as a result similar categories are learned to be placed closer to one another in the embedding space, unlike in

the one-hot encoded vector space, where every pair of categories yields the same distance and all categories are orthogonal to each other. Therefore, the outputs of the embedding layers can serve as more meaningful features for the later layers of the neural network, enabling more effective learning. Moreover, the meaningful embeddings can be used for visualization, providing more interpretability of the deep learning models.

2.5.4 Prediction Results

Table 2.3 and Figure 2.10 report the overall average (over the individual assets) forecasting performance for the out-of-sample period from 2019-08-16 to 2020-02-14. We observe that TS-LSTM and TS-TCN perform worst; we conclude that this is because these two models only consider the target series and ignore all other features. TCN outperforms LSTM, probably because it is capable of extracting temporal patterns over a long history more effectively, without the vanishing gradient problem. Moreover, the training speed of our 18-layer TCN is about five times faster than that of LSTM per iteration (i.e., per batch) on a GPU, and the overall training speed (with all overhead included) is also around two to three times faster. By learning from all the features, the random forest and XGBoost models perform better than the purely time-series-based TS-LSTM and TS-TCN, with the XGBoost predictions slightly better than those from random forest. This demonstrates the usefulness of our data sources, especially the external information combined into the inputs. We can then observe that despite having the same input as random forest and XGBoost, our proposed Stock2Vec model further improves the accuracy of the predictions, as the RMSE, MAE, MAPE and RMSPE decrease by about 36%, 38%, 41% and 43% over the XGBoost predictions, respectively. This indicates that the use of deep learning models, in particular the Stock2Vec embedding, improves the predictions by learning from the features more effectively than the tree-based ensemble models. With the integration of temporal modules, there is again a significant improvement in prediction accuracy. The two hybrid models, LSTM-Stock2Vec and TCN-Stock2Vec, not only learn from the features we give explicitly, but also employ either a hidden state or a convolutional temporal feature map to implicitly learn relevant information from historical data. Our TCN-Stock2Vec achieves the best performance across all models, as the RMSE and MAE decrease by about 25%, while the MAPE decreases by 20% and the RMSPE decreases by 14%, compared with Stock2Vec without the temporal module.

Table 2.3: Average performance comparison.

 | RMSE | MAE | MAPE (%) | RMSPE (%)
TS-LSTM | 6.35 | 2.36 | 1.62 | 2.07
TS-TCN | 5.79 | 2.15 | 1.50 | 1.96
Random Forest | 4.86 | 1.67 | 1.31 | 1.92
XGBoost | 4.57 | 1.66 | 1.28 | 1.83
Stock2Vec | 2.94 | 1.04 | 0.76 | 1.05
LSTM-Stock2Vec | 2.57 | 0.85 | 0.68 | 1.04
TCN-Stock2Vec | 2.22 | 0.78 | 0.61 | 0.90

Figure 2.10 shows the boxplots of the prediction errors of the different approaches, from which we can see that our proposed models achieve smaller absolute prediction errors in terms of not only the mean but also the variance, which indicates more robust forecasts. The median absolute prediction error (and the interquartile range, IQR) of our TS-TCN model is around 1.01 (1.86), while they are around 0.74 (1.39), 0.45 (0.87), and 0.36 (0.66) for XGBoost, Stock2Vec and TCN-Stock2Vec, respectively.


Figure 2.10: Boxplot comparison of the absolute prediction errors.

Similarly, we aggregate the metrics at the sector level and calculate the average performance within each sector. We report the RMSE, MAE, MAPE, and RMSPE in Tables 2.4, 2.5, 2.6, and 2.7, respectively, from which we again see that our Stock2Vec performs better than the two tree-ensemble models for all sectors, and that adding the temporal module further improves the forecasting accuracy. TCN-Stock2Vec achieves the best RMSE, MAE, MAPE and RMSPE in all sectors with one exception. Better performance at different aggregation levels demonstrates the power of our proposed models. We further showcase the predicted results for 20 symbols to gauge the forecasting performance of our model under a wide range of industries, volatilities, growth patterns and other general conditions. The stocks have been chosen to evaluate how the proposed methodologies perform under different circumstances. For instance, Amazon's (AMZN) stock was consistently increasing in price across the analysis period, while the stock price of Verizon (VZ) was very stable, and Chevron's stock (CVX) had both periods of growth and decline. In addition, these 20 stocks cover several industries: (a) retail (e.g., Walmart), (b) restaurants (e.g., McDonald's), (c) finance and banks (e.g., JPMorgan Chase and Goldman Sachs), (d) energy and oil & gas (e.g., Chevron), (e) technology (e.g., Facebook), (f) communications (e.g., Verizon), etc. Tables 2.8, 2.9, 2.10, and 2.11 show the out-of-sample RMSE, MAE, MAPE and RMSPE, respectively, for the predictions given by the five models discussed above. Again, Stock2Vec generally performs better than random forest and XGBoost, and the two hybrid models have quite similar performance that is significantly better than that of the others. While there also exist a few stocks for which LSTM-Stock2Vec, or even Stock2Vec without the temporal module, produces the most accurate predictions, for most of the stocks the TCN-Stock2Vec model performs best. This demonstrates that our models generalize well to most symbols. Furthermore, we plot the prediction patterns of the competing models for the abovementioned stocks on the test set in Appendix 2.C, compared to the actual daily prices. We observe that the random forest and XGBoost models predict the up-and-downs with a lag most of the time, as the current price plays too large a role as a predictor, probably mainly because of the scaling. There also occasionally exist flat predictions over a period for some stocks (see approximately 2019/09 in Figure 2.15, 2020/01 in Figure 2.18, and 2019/12 in Figure 2.30), which is a typical effect of tree-based methods and indicates insufficient splitting and underfitting despite the large number of ensemble trees used. With entity embeddings, our Stock2Vec model can learn from the features much more effectively, and its predictions coincide with the actual up-and-downs much more accurately. Although it overestimates the volatility by exaggerating the amplitude as well as the frequency of the oscillations, the overall prediction errors are smaller than those of the two tree-ensemble models. Our LSTM-Stock2Vec and TCN-Stock2Vec models further benefit from the temporal learning modules by automatically capturing the historical characteristics of the time series data, especially the nonlinear trend and complex seasonality that are difficult to capture with hand-engineered features such as technical indicators, as well as the common temporal factors that are shared among all series across the whole market. As a result, with the ability to extract long-term autoregressive dependencies both within and across series from historical data, the predictions from these two models avoid wild oscillations and are much closer to the actual prices, while still correctly predicting the up-and-downs most of the time thanks to effective learning from the input features.

2.6 Concluding Remarks and Future Work

Our argument that Alphas and Betas can be implicitly learned from cross-sectional data from a CAPM perspective is novel; however, it is more of an insight than a systematic analysis. In this paper, we built a global hybrid deep learning model to forecast S&P stock prices. We applied state-of-the-art 1-D dilated causal convolutional layers (TCN) to extract temporal features from the historical information, which helps us refine the learning of the Alphas. In order to integrate the Beta information into the model, we train a single model that learns from data over the whole market, and applied entity embeddings to the categorical features; in particular, we obtained Stock2Vec, which reveals the relationships among stocks in the market, so that from this point of view our model can be seen as a supervised dimension reduction method. The experimental results show that our models improve the forecasting performance. Although not demonstrated in this work, learning a global model from data over the entire market brings an additional benefit: it can handle the cold-start problem, in which some series contain very little data (i.e., many missing values). Our model has the ability to infer the historical information from the structure learned from other series as well as from the correlation between the cold-start series and the market. The result might not be accurate, but it is much more informative than what could be learned from the little data in the single series. There are several other directions in which we can dive deeper in future work. First of all, stock prices are heavily affected by external information; combining extensive crowd-sourcing, social media and financial news data may facilitate a better understanding of collective human behavior in the market, which could help effective decision making for investors. These data can be obtained from the internet, and we could expand the data sources and incorporate their influence in the model as extra features. In addition, although we have shown that convolutional layers have several advantages over the most widely used recurrent neural network layers for time series, the temporal learning layers in our model could be replaced by any other type; for instance, the recent advances in attention models could be a good candidate. Also, more sophisticated models can be adopted to build Stock2Vec, keeping in mind the goal of learning the implicit intrinsic relationships between stock series. In addition, learning the relationships over the market would help us build portfolios aimed at maximizing the investment gain, e.g., by using standard Markowitz portfolio optimization to find the positions. In that case, simulation of trading in the market should provide a more realistic and robust performance evaluation than the aggregated levels we reported above. Liquidity and market impacts can be taken into account in the simulation, and we can use Profit & Loss (P&L) and the Sharpe ratio as the evaluation metrics.

2.A Sector Level Performance Comparison

Table 2.4: Sector level RMSE comparison

Sector | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
Basic Industries | 1.70 | 1.61 | 1.06 | 0.85 | 0.76
Capital Goods | 11.46 | 10.30 | 6.25 | 6.01 | 5.10
Consumer Durables | 1.78 | 1.67 | 0.99 | 0.93 | 0.83
Consumer Non-Durables | 1.57 | 1.55 | 0.98 | 0.87 | 0.75
Consumer Services | 4.75 | 4.69 | 3.34 | 2.76 | 2.30
Energy | 1.50 | 1.44 | 0.76 | 0.76 | 0.67
Finance | 2.08 | 2.06 | 1.39 | 1.05 | 1.00
HealthCare | 3.44 | 3.37 | 1.95 | 1.98 | 1.60
Miscellaneous | 8.23 | 7.96 | 5.22 | 4.14 | 3.73
Public Utilities | 0.94 | 0.95 | 0.64 | 0.52 | 0.52
Technology | 4.20 | 4.23 | 2.91 | 1.90 | 1.94
Transportation | 2.00 | 1.90 | 1.15 | 1.03 | 0.88

Table 2.5: Sector level MAE comparison

Sector | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
Basic Industries | 1.06 | 1.03 | 0.64 | 0.52 | 0.49
Capital Goods | 3.13 | 3.07 | 1.93 | 1.57 | 1.47
Consumer Durables | 1.21 | 1.18 | 0.71 | 0.63 | 0.57
Consumer Non-Durables | 0.96 | 0.93 | 0.57 | 0.52 | 0.45
Consumer Services | 1.83 | 1.84 | 1.19 | 0.98 | 0.88
Energy | 0.98 | 0.95 | 0.50 | 0.51 | 0.45
Finance | 1.19 | 1.17 | 0.79 | 0.55 | 0.54
HealthCare | 1.99 | 1.96 | 1.15 | 1.10 | 0.92
Miscellaneous | 3.18 | 3.18 | 2.08 | 1.56 | 1.44
Public Utilities | 0.63 | 0.64 | 0.44 | 0.33 | 0.34
Technology | 1.95 | 1.98 | 1.26 | 0.92 | 0.91
Transportation | 1.26 | 1.23 | 0.74 | 0.65 | 0.56

Table 2.6: Sector level MAPE (%) comparison

Sector | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
Basic Industries | 1.34 | 1.31 | 0.74 | 0.65 | 0.61
Capital Goods | 1.21 | 1.24 | 0.76 | 0.59 | 0.56
Consumer Durables | 1.30 | 1.26 | 0.73 | 0.68 | 0.60
Consumer Non-Durables | 1.48 | 1.32 | 0.76 | 0.85 | 0.65
Consumer Services | 1.24 | 1.23 | 0.71 | 0.66 | 0.59
Energy | 2.04 | 1.88 | 0.97 | 1.08 | 0.92
Finance | 1.18 | 1.16 | 0.74 | 0.53 | 0.53
HealthCare | 1.43 | 1.35 | 0.79 | 0.79 | 0.65
Miscellaneous | 1.23 | 1.23 | 0.81 | 0.66 | 0.60
Public Utilities | 0.88 | 0.90 | 0.57 | 0.49 | 0.47
Technology | 1.44 | 1.43 | 0.83 | 0.68 | 0.66
Transportation | 1.26 | 1.23 | 0.71 | 0.66 | 0.57

Table 2.7: Sector level RMSPE (%) comparison

Sector | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
Basic Industries | 1.86 | 1.80 | 0.98 | 0.91 | 0.83
Capital Goods | 1.63 | 1.65 | 1.01 | 0.83 | 0.75
Consumer Durables | 1.79 | 1.68 | 0.96 | 0.95 | 0.81
Consumer Non-Durables | 2.41 | 2.02 | 1.13 | 1.37 | 1.01
Consumer Services | 1.88 | 1.82 | 0.99 | 1.07 | 0.91
Energy | 2.89 | 2.66 | 1.29 | 1.51 | 1.25
Finance | 1.60 | 1.56 | 1.00 | 0.78 | 0.72
HealthCare | 2.17 | 2.00 | 1.15 | 1.24 | 0.99
Miscellaneous | 1.66 | 1.63 | 1.05 | 0.95 | 0.81
Public Utilities | 1.25 | 1.23 | 0.74 | 0.71 | 0.64
Technology | 2.09 | 2.00 | 1.13 | 1.04 | 0.95
Transportation | 1.76 | 1.68 | 0.98 | 0.95 | 0.80

2.B Performance comparison of different models for the one-day ahead forecasting on different symbols

Table 2.8: RMSE comparison of different models for the one-day ahead forecasting on different symbols

Symbol | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
AAPL (Apple) | 4.71 | 4.52 | 2.86 | 2.16 | 1.81
AFL (Aflac) | 0.59 | 0.62 | 0.46 | 0.31 | 0.27
AMZN (Amazon.com) | 29.91 | 28.47 | 23.80 | 17.73 | 14.45
BA (Boeing) | 6.00 | 6.44 | 3.98 | 3.83 | 3.49
CVX (Chevron) | 1.42 | 1.62 | 1.03 | 0.75 | 0.65
DAL (Delta Air Lines) | 0.79 | 0.77 | 0.48 | 0.40 | 0.32
DIS (Walt Disney) | 1.95 | 1.91 | 1.17 | 1.10 | 0.92
FB (Facebook) | 3.51 | 5.54 | 2.15 | 1.72 | 1.44
GE (General Electric) | 0.39 | 0.30 | 0.14 | 0.29 | 0.18
GM (General Motors) | 0.58 | 0.57 | 0.30 | 0.30 | 0.28
GS (Goldman Sachs Group) | 3.11 | 3.00 | 1.86 | 1.27 | 1.31
JNJ (Johnson & Johnson) | 1.80 | 1.49 | 1.00 | 0.93 | 0.80
JPM (JPMorgan Chase) | 1.72 | 1.63 | 1.59 | 0.66 | 0.68
MAR (Marriott Int'l) | 2.02 | 2.02 | 1.52 | 0.89 | 1.07
KO (Coca-Cola) | 0.49 | 0.50 | 0.32 | 0.26 | 0.25
MCD (McDonald's) | 2.67 | 2.50 | 1.51 | 1.26 | 1.16
NKE (Nike) | 1.27 | 1.23 | 1.01 | 0.61 | 0.62
PG (Procter & Gamble) | 1.43 | 1.35 | 0.91 | 0.70 | 0.61
VZ (Verizon Communications) | 0.54 | 0.55 | 0.46 | 0.29 | 0.26
WMT (Walmart) | 1.34 | 1.43 | 1.06 | 0.55 | 0.50

Table 2.9: MAE comparison of different models for the one-day ahead forecasting on different symbols

Symbol | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
AAPL (Apple) | 3.63 | 3.56 | 2.15 | 1.72 | 1.42
AFL (Aflac) | 0.45 | 0.44 | 0.35 | 0.20 | 0.21
AMZN (Amazon.com) | 22.19 | 21.36 | 17.87 | 11.53 | 10.29
BA (Boeing) | 4.59 | 5.10 | 2.87 | 2.87 | 2.74
CVX (Chevron) | 1.07 | 1.22 | 0.75 | 0.57 | 0.50
DAL (Delta Air Lines) | 0.59 | 0.58 | 0.36 | 0.29 | 0.24
DIS (Walt Disney) | 1.37 | 1.40 | 0.87 | 0.77 | 0.67
FB (Facebook) | 2.54 | 3.80 | 1.65 | 1.16 | 1.06
GE (General Electric) | 0.30 | 0.22 | 0.11 | 0.25 | 0.15
GM (General Motors) | 0.44 | 0.44 | 0.23 | 0.23 | 0.22
GS (Goldman Sachs Group) | 2.48 | 2.37 | 1.31 | 1.01 | 1.05
JNJ (Johnson & Johnson) | 1.21 | 1.04 | 0.72 | 0.64 | 0.59
JPM (JPMorgan Chase) | 1.34 | 1.23 | 1.17 | 0.51 | 0.52
MAR (Marriott Int'l) | 1.63 | 1.66 | 1.13 | 0.65 | 0.87
KO (Coca-Cola) | 0.39 | 0.37 | 0.25 | 0.19 | 0.19
MCD (McDonald's) | 1.99 | 1.96 | 1.26 | 0.89 | 0.89
NKE (Nike) | 0.97 | 0.98 | 0.77 | 0.46 | 0.49
PG (Procter & Gamble) | 1.14 | 1.03 | 0.70 | 0.52 | 0.48
VZ (Verizon Communications) | 0.43 | 0.42 | 0.36 | 0.22 | 0.20
WMT (Walmart) | 1.02 | 1.10 | 0.87 | 0.41 | 0.41

Table 2.10: MAPE (%) comparison of different models for the one-day ahead forecasting on different symbols

Symbol | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
AAPL (Apple) | 1.43 | 1.39 | 0.80 | 0.68 | 0.54
AFL (Aflac) | 0.88 | 0.86 | 0.66 | 0.39 | 0.39
AMZN (Amazon.com) | 1.21 | 1.17 | 0.97 | 0.63 | 0.56
BA (Boeing) | 1.33 | 1.47 | 0.82 | 0.83 | 0.80
CVX (Chevron) | 0.94 | 1.06 | 0.65 | 0.50 | 0.43
DAL (Delta Air Lines) | 1.03 | 1.02 | 0.63 | 0.51 | 0.43
DIS (Walt Disney) | 0.99 | 1.01 | 0.61 | 0.55 | 0.48
FB (Facebook) | 1.29 | 1.92 | 0.82 | 0.59 | 0.54
GE (General Electric) | 2.99 | 2.13 | 1.10 | 2.53 | 1.44
GM (General Motors) | 1.22 | 1.23 | 0.63 | 0.63 | 0.61
GS (Goldman Sachs Group) | 1.14 | 1.09 | 0.59 | 0.46 | 0.48
JNJ (Johnson & Johnson) | 0.90 | 0.77 | 0.51 | 0.47 | 0.43
JPM (JPMorgan Chase) | 1.08 | 1.00 | 0.90 | 0.40 | 0.42
MAR (Marriott Int'l) | 1.21 | 1.23 | 0.81 | 0.48 | 0.63
KO (Coca-Cola) | 0.72 | 0.68 | 0.45 | 0.35 | 0.35
MCD (McDonald's) | 0.98 | 0.96 | 0.61 | 0.44 | 0.44
NKE (Nike) | 1.05 | 1.06 | 0.80 | 0.49 | 0.53
PG (Procter & Gamble) | 0.94 | 0.85 | 0.57 | 0.43 | 0.40
VZ (Verizon Communications) | 0.73 | 0.71 | 0.60 | 0.37 | 0.34
WMT (Walmart) | 0.88 | 0.94 | 0.73 | 0.35 | 0.35

Table 2.11: RMSPE (%) comparison of different models for the one-day ahead forecasting on different symbols

Symbol | Random Forest | XGBoost | Stock2Vec | LSTM-Stock2Vec | TCN-Stock2Vec
AAPL (Apple) | 1.89 | 1.76 | 1.04 | 0.85 | 0.68
AFL (Aflac) | 1.15 | 1.19 | 0.87 | 0.60 | 0.53
AMZN (Amazon.com) | 1.60 | 1.55 | 1.28 | 0.95 | 0.78
BA (Boeing) | 1.74 | 1.85 | 1.13 | 1.11 | 1.02
CVX (Chevron) | 1.25 | 1.42 | 0.88 | 0.65 | 0.57
DAL (Delta Air Lines) | 1.39 | 1.36 | 0.83 | 0.71 | 0.57
DIS (Walt Disney) | 1.41 | 1.38 | 0.81 | 0.79 | 0.66
FB (Facebook) | 1.77 | 2.75 | 1.06 | 0.85 | 0.73
GE (General Electric) | 3.96 | 2.89 | 1.35 | 2.91 | 1.72
GM (General Motors) | 1.62 | 1.60 | 0.82 | 0.84 | 0.77
GS (Goldman Sachs Group) | 1.44 | 1.39 | 0.84 | 0.58 | 0.61
JNJ (Johnson & Johnson) | 1.33 | 1.11 | 0.72 | 0.70 | 0.60
JPM (JPMorgan Chase) | 1.40 | 1.33 | 1.20 | 0.53 | 0.54
MAR (Marriott Int'l) | 1.49 | 1.50 | 1.07 | 0.66 | 0.78
KO (Coca-Cola) | 0.90 | 0.92 | 0.57 | 0.47 | 0.45
MCD (McDonald's) | 1.30 | 1.22 | 0.73 | 0.62 | 0.57
NKE (Nike) | 1.38 | 1.34 | 1.03 | 0.65 | 0.67
PG (Procter & Gamble) | 1.19 | 1.11 | 0.73 | 0.58 | 0.50
VZ (Verizon Communications) | 0.93 | 0.93 | 0.76 | 0.49 | 0.45
WMT (Walmart) | 1.15 | 1.23 | 0.89 | 0.47 | 0.43

2.C Plots of the actual versus predicted prices of different models on the test data

Figure 2.11: Predicted vs. actual daily prices for AAPL over the test period, 2019/08/16-2020/02/14.

Figure 2.12: Predicted vs. actual daily prices for AFL over the test period, 2019/08/16-2020/02/14.

Figure 2.13: Predicted vs. actual daily prices for AMZN over the test period, 2019/08/16-2020/02/14.

Figure 2.14: Predicted vs. actual daily prices for BA over the test period, 2019/08/16-2020/02/14.

Figure 2.15: Predicted vs. actual daily prices for CVX over the test period, 2019/08/16-2020/02/14.

Figure 2.16: Predicted vs. actual daily prices for DAL over the test period, 2019/08/16-2020/02/14.

Figure 2.17: Predicted vs. actual daily prices for DIS over the test period, 2019/08/16-2020/02/14.

Figure 2.18: Predicted vs. actual daily prices for FB over the test period, 2019/08/16-2020/02/14.

Figure 2.19: Predicted vs. actual daily prices for GE over the test period, 2019/08/16-2020/02/14.

Figure 2.20: Predicted vs. actual daily prices for GM over the test period, 2019/08/16-2020/02/14.

Figure 2.21: Predicted vs. actual daily prices for GS over the test period, 2019/08/16-2020/02/14.

Figure 2.22: Predicted vs. actual daily prices for JNJ over the test period, 2019/08/16-2020/02/14.

Figure 2.23: Predicted vs. actual daily prices for JPM over the test period, 2019/08/16-2020/02/14.

Figure 2.24: Predicted vs. actual daily prices for MAR over the test period, 2019/08/16-2020/02/14.

Figure 2.25: Predicted vs. actual daily prices for KO over the test period, 2019/08/16-2020/02/14.

Figure 2.26: Predicted vs. actual daily prices for MCD over the test period, 2019/08/16-2020/02/14.

Figure 2.27: Predicted vs. actual daily prices for NKE over the test period, 2019/08/16-2020/02/14.

Figure 2.28: Predicted vs. actual daily prices for PG over the test period, 2019/08/16-2020/02/14.

Figure 2.29: Predicted vs. actual daily prices for VZ over the test period, 2019/08/16-2020/02/14.

Figure 2.30: Predicted vs. actual daily prices for WMT over the test period, 2019/08/16-2020/02/14.

Chapter 3 Reinforcement Learning Preliminaries

3.1 Markov Decision Processes

A natural abstraction for many sequential decision-making problems is to model the system as a Markov Decision Process (MDP) [126], in which the agent interacts with the environment over a sequence of discrete time steps. It is often represented as a 5-tuple $M = \langle S, A, T, R, \gamma \rangle$, where $S$ is a set of states; $A$ is a set of actions that can be taken;

$T : S \times A \mapsto \mathcal{P}(S)$ is the transition function, with $\sum_{s' \in S} T(s'|s, a) = 1$, which denotes the (stationary) probability distribution over $S$ of reaching a new state $s'$ after taking action $a$ in state $s$; $R$ is the reward function, which can take the form of either $R : S \mapsto \mathbb{R}$, $R : S \times A \mapsto \mathbb{R}$, or $R : S \times A \times S \mapsto \mathbb{R}$; and $\gamma \in [0, 1)$ is the discount factor.

A policy $\pi : S \mapsto \mathcal{P}(A)$ defines the conditional probability distribution of choosing each action while in state $s$. For an MDP, once a stationary policy is fixed, the distribution of the reward sequence is then determined. Thus, to evaluate a policy $\pi$, it is natural to define the action value function under $\pi$ as the expected cumulative discounted reward obtained by taking action $a$ starting from state $s$ and following $\pi$ thereafter:

$$Q^{\pi}(s, a) \equiv \mathbb{E}_{\pi}\Big[\sum_{\tau=0}^{\infty} \gamma^{\tau} R_{t+\tau} \,\Big|\, S_t = s, A_t = a\Big] = R(s, a) + \gamma \int_{s'} T(s'|s, a)\, Q^{\pi}(s', \pi(s')). \qquad (3.1)$$

The goal of solving an MDP is to find an optimal policy $\pi^*$ that maximizes the expected cumulative discounted reward in all states. The corresponding optimal action values satisfy $Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$, and Banach's fixed-point theorem ensures the existence and uniqueness of the fixed-point solution of the Bellman optimality equation [126]:

$$Q^*(s, a) = R(s, a) + \gamma \int_{s'} T(s'|s, a) \max_{a'} Q^*(s', a') \qquad (3.2)$$

from which we can derive a deterministic optimal policy by acting greedily with respect to $Q^*$, i.e., $\pi^* = \arg\max_{a \in A} Q^*(s, a)$.

3.2 Value-based Reinforcement Learning

In reinforcement learning problems, the agent must interact with the environment to learn information about the transition and reward functions while trying to produce an optimal policy. At each time step $t$, the agent senses some representation of the current state $s$, selects an action $a$, then receives an immediate reward $r$ from the environment and finds itself in a new state $s'$. The experience tuple $\langle s, a, r, s' \rangle$ summarizes the observed transition for a single step. Based on the experiences gathered through interacting with the environment, the agent can either learn the MDP model first, by approximating the transition probabilities and reward functions, and then plan in the MDP to obtain an optimal policy (this is called the model-based approach in reinforcement learning); or, without learning the model, directly learn the optimal value functions from which the optimal policy is derived (this is called the model-free approach). As a model-free approach, Q-learning [127] updates a one-step bootstrapped estimate of the Q-values from the experience samples over time steps. The update rule upon observing $\langle s, a, r, s' \rangle$ is

$$Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big] \qquad (3.3)$$

in which $\alpha$ is the learning rate, $r + \gamma \max_{a'} Q(s', a')$ serves as the update target of the Q-value, which can be seen as a sample of the expected value of a one-step look-ahead estimate for the state-action pair $(s, a)$ based on the maximum estimated value over the next state $s'$, and the last term $Q(s, a)$ is simply the current estimate. The difference $\delta = r + \gamma \max_{a'} Q(s', a') - Q(s, a)$ is referred to as the temporal difference (TD) error, or Bellman error. Note that one can bootstrap more than one step when estimating the target, often by using eligibility traces as in TD($\lambda$) [128]. Q-learning is guaranteed to converge to the optimal values in probability, provided that each action is executed in each state infinitely often, $s'$ is sampled according to the distribution $T(s, a, s')$, $r$ is sampled with mean $R(s, a)$ and bounded variance, and the learning rate $\alpha$ decays appropriately.
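As a concrete illustration of the update rule in Equation (3.3), the following minimal Python sketch performs one tabular Q-learning update; the table sizes and the example transition are hypothetical and not taken from the experiments in this dissertation.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update for the observed transition <s, a, r, s'>."""
    td_target = r + gamma * np.max(Q[s_next])   # one-step look-ahead target
    td_error = td_target - Q[s, a]              # temporal difference (Bellman) error
    Q[s, a] += alpha * td_error
    return td_error

# Example usage with a small, hypothetical table:
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
q_learning_update(Q, s=0, a=2, r=1.0, s_next=3)
```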

3.3 Deep Q-Networks

For environments with large state spaces, the Q-values are often represented by a function of state-action pairs rather than in tabular form, i.e., $Q_{\theta}(s, a) = f(s, a|\theta)$, where $\theta$ is a parameter vector. We consider Q-learning with function approximation in this chapter. To update the parameter vector $\theta$, first-order gradient methods are usually applied to minimize the mean squared error (MSE) loss, which leads to the update

$$\theta \leftarrow \theta + \alpha\, \delta\, \nabla_{\theta} Q_{\theta}(s, a). \qquad (3.4)$$

However, with function approximation, the convergence guarantee can no longer be established in general. Neural networks, while attractive as powerful function approximators, were well known to be unstable and even to diverge when applied to reinforcement learning, until the deep Q-network (DQN) [129] was introduced and showed great success by making several important modifications. Experience replay [130] is used to address the non-stationary data problem, by storing and mixing the samples (i.e., experiences) in a replay memory for the updates. During training, a batch of experiences is randomly sampled each time and gradient descent is performed on the sampled batch; this way the temporal correlations can be alleviated. In addition, a separate target network, which is a copy of the learned network, is employed: its parameters (denoted $\theta^-$) are frozen for a period of time and only updated periodically, and it is used when calculating the TD error, with the aim of improving stability.
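As a rough sketch of how experience replay can be organized (assuming transitions are stored as $\langle s, a, r, s', done \rangle$ tuples; the capacity and batch size below are illustrative, not the settings used later), a buffer can be as simple as a fixed-capacity queue with uniform sampling:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are dropped first

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation of consecutive steps.
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))               # tuple of lists: (states, actions, ...)

    def __len__(self):
        return len(self.buffer)
```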

3.3.1 Double DQN

A variety of extensions and generalizations have been proposed and shown success in the literature. Overestimation due to the max operator in Q-learning may significantly hurt the performance. To reduce the overestimation error, double DQN (DDQN) [131] decouples the action selection from the estimation of the target: the maximizing action is chosen according to the original network ($Q_{\theta}$), while its value is evaluated using the other one ($Q_{\theta^-}$, the target network), i.e.,

$$Q_{\theta}(s, a) \leftarrow r + \gamma\, Q_{\theta^-}\big(s', \arg\max_{a'} Q_{\theta}(s', a')\big). \qquad (3.5)$$

The procedure of double DQN is shown in Algorithm 1.

Algorithm 1 Double DQN

1: Initialize policy network Q_θ and target network Q_{θ⁻} with random parameters.
2: Initialize replay buffer B.
3: for each episode until end of learning do
4:     Initialize state s
5:     for step t = 1, ... until s is a terminal state of the episode do
6:         Select action a_t = argmax_a Q_θ(s, a) with exploration
7:         Take action a_t, observe reward r and next state s'
8:         Store experience tuple <s, a_t, r, s'> into B
9:         Sample a mini-batch of experiences from B
10:        for all sampled experiences in the mini-batch do
11:            To train network Q_θ, compute a' = argmax_a Q_θ(s', a)
12:            Estimate the TD target with the target network: y = r + γ Q_{θ⁻}(s', a')
13:            Backpropagate the TD error δ = y − Q_θ(s, a_t) through Q_θ, update θ with learning rate α
14:        end for
15:        s ← s'
16:        Update target network θ⁻ ← θ at a fixed frequency
17:    end for
18: end for
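The action-selection and target-estimation steps of Algorithm 1, i.e., the double-DQN target of Equation (3.5), can be sketched as below; `q_online` and `q_target` are assumed stand-ins for the policy and target networks, not actual code from this work.

```python
import numpy as np

def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """Compute double-DQN targets for a mini-batch of transitions."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        if done:
            targets.append(r)                          # no bootstrap at terminal states
        else:
            a_star = int(np.argmax(q_online(s_next)))  # action chosen by the online network
            targets.append(r + gamma * q_target(s_next)[a_star])  # evaluated by the target network
    return np.array(targets)
```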

3.3.2 Dueling DQN

[132] proposed the dueling network architecture, in which the lower layers of a deep neural network are shared and followed by two streams of fully-connected layers that represent two separate estimators: one for the state value function $V(s)$ and the other for the associated state-dependent action advantage function $A(s, a)$. The two outputs are then combined to estimate the action value $Q(s, a)$:

$$Q(s, a) = V(s) + \Big( A(s, a) - \frac{1}{|A|} \sum_{a'} A(s, a') \Big) \qquad (3.6)$$

Note that here the average of the advantage values across all possible actions is used to achieve better stability, instead of the max operator of the alternative form proposed in [132], i.e.,

$$Q(s, a) = V(s) + \Big( A(s, a) - \max_{a'} A(s, a') \Big) \qquad (3.7)$$

The dueling factorization often leads to faster convergence and better policy evaluation, especially in the presence of similar-valued actions. Working with advantage values is more robust to noise, since it emphasizes the gaps between the Q-values of different actions in the same state; these gaps are usually tiny, so a small amount of noise may reorder the actions. In addition, the subtraction of an action-independent baseline in Equation (3.6) also effectively reduces variance, which helps stabilize learning, and this form is therefore more often used. The shared module also generalizes learning across actions: the more frequent updating of the value stream $V$ leads to more efficient learning of state values, in contrast with a single-stream DQN, where only one action value is updated at a time while the other action values remain untouched.
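A dueling head in the stabilized form of Equation (3.6) can be sketched as below (in PyTorch); the layer sizes are illustrative assumptions rather than the architecture used in any experiment here.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))              # V(s)
        self.advantage = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_actions))  # A(s, a)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)                 # shape (batch, 1)
        a = self.advantage(features)             # shape (batch, n_actions)
        # Subtract the mean advantage, as in the stabilized form of Equation (3.6).
        return v + a - a.mean(dim=1, keepdim=True)
```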

3.3.3 Bootstrapped DQN

The main purpose of bootstrapped DQN [133] is to provide efficient "deep" exploration inspired by Thompson sampling, or probability matching, in Bayesian reinforcement learning [134]; but instead of maintaining a distribution over possible values and performing an intractable exact posterior update, it takes a single sample from the posterior. Bootstrapped DQN maintains a Q-ensemble, represented by a multi-head deep neural network, in order to parameterize a set of $K \in \mathbb{N}^+$ different Q-value functions. The lower layers are shared by the $K$ "heads", and each head represents an independent estimate of the action value $Q^k(s, a|\theta^k)$. For each episode during training, bootstrapped DQN picks a single head uniformly at random and follows the greedy policy with respect to the selected Q-value estimates, i.e., $a_t = \arg\max_a Q^k(s_t, a)$, until the end of the episode.

Bootstrapped DQN diversifies the Q-estimates and improves exploration through independent initialization of the $K$ heads, as well as through the fact that each head is trained with different experience samples. The $K$ heads can be trained together with the help of a so-called bootstrap mask $m^k_{\tau}$, which decides whether the $k$-th head should be trained on a given transition, i.e., the transition experience $\tau$ updates $Q^k$ only if $m^k_{\tau}$ is nonzero. In addition, bootstrapped DQN adapts double DQN in order to avoid overestimation, i.e., the estimates of the TD targets are calculated using the target network $Q_{\theta^k_-}$. The loss backpropagated to the $k$-th head is then

$$L(\theta^k) = \mathbb{E}_{\tau}\Big[ m^k_{\tau} \big( r + \gamma Q^k(s', a'|\theta^k_-) - Q^k(s, a|\theta^k) \big)^2 \Big], \quad \text{where } a' = \arg\max_a Q^k(s', a|\theta^k). \qquad (3.8)$$

Note the gradients should be further aggregated and normalized for updating the lower layers of the network.
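A minimal sketch of the per-head bootstrap masking used in Equation (3.8) is given below; the Bernoulli masking probability p and the per-head TD errors are illustrative assumptions ([133] also discusses other masking distributions).

```python
import numpy as np

K = 10          # number of heads in the ensemble
p = 0.5         # probability that a transition is assigned to a given head

def sample_mask(rng=np.random.default_rng()):
    # Drawn once when the transition is stored, and kept alongside it in the buffer.
    return rng.binomial(1, p, size=K)

def masked_squared_td_errors(td_errors, mask):
    """Squared TD error per head, zeroed out for heads whose mask is 0."""
    td_errors = np.asarray(td_errors, dtype=float)   # shape (K,)
    return mask * td_errors ** 2                     # per-head contribution to Equation (3.8)
```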

3.A A Simple Proof of Policy Invariance under Reward Transformation from the Linear Programming Perspective

In this section, we focus on the linear programming (LP) perspective of MDPs. By looking at the LP dual form of the MDP, the policy-invariance-under-reward-transformation property, which serves as a theoretical foundation of inverse reinforcement learning (IRL), can be derived almost without proof.

3.A.1 Encoding MDP as LP

Recall that the Bellman optimality equation is a system of nonlinear equations:

$$V^*(s) = \max_a \Big[ R(s, a) + \gamma \int_{s'} T(s'|s, a) V^*(s') \Big] \qquad (3.9)$$

If the state space S and the action space A are finite, we can also encode the problem in the linear programming (LP) formulation:

$$\begin{aligned}
\min \quad & \sum_{s \in S} V(s) \\
\text{subject to} \quad & V(s) \ge R(s, a) + \gamma \sum_{s'} T(s, a, s') V(s') \qquad \forall s \in S,\ a \in A
\end{aligned} \qquad (3.10)$$

For each decision variable $V(s)$, its value must be no smaller than the one-step look-ahead value of whichever action we take in state $s$, which gives a set of $|A|$ constraints. The minimization then drives each $V(s)$ down to the smallest value satisfying these lower bounds, and the summation over every $s$ gives us a single objective function. Its dual is then:

$$\begin{aligned}
\max \quad & \sum_{s \in S} \sum_{a \in A} \mu_{s,a} R(s, a) \\
\text{subject to} \quad & \sum_{a \in A} \mu_{s',a} = 1 + \gamma \sum_{s \in S} \sum_{a \in A} \mu_{s,a} T(s, a, s'), \qquad \forall s' \in S \\
& \mu_{s,a} \ge 0, \qquad \forall s \in S,\ a \in A
\end{aligned} \qquad (3.11)$$

in which the decision variables $\mu_{s,a}$ can be thought of as the amount of "policy flow" through the state-action pair $(s, a)$, and the constraints can be read as a flow conservation law: for each state $s'$, the total outgoing flow (the left-hand side) equals the incoming flow from all possible $(s, a)$ pairs weighted by their transition probabilities, plus the existing amount in state $s'$ (which is assumed to be 1 for all states). The objective is to maximize the total reward obtained by a stationary stochastic policy that takes action $a$ in state $s$ with probability $\mu_{s,a} / \sum_{a' \in A} \mu_{s,a'}$. The primal has $|S|$ decision variables and $|S| \times |A|$ constraints, while the dual has $|S| \times |A|$ decision variables and $|S|$ constraints, excluding the non-negativity ones.
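For a small finite MDP, the primal LP (3.10) can be solved directly with an off-the-shelf solver. The following sketch uses SciPy's linprog on a randomly generated, purely illustrative transition tensor and reward matrix (nothing here comes from the dissertation's data).

```python
import numpy as np
from scipy.optimize import linprog

n_s, n_a, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))     # T[s, a, :] sums to 1
R = rng.uniform(0, 1, size=(n_s, n_a))

c = np.ones(n_s)                                     # objective: min sum_s V(s)

# V(s) >= R(s,a) + gamma * sum_{s'} T(s,a,s') V(s'), rewritten as
# gamma * T(s,a,:) . V - V(s) <= -R(s,a) for linprog's A_ub x <= b_ub form.
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        row = gamma * T[s, a].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_s)
V_star = res.x
greedy_policy = np.argmax(R + gamma * T @ V_star, axis=1)
print(V_star, greedy_policy)
```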

3.A.2 Policy Invariance under Reward Transformation

[135] derived the result that, under certain reward-shaping transformations, we can change the reward function of an MDP without changing the optimal policy.

Let $\Phi : S \mapsto \mathbb{R}$ be some function over states; the so-called potential-based shaping function $F : S \times A \times S \mapsto \mathbb{R}$ is defined as the difference of (discounted) potentials, i.e., $F(s, a, s') = \gamma\Phi(s') - \Phi(s)$. The reward is then reshaped as

$$R'(s, a, s') = R(s, a, s') + F(s, a, s') = R(s, a, s') - \Phi(s) + \gamma\Phi(s').$$

Another transformation of interest is an affine transformation of $R(s, a)$ with a positive coefficient, i.e., $R'(s, a) = \alpha R(s, a) + \beta$, in which $\alpha \in \mathbb{R}^+$ and $\beta \in \mathbb{R}$.

Theorem 3.1. That $F$ is a potential-based shaping function is a necessary and sufficient condition for it to guarantee consistency with the optimal policy (when learning from $M' = (S, A, T, \gamma, R + F)$ rather than $M = (S, A, T, \gamma, R)$), in the following sense:

• (Sufficient) If $F$ is a potential-based shaping function, then every optimal policy in $M'$ will also be an optimal policy in $M$ (and vice versa).

• (Necessary) If $F$ is not a potential-based shaping function, then there exist a (proper) transition function $T$ and a reward function $R$ such that no optimal policy in $M'$ is optimal in $M$.

The authors proved the sufficiency of the theorem by looking at the Q-values and using an infinite telescoping sum (inside the expectation). We notice that this result is more straightforward from the LP point of view, by looking at the dual in Equation (3.11), where the reward function appears only in the objective function and not in the constraints. The sufficiency part of Theorem 3.1 can then be derived directly, without the telescoping expansion, as follows.

• (Multiply by a positive scalar) $R'(s, a) = \alpha R(s, a)$: the coefficients of the objective function are all multiplied by $\alpha > 0$, so the optimal solution does not change.

• (Add a scalar) $R'(s, a) = R(s, a) + \beta$: this adds a constant to the objective function (since the total flow $\sum_{s,a} \mu_{s,a}$ is fixed by the constraints), so the optimal solution does not change.

• (General potential-based shaping) $R'(s, a, s') = R(s, a, s') - \Phi(s) + \gamma\Phi(s')$: the objective is then

$$\max \sum_{s \in S} \sum_{a \in A} \mu_{s,a} R'(s, a)
= \max \sum_{s \in S} \sum_{a \in A} \mu_{s,a} \big[ R(s, a) - \Phi(s) + \gamma\Phi(s') \big]
= \max \Big( \sum_{s \in S} \sum_{a \in A} \mu_{s,a} R(s, a) - \sum_{s \in S} \sum_{a \in A} \mu_{s,a} \Phi(s) + \sum_{s \in S} \sum_{a \in A} \mu_{s,a}\, \gamma\Phi(s') \Big)$$

in which the last two terms form a constant, and we are back to the case of adding a scalar as above (a quick numerical check of this invariance for the affine case is sketched below).
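The following sketch runs value iteration on a small, made-up MDP under both $R$ and $\alpha R + \beta$ and compares the resulting greedy policies; it is illustrative only and assumes nothing about the dissertation's environments.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, iters=500):
    """Plain value iteration over Q(s, a) for a finite MDP."""
    n_s, n_a, _ = T.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = R + gamma * T @ V
    return Q

rng = np.random.default_rng(1)
T = rng.dirichlet(np.ones(4), size=(4, 3))      # T[s, a, s']
R = rng.normal(size=(4, 3))

pi_original = value_iteration(T, R).argmax(axis=1)
pi_shaped = value_iteration(T, 2.5 * R + 7.0).argmax(axis=1)
print(np.array_equal(pi_original, pi_shaped))   # expected: True
```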

Chapter 4 Re-anneal Decaying Exploration in Deep Q-Learning

Existing exploration strategies in reinforcement learning (RL) often either ignore the history or feedback of the search, or are complicated to implement. There is also very limited literature showing their effectiveness over diverse domains. We propose an algorithm based on the idea of reannealing, which aims at encouraging exploration only when it is needed, for example, when the algorithm detects that the agent is stuck in a local optimum. The approach is simple to implement. We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.

4.1 Introduction

The goal of a reinforcement learning agent is to try to make best decisions, based on the information it gathers along the way. Unlike supervised learning tasks, however, the agent can only have access to the environment through its own actions. It needs to explicitly explore its environment and gather information for decision making. Simultaneous exploitation (making best decisions) and exploration (gathering of information) tasks create a dilemma, and balancing the two is one of the core challenges in reinforcement learning. An early survey [136] of exploration strategies made a distinction between two categories: undirected and directed. The key idea behind the former is to add randomness, in the hope that a random action might lead towards better actions compared to the suboptimal policy which is viewed as the best given current information. Directed strategies take a “global” view and measure some statistics of the past experiences, and utilize these measures to guide efficient exploration mainly by adding an exploration bonus to the reward function, so that the less visited (in terms of pseudo-count [137] using a fitted density model or hash-count

[138] with locality-sensitive hashing) state-action pairs, or those with larger information gain ([139], VIME [140]) or prediction error [141], are favored. Such strategies often allow for theoretical analysis, usually based on multi-armed bandit (MAB) problem theory [142]. In spite of their appealing mathematical formalism and theoretical guarantees in the finite case, directed exploration strategies have not shown effectiveness across domains and thus have not played an important role in the recent success of reinforcement learning [138]. In many real-life learning applications, RL agents are trained to achieve optimal performance in certain specific tasks. Hence, it is often simple to distinguish when the learned behavior of the agent is acceptable. It is well described in the literature that if insufficient focus has been placed on exploration, then the agent can learn to stay in a "comfort zone" of a local optimum. In this case, one needs to force the agent to leave the "comfort zone" and try new actions which would take it to new states that have not been well-learned yet, so that it can gather more information about the environment, in the hope of finding a better policy. In this chapter, we propose a heuristic to overcome this problem and, in general, to speed up the learning procedure. Our main contribution is an easy yet efficient and scalable method that encourages exploration in complex reinforcement learning domains when it is needed. To be more specific, we emphasize the model dynamics and the agent's behavior rather than uncertainty estimates while measuring the need for exploration. This could be accomplished by training a supervised model to make predictions based on existing experiences, which would require extensive computation as well as an effective representation for the supervised model. In this chapter, we focus on using a simple heuristic measure and an annealing-based method to redirect the agent. Our approach could be extended to serve as a general framework for interactive training in complex RL domains, to aid the agent in finding better policies. Another contribution is that we abstract the learning procedure from the point of view of general optimization/search, and apply a generic method that attempts to improve search algorithms on hard problems, specifically a modified version of simulated annealing, and thus our method

is referred to as exploration reannealing. A number of other metaheuristic approaches could be applied in a similar fashion to better balance the exploration-exploitation tradeoff in reinforcement learning. It is worth noting that exploration reannealing itself is not a complete learning algorithm, and in fact needs to be combined with other reinforcement learning tools. In this chapter, we emphasize and evaluate its application to deep Q-learning, but the combination of exploration reannealing with other methods can be considered. Section 4.2 reviews some widely-used exploration strategies in DQN and explains the trade-off between exploration and exploitation. In Section 4.3, we present the motivation of our reannealing method and discuss appropriate ways to use it, as well as the specific algorithm. We perform empirical studies and showcase the improvement obtained by exploiting our reannealing strategy on a large-scale challenging domain, the Lunar Lander model, in Section 4.4. Finally, we outline conclusions in Section 4.5.

4.2 Exploration in DQN

4.2.1 Exploration Strategies

ε-Greedy Exploration. The most commonly used strategy for exploration is the ε-greedy method, in which the agent selects the action it believes to be the best according to the current $Q(s, a)$ values most of the time, and occasionally acts randomly. That is, it takes the greedy action $\arg\max_{\pi} \int Q(s, a)\pi(a|s)\,da$ with probability $1 - \varepsilon$, and selects (uniformly) randomly among all actions with probability $\varepsilon$. Then, after infinitely many steps, every state-action pair will be visited infinitely often, and thus all $Q(s, a)$ converge to the true action values $Q^*(s, a)$ almost surely [143]. However, deficiencies of ε-greedy are also often discussed, and new RL algorithms have been proposed to address them. For example, the time complexity of ε-greedy learning is exponential with respect to the size of the state space, which leads to PAC-learning ideas for RL [144]. Moreover, when exploring, ε-greedy selects among all actions with equal probability. Intuitively, we would expect the agent to pay more attention to more "promising" actions, i.e., those with maybe slightly lower Q-values than the current greedy action, rather than those with really low Q-values, which have less potential to be optimal. Moreover, it might be a waste to explore those actions with low Q-values that have already been selected many times, since we may be confident that these are "bad" actions. Despite its deficiencies, due to its simplicity, practical effectiveness, and the ease with which it can be embedded into Q-learning, the ε-greedy strategy has been prevalent in most value-based algorithms in reinforcement learning, including DQN and its variants.

Softmax (or Boltzmann) Exploration. The (variational) free energy for an RL agent can be defined as

$$F(\pi) = -\int Q(s, a)\pi(a|s)\,da + T \int \pi(a|s) \log \pi(a|s)\,da, \qquad (4.1)$$

in which the first term represents the energy of the agent, and the second term is the standard form of negative entropy. The coefficient $T$ of the negative entropy term is referred to as the temperature. The free energy principle claims that a self-organizing agent acts on the environment by minimizing its free energy, by which it reaches an equilibrium with the environment (or, more accurately, with a sampling of sensory data) [145]. Minimization of the free energy gives us

$$\pi(a|s) = \frac{\exp\big(Q(s, a)/T\big)}{\int \exp\big(Q(s, a')/T\big)\,da'}, \qquad (4.2)$$

which is called the softmax policy or Boltzmann policy. Note that the hyperparameter $T$ controls the exploration [146]. If the temperature is high, the action selection according to $\pi$ approaches a uniform distribution, which yields more randomness and thus encourages exploration. On the other hand, a low temperature reduces the randomness and enhances exploitation. As an extreme case, if the temperature is zero, the negative entropy term in Eq. (4.1) goes away, and the corresponding policy becomes deterministic, taking the greedy action $\arg\max_{\pi} \int Q(s, a)\pi(a|s)\,da$ given the current estimate of $Q(s, a)$. With the softmax probabilities, the chance of each action being selected is ranked and weighted according to its estimated Q-value, instead of all exploratory actions receiving equal probability as in the ε-greedy approach.
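A minimal sketch of softmax (Boltzmann) action selection, next to ε-greedy for comparison, is given below; the temperature, ε, and Q-values are purely illustrative.

```python
import numpy as np

def boltzmann_action(q_values, temperature=1.0, rng=np.random.default_rng()):
    logits = np.asarray(q_values) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(q_values), p=probs)  # higher Q-values get higher probability

def epsilon_greedy_action(q_values, eps=0.1, rng=np.random.default_rng()):
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))  # uniform exploration
    return int(np.argmax(q_values))

q = [1.0, 1.2, 0.3, 0.9]
print(boltzmann_action(q, temperature=0.5), epsilon_greedy_action(q, eps=0.2))
```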

4.2.2 Exploration Decay

Theoretical analysis of exploration strategies is usually performed through the multi-armed bandit (MAB) model [142]. The ε-greedy strategy has been well studied through regret analysis in MAB, in which the regret is often defined as a measure of the difference in value between taking an action $a_t$ and the optimal action $a^*$ at time $t$, i.e., $\rho(t) = \mathbb{E}_a[V^* - Q(a_t)]$, that is, the opportunity loss of taking $a_t$ for one step. The total regret is then the overall opportunity loss up to time $t$, i.e., $L(t) = \sum_{\tau=1}^{t} \rho(\tau) = \mathbb{E}_a\big[\sum_{\tau=1}^{t} (V^* - Q(a_\tau))\big]$. As shown in [142], if we set $\varepsilon = 0$, that is, always choose the action with the largest Q-value greedily without any exploration attempt, then the greedy action could lock onto a suboptimal policy forever; in that case, the total regret grows linearly ($O(t)$). On the other hand, if we take ε-greedy actions with a constant $\varepsilon > 0$, the agent keeps exploring with probability $\varepsilon$ even after the optimal policy is found, also resulting in linear total regret. A natural approach, then, is to encourage exploration early and exploitation later, which is achieved by decaying $\varepsilon$ over time. Decaying-ε-greedy can achieve an asymptotically logarithmic bound on the total regret by defining $\varepsilon_t = \min\big(1, \frac{c}{\delta^2 t}\big)$, where $c$ is a constant and $\delta$ is the gap between the best and second-best action values; both are unknown, however, so it is often hard to derive an efficient decay schedule. Nevertheless, it is important to emphasize the decay strategy for exploration. We note that the stochasticity of the softmax strategy could also be annealed during training by changing the temperature $T$ over time.

4.3 Exploration Reannealing

4.3.1 Local Optima in DQN

In theory, Q-learning converges to the optimal policy if all state-action pairs are visited infinitely often. However, this condition cannot be met in practice if the state space is too large or continuous. A neural network in DQN approximates such a large or continuous state space and thus suffers from this problem. A deep neural network is, in general, of very high dimensionality, and popular practical optimization techniques, such as stochastic gradient descent, only consider first-order gradient information of the loss function. Such optimization algorithms may get stuck at local optima or saddle points. In practice, for regular neural networks, saddle points of the loss function can be escaped by applying special optimization techniques [147], and it is often the case that a local optimum is good enough for many supervised learning problems. However, this might not be the case in DQN. A local optimum in DQN arises both from the complicated structure of the loss function itself and from the limited representation and inference ability of a neural network over the search space. The latter is especially pervasive for state-space segments that are not well explored. As a result, the learning agent cannot make progress for a long time and might waste learning resources by updating information for irrelevant parts of the state space. Therefore, it is more common in practice that a DQN learning agent gets stuck in poor local optima, due to the difficulty of handling the exploration-exploitation trade-off well.

4.3.2 Exploration Reannealing

Simulated annealing (SA) is a classic heuristic optimization approach used to escape local optima. At the heart of it is an analogy with thermodynamics. Boltzmann probability distribution again is used to analogically represent the (variational) free energy, with a control hyperparameter, referred to as temperature T . The free energy determines the stochasticity for the search direction, which aids the local search to escape from local optima. The

concepts are fundamentally based on the same principle as the exploration strategies we mentioned above, in which decaying the exploration is essentially the same as tuning the temperature in SA. In some variations of simulated annealing search, reannealing [148] is a quite common idea for the annealing schedule: the temperature is periodically reset to a high value in order to encourage exploration. A similar idea can be naturally employed in our problem for enabling exploration in RL. Note that the act of reannealing itself can be implemented in a straightforward way for both the ε-greedy and softmax strategies introduced above. In ε-greedy, we can simply reset the exploration rate ε to a large value (note that 0 ≤ ε < 1 and ε decays over time) if a poor local optimum is encountered. Similarly, for softmax action selection, we can directly reset the temperature to a high value (e.g., close to the initial temperature) whenever it is necessary. We note here that with finitely many reannealing events, the theoretical guarantee of an asymptotically logarithmic bound on total regret still holds, as for decaying ε-greedy. A more significant challenge relates to the timing of reannealing events. We will discuss this question below, but first we discuss additional reasons why we believe reannealing can bring benefits for learning in DQN. The key advantage of reannealing exploration is that it could substantially improve sample efficiency. We know that collecting data by interacting with the environment is usually expensive for RL systems. While stuck in local optima, it is usually the case that the TD errors being backpropagated are small, so the agent learns little information and thus makes little learning progress. With reannealing, the agent tends to take random actions in this case and is more likely to experience unacquainted states thereafter. Those state-action pairs are usually visited much less often than those obtained by following the greedy policy; they thus have larger TD errors in general, from which the agent can learn more. Another advantage is that reannealing exploration could substantially alleviate the data imbalance problem. Without reannealing exploration, large amounts of samples are collected around local optima, resulting in a data distribution biased in favor of samples that may not

be relevant. As a result, a notable portion of the model parameters are dedicated to describing states around (poor) local optima, and much of the training work is hence in vain. By reannealing the exploration instead of exploiting around the local optima, random actions are taken with much higher probability, so the agent is more likely to jump out of the local optima and experience unacquainted states, gathering significantly more useful information about the entire environment as well as for the training overall. Finally, a training episode is often designed to have a finite horizon for computational simulation purposes. Each episode finishes either when certain criteria are met (in this case, a success or a failure on the task is defined and a final reward is given), or when the time step exceeds a fixed period. When a local optimum is encountered, the agent tends to wander around until exceeding the time limit of an episode. It is important to note that using a time limit makes the environment non-stationary, since in this case the final reward is never actually assigned, and hence the agent may not be able to recognize a suboptimal policy. Exploration reannealing can enable the agent to actually achieve either success or failure, making sure that the appropriate reward is assigned. Consequently, more episodes finish with more concrete information gain.

4.3.3 Defining Poor Local Optima

Given the intuitive advantages of applying reannealing to DQN, we next describe our proposed algorithm. As described above, the mechanism of reannealing is straightforward for both the ε-greedy and softmax strategies. On the other hand, determining the appropriate times for reannealing is more difficult. Clearly, we must reanneal when the agent is stuck in a poor local optimum. Unfortunately, in high-dimensional spaces, formally determining a local optimum is challenging. In practice, an often-used empirical approach is to track the variation of the loss function across iterations. When the loss stops improving, it is often the case that the search has reached a local optimum. However, simply looking at the change in loss does not tell us whether the local optimum is acceptable. It might be the case that a near-global optimum has already been achieved, and hence there is no need to escape from it.

On the other hand, sometimes a poor local optimum can be easily observed and distinguished by a human from an outside perspective, in which case the observer utilizes some a priori knowledge that has not been integrated into the reward function. In RL, the agent's learned policy as well as its behavior are determined by optimizing the discounted cumulative rewards; thus an ideal reward function should capture the goal and measure the performance exactly, which requires perfect knowledge of all states and transitions in the environment. Except for some human-designed games in which the rules are entirely understood, it often takes considerable effort to tweak the rewards until the desired behavior is learned. This then means that in many applications the reward function is already overloaded so as to produce favorable agent behavior, and hence attempting to also use the same reward function to distinguish the quality of a local optimum may be either impossible or very complicated, with unexpected side effects (see also inverse reinforcement learning, [149]). Alternatively, we propose to consider a separate criterion for initiating reannealing. This criterion can be viewed as a supervised model which makes predictions based on existing experiences. The training labels could be as simple as a categorical signal denoting the need to explore, or as complicated as a representation of the next state, which would require extensive computation as well as an effective representation for the supervised model. In this chapter, we use a simplified version of this supervision idea. We explicitly measure an easily distinguished feature as an a priori defined heuristic, reflecting the fact that the agent's bottleneck behavior due to a suboptimal policy can often be described by some undesirable characteristics from an outside observer's perspective. We can then explicitly extract such a feature as a useful heuristic, independent from the reward function. Once defined, we can keep track of the heuristic along the learning process and use it to control the learning behavior. See Section 4.4 for an example based on the Lunar Lander problem.

4.3.4 Algorithm

The objective of reannealing exploration is to explicitly inform the learning agent, via a heuristic measure, that it should be exploring rather than exploiting. We set up a heuristic variable called stuck to represent whether the agent has been stuck in a poor local optimum. The variable stuck should be a global statistic for some aspect of the agent's performance information. If some threshold of "stuck" has been reached, we reanneal the exploration. In ε-greedy learning, we reset ε to 1 and force the agent to do pure exploration. The exploration rate ε is then decayed over time. The pseudo-code of our proposed procedure for DQN is shown in Algorithm 2. A similar reannealing strategy applies to softmax, in which we reset the "temperature" T to its initial value and anneal it again to smaller values over time.

As argued above, the candidates for the stuck variable should be performance measures that have not been integrated (or not been integrated well) into the reward function. The feature chosen as the explicit heuristic should be a representative bottleneck for learning. We expect that the RL agent can jump out of the local optima by applying the reannealing strategy, and can learn a better policy than the one it had obtained before reannealing. As a result, acceptable behavior and a good policy could be learned faster. We also expect that with reannealing exploration, we could worry less about poor local optima and spend less time tuning the hyperparameters (such as the annealing schedule, learning rate, etc.) while training.

Algorithm 2 Exploration Reannealing in DQN

Initialize ε_t = 1 and stuck, as well as DQN parameters θ.
repeat {for each episode}
    Initialize state s
    repeat {for each step in an episode}
        Generate a random number u ∈ [0, 1]
        if u < ε_t then
            Randomly select a ∈ A
        else
            a ← arg max_{a'∈A} Q(s, a'|θ)
        end if
        Take action a, observe reward r and next state s'
        Store experience tuple <s, a, r, s'> into memory
        Sample a batch of experiences from memory
        for all sampled experiences in the batch do
            Compute the TD error δ = r + γ max_{a'} Q(s', a'|θ) − Q(s, a|θ)
            Backpropagate δ through the DQN, update θ with learning rate α_t
        end for
        s ← s'
    until s is a terminal state
    Update stuck according to performance
    if stuck meets some threshold then
        Reset ε_t = 1
        Reset stuck to its initial value
    else
        Decay ε_t
    end if
until end of learning
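The control flow of Algorithm 2 around a generic DQN training step can be sketched as follows; `train_one_episode` and `stuck_heuristic` are hypothetical placeholders for the agent's training code and the chosen heuristic measure, not code from this work.

```python
EPS_MIN, EPS_DECAY = 0.01, 0.99

def train_with_reannealing(train_one_episode, stuck_heuristic,
                           stuck_threshold=10, n_episodes=10_000):
    eps, stuck = 1.0, 0
    for _ in range(n_episodes):
        episode_stats = train_one_episode(eps)          # one episode of epsilon-greedy DQN training
        stuck = stuck_heuristic(stuck, episode_stats)   # update the global "stuck" statistic
        if stuck >= stuck_threshold:
            eps, stuck = 1.0, 0                         # re-anneal: back to pure exploration
        else:
            eps = max(EPS_MIN, eps * EPS_DECAY)         # ordinary monotone decay
    return eps
```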

4.4 Experimental Results

4.4.1 Testbed Setup

We conducted an experiment by implementing a reinforcement learning agent to solve the Lunar Lander task in Box2D [150], interfaced through OpenAI gym environment [151].

In each step, the agent is provided with the current state $s \in \mathbb{R}^8$ of the lander, in which 6 of the dimensions are continuous, denoting the position, speed, and angular speed, whereas the other 2 are binary variables indicating whether each leg is in contact with the ground. The agent is allowed to take one of 4 possible actions (i.e., the action space is discrete).

At the end of each step, the agent receives a reward and moves to a new state $s'$. An episode finishes if the lander comes to rest on the ground at zero speed (receiving an additional reward of +100), hits the ground and crashes (receiving an additional −100 reward), flies outside the screen, or reaches the maximum of 1000 time steps per episode. The agent aims for a successful landing, which is defined as reaching the landing pad (between the two flags) centered on the ground at zero speed, and receives an additional reward in the range [100, 140], while landing outside the pad incurs some penalty. Figure 4.1 provides a snapshot of the task environment.

Figure 4.1: Lunar Lander Environment

We use a neural network with two fully-connected hidden layers (consisting of 200 and 60 neurons, respectively) as our function approximator. A ReLU nonlinearity is used as the activation function for each hidden neuron. The network takes the 8-dimensional vector $s$ describing the state as input, and outputs the approximated Q-values for the 4 possible actions. We train the neural network with a FIFO memory of size $10^6$ for experience replay. A target neural network for double learning is updated every 20 episodes, so that the original network has enough time to converge. The adaptive moment estimation (Adam) optimizer, with the initial learning rate set to 0.01, is used to train the network, since it is in general less sensitive to the choice of learning rate than other stochastic gradient descent algorithms [119]. We apply the pseudo-Huber loss instead of MSE as the loss function, as it is less sensitive to outliers and is more commonly used in DQN [129]. The discount factor $\gamma$ is set to 0.99, and an ε-greedy policy is used for choosing actions throughout the interaction with the environment. For comparison purposes, we used two different exploration decay rates, $\rho_{decay} = 0.99$ and $0.985$. These hyperparameters were empirically tuned with the aim of achieving better performance.
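A sketch of the Q-network and optimizer described above could look as follows in PyTorch; treat this purely as an illustration of the stated architecture, not the authors' exact implementation (in particular, SmoothL1Loss is used here as a readily available Huber-type loss).

```python
import torch
import torch.nn as nn

q_network = nn.Sequential(
    nn.Linear(8, 200), nn.ReLU(),     # 8-dimensional state input
    nn.Linear(200, 60), nn.ReLU(),
    nn.Linear(60, 4),                 # one Q-value per discrete action
)

optimizer = torch.optim.Adam(q_network.parameters(), lr=0.01)
loss_fn = nn.SmoothL1Loss()           # Huber-type loss, standing in for pseudo-Huber
```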

4.4.2 Implementation of Exploration Reannealing

As in Q-learning, a simple ε-greedy policy is applied when choosing actions to interact with the environment during training. With a large exploration rate ε, the agent fails at exploiting and refining its policy, while with a small ε, the agent has a problem with exploration. For example, if we simply pick ε = 0.01, the agent soon learns to hover above the ground forever but hesitates to land. An annealing strategy for the exploration rate was considered and tried, in which ε gradually decreases from 1 to 0.01 during, say, half of the training episodes, and ε = 0.01 for the rest of the training time. However, this cannot solve the hovering problem. This annealing strategy adds randomness at the early stage of training, but the pretraining step (used to fill the memory for experience replay) has already provided the agent enough exploration stored in the memory at the beginning. Even if the agent learns to land occasionally, it prefers hovering most of the time. This is probably because the neural network is dealing with a continuous state space, so learning through some unknown states with bad decisions also affects the values of well-learned states. As a result, the agent again learns to hover forever. In order to escape from such hovering local optima, we carefully engineered a reannealing strategy for the exploration rate. The idea is to encourage the agent to explore while it is hovering. We define a variable hover to count the hovering events, starting from 0. Whenever an episode finishes by exceeding the time limit (i.e., the maximum of 1000 steps in an episode), we increase the hovering number by 1. If the next episode finishes within 1000 steps, we halve the hovering number (using integer division). We reset ε back to 1 (for full exploration) and restart the count if the hovering number reaches 10 (in this case, the agent tends to hover forever). Otherwise, ε anneals to 0.01 as described above.
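The hover-counter heuristic just described can be sketched as below; the episode-level flag `hit_time_limit` is an assumed output of the training loop rather than actual code from the experiments.

```python
HOVER_RESET_THRESHOLD = 10     # reanneal once the agent has "hovered" this many times

def update_hover_counter(hover, hit_time_limit):
    if hit_time_limit:         # episode ended only because it exceeded 1000 steps
        return hover + 1
    return hover // 2          # integer-halve the counter after a normal finish

def should_reanneal(hover):
    return hover >= HOVER_RESET_THRESHOLD
```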

4.4.3 Results

The network was trained over 10,000 episodes. Figures 4.2a and 4.2b illustrate the results for the ordinary DQN without applying the exploration reannealing strategy, using two different exploration decay rates. We plot the cumulative rewards for each episode shown with grey, and the smoothed moving averages of the last 100 episodes shown with the blue line. Notice that higher rate means slower decay, which results in more exploration at the beginning. In the case that reannealing strategy is not applied, the agent with exploration

decay rate ρdecay = 0.985 explores less at the beginning than the one with ρdecay = 0.99, and performs worse, i.e., its average episodic total rewards are significantly lower, and also

the learning process is slower. For instance, with ρdecay = 0.985, the agent barely learns to avoid crashing (i.e., with episodic rewards above zero) within 3000 episodes, while with

$\rho_{decay} = 0.99$, the agent can reach the same level in about 2000 episodes. Also, to achieve average episodic rewards above 100, it takes less than 4000 episodes with $\rho_{decay} = 0.99$, and more than 7000 episodes with $\rho_{decay} = 0.985$. This coincides with our intuition, and emphasizes the importance of sufficient exploration. As shown in Figures 4.2c and 4.2d, applying the reannealing strategy improves our results significantly. We could achieve an average episodic total reward as high as 200 (in this case, a reward of 200 means that the agent could land smoothly at the right position on the ground). Without reannealing, however, the agent never achieves such a level in either case (see Figures 4.2a and 4.2b). An interesting observation is the steep falls of the moving average along training when reannealing is applied; clearly, these are the moments when ε is reset to 1. Note that at those times, the drop in episodic total reward does not mean the agent is doing worse in general. Q-learning is an off-policy algorithm, which means the learned target policy is not the same as the behavior policy (ε-greedy) it uses

while interacting with the environment and accumulating the samples. The induced greedy policy has not changed much in such a short period of time since the recent ε reset, so the agent can still do as well as before the fall if it acts greedily. At the same time, the target policy keeps learning while exploring. We can see from Figures 4.2c and 4.2d that, most of the time, it can soon get back to the previous best performance, and often its new peaks are higher, which indicates that it has jumped out of the previous local optimum. We deliberately choose the sliding window size to be not too big, nor do we show the average values over multiple training runs, so that the curves are not over-smoothed, thus allowing us to discern the occurrences of reannealing. Over-smoothed curves would give us the illusion that learning is slower with the reannealing strategy, especially at the early training stages. We claim this is not true, using the same argument that Q-learning is off-policy. We cannot compare the derived policies early on, since the exploration rates differ a lot; however, once ε reaches its minimum value, we can compare the performance of all "near-greedy" behavior policies. We see that with reannealing, the agent could reach higher values much faster, and thus we claim that reannealing accelerated the training. We also plot the varying ε values along training with the reannealing strategy in Figures 4.2e and 4.2f, from which we can directly observe the moments when reannealing was initiated. There is no need to plot such patterns for the cases without reannealing, since ε decays to 0.01 in a few hundred episodes. From Figures 4.2e and 4.2f we can see the frequent reannealing early on, since the agent generally learns to hover very quickly and frequently. Note that reannealing occurs more frequently with $\rho_{decay} = 0.985$ than with $\rho_{decay} = 0.99$. We can surmise, then, that our reannealing strategy serves as a remedy for poor hyperparameter tuning, specifically of the exploration decay rate, as long as the reannealing criterion (i.e., the heuristic measure) is appropriately picked. With insufficient exploration at the beginning, the learning would get stuck in poor local optima more often, but the reannealing strategy can force the agent to explore later on when it is necessary, and help find a similarly good policy as when training with better hyperparameters. Also notice that there is a long flat tail in

Figure 4.2e after episode 4500. During this period of training, the agent did not reanneal, and the total reward values stay at that level with smaller variance, compared with the no-reannealing graphs in Figures 4.2a and 4.2b. In fact, we can see that with reannealing, the variance is smaller when the near-greedy policy is applied, i.e., when ε stays at its minimum for a while. Based on this, we can expect the (greedy) policy learned with the reannealing strategy to be superior both in terms of higher total reward and smaller variance.


(a) no reannealing, $\rho_{decay} = 0.99$; (b) no reannealing, $\rho_{decay} = 0.985$; (c) with reannealing, $\rho_{decay} = 0.99$; (d) with reannealing, $\rho_{decay} = 0.985$; (e) progress of ε, $\rho_{decay} = 0.99$; (f) progress of ε, $\rho_{decay} = 0.985$

Figure 4.2: Performance measured during training. The upper two rows illustrate the total rewards during each episode and their moving averages; (a) and (b) correspond to training without reannealing, while (c) and (d) are with exploration reannealing. The bottom row plots the varying ε values along training with reannealing. In all cases the left column corresponds to exploration decay rate $\rho_{decay} = 0.99$, and the right column corresponds to $\rho_{decay} = 0.985$.

4.5 Conclusions

In this work we present a method to organize exploration in RL algorithms. In particular, we focus on its application to model-free value-based approaches, such as DQN. Our method is particularly suited to problems which suffer from poor local optima and have sparse rewards as well as long horizons that can trigger the termination criterion early. Poor local optima can often be easily distinguished from an outside perspective, yet it may be hard to encode this additional information into the reward function or state variables due to the complexity of the underlying system. Instead, we propose to use a separate heuristic measure, independent from the agent's reward and state, aimed at detecting the local optima that need to be avoided. With such a measure, we can then organize the learning process using a reannealing framework previously used to solve hard optimization problems. We highlight some intuitive benefits of applying exploration reannealing, and demonstrate its performance on a standard RL task. In our experiments, the reannealing method indeed helps the agent avoid poor local optima and gather more useful information. The sample efficiency of the reinforcement learning is improved, and the data imbalance problem is alleviated. As a result, the training procedure is accelerated, and the derived policies have superior performance. In addition, we hypothesize that it can serve as a remedy for imperfect hyperparameter tuning. It is worth noting that the simple framework presented here can be extended to use more sophisticated supervised-learning-based heuristic measures for initiating reannealing. If trained properly, such strategies can result in even better performance, due to improved timing of reannealing.

Chapter 5 Cross Q-Learning in Deep Q-Networks

In this work, we propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where the overestimation is exaggerated by function approximation errors. Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network, which leads to reduced overestimation bias as well as reduced variance. We provide empirical evidence of the advantages of our method by evaluating it on several benchmark environments; the experimental results demonstrate a significant improvement in performance, reducing the overestimation bias and stabilizing the training, further leading to better derived policies.

5.1 Introduction

Overestimation has been identified as one of the most severe problems in value-based reinforcement learning (RL) algorithms such as Q-learning [152], where the maximization of value estimates induces a consistent positive bias, and the error of the estimates is accumulated by the nature of temporal difference (TD) learning. In function approximation settings such as deep Q-networks (DQN), the issue of value overestimation is more severe, given the noise induced by the inaccuracy of the approximation. As a result, DQN training tends to exhibit instability and variability in the estimated Q-values, and the policies derived according to the overestimated Q-values tend to be suboptimal and often diverge. To overcome this issue, double Q-learning [131] has become a standard approach for training DQNs. The main purpose of double Q-learning is to avoid the overestimation problem for the target Q-value by introducing negative bias from the double estimates.

The usual way to realize it in DQN is to maintain a target network, which is a copy of the policy DQN that is either frozen for a period of time or softly updated with an exponential moving average. The target network is then used to estimate the TD target. This may alleviate the issue; however, double DQN still often suffers from overestimation in practice, partially because the policy and target estimates of the Q-values are usually too similar, while the noise from high variance is propagated through the network and an occasional large reward can produce great overestimation in the future. Another approach sometimes proposed is to impose a bias-correction term on the estimates for Q-learning [153]; however, the error-correction term is complicated to derive for deep networks, in which the finiteness of the state space no longer holds. A more recent modification over double DQN favors underestimation and clips the Q-value estimates [154], that is, it always chooses the minimum of the estimated targets over the two networks. The clipped double Q-learning is used on the critics in actor-critic methods for the deterministic policy gradient, which is referred to as TD3 (twin delayed deep deterministic policy gradient) and has shown state-of-the-art results on multiple tasks. However, the intentionally engineered underestimation lacks a rigorous theoretical guide; in addition, it may induce bias in the other direction, e.g., the underestimation can also accumulate through TD learning and yield suboptimal policies. Further, excessive underestimation can naturally lead to slower convergence. Another direction to alleviate overestimation is through reducing the variance during training. For example, [155] uses the average of the learned Q-value estimates from multiple networks, which is designed to help reduce the target approximation error. There also exist various variance reduction techniques [156, 157, 158, 159] that focus on the general non-convex optimization procedure for accelerating stochastic gradient descent, or their direct application to DQNs [160], in which the agent could obtain smaller approximate gradient errors. Reducing the variance can effectively stabilize the DQN training procedure, and overestimation alleviation can be seen as a by-product. However, these are indirect methods

for overestimation control, and the positive bias due to the max operator in the TD update is not taken care of. To address these concerns, we propose a cross DQN algorithm, which can be seen as a direct extension of an earlier variant of double DQN, but can be more flexible. In cross DQN, we maintain more than two networks, and update them one at a time based on the estimation from another, randomly selected one. As mentioned above, the averaged DQN [155] calculates the average of K estimated Q-values, with the primary purpose of overall variance reduction. For all K networks, each step of the TD updates as well as the action selections are based on combining the K estimates. Consequently, the networks are tangled together and cannot be implemented with parallel simulation. In bootstrapped DQN [133], one of the K networks (or heads) is bootstrapped for each action selection step during training, aiming at encouraging exploration early on. Thus the simulation is not independent among networks, while the TD updates are totally independent within each of the networks, each using its own estimation of the Q-values as in standard (double) DQN. [161] investigates more general applications of traditional ensemble reinforcement learning on policies, i.e., majority voting, rank voting, Boltzmann addition, etc., to combine the different policies derived from multiple networks, which they call target ensembles, in addition to the averaged DQN, which they call the temporal ensemble. All of the above-mentioned works that maintain multiple networks have achieved better performance by addressing different issues through particular settings. Our method focuses on the variation of the TD updates, in which the target Q-values are estimated with a bootstrapped network for calculating the gradients, with the direct goal of reducing overestimation. Each of the K networks performs its own TD updates, while maintaining flexibility in action selection: the networks can interact with the environment either independently or through any other ensemble strategy. The detailed implementation options are discussed in Section 5.4. In supervised learning, ensemble strategies such as bagging, boosting, stacking, and hierarchical mixtures of experts are commonly applied to achieve better performance, by

simultaneously learning and combining multiple models. All of the above-mentioned algorithms that maintain multiple models, including ours, can be seen as special cases of general ensemble DQNs. But our method has a deeper root in cross-validation and model selection. By bootstrapping another model to assess the values of the current model, we introduce model bias for in-sample estimation, but reduce the variance of out-of-sample estimation (i.e., the squares of the out-of-sample bias); in other words, the trained model can generalize better and alleviate overfitting. For squared errors, this can be expressed through the well-known bias-variance trade-off: $\text{MSE} = \text{Irreducible Error}^2 + \text{Bias}^2 + \text{Variance}$. In value-based reinforcement learning, the model easily overfits due to overestimation (which is caused by the max operator) during learning. Cross Q-learning introduces an underestimation bias and further reduces the variance, thus improving the generalization of the trained model. As in [154], our work can be naturally extended to state-of-the-art actor-critic methods in continuous action spaces, such as the deep deterministic policy gradient [162], in which the critic network(s) are learned to give an estimate of the Q-value for the actor network to update its gradient and derive policies. Usually multiple critic networks are applied; however, apart from accumulating their learned gradients (either synchronously or asynchronously [163]) and optionally sharing network layers, no other information is shared among the critics. The extension of our method allows the critics to share their value estimates and utilize those of the others, which leads to more accurate estimates for each critic and thus can improve the performance of these models. Similar to these actor-critic algorithms, our work can easily be implemented for parallel training, and the exchange of information among networks can take place either synchronously or asynchronously, like the accumulation of gradients, as there is always a tradeoff between synchronous and asynchronous updates. The rest of this chapter is organized as follows. In Section 5.2, we formally define the estimators for the maximum expected values, along with their theoretical properties. The convergence of our cross estimator is shown in Section 5.3. Section 5.4 illustrates our cross

We show empirical results in Section 5.5. Finally, Section 5.6 draws conclusions and discusses future work.

5.2 Estimating the Maximum Expected Values

For Q-learning, the action is selected according to the estimated target Q-values. This is an instance of a more general problem of estimating the maximum expected value, which is formulated as follows. Consider a set of $|A|$ random variables $Q = \{Q_{a_1}, \cdots, Q_{a_{|A|}}\}$. We are interested in the maximum expected value among this set of variables,

$$\max_a \mu_a = \max_a \mathbb{E}[Q_a],$$

while each $\mathbb{E}[Q_a]$ is usually estimated from samples. Let $\Omega_a$ denote the set of samples used to estimate $Q_a$, for $a \in A$, and further assume that the samples in $\Omega_a$ are i.i.d. The sample mean $\hat{\mu}_a = \frac{1}{|\Omega_a|}\sum_{x \in \Omega_a} x$ is then an unbiased estimator of $\mathbb{E}[Q_a]$.

Let $f_a : \mathbb{R} \to \mathbb{R}$ be the probability density function (PDF) of the variable $Q_a$, and $F_a(x) = \int_{-\infty}^{x} f_a(t)\,dt$ its cumulative distribution function (CDF). The maximum expected value is then

$$\max_a \mathbb{E}[Q_a] = \max_a \int_{-\infty}^{\infty} x f_a(x)\,dx. \quad (5.1)$$

5.2.1 (Single) Maximum Estimator

The most straightforward way to approximate $\max_a \mathbb{E}[Q_a]$ is to take the maximum over the sample means, i.e., $\max_a \mathbb{E}[Q_a] \approx \max_a \hat{\mu}_a$. The sample means $\hat{\mu}_a$ are unbiased estimates of the true means, so $\max_a \hat{\mu}_a$ is an unbiased estimate of $\mathbb{E}[\max_a \hat{\mu}_a] = \int_{-\infty}^{\infty} x f^{\mu}_{\max}(x)\,dx$; however, it is a biased estimate of $\max_a \mathbb{E}[Q_a]$. Considering its CDF $F^{\mu}_{\max}(x) = P\{\max_a \hat{\mu}_a \le x\} = \prod_a P\{\hat{\mu}_a \le x\} = \prod_a F^{\mu}_a(x)$, we can write

$$\mathbb{E}[\max_a \hat{\mu}_a] = \int_{-\infty}^{\infty} x \frac{d}{dx}\prod_a F^{\mu}_a(x)\,dx = \sum_a \int_{-\infty}^{\infty} x f^{\mu}_a(x)\prod_{a' \neq a} F^{\mu}_{a'}(x)\,dx. \quad (5.2)$$

Comparing equations (5.1) and (5.2), $\max_a \mathbb{E}[Q_a]$ and $\mathbb{E}[\max_a \hat{\mu}_a]$ are clearly not equivalent. Moreover, the product term $\prod_{a' \neq a} F^{\mu}_{a'}(x)$ in the integral introduces positive bias (since CDFs are monotonically increasing, their derivatives are positive, and the value of the integral tends to grow as more product terms are added). Therefore, the expected value of the single maximum estimator is an overestimation of the maximum expected value.
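This positive bias is easy to see numerically. The following is a small Monte Carlo sketch (our own illustration, not part of the original analysis): all variables share the same true mean of zero, so $\max_a \mathbb{E}[Q_a] = 0$, yet the expected maximum of the sample means is clearly positive.

```python
# Monte Carlo illustration of the single (maximum) estimator's positive bias.
# All |A| variables have true mean 0, so max_a E[Q_a] = 0, yet E[max_a mu_hat_a] > 0.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_samples, n_trials = 10, 20, 10_000

max_of_means = np.empty(n_trials)
for t in range(n_trials):
    samples = rng.normal(loc=0.0, scale=1.0, size=(n_actions, n_samples))
    max_of_means[t] = samples.mean(axis=1).max()   # single (maximum) estimator

print("true max_a E[Q_a]       :", 0.0)
print("E[max_a mu_hat_a] (est.):", max_of_means.mean())   # clearly positive
```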

5.2.2 Double Estimator

Consider the case where we use two sets of estimators, $\hat{\mu}^A = \{\hat{\mu}^A_{a_1}, \cdots, \hat{\mu}^A_{a_{|A|}}\}$ and $\hat{\mu}^B = \{\hat{\mu}^B_{a_1}, \cdots, \hat{\mu}^B_{a_{|A|}}\}$, in which each $\hat{\mu}^A_a$ is estimated from a set of samples independent of the one used to estimate $\hat{\mu}^B_a$, i.e., $\hat{\mu}^A_a = \frac{1}{|\Omega^A_a|}\sum_{x \in \Omega^A_a} x$, $\hat{\mu}^B_a = \frac{1}{|\Omega^B_a|}\sum_{x \in \Omega^B_a} x$, and $\Omega^A_a \cap \Omega^B_a = \emptyset$. For all $a$, both $\hat{\mu}^A_a$ and $\hat{\mu}^B_a$ are unbiased estimators of $\mathbb{E}[Q_a]$, assuming all the samples in both sets are independently drawn from the population. That means $\mathbb{E}[\hat{\mu}^A_a] = \mathbb{E}[Q_a]$ for all $a$, including $a^*_B = \arg\max_a \hat{\mu}^B_a$, the action that maximizes the sample mean $\hat{\mu}^B_a$. Therefore, $\hat{\mu}^A_{a^*_B}$ can be used to estimate $\max_a \mathbb{E}[\hat{\mu}^A_a]$ as well as $\max_a \mathbb{E}[Q_a]$, i.e.,

$$\max_a \mathbb{E}[Q_a] = \max_a \mathbb{E}[\hat{\mu}^A_a] \approx \hat{\mu}^A_{a^*_B}.$$

The same argument holds in the opposite direction, considering the best action over $\Omega^A$ and the sample mean $\hat{\mu}^B_{a^*_A}$. The selection of $a^*$ means that all other actions give lower estimates, i.e., $P(a = a^*) = P\big(\hat{\mu}^A_{a'} < \hat{\mu}^A_a,\ \forall a' \neq a\big)$. Let $f^A_a$ and $F^A_a$ be the PDF and CDF of $\hat{\mu}^A_a$, respectively. Then

$$P(a = a^*) = \int_{-\infty}^{\infty} P(\hat{\mu}^A_a = x)\prod_{a' \neq a} P(\hat{\mu}^A_{a'} < x)\,dx = \int_{-\infty}^{\infty} f^A_a(x)\prod_{a' \neq a} F^A_{a'}(x)\,dx.$$

The expected value of the double estimator is a weighted sum of the expected values of the sample means in one sample space, weighted by the probability of each sample mean being the maximum in the other sample space, i.e.,

$$\sum_a P(a = a^*)\,\mathbb{E}[\hat{\mu}^B_a] = \sum_a \mathbb{E}[\hat{\mu}^B_a]\int_{-\infty}^{\infty} f^A_a(x)\prod_{a' \neq a} F^A_{a'}(x)\,dx.$$

The double estimator gives us negative bias: since the weights $P(a = a^*)$ are probabilities that are positive and sum to one, and some weight may be given to variables whose expected value is less than the maximum, the maximum expected value serves as an upper bound for the weighted sum.

5.2.3 Cross Estimator

We can easily extend the double estimator to a more general case: instead of using two sets of estimators, suppose we now have $K$ independent sets of estimators $\hat{\mu}^1, \cdots, \hat{\mu}^K$. We call the resulting estimator the cross estimator; the double estimator can be seen as a special case of it. A similar argument as for the double estimator applies here: for any two estimator sets $\hat{\mu}^i$ and $\hat{\mu}^j$,

$$\max_a \mathbb{E}[Q_a] = \max_a \mathbb{E}[\hat{\mu}^i_a] \approx \hat{\mu}^i_{a^*_j}.$$

The cross estimator thus again amounts to a convex combination of the sample means,

$$\sum_a P(a = a^*)\,\mathbb{E}[\hat{\mu}^i_a] = \sum_a \mathbb{E}[\hat{\mu}^i_a]\int_{-\infty}^{\infty} f^j_a(x)\prod_{a' \neq a} F^j_{a'}(x)\,dx,$$

and therefore also underestimates the maximum expected value.
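The cross estimator is equally simple to simulate. The sketch below (again a toy illustration of ours, with hypothetical names) draws $K$ independent sample sets, selects the maximizing action on one randomly chosen set, and evaluates it on another; with unequal true means, the average estimate falls below the true maximum, as argued above.

```python
# A sketch of the cross estimator with K independent sample sets: a random set j
# selects the maximizing action and another random set i evaluates it. With unequal
# true means the estimate is (weakly) negatively biased. Illustrative only.
import numpy as np

def cross_estimate(sample_sets: np.ndarray, rng: np.random.Generator) -> float:
    """sample_sets has shape [K, n_actions, n_samples]."""
    K = sample_sets.shape[0]
    j, i = rng.choice(K, size=2, replace=False)          # two distinct estimator sets
    a_star = int(sample_sets[j].mean(axis=1).argmax())   # action selected on set j
    return float(sample_sets[i, a_star].mean())          # value evaluated on set i

rng = np.random.default_rng(1)
true_means = np.linspace(-1.0, 1.0, 10)                  # max_a E[Q_a] = 1.0
K, n_samples, n_trials = 5, 20, 10_000
estimates = []
for _ in range(n_trials):
    sets = rng.normal(loc=true_means[None, :, None], scale=1.0,
                      size=(K, true_means.size, n_samples))
    estimates.append(cross_estimate(sets, rng))
print("max_a E[Q_a]        :", true_means.max())
print("mean cross estimate :", np.mean(estimates))       # slightly below 1.0 on average
```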

Theorem 5.1 (Van Hasselt [164]). There does not exist an unbiased estimator for maximum expected values.

5.3 Convergence in the Limit

In this section, we first present a lemma from [165] that establishes the convergence of SARSA, and then use it to prove the convergence of cross Q-learning. Note that this part heavily borrows from the proof of the convergence of double Q-learning [166], but treats a more general case.

Lemma 5.2 (Singh et al. [165]). Consider a stochastic process $(\alpha_t, \Delta_t, F_t)$, $t \ge 0$, where $\alpha_t$, $\Delta_t$ and $F_t : X \to \mathbb{R}$ satisfy the equation

$$\Delta_{t+1}(x) = (1 - \alpha_t(x))\Delta_t(x) + \alpha_t(x)F_t(x), \quad \text{where } x \in X,\ t = 0, 1, 2, \cdots$$

Let $P_t$ be a sequence of increasing $\sigma$-fields such that $\alpha_0$ and $\Delta_0$ are $P_0$-measurable and $\alpha_t$, $\Delta_t$ and $F_{t-1}$ are $P_t$-measurable, for $t = 1, 2, \cdots$.

Then $\Delta_t$ converges to zero with probability one (w.p. 1) if the following conditions hold:

1. the set $X$ is finite;

2. $0 \le \alpha_t(x) \le 1$, $\sum_t \alpha_t(x) = \infty$, and $\sum_t \alpha_t^2(x) < \infty$ w.p. 1;

3. $\|\mathbb{E}[F_t \mid P_t]\| \le \kappa \|\Delta_t\| + c_t$, where $\kappa \in [0, 1)$ and $c_t$ converges to zero w.p. 1;

4. $\mathrm{Var}(F_t \mid P_t) \le C(1 + \|\Delta_t\|)^2$, where $C$ is some constant;

in which $\|\cdot\|$ denotes the maximum norm.

Theorem 5.3. In a given ergodic MDP, the set of $K$ Q-value functions $Q^1, Q^2, \cdots, Q^K$, as updated by cross Q-learning, converges to the optimal value function $Q^*$ with probability 1 if the following conditions hold:

1. The MDP is finite, i.e., $|S \times A| < \infty$.

2. $\gamma \in [0, 1)$.

3. The Q-values are stored in a lookup table.

4. Each state-action pair is visited infinitely often.

5. Each $Q^k$ receives an infinite number of updates, for all $k = 1, \cdots, K$.

6. $0 \le \alpha_t(s, a) \le 1$, $\sum_t \alpha_t(s, a) = \infty$, and $\sum_t \alpha_t^2(s, a) < \infty$ w.p. 1. Moreover, $\alpha_t(s, a) = 0$, $\forall (s, a) \neq (s_t, a_t)$.

7. $\mathrm{Var}(R(s, a)) < \infty$, $\forall s, a$.

Proof. Let $k, j \in \{1, \cdots, K\}$ be randomly picked with $k \neq j$. Apply Lemma 5.2 by letting $P_t = \{Q^1_0, Q^2_0, \cdots, Q^K_0, s_0, a_0, \alpha_0, r_1, s_1, \cdots, s_t, a_t\}$, $X = S \times A$, $\Delta_t = Q^k_t - Q^*$, and

$$F_t(s_t, a_t) = r_t + \gamma Q^j_t(s_{t+1}, a^*) - Q^*(s_t, a_t), \quad \text{where } a^* = \arg\max_a Q^k_t(s_{t+1}, a).$$

The first two conditions of Lemma 5.2 hold immediately from conditions 1 and 6 of Theorem 5.3, respectively. Since condition 7 of Theorem 5.3 bounds the variance of the rewards, the fourth condition of Lemma 5.2 holds as well. To show the third condition of Lemma 5.2, we write

$$\begin{aligned}
F_t(s_t, a_t) &= r_t + \gamma Q^j_t(s_{t+1}, a^*) - Q^*(s_t, a_t) \\
&= r_t + \gamma Q^k_t(s_{t+1}, a^*) - Q^*(s_t, a_t) + \gamma\big(Q^j_t(s_{t+1}, a^*) - Q^k_t(s_{t+1}, a^*)\big) \\
&= F^Q_t(s_t, a_t) + \gamma c_t,
\end{aligned}$$

in which $F^Q_t(s_t, a_t) = r_t + \gamma Q^k_t(s_{t+1}, a^*) - Q^*(s_t, a_t)$ is the value $F_t$ would take under standard (single) Q-learning. Since the convergence of standard Q-learning in a finite MDP is well known, i.e., $\|\mathbb{E}[F^Q_t \mid P_t]\| \le \gamma\|\Delta_t\|$, it suffices to show that $c_t = Q^j_t(s_{t+1}, a^*) - Q^k_t(s_{t+1}, a^*) \to 0$, so that the condition on the expected contraction of $F_t$ holds.

Let $\Delta^{jk}_t(s_t, a_t) = Q^j_t(s_t, a_t) - Q^k_t(s_t, a_t)$. It is important to note that at each step the choice of the pair $(j, k)$ is random, each pair having equal probability $p_{jk} = p_{kj} = 1/\binom{K}{2}$. Consider the case that $Q^k$ is updated using $Q^j_t$ at time $t$; the update of $\Delta^{jk}$ is then

$$\begin{aligned}
\Delta^{jk}_{t+1}(s_t, a_t) &= \Delta^{jk}_t(s_t, a_t) - \alpha_t(s_t, a_t)\big(r_t + \gamma Q^j_t(s_{t+1}, a^*) - Q^k_t(s_t, a_t)\big) \\
&= \Delta^{jk}_t(s_t, a_t) - \alpha_t(s_t, a_t) F^k_t(s_t, a_t).
\end{aligned}$$

Or, again with probability $p_{kj} = 1/\binom{K}{2}$, we use $Q^k$ to update $Q^j$, in which case we have

$$\begin{aligned}
\Delta^{jk}_{t+1}(s_t, a_t) &= \Delta^{jk}_t(s_t, a_t) + \alpha_t(s_t, a_t)\big(r_t + \gamma Q^k_t(s_{t+1}, a^*) - Q^j_t(s_t, a_t)\big) \\
&= \Delta^{jk}_t(s_t, a_t) + \alpha_t(s_t, a_t) F^j_t(s_t, a_t).
\end{aligned}$$

Otherwise, this particular $(j, k)$ pair is not selected at time $t$, and $\Delta^{jk}$ is left unchanged. Taking expectations over the random selection of the pair, we have

$$\mathbb{E}\big[\Delta^{jk}_{t+1} \mid P_t\big] = (1 - 2p_{jk})\,\mathbb{E}\big[\Delta^{jk}_t \mid P_t\big].$$

Clearly $\mathbb{E}[\Delta^{jk}_{t+1} \mid P_t]$ converges to zero, since the coefficient on the right-hand side is less than one. Therefore $c_t \to 0$, since $\Delta^{jk}_t \to 0$ in expectation and $j, k$ are randomly chosen. This in turn ensures that condition 3 of Lemma 5.2 holds, which completes the proof.

Finally, we rephrase Theorem 5.3 as follows:

Proposition 5.4. Cross estimation converges in the limit, given a finite, ergodic MDP.

5.4 Cross DQN

In this section, we elaborate on our proposed cross Q-learning method and its variants. Cross DQN serves as an extension of the double DQN algorithm [131], which has been used as the default setting for most state-of-the-art DQN training.

Double DQN was proposed with the aim of reducing overestimation bias; in it, the target network is simply a delayed copy of the current network. Note that the original vanilla DQN also uses two networks, but the purpose of periodically freezing and updating the target network there is to stabilize learning. Specifically, in vanilla DQN, the target network is used to evaluate both the action and the value, i.e.,

$$y \leftarrow r + \gamma Q_{\theta'}(s', a^*), \quad \text{where } a^* = \arg\max_{a'} Q_{\theta'}(s', a'). \quad (5.3)$$

On the other hand, in double DQN, the current network is used to evaluate the actions and select $a'$, while the target network is used to evaluate the value, so that action selection is decoupled from the estimation of the target:

$$y \leftarrow r + \gamma Q_{\theta'}(s', a^*), \quad \text{where } a^* = \arg\max_{a'} Q_{\theta}(s', a'). \quad (5.4)$$

In practice, however, it is commonly the case that little improvement is gained by using double DQN, since the current and target networks are usually too similar: the parameters of a neural network change slowly under SGD optimization. Nor can we set the target-update period too long; otherwise the derived policy would not exhibit learning progress. As a result, double DQN does not entirely eliminate the overestimation bias. In Section 5.5, we will further show experimentally that the elimination of overestimation in double DQN is neither effective nor sufficient. Instead of maintaining only two separate networks, in our cross Q-learning we use a set of K models for estimating Q-values and selecting actions. When updating each network's parameters, we calculate its TD target Q-value using one of the other K − 1 models. More specifically, let the network whose parameters we are about to adjust be the current network ($\theta_i$), and randomly pick another network to be the target network ($\theta_j$, with $j$ drawn uniformly from $\{1, \cdots, K\} \setminus \{i\}$). To compute the target Q-value, we use the current network to evaluate the actions and select $a'$ in the next state $s'$, while the value is evaluated using the target network, i.e.,

$$y \leftarrow r + \gamma Q_{\theta_j}(s', a^*), \quad \text{where } a^* = \arg\max_{a'} Q_{\theta_i}(s', a'). \quad (5.5)$$

Algorithm 3 Cross-Learning DQN
1: Initialize $K \in \mathbb{N}_+$ different Q-functions $Q(s, a|\theta^k)$ with random parameters $\theta^k$ for $k = 1, \cdots, K$.
2: Initialize replay buffer $B$.
3: for each episode until end of learning do
4:     Initialize state $s$
5:     for step $t = 1, \cdots$ until $s$ is the terminal state of an episode do
6:         Select action $a_t$ according to $Q$ with exploration, e.g., $a_t = \mathrm{MajorityVote}\{\arg\max_a Q^k(s, a)\}_{k=1}^{K}$
7:         Take action $a_t$, observe reward $r$ and next state $s'$
8:         Store experience tuple $\langle s, a_t, r, s' \rangle$ into $B$
9:         Sample a mini-batch of experiences from $B$
10:        for all sampled experiences in the mini-batch do
11:            To train network $Q^i$, compute $a' = \arg\max_a Q^i(s', a|\theta^i)$
12:            Randomly pick another network $Q^j$ to estimate the TD target $y = r + \gamma Q^j(s', a'|\theta^j)$
13:            Backpropagate the TD error $\delta^i = y - Q^i(s, a|\theta^i)$ through $Q^i$, updating $\theta^i$ with learning rate $\alpha_t$
14:        end for
15:        $s \leftarrow s'$
16:    end for
17: end for
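For concreteness, the inner TD update of Algorithm 3 (steps 11-13) might look like the following PyTorch-style sketch; the list q_nets, the optimizers, and the batch layout are assumptions of ours, not the exact implementation used in the experiments.

```python
# A minimal sketch of the cross-DQN TD update (Algorithm 3, steps 11-13),
# assuming `q_nets` and `optimizers` are hypothetical lists of K Q-networks
# and their optimizers; batch tensors follow a usual replay-buffer layout.
import random
import torch
import torch.nn.functional as F

def cross_dqn_update(q_nets, optimizers, batch, gamma=0.99):
    """Update network i using a TD target bootstrapped from a random network j != i."""
    states, actions, rewards, next_states, dones = batch    # tensors of shape [B, ...]
    K = len(q_nets)
    i = random.randrange(K)                                 # network to train
    j = random.choice([k for k in range(K) if k != i])      # bootstrapped target network

    with torch.no_grad():
        # current network i selects the greedy next action (decoupled selection) ...
        next_actions = q_nets[i](next_states).argmax(dim=1, keepdim=True)
        # ... while network j evaluates its value (cross estimation)
        next_q = q_nets[j](next_states).gather(1, next_actions).squeeze(1)
        target = rewards + gamma * (1.0 - dones) * next_q

    q_sa = q_nets[i](states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q_sa, target)

    optimizers[i].zero_grad()
    loss.backward()
    optimizers[i].step()
    return loss.item()
```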

In implementation, we have flexibility and various options in how to utilize the K different Q-networks. There are always tradeoffs among the different choices, which we need to consider in order to pick the one that best meets our goal. For example, we can choose different neural network architectures. A natural choice for having K independent models is to maintain a list of separate neural networks with the same architecture. With K separate models, the difference between their outputs (i.e., K streams of Q-values derived from the same (s, a)-pair as input) comes from the different random parameter initialization of each model, and also from the different data each model is trained upon: for each step of backpropagation, each model randomly samples its own mini-batch of experiences and performs SGD optimization on it. However, maintaining K model copies implies that not only is the storage for the models K times as large as for a single network, but the forward propagation also takes K times the amount of computation. Instead, we can utilize a shared network design for the K models, in which the models share their weights except for the last layer, which consists of K value-function heads from which the value functions $Q^k(s, a|\theta^k)$ are derived; the weights of the last layer are generally different across heads. Thus we have far fewer parameters to train in total, and the computational burden is greatly alleviated. Moreover, as recent deep learning research reveals, the first few layers of a neural network are mainly about representation learning, so the shared layers provide the same features for computing Q; this can be seen as online transfer of learned knowledge among the models. Note that in the shared setting, in order to avoid premature learning and suboptimal convergence, the gradients of the network (except for the last layer) are usually normalized by 1/K, but this also results in slower learning early on. On the other hand, the separate models are simpler, provide more variability in Q-values, and are more stable during training. In addition, when we train the networks in a distributed system, the separate networks do not depend on each other's weights and can thus be learned independently, which requires much less information exchange; this can be a huge advantage for distributed learning. The comparison of the separate and shared network architectural designs is shown in Figure 5.1.

Figure 5.1: Separate and shared network architectures. (a) Separate network design. (b) Shared network design (shared layers with K value-function heads).
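A minimal sketch of the shared design in Figure 5.1(b) is given below, assuming a small fully-connected torso; the class name, layer widths, and output layout are illustrative assumptions. A separate-network design would simply instantiate K such modules, each with its own torso, instead of sharing one.

```python
# A sketch of the "shared network" design in Figure 5.1(b): a common torso with
# K separate Q-value heads. Layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class SharedCrossQNet(nn.Module):
    """Shared layers with K separate Q-value heads Q^k(s, a | theta^k)."""
    def __init__(self, state_dim, n_actions, n_heads, hidden=64):
        super().__init__()
        self.torso = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # one linear head per Q-function; only these weights differ across heads
        self.heads = nn.ModuleList(nn.Linear(hidden, n_actions) for _ in range(n_heads))

    def forward(self, state, k=None):
        features = self.torso(state)
        if k is not None:                      # evaluate a single (bootstrapped) head
            return self.heads[k](features)
        return torch.stack([h(features) for h in self.heads], dim=1)  # [B, K, A]
```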

With K different models (or heads), each of which could derive a possibly different policy, there is no doubt that during the test phase we should take advantage of ensembles, for instance by choosing the action with the majority vote across the outputs. However, we can make choices on how to combine action selections into a single policy during training. With ensemble action selection such as majority voting, the derived policy is often superior to any individual one, which greatly reduces the variance during training, as we will show experimentally in Section 5.5. This in turn refines exploitation, results in a large variance reduction of the Q-values, and speeds up learning. Note that to deal with the exploration-exploitation dilemma, an ε-greedy strategy is still needed to encourage exploration. On the other hand, we may also randomly pick a single network from the K models and act as it suggests during training. This falls into the paradigm of bootstrapped DQN [133], which encourages exploration at the cost of slower early learning (see Section 5.5), but may learn a better policy later thanks to more exploration. Another advantage of bootstrapped action selection is that it slightly reduces the computational burden: instead of forward-passing and computing all K Q-values for action selection, we can compute only one of them. The procedure of the bootstrapped version of cross DQN is presented in Algorithm 4.

Algorithm 4 Bootstrapped Cross DQN
1: Initialize $K \in \mathbb{N}_+$ different Q-functions $Q(s, a|\theta^k)$ with random parameters $\theta^k$ for $k = 1, \cdots, K$.
2: Initialize replay buffer $B$.
3: for each episode until end of learning do
4:     Initialize state $s$
5:     Randomly pick a network $Q^k$ to act, where $k \in \{1, \cdots, K\}$
6:     for step $t = 1, \cdots$ until $s$ is the terminal state of an episode do
7:         Select action $a_t = \arg\max_{a'} Q^k(s, a')$ with exploration
8:         Take action $a_t$, observe reward $r$ and next state $s'$
9:         Store experience tuple $\langle s, a_t, r, s' \rangle$ into $B$
10:        Sample a mini-batch of experiences from $B$
11:        for all sampled experiences in the mini-batch do
12:            To train network $Q^i$, compute $a' = \arg\max_{a'} Q^i(s', a'|\theta^i)$
13:            Randomly pick another network $Q^j$ to estimate the TD target $y = r + \gamma Q^j(s', a'|\theta^j)$
14:            Backpropagate the TD error $\delta^i = y - Q^i(s, a_t|\theta^i)$ through $Q^i$, updating $\theta^i$ with learning rate $\alpha_t$
15:        end for
16:        $s \leftarrow s'$
17:    end for
18: end for
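The two action-selection modes used during training, majority voting across all heads (Algorithm 3) and following a single bootstrapped head (Algorithm 4), can be summarized in a few lines; the array layout below is an assumption of ours.

```python
# A small sketch of the two training-time action-selection modes discussed above,
# assuming `q_values` has shape [K, n_actions] (one row per network/head for state s).
import numpy as np

def majority_vote_action(q_values: np.ndarray) -> int:
    """Ensemble selection: each head votes for its greedy action."""
    votes = q_values.argmax(axis=1)               # greedy action of each head
    return int(np.bincount(votes).argmax())       # most common vote

def bootstrapped_action(q_values: np.ndarray, rng: np.random.Generator) -> int:
    """Bootstrapped selection: follow a single randomly chosen head."""
    k = rng.integers(q_values.shape[0])
    return int(q_values[k].argmax())
```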

Another choice we can make is the training frequency. In our cross DQN setting, when backpropagation occurs, we can either train a single network (e.g., the model that provided the action selection), or let each of the K networks independently sample a mini-batch of experiences and perform SGD optimization. The latter increases sample efficiency and speeds up learning, while the former reduces the computational burden, since the number of backpropagations (the most computationally expensive operations) remains the same as in a single DQN. In addition, with the former setting, our cross Q-learning does not require maintaining frozen copies of the networks as targets. Experimentally, we found that freezing targets barely has any effect on the stabilization of learning, and only doubles the memory cost for model storage. This is due to two reasons. First, we bootstrap a model that is different from the current one; when K ≥ 2, the variety of models ensures differences in parameter initialization as well as differences in the mini-batch data their learning is based upon, which in turn ensures the independence of the Q-value estimates. Secondly, with less frequent updates of each network, the bootstrapped target Q-values also change less, which helps stabilize learning.

5.5 Experimental Results

In this work, we conducted experiments on two classical control problems, CartPole and LunarLander, for extended tests. We selected these testbeds with the aim of covering different challenges, especially in terms of complexity. Both environments are interfaced through the OpenAI Gym environment [151], unless specified otherwise. The neural networks have a number of hyperparameters; the combinatorial space of hyperparameters is too large for an exhaustive search, so we performed only limited tuning. For each component, we started with the same settings as in [167] in order to make comparisons with state-of-the-art results.

5.5.1 CartPole

Experimental Setup

The CartPole task, also known as the inverted pendulum, consists of a pole (or pendulum) attached by an un-actuated joint to a cart (i.e., the pivot point). The pendulum starts upright at the center of a 2D track but is unstable, since the center of gravity is above the pivot point. The goal of this task is to keep the pole balanced and prevent it from falling over by applying appropriate force to the pivot point, which moves the cart along a frictionless track of finite length (4.8 units). An immediate reward of +1 is provided for every timestep that the pole remains upright, and the maximum cumulative reward in an episode is clipped to 200. An episode also ends when the pole tilts more than 15° from vertical or the cart moves out of the track [168]. At each timestep, the agent is provided with the current state $s \in \mathbb{R}^4$, which represents the cart position, cart velocity, pole angle, and pole angular velocity, respectively. A unit force can be applied from either the left or the right, so the actions are discrete with $a \in \{-1, 0, +1\}$. As in [167], we approximate the Q-values using a neural network with two fully-connected hidden layers (of 64 and 32 neurons, respectively). We train each of the neural networks for 1000 episodes (a little less than 200,000 steps), with a FIFO memory of size $5 \times 10^4$ transitions for experience replay. A target network is updated every 500 steps to further stabilize learning. The adaptive moment estimation (Adam) optimizer with learning rate α = 0.001 is used to train the network, since it is in general less sensitive to the choice of the learning rate than other stochastic gradient descent algorithms [119]. The optimization is performed on mini-batches of size 32, sampled uniformly from the experience replay. The discount factor γ is set to 0.99, and an ε-greedy policy is used for choosing actions throughout the interaction with the environment, starting with exploration ε = 1 and annealed to 0.02 over the first 10,000 steps.
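For reference, the CartPole hyperparameters above can be gathered into a single configuration object; the dictionary below is merely an illustrative way of organizing them (the key names are ours), with values taken from the text.

```python
# CartPole experiment hyperparameters, collected from the text above into a plain
# dictionary; the structure and key names are our own illustrative choices.
CARTPOLE_CONFIG = {
    "hidden_layers": (64, 32),      # two fully-connected hidden layers
    "episodes": 1000,
    "replay_capacity": 50_000,      # FIFO experience replay
    "target_update_steps": 500,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 32,
    "gamma": 0.99,
    "epsilon_start": 1.0,
    "epsilon_final": 0.02,
    "epsilon_anneal_steps": 10_000,
}
```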

After every 20 training episodes, we conduct a performance test that plays 10 full episodes using the greedy policy deterministically derived from the current network. For the models with K > 1, majority voting is used for action selection, regardless of whether a bootstrapped Q-value head is used during training. The cumulative rewards of each test episode are used for comparison among the different models. Moreover, in order to compare the Q-value estimates among models, every 20 training episodes we randomly sample a batch of 1024 historical (s, a)-pairs from the replay buffer and compute their Q-values using the current network. More than one thousand samples ensure that their mean is reasonably representative of the Q-values under the current model.

Analysis of Cross Q-learning Effects

We compared our cross Q-learning algorithms with vanilla DQN and double DQN. Note that vanilla DQN uses the single estimator, double DQN uses the double estimator, and our cross DQN uses the cross estimator; K = 5 and K = 10 are used in the cross DQNs. Figure 5.2(a) illustrates the training history of episodic total rewards for the four models. With a single network (vanilla and double DQN), the agent starts to learn earlier with fewer samples, and double Q-learning in particular helps the single network learn even faster; however, the learned models are not stable. With cross Q-learning, the networks learn more slowly at the beginning (cross DQN with K = 10 starts to learn even later than cross DQN with K = 5), but once the cross DQNs start to learn, the performance improvement is substantial: not only are the total rewards higher, the learning is also much more stable. After 300 episodes, the training total rewards converge to 200 for the K = 10 cross DQN, with little variation (due to ε exploration). The K = 5 cross DQN has more variation, but it also seems to converge after 900 episodes, while vanilla DQN and double DQN deteriorate easily and have much larger variations. The performance improvement can be seen more clearly in Figure 5.2(b). After 300 episodes of training, the policies derived from the cross DQN with K = 10 become more and more stable, and the variance of the test total rewards becomes zero close to the end of training. The cross DQN with K = 5 deteriorates after 500 episodes of training, but later it also learns to derive a stable policy that obtains total rewards of 200 with tiny variance. In contrast, the policies derived from vanilla DQN and double DQN only reach scores approximately half those of the cross DQNs, and with large variances. The policy derived from double DQN seems to be a little better than that from vanilla DQN, but the improvement is not as significant as that obtained by cross Q-learning. Furthermore, part of the reason for the slower start of cross DQN is our learning setting, in which we only perform SGD optimization on one of the networks (or heads) at a time. In other words, we reduce the learning frequency of each network (or head) to 1/K to alleviate the computational effort, at the cost of a slower start. If we increased the learning frequency (i.e., backpropagated through each of the K networks/heads every time), learning would be faster. We also plot the average Q-values over 1024 bootstrapped (s, a)-pairs in Figure 5.2(c). We observe that at the beginning of learning, vanilla DQN has the highest Q-value estimates, which is evidence of overestimation. The estimates from double DQN are lower, but only by a limited amount; therefore double Q-learning may not solve the overestimation problem completely. The cross DQNs have much smaller estimates at the beginning; in particular, as K gets larger, the Q-value estimates become even lower. Overestimation is clearly an obstacle to effective learning; as training proceeds, the estimated Q-values from the cross DQNs eventually become substantially higher than those from the vanilla or double DQN, since the cross DQNs derive better policies and obtain higher rewards. The Q-value estimates from the cross DQNs start to converge after the derived policies stabilize. At the end of training, the estimated Q-values of the four models are at about the same level; however, the estimates from vanilla and double DQN keep increasing, while their derived policies are not stable and obtain lower rewards. Our cross Q-learning algorithm addresses the overestimation problem better.

Figure 5.2: Comparison of vanilla DQN, double DQN, and cross DQNs with K = 5 and K = 10 on CartPole. (a) Learning curves. (b) Model test performance; every 20 training episodes, 10 full test episodes were conducted. (c) Mean of Q-value estimates on CartPole; every 20 training episodes, 1024 (s, a)-pairs were bootstrapped.

Effects of dueling DQN & Bootstrapped DQN

As the cross-learning architecture shares the same input-output interface with standard DQN, we can recycle many recent advances in DQN research. We mentioned one variant in Section 5.4, namely that it can be combined with bootstrapped DQN for action selection during training, while in Section 5.5.1 our experiments for cross DQN are based on majority voting from the K different Q-functions.

Figure 5.3: Comparison of cross DQNs with K = 5 on CartPole: cross DQN with ensemble voting, with dueling DQN and voting, with bootstrapped DQN, and with both dueling and bootstrapped DQN. (a) Learning curves. (b) Model test performance; every 20 training episodes, 10 full test episodes were conducted.

Furthermore, it is convenient to combine the dueling architecture into each of the K networks. The goal of dueling DQN is to reduce the variance of the Q-value estimates by subtracting a baseline and emphasizing the advantages among different actions, which accelerates learning effectively. That variance reduction acts on a single network's estimates, whereas our cross Q-learning reduces variance from a different perspective: for each network, the target values are calculated by bootstrapping from another model's Q-values, which introduces some bias. Due to the bias-variance tradeoff, however, the variance of our estimates decreases, and the overall error becomes smaller. In addition, the max operator induces overestimation bias, while the cross estimator tends to introduce bias in the other direction, thus greatly alleviating the overestimation problem. Figure 5.3 and Figure 5.4 illustrate the training and testing performance of cross DQN with different architectures, for K = 5 and K = 10, respectively. We can see that the dueling architecture speeds up early learning effectively, without hurting the model's performance later on.

Figure 5.4: Comparison of cross DQNs with K = 10 on CartPole: cross DQN, with dueling DQN, with bootstrapped DQN, and with both dueling and bootstrapped DQN. (a) Learning curves. (b) Model test performance; every 20 training episodes, 10 full test episodes were conducted.

On the other hand, bootstrapped DQN slows learning at the beginning, especially when K is large, since the selected actions initially vary quite a bit among the networks. For example, the K = 10 cross DQN with bootstrapping converges around 400 episodes, while the other cross-learning agents converge before 200 episodes. But once the agent has learned something, bootstrapped action selection does not hurt the model; in fact, it might help learning for more complicated tasks because of the additional exploration early on. At the very least, using bootstrapped DQN lets our cross DQN agent make faster action selections during training and slightly reduces the computational burden, since instead of calculating all K Q-values we compute only one of them. Moreover, by comparing the learning curves of the bootstrapped cross DQNs with different values of K, we can conclude that it is primarily our cross Q-learning, rather than the policy ensemble, that greatly reduces the variance, as the variations with K = 10 are much smaller than those with K = 5; policy ensembling further reduces the variance, and during the testing phase our agent can definitely benefit from the ensemble of multiple models. By naturally combining cross Q-learning with dueling and bootstrapped DQN, our model aggregates the merits of all three perspectives.

5.5.2 Lunar Lander

The task of LunarLander in Box2D [150] is to land the spaceship smoothly between the flags; the details of this task are discussed in Section 4.4.1. To solve this problem with cross DQNs, we build each network with two fully-connected hidden layers of 128 and 64 neurons, respectively. We train each of the neural networks for 10,000 episodes on the LunarLander task, with a much larger replay buffer of size $10^6$. The target network update is set to every 1000 steps for vanilla and double DQN, and a learning rate of α = 0.001 and a batch size of 64 are used with the Adam optimizer to train all the models. The discount factor γ is again 0.99, and the exploration rate ε is annealed to 0.02 over the first 100,000 steps. As before, Q-values for 1024 bootstrapped (s, a)-pairs are evaluated and 10 episodes of performance tests with the current policy are conducted every 20 training episodes. In Figure 5.5, we compare our cross Q-learning algorithms with vanilla DQN and double DQN. Despite slower learning in the first few hundred episodes due to our experimental design of the learning frequencies, the cross DQNs learned much better and more stable policies, while vanilla and double DQN have large variances in both the learning curves and the performance tests. Figure 5.5(c) clearly shows that from the beginning, vanilla DQN optimistically picks up the occasional large rewards caused by the high variance, and produces substantial overestimation. Double DQN slightly alleviates the problem, but cannot avoid the overestimation effectively. The policies derived from these two networks are then neither optimal nor stable. As learning goes on, the estimated Q-values from both vanilla and double DQN explode, so that the derived policies are no better than random actions. On the other hand, the cross DQNs have much lower Q-value estimates at the beginning, and the estimates from the model with K = 10 are even lower than those from the model with K = 5.

Figure 5.5: Comparison of vanilla DQN, double DQN, and cross DQNs with K = 5 and K = 10 on LunarLander. (a) Learning curves. (b) Model test performance; every 20 training episodes, 10 full test episodes were conducted. (c) Mean of Q-value estimates on LunarLander; every 20 training episodes, 1024 (s, a)-pairs were bootstrapped.

After 1000 episodes, the estimates continue growing until convergence, and their values converge to the same level of about 105. The derived policies are very stable, with total rewards close to 300 and little variance. Note that double DQN has lower Q-value estimates than the cross DQNs after 8000 episodes of training. The reason is that the corresponding policies from double DQN are much worse; it does not indicate that double DQN addresses overestimation better.

Figure 5.6: Comparison of cross DQNs with K = 5 on LunarLander: cross DQN, with dueling DQN, with bootstrapped DQN, and with both dueling and bootstrapped DQN. (a) Learning curves. (b) Model test performance; every 20 training episodes, 10 full test episodes were conducted.

Comparing Figure 5.6 and Figure 5.7, K = 5 seems to work even better than K = 10 most of the time. In particular, for the K = 10 bootstrapped cross DQN, both the learning curve and the test scores are lower than those of the other cross DQN models. This indicates that larger K is not always better: the cross estimator induces underestimation bias, and too much underestimation may also hide the genuinely better actions and thus hurt model performance. In fact, the K = 10 cross DQN might suffer too much underestimation at the beginning, which slows down the learning process significantly. Overall, however, the K = 10 bootstrapped cross learning with the dueling architecture performs best among all models, including all K = 5 cross DQNs. The DQN architectures are complicated, and the aggregated effect of these components may significantly change the performance of a particular model. Generally speaking, our cross DQNs favor underestimation, which should be much better than overestimation if no unbiased estimation can be achieved, since underestimation does not tend to propagate much during training, as lower-valued actions are avoided by the greedy action selection mechanism.

Figure 5.7: Comparison of cross DQNs with K = 10 on LunarLander: cross DQN, with dueling DQN, with bootstrapped DQN, and with both dueling and bootstrapped DQN. (a) Learning curves. (b) Model test performance; every 20 training episodes, 10 full test episodes were conducted.

The bias-variance tradeoff tells us that the overall error can be reduced when the variance of our estimates is greatly decreased by introducing a slight negative bias; this in turn leads to better model performance. Note that the policies derived from cross DQNs are much more stable in general and are hard to deteriorate. There are at least two reasons for this phenomenon. First, cross Q-learning effectively addresses the overestimation problem, so premature policies are more difficult to derive from cross DQN. In addition, we always ensemble policies using methods such as majority voting at test time, which in general is superior and has a stabilizing effect on action selection. The improved stability comes from the larger barrier to altering the decision boundaries, and we can care much less about early termination as an additional hyperparameter during training. This is yet another advantage of using multiple networks as in cross DQN.

5.6 Conclusions and Future Work

In this chapter, we have presented the cross Q-learning algorithm, an extension to DQN that effectively reduces overestimation, stabilizes training, and improves performance. Cross DQN is a simple extension that can easily be integrated with other algorithmic improvements such as the dueling network and bootstrapped DQN, leading to dramatic performance enhancement. We have shown in theory and demonstrated in several experiments on classical control problems that the proposed scheme is superior in reducing overestimation and leads to better policy derivation, compared to widely used approaches such as double DQN. Cross learning favors underestimation, and the introduced negative bias can greatly help variance reduction; we analyze this effect from the well-known bias-variance tradeoff point of view. However, this also indicates that larger K does not necessarily mean better model performance in cross DQN. Nevertheless, DQN models tolerate underestimation much better than overestimation, as lower-valued actions can be avoided by the greedy action selection mechanism. It should be noted that the computational complexity of cross DQN is generally higher than that of single-network DQNs. We can, however, greatly reduce the complexity given the flexibility provided by our model. In addition, ensembling the policies from multiple networks helps stabilize the decision space, which can optionally be utilized to stabilize learning and should definitely be used during testing. As future work, we would apply cross learning to state-of-the-art actor-critic methods in continuous control, to further reduce overestimation and stabilize those algorithms. Also, analysis from statistical learning theory could help us derive more advanced cross-learning strategies; for instance, better bootstrap estimates may be obtained by mimicking K-fold cross validation [164] or from a Bayesian perspective [169]. Moreover, it is worth noting that in each step of Q-learning (and of value-based RL more generally), we utilize Q-values in several different places. Now that a set of K different Q-functions is available, we can make different choices about which one to use in each place. We call these choices generalized cross learning in DQNs, and some existing work falls into particular subclasses of our generalized method.

The first place where Q-values are utilized is when the agent chooses an action $a_t$ at time step $t$ upon observing $s_t$. We can pick a random Q-function for this action selection, which is exactly what bootstrapped DQN [133] does; thus bootstrapped DQN is a special case of our generalized cross DQN. The next place is the TD update, where the target Q-values need to be evaluated in order to choose the next action $a'$; this action might not be executed, but it is used to evaluate the current target Q-value through the max operator (recall that standard Q-learning uses the maximum estimator). Finally, after picking the next action $a'$, its value must be evaluated, and again we have a choice of which Q-function to use. In the version of cross DQN presented in this work, which is directly derived from double DQN, we decoupled the selection and evaluation of the next action $a'$: the current network is used to select $a'$, while another, randomly chosen network is used to evaluate it. One could also try the opposite in certain circumstances, i.e., select $a'$ with a bootstrapped network and evaluate it with the current network, which should decrease bias but increase variance according to the bias-variance tradeoff in general statistical learning. One can further analyze and experiment with other generalized cross Q-learning variants.

Chapter 6

An Application of Deep Q-Network for Financial Trading

In this chapter, we formulate and train a simple reinforcement learning (RL) trading agent with a deep Q-network (DQN). By observing states from a real trading environment, the agent directly makes sequential decisions through the allocation of a portfolio, aiming to maximize the cumulative profit according to derived trading strategies such as going LONG, SHORT, or CLOSE on certain assets. The action space of the DQN agent is discrete and finite, which greatly simplifies the real-world portfolio optimization problem. The computational cost would grow exponentially as the number of allowed actions increases, due to the curse of dimensionality in MDPs; therefore, recent advances focus on policy gradient algorithms that form the actions in a parametric continuous space, in which only a few parameters need to be learned. In particular, consider the case in which there is only a single asset other than cash in the portfolio and the amount of buying or selling is fixed; the optimal strategy would then be as simple as buying when the asset's price increases and selling when it decreases. As training is based on all the historical information in the market, the best a trading agent can do is to obtain the most accurate predictions of future prices, whereas the RL trading agent needs to learn from errors through temporal-difference updates, provided that credit is reasonably assigned through the reward functions during the design of the RL system. In other words, the same cross-entropy or mean squared error loss (for the classification or regression problem, respectively) used in supervised learning is simply decomposed in the temporal sense and learned by the RL agent in the Bellman form. Therefore, the learning of the RL agent is less effective, and supervised learning systems for more accurate price predictions are more widely applied; we developed such a framework in Chapter 2. Certain optimal trading strategies can then be derived by forming an optimization problem based on the predicted future prices.

However, the RL trading agent has its own value, as the system is end-to-end rather than the two-stage framework (prediction, then optimization) introduced above.

6.1 Introduction and Related Work

Deep reinforcement learning has achieved remarkable success in a wide range of research areas, such as playing the game of Go [170] and World of Warcraft or StarCraft [171]. Recently, it has also been applied to financial analysis and investment by a multitude of researchers. For instance, [172] proposed a recurrent reinforcement learning (RRL) algorithm to optimize security portfolios. [173] used the relative risk-adjusted profit (Sharpe ratio) as the performance function to train a trading system based on Q-learning. [174] compared the performance of DQN and RRL, and reported that DQN achieved better performance in stock trading. [175] applied deep reinforcement learning to portfolio management with cryptocurrencies. [176] combined DQN with a regressor that predicts the number of shares to trade. In this chapter, we employ deep Q-learning to build a deep Q-trading system that can automatically determine what position to hold at each trading time. As we will illustrate later, our approach is not the same as that of the state-of-the-art work above. This chapter is organized as follows. In Section 6.2, we carefully formulate the trading problem as solving a DQN. We perform empirical studies on the Bitcoin trading market using our implemented DQN trading agent in Section 6.3. Finally, we conclude in Section 6.4.

6.2 Problem Formulation for Trading

6.2.1 State Space

In the problem of algorithmic trading, the agent performs trading actions in the environment of a financial market. It is impossible for the agent to get full information about such a complex environment, which involves all activities in human society as well as subjective emotions. Nevertheless, in the context of technical analysis, it is believed that all relevant information is reflected in the prices of financial assets. Under this point of view, a state $s_t$ can be roughly represented by the prices throughout the market's history up to the current moment $t$. Full history information, however, is either not available or too large for computation in practice. Instead, people discretize time into periods and use only a number of recent periods to represent the current state. Recent research on reinforcement learning for trading focuses on using complicated convolutional or recurrent neural networks as the first few layers, in the hope that the relevant features can be extracted automatically by state-of-the-art deep learning frameworks. In our practical experience, however, the representation power of these deep neural networks for algorithmic trading, especially in the context of reinforcement learning, may require extra tuning. On the other hand, various technical indicators serve as the foundation of existing technical analysis in the finance industry. These indicators have been distilled by financial experts over many years and have been shown to be effective as trading signals. In our experience, we also found that these indicators can represent much more of the ambient information about historical trends and changes, and that several indicators combined can serve as more effective features in the RL trading framework than features learned automatically by deep neural networks. Hundreds of technical indicators exist, and people utilize them in different circumstances. For simplicity, we apply only a few of them; a more detailed discussion of the technical indicators applied in this project is provided in Section 6.3.2.

6.2.2 Action Space

The dynamics of trading depends on the actions taken by the agents. Depending on how complex we wish our RL agent to be, the action space could be either discrete or continuous.

For each symbol in the portfolio, at each time step one can either BUY, HOLD, or SELL. Both BUY and SELL result in a change of the position of the symbol, in which case we say an order (or trade) occurs. The position of a symbol can be either LONG (possessing a positive number of shares), CLOSE (0 shares), or SHORT (a negative share possession). We further assume, for simplicity, that the amount of shares each order can take is fixed, so that the action space is discrete and fits the DQN framework. Trading signals often represent the appropriate position to take at the current moment in the market; we therefore derive the actions from changes in the positions recommended by the RL agent, in the hope that our system will be more stable. Remark that this strategy differs from some state-of-the-art work on deep RL trading. Also note that there are many other possible choices of actions. The agent could determine the proportion of each symbol (including cash) in a portfolio, which would involve a continuous action space. Even more complex scenarios might involve various constraints imposed on the orders, or multiple objectives, such as placing limit orders in reality, in which case the agent needs to determine not only the trading amount but also the price level.
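The mapping from recommended positions to orders is straightforward; the snippet below is a small illustration under our simplifying assumptions (fixed position sizes, a single symbol), with names of our own choosing.

```python
# A tiny illustration of deriving orders from changes in the recommended position,
# assuming positions are expressed in coins/shares (e.g., +100 for LONG, 0 for CLOSE,
# -100 for SHORT); names and sizes are illustrative assumptions.
LONG, CLOSE, SHORT = 100, 0, -100

def order_from_positions(prev_position: int, new_position: int) -> int:
    """Return the signed trade size: positive = BUY, negative = SELL, 0 = HOLD."""
    return new_position - prev_position

# Example: moving from LONG to SHORT sells 200 coins in a single order.
assert order_from_positions(LONG, SHORT) == -200
```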

6.2.3 Reward Function

The reward function directly reflects the goal of RL and determines the derived policy of the agent, and its design is tricky. The profit and loss (PnL) is a realistic choice: it is defined as the net profit from a trade when a position is CLOSEd, i.e., SELLing a previous LONG or BUYing back a previous SHORT. However, this type of reward signal is sparse, since trades can be relatively rare over a period of time, and assigning sparse rewards in the temporal sense is a critical challenge in RL (the credit assignment problem). Instead, a more common choice is the net profit or return between two consecutive trading time slots, i.e., $p_t - p_{t-1}$ or $p_t/p_{t-1} - 1$ while the position is LONG, and its negative while the position is SHORT. This gives the agent more frequent feedback signals. In this project, we use this return-type reward.

There are various other choices for RL trading systems in the literature. One may be risk-averse in some circumstances and take risk into account in the objective, depending on how profit and risk enter one's overall utility. The Sharpe ratio and its variations, which are consistent with Markowitz's mean-variance portfolio optimization theory, are commonly used. We do not use the Sharpe ratio as the reward, but we do calculate its value as a performance measure for the derived policies.

Transaction Cost

In reality, placing a BUY or SELL order incurs some transaction cost. Usually a constant commission fee is paid to the broker for each order; moreover, the activity creates buying or selling pressure on the market and has some impact on prices, especially when the order amount is huge. To simplify the model, we consider the transaction cost to be proportional to the traded symbol's price, i.e., the effective price is $(1 + c)p_t$ if you BUY and $(1 - c)p_t$ if you SELL, where $c$ is the transaction cost factor and $p_t$ is the close price at that moment. In this way, the transaction cost is also folded into the reward function, since an order changes the return. We will evaluate the effect of the transaction cost on the RL agent's learned policy in Section 6.3.4.
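As a concrete illustration, a per-step reward combining the signed return and a proportional cost could look like the sketch below; the exact accounting in our implementation may differ, and the position encoding and cost handling here are illustrative assumptions.

```python
# A minimal sketch of the per-step, return-type reward with a proportional transaction
# cost, under simplifying assumptions: the position is encoded as +1 (LONG), 0 (CLOSE)
# or -1 (SHORT), and `c` is the transaction cost factor (both are assumptions).
def step_reward(price_prev: float, price_now: float, position: int,
                traded: bool, c: float = 0.001) -> float:
    """Signed one-period return, minus a proportional cost when an order is placed."""
    ret = position * (price_now / price_prev - 1.0)   # LONG earns rises, SHORT earns falls
    cost = c if traded else 0.0                       # stands in for (1 +/- c) * p_t execution
    return ret - cost
```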

Learning Algorithm

In this chapter, we use the DDQN update to train the trading agent. The learning procedure is illustrated in Algorithm 5.

Algorithm 5 Training a DDQN Trading Agent with Experience Replay
Require: Initialize replay memory B to capacity N
Require: Initialize action-value function Q with random weights θ and target weights θ⁻ ← θ
for episode = 1, ..., M do
    Initialize sequence s₁ = x₁ and preprocessed sequence φ₁ = φ(s₁)
    for t = 1, ..., T do
        With probability ε select a random action a_t; otherwise select a_t = argmax_a Q_θ(φ(s_t), a)
        Execute action a_t in the environment and observe reward r_t and observation x_{t+1}
        Set s_{t+1} = (s_t, a_t, x_{t+1}) and preprocess φ_{t+1} = φ(s_{t+1})
        Store transition (φ_t, a_t, r_t, φ_{t+1}) in B
        Sample a random minibatch of transitions (φ_j, a_j, r_j, φ_{j+1}) from B
        Set y_j = r_j for terminal φ_{j+1}, and y_j = r_j + γ Q_{θ⁻}(φ_{j+1}, argmax_a Q_θ(φ_{j+1}, a)) for non-terminal φ_{j+1}
        Perform a gradient descent step on the loss between y_j and Q_θ(φ_j, a_j) to update θ
        Update θ⁻ ← θ every τ time steps
    end for
end for

6.3 Experiment

6.3.1 Environment Setup

Cryptocurrencies are electronic and decentralized alternatives to money issued by governments. While Bitcoin is the best-known example and the dominant representative, various other cryptocurrencies are traded in electronic markets. One advantage of using cryptocurrencies is their openness: all trading transactions are accessible in the order book, which is transparent to anyone through the Internet. Also, most exchanges are open 24/7 without restrictions on trading amounts. This is different from the traditional stock market, where transaction records for high-frequency trading (HFT) are not free (although some sources such as Yahoo! Finance provide daily data), some records are hidden due to regulatory requirements, and exchanges close during particular periods. We downloaded the historical data from https://www.cryptodatadownload.com/, which has already converted the transaction records into minute, hourly, or daily format by providing columns of open, close, high, and low prices, similar to stock market data. For our project, we trade only a single cryptocurrency, Bitcoin, which means the portfolio consists of only cash and one symbol, BTC. Moreover, we choose data from the most recent year for training and testing; one reason is again to reduce computation time, and another is that prices experienced significant rises and drops during the past year, whereas the earlier price trends are comparatively simple and flat. Specifically, we choose one year of daily price data between Oct 08, 2017 and Oct 08, 2018 for training (the in-sample period), and the out-of-sample period is from Oct 09, 2018 to Dec 09, 2018, on which we test/validate the policy derived from the learned DQN agent. Now assume you are a millionaire starting with $1M in cash that can be invested in the cryptocurrency market, which allows you to buy 100 coins for most of the time, except for the single month, Dec 2017, when Bitcoin was at its peak price of around $20,000. For simplicity, we assume that there are only three allowable positions throughout the period: LONG 100 coins, SHORT 100 coins, or holding no coins (cash only).

As a result, different trading activities are allowed depending on the current position: when holding LONG, you can CLOSE, go SHORT, or hold still, which involves selling 100, 200, or 0 coins, respectively. Similarly, you can buy 100, 200, or 0 coins when holding SHORT. And while holding cash only, you have a choice among buying 100 coins, short selling 100 coins, or keeping the cash. A benchmark portfolio is used for comparison, generated by the buy-and-hold (BH) strategy: the portfolio starts with the same $1M cash, invests in 100 coins, and holds until the end of the period. This simple benchmark portfolio partially reflects the price change of the coin. In order to evaluate the derived policies, we need to simulate the performance of the corresponding portfolios in the market. The portfolio value has to be calculated at each time step, taking into account the current price of Bitcoin as well as the current position. From the history of portfolio values, various metrics can be used to evaluate performance, such as the cumulative return, the daily return (mean and standard deviation), and the Sharpe ratio. We focus on the cumulative return, since the objective of our agent is to maximize profit according to our designed reward, but we record some other measures as well.
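For reference, these metrics can be computed from the daily history of portfolio values as in the sketch below; the annualization factor sqrt(252) is an assumption on our part, although it is consistent with the Sharpe ratios reported in Table 6.1.

```python
# A short sketch of the evaluation metrics (cumulative return, mean/std of daily
# returns, Sharpe ratio), assuming `values` is the daily history of portfolio values;
# the sqrt(252) annualization is an assumed convention, not stated in the text.
import numpy as np

def performance_metrics(values: np.ndarray) -> dict:
    daily_returns = values[1:] / values[:-1] - 1.0
    return {
        "cumulative_return": values[-1] / values[0] - 1.0,
        "mean_daily_return": daily_returns.mean(),
        "std_daily_return": daily_returns.std(),
        "sharpe_ratio": np.sqrt(252) * daily_returns.mean() / daily_returns.std(),
    }
```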

6.3.2 DQN Agent Setup

Rather than using a sliding window of raw price data as the features, we preprocess the features with technical indicators, since the former gave us poor performance. Specifically, we use the simple moving average (SMA), Bollinger bands %B, the relative strength index (RSI), and Williams %R, together with the daily price data, as the input features. According to financial experts, a combination of technical indicators from different categories provides good trading signals. In our case, the SMA serves as a trend indicator; the Bollinger bands are simply the upper and lower bands that lie 2σ away from the SMA, where σ is the standard deviation, so %B is a volatility indicator; the RSI is a momentum indicator; and Williams %R is an oscillator indicator. Together they represent whether the current price is in an "overbought" or "oversold" condition, and thus give signals for trading. Interested readers are referred to [72] for further reading on technical indicators. The features need to be normalized, since they are on different scales. The normalized features then serve as the states and are continuous, so we use a neural network to approximate the action-value function Q. A small fully-connected neural network with one hidden layer of 10 neurons is used, with the ReLU nonlinearity as the activation function for each hidden neuron. The 5-dimensional vector s describing the state is the input to the network, and the final layer is simply a linear combination, since we are approximating Q, which can in essence be any real value; it has multiple outputs, one for each of the 3 possible actions. We trained the neural network with experience replay. We did not restrict the size of the replay buffer, since the samples we gathered are of moderate size and fit in memory. A batch of 64 samples is randomly sampled from the replay memory each time for fitting the network.
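The four indicators can be computed from the price series with a few lines of pandas; the sketch below uses common default window lengths, which are our assumptions rather than the exact settings used in this project.

```python
# A hedged sketch of the four technical indicators used as state features (SMA,
# Bollinger %B, RSI, Williams %R), computed with pandas from daily price series;
# the 14-period window is a common default and an assumption, not the author's setting.
import pandas as pd

def indicator_features(close: pd.Series, high: pd.Series, low: pd.Series,
                       window: int = 14) -> pd.DataFrame:
    sma = close.rolling(window).mean()                          # trend indicator
    std = close.rolling(window).std()
    percent_b = (close - (sma - 2 * std)) / (4 * std)           # Bollinger %B (volatility)

    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rsi = 100 - 100 / (1 + gain / loss)                         # momentum indicator

    highest, lowest = high.rolling(window).max(), low.rolling(window).min()
    williams_r = -100 * (highest - close) / (highest - lowest)  # oscillator indicator

    return pd.DataFrame({"sma": sma, "percent_b": percent_b,
                         "rsi": rsi, "williams_r": williams_r})
```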

The adaptive moment estimation (Adam) optimizer is used to train the network, since it is in general less sensitive to the choice of the learning rate than other stochastic gradient descent algorithms [119]; the initial learning rate is set to 0.0025 after some tuning. We use the pseudo-Huber loss instead of MSE as the loss function, since it is less sensitive to outliers and is more commonly used in DQN [129]. An ε-greedy policy is used for action selection, and the ε value decays with rate 0.999, which corresponds approximately to a decay from 1 to its minimum value of 0.01 over about one full episode. The discount factor γ is set to 0.95. We do not consider the modeling horizon explicitly, but the discount factor implicitly limits the temporal extent of the effect that experienced rewards can have. For the training phase, our reward function actually assumes that the price of the next trading day is known.

For the training phase, our reward function assumes that the price of the next trading day is known. If the price goes up today, i.e., p_{t+1} > p_t, then the learning agent should assign a larger Q-value to going long and a small or negative Q-value to going short for yesterday; in contrast, if the price decreases today, then the best choice for yesterday is to go short, and going long should be avoided. We assume a single cryptocurrency in the portfolio, namely Bitcoin, and 3 possible positions can then be taken: long 100 coins, short 100 coins, or hold cash only. An order is placed by taking the difference of the holdings on adjacent trading days. The transaction cost is assumed to be zero in Section 6.3.3, and different settings are compared in Section 6.3.4.
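The precise reward definition is given in Section 6.2.3; purely to illustrate the mechanics described here, a one-step reward consistent with this description could look like the following sketch (the function name and the exact form are ours, not our implementation):

def step_reward(position, price_t, price_next, coins_traded, c=0.0):
    """Illustrative one-step reward: profit or loss of the held position over
    the next day minus a transaction cost proportional to the traded value.
    `position` is +100 (LONG), -100 (SHORT), or 0 (cash only); `c` is the
    transaction cost factor (zero in Section 6.3.3, varied in Section 6.3.4)."""
    pnl = position * (price_next - price_t)
    cost = c * abs(coins_traded) * price_t
    return pnl - cost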

6.3.3 Results

We show the training progress in Figure 6.1. Note that the total reward values are different from the cumulative returns we use to evaluate portfolio performance, although they are closely related. We trained the DQN from scratch on the same in-sample data for 10 runs; in each run we trained for 20 episodes, and each episode ran through the whole in-sample data from 2017/10/08 to 2018/10/08, following the time sequence. We record the sum of rewards for each episode.
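At every environment step within these episodes, the network weights are updated from a batch replayed from memory. A minimal sketch of such an update, reusing q_net, optimizer, loss_fn, and GAMMA from the sketch in Section 6.3.2, is shown below (for brevity a single network provides both the online and the bootstrap estimates; a separate target network, as in [129], could be used instead):

import torch

def dqn_update(batch):
    """One gradient step of the DQN from a replayed batch of transitions:
    the TD error between r + gamma * max_a' Q(s', a') and Q(s, a) is
    minimized under the pseudo-Huber loss."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        targets = rewards + GAMMA * q_net(next_states).max(dim=1).values * (1 - dones)
    loss = loss_fn(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy call with random tensors, using the batch size of 64 described above.
toy_batch = (torch.rand(64, 5), torch.randint(0, 3, (64,)), torch.rand(64),
             torch.rand(64, 5), torch.zeros(64))
dqn_update(toy_batch)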

Figure 6.1: Total rewards for each episode throughout training

Figure 6.1 illustrates the trend of the reward sums over training: the dark line represents the mean of the episodic total rewards over the 10 experiments, and the transparent region shows the variance (a mean ± 2σ range), from which we see that the DQN agent consistently makes progress throughout training. In each run, the first episode always results in a negative total reward (and also a negative cumulative return), which is due to the random action selection at the beginning of training. Recall that we use an ε-greedy action selection strategy in DQN, in the hope that the agent has enough exploration. As training proceeds, the agent tends to stick to the best action it can take, based on its estimates of the Q-values. The TD errors give it the signal to update the weights of the DQN, so the Q-value estimates improve and a better policy is derived. We emphasize the deployment of experience replay, which significantly improves sample efficiency, especially in a trading environment where samples are quite limited; as a result, it also dramatically speeds up training. We are not claiming that our DQN training has converged; on the contrary, we expect it to keep improving after 20 episodes and to give more refined results with a smaller learning rate. However, due to limited computational resources and time, we only trained for 20 episodes. In Figure 6.2(a), we show the performance of the portfolio derived from our DQN policy (trained for 20 episodes), compared with the benchmark portfolio described above, during the in-sample period between 2017/10/08 and 2018/10/08. The red line represents the normalized value of the portfolio under the DQN policy during this period, while the blue line represents that of the benchmark.

Table 6.1: Some statistics of in-sample performance, DQN-derived portfolio vs. benchmark, 2017/10/08-2018/10/08

                        DQN        Benchmark
Cumulative Return       3.5053     0.1991
Mean Daily Return       0.0118     0.0010
Std. Daily Return       0.0664     0.0313
Sharpe Ratio            2.8236     0.4979

As noted earlier, the value of the benchmark portfolio partially reflects the change of the Bitcoin price. We see that the learned DQN policy performs much better than the benchmark on the in-sample data. According to Table 6.1, the cumulative return is 350.53%, which means the portfolio became about 4.5 times its initial value over the one-year period, an extremely high return. Table 6.1 also reports other in-sample performance statistics for both portfolios, from which we see that our DQN-derived portfolio obtains a much higher cumulative return and Sharpe ratio. In fact, as we will show in Section 6.3.4, the DQN policy executes orders very frequently throughout the period, taking advantage of the oscillations of the Bitcoin price. As a result, its value almost always increases during the in-sample period. On the other hand, notice that there are quite a few horizontal intervals on the red line, which indicate that our DQN agent chose to CLOSE the position and hold cash only, so the portfolio value did not change during those intervals. In this way it avoided losses during rapid falls of the Bitcoin price. This suggests that its chosen actions follow the signals provided by the technical indicators, and could probably generalize. Figure 6.2(b) illustrates the performance of the learned DQN policy on out-of-sample data, specifically the most recent two months (2018/10/09-2018/12/09). During this period, the Bitcoin price nearly halves, with the drop starting in mid-November, as we can see from the benchmark performance. The strategy derived by our DQN agent is simple: hold the cash and do not trade Bitcoin. This necessarily results in a cumulative return of zero, but it is much better than losing money by going LONG on Bitcoin. Unlike the varied strategies (and hence performance) derived on the in-sample data, the policy our DQN agent derives for the out-of-sample period is in fact very consistent, and can be learned within 2 episodes.

Figure 6.2: (a) In-sample performance (2017/10/08-2018/10/08); (b) Out-of-sample performance (2018/10/09-2018/12/09)

However, we note that our DQN agent has not learned to take advantage of shorting during the out-of-sample period, which the environment settings allow. Judging from the in-sample performance, the agent successfully learned to exploit shorting during the dramatic falls in January and February 2018, but for smaller falls it chooses to hold cash rather than to short, since a small rise might occur after a short period of time. The drop since mid-November is not as significant as those in the in-sample period, and our agent seems wary of the ensuing risk, thus choosing to hold cash. From this we can see how much the domain matters: the training and test data should come from the same distribution (in expectation), which is not the case for our Bitcoin prices, and in fact is not the case for trading environments in general.

6.3.4 Effect of Transaction Cost

As discussed in Section 6.2.3, our reward function implicitly takes the transaction cost into account, and our market simulator also illustrates its effect. How does the transaction cost affect our DQN agent's learned trading behavior and performance?

Figure 6.3: Effects of different transaction cost factor values on DQN in-sample policies: (a) number of orders in the derived policies; (b) cumulative return of the derived policies

In this section, we conduct an experiment with our DQN trading agent on the in-sample Bitcoin data, using 6 different values of the transaction cost factor c, ranging from 0 to 0.20. As c increases, the updated Q-values may become less accurate, so the learned policy may perform worse. Moreover, if the transaction cost is so high that executing an order costs more than can be gained from the price changes, there is no reason to trade at all. Our DQN agent successfully learned this. We first look at the number of orders in the strategies learned under different transaction cost factors, shown in Figure 6.3(a). When the transaction cost is 0, the DQN agent executes quite a lot of orders, taking advantage of the price changes. As the transaction cost gets higher, the number of orders in the derived policy decreases significantly. Once the transaction cost factor reaches 0.10, the agent learns not to trade at all, since the cost exceeds the return and executing orders would only result in losses. Figure 6.3(b) shows the cumulative returns of the learned policies under different transaction cost factors on the same in-sample data, i.e., between Oct 08, 2017 and Oct 08, 2018. When the transaction cost is 0, the agent derives a strategy with a quite high in-sample cumulative return. As the transaction cost increases, the learned policies perform worse. When the transaction cost factor reaches 0.04, the derived policy has a

negative cumulative return. After that, as the factor grows further, the learned policies perform slightly better, because the agent learns to execute very few orders. When the factor reaches 0.1 or higher, as we saw in Figure 6.3, our DQN trading agent chooses to hold cash only, resulting in a cumulative return of zero.
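To see why the return shrinks as c grows, consider the following toy accounting of the simulator's bookkeeping (the prices, holdings, and function below are illustrative and not taken from our actual simulator): every order pays a cost of c times the traded value, which directly eats into the cumulative return.

import numpy as np

def toy_backtest(prices, holdings, c, cash=1e6):
    """Toy bookkeeping: portfolio value = cash + coins * price, and every
    order pays c times the traded value on top of the coins themselves."""
    values, coins = [], 0
    for price, target in zip(prices, holdings):
        traded = target - coins
        cash -= traded * price + c * abs(traded) * price
        coins = target
        values.append(cash + coins * price)
    return np.array(values)

prices = [6000.0, 6300.0, 6100.0, 6400.0]      # made-up daily Bitcoin prices
holdings = [100, 100, -100, 0]                 # go LONG, hold, flip SHORT, CLOSE
for c in (0.0, 0.04, 0.10):                    # a few of the factor values tested
    v = toy_backtest(prices, holdings, c)
    print(c, v[-1] / v[0] - 1.0)               # cumulative return shrinks as c grows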

6.4 Summary

In this work we trained a DQN trading agent that applies the deep Q-learning approach to algorithmic trading. DQN trading is able to detect the market status from raw and noisy data, and it accounts for long-term returns. Compared with state-of-the-art work, we emphasized using technical indicators for feature preprocessing before feeding the data for training, rather than an end-to-end deep learning framework. Our experiments on a Bitcoin portfolio demonstrated that our DQN trading system performs consistently well. We also showed that our DQN trading agent responds correctly to environmental factors such as the transaction cost. Despite these interesting results, our study is still at a preliminary stage. In future work, we will investigate the contributions of other novel approaches from the reinforcement learning research community, especially policy gradient methods, which can be applied to continuous action spaces.

Chapter 7
Conclusion

In this dissertation, we have explored several subfields of deep learning and reinforcement learning. We first focused on predictive modeling. In Chapter 2, we built a hybrid deep learning model that can efficiently and effectively predict future returns in the stock market. Our work combines recent advances in representation learning and temporal convolutional neural networks for sequential modeling. In addition to learning Alphas for stock prediction, as is usual in this line of research, we also incorporated market information for learning the Betas. The hybrid training paradigm is motivated by stacking, but instead of combining the final predictions of different models, we combined the learned feature maps and processed them through the last few layers to improve forecast performance. There are several directions in which this work can be extended. First, stock prices are heavily affected by external information, such as macroeconomic factors, political events, and online exposure and sentiment on the Internet (e.g., social media). We could expand the data sources and incorporate their influence into the model. Moreover, systematic feature selection could be performed for more effective learning: in addition to ordinary L1 and L2 regularization, advanced methods such as the group Lasso could potentially be combined with deep learning models. In addition, although we have shown that convolutional layers have several advantages over the widely used recurrent layers for time series, the temporal learning layers in our model could be replaced by any other type; for instance, recent advances in attention models are good candidates. More importantly, a systematic analysis of learning the Betas from the market could give us insight into more effective use of cross-sectional data and thus further improve model performance. We believe this could be closely related

to the field of unsupervised learning, in particular the decomposition of high-dimensional data. We then turned our focus to reinforcement learning. In Chapter 4, we encourage exploration in deep Q-networks by reannealing the exploration rate whenever learning appears stuck at a poor local optimum, as detected by heuristic measures. There are a few possible directions for future work. First, the idea of reannealing exploration can be easily and naturally extended to other reinforcement learning algorithms, not only DQN. Second, the detection of poor local optima could come from monitoring the performance of the learning system; in this way, we could probably use supervised learning to define and learn such a measure instead of relying on heuristics. It is also attractive to reanneal the exploration rate systematically instead of adaptively responding to a heuristic measure: in deep learning optimization we have seen various forms of cyclical learning rates, and we could similarly experiment with a cyclical exploration rate. In Chapter 5, we effectively alleviate the overestimation problem in value-based reinforcement learning by cross-learning the Q-values with multiple agents, resulting in faster convergence when training deep Q-networks. Our cross-learning paradigm can be extended to deterministic policy gradient algorithms and then naturally generalizes to actor-critic algorithms, in which multiple critics are usually used to estimate action values. The difference between synchronized and asynchronous versions of cross learning in actor-critic methods is worth further study. We demonstrated the effective application of reinforcement learning algorithms in a real trading environment in Chapter 6. We discussed the difference between reinforcement learning trading and the usual two-step portfolio optimization, and noted that the reinforcement trading system decomposes the supervised learning loss over time through the reward function and learns via Bellman updates, and is therefore less effective; however, the system is end-to-end and thus more convenient to use. In future work, we could train reinforcement learning agents in more complex trading environments, for instance, to

establish an allocation strategy for a portfolio that consists of many stocks. It would be computationally expensive, or even infeasible, to train such a DQN agent over discrete actions, due to the curse of dimensionality. In that case, we could consider actions in a parametric continuous space, where recent advances in policy gradient algorithms might be more suitable. This dissertation has presented novel models and algorithms in predictive modeling and reinforcement learning. One lesson that can be gleaned from these studies is that, in the design of a machine learning system, it is sometimes powerful to combine different components and recent advances to achieve better performance. Ensembling as a meta-method generally improves performance at the cost of computational complexity, but the idea of ensembling can also be synthesized within a single model, as we did by combining multiple components, and we can benefit from imposing other meta-heuristics as well. Despite the many successes that machine learning has seen, many interesting and important problems in learning, or more generally in artificial intelligence, remain. We believe that further efforts in this field will pay off richly.

Bibliography

[1] Thomas M Mitchell et al. Machine learning, 1997.

[2] Vladimir N Vapnik and Alexey J Chervonenkis. Theory of . 1974.

[3] Bernhard E Boser, Isabelle M Guyon, and Vladimir N Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Compu- tational learning theory, pages 144–152. ACM, 1992.

[4] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3):273–297, 1995.

[5] Harris Drucker, Christopher JC Burges, Linda Kaufman, Alex Smola, Vladimir Vapnik, et al. Support vector regression machines. Advances in neural information processing systems, 9:155–161, 1997.

[6] John Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. 1998.

[7] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.

[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[9] Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134– 1142, 1984.

[10] Michael J Kearns and Leslie G Valiant. Learning Boolean formulae or finite automata is as hard as factoring. Harvard University, Center for Research in Computing Technology, Aiken Computation Laboratory, 1988.

[11] Robert E Schapire. The strength of weak learnability. Machine learning, 5(2):197–227, 1990.

[12] Yoav Freund. Boosting a weak learning algorithm by majority. In COLT, volume 90, pages 202–216, 1990.

[13] Yoav Freund and Robert E Schapire. A desicion-theoretic generalization of on-line learning and an application to boosting. In European conference on computational learning theory, pages 23–37. Springer, 1995.

[14] Robert E Schapire. The boosting approach to machine learning: An overview. In Nonlinear estimation and classification, pages 149–171. Springer, 2003.

[15] Thomas G Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine learning, 40(2):139–157, 2000.

[16] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning, volume 1. Springer series in statistics New York, 2001.

[17] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and , pages 785–794, 2016.

[18] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[19] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.

[20] Paul Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. dissertation, Harvard University, 1974.

[21] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Technical report, DTIC Document, 1985.

[22] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.

[23] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[24] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

[25] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104– 3112, 2014.

[26] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

[27] Hasim Sak, Andrew W Senior, and Françoise Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. 2014.

[28] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of machine learning research, 3(Feb):1137–1155, 2003.

[29] David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 2019.

[30] Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. In Advances in neural information processing systems, pages 7785–7794, 2018.

[31] Paul J Werbos. Backpropagation through time: what it does and how to do it. Pro- ceedings of the IEEE, 78(10):1550–1560, 1990.

[32] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International conference on machine learning, pages 1310–1318, 2013.

[33] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

[34] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.

[35] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.

[36] Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani, et al. Least angle regression. The Annals of statistics, 32(2):407–499, 2004.

[37] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.

[38] Stephen Boyd, Neal Parikh, and Eric Chu. Distributed optimization and statistical learning via the alternating direction method of multipliers. Now Publishers Inc, 2011.

[39] Neal Parikh and Stephen Boyd. Proximal algorithms. Foundations and Trends in optimization, 1(3):127–239, 2014.

[40] Patrice Y Simard, David Steinkraus, John C Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In Icdar, volume 3, 2003.

[41] Yuan Yao, Lorenzo Rosasco, and Andrea Caponnetto. On early stopping in gradient descent learning. Constructive Approximation, 26(2):289–315, 2007.

[42] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.

[43] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Repre- senting model uncertainty in deep learning. In international conference on machine learning, pages 1050–1059, 2016.

[44] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

[45] K Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11):559–572, 1901.

[46] Harold Hotelling. Analysis of a complex of statistical variables into principal compo- nents. Journal of educational psychology, 24(6):417, 1933.

[47] Imola K Fodor. A survey of dimension reduction techniques. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 9:1–18, 2002.

[48] John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, and Yi Ma. Robust prin- cipal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in neural information processing systems, pages 2080–2088, 2009.

[49] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.

[50] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization, 20(4):1956–1982, 2010.

[51] Eugene F Fama. The behavior of stock-market prices. The journal of Business, 38(1):34–105, 1965.

[52] William F Sharpe. Capital asset prices: A theory of market equilibrium under condi- tions of risk. The journal of finance, 19(3):425–442, 1964.

[53] John Lintner. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. In Stochastic optimization models in finance, pages 131–155. Elsevier, 1975.

[54] Michael C Jensen, Fischer Black, and Myron S Scholes. The capital asset pricing model: Some empirical tests. 1972.

[55] George EP Box and Gwilym M Jenkins. Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2):91– 109, 1968.

[56] Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008.

[57] Khalid Alkhatib, Hassan Najadat, Ismail Hmeidi, and Mohammed K Ali Shatnawi. Stock price prediction using k-nearest neighbor (knn) algorithm. International Journal of Business, Humanities and Technology, 3(3):32–44, 2013.

[58] Yingjun Chen and Yongtao Hao. A feature weighted support vector machine and k-nearest neighbor algorithm for stock market indices prediction. Expert Systems with Applications, 80:340–355, 2017.

[59] Md Rafiul Hassan, Baikunth Nath, and Michael Kirley. A fusion model of hmm, ann and ga for stock market forecasting. Expert systems with Applications, 33(1):171–180, 2007.

[60] Md Rafiul Hassan, Kotagiri Ramamohanarao, Joarder Kamruzzaman, Mustafizur Rah- man, and M Maruf Hossain. A hmm-based adaptive fuzzy inference system for stock market forecasting. Neurocomputing, 104:10–25, 2013.

[61] Haiqin Yang, Laiwan Chan, and Irwin King. Support vector machine regression for volatile stock market prediction. In International Conference on Intelligent Data En- gineering and Automated Learning, pages 391–396. Springer, 2002.

[62] Cheng-Lung Huang and Cheng-Yi Tsai. A hybrid sofm-svr with a filter-based feature selection for stock market forecasting. Expert Systems with Applications, 36(2):1529– 1539, 2009.

[63] Jian-Zhou Wang, Ju-Jie Wang, Zhe-George Zhang, and Shu-Po Guo. Forecasting stock indices with back propagation neural network. Expert Systems with Applications, 38(11):14346–14355, 2011.

[64] Erkam Guresen, Gulgun Kayakutlu, and Tugrul U Daim. Using artificial neural net- work models in stock market index prediction. Expert Systems with Applications, 38(8):10389–10397, 2011.

[65] Werner Kristjanpoller, Anton Fadic, and Marcel C Minutolo. Volatility forecast using hybrid neural network models. Expert Systems with Applications, 41(5):2437–2442, 2014.

[66] Lin Wang, Yi Zeng, and Tao Chen. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Systems with Applications, 42(2):855–863, 2015.

[67] Mustafa Göçken, Mehmet Özçalıcı, Aslı Boru, and Ayşe Tuğba Dosdoğru. Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications, 44:320–331, 2016.

[68] Jigar Patel, Sahil Shah, Priyank Thakkar, and Ketan Kotecha. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert systems with applications, 42(1):259–268, 2015.

[69] Ash Booth, Enrico Gerding, and Frank Mcgroarty. Automated trading with perfor- mance weighted random forests and seasonality. Expert Systems with Applications, 41(8):3651–3661, 2014.

[70] Sasan Barak and Mohammad Modarres. Developing an approach to evaluate stocks by forecasting effective features with data mining methods. Expert Systems with Ap- plications, 42(3):1325–1339, 2015.

[71] Bin Weng, Waldyn Martinez, Yao-Te Tsai, Chen Li, Lin Lu, James R Barth, and Fadel M Megahed. Macroeconomic indicators alone can predict the monthly closing price of major us indices: Insights from artificial intelligence, time-series analysis and hybrid models. Applied Soft Computing, 71:685–697, 2018.

[72] John J Murphy. Technical analysis of the financial markets: A comprehensive guide to trading methods and applications. Penguin, 1999.

[73] Eugene F Fama and Kenneth R French. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56, 1993.

[74] Paul C Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy. More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance, 63(3):1437–1467, 2008.

[75] Qing Li, Yuanzhu Chen, Li Ling Jiang, Ping Li, and Hsinchun Chen. A tensor-based information framework for predicting the stock market. ACM Transactions on Infor- mation Systems (TOIS), 34(2):1–30, 2016.

[76] Bin Weng, Lin Lu, Xing Wang, Fadel M Megahed, and Waldyn Martinez. Predict- ing short-term stock prices using ensemble methods and online data sources. Expert Systems with Applications, 112:258–273, 2018.

[77] Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. Journal of computational science, 2(1):1–8, 2011.

[78] Nuno Oliveira, Paulo Cortez, and Nelson Areal. The impact of microblogging data for stock market prediction: Using twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications, 73:125–144, 2017.

[79] Qili Wang, Wei Xu, and Han Zheng. Combining the wisdom of crowds and techni- cal analysis for financial market prediction using deep random subspace ensembles. Neurocomputing, 299:51–61, 2018.

[80] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[81] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[82] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large- scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[83] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.

[84] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[85] Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vec- tors for word representation. In Proceedings of the 2014 conference on empirical meth- ods in natural language processing (EMNLP), pages 1532–1543, 2014.

[86] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre- training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

[87] Akhter Mohiuddin Rather, Arun Agarwal, and VN Sastry. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42(6):3234–3241, 2015.

[88] Thomas Fischer and Christopher Krauss. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2):654–669, 2018.

[89] Omer Berat Sezer and Ahmet Murat Ozbayoglu. Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach. Applied Soft Computing, 70:525–538, 2018.

[90] Guosheng Hu, Yuxin Hu, Kai Yang, Zehao Yu, Flood Sung, Zhihong Zhang, Fei Xie, Jianguo Liu, Neil Robertson, Timothy Hospedales, et al. Deep stock representation learning: From candlestick charts to investment decisions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2706–2710. IEEE, 2018.

[91] Omer Berat Sezer and Ahmet Murat Ozbayoglu. Financial trading model with stock bar chart image time series with deep convolutional neural networks. arXiv preprint arXiv:1903.04610, 2019.

[92] Ehsan Hoseinzade and Saman Haratizadeh. Cnnpred: Cnn-based stock market predic- tion using a diverse set of variables. Expert Systems with Applications, 129:273–285, 2019.

[93] Wen Long, Zhichen Lu, and Lingxiao Cui. Deep learning-based for stock price movement prediction. Knowledge-Based Systems, 164:163–173, 2019.

[94] Zhengyao Jiang, Dixing Xu, and Jinjun Liang. A deep reinforcement learning frame- work for the financial portfolio management problem. arXiv preprint arXiv:1706.10059, 2017.

[95] Luca Di Persio and Oleksandr Honchar. Artificial neural networks architectures for stock price prediction: Comparisons and applications. International journal of circuits, systems and signal processing, 10(2016):403–413, 2016.

[96] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.

[97] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[98] Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pages 2377–2385, 2015.

[99] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.

[100] Alex Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J Lang. Phoneme recognition using time-delay neural networks. IEEE transactions on acoustics, speech, and signal processing, 37(3):328–339, 1989.

[101] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.

[102] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In International conference on machine learning, pages 933–941, 2017.

[103] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122, 2017.

[104] Colin Lea, Michael D Flynn, Rene Vidal, Austin Reiter, and Gregory D Hager. Tem- poral convolutional networks for action segmentation and detection. In proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 156–165, 2017.

[105] Mikolaj Binkowski, Gautier Marti, and Philippe Donnat. Autoregressive convolutional neural networks for asynchronous time series. In International Conference on Machine Learning, pages 580–589, 2018.

[106] Yitian Chen, Yanfei Kang, Yixiong Chen, and Zizhuo Wang. Probabilistic forecasting with temporal convolutional neural network. Neurocomputing, 2020.

[107] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.

[108] Yoshua Bengio and Samy Bengio. Modeling high-dimensional discrete data with multi- layer neural networks. In Advances in Neural Information Processing Systems, pages 400–406, 2000.

[109] Alberto Paccanaro and Geoffrey E. Hinton. Learning distributed representations of concepts using linear relational embedding. IEEE Transactions on Knowledge and Data Engineering, 13(2):232–244, 2001.

[110] Geoffrey E Hinton et al. Learning distributed representations of concepts. In Proceed- ings of the eighth annual conference of the cognitive science society, volume 1, page 12. Amherst, MA, 1986.

[111] Oren Barkan and Noam Koenigstein. Item2vec: neural item embedding for collabo- rative filtering. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2016.

[112] Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier Tejedor-Sojo, and Jimeng Sun. Multi-layer rep- resentation learning for medical concepts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1495–1504, 2016.

[113] Cheng Guo and Felix Berkhahn. Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737, 2016.

[114] Dang Lien Minh, Abolghasem Sadeghi-Niaraki, Huynh Duc Huy, Kyungbok Min, and Hyeonjoon Moon. Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. Ieee Access, 6:55392–55404, 2018.

[115] Qiong Wu, Zheng Zhang, Andrea Pizzoferrato, Mihai Cucuringu, and Zhenming Liu. A deep learning framework for pricing financial instruments. arXiv preprint arXiv:1909.04497, 2019.

[116] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feed- forward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256, 2010.

[117] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for youtube rec- ommendations. In Proceedings of the 10th ACM conference on recommender systems, pages 191–198, 2016.

[118] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.

[119] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[120] Leslie N Smith and Nicholay Topin. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, page 1100612. International Society for Optics and Photonics, 2019.

[121] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009.

[122] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of ICML workshop on unsupervised and transfer learning, pages 17–36, 2012.

[123] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Deep transfer learning with joint adaptation networks. In International conference on machine learning, pages 2208–2217, 2017.

[124] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.

[125] Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approx- imation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.

[126] ML Puterman. Markov decision processes. John Wiley & Sons, New Jersey, 1994.

[127] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279– 292, 1992.

[128] Richard S Sutton. Learning to predict by the methods of temporal differences. Machine learning, 3(1):9–44, 1988.

[129] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Os- trovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

[130] Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine learning, 8(3-4):293–321, 1992.

[131] Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. In AAAI, pages 2094–2100, 2016.

[132] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Freitas. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.

[133] Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. Deep explo- ration via bootstrapped dqn. In Advances in neural information processing systems, pages 4026–4034, 2016.

[134] Malcolm Strens. A bayesian framework for reinforcement learning. 2000.

[135] Andrew Y Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. 1999.

[136] Sebastian B Thrun. Efficient exploration in reinforcement learning. 1992.

[137] Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems, pages 1471–1479, 2016.

[138] Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, OpenAI Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. # exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 2753–2762, 2017.

[139] Richard Y Chen, John Schulman, Pieter Abbeel, and Szymon Sidor. Ucb and infogain exploration via q-ensembles. arXiv preprint arXiv:1706.01502, 2017.

[140] Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. Vime: Variational information maximizing exploration. In Advances in Neural Information Processing Systems, pages 1109–1117, 2016.

[141] Bradly C Stadie, Sergey Levine, and Pieter Abbeel. Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814, 2015.

[142] Giuseppe Burtini, Jason Loeppky, and Ramon Lawrence. A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757, 2015.

[143] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998.

[144] Alexander L Strehl, Lihong Li, and Michael L Littman. Reinforcement learning in finite mdps: Pac analysis. Journal of Machine Learning Research, 10(Nov):2413–2444, 2009.

[145] Karl Friston. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2):127, 2010.

[146] Shin Ishii, Wako Yoshida, and Junichiro Yoshimoto. Control of exploitation– exploration meta-parameter in reinforcement learning. Neural networks, 15(4-6):665– 687, 2002.

[147] Animashree Anandkumar and Rong Ge. Efficient approaches for escaping higher order saddle points in non-convex optimization. In Conference on Learning Theory, pages 81–102, 2016.

[148] Lester Ingber. Very fast simulated re-annealing. Mathematical and computer modelling, 12(8):967–973, 1989.

[149] Andrew Y Ng, Stuart J Russell, et al. Algorithms for inverse reinforcement learning. In Icml, pages 663–670, 2000.

[150] Erin Catto. Box2d: A 2d physics engine for games, 2011.

[151] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym, 2016.

[152] Sebastian Thrun and Anton Schwartz. Issues in using function approximation for reinforcement learning. 1993.

[153] Donghun Lee, Boris Defourny, and Warren B Powell. Bias-corrected q-learning to con- trol max-operator bias in q-learning. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 93–99. IEEE, 2013.

[154] Scott Fujimoto, Herke Van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477, 2018.

[155] Oron Anschel, Nir Baram, and Nahum Shimkin. Averaged-dqn: Variance reduction and stabilization for deep reinforcement learning. In Proceedings of the 34th Interna- tional Conference on Machine Learning-Volume 70, pages 176–185. JMLR. org, 2017.

[156] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems, pages 315– 323, 2013.

[157] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. Saga: A fast incremental gra- dient method with support for non-strongly convex composite objectives. In Advances in neural information processing systems, pages 1646–1654, 2014.

[158] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. Mathematical Programming, 162(1-2):83–112, 2017.

[159] Zeyuan Allen-Zhu. Katyusha: The first direct acceleration of stochastic gradient meth- ods. The Journal of Machine Learning Research, 18(1):8194–8244, 2017.

[160] Zengqiang Chen, Beibei Qin, Mingwei Sun, and Qinglin Sun. Q-learning-based parameters adaptive algorithm for active disturbance rejection control and its application to ship course control. Neurocomputing, 2019.

[161] Xi-liang Chen, Lei Cao, Chen-xi Li, Zhi-xiong Xu, and Jun Lai. Ensemble network architecture for deep reinforcement learning. Mathematical Problems in Engineering, 2018.

[162] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep rein- forcement learning. arXiv preprint arXiv:1509.02971, 2015.

[163] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937, 2016.

[164] Hado Van Hasselt. Estimating the maximum expected value: an analysis of (nested) cross validation and the maximum sample average. arXiv preprint arXiv:1302.7175, 2013.

[165] Satinder Singh, Tommi Jaakkola, Michael L Littman, and Csaba Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine learning, 38(3):287–308, 2000.

[166] Hado V Hasselt. Double q-learning. In Advances in Neural Information Processing Systems, pages 2613–2621, 2010.

[167] Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. Openai baselines. https://github.com/openai/baselines, 2017.

[168] Andrew G Barto, Richard S Sutton, and Charles W Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE transactions on systems, man, and cybernetics, 5:834–846, 1983.

[169] Carlo D’Eramo, Marcello Restelli, and Alessandro Nuara. Estimating maximum ex- pected value through gaussian approximation. In International Conference on Machine Learning, pages 1032–1040, 2016.

[170] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mas- tering the game of go without human knowledge. Nature, 550(7676):354, 2017.

[171] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al. Starcraft ii: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782, 2017.

[172] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17(5-6):441–470, 1998.

[173] Xiu Gao and Laiwan Chan. An algorithm for trading and portfolio management using q-learning and sharpe ratio maximization. In Proceedings of the international conference on neural information processing, pages 832–837, 2000.

[174] Yang Wang, Dong Wang, Shiyue Zhang, Yang Feng, Shiyao Li, and Qiang Zhou. Deep q-trading. cslt.riit.tsinghua.edu.cn, 2017.

[175] Zhengyao Jiang and Jinjun Liang. Cryptocurrency portfolio management with deep reinforcement learning. In Intelligent Systems Conference (IntelliSys), 2017, pages 905–913. IEEE, 2017.

[176] Gyeeun Jeong and Ha Young Kim. Improving financial trading decisions using deep q-learning: Predicting the number of shares, action strategies, and transfer learning. Expert Systems with Applications, 117:125–138, 2019.
