Stock Market Prediction Using Ensemble of Graph Theory, Machine Learning and Deep Learning Models

San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 5-20-2019 STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS Pratik Patil San Jose State University Follow this and additional works at: https://scholarworks.sjsu.edu/etd_projects Part of the Artificial Intelligence and Robotics Commons, and the Other Computer Sciences Commons Recommended Citation Patil, Pratik, "STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS" (2019). Master's Projects. 692. DOI: https://doi.org/10.31979/etd.38nc-j52r https://scholarworks.sjsu.edu/etd_projects/692 This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks. For more information, please contact [email protected]. STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS A Project Report Presented to Dr. Ching seh Wu Department of Computer Science San José State University In Partial Fulfillment Of the Requirements for the Class CS 298 By Pratik Patil May 2019 © 2019 Pratik Patil ALL RIGHTS RESERVED The Designated Thesis Committee Approves the Thesis Titled STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS by Pratik Patil APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE SAN JOSÉ STATE UNIVERSITY May 2019 Dr. Ching seh Wu Department of Computer Science Dr. Katerina Potika Department of Computer Science Dr. Marjan Orang Department of Economics ACKNOWLEDGEMENT This has been one long and arduous journey, but nevertheless a worthwhile life experience because of the many great Professors at SJSU and beloved friends. I am grateful and take this opportunity to thank my advisor Dr. Wu, who has been my constant support not only during the thesis but during my whole master's degree. It wouldn’t have been possible without his trust and belief in me to do good research. I also want to thank Dr. Potika and Dr. Orang for consenting to be on my committee and giving their valuable inputs to my project, without which the project would not have been successful. I would also like to thank my parents, my sister Priya and my beloved friend Shweta for supporting and encouraging me throughout my graduation. Abstract Efficient Market Hypothesis (EMH) is the cornerstone of the modern financial theory and it states that it is impossible to predict the price of any stock using any trend, fundamental or technical analysis. Stock trading is one of the most important activities in the world of finance. Stock price prediction has been an age-old problem and many researchers from academia and business have tried to solve it using many techniques ranging from basic statistics to machine learning using relevant information such as news sentiment and historical prices. Even though some studies claim to get prediction accuracy higher than a random guess, they consider nothing but a proper selection of stocks and time interval in the experiments. In this project, a novel approach is proposed using graph theory. This approach leverages Spatio- temporal relationship information between different stocks by modeling the stock market as a complex network. This graph-based approach is used along with two techniques to create two hybrid models. Two different types of graphs are constructed, one from the correlation of the historical stock prices and the other is a causation-based graph constructed from the financial news mention of that stock over a period. The first hybrid model leverages deep learning convolutional neural networks and the second model leverages a traditional machine learning approach. These models are compared along with other statistical models and the advantages and disadvantages of graph-based models are discussed. Our experiments conclude that both graph-based approaches perform better than the traditional approaches since they leverage structural information while building the prediction model. Index Terms - Stock market, machine learning, deep learning, graph theory, financial networks, time series forecasting, spatio-temporal v Table of Contents Chapter 1: Introduction ............................................................................................................................. 1 I. Introduction ............................................................................................................................................. 1 II. Research Objective ................................................................................................................................ 4 Chapter 2: Literature Review .................................................................................................................... 5 I. Introduction ............................................................................................................................................. 5 II. Machine Learning .................................................................................................................................. 7 III. Graph Theory Approach. .................................................................................................................. 10 III. Deep Learning. ................................................................................................................................... 16 Chapter 3: Implementation Platform and Libraries Used .................................................................... 17 I. Data and Graph Manipulation Libraries: ...................................................................................... 17 II. Development Platform:..................................................................................................................... 17 a) Setup on local computer (Windows 10) ...................................................................................... 17 b) Collab by Google. ....................................................................................................................... 18 c) Amazon Web Services (AWS) ................................................................................................... 18 Chapter 4: Dataset .................................................................................................................................... 19 I. Stock Price Data Collection: ............................................................................................................ 21 a) 1-Day interval dataset: ................................................................................................................ 21 b) 1-minute interval dataset. ............................................................................................................ 21 II. News dataset: ..................................................................................................................................... 22 Chapter 5: Modelling stocks into a graph. ............................................................................................. 23 I. Correlation based relationship. ....................................................................................................... 23 a) Spurious correlation problem: ..................................................................................................... 24 vi b) Pearson correlation coefficient .................................................................................................... 25 c) Spearman correlation coefficient ................................................................................................ 26 d) Kendall correlation coefficient .................................................................................................... 28 e) Choosing the correct threshold. .................................................................................................. 29 II. News based relationship. .................................................................................................................. 32 Chapter 6: Approaches and Implementation. ........................................................................................ 34 I. Graph Based Deep Learning Models .............................................................................................. 34 a) Definitions and nomenclature: .................................................................................................... 35 II. Graph Based Traditional ML Models ......................................................................................... 37 a) Community Detection: ................................................................................................................ 38 b) Feature Extraction ....................................................................................................................... 38 c) Formulating the time series forecasting problem as supervised machine learning ..................... 40 d) Building Linear Models .............................................................................................................. 42 III. Statistical Model ............................................................................................................................ 44 Chapter 7: Experiments. .........................................................................................................................

Stock Market Prediction Using Ensemble of Graph Theory, Machine Learning and Deep Learning Models

Bab 1 Pendahuluan

Insight MFR By

Statistický Software

The R Project for Statistical Computing a Free Software Environment For

List of New Applications Added in ARL #2586

Cumulation of Poverty Measures: the Theory Beyond It, Possible Applications and Software Developed

Comparison of Three Common Statistical Programs Available to Washington State County Assessors: SAS, SPSS and NCSS

Statistika Penelitian Bisnis & Manajemen

The Development of a Statistical Software Resource for Medical Research

Medical Statistics Online Help: Sample Size & Power for Clinical Trials

Syllabus Personal Introduction

Jorge Fallas