LETTER Communicated by Jinjung Liang

Improving Stock Closing Price Prediction Using Recurrent Neural Networks and Technical Indicators

Tingwei Gao [email protected]
Yueting Chai [email protected]
Department of Automation, Tsinghua University, Beijing 100084, China

Neural Computation 30, 2833–2854 (2018) © 2018 Massachusetts Institute of Technology. doi:10.1162/neco_a_01124

This study focuses on predicting stock closing prices by using recurrent neural networks (RNNs). A long short-term memory (LSTM) model, a type of RNN coupled with stock basic trading data and technical indicators, is introduced as a novel method to predict the closing price of the next day. We realize dimension reduction for the technical indicators by conducting principal component analysis (PCA). To train the model, some optimization strategies are followed, including adaptive moment estimation (Adam) and Glorot uniform initialization. Case studies are conducted on Standard & Poor's 500, NASDAQ, and Apple (AAPL). Extensive comparison experiments are performed using a series of evaluation criteria to evaluate this model. Accurate prediction of the stock market is considered an extremely challenging task because of the noisy environment and the high volatility associated with external factors. We hope the methodology we propose advances the research for analyzing and predicting stock time series. As the experimental results suggest, the proposed model achieves a good level of fit.

1 Introduction

Stock prediction is the act of determining the future price value or movement of a company stock (Hegazy, Soliman, & Salam, 2014). Mining stock market patterns is generally perceived as a challenging task because stock data are noisy and nonstationary (Abu-Mostafa & Atiya, 1996). Stock data are in fact a type of time series. Generally, financial time series are predicted using chart or model techniques, including candlestick patterns and machine learning (ML) algorithms. The prominent ML model for time series analysis is the artificial neural network (ANN). At present, ANNs are regarded as the state-of-the-art theory and technique for regression and classification applications. A recurrent neural network (RNN) is a special kind of ANN designed to learn sequential or time-varying patterns (Medsker & Jain, 2001). Long short-term memory (LSTM) is an important kind of RNN that excels at remembering values for either long or short periods of time (Gers, Schmidhuber, & Cummins, 2000).



It has been shown to outperform other RNNs on tasks involving long time lags (Gers, Schraudolph, & Schmidhuber, 2002). This study proposes and validates a novel stock prediction model on the basis of LSTM, stock basic trading data, stock technical indicators, and principal component analysis (PCA). The model is designed to predict the closing price of the next day. Our first major contribution is that we effectively design a stock prediction system using LSTM. Second, we propose a method of combining basic stock trading data and technical indicators as the input variables by PCA; only the technical indicators are passed through the PCA unit. Third, our comparison experiments evaluate the model. The remainder of this letter is organized as follows. Section 2 provides a brief overview of related work. In section 3, we describe our prediction model and its design details. Section 4 presents and analyzes the comparison experiment results. Finally, conclusions are presented in section 5.

2 Literature Review

In the past decade, the use of ML for stock market behavior analysis has been an active research topic, including trading strategy (Sirignano & Cont, 2018; Samarakoon & Athukorala, 2017; Dash & Dash, 2016; Chourmouziadis & Chatzoglou, 2016; Zhu, Yin, & Li, 2014; Takeuchi & Lee, 2013) and stock price prediction. There is plenty of pioneering work on the applications of ANNs for predicting stock prices (Aghakhani & Karimi, 2016; Göçken, Özçalıcı, Boru, & Dosdoğru, 2016; Yaqub & Al-Ahmadi, 2016; Chen, 2015; Yetis, Kaplan, & Jamshidi, 2014; Sun, Che, & Wang, 2014; Li, Wu, Liu, & Luo, 2014; Das & Uddin, 2013; Oliveira, Nobre, & Zárate, 2013). Recently, RNN models have been introduced as methods to predict stock prices. Jia (2016) investigated the effectiveness of LSTM for stock market prediction, Xie and Wang (2016) found that RNNs are effective in forecasting stock prices, and Chen, Zhou, and Dai (2015) predicted stock returns using LSTM. A model based on RNNs was proposed for predicting stock returns (Rather, Agarwal, & Sastry, 2015). Besides ANNs, a variety of other ML methods have been used for stock market forecasting—for instance, support vector machines (SVM; Chen & Hao, 2017; Wen, Xiao, He, & Gong, 2014; Ni, Ni, & Gao, 2011), decision trees (Hu, Feng, Zhang, Ngai, & Liu, 2015; Wu, Lin, & Lin, 2006), genetic algorithms (Ye & Wei, 2015; Fang, Fataliyev, Wang, Fu, & Wang, 2014; Sheta, Faris, & Alkasassbeh, 2013), and Markov chains (Gupta & Dhingra, 2012; Wang, Cheng, & Hsu, 2010). Some research has focused on traditional time series analysis for stock market prediction, such as autoregressive models (Mathew, Sola, Oladiran, & Amos, 2013; Chou & Wang, 2007), autoregressive moving average (ARMA) models (Anaghi & Norouzi, 2013; Feng & Cao, 2011), autoregressive integrated moving average (ARIMA) models (Lin & Pai, 2010; Al-Shiab, 2006), and generalized autoregressive conditional heteroskedasticity (GARCH) models (Zhang, 2014; Dong, 2012; Wang, Guo, Niu, & Cao, 2010).


PCA also plays an important role in stock market prediction. Zhong and Enke (2017) presented a data mining process to predict the stock market using ANNs and PCA. Chang and Wu (2015) presented a kernel-based PCA to extract critical features to increase the performance of a stock trading model. The statistical behaviors of Chinese stock market fluctuations were investigated by Liu and Wang (2011) using PCA. Tsai and Hsiao (2010) used PCA as a feature selection method for predicting the stock market.

3 Proposed Predicting Model

3.1 Input Method

3.1.1 Input Variables Selection. Our model's input variables include basic historical trading data and technical indicators. There are six variables in the basic trading data set. The open price (OP) is the price at which a security first trades when the exchange opens on a given trading day. The closing price (CL) is the final price at which a security is traded on a given trading day. The high price (HI) is the highest price at which a stock trades over the course of a trading day. The low price (LO) is the lowest price at which a stock trades over the course of a trading day. The adjusted price (AD) is a stock's CL on any given day of trading that has been amended to include any distributions and corporate actions that occurred at any time prior to the next day's open. The volume (VO) is the total quantity of shares or contracts traded for a specified security.

Stock technical indicators can be adopted to predict the performance of a company's stock price (Thomas, 2001). Technical indicators are mathematical calculations on the basis of stock basic trading data. They have been shown to carry abundant latent information about stock markets. Employing technical indicators improves results, as some indicators contain information not only about the current day but also about previous days. For instance, PROC provides the closing price information for the past 12 days, SO-%K the price information for the past 5 days, MACD information for the past 26 days, and VROC volume information for the past 12 days. In our study, 15 stock market technical indicators are selected as the input variables (their calculation formulas are listed in Table 1, and a code sketch of several of them follows the list below):

• Accumulation distribution (ACD) attempts to relate price and volume in the stock market.
• The moving average convergence divergence (MACD) shows the relationship between two moving averages of prices and reveals changes in the strength, direction, and duration of a trend in a stock's price.


Table 1: Technical Indicators and Their Formulas.

Technical Indicator and Formula:

ACD: $ACD = ACD_{\text{previous-day}} + VO \times \dfrac{(CL - LO) - (HI - CL)}{HI - LO}$
MACD: $MACD = EMA(CL, 12) - EMA(CL, 26)$
CHO: $CHO = EMA(AD, 3) - EMA(AD, 10)$
Highest: $Highest(t) = \max_{i=1}^{t} CL_i$
Lowest: $Lowest(t) = \min_{i=1}^{t} CL_i$
SO-%K: $SO\text{-}\%K = \dfrac{CL - Lowest(5)}{Highest(5) - Lowest(5)} \times 100\%$
SO-%D: $SO\text{-}\%D = MA(SO\text{-}\%K, 3)$
VPT: $VPT = VPT_{\text{previous-day}} + VO \times \dfrac{CL - CL_{\text{previous-day}}}{CL_{\text{previous-day}}}$
W-R%: $W\text{-}R\% = \dfrac{Highest(n) - CL}{Highest(n) - Lowest(n)} \times 100\%$
RSI: $RSI = 100 - \dfrac{100}{1 + RS}$
MOME: $MOME(n) = CL_t - CL_{t-n}$
AC: $AC(t) = AO - MA(AO, t)$
PROC: $PROC = \dfrac{CL_t - CL_{t-12}}{CL_{t-12}} \times 100\%$
VROC: $VROC = \dfrac{VO_t - VO_{t-12}}{VO_{t-12}} \times 100\%$
OBV: if $CL \geq CL_{\text{previous-day}}$, $OBV = OBV_{\text{previous-day}} + VO$; if $CL < CL_{\text{previous-day}}$, $OBV = OBV_{\text{previous-day}} - VO$

Notes. Moving average (MA) and exponential moving average (EMA):
$MA(x, t) = \dfrac{1}{t} \sum_{i=1}^{t} x_i$
$EMA(x, t) = \alpha x + (1 - \alpha) EMA_{\text{previous-day}}$, with $\alpha = 2/(t + 1)$
$RS = \dfrac{\text{average of upward price change}}{\text{average of downward price change}}$
$MedianPrice = (HI + LO)/2$
$AO(t_1, t_2) = MA(MedianPrice, t_1) - MA(MedianPrice, t_2)$

• The Chaikin oscillator (CHO) is used to measure the AD line of the MACD.
• Highest(t) is the highest closing price value during the past t trading days.
• Lowest(t) is the lowest closing price value within the past t trading days.


• The stochastic oscillator (SO) attempts to compare the closing price of a security to the range of its prices over a certain period of time. There are two indicators in SO.
• The volume price trend (VPT) attempts to relate price and volume.
• The Williams %R (W-R%) measures overbought and oversold levels.
• The relative strength index (RSI) intends to identify overbought or oversold conditions in the trading of an asset.
• Momentum (MOME) is the rate of acceleration of price or volume.
• Acceleration (AC) measures the acceleration and deceleration of price.
• The price rate of change (PROC) measures the percentage change in price between the current price and the price n periods in the past.
• The volume rate of change (VROC) is used to gauge the volatility in a security's volume.
• On-balance volume (OBV) uses volume flow to predict changes in stock price.
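To make these definitions concrete, here is a minimal pandas sketch computing a few of the indicators above. The column names (OP, HI, LO, CL, AD, VO) mirror the variable abbreviations of section 3.1.1, and the MOME window n = 10 is our own assumption, since the letter fixes only the windows it mentions explicitly.

```python
import pandas as pd

def ema(x: pd.Series, t: int) -> pd.Series:
    # EMA(x, t) with alpha = 2 / (t + 1), matching the note under Table 1
    return x.ewm(span=t, adjust=False).mean()

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # df is assumed to carry the six basic variables as columns OP, HI, LO, CL, AD, VO
    out = df.copy()
    out["MACD"] = ema(df["CL"], 12) - ema(df["CL"], 26)
    out["CHO"] = ema(df["AD"], 3) - ema(df["AD"], 10)
    hi5, lo5 = df["CL"].rolling(5).max(), df["CL"].rolling(5).min()
    out["SO_K"] = (df["CL"] - lo5) / (hi5 - lo5) * 100   # stochastic %K over 5 days
    out["SO_D"] = out["SO_K"].rolling(3).mean()          # %D = MA(%K, 3)
    out["PROC"] = df["CL"].pct_change(12) * 100          # 12-day price rate of change
    out["VROC"] = df["VO"].pct_change(12) * 100          # 12-day volume rate of change
    out["MOME"] = df["CL"] - df["CL"].shift(10)          # MOME(n), n = 10 assumed
    # OBV: add VO on up (or flat) days, subtract it on down days
    sign = (df["CL"] >= df["CL"].shift(1)).map({True: 1, False: -1})
    out["OBV"] = (sign * df["VO"]).cumsum()
    return out.dropna()
```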

3.1.2 PCA for Technical Indicators. The aim of PCA is to extract the most critical information from the data set (Abdi & Williams, 2010). The reason we select PCA is that the technical indicators carry some redundant information and highly correlated features. It is generally known that correlated inputs have an adverse impact on the learning ability of a neural network, for example, by making it fall more easily into local minima, increasing the training burden, or reducing the generalization ability of the network (Mohamad-Saleh & Hoyle, 2008; Pan, Rust, & Bolouri, 2000). Correlated data decrease the distinctiveness of the data representation, thus introducing confusion into the neural network. ANN methods for dimension reduction do not apply to such situations: an embedding layer (Shan et al., 2016), for instance, can reduce dimension, yet it is meant to address the sparsity of one-hot vectors. Accordingly, we should try to reduce the correlation between inputs when using neural networks. PCA is applied because of the obvious correlation between input variables, such as MOME and PROC. PCA can decrease the dimension of the data and extract its intrinsic features. In this study, PCA is considered the most suitable method. Another dimension-reduction method is linear discriminant analysis (LDA); however, it pertains to supervised learning and is applied for classification, which does not suit our task. Thus, the choice of dimension-reduction method in this study is guided by both methodology and the practical situation. (See Zhong & Enke, 2017, for a detailed description and calculation procedure of PCA.) In this study, only some of the input variables are processed by PCA: the basic trading input variables are left untouched, while the technical indicator data are reduced with PCA to extract the crucial and intrinsic features from the high-dimensional data space.
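A minimal sketch of this split treatment with scikit-learn's PCA follows. The choice of 9 components is our inference from section 3.2.2 (21 original variables become a 15-unit input layer, and the 6 basic variables bypass the PCA unit, leaving 15 − 6 = 9 components), and the random arrays merely stand in for real data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
basic = rng.standard_normal((4000, 6))        # stand-in for OP, HI, LO, CL, AD, VO
indicators = rng.standard_normal((4000, 15))  # stand-in for the 15 indicators of Table 1

# Reduce only the technical indicators; the basic trading data bypass PCA.
pca = PCA(n_components=9)
indicators_pc = pca.fit_transform(indicators)
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained

inputs = np.hstack([basic, indicators_pc])    # final 15-dimensional input vector
```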


Figure 1: Diagram of building up the sequence data set.

3.1.3 Inputs Sequence. The input variables need to be changed into sequence data. The input data segmentation is made by a sliding window of 20 days. This process is described in Figure 1. A sliding window is applied to the entire data set to extract the input data used by the forecasting model. After that, each input variable has 20 days of observations. The blue block represents the input data, which include data from 20 trading days. The white block represents the output data, the closing price of the next day. The training samples and testing samples are obtained sequentially. Thus, each prediction is based on stock market behavior over the previous 20 days.
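A sketch of this windowing, under the assumption that the features are stored one row per trading day in a NumPy array:

```python
import numpy as np

def make_sequences(features: np.ndarray, close: np.ndarray, window: int = 20):
    # features: one row per trading day; close: the CL series aligned with it.
    # Each sample is a 20-day block; its label is the next day's closing price.
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(close[i + window])
    return np.asarray(X), np.asarray(y)
```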

3.2 Neural Network Design

3.2.1 Long Short-Term Memory Neural Networks. In feedforward neural networks (FFNNs), there are no connections among the neurons in the same layer, while an RNN, a kind of ANN, has connections between neural units that form a directed cycle. LSTM has a novel structure, a memory cell, which contains four main elements: an input gate, a forget gate, an output gate, and a neuron unit. This special structure decides what information to store and when to allow reading, writing, and forgetting via the three gates that open and close. Figure 2 illustrates how data flow through a memory cell and are controlled by the gates. The behavior of the memory cell can be described as follows:

$$i_t = \sigma_1(W_i x_t + U_i c_{t-1} + b_i), \quad (3.1)$$
$$f_t = \sigma_1(W_f x_t + U_f c_{t-1} + b_f), \quad (3.2)$$
$$o_t = \sigma_1(W_o x_t + U_o c_{t-1} + b_o), \quad (3.3)$$


Figure 2: The inner architecture of an LSTM cell.

$$c_t = f_t \bullet c_{t-1} + i_t \bullet \sigma_2(W_c x_t + b_c), \quad (3.4)$$
$$h_t = o_t \bullet \sigma_2(c_t), \quad (3.5)$$

where $x_t$ is the input vector at time $t$; $h_t$ is the output vector; $c_t$ is the memory cell state; $i_t$ is the input gate vector; $f_t$ is the forget gate vector; $o_t$ is the output gate vector; $W_i$, $W_f$, $W_o$, $W_c$, $U_i$, $U_f$, $U_o$ are the weight matrices; $b_i$, $b_f$, $b_o$, $b_c$ are the bias vectors; and $\sigma_1$ and $\sigma_2$ are activation functions. In ANNs, the activation function of a node defines the output of that node given an input (Singh & Sharma, 2014). In our prediction model, there are three kinds of activation functions: $\sigma_1$ (hard sigmoid), $\sigma_2$ (tanh), and $\sigma_3$ (ReLU). ReLU is one of the most popular activation functions (LeCun, Bengio, & Hinton, 2015). The applications of $\sigma_1$ and $\sigma_2$ are shown in Figure 2; $\sigma_3$ is used in the output layer of our model:

$$\sigma_1(x) = \max(0, \min(0.25x + 0.5, 1)), \quad (3.6)$$
$$\sigma_2(x) = \frac{2}{1 + \exp(-2x)} - 1, \quad (3.7)$$
$$\sigma_3(x) = \max(0, x). \quad (3.8)$$
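For readers who want to check equations 3.1 to 3.5 against code, here is a single-step NumPy sketch; the dictionary packing of the weights is purely illustrative.

```python
import numpy as np

def hard_sigmoid(x):
    # sigma_1 of equation 3.6
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

def lstm_step(x_t, c_prev, W, U, b):
    # W, U, b are dicts keyed by "i", "f", "o", "c". Note that, as written in
    # the letter, the gates read c_{t-1} (equations 3.1-3.3), whereas many
    # other formulations feed them the previous output h_{t-1}.
    i_t = hard_sigmoid(W["i"] @ x_t + U["i"] @ c_prev + b["i"])  # eq. 3.1
    f_t = hard_sigmoid(W["f"] @ x_t + U["f"] @ c_prev + b["f"])  # eq. 3.2
    o_t = hard_sigmoid(W["o"] @ x_t + U["o"] @ c_prev + b["o"])  # eq. 3.3
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ x_t + b["c"])    # eq. 3.4
    h_t = o_t * np.tanh(c_t)                                     # eq. 3.5
    return h_t, c_t
```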

3.2.2 Neural Network Architecture. Our model consists of three layers: the input layer, the LSTM layer (hidden layer), and the output layer (see Figure 3). Every unit in a layer is connected to all units in the adjacent layers. There are 21 original input variables. After PCA is applied, the input layer has 15 neural units, the LSTM layer has 10 LSTM units, and the output layer has a single unit.


Figure 3: Architecture of the proposed neural network.

The output layer is thus a fully connected layer.
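Assuming a Keras-style API (the experiments in section 4 use TensorFlow), the architecture can be sketched as follows. The training loss is not stated in the letter, so MAE here is our assumption.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(
        10,                                   # 10 LSTM units in the hidden layer
        input_shape=(20, 15),                 # 20-day window, 15 input features
        recurrent_activation="hard_sigmoid",  # sigma_1 for the gates
        kernel_initializer="glorot_uniform",  # section 3.3.2
        recurrent_initializer="orthogonal",   # section 3.3.2
    ),
    tf.keras.layers.Dense(1, activation="relu"),  # sigma_3 in the output layer
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # section 3.3.3
    loss="mae",  # assumed; the letter does not name its training loss
)
```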

3.3 Network Training Method

3.3.1 Data Preprocessing. Data normalization is essentially a "scaling-down" transformation of the attributes (Han, Pei, & Kamber, 2011). In this model, min-max normalization is used. The equation for data normalization is given by

$$v'_i = \frac{v_i - v_{\min}}{v_{\max} - v_{\min}} \times (new\_max - new\_min) + new\_min, \quad (3.9)$$

v = v , v ,...,v v = where ( 1 2 n ), i is the ith normalized data, new_ max 1, new_ min = 0. We use zero-mean for further data processing so each feature has a mean value of zero. For practical reasons, it is advantageous to center the data. We give its formula as follows:

$$v''_i = v'_i - v_{mean}, \quad (3.10)$$

where $v''_i$ is the zero-mean datum. At the last step of preprocessing, we shuffle the training data randomly after each epoch of training. This shuffling breaks the strong correlations between consecutive training samples.
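A compact sketch of both preprocessing steps. Fitting the statistics on the training split only is our own precaution, not something the text spells out.

```python
import numpy as np

def preprocess(train: np.ndarray, test: np.ndarray):
    # Min-max scale to [0, 1] (equation 3.9 with new_max = 1, new_min = 0),
    # then zero-center each feature (equation 3.10).
    v_min, v_max = train.min(axis=0), train.max(axis=0)
    train_s = (train - v_min) / (v_max - v_min)
    test_s = (test - v_min) / (v_max - v_min)
    mean = train_s.mean(axis=0)
    return train_s - mean, test_s - mean
```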

3.3.2 Network Weights Initialization. In our prediction model, two approaches are used for weight initialization: Glorot uniform initialization for the input matrices and orthogonal initialization for the recurrent matrices.


Glorot uniform initialization generates random weights and biases by sampling from a uniform distribution (Glorot & Bengio, 2010). In this model, the feedforward weights are initialized by sampling from the uniform distribution below, a procedure that approximately maintains activation variances and backpropagated gradient variances as one moves up or down the network:

$$W \sim U\left[-\frac{\sqrt{6}}{\sqrt{numUnits_1 + numUnits_2}}, \frac{\sqrt{6}}{\sqrt{numUnits_1 + numUnits_2}}\right], \quad (3.11)$$

where $numUnits_1$ is the number of neural units in the lower layer and $numUnits_2$ is the number of neural units in the higher layer. For the internal weights of the LSTM layer, the initialization uses the orthogonal matrix method, which generates a random orthogonal matrix. This method has been shown to perform well in RNNs (Le, Jaitly, & Hinton, 2015). Orthogonal matrices have many interesting properties, but the most important one for us is that all of their eigenvalues have an absolute value of one. This means that no matter how many times we perform repeated matrix multiplication, the resulting matrix neither explodes nor vanishes, which allows gradients to backpropagate more effectively.
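Both initializers are easy to reproduce in NumPy. The QR-based construction of the orthogonal matrix below is one standard recipe, not necessarily the one the authors used.

```python
import numpy as np

def glorot_uniform(n_in: int, n_out: int) -> np.ndarray:
    # Equation 3.11: U[-sqrt(6)/sqrt(n1 + n2), +sqrt(6)/sqrt(n1 + n2)]
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))

def orthogonal(n: int) -> np.ndarray:
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix;
    # every eigenvalue has absolute value 1, so repeated multiplication
    # neither explodes nor vanishes.
    q, r = np.linalg.qr(np.random.randn(n, n))
    return q * np.sign(np.diag(r))  # sign correction for a uniform distribution
```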

3.3.3 Optimization Algorithm for Parameter Updates. We utilize Adam to perform optimization. This method is computationally efficient, requires little memory, and is invariant to diagonal rescaling of the gradients (Kingma & Ba, 2014). The core of Adam is estimating the first and second moments of the gradients to perform the update. The steps of Adam can be described by the following formulas:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\nabla Q(w), \quad (3.12)$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)(\nabla Q(w))^2, \quad (3.13)$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad (3.14)$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \quad (3.15)$$
$$w_{t+1} = w_t - \frac{\eta}{\sqrt{\hat{v}_t} + \varepsilon}\,\hat{m}_t, \quad (3.16)$$

where $Q(w)$ is the error function, $\nabla Q(w)$ is its gradient, $m_t$ is the first moment estimate at time $t$, and $v_t$ is the second moment estimate at time $t$.


Table 2: Experiment Data Set.

S&P 500
Date         OP       HI       LO       CL       AD        VO
2000-01-03   1469.25  1478.00  1438.26  1455.22  1455.22   931,800,000
2000-01-04   1455.22  1455.22  1397.43  1399.42  1399.42   1,009,000,000
···
2016-11-09   2131.56  2170.10  2125.35  2163.26  2163.26   6,264,150,000
2016-11-10   2167.49  2182.30  2151.17  2167.48  2167.48   6,451,640,000

NASDAQ
2004-01-02   2011.08  2022.37  1999.77  2006.68  2006.68   1,666,780,000
2004-01-05   2020.78  2047.36  2020.78  2047.36  2047.36   2,362,910,000
···
2016-08-12   5219.66  5233.34  5215.55  5232.89  5232.89   1,501,620,000
2016-08-15   5242.18  5271.36  5241.14  5262.02  5262.02   1,533,170,000

AAPL
2000-01-03   104.875  112.5000  101.7875  111.9375  3.625643  133,949,200
2000-01-04   108.250  110.6250  101.1875  102.5000  3.319964  128,094,400
···
2016-10-06   113.70   114.34   113.13   113.89   112.8191   28,779,300
2016-10-10   115.02   116.75   114.72   116.05   114.9588   36,236,000

In our system, we set the initial values as $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$, and $\eta = 10^{-3}$.
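One Adam update, written out directly from equations 3.12 to 3.16 with the hyperparameters above:

```python
import numpy as np

def adam_step(w, grad, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One update of equations 3.12-3.16; t counts updates starting from 1.
    m = beta1 * m + (1 - beta1) * grad            # eq. 3.12
    v = beta2 * v + (1 - beta2) * grad**2         # eq. 3.13
    m_hat = m / (1 - beta1**t)                    # eq. 3.14
    v_hat = v / (1 - beta2**t)                    # eq. 3.15
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)  # eq. 3.16
    return w, m, v
```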

4 Comparison Experiments

4.1 Experiment Data Description. A case study is presented on the basis of the S&P 500, NASDAQ, and AAPL. The stock trading data are obtained from Yahoo Finance. The first data set is collected from 4243 trading days of the S&P 500, from January 3, 2000, to November 10, 2016. The second data set has 3177 samples of NASDAQ, from January 2, 2004, to August 15, 2016. The last data set has 4220 historical observations of AAPL, from January 3, 2000, to October 10, 2016. Each data set consists of daily trading information including LO, HI, OP, CL, AD, and VO, as listed in Table 2.
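The letter names Yahoo Finance as the source but not a retrieval tool; the community yfinance package is one way to reproduce the pull today (the ticker symbols are our assumptions):

```python
import yfinance as yf  # community package; the letter only names "Yahoo Finance"

# Date ranges follow section 4.1; yf.download's `end` is exclusive, and
# auto_adjust=False keeps the separate "Adj Close" (AD) column.
sp500 = yf.download("^GSPC", start="2000-01-03", end="2016-11-11", auto_adjust=False)
nasdaq = yf.download("^IXIC", start="2004-01-02", end="2016-08-16", auto_adjust=False)
aapl = yf.download("AAPL", start="2000-01-03", end="2016-10-11", auto_adjust=False)
# Each frame holds the six basic variables: Open, High, Low, Close, Adj Close, Volume.
```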

4.2 Performance Metrics. The experiment results were measured by several criteria. The first is the mean absolute error (MAE), which is a common measure in the forecasting domain (Hyndman & Koehler, 2006). The root mean square error (RMSE) is shown in equation 4.2. The mean absolute percentage error (MAPE) computes the percentage of error between the actual and predicted values. The average mean absolute percentage error (AMAPE) is shown in equation 4.4. To check the accuracy of the predicted trend, we use the percentage of correct trend (PCT) as the metric, which evaluates the accuracy rate of predicting ups and downs:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} |\tilde{\varphi}_i - \varphi_i|, \quad (4.1)$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\tilde{\varphi}_i - \varphi_i)^2}, \quad (4.2)$$

$$MAPE = 100 \times \frac{1}{n}\sum_{i=1}^{n} \left|\frac{\tilde{\varphi}_i - \varphi_i}{\varphi_i}\right|, \quad (4.3)$$

$$AMAPE = 100 \times \frac{1}{n}\sum_{i=1}^{n} \left|\frac{\tilde{\varphi}_i - \varphi_i}{(1/n)\sum_{i=1}^{n} \varphi_i}\right|, \quad (4.4)$$

$$PCT = \frac{1}{n}\sum_{i=1}^{n} \mu_i, \qquad \mu_i = \begin{cases} 1, & \text{if } (\tilde{\varphi}_{i+1} - \varphi_i)(\varphi_{i+1} - \varphi_i) > 0 \\ 0, & \text{otherwise,} \end{cases} \quad (4.5)$$

where $\tilde{\varphi}$ represents the predicted value, $\varphi$ represents the actual value, and $n$ is the total number of data points.
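The five criteria are straightforward to implement; a NumPy sketch, assuming pred and actual are aligned one-day-ahead predictions and true closing prices:

```python
import numpy as np

def evaluate(pred: np.ndarray, actual: np.ndarray) -> dict:
    err = pred - actual
    mae = np.mean(np.abs(err))                          # eq. 4.1
    rmse = np.sqrt(np.mean(err**2))                     # eq. 4.2
    mape = 100 * np.mean(np.abs(err / actual))          # eq. 4.3
    amape = 100 * np.mean(np.abs(err / actual.mean())) # eq. 4.4
    # PCT (eq. 4.5): fraction of days whose predicted move direction is right
    pct = np.mean((pred[1:] - actual[:-1]) * (actual[1:] - actual[:-1]) > 0)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "AMAPE": amape, "PCT": pct}
```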

4.3 Experiment Results. To demonstrate the utility of our system, we tested seven models: moving average (MA), exponential moving average (EMA), ARMA, GARCH, SVM, FFNN, and LSTM. The closing price of the next day is to be predicted. All predictive experiments were performed in Python using the scikit-learn and TensorFlow packages. The models differ in how they take their inputs. The basic trading data were accepted by MA, EMA, ARMA, GARCH, SVM, and FFNN as the input variables. The LSTM model has four types of input method (IM): IM-A involves merely basic trading data; IM-B involves basic trading data and technical indicators; IM-C involves basic trading data and technical indicators, with all input variables processed by PCA; and IM-D involves basic trading data and technical indicators, with only the technical indicators processed by PCA. Thus, LSTM coupled with IM-D represents the model we propose. These different input methods are shown in Figure 4.

In the first experiment, the next day's closing price of the S&P 500 is predicted. This experiment has four types of test data: 50 days represent the short-term prediction; 100 and 200 days denote the medium-term prediction; and 400 days represent the long-term prediction. For each type of test set, the remaining data are used as training data.


Figure 4: Different input methods.

Table 3: Prediction Performance for 50 Days: S&P 500.

Model         PCT    MAPE    AMAPE   MAE      RMSE
MA            0.52   0.7692  0.7649  16.4133  21.3053
EMA           0.52   0.6845  0.6832  14.6616  17.4182
ARMA          0.52   0.6539  0.6604  14.0052  16.8417
GARCH         0.52   0.6205  0.6115  12.7812  16.2205
SVM           0.42   0.5875  0.5859  12.5727  16.0482
FFNN          0.50   0.5730  0.5717  12.2671  15.9717
LSTM + IM-A   0.60   0.5676  0.5680  12.1879  16.2729
LSTM + IM-B   0.62   0.5743  0.5733  12.3263  16.8214
LSTM + IM-C   0.62   0.5527  0.5521  12.2065  16.0016
LSTM + IM-D   0.66   0.4904  0.4882  10.5855  14.7756

Tables 3 to 6 present the experimental results for this data set. They suggest that the accuracy achieved by our model is better than that achieved by the other models for predicting the closing price in the short, medium, and long terms: our model has the smallest MAE, RMSE, MAPE, and AMAPE and the largest PCT. Figures 5 to 8 present the time series and scatter charts comparing the predicted results and actual values, for test data sets of 200 days and 400 days of trading data, respectively. In the scatter diagrams, the closer the predicted results lie to the straight line y = x, the more accurate the prediction.


Table 4: Prediction Performance for 100 Days: S&P 500.

Model         PCT    MAPE    AMAPE   MAE      RMSE
MA            0.51   1.0001  0.9949  21.3715  28.9837
EMA           0.52   0.6700  0.6628  14.2362  18.3327
ARMA          0.53   0.6305  0.6182  13.9367  18.1342
GARCH         0.53   0.6093  0.5925  13.7492  17.9020
SVM           0.44   0.5784  0.5721  12.2886  17.6524
FFNN          0.51   0.5670  0.5609  12.0466  17.4330
LSTM + IM-A   0.53   0.5360  0.5311  13.8091  17.8717
LSTM + IM-B   0.56   0.6306  0.6268  13.4642  17.4240
LSTM + IM-C   0.57   0.6153  0.6127  12.1606  16.9824
LSTM + IM-D   0.65   0.5195  0.5092  11.2820  15.4552

Table 5: Prediction Performance for 200 Days: S&P 500.

Model         PCT    MAPE    AMAPE   MAE      RMSE
MA            0.48   1.2415  1.2273  25.6319  32.3547
EMA           0.54   0.7145  0.7086  14.7985  19.4146
ARMA          0.54   0.6836  0.6787  14.4276  19.0508
GARCH         0.55   0.6533  0.6522  13.9206  18.7235
SVM           0.48   0.6229  0.6152  12.8311  17.5105
FFNN          0.50   0.6164  0.6068  12.6720  17.4707
LSTM + IM-A   0.48   0.5940  0.5871  12.2613  17.7404
LSTM + IM-B   0.55   0.6318  0.6243  13.0388  17.0834
LSTM + IM-C   0.57   0.5999  0.5941  12.4080  17.0542
LSTM + IM-D   0.68   0.5219  0.5137  11.8897  16.5715

Table 6: Prediction Performance for 400 Days: S&P 500.

Model         PCT     MAPE    AMAPE   MAE      RMSE
MA            0.485   1.6310  1.6411  33.9279  40.9691
EMA           0.501   0.9329  0.9265  19.1532  24.6726
ARMA          0.503   0.9107  0.9005  18.7931  24.2427
GARCH         0.505   0.8528  0.8496  17.2482  22.9740
SVM           0.504   0.7907  0.7799  16.1231  21.8863
FFNN          0.508   0.7567  0.7451  15.4032  21.1726
LSTM + IM-A   0.495   0.7240  0.7145  14.7709  20.4668
LSTM + IM-B   0.520   0.7394  0.7287  15.0640  20.5520
LSTM + IM-C   0.528   0.7320  0.7182  14.8479  20.5808
LSTM + IM-D   0.655   0.5875  0.5749  12.1517  16.8977


Figure 5: Predictive result series chart of 200 days by our model: S&P 500.

Figure 6: Predictive result scatter chart of 200 days by our model: S&P 500.

The last two experiments compare the performance of the different ANN models on data sets from NASDAQ and AAPL. Because the emphasis of these experiments is on predicting stock prices from the short-term and medium-term perspectives, the test sets have 50 days, 100 days, and 150 days of trading data, respectively. In fact, short-term and medium-term predictions have more practical meaning in stock prediction.


Figure 7: Predictive result series chart of 400 days by our model: S&P 500.

Figure 8: Predictive result scatter chart of 400 days by our model: S&P 500.

Tables 7 to 12 suggest that our model has the best performance. Figures 9 to 12 present the time series and scatter charts comparing the predicted results and actual values, for a test data set of 150 days of trading data.


Table 7: Prediction Performance for 50 Days: NASDAQ.

Model         PCT    MAPE    AMAPE   MAE      RMSE
FFNN          0.54   0.6970  0.6842  34.1660  47.8931
LSTM + IM-A   0.54   0.6694  0.6569  32.8022  46.7781
LSTM + IM-B   0.54   0.7410  0.7282  36.3618  52.2088
LSTM + IM-C   0.58   0.6694  0.6583  32.8747  47.2671
LSTM + IM-D   0.66   0.5278  0.5185  29.3826  38.7946

Table 8: Prediction Performance for 100 Days: NASDAQ.

Model         PCT    MAPE    AMAPE   MAE      RMSE
FFNN          0.50   0.7190  0.7136  35.1073  45.5271
LSTM + IM-A   0.51   0.6700  0.6652  32.7279  43.8147
LSTM + IM-B   0.54   0.6749  0.6702  32.9698  44.0891
LSTM + IM-C   0.55   0.6620  0.6540  32.1746  42.8401
LSTM + IM-D   0.65   0.5569  0.5509  30.0238  39.6445

Table 9: Prediction Performance for 150 Days: NASDAQ.

Model         PCT     MAPE    AMAPE   MAE      RMSE
FFNN          0.480   0.8939  0.8598  41.3120  54.2980
LSTM + IM-A   0.507   0.7931  0.7796  37.4567  51.0865
LSTM + IM-B   0.527   0.8070  0.7946  38.1778  51.4175
LSTM + IM-C   0.573   0.7958  0.7808  37.5156  51.9173
LSTM + IM-D   0.633   0.5797  0.5749  31.2294  40.3922

Table 10: Prediction Performance for 50 Days: AAPL.

Model         PCT    MAPE    AMAPE   MAE     RMSE
FFNN          0.52   1.4983  1.5053  1.6519  2.3174
LSTM + IM-A   0.54   0.8973  0.8962  0.9834  1.3511
LSTM + IM-B   0.54   0.8878  0.8777  0.9632  1.2878
LSTM + IM-C   0.54   1.1100  1.1109  1.2190  1.2890
LSTM + IM-D   0.64   0.8132  0.8128  0.9088  1.0421

5 Conclusion

This study proposes using LSTM, stock basic trading data, technical indicators, and PCA to realize a stock prediction model. To this end, we introduced the methodology in section 3. We also conducted comparison experiments using historical time series data from the S&P 500, NASDAQ, and AAPL to validate and evaluate the performance of the proposed approach.


Table 11: Prediction Performance for 100 Days: AAPL.

Model         PCT    MAPE    AMAPE   MAE     RMSE
FFNN          0.50   1.7844  1.7561  1.8207  2.3195
LSTM + IM-A   0.51   0.9241  0.9257  0.9597  1.3486
LSTM + IM-B   0.56   0.9654  0.9623  0.9977  1.3465
LSTM + IM-C   0.55   0.8961  0.8944  0.9273  1.2886
LSTM + IM-D   0.65   0.8369  0.8362  0.8885  1.1434

Table 12: Prediction Performance for 150 Days: AAPL.

Model         PCT     MAPE    AMAPE   MAE     RMSE
FFNN          0.453   1.9353  1.8991  1.9630  2.4597
LSTM + IM-A   0.507   0.9704  0.9701  1.0028  1.4146
LSTM + IM-B   0.553   0.9846  0.9795  1.0125  1.4364
LSTM + IM-C   0.553   0.9744  0.9694  1.0020  1.4231
LSTM + IM-D   0.640   0.8536  0.8497  0.9113  1.1694

Figure 9: Predictive result series chart of 150 days by our model: NASDAQ.

Based on the case studies, we show that our prediction system gives slightly higher prediction accuracy for next-day stock closing prices than the comparison models.


Figure 10: Predictive result scatter chart of 150 days by our model: NASDAQ.

Figure 11: Predictive result series chart of 150 days by our model: AAPL.

Thus, the model has shown that using LSTM combined with basic stock trading data, technical indicators, and PCA to predict the behavior and trends of stock prices is a feasible alternative to other techniques.


Figure 12: Predictive result scatter chart of 150 days by our model: AAPL.

References

Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.
Abu-Mostafa, Y. S., & Atiya, A. F. (1996). Introduction to financial forecasting. Applied Intelligence, 6(3), 205–213.
Aghakhani, K., & Karimi, A. (2016). A new approach to predict stock big data by combination of neural networks and harmony search algorithm. International Journal of Computer Science and Information Security, 14(7), 36.
Al-Shiab, M. (2006). The predictability of the Amman stock exchange using the univariate autoregressive integrated moving average (ARIMA) model. Journal of Economic and Administrative Sciences, 4(2), 17–35.
Anaghi, M. F., & Norouzi, Y. (2013). A model for stock price forecasting based on ARMA systems. In Proceedings of the International Conference on Advances in Computational Tools for Engineering Applications (vol. 8267, pp. 265–268). Piscataway, NJ: IEEE.
Chang, P. C., & Wu, J. L. (2015). A critical feature extraction by kernel PCA in stock trading model. New York: Springer-Verlag.
Chen, K., Zhou, Y., & Dai, F. (2015). A LSTM-based method for stock returns prediction: A case study of China stock market. In Proceedings of the IEEE International Conference on Big Data (pp. 2823–2824). Piscataway, NJ: IEEE.
Chen, X. (2015). Stock price prediction via deep belief networks. Ph.D. diss., University of New Brunswick.
Chen, Y., & Hao, Y. (2017). A feature weighted support vector machine and k-nearest neighbor algorithm for stock market indices prediction. Expert Systems with Applications, 80, 340–355.


Chou, H. C., & Wang, D. (2007). Forecasting volatility on the UK stock market: A test of the conditional autoregressive range model. International Research Journal of Finance and Economics, 10, 7–13.
Chourmouziadis, K., & Chatzoglou, P. D. (2016). An intelligent short term stock trading fuzzy system for assisting investors in portfolio management. Expert Systems with Applications, 43(C), 298–311.
Das, D., & Uddin, M. S. (2013). Data mining and neural network techniques in stock market prediction: A methodological review. International Journal of Artificial Intelligence and Applications, 4(1), 117.
Dash, R., & Dash, P. K. (2016). A hybrid stock trading framework integrating technical analysis with machine learning techniques. Journal of Finance and Data Science, 2(1), 42–57.
Dong, Y. (2012). Short-term prediction for the Shanghai Stock Index based on the ARMA-GARCH model. Journal of Chongqing University of Technology, 10, 9.
Fang, Y., Fataliyev, K., Wang, L., Fu, X., & Wang, Y. (2014, July). Improving the genetic-algorithm-optimized wavelet neural network for stock market prediction. In Proceedings of the International Joint Conference on Neural Networks (pp. 3038–3042). Piscataway, NJ: IEEE.
Feng, P., & Cao, X. B. (2011). An empirical study on the stock price analysis and prediction based on ARMA model. Mathematics in Practice and Theory, 41(22), 84–90.
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471.
Gers, F. A., Schraudolph, N. N., & Schmidhuber, J. (2002). Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3, 115–143.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (vol. 9, pp. 249–256).
Göçken, M., Özçalıcı, M., Boru, A., & Dosdoğru, A. T. (2016). Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications, 44, 320–331.
Gupta, A., & Dhingra, B. (2012, March). Stock market prediction using hidden Markov models. In Proceedings of the Students Conference on Engineering and Systems (pp. 1–4). Piscataway, NJ: IEEE.
Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Amsterdam: Elsevier.
Hegazy, O., Soliman, O. S., & Salam, M. A. (2014). A machine learning model for stock market prediction. arXiv:1402.7351.
Hu, Y., Feng, B., Zhang, X., Ngai, E. W. T., & Liu, M. (2015). Stock trading rule discovery with an evolutionary trend following model. Expert Systems with Applications, 42(1), 212–222.
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688.
Jia, H. (2016). Investigation into the effectiveness of long short term memory networks for stock price prediction. arXiv:1603.07893.
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.


Le, Q. V., Jaitly, N., & Hinton, G. E. (2015). A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, Y., Wu, C., Liu, J., & Luo, P. (2014). A combination prediction model of stock composite index based on artificial intelligent methods and multi-agent simulation. International Journal of Computational Intelligence Systems, 7(5), 853–864.
Lin, C. S., & Pai, P. F. (2010). A hybrid ARIMA and support vector machines model in stock price forecasting. Inner Mongolia Electric Power, 132(2), 29902.
Liu, H., & Wang, J. (2011). Integrating independent component analysis and principal component analysis with neural network to predict Chinese stock market. Mathematical Problems in Engineering, 2011, 583–601.
Mathew, O. O., Sola, A. F., Oladiran, B. H., & Amos, A. A. (2013). Prediction of stock price using autoregressive integrated moving average filter (ARIMA(p, d, q)). Global Journal of Science Frontier Research, 13(8), 79–88.
Medsker, L. R., & Jain, L. C. (2001). Recurrent neural networks: Design and applications. Boca Raton, FL: CRC Press.
Mohamad-Saleh, J., & Hoyle, B. S. (2008). Improved neural network performance using principal component analysis on Matlab. International Journal of the Computer, the Internet, and Management, 16(2), 1–8.
Ni, L. P., Ni, Z. W., & Gao, Y. Z. (2011). Stock trend prediction based on fractal feature selection and support vector machine. Expert Systems with Applications, 38(5), 5569–5576.
Oliveira, F. A., Nobre, C. N., & Zárate, L. E. (2013). Applying artificial neural networks to prediction of stock price and improvement of the directional prediction index: Case study of PETR4, Petrobras, Brazil. Expert Systems with Applications, 40(18), 7596–7606.
Pan, Z., Rust, A., & Bolouri, H. (2000). Image redundancy reduction for neural network classification using discrete cosine transforms. In Proceedings of the International Joint Conference on Neural Networks (vol. 3, pp. 149–154). Piscataway, NJ: IEEE.
Rather, A. M., Agarwal, A., & Sastry, V. N. (2015). Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42(6), 3234–3241.
Samarakoon, P. A., & Athukorala, D. A. S. (2017). System abnormality detection in stock market complex trading systems using machine learning techniques. In Proceedings of the National Information Technology Conference (pp. 125–130). Piscataway, NJ: IEEE.
Shan, Y., Hoens, T. R., Jiao, J., Wang, H., Yu, D., & Mao, J. C. (2016). Deep crossing: Web-scale modeling without manually crafted combinatorial features. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 255–262). New York: ACM.
Sheta, A., Faris, H., & Alkasassbeh, M. (2013). A genetic programming model for S&P 500 stock market prediction. International Journal of Control and Automation, 6(5), 303–314.
Singh, A., & Sharma, S. K. (2014). Calculation of resonant frequency of hexagonal split ring resonator using ANN. International Journal of Research in Engineering and Technology, 3, 144–147.


Sirignano, J., & Cont, R. (2018). Universal features of price formation in financial markets: Perspectives from deep learning. Social Science Electronic Publishing.
Sun, Q., Che, W. G., & Wang, H. L. (2014). Bayesian regularization BP neural network model for the stock price prediction. In F. Sun, T. Li, & H. Li (Eds.), Foundations and applications of intelligent systems (pp. 521–531). Berlin: Springer.
Takeuchi, L., & Lee, Y. Y. A. (2013). Applying deep learning to enhance momentum trading strategies in stocks (Technical Report). Stanford, CA: Stanford University.
Thomas, P. (2001). A relationship between technology indicators and stock market performance. Scientometrics, 51(1), 319–333.
Tsai, C. F., & Hsiao, Y. C. (2010). Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches. Decision Support Systems, 50(1), 258–269.
Wang, W., Guo, Y., Niu, Z., & Cao, Y. (2010). Stock indices analysis based on ARMA-GARCH model. In Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management (pp. 2143–2147). Piscataway, NJ: IEEE.
Wang, Y. F., Cheng, S., & Hsu, M. H. (2010). Incorporating the Markov chain concept into fuzzy stochastic prediction of stock indexes. Applied Soft Computing, 10(2), 613–617.
Wen, F., Xiao, J., He, Z., & Gong, X. (2014). Stock price prediction based on SSA and SVM. Procedia Computer Science, 31, 625–631.
Wu, M.-C., Lin, S.-Y., & Lin, C.-H. (2006). An effective application of decision tree to stock trading. Expert Systems with Applications, 31, 270–274.
Xie, X. K., & Wang, H. (2016, October). Recurrent neural network for forecasting stock market trend. In Proceedings of the 2016 International Conference on Computer Science, Technology and Application (p. 397). Singapore: World Scientific.
Yaqub, M. U., & Al-Ahmadi, M. S. (2016, August). Application of combined ARMA-neural network models to predict stock prices. In Proceedings of the 3rd Multidisciplinary International Social Networks Conference on Social Informatics (p. 40). New York: ACM.
Ye, Q., & Wei, L. (2015). The prediction of stock price based on improved wavelet neural network. Open Journal of Applied Sciences, 5(4), 115.
Yetis, Y., Kaplan, H., & Jamshidi, M. (2014, August). Stock market prediction by using artificial neural network. In Proceedings of the World Automation Congress (pp. 718–722). Piscataway, NJ: IEEE.
Zhang, C. (2014). Stock price forecast with ARMA-GARCH based on error correction. Journal of Nanjing University of Aeronautics and Astronautics, 3, 8.
Zhong, X., & Enke, D. (2017). Forecasting daily stock market return using dimensionality reduction. Expert Systems with Applications, 67, 126–139.
Zhu, C., Yin, J., & Li, Q. (2014). A stock decision support system based on DBNs. Journal of Computational Information Systems, 10(2), 883–893.

Received February 5, 2018; accepted June 4, 2018.
