Predictability of Financial Market Indexes by Deep Neural Network
2017 International Symposium on Nonlinear Theory and Its Applications, NOLTA2017, Cancun, Mexico, December 4-7, 2017

Tomoya Onizawa, Takehiro Suzuki, and Tomoya Suzuki
Graduate School of Science and Engineering, Ibaraki University, Nakanarusawa-cho 4-12-1, Hitachi-shi, Ibaraki, 316-8511, Japan.
Email: [email protected], [email protected], [email protected]

Abstract—We investigated the predictability of TOPIX (Tokyo Stock Price Index), a typical stock market index, by using various deep neural networks to learn the nonlinear dynamics hidden in complex price movements. If there are no such dynamics, that is, if financial systems are random, prediction accuracy would not be improved by deep learning procedures. However, we confirmed this improvement in simulations with real stock data, which means that financial systems are not random and have some predictability.

1. Introduction

As a core concept of traditional financial economics, the efficient market hypothesis (EMH) insists that financial price behavior is completely unpredictable and can be characterized by a random walk. However, the EMH is merely a hypothesis, and behavioral economics in particular has argued that the EMH is limited by the psychological biases of market participants. Moreover, in recent years it has become easier to obtain real financial data, and therefore it is possible to discuss the validity of the EMH by using real data.

Because the EMH insists that financial markets are unpredictable, showing that they are predictable would be counter-evidence against the EMH. For this purpose, we could examine the predictability of individual stocks, but market indexes such as TOPIX and the Nikkei Stock Average (Nikkei 225) are more universal descriptions of market behavior. In particular, TOPIX is composed of the market capitalization of all the companies listed on the first section of the Tokyo Stock Exchange (TSE). Therefore, the present study focuses on TOPIX as a universal example to discuss the predictability of stock markets and the validity of the EMH.

As a nonlinear prediction model, we use the neural network framework. Although we could apply the time-series data of TOPIX itself to a neural network, it might be better to use all the stocks composing TOPIX as explanatory variables, because TOPIX depends on them. However, because the dimension of the input layer then becomes very large, the usual three-layer neural network might not work well due to the curse of dimensionality. For this reason, we need multilayered neural networks, but then the vanishing gradient problem can occur in the back-propagation algorithm. To moderate this problem, we also use the deep neural network (DNN), which includes pre-training by the autoencoder [1]. In addition, the dropout technique [2] can also be included in the DNN by suspending some neurons at random to restrain the over-fitting problem.

If the EMH is true, the above attempts must be useless. However, if the prediction accuracy of TOPIX is improved, this means that stock markets have some patterns against the random walk and some predictability against the EMH. To verify this possibility, we perform predictive simulations with real stock data.

2. Traditional Neural Networks

2.1. Three-Layer Neural Network (3NN)

The most typical neural network is composed of three layers: an input layer, a hidden layer, and an output layer. First, let us denote the output value of the j-th neuron in the hidden layer as

y_j = f\left( \sum_{i=1}^{I} w_{ij} x_i + b_j \right), \quad \text{where } f(u) = \max(u, 0). \quad (1)

Here, x_i is the output value of the i-th neuron in the input layer, w_{ij} is the connection weight from the input layer to the hidden layer, b_j is the bias of the hidden layer, and f is a nonlinear function. Then, i = 1 ∼ I and j = 1 ∼ J.

Similarly, the output value of the k-th neuron in the output layer is denoted as

z_k = g\left( \sum_{j=1}^{J} w_{jk} y_j + b_k \right), \quad \text{where } g(u) = u. \quad (2)

Here, w_{jk} is the connection weight from the hidden layer to the output layer, b_k is the bias of the output layer, and g is the identity function. Then, k = 1 ∼ K.
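As a concrete illustration of Eqs. (1) and (2), the following is a minimal sketch of the 3NN forward pass in Python with NumPy; the layer sizes I, J, K, the random initialization, and the dummy input are hypothetical choices made only for illustration.

import numpy as np

rng = np.random.default_rng(0)
I, J, K = 8, 4, 1                            # illustrative input/hidden/output sizes

W_ij = rng.normal(scale=0.1, size=(I, J))    # weights w_ij from input to hidden layer
b_j = np.zeros(J)                            # hidden-layer biases b_j
W_jk = rng.normal(scale=0.1, size=(J, K))    # weights w_jk from hidden to output layer
b_k = np.zeros(K)                            # output-layer biases b_k

def forward(x):
    # Eq. (1): y_j = f(sum_i w_ij x_i + b_j) with f(u) = max(u, 0)
    y = np.maximum(W_ij.T @ x + b_j, 0.0)
    # Eq. (2): z_k = g(sum_j w_jk y_j + b_k) with the identity g(u) = u
    z = W_jk.T @ y + b_k
    return y, z

x = rng.normal(size=I)                       # dummy input vector x_i
y, z = forward(x)
print(z)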
Next, the model parameters w_{ij}, w_{jk}, b_j, and b_k are trained by the back-propagation algorithm. Moreover, for dynamic adjustment of the learning rate, we apply adaptive moment estimation (Adam) [3], which estimates the mean m(t) and variance v(t) of the gradient of the error function E(t) (t = 1, 2, ..., T). Here, E(t) is the squared error between a predicted value and its true value at time t. The update equations of these estimates are as follows:

m(t + 1) = \beta_1 m(t) + (1 - \beta_1) \nabla E(t), \quad (3)
v(t + 1) = \beta_2 v(t) + (1 - \beta_2) \nabla E(t)^2. \quad (4)

Next, these values are modified as follows:

\hat{m}(t + 1) = m(t + 1) / (1 - \beta_1), \quad (5)
\hat{v}(t + 1) = v(t + 1) / (1 - \beta_2). \quad (6)

Here, β1 and β2 are used to prevent a rapid decline of the learning rate. Moreover, β1 = 0.9 and β2 = 0.999 are recommended in the original study [3], and therefore we adopt the same values.

Finally, each model parameter of the neural network is renewed by back propagation with the above values:

w_{jk}(t + 1) = w_{jk}(t) - \alpha \hat{m}(t + 1) / (\sqrt{\hat{v}(t + 1)} + \epsilon), \quad (7)
w_{ij}(t + 1) = w_{ij}(t) - \alpha \hat{m}(t + 1) / (\sqrt{\hat{v}(t + 1)} + \epsilon), \quad (8)
b_k(t + 1) = b_k(t) - \alpha \hat{m}(t + 1) / (\sqrt{\hat{v}(t + 1)} + \epsilon), \quad (9)
b_j(t + 1) = b_j(t) - \alpha \hat{m}(t + 1) / (\sqrt{\hat{v}(t + 1)} + \epsilon). \quad (10)

Here, α = 0.001 and ϵ = 10^{-8} are recommended in Ref. [3], and therefore we adopt the same values.

2.2. Deep Neural Network (DNN)

As the number of neurons in the input layer becomes larger, the relationship between input and output data becomes more complex, and therefore we need a multilayered neural network (MNN) to enhance the learning ability. Moreover, in terms of the curse of dimensionality, the MNN is useful for dimension reduction, which is a sort of feature selection. However, with the back propagation mentioned above, the MNN suffers from the vanishing gradient problem, due to which the modification of the model parameters becomes weaker the closer they are to the input layer.

Recently, however, deep learning was invented, and one of its core concepts is the autoencoder [1], which can be used for the pre-training of the MNN to moderate the vanishing gradient problem. Here, this deep learning model is called the deep neural network (DNN), and we also apply this model to examine whether the predictability of TOPIX can be improved. Moreover, the dropout [2] is also a core concept of deep learning, used in order to reduce the over-fitting problem.

3. Stacked Autoencoder

Some autoencoders are stacked for the pre-training of the DNN as shown in Fig. 1. An autoencoder is just a 3NN of Eqs. (1) and (2), and its model parameters can be learned by the back propagation mentioned in Sect. 2.1. However, the number of neurons in the hidden layer is less than that in the input layer, and the number of neurons in the output layer is the same as in the input layer. Namely, if we can reproduce the input data at the output layer, it can be said that the hidden layer worked as a dimension reduction and that the encoded input data can be decoded without losing essential information.

If we compose a DNN that has more than four layers, we stack autoencoders as shown in Fig. 1 for the pre-training of the DNN. First, we remove the output layer of the first autoencoder and consider its hidden layer as the input layer of the second autoencoder. Next, we train the second autoencoder by the back-propagation algorithm, and repeat the same procedure as many times as needed. Finally, the output layer is connected to the hidden layer of the final autoencoder, where the original input data have been compressed into a low dimension by the stacked autoencoders. Only the output layer is initialized at random. In our study, the number of neurons in each hidden layer was reduced by 90% compared to its input layer. After the pre-training, the whole DNN is trained again by the back-propagation algorithm.

Although the vanishing gradient problem happens again in the pre-trained DNN, the whole network, even near the input layer, has already been trained. Therefore, this problem is not serious.
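The greedy layer-wise pre-training of Sect. 3 can be sketched as follows. This is only an illustration assuming NumPy, with plain gradient descent in place of the Adam update of Eqs. (3)-(10); the layer sizes, learning rate, epoch count, and dummy data are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden_dim, lr=0.01, epochs=200):
    # One autoencoder: a 3NN with a ReLU hidden layer narrower than the input
    # layer and a linear output layer of the same size as the input layer.
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        H = np.maximum(X @ W_enc + b_enc, 0.0)   # encode into the hidden layer
        X_hat = H @ W_dec + b_dec                # decode back to the input dimension
        err = X_hat - X                          # reconstruction error
        gW_dec = H.T @ err / n                   # gradients of the squared error
        gb_dec = err.mean(axis=0)
        dH = (err @ W_dec.T) * (H > 0.0)         # back-propagate through the ReLU
        gW_enc = X.T @ dH / n
        gb_enc = dH.mean(axis=0)
        W_dec -= lr * gW_dec; b_dec -= lr * gb_dec
        W_enc -= lr * gW_enc; b_enc -= lr * gb_enc
    return W_enc, b_enc

def pretrain_stack(X, hidden_dims):
    # Stack autoencoders: the hidden layer of one becomes the input of the next.
    layers, H = [], X
    for h in hidden_dims:
        W, b = train_autoencoder(H, h)
        layers.append((W, b))
        H = np.maximum(H @ W + b, 0.0)           # compressed representation so far
    return layers                                # pre-trained encoder weights

X = rng.normal(size=(256, 100))                  # dummy stand-in for the input data
encoder_layers = pretrain_stack(X, hidden_dims=[70, 49, 34])  # illustrative shrinking sizes

After this pre-training, the weights in encoder_layers would initialize the hidden layers of the DNN, a randomly initialized output layer would be attached, and the whole network would be fine-tuned by back propagation as described above.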
4. Predictive Simulations

In this study, we predict the return rate y(t) of the TOPIX futures during the daytime, from the open (9:00) to the close (15:00), by using the return rates of the individual stocks x_i(t) (i = 1, 2, ..., I) during the night time, from the previous close to the open. Namely, y(t) is the output of the neural network and the set of x_i(t) is its input:

y(t) = \frac{c(t) - o(t)}{o(t)}, \quad (11)
x_i(t) = \frac{o_i(t) - c_i(t - 1)}{c_i(t - 1)}, \quad (12)

where c(t) and o(t) are the closing and opening prices of the TOPIX futures, and o_i(t) and c_i(t - 1) are the opening and closing prices of the i-th stock.
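As a small illustration of Eqs. (11) and (12), the following sketch computes the two return rates from arrays of opening and closing prices, assuming NumPy; the array shapes and dummy prices are hypothetical.

import numpy as np

def topix_daytime_return(o, c):
    # Eq. (11): y(t) = (c(t) - o(t)) / o(t) for the TOPIX futures
    return (c - o) / o

def overnight_stock_returns(stock_open, stock_close):
    # Eq. (12): x_i(t) = (o_i(t) - c_i(t-1)) / c_i(t-1); the arrays have shape (T, I),
    # so the result starts at t = 1 because the previous close is needed.
    return (stock_open[1:] - stock_close[:-1]) / stock_close[:-1]

rng = np.random.default_rng(0)
stock_open = 100.0 + rng.normal(size=(5, 3))     # dummy opening prices, T = 5, I = 3
stock_close = 100.0 + rng.normal(size=(5, 3))    # dummy closing prices
x = overnight_stock_returns(stock_open, stock_close)   # network inputs x_i(t)
y = topix_daytime_return(1800.0, 1812.0)                # network output y(t) for one day
print(x.shape, y)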