
USING MACHINE LEARNING TECHNIQUES IN THE STOCK MARKET

A Project Presented to the Faculty of California State Polytechnic University, Pomona

In Partial Fulfillment Of the Requirements for the Degree Master of Science In Economics

By David Licerio

2018

SIGNATURE PAGE

PROJECT: USING MACHINE LEARNING TECHNIQUES IN THE STOCK MARKET

AUTHOR: David Licerio

DATE SUBMITTED: Spring 2018

Economics Department

Dr. Craig Kerr Project Committee Chair Economics

Dr. Carsten Lange Economics

Dr. Bruce Brown Economics

ACKNOWLEDGMENTS

I would like to thank my professor, Dr. Craig Kerr, for all of his help, and my family for all of their love and support.

ABSTRACT

The objective of this paper is to forecast stock market prices using machine learning techniques. First, a statistical analysis of the Amazon stock is carried out. For further analysis, an autoregressive integrated moving average (ARIMA) model is fitted to the data and a forecast is made. After the ARIMA and statistical analysis, machine learning techniques are applied to the stock to make another forecast of the returns to the Amazon stock. The returns are compared to a buy and hold strategy. The machine learning model is a multi-layer perceptron (MLP). The model uses the following technical indicators: relative strength index, moving average convergence divergence, commodity channel index, stochastic oscillator, and Williams' accumulation distribution.

Contents

1.1 Introduction

2.1 Review of Literature

3.1 Statistical Analysis

4.1 The Model

4.2 Conclusion

Bibliography

List of Tables

3.1 ARIMA Summary

3.2 Forecasted ARIMA Values

List of Figures

2.1 Support Vector Machine Regression

3.1 Amazon Close Prices

3.2 Histogram of Daily Returns

3.3 Fitted ARIMA Model (0,1,0)

3.4 Forecasted ARIMA

3.5 ARIMA ACF

3.6 ARIMA PACF

4.1 Multi-Layer Perceptron

4.2 Relative Strength Index and Price

4.3 Moving Average Convergence Divergence

4.4 Neural Network

Chapter 1

1.1 Introduction

Stock market prices follow what is known as a "random walk," meaning that prices from one day to the next are random, which makes it difficult to forecast the next value in the trend.

The efficient market hypothesis states that if all prices reflect current information, then it is not possible to beat the market for a profit (Investopedia, 2018c). However, many statistical techniques are applied to financial data to identify trends and test whether the markets are predictable.

Three authors are covered in the literature review. One commonality between two of the authors is that they use the same form of machine learning, known as a multi-layer perceptron (MLP), a technique that is also used in this paper. The MLP technique performed better than the other machine learning algorithms used, including the single layer perceptron (SLP), radial basis function (RBF), and support vector machine (SVM).

For this paper, Amazon daily adjusted closing prices were gathered from Yahoo Finance from July 1, 2017 to March 28, 2018. An analysis of the returns to the stock is carried out. Next, an ARIMA model is fitted to the data for further analysis of the Amazon stock. Finally, the machine learning technique known as MLP is implemented and a comparison is made between a buy and hold strategy and the machine learning technique.

Chapter 2

2.1 Review of Literature

Patel et al. (2015) make predictions on stock prices at forecast horizons of 1 to 10, 15, and 30 days. In order to accomplish this, the authors use a two stage fusion technique.

A two stage fusion technique combines two different machine learning techniques in order to make a single forecast. The reason for the two stage technique is that the authors felt previous uses of machine learning techniques relied on older statistical parameters as the forecast horizon increased. That is, as time went on, the predicted values were based on older information and became less useful.

The first stage is support vector machine regression (SVMR). According to Mathworks.com (2018), the goal of this technique is to find a linear function that is as flat as possible (minimizing beta), subject to constrained residuals. In other words, the goal is to find a line that separates the data into two sections based on specified properties of the data.

SVMR is used to organize and classify data in regression analysis. For example, if you had a cluster of data along the x-axis of a graph and a cluster of data aligned along the y-axis, you would want a line coming from the point of origin, (0,0), at a 45° angle. That line would be the linear function we are looking for. An example can be seen in Figure 2.1.

The line separating the two groups of dots at an angle is the linear function that would be created by the SVMR, which separates the dots based on their color properties.

Figure 2.1: Support Vector Machine Regression
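To make the SVMR idea concrete, the following is a minimal sketch using scikit-learn's SVR on synthetic data; the data, kernel choice, and epsilon value are illustrative assumptions rather than the exact setup of Patel et al. (2015).

    import numpy as np
    from sklearn.svm import SVR

    # Synthetic example: a noisy linear relationship between x and y.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 2.0 * X.ravel() + rng.normal(0, 1.0, size=100)

    # Epsilon-insensitive SVR: fit a function that is as flat as possible
    # while keeping most residuals inside the epsilon tube.
    model = SVR(kernel="linear", C=1.0, epsilon=0.5)
    model.fit(X, y)

    print("slope:", model.coef_)              # the fitted linear function
    print("intercept:", model.intercept_)
    print("prediction at x=5:", model.predict([[5.0]]))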

The second stage of the technique used by Patel et al. (2015) is a hybrid model of an artificial neural network (ANN), random forest (RF), and SVMR. According to Van Greven and Bohte (2017), an artificial neural network is a computer system modeled on biological brains that is meant to learn something without having been explicitly designed to do so. ANNs generally consist of three layers. The first is an input layer of neurons which sends data to a second layer. The second layer then relays that data to the third layer for output. An ANN learns based on the inputs it is given, and the output achieved is based on what it has learned from those inputs. ANNs are considered non-linear statistical modeling tools and are used to find complex patterns in data. The advantage of ANNs compared to other statistical techniques is their ability to learn.

A random forest is used in regression by using what are known as decision trees. Investopedia (2018b) states that decision trees consist of branches, which evaluate data, and leaves, which hold the conclusions about the data. According to Vidyha (2018), a decision tree splits data into homogeneous sets based on the input that differentiates the variables the most. For example, say we have 10 students, we want to know which students are most likely to play handball during recess, and we have data on age, gender, and height. The decision tree would organize the data and determine which of the variables (age, gender, height) is the greatest help in predicting which students are most likely to play handball during recess; a sketch of this example follows below. According to the site BML (2016), decision trees and ordinary least squares (OLS) can both be used for regression; however, decision trees can also be used for classification. Another difference between the two is that OLS assumes certain conditions on the data hold, while decision trees make no such assumptions. These assumptions include that the data should be stationary, with no severe multicollinearity, no heteroskedasticity, no serial correlation, and an expected error of 0. A stationary variable is one whose mean, variance, and error term do not change over time. No severe multicollinearity means the x variables should not be significantly correlated with one another. Heteroskedasticity means the variance of the error differs across the data. Serial correlation is when the price goes up (or down) and the next price goes up (or down), continuing to follow this pattern.
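The handball example could be coded as below; the ten students and their labels are invented purely for illustration.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy data: [age, gender (0/1), height in cm] for 10 students,
    # labeled 1 if the student plays handball during recess.
    X = [[10, 0, 140], [11, 1, 150], [10, 1, 145], [12, 0, 155], [11, 0, 148],
         [10, 1, 142], [12, 1, 158], [11, 1, 152], [10, 0, 139], [12, 0, 160]]
    y = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]

    # The tree splits on whichever variable differentiates players
    # from non-players the most (in this toy data, gender).
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["age", "gender", "height"]))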

The hybrid model is then compared to single stage machine learning models where ANN, RF, and SVMR are each used alone. Just as in this project, Patel et al. (2015) used technical indicators as inputs to their neural networks, including the relative strength index, the accumulation/distribution oscillator, the moving average, and the commodity channel index. In order to identify which of the algorithms performed best, the authors used mean absolute percentage error (MAPE), mean absolute error (MAE), relative root mean square error (rRMSE), and mean squared error (MSE). Each of these measurements is used to gauge the accuracy of a forecasting model, specifically in trend estimation. MAPE gives the error in terms of a percentage. MAE is calculated by taking the average of the absolute difference between the predicted and actual values. rRMSE is useful for comparing models whose errors are measured in different units. Finally, MSE takes the average of the squared error between the forecasted and actual values. The authors found that the two stage fusion prediction models consisting of ANN and SVMR performed better than the single stage predictions, since those techniques had the lowest MAPE, MAE, rRMSE, and MSE. Patel et al. (2015) found that the best prediction model was the combination of ANN and SVMR.
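A sketch of the four accuracy measures follows; note that rRMSE has several competing definitions in the literature, and RMSE divided by the mean of the actual values is assumed here.

    import numpy as np

    def forecast_errors(actual, predicted):
        actual = np.asarray(actual, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        err = actual - predicted
        mae = np.mean(np.abs(err))                  # mean absolute error
        mse = np.mean(err ** 2)                     # mean squared error
        mape = np.mean(np.abs(err / actual)) * 100  # error as a percentage
        rrmse = np.sqrt(mse) / np.mean(actual)      # RMSE relative to the data's scale
        return {"MAPE": mape, "MAE": mae, "rRMSE": rrmse, "MSE": mse}

    # Example: a 3-day forecast versus what actually happened.
    print(forecast_errors([100.0, 102.0, 105.0], [101.0, 101.0, 106.0]))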

In another article, Usmani et al. (2016) use machine learning techniques to make predictions on the Karachi Stock Exchange. In order to predict the entire market, their model uses oil, gold, silver, interest, and foreign exchange rates as inputs. These inputs are then combined with simple moving average and ARIMA models. The techniques used for comparison include a single layer perceptron (SLP), multi-layer perceptron (MLP), radial basis function (RBF), and SVM. The Usmani et al. (2016) article is similar to this paper in that a multi-layer perceptron is used.

To further explain the techniques used: a single layer perceptron consists of a single layer of input neurons and an output neuron. A neuron, in the context of machine learning, is modeled after the neurons in our brain and acts as a cell which gathers data and sends data out. In this case, an input neuron would gather data, for example the price of an apple along with a randomly generated number. The neuron would then assign each input a weight between 0 and 1. The weights represent the importance of the inputs; in this case, a 1 would be assigned to the price of the apple and a 0 to the randomly generated number. A 0 is assigned because the random number carries no information, while the price of the apple is relevant to the data, so it is given a 1. This data is then sent to the output layer, which receives the weighted sum of the input neurons. In this case, a single layer perceptron made up of 2 input neurons and 1 output neuron would take the following mathematical form: output = (randomly generated input × weight0) + (price of apple × weight1). For this example, the output would be the price of the apple.

The multi-layer perceptron differs from the single layer perceptron in that it contains one or more hidden layers of perceptrons, while a single layer perceptron has no hidden layers.
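The apple example translates almost directly into code; the price and weights below are the ones from the text, and this is a minimal sketch rather than a full perceptron with a bias term and activation function.

    import random

    def single_layer_perceptron(inputs, weights):
        # The output neuron receives the weighted sum of the input neurons.
        return sum(x * w for x, w in zip(inputs, weights))

    apple_price = 1.50
    noise = random.random()      # the irrelevant randomly generated input
    weights = [0.0, 1.0]         # weight 0 for the noise, weight 1 for the price

    output = single_layer_perceptron([noise, apple_price], weights)
    print(output)                # equals the apple price: the noise is zeroed out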

According to Publishing (2018), a hidden layer gets its name from the fact that it is not visible the way the input and output layers are. The hidden layer is where further learning of the algorithm takes place. For example, say the inputs are images of apples. The neurons in the hidden layer would read the images, building up an idea of what an apple looks like, so that the network can more accurately recognize an apple when it comes across one again in the future. Increasing the number of neurons in the hidden layer makes the algorithm better at identifying the apple image. However, if there are too many neurons in the hidden layer, the algorithm may take longer to run.

An RBF is another neural network which also has input, output, and hidden layers. The difference is that in the RBF, the hidden layer consists of a function whose value depends on the distance from a center; for example, the function could be centered at the center of a circle.

The four machine learning algorithms are run separately on the data. It was found that oil prices correlated the most with the market, while the foreign exchange rate had the lowest correlation, meaning it could be eliminated from the model. The researchers concluded that the best performing model was the MLP.

Lastly, Moghaddam, Moghaddam and Esfandyari (2016) use daily stock data to make predictions on the NASDAQ index. An MLP model was used for the forecast. 100 days of price history are used: the first 70 days to train the model and the last 30 days to judge the reliability of the algorithm. Training the model is done by running the algorithm on the selected inputs, reducing the error, and then achieving the desired output. Training the data is what allows the model to learn. Once the patterns are learned, the algorithm has an idea of what the data is doing, and the desired output is achieved, a forecast can be made. In the case of Moghaddam, Moghaddam and Esfandyari (2016), the forecast is made after the first 70 days, on the last 30 days. The actual values of the 30 days are compared with the forecasted values for those 30 days. By comparing the forecasted and actual values, the reliability of the model can be seen. If the forecasted values differ greatly from the actual values, then the model may not be a good one to use. The model is evaluated via the R-squared and RMSE. The R-squared demonstrates how much of the variation in the dependent variable is explained by the independent variables. The authors ran 7 different experiments by training the neural networks with either Levenberg-Marquardt (LM) or one step secant (OSS). Wolfram.com states that LM finds the parameters of the model so that the sum of squared deviations is minimized for non-linear functions. According to Gavin (2017), LM is a combination of the Gauss-Newton and gradient descent methods. Gavin (2017) states that in the gradient descent method "... the sum of the squared errors is reduced by updating the parameters in the steepest-descent direction." Furthermore, Gavin (2017) asserts that in the Gauss-Newton method "... the sum of the squared errors is reduced by assuming the least squares function is locally quadratic, and finding the minimum of the quadratic." According to mathworks.com, OSS is "... a function that updates weight and bias values according to the one-step secant method." With the OSS method, you have a non-linear function and use secant lines to approximate the parameters at which that function is minimized. The researchers found that the highest performing model was the MLP trained with LM, with an R-squared of .974, meaning that 97.4% of the changes in price could be explained by the model.
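Training a network with LM, as MATLAB's trainlm does, is not directly exposed in the common Python libraries, but the LM idea itself can be sketched on a small curve-fitting problem with SciPy; the exponential model and the data here are illustrative assumptions.

    import numpy as np
    from scipy.optimize import least_squares

    # Noisy data generated from y = 2 * exp(1.5 * t).
    rng = np.random.default_rng(1)
    t = np.linspace(0, 1, 50)
    y = 2.0 * np.exp(1.5 * t) + rng.normal(0, 0.05, t.size)

    def residuals(params):
        a, b = params
        return a * np.exp(b * t) - y  # LM minimizes the sum of squared residuals

    # method="lm" selects Levenberg-Marquardt, which blends gradient descent
    # (far from the minimum) with Gauss-Newton (near it).
    fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")
    print("estimated a, b:", fit.x)   # should be close to (2.0, 1.5)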

Chapter 3

3.1 Statistical Analysis

In this section, a statistical analysis of the Amazon stock is carried out, along with an ARIMA model. The statistical analysis and ARIMA model are followed, in the next section, by a comparison of the buy and hold strategy versus the machine learning algorithm. The Amazon adjusted close prices can be seen in Figure 3.1. The adjusted close price accounts for stock splits, dividends, and rights offerings. According to Investopedia (2018a), a stock split is when a company decides to break down its current stock; in a two-for-one split, each share is split in half and there are twice as many shares. For example, if the price of a stock is currently $200, it would be $100 after the split. Dividend payments are made to shareholders for each share owned every quarter. An example of a dividend payment would be a stock trading at $11 with a $1 dividend; the adjusted price would be $10. Rights offerings give shareholders the right to purchase additional shares.

The returns can be seen in Figure 3.2. The y-axis represents the number of days in the data on which a given level of returns came up, the highest count being 20. The x-axis represents the percentage change in returns, which was calculated with the following formula: (P1 − P0)/P0, where P1 is the next day's price and P0 is the original price. We can see that the returns are normally distributed. The peak of the histogram occurs at about .02% returns. There is one outlier located on the right side of the graph. That outlier came from October 27, 2017, when the price jumped $130, one day after Amazon announced that third quarter sales were up 34%.

Figure 3.1: Amazon Close Prices

Figure 3.2: Histogram of Daily Returns
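The daily returns in Figure 3.2 can be reproduced with a few lines of pandas; the simulated prices below are a placeholder standing in for the actual AMZN adjusted close series pulled from Yahoo Finance.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Placeholder series standing in for the AMZN adjusted close prices.
    rng = np.random.default_rng(2)
    prices = pd.Series(1000 + np.cumsum(rng.normal(0, 10, 180)))

    # pct_change() implements (P1 - P0) / P0 for each consecutive pair of days.
    returns = prices.pct_change().dropna()

    returns.hist(bins=50)   # the histogram in Figure 3.2
    plt.xlabel("daily return")
    plt.ylabel("number of days")
    plt.show()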

An ARIMA model was made to conduct further analysis of the Amazon stock. The model that was selected was (0,1,0) and can be seen in Figure 3.3. The (0,1,0) stands for (p,d,q), where p represents the autoregression order, d represents the degree of first differencing, and q represents the moving average order. Since the d value is 1, the model takes one first difference of the data. This signifies that the data follow what is known as a random walk. A random walk means that the data have a random pattern that cannot be foreseen. The random walk formula takes the form Xt = Xt−1 + et, where Xt is the current price, Xt−1 is the previous price, or lag, and et is the error term. The red line on the graph represents the forecast values, while the black line shows the AMZN adjusted close prices. By looking at the ARIMA model, it is evident that the model does follow the trend of the actual price values. Figure 3.4 demonstrates the forecast values given by the ARIMA model as the blue line. As the forecast extends over time, the model becomes less accurate. This loss of accuracy can be seen in the grey area surrounding the blue forecast values. The grey area represents the size of the confidence interval and widens over the course of the forecast. The confidence interval provides us with other possible values that the forecast can take, even though they may be less likely than the central forecasted values.

Figure 3.3: Fitted ARIMA Model (0,1,0)

Figure 3.4: Forecasted ARIMA
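A sketch of the ARIMA(0,1,0) fit and the ten-day forecast behind Figure 3.4 and Table 3.2, using statsmodels; the simulated random walk is again a stand-in for the AMZN adjusted close prices.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Placeholder random walk standing in for the AMZN adjusted close series.
    rng = np.random.default_rng(2)
    prices = pd.Series(1000 + np.cumsum(rng.normal(0, 10, 180)))

    # order=(p, d, q); (0, 1, 0) is a random walk: one first difference,
    # no AR or MA terms.
    result = ARIMA(prices, order=(0, 1, 0)).fit()
    print(result.summary())               # includes the AIC and BIC

    forecast = result.get_forecast(steps=10)
    print(forecast.predicted_mean)        # constant, as in Table 3.2
    print(forecast.conf_int(alpha=0.05))  # 95% bounds that widen over time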

Figures 3.5 and 3.6 present the ACF and PACF of the ARIMA model. According to Nau (2018), ACF stands for autocorrelation function, which tells us whether points in the data are serially correlated by looking at the correlation between a time series and its lags. In the graph shown, the ACF plot shows one significant spike, then drops off. The PACF also tells us whether the data are serially correlated, but instead uses the partial correlations between the residuals and the lags of the time series. The PACF in this case has no significant spikes. Together, the two graphs demonstrate that the model in fact follows a random walk, or (0,1,0). Next, a Ljung-Box test was done, which tests the randomness of the data. The null hypothesis for the Ljung-Box test is that there is no serial correlation, while the alternative is that there is serial correlation. Serial correlation is when prices are a function of past prices; in other words, if prices are going up, they will continue to go up. In this case, the Ljung-Box test shows a high p-value of .8. Thus, we fail to reject the null hypothesis that the data are free of serial correlation.

Figure 3.5: ARIMA ACF
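The ACF/PACF plots and the Ljung-Box test can be reproduced as follows, again on a placeholder random-walk series.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(2)
    prices = pd.Series(1000 + np.cumsum(rng.normal(0, 10, 180)))  # placeholder
    resid = ARIMA(prices, order=(0, 1, 0)).fit().resid

    plot_acf(resid)    # random-walk residuals: no significant lags beyond zero
    plot_pacf(resid)
    plt.show()

    # Ljung-Box: null hypothesis of no serial correlation. A high p-value
    # means we fail to reject the null, consistent with a random walk.
    print(acorr_ljungbox(resid, lags=[10]))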

Table 3.1 gives the AIC and BIC values. The AIC is the Akaike information criterion and takes the form AIC = 2k − 2ln(L), where, according to K Burnham (2002), k is the number of estimated parameters and L is the maximized value of the likelihood function. Ernst Wit (2012) states that the BIC is the Bayesian information criterion and takes the form BIC = ln(n)k − 2ln(L), where k and L have the same meaning and n is the number of observations. Both the AIC and BIC estimate the quality of a model; the lower the AIC and BIC, the better the forecasting model. The AIC and BIC values were lowest for the (0,1,0) model. With other models, such as the (1,0,0), the AIC and BIC values were closer to 1,700; a higher AIC or BIC means the forecast is less reliable. An ARIMA model of (1,0,0) takes the form Xt = ρXt−1 + et, where ρXt−1 is the autoregressive term. (A first difference, by contrast, is calculated by subtracting the previous price from the current price.) Since the other models' AIC and BIC values were higher, the (0,1,0) model is the best fitting model.

Figure 3.6: ARIMA PACF
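The model selection step can be sketched by fitting the candidate orders and comparing their AIC and BIC directly (lower is better); the placeholder series is reused from the sketches above.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(2)
    prices = pd.Series(1000 + np.cumsum(rng.normal(0, 10, 180)))  # placeholder

    # Lower AIC/BIC indicates a better model; (0,1,0) won for the AMZN data.
    for order in [(0, 1, 0), (1, 0, 0)]:
        fit = ARIMA(prices, order=order).fit()
        print(order, "AIC:", round(fit.aic, 2), "BIC:", round(fit.bic, 2))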

Table 3.2 shows the actual forecast values. From the table, the forecasted values stay constant at 1431.42. This could be because daily data are being used and not enough time is given for movement. Looking at the high and low values at the 95% level, the difference between them increases over time. This reflects the fact that as the forecast extends over time, the accuracy of the ARIMA model decreases.

Table 3.1: ARIMA Summary

Coefficients      AIC        BIC
Values          1630.34    1630.36

Table 3.2: Forecasted ARIMA Values

Forecasted Value    Low 95      High 95
1431.42             1390.810    1472.030
1431.42             1373.989    1488.851
1431.42             1361.082    1501.758
1431.42             1350.201    1512.639
1431.42             1340.614    1522.226
1431.42             1331.947    1530.893
1431.42             1323.977    1538.863
1431.42             1316.559    1546.282
1431.42             1309.591    1553.249
1431.42             1303.001    1559.839

Chapter 4

4.1 The Model

The model used in this paper is a multi-layer perceptron. This neural network will be compared to a buy and hold strategy. According to Berlinger (2015), the MLP is in the family of feed-forward, deep neural networks. As mentioned earlier, ANNs are computer networks that vaguely imitate animal brains in that they have similar components. ANNs have nodes, or artificial neurons, which hold and pass information through links similar to the synapses found in biological brains. A neuron computes ∑i xiwi, the weighted sum of its inputs, where x represents the inputs (in this paper, the 6 technical indicators explained later on) and w represents the weights, which take values between 0 and 1. In assigning the weights, 0 is given to data that is less relevant and 1 to data that is most relevant. After the weights are assigned, just as in the apple example described earlier, the output goes through what is known as back propagation. Berlinger (2015) states that in back propagation, the forecasted values are compared with the actual values; if the values do not match, the weights are adjusted to reduce the error, and if they do match, the weights do not change. The weights are adjusted through the weight change formula Δ = η ∗ w ∗ x, where w is the weight, η is the learning rate, and x represents the inputs. The learning rate is how fast the MLP learns and is assigned at the discretion of the researcher. The smaller the learning rate, the longer the algorithm will take to run; however, if the learning rate is too large, the resulting forecasts may not be accurate enough. This feed-forward and back propagation cycle continues until the error term can go no lower. The error term is how much the model's forecasted values differ from the actual data; in this case, it is the difference between the values learned by the MLP and the actual values built from the 6 technical indicators. Once the error is minimized, the output can be produced. Figure 4.1 is a diagram of an MLP gathered from Oriani (2018).

Figure 4.1: Multi-Layer Perceptron
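The feed-forward and back propagation loop can be sketched for the simplest possible case, a single linear output neuron fed by 6 inputs; the values are invented, and a real MLP would add hidden layers and an activation function.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=6)   # stand-ins for the 6 technical indicator values
    w = rng.normal(size=6)   # weights on the links
    target = 0.01            # the actual return we want the network to predict
    eta = 0.02               # the learning rate, chosen by the researcher

    for step in range(100):
        output = x @ w           # feed-forward: weighted sum of the inputs
        error = target - output  # compare the forecasted value with the actual value
        w += eta * error * x     # back propagation: adjust weights to reduce the error

    print("error after training:", target - x @ w)  # shrinks toward zero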

For this paper, 6 neurons are used as inputs. The neurons consist of the 6 technical indicators used. The first of the indicators is the relative strength index (RSI). Figure 4.2 shows the relative strength index and prices. According to Berlinger (2015), the relative strength index is able to tell investors if an asset is oversold or overbought. It does this using the following formula: RSI = 100 − 100/(1 + RS), where RS = (average of up-closes)/(average of down-closes). The RSI moves between 0 and 100; an asset is overbought when the RSI is above 70 and oversold when it is below 30. Whenever the green line (prices) is above the blue line (RSI), there is an upward trend in the price. From early October to the end of October 2017, the RSI trended upward the entire time, coming from the low end. Since the RSI was low, too many investors were overselling, and they should have held on to the stock as it was rising.

Figure 4.2: Relative Strength Index and Price
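A sketch of the RSI computation follows; simple rolling means are used for the up- and down-close averages (Wilder's original smoothing differs slightly), and the price series is a placeholder.

    import numpy as np
    import pandas as pd

    def rsi(prices, period=14):
        # RSI = 100 - 100 / (1 + RS),
        # RS = average of up-closes / average of down-closes.
        change = prices.diff()
        avg_up = change.clip(lower=0).rolling(period).mean()
        avg_down = (-change.clip(upper=0)).rolling(period).mean()
        return 100 - 100 / (1 + avg_up / avg_down)

    rng = np.random.default_rng(4)
    prices = pd.Series(1000 + np.cumsum(rng.normal(0, 10, 60)))  # placeholder
    print(rsi(prices).tail())   # above 70 = overbought, below 30 = oversold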

The next indicator used is the moving average convergence divergence (MACD) indicator, which can be seen in Figure 4.3. The MACD shows the relationship between two moving averages and is calculated by subtracting the 26-day EMA from the 12-day EMA. When the MACD line (red) is below the price (green), it is an indication that the price will rise in the future, because the EMA is below the actual price. The wider the gap between the two, the stronger the indication that the price is going to rise. Throughout January 2018, the MACD stayed below the price line, indicating that throughout that month it would have been best to hold Amazon stock instead of selling. When the MACD crosses the price line from above, it is a buy signal; when it crosses from below, it is a sell signal. According to Berlinger (2015), the MACD can give false alarms, so it is best used only during strong trends, when the gap between the price and the MACD is large.

Figure 4.3: Moving Average Convergence Divergence
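The MACD line is two exponential moving averages and a subtraction; the 9-day signal line added below is the conventional companion and an assumption here, since the text plots the MACD against the price instead.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    close = pd.Series(1000 + np.cumsum(rng.normal(0, 10, 120)))  # placeholder

    # MACD = 12-day EMA minus 26-day EMA of the closing price.
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    macd = ema12 - ema26
    signal = macd.ewm(span=9, adjust=False).mean()  # conventional 9-day signal line

    print(pd.DataFrame({"MACD": macd, "signal": signal}).tail())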

The other technical indicators used are the commodity channel index (CCI), the stochastic oscillator (SO), and Williams' accumulation distribution (WAD). The CCI lets investors know if an asset is being overbought or oversold. The formula for the CCI is (Price − Moving Average)/(.015 × Mean Deviation). Knowing when an asset is being overbought or oversold can give investors a good idea of where the asset is heading. For example, if you can tell with high probability that an asset is being oversold, you can buy it at a discount, knowing that the price will go up in the future. The stochastic oscillator is a momentum indicator which tells us how fast the asset is picking up in price. The stochastic oscillator is constructed using the following formula: S = 100 × (Low − Low14)/(High14 − Low14), where Low represents the low of that day, Low14 represents the low of the past 14 days, and High14 is the high price of the last 14 days. Finally, the WAD is another momentum indicator which lets us know whether investors are more often buying or selling in the market, thus identifying whether it is a buyers' or sellers' market. The WAD takes the form WAD = (Close Price − Open Price)/(High Price − Low Price).
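The remaining three indicators can be coded as the text defines them; the OHLC series is simulated, the 20-day CCI window is a common convention assumed here, and the stochastic oscillator below follows the text in using the day's low (standard versions use the close).

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(6)
    n = 60
    close = pd.Series(1000 + np.cumsum(rng.normal(0, 10, n)))  # placeholder OHLC
    high = close + rng.uniform(1, 5, n)
    low = close - rng.uniform(1, 5, n)
    open_ = close.shift(1).fillna(close.iloc[0])

    # CCI: price's distance from its moving average, scaled by 0.015
    # times the mean deviation (20-day window assumed).
    ma = close.rolling(20).mean()
    mean_dev = (close - ma).abs().rolling(20).mean()
    cci = (close - ma) / (0.015 * mean_dev)

    # Stochastic oscillator over 14 days, as defined in the text.
    low14, high14 = low.rolling(14).min(), high.rolling(14).max()
    so = 100 * (low - low14) / (high14 - low14)

    # WAD-style ratio, as given in the text.
    wad = (close - open_) / (high - low)

    print(pd.DataFrame({"CCI": cci, "SO": so, "WAD": wad}).tail())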

July 3, 2017 to February 2018 was used for training the algorithm. The training data set is what the program uses to learn the next change in price; it serves as a reference so that the predicted values can be compared to the actual values. The input layer consists of 6 neurons made up of the 6 technical indicators. Different networks were tested, with hidden layers of 6, 12, and 18 neurons. The neural network gives one output, which is the return to the Amazon stock. Low learning rates of .01, .02, and .03 were used. A rule was made so that the network would stop once the 1000th iteration was reached, so that the algorithm would not run forever while the error term is minimized. An iteration is one pass of the model through the input, hidden, and output layers. The network with the lowest RMSE was chosen as the final model. The month of March was used to forecast the returns to the Amazon stock, and the forecast was graphed alongside the buy and hold strategy. The graph can be seen below in Figure 4.4. Looking at the graph, the neural network and the buy and hold strategy performed similarly. The neural network made returns at a slightly quicker rate, but in about the same amount. The buy and hold curve is almost identical to the neural network's; one could slide the neural network's curve to the right and it would fit almost exactly over the buy and hold curve. The difference is that the neural network is much more complex. This is a shining example of the efficient market hypothesis at work: even with advanced computer technology, beating the market is very difficult. This could persuade investors to simply invest in index funds and follow a buy and hold strategy to reduce trading fees.
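A sketch of the model-selection loop described above, using scikit-learn's MLPRegressor; the indicator matrix and returns are random placeholders, and the train/test split stands in for the July-February training window and the March forecast month.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import mean_squared_error

    # Placeholder data: rows = days, columns = the 6 technical indicators;
    # the target is the next-day return to the stock.
    rng = np.random.default_rng(7)
    X = rng.normal(size=(160, 6))
    y = rng.normal(scale=0.02, size=160)
    X_train, X_test, y_train, y_test = X[:140], X[140:], y[:140], y[140:]

    best = None
    for hidden in [(6,), (12,), (18,)]:     # hidden layers of 6, 12, 18 neurons
        for lr in [0.01, 0.02, 0.03]:       # the low learning rates tested
            net = MLPRegressor(hidden_layer_sizes=hidden, learning_rate_init=lr,
                               max_iter=1000,  # stop at the 1000th iteration
                               random_state=0)
            net.fit(X_train, y_train)
            rmse = mean_squared_error(y_test, net.predict(X_test)) ** 0.5
            if best is None or rmse < best[0]:
                best = (rmse, hidden, lr)

    print("chosen network (lowest RMSE):", best)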

Figure 4.4: Neural Network

4.2 Conclusion

Patel et al. (2015) find that the best performing machine learning techniques are fusions of techniques, all of which performed better than the single stage techniques; the best algorithm of all was the combination of ANN and SVMR. Usmani et al. (2016) studied RBF, SLP, MLP, and SVMR and found that the best of those techniques was the MLP, which is used in this paper. Finally, Moghaddam, Moghaddam and Esfandyari (2016) also found that the MLP outperformed other machine learning techniques. What we can conclude from this is that the MLP comes out as one of the more accurate machine learning techniques when it comes to stock market forecasting.

It was found in this paper that the machine learning technique used did not outperform a buy and hold strategy. This could be because many other people in the market are using machine learning techniques, making the markets even more competitively and accurately priced. In the future, it would be interesting to try fusion techniques to see if a better result can be obtained.

Bibliography

Berlinger, Edina, Ferenc Illes, Milan Banai, and Gergely Daroczi. 2015. Mastering R for Quantitative Finance. Packt Publishing.

BML. 2016. "Logistic Regression vs. Decision Trees." https://blog.bigml.com/2016/09/28/logistic-regression-versus-decision-trees/, accessed 2018-04-27.

Ernst Wit, Edwin van den Heuvel, and Jan-Willem Romeijn. 2012. "All models are wrong...": An Introduction to Model Uncertainty. Wiley Publishing.

Gavin, Henri. 2017. "The Levenberg-Marquardt Method for Least Squares Curve-Fitting Problems." Department of Civil and Environmental Engineering, Duke University.

Investopedia. 2018a. "Accumulation/Distribution." https://www.investopedia.com/terms/a/accumulationdistribution.asp, accessed 2018-04-07.

Investopedia. 2018b. "Decision Trees." https://www.investopedia.com/terms/d/decision-tree.asp, accessed 2018-04-20.

Investopedia. 2018c. "Efficient Market Hypothesis." https://www.investopedia.com/terms/e/efficientmarkethypothesis.asp, accessed 2018-04-07.

K Burnham, D Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag Publishing.

Mathworks.com. 2018. "Understanding Support Vector Machine Regression." https://www.mathworks.com/help/stats/understanding-support-vector-machine-regression.html, accessed 2018-04-21.

Moghaddam, Amin Hedayati, Moein Hedayati Moghaddam, and Morteza Esfandyari. 2016. "Stock market index prediction using artificial neural network." Journal of Economics, Finance and Administrative Science, 21(41): 89-93.

Nau, Robert. 2018. "Identifying AR or MA terms in an ARIMA model." http://people.duke.edu/~rnau/411arim3.htm#plots, accessed 2018-05-01.

Oriani, Felipe. 2018. "How should nodes be connected in a neural network?" Stack Overflow. https://stackoverflow.com/questions/33649645/how-should-nodes-be-connected-in-a-neural-network/, accessed 2018-04-11.

Patel, Jigar, Sahil Shah, Priyank Thakkar, and K Kotecha. 2015. "Predicting stock market index using fusion of machine learning techniques." Expert Systems with Applications, 42(4): 2162-2172.

Publishing, Standout. 2018. "Hidden Layers." http://standoutpublishing.com/g/hidden-layer.html, accessed 2018-04-21.

Usmani, Mehak, Syed Hasan Adil, Kamran Raza, and Syed Saad Azhar Ali. 2016. "Stock market prediction using machine learning techniques." 322-327. IEEE.

Van Greven, Marcel, and Sander Bohte. 2017. Artificial Neural Networks as Models of Neural Information Processing. Frontiers.

Vidyha, Analytics. 2018. "Decision Trees Simplified." https://www.analyticsvidhya.com/blog/2016/04/complete-tutorial-tree-based-modeling-scratch-in-python/, accessed 2018-03-01.