Stock market prediction using artificial neural networks

A quantitative study on time delays

AROSHINE MUNASINGHE DAJANA VLAJIĆ

Royal Institute of Technology
DD143X, Bachelor's Thesis in Computer Science
Thesis Supervisor: Pawel Herman, Researcher, PhD, Department of Computational Biology, KTH
Examiner: Örjan Ekeberg

1 June 2015

Stock market prediction using artificial neural networks by Aroshine Munasinghe and Dajana Vlajić

Submitted to KTH Computer Science and Communication on June 1, 2015, in partial fulfillment of the requirements for the degree of Bachelor of Computer Science

Abstract

This report investigates how prediction of stock markets with Artificial Neural Networks (ANN) is affected by altering aspects of data quantities. A short-term and a long-term perspective considering time delays are examined. Inspired by the neurosciences, ANNs have shown great potential in terms of recognising patterns in nonlinear systems. Existing research suggests that the ANN is a well-suited model for predicting stock markets due to its dynamical characteristics. Closing prices of large-caps within the sectors of IT and Telecommunication, represented in the Swedish OMX Stockholm 30 index (OMXS30), have been used as data. The ANNs are implemented as multilayer feedforward networks, trained using supervised learning. To identify specific configurations, the models have undergone extensive testing by mean squared errors and statistical analysis. The results obtained suggest that the short-term perspective is optimally predicted for significantly small numbers of time delays, and that the optimal configurations do not change for increasing quantities of data. No significant conclusions could be drawn from the results for the long-term perspective.

Key words: ANOVA, Backpropagation, Configurations, Stock Prediction, Artificial Neural Networks

Referat

This report investigates how predictions of stock markets using artificial neural networks (ANN) are affected by changing aspects of data quantities. The investigation has been carried out from both a short-term and a long-term perspective, related to the number of time delays. Inspired by neuroscience, ANNs have shown great potential for finding patterns in nonlinear systems. Existing studies show that the ANN is an excellent model for predicting the stock market owing to its dynamical properties. Closing prices of large listed companies within the IT and Telecom sectors, represented on the Swedish market in the OMX30 Stockholm index, constitute the data set. The ANN models are implemented as multilayer feedforward networks, trained using supervised learning. To identify specific configurations, the networks have been tested through extensive evaluations of mean squared errors and statistical analysis. The results show that the optimal configurations for the short-term perspective included smaller time delays and that these did not change with increasing quantities of data. No significant conclusions could be drawn from the results for the long-term perspective.

Keywords: ANOVA, Stock Market, Backpropagation, Artificial Neural Networks, Configurations

Contents

1 Introduction 1
1.1 Problem Statement and Hypothesis ...... 2
1.2 Scope ...... 2
1.3 Motivation ...... 2

2 Background 3
2.1 Stock Market Prediction ...... 3
2.1.1 Company Risks ...... 3
2.1.2 Random Walk Hypothesis ...... 4
2.2 Technical Analysis ...... 4
2.3 Neural Networks ...... 5
2.4 Artificial Neural Networks ...... 5
2.4.1 Architecture of Artificial Neural Networks ...... 5
2.4.2 Backpropagation Algorithm ...... 7
2.4.3 Overfitting ...... 7
2.5 ANN Applied on Stock Data ...... 8

3 Methods 9
3.1 Data Collection ...... 9
3.2 Implementation ...... 10
3.2.1 Identifying Configurations ...... 10
3.2.2 Bayesian Regularisation ...... 11
3.3 Performance Measures ...... 11
3.3.1 Mean Squared Error ...... 11
3.4 Statistical Tests ...... 11
3.4.1 Assessment of Normality ...... 11
3.4.2 Analysis of Variance ...... 13
3.4.3 Assumptions ...... 14
3.4.4 Null Hypothesis ...... 14
3.4.5 Post Hoc Tests ...... 15

4 Results 17

5 Discussion 21
5.1 Analysis of Results ...... 21
5.2 Limitations ...... 22

6 Conclusion 23
6.1 Future Research ...... 23

Bibliography 25

Chapter 1

Introduction

Prediction of stock market data is known as a prominent issue for stock traders. Stock market data has a highly dynamic property due to a conflicting range of influential factors. The issue has been approached for business interests by observing market forces, making assumptions and recognising historical data. Only limited conditions for success have been established for these methodologies. Increased attention from academia in the field of prediction has driven the development of technical and statistical models. In contrast to conventional methodologies practiced by traders, technical methods are confined and restricted to observations of historical data.

One of the most recently examined technical methods is Artificial Neural Networks (ANN). Originating from neural networks, ANNs are mathematical structures originally designed to mimic the architecture, fault-tolerance, and learning capability of the neural networks existing in the human brain [1]. ANNs have been successfully applied for prediction in fields related to medicine, engineering and physics [1]. The methods have been suggested for modelling financial time series due to their capability of mapping nonlinear connections [2], as proposed for stock market data [3]. The performance of ANNs has been compared to statistical approaches, showing significantly improved results for increasing complexity in time series [4]. Further factors affecting the performance of ANNs are the choice of architecture, input data, and quantity of data.

In this report, ANNs are deployed to model nonlinear connections in stock market data under varying quantities of data. This is tested for Nordic large-caps regarding short-term and long-term perspectives. Bayesian regularisation backpropagation has been used to train the networks, supported by statistical testing methods for validating the established results.


1.1 Problem Statement and Hypothesis

The goal of this report is to analyse the quantity of data sufficient for the prediction of stock market prices. Increasing the quantity of data is equivalent to moving further back in time. The report focuses on two aspects:

• Short-term perspective: a maximum of two (2) years of data points is used to predict a day ahead.

• Long-term perspective: a maximum of four (4) years of data points is used to predict a day ahead.

The hypothesis for the performance evaluation is that a small number of time delays, combined with a small quantity of data, produces optimal results for the short-term perspective. For the long-term perspective, a large quantity of data combined with a large number of time delays is expected to give optimal results.

1.2 Scope

The research does not include measuring the performance of the ANN itself, as existing comparative research already supports the usage of this technical model. The extent of historical data is the main focus of this report, for drawing qualified conclusions about optimal configurations. The data used comprises five large-caps within the IT and Telecommunication sectors from the Swedish OMXS30 index.

1.3 Motivation

Despite extensive research in the area of the stock market, no significant guidelines to estimate or predict the market have been established. Various methodologies within technical and statistical analysis have all been used in attempts to predict the price. However, there is profoundly limited research on the quantity of data needed to predict stock markets. Therefore, this report intends to investigate how this factor impacts the performance of ANNs. Since prices are predicted one day ahead, a maximum of two (2) and four (4) years of data has been chosen for the short-term and long-term perspectives, respectively.

Chapter 2

Background

The background provides insight into the issue of predicting stock market data with technical methods. Initially, aspects of influences on stock market behaviour are covered and conventional theories about data fluctuations are brought up. Technical analysis is broadly described in Section 2.2, followed by a detailed investigation of the distinctive properties and internal architecture of Artificial Neural Networks (ANN). Lastly, in Section 2.5, the performance of ANNs is examined in comparison to other methods.

2.1 Stock Market Prediction

The movement of stock prices is triggered by alterations related to supply and demand, generally referred to as market forces. These represent the combined result of factors such as earnings and the impact of social media, strongly related to a company's internal and external properties [5]. From another perspective, data is affected by informed traders and 'noise traders' [6]. Therefore, to generate profit in stock markets, a range of influential factors has to be considered. For the purpose of accurate predictions, traders apply methods of analysis derived either from a fundamental or a technical perspective [5]. The traditional, fundamental approach involves factors relating to the company, such as market position, growth rates and revenue generation [7]. The method leveraged in this report is technical analysis, based on historical fluctuations and further discussed in Section 2.2.

2.1.1 Company Risks

An additional influential factor for risk assessments is the size of companies. The trading community denominates large and small companies as large-cap and small-cap, where size is directly determined by the market capitalisation value. Large-caps have a market capitalisation value of more than EUR 1 billion, and everything below is considered small-cap.


In contrast to large-caps, investments in small-cap stocks are considered more uncertain. Another implication affecting trading is that small-caps have lower liquidity. This might result in constraints such as insufficient amounts of shares and extended selling processes, resulting in lower returns [8]. For new companies listed as small-caps, the absence of historical data exposes them to alterations in trader preferences. The possibility of immediate fluctuations therefore exists for small-caps, implying that they are not compatible with the short-term perspective.

2.1.2 Random Walk Hypothesis

The Random Walk Hypothesis (RWH) suggests that stock data do not follow patterns, and are therefore not eligible for prediction. Extensive research on the topic implies the opposite [9], often stated by observing the outcomes of variance ratio tests. Providing evidence against the RWH, research conducted by Blasco et al. [10] suggests prediction possibilities in the short-term perspective. Developing tests based on assumptions of random variables with the same variance, Lo et al. [9] examined the validity of the RWH on weekly data for two American indices. For the time series used, the researchers suggest that the results reject the RWH. Comprehensive research on the possibilities of rejecting the RWH has been conducted [6], contributing to further explorations of predictions on nonlinear data.

2.2 Technical Analysis

Methods originating in technical analysis use historical data based on the assumed repetition of history, trends in price movements, and absolute market action [7]. Technical analysis does not consider factors about the company and relies on the assumption that public information has no impact on the price [11]. In this respect, technical analysis opposes the Efficient Market Hypothesis (EMH), which states that stock prices reflect all available information [14]. Hence, the EMH suggests that prediction is impossible. A simplification of the theory supporting technical analysis states that an increased supply would cause a price fall, in comparison to the result of an increased demand [13]. Therefore, timing is perceived as key for successful prediction. According to extensive research, technical analysis has shown positive results in terms of prediction [14]. Research on the rewards of technical analysis on the Singaporean market implies that a substantial share of member firms rely heavily on technical analysis [14].


2.3 Neural Networks

A biological neuron in the human brain consists of four basic components: dendrites, soma, axon, and synapses [15]. The neuron is the fundamental processing element. The dendrites are extensions of the soma, acting as input channels [15]. The soma receives input signals through the synapses of other neurons, and these are processed over time [15]. The processes are turned into an output that is sent out to other neurons through the axon and the synapses [15]. Inspired by these biological methods of training, ANNs have been developed as a highly abstract [15] technical counterpart for prediction. In addition, further research [16] proposes that an ANN can find relationships by mapping the inputs and outputs of a system, using general functions and a large set of training data.

2.4 Artificial Neural Networks

Numerous types of Artificial Neural Networks (ANN), including hybrids, have been proposed for information processing since the first model was presented by McCulloch and Pitts in 1943 [17]. Functions, learning algorithms and topology are properties that distinguish the different models. Due to the considerable number of variations, ANNs have been applied in numerous areas, including pattern recognition, data compression [17] and prediction. Detecting nonlinear connections is a prominent strength of ANNs, and therefore an attractive property for modelling the dynamical stock market [2], which is characterised by nonlinear connections. Inspired by biological methods of learning over time, the highly abstract structure of ANNs uses backpropagation algorithms for this purpose, described in Section 2.4.2.

2.4.1 Architecture of Artificial Neural Networks

The architecture of an ANN consists of neurons and layers connected by weights. The network used in this report consists of three layers in total: one input layer, one hidden layer, and one output layer. The two types of networks, single-layer and multi-layer, differ in the number of layers, as suggested by the denominations.


Figure 2.1. A highly abstract model of an ANN.

In Figure 2.1., the architecture of an ANN illustrates a simplified three-layer network: an input layer that feeds input data to the network, modified by the intermediate hidden layer, followed by an output layer where the results are produced. Each input is initially weighted for the neurons in the hidden layer using a backpropagation algorithm, discussed in Section 2.4.2, where the weighted sum is passed through an activation function [18]. Activation functions generally have a sigmoid-like curve. A hyperbolic tangent sigmoid transfer function has been used in this report for this purpose:

y = tansig(v) = 2 / (1 + e^(−2v)) − 1

Figure 2.2. Tan-Sigmoid transfer function.
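The transfer function above can be sketched numerically. The following is an illustrative Python snippet (the thesis used MATLAB's tansig, of which this is the standard definition), showing that the function is algebraically identical to tanh:

```python
import numpy as np

def tansig(v):
    # Hyperbolic tangent sigmoid: y = 2 / (1 + e^(-2v)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * v)) - 1.0

v = np.linspace(-3.0, 3.0, 7)
# tansig(v) equals tanh(v); the exponential form is simply faster to evaluate
print(np.allclose(tansig(v), np.tanh(v)))  # True
```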

The output of each neuron in the hidden layer is multiplied by a weight for each neuron in the output layer, and the products are added together. Consequently, the ANN is considered numerically balanced. The structure of an ANN can be illustrated mathematically. Let W be an n × m matrix, where w_ij represents the weight of the connection from neuron j in Layer A to neuron i in Layer B [18]. Layer A is the input layer consisting of m neurons, and Layer B represents the hidden layer with n neurons. The activation outputs of the neurons in Layer A form a vector a, and the input values to the neurons in Layer B form a vector b.


Hence, the flow in the network can be represented by

W a = b

where learning modifies each w_ij in W [18]. Adjusting the weights applied to input data using a backpropagation algorithm is called the learning or training process of the network. Historical data is usually divided into three sets: training, validation and testing. However, with regard to the algorithm chosen, the data is here divided into two sets, training and testing.
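The flow W a = b can be sketched as a forward pass in Python with NumPy (rather than the MATLAB used in the thesis); the dimensions m and n are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

m, n = 3, 4                        # neurons in Layer A (input) and Layer B (hidden)
W = rng.standard_normal((n, m))    # w_ij: weight from input neuron j to hidden neuron i
a = rng.standard_normal(m)         # activation outputs of Layer A

b = W @ a                          # net input to each hidden neuron: W a = b
hidden_out = np.tanh(b)            # activation function applied element-wise
print(b.shape)                     # (4,)
```

Note that for the product W a = b to be defined, W needs one row per hidden neuron and one column per input neuron.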

2.4.2 Backpropagation Algorithm

The most common type of training technique in a multilayer network is a backpropagation algorithm. A backpropagation algorithm operates by propagating the errors backwards [19]. The algorithm reduces the error between actual and expected results, and is applied until the ANN has learned the data represented by the training set. Bearing in mind that the training starts with random weights, the goal is to adjust the weights in order to minimise the errors. In this report, multilayer networks are used together with the backpropagation algorithm named Bayesian regularisation, more comprehensively described in Section 3.2.2.

2.4.3 Overfitting

According to Geman et al. [20], the ideal way to predict the stock market would be to choose a model that both accurately captures the regularities in its training data and generalises well to unseen data. An issue that can occur when training ANNs is overfitting. The phenomenon occurs when the number of parameters in the network is too close to the total number of data points in the training set. According to Hellström and Holmström [21], an ANN with too high complexity might also overfit with respect to the bias-variance tradeoff.

In short, the bias-variance tradeoff concerns two main sources of error: error due to bias, and error due to variance. Given randomness in the underlying data sets, trained models will produce a range of predictions [22]. In this case, bias measures how far off the predictions are from the correct value [22]. Likewise, variance measures how the predictions vary amongst realisations of the model [22]. These performance measures will be further explained in Chapter 3. The effect of overfitting is poor generalisation to unknown data, and hence poor prediction ability. The most reliable way to obtain accurate weights and models is to have a quantity of data large enough to train a network on complex problems [23].


2.5 ANN Applied on Stock Data

Predicting data from financial markets has gained considerable interest over recent decades. Several linear methods have been deployed for this purpose, particularly in the field of statistics. In spite of adequate results, strong indications [24] of the nonlinear property of stock markets have driven exhaustive research on predictions using different analysis methods. Within the scope of technical analysis, the deployment of ANNs for predicting stock markets has therefore been investigated extensively, showing promising results.

In 2006, research on modelling predictions of the Indian stock market was conducted using weekly closing prices as data inputs. Mohan et al. [25] evaluated the performance of two ANN models for this purpose, and identified more input data as a possible factor for improved results.

In terms of technical analysis methods, Vaisla and Bhatt [4] found that an ANN outperformed a statistical analysis with multiple regression. The performance, tested by the mean absolute percentage error (MAPE), mean squared error (MSE) and root-mean-square error (RMSE), was based on daily data from a time span of two years (1 April 2005 to 30 March 2007). Hence, promising results were found for short-term data for the ANN in comparison to the statistical method.

Another comparative study, conducted by Godknows and Olusanya [26], observed differences in prediction performance between ANNs and statistical models based on the autoregressive integrated moving average (ARIMA). The data represented logarithmic returns of the Nigerian stock market index (NGSEINDX). Before applying the methods, the normalised data was tested using the Hurst coefficient, indicating that no random walk existed. Based on the results of different error functions and a directional change statistic (Dstat), the study implies that the statistical models were outperformed by the ANNs.
From the directional change statistic, the results show that 45 percent of the logarithmic returns were correctly estimated by the ANNs, compared to 25 percent for the ARIMA models. The data used in that study comprised returns from January 1985 to December 2010, with a training ratio of 96 percent.

Chapter 3

Methods

This chapter explains the methods used for the report, including a description of the statistical means applied. A brief introduction on how and where data has been collected and pre-processed is presented in Section 3.1, followed by a detailed explanation of the implementation of the Artificial Neural Network (ANN) in Section 3.2.

3.1 Data Collection

The stock data used in this report is represented by Swedish companies from the OMXS30 index, specifically large-caps within the IT and Telecommunication sectors. In particular, the data represents closing prices for each of the companies, collected from Nasdaq OMX Group, Inc. It has been taken into consideration that the stock market is closed on specific holidays, including weekends.

For the long-term perspective, the data collected spans four years (1st January 2011 - 1st January 2015). The data was pre-processed by taking the mean for every week in the four-year span to minimise nonlinearity. For the short-term perspective, daily data was selected for a two-year span (1st January 2012 - 1st January 2014).

The data was extracted in a comma-separated format (CSV). The stocks used in this report are Axis, Ericsson B, Nokia Oyj, Tele2 B and TeliaSonera.


3.2 Implementation

Figure 3.1. Implementation process for short and long-term perspective.

The ANN was implemented in MATLAB 2014, a programming language used for mathematical computations. The implementation included construction, training, testing and evaluation of the ANN. As Figure 3.1. illustrates, after the data was collected, a nonlinear autoregressive network (narnet) was implemented. The narnet was prepared for training with Bayesian regularisation backpropagation, which is described in Section 3.2.2. The ratio for training and testing was 80:20, respectively, due to the quantity of data. According to Alamili [27], research has been done on training ratios other than 80 percent. However, Alamili highlights that at least 20 percent of the data should be reserved as testing data, especially for nonlinear models.

To evaluate statistically significant differences in the results, a statistical model analysing the variance, ANOVA, was used. The reason for using this tool was mainly to explain the observations. Lastly, a post hoc test was used for further analysis. According to Verial [28], this test clarifies exactly which groups of configurations differ the most.

3.2.1 Identifying Configurations

According to Francis et al. [29], the number of hidden nodes is not a decisive consideration. Hence, one hidden layer was selected and the number of hidden nodes was kept constant, with a maximum of 2 nodes. This range concerns both the short- and long-term perspectives. The reason for these ranges was not solely earlier research, but also the risk of overfitting. Given the ANN structure described in Figure 2.1., a formula was created to check whether the number of network parameters was sufficiently small compared to the quantity of input data (time delays):

(input nodes (time delays) × hidden nodes) + (2 × hidden nodes) + output node

To demonstrate, having a maximum of 10 input nodes, a maximum of 2 hidden nodes and one output node gives a value of 25. The result is compared to the quantity of data available, such as 500 data points. This gives a ratio of 25:500, which is assumed to be a large enough margin to avoid overfitting the neural network.
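The parameter count above can be written as a small helper (an illustrative sketch; the function name is our own, and the formula follows the one stated in the report):

```python
def network_parameters(input_nodes, hidden_nodes, output_nodes=1):
    # (input nodes * hidden nodes) + (2 * hidden nodes) + output node
    return input_nodes * hidden_nodes + 2 * hidden_nodes + output_nodes

params = network_parameters(input_nodes=10, hidden_nodes=2)
print(params)  # 25, compared against e.g. 500 available data points
```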


3.2.2 Bayesian Regularisation

Bayesian regularisation is the backpropagation algorithm used for updating the weights and biases in the network. The main reasons for adopting the algorithm are its reduction of the potential for overfitting and its ability to improve the prediction quality and generalisation of the network [23]. In this report, the method minimised a combination of the squared errors and the weights, as discussed in Section 3.3.1, and then determined the appropriate combination for generalisation.

3.3 Performance Measures

To evaluate the performance of the ANN, different sets of metrics can be used and compared. According to Ambrose [30], the use of performance measures is important as it helps confirm the implications revealed by the results. Hence, the method reduces the chance of results being biased and the misleading effects of relying on a single index for decision making.

3.3.1 Mean Squared Error

The reason for using the mean squared error (MSE) was to obtain the errors before these are put into the statistical analysis tool, which is described in Section 3.4.2. The MSE was obtained by squaring the difference between the obtained output and the actual output, known as the target output, and averaging over all observations.

MSE = (1/N) Σᵢ₌₁ᴺ (Ȳᵢ − Yᵢ)²

In the formula above, Ȳᵢ denotes the predicted value and Yᵢ the observed value, for N observations. In this report, the MSE values were used as input to the ANOVA tests described in Section 3.4.2.
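The MSE computation can be sketched as follows (a minimal illustration with toy values, not data from the thesis):

```python
import numpy as np

def mean_squared_error(predicted, observed):
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    # average of the squared differences between prediction and target
    return np.mean((predicted - observed) ** 2)

# toy example: three predictions against three observed closing prices
print(mean_squared_error([101.0, 102.0, 103.0], [100.0, 102.0, 105.0]))
```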

3.4 Statistical Tests

3.4.1 Assessment of Normality

In order to run the analysis tests, the historical data from the five enterprises was normalised using the standard deviation and mean. The main purpose of normalising was to reduce the range differences in the input variables for numerical stability [21]. In addition, it enhanced the convergence of the training algorithm used [21].

A frequently used method [21] for feature normalisation is to take the mean and the standard deviation over the whole set of predictor variables. The disadvantage of this, especially for predictor variables, is that the training data will then contain future information through the mean and variances. Hence in this report, with regard to the ratio on the historical data, the normalisation was first fitted on the training data. Secondly, the statistical measures of the normalised training data were used to normalise the testing data.


Normalisation using the mean and the standard deviation was computed as

P̂ₜ = (Pₜ − P_mean(t)) / σₚ(t)

where both the mean P_mean(t) and the standard deviation σₚ(t) of the historical data were computed over a fixed length nₚ, containing 80 percent of the data. The feature normalisation of the training and testing data gave the difference, in numbers of standard deviations, between the data and its running mean.
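The procedure of fitting the normalisation on the training portion and reusing those statistics on the test portion can be sketched as follows (synthetic prices and an 80:20 split, as in the report):

```python
import numpy as np

# synthetic closing-price series standing in for the real data
prices = 100 + np.cumsum(np.random.default_rng(2).standard_normal(250))

split = int(0.8 * len(prices))           # 80:20 train/test ratio
train, test = prices[:split], prices[split:]

p_mean, p_std = train.mean(), train.std()  # statistics from training data only
train_norm = (train - p_mean) / p_std
test_norm = (test - p_mean) / p_std        # reuse training statistics: no lookahead
```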

3.4.2 Analysis of Variance

Analysis of variance (ANOVA) assumes that all sample populations are normally distributed. In this report, the sample populations were represented by the enterprises' historical stock data. The groups were all the configurations tested for each company. The function one-way ANOVA determines whether there are significant differences between the means of two or more independent groups. The results produced show the smallest errors for each configuration.

Table 3.1. Layout of the one-way ANOVA table.

Source    SS          df     MS     F         Sig.
Between   SSb         m−1    MSb    MSb/MSw   p-value
Within    SSw         n−m    MSw
Total     SSb + SSw   n−1

Table 3.1, read from the left, presents the results given by the one-way test. The sums of squares describe the variability of the data. There are two sources of variability: the variability within each group (SSw), and the variability between the groups (SSb). The degrees of freedom (df) comprise m − 1 for the m groups being compared, the error degrees of freedom n − m, where n is the number of data points collected, and the total degrees of freedom n − 1. The mean square within (MSw) is estimated from SSw, and the mean square between (MSb) from SSb. MSw estimates the population variance regardless of whether the null hypothesis is true, whereas MSb estimates it only when the null hypothesis is true. The null hypothesis will be described in Section 3.4.4. The F-statistic is the ratio MSb/MSw. Finally, the significance p-value is computed from the F-distribution.

One-way ANOVA is also known as the means model, and is a special case of the linear model

yᵢⱼ = αⱼ + εᵢⱼ

where yᵢⱼ is observation i in group j, αⱼ defines the population mean for each group, and εᵢⱼ is an independent and normally distributed random error with zero mean and constant variance. In this report, the results from the ANOVA were, amongst other things, evaluated by a null hypothesis test.
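The quantities of the one-way ANOVA table can be computed directly; a minimal sketch with toy numbers (not the thesis data), mirroring the Between/Within decomposition:

```python
import numpy as np

def one_way_anova(groups):
    """Return SSb, SSw, MSb, MSw and F for a list of 1-D sample groups."""
    data = np.concatenate(groups)
    grand_mean = data.mean()
    n, m = len(data), len(groups)
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
    msb, msw = ssb / (m - 1), ssw / (n - m)
    return ssb, ssw, msb, msw, msb / msw

groups = [np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0]), np.array([6.0, 7.0, 8.0])]
ssb, ssw, msb, msw, f = one_way_anova(groups)
print(f)  # 21.0; the p-value follows from the F-distribution with (m-1, n-m) df
```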


3.4.3 Assumptions

Before applying one-way ANOVA testing to analyse the data, three important assumptions must be met in order to obtain valid results.

Figure 3.2. The assumptions that must be met to carry out a one-way ANOVA.

All assumptions given in Figure 3.2. have been met accordingly. The independent variables were the five categorical independent groups representing the enterprises. The dependent variables, which were the quantities of data used depending on the configurations, were also approximately normally distributed, verified by taking the difference between the output values and the target values. The populations had homogeneity of variances, meaning that each group had the same population variance σ².

3.4.4 Null Hypothesis

To support that there are significant differences in the group means, a significance test was done in accordance with the null hypothesis. In one-way ANOVA, two hypotheses are taken into consideration:

H₀: µ₁ = µ₂ = … = µₐ

H₁: at least one of the group means differs from the other means,

where µ denotes a group mean and a the number of groups. If the ANOVA gives a significant result, where at least two group means are significantly different from each other, H₁ is accepted. In other words, if no group means differ significantly, the null hypothesis is retained; otherwise it is rejected. Moreover, to reject the null hypothesis H₀, a significance level (alpha, α) is chosen before obtaining any results. The pre-chosen probability in this report was initially α = 0.05. If the p-value from the ANOVA is less than 5 percent, the null hypothesis can be rejected. Judging from the results, the pre-chosen probability could have been set even lower, at α = 0.001, though such a number might give a false sense of security.

Nevertheless, it must be remembered that ANOVA does not give specific information about which groups of configurations have significant differences from each other; it rather indicates that there are groups that differ. In order to determine the specific groups, a post hoc test was used.


3.4.5 Post Hoc Tests

Given the results from the ANOVA, and if the null hypothesis is rejected, the final step is to determine which group means have a significant difference. This is done by a post hoc test for each configuration. A post hoc test controls the experimentwise error rate α = 0.05 in the same way as the one-way ANOVA. There are several types of algorithms available to conduct post hoc analysis. The test chosen in this report was Tukey's honest significant difference criterion. Tukey's honest significant difference (Tukey's HSD) performs pairwise comparisons between group means,

qₛ = (Y_A − Y_B) / SE

where Y_A and Y_B are the group means being compared pairwise. The standard error (SE) is computed from the testing data. The values qₛ from the method are compared to a critical value q_critical, obtained from the studentised range distribution used in Tukey's HSD. If qₛ is larger than the critical value q_critical, the conclusion that the two group means are significantly different can be drawn. According to Lane [21], Tukey's HSD is commonly considered to be robust: it is less likely to reject the null hypothesis incorrectly. Hence, the results after using this method can be regarded with fair confidence.
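The pairwise statistic can be sketched as follows. This is a simplified illustration: in Tukey's HSD the SE is commonly taken as sqrt(MSw / n_per_group) from the ANOVA, and the q_critical threshold comes from studentised range tables, which are omitted here:

```python
import numpy as np

def tukey_q(group_a, group_b, ms_within, n_per_group):
    # q_s = |mean_A - mean_B| / SE, with SE derived from the within-group mean square
    se = np.sqrt(ms_within / n_per_group)
    return abs(np.mean(group_a) - np.mean(group_b)) / se

q_s = tukey_q([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], ms_within=1.0, n_per_group=3)
# q_s would then be compared against q_critical from the studentised range distribution
print(q_s)
```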


Chapter 4

Results

Building on the implications of previous research on ANNs, variations of data quantities have been tested. Both the short-term and the long-term perspective have been investigated, and significant differences between the optimal configurations have been identified with a statistical approach: an analysis of variance (ANOVA) applied to the MSE values, followed by a multiple-comparison test revealing which groups differ significantly.

Figure 4.1. Multiple comparisons of groups with significant differences.

In Figure 4.1, the blue line represents the lowest MSE, the grey lines represent groups within the range of insignificance, and the red lines represent values significantly different from the blue line.

Differentiating the quantity of data for the short-term perspective did not alter the two optimal numbers of delays; hence, the two lowest MSE for each time period were produced by identical configurations. As illustrated by Table 4.1, the optimal number of delays for the short-term perspective was one, for both numbers of hidden neurons tested. The lowest MSE was found for data from 1.5 years back.

For tests run on data for the long-term perspective, equivalent groups of configurational settings gave the optimal predictions. The lowest MSE found for these were 0.8762 and 0.4904, respectively. In contrast to the short-term case, two hidden neurons were used to obtain the lowest MSE. Given the differences in MSE, the results strongly imply that the quantity of data matters for the quality of prediction. Similar results were found for data quantities representing the previous 3 and 3.5 years. Observations on the number of delays for these configurations exhibit paradoxical results: the lowest and highest possible numbers of delays, 4 and 26 weeks, were both optimal for one neuron in the hidden layer. Conclusively, the quantities of data used for these periods produced the lowest MSE for the long-term perspective.

Table 4.1. Optimal results for short-term perspective on daily data.

Period (years)   Data points   MSE               Configurations
2                500           0.1204; 0.1158    1:1; 2:1
1.5              375           0.0790; 0.0766    1:1; 2:1
1                250           0.1950; 0.1856    1:1; 2:1
0.5              125           0.1230; 0.1160    1:1; 2:1
0.25             63            0.6239; 0.5611    1:1; 2:1

Table 4.2. Optimal results for long-term perspective on weekly mean data.

Period (years)   Data points   MSE               Configurations
4                200           0.4904; 0.4932    1:4; 2:4
3.5              175           0.4771; 0.4885    1:4; 2:4
3                150           0.4216; 0.3975    2:4; 2:26
2.5              125           0.3986; 0.4012    1:4; 2:4
2                100           0.8936; 0.8762    1:4; 2:4
1.5              75            0.9489; 0.9106    2:17; 2:26

Small quantities of data showed remarkably poor results for both the short-term and the long-term perspective. As Table 4.1 implies, the optimal configurations for the smallest quantities of short-term data clearly produced the highest MSE of all. The reason for this is believed to be overfitting, which is explained in the discussion in Chapter 5. When comparing the results of short-term and long-term testing, the lowest MSE were found for the short-term perspective. The corresponding configurations for short-term and long-term testing are illustrated in Figure 4.2 and Figure 4.3 below.

Figure 4.2. Differences of optimal MSE found for short-term perspective.

Figure 4.3. Differences of optimal MSE found for long-term perspective.


Chapter 5

Discussion

5.1 Analysis of Results

With the aim to further explore the area of technical analysis for stock prediction, Artificial Neural Networks (ANN) were quantitatively tested for optimisation. The results in this report strongly indicate better results for the short-term perspective than for the long-term perspective. Increasing the quantity of data for the short-term perspective had no impact on the optimal configurations, in contrast to the long-term perspective. Hence, for the short-term perspective, the smallest possible delay (one day) gave the best results, for the data set containing daily prices for 1.5 years.

Considering prediction of long-term data, the best MSE was found for less than half the quantity of data used for the optimal MSE found for the short-term perspective. On the other hand, the lowest MSE (0.3975) for the long-term perspective was found for the configuration with the maximum possible number of delays. A remarkable observation is that the configuration with the lowest numbers of hidden neurons and delays (1:4) gave a low MSE for each data set within the range of 2.5 to 4 years. These values are close to the minimal MSE, with a maximal deviation of 0.0929 for the highest of these. This result suggests that prediction might be sufficiently accurate for all data sets tested from 2.5 to 4 years; hence, increasing the quantity of data might have minimal effect on good predictions. The discussion of what constitutes a good prediction is outside the scope of this report. The reported results do therefore not support the importance of the differences found for the smallest MSE. In addition, the results do not imply any further observable patterns for long-term data, due to the strongly varying number of delays for each data set.

As presented in Table 4.1 and Table 4.2, the MSE shown for the smallest sets of data for both perspectives, 0.25 and 1.5 years respectively, highly indicate overfitting.
This is possibly due to the fact that the configurations have a ratio of parameters to data points that is too close to the limit given by the equation in Section 3.2.1. The parameters and the configurations are combined such that the network predicts the past by complete chance. A system for testing on independent data sets, which would demonstrate the reliability of the findings, was not developed in this report.
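A hold-out check of the kind described here could be sketched as follows. The price series is synthetic and a simple linear autoregression stands in for the ANN; the point is only the principle of measuring MSE on data the model has never seen, where a large gap between training and test MSE would indicate overfitting.

```python
# Sketch of an independent hold-out evaluation: fit on one part of a
# (synthetic) price series, measure MSE on the unseen remainder.
import numpy as np

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 300)) + 100  # synthetic closing prices

delay = 1  # number of time delays (lagged inputs)
X = np.column_stack([prices[i:len(prices) - delay + i] for i in range(delay)])
y = prices[delay:]

split = int(0.8 * len(y))  # 80/20 train/test split, in time order
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Least-squares fit with an intercept term (stand-in for the ANN)
A_train = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

A_test = np.column_stack([X_test, np.ones(len(X_test))])
mse_train = np.mean((A_train @ coef - y_train) ** 2)
mse_test = np.mean((A_test @ coef - y_test) ** 2)

print(f"train MSE: {mse_train:.4f}, test MSE: {mse_test:.4f}")
```

The split is kept in time order rather than shuffled, since shuffling a time series would leak future information into the training set.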


Moreover, short-term data gave the lowest MSE, and is therefore the optimal choice of perspective for prediction with this ANN. This is considered an effect of the distribution and quantity of data, in comparison to the long-term perspective. Conclusively, the hypothesis posed in Section 1.1 was partly confirmed: the optimal results for both time perspectives were found for a small number of time delays. Increasing the quantity of data showed no significant differences in terms of optimal configurations for the short-term perspective. Altering the quantities of data for the long-term perspective resulted in different configurations including variations of time delays. The optimal MSE was found for an intermediate quantity of data for both perspectives, indicating the difficulty of determining whether small or large quantities of data are desirable.

5.2 Limitations

The efficiency of the ANN could not be evaluated, since the results obtained were not compared to other prediction methodologies. A limitation of the results obtained in this report is that the specifications for the selected ANN are based on the quantities of data used in previous successful research under other conditions. Nonetheless, this report does not claim that the same results could not be obtained with different initial conditions for the ANN. In addition, integration with other artificial intelligence methods was not emphasised.

Investigating data from individual enterprises increased the risk of impact from extreme fluctuations, in comparison to indices. On the other hand, choosing companies from the same market within two highly related sectors of the index reduced the chance of high nonlinearity. For the same purpose, only large-caps were considered in this report, due to their likely high liquidity. This is considered to be one of the most influential factors behind the results in this report for one-day-ahead prediction from a short-term perspective.

The distribution of data might also have limited our results. The short-term perspective was solely based on daily data; no further pre-processing was done in terms of distributions for short-term data, as was done for long-term data. By taking the mean of every week during a four-year period, the risk of extreme day-to-day fluctuations was minimised. Further generalisations and distributions could have been applied in the choice of data quantities, for the purpose of confirming the research results.

Chapter 6

Conclusion

An overall conclusion can be drawn that both the short-term and the long-term perspective show significant differences for the optimal MSE, using feedforward networks and statistical analysis models. The research question stated in Section 1.1 has been successfully addressed, essentially to a greater extent than what was expected when it was conceived. Increasing the quantity of data had no impact on the optimal configurations for short-term data; opposite results were found for the long-term data. Finally, the smallest possible time delay showed the best results for the short-term perspective. In contrast, the results for the long-term perspective presented a variation of time delays for different configurations, and hence no conclusions can be drawn for this perspective.

6.1 Future Research

Considering the wide range of factors influencing the results, and in an attempt to imitate a realistic model, the ANN was modelled closely in relation to previous research. Neither growth nor pruning methods were attempted for the selection of network architecture. Further testing could include increasing the number of hidden nodes, or reducing the number of nodes or weights.

For future research, interesting aspects of the performance of the ANN include tests on intraday real-time data. Two weeks of intraday price data for the short-term perspective could be important to certify the profitability of the strategy made in this report. Although the quantity of data might limit the scope of the results, this could possibly result in a more precise prediction model. Tests might also be performed using an ANN with inputs spread over the entire market instead of concentrating on a particular sector. Likewise, regarding the long-term perspective, the collected data could have included other distributions. To demonstrate: with a data collection covering a four-year period, used to predict one day ahead, 60 percent of the collection might contain points from every day, 30 percent might contain points from every week, and 10 percent could be spread even further apart, as far back in the past as the data goes, such as a point from every month. However, this can cause errors, as such input data might cause misinterpretations of the results in either direction. Hence, it is important that the data is of high quality, keeping outliers in the data in mind.

Finally, further research could determine the duration for which a trained ANN system remains valid and effective in prediction before it needs retraining, since the data inputs are strongly related to the proportion of training and testing data. This is related to early stopping: the training is stopped at the point where the performance measure on the cross-validation set has its minimum.
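The early-stopping rule mentioned above can be sketched as a simple patience criterion over a hypothetical per-epoch validation error curve; the curve values are invented for illustration:

```python
# A minimal early-stopping sketch: training stops once the validation
# error has not improved for `patience` consecutive epochs, and the
# model from the validation minimum is kept.
def early_stop_epoch(val_errors, patience=3):
    """Return the epoch index of the validation minimum at which
    training would be stopped."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch  # roll back to the validation minimum
    return best_epoch

# Validation error falls, reaches its minimum, then rises (overfitting).
curve = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.63]
print(early_stop_epoch(curve))  # prints 3, the epoch of the minimum
```

In a real system the per-epoch validation error would come from evaluating the network on a cross-validation set held out from the training data.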

Bibliography

[1] Kar, A., Stock Prediction using Artificial Neural Networks, [online] Available at: http://www.cs.berkeley.edu/~akar/IITK_website/EE671/report_stock.pdf [Accessed 15 May 2015]

[2] Arora, R., 2008, Artificial Neural Networks for forecasting stock price,[online] Available at: http://cims.nyu.edu/~ra1221/IIMA/ANN.pdf [Accessed 20 May 2015]

[3] Nayak, S. C., Misra, B. B., and Behera, H. S., 2014, Impact of Data Normalization on Stock Index Forecasting, [online] Available at: http://www.mirlabs.org/ijcisim/regular_papers_2014/IJCISIM_24.pdf [Accessed 21 May 2015]

[4] Vaisla, K., and Bhatt, A., 2014, An Analysis of the Performance of Artificial Neural Network Technique for Stock Market Forecasting, [online] Available at: http://www.researchgate.net/profile/Dr_Kunwar_Vaisla2/publication/49620536_An_Analysis_of_the_Performance_of_Artificial_Neural_Network_Technique_for_Stock_Market_Forecasting/links/09e4150dc251fe7f69000000.pdf [Accessed 30 June 2015]

[5] Larsen, J., 2010, Predicting Stock Prices Using Technical Analysis and Machine Learning, [online] Available at: http://www.diva-portal.org/smash/get/diva2:354463/FULLTEXT01.pdf [Accessed 27 May 2015]


[6] McMillan, D., 2002, Non-Linear Predictability of UK Stock Market Returns, [online] Available at: http://repec.org/mmfc03/McMillan.pdf [Accessed 27 May 2015]

[7] Murphy, J., 1999, Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications, [online] Available at: http://cdn.preterhuman.net/texts/unsorted2/Stock%20books%20029/John%20J%20Murphy%20-%20Technical%20Analysis%20Of%20The%20Financial%20Markets.pdf [Accessed 29 May 2015]

[8] Cooper, S. K., Grotha, C. J., and Averab, E. W., 1985, Liquidity, exchange listing, and common stock performance, Journal of Economics and Business, Volume 37, Issue 1, p. 19-33

[9] Lo W. A., and MacKinlay C. A., 1988, Stock market prices do not follow random walks: evidence from a simple specification test, [online] Available at: http://rfs.oxfordjournals.org/content/1/1/41.full.pdf+html [Accessed 27 May 2015]

[10] Blasco N., Del Rio, C., and Santamaria R., 2003, The Random Walk Hypothesis in the Spanish Stock Market: 1980 - 1992, Journal of Business Finance & Accounting Volume 24, Issue 5, pages 667-684.

[11] Bakshi, N., 2008, Predicting Stock Prices Using Technical Analysis and Machine Learning, [online] Available at: http://e-collection.library.ethz.ch/eserv/eth:30710/eth-30710-01.pdf [Accessed 26 May 2015]

[12] Burton G. M., 2015, The Efficient Market Hypothesis and Its Critics, [online] Available at: http://www.jstor.org/stable/pdf/3216840.pdf [Accessed 26 May 2015]

[13] Larsen, J. L., 2010, Predicting Stock Prices Using Technical Analysis and Machine Learning, [online] Available at: http://www.diva-portal.org/smash/get/diva2:354463/FULLTEXT01.pdf [Accessed 29 May 2015]

[14] Wing, K. W., Manzur, M., and Chew, B-K., 2012, How rewarding is technical analysis? Evidence from Singapore stock market, [online] Available at: http://repository.hkbu.edu.hk/cgi/viewcontent.cgi?article=1037&context=econ_ja [Accessed 28 May 2015]

[15] Kulasekere, E.C., 2006, Introduction to neural networks, [online] Available at: http://www.ent.mrt.ac.lk/~ekulasek/ami/PartA.pdf [Accessed 28 May 2015]

[16] Jabin, S., 2014, Stock Market Prediction using Feed-forward Artificial Neural Network, [online] Available at: http://www.researchgate.net/profile/Suraiya_Jabin/publication/268152880_Stock_Market_Prediction_using_Feed-forward_Artificial_Neural_Network/links/549ac4610cf2fedbc30e3240.pdf [Accessed 26 May 2015]

[17] Yang, J., 2010, Intelligent Data Mining using Artificial Neural Networks and Genetic Algorithms: Techniques and Applications, [online] Available at: http://wrap.warwick.ac.uk/3831/1/WRAP_THESIS_Yang_2010.pdf [Accessed 30 May 2015]

[18] Yao, J., and Tan, L. C., 1997, A case study on using neural networks to perform technical forecasting of forex, [online] Available at: http://ac.els-cdn.com/S0925231200003003/1-s2.0-S0925231200003003-main.pdf?_tid=1b768d40-fa79-11e4-b60d-00000aab0f02&acdnat=1431636058_dcc76d2a5b71297682d2dd94e16a6f85 [Accessed 28 May 2015]

[19] Gupta, S., Kumar, D., and Sehgal, P., 2012, Minimization of Error in Training a Neural Network Using Gradient Descent Method, [online] Available at: http://www.omgroup.edu.in/ejournal/Papers%20for%20E-Journal/PDF/2.pdf [Accessed 27 May 2015]


[20] Mathworks, 2015, Improve Neural Network Generalization and Avoid Overfitting, [online] Available at: http://se.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html [Accessed 29 May 2015]

[21] Fortmann-Roe, S., 2012, Understanding the Bias-Variance Tradeoff, [online] Available at: http://scott.fortmann-roe.com/docs/BiasVariance.html [Accessed 23 May 2015]

[22] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R., 2014, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, [online] Available at: http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf [Accessed 19 May 2015]

[23] Fang, H., Lai, K. S., and Lai, M., 1994, Fractal structure in Currency Futures Price Dynamic, [online] Available at: http://nccur.lib.nccu.edu.tw/bitstream/140.119/72663/1/169181.pdf [Accessed 14 May 2015]

[24] Dutta, G., Jha, A. K., Laha, P., and Mohan, N., 2005, Artificial Neural Network Models for Forecasting Stock Price Index in Bombay Stock Exchange, [online] Available at: http://www.iimahd.ernet.in/publications/data/2005-10-01_ALaha.pdf [Accessed 16 May 2015]

[25] Godknows, I. M., Olusanya, O. E., 2014, Forecasting Nigerian Stock Market Returns using ARIMA and Artificial Neural Network Models, [online] Available at: http://www.cenbank.org/out/2015/sd/forecasting%20nigerian%20stock%20market%20returns.pdf [Accessed 27 May 2015]

[26] Nasdaq, 2015, Aktier - Aktiekurser för bolag listade på NASDAQ NORDIC, [online] Available at: http://www.nasdaqomxnordic.com/aktier [Accessed 28 May 2015]

[27] Hervé, A., and Lynne, W., 2010, What Is the Tukey HSD Test?, [online] Available at: https://www.utdallas.edu/~herve/abdi-HSD2010-pretty.pdf [Accessed 18 May 2015]

[28] Francis, G., Refenes, A. N., and Zapranis, A., 1994, Stock Performance Modeling Using Neural Networks: A Comparative Study with Regression Models, Neural Networks, Volume 7, p. 376-385

[29] Ticknor, J.L., 2013, A Bayesian regularized artificial neural network for stock market forecasting, [online] Available at: http://parsproje.com/tarjome/modiriyat/492.pdf [Accessed 27 May 2015]

[30] Patel, B., and Sunil, Y.R., 2014, Stock Price Prediction Using Artificial Neural Network, [online] Available at: http://www.ijirset.com/upload/2014/june/76_Stock.pdf [Accessed 18 May 2015]
