Pace University

School of Computer Science and Information Systems

Department of Computer Science

Eshwar Singh

Comparative Economic Forecasting with Neural Networks: Forecasting Aggregate Business Sales from S&P 500 and Interest Rates

Master’s Thesis

Supervising Faculty: Dr. Anthony Joseph

Abstract

Research utilizing neural networks is a rapidly growing field of study because of their broad analytical capabilities. This study uses neural networks to forecast economic time series data. It focuses on comparative economic forecasting using neural networks, with the objective of forecasting aggregate business sales from the Standard and Poor's (S&P) 500 index and interest rates. The application software used was Mathworks' Matlab and NeuroDimension's NeuroSolutions. The two networks used were the time-lagged feedforward backpropagation network and the Elman recurrent network. These neural network models were implemented, and then trained and tested on sales and the S&P 500 index, sales and interest rates, as well as sales and the S&P 500 index together with interest rates. The study describes the data manipulation procedures used, the software tools employed during preprocessing, the methodologies applied during forecasting, and the error metric techniques applied during post-processing analysis and data evaluation. Furthermore, the study showed that current stock market prices were correlated with past stock prices, suggesting that stock market data have long memory and can be useful for forecasting purposes. This is contrary to the efficient market hypothesis and the random walk assumption, which state that today's asset price does not depend on previous prices. Moreover, the Matlab and NeuroSolutions neural network modeling frameworks were compared to determine their relative performance and suitability for economic time series forecasting.


Table of contents

1. Introduction
2. Time Series Data
   2.1 Time Series Analysis
   2.2 Sources of Data
   2.3 Technical Analysis
   2.4 Fundamental Analysis
   2.5 Data Plots
   2.6 Economic Analysis
   2.7 Indicators
      2.7.1 Moving Averages
      2.7.2 Differencing
3. Performance Metrics
   3.1 Correlation
   3.2 Mean Square Error
   3.3 Root Mean Square Error
   3.4 Percentage of Correct Directions
   3.5 Theil
   3.6 Mean Absolute Percentage Error
4. Preprocessing Tools
   4.1 Twelve Month Differences
   4.2 Volterra Filtering
   4.3 Normalization
   4.4 Zero Mean
   4.5 Maximum Correlation
   4.6 Matlab
   4.7 Hurst Exponent
5. Neural Networks
   5.1 Neural Network Design
   5.2 Supervised Learning
   5.3 Multi-Layer Perceptron
   5.4 Temporal Neural Networks
   5.5 Time-Lagged Feedforward Neural Network
   5.6 Elman Recurrent Neural Network
   5.7.1 Training a Neural Network
   5.7.2 Sliding Window Training
   5.8 Generalization
   5.9 Testing a Neural Network
   5.10 Design of Neural Network Models
6. Experiments
   6.1 Predicting sales using the S&P 500 index
   6.2 Predicting sales using 3-month treasury bills
   6.3 Predicting sales using both the S&P 500 index & 3-month T-Bills
7. Data Analysis
8. Conclusion
9. References


Table of Figures & Tables

Figures

2.1 Scatter plots of filtered sales, S&P 500 & 3-month T-Bills
2.2 Plots of unfiltered sales, S&P 500 & 3-month T-Bills
4.2.1 12-month differences of sales, S&P 500 & 3-month T-Bills
4.2.2 Filtered sales using the 5th order Volterra series expansion
4.5.1 Plot of the filtered sales versus S&P 500 shifted 6 months
4.5.2 Plot of the filtered sales versus 3-month T-Bills shifted 20 months
5.1 Neural network adaptive designs
5.3 A basic architecture of the multi-layer perceptron
5.5 TLFN with 2 delays and 4 processing elements
5.6 Elman recurrent network
6.1.1a Predict sales - S&P 500 unfiltered NeuroSolutions TLFN
6.1.1b Predict sales - S&P 500 filtered NeuroSolutions TLFN
6.1.1c Predict sales - S&P 500 filtered (shifted) NeuroSolutions TLFN
6.1.2a Predict sales - S&P 500 unfiltered NeuroSolutions Elman network
6.1.2b Predict sales - S&P 500 filtered NeuroSolutions Elman network
6.1.2c Predict sales - S&P 500 filtered (shifted) NeuroSolutions Elman network
6.1.3a Predict sales - S&P 500 unfiltered Matlab TLFN
6.1.3b Predict sales - S&P 500 filtered Matlab TLFN
6.1.3c Predict sales - S&P 500 filtered (shifted) Matlab TLFN
6.1.4a Predict sales - S&P 500 unfiltered Matlab Elman network
6.1.4b Predict sales - S&P 500 filtered Matlab Elman network
6.1.4c Predict sales - S&P 500 filtered (shifted) Matlab Elman network
6.2.1a Predict sales - T-Bills unfiltered NeuroSolutions TLFN
6.2.1b Predict sales - T-Bills filtered NeuroSolutions TLFN
6.2.1c Predict sales - T-Bills filtered (shifted) NeuroSolutions TLFN
6.2.2a Predict sales - T-Bills unfiltered NeuroSolutions Elman network
6.2.2b Predict sales - T-Bills filtered NeuroSolutions Elman network
6.2.2c Predict sales - T-Bills filtered (shifted) NeuroSolutions Elman network
6.2.3a Predict sales - T-Bills unfiltered Matlab TLFN
6.2.3b Predict sales - T-Bills filtered Matlab TLFN
6.2.3c Predict sales - T-Bills filtered (shifted) Matlab TLFN
6.2.4a Predict sales - T-Bills unfiltered Matlab Elman network
6.2.4b Predict sales - T-Bills filtered Matlab Elman network
6.2.4c Predict sales - T-Bills filtered (shifted) Matlab Elman network
6.3.1a Predict sales - S&P 500 & T-Bills unfiltered NeuroSolutions TLFN
6.3.1b Predict sales - S&P 500 & T-Bills filtered NeuroSolutions TLFN
6.3.2a Predict sales - S&P 500 & T-Bills unfiltered NeuroSolutions Elman
6.3.2b Predict sales - S&P 500 & T-Bills filtered NeuroSolutions Elman
6.3.3a Predict sales - S&P 500 & T-Bills unfiltered Matlab TLFN
6.3.3b Predict sales - S&P 500 & T-Bills filtered Matlab TLFN
6.3.4a Predict sales - S&P 500 & T-Bills unfiltered Matlab Elman
6.3.4b Predict sales - S&P 500 & T-Bills filtered Matlab Elman


Tables

6.1 Training & testing subsets without account of lead/lag correlation values
6.2 Training & testing subsets with account of lead/lag correlation values
6.3 Subdivision of datasets used in Matlab
6.4 Summary of performance metrics predicting sales with S&P 500
6.5 Summary of performance metrics predicting sales with 3-month T-Bills
6.6 Summary of performance metrics predicting sales with S&P 500 & T-Bills
6.7 Summary of averages
6.8 Summary of performance metrics for all experiments


1. Introduction

The human brain is perhaps the most intricate biological system in the human body. It functions as a processor for sensory pathway stimuli, allows for cognitive control of movement, unconsciously controls many body functions and organs through neuronal pathways, and conducts these operations within milliseconds.

The human brain also serves as the reservoir for memory, whereby it stores past information which can be retrieved for future use. The artificial neural network (ANN) is a network modeled after the human brain, composed of interconnected processing elements called neurons. The connections between the neurons store the knowledge acquired by the network and are represented by the network weights.

Moreover, the network has the ability to learn different types of relationships. This neural system can be fabricated into a complex network of processing elements to capture the intricacies of nonlinear, nonparametric time series data. The adaptive nature of these networks can provide good prediction results even for volatile and chaotic time series data.

This thesis is based on using ANN models to forecast aggregate business sales from the S&P 500 index and 3-month treasury bills. The volatile nature and noisiness of the S&P 500 index and 3-month treasury bills data suggest the need for preprocessing and filtering of the datasets and careful design of the ANN models to facilitate adequate forecasting. Research (Zhang et al, 1997; Bodis, 2004; Konur and Ali, 2004) has shown the usefulness of ANN models in forecasting the behavior of different nonlinear datasets, including financial and economic time series.


This paper is organized in the following way: section 2 describes the time series data, its characteristics, and its predictability; section 3 outlines the performance metrics, which include correlation and the mean square error; section 4 describes the preprocessing tools, for example, finding the 12-month first difference, the Hurst exponent of the datasets, and Volterra filtering; section 5 details the neural network models and their design, focusing on the various techniques used to train and test the network models; section 6 covers all the experiments that were done; and section 7 analyzes the results. The conclusion and reference list respectively follow in the remaining sections of this thesis.



2. Time Series Data

Time series data is a collection of linear and nonlinear data in a sequence of ordered samples in time or space (Principe et al, 1999; Zhang, 2004). This type of data is kept in the sequence generated to maintain time order, since current samples may be dependent on the previous set of samples. Such data are often displayed as scattergrams (scatter plots) (see Figure 2.1). Examples of such time series can be found in stock market data (Easton and McColl, 1997), meteorological data, economic data, and sociological data such as employment figures (Easton and McColl, 1997). The data we are interested in are aggregate business sales, the S&P 500 index, and interest rates, which are economic indicators. It should be noted that the inverted 3-month treasury bill rate (interest rates) is sometimes referred to as T-Bills.


Figure 2.1 Scattergrams of filtered aggregate sales, S&P 500 index, and 3-month T-Bills.

The scatter plot of the S&P 500 index and sales shows a weak positive linear relationship, supported by a weak correlation coefficient of 0.3077 and suggesting that the two may be more strongly nonlinearly related. The scatter plot of interest rates and sales shows a somewhat weak negative linear relationship, the strength of which is reflected in a weak correlation coefficient of -0.3843. The plot also shows strong nonlinear elements.

2.1 Time Series Analysis

Time series analysis is a data analysis technique used to understand the underlying theory of the data samples, to comprehend what the samples are comprised of, and how they were generated (Easton and McColl, 1997). That is, are there any patterns in the data set? Time series prediction is a method that uses historical data to predict the future behavior of the series. This is based on the theory that if it is possible to determine that there is a trend in the data, then there is the potential for forecasting the next sample (Wu and Lu, 1993). There are many methods that can be used to model time series data; for example, the auto-regressive integrated moving average (ARIMA) model that is used for random walk or random trend modeling (Virili and Freisleben, 2000; Meng). In this experiment differencing is used to create stationary-like datasets (Meng; Virili and Freisleben, 2000; Bodis, 2004), and the differences rather than the levels of the data are used with the ANN models to forecast the data based on the embedded patterns in the data structure. It might also be desirable to perform additional filtering to sufficiently remove the noise that may cause difficulty in recognizing the patterns in the data. The type of filtering used depends on the characteristics of the datasets. Time varying nonlinear datasets are more appropriately filtered using time varying nonlinear filters.



2.2 Sources of Data

The Standard & Poor's (S&P) 500 index, aggregate sales, and 3-month treasury bills data were used in the experiments. The S&P 500 index and interest rates are leading economic indicators, while sales is a coincident indicator (Larrain, 2002; Conference Board); thus there is a relationship between these indicators and the US economy. The S&P 500 index is a composite of stocks issued by companies chosen for their market capitalization, industry representation, and liquidity (Investopedia). The S&P 500 index was designed to be a leading economic indicator of equities and is used, together with the Dow Jones Industrial Average, as a benchmark for US equities. Aggregate business sales are composed of manufacturing and trade sales. Final sales can be expressed as the Gross Domestic Product (GDP) minus inventory investment, while manufacturing and trade sales include the national wholesale and retail markets (Larrain, 2002). Three-month treasury bills are short-term securities issued by the US government to raise money from public sources. They mature three months after issue and are sold for less than their face value; at maturity, the full face value of the bill is repaid to the purchaser.

The data being analyzed covered the period from January 1959 to February 2006 in monthly intervals, equating to 553 samples. It was extracted from database archives and stored in formats including comma separated values (CSV), which can be readily translated into Excel format. The 3-month treasury bills data was downloaded from the Board of Governors of the Federal Reserve System. The S&P 500 index data was downloaded from yahoo.com, and aggregate sales was downloaded from Global Insight databases. Since the available documentation of aggregate sales data started in January 1959, it was necessary to shorten the interest rates and S&P 500 index data sequences to the same number of samples. The adjusted closing values of the S&P 500 data were used. It should also be noted that the inverted values of the 3-month treasury bills data, which produce the yield curve (Navarro, 2005), were used during forecasting; this is a standard practice in stochastic environments (Wikipedia). The intention is to determine whether any interrelationship exists between the economic variables.

2.3 Technical Analysis

Technical analysis assumes that a trend lies within each dataset, reflecting previous market biases (Lawrence, 1997). Each sample contains information for an entire trading day, and this can be used together with previous samples to forecast tomorrow's values. This form of forecasting assumes that the historical data of a market variable is reflected in its current values, thereby enabling the prediction of the variable.

2.4 Fundamental Analysis

Fundamental analysis is derived from a company's intrinsic value, which depends on its market performance, competitors, the profits it generates, economic conditions, and the macro-economy; for example, interest rates and inflation (Bodis, 2004). These factors affect the price per share on a daily basis, and fundamental analysis does not utilize any historical data. While this is a systematic approach to predicting a company's stock, it can lead to complexity when attempting to forecast, since it may be problematic to determine which market condition will affect the stock price the most.



2.5 Data Plots

The following plots furnish a visualization of the original datasets of aggregate sales, the S&P 500 index, and 3-month treasury bills (see Figure 2.2). These are the unfiltered, noisy, and trending versions, which also provide an outline of patterns that may be visible to the naked eye.


Figure 2.2 Plots of unfiltered aggregate sales, S&P 500 index, and 3-month treasury bills.



2.6 Economic Analysis

Stock and bond markets move in opposite directions to the Federal Reserve interest rates (Hall and Lieberman, 1998). If the Federal Reserve increases interest rates, the stock and bond markets fall, and vice versa. Since Federal Reserve interest rates and bond prices are inversely related, an increase in interest rates will cause bond prices to fall. Company stocks are held because of dividend payouts, which are directly related to corporate profits; higher profits result in increased dividend payouts (Hall and Lieberman, 1998). Stocks must remain competitive with the bond market, so lower stock prices are more attractive. If interest rates are increased, bond prices tend to move lower and their yields become more attractive than stock returns; hence a sell-off of stocks is likely as investors are more likely to buy bonds (Hall and Lieberman, 1998). Interest rates may also have a negative impact on sales over a specific time period (Larrain, 2002). This inversely correlated relationship affects economic growth and GDP.

2.7 Indicators

Indicators are used to explain the influences of the economic conditions and behavior of the time series that are used during forecasting (Bodis, 2004). An economic indicator is a statistic about the economy (Wikipedia) which allows for the evaluation of current economic performance and the anticipation of future performance. Economic indicators include various indexes and economic summaries: for example, unemployment, housing starts, the consumer price index, retail sales, and stock market prices. They are categorized in a branch of macroeconomics named "business cycles" (Wikipedia). There are three types of indicators: leading, lagging, and coincident. Some of the leading indicators are stock prices, housing markets, and interest rates. Some lagging indicators are GDP and unemployment rates, while examples of coincident indicators are aggregate sales and payroll.

2.7.1 Moving Averages

The moving average of a dataset produces the average value over a set window or time period. It filters the dataset, thereby removing some of the noise; hence it produces a smoother graph of the original data for analysis and provides a reasonably better visualization of the data. In general, if the price of a stock falls below its moving average it is a signal to sell, and when the price of a stock rises above its moving average it is a signal to buy (Bodis, 2004). However, this can be misleading, since the moving average can generate "mixed" results. It can also be noted that a buy signal can be generated if the short-term moving average rises above the long-term moving average, as in the sketch below.
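For concreteness, the following Matlab sketch (illustrative only, not taken from the thesis) computes a trailing moving average of a monthly price series and the naive buy/sell rule described above; `x` and `w` are assumed inputs.

```matlab
% Minimal sketch: trailing moving average and the naive crossover rule.
function [ma, signal] = moving_average_signal(x, w)
    x = x(:);                         % ensure a column vector
    n = length(x);
    ma = nan(n, 1);
    for t = w:n
        ma(t) = mean(x(t-w+1:t));     % average over the last w samples
    end
    signal = zeros(n, 1);
    signal(x > ma) = 1;               % price above its moving average: buy
    signal(x < ma) = -1;              % price below its moving average: sell
end
```

A crossover variant would compute `ma` for a short and a long window and issue a buy signal when the short-term average rises above the long-term one.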

2.7.2 Differencing

Differencing is another method for removing noise and trends from a time series dataset, as well as a way to view the data in terms of differences rather than levels. It creates a smoother and more stationary time series by removing some of the noise, trends, and mean to produce a more desirable set of samples to analyze (Virili and Freisleben, 2000). There are several forms of differencing; for example, backward/forward differencing, logarithmically scaled differencing, and relative differencing. These methods of differencing are used to render a slowly changing time varying time series stationary.



3. Performance Metrics

Performance metrics are used to analyze the data during and after forecasting. They are the traditional methods used to assess the validity and correctness of the output and to evaluate the forecasted samples relative to the original samples.

3.1 Correlation

The correlation coefficient measures the strength of the linear relationship between two variables (Bodis, 2004). Correlation coefficient values range between -1 and 1, where a larger absolute value represents a more strongly correlated pair of variables. The equation below defines the correlation coefficient:

$$\mathrm{CORR} = \frac{\sum_{t=1}^{N}(x_t - \bar{x})(\hat{x}_t - \bar{\hat{x}})}{\sqrt{\sum_{t=1}^{N}(x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{N}(\hat{x}_t - \bar{\hat{x}})^2}}$$

where $x_t$ and $\hat{x}_t$ represent the actual and predicted values, and $\bar{x}$ and $\bar{\hat{x}}$ represent their respective mean values.

3.2 Mean Square Error

The mean square error (MSE) is the average squared difference between the actual output and the desired output samples (Principe et al, 2000). The MSE value provides a clear indication of the correctness of the forecasted data relative to the actual results. A smaller MSE value is indicative of a more favorable forecast. The MSE is defined as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}(x_t - \hat{x}_t)^2$$

where $x_t$ and $\hat{x}_t$ represent the actual and predicted datasets.

3.3 Root Mean Square Error

The root mean square error (RMSE) is the square root of the MSE and is described in the following equation:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

3.4 Percentage of Correct Directions

The percentage of correct directions (POCD) indicates the proportion of samples in the desired and forecasted datasets that move in the same direction. This performance metric provides a measure of agreement between the actual and output data, where a higher percentage of same directions characterizes a more desirable forecasting result. POCD is described in the following equation:

$$\mathrm{POCD} = \frac{1}{N}\sum_{t=1}^{N}\Big(HS(\Delta x_t \cdot \Delta\hat{x}_t) + 1 - HS(|\Delta x_t| + |\Delta\hat{x}_t|)\Big)$$

where $\Delta x_t = x_t - x_{t-1}$ and $\Delta\hat{x}_t = \hat{x}_t - x_{t-1}$, and $HS$ represents the Heaviside step function:

$$HS(x) = \begin{cases}1 & \text{if } x > 0\\ 0 & \text{otherwise}\end{cases}$$



3.5 Theil

The Theil coefficient of inequality (T) measures the relative correctness of the forecasted data. This coefficient ranges from 0 to 1, with a value of 0 indicating a perfect forecast. It is defined as follows:

$$T = \sqrt{\frac{\sum_{t=1}^{N}(x_t - \hat{x}_t)^2}{\sum_{t=1}^{N}x_t^2}}$$

3.6 Mean Absolute Percent Error

The mean absolute percentage error (MAPE) measures the accuracy of the forecasted dataset relative to the actual dataset. It is specifically used to identify trends and is measured as a fractional value. This performance measure is given by the following expression:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{x_t - \hat{x}_t}{x_t}\right|$$

A MAPE of up to 10% is considered very good, while a range between 10% and 20% or even higher is normal.
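All six metrics of this section can be computed in a few lines of Matlab. The sketch below is illustrative rather than the thesis' own code; `x` and `xhat` are assumed to be equal-length vectors of actual and predicted values, and POCD is returned as a percentage, matching how it is reported later.

```matlab
% Hedged sketch: the six performance metrics of Section 3.
function m = performance_metrics(x, xhat)
    x = x(:); xhat = xhat(:);
    e = x - xhat;
    c = corrcoef(x, xhat);
    m.CORR = c(1, 2);                          % correlation coefficient
    m.MSE  = mean(e.^2);                       % mean square error
    m.RMSE = sqrt(m.MSE);                      % root mean square error
    dx  = x(2:end) - x(1:end-1);               % actual changes
    dxh = xhat(2:end) - x(1:end-1);            % predicted changes, per the POCD definition
    hs  = @(v) double(v > 0);                  % Heaviside step
    m.POCD  = 100 * mean(hs(dx .* dxh) + 1 - hs(abs(dx) + abs(dxh)));
    m.Theil = sqrt(sum(e.^2) / sum(x.^2));     % Theil coefficient of inequality
    m.MAPE  = mean(abs(e ./ x));               % mean absolute percentage error
end
```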

4. Preprocessing Tools

4.1 Twelve Month Differences

The twelve month backward difference is a preprocessing method used to transform and remove noise from the datasets. It also makes the datasets more stationary.



It is a low pass filter (Principe et al, 2000). In particular, the 12-month difference used here spans 13 months inclusive: January to January, February to February, and so on. That is, this function takes the difference between samples 13 months apart. It should be noted that the first 13 samples are lost in the transformation of the datasets from levels to differences. Sales and inverted interest rates were filtered using the 12-month backward difference. The first 12-month relative difference was used to preprocess the S&P 500 index. It produced essentially the same result as taking the 12-month backward difference of the natural logarithm of the S&P 500 index, and the relative difference of the S&P 500 index is computed in fewer steps than the backward difference of the logarithm.
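A sketch of these transforms follows, assuming `x` is a column vector of monthly levels; since the thesis counts the span inclusively (13 samples lost), the lag is taken as 13 samples here.

```matlab
% 12-month backward difference (samples 13 months apart, as described above).
lag = 13;
d_backward = x(lag+1:end) - x(1:end-lag);
% 12-month relative difference, used for the S&P 500 index.
d_relative = (x(lag+1:end) - x(1:end-lag)) ./ x(1:end-lag);
% For comparison: the backward difference of the natural logarithm,
% which the relative difference closely approximates.
d_log = log(x(lag+1:end)) - log(x(1:end-lag));
```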

4.2 Volterra Filtering

The truncated Volterra series expansion is a nonlinear filter used to remove noise from the data prior to forecasting. This filter is known for its ability to smooth nonlinear datasets. The following expression describes a time varying truncated Volterra series expansion:

$$y(n) = a_0 + \sum_{k=0}^{K-1} a_1(k,n)\,x(n-k) + \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} a_2(n_1,n_2,n)\,x(n-n_1)\,x(n-n_2) + \cdots + \sum_{m_1=0}^{M_1-1}\sum_{m_2=0}^{M_2-1}\cdots\sum_{m_p=0}^{M_p-1} a_p(m_1,m_2,\ldots,m_p,n)\,x(n-m_1)\,x(n-m_2)\cdots x(n-m_p)$$

where $a_l(m_1,\ldots,m_l,n)$ refers to the $l$-th Volterra kernel of the system and $\max\{K, N_1, N_2, \ldots, M_1, \ldots, M_p\}$ represents the memory of the Volterra series expansion.

This function is a discrete time series expansion suitable for nonlinear filtering. The Volterra series expansion is desirable since it primarily removes noise, and it can also be cost effective with a proper choice of the order and coefficient values used. In this thesis, the 2nd order Volterra series expansion was used to filter the S&P 500 index and aggregate sales data. The 3rd order expansion was used for the 3-month treasury bills because of the difficulty in obtaining suitable coefficients from the 2nd order series expansion to generate a sufficiently desirable filtered version of the original dataset. The 2nd and 3rd order Volterra series expansions are respectively described as follows:

Second order:

$$y(n) = a_0 + \sum_{k=0}^{K-1} a(k,n)\,x(n-k) + \sum_{l_1=0}^{L_1-1}\sum_{l_2=0}^{L_2-1} b(l_1,l_2)\,x(n-l_1)\,x(n-l_2)$$

Third order:

$$y(n) = a_0 + \sum_{k=0}^{K-1} a(k,n)\,x(n-k) + \sum_{l_1=0}^{L_1-1}\sum_{l_2=0}^{L_2-1} b(l_1,l_2)\,x(n-l_1)\,x(n-l_2) + \sum_{m_1=0}^{M_1-1}\sum_{m_2=0}^{M_2-1}\sum_{m_3=0}^{M_3-1} c(m_1,m_2,m_3,n)\,x(n-m_1)\,x(n-m_2)\,x(n-m_3)$$

where $a$, $b$, and $c$ are varying coefficients, and $x$ is the input S&P 500 index, aggregate sales, or inverted 3-month treasury bills series. The 3rd order Volterra filter generated better results for the 3-month treasury bills because of its increased filtering power. Below are graphs of the filtered 12-month differenced aggregate sales, S&P 500 index, and 3-month treasury bills:




Figure 4.2.1 Plots of the filtered 12-month differenced aggregate sales, 3-month T-Bills, and S&P 500 index.

The filtering process did not require a fourth or fifth order Volterra filter since the filtered data did not significantly improve as shown in the graph below:




Figure 4.2.2 Filtered sales using the 5th order Volterra series expansion.

Therefore, the cost and time required to generate a filtered dataset with the higher order Volterra series expansion could not be justified for the slightly better results.
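As an illustration of the filtering step, the sketch below implements a 2nd order Volterra filter with time-invariant kernels, a simplification of the time varying expansion above. The kernel values `a0`, `a`, and `b` are assumed inputs to be chosen empirically, as was done in the thesis.

```matlab
% Hedged sketch: 2nd order Volterra filter with assumed, time-invariant
% kernels a0 (scalar), a (K x 1), and b (L1 x L2).
function y = volterra2(x, a0, a, b)
    x = x(:);
    K = length(a); [L1, L2] = size(b);
    pad = max([K, L1, L2]);           % longest memory needed
    xp = [zeros(pad, 1); x];          % zero-padded pre-history
    N = length(x);
    y = zeros(N, 1);
    for n = 1:N
        m = n + pad;                  % index of x(n) in the padded series
        y(n) = a0;
        for k = 1:K                   % linear term
            y(n) = y(n) + a(k) * xp(m - k + 1);
        end
        for l1 = 1:L1                 % quadratic term
            for l2 = 1:L2
                y(n) = y(n) + b(l1, l2) * xp(m - l1 + 1) * xp(m - l2 + 1);
            end
        end
    end
end
```

The 3rd order variant adds a triple sum over a cubic kernel in the same fashion.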

4.3 Normalization

Normalizing the datasets is an important aspect of forecasting with neural network models, whose outputs are bounded to at most between ±1 (Bodis, 2004). In addition, due to the volatility of the time series data under study, the values can fluctuate sharply within a short period, exceeding the limits of the neural network model and therefore causing difficulty in network training. Normalization brings the range of the data within the limits of the neural network model. For example, in the Matlab environment it is advised that the data be normalized so that the network can perform significantly better (Matlab Neural Network). For these experiments, the datasets were normalized to within ±1 to allow the neural network to function efficiently on the data. This was achieved by taking the maximum absolute value of the entire dataset and dividing each sample by this value, as in the sketch below.
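A minimal sketch of this scaling, together with the mean removal described in Section 4.4, is:

```matlab
x = x - mean(x);          % zero mean (Section 4.4)
x = x / max(abs(x));      % scale every sample into [-1, 1]
```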



4.4 Zero Mean

The mean was removed from the respective datasets to reduce the ranges and to center the datasets around the zero reference axis. This was achieved by subtracting the mean from each sample in the dataset. The mean was obtained in the following way:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The zero-mean concept is a useful approach in forecasting; it may be considered another normalization mechanism.

4.5 Maximum Correlation

Maximum correlation is achieved by shifting one dataset relative to another while determining the maximum correlation value between them. The following graphs illustrate this for aggregate sales versus the S&P 500 index, and aggregate sales versus 3-month treasury bills:

Figure 4.5.1 Plot of the filtered aggregate sales versus the S&P 500 index shifted by 6 samples, with a maximum correlation of 0.5078.



Figure 4.5.2 Plot of the filtered aggregate sales versus 3-month treasury bills shifted by 20 samples, with a maximum correlation of 0.5988.

The process involves shifting one dataset relative to another in an attempt to find the correlation value representative of the maximum lead/lag of one dataset over the other. In particular, before shifting the datasets relative to each other, the correlation between aggregate sales and the S&P 500 index was 0.3077. In shifting the S&P 500 index relative to sales, the maximum correlation of 0.5078 was reached at 6 months. This was also done for aggregate sales and 3-month treasury bills, where before any shifting took place the overall correlation was -0.3843. Shifting 3-month treasury bills relative to sales yielded the maximum correlation of 0.5988 at 20 months. This was a separate experiment to empirically determine the maximum number of months by which sales lagged the S&P 500 index and 3-month treasury bills respectively. Higher correlation values are likely to yield better forecasting outputs.
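A hedged sketch of this lead/lag search follows; `y` is the (filtered) sales series, `p` the predictor, and the function name is illustrative only.

```matlab
% Shift the predictor relative to sales and record the correlation at
% each shift; the shift with the largest magnitude wins.
function [bestShift, bestCorr, corrs] = max_correlation(y, p, maxShift)
    y = y(:); p = p(:);
    corrs = zeros(maxShift + 1, 1);
    for s = 0:maxShift
        c = corrcoef(y(s+1:end), p(1:end-s));   % sales lags predictor by s months
        corrs(s + 1) = c(1, 2);
    end
    [dummy, idx] = max(abs(corrs));
    bestCorr  = corrs(idx);
    bestShift = idx - 1;
end
```

Applied to the filtered series here, such a search peaks at 6 months for the S&P 500 index (0.5078) and at 20 months for the 3-month T-Bills (0.5988).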



4.6 Matlab

Matlab is application software and a programming language with interfaces to Java, C/C++, and Fortran. It has many toolboxes dedicated to areas such as aerospace technology and bioinformatics. In this study, Matlab provides an environment for creating programs with built-in functions for performance metrics and for forecasting using its neural network toolbox. The advantages of using Matlab include the relative ease of moving data between input and output applications as well as of modifying and managing data. The software's high-level language capability also allows for many methods of data pre- and post-processing without switching software applications.

4.7 Hurst Exponent

The Hurst exponent (H) indicates the predictability of a time series; an H value greater than 0.5 indicates that the time series is persistent and hence predictable. The following equation describes the Hurst exponent:

$$H(m) = \frac{\log(R_L(m)) - \log(S_L(m))}{\log(\alpha) + \log(L)}, \quad 1 \le m \le M$$

where $0 < H < 0.5$ indicates an anti-persistent series, $H = 0.5$ a random series, and $0.5 < H \le 1$ a persistent series.

The H values for aggregate sales, the S&P 500 index, and 3-month T-Bills were 0.8634, 0.8558, and 0.8316 respectively.
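The thesis does not spell out its estimation procedure, so the sketch below follows the standard rescaled-range recipe: R/S grows roughly as a power of the window length, and H is estimated as the slope of log(R/S) against log(window length). `windowSizes` is an assumed vector of window lengths.

```matlab
% Hedged sketch: rescaled-range (R/S) estimate of the Hurst exponent.
function H = hurst_rs(x, windowSizes)
    x = x(:);
    rs = zeros(length(windowSizes), 1);
    for i = 1:length(windowSizes)
        L = windowSizes(i);
        nBlocks = floor(length(x) / L);
        vals = zeros(nBlocks, 1);
        for b = 1:nBlocks
            seg = x((b-1)*L+1 : b*L);
            dev = cumsum(seg - mean(seg));               % cumulative deviations
            vals(b) = (max(dev) - min(dev)) / std(seg);  % rescaled range R/S
        end
        rs(i) = mean(vals);
    end
    coeffs = polyfit(log(windowSizes(:)), log(rs), 1);
    H = coeffs(1);                                       % slope = H estimate
end
```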



5. Neural Networks

5.1 Neural Network Design

A neural network is an adaptive nonlinear parallel distributed system used for function approximation, time series prediction, and classification. One of the strengths of the neural network model is its ability to capture the underlying patterns of nonlinear trends (Zhang, 2005). Fashioned after the human brain and nervous system (Thawornwong and Enke, 2004; Qi, 1999), a neural network has the ability to process complex data similar to stock market data by utilizing its parallelism and harnessing the power of its processing elements (Zhang et al, 1997). Neural network applications, however, are not confined to stock market data as examples of time series data. Other application areas include aerospace technologies, genetics, signal processing, control systems, speech processing, and data reconstruction.

The network's main function is to process data using many neurons (Reed and Marks II, 1998), and the outputs from these neurons may become inputs to other neurons as the process continues until eventually the network response or output is generated. In neural network design, the initial weights are usually randomly chosen. They are adjusted continuously toward their best values as the network learns from the input data and adapts its output accordingly. The network does not function independently, as it requires all inputs to generate a desirable output. Hence, the network should be sufficiently large, and no larger, to provide a reasonable response.

Neural networks are nonparametric in nature (Principe et al, 2000) in that they do not assume the functional relationship between the input and output datasets. Performance feedback back-propagates through the network, adjusting the values of the parameters in a systematic way during the training process as the network output approaches the desired response. If the desired goal is not achieved, the network re-adjusts its weights as the learning process continues (Principe et al, 2000). The network designer must decide on the input type, size, and complexity of the network. He/she must decide what setup values to assign the system to initially start the network training process, including the percentage of the total dataset to set aside for training and testing. The diagram in Figure 5.1 below is a simple design of a neural network model. Neural network computing is sometimes called soft computing (Bodis, 2004). This type of network can efficiently manipulate seemingly complex and noisy data, hence its use in financial and investment trading analyses. This study utilized models from two different neural network environments: Matlab and NeuroSolutions.

Figure 5.1 Generic Neural Network Model.

5.2 Supervised Learning

Supervised learning is the method of training a neural network system to generate desired outputs functionally dependent on the inputs (Reed and Marks II, 1998). It acts as a teacher representing the environment with knowledge derived from examples of input and target values of the training vector (Haykin, 1994). The principal components of a supervised learning system are: adaptive parameters, the presence of desired data values, a termination criterion through the minimization of some optimization problem, and a procedure for computing the best parameters (Principe et al, 2000). By learning the relationship between the network output and target values, a function is produced that enables generalization from the presented data to somewhat related unseen data samples.

5.3 Multi-Layer Perceptron

The multi-layer perceptron (MLP) is the most widely used architecture for building neural networks (Zhang et al, 1997). It contains a set of inputs connected to hidden layers whose function is to accept the input data samples and process them. More than one hidden layer may be used to process the data to produce a network output similar to the desired output. Some of the more advanced neural network architectures are extensions of the basic MLP. One of the most effective ways to train an MLP is the backpropagation algorithm. Backpropagation is the most commonly used training algorithm for the learning of a feedforward network (Zekic, 1998; Haykin, 1994). It implements gradient descent as its learning rule. Error-correction learning minimizes the error resulting from the difference between the neural network output and the target response. The minimization criterion is the mean square error: the weights are adjusted so as to minimize the mean square error between the predicted and actual target values.




Figure 5.3 A basic architecture of the multi-layer Perceptron.

5.4 Temporal Neural Networks

A system is said to be temporally dependent if a given input to the system generates a different response depending on the previous inputs (Swingler, 1996). A temporal (or dynamic) system's current output is dependent on both the current and the previous inputs (Swingler, 1996). A dynamic neural network has built-in short-term memory that is accomplished by introducing time delays into the synaptic structure of the neural network (Haykin, 1994). Another method of making a neural network behave dynamically is to make it recurrent by introducing feedback into the design (Haykin, 1994). The networks that can be termed temporal neural networks are time-delay neural networks, time-lagged feedforward neural networks, and recurrent networks.

5.5 Time Lagged Feedforward Neural Network

Feedforward neural networks are widely used for forecasting financial data due to their ability to predict dependent samples (Thawornwong and Enke, 2004). The time-lagged feedforward neural network (TLFN) is an extension of the multilayer perceptron that includes nonlinear processing elements in a feedforward design. It is a dynamic network with short-term memory elements provided at the input layer in the form of tap delay lines (Principe et al, 2000), giving the network the ability to store past data. The depth of the short-term memory is one plus the number of delays, and the resolution of the network is one. The tap delay lines with their corresponding weights connecting to the first hidden layer constitute impulse response filters in linear combiners. Following these temporal components is the typical nonlinear static MLP, which processes the present and past values of the network input layers. The diagram below is an example of a TLFN.




Figure 5.5 Time lagged feedforward network with 2 delays and 4 processing elements.

5.6 Elman Recurrent Neural Network

The Elman network is a recurrent neural network based on context processing elements (Principe et al, 2000). Its central purpose is to process time varying samples to predict output samples of a dataset (Swingler, 1996); for example, forecasting stock market data. It is also used in the classification of time series and many other types of applications. The Elman network can be trained using the backpropagation algorithm. In this network, the hidden layer receives input from its output via the context layer, as shown below in figure 5.6:




Figure 5.6 Elman Recurrent network.

In addition, the context layer is a copy of the hidden layer acting as an extension of the input layer. This context layer retains a copy of the hidden units for the previous time step (Swingler, 1996), providing short-term memory which is used as input for the current time step. This memory is encapsulated in the hidden layers and spans the number of network layers. Therefore, the Elman network is a dynamic neural network. From the inherent retention capacity of the short-term memory, the network can learn and adjust its weights to generate a function suitable for mapping the input data to the target.



5.7.1 Training a Neural Network

When all the data is collected and preprocessed, and the neural network is chosen for the particular data type, the network is then trained on the specific network input. In training a network, one must decide on the many options that are available, such as the type of transfer function, the training algorithm, the type of training, the limits of the network operation, the number of training samples, and the number of testing samples. This stage of neural network processing requires some trial and error (Swingler, 1996). The transfer function specifies the range of the data samples by its upper and lower limits. For time series data the tanh or tansig functions are appropriate because they bound the data to the range -1 to 1 after normalization. The training algorithm decides how the network will train or learn. Depending on the algorithm, the network can be trained to produce a relatively faster result compared to other training algorithms; however, the appropriate training algorithm depends on the network and its input. For time series data, the Levenberg-Marquardt algorithm is suitable since it has been shown to work faster with the backpropagation network, and it is also used in time series forecasting.

There are two types of training, batch and incremental, where batch training is faster since it requires fewer weight updates. Batch training also provides a more accurate measurement of the changes in weights (Swingler, 1996). The limits of the network operation are confined to a specific goal. For example, the number of epochs, or times the network will cycle, is set to prevent over-training, and the mean square error is assigned a value to achieve; if the network attains either the assigned number of epochs or the MSE value, it stops training. There are also input settings to manually increment the learning parameters of the network. There is no particular method of selection when deciding the number of training samples to choose from within a dataset. However, the training set must include a representation of the target data, otherwise training can lead to poor results (Reed and Marks II, 1998). A poor result may contain irregularities not found in the target data, leading to incorrect output data. The selected training subset must be sufficiently long to allow adequate training of the neural network, since an MLP neural network is a universal function approximator.

5.7.2 Sliding Window Training

This procedure can be used to improve the results of the network by training on a specific set of samples, testing the results, and then sliding the window forward to include the testing set while removing the same number of samples from the beginning of the data subset to maintain the number of training samples. The strategy described is used when the prediction model changes over time (Skabar and Cloete, 2001) or if the input data is not correlated to the target set. It optimizes the network performance by providing a method of predicting data that exhibit nonlinear time varying behavior.
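A sketch of this regime follows, using the window sizes later reported in Table 6.3 (a 408-sample window: 333 for training, 75 for testing; the final window is slightly shorter). `train_fn` and `test_fn` are hypothetical placeholders for the actual network training and simulation calls.

```matlab
% Hedged sketch of sliding-window training over series x (input) and y (target).
winLen = 408; trainLen = 333; testLen = 75;
start = 1;
while start + winLen - 1 <= length(x)
    idx   = start : start + winLen - 1;        % current 408-sample window
    trIdx = idx(1:trainLen);                   % training subset
    teIdx = idx(trainLen+1:end);               % testing subset
    net   = train_fn(x(trIdx), y(trIdx));      % hypothetical training call
    yhat  = test_fn(net, x(teIdx));            % hypothetical forecast
    start = start + testLen;                   % slide forward by the test length
end
```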

5.8 Generalization

Generalization, in terms of neural network processing, can be associated with how well the network has performed on the forecasted dataset (Principe et al, 2000). Often viewed as an approximation (Reed and Marks II, 1998), the intention is to find a network that is not too simple to fit the training values, which would likely generate poor results. If the network's functionality is sufficient to minimize the error on the training samples, then this gives rise to adequate forecasting results. The training data must be a true representation of the target data. The network must also be adequately sized so as not to memorize the data and become "less able to generalize between similar input-output" (Haykin, 1994) samples. Factors that affect generalization are "the size and efficiency of the training sets," the "physical complexity of the problem," and the network design. The network design and the training sets are related concepts: the number of training samples is directly proportional to the number of weights, and built-in tap delay lines in network designs increase the number of weights. To empirically obtain "good generalization," the number of weights must be less than the product of the allowable test error and the number of training samples (Haykin, 1994).

5.9 Testing a Neural Network

Testing the network is a method of comparison between the network outputs and the target. Using the portion of the data set aside for testing, the network utilizes the function approximated during the training period to map the input to the actual samples.

5.10 Design of Neural Network Models

In the experiments that follow in the next section, the NeuroSolutions TLFN models were of the focused time delay neural network (TDNN) design with a depth of two samples, indicating a single delay element per input. The setups included one hidden layer with four processing elements, the tanh transfer function, and the Levenberg-Marquardt learning rule. The output layer had one processing element with a tanh transfer function and the Levenberg-Marquardt learning rule. Under supervised learning control, the networks were set to terminate training after 1000 epochs (cycles), when the MSE reached a threshold of 0.0001, or when the incremental change from one iteration to the next was less than the threshold. The networks' weights were updated in batch learning mode, and the initial weight values were randomly chosen in each TLFN model. The NeuroSolutions Elman recurrent network models consisted of one hidden layer with four processing elements and the tanh transfer function with the Levenberg-Marquardt learning rule. The output layer had one processing element with a tanh transfer function, using the Levenberg-Marquardt learning rule. The setup of these network models for supervised training was similar to that of the TLFN.

The Matlab TLFN models had a depth of eight samples, or seven delays, at the inputs. There was one hidden layer with five processing elements, a tansig transfer function, and the Levenberg-Marquardt learning rule. The output layer had one processing element with a tansig transfer function and the Levenberg-Marquardt learning rule. The MSE was set to a threshold value of 0.001, and the epochs were set between 150 and 200 to terminate training. The initial weights were randomly chosen and the network was trained in supervised batch mode. The Matlab Elman recurrent neural network models contained one hidden layer with five processing elements and the tansig transfer function; the learning rule was Levenberg-Marquardt. The output layer had one processing element with a tansig transfer function and the Levenberg-Marquardt learning rule. The setup of the Elman recurrent network models for supervised training was similar to that of the TLFN.
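As an illustration of the Matlab side of this setup, the sketch below uses the newff/newelm syntax of the neural network toolbox of that era; the exact calls depend on the toolbox version, and `trainInput`/`trainTarget`/`testInput` are assumed placeholder variables. For the TLFN, the seven input delays would be formed by stacking delayed copies of the input into the rows of `p` before calling `newff` with the same layer sizes.

```matlab
% Hedged sketch of an Elman model as described above (toolbox-era syntax).
p = trainInput';                    % toolbox convention: columns are samples
t = trainTarget';
net = newelm(minmax(p), [5 1], {'tansig', 'tansig'}, 'trainlm');
net.trainParam.goal   = 0.001;      % MSE threshold
net.trainParam.epochs = 200;        % upper limit on training cycles
net  = train(net, p, t);            % supervised batch training
yhat = sim(net, testInput');        % forecast over the testing subset
```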



6. Experiments

The following experiments were conducted using Mathworks' Matlab neural network toolbox and NeuroDimension's NeuroSolutions application software for time-lagged feedforward (TLFN) and Elman recurrent neural networks:

1. Predicting aggregate sales using the S&P 500 index.

2. Predicting aggregate sales using 3-month treasury bills.

3. Predicting aggregate sales using both the S&P 500 index and 3-month treasury bills.

Each of the S&P 500 index, 3-month T-Bills, and aggregate sales datasets was subdivided into a training subset and a testing subset. The training subset accounted for 60% of the respective dataset (see Table 6.1):

Table 6.1

Predictor         Data length   Training (Feb 60-Sept 87, 60%)   Testing (Oct 87-Dec 05, 40%)   Correlation value (maximum)
S&P 500           553           333 samples                      220 samples                    0.3077
3-month T-Bills   553           333 samples                      220 samples                    -0.3843

Note: This data subdivision did not account for the maximum possible lead of the S&P 500 index and 3-month T-Bills over sales. The datasets were used in their assigned time relationships.

In the preprocessing of the unfiltered data, the datasets were first normalized as described previously, the 12-month difference was taken, and the mean was removed. The filtered versions of the datasets were derived by first generating the 12-month difference, filtering the datasets using the 2nd or 3rd order Volterra filter, and then normalizing the datasets, as in the sketch below.
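Putting the Section 4 steps together, the two recipes look roughly like this; `diff_12` and `volterra2` stand for the hypothetical helpers sketched in Sections 4.1 and 4.2, and the kernel values `a0`, `a`, `b` are assumed to have been chosen beforehand.

```matlab
% Unfiltered recipe: normalize, 12-month difference, remove the mean.
u = x / max(abs(x));
u = diff_12(u);
u = u - mean(u);
% Filtered recipe: 12-month difference, Volterra filter, then normalize.
f = diff_12(x);
f = volterra2(f, a0, a, b);   % 2nd order (3rd order for the T-Bills)
f = f / max(abs(f));
```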



Some experiments were done by shifting the datasets relative to each other, as discussed in section 4.5, to obtain a maximum correlation. The idea is that a higher overall correlation value relates to better forecasting results. Since the correlation coefficient is an important performance metric for measuring the relationship between the network output samples and the desired output, finding the maximum of this metric between the desired output and the predictor variables may produce better results. The correlation values were generated using a Matlab program written specifically to calculate the correlation between two sets of samples, in parts or in their entirety. It was observed that shifting the S&P 500 index forward by 6 samples (months) and the 3-month treasury bills forward by 20 samples (months) relative to sales resulted in maximum correlations of 0.5078 and 0.5988 respectively (see Table 6.2).

Table 6.2

Predictor         Data length   Training (Feb 60-Sept 87, 60%)   Testing (Oct 87-Dec 05, 40%)   Correlation value (maximum)
S&P 500           546           328 samples                      218 samples                    0.5078
3-month T-Bills   532           320 samples                      212 samples                    0.5988

Note: Data subdivision accounting for the lead of the predictor variables over the predicted variable. Subdivision of datasets into training and testing subsets, along with correlation values between the predictors (S&P 500 and 3-month T-Bills) and the predicted variable (sales).

In some experiments, the corresponding leads of the predictors S&P 500 index and 3-month T-Bills were used as parameters. Obtaining the leads required shifting one dataset relative to another, which resulted in a smaller number of samples being used for prediction. In particular, 6 samples for sales and the S&P 500 index and 20 samples for sales and the 3-month treasury bills, corresponding to the S&P 500 index leading sales by 6 months and the 3-month treasury bills leading sales by 20 months, were removed from the respective datasets to produce smaller dataset sizes. After the samples were removed, 60 percent of each dataset was used for training and 40 percent for testing (forecasting). Only the filtered versions of the datasets were used in this set of experiments.

The Matlab experiments were done using the sliding window training method with subsets of the datasets shown in Table 6.3. From the 408-sample (74%) data subset of the original 553-sample datasets, 82% (333 samples) were used for training and 18% (75 samples) for testing. After testing was completed on the first 75 samples, the data subsets were shifted 75 samples forward, an amount equal to the previous testing subset, and the first 75 samples were removed, maintaining a data subset size of 408 samples (section 5.7.2). This permitted a fixed number of training samples in all experiments.

Table 6.3

Predictor         Data length     Training (Feb 60-Sept 87)         Testing (Oct 87-Dec 05)        Correlation coefficient
S&P 500           553             333 (60%)                         220 (40%)                      0.3077
  Subdivisions    408, 408, 403   333 (82%), 333 (82%), 333 (83%)   75 (18%), 75 (18%), 70 (17%)   0.2420
3-month T-Bills   553             333 (60%)                         220 (40%)                      -0.3843
  Subdivisions    408, 408, 403   333 (82%), 333 (82%), 333 (83%)   75 (18%), 75 (18%), 70 (17%)   -0.3659

Note: Table 6.3 displays the windowing technique, which required the subdivision of the datasets and the testing subsets in Matlab.

The next three sections present the results of the experiments produced by the NeuroSolutions and Matlab neural network models.



6.1 Predicting aggregate sales using the S&P 500 Composite index in both NeuroSolutions and Matlab environments

Figures 6.1.1a, 6.1.1b and 6.1.1c display results of predicted aggregate sales using TLFN in the NeuroSolutions environment.


Figure 6.1.1a. Prediction of unfiltered sales values using TLFN in the NeuroSolutions environment.


Figure 6.1.1b. Prediction of filtered sales values using TLFN in the NeuroSolutions environment.




Figure 6.1.1c. Prediction of filtered sales values shifted by 6 months using TLFN in the NeuroSolutions environment.

Figure 6.1.1a, the unfiltered version of the sales data prediction, shows a noisy plot that is an inherent characteristic of unfiltered economic and financial time series data. This prediction generated a root mean square error (RMSE) of 0.141 and a correlation of 0.87 between the predicted and actual sales. The plot shows a seemingly good fit between the desired and the neural network output values. It has a relatively small RMSE and a strong correlation value, but the percentage of correct directions (POCD) of 37 shows that only a small percentage of the desired and predicted sales values move in the same direction. The Theil value of 0.456 is suggestive of a moderately good prediction of sales.

Figure 6.1.1b represents the output of the filtered version of the previous experiment. It is quite noticeable that this graph displays a much smoother representation of the sales datasets than before. This prediction generated an RMSE of 0.063, a correlation value of 0.98, a POCD value of 61, and a Theil value of 0.122, which represents an improved prediction over the unfiltered data.



The results generated from the shifting strategy (Figure 6.1.1c) show a desirable forecast of sales using the S&P 500 index. Visually, the plots are closely fitted; the desired and output sales match each other very closely. The RMSE generated was 0.055 and the correlation value was 0.98, suggesting favorable empirical results for this prediction. The POCD value of 64 shows that the desired and predicted sales were moving in the same direction most of the time. The Theil value for this experiment was 0.114, and the mean absolute percentage error (MAPE) was 0.2138.

Figures 6.1.2a, 6.1.2b and 6.1.2c display results of predicted aggregate sales using the Elman recurrent network in the NeuroSolutions environment.


Figure 6.1.2a. Prediction of unfiltered sales values using the Elman neural network in the NeuroSolutions environment.




Figure 6.1.2b. Prediction of filtered sales values using the Elman neural network in the NeuroSolutions environment.


Figure 6.1.2c. Prediction of filtered sales values shifted by 6 months using the Elman neural

network in NeuroSolutions environment.

From Figure 6.1.2a the unfiltered prediction is not as favorable. It shows that the sales data are noisy and that the predicted values do not closely follow the desired values. Nonetheless, the RMSE value of 0.141 and the correlation of 0.88 were reasonably good for the prediction, but the POCD value of only 38 shows that the predicted sales values do not, for the most part, move in the same direction as the desired sales values. The Theil and MAPE values were 0.440 and 0.2507 respectively (Table 6.4).

Figure 6.1.2b displays better results for the filtered version of the sales datasets than in the previous experiment. For this experiment, the RMSE was 0.063 and the correlation was 0.98, while the POCD value was 64, the Theil value was 0.117, and the MAPE was 0.2121. These performance metrics support the observation of relatively better results.

Figure 6.1.2c is a plot of the desired and predicted output of aggregate sales and S&P 500 index using the shifting strategy. Again, relatively better results were generated, as shown by the RMSE value of 0.055, a correlation of 0.98, a POCD value of 64, a Theil value of 0.115 and a MAPE value of 0.2091.

Figures 6.1.3a, 6.1.3b and 6.1.3c display results of predicted aggregate sales using TLFN.

[Plot: Predicted Aggregate Sales from S&P 500 Index; desired and output sales values, October 1987 - December 2005]

Figure 6.1.3a. Prediction of unfiltered sales values using TLFN in Matlab environment.


[Plot: Predicted Aggregate Sales from S&P 500 Index; desired and output sales values, October 1987 - December 2005]

Figure 6.1.3b. Prediction of filtered sales values using TLFN in Matlab environment.

[Plot: Predicted Aggregate Sales from S&P 500 Index; desired and output sales values, December 1987 - December 2005]

Figure 6.1.3c. Prediction of filtered sales values shifted by 6 months using TLFN in Matlab environment.

Figure 6.1.3a shows a prediction of aggregate sales values using the unfiltered version of the datasets. The difference between the predicted and the desired sales output generated a RMSE of 0.044 and a correlation of 0.96. The POCD value was 74; the Theil and MAPE values were 0.292 and 0.4250 respectively. The Matlab results are slightly better than their NeuroSolutions counterparts in terms of the RMSE, correlation, POCD, and Theil (Table 6.4).


The prediction of filtered aggregate sales using Matlab's neural network (Figure 6.1.3b) produced the following results: a RMSE of 0.010, a correlation value of 0.99, a POCD value of 86, and a Theil value of 0.080. The experiment demonstrates the ability of Matlab neural networks to produce predicted sales values very close to the desired dataset.

The shifting strategy (Figure 6.1.3c) generated a RMSE value of 0.070, a correlation value of 0.94, a POCD value of 63, and a Theil value of 0.227. This prediction did not produce results as good as the filtered version (Figure 6.1.3b) of the experiment.

Figures 6.1.4a, 6.1.4b and 6.1.4c display results of predicted aggregate sales using the Elman recurrent network.

[Plot: Predicted Aggregate Sales from S&P 500 Index; desired and output sales values, October 1987 - December 2005]

Figure 6.1.4a. Prediction of unfiltered sales values using the Elman neural network in Matlab environment.


[Plot: Predicted Aggregate Sales from S&P 500 Index; desired and output sales values, October 1987 - December 2005]

Figure 6.1.4b. Prediction of filtered sales values using the Elman neural network in Matlab environment.

[Plot: Predicted Aggregate Sales from S&P 500 Index; desired and output sales values, December 1987 - December 2005]

Figure 6.1.4c. Prediction of filtered sales values shifted by 6 months using the Elman neural network in Matlab environment.

The plots shown in Figures 6.1.4a, 6.1.4b and 6.1.4c were generated in experiments that used the Elman neural network. The prediction of unfiltered aggregate sales from the S&P 500 index (Figure 6.1.4a) resulted in a RMSE value of 0.273, a correlation value of 0.92, a POCD value of 69, and a Theil value of 0.322. The filtered version of aggregate sales used in the previous experiment yielded a prediction where the RMSE was 0.035, the correlation value was 0.99, the POCD value was 73, and the Theil value was 0.129. This experiment, again, produced better results than in the previous case where unfiltered data were used. With the shifting strategy, where the sales values were pushed forward by 6 months, the network performance (Figure 6.1.4c) resulted in a RMSE value of 0.077, a correlation value of 0.94, a POCD value of 64, and a Theil value of 0.197. A relatively good MAPE value of 0.232 was also generated (Table 6.4).

Table 6.4

Experiment #          Epochs   RMSE    Correlation   POCD   Theil   MAPE
6.1.1a (unfiltered)       35   0.141   0.87            37   0.456   0.2314
6.1.1b (filtered)         47   0.063   0.98            61   0.122   0.2258
6.1.1c *(filtered)        60   0.055   0.98            64   0.114   0.2138
6.1.2a (unfiltered)       48   0.141   0.88            38   0.440   0.2507
6.1.2b (filtered)         38   0.063   0.98            64   0.117   0.2121
6.1.2c *(filtered)        40   0.055   0.98            64   0.115   0.2091
6.1.3a (unfiltered)      200   0.044   0.96            74   0.292   0.4250
6.1.3b (filtered)        150   0.010   0.99            86   0.080   0.1191
6.1.3c *(filtered)       150   0.070   0.94            63   0.227   0.3009
6.1.4a (unfiltered)      200   0.273   0.92            69   0.322   0.5972
6.1.4b (filtered)        150   0.035   0.99            73   0.129   0.2421
6.1.4c *(filtered)       150   0.077   0.94            64   0.197   0.2317

Note: Summary of performance metrics predicting aggregate sales with S&P 500 index.

* Experiments that utilized the shifting strategy.


6.2 Predicting Aggregate Sales using inverted 3-month Treasury Bills in both NeuroSolutions and Matlab environments.

Figures 6.2.1a, 6.2.1b and 6.2.1c display results of predicted aggregate sales using TLFN.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.1a. Prediction of unfiltered sales values using TLFN in NeuroSolutions environment.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.1b. Prediction of filtered sales values using TLFN in NeuroSolutions environment.


[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, May 1988 - December 2005]

Figure 6.2.1c. Prediction of filtered sales values shifted by 20 months using TLFN in NeuroSolutions environment.

This section illustrates the experiments conducted using TLFN in NeuroSolutions. The experiments were conducted by predicting aggregate sales using inverted 3-month T-Bills. Using the unfiltered version of the sales and T-Bills datasets (Figure 6.2.1a), the following performance metrics were obtained: a RMSE of 0.141, a correlation value of 0.88, a POCD value of 38, and a Theil value of 0.442. Visualization of the plot (Figure 6.2.1b) of the filtered version of the sales and T-Bills datasets showed that the output sales data are very closely fitted to the desired sales data. From the performance metrics, the RMSE value was 0.055 and the correlation value was 0.98. The network model for the most part produced the same output as the TLFN did when forecasting sales from the S&P 500 index, thereby showing the benefits of filtering. The POCD value of 62 and the Theil value of 0.115 were also better (Table 6.5). The experiment with the shifting strategy (Figure 6.2.1c) produced very good forecasting results in terms of the closeness of the predicted relative to the desired sales values, as shown by the performance metrics: the RMSE was 0.070, the correlation value was 0.98, the POCD value was 62, and the Theil value was 0.130.


Figures 6.2.2a, 6.2.2b and 6.2.2c display the results of predicted aggregate sales using the Elman recurrent network.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.2a. Prediction of unfiltered sales values using Elman network in NeuroSolutions environment.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.2b. Prediction of filtered sales values using Elman network in NeuroSolutions environment.


[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, May 1988 - December 2005]

Figure 6.2.2c. Prediction of filtered sales values shifted by 20 months using Elman network in NeuroSolutions environment.

The experiments whose results are depicted in Figures 6.2.2a, 6.2.2b and 6.2.2c were done using the Elman network in the NeuroSolutions environment. The performance metrics relating to the results in Figure 6.2.2a are as follows: the RMSE value was 0.141, the correlation value was 0.86, and the POCD and Theil values were 40 and 0.448 respectively. The MAPE value was 0.254. The filtered counterpart (Figure 6.2.2b) generated much better results, as shown in the RMSE value of 0.063, a correlation value of 0.98, a higher POCD value of 63 and a Theil value of 0.124 (Table 6.5). The results of the prediction adopting the shifting strategy were similar to the above filtered version, as shown in the following: a RMSE of 0.063, a correlation value of 0.98, a POCD value of 64, and a Theil value of 0.123.


Figures 6.2.3a, 6.2.3b and 6.2.3c display results of aggregate sales predicted using TLFN.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.3a. Prediction of unfiltered sales values using TLFN in Matlab environment.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.3b. Prediction of filtered sales values using TLFN in Matlab environment.


[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, May 1988 - December 2005]

Figure 6.2.3c. Prediction of filtered sales values shifted by 20 months using TLFN in Matlab environment.

Figures 6.2.3a, 6.2.3b and 6.2.3c are plots of predicted aggregate sales from inverted interest rates (T-Bills) using the TLFN neural network in the Matlab environment. The first experiment (Figure 6.2.3a), with the unfiltered version of aggregate sales, generated a RMSE value of 0.221, a correlation value of 0.96, a POCD value of 74, and a Theil value of 0.278. The second and third experiments, the filtered versions, showed very good prediction results with RMSE values of 0.141 and 0.054, and correlation values of 0.99 and 0.97 respectively. The POCD values were 77 and 74 while the Theil values were 0.118 and 0.155 respectively, representing very good forecasted results.

Figures 6.2.4a, 6.2.4b and 6.2.4c display the results of aggregate sales predicted using the Elman recurrent network.


[Plot: Predicting Aggregate Sales with 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.4a. Prediction of unfiltered sales values using the Elman neural network in Matlab environment.

[Plot: Predicted Aggregate Sales from 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.2.4b. Prediction of filtered sales values using the Elman neural network in Matlab environment.


[Plot: Predicted Aggregate Sales with 3-month T-Bills; desired and output sales values, May 1988 - December 2005]

Figure 6.2.4c. Prediction of filtered sales values shifted by 20 months using the Elman neural network in Matlab environment.

In the three preceding plots, the Matlab version of the Elman neural network was used. In the experiments that generated these plots, the unfiltered sales prediction produced a RMSE of 0.236, a correlation of 0.94, a POCD of 63, and Theil and MAPE values of 0.390 and 0.3614 respectively. In the filtered version, noticeably better performance statistics were generated, as indicated by the following results: a RMSE value of 0.074, a correlation value of 0.96, a POCD value of 73, and a Theil value of 0.137. Using the shifting strategy with the filtered version of sales (Figure 6.2.4c) produced performance metrics similar to those of the filtered experiment, as follows: a RMSE value of 0.065, a correlation value of 0.96, a POCD value of 76, and a Theil value of 0.168 (Table 6.5).


Table 6.5

Experiment #          Epochs   RMSE    Correlation   POCD   Theil   MAPE
6.2.1a (unfiltered)       37   0.141   0.88            38   0.442   0.2814
6.2.1b (filtered)         38   0.055   0.98            62   0.115   0.2045
6.2.1c *(filtered)        32   0.070   0.98            62   0.130   0.2276
6.2.2a (unfiltered)       33   0.141   0.86            40   0.448   0.2536
6.2.2b (filtered)         43   0.063   0.98            63   0.124   0.2041
6.2.2c *(filtered)        78   0.063   0.98            64   0.123   0.2017
6.2.3a (unfiltered)      200   0.221   0.96            74   0.278   0.4312
6.2.3b (filtered)        150   0.141   0.99            77   0.118   0.1918
6.2.3c *(filtered)       120   0.054   0.97            74   0.155   0.2391
6.2.4a (unfiltered)      200   0.236   0.94            63   0.390   0.3614
6.2.4b (filtered)        150   0.074   0.96            73   0.137   0.2320
6.2.4c *(filtered)       150   0.065   0.96            76   0.168   0.2317

Note: Summary of performance metrics predicting aggregate sales using 3-month T-Bills.

* Experiments that utilized the shifting strategy.

6.3 Predicting Aggregate Sales using S&P 500 Index and inverted 3-month Treasury Bills in both NeuroSolutions and Matlab environments.

In the experiments of this section, sales were forecasted using both the S&P 500 index and inverted 3-month treasury bills. These experiments are an attempt to produce a better prediction of sales through the combined effect of the two predictors.
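In terms of network input, combining the predictors simply means presenting both series to the network at once. The fragment below is a hypothetical sketch of this arrangement in Matlab, assuming the series are stored one sample per column; the names sp500, tbills, and sales are illustrative, not the variable names used in the experiments.

    % Two predictors presented together as a two-row input matrix (sketch).
    P = [sp500; tbills];      % hypothetical names; one column per month
    T = sales;                % aggregate sales target series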

Figures 6.3.1a, and 6.3.1b display results of predicted aggregate sales using TLFN.


[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.1a. Prediction of unfiltered sales values using TLFN in NeuroSolutions environment.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.1b. Prediction of filtered sales values using TLFN in NeuroSolutions environment.

Figures 6.3.1a and 6.3.1b show a forecast of aggregate sales using the S&P 500 index and inverted 3-month T-Bills, and their characteristics were compared. In Figure 6.3.1a, which displays the unfiltered version of the predicted and desired sales datasets, the RMSE value was 0.141, the correlation value was 0.88, and the POCD and Theil values were 41 and 0.442 respectively. Figure 6.3.1b displays the forecasting of aggregate sales with the S&P 500 index and 3-month treasury bills when the filtered datasets were used. It exhibited better results than its unfiltered counterpart. The network produced output values similar to those of the desired samples, as evidenced by the performance statistics: a RMSE of 0.054, a correlation value of 0.99, a POCD value of 64, and a Theil value of 0.108.

Figures 6.3.2a and 6.3.2b display the results of sales predicted using the Elman network.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.2a. Prediction of unfiltered sales values using Elman network in NeuroSolutions environment.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.2b. Prediction of filtered sales values using Elman network in NeuroSolutions environment.


The NeuroSolutions experiments for predicting aggregate sales showed that the unfiltered sales prediction (Figure 6.3.2a) produced a RMSE of 0.141, a correlation of 0.88, a POCD of 39, a Theil of 0.447, and a MAPE of 0.3336 (Table 6.6), while the filtered aggregate sales forecast (Figure 6.3.2b) produced a RMSE value of 0.063, a correlation value of 0.98, a POCD value of 63, and a Theil value of 0.120.

Figures 6.3.3a and 6.3.3b display results of aggregate sales predicted using TLFN.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.3a. Prediction of unfiltered sales values using TLFN in Matlab environment.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.3b. Prediction of filtered sales values using TLFN in Matlab environment.


In these experiments, Matlab's TLFN was used to generate the following results: for the unfiltered sales prediction the RMSE value was 0.165, the correlation value was 0.92, the POCD value was 68, and the Theil value was 0.365. For the filtered sales prediction the RMSE was 0.031, the correlation was 0.99, the POCD was 78, the Theil was 0.075, and the MAPE was 0.1461 (Table 6.6).

Figures 6.3.4a and 6.3.4b display the results of sales predicted using the Elman network.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.4a. Prediction of unfiltered sales values using the Elman neural network in Matlab environment.

[Plot: Predicted Aggregate Sales from S&P 500 Index and 3-month T-Bills; desired and output sales values, October 1987 - December 2005]

Figure 6.3.4b. Prediction of filtered sales values using the Elman neural network in Matlab environment.


Figures 6.3.4a and 6.3.4b represent the results of experiments performed using the Elman neural network in the Matlab environment. They portray the prediction of aggregate sales using the S&P 500 index and inverted 3-month T-Bills. The results of the prediction for the filtered sales dataset were better than for the unfiltered sales data. For example, the RMSE for the unfiltered sales prediction was 0.193 while the filtered sales prediction yielded 0.032, and the correlation values were 0.90 for the unfiltered and 0.99 for the filtered sales. The Theil values were 0.345 (unfiltered) and 0.202 (filtered), and the MAPE values were 0.3211 and 0.2920 respectively. The POCD values, however, were the same at 70.

Table 6.6

Experiment #          Epochs   RMSE    Correlation   POCD   Theil   MAPE
6.3.1a (unfiltered)       37   0.141   0.88            41   0.442   0.2550
6.3.1b (filtered)         32   0.054   0.99            64   0.108   0.2141
6.3.2a (unfiltered)       31   0.141   0.88            39   0.447   0.3336
6.3.2b (filtered)         64   0.063   0.98            63   0.120   0.2783
6.3.3a (unfiltered)      121   0.165   0.92            68   0.365   0.5280
6.3.3b (filtered)        150   0.031   0.99            78   0.075   0.1461
6.3.4a (unfiltered)      130   0.193   0.90            70   0.345   0.3211
6.3.4b (filtered)        150   0.032   0.99            70   0.202   0.2920

Note: Summary of performance metrics predicting sales with S&P 500 index and 3-month T-Bills.

Table 6.7

Experiments                        Epochs   RMSE    Correlation   POCD   Theil   MAPE
Unfiltered                            106   0.165   0.904           54   0.389   0.356
Filtered                               97   0.060   0.977           68   0.134   0.221
NeuroSolutions (NS)                    43   0.096   0.941           54   0.241   0.237
Matlab                                158   0.108   0.958           72   0.217   0.306
TLFN                                   97   0.091   0.954           64   0.219   0.265
Elman                                 103   0.107   0.945           62   0.239   0.278
S&P 500 Index                         106   0.085   0.951           63   0.218   0.272
3-month T-Bills                       103   0.110   0.953           64   0.219   0.255
S&P 500 Index & 3-month T-Bills        89   0.103   0.941           62   0.263   0.296

Note: Summary averages of the experiments performed in their respective categories.


Table 6.8

Summary of Experiments in NeuroSolutions Environment

                                              TLFN                                          Elman
Data Type                          Epoch   RMSE    r     POCD    Theil  MAPE     Epoch   RMSE    r      POCD    Theil  MAPE
Unfiltered
S&P 500 Index                         35   0.141   0.87    37    0.456  0.2314      48   0.141   0.88     38    0.440  0.2507
3-month T-Bills                       37   0.141   0.88    38    0.442  0.2814      33   0.141   0.86     40    0.448  0.2536
S&P 500 Index & 3-month T-Bills       37   0.141   0.88    41    0.442  0.2550      31   0.141   0.88     39    0.447  0.3336
Averages                           36.33   0.141   0.88  38.667  0.447  0.2559  37.333   0.141   0.873    39    0.445  0.2793
Filtered
S&P 500 Index                         47   0.063   0.98    61    0.122  0.2258      38   0.063   0.98     64    0.117  0.2121
* S&P 500 Index                       60   0.055   0.98    64    0.114  0.2138      40   0.055   0.98     64    0.115  0.2091
3-month T-Bills                       38   0.055   0.98    62    0.115  0.2045      43   0.063   0.98     63    0.124  0.2041
* 3-month T-Bills                     32   0.070   0.98    62    0.130  0.2276      78   0.063   0.98     64    0.123  0.2017
S&P 500 Index & 3-month T-Bills       32   0.054   0.99    64    0.108  0.2141      64   0.063   0.98     63    0.120  0.2783
Averages                            41.8   0.0594  0.98   62.6   0.118  0.2172    52.6   0.0614  0.98    63.6   0.120  0.2211

Summary of Experiments in Matlab Environment

                                              TLFN                                          Elman
Data Type                          Epoch   RMSE    r     POCD    Theil  MAPE     Epoch   RMSE    r      POCD    Theil  MAPE
Unfiltered
S&P 500 Index                        200   0.044   0.96    74    0.292  0.4250     200   0.273   0.92     69    0.322  0.5972
3-month T-Bills                      200   0.221   0.96    74    0.278  0.4312     200   0.236   0.94     63    0.390  0.3614
S&P 500 Index & 3-month T-Bills      121   0.165   0.92    68    0.365  0.5280     130   0.193   0.90     70    0.345  0.3211
Averages                           173.7   0.1433  0.95    72    0.312  0.4614  176.67   0.234   0.92   67.333  0.352  0.4266
Filtered
S&P 500 Index                        150   0.010   0.99    86    0.080  0.1191     150   0.035   0.99     73    0.129  0.2421
* S&P 500 Index                      150   0.070   0.94    63    0.227  0.3009     150   0.077   0.94     64    0.197  0.2317
3-month T-Bills                      150   0.141   0.99    77    0.118  0.1918     150   0.074   0.96     73    0.137  0.2320
* 3-month T-Bills                    120   0.054   0.97    74    0.155  0.2391     150   0.065   0.96     76    0.168  0.2317
S&P 500 Index & 3-month T-Bills      150   0.031   0.99    78    0.075  0.1461     150   0.032   0.99     70    0.202  0.2920
Averages                             144   0.0612  0.98   75.6   0.131  0.1994     150   0.0566  0.968   71.2   0.167  0.2459

Note: Summary of performance metrics for all experiments conducted in the prediction of Aggregate Sales.

* Denotes the experiments that adopted the shifting strategy.

r Represents the correlation coefficient of the experiments.

7. Data Analysis

This section provides an analysis of the forecasted sales data to give some understanding of the experiments and the interpretation of the results. In general, it can be said that the filtered versions of the datasets produced better forecasted results. Not only did the graphical representations of the datasets demonstrate relatively good predicted outputs compared to the desired samples, but the average RMSE value of 0.060 was also smaller than the unfiltered value of 0.165 (Table 6.7), and the correlation values were stronger, as evidenced by their average of 0.977 compared to the unfiltered value of 0.904. The Theil values for the filtered datasets were also closer to zero, which substantiated the other performance metrics (RMSE and correlation coefficient) mentioned. The POCD values, which measure the percentage of correct directions between the desired and predicted values, averaged 68 for the filtered sales forecasted samples whereas the unfiltered datasets averaged 54.
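To make these comparisons concrete, the five performance metrics reported throughout Chapter 6 can be computed directly from the desired and predicted sales series. The following Matlab fragment is an illustrative sketch only, not the exact code used in the experiments: the vector names desired and predicted are hypothetical, the Theil statistic is assumed to take the U-coefficient form, and MAPE is expressed as a fraction to match the values reported in Tables 6.4 through 6.8.

    % Illustrative computation of the performance metrics (sketch).
    % desired and predicted are column vectors of equal length (hypothetical names).
    err   = desired - predicted;
    rmse  = sqrt(mean(err.^2));                    % root mean square error
    R     = corrcoef(desired, predicted);          % 2x2 correlation matrix
    r     = R(1,2);                                % correlation coefficient
    pocd  = 100 * mean(sign(diff(desired)) == ...  % percentage of correct
                       sign(diff(predicted)));     % directional changes
    theil = rmse / (sqrt(mean(desired.^2)) + ...   % assumed Theil U form
                    sqrt(mean(predicted.^2)));
    mape  = mean(abs(err ./ desired));             % MAPE as a fraction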

The following is a specific example of the kind of results that were obtained. A comparative look at Figures 6.2.2a, 6.2.2b and 6.2.2c showed that better prediction results were generated in the experiments where the filtered versions (Figures 6.2.2b and 6.2.2c) of the datasets were used. In particular, the desired and network outputs were almost visually identical for the forecasting of filtered aggregate sales. The performance metrics also supported the data graphically displayed in these plots: the RMSE and correlation values were 0.063 and 0.98 respectively for both predictions. The experiment that utilized the shifting strategy (Figure 6.2.2c) generated slightly better results, as evidenced by the POCD, Theil and MAPE values of 64, 0.123, and 0.202 respectively (Table 6.5). In comparison, the unfiltered version of the aggregate sales prediction (Figure 6.2.2a) did not produce quite as good results, with a noisy plot and a larger RMSE value of 0.141. Also, the correlation, POCD, Theil, and MAPE values were not as good as in the cases of the filtered samples (Table 6.5).

Experiments 6.1.1c, 6.1.2c, 6.1.3c, 6.1.4c, 6.2.1c, 6.2.2c, 6.2.3c and 6.2.4c, in which the S&P 500 index and interest rate datasets were individually shifted relative to sales to obtain their respective leads over sales by finding the strongest correlation before training and testing, also provided reasonably good results. This did not prove that shifting samples to obtain the maximum correlation between the predictor and predicted variable results in better forecasting in every circumstance, but it showed that this methodology can be useful in cases where conventional techniques do not suffice in producing desirable results. Generally, the predictions obtained with the shifted filtered datasets were not as good as those obtained with the non-shifted filtered datasets. In Matlab's case, the non-shifted filtered datasets were generally better than the shifted filtered ones, especially when both the S&P 500 index and 3-month T-Bills (Table 6.8) were used to forecast aggregate sales, with the best overall performance coming from the S&P 500 index. For the S&P 500 index, the shifted filtered dataset yielded generally better performance than the non-shifted filtered one for both the TLFN and the Elman forecasts. However, for the inverted T-Bills under TLFN, the non-shifted filtered dataset yielded generally better performance than the shifted filtered one, while under the Elman they were essentially the same.
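As a rough illustration of the shifting strategy, the lead of a predictor over sales can be located by scanning the correlation between the two series at candidate lags and then aligning the series at the lag with the strongest correlation. The sketch below is an assumption-laden illustration rather than the exact procedure used in the experiments; the names sales and predictor are hypothetical, the 24-month search range is arbitrary, and ties and edge effects are ignored.

    % Find the lag (in months) at which the predictor correlates most
    % strongly with sales, then align the two series at that lead.
    maxLag = 24;                          % assumed search range in months
    best = -Inf; lead = 0;
    for lag = 0:maxLag
        R = corrcoef(predictor(1:end-lag), sales(1+lag:end));
        if R(1,2) > best
            best = R(1,2); lead = lag;    % remember the strongest correlation
        end
    end
    % Align the series: the predictor leads sales by 'lead' months
    % (6 for the S&P 500 index and 20 for inverted T-Bills in this study).
    x = predictor(1:end-lead);
    y = sales(1+lead:end);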

A closer look at the summary of results (Table 6.8) showed that the unfiltered versions of the experiments produced similar results in the NeuroSolutions TLFN and Elman recurrent networks; only the MAPE was marginally better using the S&P 500 index as the predictor in the TLFN model. In Matlab, the TLFN models using the unfiltered datasets generally performed better than the Elman recurrent networks, as shown in the performance metrics (Table 6.8). In particular, with TLFN both the S&P 500 index and the 3-month T-Bills performed better, except in the case of MAPE for the 3-month T-Bills with a value of 0.2317 (Table 6.8). The filtered versions of the experiments using the NeuroSolutions TLFN and Elman networks again generated similar results; only the overall average number of epochs for the TLFN designs (42) was smaller compared to that of the Elman recurrent networks (53), showing that the TLFN trained faster on the datasets given similar network models (Table 6.8). The networks that utilized the filtered versions of the datasets in Matlab's neural network environment also generated similar average performance statistics; more importantly, the experiment that derived the best overall results used the S&P 500 index as the predictor and produced RMSE, correlation, POCD, Theil, and MAPE values of 0.01, 0.99, 86, 0.08, and 0.1191 respectively (Table 6.8).

In the case where both datasets were used together to predict sales in the NeuroSolutions and Matlab environments, the empirical results were comparable to those of the experiments in which only one predictor was used, with an overall RMSE of 0.103 compared to 0.085 for the S&P 500 index and 0.110 for the 3-month T-Bills, a correlation value of 0.941 compared to 0.951 for the S&P 500 index and 0.953 for the 3-month T-Bills, and a POCD value of 62 compared to 63 for the S&P 500 index and 64 for the 3-month T-Bills (Table 6.7). The Theil and MAPE averages were also similar, as shown in Table 6.7. In the experiments that used the shifting strategy, the 3-month T-Bills produced better results than the S&P 500 index in the Matlab neural network environment. This can be seen in Table 6.8, which shows a RMSE value of 0.054, a correlation value of 0.97, a POCD value of 74, a Theil value of 0.155, and a MAPE value of 0.2391 for the 3-month T-Bills, compared to a RMSE value of 0.07, a correlation value of 0.94, a POCD value of 63, a Theil value of 0.227, and a MAPE value of 0.3009 for the S&P 500 index. The Elman recurrent network demonstrated similar performance characteristics using this methodology, except that the MAPE values of 0.2317 were the same for the S&P 500 index and the 3-month treasury bills (Table 6.8).

In comparing the prediction results obtained in the two environments, NeuroSolutions performed better than Matlab in the areas of the average number of epochs with a value of 43, RMSE with a value of 0.096, and MAPE with a value of 0.237; however, Matlab performed better than NeuroSolutions in the areas of average correlation with a value of 0.958, POCD with a value of 72, and Theil with a value of 0.217 (Table 6.7). Overall, the TLFN model provided moderately better results than the Elman recurrent network, as shown in Table 6.7, with value differences of 0.016, 0.009, 2, 0.02, and 0.013 in the RMSE, correlation, POCD, Theil, and MAPE respectively.

Multiple predictors were not used in the experiments that utilized the shifting strategy, since compensating for the 6-month lead of the S&P 500 index and the 20-month lead of the inverted interest rates to maximize correlation prevented adequate manipulation of the datasets, and hence forecasting.

8. Conclusion

The experiments in this study provided empirical results that demonstrated the predictability of non-linear data, in particular economic variables, using artificial neural network models. The experiments were successful in predicting aggregate sales using the S&P 500 index, inverted 3-month treasury bills, and the combination of both. In general, the results were essentially the same. Although no significance test was performed on the prediction results, one might suspect that the forecasts using the shifted filtered datasets would have been noticeably better than those using mere filtering, but this was not found to be the case. One reason may be that the small size of the network resulted in better generalization, which possibly compensated for the positive effects of shifting one dataset relative to the other.

During forecasting, the windowing technique was used to generate better results in the Matlab environment. This method required dividing the datasets into 3 subsets of length 75, 75, and 70. The NeuroSolutions neural network models did not require the windowing methodology, as those models adapted well to the training samples and produced prediction results on the testing data similar to those obtained with the windowing technique in the Matlab neural network models.
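A minimal sketch of this windowing split is given below. It assumes the 220 monthly samples implied by the stated window lengths, stored as rows of a matrix named data; the use of the first two windows for training and the third for testing is an assumption for illustration, as is all naming.

    % Divide the samples into three windows of length 75, 75, and 70.
    lens  = [75 75 70];
    edges = cumsum([0 lens]);                 % window boundaries 0, 75, 150, 220
    win1  = data(edges(1)+1:edges(2), :);
    win2  = data(edges(2)+1:edges(3), :);
    win3  = data(edges(3)+1:edges(4), :);
    trainSet = [win1; win2];                  % assumed: first two windows train
    testSet  = win3;                          % assumed: final window tests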

This work focused only on predicting aggregate sales with the S&P 500 index and inverted 3-month treasury bills, individually and in combination. However, this neural network methodology can be utilized to forecast many other types of time series data, for example, the bond market and interest rates. Another area of interest arises because Matlab generated somewhat better results with the windowing technique: it would be desirable to attempt this methodology using NeuroSolutions with a similar size network, since the Matlab networks had more processing elements. The NeuroSolutions neural network models were set up with one delay per input while the Matlab models had seven delays. Would a larger NeuroSolutions network model generalize better? Would a smaller Matlab neural network model generalize better? How large can a network model be before the predicted results begin to deteriorate? These are further areas of study that need to be pursued. In almost every case the TLFN performed better than the Elman recurrent network. This may be a consequence of the fact that the TLFN is considered easier to train and therefore more practical (Principe et al., 2000), and of the fact that the Elman network is a simple version of a recurrent network. Nonetheless, the Elman recurrent network, at least in theory, is considered to be more efficient than the TLFN (Principe et al., 2000). Therefore it would be noteworthy to examine different Elman recurrent network designs for comparison with TLFNs in future studies.
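For such follow-up comparisons, the two architectures can be instantiated in the Matlab 6.5 Neural Network Toolbox roughly as follows. This is a sketch under stated assumptions rather than the exact configuration used here: the seven-tap input delay line reflects the Matlab setup described above, but the hidden-layer size of 10 processing elements, the transfer functions, the epoch counts, and the variable names P and T are illustrative only.

    % Sketch: a time-lagged feedforward network with a 7-tap input delay
    % line, and an Elman recurrent network of the same assumed size.
    % P is the predictor sequence, T the sales target (hypothetical names).
    pr    = minmax(P);                                      % input ranges
    tlfn  = newfftd(pr, 0:7, [10 1], {'tansig','purelin'}); % assumed 10 hidden PEs
    elman = newelm(pr, [10 1], {'tansig','purelin'});       % same assumed size
    tlfn.trainParam.epochs  = 150;                          % assumed epoch budget
    elman.trainParam.epochs = 150;
    tlfn  = train(tlfn, con2seq(P), con2seq(T));            % sequence-based training
    elman = train(elman, con2seq(P), con2seq(T));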


References

1. “Global Business Cycle Indicators.” The Conference Board.

2. “Technical Indicators.” QuoteLinks.com.

3. “The Bond Market: A Look Back.” Investopedia.com. <http://www.investopedia.com/articles/06/centuryofbonds.asp>.

4. “Time series.” Wikipedia online.

5. “What is Time-Series Forecasting?” Decisioneering.

6. Akhtar, M.A. “Effects of Interest Rates and Inflation on Aggregate Inventory Investment in the United States.” The American Economic Review, Vol. 73, No. 3, June 1983, 319-328.

7. Angeline, Peter J., Gregory M. Saunders, and Jordan B. Pollack. “An Evolutionary Algorithm that Constructs Recurrent Neural Networks.” The Ohio State University, 1993, 1-28.

8. Board of Governors of the Federal Reserve System, Federal Reserve Bank.

9. Bodis, Lorant. “Financial Time Series Forecasting Using Artificial Neural Networks.” Master's Thesis, Babes-Bolyai University, 2004.

10. Crone, Sven F., Stefan Lessmann, and Robert Stahlbock. “Utility based Data Mining for Time Series Analysis – Cost-sensitive Learning for Neural Network Predictors.” ACM, Chicago, Illinois.

11. Easton, Valerie J. and John H. McColl. “Statistics Glossary – Time series data.” September 1997.

12. Edmonds, Andrew N. “Time Series Prediction Using Supervised Learning and Tools from Chaos Theory.” University of Luton, Faculty of Science and Computing, September 1996.

13. Gruca, Thomas S., Bruce R. Klemz, and E. Ann Furr Petersen. “Mining Sales Data using a Neural Network Model of Market Response.” ACM SIGKDD Explorations Newsletter, Vol. 1, Issue 1, June 1999, 39-43.

14. Hall, Robert E. and Marc Lieberman. Economics: Principles and Applications. Cincinnati, Ohio: South-Western College Publishing, 1998.

15. Haykin, Simon. Neural Networks: A Comprehensive Foundation. New York, NY: Macmillan College Publishing Company, 1994.

16. Jasic, Teo and Douglas Wood. “The profitability of daily stock market indices trades based on neural network predictions: case study for the S&P 500, the DAX, the TOPIX and the FTSE in the period 1965-1999.” Applied Financial Economics, 2004.

17. Johnson, Eugene. Fundamentals of Marketing, 4th Edition. AMACOM, 2004.

18. Kanas, Angelos. “Non-Linear Forecasts of Stock Returns.” Journal of Forecasting, 2003, 1-5.

19. Khosrow-Pour, Mehdi, ed. Encyclopedia of Information Science and Technology, Volume IV. Idea Group Publishing, 2005. <http://library.books24x7.com/book/id_15522/toc.asp>.

20. Knoop, Todd A. Recessions and Depressions: Understanding Business Cycles. Praeger Publishers, 2004.

21. Konur, Umut and Ali Okatan. “Time Series Prediction using Recurrent Neural Network Architectures and Time Delay Neural Networks.” Department of Computer Engineering, Halic University, Istanbul, Turkey, 2004, 5-8.

22. Michalak, Krzysztof and Rafal Raciborski. “Dynamic Correlation Approach to Early Stopping in Artificial Neural Network Training, Macroeconomic Forecasting Example.” IEEE Computer Society Digital Library, 2005.

23. Larrain, Maurice. “Do interest rates lead real sales and inventories? A spectral analysis approach – Statistical Data Included.” Business Economics, April 2002.

24. Lawrence, Ramon. “Using Neural Networks to Forecast Stock Market Prices.” University of Manitoba, 1997.

25. Meng, Qiang. “Data Mining & Statistics in Business.”

26. Navarro, Peter, ed. What the Best MBAs Know: How to Apply the Greatest Ideas Taught in the Business Schools. New York, NY: McGraw-Hill, 2005. <http://library.books24x7.com/book/id_11949/toc.asp>.

27. Perez-Rodriguez, Jorge V., Salvador Torra, and Andrada Felix. “Are Spanish ibex35 stock future index returns forecasted with non-linear models?” Applied Financial Economics, 2005, 963-975.

28. Principe, Jose C., Neil R. Euliano, and W. Curt Lefebvre. Neural and Adaptive Systems: Fundamentals Through Simulations. New York, NY: John Wiley & Sons, 2000.

29. Qi, Min. “Nonlinear Predictability of Stock Returns Using Financial and Economic Variables.” Journal of Business & Economic Statistics, Vol. 17, No. 4, October 1999, 419-429.

30. Reed, Russell and Robert J. Marks II. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. London, England: MIT Press, 1999.

31. Skabar, Andrew and Ian Cloete. “Neural Networks, Financial Trading and the Efficient Markets Hypothesis.” The Australian Computer Society, Vol. 4, 2001, 241-249.

32. Smith, Kate and Jatinder Gupta. Neural Networks in Business: Techniques and Applications. Idea Group Publishing, 2002. <http://library.books24x7.com/book/id_4082/toc.asp>.

33. “Standard & Poor's 500 Index – S&P 500.” Investopedia.com.

34. Swingler, Kevin. Applying Neural Networks: A Practical Guide. New York, NY: Harcourt Brace & Company, 1996.

35. Thawornwong, Suraphan and David Enke. “The adaptive selection of financial and economic variables for use with artificial neural networks.” Neurocomputing, 2004, 205-215.

36. The MathWorks. “Neural Network Toolbox using Matlab version 6.5.”

37. The MathWorks. “Matlab Function Reference.” Desktop Tools & Development Environment - Programming Tools.

38. Virili, Francesco and Bernd Freisleben. “Nonstationarity and Data Preprocessing for Neural Network Predictions of an Economic Time Series.” IEEE International Joint Conference on Neural Networks, 2000, 1-6.

39. Wu, Shaun-inn and Ruey-Pyng Lu. “Combining Artificial Neural Networks and Statistics for Stock-Market Forecasting.” Proceedings of the 1993 ACM Conference on Computer Science, Indianapolis, Indiana, February 1993, 16-181.

40. Yahoo. “Finance.”

41. Zekic, Marijana. “Neural Network Applications in Stock Market Predictions – A Methodology Analysis.” University of Josip Juraj Strossmayer in Osijek, Faculty of Economics, Croatia, 1998, 1-11.

42. Zhang, Guoqiang, B. Eddy Patuwo, and Michael Y. Hu. “Forecasting with artificial neural networks: The state of the art.” Graduate School of Management, Kent State University, Kent, Ohio, 1997, 35-62.

43. Zhang, Peter G., ed. Neural Networks in Business Forecasting. Idea Group Publishing, 2004.