Forecasting with Artificial Neural Networks

EVIC 2005 Tutorial Santiago de Chile, 15 December 2005

→ slides on www.neural-forecasting.com

Sven F. Crone, Centre for Forecasting, Department of Management Science, Lancaster University Management School. Email: [email protected]


What you can expect from this session …

• Simple back-propagation algorithm [Rumelhart et al. 1986]
• "How to …" on Neural Network Forecasting with limited maths!
• CD Start-Up Kit for Neural Net Forecasting
  → 20+ software simulators
  → datasets
  → literature & FAQ
→ slides, data & additional info on www.neural-forecasting.com
(The back-propagation equations previewed on this slide are derived step by step in the Network Training section below.)

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Forecasting or Prediction?

• Data Mining: "Application of data analysis algorithms & discovery algorithms that extract patterns out of the data" → algorithms?

TASKS
• Descriptive Data Mining: Summarisation, Association, Sequence Discovery, Clustering & Visualisation
• Predictive Data Mining (timing of the dependent variable): Time Series Analysis, Regression, Classification

ALGORITHMS (by task)
- Summarisation: Feature Selection, Principal Component Analysis, Class Entropy
- Association: Association rules, Link Analysis
- Sequence Discovery: Temporal association rules
- Clustering & Visualisation: K-means Clustering, Neural networks
- Time Series Analysis: Exponential smoothing, (S)ARIMA(x), Neural networks (MLP, RBFN, GRNN)
- Regression: Linear Regression, Nonlinear Regression, Neural networks
- Classification: Decision trees, Logistic regression, Discriminant Analysis, Neural networks

Forecasting or Classification

Dependent variable vs. independent variables (by scale):
- Metric dependent: Regression (metric independents), Analysis of Variance (ordinal/nominal independents), Time Series Analysis; otherwise DOWNSCALE
- Ordinal dependent: DOWNSCALE
- Nominal dependent: Classification, Contingency Analysis; otherwise DOWNSCALE
- No dependent variable: Principal Component Analysis, Clustering

• Simplification from Regression ("How much?") → Classification ("Will the event occur?")
→ FORECASTING = PREDICTIVE modelling (the dependent variable lies in the future)
→ FORECASTING = REGRESSION modelling (the dependent variable is of metric scale)

Forecasting or Classification?

• What the experts say …

International Conference on Data Mining

• You are welcome to contribute … www.dmin-2006.com !!!

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Forecasting Models

• Time series analysis vs. causal modelling

Time series prediction (univariate)
- Assumes that the data generating process which creates the patterns can be explained solely from previous observations of the dependent variable

Causal prediction (multivariate)
- Assumes that the data generating process can be explained by the interaction of causal (cause-and-effect) independent variables

Classification of Forecasting Methods

Forecasting Methods
• Objective Forecasting Methods
  - Time Series Methods: Averages, Moving Averages, Naive Methods, Exponential Smoothing (Simple ES, Linear ES, Seasonal ES, Dampened Trend ES), Simple Regression, Autoregression, ARIMA, Neural Networks
  - Causal Methods: Linear Regression, Multiple Regression, Dynamic Regression, Vector Autoregression, Intervention models, Neural Networks
• Subjective "Prophecy" Forecasting Methods (educated guessing…): Sales Force Composite, Analogies, Delphi, PERT, Survey techniques
→ Neural Networks CAN be used as time series methods & causal methods
→ Demand Planning Practice: Averages & ES, Regression … objective methods + subjective correction

Time Series Definition

• Definition
A time series is a series of time-ordered, comparable observations y_t recorded in equidistant time intervals.
• Notation
Y_t represents the observation of period t, t = 1, 2, …, n

Concept of Time Series

• An observed measurement is made up of a systematic part and a random part
• Approach
- Unfortunately we cannot observe either of these!
- Forecasting methods try to isolate the systematic part
- Forecasts are based on the systematic part
- The random part determines the distribution shape
• Assumption: data observed over time is comparable
- The time periods are of identical lengths (check!)
- The units they are measured in remain unchanged (check!)
- The definitions of what is being measured remain unchanged (check!)
- They are correctly measured (check!)
Data errors arise from sampling, from bias in the instruments or the responses, and from transcription.

Objective Forecasting Methods – Time Series

Methods of Time Series Analysis / Forecasting
- Class of objective methods
- Based on the analysis of past observations of the dependent variable alone
• Assumption
- There exists a cause-effect relationship that keeps repeating itself with the yearly calendar
- The cause-effect relationship may be treated as a BLACK BOX
- TIME-STABILITY HYPOTHESIS ASSUMES NO CHANGE: → the causal relationship remains intact indefinitely into the future!
- The time series can be explained & predicted solely from previous observations of the series
→ Time series methods consider only past patterns of the same variable
→ Future events (with no occurrence in the past) are explicitly NOT considered!
→ External EVENTS relevant to the forecast must be corrected MANUALLY

Simple Time Series Patterns

[Figure: four example time series]
- Diagram 1.1: Trend - long-term growth or decline occurring within a series
- Diagram 1.2: Seasonal - more or less regular movements within a year (or a shorter period), superimposed on trend and cycle
- Diagram 1.3: Cycle - alternating upswings of varied length and intensity; regular fluctuation superimposed on trend (the period may be random)
- Diagram 1.4: Irregular - random movements and those which reflect unusual events

Regular Components of Time Series

A time series consists of superimposed components / patterns:
• Signal: level 'L', trend 'T', seasonality 'S'
• Noise: irregular error 'e'

Sales = LEVEL + SEASONALITY + TREND + RANDOM ERROR

Irregular Components of Time Series

Structural changes in systematic data:
• PULSE: a one-time occurrence on top of a systematic stationary / trended / seasonal development
  [Figure: stationary time series with a pulse]
• LEVEL SHIFT: one-time / multiple shifts on top of a systematic stationary / trended / seasonal development
  [Figure: stationary time series with a level shift]
• STRUCTURAL BREAKS: trend changes (slope, direction); seasonal pattern changes & shifts

Components of Time Series

• Time series → decomposed into components:
- Level
- Trend
- Random

Time Series Patterns

Time Series Patterns
• REGULAR time series patterns
  - STATIONARY: Y_t = f(E_t), influenced by level & random fluctuations
  - SEASONAL: Y_t = f(S_t, E_t), influenced by level, season and random fluctuations
  - TRENDED: Y_t = f(T_t, E_t), influenced by trend from level and random fluctuations
  - Combination of individual components: Y_t = f(S_t, T_t, E_t)
• IRREGULAR time series patterns
  - FLUCTUATING: the series fluctuates very strongly around its level (mean deviation > ca. 50% around the mean)
  - INTERMITTENT: the number of periods with zero sales is high (ca. 30%-40%)
+ PULSES! + LEVEL SHIFTS! + STRUCTURAL BREAKS!
[Figure: one example series per pattern]

Components of complex Time Series

Sales or observation Y_t of the time series at point t consists of a combination f(·) of components; there are different possibilities to combine the components:
- Base level L
- Seasonal component S_t
- Trend component T_t
- Irregular or random error component E_t

Additive model: Y_t = L + S_t + T_t + E_t
Multiplicative model: Y_t = L · S_t · T_t · E_t
[Figure: example series for each component and for the combined series]
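A quick synthetic illustration of the two combination schemes (a sketch in Python with numpy; the component values are illustrative, not the slide's data; for the multiplicative model, seasonality, trend and error are expressed as indices around 1):

import numpy as np

t = np.arange(48)
L = 100.0                                          # base level
T = 0.5 * t                                        # trend component
S = 10 * np.sin(2 * np.pi * t / 12)                # seasonal component, period 12
E = np.random.default_rng(0).normal(0, 2, len(t))  # random error

y_add = L + S + T + E                              # additive: Y_t = L + S_t + T_t + E_t
y_mul = L * (1 + S/100) * (1 + T/100) * (1 + E/100)  # multiplicative: Y_t = L*S_t*T_t*E_t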

Classification of Time Series Patterns [Pegels 1969 / Gardner]
- Seasonal effect: none / additive / multiplicative
- Trend effect: none / additive / multiplicative
→ 3 x 3 = 9 pattern combinations

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
      1. SARIMA – Differencing
      2. SARIMA – Autoregressive Terms
      3. SARIMA – Moving Average Terms
      4. SARIMA – Seasonal Terms
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Introduction to ARIMA Modelling

• Seasonal Autoregressive Integrated Moving Average processes: SARIMA
- Popularised by George Box & Gwilym Jenkins in the 1970s (the names are often used synonymously)
- The models are widely studied
- Box & Jenkins put together the theoretical underpinning required to understand & use ARIMA
- They defined a general notation for dealing with ARIMA models → claim that most time series can be parsimoniously represented by the ARIMA class of models
• ARIMA(p,d,q) models attempt to describe the systematic pattern of a time series by 3 parameters:
- p: number of autoregressive terms (AR terms) in a time series
- d: number of differences to achieve stationarity of a time series
- q: number of moving average terms (MA terms) in a time series

Φ_p(B) (1 − B)^d Z_t = δ + Θ_q(B) e_t
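For reference, the lag polynomials in this notation expand as follows (standard Box-Jenkins notation, consistent with the AR and MA equations on the following slides; this expansion is added here for clarity and is not on the original slide):

\Phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p
\Theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q
\text{with backshift operator } B\,Y_t = Y_{t-1}, \text{ so } (1-B)^d Z_t \text{ is the series differenced } d \text{ times.}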

The Box-Jenkins Methodology for ARIMA models

1. Model Identification
   - Data preparation: transform the time series for stationarity; difference the time series for stationarity
   - Model selection: examine ACF & PACF; identify potential models (p,q)(P,Q)
2. Model Estimation & Testing
   - Model estimation: estimate the parameters in the potential models; select the best model using a suitable criterion
   - Model diagnostics / testing: check the ACF / PACF of the residuals → white noise; run a portmanteau test on the residuals; re-identify if necessary
3. Model Application
   - Use the selected model to forecast

ARIMA-Modelling

• ARIMA(p,d,q) models
- ARIMA: autoregressive terms AR(p), with p = order of the autoregressive part
- ARIMA: order of integration, with d = degree of first differencing / integration involved
- ARIMA: moving average terms MA(q), with q = order of the moving average of the error
• SARIMA(p,d,q)(P,D,Q)s, with (P,D,Q) the corresponding process on the seasonal lags
• Objective
- Identify the appropriate ARIMA model for the time series: identify the AR term, the I term and the MA term
• Identification through
- Autocorrelation Function (ACF)
- Partial Autocorrelation Function (PACF)

ARIMA-Models: Identification of d-term

• Parameter d determines the order of integration
• ARIMA models assume stationarity of the time series
- Stationarity in the mean
- Stationarity of the variance (homoscedasticity)
• Recap:
- Let the mean of the time series at t be μ_t = E(Y_t)
- and λ_{t,t−τ} = cov(Y_t, Y_{t−τ}), λ_{t,t} = var(Y_t)
• Definition
A time series is stationary if its mean level μ_t is constant for all t and its variance and covariances λ_{t,t−τ} are constant for all t. In other words:
- All properties of the distribution (mean, variance, skewness, kurtosis etc.) of a random sample of the time series are independent of the absolute time t of drawing the sample → identity of mean & variance across time

ARIMA-Models: Stationarity and parameter d

• Is the time series stationary?
- Stationarity requires for any two samples A and B: μ(A) = μ(B), var(A) = var(B), etc.
- For the time series shown: μ(B) > μ(A) → trend → non-stationary time series

ARIMA Models: Differencing for Stationarity

• Differencing a time series
- E.g.: the time series Y_t = {2, 4, 6, 8, 10} exhibits a linear trend
- 1st-order differencing between observation Y_t and its predecessor Y_{t−1} derives a transformed time series: 4−2=2, 6−4=2, 8−6=2, 10−8=2
→ The new time series ΔY_t = {2, 2, 2, 2} is stationary through 1st differencing
→ d=1 → ARIMA(0,1,0) model
→ 2nd-order differences: d=2
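A minimal sketch of this differencing step in Python (numpy assumed; the series and d = 1 follow the worked example above):

import numpy as np

y = np.array([2, 4, 6, 8, 10])   # series with a linear trend, as above

# 1st-order differences: dy[t] = y[t] - y[t-1]
dy = np.diff(y)                  # -> [2, 2, 2, 2]: stationary -> d = 1

# 2nd-order differences would be np.diff(y, n=2) -> d = 2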

ARIMA Models: Differencing for Stationarity

• Integration
- Differencing: Z_t = Y_t − Y_{t−1}
- Transforms: logarithms etc.
- Here Z_t is a transform of the variable of interest Y_t, chosen so that the (repeatedly) differenced series Z_t − Z_{t−1}, (Z_t − Z_{t−1}) − (Z_{t−1} − Z_{t−2}), … becomes stationary

• Tests for stationarity:
- Dickey-Fuller test
- Serial correlation test
- Runs test
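As an illustration of the first test, the (augmented) Dickey-Fuller test is available in statsmodels; a minimal sketch, assuming statsmodels is installed (it is not part of the original tutorial toolchain):

import numpy as np
from statsmodels.tsa.stattools import adfuller

y = np.cumsum(np.random.default_rng(0).standard_normal(200))  # random walk -> non-stationary
stat, pvalue = adfuller(y)[:2]   # augmented Dickey-Fuller test statistic & p-value
print(f"ADF statistic {stat:.2f}, p-value {pvalue:.3f}")
# a large p-value -> a unit root cannot be rejected -> difference the series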

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
      1. SARIMA – Differencing
      2. SARIMA – Autoregressive Terms
      3. SARIMA – Moving Average Terms
      4. SARIMA – Seasonal Terms
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

ARIMA-Models – Autoregressive Terms

• Description of the autocorrelation structure → autoregressive (AR) terms
- If a dependency exists between lagged observations Y_t and Y_{t−1}, we can describe the realisation of Y_t from Y_{t−1}:

Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + … + φ_p Y_{t−p} + e_t

(Y_t = observation in time t; φ_i = weight of the observation in the AR relationship; e_t = independent random component, "white noise")

- The equations include only lagged realisations of the forecast variable
- ARIMA(p,0,0) model = AR(p) model

• Problems
- Independence of the residuals is often violated (heteroscedasticity)
- Determining the number of past values is problematic
• Tests for autoregression: portmanteau tests
- Box-Pierce test
- Ljung-Box test

ARIMA Models: Parameter p of the Autocorrelation

• A stationary time series can be analysed for its autocorrelation structure
• The autocorrelation coefficient for lag k,

ρ_k = Σ_{t=k+1}^{n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)²,

denotes the correlation between lagged observations of distance k
• Graphical interpretation: uncorrelated data has low autocorrelations; a scatter plot of x_t against x_{t−1} shows no correlation pattern
[Figure: scatter plot of x_t against x_{t−1}]

ARIMA Models: Parameter p

• E.g. time series Y_t: 7, 8, 7, 6, 5, 4, 5, 6, 4.

lag 1: (7,8) (8,7) (7,6) (6,5) (5,4) (4,5) (5,6) (6,4)    r1 = .62
lag 2: (7,7) (8,6) (7,5) (6,4) (5,5) (4,6) (5,4)          r2 = .32
lag 3: (7,6) (8,5) (7,4) (6,5) (5,6) (4,4)                r3 = .15

→ The autocorrelations r_k gathered at lags 1, 2, … make up the autocorrelation function (ACF)
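The coefficient ρ_k from the formula above in plain numpy (a sketch; the series is the worked example; note that estimators which correlate the lagged pairs directly, as in the table above, give somewhat different values than this mean-centred version):

import numpy as np

def acf(y, k):
    """Autocorrelation coefficient at lag k, as defined above."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))
    den = np.sum((y - ybar) ** 2)
    return num / den

y = [7, 8, 7, 6, 5, 4, 5, 6, 4]
print([round(acf(y, k), 2) for k in (1, 2, 3)])  # the first three ACF values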

ARIMA-Models – Autoregressive Terms

• Identification of AR terms …?
[Figure: two sample ACFs with confidence limits, lags 1-16. Left: random independent observations, no significant lags. Right: an AR(1) process? Decaying significant autocorrelations.]

ARIMA Models: Partial Autocorrelations

• Partial autocorrelations are used to measure the degree of association between Y_t and Y_{t−k} when the effects of the other time lags 1, 2, 3, …, k−1 are removed
- A significant AC between Y_t and Y_{t−1} and a significant AC between Y_{t−1} and Y_{t−2} together induce correlation between Y_t and Y_{t−2}! (1st AC = PAC!)
• When fitting an AR(p) model to the time series, the last coefficient of Y_{t−p} measures the excess correlation at lag p which is not accounted for by an AR(p−1) model. π_p is called the p-th order partial autocorrelation, i.e.

π_p = corr(Y_t, Y_{t−p} | Y_{t−1}, Y_{t−2}, …, Y_{t−p+1})

• The partial autocorrelation coefficient measures the true correlation at Y_{t−p}:

Y_t = φ_0 + φ_1 Y_{t−1} + φ_2 Y_{t−2} + … + π_p Y_{t−p} + ν_t

ARIMA Modelling – AR-Model patterns

[Figure: ACF shows a decaying pattern; PACF shows the 1st lag significant; dashed confidence intervals]
• AR(1) model: Y_t = c + φ_1 Y_{t−1} + e_t = ARIMA(1,0,0)

ARIMA Modelling – AR-Model patterns

[Figure: a second AR(1) example with the same signature: pattern in the ACF, 1st lag of the PACF significant]
• AR(1) model: Y_t = c + φ_1 Y_{t−1} + e_t = ARIMA(1,0,0)

ARIMA Modelling – AR-Model patterns

[Figure: ACF shows a decaying pattern; PACF shows the 1st & 2nd lags significant]
• AR(2) model: Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + e_t = ARIMA(2,0,0)

ARIMA Modelling – AR-Model patterns

[Figure: ACF shows a dampened sine pattern; PACF shows the 1st & 2nd lags significant]
• AR(2) model: Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + e_t = ARIMA(2,0,0)

ARIMA Modelling – AR-Models

• Autoregressive model of order one, ARIMA(1,0,0) = AR(1):

Y_t = c + φ_1 Y_{t−1} + e_t, e.g. Y_t = 1.1 + 0.8 Y_{t−1} + e_t (simulated in the sketch below)

[Figure: simulated AR(1) path with rho = .8]

• Higher-order AR models:

Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + … + φ_p Y_{t−p} + e_t

with stationarity conditions, for p = 1: −1 < φ_1 < 1; for p = 2: −1 < φ_2 < 1 ∧ φ_2 + φ_1 < 1 ∧ φ_2 − φ_1 < 1
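The AR(1) example above (c = 1.1, φ_1 = 0.8), simulated in a few lines of numpy (a sketch with a seeded standard normal error term):

import numpy as np

rng = np.random.default_rng(0)
c, phi = 1.1, 0.8                  # |phi| < 1 -> stationary
y = np.zeros(50)
for t in range(1, 50):
    y[t] = c + phi * y[t - 1] + rng.standard_normal()
# y fluctuates around the process mean c / (1 - phi) = 5.5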

Agenda Forecasting with Artificial Neural Networks 1. Forecasting? 1. Forecasting as predictive Regression 2. Time series prediction vs. causal prediction 3. SARIMA-Modelling 1. SARIMA – Differencing 2. SARIMA – Autoregressive Terms 3. SARIMA – Moving Average Terms 4. SARIMA – Seasonal Terms 4. Why NN for Forecasting? 2. Neural Networks? 3. Forecasting with Neural Networks … 4. How to write a good Neural Network forecasting paper! EVIC’05 © Sven F. Crone - www.bis-lab.com

ARIMA Modelling – Moving Average Processes

• Description of the moving average structure
- AR models may not approximate the data generator underlying the observations perfectly → residuals e_t, e_{t−1}, e_{t−2}, …, e_{t−q}
- Observation Y_t may depend on the realisation of previous errors e
- Regress against past errors as explanatory variables:

Y_t = c + e_t − θ_1 e_{t−1} − θ_2 e_{t−2} − … − θ_q e_{t−q}

- ARIMA(0,0,q) model = MA(q) model
- Conditions, for q = 1: −1 < θ_1 < 1; for q = 2: −1 < θ_2 < 1 ∧ θ_2 + θ_1 < 1 ∧ θ_2 − θ_1 < 1

ARIMA Modelling – MA-Model patterns

[Figure: ACF shows the 1st lag significant; PACF shows a decaying pattern]
• MA(1) model: Y_t = c + e_t − θ_1 e_{t−1} = ARIMA(0,0,1)

ARIMA Modelling – MA-Model patterns

[Figure: a second MA(1) example with the same signature: 1st lag of the ACF significant, pattern in the PACF]
• MA(1) model: Y_t = c + e_t − θ_1 e_{t−1} = ARIMA(0,0,1)

ARIMA Modelling – MA-Model patterns

[Figure: ACF shows the 1st, 2nd & 3rd lags significant; PACF shows a decaying pattern]
• MA(3) model: Y_t = c + e_t − θ_1 e_{t−1} − θ_2 e_{t−2} − θ_3 e_{t−3} = ARIMA(0,0,3)

ARIMA Modelling – MA-Models

• Moving average model of order one, ARIMA(0,0,1) = MA(1):

Y_t = c + e_t − θ_1 e_{t−1}, e.g. Y_t = 10 + e_t + 0.2 e_{t−1}

[Figure: simulated MA(1) path with theta = .2]

ARIMA Modelling – Mixture ARMA-Models

• Complicated series may be modelled by combining AR & MA terms
- ARMA(1,1) model = ARIMA(1,0,1) model:

Y_t = c + φ_1 Y_{t−1} + e_t − θ_1 e_{t−1}

- Higher-order ARMA(p,q) models:

Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + … + φ_p Y_{t−p} + e_t − θ_1 e_{t−1} − θ_2 e_{t−2} − … − θ_q e_{t−q}
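The ARMA(1,1) equation above as an analogous simulation sketch (the parameter values are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
c, phi, theta = 1.0, 0.7, 0.4
e = rng.standard_normal(200)       # white-noise errors e_t
y = np.zeros(200)
for t in range(1, 200):
    y[t] = c + phi * y[t - 1] + e[t] - theta * e[t - 1]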

[Figure: ACF shows the 1st lag significant, then decay; PACF shows the 1st lag significant, then decay]
• AR(1) and MA(1) model: ARMA(1,1) = ARIMA(1,0,1)

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
      1. SARIMA – Differencing
      2. SARIMA – Autoregressive Terms
      3. SARIMA – Moving Average Terms
      4. SARIMA – Seasonal Terms
      5. SARIMAX – Seasonal ARIMA with Interventions
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Seasonality in ARIMA-Models

• Identifying seasonal data: spikes in the ACF / PACF at the seasonal lags, e.g.
- t−12 & t−13 for yearly seasonality in monthly data
- t−4 & t−5 for quarterly data
[Figure: ACF with seasonal spikes and upper/lower confidence limits]
• Differences
- Simple: ΔY_t = (1 − B) Y_t
- Seasonal: Δ_s Y_t = (1 − B^s) Y_t, with s = seasonality, e.g. 4, 12
• Data may require seasonal differencing to remove seasonality
• To identify the model, specify the seasonal parameters (P,D,Q):
- the seasonal autoregressive parameter P
- the seasonal difference D and
- the seasonal moving average Q
→ Seasonal ARIMA(p,d,q)(P,D,Q) model

Seasonality in ARIMA-Models

[Figure: ACF and PACF with seasonal spikes at lags 12, 24, 36 → monthly data; upper and lower confidence limits shown]

Seasonality in ARIMA-Models

• Extension of the backshift-operator notation:

Δ_s Y_t = Y_t − Y_{t−s} = Y_t − B^s Y_t = (1 − B^s) Y_t

• Seasonal difference followed by a first difference: (1 − B)(1 − B^s) Y_t
• Seasonal ARIMA(1,1,1)(1,1,1)_4 model:

(1 − φ_1 B)(1 − Φ_1 B^4)(1 − B)(1 − B^4) Y_t = c + (1 − θ_1 B)(1 − Θ_1 B^4) e_t

with the factors: non-seasonal AR(1), seasonal AR(1), non-seasonal difference, seasonal difference, non-seasonal MA(1), seasonal MA(1)
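The simple and seasonal differences above as a numpy sketch (s = 12 for monthly data; the synthetic input series is illustrative):

import numpy as np

def difference(y, lag=1):
    """(1 - B^lag) y_t = y_t - y_{t-lag}"""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

t = np.arange(48, dtype=float)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)   # trend + monthly seasonality

# seasonal difference followed by a first difference: (1 - B)(1 - B^12) y_t
z = difference(difference(y, lag=12), lag=1)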

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
      1. SARIMA – Differencing
      2. SARIMA – Autoregressive Terms
      3. SARIMA – Moving Average Terms
      4. SARIMA – Seasonal Terms
      5. SARIMAX – Seasonal ARIMA with Interventions
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Forecasting Models

• Time series analysis vs. causal modelling

Time series prediction (univariate)
- Assumes that the data generating process which creates the patterns can be explained solely from previous observations of the dependent variable

Causal prediction (multivariate)
- Assumes that the data generating process can be explained by the interaction of causal (cause-and-effect) independent variables

Causal Prediction

• ARX(p) models
• General dynamic regression models

Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
   1. Forecasting as predictive Regression
   2. Time series prediction vs. causal prediction
   3. SARIMA-Modelling
   4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Why forecast with NN?

• Pattern or noise?
→ Airline Passenger data: seasonal, trended, heteroscedastic noise; the real "model" is disputed: multiplicative or additive seasonality?
→ Fresh products supermarket sales: seasonal, events, level shifts; the real "model" is unknown

Why forecast with NN?

ƒ Pattern or noise?

Æ Random Noise iid Æ BL(p,q) Bilinear (normally distributed: Autoregressive Model mean 0; std.dev. 1) yytttt= 0.7 −−12ε + ε EVIC’05 © Sven F. Crone - www.bis-lab.com

Why forecast with NN?

ƒ Pattern or noise?

Æ TAR(p) Threshold Æ Random walk Autoregressive model yytt= −1 + ε t yyttt= 0.9−−11+≤ε for y t1

= −−0.3yytt−−11ε for t >1 EVIC’05 © Sven F. Crone - www.bis-lab.com

Motivation for NN in Forecasting – Nonlinearity!

→ The true data generating process is unknown & hard to identify
→ Many interdependencies in business are nonlinear

• NN can approximate any LINEAR and NONLINEAR function to any desired degree of accuracy
- Can learn linear time series patterns
- Can learn nonlinear time series patterns
→ Can extrapolate linear & nonlinear patterns = generalisation!
• NN are nonparametric
- They do not assume a particular noise process, e.g. Gaussian
• NN model (learn) linear and nonlinear processes directly from data
- They approximate the underlying data generating process
→ NN are a flexible forecasting paradigm

Motivation for NN in Forecasting - Modelling Flexibility

→ Unknown data processes require the building of many candidate models!

• Flexibility on input variables → flexible coding
- binary scale [0;1]; [-1;1]
- nominal / ordinal scale (0, 1, 2, …, 10) → binary coded [0001, 0010, …]
- metric scale (0.235; 7.35; 12440.0; …)
• Flexibility on output variables
- binary → prediction of single class membership
- nominal / ordinal → prediction of multiple class memberships
- metric → regression (point predictions) OR probability of class membership!
• Number of input variables: …
• Number of output variables: …
→ One SINGLE network architecture → MANY applications

Applications of Neural Nets in diverse Research Fields
→ 2500+ journal publications on NN & forecasting alone!
[Figure: citation analysis by year, 1987-2003, for title = (neural AND net* AND (forecast* OR predict* OR time-ser* OR time w/2 ser* OR timeser*)), evaluated for sales-forecasting-related point predictions; linear trend with R² = 0.9036]
• Neurophysiology → simulate & explain the brain
• Informatics → email & URL filtering; virus scanning (Symantec Norton Antivirus); speech recognition & optical character recognition
• Engineering → control applications in plants; automatic target recognition (DARPA); explosive detection at airports; mineral identification (NASA Mars Explorer); starting & landing of jumbo jets (NASA)
• Meteorology / weather → rainfall prediction; El Niño effects
• Corporate business → credit card fraud detection; simulation of forecasting methods
• Business forecasting domains [figure: number of publications by domain]: general business; marketing; finance (currency / exchange rate, stock forecasting etc.); production; product sales; electrical load / demand
→ not all NN recommendations are useful for your DOMAIN!

IBF Benchmark – Forecasting Methods used

Applied forecasting methods (all industries):
• Time series methods (objective) → 61%: Averages 25%; Autoregressive methods 7%; Decomposition 8%; Exponential smoothing 24%; Trend extrapolation 35%
• Causal methods (objective) → 23%: Econometric models 23%; Neural networks 9%; Regression 69%
• Judgemental methods (subjective) → 2x%: Analogies 23%; Delphi 22%; PERT 6%; Surveys 49%
→ Survey at 5 IBF conferences in 2001: 240 forecasters, 13 industries
→ NN are applied in corporate Demand Planning / S&OP processes! [Warning: limited sample size]

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview …
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

What are Artificial Neural Networks?

• Artificial Neural Networks (NN)
- "a machine that is designed to model the way in which the brain performs a particular task …; the network is … implemented … or .. simulated in software on a digital computer." [Haykin98]
- a class of statistical methods for information processing consisting of a large number of simple processing units (neurons), which exchange information about their activation via directed connections. [Zell97]

Input (time series observations, causal variables, dependent variables, image data (pixels/bits), fingerprints, chemical readouts, …) → Black Box processing → Output (time series predictions, class memberships, class probabilities, principal components, …)

What are Neural Networks in Forecasting?

• Artificial Neural Networks (NN) → a flexible forecasting paradigm
- A class of statistical methods for time series and causal forecasting
- Highly flexible processing → arbitrary input-to-output relationships
- Properties → nonlinear, nonparametric (assumed), error robust (not outlier robust!)
- Data-driven modelling → "learning" directly from the data

Input (nominal, interval or ratio scale (not ordinal); time series observations; lagged variables; dummy variables; causal / explanatory variables; …) → Black Box processing → Output (ratio scale; regression; single-period-ahead or multiple-period-ahead forecasts; …)

DEMO: Preview of Neural Network Forecasting

• Simulation of NN for Business Forecasting
• Airline Passenger Data experiment
- 3-layered NN (12-8-1): 12 input units, 8 hidden units, 1 output unit
- 12 input lags t, t−1, …, t−11 (past 12 observations) → time series prediction
- t+1 forecast → single-step-ahead forecast
→ Benchmark time series [Brown / Box & Jenkins]: 132 observations, 13 periods of monthly data

Demonstration: Preview of Neural Network Forecasting

• NeuraLab Predict! → "look inside neural forecasting"
- Errors on the training / validation / test dataset
- Time series versus neural network forecast → updated after each learning step
- PQ-diagram; in-sample observations & out-of-sample forecasts; training ↔ validation = test
- Time series actual values vs. NN forecasted values; absolute forecasting errors

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview …
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Motivation for using NN … BIOLOGY!

• Human & other nervous systems (animals, insects → e.g. bats)
- Capable of various complex functions: perception, motor control, pattern recognition, classification, prediction etc.
- Speed: e.g. detecting & recognizing a changed face in a crowd = 100-200 ms
- Efficiency etc. → brains are the most efficient & complex computers known to date

Human brain vs. computer (PC):
- Processing speed: 10^-3 ms (0.25 MHz) vs. 10^-9 ms (2500 MHz PC)
- Neurons / transistors: 10 billion neurons & 10^3 billion connections vs. 50 million transistors (PC chip)
- Weight: 1500 grams vs. kilograms to tons!
- Energy consumption: 10^-16 Joule vs. 10^-6 Joule
- Computation (vision): 100 steps vs. billions of steps
→ Comparison: human = 10,000,000,000 neurons → ant = 20,000 neurons

Brief History of Neural Networks

• History
- Developed in interdisciplinary research (McCulloch/Pitts 1943)
- Motivation from the functions of natural neural networks → neurobiological motivation
- Application-oriented motivation [Smith & Gupta, 2000]

Timeline: 1936 Turing; 1943 McCulloch/Pitts; 1949 Hebb; 1954 Minsky builds 1st neurocomputer; 1954 GE 1st computer payroll system; 1956 Dartmouth project; 1959 Rosenblatt; 1969 Minsky/Papert; 1971 INTEL 1st microprocessor; 1972 Kohonen; 1974 Werbos; 1981 IBM introduces the PC; 1986 Rumelhart/Hinton/Williams; 1987 1st IJCNN; 1987 NeuralWare founded; 1988 1st journals; 1988 White 1st paper on forecasting; 1997 SAS Enterprise Miner; 1998 IBM $70bn BI initiative

Neuroscience → applications in engineering (pattern recognition & control) → applications in business (forecasting …)
- Research field of soft computing & artificial intelligence
- Neuroscience, mathematics, physics, statistics, information science, engineering, business management
- Different VOCABULARY: statistics versus neurophysiology!!!

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview …
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Motivation & Implementation of Neural Networks

• From biological neural networks … to artificial neural networks:

net_i = Σ_j w_ij o_j − θ_i,  a_i = f(net_i),  o_i = a_i

o_i = tanh( Σ_j w_ji o_j − θ_i )

Mathematics as abstract representations of reality → used in software simulators, hardware, engineering etc., e.g. a simulator exporting a trained network:

neural_net = eval(net_name);
[num_rows, ins] = size(neural_net.iw{1});
[outs, num_cols] = size(neural_net.lw{neural_net.numLayers, neural_net.numLayers-1});
if (strcmp(neural_net.adaptFcn, ''))
    net_type = 'RBF';
else
    net_type = 'MLP';
end
fid = fopen(path, 'w');

Information Processing in biological Neurons

• Modelling of biological functions in neurons
- 10-100 billion neurons with ~10,000 connections each in the brain
- Input (sensory), processing (internal) & output (motoric) neurons
- CONCEPT of information processing in neurons …

Alternative notations – information processing in neurons / nodes
• Biological representation
• Graphical notation: inputs in_j with weights w_ij → input function net_i = Σ_j w_ij in_j → activation function a_i = f(net_i − θ_i) → output of the neuron / node u_i
(regression notation: β_i => weights w_ij; β_0 => bias θ)
• Mathematical representation:

y_i = 1 if Σ_j w_ji x_j − θ_i ≥ 0, else 0;  alternative: y_i = tanh( Σ_j w_ji x_j − θ_i )

Information Processing in artificial Nodes

CONCEPT of information processing in nodes:
- Input function (summation of the previous signals): net_i = Σ_j w_ij in_j − θ_i
- Activation function (nonlinear): binary step function {0;1}; sigmoid functions: logistic, hyperbolic tangent etc.: a_i = f(net_i)
- Output function (linear / identity, SoftMax, …): o_i = a_i
→ Unidirectional information processing in the neuron / node u_i, e.g. binary:

out_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0, else 0

Input Functions

Binary Activation Functions

• Binary activation calculated from the input:

a_j = f_act(net_j, θ_j), e.g. a_j = f_act(net_j − θ_j)

Information Processing: Node Threshold logic

Node function → BINARY THRESHOLD LOGIC
1. weight each individual input by its connection strength
2. sum the weighted inputs
3. add the bias term
4. calculate the output of the node through the BINARY transfer function
→ rerun with the next input

Example: inputs o1 = 2.2, o2 = 4, o3 = 1; weights w1 = 0.71, w2 = −1.84, w3 = 9.01; bias θ = 8.0:
2.2·0.71 + 4.0·(−1.84) + 1.0·9.01 = 3.212;  3.212 − 8.0 = −4.788 < 0 → output 0.00

net_i = Σ_j w_ji o_j − θ_i,  a_i = f(net_i)

Continuous Activation Functions

• Activation calculated from the input:

a_j = f_act(net_j, θ_j), e.g. a_j = f_act(net_j − θ_j)

- Hyperbolic tangent
- Logistic function

Information Processing: Node Threshold logic

Node function → SIGMOID THRESHOLD LOGIC with tanh activation function
1. weight each individual input by its connection strength
2. sum the weighted inputs
3. add the bias term
4. calculate the output of the node through the sigmoid (tanh) transfer function
→ rerun with the next input

Same example as above: 2.2·0.71 + 4.0·(−1.84) + 1.0·9.01 = 3.212;  3.212 − 8.0 = −4.788;  tanh(−4.788) ≈ −0.9998

o_i = tanh( Σ_j w_ji o_j − θ_i )
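The worked example above in a few lines of Python (inputs, weights and bias exactly as on the slide):

import numpy as np

o = np.array([2.2, 4.0, 1.0])           # inputs o1, o2, o3
w = np.array([0.71, -1.84, 9.01])       # connection weights
theta = 8.0                             # bias / threshold

net = w @ o - theta                     # = 3.212 - 8.0 = -4.788
out_binary = 1.0 if net >= 0 else 0.0   # binary threshold logic -> 0.0
out_tanh = np.tanh(net)                 # tanh threshold logic -> approx. -0.9998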

A new Notation … GRAPHICS!

• Single linear regression … as an equation:

y = β_0 + β_1 x_1 + β_2 x_2 + … + β_n x_n + ε

• Single linear regression … as a directed graph:
[Diagram: a constant input 1 with weight β_0 and inputs x_1 … x_n with weights β_1 … β_n feed a summation node Σ β_n x_n → y]
→ Unidirectional information processing

Why Graphical Notation?

• Simple neural network equation without recurrent feedbacks (schematically, for two hidden layers):

y_k = tanh( Σ_i w_ki · tanh( Σ_j w_ij · tanh( Σ_l w_jl x_l − θ_j ) − θ_i ) − θ_k ) ⇒ Min!

with tanh( Σ_i x_i w_ij − θ_j ) = ( e^(Σ_i x_i w_ij − θ_j) − e^(−(Σ_i x_i w_ij − θ_j)) ) / ( e^(Σ_i x_i w_ij − θ_j) + e^(−(Σ_i x_i w_ij − θ_j)) )
(regression notation: β_i => w_i; β_0 => θ)

• Also:
[Diagram: lagged inputs y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1} through hidden nodes u_{n+1} … u_{n+5} to the forecast y_{t+1}]
→ Simplification for complex models!

Combination of Nodes

• "Simple" processing per node: each node computes tanh( Σ_{i=1}^{N} o_i w_ij − θ_j )
• The combination of simple nodes creates complex behaviour: the tanh outputs of earlier nodes o_1, o_2, o_3, … are weighted (w_{1,i}, w_{2,i}, …) and fed into further tanh nodes, nesting the expression ( e^(Σ o_i w_ij − θ_j) − e^(−(Σ o_i w_ij − θ_j)) ) / ( e^(Σ o_i w_ij − θ_j) + e^(−(Σ o_i w_ij − θ_j)) ) layer by layer

Architecture of Multilayer Perceptrons

Architecture of a combination of neurons → the classic form of a feed-forward neural network!
- Neurons u_n (units / nodes) ordered in layers
- Unidirectional connections with trainable weights w_{n,n}
- Vector of input signals x_i (input), vector of output signals o_j (output)
[Diagram: input layer u_1 … u_4 → hidden layers u_5 … u_11 → output layer u_12 … u_14 = neural network outputs o_1, o_2, o_3; weights w_{1,5}, w_{5,8}, w_{8,12}, …]

o_k = tanh( Σ_i w_ki · tanh( Σ_j w_ij · tanh( Σ_l w_jl o_l − θ_j ) − θ_i ) − θ_k ) ⇒ Min!
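A compact sketch of the feed-forward pass through such a layered network (random illustrative weights, numpy assumed; the layer sizes mirror the 4-input, 3-output diagram):

import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, theta):
    """One layer of nodes: o = tanh(W x - theta)."""
    return np.tanh(W @ x - theta)

x = rng.standard_normal(4)                                    # input vector x_i
W1, t1 = rng.standard_normal((3, 4)), rng.standard_normal(3)  # hidden layer 1
W2, t2 = rng.standard_normal((4, 3)), rng.standard_normal(4)  # hidden layer 2
W3, t3 = rng.standard_normal((3, 4)), rng.standard_normal(3)  # output layer

o = layer(layer(layer(x, W1, t1), W2, t2), W3, t3)            # nested tanh, as in the equation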

Dictionary for Neural Network Terminology

• Due to its neuro-biological origins, NN use specific terminology:

Neural Networks          Statistics
Input nodes              Independent / lagged variables
Output node(s)           Dependent variable(s)
Training                 Parameterization
Weights                  Parameters
…                        …

→ don't be confused: ASK!

Agenda

Forecasting with Artificial Neural Networks
1. Forecasting?
2. Neural Networks?
   1. What are NN? Definition & Online Preview …
   2. Motivation & brief history of Neural Networks
   3. From biological to artificial Neural Network Structures
   4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!

Hebbian Learning

• HEBB introduced the idea of learning by adapting the weights [0,1]:

Δw_ij = η o_i a_j

• Delta learning rule of Widrow-Hoff:

Δw_ij = η o_i (t_j − a_j) = η o_i (t_j − o_j) = η o_i δ_j

Neural Network Training with Back-Propagation

Training → LEARNING FROM EXAMPLES
1. Initialize the connections with randomized weights (symmetry breaking)
2. Show the first input pattern (independent variables)
3. Forward-propagate the input values to the output layer
4. Calculate the error between the NN output & the actual value (using the error / objective function)
5. Backward-propagate the errors for each weight down to the input layer
→ rerun with the next input pattern …
[Diagram: input vector x → weight matrix W over nodes 1-10 → output vector o; error E = o − t against the teaching output t]

Neural Network Training

• Simple back-propagation algorithm [Rumelhart et al. 1986]

E_p = C(t_pj, o_pj),  o_pj = f_j(net_pj),  Δ_p w_ji ∝ − ∂C(t_pj, o_pj)/∂w_ji

∂C/∂w_ji = (∂C/∂net_pj) · (∂net_pj/∂w_ji)
δ_pj = − ∂C/∂net_pj = − (∂C/∂o_pj) · (∂o_pj/∂net_pj),  with ∂o_pj/∂net_pj = f'_j(net_pj)

Weight update: Δw_ij = η o_i δ_j, with

δ_pj = f'_j(net_pj) · ∂C(t_pj, o_pj)/∂o_pj   if unit j is in the output layer
δ_pj = f'_j(net_pj) · Σ_k δ_pk w_kj          if unit j is in a hidden layer

For the logistic function f'(net_j) = o_j (1 − o_j), giving
δ_j = o_j (1 − o_j)(t_j − o_j) for output nodes and δ_j = o_j (1 − o_j) Σ_k δ_k w_kj for hidden nodes.
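A minimal one-hidden-layer back-propagation step implementing the delta rules above (logistic activation, squared-error cost; a sketch for illustration, not the tutorial's own implementation; biases are omitted for brevity):

import numpy as np

rng = np.random.default_rng(0)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic activation

W1 = 0.5 * rng.standard_normal((3, 2))     # hidden weights (3 hidden units, 2 inputs)
W2 = 0.5 * rng.standard_normal((1, 3))     # output weights
eta = 0.1                                  # learning rate

x = np.array([0.2, 0.7]); t = np.array([0.5])   # input pattern & teaching output

h = sig(W1 @ x)                            # forward pass: hidden activations
o = sig(W2 @ h)                            # forward pass: output

delta_o = o * (1 - o) * (t - o)            # output delta: o(1-o)(t-o)
delta_h = h * (1 - h) * (W2.T @ delta_o)   # hidden delta: o(1-o) * sum_k delta_k w_kj

W2 += eta * np.outer(delta_o, h)           # delta rule: dw_ij = eta * o_i * delta_j
W1 += eta * np.outer(delta_h, x)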

Neural Network Training = Error Minimization

• Minimize the error by changing ONE weight w_j
[Figure: error curve E(w_j) over w_j, with random starting points 1 & 2, several local minima and one GLOBAL minimum]

Error Backpropagation = 3D+ Gradient Descent

• Local search on a multi-dimensional error surface
- The task of finding the deepest valley in the mountains: local search, fixed step size, follow the steepest descent
→ local optimum = any valley
→ global optimum = deepest valley with the lowest error
→ varies with the error surface
[Figure: 3D error surfaces with gradient descent paths]

Demo: Neural Network Forecasting revisited!

• Simulation of NN for Business Forecasting
• Airline Passenger Data experiment
- 3-layered NN (12-8-1): 12 input units, 8 hidden units, 1 output unit
- 12 input lags t, t−1, …, t−11 (past 12 observations) → time series prediction
- t+1 forecast → single-step-ahead forecast
→ Benchmark time series [Brown / Box & Jenkins]: 132 observations, 13 periods of monthly data

Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
   1. NN models for Time Series & Dynamic Causal Prediction
   2. NN experiments
   3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!

Time Series Prediction with Artificial Neural Networks

• ANN are universal approximators [Hornik/Stinchcombe/White92 etc.]
- Forecasting as an application of (nonlinear) function approximation
- Various architectures for prediction (time series, causal, combined, …)

ŷ_{t+h} = f(x_t) + ε_{t+h}

with ŷ_{t+h} = forecast for t+h; f(·) = linear / nonlinear function; x_t = vector of observations in t; ε_{t+h} = independent error term in t+h

- Single neuron / node ≈ nonlinear AR(p)
- Feed-forward NN (MLP etc.) ≈ hierarchy of nonlinear AR(p)
- Recurrent NN (Elman, Jordan) ≈ nonlinear ARMA(p,q)
- …

ŷ_{t+1} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})   (nonlinear autoregressive AR(p) model)

Neural Network Training on Time Series

• Sliding window approach of presenting data:
1. Present a new data pattern (input window y_t, y_{t−1}, …, y_{t−n−1}) to the neural network
2. Calculate the neural network output from the input values (forward pass)
3. Compare the neural network forecast against the actual value
4. Backpropagation: change the weights to reduce the output forecast error
5. Slide the window forward to show the next pattern
A code sketch of the window construction follows below.
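A sketch of this sliding-window construction (input window of n lags, single-step-ahead target; numpy assumed):

import numpy as np

def sliding_windows(y, n_lags):
    """Build (input, target) patterns: [y_t, ..., y_{t+n-1}] -> y_{t+n}."""
    y = np.asarray(y, dtype=float)
    X = np.array([y[i:i + n_lags] for i in range(len(y) - n_lags)])
    t = y[n_lags:]                         # the next value after each window
    return X, t

y = np.arange(20.0)                        # any time series
X, t = sliding_windows(y, n_lags=12)       # 12 input lags, as in the demos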

Neural Network Architectures for Linear Autoregression

→ Interpretation
- The weights represent the autoregressive terms
- Same problems / shortcomings as standard AR models!
→ Extensions
- Multiple output nodes = simultaneous autoregression models
- Nonlinearity through a different activation function in the output node

ŷ_{t+1} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})
ŷ_{t+1} = y_t w_{1,j} + y_{t−1} w_{2,j} + y_{t−2} w_{3,j} + … + y_{t−n} w_{n,j} − θ_j   (linear autoregressive AR(p) model)

Neural Network Architecture for Nonlinear Autoregression

→ Extensions
- Additional layers with nonlinear nodes
- Linear activation function in the output layer

ŷ_{t+1} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1}),  e.g.  ŷ_{t+1} = tanh( Σ_{i=t−n−1}^{t} y_i w_ij − θ_j )   (nonlinear autoregressive AR(p) model)

Neural Network Architectures for Nonlinear Autoregression

→ Interpretation
- Autoregressive AR(p) modelling approach WITHOUT the moving-average terms of the errors ≠ nonlinear ARIMA
- Similar problems / shortcomings as standard AR models!
→ Extensions
- Multiple output nodes = simultaneous autoregression models

ŷ_{t+1} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})
ŷ_{t+1} = tanh( Σ_j w_jk · tanh( Σ_i w_ij y_{t−i} − θ_j ) − θ_k )   (nonlinear autoregressive AR(p) model)

Neural Network Architectures for Multiple Step Ahead Nonlinear Autoregression

→ Interpretation
- As for single-step autoregressive AR(p) modelling, but with multiple output nodes forecasting several horizons simultaneously
[Diagram: lagged inputs through hidden nodes to multiple output nodes ŷ_{t+1} … ŷ_{t+n}]

ŷ_{t+1}, ŷ_{t+2}, …, ŷ_{t+n} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})   (nonlinear autoregressive AR(p) model)

Neural Network Architectures for Forecasting - Nonlinear Autoregression Intervention Model

→ Interpretation
- As single autoregressive AR(p) modelling
- Additional event term to explain external events
→ Extensions
- Multiple output nodes = simultaneous multiple regression

ŷ_{t+1}, ŷ_{t+2}, …, ŷ_{t+n} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})   (nonlinear autoregressive ARX(p) model)

Neural Network Architecture for Linear Regression

[Diagram: causal inputs Max Temperature, Rainfall, Sunshine Hours → Demand forecast]

ŷ = f(x_1, x_2, x_3, …, x_n)
ŷ = x_1 w_{1j} + x_2 w_{2j} + x_3 w_{3j} + … + x_n w_{nj} − θ_j   (linear regression model)

Neural Network Architectures for Non-Linear Regression (≈ Logistic Regression)

[Diagram: causal inputs Max Temperature, Rainfall, Sunshine Hours → Demand forecast, with a nonlinear (logistic) node]

ŷ = f(x_1, x_2, x_3, …, x_n),  e.g.  ŷ = logistic( Σ_i x_i w_ij − θ_j )   (nonlinear multiple (logistic) regression model)

Neural Network Architectures for Non-linear Regression

→ Interpretation
- Similar to linear multiple regression modelling
- Without nonlinearity in the output: a weighted expert regime on nonlinear regression
- With nonlinearity in the output layer: ???

ŷ = f(x_1, x_2, x_3, …, x_n)
ŷ = x_1 w_{1j} + x_2 w_{2j} + x_3 w_{3j} + … + x_n w_{nj} − θ_j   (nonlinear regression model)

Classification of Forecasting Methods

Forecasting Methods
• Objective Forecasting Methods
  - Time Series Methods: Averages, Moving Averages, Naive Methods, Exponential Smoothing (Simple ES, Linear ES, Seasonal ES, Dampened Trend ES), Simple Regression, Autoregression, ARIMA, Neural Networks
  - Causal Methods: Linear Regression, Multiple Regression, Dynamic Regression, Vector Autoregression, Intervention models, Neural Networks
• Subjective "Prophecy" Forecasting Methods (educated guessing…): Sales Force Composite, Analogies, Delphi, PERT, Survey techniques
→ Neural Networks CAN be used as time series methods & causal methods
→ Demand Planning Practice: Averages & ES, Regression … objective methods + subjective correction

Different model classes of Neural Networks

• Since the 1960s a variety of NN have been developed for different tasks
→ Classification ≠ Optimization ≠ Forecasting → application-specific models
• Different CLASSES of neural networks exist for forecasting alone!
→ Focus here only on the original Multilayer Perceptrons!

Problem!

• MLP is the most common NN architecture used
• MLPs with a sliding window can ONLY capture nonlinear seasonal autoregressive processes nSAR(p,P)
• BUT:
- They can model MA(q) processes through an extended AR(p) window!
- They can model SARMAX processes through recurrent NN

Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
   1. NN models for Time Series & Dynamic Causal Prediction
   2. NN experiments
   3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!

Time Series Prediction with Artificial Neural Networks

• Which time series patterns can ANNs learn & extrapolate? [Pegels69/Gardner85]
• …??? → Simulation of neural network prediction of artificial time series

Time Series Demonstration – Artificial Time Series

• Simulation of NN in Business Forecasting with NeuroPredictor
• Experiment: prediction of artificial time series (Gaussian noise)
- Stationary time series
- Seasonal time series
- Linear trend time series
- Trend with additive seasonality time series

Time Series Prediction with Artificial Neural Networks

ƒ Which time series patterns can ANNs learn & extrapolate? [Pegels69/Gardner85]

[Table: all Pegels/Gardner time series pattern classes check-marked as learnable]

Æ Neural Networks can forecast ALL major time series patterns

Æ NO time series dependent preprocessing / integration necessary

Æ NO time series dependent MODEL SELECTION required!!!

Æ SINGLE MODEL APPROACH FEASIBLE! EVIC’05 © Sven F. Crone - www.bis-lab.com

Time Series Demonstration A - Lynx Trappings

ƒ Simulation of NN in Business Forecasting

ƒ Experiment: Lynx Trappings at the McKenzie River ƒ 3-layered NN (12-8-1): 12 input units - 8 hidden units - 1 output unit ƒ Lag structure: t, t-1, …, t-11 (past 12 observations) ƒ t+1 forecast Æ single-step-ahead forecast

Æ Benchmark Time Series [Andrews / Hertzberg] ƒ 114 observations ƒ Periodicity? 8 years? EVIC’05 © Sven F. Crone - www.bis-lab.com

Time Series Demonstration B – Event Model

ƒ Simulation of NN in Business Forecasting

ƒ Experiment: Mouthwash Sales ƒ 3 layered NN: (12-8-1) 12 Input units - 8 hidden units – 1 output unit ƒ 12 input lags t, t-1, …, t-11 (past 12 observations) Æ time series prediction ƒ t+1 forecast Æ single step ahead forecast

Æ Spurious Autocorrelations from Marketing Events ƒ Advertisement with small lift ƒ Price-reductions with high lift EVIC’05 © Sven F. Crone - www.bis-lab.com

Time Series Demonstration C – Supermarket Sales

ƒ Simulation of NN in Business Forecasting

ƒ Experiment: Supermarket sales of fresh products with weather data ƒ 4-layered NN (7-4-4-1): 7 input units - two hidden layers of 4 units each - 1 output unit ƒ Lag structure: t, t-1, …, t-6 (past 7 observations) ƒ t+4 forecast Æ single fixed-horizon forecast EVIC’05 © Sven F. Crone - www.bis-lab.com

Agenda

Forecasting with Artificial Neural Networks 1. Forecasting? 2. Neural Networks? 3. Forecasting with Neural Networks … 1. NN models for Time Series & Dynamic Causal Prediction 2. NN experiments 3. Process of NN modelling 1. Preprocessing 2. Modelling NN Architecture 3. Training 4. Evaluation 4. How to write a good Neural Network forecasting paper! EVIC’05 © Sven F. Crone - www.bis-lab.com

Decisions in Neural Network Modelling

ƒ Data Pre-processing
„ Transformation, Scaling, Normalizing to [0;1] or [-1;1]
ƒ Modelling of NN architecture
„ Number of INPUT nodes
„ Number of HIDDEN nodes
„ Number of HIDDEN LAYERS
„ Number of OUTPUT nodes
„ Information processing in nodes (activation functions)
„ Interconnection of nodes
ƒ Training
„ Initializing of weights (how often?)
„ Training method (backprop, higher order …)
„ Training parameters
„ Evaluation of the best model (early stopping)
ƒ Application of the Neural Network model
ƒ Evaluation
„ Evaluation criteria & selected dataset

Æ manual decisions throughout the NN modelling process require expert knowledge EVIC’05 © Sven F. Crone - www.bis-lab.com

Modeling Degrees of Freedom

ƒ A variety of parameters must be pre-determined for ANN forecasting:
D = [DSE, DSA] Dataset: Selection, Sampling
P = [C, N, S] Preprocessing: Correction, Normalization, Scaling
A = [NI, NS, NL, NO, K, T] Architecture: no. of input nodes, no. of hidden nodes, no. of hidden layers, no. of output nodes, connectivity / weight matrix, activation strategy
U = [FI, FA, FO] Signal processing: Input function, Activation function, Output function
L = [G, P(T,L), IP, IN, B] Learning: choice of learning algorithm, learning parameters per phase & layer, initialization procedure, number of initializations, stopping method & parameters
O Objective function

Æ interactions & interdependencies between parameter choices! EVIC’05 © Sven F. Crone - www.bis-lab.com

Heuristics to Reduce Design Complexity

ƒ Number of Hidden nodes in MLPs (in no. of input nodes n)

2n+1 [Lippmann87; Hecht-Nielsen90; Zhang/Patuwo/Hu98]

2n [Wong91]; n [Tang/Fishwick93]; n/2 [Kang91]

0.75n [Bailey90]; 1.5n to 3n [Kaastra/Boyd96] … ƒ Activation function and preprocessing

logistic in hidden & output [Tang/Fishwick93; Lattermacher/Fuller95; Sharda/Patil92]

hyperbolic tangent in hidden & output [Zhang/Hutchinson93; DeGroot/Wurtz91]

linear output nodes [Lapedes/Faber87; Weigend89-91; Wong90] ƒ ... with interdependencies!

Æ no research on the relative performance of all alternatives Æ no empirical results to support preference for a single heuristic Æ ADDITIONAL SELECTION PROBLEM of choosing a HEURISTIC Æ INCREASED COMPLEXITY through interactions of heuristics Æ AVOID the selection problem through EXHAUSTIVE ENUMERATION EVIC’05 © Sven F. Crone - www.bis-lab.com
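Given that stance, a hypothetical sketch of exhaustive enumeration over the published hidden-node heuristics (helper name and candidate set are assumptions): train one network per candidate and let validation error decide, rather than trusting any single rule.

```python
def hidden_node_candidates(n_inputs):
    """Candidate hidden-layer sizes implied by the heuristics cited above."""
    return {
        "2n+1":  2 * n_inputs + 1,
        "2n":    2 * n_inputs,
        "n":     n_inputs,
        "n/2":   max(1, n_inputs // 2),
        "0.75n": max(1, round(0.75 * n_inputs)),
        "1.5n":  round(1.5 * n_inputs),
        "3n":    3 * n_inputs,
    }

for rule, size in hidden_node_candidates(12).items():
    print(f"{rule:>6}: {size} hidden nodes")   # train & compare each on validation data
```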

Tip & Tricks in Data Sampling

ƒ Do’s and Don'ts

Random order sampling? Yes!

Sampling with replacement? depends / try!

Data splitting: ESSENTIAL!!!!

„ Training & Validation for identification, parameterisation & selection

„ Testing for ex ante evaluation (ideally multiple ways / origins!)

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com
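A minimal sketch of the essential split (function name and fractions are assumptions): the split is sequential in time so the test block stays strictly ex ante; random order sampling then applies only to the training patterns.

```python
def split_series(y, train_frac=0.6, valid_frac=0.2):
    """Sequential train/validation/test split; never shuffle across the split."""
    n_train = int(len(y) * train_frac)
    n_valid = int(len(y) * valid_frac)
    train = y[:n_train]                      # identification & parameterisation
    valid = y[n_train:n_train + n_valid]     # early stopping & model selection
    test  = y[n_train + n_valid:]            # touched only for final ex ante evaluation
    return train, valid, test

train, valid, test = split_series(list(range(100)))
print(len(train), len(valid), len(test))     # 60 20 20
```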

Data Preprocessing

ƒ Data Transformation

Verification, correction & editing (data entry errors etc.)

Coding of Variables

Scaling of Variables

Selection of independent Variables (PCA)

Outlier removal

Missing Value imputation

ƒ Data Coding

External events Æ binary coding

n and n-1 coding have no significant impact on accuracy; n-coding appears to be more robust (despite issues of multicollinearity)

Æ Modification of Data to enhance accuracy & speed EVIC’05 © Sven F. Crone - www.bis-lab.com

Data Preprocessing – Variable Scaling

ƒ Scaling of variables

[Figure: scatter plots of Income vs. Hair Length on the raw scales and rescaled to [0;1]]

Linear interval scaling: $y = I_{Lower} + (I_{Upper} - I_{Lower}) \frac{x - \min(x)}{\max(x) - \min(x)}$

Interval features, e.g. „turnover“ [28.12; 70; 32; 25.05; 10.17 …]: linear interval scaling to a target interval, e.g. [-1;1]. E.g. $x = 72$, $\max(x) = 119.95$, $\min(x) = 0$, target [-1;1]:
$y = -1 + (1 - (-1)) \cdot \frac{72 - 0}{119.95 - 0} = -1 + \frac{144}{119.95} = 0.2005$ EVIC’05 © Sven F. Crone - www.bis-lab.com
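A one-line implementation of this scaling, reproducing the slide's worked example (the helper name is assumed):

```python
def scale_interval(x, x_min, x_max, lower=-1.0, upper=1.0):
    """Linear interval scaling: y = lower + (upper-lower)*(x-min)/(max-min)."""
    return lower + (upper - lower) * (x - x_min) / (x_max - x_min)

print(scale_interval(72.0, 0.0, 119.95))   # 0.2005..., as in the example above
```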

Data Preprocessing – Variable Scaling

ƒ Scaling of variables

[Figure: scatter plots of Income vs. Hair Length on the raw scales and rescaled to [0;1]]

Standardisation / Normalisation: $y = \frac{x - \eta}{\sigma}$ (η = mean, σ = standard deviation)

ƒ Attention: interaction of the scaling interval with the activation function

Logistic [0;1]

TanH [-1;1] EVIC’05 © Sven F. Crone - www.bis-lab.com

Data Preprocessing – Outliers

ƒ Outliers

extreme values

Coding errors

Data errors

ƒ Outlier impact on scaled variables Æ potential to bias the analysis

„ Impact on linear interval scaling (no normalisation / standardisation)

[Figure: scaling of the values 0 … 10 with outlier 253 onto [-1; +1]]

ƒ Actions

Æ Eliminate outliers (delete records)

Æ replace / impute values as missing values

Æ Binning of variable = rescaling

Æ Normalisation of variables = scaling EVIC’05 © Sven F. Crone - www.bis-lab.com
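One further option, added here as an assumption rather than taken from the slide: winsorising, i.e. clipping values beyond a few standard deviations from the mean before interval scaling, which keeps the record but removes its leverage on min/max.

```python
import numpy as np

def winsorize(x, k=3.0):
    """Clip values beyond k standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return np.clip(x, mu - k * sigma, mu + k * sigma)

print(winsorize([0, 1, 2, 3, 253], k=1.0))   # demo: the outlier 253 is pulled in
```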

Data Preprocessing – Skewed Distributions

ƒ Asymmetry of observations

ƒ …

Æ Transform data ƒ Transformation of data (functional transformation of values) ƒ Linearization or Normalisation Æ Rescale (DOWNSCALE) data to allow better analysis ƒ Binning of data (grouping of values into bins) Æ ordinal scale! EVIC’05 © Sven F. Crone - www.bis-lab.com

Data Preprocessing – Data Encoding

ƒ Downscaling & Coding of variables

metric variables Æ create bins/buckets of ordinal variables (=BINNING)

„ Create buckets of equally spaced intervals

„ Create bins of quantiles with equal frequencies

ordinal variable of n values Æ rescale to n or n-1 nominal binary variables

nominal Variable of n values, e.g. {Business, Sports & Fun, Woman} Æ Rescale to n or n-1 binary variables „ 0 = Business Press „ 1 = Sports & Fun „ 2 = Woman

„ Recode as 1 of N Coding Æ 3 new bit-variables

„ 1 0 0 Æ Business Press

„ 0 1 0 Æ Sports & Fun „ 0 0 1 Æ Woman

„ Recode 1 of N-1 Coding Æ 2 new bit-variables

„ 1 0 Æ Business Press „ 0 1 Æ Sports & Fun

„ 0 0 Æ Woman EVIC’05 © Sven F. Crone - www.bis-lab.com
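Both codings in a short assumed sketch:

```python
def one_of_n(value, categories):
    """1-of-N coding: one binary variable per category."""
    return [1 if value == c else 0 for c in categories]

def one_of_n_minus_1(value, categories):
    """1-of-(N-1) coding: the last category becomes the all-zero pattern."""
    return [1 if value == c else 0 for c in categories[:-1]]

cats = ["Business Press", "Sports & Fun", "Woman"]
print(one_of_n("Sports & Fun", cats))        # [0, 1, 0]
print(one_of_n_minus_1("Woman", cats))       # [0, 0]
```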

Data Preprocessing – Impute Missing Values

ƒ Missing Values
„ missing feature value for an instance
„ some methods interpret “ ” as 0!
„ others create a special class for missing values …

[Figure: scatter plot of Income vs. Hair Length with missing entries]

ƒ Solutions

Missing value of interval scale Æ mean, median, etc.

Missing value of nominal scale Æ most frequent value in the feature set EVIC’05 © Sven F. Crone - www.bis-lab.com
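Both solutions as a brief assumed sketch:

```python
import numpy as np

def impute_interval(x):
    """Interval scale: replace NaNs by the mean (median works as well)."""
    x = np.asarray(x, dtype=float)
    x[np.isnan(x)] = np.nanmean(x)
    return x

def impute_nominal(values, missing=None):
    """Nominal scale: replace missing entries by the most frequent value."""
    observed = [v for v in values if v is not missing]
    mode = max(set(observed), key=observed.count)
    return [mode if v is missing else v for v in values]

print(impute_interval([1.0, np.nan, 3.0]))    # [1. 2. 3.]
print(impute_nominal(["a", None, "a", "b"]))  # ['a', 'a', 'a', 'b']
```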

Tip & Tricks in Data Pre-Processing

ƒ Do’s and Don'ts

De-Seasonalisation? NO! (maybe … you can try!)

De-Trending / Integration? NO / depends / preprocessing!

Normalisation? Not necessarily Æ correct outliers!

Scaling Intervals [0;1] or [-1;1]? Both OK!

Apply headroom in Scaling? YES!

Interaction between scaling & preprocessing? limited

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com

Outlier correction in Neural Network Forecasts?

Outlier correction? YES!

Neural networks are often characterized as

„ Fault tolerant and robust

„ Showing graceful degradation regarding errors Æ Fault tolerance = outlier resistance in time series prediction?

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com

ƒ Number of OUTPUT nodes: given by the problem domain!
ƒ Number of HIDDEN LAYERS: 1 or 2 … depends on the information processing in the nodes; also depends on the nonlinearity & continuity of the time series
ƒ Number of HIDDEN nodes: trial & error … sorry!
ƒ Information processing in nodes (activation functions)
„ Sig-Id
„ Sig-Sig (bounded & additional nonlinear layer)
„ TanH-Id
„ TanH-TanH (bounded & additional nonlinear layer)
ƒ Interconnection of nodes: ??? EVIC’05 © Sven F. Crone - www.bis-lab.com

Tip & Tricks in Architecture Modelling

ƒ Do’s and Don'ts

Number of input nodes? DEPENDS! Æ use the linear ACF/PACF to start (see the sketch after this list)!

Number of hidden nodes? DEPENDS! Æ evaluate each time (few)

Number of output nodes? DEPENDS on application!

fully or sparsely connected networks? ???

shortcut connections? ???

activation functions Æ logistic or hyperbolic tangent? TanH !!!

activation function in the output layer? TanH or Identity!

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com
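For the ACF starting point from the first tip above, a minimal NumPy sketch (assumed code; the PACF needs more machinery, e.g. the Durbin-Levinson recursion or a statistics package): lags whose autocorrelation exceeds the approximate 95% significance bound are candidates for input nodes.

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(y * y)
    return np.array([np.sum(y[k:] * y[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 0.2, 120)
r = acf(y, 24)
bound = 1.96 / np.sqrt(len(y))                # approx. 95% significance bound
print(np.flatnonzero(np.abs(r) > bound) + 1)  # candidate input lags
```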

Agenda

Forecasting with Artificial Neural Networks 1. Forecasting? 2. Neural Networks? 3. Forecasting with Neural Networks … 1. NN models for Time Series & Dynamic Causal Prediction 2. NN experiments 3. Process of NN modelling 1. Preprocessing 2. Modelling NN Architecture 3. Training 4. Evaluation & Selection 4. How to write a good Neural Network forecasting paper! EVIC’05 © Sven F. Crone - www.bis-lab.com

Tip & Tricks in Network Training

ƒ Do’s and Don'ts

Initialisations? A MUST! Minimum 5-10 times!!!

[Figure: error variation by number of initialisations: lowest vs. highest MAE over 1, 5, 10, 25, 50, 100, 200, 400, 800, 1600 initialisations]

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com

Tip & Tricks in Network Training & Selection

ƒ Do’s and Don'ts

Initialisations? A MUST! Minimum 5-10 times!!!

Selection of Training Algorithm? Backprop OK, DBD OK … … not higher order methods!

Parameterisation of Training Algorithm? DEPENDS on dataset!

Use of early stopping? YES - careful with the stopping criteria!

Suitable backpropagation training parameters (to start with) „ Learning rate 0.5 (always <1!) „ Momentum 0.4 „ Decrease the learning rate over training (e.g. by a factor of 0.99 per epoch)

Early stopping on composite error of Training & Validation

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com
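The recipe above in one compact, assumed sketch (a deliberately simplified stand-in for the tutorial's simulator, not its actual code): backpropagation with momentum and learning-rate decay on a single-hidden-layer network, early stopping on validation error, to be repeated over several random initialisations; inputs and targets are assumed scaled to [-1;1].

```python
import numpy as np

def train_mlp(X, t, Xv, tv, hidden=8, epochs=200, lr=0.5, momentum=0.4,
              decay=0.99, patience=20, seed=0):
    """Backprop with momentum & lr decay; early stopping on validation MSE."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1 = rng.normal(0.0, 0.5, (hidden, n)); b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden);      b2 = 0.0
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dw2 = np.zeros_like(w2); db2 = 0.0
    best_err, best_weights, wait = np.inf, None, 0
    for epoch in range(epochs):
        for i in rng.permutation(len(X)):        # random pattern order
            h = np.tanh(W1 @ X[i] + b1)          # TanH hidden layer
            err = (w2 @ h + b2) - t[i]           # identity output node
            gh = err * w2 * (1.0 - h ** 2)       # backpropagated delta
            dw2 = momentum * dw2 - lr * err * h;            w2 += dw2
            db2 = momentum * db2 - lr * err;                b2 += db2
            dW1 = momentum * dW1 - lr * np.outer(gh, X[i]); W1 += dW1
            db1 = momentum * db1 - lr * gh;                 b1 += db1
        lr *= decay                              # decrease learning rate
        hv = np.tanh(Xv @ W1.T + b1)
        val_err = np.mean(((hv @ w2 + b2) - tv) ** 2)
        if val_err < best_err:                   # keep the best model so far
            best_err, wait = val_err, 0
            best_weights = (W1.copy(), b1.copy(), w2.copy(), b2)
        else:
            wait += 1
            if wait >= patience:                 # early stopping
                break
    return best_err, best_weights

# Demo on synthetic data; multiple initialisations (minimum 5-10!) via seeds.
rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, (60, 4)); t = np.tanh(X.sum(axis=1))
runs = [train_mlp(X[:40], t[:40], X[40:], t[40:], seed=s) for s in range(5)]
print(round(min(r[0] for r in runs), 4))         # lowest validation MSE
```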

Agenda

Forecasting with Artificial Neural Networks 1. Forecasting? 2. Neural Networks? 3. Forecasting with Neural Networks … 1. NN models for Time Series & Dynamic Causal Prediction 2. NN experiments 3. Process of NN modelling 1. Preprocessing 2. Modelling NN Architecture 3. Training 4. Evaluation & Selection 4. How to write a good Neural Network forecasting paper! EVIC’05 © Sven F. Crone - www.bis-lab.com

Experimental Results

ƒ Experiments ranked by validation error

Data Set Errors (ranked by validation error)

Rank              Training   Validation   Test       ANN ID
overall lowest    0.009207   0.011455     0.017760
overall highest   0.155513   0.146016     0.398628
1st               0.010850   0.011455     0.043413   39 (3579)
2nd               0.009732   0.012093     0.023367   10 (5873)
…                 …          …            …          …
25th              0.009632   0.013650     0.025886   8 (919)
…                 …          …            …          …
14400th           0.014504   0.146016     0.398628   33 (12226)

Æ significant positive correlations between
„ training & validation set
„ validation & test set
„ training & test set

[Figure: scatter plot of validation vs. test error for the 14400 trained ANNs]

„ inconsistent errors by selection criteria: low validation error Æ high test error, higher validation error Æ lower test error EVIC’05 © Sven F. Crone - www.bis-lab.com

Problem: Validation Error Correlations

ƒ Correlations between dataset errors

Data included    Train-Validate   Validate-Test   Train-Test
14400 ANNs       0.7786**         0.9750**        0.7686**
top 1000 ANNs    0.2652**         0.0917**        0.4204**
top 100 ANNs     0.2067**         0.1276**        0.4004**

Æ validation error is a questionable selection criterion

[Figure: S-diagrams of train, validation and test error correlations for the TOP 1000 ANNs, ordered by training, validation and test error: decreasing correlation and high variance on the test error; the same results hold when ordered by training & test error] EVIC’05 © Sven F. Crone - www.bis-lab.com

ƒ Desirable properties of an Error Measure:

summarizes the cost consequences of the errors

Robust to outliers

Unaffected by units of measurement

Stable if only a few data points are used

Fildes, IJF, 92; Armstrong and Collopy, IJF, 92; Hendry and Clements; Armstrong and Fildes, JoF, 93/94 EVIC’05 © Sven F. Crone - www.bis-lab.com

Model Evaluation through Error Measures

ƒ forecasting k periods ahead we can assess the forecast quality using a holdout sample

ƒ Individual forecast error

ƒ Individual forecast error: $e_t = y_t - F_t$ (Actual - Forecast)

ƒ Mean error (ME): $ME = \frac{1}{n}\sum_{k=1}^{n}\left(Y_{t+k} - F_{t+k}\right)$
„ Add the individual forecast errors
„ As positive errors cancel out negative errors, the ME should be approximately zero for an unbiased series of forecasts

ƒ Mean squared error (MSE): $MSE = \frac{1}{n}\sum_{k=1}^{n}\left(Y_{t+k} - F_{t+k}\right)^2$
„ Square the individual forecast errors
„ Sum the squared errors and divide by n EVIC’05 © Sven F. Crone - www.bis-lab.com

Model Evaluation through Error Measures

Æ avoid cancellation of positive vs. negative errors: absolute errors

ƒ Mean absolute error (MAE): $MAE = \frac{1}{n}\sum_{k=1}^{n}\left|Y_{t+k} - F_{t+k}\right|$
„ Take absolute values of the forecast errors
„ Sum the absolute values and divide by n

ƒ Mean absolute percent error (MAPE): $MAPE = \frac{1}{n}\sum_{k=1}^{n}\left|\frac{Y_{t+k} - F_{t+k}}{Y_{t+k}}\right|$
„ Take absolute values of the percent errors
„ Sum the percent errors and divide by n

Æ This summarises the forecast error over different lead-times
Æ May need to keep k fixed depending on the decision to be made based on the forecast:
$MAE(k) = \frac{1}{n-k+1}\sum_{t=T}^{T+n-k}\left|Y_{t+k} - F_t(k)\right|$, $MAPE(k) = \frac{1}{n-k+1}\sum_{t=T}^{T+n-k}\left|\frac{Y_{t+k} - F_t(k)}{Y_{t+k}}\right|$ EVIC’05 © Sven F. Crone - www.bis-lab.com

Selecting Forecasting Error Measures

ƒ MAPE & MSE are subject to upward bias by a single bad forecast ƒ Alternative measures may be based on the median instead of the mean

ƒ Median Absolute Percentage Error

median = middle value of a set of errors sorted in ascending order

If the sorted data set has an even number of elements, the median is the average of the two middle values
$MdAPE = \mathrm{Med}\left(\left|\frac{e_t}{y_t}\right| \times 100\right)$

ƒ Median Squared Error: $MdSE = \mathrm{Med}\left(e_t^2\right)$ EVIC’05 © Sven F. Crone - www.bis-lab.com

Evaluation of Forecasting Methods

ƒ The base line model in a forecasting competition is the Naïve 1 no-change model Æ use it as a benchmark: $\hat{y}_{t+1} = y_t$

ƒ Theil’s U statistic allows us to determine whether our forecasts outperform this base line; our method increases accuracy (outperforms the naïve) if U < 1:
$U = \sqrt{\dfrac{\sum_t \left(\frac{\hat{y}_{t+1} - y_{t+1}}{y_t}\right)^2}{\sum_t \left(\frac{y_{t+1} - y_t}{y_t}\right)^2}}$ EVIC’05 © Sven F. Crone - www.bis-lab.com
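The error measures of the last slides, plus Theil's U against the Naive 1 benchmark, in one assumed sketch:

```python
import numpy as np

def error_measures(y, f):
    """ME, MSE, MAE, MAPE plus the outlier-robust median variants."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    e = y - f
    ape = 100.0 * np.abs(e / y)
    return {"ME": e.mean(), "MSE": (e ** 2).mean(), "MAE": np.abs(e).mean(),
            "MAPE": ape.mean(), "MdAPE": np.median(ape), "MdSE": np.median(e ** 2)}

def theils_u(y, f):
    """Theil's U vs. the Naive 1 no-change model; U < 1 beats the naive."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    num = np.sum(((f[1:] - y[1:]) / y[:-1]) ** 2)
    den = np.sum(((y[1:] - y[:-1]) / y[:-1]) ** 2)
    return np.sqrt(num / den)

y = np.array([90, 100, 110, 100, 90, 100, 110, 100], float)
f = np.array([90, 90, 90, 90, 90, 90, 90, 90], float)
print(error_measures(y, f)["MAE"], round(theils_u(y, f), 3))
```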

Tip & Tricks in Network Selection

ƒ Do’s and Don'ts

Selection of Model with lowest Validation error? NOT VALID!

Model & forecasting competition? Always multiple origin etc.!

Æ Simulation Experiments EVIC’05 © Sven F. Crone - www.bis-lab.com

Agenda

Forecasting with Artificial Neural Networks

1. Forecasting? 2. Neural Networks? 3. Forecasting with Neural Networks … 1. NN models for Time Series & Dynamic Causal Prediction 2. NN experiments 3. Process of NN modelling 4. How to write a good Neural Network forecasting paper! EVIC’05 © Sven F. Crone - www.bis-lab.com

How to evaluate NN performance

Valid Experiments ƒ Evaluate using ex ante accuracy (HOLD-OUT data)

Use training & validation set for training & model selection

NEVER use the test data, except for the final evaluation of accuracy!!! ƒ Evaluate across multiple time series ƒ Evaluate against benchmark methods (NAÏVE + domain!) ƒ Evaluate using multiple & robust error measures (not MSE!) ƒ Evaluate using multiple out-of-samples (time series origins) Æ Evaluate as an Empirical Forecasting Competition!

Reliable Results ƒ Document all parameter choices ƒ Document all relevant modelling decisions in process Æ Rigorous documentation to allow re-simulation through others! EVIC’05 © Sven F. Crone - www.bis-lab.com

Evaluation through Forecasting Competition

ƒ Forecasting Competition

Split up time series data Æ 2 sets PLUS multiple ORIGINS!

Select forecasting model

select best parameters for IN-SAMPLE DATA

Forecast next values for DIFFERENT HORIZONS t+1, t+3, t+18?

Evaluate error on hold out OUT-OF-SAMPLE DATA

choose the model with the lowest AVERAGE error on the OUT-OF-SAMPLE DATA

ƒ Results Æ M3-competition

simple methods outperform complex ones

exponential smoothing OK Æ neural networks not necessary

forecasting VALUE depends on VALUE of INVENTORY DECISION EVIC’05 © Sven F. Crone - www.bis-lab.com

Evaluation of Forecasting Methods

ƒ HOLD-OUT DATA Æ out-of-sample errors count! … today | Future …

… 2003 “today”                       presumed Future …
Method                Jan  Feb  Mar   Apr  May  Jun  Jul  Aug   Sum   Sum
Baseline Sales         90  100  110     ?    ?    ?    ?    ?
Method A               90   90   90    90   90   90   90   90
Method B              110  100  120   100  110  100  110  100
absolute error AE(A)    0   10   20     ?    ?    ?    ?    ?    30     ?
absolute error AE(B)   20    0   10     ?    ?    ?    ?    ?    30     ?

… t+1 t+2 t+3 SIMULATED = EX POST Forecasts EVIC’05 © Sven F. Crone - www.bis-lab.com

Evaluation of Forecasting Methods

ƒ Different Forecasting horizons, emulate rolling forecast …

… 2003 “today”                       presumed Future …
Method                Jan  Feb  Mar   Apr  May  Jun  Jul  Aug   Sum   Sum
Baseline Sales         90  100  110   100   90  100  110  100
Method A               90   90   90    90   90   90   90   90
Method B              110  100  120   100  110  100  110  100
absolute error AE(A)    0   10   20    10    0   10   20   10    30    50
absolute error AE(B)   20    0   10     0   20    0    0    0    30    20
… t+1 t+2 t+3 … t+1 t+2

ƒ Evaluate only RELEVANT horizons … omit t+2 if irrelevant for planning! EVIC’05 © Sven F. Crone - www.bis-lab.com

Evaluation of Forecasting Methods

ƒ Single vs. multiple origin evaluation

… 2003 “today”                       presumed Future …
Method                Jan  Feb  Mar   Apr  May  Jun  Jul  Aug   Sum   Sum
Baseline Sales         90  100  110   100   90  100  110  100
Method A               90   90   90    90   90   90   90   90
Method B              110  100  120   100  110  100  110  100
absolute error AE(A)    0   10   20    10    0   10   20   10    30    50
absolute error AE(B)   20    0   10     0   20    0    0    0    30    20

ƒ Problem of sampling variability!
„ Evaluate on multiple origins
„ Calculate the t+1 error at each origin
„ Calculate the average of the t+1 errors
Æ GENERALIZE about forecast errors EVIC’05 © Sven F. Crone - www.bis-lab.com
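A sketch of this multiple-origin (rolling origin) evaluation, with assumed helper names and the Naive no-change model as the forecaster:

```python
import numpy as np

def rolling_origin_mae(y, forecaster, n_origins, horizon=1):
    """Average |error| at a fixed horizon over successive forecast origins."""
    y = np.asarray(y, dtype=float)
    errs = []
    for origin in range(len(y) - n_origins - horizon + 1, len(y) - horizon + 1):
        history = y[:origin]                 # only data known at this origin
        f = forecaster(history, horizon)     # forecast for y[origin + horizon - 1]
        errs.append(abs(y[origin + horizon - 1] - f))
    return float(np.mean(errs))

naive = lambda history, h: history[-1]       # no-change benchmark
y = [90, 100, 110, 100, 90, 100, 110, 100]
print(rolling_origin_mae(y, naive, n_origins=4))   # mean t+1 error over 4 origins
```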

Software Simulators for Neural Networks

Commercial Software (by price)
ƒ High End
„ NeuralWorks Professional
„ SPSS Clementine
„ SAS Enterprise Miner
ƒ Midprice
„ Alyuda NeuroSolutions
„ NeuroShell Predictor
„ NeuroSolutions
„ NeuralPower
„ PredictorPro
ƒ Research
„ Matlab library
„ R package
„ NeuroLab

Public Domain Software
ƒ Research oriented
„ SNNS / JNNS / JavaSNNS
„ JOONE
„ …

Æ FREE CD-ROM for evaluation
„ Software Simulators
„ Data from Experiments: M3-competition, airline-data, lynx-data, beer-data

Æ Consider the Tashman/Hoover tables on forecasting software for more details EVIC’05 © Sven F. Crone - www.bis-lab.com

Neural Networks Software - Times Series friendly!

ƒ Alyuda Inc.
ƒ Ward Systems ”Let your systems learn the wisdom of age and experience”
„ AITrilogy: NeuroShell Predictor, NeuroShell Classifier, GeneHunter
„ NeuroShell 2, NeuroShell Trader, Pro, DayTrader
ƒ Attrasoft Inc.
„ Predictor, Predictor PRO
ƒ Promised Land
„ Braincell
ƒ Neural Planner Inc.
„ EasyNN, EasyNN Plus
ƒ NeuroDimension Inc.
„ NeuroSolutions, NeuroSolutions for Excel, NeuroSolutions for Matlab, TradingSolutions EVIC’05 © Sven F. Crone - www.bis-lab.com

Neural networks Software – General Applications

ƒ NeuralWare Inc.
„ NeuralWorks Professional II Plus
ƒ SPSS
„ SPSS Clementine Data Mining Suite
ƒ SAS
„ SAS Enterprise Miner
ƒ … EVIC’05 © Sven F. Crone - www.bis-lab.com

Further Information

ƒ Literature & websites NN Forecasting website www.neural-forecasting.com or www.bis-lab.com

Google web-resources, SAS NN newsgroup FAQ ftp://ftp.sas.com/pub/neural/FAQ.html

BUY A BOOK!!! Only one? Get: Reed & Marks ‘Neural Smithing’

ƒ Journals Forecasting … rather than technical Neural Networks literature! „ JBF – Journal of Business Forecasting „ IJF – International Journal of Forecasting „ JoF – Journal of Forecasting

ƒ Contact to Practitioners & Researchers Associations „ IEEE NNS – IEEE Neural Network Society „ INNS & ENNS – International & European Neural Network Society

Conferences „ Neural Nets: IJCNN, ICANN & ICONIP by associations (search google …) „ Forecasting: IBF & ISF conferences!

Newsgroups news.comp.ai.nn

Call Experts you know … me ;-) EVIC’05 © Sven F. Crone - www.bis-lab.com

Agenda

Business Forecasting with Artificial Neural Networks

1. Process of NN Modelling 2. Tips & Tricks for Improving Neural Network based forecasts a. Copper Price Forecasting 3. Questions & Answers and Discussion a. Advantages & Disadvantages of Neural Networks b. Discussion EVIC’05 © Sven F. Crone - www.bis-lab.com

Advantages … versus Disadvantages!

Advantages
- ANN can forecast any time series pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of freedom in modeling
- Freedom in forecasting with one single model
- Complete model repository: linear models, nonlinear models, autoregression models, single & multiple regression, multiple step ahead

Disadvantages
- Experience essential!
- Research not consistent
- explanation & interpretation of ANN weights IMPOSSIBLE (nonlinear combination!)
- impact of events not directly deducible
- … EVIC’05 © Sven F. Crone - www.bis-lab.com

Questions, Answers & Comments?

Summary Day I
- ANN can forecast any time series pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of freedom in modeling
- Experience essential!
- Research not consistent

Sven F. Crone [email protected]

SLIDES & PAPERS available: www.bis-lab.de, www.lums.lancs.ac.uk

What we can offer you:
- NN research projects with complimentary support!
- Support through MBA master thesis in mutual projects EVIC’05 © Sven F. Crone - www.bis-lab.com

Contact Information

Sven F. Crone Research Associate

Lancaster University Management School Department of Management Science, Room C54 Lancaster LA1 4YX United Kingdom

Tel +44 (0)1524 593867 Tel +44 (0)1524 593982 direct Tel +44 (0)7840 068119 mobile Fax +44 (0)1524 844885

Internet www.lums.lancs.ac.uk eMail [email protected]