
Rob J Hyndman

Forecasting using R

3. and OTexts.com/fpp/2/ OTexts.com/fpp/6/1

Outline

1 Time series graphics

2 Seasonal or cyclic?

3 Autocorrelation

Time series graphics

Time plots. R command: plot or plot.ts
Seasonal plots. R command: seasonplot
Seasonal subseries plots. R command: monthplot
Lag plots. R command: lag.plot
ACF plots. R command: Acf

[Figure: Economy class passengers: Melbourne−Sydney. Time plot, thousands of passengers by year, 1988 to 1993.]

R command: plot(melsyd[,"Economy.Class"])

[Figure: Antidiabetic drug sales. Time plot, $ million by year, 1995 to 2005.]

R command: plot(a10)

[Figure: Seasonal plot: antidiabetic drug sales. $ million by month, Jan to Dec, with one line per year labelled 1991 to 2008.]

Seasonal plots

Data plotted against the individual “seasons” in which the data were observed. (In this case a “season” is a month.)
Something like a time plot except that the data from each season are overlapped.
Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.
In R: seasonplot
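The idea of overlapping each season can be sketched in base R without seasonplot. This is only an illustration under assumptions: it uses the built-in monthly series USAccDeaths as a stand-in for the drug sales data, and the name by_year is hypothetical.

```r
# Seasonal plot sketch in base R (not the seasonplot implementation).
# Assumption: USAccDeaths, a built-in monthly series (1973-1978), stands in
# for the data used in the slides.
y <- USAccDeaths
stopifnot(frequency(y) == 12, length(y) %% 12 == 0)

# One row per month, one column per year.
by_year <- matrix(as.numeric(y), nrow = 12)
colnames(by_year) <- start(y)[1] + seq_len(ncol(by_year)) - 1

# Overlay the years: each line is one year plotted against month.
matplot(by_year, type = "l", lty = 1, col = seq_len(ncol(by_year)),
        xaxt = "n", xlab = "Month", ylab = "Deaths",
        main = "Seasonal plot (base R sketch)")
axis(1, at = 1:12, labels = month.abb)
legend("topleft", legend = colnames(by_year),
       col = seq_len(ncol(by_year)), lty = 1, cex = 0.7)
```

A year that departs from the usual seasonal shape shows up as a line crossing the others.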


Seasonal subseries plots

[Figure: Seasonal subseries plot: antidiabetic drug sales. $ million by month, Jan to Dec.]

R command: monthplot(a10)

Data for each season collected together in time plot as separate time series.
Enables the underlying seasonal pattern to be seen clearly, and changes in seasonality over time to be visualized.
In R: monthplot
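A small sketch of what the subseries view groups together, assuming the built-in USAccDeaths series as a stand-in. monthplot is in base R's stats package; month_means is a hypothetical name for the per-month averages.

```r
# Seasonal subseries sketch. Assumption: USAccDeaths stands in for the
# series used in the slides.
y <- USAccDeaths

# One mini time series per month, drawn side by side.
monthplot(y, ylab = "Deaths", main = "Seasonal subseries plot")

# Average of each month's subseries, computed by hand.
month_means <- tapply(as.numeric(y), as.integer(cycle(y)), mean)
names(month_means) <- month.abb
round(month_means)
```

Comparing the per-month averages gives the seasonal pattern; the shape of each mini series shows how that month has changed over the years.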


Quarterly Australian Beer Production

beer <- window(ausbeer,start=1992)
plot(beer)
seasonplot(beer,year.labels=TRUE)
monthplot(beer)

[Figure: Australian quarterly beer production. Time plot, megalitres by year, 1995 to 2005.]

[Figure: Seasonal plot: quarterly beer production. Megalitres by quarter, Q1 to Q4, with one line per year labelled 1992 to 2008.]

[Figure: Seasonal subseries plot: quarterly beer production. Megalitres by quarter.]

2 Seasonal or cyclic?

Time series patterns

Trend: pattern exists when there is a long-term increase or decrease in the data.
Seasonal: pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
Cyclic: pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).

[Figure: Australian electricity production. GWh by year, 1980 to 1995.]

[Figure: Australian clay brick production. Million units by year, 1960 to 1990.]

[Figure: Sales of new one-family houses, USA. Total sales by year, 1975 to 1995.]

[Figure: US Treasury bill contracts. Price by day, 0 to 100.]

[Figure: Annual Canadian Lynx trappings. Number trapped by year, 1820 to 1920.]

Seasonal or cyclic?

Differences between seasonal and cyclic patterns:
seasonal pattern constant length; cyclic pattern variable length
average length of cycle longer than length of seasonal pattern
magnitude of cycle more variable than magnitude of seasonal pattern

The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.


3 Autocorrelation

Covariance and correlation: measure extent of linear relationship between two variables ($y$ and $X$).

Autocovariance and autocorrelation: measure linear relationship between lagged values of a time series $y$.

We measure the relationship between:
$y_t$ and $y_{t-1}$
$y_t$ and $y_{t-2}$
$y_t$ and $y_{t-3}$
etc.


Example: Beer production

R command: lag.plot(beer,lags=9)

[Figure: lag plots of beer against its own values at lag 1 to lag 9, with points labelled by observation number. Both axes run from about 400 to 500.]

R command: lag.plot(beer,lags=9,do.lines=FALSE)

[Figure: the same lag plots of beer at lag 1 to lag 9, drawn as points only. Both axes run from about 400 to 500.]

Lagged scatterplots

Each graph shows $y_t$ plotted against $y_{t-k}$ for different values of $k$.
The autocorrelations are the correlations associated with these scatterplots.


Autocorrelation

We denote the sample autocovariance at lag $k$ by $c_k$ and the sample autocorrelation at lag $k$ by $r_k$. Then define

$$c_k = \frac{1}{T} \sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y}) \qquad \text{and} \qquad r_k = c_k / c_0$$

$r_1$ indicates how successive values of $y$ relate to each other.

$r_2$ indicates how $y$ values two periods apart relate to each other.

$r_k$ is almost the same as the sample correlation between $y_t$ and $y_{t-k}$.
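The definitions above can be written out directly in R and checked against the built-in acf function. This is a sketch under assumptions: sample_acf is a hypothetical helper name, and the built-in USAccDeaths series stands in for the beer data.

```r
# Sample autocorrelations r_k = c_k / c_0, computed from the definition.
# sample_acf is a hypothetical helper, not part of any package.
sample_acf <- function(y, max_lag) {
  y <- as.numeric(y)
  n <- length(y)          # T in the formula
  y_bar <- mean(y)
  c_k <- sapply(0:max_lag, function(k) {
    # sum over t = k+1, ..., T of (y_t - y_bar) * (y_{t-k} - y_bar), divided by T
    sum((y[(k + 1):n] - y_bar) * (y[1:(n - k)] - y_bar)) / n
  })
  c_k[-1] / c_k[1]        # r_1, ..., r_max_lag
}

# Assumption: USAccDeaths is used as a stand-in series.
y <- USAccDeaths
r_manual <- sample_acf(y, 9)
r_builtin <- acf(y, lag.max = 9, plot = FALSE)$acf[-1]  # drop lag 0
all.equal(r_manual, r_builtin)

# The ordinary correlation of y_t with y_{t-1} is close to r_1 but not equal,
# because it uses separate means and variances for the two shortened series.
n <- length(y)
c(r_1 = r_manual[1], cor = cor(as.numeric(y)[2:n], as.numeric(y)[1:(n - 1)]))
```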


Autocorrelation

Results for first 9 lags for beer data:

Lag k   1       2       3       4      5       6       7       8      9
r_k     −0.126  −0.650  −0.094  0.863  −0.099  −0.642  −0.098  0.834  −0.116

[Figure: ACF of the beer data plotted against Lag, 1 to 17.]

Autocorrelation

$r_4$ is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 4 quarters apart.

$r_2$ is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.

Together, the autocorrelations at lags 1, 2, ..., make up the autocorrelation function or ACF.

The plot is known as a correlogram.


ACF

R command: Acf(beer)

[Figure: ACF of the beer data plotted against Lag, 1 to 17.]


Recognizing seasonality in a time series

If there is seasonality, the ACF at the seasonal lag (e.g., 12 for monthly data) will be large and positive.
For seasonal monthly data, a large ACF value will be seen at lag 12 and possibly also at lags 24, 36, ...
For seasonal quarterly data, a large ACF value will be seen at lag 4 and possibly also at lags 8, 12, ...
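A quick check of this rule on a monthly series, as a sketch only. It assumes the built-in nottem series (monthly air temperatures at Nottingham) as a stand-in, and uses base R's acf rather than Acf from the forecast package.

```r
# ACF at the seasonal lags of a monthly series.
# Assumption: nottem is a stand-in for a seasonal monthly series.
y <- nottem
r <- acf(y, lag.max = 36, plot = FALSE)$acf[-1]   # r[k] is the ACF at lag k

# Seasonal lags 12, 24, 36 against the half-year lag 6.
round(c(lag6 = r[6], lag12 = r[12], lag24 = r[24], lag36 = r[36]), 2)
```

For a series like this, the values at lags 12, 24 and 36 should be large and positive, while lag 6 (half a year away, peak against trough) should be negative.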


Australian monthly electricity production

[Figure: Australian electricity production. Time plot, GWh by year, 1980 to 1995.]

[Figure: ACF of Australian monthly electricity production, lags 0 to 40.]

Time plot shows clear trend and seasonality.
The same features are reflected in the ACF.
The slowly decaying ACF indicates trend.
The ACF peaks at lags 12, 24, 36, ..., indicate seasonality of length 12.
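The same two features can be looked for numerically. The sketch below assumes the built-in AirPassengers series, which also has trend and monthly seasonality, as a stand-in for the electricity data; it uses base R's acf.

```r
# Trend and seasonality as seen in the ACF.
# Assumption: AirPassengers stands in for the electricity series.
y <- AirPassengers
r <- acf(y, lag.max = 36, plot = FALSE)$acf[-1]   # r[k] is the ACF at lag k

# Trend: the autocorrelations stay positive and die away slowly.
round(r[1:12], 2)

# Seasonality: local peaks at the seasonal lags relative to their neighbours.
c(lag12_is_peak = r[12] > r[11] && r[12] > r[13],
  lag24_is_peak = r[24] > r[23] && r[24] > r[25])
```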


Which is which?

1. Daily morning temperature of a cow
2. Accidental deaths in USA (monthly)
3. International airline passengers
4. Annual mink trappings (Canada)

[Figure: time plots of the four series above, and four ACF plots labelled A, B, C and D (lags 1 to 20) to be matched to them.]

Time series graphics

Time plots. R command: plot.ts
Seasonal plots. R command: seasonplot
Seasonal subseries plots. R command: monthplot
Lag plots. R command: lag.plot
ACF plots. R command: Acf
