PROCEEDING OF 3RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, 16 – 17 MAY 2016

M – 13 Univariate and Multivariate Time Series Models to Forecast Train Passengers in

Lusi Indah Safitri1, Suhartono2, and Dedy Dwi Prastyo3 1,2,3Department of Statistics, Institut Teknologi Sepuluh Nopember Email: [email protected]

Abstract— Time series model is one of quantitative methods that frequently used for forecasting a number of train passengers in certain route. In general, there are two types of time series models, i.e. univariate and multivariate time series. The objective of this paper is to apply ARIMA model as a univariate method and VARIMA as a multivariate method for forecasting a number of executive train passengers in Indonesia, particularly - route. The number of daily train passengers in three types of executive classes that departure from Surabaya Pasar Turi station, i.e. Argo Bromo Anggrek Pagi, Argo Bromo Anggrek Malam, and Sembrani, are used as case study. The data are consisted 761 observations and recorded from January 1st, 2014 till February 27th, 2016 and divided into two parts, i.e. January 1st, 2014 to January 30th, 2016 and 1-27 February 2016 as training and testing data, respectively. Root mean of squares error (RMSE) in testing data is used as criteria to select the best forecasting model. The results show that ARIMA yields more accurate forecast at two data, i.e. number of passengers at Argo Bromo Anggrek Pagi and Sembrani, whereas VARIMA gives better forecast at Argo Bromo Anggrek Malam. Hence, this result inlines with the first conclusion of M3 competition, i.e. statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.

Keywords: forecasting, train passengers, ARIMA, VAR, RMSE.

I. INTRODUCTION The train is known as the mode of transport that has multiple advantages, such as energy saving, land- saving, environmentally friendly, high safety levels, able to transport large amounts, as well as adaptive to technological development [1]. The number of train tickets is sold uncertainty (fluctuatively) everyday. In general, the demand of train tickets at the weekend usually increase compared to normal days. Moreover, the peak of train tickets demand usually occurs one to five days ahead of Idul Fitri due to an annual mudik tradition in Indonesia. The term mudik refers to the exodus of Indonesian workers from the cities back to their hometowns ahead of Idul Fitri. Not only the Muslim community of Indonesia will return to their places of origin, but also people adhering to other religions traditionally use this public holiday to visit their parents or make a short holiday. This demand peak usually continues after Idul Fitri due to they must going back after this holiday.

A problem often faced by the railway operator is the large supply train ticket quotas are not appropriate to the number of train passengers. The number of train passengers usually increase in the days ahead of the national holidays or certain religious holidays. Due to the railway operator usually only provides the number of tickets as a normal day, this can lead to frustration of the passengers train because many passengers did not get a ticket. Hence, an accurate prediction of the number of rail passengers in the future is important to minimize the number of train passengers who did not get a ticket.

There are some researches on forecasting the train passengers have been conducted such as Andalita [2] who studied about forecasting the number of train passengers in economy class using ARIMA (Autoregressive Integrated Moving Average) and ANFIS (Adaptive Neuro Fuzzy Inference System) methods. Furthermore, Hermawan [3] applied the model NN (Neural Network) for forecasting the number of railway passengers in Jabodetabek. Additionally, Rosyidah [4] employed ARIMA modeling for forecasting of passenger trains on DAOP IX Jember.

M - 79

ISBN 978-602-74529-0-9

In this research, forecasting the number of train passengers in executive class with univariate and multivariate time series. The data will be used is the number of train passengers in three executive trains, i.e. Argo Bromo Anggrek Pagi, Argo Bromo Anggrek Malam and Sembrani. The univariate time series modeling used ARIMA while the multivariate time series modeling employed Vector Autoregressive Integrated Moving Average (VARIMA). Modeling of multivariate time series are not only able to predict the number of passengers in the future, but also could explain the relevance of other types of executives trains. Once the best model of univariate and multivariate time series was obtained, the models were compared based on the accuracy to predict the number of train passengers. In forecasting, multivariate methods are usually more complicated than the univariate method. However as said by Makridakis and Hibon [5] that statistical methods are more sophisticated or more complicated does not always provide more accurate estimates than a simple method. Through the comparison of the two models time series derived from both methods will be obtained the best model to predict the number of rail passengers in the executive class Pasar Turi station.

II. LITERATURE REVIEW A. Autoregressive Integrated Moving Average (ARIMA) ARIMA(p, d, q) model is a combination of the AR (Autoregressive) order p model and MA (Moving Average) order q model with differencing order d. ARIMA model can be used in the seasonal and non- seasonal data. The ARIMA (p, d, q) can be written as follows [6]: d p(B )(1 B ) Y t     q ( B ) a t ,

p Where  is a constant, pp(BBB ) (1  1    ) is polynomial backshift operator for AR q andpq(BBB ) (1  1    ) polynomial backshift operator for MA. The procedures used to obtain the forecasting value of using ARIMA consists of four steps starting from the model identification, parameter estimation, diagnostic testing and selection of the best models, forecasting. The identification can be done using ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots. This step is valid if the time series is stationary that can be visually checked through time series plot. If the data is not stationary in mean, then do differencing whereas if the data is not stationary in variance, then the Box-Cox transformation can be used. The complete steps to do ARIMA modeling can be seen in [4].

B. Vector Autoregressive (VAR) The VAR model with order one denoted as VAR(1) follows this equation [6]:

Ya Y  , t o t-1 t  The model in (2) that consists of two series can be written in matrix form as follows:

YY c    a  1,t1 11 12 1, t 1  1, t ,  YY    a  2,t c2  2122  2,1 t   2, t  In general, the VAR model of order p denotes as VAR (p) is formulates as [4]:

Y   Y     Y  a . t o t-1 p t- p t  The complete description of VAR model can be seen in [6].

C. Model Identification, Diagnostic Checking, and Model Selection Identification of time series model can be done by creating ACF and PACF plots for the univariate models. The identification step for the multivariate models can be done by employing MPACF (Matrix of Partial Autoregression Function) and the AIC (Akaike Information Criterion) [6].

M - 80

PROCEEDING OF 3RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, 16 – 17 MAY 2016

Checking the assumptions of model is conducted after the identification and parameter estimation steps. The purpose of this step is to determine whether the model fulfills the assumptions. The ARIMA model assumes that the residual is white noise and normally distributed whereas the VARIMA model assumes that the vector of residual is white noise and follows multivariate normal distribution [6].

Selection of the appropriate model was done based on smallest RMSE (Root Mean Squared Error) of out of sample data. The RMSE is calculated as follows [6].

L 2 1 ˆ RMSEout of sample =  Y Yn  l . L l1 nl with is number of observation in out of sample data, is observation l in out sample, and is the value of l-step forecasting.

III. RESEARCH METHODOLOGY

A. Data and Variables The data used in the study were the data of the number passengers in daily train executive class with Surabaya-Jakarta route in the period from January 1, 2014, until February 29, 2016. These data are secondary data which obtained from Pasar Turi Train Station. There are three types of train executive class data, i.e. the number of passengers in the train Argo Bromo Anggrek Pagi, the number of Sembrani train's passengers, and the number of Argo Bromo Anggrek Malam passengers.

The variables in this study are denoted by Ymt, with m is stating the type of trains and t is stating the time (days). Following is the details of the variables in the study:

Y1,t : The daily number of Argo Bromo Anggrek Pagi train passengers

Y2,t : The daily number of Argo Bromo Anggrek Malam train passengers

Y3,t : The daily number of Sembrani train passengers. B. Steps of Analysis 1. ARIMA modeling with the following steps: a) Stationary inspection of data by looking at the data, ACF and PACF plots. If the data is not stationary variance, we can transform this data. If the data is not stationary in mean, we can difference this data. Formally, Augmented Dickey-Fuller test is used to check the data in the stationary mean. b) Identification of the model by ACF and PACF plots to determine the order of AR and MA c) Parameter estimation using OLS method. d) Selection of the best model by AIC criterion. e) Diagnosis check. f) Forecasting for data out samples with the selected order. 2. VAR modeling with the following steps: a) Inspection stationary of data such as the ARIMA model. b) To identify the model by MPACF and minimum AIC value thus obtained VAR order. c) Diagnosis check. d) Forecasting for data out samples with the model selected. 3. Compare the best model between univariate and multivariate models that have been selected.

IV. ANALYSIS AND RESULTS

A. Descriptive Statistics The first step in this study was dividing the data into in sample, i.e. January 1, 2014 until January 14, 2015, and out of sample data span from January 15 until February 29, 2016. The descriptive statistics of the data are presented in Table 1. The largest average of the number of train passengers is for train ABAP (Argo Bromo Anggrek Pagi), followed by train ABAM (Argo Bromo Anggrek Malam) and Sembrani. However, the variance for Sembrani is the largest. The correlation between number of train passengers across trains are shown in Table 2.

M - 81

ISBN 978-602-74529-0-9

TABEL 1. DESCRIPTIVE STATISICS FOR THE NUMBER OF TRAIN PASSENGERS Train Average Variance Minimum Maximum ABAP 350 6304 129 635 ABAM 338 5072 109 540 Sembrani 317 6307 98 544

TABEL 2. CORRELATION OF NUMBER OF PASSENGERS OF BETWEEN TRAINS

Train ABAP ABAM Sembrani ABAP 1 - - ABAM 0.677 1 - Sembrani 0.666 0.761 1

Table 2 shows that number of passengers between trains are correlated. The strongest correlation is between ABAM and Sembrani. All the correlation are significant with p-values are less than 0.05. These facts motivated the use of multivariate time series to model the data.

B. ARIMA Model ARIMA modeling begins with the identification step given the series stationary. Following is time series plot of the three train’s passenger data: ABAP (left), ABAM (middle), and Sembrani (right). Based on time series plot in Figure 1, the data seems to be not stationary. They have high fluctuations. In order to know the stationary of these data, the Box-Cox transformation was used to check it where the results were reported in Table 3.

i

g

m

a

a

P

l

i a

o 28-29 5 17-18 24

n 28-29 05 17-18 24 M

28-29 05 17-18 24 a

m 700

r 600 o

o 600 b r 6

m 5

B

m o

1

e r

o 7 5

S B

g 600 7 5

7 7 1 7 i 4 7 r 1 7 4 3 3 3 o 7 500 5 2 6 7 77 2 p 4 3 3

A 7 500 7 7 6 1 7 g 5 7 7 7 7 7 5 4 2 7 5 7 5 7 41 5 3 7

7 A 4 5 7 7 r i 4 51 3 7 7 5 1 716 7 1 7 7 7 7 5 3 6 23 2 3 4 7 5 7 7 2 3 7 7 7 7 5 7 54 2 p 4 7 4 7 5 775 7 7 7 4 6 A 6 6 1 5 6 56 7 71 7 7 624 4 5 7 7 a 715 56172 7 7 161 1 6 6 6 5 2457 23 7 6 12 75 3 5 7 5 7 6 3 i 5 1 3 4 5 7 7 7 3 t 7 7

A 5 5 7 5 500 7 7 7 5 6 676 7 5 5 7 6466 7 5575 3 5 7 5 7 43511 7 4745 7 7 7 7 7 7 4 7 6 7 67 5 5 1 5 4 5 7 77 5 51 5 4 1 5 7 1 1 1 477 7 1 6 5 p 7 77 4 5 6 5 5 6 5 7 2 7 7 7 7 7 5 4 67 5 4 5 5 5 5 5 6 5 6 e 5 7 7 7 6 27 26 7 2 7 3 1 7 7 5 43 6 7 4 7 1 5 7 7 7 77 7 2 557 3 74 3 5 5 5 1 6 17 4 5 5 6313 7 3 5 7 7 7 2 7 12 12 234 7 7 7 5 7 a 7 76 16 5 75 5 7 1 4 1 77 5 52 7 7 17 7 5 7 5 62 7 4 31 2 2 24 7 2 2 71717 2 4 7 7 55 7 4 5 24242 3 1 5 54 6 r 45 7 7 6 5 5 2452647 1 7 7 1 7 2 7 6 71 77 1 2 5 2 12 7 6 66 6 21 74 7 A 3 3 7567 7 7 7 7 3 6 7 2 4 5 21 73 7 24 1 53 3 5 5 400 6 6 1 17 3 5 26 46131251 7 7 61 1 6 7 45 53 6 3 3 5 t 36 557 4 527 7 7 1 2 7 7 6 156 2 1 33 6 7 2 7 73 7 7 7 7 55 31 7 667 4 5 5 5 3 1 1 74 7 17 4 22 1 2 1 64 73 1 5 1 3 7 1 7 7 4 5 7 5 5 2 62 671 7 6 2 3 51 3 177 3 400 357 1 7 3 41 3467 161 1 1 33 1 6 1 4 e 7 773 37 1 7 3 5 4 27 5 5 7 7 11 3 7 554 447 1 7 74752 77 13 7 1 7 5 237 132411 7 1 1 21 25 6 1 41 7 3 5613 6125 3 6521 1 131 4 26 3 7 1 3 2 55 2453 5 5 7 5 735 55 7 1 6 6 1 6 357 1 1 4 2 25 5 2 e 6 4 3 32 5 7 5 7 2 3 3 7 6 1 1 2 4 6 5 7 1 5

3 7 1 1 2 7 a 5 1 4 52 3 6 6 54 5 7 5 5

1 635 6 1525 6 77 1 7 7 7 4 1 3 1 4 524 7 1 7 77 7 777 46 2 4 4 34 4 5 11 45717216 63 42 5 2 44 2 2 5 16 1 1 12612 5 K 3 1 7 3 6 134 1 5 55 4 4 44 6 r 7 1 6 3 6 7 5 1 2 6 6 46 47 37 6 1 5 26 7 7 3 7 5 75 13 1 3 7 5 1 75 2 5 51 1 1 5 7 7 6 7 t 5272 7 5 1 5 3 6 2 463 1 3 4 31 6 5 2 6167 6 24 7 1 2 43 4 5 5

4 51 523 561 5417 1761 3 1 4512 6 1 6 1 3 7 76 7 3 63 6 46 534 6452 2 3756757 77 5 7 6 7 32 4 2 4 3 3362 1 53 1 3 2 734 221623141246 7 5 1233 17 144 41 3 6 5 524 3 5 e 2 4 61 5 7 7 5 5 1 5 3 6 4 g 2 3 1 2 47 14 4 5 400 7 16 3 3 7 3 e 57 14 4 5 5 4 2 6 2 6 4 67 17 2 3 5 4 2 14 1 4 4 5 1 27 2 5 7 7 2 2 5 6 1 41 4 24 1 61 7 2 1 6 3 32 3423 4 6 41 63 4 14 31 21 1 1 5 4 1 32 5 7 54 271 1 7 235 41 4 4 3 2 5433 54 1 3 132 3 42 5 4 5

5 1 5 4 1 2 5 2 7 5 1 r 35 1 662 325 1 5 6 4 2 4 2 4 1 7 5 1 3 32 6 3 K 36 2234 346 4 4 3 75 77 6 4 3 42 31 2 7 22 5 4 23 2 6 4 6 77 42 32 43 62 6 15145 126 23 1 1 3 5 2 4 2 4 2 41 4 4 3 1 3 n 2 4 55 3 4 6 22 1 5 5 322 51 2 31 4 2 2 2 134 3 61 2 4 6 5 3 4 1 4 3 6 1 6 14 1 3 3 5 24 6 5 111 3 3 66 1 44 5 1 5 7 624 5 3 2 4 6 5 43 5 65 7 3 34 4 1 11 2 6 2 2 6 34 1 3 5 2 e 3 5 6 7 4 4 3 2 4 6 3 12 1 2 1 3 43 14 4 32 2 5 1 2 3 4 1 4 6 5 4 4 a 7 3 2 3 41 2 2 7 32 6 2 3 6 6 623 5 2 2 5 35 5 4 65 3 43 2 6 2 57 5 2 2 3 2 6 6324 6 7 2 24 3 44 4 7 2 4 3 4 5 4 3 5 11 2 22 54 4 4 2 7 1156 2 6 2 1 2 3 1 g 4 3 5 4 1 5 2 4 4 2 1 6 1 5 3 5 6 2 4 5 6 3 4 6 6 5 1 5 31 55 6 14 5 1 K 43 3 324 5 1 5 3 6 4 3 300 1 7 2 4 32 63 3 1 2 3 3 24 5 5 3 5 1 6 5 6 66 4 6 3 12 31 6 6 p 6 3 2 23 4 4 3 5 1 6 5 5 26 2 5 56 6 5 1 4 6 2 35 3 2 5 3 2 5 6 5 3 3 5 6 6 3 n 4 54 5 4 4 3 431 300 4 1 6 1 3 23 5 33 1 6 2 2 2 3 4 5 24 46365 4 5 1 6 32 54 6 34 4 4 45234 6111 6 536 55 31 6 1 1 34 4 22 43 3 4 2 6 4 6 1 1 4 3 2 3 45 3 334 56 6 5 6 1 44 1

16 3 1 3 5 34 4 3 2 3 g 4 5 15 6 6 2 3 6 3 2 4 4 1 2 43 3 4 3 4 2 4 5 1 1 m 5 3 2 5 4 3 2 5 6 2 a 66 2 3 5 4 3 2 3 4 2 4 1 5 43 4 1 4 22 136 3 4 2 36 22 3 4 5 2 3446 45 2 754 1 2 44 16 2 51 11 25 56 6 5 62 5 6 4 2 6 131 341 6 5 6 6 4 1 2 3 1 34 2 6 6 6 5 2 6 1 63 36 3 6 2 2 2 4 7 n 5 4 1 6 63 6 2 64 2 6 4 300 5 2 3 3 4 6 3 2 3 u 6 3 4 1 p 7 67 1 2 445 2 43 61 3 3 2 6 1 6 3 4 6 34 6 6 1 2 4 32 1 3 56 4462 4 2 3 2 31 3 32 3 2 3 6 3 6 3 642 4 7 6 3 3 4 6 6 6 2 2 2 4 6 6 42 7 6 5 1 31231 2 4 1 1 61 1 4 44 6 2 6 6 3 2 3 2 3 a 3 6 5 4 6 3 6 n 5 2 5 1 3 2 2 3 3 6 2114 5 6 1 23 2 44 6 2 6 43 61 2 6 4 66 2 7 6 1 6 6 5 6 6 13 233 m 5 52 6 3 4 5 4 3 4 6 3 36 65 3 3 2 2 4 3 6 7 6 4 6 2 4 4 34 3 34 p 2 1 4 1 2 6 1 4 1 1 1 2 3 2 e 4 3 4 2 4 4 3 1 2 2 3 2 4 25 1 2 6 622 1 21 5 6 5 32 6 213 1 u 46 131 4 4 6313 61 42 26 2 3 21 2 6 4612 2 3 6 63 33 44 3 4 2 4 6 2 6 P 2 6 4 4 1 6 2 2 6 6 23 62 2 2 m 4 1 2 6 200 63 4 6 2 6 n 5 3 6 5 6 6 1 1 16 1 3 3 7 6 2 1 2 4 3 6 2 1 6 2 4 6 u 200 3 6 6 6 6

1 6 a

e 63 5 3 6 4 54 1 4 6 4 3 200 5 n 6

6 6 y

p 7 6163 6 6 2

2 2 e 1 6

6 n

44 42 2 5 P

32 k a 5 7

2 a

a y

y 100 y n 100

100 n

k

n

a

k

a B Hari 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 a Day

y Day 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 y n Bulan n b r r y n l g p t v c n b r r y n l g p t v c n b l t l t r r l t c r r l t c n Month n b r r y n g p v c n b r r y n g p v c n b Month n b y n g p v n b y n g p v n b a a p u c e a p u c e a u c a u c a p u c e a p u c e Ja e a Ju J u e O o Ja e a Ju J u e O o Ja e a e p a u J u e o e a e p a u J u e o e a e Ja e a Ju J u e O o Ja e a Ju J u e O o Ja e F M A M A S N D F M A M A S N D F a J F M A J S O D J F M A J S O D J F F M A M A S N D F M A M A S N D F

B M A N M A N B Tahun 4 5 6 Year 1 1 1 Year 14 15 16 14 15 16 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 FIGURE 1. TIME SERIES PLOT OF NUMBER OF PASSENGERS TRAINS

TABEL 3. BOX-COX TRANSFORMATION Variable Rounded Value Lower Limit Upper Limit ABAP 1 0.82 1.48 ABAM 1.36 1.06 1.71 Sembrani 1.33 1.06 1.65

In the ABAP, the rounded value for the parameter estimate in Box-Cox transformation is one. This indicates that only the ABAP variables are stationary in variance. The series for ABAM and Sembrani are required to be transformed. The next step is to check the stationary in the mean. The checking can be viewed via time series plot and ACF. ACF plot shows a form of a cut off pattern. This shows that the data has not been stationary in the mean. It is necessary to do differencing, i.e. differencing order one and seven. After differencing, the next step is to look at the ACF and PACF plot of the data that has been stationary. Through ACF and PACF plot can be determined the order of ARIMA models. The ACF and PACF plots had been stationary using differencing order one and seven.

The ARIMA models for the ABAP data is ARIMA (0, 1, 1) (0, 1, 1)7 because the ACF plot occurs in the lag 1,6,7,8 are cut off. The modeling for ABAM and Sembrani series need an outlier detection approach because the normality assumption in ARIMA models was not fulfilled. The model for ABAM series is ARIMA ([6,7,8], 1,2)(0,1,1)7 with few outliers whereas the model for Sembrani is ARIMA (7, 1, 1) (0, 1, 1)7 with few outliers. All models have fulfilled the assumptions, i.e. white noise and normality distribution. Thus, the model for Y1,t is:

Y1,t Y t 1  Y t  7 0.610 a t  1  0.812 a t  7  a 1, t .

M - 82

PROCEEDING OF 3RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, 16 – 17 MAY 2016

The model for Y2,t is:

27 (1 0.29BBB  0.21 )(1  0.83 ) ()T Y a W I 2,t (1 0.10BBB6  0.3 7  0.27 8 ) t AO t with WI()T is: AO t

(33) (458) (31) (634) (631) (453) (365) (416)  2554.1IIIIIIIIt  2033.3 t  2144.5 t  2173.8 t  1350.5 t  950.3 t  1040.3 t 1438.7 t (39) (452) (486) (14) (107) (271) (361) (208) 1548.3IIIIIIt  1116.1 t  1389.1 t  1084.6 t  1224.9 t  1103.6 t  1116.4IItt 1031.4  (347) (689) (609) (610) (358) (134) (730) (291) 943.9IIIIIIIIt  741.7 t  1447.5 t  1319.0 t  1154.7 t  1046.4 t  687.0 t  1005.9 t  (121) (10) (705) (551) (206) (444) (187) (194) 979.1IIIIIt  1025.2 t  1028.4 t  1016.3 t 1169.0 t 1006.7IIIt  1272.1 t  1168.1 t  (190) (176) (594) (718) (161) (632) (707) (38) (652) 986.8IIIIIIIIIt 781.8 t  792.9 t 974.8 t  838.5 t  822.2 t  764.6 t 1029.9 t  881.5 t (655) (648) (280) (431) (725) (326) (269) (337) (297) 1234.4IIIt 1093.3 t 830.7 t 735.4IIIIIIt  492.5 t  1006.2 t  634.1 t  707.4 t  612.4 t  (199) (471) (681) (16) (30) (92) (152) (211) (311) 961.5IIIIIIIIIt 824.9 t  904.5 t  717.4 t  1384.1 t  595.1 t  696.7 t 724.2 t  842.2 t  (455) (533) (483) (378) (185) (414) 522.7IIIIIIt 502.7 t  662.8 t  699.6 t  649.4 t  659.9 t .

The model for Y3,t is : 7 (1 0.44BB )(1 0.88 ) ()()TT Y a W I W I , 3,t (1 0.09BBB2  0.28 7  0.12 8 ) t AO tls t with and WI()T is: ls t

(631) (31) (452) (486) (457) (358) (414) (39)  2334.9IIIIIIIIt1249.5 t  1786.0 t  1467.7 t  1243.9 t  1299.5 t  1441.3 t  1538.4 t  (530) (485) (444) (366) (208) (306) (636) (458) 1131.3IIIIIIt  1268.8 t  1296.6 t  1689.9 t  1392.7 t  1211.1 t  1396.1IItt 1237.3 (498) (696) (87) (283) (368) (623) (416) (564) (53) 1123IIIIIIIIIt  1073.9 t  1098.0 t  1075.6 t  1003.5 t  1024 t  1000.2 t  888.9 t  973.2 t  (735) (402) (736) (365) (30) (634) (737) (361) 981.5IIIIt 896.8 t  1122.5 t  1630.6 t  1523.4IIIIt 1372.3 t  1060.0 t  1559.4 t  (5) 654.8It .

C. VARIMA Model VARIMA modeling begins with the identification step based on MACF and MPACF plots and AIC. The series for three train passenger were differenced order one and seven because the data are not stationary in the mean. These series were transformed using Box-Cox transformation because they are not stationary in variance. The minimum value of the AIC indicated that the proper model is VARIMA (30,1,0)(0,1,0)7.

The results of parameter estimation of VARIMA (30,1,0)(0,1,0)7 indicates that the model has 270 parameters. However, the p-value of each parameter some parameters were not significant. So, the model need restrict some parameters. This restriction process start from the parameter estimate with the highest p-value until all parameter estimates had p-value less than Type-I error (α = 0.05). The results of parameter estimation VARIMA (30,1,0)(0,1,0)7 with restriction showed that there are 78 parameters that were significant in the model. The VARIMA (30,1,0)(0,1,0)7 model for the series of the number of train passengers is:

M - 83

ISBN 978-602-74529-0-9

1 0 0 0.4 0 0.1  0.3 0 0.1  0.2 0 0  0.1 0 0.1  0.7 0, 01 0.1  2   3   4   5 0 1 0  0 0.7 0BBBBB   0.3  0.6 0    0.7  0.4 0.4    0.7  0.3 0    0  0.2 0.5   0 0 1  0 0 0.7  0 0 0.6   0 0 0.3   0.1 0  0.3   0.1 0  0.2 

0.1 0 0 0.7 0 0 0.44 0 0.1 0.24 0 0 0.2 0 0 0.1 0 0 6  7   8   9   10   11 0 0.2 0 B 0  0.7 0 BBBBB   0  0.5 0    0  0.4 0    0  0.2 0    0  0.2 0   0 0 0.1 0 0 0.7   0 0  0.5   0 0 0.4   0 0  0.2   0 0  0.1 

0 0 0 0.4 0 0 0.3 0 0 0.1 0 0  0.1 0 0  0.1 0 0  12   14  15   16   17   18 0  0.1 0 BB   0  0.4 0   0  0.3 0 BBBB  0 0.2 0    0  0.1 0    0 0 0  0 0.1 0   0 0 0.5  0 0 0.4   0 0  0.3   0 0  0.2   0 0 0 

0 0 0 0.1 0 0 0.3 0 0 0.2 0 0 0 0 0.1 0.1 0 0 0.1 0 0  19   20   21   22  23   24   25 0 0 0 BBBB   0 0 0    0 0.3 0    0  0.2 0   0 0.1 0 BBB  0 0.1 0   0 0 0  0.1 0 0   0.1 0 0   0 0 0.3   0 0 0.2  0 0 0.2   0 0 0.1   0 0 0 

7 0.2 0 0 0.1 0 0 (1BBY )(1 ) a   28   29  7 1,t 1,t 0 0.2 0B  0  0.1 0 B (1  B )(1  B ) Y  a .      7 2,t 2,t  0 0 0.2 0 0 0 (1BBY )(1 ) a      3,t 3,t 

Based on the model that had been established, the next step is testing the assumptions on residual. In multivariate time series modeling, to testing the assumption of white noise on the residual can be done by looking at the results of the portmanteau test. In this study, testing the white noise used AIC criterion calculated from the residual of VARIMA (30,1,0)(0,1,0)7 model. Based on Table 4, the smallest AIC value is in AR (0) and MA (0). This suggests that the residuals of the model have fulfilled the white noise assumption.

TABEL 4. MINIMUM INFORMATION CRITERION OF RESIDUAL Lag MA 0 MA 1 MA 2 MA 3 MA 4 MA 5 AR(0) 27.26852 27.32797 27.32771 27.32819 27.32541 27.3277 AR(1) 27.28481 27.31673 27.32693 27.3307 27.33645 27.34313 AR(2) 27.29714 27.32953 27.33551 27.33822 27.34943 27.35946 AR(3) 27.3034 27.3347 27.33804 27.34664 27.35837 27.37177 AR(4) 27.30585 27.3421 27.35016 27.35857 27.37946 27.39254 AR(5) 27.31611 27.35402 27.36436 27.37477 27.39558 27.4099

The next assumption that must be fulfilled is multivariate normal distribution for the vector of residual. The null hypothesis of the test is that the residuals follows have multivariate normal distribution. The null hypothesis will be fail to be rejected if the p-value of the statistic test exceed the value of Type-I error. The evaluation of the multivariate normal assumption test can also be done visually through QQ plot of the residual. The assumption is fulfilled when residual plot tends to form a straight diagonal line as displayed by Figure 2. Moreover, if the proportion of values which generated through calculation are at least 50% greater than the value of statistic test, in this study is about 83%, the null hypothesis is fail to be rejected. Therefore, residuals of the VARIMA (30,1,0)(0,1,0)7 model have multivariate normal distribution.

18

16

14

12

10 q 8

6

4

2

0

0 2 4 6 8 10 12 14 16 18 dj FIGURE 2. QQ PLOT FOR RESIDUALS Once the best model for ARIMA and VARIMA were obtained, the out of sample forecast can be calculated for the series. The RMSE out of the samples were reported in Table 5. The models for ABAP

M - 84

PROCEEDING OF 3RD INTERNATIONAL CONFERENCE ON RESEARCH, IMPLEMENTATION AND EDUCATION OF MATHEMATICS AND SCIENCE YOGYAKARTA, 16 – 17 MAY 2016

and Sembrani have relatively large RMSE for both univariate and multivariate models. This indicates that the two models are less good than the model for ABAM. The comparison of RMSE out samples for univariate and multivariate models indicated that multivariate model, which is more complex, were not always resulted in better prediction than the simpler model. Often in some studies suggest that more complicated method will yield better accuracy but the reality is not always so. Table 6 reported the forecasting value obtained from ARIMA model for each train.

TABEL 5. THE COMPARISON RMSE OUT OF SAMPLE FOR UNIVARIATE AND MULTIVARIATE

Variable RMSE Out of Sample for ARIMA RMSE Out of Sample for VARIMA ABAP 111.5 133.8 ABAM 91.2 77.5 Sembrani 102.1 120.0

TABEL 6. THE RESULTS OF FORECASTING FOR EACH TRAIN ARIMA ARIMA Dates Dates ABAP ABAM Sembrani ABAP ABAM Sembrani 15-Jan-16 274 372 287 6-Feb-16 253 285 150 16-Jan-16 295 285 213 7-Feb-16 339 432 308 17-Jan-16 382 432 369 8-Feb-16 255 305 167 18-Jan-16 297 305 226 9-Feb-16 197 286 153 19-Jan-16 240 286 198 10-Feb-16 190 322 165 20-Jan-16 232 322 219 11-Feb-16 213 281 162 21-Jan-16 255 281 204 12-Feb-16 218 369 234 22-Jan-16 260 369 269 13-Feb-16 239 281 137 23-Jan-16 281 281 182 14-Feb-16 325 430 297 24-Jan-16 368 430 337 15-Feb-16 241 302 155 25-Jan-16 283 302 198 16-Feb-16 183 283 140 26-Jan-16 225 283 180 17-Feb-16 176 318 152 27-Jan-16 218 318 194 18-Feb-16 199 280 150 28-Jan-16 241 280 188 19-Feb-16 204 366 223 29-Jan-16 246 366 256 20-Feb-16 225 278 124 30-Jan-16 267 278 165 21-Feb-16 311 428 287 31-Jan-16 354 428 320 22-Feb-16 227 299 142 1-Feb-16 269 299 181 23-Feb-16 169 281 127 2-Feb-16 211 281 166 24-Feb-16 162 315 139 3-Feb-16 204 315 178 25-Feb-16 185 278 137 4-Feb-16 227 278 175 26-Feb-16 190 363 212 5-Feb-16 232 372 245 27-Feb-16 211 276 110

V. CONCLUSIONS AND RECOMMENDATIONS The empirical results found that the best model for univariate approach to model number of passenger train are ARIMA (0,1,1)(0,1,1)7 for Argo Bromo Anggrek Pagi, ARIMA([6,7,8],1,2)(0,1,1)7 with outliers detection for Argo Bromo Anggrek Malam, and ARIMA (7,1,1)(0,1,1)7 with outliers detection for Sembrani. The multivariate models appropriate to model these three series is VARIMA (30,1,0) (0,1,0)7. Based on the RMSE value for out of sample data, it can be concluded that the ARIMA model had better prediction for two series whereas the VARIMA model outperformed in one series, i.e. series for Argo Bromo Anggrek Malam.

The empirical results also showed that many outliers were found in the data and influenced the forecast accuracy of both univariate and multivariate models. Hence, more detail about outlier detection can be done for further research. Moreover, non linear time series models such as ANN (Artificial Neural Network) which is flexible to overcome data with outliers also could be considered as a future research for forecasting train passengers in Indonesia.

M - 85

ISBN 978-602-74529-0-9

REFERENCES [1] PT. (2016). www.kereta-api.co.id. Situs Resmi PT. Kereta Api Indonesia (Persero). [2] Andalita, I. (2015). Peramalan Jumlah Penumpang Keret Api Kelas Ekonomi Kertajaya Menggunakan ARIMA dan ANFIS. Tugas Akhir Statitika ITS. Surabaya. [3] Hermawan, N. (2014). Aplikasi Model Neural Network dan Neuron Fuzzy Untuk Peramalan Banyaknya Penumpang Kereta Api Jabotadetabek. Tugas Akhir UNY. Yogyakarta. [4] Rosyiidah, U. (2013). Pemodelan ARIMA Dalam Peramalan Penumpang Kereta Api Pada DAOP IX Jember. Jurnal. Jember. [5] Makridakis, S., & Hibon, M. (2000). The M3-Competition: Results, Conclusions, and Implications. International Journal of Forecasting, 16, 451-476 [6] W. W. Wei. Time Series Analysis (Univariate and Multivariate Methods). United States of America:Pearson Education,Inc. (2006)

M - 86