MODELING MALAYSIAN ROAD ACCIDENTS: THE STRUCTURAL TIME SERIES APPROACH

NOOR WAHIDA BINTI MD JUNUS

UNIVERSITI SAINS MALAYSIA

2018

Thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

January 2018

ACKNOWLEDGEMENT

First and foremost, praise be to Allah the Almighty, who gave me the knowledge, strength and determination to finally finish my thesis even though the journey was so hard. This success could not have been achieved without the guidance and assistance of others.

Therefore, I would like to express my sincere gratitude to my supervisor, Assoc. Prof. Dr. Mohd Tahir Ismail, for his continuous support of my Ph.D. study and related research, and for his patience, motivation and immense knowledge. I would also like to thank my co-supervisor, Dr. Zainudin Arsad, for his insightful comments and encouragement, and for the hard questions that motivated me to widen my research from various perspectives.

A very special gratitude goes to the Ministry of Higher Education and Sultan Idris Education University for funding my study. This gratitude also goes to the Institute of Postgraduate Study and the School of Mathematical Sciences, which supported several conference fees during my study period.

Finally, I am grateful to my family members, especially my mother, who have given me moral and emotional support throughout my life. Last but by no means least, to all my friends, especially everyone in the School of Mathematical Sciences postgraduate lab: it was great sharing the laboratory with all of you during the last four years. Thanks for all your encouragement.

TABLE OF CONTENTS

ACKNOWLEDGEMENT

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF SYMBOLS AND ABBREVIATIONS

ABSTRAK

ABSTRACT

CHAPTER 1 - INTRODUCTION

1.1 Background of the Study

1.2 Motivation and Problem Statements

1.3 Objective

1.4 Contribution of the Study

1.5 Scope of data

1.5.1 Road Accidents

1.5.2 Climate Related Variables

1.5.3 Economic Related Variables

1.5.4 Seasonal Related Variables

1.5.5 Road Safety Related Variables

1.6 Limitation of the Study

1.7 Summary and Thesis Organization

CHAPTER 2 - LITERATURE REVIEW

2.1 Structural Time Series

2.2 Advantages of Structural Time Series

2.3 Application of the Structural Time Series Model

2.3.1 Economics

2.3.2 Meteorology, Ecology and Agriculture

2.3.3 Other Disciplines

2.4 Common Techniques of Road Safety Modeling

2.5 Structural Time Series in Road Safety

2.6 Comparison of Road Accidents Models between This Study with Previous Studies

2.7 Summary

CHAPTER 3 - METHODOLOGY

3.1 Properties of Data

3.1.1 Descriptive Statistics

3.1.2 Time Series Plot

3.1.3 Correlation analysis

3.1.4 Unit Root Test

3.2 Regression Analysis

3.2.1 Time Series Regression

3.2.2 Parameter Estimation and Hypothesis Testing

3.2.3 Diagnostics Checking

3.3 Box and Jenkins Analysis

3.3.1 Box and Jenkins ARIMA Model

3.3.2 Box and Jenkins Model Identification

3.3.3 Box and Jenkins Model Estimation and Validation

3.4 Structural Time Series

3.4.1 Trend Model

3.4.2 Seasonal Model

3.4.3 Incorporating Explanatory and Intervention Variable

3.4.4 State Space Form

3.4.5 Kalman Filter Estimation

3.5 Evaluation of Structural Time Series Model

3.5.1 Model Diagnostic

3.5.2 Goodness-of-fit of Structural Time Series

3.6 Application on Road Accidents

3.7 Summary

CHAPTER 4 - PRELIMINARY STUDY

4.1 Descriptive Statistics

4.1.1 Road Accidents Series

4.1.2 Climate Related Variable

4.1.3 Economic Related Variables

4.2 Time Series Plot

4.2.1 Road Accident Series

4.2.2 Climate Related Variables

4.2.3 Economic Related Variables

4.3 Correlation Analysis

4.3.1 Correlation Analysis for Regions

4.3.2 Correlation Analysis for Individual State

4.3.3 Unit Root Test

4.4 Time Series Regression

4.4.1 Time Series Regression with Seasonal Dummies

4.4.2 Incorporating Explanatory Variables

4.5 Box and Jenkins SARIMA

4.5.1 Estimating SARIMA Model for Regional Road Accidents

4.5.2 Estimating SARIMA Model for Individual States Road Accidents

4.6 Summary

CHAPTER 5 - MODELING UNIVARIATE ROAD ACCIDENTS MODEL

5.1 Model Estimation

5.1.1 Estimating Road Accident Model for Northern Region

5.1.2 Estimating Road Accident Model for Other Regions

5.2 Understanding Estimated Regional Road Accidents Model

5.2.1 Trend Pattern of Regional Road Accidents

5.2.2 Seasonal Pattern for Regional Road Accidents

5.3 Estimating Road Accidents Model for Individual State Level

5.3.1 Northern State Road Accidents Pattern

5.3.2 Road Accidents Pattern for Other States

5.4 Special Features of Seasonal Road Accidents Pattern

5.5 Prediction and Forecasting Performance of the Structural Time Series

5.6 Summary

CHAPTER 6 - INCORPORATING EXPLANATORY AND INTERVENTION VARIABLE OF ROAD ACCIDENTS MODEL

6.1 Estimating and Understanding Regional Road Accidents Model

6.1.1 Error Estimate

6.1.2 Estimation of Trend and Seasonal Component

6.1.3 Estimation of Explanatory Variables

6.1.4 Observing Outliers and Structural Breaks

6.2 Estimating and Understanding Individual State Road Accidents Model

6.2.1 Error Estimate

6.2.1 States Level Road Accidents Pattern with Explanatory Variables

6.2.2 Estimation of Explanatory Variables

6.2.3 Observing Possible Outliers and Structural Breaks

6.3 Prediction Performance of STS with Explanatory Variables

6.4 Summary

CHAPTER 7 - CONCLUSION AND RECOMMENDATION

7.1 Concluding Remarks

7.2 Implications of the Study

7.3 Suggestion for Future Research

REFERENCES

APPENDICES

LIST OF PUBLICATIONS

LIST OF TABLES

Table 1.1: List of variables and units of measurement

Table 1.2: Aggregated regions and their corresponding states

Table 1.3: Location of stations that record climate related variables

Table 1.4: Computation of Aggregation Samples

Table 1.5: An example of BLKG coding

Table 1.6: An example of SAFE coding

Table 2.1: Recent studies on application of structural time series

Table 2.2: Recent methods/models used in road safety study

Table 2.3: Recent studies on the application of structural time series on road safety

Table 2.4: Comparisons between the model used in this thesis and previous models

Table 3.1: Value range of coefficient of correlation

Table 3.2: Durbin-Watson test decision

Table 3.3: Model identification

Table 3.4: Structural time series specification model (trend+seasonal)

Table 4.1: Descriptive statistics of road accidents series

Table 4.2: Descriptive statistics of climate related variables

Table 4.3: Descriptive statistics of the monthly economic related variables

Table 4.4: Correlation coefficient between the number of road accidents for each region and selected dependent variables

Table 4.5: Correlation coefficient between the number of road accidents and selected dependent variables for each state

Table 4.6: Estimated regional road accidents model

Table 4.7: Road accidents model for individual states

Table 4.8: Durbin-Watson test of autocorrelation

Table 4.9: Variance inflation factor for regions

Table 4.10: Variance inflation factor for individual states

Table 4.11: List of possible outlier observations

Table 4.12: Estimated regional road accidents model by incorporating explanatory variables

Table 4.13: Estimated individual states’ road accidents model by incorporating explanatory variables

Table 4.14: Estimated regional road accidents model based on Box and Jenkins SARIMA models

Table 4.15: Estimated road accidents model for individual states based on Box and Jenkins SARIMA models

Table 5.1: Estimated results and performance criteria of STS model for northern region road accidents

Table 5.2: Estimated results and performance criteria of STS model for central region road accidents

Table 5.3: Estimated results and performance criteria of STS model for east coast region road accidents

Table 5.4: Estimated results and performance criteria of STS model for southern region road accidents

Table 5.5: Estimated results and performance criteria of STS model for Borneo region road accidents

Table 5.6: Final estimation results according to regions

Table 5.7: Best road accident model specification for each individual state

Table 5.8: Final estimation result of northern state road accident model

Table 5.9: Final estimation result of central and southern state road accident model

Table 5.10: Final estimation results of east coast and Borneo states road accident models

Table 5.11: Special feature of seasonality of road accidents pattern

Table 5.12: Error values for prediction road accidents models

Table 5.13: Error values for forecasting road accidents models

Table 6.1: Estimation of regional road accidents after adding explanatory variables

Table 6.2: Final road accidents model estimates with explanatory variables

Table 6.3: Estimation of state level road accidents with explanatory variables

Table 6.4: Final estimate of road accidents model with the explanatory variables

Table 6.5: Error values for prediction road accidents models with explanatory variables

LIST OF FIGURES

Figure 1.1: An illustration of the process of determining g1 and g2

Figure 3.1: Box and Jenkins methodology

Figure 3.2: Box and Jenkins model identification process

Figure 3.3: Step by step procedure for designing the road accidents model

Figure 4.1: Monthly time series plot of road accidents for all regions

Figure 4.2: Monthly time series plot of road accidents for individual states

Figure 4.3: Monthly time series plot of amount of rainfall for all regions

Figure 4.4: Monthly time series plot of amount of rainfall for individual states

Figure 4.5: Monthly time series plot of number of rainy days for all regions

Figure 4.6: Monthly time series plot of number of rainy days for individual states

Figure 4.7: Monthly time series plot of maximum temperature for all regions

Figure 4.8: Monthly time series plot of maximum temperature for individual states

Figure 4.9: Regional time series plot for monthly maximum API

Figure 4.10: Time series plot of monthly maximum API for individual states

Figure 4.11: Monthly time series plot for economic effect

Figure 5.1: Seasonal components of northern region road accidents

Figure 5.2: Seasonal components of southern region road accidents

Figure 5.3: Trend components according to regions

Figure 5.4: Seasonal components according to regions

Figure 5.5: Trend components of road accidents for northern states

Figure 5.6: Seasonal component for northern states

Figure 5.7: Trend components of road accidents in Negeri Sembilan, Melaka and Johor

Figure 5.8: Seasonal component for southern states

Figure 5.9: Trend components of road accidents for central states

Figure 5.10: Seasonal components of road accidents for central states

Figure 5.11: Trend components of road accidents for east coast states

Figure 5.12: Seasonal components of road accidents in east coast states

Figure 5.13: Trend components of road accidents for Borneo states

Figure 5.14: Seasonal components of road accidents for Borneo states

Figure 5.15: Real and estimated states road accidents produced by TSR, SARIMA and STS model

Figure 5.16: Real and estimated states road accidents produced by TSR, SARIMA and STS model

Figure 6.1: Auxiliary residual of regional road accidents model

Figure 6.2: Trend components with explanatory and intervention variable according to regions

Figure 6.3: Seasonal components according to regions

Figure 6.4: Trend pattern of state level road accidents

Figure 6.5: Seasonal pattern road accidents model for individual states

Figure 6.6: Real and estimated regional road accidents produced by TSR and STS models

Figure 6.7: Real and estimated states road accidents values produced by TSR and STS models

LIST OF SYMBOLS AND ABBREVIATIONS

Yt Dependent / response variable

µt Trend component

γt Seasonal component

εt Irregular component / observation error / disturbance

β Regression coefficient

ηt Level error / disturbance

ςt Slope error / disturbance

vt Slope component

ωt Seasonal error / disturbance

t Time

α State component

c Constant

Ȳ Mean of Y observations

θ Moving average parameter

φ Autoregressive parameter

d Order of differencing

r Correlation coefficient

R² Coefficient of determination

Zt, Tt, Rt, Ht, Qt System matrices

Ft Variance of 1-step ahead prediction error

vt 1-step ahead prediction error

I Dummy / intervention variable

W Non-seasonal differencing function

Z Seasonal differencing function

s Seasonal period

σ² Variance

g Number of holidays

m Total number of holidays

P Order of seasonal autoregressive

D Order of seasonal differencing

Q Order of seasonal moving average

n Sample size

tr Test of significance of correlation coefficient

tβ Test of significance of regression coefficient

X Explanatory variable

F0 F-statistic

dL Lower limit of Durbin-Watson statistic

dU Upper limit of Durbin-Watson statistic

bi Standardized residuals

k Number of lags

κ Number of estimated parameters

E( ) Expected value

Var( ) Variance

cov( ) Covariance

∆ Differencing process

Wt Non-seasonal differencing function

Zt Seasonal differencing function

It Intervention variable

λ Intervention coefficient

ρk Autocorrelation function

τ State component error matrix

vt Prediction error matrix

Ft Variance of prediction error matrix

e Exponent

≈ Approximate

Ht Variance of measurement error matrix

Qt Variance of state component error matrix

AADK National Anti-Drugs Agency

ACF Autocorrelation function

ADF Augmented Dickey Fuller

AIC Akaike information criterion

ANN Artificial neural network

API Air pollution index

AR Autoregressive

ARIMA Autoregressive integrated moving average

ARMA Autoregressive moving average

ASEAN Association of Southeast Asian Nations

BIC Bayesian information criterion

BLKG Balik Kampung

BSM Basic structural model

CNY

CO2 Carbon dioxide

CPI Consumer price index for transportation

CUSUM Cumulative sum control chart

DL Deterministic level

DLDS Deterministic level with deterministic seasonal

DLSS Deterministic level with stochastic seasonal

DOS Department of Statistics

DTDS Deterministic trend with deterministic seasonal

DTSS Deterministic trend with stochastic seasonal

dw Durbin-Watson

EM Expectation-maximization

FENB Fixed effect negative binomial

FEP Fixed effect Poisson

GDP Gross domestic product

GLM Generalized Linear Model

GQ Goldfeld-Quandt test

I Integrated

INAR Integer autoregressive

JB Jarque Bera test

JPJ Road Transport Department

KILL Killed

KSI Killed and seriously injured

LB Ljung-Box test

LDDS Local level drift with deterministic seasonal

LDSS Local level drift with stochastic seasonal

LL Local level

LLDS Local level deterministic seasonal

LLSS Local level stochastic seasonal

LRT Latent risk time series

LTDS Linear trend deterministic seasonal

LTSS Linear trend with stochastic seasonal

MA Moving average

MAAP Microcomputer Accident Analysis Package

MAPE Mean absolute percentage error

Max Maximum

Min Minimum

MSE Mean square error

MSP Motorcycle Safety Programme

NA Not applicable

NB Negative binomial

NO2 Nitrogen dioxide

O3 Ozone

OECD Organisation for Economic Co-operation and Development

OILP Crude oil price

OLS Ordinary least square regression

p Order of autoregressive

PACF Partial autocorrelation function

PCR Principal component regression

PM10 Particulate matter less than 10 microns

q Order of moving average

RAIND Number of rainy day

RAINF Monthly average of rainfall amount

RENB Random effect negative binomial

RMP

RMSE Root mean square error

SAFE Road safety operation (Ops Sikap)

SAR Seasonal autoregressive

SARIMA Seasonal autoregressive integrated moving average

SARMA Seasonal autoregressive moving average

SD Standard deviation

SMA Seasonal moving average

SO2 Sulphur dioxide

SPAD Land Public Transport Commission

STDS Smooth trend with deterministic seasonal

STS Structural time series

STSS Smooth trend with stochastic seasonal

SUTSE Seemingly Unrelated Time Series Equations

SWOV Dutch Foundation of Road Safety Research

TEMP Temperature

TSR Time series regression

UPM Universiti Putra Malaysia

US United States of America

USM Universiti Sains Malaysia

VIF Variance inflation factor

WHO World Health Organization

WN White noise

MODELING ROAD ACCIDENTS IN MALAYSIA: THE STRUCTURAL TIME SERIES APPROACH

ABSTRAK

Modeling the number of road accidents has become a common topic in recent years. Several related studies have been carried out with the aim of finding the best model for predicting road accidents more accurately. However, the trend and seasonal patterns of road accidents have rarely been emphasized. Estimating the trend and seasonal patterns indirectly improves the forecasting system. Traditionally, trend and seasonal patterns are estimated using the decomposition method. However, this method produces less accurate forecasts and cannot reflect the actual situation. Therefore, the structural time series (STS) approach is proposed to model the trend and seasonal patterns of road accidents, because the STS approach allows direct interpretation and offers time series components that vary over time. In this study, a road accidents model is developed using the STS approach. Through this method, the trend and seasonal patterns of road accidents can be observed. The study covers 5 main regions and all 14 states in Malaysia. It also investigates the influences on road accidents using suitable explanatory variables. Eight explanatory variables were selected, comprising four climate variables, two economic variables, a seasonal variable and a road safety related variable. The effectiveness of the model in predicting and forecasting future accidents is compared with existing models, namely the time series regression (TSR) model and the seasonal autoregressive integrated moving average (SARIMA) model. The study found that the trend and seasonal patterns of road accidents differ by location. The number of road accidents is estimated to increase during festive seasons, especially in less developed states. In addition, special features of the stochastic behaviour of road accidents can be observed. During the study period, the pattern of road accidents fluctuated up and down. At the same time, the influences on road accidents also differ by location. In terms of forecasting performance, STS shows reliable forecasts compared with TSR and SARIMA.

MODELING MALAYSIAN ROAD ACCIDENTS: THE STRUCTURAL TIME SERIES APPROACH

ABSTRACT

Modeling the number of road accidents has been a common topic in recent years. A number of studies have been carried out with the aim of finding the model that gives the best prediction. However, statistical patterns of road accidents such as trend and seasonality are rarely examined. Estimating the trend and seasonal patterns indirectly improves the prediction system. Traditionally, trend and seasonal patterns are estimated using the decomposition method. Yet this type of estimation gives less accurate predictions because the estimates are deterministic. Therefore, the structural time series (STS) approach is proposed to model the trend and seasonal patterns of road accidents. The STS approach offers a direct interpretation and allows time series components, including trend and seasonal, to vary over time. In this thesis, a road accidents model is developed using the STS approach with the aim of observing the trend and seasonal patterns of road accidents. The analysis covers all 5 main regions and 14 states in Malaysia. The study further investigates the influences on road accidents at different locations using appropriate explanatory variables. Eight explanatory variables are considered: four climate variables, two economic variables, a seasonal related variable and a safety related variable. The effectiveness of the model is measured by comparing its prediction and forecasting performance with that of time series regression (TSR) and seasonal autoregressive integrated moving average (SARIMA) models. The study found that the trend and seasonal patterns of road accidents vary across locations. The number of accidents was estimated to be higher during festive seasons, especially in less developed states. In addition, special features of the stochastic behavior of road accident patterns were observed. During the study period, the pattern of road accidents fluctuated between increasing and decreasing. Similarly, the influences on road accidents also varied across locations. In terms of prediction and forecasting performance, STS gave more reliable results than the TSR and SARIMA models.

CHAPTER 1

INTRODUCTION

This chapter begins with the background of the study followed by the motivation of the thesis and proceeds with the objective, contribution of the study to the knowledge and society as well as the scope and limitation of the study. The summary which discusses the structure of the thesis will be presented at the end of this chapter.

1.1 Background of the Study

One of the aims of a developed country is to enhance the survival rate of its population by improving the community’s healthcare and quality of life. To assess this, it is important to know the exact number and causes of deaths as components of the population’s health status. These figures are also important for socio-economic planning and monitoring, and they can serve as good evidence for policy making and implementation.

Across all countries, road accidents are one of the leading causes of mortality. Aderamo (2012a) revealed that developing countries account for 85 percent of the world’s road accident deaths. Meanwhile, the World Health Organization (WHO) reported in 2014 that road accidents were the ninth leading cause of mortality, with 1.3 million deaths, and in 2013 they were also the fourth leading cause of death in the United States. In Malaysia, the Department of Statistics (DOS) reported that in 2013 transport accidents were the fifth leading cause of mortality among the Malaysian population and the second leading cause among the Malaysian male population.

Deaths from road crashes, also known as road fatalities, have a large impact on economic growth and at the same time affect the victims’ families emotionally. In 2004, WHO reported that in Bangladesh over 70% of households stated that their household income, food consumption and food production had decreased after a road death occurred to one of their family members.

Therefore, a safe road traffic network is very important for facilitating the movement of goods, as well as for improving community health by reducing road deaths. The key is to reduce traffic accidents, which are the main contributor to road fatalities. Various factors contribute to road accidents. They can be categorized into driver factors, vehicle factors and roadway factors (Bun, 2012).

Driver factors include all factors related to drivers and other road users, such as driver behaviour, visual acuity, clarity of hearing and reaction speed. Vehicle factors include vehicle design, safety maintenance and safety features that may reduce the occurrence of accidents. On the other hand, meteorological or climate conditions such as temperature, precipitation, wind speed and fog are also important contributing factors to road accidents, as they reduce visibility and cause loss of vehicle control.

Various efforts have been made to reduce the number of road accidents. In Malaysia specifically, these efforts date from early 1970, when the first motorcycle lane was built along the Federal Highway with the aim of reducing motorcycle accidents. A study by Radin Umar et al. (1996) found that this intervention reduced motorcycle accidents by 34%. In 1989, the Road Commissions Safety Cabinet was formed, with responsibility for formulating a national road safety target. In the following year, the Microcomputer Accident Analysis Package (MAAP) was introduced. The package enables Malaysia to carry out black spot analysis and apply the necessary treatment to the affected areas.

In 1996, the Malaysian government established a five-year National Road Safety Target. The target was to reduce the number of accident deaths by 30% by the year 2000. Various initiatives were carried out to achieve the target. In 1997, as one of these initiatives, the road safety research centre under Universiti Putra Malaysia (UPM) was mandated to conduct research on motorcycle safety. In 2000, the number of reported accident deaths was 6035, which is 5% lower than the 6389 deaths predicted by Radin Umar (1998).

In the Malaysian road safety plan (2006-2010), the government targeted a 52.4% reduction in road deaths by 2010. Among the initiatives to achieve the target was the enforcement of Ops Sikap, which has been in place since 2001. This operation was conducted to ensure safety on all roads in Malaysia during festive seasons. It was followed by the introduction of rear seat belt legislation in 2009. However, in 2010 the road death index stood at 3.4 per 10000 vehicles, higher than the expected 2.0 per 10000 vehicles (Sarani et al., 2012). This is a relatively poor performance, and it places Malaysia among the developing countries with the highest number of road fatalities per 100000 population among the ASEAN countries (Abdul Manan & Várhelyi, 2012).

1.2 Motivation and Problem Statements

As discussed above, Malaysia needs strong road safety analysis. Over the past few years, a number of studies on road safety have therefore been carried out. These studies aim to investigate the factors that contribute to road accidents and to identify the most accurate methods of predicting road accidents. Numerical modeling is a common tool for estimating the number of road accidents. The model can be either deterministic or probabilistic (stochastic). However, some of the studies give poor prediction results, especially in terms of error structure. Sometimes, the studies produced models which either gave accurate predictions without explaining the phenomenon or could describe the phenomenon without being able to explain or predict it (Hakim, 1991).

Models which describe the main features of the series may give better predictions. These features can be examined from the trend pattern and the seasonal behaviour of the series. Trend and seasonal analysis is best carried out by means of unobserved components or structural time series models (Harvey, 2006b). Unfortunately, road safety studies that focus on these features are very rare, especially in Malaysia. Existing studies usually focus on cross-sectional analysis and the effectiveness of intervention procedures. Therefore, a model which can describe these valuable features and at the same time assess the effectiveness of intervention procedures may have a great impact on improving road safety.

On the other hand, the variables used in road safety studies may not be suitably defined, especially dummy variables in time series analysis. For example, the study by Radin Umar et al. (1996) incorporated a moving holiday effect describing festival holidays. They used a dummy variable to represent this event and named it Balik Kampung (BLKG). It is coded as “0” for a non-BLKG season and “1” for a BLKG season. In their case the variable is quite relevant, since the study used weekly data. However, if the study involves a monthly, quarterly or annual series, a “0” and “1” dummy variable is not suitable, as the event occupies only part of each unit of data.
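To illustrate why a 0/1 dummy is too coarse for monthly data, the sketch below assigns each month the share g/m of a festive window that falls within it, where g is the number of window days in that month and m is the total length of the window. This is only one plausible reading of the symbols g and m in the list of symbols, not necessarily the coding used later in the thesis; the function name and the dates are hypothetical.

```python
from datetime import date, timedelta


def holiday_share_by_month(start, end):
    """Split a holiday window [start, end] across calendar months.

    Returns {(year, month): g / m}, where g is the number of window days
    falling in that month and m is the total number of days in the window.
    """
    m = (end - start).days + 1  # total length of the window, in days
    counts = {}
    day = start
    while day <= end:
        key = (day.year, day.month)
        counts[key] = counts.get(key, 0) + 1  # accumulate g for this month
        day += timedelta(days=1)
    return {key: g / m for key, g in counts.items()}


# Hypothetical 10-day festive window that straddles two months.
shares = holiday_share_by_month(date(2011, 8, 28), date(2011, 9, 6))
for (year, month), share in sorted(shares.items()):
    print(f"{year}-{month:02d}: {share:.1f}")
```

With a 0/1 dummy both months would receive the same value of 1, whereas the fractional coding records that four of the ten days fall in the first month and six in the second.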

Recent studies on road safety have focused on either regional or population-specific aspects. It was found that road safety behaviour in larger populations is riskier than in smaller populations (Houston, 2007). Yet studies that compare states or regions are very limited. To our knowledge, in Malaysia only Wan Yaacob et al. (2012) compared the number of road accidents between states. However, their study was based on panel data analysis, a method that is somewhat restricted by the limited number of observations.

1.3 Objective

The main objective of this thesis is to model the number of road accidents in Malaysia using the structural time series approach. Indirectly, developing this model allows the stochastic behavior or pattern of road accidents to be observed. This study will observe and compare the variation of trends and patterns of road accidents during the study period, which runs from January 2001 to December 2013.

To obtain a better understanding of the trend and seasonal patterns, the model is applied to aggregated datasets covering five main regions and 14 states of Malaysia. The five main regions are the northern, southern, central, east coast and Borneo regions. The aim is to allow the investigation of pattern changes across different regions and states.

After the trends and seasonal patterns have been observed, it is important to investigate the main contributors to these changes. In order to do that the explanatory variables which may explain the changes are incorporated in the model. The variables include climate related variables, economic related variables, rules and regulations enforced during the study period as well as seasonality related variables.

Scott (1986) found that, besides allowing controllable explanatory variables to be identified, incorporating explanatory variables indirectly creates a greater understanding of what “drives” the series and produces its fluctuations, and provides a basis against which to evaluate further changes imposed on safety enforcement.

Modeling and predicting road accidents has been common practice among researchers in recent years, and many models have been introduced for this purpose. One of the best-known approaches since the seventies is the Box and Jenkins SARIMA model. Thus, the study will compare the forecasting performance of the univariate structural time series model with that of the Box and Jenkins SARIMA model. At the same time, as the starting point of structural time series is a regression model in which the explanatory variables are functions of time (Harvey, 1989), the prediction and forecasting performance of the two methodologies is also compared for models both with and without explanatory variables. The objectives of this study can be summarized as follows:

i. To propose an alternative road accidents model for each state in Malaysia by using the structural time series approach.

ii. To observe the deterministic and stochastic behaviour or pattern of road accidents for different regions and states.

iii. To investigate and understand the influences on road accidents for different regions and states using the right explanatory variables.

iv. To compare the performance of the structural time series model with time series regression and the seasonal autoregressive integrated moving average model.

1.4 Contribution of the Study

Road safety is not a new area of interest; it has been studied by many researchers for a long time. The most common approach is the cross-sectional model. However, cross-sectional data and their analysis provide only a frozen snapshot of the road safety situation at a fixed point in time (Stipdonk, 2008), so changes in risk and exposure over time cannot be observed. A more suitable approach is therefore to consider time series data and their appropriate analysis. Time series methods allow the investigation of changes in exposure and risk, and hence in road safety, over time. In other words, they may provide estimates of road safety that can help policy makers develop realistic quantitative safety targets.

Various time series techniques can be used to model road accident occurrence, and the Box and Jenkins model is among those most commonly preferred by researchers. In this study, however, the structural time series model is introduced to develop a road accidents model for Malaysia because it offers a number of advantages. This is the first study to apply this approach to the Malaysian case. The Kalman filter is used to estimate the model parameters.


Through this model, time series components such as the trend and seasonal components are extracted and modeled, so that the stochastic and deterministic behaviour of the trends and seasonal patterns can be observed and interpreted. In addition, the estimated unobserved components are important in giving a clear indication of the future long term movement of the series. Indirectly, the model may strengthen road safety modeling in the future.

The best model with relevant explanatory variables may give a better understanding of road accident occurrence. In this study, an appropriate way of incorporating festive seasons and safety operation enforcement into the model is introduced. This approach replaces the common procedure of representing those variables with dummy variables coded “0” and “1”. It reflects the actual situation more closely and is expected to improve time series modeling of road safety.

Applied for the first time to Malaysian road accidents, this study is expected to benefit society as well as the relevant parties. The road accidents model is developed for regions and individual states, whereas existing studies cover only a relatively small number of countries. Therefore, the proposed model may help society and the responsible parties to monitor road safety on a smaller scale, focused on regions and individual states.

1.5 Scope of Data

The main restriction in developing a road accidents model is the suitability and availability of data. Some of the data are not available for the whole study period and some contain missing values. The data are handled with extra care, and the handling procedure is explained in detail in the appropriate subsections. The variables considered in this thesis include the number of road accidents as the dependent (output) variable; the independent variables consist of climate related variables, economic related variables, seasonal related variables, and the rules and regulations enforced during the study period. The variables used in this study are listed in Table 1.1.

Table 1.1: List of variables and units of measurement

Variable  Description                                     Unit of measurement
RA        Monthly number of road accidents                Log of RA
RAINF     Monthly amount of rainfall                      Millimetre (mm)
RAIND     Monthly number of rainy days                    Day
TEMP      Monthly average of maximum temperature          Degree Celsius (°C)
API       Monthly average of maximum air pollution index  Index
CPI       Consumer price index for transportation         Index
OILP      Crude oil price                                 Ringgit Malaysia (RM)
BLKG      Balik Kampung culture                           Weight variable
SAFE      Operation of Ops Sikap and Ops Selamat          Weight variable

1.5.1 Road Accidents

The majority of road safety studies have employed the number of injuries, the number of casualties or the frequency of road accidents as the variable of interest. In this study, the monthly number of road accidents in each state is the dependent variable. The number of road accidents was obtained from the Royal Malaysia Police (RMP), which defines road accidents as follows:


“The occurrence of accidents on public or private roads due to negligence or omission by any party concerned (on the aspects of road users conduct, maintenance of vehicle and road condition) or due to environmental factors (excluding natural disaster) resulting in collision (including out of control cases and collision or victim in vehicle against object inside or outside the vehicle eg: bus passenger) which involved at least one moving vehicle, structure or animal and is recorded by the police”

The recorded number of road accidents covers all 14 states in Malaysia. In this study, the number of road accidents is further aggregated into five main regions. The aggregated regions and their corresponding states are defined in Table 1.2. Throughout the study, each variable is also aggregated by region, and the analysis is performed for the respective regions and states.

Table 1.2: Aggregated regions and their corresponding states

Region      States
Northern    Penang, Perlis, Kedah and Perak
Southern    Negeri Sembilan, Melaka and Johor
Central     Kuala Lumpur and Selangor
East Coast  Kelantan, Terengganu and Pahang
Borneo      Sabah and Sarawak

1.5.2 Climate Related Variables

Weather variations have some influence on road conditions and road users. A hot day with high temperatures may affect the mood of drivers. Heavy rain and haze may impair drivers’ vision, and heavy rain also makes the road wet and slippery. These conditions may affect road safety. Climate variables are therefore considered among the factors that contribute to road accidents.

The climate factors considered in this study are the monthly average of rainfall amount in millimetres (RAINF), the number of rainy days (RAIND), the monthly maximum temperature in degrees Celsius (TEMP), and the air pollution index (API).


The majority of the data were taken from the Monthly Statistical Bulletin and the Compendium of Environmental Statistics, both published by the DOS, while other data were obtained from the Department of Meteorology, the main body responsible for compiling environmental data in Malaysia.

A day was counted as a rainy day if the recorded amount of rainfall was equal to or exceeded 0.1 mm. The API was calculated from the average concentration of each air pollutant, namely SO2, NO2, CO, O3 and PM10, and the pollutant with the highest concentration determines the API. Typically, the concentration of particulate matter (PM10) is the highest among the pollutants and therefore determines the API. The API is categorized as good if the index is between 0 and 50, moderate if between 51 and 100, unhealthy if between 101 and 200, very unhealthy if between 201 and 300, and hazardous if above 300. However, API data are limited for the states of Selangor and Perlis: for both states the data cover only the period from January 2004 to December 2013. The details of the climate related variables incorporated in this study, together with the stations that collected the data, are tabulated in Table 1.3. Because the series in this study are aggregated into regions, the climate related variables for the regions are computed as described in Table 1.4.
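The API bands described above can be written as a simple lookup. The following Python sketch is only an illustration; the function name `api_category` is hypothetical, and the treatment of non-integer values (for example, 50.5 is classed as moderate) is an assumption, since the bands in the text are stated for whole numbers.

```python
def api_category(index):
    """Return the API band for a non-negative index value.

    Band limits follow the text: 0-50 good, 51-100 moderate,
    101-200 unhealthy, 201-300 very unhealthy, above 300 hazardous.
    Non-integer values are assigned to the next band up (an assumption).
    """
    if index < 0:
        raise ValueError("API cannot be negative")
    if index <= 50:
        return "good"
    if index <= 100:
        return "moderate"
    if index <= 200:
        return "unhealthy"
    if index <= 300:
        return "very unhealthy"
    return "hazardous"
```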

Similar variables, such as the amount of rainfall, the number of rainy days and temperature, have been used in the road safety modeling literature, for example by Scott (1986), Keay and Simmonds (2006), Brijs et al. (2008) and Wan Yaacob et al. (2011a, 2012). These factors were found to have some influence on road accident occurrence. In 2012, the Dutch Foundation of Road Safety Research (SWOV) stated that visibility can be reduced to 50 meters during heavy rain as well as during snow and thick fog. In addition, extreme temperatures tend to have harmful effects on driver performance, road infrastructure and vehicle components.

Table 1.3: Location of stations that record climate related variables

State            RAINF and RAIND                                      TEMP                           API
Penang           Bayan Lepas/Butterworth                              Bayan Lepas                    Prai, USM
Perlis           Chuping                                              Chuping/Kangar                 Kangar
Kedah            Alor Setar, Langkawi                                 Alor Setar                     Alor Setar
Perak            Ipoh, K. Kangsar, Sitiawan                           Ipoh/Sitiawan                  Tanjong Malim, Ipoh
Negeri Sembilan  Seremban                                             Seremban                       Seremban
Melaka           Bandaraya Melaka                                     Bandaraya Melaka               Bandaraya Melaka
Johor            Batu Pahat, Kluang, Mersing                          Senai, Mersing                 Johor Bahru
Kuala Lumpur     Parlimen                                             Kuala Lumpur                   Batu Muda
Selangor         Sepang, Petaling Jaya, Subang                        Sepang, Petaling Jaya, Subang  Shah Alam
Kelantan         K. Bharu, K. Krai                                    Kota Bharu                     Kota Bharu
Terengganu       K. Terengganu                                        Kuala Terengganu               Kuala Terengganu
Pahang           Jerantut, Cameron Highlands, Muadzam Shah, Temerloh  Kuantan                        Kuantan
Sabah            Kota Kinabalu                                        Kota Kinabalu                  Kota Kinabalu
Sarawak          Kuching                                              Kuching                        Kuching

Table 1.4: Computation of aggregated samples for climate related variables

Variable  Computation
RAINF     Total amount of rainfall over the states in the region
RAIND     Average number of rainy days over the states in the region
TEMP      Average of maximum temperature over the states in the region
API       Average of maximum air pollution index over the states in the region
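The aggregation rules in Table 1.4 can be illustrated with a short Python sketch. The names `REGIONS` and `aggregate_region` are hypothetical and the numbers in the usage note are made up for illustration; only the rule itself (sum for RAINF, simple average for the other three variables) and the region membership of Table 1.2 come from the text.

```python
# Region membership as listed in Table 1.2.
REGIONS = {
    "Northern": ["Penang", "Perlis", "Kedah", "Perak"],
    "Southern": ["Negeri Sembilan", "Melaka", "Johor"],
    "Central": ["Kuala Lumpur", "Selangor"],
    "East Coast": ["Kelantan", "Terengganu", "Pahang"],
    "Borneo": ["Sabah", "Sarawak"],
}


def aggregate_region(variable, region, state_values):
    """Aggregate one month of state-level values to a region.

    state_values maps state name -> value of `variable` for that month.
    RAINF is summed over the states; RAIND, TEMP and API are averaged.
    """
    values = [state_values[state] for state in REGIONS[region]]
    if variable == "RAINF":
        return sum(values)
    if variable in ("RAIND", "TEMP", "API"):
        return sum(values) / len(values)
    raise ValueError(f"no aggregation rule for {variable!r}")
```

For instance, with illustrative rainfall of 200.0 mm in Kuala Lumpur and 150.0 mm in Selangor, `aggregate_region("RAINF", "Central", ...)` returns their sum, whereas the same call with `"TEMP"` would return the mean of the two state values.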


Unfortunately, some of the climate related variables contain missing values due to technical errors. Missing values are observed in the amount of rainfall, temperature and air pollution index for selected states. To handle these missing values, this study used the linear interpolation method suggested by Law et al. (2008). Interpolation is normally done only for short gaps, by averaging the observations over the preceding and following periods. However, because the missing values in this study span long periods, they are handled on an annual basis: the preceding and following values are taken from the same month in the adjacent years. For example, if the value for January 2005 is missing, the preceding value is that of January 2004 and the following value is that of January 2006.
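The annual-basis interpolation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually coded for the thesis: the function name `fill_from_adjacent_years` is hypothetical, missing values are represented by `None`, and a gap is left unfilled when either neighbouring year is itself missing.

```python
def fill_from_adjacent_years(series):
    """Fill missing monthly values with the mean of the same month
    in the preceding and following year.

    series maps (year, month) -> value, with None marking a missing value.
    Gaps whose neighbours are unavailable are left as None (an assumption).
    """
    filled = dict(series)
    for (year, month), value in series.items():
        if value is None:
            previous = series.get((year - 1, month))
            following = series.get((year + 1, month))
            if previous is not None and following is not None:
                filled[(year, month)] = (previous + following) / 2
    return filled
```

With illustrative values of 100.0 for January 2004 and 120.0 for January 2006, a missing January 2005 would be filled with 110.0.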

1.5.3 Economic Related Variables

Numerous economic related variables could be incorporated in the study; however, their influence on accident data may be indirect, acting through changes in the characteristics of traffic and road environments (Scott, 1986). The economic related variables considered in this study are the crude oil price in Malaysian Ringgit per barrel (OILP) and the Consumer Price Index for transport (CPI). OILP was obtained from the World Bank website. It is calculated as the simple average of three spot prices: Dated Brent, West Texas Intermediate and Dubai Fateh.

CPI is computed based on the purchase of vehicles, the operation of personal transport equipment (including spare parts, accessories and lubricants) and transport services. The data for this variable were gathered from the Monthly Statistical Bulletin provided by DOS. Both economic related variables are used as explanatory variables in this study to test whether they really influence road accident frequency.

1.5.4 Seasonal Related Variables

Festival celebrations usually cause more road accidents to occur. This is because traffic suddenly becomes heavier as citizens return to their hometowns (known as Balik Kampung) to visit their relatives during the festivals. Festivals such as Chinese New Year, Eid-ul-Fitr and Deepavali are determined by the lunar calendar, so the dates of these celebrations are not fixed and change from year to year. Radin Umar et al. (1996) incorporated a similar variable in measuring the effect of festival celebrations on motorcycle accidents. They used a dummy variable to represent this event and named it Balik Kampung (BLKG). It is coded “0” for a non-BLKG season and “1” for a BLKG season. That coding is sensible in their study because it involved weekly data.

However, the festival holidays that BLKG represents are not absorbed by monthly dummies. Therefore, this study applies a weight variable for moving holidays, as in Shuja et al. (2007). A survey of 350 respondents found that the number of days off usually taken was 7 days for Eid-ul-Fitr (2 days before the festival and 5 days during and after it), 8 days for Chinese New Year (2 days before the festival and 6 days during and after it) and 4 days for Deepavali (1 day before the festival and 3 days during and after it). In this study, the BLKG variable considers only these three main festivals. It is coded according to the expressions below, and an example of the coding is given in Table 1.5.


Case 1: If the date of the festival falls in the first half of the month (1st-15th), the weight is defined as follows:

\[
\mathrm{BLKG}_1 =
\begin{cases}
g_1 / m & \text{in the respective festive month} \\
g_2 / m & \text{in the month before the respective month} \\
0 & \text{otherwise}
\end{cases}
\]

where $g_1$ is the number of holidays that fall in the respective month, $g_2$ is the number of holidays that fall before the respective month, and $m$ is the total number of holidays ($m = 7$ for Eid-ul-Fitr, $m = 8$ for Chinese New Year and $m = 4$ for Deepavali).

Case 2: If the date of the festival falls in the second half of the month (16th-31st), the weight is defined as follows:

\[
\mathrm{BLKG}_2 =
\begin{cases}
g_1 / m & \text{in the respective festive month} \\
g_2 / m & \text{in the month after the respective month} \\
0 & \text{otherwise}
\end{cases}
\]

where $g_1$ is the number of holidays that fall in the respective month, $g_2$ is the number of holidays that fall after the respective month, and $m$ is the total number of holidays ($m = 7$ for Eid-ul-Fitr, $m = 8$ for Chinese New Year and $m = 4$ for Deepavali).


Table 1.5: An example of BLKG coding (a dash marks a month with no festival date or ratio)

Year  Month  Festival          Date of festival  Ratio  BLKG
2004  1      Chinese New Year  22 Jan            1      1.00
2004  2      -                 -                 -      0.00
2004  10     -                 -                 -      0.00
2004  11     Deepavali         12 Nov            1      2.00
             Eid-ul-Fitr       14 Nov            1
2005  1      -                 -                 -      0.00
2005  2      Chinese New Year  9 Feb             1      1.00
2005  9      -                 -                 -      0.00
2005  10     -                 -                 1/4    0.25
2005  11     Deepavali         1 Nov             3/4    1.75
             Eid-ul-Fitr       4 Nov             1
2006  1      Chinese New Year  29 Jan            5/8    0.63
2006  2      -                 -                 3/8    0.37
2006  3      -                 -                 -      0.00
2006  10     Deepavali         21 Oct            1      2.00
             Eid-ul-Fitr       24 Oct            1

For example, in 2006 Chinese New Year fell on 29 January, so g1 = 5 and g2 = 3. Figure 1.1 illustrates how g1 and g2 are determined, as suggested by Shuja et al. (2007).

Figure 1.1: An illustration of process determining g1 and g2
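The two cases of the BLKG coding amount to counting how many days of a festival's holiday window fall in each calendar month and dividing by m. The Python sketch below illustrates that reading; it is not the implementation used in the thesis. The names `HOLIDAY_WINDOWS`, `blkg_weights` and `monthly_blkg` are hypothetical, and the sketch assumes that the holiday window is contiguous and that the "during and after" days start on the festival date itself.

```python
from collections import defaultdict
from datetime import date, timedelta

# (days before the festival, days during and after it), from the survey
# figures quoted in the text; m is the sum of the two numbers.
HOLIDAY_WINDOWS = {
    "Eid-ul-Fitr": (2, 5),
    "Chinese New Year": (2, 6),
    "Deepavali": (1, 3),
}


def blkg_weights(festival, festival_date):
    """Return {(year, month): share of the m holiday days in that month}."""
    days_before, days_during_after = HOLIDAY_WINDOWS[festival]
    m = days_before + days_during_after
    first_day = festival_date - timedelta(days=days_before)
    counts = defaultdict(int)
    for offset in range(m):
        day = first_day + timedelta(days=offset)
        counts[(day.year, day.month)] += 1
    return {month: count / m for month, count in counts.items()}


def monthly_blkg(festivals):
    """Sum the weights of several (festival, date) pairs by month."""
    totals = defaultdict(float)
    for festival, festival_date in festivals:
        for month, weight in blkg_weights(festival, festival_date).items():
            totals[month] += weight
    return dict(totals)
```

For Chinese New Year on 29 January 2006 the sketch gives 5/8 for January and 3/8 for February. For Deepavali on 1 November 2005 together with Eid-ul-Fitr on 4 November 2005 it gives 0.25 for October and 1.75 for November, in line with the entries of Table 1.5.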


1.5.5 Road Safety Related Variables

Other data considered include the road safety related variable, namely the road safety enforcement operation Ops Sikap (SAFE). Ops Sikap, or Attitude Ops, is a traffic safety operation carried out by the Royal Malaysia Police to nurture people’s safety awareness on all roads in Malaysia during festive seasons such as Eid-ul-Fitr, Deepavali and Chinese New Year. The operation began in 2001 and involves the collaboration of the Malaysian Road Transport Department (JPJ), the Land Public Transport Commission (SPAD) and the National Anti-Drugs Agency (AADK).

The Ops Sikap variable was used by Wan Yaacob et al. (2011b) in examining its effect on road accidents in Malaysia. That study used a dummy variable, coded “0” for no SAFE operation and “1” for a SAFE operation. However, this coding is not quite appropriate when the operation spans two consecutive months. For such cases, this study suggests a weight variable for SAFE, in which the Ops Sikap variable is represented by the proportion of operation days that fall in each month. The total number of operation days for Ops Sikap is 15 for both Chinese New Year and Eid-ul-Fitr. If SAFE spans two consecutive months, the number of operation days in each of those months is divided by 15, while the other months are coded “0” to represent no Ops Sikap. Table 1.6 illustrates this case.


Table 1.6: An example of SAFE coding

Year  Duration        Month  Code
2001  9 Dec-23 Dec    12     1
2002  5 Feb-19 Feb    2      1
      29 Nov-13 Dec   11     2/15
                      12     13/15
2003  25 Jan-8 Feb    1      7/15
                      2      8/15
      18 Nov-2 Dec    11     13/15
                      12     2/15
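The SAFE coding in Table 1.6 can be reproduced by counting operation days per month and dividing by the 15-day total. The following Python sketch is an illustration only; the function name `safe_weights` and the `total_days` parameter are hypothetical, and the sketch assumes that both the start and end dates are operation days.

```python
from datetime import date, timedelta
from fractions import Fraction


def safe_weights(start, end, total_days=15):
    """Return {(year, month): operation days in that month / total_days}.

    start and end are the first and last day of the operation (inclusive).
    total_days defaults to 15, the operation length stated in the text.
    """
    counts = {}
    n_days = (end - start).days + 1
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        key = (day.year, day.month)
        counts[key] = counts.get(key, 0) + 1
    return {key: Fraction(count, total_days) for key, count in counts.items()}
```

For the operation of 29 November to 13 December 2002, the sketch returns 2/15 for November and 13/15 for December, as in Table 1.6; months not returned are coded 0.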

1.6 Limitation of the Study

The study does not take into account some other important or relevant variables because they are either not available at monthly frequency or not available on a state by state basis. For example, data on gross domestic product (GDP) are available only quarterly, while data on traffic volume are not collected on a state by state basis.

As stated in the earlier section, the study period is from January 2001 to December 2013. However, the air pollution index (API) for Perlis and Selangor can only be retrieved from 2004 onwards. Therefore, the road accident models for these two states are developed using data from 2004 to 2013.

The study also covers only univariate analysis, with and without explanatory variables; no multivariate analysis has been developed. In addition, prediction and forecasting are carried out only for the univariate time series models without explanatory variables, because information on the explanatory variables is lacking for years after 2013. Furthermore, this study does not include mathematical proofs, since the equations used are mostly taken from the published literature.


1.7 Summary and Thesis Organization

This thesis is divided into seven chapters: this introductory chapter, the literature review in Chapter 2, the methodology in Chapter 3, the analysis and discussion of the results in Chapters 4 to 6, and the conclusion in Chapter 7.

Chapter 1, the introductory chapter, presents the background of the research, including the research problem, followed by the objectives and significance of the study. The scope of the study, which describes the variables used in this thesis, is also presented in this chapter.

In Chapter 2, the background and definitions of the structural time series approach are given and the advantages of this technique are reviewed. Furthermore, previous literature on common techniques for modeling road safety, especially road accident occurrence, is discussed. Chapter 2 is important for understanding the ideas behind the road accidents model developed in this thesis.

Chapter 3 is concerned with the statistical techniques used in this thesis, which include descriptive statistics and correlation analysis. Moreover, this chapter discusses the common methods used in developing road accident models and introduces the structural time series method for modeling road accidents. It also includes the step by step procedure for developing the road accidents model applied in this thesis.

Chapter 4 describes the properties of the data collected, based on descriptive statistics, time series plots and correlation analysis. Descriptive statistics describe the basic features of the data, while time series plots are useful for observing basic patterns of the series such as trends and seasonality. The correlation analysis measures the strength of the relationship among the variables. In addition, common time series methodologies, namely time series regression (TSR) and seasonal autoregressive integrated moving average (SARIMA) analysis, are applied. This chapter is important as an early stage of the study, before the other analyses are carried out. The common time series analyses used in this chapter are compared with the other methods employed in the next two chapters.

Chapter 5 estimates the model for the number of road accidents using the structural time series approach. The chapter begins with model identification, followed by estimation of the road accident models for the five regions as well as for the individual states in Malaysia. Observing the statistical trend and seasonal pattern of each series is also one of the objectives of this chapter. The estimated road accident models for the regions and the individual states are then compared with the TSR and SARIMA models to measure their performance.

Next, the road accident models are refitted in Chapter 6, this time incorporating explanatory variables to investigate their influence on road accidents. The estimates for the explanatory variables are described and discussed thoroughly. The stochastic trends and seasonal patterns after incorporating the explanatory variables and accounting for outliers are also observed. The performance of the STS and TSR models is discussed in this chapter.

The last chapter summarises the conclusions of this thesis from both theoretical and applied points of view. It also contains suggestions for further research related to the ideas of this thesis.


CHAPTER 2

LITERATURE REVIEW

Numerous statistical and mathematical methods have been introduced to model and predict road safety. Some of these models are not sophisticated enough to describe the phenomenon, or give poor predictions. This chapter provides a historical perspective on the structural time series approach and on empirical and methodological developments in road safety research.

2.1 Structural Time Series

In the beginning, the structural model was developed as a traditional decomposition of a time series into the sum of trend, seasonal and irregular components (Harvey and Durbin, 1986):

\[
Y_t = \mu_t + \gamma_t + \varepsilon_t, \qquad t = 1, 2, \ldots, n \tag{2.1}
\]

where $Y_t$ denotes the $t$-th observation, possibly after a logarithmic transformation, and $\mu_t$, $\gamma_t$ and $\varepsilon_t$ are the trend, seasonal and irregular components. The trend component is simply a deterministic linear model written as $\mu_t = c + vt$, and the seasonal component $\gamma_t$ is a periodic function of the season, such as the month, quarter or week. The limited applicability of this form can be extended, as many series are fitted better if their structure is allowed to evolve over time.

The fundamental idea of how this can be accomplished originated from Muth (1960), who considered the situation where there is no seasonality and the trend has no slope, but the level $\mu_t$ varies over time as a random walk, giving the model

\[
Y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \eta_t \tag{2.2}
\]

where $\varepsilon_t$ and $\eta_t$ are independent white noise terms. Later, Theil and Wage (1964) and Nerlove and Wage (1964) extended the model by including a trend with slope, which yields the local linear trend model. In this model both the level $\mu_t$ and the slope $v_t$ evolve over time:

\[
\mu_t = \mu_{t-1} + v_{t-1} + \eta_t, \qquad v_t = v_{t-1} + \varsigma_t \tag{2.3}
\]

where $\varsigma_t$ is a slope disturbance term that is independent of $\varepsilon_t$ and $\eta_t$.
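To make the recursions in Equations (2.2) and (2.3) concrete, the following Python sketch simulates a series from the local linear trend model. It is an illustration under stated assumptions: the disturbances are taken to be Gaussian, and the function name `simulate_local_linear_trend` and its parameters are hypothetical. Setting the slope and its disturbance to zero reduces the sketch to the local level model of Equation (2.2).

```python
import random


def simulate_local_linear_trend(n, sd_eps, sd_eta, sd_slope,
                                level0=0.0, slope0=0.0, seed=0):
    """Simulate n observations Y_t = mu_t + eps_t, where
    mu_t = mu_{t-1} + v_{t-1} + eta_t and v_t = v_{t-1} + slope disturbance.

    sd_eps, sd_eta and sd_slope are the standard deviations of the
    irregular, level and slope disturbances (assumed Gaussian).
    """
    rng = random.Random(seed)
    level, slope = level0, slope0
    observations = []
    for _ in range(n):
        observations.append(level + rng.gauss(0.0, sd_eps))
        # Advance the state to the next period.
        level = level + slope + rng.gauss(0.0, sd_eta)
        slope = slope + rng.gauss(0.0, sd_slope)
    return observations
```

With all three standard deviations set to zero the simulated series collapses to the deterministic line starting at `level0` with gradient `slope0`, which corresponds to the deterministic linear trend of the classical decomposition.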

Schweppe (1965) showed that the likelihood function of both models could be evaluated using the Kalman filter via the prediction error decomposition. However, the limited computing technology of the 1960s meant that this result could not be exploited properly. At that time, the Box and Jenkins technique was the most influential time series method. Box and Jenkins (1976) observed that the first difference of Equation (2.2) and the second difference of Equation (2.3) yield a first order and a second order moving average process respectively. This led to the formulation of the ARIMA (autoregressive integrated moving average) class of models and the development of a model selection strategy.
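The prediction error decomposition mentioned above can be sketched for the simplest case, the local level model of Equation (2.2). The Python code below is a minimal illustration, not the estimation routine used in this thesis. It assumes Gaussian disturbances and approximates a diffuse initial level with a large initial variance; the function name `local_level_loglik` and its arguments are hypothetical.

```python
import math


def local_level_loglik(y, var_eps, var_eta, a1=0.0, p1=1e7):
    """Gaussian log-likelihood of the local level model via the Kalman filter.

    a1 and p1 are the mean and variance of the level before the first
    observation; a large p1 approximates a diffuse prior (an assumption).
    """
    a, p = a1, p1              # predicted level and its variance
    loglik = 0.0
    for obs in y:
        v = obs - a            # one-step-ahead prediction error
        f = p + var_eps        # variance of the prediction error
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p / f              # Kalman gain
        a = a + k * v          # updated level, also next period's prediction
        p = p * (1.0 - k) + var_eta
    return loglik
```

Maximising this function over `var_eps` and `var_eta` with a numerical optimiser would give maximum likelihood estimates of the two disturbance variances; the same idea extends to the local linear trend model with a two-dimensional state.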


Although the ARIMA approach dominated the time series literature in the 1970s and 1980s, the structural approach was more prevalent in control engineering (Harvey, 2006a). This is largely due to familiarity with the Kalman filter in control engineering since the appearance of Kalman (1960). The Kalman filter is a set of mathematical equations that recursively estimate the state by minimizing the mean squared error (Welch and Bishop, 2006). Another advantage of the Kalman filter approach is that it can be used to construct complex models. An early example of its application in economic and statistical research in the 1970s is Rosenberg (1973) on time varying parameters; later examples include Young (1984), West and Harrison (1986), Harvey (1989) and Kitagawa and Gersch (1996).

To handle the seasonal component in structural time series, Harrison and Stevens (1971) suggested two general techniques, one employing a trigonometric model and the other time varying seasonal dummies. Structural time series models can also be extended by including explanatory variables and intervention variables, which are briefly explained in the next chapter.

2.2 Advantages of Structural Time Series

Structural time series models have a direct interpretation, and explanatory variables can be added in a direct manner. The model can be put in state space form and estimated using the Kalman filter (Harvey and Durbin, 1986). In addition, structural time series models perform well in forecasting annual, quarterly and monthly data, especially for long forecasting horizons and seasonal data. The forecasts are quite reliable and accurate compared to other forecasting methods (Andrews, 1994).


Structural time series models also make it easy to handle missing values and outliers once they are in state space form (Harvey, 1989). Missing values are estimated using the Kalman filter, while outliers are handled by including intervention variables. Moreover, structural time series models explicitly model the seasonal and trend components, whereas ARIMA models eliminate both components by differencing the original series (Jalles, 2009). This means that structural time series models do not discard important information in the original series.

However, structural time series models also have their flaws. According to Karlis and Hermans (2012), structural models are usually more complicated and less interpretable than standard time series models. In addition, extra computational effort is needed, and there is still a lack of statistical software that implements this approach.

2.3 Application of the Structural Time Series Model

Structural time series models have been applied in various fields, such as economics, sociology, management science, operational research, geography, meteorology and engineering (Harvey, 1989). This section reviews some applications of the structural time series approach in several disciplines.

2.3.1 Economics

In economics, an application of structural time series (STS) can be found in Thury and Witt (1998), who generated monthly forecasts of Austrian and German industrial production. The STS specification used in their study is the basic structural model (BSM) with a cyclical effect. The model was compared with the autoregressive integrated moving average (ARIMA) model, and the study found that STS outperformed ARIMA.

Moosa (1999) estimated Okun’s coefficient by extracting the cyclical components of GDP and unemployment using STS. The estimated value was found to be close to the original value estimated by Okun. Next, Moosa (2000) presented a monetary model of the exchange rate under the German hyperinflation. The study relates the exchange rate to the stochastic trend, the money supply and the explorative expectation.

Muscatelli and Tirelli (2001) investigated the relationship between unemployment and labour productivity growth for OECD countries using the STS approach. The structural model specified in their study is a trend model with cycle and explanatory variables. The study incorporated impulse dummy variables and found the variance of the slope disturbance to be insignificant. Therefore, the majority of the unemployment models for OECD countries have a fixed level time series component.

Broadstock and Collins (2010) presented a means of determining a historic cycling price index in the United Kingdom for the period 1949-2006 based on annual demand data. As in other recent studies, the STS model used is a trend model. The most recent study in economics is by Krieg and van den Brakel (2012), who obtained precise estimates of the Dutch monthly unemployment rate.


2.3.2 Meteorology, Ecology and Agriculture

In meteorology, STS applications can be found in estimating, predicting and forecasting temperature. For example, Zheng and Basher (1999) identified an STS model for annual temperature in New Zealand. Allen (1999) then applied this approach to reconstruct past temperature from datasets of three tree species. Allen found that estimation based on STS improved the model and produced a better fit than principal component regression (PCR).

Furthermore, the STS approach is widely used in ecology, for example by Wang and Getz (2007), who applied an STS model to decompose and detrend 25 years of monthly live-trapping data on a shrew population. The study claimed that, unlike classical Box and Jenkins time series analysis, the STS approach takes measurement error into account. In the same discipline, Knape et al. (2009) modeled bird migration using a multivariate STS approach.

In agriculture, STS has been used for forecasting and for estimating elasticities. Such studies include Freeman and Kirkwood (1995), who modeled fish catches; Hannonen (2005), who estimated the variation of land prices; Shepherd (2006), who estimated the elasticities of supply for cotton across 30 countries and 16 regions; and Singh et al. (2014), who forecast gram production in India. The latter study showed that it is worthwhile to consider the wide variety of trend specifications, which provides more accurate estimates.


2.3.3 Other Disciplines

Structural time series, or state space, methods have also been widely used in engineering. Dordonnat et al. (2008), Dilaver and Hunt (2011) and Dordonnat et al. (2012) are among the researchers who used state space time series methods to forecast and model electricity load. In the tourism industry, Preez and Witt (2003), Athanasopoulos et al. (2011), Kim et al. (2011) and Song et al. (2011) used the state space time series approach to forecast and model incoming tourists.

Last but not least, in air quality studies, Lipfert and Murray (2012) used this approach to relate air pollution to daily mortality, while Lawson et al. (2011) developed an air quality model to predict traffic-related nitrogen oxide concentrations. As a summary, Table 2.1 lists recent studies in several disciplines that applied the STS approach. Its applications across these disciplines show STS to be a promising tool for analyzing time series data accurately. It is therefore worthwhile to apply this approach to road safety time series data.

Table 2.1: Recent studies on application of structural time series

Reference                                                                         Area of Study
Dordonnat et al. (2008, 2012), Dilaver and Hunt (2011)                            Engineering
Song et al. (2011)                                                                Tourism
Shepherd (2006), Wang and Getz (2007), Knape et al. (2009)                        Agriculture, Ecology
Lawson et al. (2011)                                                              Air Quality
Hannonen (2005), Krieg and Van Den Brakel (2012), Muscatelli and Tirelli (2013)   Economy

2.4 Common Techniques of Road Safety Modeling

Road safety is a long-standing issue that researchers have investigated for more than 40 years. From an empirical standpoint, most studies have attempted to relate population size and the number of vehicles to traffic accidents and fatalities (Valli, 2005; Ali and Bakheit, 2011; Nasaruddin et al., 2012). Numerous studies have also focused on road characteristics such as gradients, paved roads, traffic flow, road design and geometry, and black spot analysis (Greibe, 2003; Noland, 2003; Harnen et al., 2006; Ali and Bakheit, 2011; Abdul Manan et al., 2013; Amin et al., 2014).

In addition, a number of studies have investigated the effects of weather, meteorological and climate conditions on road safety (Brijs et al., 2008; Wan Yaacob et al., 2010; Wan Yaacob et al., 2011a; Wan Yaacob et al., 2012; Bergel-Hayat et al., 2013; Amin et al., 2014). Several studies have also focused on the effect of restraint devices, such as seat belts and helmets, and of safety interventions (Wan Yaacob et al., 2010; Wan Yaacob et al., 2011a; Wan Yaacob et al., 2011b; Wan Yaacob et al., 2012; Sarani et al., 2013). The effect of economic conditions on road safety is also among the topics of interest (Law et al., 2005; Ali and Bakheit, 2011).

From a methodological standpoint, a variety of statistical and mathematical approaches have been introduced. Such methods include linear regression, generalized linear models, time series models, panel data analysis and artificial neural networks. One of the first models developed for road accidents was proposed by Smeed (1949). The model relates the number of deaths in car accidents to the number of registered vehicles and the population size as follows:

b DP tt= a (2.4) NNtt

where $D_t$ is the number of deaths at time $t$, $N_t$ is the number of registered vehicles at time $t$, and $P_t$ is the population size at time $t$. The parameters $a$ and $b$ were estimated by least squares. This model was then used in certain countries with great success (Karlis and Hermans, 2012). However, Andreassen (1985) argued that the relationship found by Smeed's model is spurious, as the model represented 20 countries using only one year of data. Andreassen therefore proposed a new equation, which can be written as follows:

$$D_t = \alpha N_t^{a} P_t^{b} \tag{2.5}$$
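As an illustrative sketch only (it is not taken from the studies cited), both equations become linear after taking logarithms, so their parameters can be estimated by ordinary least squares. The Python snippet below fits Equation (2.4) in the form log(D_t/N_t) = log a + b log(P_t/N_t). The function name `fit_smeed` is hypothetical, and the figures in the usage lines are synthetic values generated from the model itself, not real accident data.

```python
import math

def fit_smeed(deaths, vehicles, population):
    """Estimate a and b in D/N = a * (P/N)**b by least squares on the log scale."""
    x = [math.log(p / n) for p, n in zip(population, vehicles)]
    y = [math.log(d / n) for d, n in zip(deaths, vehicles)]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope of the log-log regression
    a = math.exp(y_bar - b * x_bar)    # intercept, back-transformed
    return a, b

# Synthetic illustration only: deaths generated with a = 0.0003 and b = 2/3
vehicles = [1.0e6, 1.5e6, 2.2e6, 3.1e6, 4.5e6]
population = [1.8e7, 1.9e7, 2.0e7, 2.1e7, 2.2e7]
deaths = [0.0003 * n * (p / n) ** (2 / 3) for n, p in zip(vehicles, population)]
a_hat, b_hat = fit_smeed(deaths, vehicles, population)
```

Because the synthetic series contains no noise, the fit recovers the generating values of a and b. Equation (2.5) could be handled in the same way, with log N_t and log P_t as two separate regressors.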

Despite their age, Smeed's model and Andreassen's equation are still used by researchers; applications can be found in Valli (2005), Nasaruddin et al. (2012) and Ponnaluri (2012). Lassarre (2001) criticised the model because it neglects an important factor, namely an indicator of the progress made in safety performance by the road transport system as a whole. The study also noted that a kilometer driven today is not equivalent to a kilometer driven in 1960.

Therefore, many road safety models incorporate the progress made in safety performance, for example by determining the causal factors of road safety. Ordinary least squares (OLS) regression is among the methods that have attracted the attention of researchers. Early accident models based on OLS regression were proposed by Scott (1983) and Zlatoper (1984). These models assumed that the observations of the response variable were independent of each other, and they were used to measure the relationship between the response and the explanatory variables.

Scott (1983) modeled monthly road accidents in Britain, focusing on two-vehicle accidents, with ten explanatory variables including traffic volume, temperature, amount of rainfall, petrol price index, trend, seasonality, fuel crisis, speed limit and number of working days. The study found that petrol price, temperature, trend and the fuel crisis significantly reduced the number of road accidents, while the amount of rainfall increased it.

Meanwhile, Zlatoper (1984) modeled the annual road accident death series in the United States from 1947 to 1980. The study extended the cross-sectional study of Peltzman (1975) by incorporating new variables and using time series data. It used three dependent variables to represent road accident deaths, namely total accident deaths, vehicle occupant deaths and pedestrian deaths. The explanatory variables considered were price, income, traffic volume, alcohol influence, average speed, young drivers, vehicle size, type of driving (rural or urban), safety standards and secular trends. Since then, OLS regression has remained one of the favourite models in road safety modeling. Studies that employed this method include Nasr (2009), Desai and Patel (2011), Aderamo (2012a), Aderamo and Olatujoye (2013) and others.

The performance of the OLS regression model is comparable with that of other approaches in statistics and artificial intelligence. For example, Ali and Bakheit (2011) predicted road traffic accidents in Sudan and compared the predictive ability of an artificial neural network (ANN) with that of multiple linear regression. The study found that the regression-based model is very much comparable to the ANN in terms of goodness of fit, but the ANN estimates are much closer to the actual values, whereas the regression model showed apparent deviations from them.

However, frequencies related to road safety, such as the number of road accidents, casualties or fatalities, are discrete and non-negative, and are therefore count data. Hence, ordinary OLS regression may not be suitable in some cases. Two classes of models that suit this framework are Poisson regression and negative binomial regression. Studies that used these approaches include Radin Umar et al. (1996), Greibe (2003), Harnen et al. (2006), Brijs et al. (2008), Usman et al. (2010), Abdul Manan et al. (2013), Abusini (2013) and Sarani et al. (2013).

Abusini (2013) studied the influence of road characteristics on the number of road accidents. Meanwhile, Brijs et al. (2008) and Usman et al. (2010) dealt with the impact of weather on road accidents. Because these studies used Poisson regression, overdispersion, where the variance of the counts exceeds the mean, may occur. To overcome this problem, several researchers, such as Harnen et al. (2006), used a quasi-likelihood correction. Alternatively, Abdul Manan et al. (2013) used a negative binomial regression model. The goodness of fit of these models was tested using the scaled deviance, as in Greibe (2003), Wan Yaacob et al. (2011a) and Abdul Manan et al. (2013).
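A quick way to see the overdispersion issue is to compare the spread of the counts with their mean, since the Poisson distribution assumes that the variance equals the mean. The sketch below does this for the simplest possible case, an intercept-only Poisson model, where the Pearson dispersion statistic reduces to sum((y - mean)^2 / mean) / (n - 1). The function name and the monthly counts are hypothetical and were made up for illustration.

```python
def pearson_dispersion(counts):
    """Pearson chi-square divided by the residual degrees of freedom for an
    intercept-only Poisson model; values well above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return chi2 / (n - 1)

# Hypothetical monthly accident counts (not real data)
counts = [12, 30, 7, 45, 19, 3, 28, 51, 9, 16, 38, 22]
phi = pearson_dispersion(counts)
```

For these made-up counts the statistic is roughly 10, far above the value of 1 expected under a Poisson model, which is the kind of situation in which a quasi-likelihood correction or a negative binomial model would be considered.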

However, regression modeling assumes that the observations are independent. Regression methods such as Poisson regression, negative binomial regression or linear regression may therefore not be appropriate for time series data. To model such data, time series methods such as the Box and Jenkins approach and exponential smoothing are more appropriate. Count data related to road safety that are collected over time can be modeled using univariate time series analysis, as in Ofori et al. (2012) and Razzaghi et al. (2013), or by incorporating explanatory and intervention variables, as in Scott (1986), Law et al. (2005) and Wan Yaacob et al. (2011b).

Ofori et al. (2012) used an annual time series of motorcycle road accidents to forecast road traffic injuries in Ghana. The study compared the forecasting performance of two time series methods, namely damped trend exponential smoothing and Box and Jenkins ARIMA. The data were transformed and differenced to achieve stationarity. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used to select the best model among those whose residuals satisfied the assumptions of independence, normality and randomness. In-sample forecasts were then produced, and the root mean square error (RMSE) and mean absolute percentage error (MAPE) were used to evaluate forecast performance. The study found that ARIMA performed better than damped trend exponential smoothing.
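The two forecast accuracy measures named above can be written down in a few lines. The following is a minimal sketch using the standard definitions of RMSE and MAPE. The function names and the numbers are hypothetical and are not taken from Ofori et al. (2012).

```python
import math

def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical observed values and forecasts
actual = [100, 120, 80, 90]
forecast = [110, 114, 84, 81]
error_rmse = rmse(actual, forecast)
error_mape = mape(actual, forecast)
```

RMSE is expressed in the units of the series and penalises large errors more heavily, whereas MAPE is scale-free, which is why the two are often reported together when competing models are compared.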

Using a similar approach, Razzaghi et al. (2013) fitted monthly traffic accidents attributed to motor vehicles using ARIMA to understand the trend of accidents in Taybad city. The study used the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify the order of the ARIMA model. Similar to Ofori et al. (2012), the study also used the AIC and BIC to select the best-fitting model. Unexpectedly, the monthly series of road accidents did not fit seasonal ARIMA models. Several reasons were given, such as data quality, weather changes and behavioral factors that were not taken into account, which caused the series not to satisfy the requirements of the Box and Jenkins method.

On the other hand, Scott (1986) incorporated explanatory variables such as traffic volume, petrol price, amount of rainfall, temperature, number of working days and secular trends, and compared regression and Box and Jenkins methods in explaining the reduction in road accident frequencies in Great Britain. The study applied a transfer function model to incorporate the explanatory variables in the Box and Jenkins framework. It found that the Box and Jenkins models represented the series better than the regression model, although the regression model is preferable in terms of simplicity.

An approach similar to Scott's was applied by Law et al. (2005) to investigate the effect of the economic crisis and the Motorcycle Safety Programme (MSP) on motorcycle-related accident casualties. The model used annual series of motorcycle-related and non-motorcycle-related accidents, injuries and fatalities as response variables, with gross domestic product (GDP) and the MSP intervention as predictors. As in other Box and Jenkins applications, the study tested stationarity using the Augmented Dickey-Fuller (ADF) test and checked the residuals for serial correlation using the Ljung-Box test. The best model was selected based on the lowest AIC value. The results showed that the MSP succeeded in bringing down the number of motorcycle accidents, and that the number of motorcycle-related accidents, casualties and fatalities is positively related to GDP.
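The Ljung-Box check on the residuals can be illustrated with a short sketch. It computes the sample autocorrelations r_k and the statistic Q = n(n+2) * sum over k of r_k^2 / (n - k), which is compared with a chi-square distribution. The function names and the artificial residual series are hypothetical. When the residuals come from a fitted ARMA model, the degrees of freedom are usually reduced by the number of estimated ARMA parameters, which this sketch does not handle.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

# Artificial residuals that alternate in sign, so they are strongly autocorrelated
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, max_lag=1)
```

For this artificial series the lag-one autocorrelation is -0.95 and Q is about 20.9, well above 3.84, the 5% critical value of the chi-square distribution with one degree of freedom, so the hypothesis of uncorrelated residuals would be rejected.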

In contrast, Wan Yaacob et al. (2011b) applied the Box and Jenkins method with Box and Tiao intervention analysis to examine the effect of Ops Sikap on road accidents. The study used the monthly number of road accidents from January 1996 to December 2007 and dummy variables for Ops Sikap (i-xv), which was implemented from December 2001. The modeling process involved two phases: identification of a pre-intervention univariate ARIMA model, and incorporation of the intervention using the Box and Tiao model.

The study found that the best-fitting model for describing the effect of the Ops Sikap intervention was ARMA(1,12) and that the intervention successfully reduced the number of road accidents. However, judged against the theory of the Box and Jenkins method, the authors may have misinterpreted their results. The study stated that two types of differencing were applied, namely non-seasonal and seasonal differencing. The authors should therefore have reported the model in the proper seasonal ARIMA notation, that is, ARIMA(p,d,q)(P,D,Q)s. Besides, the study did not show that the residuals of the model satisfy the white noise assumption, although this check was mentioned in the methodology.

Alternatively, to incorporate both cross-sectional and temporal dependencies in modeling road safety, panel data approaches, also known as longitudinal studies, were developed. Researchers who used this type of approach include Brijs et al. (2008), who introduced the integer autoregressive (INAR) model for count data with time interdependencies to predict the effects of environmental factors on road safety. The data used include daily crash data, meteorological data and traffic exposure data. The study covered daily data for three large cities in the Netherlands, namely Utrecht, Dordrecht and Haarlemmermeer, for the year 2001.

Three models were fitted. In the first model, day-of-the-week dummies were used to reflect differences in exposure, as a proxy when traffic exposure was not available. The second model included actual traffic exposure as a covariate, while the third model included both covariates. The three models were then compared. A stepwise procedure was also carried out to overcome multicollinearity, so that variables showing higher correlation with the response variable were included in the model. The estimation procedure used was the expectation-maximization (EM) algorithm, which is claimed to be an efficient tool for fitting models with a similar structure. Each fitted model was then compared with competing models, namely the Poisson regression and negative binomial regression models. The study found that Poisson regression was the worst model, while negative binomial regression alleviated the overdispersion problem that occurred in the Poisson regression model. However, across the three specifications, the INAR model was preferable since it gave the best log-likelihood ratio test statistic. The study also showed that weather conditions have a significant effect on crash counts.
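To make the INAR idea concrete, the sketch below simulates the simplest first-order case, X_t = alpha o X_{t-1} + e_t, where "o" denotes binomial thinning (each of the X_{t-1} events survives with probability alpha) and e_t is a Poisson innovation. This is an assumption made for illustration: the specification in Brijs et al. (2008), which also includes covariates, may differ, and the function names and parameter values are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from a Poisson distribution with Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with binomial thinning and
    Poisson(lam) innovations; the result is always a non-negative integer series."""
    rng = random.Random(seed)
    x = [poisson_draw(lam / (1 - alpha), rng)]   # start from the stationary mean
    for _ in range(n - 1):
        survivors = sum(rng.random() < alpha for _ in range(x[-1]))
        x.append(survivors + poisson_draw(lam, rng))
    return x

# Hypothetical daily crash counts for one year
series = simulate_inar1(n=365, alpha=0.4, lam=3.0)
series_mean = sum(series) / len(series)
```

Unlike a Gaussian AR(1) model, the simulated values stay integer and non-negative, which is the property that makes this family attractive for daily crash counts. The long-run mean of the process is lam / (1 - alpha), which equals 5 for the values used here.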

Similarly, Noland (2003) assessed the effectiveness of road infrastructure in reducing annual fatalities using fixed effects and random effects negative binomial models. The study included several variables representing road infrastructure and changes in technology, together with control variables such as total population, population age, income, alcohol intake, seat belt legislation and medical technology. The study covered 50 states in the United States (US) over a period of 14 years. It showed that changes in age cohorts, increased seat belt usage, reduced alcohol consumption and improvements in medical technology were successful in reducing fatalities, but that changes in US highway infrastructure from 1984 to 1997 did not reduce the number of road fatalities.

On the other hand, Wan Yaacob et al. (2010) applied a random effects model and a pooled negative binomial (NB) model to examine the factors contributing to the occurrence of road accidents in 14 states in Malaysia from January 1996 to December 2007. The dependent variable was the monthly number of road accidents, and the independent variables were the number of registered vehicles, amount of rainfall, number of rainy days, dummy months, dummy regions and a time effect as a proxy for technological change. First, the 14 states were grouped into four regions, namely East Coast, South, West and North, in order to examine spatial effects. Three types of model (basic non-location, non-time-specific regression, and location with time effect models) were developed for each approach, namely the NB model and the random effects negative binomial (RENB) model. The study used the AIC, log-likelihood and deviance test to choose the best-fitting model and found that the RENB model performed better than the NB model. Furthermore, the study revealed that August and October are prone to more accidents and that, as expected, states such as Kuala Lumpur, Selangor, Malacca and Negeri Sembilan are more likely to record the highest number of road accidents. The results also showed that the number of registered vehicles, the time effect and weather are positively related to the occurrence of road accidents.

A year later, Wan Yaacob et al. (2011a) applied different models to the same data as Wan Yaacob et al. (2010), namely the pooled Poisson model, the fixed effects Poisson (FEP) model and the fixed effects negative binomial (FENB) model, all based on panel data analysis. The study used variables similar to those in the previous study but did not consider the spatial effect. It found that the FENB model best described the number of road accidents and, as in the previous study, the number of registered vehicles, amount of rainfall, time effect and dummy months were found to be significant. However, the results differed from those of the previous study regarding the months concerned: October, November and December were found to be more prone to road accidents.

Then, a year later, building on the best model found, the FENB model, Wan Yaacob et al. (2012) examined the relationship between the number of road accidents and precipitation. In this study, they compared conditional and unconditional estimation methods for the FENB model. The variables involved include the monthly number of vehicle registrations, amount of rainfall, number of rainy days, spell effect and the interaction between rainfall and spell effect. The study showed that the conditional and unconditional models gave approximately similar results. Unexpectedly, the results also indicated that monthly rainfall with a short spell period carries a greater risk of raising the number of road accidents.

Table 2.2 summarizes recent studies related to road safety that used different statistical models. Improvements in road safety modeling can clearly be seen year by year. However, some of the models have their own weaknesses and produced unsatisfactory results. Therefore, there is a need to improve current models and to develop a model that predicts more accurately and better explains the variables involved.

Table 2.2: Recent methods/models used in road safety study

| Author(s) | Year | Datasets | Method/Model | Results |
|---|---|---|---|---|
| Greibe | 2003 | Accident data for a five-year period aggregated for road junctions and road links. Two predictor variables: traffic flow and road design | Generalized linear model (GLM), Poisson distribution | The model is capable of describing more than 60% of the systematic variation for road links, while junctions have a lower explained value |
| Noland | 2003 | Annual fatalities/injuries and road infrastructure, changes in technology, control variables | Panel data analysis: fixed effect negative binomial and random effect negative binomial | Changes in age cohort, increased seat belt usage, reduced alcohol consumption and improved medical technology reduce the number of fatalities. Changes in highway infrastructure in the US do not reduce fatalities and injuries |
| Valli | 2005 | Annual accident series from 1977-2001 for major metropolitan cities in India, disaggregated into motor vehicle accidents, injuries and fatalities | Smeed's and Andreassen's equations | The models perform well for both equations |
| Law et al. | 2005 | Annual series of motorcycle-related and non-motorcycle-related accidents, injuries and fatalities, gross domestic product (GDP), and intervention of MSP | Box and Jenkins with transfer function model | The Motorcycle Safety Programme proved successful in bringing down the number of motorcycle-related accidents, casualties and fatalities. The total number of motorcycle-related accidents, casualties and fatalities is positively related to GDP |

Table 2.2: Continued

| Author(s) | Year | Datasets | Method/Model | Results |
|---|---|---|---|---|
| Harnen et al. | 2006 | Motorcycle accident frequency, traffic flow and road design/road geometry | GLM: Poisson regression | Vehicle traffic flow, road design and road geometry were significant variables in explaining road accidents at junctions |
| Brijs et al. | 2008 | Daily car crash, meteorology and traffic exposure data of the Netherlands in year 2001 for three big cities | Comparison of integer autoregressive (INAR) and competing models | Weather effects do influence the number of crashes. INAR outperformed the simple Poisson regression model |
| Quddus | 2008 | Annual road traffic fatalities in Great Britain (1950-2005) and monthly car casualties within the London congestion charging zone (Jan 1991-Oct 2005), introduction of charge on car casualties (intervention) | ARIMA, GLM, INAR and Poisson | ARIMA is the best model for aggregated time series data. INAR(1) is the best model for disaggregated data |
| Wan Yaacob et al. | 2010 | Monthly number of road accidents in Malaysia, registered vehicles, amount of rainfall, number of rainy days, time, dummy regions and dummy months | RENB and pooled NB | The RENB model is superior to the NB model. August and October are prone to more accidents. The South region (Kuala Lumpur, Selangor, Malacca and Negeri Sembilan) has a higher likelihood of road accident occurrence. Registered vehicles, time trend and weather have a positive relationship with accidents |
| Desai and Patel | 2011 | Hourly data of total accidents, fatal accidents and traffic volume from 2005-2010 | Ordinary least squares (OLS) regression | Traffic volume is significantly related to total accidents and fatal accidents |

Table 2.2: Continued

| Author(s) | Year | Datasets | Method/Model | Results |
|---|---|---|---|---|
| Ali and Bakheit | 2011 | Annual accident frequency in Sudan with 5 indicators | Artificial neural network (ANN) and principal component regression (PCR) | Both methods provided very similar forecast values. However, the PCR method is more realistic, while the ANN method was found to give higher values |
| Wan Yaacob et al. | 2011a | Monthly number of road accidents in Malaysia, registered vehicles, amount of rainfall, number of rainy days, time, and dummy months | Pooled Poisson, FEP model, FENB model | The best model to describe road accidents is the fixed effect negative binomial model. Registered vehicles, amount of rainfall, time effect and dummy months are significantly related to road accidents. Road accidents are prone to occur in October, November and December |
| Wan Yaacob et al. | 2011b | Monthly number of road accidents, dummy variable of road safety intervention | Box and Jenkins and Box-Tiao intervention analysis | The best-fitting model to describe road accidents is ARMA(1,12), and implementation of Ops Sikap reduced the number of road accidents |
| Ofori et al. | 2012 | Annual injuries from road accidents for the period 1991-2010 | Comparison between damped trend exponential smoothing and ARIMA model | ARIMA(1,1,1) performed better |
| Wan Yaacob et al. | 2012 | Monthly total number of road accidents, registered vehicles, amount of rainfall, number of rainy days, spell effect, interaction term of rainfall with spell effect, time | Conditional and unconditional FENB | Both conditional and unconditional models have similar results. Time, which is a proxy for technological development, has a significant relationship with road accidents. The highest correlation is between rainfall and the interaction term of rainfall and spell effect. Rainfall in shorter spell periods presents a greater risk of road accidents than rainfall with longer spell periods |

Table 2.2: Continued

| Author(s) | Year | Datasets | Method/Model | Results |
|---|---|---|---|---|
| Nasarudin et al. | 2012 | Annual series of motorcycle deaths and two predictor variables | Comparison of OLS and Smeed's formula | The regression approach gave a better estimate of motorcycle fatalities |
| Abusini | 2013 | Number of motorcycle accidents and road geometric factors such as median, gradient, shoulder width, speed and flow in Batu, East Java, Indonesia | GLM | Motorcycle accidents follow a Poisson distribution. Shoulder width and gradient influence road accidents |
| Razzaghi et al. | 2013 | Monthly series of road accidents from 2007-2011 in Taybad city | Descriptive analysis and ARIMA | Road traffic accidents have an increasing trend over the 5-year study period; however, the series does not fit an ARIMA model. Seasonal ARIMA is suggested |
| Sarani et al. | 2013 | Weekly number of passenger vehicle occupants that sustained severe and slight injuries and 3 predictor variables | Poisson regression | Enforcement of the rear seat belt law successfully reduced the number of people sustaining severe and slight injuries. Over time, the number of people sustaining severe and slight injuries will continue to increase. The Balik Kampung culture was found to have a positive significant relationship in this study |
| Abdul Manan et al. | 2013 | Motorcycle accident fatalities, road traffic volumes, road structure/road design | NB | Motorcycle fatalities per km on primary roads are statistically significant. Fatalities are affected by the average daily number of motorcycles and the number of accesses per km |
| Moges & Woldeyohannes | 2014 | Time series data from 1996 to 2011 on injury and death due to vehicle crashes in North Gandar Zone, Northwest Ethiopia | Holt and Brown exponential smoothing methods | The monthly and daily pattern of injury and death due to vehicle crashes is highly dependent on the particular month and day. Future forecasts showed an increasing trend |

Table 2.2: Continued

| Author(s) | Year | Datasets | Method/Model | Results |
|---|---|---|---|---|
| Salako, Adegoke & Akanmu | 2014 | Monthly series converted to quarterly series for a 7-year period from 2006-2012 in Osun Sector, Nigeria | ARIMA and OLS | Motor accidents are highest during the 4th quarter and lowest during the 1st quarter |
| Mohammadi et al. | 2015 | Traffic accidents in Isfahan, Iran from 2006-2011, demographic factors and seasonal factors | Independent t-test, ANOVA, chi-square, and multivariate logistic | The highest occurrence of road accidents is during summer and the lowest during autumn, while the highest and lowest mortality rates were observed in summer and winter respectively |
| Amin et al. | 2014 | Road accidents, traffic flow, environmental conditions, road geometry, property damage only, injuries, fatalities, daily rainfall, snowfall, mean temperature | NB and Poisson regression | Impaired driving, speed, aggressive driving and occupant protection are predominant factors of accidents. Uncontrollable meteorological conditions such as visibility, precipitation and wind speed are also important contributing factors. Due to climate change, the number of accidents declines during snowing and freezing days |
| Shahid et al. | 2015 | State-wise data of registered vehicles, road traffic accidents, major and minor casualties and fatalities in Malaysia from 2008-2013 | Spatial variability and temporal variability using the Mann-Kendall test and Sen's slope methods | More accidents but lower fatalities in urbanized and developed states, and fewer accidents but serious fatalities in less urbanized states. Festival months show a peak increase, especially Eid-ul-Fitr |
| Nanga | 2016 | Monthly series of road accidents in Ogun, Ghana, for a 20-year period (1991-2010) | Box and Jenkins SARIMA model | SARIMA(1,1,0)(0,1,1)12 was found to be the best model |

2.5 Structural Time Series in Road Safety

Harvey and Durbin (1986) pioneered the use of structural time series (STS) models in road safety studies. The study was carried out in conjunction with a report on the post-intervention analysis of the seat belt law by the Department of Road Transport in Britain. It examined the reduction in road casualties after the seat belt law was implemented and compared the structural time series approach with the ARIMA method for modeling road casualties. The data comprise the numbers of killed and seriously injured (KSI) and killed (KILL) for various categories of road users that are directly or indirectly affected by seat belt usage. Drivers and front seat passengers are directly affected by seat belt usage, while rear passengers, pedestrians and cyclists are indirectly affected.

The study developed a basic structural model (BSM), since this model avoids the loss of information that results from incorrectly restricting some of the components to be zero throughout the period. All parameters were estimated using the Kalman filter estimation technique. Diagnostic tests were carried out, including a homoscedasticity test, the Ljung-Box test for the independence of the residuals and the Jarque-Bera (Bowman-Shenton) test of normality. Predictions for the post-sample period were assessed using a post-sample predictive test statistic, which is similar to the Chow test.
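To give a feel for how the Kalman filter handles a structural model, the sketch below implements the recursions for the simplest member of the family, the local level model y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t. It is a reduced illustration under stated assumptions, not the BSM of Harvey and Durbin (1986), which also contains slope and seasonal components. The function name is hypothetical, the variances are supplied by the user rather than estimated, and a large initial variance `p1` is used as a rough stand-in for a diffuse initialization.

```python
import math

def local_level_filter(y, var_eps, var_eta, a1=0.0, p1=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.
    Returns the filtered level estimates and the Gaussian log-likelihood."""
    a, p = a1, p1                 # predicted state mean and variance
    filtered, loglik = [], 0.0
    for obs in y:
        v = obs - a               # one-step-ahead prediction error
        f = p + var_eps           # variance of the prediction error
        k = p / f                 # Kalman gain
        a = a + k * v             # update the level with the new observation
        p = p * (1 - k)
        filtered.append(a)
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(f) + v * v / f)
        p = p + var_eta           # predict the next state (the level is a random walk)
    return filtered, loglik

# Hypothetical constant series: the filtered level should settle at 5
y = [5.0, 5.0, 5.0, 5.0, 5.0]
levels, ll = local_level_filter(y, var_eps=1.0, var_eta=0.1)
```

In practice the two variances are not known. They are typically chosen by maximizing the log-likelihood that the filter returns, which is how the Kalman filter serves as an estimation tool for structural models.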

At the same time, ARIMA modeling was conducted. The road casualty series appear to follow the airline model, that is, the SARIMA(0,1,1)(0,1,1)12 model. However, determining the order of an ARIMA model is quite complicated. The study also considered explanatory variables by trying different combinations of them. The explanatory variables involved in this trial include the petrol price and a car traffic index.
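For reference, the airline model named in this paragraph can be written out explicitly. In the notation assumed here (which is not taken from the source), $B$ is the backshift operator with $B y_t = y_{t-1}$, $\theta$ and $\Theta$ are the non-seasonal and seasonal moving average parameters, and $\varepsilon_t$ is white noise:

$$(1 - B)(1 - B^{12})\, y_t = (1 + \theta B)(1 + \Theta B^{12})\, \varepsilon_t$$

The left-hand side applies one regular difference and one seasonal difference of period 12, which corresponds to $d = 1$ and $D = 1$, and the right-hand side is the product of a first-order moving average and a first-order seasonal moving average, which corresponds to $q = 1$ and $Q = 1$. Some texts write the moving average factors with minus signs, which only changes the sign convention of $\theta$ and $\Theta$.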

To represent the intervention effect of the seat belt law, a dummy variable was defined as 0 before the intervention and 1 after the intervention was implemented. To check the robustness of the model, the authors carried out a sensitivity analysis by setting the relative variance. If the changes in the coefficient of the intervention variable are small compared with the original estimate and are deemed not significant, the model is considered robust.

The study found that the seat belt law succeeded in reducing the number of KSI car drivers and front seat passengers, as well as the number of KILL car drivers, front seat passengers and rear seat passengers. In addition, an increase in petrol price had a significant effect on KSI and KILL drivers, front seat passengers and rear seat passengers.

In terms of methodology, the authors argued that the ARIMA model identification procedure is hard to operate, particularly when explanatory variables are incorporated through the transfer function technique. The basic identification tool in that case is the cross-correlogram, which is even harder to interpret than the univariate correlogram, and the difficulty grows with the number of explanatory variables. In contrast, the structural approach is a more direct and transparent technique for time series modeling: explanatory and intervention variables can be added directly because the model can be put in state space form and handled computationally using the Kalman filter.

A few decades later, STS modeling of road safety is well developed, in line with the availability of statistical software. Table 2.3 presents several recent studies on road safety modeling that applied the STS approach. Lassarre (2001) presented the progress of road safety in 10 European countries over a 30-year period based on the STS approach. The study successfully related the number of road fatalities to the number of vehicle kilometers driven. It also determined the best-fitting specification of the STS trend model for the 10 European countries.

Table 2.3: Recent studies on the application of structural time series on road safety

| Reference | Objective | Specification of Structural Time Series Model |
|---|---|---|
| Lassarre (2001) | Progress of road safety in 10 European countries | Local linear trend + intervention variables |
| Hermans et al. (2006) | Investigate monthly frequency and severity of road traffic accidents in Belgium | Local level, local linear trend, local level seasonal with explanatory variables |
| Scuffham and Langley (2002) | Examine the changes in the trend and seasonal patterns in fatal crashes in New Zealand in relation to changes in economic conditions | Random walk + lag dependent |
| Bergel-Hayat et al. (2013) | Relate road accidents with weather effect | Local linear seasonal model + explanatory variables + risk variables |
| Bergel-Hayat and Zukowska (2014) | Relate fatalities with economic factors in Poland | Local linear trend seasonal + intervention |
| Antoniou and Yannis (2013) | Forecast macroscopic road safety in Greece | SUTSE and latent risk time series |
| Antoniou, Papadimitriou, and Yannis (2014) | Forecast in 5 European countries | Local linear trend and latent risk model |

In 2002, the STS approach was again applied to examine changes in the trend and seasonal patterns of fatal crashes in New Zealand in relation to changes in economic conditions between 1970 and 1994 (Scuffham and Langley, 2002). The authors highlighted as 'intuitively appealing' the non-restrictive treatment of seasonality in the STS approach, which allows seasonal and trend patterns to change gradually over long periods.

Then, Hermans et al. (2006) identified the long-term trend of the time series and quantified the impact of weather conditions, economic factors and several laws on the frequency and severity of road accidents and casualties in Belgium during 1974-1999. Forecasts were made for a two-year period, and the results of the state space method were compared with those of a regression model with ARMA errors. The results of the two methods showed many similarities.

In 2013, the STS model was used to relate road safety to weather effects. Bergel-Hayat et al. (2013) applied STS to aggregate datasets of injury accidents for France, the Netherlands and the Athens region over a period of 20 years.

The study found that weather variables are significantly related to the aggregate number of injury accidents, but the magnitude and sign of the correlations vary according to the type of road. Two STS models were considered, namely the local linear trend and seasonal model with and without exposure variables.

In recent years, more advanced STS approaches have appeared in road safety research, such as multivariate STS, also known as Seemingly Unrelated Time Series Equations (SUTSE), and latent risk time series (LRT). Studies that applied these approaches include Bergel-Hayat and Zukowska (2014), Antoniou et al. (2013, 2014, 2016), Antoniou & Yannis (2013), Chang (2014), and most recently Commandeur et al. (2017).

Based on the review, the STS approach has been applied to model road safety behaviour in most European countries. However, its application in Asian countries, and in Malaysia in particular, is quite limited. The STS approach is therefore a significant tool for modeling road accidents in Malaysia. It can extract the unobserved trend and seasonal patterns of road accidents directly, and it indirectly reveals whether these patterns are stochastic or deterministic.

2.6 Comparison of Road Accidents Models between This Study with Previous Studies

Road safety in Malaysia has been statistically modeled using numerous methodologies and models. These include Smeed's model (Nasarudin et al., 2012), Poisson and NB regression (Radin Umar et al., 1996; Sarani et al., 2013; Harnen et al., 2006; Abdul Manan et al., 2013), Box and Jenkins time series analysis (Yaacob et al., 2011; Law et al., 2005) and panel data analysis (Wan Yaacob et al., 2010; Wan Yaacob et al., 2011a). However, the models used in these studies have their own weaknesses, so they may produce inaccurate predictions that turn into false alarms about the level of road safety.

In addition, the indicators or explanatory variables used in these models can lead to biased predictions, especially when dummy variables are involved. For example, the seasonal variable in Radin Umar et al. (1996) representing the festive season, named “Balik Kampung” (BLKG), was incorporated using dummy variables coded “0” and “1”. In their study this was a sensible way to represent the festive seasons because the data were a weekly series. However, a festive holiday is not captured well by monthly dummies when the festive season falls in the first or last days of a month.

Existing analysis of road safety in Malaysia is relatively sparse, in the sense that only a relatively small number of states are covered in each study. Such analyses assume that there is no difference in road safety between Malaysian states, or at least between the regions studied. To our knowledge, only Wan Yaacob et al. (2011a, 2012), in their panel data analysis of road accidents, have analysed road accident series for all 14 states in Malaysia.

Table 2.4 therefore summarises the improvements in the model used in this study compared with the most recent road safety models. To improve prediction accuracy, STS is used to model the number of road accidents. The STS model decomposes the series into trend and seasonal components, each of which may follow a deterministic or a stochastic pattern. As a result, the behaviour of the road accident pattern can be observed and interpreted.

To address the biased dummy variables, this study applies a weight variable for the moving holiday, as in Shuja et al. (2007), to represent BLKG. In addition, another weight variable is introduced for safety measure interventions, based on the days on which the operation is carried out. The calculation of both indicators was discussed in Chapter 1, Sections 1.4.4 and 1.4.5 respectively. It is also worthwhile to model the number of road accidents across all states and regions, since there are often significant discrepancies among them, particularly between developed and developing states or regions.


Table 2.4: Comparisons between model used in this thesis with previous models

| Factor | Models in This Thesis | Previous Models |
|---|---|---|
| Model and Methodology | Uses the structural time series model to model the monthly number of road accidents in Malaysia. This method allows the unobserved components of the time series, such as trend and seasonal, to vary over time. In other words, the components can be observed directly, so that the trend and seasonal pattern of road accidents can be determined. | Classical approaches to time series modeling result in poor performance on the residual assumptions, for example in OLS models. In Box and Jenkins methods, achieving the stationarity condition requires detrending and deseasonalizing the series, which indirectly removes important information in the series. |
| Indicators | Weight variables for festive holiday and safety intervention variables | Use dummy variables to represent festive holiday and intervention variables |
| Area of study | Aggregated into 5 main regions and 14 states | Only a relatively small number of countries are covered |

2.7 Summary

This chapter has reviewed the development of the STS model and road safety modeling in recent years. Road safety modeling has clearly improved year by year through numerous statistical and mathematical models. The aim of each model is to improve prediction so that early intervention can be developed. The STS approach has improved prediction models in a variety of disciplines, including road safety. Hence, this model may be the best model to apply in a Malaysian road safety study to improve prediction accuracy by extracting the unobserved components and incorporating sensible indicators. Details of the STS model are discussed in the next chapter.


CHAPTER 3

METHODOLOGY

Based on the literature review discussed in Chapter 2, a few statistical techniques have been found to be appropriate for achieving the objectives of the study. These techniques include preliminary analysis, time series regression (TSR), the seasonal autoregressive integrated moving average (SARIMA) model and the structural time series (STS) model. The discussion in this chapter is gathered and summarized from several references, including Harvey (1989), Montgomery, Peck and Vining (2001), Proietti (2002), Mann (2004), Lazim (2005), Commandeur and Koopman (2007) and Durbin and Koopman (2012).

3.1 Properties of Data

The analysis begins with preliminary analyses that describe the characteristics of the main data series of road accidents and other variables. The analyses include descriptive statistics, time series plots and correlation analysis.

3.1.1 Descriptive Statistics

Descriptive statistics are quantitative descriptions of the basic features of the observations, presented in tables, figures, charts or graphs (Friend, 1976). Their aim is to simplify a large amount of data in a sensible way, in other words to condense it into single indicators. The most common such indicators are measures of central tendency, which describe where the observations fall, and measures of dispersion or variation, which describe how far the observations fall from the center (Janes, 1995). Measures of central tendency include the mean, mode and median, whereas measures of dispersion include the standard deviation or variance and the minimum and maximum values of the variables.

The mean is the average of the observations. The sample mean, $\bar{Y}$, is calculated as in Equation (3.1).

\[ \bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \tag{3.1} \]

where $n$ is the number of observations $Y_1, Y_2, \ldots, Y_n$.

On the other hand, the standard deviation is the most accurate measure of dispersion. It is particularly useful when the series contains outliers, because outliers can greatly exaggerate the range. A lower standard deviation indicates that the data points tend to be very close to the mean, whereas a higher standard deviation shows that the data are spread out over a large range of values (Mann, 2004). The standard deviation is computed as in Equation (3.2).

\[ SD = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \tag{3.2} \]

In addition, the minimum and maximum of a sample variable are the least and greatest elements in the sample. They are the least robust statistics because they are highly sensitive to outliers. Nevertheless, the minimum and maximum values are useful for understanding the total span of the data, as they can be used to calculate the range, which indicates the spread of the sample data.
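As a minimal sketch of Equations (3.1) and (3.2), the following Python snippet computes the descriptive measures discussed in this section. The function name `describe` and the accident counts are hypothetical and serve only as an illustration.

```python
import math

def describe(values):
    """Return the mean, sample standard deviation, minimum, maximum and range."""
    n = len(values)
    mean = sum(values) / n                                   # Equation (3.1)
    sum_sq_dev = sum((y - mean) ** 2 for y in values)
    sd = math.sqrt(sum_sq_dev / (n - 1))                     # Equation (3.2)
    lowest, highest = min(values), max(values)
    return {"mean": mean, "sd": sd, "min": lowest,
            "max": highest, "range": highest - lowest}

# Hypothetical monthly accident counts (illustration only, not thesis data)
counts = [120, 135, 128, 150, 142, 131]
summary = describe(counts)
print(summary)
```

The range is simply the maximum minus the minimum, which is why a single extreme observation changes it far more than it changes the standard deviation.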

3.1.2 Time Series Plot

Time series analysis offers an understanding of the past and enables prediction of the future (Cowpertwait and Metcalfe, 2009). The first step of time series analysis is the presentation of a time series plot. The plot shows the observations on the y-axis against equally spaced time intervals on the x-axis. It is usually used to observe the pattern of the data so that the data can be analyzed further with suitable methodologies. The time series plot also shows how something changes over a period of time, or the before-and-after effect of a process change. It is also useful for comparing the data patterns of different groups and for determining the presence of trend and seasonal influences in the series, which is important for the subsequent modeling procedure.

3.1.3 Correlation Analysis

Correlation analysis measures the strength of the relationship between two variables. The coefficient of correlation ranges from –1 to +1, with a positive value indicating a positive relationship between the variables. The most common measure used to compute the correlation coefficient is the Pearson product-moment correlation, which can be calculated as in Equation (3.3).

\[ r = \frac{n\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \tag{3.3} \]

where $n$ is the number of observations. However, the coefficient is not appropriate if the variables are not normally distributed, as the coefficient is affected by extreme values, which may exaggerate or dampen the strength of the relationship (Mukaka, 2012).

The magnitude of the coefficient indicating the degree of association between two variables is shown in Table 3.1.

Table 3.1: Value range of coefficient of correlation Coefficient of Interpretation Correlation r =1.0 Perfect positive correlation 0.7<

The existence of a linear correlation between two variables, and its significance, can be tested using the following hypotheses, with the test statistic given in Equation (3.4).

$H_0: \rho = 0$ (no correlation between variables)

$H_1: \rho \neq 0$ (significant correlation exists between variables)

\[ t_r = r\sqrt{\frac{n - 2}{1 - r^2}} \sim t_{n-2} \tag{3.4} \]
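The calculation in Equations (3.3) and (3.4) can be sketched in a few lines of Python. The function names and the small data set below are hypothetical; the resulting statistic would still need to be compared with a critical value of the t distribution with n − 2 degrees of freedom, which is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, following Equation (3.3)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

def correlation_t_statistic(r, n):
    """Test statistic for H0: rho = 0, following Equation (3.4)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up paired observations (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t_r = correlation_t_statistic(r, len(x))
print(r, t_r)
```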

3.1.4 Unit Root Tests

Most time series modeling requires the series to be stationary, as a non-stationary series may distort the behaviour or properties of the time series model. For example, a regression estimated on non-stationary series may be spurious. Therefore, a unit root test is conducted to determine the stationarity of the series before further modeling techniques are performed.

In practice, a non-stationary series can be transformed into a stationary series by differencing the original series. A series is said to be integrated of order d, written I(d), if it requires differencing d times before it achieves the stationarity condition. A series is said to be stationary when its mean, variance and covariance are constant and independent of time t. To examine the stationarity of the series, the well-known Phillips-Perron (PP) unit root test is used in this study.

The Phillips-Perron test was developed by Phillips and Perron (1988) to overcome a problem in the Augmented Dickey-Fuller (ADF) test. The method is similar to the ADF test except that it incorporates an automatic correction that allows for autocorrelated residuals and heteroscedasticity in the series. The test controls for higher-order serial correlation in the errors by employing a correction factor that estimates the long-run variance of the error process, based on the Newey and West (1987) estimator.

The PP unit root test is based on the following equation

\[ \Delta Y_t = \alpha_0 + \alpha_1 T + \alpha_2 Y_{t-1} + \varepsilon_t \tag{3.5} \]

where $Y$ is the variable being tested, $T$ is the deterministic trend and $\varepsilon$ is the error term. The null hypothesis of the presence of a unit root (the series is non-stationary, $\alpha_2 = 0$) is tested against the alternative hypothesis of the absence of a unit root (the series is stationary, $\alpha_2 < 0$).


3.2 Regression Analysis

The correlation coefficient measures the strength of the relationship between two variables. However, it does not provide information about the size of the change in the dependent variable that results from a change in the independent variable. Linear regression analysis provides this information. In general, regression analysis is used to investigate the relationship between variables.

3.2.1 Time Series Regression

The regression technique was developed to infer relationships between variables in cross-sectional data. The simple linear regression model relating one independent variable (also known as the explanatory variable) to the dependent variable (also known as the response variable) is shown below:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{3.6} \]

where $\beta_0$ is the intercept of the regression line on the y-axis, $\beta_1$ is the slope of the regression line, which indicates the change in $Y_i$ per unit change in $X_i$, and $\varepsilon_i$ is the unobserved random error. Meanwhile, the regression model employed for analysis involving time series variables is shown below:

\[ Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \tag{3.7} \]

The regression model can be extended to involve more than one independent variable. The multiple time series regression model describing the relationship between a response variable $Y$ and a set of $\kappa$ explanatory variables is shown below:

\[ Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots + \beta_\kappa X_{\kappa t} + \varepsilon_t \tag{3.8} \]

where $\beta_0$ is the intercept and $\beta_j$, $j = 1, \ldots, \kappa$, represents the change in the response $Y_t$ per unit change in $X_{jt}$ with all the remaining explanatory variables $X_{it}$ $(i \neq j)$ held constant. The regression model above may also include lags of the dependent and independent variables.

3.2.2 Parameter Estimation and Hypothesis Testing

There are several methods that can be used to estimate the coefficient parameters, but ordinary least squares (OLS) is the most extensively used. OLS, developed by the German mathematician Carl Friedrich Gauss, minimizes the sum of squared distances between the observed response $Y$ in the data set and the predicted response $\hat{Y}$ calculated from the model; each such distance is known as the residual, $\hat{\varepsilon}$. Thus, the general least squares criterion for estimating the coefficients is as follows:

\[ S(\beta_0, \beta_1, \beta_2, \ldots, \beta_\kappa) = \sum_{t=1}^{n} \varepsilon_t^2 = \sum_{t=1}^{n} \left( Y_t - \beta_0 - \sum_{j=1}^{\kappa} \beta_j X_{jt} \right)^2 \tag{3.9} \]

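The function $S$ must be minimized with respect to $\beta_0, \beta_1, \beta_2, \ldots, \beta_\kappa$. Thus the least squares estimators of $\beta_0, \beta_1, \beta_2, \ldots, \beta_\kappa$ must satisfy

\[ \left.\frac{\partial S}{\partial \beta_0}\right|_{\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_\kappa} = -2 \sum_{t=1}^{n} \left( Y_t - \hat{\beta}_0 - \sum_{j=1}^{\kappa} \hat{\beta}_j X_{jt} \right) = 0 \tag{3.10} \]

and

\[ \left.\frac{\partial S}{\partial \beta_j}\right|_{\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_\kappa} = -2 \sum_{t=1}^{n} \left( Y_t - \hat{\beta}_0 - \sum_{i=1}^{\kappa} \hat{\beta}_i X_{it} \right) X_{jt} = 0, \quad j = 1, \ldots, \kappa \tag{3.11} \]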

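Simplifying these two equations yields:

\[
\begin{aligned}
n\hat{\beta}_0 + \hat{\beta}_1 \sum_{t=1}^{n} X_{1t} + \hat{\beta}_2 \sum_{t=1}^{n} X_{2t} + \ldots + \hat{\beta}_\kappa \sum_{t=1}^{n} X_{\kappa t} &= \sum_{t=1}^{n} Y_t \\
\hat{\beta}_0 \sum_{t=1}^{n} X_{1t} + \hat{\beta}_1 \sum_{t=1}^{n} X_{1t}^2 + \hat{\beta}_2 \sum_{t=1}^{n} X_{1t} X_{2t} + \ldots + \hat{\beta}_\kappa \sum_{t=1}^{n} X_{1t} X_{\kappa t} &= \sum_{t=1}^{n} X_{1t} Y_t \\
&\;\;\vdots \\
\hat{\beta}_0 \sum_{t=1}^{n} X_{\kappa t} + \hat{\beta}_1 \sum_{t=1}^{n} X_{\kappa t} X_{1t} + \hat{\beta}_2 \sum_{t=1}^{n} X_{\kappa t} X_{2t} + \ldots + \hat{\beta}_\kappa \sum_{t=1}^{n} X_{\kappa t}^2 &= \sum_{t=1}^{n} X_{\kappa t} Y_t
\end{aligned}
\tag{3.12}
\]

Solving Equation (3.12) gives the least squares estimators of $\beta_0, \beta_1, \beta_2, \ldots, \beta_\kappa$.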

However, it is more convenient to deal with the multiple time series regression using matrix notation. Expressing Equation (3.8) in matrix notation results as follows:

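\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{3.13} \]

where

\[
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1\kappa} \\ 1 & X_{21} & X_{22} & \cdots & X_{2\kappa} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n\kappa} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_\kappa \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]

The least squares estimator $\hat{\boldsymbol{\beta}}$ minimizes the sum of squared errors, which can be written as follows:

\[ S(\boldsymbol{\beta}) = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \tag{3.14} \]

which can be expressed as

\[
\begin{aligned}
S(\boldsymbol{\beta}) &= \mathbf{Y}'\mathbf{Y} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
&= \mathbf{Y}'\mathbf{Y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
\end{aligned}
\tag{3.15}
\]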

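The least squares estimator must satisfy

\[ \left.\frac{\partial S}{\partial \boldsymbol{\beta}}\right|_{\hat{\boldsymbol{\beta}}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0} \tag{3.16} \]

which simplifies to the least squares normal equations below:

\[ \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y} \tag{3.17} \]

Solving Equation (3.17) yields the least squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.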
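The matrix solution of Equation (3.17) can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function name `ols_fit` and the data are hypothetical, and the response is generated without noise so that the known coefficients are recovered exactly.

```python
import numpy as np

def ols_fit(X, y):
    """Solve the normal equations X'X b = X'Y of Equation (3.17)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept
    design = np.column_stack([np.ones(len(y)), X])
    beta_hat = np.linalg.solve(design.T @ design, design.T @ y)
    return design, beta_hat

# Made-up regressors; response built from known coefficients (1, 2, -0.5)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
design, beta_hat = ols_fit(X, y)
print(beta_hat)
```

Solving the linear system directly, rather than forming the inverse of X'X explicitly, is a numerical design choice of this sketch; both routes give the estimator stated above.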

Besides estimating the regression coefficients, estimating the mean square error (MSE) is important, as it is required to assess the prediction ability of the regression model. The MSE is obtained from the residual sum of squares (SSRes), which is defined in Equation (3.18).

\[
\begin{aligned}
SS_{Res} &= (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \\
&= \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}
\end{aligned}
\tag{3.18}
\]

Thus $MSE = \dfrac{SS_{Res}}{n - \kappa}$, with degrees of freedom equal to $n - \kappa$, associated with the parameters that are estimated in the regression model.

In addition, as the model involves multiple predictor variables, the appropriate hypothesis to test the significance of regression model are as follows:

$H_0: \beta_1 = \beta_2 = \ldots = \beta_\kappa = 0$

$H_1: \beta_j \neq 0$ for at least one $j$

Rejecting the null hypothesis implies that at least one of the regressors $X_1, X_2, \ldots, X_\kappa$ contributes significantly to the model. The test procedure is a generalization of the analysis of variance used in simple linear regression. The total sum of squares (SST) is partitioned into a regression sum of squares (SSReg) and a residual sum of squares (SSRes). Thus the test statistic for this hypothesis is:

\[ F_0 = \frac{SS_{Reg} / \kappa}{SS_{Res} / (n - \kappa - 1)} \sim F_{\kappa,\, n - \kappa - 1} \tag{3.19} \]

where $SS_{Reg} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - \dfrac{\left(\sum_{t=1}^{n} Y_t\right)^2}{n}$, $SS_{Res} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$ and $SS_T = \mathbf{Y}'\mathbf{Y} - \dfrac{\left(\sum_{t=1}^{n} Y_t\right)^2}{n}$.

Once the above null hypothesis is rejected, the next step is to determine which variables contribute significantly to the model. The procedure uses the t-test in Equation (3.20), conducted separately for each coefficient.

\[ t_{\beta_j} = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{\alpha/2,\, n - \kappa - 1} \tag{3.20} \]
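The two tests above can be illustrated with a short NumPy sketch. It is not the thesis's implementation: the function name `regression_tests` and the data are hypothetical, the t statistics are computed for the usual null value of zero for each coefficient, and the standard errors are taken as the square roots of the diagonal of MSE·(X'X)⁻¹, which is the textbook form and an assumption here because the text does not spell it out.

```python
import numpy as np

def regression_tests(X, y):
    """Overall F statistic (3.19) and t statistics (3.20) under H0: beta_j = 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                                    # k plays the role of kappa
    design = np.column_stack([np.ones(n), X])         # intercept column
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    ss_res = float(y @ y - beta_hat @ design.T @ y)   # Equation (3.18)
    ss_total = float(y @ y - y.sum() ** 2 / n)
    ss_reg = ss_total - ss_res
    mse = ss_res / (n - k - 1)                        # same divisor as in (3.19)
    f0 = (ss_reg / k) / mse
    se = np.sqrt(mse * np.diag(xtx_inv))              # assumed standard errors
    return {"beta_hat": beta_hat, "ss_res": ss_res, "f0": f0, "t": beta_hat / se}

# Made-up data with a small fixed disturbance (illustration only)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
noise = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.02])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + noise
out = regression_tests(X, y)
print(out["f0"], out["t"])
```

The F and t values would then be compared with the critical values of the corresponding distributions, which this sketch leaves to statistical tables or a statistics library.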

3.2.3 Diagnostics Checking

Despite the significance and goodness-of-fit of the regression model, there are a few assumptions pertaining to the model and error term that need to be examined. The assumptions are that the error term has mean zero [$E(\varepsilon_t) = 0$], has constant variance [$Var(\varepsilon_t) = \sigma_\varepsilon^2$], is serially uncorrelated [$Cov(\varepsilon_t, \varepsilon_{t-1}) = 0$] and is normally distributed. As observations in a time series variable are often correlated, autocorrelation may exist among the errors, thereby violating the assumption of no serial correlation.

The presence of autocorrelation in the errors has several effects on the regression model. These include inefficient estimates of the model coefficients, a mean square error (MSE) that may be seriously underestimated and so gives a false impression of accuracy, and t-tests and F-tests that are no longer valid. A few statistical tests can be used to detect the presence of autocorrelation in the errors. The Durbin-Watson (dw) test, developed by Durbin and Watson (1950, 1951, 1971), is one of the most widely used tests to detect autocorrelation. The null hypothesis of no autocorrelation ($\rho = 0$) in the errors is tested against the alternative hypothesis of the presence of autocorrelation ($\rho \neq 0$) in the errors.

The dw statistic for testing autocorrelation among the errors is calculated as below:

\[ dw = \frac{\sum_{t=2}^{n} (\varepsilon_t - \varepsilon_{t-1})^2}{\sum_{t=1}^{n} \varepsilon_t^2} \tag{3.21} \]

where $\varepsilon_t$ is the residual from the estimated OLS regression.

The computed dw statistic lies between 0 and 4. Its significance is determined by comparison with two critical limits, the lower limit $d_L$ and the upper limit $d_U$. The decision rules of the dw test are summarized in Table 3.2. A test statistic that is smaller than $d_L$ indicates the presence of positive serial autocorrelation in the residuals, and one that is greater than $4 - d_L$ indicates negative serial autocorrelation. A test statistic that falls between $d_U$ and $4 - d_U$ indicates the absence of serial autocorrelation in the residuals. Since many time series show trending behavior, positive autocorrelation typically occurs among the residuals. One of the remedial measures for autocorrelation is to include the lag of the dependent variable.

Table 3.2: Durbin-Watson test decision

| Value of dw | Decision |
|---|---|
| $0 < dw < d_L$ | Reject $H_0$, positive autocorrelation |
| $d_L < dw < d_U$ | No decision |
| $4 - d_L < dw < 4$ | Reject $H_0$, negative autocorrelation |
| $4 - d_U < dw < 4 - d_L$ | No decision |
| $d_U < dw < 4 - d_U$ | Do not reject $H_0$, no autocorrelation |
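Equation (3.21) and the decision rules described above can be sketched in Python as follows. The function names are hypothetical, and the critical limits passed to `dw_decision` are placeholder values; in practice they must be looked up in Durbin-Watson tables for the given sample size and number of regressors.

```python
def durbin_watson(residuals):
    """dw statistic of Equation (3.21), computed from OLS residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def dw_decision(dw, d_lower, d_upper):
    """Map a dw value to a decision, given critical limits d_L and d_U."""
    if dw < d_lower:
        return "reject H0: positive autocorrelation"
    if dw > 4 - d_lower:
        return "reject H0: negative autocorrelation"
    if d_upper < dw < 4 - d_upper:
        return "do not reject H0: no autocorrelation"
    return "no decision"

# Made-up residuals that alternate in sign (strong negative autocorrelation)
residuals = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(residuals)
# d_L = 1.2 and d_U = 1.4 are placeholders, not tabulated critical values
print(dw, dw_decision(dw, d_lower=1.2, d_upper=1.4))
```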


Another assumption pertaining to the regression model is the independence among the explanatory variables. Dependence among the explanatory variables is known as multicollinearity problem. This scenario may occur in the regression model involving more than one explanatory variables, whereby there exist possibility that these variables are highly correlated.

Serious multicollinearity reduces the usefulness of the regression model because it can seriously affect the precision of the estimated regression coefficients. It may inflate the standard errors of the coefficients, leading to insignificant t-statistics for the coefficients. In addition, multicollinearity creates redundant information, leading to a much larger value of the coefficient of determination, R2. Thus, it is important to overcome the multicollinearity problem.

There are a few methods for detecting multicollinearity. A preliminary check can be conducted by computing the correlation between all pairs of explanatory variables, as in Section 3.1.3. A high correlation coefficient, $r > 0.8$, indicates the possibility of a multicollinearity problem. The most common method for detecting multicollinearity is to calculate the variance inflation factor (VIF), given below:

\[ VIF_j = \frac{1}{1 - R_j^2} \tag{3.22} \]

where $R_j^2$ is the coefficient of determination from the auxiliary regression model in which $X_j$ is regressed on the remaining explanatory variables. A value of $VIF_j$ greater than 10 indicates the existence of multicollinearity among the explanatory variables. A serious multicollinearity problem is overcome by removing certain explanatory variables from the regression model (Gujarati, 2003; Wooldridge, 2006).
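A minimal NumPy sketch of Equation (3.22) is given below: each explanatory variable is regressed on the others and the resulting $R_j^2$ is converted into a VIF. The function name `variance_inflation_factors` and the data are hypothetical, and the auxiliary regressions are assumed to include an intercept.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j for every column of X, following Equation (3.22)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression of X_j on an intercept and the remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r_squared = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs

# Two made-up regressors that move almost together (illustration only)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
collinear = variance_inflation_factors(np.column_stack([x1, x2]))
# Two made-up regressors that are exactly uncorrelated
uncorrelated = variance_inflation_factors([[1, 1], [1, -1], [-1, 1], [-1, -1]])
print(collinear, uncorrelated)
```

With the nearly identical regressors the VIF far exceeds the threshold of 10, whereas the uncorrelated pair gives a VIF of 1.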

Equally important in the diagnostic checking is the residual analysis. The analysis involves the detection of outliers or extreme observations that potentially have a severe effect on the regression model. Existence of outliers may lead to violation of normality assumption. Removal of outliers often results in model improvement.

The presence of outliers can be detected through a residual plot and a normal probability plot. However, a better way to identify outliers is to use scaled residuals such as standardized, studentized and R-Student residuals (Montgomery, 2012). In this thesis, outliers are detected using the standardized residual, which can be computed as below:

\[ b_t = \frac{\varepsilon_t}{\sqrt{\dfrac{\sum_{t=1}^{n} \varepsilon_t^2}{n - \kappa}}}, \quad t = 1, \ldots, n \tag{3.23} \]

where $\varepsilon_t = Y_t - \hat{Y}_t$ and $\kappa$ is the number of parameters. If the absolute value of the standardized residual is greater than 3 ($|b_t| > 3$), the observation is an outlier. Outliers can be handled by using impulse dummy variables, which prevents the loss of degrees of freedom that would result from dropping observations.
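Equation (3.23) and the threshold of 3 can be illustrated with the short Python sketch below. The function names and the residuals are hypothetical; the residuals are constructed so that one observation clearly stands out.

```python
import math

def standardized_residuals(residuals, n_params):
    """Scale residuals as in Equation (3.23); n_params plays the role of kappa."""
    n = len(residuals)
    scale = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    return [e / scale for e in residuals]

def flag_outliers(residuals, n_params, threshold=3.0):
    """Return the 1-based time indices whose |b_t| exceeds the threshold."""
    scaled = standardized_residuals(residuals, n_params)
    return [t for t, b in enumerate(scaled, start=1) if abs(b) > threshold]

# Made-up residuals: small values with a single large one at t = 11
residuals = [0.1, -0.1] * 5 + [5.0] + [0.1, -0.1] * 4 + [0.1]
outlier_times = flag_outliers(residuals, n_params=2)
print(outlier_times)
```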

3.3 Box and Jenkins Analysis

The Box and Jenkins approach is due to George E.P. Box and Gwilym M. Jenkins (1976). The approach is synonymous with general ARIMA modeling, which is very common in time series analysis, forecasting and control. The term ARIMA stands for the combination of autoregressive (AR), integrated (I) and moving average (MA) components. The AR model was first introduced by Yule (1926) and Walker (1931). The MA model was first introduced by Slutzky (1937), and the combination of both models, ARMA, was introduced by Wold (1938). Meanwhile, I, or integrated, indicates that differencing is performed to make the series stationary. In addition, as the study involves seasonal series, the seasonal ARIMA model is discussed in detail in the next section.

3.3.1 Box and Jenkins ARIMA Model

The Box and Jenkins method requires the series to be stationary. A series is said to be stationary if it fluctuates randomly around some fixed value, generally the mean of the series, although it could be some other constant value or even zero. In other words, the series is stationary if no trend component is present. Mathematically, $Y_t$ is said to be stationary if it meets the following conditions:

i. The mean is constant, $E(Y_t) = E(Y_{t-1}) = E(Y_{t-2}) = \ldots = c$.

ii. The variance is constant, $Var(Y_t) = E(Y_t - c)^2 = \sigma^2 < \infty$.

iii. The covariance between $Y_t$ and $Y_{t+k}$, $cov(Y_t, Y_{t+k}) = E[(Y_t - c)(Y_{t+k} - c)]$, is independent of $t$ for $k \neq 0$. It depends only on the distance in time $k$ between the two elements $Y_t$ and $Y_{t+k}$. The corresponding $k$th order autocorrelation is

\[
\begin{aligned}
\rho_k &= \frac{E[(Y_t - c)(Y_{t+k} - c)]}{\sqrt{Var(Y_t)\,Var(Y_{t+k})}} \\
&= \frac{Cov(Y_t, Y_{t+k})}{\sigma^2}
\end{aligned}
\tag{3.24}
\]

since, under the stationarity condition, $Var(Y_t) = Var(Y_{t+k}) = \sigma^2$.
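In practice the expectations in Equation (3.24) are replaced by sample averages. The Python sketch below uses the usual sample estimator, in which the lag-k sum of cross-products of deviations from the sample mean is divided by the lag-0 sum; this estimator and the function name `sample_acf` are assumptions of the illustration, not definitions taken from the text.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    c0 = sum(d * d for d in deviations)            # lag-0 sum of squares
    acf = []
    for k in range(max_lag + 1):
        ck = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        acf.append(ck / c0)
    return acf

# Made-up short trending series (illustration only)
acf_values = sample_acf([1, 2, 3, 4, 5], max_lag=2)
print(acf_values)
```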


Thus the possible Box and Jenkins models for a stationary series include the autoregressive (AR), moving average (MA) and autoregressive moving average (ARMA) models. The AR model expresses the series as a function of its previous values and an error term, written mathematically as:

\[ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} + \varepsilon_t \tag{3.25} \]

where $c$ and $\phi_j$ $(j = 1, 2, \ldots, p)$ are constant parameters to be estimated, subject to the constraint $|\phi_j| < 1$. $Y_t$ is the dependent or current value, $Y_{t-p}$ is the $p$th order lag of the dependent variable, and $\varepsilon_t$ is the error term, which is assumed to be normally and identically distributed with zero mean and constant variance.

On the other hand, the MA model links the current value of the time series with random errors that occurred in previous periods rather than with the values of the actual series themselves. Mathematically, the general equation of a moving average of order $q$ is written as:

\[ Y_t = c - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{3.26} \]

where $c$ is the constant mean about which the series fluctuates, $\theta_j$ $(j = 1, 2, \ldots, q)$ are the MA parameters to be estimated and $\varepsilon_{t-j}$ $(j = 1, 2, \ldots, q)$ are the error terms, which are assumed to be independently distributed over time.

Aside from using AR or MA independently, these two models can be combined to produce another distinct model. Such a model is known as the ARMA model, where ARMA(p, q) is written as in Equation (3.27).

\[ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{3.27} \]
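The recursion in Equation (3.27) can be made concrete with a small Python sketch that builds a series from a given sequence of errors. The function name `arma_path` is hypothetical, and pre-sample values of the series and of the errors are assumed to contribute nothing, which is a simplification made for the illustration.

```python
import random

def arma_path(c, phi, theta, errors):
    """Generate Y_t from Equation (3.27) for a given list of error terms."""
    y = []
    for t, e in enumerate(errors):
        # AR part: phi_1*Y_{t-1} + ... + phi_p*Y_{t-p}, skipping pre-sample terms
        ar_part = sum(phi[j] * y[t - 1 - j]
                      for j in range(len(phi)) if t - 1 - j >= 0)
        # MA part: theta_1*e_{t-1} + ... + theta_q*e_{t-q}, skipping pre-sample terms
        ma_part = sum(theta[j] * errors[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(c + ar_part - ma_part + e)
    return y

ar1 = arma_path(1.0, [0.5], [], [0.0, 0.0, 0.0])      # AR(1) with no shocks
ma1 = arma_path(0.0, [], [0.4], [1.0, 0.0, 0.0])      # MA(1) with one unit shock

# ARMA(1,1) driven by simulated normal errors (seed fixed for repeatability)
rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(50)]
arma11 = arma_path(0.0, [0.5], [0.4], shocks)
print(ar1, ma1, arma11[:3])
```

Setting `theta` to an empty list reduces the recursion to the AR model of Equation (3.25), and setting `phi` to an empty list reduces it to the MA model of Equation (3.26).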


However, as mentioned by Nelson and Plosser (1982), non-stationary series are fairly common in the real world. Non-stationary series can be categorized into two classes: non-stationary in mean and non-stationary in variance. Fortunately, a simple procedure can transform a non-stationary series into a stationary series. This is done by differencing the series, which is analogous to removing the trend pattern from the actual data.

Occasionally, taking the first difference may not be sufficient to transform the series into a stationary series, which calls for second-order differencing. However, in most economic time series, differencing beyond the second order is rarely needed.

A series that requires first differencing to be stationary is said to be integrated of order one, or a first-order integrated series, $Y_t \sim I(1)$. The first difference is written as in Equation (3.28).

\[ \Delta Y_t = Y_t - Y_{t-1} \tag{3.28} \]

where the symbol $\Delta$ denotes the differencing process. The second-order differencing, for $Y_t \sim I(2)$, can be written as in Equation (3.29)

\[
\begin{aligned}
\Delta(\Delta Y_t) &= \Delta Y_t - \Delta Y_{t-1} \\
&= (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) \\
&= Y_t - 2Y_{t-1} + Y_{t-2}
\end{aligned}
\tag{3.29}
\]

where $\Delta Y_t$ is the first difference.

Thus, Box and Jenkins models for non-stationary series are known as ARIMA(p,d,q), where d denotes the degree of differencing required to achieve stationarity in the series. A simple example is ARIMA(1,1,1), which can be written as in Equation (3.30).

\[ W_t = c + \phi_1 W_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t \tag{3.30} \]

where $W_t = Y_t - Y_{t-1}$ represents the first difference of the series and is assumed to be stationary. In this case $p = 1$, $d = 1$, $q = 1$. Alternatively, substituting $W_t = Y_t - Y_{t-1}$ into Equation (3.30) gives:

\[ Y_t = c + Y_{t-1} + \phi_1 Y_{t-1} - \phi_1 Y_{t-2} + \varepsilon_t - \theta_1 \varepsilon_{t-1} \tag{3.31} \]

For a series that involves a seasonal component, additional differencing is necessary to eliminate the seasonality effect. This is called seasonal differencing and is performed as follows:

\[ Z_t = Y_t - Y_{t-s} \tag{3.32} \]

where $Z_t$ is the seasonal difference and $s$ denotes the number of seasons. In this case $s = 12$ for a monthly series. However, there are cases where seasonal differencing alone does not result in a stationary series. Non-seasonal differencing is then applied as well, so that $Z_t$ is further differenced as $Z_t - Z_{t-1}$.
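The differencing operations of Equations (3.28), (3.29) and (3.32) can be sketched with a single helper function. The name `difference` and the series below are hypothetical; the series is built from a linear trend plus a fixed 12-month pattern so that the effect of each operation is easy to see.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Made-up monthly series: linear trend plus a repeating 12-month pattern
pattern = [5, 3, 4, 6, 8, 9, 7, 6, 5, 4, 6, 10]
y = [2 * t + pattern[t % 12] for t in range(36)]

first_diff = difference(y)                    # Equation (3.28)
second_diff = difference(first_diff)          # Equation (3.29)
seasonal_diff = difference(y, lag=12)         # Equation (3.32) with s = 12
seasonal_then_first = difference(seasonal_diff)   # Z_t - Z_{t-1}
print(seasonal_diff[:3], seasonal_then_first[:3])
```

For this constructed series the seasonal difference removes the repeating pattern and leaves a constant, and the additional first difference removes that constant as well.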

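A model that involves a seasonal component is known as seasonal ARIMA, and in this thesis it is written as SARIMA(p,d,q)(P,D,Q)s, where p, d, q are the ARIMA orders for the non-seasonal part and P, D, Q are the ARIMA orders for the seasonal part. A simple example is SARIMA(1,1,1)(1,1,1)12, as written in Equation (3.33).

\[ Z_t = Z_{t-1} + \phi_1 Z_{t-1} - \phi_1 Z_{t-2} + \varepsilon_t - \theta_1 \varepsilon_{t-1} \tag{3.33} \]

where $Z_t = Y_t - Y_{t-12}$. Equation (3.33) displays only the non-seasonal AR and MA terms; in the full model the seasonal AR and MA terms at lag 12 enter multiplicatively.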

3.3.2 Box and Jenkins Model Identification

The Box and Jenkins methodology consists of three main stages: identification, estimation and validation, and model application. Before these stages are implemented, the series must be stationary, have constant variance and contain no missing values. The three main stages of the Box and Jenkins methodology are summarized in Figure 3.1.

The first step in developing a Box-Jenkins model is to identify the most suitable class of model for the given series. This is the most difficult stage, and it is carried out by computing, analyzing and plotting various statistics based on historical data. The most common statistics used to identify the model are the autocorrelation and partial autocorrelation coefficients, both of which measure the degree of interdependence among the observations in the series.

The coefficients, which take values between −1 and +1, reveal important information about the series as well as the most appropriate model that can be formulated for it.

Figure 3.1: Box and Jenkins methodology (Source: Lazim, 2005)


Recall that correlation measures the strength of the relationship between variables. In the context of time series, the prefix auto denotes the time factor, and hence autocorrelation measures the degree of relationship between the current values of a time series and its past values. Generally, autocorrelation is denoted by $\rho_k$ and formulated as in Equation (3.24). The sequence of autocorrelation values is defined as the autocorrelation function (ACF), and when plotted against the lag it is known as a correlogram.

On the other hand, the partial autocorrelation, denoted by $\rho_{kk}$, measures the degree of association between $Y_t$ and $Y_{t-k}$ when the effects of the other time lags 1, 2, 3, …, up to $k - 1$ are eliminated, or partialled out. The partial autocorrelation of order p is defined as the last autoregressive coefficient of an AR(p) model. For example, in the equations of the AR(1), AR(2), AR(3), …, AR(k−1) and AR(k) processes, the last coefficient in each equation is the partial autocorrelation coefficient of that order.
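One common way to obtain these last coefficients without fitting every AR(k) model separately is the Durbin-Levinson recursion, which computes them from the autocorrelations. The text does not name this algorithm, so the Python sketch below, including the function name `pacf_from_acf`, is an illustrative assumption rather than the method used in the thesis.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations via Durbin-Levinson.

    rho[0] must equal 1 and rho[k] is the lag-k autocorrelation.
    Returns the list [rho_11, rho_22, ..., rho_KK] with K = len(rho) - 1.
    """
    pacf = []
    prev = []                                   # coefficients of the AR(k-1) fit
    for k in range(1, len(rho)):
        if k == 1:
            last = rho[1]
            current = [last]
        else:
            num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            last = num / den
            # Update the first k-1 coefficients, then append the new last one
            current = [prev[j - 1] - last * prev[k - j - 1] for j in range(1, k)]
            current.append(last)
        pacf.append(last)
        prev = current
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.6: rho_k = 0.6 ** k
rho = [0.6 ** k for k in range(4)]
partials = pacf_from_acf(rho)
print(partials)
```

For the AR(1) autocorrelations used here, the partial autocorrelation is 0.6 at lag 1 and vanishes at higher lags, which is the cut-off behaviour used to identify the order of an AR model.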

The most suitable class of Box and Jenkins model can be determined by plotting the autocorrelations $\rho_k$ (for $k = 1, 2, 3, \ldots$). If the ACF tails off exponentially quickly as the lag k increases, the series is stationary. The identification of the ARMA order then proceeds by plotting the PACF. The model is identified as AR(p) if the PACF cuts off at lag p while the ACF tails off. Meanwhile, if the ACF cuts off at lag q while the PACF tails off exponentially fast, the model may be MA(q). Otherwise, if both the ACF and PACF tail off, the model may be ARMA(p,q).


On the other hand, if the ACF tails off slowly and shows a trend, the series is non-stationary and differencing must be performed to achieve stationarity. The identification then proceeds as for the stationary series discussed above. The only difference is that the model involves an integrated series of order $d$, where $d$ is the number of times the series must be differenced to achieve stationarity. The identification process of the Box and Jenkins model is summarized in Table 3.3. For a seasonal series, seasonal differencing must be performed and the seasonal part is identified at the seasonal lags. For a clearer picture, the step-by-step identification procedure is illustrated in Figure 3.2.

Table 3.3: Model identification

| Process | ACF | PACF |
|---|---|---|
| AR(p) | Tails off | Cuts off at non-seasonal lag p |
| MA(q) | Cuts off at non-seasonal lag q | Tails off |
| ARMA(p,q) | Tails off | Tails off |
| SAR(P) | Tails off seasonally | Cuts off at seasonal lag P |
| SMA(Q) | Cuts off at seasonal lag Q | Tails off seasonally |
| SARMA(P,Q) | Tails off seasonally | Tails off seasonally |

Figure 3.2: Box and Jenkins model identification process


The aim of Box and Jenkins modeling is to obtain a model that is parsimonious and adequately fits the patterns in the observed data. However, reading the ACF and PACF plots is often the trickiest part, because visual judgement can be misleading and inefficient. Therefore, several candidate models must be fitted, and the best model is chosen only if all the statistical requirements are satisfied.

3.3.3 Box and Jenkins Model Estimation and Validation

Once the Box and Jenkins model is identified, the parameters of the selected model are estimated. Several methods are commonly used to estimate the parameters, such as maximum likelihood, the method of moments and least squares. As with time series regression, the least squares method is applied here to estimate the Box and Jenkins parameters. This process estimates the coefficients that minimize the difference between the actual and the predicted values. However, the estimated coefficients are subject to the stationarity and invertibility conditions. Consider an AR($p$) model with mean $c$:

$$Y_t - c = \phi_1 (Y_{t-1} - c) + \cdots + \phi_p (Y_{t-p} - c) + \varepsilon_t \tag{3.34}$$

Given $\{Y_1, Y_2, \ldots, Y_n\}$, the parameters $c$ and $\phi_i$ ($i = 1, 2, \ldots, p$) can be estimated by minimizing the sum of squared prediction errors:

$$S = \sum_{t=p+1}^{n} \left[ Y_t - c - \phi_1 (Y_{t-1} - c) - \cdots - \phi_p (Y_{t-p} - c) \right]^2 \tag{3.35}$$

To obtain the estimate of the mean, $c$, differentiate $S$ with respect to $c$ and set the derivative to zero. Likewise, differentiate $S$ with respect to each $\phi_i$ ($i = 1, 2, \ldots, p$) and set the derivatives to zero to obtain the estimates of the $\phi_i$ parameters.
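The minimization of the sum of squares in Equation (3.35) can be sketched as an ordinary regression of $Y_t$ on an intercept and its own lags. This is an illustrative sketch only, assuming NumPy and simulated AR(2) data; the function name `fit_ar_least_squares` is hypothetical. The regression intercept equals $c(1 - \sum_i \phi_i)$, so the mean $c$ is recovered from it afterwards.

```python
import numpy as np

def fit_ar_least_squares(y, p):
    """Conditional least squares for Y_t - c = sum_i phi_i (Y_{t-i} - c) + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # One row per t = p+1, ..., n: [1, Y_{t-1}, ..., Y_{t-p}]
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - i:n - i] for i in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    intercept, phi = beta[0], beta[1:]
    c = intercept / (1.0 - phi.sum())   # intercept = c * (1 - sum of phi_i)
    resid = target - X @ beta
    return c, phi, float(resid @ resid)  # mean, AR coefficients, minimized S

# Simulated AR(2) data with c = 10 and phi = (0.5, -0.3); illustrative only
rng = np.random.default_rng(1)
n, c_true, phi_true = 2000, 10.0, np.array([0.5, -0.3])
y = np.full(n, c_true)
for t in range(2, n):
    y[t] = (c_true + phi_true[0] * (y[t - 1] - c_true)
            + phi_true[1] * (y[t - 2] - c_true) + rng.normal())

c_hat, phi_hat, S_min = fit_ar_least_squares(y, 2)
```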


On the other hand, if the model involves an MA term, the estimation is more involved. The problem can be handled using the AR($\infty$) representation of the MA process, but the prediction is then a non-linear function of $\theta_q$. Consider the MA(1) model:

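$$Y_t = c + \varepsilon_t - \theta_1 \varepsilon_{t-1} \iff \varepsilon_t = Y_t - c + \theta_1 \varepsilon_{t-1} \tag{3.36}$$

Assume $\varepsilon_0 = 0$; thus $\varepsilon_1 = Y_1 - c$, $\varepsilon_2 = Y_2 - c + \theta_1 \varepsilon_1$, …, $\varepsilon_n = Y_n - c + \theta_1 \varepsilon_{n-1}$.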

It can be noticed that the prediction errors are not independent of each other. In this case, to minimize $\sum_{t=1}^{n} \varepsilon_t^2$, various values of $\{c, \theta_1\}$ must be considered, and the pair $\{\hat{c}, \hat{\theta}_1\}$ that minimizes $\sum_{t=1}^{n} \varepsilon_t^2$ is chosen. Similar procedures are applied to higher-order models, including ARIMA and seasonal ARIMA models. In addition, it is important to keep the model parsimonious: unnecessary terms must be excluded, as they complicate the model and may reduce forecasting performance.
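The search over $\{c, \theta_1\}$ described above can be sketched with a simple grid. This is only an illustration under stated assumptions: NumPy, simulated MA(1) data, a coarse grid restricted to the invertible region $|\theta_1| < 1$, and hypothetical function names. In practice a numerical optimizer would normally replace the grid.

```python
import numpy as np

def ma1_sum_of_squares(y, c, theta):
    """Sum of squared errors with e_0 = 0 and e_t = Y_t - c + theta * e_{t-1}."""
    e_prev, total = 0.0, 0.0
    for value in y:
        e_t = value - c + theta * e_prev
        total += e_t ** 2
        e_prev = e_t
    return total

def fit_ma1_grid(y, c_grid, theta_grid):
    """Return the (c, theta) pair on the grid with the smallest sum of squares."""
    best = min((ma1_sum_of_squares(y, c, th), c, th)
               for c in c_grid for th in theta_grid)
    return best[1], best[2], best[0]

# Simulated MA(1) data: Y_t = 5 + e_t - 0.6 e_{t-1} (illustrative only)
rng = np.random.default_rng(2)
eps = rng.normal(size=501)
y = 5.0 + eps[1:] - 0.6 * eps[:-1]

c_hat, theta_hat, s_min = fit_ma1_grid(
    y, np.linspace(4.5, 5.5, 21), np.linspace(-0.9, 0.9, 37))
```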

Furthermore, the residuals are checked for the absence of autocorrelation. The presence of residual autocorrelation indicates that the model is misspecified: important parameters may have been omitted, or unimportant parameters may have been mistakenly included. Reformulating the model may result in a better-fitting model. This stage of the process is known as model diagnostic checking. It ensures that the residuals are normally and identically distributed with mean zero and constant variance. The process of estimation and diagnostic checking may be repeated several times before the final model is obtained.

In addition, more than one model may explain the series adequately. In this case, an information criterion is used. The criterion indicates whether the model can be used with confidence or whether the whole process needs to be repeated to identify a better model. The most common criterion is the Akaike Information Criterion (AIC). A model is considered to have a better fit if it has a smaller AIC value.

Once the model has been selected, estimated and checked against all test criteria, and its fit has been confirmed, producing forecasts is usually straightforward. However, if the model fails to produce reliable forecasts or fails to explain the phenomenon being studied, the model needs to be revised and updated.

3.4 Structural Time Series

The traditional time series decomposition can be written as the sum of trend, $\mu_t$, seasonal, $\gamma_t$, and irregular, $\varepsilon_t$, components as below:

$$Y_t = \mu_t + \gamma_t + \varepsilon_t \tag{3.37}$$

However, the model above can be inadequate because it treats all the components deterministically. Modifying the model by letting the time series components follow an autoregressive process of order one may result in a more flexible model that fits almost all time series. A model that allows for time-varying components in this way is known as a structural time series (STS) or state space time series model.

The STS model can easily be extended to handle specific features of a time series that are difficult to handle with an ARIMA model, such as seasonality and trend. Moreover, the STS model is a type of regression model in which the explanatory variables are functions of time and the parameters are allowed to vary over time. The specification of an STS model may involve any combination of trend, cycle and seasonal components. The appropriate specification depends on the features of the series under investigation and on any prior knowledge of the system.


Structural model with a few combinations of the components will be discussed in the next sections.

3.4.1 Trend Model

The basic STS model deals with a series whose underlying level changes over time. The series can be thought of as having a trend, $\mu_t$, that evolves according to a random walk observed with noise, which can be written as below:

$$\begin{aligned} Y_t &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim NID(0, \sigma_\varepsilon^2) \\ \mu_t &= \mu_{t-1} + \eta_t, & \eta_t &\sim NID(0, \sigma_\eta^2) \end{aligned} \tag{3.38}$$

where the irregular component or measurement error term, $\varepsilon_t$, and the level error term, $\eta_t$, are white noise processes. Equation (3.38) is popularly known as the local level (LL) model. Setting the variance of the level error term to zero, $\sigma_\eta^2 = 0$, reduces the model to the deterministic level (DL) model. On the other hand, when the variance of the measurement error term is set to zero, $\sigma_\varepsilon^2 = 0$, the model reduces to a pure random walk in which the trend coincides with the observations.

If the series displays a steady upward or downward movement, a deterministic linear trend model $\mu_t = c + \nu t$ will be inadequate in explaining the series. Further flexibility can be obtained by writing the trend recursively as $\mu_t = \mu_{t-1} + \nu_{t-1}$ and $\nu_t = \nu_{t-1}$ for the level and slope respectively, with the initial values $\mu_0 = c$ and $\nu_0 = \nu$. Random disturbances are then introduced on the right-hand side of the recursive formulae to allow time variation. The resulting formulation is known as the local linear trend (LT) model and is written as:


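$$\begin{aligned} Y_t &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim NID(0, \sigma_\varepsilon^2) \\ \mu_t &= \mu_{t-1} + \nu_{t-1} + \eta_t, & \eta_t &\sim NID(0, \sigma_\eta^2) \\ \nu_t &= \nu_{t-1} + \varsigma_t, & \varsigma_t &\sim NID(0, \sigma_\varsigma^2) \end{aligned} \tag{3.39}$$

where $\varsigma_t$ is a white noise error term for the slope component that is independent of $\varepsilon_t$ and $\eta_t$. Setting $\sigma_\varsigma^2 = 0$, the LT model in Equation (3.39) reduces to a random walk with constant drift, also known as the local level with drift (LD) model:

$$\mu_t = \mu_{t-1} + \nu + \eta_t, \qquad \eta_t \sim NID(0, \sigma_\eta^2) \tag{3.40}$$

whereas if $\sigma_\eta^2 = 0$, the trend is an integrated random walk, also referred to as the smooth trend (ST) model because the resulting trend varies very smoothly over time:

$$\begin{aligned} \mu_t &= \mu_{t-1} + \nu_{t-1} \\ \nu_t &= \nu_{t-1} + \varsigma_t, \qquad \varsigma_t \sim NID(0, \sigma_\varsigma^2) \end{aligned} \tag{3.41}$$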

If the variances of both the level and slope error terms are fixed at zero, the model reduces to the deterministic linear trend (DT) model.
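The way the trend variants nest inside the local linear trend model can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the thesis's analysis: it assumes NumPy, the function name and default values are hypothetical, and the disturbances are drawn as independent normals as in Equation (3.39).

```python
import numpy as np

def simulate_local_linear_trend(n, sigma_eps, sigma_eta, sigma_zeta,
                                mu0=0.0, nu0=0.0, seed=0):
    """Simulate Y_t = mu_t + eps_t with a local linear trend for mu_t."""
    rng = np.random.default_rng(seed)
    mu, nu = mu0, nu0
    y = np.empty(n)
    level = np.empty(n)
    for t in range(n):
        # mu_t = mu_{t-1} + nu_{t-1} + eta_t ;  nu_t = nu_{t-1} + zeta_t
        mu = mu + nu + sigma_eta * rng.normal()
        nu = nu + sigma_zeta * rng.normal()
        level[t] = mu
        y[t] = mu + sigma_eps * rng.normal()
    return y, level

# Full LT model (all three standard deviations positive; values are arbitrary)
y_lt, level_lt = simulate_local_linear_trend(120, 1.0, 0.5, 0.1)

# Special cases obtained by switching disturbances off
y_ld, level_ld = simulate_local_linear_trend(120, 1.0, 0.5, 0.0, nu0=0.2)  # LD
y_st, level_st = simulate_local_linear_trend(120, 1.0, 0.0, 0.1)           # ST
y_dt, level_dt = simulate_local_linear_trend(50, 0.0, 0.0, 0.0,
                                             mu0=2.0, nu0=0.5)            # DT
```

With all three standard deviations set to zero, the simulated level is an exact straight line, which matches the deterministic linear trend case.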

3.4.2 Seasonal Model

Time series data are sometimes influenced by seasonal variation, such as changes in weather, calendar effects and policy. Seasonality in an STS model is handled either in trigonometric form or in dummy seasonal form. This study uses the dummy seasonal form to model seasonal variation. A fixed or deterministic seasonal pattern is one in which the seasonal effects sum to zero over a year. This can be formally written as:

$$\gamma_t + \gamma_{t-1} + \cdots + \gamma_{t-(s-1)} = 0 \quad \text{or} \quad \sum_{j=0}^{s-1} \gamma_{t-j} = 0$$

where $s$ is the periodicity of the seasonal component. As with the other time series components, a stochastic seasonal model is a more flexible alternative that allows the seasonal effects to change over time, by letting their sum over the previous year equal a random error term, $\omega_t$, that is normally and independently distributed with zero mean and variance $\sigma_\omega^2$:

$$\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t, \qquad \omega_t \sim NID(0, \sigma_\omega^2) \tag{3.42}$$

where $s$ denotes the number of seasons and $\omega_t$ denotes the seasonal error term, which is independent of all other error terms.

Combining the various specifications of the trend and seasonal components discussed above results in twelve possible STS models, which are summarized in Table 3.4.

Table 3.4: Structural time series specification model (trend + seasonal)

Every model in the table shares the measurement equation $Y_t = \mu_t + \gamma_t + \varepsilon_t$, $\varepsilon_t \sim NID(0, \sigma_\varepsilon^2)$. The deterministic seasonal is $\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j}$ and the stochastic seasonal is $\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t$, $\omega_t \sim NID(0, \sigma_\omega^2)$.

| Abbr | Model | Trend equations | Seasonal |
|---|---|---|---|
| DLDS | Deterministic level with deterministic seasonal | $\mu_t = \mu$ | Deterministic |
| DLSS | Deterministic level with stochastic seasonal | $\mu_t = \mu$ | Stochastic |
| LLDS | Local level with deterministic seasonal | $\mu_t = \mu_{t-1} + \eta_t$, $\eta_t \sim NID(0, \sigma_\eta^2)$ | Deterministic |
| LLSS | Local level with stochastic seasonal | $\mu_t = \mu_{t-1} + \eta_t$, $\eta_t \sim NID(0, \sigma_\eta^2)$ | Stochastic |
| DTDS | Deterministic trend with deterministic seasonal | $\mu_t = \mu_{t-1} + \nu_t$, $\nu_t = \nu$ | Deterministic |
| DTSS | Deterministic trend with stochastic seasonal | $\mu_t = \mu_{t-1} + \nu_t$, $\nu_t = \nu$ | Stochastic |
| STSS | Smooth trend with stochastic seasonal | $\mu_t = \mu_{t-1} + \nu_{t-1}$; $\nu_t = \nu_{t-1} + \varsigma_t$, $\varsigma_t \sim NID(0, \sigma_\varsigma^2)$ | Stochastic |
| STDS | Smooth trend with deterministic seasonal | $\mu_t = \mu_{t-1} + \nu_{t-1}$; $\nu_t = \nu_{t-1} + \varsigma_t$, $\varsigma_t \sim NID(0, \sigma_\varsigma^2)$ | Deterministic |
| LDDS | Local level drift with deterministic seasonal | $\mu_t = \mu_{t-1} + \nu_t + \eta_t$, $\eta_t \sim NID(0, \sigma_\eta^2)$; $\nu_t = \nu$ | Deterministic |
| LDSS | Local level drift with stochastic seasonal | $\mu_t = \mu_{t-1} + \nu_t + \eta_t$, $\eta_t \sim NID(0, \sigma_\eta^2)$; $\nu_t = \nu$ | Stochastic |
| LTSS | Local linear trend with stochastic seasonal | $\mu_t = \mu_{t-1} + \nu_{t-1} + \eta_t$, $\eta_t \sim NID(0, \sigma_\eta^2)$; $\nu_t = \nu_{t-1} + \varsigma_t$, $\varsigma_t \sim NID(0, \sigma_\varsigma^2)$ | Stochastic |
| LTDS | Local linear trend with deterministic seasonal | $\mu_t = \mu_{t-1} + \nu_{t-1} + \eta_t$, $\eta_t \sim NID(0, \sigma_\eta^2)$; $\nu_t = \nu_{t-1} + \varsigma_t$, $\varsigma_t \sim NID(0, \sigma_\varsigma^2)$ | Deterministic |


3.4.3 Incorporating Explanatory and Intervention Variable

Relating a series to its time series components alone may not be enough to describe its pattern, because the series may also be influenced by external factors. For example, an increase in traffic accidents is related to an increase in population. The influence of external factors can be investigated in an STS model by adding explanatory and intervention variables to the equation as follows:

$$Y_t = \mu_t + \sum_{j=1}^{k} \beta_{jt} X_{jt} + \lambda_t I_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \sigma_\varepsilon^2) \tag{3.43}$$

where $X_{jt}$ is an explanatory variable drawn from the 10 variables explained in Section 1.5, $I_t$ is an intervention dummy variable that captures a sudden change due to an outlier or structural break, and $\beta_{jt}$ and $\lambda_t$ are the unknown parameters to be estimated.

3.4.4 State Space Form

The STS models considered in the previous sections can be expressed as a summary of the dynamics of a system, known as the state space form. This representation allows a unified statistical treatment of STS models and thus provides a powerful tool for estimating the STS parameters (Allen, 1999; Hannonen, 2005; Durbin and Koopman, 2012). The state space form consists of two equations: the measurement equation and the transition, or state, equation.

The measurement equation, also called the observation equation, describes the relationship between the observed (dependent) variable and the unobserved state variables. The measurement equation may also include other observed explanatory variables. Meanwhile, the state equation describes the dynamic evolution of the state components. Following Harvey (1989), the measurement and transition equations of the system can be written as follows:

$$\begin{aligned} Y_t &= Z_t \alpha_t + \varepsilon_t \\ \alpha_t &= T_t \alpha_{t-1} + R_t \tau_t \end{aligned} \tag{3.44}$$

for $t = 1, \ldots, n$, where $Y_t$ is an $(n \times 1)$ vector of dependent variables, $\alpha_t$ is an $(m \times 1)$ state vector, $Z_t$ is an $(n \times m)$ matrix of cycle and trend components, and $\varepsilon_t$ is an $(n \times 1)$ vector of serially uncorrelated measurement errors such that $\varepsilon_t \sim NID(0, H_t)$. The matrix $T_t$ is an $(m \times m)$ state transfer matrix, $\tau_t$ is a $(g \times 1)$ vector of serially uncorrelated error terms such that $\tau_t \sim NID(0, Q_t)$, and $R_t$ is an $(m \times g)$ matrix related to the error term.

The matrices $Z_t$, $T_t$, $R_t$, $H_t$ and $Q_t$ are called the system matrices. Specifically, $Z_t$ and $T_t$ are the state matrices, while $H_t$ and $Q_t$ can be regarded as the error variance matrices. In many applications the state space model is time invariant, so these matrices are assumed to be non-stochastic and can be written without subscripts. The specification of the state space form is completed with the following two assumptions:

i. The initial conditions are given by $E(\hat{\alpha}_0) = \alpha_0$ and $Var(\hat{\alpha}_0) = P_0$.

ii. The error terms or disturbances are uncorrelated with each other and with the initial state: $E(\varepsilon_{t-i} \tau'_{t-j}) = 0$ for $i \neq j$, $t = 1, \ldots, n$, and $E(\varepsilon_{t-i} \alpha'_0) = 0$, $E(\tau_{t-i} \alpha'_0) = 0$ for all $t = 1, \ldots, n$.

Once the STS is written in a state space form, the Kalman filter can be applied for estimating the unobserved state variable.
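To make the state space form concrete, the sketch below builds time-invariant system matrices for a local linear trend with a dummy seasonal of period $s$. It is an illustration under assumptions: NumPy is used, the function name is hypothetical, and the ordering of the state vector $(\mu_t, \nu_t, \gamma_t, \gamma_{t-1}, \ldots, \gamma_{t-s+2})'$ is one common choice rather than one prescribed in the text.

```python
import numpy as np

def lt_dummy_seasonal_matrices(s, var_eta, var_zeta, var_omega, var_eps):
    """System matrices Z, T, R, Q, H for local linear trend + dummy seasonal."""
    m = 2 + (s - 1)                      # level, slope, s-1 seasonal states
    Z = np.zeros((1, m))
    Z[0, 0] = 1.0                        # level enters the observation
    Z[0, 2] = 1.0                        # current seasonal enters the observation
    T = np.zeros((m, m))
    T[0, 0] = T[0, 1] = 1.0              # mu_t = mu_{t-1} + nu_{t-1} + eta_t
    T[1, 1] = 1.0                        # nu_t = nu_{t-1} + zeta_t
    T[2, 2:] = -1.0                      # gamma_t = -sum of last s-1 seasonals + omega_t
    for i in range(s - 2):
        T[3 + i, 2 + i] = 1.0            # shift older seasonal effects down
    R = np.zeros((m, 3))
    R[0, 0] = R[1, 1] = R[2, 2] = 1.0    # eta, zeta, omega enter the first three states
    Q = np.diag([var_eta, var_zeta, var_omega])
    H = np.array([[var_eps]])
    return Z, T, R, Q, H

# Quarterly example (s = 4) with arbitrary illustrative variances
Z, T, R, Q, H = lt_dummy_seasonal_matrices(4, 0.5, 0.1, 0.2, 1.0)
state = np.array([10.0, 1.0, 3.0, -1.0, -4.0])   # mu, nu, gamma_t, gamma_{t-1}, gamma_{t-2}
next_state = T @ state                            # transition without disturbances
```

Setting an entry of `Q` to zero switches the corresponding component to its deterministic form, which is how the twelve specifications in Table 3.4 can be obtained from one set of matrices.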


3.4.5 Kalman Filter Estimation

Kalman filter estimation is a recursive procedure that computes the optimal estimate of the unobserved state variable at a given time using all previous observations (Harvey, 1989). The aim of the filter is to update the state variable as each new observation becomes available. The estimation is carried out by performing two passes on the data: a forward pass, from $t = 1, \ldots, n$, using a recursive algorithm known as the Kalman filter that is applied to the observed time series, and a backward pass, from $t = n, \ldots, 1$, using recursive algorithms known as state and disturbance smoothers that are applied to the output of the Kalman filter (Commandeur and Koopman, 2007; Koopman et al., 2006).

A variety of Kalman filter recursive equations are available, but those employed in this study are based on the work of Harvey and Shephard (1993), which assumes that the initial condition of the state vector is $\hat{\alpha}_0 \sim NID(\alpha_0, P_0)$. The Kalman filter estimation consists of two iterative procedures: predicting and updating. The first stage of the Kalman filter recursion is to estimate the 1-step-ahead state vector, $\hat{\alpha}_{t|t-1}$, and the corresponding error covariance of the estimate, $P_{t|t-1}$, based on all information up to and including time $t-1$, $(Y_1, Y_2, \ldots, Y_{t-1})$, using the equations below:

$$\hat{\alpha}_{t|t-1} = E(\alpha_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_1) = T \hat{\alpha}_{t-1|t-1} \tag{3.45}$$

$$P_{t|t-1} = E\left[ (\alpha_t - \hat{\alpha}_{t|t-1})(\alpha_t - \hat{\alpha}_{t|t-1})' \right] = T P_{t-1|t-1} T' + R Q R' \tag{3.46}$$


Given the 1-step-ahead estimate of the state vector, the 1-step-ahead estimate of the measurement variable and the corresponding covariance matrix of the measurement error are given below:

$$\hat{Y}_{t|t-1} = Z \hat{\alpha}_{t|t-1} \tag{3.47}$$

$$F_t = E\left[ (Y_t - \hat{Y}_{t|t-1})(Y_t - \hat{Y}_{t|t-1})' \right] = Z P_{t|t-1} Z' + H \tag{3.48}$$

with the prediction error given by the following equation:

$$v_t = Y_t - Z \hat{\alpha}_{t|t-1} \tag{3.49}$$

The updating stage of the Kalman filter recursion incorporates a new observation into the predicted state vector to obtain an improved estimate. The process involves updating the estimates $\hat{\alpha}_{t|t-1}$ and $P_{t|t-1}$ given a new observation at time $t$, $Y_t$, based on the following equations:

$$\hat{\alpha}_{t|t} = \hat{\alpha}_{t|t-1} + P_{t|t-1} Z' F_t^{-1} (Y_t - Z \hat{\alpha}_{t|t-1}) \tag{3.50}$$

$$P_{t|t} = P_{t|t-1} - P_{t|t-1} Z' F_t^{-1} Z P_{t|t-1} \tag{3.51}$$

The full derivation of the Kalman filter recursive equations, which relies on standard results of multivariate normal theory, can be found in Harvey (1989).
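The predicting and updating stages in Equations (3.45) to (3.51) can be sketched as two small functions. This is a minimal illustration for time-invariant system matrices, assuming NumPy; the function names are hypothetical, and an explicit matrix inverse is used only for readability. The numbers in the example are arbitrary and correspond to a local level model with unit variances.

```python
import numpy as np

def kalman_predict(a_prev, P_prev, T, R, Q):
    """Prediction step: Equations (3.45) and (3.46)."""
    a_pred = T @ a_prev
    P_pred = T @ P_prev @ T.T + R @ Q @ R.T
    return a_pred, P_pred

def kalman_update(a_pred, P_pred, y_t, Z, H):
    """Updating step: Equations (3.47) to (3.51)."""
    y_hat = Z @ a_pred                       # 1-step-ahead prediction, Eq. (3.47)
    F = Z @ P_pred @ Z.T + H                 # prediction error variance, Eq. (3.48)
    v = y_t - y_hat                          # prediction error, Eq. (3.49)
    gain = P_pred @ Z.T @ np.linalg.inv(F)   # P_{t|t-1} Z' F_t^{-1}
    a_upd = a_pred + gain @ v                # Eq. (3.50)
    P_upd = P_pred - gain @ Z @ P_pred       # Eq. (3.51)
    return a_upd, P_upd, v, F

# One filter step for a local level model (Z = T = R = 1), illustrative values
Z = T = R = np.array([[1.0]])
Q = H = np.array([[1.0]])
a0, P0 = np.array([0.0]), np.array([[1.0]])

a_pred, P_pred = kalman_predict(a0, P0, T, R, Q)
a_upd, P_upd, v, F = kalman_update(a_pred, P_pred, np.array([2.0]), Z, H)
```

In this example the predicted variance is 2, the prediction error variance is 3, and the updated level moves two thirds of the way from the prediction (0) towards the observation (2).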

The process of predicting and updating is repeated until the end of the sample period, $t = n$. When all $n$ observations have been processed, the Kalman filter yields the optimal estimator of the current state vector, $\hat{\alpha}_{n|n}$, as well as the predicted state vector for the next time period, $n+1$, $\hat{\alpha}_{n+1|n}$. This estimator contains all the information needed to make optimal predictions of future values, $\hat{Y}_{n+1|n}$, of both the state and the observations. The Kalman filter algorithm places more weight on the most recent observations than on observations from the distant past.

The unknown parameters in the system matrices of the state space form are estimated by maximum likelihood. In this study, the likelihood function based on the prediction error decomposition, as in Harvey (1993) and Kim and Nelson (1999), is used:

$$\log L_d = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{n} \left( \log F_t + \frac{v_t^2}{F_t} \right) \tag{3.52}$$

where $\log L_d$ is the log-likelihood evaluated at a given set of unknown parameters for a specific statistical model.
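The prediction error decomposition in Equation (3.52) can be sketched for the univariate local level model, where $F_t$ and $v_t$ are scalars. This is an illustration only, assuming NumPy; the function name is hypothetical, and the large default initial variance `p0` is an assumed stand-in for a diffuse initial state rather than the initialization used in the thesis.

```python
import numpy as np

def local_level_loglik(y, var_eps, var_eta, a0=0.0, p0=1e7):
    """Log-likelihood of the local level model via Equation (3.52)."""
    a, p = a0, p0
    n = len(y)
    loglik = -0.5 * n * np.log(2 * np.pi)
    for obs in y:
        p_pred = p + var_eta            # P_{t|t-1} for a random-walk level
        f = p_pred + var_eps            # F_t
        v = obs - a                     # v_t (prediction equals previous level)
        loglik -= 0.5 * (np.log(f) + v ** 2 / f)
        gain = p_pred / f
        a = a + gain * v                # updated level
        p = p_pred * (1 - gain)         # updated variance
    return loglik

# Compare two candidate variance settings on made-up data (illustrative only)
y_demo = np.array([4.0, 5.0, 3.5, 6.0, 5.5])
ll_a = local_level_loglik(y_demo, var_eps=1.0, var_eta=0.5)
ll_b = local_level_loglik(y_demo, var_eps=1.0, var_eta=0.0)
```

A numerical optimizer could then be used to search for the variances that maximize this function.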

Once the Kalman filter predicting and updating procedures are completed, the state variable, $\alpha$, can be re-estimated from the complete set of observations using a backward recursion known as smoothing. Several smoothing algorithms exist, with classical fixed-interval smoothing being the most basic. Derivations of the smoothing algorithms can be found in Harvey (1989) and Durbin and Koopman (2012).

The smoothing algorithm consists of a set of backward recursions starting from the final estimates $\hat{\alpha}_{n|n}$ and $P_{n|n}$ obtained from the forward recursion of predicting and updating. The estimate of the state vector at time $t$ given the complete set of observations, with the corresponding covariance matrix of the estimation error, is given as follows:

$$\hat{\alpha}_{t|n} = \hat{\alpha}_{t|t} + P_t^{*} (\hat{\alpha}_{t+1|n} - T \hat{\alpha}_{t|t})$$

$$P_{t|n} = P_{t|t} + P_t^{*} (P_{t+1|n} - P_{t+1|t}) P_t^{*\prime}$$

where $P_t^{*} = P_{t|t} T' P_{t+1|t}^{-1}$ for $t = n-1, n-2, \ldots, 2, 1$. Since the smoothed estimate is based on more information than the updated estimate, it generally has a smaller error covariance matrix than the updated estimate, that is, $P_{t|n} \leq P_{t|t}$.
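The backward recursion can be sketched for the scalar local level model, where $T = 1$ and the smoothing gain reduces to $P_{t|t} / P_{t+1|t}$. This is a minimal illustration assuming NumPy, made-up data and a hypothetical function name; the large initial variance is again an assumed approximation to a diffuse start.

```python
import numpy as np

def local_level_filter_smoother(y, var_eps, var_eta, a0=0.0, p0=1e7):
    """Forward Kalman filter followed by fixed-interval smoothing (T = 1)."""
    n = len(y)
    a_filt = np.empty(n)
    p_filt = np.empty(n)
    p_pred = np.empty(n)
    a, p = a0, p0
    for t in range(n):                        # forward pass
        p_pred[t] = p + var_eta               # P_{t|t-1}
        gain = p_pred[t] / (p_pred[t] + var_eps)
        a = a + gain * (y[t] - a)
        p = p_pred[t] * (1 - gain)
        a_filt[t], p_filt[t] = a, p
    a_smooth = a_filt.copy()
    p_smooth = p_filt.copy()
    for t in range(n - 2, -1, -1):            # backward pass, t = n-1, ..., 1
        p_star = p_filt[t] / p_pred[t + 1]    # P_{t|t} T' P_{t+1|t}^{-1} with T = 1
        a_smooth[t] = a_filt[t] + p_star * (a_smooth[t + 1] - a_filt[t])
        p_smooth[t] = p_filt[t] + p_star ** 2 * (p_smooth[t + 1] - p_pred[t + 1])
    return a_filt, p_filt, a_smooth, p_smooth

y_demo = np.array([4.0, 5.0, 3.5, 6.0, 5.5])  # illustrative data only
a_f, p_f, a_s, p_s = local_level_filter_smoother(y_demo, 1.0, 0.5)
```

The smoothed variances returned by this sketch are never larger than the filtered ones, in line with the inequality stated above.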

3.5 Evaluation of Structural Time Series Model

As with other Gaussian models, the residuals should satisfy three assumptions (in order of importance): independence, homoscedasticity and normality.

3.5.1 Model Diagnostic

The assumption of independence of the residuals can be checked using the Ljung-Box (LB) test, also known as the portmanteau test. The null hypothesis of an uncorrelated series is tested against the alternative hypothesis that serial correlation is present. The LB test statistic is calculated as follows:

$$LB(k) = n(n+2) \sum_{l=1}^{k} \frac{r_l^2}{n-l}, \qquad l = 1, 2, \ldots, k \tag{3.53}$$

where $r_k$ is the residual autocorrelation at lag $k$, calculated as below:

$$r_k = \frac{\sum_{t=1}^{n-k} (v_t - \bar{v})(v_{t+k} - \bar{v})}{\sum_{t=1}^{n} (v_t - \bar{v})^2} \tag{3.54}$$

where $\bar{v}$ is the mean of the residuals. The statistic is tested against a chi-square distribution with $(k - w + 1)$ degrees of freedom, where $w$ is the number of estimated disturbance variances. If $LB(k) < \chi^2_{k-w+1, \alpha}$, the independence assumption of the residuals is satisfied; in other words, the residuals are serially uncorrelated.
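The Ljung-Box statistic in Equations (3.53) and (3.54) can be computed directly, as in the sketch below. It assumes NumPy, the function name is hypothetical, and the alternating residual series is artificial data chosen only because its lag-1 autocorrelation is easy to verify by hand. The critical value is left to be looked up in a chi-square table with $k - w + 1$ degrees of freedom.

```python
import numpy as np

def ljung_box(resid, k):
    """Ljung-Box statistic LB(k) from Equations (3.53) and (3.54)."""
    v = np.asarray(resid, dtype=float)
    n = len(v)
    d = v - v.mean()
    denom = np.sum(d ** 2)
    lags = np.arange(1, k + 1)
    # r_l = sum_{t=1}^{n-l} (v_t - vbar)(v_{t+l} - vbar) / sum_t (v_t - vbar)^2
    r = np.array([np.sum(d[:n - l] * d[l:]) / denom for l in lags])
    return n * (n + 2) * np.sum(r ** 2 / (n - lags))

# Strongly (negatively) autocorrelated artificial residuals
resid_demo = np.array([1.0, -1.0] * 10)
lb_1 = ljung_box(resid_demo, 1)
```

For this artificial series the statistic at lag 1 is far above 3.841, the 5% critical value of a chi-square distribution with one degree of freedom, so independence would be rejected.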

Homoscedasticity of the residuals can be checked with the Goldfeld-Quandt (GQ) test. The null hypothesis of homoscedasticity is tested against the alternative of heteroscedasticity. The GQ test statistic is as follows:

$$GQ(h) = \frac{\sum_{t=n-h+1}^{n} v_t^2}{\sum_{t=\kappa+1}^{\kappa+h} v_t^2} \tag{3.55}$$

where $\kappa$ is the number of estimated parameters in the model and $h$ is the nearest integer to $(n - \kappa)/3$. The statistic tests whether the variance of the residuals in the first third of the series is equal to the variance of the residuals in the last third of the series. The GQ test statistic is tested against an $F$-distribution with $(h, h)$ degrees of freedom. The homoscedasticity of the residuals is said to be satisfied if the calculated GQ test statistic is less than $F_{h,h,\alpha}$.
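A sketch of the GQ statistic is given below. It rests on an explicit assumption: the denominator is taken over the $h$ residuals starting at $t = \kappa + 1$, so that both sums contain the same number of terms. NumPy is assumed, the function name is hypothetical, and the residuals are artificial.

```python
import numpy as np

def goldfeld_quandt(resid, kappa):
    """Ratio of the last-third to the first-third residual sums of squares."""
    v = np.asarray(resid, dtype=float)
    n = len(v)
    h = int(round((n - kappa) / 3))          # nearest integer to (n - kappa) / 3
    numerator = np.sum(v[n - h:] ** 2)       # t = n-h+1, ..., n
    # Assumed range for the denominator: t = kappa+1, ..., kappa+h
    denominator = np.sum(v[kappa:kappa + h] ** 2)
    return numerator / denominator, h

# Artificial residuals whose spread doubles in the second half
resid_demo = np.array([1.0] * 6 + [2.0] * 6)
gq, h = goldfeld_quandt(resid_demo, kappa=0)
```

The returned ratio would then be compared with the critical value of the $F$-distribution with $(h, h)$ degrees of freedom.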

The normality assumption can be tested using the Jarque-Bera (JB) test as follows:

$$JB = n \left( \frac{s^2}{6} + \frac{(k-3)^2}{24} \right) \tag{3.56}$$

with

$$s = \frac{\frac{1}{n} \sum_{t=1}^{n} (v_t - \bar{v})^3}{\sqrt{\left( \frac{1}{n} \sum_{t=1}^{n} (v_t - \bar{v})^2 \right)^3}} \quad \text{and} \quad k = \frac{\frac{1}{n} \sum_{t=1}^{n} (v_t - \bar{v})^4}{\left( \frac{1}{n} \sum_{t=1}^{n} (v_t - \bar{v})^2 \right)^2}$$

where $s$ is the skewness and $k$ is the kurtosis of the residuals. The JB test statistic is tested against a chi-square distribution with two degrees of freedom. If $JB < \chi^2_{2, \alpha}$, the null hypothesis of normality is not rejected, and the residuals are concluded to be normally distributed. Rejecting the null hypothesis indicates the possible occurrence of outliers or structural breaks in the series, which can be confirmed by inspecting the auxiliary residuals.
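The JB statistic can be sketched using the standard moment-based definitions of skewness and kurtosis. NumPy is assumed, the function name is hypothetical, and the two-valued residual series is artificial: it is symmetric with kurtosis 1, so the result is easy to check by hand.

```python
import numpy as np

def jarque_bera(resid):
    """Jarque-Bera statistic with moment-based skewness and kurtosis."""
    v = np.asarray(resid, dtype=float)
    n = len(v)
    d = v - v.mean()
    m2 = np.mean(d ** 2)
    s = np.mean(d ** 3) / m2 ** 1.5          # skewness
    k = np.mean(d ** 4) / m2 ** 2            # kurtosis
    return n * (s ** 2 / 6 + (k - 3) ** 2 / 24), s, k

# Artificial, clearly non-normal residuals taking only the values -1 and 1
resid_demo = np.array([-1.0, 1.0] * 50)
jb, skew, kurt = jarque_bera(resid_demo)
```

For this series the statistic is well above 5.991, the 5% critical value of a chi-square distribution with two degrees of freedom, so normality would be rejected.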

The auxiliary residuals are an important diagnostic tool that can be examined when the normality of the residuals is not satisfied. Recall that the smoothing filter applied in the backward pass on the data yields the smoothed observation and state residuals together with their variances. The auxiliary residuals are obtained by dividing the smoothed residuals by the square root of their variances as follows:

$$\frac{\hat{\varepsilon}_t}{\sqrt{var(\hat{\varepsilon}_t)}} \quad \text{and} \quad \frac{\hat{\eta}_t}{\sqrt{var(\hat{\eta}_t)}} \tag{3.57}$$

for $t = 1, \ldots, n$, which results in standardized smoothed residuals. A standardized smoothed measurement residual that falls far outside the 95% confidence interval indicates that the series may contain an outlier, while a standardized smoothed state residual that falls outside the 95% confidence interval indicates that the series contains a structural break in either the level or the slope component.

If an outlier or structural break is detected, the corresponding observation is checked for possible measurement or typing errors. If the value of the observation is correct, the observation is treated as an outlier or the series is said to contain a structural break. The presence of an outlier or structural break is handled by adding an intervention variable to the model.

3.5.2 Goodness-of-fit of Structural Time Series

As univariate time series analysis is very commonly carried out with Box and Jenkins methods, the effectiveness of the univariate STS modeling developed in the first part of the thesis is compared with Box and Jenkins modeling. Meanwhile, the multiple STS model, which includes the explanatory variables, is compared with time series regression.

The goodness-of-fit of the three estimated models is compared both within sample and out of sample. For the within-sample comparison, prediction accuracy is calculated for the period from January 2001 to December 2013. For out-of-sample forecast accuracy, the forecasted values are compared with the true values for 2014.

The goodness-of-fit of the STS model is evaluated using two loss function measures, the root mean square error (RMSE) and the mean absolute percentage error (MAPE), as follows:

$$RMSE = \sqrt{\sum_{t=1}^{n} \frac{(Y_t - \hat{Y}_t)^2}{n}}, \qquad MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right| \times 100 \tag{3.58}$$

where $Y_t$ and $\hat{Y}_t$ are the actual observed value and the predicted value respectively, and $n$ is the number of predicted values.
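The two loss functions can be computed as in the following sketch. NumPy is assumed, the function names are hypothetical, and the two actual and predicted values are made-up numbers chosen so the result can be checked by hand. MAPE is computed here as an average over the $n$ predicted values, in line with its name.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Made-up values: errors of -10 and 20, each 10% of the actual value
actual_demo = [100.0, 200.0]
predicted_demo = [110.0, 180.0]
rmse_demo = rmse(actual_demo, predicted_demo)
mape_demo = mape(actual_demo, predicted_demo)
```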

3.6 Application on Road Accidents

The theoretical background of the Box and Jenkins model, time series regression and the STS model, which are applied to model the number of road accidents, has been discussed in Sections 3.1 to 3.5. This section describes the step-by-step procedure for developing the model for the number of road accidents, which is the main objective of this thesis. A summary of the modeling procedure is displayed as a flow chart in Figure 3.3.

The subject of interest in this thesis is the monthly number of road accidents for the period from January 2001 to December 2014. As can be seen from Figure 3.3, the time series regression and SARIMA models are compared with the STS model, which is estimated using the Kalman filter. When explanatory variables are incorporated, the comparison is also made between the time series regression and the STS models.

Figure 3.3: Step by step procedure of designing of road accidents modeling

Development of the road accidents model based on the STS approach begins by fitting the possible STS models as tabulated in Table 3.1. The best model is chosen based on the smallest value of the Akaike information criterion (AIC). The value of the AIC is computed as follows:


$$AIC = \frac{1}{n} \left[ -2 \log L_d + 2\kappa \right] \tag{3.59}$$

where $\log L_d$ is the value of the log-likelihood function that is maximised in state space modeling and $\kappa$ is the number of parameters estimated in the analysis.
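Model selection by Equation (3.59) can be sketched as follows. The function name is hypothetical, and the log-likelihood values and parameter counts attached to the model abbreviations are invented purely to show the mechanics; they are not results from this study.

```python
def aic_per_observation(loglik, kappa, n):
    """AIC as in Equation (3.59): (-2 log L_d + 2 kappa) / n."""
    return (-2.0 * loglik + 2.0 * kappa) / n

# Hypothetical candidates: (maximised log-likelihood, number of parameters)
n_obs = 156
candidates = {"LLDS": (-1210.4, 2), "LLSS": (-1205.9, 3), "LTSS": (-1205.1, 4)}

aic_values = {name: aic_per_observation(ll, kappa, n_obs)
              for name, (ll, kappa) in candidates.items()}
best_model = min(aic_values, key=aic_values.get)   # smallest AIC is preferred
```

The penalty term $2\kappa$ means that an extra parameter is only worthwhile if it raises the log-likelihood by more than one unit.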

Road accidents are better explained if explanatory variables are incorporated in the models. For example, Yaacob et al. (2012), Bergel-Hayat et al. (2013) and Amin et al. (2014) incorporated meteorological and climate effects on road accident risk, and Sarani et al. (2013) included the use of restraint devices such as seatbelts and helmets. Therefore, in the next stage, appropriate explanatory variables are incorporated into the models. The explanatory variables include the amount of rainfall (RAINF), number of rainy days (RAIND), temperature (TEMP), air pollution index (API), crude oil price (OILP), consumer price index for transportation (CPI), festive holiday season (BLKG), and safety intervention (SAFE). For each modeling process, diagnostic tests are performed to ensure that each model fulfils the residual assumptions. If a model does not fulfil the residual assumptions, it is revised using appropriate statistical techniques.

3.7 Summary

Appropriate statistical analysis in developing a road accidents model gives good and manageable predictions of future road accidents. Commonly used methods such as Box and Jenkins and time series regression may be improved upon by introducing a better model such as the STS model, which has a direct interpretation in terms of time series components. This chapter discussed the common methods used in developing road accidents models and introduced the STS method, pioneered by Harvey (1989), for modeling road accidents. This chapter also set out the step-by-step procedure for modeling road accidents, which is the focus of this thesis. All the procedures discussed in this chapter are applied in Chapters 4, 5 and 6.


CHAPTER 4

PRELIMINARY STUDY

This chapter describes the properties of the data collected, based on descriptive statistics, time series plots and correlation analysis. Descriptive statistics are important in describing the basic features of the data. Time series plots are useful for observing the basic patterns of the series, such as trend and seasonality. Correlation analysis measures the strength of the relationships among the variables. In addition, this chapter applies common time series methods, namely time series regression and Box and Jenkins analysis, to model road accidents.

4.1 Descriptive Statistics

Basic statistics such as the minimum, maximum, mean and standard deviation of the continuous variables are calculated and tabulated in Table 4.1 to Table 4.3. The statistics are calculated for the respective regions and states.

4.1.1 Road Accidents Series

Referring to Table 4.1, the central region, which comprises Kuala Lumpur and Selangor, is the highest contributor of road accidents. The mean and standard deviation of the road accidents series in this region are 12487 and 2540.4, respectively. The number of accidents in this region ranges between 7597 and 19004 cases. Selangor is the main contributor of road accidents in the central region, with an average of about 8361.9 cases per month. The second highest contributor of road accidents is the northern region, followed by the southern region, with averages of 6650.9 and 6414 cases per month respectively. The region with the second lowest road accident occurrences is Borneo, while the lowest is the east coast. In terms of states, Perlis is the lowest contributor of road accidents. The average number of road accidents in Perlis is 123.33 cases per month, with a minimum of 66 cases.

Table 4.1: Descriptive statistics of road accidents series

| Regions/States | Min | Mean | Max | SD |
|---|---|---|---|---|
| North | 4615 | 6650.90 | 9316 | 956.35 |
| Perlis | 66 | 123.33 | 191 | 28.66 |
| Penang | 1941 | 2724.00 | 3602 | 358.21 |
| Kedah | 903 | 1341.70 | 2139 | 254.68 |
| Perak | 1680 | 2470.50 | 3692 | 350.00 |
| South | 4004 | 6414.00 | 9388 | 1338.80 |
| Melaka | 628 | 1011.10 | 1439 | 194.41 |
| Negeri Sembilan | 929 | 1421.40 | 2413 | 298.49 |
| Johor | 2447 | 3981.60 | 5885 | 864.04 |
| East Coast | 1390 | 2639.80 | 4417 | 620.51 |
| Kelantan | 402 | 678.27 | 1219 | 163.21 |
| Pahang | 593 | 1259.80 | 2002 | 298.70 |
| Terengganu | 382 | 701.71 | 1210 | 179.33 |
| Central | 7597 | 12487.00 | 19004 | 2540.40 |
| Kuala Lumpur | 2719 | 4125.20 | 6137 | 719.56 |
| Selangor | 4873 | 8361.90 | 12867 | 1834.60 |
| Borneo | 1472 | 2432.30 | 3378 | 441.90 |
| Sabah | 720 | 1166.80 | 1625 | 225.51 |
| Sarawak | 742 | 1265.50 | 1774 | 222.90 |

4.1.2 Climate Related Variable

Referring to the rainfall series (RAINF), the east coast region is found to have the highest amount of rainfall, with a mean and standard deviation of 1722.1 mm and 974.71 mm per month. The result is expected, since the east coast is among the regions affected by floods almost every year.

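Table 4.2: Descriptive statistics of climate related variables

(a) Rainfall (RAIN_F), in mm, and rainy days (RAIN_D)

| Regions/States | RAIN_F Min | RAIN_F Mean | RAIN_F Max | RAIN_F SD | RAIN_D Min | RAIN_D Mean | RAIN_D Max | RAIN_D SD |
|---|---|---|---|---|---|---|---|---|
| North | 229.1 | 1486 | 4394.2 | 648.48 | 3 | 15 | 26 | 4.78 |
| Perlis | 0 | 171.23 | 633.4 | 100.9 | 0 | 14 | 27 | 5.82 |
| Penang | 50.9 | 379.84 | 1544.5 | 228.42 | 5 | 15 | 29 | 5.16 |
| Kedah | 0 | 394.37 | 1339.4 | 243.36 | 0 | 15 | 27 | 6.19 |
| Perak | 97.8 | 532.8 | 1319.8 | 213.67 | 7 | 16 | 25 | 4.41 |
| South | 242 | 1193.6 | 3235 | 490.32 | 4 | 16 | 24 | 3.88 |
| Melaka | 20.7 | 160.57 | 471.7 | 79.13 | 3 | 14 | 24 | 4.25 |
| Negeri Sembilan | 0 | 216.03 | 522.8 | 106.14 | 0 | 17 | 28 | 5.07 |
| Johor | 112.5 | 815.53 | 2653 | 418.43 | 4 | 16 | 24 | 3.98 |
| East Coast | 151.4 | 1722.1 | 4838.2 | 974.71 | 4 | 15 | 26 | 4.88 |
| Kelantan | 19.2 | 435.29 | 2014 | 343.8 | 3 | 15 | 26 | 5.37 |
| Pahang | 108.6 | 1050.4 | 3040.5 | 508.38 | 5 | 16 | 25 | 4.34 |
| Terengganu | 0.2 | 225.17 | 1170.2 | 213.93 | 1 | 14 | 30 | 5.97 |
| Central | 149.5 | 897.06 | 2027.2 | 345.55 | 6 | 16 | 27 | 4.09 |
| Kuala Lumpur | 19 | 224.75 | 527 | 112.6 | 4 | 15 | 26 | 4.76 |
| Selangor | 76 | 673.7 | 1550.2 | 273.42 | 6 | 16 | 27 | 4.23 |
| Borneo | 896.7 | 2452.9 | 6891.8 | 935.23 | 8 | 17 | 27 | 3.82 |
| Sabah | 48.4 | 908.51 | 2310.2 | 460.21 | 2 | 16 | 26 | 4.17 |
| Sarawak | 514.8 | 1544.4 | 4609.4 | 622.47 | 9 | 19 | 29 | 4.4 |

(b) Temperature (TEMP), in degrees Celsius, and Air Pollution Index (API)

| Regions/States | TEMP Min | TEMP Mean | TEMP Max | TEMP SD | API Min | API Mean | API Max | API SD |
|---|---|---|---|---|---|---|---|---|
| North | 30.3 | 32.4 | 35.0 | 0.91 | 34 | 62 | 104 | 13.10 |
| Perlis | 29.7 | 32.7 | 36.2 | 1.24 | 33 | 58 | 104 | 10.35 |
| Penang | 29.9 | 31.9 | 33.9 | 0.77 | 30 | 65 | 121 | 16.15 |
| Kedah | 30.1 | 32.4 | 35.5 | 1.10 | 25 | 56 | 118 | 16.16 |
| Perak | 30.5 | 32.6 | 35.0 | 0.76 | 29 | 68 | 161 | 20.53 |
| South | 29.8 | 32.0 | 34.5 | 0.76 | 33 | 69 | 313 | 28.58 |
| Melaka | 29.8 | 32.0 | 35.1 | 0.82 | 29 | 67 | 415 | 34.80 |
| Negeri Sembilan | 30.4 | 32.2 | 34.9 | 0.87 | 29 | 72 | 173 | 21.02 |
| Johor | 28.9 | 31.8 | 33.7 | 0.92 | 29 | 69 | 432 | 42.05 |
| East Coast | 28.2 | 31.1 | 33.3 | 1.23 | 27 | 54 | 130 | 13.36 |
| Kelantan | 28.3 | 31.8 | 34.5 | 1.43 | 26 | 52 | 116 | 12.83 |
| Pahang | 27.5 | 30.5 | 32.4 | 1.10 | 21 | 52 | 172 | 17.94 |
| Terengganu | 27.6 | 31.0 | 33.3 | 1.29 | 26 | 88 | 200 | 29.45 |
| Central | 30.8 | 32.8 | 35.1 | 0.74 | 48 | 89 | 251 | 27.54 |
| Kuala Lumpur | 30.7 | 33.1 | 35.8 | 0.83 | 26 | 88 | 200 | 29.45 |
| Selangor | 31 | 32.6 | 34.6 | 0.80 | 52 | 96 | 301 | 31.79 |
| Borneo | 29.6 | 31.7 | 33.0 | 0.72 | 20 | 46 | 90 | 12.98 |
| Sabah | 29.7 | 31.6 | 33.1 | 0.75 | 1 | 43 | 90 | 14.44 |
| Sarawak | 29.2 | 31.8 | 33.4 | 0.77 | 11 | 50 | 114 | 17.34 |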


The main contributor to the higher rainfall series is the state of Pahang. The lowest RAINF values are those of Melaka and Perlis, which received means of only 160.57 mm and 171.23 mm of rainfall over the study period. The average number of rainy days per month is similar across states, between 14 and 19 days, with minima of 0 to 9 days and maxima of 24 to 30 days. The highest average RAIND was recorded by Sarawak, with 19 days per month, while Perlis and Kedah recorded the lowest values, each having a month with no rain.

The mean temperatures of the states were similar, ranging between 30.5 and 32.7 degrees Celsius. Perlis recorded the highest temperature, at 32.7 degrees, which is in agreement with the state's recorded rainfall. The state with the lowest temperature was Pahang, which is also among the states that received higher amounts of rainfall. Meanwhile, the monthly maximum air pollution index (API) was higher in the central region. The state with the highest API is Selangor, while the lowest is Sabah. On average, the API in Malaysia is at a safe level.
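The summary measures reported in Table 4.2 (minimum, mean, maximum and standard deviation) can be computed as in the minimal sketch below. The data are hypothetical and are not taken from the thesis, the function name `describe` is illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
import statistics

def describe(series):
    """Return min, mean, max and sample standard deviation of a monthly series."""
    return {
        "min": min(series),
        "mean": statistics.fmean(series),
        "max": max(series),
        # Assumption: sample SD with an n - 1 denominator.
        "sd": statistics.stdev(series),
    }

# Hypothetical monthly rainy-day counts for one state (not the thesis data).
rainy_days = [14, 9, 12, 17, 19, 13, 15, 16, 18, 21, 22, 20]
summary = describe(rainy_days)
print({name: round(value, 2) for name, value in summary.items()})
```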

4.1.3 Economic Related Variables

The economic related variables incorporated in this study are the crude oil price and the consumer price index. Crude oil prices from 2001 to 2013 ranged between RM 70.38 and RM 430.78 per barrel, with a monthly average of RM 221.84 per barrel (refer to Table 4.3). The price is the simple average of three spot prices: Dated Brent, West Texas Intermediate and Dubai Fateh.

In addition, the consumer price index for transportation is divided into three series, for Peninsular Malaysia, Sabah and Sarawak. The index for Peninsular Malaysia is lower than those for Sabah and Sarawak. On average, the transport CPI for Peninsular Malaysia is 93.63, with a range of 81.50 to 108.90. Meanwhile, the average CPI values for Sabah and Sarawak are 111.04 and 109.46 respectively.

Table 4.3: Descriptive statistics of the monthly economic related variables

| Economic variable | Min | Mean | Max | SD |
|---|---|---|---|---|
| Crude Oil Price | 70.38 | 221.84 | 430.78 | 90.69 |
| CPI (Peninsular) | 81.50 | 93.63 | 108.90 | 8.5474 |
| CPI (Sabah) | 99.10 | 111.04 | 141.30 | 7.9298 |
| CPI (Sarawak) | 98.8 | 109.46 | 142.80 | 7.3025 |

4.2 Time Series Plot

Time series plots enable the evaluation of patterns and behavior in the data over time. The time series plots for each variable are illustrated in Figure 4.1 through Figure 4.11. The following subsections describe the pattern of each series.

4.2.1 Road Accident Series

The time series plots of the road accident series in Figure 4.1 and Figure 4.2 exhibit non-stationary behavior, displaying a trending pattern. The increase in road accidents is proportionate to the growth of the population and the economy. Generally, over the study period from 2001 to 2013, the number of road accidents in each region is markedly higher in March and lower in February.
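The month-by-month comparison described above can be sketched as follows. This is only an illustrative calculation under stated assumptions: the series is hypothetical, the helper `monthly_profile` is not part of the thesis, and each month is compared with the mean of its own year as a simple way of setting the trend aside.

```python
from collections import defaultdict

def monthly_profile(values, start_month=1):
    """Average deviation of each calendar month from its own year's mean.

    `values` is a list of monthly observations covering whole years,
    starting in `start_month` (1 = January).
    """
    if len(values) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    deviations = defaultdict(list)
    for year_start in range(0, len(values), 12):
        year = values[year_start:year_start + 12]
        year_mean = sum(year) / 12
        for offset, value in enumerate(year):
            month = (start_month - 1 + offset) % 12 + 1
            deviations[month].append(value - year_mean)
    return {m: sum(d) / len(d) for m, d in sorted(deviations.items())}

# Hypothetical two-year series with a dip in February and a peak in March.
accidents = [100, 90, 115, 100, 102, 98, 104, 106, 99, 105, 101, 100,
             110, 99, 126, 110, 112, 108, 114, 116, 109, 115, 111, 110]
profile = monthly_profile(accidents)
lowest = min(profile, key=profile.get)
highest = max(profile, key=profile.get)
print("lowest month:", lowest, "highest month:", highest)
```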

As displayed in Figure 4.1(a), the number of road accidents observed in the northern region is higher in March, July, August and October, and lower in February, April, June, September and October. For the central region, the number of road accidents shows a positive trend over the study period in March, July and October and a decreasing trend in February and June. In addition, the number of road accidents shows an increasing trend in January and August. From 2008 onwards, the number of road accidents in the central region decreases in September, November and December. Besides, one unusual observation was detected in the series in July 2013.

Figure 4.1: Monthly time series plot of road accidents for all regions. Panels: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo.

Furthermore, the trend pattern of road accidents in the southern region is similar to that of the northern region. Apart from March and July, the upward trend of the series can be clearly seen in May and August, and the number decreases greatly in February, April and September. A few potential unusual points are found in December 2001, February 2009 and May 2012.

In contrast, the number of road accidents in the east coast region is markedly higher during the main festive season, especially Eid ul-Fitr, with the highest peak found in August 2012. Meanwhile, in the Borneo region the number of road accidents is approximately constant from the start of the study period until 2008, while after 2009 it fluctuates. Commonly, road accidents in Borneo were found to increase in March, May and October and to decrease in January, February and June.

Figure 4.2: Monthly time series plot of road accidents for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Selangor, (f) Kuala Lumpur, (g) Johor, (h) Melaka, (i) Negeri Sembilan, (j) Kelantan, (k) Pahang, (l) Terengganu, (m) Sabah, (n) Sarawak.


In addition, the individual time series plots of the road accident series in Figure 4.2 show that, as in the regional plots, the individual series follow upward trends over the study period. Among the northern states, Penang recorded the highest number of accidents, with its peak recorded in August. The road accident series for Perlis exhibits a non-seasonal pattern with a few unusual points. For Kedah, road accidents may increase during the festive seasons, similar to the pattern in the east coast region. For Perak, the trend and seasonal pattern approximately resemble those of the Penang series but in the opposite direction.

From Section 4.1.1, it is evident that the central region recorded the highest number of road accidents in Malaysia, with the highest number in Selangor followed by Kuala Lumpur. In the time series plot for Selangor, the seasonal pattern is constant across the study period, with a few unusual points in 2005, 2011 and 2013. Meanwhile, the road accident pattern for Kuala Lumpur shows that the magnitude of the fluctuations becomes larger from 2011 onwards.

In contrast, in the southern states, which comprise Negeri Sembilan, Melaka and Johor, the road accident series was higher during the final quarter of the year, that is in October, November and December, and lower in the first few months, around January and February. For Johor, as illustrated in Figure 4.2(g), the number of road accidents clearly shows a linear increasing pattern in 2003. This is assumed to be related to the Visit Johor 2003 campaign. Conversely, the state of Melaka (Figure 4.2(h)) shows a sudden decrease between 2005 and 2006 and an upward trend after 2009. For Negeri Sembilan, as seen in Figure 4.2(i), there is an unusual point around November 2011, but the reason behind it is unknown.


The trend recorded by the individual states of the east coast region is similar to the overall trend seen in that region. Kelantan, Terengganu and Pahang recorded higher numbers of road accidents during the festive seasons, especially Eid ul-Fitr. The higher numbers of road accidents for Kelantan and Terengganu can be clearly seen in Figure 4.2(j) and Figure 4.2(l). On the other hand, road accidents in Sabah and Sarawak seem to be higher in May, August and October. The large increase in May is expected, as it coincides with the harvest and Gawai Dayak festivals, which are celebrated only in the Borneo region.

Generally, the upward or downward trend of road accidents in each state or region is influenced by many factors, such as climate, the economy, festive seasons, school holidays, and the policies or road safety interventions implemented to reduce the number of road accidents and deaths.

4.2.2 Climate Related Variables

The climate effects incorporated in this study are the amount of rainfall (RAINF), the monthly number of rainy days (RAIND), the monthly maximum temperature (TEMP) and the monthly maximum average air pollution index (API). The time series plots for these variables are displayed in Figure 4.3 through Figure 4.10.

Figure 4.3 displays the time series plots of the monthly amount of rainfall by region, whereas Figure 4.4 displays the plots by individual state.

Referring to the time series plots for rainfall, each rainfall series in each region and state has a stationary mean. The fluctuation of the series also shows that stationarity in variance is fulfilled. The rainfall regime in each state generally follows a similar pattern, with two periods of maximum rainfall followed by two periods of minimum rainfall. According to the Meteorology Department, maximum rainfall usually occurs from April to May and from October to November. Meanwhile, the minimum amount of rainfall occurs around January to February and June to July.

Figure 4.3: Monthly time series plot of amount of rainfall for all regions. Panels: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo.

However, this pattern does not apply to the east coast and Borneo regions. The east coast region (refer to Figure 4.3(d)) recorded heavy rain at the beginning and in the final quarter of the year, while the middle of the year is the driest period for this region. In the Borneo region, by contrast, the rainfall regime alternates between peaks and troughs, with maximum rainfall usually occurring around December and the driest month being February.

Moreover, the time series plots of the rainfall series for the northern and Borneo regions show a few unusual points. Figure 4.3(a) for the northern region shows a sudden large amount of rainfall in October 2003, while Figure 4.3(e) for the Borneo region shows a large amount of rainfall in January 2009. For the southern region (Figure 4.3(c)), from 2006 onwards the rainfall series shows an increase in the magnitude of seasonal changes.

The rainfall series for individual states do not differ much in terms of stationarity. However, due to different topography, different rainfall behavior may occur. For the northern states, the unusual point observed in October 2003 appears in all states except Perlis. In addition, the states of Kedah and Perak each have a second extreme point, in August 2009 and October 2008 respectively. Figure 4.4(b) shows extreme rainfall for the state of Perlis in December 2005.

For the southern states, only Johor shows an unusual rainfall pattern, with larger amounts of rainfall in January 2007 and 2008. For the east coast states, the Kelantan rainfall series (Figure 4.4(h)) increases yearly, while the Pahang rainfall series (Figure 4.4(i)) shows an increasing magnitude of the seasonal pattern up to 2006, and in 2008 and 2009 rainfall was found to be higher in December. In contrast, the rainfall series for the state of Terengganu shows a decreasing seasonal pattern up to 2008. From 2009 onwards, the series shows an increasing pattern, with the maximum rainfall in this period found in November.

The rainfall series of the central states, Kuala Lumpur and Selangor, do not differ much from the aggregated central region series. The series are stationary, without unusual points. Meanwhile, the rainfall series for the state of Sabah (Figure 4.4(m)) shows an increasing pattern until 2008, before the amount of rainfall dropped suddenly between 2009 and 2010. From 2011 onwards, the series shows a decreasing seasonal pattern until the end of the study period. In contrast, the Sarawak rainfall series is evenly distributed over the study period, with one unusual point found in January 2009.

Figure 4.4: Monthly time series plot of amount of rainfall for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Melaka, (f) Negeri Sembilan, (g) Johor, (h) Kelantan, (i) Pahang, (j) Terengganu, (k) Kuala Lumpur, (l) Selangor, (m) Sabah, (n) Sarawak.


Moreover, in terms of the number of rainy days (RAIND), the time series plots for the regions and for the individual states, displayed in Figure 4.5 and Figure 4.6 respectively, do not differ much from one another. The number of rainy days is stationary in mean as well as in variance for all regions and states. Generally, the number of rainy days is evenly distributed over the study period, with more rainy days at the beginning and end of the year than in the middle of the year. However, RAIND for the state of Sabah (Figure 4.6(m)) shows a somewhat unusual pattern compared with other states, and the RAIND in February 2010 is found to be the lowest.

Figure 4.5: Monthly time series plot of number of rainy days for all regions. Panels: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo.


Figure 4.6: Monthly time series plot of number of rainy days for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Melaka, (f) Negeri Sembilan, (g) Johor, (h) Kelantan, (i) Pahang, (j) Terengganu, (k) Kuala Lumpur, (l) Selangor, (m) Sabah, (n) Sarawak.


Figure 4.7: Monthly time series plot of maximum temperature for all regions. Panels: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo.

Malaysia has a tropical climate, without extremely high temperatures. Figure 4.7 shows the time series plots of the monthly maximum temperature for the regions of Malaysia. Throughout the years, the temperature series show similar patterns, with the east coast region showing the lowest temperatures and the northern region the highest. This is expected, as Perlis, which is part of the northern region, is known as the hottest state. As displayed in Figure 4.7(a), Figure 4.7(b) and Figure 4.7(c), the temperature in the northern, central and southern regions usually reaches its maximum around February to March and its minimum at the end of the year, from October to November.

In contrast, the east coast region has its hottest temperatures from April to May (Figure 4.7(d)). The temperature drops after September, and the lowest temperature for this region occurs around December. This pattern of maximum temperature is similar for the Borneo region, where the hottest temperatures occur between May and July and the lowest temperatures around December to January. The temperature readings are closely related to the amount of rainfall.

Referring to the temperature series for individual states displayed in Figure 4.8, the monthly maximum temperature for each state is evenly distributed, varying between 30 and 35 degrees Celsius. Several states recorded lower temperatures for several years during the study period. For example, Negeri Sembilan and Selangor experienced such a period from around 2006 to 2009, whereas in the state of Terengganu the temperature dropped around 2008. Meanwhile, the temperature series for Penang, Melaka and Kuala Lumpur show a decreasing pattern, as their temperatures dropped at the end of the study period.

In contrast, another climate variable taken into account, the monthly average maximum air pollution index (API), shows a non-stationary pattern. The API patterns during the study period from January 2001 to December 2013 for each region in Malaysia are displayed in Figure 4.9, while the API patterns for individual states are displayed in Figure 4.10. Based on the regional API series, the maximum monthly API readings were mostly between good and moderate. However, there is a slight increase in the API readings recorded in the southern, central and east coast regions. For the southern region, an unhealthy API was recorded in August 2009, while a very unhealthy API was recorded in October 2010.


Figure 4.8: Monthly time series plot of maximum temperature for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Melaka, (f) Negeri Sembilan, (g) Johor, (h) Kelantan, (i) Pahang, (j) Terengganu, (k) Kuala Lumpur, (l) Selangor, (m) Sabah, (n) Sarawak.


Figure 4.9: Regional time series plot for monthly maximum API. Panels: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo.

Furthermore, for the central region, the unhealthy level was reached in February and August 2005. Then, from December 2008 until the end of the study period, the API again reached the unhealthy level. The highest API was recorded in June 2013, when all regions except the northern and Borneo regions experienced very unhealthy air quality. The maximum API in 2013 was related to large-scale burning in many parts of a neighboring country, which caused a haze emergency to be declared in several states (Meteorology Department, 2015).

Moreover, as illustrated in Figure 4.10, the API time series plots for individual states show that the API was mostly between good and moderate. However, short episodes of extreme haze were experienced in certain states in certain months. For example, for Johor (refer to Figure 4.10(g)), there are two high peaks, representing hazardous API levels, in October 2010 and June 2013. The hazardous API level in October 2010 was related to transboundary haze pollution, which forced the government to close 170 schools, particularly in the Muar district (Meteorology Department, 2015).


The high API level in June 2013 affected not only Johor but almost all states in the southern, east coast and central regions. The API in June 2013 was recorded at between the unhealthy and hazardous levels. Selangor and Kuala Lumpur are among the states that experienced very unhealthy API levels from 2008 onwards. This is not surprising, as the unhealthy API in those states is attributed to emissions from millions of motor vehicles.

Figure 4.10: Time series plot of monthly maximum API for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Melaka, (f) Negeri Sembilan, (g) Johor, (h) Kelantan, (i) Pahang, (j) Terengganu, (k) Kuala Lumpur, (l) Selangor, (m) Sabah, (n) Sarawak.

4.2.3 Economic Related Variables

Figure 4.11 displays the time series plots for the economic effects. Figure 4.11(a) illustrates the time series plot for the crude oil price. The figure shows that the oil price increased steadily during the first six years of the study period and had tripled by the middle of 2006. Thereafter, prices increased sharply, and the maximum price was recorded in July 2008. This peak in the crude oil price is known as an oil shock and was caused by strong demand confronting stagnating world production (Hamilton, 2009). However, the price returned to a normal level, in line with the decline in the world oil price, a month after it hit the peak.

Meanwhile, Figure 4.11(b) through Figure 4.11(d) display the transport consumer price index (CPI) for Peninsular Malaysia, Sabah and Sarawak respectively. As explained in Section 4.1.3, the transport CPI for Peninsular Malaysia is the lowest, followed by Sarawak and Sabah. The transport CPI in Peninsular Malaysia shows a slow increase each month until the end of August 2005. In January 2006, the index suddenly dropped by 17.6% to 103.3 from a constant level of 121.4 held since August 2005. In March 2006, the index rose by 9.4% to 112, returning to its average level. In June 2008, the index increased by 20.7% from 114.2 to 135.4 before slowly decreasing from August 2008 until January 2009 to its average level. In January 2011, the index once again decreased, by 12.3% from 115.9 to 103.3, before returning to its average level in September 2013.

Figure 4.11: Monthly time series plot for economic effect. Panels: (a) Crude oil price, (b) CPI transport for Peninsular Malaysia, (c) CPI transport for Sabah, (d) CPI transport for Sarawak.

Meanwhile, the transport CPI for the state of Sabah increased at a slow rate until April 2003 and remained constant until July 2003. In August 2003, the CPI rose by 6.8% to 118.8 before returning to its average level a month later. The slow increase then continued until December 2005, before the index suddenly dropped by approximately 19.5% to 106.1 in January 2006. In March 2006, the transport CPI increased to its average level and then rose steadily until May 2008. In June 2008, the index once again increased sharply, by 22.4% to 138.7, before decreasing at a slow rate from August 2008 to December 2008. The index increased steadily for almost a year in 2009 before suddenly dropping by 19.5% to 99.1 in January 2010. After a month, the index continued to increase slowly towards its average level, and in September 2013 it rose by 4.3% to approach that level.

In contrast, the CPI in Sarawak increased at a slow rate before suddenly dropping by 11% from 115.8 to 104.6 in January 2006. However, the index returned to its previous level after two months. Between April 2006 and May 2008, the transport CPI for Sarawak fluctuated above and below its average level. In June 2008, the CPI increased by 23.4% from 115.3 to 139.3. After two months, the index decreased at a slow rate from August 2008 until January 2009. In 2009, the index fluctuated around its average level. As in Sabah, the state's CPI decreased in January 2010, by 15.6% from 114.8 to 98.8. A month later, the index began to increase at a slow rate, approaching its average level in September 2013.

4.3 Correlation Analysis

This section measures the relationship between the dependent variable, the monthly number of road accidents, and the selected independent variables. The road accident series in this analysis is in natural logarithmic form. Recall that the study involves eight selected independent variables: amount of rainfall (RAINF), number of rainy days (RAIND), maximum temperature (TEMP), maximum air pollution index (API), crude oil price (OILP), consumer price index for transport (CPI), return to hometown culture (BLKG), and safety precaution operation (SAFE).

4.3.1 Correlation Analysis for Regions

Table 4.4 shows the regional Pearson correlation coefficients between the number of road accidents and all eight independent variables. The results show that the majority of the variables, except for TEMP and CPI, have a weak positive relationship with the number of road accidents in each region. RAINF, SAFE and BLKG have a significant positive relationship, at the 5% significance level, with the number of road accidents in the northern and east coast regions. As expected, RAIND is significantly correlated with the number of road accidents in the east coast and Borneo regions. This may be due to the different topography of these two regions, where roads are usually built in highland areas.

Table 4.4: Correlation coefficients between the number of road accidents in each region and the selected independent variables

| Variable | Northern | Central | Southern | East Coast | Borneo |
|---|---|---|---|---|---|
| RAINF | 0.1784** | 0.1439* | 0.1408* | 0.2334** | 0.1253 |
| RAIND | 0.1259 | 0.0785 | 0.0734 | 0.2497** | 0.2228** |
| TEMP | -0.2635** | -0.1556* | -0.1468* | -0.1156 | 0.0886 |
| API | 0.6038** | 0.6054** | 0.5219** | 0.6566** | 0.4419** |
| OILP | 0.8309** | 0.8387** | 0.8386** | 0.7619** | 0.8423** |
| CPI | -0.1185 | -0.1515* | -0.1594** | -0.0907 | -0.0413 |
| BLKG | 0.1677** | -0.0519 | 0.0688 | 0.2178** | 0.0308 |
| SAFE | 0.1769** | -0.0780 | 0.0592 | 0.2182** | 0.0205 |

Note: ** and * denote significance at the 5% and 10% levels.
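The significance markers in Table 4.4 can be checked with the usual t statistic for a Pearson correlation, t = r sqrt(n - 2) / sqrt(1 - r^2). The sketch below is illustrative only. It assumes n = 156 monthly observations (13 years of monthly data), which the text does not state explicitly, and the function names are hypothetical. With about 154 degrees of freedom, the two-sided critical values are roughly 1.98 at the 5% level and 1.65 at the 10% level.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: n = 156 monthly observations (13 years x 12 months).
n_obs = 156
t_north = t_statistic(0.1784, n_obs)    # RAINF, northern region (Table 4.4)
t_central = t_statistic(0.1439, n_obs)  # RAINF, central region (Table 4.4)
print(round(t_north, 3), round(t_central, 3))
```

Under these assumptions the northern coefficient exceeds the 5% critical value, while the central coefficient falls between the 10% and 5% critical values, which agrees with the ** and * markers in the table.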

Meanwhile, TEMP recorded a significant negative correlation in the northern and central regions. This result indicates that the number of road accidents in these regions decreased during hotter months. On the other hand, the results also show that the air pollution index (API) and crude oil price (OILP) variables are correlated with the number of accidents in all regions. These results also indicate a strong relationship between OILP and the accident series, with correlation coefficients of more than 0.7. There is also a significant relationship between CPI and the accident series in the central and southern regions at the 10% significance level. This result indicates that an increase in CPI is associated with a decrease in the number of road accidents in both regions.

4.3.2 Correlation Analysis for Individual State

The correlation coefficients between the road accident series for each state and the selected variables are tabulated in Table 4.5. Based on the findings, RAIND, API and OILP have a positive relationship with the accident series. The results indicate that accidents increase with increases in RAIND, API and OILP. However, only OILP shows a significant relationship across all states, while the significance of the other variables varies by state.

For instance, RAIND is significantly related to the number of road accidents only in Kedah, Perak, Kelantan, Terengganu, Pahang and Sabah, while API is significantly related to the number of road accidents in all states except Sabah. Furthermore, CPI and the number of road accidents are negatively correlated, although the correlation is significant only for Perlis, Kuala Lumpur, Selangor, Negeri Sembilan and Melaka. Besides, RAINF has a positive relationship with the number of road accidents in all states except Perlis, and it is significant at the 5% level in Perak, Negeri Sembilan, Kelantan, Terengganu, Pahang and Sabah.


Table 4.5: Correlation coefficients between the number of road accidents and the selected independent variables for each state

| Variable | Penang | Perlis | Kedah | Perak | Selangor | Kuala Lumpur | Negeri Sembilan |
|---|---|---|---|---|---|---|---|
| RAINF | 0.1279 | -0.0256 | 0.1304 | 0.2569** | 0.1487 | 0.0856 | 0.1707** |
| RAIND | 0.1095 | -0.0055 | 0.1415* | 0.1807** | 0.1508 | 0.0448 | 0.0568 |
| TEMP | -0.3706** | -0.0316 | -0.2471** | -0.2005** | 0.0291 | -0.3815** | 0.0880 |
| API | 0.4964** | 0.2648** | 0.4659** | 0.5591** | 0.3600** | 0.5961** | 0.6538** |
| OILP | 0.8491** | 0.5933** | 0.8217** | 0.7456** | 0.6564** | 0.7951** | 0.7988** |
| CPI | -0.0795 | -0.4931** | -0.1377* | -0.1201 | -0.5010** | -0.2006** | -0.1994** |
| BLKG | 0.0237 | 0.0370 | 0.1793** | 0.3008** | -0.0724 | -0.0720 | 0.1175 |
| SAFE | 0.0378 | 0.0489 | 0.1961** | 0.2952** | -0.1346 | -0.1080 | 0.0811 |

| Variable | Melaka | Johor | Kelantan | Terengganu | Pahang | Sabah | Sarawak |
|---|---|---|---|---|---|---|---|
| RAINF | -0.0366 | 0.1596** | 0.1442* | 0.1995** | 0.2556** | 0.2508** | 0.0363 |
| RAIND | 0.0372 | 0.1346* | 0.1959** | 0.2362** | 0.2305** | 0.3198** | 0.1220 |
| TEMP | -0.2952** | -0.1427* | -0.0722 | -0.0646 | -0.1666** | 0.0083 | 0.1524* |
| API | 0.4667** | 0.3455** | 0.4956** | 0.5584** | 0.6739** | 0.0939 | 0.5283** |
| OILP | 0.7813** | 0.8470** | 0.6639** | 0.7356** | 0.7873** | 0.8295** | 0.8207** |
| CPI | -0.1842** | -0.1366* | -0.0308 | -0.0922 | -0.1146 | -0.1127 | 0.0338 |
| BLKG | 0.0758 | 0.0487 | 0.3065** | 0.2253** | 0.1488* | 0.0428 | 0.0188 |
| SAFE | 0.0624 | 0.0495 | 0.3340** | 0.2176** | 0.1399* | 0.0151 | 0.0238 |

Note: ** and * denote significance at the 5% and 10% levels.

TEMP is negatively correlated with the number of road accidents in almost all states, excluding Selangor, Sabah and Sarawak. Moreover, the Pearson correlation results show that the number of road accidents in Kedah, Perak, Kelantan, Terengganu and Pahang is correlated with the implementation of safety operations (SAFE) during the festive seasons. The results are the same for BLKG, which shows a significant positive relationship with the number of road accidents. This finding is expected since both variables are related to festive seasons.

4.3.3 Unit Root Tests

Stationarity of the series should be established before further analysis is performed, especially in regression analysis. To determine whether the variables used in this study are stationary, the Phillips-Perron (PP) test is conducted without a time trend, at the level and the first difference of the variables, and at seasonal (12) and non-seasonal (6) lags. The test is conducted with the data in logarithmic form for the road accident series and in original form for the remaining variables.

The results of the unit root tests for both regions and states are tabulated in Table A and Table B in Appendix 1. At level, the road accident and oil price series for almost all regions and states are non-stationary, as the null hypothesis of the PP test (the presence of a unit root) cannot be rejected. However, at first difference all the variables achieve stationarity. Therefore, it can be concluded that all variables are integrated of order at most 1.
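The idea behind testing at the level and at the first difference can be illustrated with a simple differencing sketch. This is not the Phillips-Perron test itself; it only shows, on a hypothetical simulated series, how first differencing removes the drifting level of an I(1) series. The helper names and the comparison of half-sample means are illustrative assumptions.

```python
import random

def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def half_means(series):
    """Means of the first and second halves, a crude check on a drifting level."""
    mid = len(series) // 2
    return sum(series[:mid]) / mid, sum(series[mid:]) / (len(series) - mid)

# Hypothetical random walk with drift, i.e. an I(1) series (not the thesis data).
random.seed(0)
level = [0.0]
for _ in range(155):
    level.append(level[-1] + 0.5 + random.gauss(0, 1))

first_diff = difference(level)
level_gap = abs(half_means(level)[0] - half_means(level)[1])
diff_gap = abs(half_means(first_diff)[0] - half_means(first_diff)[1])
print("gap between half-sample means, level:", round(level_gap, 2))
print("gap between half-sample means, first difference:", round(diff_gap, 2))
```

A formal conclusion about the order of integration still requires a unit root test applied to both the level and the differenced series.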

Traditionally, a cointegration test is performed to ensure that the results of a regression analysis involving such variables are not affected by the spurious regression problem. However, an alternative solution suggested by Wooldridge (2006) to overcome the spurious regression problem associated with non-stationary series is to add a time trend variable. This approach has been applied in this study and is discussed in the next sections.

4.4 Time Series Regression

Many statistical techniques have been used to model the number of road accidents. Regression analysis is one of the most widely used methods for relating a response variable to several explanatory variables. Two types of OLS regression model are estimated: the first incorporates monthly seasonal dummies, while the second incorporates several explanatory variables. Outliers are handled with impulse dummy variables.


4.4.1 Time Series Regression with Seasonal Dummies

The OLS time series regression model is first estimated with a trend variable and eleven monthly seasonal dummies, with December as the reference month. The regression model can be written as:

$$Y_t = \beta_0 + vt + \sum_{j=1}^{11} \gamma_j D_{jt} + \varepsilon_t$$

where $\beta_0$ is the intercept of the model, $v$ is the slope coefficient, $t$ is the period, taking values from 1 to $n$, $\gamma_j$ is the coefficient of the $j$-th seasonal dummy, $D_{jt}$ takes the value 1 if period $t$ falls in month $j$ and 0 otherwise, and $\varepsilon_t$ is the error term. Results from the estimated time series regression are displayed in Table 4.6 for the regions and in Table 4.7 for the states.
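A regression of this form, with an intercept, a linear trend and eleven monthly dummies (December as the reference month), can be sketched as follows. The sketch is illustrative and is not the estimation code used in the study: the data are hypothetical and noise-free, the function names are assumptions, and the estimates are obtained by solving the normal equations directly rather than with a statistical package.

```python
def design_row(t, month):
    """Regressors for period t (1-based): intercept, trend, 11 month dummies.

    December (month 12) is the reference month, so all its dummies are zero.
    """
    return [1.0, float(t)] + [1.0 if month == j else 0.0 for j in range(1, 12)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS estimates from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free log series: intercept 8.5, trend 0.003, February effect -0.05.
n = 48
periods = list(range(1, n + 1))
months = [(t - 1) % 12 + 1 for t in periods]
y = [8.5 + 0.003 * t + (-0.05 if m == 2 else 0.0) for t, m in zip(periods, months)]
rows = [design_row(t, m) for t, m in zip(periods, months)]
beta = ols(rows, y)  # order: beta_0, v, gamma_1, ..., gamma_11
print([round(b, 4) for b in beta[:4]])
```

Because the example data contain no noise, the estimates should reproduce the chosen intercept, slope and February coefficient up to rounding error, which serves as a simple check of the design matrix.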

The trend, which measures the average monthly growth of road accidents, shows that the number of road accidents increased by between 0.3% and 0.5% per month in the five regions. Surprisingly, the increase in the east coast region, at 0.47% per month, is the highest of the five regions in Malaysia. Despite the rapid development experienced by the southern and central regions, the estimated increases in road accidents in these two regions are slightly lower, at 0.45% and 0.43% respectively. Interestingly, the growth of road accidents in the northern region is one third lower than the average growth in the three regions above.
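Since the dependent variable is in natural logarithms, a slope v corresponds to a monthly growth rate of exp(v) - 1, which is close to v itself for small values. The short sketch below applies this to the slopes reported in Table 4.6. The compounding to an annual rate and the function names are illustrative additions, not calculations reported in the study.

```python
import math

def monthly_growth_pct(v):
    """Exact monthly percentage growth implied by slope v in a log-linear trend."""
    return (math.exp(v) - 1) * 100

def annual_growth_pct(v):
    """Compound growth over 12 months implied by the same slope."""
    return (math.exp(12 * v) - 1) * 100

# Trend slopes reported in Table 4.6.
slopes = {"Northern": 0.0030, "Central": 0.0043, "Southern": 0.0045,
          "East Coast": 0.0047, "Borneo": 0.0039}
for region, v in slopes.items():
    print(region, round(monthly_growth_pct(v), 3), round(annual_growth_pct(v), 2))

# Northern slope relative to the mean of the east coast, southern and central slopes.
mean_three = (slopes["East Coast"] + slopes["Southern"] + slopes["Central"]) / 3
ratio = slopes["Northern"] / mean_three
print(round(ratio, 3))
```

The ratio of about two thirds is consistent with the statement that growth in the northern region is one third lower than the average of the other three regions.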

With regard to the monthly seasonal pattern, on average the number of road accidents is highest in August for the northern, southern and east coast regions, while it is marginally higher in July in the central region and highest in October in the Borneo region. The estimated number of road accidents is lowest in February in all five regions. Interestingly, the number of road accidents throughout Malaysia is generally lower in April, May and June than in December. The low number of road accidents in February may be related to the smaller number of days in that month, which has only 28 or 29 days.

Table 4.6: Estimated regional road accidents model

| Coefficient | Northern | Central | Southern | East Coast | Borneo | Detached significance markers |
|---|---|---|---|---|---|---|
| β0 | 8.5560** | 9.0470** | 8.4070** | 7.4849** | 7.5015** | |
| v | 0.0030** | 0.0043** | 0.0045** | 0.0047** | 0.0039** | |
| γ1 | 0.0169 | 0.0107 | -0.0091 | -0.0416 | -0.0562 | ** |
| γ2 | -0.0439** | -0.0983** | -0.0898** | -0.0879** | -0.1186** | |
| γ3 | 0.0162 | 0.0634 | -0.0019 | 0.0194 | -0.0144 | ** |
| γ4 | -0.0088 | 0.0458 | -0.0414 | -0.0417 | -0.0435 | ** ** ** |
| γ5 | -0.0081 | 0.0335 | -0.0071 | -0.0044 | 0.0097 | ** |
| γ6 | -0.0398 | -0.0043 | -0.0528 | -0.0327 | -0.0569 | ** ** ** |
| γ7 | -0.0021 | 0.0776 | 0.0009 | -0.0250 | -0.0253 | ** |
| γ8 | 0.0472 | 0.0727 | 0.0284 | 0.0828 | -0.0013 | ** ** * ** |
| γ9 | -0.0081 | 0.0345 | -0.0175 | 0.0419 | -0.0438 | ** ** |
| γ10 | 0.0401 | 0.0775 | 0.0255 | 0.0696 | 0.0193 | ** ** * * |
| γ11 | -0.0117 | 0.0221 | -0.0153 | 0.0220 | -0.0360 | * |
| LB | 17.074 | 74.486** | 41.843** | 33.887** | 160.69** | |
| GQ | 0.89999 | 0.9919 | 0.5146 | 0.5504 | 1.2206 | |
| JB | 2.7031 | 0.4287 | 4.4262 | 35.801** | 1.5183 | |
| R2 | 0.9023 | 0.9585 | 0.9651 | 0.8283 | 0.9290 | |

Note: ** and * denote significance at the 5% and 10% levels. Markers in the last column belong to that row, but their assignment to individual regions is not recoverable from the source layout.

Despite its simplicity, the time series regression model fits the data very well, with R2 values above 80%. The values range from 82.8% for the east coast region to 96.5% for the southern region.

However, despite the high goodness-of-fit, the LB statistic presented near the bottom of Table 4.6 indicates a serious autocorrelation problem. This means that the assumption of independent residuals is not met for any region except the northern region. The Goldfeld-Quandt (GQ) test statistic shows the absence of heteroscedasticity in the residuals, while the Jarque-Bera (JB) test statistic shows that the normality assumption for the residuals is satisfied for most regions.
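The LB and JB statistics used in these diagnostics follow standard textbook formulas, which can be sketched as below. This is an illustrative pure-Python version rather than the software used in the thesis, which is not stated in this section. The function names and the example residuals are hypothetical, and the GQ test is omitted because it requires re-estimating the regression on sub-samples.

```python
def autocorrelation(resid, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(resid)
    mean = sum(resid) / n
    denom = sum((e - mean) ** 2 for e in resid)
    num = sum((resid[t] - mean) * (resid[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n-k), for k = 1..max_lag."""
    n = len(resid)
    return n * (n + 2) * sum(
        autocorrelation(resid, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

def jarque_bera(resid):
    """Jarque-Bera JB = n/6 * (S^2 + (K-3)^2 / 4) from sample skewness and kurtosis."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Hypothetical residuals that flip sign every period (strong negative autocorrelation).
example = [1.0, -1.0] * 10
q1 = ljung_box_q(example, 1)
jb = jarque_bera(example)
print(f"LB(1) = {q1:.2f}, JB = {jb:.2f}")
```

Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags, and JB with a chi-square distribution with two degrees of freedom. A large Q therefore signals dependent residuals, as reported for most regions in Table 4.6.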


Table 4.7: Road accidents model for individual states State Coefficient Kuala Negeri Penang Perak Perlis Kedah Selangor Lumpur Sembilan ** ** ** ** ** ** ** β0 7.6970 7.5825 4.2686 6.8698 8.6120 8.0050 6.9247 v 0.0027** 0.0027** 0.0051** 0.0039** 0.0047** 0.0035** 0.0042** ** ** γ1 0.0067 0.0196 0.1010 0.0267 0.0071 0.0162 -0.0514

** ** ** ** γ 2 -0.0803 -0.0078 -0.0139 -0.0383 -0.0981 -0.0994 -0.0847

** ** γ 3 0.0059 0.0154 0.0751 0.0335 0.0593 0.0708 -0.0061

** * γ 4 -0.0096 -0.0047 0.0661 -0.0216 0.0524 0.0321 -0.0373

* * γ 5 -0.0165 -0.0059 0.0444 0.002 0.0335 0.0325 -0.0065

** ** γ 6 -0.0378 -0.047 -0.0232 -0.0306 -0.0039 -0.0062 -0.0539

** ** γ 7 0.0086 -0.0151 0.0479 -0.0038 0.0763 0.0795 -0.0211

* ** ** ** ** γ 8 0.0318 0.0502 0.059 0.0696 0.0699 0.0780 0.0044

* γ 9 -0.0291 -0.0057 0.0848 0.0217 0.0392 0.0242 -0.0264

** ** ** ** γ10 0.0114 0.0599 0.007 0.0653 0.0759 0.0793 0.0300

** γ11 -0.0359 0.016 -0.014 -0.0118 0.0213 0.0231 0.0209 LB 279.73** 10.4735 55.7156** 20.583* 75.042** 71.267** 51.216** GQ 0.8813 0.8908 0.9321 0.8153 0.7797 1.0745 0.7235 JB 0.6906 8.7733** 11.837** 21.748** 0.0049 13.042** 66.007** R2 0.8852 0.7878 0.7847 0.8716 0.9526 0.9345 0.903 State Coefficient Johor Melaka Kelantan Terengganu Pahang Sabah Sarawak ** ** ** ** ** ** ** β0 7.9080 6.6048 6.1416 6.1165 6.7556 6.7404 6.8717 v 0.0047** 0.0040** 0.0040** 0.0050** 0.0049** 0.0041** 0.0038** ** ** γ1 0.0083 -0.0158 -0.0442 -0.0316 -0.0435 -0.0470 -0.0650

** ** ** ** ** γ 2 -0.0972 -0.0685 -0.0261 -0.0668 -0.1314 -0.1389 -0.1018

γ 3 -0.0011 0.0025 0.0688 0.034 -0.011 -0.0098 -0.0193

** * * ** γ 4 -0.0419 -0.0421 -0.0079 -0.0433 -0.0546 -0.0396 -0.0472

γ 5 -0.0045 -0.016 0.014 0.0019 -0.0143 -0.0025 0.0202

** * ** ** ** γ 6 -0.0539 -0.0429 0.0375 -0.0197 -0.0740 -0.0575 -0.0565

* γ 7 0.0118 -0.0074 -0.0083 -0.0157 -0.0355 -0.0085 -0.0410

** ** ** γ 8 0.0418 0.0121 0.1547 0.0927 0.043 0.0228 -0.0239

** ** γ 9 -0.0098 -0.0333 0.1046 0.0640 -0.0026 -0.0046 -0.0808

* ** ** ** γ10 0.0309 0.0005 0.1293 0.1090 0.0179 0.0459 -0.0052

* ** γ11 -0.0232 -0.0400 0.0548 0.0485 -0.0117 -0.0075 -0.0627 LB 72.211** 139.63** 34.438** 43.406** 39.467** 105.39** 116.52** GQ 0.3806 0.5136 0.5666 0.4449 0.7658 1.1838 0.7963 JB 2.8724 1.8596 67.557** 29.157** 0.401 3.0919 0.7695 R2 0.9601 0.9074 0.6334 0.7977 0.871 0.9208 0.8933 **,* denote significant at 5% and 10% level


The results of the OLS time series regression model with monthly seasonal dummies for the number of road accidents in individual states are tabulated in Table 4.7. The results show that the estimated monthly growth of road accidents is generally between 0.3% and 0.5%. Surprisingly, the highest growth is found for the state of Perlis at 0.51%, followed by Terengganu and Pahang at 0.50% and 0.49%, respectively. Meanwhile, Penang and Perak are found to have the lowest monthly growth of road accidents in Malaysia.

The estimated seasonal coefficients show that the lowest number of accidents is recorded in February for eleven of the fourteen states. For Perlis and Perak the lowest number of road accidents is found in June, while for Kelantan it is found in January. The month with the largest number of road accidents varies across states: Penang, Kedah, Johor, Melaka, Kelantan and Pahang recorded the largest number in August, while Perak, Negeri Sembilan, Terengganu and Sabah recorded the largest number near the end of the year, in October.

As in the regional models, the time series regression model fits the data well, with R2 values ranging from 63.3% for the state of Kelantan to 96.0% for the state of Johor. However, the LB statistic indicates a serious autocorrelation problem. This means that the assumption of independent residuals is not met for the majority of the states, the exception being Perak. The GQ test statistic shows the absence of heteroscedasticity in the residuals, while the JB test statistic shows that the normality assumption for the residuals is not satisfied for certain states in the northern, central, southern and east coast regions.


4.4.2 Incorporating Explanatory Variables

The analysis is continued by incorporating selected explanatory variables as a means of investigating possible causes of road accidents in Malaysia. The results of the preliminary time series regression model can be found in Appendix 2. Table 4.8 shows the Durbin-Watson test statistic for autocorrelation, while Tables 4.9 and 4.10 show the variance inflation factor (VIF) values based on the preliminary model for the regions and the individual states, respectively. Meanwhile, Table 4.11 lists the possible outlier observations, identified as those with standardized residuals greater than 3 in magnitude, together with the corresponding month.

Table 4.8: Durbin-Watson test of autocorrelation

Region             Durbin-Watson    Decision
Northern           1.250            Positive serial correlation
Central            1.138            Positive serial correlation
Southern           1.118            Positive serial correlation
East Coast         1.723            No serial correlation
Borneo             1.033            Positive serial correlation

State              Durbin-Watson    Decision
Penang             1.264            Positive serial correlation
Perak              1.374            Positive serial correlation
Perlis             1.564            Positive serial correlation
Kedah              1.124            Positive serial correlation
Selangor           1.138            Positive serial correlation
Kuala Lumpur       1.571            Positive serial correlation
Melaka             0.993            Positive serial correlation
Johor              0.360            Positive serial correlation
Negeri Sembilan    1.826            No serial correlation
Kelantan           2.028            No serial correlation
Terengganu         1.899            No serial correlation
Pahang             1.280            Positive serial correlation
Sabah              1.186            Positive serial correlation
Sarawak            1.149            Positive serial correlation


Table 4.9: Variance inflation factor for regions

Variable    Northern    Central    Southern    East Coast    Borneo
RAINF       5.210       3.915      2.791       6.061         5.863
RAIND       7.618       4.983      3.870       7.126         5.946
TEMP        4.634       2.512      3.007       10.928        8.455
API         2.893       2.437      1.726       3.638         1.937
OILP        5.906       6.176      6.026       5.830         6.580
CPI         1.370       1.321      1.364       1.418         1.463
SAFE        4.554       4.292      4.341       4.304         4.281
BLKG        3.958       3.695      3.707       3.751         3.702

Table 4.10: Variance inflation factor for individual states

Variable    Penang    Perak    Perlis    Kedah    Selangor    Kuala Lumpur    Melaka
RAINF       3.164     4.252    2.364     4.185    3.417       2.490           1.799
RAIND       4.857     5.789    4.877     7.832    4.177       3.350           3.221
TEMP        3.891     3.239    5.264     5.195    2.697       2.015           3.436
API         2.041     2.389    1.632     2.309    1.781       2.206           1.573
OILP        5.885     5.915    2.765     6.066    2.946       5.782           5.953
CPI         1.391     1.437    1.911     1.477    2.035       1.356           1.395
SAFE        4.404     4.362    4.629     4.452    4.345       4.288           4.403
BLKG        3.824     3.786    4.162     3.911    3.918       3.786           3.815

Variable    Johor    Negeri Sembilan    Kelantan    Terengganu    Pahang    Sabah    Sarawak
RAINF       2.699    2.615              4.018       3.221         3.611     3.372    5.145
RAIND       3.233    2.736              4.901       4.699         3.909     4.893    4.901
TEMP        3.867    2.495              10.154      6.020         7.193     6.906    6.765
API         1.378    2.365              2.907       2.934         2.952     1.369    2.369
OILP        5.721    6.553              5.822       5.513         5.889     5.845    6.904
CPI         1.335    1.436              1.330       1.812         1.394     1.438    1.583
SAFE        4.394    4.354              4.397       4.274         4.289     4.318    4.321
BLKG        3.741    3.723              3.747       3.704         3.710     3.831    3.741

Results in Table 4.8 show that most of the estimated models have an autocorrelation problem in the estimated residuals, except for the east coast region and the states of Negeri Sembilan, Kelantan and Terengganu. The lag of the dependent variable (LRA_1) is included in the subsequent analysis as a remedy for the autocorrelation problem. The majority of the VIF values in Tables 4.9 and 4.10 are less than 10, indicating that there is no multicollinearity problem among the selected explanatory variables. Although the VIF values for the TEMP variable in the east coast region and the state of Kelantan exceed the common rules of thumb of 4 or 10, the variable is retained in the subsequent analysis. Brien (2007) found that it is inappropriate to question the results of a study merely because the VIF is greater than 4, 10 or 30, as the value is closely related to the sample size.
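The Durbin-Watson statistic reported in Table 4.8 is the ratio of the sum of squared successive residual differences to the residual sum of squares. The sketch below is illustrative only: the function name and both residual series are hypothetical, and the decision bounds used in the thesis, which depend on the sample size and number of regressors, are not reproduced.

```python
def durbin_watson(resid):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2; roughly 2 * (1 - r_1)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

persistent = [1.0, 1.0, 1.0, 1.0]      # hypothetical: residuals never change sign
alternating = [1.0, -1.0, 1.0, -1.0]   # hypothetical: residuals flip sign each period

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(d_persistent, d_alternating)
```

Values near 2 indicate no first-order serial correlation, values well below 2 indicate positive serial correlation (as for most entries in Table 4.8), and values well above 2 indicate negative serial correlation.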

Table 4.11: List of possible outlier observations

Regions/States     Month     Standardized residual (bi)
East Coast         Nov-03     3.8327
Penang
Perak              Nov-07    -3.233
                   Mar-08     3.046
                   Apr-08     4.650
Negeri Sembilan    Dec-01     3.543
                   Nov-11     5.749
Kelantan           Nov-03     4.264

Referring to the preliminary model in Appendix 2, the road accident models for the east coast region and several states, specifically Perak, Negeri Sembilan and Kelantan, do not satisfy the assumption of normally distributed residuals. The violation of the normality assumption may be due to outliers in the observations. Table 4.11 lists the possible outliers for this region and these states. The table indicates that at least one possible outlier is found for each state that fails to satisfy the normality assumption. A dummy variable for each possible outlier is included in the subsequent analysis as a remedy for the outliers and the violated normality assumption. However, only significant dummy variables are retained.

After accounting for autocorrelation, multicollinearity and outliers, the final ordinary least squares time series regression model is written as:

$$Y_t = \beta_0 + v t + \sum_{j=1}^{11} \gamma_j D_{jt} + \sum_{i=1}^{11} \beta_i X_{it} + \varepsilon_t \qquad (3.60)$$


where βi is the coefficient of the i-th explanatory or dummy variable and Xit is the corresponding explanatory or dummy variable. Results from the estimated time series regression with the explanatory variables are displayed in Table 4.12 for the regions and in Table 4.13 for individual states. The VIF values of both models are displayed in Appendix 3.
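The structure of the regression, with an intercept, a linear trend, eleven monthly dummies and optional explanatory variables, can be sketched as follows. This is a hedged illustration: the helper names, the time index starting at t = 1 and the coding of December as the baseline month are assumptions. The coefficients are the northern-region estimates from Table 4.6, that is, the model without explanatory variables.

```python
def design_row(t, month, x=()):
    """Regressors for one observation: intercept, trend, 11 month dummies, extras."""
    dummies = [1.0 if month == j else 0.0 for j in range(1, 12)]  # Dec -> all zeros
    return [1.0, float(t)] + dummies + [float(v) for v in x]

def fitted_value(coefficients, row):
    """Fitted log number of accidents: inner product of coefficients and regressors."""
    return sum(c * r for c, r in zip(coefficients, row))

# beta_0, v, gamma_1..gamma_11 for the northern region (Table 4.6).
NORTHERN = [8.5560, 0.0030,
            0.0169, -0.0439, 0.0162, -0.0088, -0.0081, -0.0398,
            -0.0021, 0.0472, -0.0081, 0.0401, -0.0117]

feb = fitted_value(NORTHERN, design_row(t=2, month=2))     # hypothetical 2nd month
dec = fitted_value(NORTHERN, design_row(t=12, month=12))   # hypothetical 12th month
print(f"February: {feb:.4f}, December: {dec:.4f}")
```

Explanatory variables and outlier dummies would enter through the `x` argument, with their coefficients appended to the coefficient list in the same order.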

As in the model without explanatory variables, the average monthly growth of road accidents is between 0.3% and 0.5% per month in the five regions, with the east coast region recording the highest growth at 0.5% per month. Despite the rapid development experienced by the northern, southern and central regions, their estimated road accident growth is lower, at 0.3%, while Borneo recorded 0.4%.

With regard to the monthly seasonal pattern, the estimated number of road accidents is highest in August in the northern and east coast regions, in March in the central region (by a small margin), in July in the southern region and in October in the Borneo region. The estimated number of road accidents is lowest in February in all five regions. Interestingly, the number of road accidents throughout Malaysia is generally lower in January, April, June, September and November than in December.

In relation to the explanatory variables, the estimated models indicate that the number of road accidents in most regions increases with adverse climate conditions. Heavy rainfall increases the number of road accidents in all regions. This result is expected because heavy rain makes the road slippery, which causes loss of traction between tyres and road, and also impairs visibility. However, this relationship is only significant for the northern, southern and central regions. This result makes sense, as those regions are among the developed regions that have experienced rapid urbanisation, where heavy rain can directly cause flash floods or landslides that may lead to road accidents.

Table 4.12: Estimated regional road accidents model by incorporating explanatory variables Regions Coefficients Northern Central Southern East Coast Borneo ** ** ** ** ** β0 7.429 4.887 5.611 7.491 6.067 v 0.003** 0.003** 0.003** 0.005** 0.004** -4 ** * γ1 4.8×10 0.033 -0.022 -0.008 -0.041 ** ** ** ** γ 2 -0.087 -0.077 -0.098 -0.026 -0.086 ** * ** γ 3 0.026 0.102 0.034 0.171 0.018 ** ** γ 4 -0.008 0.001 -0.037 0.131 -0.028 ** γ 5 0.005 0.018 0.018 0.156 0.021 ** ** γ 6 -0.022 0.001 -0.028 0.139 -0.044 ** ** ** γ 7 0.017 0.095 0.038 0.136 -0.007 ** ** ** ** γ 8 0.037 0.064 0.037 0.200 0.006 ** ** ** γ 9 -0.031 0.014 -0.021 0.142 -0.041 ** γ10 0.008 0.078 0.025 0.104 0.024 ** γ11 -0.049 0.001 -0.043 -0.019 -0.033 Nov2003 Outlier - - - - 0.294** -5** -5** -5** -5 -5 β1_ RAINF 2.7×10 4.2×10 1.7. ×10 2.0×10 1.4. ×10 β 2_ RAIND -0.001 0.001 0.001 0.004 0.002 β ** 3_ TEMP 0.005 0.008 0.006 -0.010 0.029 -4 -4 -5 -5 -5 β4_ API -3.1×10 -1.9×10 -9.3×10 2.9×10 2.4×10 -4 -5 -5 -5** -5 β5_ OILP 8.8×10 -2.8 ×10 -4.1×10 -4.2×10 2.3×10 -4 -4 * β6_ CPI -4.9×10 -3.9×10 -0.001 0.001 0.001 * ** ** β7_ BLKG 0.020 -0.038 0.012 0.057 -0.006 ** ** ** * β8_ SAFE 0.077 0.013 0.041 0.141 0.043 * β9_ LRA_1 0.115 0.432 0.317 -0.007 0.042 LB 34.01** 9.290 15.902** 45.703** 138.23** GQ 1.062 0.941 0.562 0.461 1.300 JB 2.566 0.977 7.342 0.387 2.103 R2 0.948 0.970 0.978 0.920 0.932 **,* denote significant at 5% and 10% level


Meanwhile, the estimated model shows that the number of rainy days (RAIND) contributes to an increase in the number of road accidents in all regions except the northern region, although the contribution is not statistically significant. The model also indicates that the number of road accidents increases with temperature (TEMP), except in the east coast region. However, the finding is only significant for the Borneo region. As expected, the estimated model also shows that the number of accidents increases with the air pollution index (API) for all regions. This result is plausible, as a high API reduces drivers' visibility. Yet no significant relationship between API and the number of road accidents is found.

The estimated model for the influence of economic effects on traffic accidents suggests that the number of road accidents in almost all regions increases with the consumer price index (CPI) for transportation and with the crude oil price (OILP). The finding on CPI, however, does not hold for the southern region, which shows a significant negative relationship. OILP, on the other hand, does not show any significant effect on the number of road accidents, in contrast with Scott (1986), who stated that petrol price appeared to be quite strongly related to many accident series.

As expected, the estimated model shows that the number of road accidents increases during the "return to hometown" or Balik Kampung (BLKG) period, which usually takes place during festive seasons including Eid-ul-Fitr, Chinese New Year and Deepavali. The central and Borneo regions are exceptions and show the opposite result. For the central region the finding is reasonable, as many residents are not originally from the region and travel to their hometowns during festive seasons. The relationship between BLKG and the number of road accidents is only significant for the northern, central and east coast regions.

Surprisingly, the estimated model indicates that the safety operations (SAFE), such as OPS Sikap or OPS Selamat, which take place during festive seasons, do not reduce the occurrence of accidents. The finding is significant for all regions except the central region.

The road accident series for the east coast region is found to contain an outlier, possibly in November 2003. The outlier is related to the year-end school holiday coinciding with the Eid-ul-Fitr holiday, which offered a long holiday period and led to more road accidents. In this case road accidents are estimated to increase by almost 29.4%.
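The percentage attached to a dummy coefficient depends on how the coefficient is converted. The sketch below assumes that the dependent variable is the natural logarithm of the number of accidents, which is consistent with the lagged variable LRA_1 and with the growth rates quoted in the text. Under that assumption, 100β is a first-order approximation and 100(exp(β) − 1) is the exact percentage change. The function name is illustrative.

```python
import math

def percent_effect(beta):
    """Exact percentage change implied by a coefficient beta in a log-linear model."""
    return (math.exp(beta) - 1.0) * 100.0

outlier_nov2003 = 0.294  # east coast outlier dummy coefficient (Table 4.12)

approx = 100.0 * outlier_nov2003          # first-order approximation
exact = percent_effect(outlier_nov2003)   # exact conversion
print(f"approximate: {approx:.1f}%, exact: {exact:.1f}%")
```

For small coefficients such as the monthly trend the two figures are nearly identical, but for a coefficient of this size the exact conversion gives roughly 34% rather than 29.4%.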

After incorporating these explanatory variables, the time series regression models fit the data somewhat better than the models without explanatory variables. The R2 values exceed 90%, ranging from 92.4% for the east coast region to 97.8% for the southern region. Moreover, the model diagnostics show that the road accident model for the central region is adequate, as all residual assumptions are satisfied. For the remaining regions, the test of residual independence yields large statistics whose probability values clearly reject the null hypothesis, indicating a high degree of dependence between the residuals.

The results of the OLS time series regression model with explanatory variables for the number of road accidents in individual states are tabulated in Table 4.13. Similar to the road accident model without explanatory variables, the estimated monthly growth of road accidents is generally between 0.2% and 0.6%, with the highest growth recorded by Perlis and Terengganu at 0.6%, followed by Kelantan and Pahang at 0.5%. Meanwhile, Penang, Selangor and Sarawak are found to have the lowest monthly growth of road accidents throughout Malaysia. This result differs from the model without explanatory variables, in which Penang and Perak have the lowest growth of road accidents.

The estimated monthly road accidents with explanatory variables show that February has the lowest number of road accidents for 11 of the 14 states. Meanwhile, Kelantan, Sabah and Sarawak recorded the lowest estimated number of road accidents in November, June and September, respectively. Similar to the model without explanatory variables, the month with the largest number of road accidents varies across states. Penang, Perlis, Melaka, Pahang and Sabah show the largest number of road accidents in March, Kedah, Perak, Negeri Sembilan, Kelantan and Terengganu show the largest number in August, and Kuala Lumpur has the largest number in October.

The estimated number of road accidents in almost all states increases under adverse weather conditions. The number of road accidents in all states increases with the amount of rainfall. Nevertheless, the finding is significant only for states other than Penang, Perlis, Selangor, Kuala Lumpur, Kelantan and Pahang. Similarly, an increase in the number of rainy days in Pahang, Kelantan, Sabah and Sarawak significantly increases the number of road accidents in those states, whereas in Perlis rainy days significantly reduce the number of road accidents.


Table 4.13: Estimated individual states’ road accidents model by incorporating explanatory variables States Coefficient Kuala Penang Perak Perlis Kedah Selangor Melaka Lumpur ** ** ** ** ** ** ** β0 4.603 7.053 4.134 5.154 5.520 6.132 3.850 v 0.002** 0.003** 0.006** 0.003** 0.003** 0.003** 0.003** * * γ1 -0.004 -0.007 0.111 -0.009 0.025 0.026 -0.041 ** ** ** ** ** ** γ 2 -0.106 -0.062 -0.002 -0.125 -0.063 -0.095 -0.099 ** γ 3 0.018 0.026 0.175 0.015 0.102 0.077 0.024 * ** ** γ 4 -0.035 -0.002 0.166 -0.060 0.024 -0.002 -0.059 * * * γ 5 -0.032 0.033 0.149 -0.006 0.042 0.009 -0.006 ** * * γ 6 -0.041 0.004 0.053 -0.043 0.016 -0.018 -0.036 ** ** ** ** γ 7 0.014 0.035 0.136 -0.009 0.118 0.080 0.005 ** ** ** γ 8 0.009 0.064 0.119 0.021 0.079 0.065 -0.005 ** ** ** ** γ 9 -0.070 -0.002 0.154 -0.041 0.056 0.010 -0.053 * ** ** γ10 -0.015 0.024 0.127 -0.003 0.098 0.082 -0.031 ** ** ** γ11 -0.067 -0.024 0.084 -0.090 0.005 0.019 -0.093 Nov 2007 June2010 -0.135** -0.124** Mar 2008 Oct 2010 Outlier NA NA NA NA June2013 0.150** -0.420** 0.123** April 2008 0.222** -5** -5** -4 -5** -5 -5 -4** β1_ RAINF 3.9×10 7.9×10 1.1×10 7.8×10 3.2×10 6.0×10 1.8. ×10 * -4 -4 β2_ RAIND 0.001 0.001 -0.007 0.001 0.003 -2.1×10 -1.7×10 * β3_ TEMP 0.011 0.009 -0.014 0.014 0.006 0.008 0.004 ** -5 -4 -4 -5 -5 -5 β4_ API -0.001 -1.2×10 -4.6×10 1.6×10 -6.2×10 -4.4×10 1.8×10 -4 -4** -4 -4 -7 -5 -4** β5_ OILP 1.7×10 -2.1×10 -3.1×10 1.7×10 4.6×10 -1.6×10 -2.3×10 -4 -4 -4 ** ** β6_ CPI -2.6×10 -3.8. ×10 -3.8×10 -0.001 -0.002 -0.002 -0.001 ** * ** ** ** β7_ BLKG -0.006 0.033 -0.046 0.037 -0.031 -0.028 0.040 ** ** ** -5 β8_ SAFE 0.037 0.120 0.085 0.100 7.8×10 -0.007 0.014 ** ** ** ** ** ** β9_ LRA_1 0.363 0.028 0.191 0.190 0.366 0.229 0.409 LB 27.688** 26.210** 19.474* 28.104** 9.0481 22.796** 19.374* GQ 1.025 1.048 0.619 0.656 1.0143 0.87568 0.85208 JB 6.623** 1.678 1.833 4.962* 5.6391** 0.900 0.497 R2 27.688 0.924 0.786 0.928 0.9448 0.949 0.939 **,* denote significant at 5% and 10% level


Table 4.13: Continued States Coefficient Negeri Tereng- Johor Kelantan Pahang Sabah Sarawak Sembilan ganu ** ** ** ** ** ** ** β0 4.286 6.296 5.739 6.011 5.644 2.597 2.917 v 0.003** 0.004** 0.005** 0.006** 0.005** 0.003** 0.002** ** ** γ1 -0.009 -0.068 -0.024 -0.019 -0.021 -0.015 -0.068 ** ** * ** ** γ 2 -0.110 -0.126 -0.008 -0.049 -0.074 -0.053 -0.061 ** ** ** ** ** γ 3 0.046 -0.018 0.219 0.144 0.098 0.073 0.025 * ** ** ** ** γ 4 -0.041 -0.060 0.158 0.107 0.038 -0.015 -0.052 ** ** * -4 γ 5 0.017 0.003 0.144 0.151 0.083 -1.2×10 0.029 ** ** ** ** ** γ 6 -0.040 -0.026 0.183 0.122 0.057 -0.058 -0.062 ** ** ** ** γ 7 0.046 0.009 0.119 0.116 0.089 0.011 -0.010 ** ** ** ** γ 8 0.039 0.019 0.219 0.190 0.122 0.026 -0.018 ** ** ** γ 9 -0.028 -0.020 0.160 0.152 0.045 -0.028 -0.080 ** ** ** * γ10 0.038 -0.010 0.122 0.148 0.013 0.043 0.011 ** ** ** γ11 -0.046 -0.075 -0.030 0.022 -0.055 -0.031 -0.076 Nov 2011 Nov 2003 Outlier NA NA NA- NA NA 0.339** 0.480** -5** -4** -4 -4** -4 -5** β1_ RAINF 2.5×10 1.5×10 1.5×10 1.8×10 1.1 ×10 3.2×10 0.000 -5 ** ** ** ** β2_ RAIND -3.9×10 0.002 0.007 0.001 0.009 0.005 0.005 ** -4 ** ** β3_ TEMP 0.006 0.014 0.002 -0.001 -2.4×10 0.045 0.034 -4 -5 -4 ** -4 ** -4 β4_ API -1.3×10 -4.4×10 2.0×10 0.002 -2.8×10 -0.001 4.1×10 -6 -5 -4** ** -4** -5 -5 β5_ OILP 1.4×10 -4.2×10 -4.1×10 -0.001 -3.1×10 -2.3×10 2.3×10 -4 ** ** ** -4 β6_ CPI -4.4×10 -0.002 0.003 0.002 0.001 0.001 3.0×10 ** * β7_ BLKG -0.005 0.052 0.058 0.032 0.045 -0.012 -0.010 ** ** ** ** * β8_ SAFE 0.047 0.029 0.232 0.194 0.103 0.036 0.045 ** ** ** ** β9_ LRA_1 0.437 0.050 -0.029 -0.049 0.125 0.373 0.397 LB 13.626 28.01** 23.13** 30.84** 52.96** 8.434 36.693** GQ 0.445 0.350 0.696 0.360 0.539 1.208 0.708 JB 5.429* 0.233 3.529 0.500 0.616 1.961 0.755 R2 0.967 0.946 0.828 0.881 0.919 0.942 0.908 **,* denote significant at 5% and 10% level


Meanwhile, the number of road accidents increases by between 0.2% and 4.5% with an increase in temperature. This holds for all states except Perlis, Terengganu and Pahang, which exhibit some reduction in road accidents on sunny days. However, the relationship is only significant for several states, namely Negeri Sembilan, Sabah and Sarawak. In the case of API, a high API reading is associated with a reduction in the number of road accidents in almost all states, the exceptions being Kedah, Sarawak, Melaka, Kelantan and Terengganu. The increase in these five states is expected, as a higher API reduces visibility distance and thus raises the risk of accidents. Yet the reduction is only significant for two states, Penang and Sabah.

For the economic variables, the estimated number of road accidents decreases with increases in the crude oil price and the consumer price index for transport. A significant reduction in road accidents with an increase in the crude oil price is seen for the states of Perak, Melaka, Kelantan, Pahang and Terengganu. Meanwhile, an increase in the consumer price index has a significant positive effect on the number of road accidents in Kelantan and Sabah.

Moreover, the estimated model suggests that the return to village or "Balik Kampung" culture, which occurs during the festive seasons, increases the number of road accidents in all states except Penang, Selangor, Kuala Lumpur, Johor, Sabah and Sarawak. However, the result is only significant for Perak, Selangor, Kuala Lumpur, Melaka, Negeri Sembilan and Pahang. Furthermore, the intervention initiatives implemented to reduce road accidents still appear to be ineffective. The estimated model shows that the SAFE operations are associated with an increase in road accidents in all states except Selangor and Kuala Lumpur.


In addition, the residual analysis suggests that outliers may exist in the estimated road accident models. Outlier points exist in several states, including Penang, Perak, Negeri Sembilan and Kelantan. Three outlier points are found for Perak, in November 2007, March 2008 and April 2008. The outlier in November 2007, whose cause is unknown, is estimated to decrease the number of road accidents in Perak by at least 13.5%, while the outliers in March and April 2008 are estimated to increase the number of road accidents by 15% and 22%, respectively.

A possible outlier point for the state of Selangor is found in January 2005. However, the reason for the sudden decrease is unknown. The number of road accidents is estimated to decrease by approximately 16.9%. For the states of Negeri Sembilan and Kelantan, outliers possibly occurred in November 2011 and November 2003, respectively. These outliers correspond to estimated increases in the number of road accidents of 33.9% and 48%, respectively. The 33.9% increase in November 2011 for Negeri Sembilan and the 48% increase in November 2003 for Kelantan can be related to the year-end school holiday coinciding with festive season celebrations.

4.5 Box and Jenkins SARIMA

Box and Jenkins SARIMA is specifically designed to handle features such as seasonal variation and trend within a series. The seasonal and trend components are handled within the model itself rather than through dummy variables as in the regression analysis described in Section 4.4 (Scott, 1986). Using the step-by-step procedure discussed in Section 3.3 of Chapter 3, which involves identification, estimation and validation, the estimated road accident models based on Box and Jenkins SARIMA analysis are presented in Table 4.14 and Table 4.15 for the regions and the individual states, respectively.

4.5.1 Estimating SARIMA Model for Regional Road Accidents

Referring to Table 4.14, the estimated SARIMA models differ from region to region, indicating that different regions exhibit different road accident patterns. This finding is similar to that of Palamara et al. (2013), who stated that the risk and severity of crashes vary with geographic location. For the northern, central and east coast regions, the road accident series fit better with first non-seasonal differencing. For the southern and Borneo regions, a stationary series is achieved after differencing at both the non-seasonal and seasonal lags.
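The differencing transformations referred to above can be sketched as follows. This is an illustrative example on a hypothetical deterministic series, not the accident data. The helper name `difference` and the toy monthly pattern are assumptions, and a seasonal period of 12 is assumed because the data are monthly.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: linear trend plus a fixed 12-month seasonal pattern.
pattern = [3, -5, 1, 0, 2, -1, 4, 6, 1, 5, 0, 2]
series = [2 * t + pattern[t % 12] for t in range(36)]

first = difference(series, lag=1)            # non-seasonal differencing (d = 1)
seasonal_only = difference(series, lag=12)   # seasonal differencing (D = 1)
both = difference(first, lag=12)             # d = 1 followed by D = 1

print(seasonal_only[:3], both[:3])
```

In this toy case seasonal differencing alone leaves a constant (the yearly trend increment), and applying both differences removes the trend and the fixed seasonal pattern entirely. With real data the differenced series would instead be checked for stationarity before fitting the ARMA terms.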

Table 4.14: Estimated regional road accidents model based on Box and Jenkins SARIMA models

Region    Northern    Central      Southern    East Coast    Borneo
Model     (0,1,1)     (2,1,0)      (2,0,0)     (1,1,1)       (0,1,1)
          (1,0,0)     (1,0,1)      (0,1,1)     (1,0,0)       (0,1,1)
c         NA          NA           0.0045**    NA            NA
AR(1)     NA          -0.6310**    0.2379**    -0.2824**     NA
AR(2)     NA          -0.2648**    0.2363**    NA            NA
MA(1)     0.9978**    NA           NA          0.7597**      0.7928**
SAR(1)    0.4442**    0.9998       NA          0.6009**      NA
SMA(1)    NA          0.9726**     0.6086**    NA            0.7958**
LB(12)    8.8112      7.6100       6.2437      9.0004        3.2564
JB        4.0069      0.9905       1.2039      59.824**      28.406**
WN        Yes         Yes          Yes         Yes           Yes
R2        0.8949      0.9641       0.9688      0.8600        0.9210

**,* denote significant at 5% and 10% level

Generally, the estimated models show that all coefficients are significant and the error terms are not serially correlated. Moreover, the R2 value, which is the percentage of variation in the dependent variable explained by the model, is quite high for each region. In addition, the residuals of all the estimated models are white noise.

4.5.2 Estimating SARIMA Model for Individual States Road Accidents

For individual states (refer to Table 4.15), the best estimated SARIMA model for the states of Penang, Perlis, Selangor, Sabah and Sarawak is SARIMA(0,1,1)(1,0,1). The model reflects that the number of road accidents in these states depends on the number of road accidents one year earlier and on the noise of recent observations.

Table 4.15: Estimated road accidents model for individual states based on Box and Jenkins SARIMA models

State     Penang      Perlis      Kedah       Perak       Selangor    Kuala Lumpur    Negeri Sembilan
Model     (0,1,1)     (0,1,1)     (0,1,1)     (0,1,1)     (0,1,1)     (1,1,1)         (0,1,1)
          (1,0,1)     (1,0,1)     (1,0,0)     (1,0,0)     (1,0,1)     (0,0,1)         (1,0,0)
c         NA          NA          NA          NA          NA          NA              0.0042**
AR(1)     NA          NA          NA          NA          NA          -0.1973
MA(1)     0.7337**    0.7648**    0.7210**    0.8499**    0.6619**    0.6289**        0.9421**
SAR(1)    0.9974**    0.9080**    0.7960**    0.3574**    0.9998**    -0.4451**       0.3842**
SMA(1)    0.9371**    0.8213**    NA          NA          0.9762**    NA              NA
LB(12)    8.0663      8.2509      16.2030     5.3010      8.8992      8.8375          5.9376
JB        0.1706      3.8138      3.8138      6.8729**    27.927**    5.8976*         121.59**
WN        Yes         Yes         Yes         Yes         Yes         Yes             Yes
R2        0.9259      0.8003      0.8731      0.7513      0.9607      0.8878          0.9075

State     Melaka      Johor       Kelantan     Terengganu    Pahang      Sabah       Sarawak
Model     (0,1,1)     (0,1,1)     (1,1,1)      (1,1,1)       (0,1,1)     (0,1,1)     (0,1,1)
          (0,1,1)     (0,1,1)     (1,0,0)      (1,0,0)       (1,0,0)     (1,0,1)     (1,0,1)
c         NA          NA          0.0035**     NA            0.0047**    NA          NA
AR(1)     NA          NA          -0.1746**    -0.2680**     NA          NA          NA
MA(1)     0.7122**    0.6405**    0.9299**     0.7549**      0.8676**    0.8058**    0.7824**
SAR(1)    NA          NA          0.5529**     0.5836**      0.3982**    0.9606**    0.9893**
SMA(1)    0.9089**    0.5953**    NA           NA            NA          0.7112**    0.8903**
LB(18)    9.8527      6.6438      9.3887       808628        9.7683      6.9104      8.4960
JB        2.2277      2.2692      157.53**     35.153**      9.2687**    6.0225**    1.6273
WN        Yes         Yes         No           Yes           Yes         Yes         Yes
R2        0.9303      0.9672      0.7135       0.8378        0.8736      0.933       0.9164

**,* denote significant at 5% and 10% level


Meanwhile, road accidents in Kuala Lumpur have their own special features, which can be represented by SARIMA(1,1,1)(0,0,1). The model indicates that the road accident series achieves stationarity after first differencing. The estimated road accidents for Kuala Lumpur can be expressed as a weighted sum of recent-year and recent values together with a weighted sum of recent noise values.

For the states of Perak, Kedah, Negeri Sembilan and Pahang, the road accident model can be represented by SARIMA(0,1,1)(1,0,0). The model implies that differencing at the first non-seasonal lag fits the road accident series fairly well. The estimated model shows that road accidents in these states are influenced by the accidents of recent years together with recent noise values.

In contrast, for the states of Melaka and Johor, stationarity was achieved after differencing at both the non-seasonal and seasonal lags. The best SARIMA model for both states is SARIMA(0,1,1)(0,1,1). The model resembles simple exponential smoothing applied to both the non-seasonal and seasonal components. The estimated model indicates that the series is influenced by the noise values of the recent month and year. On the other hand, the best SARIMA model for road accidents in Kelantan and Terengganu is SARIMA(1,1,1)(1,0,0). The estimated model indicates that the series achieved stationarity after first differencing at the non-seasonal lag. Road accidents in both states are influenced by the number of road accidents in the recent month and year as well as by the noise values of recent months.

All coefficients are found to be significant, as in the estimated SARIMA models for regional road accidents. At the same time, all time series models achieve the white noise condition. The other residual assumptions are satisfied, except for the normality assumption for the states of Penang, Perlis, Kedah, Melaka and Johor.

4.6 Summary

This chapter discussed the properties of the data used in this study. This analysis serves as a preliminary study before further analysis is applied. From the analysis, an early description of the variables is obtained. The strength of the relationships between the number of road accidents and the other variables will inform the analysis in the next chapter. In addition, the common time series methods used in this chapter, namely time series regression and the seasonal autoregressive integrated moving average model, will be compared with the structural time series model, which will be employed in the next two chapters.


CHAPTER 5

MODELING UNIVARIATE ROAD ACCIDENTS MODEL

Following the preliminary study in Chapter 4, which described the properties of the data series and the common analyses of road accidents, this chapter estimates models for the number of road accidents using the structural time series (STS) approach. The chapter begins with model identification, followed by model estimation for the number of road accidents in five regions and individual states in Malaysia. Diagnostic checking of the estimated residuals of each model is conducted as one of the identification procedures, and the Akaike Information Criterion (AIC) is used to select the best model. The estimated road accidents models for all regions and individual states are then compared with the time series regression (TSR) and seasonal autoregressive integrated moving average (SARIMA) models to measure their performance.

5.1 Model Estimation

In this analysis, the number of road accidents is expressed in natural logarithm form. Recall from Section 3.4 that the structural time series methodology decomposes a time series into trend (the sum of the level and slope components), seasonal and irregular components. Allowing each component to be either stochastic or deterministic leads to twelve possible combinations of structural time series model (refer to Table 3.4). The best structural time series model describing the number of road accidents is determined by the AIC model selection criterion, while making sure that all assumptions on the errors are fulfilled. The three main assumptions are independence, homoscedasticity, and normality, which are diagnosed using the Ljung-Box (LB), Goldfeld-Quandt (GQ), and Jarque-Bera (JB) tests respectively.
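As a reminder of what the variance parameters reported in the tables of this chapter refer to, a basic structural model with dummy-variable seasonality can be sketched as follows. This is the standard textbook form and is an assumption here, since Section 3.4 is not reproduced in this chapter: $y_t$ is the log number of road accidents, $s = 12$ is the seasonal period, and $\zeta_t$ is taken to be the slope disturbance whose variance appears as $\sigma^2_\varsigma$ in the tables.

```latex
\begin{align*}
y_t &= \mu_t + \gamma_t + \varepsilon_t, & \varepsilon_t &\sim N(0,\sigma^2_\varepsilon) \\
\mu_{t+1} &= \mu_t + \nu_t + \eta_t, & \eta_t &\sim N(0,\sigma^2_\eta) \\
\nu_{t+1} &= \nu_t + \zeta_t, & \zeta_t &\sim N(0,\sigma^2_\zeta) \\
\gamma_{t+1} &= -\sum_{j=1}^{s-1}\gamma_{t+1-j} + \omega_t, & \omega_t &\sim N(0,\sigma^2_\omega)
\end{align*}
```

Under this form, setting a disturbance variance to zero makes the corresponding component deterministic. For example, $\sigma^2_\omega = 0$ gives a fixed seasonal pattern, and dropping $\nu_t$ gives the local level models, for which the slope variance is reported as NA.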

Fitting the road accidents series for each aggregated region with all twelve possible models shows that the models combining deterministic level and stochastic seasonal (DLSS), linear trend and stochastic seasonal (LTSS), and linear trend and deterministic seasonal (LTDS) cannot be estimated because the algorithm failed to converge. In addition, the analysis found that most of the models which incorporate stochastic components produce lower disturbance variance than models with deterministic components. This indicates that road accidents models with stochastic components perform better, as they are more flexible and the trend or pattern of the series can adapt to the underlying changes (Harvey and Koopman, 1996).

5.1.1 Estimating Road Accident Model for Northern Region

Out of the twelve possible models, only seven can be estimated, while the remaining five cannot be estimated because the algorithm failed to converge. The estimated models are displayed in Table 5.1. Of the seven models, the local level with stochastic seasonal (LLSS) model has the lowest AIC value (-5.3102). However, the LLSS model does not satisfy the normality assumption. This might be related to the existence of outliers or a level break in the series, as in Commandeur and Koopman (2007). On the other hand, the estimated variance of the seasonal errors ($\sigma^2_\omega$) is quite small, which indicates that the seasonal pattern of road accidents in the northern region rarely changes over time. As can be seen in Figure 5.1, there is a consistent seasonal pattern with approximately equal intervals between peaks and troughs.

Table 5.1: Estimated results and performance criteria of STS model for northern region road accidents

Parameter  DLDS      LLDS      LLSS      DTDS      DTSS      LDDS      LDSS
σ²ε        0.022     0.002     0.001     0.002     0.002     0.002     0.001
σ²η        0.000     2.0×10⁻⁴  1.4×10⁻⁴  0.000     0.000     2.0×10⁻⁵  3.2×10⁻⁵
σ²ς        NA        NA        NA        0.000     0.000     0.000     0.000
σ²ω        0.000     0.000     5.9×10⁻⁶  0.000     2.9×10⁻⁶  0.000     5.3×10⁻⁶
LB(12)     508.46**  23.255**  11.161    17.512**  13.605    16.627*   7.675
GQ         5.532**   1.230     0.850     1.330     1.042     1.241     0.856
JB         0.040*    2.027     11.483**  0.545     1.742     0.356     5.052*
AIC        -3.176    -5.263    -5.310    -5.229    -5.252    -5.252    -5.300

*, ** denote significance at the 10% and 5% level respectively. Underlined AIC indicates the best model specification.

Figure 5.1: Seasonal components of northern region road accidents
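The selection step applied to Table 5.1 can be sketched in a few lines of Python. The AIC values and pass/fail flags below are copied from the table, with a test counted as passed when its statistic is not starred at the 5% level. The function names are hypothetical, and the second rule (lowest AIC among models passing all three tests) is shown only for contrast; the thesis itself reports the lowest-AIC model, LLSS, as the best specification.

```python
# AIC and diagnostic outcomes copied from Table 5.1 (northern region).
# Each flag is True when the statistic is NOT significant at the 5% level.
table_5_1 = {
    # model: (AIC, LB passed, GQ passed, JB passed)
    "DLDS": (-3.176, False, False, True),
    "LLDS": (-5.263, False, True, True),
    "LLSS": (-5.310, True, True, False),
    "DTDS": (-5.229, False, True, True),
    "DTSS": (-5.252, True, True, True),
    "LDDS": (-5.252, True, True, True),
    "LDSS": (-5.300, True, True, True),
}


def best_by_aic(models):
    """Return the model name with the lowest AIC, ignoring diagnostics."""
    return min(models, key=lambda name: models[name][0])


def best_passing_all_tests(models):
    """Return the lowest-AIC model among those passing LB, GQ and JB."""
    passing = {name: row for name, row in models.items() if all(row[1:])}
    if not passing:
        return None
    return min(passing, key=lambda name: passing[name][0])


print(best_by_aic(table_5_1))             # rule reported in the text
print(best_passing_all_tests(table_5_1))  # stricter, hypothetical rule
```

The two rules differ here only because the lowest-AIC model fails the normality test, which the chapter treats as the least critical of the three assumptions.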

5.1.2 Estimating Road Accident Model for Other Regions

For the central region, five out of twelve structural time series models can be estimated and the results are tabulated in Table 5.2. It is noticed that the stochastic seasonal component is not appropriate in explaining the number of road accidents in this region. This finding should not be a surprise, as Figure 4.1(b) does not show clear month-to-month variation. The absence of a stochastic seasonal component indicates that the seasonal variation in the number of road accidents in the central region remains the same over the years. Referring to the residual diagnostics, only two models, the local level with deterministic seasonal (LLDS) model and the local level drift with deterministic seasonal (LDDS) model, satisfy all three residual assumptions.

Table 5.2: Estimated results and performance criteria of STS model for central region road accidents

Parameter  DLDS      LLDS      DTDS      STDS      LDDS
σ²ε        0.042     0.001     0.002     0.001     0.001
σ²η        0.000     3.0×10⁻⁴  0.000     0.000     2.0×10⁻⁴
σ²ς        NA        NA        0.000     4.8×10⁻⁷  0.000
σ²ω        0.000     0.000     0.000     0.000     0.000
LB(12)     964.57**  9.078     61.990**  29.965**  9.566
GQ         7.624**   0.854     1.090     1.036     0.907
JB         5.129*    0.560     3.249     1.408     1.097
AIC        -2.569    -5.608    -5.387    -5.510    -5.591

**, * denote significance at the 5% and 10% level respectively. Underlined AIC indicates the best model specification.

In addition, the best combination of components to describe the number of road accidents in the central region is the LLDS model, which has the lowest AIC value (-5.608). In contrast to the northern region, the best model for the central region has a deterministic rather than a stochastic seasonal component; as in the northern region, it does not include a slope component.

Next, possible models for the east coast, southern and Borneo regions are identified, and the estimated results are tabulated in Table 5.3, Table 5.4 and Table 5.5, respectively. For the east coast region, notice that every possible model clearly violates the most important residual assumption, namely independence of the estimated residuals. In addition, the test for normality of the residuals is also significant, which may be due to possible outliers in the series, as mentioned by Commandeur and Koopman (2007). These issues can be addressed by adding intervention and explanatory variables. In this situation, the best model is chosen based only on the lowest AIC value, which gives the LLSS model.

Figure 5.2: Seasonal components of southern region road accidents

Table 5.3: Estimated results and performance criteria of STS model for east coast region road accidents

Parameter  DLDS      LLDS      LLSS      DTDS      DTSS      STDS      STSS      LDDS      LDSS
σ²ε        0.057     0.008     0.001     0.010     0.001     0.001     0.002     0.009     0.002
σ²η        0.000     4.0×10⁻⁴  4.0×10⁻⁴  0.000     0.000     0.000     0.000     1.0×10⁻⁴  2.0×10⁻⁴
σ²ς        NA        NA        NA        0.000     0.000     8.5×10⁻⁷  1.9×10⁻⁶  0.000     0.000
σ²ω        0.000     0.000     8.1×10⁻⁵  0.000     2.3×10⁻⁵  0.000     7.2×10⁻⁵  0.000     7.8×10⁻⁵
LB(12)     366.64**  48.073**  24.791**  45.362**  48.100**  40.229**  17.896**  41.727**  18.335**
GQ         4.487**   0.730     0.512     0.885     0.672     0.778     0.535     0.806     0.551
JB         1.772     45.190**  94.216**  17.775**  32.149**  32.772**  82.430**  33.397**  79.502**
AIC        -2.288    -3.874    -3.991    -3.812    -3.811    -3.841    -3.950    -3.843    -3.960

**, * denote significance at the 5% and 10% level respectively. Underlined AIC indicates the best model specification.


Table 5.4: Estimated results and performance criteria of STS model for southern region road accidents

Parameter  DLDS      LLDS      LLSS      DTDS      DTSS      STDS      STSS      LDDS      LDSS
σ²ε        0.046     0.001     0.001     0.002     0.001     0.001     0.001     0.001     0.001
σ²η        0.000     3.0×10⁻⁴  2.0×10⁻⁴  0.000     0.000     0.000     0.000     1.0×10⁻⁴  8.3×10⁻⁵
σ²ς        NA        NA        NA        0.000     0.000     8.7×10⁻⁷  1.8×10⁻⁶  0.000     0.000
σ²ω        0.000     0.000     2.9×10⁻⁶  0.000     1.4×10⁻⁶  0.000     3.0×10⁻⁶  0.000     3.2×10⁻⁶
LB(12)     108.24**  20.002**  8.643     47.544**  56.012**  25.108**  12.018    20.153**  6.131
GQ         9.708**   1.090     0.804     1.040     1.281     1.010     0.682     1.111     0.726
JB         4.977     0.846     0.760     1.189     1.097     0.864     0.011     0.037     0.077
AIC        -2.496    -5.618    -5.658    -5.013    -5.518    -5.536    -5.603    -5.606    -5.673

**, * denote significance at the 5% and 10% level respectively. Underlined AIC indicates the best model specification.

Table 5.5: Estimated results and performance criteria of STS model for Borneo region road accidents

Parameter  DLDS      LLDS      LLSS      DTDS      STDS      STSS      LDDS      LDSS
σ²ε        0.0362    0.0012    0.0012    0.0025    0.0016    0.0015    0.0014    0.0013
σ²η        0.0000    0.0002    0.0002    0.0000    0.0000    0.0000    8.8×10⁻⁵  8.5×10⁻⁵
σ²ς        NA        NA        NA        0.0000    1.7×10⁻⁷  1.8×10⁻⁷  0.0000    0.0000
σ²ω        0.0000    0.0000    5.1×10⁻⁷  0.0000    0.0000    5.4×10⁻⁷  0.0000    7.2×10⁻⁷
LB(12)     659.96**  7.6897    6.9646    68.850**  5.6604    5.6363    4.5134    3.4955
GQ         4.4172**  1.2764    1.174     2.7330**  1.5191    1.3997    1.5744    1.4096
JB         2.7061    13.199**  14.460**  3.2115    9.2728**  11.801**  12.490**  15.547**
AIC        -2.7063   -5.4009   -5.3921   -5.043    -5.3754   -5.3675   -5.3912   -5.3861

**, * denote significance at the 5% and 10% level respectively. Underlined AIC indicates the best model specification.

As in the northern region, the best model for the southern region has a stochastic seasonal component: the LDSS model has the lowest AIC value (-5.673) and satisfies all three residual assumptions. Although the model's seasonal disturbance variance is quite small, varying seasonal patterns are visible, as illustrated in Figure 5.2, similar to the northern region case. In contrast to the seasonal component of the northern region, which becomes larger over time, the seasonal component of southern region road accidents becomes smaller over time, with consistent peaks and troughs at approximately equal intervals throughout the period of study.


Meanwhile, the LLDS model better portrays the road accident series for the Borneo region. The estimated model is similar to that of the central region road accidents series. However, the normality assumption on the residuals is clearly violated, since JB = 13.199 is greater than the critical value $\chi^2_{(2,\,0.05)} = 5.99$. This violation may be due to outliers or a structural break in the series, and normality of the residuals is the least important of the three assumptions (Commandeur and Koopman, 2007). Detailed interpretation of the estimated models is discussed in the next section.
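The critical value quoted above can be checked by hand, because the chi-square distribution with two degrees of freedom has the closed-form survival function exp(-x/2). The following sketch uses only the Python standard library; the function names are illustrative and are not taken from any software used in the thesis.

```python
import math


def chi2_critical_df2(alpha):
    """Upper-tail critical value of a chi-square distribution with 2 df.

    With 2 degrees of freedom, P(X > x) = exp(-x / 2), so solving
    exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


def jb_p_value(jb):
    """Asymptotic p-value of a Jarque-Bera statistic (chi-square, 2 df)."""
    return math.exp(-jb / 2.0)


jb_borneo_llds = 13.199                    # JB for the Borneo LLDS model, Table 5.5
critical_5pct = chi2_critical_df2(0.05)    # close to the 5.99 quoted in the text
p_borneo_llds = jb_p_value(jb_borneo_llds)
reject_normality = jb_borneo_llds > critical_5pct

print(round(critical_5pct, 2), reject_normality)
```

The same two functions apply to every JB entry in Tables 5.1 to 5.5, since the JB statistic always has two degrees of freedom.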

5.2 Understanding Estimated Regional Road Accidents Model

Once the model has been estimated for each region, the estimated unobserved components are presented and discussed. Interpretation for the deterministic component follows the classical regression model. Meanwhile, the stochastic component is discussed by looking at the behaviour of the estimated smoothed component over the sample period.

5.2.1 Trend Pattern of Regional Road Accidents

Recall that the best specified road accidents model for the northern and east coast regions is LLSS, while the LDSS model best represents the southern region and the LLDS model best fits the central and Borneo regions. Thus, the number of road accidents for the southern region exhibits a stochastic trend with a deterministic slope, while the northern, central, east coast and Borneo regions show a stochastic trend without a slope.

The stochastic trend is presented in Figure 5.3, while Table 5.6 presents the estimated unobserved trend component at the end of the sample period together with two measures of goodness-of-fit of the model. The final estimate of the level of road accidents is highest in the central region, at 16394 accidents per month (computed from the antilog, $e^{9.7047}$). The result is in line with the central region having the highest number of registered vehicles, as reported by the Road Transport Department of Malaysia in 2012.

Table 5.6: Final estimation results according to regions Region Parameter Northern Southern Central East Coast Borneo Model LLSS LDSS LLDS LLSS LLDS ** ** ** ** µt 8.979 9.072 9.705 8.098 8.001 ** ν t NA 0.004 NA NA NA- ** γ1 -0.005 -0.010 -0.018 -0.080 -0.027 ** ** * γ 2 -0.028 -0.047 -0.127 0.004 -0.089 ** -5 γ 3 0.012 0.011 0.035 4.0×10 0.015 * γ 4 -0.016 -0.034 0.018 -0.061 -0.014 ** ** γ 5 -0.008 0.037 0.005 0.062 0.040 ** ** γ 6 -0.018 -0.027 -0.032 -0.054 -0.026 ** * γ 7 0.013 0.040 0.050 0.052 0.005 ** ** ** * ** γ 8 0.129 0.076 0.045 0.255 0.030

γ 9 -0.030 -0.020 0.007 -0.031 -0.013 ** ** γ10 0.016 0.014 0.050 0.008 0.051

γ11 -0.034 -0.026 -0.005 -0.078 -0.004 ** ** γ12 -0.033 -0.013 -0.027 -0.077 0.032 SE 0.048 0.038 0.038 0.100 0.043 R2 0.374 0.312 0.314 0.488 0.375 **,* denote significance at 5% and 10% level respectively.

The southern region has the second highest level of road accidents [8704 ≈ $e^{9.072}$], followed by the northern [7938 ≈ $e^{8.979}$] and east coast [3289 ≈ $e^{8.098}$] regions, with Borneo having the lowest level of road accidents in December 2013. The estimated deterministic slope coefficient suggests an increase of 0.43% per month in the level of road accidents in the southern region. An approximately similar slope estimate is found in the TSR analysis. This rate of increase is considered low, although in November 2015 the Borneo Post reported that Malaysia has the 20th highest rate of road accidents in the world.
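Because the models are fitted to the natural logarithm of the accident counts, the levels and the slope quoted above are obtained by exponentiating the estimates. The sketch below reproduces that arithmetic from the values in Table 5.6. The slope of 0.0043 is an assumption inferred from the 0.43% quoted in the text (the table shows 0.004), and the variable names are illustrative.

```python
import math

# Final level estimates on the log scale, from Table 5.6.
log_levels = {
    "Northern": 8.979,
    "Southern": 9.072,
    "Central": 9.7047,    # four-decimal value quoted in the text
    "East Coast": 8.098,
    "Borneo": 8.001,
}
southern_slope = 0.0043   # assumed; Table 5.6 reports it rounded to 0.004

# Antilog of the level gives accidents per month.
monthly_accidents = {region: math.exp(level) for region, level in log_levels.items()}

# A slope b on the log scale corresponds to growth of (exp(b) - 1) * 100 percent.
monthly_growth_pct = (math.exp(southern_slope) - 1.0) * 100.0

# Regions ordered from highest to lowest estimated level.
ranking = sorted(monthly_accidents, key=monthly_accidents.get, reverse=True)

for region in ranking:
    print(f"{region:<10} {monthly_accidents[region]:8.0f}")
print(f"Southern growth: {monthly_growth_pct:.2f}% per month")
```

Because the table values are rounded to three decimals, the antilogs can differ from the figures quoted in the text by a few accidents.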

Figure 5.3: Trend components according to regions: (a) Northern, (b) Southern, (c) Central, (d) East Coast, (e) Borneo

Note that although Borneo has a larger number of registered vehicles (Road Transport Department, 2012) than the east coast region, Borneo has the lowest level of road accidents. This may be because Borneo has the longest total road length in Malaysia (Malaysian Public Work Department, 2013), which contributes to low traffic volume and thus results in a small number of road accidents.

From Figure 5.3, it can be seen that the level component for the southern region increases smoothly, while for the northern, central, east coast and Borneo regions the level component shows a more obvious wavy pattern. For example, for the central region there is a large increase in the level component between April and October 2011 and a considerable decrease between November 2004 and February 2005.

5.2.2 Seasonal Pattern for Regional Road Accidents

Another component of the STS model is the seasonal component, which describes the seasonal pattern in the number of road accidents in the five regions. Referring to Table 5.6, only the northern, southern and east coast regions show a stochastic seasonal pattern in the number of road accidents, while the remaining two regions show a deterministic seasonal component.

Looking at the deterministic seasonal pattern in Figure 5.4(d) and Figure 5.4(e), it is clear that, on average, February has the lowest number of road accidents in the central and Borneo regions. The number of road accidents fluctuates between March and December in both regions.
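Since the series is modeled in logarithms, a deterministic seasonal coefficient can be read as an approximate percentage deviation from the level. The sketch below applies this to the central region coefficients in Table 5.6, assuming that γ1 to γ12 correspond to January to December; that mapping is an assumption, although it is consistent with February being the lowest month.

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Deterministic seasonal coefficients for the central region (Table 5.6),
# assumed to be ordered January to December.
central_gamma = [-0.018, -0.127, 0.035, 0.018, 0.005, -0.032,
                 0.050, 0.045, 0.007, 0.050, -0.005, -0.027]

# Percentage deviation from the level implied by each coefficient.
pct_effect = {
    month: (math.exp(gamma) - 1.0) * 100.0
    for month, gamma in zip(MONTHS, central_gamma)
}

lowest_month = min(pct_effect, key=pct_effect.get)

# Dummy-variable seasonal effects should sum to roughly zero over a year.
seasonal_sum = sum(central_gamma)

print(lowest_month, round(pct_effect[lowest_month], 1))
print(round(seasonal_sum, 3))
```

Under these assumptions, the February coefficient of -0.127 corresponds to roughly 12% fewer accidents than the level, and the twelve coefficients sum to approximately zero, as expected for a dummy-variable seasonal component.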

Figure 5.4(a), Figure 5.4(b) and Figure 5.4(c) show the changes in the stochastic seasonal pattern in the number of road accidents in the northern, southern and east coast regions over the sample period. In 2001 and 2002, the largest number of road accidents for both the northern and southern regions is observed in December. From 2003 to 2008, the largest number of road accidents occurred in October. From 2009 onwards, August is observed to have the largest number of road accidents in both regions. A similar case is found for the central and Borneo regions.

Figure 5.4: Seasonal components according to regions: (a) Northern, (b) Southern, (c) East Coast, (d) Central, (e) Borneo

The month with the largest number of road accidents in the east coast region changes every two years. In 2001 and 2002, the largest number of road accidents occurs in December. In the following two years (2003-2004), the number of road accidents is largest in November, followed by October in 2005-2006, and so on. This change is expected, as it parallels the timing of the Eid-ul-Fitr festival.
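The drift of the festival through the Gregorian calendar can be illustrated with a rough calculation. This sketch is not part of the thesis analysis. It assumes a reference Eid-ul-Fitr date of 16 December 2001 and steps forward by a mean lunar year of about 354.37 days, whereas actual dates depend on moon sighting and can differ by a day or two.

```python
from datetime import date, timedelta

LUNAR_YEAR_DAYS = 354.367           # mean length of twelve lunar months
REFERENCE_EID = date(2001, 12, 16)  # assumed Eid-ul-Fitr date for 2001


def approx_eid_dates(n_years):
    """Approximate festival dates by stepping one mean lunar year at a time."""
    dates = {}
    for k in range(n_years):
        d = REFERENCE_EID + timedelta(days=round(k * LUNAR_YEAR_DAYS))
        dates[d.year] = d
    return dates


# Approximate month of the festival for each year of the sample (2001-2013).
eid_month = {year: d.month for year, d in approx_eid_dates(13).items()}

for year in sorted(eid_month):
    print(year, eid_month[year])
```

Under these assumptions the festival moves about eleven days earlier each year, from December at the start of the sample to August at the end, which is the direction of the shift in peak months described above.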

On the other hand, the goodness-of-fit of the model, measured using the standard error (SE) and the coefficient of determination, $R^2$, does not show any serious problem. The $R^2$ values show that between 31% and 49% of the variation in road accidents is explained by the combination of stochastic and deterministic time series components. In addition, the standard error of each regional road accidents model is less than one, indicating that the estimation mostly achieved the prediction accuracy.

Compared with the analysis based on TSR, the coefficient of determination, $R^2$, in the current analysis is much lower. Yet, with the stochastic trend and seasonal components in the current analysis, the independence assumption for the residuals is satisfied for all regions. The results indicate that STS provides a better prediction model than the time series regression model. However, the accuracy of the models must be evaluated using loss function measures, which will be discussed at the end of this chapter.

5.3 Estimating Road Accidents Model for Individual State Level

Describing the number of road accidents by region may not present the real road accidents phenomenon. The risk of road accidents varies by location, as road structures and conditions may differ according to the area and traffic volume. Aderamo (2012b) stated that urban areas record more accidents but fewer injuries than rural areas, while rural areas have fewer road accidents but more serious fatalities. Therefore, this section analyzes road accidents on a smaller scale, focusing on individual states.

Using a similar approach to Section 5.1, the best structural time series model of road accidents for each state is determined, and the best model specifications are presented in Table 5.7. The full model identification results are presented in Appendix 4. For the period between January 2001 and December 2013, the local level with deterministic seasonal (LLDS) model gives a good representation of the number of road accidents in Penang, Perlis, Kuala Lumpur, Selangor, Malacca, Pahang, and Sarawak. Meanwhile, the local level drift with stochastic seasonal (LDSS) model is best suited for explaining the number of road accidents in Negeri Sembilan. On the other hand, road accidents in Kedah, Perak, Kelantan, Terengganu, Johor, and Sabah are best represented by the local level with stochastic seasonal (LLSS) model.

Table 5.7: Best road accident model specification for each individual state

States   Penang    Perlis    Kedah     Perak     Kuala Lumpur  Selangor  Negeri Sembilan
Models   LLDS      LLDS      LLSS      LLSS      LLDS          LLDS      LDSS
σ²ε      0.0009    0.0099    0.0015    0.0029    0.0012        0.0011    0.0031
σ²η      0.0002    0.0010    0.0003    0.0002    0.0002        0.0004    3.6×10⁻⁵
σ²ς      NA        NA        NA        NA        NA            NA        0.0000
σ²ω      0.0000    0.0000    1.9×10⁻⁵  7.0×10⁻⁶  0.0000        0.0000    1.9×10⁻⁶
LB(12)   13.747    7.9766    12.036    14.368    11.405        10.154    12.832
GQ       1.0587    0.9669    0.8423    0.9902    0.9169        0.6823    0.8885
JB       0.8374    3.2766    8.8766**  6.5859**  3.8601        10.635**  83.441**
SE       0.0361    0.1115    0.0700    0.0704    0.0415        0.0435    0.0619
R²       0.3834    0.3877    0.3595    0.4111    0.3773        0.2876    0.4897

States   Melaka    Johor     Kelantan  Terengganu  Pahang    Sabah     Sarawak
Models   LLDS      LLSS      LLSS      LLSS        LLDS      LLSS      LLDS
σ²ε      0.0018    0.0006    0.0042    0.0017      0.0056    0.0015    0.0020
σ²η      0.0004    0.0003    0.0003    0.0005      0.0005    0.0003    0.0003
σ²ς      NA        NA        NA        NA          NA        NA        NA
σ²ω      0.0000    3.0×10⁻⁶  0.0002    0.0001      0.0000    2.5×10⁻⁶  0.0000
LB(12)   12.336    9.4524    25.809**  26.408**    27.134**  10.958    8.8344
GQ       0.6715    0.8644    0.5465    0.4602      0.7383    1.1279    0.8210
JB       0.9077    1.1136    168.71**  45.807**    8.1560**  7.0880**  5.5832**
SE       0.0506    0.0417    0.1411    0.11551     0.0831    0.0532    0.0519
R²       0.3124    0.1908    0.5171    0.50291     0.4860    0.2573    0.4234

**, * denote significance at the 5% and 10% level respectively.

The results clearly differ from the road accidents models for the regions, except for the central region, where the same specification is found for the region and for its states, Kuala Lumpur and Selangor. For the other states, the trend and pattern of road accidents differ from state to state even within the same region. In addition, the standard diagnostics presented in Table 5.7 do not indicate any serious problem, corroborating that the models are adequately specified. Only certain models show evidence of non-normality or autocorrelation, rejecting the relevant null hypothesis at the 1% significance level. Such cases include the states of Selangor, Negeri Sembilan, Kelantan, Terengganu and Pahang. Nevertheless, this issue can be addressed by incorporating intervention and explanatory variables, which will be discussed in the next chapter.

Goodness-of-fit of the models, as measured by $R^2$, indicates that between 20% and 52% of the variation in road accidents in each state is explained by the combination of stochastic and deterministic time series components. On the other hand, the standard error of each model is close to zero, except for the states of Perlis, Kelantan and Terengganu, showing that the models almost achieved the prediction accuracy.

5.3.1 Northern State Road Accidents Pattern

The northern region is made up of four states, namely Penang, Perlis, Perak, and Kedah. Table 5.8 shows the estimated unobserved components for the northern states. The estimated unobserved components for the northern region are reproduced in the last column for comparison purposes. Of the four states, only Kedah and Perak share the same STS specification, the LLSS model, while the other states are best represented by the LLDS model.

The estimated levels of road accidents in the northern states show that Penang has the highest level with 3208 [≈ $e^{8.0732}$] accidents, followed by Perak [2916 ≈ $e^{7.9781}$] and Kedah [1624 ≈ $e^{7.3926}$], and the lowest is Perlis with 156 [≈ $e^{5.0424}$] road accidents. The estimated level of road accidents for Perlis is reasonable, since the state has the smallest population and the smallest area compared to other states in Malaysia. This corresponds to Perlis having the lowest share of registered vehicles, at only 1% (Shahid et al. 2015). The stochastic trends discussed here are illustrated in Figure 5.5.

Table 5.8: Final estimation result of northern state road accident model Parameter Penang Kedah Perlis Perak Northern Model LLDS LLSS LDDS LLSS LLSS µ ** ** ** ** ** t 8.073 7.393 5.042 7.978 8.980 ν t NA NA NA NA NA ** ** γ1 0.018 -0.018 0.060 -0.013 -0.005 ** * γ 2 -0.069 -0.045 -0.054 0.006 -0.028 * γ 3 0.017 0.019 0.036 -0.001 0.012

γ 4 0.002 -0.022 0.028 -0.017 -0.016

γ 5 -0.005 0.008 0.007 -0.009 -0.008 ** ** γ 6 -0.026 -0.025 -0.060 -0.043 -0.018 ** γ 7 0.021 0.049 0.012 -0.018 0.013 ** ** ** * γ 8 0.044 0.192 0.024 0.1389 0.129 * γ 9 -0.017 -0.032 0.051 -0.026 -0.030 ** γ10 0.024 0.023 -0.026 0.025 0.016 ** γ11 -0.023 -0.081 -0.046 -0.015 -0.034

γ12 0.013 -0.067 -0.031 -0.028 -0.033 **,* denote significance at 5% and 10% level respectively.

Figure 5.5: Trend components of road accidents for northern states: (a) Penang, (b) Kedah, (c) Perlis, (d) Perak

In terms of seasonality, only Perak and Kedah show a stochastic seasonal pattern, while Perlis and Penang show a deterministic seasonal pattern. The seasonal patterns for Penang and Perlis are illustrated in Figure 5.6(c) and Figure 5.6(d). Surprisingly, the seasonal pattern of road accidents in Penang resembles that of the central region. The seasonal pattern of road accidents for Perlis fluctuates throughout the study period, with the highest number of road accidents observed in January.

The seasonal patterns for Kedah and Perak are shown in Figure 5.6(a) and Figure 5.6(b). These figures show that the seasonal pattern of road accidents in these two states is not much different from that of the northern region. The number of road accidents in December shows a downward trend towards the end of the study period in both states, with the largest numbers found in 2001 and 2002. The number of road accidents in October shows an increasing trend, with the largest numbers found in the middle of the sample period, before decreasing towards the end of the sample period.

Meanwhile, the number of road accidents in August shows an increasing trend towards the end of the sample period, with the highest peaks found from 2009 onwards. The difference is that road accidents in Kedah in September show an increasing trend, with the highest peak around 2009 before decreasing at the end of the sample period, while the September pattern for Perak is less obvious. On the other hand, the lowest number of road accidents for Perak is in June, while in Kedah the lowest number fluctuates between February, April and November.

Figure 5.6: Seasonal component for northern states: (a) Kedah, (b) Perak, (c) Penang, (d) Perlis

Table 5.9: Final estimation result of central and southern state road accident model Kuala Negeri State Selangor Central Melaka Johor Southern Lumpur Sembilan Model LLDS LLDS LLDS LDSS LLDS LLSS LDSS µ ** ** ** ** ** ** ** t 8.570 9.319 9.705 7.579 7.168 8.589 9.072 ν ** ** t NA NA NA 0.004 NA NA 0.004 ** * ** γ1 -0.012 -0.021 -0.018 -0.033 0.004 0.006 -0.010 ** ** ** ** ** ** ** γ 2 -0.127 -0.127 -0.127 -0.066 -0.049 -0.053 -0.047 ** ** ** * γ 3 0.043 0.031 0.035 0.013 0.023 -0.001 0.011 ** * * * γ 4 0.004 0.024 0.018 -0.018 -0.022 -0.037 -0.034 ** * γ 5 0.005 0.006 0.005 0.013 0.005 0.042 0.037 ** ** ** ** * * γ 6 -0.034 -0.032 -0.032 -0.035 -0.022 -0.041 -0.027 ** ** ** ** ** γ 7 0.052 0.049 0.050 -0.002 0.014 0.049 0.040 ** ** ** ** ** ** γ 8 0.051 0.042 0.045 0.024 0.033 0.076 0.076

γ 9 -0.003 0.012 0.007 -0.007 -0.012 -0.017 -0.020 ** ** ** ** γ10 0.052 0.049 0.050 0.050 0.022 0.021 0.014 ** γ11 -0.004 -0.006 -0.005 0.041 -0.018 -0.037 -0.026 ** * ** * γ12 -0.027 -0.027 -0.027 0.020 0.022 -0.008 -0.013 **,* denote significance at 5% and 10% level respectively.

5.3.2 Road Accidents Pattern for Other States

The estimated unobserved components for all states in the southern and central regions are tabulated in Table 5.9, while those for all states in the east coast and Borneo regions are tabulated in Table 5.10. For the states in the southern region, namely Johor, Malacca, and Negeri Sembilan, the highest number of road accidents in December 2013 is in Johor with 5370 [≈ $e^{8.589}$], followed by Melaka with 1297 [≈ $e^{7.168}$], while Negeri Sembilan exhibits the lowest number of road accidents compared to the other southern states, with an increasing trend of 0.41% per month, which is larger than Perak's road accidents growth.

Figure 5.7: Trend components of road accidents in Negeri Sembilan, Melaka and Johor: (a) Negeri Sembilan, (b) Melaka, (c) Johor

Changes in the trend patterns for states in the southern region are shown in Figure 5.7. The estimated model for Negeri Sembilan incorporates a slope component in the trend, so its trend is smoother than the trends for Malacca and Johor, which fluctuate and show a wavy pattern. Negeri Sembilan and Johor show a stochastic seasonal pattern of road accidents, while Melaka shows a deterministic seasonal pattern. The lowest number of road accidents for all southern states is found in February. Surprisingly, the seasonal pattern for Malacca (Figure 5.8(c)) resembles that of the Borneo region, fluctuating between March and December. The difference is that a larger number of road accidents is recorded in August than in the Borneo region. For Johor, the seasonal pattern shown in Figure 5.8(a) is not much different from those of the southern and northern regions, Perak and Kedah. The number of road accidents is found to be larger in October between 2003 and 2008.

In contrast, the seasonal pattern of road accidents in Negeri Sembilan, illustrated in Figure 5.8(b), is similar to that of Kedah and Perak for the months of February, October, and December. The difference is that the November pattern for the state fluctuates, being higher in 2003 and 2011 and lower in 2007. Meanwhile, in August, the number of road accidents shows an increasing trend from the lowest in 2001 to the highest in 2013.

Figure 5.8: Seasonal component for southern states: (a) Johor, (b) Negeri Sembilan, (c) Malacca

Recall that the central region recorded the largest number of road accidents in Malaysia. This is contributed mainly by Selangor, with a level of 11142 [≈ $e^{9.319}$] accidents in 2013, which is the largest in the central region as well as in Malaysia, followed by Kuala Lumpur [5270 ≈ $e^{8.570}$], Johor [5370 ≈ $e^{8.589}$], Penang [3207 ≈ $e^{8.073}$] and Perak [2916 ≈ $e^{7.978}$]. The result is expected, as all these states are industrial states with high traffic congestion. This is supported by Shahid et al. (2015), who state that the vehicles registered in Selangor and Kuala Lumpur make up more than one third of the total vehicles registered in the country.

Figure 5.9: Trend components of road accidents for central states: (a) Kuala Lumpur, (b) Selangor

Figure 5.10: Seasonal components of road accidents for central states: (a) Kuala Lumpur, (b) Selangor

In terms of seasonality, as shown in Figure 5.10, the largest number of road accidents in both states is in October, while the lowest is in February. A low number of road accidents is also recorded in the middle of the year (June) and at the end of the year (November and December), coinciding with the mid-year and end-of-year school holidays. As expected, traffic congestion during peak hours is reduced, as parents and school bus operators do not need to send and fetch children.

For the east coast region, which includes the states of Pahang, Kelantan and Terengganu, the largest number of road accidents is contributed by Pahang with 1594 [≈ $e^{7.374}$] accidents, followed by Terengganu with 894 [≈ $e^{6.796}$] accidents and Kelantan with 794 [≈ $e^{6.678}$] accidents at the end of the sample period. The changes in the estimated trend components are shown in Figure 5.11. It can be seen that for all east coast states, the estimated trend of road accidents increases smoothly, with observations lying both above and below the estimated level.

Table 5.10: Final estimation results of east coast and Borneo states road accident models Tereng- East State Kelantan Pahang Sabah Sarawak Borneo ganu Coast Model LLSS LLDS LLSS LLSS LLSS LLDS LLDS µ ** ** ** ** ** ** ** t 6.678 7.374 6.796 8.098 7.283 7.338 8.001 ν t NA NA NA NA NA NA NA * ** γ1 -0.063 -0.020 -0.095 -0.080 -0.016 -0.025 -0.027 ** ** ** ** γ 2 -0.016 -0.108 -0.033 0.004 -0.109 -0.062 -0.089 -5 γ 3 0.070 0.013 0.018 4.0×10 -0.023 0.021 0.015

γ 4 -0.044 -0.030 -0.089 -0.061 -0.020 -0.007 -0.014 ** ** ** γ 5 0.094 0.011 0.069 0.062 0.047 0.060 0.040 ** ** ** γ 6 -0.061 -0.048 -0.026 -0.054 -0.047 -0.016 -0.026 * -5 γ 7 -0.003 -0.009 0.086 0.052 0.040 -7.0×10 0.005 ** ** ** ** ** ** γ 8 0.393 0.070 0.253 0.255 0.058 0.017 0.030 ** γ 9 -0.046 0.025 -0.024 -0.031 0.003 -0.040 -0.013 ** ** ** ** γ10 0.013 0.047 -0.009 0.008 0.046 0.035 0.051 * γ11 -0.173 0.018 -0.069 -0.078 -0.008 -0.022 -0.004 ** ** ** γ12 -0.163 0.030 -0.081 -0.077 0.027 0.041 0.032 **,* denote significance at 5% and 10% level respectively.

Seasonal patterns of road accidents for all east coast states are illustrated in Figure 5.12. Road accidents in Kelantan and Terengganu have a stochastic seasonal pattern, while those in Pahang have a deterministic seasonal pattern. The stochastic seasonal pattern for Kelantan and Terengganu is very similar to the east coast region's pattern. In both states, the month that records the peak number of road accidents generally changes every two years. Perhaps this is related to the Eid-ul-Fitr festival. Meanwhile, for Pahang, the number of road accidents generally increases from the lowest in February to the highest in August.

Figure 5.11: Trend components of road accidents for east coast states: (a) Kelantan, (b) Pahang, (c) Terengganu

Figure 5.12: Seasonal components of road accidents in east coast states: (a) Kelantan, (b) Terengganu, (c) Pahang

In the Borneo region, Sarawak recorded more road accidents than Sabah. The road accident trends for both states, illustrated in Figure 5.13(a) and Figure 5.13(b), show a slightly wavy pattern with a large decrease around early 2003 for Sabah and in 2009 for Sarawak.

Figure 5.13: Trend components of road accidents for Borneo states: (a) Sabah, (b) Sarawak

Figure 5.14: Seasonal components of road accidents for Borneo states: (a) Sabah, (b) Sarawak

In addition, Sarawak road accidents show a deterministic seasonal pattern that is very similar to the Borneo region's seasonal pattern, fluctuating between March and December (refer to Figure 5.14(b)). The highest number of road accidents is found in May. Meanwhile, Sabah road accidents show a stochastic seasonal pattern, as shown in Figure 5.14(a). The seasonal pattern for this state is quite different from those of other states and regions. The number of road accidents in August is less pronounced at the beginning of the sample period but becomes larger towards the end. The number of road accidents in October shows a downward trend from 2008 onwards.


5.4 Special Features of Seasonal Road Accidents Pattern

For comparison purposes, some special features of the stochastic seasonal patterns are tabulated in Table 5.11. The number of road accidents in December shows a large decreasing trend from a higher number at the beginning of the sample period, except for Sabah, where it is consistent throughout the study period. The August pattern for Kedah, Perak, Kelantan, Terengganu and Negeri Sembilan shows a slowly increasing trend, while the patterns for Johor and Sabah are less obvious.

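Table 5.11: Special features of the seasonality of road accident patterns (each [plot] cell holds a small plot of the monthly pattern)

Month      Kedah   Perak   Kelantan  Terengganu  Negeri Sembilan  Johor   Sabah
December   [plot]  [plot]  [plot]    [plot]      [plot]           [plot]  [plot]
August     [plot]  [plot]  [plot]    [plot]      [plot]           *       *
October    [plot]  [plot]  [plot]    [plot]      [plot]           *       [plot]
February   [plot]  [plot]  [plot]    [plot]      [plot]           [plot]  [plot]
May        [plot]  [plot]  *         *           *                [plot]  [plot]

*Indicates that no obvious pattern is observed.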

In October, the number of road accidents for Kedah, Perak, Kelantan, Terengganu and Negeri Sembilan shows a quadratic trend, with the largest numbers found in the middle of the study period. In contrast, Sabah shows a slowly decreasing trend. February records the lowest number of road accidents for the majority of the states; however, it increases at a slow rate for Johor and follows a wavy trend for Kelantan and Terengganu. On the other hand, the number of road accidents in May shows a slowly decreasing trend for the northern states and a slowly increasing trend for Johor and Sabah, while the trend for the remaining states is less obvious.

5.5 Prediction and Forecasting Performance of the Structural Time Series

In this section, the prediction and forecasting performance of STS is measured against the commonly used SARIMA and TSR methods for each road accidents model. The effectiveness of STS is evaluated using two widely used loss functions, namely the root mean square error (RMSE) and the mean absolute percentage error (MAPE). Figure 5.15 shows the within-sample predictions and the one-year-ahead forecasts together with the actual number of road accidents.
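The two loss functions can be sketched as follows. This is a minimal illustration using the standard textbook definitions of RMSE and MAPE. The input values are made up for the example and are not taken from the thesis data, and the scale on which the thesis computes its errors is not assumed here.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Illustrative values only (hypothetical monthly counts)
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

RMSE penalises large errors more heavily and depends on the scale of the series, whereas MAPE is scale-free, which is why the two measures can rank methods differently.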

Across all regions and states, the STS model (purple line) performs best in the prediction horizon (January 2012 to December 2013) but not consistently in the forecasting horizon (January 2014 to December 2014). The STS predicted values follow the actual values (blue line) closely, while the TSR and SARIMA methods tend to over- or underestimate the accident occurrences.

Figure 5.15: Actual and estimated road accidents produced by the TSR, SARIMA and STS models for regions: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo

Figure 5.16: Actual and estimated road accidents produced by the TSR, SARIMA and STS models for individual states: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Kuala Lumpur, (f) Selangor, (g) Negeri Sembilan, (h) Johor, (i) Melaka, (j) Kelantan, (k) Terengganu, (l) Pahang, (m) Sabah, (n) Sarawak

Even though, in the forecasting horizon, the predictions from each method are quite far from the actual values for almost all states and regions, the STS generally gives the closest values. To confirm the accuracy over the prediction and forecasting horizons, the performance of each method is measured. The results are tabulated in Table 5.12 and Table 5.13 respectively.

As expected, the RMSE and MAPE in the prediction horizon are lowest for the STS method for all regions and states in the study. The results indicate that the stochastic model performs better in predicting road accident occurrences than the commonly used methods. The SARIMA method gives the second-best prediction for most of the series.

Table 5.12: Error values for the road accidents prediction models

                          RMSE                       MAPE
Region/State      TSR     SARIMA  STS       TSR     SARIMA  STS
Northern          0.0024  0.0022  0.0004    10.930  10.420  4.2600
Penang            0.0010  0.0012  0.0006    7.5600  8.440   5.7700
Perlis            0.0087  0.0097  0.0065    35.890  38.620  31.380
Kedah             0.0064  0.005   0.0005    18.980  16.910  5.5200
Perak             0.0049  0.0054  0.0019    16.310  16.330  9.7500
Central           0.0011  0.0016  0.0006    6.4100  7.8300  4.6700
Selangor          0.0013  0.0015  0.0006    6.9700  8.1100  4.6700
Kuala Lumpur      0.0023  0.0032  0.0011    10.600  12.990  7.1700
Southern          0.0017  0.0016  0.0003    8.4900  8.130   3.8400
Melaka            0.0019  0.0022  0.0009    12.820  11.660  8.2400
Negeri Sembilan   0.0025  0.0023  0.0013    12.280  11.700  9.0100
Johor             0.0020  0.0020  0.0003    9.4700  9.2500  3.4900
East Coast        0.0102  0.0072  0.0002    24.550  20.720  3.4700
Kelantan          0.0242  0.0154  0.0011    45.500  37.830  9.2500
Terengganu        0.0114  0.0078  0.0002    29.180  27.080  4.4100
Pahang            0.0081  0.0076  0.0045    23.740  23.870  18.000
Borneo            0.0035  0.0026  0.0013    14.240  12.200  8.8400
Sabah             0.0045  0.0033  0.0010    19.010  15.480  8.2000
Sarawak           0.0036  0.0031  0.0017    14.990  12.670  9.2100


Table 5.13: Error values for the road accidents forecasting models

                          RMSE                       MAPE
Region/State      TSR     SARIMA  STS       TSR     SARIMA  STS
Northern          0.0044  0.0012  0.0011    7.9100  3.9500  3.3200
Penang            0.0048  0.0021  0.0014    9.8700  6.0400  4.1200
Perlis            0.0238  0.0157  0.0140    26.620  22.280  21.380
Kedah             0.0106  0.0033  0.0033    15.580  6.8400  7.1700
Perak             0.0046  0.0033  0.0026    8.6000  7.2900  6.6700
Central           0.0026  0.0005  0.0008    5.6500  2.2600  3.0300
Selangor          0.0034  0.0007  0.0013    6.8000  2.8300  4.0800
Kuala Lumpur      0.0021  0.0017  0.0009    5.4800  4.7300  3.0500
Southern          0.0042  0.0031  0.0018    6.6200  6.0800  4.1900
Melaka            0.0026  0.0020  0.0045    7.1300  5.9800  8.4800
Negeri Sembilan   0.0019  0.0030  0.0017    5.9300  5.8900  5.4500
Johor             0.0075  0.0025  0.0016    9.8700  5.2100  4.7200
East Coast        0.0358  0.0085  0.0096    25.750  9.0800  9.9100
Kelantan          0.0589  0.0302  0.0271    37.310  23.420  21.540
Terengganu        0.0439  0.0057  0.0082    34.240  12.170  12.860
Pahang            0.0280  0.0143  0.0048    24.820  13.380  8.5900
Borneo            0.0155  0.0031  0.0016    17.580  6.9500  4.8400
Sabah             0.0118  0.0021  0.0020    15.380  5.3200  6.0000
Sarawak           0.0215  0.0067  0.0034    23.050  11.770  8.1000
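The comparison across methods can be made explicit by picking, for each series, the method with the lowest error. The sketch below does this for four rows of forecast MAPE values copied from Table 5.13. The selection of rows and the helper name `best_method` are illustrative choices, not part of the original analysis.

```python
# Forecast-horizon MAPE values for four series, copied from Table 5.13
forecast_mape = {
    "Northern": {"TSR": 7.91, "SARIMA": 3.95, "STS": 3.32},
    "Central":  {"TSR": 5.65, "SARIMA": 2.26, "STS": 3.03},
    "Melaka":   {"TSR": 7.13, "SARIMA": 5.98, "STS": 8.48},
    "Pahang":   {"TSR": 24.82, "SARIMA": 13.38, "STS": 8.59},
}

def best_method(errors):
    """Return the method name with the smallest error value."""
    return min(errors, key=errors.get)

winners = {series: best_method(errors) for series, errors in forecast_mape.items()}
for series, method in winners.items():
    print(f"{series}: lowest forecast MAPE from {method}")

sts_wins = sum(1 for method in winners.values() if method == "STS")
print(f"STS has the lowest MAPE in {sts_wins} of {len(winners)} series")
```

For these four rows the lowest forecast MAPE alternates between STS and SARIMA, which illustrates why the forecasting horizon is less clear-cut than the prediction horizon.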

5.6 Summary

This chapter models road accident occurrences for each region and its states based on the STS approach. Generally, the chapter finds that generalizing the road accidents model at the regional level may not be sufficient, as states may not behave similarly in terms of road conditions, facilities and safety precautions. This is due to the different economic and social development conditions of the states, even though they are in the same region.

Road accidents in the northern region are best fitted by the local level with stochastic seasonal (LLSS) model, but only the Kedah and Perak series are best fitted by the same model, while the remaining states in the region are best fitted by an STS model with deterministic seasonal components. A similar situation is found for the other regions except the central region, where both states show similar stochastic behavior.


Generally, higher numbers of road accidents are found in August and October, except for individual states such as Perlis and Sarawak, which have their largest numbers in January and May respectively. The higher number of road accidents in August can be attributed to the celebration of National Day. Several major roads are closed for two or three days before the celebration to give ample time for rehearsals. The closure of major roads results in high traffic volume, which in turn contributes to the risk of road accidents.

Road accidents seem to increase when celebrations coincide. For example, in 2001 and 2002, the majority of the series have a higher number of road accidents in December. This increase is related to the celebration of Eid-ul-Fitr, Christmas Day and the year-end school holiday. A similar situation occurred in October 2006, when Deepavali and Eid-ul-Fitr were celebrated in the same month.

From 2009 onwards, road accidents seem to be higher in August for the majority of the series. This finding may be related to the school holidays, which fall in this month. Majed (2012) supports this view, stating that road fatalities increase to 18-20 per day during school holidays compared with not more than 15 on school days.

In addition, some of the series do not satisfy the assumption of normally distributed residuals. This is related to the occurrence of outliers in those series. The regions and states that show outliers are the northern region, Selangor, Kelantan, Terengganu, Pahang and Negeri Sembilan. On the other hand, STS is shown to perform better than the other common time series methods in producing good predictions and forecasts of road accident occurrences.


CHAPTER 6

INCORPORATING EXPLANATORY AND INTERVENTION VARIABLE OF ROAD ACCIDENTS MODEL

In this chapter, the structural time series (STS) models that best fit the road accident series in Chapter 5 are refitted with selected explanatory variables to investigate the variables' influence on road accidents. The investigation covers both the regional and the state level, taking into account all five regions and 14 states in Malaysia. The estimation and the influence of the explanatory variables are discussed thoroughly. In addition, the study examines the stochastic trend and seasonal pattern after incorporating the explanatory variables and accounting for the outliers in the series. The performance of the estimated models is discussed at the end of this chapter.

6.1 Estimating and Understanding Regional Road Accidents Model

Recall from Chapter 5 that the behavior of road accidents differs by region. The northern, central, east coast and Borneo regions tend to have a stochastic trend without a slope component, while the southern region has a stochastic trend with a deterministic slope. In terms of seasonality, only the northern, southern and east coast regions show a stochastic seasonal pattern, while the remaining two regions have a fixed seasonal pattern. Adding the selected explanatory variables does not change the model structure much compared with the STS models without explanatory variables. Table 6.1 summarizes the model estimation results after adding the explanatory variables. Only the east coast region's road accidents model changed to a deterministic seasonal model, as the disturbance variance of the seasonal component equals zero. This finding indicates that adding the selected explanatory variables removed the stochastic seasonality of road accidents in the region.

Adding explanatory variables improved the models. This can be seen from the standard error of each model, which is smaller than that of the model without explanatory variables, as tabulated in Table 6.1. The estimated models indicate that adding the selected explanatory variables, which include climate, economic, seasonal-related and road safety variables, reduced the error of the models. These variables also increased the coefficient of determination, $R^2$, indicating that the selected explanatory variables help to explain road accident occurrences.

Furthermore, the observation disturbance variance, $\sigma^2_\varepsilon$, is lower than in the univariate model without explanatory variables discussed in Chapter 5. This indicates that the residuals of the model after adding the explanatory variables are much closer to independent random values. How close the residuals are to independent random values can be determined through the diagnostic tests. The general Ljung-Box (LB) test for independence based on the first 12 autocorrelations is insignificant for all regions except the northern and east coast regions.
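The Ljung-Box statistic on the first 12 autocorrelations can be sketched as below, using the standard formula $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$. This is a plain illustration on a made-up series, not the software routine used in the study. The degrees of freedom of the reference chi-square distribution depend on the number of estimated model parameters, so the 5% critical value of 21.026 (12 degrees of freedom) used in the comment is an assumption for the example only.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(x, max_lag=12):
    """Ljung-Box Q statistic based on the first max_lag autocorrelations."""
    n = len(x)
    if n <= max_lag:
        raise ValueError("series must be longer than max_lag")
    total = sum(autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total

# Hypothetical residual series with strong negative serial correlation
residuals = [1.0, -1.0] * 20
q = ljung_box_q(residuals, max_lag=12)
# 21.026 is the 5% chi-square critical value for 12 degrees of freedom (assumed df)
print(f"Q(12) = {q:.2f}, exceeds 21.026: {q > 21.026}")
```

A large Q relative to the critical value points to remaining serial correlation in the residuals, which is the situation reported for the northern and east coast regions.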

The test of homoscedasticity is satisfactory, while the normality assumption after adding the explanatory variables is not satisfied for almost all regional road accidents models, the northern region being the exception. As mentioned in Commandeur et al. (2007), a larger JB test statistic may be caused by outliers and structural breaks in the series. Thus, the error estimates must be examined to test the influence of the outliers.
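The sensitivity of the normality test to outliers can be illustrated with the standard Jarque-Bera statistic, $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ is the sample skewness and $K$ the sample kurtosis. The sketch below uses made-up residuals. The critical value 5.991 is the 5% point of the chi-square distribution with two degrees of freedom, and whether the study applies any small-sample adjustment is not assumed here.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Hypothetical residuals: a symmetric sample and a sample with one large outlier
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
with_outlier = [0.0] * 19 + [10.0]

print(f"JB (symmetric)    = {jarque_bera(symmetric):.3f}")
print(f"JB (with outlier) = {jarque_bera(with_outlier):.3f}")
# Compare with 5.991, the 5% chi-square critical value for 2 degrees of freedom
```

A single extreme observation is enough to push the statistic far above the critical value, which is why the auxiliary residuals are inspected next.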

Table 6.1: Estimation of regional road accidents models after adding explanatory variables

                        Northern                Southern                Central
Parameter               Multiple    Univariate  Multiple    Univariate  Multiple    Univariate
Model                   LLSS        LLSS        LDSS        LDSS        LLDS        LLDS
$\sigma^2_\eta$         3.0×10^-4   1.0×10^-4   1.3×10^-4   8.3×10^-5   0.001       3.0×10^-4
$\sigma^2_\varsigma$    NA          NA          0.000       0.000       NA          NA
$\sigma^2_\omega$       1.1×10^-7   5.9×10^-6   6.9×10^-7   3.2×10^-6   0.000       0.000
$\sigma^2_\varepsilon$  4.0×10^-4   0.001       5.2×10^-4   0.001       2.0×10^-4   0.001
SE                      0.030       0.048       0.031       0.038       0.032       0.038
$R^2$                   0.775       0.374       0.563       0.312       0.559       0.314
LB(12)                  34.028**    11.161      6.377       6.131       15.082      9.078
GQ                      1.114       0.850       1.054       0.726       1.266       0.854
JB                      3.195       11.483**    4.960*      0.077       11.944**    0.560

                        East Coast              Borneo
Parameter               Multiple    Univariate  Multiple    Univariate
Model                   LLDS        LLSS        LLDS        LLDS
$\sigma^2_\eta$         5.7×10^-4   0.0004      3.6×10^-4   0.0002
$\sigma^2_\varsigma$    NA          NA          NA          NA
$\sigma^2_\omega$       0.000       8.1×10^-5   0.000       0.000
$\sigma^2_\varepsilon$  3.0×10^-4   0.001       8.4×10^-4   0.001
SE                      0.062       0.099       0.037       0.043
$R^2$                   0.811       0.488       0.547       0.375
LB(12)                  19.354**    24.791*     8.489       7.690
GQ                      0.422       0.512       1.344       1.276
JB                      20.405**    94.22*      12.664**    13.199**

*, ** denote significance at the 10% and 5% level respectively.

6.1.1 Error Estimate

Recall from Section 3.5.2 that inspection of the auxiliary residuals is important if the model does not satisfy the normality assumption of the residuals. The auxiliary residuals, which consist of the standardised smoothed observation disturbances and the standardised smoothed level disturbances, allow for the detection of outliers and structural breaks in the series. For the regional road accidents models developed in Section 6.1 that show unsatisfactory normality of the residuals, the auxiliary residuals for each region are illustrated in Figure 6.1. The figure shows the standardised smoothed observation disturbances on the left side and the standardised smoothed level disturbances on the right side.

Each auxiliary residual on the left side of Figure 6.1 can be treated as a t-test with the null hypothesis of no outlier in the series (Koopman et al., 2009). Applying the usual 95% confidence limits of ±1.96 for a two-tailed t-test, possible outliers occur 9 times in the southern region road accidents series, followed by 8 times in the central and east coast regions and 4 times in the Borneo region.

Similarly, each auxiliary residual on the right side of Figure 6.1 can be treated as a t-test, but with the null hypothesis of no structural break in the level of the observed time series (Koopman et al., 2009). The southern region road accidents series shows 9 possible break points, followed by 8 points for the central region, 7 points for the east coast region and 5 points for the Borneo region. Only points located far outside the confidence limits are considered. In this study, a point is considered if its absolute value exceeds 2.3 for outliers and 2.5 for structural breaks, as suggested in Koopman et al. (2009).
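The screening rule can be written as a simple filter on the auxiliary residuals. The sketch below applies the limits of 2.3 (outliers) and 2.5 (structural breaks) stated above to made-up residual values. The residual values and the function name `flag_points` are hypothetical, and the limits are applied to absolute values on the assumption that both tails are of interest.

```python
OUTLIER_LIMIT = 2.3   # limit for standardised smoothed observation disturbances
BREAK_LIMIT = 2.5     # limit for standardised smoothed level disturbances

def flag_points(aux_residuals, limit):
    """Return the time indices whose absolute auxiliary residual exceeds the limit."""
    return [t for t, r in enumerate(aux_residuals) if abs(r) > limit]

# Hypothetical auxiliary residuals for six consecutive months
obs_disturbances = [0.4, -2.1, 2.9, 1.2, -2.4, 0.3]
level_disturbances = [0.1, 2.2, -0.7, -2.8, 2.4, 0.9]

print("Beyond +/-1.96 (possible outliers):", flag_points(obs_disturbances, 1.96))
print("Candidate outliers (limit 2.3):    ", flag_points(obs_disturbances, OUTLIER_LIMIT))
print("Candidate breaks (limit 2.5):      ", flag_points(level_disturbances, BREAK_LIMIT))
```

Using the stricter limits reduces the number of candidate points that have to be tested as intervention variables when the model is refitted.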

However, not all possible outliers and structural breaks influence the occurrence of road accidents. Therefore, the model is refitted by incorporating all possible outlier and structural break points, but only the significant ones are retained in the model. Outliers are handled with the same method as in Section 4.4.2, that is, by incorporating an impulse dummy variable that takes the value “1” at the outlier observation and “0” otherwise. Meanwhile, structural breaks are handled by incorporating step intervention variables that take the value “0” before the event and “1” after it. The effect of the outliers and structural breaks in the series is presented in the next subsection.
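The two kinds of intervention variables can be constructed as below. This is a sketch that assumes a monthly sample from January 2001 to December 2013 and treats the event month itself as the first month with value 1 in the step variable. Both are assumptions made for illustration, as are the function names. The example dates are the May 2012 outlier and the February 2005 level break reported for the southern region.

```python
def monthly_index(start_year, start_month, n_months):
    """List of (year, month) tuples for consecutive months."""
    months = []
    year, month = start_year, start_month
    for _ in range(n_months):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def impulse_dummy(months, event):
    """1 at the outlier month, 0 elsewhere."""
    return [1 if m == event else 0 for m in months]

def step_dummy(months, event):
    """0 before the break month, 1 from the break month onwards (assumed convention)."""
    return [1 if m >= event else 0 for m in months]

months = monthly_index(2001, 1, 156)              # January 2001 to December 2013 (assumed sample)
outlier_may2012 = impulse_dummy(months, (2012, 5))
break_feb2005 = step_dummy(months, (2005, 2))

print("Impulse is 1 at index:", outlier_may2012.index(1))
print("Step switches to 1 at index:", break_feb2005.index(1))
```

The resulting columns would enter the model as additional regressors, so their estimated coefficients measure the size of the outlier and of the level shift.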

Figure 6.1: Auxiliary residuals of the regional road accidents models (rows: Southern, Central, East Coast, Borneo; left column: standardised smoothed observation disturbances; right column: standardised smoothed level disturbances)

6.1.2 Estimation of Trend and Seasonal Component

The best-fitting road accidents models, which incorporate all explanatory variables as well as the significant outliers and level breaks, are tabulated in Table 6.2. To aid the discussion, the trend and seasonal patterns after the addition of the explanatory variables are displayed in Figure 6.2 and Figure 6.3 respectively. For comparison purposes, the level components of the first two regions (northern and southern) are plotted together with the level components without explanatory variables. The estimated trend with explanatory variables (red line) follows the observed series (black line) more closely than the estimated trend without explanatory variables (blue line). This finding indicates that the estimated trend fits better when explanatory variables are incorporated. The same result holds for all regional road accidents models.

In terms of seasonality, referring to Table 6.2, only the northern and southern region road accidents show a stochastic seasonal pattern, while the remaining regions have a deterministic seasonal component. However, the disturbance variances of the seasonal component for the northern and southern regions are very small. This indicates that the stochastic changes in seasonal road accidents in these regions are almost invisible, as displayed in Figure 6.3(a) and Figure 6.3(b). Compared with the discussion in Section 5.2.1, the seasonal pattern after adding the explanatory variables is lower in February throughout the study period and higher in August, while the other months show a consistent seasonal pattern. This result is expected for the northern region, as the independence test on the first 12 autocorrelations (seasonal lag) is significant, indicating that the seasonal pattern may not occur in the series. Meanwhile, the southern region seasonal pattern after adding the intervention and explanatory variables records the lowest number of road accidents in February, after which the number gradually increases and peaks in August.

Table 6.2: Final road accidents model estimates with explanatory variables

Parameter               Northern      Southern           Central            East Coast         Borneo            Marks
Model                   LLSS          LDSS               LLDS               LLDS               LLDS
$\sigma^2_\eta$         0.0003        8.9×10^-5          0.0005             7.2×10^-5          4.0×10^-4
$\sigma^2_\varsigma$    NA            0.0000             NA                 NA                 NA
$\sigma^2_\omega$       1.1×10^-7     5.3×10^-5          0.0000             0.0000             0.0000
$\sigma^2_\varepsilon$  0.0004        5.6×10^-4          0.0002             0.002              7.2×10^-4
$\mu_t$                 10.384**      8.924**            11.901**           10.677**           7.065**
$\nu_t$                 NA            0.005**            NA                 NA                 NA
$\gamma_1$              0.005         -0.013             -0.017             -0.100             -0.015            * **
$\gamma_2$              -0.076**      -0.074**           -0.107**           -0.151**           -0.048**
$\gamma_3$              0.012         0.020              0.010              0.043              0.026             ** **
$\gamma_4$              -0.004        -0.027             0.004              0.030              -0.016            *
$\gamma_5$              0.006         0.016              0.009              0.044              0.030             ** **
$\gamma_6$              -0.016        -0.009             -0.024             0.038              -0.034            ** ** **
$\gamma_7$              0.016         0.049              0.036              0.029              0.012             ** ** *
$\gamma_8$              0.053**       0.051**            0.065**            0.100**            0.027**
$\gamma_9$              -0.004        -0.004             0.016              0.073              -0.017            * **
$\gamma_{10}$           0.018         0.022              0.057              0.033              0.037             * ** * **
$\gamma_{11}$           -0.022        -0.029             -0.003             -0.077             -0.015            * **
$\gamma_{12}$           0.013         -0.005             -0.046             -0.061             0.013             ** **
$\beta_1$ RAINF         3.0×10^-5**   2.0×10^-5**        4.0×10^-5**        1.0×10^-5          1.0×10^-5
$\beta_2$ RAIND         -2.6×10^-4    0.002              -0.001             0.004**            0.006**
$\beta_3$ TEMP          0.009         0.006              -0.009             -0.012             0.042             * **
$\beta_4$ API           3.0×10^-5     2.0×10^-5          1.5×10^-4          1.5×10^-4          2.6×10^-4
$\beta_5$ OILP          1.0×10^-5     9.0×10^-5          -2.0×10^-4         -1.6×10^-4         -6.0×10^-5
$\beta_6$ CPI           -2.992        -0.002             -0.001             -0.002             -2.2×10^-4        ** **
$\beta_7$ BLKG          0.019**       0.010              -0.032**           0.057**            -0.005
$\beta_8$ SAFE          0.068**       0.043**            0.003              0.134**            0.029**
$\beta_{10}$ LRA_1      -0.165        0.010              -0.199             -0.265             -0.061            ** ** **
Outliers                NA            May 2012 0.104**   Jul 2013 0.086**   Nov 2003 0.287**   Jun 2003 0.117**
                                                                            Jan 2004 -0.177**
Level break             NA            Feb 2005 -0.069**  Nov 2002 0.117**   NA                 NA
                                                         Jan 2005 -0.102**
                                                         Apr 2011 0.144**
SE                      0.030         0.030              0.028              0.054              0.036
$R^2$                   0.775         0.599              0.681              0.856              0.584
LB(12)                  34.028**      4.903              14.636             16.177*            10.566
GQ                      1.114         0.947              1.210              0.778              1.884**
JB                      3.195         5.651*             2.360              3.263              19.29**

Marks that cannot be tied to a single column are listed under Marks in printed order.


Otherwise, as illustrated in Figure 6.3(c), Figure 6.3(d) and Figure 6.3(e), the lowest number of road accidents for the southern, central, east coast and Borneo regions is found in February, which confirms the finding in Section 5.2.1. The seasonal pattern of road accidents for the central region resembles that of the east coast region. Meanwhile, for the Borneo region, the number of road accidents fluctuates between March and December, with the largest number found in October.

Figure 6.2: Trend components with explanatory and intervention variables according to regions: (a) Northern, (b) Southern, (c) Central, (d) East Coast, (e) Borneo

Figure 6.3: Seasonal components according to regions: (a) Northern, (b) Southern, (c) Central, (d) East Coast, (e) Borneo

Based on the results discussed above, it is clear that the explanatory and intervention variables have a large impact on road accidents modeling. This can be seen from the changes in the road accidents trend and seasonal pattern, which differ from those of the model without explanatory and intervention variables. The next section discusses the impact of the explanatory variables on road accident behavior further.


6.1.3 Estimation of Explanatory Variables

Referring to Table 6.2, the climate variables have a positive relationship with the number of road accidents in all five regions, except for the number of rainy days (RAIND) in the central and northern regions and the maximum temperature (TEMP) in the central and east coast regions, which show a negative relationship. RAINF is positively related to the number of road accidents in all regions. However, the relationship is only significant for the northern, southern and central regions. A similar result was found with the time series regression in Section 4.4.2. RAIND significantly affects road accidents in the Borneo and east coast regions. These results are reasonable, as these regions are near hilly areas, where even a small amount of rain may result in wet and slippery road conditions that contribute to road traffic accidents. In addition, as estimated in the TSR analysis, maximum temperature also affects road accidents. However, for the central region, the estimated effect is the opposite. API also shows different results compared with the TSR analysis.

The estimated effect of the economic variables shows that the CPI for transport has a negative relationship with the number of road accidents in all regions. In contrast, in the TSR analysis the CPI does not significantly affect road accidents in the northern region, although it significantly affects road accidents in the southern region. This result is reasonable for both regions, since they have among the highest numbers of yearly registered vehicles after the central region but less developed road structures than the central region.

Crude oil prices have a negative relationship with road accidents in all regions except the northern and southern regions. However, this variable does not show any significant effect on the number of road accidents.


Furthermore, the estimated model shows that the BLKG culture is significantly and positively related to the occurrence of road accidents in the northern and east coast regions, but significantly and negatively related to the number of road accidents in the central region. These results agree with the TSR analysis: traffic volume becomes much higher in the festive season because Malaysians take the opportunity of the long holidays to return to their hometowns, which contributes to the higher number of road accidents in these regions. At the same time, road accidents in the central region decrease during the festive season, as many residents are not native to the region and travel back to their hometowns.

Road accidents in all regions are estimated to increase with the implementation of safety operations (SAFE) such as OPS Sikap and OPS Selamat during festive seasons. However, the effect is only significant in the northern, southern, east coast and Borneo regions. It is unexpected that the results suggest the operations failed to reduce traffic accidents. However, the results for the northern and east coast regions are in line with the result for the BLKG culture, since both variables are related to festive seasons.

6.1.4 Observing Outliers and Structural Breaks

For the southern region, only one of the nine possible outliers has a significant effect: the series shows a sudden increase in road accidents in May 2012. This outlier may be related to the midterm school break together with the Labour Day and Wesak Day holidays. On the other hand, a significant structural break is found in February 2005. However, the reason behind this sudden decrease is unknown and difficult to interpret.


For the central region, a significant outlier was found in July 2013. The outlier may be related to Ramadan, the fasting month for Muslims. This is reasonable, as fasting may result in fatigue because of unusual meal times and sleep patterns, which affect drivers' concentration. This agrees with Mehmood et al. (2015), who found an increasing number of road traffic injuries during Ramadan in Karachi, Pakistan, especially in the evening.

Meanwhile, step interventions due to structural breaks for the central region were found in November 2002, January 2005 and April 2011. The break in November 2002 is difficult to interpret. It is assumed that the landslide in Taman Hillview, Ulu Klang (Utusan Malaysia, 2002) affected the road conditions near the location, which resulted in heavy road traffic and road accidents. However, this assumption may also be false.

The break in January 2005 was related to the effectiveness of the handover of Light Rail Transit (LRT) operations to Rapid KL by Prasarana Malaysia in November 2004, which resulted in a reduction in the number of road accidents in the central region. The break in April 2011 may be due to the reopening of the Puduraya bus terminal in Kuala Lumpur, which operates 24 hours a day. The improvement in the public transport system reduced the number of vehicles on the road and ensured smooth traffic flow in the city centre.

In the east coast region, significant outliers are detected in November 2003 and January 2004. The outlier in November 2003 is similar to that observed in the TSR analysis. It is assumed to be due to the year-end school holiday and the Eid-ul-Fitr holiday, which fell at the end of the working week and so offered a long holiday. The outlier in January 2004 is related to the New Year and Chinese New Year celebrations. States like Kelantan and Terengganu do not celebrate New Year, and the Chinese community in this region is smaller than in other regions. Moreover, children start the new school calendar in January. It is observed that fewer holidays may decrease the number of road accidents in the east coast region.

The sudden increase in road accidents in the Borneo region in June 2003 is assumed to be due to the midterm school break, which fell from 25th May to 7th June, together with the Harvest and Gawai Dayak festivals, which are celebrated annually in Sabah and Sarawak. In terms of break interventions, the number of road accidents in the Borneo region was found to decrease in September 2012 and August 2013. The reduction in road accidents after September 2012 may be related to the announcement of the first phase of the Automated Enforcement System (AES), which was carried out on several major roads and expressways.

The normality of the residuals improved after considering the outliers and structural breaks in the series (refer to Table 6.2). However, this does not hold for the southern and Borneo region road accidents models, in which the normality and homoscedasticity of the residuals still do not satisfy the residual assumptions even after adding the intervention variables. Otherwise, the goodness-of-fit of the models, as measured by the standard error, the $R^2$ and the disturbance variance of each component, improved with the inclusion of the intervention variables.

Even though adding an intervention variable for each outlier and structural break may improve the fit of the model, as tabulated in Table 6.2, it may also rest on a false assumption. Moreover, the insertion of intervention variables in response to observed outliers and structural breaks should always be based on theory and facts concerning the possible causes of their occurrence, which are not always easy to interpret.

6.2 Estimating and Understanding Individual State Road Accidents Model

As discussed in Section 5.3, the risk of road accidents varies between locations. The factors influencing road accidents may also differ. Therefore, this section analyzes the road accidents models with explanatory and intervention variables at a smaller scale that focuses on individual states. The study period is from January 2001 to December 2013, except for Perlis and Selangor, for which it only covers January 2004 to December 2013 because of the data availability of one explanatory variable.

Using a similar approach as in Section 6.1, the estimated road accidents models with explanatory variables are presented in Table 6.3. Incorporating the selected explanatory variables changed the Perak and Negeri Sembilan models from a stochastic seasonal model to a deterministic seasonal model, while the other states follow the same models identified in Section 5.3. Incorporating explanatory variables also helps to improve the models. Smaller irregular disturbance variances ($\sigma^2_\varepsilon$) were found for several of the states, which indicates that the explanatory variables account for part of the residuals of the previously estimated univariate models.


The goodness-of-fit of the models after adding the explanatory variables, as measured by $R^2$, also shows some improvement. The $R^2$ is higher than in the univariate models, indicating that the selected explanatory variables help to explain road accidents in each state. In addition, the standard error of each model is closer to zero and smaller than that of the model without explanatory variables, indicating that the accuracy of the current prediction model is better.

Table 6.3: Estimation of state level road accidents with explanatory variables

| Parameter | Penang (Multiple) | Penang (Univariate) | Perlis (Multiple) | Perlis (Univariate) | Kedah (Multiple) | Kedah (Univariate) |
|---|---|---|---|---|---|---|
| Models | LLDS | LLDS | LLDS | LLDS | LLSS | LLSS |
| $\sigma_\eta^2$ | 3.0×10⁻⁴ | 2.0×10⁻⁴ | 0.001 | 0.001 | 0.001 | 3.0×10⁻⁴ |
| $\sigma_\varsigma^2$ | NA | NA | NA | NA | NA | NA |
| $\sigma_\omega^2$ | 0.000 | 0.000 | 0.000 | 0.000 | 4.8×10⁻⁸ | 1.9×10⁻⁵ |
| $\sigma_\varepsilon^2$ | 0.001 | 0.001 | 0.010 | 0.010 | 0.001 | 0.002 |
| SE | 0.033 | 0.036 | 0.108 | 0.112 | 0.046 | 0.070 |
| R2 | 0.538 | 0.383 | 0.501 | 0.388 | 0.743 | 0.360 |
| LB(12) | 9.744 | 12.336 | 6.286 | 9.452 | 20.997** | 43.071** |
| GQ | 1.104 | 1.059 | 0.727 | 0.967 | 1.093 | 0.842 |
| JB | 3.555 | 0.837 | 4.230 | 3.277 | 4.292 | 8.877** |

| Parameter | Perak (Multiple) | Perak (Univariate) | Kuala Lumpur (Multiple) | Kuala Lumpur (Univariate) | Selangor (Multiple) | Selangor (Univariate) |
|---|---|---|---|---|---|---|
| Models | LLDS | LLSS | LLDS | LLDS | LLDS | LLDS |
| $\sigma_\eta^2$ | 3.4×10⁻⁴ | 0.0002 | 2.7×10⁻⁴ | 2.0×10⁻⁴ | 0.001 | 4.0×10⁻⁴ |
| $\sigma_\varsigma^2$ | NA | NA | NA | NA | NA | NA |
| $\sigma_\omega^2$ | 0.000 | 7.0×10⁻⁶ | 0.000 | 0.000 | 0.000 | 0.000 |
| $\sigma_\varepsilon^2$ | 0.001 | 0.003 | 0.001 | 0.001 | 0.001 | 0.001 |
| SE | 0.042 | 0.070 | 0.038 | 0.042 | 0.037 | 0.044 |
| R2 | 0.797 | 0.411 | 0.529 | 0.377 | 0.526 | 0.288 |
| LB(12) | 26.257** | 49.800** | 17.989* | 22.016** | 16.521* | 10.958 |
| GQ | 1.156 | 0.990 | 0.970 | 0.917 | 0.982 | 0.682 |
| JB | 20.300** | 6.586** | 2.314 | 3.860 | 10.857** | 10.635** |

Table 6.3: Continued

| Parameter | Negeri Sembilan (Multiple) | Negeri Sembilan (Univariate) | Melaka (Multiple) | Melaka (Univariate) | Johor (Multiple) | Johor (Univariate) |
|---|---|---|---|---|---|---|
| Models | LDDS | LDSS | LLDS | LLDS | LLSS | LLSS |
| $\sigma_\eta^2$ | 5.69×10⁻⁵ | 3.6×10⁻⁵ | 6.9×10⁻⁵ | 4.0×10⁻⁴ | 3.6×10⁻⁴ | 3.0×10⁻⁴ |
| $\sigma_\varsigma^2$ | 0.000 | 0.000 | NA | NA | NA | NA |
| $\sigma_\omega^2$ | 0.000 | 1.9×10⁻⁶ | 0.000 | 0.000 | 1.1×10⁻⁶ | 3.0×10⁻⁶ |
| $\sigma_\varepsilon^2$ | 0.002 | 0.003 | 0.002 | 0.002 | 0.001 | 0.001 |
| SE | 0.049 | 0.062 | 0.050 | 0.051 | 0.037 | 0.042 |
| R2 | 0.694 | 0.490 | 0.537 | 0.312 | 0.424 | 0.191 |
| LB(12) | 16.833* | 8.834 | 18.305 | 12.34 | 10.833 | 9.452 |
| GQ | 1.381 | 0.889 | 0.763 | 0.672 | 0.906 | 0.864 |
| JB | 265.550** | 83.441** | 3.333 | 0.908 | 7.544** | 1.114 |

| Parameter | Kelantan (Multiple) | Kelantan (Univariate) | Terengganu (Multiple) | Terengganu (Univariate) | Pahang (Multiple) | Pahang (Univariate) |
|---|---|---|---|---|---|---|
| Models | LLSS | LLSS | LLSS | LLSS | LLDS | LLDS |
| $\sigma_\eta^2$ | 0.001 | 3.0×10⁻⁴ | 0.001 | 0.001 | 0.001 | 0.001 |
| $\sigma_\varsigma^2$ | NA | NA | NA | NA | NA | NA |
| $\sigma_\omega^2$ | 1.2×10⁻⁶ | 2.0×10⁻⁴ | 7.3×10⁻⁷ | 1.0×10⁻⁴ | 0.000 | 0.000 |
| $\sigma_\varepsilon^2$ | 0.008 | 0.004 | 0.005 | 0.002 | 0.002 | 0.006 |
| SE | 0.099 | 0.141 | 0.083 | 0.116 | 0.056 | 0.116 |
| R2 | 0.776 | 0.517 | 0.755 | 0.503 | 0.778 | 0.503 |
| LB(12) | 20.301** | 25.81** | 21.271** | 26.408** | 10.633 | 27.134** |
| GQ | 0.576 | 0.547 | 0.354 | 0.460 | 0.675 | 0.738 |
| JB | 23.567** | 168.71** | 6.058* | 45.807** | 0.568 | 8.156** |

| Parameter | Sabah (Multiple) | Sabah (Univariate) | Sarawak (Multiple) | Sarawak (Univariate) |
|---|---|---|---|---|
| Models | LLSS | LLSS | LLDS | LLDS |
| $\sigma_\eta^2$ | 4.0×10⁻⁴ | 3.0×10⁻⁴ | 0.001 | 3.0×10⁻⁴ |
| $\sigma_\varsigma^2$ | NA | NA | NA | NA |
| $\sigma_\omega^2$ | 1.0×10⁻⁶ | 2.5×10⁻⁶ | 0.000 | 0.000 |
| $\sigma_\varepsilon^2$ | 0.001 | 0.002 | 0.001 | 0.002 |
| SE | 0.047 | 0.053 | 0.047 | 0.052 |
| R2 | 0.476 | 0.257 | 0.549 | 0.423 |
| LB(12) | 9.905 | 10.958 | 12.702 | 8.834 |
| GQ | 1.159 | 1.128 | 0.965 | 0.821 |
| JB | 0.951 | 7.088** | 3.828 | 5.583* |

*, ** denote significance at 10% and 5% level

6.2.1 Error Estimate

The standard diagnostics presented in Table 6.3 do not indicate any serious problems, except that some states, namely Kedah, Perak, Kuala Lumpur, Selangor, Negeri Sembilan, Kelantan and Terengganu, display weak evidence of serial autocorrelation in the errors: the null hypothesis of the independence test is rejected at the 10% or 5% significance level. The homoscedasticity assumption is satisfied for all states.
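The independence check reported as LB(12) can be illustrated with a minimal sketch of the Ljung-Box statistic, assuming the standard textbook definition Q = n(n+2) Σ r_k²/(n−k) over the first 12 residual autocorrelations. The function names and the toy residual series are illustrative assumptions, not the software or data used in this study, and the degrees of freedom used to judge significance in structural time series packages may differ from the plain chi-square with 12 degrees of freedom.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box(residuals, max_lag=12):
    """Ljung-Box Q statistic over lags 1..max_lag (standard definition)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Toy residuals (hypothetical): a strictly alternating series is strongly
# negatively autocorrelated at lag 1, so Q is large.
alternating = [1.0, -1.0] * 20
q_lag1 = ljung_box(alternating, max_lag=1)
```

A large Q relative to the chosen critical value leads to rejection of the null hypothesis of independent residuals, which is how the starred LB(12) entries in Table 6.3 are read.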

Meanwhile, the normality assumption is clearly violated for the road accident models of Perak, Selangor, Negeri Sembilan, Johor, Kelantan and Terengganu. As in Section 6.1.1, the violation is handled by adding impulse intervention variables or level intervention variables where appropriate. To identify possible intervention variables, the auxiliary residuals are plotted in Appendix 5.
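The normality check reported as JB can be sketched as follows, assuming the usual Jarque-Bera form JB = n/6 · (S² + (K − 3)²/4), where S and K are the sample skewness and kurtosis of the residuals. The function name and the toy input are illustrative assumptions; the exact variant computed by the estimation software is not shown in this section.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with divisor n (the usual convention for this test)
    m2 = sum((v - mean) ** 2 for v in residuals) / n
    m3 = sum((v - mean) ** 3 for v in residuals) / n
    m4 = sum((v - mean) ** 4 for v in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Toy residuals (hypothetical): symmetric but far too flat to be normal,
# so skewness is 0 and kurtosis is 1.
two_point = [-1.0, 1.0] * 6
jb_two_point = jarque_bera(two_point)
```

A single extreme residual inflates both the skewness and kurtosis terms, which is why removing outliers through intervention variables can restore normality.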

The auxiliary residuals in Appendix 5 suggest three possible outliers for Perak, seven for Selangor (with at least two points lying far outside the 95% confidence limits), five for Kelantan and six for Terengganu. For both Kelantan and Terengganu, the outlier at the end of 2003 lies far outside the 95% confidence limit in the negative direction. For Negeri Sembilan and Johor, there are three and four possible outliers respectively.

The auxiliary residuals of the standardised smoothed level disturbances, used to detect structural breaks, are displayed on the right-hand side of Appendix 5. Only certain states, such as Selangor and Terengganu, show large t-values for the standardised smoothed level disturbances; for the other states the t-values do not exceed 2.5 in absolute value, the threshold suggested by Koopman et al. (2009). However, as in Section 6.1.1, only significant outliers and structural breaks are considered.
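The screening rule described above can be sketched as a simple filter on standardised auxiliary residuals, using the 2.5 threshold attributed to Koopman et al. (2009). The residual values and period labels below are hypothetical placeholders, not values from Appendix 5, and the function name is an assumption introduced for illustration.

```python
def flag_candidates(aux_residuals, labels, threshold=2.5):
    """Return (label, value) pairs whose absolute value exceeds the threshold."""
    return [
        (label, value)
        for label, value in zip(labels, aux_residuals)
        if abs(value) > threshold
    ]


# Hypothetical standardised auxiliary residuals for five consecutive months
labels = ["t1", "t2", "t3", "t4", "t5"]
aux = [0.4, -3.1, 1.2, 2.7, -0.8]
candidates = flag_candidates(aux, labels)
```

Applied to the irregular residuals, the flagged points are candidate outliers; applied to the level disturbances, they are candidate structural breaks. Each candidate is then kept only if its intervention variable is significant when the model is re-estimated.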


Table 6.4: Final estimate of road accidents model with the explanatory variables States Coefficient Kuala Negeri Penang Perlis Kedah Perak Selangor Lumpur Sembilan Model LLDS LLDS LLSS LLDS LLDS LLDS LDDS 2 -4 -4 -4 -4 -5 ση 3.0×10 0.001 0.001 2.8×10 2.4×10 2.7×10 5.7×10 2 σ ς NA NA NA NA NA NA 0.000 2 -8 σ ω 0.000 0.000 4.8×10 0.000 0.000 0.000 0.000 2 -4 σ ε 0.001 0.010 0.001 0.001 3.0×10 0.001 0.002 ** ** ** ** ** ** µt 9.144 6.455 8.202 9.164 10.339 9.412 8.121 ** ν t NA NA NA NA NA NA 0.004 ** ** * ** γ1 0.023 0.004 0.009 -0.021 -0.016 -0.009 -0.032 ** ** ** ** ** ** ** γ 2 -0.082 -0.095 -0.094 -0.059 -0.101 -0.123 -0.089 ** * γ 3 -0.007 0.040 0.028 0.039 0.019 0.021 0.016

γ 4 -0.006 0.053 -0.022 -0.010 0.003 -0.007 -0.020 ** ** γ 5 -0.011 0.047 0.009 0.026 0.014 -0.007 0.037 ** ** ** γ 6 -0.025 -0.036 -0.021 -0.002 -0.027 -0.036 0.004 ** ** ** ** γ 7 0.026 0.024 0.002 0.018 0.041 0.049 0.025 ** ** ** ** ** γ 8 0.051 0.029 0.046 0.047 0.055 0.060 0.040 * ** ** γ 9 -0.003 0.070 0.010 -0.002 0.024 0.005 0.008 * ** ** ** γ10 0.020 -0.017 0.041 0.005 0.059 0.060 0.018 ** ** ** γ11 -0.013 -0.046 -0.029 -0.032 -0.015 0.012 -0.040 ** ** ** ** γ12 0.026 -0.073 0.021 -0.009 -0.056 -0.024 0.032 -5** -5 -5** -4** -5** -5 -4** β1_ RAINF 4.0×10 8.0×10 7.0×10 1.1×10 5.0×10 5.0×10 1.7×10 -5 -4 β2_ RAIND 0.001 -0.004 0.001 0.001 9.0×10 -2.2×10 0.001 ** β3_ TEMP 0.023 -0.006 0.008 -0.002 -0.008 0.004 0.004 -5 -5 -4 -4 -4 -5 -5 β4_ API 7.0×10 -4.1×10 2.9×10 1.2×10 1.6×10 6.0×10 -9.0×10 -5 -5 -4 -5 -4** -5 -5 β5_ OILP 2.0×10 6.0×10 2.1×10 4.0×10 -2.7×10 -6.0×10 7.0×10 * ** -4 ** β6_ CPI -0.001 -0.006 -0.002 -0.003 2.5×10 -0.003 -0.002 * ** ** * ** β7_ BLKG 0.001 0.015 0.025 0.040 -0.028 -0.022 0.049 ** ** ** β8_ SAFE 0.026 0.035 0.105 0.097 0.003 -0.006 0.033 ** ** ** β9_ LRA_1 -0.212 -0.097 -0.139 -0.111 -0.096 -0.076 -0.078 Nov-11 Jul-13 Nov-11

-0.142** 0.096** 0.334**

Apr-08

0.178** Outliers NA NA NA Mac-12

-0.167**

Aug2012

0.106**

Jan-05 -0.185** Mac-05 Level Break NA NA NA NA NA NA 0.165** Apr-11 0.182** SE 0.033 0.108 0.046 0.034 0.025 0.038 0.049 R2 0.538 0.657 0.743 0.867 0.801 0.529 0.696 LB(12) 0.538 6.286 20.997** 32.572** 8.649 17.989* 16.833* GQ 1.104 0.727 1.093 0.733 1.442 0.970 1.381 JB 3.555 4.230 4.292 8.144** 6.637 2.314 265.55** *, **denote significance at 10% and 5% level


Table 6.4:Continued States Coefficient Melaka Johor Kelantan Terengganu Pahang Sabah Sarawak Model LLDS LLSS LLSS LLSS LLDS LLSS LLDS 2 -5 -4 -4 ση 6.9×10 3.2×10 0.001 0.001 0.001 4.0×10 0.001 2 σ ς NA NA NA NA NA NA NA 2 -7 -6 -7 -6 σ ω 0.000 9.3×10 2.3×10 1.7×10 0.000 1.0×10 0.000 2 σ ε 0.002 0.001 0.005 0.003 0.002 0.001 0.001 ** ** ** ** ** ** ** µt 4.449 8.257 6.901 7.700 9.367 5.880 6.917 ν t NA NA NA NA NA NA NA ** ** ** γ1 -0.004 -0.002 -0.089 -0.120 -0.068 -0.008 -0.017 ** ** ** ** ** ** * γ 2 -0.063 -0.076 -0.124 -0.162 -0.126 -0.070 -0.033 ** ** γ 3 0.053 0.011 0.083 0.030 0.027 0.017 0.020 * γ 4 -0.028 -0.029 -0.004 -0.003 0.015 -0.012 -0.018 ** ** γ 5 0.024 0.018 -0.012 0.035 0.045 0.031 0.047 ** γ 6 -0.007 -0.022 0.048 0.025 0.021 -0.051 -0.002 ** ** ** γ 7 0.035 0.055 0.004 0.028 0.034 0.026 0.011 * ** ** ** ** * γ 8 0.025 0.054 0.107 0.104 0.087 0.038 0.018 ** ** ** γ 9 -0.018 -0.006 0.051 0.085 0.042 -0.007 -0.034 ** ** ** γ10 0.001 0.036 0.033 0.079 0.005 0.040 0.022 ** ** ** ** ** γ11 -0.057 -0.030 -0.074 -0.039 -0.053 -0.013 -0.031 ** ** γ12 0.039 -0.010 -0.023 -0.061 -0.028 0.008 0.018 -4** -5** -5 -4** -5 -5 -5 β1_ RAINF 1.8×10 3.0×10 -1.0×10 1.5×10 1.0×10 2.0×10 2.0×10 -4 ** ** ** ** β2_ RAIND 2.8×10 0.001 0.006 0.002 0.007 0.007 0.006 * ** ** β3_ TEMP 0.004 0.004 0.027 0.003 -0.011 0.041 0.043 -4 -5 -4 -4 -5 β4_ API 1.5×10 -6.0×10 -0.001 0.001 3.6×10 -1.0×10 9.0×10 -5 -5 -4 -4 -5 -4 -4 β5_ OILP 9.0×10 5.0×10 -1.1×10 -3.6×10 7.0×10 -1.7×10 1.3×10 -4 β6_ CPI -0.002 -0.001 -0.003 -0.002 -0.002 1.5×10 -0.001 ** * * ** β7_ BLKG 0.045 -0.004 0.055 0.041 0.040 -0.006 -0.002 ** ** ** ** * β8_ SAFE 0.008 0.045 0.220 0.178 0.099 0.032 0.030 ** ** ** ** * β9_ LRA_1 0.375 0.033 -0.122 -0.178 -0.220 0.003 -0.139 May2005 Nov-03 Dec2001 -0.108** 0.489** 0.217** May 2012 Feb-05 Nov-03 0.128** -0.303** 0.266** Outliers NA NA NA NA Nov-07 Jan-05

-0.247** 0.258**

Nov-07

-0.177**

Feb-04 0.270** Level Break NA NA NA NA NA NA Jan-08 0.116* SE 0.050 0.128 0.087 0.067 0.056 0.047 0.047 R2 0.537 0.483 0.854 0.776 0.778 0.476 0.549 LB(12) 18.305 10.032 17.838* 12.529 10.633 9.905 12.702 GQ 0.763 1.111 1.044 0.724 0.675 1.159 0.965 JB 3.333 10.291** 2.296 1.316 0.568 0.951 3.828 *, **denote significance at 10% and 5% level


Re-estimating the models with the possible outliers and breaks included gives the best estimated models shown in Table 6.4. The results show that incorporating intervention variables for outliers and structural breaks not only improves the normality of the residuals but also satisfies the residual independence assumption for the state of Terengganu.

Moreover, the standard errors of the models and the disturbance variances of each component decrease when the intervention variables are included. However, for Perak, Negeri Sembilan and Johor the normality assumption is still not satisfied even after the intervention variables are incorporated.
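The two kinds of intervention variable can be sketched as 0/1 regressors: an impulse dummy for an outlier and a step dummy for a level break. The example uses the Selangor outlier (July 2013) and one of its level breaks (January 2005) from Table 6.4, with the sample assumed to start in January 2004 as stated for Selangor. The helper names are illustrative assumptions and not part of any estimation package.

```python
def month_index(year, month, start_year, start_month=1):
    """Zero-based position of (year, month) in a monthly series."""
    return (year - start_year) * 12 + (month - start_month)


def impulse_dummy(n_obs, position):
    """1 at the outlier date and 0 elsewhere."""
    return [1 if t == position else 0 for t in range(n_obs)]


def level_shift_dummy(n_obs, position):
    """0 before the break date and 1 from the break date onwards."""
    return [1 if t >= position else 0 for t in range(n_obs)]


N_OBS = 120  # January 2004 to December 2013
outlier_jul_2013 = impulse_dummy(N_OBS, month_index(2013, 7, start_year=2004))
break_jan_2005 = level_shift_dummy(N_OBS, month_index(2005, 1, start_year=2004))
```

Each dummy enters the measurement equation as an additional regressor, so its estimated coefficient measures the size of the one-off shock or of the permanent shift in level.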

6.2.2 States Level Road Accidents Pattern with Explanatory Variables

After the best model for each state has been determined, the trend and seasonal patterns of each state are examined. Table 6.4 shows the estimated STS models of road accidents with explanatory and intervention variables. Of the 14 states, only Negeri Sembilan has a stochastic trend with a deterministic slope, while the other states have a stochastic trend without a slope component. In terms of seasonality, Kedah, Johor, Kelantan, Terengganu and Sabah show a stochastic seasonal pattern, while the other states show a deterministic seasonal pattern.
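As a reminder of how these specifications relate to the variances in Table 6.4, the model can be written in a generic form. The sketch below assumes the standard structural time series formulation with a dummy-variable seasonal of period 12; the timing convention, the seasonal form and the intervention symbols $\lambda_k$ and $w_{kt}$ are assumptions introduced here and may differ from the notation used in the earlier chapters.

```latex
\begin{align}
y_t &= \mu_t + \gamma_t + \sum_{j=1}^{9} \beta_j x_{jt} + \sum_{k} \lambda_k w_{kt} + \varepsilon_t,
  & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2), \\
\mu_{t+1} &= \mu_t + \nu_t + \eta_t,
  & \eta_t &\sim N(0, \sigma_\eta^2), \\
\nu_{t+1} &= \nu_t + \varsigma_t,
  & \varsigma_t &\sim N(0, \sigma_\varsigma^2), \\
\gamma_{t+1} &= -\sum_{i=0}^{10} \gamma_{t-i} + \omega_t,
  & \omega_t &\sim N(0, \sigma_\omega^2).
\end{align}
```

Under this reading, a state with NA for $\sigma_\varsigma^2$ has no slope term $\nu_t$, $\sigma_\varsigma^2 = 0$ gives a deterministic slope (as for Negeri Sembilan), and $\sigma_\omega^2 = 0$ gives a deterministic seasonal pattern, while $\sigma_\omega^2 > 0$ lets the seasonal pattern change over time.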

The trend movements of road accidents for each state after incorporating the explanatory and intervention variables are illustrated in Figure 6.4. The figure shows that the observed time series is recovered quite well when the explanatory and intervention variables are incorporated in the road accident model for each state, compared with the models without explanatory variables discussed in Section 5.4 and Section 5.5.

As an example, a comparison of the trends from the models with and without explanatory variables is given for Penang and Negeri Sembilan in Figure 6.4(a) and Figure 6.4(h). The blue line indicates the trend without explanatory variables and the red line the trend with explanatory variables. Both plots show that, after incorporating the explanatory variables and the intervention variables for outliers or structural breaks, the trend follows the actual observations closely.

Figure 6.4: Trend pattern of state level road accidents. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Selangor, (f) Kuala Lumpur, (g) Melaka, (h) Negeri Sembilan, (i) Johor, (j) Kelantan, (k) Terengganu, (l) Pahang, (m) Sabah, (n) Sarawak.

The estimated seasonal patterns of road accidents for each state are illustrated in Figure 6.5. The patterns are quite distinct from those obtained in Section 5.4 and Section 5.5. For Perak and Negeri Sembilan, the estimated seasonal pattern was stochastic before the explanatory and intervention variables were incorporated; afterwards, it is deterministic.

For Penang, adding the explanatory variables to the model results in fewer road accidents in February and more in August, which is approximately similar to the seasonal pattern for Perak. On the other hand, the seasonal pattern of road accidents in Perlis, which fluctuated before the explanatory variables were added (Figure 5.9(d)), is now fixed, with the lowest value in February and the highest in September. The seasonal pattern for Perlis resembles that of the central region shown in Figure 6.2(c).

As in Figure 5.12(a) and Figure 5.12(b), the seasonal patterns of road accidents for Kuala Lumpur and Selangor do not differ much. However, after incorporating the explanatory and intervention variables, the highest number of road accidents is found in August instead of October, as shown in Figure 6.5(e) and Figure 6.5(f). This indicates that the explanatory and intervention variables have captured the extreme value in October. The pattern follows that of the central region.

Similar to the analysis in Chapter 5, the seasonal pattern of road accidents in Melaka resembles the pattern recorded for the Borneo region: it fluctuates from March to December. In contrast, the largest number of road accidents is found in December, instead of October as for the Borneo region. On the other hand, the pattern also indicates that the explanatory variables included in the model have captured the peak in road accidents that occurred in August before the explanatory variables were added.

Meanwhile, the seasonal pattern of road accidents in Negeri Sembilan is quite distinct from the one without explanatory variables in Chapter 5. For this state, the seasonal pattern with explanatory variables is deterministic, with the highest number of road accidents found in August throughout the study period. The stochastic seasonal patterns for Kedah and Johor are less pronounced than those in Figure 5.6(a) and Figure 5.11(a) respectively. In February and April, the number of road accidents in Johor increases slowly over time, while in December it shows a decreasing trend. At the same time, the highest number of road accidents throughout the study period is found in August.

Figure 6.5: Seasonal pattern of the road accidents model for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Selangor, (f) Kuala Lumpur, (g) Negeri Sembilan, (h) Melaka, (i) Johor, (j) Kelantan, (k) Terengganu, (l) Pahang, (m) Sabah, (n) Sarawak.

For Kelantan and Terengganu, the seasonal patterns of road accidents are less pronounced than those in Figure 5.12(a) and Figure 5.12(b), which show uncertain stochastic seasonality. The seasonal pattern for Pahang, after incorporating the explanatory and intervention variables, follows the east coast region's seasonal pattern: the number of road accidents fluctuates between March and October, with the highest number found in August.

The road accidents in Sabah show a stochastic seasonal pattern, as illustrated in Figure 6.5(m). The number of road accidents is highest in October from the start of the study period until 2010. This result contrasts with the analysis in Chapter 5, which found that the number of road accidents in October is the highest from 2008 until the end of the study period, with the highest number recorded in August. Meanwhile, the road accidents in Sarawak, illustrated in Figure 6.5(n), show a deterministic seasonal pattern very similar to that of the Borneo region. The number of road accidents fluctuates between March and December, with the highest number found in May.

In general, the lowest number of road accidents is found in February throughout the study period for all states except Melaka, where it is found in November, and the highest number is found in August, except for certain states such as Perlis (September), Melaka (October) and Sarawak (May). The low number of road accidents in February confirms the analysis in Chapter 5 and may be related to February having only 28 or 29 days, compared with 30 or 31 days in the other months.
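For a deterministic seasonal model, the months with the fewest and most accidents can be read directly from the twelve seasonal coefficients. The sketch below uses the Penang coefficients γ1 to γ12 from Table 6.4 and assumes that they correspond to January through December in order; the function name is illustrative.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Penang seasonal coefficients (gamma_1 .. gamma_12) from Table 6.4
penang_gamma = [0.023, -0.082, -0.007, -0.006, -0.011, -0.025,
                0.026, 0.051, -0.003, 0.020, -0.013, 0.026]


def seasonal_extremes(effects):
    """Return the (lowest, highest) months of a 12-month seasonal pattern."""
    lowest = min(range(12), key=lambda i: effects[i])
    highest = max(range(12), key=lambda i: effects[i])
    return MONTHS[lowest], MONTHS[highest]


penang_low, penang_high = seasonal_extremes(penang_gamma)
```

For Penang this gives February as the lowest month and August as the highest, in line with the discussion above. The coefficients also sum to approximately zero, as expected for seasonal effects measured around the level.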

6.2.3 Estimation of Explanatory Variables

The analysis in the previous section highlights the variety of trend and seasonal specifications shown by different states. This section discusses the factors that influence road accidents in each state. These factors may differ between states because of their unique characteristics, such as climate effects, economic effects and calendar effects.

Recall that the climate effects considered in this study are the monthly amount of rainfall (RAINF), the number of rainy days per month (RAIND), the monthly maximum temperature (TEMP) and the monthly maximum air pollution index (API). At the state level, the climate effects on road accidents may differ between states not only in magnitude but also in the sign of the correlation.

As stated by several researchers, such as Fridstrøm and Ingebrigtsen (1991), Fridstrøm et al. (1995), Chang and Chen (2005), Caliendo et al. (2007), Shankar et al. (1995) and Keay and Simmonds (2006), the number of road accidents increases with the amount of rainfall. The current analysis shows a positive correlation between road accidents and the amount of rainfall for all states except Kelantan. The negative correlation for Kelantan is possibly because drivers there are used to wet weather (Eisenberg, 2004), as the number of rainy days in this state, between 3 and 26 days per month, is among the highest in Malaysia. However, the correlation is significant at the 5% or 10% level only for Penang, Perak, Kedah, Selangor, Negeri Sembilan, Melaka, Johor and Terengganu. Meanwhile, the number of rainy days is correlated with road accidents in Kelantan, Pahang, Sabah and Sarawak. An increase in road accidents with the number of rainy days is reasonable, as the states in the east coast and Borneo regions are hilly and vulnerable to landslides, which makes them prone to road accidents.

Temperature is significantly positively correlated with road accidents in several states, namely Penang, Kelantan, Sabah and Sarawak. The result agrees with Scott (1986), who claimed that hot weather is associated with more road accidents in the United Kingdom. High temperatures affect people's emotions, making them irritable and easily tired and reducing their concentration (German Traffic Safety Council, 2000). These effects indirectly increase the number of road accidents.

Further investigation of the economic effects shows that the crude oil price is correlated with road accidents, at the 10% significance level, only in Selangor. The effect is very small: an increase in OILP reduces accidents in the state by less than 0.001%, which contradicts Scott (1986), who stated that petrol price appeared to be quite strongly related to many accident series. On the other hand, the CPI for transportation has a significant negative relationship with the number of road accidents in Perlis, Perak and Kuala Lumpur. The result is reasonable because, as the price of transportation increases, fewer consumers buy new vehicles, which reduces traffic volume and, in turn, the number of accidents.

Moreover, the estimated models show that the BLKG culture is significantly positively related to road accidents in Kedah, Perak, Negeri Sembilan, Melaka, Kelantan, Pahang and Terengganu. The increase in the number of road accidents in those states during festival holidays, or BLKG seasons, ranges between 2.5% and 5.5%, with the largest increase recorded in Kelantan. In contrast, at the 10% significance level, road accidents in Selangor and Kuala Lumpur are negatively correlated with the BLKG seasons. This is not surprising, as it confirms the analysis of regional road accidents in the previous section: in the central region, which includes Selangor and Kuala Lumpur, the number of road accidents falls during festive seasons because many residents are not native to the region and travel back to their hometowns.

It is revealed that the safety operations (SAFE) conducted during festive seasons failed to reduce traffic accidents significantly. Table 6.4 clearly shows that SAFE is significantly positively correlated with road accidents in all states except those in the central region, Perlis, Negeri Sembilan, Melaka and Sarawak. This finding is similar to the result on the effect of safety operations obtained in the previous section.

6.2.4 Observing Possible Outliers and Structural Breaks

Compared with the earlier analysis of outliers based on the time series regression (TSR) approach, almost all significant outliers found by the structural time series (STS) approach are the same; an example is the significant outlier found in the Negeri Sembilan road accident series. However, the STS approach detects more significant outliers, as can be seen in the Perak and Kelantan road accident series.

For the Perak road accident series, only three outliers are found by the TSR approach, while the STS approach finds four. Two of these, in November 2007 and April 2008, are also observed in the TSR analysis. The other two significant outliers are in March 2012 and August 2012. The outlier in August 2012 is estimated to increase accidents in Perak by 10%, while the outlier in March 2012 is estimated to decrease accidents by 16.7%.

Similarly, using the STS approach, two additional outliers are found to significantly affect accidents in Kelantan, compared with only one outlier under the TSR approach. These outliers occurred in February 2005 and November 2012 and decreased the number of accidents by 30.3% and 24.7% respectively.
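The percentage effects quoted above read an intervention coefficient β directly as 100·β percent. If the dependent variable is the natural logarithm of the number of accidents (an assumption here, suggested by the magnitude of the level estimates in Table 6.4 but not stated in this section), the exact effect is 100·(exp(β) − 1) percent, and the two readings diverge as |β| grows. The sketch below compares them for four outlier coefficients from Table 6.4; the function names are illustrative.

```python
import math


def approx_percent(beta):
    """Direct reading of a log-model coefficient as a percentage change."""
    return 100.0 * beta


def exact_percent(beta):
    """Exact percentage change implied by a coefficient in a log model."""
    return 100.0 * (math.exp(beta) - 1.0)


# Outlier coefficients reported in Table 6.4 (Perak and Kelantan)
coefficients = [0.106, -0.167, -0.303, -0.247]
comparison = [(b, approx_percent(b), exact_percent(b)) for b in coefficients]
```

Under the log assumption, the direct reading slightly understates positive effects and overstates negative ones, so the larger Kelantan decreases would be somewhat smaller in exact percentage terms.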

In contrast, some of the outliers detected in the TSR analysis are not significant under the STS approach, and vice versa. For example, the TSR analysis detects two outliers affecting the number of accidents in Penang, whereas the STS analysis finds none. Conversely, the STS analysis detects four significant outliers and two significant structural breaks for Terengganu that were not observed in the TSR analysis. The four outliers are in December 2001, November 2003, January 2005 and November 2007, and the two structural breaks occurred in February 2004 and January 2008.

For the Selangor road accident series, the STS approach finds one significant outlier, in July 2013. This point coincides with the 1 Malaysia Mega Sale Carnival, held from 29 June to 1 September 2013. The carnival, which took place during the month of Ramadan, gave Muslims a good opportunity to prepare for the Eid-ul-Fitr celebration. Selangor is a developed state with many shopping malls, so the carnival may indirectly have increased the volume of traffic in the state. In addition, road accidents in Selangor are estimated to be influenced by three structural breaks, which occurred in January 2005, March 2005 and April 2011.

6.3 Prediction Performance of STS with Explanatory Variables

The models fitted in this chapter are used for prediction, and they may perform better or worse than other common time series models. In this section, therefore, the performance of the structural time series (STS) model in modeling road accidents is compared with that of the time series regression (TSR) model. As in Section 5.5, the comparison is based on two widely used loss functions, RMSE and MAPE. The predictions for the horizon from January 2012 to December 2013 are illustrated in Figure 6.6 and Figure 6.7 for the regional and individual state road accident models respectively.
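The two loss functions can be sketched with their standard definitions: RMSE is the square root of the mean squared prediction error, and MAPE is the mean absolute error relative to the actual value, expressed in percent. The function names and the toy numbers are illustrative assumptions; any scaling or transformation applied before the values in Table 6.5 were computed is not shown in this section.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Toy values (hypothetical), not taken from the road accident data
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
toy_rmse = rmse(actual, predicted)
toy_mape = mape(actual, predicted)
```

For both measures a smaller value indicates better predictions, so the model with the lower RMSE and MAPE over the prediction horizon is preferred.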

Figure 6.6: Real and estimated regional road accidents produced by TSR and STS models. Panels: (a) Northern, (b) Central, (c) Southern, (d) East Coast, (e) Borneo.

As displayed in Figure 6.6, both methods predict well at the beginning of the sample period for almost all regions. Towards the end of the sample period, however, the estimated values are quite far from the actual values, especially those of the TSR method (in red). The STS estimates (in green) are very close to the actual values (in blue). This is not surprising because, as Table 6.5 shows, the prediction errors of the STS approach are lower than those of the TSR approach.

Figure 6.7: Real and estimated road accidents values produced by TSR and STS models for individual states. Panels: (a) Penang, (b) Perlis, (c) Kedah, (d) Perak, (e) Kuala Lumpur, (f) Selangor, (g) Negeri Sembilan, (h) Johor, (i) Melaka, (j) Kelantan, (k) Terengganu, (l) Pahang, (m) Sabah, (n) Sarawak.

Table 6.5: Error values for prediction road accidents models with explanatory variables

| Region/State | RMSE (TSR) | RMSE (STS) | MAPE (TSR) | MAPE (STS) |
|---|---|---|---|---|
| Northern | 0.0010 | 0.0002 | 6.7819 | 3.1380 |
| Penang | 0.0007 | 0.0003 | 6.8209 | 4.5579 |
| Perlis | 0.0101 | 0.0058 | 40.3735 | 30.9234 |
| Kedah | 0.0029 | 0.0006 | 12.7514 | 5.5062 |
| Perak | 0.0013 | 0.0004 | 8.5067 | 4.5341 |
| Central | 0.0012 | 0.0001 | 6.9696 | 1.9726 |
| Selangor | 0.0011 | 0.0002 | 6.8713 | 2.7670 |
| Kuala Lumpur | 0.0013 | 0.0007 | 8.7477 | 6.2382 |
| Southern | 0.0012 | 0.0005 | 7.2765 | 4.6417 |
| Melaka | 0.0011 | 0.0012 | 8.4111 | 9.1293 |
| Negeri Sembilan | 0.0012 | 0.0012 | 9.1755 | 9.5235 |
| Johor | 0.0019 | 0.0004 | 9.6024 | 4.1032 |
| East Coast | 0.0049 | 0.0013 | 16.7767 | 9.5938 |
| Kelantan | 0.0122 | 0.0045 | 34.2394 | 20.5443 |
| Terengganu | 0.0059 | 0.0018 | 20.7892 | 12.4210 |
| Pahang | 0.0041 | 0.0009 | 16.6147 | 8.0436 |
| Borneo | 0.0029 | 0.0007 | 13.5538 | 6.0370 |
| Sabah | 0.0030 | 0.0009 | 14.1067 | 8.1220 |
| Sarawak | 0.0024 | 0.0009 | 11.4882 | 6.9310 |

A similar result is found for the prediction performance of the individual state models, as can be seen in Figure 6.7. However, for certain states, such as Perlis, Negeri Sembilan and Kelantan, neither the TSR nor the STS method predicts the number of road accidents well, especially at extreme points, even after the outliers and structural breaks are taken into account. Nevertheless, Table 6.5 shows that the STS model produces smaller errors than the TSR model for most states; the exceptions are Melaka and Negeri Sembilan, where the TSR errors are equal or marginally smaller.

6.4 Summary

In this chapter, the best-fitting univariate STS models of road accidents from Chapter 5 are refitted with the selected explanatory and intervention variables. The refitting process involves checking the residual diagnostics, examining the stochastic patterns and detecting outliers and structural breaks. The factors that influence road accidents are also investigated.

After the explanatory variables are added, some of the stochastic models reduce to deterministic models. The explanatory variables deseasonalize the series, which causes the independence assumption for the errors to fail at the seasonal lag. For some models, the normality assumption for the residuals is clearly violated because of outliers and structural breaks in the series. However, not all outliers and structural breaks are easily interpreted.

In general, outliers are related to holidays: a few days off may reduce the number of road accidents, while two festive seasons in the same month greatly increase it.

In terms of the significance of the explanatory variables, the results differ between regions and states not only in magnitude but also in the sign of the correlation. Further relevant variables, such as trading days, traffic volume and driver behaviour, should be included to improve the models. Meanwhile, the prediction performance shows that the STS model performs better than the TSR model. Yet neither the STS model nor the TSR model can capture the extreme points in the series, even after possible outliers and structural breaks are taken into account.


CHAPTER 7

CONCLUSION AND RECOMMENDATION

This chapter summarises the findings of the analyses in the previous chapters. Suggestions and recommendations for future research are also discussed.

7.1 Concluding Remarks

In the preliminary study, several analyses were carried out, including descriptive analysis, correlation analysis and preliminary road accident modeling. The analyses show that the number of road accidents varies between regions as well as states. On average, more road accidents are recorded in more developed regions, such as the central region, which includes Kuala Lumpur and Selangor, and fewer in the east coast region. Among individual states, the lowest number of road accidents is found in Perlis.

Time series plots of road accidents for all regions and states show an increasing trend from January 2001 to December 2013, although the pattern of the trend varies between regions and states. The series are non-stationary, especially in the mean. Strong seasonality is also found in the majority of the road accident series.


Correlation analysis shows that the number of road accidents is positively correlated with the amount of rainfall (RAINF), the number of rainy days (RAIND), the air pollution index (API), the crude oil price (OILP) and festive seasons (BLKG), and negatively correlated with temperature (TEMP) and the consumer price index for transportation (CPI). However, these correlations do not hold for every individual state, as the magnitude of the correlation coefficient varies between states.

Relating regional road accidents to seasonal dummy variables and selected explanatory variables in a TSR model gives a model that fits very well. However, the LB statistic, which measures the independence of the residuals, indicates a serious autocorrelation problem. Meanwhile, the SARIMA analysis finds that the majority of the regional and state road accident series are non-stationary. SARIMA modeling therefore requires the series to be differenced at the non-seasonal lag, the seasonal lag or both to achieve stationarity.
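The differencing step required by SARIMA can be sketched as follows. A seasonal difference at lag 12 removes a fixed monthly pattern, and a further first difference removes a remaining linear trend. The function name and the toy series are illustrative assumptions, not the road accident data.

```python
def difference(series, lag=1):
    """Return the lag-differenced series y_t - y_(t-lag)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy monthly series (hypothetical): linear trend plus a fixed 12-month pattern
toy = [t + (t % 12) for t in range(36)]

seasonal_diff = difference(toy, lag=12)           # removes the monthly pattern
stationary = difference(seasonal_diff, lag=1)     # removes what is left of the trend
```

Each differencing pass shortens the series by the lag used, here from 36 observations to 24 and then 23, which is one practical cost that the structural approach avoids by modeling the trend and seasonal components directly.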

To obtain a better model, the STS model is introduced. This model allows a non-stationary series to be modeled directly, without differencing. The STS model is also expected to yield road accident models free of serious error autocorrelation. In the first stage, the univariate structural time series model is presented, which involves specifying the best STS model for the number of road accidents in each region and state. At this stage, the in-sample prediction and forecasting ability of the model is compared with that of SARIMA and TSR. In the second stage, explanatory variables are incorporated and the presence of outliers and structural breaks is determined.


The first objective of the study is to observe the deterministic and stochastic behaviours or patterns of the road accident series at different locations. This objective is achieved in Chapter 5, where the first stage of the STS analysis is introduced. The analysis found that the seasonal pattern varies by region and state. Most of the series have a stochastic trend without a slope component, except for the southern region and the states of Perlis and Negeri Sembilan, which recorded a smoothly increasing trend. Meanwhile, the trends of the other regions and states show a more obviously wavy pattern.

Regarding the seasonal pattern of road accidents, it is found that the southern and east coast regions and the individual states of Kedah, Perak, Negeri Sembilan, Johor, Kelantan, Terengganu and Sabah exhibit a stochastic seasonal pattern, while the other regions and states have a deterministic seasonal pattern. The special features of the stochastic seasonal pattern for selected months are also presented.

The second objective of the study is to investigate and understand the factors that influence road accidents at different locations. This objective is achieved in Chapter 6, where explanatory variables are incorporated into the models. Incorporating the explanatory variables and accounting for possible outliers and structural breaks improved both the predictive ability of the models and their error structure.

The presence of explanatory variables in some road accident models is assumed to deseasonalize the series, which causes the assumption of independent errors to fail at the seasonal lag. On the other hand, some of the stochastic seasonal patterns reduced to deterministic seasonal patterns after the explanatory variables were added. In terms of the significance of the explanatory variables, the results differ between regions and states not only in magnitude but also in the sign of the correlation.

The third objective of the study, to compare the performance of STS with existing models, is achieved at the end of Chapter 5 and Chapter 6. In Chapter 5, the TSR and SARIMA models are compared with the STS model, which is estimated using the Kalman filter. In Chapter 6, where explanatory variables are incorporated, the TSR and STS models are compared. In both comparisons, STS outperforms the existing models.

7.2 Implications of the Study

Road accidents can result in large losses, both social and economic. Road safety studies are therefore very important as one of the initiatives to overcome the problem. The model proposed in this study can be used as an alternative tool for managing the occurrence of road accidents. It is expected to assist authorities such as the Royal Malaysian Police, the Road Transport Department and the Road Safety Department in establishing a basis for finding solutions to reduce road accidents.

The risk of road accidents varies between locations. This may result from differences in the nature and lifestyle of the states, including climate, economy and ways of thinking. For example, states in the east coast region are more likely than other states to experience heavy rainfall at the end of the year. Moreover, in terms of the economy, states in the central region show better economic growth and therefore tend to have more facilities that support road safety and shape the mindset of the people in the area. Therefore, in this study, the road accident models are developed on a state-by-state basis.

In addition, this study aims to help the public plan their travel schedules, especially during adverse weather and festive seasons. In this way, the rising number of road accidents can be curbed and, indirectly, economic development and the quality of life of society could be improved.

Time series analysis is always concerned with the pattern and behaviour of a series. Trend, seasonal and cycle are unobserved time series components that are used to represent the pattern of the series. In this study, the trend and seasonal components are modeled as random walks, which allows both stochastic and deterministic behaviour of the trend and seasonal patterns to be observed and interpreted. The estimated unobserved components are important in giving a clear indication of the future long-term movement of the series. Indirectly, the model may strengthen road safety modeling in the future.
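The random walk formulation of the trend and seasonal components described above can be written compactly. The following is a sketch in common textbook notation, which may differ from the notation of the earlier chapters; the dummy-variable form of the seasonal component, the seasonal period $s$ (12 for monthly data) and all symbols are assumptions made for illustration.

```latex
\begin{align}
y_t &= \mu_t + \gamma_t + \varepsilon_t,
  & \varepsilon_t &\sim N(0, \sigma^2_\varepsilon), \\
\mu_{t+1} &= \mu_t + \nu_t + \eta_t,
  & \eta_t &\sim N(0, \sigma^2_\eta), \\
\nu_{t+1} &= \nu_t + \zeta_t,
  & \zeta_t &\sim N(0, \sigma^2_\zeta), \\
\gamma_{t+1} &= -\sum_{j=1}^{s-1} \gamma_{t+1-j} + \omega_t,
  & \omega_t &\sim N(0, \sigma^2_\omega).
\end{align}
```

Here $\mu_t$ is the level, $\nu_t$ the slope and $\gamma_t$ the seasonal component. Setting a disturbance variance to zero makes the corresponding component deterministic: $\sigma^2_\eta = \sigma^2_\zeta = 0$ gives a fixed linear trend and $\sigma^2_\omega = 0$ gives a fixed seasonal pattern. Dropping $\nu_t$ gives the local level form, which corresponds to a stochastic trend without a slope component.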

Several existing studies use dummy variables coded “0” and “1” to represent season-related variables. In this study, however, a more appropriate way of incorporating festive seasons and safety operation enforcement is introduced. The approach better reflects the actual situation and is expected to improve time series modeling of road safety. This procedure therefore offers a new perspective in road safety modeling.


7.3 Suggestion for Future Research

There are several suggestions for future research. First, in terms of application, the STS approach may be employed in other areas of study, such as the environment, the economy and demography, because the method is able to model all time series components, namely trend, cycle and seasonal patterns. Furthermore, the application of STS becomes more interesting if a cycle component is allowed in the model. However, this demands a larger number of observations, which is often a limitation in research.

In addition, instead of allowing only the time series components to vary over time, the coefficients of the explanatory variables could also be allowed to vary over time. With this approach, the movement of the coefficients over time can be observed.
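The idea of time-varying coefficients can be sketched by placing the coefficient in the state vector and letting it follow a random walk. The example below is illustrative only and was not part of the analysis in this study: the function name `tvp_filter`, the single regressor, the known variances and the simulated data are all assumptions.

```python
import numpy as np


def tvp_filter(y, x, var_eps, var_level, var_beta, p0=1e6):
    """Kalman filter for y[t] = mu[t] + beta[t] * x[t] + eps[t],
    where mu and beta each follow a random walk.

    Returns an (n, 2) array of filtered [mu, beta] estimates.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    a = np.zeros(2)                     # state mean [mu, beta]
    p = np.eye(2) * p0                  # large variance approximates a diffuse prior
    q = np.diag([var_level, var_beta])  # state disturbance variances
    out = np.empty((y.size, 2))
    for t in range(y.size):
        z = np.array([1.0, x[t]])       # time-varying observation vector
        v = y[t] - z @ a                # prediction error
        f = z @ p @ z + var_eps         # prediction error variance
        k = p @ z / f                   # Kalman gain
        a = a + k * v                   # update state mean
        p = p - np.outer(k, z @ p)      # update state covariance
        out[t] = a
        p = p + q                       # random walk prediction step
    return out


# Simulated data in which the coefficient shifts from 1 to 3 halfway through.
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
beta_true = np.where(np.arange(n) < 100, 1.0, 3.0)
y = 2.0 + beta_true * x + rng.normal(scale=0.1, size=n)
states = tvp_filter(y, x, var_eps=0.01, var_level=0.0, var_beta=0.01)
```

Plotting `states[:, 1]` against time would show the filtered coefficient moving from its earlier value to its later one. Setting `var_beta` to zero recovers a fixed coefficient.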

Finally, as observed in this study, even though the STS method gave good predictions, it is still unable to capture the extreme values in the series. It is suggested that STS be employed together with another modeling approach that can capture extreme values.


REFERENCES

Abdul Manan, M. M., Jonsson, T., & Várhelyi, A. (2013). Development of a safety performance function for motorcycle accident fatalities on Malaysian primary roads. Safety Science, 60, 13–20. https://doi.org/10.1016/j.ssci.2013.06.005

Abdul Manan, M. M., & Várhelyi, A. (2012). Motorcycle fatalities in Malaysia. IATSS Research, 36(1), 30–39. https://doi.org/10.1016/j.iatssr.2012.02.005

Abusini, S. (2013). The effect of road characteristics on motorcycle accident in Batu East Java Indonesia. In International Conference on Mathematical Sciences and Statistics 2013 (ICSMS 2013) (Vol. 241, pp. 241–246). https://doi.org/10.1063/1.4823912

Aderamo, A. J. (2012a). Assessing the Trends in Road Traffic Accident Casualties on Nigerian Roads. Journal of Social Sciences, 31(3), 19–25.

Aderamo, A. J. (2012b). Spatial pattern of road traffic accident casualties in Nigeria. Mediterranean Journal of Social Sciences, 3 (May), 61–72. https://doi.org/10.5901/mjss.2012.v3n2.61

Aderamo, A. J., & Olatujoye, S. (2013). Trends motorcycle accidents in Lokoja, Nigeria. European Journal of Science and Technology, 2(6), 251–261.

Ali, G. A., & Bakheit, C. S. (2011). Comparative analysis and prediction of traffic accidents in Sudan using artificial neural netwrok and statistical method. In Proceedings of the 30th Southern African Transport Conference (SATC 2011). 202–214.

Allen, K. (1999). A structural time series approach to the reconstruction of Tasmanian maximum temperatures. Environmental Modelling & Software, 14, 261–274.

Amin, M. S. R., Zareie, A., & Amador-Jiménez, L. E. (2014). Climate change modeling and the weather-related road accidents in Canada. Transportation Research Part D: Transport and Environment, 32(2014), 171–183. https://doi.org/10.1016/j.trd.2014.07.012

Andreassen, D. (1985). Linking deaths with vehicles and population. Traffic Engineering & Control, 26(11), 547–549.

230

Andrews, R. L. (1994). Forecasting performance of structural time series models. Journal of Business & Economic Statistics, 12(1), 129–133. https://doi.org/10.1080/07350015.1994.10509996

Antoniou, C., Papadimitriou, E., & Yannis, G. (2014). Road safety forecasts in five European countries using structural time series models. Traffic Injury Prevention, 15(6), 598–605. https://doi.org/10.1080/15389588.2013.854884

Antoniou, C., & Yannis, G. (2013). State-space based analysis and forecasting of macroscopic road safety trends in Greece. Accident Analysis and Prevention, 60, 268–276. https://doi.org/10.1016/j.aap.2013.02.039

Athanasopoulos, G., Hyndman, R. J., Song, H., & Wu, D. C. (2011). The tourism forecasting competition. International Journal of Forecasting, 27, 822–844.

Bergel-Hayat, R., Debbarh, M., Antoniou, C., & Yannis, G. (2013). Explaining the road accident risk: Weather effects. Accident Analysis & Prevention, 60, 456– 465. https://doi.org/10.1016/j.aap.2013.03.006

Bergel-Hayat, R., & Zukowska, J. (2014). Structural time series modelling of the number of road fatalities in Poland in relation with economic factors. In TRA 5th conference, Paris, 14–17 April 2014.

Box, G. E. P., & Jenkins, J. M. (1976). Time series analysis: Forecasting and control. Holden-Day, San Francisco, CA.

Brien, R. M. O. (2007). A Caution regarding rules of thumb for variance inflation factors. Quality &Quantity, 41, 673–690. https://doi.org/10.1007/s11135-006- 9018-6

Brijs, T., Karlis, D., & Wets, G. (2008). Studying the effect of weather conditions on daily crash counts using a discrete time-series model. Accident; Analysis and Prevention, 40(3), 1180–90. https://doi.org/10.1016/j.aap.2008.01.001

Broadstock, D. C., & Collins, A. (2010). Measuring unobserved prices using the structural time-series model: The case of cycling. Transportation Research Part A: Policy and Practice, 44(4), 195–200. https://doi.org/10.1016/j.tra.2010.01.002

Bun, E. (2012). Road traffic accidents in Nigeria : A public helath problem. Short Communication, 3(2), 1–3.

231

Caliendo, C., Guida, M., & Parisi, A. (2007). A crash-prediction model for multilane roads. Accident Analysis & Prevention, 39(4), 657–670. https://doi.org/10.1016/J.AAP.2006.10.012

Chang, L.-Y., & Chen, W.-C. (2005). Data mining of tree-based models to analyze freeway accident frequency. Journal of Safety Research, 36(4), 365–375. https://doi.org/10.1016/J.JSR.2005.06.013

Chang, Y. S. (2014). Comparative analysis of long-term road fatality targets for individual states in the US—An application of experience curve models. Transport Policy, 36, 53–69. https://doi.org/10.1016/j.tranpol.2014.07.005

Commandeur, J. J. F., Koopman, S., & Koo. (2007). An introduction to state space time series analysis. United State: Oxford University Press.

Commandeur, J. J. F., Wesemann, P., Bijleveld, F., Chhoun, V., & Sann, S. (2017). Setting Road Safety Targets in Cambodia: A Methodological Demonstration Using the Latent Risk Time Series Model. Journal of Advanced Transportation, 2017(January), 1–9. https://doi.org/10.1155/2017/5798174

Commandeur, J., & Koopman, S. (2007). An introduction to state space time series analysis. United State: Oxford University Press.

Cowpertwait, P. S. P., & Metcalfe, A. V. (2009). Time series data. In Introductory time series with R. New York, NY: Springer New York. https://doi.org/10.1007/978-0-387-88698-5_1

Desai, M. M., & Patel, A. . (2011). Road accidents study based on regression model : A case study of Ahmedabad City, National Conference on Recent Trends in Engineering & Technology (May), 1–8.

Dilaver, Z., & Hunt, L. C. (2011). Industrial electricity demand for Turkey: A structural time series analysis. Energy Economics, 33(3), 426–436. https://doi.org/10.1016/j.eneco.2010.10.001

Dordonnat, V., Koopman, S. J., & Ooms, M. (2012). Dynamic factors in periodic time-varying regressions with an application to hourly electricity load modelling. Computational Statistics & Data Analysis, 56(11), 3134–3152. https://doi.org/10.1016/j.csda.2011.04.002

Dordonnat, V., Koopman, S. J., Ooms, M., Dessertaine, A., & Collet, J. (2008). An hourly periodic state space model for modelling French national electricity load. International Journal of Forecasting, 24(4), 566–587.

232

https://doi.org/10.1016/j.ijforecast.2008.08.010

Durbin, J., & Koopman, S. J. (2012). Time Series Analysis by State Space Methods: Second Edition. OUP Oxford. Retrieved from https://books.google.com/books?id=lGyshsfkLrIC&pgis=1

Durbin, J., & Koopman, S. J. S. . (2012). Time Series Analysis by State Space Method (2nd ed.). London: Oxford University Press.

Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression: I. Biometrika, 37, 409–428. https://doi.org/10.2307/2332391

Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression. II. Biometrika, 38, 159–178. https://doi.org/10.2307/2332325

Durbin, J., & Watson, G. S. (1971). Testing for serial correlation in least squares regression: III. Biometrika, 58, 1–19. https://doi.org/10.2307/2332391

Eisenberg, D. (2004). The mixed effects of precipitation on traffic crashes. Accident Analysis & Prevention, 36(4), 637–647. https://doi.org/10.1016/S0001- 4575(03)00085-X

Freeman, S. N., & Kirkwood, G. . (1995). On a structural time series method for estimating stock biomass and recruitment from catch and effort data. Fisheries Research, 22, 77–98.

Fridstrøm, L., Ifver, J., Ingebrigtsen, S., Kulmala, R., & Thomsen, L. K. (1995). Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accident Analysis & Prevention, 27(1), 1– 20. https://doi.org/10.1016/0001-4575(94)E0023-E

Friend, D. (1976). How to study statistics. Education + Training, 18(8), 230–236. https://doi.org/10.1108/eb016420

Greibe, P. (2003). Accident prediction models for urban roads. Accident Analysis and Prevention, 35(May 2001), 273–285.

Gujarati, D. N. (2003). Basic econometrics. McGraw Hill.

Hakim, S. (1991). A critical review of macro models for Road Accidents, 23(5), 379–400.

233

Hannonen, M. (2005). An analysis of land prices : A structural time series approach. International Journal of Strategic Property Management, 9(3), 145–172. https://doi.org/10.1080/1648715X.2005.9637534

Harnen, S., Wong, S. V, & Hashim, W. I. W. (2006). Motorcycle accident prediction model for junctions on urban roads in Malaysia. Advances in Transportation Studies an International Journal, 8, 31–40.

Harrison, P. ., & Stevens, C. . (1976). Bayesian forecasting. Journal of the Royal Statistical Society. , (Series B). Retrieved from https://www.jstor.org/stable/2984970?seq=1#page_scan_tab_contents

Harrison, P. J., & Stevens, C. F. (1971). A bayesian approach to short-term forecasting. Journal of the Operational Research Society, 22(4), 341–362. https://doi.org/10.1057/jors.1971.78

Harvey, A. (2006a). Chapter 7 Forecasting with Unobserved Components Time Series Models. Handbook of Economic Forecasting, 1, 327–412. https://doi.org/10.1016/S1574-0706(05)01007-4

Harvey, A. (2006b). Trend Analysis. In Encyclopedia of Environmetrics. Chichester, UK: John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470057339.vat025

Harvey, A. . (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press. Retrieved from https://www.amazon.com/Forecasting-Structural-Models-Kalman- Filter/dp/0521405734

Harvey, A. C., & Durbin, J. (1986). The effects of seat belt legislation on British road casualties : A case study in structural time series modelling. Journal of Royal Statistical Society, 149(3), 187–227.

Harvey, A. C., & Shephard, N. (1993). Structural time series models, Handbook of Statistics (11), 261–302.

Hermans, E., Wets, G., & Van den Bossche, F. (2006). The frequency and severity of road traffic accidents investigated on the basis of state space methods. Journal of Transport Statistics, 9(1), 63–76.

Houston, D. J. (2007). Are helmet laws protecting young motorcyclists? Journal of Safety Research, 38(3), 329–336. https://doi.org/10.1016/j.jsr.2007.05.002

234

Jalles, J. T. (2009). Structural time series models and the Kalman filter: A concise review.

Janes, J. (1995). Descriptive statistics: where they sit and how they fall. Library Hi Tech Library Review Management Research Review, 17(3), 402–409. Retrieved from http://dx.doi.org/10.1108/07378839910303063

Kalman, R. . (1960). A New Approach to linear filtering and prediction problems. Journal of Basic Engineering, 82.

Karlis, D., & Hermans, E. (2012). Time series models for road safety accident prediction, 27.

Keay, K., & Simmonds, I. (2006). Road accidents and rainfall in a large Australian city. Accident Analysis and Prevention, 38 (January 2005), 445–454. https://doi.org/10.1016/j.aap.2005.06.025

Kim, J. H., Wong, K., Athanasopoulos, G., & Liu, S. (2011). Beyond point forecasting: Evaluation of alternative prediction intervals for tourist arrivals. International Journal of Forecasting, 27(3), 887–901. https://doi.org/10.1016/j.ijforecast.2010.02.014

Kitagawa, G., & Gersch, W. (1996). Smoothness Priors Analysis of Time Series (Vol. 116). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4612- 0761-0

Knape, J., Jonz, N., Sk, M., & Sokolov, L. (2009). Modeling demographic processes in marked populations. Environmental and Ecological Statistics (3) 59-79. https://doi.org/10.1007/978-0-387-78151-8

Koopman, S. J., Harvey, A. ., Doornik, J. A., & Shephard, N. (2006). Structural Time Series Analyser Modeller and Predictor STAMP 7. London: Timberlake Consultants Ltd.

Krieg, S., & van den Brakel, J. a. (2012). Estimation of the monthly unemployment rate for six domains through structural time series modelling with cointegrated trends. Computational Statistics & Data Analysis, 56(10), 2918–2933. https://doi.org/10.1016/j.csda.2012.02.008

Lassarre, S. (2001). Analysis of progress in road safety in ten European countries. Accident Analysis and Prevention, 33(6), 743–751. https://doi.org/10.1016/S0001-4575(00)00088-9

235

Law, T. H., Noland, R. B., & Evans, A. W. (2008). Model factors associated with the relationship between motorcycle deaths and economic growth. Accident Analysis and Prevention, xxx. https://doi.org/10.1016/j.aap.2008.11.005

Law, T. H., Umar, R. S. R., Zulkaurnain, S., & Kulanthayan, S. (2005). Impact of the effect of economic crisis and the targeted motorcycle safety programme on motorcycle-related accidents, injuries and fatalities in Malaysia. International Journal of Injury Control and Safety Promotion, 12(1), 9–21. https://doi.org/10.1080/17457300512331339166

Lawson, A. R., Ghosh, B., & Broderick, B. (2011). Prediction of traffic-related nitrogen oxides concentrations using Structural Time-Series models. Atmospheric Environment, 45(27), 4719–4727. https://doi.org/10.1016/j.atmosenv.2011.04.053

Lazim, M. A. (2005). Introductory Business Forecasting a Practical Approach. Shah Alam: Pusat Penerbitan Universiti(UPENA) UiTM.

Lipfert, F. W., & Murray, C. J. (2012). Air pollution and daily mortality : A new approach to an old problem. Atmospheric Environment, 55, 467–474. https://doi.org/10.1016/j.atmosenv.2012.03.013

Majed, M. J. (2012). Apabila musim cuti sekolah, kemalangan jalan raya pun meningkat. Utusan Malaysia.

Mann, P. S. (2004). Introductory Statistics: Using Technology (FIFTH EDIT). Wiley. Retrieved from http://www.amazon.com/Introductory-Statistics- Technology-Prem-Mann/dp/0471473243

Mehmood, A., Moin, A., Khan, I. Q., Mohammad Umer, M., & Rashid, J. (2015). Vulnerable road users are at greater risk during Ramadan — results from road traffic surveillance data. Journal of the Pakistan Medical Association, 65(3), 287–291.

Moges, H., & Woldeyohannes, S. (2014). Trends and projections of vehicle crash related fatalities and injuries in Northwest Gondar, Ethiopia: A time series analysis. International Journal of Environmental Health Engineering, 3(1), 30. https://doi.org/10.4103/2277-9183.139752

Mohammadian, M., Packzad, R., Salehiniya, H., Khazaie, S., Nematollahi, S., Pishkuhi, M. A., & Hafshejani, A. M. (2015). Seasonal Pattern in Occurrence and In-hospital fatality rate from Traffic Accidents in Isfahan, Iran. International Journal of Epidemiologic Research, 2(3), 126–133.

236

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2001). Introduction to Linear Regression Analysis (Third Edit). Canada: John Wiley & Sons.

Moosa, I. A. (1999). Cyclical output, cyclical unemployment, and Okun’s coefficient A structural time series approach. International Review of Economics & Finance, 8(3), 293–304. https://doi.org/10.1016/S1059-0560(99)00028-3

Moosa, I. A. (2000). A structural time series test of the monetary model of exchange rates under the German hyperinflation. Journal of International Financial Markets, Institutions and Money, 10(2), 213–223. https://doi.org/10.1016/S1042-4431(99)00033-5

Mukaka, M. M. (2012). Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Medical Journal : The Journal of Medical Association of Malawi, 24(3), 69–71. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/23638278

Muscatelli, V. A., & Tirelli, P. (2001). Unemployment and growth: some empirical evidence from structural time series models. Applied Economics, 33, 1083– 1088. https://doi.org/10.1080/00036840010003276

Muth, J. F. (1960). Optimal properties of exponentially weighted forecasts. Journal of the American Statistical Association, 55(290), 299–306.

Nanga, S. (2016). Time Series Analysis of Road Accidents in Ghana. Dama International Journal of Researchers , 878(6), 2343–6743. Retrieved from http://www.damaacademia.com/issue/volume1/issue6/DIJR-JU-007.pdf

Nasaruddin, N., Wah, Y. B., & Voon, W. S. (2012). Fatality prediction model for motorcycle accidents in Malaysia. In Statistics in Science, Business, and Engineering (ICSSBE), 2012 International Conference (pp. 1–6).

Nasr, M. (2009). Methods of Forecasting Deaths due to Road Accidents in Pakistan. Comsats University of Technology in Islamabad, Pakistan

Nerlove, M., & Wage, S. (1964). On the optimality of adaptive forecasting. Management Science, 10(2), 207–224. https://doi.org/10.1287/mnsc.10.2.207

Noland, R. B. (2003). Traffic fatalities and injuries: the effect of changes in infrastructure and other trends. Accident; Analysis and Prevention, 35(4), 599– 611. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12729823

237

Ofori, T., Ackah, B., & Ephraim, L. (2012). Statistical models for forecasting road accident injuries in Ghana . International Journal of Research in Environmental Sciences and Technology, 2, 143–149.

Peltzman, S. (1975). The effects of automobile safety regulation. Journal of Political Economy, 83(4), 677–726.

Ponnaluri, R. V. (2012). Modeling road traffic fatalities in India: Smeed’s law, time invariance and regional specificity. IATSS Research, 36, 75–82.

Preez, J., & Witt, S. F. (2003). Univariate versus multivariate time series forecasting : an application to international tourism demand, 19, 435–451. https://doi.org/10.1016/S0169-2070(02)00057-2

Quddus, M. A. (2008). Time series count data models: An empirical application to traffic accidents. Accident Analysis & Prevention, 40(5), 1732–1741. https://doi.org/10.1016/j.aap.2008.06.011

Radin Umar, R., Murray, G. M., & Brian, L. H. (1996). Modelling of conspicuity- related motorcycle accidents in Seremban and Shah Alam, Malaysia. Accident Analysis and Prevention, 28(3), 325–332.

Radin Umar, R. S. (1998). Model kematian jalan raya di Malaysia: Unjuran tahun 2000. Pertanika J. Sci. & Technol, 108(62), 107–119. Retrieved from http://psasir.upm.edu.my/3430/1/Model_Kernatian_Jalan_Raya_di_Malaysia.pd f

Razzaghi, A., Bahrampour, A., Baneshi, M. R., & Zolala, F. (2013). Assessment of trend and seasonality in road accident data : An Iranian case study. International Journal of Health Policy and Management, 1(1), 51–55.

Rosenberg, B. (1973). Random coefficients models: The analysis of a cross section of time series by stochastically convergent parameter regression. Annals of Economic and Social Measurement, 24.

Salako, R. J., Adegoke, B. O., & Akanmu, T. A. (2014). Time series analysis for modeling and detecting seasonality pattern of auto-crash cases recorded at federal road safety commission, Osun Sector command (RS111), Osogbo. International Journal of Engineering and Advanced Technology Studies, 2(4), 25–34.

Sarani, R., Hashim, H. H., Wan Yaacob, W. F., Mohamed, N., & Radin Sohadi, R. U. (2013). The effect of rear seatbelt advocacy and law enforcement in reducing

238

injuries among passenger vehicle. International Journal of Public Health Research, 3(1), 267–275.

Sarani, R., Sharifah Allyana, S. M. ., Jamilah, M. ., & Wong, S. . (2012). Predicting Malaysian road fatalities for year 2020. Malaysian Institute of Road Safety Research (Vol. MRR 06/201). Kuala Lumpur.

Schweppe, F. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE Transactions on Information Theory, 11(1), 61–70. https://doi.org/10.1109/TIT.1965.1053737

Scott, P. (1983). Variations in two-vehicle accident frequencies, 1970-1978. Crowthorne, Berkshire. Retrieved from http://trove.nla.gov.au/work/21890540?selectedversion=NBD3084871

Scott, P. P. (1986). Modelling time series of British road accidents data. Accident Analysis and Prevention, 18(2), 109–117.

Scuffham, P. A., & Langley, J. D. (2002). A model of traffic crashes in New Zealand. Accident Analysis & Prevention, 34(5), 673–687. https://doi.org/10.1016/S0001-4575(01)00067-7

Shahid, S., Minhans, A., Che Puan, O., Hasan, S. A., & Ismail, T. (2015). Spatial and temporal pattern of road accidents and casualties in Peninsular Malaysia. Jurnal Teknologi, 76(14), 57–65. https://doi.org/10.11113/jt.v76.5843

Shankar, V., Mannering, F., & Barfield, W. (1995). Effect of roadway geometrics and environmental factors on rural freeway accident frequencies. Accident Analysis & Prevention, 27(3), 371–389. https://doi.org/10.1016/0001- 4575(94)00078-Z

Shepherd, B. (2006). Estimating price elasticities of supply for cotton: A structural time-series approach. Paris.

Shuja, N., Lazim, M. A., & Wah, Y. B. (2007). Moving Holiday Effects Adjustment for Malaysian Economic Time Series. Department of Statistics.

Singh, D. P., Thakur, A. K., & Ram, D. S. (2014). Application of structural time series model for forecasting Gram production in India. American International Journal of Research in Science Technology, Engineering & Mathematics AIJRSTEM, 14(133), 2328–3491. Retrieved from http://www.iasir.net

239

Smeed, R. J. (1949). Some statistical aspects of road safety research. Journal of the Royal Statistical Society. Series A (General), 112(1), 1–34. Retrieved from http://www.jstor.org/stable/2984177

Song, H., Li, G., Witt, S. F., & Athanasopoulos, G. (2011). Forecasting tourist arrivals using time-varying parameter structural time series models. International Journal of Forecasting, 27(3), 855–869. https://doi.org/10.1016/j.ijforecast.2010.06.001

Stipdonk, H. . (2008). Time series applications on road safety developments in Europe. Deliverable D7.10 of the EU FP6 Project SafetyNet. Contract, (May 2004), 1–154.

SWOV. (2012). SWOV Fact Sheet- The Influence of Weather on Road Safety. Leidschendam, Netherlands. Retrieved from https://www.swov.nl/rapport/Factsheets/UK/FS_Influence_of_weather.pdf

Theil, H., & Wage, S. (1964). Some observation on adaptive forecasting. Manag. Sci, 10, 198–206.

Thury, G., & Witt, S. F. (1998). Forecasting industrial production using structural time series models. Omega, 26(6), 751–767. https://doi.org/10.1016/S0305- 0483(98)00024-3

Usman, T., Fu, L., & Miranda-moreno, L. F. (2010). Quantifying safety benefit of winter road maintenance : Accident frequency modeling. Accident Analysis and Prevention, 42(6), 1878–1887. https://doi.org/10.1016/j.aap.2010.05.008

Valli, P. P. (2005). Road accident models for large metropolitan cities of India. IATSS Research, 29(1), 57–65.

Wan Yaacob, W. F., Lazim, M. A., & Bee Wah, Y. (2010). Evaluating spatial and temporal effects of accidents likelihood using random effects panel count model. In International Conference on Science and Social Research (CSSR 2010) (pp. 960–964).

Wan Yaacob, W. F., Lazim, M. A., & Bee Wah, Y. (2011a). Applying fixed effect pnel count model to examine road accident occurence. Journal of Applied Sciences, 11(7), 1185–1191.

Wan Yaacob, W. F., Lazim, M. A., & Bee Wah, Y. (2012). Modeling road accidents using fixed effects model: Conditional versus unconditional model. In Proceeding of the World Congress on Engineering (Vol. I).

240

Wan Yaacob, W. F., Wan Husin, W. Z., Abd Aziz, N., & Nordin, N. I. (2011b). An intervention model of road accidents: The case of OPS Sikap. Journal of Applied Sciences, 11(7), 1105–1112.

Wang, G., & Getz, L. L. (2007). State-space models for stochastic and seasonal fluctuations of vole and shrew populations in east-central Illinois. Ecological Modelling, 207(2–4), 189–196. https://doi.org/10.1016/j.ecolmodel.2007.04.026

Welch, G., & Bishop, G. (2006). An Introduction to the Kalman Filter, UNC-Chapel Hill, TR 95-041, 1–16.

West, M., & Harrison, P. J. (1986). Monitoring and Adaptation in Bayesian Forecasting Models. Journal of the American Statistical Association, 81(395).

Wooldridge, J. M. (2006). Introductory Econometrics : A Modern Approach (6 Edition). Boston, USA: Cengage Learning.

Young, P. C. (1984). Recursive estimation and time-series analysis : An introduction. Springer-Verlag.

Zheng, X., & Basher, R. E. (1999). Structural time series models and trend detection in global and regional temperature series. American Meteorology Society, 2347– 2358.

Zlatoper, T. J. (1984). Regression analysis of time series data on motor vehicle death in the United States. Journal of Transport Economics and Policy, 18(3), 263– 273.

241

APPENDICES

APPENDIX 1 (1)

Table A: Unit root test for region

                 Northern              Central               Southern
Variables  Lag   Level      1st diff   Level      1st diff   Level      1st diff
LRA          6   -2.502     -37.37**   -1.958     -29.454**  -1.115     -33.150**
            12   -2.911**   -47.64**   -2.103     -42.567**  -1.219     -46.340**
RAINF        6   -9.019**   -22.44**   -10.317**  -26.838**  -9.523**   -23.158**
            12   -9.272**   -32.76**   -10.556**  -37.581**  -9.419**   -34.798**
RAIND        6   -8.375**   -20.26**   -10.364**  -27.470**  -8.655**   -21.013**
            12   -9.313**   -31.93**   -10.524**  -37.722**  -8.537**   -29.466**
TEMP         6   -5.239**   -13.01**   -5.749**   -16.251**  -6.700**   -16.754**
            12   -4.212**   -19.21**   -5.531**   -23.530**  -6.397**   -24.745**
API          6   -4.323**   -18.01**   -5.369**   -27.260**  -8.824**   -30.873**
            12   -4.677**   -23.39**   -6.247**   -33.918**  -9.969**   -40.075**
OILP         6   -1.590     -7.858**   -1.590     -7.858**   -1.590     -7.858**
            12   -1.300     -7.400**   -1.300     -7.400**   -1.300     -7.400**
CPI          6   -3.227**   -10.605**  -3.227**   -10.605**  -3.227**   -10.605**
            12   -2.979**   -11.022**  -2.979**   -11.022**  -2.979**   -11.022**
BLKG         6   -13.827**  -32.159**  -13.827**  -32.159**  -13.827**  -32.159**
            12   -20.929**  -55.663**  -20.929**  -55.663**  -20.929**  -55.663**
SAFE         6   -12.731**  -33.574**  -12.731**  -33.574**  -12.731**  -33.574**
            12   -14.393**  -53.673**  -14.393**  -53.673**  -14.393**  -53.673**

                 East Coast            Borneo
Variables  Lag   Level      1st diff   Level      1st diff
LRA          6   -3.630**   -35.456**  -1.955     -37.085**
            12   -4.266**   -47.349**  -2.203     -44.609**
RAINF        6   -8.527**   -22.133**  -7.227**   -18.446**
            12   -8.664**   -34.842**  -6.731**   -26.787**
RAIND        6   -7.448**   -19.260**  -7.247**   -19.599**
            12   -7.392**   -33.932**  -6.836**   -28.606**
TEMP         6   -4.810**   -9.258**   -5.592**   -12.951**
            12   -3.101**   -11.903**  -4.626**   -18.739**
API          6   -4.462**   -23.396**  -9.501**   -37.468**
            12   -5.140**   -30.836**  -10.729**  -46.916**
OILP         6   -1.590     -7.858**   -1.590     -7.858**
            12   -1.300     -7.400**   -1.300     -7.400**
CPI          6   -3.227**   -10.605**  -2.794*    -10.794**
            12   -2.979**   -11.022**  -2.597*    11.185**
BLKG         6   -13.827**  -32.159**  -13.827**  -32.159**
            12   -20.929**  -55.663**  -20.929**  -55.663**
SAFE         6   -12.731**  -33.574**  -12.731**  -33.574**
            12   -14.393**  -53.673**  -14.393**  -53.673**


APPENDIX 1 (2)

Table B: Unit root test for individual states

Variables  Lag   Penang                  Kedah                   Perlis
                 Level       1st diff    Level       1st diff    Level       1st diff
LRA        6     -2.473      -37.139**   -2.694*     -31.059**   -3.365**    -28.984**
           12    -2.871*     -43.252**   -3.112**    -40.901**   -3.978**    -33.049**
RAINF      6     -9.017**    -22.513**   -7.505**    -17.778**   -8.836**    -22.350**
           12    -9.253**    -32.260**   -7.710**    -29.376**   -8.937**    -30.355**
RAIND      6     -8.695**    -20.215**   -6.452**    -15.098**   -7.294**    -16.928**
           12    -9.767**    -29.419**   -6.409**    -24.853**   -7.807**    -26.819**
TEMP       6     -5.355**    -14.017**   -5.010**    -12.041**   -4.758**    -11.628**
           12    -4.761**    -20.726**   -3.851**    -17.613**   -3.900**    -16.771**
API        6     -4.749**    -23.984**   -4.819**    -15.597**   -6.282**    -17.544**
           12    -5.431**    -28.795**   -5.015**    -21.187**   -6.306**    -22.303**
OILP       6     -1.590      -7.858**    -1.590      -7.858**    -2.381      -6.786**
           12    -1.300      -7.400**    -1.300      -7.400**    -2.075      -6.357**
CPI        6     -3.227**    -10.605**   -3.227**    -10.605**   -2.869*     -9.219**
           12    -2.979**    -11.022**   -2.979**    -11.022**   -2.644*     -9.661**
BLKG       6     -13.827**   -32.159**   -13.827**   -32.159**   -13.233**   -29.965**
           12    -20.929**   -55.663**   -20.929**   -55.663**   -20.054**   -48.597**
SAFE       6     -12.731**   -33.574**   -12.731**   -33.574**   -11.682**   -28.926**
           12    -14.393**   -53.673**   -14.393**   -53.673**   -12.830**   -43.342**

Variables  Lag   Perak                   Selangor                Kuala Lumpur
                 Level       1st diff    Level       1st diff    Level       1st diff
LRA        6     -4.249**    -36.005**   -2.451      -22.093**   -2.779*     -33.385**
           12    -5.051**    -46.286**   -2.691*     -32.731**   -3.196**    -46.313**
RAINF      6     -9.636**    -26.726**   -8.232**    -21.434**   -11.790**   -30.472**
           12    -9.585**    -38.224**   -8.186**    31.510**    -12.114**   -41.437**
RAIND      6     -9.096**    -23.215**   -8.198**    20.963**    -11.053**   -29.912**
           12    -9.264**    -35.886**   -8.094**    31.395**    -11.337**   -39.698**
TEMP       6     -5.849**    -14.520**   -4.953**    -13.884**   -6.143**    -17.698**
           12    -5.094**    -20.753**   -4.679**    -19.640**   -6.431**    -58.495**
API        6     -5.446**    -23.737**   -6.737**    -24.740**   -4.684**    -22.507**
           12    -6.138**    -32.251**   -7.535**    -30.498**   -5.231**    -26.758**
OILP       6     -1.590      -2.381**    -2.381      -6.786**    -1.590      -7.858**
           12    -1.300      -2.075**    -2.075      -6.357**    -1.300      -7.400**
CPI        6     -3.227**    -10.605**   -2.869*     -9.219**    -3.227**    -10.605**
           12    -2.979**    -11.022**   -2.644*     -9.661**    -2.979**    -11.022**
BLKG       6     -13.827**   -32.159**   -13.233**   -29.965**   -13.827**   -32.159**
           12    -20.929**   -55.663**   -20.054**   -48.597**   -20.929**   -55.663**
SAFE       6     -12.731**   -33.574**   -11.682**   -28.926**   -12.731**   -33.574**
           12    -14.393**   -53.673**   -12.830**   -43.342**   -14.393**   -53.673**


APPENDIX 1 (3)

Table B: Continued

Variables  Lag   Johor                   Melaka                  Negeri Sembilan
                 Level       1st diff    Level       1st diff    Level       1st diff
LRA        6     -1.294      -31.531**   -1.455      -31.897**   -1.966      -36.803**
           12    -1.392      -45.935**   -1.698      -34.678**   -2.344      -47.339**
RAINF      6     -9.503**    -23.034**   -11.252**   -29.653**   -10.116**   -23.964**
           12    -9.387**    -35.063**   -11.300**   -43.500**   -10.132**   -30.705**
RAIND      6     -9.904**    -25.647**   -8.650**    -20.803**   -9.539**    -23.523**
           12    10.201**    -38.913**   -8.459**    -29.914**   -9.502**    -32.014**
TEMP       6     -6.276**    -14.462**   -6.471**    -16.891**   5.861**     -15.681**
           12    -5.658**    -20.881**   -6.348**    -25.666**   -5.498**    -23.762**
API        6     -10.613**   -34.073**   -8.846**    -27.585**   -5.327**    -24.948**
           12    -11.240**   -45.180**   -9.690**    -35.963**   -6.186**    -30.226**
OILP       6     -1.590      -7.858**    -1.590      -7.858**    -1.590      -7.858**
           12    -1.300      -7.400**    -1.300      -7.400**    -1.300      -7.400**
CPI        6     -3.227**    -10.605**   -3.227**    -10.605**   -3.227**    -10.605**
           12    -2.979**    -11.022**   -2.979**    -11.022**   -2.979**    -11.022**
BLKG       6     -13.827**   -32.159**   -13.827**   -32.159**   -13.827**   -32.159**
           12    -20.929**   -55.663**   -20.929**   -55.663**   -20.929**   -55.663**
SAFE       6     -12.731**   -33.574**   -12.731**   -33.574**   -12.731**   -33.574**
           12    -14.393**   -53.673**   -14.393**   -53.673**   -14.393**   -53.673**

Variables  Lag   Kelantan                Terengganu              Pahang
                 Level       1st diff    Level       1st diff    Level       1st diff
LRA        6     -6.5840**   -35.102**   -3.975**    -35.067**   -2.786*     -36.098**
           12    -7.7800**   -48.185**   -4.715**    -47.916**   -3.247**    -44.286**
RAINF      6     -8.799**    -23.04**    -9.239**    -23.103**   -9.679**    -25.966**
           12    -8.896**    -42.42**    -9.578**    -37.394**   -9.558**    -36.571**
RAIND      6     -7.722**    -20.31**    -7.087**    -17.813**   -8.573**    -22.225**
           12    -7.955**    -36.01**    -6.793**    -30.947**   -8.809**    -35.093**
TEMP       6     -4.914**    -10.30**    -4.675**    -10.403**   -5.580**    -12.259**
           12    -3.262**    -14.42**    -3.222**    -13.624**   -4.470**    -17.373**
API        6     -5.560**    -24.61**    -4.997**    -20.834**   -5.364**    -25.450**
           12    -6.353**    -33.94**    -5.746**    -26.574**   -6.220**    -34.080**
OILP       6     -1.590      -7.858**    -1.590      -7.858**    -1.590      -7.858**
           12    -1.300      -7.400**    -1.300      -7.400**    -1.300      -7.400**
CPI        6     -3.227**    -10.605**   -3.227**    -10.605**   -3.227**    -10.605**
           12    -2.979**    -11.022**   -2.979**    -11.022**   -2.979**    -11.022**
BLKG       6     -13.827**   -32.159**   -13.827**   -32.159**   -13.827**   -32.159**
           12    -20.929**   -55.663**   -20.929**   -55.663**   -20.929**   -55.663**
SAFE       6     -12.731**   -33.574**   -12.731**   -33.574**   -12.731**   -33.574**
           12    -14.393**   -53.673**   -14.393**   -53.673**   -14.393**   -53.673**


APPENDIX 1 (4)

Table B: Continued

Variables  Lag   Sabah                   Sarawak
                 Level       1st diff    Level       1st diff
LRA        6     -2.147      -27.811**   -2.062      -7.182**
           12    -2.339      -39.509**   -1.738      -3.991**
RAINF      6     -7.606**    -21.547**   -6.790**    -6.934**
           12    -7.493**    -29.068**   -2.105      -7.130**
RAIND      6     -6.901**    -19.526**   -6.603**    -7.084**
           12    -6.661**    -26.134**   -2.278      -7.447**
TEMP       6     -5.780**    -13.757**   -6.598**    -8.256**
           12    -4.953**    -20.412**   -1.327      -6.143**
API        6     -15.531**   66.992**    -2.730*     -7.755**
           12    -16.793**   -77.473**   -1.517      -4.344**
OILP       6     -1.590      -7.858**    -1.590      -7.858**
           12    -1.300      -7.400**    -1.300      -7.400**
CPI        6     -2.778*     -5.743**    -2.278      -5.830**
           12    -2.639*     -4.001**    -1.942      -3.849**
BLKG       6     -13.827**   -32.159**   -13.827**   -32.159**
           12    -20.929**   -55.663**   -20.929**   -55.663**
SAFE       6     -12.731**   -33.574**   -12.731**   -33.574**
           12    -14.393**   -53.673**   -14.393**   -53.673**


APPENDIX 2 (1)

Preliminary Time Series Regression Model

Table A: Estimated regional road accidents model.

             Regions
Coefficient  Northern      Southern      Central       East Coast    Borneo        Sig. markers (column not recoverable)
β0           8.420**       8.315**       8.768**       7.518**       6.500**
v            0.003**       0.005**       0.005**       0.005**       0.004**
γ1           0.000         -0.019        0.018         -0.006        -0.043        *
γ2           -0.086        -0.104        -0.091        -0.022        -0.092        ** ** ** **
γ3           0.022         0.007         0.051         0.185         0.012         ** **
γ4           -0.004        -0.035        0.023         0.147         -0.027        * **
γ5           0.005         0.006         0.026         0.172         0.020         **
γ6           -0.021        -0.029        0.005         0.157         -0.044        * ** *
γ7           0.014         0.023         0.083         0.152         -0.009        ** **
γ8           0.039         0.039         0.086         0.213         0.005         ** ** ** **
γ9           -0.023        -0.009        0.034         0.151         -0.038        ** ** *
γ10          0.011         0.024         0.083         0.118         0.026         ** **
γ11          -0.040        -0.029        0.025         0.015         -0.031        ** *
β1_RAINF     2.8×10^-5*    1.4×10^-5     3.7×10^-5**   2.1×10^-5     1.4×10^-5
β2_RAIND     -0.001        0.001         6.0×10^-7     0.004         0.002
β3_TEMP      0.005         0.006         0.011         -0.012        0.025         *
β4_API       -3.6×10^-4    -1.1×10^-4    -0.001**      1.9×10^-5     1.1×10^-5
β5_OILP      1.1×10^-4     -3.7×10^-5    -3.3×10^-5    -4.7×10^-4**  9.3×10^-6
β6_CPI       -0.001        -0.001        -0.001        0.001         0.001         ** *
β7_BLKG      0.018         0.011         -0.032        0.039         -0.007        **
β8_SAFE      0.078         0.043         0.018         0.172         0.043         ** ** ** **
DW           1.2499        1.1184        1.1381        1.7279        1.033
LB           47.749**      65.545**      54.703**      40.152**      144.67**
GQ           1.098         0.52786       1.0376        0.33167       1.3374
JB           2.5635        3.7824        1.7045        6.7848**      1.7829
R²           0.9476        0.9721        0.964         0.9119        0.9321


APPENDIX 2 (2)

Table B: Estimated road accidents model for individual states.

             States
Coefficient  Penang        Perak         Perlis        Kedah         Selangor      Kuala Lumpur  Melaka        Sig. markers (column not recoverable)
β0           7.505**       7.357**       5.201**       6.458**       8.830**       7.956**       6.373**
v            0.002**       0.003**       0.006**       0.003**       0.004**       0.004**       0.005**
γ1           0.008         -0.003        0.104         -0.010        0.012         0.017         -0.027        *
γ2           -0.094        -0.053        0.017         -0.123        -0.076        -0.105        -0.099        ** ** ** ** ** **
γ3           0.005         0.038         0.188         0.010         0.058         0.048         0.005         * ** ** **
γ4           -0.014        0.014         0.191         -0.048        0.041         0.009         -0.046        ** ** **
γ5           -0.021        0.035         0.167         -0.004        0.052         0.009         -0.012        * ** **
γ6           -0.033        0.005         0.069         -0.036        0.026         -0.016        -0.028        *
γ7           0.013         0.033         0.135         -0.008        0.110         0.074         0.007         * ** ** **
γ8           0.026         0.064         0.126         0.029         0.103         0.078         0.012         ** ** **
γ9           -0.043        -0.002        0.164         -0.017        0.074         0.023         -0.031        ** ** **
γ10          -0.008        0.018         0.086         0.017         0.107         0.082         -0.026        ** **
γ11          -0.048        -0.041        0.066         -0.065        0.026         0.033         -0.072        ** * ** **
β1_RAINF     3.4×10^-5     6.3×10^-5*    -2.4×10^-5    7.3×10^-5**   2.0×10^-5     6.4×10^-5     1.6×10^-4**
β2_RAIND     2.8×10^-5     0.002         -0.006        0.001         0.003         -0.001        -1.7×10^-4
β3_TEMP      0.008         0.005         -0.018        0.015         0.008         0.011         0.011         ** **
β4_API       -0.001        1.2×10^-4     -0.001        1.8×10^-4     -2.0×10^-4    -6.4×10^-5    -3.7×10^-6
β5_OILP      3.0×10^-4**   -1.3×10^-4    -2.6×10^-4    2.3×10^-4**   1.3×10^-5     -7.4×10^-6    -3.6×10^-4**
β6_CPI       -4.9×10^-4    -3.9×10^-4    -0.001        -0.001        -0.003        -0.003        -0.001        ** **
β7_BLKG      -1.8×10^-4    0.045         -0.014        0.032         -0.025        -0.025        0.032         ** * * *
β8_SAFE      0.033         0.101         0.077         0.104         -0.001        -3.8×10^-4    0.023         * ** **
DW           1.264         1.374         1.564         1.124         1.138         1.571         0.993
LB           95.782**      31.924**      23.067**      56.317**      35.912**      39.039**      101.260**
GQ           1.934**       1.038         0.663         0.542         1.343         0.874         0.886
JB           5.676*        74.131**      4.537         1.725         1.691         1.016         0.131
R²           0.904         0.897         0.757         0.924         0.937         0.947         0.925


APPENDIX 2 (3)

Table B: Continued

             States
Coefficient  Johor         Negeri Sembilan  Kelantan   Terengganu    Pahang        Sabah         Sarawak       Sig. markers (column not recoverable)
β0           7.868**       6.491**       5.575**       5.678**       6.551**       5.317**       6.254**
v            0.005**       0.004**       0.005**       0.006**       0.006**       0.004**       0.004**
γ1           -0.001        -0.069        -0.024        -0.016        -0.025        -0.018        -0.057        ** **
γ2           -0.105        -0.130        -0.004        -0.045        -0.086        -0.078        -0.087        ** ** ** ** **
γ3           0.014         -0.034        0.232         0.149         0.080         0.027         0.003         ** ** **
γ4           -0.028        -0.071        0.170         0.106         0.037         -0.009        -0.032        ** ** *
γ5           0.012         -0.005        0.155         0.152         0.078         -0.003        0.038         ** ** *
γ6           -0.032        -0.027        0.197         0.122         0.054         -0.049        -0.035        ** ** *
γ7           0.032         0.009         0.130         0.118         0.077         -0.003        -0.016        ** ** *
γ8           0.054         0.020         0.226         0.191         0.117         0.029         -0.018        ** ** ** **
γ9           0.000         -0.019        0.158         0.147         0.049         -0.013        -0.075        ** ** **
γ10          0.045         -0.003        0.133         0.145         0.014         0.047         -0.001        ** ** ** * **
γ11          -0.021        -0.037        0.019         0.017         -0.053        -0.011        -0.055        **
β1_RAINF     2.2×10^-5     1.3×10^-4*    7.7×10^-6     1.8×10^-4**   2.5×10^-6     4.2×10^-5**   9.7×10^-6
β2_RAIND     -0.001        0.003         0.008         0.001         0.009         0.003         0.003         * ** **
β3_TEMP      0.003         0.020         0.002         1.7×10^-4     -0.002        0.037         0.015         ** **
β4_API       -1.4×10^-4    -4.2×10^-5    -1.5×10^-4    0.002*        -3.9×10^-4    -0.001        0.001         *
β5_OILP      3.9×10^-5     2.8×10^-5     -4.8×10^-4**  -0.001**      -3.3×10^-4**  -7.6×10^-6    4.6×10^-5
β6_CPI       -0.001        -0.003        0.003         0.002         0.001         0.002         2.9×10^-4     ** *
β7_BLKG      -0.002        0.030         0.029         0.033         0.044         -0.009        -0.007        *
β8_SAFE      0.048         0.043         0.284         0.193         0.104         0.042         0.044         ** * ** ** ** *
DW           0.360         1.826         2.028         1.899         1.2795        1.186         1.149
LB           93.332**      22.158**      18.980**      27.181**      84.611**      59.726**      133.240**
GQ           0.987         0.817         0.421         0.316         0.547         1.186         0.686
JB           3.685         217.360**     13.993**      0.923         0.466         2.838         1.636
R²           0.964         0.930         0.804         0.882         0.917         0.932         0.893
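The DW rows in Tables A and B report the Durbin-Watson statistic of the regression residuals. As a reminder of what that number measures, the following minimal sketch computes it from a residual series. The function name and the two toy residual series are hypothetical and are not taken from the thesis data; values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only (not the thesis data).
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every period
persistent = [1.0, 1.0, 1.0, 1.0]     # no change between periods

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

Read this way, the DW values between roughly 1.0 and 1.4 reported for most regions and states point towards positive residual autocorrelation, which is consistent with the significant LB statistics in the same tables.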


APPENDIX 3

Table 1: Variance inflation factor for each region

         Northern    Central     Southern    East Coast  Borneo
rainf    5.219       3.927       2.813       6.220       5.890
raind    7.843       5.024       3.882       7.126       6.063
temp     4.635       2.548       3.007       11.167      8.929
api      2.909       2.746       1.727       3.667       1.939
oilp     6.095       6.177       6.027       5.989       6.705
cpi      1.372       1.349       1.404       1.424       1.463
blkg     4.587       4.322       4.342       4.457       4.292
safe     3.968       3.711       3.709       4.003       3.705

Table 2: Variance inflation factor for individual states

         Penang      Perak       Perlis      Kedah       Selangor    KL          Melaka
rainf    3.295       4.463       2.508       4.196       3.470       2.493       1.804
raind    4.949       5.873       5.114       7.975       4.182       3.428       3.221
temp     3.922       3.329       5.281       5.202       2.706       2.092       3.492
api      2.374       2.545       1.650       2.310       1.877       2.212       1.574
oilp     6.506       6.299       2.846       6.242       2.949       5.792       6.237
cpi      1.398       1.455       1.935       1.491       2.299       1.607       1.399
blkg     4.435       4.562       4.855       4.494       4.392       4.318       4.435
safe     3.837       4.046       4.244       4.187       3.918       3.855       3.837

         Johor       N9          Kelantan    Tganu       Pahang      Sabah       Sarawak
rainf    2.705       2.622       4.108       3.223       3.805       3.420       5.152
raind    3.249       2.779       4.912       4.715       3.909       5.017       5.167
temp     3.890       2.548       10.407      6.099       7.254       6.988       7.189
api      1.378       2.368       3.009       2.983       2.998       1.369       2.377
oilp     5.756       6.619       5.959       5.665       5.938       5.850       6.914
cpi      1.347       1.533       1.351       1.812       1.401       1.457       1.583
blkg     4.400       4.526       4.561       4.280       4.294       4.324       4.325
safe     3.741       3.764       4.006       3.706       3.711       3.845       3.742
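For readers who want to reproduce figures of the kind reported in Tables 1 and 2, the sketch below shows one standard way of obtaining variance inflation factors: the VIF of regressor j equals the j-th diagonal element of the inverse of the regressors' correlation matrix. This is an illustration only; the function names and the toy data are hypothetical, and the software actually used for the thesis tables is not stated here.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sum(x * x for x in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each regressor (one value per column)."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical toy regressors (not the thesis data); their correlation is 0.8,
# so both VIFs equal 1 / (1 - 0.8**2).
example_vif = vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
print(example_vif)
```

A common rule of thumb treats values above 10 as a sign of serious multicollinearity; under that reading only the temp entries for the East Coast region and for Kelantan would be flagged.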


APPENDIX 4 (1)

Table 1: Estimated results and performance criteria of structural time series Model for Penang road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        STDS        LDDS        LDSS
σ²_ε        0.0186      0.0009      0.0009      0.0021      0.0011      0.0010      0.0010
σ²_η        0.0000      0.0002      0.0002      0.0000      0.0000      9.8×10^-5   9.8×10^-5
σ²_ς        NIL         NIL         NIL         0.0000      1.4×10^-6   0.0000      0.0000
σ²_ω        0.0000      0.0000      4.8×10^-8   0.0000      0.0000      0.0000      4.6×10^-8
LB(12)      516.45**    13.7470     13.6780     218.2**     11.5340     10.6780     10.6340
GQ          3.6936**    1.0587      1.0545      1.3076      1.1284      1.0969      1.0923
JB          1.1229      0.8374      0.8232      1.1566      0.4839      0.3599      0.3714
AIC         -3.3232     -5.7017     -5.6890     -5.2149     -5.6168     -5.6579     -5.6452

Table 2: Estimated results and performance criteria of structural time series Model for Kedah road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        DTSS        LDDS        LDSS
σ²_ε        0.0375      0.0035      0.0015      0.0047      0.0035      0.0040      0.0017
σ²_η        0.0000      0.0004      0.0003      0.0000      0.0000      0.0001      0.0002
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      0.0000      0.0000
σ²_ω        0.0000      0.0000      1.9×10^-5   0.0000      8.1×10^-6   0.0000      1.8×10^-5
LB(12)      511.60**    28.614**    12.0360     26.591**    24.139**    26.219**    12.8360
GQ          6.1322      1.2837      0.8423      1.2915      0.9546      1.2894      0.8673
JB          1.0989      7.7166**    8.8766**    10.967**    11.967**    6.0835**    4.7598
AIC         -2.6739     -4.5511     -4.6343     -4.4821     -4.5233     -4.4982     -4.5815

Table 3: Estimated results and performance criteria of structural time series Model for Perlis road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        STDS        STSS        LDDS        LDSS
σ²_ε        0.0723      0.0099      0.0097      0.0148      0.0120      0.0117      0.0102      0.0100
σ²_η        0.0000      0.0010      0.0009      0.0000      0.0000      0.0000      0.0007      0.0007
σ²_ς        NIL         NIL         NIL         0.0000      3.0×10^-6   3.2×10^-6   0.0000      0.0000
σ²_ω        0.0000      0.0000      1.0×10^-6   0.0000      0.0000      1.5×10^-6   0.0000      1.5×10^-6
LB(12)      566.02**    7.9766      8.3127      37.186**    13.0790     13.8620     7.4759      8.0447
GQ          9.7057      0.9669      0.9443      0.9487      1.0175      0.9835      0.9614      18.2265
JB          4.4628**    3.2766      3.2909      3.8304      3.3298      3.2570      3.5130      3.4898
AIC         -2.0680     -3.6217     -3.6093     -3.4257     -3.5003     -3.4883     -3.5600     -3.5481


APPENDIX 4 (2)

Table 4: Estimated results and performance criteria of structural time series Model for Perak road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        DTSS        LDDS        LDSS
σ²_ε        0.0234      0.0039      0.0029      0.0042      0.0035      0.0042      0.0032
σ²_η        0.0000      0.0002      0.0002      0.0000      0.0000      7.2×10^-6   2.4×10^-5
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      0.0000      0.0000
σ²_ω        0.0000      0.0000      7.0×10^-6   0.0000      4.8×10^-6   0.0000      6.2×10^-6
LB(12)      251.92**    22.474**    14.3680     16.378*     11.9260     16.1010     9.7364
GQ          4.9959**    1.2328      0.9902      1.1855      0.9577      1.1940      0.9528
JB          5.2769*     5.1848*     6.5859**    13.088**    13.726**    12.787**    12.706**
AIC         -3.2380     -4.5817     -4.5960     -4.5747     -4.5836     -4.5618     -4.5752

Table 5: Estimated results and performance criteria of structural time series Model for Selangor road accidents

            Model
Parameter   DLDS        LLDS        DTDS        STDS        LDDS
σ²_ε        0.0502      0.0011      0.0023      0.0018      0.0013
σ²_η        0.0000      0.0004      0.0000      0.0000      0.0002
σ²_ς        NIL         NIL         0.0000      5.6×10^-7   0.0000
σ²_ω        0.0000      0.0000      0.0000      0.0000      0.0000
LB(12)      892.35**    10.1540     47.745**    33.116**    11.336
GQ          6.5604**    0.6823      0.8540      0.8299      0.7067
JB          3.4794      10.635**    24.014**    42.774**    20.078**
AIC         -2.4053     -5.3550     -5.1110     -5.2530     -5.3291

Table 6: Estimated results and performance criteria of structural time series Model for Kuala Lumpur road accidents

            Model
Parameter   DLDS        LLDS        DTDS        STDS        LDDS
σ²_ε        0.0292      0.0012      0.0020      0.0016      0.0013
σ²_η        0.0000      0.0002      0.0000      0.0000      8.0×10^-6
σ²_ς        NIL         NIL         0.0000      2.3×10^-7   0.0000
σ²_ω        0.0000      0.0000      0.0000      0.0000      0.0000
LB(12)      874.30**    11.405      73.053**    13.9120     8.7047
GQ          9.4129**    0.9169      1.5209*     1.0468      0.9833
JB          3.6785      3.8601      6.5026**    5.8481*     6.0142**
AIC         -2.9048     -5.4454     -5.2757     -5.3783     -5.4299


APPENDIX 4 (3)

Table 7: Estimated results and performance criteria of structural time series Model for Johor road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        DTSS        STDS        STSS        LDDS        LDSS
σ²_ε        0.0502      0.0009      0.0006      0.0019      0.0016      0.0014      0.0008      0.0010      0.0006
σ²_η        0.0000      0.0004      0.0003      0.0000      0.0000      0.0000      0.0000      0.0003      0.0001
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      2.3×10^-6   3.6×10^-6   0.0000      0.0000
σ²_ω        0.0000      0.0000      3.0×10^-6   0.0000      1.8×10^-6   0.0000      4.1×10^-6   0.0000      3.6×10^-6
LB(12)      999.01**    23.578**    9.4524      81.931**    95.213**    49.961**    18.161**    27.179**    8.0290
GQ          7.1292**    1.2584      0.8644      0.8232      0.5625      1.1395      0.6619      1.2810      0.7601
JB          1.6441      1.7336      1.1136      2.3979      2.8977      0.7230      0.7537      1.8209      1.1683
AIC         -2.4056     -5.5156     -5.5675     -5.2864     -5.3090     -5.3728     -5.4760     -5.4789     -5.5593

Table 8: Estimated results and performance criteria of structural time series Model for Negeri Sembilan road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        LDDS        LDSS
σ²_ε        0.0433      0.0033      0.0030      0.0041      0.0034      0.0031
σ²_η        0.0000      0.0002      0.0002      0.0000      3.6×10^-5   3.7×10^-5
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      0.0000
σ²_ω        0.0000      0.0000      1.5×10^-6   0.0000      0.0000      2.2×10^-6
LB(12)      881.69**    22.089**    18.810**    33.582**    16.768*     12.4260
GQ          14.4020**   0.9898      0.9491      1.2978      0.9477      0.8885
JB          1.6594      53.970**    70.403**    72.551**    62.351**    85.115**
AIC         -2.5412     -4.6782     -4.6688     -4.6033     -4.7062     -4.6996

Table 9: Estimated results and performance criteria of structural time series Model for Melaka road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        STDS        STSS        LDDS        LDSS
σ²_ε        0.0389      0.0018      0.0017      0.0034      0.0021      0.0020      0.0019      0.0018
σ²_η        0.0000      0.0004      0.0004      0.0000      0.0000      0.0000      0.0002      0.0002
σ²_ς        NIL         NIL         NIL         0.0000      5.2×10^-6   5.2×10^-6   0.0000      0.0000
σ²_ω        0.0000      0.0000      1.7×10^-7   0.0000      0.0000      5.0×10^-7   0.0000      3.9×10^-7
LB(12)      879.4500    12.3360     12.1040     149.79**    9.3948      9.0735      9.6476      9.2611
GQ          8.4963      0.6715      0.6633      0.5953      0.7478      0.7216      0.6600      0.6419
JB          7.6379      0.9077      0.8629      0.3673      0.5034      0.4917      1.1480      1.0504
AIC         -2.6402     -5.0772     -5.0646     -4.7633     -4.9841     -4.9728     -5.0374     -5.0254


APPENDIX 4 (4)

Table 10: Estimated results and performance criteria of structural time series Model for Kelantan road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        DTSS        STSS        LDDS        LDSS        LTSS
σ²_ε        0.0543      0.0181      0.0042      0.0202      0.0118      0.0049      0.0184      0.0044      0.0047
σ²_η        0.0000      0.0004      0.0003      0.0000      0.0000      0.0000      0.0001      0.0002      4.7×10^-5
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      7.2×10^-5   0.0000      0.0000      4.1×10^-7
σ²_ω        0.0000      0.0000      0.0002      0.0000      7.0×10^-5   0.0001      0.0000      0.0002      0.0001
LB(12)      110.520**   46.205**    25.809**    43.887**    40.852**    20.779**    43.071**    21.690**    21.097**
GQ          2.8413**    0.9081      0.5465      1.2101      0.9162      0.5788      1.0161      0.6183      0.5839
JB          16.425**    50.150**    168.710**   29.399**    80.889**    138.880**   43.445**    145.32**    140.49**
AIC         -2.3329     -3.2148     -3.3520     -3.1372     -3.2132     -3.3146     -3.1640     -3.3079     -3.3021

Table 11: Estimated results and performance criteria of structural time series Model for Terengganu road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        DTSS        STDS        STSS        LDDS        LDSS
σ²_ε        0.0673      0.0013      0.0016      0.0135      0.0102      0.0118      0.0028      0.0116      0.0020
σ²_η        0.0000      0.0005      0.0005      0.0000      0.0000      0.0000      0.0000      0.0002      0.0003
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      1.1×10^-6   4.0×10^-6   0.0000      0.0000
σ²_ω        0.0000      0.0000      0.0001      0.0000      2.4×10^-5   0.0000      9.7×10^-5   0.0000      0.0001
LB(12)      364.29**    49.795**    26.408**    60.117**    68.864**    42.894**    20.857**    45.415**    22.653**
GQ          4.4026**    0.6113      0.4602      0.7546      0.5700      0.6470      0.4813      0.6619      0.4822
JB          4.7190*     35.668**    45.807**    25.2955     19.137**    23.328**    47.813**    25.353**    36.127**
AIC         -2.1348     -3.5910     -3.7216     -3.5085     -3.5347     -3.5440     -3.6540     -3.5522     -3.6752

Table 12: Estimated results and performance criteria of structural time series Model for Pahang road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        STDS        STSS        LDDS        LDSS
σ²_ε        0.0589      0.0056      0.0052      0.0075      0.0060      0.0055      0.0059      0.0054
σ²_η        0.0000      0.0005      0.0004      0.0000      0.0000      0.0000      0.0002      0.0002
σ²_ς        NIL         NIL         NIL         0.0000      1.7×10^-6   1.8×10^-6   0.0000      0.0000
σ²_ω        0.0000      0.0000      2.5×10^-6   0.0000      0.0000      3.1×10^-6   0.0000      3.0×10^-6
LB(12)      529.16**    27.134**    25.582**    46.499**    20.816**    18.8130     22.016**    20.060**
GQ          4.7398**    0.7383      0.6951      0.8170      0.7678      0.7165      0.7857      0.7357
JB          0.7168      8.1560**    9.6973**    0.5529      7.0196**    9.0131**    5.8617*     7.5115**
AIC         -2.2567     -4.1647     -4.1544     -4.0515     -4.1337     -4.1253     -4.1355     -4.1265


APPENDIX 4 (5)

Table 13: Estimated results and performance criteria of structural time series Model for Sabah road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        DTSS        LDDS        LDSS
σ²_ε        0.0401      0.0019      0.0015      0.0032      0.0031      0.0021      0.0017
σ²_η        0.0000      0.0003      0.0003      0.0000      0.0000      8.4×10^-5   9.4×10^-5
σ²_ς        NIL         NIL         NIL         0.0000      0.0000      0.0000      0.0000
σ²_ω        0.0000      0.0000      2.5×10^-6   0.0000      6.5×10^-7   0.0000      2.4×10^-6
LB(12)      632.28**    13.3070     10.9580     85.751      90.0100     9.1413      7.5319
GQ          5.3864      1.3537      1.1279      2.4803      2.4051      1.5520      1.3003
JB          4.8714      6.7660      7.0880      5.0761      4.6649      8.6112      9.3091
AIC         -2.6126     -5.0850     -5.0994     -4.8300     -4.8195     -5.0656     -5.0789

Table 14: Estimated results and performance criteria of structural time series Model for Sarawak road accidents

            Model
Parameter   DLDS        LLDS        LLSS        DTDS        STDS        STSS        LDDS        LDSS
σ²_ε        0.0345      0.0020      0.0020      0.0036      0.0024      0.0023      0.0021      0.0021
σ²_η        0.0000      0.0003      0.0003      0.0000      0.0000      0.0000      0.0001      0.0001
σ²_ς        NIL         NIL         NIL         0.0000      3.1×10^-7   3.1×10^-7   0.0000      0.0000
σ²_ω        0.0000      0.0000      3.0×10^-7   0.0000      0.0000      3.6×10^-7   0.0000      3.8×10^-7
LB(12)      414.0100    8.8344      8.9090      30.794      6.2100      6.6360      6.1921      6.3060
GQ          3.2929      0.8210      0.7840      1.3338      0.8749      0.8295      0.9475      0.8948
JB          1.9027      5.5382      5.5468      1.7300      2.8206      3.0767      4.0221      4.2285
AIC         -2.7504     -5.0303     -5.0193     -4.7208     -5.0019     -4.9912     -5.0082     -4.9980
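The AIC rows of the tables in this appendix lend themselves to a simple mechanical comparison. The sketch below ranks the candidate specifications for Sarawak by AIC, using the values copied from Table 14; a lower AIC is conventionally preferred. It is an illustration only: the function name is hypothetical, and it assumes that AIC is compared on its own, whereas the diagnostic rows (LB, GQ, JB) may also bear on the final choice of model.

```python
# AIC values copied from the Sarawak table (Table 14) in this appendix.
aic_by_model = {
    "DLDS": -2.7504, "LLDS": -5.0303, "LLSS": -5.0193, "DTDS": -4.7208,
    "STDS": -5.0019, "STSS": -4.9912, "LDDS": -5.0082, "LDSS": -4.9980,
}


def rank_by_aic(aic):
    """Return (model, AIC) pairs sorted from lowest (preferred) to highest AIC."""
    return sorted(aic.items(), key=lambda item: item[1])


ranking = rank_by_aic(aic_by_model)
best_model, best_aic = ranking[0]
print(best_model, best_aic)
for model, value in ranking:
    # Difference from the best model; small gaps mean near-equivalent fits.
    print(f"{model}: AIC = {value:.4f}, gap = {value - best_aic:.4f}")
```

On these numbers the local level model with deterministic seasonal (LLDS) has the lowest AIC, with LLSS and LDDS close behind, while the fully deterministic DLDS specification is clearly worst.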


APPENDIX 5 (1)

Table 1: Auxiliary residuals for selected states

[Auxiliary residual plots, with columns State, Observation/Irregular and Level, for Perak, Selangor, Negeri Sembilan and Kelantan]


APPENDIX 5 (2)

Table 1: Continued

[Auxiliary residual plots, with columns State, Observation/Irregular and Level, for Terengganu and Johor]


LIST OF PUBLICATIONS

Noor Wahida Md Junus and Mohd Tahir Ismail (2013) Predicting road accidents: structural time series approach. In The 21st National Symposium on Mathematical Sciences (SKSM21).

Noor Wahida Md Junus and Mohd Tahir Ismail (2013) Modeling road accidents: An approach using structural time series. In Statistical and Operation Research International Conference (SORIC 2013).

Noor Wahida Md Junus, Mohd Tahir Ismail and Zainudin Arsad (2014) Behaviour of road accidents: Structural Time Series Approach. In International Conference on Quantitative Sciences and Its Application (ICOQSIA 2014).

Noor Wahida Md Junus, Mohd Tahir Ismail and Zainudin Arsad (2014) Road accidents model: Time series regression versus structural time series approach. In The 3rd International Conference on Computer Science & Computational Mathematics (ICCSCM 2014).

Noor Wahida Md Junus, Mohd Tahir Ismail and Zainudin Arsad (2015) Predicting Penang road accidents: Time series regression versus structural time series. Indian Journal of Science and Technology, 8(30).

Noor Wahida Md Junus, Mohd Tahir Ismail and Zainudin Arsad (2016) Climate and festival effects on Penang road accidents. Jurnal Teknologi, 78(4-4): 135-144.

Noor Wahida Md Junus, Mohd Tahir Ismail, Zainudin Arsad and Rosmanjawati Abdul Rahman (2017) Malaysia road accidents influences based on structural time series approach. Applied Mathematics and Information Sciences, 11(4): 1029-103.
