Forecasting Monthly Airline Passenger Numbers with Small Datasets Using Feature Engineering and a Modified Principal Component Analysis

Sara Al-Ruzaiqi

A Doctoral Thesis

Submitted in partial fulfilment

of the requirements for the award of

Doctor of Philosophy

of

Loughborough University

2019

Dedication

I dedicate this Thesis to the two most important people in my life: my father and my mother (my beloved parents), for their endless support and unconditional love. My parents always believed in me, even in times when I was full of doubt in myself. They were always there, cheering me up and standing by me through the good and bad times.

I hope both of you are proud of me.

Abstract

In this study, a machine learning approach based on time series models, different feature engineering, feature extraction, and feature derivation is proposed to improve air passenger forecasting. Different types of datasets were created to extract new features from the core data. An experiment was undertaken with artificial neural networks to test the performance of neurons in the hidden layer, to optimise the dimensions of all layers, and to obtain an optimal choice of connection weights – thus the nonlinear optimisation problem could be solved directly. A method of tuning deep learning models using H2O (a feature-rich, open source machine learning platform known for its R and Spark integration and its ease of use) is also proposed, where the trained network model is built from samples of selected features from the dataset in order to ensure diversity of the samples and to improve training. A successful application of deep learning requires setting numerous parameters in order to achieve greater model accuracy. The number of hidden layers and the number of neurons in each layer are key parameters of such a network. Hyper-parameter tuning approaches, such as grid search and random hyper-parameter search, aid in setting these important parameters. Moreover, a new ensemble strategy is suggested that shows potential to optimise parameter settings and hence save computational resources throughout the tuning process of the models. The main objective, besides improving the performance metric, is to obtain on hold-out datasets a distribution of predictions that resembles the original distribution of the training data. Particular attention is focused on creating a modified version of Principal Component Analysis (PCA) using a different correlation matrix, obtained by a different correlation coefficient based on kinetic energy, to derive new features. The data were collected from several airline datasets to build a deep prediction model for forecasting airline passenger numbers. Preliminary experiments show that fine-tuning provides an efficient approach for tuning the number of hidden layers and the number of neurons in each layer when compared with the grid search method. Similarly, the results show that the modified version of PCA is more effective in data dimension reduction, class separability, and classification accuracy than traditional PCA.

Keywords: Feature Engineering; Deep Learning; Principal Component Analysis (PCA); algorithm; prediction.


Acknowledgements

First of all, I would like to address my most precious appreciation to my supervisor Dr Christian Dawson for his guidance and encouragement through this work. His profound knowledge, positive attitude, patient guidance, and valuable suggestions on my work guaranteed the completion of this Thesis. One simply could not wish for a better or friendlier supervisor. He has set an example of excellence as an instructor and a mentor.

I have been extremely lucky to have a supervisor who cared so much about my work.

My special acknowledgements go to all those people who provided me with data for my experiments. My warm appreciation is due to the Public Authority for Civil Aviation, the Directorate General of Meteorology, and the Ministry of Tourism in Oman.

My warm acknowledgements go to His Majesty Sultan Qaboos' government, the Ministry of Higher Education (my sponsor) and the Cultural Attaché (Embassy of Oman) for providing financial assistance and support throughout the research period.

I cannot find proper words to express my deep gratitude to my family and friends for their sincere encouragement and inspiration during this period, which helped to bring me to this stage of my life.

And, last but not least, I would like to thank all who have knowingly and unknowingly helped me and been involved in the successful completion of this report.

Sara Al-Ruzaiqi

Loughborough, England

20.09.2019

Abbreviations

ACF Autocorrelation function

GDP Gross Domestic Product

ANN Artificial neural network model

AR Autoregressive models

ARIMA Autoregressive Integrated Moving Average

ARIMAX Autoregressive Integrated Moving Average with Exogenous Input

BSM Basic Structural models

CRM Customer relationship management

DGP Data generating process

H2O Open source machine learning platform

MA Moving Average models

MAD Mean Absolute Deviation

MAE Mean Absolute Error

MAPD Mean Absolute Percentage Deviation

MAPE Mean Absolute Percentage Error

MASE Mean Absolute Scaled Error

MECA Ministry of Environmental and Climate Affairs

MSA Mean Absolute Error

MSE Mean Square Error

MSPE Mean Squared Prediction Error

PCA Principal Component Analysis

PACF Partial autocorrelation function

RAE Relative Absolute Error

RMSE Root Mean Squared Error


RRSE Root Relative Squared Error

SES Simple exponential smoothing

SMAPE Symmetric Mean Absolute Percentage Error

S Seasonal component of a time series

T Trend component of a time series

α,β,γ Smoothing parameters for smoothing based approaches

ŷ Vector of time series forecasts

ω Combination weight vector

e Unity vector

ε Forecast error

ŷ Time series forecast

ŷc Combined time series forecast

ω Weights for linear forecast combination

φ Damping factor

Table of Contents

Chapter 1 ...... 19
1.1 Introduction ...... 19
1.2 Research Aims and Objectives ...... 24
1.3 Research Hypotheses ...... 26
1.4 Research Contributions ...... 26
1.4.1 Conceptual Contribution ...... 26
1.4.2 Technical Contributions ...... 27
1.4.3 Comprehensive Literature Investigation ...... 28
1.4.4 Data Collection ...... 29
1.5 Thesis Overview ...... 30

Chapter 2: Literature Review ...... 31
2.1 Introduction ...... 31
2.2 Traditional Time Series Forecasting ...... 31
2.3 Simple Forecasting Methods ...... 32
2.3.1 Average Method ...... 32
2.3.2 Naïve Method ...... 33
2.3.3 Seasonal Naïve Method ...... 33
2.3.4 Drift Method ...... 33
2.4 Exponential Smoothing ...... 34
2.4.1 Simple Exponential Smoothing ...... 34
2.4.2 Holt's Linear Trend Method ...... 35
2.4.3 Exponential Trend Method ...... 36
2.4.4 Damped Trend Methods ...... 36
2.4.5 Additive Damped Trend ...... 37
2.4.6 Multiplicative Damped Trend ...... 37
2.4.7 Holt-Winters Seasonal Method ...... 38
2.4.8 Holt-Winters Additive Seasonal Method ...... 38
2.4.9 Holt-Winters Multiplicative Seasonal Method ...... 39
2.4.10 Holt-Winters Damped Method ...... 40
2.5 Regression ...... 40
2.5.1 Decomposition and Theta-Model ...... 40
2.5.2 Autoregressive Integrated Moving Average Model ...... 41
2.5.3 Autoregressive Models (AR) ...... 44
2.5.4 Moving Average Models (MA) ...... 45
2.5.5 Non-seasonal ARIMA Model ...... 47
2.5.6 Seasonal ARIMA Model ...... 48
2.5.7 Nonlinear Forecasting ...... 49
2.6 Air Travel Demand Modelling and Forecasting ...... 49
2.7 Forecast Combination ...... 56
2.8 Integrating Uncertainty into Airline Passenger Forecasting ...... 57
2.9 Forecasting Performance Evaluation ...... 59
2.9.1 Scale-Dependent Errors ...... 60
2.9.2 Percentage Errors ...... 60
2.9.3 The Use of Scaled Errors ...... 61
2.10 Measuring the Accuracy of Forecasts Using Training and Test Sets ...... 61
2.11 Feature Engineering and Machine Learning ...... 62
2.12 Discussion ...... 67

Chapter 3: Air Demand in Oman and Predictive Modelling ...... 69
3.1 Introduction ...... 69
3.2 Datasets Overview ...... 72
3.2.1 Cleaning Process Description ...... 74
3.2.2 Proposed Time Series Predictive Modelling ...... 76
3.2.3 Benchmark Model ...... 77
3.2.4 Causal Model ...... 78
3.2.5 Explanatory Variables ...... 80
3.2.6 Random Forest Regression ...... 82
3.2.7 Construction of Models: Data Source and Description ...... 82
3.3.1. Benchmark Model ...... 83
3.2.8 ARIMA Model ...... 87
3.2.9 Finding the Order of AR and MA Models ...... 88
3.2.10 Causal Model ...... 90
3.2.11 Random Forest Model ...... 101
3.3 Performance Evaluation: Evaluation Metrics ...... 101
3.4 Discussion ...... 104

Chapter 4: Optimising the Deep Learning Model for Neural Network Topology to Improve Classification Accuracy ...... 106
4.1 Introduction ...... 106
4.2 Brief Overview of Features ...... 109
4.2.1 Feature Selection ...... 113
4.2.2 Feature Extraction Process ...... 114
4.2.3 Feature Generation and Extraction ...... 116
4.2.4 Variable Importance from Machine Learning Algorithms ...... 118
4.3 Optimising a Deep Learning Model to come up with a Robust Neural Network Topology ...... 131
4.4 Illustration: Optimisation Techniques (Finding A Good Neural Network Topology) ...... 133
4.4.1 Techniques Description ...... 133
4.4.2 Solving a Problem (Dataset) ...... 134
4.4.3 Import and Set-up Model (The H2O Package Implementation) ...... 135
4.4.4 Training a Deep Neural Network Model and Creating some Base Scenarios (Default Models) ...... 138
4.4.5 Testing the Model: Model Evaluation ...... 139
4.5 Hyperparameter Tuning: Tuning with Grid Search and Random Hyperparameter Search ...... 141
4.5.1 Improving Deep Neural Network Model Performance using Hyperparameter Tuning ...... 143
4.5.2 Extra Grid Search to Optimise Parameters ...... 147
4.5.3 Improving Deep Neural Network Model Performance Using Ensemble Learning ...... 149
4.6 Discussion ...... 150

Chapter 5: Creating a Modified Version of Principal Component Analysis (PCA) to improve the Forecasting Performance Using a Different Correlation Matrix ...... 152
5.1 Introduction ...... 152
5.1.1 Presentation of Principal Component Analysis: Review of the PCA ...... 155
5.1.2 Modified Principal Component Analysis Framework ...... 156
5.2 Methodology Framework ...... 159
5.2.1 Features Based on Information Energy (Kinetic Energy) ...... 159
5.2.2 Features Based on Information Correlation Coefficient ...... 161
5.3 The Working Algorithm of the Study: The Modified PCA Implementation ...... 162
5.4 Experimental Analysis / Performance Evaluation ...... 165
5.4.1 Comparison of Modified PCA with Kinetic Correlation Matrix from Kinetic Energy and PCA with Pearson R Correlation ...... 165
5.4.2 Features Obtained from Kinetic Energy PCA Components ...... 167
5.4.3 Features Obtained from Training Data Only ...... 170
5.4.4 Features Obtained from Deep Learning Hidden Layers ...... 173
5.4.5 Features Obtained from Genetic Algorithm ...... 175
5.4.6 Features Obtained from One-Hot Encoding ...... 179
5.4.7 Feature Obtained from Conditional Probability ...... 182
5.5 Ensemble Stage and Outlier Detection ...... 184
5.6 Discussion ...... 186

Chapter 6: Conclusions and Future Work ...... 188
6.1 Overview ...... 188
6.2 Study Approach ...... 189
6.3 Research Outcomes ...... 193
6.4 Overall Contribution of the Study to the Knowledge of the Field ...... 194
6.5 Suggestions for Future Research ...... 195

References ...... 196

Appendices ...... 213
8.1 Appendix 1 ...... 213
8.2 Appendix 2 ...... 220
8.3 Appendix 3 ...... 230

List of Figures

Figure 3-1 Location of Oman's four ...... 69
Figure 3-2 Correlations between total passengers and the other variables ...... 79
Figure 3-3 Monthly departing passengers (in 1000s) ...... 84
Figure 3-4 Seasonal plot: air passengers ...... 85
Figure 3-5 The transformed data series ...... 86
Figure 3-6 Benchmark model ...... 89
Figure 3-7 Monthly average base fare effect on passenger flow ...... 92
Figure 3-8 Model 1: ARIMA((pax.ts)=fare.ts) ...... 92
Figure 3-9 Jet fuel price effect on passenger flow ...... 93
Figure 3-10 Model 2: ARIMA((pax.ts)=jetfuel.ts) ...... 93
Figure 3-11 Oman population size effect on passenger flow ...... 95
Figure 3-12 Model 3: ARIMA((pax.ts)=pop_total.ts) ...... 95
Figure 3-13 Oman's GDP effect on passenger flow ...... 96
Figure 3-14 Model 4: ARIMA((pax.ts)=gdp.ts) ...... 97
Figure 3-15 Oman's unemployment size effect on passenger flow ...... 97
Figure 3-16 Model 5: ARIMA((pax.ts)=unemployment.ts) ...... 98
Figure 3-17 Oman's real interest effect on passenger flow ...... 99
Figure 3-18 Model 6: ARIMA((pax.ts)=interestrate.ts) ...... 99
Figure 3-19 Oman's monthly public holiday effect on passenger flow ...... 100
Figure 3-20 Model 7: ARIMA((pax.ts)=holiday.ts) ...... 100
Figure 3-21 The actual monthly passengers in 2016 and the predictions of the benchmark model, benchmark + average base fare model and random forest model ...... 104
Figure 4-1 Feature engineering techniques for aeroplanes dataset ...... 116
Figure 4-2 Model 1: variable importance on original features (% var explained: 33.43) ...... 119
Figure 4-3 Model 2: variable importance with season on original features (% var explained: 37.13) ...... 122
Figure 4-4 Model 3: variable importance with season and holiday on original features (% var explained: 49.61) ...... 123
Figure 4-5 Variable importance with season, holiday and GDP on original features (% var explained: 67.06) ...... 125
Figure 4-6 Variable importance with season, holiday, GDP, and jet fuel on original features (% var explained: 69.18) ...... 127
Figure 4-7 Variable importance with season, holiday, GDP, jet fuel, population, and interest rate on original features (% var explained: 68.55) ...... 129
Figure 4-8 Variable importance with season, holiday, GDP, jet fuel, population, interest rate, and distance on original features (% var explained: 67.74) ...... 130
Figure 4-9 Variable importance with all features (% var explained: 69.04) ...... 131
Figure 4-10 The original distribution of the target ...... 137
Figure 4-11 Predicted values on the unseen test set against ground-truth values ...... 140
Figure 4-12 The original distribution of the target ...... 141
Figure 4-13 The distribution of predicted values ...... 141
Figure 4-14 Variable importance on generated features ...... 146
Figure 4-15 Feature importance obtained from deep learning hidden layers ...... 147
Figure 5-1 Air passenger numbers data with Pearson correlation ...... 166
Figure 5-2 Train passenger numbers data with kinetic correlation ...... 166
Figure 5-3 Principal component analysis features (kineticPCA1 and kineticPCA2) ...... 169
Figure 5-4 Feature importance obtained from training data only ...... 172
Figure 5-5 Feature importance obtained from deep learning hidden layers ...... 174
Figure 5-6 Tree-like structures of the genetic algorithm ...... 177
Figure 5-7 Feature importance obtained from genetic algorithm ...... 179
Figure 5-8 Feature importance obtained from one-hot encoding ...... 181
Figure 5-9 Feature importance from conditional probability ...... 183

List of Tables

Table 2-1 Features are date and visitors; extracted feature is isWeekendDay ...... 66
Table 3-1 Domestic and international airports in Oman ...... 70
Table 3-2 Positive change in passenger traffic ...... 70
Table 3-3 Exploratory data analysis ...... 73
Table 3-4 All processed data ...... 75
Table 3-5 Consideration of AR and/or MA model condition ...... 88
Table 3-6 The regression coefficients for ARIMA (1,0,2)x(2,0,0) ...... 90
Table 3-7 The evaluation metrics of the model ...... 103
Table 4-1 Feature matrix diagram: each row represents an example, and each column represents a feature describing that example ...... 109
Table 4-2 Members table for airline clients ...... 110
Table 4-3 Interaction table with an e-commerce airline website ...... 111
Table 5-1 Correlation matrix with the function O() ...... 164
Table 5-2 Correlation matrix on the basis of the 'Pearson R' model ...... 165
Table 5-3 The mean values of features obtained from kinetic energy PCA components ...... 168
Table 5-4 Prediction model using kineticPCA1 and kineticPCA2 ...... 170
Table 5-5 The mean values of features obtained from training data only ...... 170
Table 5-6 Prediction model using training data only ...... 172
Table 5-7 The mean values of features obtained from deep learning hidden layers ...... 174
Table 5-8 Prediction model using deep learning hidden layers ...... 175
Table 5-9 The mean values of features obtained from genetic algorithm ...... 178
Table 5-10 Prediction model using genetic algorithm ...... 179
Table 5-11 The mean values of features obtained from one-hot encoding ...... 180
Table 5-12 Prediction model using one-hot encoding ...... 182
Table 5-13 The mean values of features obtained from conditional probability ...... 182
Table 5-14 Prediction model using conditional probability ...... 184
Table 8-1 First five rows of target feature (passenger number) ...... 231
Table 8-2 Actual data of number of passengers with outliers 1 ...... 232
Table 8-3 Actual data of number of passengers with outliers 2 ...... 232


Chapter 1

1.1 Introduction

Knowing the future trend of passengers travelling by air transportation is of immense importance in today's world (Adrangi et al., 2001). It is important to forecast the number of airline passengers as accurately as possible, as such predictions can be used in many contexts ranging from simple initial planning to complicated business decisions (Carson et al., 2011). The quality of a country's air transport facilities is an indicator of its economic standard (Kincaid, 2016). A global rise in overseas travel has meant that the number of passengers using airport facilities has sharply increased. In 2017, the International Air Transport Association (IATA, 2017) published its 62nd annual travel statistics report based on data from its 290 airline members. IATA reported that a record number of travellers flew in 2017 between more city pairs than ever, and that a record-breaking 4.1 billion passengers flew on scheduled airline services that year. That was 280 million more than in 2016, representing a 7.3% increase year on year.

A number of different airlines operate within an airport, and to maintain a supply of facilities that meets demand, accurate business and economic decisions need to be made (Tsui et al., 2011). Accurate forecasts of air transport activity are essential in the planning processes of states, airports, airlines, and other relevant bodies (Riga et al., 2009). Accurate forecasts also assist aircraft manufacturers in planning future aircraft types (in terms of size and range) and when to develop them (Cho, 2003; Cuhadar, 2014; Kulendran & Witt, 2003). Since the adequacy and accuracy of passenger transport demand forecasts greatly affect investment efficiency, such forecasting is seen as a critical criterion for investors as well as for airlines (Market Research.com, 2017).

Air traffic forecasts are a key input into an airline's fleet planning and route network development, and are also used in the preparation of the airline's annual operating plan (Coshall, 2006). Furthermore, analysing and forecasting air travel demand may also assist an airline in reducing its risk through an objective evaluation of the demand side of the airline business (Cho, 2003; Cuhadar, 2014; Kulendran & Witt, 2003).

Identifying the potential impact of the future trend of airline passengers, this study intends to thoroughly analyse the pattern of airline passengers travelling through Muscat International Airport in Oman. The study aims to establish a prediction mechanism for future values of airline passenger numbers for this airport, which has experienced significant growth over recent years: passenger numbers have more than doubled since 2009, when the airport served 4,556,502 passengers over the calendar year (OAMC, 2017). Oman is heavily reliant upon its air transport industry (ONA, 2016), due to the vast distances across the country, as well as between its urban centres, and also its location at the centre of the Middle East, lying at the junction of key trade routes leading north/south and east/west. According to the report of the National Centre for Statistics and Information, tourism is expected to become one of the largest industries in Oman, as the number of tourists increased from 1.36 million in 2007 to 1.96 million in 2013 (NCSI, 2017). Based on the data reported by the World Travel & Tourism Council, the direct contribution of Oman's tourism to national GDP was 3.3 percent in 2014, and it generated 37,000 jobs, which is 3.5 percent of total employment (WTTC, 2014). These data indicate that Oman is still behind the UAE and Bahrain, but ahead of Kuwait, Saudi Arabia and Qatar. The Oman government has invested heavily in tourism and is currently implementing a major project of expanding and upgrading Muscat International Airport.

Air transport stimulates the growth of local economies, contributing to the development of companies and increasing the competitiveness of businesses. It generates jobs, which also translates into society's wealth, and creates possibilities for boosting the ease and flow of goods and people. All of this favours higher living standards, ever greater travel comfort and a wider choice of services offered by the aviation market.

However, Omani aviation is faced with the challenge of effectively satisfying society's demand for air transport. Such demand is not limited to the throughput of air infrastructure, but also involves fitting it effectively into both the Omani and, primarily, the Middle Eastern transport systems. The biggest changes affecting the size and structure of demand for transport are taking place in the technological and innovative aspects of transport, in the structure and technologies of manufacturing, and in society's lifestyle.

Opening the market to new carriers, greater competition and decreasing ticket prices attract more people to air travel. The trend is expected to continue over the next few years, provided new airports appear and old ones are modernised. Accordingly, there is a need to make predictions in the form of passenger flow forecasts at existing and new civil airports that would give, even to a limited extent, an overview of future scenarios. It is therefore important to stress the significance of airline forecasts as the basis for not only financial planning, but also investment and infrastructure planning. For example, the number of passengers in various categories (e.g. arrivals, departures, in transit) determines requirements concerning a terminal's throughput. Passenger traffic is linked to many factors, and the inclusion of the time factor alone is a considerable simplification. It is a well-known fact that air passenger transportation is influenced by various factors, including the population of the country, the future level of Gross Domestic Product (GDP), consumption levels, the value and volume of foreign exchange, etc. To a certain extent, such forecasts enable the right decisions on future activities in the analysed area to be taken. Thanks to the use of suitable forecasting methods, key decisions become more justified and substantiated by an appropriate analysis.

The traditional approach to generating long-term forecasts consists of statistical methods involving time series and econometric models in order to extrapolate observable growth patterns (gravity models, analyses and variants) (Gardner & Mckenzie, 1985; 1988; 1989). Forecasting airline passenger traffic intensity using (regional) air models involves determining the demand for air transport in the region. Another important aspect is the seasonality of air transport. Seasonal variations make it necessary to monitor passenger traffic intensity, which includes: the number of passengers (current traffic, traffic incoming from other airports, generated traffic), airfare (if applicable), and the classification of travel according to origin/destination (for origin-destination and connecting flights), in each month of the year and on each day of the month. Currently, there are many applications of data mining in the aviation sector, as research has been undertaken to forecast passenger flows on domestic or international flights in specific cities and airports (Cho, 2003; Kulendran & Witt, 2003; Chen, 2006; Feldhoff et al., 2012; Kim et al., 2003; Lu et al., 2009; Cuhadar, 2014; Boccaletti et al., 2014; Huang et al., 2015; Zou et al., 2014; Wang et al., 2014).

Researchers are engaged in a long-term quest to predict airline passenger numbers through analysing past patterns (Van der Maaten & Hinton, 2008). The choice of method used largely depends on the research question and the data available (Armstrong & Collopy, 1992). An important aspect to be addressed is the viability of the measuring techniques. This Thesis investigated the process of finding practical knowledge in the immense amounts of data stored in databases, data warehouses and other information repositories (Fayyad et al., 1996), a process known as "data mining", as an alternative to previously known mechanisms. Such approaches have been used in various application domains, such as sentiment analysis, object recognition, online advertisement, and social marketing. Data mining for machine learning draws on combinations of different techniques from various fields, such as artificial intelligence, statistics, database systems, and pattern recognition (Riga et al., 2009). This ability to mine data can significantly improve the forecasting capabilities of current methods.

A vast amount of literature exists relating to approaches to modelling airline passenger movement, as this has important business implications in real life. Sometimes a whole group of factors is considered in order to model historical passenger movements, and sometimes these movements are modelled on their own. The data structure is complicated, and the historical observation period should be sufficiently long for any data-hungry machine learning technique (Yang & Wu, 2006). A large set of training data is required for this type of method to capture the relationship among factors that could be observed. In the case of those movements that are modelled on their own, the passenger data are modelled by several methods based on different feature engineering, feature extraction, feature derivation, multivariate and univariate time series to capture their movement over time, and from this the expected future movement can be predicted (Bose & Mahapatra, 2001; Opitz & Maclin, 1999).

In this study, the subject of the analysis is Muscat International Airport, with passenger flights being the focus. Four techniques are commonly used for forecast calculations: seasonal exponential smoothing, seasonal ARIMA, artificial neural networks (ANN) using deep learning for optimisation, and principal component analysis (PCA). It was decided that PCA would be used for this study.

PCA is an easy, classical multivariate data analysis technique, which is popular for linear feature extraction as well as data dimension reduction in numerous applications (Bengio, 2013). It has been applied in many areas of information processing to prepare data, due to its distinctive error-reducing and decorrelating properties. PCA starts by compressing most of the information in the original data space into fewer features, maximising the variances in a subspace (Timmerman, 2003). The PCA subspace is spanned by the eigenvectors corresponding to the top eigenvalues of the sample covariance matrix. PCA can also be applied in data preparation for both supervised and unsupervised learning and recognition processes (Turk & Pentland, 1991). Despite its simplicity compared with techniques such as seasonal exponential smoothing, seasonal ARIMA, and artificial neural networks (ANN), PCA is flexible enough to process a wide range of factors, such as an airline's requirements, traffic flow, land uses, and meteorology. Although other algorithms do help, they often lack one or more functionalities that PCA fulfils. This study tests the ability of several models to predict the number of international airline passengers in Oman, in order to gain a better understanding of forecast model selection and combination approaches. The choice of methods requires a detailed study of the research questions, data structure, availability of information, forecasting horizon, etc. The methods chosen for this study have greater flexibility in terms of data size, as PCA can be applied to both large and small datasets. The results are easy to interpret, and the model can be updated as soon as new data points are available. Moreover, PCA is a non-parametric and direct method of obtaining relevant information from unclear datasets. It provides a roadmap for reducing the dimensions of a complex dataset and revealing the simplified structures behind it. However, most PCA methods are not able to realise the desired benefits when they handle real-world, nonlinear data. Here is where the challenge lies. Current implementations of PCA use a correlation matrix obtained by the Pearson correlation coefficient. However, in some cases the Pearson correlation coefficient is limited in the sense that it fails to capture properties of the data other than linear relations. Therefore, in order to conduct further analysis, a modified version of PCA with a kinetic correlation matrix using kinetic energy is proposed.
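To make the preparation step concrete, the following is a minimal R sketch of classical, Pearson-correlation-based PCA using the base function prcomp(). The data frame and its column names are hypothetical stand-ins, not the thesis dataset.

```r
# A minimal sketch of classical PCA as a data-preparation step.
# The frame below is synthetic; its columns are illustrative stand-ins.
set.seed(42)
toy <- data.frame(
  passengers = rnorm(60, mean = 300, sd = 40),  # monthly passengers (1000s)
  fare       = rnorm(60, mean = 120, sd = 15),  # average base fare
  fuel       = rnorm(60, mean = 80,  sd = 10),  # jet fuel price
  gdp        = rnorm(60, mean = 70,  sd = 5)    # GDP index
)

# prcomp() centres and (here) scales the variables, so the components
# are derived from the Pearson correlation structure of the data.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

summary(pca)        # proportion of variance explained by each component
head(pca$x[, 1:2])  # scores on the first two PCs, usable as derived features
```

Swapping the Pearson-based correlation matrix inside this step for a kinetic-energy-based correlation matrix is, in outline, what the modified PCA proposed in Chapter 5 does.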

The research period covers the years 1998 to 2016. Data were presented in monthly cycles. The study consisted of 51,983 observations of numeric variables. Despite the significance of Oman's domestic airline market sector, little to no work has been done

to understand the behaviour of passengers travelling by air transportation in the Oman region. Moreover, no previously reported study has developed and empirically tested PCA algorithm-based models for forecasting airline passenger demand. The primary objective of this study is to address this apparent research gap in the literature. In order to address the research objective, various forms of mathematical expressions were proposed and tested. The study also sought to examine whether a combined approach based on PCA and ANN is a useful tool for this application.

1.2 Research Aims and Objectives

The study aims to develop a novel method, introducing a new modified version of PCA with a kinetic correlation matrix using kinetic energy, and to forecast the number of airline passengers as accurately as possible, given the many forecasting methods discussed in the literature. The efficiency of the modified and traditional versions of PCA is compared by applying them to an airline passenger dataset. Again, the choice between competing methods should depend on the structure of the data, the accuracy and reliability of the forecast values obtained from those methods, and ease of use (Kao et al., 2013; Jammazi, 2012; He et al., 2012; Pal & Mitra, 2009; Goyal & Mehra, 2017). A user's preference also plays a vital role when selecting one model from a pool of closely competing models.

Numerous studies have applied machine learning and deep learning forecasting models to air passenger forecasting (Mueller & Chatterji, 2002; Tu et al., 2008; Zonglei et al., 2008; Xu et al., 2005; Khanmohammadi et al., 2016); however, most published studies concentrate on three specific regions: the USA, Europe and Asia Pacific. To our knowledge, no such studies have been conducted using Oman airport data. All previous studies used only point forecasts when comparing the forecasting performance of methods. When choosing a forecasting model for future use, it is very important to consider the confidence bands around the point forecast values in order to evaluate the performance of the selected model on unforeseen data (Gao et al., 2013; 2016; Deising et al., 2015). This study will reduce this gap in the literature by considering prediction intervals as well as point forecasts while proposing forecasting models for Muscat airport.

It is understood that this study must consider a larger subset of the forecasting models available in the literature, as the data in hand are substantial and consist of several observations of airline passenger data. However, a broader model class can be considered when more features are available. Armstrong and Collopy (1992) argued that a good forecasting model ideally has low prediction error and quantifies the risk associated with the forecast value in terms of prediction intervals. Modelling approaches other than feature engineering, time series, and univariate methods can be used if variables related to the airline passenger data can be measured and made available for modelling. These range from multiple regression models to advanced machine learning algorithms.

The airport authority may be interested in forecasts of airline passenger numbers at frequencies other than monthly, for example daily or weekly. The current models can be updated for data measured at various frequencies, or more advanced models can be applied as the data length increases. Along with airline passenger movement, the authorities may be interested in the future behaviour of other important variables, for example flight delays, as these carry a tremendous economic cost and bring dissatisfaction to airline passengers.

Proper forecasts for air traffic passengers are of great importance for Oman, as tourism is considered one of its most profitable industries (UNWTO, 2016). This study was initiated with the great need for forecasting models for the airports of Oman in mind, and in response to the gap in the literature concerning the absence of prediction intervals around point forecasts. This report will help Muscat airport to build its inaugural forecasting models to track the future trend of air travellers and to make intelligent and wise business decisions. It should be kept in mind that forecasting tasks can vary in many dimensions: the length of the forecast horizon, the size of the test set, forecast error measures, the frequency of the data, etc.

It is unlikely that, once selected, a forecasting method will be better than all other plausible methods all the time. In general, the suitability of the chosen forecasting model should be re-analysed frequently, depending on the current task and whenever a new dataset becomes available.

Hence, a summary of the aims and objectives of the study is as follows:

1. Review the literature to identify the potential forecasting models that can be applied to the aeroplanes datasets;

2. Propose forecasting models for the airline passenger numbers travelling through Muscat airport in Oman;

3. Quantify the uncertainty around the forecast value in terms of the prediction interval.

1.3 Research Hypotheses

Given the above discussion, the research proposed and presented in this Thesis could immensely benefit the air transportation community in the future by providing the means to use the modified version of principal component analysis in forecasting airline passenger numbers, especially when coping with real-world nonlinear data.

Motivated by the above reasoning, the proposed research will focus on answering the following research questions:

1. Are current data mining techniques useful tools for measuring airline passenger numbers?

2. Is the modified version of PCA more effective in data dimension reduction, class separability and classification accuracy than traditional PCA?

3. Does a combined approach based on PCA and ANN enable better prediction in forecasting air passenger numbers?

4. Can machine learning and the modified version of PCA be effectively used, replacing existing statistical and conceptual analysis approaches, in forecasting air passenger numbers?

1.4 Research Contributions

The research conducted within the context of this Thesis has met all of the above objectives and has led to the following original contributions:

1.4.1 Conceptual Contribution

Previous research with respect to the efficiency of the principal component analysis multivariate technique has not been sufficiently detailed or rigorous in the field of air passenger forecasting. PCA has been chosen due to its flexibility in terms of data size, as it can be applied to both large and small datasets (Turk & Pentland, 1991). This lack of detailed investigation leads one to a number of open research

questions. Chapter 5 of this Thesis defines a novel method that concentrates largely on the proposed research topic, which involves feature engineering using PCA and its implementation in deep neural networks.

1.4.2 Technical Contributions

There are two main technical contributions to the state of the art from this Thesis:

a) A machine learning approach based on time series models, different feature engineering, feature extraction, and feature derivation is proposed to improve air passenger forecasting. Correspondingly, a modified version of principal component analysis (PCA) using a different correlation matrix is proposed, obtained by a different correlation coefficient based on kinetic energy, to derive new features. Chapter 5 proposes the use of this modified version of PCA, a statistical approach with a kinetic correlation matrix using kinetic energy, to forecast the number of airline passengers as accurately as possible. The novel approaches of the modified version of PCA (MVPCA) supported by linear and non-linear regression in machine learning proposed in that chapter benefit not only the research community within the airline sector but also those beyond it, specifically those who carry out medical, material inspection and environmental monitoring work. This contribution has resulted in the following conference paper.

S. Alruzaiqi, C.W. Dawson, (2018) Modification Version of Principal Component Analysis with Kinetic Correlation Matrix using Kinetic Energy, Future of Information and Communication Conference (FICC), Singapore, 5th -6th April 2018, and appeared in Volume 886 of the Advances in Intelligent Systems and Computing series in Springer. (Appendix 1).

b) Predicting airline passenger numbers is one of the main concerns of airline services. This Thesis presents a possible combined approach between PCA and artificial neural networks (ANN) for forecasting airline passenger numbers. Due to the small size of the available dataset, PCA is used to reduce the dimensions and the required principal

components (PCs) are selected based on the given rules. These PCs are then treated as inputs to an ANN to make forecasts of the airline passenger numbers. Data from the Oman Airports Management Company (OAMC) are used to compare the results of the proposed model with those of traditional regression models. To the best of our knowledge, there is no existing work that utilises artificial neural networks combined with PCA to predict airline passenger numbers in the Oman region. Therefore, this study discusses the concept of using an ANN to examine several external features that enable better prediction and demand forecasting. Traditional approaches of this kind are not only tedious but are also unable to identify the presence of fine-detail discriminative features between data captured from different groups. In Chapter 4, an experiment is carried out to test the performance of neurons in the hidden layer of an ANN to solve the nonlinear optimisation problem. This approach is work-intensive; therefore, a method is also introduced for improving prediction performance by ensembling predictions made on different engineered feature spaces and replacing outliers that were not predicted correctly. The research outcomes prove the capability of machine learning algorithms using the combined PCA and ANN approach to carry out such discriminative tasks with a very high degree of accuracy. This contribution has resulted in the following conference paper.

S. Alruzaiqi, C.W. Dawson, (2019) Optimizing Deep Learning Model for Neural Network Topology, Computing Conference, London, 16th - 17th July 2019, and appeared in Volume 997 of the Advances in Intelligent Systems and Computing series in Springer. (Appendix 2).
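As an illustration only, the following R sketch shows the shape of the combined pipeline described in contribution (b): PCA reduces a small set of predictors, and the leading components feed a small neural network. The nnet package and the synthetic data are assumptions of this sketch, standing in for the H2O deep learning models used in the Thesis; it is not the thesis implementation itself.

```r
# Hedged sketch of the PCA + ANN combination on synthetic data;
# nnet stands in for the deep learning models used in the Thesis.
library(nnet)

set.seed(1)
X <- matrix(rnorm(120 * 6), ncol = 6)        # six hypothetical predictors
y <- X %*% runif(6) + rnorm(120, sd = 0.1)   # synthetic target

pca <- prcomp(X, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:2]                          # keep the leading components

# linout = TRUE gives a linear output unit (regression, not classification);
# size sets the number of hidden neurons.
fit  <- nnet(pcs, y, size = 5, linout = TRUE, trace = FALSE)
pred <- predict(fit, pcs)                    # fitted values on the PC inputs
```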

1.4.3 Comprehensive Literature Investigation

Chapter 6 provides an insight into how the outcomes of the research conducted within the context of this Thesis were effectively used to improve the performance of the principal component analysis algorithm, enabling the algorithms to be re-designed using different feature engineering methods and showing improvements in the accuracy of the tasks carried out. This work proves the usefulness of the novel research work

carried out for this Thesis and its potential contributions to airline research. The research conducted within the remit of this Thesis has resulted in the following secondary contribution:

• A comprehensive review of the literature to investigate existing work on principal component analysis applied to forecasting airline passenger numbers (Chapter 2).

1.4.4 Data Collection

It was noted that a forecast of airline passenger numbers based on feature engineering and principal component analysis had not been conducted on a dataset of a similar nature prior to this research, and hence no public database was available to support the original research presented in this Thesis. The dataset used here contains thirteen years' worth of monthly data, eight years' worth of monthly tourism data, and more than twenty years' worth of other data. The main distinction of this study from studies undertaken to forecast airline passengers for other airports is that this study has a good amount of airline passenger data, with each set, such as population size, represented in a different time frame (i.e. 1980-2015), from which the future trend of airline passenger movement must be identified in order to develop predictive models to forecast the number of airline passengers for Muscat airport in Oman. Therefore, the outcome of this research will have a profound impact on knowledge in the airline forecasting field by reducing the dimensionality of the data space. Hence it was essential to carry out the tasks relevant to capturing this novel dataset. The primary impact of this work can be grouped into:

a. Analysing different feature extraction and selection strategies, as these are highly effective methods of feature extraction involving a mathematical process which transforms a selection of (possibly) correlated variables into a (smaller) selection of uncorrelated variables known as principal components.

b. Implementing the modified version of PCA: the dataset collected during this research will be made publicly available, and this will itself contribute to the future research community (Chapter 5).

1.5 Thesis Overview

The remainder of this Thesis is organised as follows:

Chapter 2 provides a review of relevant studies that used the time series method of prediction; explains the components that could be present in airline passenger data; and, since we are mostly interested in forecasting airline passenger numbers, gives a description of some of the widely used forecasting methods. Furthermore, a review of time series, feature engineering, and deep learning studies is provided in this chapter.

Chapter 3 introduces the research background and covers details of the forecasting models used in the field of aviation and the theoretical background of various statistical and machine learning algorithms used in this Thesis for data analysis. Multiple evaluation metrics are used to measure the forecast accuracy. It provides details of the analysis design, the processes adopted in carrying out the subjective experiments, and data preparation for the analysis to be conducted in the contributory chapters that follow.

Chapter 4 proposes the use of machine learning algorithms based on time series models, different feature engineering, feature extraction, and feature derivation to improve air passenger forecasting. An experiment was undertaken with artificial neural networks to test the performance of neurons in the hidden layer to optimise the dimensions of all layers and to obtain an optimal choice of connection weights – thus the nonlinear optimisation problem could be solved directly. A method of tuning deep learning models using H2O is also proposed.

Chapter 5 proposes the novel use of PCA, involving a modified version named the Modified Version of Principal Component Analysis, for the prediction analysis when conducting the assigned tasks. A performance evaluation is also presented.

Finally, Chapter 6 concludes with an insight into future work, including details of a separate project conducted to prove the concepts that are the outcomes of the research presented in this Thesis.

Chapter 2: Literature Review

2.1 Introduction

The aim of this chapter is to provide an extensive review of the research areas that are relevant to the work undertaken in this Thesis. To this end, the chapter presents previous research on passenger flow forecasting using a number of methodologies and techniques with varying degrees of success. The methodologies used in this work include both "top-down" and "bottom-up" approaches. The top-down approach is based on a single aggregated forecast, that is, forecasting the total passenger numbers of a country; this method can also be applied to individual airports using their historical passenger data. The bottom-up approach, on the other hand, builds forecasts for individual airports and aggregates them to obtain an overall forecast. Both top-down and bottom-up methods are widely used to forecast passenger flows nowadays. The typical applications, attributes and limitations of these methods are discussed in this chapter.

2.2 Traditional Time Series Forecasting

The rapid growth of global air traffic has attracted numerous studies. Global air passenger traffic grew at an annual rate of 5.4% between 1970 and 2012 (IBRD-IDA, 2012). Over the past four decades, many studies on air traffic demand have been performed, providing abundant literature dealing with the determinants of air traffic demand. Prediction of future traffic demand based on different forecasting techniques has become an important factor in airport development planning (Suryan et al., 2016).

In order to determine the feasibility of an airport, it is necessary to forecast its demand over its design life (Hyndman & Athanasopoulos, 2013). Air traffic forecasting has a long history because future traffic demand has a direct effect on the airline industry (Profillidis, 2000). There are many methods available for improving the forecast result, two of which have been studied in the research presented in this Thesis: applying and improving conventional time series methods, and obtaining information from the dataset. A statistical model based on time series is a state-of-the-art method for predicting demand. It can be summarised in three approaches: the univariate time series smoothing and forecasting approach; the multivariate

31 approach based on Principal Component Analysis (PCA); and approaches based on econometric modelling (Box & Jenkins, 1976).

One advantage of a univariate method is that it is simple to explain and straightforward to apply compared with other data-greedy machine learning algorithms. Applicability to small datasets and ease of use in real-life scenarios are other important reasons why univariate time series forecasting methods are used (Tsui et al., 2011).

Forecasting using traditional time series is a well-researched area (Taneja, 1987; Nam & Schaefer, 1995; Weatherford et al., 2003; Hyndman & Koehler, 2006). Time series forecasting tries to find patterns and regularities in a sequence of data points, which are then used to project future values. It offers different degrees of flexibility and complexity, so many ways of generating forecasts are available, and it is possible to arrive at more than one forecasting result for the same problem. Therefore, the question arises of whether all or some of the individual forecasts can be used in combination to obtain a superior forecast. The general forecasting methods, and combinations thereof, are presented in Sections Two, Three and Four of this chapter, while Section Three also presents a popular and easy-to-use empirical evaluation approach.

This section continues by describing a selection of popular time series forecasting techniques (Makridakis et al., 1998; Box et al., 2015). The background to these methods is introduced in the later sections of this Thesis.

2.3 Simple Forecasting Methods

Some forecasting methods are very simple yet provide accurate forecasts. Usually, they are used as benchmarks for more complicated models: if a complex model does not provide better forecasts than these simple models, then it is not worth considering. These methods are discussed in the following sections of this chapter.

2.3.1 Average Method

Setting the forecast to the average level of the observed data can produce the best prediction. If $y_1, y_2, \dots, y_T$ is the observed series for $t = 1, \dots, T$, then the forecast is set to the average value:

$$\hat{y}_{T+h|T} = \bar{y} = \frac{y_1 + y_2 + \dots + y_T}{T} \qquad (2.1)$$

where $h = 1, \dots, H$ is the forecasting period (Clemen, 1989).

2.3.2 Naïve Method

In this method, all future values are set to the last observed value:

$$\hat{y}_{T+h|T} = y_T, \quad \text{for all } h \qquad (2.2)$$

As stated by Winkler and Clemen (1992), this method often provides accurate forecasts for different economic and financial time series datasets in practice.

2.3.3 Seasonal Naïve Method

For seasonal data, the forecast is set to the value observed in the same period of the previous year, for example, using the same quarter of the previous year in the case of quarterly data.

The forecast equation can be written as:

$$\hat{y}_{T+h|T} = y_{T+h-km} \qquad (2.3)$$

where $m$ is the seasonal period, $k = \lfloor (h-1)/m \rfloor + 1$, and $\lfloor u \rfloor$ denotes the integer part of $u$. For monthly data, the forecast values for all future Julys are the same as the value observed in the most recent July (see Makridakis, 1993).

2.3.4 Drift Method

The forecasts from the naïve method are adjusted by allowing them to move upward or downward gradually.

The amount of change over time, known as the drift, is set equal to the average change observed in the historical data. The forecasts can be calculated using:

$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}\left(y_t - y_{t-1}\right) = y_T + h\left(\frac{y_T - y_1}{T-1}\right) \qquad (2.4)$$

Even though these methods are very simple, they can be very effective for small datasets and short forecast horizons. These methods also serve as benchmarks for more sophisticated models (Clemen & Winkler, 1986; Graham, 1996; Zarnowitz, 1984).
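As a hedged sketch of how these four benchmarks can be produced in practice, the following uses Hyndman's forecast package in R (assumed tooling), with the built-in AirPassengers series standing in for the thesis data:

```r
# The four simple benchmark methods of Section 2.3, via the 'forecast'
# package. AirPassengers is a stand-in monthly airline series (1949-1960).
library(forecast)

y <- AirPassengers
h <- 12                                  # one-year forecast horizon

fc_mean  <- meanf(y, h = h)              # average method (Eq. 2.1)
fc_naive <- naive(y, h = h)              # naive method (Eq. 2.2)
fc_snv   <- snaive(y, h = h)             # seasonal naive method (Eq. 2.3)
fc_drift <- rwf(y, h = h, drift = TRUE)  # drift method (Eq. 2.4)

# In-sample accuracy, to use as a baseline for more complex models.
accuracy(fc_mean); accuracy(fc_snv); accuracy(fc_drift)
```

Any candidate model for the thesis data would need to beat baselines such as these to justify its extra complexity.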

2.4 Exponential Smoothing

Proposed in the late 1950s, exponential smoothing became one of the most successful forecasting methods for univariate time series. In this method, forecasts are weighted averages of historical observations, in which the weights decrease exponentially as the observations get older. Consequently, the more recent observations attract higher weights (Holt, 1957). It is necessary to identify the intrinsic seasonal structure in airline traffic data if one is interested in monthly or quarterly forecasts. This method generates reliable forecasts quickly and is quite useful if the dataset to hand is small. The exponential smoothing approach includes various types of forecasting models that can be applied to time series with various characteristics (in an additive or multiplicative manner). Using the statistical framework of these models, point forecasts as well as prediction intervals can be obtained. The prediction interval is a useful piece of information in planning the services needed to provide for the passengers expected to travel through an airport (Gardner & Mckenzie, 1985; 1988; 1989). The various models in the exponential smoothing family are described in the following subsections.

2.4.1 Simple Exponential Smoothing

Simple exponential smoothing (SES) is the most elementary form of the exponential smoothing modelling approach. It is also known as single exponential smoothing in some of the literature. This model is applicable if the time series has a linear evolution (Hyndman & Athanasopoulos, 2013). In this case, the forecasts can be calculated through a weighted average of the historical data. The equation for a one-step-ahead forecast from this model is:

$$\hat{y}_{t+1|t} = \alpha y_t + \alpha(1-\alpha)y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots \qquad (2.5)$$

where $0 \le \alpha \le 1$ is the smoothing parameter. The parameter $\alpha$ regulates the rate at which the weights diminish. Hence, the one-step-ahead forecast for time $t+1$ is a weighted average of all the observations in the series $y_1, y_2, \dots, y_t$.

Equation 2.5 can also be rewritten in component form. Having only one component, the level ($l_t$), the forecast equation and the smoothing equation take the form:

Forecast equation: $\hat{y}_{t+1|t} = l_t \qquad (2.6)$

Smoothing equation: $l_t = \alpha y_t + (1-\alpha) l_{t-1} \qquad (2.7)$

where $l_t$ is the level (or the smoothed value) of the series at time $t$. The forecast equation sets the forecast at time $t+1$ to nothing other than the estimated level at time $t$. The smoothing equation applies the smoothing factor to all previous historical observations and provides an approximate calculation of the level of the series at each time period $t$ (McNees, 1992; Armstrong, 2001). Through this exponential process, the model has a 'flat' forecast function (Hyndman & Athanasopoulos, 2013), and for subsequent periods the model becomes:

$$\hat{y}_{t+h|t} = \hat{y}_{t+1|t}, \quad h = 2, 3, \dots \qquad (2.8)$$
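A brief sketch of SES with the same assumed forecast package (and the same stand-in series) shows both a fixed and an estimated smoothing parameter, and the flat forecast function of Equation 2.8:

```r
# Simple exponential smoothing (Eqs. 2.5-2.8) on a stand-in series.
library(forecast)

fit_fixed <- ses(AirPassengers, h = 6, alpha = 0.3)  # alpha fixed by hand
fit_est   <- ses(AirPassengers, h = 6)               # alpha estimated from data

fit_est$model   # inspect the fitted alpha and initial level
fit_est$mean    # h-step forecasts: identical values, i.e. a flat function
```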

2.4.2 Holt's Linear Trend Method

Holt extended simple exponential smoothing to apply to time series with a trend (Holt, 1957). This method has two smoothing parameters and is also referred to as double exponential smoothing. It requires a forecast equation, one smoothing equation for the level, and a second smoothing equation for the trend:

Forecast equation: $\hat{y}_{t+h|t} = l_t + h b_t \qquad (2.9)$

Level equation: $l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1}) \qquad (2.10)$

Trend equation: $b_t = \beta(l_t - l_{t-1}) + (1-\beta) b_{t-1} \qquad (2.11)$

In this case:

• $l_t$ indicates an approximate calculation of the level of the series at time $t$;

• $b_t$ indicates an approximate calculation of the trend (i.e., slope) of the series at time $t$;

• $\alpha$ ($0 \le \alpha \le 1$) represents the smoothing parameter for the level; and

• $\beta$ ($0 \le \beta \le 1$) indicates the smoothing parameter for the trend.

The forecast function is trending rather than flat. It grows linearly with $h$: the $h$-step-ahead forecast equals the last estimated level plus $h$ times the last estimated trend value.

Önder and Kuzu (2014) used Holt's linear exponential smoothing method to forecast air traffic passenger numbers for 15 airports in Turkey for the years 2013 to 2023, using ten years' worth of historical data from 2002 to 2012. They found that Holt's linear exponential smoothing technique was a successful prediction method for monthly time series data; however, their study dealt with point forecasts only and did not incorporate the risk associated with the forecast values.
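A sketch of Holt's method with the same assumed tooling; the fitted object exposes the estimated smoothing parameters, and the forecast path is trending rather than flat:

```r
# Holt's linear trend method (Eqs. 2.9-2.11) on the stand-in series.
library(forecast)

fit <- holt(AirPassengers, h = 12)
fit$model   # estimated alpha (level), beta (trend) and initial states
fit$mean    # forecasts grow linearly with the horizon h
```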

2.4.3 Exponential Trend Method

In Holt's linear trend model, the trend in the forecast function is linear and the estimated trend is added to the estimated level. A variation of this method is to multiply the estimated level by the estimated slope, so that the series exhibits a constant growth rate rather than a constant slope, making the forecast function exponential. The method takes the form:

Forecast equation: $\hat{y}_{t+h|t} = l_t b_t^h \qquad (2.12)$

Level equation: $l_t = \alpha y_t + (1-\alpha)\, l_{t-1} b_{t-1} \qquad (2.13)$

Trend equation: $b_t = \beta \dfrac{l_t}{l_{t-1}} + (1-\beta)\, b_{t-1} \qquad (2.14)$

where $b_t$ represents an estimated relative growth rate (Holt, 1957).
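In the forecast package this variant is, to the best of our knowledge, available through holt() with exponential = TRUE; a brief sketch under that assumption:

```r
# Exponential (multiplicative) trend method (Eqs. 2.12-2.14).
library(forecast)

fit <- holt(AirPassengers, h = 12, exponential = TRUE)
fit$model   # b_t is now a relative growth rate multiplying the level
```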

2.4.4 Damped Trend Methods

Both Holt's linear method and the exponential trend method tend to over-forecast, especially when the forecast horizon is long. This is because Holt's linear method has a constant trend (so the increase or decrease continues indefinitely) and the exponential trend method has an exponential growth rate, which can grow or decline indefinitely. To overcome this shortcoming of these otherwise useful methods, the additive damped trend method, which adds a new 'damping' parameter to Holt's linear method, has been proposed (Gardner & Mckenzie, 1985; 1988; 1989). The damping parameter flattens the trend as the forecast horizon grows long.

3.4.5 Additive Damped Trend If the damping parameter is φ ( 0 ≤ φ ≤1), the additive damped trend method has following specification as stated by (Makridakis & Hibon, 2000 ):

Forecast equation:  \hat{y}_{t+h|t} = l_t + (\phi + \phi^2 + \ldots + \phi^h) b_t    (3.15)

Level equation:  l_t = \alpha y_t + (1-\alpha)(l_{t-1} + \phi b_{t-1})    (3.16)

Trend equation:  b_t = \beta(l_t - l_{t-1}) + (1-\beta)\phi b_{t-1}    (3.17)

The additive damped trend model is identical to Holt's linear model if \phi = 1. For values between 0 and 1, \phi dampens the trend so that it settles to a constant value some time in the future. The forecasts converge to l_t + \phi b_t/(1-\phi) as h \to \infty for any 0 < \phi < 1. The addition of a damped trend has been demonstrated to be advantageous when forecasts are required automatically for many series (Hyndman & Athanasopoulos, 2013).
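This convergence can be verified numerically; the level, trend and phi values below are arbitrary illustrative choices:

    import numpy as np

    # Damped-trend forecast function (Eq. 3.15): as h grows, the forecasts
    # approach level + phi * trend / (1 - phi).
    level, trend, phi = 100.0, 2.0, 0.8
    h = np.arange(1, 51)
    damped_sum = phi * (1 - phi ** h) / (1 - phi)            # phi + phi^2 + ... + phi^h
    forecasts = level + damped_sum * trend
    print(forecasts[-1], level + phi * trend / (1 - phi))    # both close to 108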

3.4.6 Multiplicative Damped Trend

Taylor proposed a multiplicative damped trend model by adding a damping parameter to the exponential trend method (Taylor, 2003). This was motivated by the forecast efficiency observed in the additive trend case. Compared with Holt's linear method, the multiplicative damped trend method produces less conservative forecasts than the additive damped one. The model takes the form:

Forecast equation:  \hat{y}_{t+h|t} = l_t b_t^{(\phi + \phi^2 + \ldots + \phi^h)}    (3.18)

Level equation:  l_t = \alpha y_t + (1-\alpha) l_{t-1} b_{t-1}^{\phi}    (3.19)

Trend equation:  b_t = \beta \frac{l_t}{l_{t-1}} + (1-\beta) b_{t-1}^{\phi}    (3.20)

3.4.7 Holt-Winters Seasonal Method

In Holt's linear model, only the trend or the trend-cycle component is modelled (Holt, 1957). The Holt-Winters method extends it to include seasonality, and consists of the forecast equation along with three smoothing equations: one for the level (l_t), one for the trend (b_t) and one for the seasonal component (s_t). The corresponding smoothing parameters are \alpha, \beta and \gamma, while m indicates the seasonal period per year; for example, m = 4 for quarterly data and m = 12 for monthly data. As with the exponential trend method, there are two variations of the Holt-Winters seasonal method: the additive method and the multiplicative method. The choice between the two depends on the nature of the seasonal component present in the dataset.

3.4.8 Holt-Winters Additive Seasonal Method

In the additive method:

• the seasonal component is expressed in absolute terms on the scale of the analysed series;

• the series is seasonally adjusted by subtracting the seasonal component in the level equation; and

• the seasonal component will add up to approximately zero within each year.

Forecast equation:  \hat{y}_{t+h|t} = l_t + h b_t + s_{t-m+h_m^+}    (3.21)

Level equation:  l_t = \alpha(y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1})    (3.22)

Trend equation:  b_t = \beta(l_t - l_{t-1}) + (1-\beta) b_{t-1}    (3.23)

Seasonal equation:  s_t = \gamma(y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}    (3.24)


where h_m^+ = \lfloor (h-1) \bmod m \rfloor + 1, which ensures that the estimates of the seasonal indices used for forecasting come from the final year of the sample. The notation \lfloor u \rfloor denotes the largest integer not greater than u. The level equation shows a weighted average combining the seasonally adjusted observation (y_t - s_{t-m}) and the non-seasonal forecast (l_{t-1} + b_{t-1}) for time t. The trend equation is the same as in Holt's linear model. The seasonal equation shows a weighted average combining the current seasonal index (y_t - l_{t-1} - b_{t-1}) and the index of the same season of the previous year (i.e., m time periods earlier). Therefore, when the seasonal variations are roughly constant throughout the series, the additive method is preferred (Clemen, 1989).

3.4.9 Holt-Winters Multiplicative Seasonal Method

The multiplicative version is suitable when the seasonal variations change with the level of the series. As opposed to the additive version, in the multiplicative method:

• the seasonal component is expressed in relative (percentage) terms;

• the series is seasonally adjusted by dividing by the seasonal component; and

• the seasonal component will add up to approximately m within each year.

The multiplicative method is represented in the following form:

Forecast equation:  \hat{y}_{t+h|t} = (l_t + h b_t) s_{t-m+h_m^+}    (3.25)

Level equation:  l_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(l_{t-1} + b_{t-1})    (3.26)

Trend equation:  b_t = \beta(l_t - l_{t-1}) + (1-\beta) b_{t-1}    (3.27)

Seasonal equation:  s_t = \gamma \frac{y_t}{l_{t-1} + b_{t-1}} + (1-\gamma) s_{t-m}    (3.28)

See (Holt, 1957).

3.4.10 Holt-Winters Damped Method

For many seasonal time series, the Holt-Winters method with a damped trend and multiplicative seasonality performs better than any other competing forecasting method (Holt, 1957):

Forecast equation:  \hat{y}_{t+h|t} = [l_t + (\phi + \phi^2 + \ldots + \phi^h) b_t] s_{t-m+h_m^+}    (3.29)

Level equation:  l_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(l_{t-1} + \phi b_{t-1})    (3.30)

Trend equation:  b_t = \beta(l_t - l_{t-1}) + (1-\beta)\phi b_{t-1}    (3.31)

Seasonal equation:  s_t = \gamma \frac{y_t}{l_{t-1} + \phi b_{t-1}} + (1-\gamma) s_{t-m}    (3.32)
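In practice, these methods need not be coded by hand. A minimal sketch using the open source statsmodels library (assuming a recent version in which the estimator is called ExponentialSmoothing and accepts a damped_trend argument; the synthetic monthly series stands in for real passenger data):

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Synthetic monthly series with trend and m = 12 seasonality.
    rng = np.random.default_rng(0)
    t = np.arange(120)
    y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)

    model = ExponentialSmoothing(
        y, trend="add", damped_trend=True,      # damped additive trend
        seasonal="add", seasonal_periods=12,    # additive seasonality
    ).fit()
    print(model.forecast(12))                   # 12-month-ahead forecasts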

3.5 Regression

In the regression approach, a forecast (dependent) variable is expressed as a function of one or more related independent (explanatory) variables. Formula (3.33) expresses a simple linear regression on a single variable, where a is the intercept, b the slope of the line and \varepsilon the error term, which captures the deviation of the actual observation from the linear relationship.

y = a + bx + \varepsilon    (3.33)

In Eq. (3.33), x represents the time index. The regression parameters can be estimated using the standard least squares approach (Clemen & Winkler, 1986; Graham, 1996; Zarnowitz, 1984).
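For example, with x as the time index, the least squares estimates of a and b in Eq. (3.33) can be obtained in one line (the series values are illustrative):

    import numpy as np

    x = np.arange(1, 11, dtype=float)                # time index
    y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119], dtype=float)
    b, a = np.polyfit(x, y, deg=1)                   # polyfit returns [slope, intercept]
    print(a, b)                                      # intercept and slope of Eq. (3.33)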

3.5.1 Decomposition and Theta-Model

The aim of decomposition is to separately project the isolated components of a time series into the future and then produce a final forecast by recombining them. Traditionally, the components are:

• a trend-cycle component capturing long-term changes;

• a seasonal component capturing shorter-term changes of constant length, such as months or holiday periods; and

• a random or irregular error component.

The Theta-model was proposed more recently (Assimakopoulos & Nikolopoulos, 2000). In this model, the seasonally adjusted series is decomposed into long- and short-term components, and the curvature of the time series is modified by applying a coefficient \theta (shown in Formula (3.34)) to the time series' second-order differences.

y''_{new}(\theta) = \theta \cdot y''_{original}    (3.34)

A theta value greater than one dilates the series and amplifies its short-term behaviour, while theta values between zero and one have the opposite effect (Croushore, 1993).
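A sketch of this modification is given below. It scales the second differences by theta (Eq. 3.34) and rebuilds the series using the original first value and first difference as boundary conditions; this choice of boundary conditions is an illustrative assumption (the classical theta method instead fits the theta = 0 line by least squares):

    import numpy as np

    def theta_line(y, theta):
        d2 = theta * np.diff(y, n=2)          # theta * y''_original, Eq. (3.34)
        first = y[1] - y[0]                   # keep the original initial slope
        diffs = np.concatenate(([first], first + np.cumsum(d2)))
        return np.concatenate(([y[0]], y[0] + np.cumsum(diffs)))

    y = np.array([112, 118, 132, 129, 121, 135], dtype=float)
    print(theta_line(y, theta=0.0))           # theta = 0 removes all curvature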

3.5.2 Autoregressive Integrated Moving Average Model

One of the most widely used univariate time series forecasting models is the Autoregressive Integrated Moving Average (ARIMA) process (Box et al., 2015). The ARIMA model is a combination of autoregressive, moving average and difference parameters.

In an observed series, the autoregressive parameters control the effect of lagged observations, whereas the moving average parameters control the effect of past innovations. The order of the difference parameter indicates the number of differences needed to render a series stationary. The structure of the model also varies depending on the seasonal pattern of a process. A time series with trend and seasonal patterns can be hard to forecast, as the trend and seasonality affect the value of the time series at different time points. Sometimes the trend and seasonal components are removed, or the series is made stationary, before fitting any forecasting model. In a stationary time series, the properties do not vary with the times at which the series is observed; usually, there are no predictable long-lasting patterns. The time plot of a stationary series is roughly horizontal with constant variance. Several statistical tests exist in the literature to check the stationarity of a time series; a detailed discussion of such topics is beyond the scope of this Thesis.

A time series can be rendered stationary by calculating the differences between consecutive observations (McNees, 1992; Armstrong, 2001). This is known as differencing and can be written as:

y'_t = y_t - y_{t-1}    (3.35)

Differencing stabilises the mean of a time series by eliminating changes in its level, thereby also removing trend and seasonality. Sometimes the observed series may not be stationary after taking the first difference, and a second-order difference may be required to make the series stationary (McNees, 1992; Armstrong, 2001):

y''_t = y'_t - y'_{t-1}    (3.36)

= (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})    (3.37)

= y_t - 2y_{t-1} + y_{t-2}    (3.38)

Calculating the difference between an observation and the corresponding observation in the previous year gives the seasonal difference (McNees, 1992; Armstrong, 2001):

y'_t = y_t - y_{t-m}, where m = number of seasons.    (3.39)
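These operations are direct to compute; the sketch below applies Eqs. (3.35), (3.38) and (3.39) to two years of illustrative monthly values:

    import numpy as np

    y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
                  115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
                 dtype=float)

    first_diff = np.diff(y)          # Eq. (3.35): y_t - y_{t-1}
    second_diff = np.diff(y, n=2)    # Eq. (3.38): y_t - 2*y_{t-1} + y_{t-2}
    m = 12
    seasonal_diff = y[m:] - y[:-m]   # Eq. (3.39): y_t - y_{t-m}
    print(first_diff[:3], second_diff[:3], seasonal_diff[:3])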

To distinguish seasonal differences from ordinary differences, the latter are referred to as 'first differences' or 'differences at lag 1'. It is sometimes necessary to apply both a first difference and a seasonal difference to obtain stationary data. The backward shift operator B is a very helpful notation when dealing with time series lags (McNees, 1992; Armstrong, 2001):

B y_t = y_{t-1}    (3.40)

In the literature, the 'lag' operator L is also used in place of the 'backshift' operator B. When B operates on y_t, it shifts the data back one period; hence, two applications of B to y_t shift the data back two periods (McNees, 1992; Armstrong, 2001):

B(B y_t) = B^2 y_t = y_{t-2}    (3.41)

and applying it to monthly data gives:

B^{12} y_t = y_{t-12}    (3.42)

The operator B is useful for describing the differencing procedure, which is at the core of the ARIMA structure. Using the backward shift operator, the differences can be presented as below (Goodrich, 1984; Goodrich, 1986; McNees, 1992; Goodrich, 2000; Goodrich, 2001):

First-order difference:  y'_t = y_t - y_{t-1} = (1 - B) y_t    (3.43)

Second-order difference:  y''_t = y_t - 2y_{t-1} + y_{t-2} = (1 - 2B + B^2) y_t = (1 - B)^2 y_t    (3.44)

Seasonal difference:  y_t - y_{t-m} = (1 - B^m) y_t    (3.45)

An observed time series requires differencing until it becomes white noise, i.e. uncorrelated in time, with zero mean and constant variance. When the differenced series is white noise, the random walk model can be fitted to the original series, which can be written as (Goodrich, 1984; Goodrich, 1986; McNees, 1992; Goodrich, 2000; Goodrich, 2001):

y_t - y_{t-1} = e_t  or  y_t = y_{t-1} + e_t    (3.46)

The error e_t is usually specified as white noise. Another version of the random walk model allows the differences to have a non-zero mean and is written as:

y_t - y_{t-1} = c + e_t  or  y_t = c + y_{t-1} + e_t,    (3.47)


where c is the average of the changes between consecutive observations (Goodrich, 1984; Goodrich, 1986; McNees, 1992; Goodrich, 2000; Goodrich, 2001).

After obtaining a stationary time series by appropriate differencing, and stabilising the variance by taking log transformations, the ARIMA model can be fitted to the differenced data. The ARIMA process is a combination of two basic linear models: the autoregressive (AR) process and the moving average (MA) process. One aspect of these models is that they offer an acceptable approximation of the data generating process (DGP) in a world of Gaussian distributions. In terms of forecasting, these models can provide accurate forecasts as long as the DGP is Gaussian and non-linear features are not strong. Even for some datasets with known non-linear dynamics, this modelling process generates close forecasts, since the non-linearity may not be constant over time and may not be strong enough to yield prediction improvements.
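A minimal sketch of this workflow with the statsmodels ARIMA estimator (the order (1, 1, 1) and the illustrative series are assumptions, not choices made in this study):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Log-transform to stabilise the variance, difference once (d = 1) to
    # remove the trend, then fit AR and MA terms.
    y = np.log(np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
                         104, 118, 115, 126, 141, 135, 125, 149, 170, 170,
                         158, 133, 114, 140], dtype=float))
    fit = ARIMA(y, order=(1, 1, 1)).fit()
    print(np.exp(fit.forecast(steps=3)))   # back-transform to the original scale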

3.5.3 Autoregressive Models (AR)

Autoregressive approaches have proved to be among the more accurate time series models, since they can be tested and fully estimated within the framework of least squares regression. The term autoregression denotes a regression of the variable on itself. An autoregressive model of order p is known as AR(p) (Hyndman & Athanasopoulos, 2013) and can be represented by Formula (3.48):

y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + e_t,    (3.48)

where c is a constant and e_t is white noise. This looks like a multiple regression, except that lagged values of y_t serve as the predictors. Autoregressive models are very flexible at handling a variety of different time series patterns. Generally, AR models are restricted to stationary data, and some constraints on the parameter values are needed.

For example:

• For an AR(1) model, y_t = c + \phi_1 y_{t-1} + e_t, the restriction is -1 < \phi_1 < 1.

• For an AR(2) model, y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + e_t, the restrictions are -1 < \phi_2 < 1, \phi_1 + \phi_2 < 1, and \phi_2 - \phi_1 < 1.

More complicated constraints are imposed when p ≥ 3 and care should be taken during the estimation process.

Stationary autoregressive processes have two characteristics:

• Autocorrelation function: ACF(j) = \rho_j = corr(y_t, y_{t-j}).

• Partial autocorrelation function: PACF(j) = corr(y_t, y_{t-j} | y_{t-1}, \ldots, y_{t-j+1}).

The existence of an AR(p) process in an observed time series is checked using the ACF and PACF. An AR(p) process is believed to exist in a time series if the sample ACF(j) converges to zero at a geometric rate as j \to \infty and the sample PACF(j) equals zero for lags greater than p (Box et al., 2015; Makridakis et al., 1998).
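This identification step can be illustrated by simulating an AR(2) process and inspecting the sample ACF and PACF (the coefficients 0.5 and 0.3 satisfy the stationarity restrictions above and are purely illustrative):

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    rng = np.random.default_rng(1)
    y = np.zeros(500)
    for t in range(2, 500):
        y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

    print(np.round(acf(y, nlags=5), 2))    # decays geometrically
    print(np.round(pacf(y, nlags=5), 2))   # cuts off after lag p = 2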

For an AR(p) model, the one-step-ahead point forecast equation is obtained by replacing the true coefficients with the in-sample estimates and the error term e_t by 0 (Hyndman & Athanasopoulos, 2013):

\hat{y}_{t+1|t} = \hat{c} + \hat{\phi}_1 y_t + \hat{\phi}_2 y_{t-1} + \ldots + \hat{\phi}_p y_{t-p+1}    (3.49)

The subsequent forecast function results from replacing the as-yet-unobserved value of y_{t+1} by its forecast (Hyndman & Athanasopoulos, 2013). The two-step-ahead forecast function can be represented as:

\hat{y}_{t+2|t} = \hat{c} + \hat{\phi}_1 \hat{y}_{t+1|t} + \hat{\phi}_2 y_t + \ldots + \hat{\phi}_p y_{t-p+2}    (3.50)

The h -step-ahead forecasts can be acquired in the same manner.

3.5.4 Moving Average Models (MA)

Instead of past observations, the past forecast errors are used as predictor variables in a moving average model. A moving average model with q parameters, referred to as MA(q) (McNees, 1992; Armstrong, 2001), is represented as:

y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \ldots + \theta_q e_{t-q}    (3.51)

where e_t is white noise. The values of e_t are unobserved, so Eq. (3.51) is not a regression model in the usual sense. Modifying the parameters \theta_1, \ldots, \theta_q leads to different time series patterns. As with autoregressive models, changes in the variance of the error term e_t will modify the scale of the series, but not the pattern. Any stationary AR(p) model can be written as an infinite MA(\infty) model by repeated substitution of the parameters (Box et al., 2015). The reverse holds if some constraints are placed on the MA parameters, i.e., an MA(q) model can be written as an infinite AR(\infty) process; the MA model is then called 'invertible'.

The invertibility constraints are similar to the stationarity constraints. For example:

• For an MA(1) model: -1 < \theta_1 < 1.

• For an MA(2) model: -1 < \theta_2 < 1, \theta_1 + \theta_2 > -1, and \theta_1 - \theta_2 < 1.

As with AR models, the constraints imposed on the MA parameters when q \ge 3 should be considered during the estimation process. The ACF of an MA(q) process becomes 0 after lag j = q, while the PACF converges to 0 geometrically. Forecasting from an MA model requires estimation of the coefficients \theta_j and the errors e_t. Since the error term is unobservable at any time period, it is replaced by the previous periods' forecast errors \hat{e}_t. For an MA(q) model (McNees, 1992; Armstrong, 2001), the one-step-ahead forecast function is:

\hat{y}_{t+1|t} = \hat{c} + \hat{\theta}_1 \hat{e}_t + \hat{\theta}_2 \hat{e}_{t-1} + \ldots + \hat{\theta}_q \hat{e}_{t-q+1},    (3.52)

The two-step-ahead forecast function is:

\hat{y}_{t+2|t} = \hat{c} + \hat{\theta}_2 \hat{e}_t + \hat{\theta}_3 \hat{e}_{t-1} + \ldots + \hat{\theta}_q \hat{e}_{t-q+2}    (3.53)

This implies that the forecasts converge to the mean of the series after a few steps. The moving average model should not be confused with moving average smoothing: the latter is widely adopted for estimating the trend-cycle of past values, whereas the former is a forecasting model.

3.5.5 Non-seasonal ARIMA Model

A non-seasonal univariate ARIMA model is generally expressed as ARIMA(p, d, q), with the following specifications:

• p: the number of autoregressive parameters;

• d: the number of non-seasonal differences needed to make the series stationary; and

• q: the number of moving average parameters.

If the series \{y_t\}, t = 1, \ldots, T follows an ARIMA(p, d, q) model (Hyndman & Khandakar, 2008), then it can be expressed as:

\phi_p(B)(\nabla^d y_t - \mu) = \theta_q(B) e_t,    (3.54)

where B is the back-shift operator.

B y_t = y_{t-1}    (3.55)

\nabla^d = (1 - B)^d    (3.56)

denotes the d-th order non-seasonal difference operator, and powers of B shift the series back by the corresponding number of periods:

B^d y_t = y_{t-d}    (3.57)

\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p,    (3.58)

\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q, and    (3.59)

e_t is iid N(0, \sigma_e^2).    (3.60)

3.5.6 Seasonal ARIMA Model

The common structure of a seasonal univariate ARIMA model is ARIMA(p, d, q)(P, D, Q)_s, with the form:

\phi_p(B) \Phi_P(B^s)(\nabla^d \nabla_s^D y_t - \mu) = \theta_q(B) \Theta_Q(B^s) e_t,    (3.61)

where s is the period of seasonality within a year (Hyndman & Khandakar, 2008).

\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p    (3.62)

\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \ldots - \Phi_P B^{Ps}    (3.63)

\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q    (3.64)

\Theta_Q(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \ldots - \Theta_Q B^{Qs}    (3.65)

\nabla_s^D y_t = y_t - y_{t-Ds}    (3.66)

and D is the number of seasonal differences (see Hyndman & Khandakar, 2008).

Applying an ARIMA model to an observed series involves two steps: identification and estimation. In the identification stage, the values of the autoregressive orders (p and P) and the moving average orders (q and Q) are specified, along with the difference orders (d and D). After the orders have been determined, the coefficient values are estimated in the estimation stage.

When the ARIMA model was first introduced in the 1970s, it was assumed that the selection of an ARIMA model from the class of plausible models was ‘more art than science’ and should only be done by experienced professionals. However, the growth of economics, the development of business, and the advancement of computer technology have increased the use of time series models.

In some situations, tens of thousands of observed series need to be analysed at the same time, or the forecasting function requires frequent updating to maintain an overall picture, and a detailed analysis of each series is not possible due to cost and time constraints.

Prominent among methods that provide automatic algorithms are TSE-AX (Mélard & Pasteels, 2000), AUTOBOX (Automatic Forecasting Systems, Inc., 1999), the Hannan-Rissanen automatic algorithm (Hannan & Rissanen, 1982), Autoregressive Moving Average (ARARMA) (Parzen, 1982), FOREX and Forecast Pro (Goodrich, 1984; Goodrich, 1986; Goodrich, 2000; Goodrich, 2001), SCA-Expert (Liu, 1989), TRAMO and SEATS (Gómez & Maravall, 1996; Gómez & Maravall, 2000), and R (Hyndman & Khandakar, 2008).

However, even though an automated algorithm provides model selection, it is important to understand the behaviour of the models rather than solely rely on the automated process.
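As an illustration of such automatic identification, the third-party pmdarima package (assumed available; it implements a stepwise auto-ARIMA search in Python) selects the orders automatically:

    import numpy as np
    import pmdarima as pm   # assumed third-party package

    y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
                  115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
                  145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166],
                 dtype=float)
    # Stepwise search over (p, d, q)(P, D, Q)_12 candidate models.
    model = pm.auto_arima(y, seasonal=True, m=12, stepwise=True, trace=False)
    print(model.order, model.seasonal_order)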

3.5.7 Nonlinear Forecasting

Principal Component Analysis and Artificial Neural Network models are widely used nonlinear methods for forecasting. Some nonlinear methods combine two or more sets of model coefficients into one and determine which set is used for forecasting based on the regime or state the system is likely to be in. With the advantage of freely choosing a model for each problem, artificial neural networks can be applied to data-driven forecasting.

In time series forecasting, time indices and time-lagged observations can be used as input variables to obtain the forecast output. Neural networks have been widely and successfully used for forecasting purposes; a summary of related work in this field can be found in the study by Zhang et al. (1998).

3.6 Air Travel Demand Modelling and Forecasting

A number of studies have applied time series smoothing and univariate forecasting models to air passenger forecasting (Adrangi et al., 2001; Jonga et al., 2004; Hui et al., 2004; Matsumoto, 2004; Matthiessen, 2004; Mason, 2005; Lee, 2009; Önder & Hasgül, 2009; Hwang & Shiao, 2011; Sengupta et al., 2011). However, most reported studies concentrate on three specific areas: the USA, Europe and Asia Pacific. To date, no such studies have been undertaken using Oman airports data.

Autoregressive Integrated Moving Average (ARIMA) models are the most widely used for forecasting air passenger numbers. In fact, Box & Jenkins (1976) applied the ARIMA model to an airline passenger dataset when the ARIMA model was first introduced in the literature. Since then, ARIMA models and other univariate models have been used to forecast airline passenger data, as they can capture complex seasonality, cyclical behaviour and trend structure in a single model. As pointed out by Zhang (2003), the popularity of ARIMA models lies in the fact that they are based on minimal assumptions.

ARIMA models have been used in many studies for forecasting air passenger data. In one case study in Indonesia, Faisal (2002) used ARIMA time series analysis for international air traffic flow. He found that the growth rate for international passenger flow was on average 7%. However, the influence of seasonal factors on international cargo and international aircraft movements using the decomposition method was not clearly shown.

Mubarak (2014), using Radial Basis Function Neural Networks, achieved an error rate below 1%, indicating that this method is appropriate for use with Juanda airport. Meanwhile, Lasmita (2010) tried to predict the patterns of air traffic movements at Adisucipto, the principal airport serving the Yogyakarta area on the island of Java, Indonesia, using the WEKA data-mining tool to simplify the control of apron movements.

In a study by Castellani et al. (2010), passenger flows were used as a proxy to measure tourism demand. A set of ARIMA model specifications was constructed to study the monthly arrival data of the Italian islands of Sardinia and Sicily during 2003-2008. Results showed that tourism demand is affected significantly both by meteorological variables, such as weather and temperature, and by lagged values of the USD/EUR and YEN/EUR exchange rates. Controlling for these factors improved the explanatory and forecasting power of the estimated ARIMA models (Castellani et al., 2010). Andreon & Postorino (2006) forecast yearly inbound and outbound air transport demand at Reggio airport (Southern Italy). The authors found that the three models they used (two univariate ARIMA models and a multivariate ARIMAX model with two explanatory variables) were capable of generating accurate forecasts.

A study forecasting outbound air traffic flows from the United Kingdom to twenty destinations was undertaken by Coshall (2006). In this study, quarterly traffic flows were modelled using naive models, the Holt-Winters model and a variety of ARIMA models. The ARIMA models outperformed the other models when forecasting performance was compared using the Root Mean Square Error (RMSE).

Lim & McAleer (2001) forecast monthly arrivals to Australia from three different origins: Hong Kong, Singapore and Malaysia. They used the single exponential smoothing model, Brown's double exponential smoothing model, the additive and multiplicative seasonal Holt-Winters models and the non-seasonal exponential smoothing model, evaluated using RMSE. They found that the multiplicative Holt-Winters model offered the best performance for Hong Kong and Singapore, whereas the additive Holt-Winters model was the most accurate for Malaysia.

A study undertaken by Kulendran & Witt (2003) forecast international business passengers to Australia from four countries (New Zealand, Japan, the United Kingdom and the United States) at one, four and six quarters ahead. The authors found that ARIMA and Basic Structural Models (BSM) provided the most accurate short-term predictions (one quarter ahead) among the models they considered. The authors conclude that forecasting performance is influenced by the forecasting horizon and depends on the correct detection of seasonal unit roots.

Cho (2003) compared the forecasting performance of three models, ARIMA, the multiplicative exponential smoothing model, and the artificial neural network (ANN) model, on tourist arrival data to Hong Kong from Taiwan, Korea, Japan, Singapore, and the United Kingdom. He noted that ARIMA and Holt-Winters usually give good forecasting performance; however, the ANN outperformed the other two models for all origins except the United Kingdom, for which Holt-Winters offered the best performance.

Since the opening of Hong Kong International Airport (HKIA) in 1998, the volumes of air passengers and cargo have grown steadily, except for the post-9/11 period and around the time of the SARS outbreak. A forecast of airport passenger flows allows for short- and long-term planning of airport facilities and the flight network. Tsui et al. (2011) compared the performance of an ARIMA model and a multivariate ARIMAX model for forecasting air passenger traffic at Hong Kong International Airport, showing that the ARIMA forecasts were more accurate than the ARIMAX forecasts over one- to three-month horizons. Tsui et al. (2011) used the Box-Jenkins methodology to estimate a seasonal ARIMA and a seasonal ARIMAX model to forecast passenger numbers using data from 1993-2011.

In contrast, in the study by Carson et al. (2011), passengers were categorised into different groups depending on destination. Eleven groups, such as Africa, Europe and Japan, were identified and separate forecasts were computed for each region; the sum of these formed the aggregated airport forecast of passenger throughput. The final model was a SARIMAX(p, d, q)(P, D, Q)_{12}, an extension of ARIMA that supports direct modelling of the seasonal component of the series, in which the explanatory variables were significant. Instead of the actual fuel price, a dummy variable was used, set to 1 if the fuel price in month t was more than 80 dollars per barrel and 0 otherwise. The authors used three different scenarios when producing the forecasts to account for changes in the explanatory variables:

1. The explanatory variables are assumed to take on values given by forecasts from external sources.

2. GDP is assumed to decrease at a 5% annual rate while the oil price remains as in scenario 1.

3. The oil price remains below 80 dollars per barrel.

The key finding of their work was that a SARIMAX model taking into account GDP, oil price and connecting flights produced the best forecast, with an average monthly deviation of 3.6%.

Önder et al. (2009) used ARIMA models, traditional time series analysis, and the artificial neural network forecasting method to forecast international arrivals to Turkey using monthly data between 1986 and 2007. They obtained forecasts for 2008-2010 and found that the two most successful forecasting methods, based on residual analysis and classical time series tests, were artificial neural networks and Winter's seasonal exponential smoothing technique.

Önder & Kuzu (2014) applied classical time series smoothing and decomposition techniques to air traffic volume data from Turkey's airports between January 2007 and May 2013. They applied a simple moving average, simple exponential smoothing, a linear moving average, linear exponential smoothing with a single parameter, Brown's quadratic exponential smoothing method and Holt's linear exponential smoothing with two parameters to the passenger travel data, and obtained yearly forecast values for 2013-2023. They presented the analysis at an aggregate level for all airports in Turkey, separated into two groups: domestic and international. They found that Holt's linear exponential smoothing method with two parameters was the best method for the passenger traffic variable.

Cuhadar (2014) compared the forecasting performance of various exponential smoothing and ARIMA models for monthly inbound tourism demand to Istanbul using data between January 2000 and December 2013. According to MAPE, a seasonal ARIMA(2, 0, 0)(1, 1, 0)_{12} model showed the best performance among all the methods compared, and monthly forecasts were generated for the period January 2014 to December 2015.

Principal Component Analysis (PCA) is a multivariate analysis technique usually used for correlation analysis, data reduction, multidimensional data analysis and efficiency assessment (Taylor, 2003). Its main purpose is to simplify analysis by reducing the number of dimensions and compressing the data without much loss of information. PCA can compress most of the information in the original data space into a small number of features. It attempts to find a subspace in which the variance is maximised (Timmerman, 2003). The PCA subspace is spanned by the eigenvectors corresponding to the top eigenvalues of the sample covariance matrix. PCA can be applied in both supervised and unsupervised machine learning, and it has been used with good results in many applications and research areas. Various enhancements to PCA have been proposed to improve its efficiency or performance (Turk & Pentland, 1991). Most PCA methods may not produce desirable results when handling real-world nonlinear data; as nonlinear PCA and its variants can effectively capture nonlinear relations, they may be much better equipped to cope with such data (Van der Maaten & Hinton, 2008). Different feature extraction and selection strategies are recommended when using PCA, as it is a highly effective method of feature extraction: a mathematical process in which a set of correlated variables is transformed into a smaller set of principal components consisting of uncorrelated variables. Numerous scientific studies have used this dimension reduction technique (Coates & Ng, 2012). The main function of PCA is to find the most representative vectors, i.e., the eigenvectors corresponding to the largest eigenvalues of the sample covariance matrix.
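The core computation can be sketched directly from the sample covariance matrix (the synthetic data and the choice of two components are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]           # introduce correlation

    Xc = X - X.mean(axis=0)                            # centre the data
    cov = np.cov(Xc, rowvar=False)                     # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]   # two leading eigenvectors
    scores = Xc @ top2                                 # data projected onto the subspace
    print(scores.shape)                                # (200, 2)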

In a study by Carson et al. (2011), a multivariate approach based on Principal Component Analysis was found to produce more accurate forecasts of air travel demand than a univariate approach. In the study, data for 179 individual airports were used to compute a model in which aggregated demand was forecast as the sum of all individual forecasts, taking into account the oil price and the unemployment rates in the regions served by the different airports. This approach was suitable for U.S. data, where the economic structure of the country is very heterogeneous, i.e. there are large differences in economic structure within the country.

The unemployment rate in New Jersey differs considerably from the unemployment rate in Washington, which is why it is reasonable to treat each region separately when computing forecasts. The main finding was that the multivariate approach outperforms the univariate approach in terms of forecast error measurements.

Previous studies suggest that the performance of PCA forecasting models is influenced by the forecasting horizon, the passengers' origin and destination countries, and the market segment. Moreover, these models usually forecast more accurately in the short and medium term.

Econometric demand modelling is one of the most widely explored and active methods in the transportation sector (Wadud, 2011). It involves building a causal relationship between passenger demand and a set of independent explanatory variables. In the past decades, econometric models have undergone significant advances, including sophisticated activity-based models that use random utility theory (Hill et al., 2001).

Some factors that affect air travel demand include income or GDP, ticket price, fuel price, unemployment, travel time, frequency of flights, population size, exchange rates, and export and import factors (Wadud, 2011; Wadud, 2013; Profillidis, 2000). However, it has been pointed out that income or GDP is the most important factor in econometric demand forecasting, since it reflects a country's economy directly (Wadud, 2013). GDP per capita is therefore an indicator of the average standard of living in a country, although attention must be paid to how equitably income is distributed in a country with a high GDP per capita. There is also an argument that the opportunity to develop the economy is greater in a country with a growing population than in one whose population is not growing (Önder et al., 2009). For example, the economic rationale depends on striking balances between the population and the available natural resources, economic planning, the amount of income per capita, the amount of labour available for construction, and how much manpower is available to manage industry, agriculture and natural resources.

The econometric factor is an important variable in air travel demand modelling. However, it has been ignored in many relevant studies. Spence (1975), cited by Ippolito (1981), considered the econometric factor to be critical in product differentiation.

In a study by De Vany (1975), the omission of the econometric factor was addressed, and flight frequency was incorporated as an indicator reflecting the quality of service. Although the empirical results of the study were unsatisfactory when the flight frequency variable was considered, it has the merit of using the service quality variable as a determinant of air travel demand.

These empirical studies show that the demand for air travel is proportional to the frequency of flights and inversely proportional to the load factor. However, Ippolito (1981) found that only a small fraction of those studies were carefully designed to eliminate the influence of empirical and theoretical problems; this result therefore depends on the assumptions made in those studies.

In order to forecast the air traffic demand of Saudi Arabia, a stepwise regression technique was used to find the best econometric model (Ba-Fail et al., 2000). In this study, the authors split income into several sectors: oil GDP, government non-oil GDP, private non-oil GDP, and total non-oil GDP. It was found that a model including total spending and population size as explanatory variables is appropriate for forecasting domestic air traffic demand in Saudi Arabia, whereas neither aggregate income nor any of its components significantly explains international air travel demand in Saudi Arabia.

Alperovich & Machnes (1994) used permanent rather than current income as the relevant variable in predicting international air travel demand. They concluded that the main reason for misspecification in previous studies was the exclusion of variables representing consumer wealth. The authors found that air traffic demand out of Israel is income-elastic and price-inelastic, which supports the basic assumption that the key determinant of air travel demand is the wealth of consumers. Consumer wealth is divided into two categories, financial and non-financial, and the authors argued that the propensity to travel probably differs by the source of income, so an aggregate income variable that simply combines incomes is not adequate. The major innovation of this study is the finding that the wealth variables have significant positive coefficients. Hence, air travel demand depends not only on current income but also on consumer wealth.

In addition, exchange rates have a significant effect on air travel demand, including airline decisions, consumer decisions, and financial accounts (IATA, 2017). Changes in foreign exchange affect consumer demand, but the degree of the effect varies by route. The degree of variation also depends on other factors, such as the degree of substitutability and the balance of travel on specific routes. Moreover, exchange rates also affect airlines' supply decisions: to keep supply and demand in balance, airlines have to undertake price adjustments. Furthermore, permanent changes in exchange rates may influence the long-term investment and network planning decisions of the aviation industry. In other words, as noted by IATA (2017), foreign exchange fluctuations also affect an airline's finances through its balance sheet valuations and daily profitability.

3.7 Forecast Combination

In the last four decades, time series forecasting has been studied extensively, and numerous empirical studies have been conducted comparing the out-of-sample accuracy of various methods. Among these studies are three major competitions: the M-competition in 1982, the M2-competition in 1993 and the M3-competition in 2000. The general results can be summarised as follows (Makridakis & Hibon, 2000):

1. Complex methods, such as ARIMA, do not necessarily outperform less sophisticated approaches such as exponential smoothing.

2. The ranking of forecasting methods depends on the accuracy measure used.

3. Forecasting performance depends on the time horizon.

4. Forecast combinations based on averages outperform individual methods.

There are arguments about whether or not such empirical evaluations properly reflect a model's performance.

Although plenty of literature on forecast combinations is available, a straightforward method for choosing the right approach has not yet been found.

The relative performance of the models depends on the estimation sample size, the correlation between forecast errors, and the error variances of the individual forecasts (De Menezes et al., 2000). While multivariate approaches are more suitable for small sample sizes, regression and univariate methods are better suited to bigger datasets.

Compared with univariate time series forecast combination studies, research on multivariate methods is limited. Studies of forecast combinations of neural networks reported a significant improvement over linear forecasting and individual forecast combination models (Shi & Liu, 1993; Shi et al., 1999). However, in these studies only very short time series were used for forecasting, and the neural network architecture details were not specified. In a more extensive study, the performance of neural networks was found to be mixed, depending on error measures and forecasting horizons (Donaldson & Kamstra, 1996). Deutsch et al. (1994) reported a notable improvement in out-of-sample forecast error on one specific time series, but extensive empirical studies gave discouraging results (Terui & van Dijk, 2002; Stock & Watson, 2004).

In summary, forecast combinations are generally considered beneficial in most of the literature (Makridakis & Hibon, 2000; De Menezes et al., 2000). For time series forecast combination, or combination with changing weights, the most suitable method depends on the dataset; extensive studies with mixed results suggest that adaptivity and nonlinearity are not beneficial for every time series, particularly for very small datasets.

3.8 Integrating Uncertainty into Airline Passenger Forecasting

In the civil aviation sector, numerous studies have been undertaken to forecast airline passenger numbers, whether for a particular airline, for an airport, or for a whole country. Simple to advanced quantitative, qualitative or decision analyses have been applied depending on the data structure and the complexity of the problem (Taneja, 1987; Nam & Schaefer, 1995; Weatherford et al., 2003). The main purpose of these methods was to set up a procedure to forecast air traffic data. It is also important to consider the uncertainties around these projected values during the aviation planning process, with regard to the intended use of these forecasting methods (or sets of methods).

Kincaid (2006) pointed out three common procedures related to air traffic forecasting that address uncertainty:

• High and low forecasts. In developing demand forecasts under this procedure, many of the forecasting assumptions are adjusted in one direction to create a high forecast, and in the other direction to produce a low forecast.

• What-if analysis. In this procedure, the influence of a single event, such as a dramatic rise in fuel prices, is estimated and reported relative to the model forecast. Uncertainties are normally assumed to be event-specific and are described as either threats or opportunities.

• Sensitivity analysis. In this procedure, forecasting assumptions are varied, one at a time, and the variations in the expected results (e.g., a tourism demand forecast) are reported accordingly.

It is also pointed out that these approaches provide only a superficial understanding of the risks and uncertainties involved and are rarely used in any meaningful way in the planning process. Analytical procedures are available that have the potential to incorporate the underlying risks and uncertainties involved in using a forecasting model. Though not widely used with aviation data, an understanding of the associated risks when using any forecasting model can play an important part in decision making, especially when choosing a model from a set of competing models with similar forecasting performance. When several potential forecasting models with similar performance are available, more weight should be given to the one with the highest accuracy and the least risk involved.

The incorporation of associated risk and uncertainties into forecasting can be categorised in the following two ways:

• Objective probability: data-driven processes that involve analysing historical data.

• Subjective probability: judgement-driven processes incorporating the judgement and points of view of experts and stakeholders.

Intelligent use of these types of procedures, in conjunction with the forecasting model applied, can advance the understanding of how to mitigate risk and control uncertainty within the airport context. These could be further augmented by applying computational methodologies such as bootstrap validation, as described by Efron (Efron, 1979; Efron, 1983).

To our knowledge, no such method has been applied to quantify the risk associated with the values projected by a forecasting model in the aviation sector. The availability of only small datasets is another issue in our study, since the existing literature has used large historical datasets to forecast airline passengers.

3.9 Forecasting Performance Evaluation

A good forecasting model should capture some or all of the statistical properties of the underlying data generating process. However, it is not necessary to believe that the forecasting model used actually generated the observations. A system can be considered reliable according to the accuracy of the forecasts obtained with it. One approach to determining the level of accuracy is to compare actual results with those forecast by the model. It is also possible to fit the model to only part of the data (the training set) and apply the model to the remaining data (the test set) to test its accuracy. The second subset can then be regarded as an ex-post test determining both the accuracy of the method (process) and the functional form of the model.

Forecasts can be obtained for a 'sequence of forecast horizons' starting from a common origin, or in a 'rolling' fashion where the historical data are updated at each iteration but a consistent forecast window is maintained. However the forecasts are generated, it is very important to evaluate their accuracy before using the model in a real-life application. The forecast accuracy measure plays an even more important role when a model needs to be selected from a class of potential forecasting models. For a given observed series y_h, the forecasts \hat{y}_h (h = 1, \ldots, H) can be obtained by applying a forecasting model. Several forecast accuracy measures are available in the literature, and they can be separated into three main groups:

1. Scale-dependent errors.

2. Percentage errors.

3. Scaled errors.

3.9.1 Scale-Dependent Errors

The forecast error is given by the difference between the observed and forecast values, e_h = y_h - \hat{y}_h, which is on the same scale as the data. Accuracy measures based on e_h are therefore scale-dependent and cannot be used to compare series on different scales, for example forecasts on the original scale and the log scale of the same series.

The three most frequently adopted scale-dependent measures are:

Mean Absolute Error:  MAE = \frac{1}{H} \sum_{h=1}^{H} |e_h|    (3.67)

Mean Squared Error:  MSE = \frac{1}{H} \sum_{h=1}^{H} e_h^2    (3.68)

Root Mean Squared Error:  RMSE = \sqrt{\frac{1}{H} \sum_{h=1}^{H} e_h^2}    (3.69)

When comparing different forecasting methods on a single dataset, the MAE is popular because of its simplicity and easily understood interpretation. In the literature, the MAE is also known as the Mean Absolute Deviation (MAD), and the MSE as the Mean Squared Prediction Error (MSPE) (Armstrong, 1978; Hyndman & Koehler, 2006).
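Eqs. (3.67)-(3.69) translate directly into code (the observed and forecast values below are arbitrary illustrations):

    import numpy as np

    def mae(y, yhat):
        return np.mean(np.abs(y - yhat))       # Eq. (3.67)

    def mse(y, yhat):
        return np.mean((y - yhat) ** 2)        # Eq. (3.68)

    def rmse(y, yhat):
        return np.sqrt(mse(y, yhat))           # Eq. (3.69)

    y = np.array([120.0, 132.0, 118.0])
    yhat = np.array([118.0, 130.0, 121.0])
    print(mae(y, yhat), mse(y, yhat), rmse(y, yhat))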

3.9.2 Percentage Errors

The percentage error is given by p_h = 100 e_h / y_h. The common percentage-error-based measures are:

Mean Absolute Percentage Error:  MAPE = \frac{100}{H} \sum_{h=1}^{H} \left| \frac{e_h}{y_h} \right|    (3.70)

Mean Absolute Percentage Deviation:  MAPD = \frac{\sum_{h=1}^{H} |e_h|}{\sum_{h=1}^{H} |y_h|}    (3.71)

Symmetric Mean Absolute Percentage Error:  SMAPE = \frac{200}{H} \sum_{h=1}^{H} \frac{|y_h - \hat{y}_h|}{|y_h + \hat{y}_h|}    (3.72)

Being scale-independent, percentage-error-based measures are commonly used to compare forecast performance across different datasets. However, their main drawbacks are that they can be infinite or undefined if any y_h equals or is near zero, and that they penalise negative errors more heavily than positive ones (Armstrong, 1978; Hyndman & Koehler, 2006).
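For completeness, Eqs. (3.70) and (3.72) can be sketched as follows (illustrative values again; note that a zero in y would make MAPE undefined, as discussed above):

    import numpy as np

    def mape(y, yhat):
        return 100 * np.mean(np.abs((y - yhat) / y))               # Eq. (3.70)

    def smape(y, yhat):
        return 200 * np.mean(np.abs(y - yhat) / np.abs(y + yhat))  # Eq. (3.72)

    y = np.array([120.0, 132.0, 118.0])
    yhat = np.array([118.0, 130.0, 121.0])
    print(mape(y, yhat), smape(y, yhat))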

3.9.3 The Use of Scaled Errors

Hyndman & Koehler (2006) proposed scaling the errors by the Mean Absolute Error of a simple forecast method, instead of using percentage errors, when forecast accuracy is compared across series on multiple scales. For a non-seasonal time series, an effective way of defining a scaled error uses naive forecasts:

q_h = \frac{e_h}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}    (3.73)

For seasonal series with period m , a scaled error uses a seasonal naïve forecast:

q_h = \frac{e_h}{\frac{1}{T-m} \sum_{t=m+1}^{T} |y_t - y_{t-m}|}    (3.74)

Since both the numerator and the denominator involve values of the original data, q_h is independent of the scale of the data. The following measure can then be derived:

Mean Absolute Scaled Error:  MASE = \frac{1}{H} \sum_{h=1}^{H} |q_h|    (3.75)

A scaled error of less than one means the forecast is better, on average, than the naive forecast computed on the sample data; a value greater than one indicates that the forecast is worse than the naive forecast (Armstrong, 1978; Hyndman & Koehler, 2006).
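A sketch of the MASE computation for the non-seasonal (m = 1) and seasonal cases (training series, test values and forecasts are illustrative assumptions):

    import numpy as np

    def mase(y_train, y_test, y_pred, m=1):
        # Scale by the in-sample MAE of the (seasonal) naive forecast,
        # Eqs. (3.73)-(3.75); m = 1 gives the non-seasonal case.
        scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
        return np.mean(np.abs(y_test - y_pred)) / scale

    y_train = np.array([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
    y_test = np.array([104.0, 118])
    y_pred = np.array([110.0, 120])
    print(mase(y_train, y_test, y_pred))   # < 1 beats the naive benchmark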

3.10 Measuring the Accuracy of Forecasts Using Training and Test Sets

Rob Hyndman pointed out that it is necessary to evaluate forecast accuracy using genuine forecasts. Hyndman & Koehler (2006) state that accuracy is determined only by considering how a model performs on a new set of data, not by its fit to the data used for estimation; it is of little use to look only at how well a model fits the historical data.

When selecting models, it is necessary to use some of the available data for fitting and the rest of the data for testing, as was done in the examples above. The test data can then be used to measure how well the model is likely to forecast new data. The size of the test set is usually around 20% of the sample, although this also depends on the sample length and how far ahead the intention is to forecast; the test set should be at least as large as the forecast horizon. The following points should be noted:

• A model that fits the data well will not necessarily forecast well.

• It is always possible to find a perfect fit by using enough parameters.

• Over-fitting a model to a set of data is as bad as failing to identify the systematic pattern in the data.

In the literature, the 'training set' is also known as the 'in-sample data', and the 'test set' is also known as the 'hold-out set' or 'out-of-sample data'.
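A minimal sketch of this split and an out-of-sample evaluation against a naive benchmark (the stand-in series and the 80/20 split are illustrative):

    import numpy as np

    y = np.arange(100, 160, dtype=float)        # stand-in monthly series
    split = int(len(y) * 0.8)                   # hold out roughly 20% for testing
    train, test = y[:split], y[split:]

    # Naive benchmark fitted on the training set only, scored on the test set.
    forecast = np.full(len(test), train[-1])
    print(np.mean(np.abs(test - forecast)))     # out-of-sample MAE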

3.11 Feature Engineering and Machine Learning

In a predictive model, the input vectors can be transformed or augmented to improve predictive performance. This is generally referred to as feature modification, feature extraction, or feature engineering. In machine learning, feature engineering refers to the process of creating or selecting variables from a set of data to improve machine learning results (Domingos, 2012). Feature engineering is also known as feature extraction, feature construction, or feature generation. There are different explanations of the terms feature engineering, extraction, construction, and generation. Some of them are quite close; however, the differences between these terms are: the building of features from raw data (Muhammad et al., 2015; Kanter & Veeramachaneni, 2015); the creation of new features from one or multiple features (Markovitch & Rosenstein, 2002); and the creation of a mapping that converts original features to new ones (Motoda & Liu, 2002).

Unless otherwise specified, the term 'feature engineering' and its synonyms used in this Thesis refer to the process of creating new features from one or multiple features. These methods are important for linear regression in that they transform model inputs into more favourable forms (Freeman & Tukey, 1950). Such transformations are applied in order to reduce residual errors in linear regression. Box & Cox (1964) showed how to find useful power transformations of the inputs to a linear regression; the algorithm used in their study is known as the Box-Cox transformation. Power transformations use exponents of the variables of a machine learning model. There are other ways to transform variables with mathematical functions; for example, taking logarithms is a good option. Linear regression is not the only machine learning model that benefits from feature engineering transformations. With these simple transformations, the individual features remain independent of one another.

The Box-Cox transformation depends on the stochastic sampling of the data and cannot guarantee an ideal set of transformations. Breiman & Friedman (1985) proposed the alternating conditional expectation (ACE) algorithm, which can guarantee optimal transformations for linear regression. The ACE algorithm finds a set of maximal transformations for each of the predictor features in a linear regression. Although the transformations were originally designed for linear regression, they also benefit other models (Cheng & Titterington, 1994).

In almost all machine learning model types, a grid search for optimising parameters is a typical means of tuning feature transformations. By using grid search matching on specific features, it is possible to clean the data and reduce over-fitting. The number of knots and their placement within the grid search are the 'hyperparameters' assigned for this specific transformation technique. A grid search can also produce a good approximation of the form of various other features. Brosse et al. (1999) used this approach to transform data in a neural network used to forecast the distribution of fish populations.

Machine vision is also a widely used application of feature engineering. An early type of computer vision feature engineering is the Scale Invariant Feature Transform (SIFT) (Lowe, 1999). A machine learning model intended to learn to identify figures might not see the same figures at the same scale, which is problematic. SIFT pre-processes the data and provides it in a format in which images are described by scale-invariant features. These feature types can be generalised to different problems and used in machine vision tasks, including object recognition and panorama stitching (Brown & Lowe, 2003).

Another widely used machine learning task is text classification. Scott & Matwin (1999) used feature engineering for text classification in order to improve the overall performance of rule learning. It allows the structure and frequency of the text to be generalised into different features. A machine learning model representing textual data creates a substantial number of dimensions; feature engineering is used to reduce the dimensionality in text classification.

Feature engineering has been found to be important in the Kaggle and KDD Cup data science competitions. One group used feature engineering and an ensemble of machine learning models to win the KDD Cup 2010 competition (Yu et al., 2011); a variety of engineered features was exhibited in this competition. Cheng et al. (2011) developed automated feature generation algorithms for data organised by domain-specific knowledge, an approach with numerous applications. For example, Ildefons & Sugiyama (2013) won the Kaggle Algorithmic Trading Challenge with feature engineering and a model ensemble. The feature engineering methods in these competitions were crafted manually or with knowledge of the particular competition problems; such knowledge reflects domain expertise and human intuition that cannot easily be reproduced by a machine. Feature engineering is also used in natural language processing (NLP). One engineered feature for NLP is term frequency-inverse document frequency (TF-IDF). This engineered feature estimates how important a word is to a document in a collection or corpus (Rajaraman & Ullman, 2011). TF-IDF is a mainstream tool for content classification, text mining, and NLP.

Machine learning algorithms can also perform feature learning automatically; frequently, these algorithms are unsupervised. Coates et al. (2011) specified a single-layer unsupervised neural network for feature engineering. PCA (Timmerman, 2003), along with t-distributed stochastic neighbour embedding (Van der Maaten & Hinton, 2008), has proven successful for automatic feature engineering and dimension reduction.

Deep neural networks need a large number of layers in order to learn complex interactions in the data, and despite its representation learning ability, deep learning still benefits from feature engineering. Bengio (2013) reported that feature engineering is beneficial for computer vision, speech recognition, signal processing and classification. Le (2013) developed high-level features using unsupervised procedures to engineer a deep neural network for signal processing. Lloyd et al. (2014) used feature engineering to produce the Automatic Statistician project; this framework modelled an open-ended regression problem, created readable reports, and can discover the transformations that improve particular features. Kanter & Veeramachaneni (2015) engineered a method known as 'deep feature synthesis' that transforms relational database tables directly into the feature vectors used by regular machine learning models. A neural network's feature vector input cannot directly encode the many-to-many and one-to-many relations that are easy to encode in an RDBMS; the deep feature synthesis algorithm uses SQL-like transformations, such as MAX, MIN, and COUNT, to summarise and encode the relations into a feature vector. Deep feature synthesis demonstrated its potential by outperforming a number of competitors in three data science competitions. Automated feature engineering has also been implemented for particular domains in some studies. Davis & Foo (2016) applied automated feature detection to an HTTP traffic and tunnel modelling task. Cuayáhuitl (2016) introduced SimpleDS, which avoids manual feature engineering by using a full reinforcement learning system. Manual feature engineering is still popular due to its value in various contexts. Bahnsen et al. (2016) presented guidelines and examples of feature engineering for credit card fraud detection, a good example of applying domain-specific knowledge. Zhang et al. (2016) studied feature engineering for phishing detection. Although these methods have been used successfully in some specific domains, they do not address the problem statement suggested by this study, which is producing a generic automatic feature engineering system.

Feature engineering may also remove unnecessary or redundant features. This process requires assessing the relevance of each variable, which can be implemented by building a model to test the variables' correlations with the dependent variable. Feature engineering further involves creating new variables by combining multiple variables and by modifying variables (Kern, 2014). In this Thesis, the first use of feature engineering is the selection of the relevant variables. The input data may contain too many variables, some of which do not improve prediction performance; the predictive model then contains too many features and becomes overly complicated. Therefore, unnecessary variables must be removed in order to make the model more efficient. Variable removal can be done manually, based on domain knowledge, or automatically (Domingos, 2012).

The two main goals of feature engineering are accuracy improvement and dimensionality reduction (Kern, 2014). When the goal is accuracy improvement, the resulting feature space will contain more features than the original feature space; when the goal is dimensionality reduction, it will contain fewer features than the original.

In this Thesis, the main goal is to improve the accuracy of the predictor in feature generation methods. Dimensionality reduction has less priority, since the feature generation results are input to a feature selection phase with the aim of reducing the dimensionality of the feature space. However, dimensionality reduction has to be taken into account to avoid generating an extreme number of new features. The importance of feature generation is illustrated in Table 2.1, considering a specific example.

Table 2.1 Features are Date and Visitors; extracted feature is IsWeekendDay.

Date         Visitors   IsWeekendDay
2019-04-13   18,325     Yes
2019-03-11    6,426     No
2019-01-26   18,845     Yes
2019-04-08    6,434     No
2019-02-18    6,924     No

In Table 2.1, the original feature is "Date" and the dependent feature is "Visitors"; the table shows dates and the corresponding numbers of visitors for a theme park. There is no obvious pattern between the predicting feature and the dependent feature. Using feature engineering, we can extract what kind of day each date is, shown in the "IsWeekendDay" column, which indicates whether or not the date falls on a weekend. A clear pattern then emerges: the number of visitors is significantly higher on weekend days than on weekdays. Another situation where feature generation can improve performance is feature interaction, where two (or more) features are not correlated or relevant to the dependent feature on their own but together have a strong influence on it. For example, the quality and the price of a product, taken separately, give little indication of whether the product is purchased often; combined, however, they are highly correlated with purchases. If the quality is high and the price is low, the product will be purchased often, but without knowing both the price and the quality this cannot be guaranteed. If both values are unfavourable, the product will not be purchased by many customers, and vice versa.
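As an illustration, the IsWeekendDay feature in Table 2.1 can be derived mechanically from the date. A minimal sketch in R, rebuilding a few rows of the table as a hypothetical data frame (not code from this study):

# Rebuild part of Table 2.1 as a data frame
visits <- data.frame(
  Date     = as.Date(c("2019-04-13", "2019-03-11", "2019-01-26")),
  Visitors = c(18325, 6426, 18845)
)
# weekdays() returns the (locale-dependent) day name; flag Saturdays and Sundays
visits$IsWeekendDay <- weekdays(visits$Date) %in% c("Saturday", "Sunday")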

In this Thesis, feature selection was done by observing the output of a linear regression model to find how much correlation each variable has with the dependent variable. The other purpose of using feature engineering in this Thesis is modifying variables. Examples include combining multiple variables to create a new variable; recalculating a variable so that it can be used more effectively in classification; and categorising a variable so that it has a limited range of possible values.
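A minimal sketch of this selection step, assuming a hypothetical data frame monthly_data with the dependent variable pax and candidate predictors gdp and fuel:

# Fit a linear model and inspect how strongly each variable relates to the target
fit <- lm(pax ~ gdp + fuel, data = monthly_data)
summary(fit)                              # t-statistics and p-values indicate relevance
cor(monthly_data$pax, monthly_data$gdp)   # pairwise correlation with the target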

2.12 Discussion

This chapter introduced and discussed forecast combination approaches with a focus on their application to the airline industry. It should be kept in mind that forecasting tasks can vary along many dimensions: the length of the forecast horizon, the size of the test set, the forecast error measures, the frequency of the data, and so on. It is unlikely that, once selected, a forecasting method will be better than all other plausible methods all the time. In general, the suitability of the chosen forecast model should be re-assessed regularly, depending on the current task and whenever a new dataset becomes available. A short introduction to airline demand was presented, and the importance of, and need for, effective forecasting procedures in the airline industry was explained and justified.

The study concentrates largely on the research topic proposed, that is, forecasting monthly airline passenger numbers using Principal Component Analysis and its implementation in deep neural networks, which provides the opportunity to manipulate feature expressions. The concept of Principal Component Analysis will enable the algorithm developed in this Thesis to recommend engineered features. When choosing a forecasting model for future use, it is very important to consider the confidence bands around the point forecast values in order to evaluate the performance of the selected model on unseen data.

All the studies mentioned in this chapter used only point forecasts when comparing forecasting performance among methods. This Thesis will reduce this gap in the literature by considering prediction intervals as well as point forecasts when proposing forecasting models for Muscat International Airport.

Chapter 3: Air Demand in Oman and Predictive Modelling

3.1 Introduction

Oman is a fairly large country (310,000 km²), with a long coastline reaching down to the Indian Ocean in the south, strategically placed at the mouth of the Gulf on the southeast corner of the Arabian Peninsula – see Figure 3.1 (Calvin, 2016). It has an extensive mountain range spanning the country, creating a variety of climate zones. Moreover, Oman has numerous highlands within its geographical niche, with Muscat, Dhofar, and Nizwa among the main tourist attraction sites (Cullen & Kusky, 2010). However, infrastructure development has not kept pace with the nation's steady economic growth, and Oman is still considered a developing country.

Figure 3.1 Location of Oman's four airports.

In Oman, the aviation industry is one transportation sector that has experienced slow development, which has compromised the level of service quality in that industry. As postulated by Omanair (2011), airline carriers in Oman have experienced numerous problems that can be traced to the country's planning and policy structure. For example, the country's significant growth is affecting the location of its airports; this in turn affects the effective planning of airports in Oman, which has a negative effect on the development of its airline industry. Additionally, with the projected increase in demand for aviation, there is a dire need to develop a well-structured aviation planning and policy framework (Omanair, 2017). There are currently three domestic airports and one international airport in Oman (Oman Airports, 2018), as shown in Table 3.1.

Table 3.1 Domestic and international airports in Oman.

City Served   IATA   Airport name
Duqm          DQM    Duqm Airport
Muscat        MCT    Muscat International Airport
Salalah       SLL    Salalah Airport
Sohar         OHS    Sohar Airport

Muscat is the capital city and the political and economic centre of Oman; accordingly, Muscat International Airport is the most significant and busiest airport in the country. While Salalah and Sohar airports also cater for international flights, their share is relatively small: international flights through Salalah Airport stood at 574 in January 2019, compared with 55 flights through Sohar Airport during the same period. Table 3.2 shows the positive change in annual passenger traffic at Muscat International Airport over the past four years, and Figure 3.1 above presents the locations of the airports. Almost all domestic flights start or end at Muscat International Airport.

Table 3.2 Positive change in passenger traffic.

Muscat Airport Annual Passenger Traffic
Year   Passengers   % Change
2018   15,392,580    9%
2017   14,034,865   28%
2016   10,314,449   18%
2015    8,709,505    5%

The airports are located far from each other, and domestic flight times from Muscat exceed 50 minutes. Despite the distances, four airports remain viable in Oman due, in part, to the lack of significant road connections. Besides, the railways are not well developed, so the main cities and adjacent countries are not linked together (Oman Airports, 2018). Travelling by land is therefore time-consuming and unpleasant, although this situation has been changing in recent years with the development of land transportation, including new roads and rail links (Oman Airports, 2018). The number of air passengers from Oman has increased steadily during the last decade, which is why airports, airline companies and authorities are interested in obtaining forecasts with which to plan future capacity and logistics. Forecasting airline passenger numbers has attracted numerous researchers in recent years due to its essential role in people's daily lives. The research in this Thesis focuses on forecasting Oman's air passenger volumes; the main objective of this study is to produce accurate monthly forecasts of Oman's international air passengers. In Oman, transportation has long been considered an important economic sector (UNWTO, 2016), and air transportation has been shown to depend significantly on the economic activity of a country (Oxford Economics, 2011). There was remarkable growth in air passengers in Oman between 2006 and 2017: the handling of Oman's airports grew from 4.7 million passengers in 2006 to 12 million in 2016, more than doubling over the decade (IATA, 2018). Predicting future air passenger volumes is essential, as it allows air transport authorities to construct and implement airport infrastructure for future needs and offers airline companies the capacity to match the increasing passenger demand for air transportation. Furthermore, as an essential employer in Oman, the air transportation industry contributes to a prosperous Omani economy both directly and indirectly. For example, 15.9% of Omani employees were working on-site in the air transportation sector, compared with 6.2% in the road industry (NCSI, 2017).

The purpose of forecasting depends on the time horizon of the forecast. Generally, short-term forecasting spans a period of a few months to one year and supports daily operations; medium-term forecasting considers a period of one to five years and involves route-planning decisions; and long-term forecasting spans a period of 5 to 10 years, over which decisions about airport and airline infrastructure have to be taken. Air travel demand is also the main driver of route development, fleet planning and the annual operating plans of airline companies. Analysing and forecasting air travel demand helps reduce airlines' and airports' risk by objectively evaluating the demand side of the air transport business. Accurate forecasting is essential for airport and airline managers to plan effectively and efficiently, as inaccurate predictions may lead to bad decisions and ineffective overall management of operations. Therefore, this study provides forecasts of air passenger numbers in Oman based on various models and techniques. Given that air transportation demand is highly linked to the economy (Adeniran & Stephens, 2018), which is typically characterised by business cycles (alternations between periods of economic growth and downturn), the data series should exhibit cyclical patterns or seasonality. Any economy is also susceptible to a variety of fundamental shocks (political, economic, climatic, and so on) that are likely to modify past trends and the volatility in the data. Both of these characteristics have been taken into account in the time series models presented in this study.

3.2 Datasets Overview

The research objectives were developed to address the shortcomings associated with demand forecasting methods: firstly, to determine the factors that affect the number of airline passengers travelling, using econometric projection models with an emphasis on panel data comprising observations of multiple phenomena taken over various periods for the same subject; and secondly, to determine the economic factors that affect airline passenger numbers using an econometric projection model with an emphasis on time series data. There are several factors affecting air travel demand; each factor is composed of elements that can stimulate or constrain air travel growth (Vasigh & Fleming, 2016). For the purpose of air travel demand analysis, these factors are conveniently classified into two broad groups: those external to the airline industry, and those within it (Ba-Fail et al., 2000). The comprehensive search of the literature on previous air travel demand forecasting studies, presented in Chapter 2, showed that a range of socio-economic factors influences air travel demand. Real GDP, real GDP per capita and airfares were the most common explanatory variables included in these studies (Ba-Fail et al., 2000). GDP is directly proportional to air passenger demand: if GDP is increasing, people can afford to travel abroad, and air demand will increase; similarly, if GDP increases because of good infrastructure, tourist arrivals will also increase. Other important factors that influence air travel demand reported in the literature include unemployment (McKnight, 2010), tourism demand (Koo et al., 2018), world jet fuel prices (Gesell, 1993), and real interest rates (Cook, 2007; Wensveen, 2007). Similarly, short-term conditions such as inflation, interest rates and currency exchange rates can have a substantial effect on the growth potential of both individual airlines and the entire industry. The insights obtained could help airlines and the managers of all airports in Oman. The explanatory variables available for this study are described in Table 3.3:

Table 3.3 Exploratory data analysis.

Data Cleaning and Exploratory Data Analysis

Tourism statistics: General indicators of tourism statistics

Number of guests: Number of guests in different regions

Hotels: Hotel statistics

Climatological summary: Climatological information of Muscat and Salalah airports

Component IDL, which contains pax movements, including: Total freight; Mail traffic; Aircraft; Passenger movements of Muscat and Salalah airports.

Other variables, such as: Oman’s jet fuel price; Oman’s real interest rates; Oman’s unemployment size; Oman’s real GDP; Oman’s real GDP per capita; Oman’s population size; Incoming passengers of Oman; Outgoing passengers of Oman; Total passengers of Oman; Incoming aircraft movement of Oman; Outgoing aircraft movement of Oman; Total aircraft movement of Oman.

Airline schedule

Pax: Passengers departing from Oman to any international destination

Distance: Distance between Oman and any international airport

Avg. Base Fare: Average base fare from Oman to any international destination

Public Holidays: Oman’s Public holidays


3.2.1 Cleaning Process Description

The dataset contains 11 data files from multiple sources. To prepare the data for the study, the following process was followed to ensure that the data are internally consistent, with the same content and format for each data type:

1. Data Layout Standardisation;
2. Data Cleansing; and
3. Data Collation – appending similar data from the same sources.

Considering that most of the data were presented in report layouts instead of tabular form, a tabular format was designed for each file. To ensure that these files were easy to maintain and update, a standard table format was designed for all files (wherever possible); i.e., information such as month or year was kept in rows rather than creating a column for each month. Due to the traditional report-level layout of the data files, the data had to be cleansed and then transformed into the formats defined above (Table 3.3). Data cleansing and transformation were undertaken automatically. Data from the different files were then moved or appended to the required files one by one, and the data were thoroughly checked to ensure that there was no loss or change of data. Given the data provided in Excel format as input, the desired output is a single table of data. There is a total of 11 tabs (or tables) of passenger data and economically relevant data. The raw data are not in a form suitable for data analysis; the desired format should have each variable (or predictor) as a column and each observation as a row. There were inconsistencies in the data format that had to be normalised, and around 5% of the data were missing and had to be removed. R was used to clean, tidy, and join the data to form a single view of all available information ready for exploratory data analysis. The data were checked against official statistics, explored for trends and anomalies, and documented with initial findings and analysis in a single R notebook file. Data pre-processing includes loading the data from the Excel file, removing spaces from column names, and tidying the data, as sketched below. The initial data cleaning and investigation revealed the variables and data types listed in Table 3.4.
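A minimal sketch of these pre-processing steps (the file, sheet and column names are hypothetical, not the actual files used in this study):

library(readxl)
library(dplyr)

raw <- read_excel("oman_air_data.xlsx", sheet = "passengers")
names(raw) <- gsub(" ", "_", trimws(names(raw)))   # remove spaces from column names
clean <- raw %>%
  filter(complete.cases(.)) %>%   # drop the roughly 5% of rows with missing values
  arrange(Year, Month)            # one observation per row, sorted by date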

Table 3.4 All processed data.

Data   Period   Frequency (Daily/Monthly/Yearly)

Passengers departing from Oman to any international destination 1998 – 2016 Monthly

General indicators of tourism statistics 2009 – 2015 Yearly

Number of guests in different regions 2009 – 2014 Yearly

Hotel statistics 2009 – 2015 Yearly

Climatological information of Muscat airport 2005 – 2016 Daily

Total freight / mail traffic / aircraft / passenger movement of Muscat airport 2004 – 2016 Yearly

Oman’s jet fuel price 1997 – 2016 Monthly

Oman’s real interest rates 1986 – 2016 Yearly

Oman’s unemployment size 1991 – 2016 Yearly

Oman’s real GDP 1970 – 2016 Yearly

Oman’s real GDP per capita 1970 – 2016 Yearly

Oman’s population size 1961 – 2016 Yearly

Incoming passengers of Oman 2014 – 2015 Monthly

Outgoing passengers of Oman 2014 – 2015 Monthly

Total passengers of Oman 2014 – 2015 Monthly

Incoming aircraft movement of Oman 2014 – 2015 Monthly

Outgoing aircraft movement of Oman 2014 – 2015 Monthly

Total aircraft movement of Oman 2014 – 2015 Monthly

Airline schedule 2013 – 2016 Daily

Distance between Oman and any international airport 1998 – 2016 Monthly

Average base fare from Oman to any international destination 1998 – 2016 Monthly

Oman’s Public holidays 1998 – 2016 Yearly

3.2.2 Proposed Time Series Predictive Modelling

Since various factors affect air passenger volume prediction, an in-depth study of the internal and external environment of the aviation industry is needed. Economic forecasting approaches such as the Benchmark model, the ARIMA (autoregressive integrated moving average) model, the Random Forest model and the Causal model not only take account of the time series dependence of economic phenomena but also consider the interference of random fluctuations. For short-term forecasting of economic operations, these models have a very high accuracy rate and have been widely used in different studies (Box & Jenkins, 1976; Cryer & Chan, 2008; Castellani et al., 2010; Tsui et al., 2014; Findley et al., 1998). The passenger volume can be regarded as a stochastic time series formed with the passage of time. By analysing whether the time series is stationary, together with its stochastic and seasonal factors, these models can be applied to civil aviation passenger transport. Therefore, this section uses time series analysis with the ARIMA, Benchmark, Causal and Random Forest models, fitted in R, to forecast airline passenger numbers. A number of forecast algorithms were described in Chapter 2. Many of the empirical studies mentioned there have one thing in common: forecasters have spent a lot of time, and applied a lot of knowledge, in designing more or less sophisticated methods tuned for specific time series.

This section describes an empirical experiment comparing the performance of forecast combination methods that have not been heavily tuned and are likely to be employed by users who are not forecasting experts. In practical applications with vast datasets, heavy tuning is often not feasible, because a significant number of forecasts may have to be calculated every second. This section aims to provide some empirical evidence on the effectiveness of seasonal and trend time series methods for modelling and forecasting airline passenger numbers. The main question is therefore:

• Are time series methods able to directly model seasonal and trend variations and produce satisfactory forecasts?

Different levels of prediction will be investigated, depending on whether there are enough training examples. Time will be spent on finding the best way to include historical statistics (monthly lag data) for each observation. Several statistical models will be explored in an attempt to fit the data.

A short review of previous research is presented in Subsections 3.2.3, 3.2.4, and 3.2.5, followed by a methodology part in which the estimated models are described. Section 3.3 contains a description of the data; the models are then evaluated and compared in terms of forecast performance in the sections that follow. Analysis and discussion are presented in Section 3.4.

3.2.3 Benchmark Model

When modelling monthly time series with patterns that recur every year, Box & Jenkins (2015) established a two-coefficient time series model of factored form, nowadays known as the airline model (Findley et al., 1998). The model is often used to forecast passenger flow since it is both accurate and straightforward to estimate. The model has the form:

(1 - B)(1 - B^s) Y_t = (1 - \theta B)(1 - \Theta B^s) \varepsilon_t        (3.1)

This is an ARIMA(0,1,1)(0,1,1)_s model, where B is the backshift operator, Y_t is the number of passengers at time t, \theta is the moving average (MA(1)) term, \Theta is the seasonal moving average term, and \varepsilon_t \sim WN(0, \sigma^2). For monthly data (Findley et al., 1998), s = 12. To forecast with an ARIMA(0,1,1)(0,1,1)_{12}, the following representation is used:

(1 - B)(1 - B^s) Y_{t+h} = (1 - \theta B)(1 - \Theta B^s) \varepsilon_{t+h}        (3.2)

The airline model serves as a suitable benchmark model to be improved upon. The Box-Jenkins method proceeds in four steps: identification, estimation, diagnostic testing and forecasting. The first step is determining whether the time series is stationary and whether there is any significant seasonality that needs to be modelled. Stationarity can be assessed from a sequence plot showing constant location and scale, or detected from an autocorrelation plot with very slow decay. Box and Jenkins recommend differencing to achieve stationarity; however, fitting a curve and subtracting the fitted values from the original data can also be used in the context of the Box-Jenkins model. Typically, for monthly data, one would include either a seasonal MA(12) term or a seasonal AR(12) term. In Box-Jenkins models it is not necessary to remove seasonality before fitting the model; instead, the seasonal terms can be included in the model specification passed to the ARIMA estimation software. Once the model has been specified, its moving average and autoregressive parameters, as well as their seasonal equivalents in the case of a SARIMA model, have to be estimated. Non-linear least squares is one of the most widely used methods; in this study the estimation is carried out in R with the Arima() function of the forecast package. Model diagnostics for Box-Jenkins models are similar to model validation for non-linear least-squares fitting: the error term \varepsilon_t is assumed to follow the assumptions for a stationary univariate process, so the residuals should be white noise drawn from a fixed distribution with constant mean and variance, and independent (ideally normally distributed). If the Box-Jenkins model is appropriate, the residuals will satisfy these assumptions; if not, one should go back to the model identification step and try to develop a better model.
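A minimal sketch of the estimation and diagnostic steps using the forecast package, assuming pax.ts holds the monthly passenger series:

library(forecast)

# Fit the airline model ARIMA(0,1,1)(0,1,1)[12]
airline.fit <- Arima(pax.ts, order = c(0, 1, 1),
                     seasonal = list(order = c(0, 1, 1), period = 12))
checkresiduals(airline.fit)    # residual plot, ACF and Ljung-Box test
forecast(airline.fit, h = 12)  # 12-month-ahead forecasts with prediction intervals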

3.2.4 Causal Model

To develop forecasts of airline passenger numbers for Muscat International Airport, this study attempted to create a causal model. To establish a causal model, the following steps must be followed:

1. Data examination;
2. Searching for main causes;
3. Statistical analysis for short-range forecast development; and
4. Creating long-range forecasts using scenarios.

It is recommended that a detailed data examination be done first, since the available data are often full of errors, inconsistencies, and other factors that introduce spurious fluctuations into past trends (Hyndman & Athanasopoulos, 2013). Searching for the leading causes of changes in passenger flows is the next step (Taylor, 2003). It is critical at this stage to determine whether there are any particular factors that should be entered into the model ahead of others. At this point, the investigation determines which elements of the model are significantly correlated with passenger flows and whether the overall model fits the data (see Figure 3.2).


Figure 3.2 Correlations between Total Passengers and the other variables.

If it fits, the model may not be very accurate, but it is at least plausible; this justification of the model is sufficient for starting the preparation of short-range forecasts. However, it is probably not reasonable to expect this statistical model to produce reasonable long-range forecasts, where the situation may change dramatically. The approach here, then, is to develop scenarios that describe how the significant causes of airline passenger numbers might change, and to use them to prepare order-of-magnitude forecasts of flows. This is perhaps not very elegant; however, it provides an honest projection of low and high levels of flow.

The causal model, which aims to improve the forecasting performance for air passengers, is an expansion of the airline model with explanatory variables (Hyndman & Athanasopoulos, 2013). The model is defined as a SARIMAX(0,1,1)(0,1,1)_{12}, and it is outlined as:

(1 - B)(1 - B^s) Y_t = (1 - \theta B)(1 - \Theta B^s) \varepsilon_t + \sum_{i=1}^{k} \beta_i x_{i,t-v_i}        (3.3)

where k is the number of exogenous variables included in the model and v_i is the lag of variable i. The explanatory variables can be lagged or not lagged, or possibly a mixture of the two, depending on the delay with which a change in the specific explanatory variable affects the dependent variable (Hyndman & Athanasopoulos, 2013). The SARIMAX forecasting model is obtained from:

(1 - B)(1 - B^s) Y_{t+h} = (1 - \theta B)(1 - \Theta B^s) \varepsilon_{t+h} + \sum_{i=1}^{k} \beta_i \hat{x}_{i,t+h-v_i}        (3.4)

To make a forecast, the causal model uses available or predicted values of the explanatory variables, depending on the lag structure. Given the trends in air passenger numbers, the causal model must explain steady growth in both international and national flows over the period. Since domestic air flows increased rapidly while global flows gradually decreased after 1970, the search for causes must identify plausible reasons both for the shift in trends around 1970 and for the subsequent divergence between national and international travel. The basic structure of the change in trends is self-explanatory: Oman discovered vast quantities of oil, and this domestic prosperity increased the ability of Omanis to travel. At the same time, the economy of the United States was depressed by the oil crisis, which dampened the international flows that represent the influx of American tourists. The preceding period of steady growth in air travel, on the other hand, is the expected pattern associated with constant demographic and economic expansion. Any of several variables could represent this effect; in general, many variables might represent the underlying causes described in the following section.
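Before turning to those variables, a minimal sketch of how a lagged exogenous regressor enters the model with the forecast package, assuming a hypothetical oil-price series oil.ts entered with a three-month lag:

library(forecast)

oil.lagged <- stats::lag(oil.ts, -3)        # the value observed three months earlier
both <- ts.intersect(pax.ts, oil.lagged)    # align the two series in time
causal.fit <- Arima(both[, "pax.ts"], order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12),
                    xreg = both[, "oil.lagged"])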

3.2.5 Explanatory Variables

Several variables might affect the number of monthly passengers. Latent factors that are essential to incorporate in the model are indicators of economic development and variations in the occurrence of holidays. Since there are no close substitutes for international air travel, cross-price elasticities are not considered. Possible explanatory variables, for which data are available during the investigated time period, are listed below.

3.2.5.1 Price of Crude Oil

It has been found that the price of jet fuel is correlated with the price of crude oil (Carter et al., 2006; Morgan, 2001). It is not possible to compare one ticket with another, since ticket prices depend on factors such as destination and number of stopovers; the fuel price, however, is an operating cost for all companies and is thus comparable. Jet fuel expenditure accounted for 29% of total operating expenses, having increased by 15% by 2007 (IATA, 2014). Airlines' operating costs increase with the oil price, so the oil price might serve as a proxy for the ticket price and be reflected in it. Data on this variable come from the World Bank and consist of the monthly average price of Brent crude oil, which originates from the North Sea and is the most commonly used benchmark for crude oil prices (EIA, 2014). However, the time-consuming refining of crude oil into jet fuel delays its effect on the ticket price, so the appropriate lag structure for the variable is not obvious. Moreover, although airlines adjust ticket prices immediately when the crude oil price rises, they also set ticket prices far in advance, sometimes up to a year ahead, taking forecasts of the oil price into account.

3.2.5.2 Foreign Exchange Rate

The most significant share of international flights departing from Oman is to Europe, accounting for 42% or 2,445 weekly seats (Oman Airports, 2018). Thus, from the passengers' point of view, if the OMR/GBP/EUR exchange rate depreciates, the cost of holidays decreases. These data are obtained from the Central Bank of Oman and represent monthly averages. Since airline tickets are bought in advance, variations in the exchange rate should reasonably affect air travel demand with a delay. Castellani et al. (2010) found that the GBP/EUR exchange rate needed a lag of three months to best explain the variation in arrivals of tourists from the UK to Sicily.

3.2.5.3 Unemployment Rate

According to basic economic theory, consumption is likely to be postponed when a large portion of the population lacks a high income. Problems may arise because the unemployment rate does not usually fluctuate much from month to month, whereas the passenger flow does. The data used for this variable, provided by the World Bank, are based on monthly average values.

3.2.5.4 Holiday Dummy Variables

The occurrence of Eid (a public holiday in Muslim countries) or Easter has a substantial impact on passenger flow in specific months, leading to an underestimation of passengers in those months and an overestimation in other months. Since the other holidays occur in fixed months each year, Eid is the only holiday that needs to be accounted for explicitly. To control for this, a holiday dummy variable is constructed that takes the value one if Eid occurs in April and zero if it occurs in another month, e.g. March. This is done because the ARIMA(1,0,2)x(2,0,0) model only remembers the values in the previous months and the value in the same month in previous years.
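A minimal sketch of constructing such a dummy (the observation months and Eid months below are illustrative; actual Eid dates follow the Islamic calendar):

obs.months <- seq(as.Date("2015-01-01"), as.Date("2016-12-01"), by = "month")
eid.months <- as.Date(c("2015-07-01", "2016-07-01"))   # months containing Eid al-Fitr
eid.dummy  <- as.numeric(obs.months %in% eid.months)   # 1 if Eid falls in that month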

3.2.6 Random Forest Regression

The random forest model is one of the most effective machine learning methods for predictive analytics, making it an industrial workhorse for machine learning. The random forest is an additive model that makes predictions by combining decisions from a sequence of base models (Cheng & Titterington, 1994). Typically, this class of model can be described as:

g(x) = g_1(x) + g_2(x) + g_3(x) + \cdots        (3.5)

where g is the final model, the sum of the simple base models g_i; each base model here is a simple decision tree. This technique uses model ensembling to obtain better predictive performance from multiple models. In this model, all the base models are constructed independently using different subsamples of the data. The random forest can capture non-linear interactions between the features and the target.
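A minimal sketch of a random forest regression on lagged passenger counts, assuming pax.ts holds the monthly series (the lag choices are illustrative):

library(randomForest)

df <- data.frame(pax = as.numeric(pax.ts))
df$lag1  <- c(NA, head(df$pax, -1))              # previous month
df$lag12 <- c(rep(NA, 12), head(df$pax, -12))    # same month, previous year
df <- na.omit(df)
rf.fit <- randomForest(pax ~ lag1 + lag12, data = df, ntree = 500)
predict(rf.fit, newdata = tail(df, 1))           # predict the latest month from its lags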

3.3 Construction of Models: Data Source and Description

This Thesis investigates air passenger numbers; to that end, it analyses 33 airlines from the International Air Transportation Association (IATA, 2017). Insights from the model and the accuracy of its predictions are presented here. The data represent eight central regions and consist of 51,983 monthly observations. Monthly time series are more difficult to predict than other types of seasonal data, such as quarterly time series, because they contain more seasonal periods. The time series regression model employed GDP, population, the exchange rate, and a dummy variable, obtained from World Bank data (World Bank, 2016). The dependent variable is the number of passengers per month. The forecasts were made for the total passengers departing from Oman to any international destination; arriving passengers were not included, since the aim is to estimate the number of international departures. The monthly observations in the year 2016 were treated as unknown and used to compare the point predictions with the actual values in order to evaluate forecast accuracy. In 1998 an economic crisis shook the economy of Oman, and to account for this a "dummy variable" was added to the model. A dummy variable (also known as an "indicator variable" (Gujarati & Porter, 2009)) is one that takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. Dummy variables are used as devices to sort data into mutually exclusive categories (Gujarati & Porter, 2009), such as holiday/not holiday, or, in econometric time series analysis, to indicate the occurrence of major strikes (Gujarati & Porter, 2009). A dummy variable can thus be thought of as a truth value represented numerically as 0 or 1: when the dummy takes the value 0, its coefficient plays no role in influencing the dependent variable; when it takes the value 1, its coefficient acts to alter the intercept (see Section 3.3.4). In this section, the proposed working algorithm of the study is presented. Firstly, a benchmark model is built: a time series forecast that takes only the previous values of the passenger data into account. This benchmark model was then expanded with exogenous explanatory variables, and is referred to as the "causal model". The criteria used for model selection include the standardised residuals, the ACF of the residuals, the Q-Q plot, and the Ljung-Box statistic. Finally, a random forest model was built in an attempt to obtain more accurate forecasts.

3.3.1 Benchmark Model

The purpose of this data exploration was to find exogenous variables that affect monthly airline passengers departing from Muscat to any international destination between January 1998 and December 2016. Time series models used for forecasting airline passenger numbers use only information on the variable to be forecast and make no attempt to discover the factors that affect its behaviour. Therefore, this study extrapolates trend and seasonal patterns and ignores all other information, such as competitor activity, marketing initiatives, and changes in economic conditions. In this data exploration, the study focused on finding latent factors that affect passenger flow. The Box-Jenkins method was applied to determine the fit of the airline model to monthly Oman data. The method assumes stationarity (meaning that the properties of the process do not change over time), which is a crucial assumption and is why it is often necessary to transform the data until this condition is fulfilled (Cryer & Chan, 2008). In Figure 3.3, only observations until December 2015 are included, since the remaining ones are saved to evaluate the forecasts. R software was used to draw the timing diagram, as shown in Figure 3.3 below. It can be seen from Figure 3.3 that the passenger flows include a positive trend and seasonal variation. There is a sharp decline in the number of passengers in 2000/2001 as a result of the terrorist attacks on the World Trade Center; the declines in 2008 and 2014/2015 are due to financial crises (Frankel & Saravelos, 2010; Feldkircher, 2014). The data were aggregated across all destination regions for a given month, and it can be seen that the number of passengers increased following a relatively stable trend year by year. The general linear regression models used in this study are described in the following sections.

Figure 3.3 Monthly departing passengers (in 1000s).

As expected, there is a clear seasonal pattern (see Figure 3.3), with the summer months the busiest. Since the pattern is similar regardless of year, the behaviour of passengers does not appear to change over time. There is a positive trend, and the variance increases with time. The series follows the expected pattern and, not surprisingly, resembles a typical airline passenger process. Point forecasts can be quite inaccurate for long-term forecasting, as they do not seem to follow the overall pattern. The "overall pattern" in the plot of monthly passengers (Figure 3.3) is the visible seasonal pattern in the data spanning from the beginning of 2000 to mid-2006 (about 90 data points, or months). The seasonality observed is not a typical calendrical pattern but is clearly associated with the business cycle or, more specifically, with the market downturns in 2001 and 2008. This study cannot model the business cycle per se, since it has a much longer time frame; however, it can be "controlled for" with dummy variables or by factoring in a macroeconomic index such as quarterly GDP and holidays.

Figure 3.4 Seasonal plot: Air passengers.

Figure 3.4 shows that passenger numbers reach their trough in January each year and peak in the middle of the year. Looking closely at Figure 3.4, the long-term seasonality in the data can be seen. The change of seasons makes the time series of air passenger numbers vary regularly, which is the seasonal periodicity seen in Figure 3.3. There are many reasons for these seasonal changes, such as climate and holidays.

By eliminating seasonal effects in the sequence through seasonal adjustment, the trend of the time series can be seen clearly (Liu et al., 2008). In order to reflect the essential attributes of the real economy more accurately, further methods are needed to eliminate and adjust for the seasonal variation factors in the time series. The original data displayed in Figure 3.4 show a positive trend and increasing seasonal variation. A log transformation of the data is therefore applied to stabilise the seasonal variation without loss of information. The trend decomposition method used here is STL, a particular variant of time series decomposition with increased robustness and generality (Cleveland & Cleveland, 1990). At this stage, more understanding is gained of the difference between seasonality, trend, and cycle. STL decomposition explains it thoroughly, since it is a commonly used method for smoothing two-dimensional scatter points, combining the flexibility of nonlinear regression with the simplicity of traditional linear regression. R's stl() function ("seasonal and trend decomposition using Loess") is applied to the dataset:

# Decompose the series into seasonal, trend and remainder components
decomposed <- stl(time.series, s.window = "periodic")
plot(decomposed)

This decomposes the data as the sum of a seasonal, a trend and a noise/remainder time series:

Figure 3.5 The transformed data series.

When estimating a response variable value with Loess, a subset of data points near the point of interest is first selected (from variables that may or may not affect the prediction); a linear or quadratic regression is then fitted to this subset. The weighted least squares used in this study gives higher weight to points closer to the point being estimated. The final step is to use the local regression model to estimate the response variable's value. The fitted curve obtained by this point-by-point procedure has many advantages: it can handle any seasonal data, the seasonal component can change over time, the rate of change can be controlled, and it is robust to outliers. The disadvantage of the method, however, is that it applies only to additive models. The upward curve at the top of Figure 3.5 represents the development trend of passenger flows along with the associated time sequence information. As noted earlier, the plot in Figure 3.5 indicates a non-stationary process with a linear trend, but the log transformation made the seasonal variance constant over time. To make the series stationary, first and seasonal differences are applied; after this transformation, the mean is centred on zero. In order to predict the future trend of air traffic effectively, the air passenger numbers curve needs to be fitted to make reasonable predictions: first the seasonal changes in the time series are removed, and then the remaining part of the data is modelled. In order to forecast the monthly passenger flow, multiple models for the number of air passengers departing from Oman are estimated and used for forecasting in the next sections.
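The transformation sequence just described can be sketched in R, assuming pax.ts is the monthly series:

# Log to stabilise the variance, then first and seasonal (lag-12) differences
stationary <- diff(diff(log(pax.ts)), lag = 12)
plot(stationary)   # the mean should now be centred on zero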

3.3.2 ARIMA Model

A popular and widely used statistical method for time series forecasting is the ARIMA model, a class of statistical models for analysing and forecasting time series data. It explicitly caters for a suite of standard structures in time series data and, as such, provides a simple yet powerful method for making skilful time series forecasts. It has an advantage when handling time series datasets in that it provides autoregression functionality, which most machine learning algorithms lack. ARIMA is usually limited to short seasonal periods, e.g., 12 months (monthly data) or 3 months (quarterly data). Several ARIMA models were proposed, and the data were split into training and test sets: only observations up to December 2015 are used for training, since the remaining ones are saved to evaluate the forecasts. This technique is called cross-validation; it involves holding back a particular sample of the dataset on which the model is not trained, so that the model can be tested on this sample before being finalised. The steps involved in cross-validation are listed below:

1. Reserve a sample dataset.
2. Train the model on the remaining dataset.
3. Use the reserved sample for a validation test; this helps gauge the effectiveness of the model's performance. If the model performs well on the validation data, it can continue to be used.

There are various methods available for performing cross-validation; here the data are split into train and test sets, with the train set containing the data until 2015 and the test set containing the rest. The next step is building the ARIMA model. A linear combination of past values of the same variable is applied in this forecasting study using an autoregression model; the term autoregression indicates a regression of the variable against its own previous values (Box & Jenkins, 2015). An autoregressive model of order p can therefore be written as:

y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t        (3.6)

where \varepsilon_t is white noise (typically assumed normally distributed). This is a regression with lagged values of y as predictors, where \phi_k is the coefficient of the kth lagged value; it is referred to as an AR(p) model. Moving average models are used when there are random jumps in the time series data; these jumps are represented in the calculated errors. A moving average model uses past forecast errors in a regression-like model (Box & Jenkins, 1976; Box & Jenkins, 2015):

y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}        (3.7)

where \varepsilon_t is again white noise. This is referred to as an MA(q) model, a moving average model of order q.
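Returning to the hold-out procedure described above, a minimal sketch, assuming pax.ts spans January 1998 to December 2016:

library(forecast)

train <- window(pax.ts, end = c(2015, 12))   # observations up to December 2015
test  <- window(pax.ts, start = c(2016, 1))  # 2016 reserved for validation
fit <- auto.arima(train)                     # selects p, d and q automatically
accuracy(forecast(fit, h = 12), test)        # compare forecasts with 2016 actuals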

3.3.3 Finding the Order of AR and MA Models

After a time series has been made stationary by transformation, the next step in fitting an ARIMA model is to determine whether MA or AR terms are needed to correct any autocorrelation that remains in the differenced series. Using software tools such as Statgraphics, one can try different combinations of terms and select the best one.

Table 3.5 ACF and PACF behaviour used to identify AR and/or MA terms.

Model        ACF                                  PACF
AR(p)        Spikes decay towards zero            Spikes cut off to zero after lag p
MA(q)        Spikes cut off to zero after lag q   Spikes decay towards zero
ARMA(p, q)   Spikes decay towards zero            Spikes decay towards zero
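A minimal R sketch of the diagnostic plots behind Table 3.5, assuming pax.ts as before:

stationary <- diff(diff(log(pax.ts)), lag = 12)
acf(stationary)    # the lag where spikes cut off suggests the MA order q
pacf(stationary)   # the lag where spikes cut off suggests the AR order p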

There is also a more systematic way to complete this step: by checking the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the differenced series, the required numbers of AR and MA terms can be identified provisionally. The value of p is taken as the lag at which the coefficient last touches the upper boundary of the confidence interval in the PACF plot, and the value of q as the corresponding lag in the ACF plot. The value of d, the differencing order, equals the number of times the data are differenced to make them stationary. In the plots, the dotted lines on either side of 0 represent the confidence interval; these can be used to find the values of p and q, i.e., the orders of the AR and MA parts. In order to model monthly international air travel passengers and other seasonal time series, a two-coefficient time series model of factored form, also known as the airline model, here a seasonal ARIMA(1,0,2)x(2,0,0), was developed (Box & Jenkins, 1976). This SARIMA(1,0,2)(2,0,0)[12] model, fitted to the passenger series pax.ts, is often used to forecast passenger flow since it is both accurate and straightforward to estimate. The airline model serves as a suitable benchmark model to be improved upon. Firstly, the airline model is fitted to the Oman data, as illustrated in Figure 3.6.

Figure 3.6 Benchmark model.

The fit of the monthly airline passenger numbers obtained by the ARIMA model, shown in Figure 3.6, is close to ideal, and this method can be used in the future for forecasting changes in passenger flows. The high prediction accuracy of the model can be seen in Figure 3.6. The mean of the residuals is close to zero, and the residual series is not significantly correlated. Apart from one outlier, the time plot of the residuals indicates that the variation of the residuals stays almost the same as the historical data, so the residual variance can be treated as constant. The variance in the time series appears constant in the Standardised Residuals plot, except in the years 2000/2001, 2008 and 2014/2015 and several months of the following years, due to the differencing. The magnitude of these residuals is larger than 4, which is unusual for a standard normal distribution; this often occurs due to omitted variable bias, i.e., shocks to the series resulting from certain factors not being controlled for. The right tail of the histogram appears somewhat long, indicating that the residuals may not be normal even when the outlier is ignored. Therefore, this method produces good forecasts, but there might be some inaccuracy in the prediction intervals. Figure 3.6 shows the ACF and Ljung-Box plots. The regression coefficients for the ARIMA(1,0,2)x(2,0,0) model are given in Table 3.6, which shows that the estimated intercept of the model was 2047.2502.

Table 3.6 The regression coefficients for ARIMA(1,0,2)x(2,0,0).

Coefficient                       Estimate
Autoregression (ar1)                0.8903
Moving average (ma1)               -0.4517
Moving average (ma2)               -0.1380
Seasonal AR (sar1)                 -0.0897
Seasonal AR (sar2)                 -0.1830
Intercept                        2047.2502
Regression coefficient (xreg)       8.580

3.3.4 Causal Model

The causal model is defined as an extension of the airline model that accounts for omitted factors with a possible effect on passenger flow. Based on the available data, the aim is to find exogenous variables that affect monthly air passengers departing from Oman (Muscat International Airport) to any international destination. The factors explored include Oman's jet fuel price, real interest rates, unemployment size, real GDP, real GDP per capita, population size, the average base fare from Oman to any international destination, and Oman's public holidays. The approach to finding the omitted factors is to compare the ARIMA model using one variable with the benchmark model, using penalty function statistics such as the Akaike Information Criterion [AIC] (Akaike, 1974) or the Bayesian Information Criterion [BIC] (Schwarz, 1978) as a reference. AIC/BIC assist time series analysts in reconciling the need to minimise errors with the conflicting desire for model parsimony.

These statistics take the form of minimising the residual sum of squares plus a "penalty" term that incorporates the number of estimated parameter coefficients, thereby rewarding model parsimony. If the AIC and BIC are similar, it is prudent to choose the model that uses fewer parameters. If an accurate ARMA time series model exists, BIC and AIC are good criteria for finding it; the BIC is strongly consistent, while the AIC will usually result in an over-parameterised model with too many MA and AR terms (Mills, 1993). Other researchers are also in favour of the BIC over the AIC (Gómez & Maravall, 1997). Thus, in practice, using objective model selection criteria involves estimating a range of models and selecting the one with the lowest information criterion. This selection brings some difficulties. First, using a penalty function criterion is computationally expensive, so it is imperative to choose a maximum order to avoid high computational requirements; however, there is a lack of prior information to assist in selecting the maximum order of the ARIMA model.
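A minimal sketch of this comparison with the forecast package, using the orders reported earlier in this chapter and a candidate regressor fare.ts:

library(forecast)

bench  <- Arima(pax.ts, order = c(1, 0, 2),
                seasonal = list(order = c(2, 0, 0), period = 12))
causal <- Arima(pax.ts, order = c(1, 0, 2),
                seasonal = list(order = c(2, 0, 0), period = 12),
                xreg = fare.ts)
c(AIC(bench), AIC(causal))   # lower values indicate the preferred model
c(BIC(bench), BIC(causal))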

One good way of determining the maximum order of the model is to select one less than the seasonal span for the regular terms (for example, three for quarterly data) and one for the seasonal term. Second, different objective model selection criteria can suggest different models; that is, the ranking order under the BIC will usually not be the same as under the AIC. Comparisons of the top seven ranking models under the BIC and AIC for the monthly passengers departing from Muscat to any international destination during the period January 1998 to December 2016 are illustrated in Figures 3.7 – 3.20.

3.3.4.1 The Effect of Monthly Average Base Fare on Passenger Flow

Time series graphs of the Average Base Fare forecast values are displayed in Figures 3.7 and 3.8.


Figure 3.7 Monthly average base fare effect on passenger flow.

Figure 3.8 Model 1: Arima(pax.ts, xreg = fare.ts).

As can be seen from Figures 3.7 and 3.8, the average airfare has some effect on passenger flow. Since demand responds negatively to airfare, increased airline congestion costs make the airfare effect more negative and decrease air passenger demand. Airfare represents the commercial aeroplane fare for transportation, and its impact on air passenger demand is determined using the concept of price elasticity of demand: the percentage change in demand when the average airfare changes by 1% (Sabatelli, 2016). The time elasticity of demand can be defined analogously as the percentage change in total travel demand when travel time changes by about 1%. In this study, the airfare impact is defined as the change in demand resulting from the percentage change in average travel cost multiplied by the price elasticity. The exchange rate also has some effect on airfare prices. It had the expected negative coefficient, since it is measured in OMR/EUR/USD. The effect of an increase in the exchange rate goes two ways: when it becomes cheaper to travel to countries using euros or dollars, Omanis are more likely to travel to such countries, as the overall vacation cost decreases when hotel stays become cheaper; meanwhile, it becomes more expensive for tourists to visit Oman. Since tourists are included in the data when they leave Oman, the negative coefficient indicates that the effect is dominated by Omanis travelling abroad.

3.3.4.2 The Effect of Jet Fuel Price on Passenger Flow

The exchange rate also has some effect on jet fuel prices, and the jet fuel price in Oman has some effect on passenger flow (see Figures 3.9 and 3.10).

Figure 3.9 Jet fuel price effect on passenger flow.

Figure 3.10 Model 2: Arima(pax.ts, xreg = JetFuel.ts).

Since airlines' operational costs increase with the oil price, the oil price can serve as a proxy for the ticket price. It was expected that the adverse effects of this relationship would be reflected in ticket prices; however, the positive coefficient obtained suggests otherwise. It has been reported that including the oil price as an explanatory variable leads to the same problem, with the possible explanation that the oil price serves as an indicator of the overall state of the global market (Yu et al., 2009). When economies grow, demand for oil increases, which puts upward pressure on the price. In the period covered by this study's model, increases in the oil price occurred in global periods of economic prosperity, during which passenger numbers also increased simultaneously with the oil price. Historical oil price data showed a peak in August 2008 and a decline in the following period; until that point, both monthly passenger numbers and the oil price had increased steadily. Both decreased dramatically after the financial crisis and showed an even stronger positive correlation, although the decline in passengers was the result of other factors. The effect of the oil price could be different if separate pre- and post-2008 models were constructed. In general, the causal forecast lies between the naïve forecast and the actual number of passengers, indicating a forecast improvement. Because the oil price enters the model without a lag, using the causal model to forecast passenger flows 12 months ahead requires predictions of the explanatory variables for each month.

3.3.4.3 The Effect of Oman's Population Size on Passenger Flow

Air travel demand is affected by several factors, such as population, which relates directly to fundamental demographic and macroeconomic factors (Seraj et al., 2001). Population growth can generate more travel demand. In this model, population must be examined carefully, since it acts as a predicted variable: future populations are not guaranteed, and experience shows that population models are notoriously unreliable. At the level of national figures and short-period aggregates, the predictions of population models are reasonably accurate, but in most countries they are inaccurate over twenty- to thirty-year horizons and require significant adjustment even over periods of less than ten years, due to migration, fertility rates and death rates. The disaggregate level, which can be applied at city or metropolitan level or even at a reliable national level, has not been attained in this study. The disaggregate population of urban areas in developing countries must be divided into a non-economic population, subsumed into urban areas only because they have no position in the rural economic structure, and an economic population who are wage-earning or engaged in trade. The effects on air transport demand of the non-economic population are small, so the essential population factor is driven by economic growth: the higher the growth, the greater the proportion of the population entering the economic sphere of activity. Figures 3.11 and 3.12 illustrate Oman's population size, which has little effect on passenger flow.

Figure 3.11 Oman population size effect on passenger flow.

Figure 3.12 Model 3: Arima(pax.ts, xreg = pop_total.ts).

4.2.10.4 The Effect of Oman’s Real GDP on Passenger Flow The air passenger numbers are influenced directly by the quality of people’s life and income. One of the critical impacts on air traffic demand is attributed to GDP because it increases profit by generating an increase in travel (Pupavac, 2009). The research result of EU data indicated that there is a lag in growth rate in air passenger transport demand

compared with the GDP growth rate (Grosche et al., 2007). For example, passenger traffic increased by only 9.33% between 2000 and 2007, while European GDP increased by 16.61% over the same period. Oman's real GDP is presented in Figure 4-13, and it shows some effect on passenger flow. Data from Figures 4-13 and 4-14 show a correlation between actual GDP growth rates and realised growth rates of passenger numbers. This correlation is rather strong: the number of passengers grows at a higher rate when there is growth in GDP and, vice versa, when the rate of GDP growth is negative, the total number of air passengers shows a greater negative rate. The regression model in this study was built on explanatory or independent variables such as GDP. A disaggregated approach would probably increase the accuracy of the forecast and could be of interest to airline companies forecasting specific routes. It could also be used for computing aggregated forecasts of interest to Oman's transport agency and other stakeholders. However, such an approach is often time consuming, reliant on extensive data, and beyond the ambitions of this Thesis. GDP on a per capita basis is also an indicator of how well off the population is. Since air travel is still considered a luxury in developing countries, per capita GDP is an indicator of people's ability to fly.

Figure 4-13 Oman’s GDP effect on passenger flow.


Figure 4-14 Model4: Arima((pax.ts.)=gdp.ts).

4.2.10.5 The Effect of Oman's Unemployment Size on Passenger Flow
The unemployment rate is also a determinant of air travel demand (Clark et al., 2009; Wensveen, 2011). It is a standard independent variable in regressions used to forecast air travel demand, even when income effects are also included (Carson et al., 2011). Unemployment in the airline industry affects the quality of services because new people will not be hired due to financial instability in the industry. To provide better services to its customers, a company needs skilled employees; widespread unemployment therefore leads to poor service and low demand in the industry, pushing it towards recession. Figures 4-15 and 4-16 indicate that Oman's unemployment size has some effect on passenger flow.

Figure 4-15 Oman's unemployment size effect on passenger flow.


Figure 4-16 Model5: Arima((pax.ts.)=unemployment.ts).

4.2.10.6 The Effect of Oman's Real Interest Rate on Passenger Flow
A variety of other factors might also affect air travel demand, such as holidays and real interest rates (Gesell, 1993). Interest rates affect the balance between expenditure and saving (Cook, 2007). High interest rates restrict economic activity and have a dampening effect on airline traffic (Wensveen, 2011). Exchange rates and interest rates strongly affect the growth of air passenger demand, but mainly in the short term (Omanair, 2011). Exchange rates can be strong determinants of air transport demand, mainly on international routes. If the currency of a country is stronger than others, its citizens can obtain more foreign currency for the same amount of money than before. This provides relatively lower prices and increases the desire to travel to a country with lower prices. For example, it was reported that positive exchange rate coefficients make destinations more attractive for UK citizens (Dargay & Hanly, 2001): UK citizens could obtain more local currency for the same number of pounds and would have more desire to travel to such countries.


Figure 4-17 Oman’s real interest effect on passenger flow.

Figure 4-18 Model6: Arima((pax.ts.)=interestrate.ts).

4.2.10.7 The Effect of Oman's Monthly Public Holidays on Passenger Flow
Oman's monthly public holiday (Eid), shown in Figures 4-19 and 4-20, has some effect on passenger flow. The variable controlling for Eid also has the expected positive sign. If a year when Eid occurred in March, for example, is followed by a year when Eid occurred in April, this yields a positive effect on the number of passengers in April. When scrutinising the original data for monthly passengers, March is barely affected by the occurrence of Eid, while April is. A possible explanation might be that April is a warmer month than March, which is why holiday planners are more likely to adjust vacation plans and travel somewhere cooler, depending on which month Eid occurs in. A week's leisure in April makes people more likely to travel abroad than a week's leisure in March. During the period 1998 to 2015, only four Eids occurred in March. The model captures the Eid effect but underestimates it. When making the 2016 forecast, all observations were included in the model, hence another rare Eid observation was added. This results in an Eid effect significant at the 1% level, compared with the forecast for

2015, where the effect was significant at the 10% level. Since future Eid dates are known, this is a simple way to improve forecasts. Zero coefficients would mean there is no linear relationship between passenger flow and the number of holidays.

Figure 4-19 Oman's monthly public holiday effect on passenger flow.

Figure 4-20 Model7: Arima((pax.ts.)=holiday.ts).

Point predictions of the causal model follow the actual values to a reasonable extent. The most significant deviations occur during the summer months and in April. The model underestimates the number of passengers in the summer of 2014; people appear to have travelled more during the summer of 2014 than in previous summers. The model continues to underestimate passenger numbers in April, even though Eid is accounted for. The model captures the effect of the 2014 Eid

holidays, which occur in April 2015. The point prediction for this month is adjusted by the forecast of April 2015, in comparison with the 2014 forecast. The other point predictions lie within the trend of the series. An overall increase in passengers is predicted for August 2015 to November 2015, which might be due to increased weekend travelling, more popular during the autumn because of more clement weather and public holiday periods. The point prediction for March is less than the 2014 forecast, but there is no way of determining its accuracy.

4.2.11 Random Forest Model
The random forest algorithm is a supervised classification and regression algorithm. As the name suggests, the algorithm randomly creates a forest with several trees. In general, the more trees in the forest, the more robust the forest; similarly, in a random forest classifier, a higher number of trees produces more accurate results. A random forest is an ensemble of decision trees: it randomly selects a set of parameters and creates a decision tree for each set of chosen parameters. The variables of the random forest model include the average base fare, the number of holidays, the MA(3) of total passengers, the MA(1) of total passengers, and the first difference of total passengers. Since the monthly passengers in 2016 are treated as unknown, the random forest model, unlike the ARIMA model, can only predict the monthly passengers one by one: predict the passengers in January 2016, then take the January prediction as a known value and predict the passengers in February, and so on.
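The iterative one-step-ahead scheme just described can be sketched in R as follows. This is a minimal illustration only: the column names (fare, holidays, ma3, ma1, diff1) and the exogenous 2016 inputs are assumed for the sketch and are not the exact code of this study.

library(randomForest)

# Fit on the 1998-2015 training rows (column names assumed for illustration)
rf_fit <- randomForest(pax ~ fare + holidays + ma3 + ma1 + diff1, data = train)

history <- train$pax          # observed monthly passengers up to December 2015
preds   <- numeric(12)        # one slot per month of 2016
for (m in 1:12) {
  new_row <- data.frame(
    fare     = fare_2016[m],            # assumed known exogenous inputs
    holidays = holidays_2016[m],
    ma3      = mean(tail(history, 3)),  # MA(3) of total passengers
    ma1      = tail(history, 1),        # MA(1) of total passengers
    diff1    = diff(tail(history, 2))   # first difference of total passengers
  )
  preds[m] <- predict(rf_fit, new_row)
  history  <- c(history, preds[m])      # treat the prediction as a known value
}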

4.3 Performance Evaluation: Evaluation Metrics
Accurate forecasts can only be assessed by considering how well a model performs on new data that were not used when fitting the model; looking at how well a model fits the historical data is not sufficient. It is common to use a portion of the available data for fitting when choosing a model and to use the rest for testing. The testing data can then be used to measure how well the model is likely to forecast new data. This study uses multiple metrics to measure forecast accuracy. Let $y_t$ denote the observation and $\hat{y}_t$ a forecast of $y_t$. The forecast error is $e_t = y_t - \hat{y}_t$, which is on the same scale as the data. Accuracy measures based on $e_t$ are therefore scale-dependent and cannot be used to make comparisons between series that are on different scales.

MAE and RMSE are the two most commonly used scale-dependent measures, based on absolute errors and squared errors respectively (Armstrong, 1978; Hyndman & Koehler, 2006):

4.8 Mean absolute error: $\text{MAE} = \text{mean}(|e_t|)$,

4.9 Root mean squared error: $\text{RMSE} = \sqrt{\text{mean}(e_t^2)}$.

When comparing forecast methods on a single data set, the MAE is often preferred because it is easier to understand and compute. The percentage error is given by

$p_t = 100\, e_t / y_t$. It has the advantage of being scale-independent, so it is frequently used for comparing forecast performance between different data sets (Armstrong, 1978; Hyndman & Koehler, 2006). The most commonly used measures are:

4.10 Mean percentage error: $\text{MPE} = \text{mean}(p_t)$,

4.11 Mean absolute percentage error: $\text{MAPE} = \text{mean}(|p_t|)$.

The disadvantage of measures based on percentage errors is that they are infinite or undefined if $y_t = 0$ for any $t$ in the period of interest, and they take extreme values when any $y_t$ is close to zero. Another problem with percentage errors is that they assume a meaningful zero on the measurement scale.

Theil's U statistic is a relatively accurate measure that compares the forecast results with a naïve forecast. It also squares the deviations, giving more weight to large errors and exaggerating them, which helps to penalise significant errors. For cross-sectional data (Theil, 1958), a scaled error can be defined as:

4.12 $q_t = \dfrac{e_t}{\frac{1}{n}\sum_{i=1}^{n} |y_i - \bar{y}|}$

In this case, the error is scaled against the mean forecast (Armstrong, 1978; Hyndman & Koehler, 2006). The mean absolute scaled error is then:

4.13 $\text{MASE} = \text{mean}(|q_t|)$.
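For concreteness, the measures in Equations 4.8-4.13 can be computed in a few lines of R. This is a minimal sketch assuming y is a numeric vector of hold-out observations and yhat the corresponding forecasts, with the cross-sectional (mean-forecast) scaling of Equation 4.12 used for MASE:

e    <- y - yhat                     # forecast errors
p    <- 100 * e / y                  # percentage errors
MAE  <- mean(abs(e))
RMSE <- sqrt(mean(e^2))
MPE  <- mean(p)
MAPE <- mean(abs(p))
q    <- e / mean(abs(y - mean(y)))   # scaled errors against the mean forecast
MASE <- mean(abs(q))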

When forecasting data with distinct seasonal patterns and an overall increasing trend, the causal models predict reasonably well. The RMSE gives a square root of the average squared deviation of 54,515 passengers per month, or 9.5% in mean absolute percentage terms, compared with the benchmark model's 59,069, or 10.2%. Both models' point predictions are plotted with the actual series in Figure 4-21, where it can be seen that both forecasts underestimate the number of passengers in April, June and July while overestimating them in other months. In general, the causal forecast lies between the benchmark forecast and the actual number of passengers, indicating a forecast improvement. The following table lists the evaluation metrics of the models.

Table 4-7 The evaluation metrics of the models.

Model RMSE MAE MAPE MASE

Monthly passengers time series (benchmark model) 59,069.36 49,876.47 10.22179 1.039352

Monthly passengers + average base fare time series 54,515.8 46,412.12 9.479257 0.96716

Monthly passengers + jet fuel price time series 91,178.87 75,934.25 14.5154 1.582358

Monthly passengers + GDP time series 2,807,080 2,804,603 579.501 58.44379

Monthly passengers + GDP per capita time series 60,459.4 52,647.66 11.20872 1.0971

Monthly passengers + unemployment size time series 62,835.92 51,316.97 10.22283 1.06937

Monthly passengers + interest rates time series 59,752.49 49,829.1 10.12643 1.038365

Monthly passengers + holidays time series 58,877.53 49,867.34 10.21048 1.039162

Random forest 34,176.74 25,447.81 4.892305 0.5302949

As can be seen from the evaluation metrics table, the ARIMA model of monthly passengers with average base fare and the random forest model perform better than the benchmark model. Figure 4-21 shows the actual monthly passengers in 2016 and the predictions of three models: the benchmark model, the benchmark + average base fare model, and the random forest model.


Figure 4-21 The actual monthly passengers in 2016 and the predictions of the benchmark model, benchmark + average base fare model and random forest model.

4.4 Discussion
This chapter has discussed the available data used for analysis and hypothesis testing throughout this study. The data source and specific information, such as the structure of the data tables and the number of records, were described to explain the rationale for the hypothesis proposed in Chapter 1. The proposed cleaning algorithm for the dataset was presented; this algorithm collects data and pre-processes them using R. A model validation process is required to help build confidence in the model. The objective of this Thesis is to obtain a deeper understanding of the model.

For this purpose, historical data covering the simulation time horizon of the base model (1998-2015) are needed. A model is considered valid when the error rate is less than 5% (see Table 4-7), according to previous research (Barlas, 1994). Comparisons between the model and data for air passenger demand, Oman's unemployment size, Oman's real GDP, Oman's real GDP per capita, Oman's population size, and the average base fare from Oman (Muscat) to any international destination are given respectively.

The forecast of departing passengers is improved by taking into account changes in the oil price and the OMR/EUR/USD exchange rate while controlling for the occurrence of public holidays. The actual values of the explanatory variables were used to determine whether it is advantageous to use them as input. This should be considered when evaluating the forecast, since it is impossible in out-of-sample forecasts. Based on the results of the ARIMA model, which is used for predicting the trend of passenger flows over the next 12 months, it was concluded that this model works better in our case. The following conclusions are drawn from the ARIMA model results for passenger flow prediction:

1. Based on Oman's comprehensive national strength and people's rising living standards, air passenger volume will keep increasing steadily and the air passenger market will become more significant.
2. There is an inevitable seasonal fluctuation in airline passenger numbers, the main reasons for which include public holidays, oil prices, and other factors mentioned in this chapter. The summertime, i.e. August, is the peak in air passenger numbers.

Only by accurately predicting the trend of air passenger traffic can airlines reasonably adjust workforce and material resources in the passenger transport market. These adjustments will improve the operational capacity and service quality of airlines.

Chapter 5: Optimising the Deep Learning Model for Neural Network Topology to Improve Classification Accuracy

5.1 Introduction
Machine learning is the science of instructing a computer to perform operations without telling it exactly how to perform the tasks involved in such an operation (Burkov, 2019). A model is fed with large quantities of data as well as the anticipated responses. From the data input, the model derives a mathematical mapping that labels the input and binds it to the anticipated response. This is done in such a way that the model can generate correct response predictions even for input that has not yet been seen (Burkov, 2019). This form of machine learning that utilises labels is known as supervised learning. In the current world, machine learning is used to analyse enormous volumes of data, the complexity of which increases along with any increase in activities and specialisations.

In the last decade, the advent of deep learning technology facilitated the creation of more effective learning models, which have been applied in various contexts such as health and education. When optimising such models, they are calibrated with large datasets, which consequently enables future predictions (Bengio, 2013). There are various optimisation techniques employed in deep learning to enhance performance. One commonly used technique is principal component analysis (PCA), which is mostly applied to enhance the calibration of deep neural networks. The main challenge remains how to train deep neural networks with large sets of data (Hornik, 1990). While trying to solve this problem, researchers came up with deep learning methods capable of training deep network variants such as the convolutional network presented by Fukushima (Hochreiter, 1991). This development played a critical role in re-establishing the enthusiasm of other researchers toward conducting more research on deep neural networks. It is, however, a common phenomenon for researchers investigating neural networks with numerous layers to encounter the

problem of some transfer function gradients approaching zero. It was Hochreiter who addressed this issue of the disappearing gradient while conducting his PhD (Bishop, 1991).

Before deep learning came into use, the majority of neural networks employed a simple quadratic error performance measure on the output layer (De Boer et al., 2005). The introduction of the cross-entropy error function facilitated better outcomes than the quadratic function previously in use (Michalski et al., 2014). This is mainly because it tackled the vanishing gradient issue by allowing errors to change weights even when a neuron's gradient saturates (its output gradient is close to zero). Additionally, it offers a form of error representation better suited to classification neural networks than the quadratic error. As such, the analysis used in this study will utilise the cross-entropy error measure for classification as well as the more commonly used root mean square error (RMSE) measure for regression.

Random weights are the starting points for neural networks (Rzempoluck, 2012). These random weights are often sampled within ranges such as (-1, 1). At times, such initialisation can create a set of weights that proves difficult for back-propagation training (for example, becoming caught in a local minimum). To counter this challenge, Bastien et al. (2012) described the rectified linear unit (ReLU) transfer function. Contrary to sigmoidal transfer functions, ReLU transfer functions achieve better training results for deep neural networks when used in the hidden layers. As stated in past studies, it is essential to identify the type of transfer function to be used for each layer in a deep neural network (Glorot & Bengio, 2010; Bengio, 2013). Most deep neural networks use a linear transfer function in their output layers for regression and a softmax transfer function for classification. No transfer function is required for input layers.

Overfitting has been identified as a common hindrance for neural networks. Overfitting is said to occur when a trained neural network starts mastering the outliers in a dataset (Bastien et al., 2012). When there are many variables relative to the total number of training examples, the overfitting problem is countered through feature selection and regularisation. To begin with, features are extracted from the input data. This is a way of creating a new representation specifically for the current

task (Grus, 2015). For the task to be achieved, a classification system is then learned on top of the extracted features. Once training has been undertaken, it should be possible to apply the system to data not seen during the training stage and accurately predict the response, giving rise to the class labels (Kelleher et al., 2015). Until recently, the features extracted from the input were mostly handcrafted, meaning that the features were designed specifically for the input and task at hand. Generally, they are tied not only to the type of data, for instance prediction of airline passenger numbers, but to a specific subset, such as the prediction of Oman airline passenger numbers using Muscat airport data. In most instances, the majority of these features are not robust to change.

Another way to extract features from data is to develop a feature extractor using machine learning. Rather than building a system to classify some numbers, a learning system is built to extract features from the input (Han et al., 2016). In such a case, the network is able to learn higher-level features directly from the input figures. This approach is deemed superior to the use of handcrafted features for various reasons. First, a model that has been trained on a dataset can be adapted to many types of input, whereas handcrafted features may require hand-tuning for each dataset (Han et al., 2016). Additionally, this method does not require expert knowledge of the numbers being analysed.

This Thesis is focused on an in-depth analysis of techniques used to extract features from data. As outlined in the next section, attention is given to the extraction of features from numbers, in this case airline passenger figures. All machine learning workflows depend on feature engineering and feature selection. These two activities are considered by various data science and machine learning communities to be equivalent; although they overlap, they have distinct objectives (Girolami, 2011). Knowing the distinct goals of each can significantly improve workflows and processes in data science. The overfitting problem emanating from the large number of variables in comparison to the total number of training samples will be countered through feature selection and regularisation techniques.

For regression purposes, evaluation criteria for effective models are defined. Data are partitioned into training, validation, and testing sets to ensure robust statistics (Dangeti, 2017). In order to address the prediction problem, influential variables will be listed to

facilitate a better understanding of the factors that underlie the predictions. All results and executable code from the experiments and detailed analysis are included in the R notebook file as well as in Python. The purpose of this chapter is therefore to determine a model that accounts for exogenous variables and, by doing so, improve a regular time series forecast of monthly airline passengers departing from Oman to any international destination. This chapter also describes feature engineering, its significance, the problems solvable through it, and how to go deep into the process. First, features and feature matrices are explained, and then the differences between feature engineering and feature selection are outlined.

5.2 Brief Overview of Features
Machine learning is the process of generalising from a set of training data to predict or infer an output. Normally, a collection of numeric examples is taken as input by a machine learning algorithm in a process commonly referred to as 'training' or 'fitting'. Once the process is complete, the final result is a model that can be used to predict outputs in the future (Balasubramanian et al., 2014). A two-dimensional feature matrix is created by stacking the numeric examples on top of each other. The rows in the matrix represent examples while the columns represent features, as Table 5-1 below illustrates.

Table 5-1 Feature matrix diagram. Each row represents an example, and each column represents a feature describing that example.

                      Features
            Destination.Airport.Name   Destination.City.Name   Destination.Region.Name   ....
Examples    71                         69                      83
            153                        148                     46
            159                        153                     33
            ....                       ....                    ....

It is important to know the role of features in order to gain a clear understanding of machine learning. The main role of features is to transform input data from its current form into a format that is suitable for and readable by the algorithms. At times, if

the input contains a single numeric value for each example – for example, the pound amount in a credit card transaction – transformation may not be necessary. For deep learning in particular, features are usually simple since the algorithms generate their own internal transformations (Rahman, 2019). This approach requires large amounts of data and, on the other hand, tends to be less interpretable. These trade-offs, however, are worthwhile in image processing or natural language processing. In some cases, feature engineering is essential to convert data into formats suitable for machine learning. Both interpretability and performance are largely determined by the features chosen.

It is worth noting that without feature engineering, the accurate machine learning systems currently deployed by major companies would not exist. Feature engineering is the process of using domain knowledge to extract new variables from raw data that make machine learning algorithms work (Zheng & Casari, 2018). In a typical machine learning implementation, quantity prediction is achieved using information obtained from the data sources of the company in question. Such sources could be various databases as well as log files. Generally, they contain many tables connected by certain columns. An e-commerce airline website's database could, for example, have a table called 'members' containing a single row for every client that visited the site. It could also have a table called 'interactions' containing a row for each interaction (click or page visit) that the member made on the site. This table could also have information about when the interaction took place and the type of event that the interaction represented (for example, a 'Ticket Purchase' event, a 'Search' event, or an 'Add to Cart' event). The two tables are interrelated by the 'Member ID' column, as illustrated in Tables 5-2 and 5-3 below.

Table 5-2 Members table for airline clients.

Member ID   Email              Sign up Date
WY0         [email protected]   7/11/2018
WY1         [email protected]   21/10/2018

Table 5-3 Interaction table with an e-commerce airline website.

Interaction ID   Member ID   Date                  Type             Amount
X0               WY0         30/12/2018 08:22:00   Add to Cart      N/A
X1               WY0         30/12/2018 08:22:40   PurchaseTicket   400
X2               WY0         31/12/2018 10:12:00   PurchaseWeight   100
X3               WY1         01/01/2019 09:02:00   Add to Cart      N/A
X4               WY1         01/01/2019 13:00:00   PurchaseTicket   1200

*Note the 'Member ID' column, which is used to interrelate the tables.

To come up with an accurate prediction of when a member will next purchase an item, it is necessary to have a single numeric feature matrix with a row for every member, which is then used in the machine learning algorithm. The members table does not contain much essential information. However, a few features, such as the number of days since a specific member signed up, can be constructed, though the options are limited at this point. To improve the predictive power, therefore, the historical data in the 'interactions' table should be considered, an operation made possible through feature engineering. Aggregate statistics for each member can be computed using all values in the interactions table with reference to the member's ID. Below are possible aggregate features that can be derived from the members' historical behaviour.

• Average time between past purchases.
• Average amount of past purchases.
• Maximum amount of past purchases.
• Time elapsed since last purchase.
• Total number of past purchases.

To compute all these features, it is necessary first to find all interactions related to a particular member, filter out interactions whose 'Type' is not a purchase, and then compute a function that returns a single value using the available data. Generally, this process is unique to each case and dataset. These features are highly specific and would not make much sense for a dataset from a different industry, such as one describing network outages. However, in a network outage dataset, features using similar functions can still be built. To achieve this, it would require transformation of the Location column into one with a True/False value that indicates whether the data centre is in the Arctic Circle, for example (Zheng & Casari, 2018). Features can also be computed by utilising aggregation functions similar to the ones used for e-commerce, such as the following (a code sketch of the e-commerce aggregates is given after this list):

• Average time between past outages.
• Average number of affected servers in past outages.
• Maximum number of affected servers in past outages.
• Time elapsed since last outage.
• Total number of past outages.
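As an illustration of the aggregation process described above for the e-commerce case, the following base R sketch computes the per-member purchase features. The data frame and column names (interactions, MemberID, Type, Date, Amount) are assumptions matching Tables 5-2 and 5-3, not code from this study:

# Keep only purchase events ("PurchaseTicket", "PurchaseWeight", ...)
purchases <- interactions[grepl("^Purchase", interactions$Type), ]

agg_features <- do.call(rbind, lapply(split(purchases, purchases$MemberID),
  function(d) {
    dates <- sort(as.Date(d$Date, format = "%d/%m/%Y"))
    data.frame(
      MemberID        = d$MemberID[1],
      avg_gap_days    = if (length(dates) > 1) mean(diff(as.numeric(dates))) else NA,
      avg_amount      = mean(d$Amount, na.rm = TRUE),   # average past purchase
      max_amount      = max(d$Amount, na.rm = TRUE),    # maximum past purchase
      days_since_last = as.numeric(Sys.Date() - max(dates)),
      total_purchases = nrow(d)
    )
  }))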

To effectively use machine learning algorithms and consequently build predictive models, this type of feature engineering is essential. Deducing the right set of features to create leads to the biggest gains in performance (Kuhn & Johnson, 2019), and this is usually the primary focal point of a data scientist. The right transformations depend on many factors, such as the type and structure of the data, the size of the data, and the objectives of the data scientist. In practice, these transformations run the gamut from time series aggregations (averages of past data points) and image filters (blurring an image) to turning text into numbers (using advanced natural language processing that maps words to a vector space), as illustrated above.

Feature tools were also developed in this study to relieve some of the implementation burden on data scientists and to reduce the total time spent on this process by enabling feature engineering automation. Feature engineering transformations can be 'unsupervised', meaning that computing them does not require access to the outputs or labels of the problem at hand. Additionally, the interpretability of such features is usually high. In the examples mentioned above, the historical aggregations of member data or network outages are interpretable; an image filter, on the other hand, is not, since each feature would represent a pixel of data. It should be noted that feature engineering methods are less common than feature selection methods. This is because feature selection has been in use for a long time, whereas feature engineering has only been devised whenever a need arose to solve the problems of a specific situation.


5.2.1 Feature Selection
With any given dataset, it is possible to select many features. The crucial consideration is which features should be used in a specific model to achieve the desired results. There also exists an infinite number of possible transformations. Even when restricted to the space of common transformations for a given type of dataset, there are still thousands of possible features (Kuhn & Johnson, 2019). It would be convenient to plug in all these features and determine which ones work. In reality, however, algorithms do not function well when overloaded with too many features. This is especially true when the number of features is greater than the number of data points.

There exist numerous feature selection algorithms capable of converting a set with too many features into several subsets of the desired size. As with feature engineering algorithms, different algorithms suit different data types. The choice of a specific feature selection algorithm should be driven by the goals of the data scientist. Feature engineering should, however, come before all of the aforementioned points. It is quite a challenge to select the most predictive features from a large space. As such, the more training examples one has, the better the performance, though the computation time will certainly increase (Dong & Liu, 2018). Several overarching methods exist, which fall into one of two categories:

• Supervised selection.
• Unsupervised selection.

5.2.1.1 Supervised Selection
This method entails the examination of features together with a trained model from which performance can be computed. This strategy tends to work well because of its interpretability, as features are selected based on a model's actual performance (Guyon et al., 2008). The drawback is that these strategies require a lot of time to run. Although this method will deliver the expected results, it is quite expensive and should thus be limited to small feature sets. An exhaustive strategy attempts to combine features in every possible way and then chooses the best combination. If, for example, there are 1,000 features and only 10 are required, then 2.6 x 10^23 different combinations must be tried (see the short calculation below). This would be

extremely time consuming. However, there are more complex but less exhaustive algorithms that can perform the operation in less time.
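The combinatorial figure quoted above can be verified directly in R:

choose(1000, 10)   # approx. 2.634096e+23 possible subsets of 10 features from 1,000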

5.2.1.2 Unsupervised Selection
This method entails feeding machine learning algorithms with unlabelled data, an operation popularly known as clustering. Clustering groups similar objects based on a given criterion, such as density or distance; the objects within any grouping therefore tend to be more similar to one another than to outliers. This technique proves helpful when selecting significant features from a training dataset. Each unsupervised algorithm has strengths and drawbacks that determine its applicability. In this study, the principal component analysis (PCA) technique has been used to select essential features, through the identification of independent features and the consequent removal of redundant features from the training dataset. The primary objective of these algorithms is therefore to study the intrinsic and hidden data structures to gain a thorough understanding, facilitate segmentation of similar datasets into groups and, as a result, simplify them.

5.2.1.3 Fundamental Tools of Data Science
In a bid to improve the performance of their models, feature engineering and feature selection are often the first areas that data scientists seek to improve. A clear understanding of these areas is important to any data science task. Through feature engineering, it is possible to come up with complex models that could not otherwise be developed from raw data (Kuhn & Johnson, 2019). It also allows interpretable models to be built from any amount of data. Feature selection limits these features to a manageable number; consequently, the risk of overwhelming the algorithms is greatly reduced, as is the computation time.

5.2.2 Feature Extraction Process
As previously mentioned, feature engineering is the process through which domain knowledge and data science skillsets are combined to come up with features that speed up a model's training as well as providing more accurate predictions. Its main aim is to derive a measurable model from data that can be used to accurately anticipate future behaviour. Recently, the amount of accessible data has expanded exponentially and 'big data analysis' is expected to be at the centre of most future

advancements (Breiman & Friedman, 1985). Despite rapid improvements in the field of data analysis, there is still no agreement on how new features should be extracted from a small dataset.

This study therefore undertakes a variety of experiments on feature engineering by utilising existing sources of learned or engineered knowledge. This is not a very familiar approach in the study of airline passenger numbers, as numerous scientists do not consider it productive; such scientists tend to focus more on new learning algorithms or feature selection techniques with the main aim of enhancing performance. However, studies have shown that performance can also be improved by various feature engineering techniques (Freeman & Tukey, 1950). When building machine learning classification models, feature selection and extraction are vital phases due to the critical roles they play in enhancing the accuracy and overall performance of the model. These phases are relatively expensive, with no guarantee that features extracted manually will fit appropriately in different data modalities. Through deep learning models, the phases of feature extraction, selection and classification are integrated into a single optimised process. However, computing them is quite expensive compared with traditional machine learning methods. Additionally, large training datasets are required to achieve good classification performance.

The process of feature engineering in a focused structure assumes the procedure below:

1. Investigate the data.
2. Clean/process the data.
3. Construct a statistical model (repeat) in order to derive features.
4. Select the best features.

In this chapter, several feature engineering methods for air passenger numbers are investigated in the context of a symbolic rule-based learning algorithm. Many new features are explored in an attempt to find extra information not present in the original dataset by creating and extracting new helpful features. For a machine learning model to be able to classify objects, they must initially be represented as sets of characteristics (Nixon, 2013). In the training set, transformation of each object into a vector of feature values describing a point in a feature space is essential. These efforts have been applied in this study to examine the

feature engineering methods for airline passenger numbers within the context of symbolic machine learning, as illustrated in Figure 5-1.

Figure 5-1 Feature engineering techniques for aeroplanes dataset.

The main idea is that modelling and prediction will each be performed once to obtain a benchmark score. Subsequently, prediction will be repeated hundreds of times for every variable in the model while randomising that variable. If randomising a variable degrades the model's benchmark score, it is marked as an essential variable. Alternatively, if nothing changes, or if the variable beats the benchmark, then it is marked as useless. Running each variable's predictions hundreds of times gives a clear image of which variables are impacting the model and the extent of their influence. This method is advantageous because it is agnostic towards any specific model, and it can therefore be applied to any kind of result after the modelling stage.
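A model-agnostic sketch of this permutation scheme in R is shown below. It assumes a fitted model fit, a predictor data frame X, a target vector y, and RMSE as the benchmark score; all of these names are hypothetical, introduced only for the illustration:

rmse     <- function(a, b) sqrt(mean((a - b)^2))
baseline <- rmse(y, predict(fit, X))   # benchmark score from one prediction pass

perm_importance <- sapply(names(X), function(col) {
  degradation <- replicate(100, {      # hundreds of randomised predictions
    Xp <- X
    Xp[[col]] <- sample(Xp[[col]])     # randomise a single variable
    rmse(y, predict(fit, Xp)) - baseline
  })
  mean(degradation)                    # positive = the variable matters
})
sort(perm_importance, decreasing = TRUE)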

5.2.3 Feature Generation and Extraction
An effective approach to boosting the number of features is to incorporate several of them manually based on an educated guess about their importance. At times, some features are rather subtle, and it is not always intuitive which features hold predictive power (Domingos, 2012). In such a case, it is possible to use automatic methods to add valuable features. Past studies have shown that feature engineering, or creating new input features for machine learning models, is a key part of building an accurate predictive model (Domingos, 2012). Each problem is usually domain-specific and, as a result, better features are often the deciding factor in a model's performance. At the

commencement of this study, the main dataset (the starter file) contained the following seven feature spaces, based on Muscat International Airport experts' knowledge of the aviation domain and the associated passenger journeys with respect to the target variable:

FeatureSpace = {'Year', 'Month', 'DestinationAirport', 'DestinationAirportName', 'DestinationCityName', 'DestinationRegionName', 'DestinationCountryName'}

Based on this original feature space, this study aimed to train a simple model in order to have a starting point from which the importance of variables and some performance measures could be observed. In this first approach, the main interest was to improve the 'explained variance' measure. To commence the random forest process, the airline dataset for flights between Oman and cities in other countries became the primary focal point. The final goal was to establish the main factors/variables affecting the overall number of passengers travelling. The data were first read from pax_data, and the training dataset was created by removing the first column. To simplify the results, all fields were converted to numeric using sapply, so that the random forest computation can build its random trees:

pax_data = read.csv('E:/pax_data.csv', header = TRUE, sep = ',')
train = pax_data[, 2:ncol(pax_data)]
train[, 1:ncol(train)] = sapply(train[, 1:ncol(train)], as.numeric)

Since machine learning will be conducted using the feature engineering technique, which requires numerical data, it was necessary to transform the text data into numbers; otherwise, a lot of critical information from the table would have been omitted. As explained in Section 4.2, text data, one of the most challenging forms of data, can be converted into numeric data with the help of feature engineering techniques derived from natural language processing, as explained in the previous section. Upon successful transformation of the text data onto a continuous numerical scale, the randomForest library of R was used to create a random forest with Passenger.numbers.in.thousands..000 as the response variable and the remaining variables as predictors. It is important to have a model that is not only accurate but also interpretable. Other than knowing what the model's air passenger predictions are, it is also desirable to know why the predictions are high or low and which features are most significant in determining the forecast. Identification of these variables can facilitate timely predictions and enable enhancement of the services rendered. There are various benefits to knowing the feature importance indicated by machine learning (Kelleher et al., 2015):

1. Clearly understanding the model makes it possible not only to verify its correctness but also to improve the model by highlighting the important features that can be focused on.
2. It gives room for feature selection, whereby insignificant x-variables can be eliminated to produce similar or better results within a shorter training time.
3. It makes it possible to trade accuracy for interpretability in a business case whenever the need arises. When an airline website, for example, rejects a client's purchase request, the client can learn the reason behind it.

For the above reasons, this Thesis explores different approaches to interpreting feature importance in a random forest model. The majority of these approaches are also applicable to other models, from linear regression to black boxes such as XGBoost, as explained in Chapter 5.

5.2.4 Variable Importance from Machine Learning Algorithms
As an important tool for interpretation, feature importance is regularly used by scientists when examining model parameters such as the coefficients in a linear model. Measures of feature importance are also provided by various random forest (RF) implementations. Although few machine learning practitioners are aware of it, the RF importance technique – to be precise, permutation importance – is applicable to a wide variety of models. This technique is common, efficient and reliable: it observes how the accuracy of a model is affected by random shuffling of each predictor and measures variable importance from those observations. Unlike linear regression coefficients, which are poor proxies for feature importance, the RF technique does not rely on internal model parameters, which makes it broadly applicable.

The use of permutation importance is recommended for all models, including linear ones, because all issues associated with the interpretation of model parameters can be successfully avoided. Section 4.2.2 described how Breiman & Friedman (1985) identified accuracy as an issue of great concern. It is therefore worth noting that the more accurate the model in this study is, the more the importance measures and other

interpretations can be trusted. The difference between predicted and expected outcomes, also known as residual analysis, has to be measured to determine the goodness-of-fit of a linear model. It is, however, not possible to tell when a model is biased using residual analysis alone. Breiman & Friedman (1985) quote William Cleveland, one of the fathers of residual analysis, as having said that "residual analysis is an unreliable goodness-of-fit measure beyond four or five variables."

5.2.4.1 Variable Importance on Original Features

Figure 5-2 below employs varImpPlot(rf) to illustrate the observed feature importance as well as the explained variance for the random forest obtained from the original data. The explained variance is the fraction of the target variance accounted for by the model; in a simple linear model this is the R^2 value, which equals the squared correlation coefficient. When the learned parameters of a flexible model, for example the structure of a decision tree, vary considerably with the training data, the model is said to have high variance. The percentage explained variance measures how much of the target variance in the training set can be explained by the out-of-bag predictions; unexplained variance is due to true random behaviour or lack of fit. The percentage explained variance reported by randomForest:::print.randomForest can be retrieved by taking the last element of rf.fit$rsq and multiplying it by 100, as in the sketch below. Figure 5-2 illustrates the result.
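A minimal sketch of this step, assuming the train data frame described earlier with Passenger.numbers.in.thousands..000 as the response:

library(randomForest)

rf.fit <- randomForest(Passenger.numbers.in.thousands..000 ~ ., data = train,
                       importance = TRUE)
tail(rf.fit$rsq, 1) * 100   # percentage explained variance, as printed by the model
varImpPlot(rf.fit)          # importance plot of the kind shown in Figure 5-2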

Figure 5-2 Model 1: Variable importance on original features (% Var Explained: 33.43).

A regression was trained on the seven original features to predict passenger numbers. For a better explanation of feature selection, a column of random numbers was added: all features of less importance than the random column were considered junk and

thus tossed out. After training, varImpPlot was used to plot the parameters, with the most responsive at the top and the least responsive at the bottom.

From Figure 5-2, it can be stated that the 'Year' and 'Month' parameters are the most responsive (strongest predictors) relative to the other parameters, predicting the number of passengers with 33.43% explained variance. It is worth noting that as the accuracy of the model increases, the feature importance measures and other interpretations can be trusted more. As datasets get larger, it becomes more challenging to build reliable and efficient predictive models and at the same time understand what is happening with the data. For example, all data analysts want to know which predictors have a significant influence on the predicted results in a fitted model. At times, a variable that makes business sense may be available, but there is no assurance that it can help in predicting the variable Y. It is worth noting that a feature that is quite useful in one machine learning algorithm, for example a decision tree, could end up under-represented or even unused in another, for example a regression model. However, a variable that shows little sign of helping to explain the response variable Y on its own could be helpful when combined with other predictors: a variable might have a low correlation (~0.2) with Y, but in the presence of other variables it can help to explain patterns that other variables cannot explain alone. In such scenarios, the decision as to whether such variables should be included or excluded is hard to make. Strategies that can greatly aid in eliminating this challenge are outlined later in this chapter; from these strategies, a firm understanding is also gained of the importance of a particular variable and how much it contributes to the model.

Moving on to the first step of this research, two new features were created, even though the original feature space did not look promising. In many cases, the given features are not directly useful or are highly correlated. One example of features that are not directly useful is described in this section, where the intention was to predict airline passenger numbers using specific features such as month, holiday, GDP, jet fuel prices, total population, interest rate, average base fare, and distance. In simple models, the original features are rarely directly useful. In other cases, there may be too many features, displaying high levels of correlation, which makes the fitting of

robust statistical models a challenge. In the case of this study, replacing the features with a new, smaller set of features containing similar information to the original ones was tried. Since the month is influential, it was decided to make a new feature based on the four seasons derived from the month column. From this, it was observed that useful information can be extracted from the month, which subsequently helps in determining whether month 1, month 2, etc. fall in summer, winter, and so on.

5.2.4.2 Variable Importance with Season on Original Features
The technique discussed above is also referred to as 'binning' and provides insights during descriptive data analysis. Binning is a way of converting numerical continuous variables into discrete variables by categorising them on the basis of the range of values of the column in which they fall. For example, if the 'seasons' feature is added to the transaction-level data while rolling the data up on the 'seasons' column, it is possible to directly compare air traffic across the four seasons. A loop is set up in which the season is derived from the month and assigned. Since cut was used to determine whether, in this regression task, the season corresponding to a particular month has any influence, the solution uses base R only. As such, the prediction and the plotting of importances were conducted using the other seven features plus a random column with a newly built RF model. 'Month' was used to develop the new feature as follows:

If the months January, February, and March are grouped as season 1, the months April, May, and June as season 2, and so on until four seasons have been obtained from all months, then 'month' is captured by the RF and it is reasonable to bin it into four groups based on the four seasons (a sketch is given below). With this column newly added to the training dataset, the random forest procedure was run and a new graph printed. Figure 5-3 demonstrates the result.
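A minimal sketch of this binning step with base R's cut, assuming a numeric Month column coded 1-12:

# Months 1-3 -> season 1, 4-6 -> season 2, 7-9 -> season 3, 10-12 -> season 4
train$Season <- cut(train$Month, breaks = c(0, 3, 6, 9, 12),
                    labels = c(1, 2, 3, 4))
table(train$Season, train$Month)   # sanity check of the month-to-season grouping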


Figure 5-3 Model2: Variable importance with season on original features (% Var Explained: 37.13).

From Figure 5-3, it is clear that a vital role is played by the newly added column in the prediction of airline passenger figures. The variables 'year', 'month', and 'season' are the most essential, with 'year' playing the key role as the principal variable. It is therefore clear that the number of passengers from Oman to various destinations is largely influenced by the seasons. Additionally, according to the RF model, the season feature is more predictive than the random column and several other features. Learning in machine learning models is greatly influenced by the features present in the data frame columns; it is therefore evident that better features lead to more accurate predictions and faster training. Figure 5-3, for example, shows that around Christmas, and generally during winter, airline passenger figures tend to be relatively high.

Furthermore, the algorithm becomes more advanced and reliable when a feature is added specifying the exact number of days left before Christmas, rather than depending on the date alone. As such, it is possible to employ domain knowledge and intuition to engineer similar simple features that will greatly increase the accuracy of the models. It is clear that passenger figures during the weekends are relatively higher than during the weekdays. This information can be conveyed to the machine learning model by creating a 1/0 flag in the dataset rows corresponding to weekday/weekend respectively. Similarly, monthly statistics show that passenger figures are higher during public holidays, so adding a 1/0 flag can play a significant role in relaying this information. However, seasonality associated with moving months cannot be

captured with ease. Even with monthly statistics this is challenging, as festivities can occur in any month of the Islamic calendar. There is no provision for this in the usual seasonal models, and even the general assumption of complex seasonality is that festivity patterns occur at the same time each year. Vectors for all events can therefore be used to successfully counter this challenge.

5.2.4.3 Variable Importance with Season and Holiday on Original Features
In this study, therefore, a decision was made to derive a new feature based on Oman's major holidays. Owing to the fact that there can be multiple public holidays per month, a better measure of the magnitude of holidays in each month was obtained by summing all holidays. Creating a vector of all major holidays and matching their respective dates in the model's data frame was identified as the most effective way of obtaining a general holiday indicator variable. A common magnitude for all holidays was achieved by marking a vector with 1 for months with holidays and 0 for months without (a sketch is given below). As a result, a significant influence on the passenger figures in any particular month was observed. Figure 5-4 illustrates this.
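The indicator can be sketched as follows, assuming a hypothetical data frame holidays_df listing the Year and Month of each major Omani holiday:

key           <- paste(train$Year, train$Month)
holiday_key   <- paste(holidays_df$Year, holidays_df$Month)
train$Holiday <- as.integer(key %in% holiday_key)   # 1 = month contains a holiday

# Count of holidays per month (the summed magnitude mentioned above)
train$HolidayCount <- as.integer(table(holiday_key)[key])
train$HolidayCount[is.na(train$HolidayCount)] <- 0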

Figure 5-4 Model3: Variable importance with season and holiday on original features (% Var Explained: 49.61).

By adding one more column, 'holiday', as illustrated in the figure above, it is possible to find out whether the number of holidays in a particular month influences the seasonal effect or not. According to Figure 5-4, it is clear that the seasonal prediction power is weakened by the addition of holidays; only the first three variables can therefore be considered for prediction. The variance explained increased from 37.13%

to 49.61% by creating these additional features, which are also captured by the decision tree feature importance.

Numerical data consist of both decimals and integers, also known as real numbers. In any machine learning model, it is essential to have numerical data since only numbers can be understood and interpreted by machines. It is, however, common for most ML models to make underlying assumptions about the data. As such, for machine learning algorithms to give optimal results, it is important to identify these assumptions and transform the related data into the most suitable format. In linear regression, for example, there is the assumption that the residuals are normally distributed and that a linear relationship exists between the variables. To satisfy these assumptions, it should be ensured that the Y variable is close to normally distributed. According to Kuhn and Johnson (2013), symmetrical or unimodal distributions of features facilitate better functioning of machine learning algorithms (a small sketch of one such transformation is given below).
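As a small illustration, a log transformation is one common way to make a right-skewed target such as passenger counts closer to symmetric before fitting a linear model; this is a generic sketch, not a transformation reported in this study:

hist(train$Passenger.numbers.in.thousands..000)        # raw target, often right-skewed
hist(log(train$Passenger.numbers.in.thousands..000))   # log scale is usually closer to symmetric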

Machine learning cannot succeed when data are input in raw form. The success of machine learning depends more heavily on the quality of the extracted features than on the ML algorithms. Manual intervention and domain knowledge are two essential elements that significantly impact the process of engineering meaningful and descriptive features. Another complication is that the feature engineering process can generate a large number of features. If the ratio of features to the total number of data points is high, the curse of dimensionality arises; as this ratio rises, the likelihood of the ML algorithm uncovering spurious patterns that bear little relation to reality also increases. This has a negative impact on the model, which is likely to exhibit low accuracy upon deployment. To avoid this problem, effective strategies have to be implemented to select only the important features. Based on the above, a strategic test was conducted as part of this study to determine whether the addition of new features such as jet fuel price, gross domestic product, distance, interest rate and population would help improve the current predictor's accuracy.

5.2.4.4 Variable Importance with Season, Holiday and GDP on Original Features
GDP is a measure of the market value of all finished goods and services produced in a nation within a specific year. It is a global standard measure used to determine the economic growth of a nation as well as to compare the economies of various states

124 internationally. The basic assumption is that the performance of a nation/society is reflected adequately by an economy’s value of income, expenditure or production. A strong economy is signified by a high GDP. In such a nation, the rate of unemployment is relatively low and the demands of labour and production are on the rise. When the economy of a nation is growing, it is likely to attract more investors and consequently lead to the development of the state. It is a complicated task to calculate the gross domestic product. So as to come up with meaningful insights for use in this study, some features from the available data were constructed manually due to format inconsistency. A feature derived from Oman’s GDP for the period 1998-2016 was used to test the impact of GDP on the model’s prediction power. The training dataset was first kept in temp variable, that is, temp=train, after which GDP columns were added based on years. The RF code used to check the influence of this feature on the prediction capacity of this model is as follows:

table(train$Year)
gdp=c(6330.73, 7059.21, 8710.99, 8559.57, 8670.26, 9070.49, 10115.04,
      12398.59, 14575.16, 16225.66, 22963.38, 17518.83, 17518.83,
      21164.34, 21631.89, 20204.93, 19129.84, 15550.68, 16329.00)
gdp_year=rep(1998:2016)
gdp_df=data.frame(c(as.data.frame(gdp_year), as.data.frame(gdp)))
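A hedged sketch of how the yearly GDP might then be joined onto each training row by matching on year (assuming the Year column used throughout this chapter):

# Look up each row's GDP from gdp_df (built above) by its year
train$gdp <- gdp_df$gdp[match(train$Year, gdp_df$gdp_year)]
stopifnot(!any(is.na(train$gdp)))  # every year in 1998-2016 should find a match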

Figure 5-5 Variable importance with season, holiday and GDP on original features (% Var Explained: 67.06).

From Figure 5-5, it is clear that the annual GDP has more influence than the other parameters. An increase in the explained variance from 49.61% to 67.06% is realised by creating this additional feature, which is also captured by the decision tree feature importance. The target data of the total passengers were not aggregated with other variables. Had such an aggregation been conducted, the training data could have been contaminated, interfering with the outcome.

5.2.4.5 Variable Importance with Season, Holiday, GDP, and Jet Fuel on Original Features

Later, an RF classifier was built to employ the jet fuel price feature in predicting airline passenger figures. As previously done, a random column was used to plot the importance. Between July 2004 and July 2008, the cost of jet fuel rose by up to 244%, consequently marking it as the operating item with the highest cost (IATA, 2010). The air transport system is affected by changes in jet fuel prices on two main sides:

1. Networks and fleets, as well as scheduling and pricing, affect the supply side.
2. The general economy affects the demand side.

The price of crude oil plays a vital role in determining the effective cost of jet fuel. The general expectation is that when the effective cost of fuel increases, an imbalance between demand and supply occurs, leading to supply-side changes in areas such as network and fleet. The significance of determining how an increase in fuel prices affects airline passenger figures in this study therefore became evident.

Data on jet fuel prices were therefore added to the model to determine their impact on airline passenger figure predictions. The jet fuel data were copied and a csv file derived from them. It therefore became possible to determine whether a new feature, 'Jet Fuel', would have an impact on prediction.

The new column was added into the PAX and train data, and again, the RF was run as follows:

jet_fuel=read.csv('E:/jet_fuel.csv',header=T,sep=',')
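The join of the jet fuel prices onto the training frame is not shown above; one plausible sketch, assuming jet_fuel.csv carries Year and Month columns, is:

# Hypothetical merge of monthly jet fuel prices into the training data
train <- merge(train, jet_fuel, by = c("Year", "Month"), all.x = TRUE)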


Figure 5-6 Variable Importance with season, holiday, GDP, and jet fuel on original features (% Var Explained: 69.18).

Figure 5-6 illustrates the prediction power of this study's random forest model. If the top variable is dropped from the model, the prediction power will reduce significantly. On the other hand, introducing one of the bottom variables might not have much impact on the model's prediction power. The explained variance increased from 67.06% to 69.18%.

5.2.4.6 Variable Importance with Season, Holiday, GDP, Jet Fuel, Population, and Interest Rate on Original Features

Population size is another factor with the potential to influence air travel demand. With a higher population, the demand for mobility increases. As such, it is essential to consider the population of a country when estimating and differentiating air travel demand. Accordingly, population was added as one more parameter, deriving a new column in the train database. The influence was again determined:

pop_total=c(2225481, 2239403, 2272547, 2323203, 2385075, 2448194, 2506891,
            255337, 2593750, 2652281, 2762073, 2943747, 3210003, 3545192,
            3906912, 4236057, 4726413, 4490541, 5119745)
year=rep(1998:2016)

According to Richard E. Quandt (2006), social, economic and demographic factors affect people's rational decision-making capacity and in turn affect the demand for mobility. The author also explicitly states that different modes of travel and various destinations are commodities, all of which have varying prices from which consumers make choices with the main aim of maximising satisfaction. This statement has a broad perspective and is backed by various philosophical explanations such as utility theory, economic theory and consumer theory, all of which have been considered in the selection of variables used in this study's model. A solid understanding of the model's theoretical foundations is essential, as it enables evaluations based on accidental or causal interrelations of the variables.

Global interest rates are constantly on the rise, and more so in the US. This has a detrimental impact, as a high US dollar exchange rate results in losses for airlines whose revenues are in local currencies. In India and Indonesia, for example, exchange margins are sliced down significantly as the currencies operate at a 20-year low against the United States dollar. Consequently, airlines that pay costs such as lease rentals and jet fuel in US dollars but book their revenues in local currency incur huge losses. Rising interest rates also escalate the cost of finance contracts and aircraft orders, which commonly impacts airlines negatively.

According to Winston Yin, a banker from Korea Development Bank, the cost of borrowing for airlines with floating-rate debt could be significantly impacted even by a 50-basis-point increase in interest rates. Yin further states, "Some airlines have been requesting interest rate hedges for new aircraft deliveries." He also says that nowadays, based on requests for proposals (RFPs), airlines are beginning to prefer fixed-rate funding. Craig Fraser, an official from Fitch Ratings, backs Yin's view by stating, "By some measures, this is the longest upturn we've had in the aviation sector, but we need to keep in mind that this is a sector with a high structural risk profile. Performance can turn very quickly, and there are a lot of signs of complacency if you look around … They are certainly insisting on the option to fix funding rates in the future."

As such, an interest rate according to the ‘years’ column was added to the model and the RF was again employed to check its impact on prediction, as follows:

interest_rate=c(27.05, -1.21, -6.58, 14.46, 3.67, -1.92, -4.81, -12.57,
                -5.48, -0.90, -19.93, 43.50, -7.61, -9.34, 0.68, 6.9,
                3.35, 26.24, 25.24)
year=rep(1998:2016)


Figure 5-7 Variable importance with season, holiday, GDP, jet fuel, population, and interest rate on original features (% Var Explained: 68.55).

From Figure 5-7 it is clear that the two new parameters (population and interest rate) are influential, with an explained variance of 68.55%. There are several interrelated methodological decisions that a researcher must make when performing a factor analysis. One key decision is determining an appropriate variable for factor analysis. Peterson (2000) states that, on average, 56% of variance is usually accounted for. In the natural sciences, the factoring procedure is not stopped until the last factor accounts for less than 5% of the variance or until the extracted factors account for at least 95% of the variance. For the social sciences, on the other hand, a solution accounting for 60% of the variance should be considered satisfactory in cases where information is less precise (Hair et al., 2014). According to Nunnally and Bernstein (1994), "the goal [of a factor analysis] is to explain the most variance (or related property) with the smallest number of factors." Tinsley and Tinsley (1987) stated that "less than 50% of the total variance is explained by a factor solution."

5.2.4.7 Variable Importance with Season, Holiday, GDP, Jet Fuel, Population, Interest Rate, and Distance on Original Features

It therefore became necessary to add a 'distance' parameter as another variable in the model to investigate its impact on the demand for air travel. The usage of a particular aircraft on any route is usually well explained by the total distance covered. With an increase in distance between two destinations, larger and longer-range aircraft are required. Although tickets often become more expensive with an increase in distance, this is not always the case. A flight between Los Angeles and Portland, for example, was booked at the end of January at $160 for a two-way trip. On the other hand, a person flying from the same departure place but to somewhere as far as Chicago, which is more than twice the distance, would not need to pay double: the extra airfare was recorded to be only $28 for a two-way trip. It should be noted that various factors affect the prices of air tickets, and distance is just one of them.

distances=read.csv('E:/distances.csv',header=T,sep=',')

Figure 5-8 Variable importance with season, holiday, GDP, jet fuel, population, interest rate, and distance on original features (% Var Explained: 67.74).

From Figure 5-8 it is clear that distance does not play a big role in the prediction of the total number of passengers, with an explained variance of 67.74%.

5.2.4.8 Variable Importance with all Features

The pricing variables for the airfare feature tend to be far more numerous than those for the majority of other features. Within the aviation industry, the question of how to effectively measure the average base fare, also known as price on demand, has been debated for a long time. Another issue of concern has been how significant commercial value can be gained by airlines proactively managing it. Price on demand expresses the percentage increase in airline passenger figures when prices drop by a certain percentage. It is important to determine how demand is impacted by price changes, as this guides airlines when making aircraft orders worth billions of dollars. Airfare pricing removes a significant risk from any airline's business plan. From Oman's air data, this study is able to measure the price of various routes, which have received significant attention as evidenced by the airport's operations as well as the transatlantic low-cost traffic. The average base fare was added into the model, as follows:

fair=read.csv('E:/avg_base_fair.csv', header=T, sep=',')
train["fare"]=fair$Avg.Base.Fare..OMR.

Figure 5-9 Variable importance with all features (% Var Explained: 69.04).

From Figure 5-9 it is clear that fare rates have a significant impact on airline passenger figures. The explained variance rose from 67.74% to 69.04%. Although price elasticity is a valuable measure, many airlines have not recognised it or incorporated it in their planning. When determining fare rates, it is essential to understand price elasticity measures so as to arrive at the most effective rates. The urge to understand the factors behind airfare pricing is also driven by the constantly growing number of air carriers, each seeking to expand its services.

5.3 Optimising a Deep Learning Model to Come Up with a Robust Neural Network Topology

Deep machine learning, also known as deep learning, is a kind of machine learning algorithm that maps input data to higher-level abstractions by using a deep architecture with many hidden layers composed of linear and non-linear transformations (McGregor et al., 2004; Friedman & Popescu, 2008; Rizescu & Avram, 2014). In other terms, deep learning is the process through which a neural network, in which the input and output are separated by several layers of nodes, is used. The numerous layers of nodes are composed of multiple levels of non-linear operations, such as neural nets with numerous hidden layers. There are several areas in which deep learning entailing feature representation and predictive modelling has been successfully explored and applied. Such areas include computer vision, natural language processing, remote sensing and bioinformatics. Deep learning has been applied extensively due to the numerous benefits it provides. Such benefits include the ability to model the processes of complex systems, the ability to generate high-level feature representations, high levels of prediction accuracy, and higher robustness in modelling (Chen & Wasserman, 2016).

Prediction of airport usage is an important aspect of air passengers’ prediction research.

According to Gosavii & Bandla (2002) and Riedel & Gabrys (2003), original forecasting models do not consider the influential factors comprehensively, and less so when dealing with changeable or complex airport usage states. This Thesis, however, recognises a variety of attributes, such as classification vectors, and designs new time series algorithms through the calculation of representatives in all kinds of clusters between the observed time series and the representatives. A new approach to feature selection is investigated, and features that are important in predicting the number of passengers travelling in different seasons are subsequently demonstrated. Higher probabilities of survival among the fitter combinations of features guide feature extraction in deep learning. In this study, therefore, as explained in Section 5.2.3, features were extracted by Feature-Space = {'Year', 'Month', 'Destination Airport', 'Destination Airport Name', 'Destination City Name', 'Destination Region Name', 'Destination Country Name'}. In Figure 5-4, the impacts of various parameters from a variety of techniques are illustrated. This is achieved through classification accuracy and subsequently comparing the results with those obtained through the addition of features step by step. Additionally, discussing the importance of correlation among different features gives a vivid understanding of the impacts of such correlations, as further explained by Riedel & Gabrys (2007).

In this study, deep attention is given to features by experimenting on new features obtained from optimised neural network hidden layers. The capabilities of the existing machine learning models were used to visualise the importance of feature selection. After choosing the set of features, adding new features in different steps and visualising the correlation among all the features, the methodology of incorporating a deep neural network was chosen. All modelling best practices are maintained by using this methodology. Strong arguments against machine learning techniques constantly emanate from various scholars, the majority of whom argue that the entire variable selection process lacks a clear definition. As such, they have studied deep neural networks with the chief aim of achieving more accurate forecasting results (Friedman & Popescu, 2008; Essa & Ayad, 2012). By focusing on the above-mentioned methodology in this study, desirable results were achieved, as demonstrated in the following sections of this chapter.

5.4 Illustration: Optimisation Techniques (Finding A Good Neural Network Topology)

5.4.1 Techniques Description

Among the various optimisation challenges encountered in deep learning, such as long training times and data-hungry inputs, neural network training is the most demanding of all. A heuristic method based on computing saliency, also known as sensitivity, has therefore been proposed to counter this challenge. As such, the H2O library, which runs under the R environment, was chosen for implementation in order to obtain better accuracy and, later on, to find a good neural network topology, with the code explained line by line. There exists a wide variety of neural network libraries, such as TensorFlow for Python. However, the main reason that led to the selection and implementation of H2O is its nature as an open source machine learning platform. Additionally, it enables companies to build models based on large datasets with no need to conduct any sampling, and consequently to achieve more accurate predictions. It is easy to implement at any level, fast, and highly accessible. As such, companies can perform quicker data computations through its GUI-driven platform. At present, the H2O platform operates with both basic and advanced algorithms, which include but are not limited to bagging, deep learning, boosting, principal component analysis, k-means, time series, naïve Bayes, and generalised linear models. Additionally, H2O has facilitated operations for Hadoop, Python, R and Spark users by releasing APIs suitable for them all. As such, it becomes possible for researchers, as in this study, to build models at an individual level.

Furthermore, H2O is free for any interested researcher to use and enables quicker computation. It also has a feature through which a tool, such as Python or R, can be directly connected with a CPU. With this rare feature, it is possible to channel more processing power and memory into the tools and consequently increase the overall speed of computations. As a result, it becomes possible for computations to be conducted at 100% CPU capacity. Computations can also be performed by connecting the tool with clusters on various cloud platforms. Even when operating with a small cluster, H2O handles large datasets using in-memory compression, in addition to having provisions that enable parallel distributed network training. As stated by Landset & Khoshgoftaar (2015) and supported by McGregor et al. (2004), the model has straightforward hyperparameter tuning that entails a Java back-end as well as an option to run on some number of clusters k, where k represents the number of cores on the processor.

5.4.2 Solving a Problem (Dataset)

Principally, cross-validation splits the dataset into two: the training and the test sets. The expected error for the whole model is then determined by averaging the prediction errors across the folds. In this study, therefore, the data were split into five partitions of equal size. In the first fitting, 20% of the data is taken as the test set and accounts for the first fold. The remaining 80% is taken as the training set and accounts for the other four folds. Test/training data are then fitted into the model K times, where K is the number of folds. An average of the prediction errors from all fittings is then calculated to determine the overall prediction error for the model. Although five and ten splits/folds are commonly used in various studies, it is up to the researcher to determine the number of folds depending on the data size. When the number of folds equals the number of cases in the dataset, that is, K=N, leave-one-out cross-validation is conducted; in that extreme, each test set contains a single observation and K equals the number of observations in the dataset. The error and variance are, however, affected by the number of splits: the fewer the splits, the higher the bias/error and the lower the variance, and vice versa. After cross-validation was successfully conducted, a double check was run using a 25% hold-out ('dropout') from the original train file, whereby the test set comprised 9,840 observations and the training set comprised all the remaining observations. Furthermore, cross-validation can be run more frequently and efficiently using various built-in functions in R. In this study, however, customised code was used and, as a result, a better understanding of the actions of the algorithms, and how they were undertaken, was achieved. Sacrificing some efficiency would have enabled the use of a ready-made 'cross-validator' but, on the other hand, a well-customised process was achieved by personalising and tweaking most of the parameters. 'Train' and 'Test' were the two main parts of the airline dataset selected at the beginning of the deep learning process: 51,983 observations were recorded for the 'Train' dataset and 9,840 for the 'Test' dataset. To capture the observations and record all relevant data, the author of this study requested, and was granted, permission from the Public Authority for Civil Aviation. However, restrictions apply to the availability of these data and thus limit their publication. All the data were sourced from Oman Management Air Company (OMAC).
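A minimal sketch of the manual K-fold procedure described above is given below; the TARGET column name is taken from the dataset description later in this chapter, and lm() stands in as a placeholder learner:

# Manual 5-fold cross-validation (illustrative only)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(train)))   # random fold assignment
fold_rmse <- numeric(k)
for (i in 1:k) {
  fit  <- lm(TARGET ~ ., data = train[folds != i, ])  # fit on the other four folds
  pred <- predict(fit, newdata = train[folds == i, ]) # predict the held-out fold
  fold_rmse[i] <- sqrt(mean((train$TARGET[folds == i] - pred)^2))
}
mean(fold_rmse)  # averaged prediction error across the five fittings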

The dataset was in the form of large numeric variables that were used for predicting airline passenger figures for Muscat Airport. The data were not scaled, and thus in raw form, and contained binary columns alongside quantitative independent variables such as unemployment rate, real GDP per capita, GDP, and interest rates. The 'TARGET' column indicated the variable to be predicted. Such predictions will play a vital role in enabling OMAC to take the necessary steps, based on real-time forecasts, to improve service provision and consequently generate more revenue. However, the main focus of this study is not the dataset but rather the analytical models used for prediction. H2O and R features are also at the epicentre of the discussion due to their importance in deep learning. As stated in Chapter 1, the main aim of this study is to successfully build and train a deep learning prediction model. As such, this chapter puts more emphasis on the feed-forward neural networks that help to achieve the study's principal goal. The chapter first explores the available data, then discusses its simplification, and then describes the application of the model to obtain the prediction results. The R language was used in the aforementioned steps, as well as in the H2O modelling process.

5.4.3 Import and Set-up Model (The H2O Package Implementation)

Deep learning in H2O is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using backpropagation. The network can contain a large number of hidden layers consisting of neurons with tanh, rectifier and maxout activation functions. High predictive accuracy is enabled by advanced features such as adaptive learning rate, rate annealing, momentum training, dropout, L1 or L2 regularisation, checkpointing and grid search. Each compute node employs multi-threading that works asynchronously to train a copy of the global model parameters, which in turn periodically contributes to the overall model through model averaging across the network. A feed-forward artificial neural network (ANN) model is the most common type of deep neural network and the only type supported natively by H2O. With H2O, therefore, it is possible to build predictive models from programming environments such as R, Python and Scala, and from a web-based UI known as Flow.

In this section, a description of how a good neural network topology can be obtained using the H2O package in R is outlined. Below are the requirements and implementation steps.

1. The datasets need to be linked to the H2O cluster before the H2O engine is used to train the model. This can be achieved in many different ways; for this study, a passenger numbers (PAX) data frame is linked and imported using the code train=read.csv('finalTrain.csv', header=T, sep=',').
2. The finalTrain data is then split into a validation set, a training set, and a test set (one possible splitting sketch follows this list). This provides the initial insight into the performance of the tuned deep learning model. To achieve this, the predicted values on the unseen test set will be plotted against the real ground values, as illustrated in Figure 5-11 (the original distribution is shown in Figure 5-10). As explained in Section 5.4.3, the model parameters will be optimised using the validation dataset during training, while the test dataset will be used to test the performance of the model.
3. The last step entails predicting passenger numbers using classes with a grid dataset on the airlines' dataset.
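A hedged sketch of one way such a three-way split can be produced with H2O (the ratios and seed are illustrative, not necessarily those used in the study):

library(h2o)
pax_hex <- as.h2o(train)
# h2o.splitFrame returns the remainder as the final split
splits <- h2o.splitFrame(pax_hex, ratios = c(0.6, 0.2), seed = 123)
train_hex <- splits[[1]]  # 60% training
valid_hex <- splits[[2]]  # 20% validation
test_hex  <- splits[[3]]  # 20% test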

With the aim being to predict the PAX column, the original dataset, copied into the train_copy variable for future use, will be employed, as illustrated in the code below:

train_copy=train
target="Passenger.numbers.in.thousands..000."

The data are then imported into R normally; the column named in target above is the variable to predict. Figure 5-10 is a scatter plot showing the data pattern in the passenger flight list. It suggests there is enough variance to build a good predictor. Multiple runs could be conducted to determine the variation in the prediction performance and to investigate the impact of model regularisation by tuning the 'dropout' parameter in the h2o.deeplearning(…) function.

Figure 5-10 The original distribution of target.

For the next step, it is essential to consider memory usage to ensure that sufficient memory is available over the lengthy period of the model training process. Below is the code used to build a local cluster with a 4GB memory allowance.

h2o.init(nthreads=-1, max_mem_size="4G")

Deep learning in this study was initiated by starting a local H2O cluster, as shown above (nthreads=-1 uses all available cores). To avoid exhausting the machine's memory, only 4GB of the total RAM was allocated to the experiment; memory usage can, however, be adjusted depending on the prevailing situation. A memory problem affecting one node in a cluster occurred during the experiment, highlighting that allocating a suitable and efficient memory size is always an issue in experiments like this one. To manage memory usage and eradicate the problem, the following solution was devised. After running each piece of code, h2o.ls() was run to pinpoint all temporary H2O objects that had been created. These temporary objects could then be deleted to free up memory. It emerged, however, that deleting them required a clearer specification of the objects to be removed. The code below was therefore applied to solve the problem.

h2o.removeAll()

The code was used to remove any old memory in case the cluster in question had been previously run. After running this, all temporary objects were cleared out and checked with h2o.ls(). However, this did not solve the root of the problem. When the data process was run on the H2O cluster, java.exe took up about 4GB of memory, which is about the size of the dataset being uploaded onto the H2O cluster. There was an assumption that this memory would be freed up with h2o.removeAll(), but that was not the case: java.exe still took up about 4GB of memory after all temporary objects were removed. All unused objects in R were then removed, making sure each was no longer available from R's perspective, without shutting down the session altogether. Several factors would need to be addressed before removing objects in R translates immediately into reduced memory use by the Java process.
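For reference, a short sketch of the clean-up commands discussed above (the object key is hypothetical):

h2o.ls()                  # list the keys of all objects held in the H2O cluster
h2o.rm("finalTrain.hex")  # remove a single object by key (key name is illustrative)
h2o.removeAll()           # clear every object, as used above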

5.4.4 Training a Deep Neural Network Model and Creating some Base Scenarios (Default Models)

The syntax is quite similar to other machine learning algorithms within R. The main differences are the inputs for x and y, which are essential for utilising the column numbers as identifiers. The classifier is now fitted and run for several preliminary tests to get a grasp of how the model is actually doing when predicting the target. This consequently brings out the need to run a variety of cross-validation methods. Cross-validation is a model evaluation approach that does not use traditional fitting measures (such as the R2 of linear regression) when assessing the model; it is centred on the predictive ability of the model. The process follows the steps below:

1. Splitting the dataset into training and test sets.
2. Training the model using the training set.
3. Using the model on the test set.

It should be noted that running the operation only once cannot give a reliable estimate of the model's overall performance, because the estimate has a non-negligible variance. Consequently, a variant of n-fold cross-validation, which gives better results in cases where the size of the dataset is limited, was used to better estimate the model's performance. Had the dataset been large enough, a sizeable portion would have been set aside to validate the model after it had been run on the larger portion of the dataset, by examining the resulting prediction error. Unfortunately, in most cases, and more so in social sciences research, large datasets are not always available. The original dataset was saved in the temporary variable temp=train. The training dataset was then distributed randomly into three parts: 0.75 into train, and the rest into test and predict. From the created smp_size (sample size), the range of data in the datasets was specified as follows:

smp_size <- floor(0.75 * nrow(train))
train_ind <- sample(seq_len(nrow(train)), size = smp_size)
test <- train[-train_ind, ]   # held out for testing
train <- train[train_ind, ]   # used for training
train <- as.h2o(train)

Using the library, the training model was created as follows:

annmodel <- h2o.deeplearning(
  x=predictors,
  y=target,
  training_frame=train,
  hidden=c(100,80,45),
  epochs=20,
  nfolds=5,
  fold_assignment="Modulo"  # can be "AUTO", "Modulo", "Random" or "Stratified"
)

Result: Neural Network -> 292.5256 error (error before: 308.0929)

The above topology means that there are 100 neurons in the first hidden layer, 80 neurons in the second hidden layer and 45 neurons in the third hidden layer. Data have been fed to the deep learning ANN module with predictors, target and training data. Deep learning is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using backpropagation. From the above result, it appears as if the data really lend themselves to the neural network. The next section will seek to unveil whether this is truly the case or whether the neural network default parameters succeeded by luck.

5.4.5 Testing the Model: Model Evaluation

The test commenced with the addition of L1 and L2 regularisation and boosting the number of training rounds/epochs from 1 to 100. Consequently, the errors reduce to 292.5256, which is quite sensible as the study's model has been made less prone to catching 'noise' and so generalises better. Experimentation was conducted with a wide range of hidden layers and numbers of neurons to construct a list of candidates so that the chosen parameters lay somewhere in the middle. The best classification obtained was from [128, 63, 32] neurons, which is a fairly complex network. The model ended up with fewer errors, fewer still than on the validation set. The h2o.predict(…) function returns the predicted label with the probabilities of all possible outcomes (or numeric outputs for regression problems), which is quite useful if more models are to be trained and an ensemble is to be built. In the code below, data were predicted with the predict function in H2O and as.numeric was used to find impurity in the output. A data frame of the given prediction vector was created and a scatter plot of the data was drawn to show the pattern in the passenger flight list (see Figure 5-11). In Figure 5-11 below, the actual total value of passengers is shown on the Y-axis while the X-axis shows the predicted values from the predictor. A mismatch between the actual and the predicted values is shown at points 2000-2500 on the Y-axis. This can be confirmed by reviewing the last points at 0-500 on the Y-axis. Hence, the ANN needs to be trained more. Regarding the number of epochs, the neural network should iterate for the best number from the possible states of the greedy search, i.e., 100 epochs. A dropout of 25% of the data from the original train file was performed in order to have a 5-fold cross-validation with 75% of the data for training and 25% for evaluation, giving some initial intuition about how the tuned deep learning model performs: the predicted values on the unseen test set were plotted against the ground truth values. This is illustrated in Figure 5-11 below, followed by plots of the original distribution and the predicted distribution in Figures 5-12 and 5-13.
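A hedged sketch of the prediction and plotting step just described is given below (the target column name is as defined earlier; the plotting details are illustrative):

# Score the unseen test set and compare predictions against ground truth
pred <- as.data.frame(h2o.predict(annmodel, newdata = as.h2o(test)))
actual <- test$Passenger.numbers.in.thousands..000.
plot(as.numeric(pred$predict), actual,
     xlab = "Predicted values", ylab = "Actual total passengers")
abline(0, 1, lty = 2)  # reference line for perfect prediction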

Figure 5-11 Predicted Values on the Unseen Test Set Against Ground Truth-Values.


Figure 5-12 The original distribution of target.

Figure 5-13 The distribution of predicted values.

5.5 Hyperparameter Tuning: Tuning with Grid Search and Random Hyperparameter Search

Fine-tuning strategies primarily aim to extract essential features from unlabelled data, identifying and eliminating input redundancies, and preserving the vital data sections in discriminative and robust representations. In neural network architectures, deep hierarchies are built by stacking unsupervised layers on top of each other (Le, 2013). Upon giving input layer activations to the first layer, they are automatically fed to the other layers. The initial deep feature training is conducted within an unsupervised layer and then fine-tuned into classifiers through back-propagation (Lloyd et al., 2014). These actions significantly improve the performance stability of the network (Bengio, 2013). A test set error of less than 10% can be obtained within one minute when fine-tuning is conducted; with larger models, an error rate of even less than 5% can be achieved. For the dataset used in this study, deep learning methods are more effective than any other methods, because they directly partition the space into the sectors required in the experiment. With the aim of this study being to identify parameters that can positively influence the accuracy of the model, hyperparameter tuning is an effective action. The first 10,000 features of the training dataset will be trained for the sake of speed. In most cases, random parameter search tends to be more productive than a rigid grid search when searching over more than four parameters, giving a high probability of locating one of several excellent models. In this study, emphasis was put on tuning the network structure as well as the regularisation parameters. The grid search was stopped as soon as the overall performance at the top of the leaderboard ceased displaying any more changes, i.e., after the search had converged. The activation functions used in the next model are rectifier, tanh, and maxout. The tanh function is a shifted and rescaled logistic function; its symmetry around zero facilitates quicker convergence of the training algorithm. library(Metrics) is used for evaluation when fine-tuning in this study; Metrics is a set of evaluation measures commonly employed in machine learning. In the case of artificial neural networks, the rectifier activation is the positive part of its argument (Lloyd et al., 2014), as follows:

f(x) = x⁺ = max(0, x)    (5.1)

where x is an input to a neuron. Hahnloser et al. (2000), driven by strong mathematical justifications as well as biological motivations, were the first researchers to introduce this activation function in a dynamical network. Its first demonstration in deep networks was conducted in 2011, and it proved to facilitate better training in comparison to the activation functions in use before then. As of 2017, the rectifier had become the most used activation function for deep neural networks due to its two main benefits, which are:

1. Its operations are relatively fast.
2. It is not affected by the vanishing gradient problem.
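To make Equation 5.1 concrete, the rectifier is a one-line function in R:

relu <- function(x) pmax(0, x)  # element-wise max(0, x), as in Equation 5.1
relu(c(-2, -0.5, 0, 1.5))       # returns 0.0 0.0 0.0 1.5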

The rectifier is quick to compute as it does not involve any normalisation, nor does it require exponential computations. Nonetheless, a rectifier unit can be killed by a large gradient, and in such a case it can never be reactivated again. Applying the maxout generalisation solves this problem by capping the weight-data dot product and thus preventing the unit from being eliminated (Glorot & Bengio, 2011). The most recent studies on batch normalisation show that when normalisation is done with batchwise whitening as well as re-scaling, most rectifier-related issues can be solved. Both re-scaling and normalisation with batchwise whitening are achievable through the inclusion of a linear layer before the activation functions (Sutskever et al., 2013). As such, any hyperparameters that are not critical are either turned off completely or set to zero in the early stages. In the event that the loss does not drop, tuning of the learning rate is initiated. Typically, the learning rate ranges between 1 and 1e-7. Each time, the rate should be dropped by a factor of 10 and then tested in short iterations while closely monitoring the loss. A consistent rise in the loss signifies a relatively high learning rate, while failure of the loss to go down means that the learning rate is too low. The learning rate should therefore be adjusted accordingly whenever the loss flattens prematurely. Once stabilisation has been achieved in the model, further tuning can be conducted. From the experiment, the following hyperparameters turned out to be the most tuned (an illustrative learning-rate sweep follows the list):

• Learning rate.
• Mini-batch size.
• All regularisation factors.
• Layer-specific hyperparameters, such as the dropout rate.
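As an illustration of the learning-rate schedule described above, a hypothetical H2O sketch is shown below; a fixed, non-adaptive rate is assumed so that the sweep is visible, and all values are illustrative rather than the study's settings:

# Drop the learning rate by a factor of 10 each step and watch the training loss
for (rate in 10^(0:-7)) {
  m <- h2o.deeplearning(x = predictors, y = target, training_frame = train,
                        hidden = c(128, 63, 32), epochs = 1,
                        adaptive_rate = FALSE, rate = rate)
  cat("rate =", rate, " training RMSE =", h2o.rmse(m), "\n")
}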

Upon successful debugging of the models in this study, focus was shifted to model capacity and tuning. The next section is a discussion of how to improve the performance of a deep learning network and to tune deep learning hyperparameters.

5.5.1 Improving Deep Neural Network Model Performance using Hyperparameter Tuning

Layers and nodes are added gradually to the deep network to increase its capacity. The tuning process is more empirical and less theoretical, owing to the fact that deeper layers produce more complex models. The main aim of the gradual addition of layers and nodes is to overfit the model, since regularisation can then tune it back down. The iterations are repeated until the accuracy improvement diminishes and the drops in training and computation performance can no longer be justified. However, the maximum number of hidden nodes between any two affine layers is restricted by the memory size. In very deep networks, the gradient diminishing problem is observed to be more frequent. The activation histogram should be monitored closely after the activation functions are applied; gradient descent will be ineffective if the functions are on relatively different scales. It will also be possible to trace the problem further if a high number of dead nodes is present in the network. A high prevalence of dead nodes can be caused by diminishing gradients, bugs, or weight initialisations. If none of these causes is evident in the deep network, advanced experimentation with the ReLU functions should be conducted.

Later, the network was trained so as to avoid any overfitting. This is achievable so long as the data in the study and the data in the initial dataset are kept relatively similar in context. As such, the pre-trained model will have learnt the features that are applicable to the classification condition of this study. Since the dataset in this study has a relatively similar context to the initial dataset that was used to train the pre-trained model, fine-tuning must then be conducted. More features are captured in the first layer of a network that has been pre-trained with a diverse and large dataset. All the features captured are relevant and useful to all classification operations. By nature, if the dataset of the study belongs to some distinct domain, for example health data, and no pre-trained networks on that domain are available, the network will have to be trained afresh from the beginning. In situations where the dataset is relatively small, overfitting might occur when the pre-trained network is fine-tuned. This is more so the case when several layers in the network are fully connected. As such, it should be noted that fine-tuning gives much better results when the dataset is relatively large.

The value of the training error is based on the parameter training_frame=train, which specifies the randomly selected training points that have already been sampled and are to be used for scoring; this study utilises the default of 10,000 points. The validation error is based on the parameter validation_frame=valid_frame, from which the identical value on the validation set is regulated and subsequently set to be the whole validation set by default. When either of the parameters is set to zero, the whole corresponding dataset is instantly used for scoring. The 'Target' and 'Train' sets, with L1 and L2 for regularisation, had already been fed into the deep learning ANN module with predictors. The foundation of deep learning is a multi-layer feed-forward artificial neural network trained with stochastic gradient descent using back-propagation. Typically, the batch size is 8, 16, 32, or 64. A small batch size will not give a smooth gradient descent; on the other hand, a very high batch size will result in a longer completion time for one training iteration, with relatively smaller returns as well.

In this study, each training iteration was taking too long, which led to lowering of the batch size. The overall learning speed was monitored closely, and when the oscillation became too great, it was clear that the values had been pushed to the extreme. Below is the fine-tuning of the study, with the resultant topology following a grid search hyperparameter optimisation.

tuned_model <- h2o.deeplearning(
  x=predictors,
  y=target,
  training_frame=train,
  hidden=c(128,63,32),
  epochs=100,
  nfolds=5,
  fold_assignment="Modulo",
  l1=5.6e-05,
  l2=7.4e-05,
  input_dropout_ratio=0.05
)

The topology above illustrates a neural network consisting of 128 neurons in the first hidden layer, 63 in the second and 32 in the third. The number of epochs, i.e., complete passes over the data per iteration on N nodes, is 100. To achieve higher prediction accuracy, additional epochs could be used, subject to the affordability of the arising computation cost. A dropout rate of 20% to 50% was explored. Changing the input dropout ratio improved the training score to 0.8560758; changing it definitely changes the results. Prediction was then run again with the already-trained model, with the as.numeric function used to find impurity in the output. The basic steps for improving deep neural network model performance using hyperparameter tuning are as follows:

5.5.1.1 Splitting the Data According to the Initial Train/Test Split (1-Fold Cross-Validation)

Half of the outlier labels in the training set were removed and subsequently treated as unlabelled data. In the set-up, these removals were also considered as contamination of the normal class. However, no regularisation was applied to any of the methods up to that point, which plays a large role in easing comparability of the results. It was critical that various techniques of unsupervised outlier detection be applied in feature transformation. This was achieved by choosing an arbitrary subset of options together with the k-NN outlier score, whereby the sum of distances to the K nearest neighbours was computed. These functions are based on a rough search over unsupervised algorithms, to determine which function moderately affected the training set.

5.5.1.2 Getting the Best Three Features

Numerous decision trees are created in the random forest approach. With the use of the function randomForest() in the randomForest package, random forests were created and analysed; in turn, they were used for prediction. As illustrated in Figure 5-14, a reduction in errors was realised when different features with ensemble functions were used. As such, using this technique proved to be a significant improvement over the ANN.
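A minimal sketch of the randomForest() call described above (predictors and target as defined earlier; ntree and the seed are illustrative):

library(randomForest)
set.seed(123)
rf <- randomForest(x = train[, predictors], y = train[[target]],
                   importance = TRUE, ntree = 500)
varImpPlot(rf)  # plots the variable-importance ranking
print(rf)       # reports the % variance explained for a regression forest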

Figure 5-14 Variable importance on generated features. The error is related to random forest importance error. Variable importance refers to how much a given model "uses" that variable to make accurate predictions. The more a model relies on a variable to make predictions, the more important it is for the model.

5.5.1.3 Features Obtained from Deep Learning Hidden Layers

The number of neurons in each hidden layer was the same as the number of features extracted from the deep learning model. To ensure that only the important non-linear features were selected, a correlation with the target variable was computed for all features corresponding to a neuron in the hidden layer. Only one feature per neuron in the newly computed correlations was kept. Subsequently, the final four non-linear features were obtained by comparing the maximum correlations with other features from the same hidden layer. Figure 5-15 shows a hierarchy of importance of the features that play a vital role in the prediction of passenger figures. It is clearly shown that the non-linear features attained through the method outlined above are deepf1, deepf2, deepf3, and deepf4, and they have the highest influence on the prediction, as captured by XGBoost feature importance.

Figure 5-15 Features importance obtained from deep learning hidden layers.

5.5.2 Extra Grid Search to Optimise Parameters

To enable comparison of all models' performances, and consequently to tune the hyperparameter values, all possible models attainable through combinations of hyperparameter sets were trained using a grid search model. During the process, a strong interrelation of some of the hyperparameters was observed. Tuning should therefore be conducted using a mesh of possible combinations on a logarithmic scale. For two hyperparameters, λ and γ, for example, the initial values can first be matched and then dropped by a factor of 10 at each step. The points are randomly shifted slightly rather than using the exact cross points; as a result of this random shifting, properties that could otherwise have remained hidden were discovered. The optimal point was further retested within the border region if it was found to lie on the border of the mesh. Due to the computational intensity of a grid search, it should be used sparingly in smaller projects. The results were fine-tuned by lengthening iterations and dropping the values by a factor of 10 or lower. Poor performance on held-out test data results when a large feed-forward neural network is trained on a small training set. To lower this 'overfitting', half of the feature detectors were randomly omitted during each training session. This halts the complex co-adaptation whereby a feature detector is only useful in the presence of particular other feature detectors. Each neuron therefore learns how to identify a feature, which significantly aids in arriving at a suitable answer. This was performed in this study and maintained so long as the large inner combinatorial variety remained operational. Random grid searches are reliable for locating the best parameters. As such, another grid search was run on the model, as it had given perfect scores at the beginning; this new grid search would unveil its relevance and whether it could overtake the NN. The randomly created frame was run on the code below, after first defining the hyperparameter grid and search criteria.
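Since the grid below references hyper_params and search_criteria objects, a hedged sketch of how such objects might be defined is given first (all values are illustrative examples, not the study's final settings):

# Illustrative hyperparameter grid and random-search criteria
hyper_params <- list(
  activation = c("Rectifier", "Tanh", "Maxout"),
  hidden = list(c(100, 80, 45), c(128, 63, 32), c(200, 100)),
  input_dropout_ratio = c(0, 0.05),
  l1 = 10^seq(-6, -3),
  l2 = 10^seq(-6, -3)
)
search_criteria <- list(strategy = "RandomDiscrete", max_models = 100, seed = 123,
                        stopping_rounds = 5, stopping_metric = "RMSE",
                        stopping_tolerance = 1e-2)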

valid_frame=as.h2o(test)
dl_random_grid <- h2o.grid(
  algorithm="deeplearning",
  grid_id="dl_grid_random",
  training_frame=train,
  validation_frame=valid_frame,
  x=predictors,
  y=target,
  epochs=1,
  stopping_metric="RMSE",
  stopping_tolerance=1e-2,        # stop when the metric does not improve by >= 1% ...
  stopping_rounds=2,              # ... for 2 scoring events
  score_validation_samples=10000, # down-sample the validation set for faster scoring
  score_duty_cycle=0.025,         # do not spend more than 2.5% of wall time on scoring
  max_w2=10,
  hyper_params=hyper_params,
  search_criteria=search_criteria
)

Both the accuracy and the training speed of the model were significantly impacted by the values picked for the hyperparameters. A variety of hyperparameters, such as different numbers of hidden layers and alternative layer selections, was tried by passing hyper_params = hyper_params and search_criteria = search_criteria. The fact that a grid search gives a high predictive accuracy led to its selection in this study. Additionally, it allows specification of a variety of hyperparameter values as well as enabling a trial of all possible combinations. The h2o.getGrid() function can be utilised to retrieve every trained combination in R. As such, the incremental outcomes become visible as the models are constructed by fetching the grid with the h2o.getGrid feature. The first element in grid@summary_table is taken into consideration.

5.5.3 Improving Deep Neural Network Model Performance Using Ensemble Learning

There are various types of learning paradigms. One that is commonly used is neural network ensemble learning, in which several neural networks are combined so that their generalisation ability outshines that of the individual networks. Its applicability therefore explains its relevance for deep learning in multi-layer neural networks. Additionally, the overall performance of a conventional neural network ensemble can be successfully enhanced through various qualities of deep neural networks. Due to their efficiency in increasing accuracy in a variety of operations, model ensembles are commonly used in DL competitions. In this Thesis, a new ensemble strategy is suggested, with which the overall performance of neural network classifiers will be enhanced. Predictions in ensemble models are based on the following:

1. A single vote for every model.
2. Weighted voting on the basis of each model's prediction confidence.

The suggested solution brings together a variety of neural network classifiers as well as another group of characteristics which are possibly feature engineered. In this study, it has been demonstrated that co-adapting features is more advantageous than optimising features individually for every distance metric. As such, enhancement of each feature set in the context of the whole ensemble is achieved. The solution also outlines how the ideal degrees of an ensemble can be obtained, and consequently how the model can be enhanced and optimised. When compared with other commonly used classifiers, the results of a DF-TS-INN classifier indicate a significant improvement in performance.

The aim of this Thesis was to evaluate and retain the method used herein. Removing the extreme values benefited each training session on the model and also facilitated the datasets' validation. It should be noted that a big leap in the development of the model would be realised even by conducting some metric enhancements.

5.6 Discussion

Predicting airline passenger figures accurately is an integral part of successful and efficient management practices in the airline industry. This chapter has therefore presented an overview of all the features that impact the forecasting of airline passenger figures. As demonstrated by the experiments herein, it is important to select closely related features so as to achieve an accurate forecast of airline passenger figures. It has also been demonstrated that the addition of new features to the old ones alters the variance explained in the new output, which improves the forecasting results. The results were also significantly improved when the deep neural network was optimised. This study also proposes the H2O method of tuning deep learning models, in which diversification of samples is achieved by building the trained network from samples of features selected from the dataset.

As manifested in the results, the final predictions obtained from the test set are significantly better than the predictions on the training dataset. This became clearly visible when the training set histogram was reshaped into a better histogram of the final test set predictions. It is noted that, regardless of the combination method used for different machine learning models, the accuracy of results from a deep neural network model is higher than that of results from individual machine learning models.

Feature importance of the original feature space formed the basis for parameterising the predictions. A fixed rule was employed to combine the predictions emanating from different time series models. Calculations of the weighting parameters to be adopted when forecasting on holiday and season were done using the fixed rule; the calculations are, however, dependent on the number of months, destination, seasonality, and holidays. Although the rules have significantly optimised the performance, the study intends to scrap the fixed rules and instead adopt a dynamic combination approach. Through such a combination, the basic features used will be adjusted while at the same time adapting to any newly introduced features.

When building machine learning classification models, feature selection and extraction are essential and inevitable phases. They also significantly impact the model's overall accuracy and performance. However, these phases are expensive and there is no guarantee that features extracted manually will generalise effectively across various data modalities. Later, the building of a hybrid classification scheme facilitated exploration of the advantages derivable from combining traditional machine learning models with deep learning models. For feature selection and extraction, the hybrid classification scheme utilises the first few layers of a convolutional neural network. Classification was then performed by feeding the extracted features into a traditional supervised learning algorithm. From the experimental results derived, it is clear that the hybrid approach outshines both the deep learning and the traditional machine learning algorithms in isolation on small data.

Several approaches suitable for the derivation of feature importance from various machine learning models such as random forest have been demonstrated in this chapter. It is as important to understand the results as it is to have good results. As such, every scientist should strive to understand the variables that are important in their models and why. This can not only give a better understanding of business operations but also pave the way for further improvements to their models.

All codes used in this chapter are shown in the Appendices section.

Chapter 6: Creating a Modified Version of Principal Component Analysis (PCA) to Improve the Forecasting Performance Using a Different Correlation Matrix

6.1 Introduction

In time series analysis, forecasting future observations is a key focus for economists and statisticians (Koopman & Ooms 2006; Heij et al. 2006). Over the past decade, the increased use of a variety of predictors in forecasting air passenger figures has in turn led to increased accuracy in the resulting forecasts. There are various variables used to forecast air traffic numbers, such as price index and employment rate. Such variables usually entail datasets with large numbers of time series observations made over a long period (Vasigh & Fleming, 2016). When conducting the actual analysis, all variables play a significant role; considering or ignoring them has an impact on forecast accuracy and could result in suboptimal results. It is therefore a matter of great significance to determine the right variables to be considered in an analysis. Researchers therefore employ various statistical data analysis methods to determine the most suitable information from the available predictors, to ensure that both the theoretical and the empirical performance of the forecasts is improved (Hassanien & Gaber, 2017).

Principal Component Analysis (PCA), a commonly used method, is one of the many techniques that have been developed to counter sampling errors. One of the primary goals of PCA is identifying patterns in data, which is only achievable when variables show a strong correlation, consequently making it possible for the dimensionality of the data to be reduced (Mirkin, 2011). While reducing dataset dimensionality, PCA ensures that as much statistical information, or variability, as possible is preserved. In other words, PCA enables derivation of a smaller-dimensional subspace by determining the directions of maximum variance in high dimensional data while at the same time retaining much of the original information (Mirkin, 2011). As such, it is certain that the overall prediction

accuracy for air passenger variables and figures can be greatly improved by utilising factor-based forecasting with PCA estimation to extract small numbers of latent factors (Anderson, 1984).

Further on, regression analysis tends to be easier when PCA is incorporated into it (Sapankevych & Sankar 2009). Basically, the training dataset is used to learn an appropriate estimator function that generates new outputs from the corresponding inputs (Adhikari & Agrawal 2013). Although PCA is applied in various disciplines, a correlation matrix is currently used to implement it. Such a matrix is derived using the Pearson correlation coefficient to correlate variables pairwise. At times, however, data properties outside the linear relation are not captured by the Pearson correlation coefficient (Jackson, 2012). This chapter therefore aims to bring forth a new, more reliable and effective forecasting method based on PCA. A modified, nonlinear version of Principal Component Analysis (MPCA) that uses kinetic energy will be introduced and applied to the study's dataset, with samples of selected features being used to compute the transformation matrix. The MPCA simplifies the PCA by not computing a multivariable quadratic optimisation problem with linear constraints (Adhikari & Agrawal 2013).

The main aim of this chapter is to present the most appropriate approach for long-term forecasting at Muscat Airport. As such, the performance of three forecasting approaches is evaluated. The first approach is the linear forecasting method of the PCA model. The second approach entails application of machine learning in the form of MPCA as a nonlinear forecasting technique. In the third approach, the outputs of the hybrid MPCA are leveraged: a curve is fitted to the trend component and extrapolated into the future, and the final forecast is then obtained by superimposing the MPCA-forecasted seasonal component onto the trend component. Rather than repeatedly refitting a model, this approach entails projecting a fitted curve into the future. In this chapter, therefore, an analysis of the implications of MPCA for forecasting accuracy, as well as a combined Regression-PCA analysis of airline passenger figures, is presented. Monthly statistics of passengers travelling via Muscat Airport are used in this study. Since aged data might exhibit unwanted characteristics, the data used cover the period January 1998 to December 2016. In total, 51,983 observations were collected and divided into two: a training set and a test set. Model parameters

were obtained from the training set, while performance evaluation, as well as determination of forecasting accuracy, was done using the test set.

Univariate autoregressive models and multivariate vector autoregressive models are among the most commonly used analysis methods (Lütkepohl, 2013). When handling many time series, most of these models have limitations. In addition to the developmental challenges discussed in Chapter 4, it is also difficult to measure performance and draw conclusions for such models. Since derivations and results rest on numerous and varying modelling assumptions, it is difficult to compare the models theoretically (Lütkepohl, 2013); it is, for example, hard to determine whether entire processes or algorithms are affected by the assumptions. Empirically, the applicability and variability of datasets pose another challenge. As such, the data requirements, specifications, implementation, forecast horizons, and application areas, among other aspects of such research, often suffer from fragility and lack of clarity (Lütkepohl, 2013).

Many studies provide a variety of methods to counter the challenges of low dimensional data and thereby improve forecasting performance. Three classes of methods have been derived (Stock & Watson, 2004). The first class combines a large cluster of forecasts from relatively simple models: multivariate models are used to compute individual forecasts, which are then combined using reasonable weights based on some historical measure. The theory of forecast combinations was first proposed by Bates and Granger (1969) and, over time, its application has successfully improved forecasting results (Timmerman, 2003; Stock & Watson, 2004). In the second class, factor analysis is used to extract several latent factors that cover the information in the predictors, providing factor-augmented forecasts. In the third class, forecast accuracy is improved by reducing sampling errors. A brief sketch of the first class follows.
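As an illustration of the first class of methods, the following is a minimal Python sketch of an inverse-MSE weighted forecast combination in the spirit of Bates and Granger (1969); the forecasts and historical errors are hypothetical values introduced for this example only, not figures from this study.

import numpy as np

def combine_forecasts(forecasts, mse):
    """Weight each model's forecast by the inverse of its historical MSE,
    so that historically more accurate models contribute more."""
    weights = 1.0 / np.asarray(mse, dtype=float)
    weights /= weights.sum()                 # weights sum to one
    return weights @ np.asarray(forecasts)   # weighted average per horizon step

# Three hypothetical models forecasting three months ahead
forecasts = [[510, 620, 600], [525, 610, 615], [500, 640, 590]]
mse = [120.0, 90.0, 200.0]                   # assumed historical errors
print(combine_forecasts(forecasts, mse))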

Referring back to Chapter 4, traditional tools with a variety of features, such as monthly national accounts, were employed there to assess autoregressive models in short-run forecasting as well as economic analysis. It was shown clearly that the variance of the new outputs was significantly altered by the addition of new features to the old outputs, and thus a better forecasting performance was obtained in comparison to

more complicated econometric models. With the aim of analysing the performance of complicated models, the MPCA applied in this chapter is based on Chapter 4's results. Consequently, a forecast of the monthly year-on-year growth in airline passenger figures in Oman is conducted. Based on Principal Component Analysis, autoregressive and MPCA estimations are used, as they enable extraction of only the common trend through removal of the noise factor from variable shocks. A comparison is then made between the forecasts from the old PCA and the newly obtained ones.

In the next section, PCA is reviewed in depth and a definition given. Additionally, the basic outline of the MPCA framework is given and a comparison made with the old PCA. Section 5.2 presents the proposed methodology, while Section 5.3 shows an implementation of the model. The experimental results derived from the model are reported in Section 5.4. A new ensemble method is discussed in Section 5.5, while Section 5.6 bears the conclusion.

5.1.1 Presentation of Principal Component Analysis: Review of the PCA

First introduced by Karl Pearson, Principal Component Analysis (PCA) is a classical multivariate technique for conducting data analysis. It is commonly used for linear feature extraction as well as for a wide range of data dimension reduction tasks (Bengio, 2013). It calculates the eigenvectors of the covariance matrix of the high dimensional original inputs to come up with smaller numbers of uncorrelated inputs. Due to its high ability to correlate properties and reduce errors, PCA has been used to prepare data in various areas of information processing, and much attention has been drawn to it in the literature. The informational energy concept in probability, for example, is attributed to Onicescu and Mihoc (1966); PCA consequently became a viable solution to the forecasting challenge of having too many potentially useful predictors. In the past, PCA has been applied to dimensionality reduction in cases of enormous data, where the data have been correlated by numerous correlation metrics, or when deriving new features that were non-existent, e.g. by addition of 2, 3…5 features to the initial CSV file to consequently achieve eigenvalues or Eigen-features. Fewer features are achieved by compressing much of the information in the initial data space. PCA achieves this by maximising the variance in any potential subspace (Timmerman, 2003), whereby the PCA subspace is spanned by the eigenvectors corresponding to the top eigenvalues of the sample covariance matrix.

It is also possible for both unsupervised and supervised learning and recognition processes to adopt PCA in their data preparation phases (Turk & Pentland, 1991). However, since the available data is usually nonlinear in form, PCA strategies might not give desirable classification benefits. On the other hand, MPCA and its variants can work effectively with nonlinear relations and thus provide the advantage of working efficiently with nonlinear data (Maaten & Hinton 2008). It should be noted that PCA operates by finding vectors that are highly indicative, that is, eigenvectors that correspond to the largest eigenvalues of a sample covariance matrix. As such, PCA is a highly effective mathematical process for transforming selected, possibly correlated, variables into principal components, a smaller selection of uncorrelated variables (Wensveen 2007; Coates & Ng 2012; Babu & Reddy 2014; Phyoe et al. 2016).

In the modern age, datasets are often very large and thus pose a challenge to computer hardware in addition to slowing down the overall performance of various machine learning algorithms (Hastie et al., 2013). It should be kept in mind that the main aim of PCA is to reduce the variables to a small number of factors that account for the variation in a large set of observed variables. The small number of factors attained consequently elevates the accuracy of entire prediction models. A modified PCA version is proposed in this study to make airline passenger forecasts more accurate, bearing in mind that investment decisions can be significantly influenced by even the slightest performance improvement. Input features in the form of economic indicators have been calculated using time series analysis, given its crucial role in forecasting activities. PCA is then applied to the input features to extract influential components which, upon filtration, facilitate transformation of high dimensional inputs into low dimensional features that are easily applicable. Later on, the reduced features are converted into principal components using MPCA, which then constructs the forecasting model using the low dimensional input variables obtained earlier to predict the final airline passenger figures.

5.1.2 Modified Principal Component Analysis Framework

Application of various feature extraction methods in data mining has been witnessed over the past few decades. The performance of a machine learning algorithm on data is largely impacted by the efficiency and effectiveness of the feature extraction method used (Perner, 2012). Even an advanced and highly complex machine learning

algorithm is likely to exhibit poor performance when poor feature extraction is applied. On the other hand, a simple machine learner will show high performance when a good feature extraction method is applied. At high levels of abstraction, PCA has shown the possibility of giving better results in automated extraction of data features. Machine learning would achieve a major breakthrough upon successful performance of feature engineering in a more automated manner: researchers would be able to extract features automatically without having to exercise direct human input. In a study conducted by Yang et al. (2004), 2-D PCA was proposed, with which there would be no need to convert images into vectors.

According to Plataniotis & Venetsanopoulos (2008), the effectiveness of 2-D PCA is relatively high compared with PCA, and it can therefore be considered a special multi-linear PCA. More recently, L1-norm-based PCA methods that are robust to outliers have been suggested, including Sparse PCA by Meng et al. (2012), 2DPCA-L1 by Pang & Yuan (2010), PCA-L1 by Kwak (2008), and Robust PCA by Candès (2011). However, desirable classification results may not be achieved from most of these approaches when dealing with nonlinear data. Nonlinear PCA models and their variants with the ability to deal with the nonlinear data found in the real world have, however, been proposed in various studies: Nonlinear PCA by Wang & Tang (2004) and Schölkopf et al. (1998), and Kernel PCA (KPCA) by Huang et al. (2009). Other extensions of PCA that are based on probabilistic similarity measurement are Probabilistic PCA (PPCA) by Tipping & Bishop (1999), Bilinear PPCA by Zhao (2012), Independent Component Analysis (ICA) by Koldovsky et al. (2006), and Half-Quadratic PCA by Zheng et al. (2011).

PCA is a key method of reducing dimensionality and extracting features. When investigating PCA-based methods, researchers mainly focus on vectors that are highly representative and barely show interest in discriminative vectors. With the chief aim of improving PCA classification performance, this study proposes a novel PCA that uses a different correlation matrix, thereby modifying the Pearson-correlation PCA. In doing so, Informational Correlation Coefficient and kinetic energy methods are employed. This framework has a more powerful data representation ability while at the same time giving better classification results. Each of the above-mentioned PCA methods is based on only one measurement, i.e. a distance or a probability measurement. Although it is easy to

implement these PCA algorithms, their classification results are usually not very good (Phyoe et al. 2016), as the nonlinear relations between various classes may not be captured by these methods. The modified PCA should therefore provide a solution to this challenge. Consequently, the following ideal properties emanate from the application of kinetic energy metrics by MPCA:

1. Contrary to the use of only one measurement by other PCA methods, MPCA employs multiple measurements, gaining a high potential to capture sufficient nonlinear relations from the data at hand. Additionally, it has a high capability for data representation.
2. MPCA is highly applicable in cases where the data dimensionality is larger than the number of training samples.
3. It is theoretically possible to apply MPCA to both unsupervised and supervised learning models.

Moreover, other forms of MPCA similarity measurement also exist. The most common is the Mahalanobis distance, a highly effective and efficient multivariate distance metric used to measure the distance between a distribution and a given point. Its applicability in multivariate anomaly detection, classification of highly imbalanced datasets, and one-class classification makes it highly useful (Hillel et al. 2005).

In a training set with $N$ centred training samples $x_i \in \mathbb{R}^M$ $(i = 1, 2, \ldots, N)$, the covariance matrix is determined by (Hoffman et al. 2013):

$$C = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{T} = \frac{1}{N} X X^{T} \qquad (5.1)$$

where $X = [x_1, x_2, \ldots, x_N]$. When the dimensionality of the above covariance matrix $C$ is very high, which in most cases means $M \gg N$, the eigendecomposition of $C$ becomes hard and most likely infeasible, giving rise to the need to define a new matrix (Hoffman et al. 2013):

$$D = \frac{X^{T} X}{N} \qquad (5.2)$$

It is clear that the matrices $C$ and $D$ have the same non-zero eigenvalues, denoted by $\lambda_i$ $(i = 1, 2, \ldots, r)$. The eigenvectors affiliated to the non-zero eigenvalues of matrix $D$ are denoted by $v_i$ $(i = 1, 2, \ldots, r)$. As such, the eigenvalues of the covariance matrix $C$ correspond to the following eigenvectors:

$$u_i = \frac{X v_i}{\sqrt{N \lambda_i}}, \qquad i = 1, 2, \ldots, r \qquad (5.3)$$

When pattern recognition is fed into the PCA, the $u_i$ $(i = 1, 2, \ldots, r)$ are labelled Eigen-patterns, while the subspace spanned by the $u_i$ is termed an Eigen-space. The modified version in this study takes into account the fact that the PCA method has employed a different correlation matrix. The main limitation of the standard implementation is that data properties beyond linear relations are not captured. If, for example, a researcher correlates two random vectors using the Pearson coefficient:

$$x = \{-4, -3, -2, -1, 0, 1, 2, 3, 4\}, \quad y = x^2 \;\Rightarrow\; \mathrm{Cor}(x, y) = 0$$

this result would be misleading, since the nonlinear relation existing between the two vectors, resulting from the functional transformation $x \to x^2$, is not captured. The correlation is therefore not zero, and a modified PCA has been introduced in this study to improve on this (Hoffman et al. 2013).
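The following is a minimal NumPy sketch of Equations 5.1 to 5.3, assuming a centred data matrix X with one sample per column; it is an illustration of the small-sample trick, not the thesis code, and shows why the N x N matrix D is decomposed instead of the much larger M x M covariance matrix C.

import numpy as np

def small_sample_pca(X):
    """Eigendecomposition via D = X^T X / N (Eqs. 5.2-5.3), useful when
    the dimensionality M far exceeds the sample count N.
    X is M x N with centred samples as columns."""
    M, N = X.shape
    D = X.T @ X / N                       # N x N instead of M x M
    lam, V = np.linalg.eigh(D)            # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]        # sort descending
    keep = lam > 1e-12                    # retain nonzero eigenvalues only
    lam, V = lam[keep], V[:, keep]
    U = (X @ V) / np.sqrt(N * lam)        # unit-norm eigenvectors of C (Eq. 5.3)
    return lam, U

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))           # M = 1000 >> N = 20
X -= X.mean(axis=1, keepdims=True)        # centre each dimension
lam, U = small_sample_pca(X)
print(U.shape)                            # Eigen-patterns, one column per component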

5.2 Methodology Framework

5.2.1 Features Based on Information Energy (Kinetic Energy)

Kinetic energy was first introduced in the early 18th century by Gottfried Leibniz and Johann Bernoulli, and a study was later conducted by Gustave Coriolis in 1829 to unveil its potential applicability. Information energy is basically an application of the kinetic energy used in physics to probability. It is defined as follows for a countable set of states (Nowikow et al. 2001):

$$x_1, x_2, \ldots, x_n \qquad (5.4)$$

The corresponding probabilities are as follows:

$$p = (p_1, p_2, \ldots, p_n), \qquad IE(p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} p_i^{2} \qquad (5.5)$$

From the above, in an experiment with $n$ outcomes where the probability $1/n$ is common to all outcomes, the information energy is $IE = 1/n$. The information energy reaches its maximum value of $IE = 1$, with the probability of a single outcome equal to 1, whenever the experiment always gives the same result. This means that a decrease in randomness automatically causes an increase in information energy.

Information energy is like the reverse of Shannon entropy, which is used to determine uncertainty by measuring bits of information. Although it is also an entropy-like quantity, it is best considered as the $\frac{1}{2}mv^{2}$ of a random vector. The kinetic energy method is simple but powerful and thus provides quite reliable accuracy enhancement, as well as improving machine learning methods on raw data. This is particularly the case where categorical data exists in groups; such data can also be discretised even when it is in continuous form. Additionally, if the corresponding probabilities of a random vector $X$ are $p_1, p_2, \ldots, p_n$, the kinetic energy equals $\sum_i p_i^{2}$. This approach to probability is similar to physics, except that the factor $\frac{1}{2}m$ does not appear here: the probabilities are treated as floating inertial masses whose total sums to 1. By analogy with quantum physics, this can be explained as follows: if, for example, a random variable $X = \{1, 1, 1, 3, 5, 3\}$, the kinetic energy is computed as shown below.

The probabilities of the three categories of the random vector (Nowikow et al. 2001), which are 1, 3, and 5, are:

a) Prob(1) = 3/6 = 1/2
b) Prob(3) = 2/6 = 1/3
c) Prob(5) = 1/6

$$\mathrm{KineticEnergy}(X) = \text{sum of squared probabilities} = \mathrm{prob}(1)^{2} + \mathrm{prob}(3)^{2} + \mathrm{prob}(5)^{2} = \left(\tfrac{1}{2}\right)^{2} + \left(\tfrac{1}{3}\right)^{2} + \left(\tfrac{1}{6}\right)^{2} \approx 0.3889$$

It should also be noted that:

$$X = \{1, 1, 1, \ldots, 1\} \;\Rightarrow\; \mathrm{KineticEnergy}(X) = \mathrm{prob}(1)^{2} = 1 \qquad (5.6)$$

It therefore means that kinetic energy stays at its maximum value of 1 so long as there is no diversity (Nowikow et al. 2001). On the other hand, an analogy can be made with atomic nuclei, which release energy in large amounts in an occurrence known as nuclear fusion, as shown below:

$$X = \{1, 2, 3, 4, \ldots, n\} \;\Rightarrow\; \mathrm{KineticEnergy}(X) = n \left(\tfrac{1}{n}\right)^{2} = \tfrac{1}{n} \to 0 \qquad (5.7)$$

As such, kinetic energy decreases to zero or near zero whenever a high level of uncertainty, or a maximum level of diversity, prevails. In such a case, the random vector categories can be likened to the atomic nuclei of the previous example: the high energy released eventually translates into low energy. Kinetic energy is bounded between 0 and 1, as illustrated in the two examples above. Pattern recognition in MPCA will be improved if a new factor with the ability to measure these properties of random vectors is introduced.
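A minimal Python sketch of the kinetic (information) energy computation described above is given below; the helper name kin_energy is an assumption of this illustration, and it is reused by the o() function in Section 5.3.

import numpy as np

def kin_energy(vector):
    """Kinetic (information) energy of a random vector: the sum of squared
    class probabilities, bounded between 0 and 1 (Eq. 5.5)."""
    counts = np.unique(vector, return_counts=True)[1]
    probs = counts / vector.shape[0]
    return float(np.sum(probs ** 2))

print(kin_energy(np.array([1, 1, 1, 3, 5, 3])))  # (1/2)^2+(1/3)^2+(1/6)^2 = 0.3889
print(kin_energy(np.ones(6)))                    # no diversity -> 1.0
print(kin_energy(np.arange(6)))                  # maximum diversity -> 1/6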

5.2.2 Features Based on Information Correlation Coefficient

First introduced by Linfoot (1957), the Informational Correlation Coefficient (ICC) is also referred to as Onicescu's correlation coefficient (Iosifescu, 1986). At the bivariate normal distribution, it is equal to the classical Pearson correlation coefficient. It is, additionally, based on the joint probability density distribution function of the vectors X and Y. The ICC can be described as follows for random vectors x and y:

p !" ∗!(!") 6.8 O x, y = ! IE P ∗ IE(Q)

In this study, the above function only applies for discrete data.

Only the linear properties of the manifold in which this study's data lies are captured by the Pearson correlation. Taking a random vector in R, for example, x = c(-4, -3, -2, -1, 0, 1, 2, 3, 4) and y = x^2, a correlation of 0 will be yielded by Spearman or Pearson while, in the real sense, the actual correlation is 0.5 due to the transformation function x -> x^2. In this study, therefore, a new correlation coefficient has been applied in the modified PCA, rather than fitness functions as in genetic algorithms or cross-entropy as in neural networks. Formerly, the main uses of PCA were reducing large datasets correlated by a number of correlation metrics or deriving new features by addition; determination of eigenvectors and eigenvalues was achieved through the covariance matrix or the Pearson correlation.
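The point can be reproduced with a few lines of NumPy; over the symmetric range the Pearson coefficient is exactly zero despite the exact functional dependence.

import numpy as np

x = np.arange(-4, 5)              # -4, -3, ..., 4
y = x ** 2                        # y is fully determined by x

# Pearson detects no *linear* relation in this symmetric case
print(np.corrcoef(x, y)[0, 1])    # 0.0, up to floating point error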

With the new correlation matrix, a modified version of the original algorithm has been created by implementing a totally different correlation metric possessing the ability to capture the kinetic properties of two random vectors. The new correlation metrics were

hence implemented with the main aim of modifying the original PCA to derive a version that obtains eigenvalues and eigenvectors, in turn enabling dimensionality reduction through a correlation matrix of kinetic correlation coefficients. The modified PCA was implemented and tested on the airline dataset by adding two new features to the neural network. As described later in this chapter, the results of the implementation boosted the final score by several hundred positions.

5.3 The Working Algorithm of the Study: The Modified PCA Implementation

First, a new correlation coefficient known as "O" was developed. This correlation uses kinetic energy to select features. With the modified PCA developed in this study, the new Onicescu correlation in the PCA can have positive impacts on science when compared with the commonly used Pearson correlation, which only measures linear relationships. There is also distance correlation, first introduced by Székely (2005), which measures dependence between random vectors while capturing nonlinear relations. As such, multiple kinetic properties can exist in data, such as the number of 1s in a row, or a nonlinear or linear shape. Based on these kinetic properties, features are derived from the modified PCA by use of kinetic correlation metrics rather than the Pearson correlation coefficient. "Pandas" and "NumPy", the two most essential libraries for data manipulation and numerical analysis, are imported.

Feature selection is subsequently conducted using the correlations explained above for two random vectors X = x1, x2, ..., xN and Y = y1, y2, ..., yN. There are two main coefficients, as explained below.

The Informational Correlation (IC). This type of correlation measures the merit of a predicted value and does not use kinetic energy. The IC improves the predictive skill of a forecasting analyst when used as a performance metric for forecasting airline passenger figures. The IC is similar to correlation in that it can measure the linear relationship between two random variables, which in this case are the actualised numbers and the predicted figures. IC coefficients range from 0 to 1, where a perfect linear relationship is denoted by 1, thus

showing good forecasting skill, whereas the absence of a linear relationship between the actual values and the predicted values is denoted by 0, showing poor forecasting skill (Kent & Williams, 1995). Below is the IC function:

$$IC(x, y) = C(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N) = \sum_{i=1}^{N} p(x_i)\, p(y_i) \qquad (5.9)$$

The Informational Correlation Coefficient (ICC). The ICC measures rating or measurement reliability for data that has already been sorted into groups, or collected as groups, through clustering. The Pearson correlation, also known as the interclass correlation, is a term related to the ICC and is commonly used, unlike other statistics such as Cohen's kappa, which is rarely used. Pearson is commonly used in cases where there are only one or two meaningful pairs, from one or two systems, to improve inter-rater reliability. The ICC can likewise make use of the kinetic energy of the information, just as other statistical correlation coefficients (Kent & Williams, 1995) would, resulting in the following:

$$O(X, Y) = \frac{\sum_{i=1}^{N} p(x_i)\, p(y_i)}{\sum_{i=1}^{N} p(x_i)^{2} \cdot \sum_{i=1}^{N} p(y_i)^{2}} \qquad (5.10)$$

It should be noted that kinetic energy forms the denominator, and thus the ICC coefficient (Kent & Williams, 1995) can be presented as:

$$O(X, Y) = \frac{IC(X, Y)}{\mathrm{kinetic}(X) \cdot \mathrm{kinetic}(Y)} \qquad (5.11)$$

In order to master the dot product computation techniques explained here in a mathematical setting, it should be ensured that the cardinality of unique events, also known as the classes, of the two random vectors is the same. It should be noted that different shapes would break the correlation coefficient; the implementation therefore pads the shorter probability vector with zeros. The IC for two random variables, the dot product of the probabilities corresponding to each class, is returned as illustrated in Code 5.3 below.

import numpy as np

def ic(vector1, vector2):
    a = vector1
    b = vector2
    # class probabilities of each vector
    prob1 = np.unique(a, return_counts=True)[1] / a.shape[0]
    prob2 = np.unique(b, return_counts=True)[1] / b.shape[0]
    p1 = list(prob1)
    p2 = list(prob2)
    # pad the shorter probability list with zeros so the shapes match
    diff = len(p1) - len(p2)
    if diff > 0:
        for elem in range(diff):
            p2.append(0)
    if diff < 0:
        for elem in range(-diff):
            p1.append(0)
    # the IC is the dot product of the two probability vectors
    ic = np.dot(np.array(p1), np.array(p2))
    return ic

With the IC and kinetic energy functions available, a new function with the ability to compute the kinetic correlation is defined; the kinetic energy-based correlation is returned as illustrated below.

def o(vector1, vector2):
    i_c = ic(vector1, vector2)
    # normalise the IC by the square root of the two kinetic energies
    o = i_c / np.sqrt(kin_energy(vector1) * kin_energy(vector2))
    return o

The formula is updated to add a square root to the denominator, consequently bounding the coefficient between 0 and 1. The shape attribute returns the number of items in a NumPy array as a tuple. A new matrix is then created, with its rows initialised to zero values, as shown below.

rows = data.shape[1]
matrix = np.zeros((rows, rows))
# fill the matrix pairwise with o() (data is assumed to be a pandas DataFrame of features)
for i in range(rows):
    for j in range(rows):
        matrix[i, j] = o(data.iloc[:, i].values, data.iloc[:, j].values)

Further on, the previously defined o() function facilitates creation of the correlation matrix illustrated in Table 5.1. For comparison purposes, Table 5.2 illustrates the correlation matrix derived through the Pearson method.

Table 5-1 Correlation matrix with the function o().

        0         1         2         3         4         5
0   1.000000  1.000000  0.974678  0.326396  0.184695  0.229420
1   1.000000  1.000000  0.974678  0.326402  0.184695  0.229420
2   0.974678  0.974678  1.000000  0.320928  0.180018  0.223611
3   0.326396  0.326402  0.320928  1.000000  0.070861  0.131047
4   0.184695  0.184695  0.180018  0.070861  1.000000  0.490760
5   0.229420  0.229420  0.223611  0.131047  0.490760  1.000000

Subsequently, the 'scipy' library is used to derive the correlation matrix on the basis of the 'Pearson r' model, as illustrated below.

Table 5-2 Correlation matrix on the basis of the 'Pearson r' model.

        0          1          2          3          4              5
0   1.000000   0.751037   0.770805  -0.041212  -2.751721e-02  -1.111287e-05
1   0.751037   1.000000   0.959409  -0.013875  -3.120134e-02  -1.200190e-05
2   0.770805   0.959409   1.000000  -0.020539  -2.392666e-02  -1.213799e-05
3  -0.041212  -0.013875  -0.020539   1.000000   2.169034e-01   2.223767e-05
4  -0.027517  -0.031201  -0.023927   0.216903   1.000000e+00   8.876347e-07
5  -0.000011  -0.000012  -0.000012   0.000022   8.876374e-07   1.000000e+00

5.4 Experimental Analysis / Performance Evaluation

This section evaluates the PCA algorithm by removing all features that are redundant and irrelevant. In this study's experiments, comparisons were made between the modified PCA with the kinetic correlation matrix derived from kinetic energy, and PCA with the Pearson r correlation. Comparison of these methods gave the results explained henceforth.

5.4.1 Comparison of Modified PCA with Kinetic Correlation Matrix from Kinetic Energy and PCA with Pearson R Correlation

The main activity undertaken here involved replacing the Pearson r-based correlation matrix, or in some instances the covariance matrix, with the Onicescu correlation coefficient-based correlation matrix. Figures 5.1 and 5.2 illustrate the results obtained after the kinetic correlation of the study's data was tested against the Pearson coefficient.


Figure 5-1 Air passenger numbers data with Pearson Correlation.

Figure 5-2 Air passenger numbers training data with Kinetic Correlation.

The above results were obtained with the main goal of determining whether more space could be introduced between data clusters when kinetic energy correlation is applied. The plots above were obtained after the data's kinetic correlation was tested against the Pearson coefficient.

As anticipated, the kinetic correlation matrix obtained from kinetic energy is far higher than the one obtained from the Pearson coefficient; only linear relations in the dataset were detected by Pearson's r. Both the X and Y axes have 7 columns each in the graphs. The correlation between columns, on a scale of 0 to 1.0, is shown by the colouring of each square: light colouring shows high correlation, while dark colouring indicates low correlation. It was also an aim of the experiments to determine the effect of plugging the kinetic correlation into the PCA algorithm. To achieve this, two new columns were created using the kinetic energy PCA method and added to the kinetic model so that the airplane data could be trained for further analysis. The code below illustrates these actions.

train['kineticPCA1'] = new_features[:, 0]
train['kineticPCA2'] = new_features[:, 1]
train.to_csv('airplanes5.csv', index=False)

With the two columns newly added to the training dataset, all passenger number values were moved into another variable and the corresponding column removed from the actual input data, as explained further in Section 5.4.2.

5.4.2 Features Obtained from Kinetic Energy PCA Components

Complex interrelationships between variables give rise to a wide range of commonly applied predictive analytics. The process of determining the most relevant inputs to be used in a predictive model is called feature selection. With feature selection techniques, redundant, irrelevant, and unwanted features with little or no contribution to, or those that lower, the accuracy of the models can be identified and removed. From a mathematical perspective, input selection is formulated as a combinatorial optimisation problem. Here, the error on the selected dataset instances represents the generalisation performance of the predictive model, which is the function to be optimised. Inclusion (1) or exclusion (0) of the input variables forms the design variables in the neural network. An exhaustive feature selection would evaluate a wide variety of combinations, namely 2^N, with N being the number of characteristics. As such, this

requires the application of intelligent methods that facilitate efficient selection of features.

This section demonstrates the application of the eXtreme Gradient Boosting package, commonly known as XGBoost. This package implements the gradient boosting framework in a scalable and efficient manner (Friedman, 2001; Friedman et al., 2000). It additionally supports a variety of objective functions, such as ranking, classification, and regression. The package is designed to be extensible, giving users room to define their own objectives with ease. Due to its high speed and good performance, the XGBoost algorithm has dominated machine learning applications with tabular or structured data (Chen and Guestrin 2016). In this study, it has been applied to a passenger figures training dataset with 51,983 observations and 9 variables. A variant of the well-known k-fold cross validation was used to ensure that a better performance estimate of the model was achieved. The dataset was split with 75% allocated to the training set and 25% to the test set. The interrelation between features or target variables is shown by correlation: a positive correlation indicates that the target variable value increases with an increase in a feature value, whereas a decrease in the target variable value due to an increase in a feature value indicates a negative correlation. With XGBoost, features highly related to the target variable can be easily identified. Table 5.3 below shows the mean RMSE values obtained per boosting round using the features from the kinetic energy PCA components.

Table 5-3 The Mean RMSE Values Obtained Using Features from Kinetic Energy PCA Components.

train-rmse-mean    test-rmse-mean
1031.46            1031.48
374.967            378.627
332.24             338.45
315.669            324.328
306.623            318.784

Table 5.3 shows that the calculated mean values are close to the actual ones when the XGBoost model was run with the Python ML modules via xgb.train, an advanced XGBoost model training interface. It additionally indicates that the

number of total mean values equals the number of boost rounds. The model's feature importance property can be used to derive the individual importance of each feature in the dataset. Each feature is given a score, whereby a higher score shows that the particular feature is more relevant to the output variable. A feature's importance is depicted by the increase in the model's prediction error after the feature's values are permuted, consequently breaking the relationship between the feature and the true outcome. Extraction of the top ten dataset features was conducted using the extra trees classifier; it should be noted that tree-based classifiers come with feature importance as an inbuilt property. Figure 5.3 below is a hierarchical illustration of the features used for airline passenger figures, in order of importance. It clearly shows that the most predictive values are month, jet fuel, and fare. It is also to be noted that running the XGBoost model and inspecting the feature importance captured kineticPCA1 and kineticPCA2 on the plot; these are the features obtained from the modified PCA. A minimal sketch of this training-and-inspection workflow is given below.
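The following is a minimal sketch of cross-validating and training an XGBoost regressor and inspecting its feature importances; the file name comes from Section 5.4.1, while the target column name and hyper-parameters are assumptions of this example rather than the thesis configuration.

import pandas as pd
import xgboost as xgb

train = pd.read_csv('airplanes5.csv')        # dataset written in Section 5.4.1
y = train.pop('passengers')                  # assumed target column name
dtrain = xgb.DMatrix(train, label=y)

params = {'objective': 'reg:squarederror', 'max_depth': 6, 'eta': 0.3}
cv = xgb.cv(params, dtrain, num_boost_round=5, nfold=4, metrics='rmse', seed=42)
print(cv[['train-rmse-mean', 'test-rmse-mean']])   # one row per boosting round

booster = xgb.train(params, dtrain, num_boost_round=5)
print(booster.get_score(importance_type='gain'))   # per-feature importance scores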

Figure 5-3 Principal Component Analysis Features (KineticPCA1 and KineticPCA2).

Table 5.4 outlines the airline passenger figures predicted using the prediction model. Additionally, the two newly created features have the second highest importance when compared with the other feature engineering methods introduced later in the study's experiments.

Table 5-4 Prediction Model using KineticPCA1 and KineticPCA2.

passangersPred5
515.111389
636.188843
621.036011
624.570862
607.825500

5.4.3 Features Obtained from Training Data Only

The XGBoost model run in this section employed features obtained from the first experiment's training data. The activity commenced by deriving the airline data from all columns of the training set. All passenger number values were then moved into another variable and the corresponding column removed from the actual input dataset. The behaviour of the model on the newly created dataset was then determined by running the XGBoost model.

Jet fuel turned out to be the most important feature in the training dataset. A partial dependence plot based on the jet fuel feature shows clearly how changes in the feature result in changes in the output, regardless of the generalisation error.

Table 5-5 The Mean RMSE Values Obtained Using Features from Training Data Only.

train-rmse-mean    test-rmse-mean
1031.48            1031.5
375.129            378.55
331.191            337.084
314.235            322.608
305.394            317.117

From Table 5.5, it is clear that the means calculated here are much closer to the actual means and that the number of total mean values equals the number of boost rounds. It is also clear that the MPCA has learnt to use the 'JetFuel' feature to make predictions. It is therefore possible to unveil the important features of the training data by checking which features are utilised by the model to make predictions.

There is a reason why the training dataset was used and not the test data. Practically, it is the intention of this study to get the best possible final model by using all the training data; as such, no test data is available for feature importance computation. The same problem arises when estimating the model's generalisation error: when cross-validation is nested for estimation of feature importance, the feature importance cannot be calculated on the final model containing all the data. The behaviour would, however, be different for models fitted on several data subsets.

Eventually, it is necessary to decide whether to establish the level of the model's reliance on each feature, using the training dataset in this case, or the overall contribution of the features to the model's performance on unseen data, the test data in this case. As of now, there is no published study with a detailed comparison of training data versus test data for computing feature importance; such an examination would require a more intensive investigation, exceeding the use of the MPCA applied in this study.

Further on, an example has been outlined in which feature importance computation was based on the training data. The choice of training data was largely influenced by the fact that it required relatively fewer lines of code.

Figure 5.4 below is a hierarchical representation of feature importance in the prediction of airline passenger figures, from the most important to the least important. It is clear that jet fuel, month, and fare have the largest influence on prediction. It is also to be noted that running the XGBoost model and inspecting the feature importance captured the features that had been obtained from the training dataset, such as holiday and season.


Figure 5-4 Feature Importance Obtained from Training Data Only.

Table 5.6 outlines the prediction model used for the given values in passenger figures prediction; the prediction values have been kept in a new CSV file.

Table 5-6 Prediction Model using Training Data Only.

passangersPred1
540.558472
630.196777
617.746277
621.555908
627.247681

5.4.4 Features Obtained from Deep Learning Hidden Layers

In an artificial neural network, a hidden layer exists between the input and output layers. It is in this layer that the activation function is applied so that the artificial neurons take in a set of weighted inputs and produce outputs. In almost all neural networks, this is where engineers typically simulate human mental activity. There are various ways of setting up hidden neural network layers: weighted inputs are randomly assigned in some instances, while in others back-propagation is conducted to calibrate and fine-tune them. Either way, the functionality of the hidden layer artificial neuron is similar to that of the biological neuron in the brain. Construction of neural network hidden layers is the main focus of many ML model analytics, and different ways of setting up the hidden layers generate varying results. In this study, for example, diversity in multiple datasets was achieved by creating a differently engineered dataset. At this juncture, it was decided that the addition of nonlinear features emanating from the R implementation of the deep learning model in Chapter 4 was necessary. The model used had the topology illustrated below.

model <- h2o.deeplearning(x = predictors, y = target,
                          training_frame = train,
                          hidden = c(100, 63, 30, 15), epochs = 30,
                          nfolds = 5,
                          fold_assignment = "Modulo",  # can be "AUTO", "Modulo", "Random" or "Stratified"
                          l1 = 5.6e-05, l2 = 7.4e-05,
                          input_dropout_ratio = 0.05)

The deep learning neural network was trained with 100 neurons in the first hidden layer, 63 in the second, 30 in the third, and 15 in the last. Each hidden layer had the same number of neurons as the number of features extracted from the deep learning model. To ensure that the most important nonlinear features were selected effectively, the correlation of each feature corresponding to a neuron in a hidden layer was computed with the target variable, the passenger number in thousands per month.

In the course of the correlation computation, a single feature from each hidden layer was kept, namely the one with the maximum correlation compared with the other features in the same layer. The final four nonlinear features were thus obtained. The behaviour of the model on the new dataset was then determined by running the XGBoost model, as illustrated in Table 5.7 below. A sketch of extracting such hidden-layer features is given first.
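The following is a sketch of this extraction using the H2O Python API rather than the R model above; the deepfeatures accessor returns the activations of a chosen hidden layer, and the file and target column names are assumptions of this illustration.

import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()
train = h2o.import_file('airplanes5.csv')      # dataset from Section 5.4.1
target = 'passengers'                          # assumed target column name
predictors = [c for c in train.columns if c != target]

model = H2ODeepLearningEstimator(hidden=[100, 63, 30, 15], epochs=30)
model.train(x=predictors, y=target, training_frame=train)

y = train[target].as_data_frame()[target]
for layer in range(4):                         # keep one feature per hidden layer
    acts = model.deepfeatures(train, layer).as_data_frame()
    cors = acts.apply(lambda col: abs(col.corr(y)))
    print(layer, cors.idxmax())                # neuron with maximum correlation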

Table 5-7 The Mean RMSE Values Obtained Using Features from Deep Learning Hidden Layers.

train-rmse-mean    test-rmse-mean
1031.7             1031.72
376.89             381.492
333.428            341.489
314.42             326.202
303.514            319.026

Table 5.7 clearly illustrates that the calculated mean values are close to the actual ones, though a significant difference was recorded for the last set of inputs.

Figure 5-5 Feature Importance Obtained from Deep Learning Hidden Layers.

Figure 5.5 above is a hierarchical representation of feature importance in the prediction of airline passenger figures, from the most important to the least important. It is clear that jet fuel, month, and fare have the largest influence on prediction. It is also to be noted that running the XGBoost model and inspecting the feature importance captured the nonlinear features deepf1, deepf2, deepf3, and deepf4 on the plot.

Table 5-8 Prediction Model using Deep Learning Hidden Layers.

passangersPred4
582.059143
545.397705
546.236450
552.925049
627.696899

Table 5.8 above outlines the airline passenger figures predicted with the use of deep learning hidden layers.

5.4.5 Features Obtained from Genetic Algorithm

The genetic algorithm is one of the most advanced feature selection algorithms. It is a stochastic method for function optimisation founded on the mechanics of biological evolution and natural genetics. This section demonstrates the application of genetic algorithms in selecting the most relevant features, leading to performance optimisation in a predictive model. The features were extracted using the symbolic transformer, a genetic algorithm estimator that commences by building a representation of relationships from a population of naïve random formulae (Lowe, 1999). Being a heuristic optimisation method, the genetic algorithm is inspired by natural evolution.

The generalisation performance of a predictive model is usually the function to be optimised in feature selection. In the first step, the individuals in a population are created and initialised. Since this algorithm is a stochastic optimisation method, the individuals' genes are initialised randomly. Next, a fitness must be assigned to each individual: the predictive model must therefore be trained with the

training data and then evaluated on the selection data to facilitate successful evaluation of fitness. A high selection error indicates low fitness. As such, the individuals displaying the greatest fitness have a high probability of being selected for recombination.

After the fitness evaluation, the individuals to be recombined into the next generation are chosen by the selection operator based on their fitness levels. At this stage, the individuals better fitted to the environment have a higher survival probability. The number of selected individuals is N/2, where N is the population size. Elitism selection enables the fittest individuals to survive directly into the next generation; the number of directly selected individuals is controlled by the elitism parameter, which is commonly set to a small value such as 1 or 2. Further on, stochastic sampling with replacement, also known as the roulette wheel, is one of the most commonly applied selection methods: all individuals are placed on a roulette with areas proportional to their fitness, the roulette is turned, and individuals are selected at random. The corresponding individual is then selected for recombination.

Once the selection operator has chosen half the population, a new population is generated by recombining the selected individuals using the crossover operator. This operator randomly picks two individuals and combines their features to give rise to four new individuals for the new population. This process is repeated until the new population reaches the same size as the old one. The crossover operator decides which parent contributes each of the offspring's features; as such, it is possible for the operator to generate offspring identical to the parents, resulting in a new generation with low diversity. To counter this problem, the mutation operator randomly changes some feature values in the offspring. A random number between 0 and 1 is generated to determine whether a feature is to be mutated: the variable is flipped whenever this number is lower than the mutation rate. Usually, the mutation rate is 1/m, with m being the number of features, so that statistically one feature per individual is mutated. The entire process is then repeated until a stopping criterion is met; each generation is thus likely to be better adapted to the environment than its predecessor. A sketch of the selection and mutation operators is given below.
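The following is a minimal Python sketch of the roulette-wheel selection and mutation operators described above, on a binary include/exclude encoding of features; all names and numbers here are illustrative assumptions rather than the thesis implementation.

import numpy as np

rng = np.random.default_rng(1)

def roulette_select(population, fitness, k):
    """Stochastic sampling with replacement: pick k individuals with
    probability proportional to their fitness."""
    probs = np.asarray(fitness, dtype=float)
    probs /= probs.sum()
    idx = rng.choice(len(population), size=k, p=probs, replace=True)
    return population[idx]

def mutate(individual, rate):
    """Flip each binary gene with probability `rate` (typically 1/m)."""
    flips = rng.random(individual.shape) < rate
    return np.where(flips, 1 - individual, individual)

pop = rng.integers(0, 2, size=(10, 8))         # 10 individuals, 8 features
fitness = rng.random(10)                       # assumed fitness scores
parents = roulette_select(pop, fitness, k=5)   # N/2 survivors
print(mutate(parents[0], rate=1 / 8))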

Below are the advantages derivable from the use of genetic algorithm methods:

1. They have improved performance in comparison with traditional feature selection techniques.
2. They can handle datasets containing many features.
3. Specific knowledge of the problem being studied is not required in order to apply them.
4. The algorithms are easy to parallelise across computer clusters.

Their main disadvantages are:

1. They may be computationally expensive, since a predictive model must be built to evaluate each individual.
2. Due to their stochastic nature, they take relatively longer to converge.

The formulae assume a tree-like structure, recursively applying mathematical functions to constants and variables. The fittest individuals in a population are selected and evolved into a successive generation of programs through genetic operations such as reproduction, mutation, and crossover. Conclusively, in this study's model, selection of the best variable subset can be achieved by genetic algorithms, though this process requires enormous computation. Figure 5.6 below illustrates the results explained above.

Figure 5-6 Tree-like Structures of the Genetic Algorithm.

The different kinds of operations produced by the genetic algorithm can be easily observed in a genetic program. After running the XGBoost model, two new features obtained from the genetic transformer were added to the algorithm.

Table 5-9 The Mean RMSE Values Obtained Using Features from the Genetic Algorithm.

train-rmse-mean    test-rmse-mean
1031.46            1031.48
375.332            378.553
333.665            339.378
317.021            325.23
307.186            318.731

Table 5.9 clearly illustrates that the calculated mean values are close to the actual ones, though a significant difference was recorded for the last set of inputs. Moreover, Figure 5.7 below is a hierarchical representation of feature importance in the prediction of airline passenger figures, from the most important to the least important. It is clear that jet fuel, month, and fare have the largest influence on prediction. Additionally, the influence of the captured genetic features 'genetic_feat1' and 'genetic_feat2' on the plot was relatively low, below the researcher's expectations.


Figure 5-7 Feature Importance Obtained from Genetic Algorithm.

Table 5.10 below outlines the airline passenger figures predicted with the use of the genetic algorithm.

Table 5-10 Prediction Model using Genetic Algorithm.

passangersPred2
529.164917
611.098145
600.075256
609.790955
607.040894

5.4.6 Features Obtained from One-Hot Encoding

One-hot encoding is a method commonly used in data science to deal with categorical features. It is a process through which categorical variables are converted into a form

that can be easily fed to ML algorithms, enabling them to produce improved prediction results. In this study, the 'month' feature, consisting of 12 categories, was transposed into a 12-dimensional binary space using this method; the categorical variable is thus represented as binary vectors. The process requires the categorical values to first be mapped to integer values. Each integer value is subsequently represented as a binary vector whose values are all zero, except at the index of the integer, which is marked with 1. When the XGBoost model is trained on this new dataset, 12 newly added columns are obtained. All the passenger number values were then moved into another variable and the corresponding column removed from the actual input dataset; see Table 5.11. A minimal sketch of the encoding follows.
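The following pandas sketch illustrates the encoding; the column names and values are illustrative assumptions, and with the full dataset all 12 month columns appear.

import pandas as pd

train = pd.DataFrame({'month': [1, 2, 12, 1],
                      'passengers': [510, 498, 640, 515]})

# Transpose the categorical 'month' into binary indicator columns
dummies = pd.get_dummies(train['month'], prefix='month')
train = pd.concat([train.drop(columns='month'), dummies], axis=1)

y = train.pop('passengers')     # move the target into its own variable
print(train.columns.tolist())   # ['month_1', 'month_2', 'month_12']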

Table 5-11 The Mean RMSE Values Obtained Using Features from One-Hot Encoding.

train-rmse-mean    test-rmse-mean
1032.74            1032.78
370.113            373.86
327.627            333.901
311.213            320.576
301.409            314.091

Table 5.11 clearly illustrates that the calculated mean values are close to the actual xgb.train values, though a significant difference was recorded for the last set of inputs.

Figure 5.8 below is a hierarchical representation of feature importance in the prediction of airline passenger figures, from the most important to the least important. It is clear that jet fuel, month, and fare have the largest influence on prediction. The figure also shows that the number of total mean values equals the number of boost rounds. Additionally, 12 new features and their corresponding influences were obtained, rather than a single feature, showing that the influence of any one of them is not very high.


Figure 5-8 Feature Importance Obtained from One-Hot Encoding.

At first, all possible one-variable models were fitted and the one with the highest performance was chosen. Then, all possible two-variable models were fitted, adding a second variable to the best one-variable model previously attained. The two-variable model with the highest performance is chosen and, if it is superior to the previous one-variable model, the experiment proceeds with it. All three-variable models are then fitted, then four-variable models, then five, and so on, until the process achieves the best k features or there is no further improvement in the model. The first impressions of this approach are as follows: the main challenge is that there is a high possibility of not finding the most optimal solution; variables from future steps, yet to be included in the process, may work quite well with the candidate variable; and, if the algorithm terminates early, the candidate variable may not have the chance to exploit potentially significant predictors. A sketch of this greedy forward selection is given below.
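The following is a minimal sketch of the greedy forward selection just described, using a linear model and cross-validated scores for brevity; the estimator and data are assumptions of this illustration, whereas the thesis pairs the idea with XGBoost.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, k):
    """Grow the feature set one variable at a time, keeping the addition
    with the best cross-validated score; stop at k features or when no
    addition improves the score."""
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining and len(selected) < k:
        scores = {j: cross_val_score(LinearRegression(),
                                     X[:, selected + [j]], y, cv=4).mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:          # no further improvement
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)
print(forward_select(X, y, k=3))            # expected to pick columns 0 and 3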

Table 5-12 Prediction Model using One-Hot Encoding.

passangersPred3
586.558441
620.629944
619.795837
614.935303
612.234863

Based on one-hot encoding, Table 5.12 above outlines the predicted airline passenger figures; the prediction values have been kept in a new CSV file.

5.4.7 Feature Obtained from Conditional Probability

After determining that the categorical nature of the data could not be exploited for more features, it was decided that a conditional feature obtained from the joint probability distribution be created. In their analysis of the theory of probability, Gut (2013) defines conditional probability as the measure of how likely an event is to occur given that another particular event has already occurred. In a case where A is the event of interest and B is an event that is known to have, or assumed to have, occurred, the probability of A occurring under the condition of B is written as P(A | B), or sometimes

P_B(A) or P(A / B). In this study, therefore, the conditional probability P(City | Month, Year) was created, designed to answer the following question:

"What is the probability that a specific row is in a particular city, given the evidence that the month has some particular value and the year has some particular value?"

To experiment with creating a conditional feature obtained from the joint probability distribution, the airline dataset with a single newly added column was read. A sketch of constructing such a feature is given below.
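The following is a minimal pandas sketch of constructing P(City | Month, Year); the column names and rows are illustrative assumptions standing in for the airline dataset.

import pandas as pd

df = pd.DataFrame({'city':  ['DXB', 'DXB', 'LHR', 'DXB', 'LHR', 'DXB'],
                   'month': [1, 1, 1, 2, 2, 2],
                   'year':  [2015] * 6})

# Joint counts of (month, year, city), normalised within each (month, year) group
joint = df.groupby(['month', 'year', 'city']).size().rename('count').reset_index()
totals = joint.groupby(['month', 'year'])['count'].transform('sum')
joint['p_city_given_my'] = joint['count'] / totals    # P(City | Month, Year)

# Attach each row's own conditional probability as the new feature column
df = df.merge(joint.drop(columns='count'), on=['month', 'year', 'city'])
print(df)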

Table 5-13 The Mean RMSE Values Obtained Using the Feature from Conditional Probability.

train-rmse-mean    test-rmse-mean
1031.46            1031.48
373.877            377.486
331.002            336.737
315.453            323.695
306.327            317.887

All the passenger figure values were put in another variable and the corresponding column removed from the actual input dataset. Table 5.13 above indicates clearly that the calculated mean values are much closer to the actual values, though a significant difference was recorded after the points were increased. After XGBoost was run on the dataset containing the new feature, the conditional feature was determined to be the least influential. This feature was, however, kept with the main aim of increasing the model's diversity, as shown in Figure 5.9 below.

Figure 5-9 Feature Importance from Conditional Probability.

Figure 5.9 above is a hierarchical representation of feature importance in the prediction of airline passenger figures, from the most important to the least important. It is clear that jet fuel, month, and fare have the largest influence on prediction. The figure also shows that the number of total mean values equals the number of boost rounds.

Additionally, the influence is not very high, as one new feature with its corresponding influence has been obtained rather than none.

Table 5-14 Prediction Model using Conditional Probability.

passangersPred6
621.677002
628.961792
629.928589
644.625793
650.002625

Table 5.14 above illustrates the passenger figure values predicted with the use of conditional probability. As aforementioned, the conditional probability P(City | Month, Year) was intended to answer the question: what is the probability of achieving a particular passenger number value for a given destination city in a particular month of a particular year?

Nevertheless, the probability space was defined by looping over the total number of keys after the list counts of each tuple were taken. At each step, the entry at the corresponding key position was incremented by the matching count, as well as the actual shape count. Consequently, an empty table with an equal number of tuples, initialised with zeros, was created, and the values were then added into the joint probability table for each probability. In conclusion, a list of conditional probabilities of the city was created and added as a new column in the train dataset.

5.5 Ensemble Stage and Outlier Detection

Ensemble analysis is a method that has been widely reviewed in the literature, its main function being to reduce a model's dependence on a specific dataset or data locality. Clustering and classification are some of the areas where the ensemble technique is commonly applied, with each of these meta-algorithm areas being considered a vibrant and active subfield in its own right. The technique creates a more robust model by combining the results from a variety of models. An outlier is defined as a particular data point maintaining a significant distance from other similar points; this results from measurement variability or, at times, indicates experimental errors. Outliers should not be included in the dataset if possible. It is, however, a difficult undertaking to determine these anomalous instances, and in most scenarios it is practically impossible. ML algorithms are highly sensitive to the distributions and ranges of attribute values. It should therefore be noted that the entire training process can be misled or completely spoilt by data outliers, resulting in longer training durations, models with low accuracy, and ultimately poor results.

Generally, outlier detection makes ensemble analysis a complicated process, and it is therefore a focus of the literature on outlier analysis. High-dimensional data is a setting where ensemble analysis is commonly used. One of the earliest feature-bagging approaches to detecting high-dimensional outliers formalised outlier ensemble analysis (Lazarevic & Kumar, 2005). Currently, ensemble analysis is designed for application to high-dimensional cases, though it is potentially more broadly applicable. The certainty and robustness of results obtained from subspace discovery are greatly improved through ensemble analysis.

With the above in mind, this part of the study focused on detecting outliers in the monthly airline passenger figures, which form the target feature. The following steps formed the foundation of the outlier detection algorithm (a code sketch of steps 3-6 is given after the list):

1. The XGBoost model was used to make predictions on all datasets obtained through the various feature engineering processes illustrated in sections 5.3.7, 5.4.2, 5.4.3, 5.4.4, 5.4.5, and 5.4.6, as well as on the initial dataset with the original features.
2. This produced the predictions Prediction1, Prediction2, Prediction3, Prediction4, Prediction5, and Prediction6. Prediction2 was a key point of interest, referring to the predictions obtained after the dataset was run through the modified PCA method.
3. The data frame containing the predictions was randomly split into two sets: the first, 75% of the initial size, represented the training set, while the remaining 25% represented the validation set.
4. Outliers in the training set were flagged with '1' whenever a value was determined to be an outlier and with '0' otherwise. Following multiple experiments, a good definition of an outlier for this study was: y is an outlier whenever y >= 2.5 * standard deviation of Y + mean of Y, where Y is the target (passenger numbers).
5. Outliers were then predicted in the test set after a random forest classifier was fitted on the training set.
6. The maximum value across all predictions was used to replace each predicted outlier value of the target feature: replaced_outlier = MAX(Prediction1, Prediction2, Prediction3, Prediction4, Prediction5, Prediction6).
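Under stated assumptions (a synthetic prediction frame in place of the real files, and illustrative random forest settings rather than those used in this study), steps 3-6 can be sketched as follows:

# Sketch of steps 3-6: flag outliers, classify them, replace flagged values.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cols = [f"Prediction{i}" for i in range(1, 7)]
rng = np.random.default_rng(0)
preds = pd.DataFrame(rng.normal(400, 60, (200, 6)), columns=cols)  # synthetic
preds["y"] = preds[cols].mean(axis=1) + rng.normal(0, 20, 200)     # stand-in target

# Step 3: random 75% / 25% split into training and validation sets.
train, test = train_test_split(preds, train_size=0.75, random_state=42)
train, test = train.copy(), test.copy()

# Step 4: flag training outliers, y >= 2.5 * std(Y) + mean(Y).
threshold = 2.5 * train["y"].std() + train["y"].mean()
train["outlier"] = (train["y"] >= threshold).astype(int)

# Step 5: fit a random forest classifier and predict outliers on the test set.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(train[cols], train["outlier"])
flagged = clf.predict(test[cols]) == 1

# Step 6: replace flagged target predictions by the row-wise maximum.
test["final"] = test["Prediction6"]                  # arbitrary base prediction
test.loc[flagged, "final"] = test.loc[flagged, cols].max(axis=1)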

The above steps were repeated in the absence of Prediction2, which, as previously mentioned, came from the modified PCA version, and the results were compared with those from the full set of steps. The following conclusions were drawn:

a. The final test set predictions reassemble the histogram of the training set predictions into a better histogram.
b. A better replacement of the outliers is achieved.
c. A decrease in the root mean squared error is recorded.

Observations a, b, and c above were made only in the presence of Prediction2 from the modified PCA. Refer to Appendix 3 for further explanation.

6.6 Discussion

With the continuously arising challenges and trends in the field of airline passenger prediction, the topic has become a key focus of researchers in various studies. Successful and smooth operations, as well as a boost in profitability, can be achieved through accurate prediction of airline passenger figures. Forecasting of future passenger numbers is mainly based on past data. Where passenger data for the past 90 days is given and an estimate of the next 10 days is required from it, the most suitable estimator is the multivariate conditional mean, owing to its ability to minimise the mean square error of estimation. However, the numerical results cannot always be relied on, since using this method to estimate future figures is not a well-conditioned process.

Long-term forecasting has been achieved using a variety of methods. Principal Component Analysis is a commonly applied method that employs linear techniques. The main aim of this chapter was therefore to propose a method with similar forecasting power but improved reliability of the numerical results. As such, a modified version of PCA with a kinetic correlation matrix based on kinetic energy has been introduced. Different sets of airline passenger data were used to assess the modified PCA, and the results were compared with those from traditional PCA.

The modified PCA results show that the Pearson correlation is much lower than the kinetic correlation. This is sensible, since Pearson's r can only detect linear relations in a given dataset.
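The kinetic correlation coefficient itself is defined earlier in this Thesis and is not reproduced here. Purely as an illustration of why a linear coefficient understates nonlinear dependence, the sketch below compares Pearson's r with distance correlation (Székely & Rizzo) on a quadratic relation; the choice of distance correlation is illustrative, not the Thesis's measure.

# Pearson's r misses a purely nonlinear relation that distance correlation detects.
import numpy as np
from scipy.stats import pearsonr

def distance_correlation(x, y):
    """Distance correlation via double-centred pairwise distance matrices."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

x = np.linspace(-1, 1, 500)
y = x ** 2                             # deterministic but nonlinear relation
print(pearsonr(x, y)[0])               # ~0: linear correlation sees nothing
print(distance_correlation(x, y))      # clearly positive: dependence detected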

It also turned out that classification accuracy, class separability, and data dimension reduction were achieved more efficiently and effectively with the modified PCA than with the traditional PCA. Based on the results obtained in this chapter, clustering in hyper-dimensional space using the kinetic correlation as a distance can be achieved by applying the modified PCA, and future work can consequently make it run in real time. Outliers have also been highlighted as a major hindrance to the construction of predictive models and have been blamed for poor results in various studies. To counter this challenge, the study proposed an ensemble method that effectively detected outliers.

Chapter 7: Conclusions and Future Work

7.1 Overview

The foundation framework for civil aviation was set up in 1944 with the signing of the Chicago Convention. Since then, tremendous transformations have been witnessed, subsequently leading to the global economisation of the industry to its current state (ICAO, 2005). For the past 30 years, the aviation industry has shown an annual average growth rate of 5% (MIT, 2015). Aeronautical Information Services (Eurocontrol, 2014) and Air Traffic Control (ATC) are key specialties within Air Traffic Management (ATM). Through such ATM activities, safe aircraft flights are facilitated, and improvements are sought to the existing operations to increase the effectiveness and efficiency of the industry's service provision (Zhong et al., 2015).

Air traffic plays an important role in facilitating a smooth flow of people and capital as per capita income grows in the Middle East (Duval, 2008; Zhong et al., 2016). Various airlines have been able to intensify their operations to provide services to almost all corners of the globe (Duval, 2008; Zhong et al., 2016). Airbus, one of the largest aircraft manufacturers, forecasts an average annual growth of 4.6% in air traffic operations over the period 2014-2024 (Airbus, 2014). The forecast also shows that Asia and the Middle East will eventually become the largest markets, being major global destinations for various business activities (Airbus, 2014). Similar estimates come from the International Air Transport Association, which stated that the number of air passengers per year is expected to rise to 7.3 billion by 2034, more than twice the number of passengers served in 2014 (IATA, 2014).

Despite the anticipated future growth in the industry, a survey by the European Organisation for the Safety of Air Navigation disclosed some rather disturbing facts. The results of the survey showed it was critical that more than 16 factors needed to be reviewed and addressed, as they would otherwise hinder the anticipated growth in one way or another (Eurocontrol, 2009). As such, when conducting future forecasts, the

relevant authorities should also devise appropriate strategic planning and development policies to counter all the hindrances that would constrain the anticipated growth. The survey result highlights the significance of assessing air travel forecasts in developing viable strategic plans for the future of the aviation industry (Wensveen, 2007; Phyoe et al., 2016). Forecasting also provides a foundation for the environmental assessment of carbon emissions emanating from aviation activities (ICAO, 2010). Originally, influencing factors were not considered comprehensively in forecast models, especially with regard to passenger numbers (Gosavii et al., 2002; Riedel & Gabrys, 2003).

In this study, however, different attributes are recognised as classification vectors, and new time series algorithms were designed by calculating representatives for all clusters and comparing the observed time series with those representatives. The study additionally embarks on a new approach based on feature selection and subsequently demonstrates which features are significant in predicting the numbers of passengers travelling to various destinations in different seasons. The deep learning and genetic algorithm-based feature extraction is guided by higher probabilities of survival for fitter combinations of features, where the features are extracted from Feature-Space = {'Year', 'Month', 'Destination Airport', 'Destination Airport Name', 'Destination City Name', 'Destination Region Name', 'Destination Country Name'}. The effects of different parameters for a variety of techniques were also studied by classifying accuracy and comparing the results with those obtained through stepwise addition of features. An in-depth discussion is also conducted to determine how the different features correlate and the significance of such correlations.

7.2 Study Approach

The available data was relatively large, consisting of 51,983 observations. Consequently, a larger subset of the forecasting models available in the literature had to be considered. However, a broader or narrower model can be considered to match the size of the dataset. If more airline passenger data variables become available and can be measured, other modelling approaches can be used, including advanced machine learning algorithms and multiple regression models.

Chapter 2 described how various methods have been employed to forecast airline passenger figures. Principal component analysis (PCA) is a commonly used forecasting method, with its applicability traceable to the forecast of airport passenger traffic in Hong Kong (Tsui et al., 2014). Nonlinear forecasting methods have also been used to forecast air passenger flow in Russia (Blinova, 2007).

Chapter 3 described how various hybrid methods have been employed to conduct data pre-processing prior to the actual forecasting. When forecasting air passengers passing through Muscat International Airport, the forecast was preceded by STL seasonal-decomposition pre-processing. The resulting non-linear forecast was more accurate than a variety of standalone methods. Seasonal decomposition allows the underlying distinctive characteristics of a time series to be observed by removing any systematic seasonal variations. In this study, the application of STL decomposition increased the accuracy of the time series components (Hyndman & Athanasopoulos, 2013).
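A minimal sketch of such STL pre-processing with statsmodels is shown below; the file and column names are hypothetical.

# Sketch of STL seasonal-decomposition pre-processing for a monthly series.
import pandas as pd
from statsmodels.tsa.seasonal import STL

passengers = pd.read_csv("muscat_passengers.csv",     # hypothetical file
                         index_col="month", parse_dates=True)["total"]

result = STL(passengers, period=12, robust=True).fit()
deseasonalised = passengers - result.seasonal         # seasonality removed
print(result.trend.tail())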

In Chapter 4, hybrid methods were proposed for instances where more than one forecasting method is used, thereby compensating for the drawbacks of the individual methods. When the ARIMA model was employed followed by the ANN model, for example, higher levels of accuracy were obtained than with single standalone models (Zhang, 2003). The chapter additionally presented an overview of various factors affecting the forecasting of airline passenger figures. The examples used therein clearly reveal the significance of using interrelated features to increase the accuracy of forecast airline passenger figures. It also emerged clearly that adding new features to the old inputs significantly altered the variance of the new outputs.
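A sketch of this hybrid idea is given below under stated assumptions: a synthetic monthly series, an illustrative ARIMA(1,1,1) order, and a small MLP on lagged residuals, following the general scheme of Zhang (2003) rather than the exact models of that paper.

# Sketch of the ARIMA-then-ANN hybrid: ARIMA captures the linear structure,
# a small neural network models the remaining (nonlinear) residuals.
import numpy as np
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
t = np.arange(120)
y = 300 + 2 * t + 25 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, 120)

arima = ARIMA(y, order=(1, 1, 1)).fit()
resid = arima.resid

# Train the ANN to predict the next residual from the previous 12 residuals.
lags = 12
X = np.array([resid[i:i + lags] for i in range(len(resid) - lags)])
target = resid[lags:]
ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ann.fit(X, target)

# One-step hybrid forecast = linear ARIMA forecast + nonlinear residual estimate.
hybrid = arima.forecast(1)[0] + ann.predict(resid[-lags:].reshape(1, -1))[0]
print(hybrid)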

Optimisation of the deep neural network model significantly improved the results. A general observation was that more accurate results were obtained from the deep neural network than from any individual machine learning model, or even from a combination of different machine learning models (Brzezinski & Stefanowski, 2014; Zhang et al., 2009). As a newer model in the predictive world, the deep neural network's prediction parameterisation is based on important features of the original space. The basic predictions produced by the different time series models are combined by a fixed rule depending on the booking level. The weighting parameters that adapt the forecasts to the season and holidays are also calculated by fixed rules, depending on the number of months to a destination as well as information relating to seasonality and

public holidays. Although performance has been used to optimise all these rules, the ultimate goal was to eradicate all fixed rules and adopt a dynamic approach to the combination. The combination should therefore be able to adjust the basic methods and also adopt new features.

This study pays close attention to experimentation with new features obtained from optimised neural network layers. The significance of feature selection was visualised using the strong points of existing machine learning models. The methodology of deep neural network incorporation is selected after the selection of features, the addition of new features at various steps, and visualisation of how all the features correlate. It is through this methodology that best modelling practices are obtained and maintained. Various scholars have raised concerns about machine learning, criticising it for the poor definition of the variable selection process, and have consequently embarked on the use of deep neural networks in a bid to achieve accurate forecasting results (Essa & Ayad, 2012; Friedman & Popescu, 2008).

The forecast segments should deliver reliable forecasts for each flight date and booking level. Ticket demand at a fare level (e.g. per departure date) can be represented as a time series. However, only a few strategies have been able to deliver forecasts with an acceptable level of accuracy, as a result of the nature and structure of the available information. In our case, rapid global changes are being witnessed and, as a result, only a little information is available and various pertinent estimates are often lacking. Deep research on this theme has shown that straightforward and robust time series models, such as simple averages, different versions of exponential smoothing, or regression models, outperform more refined techniques. Additionally, academic investigations have shown that movements in airline flows are not random. On the contrary, they behave in a highly non-linear, dynamic manner, which makes the prediction of airline passenger figures quite challenging (Huang et al., 2013). As discussed in Chapter 3, many factors impact air traffic, among them prevailing economic conditions. It is therefore worth noting that the economic indicators used in this analysis are calculated from historical data.

Various machine learning approaches have been used to analyse these economic indicators and consequently predict future trends. The majority of the approaches come with various drawbacks such as over-fitting or under-fitting, initialising a large number

of control parameters, and finding the optimum solutions. As a solution to these shortcomings, Chapter 5 detailed the proposed modified principal component analysis (MPCA) method. The main reason for this choice is that MPCA uses the structural risk minimisation principle for function estimation, whereas the traditional methods implement the empirical risk minimisation principle. This is preferable because, in order to make predictions that are as accurate as possible on an actual unknown distribution, it is necessary to minimise the error of the predictions on the data obtained from that distribution.

Attention is drawn to the results obtained from a modified principal component analysis (PCA) using a different correlation matrix sourced from kinetic information energy properties of the features as represented by random vectors. Additionally, experimentation on the new features obtained from an optimised neural network’s hidden layers has been conducted.

With the main aims being to obtain a distribution on some hold-out dataset that resembles the original distribution of the training dataset and to improve the prediction performance metric, predictions made on the different engineered feature spaces were assembled while replacing all outliers among incorrect predictions. The first important steps in developing the MPCA-based forecasting model are feature extraction (transforming the original features into new ones) and feature selection (choosing the most influential set of features). To improve this model, a two-stage methodology named PCA-MPCA has been proposed in this research. Authors of previous studies have proposed filling the missing values of continuous features with the average of the data and the missing values of discrete features with the most common value. However, it was noted that making these changes could result in a high misclassification rate, which would in turn affect the classification accuracy.

This study therefore proposed two new methods to solve the issue of missing values. In the first method, all features with a high frequency of missing values are removed, since such features would not contribute any significant value to the final results. In the second method, the finite difference method is used to determine the missing values, resulting in an efficient selection of a feature set. With the desired feature set in hand, PCA was used in the first stage to extract features, which were then reduced into a low-dimensional feature space. MPCA was then applied to the reduced feature space in the second stage to extract principal components, from which the forecasting model was finally constructed.
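A sketch of the two missing-value strategies in pandas is shown below; the 40% threshold and file name are illustrative assumptions, and linear interpolation stands in for the finite difference computation.

# Sketch of the two missing-value strategies: drop high-missing features,
# then interpolate the remaining gaps instead of using means or modes.
import pandas as pd

df = pd.read_csv("features.csv")                  # hypothetical feature table

# Method 1: remove features with a high frequency of missing values
# (the 40% cut-off is illustrative, not taken from this study).
df = df.loc[:, df.isna().mean() <= 0.40]

# Method 2: fill the remaining gaps by interpolation between neighbours,
# a finite-difference-style estimate rather than a global average.
df = df.interpolate(method="linear", limit_direction="both")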

7.3 Research Outcomes

The results of this study show that the random forest model is the most powerful of all the models in predicting monthly airline passenger flow. The variables used in the model are base fare, number of public holidays, MA(3) of total passengers, MA(1) of total passengers, and the first difference of total passengers. The travel behaviour of air passengers from Oman to various destinations around the globe is influenced by base fare and holidays, both of which are exogenous variables. When the random forest model was fitted with the new features, a 69% increase in explained variance was observed. The deep learning model was optimised by conducting a hyperparameter search, determining the best topology for the hidden layers from thousands of candidates by comparing root mean square errors. The best hyperparameter spaces were determined through a grid search over validation frames (splitting them into training and test validation sets).
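A sketch of such a grid search with H2O's Python API is shown below; the file name, candidate layer sizes, and activations are illustrative assumptions, not the grid actually searched in this study.

# Sketch of a grid search over deep learning topologies with H2O.
import h2o
from h2o.estimators import H2ODeepLearningEstimator
from h2o.grid import H2OGridSearch

h2o.init()
frame = h2o.import_file("passengers.csv")        # hypothetical file
train, valid = frame.split_frame(ratios=[0.8], seed=42)

grid = H2OGridSearch(
    model=H2ODeepLearningEstimator(epochs=50),
    hyper_params={"hidden": [[32], [64, 64], [128, 64, 32]],
                  "activation": ["Rectifier", "Tanh"]},
)
grid.train(x=frame.columns[:-1], y=frame.columns[-1],
           training_frame=train, validation_frame=valid)

# Rank the candidate topologies by validation RMSE and keep the best.
best = grid.get_grid(sort_by="rmse", decreasing=False).models[0]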

Outliers among incorrect predictions were successfully replaced using ensembles of feature-engineered files obtained from the various engineering processes. The best model obtained from hyperparameter optimisation was used to obtain the deep (non-linear) features and to derive new features, giving rise to multiple files based on the original features so as to provide diversity for the ensemble. This was done in the form of data frames with all vectors, with the correlations from layers 1 and 2 added. Numerous deep features were obtained, whose correlations with the target feature were computed to find the highest values. Only the columns with high correlations were kept; as such, the features correlating with our targets were obtained.
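Continuing the H2O sketch above, hidden-layer activations can be exported as deep features and filtered by their correlation with the target; the 0.3 cut-off is an illustrative assumption.

# Sketch: derive deep features from the tuned network's first hidden layer
# and keep only the columns that correlate with the target.
layer1 = best.deepfeatures(train, 0).as_data_frame()   # layer-1 activations
target = train[frame.columns[-1]].as_data_frame().iloc[:, 0]

corr = layer1.apply(lambda col: col.corr(target)).abs()
deep_features = layer1[corr[corr >= 0.3].index]        # high-correlation columns
print(deep_features.shape)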

Later, a modified version of PCA with a kinetic correlation matrix using kinetic energy was presented. The features of this modified PCA were assessed on different sets of air passenger data and compared with traditional PCA. It emerged clearly that the modified version of PCA is more effective in data dimension reduction, class separability, and classification accuracy. Satisfactory results were obtained from the modified PCA, as the predictions histogram of the training set was reassembled into a better histogram in the final predictions set. Moreover, there was a better replacement of the outliers while, at the same time, a decrease in the root mean squared

error was marked. The results of the modified PCA also showed that the kinetic correlation is much higher than the Pearson correlation. This is sensible, owing to the fact that Pearson's r is only able to detect linear relations in any data.

This study also highlighted the possibility of airport authorities forecasting airline traffic figures at frequencies other than monthly, because the models presented in this study can be loaded with data measured at various frequencies, or more advanced models can be applied as the data length increases. Additionally, the models described in this study can enable air authorities to predict future occurrences that could impact operations, such as flight delays. This will help improve the management of operations and avoid losses and customer dissatisfaction.

7.4 Overall Contribution of the Study to the Knowledge of the Field

The major contribution of this Thesis is the advancement of the forecasting approaches and models that have been developed and tested for forecasting Oman air passenger demand. The modified principal component analysis approach and the optimised robust neural network technique are new and unique modelling paradigms, and these models have been successfully piloted in the Oman air passenger demand forecast. Additionally, this is the first reported study to employ MPCA models for forecasting airline passenger demand.

The models in this study have shown a high level of accuracy, reliability and a greater predictive capability in comparison to the traditional principal component analysis models (PCA), which are currently the recommended approaches of the International Civil Aviation Organisation and other key government agencies around the world. Therefore, this study has contributed to the current knowledge by:

1. Proposing an optimised method of the robust neural network technique with existing machine learning models;
2. Enabling extraction of new features from the original feature space;
3. Proving, through different experiments (i.e. with a deep neural network), the significance of the new features extracted relative to the existing features; and
4. Proposing the modified version of PCA, a statistical approach with a kinetic correlation matrix using kinetic energy, which forecasts the number of airline passengers with a high level of accuracy in comparison to existing methods.

7.5 Suggestions for Future Research

With the primary outcome of this research being that models based on MPCA approaches are more effective than traditional linear PCA models based on the Pearson correlation, it is essential for further research to be conducted to validate this work. This being a pioneering study, future researchers will know the challenges faced while undertaking similar studies and can thus mitigate them more easily. To eradicate the challenge of dataset size limitation, future studies can base their case studies on larger airports with more intensive operations. Additionally, based on these results, the modified PCA can be applied to perform clustering in hyper-dimensional space using the kinetic correlation as a distance (increasing performance), so as to make it run in real time in future work. When handling various clustering approaches, such as K-means or hierarchical clustering, future researchers should note that a naive implementation requires a loop at every point to find the nearest point to each row vector, so for n rows the complexity is of the order n^2, which becomes prohibitive for large datasets. In a two-dimensional space, the implementation can be accelerated using a divide-and-conquer method, with complexity of the order n log n. All these challenges can be mitigated with a modified PCA version containing all the latest features. Since this study employed only one dataset to study limited features of the modified PCA model, future studies should employ large sets and subsets of data with a wide variety of features to gain a full understanding of the model.

It is worth noting that forecasting tasks entail numerous dimensions, such as the length of the forecast horizon, the size of the test set, the forecast error measures, and the frequency of the data. It is therefore unlikely that, once selected, a particular forecasting method will be better than all other plausible methods all the time. As such, the performance of any forecast model should be re-analysed regularly, depending on the current task and the availability of new datasets. This study was initiated with the great need for forecasting models for the airports in Oman in mind, and to address the gaps in the literature relating to the non-usage of prediction intervals around point forecasts. It is therefore believed this report will help Oman airports to build their inaugural forecasting models to track the future trends of air travellers and make intelligent and informed business decisions.

References

Önder, A., Candemir, A. & Kumral, N., 2009. An Empirical Analysis of the Determinants of International Tourism Demand: The Case of Izmir. European Planning Studies, pp.1525-33.

Önder, E. & Hasgül, O., 2009. Time Series Analysis with Using Box Jenkins Models and Artificial Neural Network for Forecasting Number of Foreign Visitors. Journal of Institute of Business Administration, pp.62-83.

Önder, E. & Kuzu, S., 2014. Forecasting Air Traffic Volumes using Smoothing Techniques. Journal of Aeronautics and Space Technologies, pp.65-85.

Adeniran, A.O. & Stephens, M.S., 2018. The Dynamics for Evaluating Forecasting Methods for International Air Passenger Demand in Nigeria. Journal of Tourism & Hospitality, pp.1-11.

Adhikari, R. & Agrawal, R.K., 2013. An Introductory Study on Time Series Modeling and Forecasting. DOI: 10.13140/2.1.2771.8084.

Adrangi, B., Chatrath, A. & Raffiee, K., 2001. The demand of US air transport service: a chaos and nonlinearity investigation. Transportation Research, Part E, pp.337-53.

Airbus, 2014. Flying by numbers. [Online] Available at: http://www.airbus.com/company/market/forecast/.

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, pp.716–23.

Alperovich, G. & Machnes, Y., 1994. The role of wealth in the demand for international air travel. Journal of Transport Economics and Policy.

Anderson, J.C. & Gerbing, D.W., 1984. The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, pp.155-73.

Andreoni, A. & Postorino, M.N., 2006. A Multivariate ARIMA Model to Forecast Air Transport Demand. Association for European Transport.

Armstrong, J.S. & Collopy, F., 1992. Error Measures for Generalizing About Forecasting Methods: Empirical Comparisons. International Journal of Forecasting, pp.69-80.

Assimakopoulos, V. & Nikolopoulos, K., 2000. The theta model: a decomposition approach to forecasting. International Journal of Forecasting, pp.521-30.

Automatic Forecasting Systems, Inc., 1999. Autobox 5.0 Reference Guide 1999. [Online] Available at: http://www.autobox.com/cms/.

Ba-Fail, A.O., Abed, S.Y. & Jasimuddin, S.M., 2000. The determinants of domestic air travel demand in the Kingdom of Saudi Arabia. Journal of Air Transportation World Wide.

Bahnsen, A., Stojanovic, A., Aouada, D. & Ottersten, B., 2016. Improving credit card fraud detection with calibrated probabilities. In: Proceedings of the Fourteenth SIAM International Conference on Data Mining, Philadelphia, USA.

Balasubramanian, V., Ho, S.-S. & Vovk, V., 2014. Conformal Prediction for Reliable Machine Learning: Theory, Adaptations and Applications. Illustrated ed. Newnes Publishers.

Bar-Hillel, A., Hertz, T., Shental, N. & Weinshall, D., 2005. Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6, pp.937-65.

Barlas, Y., 1994. Model Validation in SD , in SD: Exploring the Boundaries. Scotland, UK, SD Society: Methodological and Technical Issues, Proceedings of International SD Conference, University of Stirling.

Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., Bouchard, N. & Bengio, Y., 2012. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning, NIPS 2012 Workshop.

Bates, J.M. & Granger, C.W.J., 1969. The Combination of Forecasts. Journal of the Operational Research Society, 20(4), pp.451-68. https://doi.org/10.1057/jors.1969.103.

Bengio, Y. & Glorot, X., 2010. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of AISTATS 2010, 9, pp.249-56.

Bengio, Y., 2013. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.1798-828.

Bishop, C.M., 1991. A fast procedure for retrainingthe multilayer perceptron. International Journalof Neural Systems, 2(3), pp.229–36.

Blinova, T.O., 2007. Analysis of possibility of using neural network to forecast passenger traffic flows in Russia. Aviation, 11(1), pp.28-34.

Boccaletti, S. et al., 2014. The structure and dynamics of multilayer networks. Phys. Rep., pp.1-122.

Bose, I. & Mahapatra, R.K., 2001. Business data mining—A machine learning perspective. Information and Management,39(3), pp.211 – 225.

Box, G.E.P. & Cox, D.R., 1964. An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), pp.211-52.

Box, G.E.P. & Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Revised Edition ed. San Francisco: Holden Day.

Box, G.E.P., Jenkins, M.G., Reinsel, & Ljung, G., 2015. Time Series Analysis: Forecasting and Control. 5th ed. John Wiley & Sons, Inc.

Breiman, L. & Friedman, J.H., 1985. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, pp.580-98.

Brosse, S., Lek, S. & Dauba, F., 1999. Predicting fish distribution in a mesotrophic lake by hydroacoustic survey and artificial neural networks. Limnology and Oceanography, pp.1293-303.

Brown, M. & Lowe, D.G., 2003. Recognising panoramas., 2003. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003).

Burkov, A., 2019. The Hundred-page Machine Learning Book. Illustrated ed. Andriy Burkov.

Calvin, A.H.J., 2016. Oman: the Modernization of the Sultanate. Abingdon, New York: Routledge.

Carson, R.T., Cenesizolu, T. & Parker, R., 2011. Aggregate demand for USA commercial air travel. Int. J. Forst., pp.923-41.

Carter, D.A., Daniel, A.R. & Betty, J.S., 2006. Does Hedging Affect Firm Value? Evidence from the US Airline Industry. Financial Management, pp.53-86.

Castellani, M., Mussoni, M. & Pattitoni, P., 2010. Air Passenger Flows – Evidence from Sicily and Sardinia. Journal of Tourism, Culture and Territorial Development, pp.16-28.

Chen, L., 2006. The Application of Wavelet analysis and neural network in Air Pollution Forecasting. Int. J. Wirel. Mob. Comput., pp.608-14.

Chen, T. & Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. pp.785-94. DOI: 10.1145/2939672.2939785.

Cheng, W. et al., 2011. Automated feature generation from structured knowledge., 2011. Proceedings of the 20th ACM International Conference on Information and Knowledge Management.

Cheng, B. & Titterington, D.M., 1994. Neural Networks: A Review from a Statistical Perspective. Statistical Science, pp.2-30.

Chindanur, N.B. & Eswara Reddy, B., 2014. A moving-average filter based hybrid ARIMA-ANN model for forecasting time series data. Applied Soft Computing, 23, pp.27-38. DOI: 10.1016/j.asoc.2014.05.028.

Cho, V., 2003. A comparison of three different approaches to tourist arrival forecasting. Tourism Management, pp.323-30.

Clark, A.E., Knabe, A. & Rätzel, S., 2009. Unemployment as a social norm in Germany. Schmollers Jahr, pp.251–60.

Cleveland, R.B. & Cleveland, W.S., 1990. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, pp.3-33.

Coates, A., Lee, H. & Ng, A.Y., 2011. An analysis of single-layer networks in unsupervised feature learning., 2011. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics.

Coates, A. & Ng, A.Y., 2012. Learning feature representations with k-means. Neural Networks. Berlin, Heidelberg: Springer.

Cook, A., 2007. European Air Traffic Management: Principles, Practice and Research. Aldershot: Ashgate.

Coshall, J., 2006. Time Series Analyses of UK Outbound Travel by Air. Journal of Travel Research, pp.335-47.

Cryer, J.D. & Chan, K.-S., 2008. Time Series Analysis. 1st ed. Springer.

Cuayáhuitl, H., 2016. SimpleDS: A simple deep reinforcement learning dialogue system. arXiv preprint arXiv:1601.04574.

Cuhadar, M., 2014. Modelling and Forecasting Inbound Tourism Demand to Istanbul: A Comparative Analysis. European Journal of Business and Social Sciences, pp.101-19.

Cullen, K.E. & Kusky, T.M., 2010. Arabian geology. Encyclopedia of Earth and Space Science. New York City: Infobase Publishing.

Brzezinski, D. & Stefanowski, J., 2014. Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems, 25(1), pp.81-94.

Dangeti, P., 2017. Statistics for Machine Learning. Packt Publishing Ltd.

Dargay, J. & Hanly, M., 2001. The determinants of the demand for international air travel to and from UK. ESRC Transport Studies Unit, Centre for Transport Studies, University College London, pp.1-14.

Davis, J.J. & Foo, E., 2016. Automated feature engineering for HTTP tunnel detection. Computers & Security, pp.166-85.

De Boer, P.-T., Kroese, D.P., Mannor, S. & Rubinstein, R.Y., 2005. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1), pp.19-67.

De Menezes, L.M., Bunn, D.W. & Taylor, J.W., 2000. Review of guidelines for the use of combined forecasts. European Journal of Operational Research, pp.190-204.

De Vany, A., 1975. The Effect of Price and Entry Regulation on Airline Output, Capacity and Efficiency. Bell Journal of Economics, pp.327-45.

Deutsch, M., Granger, C.W.J. & Teräsvirta, T., 1994. The combination of forecasts using changing weights. International Journal of Forecasting, pp.47-57.

Domingos, P., 2012. A few useful things to know about machine learning. Communications of the ACM, pp.78-87.

Donaldson, R.G. & Kamstra, M., 1996. Forecast combining with neural networks. Journal of Forecasting, pp.49-61.

Dong, G. & Liu, H., 2018. Feature Engineering for Machine Learning and Data Analytics. illustrated ed. CRC Press.

Duval, D.T., 2008. Regulation, competition and the politics of air access across the pacific. Journal of Air Transport Management, 14, pp.237-42. Available at: https://doi.org/10.1016/j.jairtraman.2008.04.009.

Efron, B., 1979. Bootstrap Methods: Another Look at the Jackknife. Ann. Statist., pp.1-26.

Efron, B., 1983. Estimating the Error Rate of A prediction Rule: Improvements in Cross-validation. J. Amer. Statist. Assoc., pp.316-31.

EIA , 2014. U.S. Energy Information Administration, Benchmarks play an important role in pricing crude oil. [Online] Available at: http://www.eia.gov/todayinenergy/detail.cfm?id=18571 [Accessed 7 May 2019].

Candès, E.J., Li, X., Ma, Y. & Wright, J., 2011. Robust principal component analysis. J. ACM, 58(3), pp.1-37. DOI: 10.1145/1970392.1970395.

Essa, A. & Ayad, H., 2012. Student success system: Risk analytics and data visualization using ensembles of predictive models. in Proc. 2nd International Conference on Learning Analytics and Knowledge, pp.158–61.

Eurocontrol, 2009. Challenges of Air Transport 2030: Survey of Experts' Views. [Online] Available at: https://www.eurocontrol.int/eec/gallery/content/public/document/eec/other_document/2009/003_Challen ges_of_air_transport_2030_experts_view.pdf.

Eurocontrol, 2014. Airspace modelling.. [Online] Available at: https://www.eurocontrol.int/articles/airspace-modelling [Accessed 21 December 2018].

Faisal, 2002. Analisis Time Series Lalulintas Angkutan Udara Internasional di Indonesia [Time Series Analysis of International Air Transport Traffic in Indonesia].

Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P., 1996. From Data Mining to Knowledge Discovery: An Overview. In: Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. & Uthurusamy, R., eds. Advances in Knowledge Discovery and Data Mining. Cambridge, Mass.: MIT Press, pp.1-36.

Feldhoff , J.F. et al., 2012. Comparative System Analysis of Direct Steam Generation and Synthetic Oil Parabolic Trough Power Plants With Integrated Thermal Storage. Solar Energy, pp.520-30.

Findley et al., 2014. Topographic diversity of fungal and bacterial communities in human skin. Nature, pp.367-70.

Freeman, M.F. & Tukey, J., 1950. Transformations Related to the Angular and the Square Root. The Annals of Mathematical Statistics, pp.607-11.

Friedman, J., 2000. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29. DOI: 10.1214/aos/1013203451.

Friedman, J., Tibshirani, R. & Hastie, T., 2001. Additive Logistic Regression: A Statistical View of Boosting (with discussion and a rejoinder by the authors). Annals of Statistics, 28, pp.337-407. DOI: 10.1214/aos/1016120463.

Friedman, J.H. & Popescu, B.E., 2008. Predictive learning via rule ensembles. The Annals of Applied Statistics, pp.916–54.

Székely, G.J. & Rizzo, M.L., 2005. A new test for multivariate normality. Journal of Multivariate Analysis, 93(1), pp.58-80.

Gómez, V. & Maravall, A., 1996. Programs TRAMO and SEATS, Instruction for User. Banco de España.

Gómez, V. & Maravall, A., 1997. Programs TRAMO and SEATS, Instruction for User. Banco de España.

Gómez, V. & Maravall, A., 2000. Automatic Modeling Methods for Univariate Series. Banco de España.

Gardner, E.S.J. & Mckenzie, E., 1985. Forecasting Trends in Time Series. Management Science, pp.1237-46.

Gardner, E.S.J. & Mckenzie, E., 1988. Model Identification in Exponential Smoothing. Journal of Operational Research Society, pp.863-67.

Gardner, E.S.J. & Mckenzie, E., 1989. Seasonal Exponential Smoothing with Damped Trends. Management Science, pp.372-76.

Gesell, L.E., 1993. The administration of public airports. 3rd ed. Chandler, AZ: Coast Aire Publications.

Girolami, M., 2011. A First Course in Machine Learning. Illustrated ed. CRC Press.

Glorot, X. & Bengio, Y., 2011. Deep sparse rectifier neural networks. In: International Conference on Artificial Intelligence and Statistics, 2011.

Goodrich, R.L., 1984. FOREX: A time Series Forecasting Expert System. In: Fourth International Symposium on Forecasting, 1984.

Goodrich, R.L., 1986. FOREX: A time Series Forecasting Expert System., 1986. Sixth International Symposium on Forecasting.

Goodrich, R.L., 2000. The Forecast Pro methodology. International Journal of Forecasting, pp.533-35.

Goodrich, R.L., 2001. Commercial Software in the M3-Competition. International Journal of Forecasting, pp.537-84.

Gosavii, A., Bandla, N. & Das, T.K., 2002. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE transactions, pp.729–42.

Grosche, T., Rothlauf, F. & Heinzl, A., 2007. Gravity models for airline passenger volume estimation. Journal of Air Transport Management, pp.175-83.

Grus, J., 2015. Data Science from Scratch: First Principles with Python. Reprint ed. O'Reilly Media, Inc.

Gujarati, D.N. & Porter, D.C., 2009. Panel Data Regression Models. In: Basic Econometrics. 5th international ed. Boston: McGraw-Hill. ISBN 978-007-127625-2.

Gut, A., 2009. An Intermediate Course in Probability. Springer.

Guyon, I., Gunn, S., Nikravesh, M. & Zadeh, L., 2008. Feature Extraction: Foundations and Applications. Illustrated ed. Springer.

Hahnloser, R., Sarpeshkar, R., Mahowald, M., Douglas, R. & Seung, H.S., 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature.

Hair, J.F., Sarstedt, M., Hopkins, L. & Kuppelwieser, V., 2014. Partial Least Squares Structural Equation Modeling (PLS-SEM): An Emerging Tool for Business Research. European Business Review, 26, pp.106-21. DOI: 10.1108/EBR-10-2013-0128.

Han, D., Al-Jawad, N. & Du, H., 2016. Facial Expression Identification Using 3D Geometric Features from Microsoft Kinect Device. Baltimore, Maryland, United States, 2016. SPIE Commercial + Scientific Sensing and Imaging.

Hannan, E.J. & Rissanen, J., 1982. Recursive Estimation of Mixed Autoregressive-Moving Average Order. Biometrika, pp.81-94.

Hassanien, A.E. & Gaber, T., 2017. Handbook of Research on Machine Learning Innovations and Trends. Illustrated ed. IGI Global.

Hastie, T., Tibshirani, R. & Friedman, J., 2013. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. illustrated ed. Springer Science & Business Media.

Heij, C., Groenen, P.J.F. & van Dijk, D., 2006. Time series forecasting by principal covariate regression. Econometric Institute Report 2006-37.

Hill, R.C., Griffiths, W.E. & Judge, G.C., 2001. Undergraduate Econometrics. 2nd ed. New Jersey: John Wiley & Sons, Inc.

Hochreiter, S., 1991. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut fuer Informatik, Lehrstuhl Prof. Brauer, Tech. Univ. Munich. Advisor: J. Schmidhuber.

Holt, C.C., 1957. Forecasting Seasonals and Trends by Exponentially Weighted Moving Averages. ONR Memorandum.

Hornik, K., Stinchcombe, M. & White, H., 1990. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 2, pp.359-66.

Huang, S.Y., Yeh, Y.R. & Eguchi, S., 2009. Robust kernel principal component analysis. Neural Computation, 21(11), pp.3179-213.

Huang, Z., Wu, X., Garcia, A.J., Fik, T.J. & Tatem, A.J., 2013. An open-access modeled passenger flow matrix for the global air network in 2010. PLoS ONE, 8.

Huang, C. et al., 2015. On using smoothing spline and residual correction to fuse rain gauge observations and remote sensing data. Journal of Hydrology, pp.410-17.

Hui, C., Lee, C. & Rousseau, D.M., 2004. Psychological Contract and Organizational Citizenship Behavior in China: Investigating Generalizability and Instrumentality. The Journal of applied psychology, pp.311-21.

Hwang, C.C. & Shiao, G.C., 2011. Analyzing air cargo flows of international routes: an empirical study of Taiwan Taoyuan International Airport. Journal of Transport Geography, pp.738–44.

Hyndman, R.J. & Athanasopoulos, G., 2013. Forecasting: principles and practice. OTexts.org/fpp/.

Hyndman, R.J. & Khandakar, Y., 2008. Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software, pp.1-22.

Hyndman, R.J. & Koehler, A., 2006. Another Look at Measures of Forecast Accuracy. International Journal of Forecasting, pp.679-88.

IATA, 2010. International Air Transport Association. IATA Monthly Jet Fuel Cost and Consumption Report. [Online] Available at: http://www.airlines.org/Energy/FuelCost/Pages/MonthlyJetFuelCostandConsumptionReport.aspx [Accessed 13 May 2018].

IATA, 2014. Fact Sheet: Fuel, Report published December 2014. Montreal, Canada: The International Air Transport Association.

IATA, 2017. 62nd Annual Report on World Air Transport Statistics. [Online] Available at: https://www.iata.org/publications/store/Pages/world-air-transport-statistics.aspx [Accessed 5 May 2019].

IATA, 2018. Traveler Numbers Reach New Heights. [Online] Available at: https://www.iata.org/pressroom/pr/Pages/2018-09-06-01.aspx [Accessed 7 May 2019].

IBRD-IDA, 2012. The World Bank IBRD-IDA: World Development Indicators. [Online] Available at: http://data.worldbank.org/data-catalog/world-development-indicators.

ICAO, 2005. The economic & social benefits of air transport. Montreal, Quebec, Canada: Air Transport Action Group.

ICAO, 2010. Environmental Report 2010. Aviation and Climate Change. International Civil Aviation Organization. https://www.icao.int/environmental-protection/Documents/Publications/ENV_Report_2010.pdf.

Ildefons, M.D.A. & Sugiyama, M., 2013. Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering. IEICE transactions on information and systems, pp.742-45.

Iosifescu, M., 1986. Octav Onicescu, 1892-1983. International Statistical Review / Revue Internationale de Statistique, 54(1), pp.97-108. Retrieved from http://www.jstor.org/stable/1403261.

Ippolito, R.A., 1981. Estimating airline demand with quality of service variables. J. Transp. Econo. Poli., pp.7-15.

Zhao, J., Yu, P.L.H. & Kwok, J.T., 2012. Bilinear probabilistic principal component analysis. IEEE Trans. on Neural Networks and Learning Systems, 23(3), pp.492-503.

Jackson, S.L., 2012. Research Methods and Statistics: A Critical Thinking Approach. 4th ed. Cengage Learning.

de Jong, G., Gunn, H. & Ben-Akiva, M., 2004. A meta-model for passenger and freight transport in Europe. Transport Policy, pp.329-44.

Kanter, J.M. & Veeramachaneni, K., 2015. Deep feature synthesis: towards automating data science endeavors. In: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015.

Kelleher, J.D., Mac Namee, B. & D'Arcy, A., 2015. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. Illustrated ed. MIT Press.

Kern, R., 2014. Feature Engineering. [Online] Available at: http://kti.tugraz.at/staff/denis/courses/kddm1/featureengineering.pdf.

Kim, K.W., Seo, H.Y. & Kim, Y., 2003. Forecast of domestic air travel demand change by opening the high speed rail. KSCE J. Civil Eng., pp.603-09.

Kincaid, I.S., 2016. Addressing Uncertainty about Future Airport Activity Levels in Airport Decision Making (TRB’s Airport Cooperative Research Program (ACRP) Report 76). Transportation Research Board.

Koldovský, Z., Tichavský, P. & Oja, E., 2006. Efficient Variant of Algorithm FastICA for Independent Component Analysis Attaining the Cramér-Rao Lower Bound. IEEE Transactions on Neural Networks, 17, pp.1265-77. DOI: 10.1109/TNN.2006.875991.

Koo, T. & Duval, D.T., 2018. The Effect of Levels of Air Service Availability on Inbound Tourism Demand from Asia to Australia. In: Airline Economics in Asia. Advances in Airline Economics, 7, pp.45-167. https://doi.org/10.1108/S2212-160920180000007009.

Koopman, S.J. & Ooms, M., 2006. Forecasting daily time series using periodic unobserved components time series models. Computational Statistics & Data Analysis, 51, pp.885-903. DOI: 10.1016/j.csda.2005.09.009.

Kruger, U., Zhang, J. & Xie, L., 2008. Developments and Applications of Nonlinear Principal Component Analysis - a Review. Berlin, Heidelberg: Springer Berlin Heidelberg.

Kuhn, M. & Johnson, K., 2019. Feature Engineering and Selection: A Practical Approach for Predictive Models. CRC Press.

Kulendran, N. & Witt, S., 2003. Forecasting the Demand for International Business Tourism. Journal of Travel Research, pp.265-71.

Kwak, N., 2008. Principal component analysis based on L1-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.1672-80. DOI: 10.1109/TPAMI.2008.114.

Lütkepohl, H., 2013. Introduction to Multiple Time Series Analysis. Illustrated ed. Springer Science & Business Media.

Landset, S. & Khoshgoftaar, T.M., 2015. A survey of open source tools for machine learning with big data in the Hadoop ecosystem. Journal of Big Data.

Lasmita, C.Y., 2010. The Patterns of Air Traffic Movements in Adi Sutjipto Airport.

Lazarevic, A. & Kumar, V., 2005. Feature bagging for outlier detection. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.157-66. DOI: 10.1145/1081870.1081891.

Le, Q.V., 2013. Building high-level features using large scale unsupervised learning., 2013. Speech and Signal Processing (ICASSP), IEEE International Conference on Acoustics.

Lee, S.H., 2009. The networkability of cities in the international air passenger flows 1992–2004. Journal of Transport Geography, pp.166-75.

Lemke, C., Riedel, S. & Gabrys, B., 2009. Dynamic combination of forecasts generated by diversification procedures applied to forecasting of airline cancellations. In: IEEE/IAFE Conference on Computational Intelligence for Financial Engineering, Proceedings (CIFEr), pp.85-91. DOI: 10.1109/CIFER.2009.4937507.

Li, X., Pang, Y. & Yuan, Y., 2010. L1-Norm-Based 2DPCA. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 40, pp.1170-75.

Li, Y., Chen, C.Y. & Wasserman, W., 2016. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. Journal of Computational Biology, pp.322-36.

Lim, C. & McAleer, M., 2001. Forecasting Tourist Arrival to Australia Using Smoothing Methods. Annals of Tourism Research, pp.965-77.

Linfoot, E., 1957. An Informational Measure of Correlation. Information and Control, 1, pp.85-89.

Liu, L.M., 1989. Identification of Seasonal Arima Models Using A filtering Method. Communications in Statistics - Theory and Methods, pp.2279-88.

Liu, X., Lin, X. & Wang, H., 2008. Novel online methods for time series segmentation. TKDE, pp.1616-26.

Lloyd, J.R. et al., 2014. Automatic construction and natural-language description of nonparametric regression models. arXiv:1402.4304..

Lowe, D.G., 1999. Object recognition from local scale-invariant features., 1999. The Proceedings of the Seventh IEEE International Conference on Computer Vision.

Lu, H., Plataniotis, K.N. & Venetsanopoulos, A.N., 2008. MPCA: Multilinear Principal Component Analysis of Tensor Objects. IEEE Transactions on Neural Networks, 19, pp.18-39.

Lu, C.-J., Lee, T.-S. & Chiu, C.-C., 2009. Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, pp.115-25.

Mélard, G. & Pasteels, J., 2000. Automatic ARIMA Modeling Including Interventions, Using Time Series Expert Software. International Journal of Forecasting, pp.497-508.

Makridakis, S. & Hibon, M., 2000. The M3-Competition: results, conclusions and implications. International Journal of Forecasting, pp.451-76.

Makridakis, S.G., Wheelwright, S.C. & Hyndman, R., 1998. Forecasting: Methods and Applications. 3rd ed. John Wiley & Sons, Inc.

Market Research.com, 2017. Travel & Leisure Market Research Reports & Industry Analysis. [Online] Available at: https://www.marketresearch.com/Consumer-Goods-c1596/Travel-Leisure-c90/.

Markovitch, S. & Rosenstein, D., 2002. Feature Generation Using General Constructor Functions. Machine Learning, pp.59-98.

Mason, K.J., 2005. Observations of fundamental changes in the demand for aviation services. Journal of Air Transport Management, pp.19–25.

Matsumoto, H., 2004. International urban systems and air passenger and cargo flows: some calculations. Journal of Transport Geography, pp.197–206.

Matthiessen, C.W., 2004. International air traffic in the Baltic Sea Area: Hub-gateway status and prospects. Copenhagen in focus. Journal of Transport Geography, pp.197-206.

McGregor, A., Hall, M., Lorier, P. & Brunskill, J., 2004. Flow clustering using machine learning techniques. Passive and Active Network.

McKnight, P., 2010. Airline economics. Introduction to Aviation Management, LIT Verlag, Germany: 25-53. In book (ed. Wald A., Fay C., Gleich R.).

Meng, D., Zhao, Q. & Xu, Z., 2011. Improve robustness of sparse PCA by L1-norm maximization. Pattern Recognition, 45, pp.487-97. DOI: 10.1016/j.patcog.2011.07.009.

Michalski, R.S., Carbonell, J.G. & Mitchell, T.M., 2014. Machine Learning: An Artificial Intelligence Approach, Volume 1. Edited ed. Elsevier.

Mills, T., 1993. The Econometric Modelling of Financial Time Series. Cambridge: Cambridge University Press.

Mirkin, B., 2011. Core Concepts in Data Analysis: Summarization, Correlation and Visualization. Illustrated ed. Springer Science & Business Media.

MIT, 2015. Airline industry overview. [Online] Available at: http://web.mit.edu/airlines/analysis/analysis_airline_industry.html [Accessed 21 December 2018].

Morgan, J.P., 2001. Low-fare airline industry. Industry analysis from J.P.Morgan Securities Ltd.

Motoda, H. & Liu, H., 2002. Data reduction: feature selection. Data Mining and Knowledge Discovery - DATAMINE, pp.208-13.

Mubarak, T., 2014. Airport Passenger Demand Forecasting Using Radial Basic Function Neural Networks.

Muhammad, A.M., Haswadi, H., Dewi, N. & Habibollah, H., 2015. A review on feature extraction and feature selection for handwritten character recognition. International Journal of Advanced Computer Science & Applications, pp.204-12.

Nam, K. & Schaefer, 1995. Forecasting international airline passenger traffic using neural networks. Logistics and Transportation Review, pp.239-51.

NCSI, 2017. Foreign Investment Survey in Oman. [Online] Available at: https://www.ncsi.gov.om/Pages/AllIndicators.aspx.

NCSI, 2017. Oman National Center for Statistics and Information 2017. [Online] Available at: https://www.ncsi.gov.om/Elibrary/LibraryContentDoc/bar_Statistical%20Year%20Book%202017_c2111831-e13a-4075-bf7b-c4b5516e1028.pdf [Accessed 7 May 2019].

Nixon, M., 2013. Feature Extraction and Image Processing. Elsevier publishers.

Nunnally, J. & Bernstein, I., 1994. Psychometric Theory. 3rd ed. New York: McGraw-Hill.

OAMC, 2017. Oman Airport Management Company: The History of Oman Aviation. [Online] Available at: https://www.omanairports.co.om/content/oman-airports-history.

Oman Airports, 2018. ABOUT OMAN AIRPORTS. [Online] Available at: https://www.omanairports.co.om/en [Accessed 7 May 2019].

Omanair, 2011. Annual report 2011. [Online] Available at: https://www.omanair.com/sites/default/files/content/about_us/pdf/2011annualreport_eng.pdf [Accessed 7 May 2017].

Omanair, 2017. Annual Report 2017: Going the Extra Mile to Build the Hub. [Online] Available at: https://www.omanair.com/sites/default/files/content/about_us/pdf/2017annualreport_eng.pdf [Accessed 7 May 2019].

ONA, 2016. Oman Air Offers 23,000 More Seats to Salalah. [Online] Available at: http://omannews.gov.om/ona_en/description.jsp?newsId=287281.

Opitz, D. & Maclin, R., 1999. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, Vol. 11, pp.169–98.

Oxford Economics, 2011. An Alternative APD regime. [Online] Available at: https://www.iata.org/about/Documents/iata-annual-review-2014.pdf [Accessed 7 May 2019].

Parzen, E., 1982. ARARMA Models for Time Series Analysis and Forecasting. J. Forecast., pp.67-82.

Perner, P., 2012. Machine Learning and Data Mining in Pattern Recognition: 8th International Conference, MLDM 2012, Berlin, Germany, July 13-20, 2012, Proceedings. Illustrated ed. Petra Perner.

Peterson, C., 2000. The Future of Optimism. The American Psychologist, 55, pp.44-55. DOI: 10.1037/0003-066X.55.1.44.

Phyoe, S.M., Lee, Y. & Zhong, Z.W., 2016. Determining future demand: studies for air traffic forecasting. In: 11th International Conference on Engineering & Technology, Computer, Basic & Applied Sciences (ECBA-2016). Retrieved from http://kkgpublications.com/ijtes-volume2-issue3-article1-4/.

Profillidis, V.A., 2000. Econometric and Fuzzy Models for the Forecast of Demand in the Airport of Rhodes. Journal of Air Transport Management, pp.95-100.

Pupavac, D., 2009. Načela ekonomike prometa, [Principles of Transport Economics]. Rijeka, Veleučilište u Rijeci, [Polytechnics of Rijeka].

Quandt, R.E., 2006. Measurement and inference in wine tasting. Journal of Wine Economics, 1, pp.7–30.

Rahman, R., 2019. Applications of Deep Learning Models for Traffic Prediction Problems. University of Central Florida.

Rajaraman, A. & Ullman, J.D., 2011. Mining of massive datasets. 2nd ed. Cambridge University Press.

Riedel, S. & Gabrys, B., 2007. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 49, p.265. https://doi.org/10.1007/s11265-007-0076-3.

Riedel, S. & Gabrys, B., 2003. Adaptive mechanisms in an airline ticket demand forecasting system. In: Proc. EUNITE'2003 Conference: European Symposium on Intelligent Technologies, Hybrid Systems and their Implementation on Smart Adaptive Systems. Oulu, Finland, 2003.

Riga, M., Tzima, F.A., Karatzas, K. & Mitkas, P.A., 2009. Development and Evaluation of Data Mining Models for Air Quality Prediction in Athens, Greece. Environmental Science and Engineering, pp.331-44.

Rizescu, D. & Avram, V., 2014. Using Onicescu’s informational energy to approximate social entropy. Procedia-Social and Behavioral Sciences.

Rzempoluck, E.J., 2012. Neural Network Data Analysis Using SimulnetTM. Illustrated ed. Springer Science & Business Media.

Phyoe, S.M. et al., 2016. The impact of population growth on the future air traffic demand in Singapore. In: International Conference on Computer Networks and Communication Technology, ACSR-Advances in Computer Science Research, 2016.

Sabatelli, L., 2016. Relationship between the Uncompensated Price Elasticity and the Income Elasticity of Demand under Conditions of Additive Preferences. PLoS ONE, p.e0151390.

Sapankevych, N. & Sankar, R., 2009. Time Series Prediction Using Support Vector Machines: A Survey. IEEE Computational Intelligence Magazine, 4, pp.24-38. doi:10.1109/MCI.2009.932254.

Schölkopf, B., Smola, A. & Müller, K.-R., 1998. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10, pp.1299-1319. doi:10.1162/089976698300017467.

Schwarz, G., 1978. Estimating the Dimension of a Model. Annals of Statistics, pp.461-64.

Scott, S. & Matwin, S., 1999. Feature engineering for text classification. In: Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999).

Sengupta, P., Tandale, M., Cheng, V. & Menon, P., 2011. Air Traffic Estimation and Decision Support for Stochastic Flow Management. In: American Institute of Aeronautics and Astronautics Guidance, Navigation, and Control Conference. Portland, Oregon, 2011.

Seraj, Y.A., Abdullah, O.B. & Sajjad, M.J., 2001. An econometric analysis of international air travel demand in Saudi Arabia. Journal of Air Transport Management, pp.143–48.

Shi, S., Da, X.L. & Liu, B., 1999. Improving the accuracy of nonlinear combined forecasting using neural networks. Expert Systems with Applications, pp.49-54.

Shi, S. & Liu, B., 1993. Nonlinear combination of forecasts with neural networks. In: Proc. of 1993 International Joint Conference on Neural Networks.

Spence, A.M., 1975. Monopoly, Quality and Regulation. Bell Journal of Economics, pp.417-29.

Stock, J.H. & Watson, M.W., 2004. Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, pp.405-30.

Suryan, V., Sinha, A., Malo, P. & Deb, K., 2016. Handling inverse optimal control problems using evolutionary bilevel optimization. Vancouver, BC: IEEE Congress on Evolutionary Computation (CEC).

Sutskever, I., Martens, J., Dahl, G. & Hinton, G., 2013. On the importance of initialization and momentum in deep learning., 2013. Proceedings of the 30th International Conference on Machine Learning (ICML-13).

Taneja, N.K., 1987. Airline Traffic Forecasting. D.C. Heath and Company.

Taylor, J.W., 2003. Exponential Smoothing with A Damped Multiplicative Trend. International Journal of Forecasting, pp.715-25.

Terui, N. & van Dijk, H.K., 2002. Combined forecasts from linear and nonlinear time series models. International Journal of Forecasting, pp.421-38.

Timmerman, M.E., 2003. Principal component analysis. Journal of the American Statistical Association, pp.1082-108.

Tinsley, H.E.A. & Tinsley, D.J., 1987. Uses of factor analysis in counseling psychology research. Journal of Counseling Psychology, 34(4), pp.414-424. http://dx.doi.org/10.1037/0022-0167.34.4.414.

Tipping, M.E. & Bishop, C.M., 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B, 61, pp.611-22.

Tsui, W.H.K., Balli, O.H. & Gower, H., 2011. Forecasting airport passenger traffic: the case of Hong Kong International Airport. In: Aviation Education and Research Proceedings, 2011.

Tsui, W.H.K., Gilbey, A. & Balli, H., 2014. Estimating airport efficiency of New Zealand airports. Journal of Air Transport Management, pp.78-86.

Turk, M. & Pentland, A., 1991. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, pp.71-86.

UNWTO, 2016. International tourist arrivals up 4% in the first half of 2016. [Online] Available at: http://media.unwto.org/press-release/2016-09-26/international-tourist-arrivals-4-first-half-2016.

Van der Maaten, L. & Hinton, G., 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9, pp.2579-605.

Vasigh, B. & Fleming, K., 2016. Introduction to Air Transport Economics: From Theory to Applications. Revised ed. Routledge Publishers.

Wadud, Z., 2011. Modelling and Forecasting Passenger Demand for a New Domestic Airport with Limited Data. Transportation Research Record: Journal of the Transportation Research Board, pp.59-68.

Wadud, Z., 2013. Simultaneous Modelling of Passenger and Cargo Demand at an Airport. Transportation Research Record: Journal of the Transportation Research Board, pp.63-74.

Wang, B. et al., 2014. Future Change of Asian-Australian Monsoon Under RCP 4.5 Anthropogenic Warming Scenario. Climate Dynamics, pp.83-100.

Weatherford, L.R., Gentry, T.W. & Wilamowski, B., 2003. Neural network forecasting for airlines: A comparative analysis. Journal of Revenue and Pricing Management, pp.319-31.

Wensveen, J.G., 2007. Air Transportation: A Management Perspective. 6th ed. Farnham, Surrey, United Kingdom: Ashgate.

Wensveen, J.G., 2011. Air Transportation: A Management Perspective. 7th ed. Farnham: Ashgate Publishing.

World Bank, 2016. Monthly Oman World Bank Data crude oil prices 1989–2016. [Online] Available at: https://data.worldbank.org/country/oman [Accessed 7 May 2019].

WTTC, 2014. Economy of Oman. [Online] Available at: https://www.wttc.org/datagateway.

Yang, J., Zhang, D., Frangi, A.F. & Yang, J.-Y., 2004. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. and Mach. Intell., 26(1), pp.131–137.

Yang, Q. & Wu, X., 2006. 10 challenging problems in data mining research. Int J Inform Technol Decision Making, vol. 5, no. 4, pp.597–604.

Li, Y., Chen, C.-Y. & Wasserman, W.W., 2016. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. Centre for Molecular Medicine and Therapeutics, University of British Columbia.

Yu, H.-F. et al., 2011. Feature engineering and classifier ensemble for KDD Cup 2010. In: JMLR: Workshop and Conference Proceedings, 2011.

Yu, L., Wang, S. & Lai, K., 2009. A Neural-Network-based Nonlinear Metamodeling Approach to Financial Time Series Forecasting. Applied Soft Computing, pp.563-74.

Zhong, Z.W. et al., 2015. Studies of air transport management issues for the airport and region. In: ISPE CE, 2, pp.533-40.

Zhong, Z.W. et al., 2016. Studies of air traffic forecasts, airspace load and the effect of ADS-B via satellites on flight times. In: Advanced Free-Space Optical Communication Techniques and Applications II, Proceedings of SPIE, 9991(2). Article Number: UNSP 99910B.

Zhang, P.G., 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, pp.159-75.

Zhang, Z. et al., 2009. Latent Variable Models for Dimensionality Reduction. Journal of Machine Learning Research, Proceedings Track, 5, pp.655-662.

Zhang, G., Patuwo, B.E. & Hu, M.Y., 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, pp.35-62.

Zhang et al., 2016. Nonlinear coupling of flexural mode and extensional bulk mode in micromechanical resonators. Appl. Phys. Lett., p.224102.

Zheng, C. et al., 2011. Identification and quantification of metabolites in 1H NMR spectra by Bayesian model selection. Bioinformatics, 27, p.1637.

Zheng, A. & Casari, A., 2018. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Illustrated ed. O'Reilly.

Zou, D.-l. et al., 2014. Study of 0~3 Hour Short-Term Forecasting Algorithm for Rainfall. Journal of Tropical Meteorology, pp.249-60.

Appendices

9.1 Appendix 1

This paper was presented at the Future of Information and Communication Conference (FICC) in Singapore, 5th-6th April 2018, and appeared in Volume 886 of the Advances in Intelligent Systems and Computing series, published by Springer.


9.2 Appendix 2

This paper was presented at the Computing Conference in London, 16th-17th July 2019, and appeared in Volume 997 of the Advances in Intelligent Systems and Computing series, published by Springer.


9.3 Appendix 3

9.3.1.1 Detecting the Outlier for the Monthly Airline Passenger Numbers

Outlined below is a line-by-line explanation of the code. First, Pandas and NumPy, the two most significant libraries for data manipulation and numerical analysis respectively, were imported. The training data set was then read and its first five rows printed, as shown below.
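The import and read step itself is not reproduced in the extract; a minimal sketch of what it would look like, assuming the training data is stored in a CSV file named train.csv (the file name is an assumption for illustration), is:

import pandas as pd  # data manipulation
import numpy as np   # numerical analysis

# Read the training data; 'train.csv' is an assumed file name
train = pd.read_csv('train.csv')

# Print the first five rows (see Table 9-1)
print(train.head())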

Let month and passenger numbers be represented by the variables x and y respectively. A function is then defined that, using the NumPy covariance function, computes the Mahalanobis distance of each (x, y) pair from the centre of the joint distribution.
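For reference, the quantity this function computes for each observation is the Mahalanobis distance. Writing each pair as a vector $z_i = (x_i, y_i)^\top$, with $\mu$ the sample mean vector and $\Sigma$ the sample covariance matrix of the pairs, the distance is

$$MD_i = \sqrt{(z_i - \mu)^\top \, \Sigma^{-1} \, (z_i - \mu)}$$

which reduces to the ordinary standardised Euclidean distance when x and y are uncorrelated.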

x = train[['Month']]
y = train[['Passenger.numbers.in.thousands..000.']]

def MahalanobisDist(x, y):
    # Covariance matrix of the (x, y) pairs and its inverse
    covariance_xy = np.cov(x, y, rowvar=0)
    inv_covariance_xy = np.linalg.inv(covariance_xy)
    # Deviation of each observation from the mean of its variable
    xy_mean = np.mean(x), np.mean(y)
    x_diff = np.array([x_i - xy_mean[0] for x_i in x])
    y_diff = np.array([y_i - xy_mean[1] for y_i in y])
    diff_xy = np.transpose([x_diff, y_diff])
    # Mahalanobis distance of each pair
    md = []
    for i in range(len(diff_xy)):
        md.append(np.sqrt(np.dot(np.dot(np.transpose(diff_xy[i]), inv_covariance_xy), diff_xy[i])))
    return md

The single-column data frames x and y are then converted to one-dimensional arrays so that the function can iterate over their values, as shown below:

x = np.array(x)
x = x.reshape(x.shape[0],)
y = np.array(y)
y = y.reshape(y.shape[0],)
md = MahalanobisDist(x, y)

The previous function is then used inside a second function that identifies the outliers among the monthly passenger figures, as shown below:

def FindOutliers(x, y, p):
    MD = MahalanobisDist(x, y)
    nx, ny, outliers = [], [], []
    # Chi-squared cut-off (2 degrees of freedom) for cumulative probability p
    threshold = -2 * np.log(1 - p)
    for i in range(len(MD)):
        if MD[i] * MD[i] < threshold:
            # Keep the pair: its squared distance is within the cut-off
            nx.append(x[i])
            ny.append(y[i])
        else:
            outliers.append(i)  # position of removed pair
    return (np.array(nx), np.array(ny), np.array(outliers))
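The call that produces the Outliers tuple used in the remaining steps is not shown in the extract; a plausible invocation, assuming a 95% probability cut-off (the value of p is an assumption, not taken from the thesis), would be:

# p = 0.95 is an assumed cut-off, not taken from the thesis
Outliers = FindOutliers(x, y, 0.95)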

The actual passenger-number data corresponding to Outliers 1 is then printed:

print(train[['Passenger.numbers.in.thousands..000.']] == Outliers[1][1])

The actual data on the number of months corresponding to Outliers 2 is then printed, and the outlier calculations are produced as follows (the final line computes a simple mean-plus-two-standard-deviations threshold on the passenger numbers as a cross-check):

print(train[['Month']] == Outliers[0][1])
train[(train['Passenger.numbers.in.thousands..000.'] == Outliers[1][1]) & (train['Month'] == Outliers[0][1])]
np.mean(y) + 2 * np.std(y)

Table 9-1 First Five Rows of Target Feature (Passenger Number)

Table 9-2 Actual Data of Number of Passengers with Outliers 1

Table 9-3 Actual Data of Number of Passengers with Outliers 2
