Master of Science in computer science February 2018
Demand Forecasting Of Outbound Logistics Using Machine learning
Ashik Talupula
Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in computer science . The thesis is equivalent to 20 weeks of full time studies.
The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.
Contact Information: Author(s): Ashik Talupula E-mail:[email protected]
University advisor: Dr. Hüseyin Kusetogullari Department of Computer Science
Faculty of Computing, Blekinge Institute of Technology, SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract
Background. Long-term volume forecasting is important for logistics service providers when planning their capacity and taking strategic decisions. At present, demand is estimated using traditional averaging techniques or from planners' own experience, which often introduces error. This study is focused on filling these gaps by using machine learning approaches. The sample data set is provided by the organization, a leading manufacturer of trucks, buses and construction equipment with customers in more than 190 markets and production facilities in 18 countries. Objectives. This study investigates suitable machine learning algorithms for forecasting demand of outbound distributed products and then evaluates the performance of the selected algorithms through an experiment, to articulate the possibility of using long-term forecasting in transportation. Methods. First, a literature review was conducted to find suitable machine learning algorithms; then, based on its results, an experiment was performed to evaluate the performance of the selected algorithms. Results. The selected CNN, ANN and LSTM models all perform quite well, but, depending on the type and amount of historical data the models were given to learn from, they show slight differences in forecasting performance. Comparisons are made using measures selected through the literature review. Conclusions. This study examines the efficacy of using Convolutional Neural Networks (CNN) for forecasting demand of outbound distributed products at country level. The methodology applies convolutions to historical loads; the output of the convolutional operation is supplied to fully connected layers together with other relevant data. The presented methodology was applied to an organizational data set of outbound distributed products per month.
Results obtained from the CNN were compared to results obtained by Long Short-Term Memory sequence-to-sequence (LSTM S2S) networks and Artificial Neural Networks (ANN) on the same dataset. Experimental results showed that the CNN outperformed the LSTM while producing results comparable to the ANN. Further testing is needed to compare the performance of different deep learning architectures in outbound forecasting.
Keywords: Demand forecasting, time series, outbound logistics, machine learning.
i Acknowledgments
First of all, I would like to thank my university supervisor, Dr. Hüseyin Kusetogullari. He was always available when I ran into a trouble spot or had a question about my research or writing. He always permitted this paper to be my own work, but steered me in the right direction whenever he thought I required it. I would also like to thank my supervisor at Volvo, Teja Yerneni, for supporting me not only with the thesis itself but also in motivating me and collaborating with the team at Volvo. Finally, I must express my deep appreciation to my parents and my friends for offering me unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. Without them, this achievement would not have been feasible. Thank you.
ii Contents
Abstract i
Acknowledgments ii
1 Introduction 1 1.1 Problem Statement ...... 2 1.1.1 Aim ...... 3 1.1.2 Objectives ...... 3 1.1.3 Research Questions ...... 3
2 Related Work 4 2.1 Time series forecasting ...... 6
3 Preliminaries 7 3.1 Forecasting ...... 7 3.2 Time series ...... 7 3.2.1 Univariate ...... 7 3.2.2 Multivariate ...... 7 3.2.3 Components of time series ...... 8 3.3 Time series forecasting as a supervised problem ...... 9 3.3.1 Supervised learning ...... 9 3.3.2 Sliding window approach for time series data ...... 9 3.4 Artificial Neural Networks ...... 9 3.4.1 Activation Functions ...... 10 3.4.2 Recurrent Neural Networks ...... 12 3.4.3 LSTM ...... 13 3.4.4 CNN ...... 14 3.5 ARIMA ...... 14 3.6 SVR ...... 15 3.7 Multiple parallel input and Multi step output...... 16
4 Method 18 4.1 Data gathering ...... 19 4.2 Data pre-processing ...... 19 4.3 Data set ...... 20 4.4 Experiment setup ...... 20 4.5 performance metrics ...... 21 4.6 Walk forward Validation ...... 21
iii 5 Results 23 5.1 Learning curve ...... 23 5.2 FORECASTS ...... 24 5.3 Forecasting Performance ...... 24 5.4 Validity Threats ...... 25
6 Analysis and Discussion 26 6.1 Implementation ...... 26 6.2 Discussion ...... 27
7 Conclusions and Future Work 28
References 29
A Supplemental Information 32
iv List of Figures
1.1 Outbound process ...... 2
2.1 Time series ...... 6
3.1 Univariate time series ...... 7 3.2 multivariate time series ...... 8 3.3 time series decomposition ...... 8 3.4 Time series data ...... 9 3.5 supervised problem ...... 9 3.6 single layer perceptron ...... 10 3.7 Multi layer perceptron ...... 10 3.8 Sigmoid ...... 11 3.9 Tan-h ...... 11 3.10 Relu ...... 12 3.11 Recurrent and feed forward networks structure ...... 12 3.12 LSTM Architecture ...... 13 3.13 support vector regressor ...... 16 3.14 multivariate time series ...... 16 3.15 Transformation of input and output from the above series ...... 17
4.1 Data set ...... 20 4.2 Walk forward validation ...... 22
5.1 LSTM training graph ...... 23 5.2 CNN training graph ...... 23 5.3 Actual vs forecast using CNN ...... 24 5.4 Actual vs forecast using CNN ...... 24 5.5 Models performances ...... 25
A.1 Distribution of residuals ...... 32 A.2 Actual vs forecast using LSTM ...... 32 A.3 Decomposition of Time series ...... 33 A.4 forecast using LSTM ...... 33 A.5 forecast using LSTM ...... 34
v Chapter 1 Introduction
A supply chain consists of all activities involved in moving goods from raw materials to the consumer [35]. Sales and Order Planning (SOP) is responsible, first, for planning and agreeing volumes from all business units for the upcoming months, and then for communicating those volumes to operation plants and production logistics so that supply chain activities can be planned [33]. Logistics is the process of distributing goods from the point of origin to the point of consumption to meet consumer requirements. Inbound logistics refers to the transport, storage and delivery of goods coming into a business, and outbound logistics refers to the same for goods going out of a business [34]. The process starts when a customer places an order with the sales department; the order is then processed by the sales department and assigned to a production plant. The sales office provides the customer with a Customer Delivery Date (CDD). A CDD is provided only if goods are transported directly to the customer location; if the goods pass through a terminal, the date is instead specified as an Available at Terminal Date (ATD) together with an Indicated Customer Delivery Date (I-CDD), and the shipment is noted as a transfer. An ATD indicates when the order ought to be at the terminal, ready to be loaded onto the following transport unit, while an I-CDD indicates the expected delivery date to the customer. The business volume of logistics grows steadily with the advancement of the economy and improved offline and online technology; thus, efficient logistics demand prediction is needed to manage these processes in an organized manner [18]. Forecasting is the process of predicting the future based on past or current data. Forecasting plays an important role in sales and operations planning for taking strategic and planning decisions. Forecasted values are only projections; we do not obtain exact values, but rather try to reduce the error with the help of forecasting tools and more sophisticated models.
One can forecast sales using different forecasting techniques such as ARIMA [22], SVM [20], ANN [23][37], LSTM [13] and CNN [15][30], given records of previous sales and accurate demand details.
Figure 1.1: Outbound process
Forecasting on outbound distributed products lowers the cost of warehousing and transportation by optimizing the logistic process through consolidation, capacity planning and collaboration using a third-party logistics provider. The purpose of the Thesis is to forecast outbound distributed products of a manufacturing company that uses third party logistics (3PL) services for distribution of their products through Air, water and road transportation. Third party services include handling logistics such as warehousing, packaging, fulfillment and distribution.
1.1 Problem Statement Most logistics service providers face several challenges in managing the warehousing and distribution of products, such as capacity planning and freight volume. There is therefore a need to study the outbound processes of a manufacturing company in order to develop a proper plan for overcoming these challenges. Transportation is the major part of logistics, where securing capacity with carriers is the most pressing issue for logistics services, especially in international logistics. The risk of capacity shortage with carrier providers can be minimized by requesting space early, which can be achieved through reliable long-term volume forecasting (LTVF). This also helps in planning and handling transportation demand that exceeds what the carrier providers can normally handle: carrier providers could increase their service capacity upon request to meet demand if given early notice.
1.1.1 Aim The main aim of the thesis is to investigate a suitable machine learning model that can translate SOP (Sales and Operations Planning) information into forecasting information for the outbound distributed product processes.
1.1.2 Objectives • Identifying an appropriate machine learning model for forecasting outbound logistics.
• Evaluating the efficiency of the selected machine learning algorithms.
1.1.3 Research Questions • RQ 1: What are the available state-of-the-art methods used in forecasting? The motivation for this research question is to find a suitable forecasting method that can identify underlying patterns over a period of time.
• RQ 2: Which machine learning model performs better at forecasting on time series data? Motivation: The motivation for this research question is to evaluate different time series forecasting models on outbound logistics data and select the appropriate one based on performance. Chapter 2 Related Work
Related work for this research covers demand forecasting in supply chains and logistics in general. Investigations into forecasting demand and its connection to supply chain networks began long ago. In 1960, Winters presented exponential smoothing techniques for forecasting sales for the purpose of optimizing production planning. In the most recent decade, several papers have been proposed to deal with this issue. Gilbert (2005) [14] presented a multistage supply chain model built on ARIMA, and also discussed the causes of the bullwhip effect as well as demand variations in inventory and orders. Liang (2006) [31] proposed a solution for estimating the ordering capacity for period t+1 of a multi-echelon supply chain in which every entity was permitted to use a different inventory structure. Aburto and Weber (2007) [1] presented a hybrid intelligent system combining neural networks and ARIMA (autoregressive integrated moving average) models for forecasting demand. In 2008, Carbonneau [5] described the use of advanced non-linear machine learning algorithms in the context of the extended supply chain. Garcia et al. (2012) [11] used support vector machines to address the issues faced in distribution and the discovery of new models. Kandananond (2012) [24] found that, in forecasting consumer product demand, support vector machines outperformed artificial neural networks (ANNs); the same author [24] also noted that SVM surpasses the ARIMA method of forecasting. Manas Gaur, Shruthi Goel, and Eshaan Jain (2015) [12] used K-Nearest Neighbor and Bayesian networks for forecasting demand in a supply chain. The aim of their study was to find the more suitable algorithm by comparing the two; adaptive boosting was also used in conjunction with the different algorithms to improve model performance.
Results of their experiment show that Bayesian networks, with or without adaptive boosting, surpass K-Nearest Neighbor (KNN); KNN with two nearest neighbors also gave promising results. Wen-Jing Yuan and Ze-Yi Jin [38] proposed a combination of a grey model and a stacked autoencoder (SAE) for forecasting logistics demand, taking advantage of the unique characteristics of the logistics demand forecasting problem. The original data is processed through multiple grey models and the output of the grey models is given as input to the SAE model; to obtain the final predicted value, an extreme learning machine (ELM) is applied for prediction at the top, with the SAE performing feature extraction at the bottom. The proposed model shows more accurate results than an ordinary grey network model when applied in empirical research on the logistics demand of a Brazilian company. Yan Zhao and Shengchang Wang [40] proposed two forecasting models, the support vector machine (SVM) and the least squares support vector machine (LS-SVM); to identify the better forecasting model, they evaluated the efficiency of both by considering the complexity and nonlinearity of highway freight volume. Based on their calculations, the forecasting model based on LS-SVM is the more efficient for forecasting freight volume. Pei-you Chen and Lu Liu [7] proposed the PSO-SVR algorithm, a combination of support vector regression (SVR) and particle swarm optimization (PSO), to forecast the demand for coal transportation. They selected railway freight turnover volume, amount of coal consumption and some other factors as influence variables, chose railway freight volumes from 1995-2011 as learning samples, and used the radial basis function (RBF) as the kernel of the prediction model. Results show that the selected algorithm is superior to back-propagation (BP) neural networks in forecast accuracy and error. Real Carbonneau [5] studied the application of advanced machine learning algorithms such as neural networks, recurrent neural networks and SVR to forecast a falsified demand data set from the supply chain, and compared them with traditional methods such as linear regression and moving averages; the results favored RNN and SVR. Two different data sets were used in the experiment: one collected from a simulated supply chain and the other from actual Canadian foundry orders. Jingyi Du [9] stated that LSTM networks have gained great attention in deep learning, especially for time series. An LSTM network was used to predict the Apple stock price, using multi-feature and single-feature input variables to verify the prediction on stock time series; results were positive when multi-feature input was used.
The most widely used machine learning approach is the Artificial Neural Network. However, Hu and Zhang (2008) [17] explained drawbacks of using ANNs, such as difficulty in optimizing the cost function and uncontrolled convergence, and considered LSTM, support vector regressors and random forest regressors to provide accurate demand forecasts while overcoming the drawbacks of traditional methods and ANNs. Kasun Amarasinghe and Daniel L. Marino [2] discussed forecasting energy load demand using deep neural networks. Their paper investigates the effectiveness of using Convolutional Neural Networks (CNN) for performing energy load forecasting at the individual building level. The presented methodology uses convolutions on historical loads; the output of the convolutional operation is fed to fully connected layers together with other pertinent information. The methodology was implemented on a benchmark data set of electricity consumption for a single residential customer. Results obtained from the CNN were compared against results obtained by Long Short-Term Memory sequence-to-sequence (LSTM S2S) networks, Factored Conditional Restricted Boltzmann Machines (FCRBM), "shallow" Artificial Neural Networks (ANN) and Support Vector Machines (SVM) on the same dataset. Experimental results showed that the CNN outperformed SVR while producing results comparable to the ANN and the deep learning methodologies.
2.1 Time series forecasting A time series is a set of data points recorded at equally spaced points in time [6]. In time series analysis we observe one variable and predict its future values with respect to time. Time series data are often encountered when predicting stock prices, retail store sales, electricity demand, airline passenger numbers, or the weather. Consider an observed time series t_1, t_2, t_3, ..., t_n; we want to forecast a future value t_(n+h) of the series, where h is the forecast horizon. The forecast of t_(n+h) made h steps ahead at time t_n is denoted t̂_n(h); the hat distinguishes forecasted values from observed ones. A forecasting method is a technique for computing forecasts from present and past observations []. A forecasting model is selected based on the given series of data. Forecasting methods and models are not the same and should not be used as equivalent terms. Judgmental forecasts, univariate methods and multivariate methods are the three types of forecasting methods [6].
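As a concrete illustration of the notation above, the simplest baseline forecast repeats the last observation t_n for every horizon h, i.e. t̂_n(h) = t_n. A minimal sketch in Python (the numbers are made up for illustration, not the thesis data):

```python
import numpy as np

# Observed series t_1 ... t_n (illustrative values only)
series = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])

def naive_forecast(y, h):
    """Forecast h steps ahead from time n using only the last observation.

    This is the 'naive' baseline: t_hat_n(h) = t_n for all h >= 1.
    """
    return np.full(h, y[-1])

forecasts = naive_forecast(series, h=3)
print(forecasts)  # [135. 135. 135.]
```

More sophisticated methods and models, such as those compared later in this thesis, are judged by how much they improve on baselines like this one.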
Figure 2.1: Time series
The data set provided for this study follows a non-linear pattern. According to the literature study, many papers state that ANN, LSTM and CNN are best suited to capturing the patterns in non-linear data, so the deep learning techniques LSTM, CNN and ANN (Artificial Neural Networks) are adopted for this study. Chapter 3 Preliminaries
3.1 Forecasting Forecasting is determining what is going to happen in the future by analyzing past and current patterns in the data; it helps business people plan for the uncertainty of what might or might not occur. Forecasting approaches are classified into two types: 1. Quantitative: forecasting done by taking historical data, time series or correlation information and projecting it forward into the future. 2. Qualitative: forecasts based on opinions taken from experts, decision makers and customers.
3.2 Time series
As discussed in Section 2.1, a time series is a sequence of observations, usually ordered in time. Time series are further classified into univariate and multivariate, depending on the number of dependent variables recorded with respect to time.
3.2.1 Univariate Univariate time series data has only a single variable recorded sequentially over equal intervals of time. The table below shows a univariate time series of the monthly sales of a product.
Figure 3.1: Univariate time series
3.2.2 Multivariate Multivariate time series data has more than one time-dependent variable (multiple time series dealing with dependent data simultaneously). This type of time series data is especially pertinent, and challenging, in the context of machine learning.
Figure 3.2: multivariate time series
3.2.3 Components of time series The several reasons which affect the values of an observation in a time series are said to be components of time series. These are decomposed into four categories:
• Trend: in time series analysis, a trend is a movement to relatively higher or lower values over a long period of time. When the data exhibits a general upward direction (higher highs and higher lows) it is called an upward trend; when it exhibits a general downward direction (lower highs and lower lows) it is called a downward trend; when there is no trend it is called a horizontal trend.
• Seasonality: time series data that exhibits a repeating pattern at a fixed interval of time within a one-year period is said to show seasonality. It is a common pattern seen across many time series.
• Cyclic pattern: It exists, when the data exhibits rises and falls that are not of a fixed period.
• Irregular fluctuations: the series of residuals left over after removing trend and cyclic variations from a data set, which may or may not be random. These fluctuations are unpredictable and erratic in nature.
Figure 3.3: time series decomposition
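A decomposition like the one in Figure 3.3 can be sketched in code. The following is a simplified classical additive decomposition (a moving-average trend estimate and per-position seasonal means, edge effects ignored) applied to a synthetic monthly series; it is an illustrative approximation, not the exact procedure used elsewhere in this thesis:

```python
import numpy as np

# Synthetic monthly series: level + linear trend + 12-month seasonality + noise
rng = np.random.default_rng(0)
n = 48
y = (100 + 0.5 * np.arange(n)
     + 10 * np.sin(2 * np.pi * np.arange(n) / 12)
     + rng.normal(0, 1, n))

def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    # Moving average of one full period as a rough trend estimate
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")
    detrended = y - trend
    # Seasonal component: mean of each position within the period
    season = np.array([detrended[i::period].mean() for i in range(period)])
    season = np.tile(season, len(y) // period + 1)[: len(y)]
    residual = y - trend - season
    return trend, season, residual

t, s, r = decompose_additive(y, period=12)
# The components add back up to the original series exactly, by construction
assert np.allclose(t + s + r, y)
```

Libraries such as statsmodels provide a more careful implementation (e.g. centred moving averages for even periods), but the principle is the same.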
3.3 Time series forecasting as a supervised problem Most time series forecasting problems can be framed as supervised learning problems. Standard linear and non-linear machine learning algorithms can then be applied by transforming the time series data into a supervised learning problem.
3.3.1 Supervised learning In this type of learning, the machine learns under guidance: given input variables (X) and an output variable (Y), algorithms are used to learn the mapping between them.
3.3.2 Sliding window approach for time series data Using prior time steps to predict the next time step is called the sliding window method; the prior steps are also called lags in statistics and time series analysis. A time series can be shaped into a supervised problem by restructuring the data set so that previous time steps form the input variables (X) and the next time step forms the output variable (y). Suppose we have the time series shown in Figure 3.4; we can transform it into a supervised learning problem by using the values of previous time steps to predict the value of the next time step, as in Figure 3.5.
Figure 3.4: Time series data Figure 3.5: supervised problem
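The restructuring illustrated in Figures 3.4 and 3.5 can be written as a small helper function. The function name and window sizes below are illustrative, not taken from the thesis:

```python
import numpy as np

def sliding_window(series, n_in, n_out=1):
    """Reframe a univariate series as supervised (X, y) pairs.

    Each row of X holds n_in lagged values; the matching row of y holds
    the next n_out values to be predicted.
    """
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i : i + n_in])
        y.append(series[i + n_in : i + n_in + n_out])
    return np.array(X), np.array(y)

series = np.array([10, 20, 30, 40, 50, 60])
X, y = sliding_window(series, n_in=3)
print(X)  # [[10 20 30] [20 30 40] [30 40 50]]
print(y)  # [[40] [50] [60]]
```

Setting n_out greater than 1 yields the multi-step output framing discussed in Section 3.7.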
3.4 Artificial Neural Networks Artificial neural networks are computing systems inspired by biological neural networks. Also simply called neural networks, they perform tasks by learning from examples without being programmed with an explicit set of instructions. ANNs consist of a set of connected neurons organized in layers; these neurons send signals to each other through weighted connections. The network architecture is composed of input, hidden and output layers. The input layer receives the initial input vector of the data, which is processed further by the subsequent layers. Hidden layers lie between the input and output layers; their neurons take in a set of weighted inputs and produce an output through an activation function. The output layer gives the required outputs [39].
• Perceptron: the basic building block of a neural network, a linear classifier used for binary prediction. This type of network works only for linearly separable data.
Figure 3.6: single layer perceptron
• Multi-layer neural network: a more advanced network architecture than the perceptron, used for solving complex regression and classification tasks. Recurrent neural networks and convolutional neural networks are examples of multi-layer architectures [21].
Figure 3.7: Multi layer perceptron
3.4.1 Activation Functions Calculations performed in a neuron are of two types: aggregation and activation. Aggregation is just the weighted sum, whereas the activation function defines the output of the neuron for a given set of inputs. Activation functions differ between architectures. ReLU, sigmoid and tanh are the most widely used non-linear activation functions.
• Sigmoid activation function: mostly used in binary classification problems, this is a kind of logistic function which maps the given inputs to probability outputs between 0 and 1 [25].
Figure 3.8: Sigmoid
Sigmoid(x) = e^x / (1 + e^x) (3.1)
• Tan-h activation function: the tanh activation function is an alternative to the logistic sigmoid. It also follows sigmoidal behavior, but the output values are bounded in the range -1 to 1; highly negative input values to the tanh function map to negative outputs [25].
Figure 3.9: Tan-h
Tanh(x) = 2 / (1 + e^(-2x)) - 1 (3.2)
• ReLU activation function: ReLU stands for rectified linear units; it is widely used in convolutional networks (CNN). All negative input values are mapped to 0, while positive values are passed through unchanged [32].
Figure 3.10: Relu
ReLU(x) = max(0, x) (3.3)
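The three activation functions in Eqs. (3.1)-(3.3) can be implemented directly with NumPy; a small sketch for checking their behavior:

```python
import numpy as np

def sigmoid(x):
    # Eq. (3.1): squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Eq. (3.2): like sigmoid, but bounded in (-1, 1)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):
    # Eq. (3.3): zero for negative inputs, identity for positive ones
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # ~[0.119 0.5 0.881]
print(tanh(x))     # ~[-0.964 0. 0.964]
print(relu(x))     # [0. 0. 2.]
```

Note that the tanh of Eq. (3.2) agrees with NumPy's built-in np.tanh, which is the usual way to compute it in practice.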
3.4.2 Recurrent Neural Networks A recurrent neural network (RNN) works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. RNNs are mostly used for sequential data. The formulation of RNNs is obtained by abstracting the general concepts and common properties of feedforward neural networks. These networks are widely used in speech recognition, sentiment classification and time series prediction [16].
Figure 3.11: Recurrent and feed forward networks structure
Consider an input sequence x = (x_1, ..., x_T). A standard recurrent neural network (RNN) computes the hidden vector sequence h = (h_1, ..., h_T) and output vector sequence y = (y_1, ..., y_T) by iterating the following equations from t = 1 to T:
h_t = H(W_xh x_t + W_hh h_(t-1) + b_h) (3.4)
y_t = W_hy h_t + b_y (3.5) where W denotes the weight matrices, H is the hidden layer function, W_xh denotes the input-hidden weight matrix and b_h denotes the hidden bias vector.
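Equations (3.4)-(3.5) can be iterated directly. A minimal NumPy sketch of the RNN forward pass follows, with tanh chosen as the hidden function H and randomly initialised weights for illustration (the dimensions are arbitrary, not those used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim, output_dim = 3, 4, 2

# Weight matrices and biases as in Eqs. (3.4)-(3.5)
W_xh = rng.normal(0, 0.1, (hidden_dim, input_dim))
W_hh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
W_hy = rng.normal(0, 0.1, (output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

def rnn_forward(x_seq):
    """Iterate Eqs. (3.4)-(3.5) over a sequence, using tanh as H."""
    h = np.zeros(hidden_dim)
    outputs = []
    for x_t in x_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # Eq. (3.4)
        outputs.append(W_hy @ h + b_y)            # Eq. (3.5)
    return np.array(outputs)

x_seq = rng.normal(size=(5, input_dim))  # a sequence of T = 5 time steps
y_seq = rnn_forward(x_seq)
print(y_seq.shape)  # (5, 2)
```

The key point is that the hidden state h carries information from one time step to the next, which is exactly what makes RNNs suitable for sequential data.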
3.4.3 LSTM LSTMs (Long Short-Term Memory networks) are an evolved version of the recurrent neural network. During back-propagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update the weights of a neural network; the vanishing gradient problem occurs when a gradient shrinks as it is back-propagated through time. If a gradient value becomes extremely small it does not contribute much to learning, so in recurrent neural networks the layers that receive a small gradient update, mainly the early layers, stop learning. Because these layers are not learning, the network can forget what it has seen in longer sequences, leaving it with only short-term memory. LSTMs were created as the solution to this short-term memory: they have internal mechanisms called gates that regulate the flow of information. These gates learn which data in a sequence is important to keep or throw away, and in doing so learn to use the relevant information to make predictions. LSTMs are mostly used in speech recognition, text generation and time series [19].
Figure 3.12: LSTM Architecture
A common LSTM architecture is composed of a cell state and three regulators, usually called gates. The cell state acts as a highway that carries relative information all the way down the sequence chain; think of it as the memory of the network. Because the cell state carries information throughout the processing of the sequence, information from earlier time steps can be carried all the way to the last time step, reducing the effects of short-term memory. The gates are different small neural networks that decide what is allowed onto the cell state; during training they learn what information is relevant to keep or forget. The gates contain sigmoid activations, which squash values between 0 and 1 rather than between -1 and 1. The forget gate decides what information should be kept or thrown away: information from the previous hidden state and the current input is passed through the sigmoid function, and values close to 0 mean forget while values close to 1 mean keep. The input gate decides which new values are used to update the cell state, and the output gate produces the next hidden state from the updated cell state passed through an activation function.
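The gate mechanics described above can be sketched as a single LSTM time step in NumPy. This is an illustrative implementation with randomly initialised weights and arbitrary dimensions, not the library implementation used in the experiments:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step with forget, input and output gates.

    W maps the concatenated [h_prev, x_t] to the four gate pre-activations.
    """
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gate values in (0, 1)
    g = np.tanh(g)                                # candidate values in (-1, 1)
    c_t = f * c_prev + i * g                      # forget old, add new info
    h_t = o * np.tanh(c_t)                        # expose part of the cell state
    return h_t, c_t

rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 4
W = rng.normal(0, 0.1, (4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)

h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # process a 5-step sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (4,)
```

Note how the cell state c_t is only modified by elementwise gating, which is what lets gradients flow across many time steps compared to the plain RNN of Eq. (3.4).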