

Article A High Precision Artificial Neural Networks Model for Short-Term Energy Load Forecasting

Ping-Huan Kuo 1 and Chiou-Jye Huang 2,*

1 Computer and Intelligent Robot Program for Bachelor Degree, National Pingtung University, Pingtung 90004, Taiwan; [email protected]
2 School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China
* Correspondence: [email protected]; Tel.: +86-137-2624-7572

Received: 14 December 2017; Accepted: 9 January 2018; Published: 16 January 2018

Abstract: One of the most important research topics in smart grid technology is load forecasting, because the accuracy of load forecasting strongly influences the reliability of smart grid systems. In the past, load forecasts were obtained with traditional analysis techniques such as time series analysis and linear regression. Since the load forecast focuses on aggregated electricity consumption patterns, researchers have recently integrated artificial intelligence approaches with these traditional techniques. In this study, an accurate deep neural network algorithm for short-term load forecasting (STLF) is introduced. The forecasting performance of the proposed algorithm is compared with the performances of five artificial intelligence algorithms that are commonly used in load forecasting. The Mean Absolute Percentage Error (MAPE) and Cumulative Variation of Root Mean Square Error (CV-RMSE) are used as accuracy evaluation indexes. The experimental results show that the MAPE and CV-RMSE of the proposed algorithm are 9.77% and 11.66%, respectively, demonstrating very high forecasting accuracy.

Keywords: artificial intelligence; convolutional neural network; deep neural networks; short-term load forecasting

1. Introduction

Nowadays, there is a persistent need to accelerate the development of low-carbon energy technologies in order to address the global challenges of energy security, climate change, and economic growth [3]. Among the various green technologies to be developed, smart grids [1] are particularly important because they are key to the integration of several other low-carbon energy technologies [2], such as power charging for electric vehicles, on-grid connection of variable renewable energy sources, and demand response.

The forecast of electricity load is important for the power system scheduling adopted by energy providers [4]. Namely, inefficient storage and discharge of electricity can incur unnecessary costs, while even a small improvement in electricity load forecasting can reduce production costs and increase trading advantages [4], particularly during peak electricity consumption periods. Therefore, it is important for electricity providers to model and forecast electricity load as accurately as possible, in both short-term [5–12] (one day to one month ahead) and medium-term [13] (one month to five years ahead) periods.

With the development of big data and artificial intelligence (AI) technology, new machine learning methods have been applied to the power industry, where large volumes of electricity data need to be carefully managed.

According to the McKinsey Global Institute [14], AI could be applied in the electricity industry for power demand and supply prediction, because a power grid load forecast affects many stakeholders. Based on the short-term forecast (1–2 days ahead), power generation systems can determine which power sources to access in the next 24 h, and transmission grids can assign appropriate resources to clients in a timely manner based on current transmission requirements. Moreover, using an appropriate demand and supply forecast, electricity retailers can calculate energy prices based on estimated demand more efficiently. Powerful data collection and analysis technologies are becoming more available on the market, so power companies are beginning to explore the feasibility of obtaining more accurate short-term load forecasts using AI. For instance, in the United Kingdom (UK), the National Grid is currently working with DeepMind [15,16], a Google-owned AI team, to predict the power supply and demand peaks in the UK based on information from smart meters and by incorporating weather-related variables. This cooperation aims to maximize the use of intermittent renewable energy and to reduce the UK national energy usage by 10%. Therefore, it is expected that electricity demand and supply could be predicted and managed in real time through deep learning technologies, optimizing load dispatch and reducing operation costs.

Load forecasting can be categorized by the length of the forecast interval. Although there is no official categorization in the power industry, there are four load forecasting types [17]: very short term load forecasting (VSTLF), short term load forecasting (STLF), medium term load forecasting (MTLF), and long term load forecasting (LTLF). The VSTLF typically predicts load for a period of less than 24 h, STLF predicts load for a period greater than 24 h and up to one week, MTLF forecasts load for a period from one week up to one year, and LTLF forecasts load for a period longer than one year. The load forecasting type is chosen based on application requirements. Namely, VSTLF and STLF are applied to everyday power system operation and spot price calculation, so the accuracy requirement is much higher than for long term prediction. The MTLF and LTLF are used for prediction of power usage over a long period of time, and they are often referenced in long-term contracts when determining system capacity, costs of operation and system maintenance, and future grid expansion plans. Thus, if smart grids are to integrate a high percentage of intermittent renewable energy, load forecasting will be more demanding than for traditional power generation sources because of grid stability requirements.

In addition, load forecasting can be classified by calculation method into statistical methods and computational intelligence (CI) methods. With recent developments in artificial intelligence and smart metering, the traditional load forecasting methods have been gradually replaced by AI technology. Smart meters for residential buildings became available on the market around 2010, and since then, various studies on STLF for residential communities have been published [18,19]. When compared with the traditional statistical forecasting methods, the ability to analyze large amounts of data in a very short time frame using AI technology has displayed obvious advantages [10].
Some of the frequently used load forecast methods include linear regression [5,6,20], autoregressive methods [7,21], and artificial neural networks [9,22,23]. Furthermore, clustering methods were also proposed [24]. In [20,25], similar time sequences were matched, while in [24] the focus was on customer classification. A novel approach based on the support vector machine was proposed in [26,27]. Other forecasting methods, such as exponential smoothing and Kalman filters, were also applied in a few studies [28]. A careful literature review of the latest STLF methods can be found in [8]. In [13], it was shown that the accuracy of STLF is influenced by many factors, such as temperature, humidity, wind speed, etc. In many studies, the artificial neural network (ANN) forecasting methods [9–11,29] have been proven to be more accurate than traditional statistical methods, and the accuracy of different ANN methods has been reviewed by many researchers [1,30]. In [31], a multi-model partitioning algorithm (MMPA) for short-term electricity load forecasting was proposed. According to the obtained experimental results, the MMPA method is better than the autoregressive integrated moving average (ARIMA) method. In [17], the authors used an ANN-based method reinforced by a wavelet denoising algorithm. The wavelet method was used to factorize electricity load data into signals with different frequencies.

Therefore, the wavelet denoising algorithm provides good electricity load data for neural network training and improves load forecasting accuracy.

In this study, a new load forecasting model based on a deep learning algorithm is presented. The forecasting accuracy of the proposed model is within the requested range, and the model has the advantages of simplicity and high forecasting performance. The major contributions of this paper are: (1) introduction of a precise deep neural network model for energy load forecasting; (2) comparison of the performances of several forecasting methods; and (3) creation of a novel research direction in time sequence forecasting based on convolutional neural networks.

2. Methodology of Artificial Neural Networks

Artificial neural networks (ANNs) are computing systems inspired by biological neural networks. The general structure of an ANN contains neurons, weights, and biases. Thanks to their powerful modeling ability, ANNs are still very popular in the machine learning field. Many ANN structures are used in machine learning problems, but the Multilayer Perceptron (MLP) [32] is the most commonly used ANN type. The MLP is a fully connected artificial neural network; its structure is shown in Figure 1. In general, the MLP consists of one input layer, one or more hidden layers, and one output layer. The MLP network presented in Figure 1 is the most common MLP structure, which has only one hidden layer. In the MLP, all the neurons of the previous layer are fully connected to the neurons of the next layer. In Figure 1, x1, x2, x3, ..., x6 are the neurons of the input layer, h1, h2, h3, h4 are the neurons of the hidden layer, and y1, y2, y3, y4 are the neurons of the output layer. In the case of energy load forecasting, the input is the past energy load, and the output is the future energy load. Although the MLP structure is very simple, it provides good results in many applications. The most commonly used algorithm for MLP training is the backpropagation algorithm.
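To make this concrete, the following is a minimal sketch (not the authors' code) of such a one-hidden-layer MLP load forecaster in Keras. The hidden width of 64 is an arbitrary assumption; the 168-h input and 72-h output follow the experimental setup described later in Section 4.

```python
import tensorflow as tf

# Minimal MLP sketch for load forecasting (hidden width of 64 is assumed).
mlp = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(168,)),              # past (24 x 7) h of load
    tf.keras.layers.Dense(64, activation='relu'),     # single hidden layer, as in Figure 1
    tf.keras.layers.Dense(72, activation='sigmoid'),  # next (24 x 3) h, loads scaled to [0, 1]
])
# Backpropagation with a mean-squared-error objective.
mlp.compile(optimizer='adam', loss='mse')
```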

Figure 1. The Multilayer Perceptron (MLP) structure.

Although MLPs are very good at modelling and pattern recognition, convolutional neural networks (CNNs) provide better accuracy in highly non-linear problems, such as energy load forecasting. The CNN uses the concept of weight sharing. The one-dimensional convolution and pooling layer are presented in Figure 2. The lines of the same color denote the same shared weight, and the sets of shared weights can be treated as kernels. After the convolution process, the inputs x1, x2, x3, ..., x6 are transformed into the feature maps c1, c2, c3, c4. The next step in Figure 2 is pooling, wherein the feature map of the convolution layer is sampled and its dimension is reduced. For instance, in Figure 2 the dimension of the feature map is 4, and after the pooling process that dimension is reduced to 2. Pooling is an important procedure for extracting the important convolution features.
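The weight sharing and pooling described above can be illustrated in a few lines of NumPy. This is a toy sketch of Figure 2 with made-up numbers: a single shared 3-tap kernel turns the six inputs into a feature map of length 4, and max pooling of size 2 halves it.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1D convolution: every output position reuses the same kernel (weight sharing)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias for i in range(len(x) - k + 1)])

def max_pool1d(c, size=2):
    """Non-overlapping max pooling: keeps the strongest response in each window."""
    return np.array([c[i:i + size].max() for i in range(0, len(c) - size + 1, size)])

x = np.array([0.2, 0.5, 0.1, 0.7, 0.3, 0.9])  # inputs x1..x6, as in Figure 2
kernel = np.array([0.4, -0.1, 0.3])           # one shared 3-tap kernel (values illustrative)
c = conv1d_valid(x, kernel)                   # feature map c1..c4 (length 4)
p = max_pool1d(c)                             # pooled features (length 2)
print(c, p)
```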


Figure 2. The one-dimensional (1D) convolution and pooling layer.

The other popular solution of the forecasting problem is the Long Short Term Memory network (LSTM) [33]. The LSTM is a recurrent neural network, which has been used to solve many time sequence problems. The structure of the LSTM is shown in Figure 3, and its operation is illustrated by the following equations:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)  (1)

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)  (2)

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)  (3)

C_t = f_t × C_{t−1} + i_t × C̃_t  (4)

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)  (5)

h_t = o_t × tanh(C_t)  (6)

where x_t is the network input, h_t is the output of the hidden layer, σ denotes the sigmoidal function, C_t is the cell state, and C̃_t denotes the candidate value of the state. Besides, there are three gates in the LSTM: i_t is the input gate, o_t is the output gate, and f_t is the forget gate. The LSTM is designed to solve the long-term dependency problem. In general, the LSTM provides good forecasting results.
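Equations (1)–(6) translate line for line into code. The following NumPy sketch of a single LSTM step uses illustrative dimensions and random weights; it is not the library implementation used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step implementing Equations (1)-(6).
    W and b hold the four weight matrices/biases keyed 'f', 'i', 'C', 'o';
    each W[k] has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])         # (1) forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])         # (2) input gate
    C_tilde = np.tanh(W['C'] @ z + b['C'])     # (3) candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde         # (4) new cell state
    o_t = sigmoid(W['o'] @ z + b['o'])         # (5) output gate
    h_t = o_t * np.tanh(C_t)                   # (6) hidden output
    return h_t, C_t

# Toy usage: 4 hidden units, 3 inputs, random weights.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 7)) for k in 'fiCo'}
b = {k: np.zeros(4) for k in 'fiCo'}
h, C = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
```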


Figure 3. The Long Short Term Memory network (LSTM) structure.

3. The Proposed Deep Neural Network

The structure of the proposed deep neural network DeepEnergy is shown in Figure 4. Unlike the general forecasting methods based on the LSTM, the DeepEnergy uses the CNN structure. The input layer denotes the information on past load, and the output values represent the future energy load. There are two main processes in DeepEnergy: feature extraction and forecasting. The feature extraction in DeepEnergy is performed by three convolution layers (Conv1, Conv2, and Conv3) and three pooling layers (Pooling1, Pooling2, and Pooling3). The Conv1–Conv3 are one-dimensional (1D) convolutions, and the feature maps are all activated by the Rectified Linear Unit (ReLU) function.

Besides, the kernel sizes of Conv1, Conv2, and Conv3 are 9, 5, and 5, respectively, and the depths of the feature maps are 16, 32, and 64, respectively. The pooling method of Pooling1 to Pooling3 is max pooling, and the pooling size is equal to 2. Therefore, after each pooling process, the dimension of the feature map is divided by 2 to extract the important features of the deeper layers.

In the forecasting process, the first step is to flatten the Pooling3 layer into one dimension and construct a fully connected structure between the Flatten layer and the Output layer. In order to fit the values previously normalized in the range [0, 1], the sigmoidal function is chosen as the activation function of the output layer. Furthermore, in order to overcome the overfitting problem, the dropout technology [34] is adopted in the fully connected layer. Namely, dropout is an efficient way to prevent overfitting in artificial neural networks. During the training process, neurons are randomly "dead": as shown in Figure 4, the output values of the chosen neurons (the gray circles) are set to zero in a given training iteration, and the chosen neurons are randomly changed during the training process.

Furthermore, the flowchart of the proposed DeepEnergy is presented in Figure 5. Firstly, the raw energy load data are loaded into the memory. Then, the data preprocessing is executed and the data are normalized in the range [0, 1] in order to fit the characteristics of the machine learning model. For the purpose of validating the DeepEnergy generalization performance, the data are split into training data and testing data, and the training data are used for training of the proposed model. After the data are split, the proposed DeepEnergy network is created and initialized. Before the training, the training data are randomly shuffled to force the proposed model to learn complicated relationships between input and output data. The training data are split into several batches, and, according to the order of the shuffled data, the model is trained on all of the batches. During the training process, if the desired Mean Square Error (MSE) is not reached in the current epoch, the training continues until the maximal number of epochs or the desired MSE is reached. On the contrary, if the maximal number of epochs is reached, then the training process stops regardless of the MSE value. Finally, the performances are evaluated to demonstrate the feasibility and practicability of the proposed method.

[Figure 4: Input → Conv1 → Pooling1 → Conv2 → Pooling2 → Conv3 → Pooling3 → Flatten → Output, divided into a feature extraction stage and a forecasting stage]

Figure 4. The DeepEnergy structure.
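The architecture described above can be assembled in a few lines of Keras. This is a hedged sketch rather than the authors' released code: the kernel sizes (9, 5, 5), feature-map depths (16, 32, 64), pooling size 2, Flatten, dropout, and sigmoid output follow the text, and the 168-h input and 72-h output follow Section 4, while the dropout rate, 'same' padding, and Adam optimizer are assumptions.

```python
import tensorflow as tf

def build_deep_energy(input_hours=168, output_hours=72, dropout_rate=0.5):
    """Sketch of the DeepEnergy CNN: three Conv1D/MaxPooling1D stages,
    then Flatten -> Dropout -> sigmoid output (loads normalized to [0, 1])."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_hours, 1)),                     # past (24 x 7) h of load
        tf.keras.layers.Conv1D(16, 9, padding='same', activation='relu'),  # Conv1
        tf.keras.layers.MaxPooling1D(2),                                   # Pooling1: 168 -> 84
        tf.keras.layers.Conv1D(32, 5, padding='same', activation='relu'),  # Conv2
        tf.keras.layers.MaxPooling1D(2),                                   # Pooling2: 84 -> 42
        tf.keras.layers.Conv1D(64, 5, padding='same', activation='relu'),  # Conv3
        tf.keras.layers.MaxPooling1D(2),                                   # Pooling3: 42 -> 21
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(dropout_rate),                             # rate is an assumption
        tf.keras.layers.Dense(output_hours, activation='sigmoid'),         # next (24 x 3) h
    ])
    model.compile(optimizer='adam', loss='mse')  # MSE matches the training criterion in Figure 5
    return model
```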


[Figure 5 flowchart: Start → Load the raw energy load data → Data preprocessing → Split the training and testing data → Initialize the neural network → Shuffle the order of training data → Train the model on batches → Network converged? (No: continue training on batches; Yes: Performance evaluation on testing data) → End]

Figure 5. The DeepEnergy flowchart.
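As an illustration of the training loop in Figure 5, the snippet below fits the sketched model with shuffled mini-batches and stops once either a desired MSE or the maximal number of epochs is reached. The threshold, batch size, and epoch budget are assumptions, since the paper does not report them.

```python
import numpy as np
import tensorflow as tf

class StopAtDesiredMSE(tf.keras.callbacks.Callback):
    """Stop training once an epoch's training MSE falls below a target value."""
    def __init__(self, desired_mse):
        super().__init__()
        self.desired_mse = desired_mse

    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('loss', np.inf) <= self.desired_mse:
            self.model.stop_training = True

# x_train: (samples, 168, 1) normalized loads; y_train: (samples, 72) in [0, 1].
# model = build_deep_energy()
# model.fit(x_train, y_train,
#           batch_size=32,            # training data split into batches (size assumed)
#           epochs=500,               # maximal number of epochs (assumed)
#           shuffle=True,             # random shuffling before each epoch
#           callbacks=[StopAtDesiredMSE(desired_mse=1e-4)])
```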

4. Experimental Results

In the experiment, the USA District public consumption dataset and the electric load dataset from 2016 provided by the Electric Reliability Council of Texas were used. Since the support vector machine (SVM) [35] is a popular machine learning technology, the radial basis function (RBF) kernel of the SVM was chosen in the experiment to demonstrate the SVM performance. Besides, the random forest (RF) [36], decision tree (DT) [37], MLP, LSTM, and the proposed DeepEnergy network were also implemented and tested. The results of load forecasting by all of the methods are shown in Figures 6–11. In the experiment, the training data were two-month data, and the test data were one-month data. In order to evaluate the performances of all of the listed methods, the dataset was divided into 10 partitions. In the first partition, the training data consisted of energy load data collected in January and February 2016, and the test data consisted of data collected in March 2016. In the second partition, the training data were collected in February and March 2016, and the test data were collected in April 2016. The following partitions can be deduced by the same analogy; a sketch of this rolling split is given below.

In Figures 6–11, red curves denote the forecasting results of the corresponding models, and blue curves represent the ground truth. The vertical axes represent the energy load (MWh), and the horizontal axes denote the time (hour). The energy load from the past (24 × 7) h was used as the input of the forecasting model, and the predicted energy load in the next (24 × 3) h was the output of the forecasting model. After the models received the past (24 × 7) h of data, they forecasted the next (24 × 3) h of energy load (the red curves in Figures 6–11), while the correct information is illustrated by the blue curves. The differences between the red and blue curves denote the performances of the corresponding models. For the sake of comparison fairness, the testing data were not used during the training process of the models. According to the results presented in Figures 6–11, the proposed DeepEnergy network has the best prediction performance among all of the models.
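For clarity, the rolling two-months-train/one-month-test scheme can be written out as follows; the partition list is a reconstruction of the description above, not released code.

```python
import calendar

# The 10 rolling partitions over 2016: each partition trains on two
# consecutive months and tests on the following month (partition 1:
# train Jan-Feb, test Mar; ...; partition 10: train Oct-Nov, test Dec).
partitions = []
for first_train_month in range(1, 11):               # months 1..10
    train_months = [first_train_month, first_train_month + 1]
    test_month = first_train_month + 2
    partitions.append({
        'train': [calendar.month_abbr[m] for m in train_months],
        'test': calendar.month_abbr[test_month],
    })

for i, p in enumerate(partitions, 1):
    print(f"#{i}: train on {p['train']}, test on {p['test']}")
```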

Figure 6. The forecasting results of support vector machine (SVM): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 7. The forecasting results of random forest (RF): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 8. The forecasting results of decision tree (DT): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 9. The forecasting results of Multilayer Perceptron (MLP): (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

Figure 10. The forecasting results of LSTM: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.


Figure 11. The forecasting results of proposed DeepEnergy: (a) Partial results A; (b) Partial results B; (c) Partial results C; (d) Partial results D; (e) Partial results E; (f) Partial results F.

In order to evaluate the performance of the forecasting models more accurately, the Mean Absolute Percentage Error (MAPE) and the Cumulative Variation of Root Mean Square Error (CV-RMSE) were employed. The MAPE and CV-RMSE are defined by Equations (7) and (8), respectively, where y_n denotes the measured value, ŷ_n is the estimated value, and N represents the sample size.

MAPE = (1/N) Σ_{n=1}^{N} |(y_n − ŷ_n)/y_n|  (7)

CV-RMSE = sqrt( (1/N) Σ_{n=1}^{N} ((y_n − ŷ_n)/y_n)² ) / ( (1/N) Σ_{n=1}^{N} y_n )  (8)

The detailed experimental results are presented numerically in Tables 1 and 2. As shown in Tables 1 and 2, the MAPE and CV-RMSE of the DeepEnergy model are the smallest, and its error performance is the best among all models; namely, the average MAPE and CV-RMSE are 9.77% and 11.66%, respectively. The MAPE of the MLP model is the largest among all of the models, with an average error of about 15.47%. On the other hand, the CV-RMSE of the SVM model is the largest among all models, with an average error of about 17.47%. According to the average MAPE and CV-RMSE values, the electric load forecasting accuracy of the tested models in descending order is as follows: DeepEnergy, RF, LSTM, DT, SVM, and MLP.

Table 1. The experimental results in terms of Mean Absolute Percentage Error (MAPE) given in percentages.

Test SVM RF DT MLP LSTM DeepEnergy
#1 7.327408 7.639133 8.46043 9.164315 10.40804813 7.226127
#2 7.550818 8.196129 10.23476 11.14954 9.970662683 8.244051
#3 13.07929 10.11102 12.14039 19.99848 14.85568499 11.00656
#4 16.15765 17.27957 19.86511 22.45493 12.83487893 12.17574
#5 5.183255 6.570061 8.50582 15.01856 5.479091542 5.41808
#6 10.33686 9.944028 11.11948 10.94331 11.7681534 9.070998
#7 8.934657 6.698508 8.634132 7.722149 7.583802292 9.275215
#8 18.5432 16.09926 17.17215 16.93843 15.6574951 13.2776
#9 49.97551 17.9049 21.29354 29.06767 16.31443679 11.18214
#10 11.20804 8.221766 10.68665 12.20551 8.390061493 10.80571
Average 14.82967 10.86644 12.81125 15.46629 11.32623153 9.768222

Table 2. The experimental results in terms of Cumulative Variation of Root Mean Square Error (CV-RMSE) given in percentages.

Test SVM RF DT MLP LSTM DeepEnergy
#1 9.058992 9.423908 10.57686 10.65546 12.16246177 8.948922
#2 10.14701 10.63412 12.99834 13.91199 12.19377007 10.46165
#3 17.02552 12.42314 14.58249 23.2753 16.9291218 13.30116
#4 21.22162 21.1038 24.48298 23.63544 14.13596516 14.63439
#5 6.690527 7.942747 10.10017 15.44461 6.334195125 6.653999
#6 11.88856 11.6989 13.39033 12.20149 12.96057349 10.74021
#7 10.77881 7.871596 10.35254 8.716806 8.681353107 10.85454
#8 19.49707 17.09079 18.95726 17.73124 16.55737557 14.51027
#9 54.58171 19.91185 24.84425 29.37466 17.66342548 13.01906
#10 13.80167 10.15117 13.06351 13.39278 10.20235927 13.47003
Average 17.46915 12.8252 15.33487 16.83398 12.78206008 11.65942

It is obvious that the red curves in Figure 11, which denote the DeepEnergy algorithm, follow the ground truth more closely than the curves in Figures 6–10, which further verifies that the proposed DeepEnergy algorithm has the best prediction performance. Therefore, it is proven that the DeepEnergy STLF algorithm proposed in the paper is practical and effective. Although the LSTM has good performance in time sequence problems, in this study the reduction of training loss was not fast enough to handle this forecasting problem, because the size of the input and output data is too large for the traditional LSTM neural network. Therefore, the traditional LSTM is not suitable for this kind of prediction. Finally, the experimental results show that the proposed DeepEnergy network provides the best results in energy load forecasting.
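For reference, the two evaluation indexes can be computed directly from the measured and estimated load arrays. The sketch below follows the equations as printed above; note that CV-RMSE is sometimes defined elsewhere without the per-sample division by y_n.

```python
import numpy as np

def mape(y, y_hat):
    """Equation (7): Mean Absolute Percentage Error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def cv_rmse(y, y_hat):
    """Equation (8) as printed: root mean squared relative error divided by
    the mean measured load, in percent. The more common definition omits the
    inner division by y_n: sqrt(mean((y - y_hat)**2)) / mean(y)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.sqrt(np.mean(((y - y_hat) / y) ** 2)) / np.mean(y)
```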

5. Discussion

The traditional machine learning methods, such as the SVM, random forest, and decision tree, are widely used in many applications, and in this study these methods also provide acceptable results. In the case of the SVM, the support vectors are mapped into a higher-dimensional space by the kernel function, so the selection of the kernel function is very important. In order to achieve the goal of nonlinear energy load forecasting, the RBF was chosen as the SVM kernel. When compared with the SVM, the learning concept of the decision tree is much simpler: the decision tree is a flowchart-like structure that is easy to understand and interpret. However, a single decision tree does not have the ability to solve complicated problems. The random forest, which is a combination of numerous decision trees, therefore provides a model ensemble solution. In this paper, the experimental results of the random forest are better than those of the decision tree and the SVM, which shows that the model ensemble solution is effective in energy load forecasting.

Regarding the neural networks, the MLP is the simplest ANN structure. Although the MLP can model the nonlinear energy forecasting task, its performance in this experiment is not outstanding. On the other hand, the LSTM considers data relationships across time steps during the training. According to the results, the LSTM can deal with time sequence problems, and its forecasting trend is broadly correct. However, the proposed CNN structure, named DeepEnergy, has the best results in the experiment. The experiments demonstrate that the most important features can be extracted by the designed 1D convolution and pooling layers. This also shows that the CNN structure is effective in forecasting, and the proposed DeepEnergy gives outstanding results. This paper not only provides a comparison of traditional machine learning and deep learning methods, but also gives a new research direction in energy load forecasting.

6. Conclusions

This paper proposes a powerful deep convolutional neural network model (DeepEnergy) for energy load forecasting. The proposed network is validated by experiments in which the load data from the past seven days are used as input. In the experiments, the data from the coast area of the USA were used, and the historical electricity demand of consumers was considered. According to the experimental results, the DeepEnergy can precisely predict the energy load over the next three days. In addition, the proposed algorithm was compared with five AI algorithms that are commonly used in load forecasting. The comparison showed that the performance of DeepEnergy was the best among all tested algorithms; namely, the DeepEnergy had the lowest values of both MAPE and CV-RMSE. According to all of the obtained results, the proposed method can reduce monitoring expenses, the initial cost of hardware components, and long-term maintenance costs in future smart grids. Simultaneously, the results verify that the proposed DeepEnergy STLF method has strong generalization ability and robustness, and thus it can achieve very good forecasting performance.

Acknowledgments: This work was supported by the Ministry of Science and Technology, Taiwan, Republic of China, under Grant MOST 106-2218-E-153-001-MY3.

Author Contributions: Ping-Huan Kuo wrote the program and designed the DNN model. Chiou-Jye Huang planned this study and collected the energy load dataset. Ping-Huan Kuo and Chiou-Jye Huang contributed to drafting and revising the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372.
2. Da Graça Carvalho, M.; Bonifacio, M.; Dechamps, P. Building a low carbon society. Energy 2011, 36, 1842–1847.
3. Jiang, B.; Sun, Z.; Liu, M. China's energy development strategy under the low-carbon economy. Energy 2010, 35, 4257–4264.
4. Cho, H.; Goude, Y.; Brossat, X.; Yao, Q. Modeling and forecasting daily electricity load curves: A hybrid approach. J. Am. Stat. Assoc. 2013, 108, 7–21.
5. Javed, F.; Arshad, N.; Wallin, F.; Vassileva, I.; Dahlquist, E. Forecasting for demand response in smart grids: An analysis on use of anthropologic and structural data and short term multiple loads forecasting. Appl. Energy 2012, 96, 150–160.
6. Iwafune, Y.; Yagita, Y.; Ikegami, T.; Ogimoto, K. Short-term forecasting of residential building load for distributed energy management. In Proceedings of the 2014 IEEE International Energy Conference, Cavtat, Croatia, 13–16 May 2014; pp. 1197–1204.
7. Short Term Electricity Load Forecasting on Varying Levels of Aggregation. Available online: https://arxiv.org/abs/1404.0058v3 (accessed on 11 January 2018).
8. Gerwig, C. Short term load forecasting for residential buildings—An extensive literature review. Smart Innov. Syst. 2015, 39, 181–193.
9. Hippert, H.S.; Pedreira, C.E.; Souza, R.C. Neural networks for short-term load forecasting: A review and evaluation. IEEE Trans. Power Syst. 2001, 16, 44–55.
10. Metaxiotis, K.; Kagiannas, A.; Askounis, D.; Psarras, J. Artificial intelligence in short term electric load forecasting: A state-of-the-art survey for the researcher. Energy Convers. Manag. 2003, 44, 1524–1534.
11. Tzafestas, S.; Tzafestas, E. Computational intelligence techniques for short-term electric load forecasting. J. Intell. Robot. Syst. 2001, 31, 7–68.
12. Ghayekhloo, M.; Menhaj, M.B.; Ghofrani, M. A hybrid short-term load forecasting with a new data preprocessing framework. Electr. Power Syst. Res. 2015, 119, 138–148.
13. Xia, C.; Wang, J.; McMenemy, K. Short, medium and long term load forecasting model and virtual load forecaster based on radial basis function neural networks. Int. J. Electr. Power Energy Syst. 2010, 32, 743–750.
14. Bughin, J.; Hazan, E.; Ramaswamy, S.; Chui, M. Artificial Intelligence—The Next Digital Frontier? McKinsey Global Institute: New York, NY, USA, 2017; pp. 1–80.
15. Oh, C.; Lee, T.; Kim, Y.; Park, S.; Kwon, S.B.; Suh, B. Us vs. Them: Understanding Artificial Intelligence Technophobia over the Google DeepMind Challenge Match. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA, 6–11 May 2017; pp. 2523–2534.
16. Skilton, M.; Hovsepian, F. Example Case Studies of Impact of Artificial Intelligence on Jobs and Productivity. In The 4th Industrial Revolution; National Academies Press: Washington, DC, USA, 2018; pp. 269–291.
17. Ekonomou, L.; Christodoulou, C.A.; Mladenov, V. A short-term load forecasting method using artificial neural networks and wavelet analysis. Int. J. Power Syst. 2016, 1, 64–68.
18. Valgaev, O.; Kupzog, F. Low-Voltage Power Demand Forecasting Using K-Nearest Neighbors Approach. In Proceedings of the Innovative Smart Grid Technologies—Asia (ISGT-Asia), Melbourne, VIC, Australia, 28 November–1 December 2016.
19. Valgaev, O.; Kupzog, F. Building Power Demand Forecasting Using K-Nearest Neighbors Model—Initial Approach. In Proceedings of the IEEE PES Asia-Pacific Power Energy Conference, Xi'an, China, 25–28 October 2016; pp. 1055–1060.
20. Humeau, S.; Wijaya, T.K.; Vasirani, M.; Aberer, K. Electricity load forecasting for residential customers: Exploiting aggregation and correlation between households. In Proceedings of the 2013 Sustainable Internet and ICT for Sustainability, Palermo, Italy, 30–31 October 2013.

21. Veit, A.; Goebel, C.; Tidke, R.; Doblander, C.; Jacobsen, H. Household electricity demand forecasting: Benchmarking state-of-the-art methods. In Proceedings of the 5th International Conference on Future Energy Systems, Cambridge, UK, 11–13 June 2014; pp. 233–234.
22. Jetcheva, J.G.; Majidpour, M.; Chen, W. Neural network model ensembles for building-level electricity load forecasts. Energy Build. 2014, 84, 214–223.
23. Kardakos, E.G.; Alexiadis, M.C.; Vagropoulos, S.I.; Simoglou, C.K.; Biskas, P.N.; Bakirtzis, A.G. Application of time series and artificial neural network models in short-term forecasting of PV power generation. In Proceedings of the 2013 48th International Universities' Power Engineering Conference, Dublin, Ireland, 2–5 September 2013; pp. 1–6.
24. Fujimoto, Y.; Hayashi, Y. Pattern sequence-based energy demand forecast using photovoltaic energy records. In Proceedings of the 2012 International Conference on Renewable Energy Research and Applications, Nagasaki, Japan, 11–14 November 2012.
25. Chaouch, M. Clustering-based improvement of nonparametric functional time series forecasting: Application to intra-day household-level load curves. IEEE Trans. Smart Grid 2014, 5, 411–419.
26. Niu, D.; Dai, S. A short-term load forecasting model with a modified particle swarm optimization algorithm and least squares support vector machine based on the denoising method of empirical mode decomposition and grey relational analysis. Energies 2017, 10, 408.
27. Deo, R.C.; Wen, X.; Qi, F. A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset. Appl. Energy 2016, 168, 568–593.
28. Soubdhan, T.; Ndong, J.; Ould-Baba, H.; Do, M.T. A robust forecasting framework based on the Kalman filtering approach with a twofold parameter tuning procedure: Application to solar and photovoltaic prediction. Sol. Energy 2016, 131, 246–259.
29. Hahn, H.; Meyer-Nieberg, S.; Pickl, S. Electric load forecasting methods: Tools for decision making. Eur. J. Oper. Res. 2009, 199, 902–907.
30. Zhang, G.; Patuwo, B.E.; Hu, M.Y. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast. 1998, 14, 35–62.
31. Pappas, S.S.; Ekonomou, L.; Moussas, V.C.; Karampelas, P.; Katsikas, S.K. Adaptive load forecasting of the Hellenic electric grid. J. Zhejiang Univ. A 2008, 9, 1724–1730.
32. White, B.W.; Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Am. J. Psychol. 1963, 76, 705.
33. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
34. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
35. Suykens, J.A.K.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
36. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
37. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).