Volume III, Issue III, March 2016 IJRSI ISSN 2321 - 2705

Wind Power Forecasting Using Extreme Learning Machine

Asima Syed¹, S. K. Aggarwal²
¹Electrical and Renewable Energy Engineering Department, Baba Ghulam Shah Badshah University, India
²Electrical Engineering Department, Maharishi Markandeshwar University, India

Abstract─ Wind generation is one of the most rapidly growing sources of renewable energy. The uncertainty in this generation is large due to the variability and intermittency of the wind speed and the operating requirements of power systems. Wind power forecasting is one of the inexpensive and direct methods to alleviate the negative influences of the intermittency of wind power generation on the power system. This paper proposes an extreme learning machine (ELM) based forecasting method for wind power generation. ELM is a learning algorithm for training single-hidden layer feed forward neural networks (SLFNs). The performance of the proposed model is assessed by utilizing the wind power data from three wind power stations of Ontario, Canada as the test case systems. The effectiveness of the proposed model is compared with the persistence benchmark model.

Keywords─ Extreme learning machine (ELM); single-hidden layer feed forward network (SLFN); wind power forecasting; wind power.

NOMENCLATURE

b        Biases of ELM
b̂        Approximated biases
g(·)     Activation function of ELM
w        Input weights of ELM
ŵ        Approximated input weights
β        Output weights of ELM
β̂        Approximated output weights
H        Hidden layer output matrix of ELM
H†       Moore-Penrose generalized inverse of ELM's hidden layer output matrix
f(·)     Output function of ELM
Ñ        Number of ELM's hidden nodes
N        Number of training samples
n        Dimension of the input vector of ELM
m        Dimension of the output vector of ELM
i, j, l  Common indices

I. INTRODUCTION

Wind generation is an increasingly exploited form of renewable energy. However, there is great variation in wind generation due to the erratic nature of the earth's atmosphere, and this poses a number of intricacies that act as a restraining factor for this energy source. Wind generation varies with time and location due to fluctuations in wind speed. Therefore, unexpected variation in the output of wind generation may escalate the operating costs of the electricity system by increasing the requirements for primary reserves, as well as pose potential risks to the reliability of the electricity supply [1]. A grid operator's priority is to forecast the variations of wind power production so as to schedule the spinning reserve capacity and to regulate grid operations [2], [3]. This makes wind power forecasting tools very important.

Besides the transmission system operators (TSOs), forecasting tools are required by various end-users, including energy traders, energy service providers (ESPs), independent power providers (IPPs), etc., to provide inputs for different tasks like energy trading, economic scheduling, security assessment, etc. Researchers therefore focus on developing forecasting tools that predict wind power production with adequate accuracy. Several approaches, such as the persistence method, the physical modeling approach, statistical models, and soft computing models (SCMs), are employed in the literature for short-term deterministic wind power forecasting (WPF). The persistence method, also called the "naive predictor," is employed as a benchmark for comparing other forecasting tools for short-term WPF [4]. The physical modeling approach involves Numerical Weather Prediction (NWP) for wind forecasting, which utilizes various weather data and operates by evaluating complex mathematical models [5]. Statistical models such as exponential techniques, autoregressive moving average models and grey predictors employ historical data to tune the model parameters, and the error is minimized when the patterns match the historical ones within

a stipulated bound [6]–[8]. SCMs, such as the back-propagation neural network [9], probabilistic neural network, radial basis function neural network [10], cascade correlation neural network, fuzzy ARTMAP [11], support vector machines [12], Gaussian processes [13], and fuzzy logic are widely used in WPF. Evolutionary optimization techniques, such as the genetic algorithm (GA) [14], particle swarm optimization (PSO) [15] and the firefly algorithm [16], are employed for optimizing the neural networks (NNs). Hybrid intelligent algorithms like neuro-fuzzy methods [17] and the adaptive neuro-fuzzy inference system (ANFIS) [18], which combine fuzzy logic and NNs for WPF, are also getting popular.

In this paper, a new forecasting approach is proposed based on the extreme learning machine (ELM), which is a unique learning algorithm for training single-hidden layer feed forward neural networks (SLFNs). Conventionally, all the parameters of feed forward neural networks need to be tuned, and thus there exists a dependency between the parameters of different layers. For past decades, gradient-based methods have been extensively used for training feed forward NNs; these are generally very slow due to inappropriate learning steps or may converge to local minima, and numerous iterative learning steps are required by such learning algorithms to attain improved learning performance. In contrast, ELM randomly chooses the input weights and analytically determines the output weights by simple matrix computations, thus characterizing an extremely fast learning algorithm compared with the most common learning algorithms such as back-propagation [19]. Apart from reaching the smallest training error, the proposed learning algorithm also tends to reach the smallest norm of weights, which is different from traditional learning algorithms. The proposed model is assessed by using the data from three wind power stations of Ontario, Canada, namely the Amaranth, Port Alma and Kingsbridge wind farms. The performance of the proposed model is evaluated by comparison with the persistence benchmark model. The results reveal a significant improvement over the persistence model.

The rest of this paper is organized as follows. Section II describes the ELM learning algorithm for SLFNs, Section III briefs the benchmark reference model, Section IV presents the numerical results and discussions, and Section V delineates the conclusions.

II. EXTREME LEARNING MACHINE

The extreme learning machine (ELM) for single-hidden layer feed forward networks (SLFNs) randomly chooses the input weights and analytically determines the output weights through simple matrix computations [20]. For N arbitrary distinct samples (x_i, t_i), where x_i = [x_i1, x_i2, …, x_in]ᵀ ∈ Rⁿ and t_i = [t_i1, t_i2, …, t_im]ᵀ ∈ Rᵐ, standard SLFNs with Ñ hidden neurons and activation function g(x) are mathematically modeled as

    f(x_j; w, b, β) = Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i),  j = 1, …, N        (1)

where w_i = [w_i1, w_i2, …, w_in]ᵀ is the weight vector connecting the i-th hidden neuron and the input neurons, β_i = [β_i1, β_i2, …, β_im]ᵀ is the weight vector connecting the i-th hidden neuron and the output neurons, and b_i is the threshold of the i-th hidden neuron. w_i · x_j denotes the inner product of w_i and x_j. The output neurons are chosen linear.

That standard SLFNs with Ñ hidden neurons and activation function g(·) can approximate these N samples with zero error means that there exist β_i, w_i and b_i such that

    Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = t_j,  j = 1, …, N        (2)

The above N equations can be written compactly as

    Hβ = T        (3)

where

    H(w_1, …, w_Ñ, b_1, …, b_Ñ, x_1, …, x_N) =

        | g(w_1 · x_1 + b_1)  ⋯  g(w_Ñ · x_1 + b_Ñ) |
        |         ⋮           ⋱          ⋮           |        (N × Ñ)        (4)
        | g(w_1 · x_N + b_1)  ⋯  g(w_Ñ · x_N + b_Ñ) |

    β = [β_1ᵀ; ⋮; β_Ñᵀ]  (Ñ × m)   and   T = [t_1ᵀ; ⋮; t_Nᵀ]  (N × m)        (5)

H is called the hidden layer output matrix of the neural network [21], [22]. The input weights w_i and the hidden layer biases b_i are selected randomly and in fact need not be tuned. Once arbitrary values have been assigned to these parameters at the beginning of learning, H can remain unchanged.

In most cases the number of hidden neurons is much less than the number of distinct training samples, Ñ ≪ N, so H is a non-square matrix and there may not exist w_i, b_i, β_i such that Hβ = T. Thus, one may need to find specific ŵ_i, b̂_i, β̂ (i = 1, …, Ñ) such that

    ‖H(ŵ_1, …, ŵ_Ñ, b̂_1, …, b̂_Ñ) β̂ − T‖ = min_{w_i, b_i, β} ‖H(w_1, …, w_Ñ, b_1, …, b_Ñ) β − T‖        (6)

This is equivalent to minimizing the cost function of the traditional gradient-based learning algorithms employed in back-propagation learning:

    E = Σ_{j=1}^{N} ( Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) − t_j )²        (7)

For fixed and randomly chosen input weights w_i and hidden layer biases b_i, training an SLFN therefore reduces to a linear least-squares problem in β.
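As a concrete illustration, the construction of the hidden layer output matrix H of (3)–(4) and the resulting linear fit can be sketched in a few lines of NumPy. This is a minimal sketch under assumed choices (a sigmoid activation and synthetic data), not the authors' implementation:

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Train a single-hidden-layer network the ELM way: random input
    weights and biases, then a linear least-squares solve for beta."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_hidden, n_features))  # random input weights w_i
    b = rng.standard_normal(n_hidden)                # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))         # hidden layer output matrix, N x N~
    beta = np.linalg.pinv(H) @ T                     # min-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta                                  # linear output neurons

# Synthetic demonstration: 200 samples with 3 inputs, one smooth target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
T = np.sin(X[:, :1]) + 0.1 * X[:, 1:2]
W, b, beta = elm_fit(X, T, n_hidden=40, rng=rng)
err = np.mean(np.abs(elm_predict(X, W, b, beta) - T))
```

Because the hidden parameters are never iterated on, the entire "training" step is the single pseudoinverse solve; with enough hidden neurons the training error shrinks toward zero, consistent with the approximation property discussed in this section.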

Training an SLFN is thus simply equivalent to evaluating a least-squares solution β̂ of the linear system Hβ = T:

    ‖H(w_1, …, w_Ñ, b_1, …, b_Ñ) β̂ − T‖ = min_β ‖H(w_1, …, w_Ñ, b_1, …, b_Ñ) β − T‖        (8)

The smallest-norm least-squares solution of the above linear system is

    β̂ = H† T        (9)

where H† is the Moore-Penrose generalized inverse of the matrix H. H† is generally obtained by the singular value decomposition (SVD) method.

ELM eliminates many limitations of traditional gradient-based NN training algorithms, such as local minima, overtraining and high computational burden, without iterative gradient-based training. An ELM with N hidden-layer neurons can learn N distinct samples exactly with zero error for any infinitely differentiable activation function. In addition, ELM training always yields the best outcome for the assigned hidden weights, and the training speed is extremely fast due to the simple matrix computation in (9).

III. BENCHMARK REFERENCE MODEL

The persistence model is used in our work to benchmark the performance of the proposed model. The persistence or naive predictor, the most prevalent benchmark model, states that the future wind production remains the same as the last measured value of the power:

    p̂(t + k | t) = p(t)        (10)

The error measure used to evaluate forecasting performance in our work is the mean absolute error (MAE), given by

    MAE = (1/n) Σ_{i=1}^{n} |A_i − F_i|        (11)

where A_i is the actual and F_i the forecasted value of the wind power.

IV. RESULTS AND DISCUSSIONS

The results are obtained for the time period between January 3, 2010 and January 1, 2015, covering a five-year period. The results are given for three wind farms of Ontario, Canada: the Amaranth wind farm, situated in Amaranth and Melancthon Township near Shelburne; the Port Alma wind farm, located in Port Alma; and the Kingsbridge wind farm, situated in Goderich. The meteorological data, comprising wind speed and wind direction, is provided by WEATHER BANK, Inc. and the wind power data is provided by the Independent Electricity System Operator (IESO). These three parameters, namely wind speed, wind direction and wind power, are utilized as the input parameters of our forecasting model.

TABLE I gives the average MAE over five years on applying the persistence and ELM models to the Amaranth wind farm for different forecast horizons. Fig. 1 illustrates the performance comparison of the two forecasting models with the MAE measure as a function of look-ahead time. Fig. 1 reveals that the ELM forecasting model outperforms the persistence model from the second look-ahead hour onwards. Also, the MAE of the persistence model reaches 39.022 for the 12-hours-ahead forecast, whereas in the case of the ELM model the MAE reaches only 25.282, which asserts a significant improvement. Furthermore, the proposed model is tested on two more wind farms of Ontario, Canada. TABLE II and TABLE III give the numerical results obtained for the Port Alma and Kingsbridge wind farms, respectively, on application of both the persistence and ELM forecasting models. Fig. 2 shows the performance comparison for the Port Alma wind farm on the basis of MAE. The figure reveals that in the case of Port Alma the ELM model also outperforms the persistence model from the second look-ahead hour onwards, with the MAE of the persistence model reaching 24.355 and the MAE of ELM reaching 17.189. However, in the case of the Kingsbridge wind farm the persistence model performs better than the ELM model for the first two look-ahead hours, as shown in Fig. 3; nevertheless, the ELM model outperforms the persistence model from the third look-ahead hour. The ELM model provided average MAEs of 21.010, 13.904 and 5.068 for the Amaranth, Port Alma and Kingsbridge wind farms, respectively, over the 12 look-ahead hours and the five-year period. Overall, the proposed forecasting model gives a significant improvement in all three cases.

TABLE I. AVERAGE MAE FOR DIFFERENT FORECASTING LEAD HOURS FOR THE AMARANTH WIND FARM

    Look-ahead time (hours)   Persistence      ELM
    1                             9.824     11.014
    2                            15.678     14.797
    3                            19.957     17.965
    4                            23.408     19.669
    5                            26.335     20.790
    6                            28.908     21.763
    7                            31.159     23.099
    8                            33.149     23.637
    9                            34.876     23.837
    10                           36.399     24.912
    11                           37.736     25.359
    12                           39.022     25.282

Fig. 1 Performance comparison of the persistence and ELM models with the MAE measure as a function of look-ahead time for the Amaranth wind farm

TABLE II. AVERAGE MAE FOR DIFFERENT FORECASTING LEAD HOURS FOR THE PORT ALMA WIND FARM

    Look-ahead time (hours)   Persistence      ELM
    1                             6.244      6.805
    2                             9.977      9.711
    3                            12.615     11.631
    4                            14.675     12.822
    5                            16.413     13.763
    6                            17.945     14.541
    7                            19.273     15.196
    8                            20.459     15.643
    9                            21.542     15.999
    10                           22.539     16.592
    11                           23.462     16.953
    12                           24.355     17.189

Fig. 2 Performance comparison of the persistence and ELM models with the MAE measure as a function of look-ahead time for the Port Alma wind farm

TABLE III. AVERAGE MAE FOR DIFFERENT FORECASTING LEAD HOURS FOR THE KINGSBRIDGE WIND FARM

    Look-ahead time (hours)   Persistence      ELM
    1                             2.182      2.407
    2                             3.508      3.523
    3                             4.461      4.268
    4                             5.246      4.799
    5                             5.919      5.093
    6                             6.509      5.391
    7                             7.027      5.615
    8                             7.481      5.766
    9                             7.905      5.869
    10                            8.280      5.941
    11                            8.625      6.052
    12                            8.939      6.099

Fig. 3 Performance comparison of the persistence and ELM models with the MAE measure as a function of look-ahead time for the Kingsbridge wind farm

V. CONCLUSION

Forecasting of wind power is critical to the operation of the power system. However, errors in wind power forecasting are inevitable because of the non-linear and stochastic nature of the weather system. Conventional neural network based forecasting models do not furnish adequate performance with regard to both accuracy and computational time. In this paper, extreme learning machine based deterministic forecasting of wind power is successfully applied; it produces results that are both faster and more accurate than those of the persistence model. This is because the learning speed of ELM is notably fast: it can train SLFNs much faster than classical learning algorithms. ELM can also be employed to train SLFNs with non-differentiable activation functions, which makes its application broader. Furthermore, a unique feature of this algorithm is that, besides reaching the smallest training error, it also tends to reach the smallest norm of weights, contrary to traditional gradient-based learning algorithms, which do not consider the magnitude of the weights. The proposed learning algorithm also does not encounter issues like local minima, overfitting and improper learning rates. All these advantages make ELM favourable for training SLFNs, which in turn provide more accurate results when exploited for wind power forecasting.
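The persistence benchmark of (10) and the MAE measure of (11) used throughout the evaluation can be sketched as follows. This is a minimal illustration on synthetic hourly data, not the study's actual evaluation pipeline:

```python
import numpy as np

def persistence_forecast(power, k):
    """Naive predictor of (10): the k-hours-ahead forecast equals the
    last measured value, p_hat(t + k | t) = p(t)."""
    return power[:-k]  # forecasts for hours k, k+1, ... issued at hours 0, 1, ...

def mae(actual, forecast):
    """Mean absolute error of (11)."""
    return np.mean(np.abs(actual - forecast))

# Synthetic hourly wind power series with a daily cycle (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(500)
power = 50 + 30 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)

k = 3  # 3-hours-ahead horizon
forecast = persistence_forecast(power, k)
actual = power[k:]
score = mae(actual, forecast)
```

As the look-ahead horizon k grows, the last measured value becomes a worse proxy for the future, so the persistence error rises with k; this is the same pattern visible in Tables I–III.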


ACKNOWLEDGEMENT

The authors would like to thank WEATHER BANK, Inc. for providing the wind speed and wind direction data and the Independent Electricity System Operator (IESO) for providing the wind power data.

REFERENCES

[1] B. Parsons, M. Milligan, B. Zavadil, D. Brooks, B. Kirby, K. Dragoon, and J. Caldwell, "Grid Impacts of Wind Power: A Summary of Recent Studies in the United States," in Proc. EWEC, Madrid, Spain, 2003.
[2] R. Doherty and M. O'Malley, "Quantifying Reserve Demands due to Increasing Wind Power Penetration," in Proc. IEEE Power Tech Conf., Bologna, Italy, vol. 2, pp. 23–26, 2003.
[3] R. Doherty and M. O'Malley, "A New Approach to Quantify Reserve Demand in Systems with Significant Installed Wind Capacity," IEEE Trans. Power Syst., vol. 20, no. 2, pp. 587–595, May 2005.
[4] M. Lange and U. Focken, Physical Approach to Short-Term Wind Power Prediction. New York, NY, USA: Springer, 2010.
[5] S. Salcedo-Sanz, A. M. Perez-Bellido, E. G. Ortiz-Garcia, A. Portilla-Figueras, L. Prieto, and D. Paredes, "Hybridizing the Fifth Generation Mesoscale Model with Artificial Neural Networks for Short-Term Wind Speed Prediction," Renew. Energy, vol. 34, no. 6, pp. 1451–1457, 2009.
[6] H. Madsen, P. Pinson, G. Kariniotakis, H. A. Nielsen, and T. S. Nielsen, "Standardizing the Performance Evaluation of Short Term Wind Power Prediction Models," Wind Eng., vol. 29, no. 6, pp. 475–489, 2005.
[7] R. G. Kavasseri and K. Seetharaman, "Day-ahead Wind Speed Forecasting Using f-ARIMA Models," Renew. Energy, vol. 34, no. 5, pp. 1388–1393, 2009.
[8] M. Yang, S. Fan, and W. J. Lee, "Probabilistic Short-Term Wind Power Forecast Using Componential Sparse Bayesian Learning," IEEE Trans. Ind. Appl., vol. 49, no. 6, pp. 2783–2792, 2013.
[9] W.-Y. Chang, "Application of Back Propagation Neural Network for Wind Power Generation Forecasting," Int. Journal of Digital Content Tech. and Its Applications, vol. 7, no. 4, February 2013.
[10] G. Sideratos and N. Hatziargyriou, "An Advanced Radial Base Structure for Wind Power Forecasting," Int. Journal of Power and Energy Sys., vol. 12, pp. 1–10, 2008, ACTA Press.
[11] A. U. Haque and J. Meng, "Short-Term Wind Speed Forecasting Based on Fuzzy ARTMAP," Int. J. Green Energy, vol. 8, no. 1, pp. 65–80, 2011.
[12] M. Mohandes, T. Halawani, S. Rehman, and A. A. Hussain, "Support Vector Machines for Wind Speed Prediction," Renew. Energy, vol. 29, no. 6, pp. 939–947, 2004.
[13] D. Lee and R. Baldick, "Short-Term Wind Power Ensemble Prediction Based on Gaussian Processes and Neural Networks," IEEE Trans. Smart Grid, vol. 5, no. 1, pp. 501–510, January 2014.
[14] G. Kariniotakis, G. Stavrakakis, and E. Nogaret, "Wind Power Forecasting Using Advanced Neural Network Models," IEEE Trans. Energy Convers., vol. 11, no. 4, pp. 762–767, December 1996.
[15] N. Amjady, F. Keynia, and H. Zareipour, "Wind Power Prediction by a New Forecast Engine Composed of Modified Hybrid Neural Network and Enhanced Particle Swarm Optimization," IEEE Trans. Sustain. Energy, vol. 2, no. 3, pp. 265–276, Jul. 2011.
[16] A. U. Haque, M. H. Nehrir, and P. Mandal, "A Hybrid Intelligent Model for Deterministic and Quantile Regression Approach for Probabilistic Wind Power Forecasting," IEEE Trans. Power Systems, vol. 29, no. 4, pp. 1663–1672, July 2014.
[17] P. Pinson and G. Kariniotakis, "Wind Power Forecasting Using Fuzzy Neural Networks Enhanced with On-line Prediction Risk Assessment," in Proc. IEEE Power Tech Conf., Bologna, Italy, vol. 2, p. 8, June 2003.
[18] M. Negnevitsky, P. Johnson, and S. Santoso, "Short Term Wind Power Forecasting Using Hybrid Intelligent Systems," in Proc. IEEE Power Eng. Soc. General Meeting, pp. 1–4, June 2007.
[19] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme Learning Machine: Theory and Applications," Neurocomputing, vol. 70, pp. 489–501, 2006.
[20] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks," in Proc. Int. Joint Conf. on Neural Networks, Budapest, Hungary, vol. 2, pp. 985–990, Jul. 25–29, 2004.
[21] G.-B. Huang and H. A. Babri, "Upper Bounds on the Number of Hidden Neurons in Feedforward Networks with Arbitrary Bounded Nonlinear Activation Functions," IEEE Trans. Neural Networks, vol. 9, no. 1, pp. 224–229, 1998.
[22] G.-B. Huang, "Learning Capability and Storage Capacity of Two-Hidden-Layer Feedforward Networks," IEEE Trans. Neural Networks, vol. 14, no. 2, pp. 274–281, 2003.
