Applied Energy 158 (2015) 490–507
Journal homepage: www.elsevier.com/locate/apenergy

Comparison of feature selection methods using ANNs in MCP-wind speed methods. A case study

José A. Carta a,⇑, Pedro Cabrera a, José M. Matías b, Fernando Castellano c

a Department of Mechanical Engineering, University of Las Palmas de Gran Canaria, Campus de Tafira s/n, 35017 Las Palmas de Gran Canaria, Canary Islands, Spain
b Department of Statistics, University of Vigo, Lagoas Marcosende, 36200 Vigo, Spain
c Renewable Energies Department, Canary Islands Institute of Technology (ITC), Playa de Pozo Izquierdo s/n, 35119 Santa Lucía – Las Palmas, Spain

Highlights

• An analysis is carried out of the benefits of feature selection in MCP methods which use ANNs.
• The wrapper approach (WA) generated lower mean errors than the filter approach (FA).
• No significant statistical difference was observed between the WA and the FA in certain cases.
• The FA generated models somewhat simpler and more interpretable than the WA.
• The WA displayed better predictive capacity than the FA, but is more computationally intensive.

Article info

Article history:
Received 8 May 2015
Received in revised form 27 July 2015
Accepted 21 August 2015
Available online 7 September 2015

Abstract

Recent studies in the field of renewable energies, and specifically in wind resource prediction, have shown growing interest in proposals for Measure–Correlate–Predict (MCP) methods which simultaneously use data recorded at various reference weather stations. In this context, the use of a high number of reference stations may result in overspecification with its associated negative effects. These include, amongst others, an increase in the estimation error and/or overfitting which could be detrimental to the generalisation capacity of the model when handling new data (prediction).
This paper analyses the benefits of feature selection for use with Artificial Neural Network (ANN) techniques with a multilayer perceptron (MLP) structure when the ANNs are used as MCP methods to predict mean hourly wind speeds at a target site. The features considered in this study were the mean hourly wind speeds and directions recorded in 2003 and 2004 at five weather stations in the Canary Archipelago (Spain).

The two feature selection techniques considered in the analysis were the Correlation Feature Selection (CFS), which is a correlation-based filter approach (FA), and an MLP-based wrapper approach (WA). The metrics used to compare the results were the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the index of agreement (IoA).

Evaluation of the mean errors obtained in the 10-fold cross-validation tests for the year used to represent the short-term wind data period resulted in several conclusions. These included, notably, that the WA gave lower mean errors than the FA in 100% of the cases analysed independently of the metric employed. However, the FA resulted in a significant reduction in computational load and considerable enhancement of model interpretability. When very good correlation coefficients were obtained between the target and reference stations, no significant statistical difference was observed at 5% level between the three models (FA, WA and the models constructed with all the variables) in most of the cases analysed.

© 2015 Elsevier Ltd. All rights reserved.

Keywords:
Measure–correlate–predict method
Artificial neural networks
Wind speed
Wind direction
Feature selection
Cross-validation technique

⇑ Corresponding author. Tel.: +34 928 45 14 83; fax: +34 928 45 14 84.
E-mail address: [email protected] (J.A. Carta).

http://dx.doi.org/10.1016/j.apenergy.2015.08.102
0306-2619/© 2015 Elsevier Ltd. All rights reserved.

Nomenclature

A           parameter defined by Eq. (2)
AEMET       State Meteorological Agency of the Ministry of the Environment and Rural and Marine Environs of the Spanish Government (Spanish initials)
ANOVA       analysis of variance
ANNs        artificial neural networks
agl         above ground level
B           parameter defined by Eq. (2)
BH          Benjamini and Hochberg step-up procedure [49]
C           parameter defined by Eq. (2)
CCC         Circular Correlation Coefficient, Eq. (1)
CFS         Correlation Feature Selection
CL          circular-linear correlation coefficient, Eq. (3)
CPU         Central Processing Unit
D, D0       variables that represent wind direction (degrees)
D1, ..., D5 variables that represent wind directions of weather stations no. 1, ..., 5, respectively
E           parameter defined by Eq. (2)
e           vector which contains the estimated values of the wind speed (m/s) at a target site, Eqs. (5)–(7)
F           parameter defined by Eq. (2)
FA          filter approach
FA-1, FA-2  used to mark the itinerary followed in the use of the filter approach (FA), Fig. 7
FS          full feature set (no feature selection)
G           parameter defined by Eq. (2)
GNU         recursive acronym which stands for GNU is Not Unix
H           parameter defined by Eq. (2)
H0          null hypothesis, Eq. (8)
H1          alternative hypothesis, Eq. (8)
IoA         index of agreement, Eq. (7)
ITC         Technological Institute of the Canary Islands (Spanish initials)
MAE         mean absolute error, Eq. (5) (m/s)
MAPE        mean absolute percentage error, Eq. (6) (%)
MCP         Measure–Correlate–Predict
MLP         multilayer perceptron
n           number of data, Eqs. (1), (2), (5), (6) and (7)
ncv         number of errors obtained in 10-fold cross-validation
N           number of reference stations, Fig. 5
o           vector which contains the observed wind speed values (m/s) at a target site, Eqs. (5)–(7)
ō           arithmetic mean in m/s of observed wind speed values, Eq. (7)
p-value     the estimated probability of rejecting the null hypothesis (H0) of a study question when that null hypothesis is true [50]
Q           parameter defined by Eq. (2)
q           number of neurons in the hidden layer of the neural network, Fig. 5
rcs         correlation coefficient, Eq. (4)
rvc         correlation coefficient, Eq. (4)
rvs         correlation coefficient, Eq. (4)
S           variable representing wind speed (m/s)
S1, ..., S5 variables which represent the wind speeds of weather stations number 1, ..., 5, respectively
WA          wrapper approach
WA-1, WA-2  used to mark the itinerary followed in the use of the wrapper approach (WA), Fig. 7
Weka        Waikato Environment for Knowledge Analysis
WS          weather station
WS-k        weather station identified with the number k

Greek letters
α           level of statistical significance
μi, μj      population means of models i and j, Eq. (8)

1. Introduction

1.1. Background

The feasibility of implementing a wind energy conversion system at a target site depends fundamentally on the wind regime at the site in question [1–3]. In view of this, and as a consequence of the interannual variability of wind speed, long series of wind data are required [1,4–9] to enable estimation of the mean characteristics of the wind resource over the useful life of a wind project with the least degree of uncertainty possible. If only short series of wind data are available for a particular target site, a common way of at least partially overcoming this drawback is the use of methods which attempt to estimate the mean long-term characteristics of the wind resource at the target site by using long historical wind data series recorded at nearby weather stations (WSs). Among such methods are those widely known as MCPs (Measure–Correlate–Predict) [10].

As reported by Carta et al. [10], a considerable number of MCPs have been proposed in the scientific literature and new techniques continue to be formulated [11,12]. Most of these MCP methods use a single reference WS and employ linear equations [13] or probabilistic equations [14]. The parameters of these models are determined with wind data series recorded in a concurrent short-term time period at the target and reference sites and the equations or functions are then used to estimate the long-term wind data at the target site [10].

However, a considerable number of authors [2,12,15–23] argue that it may be beneficial to use several as opposed to a single reference station, as the use of various stations may be able to capture details of the wind resource at a target site that would otherwise have been missed if only one reference station had been used. A variety of data mining techniques have been proposed in this respect which allow simultaneous use of the information provided by a number of reference stations. Perhaps the most notable of the proposed techniques are those based on biologically-inspired algorithms, the so-called artificial neural networks or ANNs [24,25], and more specifically multilayer perceptron (MLP) neural networks [3,11,12,18–23,26,27]. Lopez et al. [21] and Velázquez et al. [22] highlight the importance of using the wind speeds and directions of the reference stations as features, especially in sites of complex terrain. Though Velázquez et al. [22] reported in their studies that predictive capacity tended to increase with the number of reference stations, they also noted that not all the reference stations that were used improved the results. However, these authors did not contemplate the possibility of reducing the number of features, including only the variables (speed or direction) of each reference station that might really be relevant. Aside from this, Brower [2], in reference to linear regression models, warns that the use of a large number of reference stations, especially if they are strongly correlated, may give rise to model overspecification, which could be detrimental to the generalisation capacity of the model when handling new data (prediction).

1.2. Aim of this paper

As previously stated, a growing trend has been noted in proposals that employ MCP methods, which simultaneously use as refer- [...]

[...] process [33]. An additional comparison will involve statistical hypothesis testing to determine whether there is a significant statistical difference (at 5% level) between the results obtained with the three strategies considered (FA, WA and FS).
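Comparing three strategies pairwise (FA vs WA, FA vs FS, WA vs FS) means running several tests at once, and the nomenclature lists the Benjamini and Hochberg (BH) step-up procedure [49], which is a standard way of handling that situation. How the paper applies it is not shown in this section, so the following is only a minimal sketch of the generic BH rule. The function name `benjamini_hochberg` and the p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one reject/keep decision per p-value (BH step-up at level alpha)."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject the null hypothesis for the k smallest p-values.
    reject = [False] * m
    for idx in order[:largest_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values for the three pairwise comparisons (not from the paper).
comparisons = ["FA vs WA", "FA vs FS", "WA vs FS"]
p_values = [0.004, 0.300, 0.020]
for name, rejected in zip(comparisons, benjamini_hochberg(p_values)):
    print(name, "significant" if rejected else "not significant")
```

Because the rule is step-up, a small p-value can be rejected even when it misses its own rank threshold, provided a larger p-value meets its threshold further up the ordering.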