Data Management System for Energy Analytics and Its Application to Forecasting
Total Page:16
File Type:pdf, Size:1020Kb
Data Management System for Energy Analytics and its Application to Forecasting Francesco Fusco Ulrike Fischer Vincent Lonij IBM Research Ireland IBM Research Ireland IBM Research Ireland [email protected] ufi[email protected] [email protected] Pascal Pompey Jean-Baptiste Fiot Bei Chen IBM Research Ireland IBM Research Ireland IBM Research Ireland [email protected] jean- [email protected] baptiste.fi[email protected] Yiannis Gkoufas Mathieu Sinn IBM Research Ireland IBM Research Ireland [email protected] [email protected] ABSTRACT forecasting, are essential operations for the optimization of The effective management of a power grid with an increas- the power flow in a smart grid. For the successful applica- ing share of (distributed) renewables and more and more tion of such operations the reliable processing and manage- available data, e.g., coming from smart meters, heavily re- ment of the data collected by a smart grid is a key factor. lies on advanced data analytics such as demand and supply Due to the increasing number of diverse devices consider- forecasting. In this context, data management is one ma- able amount of data is produced by a smart grid, leading to jor challenge in electric grids. Large amount of data from a trend of emerging big data architectures and the discussion multiple heterogeneous sources require transformations, e.g., of technical challenges as well as potential use cases [1, 2]. spatio-temporal alignment or anomaly detection, to serve Data management is one major challenge in electric grids data analytics tasks and are often applied on different views as data is incomplete in nature, heterogeneous, difficult to of the data, e.g., on state, substation or feeder level. merge and arrives at different rates [3]. Energy data ar- In this paper, the progress on the development of an en- rives from various distributed sources, e.g., supervisory con- ergy data management systems for the electricity grid is pre- trol and data acquisition (SCADA) systems, smart meters, sented. The design of the system was inspired by the real- renewable generation systems, and differs significantly in world use case of forecasting short-term energy demand in terms of format, resolution and quality. Moreover, data Vermont, using data from a combination of SCADA, smart might represent different views of a power system, e.g., load meters and weather forecasting services. A general data at feeder level or over a whole substation. Analytics tasks, model addressing the aforementioned challenges and aimed such as demand forecasting, also require contextualisation at supporting advanced data analytics is introduced. The with additional data sets, such as calendar information and proposed data model views a time series as an abstract con- weather forecasts. The latter might come from different cept that might represent raw measurements or arbitrary weather services, again arriving at different rates and in dif- operations. The benefits of the system is demonstrated for ferent time and location resolutions. The system needs to the design and live update energy demand forecasts. be able to consolidate all these diverse data sets and pro- vide a common view. Analytics are then rarely performed on raw data, but require transformations of the data such as the computation of weighted averages or anomaly detec- 1. INTRODUCTION tion. The system has to support, store and continuously re-compute such transformation so that analytics can be The smart grid is the next-generation power system char- continuously applied. acterized by the inclusion of highly-distributed intelligent In this paper, we explicitly address the challenge of data devices and communication technologies that manage elec- management in a smart grid, proposing a data management tricity demand in a sustainable, reliable and economic man- architecture for the smart grid that is, on the one side, able ner. Advanced data analytics, such as load classification or to manage such diverse data sets and, on the other side, sup- ports operations, such as forecasting, that perform various transformation on the available data. To achieve this, we introduce a generic data model for the energy domain and show how this data model can be applied for the use case of c 2016, Copyright is with the authors. Published in the Workshop Pro- short term energy demand forecasting. ceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bor- There have been other efforts in the design of smart grid deaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this architectures. For example, Yang et. al. [4] give a high-level paper is permitted under the terms of the Creative Commons license CC- overview for a smart grid big data management system, in- by-nc-nd 4.0 troducing a simple data model, but mainly discussing data EnDM ’16 Bordeaux, France Data ingestion Modeling Runtime env. Data Feeds (off-line, runtime) (off-line) interval energy data from MV-90. Where required, energy demand or distributed generation is obtained by aggregating Data curation FeatureFeature selection selection Data curation ModelModel scoring scoring Spatio-temporal ModelModel training training Weather Spatio-temporal On-lineOn-line learning learning alignment RefinementsRefinements data alignment AnomalyAnomaly flags flags data from AMI up to the desired level of the grid (feeder, FilteringFiltering MetricsMetrics & & KPIs KPIs External substation). IBM Deep Thunder [12] was used to obtain applications Smart weather predictions up to 72 hours ahead. Deep Thunder Meters WebWeb Portal Portal InterfaceInterface APIsAPIs DataData produces weather forecasts on a grid with 1 km resolution ModelModel RESTREST APIs APIs SQLSQL Spark,Spark, HBASEHBASE and 10 minute time steps. Forecasts are updated twice per SCADA, ICCP,MV90 MarketMarket biddingbidding day and each run produces about 300 gigabytes of data. In- ...... ternally, the data are stored in a combination of relational Metadata, HDFS NetCDF DB2 Asset data - smart meters - Gridded database management systems (RDBMS) for electrical asset - Weather feat. weather data - Power data - Metadata data and SCADA/MV-90, Hadoop distributed file system - Models Data storage (HDFS) for the AMI data, network common data format (NetCDF) for the weather gridded data. Figure 1: Architecture diagram. As shown in Fig. 1, the data management system supports the tasks of: data ingestion, curation and spatio-temporal alignment of energy and weather data; training forecasting distribution and load balancing. Using cloud computing as models, which requires retrieving raw data and designing a platform for smart grid data management [5] and pro- input features (covariates); retrieving data of covariates and viding time series analytics as a service [6] poses another scoring forecasting models at runtime; interfaces to client trend in this area. A cloud-based demand response plat- applications (e.g. web portal, market bidding services). form is introduced in [7], which provides a workflow for data ingestion and scalable demand forecasting on Hadoop. To handle large amount of data and enable real-time reactions data streaming techniques have also been applied for the 3. THE DATA MODEL smart grid context [8]. A real-time data management sys- In order to manage highly heterogeneous sets of data and tem was developed within the MIRABEL project with focus support a variety of analytical operations typical of energy on storing and processing special energy planning objects for and utilities, as the ones detailed in section 2, a data model demand-response [9]. In the specific context of modeling en- was developed. The main objective of the data model was ergy data, the representation of time series and flex-offers providing a transparent, high-level interface to the users and within the context of distributed energy markets has been client applications, as well maintaining consistency, integrity studied in [10], while an ontology for big data in the smart and traceability between the various data sources. grids has been considered in [11], with particular focus on Figure 2 shows a conceptual structure of the proposed offshore wind farms. The proposed data model specifically data model. All dynamic data in an electricity grid can be addresses the management of dynamic time series data, as represented as a time series. Analogue, operations on the well as the operations and analytics applied to them. It can data applied by analytics also produce time series, such as therefore be seen as complementary to the reviewed research lags or rolling-window forecasts. Consequently, at the core of efforts and might serve as a basis for other existing designs the proposed data model is the representation of time series, of smart grid architectures. including timestamp/value pairs and abstract operations, as In the remainder of the paper a general system architec- described in section 3.1. The data model also contextualises ture supporting data management and analytics tasks for the time series with respect to a physical quantity (e.g. en- energy utilities is introduced in section 2. The paper then ergy demand, power, temperature) and entity (e.g. substa- focuses on the data model and query operations at the core tion, service territory), through the concept of signals, as of the proposed system in section 3. To demonstrate the us- discussed in section 3.2. Finally, the representation of ana- ability of our system, in section 4, the use case of short-term lytical models, which links the output time series of complex energy demand forecasting is discussed and concrete exam- operations, such as energy forecasts, to one or more several ples on the stored data and query operations are given. Final input time series, is then detailed in section 3.3. conclusions and future possibilities are discussed in section 5. 3.1 Time series At the core of the data model is the abstract concept of 2. SYSTEM ARCHITECTURE a TimeSeries, which in general can represent materialized A data management system was developed, conceptually data or abstract operations.