A Multidimensional Model for Building Energy Management

José Miguel Castanheira Cavalheiro

Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering

Supervisor: Prof. Dr. Paulo Jorge Fernandes Carreira

Examination Committee
Chairperson: Prof. Dr. João Emílio Segurado Pavão Martins
Supervisor: Prof. Dr. Paulo Jorge Fernandes Carreira
Member of the Committee: Prof. Dr. Diogo Manuel Ribeiro Ferreira

November 2015

“Difficulties mastered are opportunities won.”
- Winston Churchill

Acknowledgments

I would like to thank my supervisor, Professor Paulo Carreira, for his high standards, for his commitment, and for his criticism, comments, and suggestions.

To all those who followed the progress of this dissertation and contributed criticism and suggestions.

To my family, especially my parents, my sister, and my grandparents, for all the support they have always given me so that I could successfully complete this stage.

To all the friends I made during my time at IST, and to the countless colleagues with whom I spent endless hours developing projects. In particular, I thank Alberto Carvalho, João Murtinheira, Marta Baptista, Nuno Duarte, Nuno Salvador, and Sebastião Freire for their strong mutual support and for the moments we shared.

Thank you.

Resumo (Summary)

Data organization is a crucial aspect of managing data related to the energy consumption of buildings. Despite the importance of the topic, no reference models for managing building energy data have been proposed in the literature. For this reason, this thesis proposes a reference data model, developed according to the best practices of multidimensional modelling and improved iteratively according to reviews by users experienced in the energy domain. The quality of the model is also evaluated according to complexity, usability, and design quality metrics. In addition, a prototype of an energy management system is developed based on the proposed model and subsequently validated with energy managers from several organizations. The result is therefore a high-quality multidimensional model that can be reused to create or improve the models of existing energy management systems.

Keywords: Energy Management, Energy Meter, Data Warehouse, Decision Support, Multidimensional Model

Abstract

Data organization is a critical aspect in Building Energy Data Management. Yet, despite the importance of the topic, no sound reference model for energy data has been proposed in the literature. This work proposes a reference data model developed according to standard multidimensional modelling methodologies and improved iteratively in review meetings with users (knowledgeable in the energy management domain). The quality of the model is evaluated according to complexity, usability, and design metrics. Moreover, a BEMS prototype is built upon our model proposal, and validated with experienced energy managers. The end-result is a high-quality re-usable multidimensional data model that can be applied to create or improve on the data model designs of building energy management systems.

Keywords: Energy Management, Energy Metering, Data Warehousing, Decision Support, Multidimensional Model

Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Nomenclature
Glossary

1 Introduction
  1.1 Problem Definition
  1.2 Methodology and Contributions
  1.3 Document Structure

2 Concepts
  2.1 Decision Support Systems
  2.2 Building Energy Management Systems
    2.2.1 Architecture of Building Energy Management Systems
    2.2.2 User Characterization
    2.2.3 Comparison between BEMSs and DSSs
  2.3 DW for Energy Management
    2.3.1 Multidimensional Model for Energy Management
    2.3.2 The Energy Data Cube
    2.3.3 Slowly Changing Dimensions
  2.4 Integration of Energy-Related Data Sources
  2.5 Energy Data Quality

3 Related Work
  3.1 Energy Information Systems
  3.2 Energy Management Systems Standards and Guides
  3.3 DW Models for Building Energy Management
    3.3.1 Multidimensional Model Proposals in Literature
    3.3.2 Energy Data Analysis Activities
  3.4 Evaluation of Conceptual Models
  3.5 Evaluation of Multidimensional Models
  3.6 Energy Data Visualization and Reporting
  3.7 Discussion

4 A Multidimensional Model Solution
  4.1 Multidimensional Model Development
    4.1.1 Building Energy Management Related Data Sources
    4.1.2 Building Energy Management Business Processes
  4.2 Multidimensional Model Solution Description
    4.2.1 Design Choices
    4.2.2 Multiple Transaction Fact Tables
    4.2.3 Multi-valued Dimensions
    4.2.4 Role-playing Dimensions
    4.2.5 Aggregate Fact Tables
    4.2.6 Variable Depth Hierarchies and Hierarchy Bridges

5 Multidimensional Model Validation
  5.1 Metrics and Metric Selection
  5.2 Multidimensional Model Validation Methodology
    5.2.1 Workload Tests
    5.2.2 End-User Review Sessions
  5.3 Metrics Evaluation Results
    5.3.1 Structural and Cognitive Complexity Metric Results
    5.3.2 Usability Metric Results
    5.3.3 Design Quality Metric Results
  5.4 Model Validation Findings
    5.4.1 Workload Tests Findings
    5.4.2 User Review Sessions Findings
  5.5 Discussion

6 BEMS Prototype Solution
  6.1 BEMS Prototype Development Context
  6.2 ETL Workflows Development
    6.2.1 Choosing the Data Integration Tool
    6.2.2 Determining Systems-of-Record and Extraction Procedures
    6.2.3 Extracting Data from Sources
    6.2.4 Establishing Data Source Priorities
    6.2.5 Overcoming Data Quality Problems
    6.2.6 Determining Slowly Changing Dimensions
    6.2.7 Designing the Integration Processes
  6.3 OLAP Web Server Architecture Overview
  6.4 Repository Overview
  6.5 Energy Management Web Applications
    6.5.1 Application Use Case
    6.5.2 Detailed Consumption Analysis Charts
    6.5.3 Space Comparison Analysis Charts
    6.5.4 Year Comparison Analysis Charts
    6.5.5 Energy-related Factors Analysis Charts
    6.5.6 A4 Occupation & Activities Analysis of a Lecture Room
    6.5.7 Energy Costs Simulator
    6.5.8 Peak Load Analysis Charts
  6.6 Lessons Learned during BEMS Prototype Development
  6.7 BEMS Prototype Evaluation Context
  6.8 BEMS Prototype Evaluation Methodology
  6.9 BEMS Prototype Evaluation Results
    6.9.1 Usability and Performance Evaluation
    6.9.2 Evaluation of Energy Data Analysis Methods
    6.9.3 Evaluation of Individual Charts
    6.9.4 Discussion

7 Conclusions
  7.1 Impact
  7.2 Future Work

Bibliography

A IST University Context Description

B Multidimensional Model Relational Schema

C ETL Workflows
  C.1 ETL Workflows Dependencies Hierarchy
  C.2 Workflows Figures
    C.2.1 Job used to load IST data
    C.2.2 Dimension Tables Workflows
    C.2.3 Fact Tables Workflows

D Mondrian XML Schema

E BEMS Prototype Evaluation Questionnaire
  E.1 Background Information
  E.2 Usability Evaluation
  E.3 Technical Evaluation

List of Tables

2.1 Comparison of DSS and BEMS architecture components

3.1 Comparison of multidimensional evaluation procedures of multidimensional model proposals across several industries
3.2 Summary of multidimensional model quality metrics literature reports

4.1 Output of the four-step design process of the Modelling phase, from Kimball lifecycle
4.2 Representation of the developed bus matrix
4.3 Example of a building space hierarchy bridge table

5.1 Summary of metrics results obtained during the execution of the model evaluation and improvement process cycle

6.1 Summary of integrated data sources, their origins, data formats, extraction procedures, and the mapping between model tables and their systems-of-record
6.2 Summary of algorithms tested in order to find matches between space names
6.3 Summary of energy management analysis and reporting techniques and model facts and dimensions, and their association with energy consumption analysis web charts
6.4 Description of usability questions posed during interviews with energy managers
6.5 Demographic information, education, years of experience, and BEMS usage proficiency of the interviewed energy managers

List of Figures

2.1 Main components of a BEMS
2.2 Main architecture components of a DW
2.3 Example of a multidimensional data cube

4.2 Representation of the complete multidimensional model
4.3 Representation of energy meter readings fact table
4.4 Representation of the group bridge tables between meter readings fact table, and equipment dimension
4.5 Representation of the fact table that records organization members space occupancy over time

5.1 Representation of the multidimensional model evaluation and improvement process cycle
5.2 Example of a query used to determine the daily average energy consumption of the university campus buildings
5.3 Representation of time and date dimension hierarchies

6.1 Representation of space ETL workflow
6.2 Representation of the switch case step used to parse each activity type separately
6.3 Representation of the initial part of energy costs ETL workflow
6.4 Representation of the highest level ETL workflow
6.5 Example of the interactions between BEMS prototype system components required to update an energy consumption analysis web chart
6.6 Snapshot of the detailed consumption analysis charts
6.7 Snapshot of space comparison analysis charts
6.8 Snapshot of occupation & activities analysis chart of the lecture room A4
6.9 Snapshot of peak load analysis charts
6.10 Results of the usability questionnaire, including the maximum, minimum and average values
6.11 Results of the average BEMS prototype interface response time for each of the 11 interviews
6.12 Summary of the number of user requests for each category of functionality missing on the BEMS prototype

C.1 Representation of the highest level ETL workflow
C.2 Representation of the transformation that loads time dimension data
C.3 Representation of the transformation that creates minutes data table
C.4 Representation of the transformation that loads date dimension data
C.5 Representation of the transformation that creates a calendar years data table
C.6 Representation of the job that loads space dimension data
C.7 Representation of the transformation that loads parent space data
C.8 Representation of the transformation that loads child space data
C.9 Representation of the transformation that loads spaces description data
C.10 Representation of the transformation that loads space hierarchy bridge data
C.11 Representation of the transformation that validates start/end date
C.12 Representation of the transformation that loads activities dimension data
C.13 Representation of the transformation that parses activities data
C.14 Representation of the transformation that creates calendar days data table
C.15 Representation of the transformation that loads occupancy fact table data
C.16 Representation of the transformation that loads energy readings fact table data
C.17 Representation of the transformation that loads weather readings fact table data
C.18 Representation of the transformation that extracts and transforms weather variables data
C.19 Representation of the transformation that loads degree days data
C.20 Representation of the transformation that loads energy costs fact table data
C.21 Representation of the transformation that aggregates energy data peak demand period

Glossary

API: Application Programming Interface
BASIC-EIS: Basic Energy Information System
BEMS: Building Energy Management System
DRS: Demand Response System
DSA: Data Staging Area
DSS: Decision Support System
DW: Data Warehouse
EEM: Enterprise Energy Management System
EIS: Energy Information System
ETL: Extract, Transform, Load
HTML: HyperText Markup Language
HVAC: Heating, Ventilating, and Air Conditioning
IST: Instituto Superior Técnico
IS: Information System
JSON: JavaScript Object Notation
MDX: Multidimensional Expressions Language
MOLAP: Multidimensional Online Analytical Processing
OLAP: Online Analytical Processing
PDI: Pentaho Data Integration
REST: Representational State Transfer
ROLAP: Relational Online Analytical Processing
SCD: Slowly Changing Dimension
URI: Uniform Resource Identifier
WEB-EMCS: Web Energy Management and Control System
Wi-Fi: Local Area Wireless Computer Networking
XML: Extensible Markup Language
XPATH: XML Path Language

Chapter 1

Introduction

Energy management is a crucial activity to sustain the competitive advantage of facilities and organizations. The primary concern of energy management is continually finding sources of waste and opportunities for improvement, which result in increased energy efficiency [1,2].

To find energy-saving opportunities, energy consumption data must be analysed in the light of the factors that influence it. In buildings, this means analysing energy consumption in terms of multiple dimensions such as the arrangement of spaces, the specificities of constructive elements, the characteristics of the installed/commissioned equipment, and, ultimately, the behaviour of the occupants [3,4,5,6,7].

Decision Support Systems (DSSs) are well known in management for supporting this sort of multidimensional analysis. They enable managers to analyse vast amounts of data, identify relevant knowledge, and choose among different courses of action [8,9]. DSSs collect data from multiple sources and store it in a data repository known as a Data Warehouse (DW) [10]. In this repository, data is organized under a global unified schema that facilitates data analysis and presentation [11]. Moreover, using this reference schema, commonly known as a multidimensional model [12], distinct tools are able to cooperate, enabling managers to integrate, analyse and visualize large volumes of data. This degree of separation of concerns largely explains the success of DSSs.

Building Energy Management Systems (BEMSs) are the decision support systems that support the energy management processes, which consist of monitoring, analysing, controlling, and optimizing energy usage. Overall, BEMSs minimize energy consumption, and maximize productive conditions and energy efficiency [13].

BEMSs comprise activities such as (i) consolidating energy-related data from different sources, (ii) using data access tools to analyse building performance, (iii) visualizing energy-related data, and (iv) generating reports [14, 15]. All these activities must access a common data model.

Although the creation of multidimensional models is by now well established in the information systems domain [11, 12], creating a reference multidimensional model for energy management is hard. The explanation lies in the difficulty of obtaining precise, detailed requirements regarding energy management activities. First, the existing energy management standards such as I. S. 393:2005 [16], ANSI/MSE 200:2008 [17], BS EN ISO 16001:2009 [18], and BS EN ISO 50001:2011 [19] do not agree on precise business requirements for energy management. Second, these standards do not deliver appropriate detail to enable deriving accurate information requirements [20]. In addition, it is well known that business process systematization is essential to obtain an accurate model formulation. Without it, many formulations are possible, but they are either incomplete or inaccurate, thus resulting in increased development and maintenance costs [21].

The lack of a quality information model can be observed in BEMSs that have confusing user interfaces (which force users to throw away large amounts of data [22, 23]) and limited analysis capabilities (often forcing energy managers to use spreadsheets to process and analyse energy data) [2, 24]. Indeed, the negative impacts resulting from poor engineering practices related to BEMSs have been documented and traced to high development and maintenance costs [22, 25].

This work develops and validates the design of a multidimensional model for building energy management that seamlessly supports a broad range of energy-related data analysis activities. Our model proposal is grounded on well-established principles of multidimensional modelling and patterns developed by Kimball et al., thus achieving a high quality model that is simple to use and modify [26]. We validate our model according to multidimensional model quality metrics, including complexity, usability, and design quality metrics.

Moreover, we study energy data integration, data visualization, and reporting techniques, and develop a BEMS prototype to validate the model with energy managers.

1.1 Problem Definition

Despite a few sparse contributions [15, 27], no reference multidimensional model that supports a broad range of the activities underlying building energy management has been documented in the literature. Moreover, the quality of existing models was never validated, which is explained by the lack of proposals regarding the evaluation of multidimensional models and of standards defining how to evaluate their quality. A reference model for energy management would promote best practices, encoding a set of reusable requirements and easing the agreement between users and developers on what the system should fulfil [28, 29, 30]. Furthermore, a high quality model would help stakeholders find requirement errors and suggest modifications according to changing business needs, thus positively impacting the effort and cost required for BEMS development [28, 31, 32].

1.2 Methodology and Contributions

This work aims at developing and validating a multidimensional model for building energy management. To this aim, we first analyse the requirements of data warehousing and multidimensional models for energy management decision support. In particular, we review literature on energy information systems, energy management standards, existing multidimensional model proposals, energy data analysis techniques, and energy data visualization methods.

Secondly, we develop the model according to the methodology proposed by Kimball et al. (known as the Kimball lifecycle), which follows a bottom-up approach [26]. This methodology includes a dimensional modelling phase that comprises identifying (i) the business processes that are the design targets, (ii) the granularity of data captured by the processes, (iii) the dimensions that describe the business context, and (iv) the resulting measurements (facts). The subsequent steps are building the bus matrix (which maps business processes to dimensions) and designing the model. Overall, the Kimball et al. methodology is known for easing user collaboration in the development process, thereby compensating for the lack of systematization of energy management business processes [26].

Thirdly, we study how to validate our proposed multidimensional model, reviewing the evaluation methods used for multidimensional model proposals and the model quality metrics proposed in the literature. As a result, we define an evaluation methodology in which the model is validated with quality metrics, tested against a wide range of queries, and reviewed by users with energy management domain knowledge. These procedures are part of an iterative process, where the model is continuously analysed and improved.

Finally, the model is used as a blueprint for the development of a BEMS prototype. The prototype development consists of extracting, integrating, and loading energy-related data from different sources, creating an OLAP server that responds to data analysis requests, and developing energy data analysis tools that users interact with. As a result, we are able to perform a further model evaluation, based on interviews with experienced energy managers. The purpose of those interviews is to evaluate the usability, performance, and functionality (energy consumption analysis tools) of the BEMS prototype, assessing whether the model supports BEMS functionalities and does not compromise system performance.
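As an illustration of the bus matrix mentioned above, the following minimal Python sketch maps business processes to the dimensions that describe them. The process and dimension names are hypothetical placeholders chosen for illustration; they are not the bus matrix developed in this work (see Table 4.2).

```python
# Hypothetical business processes mapped to the dimensions that describe them.
# These names are illustrative assumptions, not the bus matrix of this work.
BUS_MATRIX = {
    "energy_meter_reading": {"date", "time", "space", "equipment"},
    "weather_reading": {"date", "time", "space"},
    "space_occupancy": {"date", "time", "space", "member"},
}


def conformed_dimensions(matrix):
    """Return the dimensions shared by every business process."""
    dimension_sets = list(matrix.values())
    return set.intersection(*dimension_sets) if dimension_sets else set()


def render(matrix):
    """Render one text row per process, with 'X' where a dimension applies."""
    dims = sorted(set().union(*matrix.values()))
    width = max(len(process) for process in matrix)
    lines = [" " * width + "  " + "  ".join(dims)]
    for process, used in matrix.items():
        cells = "  ".join(("X" if d in used else "-").center(len(d)) for d in dims)
        lines.append(process.ljust(width) + "  " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(BUS_MATRIX))
    print("Conformed dimensions:", sorted(conformed_dimensions(BUS_MATRIX)))
```

Dimensions that appear in every row are candidates for conformed dimensions, which is what allows facts from different processes to be analysed together.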
The major contributions of this work are as follows:

• Identification of the major data sources and data analysis activities regarding the building energy management domain.

• Review of the major energy data visualization methods and reporting techniques.

• Definition of a multidimensional model instantiated to the energy management domain, based on state-of-the-art techniques presented in the literature, and improved iteratively according to user reviews (researchers in the building energy management domain).

• Survey of methods used in the literature to evaluate DW models.

• Validation of the proposed model using multidimensional model complexity, usability, and design quality metrics.

• Development of a BEMS prototype based upon the proposed model, and validated during interviews with experienced energy managers.

1.3 Document Structure

The remainder of this document is organized as follows:

Chapter 2 provides a systematization of building energy management and data warehousing concepts which are necessary to understand this work.

Chapter 3 addresses related work, exploring existing energy information systems, techniques for energy data analysis, DW models for energy management, methods used to evaluate multidimensional models, and energy data reporting and visualisation techniques.

Chapter 4 describes the multidimensional model development methodology and solution proposal.

Chapter 5 describes the multidimensional model evaluation methodology and results.

Chapter 6 describes the BEMS prototype development and solution, and the evaluation methodology and results.

Chapter 7 presents the conclusions and implications of our work.

Chapter 2

Concepts

In general, BEMSs rely on a DW system responsible for managing different types of building energy-related data. The DW integrates data from different sources, assures data quality, and provides the data required by managers to perform energy-related analyses. In a sense, BEMSs are decision support systems instantiated to building energy management. Thus, understanding the decision-making process and the characteristics of DSSs is a starting point for understanding BEMSs.

2.1 Decision Support Systems

DSSs are IT systems that provide knowledge to support decision making. Essentially, a DSS stores vast amounts of data, which are used by managers to detect situations to correct, choose the most appropriate solution, and apply the solution. They allow managers to perform complex analyses that would normally be time consuming and would result in late or bad decisions. The goal of DSSs is to reduce cognitive, temporal, or economic requirements, enabling fast, reliable, and accurate decisions [8].

Decisions can be unstructured, structured, or semi-structured. Unstructured decisions are not executed regularly and there is no agreed solution for them. They require the manager to judge and evaluate the situation before taking a course of action. Structured decisions are executed periodically, and their solution consists of a predefined, well-known set of tasks. Semi-structured decisions are applied to problems where the solution is partly known. They are a combination of structured and unstructured decisions [8, 33].

Different types of managers use DSSs to make decisions. In particular, senior managers take long-range unstructured decisions, such as determining the organization's strategies (e.g. entering a market) and analysing its financial performance. Middle managers support the activities of senior managers and take mostly semi-structured decisions. For instance, a middle manager's activity may be analysing why energy consumption increased in the last year. The structured part of the decision is obtaining a report from the DSS. The unstructured part, on the other hand, consists of consulting different facility managers to obtain their insights, evaluating the situation, and formulating a conclusion. Finally, operational managers monitor the business conditions and take structured decisions (e.g. fixing malfunctioning equipment) [8, 33].

Regardless of the decision type, according to Simon the decision making process can be organized in four phases: intelligence, design, choice, and implementation [34]:

Intelligence refers to when decision makers collect data, analyse acquired knowledge according to the organization interests, and identify problems [8, 33, 34].

Design consists of identifying and exploring distinct courses of action to solve the problem, understanding their causes and effects [8, 33, 34].

Choice involves choosing a solution to the problem, according to the analyses made about each solution alternative, internal and external organization pressure, and decision maker characteristics (e.g. experience, and trait) [8, 33, 34].

Implementation consists of implementing the selected course of action and monitoring its progress [8, 33, 34].

With respect to the architecture of DSSs, several authors describe distinct DSS architecture components. Nevertheless, these authors agree that a DSS is composed of at least three sub-systems: data sub-system, model sub-system, and user interface sub-system [9, 35, 36, 37].

Data sub-system collects and stores data from different sources, providing it to decision makers during the intelligence phase.

Model sub-system provides one or more models, which are used by managers to derive information from data. For instance, a linear regression model is a mathematical model that describes the relation between different variables. Decision makers use those models to predict decision outcomes, experiment with different input parameters, and estimate which combinations lead to the best output.

User interface sub-system provides a way for decision makers to interact with the system, performing a loop, where the user selects different models, modifies parameters and variables, and executes data analyses. Usually, the interface sub-system is represented by a visual interface to make the interaction user-friendly.
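To make the role of the model sub-system more concrete, the sketch below fits a simple linear regression by ordinary least squares and then uses it to explore a few "what-if" inputs. The observations (outdoor temperature against daily consumption) are made-up values used only for illustration, and the function names are assumptions of this sketch rather than part of any DSS described in the literature.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at input x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up observations: outdoor temperature (deg C) vs. daily consumption (kWh).
temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
consumption = [120.0, 140.0, 160.0, 180.0, 200.0]

model = fit_line(temperatures, consumption)

# "What-if" exploration: try different inputs and compare predicted outcomes.
scenarios = {t: predict(model, t) for t in (12.0, 22.0, 28.0)}
```

In a DSS, the data sub-system would supply the observations, and the user interface would let the decision maker change the scenario inputs and rerun the prediction.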

2.2 Building Energy Management Systems

Building Energy Management Systems can be understood as decision support systems that support energy management processes: they monitor, analyse, control, and optimize energy usage. Indeed, similarly to DSSs, BEMSs collect and store building energy consumption data from meters, sensors, and other sources, enabling managers to analyse how energy is spent. Data is then consolidated to enable identifying energy-saving opportunities, forecasting energy consumption demand, detecting anomalous situations, performing improvement actions, and measuring the outcomes of energy-saving strategies [13, 38]. Some BEMSs are capable of acting autonomously on the equipment to decrease energy consumption while maintaining occupant comfort and equipment efficiency [39, 40].

[Figure 2.1 diagram: four layers from left to right (Building Automation Layer, Data Management Layer, Performance Optimization Layer, Application Layer), connected by data flows and request messages.]

Figure 2.1: Main components of a BEMS, organized progressively from data acquisition (left) to presentation (right). The building automation layer sends automation systems data to the data management layer, which stores it, enabling the performance optimization layer to evaluate and optimize the functioning of the different equipment. Then, the application layer queries the optimization layer to provide users with visualization features and to let them adjust equipment parameters.

The aforementioned energy saving operations belong to the set of procedures that organizations implement to continuously improve the energy performance of buildings. The role of a BEMS is supporting those activities, which are executed according to a plan-do-act-check cycle described as follows [20]:

Plan activities aim at evaluating current situation and designing improvement plans.

Do is concerned with implementing improvement measures.

Act consists in taking actions to continue improving.

Check is focused on measuring the effectiveness of measures previously implemented.

2.2.1 Architecture of Building Energy Management Systems

There is no consensus in the literature regarding what is the appropriate architecture for a BEMS. Existing proposals fit into a generic architecture consisting of a building automation layer, a data management layer, a performance optimization layer, and an application layer (see Figure 2.1) [15, 27, 40, 41, 42]:

Building automation layer contains building automation systems, such as meters and sensors, and provides data types such as temperature and luminance that are related to building performance.

Data management layer collects and stores data from the building automation layer into a data storage system, such as a DW.

Performance optimization layer evaluates energy performance, optimizes the equipment functioning, and warns users about abnormal situations.

Application layer provides a user interface along with a set of tools, which are used by the users to parametrize the system, analyse data, obtain reports, and control equipment functioning. Examples include OLAP (Online Analytical Processing) tools.

2.2.2 User Characterization

Building Energy Management Systems target different classes of users, who have distinct profiles, responsibilities, and information requirements. Different authors agree on four major types of BEMS users: building owners, facility managers, building operators, and occupants [15, 27, 38].

Building Owner/Executive Manager monitors the energy consumption and CO2 emissions, manages energy costs and bills, and audits the facility costs [15, 27]. This class of user manages different buildings and should be targeted with financial information (e.g. budget benchmarking) [38].

Facility manager/Analyst user analyses building space data and takes actions to maintain occupant comfort. This class of user should have access to reports with energy consumption patterns and historical data. These reports should also provide different types of comparisons between spaces, equipment, energy consumption, and other factors, over distinct periods of time [15, 27, 38].

Building operator is responsible for maintaining a high level of user comfort, while regulating equipment functioning to minimize energy consumption [15].

Occupant monitors the energy consumption of equipment and has access to current energy consumption. Occupants also request the implementation of measures to increase comfort [15, 27, 38].

2.2.3 Comparison between BEMSs and DSSs

There are similarities between BEMS and DSS architectures. Analogously to the building automation and data management layers found in a BEMS, the DSS data sub-system collects and stores data from different sources. Furthermore, both DSSs and BEMSs rely on models (in a BEMS, models of the building environment and energy consumption) that enable users to better understand energy consumption. Also, the BEMS application layer and the DSS user interface provide a way for users to interact with the system [36, 37].

The similarities between BEMSs and DSSs extend to the decision making process. Concretely, the intelligence phase of building energy management consists of integrating and analysing energy-related data; the design and choice phases include the use of reports and dashboards to analyse and choose among alternative energy reduction activities; and the implementation phase comprises executing the chosen energy reduction actions or policies and measuring their impact over time.

BEMS users have responsibilities similar to those of the previously identified DSS users. According to the descriptions of BEMS users provided by different authors, building owners are senior managers who manage the organization's energy costs [15, 27, 38]; facility managers are middle managers with responsibilities such as supporting the activities of building owners (e.g. analysing a consumption increase); and building operators are operational managers responsible for taking structured decisions, such as regulating equipment functioning [15, 27, 38].

Clearly, a BEMS can be regarded as a DSS for building energy management. The parallel between a DSS and a BEMS is summarized in Table 2.1.

| | DSS | BEMS |
|---|---|---|
| Architecture | Data sub-system [36] | Building automation and data management layers |
| Architecture | Model sub-system [36] | Building environment and energy consumption component models [40] |
| Architecture | User Interface [36] | Performance optimization and application layers |
| Decision making activity | Intelligence [8, 33, 34] | Integrating energy-related data, analysing consumption, and identifying sources of waste |
| Decision making activity | Design and Choice [8, 33, 34] | Analysis of energy data in reports or dashboards, and determining energy reduction activities |
| Decision making activity | Implementation [8, 33, 34] | Taking action by performing energy reduction activities or policies |

Table 2.1: Comparison of DSS (left) and BEMS (right) architecture components and supported activities.

2.3 DW for Energy Management

The DW preserves history, storing activities and events that occurred over time. It is also optimized for accesses involving a huge number of records, responding quickly to complex queries over stored data. Moreover, the DW extracts, transforms, and stores data from multiple heterogeneous sources, assuring data quality and consistency [10, 11]. Indeed, a Data Warehouse as defined by Kimball and Caserta is “a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making” [43].

A DW can be understood as a system comprising operational data source systems, a data staging area, a data presentation area, and data access tools [44]. The operational source systems provide input data. Usually, they support different business areas, and use disparate technologies and formats for staging data [44]. In the building energy management context, the operational source systems correspond to the building automation layer of the BEMS.

To take advantage of the data sources, data is extracted and temporarily persisted in the data staging area (DSA). Extracted data is integrated, modified to solve data quality issues, and finally delivered to the presentation area. This process is known as ETL (Extract, Transform, Load) [43, 45]. In the presentation area, data is organized and stored under a global schema (the multidimensional model), to be analysed and queried using data access tools (e.g. OLAP tools) [44, 46]. In a BEMS, the data management layer corresponds to the DW presentation area.

The data access tools support activities that explore data from the data presentation area, and are part of the BEMS application layer. They are responsible for delivering consumable information to the end-user [44]. The overall DW architecture is depicted in Figure 2.2.
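The extract, transform, and load flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in this work: the two source lists, the field names (`meter_id`, `ts`, `value`, `unit`), the timestamp formats, and the choice of dropping empty readings are all hypothetical, and a plain Python list stands in for the presentation area.

```python
from datetime import datetime

# Hypothetical raw readings from two operational sources that use
# different units and timestamp formats (all values are made up).
SOURCE_A = [{"meter_id": "M1", "ts": "2015-02-09 10:00", "value": "1.5", "unit": "kWh"}]
SOURCE_B = [{"meter_id": "M2", "ts": "09/02/2015 10:00", "value": "800", "unit": "Wh"},
            {"meter_id": "M2", "ts": "09/02/2015 11:00", "value": "", "unit": "Wh"}]


def extract():
    """Copy raw records from every source into the staging area."""
    staged = [dict(r, ts_format="%Y-%m-%d %H:%M") for r in SOURCE_A]
    staged += [dict(r, ts_format="%d/%m/%Y %H:%M") for r in SOURCE_B]
    return staged


def transform(staged):
    """Clean and conform staged records to one schema (kWh, datetime)."""
    conformed = []
    for rec in staged:
        if rec["value"] == "":       # clean: drop empty readings (assumed policy)
            continue
        kwh = float(rec["value"])
        if rec["unit"] == "Wh":      # conform: express every reading in kWh
            kwh /= 1000.0
        conformed.append({
            "meter_id": rec["meter_id"],
            "timestamp": datetime.strptime(rec["ts"], rec["ts_format"]),
            "kwh": kwh,
        })
    return conformed


def load(conformed, presentation_area):
    """Deliver conformed records to the presentation area."""
    presentation_area.extend(conformed)
    return presentation_area


presentation_area = load(transform(extract()), [])
```

After the run, `presentation_area` holds two conformed records that share the same timestamp type and unit, which is what allows them to be analysed together.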

2.3.1 Multidimensional Model for Energy Management

Multidimensional models are used for storing data in the data presentation area and, besides efficiency, ensure a number of other aspects [28]. Models provide developers with an abstraction of data, enabling them to explore different design solutions, associated risks, and resulting costs, without getting lost in the details of a system that might be large and complex. Moreover, the model captures the user requirements and domain knowledge well, and therefore developers and stakeholders may use it to communicate and agree on them [21, 28, 31, 47]. These reasons make the multidimensional model a central point of a DW.

[Figure 2.2 diagram: operational sources (weather data, energy consumption data, equipment data) → Extract → data staging area (integrate, clean, conform) → Load → data presentation area → Access → data access tools]

Figure 2.2: The main components of a DW, organized from left to right following the flow of data. The operational source systems contain input data that is extracted and transformed in the data staging area, and then loaded into the data presentation area. Data access tools obtain data from the data presentation area according to users’ data analysis requests. Adapted from [44].

Multidimensional models are encoded using the core concepts of fact and dimension tables. Facts are observations regarding business performance. Dimensions are the sets of attributes that describe the business measurements [44].

In the context of building energy management, the multidimensional model aggregates data associated with factors that impact energy consumption. For instance, energy measurement data is stored in a fact table, which is described by space data (in the space dimension), equipment data (in the equipment dimension), and time and date data (in the time and date dimensions).
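To make the fact and dimension concepts concrete, the following sketch builds a tiny schema with one energy fact table and four dimensions using Python's built-in `sqlite3` module. All table names, column names, and sample values (`fact_energy`, `dim_space`, `AHU-1`, and so on) are assumptions made for this example and do not reproduce the model developed in this work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date      (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_time      (time_key INTEGER PRIMARY KEY, hour INTEGER);
CREATE TABLE dim_space     (space_key INTEGER PRIMARY KEY, building TEXT, floor TEXT, room TEXT);
CREATE TABLE dim_equipment (equipment_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Each fact row is one energy measurement described by its dimension keys.
CREATE TABLE fact_energy (
    date_key      INTEGER REFERENCES dim_date(date_key),
    time_key      INTEGER REFERENCES dim_time(time_key),
    space_key     INTEGER REFERENCES dim_space(space_key),
    equipment_key INTEGER REFERENCES dim_equipment(equipment_key),
    kwh           REAL
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO dim_date VALUES (1, '2015-02-09', 'February')")
conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(10, 10), (11, 11)])
conn.execute("INSERT INTO dim_space VALUES (1, 'Main', 'Ground floor', 'Room 0.1')")
conn.executemany("INSERT INTO dim_equipment VALUES (?, ?, ?)",
                 [(1, "AHU-1", "HVAC"), (2, "Lamp-7", "Lighting")])
conn.executemany("INSERT INTO fact_energy VALUES (?, ?, ?, ?, ?)",
                 [(1, 10, 1, 1, 4.0), (1, 11, 1, 1, 3.5), (1, 10, 1, 2, 0.5)])


def consumption_by_category(connection):
    """Total kWh per equipment category, joining facts to a dimension."""
    rows = connection.execute("""
        SELECT e.category, SUM(f.kwh)
        FROM fact_energy AS f
        JOIN dim_equipment AS e ON e.equipment_key = f.equipment_key
        GROUP BY e.category
        ORDER BY e.category
    """).fetchall()
    return dict(rows)
```

Calling `consumption_by_category(conn)` groups the measurements by an attribute of the equipment dimension; the same pattern applies to the space, date, and time dimensions.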

2.3.2 The Energy Data Cube

The DW model can be deployed on a relational database, where it is referred to as a star schema, or in an on-line analytical processing (OLAP) database, where it is referred to as a data cube. The latter designation comes from the visual metaphor of a 3-D cube, which is an abstraction of how the system is organized (see Figure 2.3) [26]. OLAP storage enables data to be modelled and analysed according to its multiple dimensions, quickly accessing large amounts of summarized data, and providing fast responses to complex queries by maintaining precomputed aggregate values [48]. Moreover, OLAP systems provide operations for users to visualize data at different levels of abstraction and granularity, and from distinct perspectives. Those operations include roll-up, drill-down, slice, dice, and pivot, among others [48]. Next we briefly describe each operation.

Roll-up climbs up the data aggregation hierarchy, going from more detailed data to less detailed data. For example, we can roll up the hourly average energy consumption to view the average daily consumption.

[Figure 2.3 diagram: a cube with axes Equipment (electric appliances, lighting, air handling units, HVAC), Time in months (February, March, April), and Space (basement, ground floor, 1st floor, 2nd floor)]

Figure 2.3: Example of an energy multidimensional data cube with time, space and equipment dimensions. Adapted from [48].

Drill-down is the dual operation of roll-up. It starts with less detailed data and presents more detailed data. For instance, we can drill down on energy consumption by floor to view energy consumption by room.

Slice selects a single value of one dimension, obtaining a sub-cube (a slice) of the original cube. For example, we can slice the cube to obtain the energy measurements recorded in January (date dimension).

Dice is similar to the slice operation, but selects values on two or more dimensions. For example, we can dice the cube to obtain the weather conditions that occurred on February 9th (date dimension), between ten and eleven AM (time dimension).

Pivot rotates the cube around one of its axes, changing the way data is presented.
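The operations above can be illustrated on a very small cube held in memory. The sketch below is not how an OLAP server works internally; it only mimics the semantics of each operation on a list of cells. The cell attributes (`month`, `day`, `floor`, `equipment`) and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells; each cell carries its dimension values and a measure.
CELLS = [
    {"month": "January",  "day": 5, "floor": "Basement",     "equipment": "HVAC",     "kwh": 10.0},
    {"month": "January",  "day": 6, "floor": "Ground floor", "equipment": "Lighting", "kwh": 2.0},
    {"month": "February", "day": 9, "floor": "Basement",     "equipment": "HVAC",     "kwh": 8.0},
    {"month": "February", "day": 9, "floor": "Ground floor", "equipment": "HVAC",     "kwh": 6.0},
]


def roll_up(cells, keep):
    """Aggregate kWh over every attribute not listed in `keep`.

    Listing more (finer) attributes in `keep` gives the drill-down view.
    """
    totals = defaultdict(float)
    for cell in cells:
        totals[tuple(cell[k] for k in keep)] += cell["kwh"]
    return dict(totals)


def slice_(cells, attribute, value):
    """Fix one dimension attribute to a single value."""
    return [c for c in cells if c[attribute] == value]


def dice(cells, **conditions):
    """Restrict several dimensions at once; each condition is a set of allowed values."""
    return [c for c in cells
            if all(c[k] in allowed for k, allowed in conditions.items())]


def pivot(totals):
    """Swap the two axes of a two-key aggregate, changing how it is presented."""
    return {(b, a): v for (a, b), v in totals.items()}
```

For example, `roll_up(CELLS, ["month"])` gives monthly totals, while `roll_up(CELLS, ["month", "day"])` drills down to daily totals within each month.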

The DW data analysis tools, and in particular the OLAP operation tools, can be implemented following either a ROLAP (relational OLAP) or a MOLAP (multidimensional OLAP) architecture. The ROLAP approach comprises a DW deployed on a relational database, and an OLAP server responsible for managing OLAP query functions. This server maps the facts and dimensions of the multidimensional model to relational database tables (organized under a relational data model), executing OLAP queries as if they were being executed over OLAP data storage. The advantages of this approach are that it permits the use of a relational database, and that it centralizes the OLAP query functionality on the OLAP server (alleviating the load on data analysis tools). The MOLAP approach is similar to ROLAP, but requires the use of proprietary OLAP storage [26, 49].

2.3.3 Slowly Changing Dimensions

In the multidimensional model, facts are continually being added, while dimension table data is mostly static. However, there are situations where dimension table records have to be modified. Consider, for example, a typical situation where a piece of equipment is moved from one installation location to another. If the model does not handle these changes, the dimension’s data will eventually become obsolete, leading to wrong conclusions: the consumption of a given device will be accounted for in the wrong subsystem or building area.

Slowly changing dimensions (SCD) are strategies to handle dimension attribute changes. They are used to deal with changes in the operational source systems’ data that require updating dimension table data. There are three major types of SCD: type 1, type 2, and type 3, which we describe next [44].

SCD type 1 overwrites the attribute values of dimension rows. Therefore, the attributes always have the most recent values and thus history is not preserved. This strategy is usually used on small changes (e.g. spelling mistakes) [26, 44].

SCD type 2 adds a new dimension table entry every time it is necessary to update a row’s attribute value. Every row must have an effective date and an expiration date; when an attribute changes, the current row is expired and a new row is added. Thus, it is possible to identify whether a row’s attribute value is up to date. This type of SCD preserves all historical values. However, depending on the change rate, the dimension tables may grow fast [26, 44].

SCD type 3 requires the dimension rows to have additional attributes to hold historical values. Every time an attribute is modified, the historical value column is overwritten with the current value, and the current value is replaced with the new one. Type 3 is of little use when changes are unpredictable, and should be used when a huge number of rows is modified at predefined moments in time (e.g. annually) [26, 44].
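The three strategies can be contrasted with a small sketch that applies the same change, a piece of equipment moving to another location, under each type. The row layout (`row_key`, `equipment_id`, `location`, `effective`, `expires`), the `previous_` column prefix, and the far-future sentinel date are assumptions made for this example.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel marking the current row


def new_dimension():
    """A hypothetical equipment dimension with a single row."""
    return [{"row_key": 1, "equipment_id": "AHU-1", "location": "Basement",
             "effective": date(2015, 1, 1), "expires": OPEN_END}]


def scd_type1(row, attribute, new_value):
    """Type 1: overwrite in place; the previous value is lost."""
    row[attribute] = new_value
    return row


def scd_type2(rows, natural_key, attribute, new_value, change_date):
    """Type 2: expire the current row and append a new one."""
    current = next(r for r in rows
                   if r["equipment_id"] == natural_key and r["expires"] == OPEN_END)
    current["expires"] = change_date
    new_row = dict(current, effective=change_date, expires=OPEN_END)
    new_row[attribute] = new_value
    new_row["row_key"] = max(r["row_key"] for r in rows) + 1  # new surrogate key
    rows.append(new_row)
    return rows


def scd_type3(row, attribute, new_value):
    """Type 3: keep only the previous value in a dedicated column."""
    row["previous_" + attribute] = row[attribute]
    row[attribute] = new_value
    return row
```

Under type 2 the dimension ends up with two rows for the same equipment, so facts recorded before the move remain attached to the old location; under type 1 and type 3 the row count is unchanged.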

The choice of SCD type for each dimension depends on the features of the aggregated data, the business requirements, the stakeholders involved, and the advantages and disadvantages of every SCD type [12]. Therefore, we will only be able to determine the appropriate SCD type for each dimension after developing the multidimensional model and analysing the integrated data.

2.4 Integration of Energy-Related Data Sources

The main goal of energy data integration is to enable correlating different sources of energy data under the same data schema, removing inconsistencies and semantic mismatches, and providing trustworthy information. In addition to the energy meters, other data sources provide relevant data to understand energy consumption, such as energy simulation tools, building automation tools, billing and operational data, supply contracts, pricing, weather, and occupancy data [38, 50]. The integration of heterogeneous data sources is crucial for energy management, as it helps to understand energy consumption factors. However, several factors hinder the process of energy data integration, namely:

• The heterogeneity of data sources and operational source systems, which have distinct hardware, data structures, or instance level semantics. One example is the wide range of different equipment with distinct purpose, functionality, and characteristics [51].

• Data sources that are hard to locate or acquire (e.g. occupancy data, energy costs data) [51].

• The large amounts of data produced by hundreds of thousands of meters and sensors, which are continuously making new recordings [52].

• Data quality depending mostly on the accuracy of metering devices, and their capacity to detect and handle communication failures [52].

2.5 Energy Data Quality

Energy management decisions depend heavily on the quality of energy-related data. Poor data quality compromises the effectiveness of energy policies and, ultimately, the investment in energy management initiatives [52, 53]. In some industries, poor data quality can affect equipment functioning, compromise product quality, result in lost business opportunities, damage relationships with partners/suppliers, overcharge or undercharge customers, and lead to a lack of compliance with laws/regulations. Overall, it affects organizations’ revenues [54].

Energy data quality depends on the reliability of data sources and the effectiveness of and integration. In either case, data must be evaluated according to its validity, accuracy, and completeness, defined as follows [54]:

Validity is the property that defines whether data values are within the expected range. For instance, there is a range of expected values for the hourly consumption of heating, ventilating, and air conditioning (HVAC) equipment. If the obtained values are negative, then the data is not valid [54]. Invalid values may result from improperly configured meters and consumption value overflows (e.g. 999,999 recorded as 000,000), among other causes.

Accuracy is the distance between data and the correct representation of the real-world occurrence it describes. There are two forms of data accuracy, namely syntactic and semantic accuracy. Consider a meter reading made in the university building. The reading is syntactically accurate as long as the units are correctly represented (e.g. kWh instead of KWH). On the other hand, if the reading is associated with a room in another building, then it is semantically inaccurate [55].

Completeness determines whether the data set contains all data records and attributes required by the business information needs. For instance, empty energy records (e.g. caused by internet failures) can be mistaken for zero consumption, resulting in incorrect billing. Another example is energy records that are not associated with any equipment, which hampers the accounting of individual equipment consumption [54].

Organizations typically execute data quality processes to guarantee the required data quality levels in their decision-making activities. For instance, the billing information provided to building owners must have higher quality than the consumption values shown to the building occupants. These processes are performed using data quality tools, which follow rules such as the following [54]:

• To detect validity and syntactic accuracy issues, it is necessary to verify whether data record values fall within an accepted range and whether the representation is correct. For instance, an equipment status may be represented exclusively with 0 or 1. In addition, records are compared with previous ones to detect overflow situations.

• Value range checks may also be applied across different attributes to detect semantic inaccuracies. For instance, every energy record is associated with a tariff, but the tariff depends on the total consumption over a certain period (e.g. fifteen minutes). Therefore, if the total consumption value goes beyond the expected demand value, then subsequent energy records must be associated with a different tariff.

• Even when values are within bounds, it is relevant to investigate zero and negative values. Records from a facility with zero consumption should be flagged as erroneous. Afterwards, the causes can be investigated and the error flag then removed. Negative value analysis consists of summing consecutive values and checking whether the total decreases, which might result from meter/sensor resets or overflows, among other causes.

• Another important check is analysing consecutive records with the same data. Both the value and the timestamp may be the same, or the timestamps may match while the values differ. This kind of issue is associated with communication problems between the DW and the sensors/meters, equipment malfunctioning, and time jitter, among other causes. This is another case of semantic inaccuracy.

• Other types of checks include verifying record timestamps to detect missing records, and checking when data was last collected from a source, so that dead sources can be flagged and the issue analysed.

Typical measures to correct data issues include deleting exact duplicates, analysing near-exact duplicates to find the correct record, using algorithms to estimate values that replace missing and wrong values, and manually editing incorrect values [54].
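Some of the checks listed above can be sketched as small functions over a list of meter readings. This is an illustrative sketch only: the record layout (`ts`, `value`), the assumption that readings are cumulative counter values sorted by timestamp, and the sample data are hypothetical and are not taken from any data quality tool.

```python
from datetime import datetime, timedelta


def out_of_range(readings, low, high):
    """Validity: return readings whose value falls outside [low, high]."""
    return [r for r in readings if not (low <= r["value"] <= high)]


def counter_decreases(readings):
    """Flag cumulative readings lower than their predecessor (reset or overflow)."""
    return [cur for prev, cur in zip(readings, readings[1:])
            if cur["value"] < prev["value"]]


def exact_duplicates(readings):
    """Return readings whose timestamp and value both repeat an earlier one."""
    seen, dupes = set(), []
    for r in readings:
        key = (r["ts"], r["value"])
        if key in seen:
            dupes.append(r)
        seen.add(key)
    return dupes


def missing_timestamps(readings, step):
    """Completeness: list expected timestamps that have no reading.

    Assumes `readings` is sorted by timestamp and sampled every `step`.
    """
    present = {r["ts"] for r in readings}
    ts, end, missing = readings[0]["ts"], readings[-1]["ts"], []
    while ts <= end:
        if ts not in present:
            missing.append(ts)
        ts += step
    return missing


# Made-up readings from a six-digit cumulative counter, sampled every 15 minutes.
T0 = datetime(2015, 2, 9, 10, 0)
READINGS = [
    {"ts": T0, "value": 999_990},
    {"ts": T0 + timedelta(minutes=15), "value": 999_998},
    {"ts": T0 + timedelta(minutes=15), "value": 999_998},  # exact duplicate
    {"ts": T0 + timedelta(minutes=45), "value": 5},         # counter wrapped; 10:30 missing
]
```

In this sample every value is within the counter range, yet the other three checks each flag one problem, which illustrates why a range check alone is not sufficient.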

Chapter 3

Related Work

Although the common characteristics of BEMSs are understood, their requirements have to be specified in detail to inform the development of a multidimensional model. Seeking to identify these requirements, we will review energy management standards and literature reports on energy data analysis activities. Additionally, we will review energy information systems, which are a generic category of energy decision support systems. These sources should offer an insight into the major energy management activities and their requirements, such as the required data types.

Before developing our model we also review existing model proposals and how they were evaluated. There is also no consensus on model evaluation. Understanding how different literature sources evaluate their models will enable us to devise a more solid evaluation strategy. We will first study the techniques used to validate other multidimensional model proposals, and review the methodologies used to evaluate conceptual models. Finally, we will review multidimensional model quality metric proposals.

Since reporting is a fundamental aspect of DSSs, we will also briefly analyse existing reporting methods and energy visualization techniques.

3.1 Energy Information Systems

There is no commonly agreed definition of energy information system (EIS). Nevertheless, different authors agree that EISs consist of data acquisition hardware, performance monitoring software, and communication systems that collect, store, manage, and present information to users [2, 41, 56]. EISs are used for metering and collecting energy-related data, and for providing time-series data visualization tools. Energy time-series data involves large volumes of data, and thus OLAP is referred to as the most suitable technology for EISs to summarize or aggregate data according to specific time intervals (e.g. monthly, weekly), enabling users to identify consumption totals and peak demands [38]. For those purposes, EISs extract and integrate energy consumption data, building features data, HVAC and lighting equipment data, weather data, and energy costs data [41].

According to Motegi et al. there are four kinds of EISs: the basic EIS, the demand response system, the energy enterprise management system, and the web energy management and control system [41]. Each kind of system enables users to perform different activities that support energy management processes.

Basic Energy Information Systems (Basic-EIS) collect building data and provide data visualization tools, but do not enable users to do any kind of detailed data analysis [13, 41].

Demand Response Systems (DRS) are used by organizations to participate in the demand response programs of energy providers. These programs consist of requests for facility managers to reduce electricity consumption in their facilities. Adhering organizations use the system to implement demand reduction strategies and analyse their progress, forecast energy consumption according to historical data, and calculate estimated energy savings using baseline techniques, thereby benefiting from savings on energy costs [13, 41, 57].

Enterprise Energy Management Systems (EEM) enable energy managers to analyse energy costs, and facilitate the procurement of energy contracts by providing users with the cost rates of different energy providers. In addition, they support energy benchmarking, load profiling, and energy reporting [13, 41, 57].

Energy Benchmarking is the process of comparing the energy usage of buildings or spaces. The benchmarking may include normalization techniques, which are used to account for factors such as area, volume, weather conditions, or the number of occupants in the benchmark results [13, 41, 57, 58].

Load profiling is the presentation of energy consumption curves, over a time period. These graphs enable managers to identify peak loads and compare the consumption with baseline load profiles [13, 41, 58].

Energy reporting consists of creating graphical and textual representations of the energy usage of individual equipment or spaces, load profiles, and aggregated consumption values. Reports may also include pie charts with individual equipment consumption [13, 41, 58].

Web Energy Management and Control Systems (Web-EMCS) integrate data obtained from building automation systems (e.g. HVAC), and monitor, control, and automatically optimize equipment functioning. Moreover, the system supports anomaly detection, identifying anomalous situations (e.g. equipment malfunctioning), suggesting causes, and proposing solutions [13, 41].

Taking into account that BEMSs are also EISs, EIS features, namely the integrated data types and the supplied data visualization tools, are requirements that should be considered in the development of a multidimensional model for a BEMS.
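As an illustration of the normalization used in energy benchmarking, the sketch below compares two buildings by consumption per unit of floor area. Normalizing by area is only one of the options mentioned above, and the building names and figures are made up for the example.

```python
def energy_use_intensity(kwh, area_m2):
    """Consumption normalized by floor area, in kWh per square metre."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return kwh / area_m2


# Hypothetical annual figures for two buildings.
BUILDINGS = {
    "Building A": {"kwh": 120_000.0, "area_m2": 1_000.0},
    "Building B": {"kwh": 150_000.0, "area_m2": 2_000.0},
}


def benchmark(buildings):
    """Rank buildings from lowest to highest normalized consumption."""
    scores = {name: energy_use_intensity(b["kwh"], b["area_m2"])
              for name, b in buildings.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

In this sample, Building B consumes more energy in absolute terms but ranks better once area is taken into account, which is the kind of conclusion that normalization makes possible.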

3.2 Energy Management Systems Standards and Guides

Existing energy management system standards and implementation guides aim at helping organisations to implement energy management processes. These standards follow the plan-do-check-act cycle approach, dividing energy management processes into planning (plan), implementation (do), checking (check), and review (act) sub-processes [16, 17, 18, 19, 59, 60]. In the following paragraphs we describe the activities associated with energy management processes in the I. S. 393:2005 [16], ANSI/MSE 200:2008 [17], BS EN ISO 16001:2009 [18], and BS EN ISO 50001:2011 [19] standards, and in the Sustainable Ireland 2009 I. S. EN 16001:2009 [60] and CarbonTrust 2011 [59] implementation guides:

Planning consists of identifying and understanding energy usage factors and opportunities for improvement, and determining energy management reduction policies and their goals. During this step, the I. S. 393:2005 [16], BS EN ISO 16001:2009 [18], and ANSI/MSE 200:2008 [17] standards suggest the analysis of past, present, and future energy usage, using regression analysis methods, studying consumption baselines, and estimating future energy usage and consumption. Also, standards and guides agree on the importance of analysing energy consumption according to different time periods (e.g. month, season), facilities, equipment types (e.g. HVAC), purposes (e.g. heating) and state, organization activities, groups of individuals/organizations, energy costs (e.g. tariffs, bills, contracts), and external factors (e.g. climate data).

Implementation comprises the implementation and assessment of previously defined energy reduction activities and policies. In particular, buildings may be modified or expanded, equipment may be replaced or regulated differently, and energy suppliers and contracts may change. Therefore, standards and guides agree on recording modifications that affect energy consumption, enabling users to compare the impact of modifications on consumption.

Checking involves monitoring the progress of energy reduction policies over time. Accordingly, I. S. 393:2005 [16] and ANSI/MSE 200:2008 [17] propose accounting for energy costs and analysing their evolution, and BS EN ISO 16001:2009 [18] suggests sub-metering of high energy usage activities.

Review process requires top managers to review the implementation of energy management policies and their progress according to predefined goals. In particular, I. S. 393:2005 [16] and BS EN ISO 16001:2009 [18] refer to the importance of considering changes that will influence energy consumption in the following year, reinforcing the need to consider those events in energy consumption analysis [16, 60].

Despite the lack of concrete requirements for energy management activities, the standards mention other important aspects to consider in the development of a multidimensional model for a BEMS. In particular: the use of forecasting, regression analysis, and baseline methods to study energy consumption patterns; the analysis of energy consumption from distinct perspectives such as time periods, activities, occupation, and organizational units; and accounting for the impact of events on energy consumption.
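The regression and baseline methods mentioned by the standards can be illustrated with a simple least-squares fit. The sketch below fits monthly consumption against degree days and uses the fitted line as a baseline to estimate savings. The choice of degree days as the only explanatory variable and all figures are assumptions made for the example; the standards do not prescribe this particular model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: monthly degree days and monthly consumption (kWh).
DEGREE_DAYS = [100.0, 200.0, 300.0, 400.0]
MONTHLY_KWH = [1500.0, 2000.0, 2500.0, 3000.0]

slope, intercept = fit_line(DEGREE_DAYS, MONTHLY_KWH)


def baseline(degree_days):
    """Expected consumption for a month with the given degree days."""
    return intercept + slope * degree_days


def savings(degree_days, measured_kwh):
    """Estimated savings: baseline consumption minus measured consumption."""
    return baseline(degree_days) - measured_kwh
```

With this sample data the fit gives a base load of about 1000 kWh plus about 5 kWh per degree day, so a month with 250 degree days and a measured consumption of 2000 kWh would show estimated savings of roughly 250 kWh.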

3.3 DW Models for Building Energy Management

A fundamental aspect of a multidimensional model for building energy management is the flexibility of the model with respect to the dimensions that influence energy consumption. A more flexible model enables managers to analyse a larger spectrum of energy consumption factors. Energy managers are therefore able to obtain more accurate insights into the causes of consumption and to execute more effective energy reduction policies [61]. To identify the major multidimensional model facts and dimensions, and their associated data types, we review existing multidimensional model proposals in the literature and energy-related data analysis reports.

3.3.1 Multidimensional Model Proposals in Literature

Among the multidimensional model proposals in the literature, different authors agree on the existence of a primary set of dimensions: (i) a time dimension, (ii) a location dimension that captures building locations, (iii) a measurement/sensing device dimension, and (iv) an organization dimension that aggregates data from individual occupants to the organization level [15, 27]. However, existing models typically lack dimensions such as energy costs and user occupancy, which are proposed in other models (e.g. [62]).

Some models are acceptable in terms of flexibility but present design issues. Li et al. present a model that is relatively complete but includes a dimension called external that stores all energy consumption related measurements [62]. Those measurements should not be stored as dimension attributes, which are used to describe the context of energy measurements (e.g. related space, equipment, organization). Instead, each category of energy consumption related measurement should be stored in at least one fact table [12].

The model of Li et al. also includes a renewable energy dimension, requiring all energy consumption measurements to be associated with a renewable energy dimension entry, which may not always be the case [62]. Therefore, the model should include a degenerate dimension to distinguish renewable and non-renewable energy sources, or a junk dimension identifying the source type [12].

Another modelling issue is the modelling of fact tables as dimension tables. This problem is found in the modelling of energy tariffs, which should be modelled as facts because they vary over time and according to supplier, location, and organization unit. In addition, they are associated with the process of analysing energy costs, and should be stored in a fact table [12].

Another problem is including dimensions in the model that are too specific. The model proposed by Gökçe and Gökçe includes an HVAC dimension instead of an equipment dimension [27]. Consequently, the remaining types of equipment (e.g. illumination) have to be stored in new individual dimensions, resulting in the creation of a centipede fact table, that is, a fact table with too many dimensions. Such fact tables are known to compromise query performance and, ultimately, system usability [12].

The multidimensional model proposed by Hong-ye et al. includes separate dimensions for date and time [63]. This aspect is important in the building energy management context, where managers analyse data summarized by years, minutes, or even seconds [64]. Furthermore, having separate time and date dimensions reduces the number of rows of the date dimension, which would otherwise end up storing millions of rows, compromising the performance of the underlying system [12]. Despite the importance of having both time and date dimensions, other authors include solely a date dimension in their models.
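The effect of separating the date and time dimensions on the number of dimension rows can be checked with simple arithmetic. The sketch below assumes, purely for illustration, a ten-year horizon of 365-day years and a time grain of one second; neither figure comes from the reviewed models.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 time-of-day values at one-second grain


def combined_rows(days, grains_per_day=SECONDS_PER_DAY):
    """Rows in a single date-and-time dimension: one per (day, time of day) pair."""
    return days * grains_per_day


def separate_rows(days, grains_per_day=SECONDS_PER_DAY):
    """Total rows when the date and time dimensions are kept apart."""
    return days + grains_per_day


DAYS = 3650  # assumed horizon: ten 365-day years
```

Under these assumptions a combined dimension would need 315,360,000 rows, whereas the separate date and time dimensions need 3,650 and 86,400 rows respectively, that is, 90,050 rows in total.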

3.3.2 Energy Data Analysis Activities

To identify the major multidimensional model facts and dimensions, and their associated data types, we reviewed literature reports on energy-related data analysis. According to Fumo, the major factors that drive energy consumption are (i) weather variables, (ii) building characteristics, (iii) equipment features, and (iv) occupants’ behaviour [4].

There is a vast number of literature reports on the impact of weather variables (e.g. temperature) on energy consumption. Lazos et al. review forecasting techniques that take weather variables into account to optimise the functioning and consumption of commercial building equipment [7]. Other authors, such as Parkpoom and Harrison [65], Pardo et al. [66], and Christenson et al. [67], study the impact of temperature, solar radiation, and degree day increases on future energy demand across different spatial locations. Vollaro et al. [68] and Poirazis et al. [69] perform similar analyses, but their studies take into account building characteristics such as floor area and insulation materials. Degree days and building characteristics are also used by Gao and Malkawi to benchmark building energy performance, along with equipment operation hours and purpose, and the number of occupants [70].

Concerning the impact of space occupancy on energy consumption, Seryak and Kissock [6], Feng et al. [71], Tso and Yau [72], Gul and Patidar [73], and Santin et al. [74] perform studies considering the number of space occupants and their behaviour regarding equipment usage, in spaces with different purposes (e.g. office and restroom), at different time periods (e.g. day and night). The same energy consumption factors are used by Buchmann et al. [75], Iyer et al. [76], and Perez et al. [77] to disaggregate energy consumption data.

Other energy consumption related factors, such as energy costs and tariffs, are also considered in literature reports. In particular, Apolinario et al. [78] propose an optimisation technique for Portuguese energy tariffs, and Rocha et al. [79] develop an optimisation method to regulate equipment operation according to weather conditions and real-time energy market costs. In turn, Florides et al. analyse the impact of several energy reduction policies (e.g. installing solar shading systems), and present their cost effectiveness [80].

3.4 Evaluation of Conceptual Models

Conceptual models are used to define user requirements and design information systems (IS) according to those requirements [31, 81]. Indeed, most errors that occur during IS design are related to requirements [31, 82, 83], which are also a frequent cause of unsuccessful IS projects [31, 82]. In fact, conceptual models influence the time, cost, and quality of IS development [31, 81], and therefore they should be evaluated.

To the best of our knowledge, no methodology has been proposed to evaluate the quality of multidimensional models. Likewise, there are no agreed-upon guidelines to evaluate conceptual models, nor is there agreement on what defines a “good” conceptual model.

There are several reasons why it is hard to evaluate conceptual models. Unlike software systems, there are no standards to evaluate the quality of conceptual models [31, 84]. Furthermore, contrary to what happens with finished software products, conceptual models cannot be evaluated against a specification [32]. More specifically, the model defines the user requirements, and there is nothing else to evaluate it against but people’s expectations, wishes, needs, and desires. It is therefore a highly subjective process, hard to systematize and document [31, 85, 86].

During his research, Moody identified several issues with existing proposals to evaluate conceptual models [31]. Next we describe some of them.

1. Proliferation of proposals results from an immature field, where there is no consensus on concepts and terminology, and where research efforts are scattered [31, 87].

2. Lack of adoption in practice means that so far no proposals have been widely accepted, and despite their potential benefits there is no practical evidence to support them [31, 88].

3. Lack of measurement is the fact that most authors propose quality criteria, but do not determine how to measure them [31, 89].

4. Lack of evaluation procedures is the absence of details on how to execute evaluation procedures, which is recurrent among proposals [31].

5. Lack of empirical testing is the evidence that most proposals were only tested logically or theoretically, and never empirically [31, 89].

6. Lack of knowledge about practices is the unavailability of empirical studies on how industry practitioners evaluate conceptual models [31].

3.5 Evaluation of Multidimensional Models

Not only is the number of literature references that propose multidimensional models for energy management small, but their quality is never evaluated. Specifically, (i) Ahmed et al. compare the performance of the DW they created with that of an existing database system [15]; (ii) Gökçe and Gökçe propose and evaluate a DW schema through practical use case scenarios [27]; and (iii) Li et al. [90] and Hong-ye et al. [63] propose a DW for energy management, but do not describe any kind of evaluation.

The lack of multidimensional model evaluation procedures extends to other domains. Indeed, despite the fact that many industries are using DW systems, their schemas were rarely submitted to proper quality evaluation, and the few reported DW evaluations never took into account the quality of the multidimensional schema (see Table 3.1). This fact represents a limitation in the evaluation procedures, since the schema quality influences several parts of the DW development cycle (e.g. stakeholder communication and requirements validation), and constrains the overall DW performance [28].

Reference                     Performance  Use Case   Data Quality  Reports Quality  Schema Quality
                              Testing      Scenarios  Analysis      Analysis         Analysis

Energy Management domain
Ahmed et al. [15]             ●            ○          ○             ○                ○
Gökçe and Gökçe [27]          ○            ●          ○             ○                ○
Li et al. [90]                ○            ○          ○             ○                ○
Hong-ye et al. [63]           ○            ○          ○             ○                ○

Health & Medical domain
Berndt et al. [91]            ○            ○          ●             ○                ○
De Mul et al. [92]            ○            ○          ●             ●                ○
Lamer et al. [93]             ○            ○          ○             ○                ○
Roelofs et al. [94]           ○            ○          ○             ○                ○
Hu et al. [95]                ●            ●          ○             ○                ○
Zhou et al. [96]              ○            ○          ○             ○                ○
Trick [97]                    ○            ○          ○             ○                ○
Wisniewski et al. [98]        ○            ○          ●             ●                ○
Rubin and Desser [99]         ○            ○          ○             ○                ○

Natural sciences domain
Liu et al. [100]              ○            ○          ○             ○                ○
Eleveld et al. [101]          ○            ○          ○             ○                ○
Burkhardt et al. [102]        ○            ○          ○             ○                ○

Construction domain
Rujirayanyong and Shi [103]   ○            ○          ○             ○                ○
Park and Kim [104]            ○            ○          ○             ○                ○

Other domains
Hossain et al. [105]          ●            ○          ○             ○                ○
Chou and Tseng [106]          ○            ○          ○             ○                ○
Hao et al. [107]              ●            ○          ○             ○                ○
Song and LeVan-Shultz [108]   ●            ○          ○             ○                ○
Mendes [109]                  ○            ○          ○             ○                ○

Table 3.1: Comparison of the evaluation procedures (right) applied to multidimensional model proposals (left) across several industries (○ = Not Tested, ● = Tested). Some multidimensional model proposals are not validated, while others are validated in several ways. None validates the model quality.

In software engineering, software metrics are used to objectively measure software quality attributes. However, it is difficult to measure subjective qualities such as understandability, which may have different interpretations [32]. According to Kitchenham, one solution is to measure software attributes and relate them to qualities [110]. Moody and Shanks followed this approach and proposed a conceptual model evaluation framework, which is intended to objectively compare the quality of different models [111]. Likewise, we looked for proposals on multidimensional model quality metrics (see Table 3.2). In the literature, we identified metrics for development quality and for model structure quality, described as follows:

Development quality metrics measure the quality of the model development process. The major development qualities are concerned with whether the model complies with user requirements and data modelling rules, and whether it is implemented within budget and schedule [111, 112, 113].

Structural quality metrics measure qualities, such as complexity and understandability, which depend on the number of fact and dimension tables and on their definitions [111, 114, 115, 116].

Reference                        Categories of  Metrics are  Defines desirable  Target Quality
                                 the metrics    validated    target values

Object-oriented design
Gosain et al. [118]              S              No           No                 Structural Complexity
Ali and Gosain [119]             S              Yes          Partially          Understandability
Serrano et al. [120, 121, 122]   S              Yes          No                 Efficiency, Understandability

Unified design
Cherfi and Prat [115]            S              No           Yes                Analysability, Simplicity
Prat and Cherfi [116]            S, D           No           Yes                Analysability, Simplicity
Berenguer et al. [117]           S              Yes          Partially          Cognitive Complexity

Kimball design
Serrano et al. [123, 124]        S              Yes          No                 Cognitive Complexity
Papastefanatos et al. [125]      S, D           Yes          No                 Maintainability
Golfarelli and Rizzi [126, 127]  S, D           Yes          Yes                Usability
Nagpal et al. [128]              S              Yes          No                 Structural Complexity
Kimball et al. [26]              S, D           Yes          Yes                Usability, Design Quality
Kimball and Ross [12, 44]        S, D           Yes          Yes                Usability, Design Quality

Table 3.2: Summary of literature reports on multidimensional model quality metrics. The table indicates the authors of the metrics, the metrics category (S stands for structural metrics and D for development metrics), whether the metrics were validated, whether the proposal defines desirable values, and the target qualities of the metrics.

Literature proposals on multidimensional model quality metrics each target one of several model designs. There are proposals for unified [115, 116, 117], object-oriented [118, 119, 120, 121, 122], and multidimensional model design [123, 124, 125, 126, 127, 128] (as described by Kimball et al. [26]). This segmentation limits the number of metrics available for each design category. Another issue with existing metric proposals is the lack of validation of the metrics themselves, which compromises their usefulness. One example is the set of metrics proposed by Cherfi and Prat for analysability and simplicity, which were never validated [115]. One final problem with existing metric proposals is the lack of desirable values associated with the metric results. Serrano et al. [120, 121, 122, 123, 124] conducted several studies to verify the relation between the number of facts and dimensions and model understandability. However, none of the studies defines the expected number of facts or dimensions in a model with either low or high understandability.
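To make the notion of a structural metric more concrete, the following minimal sketch counts fact tables, dimension tables, and fact table foreign keys in a toy schema. The `Table` class, the table names, and the metric names are assumptions made only for this illustration; they are not the definitions used by any of the cited proposals.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    kind: str  # "fact" or "dimension" (hypothetical labels)
    foreign_keys: list = field(default_factory=list)  # names of referenced tables


def structural_metrics(tables):
    """Return simple size counts for a schema; all metric names are illustrative."""
    facts = [t for t in tables if t.kind == "fact"]
    dims = [t for t in tables if t.kind == "dimension"]
    fks = sum(len(t.foreign_keys) for t in facts)
    return {
        "num_fact_tables": len(facts),
        "num_dimension_tables": len(dims),
        "num_fact_foreign_keys": fks,
        # Average number of dimensions referenced per fact table.
        "avg_dimensions_per_fact": fks / len(facts) if facts else 0.0,
    }


# Toy schema (assumed for the example, not the model proposed in this thesis).
schema = [
    Table("date", "dimension"),
    Table("time", "dimension"),
    Table("space", "dimension"),
    Table("meter_readings", "fact", ["date", "time", "space"]),
    Table("weather_readings", "fact", ["date", "time"]),
]
metrics = structural_metrics(schema)
```

Counts of this kind are straightforward to compute, but they only become informative once a proposal states which values correspond to low or high understandability, which is the gap discussed above.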

3.6 Energy Data Visualization and Reporting

Data visualization is the representation of information obtained from aggregated data. Effective visualization tools, such as graphs and charts, allow data to be self-explanatory. In contrast, large amounts of numerical values make it difficult to understand the data and to find patterns in it.

Data visualization is a vehicle to identify patterns, which enable users to see the regular behaviour of equipment consumption. Moreover, patterns are used to create baselines for detecting abnormal system conditions. In a graph, it is easy to spot data points that are distant from the normal pattern [129].

Although there is no agreement on the major BEMS data visualisation and reporting methods, several authors agree that BEMS should provide users with the following reporting capabilities [64, 130, 131]:

Accountability reporting is used to present daily information about energy consumption demand. Managers use accountability reports to monitor near real-time equipment performance, improve equipment operation, and evaluate implemented energy saving strategies. Another form of accountability report is the monthly cost allocation report, which includes a breakdown of costs per organization unit and equipment type.

Load aggregation reporting, also known as simple tracking, comprises the inspection of monthly or yearly aggregated consumption, and may include a graphical representation of the load profile.

Baseline reporting consists of comparing the current equipment consumption rate with baselines. These reports can be enhanced with normalization techniques to measure the influence of other factors. For instance, analysing the chiller consumption may require considering indoor and outdoor temperature differences.

Loading histogram reporting provides a graphical representation of equipment load and operational hours.
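As an illustration of baseline reporting, the sketch below derives a per-hour baseline from historical readings and flags current readings that deviate from it. The data layout (pairs of hour of day and kWh), the function names, and the 20% tolerance are assumptions chosen for the example and do not come from the cited reports.

```python
from collections import defaultdict
from statistics import mean


def hourly_baseline(history):
    """history: iterable of (hour_of_day, kwh). Returns {hour: mean kWh}."""
    by_hour = defaultdict(list)
    for hour, kwh in history:
        by_hour[hour].append(kwh)
    return {hour: mean(values) for hour, values in by_hour.items()}


def deviations(current, baseline, tolerance=0.20):
    """Return readings whose relative deviation from the baseline exceeds tolerance.

    Assumes every baseline value is non-zero (hypothetical simplification).
    """
    flagged = []
    for hour, kwh in current:
        expected = baseline[hour]
        ratio = (kwh - expected) / expected
        if abs(ratio) > tolerance:
            flagged.append((hour, kwh, round(ratio, 2)))
    return flagged


history = [(9, 10.0), (9, 12.0), (10, 20.0), (10, 20.0)]
baseline = hourly_baseline(history)
flagged = deviations([(9, 11.5), (10, 26.0)], baseline)
```

A normalized variant would divide consumption by an influencing factor (for example, the indoor and outdoor temperature difference) before computing the baseline.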

In addition to the previous reporting techniques, Granderson et al. propose other accountability reporting methods and techniques, such as using capital budgeting metrics to quantify the benefits of energy saving policies (internal rate of return), and converting energy consumption into carbon emissions for sustainability analysis (carbon accounting) [64]. Furthermore, Granderson et al. propose several variations of load aggregation reporting [64]: comparing the current year’s energy performance with the previous year’s performance (longitudinal benchmarking), comparing a building’s energy performance with that of similar buildings (cross-sectional benchmarking), and inspecting daily energy meter data in order to find inefficiencies (load profiling). Finally, the authors identify other reporting techniques, including the reporting of the size, timing, and duration of peak loads (peak load analysis), the inspection of lighting, heating, and cooling equipment efficiency, and the use of regression models (e.g. using weather variables) to characterize energy performance (model baseline reporting) [64].
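Two of these techniques, longitudinal benchmarking and peak load analysis, can be sketched in a few lines. The example assumes readings are available as pairs of timestamp and kWh; the function names and the sample values are hypothetical and serve only to illustrate the calculations.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(readings):
    """readings: iterable of (datetime, kwh). Returns {(year, month): total kWh}."""
    totals = defaultdict(float)
    for ts, kwh in readings:
        totals[(ts.year, ts.month)] += kwh
    return dict(totals)


def longitudinal_benchmark(totals, year):
    """Percentage change of each month in `year` relative to the same month one year earlier."""
    changes = {}
    for (y, m), kwh in totals.items():
        previous = totals.get((y - 1, m))
        if y == year and previous:
            changes[m] = 100.0 * (kwh - previous) / previous
    return changes


def peak_load(readings):
    """Return the (timestamp, kwh) pair with the highest consumption."""
    return max(readings, key=lambda r: r[1])


readings = [
    (datetime(2014, 1, 10, 9, 0), 100.0),
    (datetime(2014, 1, 20, 9, 0), 100.0),
    (datetime(2015, 1, 10, 9, 0), 90.0),
    (datetime(2015, 1, 20, 18, 0), 130.0),
]
totals = monthly_totals(readings)
changes = longitudinal_benchmark(totals, 2015)
peak = peak_load(readings)
```

Cross-sectional benchmarking would follow the same pattern, grouping by building instead of by year.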

3.7 Discussion

Despite the scarcity of authoritative literature sources regarding the precise requirements of energy management activities, existing literature reports allow some conclusions to be drawn. During the execution of energy management processes, energy managers perform data analysis activities, such as load profiling and energy performance benchmarking. These techniques are common to EIS functionality, energy management standards, and literature reports on energy data analysis activities. Accordingly, a multidimensional model for building energy management must support these data analysis activities. Additionally, the previous literature sources refer to a common set of data types the model should integrate, namely energy consumption and cost data, weather data, and occupation data.

The number of literature references that propose multidimensional models for energy management is small. Another striking aspect is that existing models do not support a broad range of the activities underlying energy management, and most models present design anomalies. Even so, existing model proposals are a starting point to develop our model proposal. The proposal will include dimensions for time, location, sensing device, and organization. In addition, as a result of reviewing energy data analysis reports, we will also create dimensions to integrate space, equipment, weather, occupancy, and energy cost data.

Another issue with existing model proposals is that their quality was never evaluated. The explanation seems to lie in the lack of proposals regarding multidimensional model quality evaluation.

We also reviewed conceptual model evaluation proposals in order to determine how to evaluate DW models. Regrettably, there are no standards defining how to evaluate conceptual models. Moreover, conceptual models cannot be evaluated against a specification, but only against informal requirements, which compromises the objectivity of the process. Finally, existing model evaluation proposals have several issues; in particular, none is widely adopted in practice, and there are no studies reviewing the methods industry practitioners use to evaluate conceptual models.

An alternative way to evaluate a multidimensional model is to use multidimensional model quality metrics. In the literature, we found metrics to measure model structural and development qualities that can be used to validate the quality of our model proposal.

One final aspect concerned the identification of energy visualization and reporting techniques. Despite the lack of agreement on the most relevant techniques, the literature identifies a common set of techniques, including accountability and load reporting, and other variations such as longitudinal benchmarking. We will consider these techniques to develop our BEMS prototype.

Chapter 4

A Multidimensional Model Solution

A multidimensional model for building energy management must integrate several data types, and also support the distinct energy management activities that BEMS must implement. The development of such a model comprises (i) the identification of the major energy management business processes, (ii) the definition of dimensions and facts, and (iii) the use of multidimensional model design patterns to determine the relationships between model entities. This chapter details the multidimensional model development process, and explains the building blocks of our model proposal.

4.1 Multidimensional Model Development

In the literature, there are two major DW development methodologies. The methodology created by Inmon follows a top-down approach, which relies on IT professionals’ in-depth knowledge of business processes and on a previous definition of those processes [11]. This approach is also based on the creation of entity-relationship diagrams and other technical artifacts, limiting end-users’ intervention in the development process [11, 26, 132]. The methodology developed by Kimball et al. (known as the Kimball lifecycle) follows a bottom-up approach [26]. Specifically, Kimball’s methodology consists of defining the business processes the model targets; determining the data granularity, model dimensions, and facts; and building a matrix that maps business processes to dimensions (the bus matrix). Unlike the concepts of the top-down approach, bottom-up concepts, such as facts and dimensions, are easier for end-users to understand, which eases their collaboration in the development process. This aspect is particularly important in the building energy management context, where users’ intervention compensates for the lack of business process systematization. For these reasons, we follow the Kimball lifecycle methodology to develop our multidimensional model [11, 26, 132].

Taking into account the Kimball lifecycle principles regarding dimensional modelling, any model design activities are preceded by the following steps [26]:

Identify the business processes to be modelled, taking into account business requirements and available data sources.

[Figure 4.1 (diagram not reproduced): concept map linking Organization, Building Space, Space Occupants and their Behaviour, Time and Date, Equipment, Energy Meter, Energy Data, Energy Costs, and Weather through relations labelled Contains, Own, Pay, Installed in, Associated With, Influences Consumption, Records, and Measures Consumption.]

Figure 4.1: Concept map of building energy consumption domain, depicting the network of concepts involved.

Declare the grain defining the meaning of each fact table measurement and its level of detail.

Identify the dimensions determining the sets of attributes describing fact table measurements. Usually, dimensions answer the questions of who, what, where, when, and how regarding the measure. Typical dimensions are time and date.

Identify the facts by defining what measures to include on fact tables. Facts must be consistent with declared grain.

The four-step prelude to model design is followed by the development of the bus matrix, which aims at standardizing dimensions and facts so that they can be used and interpreted uniformly across the organization. The result is a DW development architectural framework that allows development teams to work independently and at different times. Moreover, the bus matrix is a bridge of communication between developers and business managers [12].

In order to perform the previous steps and then develop the bus matrix in the context of BEMS, it is necessary to consider (i) the underlying data sources, and (ii) the building energy management processes [12]. The following sections describe the data sources and business processes related to building energy management.

4.1.1 Building Energy Management Related Data Sources

The identification of the major data sources was based on the review of energy management standards and energy data analysis activities. In addition, we asked several energy management domain experts about the relationships between distinct data types (see Figure 4.1). Accordingly, the major data sources that have to be integrated into a multidimensional model for building energy management are the following:

Energy Metering Data refers to energy consumption data, and quantifies the energy required to perform business activities in a given space area (optionally using an equipment), during a specific time interval [38, 133, 134]. Energy measurements are performed by meters of distinct types, which measure different energy features (e.g. electric current intensity) [38]. From our analysis, energy meters (i) use different protocols and communication parameters (baud rate, parity, data bits), (ii) measure different types of energy (e.g. gas, electricity), and (iii) use different units of measurement, which may be represented using the International System of Units or another system [133, 134].

Energy readings are associated with a datapoint, which aggregates the properties that describe an energy measurement. For example, it describes the units, precision, scale, and domain (e.g. energy or weather reading) of each measurement [135].

Building Spaces Data is concerned with capturing the organization of the building envelope. Each space entity can be an atomic space or an aggregation of spaces [136]. An aggregate space entity is a collection of spaces sharing similar characteristics, such as proximity, common building infrastructure, or even similar areas or volumes.

BEMSs use building spaces data to quantify energy consumption per space. It is also common to collect building characteristics, such as building height, shape, and size, and individual space characteristics, namely function/purpose (e.g. storage room, office room) [23].

Equipment Data encompasses devices that consume energy in the context of business activities, and are located in a building space. Equipment can be categorized according to multiple dimensions, such as location, period of operation, and function, and assembled into major systems such as HVAC, which may be organized into several functional sub-systems (e.g. chilled water production, air handling units) [4, 5, 65].

Equipment usage varies over time (for instance, equipment is used to adjust thermal comfort according to seasonal weather conditions) and depends on equipment characteristics, such as dimensioning, that impact energy consumption [4, 5, 65].

Building Space Occupancy and Behaviour Data influence energy consumption depending on space occupants’ activities, individual behaviour, and the number of individuals occupying the space [3, 6, 65]. In buildings, energy consumption varies with occupancy. An occupant can be an individual person or a collective entity (e.g. an organization), which, in turn, can be an aggregation of other entities or individuals.

Organizational data is used by BEMSs to allocate costs per organization or per individual member. It captures the structure of organization elements, i.e., groups of one or more people carrying out activities in a coordinated way to achieve common goals [33, 137, 138, 139, 140]. For example, in a university context we can identify groups with different objectives, such as learning, teaching, or administration activities (e.g. cleaning, security, and logistics). The organization’s activities are divided among its members, and there is a hierarchy of authority/responsibility [33, 137, 138, 139, 140]. An organization’s structure changes frequently, and therefore the system must be capable of handling organizational data changes [23].

Weather data is related to the environmental conditions (e.g. temperature, wind speed, and solar radiation) of a specific location. This data type is important since it is related to energy consumption in buildings [4, 7, 38, 141]. For instance, temperature and humidity require adjustments in thermal comfort, and solar radiation and cloud formation influence the total lighting load [7].

Energy costs usually vary on fixed time schedules (e.g. hourly and daily). The cost has a fixed component, and a variable component that depends on the current energy demand and the estimated energy consumption, among other factors. There are different categories of energy costs, including tariffs and real-time energy market costs [64, 78].
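Two of the data sources above lend themselves to a short illustration: the datapoint, which groups the properties describing a measurement, and building spaces, which can be atomic or aggregations of other spaces. The sketch below is a minimal in-memory representation under assumed names (`Datapoint`, `Space`, `consumption_of`) and sample values; it is not the schema proposed in this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Datapoint:
    # Properties named in the text: units, precision, scale, and domain.
    unit: str
    precision: int
    scale: float
    domain: str  # e.g. "energy" or "weather"


@dataclass
class Space:
    name: str
    children: list = field(default_factory=list)  # empty for an atomic space

    def atomic_spaces(self):
        """Return the atomic spaces contained in this space (itself if atomic)."""
        if not self.children:
            return [self]
        result = []
        for child in self.children:
            result.extend(child.atomic_spaces())
        return result


def consumption_of(space, kwh_by_space):
    """Roll up consumption recorded per atomic space to any aggregate space."""
    return sum(kwh_by_space.get(s.name, 0.0) for s in space.atomic_spaces())


floor = Space("Floor 1", [Space("Office 1.1"), Space("Storage 1.2")])
building = Space("Building A", [floor, Space("Lobby")])
kwh_by_space = {"Office 1.1": 12.5, "Storage 1.2": 2.5, "Lobby": 5.0}
floor_kwh = consumption_of(floor, kwh_by_space)
building_kwh = consumption_of(building, kwh_by_space)
meter_datapoint = Datapoint(unit="kWh", precision=2, scale=1.0, domain="energy")
```

The same recursive structure would apply to occupants and organizations, which the text also describes as individuals or aggregations of other entities.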

4.1.2 Building Energy Management Business Processes

The identification of business processes is a fundamental step to limit the number of design targets and to correctly define the grain, dimensions, and facts [12]. As described in Section 3.2, to identify building energy management business processes, we reviewed energy management standards [16, 17, 18, 19, 59, 60]. However, the standards’ description of energy management activities is neither backed by concrete requirements nor sufficiently detailed [20], which hampers the identification of the business processes that the multidimensional model must support.

Another aspect concerning building energy management standards is that they only describe high-level energy management activities, such as understanding energy usage or benchmarking current performance [20]. These activities comprise the execution of lower-level activities that involve the analysis of different types of measurements. As a result, the energy management activities described by the standards are associated with several types of measurements. Yet, each business process the multidimensional model supports must be associated with a single type of measurement and a corresponding granularity [12]. Therefore, the activities described in the standards are not appropriate to develop our model.

An alternative way of determining the business processes is to consider the energy consumption analysis methods and techniques reported in the literature, as described in Section 3.3.2, and then infer the underlying business processes. In particular, we considered studying the impact of weather conditions on energy consumption [4, 7, 65, 66, 67, 68, 70, 141], analysing the impact of space occupancy and occupant behaviour on consumption [6, 70, 72, 73, 74, 75], and evaluating the evolution of energy costs and tariffs over time [64, 78, 79, 80]. In addition, we considered the process of analysing energy consumption in buildings [16, 18, 19, 20, 57, 59, 64].

Each identified business process concerns a single, distinct type of measurement, and can easily be associated with a granularity. As a result, we were able to complete the analysis steps of declaring the grain, identifying the dimensions, and identifying the facts [12, 26]. The details are presented in Table 4.1. Using these analysis steps, it is then possible to derive the bus matrix (see Table 4.2) and inform the model design, as described in the following section.

Step                              Output

1. Identify the                   Business processes
   business processes             ● Analysis of buildings energy consumption,
                                  ● Analysis of weather conditions impact on energy consumption,
                                  ● Analysis of energy tariffs and costs evolution over time,
                                  ● Analysis of space occupancy and activities impact over energy consumption.

2. Declare the grain              Measurements and their associated granularity
                                  ● Energy consumption readings (every 15 minutes),
                                  ● Weather readings obtained (every 30 minutes),
                                  ● Space occupation (every 30 minutes),
                                  ● Energy Costs information (every month).

3. Identify dimensions            Dimensions and their associated roles
                                  Time and Date (when), Space (where), Organization (who),
                                  Equipment (how), Activity (what), Datapoint.

4. Identify the facts             Facts and their associated fact tables
                                  ● Energy measurements (energy meter readings fact table),
                                  ● Weather readings (weather readings fact table),
                                  ● Heating and cooling degree days (degree days fact table),
                                  ● Energy costs and tariffs (energy costs fact table),
                                  ● Measured occupation (building space occupancy fact table).

Table 4.1: Output of the four-step design process of the Modelling phase, from the Kimball lifecycle [26], instantiated to the Energy Management domain. The four steps comprise identifying the business processes, declaring the grain, identifying the dimensions, and identifying the facts.

Business Process                                                    Time  Date  Activity  Space  Datapoint  Equipment  Organization

Analysis of buildings energy consumption                            ●     ●     ○         ●      ●          ●          ○
Analysis of weather conditions impact on energy consumption         ●     ●     ○         ●      ●          ●          ○
Analysis of energy tariffs and costs evolution over time            ●     ●     ○         ●      ○          ○          ●
Analysis of space occupancy and activities impact over consumption  ●     ●     ●         ●      ○          ○          ●

Table 4.2: Representation of the Bus Matrix that associates business processes (left) with dimension tables (right) [44] (○ = Non-existent association, ● = Existent association).
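The bus matrix of Table 4.2 can also be written down as a simple mapping from business processes to the dimensions they use, which makes it easy to check which dimensions are shared. The sketch below transcribes the table; the variable and function names are assumptions introduced for this example.

```python
# Bus matrix transcribed from Table 4.2: business process -> associated dimensions.
BUS_MATRIX = {
    "Analysis of buildings energy consumption":
        {"Time", "Date", "Space", "Datapoint", "Equipment"},
    "Analysis of weather conditions impact on energy consumption":
        {"Time", "Date", "Space", "Datapoint", "Equipment"},
    "Analysis of energy tariffs and costs evolution over time":
        {"Time", "Date", "Space", "Organization"},
    "Analysis of space occupancy and activities impact over consumption":
        {"Time", "Date", "Activity", "Space", "Organization"},
}


def shared_dimensions(matrix):
    """Dimensions used by every business process in the matrix."""
    return set.intersection(*matrix.values())


def processes_using(matrix, dimension):
    """Business processes associated with a given dimension, in sorted order."""
    return sorted(p for p, dims in matrix.items() if dimension in dims)


shared = shared_dimensions(BUS_MATRIX)
organization_processes = processes_using(BUS_MATRIX, "Organization")
```

Dimensions that appear in every row are the ones that most need a single, uniform definition across fact tables.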

4.2 Multidimensional Model Solution Description

During the execution of building energy management business processes, energy consumption related metrics are recorded and stored as fact table measurements. Thus, the core of the multidimensional model consists of four fact tables: energy measurements, weather readings, building space occupancy, and energy costs.

The major dimension tables are the time, date, space, equipment, and datapoint dimensions. Additionally, the model contains other constructs, such as hierarchy bridges and group bridges, resulting from the application of multidimensional design patterns. The complete model representation is given in Figure 4.2 using the crow’s feet notation [142].

The application of multidimensional modelling design patterns depends on the problem specificity and context. We now describe the modelled situations and the design choices taken.

[Figure 4.2 (diagram not reproduced): fact tables (F) Energy Meter Readings, Weather Readings, Energy Costs, and Building Space Occupancy; dimensions (D) Time and Date, Space, Activity, Datapoint, Equipment, and Organization (in the roles Payer, Owner, and Occupant); tables (T) Space Hierarchy Bridge, Organization Hierarchy Bridge, Space Group Bridge, and Equipment Group Bridge.]

Figure 4.2: Representation of the complete multidimensional model using crow’s feet notation. A line end in the form of a crow’s foot represents a multiple association, and a single line end means the cardinality is one. In order to increase readability, the Time and Date dimensions are merged into one entity, and the Weather Readings and Degree Days fact tables are also merged. In the upper-left corner of each rectangle, F stands for fact, D stands for dimension, and T stands for table.

4.2.1 Fact Table Design Choices

An important decision for storing energy data is what type of fact tables to use. Fact tables can be categorized as transaction fact tables, periodic snapshot fact tables, or accumulating snapshot fact tables [44]. Transaction fact tables represent events (e.g. energy or weather readings) related to moments in time, which are stored in single rows [44]. Snapshot fact tables, on the other hand, take a “picture” of all the events that occurred in a time period without worrying about single events [44]. Considering their characteristics, transaction fact tables are the most appropriate choice to represent single energy measurements, energy tariffs, single weather readings, and space occupancy recordings. The energy measurements fact table is detailed in Figure 4.3.
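A transaction fact table stores one row per event, so period totals are computed at query time rather than stored. The following sketch illustrates this with Python's built-in `sqlite3` module; the table and column names (`date_dim`, `time_dim`, `datapoint_dim`, `energy_meter_readings`) and the sample rows are simplified assumptions, not the full schema of Figure 4.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, time_of_day TEXT);
CREATE TABLE datapoint_dim (datapoint_id INTEGER PRIMARY KEY, unit TEXT);
-- Transaction fact table: one row per individual meter reading.
CREATE TABLE energy_meter_readings (
    date_id INTEGER REFERENCES date_dim(date_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    datapoint_id INTEGER REFERENCES datapoint_dim(datapoint_id),
    measurement REAL
);
INSERT INTO date_dim VALUES (1, '2015-01-01');
INSERT INTO time_dim VALUES (1, '09:00'), (2, '09:15');
INSERT INTO datapoint_dim VALUES (1, 'kWh');
INSERT INTO energy_meter_readings VALUES (1, 1, 1, 1.5), (1, 2, 1, 2.0);
""")

# A daily total is derived on demand from the individual readings;
# a periodic snapshot table would store such a total directly.
daily_totals = conn.execute("""
    SELECT d.calendar_date, SUM(f.measurement)
    FROM energy_meter_readings AS f
    JOIN date_dim AS d ON d.date_id = f.date_id
    GROUP BY d.calendar_date
    ORDER BY d.calendar_date
""").fetchall()
```

Keeping the individual readings preserves the finest grain, at the cost of aggregating at query time.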

4.2.2 Multiple Transaction Fact Tables

Using multiple transaction fact tables allows more descriptive dimension attribute names, instead of a single fact table with generalized names that compromise the model understandability. Furthermore, ETL processes will likely be simplified since different source systems are loaded into different fact tables. However, more tables also mean more time to load and index [12].

An important design decision is whether energy consumption and weather data should be stored in a single fact table or in multiple fact tables. Despite being correlated, energy consumption measurements and weather readings are analysed differently [7]. The differences between them are as follows:

• They come from different data sources and have distinct granularities—they are recorded with different frequency rates.

• Despite being associated with the same dimensions (time, date, datapoint, space, and equipment), energy consumption is related to a group of spaces, while weather readings are associated only with the building where the station is located.

• They are associated with distinct business processes, and therefore, data analysis tools should present both data types in different ways.

Taking into account these differences, creating distinct fact tables, which is known as a multiple transaction fact tables approach, is the most sensible solution [44].

The weather fact table holds different types of weather measurements (e.g. precipitation and temperature). In this case, not only are they associated with the same business process, but they also have the same granularity and dimensionality. Therefore, they should be stored in the same fact table.
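When the two fact tables are kept separate, an analysis that needs both (for example, consumption against temperature) can aggregate each one to a common grain and then combine the results. The sketch below assumes 15-minute energy readings and 30-minute temperature readings given as simple tuples; the function names, the hourly target grain, and the sample values are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean


def energy_by_hour(readings):
    """Sum energy readings given as (hour, minute, kwh) for each hour."""
    totals = defaultdict(float)
    for hour, _minute, kwh in readings:
        totals[hour] += kwh
    return dict(totals)


def temperature_by_hour(readings):
    """Average temperature readings given as (hour, minute, celsius) for each hour."""
    samples = defaultdict(list)
    for hour, _minute, celsius in readings:
        samples[hour].append(celsius)
    return {hour: mean(values) for hour, values in samples.items()}


def combine(energy, temperature):
    """Join the two hourly results on the hours present in both."""
    shared_hours = sorted(energy.keys() & temperature.keys())
    return {h: (energy[h], temperature[h]) for h in shared_hours}


energy = [(9, 0, 1.0), (9, 15, 1.5), (9, 30, 1.0), (9, 45, 0.5)]   # every 15 minutes
weather = [(9, 0, 10.0), (9, 30, 12.0)]                            # every 30 minutes
combined = combine(energy_by_hour(energy), temperature_by_hour(weather))
```

Each fact table keeps its own grain, and only the query decides the level at which the two are compared.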

4.2.3 Multi-valued Dimensions

In the multidimensional data model, dimension table rows are related to the fact table by a one-to-many relationship [143]. For instance, the same date dimension row may be associated with many energy measurements (the measurements that took place on that given day). However, it is sometimes difficult to model data according to this one-to-many relationship pattern. Suppose, for example, that an energy measurement is taken by multi-sensor equipment installed in different building spaces. In this case, we have two many-to-many relationships:

• Many-to-many relationship between energy measurements and equipment

1. Each energy measurement is recorded by several equipment sensors working together.

2. Each equipment records many energy measurements.

• Many-to-many relationship between energy measurements and building spaces

1. Each energy measurement is associated with the different spaces where the consumption occurred.

2. Each space is associated with several energy consumption recordings.

[Figure 4.3 (diagram not reproduced): fact table (F) Energy Meter Readings with attributes Time ID, Date ID, Datapoint ID, Space Group ID, Equipment Group ID, and Measurement, connected to the dimensions (D) Time (Time ID, ...), Date (Date ID, ...), and Datapoint (Datapoint ID, ...), and to the tables (T) Space Group Bridge (Space Group ID, ...) and Equipment Group Bridge (Equipment Group ID, ...).]

Figure 4.3: Representation of the energy meter readings fact table using crow’s feet notation. A line end in the form of a crow’s foot represents a multiple association, and a single line end means the cardinality is one. In the upper-left corner of each rectangle, F stands for fact, D stands for dimension, and T stands for table. Adapted from [12].

[Figure 4.4 (diagram not reproduced): fact tables (F) Energy Meter Readings (Equipment Group ID, ...) and Weather Readings (Equipment Group ID, ...), table (T) Equipment Group Bridge (Equipment Group ID, Equipment ID, Contribution Weight), and dimension (D) Equipment (Equipment ID, ...).]

Figure 4.4: Representation of the group bridge table between the meter readings fact table and the equipment dimension. A line end in the form of a crow’s foot represents a multiple association, and a single line end means the cardinality is one. In the upper-left corner of each rectangle, F stands for fact, D stands for dimension, and T stands for table. Adapted from [12].

This issue is known as the multi-valued dimension problem [46]. One of the best solutions to overcome this problem consists of using an intermediate table to serve as a bridge between the fact and dimension tables [143]. Essentially, the fact table is connected by a many-to-many relationship to a group bridge table, which contains an individual row for each element in a group. For instance, an energy measurement associated with three spaces is assigned one group with three space rows [12]. In order to solve multi-valued dimension issues, our model includes space and equipment bridge tables (see Figure 4.4).

Whenever two pieces of equipment work together to record energy measurements, their contributions might be different [143]. For instance, in some situations, one piece of equipment might do the recordings by itself, or may record erroneous measurements, requiring later equipment identification. Accordingly, our group bridge table stores the percentage corresponding to each equipment’s contribution to the recorded measurements (e.g. 100%, 0%) [12].
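The group bridge pattern can be tried out with Python's built-in `sqlite3` module. In the sketch below, the fact table references an equipment group, the bridge table lists the members of each group with a contribution weight, and the query multiplies each measurement by that weight. The table and column names, the weights (stored as fractions that sum to 1 per group), and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equipment (equipment_id INTEGER PRIMARY KEY, name TEXT);
-- Group bridge: one row per equipment in each group, with its contribution weight.
CREATE TABLE equipment_group_bridge (
    equipment_group_id INTEGER,
    equipment_id INTEGER REFERENCES equipment(equipment_id),
    contribution_weight REAL
);
CREATE TABLE energy_meter_readings (equipment_group_id INTEGER, measurement REAL);
INSERT INTO equipment VALUES (1, 'Meter A'), (2, 'Meter B');
INSERT INTO equipment_group_bridge VALUES (10, 1, 0.75), (10, 2, 0.25), (11, 1, 1.0);
INSERT INTO energy_meter_readings VALUES (10, 8.0), (11, 2.0);
""")

# Weighting each measurement avoids counting it once per group member.
per_equipment = conn.execute("""
    SELECT e.name, SUM(f.measurement * b.contribution_weight)
    FROM energy_meter_readings AS f
    JOIN equipment_group_bridge AS b ON b.equipment_group_id = f.equipment_group_id
    JOIN equipment AS e ON e.equipment_id = b.equipment_id
    GROUP BY e.name
    ORDER BY e.name
""").fetchall()
```

Because the weights of each group add up to 1 in this example, the per-equipment figures add up to the total of the fact table.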

4.2.4 Role-playing Dimensions

A role-playing dimension is referenced by two or more fact table foreign keys. Each key represents a role, which is described by a name and is associated with a view of the dimension table (instead of the dimension table itself) [44]. Like bridge tables, role-playing dimensions are used to model many-to-many relationships. However, unlike bridge tables, they require a fixed cardinality, that is, a fixed number of roles. Additionally, the relationship cardinality must be low; otherwise the high number of roles and their associated table views might degrade the performance of the DW system [43].

One interesting application of role-playing dimensions is modelling occupancy. Occupancy may be associated with three organizations that may be different: the one that pays for the space, the one that owns it, and the one that occupies it. Once more, there is a one-to-many relationship between occupancy and organizations, which represents a multi-valued dimension issue. Nevertheless, this time the cardinality is always three, even if the same organization has all three roles. Accordingly, we use a role-playing dimension to model this. The occupancy fact table, along with the associated role-playing dimensions, is represented in Figure 4.5.

[Figure 4.5 diagram: the Building Space Occupancy fact table (Time ID, Date ID, Activity ID, Building Space ID, Owner Organization ID, Payer Organization ID, Occupant Organization ID) is linked to the Time, Date, Activity, and Building Space dimensions, and to the Payer Organization, Owner Organization, and Occupant Organization role-playing dimensions.]

Figure 4.5: Representation of the fact table that records the space occupancy of organization members over time. A line end shaped like a crow's foot represents a multiple association, and a single line end means the cardinality is one. In the upper left corner of each rectangle, F stands for fact, D for dimension, and T for table. Adapted from [12].
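The role-playing approach can be sketched as follows, using Python's built-in sqlite3 module. This is only an illustrative sketch: the view names, column names, and sample organizations are assumptions, not the names used in the actual model.

```python
import sqlite3

# Hypothetical names throughout; one physical dimension, three role views.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization_dimension (organization_id INTEGER PRIMARY KEY, name TEXT);

-- Each role is exposed as a view over the same dimension table.
CREATE VIEW owner_organization AS
    SELECT organization_id AS owner_organization_id, name AS owner_name
    FROM organization_dimension;
CREATE VIEW payer_organization AS
    SELECT organization_id AS payer_organization_id, name AS payer_name
    FROM organization_dimension;
CREATE VIEW occupant_organization AS
    SELECT organization_id AS occupant_organization_id, name AS occupant_name
    FROM organization_dimension;

CREATE TABLE space_occupancy_fact (
    building_space_id INTEGER,
    owner_organization_id INTEGER,
    payer_organization_id INTEGER,
    occupant_organization_id INTEGER
);
INSERT INTO organization_dimension VALUES (1, 'University'), (2, 'Physics Department');
-- The same organization may hold more than one role for a space.
INSERT INTO space_occupancy_fact VALUES (500, 1, 1, 2);
""")

role_rows = conn.execute("""
    SELECT f.building_space_id, o.owner_name, p.payer_name, c.occupant_name
    FROM space_occupancy_fact AS f
    JOIN owner_organization AS o ON o.owner_organization_id = f.owner_organization_id
    JOIN payer_organization AS p ON p.payer_organization_id = f.payer_organization_id
    JOIN occupant_organization AS c
         ON c.occupant_organization_id = f.occupant_organization_id
""").fetchall()
print(role_rows)
```

Using one view per role keeps a single copy of the organization data while giving each foreign key an unambiguous name in queries.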

4.2.5 Aggregate Fact Tables

Fact tables typically store data at the lowest granularity to enable analysing data at distinct levels of detail. Detailed data can always be aggregated into a larger grain (e.g. a monthly consumption can be obtained from daily consumptions), but not the other way around. For instance, a user may want to analyse the consumption of a piece of equipment per minute, per hour, or even per month.

Data aggregation, also known as summarization, is the default DW system behaviour for answering queries that involve large amounts of data. However, data aggregation relies on low-level granularity fact tables that are large and usually force the system to summarize millions of rows, which negatively impacts the performance of the DW [26, 49].

Aggregate fact tables are used to store a subset of another fact table data, but with a different grain. This typically results in a dramatic performance increase, adding only a small amount of extra storage and being transparent to users [26].
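A minimal sketch of an aggregate fact table is shown below, using Python's built-in sqlite3 module. The table names, column names, and readings are hypothetical; the point is only that the aggregate stores the same measure at a coarser (daily) grain.

```python
import sqlite3

# Hypothetical base fact table with sub-daily readings (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meter_readings_fact (date_id TEXT, time_id TEXT, reading REAL);
INSERT INTO meter_readings_fact VALUES
    ('2014-01-01', '00:00', 1.5),
    ('2014-01-01', '00:15', 2.5),
    ('2014-01-02', '00:00', 4.0);

-- Aggregate fact table: precomputed once, then queried instead of the base table.
CREATE TABLE meter_readings_daily_aggregate AS
    SELECT date_id, SUM(reading) AS total_reading, COUNT(*) AS reading_count
    FROM meter_readings_fact
    GROUP BY date_id;
""")

daily = conn.execute(
    "SELECT date_id, total_reading, reading_count "
    "FROM meter_readings_daily_aggregate ORDER BY date_id"
).fetchall()
print(daily)
```

Queries at daily or coarser grain can then read the much smaller aggregate table rather than summarizing the base rows each time.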

Degree days are a heating and cooling metric calculated from the difference between a day's average temperature and a predefined base temperature [67]. Hence, degree days have a different granularity from other weather variables (e.g. temperature).

Although we could pass the responsibility of calculating cooling degree days on to user applications, this would force the DW system to summarize a large number of rows, penalizing the application's performance. On the other hand, storing degree days measurements in the weather readings fact table, which has a fifteen-minute granularity, goes against the recommended multidimensional design principles. Consequently, we use an aggregate fact table to store the calculated degree days.
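To make the degree days calculation concrete, the following sketch derives daily heating and cooling degree days from sub-daily temperature readings. It follows the definition given above (difference between the daily average temperature and a base temperature). The base temperature of 18.0 °C, the function name, and the sample readings are assumptions chosen for illustration only.

```python
from collections import defaultdict
from statistics import mean

BASE_TEMPERATURE_C = 18.0  # assumed base temperature, for illustration only


def daily_degree_days(readings, base=BASE_TEMPERATURE_C):
    """Aggregate (iso_date, temperature_c) readings into daily degree days."""
    temperatures_by_day = defaultdict(list)
    for day, temperature in readings:
        temperatures_by_day[day].append(temperature)

    result = {}
    for day, temperatures in sorted(temperatures_by_day.items()):
        average = mean(temperatures)
        result[day] = {
            # Heating is needed when the daily average is below the base.
            "heating": max(0.0, base - average),
            # Cooling is needed when the daily average is above the base.
            "cooling": max(0.0, average - base),
        }
    return result


sample_readings = [
    ("2014-01-01", 10.0), ("2014-01-01", 14.0),
    ("2014-07-01", 24.0), ("2014-07-01", 26.0),
]
degree_days = daily_degree_days(sample_readings)
print(degree_days)
```

Each output row has a daily grain, which is why such values fit an aggregate fact table better than a fact table with a sub-daily grain.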

4.2.6 Variable Depth Hierarchies and Hierarchy Bridges

A hierarchy bridge table has an entry for each relationship between an entity and its parent, which makes it possible to navigate through the hierarchy. Hierarchy bridge tables are placed between a fact table and a dimension table, and require neither of them to be modified.

An organization is composed of smaller organizational units (e.g. departments, groups), which vary in number, size, purpose, and designation. For instance, the Physics department might have three groups, while the Mathematics department may be organized in branches, where each branch has 5 groups. A model for building energy management must handle hierarchies with variable depth, namely (i) the organization hierarchy, and (ii) the building space hierarchy. Each of them requires a hierarchy bridge table. A subset of our space hierarchy bridge is represented in Table 4.3.

Parent Space ID | Space ID | Levels from top | Is Bottom | Is Top
Physics Building | Physics Building | 0 | No | Yes
Physics Building | Quantum Physics Room | 1 | Yes | No
Physics Building | Astrophysics Room | 1 | No | No
Astrophysics Room | Stellar Physics Room | 2 | Yes | No

Table 4.3: Example of a building space hierarchy bridge table. Each row has the space identifier (Space ID), the parent space identifier (Parent Space ID), the number of levels between the space and the top of the space hierarchy (Levels from top), a flag indicating whether the space is at the top of the hierarchy (Is Top), and another flag indicating whether the space is at the bottom of the hierarchy (Is Bottom). Adapted from [12].
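The rows of Table 4.3 can be derived from a simple parent-child description of the spaces. The sketch below is one possible way of doing so; the function name and the dictionary-based input format are assumptions made for illustration, not part of the model or of the ETL workflows.

```python
def build_hierarchy_bridge(parent_of):
    """Build bridge rows from a mapping of space -> parent (None for the top space)."""
    spaces_with_children = {p for p in parent_of.values() if p is not None}

    def levels_from_top(space):
        # Walk up the hierarchy until the top space is reached.
        levels = 0
        while parent_of[space] is not None:
            space = parent_of[space]
            levels += 1
        return levels

    rows = []
    for space, parent in parent_of.items():
        rows.append({
            # As in Table 4.3, the top space is listed as its own parent.
            "parent_space_id": parent if parent is not None else space,
            "space_id": space,
            "levels_from_top": levels_from_top(space),
            "is_bottom": space not in spaces_with_children,
            "is_top": parent is None,
        })
    return rows


spaces = {
    "Physics Building": None,
    "Quantum Physics Room": "Physics Building",
    "Astrophysics Room": "Physics Building",
    "Stellar Physics Room": "Astrophysics Room",
}
bridge_rows = build_hierarchy_bridge(spaces)
for row in bridge_rows:
    print(row)
```

Because the depth is computed per space rather than fixed in the table structure, the same procedure handles hierarchies of any depth.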

Chapter 5

Multidimensional Model Validation

The quality of the multidimensional model must be assured at distinct phases of the DW development cycle. During the modelling phase, the model is validated with business users to confirm the accuracy of the data requirements. Whenever users do not understand the model, they have to rely on the interpretation conveyed by developers, who may have misinterpreted the original requirements [28]. Multidimensional models are usually the first artefacts to be tested, since their early validation is fundamental to finding errors, much like any other software engineering artefact [32, 126]. For instance, assessing the model quality before designing ETL processes reduces the impact of design errors [126]. Overall, the quality of the multidimensional model translates into the cost and time required to develop the DW, which in our case is embodied in the BEMS.

To the best of our knowledge, there are no proposals in the literature for validating the quality of multidimensional models. Accordingly, we developed a model validation methodology in which the model is evaluated using metrics proposed in the literature, subjected to workload tests as described by Golfarelli and Rizzi, and reviewed by end-users [126]. All evaluation steps were part of an iterative process in which model anomalies were corrected and improvements introduced. The precise evaluation metrics and workload tests are detailed in the following sections.

5.1 Metrics and Metric Selection

The multidimensional model quality metrics employed are those proposed in the literature to measure structural complexity [128], cognitive complexity [123, 124], usability [126, 127], and design quality [12, 26]. Some of the original metric proposals do not state which values are desirable for a given metric. Thus, we applied the metrics along with the recommended values reported in the literature by other authors. Such is the case of the recommended number of dimensions and fact tables [26, 123]. However, we only consider metrics associated with the multidimensional model representation of Kimball et al. [26], and metrics that were previously validated (empirically or theoretically) [144]. The metrics considered are listed in Table 5.1 (Section 5.3). To further clarify some metrics, we now describe advanced constructs, the roll-up factor, and the multidimensional model complexity metric.

Advanced constructs, proposed by Golfarelli and Rizzi, measure how hard the model may be for users to understand [145]. Multidimensional models become more difficult to understand when one of the following constructs is present:

Cross dimensional attribute values that are obtained through the combination of values from several attributes [145].

Non-additive measures on dimension rows (e.g. temperature measurements) cannot be summed [44]. The problem with non-additive measures is that queries executed by analysis tools over the DW select thousands or millions of rows each time, and handling these measures requires domain-specific aggregate functions, which compromises the performance of analytical queries [12, 26].

Hierarchies with multiple arcs have attributes linked to their parents several times. Therefore, the attributes have multiple values for each parent value, which increases the semantic complexity of queries [145].

Incomplete hierarchies have missing values on any hierarchical level [145].

Hierarchies with optional arcs have missing values on a hierarchical level and all its descendants. When missing values are at the bottom of the hierarchy, optional arcs and incomplete hierarchies are equivalent [145].

Variable depth hierarchies are those in which nodes at the same hierarchical level are not equally distant from the bottom of the hierarchy [44, 145].

The roll-up factor measures the OLAP navigability of the dimension hierarchies, and is defined as the inverse of the average number of descendants among the non-leaf hierarchical levels [126].

The idea behind the multidimensional complexity metric is to build a graph in which the dimensions, facts, and hierarchy levels are the nodes, and the relationships between these elements are the edges. The graph is used to obtain the number of relationships, which is used along with the number of table attributes to calculate the metric result. The metric is defined as follows [128]:

$$\sum_{i=1}^{N} \frac{(\mathrm{Node}_i\mathrm{Edges})^{2} \ast \mathrm{Node}_i\mathrm{Attributes}}{\mathrm{Node}_i\mathrm{Edges} + 1}$$

5.2 Multidimensional Model Validation Methodology

The multidimensional model evaluation procedures were applied iteratively until there were no significant changes to the schema. Changes arose from improvements identified after workload tests, evaluation with metrics, or end-user review sessions. Figure 5.1 depicts the evaluation process.

[Figure 5.1 diagram (BPMN): Start, Design Multidimensional Model, Review model with users, Execute workload tests, Perform evaluation with metrics, Analyse evaluation results, Review model with expert users, Document evaluation results, Model Development completed. Two feedback loops are labelled "Schema needs to be modified/improved" and "Schema needs to be modified/improved according to expert users feedback".]

Figure 5.1: Representation of the multidimensional model evaluation and improvement process cycle (using BPMN). The first step consists in the development of the initial model version (right). Subsequent steps comprise reviewing the model with users, executing workload tests, and performing evaluation with metrics. Afterwards, the model is reviewed by energy managers. During the last step developers document the model evaluation results (left).

5.2.1 Workload Tests

Workload tests validate the feasibility of executing data analyses, which in turn depends on the tables and relationships available in the schema [126]. The tests are also used to verify that every concept has a single, objective interpretation, which impacts the design of queries [114]. For instance, if a query can be expressed using two different concepts, the nomenclature of those concepts must be corrected.

The tests consisted of four steps: (i) designing the queries, (ii) checking whether they were applicable over the schema, (iii) expressing them in SQL, and (iv) executing them over the schema, which was loaded in a relational database.

The queries were created according to the energy data analysis methods and techniques described by Granderson et al. [64]. The description of each technique is accompanied by the required data types, their granularity, and an explanation of how the data types should be aggregated. Accordingly, we tested the model support for the sixteen analysis methods and techniques, applied over IST energy-related data. We executed a total of fifty queries (approximately three for each analysis method). Figure 5.2 shows one of the fifty queries executed.

Model evaluation metric results were obtained using a Java model evaluation tool developed specifically for this purpose. Some metric results were calculated manually (e.g. the number of snowflake constructs). In all cases, metric values were stored by the tool, enabling the comparison of the results obtained for different design alternatives.

SELECT week_day_name, AVG(reading)
FROM meter_readings_fact_table
NATURAL JOIN date_dimension
GROUP BY week_day_name

Figure 5.2: Example of a query used to determine the average energy consumption of the university campus buildings for each day of the week.
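The query of Figure 5.2 can be tried out on a toy schema with Python's built-in sqlite3 module, as sketched below. The two tables contain only the columns needed by the query, and the column date_id used as the join key, as well as the sample rows, are assumptions made for illustration.

```python
import sqlite3

# Toy schema: only the columns required by the query; date_id is an assumed key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dimension (date_id INTEGER PRIMARY KEY, week_day_name TEXT);
CREATE TABLE meter_readings_fact_table (date_id INTEGER, reading REAL);
INSERT INTO date_dimension VALUES (1, 'Monday'), (2, 'Tuesday'), (3, 'Monday');
INSERT INTO meter_readings_fact_table VALUES (1, 10.0), (2, 30.0), (3, 20.0);
""")

# NATURAL JOIN matches rows on the column shared by both tables (date_id).
averages = dict(conn.execute("""
    SELECT week_day_name, AVG(reading)
    FROM meter_readings_fact_table
    NATURAL JOIN date_dimension
    GROUP BY week_day_name
"""))
print(averages)
```

Note that NATURAL JOIN joins on every column name the two tables have in common, so it only behaves as intended when the key is the single shared column.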

5.2.2 End-User Review Sessions

End-user review sessions were held with the participation of 8 users and lasted an average of 90 minutes each. Each session involved one or two users at a time, and consisted of (i) explaining the concepts and the latest modifications, (ii) discussing the designations used, (iii) asking which designations should be improved, and (iv) asking for improvement suggestions.

The evaluation and improvement process lasted eight weeks. During the first five weeks, end-user review sessions were performed with five users, who had been taught how to interpret a multidimensional model. The participants had previous experience in the building energy management domain; two (40%) had a Master's degree in Computer Engineering, one (20%) had a PhD, and the remaining two (40%) had a Bachelor's degree. The average age was 30 years, with a standard deviation of 3.7. After each end-user session with energy managers, we identified which improvements could be made and repeated the model evaluation as previously described.

By the sixth week, the schema was no longer being significantly modified and we could not identify any significant improvements. The final end-user review sessions were performed with three experienced energy managers. Two of them had Master's degrees in Mechanical and Environmental Engineering, and the third had a Bachelor's degree in Telecommunications Engineering. The evaluation process finished when the improvement suggestions were no longer significant.

5.3 Metrics Evaluation Results

The evaluation results described next refer solely to the final metrics evaluation. Due to space constraints, the results obtained during intermediate stages of the model development are not detailed in this section (see Table 5.1). In the following subsections we describe the results for the structural and cognitive complexity, usability, and design quality metrics.

5.3.1 Structural and Cognitive Complexity Metric Results

The structural complexity metric (described in Section 5.1) result for our model was 229. Compared with the value obtained for the example provided by the author of the metric, the complexity of our model is high. This reflects the complexity of the domain itself, but the structural complexity of our model may compromise its understandability and maintainability [128].

Regarding the cognitive complexity metrics, we have an acceptable number of fact tables (5) and dimension tables (7), a low average number of foreign keys (4), and a low average number of fact table measurements (3). Unlike the structural metric results, the cognitive complexity metric results indicate that the model is easy to understand.

Taking both sets of results into account, the findings regarding understandability and maintainability are inconclusive.

Metric | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Desired Result

Structural Complexity Metrics
Multidimensional model complexity metric [128] | 56 | 89 | 160 | 210 | 262 | 250 | 235 | 229 | Low

Cognitive Complexity Metrics
Number of fact tables [123, 124] | 2 | 6 | 6 | 7 | 6 | 4 | 4 | 5 | Low [123]
Number of dimension tables [123, 124] | 8 | 11 | 12 | 11 | 9 | 10 | 10 | 7 | 4 to 15 [26]
Max. number of fact table foreign keys [123, 124] | 5 | 4 | 5 | 5 | 7 | 7 | 7 | 8 | Less than 20 [12]
Max. number of fact table measurements [123, 124] | 2 | 9 | 9 | 2 | 4 | 9 | 9 | 9 | Low [132]

Usability Metrics
Number of advanced constructs [126, 127] | 0 | 0 | 3 | 3 | 3 | 3 | 3 | 3 | Few or None
Max. number of dimensions per fact table [126, 127] | 5 | 4 | 5 | 5 | 5 | 6 | 7 | 8 | Less than 25 [12]
Max. number of attributes on dimension tables [126, 127] | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 13 | Few or 50 to 100 [26]
Roll-up factor for date dimension [126, 127] | 40% | 66% | 66% | 66% | 66% | 83% | 83% | 83% | High
Roll-up factor for time dimension [126, 127] | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | High
Roll-up factor for equipment dimension [126, 127] | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | High

Design Quality Metrics
Number of text attributes on fact tables [12, 26] | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | None
Number of snowflake constructs [12, 26] | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | Few or None
Number of hierarchies split across dimension tables [12, 26] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | None
Use of operational keys to join fact tables [12, 26] | Yes | Yes | Yes | Yes | No | No | No | No | Non-existent
Fact table data complies with table grain [12, 26] | False | True | True | True | False | True | True | True | True

Table 5.1: Summary of metrics results obtained during the execution of the model evaluation and improvement process cycle. The process lasted for 8 weeks, and comprised the application of structural and cognitive complexity metrics, usability metrics, and design quality metrics. Each metric evaluates a target quality.

5.3.2 Usability Metric Results

Usability metric results show that our model has two variable depth hierarchies which, as explained earlier, are considered advanced constructs. An alternative modelling approach consists of using smart textual attributes that describe the hierarchy structure. However, using a bridge table allows the entities in the hierarchy to be associated with multiple owners/parents, which is relevant for both organization and building space hierarchies. Furthermore, this solution allows expiration timestamps to be associated with hierarchy levels, thus limiting the impact of hierarchy modifications. This is an important requirement for organization hierarchies. Overall, despite being a complex design alternative, bridge tables are the most appropriate design choice, bringing higher flexibility when analysing variable depth hierarchies [12].

The weather readings fact table has non-additive measures, which are another category of advanced construct. In this case there is one kind of non-additive measure, which we previously identified as non-additive numeric measures. To deal with non-additive numeric measures (e.g. temperature), our strategy is to store the numerator and denominator in the fact table, so that data analysis tools can later use them to calculate, for example, the monthly average.

Our proposed multidimensional model has an average of 5 dimensions per fact, with a maximum of 8 and minimum of 5 dimensions. These results are far from the maximum of 25 dimensions recommended by Kimball and Ross [12]. Consequently, the number of dimensions per fact table of our model indicates good usability and also a low performance impact on the underlying system.

The average number of attributes per dimension table was 6, with a maximum of 13 and a minimum of 4. Although there is no indication of the recommended number of attributes, Kimball et al. mention that some dimensions have between 50 and 100 attributes, while others have only a few [26]. Accordingly, we have an acceptable number of attributes, but unlike other model features, we cannot consider it a solid indicator. Indeed, the number of model attributes is determined by the model user.

The roll-up factors (described in Section 5.1) were 71% for the hierarchy belonging to the date dimension, and 100% for the hierarchy belonging to the time dimension. High roll-up factor values indicate efficient OLAP navigability and a more complete definition of the relations between attributes [126]. Figure 5.3 represents the three existing hierarchies.

[Figure 5.3 diagram: hierarchy levels Hour, Minute, and Second for the time dimension (left), and Year, Quarter, Month, Week, and Day for the date dimension (right).]

Figure 5.3: Representation of time dimension hierarchy (left), and date dimension hierarchy (right).

Although we have two advanced constructs, considering the remaining metric results, we can affirm that our model has a high level of usability. Therefore, it is flexible for users to analyse the different energy-related aspects, has high learnability and ease of use, and does not imply a significant performance penalty on the underlying system [12, 126].
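The strategy of storing a numerator and a denominator for non-additive numeric measures, described earlier in this section, can be illustrated with a short sketch. The row layout (month, temperature sum, number of readings) and the sample values are assumptions chosen for illustration only.

```python
# Each row stores the numerator (sum of temperatures) and the denominator
# (number of readings) instead of a precomputed average. Values are made up.
daily_rows = [
    ("2014-01", 30.0, 3),  # a day with three readings, daily average 10.0
    ("2014-01", 20.0, 1),  # a day with a single reading, daily average 20.0
]


def monthly_average(rows):
    """Sum numerators and denominators per month, then divide once."""
    totals = {}
    for month, numerator, denominator in rows:
        running_sum, running_count = totals.get(month, (0.0, 0))
        totals[month] = (running_sum + numerator, running_count + denominator)
    return {month: s / n for month, (s, n) in totals.items()}


monthly = monthly_average(daily_rows)

# Averaging the daily averages instead gives a different (biased) value.
average_of_averages = (30.0 / 3 + 20.0 / 1) / 2

print(monthly, average_of_averages)
```

Both components are additive, so they can be summed at any level of the date hierarchy before the final division, which an already averaged value does not allow.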

5.3.3 Design Quality Metric Results

Considering the design quality metrics, we did not include any textual attributes in fact tables, and fact tables are joined by surrogate keys (synthetic record identifiers that are never reused or reassigned) [12]. Indeed, it is recommended that future model users follow these rules when determining fact table attributes. The model does not have snowflake constructs or hierarchies split across multiple dimensions, and fact table data complies with the specified grain, as previously described for the weather readings fact table and the degree days fact table. Therefore, according to the design quality metrics, our model has a high quality design.

5.4 Model Validation Findings

During the application of workload tests and while performing user review sessions, several model improvements were identified. Those improvements were made, and then validated in subsequent model development iterations. Consequently, the final model design no longer presents the issues identified; they are reported only to illustrate the contribution of workload tests and user review sessions to the final model design. The following sections describe the workload test findings and the major aspects raised by the users.

5.4.1 Workload Tests Findings

While applying the workload tests, we realized that our schema was not capable of representing organizations with different hierarchy layouts, or buildings with different space layouts. For instance, a building may or may not have floors, and each floor may or may not have rooms. Thus, we created the space and organization hierarchy bridges described in Section 4.2.6.

We also concluded that each energy meter is associated with several building spaces (e.g. the space where it is installed, and the one or more spaces it measures), and that each space is related to three organizations (the one paying for it, the one owning it, and the one occupying it). The solutions adopted for these situations were a group bridge table and a role-playing dimension, which were described in Section 4.2.3 and Section 4.2.4, respectively.

5.4.2 User Review Sessions Findings

While discussing the fact table measurements and dimension attributes of the model with users, we concluded that some names were not adequate. For instance, the energy tariffs fact table was renamed to energy costs fact table, since it must store several energy cost categories other than tariffs. Users also pointed out the role of activities, and of the associated space occupancy, in energy consumption. Accordingly, we created an activity dimension associated with the building space occupancy fact table.

5.5 Discussion

Overall, both workload tests and user reviews contributed to improving the model's completeness and ease of use. However, experienced energy managers had difficulty identifying model improvements without being able to perform concrete energy-related analyses. This reinforces the need for user review sessions in which we demonstrate the functionalities of a BEMS built upon our model.

During the model evaluation with metrics we could not draw a conclusion about the model complexity, which is associated with its understandability and maintainability. Nevertheless, we confirmed a high level of model usability, and thus its high learnability, ease of use, and flexibility when performing energy-related analyses, as well as a low performance impact on the underlying system. In addition, we demonstrated the high design quality of the model.

Although the validation was performed against a static model structure, the model may be simplified or extended. For instance, variable depth hierarchies may be replaced with fixed depth hierarchies. Indeed, the qualities of the proposed model, along with the guarantees provided by following the Kimball lifecycle [26] development guidelines, enable us to affirm that our model is easily modifiable.

Chapter 6

BEMS Prototype Solution

The lack of empirical testing is an obstacle to the effective evaluation of the quality of conceptual models [31]. As for multidimensional models, the lack of empirical testing limits the assessment of whether the model is fit for purpose, i.e., whether the model supports the required BEMS functionalities. Therefore, we validate our model by developing a BEMS prototype built upon our model proposal, using university campus data (e.g. building and energy data).

From a technical point of view, our solution consists of three independent modules: (i) the definition and execution of ETL processes to integrate data, (ii) the development of an OLAP web server that serves energy data analysis requests, and (iii) the development of energy management web applications that enable users to pose data analysis requests.

The prototype is validated by conducting interviews with energy managers. The purpose of the interviews is to evaluate the usability, performance, and functionality of the BEMS prototype, resulting in an additional model validation.

6.1 BEMS Prototype Development Context

According to the distinct BEMS architecture components described in Section 2.2.1, our prototype will focus on the data management and application layers. Those are the BEMS layers required to evaluate our model. The development of building automation and performance optimization layers goes beyond the scope of this work. Our BEMS prototype closely follows the architecture of a DW. In that architecture, our prototype includes the following components:

1. Data management layer consists of:

(a) A DW data staging area that is loaded by ETL workflows that extract and conform energy- related data.

(b) A DW data presentation area including a data storage and an OLAP web server.

2. A data application layer represented by data access tools in the form of energy management web applications.

Our BEMS prototype was developed using open-source software tools, which are flexible, extensible, and usually not tied to any operating system. Moreover, being open-source allows us to make it publicly available to other researchers [46].

BEMSs can be used in different contexts, including different types of buildings and organizations. The BEMS prototype implementation uses data from 2013 and 2014 in the IST university context, whose details can be found in Appendix A. In the following sections we describe the major solution components, namely the ETL workflows, the OLAP web server, and the energy data analysis applications.

6.2 ETL Workflows Development

A DW system relies on a single data store containing clean and cohesive data, which enables users to perform business-related analyses [43]. The effective design and implementation of the appropriate ETL workflows is a significant part of the DW development process.

To develop our ETL workflows, we followed the ETL development guidelines of Kimball and Caserta [43], plus those included in the Data Staging Design & Development phase of the Kimball lifecycle [26]. The following subsections describe the development of the ETL processes.

6.2.1 Choosing the Data Integration Tool

Our choice for implementing ETL workflows was the Pentaho Data Integration software (PDI). PDI is a professional-grade open-source ETL tool that enables the definition and execution of ETL processes, and is supported by a growing community. We used PDI to extract data from sources, transform data according to the model specification, and load data into a data store (a PostgreSQL relational database).

PDI ETL workflows can be jobs or transformations that consist of steps, which are connected by hops. A job is a workflow for high-level tasks, such as invoking other transformations or jobs, scheduling tasks, or creating and deleting files. Transformations, in turn, are used to access data from different sources, modify data, and load it into existing repositories. Steps are the individual tasks performed by both jobs and transformations [146].

6.2.2 Determining Systems-of-Record and Extraction Procedures

Energy-related data is available in distinct types of data sources (e.g. web API, website), and stored using different data formats (e.g. JSON, XML). Data is extracted in different ways (e.g. programmatically using an API), transformed using ETL workflows, and then loaded into the appropriate dimension and fact tables. The heterogeneity of data sources requires the application of complex data transformation processes, so the source data records differ from those stored in the DW presentation area. As such, it is vital to document the systems-of-record of dimension and fact tables, that is, the original data sources [43].

Another crucial aspect to consider is the required data extraction procedures. In general, data can be extracted directly from sources by PDI using, for example, web APIs. Nevertheless, some sources require a manual extraction of stored data files, which we designate as a semi-automatic extraction process. Energy costs data was manually extracted and put into files (manual extraction procedure). Table 6.1 summarizes the extracted data formats, the description of the data sources, the extraction procedure performed, and the target dimensions and fact tables of the multidimensional model.

Data Format | Source Description | Extraction Procedure | Model Table Target
XLS | ERSE Website^1 | M | Energy Costs Fact Table, Datapoint Dimension
XLS | EnergIST Energy Management System^2 | SA | Energy Readings Fact Table, Datapoint Dimension
XLS | Fenix System website^3 | SA | Space Dimension
CSV | IST Taguspark cactitp API^4 | A | Space Occupancy Fact Table
CSV | Degreedays.net Website^5 | SA | Degree Days Aggregate Table
HTML | MeteoIST Website^6 | A | Weather Readings Fact Table, Datapoint Dimension
JSON | FenixEdu API^7 | A | Activity Dimension, Space Dimension, Space Hierarchy Bridge
JSON | Wunderground API^8 | A | Weather Readings Fact Table, Degree Days Fact Table
XML | Yahoo Weather API^9 | A | Weather Readings Fact Table, Datapoint Dimension

Table 6.1: Summary of integrated data sources, their origins, data formats, extraction procedures (M stands for Manual, SA for Semi-Automatic, and A for Automatic), and the mapping between model tables and their systems-of-record.

6.2.3 Extracting Data from Sources

The distinct data formats require different parsing procedures. To parse XLS, XML (Extensible Markup Language), and CSV (comma-separated values) files, we used PDI built-in functionalities. JSON (JavaScript Object Notation) data records, on the other hand, are parsed using JSONPath expressions (an XPath-inspired query language for JSON) embedded in ETL workflow steps. To extract weather data measurements from HTML pages, we embedded a web scraping Java application in PDI. Essentially, the Java application downloads the web pages, uses regular expressions to isolate the relevant HTML content, splits measurement values into rows and columns, and then sends the data to the next ETL workflow step.

Unlike other data types, time and date data was created directly using PDI steps, instead of being extracted from external data sources. Time and date transformations were divided into sub-transformations, which are extensively used by other transformations. For instance, the transformation responsible for loading weather data relies on a date sub-transformation to obtain a table containing the days of 2014. This table is later used to associate weather records with date dimension entries.

IST equipment and organization decomposition data was not available. In any case, organization and equipment data records are associated with atomic spaces, while our energy consumption data is associated with building aggregates only. Thus, due to the different granularity of these data types, it would not be possible to effectively correlate the available energy measurements with organization or equipment data. Associating those records with entire buildings would make it impossible to account for the impact of individual equipment or organizational structures on energy consumption. A similar issue occurs when correlating aggregated building energy consumption with space activities. Yet, we were able to obtain isolated energy measurements for one single room (Taguspark A4), and correlate them with occurring events such as classes and exams.

^1 http://www.erse.pt/pt/electricidade/tarifaseprecos/Paginas/default.aspx
^2 http://energist.ist.utl.pt
^3 http://fenix.tecnico.ulisboa.pt
^4 https://cactitp.tecnico.ulisboa.pt/graph_view.php
^5 http://www.degreedays.net
^6 http://meteo.ist.utl.pt
^7 http://fenixedu.org/dev/api
^8 http://www.wunderground.com/weather/api
^9 https://developer.yahoo.com/weather
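As an illustration of what the date sub-transformation described in this section produces, the sketch below generates one row per day of 2014 in plain Python. It does not use PDI, and the function name, the attribute names, and the integer date_id format are assumptions made for illustration.

```python
from datetime import date, timedelta

WEEK_DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]


def date_dimension_rows(year):
    """Return one dictionary per calendar day of the given year."""
    rows = []
    day = date(year, 1, 1)
    while day.year == year:
        rows.append({
            # Assumed key format: YYYYMMDD as an integer.
            "date_id": day.year * 10000 + day.month * 100 + day.day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day": day.day,
            # date.weekday() returns 0 for Monday through 6 for Sunday.
            "week_day_name": WEEK_DAY_NAMES[day.weekday()],
        })
        day += timedelta(days=1)
    return rows


dates_2014 = date_dimension_rows(2014)
print(len(dates_2014), dates_2014[0])
```

A table of this kind can then be joined with weather records on the calendar date to obtain the corresponding date dimension key.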

6.2.4 Establishing Data Source Priorities

An important aspect of designing ETL workflows is to establish data source priorities that deal with cases where equivalent data records exist on different sources. We established data source priorities for the following data types:

Building names are available on the FenixEdu API 10, the Fenix system website 11, and the EnergIST energy management system 12. Names available on the FenixEdu API and the Fenix system are complete, while those on EnergIST include abbreviations, incomplete names, and words separated with dashes instead of spaces. In addition, FenixEdu data is available directly through its API, while the Fenix system requires user authentication and its data has to be extracted semi-automatically (users must extract data files using visual tools). In conclusion, we chose FenixEdu as the main data source for building names and space hierarchy layout data (also available on both systems).

Degree days data is provided by the Degreedays 13 and Weather Underground 14 websites. The Degreedays website allows users to manually export data, while Weather Underground offers an API to access data programmatically. On the other hand, the Degreedays website is permanently available, while the Weather Underground API restricts daily data access. Taking into account data availability on both sources, we gave a higher priority to the data provided by the Degreedays website.

Weather variables such as temperature or humidity are collected by weather stations. For instance, the IST weather station records weather variable measurements every five minutes. Afterwards, measurements are made available on the MeteoIST website 15, and then sent to other weather data websites such as Weather Underground 16. In any case, the MeteoIST website provides IST campus weather data at the finest granularity, and data is always available for extraction. Therefore, we gave a higher priority to the MeteoIST website over other data sources.

10 http://fenixedu.org/dev/api
11 http://fenix.tecnico.ulisboa.pt
12 http://energist.ist.utl.pt
13 http://www.degreedays.net
14 http://www.wunderground.com/weather/api

6.2.5 Overcoming Data Quality Problems

Integrating data from heterogeneous sources requires checking for inconsistent descriptions, erroneous value assignments, missing values, and integrity constraint violations, among other issues [147]. The major data quality issues and the solutions adopted are described in the following paragraphs.

1. Preserving Implicit Hierarchy Relationships

In general, there are dependencies between data source records. Some dependencies concern primary key-foreign key associations (between facts and dimensions), while others result in dimension hierarchies [43]. In either case, ETL workflows must be able to transpose the dependencies between data source records into multidimensional model dependencies.

As we make clear later in the document (Section 6.2.7), loading dimension table data before fact table data guarantees that the primary key-foreign key associations of the multidimensional model are set correctly.

The parent-child relationships between IST building spaces are encoded into JSON objects, which include the space description and the list of children spaces. To preserve these parent-child relations, we had to parse space data records iteratively, parsing a parent space and then parsing its children. However, PDI does not provide any looping steps. Therefore, we created a job responsible for invoking the space transformation once for each space hierarchy level, passing parent data to its children spaces.
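The level-by-level parsing described above can be sketched as a breadth-first walk over the space tree. This is an illustration only: the JSON field names (`id`, `name`, `children`) are hypothetical and do not necessarily match the FenixEdu API, and a plain loop stands in for the PDI job that re-invokes the space transformation.

```python
def flatten_space_hierarchy(root):
    """Walk the space tree one hierarchy level at a time.

    Each emitted row carries the identifier of its parent, so parents are
    always produced before their children.
    """
    rows = []
    current_level = [(root, None)]  # (space object, parent identifier)
    level = 0
    while current_level:
        next_level = []
        for space, parent_id in current_level:
            rows.append({
                "id": space["id"],
                "name": space["name"],
                "parent_id": parent_id,
                "level": level,
            })
            # Pass the parent identifier down to the children spaces.
            for child in space.get("children", []):
                next_level.append((child, space["id"]))
        current_level = next_level
        level += 1
    return rows


# Hypothetical example data: campus -> building -> floor.
campus = {
    "id": "c1", "name": "Alameda",
    "children": [
        {"id": "b1", "name": "South Tower",
         "children": [{"id": "f1", "name": "Floor 0"}]},
        {"id": "b2", "name": "North Tower"},
    ],
}
space_rows = flatten_space_hierarchy(campus)
```

Because every level is completed before the next one starts, a parent row always exists by the time its children are loaded.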

2. Filtering Non-Existent Records

Null dimension table records invite user errors. Concretely, users may try to join fact tables with null dimension table keys, obtaining unexpected results. Moreover, the existence of null dimension keys results in referential integrity violations [12]. Referential integrity requires foreign keys to reference non-null primary keys [148]. That is, the records describing fact table measurements must be present in the dimension tables.

In order to deal with non-existent space records (e.g. buildings whose only floor is omitted from the hierarchy layout), we filtered and removed spaces with null identifiers.

3. Integrating Data from Distinct Sources

Some multidimensional model tables require the extraction and transformation of data from several sources. The resulting ETL workflow may be complex, making debugging and future modifications harder.

15 http://meteo.ist.utl.pt/
16 http://www.wunderground.com/weather/api

Figure 6.1: Representation from left to right of space ETL workflow. First, space data is extracted and loaded (1). Secondly, space description data is extracted and space dimension table data is updated (2). Finally, space hierarchy bridge data is created and loaded according to space dimension data (3).

One solution is to create several ETL workflows instead of one, where the first loads dimension data and the subsequent ones update the dimension table records already stored in the database. As a result, the overall execution time of the ETL workflows increases, but the delay is not considerable because dimensions generally have a low number of records.

To integrate data from the space layout and space description sources, we developed two distinct transformations with numerous steps. We used the first transformation to load space layout data into the data storage, and the second to update stored records with space description data (see Figure 6.1).

4. Cleaning Dimension Attributes Data

Dimension table attributes are used to describe fact table measurements. Accordingly, the quality of attribute data is crucial for the efficiency of data analysis activities [54].

Space data sources have a field for space usage and another for classification. Some space records had data on both fields, while others had one empty field. Accordingly, our space dimension records have an attribute containing the concatenation of the original description and classification fields. In addition, data source records containing exact duplicate data were compared and deduplicated.

5. Creating Surrogate Keys

Surrogate keys are synthetic keys used to join dimensions and fact tables [12]. The use of surrogate keys requires generating sequential identifiers on most ETL workflows, instead of relying on the natural identifiers provided by data sources. Surrogate keys have the following advantages [12]:

• Surrogate keys are compact 4-byte integers, while natural keys can be alphanumeric or long types. A 4-byte integer is large enough for most dimensions and improves database query performance.

• Unlike surrogate keys, natural keys can be reused or reassigned on the data sources, compromising the integrity of our DW records. Indeed, if natural keys were used as primary keys, the DW could integrate distinct records sharing the same key, making it impossible to uniquely identify records.

• Associating records with both surrogate and natural keys enables the use of type 2 SCD (described in Section 2.3.3). Their roles are described as follows:

Natural Key is used to group records with their successors/predecessors, enabling the analysis of record attribute-value changes over time. Additionally, natural keys are used to match dimension table records with their original data source records, identifying the changes that occurred.

Figure 6.2: Representation of the switch case step used to parse each activity type separately. The steps that follow the switch case step parse activities data using a different set of JSON PATH queries.

Surrogate Key is used as the dimension table records primary key, uniquely identifying each dimension record.

Despite the required effort (on ETL workflows development) to associate dimension records with surrogate keys, we included surrogate keys on all dimension tables, and stored space’s natural keys in space dimension.
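The interplay between surrogate keys, natural keys, and type 2 SCD described above can be sketched as follows. This is a simplified in-memory illustration under stated assumptions, not the PDI implementation: the class name, the column names (`space_key`, `natural_key`, `valid_from`, `valid_to`, `is_current`), and the example attribute values are all hypothetical.

```python
from itertools import count


class SpaceDimension:
    """In-memory stand-in for a type 2 SCD dimension table."""

    def __init__(self):
        self._next_key = count(1)  # sequential surrogate key generator
        self.rows = []

    def current_row(self, natural_key):
        """Return the current version of the record with this natural key."""
        for row in self.rows:
            if row["natural_key"] == natural_key and row["is_current"]:
                return row
        return None

    def upsert(self, natural_key, attributes, effective_date):
        """Insert a new record, or a new version if its attributes changed."""
        current = self.current_row(natural_key)
        if current is not None:
            if current["attributes"] == attributes:
                return current["space_key"]      # nothing changed
            current["is_current"] = False        # close the old version
            current["valid_to"] = effective_date
        row = {
            "space_key": next(self._next_key),   # surrogate key (primary key)
            "natural_key": natural_key,          # groups versions together
            "attributes": dict(attributes),
            "valid_from": effective_date,
            "valid_to": None,
            "is_current": True,
        }
        self.rows.append(row)
        return row["space_key"]


dim = SpaceDimension()
first = dim.upsert("A4", {"usage": "classroom"}, "2014-01-01")
same = dim.upsert("A4", {"usage": "classroom"}, "2014-06-01")
second = dim.upsert("A4", {"usage": "laboratory"}, "2015-01-01")
```

Both versions of the hypothetical space "A4" share the natural key, so the change of usage can be traced over time, while each version keeps its own surrogate key for joins with fact tables.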

6. Transforming Heterogeneous Data Records

Data records stored on the same source may have different data structures, data types, and fields, among other differences. These differences must be identified before developing ETL workflows, and records should be parsed according to their characteristics [43].

One way to transform heterogeneous data records is to use the PDI switch-case step. This step is used to split the main ETL workflow execution branch. Records are split among the sub-branches, according to their attribute values or data types, and then parsed differently.

JSON objects containing space activities data have different structures and fields for each event type. Therefore, we used a switch-case step, parsing different event type records with distinct JSON PATH expressions (see Figure 6.2).

In order to load energy access tariff costs, a number of calculations need to be performed according to energy cost components (e.g. daily period cost, peak power, and contracted power). This logic is implemented using a switch-case step that splits the rows according to the cost component associated with each row, performs the appropriate calculations, and then loads the data.
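The switch-case branching described in this item can be approximated with a dispatch table. The sketch below is illustrative only: the event type values (`LESSON`, `EXAM`) and the JSON field names are hypothetical, and plain dictionary access stands in for the JSON PATH expressions used in the PDI steps.

```python
def parse_lesson(event):
    """Branch for lesson events (hypothetical structure)."""
    return {
        "activity_type": "lesson",
        "description": event["course"]["name"],
        "start": event["start"],
        "end": event["end"],
    }


def parse_exam(event):
    """Branch for exam events, which keep their period in a nested object."""
    return {
        "activity_type": "exam",
        "description": event["title"],
        "start": event["period"]["start"],
        "end": event["period"]["end"],
    }


# One parser per event type, playing the role of the switch-case targets.
PARSERS = {"LESSON": parse_lesson, "EXAM": parse_exam}


def parse_event(event):
    """Route a record to the branch matching its type; unknown types are skipped."""
    parser = PARSERS.get(event["type"])
    return parser(event) if parser is not None else None


events = [
    {"type": "LESSON", "course": {"name": "Databases"},
     "start": "2014-03-03T09:00", "end": "2014-03-03T10:30"},
    {"type": "EXAM", "title": "Databases Exam",
     "period": {"start": "2014-06-20T15:00", "end": "2014-06-20T17:00"}},
    {"type": "UNKNOWN"},
]
parsed = [row for row in map(parse_event, events) if row is not None]
```

Each branch returns rows with the same fields, so the steps after the branches can treat all activity types uniformly.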

7. Merging Records on Disparate Source Systems

Matching entities across disparate systems usually requires the use of a matching algorithm. This algorithm can be as simple as joining records using their primary keys. However, distinct data sources usually do not share primary keys. Therefore, record matching algorithms are usually based on fuzzy logic techniques, such as string matching [43].

Algorithm                Average Time   Result Accuracy
Pair Letter Similarity   3min 37s       100%
Jaro                     3min 8s        100%
Levenshtein              3min 4s        90%
Needleman-Wunsch         7min 25s       90%
Metaphone                3min 11s       60%
Double Metaphone         3min 11s       60%

Table 6.2: Summary of algorithms tested in order to find matches between space names. Time results are calculated from the average of ten algorithm runs.

Energy consumption records on the EnergIST 17 energy management system are associated with space names that are inconsistent with those on the Fenix system website 18. In order to find the most appropriate algorithm to match names on both systems, we searched for documentation on the PDI implementations of string matching algorithms. Regrettably, such documentation is not available.

As an alternative way of choosing the most appropriate string matching algorithm, we ran each algorithm ten times, and analysed the performance and matching accuracy results (see Table 6.2). In general, the execution time of each algorithm varied little between runs (3 seconds difference on average). Accordingly, it did not seem relevant to run each algorithm more than ten times.

Analysing the different algorithm results, Metaphone and Double Metaphone algorithms performed poorly in terms of accuracy. Those algorithms are based on the English language, and our data records contain Portuguese building names [146].

The Needleman-Wunsch algorithm took more than twice the time used by the remaining algorithms, while the Levenshtein algorithm took the least time to finish, but did not find any match for some words. Between the pair letter similarity and Jaro algorithms, which were equally accurate, we chose the Jaro algorithm because it took less time to finish.
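To make the chosen technique concrete, the following is a textbook implementation of Jaro similarity, used to pick the closest candidate name. It is a sketch under assumptions: the PDI implementation is undocumented and may differ in details, and the `normalize` helper and the example building names are hypothetical.

```python
def jaro_similarity(a, b):
    """Textbook Jaro similarity between two strings, in the range [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are equal and not too far apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len(a)):
        if a_matched[i]:
            while not b_matched[k]:
                k += 1
            if a[i] != b[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def normalize(name):
    """Lowercase and replace dashes with spaces (hypothetical pre-processing)."""
    return name.lower().replace("-", " ").strip()


def best_match(name, candidates):
    """Return the candidate whose normalized form is most similar to `name`."""
    return max(candidates,
               key=lambda c: jaro_similarity(normalize(name), normalize(c)))


reference_names = ["Torre Sul", "Torre Norte", "Pavilhão de Química"]
match_full = best_match("Torre-Sul", reference_names)
match_abbreviated = best_match("Torre-N", reference_names)
```

In practice a minimum similarity threshold would also be needed, so that names with no real counterpart are left unmatched instead of being paired with the least bad candidate.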

8. Using a Data Source To Load Several Model Tables

Fact table measurements data and their associated dimension records may be extracted from the same data source. Accordingly, the dimension table records must be loaded before the fact table measurements.

A possible solution is developing an ETL workflow that extracts data, waits for the branch that loads dimension records to finish, and then allows the second branch to load fact table measurements.

Our multidimensional model fact tables are associated with a datapoint dimension table. Thus, before loading fact table records, it is necessary to create new datapoint dimension records, and load them before their associated fact rows. To create datapoint records, we filter the attributes describing fact table records. In particular, we (i) select unique record attribute-value combinations, (ii) associate them with a unique identifier and load them into the datapoint dimension table, and (iii) then use a PDI step to join them back with their related fact table records (see Figure 6.3). Finally, those records are transformed and loaded as fact table measurements.

Figure 6.3: Representation from left to right of the initial part of energy costs ETL workflow. After extracting energy costs data (1), resulting data is used on two branches. In particular, the lower branch filters unique cost description field value combinations (2), adds constant fields such as units (3), generates a unique row key (4), loads data on datapoint dimension (5), and merges obtained rows with original data rows coming from the first branch (6).

17 http://energist.ist.utl.pt
18 http://fenix.tecnico.ulisboa.pt
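The datapoint creation steps of item 8 (select unique combinations, assign identifiers, join back) can be sketched as below. The function name, the field names (`tariff`, `component`, `datapoint_key`), the constant unit, and the sample values are hypothetical, and the loading into the database is omitted.

```python
def build_datapoint_dimension(rows, describing_fields, constants):
    """Split source rows into datapoint dimension rows and keyed fact rows."""
    keys = {}   # attribute-value combination -> datapoint key
    facts = []
    for row in rows:
        combination = tuple(row[field] for field in describing_fields)
        # (i) keep each unique combination once, (ii) give it an identifier.
        if combination not in keys:
            keys[combination] = len(keys) + 1
        # (iii) join the identifier back onto the original row.
        fact = {k: v for k, v in row.items() if k not in describing_fields}
        fact["datapoint_key"] = keys[combination]
        facts.append(fact)
    dimension = [
        {"datapoint_key": key, **dict(zip(describing_fields, combination)), **constants}
        for combination, key in keys.items()
    ]
    return dimension, facts


# Hypothetical cost rows; values are made up for illustration.
cost_rows = [
    {"tariff": "medium voltage", "component": "peak power", "date": "2014-01-01", "value": 10},
    {"tariff": "medium voltage", "component": "contracted power", "date": "2014-01-01", "value": 4},
    {"tariff": "medium voltage", "component": "peak power", "date": "2014-02-01", "value": 11},
]
datapoints, cost_facts = build_datapoint_dimension(
    cost_rows, ["tariff", "component"], {"unit": "EUR"})
```

The dimension rows would be loaded first, so that every `datapoint_key` referenced by a fact row already exists when the fact rows are loaded.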

9. Load Measurements with Different Units

Sometimes fact table measurements must be provided in different units of measurement. However, for efficiency reasons, the measurements should be stored using a fixed unit. The solution alternatives are as follows [12, 26]:

(a) Creating a dimension table that holds unit conversion factors and associates them with fact table measurements.

(b) Storing all measurements according to the same unit, along with U unit conversion factor attributes. Accordingly, fact table rows would have M measurements + U unit conversion attribute columns.

(c) Storing M measures for each unit factor U, having M * U fact table measurement columns.

The first two alternatives force users to multiply measurements by conversion factors, which is error prone [12]. In our case, the third alternative is the most appropriate. Indeed, there is only one measure type (energy consumption) and three units of measure (Wh, kWh, and MWh), resulting in three fact table measurements.
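As a small worked example of alternative (c), a single reading in Wh can be expanded into the three measurement columns at load time. Only the column name `measurement_mwh` appears in the prototype's SQL (Figure 6.5); the other two column names are assumed by analogy, and floating-point values are used here for simplicity, ignoring the integer storage discussed in Section 6.4.

```python
def to_measurement_columns(watt_hours):
    """Expand one Wh reading into the three stored measurement columns."""
    return {
        "measurement_wh": watt_hours,
        "measurement_kwh": watt_hours / 1_000,       # 1 kWh = 1 000 Wh
        "measurement_mwh": watt_hours / 1_000_000,   # 1 MWh = 1 000 000 Wh
    }


columns = to_measurement_columns(2_500_000)
```

With M = 1 measure and U = 3 units, this yields the M * U = 3 columns mentioned above, and users never have to apply a conversion factor themselves.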

10. Zero and Negative Measurements

As previously described, zero and negative energy measurement values should be flagged as erroneous. For instance, the occurrence of negative values is an indication of faulty energy meters/sensors. Therefore, energy managers should investigate the causes of the error, and then remove the error indication [54].

Energy measurement data obtained from the EnergIST system 19 contains negative and zero values. Negative values result from faulty energy meters, and zero measurements are associated with unused spaces. Yet, there are few unused spaces, and sometimes faulty meters also record zero measurements. Accordingly, our solution is to convert all negative values to zero, and to let energy managers analyse each case.

19 http://energist.ist.utl.pt
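The cleaning rule for negative readings can be sketched in a few lines. Returning the positions of the zero values is an addition made for illustration (it is one possible way of pointing energy managers to the cases they need to review) and is not described in the original workflow.

```python
def clean_readings(values):
    """Convert negative readings to zero and list the positions to review."""
    cleaned = [max(value, 0) for value in values]
    # Zeros may come from unused spaces or from faulty meters, so every
    # zero is reported for manual analysis.
    to_review = [i for i, value in enumerate(cleaned) if value == 0]
    return cleaned, to_review


cleaned, to_review = clean_readings([5, -3, 0, 12])
```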

6.2.6 Determining Slowly Changing Dimensions

As previously described in Section 2.3.3, slowly changing dimensions (SCDs) are strategies associated with dimension tables that define how dimensions handle data record changes over time. In the context of building energy management, the correct handling of dimension data changes enables the analysis of the impact of events (e.g. equipment replacement) on energy consumption. In the following paragraphs we describe the SCD types associated with each dimension table.

Time and date dimension records are immutable. Therefore, we associated those dimensions with type 0 SCD, which indicates that dimension attributes must not be changed [12].

Attributes on the datapoint dimension are used to describe measurements (e.g. units, scale). Since these attributes rarely change, there is no need to keep their historic values. Accordingly, we associated the datapoint dimension with type 1 SCD.

Space layouts and organization structures are subject to change over time. These changes affect energy consumption and, consequently, energy managers may need to compare energy consumption before and after space or organizational changes have occurred. A type 2 SCD is therefore the most appropriate choice for the organization and space dimensions.

On a university campus, activity types and schedules are subject to change every six months (according to academic terms). As a result, a large number of activity dimension rows are updated every six months. Activity changes are predictable and involve many rows, indicating that this dimension table should be associated with type 3 SCD.

6.2.7 Designing the Integration Processes

The ETL workflows were designed to be modular. The highest level job orchestrates the execution of transformations or sub-jobs that extract, transform, and load distinct data types from different data sources. As a result, it becomes easy to stop extracting data from one source and obtain it from another. All it requires is replacing the old transformation on the main orchestration job.

The highest level job determines the dates of the oldest and newest energy-related measurements, and those time boundaries are propagated to all transformations. Accordingly, all transformations extract data regarding the same period (e.g. 2014 and 2015).

The workflows were designed according to the table dependencies of our multidimensional model: the foreign keys of fact table rows depend on the primary keys of dimension table rows. Accordingly, dimension table data was loaded before fact table data. Also, to guarantee that all dimensions were loaded before any of the fact tables, dimensions were loaded sequentially. On the other hand, in order to reduce the execution time of the ETL workflows, all fact tables are loaded in parallel (see Figure 6.4). Appendix Section C contains the representation of all ETL workflows.

Figure 6.4: Representation from left to right of the highest level ETL workflow. First, SQL tables are created on PostgreSQL database (1). Secondly, time (2), date (3), and space (4) dimensions data is extracted and loaded. Finally, activities (5), occupation (6), weather (7), energy consumption, and energy costs (8) data is loaded in parallel.
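The ordering enforced by the highest level job (dimensions sequentially, then fact tables in parallel) can be sketched with a thread pool. This is an analogy rather than the PDI job itself: the loader callables are hypothetical placeholders for transformations, and the special handling of activities data is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(dimension_loaders, fact_loaders):
    """Run dimension loaders one by one, then all fact loaders in parallel."""
    finished = []
    # Dimensions first and sequentially, so every primary key exists
    # before any fact row references it.
    for name, loader in dimension_loaders:
        loader()
        finished.append(name)
    # Fact tables do not depend on each other and can be loaded concurrently.
    with ThreadPoolExecutor(max_workers=max(len(fact_loaders), 1)) as pool:
        futures = [(name, pool.submit(loader)) for name, loader in fact_loaders]
        for name, future in futures:
            future.result()  # re-raises any error from the loader
            finished.append(name)
    return finished


# Hypothetical loaders that only record that they ran.
events = []
dimensions = [(n, lambda n=n: events.append(n)) for n in ("time", "date", "space")]
facts = [(n, lambda n=n: events.append(n)) for n in ("weather", "energy", "costs")]
finished = run_workflow(dimensions, facts)
```

Replacing a data source then amounts to swapping one entry in the list of loaders, which is the modularity argument made above.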

Unlike other dimension data, activities data is loaded before the occupancy fact table, but in parallel with the remaining fact tables. This exception occurs because the extraction of activities data takes a considerable amount of time.

In general, the first transformation steps consist of receiving parameters from parent transformations or jobs, and then extracting data from sources. On the other hand, some transformations receive data records directly from parent transformations. In order to reduce the data extraction time, some transformations use several HTTP clients in parallel. Afterwards, data is merged into one single table that is processed by subsequent steps.

The second part of a transformation usually consists of transforming the obtained data records. The following steps are frequently used:

Select Step filters unnecessary data attributes, or removes auxiliary fields created during the transformation process. This step is particularly useful to avoid merging unnecessary data attributes or propagating redundant fields.

Lookup Step retrieves specific dimension record keys from the database. This step is used to associate fact table records data with dimension table keys.

Constant Value Step adds auxiliary data fields. For instance, we use constant value steps to add weather variable suffixes. These suffixes are used to build and parametrize MeteoIST 20 data source URLs.

Java and JavaScript Steps are used to build URLs, apply formulas over data records, and convert data types (e.g. integer to long). In general, these steps are used to execute complex operations that cannot be performed directly using PDI steps.

Transformations end by giving the records back to the parent transformation or job, or by loading them directly into the database. Data can be loaded with traditional database inserts, or using a bulk load technique. This technique consists of loading data into the database without logging the transaction and without guaranteeing foreign key integrity. More precisely, fact table records are added even if their associated dimension table records are not found. On the other hand, bulk load considerably reduces the time required to load data. Despite the advantages of the bulk load technique, we opted to use traditional database inserts, guaranteeing the integrity of loaded data [43].

20 http://meteo.ist.utl.pt/

6.3 OLAP Web Server Architecture Overview

Despite their advantages, MOLAP systems rely on proprietary OLAP [26]. Therefore, our solution was developed according to a ROLAP architecture. Our system architecture includes a PostgreSQL relational database, an XML metadata repository, an OLAP web server, and web data analysis tools.

Our web server relies on the Pentaho Mondrian OLAP engine, which we will simply call Mondrian. Mondrian is a Java library that receives MDX query requests, generates optimized SQL queries, and executes them over a relational database. For that purpose, the multidimensional model was loaded as a relational database model on a PostgreSQL relational database. The complete relational model definition is described in Appendix Section B.

Mondrian uses three levels of cache to store the data of frequent query responses, improving query response performance. The schema cache stores the multidimensional model schema properties, the member cache stores dimension table values, and the segment cache stores some previous query result values [49].

The OLAP web server receives query requests through a REST API, represented by a single URI (/Application) that has a GET method with parameters for the MDX query clauses, described as follows:

Cube contains the OLAP cube identifier, from which data is to be retrieved [149].

Rows determines the OLAP cube rows data [149].

Columns defines the OLAP cube columns data [149].

With is the MDX query clause that specifies an expression to be applied over a set of tuples within the cube. The expression is associated with an alias, which is later used in the ON ROWS and ON COLUMNS clauses to evaluate the expression [149].

Where contains an expression that determines the dimension hierarchy used to slice the cube, obtaining a one-dimension cube slice [149].

Stat parameter is used by OLAP web server to answer the request with server performance statistics data, instead of a query result.
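A client request to this API could be assembled as sketched below. Only the `/Application` path and the parameter roles come from the description above; the lowercase parameter names, the `"true"` value for the statistics flag, and the host address are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_request_url(base_url, cube, rows, columns,
                      with_clause=None, where=None, stat=False):
    """Build the GET URL carrying the MDX query clauses as parameters."""
    params = {"cube": cube, "rows": rows, "columns": columns}
    # Optional clauses are only sent when they are actually used.
    if with_clause:
        params["with"] = with_clause
    if where:
        params["where"] = where
    if stat:
        params["stat"] = "true"
    return base_url + "/Application?" + urlencode(params)


url = build_request_url(
    "http://localhost:8080",  # hypothetical server address
    cube="[Energy Readings]",
    rows="[Date].[2014].children",
    columns="{([Energy Measurements MWh sum],[Space].[Alameda])}",
)
```

`urlencode` percent-encodes the brackets and braces of the MDX fragments, so the clauses survive transport as query-string values.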

6.4 Metadata Repository Overview

The translation between MDX and SQL queries is defined on Mondrian metadata repository using XML files conforming to Mondrian XML schema. Our Mondrian XML metadata file contains the following definitions:

Physical Tables Metadata includes relational database table attributes and their corresponding data types.

Dimensions Metadata is used to map relational database tables with multidimensional model dimensions, determining dimension attributes, setting attribute dependencies (e.g. each month belongs to a single year), and defining dimension hierarchies. In particular, our XML dimensions metadata includes the following hierarchies:

• Date hierarchy (date dimension): Year → Month → Day.

• Quarterly hierarchy (date dimension): Year → Quarter → Month → Day.

• Weekly hierarchy (date dimension): Year → Week → Week Day Number.

• Time hierarchy (time dimension): Hour → Quarter → Minute → Second.

• Space hierarchy (space dimension): Parent Space → Space. This hierarchy is also associated with space hierarchy bridge, which stores space decomposition layout data.

Cubes Metadata determines OLAP cube dimensions and measures.

Roles Metadata sets users access level to each cube.

According to Kimball et al., in order to optimize query execution performance, aggregations should be performed with non-decimal values [26]. However, both energy consumption and weather measurements are expressed in decimal values. Therefore, our fact table measurements were loaded into the data storage using long data types, but were converted and presented to users as decimal values. The measurement conversion operations, known as calculated members, are defined in the Mondrian XML metadata files. These operations are executed in memory by Mondrian (during the aggregation of query data), maximizing query performance by using non-decimal data while providing users with accurate decimal data.

Fact table measurements defined in the XML metadata file are associated with aggregation functions (e.g. sum). As a result, the three original measurements (associated with Wh, kWh, and MWh) on the energy consumption fact table were combined with four aggregation functions (max, min, avg, sum), resulting in twelve measurements (available for use in MDX queries). The complete Mondrian XML metadata file definitions are represented in Appendix Section D.
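The idea of aggregating integers and converting to decimals only for presentation can be illustrated as follows. The scale factor of 1000 is an assumption (the thesis does not state the factor used), and the generated measure names merely extrapolate the pattern of `Energy Measurements MWh sum`, the one name that appears in the prototype's queries.

```python
SCALE = 1000  # assumed fixed-point scale: three decimal places


def to_stored(value):
    """Convert a decimal measurement to the integer stored in the fact table."""
    return round(value * SCALE)


def aggregate(stored_values):
    """Aggregate on integers, as the database would."""
    return {
        "sum": sum(stored_values),
        "min": min(stored_values),
        "max": max(stored_values),
        "avg": sum(stored_values) / len(stored_values),
    }


def present(stored_value):
    """Convert an aggregated integer back to a decimal for the user."""
    return stored_value / SCALE


stored = [to_stored(v) for v in (1.25, 2.5, 0.125)]
totals = aggregate(stored)
total_for_user = present(totals["sum"])

# Three units combined with four aggregation functions give twelve measures.
measure_names = [f"Energy Measurements {unit} {function}"
                 for unit in ("Wh", "kWh", "MWh")
                 for function in ("max", "min", "avg", "sum")]
```

The conversion back to decimals happens once per aggregated value rather than once per fact row, which is where the performance benefit comes from.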

6.5 Energy Management Web Applications

We developed interactive web chart applications for energy management, according to the reporting techniques and analysis methods described in Section 3.6. Furthermore, all model dimensions and facts are used on the different charts (see Table 6.3). The charts were developed using JavaScript; the JQuery 21, jsTree 22, and Highcharts 23 libraries; and the Bootstrap 24 framework.

21 https://jquery.com/
22 https://www.jstree.com/
23 http://www.highcharts.com/

Energy Consumption Analysis Charts (columns): DCA = Detailed Consumption Analysis; SCA = Space Comparison Analysis; YCA = Year Comparison Analysis; EFA = Energy-related Factors Analysis; ECS = Energy Costs Simulator; A4 = A4 Occupation & Activities Analysis; PLA = Peak Load Analysis Chart.

                              DCA  SCA  YCA  EFA  ECS  A4   PLA
Reporting Techniques
Longitudinal Benchmarking     ○    ○    ●    ○    ○    ○    ○
Cross Sectional Benchmarking  ○    ●    ○    ○    ○    ○    ○
Simple Tracking               ●    ○    ○    ○    ○    ○    ○
Utility Cost Accounting       ○    ○    ○    ○    ●    ○    ○
Internal Rate of Return       –    –    –    –    –    –    –
Carbon Accounting             ●    ●    ●    ●    ○    ○    ●

Energy Consumption Analysis Methods
Load Profiling                ●    ○    ○    ○    ○    ○    ○
Peak Load Analysis            ○    ○    ○    ○    ○    ○    ●
Photo Voltaic Monitoring      –    –    –    –    –    –    –
Loading Histograms            –    –    –    –    –    –    –
Simple Baselines              ○    ○    ●    ○    ○    ○    ○
Model Baselines               ○    ○    ○    ●    ○    ○    ○
Lighting Efficiency           –    –    –    –    –    –    –
Heating & Cooling Efficiency  –    –    –    –    –    –    –
Energy Signature              ○    ○    ○    ●    ○    ○    ○

Multidimensional Model Dimensions
Space Dimension               ●    ●    ●    ●    ●    ●    ●
Activity Dimension            ○    ○    ○    ○    ○    ●    ○
Datapoint Dimension           ○    ○    ○    ○    ○    ○    ○
Time Dimension                ●    ●    ○    ○    ○    ●    ●
Date Dimension                ●    ●    ●    ●    ●    ●    ●
Organization Dimension        ○    ○    ○    ○    ○    ○    ○
Equipment Dimension           ○    ○    ○    ○    ○    ○    ○

Multidimensional Model Facts
Energy Readings Fact Table    ●    ●    ●    ●    ●    ●    ●
Weather Readings Fact Table   ○    ○    ○    ●    ○    ○    ○
Energy Costs Fact Table       ○    ○    ○    ○    ●    ○    ○
Space Occupancy Fact Table    ○    ○    ○    ○    ○    ●    ○

Table 6.3: Summary of energy management analysis and reporting techniques and model facts and dimensions, and their association with energy consumption analysis web charts (● existent association; ○ non-existent association; – unimplemented method or technique). All fact and dimension tables are considered, except equipment and organization dimensions.

6.5.1 Application Use Case

Our web chart applications follow a common execution procedure, in which they read chart configuration parameters, build MDX query clauses, send query requests to the OLAP web server, parse the response data, and build charts according to the requested configurations (see Figure 6.5). Those steps are described as follows:

1. Users start by choosing one of the available charts, depending on their data analysis goals. For instance, the detailed analysis charts enable users to inspect the hourly energy consumption of a single building. On the other hand, the comparison analysis chart lets users compare energy performance between different buildings.

[Figure 6.5 diagram. Components: Energy Manager, Analysis Chart Web Page, Analysis Chart Application, OLAP Web Server, PostgreSQL Data Storage. Interactions: 1. Configure Analysis Chart; 2. Read Chart Parameters; 3. Make Data Request; 4. Execute Query; 5. Send Query Result; 6. Send Data Matrix Array; 7. Update Chart Data.]

Chart Parameters:
Time Granularity: Year
Selected Space: Alameda
Normalization Criterion: MWh

MDX Query Clauses:
Rows: [Date].[2014].children
Columns: {([Energy Measurements MWh sum], [Space].[Alameda])}
Cube: [Energy Readings]

Complete MDX Query:
SELECT [Date].[2014].children On Rows,
{([Energy Measurements MWh sum], [Space].[Alameda])} On Columns
FROM [Energy Readings]

Executed SQL Query:
SELECT date_dimension.calendar_year, date_dimension.calendar_month, space_dimension.parent_space_key, sum(energy_readings_fact_table.measurement_mwh)
FROM energy_readings_fact_table, date_dimension, space_dimension
WHERE date_dimension.calendar_year = 2014 and space_dimension.parent_space_key = 2 and energy_readings_fact_table.date_key = date_dimension.date_key and energy_readings_fact_table.space_key = space_dimension.space_key
GROUP BY date_dimension.calendar_year, date_dimension.calendar_month_number, space_dimension.parent_space_key

Figure 6.5: Example of the interactions between BEMS prototype system components required to update an energy consumption analysis web chart. The user chooses a chart (1), and the chart application reads the requested configurations (2) and then sends a data request to the OLAP Web server (3). The server executes the request over the PostgreSQL database (4), data is sent to the OLAP server (5) and to the chart application (6), and then web chart data is updated (7).

24 http://getbootstrap.com/

2. After choosing the most appropriate chart, users configure the chart. The major chart configuration parameters are the selection of one or more buildings, the selection of a normalization criterion (e.g. MWh, MWh/m2, and CO2 Kg), and the selection of a time granularity (e.g. yearly, monthly, and weekly).

3. According to the chosen chart parameters, the chart application requests its specific query builder to determine the MDX query clauses (e.g. rows, columns, and cube) required to create a chart that displays data according to user request. For instance, if the user selects MWh/m2 normalization criterion, the query builder creates an MDX query with an optional WITH clause, determining that energy consumption values must be divided by their corresponding building area.

4. The set of constructed query parameters is sent to the OLAP Web server, which verifies the validity of the received MDX query clauses, builds a complete MDX query, requests Mondrian to execute the MDX query over the PostgreSQL data storage, and answers back with a JSON matrix array containing the query results.

5. The chart application parses the data matrix array according to the expected query result structure. For example, consider the following query:

SELECT {([Cooling Degree Days],[Space].[Alameda].[South Tower]),
([Energy Measurements MWh sum],[Space].[Alameda])} ON COLUMNS,
[Date].[2014].children ON ROWS
FROM [Energy Readings]

The result of the query is a data matrix in which the first column contains cooling degree days data calculated from IST weather station recordings (located on South Tower). The second column contains Alameda campus aggregated energy consumption. The matrix rows represent the twelve months of 2014.

Figure 6.6: Snapshot of the detailed consumption analysis charts depicting space (1), normalization (2), and time (3) selectors (left); and the energy consumption chart (4) itself (right).

6. After parsing the data matrix, the web chart data is updated according to the specified chart configurations.
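The assembly of a complete MDX query from the individual clauses (steps 3 and 4 above) can be sketched as a small function. It is a simplified stand-in for the server-side builder, written in Python for illustration: it performs no validation, and the function and parameter names are hypothetical.

```python
def build_mdx(cube, rows, columns, with_clause=None, where=None):
    """Assemble a complete MDX query string from its clauses."""
    parts = []
    if with_clause:
        parts.append(f"WITH {with_clause}")   # optional calculated members
    parts.append(f"SELECT {columns} ON COLUMNS, {rows} ON ROWS")
    parts.append(f"FROM {cube}")
    if where:
        parts.append(f"WHERE {where}")        # optional slicer
    return " ".join(parts)


query = build_mdx(
    cube="[Energy Readings]",
    rows="[Date].[2014].children",
    columns="{([Energy Measurements MWh sum],[Space].[Alameda])}",
)
```

Keeping the clauses separate until this point is what allows each chart's query builder to add, for example, a WITH clause for a normalization criterion without touching the rest of the query.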

In the following paragraphs we describe the major aspects of each chart type. Along with each chart description, we also include some analysis results, which we used to stimulate the attention of energy managers during our prototype evaluation.

6.5.2 Detailed Consumption Analysis Charts

The Detailed Analysis bar chart enables users to inspect energy consumption trends in different time periods, going from years to minutes (see Figure 6.6). The user starts by choosing a space, a time window (e.g. from January to February), and a granularity (e.g. Monthly), and we determine which data to load according to the window limits. For instance, if the user chooses a Monthly granularity with a time window covering March, we will load March, but also February and April data, anticipating future requests. Furthermore, it will not be necessary to load any further data until the time window goes beyond the previously loaded months.
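The prefetching behaviour described above can be sketched for the monthly granularity. This is one possible reading of the rule, in Python rather than the JavaScript used by the charts: the function name is hypothetical, months are numbered 1 to 12, and the window is assumed to stay within a single year.

```python
def months_to_load(first_month, last_month, loaded):
    """Return the months to request for a window, given what is already loaded."""
    window = range(first_month, last_month + 1)
    # Nothing to do while the window stays inside the loaded months.
    if all(month in loaded for month in window):
        return []
    # Otherwise load the window plus one month on each side, anticipating
    # the next request, and skip what is already available.
    padded = range(max(first_month - 1, 1), min(last_month + 1, 12) + 1)
    return [month for month in padded if month not in loaded]


initial = months_to_load(3, 3, set())          # window covering March
inside = months_to_load(4, 4, {2, 3, 4})       # April is already loaded
beyond = months_to_load(5, 5, {2, 3, 4})       # window leaves the loaded range
```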

Using the detailed analysis chart, it is possible to draw the following conclusions:

• August has nearly half the consumption of the remaining months, since it corresponds to the usual academic vacation period.

• January and February have a lower consumption than other months, which is due to the reduced number of academic activities.

• Consumption during weekend day and night periods corresponds solely to the base load consumption.

• During the lunch period, from twelve o’clock to two o’clock, there is a consumption decrease, which varies among buildings.

• The Geology department building consumption from April to December is low and irregular, which is probably due to a faulty energy meter.

6.5.3 Space Comparison Analysis Charts

The goal of the space comparison bar chart is to compare the consumption performance of buildings over a time period (see Figure 6.7). For instance, it is possible to compare the consumption of the North and South Towers during the fifty-two weeks of 2014. However, using the same approach to display daily or hourly data is highly inefficient: a daily chart would include 365 points, and an hourly chart would include 8760 points. Therefore, unlike for other time periods, we display average consumption values in the daily and hourly consumption charts.

As previously mentioned, the different charts provide a selector for the normalization criterion. Comparing the South Tower and the Chemistry Pavilion, the South Tower energy consumption during 2014 was nearly 2.5 times higher. However, when applying a MWh/m² normalization criterion, the Chemistry Pavilion consumption is 1.18 times higher, which is due to its high energy consumption over a small area.
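The effect of the area normalization can be illustrated with a short sketch. The consumption and area figures below are hypothetical, chosen only so that they reproduce the two ratios reported above (2.5 and 1.18); they are not the real values of either building.

```python
def normalise(consumption_mwh, area_m2):
    """Consumption per unit of floor area, in MWh/m2."""
    return consumption_mwh / area_m2

# Hypothetical totals and areas (not measured values).
south_tower_mwh, south_tower_m2 = 2500.0, 29500.0
chemistry_mwh, chemistry_m2 = 1000.0, 10000.0

# Without normalization the South Tower consumes 2.5 times more.
total_ratio = south_tower_mwh / chemistry_mwh

# Per square metre the ordering flips: the Chemistry Pavilion is 1.18 times higher.
normalised_ratio = (normalise(chemistry_mwh, chemistry_m2)
                    / normalise(south_tower_mwh, south_tower_m2))
```

Under these assumptions the two ratios together imply that the South Tower has roughly 2.5 × 1.18 ≈ 2.95 times the floor area of the Chemistry Pavilion.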

6.5.4 Year Comparison Analysis Charts

To evaluate the results of implemented energy reduction policies, energy managers compare energy consumption over equivalent periods in different years. For instance, implementing a free cooling policy, which uses external air temperature to regulate internal temperature, may result in a significant reduction in energy consumption. The Year Comparison chart is similar to the space comparison chart, but instead of comparing consumption across different spaces (in the same year), it compares consumption across different years (for the same building). Moreover, it presents a ratio curve that represents the evolution of consumption over the years.
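One plausible reading of the ratio curve is sketched below, assuming that each point is the consumption of the compared year divided by the consumption of the reference year for the same period. The text does not define the ratio precisely, so this definition, the function name `year_ratio`, and the sample values are all assumptions.

```python
def year_ratio(reference_year, compared_year):
    """Per-period ratio of compared-year to reference-year consumption."""
    return [compared / reference
            for reference, compared in zip(reference_year, compared_year)]

# Hypothetical monthly consumption (MWh) for three equivalent months.
consumption_2013 = [100.0, 80.0, 50.0]
consumption_2014 = [90.0, 80.0, 60.0]

# Values below 1 indicate a reduction relative to the reference year.
ratio_curve = year_ratio(consumption_2013, consumption_2014)
```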

Figure 6.7: Snapshot of space comparison analysis charts depicting space (1), time (3), and normalization (4) selectors; the switch button that is used to unstack chart bars (2); and the comparison chart (5) itself.

6.5.5 Energy-related Factors Analysis Charts

Energy consumption is influenced by different factors, such as temperature or humidity. To enable users to study the influence of such variables on energy consumption, we developed a scatter chart that includes a regression curve. The X axis shows the weather variable scale (e.g. degrees Celsius), and the Y axis shows the energy consumption scale (e.g. MWh). Like the other charts, the variables chart includes selectors for space, time, and normalization, but also a regression type selector (e.g. linear, polynomial). Every data point is associated with a description (e.g. day, month), which helps users to study outliers. Furthermore, we allow users to remove outlier points, which may compromise the usefulness of the regression curve. For instance, when using cooling degree days to analyse cooling needs, it may be useful to remove the winter months, which have zero cooling degree days since there are no cooling needs. Other types of outliers include the month of August, weekend days, and holidays, in which energy consumption is abnormally low.
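The interplay between outlier removal and the regression curve can be sketched as follows. The example uses an ordinary least-squares linear fit written out by hand; the data points are hypothetical, and the prototype's own regression code (which also offers other regression types) is not shown in the text.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical points: (label, cooling degree days, consumption in MWh).
months = [("Jan", 0.0, 40.0), ("Feb", 0.0, 42.0),
          ("Jun", 10.0, 30.0), ("Jul", 20.0, 50.0), ("Sep", 30.0, 70.0)]

# Drop the months without cooling needs before fitting, as a user would
# do by removing those points from the chart.
kept = [(cdd, mwh) for _, cdd, mwh in months if cdd > 0]
slope, intercept = fit_line(kept)
```

With the winter months included, the fitted line would be pulled towards heating-related consumption that is unrelated to cooling degree days.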

6.5.6 A4 Occupation & Activities Analysis of a Lecture Room

Two factors that strongly influence energy consumption are the activities performed in a space and the behaviour of its occupants. While performing activities, users rely on heating or cooling systems, illumination systems, and various other equipment. Therefore, we developed a chart to represent the correlation between energy consumption, academic lessons, and student occupation in room A4 of the Taguspark campus. The chart uses a line to represent energy consumption, and includes marks for the classes occurring over time. In addition, each mark description includes the estimated number of occupants, according to the number of users connected to the room Wi-Fi access point.

Figure 6.8: Snapshot of the occupation & activities analysis chart of the lecture room A4. On the chart it is possible to see the correlation between class occurrences and energy consumption peaks.

The A4 consumption chart shows a correlation between classes and energy consumption peaks (see Figure 6.8). In contrast, consumption during periods without classes corresponds only to the base load, due to the absence of other activities. Regrettably, it was not possible to associate any consumption trends with occupation values. In fact, we found classes with equivalent energy consumption and disparate occupation values. The explanation seems to lie in the lack of accuracy of the occupation values.
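The two observations above can be illustrated with a small sketch. The samples are hypothetical and were constructed to mirror the reported pattern: consumption rises during classes, while two classes with the same consumption have very different occupant estimates.

```python
from statistics import mean

# Hypothetical hourly samples: (kWh, class in progress, Wi-Fi occupant estimate).
samples = [
    (2.0, False, 0),
    (2.0, False, 0),
    (6.0, True, 30),
    (6.0, True, 80),
    (2.0, False, 0),
]

# Consumption without classes approximates the base load of the room.
base_load = mean(kwh for kwh, in_class, _ in samples if not in_class)
class_load = mean(kwh for kwh, in_class, _ in samples if in_class)

# Occupant estimates of the classes that share the same consumption value.
occupants_at_peak = sorted(occ for kwh, in_class, occ in samples if in_class)
```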

6.5.7 Energy Costs Simulator

The major goal of energy managers is to reduce energy costs. Accordingly, BEMSs must enable managers to analyse energy costs, so they can evaluate the effectiveness of energy management policies, compare the costs of different suppliers, and analyse the impact of changes in tariffs or fixed energy costs on the total energy cost. Our energy cost simulator application presents a table with detailed monthly energy costs for 2014. In particular, there are columns for monthly energy consumption, network access tariff cost, energy cost, and total cost. The access tariff cost includes peak power and active energy cost components. The reactive energy cost was not considered because reactive energy data was not available and it represents a small percentage of the total energy cost.
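The cost breakdown described above can be sketched as a small data structure. The field names, the `simulate_month` helper, and all rates are hypothetical; the sketch only assumes, as stated in the text, that the access tariff cost is the sum of a peak power component and an active energy component, and that the total cost adds the energy cost to it.

```python
from dataclasses import dataclass

@dataclass
class MonthlyCost:
    consumption_kwh: float
    peak_power_cost: float
    active_energy_access_cost: float
    energy_cost: float

    @property
    def access_tariff_cost(self) -> float:
        """Network access tariff: peak power plus active energy components."""
        return self.peak_power_cost + self.active_energy_access_cost

    @property
    def total_cost(self) -> float:
        return self.access_tariff_cost + self.energy_cost

def simulate_month(consumption_kwh, peak_kw, rates):
    """Apply a set of (hypothetical) rates to one month of consumption."""
    return MonthlyCost(
        consumption_kwh=consumption_kwh,
        peak_power_cost=peak_kw * rates["peak_power_per_kw"],
        active_energy_access_cost=consumption_kwh * rates["access_per_kwh"],
        energy_cost=consumption_kwh * rates["energy_per_kwh"],
    )

# Invented rates and consumption, for illustration only.
rates = {"peak_power_per_kw": 2.0, "access_per_kwh": 0.05, "energy_per_kwh": 0.10}
january = simulate_month(consumption_kwh=1000.0, peak_kw=50.0, rates=rates)
```

Comparing suppliers or tariff changes then amounts to calling `simulate_month` with a different `rates` dictionary.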

6.5.8 Peak Load Analysis Charts

Peak load analysis is used to identify periods of high energy demand and to configure equipment operation accordingly. Base load, on the other hand, is associated with constant energy consumption: some equipment operates uninterruptedly, resulting in a permanent minimum level of energy consumption.

Figure 6.9: Snapshot of peak load analysis charts depicting space (1), normalization (2), and time (3) selectors; and displaying peak load (upper curve) and base load energy consumption (lower curve).

Our peak analysis chart presents peak and base load line charts, providing selectors for space, time, and normalization (see figure 6.9).
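A minimal sketch of how the two curves could be derived from raw readings is given below. It assumes that, for each day, the peak load is the maximum reading and the base load is approximated by the minimum reading; the function name `peak_and_base` and the sample readings are hypothetical.

```python
from collections import defaultdict

def peak_and_base(readings):
    """Map each day to (peak, base) = (max, min) of its power readings."""
    by_day = defaultdict(list)
    for day, kw in readings:
        by_day[day].append(kw)
    return {day: (max(values), min(values)) for day, values in by_day.items()}

# Hypothetical (day, kW) readings.
readings = [
    ("2014-03-01", 40.0), ("2014-03-01", 95.0), ("2014-03-01", 55.0),
    ("2014-03-02", 38.0), ("2014-03-02", 41.0),
]
curves = peak_and_base(readings)
```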

6.6 Lessons Learned during BEMS Prototype Development

The multidimensional model has a large impact on the BEMS development complexity. The way data is organized under the model determines the complexity of ETL workflows and data analysis tools. The energy costs fact table initially stored energy cost rates. Consequently, the first energy costs MDX query template had to aggregate energy consumption data at different time granularities (e.g. daily and hourly), and then use it to calculate several types of energy costs. The result was an overly complex query that compromised the prototype responsiveness. As a solution, we modified the energy costs data to store pre-calculated monthly energy costs. This new way of organizing data greatly simplified the energy costs MDX query template. On the other hand, the complexity of the energy costs ETL workflow increased significantly.

BEMS prototype development should follow an iterative approach instead of a waterfall approach. During the first prototype development cycles, we followed a waterfall approach, comprising the development of all ETL workflows, followed by the development of all web charts. However, during chart development we identified several issues that required changes across several ETL workflows. The subsequent development stages were carried out iteratively: developing the ETL workflows required to obtain specific data types, developing the MDX query template, and then developing the chart. As a result, we minimized the impact of changes in chart applications on the other system components.

Despite being an efficient library, Mondrian does not label MDX query result values, which increases the application programming effort. Consequently, we had to create complex JavaScript functions that determine the time, date, space, and other aspects of energy measurements. These labels were crucial to correctly set up the data of the energy consumption charts.

Dimension        Question ID  Description
Design           Question 1   The application interface is pleasant to use
Easiness of Use  Question 2   The application interface is easy to use
Learnability     Question 3   It is easy to learn how to use the application
Learnability     Question 4   The information provided by the application is easy to understand
Satisfaction     Question 5   I am satisfied with the outcome of the performed tasks
Performance      Question 6   The user interface is highly responsive

Table 6.4: Description of usability questions posed during interviews with energy managers, including design, easiness of use, learnability, satisfaction, and performance questions.

6.7 BEMS Prototype Evaluation Context

The BEMS prototype evaluation consisted of interviewing experienced energy managers, who are responsible for buildings with considerable energy consumption (e.g. office buildings). The aim of the interviews was to evaluate the BEMS prototype front-end, which users interact with, in order to obtain a perception of the overall system functionality and quality [126]. Since front-end evaluation results may depend on prototype performance and usability issues, the interview questionnaire was divided into two parts: one to evaluate the prototype usability and performance, and another to evaluate the energy consumption analysis methods.

Usability evaluation questions were based on ISO 9241, the standard for ergonomics of human-computer interaction [150]. The questions aim at evaluating the BEMS interface design, easiness of use, learnability, and satisfaction. The usability dimension questions are described in Table 6.4.

Performance evaluation was associated with a single question, in which we asked users to rate the prototype performance and responsiveness from one (slow and unresponsive) to five (fast and responsive). In addition, we recorded the response time of the BEMS user interface during the interview sessions.

Functionality evaluation consisted of (i) identifying energy consumption analysis methods missing from the prototype, and (ii) identifying necessary improvements to the provided analysis charts: missing data types, data filters, or other features necessary for an effective consumption analysis.

To identify missing energy consumption analysis methods, we supplied users with a list of methods and techniques that were not available in the prototype, for instance individual equipment consumption accounting, or longitudinal benchmarking. Yet, our ultimate goal was to identify methods not present on that list, so that we could determine the necessary multidimensional model changes. Similarly, the functionality of each chart was evaluated by asking users to indicate missing data types or features, or alternatively, to select one or more options from a predefined list. The complete questionnaire is available in Appendix E.

Interview Number  User  Age  Gender  Years of Experience  Education        Type of Building  Had Previous Experience with BEMS
1                 A     45   M       12                   Master Degree    Educational       Yes
2                 B     58   M       18                   12th grade       Educational       No
3                 C     59   M       21                   Master Degree    Commercial        Yes
4                 D     39   M       11                   Master Degree    Commercial        Yes
5                 E     58   M       20                   Master Degree    Commercial        Yes
6                 F     50   F       5                    Bachelor Degree  Commercial        Yes
7                 G     39   M       10                   Bachelor Degree  Governmental      Yes
8                 H     27   M       5                    Master Degree    Educational       Yes
9                 I     57   M       28                   Bachelor Degree  Commercial        Yes
10                J     49   M       20                   Master Degree    Commercial        Yes
11                K     34   M       8                    12th grade       Educational       No

Table 6.5: Demographic information, education, years of experience, and BEMS usage proficiency of the interviewed energy managers.

6.8 BEMS Prototype Evaluation Methodology

During the evaluation of the BEMS prototype, we performed 11 interviews with experienced energy managers, responsible for different types of buildings, used for different purposes (e.g. education, cultural activities). Table 6.5 details the profiles of the energy managers, such as age, gender, and years of experience. Interviews were performed in each energy manager’s office, and took an average of 90 minutes each. The interviews were semi-structured: energy managers could interrupt at any time, and we adapted the course of the interview along the way. For instance, during some interviews we were able to see the organization's BEMS in operation, or hear about implemented energy policies. Nonetheless, we always obtained answers to all the questions. The fundamental interview structure consisted of the following:

1. Explaining to the participants our work context (e.g. BEMS prototype purpose and objectives), and the multidimensional model entities and relationships.

2. Performing a demo of each energy consumption analysis chart, and giving hints about energy consumption analysis results obtainable using those charts.

3. Requesting energy managers to verbally answer our questionnaire.

4. Asking energy managers to provide feedback, comments, and suggestions about the prototype, or any clarification about their work specifics (e.g. energy management analysis methodologies) or any other aspect, such as building characteristics, or implemented energy reduction policies.

Figure 6.10: Results of the usability questionnaire, including the maximum, minimum and average values. The questions evaluate performance, and usability dimensions of design, functionality, easiness of use, learnability, and satisfaction. [Chart rows: Question 1 (Design), Question 2 (Easiness of Use), Question 3 (Learnability), Question 4 (Learnability), Question 5 (Satisfaction), Question 6 (Performance); horizontal axis: score from 1 to 5.]

6.9 BEMS Prototype Evaluation Results

As previously described, the BEMS prototype was evaluated according to its usability, performance, and functionality. In particular, functionality evaluation included the assessment of analysis methods and of individual chart features. The evaluation results were based on the analysis of the questionnaires and of the observations gathered throughout the interviews. In addition, performance evaluation was based on the analysis of the recorded BEMS user interface response time. In the following sections we describe the results obtained for the usability, performance, and functionality evaluation.

6.9.1 Usability and Performance Evaluation

BEMS usability evaluation results are summarized in Figure 6.10. Most of the interviewed energy managers gave high scores to the design and ease of use questions. Nevertheless, some participants gave low scores, expressing displeasure at using web applications instead of desktop applications, and at scrolling pages vertically (the appropriate way for web pages [151]). In addition, some users mentioned that the home page button was hard to find, and that there was no option to export data as Excel files.

Scores given to the learnability questions were also high. However, some users were not familiar with the concepts of cooling and heating degree days, and thus considered the regression analysis charts confusing. According to those users, the solution may be to include a short explanation of what degree days are, how they are calculated, and how they are represented in the chart.

Regarding the satisfaction questions, the scores were again high, but once more, some users had reasons not to give a maximum score. In particular, users pointed out the lack of freedom when analysing degree days data (requiring a base temperature selector), analysing CO2 Kg data (requiring a conversion rate selector), and filtering by day period (e.g. empty, super empty). Furthermore, users mentioned the need for the interface to supply context information, such as the origin of the data sources, the last time data was updated, and the description of each building (e.g. area).

As with the remaining questions, energy managers gave a high score to the performance question. Although some did not give a maximum score, they did not indicate a reason for it. Consistently, the measured user interface response time (including OLAP server response time) was also low (see Figure 6.11). Indeed, the average response time varied between 25 and 110 milliseconds, so that users felt, in most cases, that the system reacted instantaneously [152].

In general, we can affirm that the usability and performance aspects of the BEMS prototype did not compromise the functionality evaluation. This idea is reinforced by most users mentioning the high usability of our BEMS prototype, in contrast with using their own systems, or manually creating spreadsheets and reports.

Figure 6.11: Results of the average BEMS prototype interface response time (including OLAP server response time) for each of the 11 interviews. [Vertical axis: Average Prototype Response Time (Milliseconds); horizontal axis: Interview Number, 1 to 11.]

6.9.2 Evaluation of Energy Data Analysis Methods

The number of user requests for each energy consumption analysis method is depicted in Figure 6.12. During the first 7 interviews, 5 users mentioned the importance of longitudinal benchmarking and peak load analysis histograms, and 7 users highlighted the role of base load curves in building energy management analyses. Due to the completeness of our model, we were able to create the year comparison chart, the peak load analysis chart, and the base load analysis chart, and to present those charts during the last 4 interviews, where their importance was again mentioned.

Individual equipment and organization consumption accounting functionality was considered missing by 8 and 9 users respectively. Although those functionalities were not available, our model includes dimensions for organization and equipment. However, to guarantee that the model effectively supports equipment and organization accounting, we will have to implement a chart providing those energy consumption analysis methods.

Energy consumption forecasting analysis was requested by 8 users. In the literature, there are numerous studies on energy consumption forecasting, considering different energy consumption related factors (e.g. weather), different building characteristics (e.g. area), occupant behaviour, and other related variables. In general, there seems to be a correlation between the completeness of the model and the range of variables available for energy consumption forecasting. In any case, we cannot conclude whether the model effectively supports energy consumption forecasting analysis until we perform further research on the subject.

Figure 6.12: Summary of the number of user requests for each category of functionality missing on the BEMS prototype. [Chart rows: Longitudinal Benchmarking, Peak Load Analysis, Baseline Curves, Organization Consumption Accounting, Equipment Consumption Accounting, Consumption Forecasting, Other; horizontal axis: Number of user requests, 5 to 9.]

Other user requests were the possibility of analysing the correlation between energy consumption and temperature (an alternative way of displaying the data of the regression variable chart), and of comparing the consumption among different equipment categories (e.g. heating, cooling). Since we did not implement a chart for equipment consumption analysis, we cannot evaluate the implications of this last request.

6.9.3 Evaluation of Individual Charts

According to user answers, the detailed analysis, space comparison, peak load analysis, and year comparison charts should display both gas and water consumption along with electricity consumption. In addition, those charts should provide a filter to separate weekend and weekday consumption. Although we did not create charts for gas and water consumption, extending the model to support these data types only requires the creation of water and gas measurement columns in the energy measurements fact table.

As for the variable analysis chart, users mentioned the need for tools to isolate data clusters and to transform regression charts into line charts. Providing those tools does not require modifying the model; it only requires modifying the BEMS prototype front-end.

Even though energy costs are calculated on a monthly basis, according to users, the energy cost simulator should provide an estimation of hourly energy costs. The estimation may be based on consumption patterns and other variables (e.g. weather, season of the year). Another suggested feature was the comparison of energy costs according to different supplier rates. Hourly energy cost estimations should be calculated by the BEMS performance optimization layer, and stored in the DW periodically, along with monthly energy costs. Supplying users with energy cost estimations will therefore not require any model modifications. On the other hand, integrating energy costs from different suppliers will require modifying the model. One solution is to create a measurement column for each supplier in the energy costs fact table.
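The gas and water extension mentioned above can be sketched with an in-memory SQLite database. The table and column names (`energy_measurements_fact`, `gas_mwh`, `water_m3`) are hypothetical and much simpler than the actual fact table of the model; SQLite is used here only to keep the example self-contained, whereas the prototype stores its data in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the energy measurements fact table.
conn.execute(
    "CREATE TABLE energy_measurements_fact ("
    "date_key INTEGER, space_key INTEGER, electricity_mwh REAL)"
)

# Extending the model: one new measurement column per additional data type.
for column in ("gas_mwh", "water_m3"):
    conn.execute(f"ALTER TABLE energy_measurements_fact ADD COLUMN {column} REAL")

# Hypothetical reading that now carries all three measurements.
conn.execute("INSERT INTO energy_measurements_fact VALUES (20140101, 1, 3.5, 1.2, 40.0)")

columns = [row[1] for row in conn.execute("PRAGMA table_info(energy_measurements_fact)")]
```

The same pattern would apply to the per-supplier cost columns suggested for the energy costs fact table.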

The individual chart evaluation revealed the overall need to associate energy consumption with events, such as implemented policies, investments made, equipment malfunctions, and received alarms. For the model to associate consumption with events that occurred over time, it will be necessary to create at least an event fact table and a dimension to describe the events. This new table will support the process of analysing the impact of such events on energy consumption.
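One possible shape for this extension is sketched below, again with an in-memory SQLite database. All table names, column names, and values are hypothetical; the sketch assumes an event dimension describing each event, an event fact table locating it in time and space, and a simplified energy fact table keyed by an integer date in YYYYMMDD form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_dim (event_key INTEGER PRIMARY KEY, category TEXT, description TEXT);
CREATE TABLE event_fact (event_key INTEGER, date_key INTEGER, space_key INTEGER);
CREATE TABLE energy_fact (date_key INTEGER, space_key INTEGER, mwh REAL);

INSERT INTO event_dim VALUES (1, 'policy', 'Free cooling enabled');
INSERT INTO event_fact VALUES (1, 20140603, 1);
INSERT INTO energy_fact VALUES (20140601, 1, 10.0), (20140602, 1, 12.0),
                               (20140603, 1, 8.0), (20140604, 1, 6.0);
""")

# Average consumption of the affected space before and after the event.
before, after = conn.execute("""
SELECT AVG(CASE WHEN e.date_key <  ev.date_key THEN e.mwh END),
       AVG(CASE WHEN e.date_key >= ev.date_key THEN e.mwh END)
FROM energy_fact e
JOIN event_fact ev ON ev.space_key = e.space_key
WHERE ev.event_key = 1
""").fetchone()
```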

6.9.4 Discussion

Overall, the BEMS prototype evaluation demonstrates that our multidimensional model supports a broad range of energy data analysis methods, and does not compromise the underlying system performance, thus demonstrating the applicability of the model in real-world settings.

Regarding the usability evaluation results, the prototype must be modified to provide more context information to users, and to supply less restrictive data filter tools. In either case, the lack of such improvements did not compromise the usability evaluation results.

With minor modifications, our model can support the analysis of gas consumption, energy cost simulations, and the correlation of events with energy consumption. In general, the functional evaluation results demonstrate that our model is highly extensible.

Chapter 7

Conclusions

Energy consumption analysis is performed by energy managers using BEMSs, which can be regarded as DSSs instantiated to the energy management domain. Despite DSS development being already well established in the Information Systems domain, BEMS still lack a reference information model that can be re-used to integrate energy-related data from heterogeneous data sources, and facilitate the integration of data analysis tools, alleviating the overall effort required for systems development and maintenance.

The difficulty of obtaining precise requirements for the energy management business makes the creation of a multidimensional model for energy management a challenging task. Indeed, no multidimensional model proposal in the literature supports a broad range of building energy management data analysis activities. As we demonstrate in our related-work analysis, existing models present several multidimensional modelling design issues, and to the best of our knowledge none was validated.

In contrast to other proposals, our model was designed according to widely accepted multidimensional modelling principles and DW development guidelines, namely role-playing dimensions and hierarchy bridges. Our approach is innovative in that we followed the bottom-up DW development approach of Kimball et al. and leveraged end-user collaboration to validate the multidimensional model during its development, in order to overcome the lack of business process systematization [26].

Another merit of the model developed in this work lies in the iterative validation process employed, which consists of (i) evaluating the model with multidimensional model design quality metrics proposed in the literature, (ii) testing the model against a wide range of queries, and (iii) having the model reviewed by users with building energy management domain knowledge. The outcome is a high quality model. Concretely, the evaluation results indicate that our model is easy to use, flexible enough for users to perform energy-related analyses, has a low impact on the underlying system, and is easily extensible.

To further validate the model proposal, we developed and validated a BEMS prototype built upon our model. The prototype was validated during interviews with experienced energy managers. During the interviews, energy managers highlighted the variety of energy data analyses provided (supported by the model), and made several improvement suggestions. Due to the high completeness of the model, these suggestions only require minor modifications to the model.

7.1 Impact

Our multidimensional model proposal documents how common multidimensional model design anomalies are addressed and constitutes a step towards the systematization of BEMS system requirements and energy management domain knowledge. This systematization can greatly improve the communication between developers and stakeholders during system development.

The existence of a reference multidimensional model design for building energy management contributes to a better integration between BEMS system components, and eases the development of data analysis tools. Overall, the model will help reduce BEMS development and maintenance costs [153]. The high extensibility of the model will enable developers to focus less on the BEMS data management layer, and more on the development of more efficient data analysis tools, or of other advanced system functionalities (e.g. alarm management).

The model and its encoded requirements can also be transposed to other energy management areas, namely industrial energy management [29, 30]. Additionally, our iterative model development methodology can be used to create models for other areas where domain requirements are poorly defined.

7.2 Future Work

Despite the fact that our model proposal excels on several aspects, it can still be enhanced before it can effectively be a reference multidimensional model for building energy management:

• Evaluation process

Our model proposal was evaluated using complexity, usability, and design quality metrics. However, we believe it is necessary to create new metrics to analyse a broader range of model features and qualities (e.g. learnability, modifiability). Also, the model completeness must be evaluated with several case studies. In particular, during our interviews with energy managers, we found that Portuguese public hospitals do not use any BEMS or EIS, which gives us an opportunity to study the effectiveness of the BEMS prototype in a real case scenario.

The energy management activities of the IST university campus rely on data provided by the Energist energy management system. Unlike our prototype, this system does not integrate data on energy consumption related factors, which limits the scope of energy-related data analysis. Therefore, another testing scenario consists of using the BEMS prototype as a replacement for the Energist system. For that purpose, it will be necessary to create a building automation layer that gathers data from Modbus energy meters.

• Prototype and Model Evaluation

The BEMS prototype was validated with data from a university campus. Although this validation example comprises multiple buildings, the model must be evaluated using data related to other contexts and building types, in particular by integrating energy-related data from commercial buildings (requested from the building organizations) and households (available in public energy datasets [154, 155, 156, 157, 158]).

• Reporting Methodology

During the interviews, energy managers indicated the limitations of existing BEMS. At the same time, they were not willing to acquire new systems, due to high investments made on their BEMS, and other proprietary building automation systems. However, according to the energy managers, those issues could be surpassed with a system able to integrate data from existing BEMS, instead of integrating data directly from building automation systems. As a result, this new system would simplify the creation of energy consumption reports.

To create one such system, it is necessary to develop a building energy management reporting methodology, based on ISO 16001 guidelines [18]. Then, it will be necessary to assess the model and the BEMS prototype completeness regarding the reporting methodology requirements.

• Space and Energy Usage Estimation

Another immediate item of future work regarding energy data is the study of methods to better estimate space occupation. Despite the privacy issues involved in obtaining occupation data, considering lesson attendance lists or occupation sensors (if available) may increase the accuracy of occupation data.

• Perform Identified Improvements

Throughout the course of the interviews with energy managers, we identified several model and BEMS prototype improvements. In particular, creating new data filters for periods, degree days base temperature, and CO2 conversion rate; extending the model to integrate new types of energy (e.g. gas); and creating an event fact table. These improvements should be included in a new model development cycle, followed by interviews with energy managers.

• Data update rate

As previously described, events such as the implementation of energy reduction policies, or modifications to equipment operation, impact energy consumption values. Accordingly, the most appropriate data update rate should be studied. Traditional DW systems are typically updated daily, whereas near real-time DW systems are usually updated hourly or more frequently [159]. If the model is to be embedded into a near real-time DW system, its suitability for such a system should be evaluated.

• Energy Data ETL Processes Blueprint

The ETL processes used in this work should be modified so that they can be used to extract, transform, and load data from distinct building energy-related contexts. Generalizing the ETL processes is possible because energy data quality problems are mostly the same across different building energy-related contexts [54]. The generalization should consist of using the PDI Java API to create reusable ETL applications that target specific data quality issues. Additionally, a user interface should be created that enables users to add new energy-related data sources dynamically. More precisely, the user specifies the data source type (e.g. API), location (e.g. URL), and stored data format (e.g. CSV file attributes), and the system selects a pre-existing ETL data extraction application to obtain the data.
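The selection of a pre-existing extraction application from a user-supplied source description can be sketched as a simple registry. This is an analogy written in Python, not an example of the PDI Java API: the `extractor` decorator, the `extract` function, and the source description fields are hypothetical, and the source payload is passed inline instead of being fetched from its location so that the example stays self-contained.

```python
import csv
import io
import json

EXTRACTORS = {}

def extractor(source_format):
    """Register an extraction function for one stored data format."""
    def register(func):
        EXTRACTORS[source_format] = func
        return func
    return register

@extractor("csv")
def extract_csv(text):
    return list(csv.DictReader(io.StringIO(text)))

@extractor("json")
def extract_json(text):
    return json.loads(text)

def extract(source):
    """Pick the extractor that matches the declared format of a data source."""
    try:
        handler = EXTRACTORS[source["format"]]
    except KeyError:
        raise ValueError(f"no extractor registered for {source['format']!r}")
    return handler(source["payload"])

# Hypothetical source description, as a user might enter it in the interface.
rows = extract({
    "type": "file",
    "location": "readings.csv",
    "format": "csv",
    "payload": "timestamp,kwh\n2014-01-01T00:00,1.5\n",
})
```

Supporting a new data format then only requires registering one more extraction function, without changing the selection logic.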

Bibliography

[1] W. C. Turner and S. Doty. Energy Management Handbook. The Fairmont Press, Inc., 2007. ISBN 0881735426.

[2] B. L. Capehart. Information Technology for Energy Managers. The Fairmont Press, Inc., 2004. ISBN 0881734497.

[3] L. Pérez-Lombard, J. Ortiz, and C. Pout. A review on buildings energy consumption information. Energy and Buildings, 40(3):394–398, 2008. ISSN 03787788. doi: 10.1016/j.enbuild.2007.03.007.

[4] N. Fumo. A review on the basics of building energy estimation. Renewable and Sustainable Energy Reviews, 31:53–60, 2014. ISSN 13640321. doi: 10.1016/j.rser.2013.11.040.

[5] L. Pérez-Lombard, J. Ortiz, J. F. Coronel, and I. R. Maestre. A review of HVAC systems requirements in building energy regulations. Energy and Buildings, 43(2):255–268, 2011. ISSN 03787788. doi: 10.1016/j.enbuild.2010.10.025.

[6] J. Seryak and K. Kissock. Occupancy and behavioral affects on residential energy use. In Solar Conference, pages 717–722. American Solar Energy Society, 2003.

[7] D. Lazos, A. B. Sproul, and M. Kay. Optimisation of energy management in commercial buildings with weather forecasting inputs: A review. Renewable and Sustainable Energy Reviews, 39:587–603, 2014. ISSN 13640321. doi: 10.1016/j.rser.2014.07.053.

[8] C. W. Holsapple. Decisions and Knowledge. In Handbook on Decision Support Systems 1, chapter 2, pages 21–53. Springer, 2008.

[9] D. J. Power. Decision Support Systems: Concepts and Resources for Managers. Greenwood Publishing Group, 2002.

[10] R. Meredith, P. O’Donnell, and D. Arnott. Databases and Data Warehouses for Decision Support. In Handbook on Decision Support Systems 1, chapter 11, pages 207–230. The Fairmont Press, Inc., 2008.

[11] W. H. Inmon. Building the Data Warehouse. John Wiley & Sons, 2005. ISBN 0471081302.

[12] R. Kimball and M. Ross. The Data Warehouse Toolkit, The Definitive Guide to Dimensional Modeling. John Wiley & Sons, Inc., 2013. ISBN 9781118530801. doi: 10.1145/945721.945741.

[13] N. Motegi, M. A. Piette, S. Kinney, and K. Herter. Guide to Analysis Applications In Energy Information Systems. In Information Technology for Energy Managers, chapter 14, pages 145–155. The Fairmont Press, Inc., 2004.

[14] P. Raghunathan. Data Analysis and Decision Making: Using Spreadsheets and Pivot Tables To Get A Read On Energy Numbers. In Information Technology for Energy Managers, chapter 15, pages 157–169. The Fairmont Press, Inc., 2004.

[15] A. Ahmed, J. Ploennigs, K. Menzel, and B. Cahill. Multi-dimensional building performance data management for continuous commissioning. Advanced Engineering Informatics, 24(4):466–475, 2010. ISSN 14740346. doi: 10.1016/j.aei.2010.06.007.

[16] I. S. 393:2005. Energy Management Systems. Technical report, Sustainable Energy Ireland, 2006.

[17] ANSI/MSE 200:2008. A Management System for Energy. Technical report, ANSI/MSE, 2000.

[18] BS EN ISO 16001:2009. Energy management systems. Requirements with guidance for use. Technical report, ISO, 2008.

[19] BS EN ISO 50001:2011. Energy management systems. Requirements with guidance for use. Technical report, ISO, 2011.

[20] P. Antunes, P. Carreira, and M. Mira da Silva. Towards an energy management maturity model. Energy Policy, 73:803–814, 2014. ISSN 03014215. doi: 10.1016/j.enpol.2014.06.011.

[21] W. E. Walker, P. Harremoës, J. Rotmans, J. P. Van der Sluijs, M. B. A. Van Asselt, P. Janssen, and M. P. Krayer von Krauss. Defining Uncertainty: A Conceptual Basis for Uncertainty Management in Model-Based Decision Support. Integrated Assessment, 4(1):5–17, 2003. ISSN 1389-5176. doi: 10.1076/iaij.4.1.5.16466.

[22] S. Tom. Introduction to Web-based Information and Control Systems. In Information Technology for Energy Managers, chapter 2, pages 9–15. The Fairmont Press, Inc., 2004.

[23] B. Gnerre and G. Cmar. Defining the Next Generation Enterprise Energy Management System. In Web Based Energy Information and Control Systems: Case Studies and Applications, chapter 32, pages 403–434. The Fairmont Press, Inc., 2005.

[24] B. L. Capehart, W. C. Turner, and W. J. Kennedy. Guide to energy management. The Fairmont Press, Inc., 2006. ISBN 088173425X.

[25] G. Yee and T. Webster. State of Practice of Energy Management, Control, and Information Systems. In Web Based Energy Information and Control Systems: Case Studies and Applications, chapter 21, pages 275–286. The Fairmont Press, Inc., 2005.

[26] R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, and B. Becker. The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Systems. John Wiley & Sons, 2008. ISBN 0470149779.

[27] H. U. Gökçe and K. U. Gökçe. Multi dimensional energy monitoring, analysis and optimization system for energy efficient building operations. Sustainable Cities and Society, 10:161–173, 2014. ISSN 22106707. doi: 10.1016/j.scs.2013.08.004.

[28] D. Schuff, K. Corral, and O. Turetken. Comparing the understandability of alternative data warehouse schemas: An empirical study. Decision Support Systems, 52(1):9–20, 2011. ISSN 01679236. doi: 10.1016/j.dss.2011.04.003.

[29] J. Becker, P. Delfmann, and R. Knackstedt. Adaptive Reference Modeling: Integrative Configurative and Generic Adaption Techniques for Information Models. In Reference modeling, pages 27–58. Springer, 2007. ISBN 9783790819656.

[30] M. Goeken and R. Knackstedt. Multidimensional Reference Models for Data Warehouse Development. In International Conference on Enterprise Information Systems, pages 347–354. Citeseer, 2007.

[31] D. L. Moody. Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions. Data & Knowledge Engineering, 55(3):243–276, 2005. ISSN 0169023X. doi: 10.1016/j.datak.2004.12.005.

[32] I. Sommerville. Software engineering. Addison-Wesley, 2006.

[33] K. C. Laudon and J. P. Laudon. Management Information Systems: Managing the Digital Firm. Prentice Hall, 2014.

[34] H. A. Simon. The New Science of Management Decision. Prentice-Hall, 1977.

[35] G. Marakas. Decision Support Systems in the Twenty-first Century. Prentice-Hall, 1999.

[36] R. H. Sprague. A Framework for the Development of Decision Support Systems. MIS quarterly, pages 1–26, 1980. doi: 10.2307/248957.

[37] E. Oz. Management Information Systems. Cengage Learning, 2008. ISBN 9781423901785.

[38] D. Fong and A. Schurr. Relational Database Choices and Design. In Information Technology for Energy Managers, chapter 22, pages 255–263. The Fairmont Press, Inc., 2004.

[39] G. J. Levermore. Building Energy Management Systems: Applications to low-energy HVAC and natural ventilation control. Taylor & Francis, 2000.

[40] X. Ma, R. Cui, Y. Sun, C. Peng, and Z. Wu. Supervisory and Energy Management System of Large Public Buildings. In 2010 International Conference on Mechatronics and Automation (ICMA), pages 928–933. IEEE, 2010. ISBN 978-1-4244-5140-1. doi: 10.1109/ICMA.2010.5589969.

[41] N. Motegi, M. A. Piette, S. Kinney, and K. Herter. Introduction to Web-based Energy Information Systems for Energy Management and Demand Response in Commercial Buildings. In Information Technology for Energy Managers, chapter 7, pages 55–66. The Fairmont Press, Inc., 2004.

[42] H. Doukas, K. D. Patlitzianas, K. Iatropoulos, and J. Psarras. Intelligent building energy management system using rule sets. Building and Environment, 42(10):3562–3569, 2007. ISSN 03601323. doi: 10.1016/j.buildenv.2006.10.024.

[43] R. Kimball and J. Caserta. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. John Wiley & Sons, 2004. ISBN 0764579231.

[44] R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide To . John Wiley & Sons, 2002. ISBN 0-471-20024-7.

[45] P. Vassiliadis. A Survey of Extract-Transform-Load Technology. International Journal of Data Warehousing and Mining, 5(3):1–27, 2009. ISSN 1548-3924. doi: 10.4018/jdwm.2009070101.

[46] R. Bouman and J. V. Dongen. Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL. Wiley Publishing, Inc., 2009. ISBN 978-0-470-48432-6.

[47] J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modeling Language Reference Manual. Pearson Higher Education, 2004. ISBN 020130998X.

[48] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011. ISBN 9781558609013. doi: 10.1002/1521-3773(20010316)40:6⟨9823::AID-ANIE9823⟩3.3.CO;2-C.

[49] W. D. Back, N. Goodman, and J. Hyde. Mondrian in Action: Open source business analytics. Manning Publications Co., 2013. ISBN 9781617290985.

[50] H. U. Gökçe and K. U. Gökçe. Holistic system architecture for energy efficient building operation. Sustainable Cities and Society, 6(1):77–84, 2013. ISSN 22106707. doi: 10.1016/j.scs.2012.07.003.

[51] S. T. March and A. R. Hevner. Integrated decision support systems: A data warehousing perspective. Decision Support Systems, 43(3):1031–1043, 2007. ISSN 01679236. doi: 10.1016/j.dss.2005.05.029.

[52] L. Luciano and P. Carreira. Integrating Energy Data with ETL. In CEUR Workshop, volume 923, pages 79–88. Citeseer, 2012. ISBN 16130073 (ISSN).

[53] D. De Silva, X. Yu, D. Alahakoon, and G. Holmes. A data Mining Framework for Electricity Consumption Analysis From Meter Data. IEEE Transactions on Industrial Informatics, 7(3):399–407, 2011. ISSN 1551-3203. doi: 10.1109/TII.2011.2158844.

[54] G. Thompson, J. Yeo, and T. Tobin. Data Quality Issues and Solutions for Enterprise Energy Management Applications. In Web Based Energy Information and Control Systems: Case Studies and Applications, chapter 33, pages 435–446. The Fairmont Press, Inc., 2005.

[55] C. Batini and M. Scannapieco. Data Quality: Concepts, Methodologies and Techniques. Springer, 2006. ISBN 9783540331728.

[56] J. Granderson, M. A. Piette, and G. Ghatikar. Building energy information systems: User case studies. Energy Efficiency, 4(1):17–30, 2011. ISSN 1570646X. doi: 10.1007/s12053-010-9084-4.

[57] J. C. Van Gorp. Maximizing Energy Savings with Energy Management Systems. Strategic Planning for Energy and the Environment, 24(3):57–69, 2004. ISSN 1048-5236. doi: 10.1080/10485230409509667.

[58] G. Yee and T. Webster. Review of Advanced Applications in Energy Management, Control, and Information Systems. In Web Based Energy Information and Control Systems: Case Studies and Applications, chapter 22, pages 287–304. The Fairmont Press, Inc., 2005.

[59] CarbonTrust 2011. Energy Management-A comprehensive guide to controlling energy use. Technical report, CarbonTrust, 2011.

[60] I. S. EN 16001:2009. Implementation Guide. Technical report, Sustainable Energy Ireland, 2009.

[61] P. Carreira and C. A. Silva. Greening by IT. In Green Sustainable Data Centres, chapter 8, pages 1–35. Open University Nederlands, 2014.

[62] X. Li, C. P. Bowers, and T. Schnier. Classification of Energy Consumption in Buildings With Outlier Detection. IEEE Transactions on Industrial Electronics, 57(11):3639–3644, 2010. ISSN 02780046. doi: 10.1109/TIE.2009.2027926.

[63] X. Hong-ye, Y. Qi, and L. Ai-guo. Measurement of the energy real-time data warehouse system design and Implementation. In 2012 International Conference on Computer Science and Service System, pages 2087–2090. IEEE, 2012. ISBN 978-0-7695-4719-0. doi: 10.1109/CSSS.2012.519.

[64] J. Granderson, M. A. Piette, and B. Rosenblum. Energy information handbook: Applications for energy-efficient building operations. Lawrence Berkeley National Laboratory, 2011. doi: LBNL-5272E.

[65] S. J. Parkpoom and G. P. Harrison. Analyzing the Impact of Climate Change on Future Electricity Demand in Thailand. IEEE Transactions on Power Systems, 23(3):1441–1448, 2008. ISSN 0885-8950. doi: 10.1109/TPWRS.2008.922254.

[66] A. Pardo, V. Meneu, and E. Valor. Temperature and seasonality influences on Spanish electricity load. Energy Economics, 24(1):55–70, 2002. ISSN 01409883. doi: 10.1016/S0140-9883(01)00082-2.

[67] M. Christenson, H. Manz, and D. Gyalistras. Climate warming impact on degree-days and building energy demand in Switzerland. Energy Conversion and Management, 47(6):671–686, 2006. ISSN 01968904. doi: 10.1016/j.enconman.2005.06.009.

[68] R. D. L. Vollaro, C. Guattari, L. Evangelisti, G. Battista, E. Carnielo, and P. Gori. Building energy performance analysis: A case study. Energy and Buildings, 87:87–94, 2015. ISSN 03787788. doi: 10.1016/j.enbuild.2014.10.080.

[69] H. Poirazis, Å. Blomsterberg, and M. Wall. Energy simulations for glazed office buildings in Sweden. Energy and Buildings, 40(7):1161–1170, 2008. ISSN 03787788. doi: 10.1016/j.enbuild.2007.10.011.

[70] X. Gao and A. Malkawi. A new methodology for building energy performance benchmarking: An approach based on intelligent clustering algorithm. Energy and Buildings, 84:607–616, 2014. ISSN 03787788. doi: 10.1016/j.enbuild.2014.08.030.

[71] X. Feng, D. Yan, and T. Hong. Simulation of occupancy in buildings. Energy & Buildings, 87:348–359, 2015. ISSN 0378-7788. doi: 10.1016/j.enbuild.2014.11.067.

[72] G. K. F. Tso and K. K. W. Yau. Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks. Energy, 32(9):1761–1768, 2007. ISSN 03605442. doi: 10.1016/j.energy.2006.11.010.

[73] M. S. Gul and S. Patidar. Understanding the energy consumption and occupancy of a multi-purpose academic building. Energy and Buildings, 87:155–165, 2015. ISSN 03787788. doi: 10.1016/j.enbuild.2014.11.027.

[74] O. G. Santin, L. Itard, and H. Visscher. The effect of occupancy and building characteristics on energy use for space and water heating in Dutch residential stock. Energy and Buildings, 41(11):1223–1232, 2009. ISSN 03787788. doi: 10.1016/j.enbuild.2009.07.002.

[75] E. Buchmann, K. Böhm, T. Burghardt, and S. Kessler. Re-identification of Smart Meter data. Personal and Ubiquitous Computing, 17(4):653–662, 2013. ISSN 16174909. doi: 10.1007/s00779-012-0513-6.

[76] S. R. Iyer, M. Sankar, P. V. Ramakrishna, V. Sarangan, A. Vasan, and A. Sivasubramaniam. Energy disaggregation analysis of a supermarket chain using a facility-model. Energy and Buildings, 97:65–76, 2015. ISSN 03787788. doi: 10.1016/j.enbuild.2015.03.053.

[77] K. X. Perez, W. J. Cole, J. D. Rhodes, A. Ondeck, M. Webber, M. Baldea, and T. F. Edgar. Non-intrusive disaggregation of residential air-conditioning loads from sub-hourly smart meter data. Energy and Buildings, 81:316–325, 2014. ISSN 03787788. doi: 10.1016/j.enbuild.2014.06.031.

[78] I. Apolinario, N. Felizardo, A. L. Garcia, P. Oliveira, A. Trinidad, and P. Verdelho. Determination of Time-Of-Day Schedules in the Portuguese Electric Sector. In 2006 IEEE Power Engineering Society General Meeting, pages 1–8. IEEE, 2006. ISBN 1-4244-0493-2. doi: 10.1109/PES.2006.1709487.

[79] P. Rocha, A. Siddiqui, and M. Stadler. Improving energy efficiency via smart building energy management systems: A comparison with policy measures. Energy and Buildings, 88:203–213, 2015. ISSN 03787788. doi: 10.1016/j.enbuild.2014.11.077.

[80] G. A. Florides, S. A. Tassou, S. A. Kalogirou, and L. C. Wrobel. Measures used to lower building energy consumption and their cost effectiveness. Applied Energy, 73(3-4):299–328, 2002. ISSN 03062619. doi: 10.1016/S0306-2619(02)00119-8.

[81] Y. Wand and R. Weber. Research commentary: information systems and conceptual modeling—a research agenda. Information systems research, 13(4):363–376, 2002.

[82] A. Endres and H. D. Rombach. A handbook of software and systems engineering: Empirical observations, laws, and theories. Pearson Education, 2003. ISBN 0321154207. doi: 10.1109/MS.2004.1270773.

[83] S. Lauesen and O. Vinter. Preventing Requirement Defects: An Experiment in Process Improvement. Requirements Engineering, 6(1):37–50, 2001. ISSN 0947-3602. doi: 10.1007/PL00010355.

[84] ISO/IEC. ISO/IEC 25010:2011 Systems and software engineering – Systems and software quality requirements and evaluation (SQuaRE) – System and software quality models. Technical report, ISO/IEC, 2011.

[85] D. L. Moody. Metrics for Evaluating the Quality of Entity Relationship Models. In 17th International Conference on Conceptual Modeling ER ’98, pages 211–225. Springer, 1998.

[86] O. I. Lindland, G. Sindre, and A. Solvberg. Understanding quality in conceptual modeling. IEEE Software, 11(2):42–49, 1994. ISSN 07407459. doi: 10.1109/52.268955.

[87] R. Maier. Organizational Concepts and Measures for the Evaluation of Data Modeling. In Developing Quality Complex Database Systems: Practices, Techniques and Technologies, chapter 1, pages 1–27. IGI Publishing, 2011.

[88] H. Kaindl, S. Brinkkemper, J. A. Bubenko Jr., B. Farbey, S. J. Greenspan, C. L. Heitmeyer, J. C. do Prado Leite, N. R. Mead, J. Mylopoulos, and J. Siddiqi. Requirements Engineering and Technology Transfer: Obstacles, Incentives and Improvement Agenda. Requirements Engineering, 7(3):113–123, 2002. ISSN 0947-3602. doi: 10.1007/s007660200008.

[89] G. Poels, J. Nelson, M. Genero, and M. Piattini. Quality in Conceptual Modeling-New Research Directions. In Advanced Conceptual Modeling Techniques, pages 243–250. Springer, 2003.

[90] Y. Li, L. Wang, L. Ji, and C. Liao. A Data Warehouse Architecture supporting Energy Management of Intelligent Electricity System. In 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013). Atlantis Press, 2013. ISBN 978-90-78677-61-1. doi: 10.2991/iccsee.2013.177.

[91] D. J. Berndt, A. R. Hevner, and J. Studnicki. The Catch data warehouse: Support for community health care decision-making. Decision Support Systems, 35(3):367–384, 2003. ISSN 01679236. doi: 10.1016/S0167-9236(02)0114-8.

[92] M. De Mul, P. Alons, P. Van der Velde, I. Konings, J. Bakker, and J. Hazelzet. Development of a clinical data warehouse from an intensive care clinical information system. Computer Methods and Programs in Biomedicine, 105(1):22–30, 2012. ISSN 01692607. doi: 10.1016/j.cmpb.2010.07.002.

[93] A. Lamer, M. Jeanne, B. Vallet, G. Ditilyeu, F. Delaby, B. Tavernier, and R. Logier. Development of an anesthesia data warehouse: Preliminary results. Irbm, 34(6):376–378, 2013. ISSN 19590318. doi: 10.1016/j.irbm.2013.09.005.

[94] E. Roelofs, L. Persoon, S. Nijsten, W. Wiessler, A. Dekker, and P. Lambin. Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial. Radiotherapy and Oncology: Journal of the European Society for Therapeutic Radiology and Oncology, 108(1):174–9, 2013. ISSN 1879-0887. doi: 10.1016/j.radonc.2012.09.019.

[95] H. Hu, M. Correll, L. Kvecher, M. Osmond, J. Clark, A. Bekhash, G. Schwab, D. Gao, J. Gao, V. Kubatin, C. D. Shriver, J. A. Hooke, L. G. Maxwell, A. J. Kovatich, J. G. Sheldon, M. N. Liebman, and R. J. Mural. DW4TR: A Data Warehouse for Translational Research. Journal of Biomedical Informatics, 44(6):1004–1019, 2011. ISSN 15320464. doi: 10.1016/j.jbi.2011.08.003.

[96] X. Zhou, S. Chen, B. Liu, R. Zhang, Y. Wang, P. Li, Y. Guo, H. Zhang, Z. Gao, and X. Yan. Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artificial intelligence in medicine, 48(2):139–152, 2010. ISSN 1873-2860. doi: 10.1016/j.artmed.2009.07.012.

[97] W. E. Trick. Building a data warehouse for infection control. American Journal of Infection Control, 36(3):S75–S81, 2008. ISSN 01966553. doi: 10.1016/j.ajic.2007.07.004.

[98] M. F. Wisniewski, P. Kieszkowski, B. M. Zagorski, W. E. Trick, M. Sommers, and R. A. Weinstein. Development of a Clinical Data Warehouse for Hospital Infection Control. Journal of the American Medical Informatics Association, 10(5):454–463, 1998. doi: 10.1197/jamia.M1299.care.

[99] D. L. Rubin and T. S. Desser. A Data Warehouse for Integrating Radiologic and Pathologic Data. Journal of the American College of Radiology : JACR, 5(3):210–7, 2008. ISSN 1558-349X. doi: 10.1016/j.jacr.2007.09.004.

[100] S. Liu, C. Han, S. Wang, and Q. Luo. Data Warehouse Design For Earth Observation Satellites. Procedia Engineering, 29:3876–3882, 2012. ISSN 18777058. doi: 10.1016/j.proeng.2012.01.587.

[101] M. A. Eleveld, W. B. H. Schrimpf, and A. G. Siegert. User requirements and information definition for a virtual coastal and marine data warehouse. Ocean and Coastal Management, 46(6-7):487–505, 2003. ISSN 09645691. doi: 10.1016/S0964-5691(03)00031-0.

[102] U. Burkhardt, D. J. Russell, P. Decker, M. Döhler, H. Höfer, S. Lesch, S. Rick, J. Römbke, C. Trog, J. Vorwald, E. Wurst, and W. E. R. Xylander. The Edaphobase project of GBIF-Germany-A new online soil-zoological data warehouse. Applied Soil Ecology, 83:3–12, 2014. ISSN 09291393. doi: 10.1016/j.apsoil.2014.03.021.

[103] T. Rujirayanyong and J. J. Shi. A project-oriented data warehouse for construction. Automation in Construction, 15(6):800–807, 2006. ISSN 09265805. doi: 10.1016/j.autcon.2005.11.001.

[104] T. Park and H. Kim. A data warehouse-based decision support system for sewer infrastructure management. Automation in Construction, 30:37–49, 2013. ISSN 09265805. doi: 10.1016/j.autcon.2012.11.017.

[105] M. M. Hossain, T. Azim, M. Y. Karim, and A. S. M. L. Hoque. Integrated Data Warehousing for Telecommunication Industries. In 2009 12th International Conference on Computer and Information Technology, pages 657–662. IEEE, 2009. ISBN 9781424462841. doi: 10.1109/ICCIT.2009.5407317.

[106] J.-S. Chou and H.-C. Tseng. Establishing expert system for prediction based on the project-oriented data warehouse. Expert Systems with Applications, 38(1):640–651, 2011. ISSN 09574174. doi: 10.1016/j.eswa.2010.07.015.

[107] Y. Hao, S. Hongwei, and Z. Zili. The application of e-commerce System based on data warehouse. In 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference, ITAIC 2011, volume 2, pages 493–496. IEEE, 2011. ISBN 9781424486236. doi: 10.1109/ITAIC.2011.6030381.

[108] I.-Y. Song and K. LeVan-Shultz. Data Warehouse Design for E-Commerce Environments. In Advances in Conceptual Modeling, pages 374–387. Springer, 1999.

[109] A. B. Mendes. BI and Data Warehouse Solutions for Energy Production Industry: Application of the CRISP-DM methodology. In 2010 Conference on Bridging the Socio-technical Gap in Decision Support Systems: Challenges for the Next Decade, pages 211–222, 2010.

[110] B. Kitchenham. Measuring Software Development. In Software Reliability Handbook, chapter 10, pages 303–31. Elsevier, 1990.

[111] D. L. Moody and G. G. Shanks. What Makes a Good Data Model? Evaluating the Quality of Entity Relationship Models. Springer, 1994.

[112] M. C. Reingruber and W. W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. John Wiley & Sons, Inc., 1994.

[113] P. Johannsson, M. Boman, J. A. Bubenko Jr., and B. Wangler. Conceptual Modelling. Prentice-Hall, 1996.

[114] C. Batini, S. Ceri, and S. B. Navathe. Conceptual Database Design: An Entity-Relationship Approach. Benjamin-Cummings Publishing Co., 1992. ISBN 0805302441.

[115] S. S.-S. Cherfi and N. Prat. Multidimensional Schemas Quality: Assessing and Balancing Analyzability and Simplicity. In Conceptual Modeling for Novel Application Domains, pages 140–151. Springer, 2003.

[116] N. Prat and S. S.-S. Cherfi. Multidimensional Schemas Quality Assessment. In 15th International Conference on Advanced Information Systems Engineering, (CAiSE’03), pages 253–263. ACM Press, 2003.

[117] G. Berenguer, R. Romero, J. Trujillo, M. A. Serrano, and M. Piattini. A Set of Quality Indicators and Their Corresponding Metrics for Conceptual Models of Data Warehouses. In Data Warehousing and Knowledge Discovery, pages 95–104. Springer-Verlag, 2005. ISBN 978-3-540-28558-8. doi: 10.1007/11546849.

[118] A. Gosain, S. Nagpal, and S. Sabharwal. Quality Metrics for Conceptual Models for Data Warehouse focusing on Dimension Hierarchies. ACM SIGSOFT Software Engineering Notes, 36(4):1, 2011. ISSN 01635948. doi: 10.1145/1988997.1989015.

[119] K. B. Ali and A. Gosain. Predicting the Quality of Object-Oriented Multidimensional (OOMD) Model of Data Warehouse using Decision Tree Technique. International Journal of Engineering Science & Advanced Technology, 2(4):1048–1054, 2012.

[120] M. A. Serrano, J. Trujillo, C. Calero, and M. Piattini. Metrics for data warehouse conceptual models understandability. Information and Software Technology, 49(8):851–870, 2007. ISSN 09505849. doi: 10.1016/j.infsof.2006.09.008.

[121] M. A. Serrano, J. Trujillo, C. Calero, and M. Piattini. Metrics for data warehouse conceptual models understandability. Information and Software Technology, 49(8):851–870, 2007. ISSN 09505849. doi: 10.1016/j.infsof.2006.09.008.

[122] M. A. Serrano, C. Calero, J. Trujillo, S. Luján-Mora, and M. Piattini. Empirical Validation of Metrics for Conceptual Models of Data Warehouses. In Advanced Information Systems Engineering, pages 506–520. Springer, 2004.

[123] M. A. Serrano, C. Calero, and M. Piattini. Experimental Validation of Multidimensional Data Models Metrics. In 36th Annual Hawaii International Conference on System Sciences, 2003., pages 1–7. IEEE, 2003. ISBN 0769518745.

[124] M. A. Serrano, C. Calero, H. A. Sahraoui, and M. Piattini. Empirical studies to assess the understandability of data warehouse schemas using structural metrics. Software Quality Journal, 16(1):79–106, 2008. ISSN 09639314. doi: 10.1007/s11219-007-9030-7.

[125] G. Papastefanatos, P. Vassiliadis, A. Simitsis, and Y. Vassiliou. Design Metrics for Data Warehouse Evolution. In Conceptual Modeling-ER 2008, pages 440–454. Springer, 2008.

[126] M. Golfarelli and S. Rizzi. Data warehouse testing: A prototype-based methodology. Information and Software Technology, 53(11):1183–1198, 2011. ISSN 09505849. doi: 10.1016/j.infsof.2011.04.002.

[127] M. Golfarelli and S. Rizzi. Data Warehouse Testing. International Journal of Data Warehousing and Mining, 7(2):26–43, 2011. ISSN 1548-3924. doi: 10.4018/jdwm.2011040102.

[128] S. Nagpal, A. Gosain, and S. Sabharwal. Complexity Metric for Multidimensional Models for Data warehouse. In CUBE International Information Technology Conference, pages 360–365. ACM, 2012. ISBN 9781450311854.

[129] M. Bobker. Knowledge Practice In a Sea of Information. In Information Technology for Energy Managers, chapter 16, pages 171–181. The Fairmont Press, Inc., 2004.

[130] J. Lewis. The Case for Energy Information. In Information Technology for Energy Managers, chapter 10, pages 89–108. The Fairmont Press, Inc., 2004.

[131] P. Allen and D. Green. Creating Web-Based Information Systems From Energy Management System Data. In Information Technology for Energy Managers, chapter 34, pages 397–404. The Fairmont Press, Inc., 2004.

[132] M. Breslin. Data Warehousing Battle of the Giants. Business Intelligence Journal, 9(1):6–20, 2004.

[133] V. C. Güngör, D. Sahin, T. Kocak, S. Ergüt, C. Buccella, C. Cecati, and G. P. Hancke. Smart Grid Technologies: Communication Technologies and Standards. IEEE Transactions on Industrial Informatics, 7(4):529–539, 2011. ISSN 15513203. doi: 10.1109/TII.2011.2166794.

[134] K. De Craemer and G. Deconinck. Analysis of State-of-the-art Smart Metering Communication Standards. In 5th Young Researchers Symposium, pages 1–6. IEEE, 2010. doi: 10.1109/TSG.2012.2218834.

[135] T. Cardoso. A Framework towards Efficient Integration of Energy Data. Master thesis, Instituto Superior Técnico, 2013.

[136] O. Newman. Creating Defensible Space. US Department of Housing and Urban Development, Office of Policy Development and Research. Institute for Community Design Analysis, Center for Urban Policy Research, Rutgers University., Washington, DC, 1996.

[137] E. H. Schein. Organizational Psychology. Prentice-Hall, 1965.

[138] ISO/IEC. ISO/IEC 12207:2008 Systems and software engineering-software life cycle processes. Technical report, ISO/IEC, 2008.

[139] D. Sytse W and S. Hein. Economic Approaches to Organizations. Prentice-Hall, 2008. ISBN 0273681974. doi: 10.1016/0956-5221(93)90036-R.

[140] H. Mintzberg. Structure in 5’s: A Synthesis of the Research on Organization Design. Management science, 26(3):322–341, 1980.

[141] G. Mihalakakou, M. Santamouris, and A. Tsangrassoulis. On the energy consumption in residential buildings. Energy and Buildings, 34(7):727–736, 2002. ISSN 03787788. doi: 10.1016/S0378-7788(01)00137-2.

[142] A. Silberschatz, H. Korth, and S. Sudarshan. Entity-Relationship Model. In Database System Concepts, 6th Edition, pages 36–87. McGraw-Hill, 2010.

[143] W. Rowen, I.-Y. Song, C. Medsker, and E. Ewen. An Analysis of Many-to-Many Relationships Between Fact and Dimension Tables in Dimensional Modeling. In International Workshop on Design and Management of Data Warehouses (DMDW 2001), Interlaken Switzerland, pages 1–13, 2001.

[144] A. Gosain and Heena. Literature Review of Data Model Quality Metrics of Data Warehouse. Procedia Computer Science, 48:236–243, 2015. ISSN 18770509. doi: 10.1016/j.procs.2015.04.176.

[145] M. Golfarelli and S. Rizzi. Data Warehouse Design, Modern Principles and Methodologies. McGraw-Hill Osborne Media, 2009. ISBN 978-0-07-161039-1.

[146] M. Casters, R. Bouman, and J. V. Dongen. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration. Wiley Publishing, Inc., 2010. ISBN 0470947527.

[147] S. Chaudhuri and U. Dayal. An overview of Data Warehousing and OLAP Technology. ACM SIGMOD Record, 26(1):65–74, 1997. ISSN 01635808. doi: 10.1145/248603.248616.

[148] C. Coronel, S. Morris, and P. Rob. Database systems: design, implementation, and management. Cengage Learning, 2009. ISBN 9780538469685.

[149] Microsoft. Multidimensional Expressions (MDX) Reference. Microsoft, 2012. doi: 10.1007/s13222-011-0058-2.

[150] 9241-210: 2010. Ergonomics of human system interaction-Part 210: Human-centred design for interactive systems. Technical report, ISO, 2009.

[151] J. Nielsen. Scrolling and Scrollbars, 2005.

[152] J. Nielsen. Usability engineering. Elsevier, 1994.

[153] C. Schutz and M. Schrefl. Customization of domain-specific reference models for data warehouses. In 2014 IEEE 18th International Enterprise Distributed Object Computing Conference (EDOC), pages 61–70. IEEE, 2014. ISBN 978-1-4799-5470-4. doi: 10.1109/EDOC.2014.18.

[154] J. Z. Kolter and M. J. Johnson. REDD: A Public Data Set for Energy Disaggregation Research. In SustKDD workshop on Data Mining Applications in Sustainability, pages 59–62, 2011. ISBN 9781450308403.

[155] K. Anderson, A. Ocneanu, D. Benitez, D. Carlson, A. Rowe, and M. Berges. BLUED: A Fully Labeled Public Dataset for Event-Based Non-intrusive Load Monitoring Research. In 2nd KDD Workshop on Data Mining Applications in Sustainability (SustKDD), pages 1–5, 2012. ISBN 9781450315586.

[156] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht. Smart *: An Open Data Set and Tools for Enabling Research in Sustainable Homes. In 2012 Workshop on Data Mining Applications in Sustainability (SustKDD 2012), 2012.

[157] J. Kelly and W. Knottenbelt. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. arXiv e-prints, 59, 2014. doi: 10.1038/sdata.2015.7.

[158] A. Monacchi, D. Egarter, W. Elmenreich, S. D’Alessandro, and A. M. Tonello. GREEND: An Energy Consumption Dataset of Households in Italy and Austria. In 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), pages 1–16, 2014. ISBN 9781479949342. doi: 10.1109/SmartGridComm.2014.7007698.

[159] R. J. Santos, J. Bernardino, and M. Vieira. Leveraging 24/7 Availability and Performance for Distributed Real-Time Data Warehouses. 2012 IEEE 36th Annual Computer Software and Applications Conference, pages 654–659, 2012. doi: 10.1109/COMPSAC.2012.92.

[160] J. McConahay. Using Modbus for Process Control and Automation. Part 1. Control Engineering, pages A12–A14, 2011.

Appendix A

IST University Context Description

The Instituto Superior Técnico (IST) college has two campuses, Alameda and Taguspark. The Alameda campus has twenty-four buildings spread across 104 223 m², and each building has a variable number of floors composed of rooms. The Taguspark campus, in contrast, has a single building occupying 116 000 m². This building has four floors, each divided into three sub-spaces composed of several rooms. IST campus spaces are classified according to their purpose. In particular, there are lecture halls, study rooms and libraries, labs and computer rooms, and offices and meeting rooms.¹

The different building spaces are owned by IST organization structures, such as departments, units, and directions. These structures are organized in different ways. For instance, the school council is a flat organization, while the management council has a hierarchical structure with several levels. Organization structures are responsible for defining how the spaces are used. In general, the activities that occur in IST buildings are lessons, tests, and exams, among others (e.g. meetings and research activities). The space activities and their participants are associated with equipment energy consumption (e.g. HVAC).

To measure energy consumption, each building has a Modbus meter reader that records energy consumption aggregated per minute [160]. Consequently, energy records are associated with the group of sub-spaces that compose the entire building. The Alameda campus has a weather station, which records air temperature and solar radiation, among other weather variables. The weather records are obtained every fifteen minutes.
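Because the meters report every minute and the weather station every fifteen minutes, the two series need a common granularity before they can be compared. The sketch below shows one possible way to do this in Python; it assumes that each meter record holds the energy consumed during that minute, so that summing is the appropriate aggregation, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def quarter_hour_start(ts: datetime) -> datetime:
    """Return the start of the fifteen-minute interval that contains ts."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def aggregate_to_quarter_hours(readings):
    """Sum (timestamp, energy) minute readings per fifteen-minute interval."""
    totals = defaultdict(float)
    for ts, energy in readings:
        totals[quarter_hour_start(ts)] += energy
    return dict(sorted(totals.items()))


# Usage: thirty synthetic one-minute readings of 1.0 energy unit each.
start = datetime(2015, 1, 1, 8, 0)
readings = [(start + timedelta(minutes=i), 1.0) for i in range(30)]
per_quarter_hour = aggregate_to_quarter_hours(readings)
```

Each resulting key is the start of a fifteen-minute interval, which can then be matched against the weather record taken for the same interval.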

1http://tecnico.ulisboa.pt/files/media/media-kit/apresentacao-institucional

Appendix B

Multidimensional Model Relational Schema

CREATE TABLE time_dimension (
    time_key SERIAL PRIMARY KEY,
    time_id INT NOT NULL,
    time_of_day TIME NOT NULL,
    hour INT NOT NULL,
    quarter_hour INT NOT NULL,
    minute INT NOT NULL,
    second INT NOT NULL
);

CREATE TABLE date_dimension (
    date_key SERIAL PRIMARY KEY,
    date_id INT NOT NULL,
    calendar_date DATE NOT NULL,
    calendar_year INT NOT NULL,
    is_leap_year BOOLEAN NOT NULL,
    calendar_quarter INT NOT NULL,
    calendar_month VARCHAR NOT NULL,
    calendar_month_number INT NOT NULL,
    calendar_week INT NOT NULL,
    calendar_week_day_number INT NOT NULL,
    calendar_week_day_name VARCHAR NOT NULL,
    week_day_type VARCHAR NOT NULL,
    calendar_day INT NOT NULL
);

CREATE TABLE organization_dimension (
    organization_key INT PRIMARY KEY,
    organization_name VARCHAR NOT NULL
);

CREATE TABLE activity_dimension (
    activity_key INT PRIMARY KEY,
    activity_type VARCHAR NOT NULL,
    activity_description VARCHAR NOT NULL
);

CREATE TABLE spaces_group_dimension (
    spaces_group_key BIGINT PRIMARY KEY,
    spaces_group_name VARCHAR NOT NULL
);

CREATE TABLE equipment_group_dimension (
    equipment_group_key BIGINT PRIMARY KEY,
    equipment_group_name VARCHAR NOT NULL
);

CREATE TABLE equipment_dimension (
    equipment_key SERIAL PRIMARY KEY,
    description VARCHAR NOT NULL,
    equipment_name VARCHAR NOT NULL,
    electric_load VARCHAR NOT NULL,
    equipment_type VARCHAR NOT NULL,
    functionality VARCHAR NOT NULL,
    system_type VARCHAR NOT NULL,
    subsystem_type VARCHAR NOT NULL
);

CREATE TABLE datapoint_dimension (
    datapoint_key SERIAL PRIMARY KEY,
    description VARCHAR NOT NULL,
    scale INT NOT NULL DEFAULT 1,
    precision INT NOT NULL DEFAULT 0,
    timezone VARCHAR NOT NULL DEFAULT 'UTC+0',
    is_virtual BOOLEAN NOT NULL DEFAULT false,
    domain_description VARCHAR NOT NULL
);

CREATE TABLE space_dimension (
    parent_space_key SERIAL NOT NULL,
    parent_space_id BIGINT NOT NULL,
    parent_space_name VARCHAR NOT NULL,
    space_id BIGINT NOT NULL,
    space_key SERIAL PRIMARY KEY,
    space_name VARCHAR NOT NULL,
    space_type VARCHAR NOT NULL,
    space_description VARCHAR NOT NULL DEFAULT 'No Description',
    space_area BIGINT NOT NULL DEFAULT 0,
    has_natural_light BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE energy_costs_fact_table (
    active_energy_cost BIGINT NOT NULL,
    access_tariff_cost BIGINT NOT NULL,
    date_key INT REFERENCES date_dimension (date_key),
    organization_key INT REFERENCES organization_dimension (organization_key),
    space_key INT REFERENCES space_dimension (space_key),
    datapoint_key INT REFERENCES datapoint_dimension (datapoint_key)
);

CREATE TABLE energy_readings_fact_table (
    time_key INT REFERENCES time_dimension (time_key),
    date_key INT REFERENCES date_dimension (date_key),
    equipment_key INT REFERENCES equipment_group_dimension (equipment_group_key),
    space_key INT REFERENCES spaces_group_dimension (spaces_group_key),
    datapoint_key INT REFERENCES datapoint_dimension (datapoint_key),
    measurement_wh INT NOT NULL,
    measurement_kwh INT NOT NULL,
    measurement_mwh INT NOT NULL
);

CREATE TABLE weather_readings_fact_table (
    time_key INT REFERENCES time_dimension (time_key),
    date_key INT REFERENCES date_dimension (date_key),
    equipment_key INT REFERENCES equipment_group_dimension (equipment_group_key),
    space_key INT REFERENCES spaces_group_dimension (spaces_group_key),
    datapoint_key INT REFERENCES datapoint_dimension (datapoint_key),
    temperature INT NOT NULL,
    temperature_felt INT NOT NULL,
    humidity INT NOT NULL,
    wind_speed INT NOT NULL,
    wind_direction INT NOT NULL,
    air_pressure INT NOT NULL,
    solar_radiation INT NOT NULL,
    precipitation INT NOT NULL
);

CREATE TABLE degree_days_aggregate_fact_table (
    date_key INT REFERENCES date_dimension (date_key),
    space_key INT NOT NULL,
    cooling_degree_days INT NOT NULL,
    heating_degree_days INT NOT NULL
);

CREATE TABLE building_space_occupancy_fact (
    space_key INT REFERENCES space_dimension (space_key),
    activity_key INT REFERENCES activity_dimension (activity_key),
    owner_organization_key INT REFERENCES organization_dimension_view1 (organization_key),
    payer_organization_key INT REFERENCES organization_dimension_view2 (organization_key),
    occupant_organization_key INT REFERENCES organization_dimension_view3 (organization_key),
    date_key INT REFERENCES date_dimension (date_key),
    start_time_key INT REFERENCES time_dimension (time_key),
    end_time_key INT REFERENCES time_dimension (time_key),
    occupancy_value INT NOT NULL DEFAULT 0
);

CREATE TABLE spaces_group_bridge (
    spaces_group_key BIGINT REFERENCES spaces_group_dimension (spaces_group_key),
    space_id BIGINT REFERENCES space_dimension (space_key)
);

CREATE TABLE equipment_group_bridge (
    equipment_group_key BIGINT REFERENCES equipment_group_dimension (equipment_group_key),
    equipment_key BIGINT REFERENCES equipment_dimension (equipment_key)
);

CREATE TABLE space_hierarchy_bridge (
    parent_space_key INT NOT NULL,
    space_key INT REFERENCES space_dimension (space_key),
    distance INT NOT NULL,
    bottom_flag BOOLEAN NOT NULL,
    top_flag BOOLEAN NOT NULL
);
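As an illustration of how the space hierarchy bridge can be queried, the following minimal sketch rolls energy costs up to an ancestor space. It is not part of the prototype: it uses Python's built-in sqlite3 module instead of PostgreSQL, keeps only the columns needed for the join, and the inserted rows are invented for the example. Only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE space_dimension (
    space_key INTEGER PRIMARY KEY,
    space_name TEXT NOT NULL
);
CREATE TABLE space_hierarchy_bridge (
    parent_space_key INTEGER NOT NULL,
    space_key INTEGER REFERENCES space_dimension (space_key),
    distance INTEGER NOT NULL
);
CREATE TABLE energy_costs_fact_table (
    active_energy_cost INTEGER NOT NULL,
    space_key INTEGER REFERENCES space_dimension (space_key)
);
-- Invented rows: one building (1) containing a floor (2) containing a room (3).
INSERT INTO space_dimension VALUES (1, 'Building'), (2, 'Floor 0'), (3, 'Room 0.1');
INSERT INTO space_hierarchy_bridge VALUES
    (1, 1, 0), (1, 2, 1), (1, 3, 2),
    (2, 2, 0), (2, 3, 1),
    (3, 3, 0);
INSERT INTO energy_costs_fact_table VALUES (100, 2), (250, 3);
""")

# The bridge holds one row per (ancestor, descendant) pair, so a single join
# collects the facts of a space and of every space below it.
ROLLUP_QUERY = """
SELECT SUM(f.active_energy_cost)
FROM energy_costs_fact_table AS f
JOIN space_hierarchy_bridge AS b ON b.space_key = f.space_key
WHERE b.parent_space_key = ?
"""


def cost_under(space_key):
    """Return the total active energy cost of a space and its descendants."""
    row = conn.execute(ROLLUP_QUERY, (space_key,)).fetchone()
    return row[0] or 0
```

With the invented rows, the building total includes both the floor and the room, while the room total includes only its own fact row.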

Appendix C

ETL Workflows

C.1 ETL Workflows Dependencies Hierarchy

This section describes the dependencies between the PDI ETL Jobs and Transformations. To guarantee that data is correctly loaded, all dimensions are loaded sequentially, and then the fact tables are loaded in parallel. The only exception is the activities dimension, which is loaded before the occupancy fact table, but in parallel with the remaining fact tables.

• Steps included on the Job used to load IST data (Figure C.1):

– Dimensions loaded sequentially

1. Step that creates PostgreSQL database tables

2. Transformation that loads time dimension data (Figure C.2)

(a) Transformation that creates minutes data table (Figure C.3)

3. Transformation that loads date dimension data (Figure C.4)

(a) Transformation that creates years data table (Figure C.5)

4. Job that loads space dimension data (Figure C.6)

(a) Transformation that loads parent spaces data (Figure C.7)

i. Transformation that loads child spaces data (Figure C.8)

(b) Transformation that updates space dimension with description data (Figure C.9)

(c) Transformation that loads space hierarchy bridge data (Figure C.10)

5. Transformation that validates start date (Figure C.11)

6. Transformation that validates end date (Figure C.11)

7. Transformation that extracts and loads activities dimension data (Figure C.12)

(a) Transformation that loads activities dimension data (Figure C.13)

(b) Transformation that creates days data table (Figure C.14)

– Fact tables loaded in parallel

* Transformation that loads occupancy fact table data (Figure C.15)

* Transformation that loads energy readings fact table data (Figure C.16)

* Transformation that loads weather readings fact table data (Figure C.17)

· Transformation that extracts and transforms weather variables data (Figure C.18)

· Transformation that creates days data table (Figure C.14)

· Transformation that loads degree days data (Figure C.19)

* Transformation that loads energy costs fact table data (Figure C.20)

· Transformation that aggregates energy data by peak demand period (Figure C.21)
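The ordering constraints above can be sketched outside PDI as follows. This is only an illustration of the dependency structure, assuming a thread pool in place of the PDI job scheduler; the step names are shorthand labels for the steps listed above, and each step is reduced to appending its name to a log.

```python
from concurrent.futures import ThreadPoolExecutor

# Steps that must run one after another (labels are shorthand, not PDI names).
DIMENSION_STEPS = [
    "create_tables",
    "time_dimension",
    "date_dimension",
    "space_dimension",
    "validate_start_date",
    "validate_end_date",
]

# Branches that may run concurrently. Inside a branch the order is preserved,
# which keeps the activities dimension ahead of the occupancy fact table.
PARALLEL_BRANCHES = [
    ["activity_dimension", "occupancy_fact"],
    ["energy_readings_fact"],
    ["weather_readings_fact"],
    ["energy_costs_fact"],
]


def run_branch(branch, log):
    """Run the steps of one branch in order (placeholder for real work)."""
    for step in branch:
        log.append(step)


def run_job():
    """Run the dimension steps sequentially, then the branches in parallel."""
    log = []
    for step in DIMENSION_STEPS:
        log.append(step)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_branch, branch, log) for branch in PARALLEL_BRANCHES]
        for future in futures:
            future.result()  # re-raise any error from a worker thread
    return log
```

The relative order of the parallel branches in the returned log is not fixed; only the sequential prefix and the order inside each branch are guaranteed.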

C.2 Workflows Figures

C.2.1 Job used to load IST data

Figure C.1: Representation from left to right of the highest level ETL workflow. First, SQL tables are created on PostgreSQL database (1). Secondly, time (Figure C.2), date (Figure C.4), and space (Figure C.6) dimensions data is extracted and loaded; and end and start date parameters are validated (Figure C.11). Finally, occupation (Figure C.15), energy costs (Figure C.20), energy consumption (Figure C.16), and weather fact tables (Figure C.17) data is loaded in parallel.

C.2.2 Dimension Tables Workflows

Time Dimension Workflow

Figure C.2: Representation from left to right of the transformation that loads time dimension data. First, year parameter is read from parent transformation (1). Secondly, the transformation creates a table with hours, minutes, and seconds (2), and updates it with hour quarters (3), day/night periods (4), and surrogate keys (5). The last step consists of loading time dimension data (6).

Figure C.3: Representation from left to right of the transformation that creates minutes data table.
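A minimal sketch of the rows such a transformation could produce is given below. It assumes one row per minute, a time_id encoded as HHMMSS, and quarter hours numbered 1 to 4 within each hour; none of these conventions is stated in the figures, and the day/night period is left out because the time dimension table has no column for it.

```python
from datetime import time


def build_time_dimension(step_seconds=60):
    """Build time dimension rows covering one day at the given granularity."""
    rows = []
    for key, total in enumerate(range(0, 24 * 3600, step_seconds), start=1):
        hour, remainder = divmod(total, 3600)
        minute, second = divmod(remainder, 60)
        rows.append({
            "time_key": key,                                   # surrogate key
            "time_id": hour * 10000 + minute * 100 + second,   # assumed HHMMSS encoding
            "time_of_day": time(hour, minute, second),
            "hour": hour,
            "quarter_hour": minute // 15 + 1,                  # assumed 1..4 numbering
            "minute": minute,
            "second": second,
        })
    return rows
```

Calling build_time_dimension() yields 1440 rows, one per minute of the day; passing step_seconds=1 would give per-second granularity instead.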

Date Dimension Workflow

Figure C.4: Representation from left to right of the transformation that loads date dimension data. First, year parameter is read from parent transformation (1). Secondly, the transformation creates a table with years, months, and days (2), and updates it with week numbers (3), month names (4), week days (5), week day types (6), and surrogate keys (7). The last step consists of loading date dimension data (8).

Figure C.5: Representation from left to right of the transformation that creates a calendar years data table.
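The date dimension attributes can be derived from the calendar date alone, as the sketch below illustrates. The conventions are assumptions, not taken from the PDI transformation: ISO week numbers, week days numbered from 1 (Monday) to 7 (Sunday), a date_id encoded as YYYYMMDD, and a week_day_type that only distinguishes weekdays from weekends.

```python
import calendar
from datetime import date, timedelta


def build_date_dimension(year):
    """Build one date dimension row per calendar day of the given year."""
    rows = []
    day = date(year, 1, 1)
    key = 1
    while day.year == year:
        weekday_number = day.isoweekday()  # 1 = Monday ... 7 = Sunday
        rows.append({
            "date_key": key,                                    # surrogate key
            "date_id": day.year * 10000 + day.month * 100 + day.day,
            "calendar_date": day,
            "calendar_year": day.year,
            "is_leap_year": calendar.isleap(day.year),
            "calendar_quarter": (day.month - 1) // 3 + 1,
            "calendar_month": calendar.month_name[day.month],
            "calendar_month_number": day.month,
            "calendar_week": day.isocalendar()[1],              # ISO week number
            "calendar_week_day_number": weekday_number,
            "calendar_week_day_name": calendar.day_name[day.weekday()],
            "week_day_type": "Weekend" if weekday_number >= 6 else "Weekday",
            "calendar_day": day.day,
        })
        day += timedelta(days=1)
        key += 1
    return rows
```

A real deployment would probably also mark holidays and academic periods in week_day_type, which requires an external calendar and is therefore left out of this sketch.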

Space Dimension Related Workflows

Figure C.6: Representation from left to right of the job that loads space dimension data. First, space data is extracted and loaded (Figure C.7). Secondly, space description data is extracted and space dimension table data is updated (Figure C.9). Finally, space hierarchy bridge data is created and loaded according to space dimension data (Figure C.10).

Figure C.7: Representation from left to right of the transformation that loads parent space data. First, the transformation extracts university campus data (1), parses it using JSON Path expressions (2), and creates a new URL to obtain buildings data (3). Secondly, it obtains buildings (Figure C.8), floors (Figure C.8), rooms (Figure C.8), and room divisions data (Figure C.8). Additionally, the transformation filters (4) and loads A4 room data (5).

Figure C.8: Representation of the transformation that loads child space data. This transformation is identical to parent space transformation.

Figure C.9: Representation from left to right of the transformation that loads space description data. First, the transformation obtains space dimension data (1) and spaces description data (2). Secondly, both tables are joined using space natural keys (3). Thirdly, space classification and description attributes are concatenated (4) and de-duplicated (5), and the original data types of the area and lighting attributes are converted (6). The last step consists of updating the space dimension with the description data (7).

Figure C.10: Representation from left to right of the transformation that loads space hierarchy bridge data. First, the transformation obtains space data (1), and calculates the number of hierarchical levels between each space and its parent (2). Secondly, spaces hierarchy data is updated with top and bottom flags (3), and a space dimension key (4). The last step consists of loading space hierarchy bridge data (5).
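The bridge rows can be derived from a simple child-to-parent mapping, as sketched below. The sketch assumes that the bridge stores one row for every ancestor of a space (including the space itself at distance 0), that bottom_flag marks a space without children, and that top_flag marks an ancestor without a parent. These flag semantics follow common bridge-table practice and are an assumption about the transformation, and the input mapping is hypothetical.

```python
def build_space_hierarchy_bridge(parent_of):
    """Build bridge rows from a mapping of space_key to parent space_key.

    A root space maps to None. One row is produced for each pair of a space
    and one of its ancestors, including the space itself at distance 0.
    """
    spaces_with_children = {p for p in parent_of.values() if p is not None}
    rows = []
    for space_key in parent_of:
        ancestor, distance = space_key, 0
        while ancestor is not None:
            rows.append({
                "parent_space_key": ancestor,
                "space_key": space_key,
                "distance": distance,
                "bottom_flag": space_key not in spaces_with_children,
                "top_flag": parent_of[ancestor] is None,
            })
            ancestor = parent_of[ancestor]  # climb one level
            distance += 1
    return rows
```

For a building (1) with one floor (2) and one room (3), the mapping {1: None, 2: 1, 3: 2} produces six rows, and the row linking the room to the building has distance 2 with both flags set.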

Date Validation Workflow

Figure C.11: Representation from left to right of the transformation that validates the start/end date. The transformation checks for non-existent days, such as the 31st of April.
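The same check can be expressed in a few lines by letting the standard library reject impossible dates. This is a sketch of the idea only; the function name is hypothetical and the PDI transformation may implement the check differently.

```python
from datetime import date


def is_valid_date(year, month, day):
    """Return True if the given year, month, and day form a real calendar date."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True
```

For example, is_valid_date(2015, 4, 31) is False because April has only 30 days, and 29 February is accepted only in leap years.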

Activity Dimension Related Workflows

Figure C.12: Representation from left to right of the transformation that loads activities dimension data. First, start and end date parameters are read from parent transformation (1). Secondly, a days data table is obtained (2), cross joined with A4 space id (3), and used to create the data source URLs (4); each URL is used to obtain the activities, from A4 room, on a specific day. Thirdly, activities data is obtained using several HTTP clients working in parallel (5). The last step consists of passing activities data to a sub-transformation (Figure C.13).

Figure C.13: Representation from left to right of the transformation that parses activities data. First, event data is obtained from parent transformation (1). Secondly, the event data is parsed differently according to its type (2). The last step consists of loading activities (3), and occupancy (4) dimension data.

Figure C.14: Representation from left to right of the transformation that creates calendar days data table.

C.2.3 Fact Tables Workflows

Space Occupancy Fact Table Workflow

Figure C.15: Representation from left to right of the transformation that loads occupancy fact table data.

Energy Readings Fact Table Workflow

Figure C.16: Representation from left to right of the transformation that loads energy readings fact table data. First, energy measurements (1) and spaces data (2) are extracted. Secondly, data records are joined according to space names, using the Jaro similarity algorithm (3). Thirdly, the obtained data records are associated with datapoint dimension entries (4). The last step consists of loading energy measurements fact table data (5).
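To make the fuzzy join on space names concrete, the sketch below implements the standard Jaro similarity and uses it to pick the closest candidate name. The helper best_match, its 0.8 threshold, the lower-casing of names, and the example names are assumptions made for illustration; the caption does not state how the PDI step is configured.

```python
def jaro_similarity(a, b):
    """Return the Jaro similarity of two strings, between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        start = max(0, i - window)
        end = min(len(b), i + window + 1)
        for j in range(start, end):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a_seq = [ch for ch, m in zip(a, a_matched) if m]
    b_seq = [ch for ch, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to name, or None below the threshold."""
    if not candidates:
        return None
    score, candidate = max(
        (jaro_similarity(name.lower(), c.lower()), c) for c in candidates
    )
    return candidate if score >= threshold else None
```

A threshold is needed because the best-scoring candidate is returned even when no space name is a plausible match; the value should be tuned against the actual meter and space names.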

Weather Readings Fact Table Related Workflows

Figure C.17: Representation from left to right of the transformation that loads weather readings fact table data. First, start and end date parameters are read from parent transformation (1). Secondly, weather variables data is obtained in parallel (Figure C.18 and Figure C.19). Thirdly, weather records data are joined, according to time and date attributes, and put into a single table (2). The last step consists of loading weather measurements fact table data (3).

Figure C.18: Representation from left to right of the transformation that extracts and transforms weather variables data.

Figure C.19: Representation from left to right of the transformation that loads degree days data.
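Degree days are commonly computed from the daily mean temperature and a base temperature. The sketch below follows that common definition, assuming a single base of 18 °C for both heating and cooling and readings given as (day, temperature) pairs; the figure does not state which base temperature or averaging method the transformation uses.

```python
from collections import defaultdict


def daily_means(readings):
    """Average temperature readings per day.

    readings is an iterable of (day, temperature_celsius) pairs.
    """
    by_day = defaultdict(list)
    for day, temperature in readings:
        by_day[day].append(temperature)
    return {day: sum(values) / len(values) for day, values in by_day.items()}


def degree_days(readings, base=18.0):
    """Return heating and cooling degree days per day for an assumed base."""
    result = {}
    for day, mean in daily_means(readings).items():
        result[day] = {
            "heating_degree_days": max(0.0, base - mean),
            "cooling_degree_days": max(0.0, mean - base),
        }
    return result
```

A day with a mean of 11 °C therefore contributes 7 heating degree days and no cooling degree days, and a day with a mean of 25 °C contributes the reverse.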

Energy Costs Fact Table Related Workflows

Figure C.20: Representation from left to right of the transformation that loads energy costs fact table data. First, energy costs (1), and space and organization (2) data are obtained, and joined according to month attribute (3). Secondly, the newly obtained table is joined with energy measurements data (Figure C.21) aggregated according to different consumption periods (e.g. peak hours) (3). Thirdly, energy costs are split (4), and calculated according to the corresponding consumption period (5). The last step consists of loading energy costs fact table data (6).

Figure C.21: Representation from left to right of the transformation that aggregates energy data by peak demand period.
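The aggregation by consumption period and the per-period cost calculation can be sketched as follows. The period boundaries, the period names, and the prices are invented for the example, since real tariff schedules depend on the supply contract and are not given in the figures.

```python
from collections import defaultdict


def tariff_period(hour):
    """Classify an hour of the day into a hypothetical tariff period."""
    if 9 <= hour < 12 or 18 <= hour < 21:
        return "peak"
    if 0 <= hour < 7:
        return "off_peak"
    return "standard"


def aggregate_by_period(readings):
    """Sum energy per month and tariff period.

    readings is an iterable of (month, hour, energy_wh) tuples.
    """
    totals = defaultdict(int)
    for month, hour, energy_wh in readings:
        totals[(month, tariff_period(hour))] += energy_wh
    return dict(totals)


def energy_cost(totals, price_per_kwh):
    """Convert aggregated Wh totals into costs using a price per period."""
    return {
        key: energy_wh / 1000 * price_per_kwh[key[1]]
        for key, energy_wh in totals.items()
    }
```

Aggregating before pricing keeps the cost calculation to one multiplication per month and period, which mirrors the split between Figure C.21 and Figure C.20.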

Appendix D

Mondrian XML Schema



[Measures].[Unformatted Energy Measurements MWh AVG]*0.000000001
[Measures].[Unformatted Energy Measurements MWh MAX]*0.000000001
[Measures].[Unformatted Energy Measurements MWh SUM]*0.000000001
[Measures].[Unformatted Energy Measurements MWh MIN]*0.000000001
[Measures].[Unformatted Energy Measurements kWh AVG]*0.000001
[Measures].[Unformatted Energy Measurements kWh MAX]*0.000001
[Measures].[Unformatted Energy Measurements kWh MIN]*0.000001
[Measures].[Unformatted Energy Measurements kWh SUM]*0.000001
[Measures].[Unformatted Energy Measurements Wh AVG]*0.001
[Measures].[Unformatted Energy Measurements Wh SUM]*0.001
[Measures].[Unformatted Energy Measurements Wh MAX]*0.001
[Measures].[Unformatted Energy Measurements Wh MIN]*0.001
[Measures].[Unformatted Energy Measurements kWh SUM]*0.000001*0.589
[Measures].[Unformatted Temperature]*0.1
[Measures].[Unformatted Temperature Felt]*0.1
[Measures].[Unformatted Humidity]*0.1
[Measures].[Unformatted Wind Speed]*0.1
[Measures].[Unformatted Wind Gust]*0.1
[Measures].[Unformatted Wind Direction]*0.1
[Measures].[Unformatted Air Pressure]*0.1
[Measures].[Unformatted Solar Radiation]*0.1
[Measures].[Unformatted Precipitation]*0.1
[Measures].[Unformatted Cooling Degree Days]*0.1
[Measures].[Unformatted Heating Degree Days]*0.1
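Each formula above multiplies an unformatted measure by a constant factor. A plausible reading, which is an assumption here, is that the fact tables store scaled integers and the calculated members restore the decimal value. The sketch below applies a few of these factors with exact decimal arithmetic; the dictionary keys are shortened labels and the raw values in the usage note are invented.

```python
from decimal import Decimal

# Factors copied from the formulas above; keys are shortened labels.
SCALE_FACTORS = {
    "Temperature": Decimal("0.1"),
    "Humidity": Decimal("0.1"),
    "Energy Measurements Wh SUM": Decimal("0.001"),
    "Energy Measurements kWh SUM": Decimal("0.000001"),
    "Energy Measurements MWh SUM": Decimal("0.000000001"),
}


def format_measure(name, unformatted_value):
    """Scale a stored integer measure by the factor of its calculated member."""
    return unformatted_value * SCALE_FACTORS[name]
```

Under this reading, a stored temperature of 215 is presented as 21.5, and a stored Wh sum of 1500 is presented as 1.5.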

Appendix E

BEMS Prototype Evaluation Questionnaire

E.1 Background Information

• Age

• Gender

• Education

• Job

• How many years of experience do you have in the field of Building Energy Management?

• What type of buildings does your organization own?

– Commercial buildings (e.g. Bank)

– Residential buildings (e.g. apartment block)

– Educational buildings (e.g. college)

– Industrial buildings (e.g. factory)

– Other

• What are the areas of activity of the occupants of your organization's buildings?

E.2 Usability Evaluation

• Answer the following questions with a value from 1 to 5. (The following questions are based on ISO 9241, which is the standard for ergonomics of human-computer interaction. These questions are used to determine if the effectiveness of the prototype is hindered by usability issues.)

1. (Design) Is the application interface pleasant to use?

2. (Easiness of use) Is the application interface easy to use?

3. (Learnability) Is it easy to learn how to use the application?

4. (Learnability) Is the information provided by the application easy to understand?

5. (Satisfaction) Are you satisfied with the outcome of the performed tasks?

• How do you classify the prototype performance/responsiveness from 1 to 5 (1 is used for slow and unresponsive, and 5 is used for fast and highly responsive)? (There is a correlation between multidimensional model structures, query complexity and system response time. Therefore, this question aims at determining if the system responsiveness is acceptable.)

• Do you have any comments/suggestions about the prototype interface? (This question gives the user the possibility of pointing out any usability aspects not covered by the previous questions.)

E.3 Technical Evaluation

• What kinds of functionality are missing from the prototype? (Note: The options below refer to functionality that, despite not being available in the prototype, is supported by the model. This question tries to determine missing functionality apart from the options below, and whether that functionality is supported by the model.)

1. Longitudinal Benchmarking

2. Peak Load Analysis Histograms

3. Baseline Curves

4. Individual equipment consumption account

5. Individual organizational unit consumption account

6. Energy Consumption Forecasting

7. Other

• What are the energy consumption related dimensions/data types missing on the different charts? (This question aims at determining if there are any missing relations between dimensions and facts, or if there are missing dimensions/fact tables.)

1. What data types are missing on Simple Analysis Chart?

a) Organizational units data b) Occupation values c) Energy cost d) Equipment data e) Other

2. What data types are missing on Comparison Analysis Chart?

a) Organizational units data b) Occupation values c) Energy cost d) Equipment data e) Other

3. What data types are missing on Variable Analysis Chart?

a) Organizational units data b) Occupation values c) Energy cost d) Equipment data e) Other

4. What data types are missing on Occupation Analysis Chart?

a) Organizational units data b) Energy cost c) Equipment data d) Other

5. What data types are missing on Energy Cost Simulator?

a) Organizational units data b) Occupation values c) Equipment data d) Other

• Do you have any comments/suggestions about the prototype functionality and features? (This question gives the user the possibility of suggesting any technical aspects not covered by the previous questions.)
