Research Collection

Doctoral Thesis

Leveraging Data Analytics Towards Activity-based Energy Efficiency in Households

Author(s): Cao, Hông-Ân

Publication Date: 2017-06

Permanent Link: https://doi.org/10.3929/ethz-b-000000236

Rights / License: In Copyright - Non-Commercial Use Permitted


Diss. ETH No. 24173

Leveraging Data Analytics Towards Activity-based Energy Efficiency in Households

A thesis submitted to attain the degree of DOCTOR OF SCIENCES of ETH ZURICH (Dr. sc. ETH Zurich)

presented by
Hông-Ân Cao
ing. sys. com. dipl. EPF, École Polytechnique Fédérale de Lausanne
born on February 23, 1987
citizen of Monthey VS, Switzerland

accepted on the recommendation of
Prof. Dr. Friedemann Mattern, examiner
Prof. Dr. Nuno Jardim Nunes, co-examiner
Prof. Dr. Torben Bach Pedersen, co-examiner

2017

Có trí thì nên. (With wisdom, one succeeds.) — Vietnamese proverb

To my parents.

ABSTRACT

Aiming for sustainable development means reconsidering the access to energy sources in industrialized countries, which are not faced with the contingency scenarios implemented in emerging and newly-developed countries, in order to allow equal access to energy sources for all and to thwart environmental degradation. The global penetration of renewable energy sources to replace fossil fuel and nuclear power plants means adjusting to stochastic energy production. The expected yield will depend on very different weather and landscape conditions and will represent a challenge for countries with continuous access to energy sources, where energy is often considered a public utility. Tracking wastage and improving the scheduling of the processes that consume energy would allow us to match the demand and the supply of energy. This will be particularly crucial during peak times, when meeting the high demand incurs ramping up additional, mostly unclean power plants or introducing power system instability.

The digitalization of the energy sector has started with the roll-out of smart meters, which record the electricity consumption at a finer granularity and are meant to replace the biannual or yearly dispatch of utility companies' employees to read the meter. Considerable research efforts have been directed at analyzing aggregated loads from these smart meters or at developing algorithms for disaggregating households' total electricity consumption to isolate the traces of single appliances. However, less focus has been set on assessing the potential of using sub-metered data for improving the energy efficiency in households. This was primarily linked to the fact that the necessary datasets were not widely available, due to the difficulty and the costs of instrumenting households to acquire consumption data from appliances. The objective of this thesis is to investigate how to leverage and improve existing disaggregated datasets in order to develop data-driven techniques that improve the energy efficiency within residential homes.

Starting from smart meter data, we segmented households into groups with similar electricity consumption patterns based on their peak consumption, to identify consumption patterns that are harmful from the perspective of utility companies and for which they could launch targeted mitigation campaigns. However, improving the energy efficiency in the residential sector requires changing individuals' relationship towards their electricity consumption. These behaviors are closely related to the activities that are carried out throughout the day and can be supported by the usage of consumer electronics, such as appliances. Therefore, we turned to analyzing the behaviors inside households that trigger the usage of electricity by studying a large disaggregated dataset, and we developed learning techniques to extract activity patterns. We first addressed the challenge of determining when appliances are actively used by households' residents, as opposed to being off or idle and incurring standby consumption, by developing GMMthresh, an automatic thresholding method which is agnostic of the appliance's type, brand and model and instead relies on the statistical distribution of its power consumption.

Due to the lack of event-based and activity labels in existing datasets that would allow us to validate our learning technique, we leveraged crowdsourcing concepts to provide an expert-annotated dataset that enriches the existing datasets through our Collaborative Annotation Framework for Energy Datasets (CAFED). We conducted two in-depth studies to quantify the performance of regular users against expert users in labeling energy data on CAFED. We provide analysis tools and methods that can be generalized to crowdsourcing systems for improving the quality of the workers' contributions. Using the expert-annotated labels, we validated GMMthresh against manually labeled data. Then, we developed a method for learning temporal association rules to identify activities involving the usage of appliances within households. Our pipeline includes our thresholding algorithm and a novel search algorithm that determines the time windows of the association rules efficiently and in a data-driven manner.

The contributions of this thesis rely on exploiting energy data and developing novel techniques towards identifying activity patterns and their scheduling, which could then become part of an ambient intelligence system that would smarten existing homes. The methods we developed are not restricted to energy research, as they can be applied to sensor data, where for example inertial sensors also require machine learning algorithms to filter out background noise from actual movement. Similarly, our work on the crowdsourcing of time series opens new perspectives for extending the range of data that can be annotated by the crowd and provides design insights and mitigation techniques for improving the quality of the labeling on collaborative platforms. Finally, our temporal association rule mining framework is not limited to energy time series but can be applied to search for temporal windows and to understand the scheduling of any time series dataset.

RÉSUMÉ

To enable sustainable development for all and to avert environmental disasters, industrialized countries must reconsider their relationship to energy resources, even though they are not currently exposed to contingency plans of the kind applied in emerging and developing countries. The global penetration of renewable energy sources, which will eventually replace fossil fuels and nuclear power plants, entails adjusting to the stochastic nature of clean energy production. The expected yield will depend strongly on weather conditions and on the geographic location of the installations, and it will represent a major challenge for countries that have so far enjoyed continuous access to energy sources, to the point of treating energy as a commodity. Identifying sources of waste and improving the scheduling of the processes that consume energy would make it possible to match demand with energy production. This will be particularly crucial at peak hours, when meeting the high demand requires bringing additional, often polluting, power plants online or risks introducing instability into the electrical grid.

The digitalization of the energy sector began with the introduction of smart meters, which measure electricity consumption at a finer granularity and thereby replace the semi-annual or annual visits of employees for meter readings. Considerable research efforts have been devoted to analyzing the aggregate loads recorded by these smart meters or to developing algorithms that disaggregate a household's total electricity consumption into the individual loads of its appliances. However, less attention has been paid to evaluating the potential of exploiting detailed per-appliance data to improve the energy efficiency of residential customers. This was mainly because the necessary datasets were not widely available, owing to the difficulty and cost of instrumenting homes to collect appliance-level consumption data. The objective of this thesis is to examine how to exploit and improve existing disaggregated datasets and to develop data-driven methods to improve energy efficiency within households.

We begin by using smart meter data to segment households into groups with similar electricity consumption profiles, based on their consumption at peak hours, in order to identify the profiles that are harmful from the perspective of utility companies, which can then run awareness campaigns to remedy them. However, improving energy efficiency in the residential sector requires adopting changes of behavior towards one's own energy consumption. These behaviors are closely tied to the activities that take place throughout the day and that may involve the use of electronic devices such as household appliances. We therefore set out to analyze, through a large disaggregated dataset, the in-home behaviors that rely on interactions with electrical appliances and thus consume electricity, and we developed learning techniques to extract activity patterns. We first addressed the challenge of determining when an appliance is actively used, as opposed to being switched off or in standby and therefore idle, by developing GMMthresh, an algorithm that automatically sets a threshold separating the active and idle modes agnostically, that is, without knowing the appliance's type, brand or model, relying instead on the statistical characteristics of its power consumption.

Because existing datasets lack the event and activity labels that would allow us to validate our learning method, we drew on crowdsourcing methods to produce an expert-annotated dataset from an existing disaggregated dataset through our energy data annotation platform CAFED (Collaborative Annotation Framework for Energy Datasets). We conducted two detailed studies to quantify and compare the performance of regular users and experts in annotating energy data on CAFED. We provide analysis tools and methods that can be generalized to improve workers' contributions. Thanks to the expert-annotated data, we were able to evaluate GMMthresh. Finally, we developed a method for learning temporal association rules to identify activities that involve the use of electrical appliances in households. Our pipeline includes GMMthresh and a novel search algorithm that determines the time intervals in which the rules hold, efficiently and in a data-driven manner.

The contributions of this thesis rest on the exploitation of energy data and on the development of original methods for identifying activity patterns and their scheduling; these methods could become part of a future ambient intelligence system that regulates existing homes and turns them into smart homes. The methods developed are not limited to energy research but can be applied to other sensors, such as inertial sensors, where noisy measurements must be distinguished from actual activity. Similarly, our work on the crowdsourcing of time series opens new perspectives for extending crowdsourcing to other domains that require annotated data, and we offer design guidelines for collaborative platforms to improve the quality of the annotations. Finally, our temporal association rules are not limited to energy time series; our pipeline extends to the search for temporal windows and to the understanding of the scheduling of events in various kinds of time series data.


RELATED PUBLICATIONS

Chapter 3, Chapter 4 and Chapter 5 are based on the following publications, where the author is the lead author. Chapter 4 and Chapter 5 contain material based on the author's work with Tri Kurniawan Wijaya during her exchange at EPFL. His Ph.D. dissertation [243] and this dissertation do not contain overlapping material.

[1] Hông-Ân Cao, Christian Beckel, and Thorsten Staake. "Are Domestic Load Profiles Stable Over Time? An Attempt to Identify Target Households for Demand Side Management Campaigns." In: Proceedings of the 39th Annual Conference of the IEEE Industrial Electronics Society (IECON '13). Vienna, Austria: IEEE, Nov. 2013, 4733–4738. doi: 10.1109/IECON.2013.6699900.

[2] Hông-Ân Cao, Felix Rauchenstein, Tri Kurniawan, Karl Aberer, and Nuno Nunes. "Leveraging User Expertise in Collaborative Systems for Annotating Energy Datasets." In: Proceedings of the 2016 Workshop on Smart Grids at the 2016 IEEE International Conference on Big Data (BigData '16). Washington, DC, USA: IEEE, Dec. 2016, 3087–3096. doi: 10.1109/BigData.2016.7840963.

[3] Hông-Ân Cao, Tri Kurniawan Wijaya, and Karl Aberer. "Estimating Human Interactions with Electrical Appliances for Activity-based Energy Savings Recommendations." In: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys '14). Memphis, TN, USA: ACM, Nov. 2014, 206–207. doi: 10.1145/2674061.2675037.

[4] Hông-Ân Cao, Tri Kurniawan Wijaya, and Karl Aberer. "Estimating Human Interactions With Electrical Appliances for Activity-based Energy Savings Recommendations." In: Proceedings of the 2016 IEEE International Conference on Big Data (BigData '16). Washington, DC, USA: IEEE, Dec. 2016, 1301–1308. doi: 10.1109/BigData.2016.7840734.

[5] Hông-Ân Cao, Tri Kurniawan Wijaya, Karl Aberer, and Nuno Nunes. "A Collaborative Framework for Annotating Energy Datasets." In: Proceedings of the 2015 Workshop for Sustainable Development at the 2015 IEEE International Conference on Big Data (BigData '15). Santa Clara, CA, USA: IEEE, Oct. 2015, 2716–2725. doi: 10.1109/BigData.2015.7364072.

[6] Hông-Ân Cao, Tri Kurniawan Wijaya, Karl Aberer, and Nuno Nunes. "Temporal Association Rules For Electrical Activity Detection in Residential Homes." In: Proceedings of the 2016 Workshop on Smart Grids at the 2016 IEEE International Conference on Big Data (BigData '16). Washington, DC, USA: IEEE, Dec. 2016, 3097–3106. doi: 10.1109/BigData.2016.7840964.

We have seen that computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty. — Donald E. Knuth [134]

ACKNOWLEDGMENTS

I would first like to thank my advisor, Prof. Friedemann Mattern, for allowing me to take on this journey and for having supported me by offering an environment where my intellectual curiosity could wander freely and where I had the occasion to take on many different challenges.

Throughout the years, I have had the immense chance of keeping strong relationships with the people I have met. To this day, the bonds I have made at EPFL have allowed me not only to build lifelong friendships during my studies, but also to foster scientific collaborations that have had a lasting impact on my career and life. I would first like to thank the NRC Lausanne team for having warmly welcomed me from the very early moments of its settling on the EPFL campus. I would like to thank Aurore, Matti, Niko and Olivier for having been mentor figures from the first moment I stepped into the premises. I cannot express enough gratitude for what I have learned, for being pushed to step out of my comfort zone, and for being encouraged to believe in myself every step of the way. I am especially grateful for Juha's great leadership; he was always ready to make time to guide us, despite juggling countless other responsibilities, and he deeply cared for each of us. My colleagues Debmalya, Diego, Emma, Gianpaolo, Imad, Jan, Julien, Markus, Matthias and Riikka created such an extraordinary atmosphere that lit up every day at work. The shutdown of NRC Lausanne was a considerable loss of wonderful people and exciting projects, and I am forever grateful to have had the privilege of being part of the team and of learning from so many inspiring people.

Then, I would like to thank Tri and Julien for their unyielding support and for having always left an open door for passionate discussions thanks to their great intellectual curiosity. They have shown me how research should be led, with enlightening exchanges and caring guidance. My time at the LSIR lab could not have been possible without Prof. Karl Aberer's kind approval and his making me feel part of the lab. I am particularly grateful for the fruitful exchanges I have had with other lab members such as Alexandra, Amit, Berker, Benjamin, Chantal, Hao, Hung, Natalia, Julia, and Tian. I would also like to thank friends I have met at EPFL or through EPFL for having lent an ear and for having provided support and cheering during this journey, in alphabetical order: Acacio, Alex, Alexandre (Duc), Bastian, Damien, Daniel, Dominic, Eric, Fabien, Giuseppe, Khoa, Julia, Jérôme, Matthias, Noémie, Omer, Philippe, Raphaël, Robert, Romina, Sébastien, Stéphanie, Steve. I would like to express my deepest gratitude towards Régis for his support and invaluable help throughout these years, as a friend and as a fellow researcher. A great thank you to the "Souper Romand" team, who brought a bit of our Romandie to Zurich, for the very fond memories of delicious dinners and endless laughter. They say you keep coming back to your alma mater, and going back to EPFL always feels like I have never really left. Sylviane, Néjia, Antonella, Cécilia, Martine and Patricia have always made me feel welcome, as they have been like substitute mothers throughout my studies and even after I graduated.

I would like to express my appreciation of all the wonderful researchers I have had the privilege to meet and to befriend: Nguyên, Shihan, Lucas, Manolis, Yu, Xiaoqi and Yin, with whom I have had enlightening discussions, who have visited me in Zurich or whom I had the chance to visit. Furthermore, I would like to express my extreme gratitude to Prof. Nuno Nunes and Prof. Pierluca Lanzi for believing in my work and providing guidance. Without Barbara's unshakable help I would have drowned in so many side tasks; I am extremely indebted to her for having always assisted me in the most thankless tasks with a firm, yet efficient and soft touch. I would also like to thank my lab colleagues and my student, Felix Rauchenstein. I could not name everyone who has shaped my life, but they have left an indelible mark and I shall never forget their great influence. I would like to warmly thank my longtime friends Mélina and Fanny, for having never changed and for having never left my side throughout these years; we have grown up together and will hopefully live through many more stories and challenges in the years to come.

Last but not least, this journey would never have been possible without my family's support and love. I owe my passion for science to my parents, for the sacrifices they have made to offer me an environment that stimulated discussions and curiosity from an early age, and for teaching me endurance, patience and humility. I want to thank my brother for sticking with me through these times and for lightening the mood with his sharp but hilarious remarks. I could not have made this journey without my rock, my friend, my love, Christoph.

CONTENTS

Abstract
Related Publications
Acknowledgments
List of Figures
List of Tables
Acronyms
1 introduction
   1.1 Towards Sustainable Development
   1.2 From Smart Meters to Smart Grids
   1.3 DRM and the Smart Home
   1.4 Thesis Outline and Contributions
2 state of the art
   2.1 Demand Analytics
      2.1.1 Psychological Factors
      2.1.2 Leveraging the Smart Meters' Opportunities
   2.2 Towards Smart Homes
      2.2.1 Designing Feedback and Moving Towards Automation
      2.2.2 Context Awareness
      2.2.3 Leveraging Human Activity Information for Energy Conservation
3 customer segmentation
   3.1 Identifying The "Right" Customers
   3.2 Related Work
   3.3 Dataset and Data Preparation
      3.3.1 Data Pre-processing
      3.3.2 Clustering
      3.3.3 Similarity Ranking
   3.4 Experimental Results
      3.4.1 Seasonal Subsetting and Data Curating
      3.4.2 Evaluation of the Clustering
      3.4.3 Comparison with Similarity Distance Ranking Classification
   3.5 Conclusion and Discussion
4 crowdsourcing energy data labeling
   4.1 CAFED: A Collaborative Framework for Energy Datasets
      4.1.1 Challenges for Collecting Large Labeled Datasets
      4.1.2 Framework
      4.1.3 User Engagement and Motivation
      4.1.4 Results
      4.1.5 Giving Back to the Community
      4.1.6 Conclusion and Discussion
   4.2 Crowdsourcing Through User Expertise Quantification
      4.2.1 Crowdsourcing Manual Labels Through the General Public
      4.2.2 Related Work
      4.2.3 User Studies
      4.2.4 Methodology
      4.2.5 Results
      4.2.6 Conclusion and Discussion
5 inferring activities
   5.1 Activity Detection
   5.2 An Alternative to Eco-feedback Systems
   5.3 GMMthresh: Agnostic Appliance Activity Thresholding Determination
      5.3.1 Related Work
      5.3.2 Methodology
      5.3.3 Experimental Results
      5.3.4 Conclusion and Discussion
   5.4 Temporal Association Rules for Activity Detection
      5.4.1 Related Work
      5.4.2 Methodology
      5.4.3 Empirical Evaluations
      5.4.4 Conclusion and Discussion
6 conclusion
   6.1 Future Directions
      6.1.1 Customer Segmentation
      6.1.2 A Pledge for More Datasets
a cafed appendix
   a.1 Entity-relationship Model
   a.2 CAFED Survey
   a.3 AMT Users' Feedback
bibliography

LIST OF FIGURES

Figure 1: Global distribution of the ecological footprint in 2012 [176]
Figure 2: Hourly network load for the electricity grid in Switzerland in 2015 [222]
Figure 3: Electricity production in Switzerland since 1950 [222]
Figure 4: Electricity consumption per sector in Switzerland since 1987 [222]
Figure 5: DRM measures for energy efficiency targets [212]
Figure 6: DRM customer-side flow control view in an Energy Management System (EMS) [212]
Figure 7: Jewel Box house, 2015 CEDIA Award recipient [224]
Figure 8: Expected energy savings according to the technology used for feedback on energy consumption [16]
Figure 9: Daily curves for one household with multiple consecutive 0 kWh readings. We notice that the issue is not related to one single day of the week, but can occur on any day of the week.
Figure 10: Histogram of the maximum length of consecutive 0 kWh readings per day
Figure 11: Histogram of the distribution of the average weekly consumption using a log scale for 0.025 kWh bins
Figure 12: 15 clusters (i.e., 14 + 1, obtained through the first-phase flat-curve separation), K-Means, correlation, filter window = 3, on the training summer dataset. All characteristic load curves differ in the position of their peak.
Figure 13: 14 clusters (SOM + K-Means), Euclidean, filter window = 5, on the training summer dataset. Clusters 1 and 2 are not distinguishable as their peaks are located at the same positions.
Figure 14: Comparison with the clusters built from the training set in Figure 12. The dashed curves represent the new cluster profiles on the summer test set, using the cosine distance as the similarity measure.
Figure 15: CAFED architecture, based on a web server architecture with a database handling 3 key components: security (authentication), curve dispatching and annotation
Figure 16: Annotation view. We highlight in red the curve selection module and in orange the annotation workbench. The personal performance component is highlighted in blue, while the competitive components are in purple. The badge section shown in green highlights the badges acquired by the user.
Figure 17: Annotation workbench in the case of a single appliance and circuit-level data
Figure 18: Tasks' difficulty levels and coaching on CAFED
Figure 19: Online scoring for the experiment with physical access to the participants (top) and on AMT (bottom), with varying αr's, with and without difficulty weighting (easy, medium, difficult)
Figure 20: Leave-one-out cross-validation prediction for the experiment with physical access (easy, medium, difficult)
Figure 21: AMT leave-one-out cross-validation prediction (easy, medium, difficult)
Figure 22: Histogram (in log scale) of the monthly power distribution for dishwasher1, where low power measurements are more represented
Figure 23: Outcome of the GMM for livingroom1 (circuit-level data) and dishwasher1 (single appliance). In (b) and (d), power below the threshold is considered to be in the idle state, and in the active state otherwise.
Figure 24: Scores (F1 score and Hamming score sH) overview per appliance
Figure 25: Scores (F1 score and Hamming score sH) overview per household
Figure 26: Scores (F1 score and Hamming score sH) overview for household 6910 from January to April comparing all three thresholding methods (average over all appliances)
Figure 27: Thresholds per appliance for all households and details for household 6910
Figure 28: Pipeline for deriving the temporal association rules
Figure 29: Time relationships: contain, follow and overlap
Figure 30: Heatmap and Gaussian Kernel Density Estimation (KDE) for an association rule
Figure 31: Full and diagonal covariance matrices and corresponding data spread
Figure 32: Ellipse rotation and bounding box
Figure 33: Impact of the selection of the concentration factor α for the Dirichlet Process Gaussian Mixture Model (DPGMM). In Figure 33a, Figure 33b, Figure 33c, and Figure 33d, α takes the values 0.01, 0.1, 1, and 10 respectively.
Figure 34: Bivariate histogram and tolerance regions for household 624, where the DPGMM overgeneralizes. In Figure 34c, Figure 34d, Figure 34e, Figure 34f, Figure 34g, and Figure 34h, the ⋆ locates the center of the ellipse (the Gaussian's mean µ). The dashed, dash-dotted and dotted lines represent population coverage percentages at 1, 2 and 3 standard deviations σi from the means µi respectively. The colored tolerance regions show the ellipses and their rectangular bounding boxes.
Figure 35: Bivariate histogram and tolerance regions for household 2974, where the DPGMM and the Variational Bayesian Gaussian Mixture Model (VBGMM) overgeneralize and fail to capture smaller clusters. In Figure 35c, Figure 35d, Figure 35e, Figure 35f, Figure 35g and Figure 35h, the ⋆ locates the center of the ellipse (the Gaussian's mean µ). The dashed, dash-dotted and dotted lines represent population coverage percentages at 1, 2 and 3 standard deviations σi from the means µi respectively. The colored tolerance regions show the ellipses and their rectangular bounding boxes.
Figure 36: Kitchen and dishwasher rule, support: 0.548
Figure 37: Entity-relationship model for the PostgreSQL database: in purple the tables containing the original Pecan Street data, in green the tables for the annotations, and in orange the tables for the users' management

LIST OF TABLES

Table 1: Combination of the evaluated clustering algorithms and distance measures
Table 2: Winter and summer training and test sets. The table also contains the total number of daily load curves and the corresponding removed curves.
Table 3: 20 top-scoring configurations of parameters for the clustering
Table 4: Summary of the collected manual labels
Table 5: AMT classification results (precision, recall and F1-score) per class (Expert / User) using Adaboost and weak classifiers (LibLinear, Naive Bayes, Random Tree, Multi-layer Perceptrons), before coaching (BC), after coaching (AC), and with and without coaching (All)
Table 6: Selected appliances and their categories
Table 7: Appliances per household
Table 8: Gaussian Mixture Model (GMM) parametrization: selected configuration, 15 GMM components, no binning (higher is better for the precision, the recall and the F1 score; lower is better for sH)
Table 9: Households data details, with the number of appliances per month for each household ID
Table 10: Parametrization for the temporal sequential association rules and results

ACRONYMS

AI Artificial Intelligence

AMI Advanced Metering Infrastructure

AMT Amazon Mechanical Turk

ANOVA Analysis of Variance

ART Adaptive Resonance Theory


BIC Bayesian Information Criterion

BLUED Building-level Fully-labeled Dataset for Electricity Disaggregation

CAFED Collaborative Annotation Framework for Energy Datasets

CDF Conditional Random Field

CER (Irish) Commission for Energy Regulation

DPGMM Dirichlet Process Gaussian Mixture Model

DRM Demand Response Management

DSM Demand-side Management

DST Daylight Saving Time

ECO Electricity Consumption and Occupancy Dataset

EM Expectation-Maximization

EMS Energy Management System

FCM Fuzzy C-Means

GHG Greenhouse Gases

GMM Gaussian Mixture Model

HIT Human Intelligence Task

HMM Hidden Markov Model

iAWE Ambient Water and Electricity Sensing Dataset

IEA International Energy Agency

IoT Internet of Things

KDE Kernel Density Estimation

kHz kilohertz

kWh kilowatt-hour

KSC K-Spectral Centroids

LSIR Distributed Information Systems Laboratory

NILM Non-intrusive Load Monitoring

OPEC Organization of the Petroleum Exporting Countries

PCA Principal Component Analysis

REDD Reference Energy Disaggregation Dataset

SOM Self-organizing Maps

UK-DALE UK Domestic Appliance-level Electricity Dataset

VBGMM Variational Bayesian Gaussian Mixture Model

WCED World Commission on Environment and Development

1 INTRODUCTION

1.1 towards sustainable development

Energy efficiency is a ubiquitous concern and not only the prerogative of some regions on earth. At the global scale, different reasons and realities shape countries' daily operations and lead them to revise their relationship towards energy production and consumption. Random blackouts and contingency plans for the scheduling of energy are to date still the reality of emerging and newly-developed countries trying to alleviate energy deficits. Beyond grid reliability and quality of service, climate change due to the overuse of the earth's resources [160] and the ecological footprint of industrialized countries, as can be seen in Figure 1, are impacting all of humanity. NASA's Goddard Institute for Space Studies showed that the global average temperature has risen by approximately 0.8 °C since 1880 [41]. The consequences of global warming could be particularly felt in the winter of 2015, when New York witnessed a "tropical" Christmas as temperatures neared 18 °C, before reaching 22 °C on the following day. Further, record-breaking warm weather for that time of the year had also been reported across the eastern half of the United States [80, 172, 204].

Figure 1: Global distribution of the ecological footprint in 2012 [176]

Concrete steps towards pursuing sustainable development started in the 1980s with the creation of the Brundtland Commission, in the form of the World Commission on Environment and Development (WCED), to promote economic growth, the protection of the environment from human-caused damage, and social equality globally [34].


Figure 2: Hourly network load for the electricity grid in Switzerland in 2015 [222]

Since the publication of the Brundtland report in 1987, several world congresses have taken place, with milestones such as the Kyoto Protocol in 1997, which was adopted by 192 countries with the commitment to reduce Greenhouse Gas (GHG) emissions [173]. Reconsidering the access to energy sources in industrialized countries as well, to allow equal access to energy sources for all and to thwart environmental degradation, is still a priority on the United Nations' agenda [231], and further measures were ratified by over 194 signatories in the 2015 Paris agreement [230].

The global penetration of renewable energy sources replacing fossil fuel and nuclear power plants means adjusting to stochastic energy production. In this scenario, the yield will depend on very different weather and landscape conditions. This will represent a challenge for countries that already have continuous access to energy sources, as this change will impact the generation of electricity, heating and cooling policies, but also transportation paradigms [211]. Tracking the wastage of energy and improving the scheduling of the processes that consume energy would help to match the demand and the supply. This is particularly crucial for electrical energy during peak times and critical seasons, where the production depends on weather conditions, as can be seen in Figure 2, and so meeting the high demand often incurs the ramping up of additional but often unclean power plants. This strategy thus over-dimensions the electrical grid, and those power plants mostly remain idle during off-peak time. As climate change is reaching an alarming state, ambitious governmental programs have targeted a reduction of energy consumption and CO2 emissions and an increase of the share of renewable energy sources by different deadlines (such as 2020 for Europe [56] and the U.S. [233], or phasing out nuclear power by 2050 for Switzerland [57]).

Figure 3: Electricity production in Switzerland since 1950 [222]

Figure 4: Electricity consumption per sector in Switzerland since 1987 [222]

Some countries are advantaged by their geographical setting, allowing them to integrate large shares of clean energy sources such as hydroelectric, wind or solar power. But even in the most favored countries, such as Switzerland, the demand for electricity has been rising consistently over the past decades, as can be seen in Figure 3 and Figure 4, while production capacities for traditional sources such as hydropower can only be increased to a limited extent [222]. Given the current scenario, the discontinuation of nuclear power usage will impose new challenges. Concrete policies have yet to be adopted and to rally social acceptance [247] before they can be implemented in practice, as attempts at speeding up the adoption of changes in the form of energy subsidies are incurring unsuspected and controversial social, environmental and fiscal impacts [52].

1.2 from smart meters to smart grids

While energy efficiency initiatives were primarily motivated by the oil contingency measures linked to the 1973 OPEC oil embargo [2], ever since the 1997 Kyoto Protocol the International Energy Agency (IEA) has reported varying rates of success and advancement in the implementation of global energy efficiency measures [181]. In practice, government-issued policies are often linked to regulatory measures that force manufacturers to comply with new standards, such as appliance or building certification in the U.S. with the Energy Star label, or low CO2 emission standards for vehicles in the EU [113]. The European Commission planned to have 80% of households equipped with smart meters by 2020 [82]. New opportunities have arisen with the deployment of smart meters for obtaining finer-grained and real-time snapshots of a household's electrical energy consumption pattern, instead of the yearly, biannual or quarterly reading of the aggregate usage.

The information collected from the smart meters is a first step towards smart grids, which advance towards provisioning electricity in real time instead of negotiating energy prices over longer horizons on the spot market. Demand-side Management (DSM) is associated with programs that focus on changes that consumers should apply to modify their demand and become more energy efficient, so as to curtail the investments needed to meet peak demand. It can be achieved through financial incentives or educational campaigns that modify the consumer's behavior over long-term horizons. Applicable measures include building renovations or upgrades of home appliances to more energy-efficient models. A special case of DSM is Demand Response Management (DRM), which refers to a set of mechanisms for managing electrical energy consumption in response to supply-side signals; it gives the demand side a greater incentive to be responsive to changing supply conditions and has been successfully applied to industrial consumers. Such systems would and should be able to offer time-variable solutions, as can be seen in Figure 5, to handle peak consumption via real-time pricing, which will become critical as more countries integrate renewable sources, since wind or solar production depends on weather conditions [179].

Figure 5: DRM measures for energy efficiency targets [212]

While hydroelectric power, and in particular pumped storage plants such as those in Switzerland [221] and Norway, allows grid operators to absorb excess energy supply by pumping water back up during low-demand periods of the day and storing it as a macro-level battery, such plants are only suited to some very specific geographic areas that guarantee a satisfactory yield, and most of the potential capacity has already been exploited in industrialized countries. Small and medium hydropower plants still have an under-developed potential in emerging or developing countries [110], as long as the benefits outweigh the negative social and environmental impacts [44]. Without pumped storage, other means of storing the excess energy to compensate for periods of poor conditions need to be made available.

In this configuration, microgrids are becoming realizable as the necessary technological components emerge: the roll-out of smart meters, the decreasing cost of solar panels, and soon the deployment of individual storage solutions such as the Tesla battery, more affordable electric vehicles, and a growing number of testbeds. Elon Musk announced that Tesla will begin selling battery packs to residential consumers, small businesses, and utility companies [43]. This appears to be an answer to many of the challenges facing an energy sector suffering from the absence of large-scale storage capacity to accompany renewable energy generation [153]. Although useful, Tesla's batteries are not yet economically competitive (being currently priced at USD 5500 for a 14 kWh capacity [225]), and their safety and durability remain to be proven [166]. Other technologies, such as compressed air storage, are untested for large-scale use. While large-scale storage solutions still require some time before they are ready, climate change demands immediate action.

1.3 drm and the smart home

While Demand-side Management (DSM) programs targeting the residential sector were primarily developed in the 1970s, they have met with limited success due to strategies based on monetary incentives [2]. However, the convergence of technologies to facilitate the creation of distributed generation, storage and management of energy sources involving residential customers represents a promising sector for developing and applying energy efficiency measures and is currently being considered by countries such as Germany [2]. In parallel, the development and the deployment of the Internet of Things (IoT) [90], with a forecast of 20.8 billion connected devices by 2020, can expand its realm of application to homes [116] and render the implementation and deployment of DRM strategies at the household level possible.

Figure 6: DRM customer-side flow control view in an Energy Management System (EMS) [212]

The potential for applying DRM strategies in the smart home lies in allowing the automation and control of the diverse components within buildings, as shown in Figure 6, by having a two-way communication pathway through the metering infrastructure between the customers and the energy providers. Wired and wireless protocols (both proprietary and open) have been developed, and some even standardized by the industry, for controlling components as diverse as lights, shades or sound systems within the home [102]. At the moment, they are still the privilege of luxury domotics projects such as the Jewel Box house, as can be seen in Figure 7 [17], and mostly target improving the residents' comfort by automating some control tasks [30]. The perspective is nevertheless that smart homes and smart cities are set in the digitalization of our environment through the increasing connectivity of the components that surround us. Mining the data acquired from these sensors provides the ability to learn and predict information to enable services that alleviate and improve the functioning of most tasks (at the level of a home, a city or a larger system) by optimizing processes and scheduling with regard to the system's exogenous constraints, such as the grid's status and the users' own objectives and preferences.

Figure 7: Jewel Box house, 2015 CEDIA Award recipient [224]

Disaggregated electrical energy footprints of single appliances could soon be reported by smart appliances, and combining them with contextual data can help build predictive and adaptive models to learn energy-hungry behaviors and their scheduling [16]. The potential for conserving energy lies in benefiting from the pervasive integration of smart components in homes; as they are democratized, their costs will sink and contribute to a higher penetration in households. Beyond being used to enable control for the comfort of the user, they provide additional insights into the context of household residents and will enable us to prepare strategies to render homes more energy efficient by taking into account real-time information from the electrical grid. A pervasive implementation of DRM might be a solution worth considering. To exploit its full potential, however, DRM must be implemented pervasively, by including residential customers as well in the design of the system [25, 133, 243].

The key contributor to this system should be each household resident, and what is targeted is a change of behavior. Attempts at conserving energy by sensitizing household residents to their electrical energy consumption, considering them as "homo economicus" entities and rewarding them for opting for improved energy efficiency, were tested as early as the 1980s [63, 118, 126, 127, 209]. Those studies showed that some first-hand obstacles were the jargon used to describe the energy consumption, where unfamiliarity with the units used to represent it undermined the capacity for self-adjusting to the wastage, as the amount of energy used by distinct electrical appliances was either under- or over-estimated and the monetary compensation was not decisive enough to adopt changes. Unless customizable solutions that can be adapted to the customer's lifestyle are proposed, opposition and failure would be the likely outcome. The action that can be taken to improve a resident's electrical energy footprint is not limited to pinpointing appliances' consumption, but should lead to awareness and to actions that lower the consumption in practice. These landmarks could be more easily identified and memorable if they can be related to the different activities that pave the residents' daily lives, such as cooking, housekeeping or entertainment.

The lesson learned from previous attempts was that a model was imposed on the end-users, instead of trying to understand their relationship with electricity and their reasons for using it. Addressing behaviors requires studying behaviors. The potential thus lies in improving the chances that a system is socially acceptable, by trying to understand people's lifestyles and to which extent they could be improved, while at the same time automating the least invasive tasks and rendering them more energy efficient. The necessary technologies are under development or maturing, and they offer new perspectives for achieving these goals. Convincing the end-users to adopt a new relationship to their electrical energy consumption will require understanding their needs and compromising enough such that energy efficiency targets can be met without disrupting their everyday life too much. A first step towards this is to speak their language, and this can be achieved by identifying and studying the activities that require electrical energy usage, conveying this information to the household residents, and developing energy conservation strategies based on them. This also means that the stakeholders (industry, researchers, governments, utility companies) should share a common vision of how these soon-to-be-ripe fruits can be combined to shape tomorrow's strategies for sustainability, strategies that are convincing enough that the end-users opt to adopt them.

1.4 thesis outline and contributions

The goal and contributions of this thesis lie in the usage of data analytics to identify and study electricity consumption patterns within households, towards achieving energy efficiency in the residential sector. We leverage existing datasets and develop data-driven methods to learn the scheduling of human activities that involve the usage of consumer electronics and electrical appliances. This thesis proceeds by exploring the residential electrical energy consumption patterns from the macro level down to inferring activities, as follows.

Chapter 3 We study how to segment residential customers' electrical energy consumption loads based on their peak consumption, as a way for utility companies to discover which consumption profiles exist in their customer base and then devise strategies to match or to alleviate the demand during peak hours. We define a two-phased clustering scheme compatible with existing popular clustering algorithms and show which distance measure to use to be discriminative with respect to the peak location in a load consumption time series.
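As an illustration of this two-phased approach, the sketch below clusters synthetic daily load curves with K-Means after separating flat curves and z-normalizing the rest, so that the Euclidean distance used by K-Means behaves like a correlation-based measure that is sensitive to the peak position. The function name, the flat-curve threshold and the filter window are illustrative assumptions, not the parametrization used in Chapter 3.

```python
# Hypothetical sketch of a two-phase, peak-oriented clustering of daily load
# curves, in the spirit of Chapter 3 (not the thesis' exact implementation).
import numpy as np
from sklearn.cluster import KMeans

def cluster_daily_curves(curves, n_clusters=14, flat_std=0.05, window=3):
    """curves: array of shape (n_days, 48) of half-hourly kWh readings."""
    curves = np.asarray(curves, dtype=float)

    # Phase 1: separate nearly flat curves, which carry no peak information.
    flat_mask = curves.std(axis=1) < flat_std
    peaked = curves[~flat_mask]

    # Smooth each curve with a small moving-average filter window.
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 1, peaked)

    # Phase 2: z-normalize so that Euclidean K-Means behaves like a
    # correlation-based distance, sensitive to the peak position rather
    # than to the overall consumption level.
    z = (smoothed - smoothed.mean(axis=1, keepdims=True)) / \
        (smoothed.std(axis=1, keepdims=True) + 1e-9)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(z)
    return flat_mask, labels

# Example with synthetic data: 100 days of 48 half-hourly readings.
rng = np.random.default_rng(0)
demo = rng.gamma(2.0, 0.2, size=(100, 48))
flat_mask, labels = cluster_daily_curves(demo)
print(flat_mask.sum(), "flat curves,", len(set(labels)), "clusters")
```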

Chapter 4 To fully evaluate learning algorithms, we were missing the ground truth labels necessary to quantify their goodness. To leverage existing datasets, we took advantage of an unexploited source: the wisdom of the research community. Crowdsourcing has contributed to considerable advances in the development of algorithms in computer vision and information retrieval systems by allowing the creation and the consolidation of large datasets of annotated data. This had not yet been used in the energy research domain, where researchers instead have to wait for other research groups to release datasets that are often imperfect along diverse dimensions (missing event-based labels, being of too coarse a granularity, having been collected over a short period, not containing enough appliances, etc.). We developed a Collaborative Annotation Framework for Energy Datasets (CAFED) for retrofitting labels onto existing datasets by having researchers who have gathered considerable experience with power data, being familiar with appliance signatures through Non-intrusive Load Monitoring (NILM) research, annotate power curves. We focused primarily on active versus idle annotations, but our system could easily be extended to allow other labeling on top of time series data. We provide a thorough evaluation of the expertise of users to provide crowdsourced annotations for energy time series.
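As a simplified illustration of how a contributor's annotations can be scored against an expert reference, the sketch below computes a difficulty-weighted F1 score over active/idle labels. The weights and the function name are hypothetical; CAFED's actual scoring and expertise model are described in Chapter 4.

```python
# Hypothetical scoring of a crowd worker's active/idle labels against an
# expert reference, weighted by task difficulty (illustrative only; not
# CAFED's actual scoring formula).
from sklearn.metrics import f1_score

DIFFICULTY_WEIGHT = {"easy": 1.0, "medium": 2.0, "difficult": 3.0}  # assumed weights

def score_worker(tasks):
    """tasks: list of dicts with keys 'difficulty', 'expert' and 'worker',
    where 'expert' and 'worker' are per-sample 0/1 (idle/active) label lists."""
    weighted_sum, total_weight = 0.0, 0.0
    for task in tasks:
        w = DIFFICULTY_WEIGHT[task["difficulty"]]
        weighted_sum += w * f1_score(task["expert"], task["worker"], zero_division=0)
        total_weight += w
    return weighted_sum / total_weight if total_weight else 0.0

# Example: one easy and one difficult annotation task.
tasks = [
    {"difficulty": "easy", "expert": [0, 0, 1, 1, 1, 0], "worker": [0, 0, 1, 1, 0, 0]},
    {"difficulty": "difficult", "expert": [0, 1, 1, 0], "worker": [0, 1, 0, 0]},
]
print(round(score_worker(tasks), 3))
```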

Chapter 5 To achieve energy efficiency in households, we needed to understand what triggers the electrical energy consumption. Therefore, we focused on learning the residents' behaviors that involve the consumption of electricity, by identifying activities that require the usage of appliances and other electronic devices. While considerable research efforts have been directed at analyzing aggregated loads from smart meters or at developing algorithms for disaggregating loads to extract the consumption of single appliances, less focus has been put on assessing the potential of using disaggregated data. This was primarily due to the fact that such datasets were not widely available, owing to the difficulty and the costs of instrumenting households to acquire consumption data from appliances. We propose GMMthresh to automatically determine the threshold differentiating the active and idle states of an appliance, based solely on the statistical properties of its load consumption. We validate our method by using the manually labeled data acquired through the expert-crowdsourced annotations presented in Chapter 4. Then, we propose a pipeline for mining temporal association rules to learn the schedule of human activities involving the usage of electrical appliances.
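The following minimal sketch conveys the intuition behind GMM-based thresholding: fit a mixture to the (log-transformed) power readings of one appliance and place the threshold at the decision boundary between the low-power and high-power components. The two-component mixture and the log transform are simplifying assumptions for illustration; the actual GMMthresh parametrization is presented in Chapter 5.

```python
# Minimal sketch of GMM-based active/idle thresholding for one appliance.
# Two components and a decision-boundary threshold are simplifying
# assumptions, not the GMMthresh configuration described in Chapter 5.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_threshold(power_watts):
    """Estimate a power threshold separating idle/standby from active use."""
    p = np.clip(np.asarray(power_watts, dtype=float), 0.0, None)
    x = np.log1p(p).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    lo, hi = np.sort(gmm.means_.ravel())
    # Scan between the two component means for the point where the most
    # likely component switches, and map that point back to Watts.
    grid = np.linspace(lo, hi, 1000).reshape(-1, 1)
    labels = gmm.predict(grid)
    boundary = grid[np.argmax(labels != labels[0]), 0]
    return float(np.expm1(boundary))

# Example: synthetic readings mixing standby (~2 W) and active (~1200 W) use.
rng = np.random.default_rng(1)
readings = np.concatenate([rng.normal(2, 0.5, 5000), rng.normal(1200, 100, 500)])
threshold = gmm_threshold(readings)
print(f"threshold ~ {threshold:.1f} W")
is_active = readings > threshold
```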

Chapter 4 and Chapter 5 contain material based on the author’s work during her exchange at the Distributed Information Systems Laboratory (LSIR) at EPFL.

2 STATE OF THE ART

This chapter reviews the state of the art on data analytics in the energy research domain as part of DSM. It covers demand analytics using smart meter data and dives into energy efficiency measures at the household level by leveraging additional finer-grained datasets and enabling future smart home scenarios.

2.1 demand analytics

2.1.1 Psychological Factors

Psychologists have attempted to examine people's response to changes aimed at improving their household energy consumption, for it to become more sustainable. De Young [69] suggested that the strategies to render conservation behaviors durable are limited to persuasive communication (in the form of providing information to increase awareness and help individuals identify behaviors to change), push measures (coercive motivational incentives introduced through regulations and penalties) and pull measures (positive motivational means such as monetary incentives or other types of advantages). Gatersleben [91] showed in a survey how the adoption of those policy strategies for energy efficiency would be received by people. The study focused on the impact of implementing conservation measures on the quality of life. The respondents most feared negative effects on their comfort level, on the freedom and control in their lifestyle, and on their ability to experience enjoyable things in their daily lives. The measures they found most effective to implement were the ones that demanded the least effort and were less likely to impact their quality of life, such as receiving subsidies for replacing electrical equipment. The survey also revealed that the participants were unaware of the current state of their country's environmental policies. Additionally, Abrahamse and Steg [1] conducted a study suggesting that while energy consumption is linked to socio-demographic attributes of a household (such as the disposable income), changes of behavior that can lead to energy conservation can be attributed to psychological factors. This is related to the findings of Kahneman [121] that people's decision making cannot be framed under the rational-agent model but is simplified by substituting heuristics, such as the representativeness, the availability and the anchoring of the information, for complex reasoning, which can introduce misjudgment and systematic mistakes. Sutterlin [219] recognized the necessity to segment energy consumers based on their conservation behaviors and demonstrated the general effect of the social value orientation on the predisposition to choose either curtailment or energy efficiency measures, and thus the necessity to tailor conservation strategies to the correct audience to maximize the chances that they are adopted by the consumers. Pedersen [182] conducted surveys to gather about 60 attitudinal and behavioral factors, which they analyzed using Principal Component Analysis (PCA) and K-Means clustering instead of the traditional Analysis of Variance (ANOVA) or hypothesis testing used in the psychological literature, and identified the dimensions that represent the segments of the residential customers of BC Hydro.

2.1.2 Leveraging the Smart Meters’ Opportunities

As utility companies worldwide have rolled out digital electricity meters in households, also known as smart meters, researchers have investigated what knowledge could be extracted from electrical consumption data.

2.1.2.1 Customer Segmentation

As smart meters are rolled out throughout the world, high-resolution temporal electricity consumption data replace traditional spot metering, where utility companies have to send employees to collect the aggregate electricity usage of each dwelling periodically (yearly, biannually or monthly) on analog meters to bill their customers accordingly. Services can be built by leveraging the time series data recorded by the smart meters to improve the energy management on the utility side or to provide customized services to the customers, which will enable an efficient implementation of DRM programs. Albert and Rajagopal [9], Beckel, Sadamori, and Santini [28], and Wijaya, Aberer, and Seetharam [244] attempted to extract socio-economic characteristics of households from smart meter data. These features are difficult to obtain unless expensive and often time-consuming surveys are mailed out or interviewers are sent to meet residential customers, and these previous approaches did not guarantee a high rate of responsiveness. Beckel [25] and Beckel, Sadamori, and Santini [28] analyzed a dataset of coarse-grained electricity consumption data (e.g., 1 sample every 30 minutes) to infer characteristics such as the employment status, the floor area or the number of occupants. In a similar way, Albert and Rajagopal [9] proposed a methodology to automatically infer demographic and appliance stock characteristics of homes. As the energy market becomes liberalized in most countries, these techniques allow utility companies to address customer churn and to identify households that may be targeted and may opt for differentiated tariffs and saving recommendations.

These efforts can be related to segmenting customers to derive target groups for energy conservation campaigns based on similar load profiles [7, 10, 35, 143, 244] or similar thermal profiles [8, 11]. Previous work, before the usage of Advanced Metering Infrastructure (AMI), such as by Almeida et al. [12], relied on survey data and socio-economic details about customers to segment and profile them. Kwac, Flora, and Rajagopal [139, 140] and Kwac et al. [143] proposed a customer segmentation that uses the total daily consumption to identify high-consumption users with large variance, and clusters normalized daily curves with a K-Means variant that controls the cluster size, followed by refinement through hierarchical clustering. Albert, Gebru, and Ku [7] used K-Spectral Centroids (KSC) and K-Means to cluster daily consumption curves. They evaluated the variability of the households' cluster membership through their entropy and additionally used logistic regression to determine whether some socio-economic characteristics could predict the propensity to change clusters. Cao, Beckel, and Staake [35] identify cluster profiles based on the time of day at which the peak consumption is located to differentiate customer segments, and provide an analysis of which discriminative score is compatible with classical clustering algorithms such as K-Means, Self-organizing Maps (SOM) or hierarchical clustering, to guarantee cluster consistency. Haben, Singleton, and Grindrod [100] used a finite Gaussian Mixture Model with the Bayesian Information Criterion (BIC) as the control factor for the number of clusters to group customers based on features computed from the consumption of residential customers. Wijaya [243] and Wijaya, Aberer, and Seetharam [244] formally defined and evaluated a framework for customer segmentation that can be parameterized to the needs of the user to prepare and select smart meter data on which a clustering algorithm is applied. They additionally provided an index extending existing cluster consistency indices and a discriminative index to identify the socio-demographic features that are most representative of each cluster. Kwac and Rajagopal [141, 142] proposed a linear model for the response of residential customers to demand response events based on temperature variation. They developed an algorithm based on the Stochastic Knapsack Problem to select the appropriate customers to enroll.
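As a generic illustration of selecting the number of segments with a Gaussian Mixture Model and the BIC, in the spirit of the approach of Haben, Singleton, and Grindrod [100], the sketch below fits mixtures with an increasing number of components to per-household features and keeps the model with the lowest BIC. The feature construction shown here is a placeholder, not the feature set used in that work.

```python
# Generic sketch of BIC-based selection of the number of customer segments;
# the synthetic features are placeholders for per-household consumption
# features, not those used in [100].
import numpy as np
from sklearn.mixture import GaussianMixture

def select_segments(features, max_k=10):
    """features: (n_households, n_features) matrix, e.g. mean daily peak,
    peak hour and weekday/weekend ratio per household."""
    X = np.asarray(features, dtype=float)
    models = [GaussianMixture(n_components=k, random_state=0).fit(X)
              for k in range(1, max_k + 1)]
    bics = [m.bic(X) for m in models]
    best = models[int(np.argmin(bics))]  # lowest BIC wins
    return best, best.predict(X)

# Example with synthetic household features (3 planted segments).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, size=(50, 3)) for c in (0.0, 2.0, 4.0)])
model, segments = select_segments(X)
print("selected", model.n_components, "segments")
```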

2.1.2.2 Load Disaggregation

While the previous approaches focused on relatively coarse aggregated consumption data, considerable work has also targeted the problem of disaggregating the electric load curve to determine the consumption of individual appliances. Referred to as Non-intrusive Load Monitoring (NILM), these approaches use finer-grained aggregated consumption data (e.g., one sample every second) to provide consumption at the device level, as established by Hart [103]. A good

overview of existing NILM approaches is given by Zoha et al. [253] and Zeifman and Roth [251]. NILM is related to activity recognition as it allows individual devices (e.g., the stove or the television) to be identified from the aggregated consumption data. However, Armel et al. [16] have shown that while NILM approaches can identify some appliances such as the refrigerator or washing machine, the majority of devices still cannot be reliably detected from 1 Hz or coarser data alone. While Gupta, Reynolds, and Patel [99] have demonstrated that most consumer electronic and fluorescent lighting devices can be detected with an accuracy exceeding 93%, their approach requires sensing the electricity consumption at multiple kilohertz.

2.1.2.3 Load Forecasting

Utility companies need to plan their operations by managing their assets to offer a reliable supply of energy. Load forecasting assists them in managing investments in their infrastructure, the scheduling of the production and the transmission of energy or the establishment of spot or derivative contracts for energy bidding. Different time horizons will influence the strategy and thereby the prices. Short-term forecasting focuses on predicting the demand a few minutes to a few days ahead, mid-term forecasting targets weeks to months ahead and long-term forecasting assists the decision making for the forthcoming years [241, 243]. Electricity price forecasting has been tackled by different techniques, which have evolved as the regulations surrounding the energy market changed towards market liberalization, from multi-agent models to statistical model fitting and machine learning techniques such as neural networks [241]. Similarly to electricity price forecasting, load forecasting techniques encompass statistical and artificial intelligence methods such as neural networks or regression models [105, 220, 248]. Fine-grained residential electricity consumption data from smart meters has allowed finer models to be built, forecasting the demand of single households [101, 246] or the load of groups of households, which is used by intermediary agents known as aggregators that serve as a gateway between residential users and energy providers [93].

2.1.2.4 Publicly Available Datasets

Recent research in the analysis of aggregated and disaggregated electricity consumption data has been made possible by the release of various public datasets. The (Irish) Commission for Energy Regulation (CER) dataset [54], for example, consists of the total energy consumption at a granularity of 30 minutes for over 4000 households. Disaggregated data at 1 Hz granularity is provided by the Reference Energy Disaggregation Dataset (REDD) [135], where data from six households were collected. The Smart* dataset [20] contains data from 3 households, with disaggregated and sensor data in the case of one of the households. The Building-level Fully-labeled Dataset for Electricity Disaggregation (BLUED) [14] provides data from a single-family house with 12 kHz sampling during a week. The Pecan Street dataset [171] incorporates 1-minute appliance-level and aggregate data from over 800 households in Texas and in California. The Ambient Water and Electricity Sensing Dataset (iAWE) [22] includes 73 days of appliance-level and water supply data from a house in India. The Electricity Consumption and Occupancy Dataset (ECO) [26] contains consumption data annotated with occupancy ground truth from six households. The UK Domestic Appliance-level Electricity Dataset (UK-DALE) [125] provides data at different granularities from 5 households in the UK. The CASAS project [60] involved 32 households where a smart home kit was used to transform the households into smart homes with sensors for activity detection. The total energy consumption per household was also measured. The activity labels, collected over one month for 18 single-resident apartments, were provided by the authors through manual labeling of the data. The experiment did not provide appliance-level electricity measurements.

2.2 towards smart homes

2.2.1 Designing Feedback and Moving Towards Automation

Aside from the monthly utility bill, few tangible traces are left by the usage of energy in households. Unlike a car that cannot function when its tank needs to be replenished, energy in households is supplied instantaneously and continuously through people's indoor wiring. As the usage of these services hardly ever suffers any disruption, household residents are oblivious to the energy consumed by the appliances they plug into electrical sockets. The utility company sends a bill containing the aggregated consumption in kilowatt-hours (kWh) or cubic meters over the last period, and this measure, rarely understood by consumers, is the only feedback that its customers receive. With a single data point, such indirect feedback does not help the customers to identify which appliances or activities are most energy-hungry. Actually, households often find it difficult to correctly disaggregate the bill over the possible usages and the users responsible for the consumption. Energy used by appliances which were built to simplify people's lives by reducing manual labor, such as the dishwasher or the washing machine, is often overestimated, while the energy spent on heating is usually grossly underestimated, as shown by Kempton and Montgomery [127], Schmidt-Küster [209], and Sutterlin [219]. For this reason, many digital metering initiatives have targeted the provisioning of direct, immediate consumption feedback.

In their study, Ehrhardt-Martinez, Donnelly, and Laitner [78] provide a concise overview of such approaches in the United States. However, many of the dedicated feedback monitors of the first generation, such as the "Energy Detective TED"1 or Google's now abandoned PowerMeter2, never gained much traction with ordinary households. Their interface was often complex and difficult to navigate, appealing only to tech-savvy users. Also, while these devices provided an instantaneous view of the current consumption, the conveyed information did not include a device-level breakdown [239]. The challenge with making use of AMI to reach energy efficiency in households is to derive useful information and services to engage the users, but also to decide how this information is formulated and displayed to the households' residents [66, 198]. Therefore, research aiming for a reduction in energy consumption moved towards notifying users about the biggest "energy guzzlers" in the household, as shown by Mattern, Staake, and Weiss [159]. For this purpose, Weiss et al. [239] introduced the eMeter iPhone application, which allowed customers to directly measure the electricity consumption of selected appliances. Jahn et al. [115] developed a prototype of a smart home where, based on energy prices entered by the system's user, a context-aware application could give feedback to the household residents about the energy usage and the cost of using different appliances, augmenting this information upon recognizing the object on a mobile phone to reduce the need to switch the user's attention. However, substantial savings using such a pull approach (i.e., the user is initiating the procedure to identify energy-efficient behavior) are only achieved if the users are continuously involved. Such long-term user participation, however, is difficult to achieve unless continued incentives such as significant monetary savings can be provided. Programmable thermostats, for example, often only save energy if the occupants are generally focused on those savings in the first place, as shown by Meier et al. [162] and Nevius and Pigg [175]. In order to lower the barrier of entry to users, authors such as Froehlich, Everitt, and Fogarty [88] and Froehlich, Findlater, and Landay [89] therefore looked at personalized, device-level consumption feedback. Similarly, Lu et al. [156] have considered the automation of high energy tasks such as heating. Armel et al. [16] quantified that real-time feedback could allow for a reduction of the energy consumption in households by up to 12%, as can be seen in Figure 8. By sensing activities, the aim is to be able to give direct, personalized and immediate feedback. In terms of energy conservation, it is important to distinguish between the savings that can be obtained by forgoing the consumption altogether and those (monetary) ones that can result from shifting the activity to a different time, as shown by Cottone et al. [64, 65].

1 www.theenergydetective.com
2 www.google.com/powermeter/about

Figure 8: Expected energy savings according to technology used for feedback on energy consumption [16]

Following the realization that the operation of traditional thermostats is often cumbersome and prone to errors that even increase the energy consumption [156], many authors have also identified the potential for automation in a heating scenario and proposed algorithms to predict human occupancy in order to predictively control a thermostat [81, 98, 156, 169, 203, 210]. Kleiminger [133] provides a quantitative performance evaluation of various approaches.

2.2.2 Context Awareness

Context awareness has become a field of research on its own, i.e., enabling computers to recognize situations using, for instance, information about the user's location, activity or social environment. The pervasiveness and ubiquity of sensors that can now intimately record additional personal information by being embedded in wearable devices delivers massive amounts of data, from which contextual information can be retrieved, and thereby supports the user in the selection, planning, and accomplishment of everyday actions [75]. As the environment changes over time, a context-aware system should adapt swiftly by sensing the user's context, modeling it and reconfiguring itself efficiently and reliably as an assistive technology. Since the emergence of this field in the beginning of the 1990s [207], many research projects have demonstrated context-aware applications. The MIThril inference engine by DeVaul et al. [71] was developed and utilized for building a context-aware cell phone that would, for example, automatically switch profiles when entering a movie theater or restaurant. While acquiring the location of a user is central to this research field, any information on the physical conditions of the current surroundings is valuable and the different sources of information can be processed and fused to enable context-aware applications to support

their users, as shown by Schmidt, Beigl, and Gellersen [208]. Recent studies by Dey and Newberger [72] and Lim and Dey [150] and Lim, Dey, and Avrahami [151] in this domain have focused on the interaction of context-aware systems and end-users, specifically focusing on the question of the application's intelligibility, i.e., whether or not a user will understand, trust, and accept the context-aware background assistance. Context-awareness is a building block for ambient intelligence, and uses artificial intelligence advances towards becoming an assistive technology that can reason and adapt to support people [59]. Smart homes are a special case where an ambient intelligent system can be deployed. Testbeds for smart homes were deployed at a relatively small scale due to the difficulty and cost in equipping living premises with the necessary sensing infrastructure, but notable projects include the Aware Home [129], which was predominantly built for finding lost objects and providing support for elderly people, the House_n, a living laboratory to create a data library for activity monitoring where each participant stayed 10 days [112], the Adaptive Home, which used an occupancy model and reinforcement learning for determining and controlling the heating, ventilation and lighting pattern [167, 168], or the MavHome, where the focus was on building a probabilistic mobility model inside a home [67]. Cook et al. [60] extended the MavHome into the CASAS project to study the potential of smart homes for elderly care and energy efficiency. Fensel et al. [85] developed the SESAME-S system as a prototype for future smart homes, combining sensors and actuators, and semantic modeling for managing the automation, the metering and the pricing. The system was however managed by predefined or user-selected policies. The system was not deployed in a real residential setting, but rather in a school and in a factory.

2.2.3 Leveraging Human Activity Information for Energy Conservation

Activity recognition is a long-established field of research, where wearable accelerometer sensors were used to determine daily activities [18, 120, 158], human trajectories, interactions with objects or social activities [3]. Roggen et al. [200] discuss challenges and requirements for opportunistic sensing of activities. Recent work by Wang et al. [237] leveraged RF signals for detecting human traces and activities in households, but does not yet link them to the consumption of electrical devices and only allows one user to be detected. Most approaches linked to activity detection neither targeted energy conservation, nor used the electricity consumption as an input variable for the recognition of activities. Thus, leveraging human activity information for energy conservation is most closely related to recent work on DSM. Some experimental studies for real-time scheduling of appliance usage to achieve peak shaving (i.e., to reduce consumption when it is costliest) have been carried out by Costanzo et al. [62] and Barker et al. [20, 21], but leave out important appliances the residents are interacting with. To estimate the potential savings for demand side policies, Wijaya et al. [245] presented DRSim, a simulator for DSM systems that is aware of the current status of the grid and the activities carried out inside a household. Richardson et al. [197] suggested a bottom-up approach and started from user-described activities to model and synthesize what the electrical load would look like in households. Kasteren et al. [122] acquired sensor data in a house, such as through motion detectors or wearable devices, and activity labels through a Bluetooth headset combined with speech recognition. They used a Hidden Markov Model (HMM) and a Conditional Random Field (CRF) to model the activity changes. Cook [58], Cook and Schmitter-Edgecombe [61], and Rashidi et al. [192] ran a controlled experiment where several students performed the same activity to take into account variability between subjects for performing the activities. The activities were more geared towards assessing activities that might be impaired as people age. The authors then used manually labeled activity data from sensors to develop an unsupervised technique for discovering activities based on an HMM for mining the sequences of events represented as symbols, which were then clustered using K-Means with a modified edit distance that takes into consideration re-ordering and events' duration and frequency. As an extension, Chen and Cook [47] and Chen, Das, and Cook [48] extracted features from sensor data and attempted to link the aggregate load of the household to the residents' activities, and found out that the results were biased due to the impossibility of removing large appliances from the overall load. However, the adaptability of an ambient system where the responsiveness of the residents would be taken into consideration so as to minimize the discomfort was not considered. Cottone et al. [64, 65] attempted to detect activities from a stream of events and labeled the activity with an HMM, similarly to Kasteren et al. [122], but used synthetic data with the objective of shaving the peaks by shifting a whole activity to a more convenient time of the day. Phillips et al.
[186] used wireless sensors (acoustic and light sensors) to detect human movements linked to events that can be attributed to activities and evaluated their system by comparing its accuracy with plug-level data. Ranjan, Griffiths, and Whitehouse [190] attempted to attribute fixture usage to household residents using an RFID-based tracking system. Rollins et al. [202] designed a push-system for recording user activities based on the identification of interactive loads by clustering the states of appliances, but their system failed to recognize appliances such as dryers. Inspired by the early results of Ong and Bergés [177], Rollins and Banerjee [201] also published work on association rules based

on appliance consumption patterns. Molina-Markham et al. [164] suggested that activities could be inferred from the electric load curve of a household. In their work, the authors described a setup in which the aggregated electricity consumption is sensed in three homes over two months. In addition, annotation was performed over "at least three days". Recently, Thomas and Cook [226] extended the CASAS project by integrating an algorithm for automatically switching on and off appliances based on the activity recognition, and predicted the next activity in a time frame of 10 minutes, training on a week of user-annotated activities.

3 customer segmentation

This chapter is based on work that appeared in the Proceedings of the 39th Annual Conference of the IEEE Industrial Electronics Society (IECON ’13)[35].

Elaborating Demand-side Management (DSM) strategies is crucial for integrating electricity from renewable sources into the electrical grid. Although future DSM strategies will largely depend on an automatic control of larger loads, it is also widely agreed upon that consumer behavior will play an important role as well, be it by purchasing respective automation techniques or by shifting the use of appliances to other times of the day. Doing so, it becomes possible to select households that offer sufficient load shifting potential, and to avoid undirected and thus expensive energy conservation campaigns, where every customer is contacted by their utility company by mail although there is no guarantee of a positive response. To our knowledge, this perspective is still under-researched, especially when it comes to clustering methods on load consumption data with a focus on peak detection accuracy to provide customer segmentation. In this chapter, we use the data collected in the Irish CER dataset, which contains readings for over 4000 residential customers over a period of 18 months at a 30-minute interval. Our contribution consists in showing that the clustering of the whole time series, with a few adaptations on the usage of the K-Means algorithm, provides better clustering results without sacrificing practical feasibility. Characteristic load profiles allow us to segment the customers, address groups of households with similar consumption patterns and determine the cluster membership of a given load curve on the fly. This will support decision making regarding the investments in load shifting campaigns to prevent over- or under-dimensioning linked to peak energy demand.

3.1 identifying the "right" customers

As the generation of electricity from renewable resources does not fully rely upon a previously defined and arbitrary schedule, but is the result of varying environmental parameters, the required flexibility to balance supply and demand will increasingly be achieved by managing the demand of the grid. This will require a more thorough appreciation of the network flow and usage, and contrasts with the


current set-up, where synthetic load profiles are commonly used to provision for energy, although they constitute an average profile for all households. More fine-grained information about the specific consumption patterns could allow for a better understanding of when and which customers are responsible for the peak-time energy consumption, which is costly for the energy providers. While extensive work has been carried out on producing an estimation of the load consumption, we are focusing on identifying the characteristic load profiles. The novelty resides in the fact that although clustering methods have been tested on smart metering data, they were mostly intended as an exploratory phase or as a proof of concept that the data can be segmented. For this reason, evaluation means for the obtained clusters still need to be characterized in order to be applicable for clearly defined use cases. We see three major advantages that relate to (i) providing detailed insights about household load curves in general, (ii) being able to identify "hurtful" households, which helps to focus the cost of mitigation on the relevant ones, and (iii) having a means that can assist in determining the customer value up to the point where tariffs may depend on the load curve, even in the household segment. A non-exhaustive list would include measures such as sending prompts, extended information on utility bills, behavioral cues (e.g., collecting bonus points for a desired change of load profile type), enabling energy consulting teams to preselect households that are given priority for automatic load shifting measures, or evaluating the effects of load shifting campaigns in a very focused way. The work is of special interest as it can be implemented without hardware investments beyond off-the-shelf deployment of smart-metering infrastructures, using well known but specifically adapted clustering techniques. It relies on a new approach to select the appropriate parameters and to establish characteristic cluster profiles as references to determine cluster membership on the fly. The remainder of this chapter is structured as follows. We review the related work in Section 3.2. Then, we present the dataset and the data preparation to build the clustering framework in Section 3.3 and discuss the experimental results in Section 3.4. We provide insights on possible applications and research tracks in Section 3.5.

3.2 related work

The access to high resolution smart meter data will allow the utility companies to plan infrastructure investments and thus the negotiation of spot market contracts more efficiently. Load forecasting can for example benefit from predicting the load to be provisioned based on finer-grained historic data [92]. Additionally, improving the management of the energy as the market liberalization progresses in most countries and as energy efficiency targets are envisioned worldwide by governments' future policies allows utility companies to address customer churn and identify households, which may be targeted and may adopt differentiated tariffs and saving recommendations. To our knowledge, researchers have leveraged diverse smart meter datasets towards customer segmentation. McLoughlin, Duffy, and Conlon [161] also investigated the Irish CER dataset's potential for segmenting households by relying on survey data, which relates more to a classification task. Chicco et al. [50, 51] extracted statistical features from daily load curves based on consumption ratios and compared them to clustering the daily hourly load curve with an Adaptive Resonance Theory (ART) neural network system. They used a weighted Euclidean distance to compare feature vectors. Chang and Lu [45] leveraged Fuzzy C-Means (FCM) to allow multiple memberships for candidate hourly load curves and used a decision tree to assign statistical load features to the obtained clusters, similarly to Chicco et al. [50, 51]. Rodrigues et al. [199] evaluated the clustering of load curves using K-Means and Self-organizing Maps (SOM) and used the normalized Euclidean distance as a similarity measure. Their work was extended by Ramos et al. [188] and Ramos and Vale [189] to apply hierarchical clustering. Sánchez et al. [205] chose SOM for training, then clustered the map units with K-Means. The vectors were constituted of quantitative features extracted from the load curves, including descriptive characteristics of the shape of the curve such as the number of peaks or the ratio of peaks and valleys, but also qualitative characteristics from survey data. Liu et al. [155] extended time-based statistical variables used in previous work, incorporating time-based peak load consumption with SOM. They differentiated the clusters based on the consumption level of customers instead of the shape of the load curve. These previous attempts at profiling load consumption can often be considered as explorative work on the potential of using load curves, as a growing number of datasets are made available to researchers and different clustering techniques can be applied. However, an evaluation of the "quality" of the obtained clusters has not yet been undertaken: this relates to the choice of the clustering algorithm, the distance measure that is examined and an analysis and discussion of the shapes of the obtained characteristic load profiles. The cluster profiles that were obtained have very similar shapes and cannot be differentiated based on the time of the day of the consumption. Insights about how to improve the segmentation can be learned from Iglesias and Kastner [111], who provided a very thorough analysis and comparison of different clustering techniques. Additionally, the work of Keogh and Kasetty [128] showed that care must be taken when mining data from time series to be able to justify the claims related to the results of an empirical evaluation.

3.3 dataset and data preparation

In this chapter, we concentrate on clustering consumption patterns based on peak positions, which can be identified as hurtful moments of the day for energy providers. This would allow us not only to characterize populations of customers, but also to react to the more demanding profiles. The latter can be enabled by adopting a strategy of contacting them and offering counseling or different tariffs, in order to influence their consumption behavior to fit the utility companies' goals. To target customers that are more likely to react positively to such stimuli, their selection can be supported by favoring stable behaviors over time (households that do not significantly change their time of peak consumption from one week to the other, which can be seen as stable1). This opens a way for the utility companies to better provision for their network without relying only on synthetic load profiles, which might aggregate the information too much and thus be less adaptive to the specificity of the population of customers that is served. To this end, we analyze the Irish CER dataset [54], which contains 30-minute readings of 4225 residential customers, collected over a period of 18 months throughout Ireland. Building the analysis with these data allows us to show that the results are significant and not influenced by an ill-sampled, hence not representative enough, set of households. In this section, we present how to proceed to achieve robust results. The first task consists in assessing the quality of the input and deciding the format of the object to be clustered. Then, we explore suitable clustering methods and the choice of parameters that can enable the identification of peaks in the load curves.

3.3.1 Data Pre-processing

Overall, we focus on the shape of the curve instead of the exact amount of energy consumed. The goal is not to forecast the load at any point in time, but rather to target a set of clusters that diverge in the position of their peaks throughout the day. We review the required steps to build the objects that will be clustered.

3.3.1.1 Cleaning of the Dataset

Best practice in data mining consists in verifying the quality of the input. Given the technical reports about the data collection [53, 55], we assumed the presence of potential hardware failures and discarded the data collected from the first month. 0-kWh readings were nevertheless identified throughout the span of the data collection.

1 On the contrary, it could be argued that unstable households are of interest.

[Figure 9 plot: consumption (kWh) per slot of the day (h) for five daily curves, recorded on Sunday, September 6; Tuesday, September 8; Thursday, September 10; Friday, September 11; and Sunday, September 13, 2009.]

Figure 9: Daily curves for one household with multiple consecutive 0-kWh readings. We notice that the issue is not related to one single day of the week, but can occur any day of the week.

Figure 10: Histogram of the maximum length of consecutive 0-kWh readings per day

Their presence could be attributed to smart metering faults2. Different reasons could be suggested, such as blackouts (in the case where the smart meter is not self-powered, such readings could be happening) or communication errors due to network unreliability. Figure 9 shows the case of one particular household with multiple 0-kWh readings. We can clearly see that this pattern does not only apply to specific weekdays, but to any day of the week. Hence, we decided to investigate the occurrence of the null measurements through histograms of their distribution. The adopted strategy relied on evaluating the proportion of incriminated consecutive measurements. When looking at the maximum length of consecutive null readings, the number of affected daily curves quickly dropped below 100 as the length of the sequence increased. This motivated the choice of a sequence of five consecutive 0-kWh records as a cut-off value for the removal of incriminated curves, as can be seen in Figure 10. This allowed the discarding of a very negligible number of curves overall (0.55% to 0.8% of all curves in the datasets listed in Table 2).

2 0-kWh readings should not happen: http://www.ss3meteronline.co.uk/faq.html
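As an illustration of this cleaning rule, a minimal Python sketch (not code from the thesis; array and function names are hypothetical) that drops every daily curve containing five or more consecutive 0-kWh readings could look as follows:

    import numpy as np

    def max_zero_run(curve):
        """Length of the longest run of consecutive 0-kWh readings in a daily curve."""
        longest = current = 0
        for value in curve:
            current = current + 1 if value == 0 else 0
            longest = max(longest, current)
        return longest

    def drop_faulty_days(daily_curves, cutoff=5):
        """Discard daily curves with `cutoff` or more consecutive 0-kWh readings.

        `daily_curves` is assumed to be an array of shape (n_days, 48).
        """
        keep = np.array([max_zero_run(c) < cutoff for c in daily_curves])
        return daily_curves[keep]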

The data were collected at a frequency of once every 30 minutes, providing 48 samples per day. However, this was not the case when Daylight Saving Time (DST) was implemented. In the case of shifting to winter time, one additional hour was added to the daily readings, implying that 50 samples were recorded, and when moving to summer time, one hour disappeared, thus only 46 samples were kept. This was mitigated by correcting the incriminated days and transforming the corresponding vectors of readings into regular 48-sample vectors. For this reason, the third and fourth samples on the winter DST day were discarded, since they are duplicate readings from 1 am to 2 am. Regarding the summer time, as records from 2 am to 2:59 am were missing, the average of the first hour of data was replicated.
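The DST correction can be sketched as follows, assuming each affected day arrives as a chronologically ordered vector of readings; the slot indices follow the description above and the helper name is illustrative:

    import numpy as np

    def normalize_dst_day(readings):
        """Coerce a DST-affected daily vector back to 48 half-hourly samples."""
        readings = np.asarray(readings, dtype=float)
        if len(readings) == 50:           # switch to winter time: drop the duplicated 1-2 am hour
            return np.delete(readings, [2, 3])
        if len(readings) == 46:           # switch to summer time: refill the missing 2-2:59 am hour
            filler = readings[:2].mean()  # average of the first hour of data
            return np.insert(readings, 4, [filler, filler])
        return readings                   # regular day, already 48 samples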

3.3.1.2 Splitting the Dataset

Mutanen, Repo, and Järventausta [170] highlighted and incorporated seasonal differences in their implementation and analysis. To take these seasonal variations into account, the dataset was divided into summer and winter data, as those effects were expected to influence the shape of the profiles. For each season, four weeks of data were used as training data to build the cluster profiles, while the larger corresponding sets can be designated as test data.

3.3.1.3 Focusing on the Load Curves' Shapes

We decided to evaluate weekday patterns by averaging weekly data from Monday to Friday and removing special days such as public holidays. The variation from one weekday to the other was not significant enough to build separate patterns for each day, as expressed by Mutanen, Repo, and Järventausta [170]. The interest being primarily the overall shape of the load profiles, we considered the effect of a Wiener filter to remove the oscillations, which, in the framework set-up, are less relevant than the most prominent peaks. For this purpose, we selected different smoothing windows. Similarly, we examined different normalization and scaling methods that are presented by Milligan and Cooper [163]. As described by Milligan and Cooper [163], normalizing each curve by dividing it by its maximum value not only preserves the shape of the curve, but also provides a scaling between 0 and 1 of all measurements. This preserves the relative variability between each reading and renders each object independent of the others and of the dataset (this is not the case if column-wise modifications are applied on the raw data, for example). This was further consolidated by comparing the outcome of the clustering using the different normalization techniques, which showed that this normalization provided the most differentiated cluster profiles, i.e., the best cluster separation.

Figure 11: Histogram of the distribution of the average weekly consumption using a log scale for 0.025-kWh bins.

We applied pre-processing on very low consumption load curves, to deal with cases where the dwelling is left uninhabited and which are expected to only present a base-load consumption. To determine the threshold separating base-load/standby consumption from "real" user-triggered consumption patterns, we plotted a histogram of the distribution of the average weekly consumption. Using bins of 0.025 kWh from 0.025 kWh to 3 kWh (the maximum), we set the threshold at 0.125 kWh: all curves with an average weekly figure below this value are treated as flat consumption cases, as in Figure 11. This step was implemented to identify consumption patterns which should be treated as flat (after the scaling, flat curves would be modeled as a vector with components equal to 1) and avoid them having an out of proportion impact on the clustering once the normalization is applied and their shape is magnified.
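A minimal sketch of this pre-processing chain, assuming SciPy's Wiener filter as a stand-in for the filter used here (the exact window convention is an assumption) and treating the 0.125-kWh figure as the flat-curve threshold described above; the function and argument names are illustrative:

    import numpy as np
    from scipy.signal import wiener

    FLAT_THRESHOLD_KWH = 0.125  # average weekly consumption below which a curve is treated as flat

    def preprocess_curve(curve, avg_weekly_kwh, window=3):
        """Smooth a 48-sample average weekday curve and scale it by its maximum."""
        curve = np.asarray(curve, dtype=float)
        if avg_weekly_kwh < FLAT_THRESHOLD_KWH:
            # base-load only: model as a flat curve so it cannot dominate the clustering
            return np.ones_like(curve)
        smoothed = wiener(curve, mysize=window)
        peak = smoothed.max()
        return smoothed / peak if peak > 0 else np.ones_like(curve)  # shape preserved, scaled to [0, 1]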

3.3.2 Clustering

In this section, we present our two-phase clustering scheme. First, we introduce the clustering algorithms that we will consider and the parametrization for their evaluation. Then, we present the similarity ranking for the online cluster membership decision for incoming load curves based on the characteristic load profiles determined by the clustering.

3.3.2.1 Algorithms

We applied the most common clustering methods and rated them with the goal of finding clusters that group households based on their ability to single out peak consumption over the day. For this purpose, we examined different choices of parameters for the following clustering techniques: hierarchical clustering, K-Means and SOM dimension reduction followed by K-Means. We are aware of the curse of dimensionality, which has been covered by many authors, such as

more recently and particularly thoroughly by Houle et al. [108]. The latter implies that points in higher dimensions cannot be differentiated, as summarized in Equation 1.

\lim_{d \to \infty} \operatorname{Var}\!\left(\frac{\lVert X_d \rVert}{\mathbb{E}\lVert X_d \rVert}\right) = 0 \;\Rightarrow\; \frac{D_{\max} - D_{\min}}{D_{\min}} \to 0 \quad (1)

The differences in performance of the whole time series clustering were evaluated against the extraction of a subset of features as listed by Beckel, Sadamori, and Santini [27] or the usage of Principal Component Analysis (PCA) to reduce the dimensionality of the data.
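A small numerical illustration (not taken from the thesis) of the effect summarized in Equation 1: for random points, the relative contrast between the farthest and the nearest neighbor shrinks as the dimension grows.

    import numpy as np

    rng = np.random.default_rng(0)

    # Relative contrast between nearest and farthest neighbour of a reference point:
    # it shrinks as the dimension grows, which is the effect summarized in Equation 1.
    for d in (2, 48, 1000):
        points = rng.random((500, d))
        dists = np.linalg.norm(points[1:] - points[0], axis=1)
        contrast = (dists.max() - dists.min()) / dists.min()
        print(f"d={d:5d}  (Dmax - Dmin) / Dmin = {contrast:.2f}")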

3.3.2.2 Number of Clusters

We compared the performance of the clustering for the formation of 5 to 14 clusters, as more clusters would lead to over-fitting and defeat the purpose of simplifying the visualization of households' consumption patterns. We also expected that a higher number of clusters would lead to some clusters containing very few load curves with isolated shapes, instead of being able to generalize and highlight common features of the data.

3.3.2.3 Combination of Parameters

To decide upon the most appropriate clustering framework to suit our goal of identifying different peaks, we evaluated the following combinations of parameters (a sketch of the resulting parameter grid follows the list):

• whole time series clustering and extraction of features

• Wiener filter window (no filtering, or window lengths of 2 to 5 samples)

• number of clusters (from 5 to 14)

• combination of different clustering algorithms and distances as seen in Table 1.
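The parameter grid announced above can be enumerated as in the following sketch; the names are illustrative, the algorithm/distance pairs mirror Table 1, and the data representations correspond to the whole time series, the 18 extracted features and the retained PCA components discussed in the surrounding text:

    from itertools import product

    representations = ["whole_time_series", "features_18", "pca_components"]
    filter_windows = [None, 2, 3, 4, 5]      # Wiener filter window (None = no filtering)
    cluster_counts = range(5, 15)            # 5 to 14 clusters
    algo_distance = [                        # pairs from Table 1
        ("som+kmeans", "manhattan"), ("som+kmeans", "euclidean"),
        ("kmeans", "manhattan"), ("kmeans", "euclidean"),
        ("kmeans", "correlation"), ("kmeans", "cosine"),
        ("hierarchical", "manhattan"), ("hierarchical", "euclidean"),
        ("hierarchical", "correlation"), ("hierarchical", "cosine"),
    ]

    # Every evaluated configuration is one tuple of this grid.
    configurations = list(product(representations, filter_windows, cluster_counts, algo_distance))
    print(len(configurations), "parameter combinations")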

In particular, the whole time series clustering consisted of using 48-dimensional load curves representing the average weekday consumption. We also needed to adapt the vectors when using the correlation and cosine distances. It required the data to be standardized column-wise, as the objects we clustered were scaled between 0 and 1, implying that they had a relatively small standard deviation. We chose to evaluate the effectiveness of reducing the dimension of the input vectors to mitigate the curse of dimensionality. This was achieved by extracting 18 features, which comprised statistical data over parts of the day (such as mean, standard deviation, min and max) and ratios used by Beckel, Sadamori, and Santini [27], as well as peak data, i.e., the number of

Table 1: Combination of the evaluated clustering algorithms and distance measures

Clustering technique   Distance
SOM + K-Means          Manhattan
SOM + K-Means          Euclidean
K-Means                Manhattan
K-Means                Euclidean
K-Means                Correlation
K-Means                Cosine
Hierarchical           Manhattan
Hierarchical           Euclidean
Hierarchical           Correlation
Hierarchical           Cosine

peaks during parts of the day. Alternatively, we selected the most significant PCA components (with a contribution over 1%, meaning 17 components are retained). After some preliminary testing, we adapted both the K-Means and hierarchical clustering methods in combination with the correlation and the cosine distances. Since the flat consumption patterns could not be singled out, we applied a two-phase clustering consisting of pre-applying the K-Means algorithm with the same settings but choosing the Euclidean distance to isolate the flat load curves. The clustering with the current choice of parameters was then carried out on the remaining curves. The data were stored in a PostgreSQL database. Scripts to fetch and format the data were written in Python and Shell. For the clustering part, Matlab's implementation of the clustering algorithms was used, along with the SOM toolbox3 and peakdet toolbox4 for determining the location of the peaks.
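The Matlab set-up is not reproduced here, but the two-phase scheme can be approximated with scikit-learn as in the following sketch: a first Euclidean K-Means pass separates the flat curves (simplified to a two-group split here), and the correlation distance is then emulated by running Euclidean K-Means on row-standardized curves, for which the squared Euclidean distance is monotonic in the correlation distance. Function names and defaults are illustrative.

    import numpy as np
    from sklearn.cluster import KMeans

    def two_phase_clustering(curves, n_clusters=14, random_state=0):
        """Phase 1: isolate flat curves; phase 2: correlation-like K-Means on the rest."""
        curves = np.asarray(curves, dtype=float)

        # Phase 1: flat curves (all components close to 1) form the low-variance group.
        flat_labels = KMeans(n_clusters=2, random_state=random_state).fit_predict(curves)
        flat_group = np.argmin([curves[flat_labels == g].std(axis=1).mean() for g in (0, 1)])
        is_flat = flat_labels == flat_group

        # Phase 2: z-score each remaining curve so Euclidean distance tracks (1 - correlation).
        remaining = curves[~is_flat]
        z = (remaining - remaining.mean(axis=1, keepdims=True)) / (remaining.std(axis=1, keepdims=True) + 1e-12)
        labels = KMeans(n_clusters=n_clusters, random_state=random_state).fit_predict(z)
        return is_flat, labels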

3.3.3 Similarity Ranking

The usage of characteristic load profiles and their integration into an online portal can save the cost of re-clustering data for the new load curves. Selecting the cluster they belong to relates to selecting the most similar characteristic load profile, i.e., having the smallest distance. This allows us to significantly reduce the cost of assigning

3 http://www.cis.hut.fi/somtoolbox/
4 http://www.billauer.co.il/peakdet.html

Table 2: Winter and summer training and test sets. The table also contains the total number of daily load curves and the corresponding removed curves.

Start date   End date   # Days   # Weeks   Removed   Total
08/17/09     09/13/09       28         4       651    118271
08/17/09     10/31/10      287        41       720    118300
10/26/09     11/22/09       28         4      7153   1212138
10/26/09     12/31/10      215        31      7257    908177

a household's weekly consumption pattern to one of those clusters and thus permits a more scalable implementation. Also, it serves as a validation means for assessing the clustering accuracy. For this purpose, we examined different distance measures such as the Manhattan, Euclidean and cosine distances, along with the correlation between the household load curve and the load profiles. We focused on their ability to match a given load curve to the most similar reference curve.
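A minimal sketch of this online assignment step, assuming the characteristic load profiles are available as an array of 48-sample reference curves; the cosine distance is used here since it turned out to perform best in Section 3.4.3 (names are illustrative):

    import numpy as np

    def assign_to_profile(load_curve, cluster_profiles):
        """Return the index of the most similar characteristic load profile (cosine distance)."""
        x = np.asarray(load_curve, dtype=float)
        profiles = np.asarray(cluster_profiles, dtype=float)
        cosine_sim = profiles @ x / (np.linalg.norm(profiles, axis=1) * np.linalg.norm(x))
        return int(np.argmax(cosine_sim))  # smallest cosine distance = largest cosine similarity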

3.4 experimental results

In this section, we present the experimental evaluation of our two-phase clustering scheme.

3.4.1 Seasonal Subsetting and Data Curating

The work presented by Mutanen, Repo, and Järventausta [170] takes the seasonal component into consideration and, more recently, the U.S. Energy Information Administration reported that homes showed seasonal variation in electricity use5. For this reason, 4 different subsets were built from the CER Irish dataset, which was collected from July 14, 2009 to December 31, 2010, as can be seen in Table 2. DST dates for Ireland were used as benchmarks for separating winter from summer, i.e., October 25, 2009, March 28, 2010 and October 31, 2010.
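For illustration, a small helper (not code from the thesis) that labels a date as summer or winter using the DST boundaries listed above:

    import pandas as pd

    # DST changeover dates for Ireland used as season boundaries.
    WINTER_STARTS = [pd.Timestamp("2009-10-25"), pd.Timestamp("2010-10-31")]
    SUMMER_START = pd.Timestamp("2010-03-28")

    def season_of(day):
        """Label a date in the collection period as 'summer' or 'winter'."""
        if WINTER_STARTS[0] <= day < SUMMER_START or day >= WINTER_STARTS[1]:
            return "winter"
        return "summer"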

3.4.2 Evaluation of the Clustering

The selection of the best suited clustering method and parameters relies on the target of identifying hurtful consumption behaviors as peaks. The performance of the clustering is consequently based on how the peaks of different load curves match the peaks of the load profiles produced by the clustering.

5 http://www.eia.gov/todayinenergy/detail.cfm?id=10211

For this purpose, an N-dimensional binary vector l_i for the i-th load curve and a binary vector c_i for its corresponding cluster are built, taking the value 1 when a peak is at a given position. The ratio of matching peaks is computed as in Equation 2, where \langle l_i, c_i \rangle represents the inner product of the two binary vectors.

m_i = \begin{cases} \dfrac{\langle l_i, c_i \rangle}{\sum_{k=1}^{N} l_i(k)} & \text{if } \sum_{k=1}^{N} l_i(k) > 0 \\ 1 & \text{if } \sum_{k=1}^{N} l_i(k) = 0 \text{ and } \sum_{k=1}^{N} c_i(k) = 0 \\ 0 & \text{otherwise} \end{cases} \quad (2)

Then the score used is the average of Equation 2 over all curves in the considered dataset, as in Equation 3. We refer to it as the peak match score.
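A direct transcription of Equation 2 into Python, assuming the peak positions have already been extracted as binary vectors (e.g., with a peak detection routine such as the peakdet toolbox mentioned above); names are illustrative:

    import numpy as np

    def peak_match_ratio(curve_peaks, cluster_peaks):
        """Equation 2: fraction of a curve's peaks that coincide with its cluster profile's peaks."""
        l, c = np.asarray(curve_peaks), np.asarray(cluster_peaks)
        if l.sum() > 0:
            return float(l @ c) / float(l.sum())
        return 1.0 if c.sum() == 0 else 0.0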

\frac{1}{N} \sum_{i=1}^{N} m_i \quad (3)

We use a second score for rating the distinctiveness of the characteristic load profiles. This consists of summing the Hamming distances of all pairs of the binary representations of the cluster profiles. The 20 top scoring configurations of parameters are highlighted in Table 3. Although the scoring functions offer a quantitative way of evaluating the clustering, they merely provide a set of candidates that will be evaluated visually. The candidates for the best clustering parametrization combine the requirement that the load curves have to match the characteristic load profiles with the requirement that the latter have to be distinct from each other. Extracting features from the load curves leads to the issue of scaling them appropriately so that the components of the vector do not overpower each other during the clustering process, which is avoided when the whole time series is used, as all readings are scaled. Overall, reducing the dimensionality from the 48-reading vector proved less successful, as the scoring revealed that the peak match score was well below 10%. The presence of stacked versions of the same cluster was most prominent and hence, the distinctiveness of the clusters was not assured. Also, distance measures such as the Euclidean and Manhattan distances tend to aggregate the points into the same cluster, as the notion of the position of the peaks is absorbed through the summing. Thus, other attempts, such as transforming the load curve into a binary vector that marks the position of the peaks or padding the original load curve with its binary peak representation, did not succeed either. Trading off these scores, K-Means with the correlation as a distance measure was selected. Also, the load curves were smoothed through the usage of a Wiener filter of window 3 (i.e., for each value of the load curve, the 3 neighbors on the left and the 3 on the right are used), which corresponds to using data in the scope of 1.5 hours around each measurement to correct the oscillations, which are considered as noise. The "appropriate" number of clusters was 14 (15, if counting the group of flat curves that were excluded by the first clustering phase). The results can be seen in Figure 12, in contrast with Figure 13, where not all clusters are as distinct and we see stacked versions of the same flat cluster and overlapping peaks.

Table 3: 20 top scoring configurations of parameters for the clustering (columns: type of clustering, algorithm, distance, filter window, number of clusters, peak match score, distinctiveness score).
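Building on the previous sketch, the two aggregate scores can be computed as follows; peak_match_ratio is the function defined above and the remaining names are illustrative:

    import numpy as np
    from itertools import combinations

    def peak_match_score(curve_peaks, matched_profile_peaks):
        """Equation 3: average of Equation 2 over all curves in the dataset."""
        return float(np.mean([peak_match_ratio(l, c)
                              for l, c in zip(curve_peaks, matched_profile_peaks)]))

    def distinctiveness_score(profile_peaks):
        """Sum of Hamming distances over all pairs of binary cluster-profile peak vectors."""
        profiles = np.asarray(profile_peaks)
        return sum(int(np.sum(a != b)) for a, b in combinations(profiles, 2))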

Figure 12: 15 clusters (i.e., 14 + 1, obtained through first phase flat curves separation), K-Means, correlation, filter window = 3 on the training summer dataset. All characteristic load curves differ in the position of their peak.

Figure 13: 14 clusters (i.e., SOM + K-Means with 14 clusters), Euclidean, filter window = 5 on the training summer dataset. Clusters 1 and 2 are not distinguishable as their peaks are located at the same positions.

3.4.3 Comparison with Similarity Distance Ranking Classification

Different distance measures to rank the similarity of the load curves to the cluster profiles were tested to determine the smallest distance to classify the curve into the right bin and assess the quality of the initial clustering on the training set.

Figure 14: Comparison between the clusters built from the training set in Figure 12. The dashed line curves represent the new cluster profiles obtained on the summer test set using the cosine distance as the similarity measure.

As can be seen in Figure 14, the best results are achieved with the cosine distance, as the resulting cluster averages match the reference characteristic load profiles.

3.5 conclusion and discussion

In this chapter, we proposed a method to build cluster profiles with the objective of identifying hurtful behaviors from the utility companies' viewpoint. Once the clusters are built, the classification of a household's consumption pattern from one week to the other is achieved by ranking the similarity of each curve to the previously established reference consumption patterns. Overall, the clustering produces distinctive enough characteristic load profiles to target the discrimination of consumption patterns based on the peaks' positions. Assigning households to these profiles requires little overhead, which would permit an integration in an online portal and lead to more applications for the utility companies. The segmentation of the households will allow the utility companies to get a better understanding of what consumption profiles exist among their customers and in which proportion, instead of relying on an oversimplification of the consumer base through the usage of synthetic load profiles. Based on the energy provider's appreciation of which pattern is more hurtful, specific segments of customers can be easily selected and addressed. An application that could be foreseen would be to understand how the households' consumption evolves over time and target the more stable households (i.e., either selecting a threshold on the percentage of weeks a specific household remains in the same cluster, or simply identifying the top x households that have remained stable over time). This can be implemented in the frame of an awareness-raising campaign so as to maximize the chance of them reacting to a stimulus such as differentiated tariffs as a way of inciting load shifting.

4 crowdsourcing energy data labeling

This chapter is based on work developed by the author during her exchange at the LSIR at EPFL and published in the Proceedings of the 2015 Workshop for Sustainable Development at the 2015 IEEE International Conference on Big Data (BigData ’15)[39]. Additionally, the chapter contains an extension of the publication appearing in the Proceedings of the 2016 Workshop on Smart Grids at the 2016 IEEE International Conference on Big Data (BigData ’16)[36].

Achieving energy efficiency in households requires integrating the residents in the loop. At the moment, most utility company customers are only accustomed to the format of monthly bills as feedback for their electricity usage. As a result, they are often over- or underestimating the consumption patterns of their appliances and are not familiar with energy jargon [16, 87, 127]. Confronting them with concrete information, and in particular providing real-time feedback, was estimated to offer higher potential energy savings under the best conditions [16]. Additionally, a smart home agent can incorporate an ambient intelligent system that monitors the residential consumption in real-time and controls appliances based on usage and occupancy patterns. Understanding the human behaviors incurring energy consumption would allow us to determine when and which appliances are triggered together to perform those activities. This would enable us to give energy savings recommendations at the activity level and extend the range of measures to improve energy efficiency. A user would thereafter be able to optimize their energy consumption to their own individual needs, thus making choices that cut the energy bill without sacrificing quality of life. Learning users' activities requires determining when humans are interacting with appliances. While static thresholding has been used in prior work [152, 229], these methods are not agnostic of the appliance type and model. Therefore, any effort to produce a learning algorithm for automatic thresholding [37] requires ground truth data, i.e., an indication of when an appliance is turned on or off by the user, for validation. However, acquiring high quality data demands efforts for planning, deploying and monitoring the experiment, and incurs considerable costs [84]. While the infrastructure's installation does not involve the active participation of the households' residents, the acquisition of ground truth data requires human efforts for the annotation of events. This task has to be carefully designed to be simple enough and should not induce user fatigue, in order to guarantee the labeling quality [196, 202].
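As a hypothetical illustration of the static thresholding mentioned above (the threshold value below is made up and appliance-dependent, which is precisely the limitation discussed):

    import numpy as np

    def active_states(power_watts, threshold=15.0):
        """Mark an appliance as active whenever its power draw exceeds a fixed threshold."""
        return np.asarray(power_watts, dtype=float) > threshold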


Real-world ground truth data are required for validating or inferring models in different fields that rely on machine learning. As the algorithms rely on supervised or semi-supervised learning techniques, the need for ground truth data has increased. Common but helpful tasks such as determining which email should be classified as spam benefit from sets of sample junk mails, but the fine-tuning of the classifier still requires the users' participation to reduce false positives and false negatives. In computer vision, object recognition relies on the segmentation of an image (similarly to performing it on the frames of a video) to indicate what objects are present, but also where they are located. Attempts at building large sets of human annotated images involve crowdsourcing the efforts and relying on gamification [6] or a collaborative framework [123]. CAPTCHAs [83], which are traditionally used for verifying that a user is human and not a robot before granting them access to a resource, are now diverted to extract street numbers for Google Street View. Research topics in computer science are not the only ones requiring ground truth data, as the study of the genome and the understanding of the function of each gene is also adopting the strategy of crowdsourcing the efforts in their community [213]. In this chapter, we examine how to fully evaluate learning algorithms when ground truth labels are missing, although they are necessary to quantify the goodness of learning algorithms. To leverage existing datasets, we took advantage of an unexploited source: the wisdom of the research community. Crowdsourcing has contributed to considerable advances in the development of algorithms in computer vision or information retrieval systems, by allowing the creation and the consolidation of large datasets of annotated data. This had not yet been used in the energy research domain, where, instead, researchers have to wait for other research groups to release datasets that are often imperfect in diverse dimensions (missing event-based labels, being of too coarse granularity, having been collected over a short period, not containing enough appliances, etc.). In Section 4.1, we present our Collaborative Annotation Framework for Energy Datasets (CAFED) for retrofitting labels on existing datasets by having researchers, who have gathered considerable experience with power data and are familiar with appliance signatures through Non-intrusive Load Monitoring (NILM) research, annotate power curves. We focused primarily on active vs. idle annotations, but our system could be easily extended to allow other labeling on top of time series data. In Section 4.2 we provide a thorough evaluation of the expertise of users to provide crowdsourced annotations for energy time series. The CAFED manually annotated ground truth is used to validate the GMMthresh algorithm that we will present in Chapter 5.

4.1 cafed: a collaborative framework for energy datasets

Targeting the human activities responsible for the energy consumption, instead of focusing solely on single appliance feedback, for achieving energy efficiency in residential homes would link human behaviors to the resulting energy consumption. To this end, learning when appliances are in an active or idle state and the related user activity is crucial. Until smart appliances become widespread and can communicate their internal state, identifying when the residents interact with the appliances has to be determined from the available information that can be recorded from these devices. We will present GMMthresh in Chapter 5 for this purpose, which, as a learning model, requires validation through ground truth data, in the form of annotations to indicate when an appliance is active or idle. Launching data collection campaigns to incorporate these missing ground truth data involves careful planning before the roll-out of the experiment. Prohibitive costs for the hardware and a time investment to monitor the deployed equipment are necessary for quality data. As such, publicly released datasets containing appliance-level data offer a basis for most researchers. This section addresses these challenges by providing a collaborative web-based framework to retrofit labeling on existing datasets. The platform is publicly available, applies the wisdom of the crowd in the realm of energy research and leverages gamification techniques to encourage users' active contribution. The access to the platform, and furthermore to the expert manually labeled dataset, is intended to enable future research and foster more collaboration in this area. In the energy domain, efforts have been deployed to offer toolkits for simplifying the deployment of data collections [84] or the evaluation of NILM algorithms [23] on the most common publicly available datasets. The existing literature shows that, in the case of the appliances, considerable progress on the understanding of the energy signature of devices has been made [19, 97, 103, 253]. Existing attempts at obtaining ground truth data for ON-OFF events depended on human supervision for the annotation of existing energy datasets obtained through an event detection algorithm [183]. More complex annotations, such as acquiring human activity labels, were achieved through a web platform [196]. However, there has not yet been any initiative to take advantage of the wisdom of the community on energy disaggregation to annotate existing datasets. “Crowdsourcing systems coordinate large groups of people to solve problems that a single individual could not achieve at the same scale. Microtasking systems typically use highly-controlled workflows to manage paid, non-expert workers toward expert-level results. While these crowdsourcing approaches are effective for simple independent

tasks, many real-world tasks such as the ones in design and engineering require deep domain knowledge that is difficult to decompose into independent microtasks that anyone can complete." [195, p. 1] Consequently, most crowdsourcing workflows and algorithms aim to structure non-expert contributions to produce expert-level performance.
In this section, we propose to leverage the knowledge acquired through NILM research to annotate the Pecan Street dataset. This dataset was collected in the frame of an experiment involving a smart grid demonstration project in Texas and provides electricity, water, natural gas, and solar generation measurements [171]. The publicly available version of the dataset we use contains appliance-level data but does not provide state information about the appliances, i.e., whether they are active, in standby mode or off. Thus, the task consists in indicating when an appliance is powered on and actively used to serve a human activity and when it can be considered idle. Our approach brings expert crowdsourcing to the very specific domain of labeling and annotating energy events in public datasets. Unlike other domains where we can leverage the wisdom of the crowd, here the activities require expert knowledge from the community. Regardless, we make use of gamification techniques to promote expert user participation.
We attempt to provide an easy-to-use framework as a modular plugin that can be applied to existing publicly available datasets to provide crowdsourced annotated data to energy experts, made freely available to the community. We summarize the key contributions of this section as follows:
• We describe the design of a web interface for the annotation of a power trace dataset (such as the Pecan Street dataset), relying on an intuitive approach, from the users' perspective, with simple drawing tools;

• We characterize the design of a fetching engine to keep track of single users' and the crowd's overall performance, providing a consistent annotation flow, ensuring data consistency, and motivating users' contribution;

• We explain how our approach fosters interaction among researchers in the domain, leveraging the wisdom of experts, and thus contributing to future research in this area by providing access to the annotated data.
The remainder of this section is organized as follows. Section 4.1.1 presents the challenges for collecting large labeled datasets in diverse domains. Section 4.1.2 introduces key components of our annotation framework. Section 4.1.3 describes the motivational techniques we include in our design to engage users' participation. Section 4.1.4 discusses results obtained through the usage and evaluation of our platform by test users. Section 4.1.5 explains how the data acquired through our system can be disseminated among the community. We conclude by discussing lessons learned and future work in Section 4.1.6.

4.1.1 Challenges for Collecting Large Labeled Datasets

4.1.1.1 Home Energy Analytics
Launching an energy data collection in residential environments requires finding volunteers and planning efforts. Efforts to maintain the hardware and solve failures or other anomalies that could be introduced by the faulty behavior of the residents are necessary to guarantee the quality of the data. Some issues can be alleviated with the usage of a framework like Piloteur [84], as it can serve as a best-practice basis for the deployment back-end. However, a real-life experiment involves monetary costs not only for the measuring equipment, but also for the installation, as the complexity increases depending on the household's setup and the appliances and circuits that should be monitored. Efforts in terms of time and costs are thus often prohibitive and discourage the acquisition of new data.
The energy community has benefited from NILM research, as researchers have collected and shared disaggregated data. The datasets vary in the number of households that were included in the experiment roll-out, the type of appliances and circuits that were monitored, the duration of the data collection, the type of data that were collected (power, voltage, etc.) and their granularity. REDD [135], BLUED [14], Smart* [20], Pecan Street [171], iAWE [22] and ECO [26] allow the community to benefit from the efforts of the groups and the organizations that initiated those data collections, but also provide lessons learned for prospective set-ups.

4.1.1.2 Other Domains
Data analytics techniques are applied to increasing amounts of collected data to extract knowledge from them and are assisting in the verification of models in diverse domains. Diverse applications require human input to improve the quality of a classification algorithm such as spam filtering or Internet search [6]. NELL, the autonomous learner system, requires adjustments to its newly acquired categories by integrating some daily human interactions [42]. Crowdsourcing has also become popular for providing metadata for Twitter messages [86]. However, not only computer scientists, but also biologists are faced with large data amounts in their quest to understand gene functionality. There have been efforts to integrate annotations collaboratively in a structured way for the Zebrafish genome [213].

Computer vision is a field where the diversity of the concepts that should be captured by images and videos requires large collections of real-life examples. General labels can be obtained from content descriptions in the HTML anchors for images [46]. While CAPTCHAs were at first introduced to differentiate robots from human users, by carefully embedding images in them, labels for text and image recognition can be obtained with varying degrees of accuracy [83]. However, precise segmentation of objects would require a different environment design and more focus on the task. Prior work in this domain has already considered crowdsourcing the segmentation of images and the labeling of areas of interest in images [193, 235, 249]. The integration of gamification into the labeling pipeline has already been considered by the computer vision community to reduce the fatigue incurred by the task [6].

4.1.2 Framework

We present our Collaborative Annotation Framework for Energy Datasets (CAFED).¹ A view of the framework is available in Figure 16 and the system architecture can be seen in Figure 15. The technical implementation details can be accessed as additional material on our GitHub repository.² We discuss the key components in the following.

Figure 15: CAFED architecture, based on a web server architecture with a database for handling 3 key components: security (authentication), curve dispatching and annotation.

4.1.2.1 Database Architecture
pecan street (formerly known as wikienergy) database CAFED relies on the Pecan Street dataset,³ which was curated and formerly hosted by WikiEnergy. The data were collected from January to May 2014 in 239 households, include 73 categories of appliances and circuits, and provide 1-minute measurements.

1 https://cafed.inf.ethz.ch
2 http://github.com/caoh/CAFED
3 http://www.pecanstreet.org/

Figure 16: Annotation view. We highlight in red the curve selection module and in orange the annotation workbench. The personal performance component is highlighted in blue, while the competitive components are in purple. The badge section shown in green highlights the badges acquired by the user.
The original Pecan Street data were stored in a PostgreSQL database in a spreadsheet-like format. Each row of the table has the following attributes: the household id, a timestamp with time zone information, a real value that stores the total power consumption at the corresponding timestamp, and real numbers for all types of appliances and circuits that were monitored over the whole dataset. This means that for each row, many columns are empty. We normalize the original Pecan Street database in order to optimize updates and inserts for our framework and provide a detailed Entity-relationship Diagram in the Appendix, in Section A.1.
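As a rough illustration of why the normalization helps, the following sketch converts a wide, sparsely populated layout into a long layout with one row per non-empty reading. This is not the thesis' actual schema migration (which is documented in the Entity-relationship Diagram in Appendix A.1); the column names are only assumptions for the example.

```python
# Illustrative sketch: wide Pecan Street-style layout -> normalized long layout.
# Column names ("dataid", "localminute", appliance columns) are hypothetical.
import pandas as pd

wide = pd.DataFrame({
    "dataid": [26, 26],
    "localminute": pd.to_datetime(["2014-01-01 00:00", "2014-01-01 00:01"]),
    "use": [0.45, 0.47],           # total household consumption [kW]
    "dishwasher1": [None, 0.0],    # appliance columns are mostly empty
    "refrigerator1": [0.12, 0.11],
})

# One row per (household, timestamp, appliance), keeping non-null readings only.
long = (wide.melt(id_vars=["dataid", "localminute"],
                  var_name="appliance", value_name="power_kw")
             .dropna(subset=["power_kw"]))
print(long.head())
```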

4.1.2.2 Security
Since the framework consists of a web platform, several measures have been taken to guarantee the users' confidentiality and privacy. Prospective users are encouraged to sign up for an account, where

they can choose a username and share their full name and email address. The authentication is handled by phpass⁴ and passwdqc,⁵ which are based on recommended methods for salting and hashing the passwords. The user is provided with the option to change their password at their convenience and to create a profile with additional information such as their address and their affiliation (currently limited to universities). The relevant data are stored in two separate tables in the database. We considered using location (derived from the address, country, affiliation or IP address of the users) to offer additional gamification features based on the location of the contributors, as will be discussed in Section 4.1.3. Additionally, typical measures for banning malicious IPs, session management and protection against different attacks are implemented following the OWASP⁶ guidelines.

4.1.2.3 Dispatcher
The dispatcher handles the fetching of the curves to be annotated by the experts and guarantees a dynamic and targeted assignment of the missing labels. It relies on a fetching table to keep track of how many annotators have been allotted a given power trace (pending annotations) and how many tasks were fulfilled to consolidate the result (committed annotations). The fetcher is called by a function that queries and updates the fetching table and returns the data to be annotated to the user. The data quality is enforced by the use of majority voting to decide the final value to be attributed to a given measurement in a power trace; the first objective is to obtain three annotations per curve. Once this value is reached for all the readings, we expand the threshold to the next odd number. The dispatcher implements two modes of operation for the curve attribution. The user has the option to display curves randomly by letting the dispatcher choose the household and the appliance type or circuit. The alternative allows the user to select the type of appliance for their assignment. Using this scheme, the fetcher keeps track of the available data that still need to be annotated for that specific selection, and when a household is identified, it tries to maintain continuity by attributing power curves from the same household day after day, keeping track of data previously annotated by the same user.
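To make the fetching logic concrete, the following is a minimal sketch in Python of the bookkeeping described above; the actual platform implements this with a fetching table in the database rather than in memory, and the class and attribute names here are ours, not the platform's.

```python
# Minimal sketch of the dispatcher's fetching logic (our naming, not the
# platform's implementation): curves are served until they reach the current
# odd annotation threshold, which is raised once every curve has reached it.
class Fetcher:
    def __init__(self, curve_ids, threshold=3):
        self.threshold = threshold
        self.pending = {c: 0 for c in curve_ids}    # annotations handed out
        self.committed = {c: 0 for c in curve_ids}  # annotations submitted

    def _candidates(self, seen):
        return [c for c in self.committed
                if c not in seen
                and self.committed[c] + self.pending[c] < self.threshold]

    def next_curve(self, annotated_by_user=()):
        candidates = self._candidates(annotated_by_user)
        if not candidates:
            # every curve reached the threshold: move to the next odd number
            self.threshold += 2
            candidates = self._candidates(annotated_by_user)
        if not candidates:
            return None                              # nothing left for this user
        curve = min(candidates,
                    key=lambda c: self.committed[c] + self.pending[c])
        self.pending[curve] += 1
        return curve

    def commit(self, curve):
        self.pending[curve] -= 1
        self.committed[curve] += 1
```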

4 http://www.openwall.com/phpass/
5 http://www.openwall.com/passwdqc/
6 http://www.owasp.org/

4.1.2.4 Annotation
The objects to be annotated consist of time series over the span of a day. This allows the user to correlate potential events arising during a day to variations in a power trace and to decide which changes can be attributed to a device or circuit being powered on. Similarly to the problem of segmenting objects in an image or a video [123], we require the annotator to highlight portions of a power trace to indicate the occurrence of an event, in our case, when the appliance is active. We integrate a toolbox with drawing features to enable the annotation of portions of the curves, as can be seen in Figure 16. We binarize users' inputs by transforming the highlighted areas into ones, while setting the rest to zeros.
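A small sketch of this binarization step is given below, assuming the 1-minute daily grid of 1440 points used by the dataset; the function and variable names are illustrative, not the platform's code.

```python
# Sketch of the binarization step: highlighted time ranges become 1s over a
# 1-minute daily grid (1440 points), everything else stays 0.
import numpy as np

def binarize(highlighted_ranges, n_points=1440):
    """highlighted_ranges: list of (start_minute, end_minute) tuples."""
    labels = np.zeros(n_points, dtype=int)
    for start, end in highlighted_ranges:
        labels[start:end + 1] = 1   # inclusive of the end minute
    return labels

# e.g. an appliance highlighted as active from 19:05 to 20:15
labels = binarize([(19 * 60 + 5, 20 * 60 + 15)])
```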

4.1.3 User Engagement and Motivation

Crowdsourcing has largely focused on tasks any individual can complete: many crowdsourcing platforms are built to accomplish tasks that require little training (e.g., Amazon Mechanical Turk (AMT)) and recruit amateurs (e.g., FoldIt). Also, at the moment, those platforms are not suitable for more complex tasks that require a richer interaction with the data to be annotated, in our case time series, without significant effort to redirect the worker to one's own self-hosted annotation system. They are instead designed to provide content description through categorical or survey types of annotations (obtained through text fields, tick boxes or lists). Consequently, most crowdsourcing workflows and algorithms aim to structure non-expert contributions to produce expert-level performance. In the energy domain, the annotation process requires experts who are capable of understanding and labeling the power events.
Using arbitrarily set thresholds [174] does not perform satisfactorily in cases where the baseline consumption lies above the threshold and will not scale with the diversity of appliances and baseline consumption profiles. In the case of circuit-level data at the room level, the energy used by consumer electronics in standby mode adds up and varies from one household to the other, making the definition of a threshold difficult to scale to a large set of households. Additionally, appliances in standby mode should not be considered actively in use, so the notion of baseline also applies to them. Given the diversity of appliances and intra-categorical variances due to brand, model and production year differences, deriving this information without expert knowledge about the expected power signature of electrical devices and notions about the mechanical functioning of the appliances would induce amateur annotators to label the time series incorrectly. As can be seen in Figure 17, in the case of dishwasher1, the expert recommended highlighting the activity in one block because of his knowledge of subsequent cycles through a washing program (instead of producing segmented

annotations as the power dropped to the baseline). Then, in the case of livingroom1, not only does the baseline have to be decided upon, but small peaks before 06:00 could be interpreted as an activity diverging from the baseline, while side information such as when these arise and their frequency would indicate otherwise.
Our design sets expert annotators at the core of the system, as their contribution is essential to building the manually labeled ground truth data. We discuss ways to facilitate users' interaction with our system and how to acquire their loyalty. We consider different means of motivating the domain experts to participate. We expect two profiles of users, namely (i) experts whose research interests can benefit from the dataset, and (ii) experts who are altruistic and wish to contribute to the community. In the case of the altruistic contributor, we integrate both intrinsic and external motivation elements [119] in the form of gamification techniques to alleviate the repetitiveness of the annotation task. Obtaining the dataset is also considered as a motivational tool, as will be discussed in more detail in Section 4.1.5. We describe below the elements that are implemented in the framework.

4.1.3.1 Annotation Task Simplification
With such a repetitive task as the annotation of data, users are required to familiarize themselves with the platform and to be able to interact with it efficiently. The perceived ease or difficulty of handling the tool will influence the contributors' willingness and thus motivation to use it [119].

curve selection We decided to embed two modes for curve selection in CAFED, namely the random and appliance-specific modes, as can be seen in the red area in Figure 16. Both modes can be selected by choosing the appropriate option as the user logs into the platform. The random mode allows the dispatcher to select the curves randomly, as described in Section 4.1.2. If the user is not comfortable with the curve they were assigned, we embed a skip button to query for another appliance. The appliance-specific mode allows the user to choose the appliance they are the most familiar with. This might speed up the annotation progress, as the user can put their expert knowledge into practice, while the random mode allows for more diversity and surprise. The skip button allows the user to navigate between households. To minimize interactions with buttons and other input interfaces and preserve the annotation flow dynamics, after the user has submitted their annotations, a new curve is automatically selected by the dispatcher based on the user's preference and displayed again in their workbench.

curve annotation We consider that the most natural way of indicating which area of a curve represents a period when an appliance is active is to draw or highlight it with a marker (similarly to locating objects in an image).

Figure 17: Annotation workbench in the case of a single appliance and circuit-level data. (a) Single appliance: dishwasher1. (b) Circuit: livingroom1 with noticeable baseline consumption.

The user is thus provided with a toolbox consisting of a pencil, an eraser and a magnifying glass (and their respective icons replace the cursor in the panel that contains the curve to be annotated). This enables an interaction similar to using a sheet of paper and a pen in the physical world for the annotations, as can be seen in Figure 17. When the drawing mode is selected, regardless of the height of the cursor (which takes the appearance of the icon representing the feature currently active), clicking and dragging it to the end of the desired area highlights that area. We also integrate the option to erase the annotation and to zoom in to focus on curve portions. We also pay attention to the layout of the information so as to facilitate the decision process for the areas to be annotated. We combine a view where the user can compare the original curve in blue to its binarized version in green, as can be seen in the workbench in Figure 16 in orange and in Figure 17. We display information related to the curve such as the household's ID, the type of appliance or circuit that is represented and the day on which the data were recorded. In order to provide side cues on how the data should be annotated, we add the next 6 days in the right panel for the same appliance and household and always normalize the graph's y-axis to the appliance's maximum power reading over all data available for the considered household to avoid scaling confusion. We preserve the chronology of the curves by displaying the current curve in the left panel, while the next days are shown on the right side. These measures are embedded to guarantee consistency in the annotation process and to provide side information to the annotator.

4.1.3.2 Gamification
In our setup, we assume that users are content contributors. In particular, we intend for the annotations to be provided through crowdsourcing by domain experts and thus to be trustworthy data, as they

have the necessary knowledge to provide the appropriate labeling of the data. Since annotating energy datasets is an activity that can hardly be decomposed into independent microtasks that anyone can complete, we differentiate ourselves from the usage of other crowdsourcing platforms. In addition, using services like AMT would imply having to monetize the effort and evaluate the quality of the workers' contribution, or even to select the appropriate workers [148].
Regardless, in an effort to motivate users' participation, we integrate some gamification concepts to foster user engagement [70, 252]. We focus here on two intertwined techniques, namely feedback through performance tracking [119] and the usage of badges. From a socio-psychological standpoint, badges offer a set of attributes, which combine educational and social influences on users' motivation [15].

performance tracking Performance tracking can be twofold: allowing the user to keep track of their own progress or to position their contribution in comparison with the rest of the participants. Live feedback on the user's performance assesses the user's past contribution and contributes to their motivation [119]. We implement this feedback in the performance panel, which is located at the left of the workbench so as not to distract the user too much from their task, while still being close to the eye if the user wants to peek at the information, as can be seen in blue in Figure 16. The user's personal performance combines the number of data points and the equivalent number of curves that were submitted, the number of days since signing up, and the user's best daily performance. We also display the user's past 7-day performance in the form of 7 squares that can take varying shades of green depending on the number of submissions for each day (white for 0, a pastel green for 1-2, an apple green for 3-9 and a dark green for over 10). By placing the cursor above a square, the user can view the exact number of submissions for a specific day. As will be explained in Section 4.1.5, the user's contribution is rewarded by the release of the data. The historical feature can assist the user in the scheduling of their contribution and motivate them to provide submissions frequently, until the data are unlocked.
We introduce competition by showcasing the user's performance against the other group members with a leader board, as can be seen in pink dotted lines in Figure 16. The information appears in the welcoming section so as to be the first information displayed upon logging in. This not only should motivate the user to improve their rate of contribution as they compare themselves to others, but also provides recognition as a reward, as all annotators are faced with this information [119]. As mentioned in Section 4.1.2, we could add more leader boards based on categories to value the users' performance in subgroups where they would rank higher (based on their affiliation, or on the continent or country they are located in). We add

information about the collaborative effort of the community in the form of progress bars in the performance panel on the right side of the workbench, as can be seen in pink solid lines in Figure 16. This information consists of the progress in labeling the time series (distinctively for each time series and in a consolidated manner, where the minimum number of annotators has been reached for each time series).
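The 7-day history coloring described above follows a simple thresholding rule; the sketch below spells it out. The shade names and the handling of exactly 10 submissions (which the thresholds above leave ambiguous) are our assumptions.

```python
# Sketch of the 7-day history coloring: white for 0 submissions, pastel green
# for 1-2, apple green for 3-9, dark green for more than that (we group 10
# with the darkest shade here, an assumption on our part).
def shade(submissions_per_day):
    if submissions_per_day == 0:
        return "white"
    if submissions_per_day <= 2:
        return "pastel-green"
    if submissions_per_day <= 9:
        return "apple-green"
    return "dark-green"

week = [0, 1, 4, 12, 0, 2, 7]
print([shade(n) for n in week])
```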

badges As platforms such as Stack Overflow⁷ provide free support for users from varying backgrounds to ask questions and contribute answers, badges were introduced to reward contributions. As a consequence, this also provides peers with an appreciation of the contributors' knowledge and pedigree. Carefully placing badges can steer the user's behavior towards targets set by the platform designer, and this effect increases as users approach the boundary to gain them [13]. Content generation of geo-tagging data was also boosted on Foursquare by the usage and constant addition of new badges to be acquired by users who were checking in at places they visited. Using badges not only reinforces user loyalty or boosts and rewards performance; some pedagogical feedback can also be given to the user [252].
Our badges are awarded for different types of behaviors and can be grouped in different categories, as can be seen in the green area in Figure 16. In the following, we explain the goals that should be achieved with the diverse categories of badges we intended to include. Some side effects of awarding some badges can be linked to performance tracking as well. We use an alert box in green to attract the user's attention to the acquisition of the latest badge. As we cannot expect the user to read the information page that also provides an overview of how badges can be obtained, the box also contains an explanation of why they obtained the badge. The badges are grouped by category so as to facilitate the reading of the information and keeping track of the badges acquired. By using this design we acknowledge the user's performance in real time and, by incorporating an additional way of signaling that a badge was acquired, we avoid the information getting lost (pop-ups can be distracting and are habitually closed, as inconvenient alert windows are usually associated with advertisements).

Submissions Badges We differentiate between badges that are permanently awarded for a particular milestone and ephemeral badges, for which a regular performance has to be provided in order to maintain them in the user's badge collection. While we provide an information page that not only describes the purpose of the framework but also how to acquire badges, we include some badges that are designed to

7 http://stackoverflow.com/

positively reinforce the user's interaction with the annotation system by acknowledging their contribution. For example, we add the submission badges, which are received when a given number of submissions is achieved. A contribution is already rewarded after a single submission, which validates the user's first interaction with the system as correct and successful. The submission category is at the moment the only one intended to use leveling for the same type of badge, so as to reinforce a given purpose, which is in our case to attract more submissions. These badges are relatively easy to achieve, so as to motivate the user to contribute even on a small scale. The target number of submissions can of course be extended to encourage the user to contribute more.

Expertise Badges We also encourage exploration by having expert badges that are awarded as a larger range of appliance types is annotated. Using both the random and the appliance selection modes, the succession of curves that are assigned to them or the different appliances / circuits that are picked by affinity can lead the user to collect an expertise badge by chance. Although we could have used levels such as with the submission badges, we first decided to reward curiosity, which means that once all appliances in a group have been annotated at least once, the badge is delivered to the user. We could extend it to other expertise badges to foster relentlessness in a given field or for one specific appliance or circuit. We establish natural groupings of appliances / circuits and create the corresponding badges. For example, an expert badge is awarded once all appliances that could be found in a group have been submitted at least once. Concretely, the bathroom expert badge is awarded for the appliances linked to the bathroom environment, while the chef badge rewards curiosity in the kitchen area. The climate expert badge can be obtained by annotating all corresponding climate regulation appliances. The explorer badge is awarded for thinking outside the box, for submitting annotations for appliances that are not so widespread across the dataset, such as a wine fridge or appliances with the unknown label. The light expert badge can be gained by labeling all lights, while the outdoor badge relates to appliances that can be found outside of the household. The home owner badge is awarded once all appliances in a household have been annotated at least once and thus depends on the dispatcher's selection.
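The coverage rule behind these expertise badges reduces to a subset check, as the sketch below illustrates; the group contents listed here are examples of ours, not the platform's exact appliance lists.

```python
# Sketch of the expertise-badge rule: a badge is granted once every appliance
# in its group has been annotated at least once by the user.
BADGE_GROUPS = {
    "chef": {"oven1", "microwave1", "dishwasher1", "refrigerator1"},
    "light expert": {"lights_plugs1", "lights_plugs2"},
}

def earned_badges(annotated_appliances, groups=BADGE_GROUPS):
    done = set(annotated_appliances)
    return [name for name, required in groups.items() if required <= done]

print(earned_badges({"oven1", "microwave1", "dishwasher1", "refrigerator1"}))
```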

Ephemeral Badges The previously presented badges are awarded permanently. To influence the user's loyalty and thus frequent contribution, we add ephemeral badges. Daily attributed badges reward current performance and consistency, and require the user to contribute frequently over time. Top contributor badges are awarded for the user's ranking over the previous day and require

that they top the other participants on the current day to retain the badge. The endurance badge is intended as a motivational tool that is triggered once the user has provided 10 submissions over the course of the current day. We also add the frequent flyer badge, which can be obtained once the user has contributed at least once per day on five occasions over the course of a week. This badge can be kept as long as the previously explained rule is respected in the span of 7 days. Finally, the champion badge rewards a contributor who has annotated all curves in the dataset, to target over-achievers. We could of course add more rules, more badges and probably a continuous renewal process to integrate more badges, in a similar way to Foursquare, to retain users.
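The two ephemeral rules most amenable to automation are sketched below under simplified assumptions about how submission timestamps are stored; the function names are ours.

```python
# Sketch of the ephemeral badge rules: endurance (10 submissions on the
# current day) and frequent flyer (at least one submission on 5 different
# days within the last 7 days). Date handling is simplified.
from datetime import date, timedelta

def endurance(submission_dates, today=None):
    today = today or date.today()
    return sum(1 for d in submission_dates if d == today) >= 10

def frequent_flyer(submission_dates, today=None):
    today = today or date.today()
    window = {today - timedelta(days=i) for i in range(7)}
    active_days = {d for d in submission_dates if d in window}
    return len(active_days) >= 5
```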

4.1.4 Results

To the best of our knowledge, CAFED is the first system that provides a dynamic attribution of time series to be annotated by expert users, while consolidating the already annotated traces. Also, through the platform the results are stored and compiled in a ready-to-deploy format. Most users have easily associated the annotation process with its highlighter-and-paper equivalent. Users reported requiring a few seconds to a few minutes (in the case of very segmented portions of curves to be annotated accurately) to commit their annotations. The first badge is awarded after a single submission, and users reported that it was perceived as a confirmation that they had correctly interacted with the platform. Most users have taken great care with zooming in and out to accurately indicate the start and the end of the task. When the users thought that they needed to justify their annotation, they provided us with an explanation of their reasoning. From their feedback we have realized that more features are needed to allow a comparison with other users' annotations.
The top three users provided the majority of the annotations, contributing 600 annotations in the span of 2 weeks and devoting an average of 90 minutes per day to doing so. We provide an overview of the data collected so far in Table 4.

4.1.5 Giving Back to the Community

Substantial progress has been enabled through the public release of datasets. Through this framework, we intend to give back to the community by providing access to annotated data. We follow in the footsteps of platforms such as Pecan Street Dataport or NILMTK⁸ to provide unified access to an online platform that facilitates the creation of manually labeled ground truth data and their dissemination in the community.

8 http://nilmtk.github.io/

Table 4: Summary of the collected manual labels

Result                                  Value
Total # annotated curves                4856
# curves annotated by 3 annotators      469
# curves annotated by 2 annotators      572
# curves annotated by 1 annotator       2548
% curves annotated by 3 annotators      0.5%
# users                                 9
# badges distributed                    174


4.1.5.1 Combining the Results
Although we are targeting domain expert users, we envision that their annotations will not always agree. However, by relying on the wisdom of the crowd, we decided to use majority voting to consolidate the annotations of the data points. Concretely, each curve is assigned to an odd number of annotators and the final outcome relies on the majority's decision for each measurement. We start by requiring a minimum of 3 annotators per curve, which means that at least two matching annotations for a given data point are necessary and will eventually determine its value. To increase the credibility and quality of the data as the number of converging decisions grows towards a consensus, we decided not to stop once the threshold of 3 annotators per curve has been reached, but to continue to the next odd number, and so on.
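The per-measurement majority vote can be written in a few lines; the sketch below assumes the binary annotation vectors introduced earlier and is only illustrative of the consolidation rule, not the platform's code.

```python
# Sketch of the per-minute majority vote used to consolidate annotations:
# each curve is labeled by an odd number of annotators and every data point
# takes the majority value.
import numpy as np

def consolidate(annotations):
    """annotations: list of equal-length 0/1 arrays (odd count)."""
    stacked = np.vstack(annotations)
    votes = stacked.sum(axis=0)                  # number of "active" votes
    return (votes > len(annotations) / 2).astype(int)

gold = consolidate([np.array([0, 1, 1, 0]),
                    np.array([0, 1, 0, 0]),
                    np.array([1, 1, 1, 0])])     # -> [0, 1, 1, 0]
```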

4.1.5.2 Downloading the Data
Our goal is to share these ground truth data with the community. However, this is only possible once there are data to be shared. We decided to release the dataset progressively to contributors, as they reach certain levels of contribution. This can be seen as another gamification technique. In this perspective, we opt to reward frequent and numerous submissions. This is represented by the combination of two badges, namely the endurance and the frequent flyer badges, in the form of the download badge. This means that all the data available can be downloaded as soon as both badges are present in the user's badge collection. The user could of course provide only 1 submission per day on 4 days of a week and provide 10 submissions on the 5th day, but this still means they will have contributed 14 submissions. They will need to have obtained the download badge again to maintain their access to the current set of annotated data. The absolute figures in terms of available data to be downloaded are subject to change as the data provided by the community increase.

4.1.6 Conclusion and Discussion

4.1.6.1 Lessons Learned
The curve fetching engine requires optimized access to the time series. This requires a clear entity-relationship model, optimized indexing and the creation of assisting tables (as table joins can be inefficient if only partial information is required) for enabling the data queries. Depending on the granularity of the measurements, extensive care has to be taken to estimate the number of records to be stored in the database. Inadequately sized data types will also quickly overflow. Having separate but replicated development and production environments allowed us to experiment with framework changes without impacting the user's experience too much. When we experienced data issues, backups allowed us to revert to previous versions, and additional data consistency constraints avoided any data loss.
We launched a small user study to determine where our design was flawed. At the moment, we have tested our platform with 9 users who have provided over 4500 annotated curves. We have taken their comments into account in the design and the improvement of the platform. We realized that some features that seemed obvious to us, although documented in the Help section, were not correctly identified or used by the users. This is why we had to incorporate a help video and help markers in the workbench to direct the users' attention to the embedded functionalities. Also, a detail that can greatly impact the quality of the annotation is the y-axis scaling before displaying the curves to the user. Dynamically scaling the y-axis to the current curve's data would produce inconsistency: lower power measurements (from the noise or the baseline) that would go unnoticed in the presence of active measurements would become visible for a day without the residents' activity and could be annotated. So, we scaled all curves for the same household and appliance to the maximum value. After this change, users communicated that they did not necessarily notice the y-axis scale and were mostly looking at the shape of the curve, using the y-axis for confirmation.
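The scaling rule adopted after this lesson can be stated compactly; the sketch below assumes the daily curves of one household/appliance pair are available as arrays and simply fixes one shared y-axis limit for all of them.

```python
# Sketch of the adopted scaling rule: every daily curve of a given
# household/appliance pair is plotted against that pair's maximum reading
# over the whole dataset, instead of the current day's maximum.
import numpy as np

def y_axis_limit(daily_curves):
    """daily_curves: list of power arrays for one household/appliance pair."""
    return max(float(np.max(c)) for c in daily_curves)

# All days share the same upper limit, so baseline noise on an idle day is
# not blown up to look like activity.
```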

4.1.6.2 Future Work
We have presented a modular plugin that is accessible to the community via a web platform and combines design features to facilitate the annotation process and to engage the user. We follow in the footsteps

of initiatives such as WikiEnergy or NILMTK. We do not exclude merging all tools under the same platform to regroup the efforts to provide access to data and tools to the community.
To improve the experience with the tool, we could add the possibility to search for annotation examples from other users. Similarly to Stack Overflow, and as explained in [252] and in [15], this could show the status of different contributors by allowing them to interact with each other. This could be enabled by allowing a user to search for similar content annotated by others and by displaying the user's badge status with their username. If the plugin were to be merged with another platform like WikiEnergy, which allows additional interactions between users, such as posting questions on a forum, we could display a summary of acquired badges associated with the username of the poster. Also, we would not need to restrict the labeling to binary decisions solely, but could easily adapt the platform to incorporate multi-label problems by extending the toolbox at the disposal of the users, for example to annotate activities that took place during the day.
In order to maintain the quality of the annotations and to prevent unintentional mistakes, we could add an amend option that allows the user to correct previously submitted data. We could also envision pre-selecting active sections and presenting the result to the user, only requiring them to validate or correct the pre-computed result, similarly to the examples developed in [123, 183]. Also, since we have ourselves annotated an extensive number of curves, these can be used as trusted data to verify and validate new users' contributions and discard malicious contributors, as will be presented in Section 4.2.
In this section, we developed a modular plugin that can be easily adapted to fit other datasets. By unifying the access to other datasets, we would also prevent ad-hoc solutions where each researcher would have to build their own system, and we show that the labels can be extended to encompass more events that determine when an appliance is active or idle. In the next section, we will examine how non-expert users' work quality can be quantified.

4.2 crowdsourcing through user expertise quantification

While tasks such as segmenting images or determining the sentiment expressed in a sentence can be assigned to the general public, others require background knowledge and thus expert users need to be selected. In the case of energy datasets, acquiring data represents an obstacle to developing data-driven methods for energy efficiency, due to prohibitive monetary and time costs linked to the instrumentation of households in order to monitor their energy consumption. Moreover, most existing datasets only contain pure power time series, without appliances' states or additional labels. To identify the human activities that are responsible for the residential energy consumption, such labels are required to determine when a device is in use and when it is idle (incurring stand-by consumption or being off), and by extension to separate human activities triggering energy usage from the baseline consumption. In this section, we build upon CAFED to evaluate and distinguish the performance of expert users against that of regular users in the annotation of power time series. Our contribution in this section consists in i) providing a thorough evaluation of the performance of general users in comparison with experts in the labeling of energy time series, ii) showing data-driven approaches to address quality issues with crowdsourcing and iii) providing design insights to improve workers' submission quality. Through two user studies, one of which was run on Amazon Mechanical Turk (AMT), with curated benchmark annotation tasks, we provide data-driven and efficient techniques to detect weak and adversarial workers and to promote users when the contributors' user base is limited. Additionally, we show that, if carefully selected, the seed gold standard tasks can be reduced to a small number of tasks that are representative enough to determine the user's expertise and predict crowd-combined annotations with high precision.

4.2.1 Crowdsourcing Manual Labels Through the General Public

The development of learning algorithms calls for data to improve and evaluate the accuracy of their outcomes. Before the spread of online platforms, acquiring ground truth data was tedious, as it was difficult and costly to recruit workers to perform specific tasks. These often had to be solved by benevolent lab mates, so it took considerable time to collect such datasets. Nowadays, the majority of the micro-tasks that are present on AMT or CrowdFlower consist of image and text labeling and have contributed to building large-scale datasets that have allowed progress in the fields of computer vision and natural language processing. However, the introduction of a monetary gain, instead of the benevolence of fellow researchers or acquaintances to label such data, can lead to the abuse of the system to increase workers' remuneration, at the expense of the quality of the data.
While obtaining labels for text or image content can be distributed to a larger audience of workers due to the nature of the tasks themselves, and can piggyback on existing systems such as CAPTCHAs, crowdsourcing tasks for different fields such as labeling genes or locating volcanoes in satellite images requires domain expertise that is not widely available to the general public. Energy analytics, where data are obtained through the instrumentation of households to obtain power data from dwellings, has benefited from

the adoption of smart meters. Such datasets were released by different research groups and organizations and contain aggregated load consumption at the household level at a finer granularity. However, for the development of human activity-level or, more generally, event-based algorithms linked to the consumption of energy caused by households' residents, more labels that can be used for training and testing the algorithms are required. This is due to the fact that new datasets have to be collected, to include more appliances and real-time annotations from the residents: existing datasets have the shortcomings of having either been collected at coarser time granularities, for shorter periods, for fewer appliances (sometimes having only aggregated household consumption), or simply without event-based labels (appliance states or human activities). High monetary costs to carry out data collections reliably have hindered the advances in this domain. They are mostly related to the complexity of instrumenting households: the type of electrical appliances and the electrical wiring can force the sub-metering to be performed at the circuit level and requires expensive hardware and the assistance of certified electricians, preventing the usage of cheaper alternatives such as smart plugs that can be inserted between the appliance's plug and the electrical outlet. Our CAFED⁹ [39] represented the first effort to retrofit labeling on an existing dataset by leveraging the wisdom of domain experts to annotate an appliance as being active or idle, based on the time series representing its power consumption.
Due to the low availability of users who can contribute to such a system, we leverage online methods to adaptively evaluate and adjust a user's score to the task's difficulty. This salvages as many annotations as possible and promotes promising users, while guaranteeing the quality of the system by being able to detect weak or adversarial contributors rapidly. Moreover, if the gold standard tasks are well curated, using domain knowledge and accounting for the tasks' difficulty, few of these tasks are necessary to evaluate the users' expertise levels. These can be used to predict crowd-combined annotations with high accuracy. We also found that coaching improves the performance of regular users, setting it closer to that of experts. This shows that the research in the energy domain can benefit from paradigms and advances in big data by leveraging crowdsourcing to collect and consolidate datasets and benefit from the wisdom of the crowd. In the following, we will review the related work in Section 4.2.2, then we will present two user studies for collecting the data in Section 4.2.3. We will then detail our methods for scoring the users based on their performance in Section 4.2.4, present our results in Section 4.2.5 and conclude in Section 4.2.6.

9 https://cafed.inf.ethz.ch

4.2.2 Related Work

EM [68] has been used to provide scores to evaluate the quality of the labeling of categorical data, to separate between error and bias [114] and to compare expert (geologists') annotations against the performance of an algorithm in the case of images [215]. The performance of expert and non-expert users has been evaluated for natural language processing tasks, using expert-annotated data to correct the annotation bias [216]. Probabilistic models have been used for inferring labels for images when the expertise of the annotators is unknown and the difficulty of the task is accounted for [240, 242]. Expanding the realm of tasks that can be solved by crowdsourcing suggests taking steps towards improving the workflow and the design of the tasks to be assigned to the workers, valuing the participation of trusted users, and expanding the existing platforms by integrating machine learning and AI techniques to improve quality [132]. Recent work has established that behavioral cues based on the interaction with the platform for information retrieval tasks are more successful at detecting fraudulent interactions when compared to baseline gold standard tasks solved by experts [124]. Splitting the annotations of information retrieval tasks into a training phase and a test phase has been surveyed [147]. Socio-demographic features have also been leveraged to isolate high-quality workers for solving multiple-choice questions [148]. To improve the quality of the contributions, gamification techniques have proven to be more successful than filtering the workers based on their countries [79].
Energy datasets mostly consist of time series of power measurements. The development of algorithms for extracting knowledge, such as the states of appliances, from these data requires ground truth labels. The monetary costs involved in instrumenting households to obtain energy measurements [84] have hindered the emergence of new data collections to mitigate the absence of events that triggered the energy consumption. Crowdsourcing has not been used extensively for acquiring time series labeling, despite some initiatives for obtaining expert-annotated data such as the CAFED platform [39]. The type of tasks differs from previous crowdsourcing initiatives, as they require some background knowledge to be solved successfully and are more straining than the classical image or text labeling micro-tasks, which can be carried out in the form of multiple-choice questions or by entering a value in a text box, a format that fits existing platforms such as AMT or CrowdFlower, but not time series data.

4.2.3 User Studies

For the purpose of this evaluation, we ran two user studies, one with physical access to the participants and the second one on AMT, to compare the performance in annotating power curves of domain experts and non-experts.

Figure 18: Tasks' difficulty levels and coaching on CAFED. (a) Easy task (car, single appliance): no baseline consumption. (b) Medium task (livingroom, circuit): baseline, periodic (timer) consumption, gold standard overlay. (c) Difficult task (refrigerator, single appliance): high oscillation, periodic pattern. (d) Coaching for improving the energy knowledge: focus on two distinct blocks to decide if they are linked to the same activity.

The manual labeling consists in indicating, on a daily time series that represents a single appliance or a circuit (e.g., data acquired through a multiplug or at the room level), when the appliance is actively used (having been switched on by a resident) or idle (being in stand-by mode or off) [37, 39], and thus only exhibiting baseline power consumption. The setup of both experiments is representative of the current situation in research fields where access to domain experts is limited, but the pool of users with general and common knowledge is large. Benchmark energy curves were selected for different appliances from the Pecan Street dataset, with varying degrees of difficulty (that would require domain knowledge to solve more accurately), on our crowdsourcing CAFED platform,¹⁰ which contains a gamification module to dispense badges for user engagement [39]. Additionally, we gathered 3 experts who have extensively worked with power data.

10 https://cafed-study.inf.ethz.ch/
The experts' contributions were used to create the ground truth or gold standard, which could then be compared to the regular users' labeling.

4.2.3.1 Experiment Description
As can be seen in Figure 18, we defined i) easy tasks as tasks without baseline consumption, where the active consumption consists of everything above 0 [W], as in Figure 18a; ii) medium tasks, which require additional knowledge such as the presence of baseline consumption or context (type of appliance or circuit), as in Figure 18b; and iii) difficult tasks, which rely on the detection of periodic patterns with high oscillation, such as fridges, and on the mechanical functioning of an appliance (in the case of a fridge, the compressor, or in the case of multi-state appliances such as dishwashers, being able to link the consumption to different stages in the washing process), as in Figure 18c. The annotation task can be time consuming as the curves have a 1-minute granularity, meaning that 1440 data points per daily curve have to be annotated and that careful annotators are required to meticulously inspect the curves and zoom in and out to decide when to transition from active to idle and conversely. Each task would therefore be fulfilled within a few seconds to a few minutes, depending on its difficulty level. The curves were labeled both by the experts and the regular users.
We ran controlled experiments with both regular user groups, implementing different stages to assess their performance in more detail. This consisted of four phases, described in the following.

• Phase 1 was a survey to collect background information about our participants' familiarity and knowledge with regard to energy jargon and the functioning of appliances. The survey can be found in the Appendix, in Section A.2.

• Phase 2 consisted in solving the 30 (8 in the case of AMT) predefined annotation tasks. In order not to influence the participants, we only provided a quick-start video tutorial on how to interact with the platform.

• Phase 3 offered user coaching: we discussed their experience with Phase 2 and tutored them (this was replaced with videos in the case of AMT). This covered exploiting information displayed on the platform and deepening their knowledge about the electrical consumption of appliances, as in Figure 18d. They were made aware of the following:
– they should make use of the appliance's type and its operation mode;
– stand-by consumption of different appliances occurs even when the devices are idle;

– they should identify periodicity by looking at the time axis in the annotation panel and make use of the curves extracted from the same week in the right panel to distinguish extraordinary patterns from normal functioning;
– fluctuating power consumption depends on the usage and the context (e.g., heater in a cold or hot room);
– electricity patterns can exhibit fluctuations due to metering noise or inner circuits.

• Phase 4 was a repetition of Phase 2. This round simulated having a group of more experienced users, already knowing how to interact with the platform and having some basic knowledge about energy dataset labeling. We expected the second annotation session to improve the quality of the labeling. This would imply that some guidance is needed in order to get acceptable results.

In order to distinguish experts' from non-experts' work, we incorporated additional features that were collected during the study. Not only did we record the time spent to solve each task, but also the mouse movements, to capture the users' interaction with the platform (the usage of the annotation tools, the side panel with additional days, etc.), which are also used in the next section.

4.2.3.2 Physical Access to the Participants
We recruited 7 users in Zurich with diverse levels of familiarity with the energy jargon, diverse education backgrounds and occupations, with ages ranging from 16 to 50. They were selected for their interest in improving the energy efficiency in households and their motivation for taking part in this experiment and solving these annotation tasks without remuneration. They solved 30 tasks without coaching, then repeated the labeling on the same curves, but in a different order, after receiving our energy tutorial.

4.2.3.3 AMT Workers
We adapted our platform to enable compatibility with AMT by formatting the experiment as a survey to be solved on our external platform; the Human Intelligence Task (HIT) is validated upon the submission of a completion code. For this purpose, we prepared single-use matching tokens that were embedded in the URL, which is only made available to workers if they accept the HIT and if they have not already viewed the survey task previously, by adapting TurkGate as a gateway for incoming workers on our server [94]. JavaScript code in the HIT extracts the AMT worker's id and the HIT's assignment id, and TurkGate's URL parser module extracts this information from the request. The AMT worker ID, the time of the request and the URL are stored by

TurkGate in a MySQL database. Upon verification in the MySQL database, the TurkGate gateway redirects the AMT worker to our CAFED web server. To comply with AMT's regulations, we automatically created accounts matching the tokens, initialized with random passwords, as the gateway redirected users to our platform if they had not accessed our web server before (which is determined based on the AMT worker ID). All phases described above loaded automatically one after the other, and the coaching was provided in the form of short videos that had to be watched to gain access to the last annotation phase. The workers were randomly assigned to 3 groups of 20. Each group annotated 4 tasks from one difficulty level, followed by 4 tasks from another difficulty level, namely i) medium followed by difficult tasks, ii) difficult followed by medium tasks and iii) easy followed by difficult tasks. Then, after the tutoring videos, they proceeded to annotate similar tasks again. Each worker with an approval rate of at least 75% was paid 3 USD for completing the experiment and providing the corresponding completion token.
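The single-use token scheme boils down to issuing a random token per worker and accepting it exactly once; the sketch below is a simplification of ours with an in-memory store and a hypothetical URL parameter, whereas the actual setup relies on TurkGate and a MySQL database.

```python
# Rough sketch of a single-use token scheme (our simplification, not the
# TurkGate implementation): a token embedded in the survey URL is accepted
# exactly once and an account is auto-created for the matching worker.
import secrets

issued_tokens = set()        # tokens handed out, not yet used
accounts = {}                # worker_id -> auto-created account info

def issue_token():
    token = secrets.token_urlsafe(16)
    issued_tokens.add(token)
    # "?token=" is a hypothetical parameter format for illustration only
    return f"https://cafed-study.inf.ethz.ch/?token={token}"

def redeem(token, worker_id):
    if token not in issued_tokens or worker_id in accounts:
        return None                      # unknown token or returning worker
    issued_tokens.remove(token)          # single use
    accounts[worker_id] = {"password": secrets.token_urlsafe(12)}
    return accounts[worker_id]
```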

4.2.4 Methodology

In the following, we show how the data obtained through the user studies are treated and analyzed. We compare and analyze non-expert against expert users in three different manners. Our goal is to detect weak workers, to either ignore their input as soon as possible or ban them from the system. In a system with few potential users, we also want to promote good workers, by quantifying their annotation quality and accounting for its fluctuation. We focus on the online scoring of users in the order in which the tasks are solved and on the classification of non-expert and expert contributions. For these two aspects of our analysis, we present the features that can be extracted from the data. Then, we explore how to combine different users' annotations based on their expertise level and predict how the wisdom of the crowd performs against the experts' annotations.
We first describe the features that can be extracted from the data and then explain how they are used in the online scoring, the classification and the prediction. The gold standard is extracted from the annotations provided by the expert contributors by applying majority voting; this allows us to consolidate the experts' annotations by enabling consensus [39].
features We distinguish scoring features, which compare a regular user's work against the gold standard computed from the experts, from features extracted from single annotation tasks.
scoring To distinguish the regular users' from the experts' work, we score their contribution against the gold standard produced by the

experts. These scores are computed for each user and each task individually. They quantify the worker's quality, reflected by their performance in annotating the benchmark curves, by looking at the accuracy of the outcome in comparison with the gold standard. Additionally, the variety of scores addresses potential malicious contributions by considering different attack scenarios. In the following, we will formally define the scores that can be extracted by comparing the users' annotations.

Hamming distance The annotated daily curves are binary vectors. To compare different annotators' performances for the same curve, we can turn to distance measures for binary vectors. To compare two binary vectors x and y of dimension d (representing when an appliance is considered active as 1 or idle as 0), one such measure is their Hamming distance, as described in Equation 4.

d_H(x, y) = \sum_{k=0}^{d-1} |y_k - x_k|    (4)

We propose a score based on the Hamming distance: the percentage of correctly annotated minutes per daily curve. The score, as described in Equation 5, is computed for each task i and user j.

score_{H_{i,j}} = 1 - \frac{d_H(t_{i,j}, g_i)}{d}    (5)

This score relies on the observation that most appliances are not always active (they are either in stand-by mode or off most of the day). This is reflected in how the curves should be annotated: the majority of the time the appliance or circuit should be considered idle, and the annotation binary vector should therefore contain a majority of zeros. If our score took the whole annotation binary vector into consideration, it would allow a user who provided minimal effort and labeled the curve as all idle (thus all zeros) to achieve a high score. To prevent such an attack scenario, we focus on the proportion of true positive labels over the vector's length d in comparison with the manually annotated ground truth provided by the experts, instead of being biased by the true negatives' proportion. We define the score_{H_{i,j}} for the annotation t_{i,j} for task i by user j and its corresponding gold standard g_i as in Equation 5.
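As a minimal sketch, Equations 4 and 5 translate directly into a few lines of Python; the 1440-minute example curve below is hypothetical.

import numpy as np

def hamming_distance(x: np.ndarray, y: np.ndarray) -> int:
    """Hamming distance between two binary annotation vectors (Equation 4)."""
    return int(np.sum(np.abs(y - x)))

def score_h(annotation: np.ndarray, gold: np.ndarray) -> float:
    """Fraction of correctly annotated minutes per daily curve (Equation 5)."""
    d = len(gold)
    return 1.0 - hamming_distance(annotation, gold) / d

# Example: a 1440-minute daily curve where the user misses 10 active minutes.
gold = np.zeros(1440, dtype=int); gold[420:480] = 1
user = gold.copy(); user[470:480] = 0
print(score_h(user, gold))  # 1 - 10/1440, roughly 0.993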

Focusing on the Classification Confusion Matrix We address the case of a user who provides annotations with zeros only by accounting only for the true positives, thus drastically decreasing their score. However, a user who instead only provides annotations with ones would achieve the highest score of 1: they are indeed guaranteed to correctly classify all of the active sections of the curve, although this labeling is as damaging as the all-zeros scenario. To prevent this, we leverage the classification confusion matrix for additional scores. This is why we consider the true positives TP as another score. Additionally, we can leverage i) the false negatives FN, where the user annotates an appliance as being idle although it is considered active by the experts, and ii) the false positives FP, where, conversely, the appliance is annotated as active despite being marked as idle by the experts. We focus on comparing the parts annotated as active by the user with those annotated as active in the gold standard. This means using TP, FN and FP to define the ratio of parts correctly annotated as active. For the annotation t_{i,j} for task i provided by user j, we define the score_{A_{i,j}} as in Equation 6.

score_{A_{i,j}} = \frac{TP_{i,j}}{TP_{i,j} + FP_{i,j} + FN_{i,j}}    (6)

We can also make use of traditional machine learning scores such as precision_{i,j}, recall_{i,j} and the F1-score F1_{i,j}, defined for task i's annotation t_{i,j} provided by user j.
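A sketch of the confusion-matrix-based scores (Equation 6 and the standard precision, recall and F1 measures), assuming annotations are stored as NumPy binary vectors:

import numpy as np

def confusion_counts(annotation: np.ndarray, gold: np.ndarray):
    """True/false positives and negatives of a binary annotation vs. the gold standard."""
    tp = int(np.sum((annotation == 1) & (gold == 1)))
    tn = int(np.sum((annotation == 0) & (gold == 0)))
    fp = int(np.sum((annotation == 1) & (gold == 0)))
    fn = int(np.sum((annotation == 0) & (gold == 1)))
    return tp, tn, fp, fn

def score_a(annotation: np.ndarray, gold: np.ndarray) -> float:
    """Ratio of parts correctly annotated as active (Equation 6)."""
    tp, _, fp, fn = confusion_counts(annotation, gold)
    return tp / (tp + fp + fn) if (tp + fp + fn) else 0.0

def f1(annotation: np.ndarray, gold: np.ndarray) -> float:
    """F1-score of the annotation against the gold standard."""
    tp, _, fp, fn = confusion_counts(annotation, gold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0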

WAVE Algorithm The previous scores were defined per task and per user. To compare multiple users' performances for one specific task, we use the weight-adjusted voting algorithm for ensembles of classifiers (WAVE) [130, 131]. The algorithm takes as input an (m · d) × n matrix of n vectors v_i, where for each user i

v_{i_k} = \begin{cases} 1 & \text{if the user annotated the } k\text{th data point correctly} \\ 0 & \text{otherwise} \end{cases}

and computes weight vectors for the users and the data points to be annotated. This relates to our case, where each data point to be annotated in our daily consumption curves corresponds to a WAVE exercise. It follows directly that the input vector v_i, which concatenates all of the m annotation tasks, each being a binary vector of dimension d, into a vector with m · d components, is defined as follows:

v_{i_k} = \begin{cases} 1 & \text{if the user and the gold standard agree on the } k\text{th minute} \\ 0 & \text{otherwise} \end{cases}

The WAVE algorithm outputs the following:

• an (m · d)-dimensional weight vector WAVE_tasks for all the minutes in all the given annotation tasks, which gives more importance to more difficult minutes;

• an n-dimensional weight vector WAVE_users for the n users, which gives more weight to users who correctly label more difficult minutes (in the daily curves to be annotated).

We will use the user weights WAVE_users in the following sections.
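The WAVE computation itself follows [130, 131] and is not reproduced here; the sketch below only assembles the (m · d) × n agreement matrix defined above, assuming annotations and gold standards are stored as per-task binary NumPy vectors keyed by task and user (the container layout is an assumption).

import numpy as np

def wave_input_matrix(annotations: dict, gold: dict, users: list, tasks: list) -> np.ndarray:
    """Build the (m*d) x n agreement matrix fed to WAVE: entry (k, i) is 1 if user i
    agrees with the gold standard on the k-th minute of the m concatenated tasks."""
    columns = []
    for user in users:
        agreements = [(annotations[(task, user)] == gold[task]).astype(int)
                      for task in tasks]             # one binary vector per task
        columns.append(np.concatenate(agreements))   # concatenate the m tasks (m*d entries)
    return np.stack(columns, axis=1)                 # shape: (m*d, n)

The resulting matrix would then be handed to a WAVE implementation, which returns the per-minute weights WAVE_tasks and the per-user weights WAVE_users used below.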

data features

Behavioral Features Our platform allows us to capture behavioral features relating to the annotation tasks. We thus use features relating to the users' interactions while annotating each curve. As can be seen in Figure 18, the platform consists of a tool box, where the user can choose the pencil to highlight zones considered active, the eraser to correct their annotations and a magnifying glass for zooming in or out. The workbench consists of an annotation panel, the yellow area on the left side where the curve to be annotated is displayed, and, on the right, a week-viewer displaying the next 7 days of data, in order to determine whether the curve to annotate reflects an occasional or the usual consumption pattern. We collect the following information for each annotation t_{i,j} for task i and user j:

• #seconds_{i,j} needed to complete the annotation t_{i,j};

• #mousemovements_{i,j} over the tool box for annotating t_{i,j};

• #mousemovements_{i,j} over the annotation panel for annotating t_{i,j};

• #mousemovements_{i,j} over the week-viewer for annotating t_{i,j};

• #milliseconds_{i,j} spent over the tool box area for annotating t_{i,j};

• #milliseconds_{i,j} spent over the annotation area for annotating t_{i,j};

• #milliseconds_{i,j} spent over the week-viewer area for annotating t_{i,j}.

Data characteristics Additionally, we make use of each task i's difficulty level c_{D_i}.

4.2.4.1 Analysis

In this part, we focus on the data analysis to determine how to characterize the regular users' and experts' work.

online scoring To guarantee the quality of the data collected in a crowdsourcing system, we need to detect weak and adversarial workers rapidly, to either ignore their contributions or to ban them. If the number of contributors is limited, we need to salvage as many annotations as possible, by giving some slack to users for whom we detect a temporary decrease in annotation quality if they have proven to perform well in the past. For this reason, we should value the user's expertise with regard to the task's difficulty level and consider that bad performance, if temporary, could be overlooked and explained (weaker knowledge about a specific appliance's functioning, contrasting with solid performance on other types of appliances). We define the combined score c_{i,j} for user j solving the current task i as in Equation 7, where α, β and γ can be used to give more or less weight to the different scores.

c_{i,j} = F1_{i,j}^{\alpha} \cdot score_{H_{i,j}}^{\beta} \cdot score_{A_{i,j}}^{\gamma}    (7)

We analyze the evolution of the annotations' quality as the worker submits them by computing the current combined score c_{i,j} and accounting for the task's difficulty level with the coefficient c_{D_i} (0.2 for easy, 0.3 for medium, 0.5 for difficult). To account for past performances, we consider a remembrance factor α_r ∈ [0, 1] for preferring more recent contributions, which applies an exponential decay over past annotations. We define the online score scoreO_{i,j} for the current ith task by the recurrence relation in Equation 8, with scoreO_{1,j} = c_{1,j}.

scoreO_{i,j} = \frac{c_{D_i}}{\alpha_r + c_{D_i}} \, c_{i,j} + \frac{\alpha_r}{\alpha_r + c_{D_i}} \, scoreO_{i-1,j}    (8)

classification To distinguish between regular users and experts, we use feature vectors representing each annotation task and known machine learning classifiers. We describe in the following which features can be used and which algorithms are suitable.
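A minimal sketch of the combined score and the online-score recurrence defined in Equations 7 and 8 above; the default values for α, β, γ and α_r below are illustrative assumptions, not the ones used in the experiments.

def combined_score(f1_ij, score_h_ij, score_a_ij, alpha=1.0, beta=1.0, gamma=1.0):
    """Combined score c_ij (Equation 7); alpha, beta, gamma weight the individual scores."""
    return (f1_ij ** alpha) * (score_h_ij ** beta) * (score_a_ij ** gamma)

def online_scores(c_list, difficulty_list, alpha_r=0.5):
    """Online score recurrence (Equation 8), with scoreO_1 = c_1, for tasks in solving order.
    difficulty_list holds the coefficients c_Di (0.2 easy, 0.3 medium, 0.5 difficult)."""
    scores = [c_list[0]]
    for c_i, c_d in zip(c_list[1:], difficulty_list[1:]):
        prev = scores[-1]
        scores.append((c_d / (alpha_r + c_d)) * c_i + (alpha_r / (alpha_r + c_d)) * prev)
    return scores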

Feature Vectors We want to classify each annotation provided by each user j as coming either from an expert or from a regular user. For this, we build a feature vector from each task i's specific scores as described in Section 4.2.4. Namely, we use the confusion matrix values TP_{i,j}, TN_{i,j}, FP_{i,j}, FN_{i,j}, precision_{i,j}, recall_{i,j}, and the score_{H_{i,j}} and score_{A_{i,j}} scores. Additionally, we use all task-specific features described in Section 4.2.4, i.e., the interaction features and the task's difficulty level.

Classifiers As our data classes are unbalanced, due to having less expert data than regular user data, we use Adaboost [206] as an ensemble classification method for combining weak classifiers. The weak classifiers that we consider are Naive Bayes, LibLinear, Multi-layer Perceptrons and Random Trees.

prediction In this part, we investigate how to combine the wisdom of the crowd for annotating each task and approach the expert users' annotation level. This means deciding, for each

data point in a curve to be annotated, what value it should take, depending on the contributions of multiple annotators. Classical methods exist for combining the labeling of each curve t_i from each contribution t_{i,j} by each user j, but we take advantage of the user's expertise level to improve the prediction. We want to obtain the kth data point t_{i_k} by combining each t_{i,j_k} provided by a user j.

Majority Voting Majority voting is the simplest approach to combine multiple annotations: it chooses the value supplied by the majority of the n users, as shown in Equation 9.

t_{i_k} = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} t_{i,j_k} > \frac{n}{2} \\ 0 & \text{otherwise} \end{cases}    (9)

This combination is not robust if the majority of the annotators are unknowledgeable or if an attacker creates multiple accounts and feeds the incorrect labeling multiple times.

Weighted Majority Voting The other approach consists in weighting the majority voting [154, 227] according to the workers' expertise level. This allows increasing the influence of more experienced users and diminishing the influence of weaker users. We are looking to define the weights w_j for each user j, normalized over all n users, to compute the predicted label t_{i_k} as in Equation 10.

t_{i_k} = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} w_j \cdot t_{i,j_k} > 0.5 \\ 0 & \text{otherwise} \end{cases}    (10)

To reflect the user's expertise level, we can combine existing scores reflecting the users' performance in labeling against the gold standard, compared to the other users, and account for the task's difficulty contribution coefficient c_{D_i}, as in Equation 11 for each user j over the m tasks to be solved. Then, we obtain

w_j = \frac{WAVE_{users_j}^{\delta} \sum_{i=1}^{m} c_{D_i} \cdot F1_{i,j}^{\alpha} \cdot score_{H_{i,j}}^{\beta} \cdot TP_{i,j}^{\gamma}}{\sum_{j'=1}^{n} WAVE_{users_{j'}}^{\delta} \sum_{i=1}^{m} c_{D_i} \cdot F1_{i,j'}^{\alpha} \cdot score_{H_{i,j'}}^{\beta} \cdot TP_{i,j'}^{\gamma}}    (11)

where the constants α, β, γ and δ can be adjusted to emphasize or reduce the impact of some scores. We will evaluate the prediction from the crowd against the gold standard and observe the robustness of the weights by changing the set of tasks used to build the expertise weights. For this, we proceed with leave-one-out cross-validation and use m − 1 tasks for training and obtaining the expertise weights w_j to predict the annotations for the left-out task.
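A minimal sketch of the expertise-weighted vote of Equations 10 and 11, assuming the per-task scores are available as NumPy arrays; the exponents default to 1 for illustration.

import numpy as np

def expertise_weights(wave_users, c_d, f1, score_h, tp,
                      alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Per-user expertise weights (Equation 11).
    wave_users: shape (n,); c_d: shape (m,); f1, score_h, tp: shape (m, n)."""
    per_user = (wave_users ** delta) * np.sum(
        c_d[:, None] * (f1 ** alpha) * (score_h ** beta) * (tp ** gamma), axis=0)
    return per_user / per_user.sum()            # normalize over all n users

def weighted_majority_vote(annotations, weights):
    """Predicted label per minute (Equation 10). annotations: shape (n, d), binary."""
    return (weights @ annotations > 0.5).astype(int)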

4.2.5 Results

In the following, we discuss the results for the three different axes of our analysis: the online scoring of the users' performances over time, the classification of the annotations as expert or non-expert work, and the prediction of an annotation based on the contributors' expertise levels and the performance of others. Due to our implementation, all the users had distinct IP addresses, and we effectively gathered 60 different AMT workers who were only permitted to take part in the experiment once. By collecting the users' IP addresses, we could geo-tag their origin. The AMT worker population consisted of about 50% US and 27% Indian workers, the rest being spread worldwide. About 17% of the workers had master qualifications and the rest had an above 75% HIT approval rate.

4.2.5.1 Online Scoring

Many factors can influence the quality of the annotations for non-adversarial users, such as their focus, their motivation or the task's difficulty. In the long run, we would like to retain users who usually perform well, accounting for occasional bad results, but also be able to react quickly to weak or malicious workers and expel them from the system. We introduce online scoring to take a user's performance over time into account, and to determine on the fly whether or not to keep them in the system. We parameterize the online scores described in Equation 8 by evaluating the remembrance factor α_r for different values: 0 (forgetting the past), 0.5 and 1, as in Figure 19, for both experiments. As can be seen in Figure 19a and Figure 19b, selecting α_r > 0 allows accounting for past performances, but an overall score that does not take the task difficulty into account would promote users who have solved easier tasks successfully, without guaranteeing that they will succeed at solving medium or difficult tasks. Choosing a larger α_r > 0 yields a smoother online score that is more resilient to punctual bad labeling. This allows being more lenient with users based on past good performance. We also observe that we can select a threshold for accepting or even promoting users, or banning them instead. Additionally, we can clearly distinguish experts' performance from regular users' with a comfortable margin, which enables the selection of a threshold that can be tailored to the sensitivity to bad performance. In the case of the AMT experiment, we encountered 12 poorly performing users (scores below 0.3) and 6 attackers (scores around 0) across all three test groups. These users consistently showed little or no knowledge about the energy domain as assessed in the survey and they spent the least time completing the experiment (less than 20 minutes on average). The best performing users (12) were spread equally

Figure 19: Online scoring for the experiment with physical access to the participants (top) and on AMT (bottom), with varying α_r's and with and without difficulty weighting (w/DW). (a) Experts' online scoring; (b) a user's online scoring (User 30, study with physical access); (c) a rare case of a well-performing AMT user (group 2: difficult, then medium); (d) an AMT user improving after the coaching in the last 8 tasks (group 3: easy, then difficult).

across the test groups when the tasks' difficulty is accounted for and α_r = 1, and do not score below 0.4, as can be seen in Figure 19c. They had generally given correct answers in the survey, but the large majority was not aware of the functioning of a fridge. The average performing users generally knew about fridges' mechanics, but had less knowledge about other appliances and baseline consumption. Workers in the third group (easy, then difficult) solved the difficult tasks as well as the users from the second group (difficult, then medium) before the coaching, while the first group annotated the difficult tasks mostly incorrectly. The results were consistent for both experiments and we generally observed an improvement of the annotation quality after the coaching.

4.2.5.2 Classification

As described previously in Section 4.2.4.1, we use Adaboost to address the class imbalance due to having more regular users than experts. We classify vectors for each annotation as described in Section 4.2.4.1 and normalize the numeric attributes. We perform the evaluation through a 10-fold cross-validation for the datasets consisting of

1. the experts’ and the users’ annotations before receiving the coach- ing,

2. the experts’ and the users’ annotations after receiving the coach- ing,

3. all experts’ and users’ annotations, and obtain the results shown in Table 5. We obtained comparable results for the first and the AMT exper- iments. We present the latter in more details in the following. We notice that the naivest classifier, i.e., the Naive Bayes, performs over- all worst than the Random Tree, LibLinear and Multi-layer Percep- trons. This is due to the class imbalance and that Naive Bayes’ weights are smaller for the class with the least representatives [194]. Overall, the best classification scores are achieved before coaching the regu- lar users, as their performance is improved significantly afterwards (we can observe an improvement of 10-20%), and their work becomes comparable to the experts’ annotations. The combined dataset (con- taining annotations before and after the coaching) shows the worst scores generally, and this is due to the incertitude induced by the im- proved annotations, bringing the non-experts’ closer to the experts’ contributions. The best scores are achieved for the datasets composed of the anno- tations obtained before the coaching session and for the Random Tree, with F1-scores for classifying the experts of 68.1% and 95.2% respec- tively for the regular users. The share of false negatives is mostly due to misclassifying the experts due to the similarity in the annotation of the easy tasks and the medium tasks. The share of false positives is however an indication of the potential for selecting users, whose contribution should be promoted (if looking at the performance on the medium task solving), but we could explain this by the fact that easy tasks were solved as well as the experts.

4.2.5.3 Prediction

As described previously in Section 4.2.4.1, we investigate the prediction of an annotation by combining the work of several workers, computing a weighted majority vote based on the expertise weights as in Equation 10. The scoring is obtained by computing the F1-score, the activity score score_A and the Hamming score score_H between the resulting crowd-combined annotation vector and the respective gold standard. We first examine the leave-one-out cross-validation for predicting one annotation based on computing the expertise weights on the m − 1

Table 5: AMT classification results (precision, recall and F1-score) per class (Expert / User) using Adaboost and weak classifiers (LibLinear, Naive Bayes, Random Tree, Multi-layer Perceptrons), before coaching (BC), after coaching (AC), with and without coaching (All)

Class  Data  PrecE  RecE   F1E    PrecU  RecU   F1U
LL     BC    0.577  0.417  0.484  0.916  0.954  0.935
LL     AC    0.54   0.375  0.443  0.91   0.952  0.931
LL     All   0.494  0.292  0.367  0.9    0.955  0.927
NB     BC    0.332  0.931  0.489  0.986  0.719  0.831
NB     AC    0.357  0.569  0.439  0.929  0.846  0.885
NB     All   0.25   0.875  0.389  0.97   0.607  0.747
RT     BC    0.681  0.681  0.681  0.952  0.952  0.952
RT     AC    0.512  0.583  0.545  0.936  0.917  0.926
RT     All   0.595  0.611  0.603  0.941  0.938  0.939
MLP    BC    0.6    0.125  0.207  0.883  0.988  0.932
MLP    AC    0.286  0.028  0.051  0.872  0.99   0.927
MLP    All   0.375  0.083  0.136  0.877  0.979  0.925

other tasks, as can be seen in Figure 20, where we show the prediction for each task based on using the remaining ones for training, for the experiment with physical access to the participants. As can be seen in Figure 20a, by accounting for the difficulty of the task, as more weight is given to users who have better solved the medium (c_{D_i} = 0.3) and difficult (c_{D_i} = 0.5) tasks, we obtain high average prediction scores (92.3% for the F1-score, 89.7% for the score_A, 97.2% for the score_H) for the curves obtained before the coaching. In Figure 20b, with the same weight distribution, the results have improved for annotations made after the coaching, with average prediction scores of 95.3% for the F1-score, 92.7% for the score_A and 98.9% for the score_H, namely in the annotation of the freezer, which was previously poorly annotated, as can be seen in the second dip in Figure 20a. We examine how selecting a training set built on annotated curves of the same difficulty level influences the scores of the combined annotations, by considering the AMT experiment. In addition to the leave-one-out cross-validation, we use groups of tasks of one difficulty level to predict tasks from another difficulty level. As can be seen in Figure 21, groups 1 and 2 had to annotate the same tasks, but the difficulty levels' sequence was inverted. Similarly to the results seen for the online scoring, the leave-one-out cross-validation produced better predictions for medium tasks in group 2, except for task 12, as can be seen in Figure 21a and Figure 21b. The same applies to difficult tasks, with the exception of task 17.

Figure 20: Leave-one-out cross-validation prediction for the experiment with physical access (easy, medium, difficult). (a) Prediction on the set of curves obtained before the coaching with difficulty weighting (average scores: score_H 97.20%, score_A 89.74%, F1-score 92.31%); (b) prediction on the set of curves obtained after the coaching with difficulty weighting (average scores: score_H 98.94%, score_A 92.68%, F1-score 95.26%).

Additionally, being able to solve the easiest tasks does not correlate with being able to solve the difficult tasks, as group 1's predictions are more accurate than group 3's. We examined the benefit of selecting subgroups of tasks of one difficulty level to predict tasks from another difficulty level. In Figure 21c and Figure 21d, we inverted the groups' sequence, so that we could compare predicting difficult tasks from medium tasks in both groups 1 and 2. As observed previously, group 2 (87.5% for the F1-score, 82.6% for the score_A, 96.5% for the score_H) still performs better than group 1 (85.8% for the F1-score, 81.4% for the score_A, 98.3% for the score_H), with higher average prediction scores. A drastic improvement occurs for the annotation of task 30, whose F1-score improves from 32.3% to 98.5%. Also, the difficult tasks' predictions for group 3 show again that scoring high on the annotation of easy curves does not guarantee an accurate prediction of difficult tasks. This shows that users who are able to solve those tasks successfully have more advanced knowledge about the energy domain: they can generalize across the annotation tasks, regardless of their difficulty.

4.2.6 Conclusion and Discussion

In contrast to more traditional crowdsourcing tasks, in the case of energy research, more elaborate knowledge, beyond common knowledge, linking the power consumption representation to the mechanics behind the functioning of appliances is required. We evaluated regular users' against experts' work through two user studies, one with direct access to the participants and the other deployed as a survey on AMT with a fully autonomous workflow. We assessed the users' degree of familiarity with the energy jargon and the functioning of electrical appliances by integrating a survey testing the users' general conception of the average consumption of diverse

Figure 21: AMT leave-one-out cross-validation prediction (easy, medium, difficult). (a) Leave-one-out cross-validation, group 1 (average scores: score_H 97.46%, score_A 82.39%, F1-score 87.25%); (b) leave-one-out cross-validation, group 2 (score_H 97.96%, score_A 86.13%, F1-score 90.20%); (c) training on medium, predicting difficult, group 1 (score_H 98.27%, score_A 81.40%, F1-score 85.75%); (d) training on medium, predicting difficult, group 2 (score_H 96.52%, score_A 82.55%, F1-score 87.53%).

appliances, from single-state to more complex appliances, or their awareness of baseline consumption, especially when triggered by a periodic consumption pattern. The users often had little to no knowledge about the average consumption of appliances that are generally present in their homes, and this produced lower quality work. However, the performance dramatically improved in some cases when the users were trained to pay attention to certain details while annotating different appliances or circuit-level data. This training should, as much as possible, be intertwined with the design of the HITs. In our case, we could not have used AMT as such, but we proceeded with our own back-end, which allowed us to add training material between annotation sessions. We developed the online scoring to provide an effective way of detecting when to promote a user or discard weaker users. Moreover, if we leverage the difficulty of the annotations and carefully curate the difficult tasks, we can use a small seed of benchmark tasks to improve the prediction quality significantly. This would reduce the workload on the expert users, by requiring fewer gold standard tasks to be collected. Although we also showed that some tasks cannot be solved as well as by experts, we have underlined the necessity, for the designers of a collaborative system for labeling data where domain knowledge is required, to use more domain-specific information to craft the challenge benchmark questions that vet the quality of the workers. Looking at the evolution of the quality of the annotations as time progresses and detecting when the annotations become too straining could be examined further, especially for tasks that require more effort than categorizing an item. Additionally, the CAFED platform contains a user engagement component and dispenses badges based on achievements. One user of the first experiment reportedly provided over 175 annotations (for a total of 3 hours in a row) motivated by acquiring more badges, as had previously been shown for text labeling gamification [79]. The quality of the data could thus be steered through the way badges are allocated. We received positive feedback from AMT workers regarding the usability of the platform, the fact that they learned about their energy consumption, and their interest in participating in similar experiments in the future. The feedback is available in the Appendix in Section A.3. This would encourage researchers from different fields to benefit from crowdsourcing to enhance their datasets. In this chapter, we showed how crowdsourcing can help provide manually labeled ground truth for energy datasets. We first only considered expert users on our CAFED platform, but then extended our analysis to regular users and showed how their contributions could be used to some extent, provided that their expertise can be quantified.

5 INFERRING ACTIVITIES

This chapter is based on work developed by the author during her exchange at the LSIR at EPFL and published in the Proceedings of the 2016 IEEE International Conference on Big Data (BigData '16) [38]. Additionally, the chapter contains work appearing in the Proceedings of the 2016 Workshop on Smart Grids at the 2016 IEEE International Conference on Big Data (BigData '16) [40].

The future smart grid offers the possibility of having fine-grained information and capabilities to monitor its status in real time. Implementing real-time and personalized feedback could amount to a substantial energy reduction in the residential segment [16]. This should be considered together with the potential savings during peak time, when high penalties might become a reality in the future. It can also be the cornerstone of future off-the-grid scenarios as micro-generation and battery technologies become more affordable. Focusing on the household scale offers an alternative to the aggregated levels considered in DRM systems. In the context of the smart home, one could foresee trading off users' lifestyle preferences and comfort against saving measures, while preserving the privacy of the residents, by providing an optimization inside households. From a technical standpoint, it has yet to be decided how much information should be collected, i.e., the granularity of such data, and which additional sensors should be integrated to provide a better understanding of how electricity is consumed. To this end, the access to disaggregated data requires the setup of data collection architectures with prohibitive costs. One practical alternative is single-point, non-intrusive sensing of aggregated energy, which involves the development of NILM algorithms on existing household-level aggregated data to differentiate the devices in use [185]. Given the recent release of a large dataset with appliance-level measurements, the Pecan Street dataset, abstracting the usage of electrical devices in households by investigating the motives behind their being triggered by a user becomes possible. This involves unraveling information from the collected power measurements and finding out when and how appliances are used in conjunction. To achieve energy efficiency in households, we needed to understand what triggers the electricity consumption. Therefore, in this chapter, we focused on learning the residents' behaviors that involve the consumption of electricity, by identifying activities that require the usage of appliances and other electronic devices. While considerable research efforts have been directed at analyzing aggregated


loads from smart meters or at developing algorithms for disaggregating loads to extract the consumption of single appliances, less focus has been put on assessing the potential of using disaggregated data. This was primarily due to the fact that such datasets were not widely available, given the difficulty and the costs of instrumenting households to acquire the consumption data from appliances.

5.1 activity detection

Activity recognition is a long-established field of research. Previous work looked at human trajectories, interactions with objects or social activities [3]. However, most approaches neither target energy conservation, nor use the electricity consumption as an input variable for the recognition of activities. Thus, our goal of estimating human interactions with electrical appliances agnostically is most closely related to recent work on Demand-side Management (DSM). The ability to accurately predict future energy needs is the cornerstone of proper DSM, and many research efforts have been devoted to it in the last couple of years [218]. Some of the investigated methods rely heavily on past consumption data to predict future demand, and therefore we argue that our research can add value in this situation, especially given the high granularity of the data (one measurement per minute), which can easily be adapted to test different prediction periods (e.g., hour, day, week) and evaluate the outcomes of the prediction algorithms in a variety of energy consumption scenarios, including off-the-grid households. Activity recognition in households can be assisted through sensor deployments in households [58, 191, 192, 223] or WiFi signatures [237]. When real-life deployments were not possible, prior work used simulated power traces for investigating human activities in households [65, 197, 245]. Attempts at using existing publicly released datasets to identify appliances that are used in conjunction and the flexibility of their usage in households have utilized the REDD dataset to support their analysis, but have used predefined thresholds for determining when the appliances were ON or OFF [174].

5.2 an alternative to eco-feedback systems

Our approach attempts to tackle the known limitations of current eco-feedback systems, which focus on increasing efficiency by raising end-user awareness of how their actions impact the use of energy. Pereira et al. [184] showed that energy disaggregation strategies, commonly used in eco-feedback systems, are overwhelming for most users, as they lose interest and show relapsing behaviors in their energy conservation actions. Beyond the initial challenge of creating effective low-cost disaggregation strategies, the new problem to be faced was that of generating meaningful strategies to re-aggregate consumption data that could effectively lead to long-term sustainable energy conservation practices in domestic environments. The rest of this chapter is organized as follows. In Section 5.3, we present GMMthresh to automatically determine the threshold differentiating the active and idle states of an appliance, based solely on the statistical properties of its load consumption. Our method is validated using the manually labeled data acquired through expert-crowdsourced annotations on the CAFED platform presented in Chapter 4. Then, in Section 5.4, we propose a pipeline for mining temporal association rules to learn the schedule of human activities involving the usage of electrical appliances.

5.3 gmmthresh: agnostic appliance activity thresholding determination

Until smart appliances become widespread, determining the state of an appliance, and in particular when it is active versus when it is idle or in standby mode, can only rely on disaggregated power time series. We investigate how an appliance's trace properties can be leveraged, without side information that could assess the proximity of the residents, nor ground truth data from a journal documenting the activities in the household, to determine when there is interaction with an appliance to carry out a human activity. Setting fixed thresholds based on the analysis of a set of known appliances and building databases of signatures will not scale with the release of new models of appliances, as their characteristics are expected to evolve as devices become more efficient due to technological improvements. Instead, determining these thresholds agnostically of the appliances' types, models and brands, based on statistical properties of their consumption, would be adaptable to existing and next generation devices. In this section, we propose an automated method for determining when an electrical device is triggered by a household's residents solely from its power trace. Knowing when an appliance is in use is required for identifying recurrent patterns that could later be understood as activities, as we will see in Section 5.4. In order to determine which appliances are utilized conjointly and linked to a human activity, our contribution is to distinguish the active consumption from the baseline and noise in their power traces. Our method could be extended to other types of sensors, where it is necessary to distinguish useful measurements from baseline noise (such as in the case of inertial sensors). The remainder of this section is organized as follows. Section 5.3.1 presents related work. Section 5.3.2 introduces the methodology for the automatic thresholding. Section 5.3.3 shows the algorithm's evaluation through experimental results. We conclude in Section 5.3.4.

5.3.1 Related Work

Previous work used statistical attributes of the data to determine occupancy [49]; we are, however, assessing activities that incur energy consumption. While NILM has focused on disaggregating loads by supervised learning through ON-OFF events [238], state detection for modeling and maintaining appliances' signatures [24, 76], spike detection [183] or an analysis of the different appliances' patterns [187], determining when an appliance is active often relies on using a predefined threshold [74].

5.3.2 Methodology

Using only electrical loads (no side information, nor ground truth), it is necessary to evaluate how to differentiate baseline consumption, which can be considered as noise, from human-triggered actions. While it would be possible to handpick a threshold to decide when the appliance is powered on and serving a human activity, such a process would be arbitrary and would not generalize, given the multitude of brands and models in consumer electronics and how they change and evolve due to technological advances. To this end, we developed an automated way of deciding when an appliance reaches a power level high enough that it can be regarded as being used by a human being. This requires considering each household separately and learning from the specificity of each trace. Such a method relates to image thresholding, an essential method for isolating objects or other relevant information in digital images [109].

5.3.2.1 State Estimation

We consider two types of power traces, namely appliance-level data (single appliances) and circuit-level data (aggregated readings recorded by instrumenting circuits at the room level, or obtained from a power strip). We refer to both as appliances from now on. We explain how different power levels are linked to an appliance's state and its utilization. Since a human being is not activating the appliances throughout the day, we can distinguish between an idle state (off / stand-by mode, typically low power levels) and an active state (when the residents are powering the appliance on or actively interacting with it). We notice, for example, in the case of a washing machine, that several mechanisms allow running different washing programs and cycles throughout its time of use (soaking, spinning, etc.). In the case of data collected at the circuit level, we could expect to observe different devices (lights, smaller consumer electronics) being turned on. Each mode of functioning can be related to the internal state of an appliance in the case of single appliances, or to different electrical devices being switched on in the case of circuit-level data, operating at different power levels [77]. We rely on this to suggest that different states in the use of an appliance are linked to different levels of power. Following this idea, we want to observe the relationships between power levels in the distribution of the power measurements of an appliance. Although we intend to discover activities in a data-driven manner, i.e., without a-priori knowledge nor human labeling, we have in mind for the time being high-level activities (such as cooking, cleaning, etc.). This means that we do not delve into the intricacies of the different stages involved in an activity (in the case of cooking: cleaning vegetables, heating ingredients, eating, etc.). Thus, if we consider a power strip in the kitchen and its respective power readings, the transitions in the traces might be due to smaller appliances being powered on (kettle, mixer, etc.). However, since they are not disaggregated, they cannot be labeled and cannot be directly used. This is why we focus on the overall duration of the interaction with an appliance, not differentiating between all the stages and sub-activities it might involve; thus, we only consider two appliance states, i.e., idle or active.

5.3.2.2 Gaussian Mixture Model

We model the distribution of power levels by approximating it with a Gaussian Mixture Model (GMM) [37]. A GMM is a probabilistic model that assumes that the data points under consideration are generated from a mixture of a finite number of Gaussian distributions. The estimation of the means and covariances that define the Gaussians is obtained by maximizing the likelihood of the mixture through the Expectation-Maximization (EM) algorithm. We refer the reader to Section 6.8 and Chapter 8 of [104] for a formal definition of the GMM and the EM algorithm, respectively. The different modes of an appliance's power distribution can be attributed to the different internal states of the appliance, or to the sequence of appliances being activated in the case of circuit-level data. Given that most of the appliances operate at low power levels during their idle period, the idle state can be identified as the first set of correlated measurements. Thus, we locate the point that lies in the first valley of the Gaussian mixture (the first Gaussian, identified by its mean µ1, represents the idle status, while starting from the second Gaussian, centered at µ2, the appliance is considered in use). We define the bottom of the valley as the minimum of the distribution between the first and the second Gaussians, as in Equation 12, for p being the multimodal distribution modeled by the GMM. In the case

where the GMM overfits small adjacent peaks, we merge those peaks and identify µ1 as the largest mean in the set of adjacent peaks.

\arg\min_{\mu_1 \leqslant x \leqslant \mu_2} p(x)    (12)

We propose GMMthresh as the procedure to determine the best GMM fit for an appliance's distribution and to locate the threshold between the first two modes of the distribution, as can be seen in Algorithm 1.

Algorithm 1 GMMthresh
Input: Set of points X = {x1, ..., xN}; maximum number of Gaussians in the mixture M
Output: The threshold T
1: k ← 1
2: minBIC ← ∞
3: bestGMM ← NULL
4: T ← NULL
5: while k ≤ M do
6:     model ← GMM(X, k)                ▷ Determine the Gaussian Mixture Model for X and k Gaussians
7:     if model.BIC < minBIC then
8:         minBIC ← model.BIC
9:         bestGMM ← model
10:    k ← k + 1
11: µ ← sort(bestGMM.means)             ▷ Sort the means in ascending order
12: T ← arg min_{µ1 ≤ x ≤ µ2} bestGMM.p(x)   ▷ Find the valley between the first two means
13: return T
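A minimal Python sketch of Algorithm 1 using scikit-learn's GaussianMixture (the thesis used Matlab's GMM implementation); the merging of small adjacent overfitted peaks mentioned above is omitted here.

import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_thresh(x: np.ndarray, max_gaussians: int = 15) -> float:
    """Return the power threshold separating the idle mode from the active modes."""
    X = x.reshape(-1, 1)
    best_gmm, min_bic = None, np.inf
    for k in range(1, max_gaussians + 1):               # BIC-based model selection
        model = GaussianMixture(n_components=k, random_state=0).fit(X)
        bic = model.bic(X)
        if bic < min_bic:
            min_bic, best_gmm = bic, model
    means = np.sort(best_gmm.means_.ravel())            # Gaussian means, ascending
    if len(means) < 2:                                   # single mode: no clear active state
        return float(means[0])
    mu1, mu2 = means[0], means[1]
    grid = np.linspace(mu1, mu2, 1000)                   # search the valley between the
    log_density = best_gmm.score_samples(grid.reshape(-1, 1))  # first two modes
    return float(grid[np.argmin(log_density)])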

5.3.3 Experimental Results

5.3.3.1 Datasets

The Pecan Street dataset1 originally comprised 239 monitored households, mostly located in Texas. Their aggregate power consumption and disaggregated load readings are provided at a rate of once every

1 http://www.pecanstreet.org/

Figure 22: Histogram (in log scale) of the monthly power distribution for dishwasher1, where low power measurements are more represented.

minute and span from January to May 2014. While 70 different types of appliances are recorded, there are at most 22 actively monitored circuits per household. Appliances with larger ranges of consumption are, for example, ovens, dishwashers or furnaces. We leverage the wisdom of the crowd by using expert-annotated data from the Pecan Street dataset collected through our CAFED platform2, presented in Chapter 4. This tool allows us to dynamically select and display power time series to users who are familiar with the energy domain and have the required knowledge for discerning when an appliance is active from when it is idle by looking at its power trace. The user can then interact with the platform and highlight portions of the time series where the appliance is active. The expert-annotated data are collected through the platform and made available to other researchers in the community. Using this expert crowdsourcing method, over 4500 daily time series have been collected so far, and we believe that the framework could be extended to other publicly available datasets [39].

5.3.3.2 Parameter Selection

Our algorithm considers one month of data per appliance, both to minimize the impact of weather and to ensure that enough data are available (some appliances might not be used frequently on a weekly basis). The readings' distribution can be represented by a histogram of the different power measurements, where the modes coincide with Gaussians and the peaks with the Gaussians' means. We observe for each month that some power level readings amount to thousands of occurrences, while the magnitude of other representatives is in the order of hundreds down to a few instances, as in Figure 22.

2 https://cafed.inf.ethz.ch

Therefore, the data are scaled to lessen the difference in order of magnitude between the measurements, in particular the lower measurements, since the appliance is expected to be mostly in idle mode. This amplifies all candidate peaks (Gaussians) with regard to the more prominent low power peaks. The scaling of the histogram power distribution consists in selecting, for each bin i, the quantity n_i of power measurements in the bin and converting it to a logarithmic scale, thus in the order of C · log(n_i + 1), where C is a constant. The rescaling of the density function amplifies all candidate modes, while reducing the prominent ones. Additionally, C allows small peaks to be identified by the GMM by ensuring that enough data are used. It is set to the sample size as defined in Equation 13, where z_{α/2} is the z-score for a predefined confidence interval, σ the standard deviation of the sampled data and E the error margin. We evaluate the manually labeled ground truth data obtained through the CAFED platform and determine the standard deviation of the different appliances and households for the active data. This value does not vary significantly across the annotated data and is roughly 200 W; we therefore select this value for σ. For a confidence interval of 95%, the z-score is 1.96. We target an error margin of 5 W for the thresholds, and thus E is set to 5. In this configuration, we set C = 6147.

C = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2    (13)
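A small sketch of the sample-size constant of Equation 13 and the logarithmic rescaling of the histogram counts; the rescaled counts indicate how many points per bin are fed to the GMM, and the bin width below is an illustrative default.

import numpy as np

def scaling_constant(sigma=200.0, error_margin=5.0, z=1.96):
    """Sample-size constant C of Equation 13 (z-score for a 95% confidence interval,
    sigma taken from the annotated active data, 5 W error margin)."""
    return (z * sigma / error_margin) ** 2          # = 6146.56, i.e. roughly 6147

def log_scaled_counts(power: np.ndarray, bin_width=1.0, C=None):
    """Histogram of the power readings with counts rescaled to C * log(n_i + 1)."""
    C = scaling_constant() if C is None else C
    bins = np.arange(0, power.max() + bin_width, bin_width)
    counts, edges = np.histogram(power, bins=bins)
    return C * np.log(counts + 1), edges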

We use a parametric implementation of the GMM from Matlab. The number of Gaussians to be fit in the mixture model is used as an input parameter. The determination of the best fitting model relies on the Bayesian Information Criterion (BIC) as defined in Equation 14, where k represents the number of parameters to be estimated (in our case the number of Gaussians to be fitted), N the sample size and likelihood the likelihood function to be maximized. We select the best model by choosing the one with the lowest BIC value. Additionally, we evaluate the impact of binning the data (5 W, 10 W), i.e., grouping continuous values in each bin and sampling values from each bin according to the previously defined log scaling.

BIC = −2 · log(likelihood) + k · log(N) (14)

5.3.3.3 Evaluation

The evaluation is performed by using January data to determine the threshold for the active state for a set of 8 monitored appliances, combining both single appliances and circuits, as can be seen in Table 6. From the CAFED dataset, we use the first week of February to evaluate the thresholds determined for the selected appliances for 10 households.

Table 6: Selected appliances and their categories

Appliance        Category
bathroom1        Circuit
clotheswasher1   Single Appliance
dishwasher1      Single Appliance
kitchen1         Circuit
light_plugs1     Circuit
livingroom1      Circuit
microwave1       Single Appliance
oven1            Single Appliance

Additionally, to evaluate the performance of the algorithm over time and show the effect of the input data on the determined threshold, we select one household for which the thresholds for the appliances are computed for each of the first 4 months, and use the first week of the following month as testing data. The available input data for the GMM are shown in Table 7. We compare the performance of the GMM thresholding to two arbitrary thresholds: 0 W, which can be used in the case where the baseline is zero, and 50 W, which can be considered an educated guess for detecting most of the major appliances [74], taking into account the standby power of most consumer electronics devices [145, 232]. We score the different parametrizations by using common information retrieval scores as follows. The precision, as defined in Equation 15, measures the fraction of data points that were actually annotated as active among all data points that the algorithm determined to be active. The recall, as in Equation 16, measures the proportion of data points that the algorithm determined to be active in comparison with the actual number of available active points. Its limitation lies in the fact that a perfect recall score can be achieved by deciding that all data points should be considered active. This is why another common score is the F1 score, as in Equation 17, which combines both previous measurements and balances their effect. Additionally, we define a score s_H, as in Equation 19, based on the Hamming distance as defined in Equation 18.

precision = \frac{TP}{TP + FP}    (15)

recall = \frac{TP}{TP + FN}    (16)

Table 7: Appliances available per household (bathroom1, clotheswasher1, dishwasher1, kitchen1, light_plugs1, livingroom1, microwave1 and oven1, for households 1632, 2974, 5568, 6910, 7982, 8142, 8197, 8669, 9737 and 9922)

F1\ score = 2 \cdot \frac{precision \cdot recall}{precision + recall}    (17)

d_H(a, b) = \sum_{i=0}^{n} a(i) \oplus b(i)    (18)

s_H = \frac{1}{N} \sum_{j=1}^{N} d_H(a_j, b)    (19)

The evaluation is performed by determining the thresholds on January data and evaluating them against the annotated ground truth of the first seven days of February. We however distinguish two cases in the handling of the annotated ground truth data. In the process of dispatching curves to be annotated by our contributors, we enforce majority voting, i.e., each curve should be annotated by 3 users and, for each data point, the most frequent annotation is chosen (2 matching annotations are necessary in this case). In the case where only 2 annotations per data point are obtained, annotators could diverge on some annotated points. This is why, in the latter case, we evaluate the precision, recall and F1 score on points where the annotations concord, while the Hamming score consists of a weighted average over the individual annotations provided by each annotator, as in Equation 19.
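Equations 18 and 19 can be sketched as follows, assuming the algorithm's binary decision and the N individual annotations of a daily curve are NumPy integer vectors (the precision, recall and F1 score follow the standard definitions shown earlier).

import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two binary vectors (Equation 18)."""
    return int(np.sum(a ^ b))

def s_h(annotations: list, decision: np.ndarray) -> float:
    """Score s_H (Equation 19): average Hamming distance between the algorithm's
    binary decision and the N individual annotations of the same daily curve."""
    return float(np.mean([hamming(a, decision) for a in annotations]))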

5.3.3.4 Results

As can be seen in Table 8, we compute the average scores per appliance and per household as defined in Section 5.3.3.3. Then, we combine the scores obtained for all appliances in each household by averaging them, to evaluate the model's predictive power. The best approximation of the power distribution should be such that its modes are fitted by the Gaussians determined by GMMthresh. This means that the best scores should be achieved, i.e., higher precision, recall and F1 score and a lower Hamming score s_H. Two parameters are evaluated: the number of Gaussians in the model and the effect of the binning (or rounding) of the power measurements. We can see in Table 8 that a configuration allowing the search for more Gaussians fits the power distribution more closely. The rounding reduces the effect of neighboring modes, allowing to reduce their overfit. However, aggregating measurements also reduces the accuracy of the thresholding when modes are adjacent, especially in the case of the largest tested bin size (10 W). This is particularly noticeable for appliances whose states operate at a more fine-grained power scale. Overall, the best configuration, which minimizes the Hamming score (the least differences between the binary output from the GMM and the annotated data) and maximizes the F1 score, consists in modeling 15 Gaussians and not binning the data.

Table 8: GMM parametrization, selected configuration (15 Gaussians, no binning): average precision, recall, F1 score and Hamming score s_H per appliance (higher is better for the precision, the recall and the F1 score, lower is better for s_H)

Figure 23: Outcome of the GMM for livingroom1 (circuit-level data) and dishwasher1 (single appliance). (a) Overlay of a histogram and its GMM approximation for livingroom1, underlying Gaussians in dashed lines; (b) overlay of a daily power trace and its binary decision (idle below / active above the threshold line) for livingroom1; (c) overlay of a histogram and its GMM approximation for dishwasher1, underlying Gaussians in dashed lines; (d) overlay of a daily power trace and binary decision for dishwasher1. In (b) and (d), power below the threshold is considered to be in the idle state, and in the active state otherwise.

The outcome of the algorithm can be seen in Figure 23 for dishwasher1 (single appliance) and livingroom1 (circuit / room). In both cases, as can be seen in the respective test (annotated) time series in Figure 23d and Figure 23b, a 50 W threshold would not capture the smaller power measurements (the ramping up and the ramping down of the device) in the case of dishwasher1, while in the case of livingroom1 the baseline is above 50 W. If the baseline level is close to the arbitrarily chosen threshold (for testing purposes it was set to 50 W), the decision for livingroom1 would be to classify it, erroneously, as being active throughout the day. We compare the performance of GMMthresh, in terms of the F1 score and Hamming score s_H of the selected model, against the usage of 50 W and 0 W as thresholds. Figure 24 shows that GMMthresh performs steadily well for all appliances and consistently outperforms the 0 W threshold. It outperforms the 50 W threshold in all cases, except for kitchen1 and light_plugs1. As can be seen in Table 8, the other scores' similar performance is linked to the fact that the determined thresholds lie generally below 20 W, as can also be seen across households in Figure 27a.

Figure 24: Scores overview per appliance for all three thresholding methods (GMMthresh, 0 W, 50 W): (a) F1 score per appliance (higher is better); (b) Hamming score sH per appliance (lower is better).

Figure 25: Scores overview per household for all three thresholding methods (GMMthresh, 0 W, 50 W): (a) F1 score per household (higher is better); (b) Hamming score sH per household (lower is better).

As can be seen in Table 8, the similarity in the other scores' performance is linked to the fact that the determined thresholds generally lie below 20 W, as can also be seen across households in Figure 27a. dishwasher1 is, however, better detected by the GMM and 50 W thresholds, as its determined thresholds are more spread out than in the case of clotheswasher1 (see Figure 27a). microwave1 and oven1 show the worst performance for the 0 W threshold, as low power measurements (< 10 W) are erroneously detected as showcasing human activity. In the case of circuit-level data, we have seen that when the baseline is above 50 W, as in Figure 23b, the appliance is considered active during the whole day. The baseline can be attributed to consumer electronics for entertainment in the case of livingroom1, which remain in standby mode and are thus not voluntarily powered on by the residents. The predictive power per household combines the scores for all appliances belonging to each household. As can be seen in Figure 25, when combining the previous observations, GMMthresh performs better overall. While all households are single-family homes, the performance varies across households due to the set of available appliances and the residents' lifestyles, as can be seen in Figure 27a. We expect that some appliances are used less frequently than others (for example oven1).

Figure 26: Scores overview for household 6910 from January to April, comparing all three thresholding methods (average over all appliances): (a) F1 score (higher is better); (b) Hamming score sH (lower is better).

Figure 27: Thresholds per appliance: (a) thresholds obtained per appliance over all households; (b) thresholds obtained per appliance from January to April for household 6910.

Since the determination of the threshold through GMMthresh depends on the input data, we show the scores combined from the thresholds computed monthly from January through April for household 6910 in Figure 26. Throughout those four months, the GMM maintains its predictive power close to that of the 0 W threshold and above that of the 50 W threshold, and it outperforms both static thresholding methods in the case of livingroom1. As can be seen in Figure 27b, the determined thresholds do not vary significantly for appliances that are used regularly (such as bathroom1 or kitchen1); dishwasher1 and light_plugs1 show the most variance. Since the method depends on historical data, it is to be expected that it requires enough data to estimate the power distribution of an appliance.

5.3.4 Conclusion and Discussion

In this section, we introduced an automated way of determining, with consistently high accuracy, when an appliance is activated by a human being, by filtering out baseline noise from the readings and by looking at the distribution of the power measurements.

Our methods performed better than the generally accepted best-guess thresholds and achieved an F1 score of about 0.9 for all evaluated appliances. Having now obtained binary vectors of data, we can consider daily time windows, infer patterns of appliances being used conjointly, and derive temporal rules, as will be shown in Section 5.4. In a real-life deployment, to mitigate the fact that the thresholds depend on the available data (the appliances have to be used by the households' residents), the accuracy could be improved by developing an online version of the algorithm with a decay factor that forgets past thresholds and balances them against newly evaluated ones.
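As a hedged illustration of the online variant suggested above, the following sketch blends a previously stored threshold with a newly estimated one using a decay factor. The function and parameter names (update_threshold, decay) are hypothetical, and the new estimate would come, for example, from running GMMthresh on a recent window of readings; this is one possible design, not the thesis implementation.

```python
# Minimal sketch of an online threshold update with a decay factor, as
# suggested above. `update_threshold` and `decay` are hypothetical names.

def update_threshold(previous: float, new_estimate: float, decay: float = 0.8) -> float:
    """Blend the stored threshold with a freshly estimated one.

    A larger `decay` keeps more of the historical threshold; a smaller
    value adapts faster to recent usage patterns.
    """
    return decay * previous + (1.0 - decay) * new_estimate


# Example: the stored threshold is 23 W, the latest estimate is 17 W.
threshold = update_threshold(23.0, 17.0, decay=0.8)  # 21.8 W
```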

5.4 temporal association rules for activity detection

Increasing energy efficiency is part of the goals set by governments across the world to reduce the energy footprint and provide sustainable development to all. The advent of new technologies that permit monitoring the electrical consumption within households, such as smart meters and future smart appliances that are likely to report their own consumption, together with progress in control technologies for actuating components such as lights or thermostats, offers prospects for smartening homes by exploiting the large amounts of available data to derive processes that conserve energy. The opportunity to collect real-time consumption data allows us to contemplate real-time feedback to inform the residents about their energy usage [16]. However, the final link in this chain, from data to action, relies on the households' residents to assimilate the feedback and to change their relationship towards their energy consumption. The failure of earlier energy conservation campaigns was due to the discrepancy between the residents' energy knowledge (such as awareness of energy units or the ability to estimate how much electricity an appliance consumes) and the energy reduction that utilities were aiming for by offering monetary incentives [88, 89, 127, 209]. Adequate information has to be provided to assist decision making, as has been shown in the process of acquiring new appliances to reduce future energy costs [31]. While feedback at the appliance level could be provided, given the unfamiliarity with energy jargon and the overwhelming number of occasions at which diverse appliances are used throughout the week, residents might not be able to associate the triggering of an appliance with a behavior to address. By aggregating interactions with appliances and abstracting the underlying ongoing activity, the granularity can be reduced. Also, if a resident were to keep a diary of their daily activities, since most of them are essential (e.g., cooking), they would be salient in their memory and thus more easily associated with actual interactions with electrical devices. Beyond identifying and estimating the amount of energy that is used during specific human activities, this additional information could be used to build new strategies within a smart home to improve and offset energy-hungry behaviors by providing automation measures that reduce their footprint. This would first require us to learn what activities can be detected and their scheduling, and more specifically to predict the time windows where they might occur.
In this section, we make use of a large dataset with appliance- and circuit-level power data and provide a framework for determining temporal sequential association rules. Sequences of time intervals during which appliances are in use can vary in their order, their duration, and the time elapsed between these events. Our contribution consists in providing a full pipeline for mining frequent sequential itemsets and a novel way to discover the time windows during which these sequences of events occur, capturing their variance in terms of duration and order. Our method is data-driven, relies on the data's statistical properties, and allows us to avoid an exhaustive search for the time windows' sizes by relying instead on machine learning techniques to identify and predict those time windows. We hereby examine temporal sequential association rules in a novel way, based on machine learning techniques, to learn the time windows where a rule's head and body take place and to exploit historical data and their statistical properties. Given the variance in the usage of different appliances for completing specific tasks, such as cooking, where the diversity of recipes in terms of preparation and cooking time contributes to the variance in which appliances are used, in which order, and for how long, considering sequential frequent itemsets allows us to capture rules that still reflect the underlying behavior. We provide an analysis of a dataset with disaggregated energy consumption and show that rules can be learned that reflect expected activities taking place within households. Our technique is not limited to energy data and is thus generalizable to datasets for which temporal sequential rules should be mined. In the following, we review related work in Section 5.4.1. Then, we present the methodology for extracting temporal sequential association rules in Section 5.4.2. Experimental results on a disaggregated dataset with several households are evaluated in Section 5.4.3. Finally, we discuss improvements and future work in Section 5.4.4.

5.4.1 Related Work

In the frame of DRM, the shifting of selected appliances' usage and their optimal scheduling to enable the shaving of peak consumption was studied experimentally [20, 21, 62]. As more smart meters are rolled out and installed in households, large datasets with aggregate load consumption are released, such as the CER dataset with over 4000

households, or the PG&E dataset, and serve as a basis for research in customer segmentation or demand forecasting for DRM [8, 246]. Due to the difficulty of collecting single appliances' power consumption, previous work relied on activities as described by human beings to model and synthesize electrical loads in households [197]. DRSim was developed as a simulator for DRM systems that is aware of the current status of the grid and of the activities carried out inside a household, and attempts to estimate the potential savings of demand-side policies [245]. Until all households are equipped with smart appliances that can communicate their consumption and internal states, determining when an appliance is active or idle (being powered off or in standby mode) requires either datasets with explicit labeling of these states or an algorithm to determine them automatically [37, 38]. The Dataport dataset is one of the largest datasets with 1-minute power data, including disaggregated readings from single appliances and circuits (such as power strips or rooms) from over 800 households, but it is still lacking rich metadata such as appliances' states and activity annotations. Efforts were directed at expanding the datasets that are available to the community by learning from the computer vision community and using crowdsourced human labeling to acquire labels, providing richer data to develop and refine machine learning algorithms [39]. This retrofitting of existing datasets offers an alternative to acquiring new datasets, which is prohibitive in terms of costs and time [84], as consumer electronics are widespread and require a large set of smart plugs to be installed, and even special instrumentation for larger appliances with higher wattage. Preliminary attempts at inferring activities from households' non-synthetic electric load curves relied on aggregated electricity consumption, with few annotations [164]. Activity detection was previously performed by considering activities as event streams and using symbolic analysis, with the specific goal of shifting activities to a more convenient moment of the day [64], and using an HMM [178]; however, both analyses relied on synthetic data. Attempts at using real data are linked to the CASAS project [47, 58, 61, 138, 191] and were set in a students' apartment; the data were used to extract sensor data features to link the aggregate household consumption load to human activities, but failed to address the bias induced by the inability to discard energy-hungry appliances from the overall load curve [47, 48]. Another attempt at using real data was achieved through a push-system for recording user activities based on the identification of interactive loads by clustering the states of appliances [202], but it still failed to recognize individual appliances. This work spawned the analysis of association rules [201], but considered fixed hourly windows, instead of mining for variable time intervals, and did not consider the time relationships between the time intervals.

The development of APRIORI [4] was followed by different sequential pattern mining algorithms such as GSP [5], WINEPI and MINEPI [157], SPADE [250] or PREFIX-SPAN [117], which provided sequential pattern analysis but considered the events to be instantaneous. Temporal pattern mining progressively included different time relationships between the sequences of events [106, 107, 165, 234]. A framework for identifying sequential temporal intervals provided an algorithm based on APRIORI for learning the association rules by searching for frequent arrangements of sequences of events, extending and revising Allen's temporal relationships and allowing user-defined constraints for mining the rules, but did not offer the possibility to mine for the time windows during which the temporal association rules occurred [180]. Titarl was developed to learn temporal association rules on symbolic time sequences (where sensor data were binarized by discretizing each variable representing a sensor), but considered uniformly distributed intervals for the time intervals during which the rules occurred, instead of exploiting the statistical properties of the data [95]. The work was then extended to forecast temporal intervals by providing a refinement procedure that first extracts temporal association rules and then merges them [96].

5.4.2 Methodology

If we consider the cooking activity, we expect different appliances to be used to fulfill this task, such as an oven or a kitchen stove. The triggering of those appliances can then be followed by the usage of a dishwasher for cleaning the dishes. Due to the diversity of recipes that can be used for preparing a meal, defining temporal thresholds for the duration of events during which different appliances are used is too restrictive and will not capture the variance in the way the corresponding activities are conducted. Thus, to learn human activities triggering electrical consumption, we identify co-occurring events and their respective association rules. We exploit previous work on sequential itemset mining; considering temporal relationships between the events and their respective time intervals allows us to classify and order these events according to the sequence in which they occur [180]. This means that different events can follow, contain or overlap one another. The succession and merging of these events can be identified as activities. Additionally, we define a novel method to derive the time windows where these activities arise and learn the association rules between these activities. In the following, we describe the methodology for deriving temporal association rules for sequential events as defined in Equation 20 and through our pipeline, as can be seen in Figure 28.

Figure 28: Pipeline for deriving the temporal association rules

5.4.2.1 Data Binarization

In the case of activities, the events consist in the triggering and active usage of appliances in residential homes. The recorded data consist of power measurements and do not contain explicit information about when appliances are in use, as opposed to being off or in standby mode. The measurements are converted to a binary form, where active and idle states are determined using GMMthresh [37, 38], presented in Section 5.3. This binarization method can be used for sensor data as well, to distinguish background noise (idle state) from meaningful readings (active state). For multi-state data, we advise the reader to define a quantization scheme for the original data, to create a variable per quantization level, and to transform the data accordingly.
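To make the binarization step concrete, the following is a simplified sketch (not the exact GMMthresh procedure of Section 5.3) that fits a one-dimensional GMM to a power trace and places the threshold between the two lowest-mean components; the component count and the thresholding rule are illustrative assumptions.

```python
# Simplified sketch of binarizing a power trace with a GMM-derived
# threshold, in the spirit of GMMthresh; the actual thresholding rule in
# the thesis may differ. Assumes NumPy and scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

def binarize_power(power_watts: np.ndarray, n_components: int = 15) -> np.ndarray:
    """Return a 0/1 vector: 1 where the appliance is considered active."""
    X = power_watts.reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    means = np.sort(gmm.means_.ravel())
    # Illustrative choice: threshold between the lowest mode (baseline /
    # standby) and the next one.
    threshold = (means[0] + means[1]) / 2.0
    return (power_watts > threshold).astype(int)

# Example usage on one day of 1-minute readings (hypothetical file name):
# states = binarize_power(np.loadtxt("dishwasher1_day.csv"))
```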

5.4.2.2 Sequential Association Rules Mining

Since we want to learn daily temporal association rules, we can consider that each day of collected data represents a basket, as in traditional market basket analysis [4]. The intervals during which the appliances are active represent the items in each basket. Sequential association rule mining has been widely studied and defines how frequent sequential itemsets are extracted [180]. We briefly explain how these association rules are constructed. The events' intervals E maintain temporal relationships R and constitute arrangements A = (E, R). An arrangement A defines the temporal relationships between the time intervals where different events take place. For $n$ events $E = (E_1, \ldots, E_n)$, $P_2^n = \frac{n!}{(n-2)!} = m$ permutations for the pair-wise temporal relationships $R(E_i, E_j)$ between $E_i$ and $E_j$ can be computed, where $i < j$ and $i, j \in \{1, \ldots, n\}$; thus we can define an m-tuple of pairwise relationships $R = (R_1, \ldots, R_m)$. The temporal relationships that are selected in the case of the activities constitute a generalization of more refined ones [180], and in this pipeline we restrict them to $R \in \{\text{contain}, \text{follow}, \text{overlap}\}$, as shown in Figure 29 (a small sketch classifying a pair of intervals into these relationships follows Equation 20). The search for these arrangements is performed on an enumeration tree expanded breadth-wise and pruned based on a minimum support value. The rules are determined for each arrangement by considering sub-arrangements and by extracting all partitions of the set of events into two subsets as the head and the body of the rule. In order not to repeat rules, the sub-arrangements are extracted in lexicographic order. The rules are accepted or discarded following APRIORI's strategy [180].

Figure 29: Time relationships: contain, follow and overlap

A[t_{A_S}, t_{A_E}] \longrightarrow B[t_{B_S}, t_{B_E}] \qquad (20)
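The sketch below, assuming intervals represented as (start, end) tuples in minutes with the first interval starting no later than the second, illustrates how a pair of active intervals could be classified into the three relationships of Figure 29, including an ε slack on the bounds; it is an illustrative reading of the definitions, not the thesis code.

```python
# Sketch of classifying the pairwise temporal relationship between two
# active intervals into {follow, contain, overlap}, with an epsilon slack
# on the interval bounds. Interval representation and tie-breaking are
# illustrative assumptions.

def relationship(a, b, eps=0):
    """a, b: (start, end) tuples with start <= end; a starts no later than b."""
    a_start, a_end = a
    b_start, b_end = b
    if b_start >= a_end - eps:     # b begins once a has ended
        return "follow"
    if b_end <= a_end + eps:       # b lies entirely within a
        return "contain"
    return "overlap"               # b starts inside a but ends after it


# Examples (times in minutes since midnight):
print(relationship((600, 660), (690, 840)))   # follow
print(relationship((600, 840), (660, 700)))   # contain
print(relationship((600, 700), (660, 840)))   # overlap
```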

5.4.2.3 Time Windows

bivariate histograms (or heatmaps) Having determined the body and head of the rules as two sub-arrangements, we derive a novel technique for extracting the time windows during which the rules hold. For this, we build a co-occurrence matrix for the head and the body of each rule based on the time intervals during which they occur. To preserve the order of the rule, i.e., the body and the head respectively, we only consider the cases where the head's time intervals are subsequent to the body's time intervals. In the reverse case, the rule with head and body inverted, should it have enough support, will be processed independently, from another arrangement. In practice, we build a bivariate histogram (or heatmap) over each minute in a day for both the head and the body of the rule. For each day, co-occurring minutes for both the head and the body are marked as zones in the bivariate histogram. For example, for a particular day, if the head is active from 10 a.m. to 11 a.m. and the body is active from

11:30 a.m. to 2 p.m., the co-occurring region would be the rectangular area [10:00-11:00; 11:30-14:00], using the 24-hour notation, and would contain ones, while the other regions contain zeroes. The bivariate histogram is created by superimposing the different co-occurring regions of each day. As can be seen in Figure 30a, each minute that has occurred more frequently throughout the dataset will be more accounted for than minutes that have occurred more irregularly. It can also be noticed that the matrix is upper triangular, as we are interested in events (the rule's head) appearing after the body's events.
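A minimal sketch of how such a co-occurrence heatmap could be assembled from per-day active intervals is given below. The data layout (a list of per-day interval pairs), the axis assignment, and the filter requiring the rule's second part to follow its first part are assumptions for illustration, not the thesis implementation.

```python
# Sketch of building the bivariate co-occurrence histogram (heatmap): each
# day contributes ones on the rectangle spanned by the minutes of the
# rule's first part (rows) and second part (columns), keeping only pairs
# where the second part starts after the first part has ended.
import numpy as np

MINUTES = 24 * 60

def cooccurrence_histogram(days):
    """days: iterable of (first_intervals, second_intervals); intervals in minutes."""
    H = np.zeros((MINUTES, MINUTES), dtype=int)
    for first_intervals, second_intervals in days:
        day = np.zeros((MINUTES, MINUTES), dtype=bool)
        for f_start, f_end in first_intervals:
            for s_start, s_end in second_intervals:
                if s_start >= f_end:                     # preserve the order
                    day[f_start:f_end, s_start:s_end] = True
        H += day                                         # superimpose the day
    return H

# Example: one day with intervals 10:00-11:00 and 11:30-14:00.
# H = cooccurrence_histogram([([(600, 660)], [(690, 840)])])
```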

(a) Bivariate histogram (heatmap). (b) Gaussian KDE.

Figure 30: Heatmap and Gaussian KDE for an association rule

P(\vec{x} \mid \vec{\mu}, \Sigma) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left(-\frac{1}{2}(\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right) \qquad (21)

\Sigma = E[(\vec{x} - \vec{\mu})(\vec{x} - \vec{\mu})^T] \qquad (22)
where $\Sigma_{ij} = \mathrm{cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)]$.

tolerance regions The regions of interest for determining the time windows for the association rules are the temporal regions that occur the most often. They can be smoothed, as can be seen in Figure 30b, by using a KDE. Using the bivariate histogram concept allows us to conceptually assimilate each region to a trivariate Gaussian distribution. The regions of interest are then the projection of each trivariate Gaussian onto the horizontal plane. Thus, identifying these regions can be undertaken by a Gaussian Mixture Model (GMM), where each Gaussian can represent a separate region or cluster of points. Equation 21 represents a k-dimensional Gaussian distribution, entirely defined by its mean $\vec{\mu}$ and covariance matrix $\Sigma$, defined separately in Equation 22. We will describe how to derive those regions in the following. The projection of a trivariate Gaussian onto the horizontal plane is a bivariate Gaussian. The region it covers can be delimited by an

isocontour [73], as defined in Equation 23. These isocontours can be defined by the mean $\vec{\mu}$ and the covariance matrix $\Sigma$ of the data points clustered within, where the spread between the data points and the mean is represented by the Mahalanobis distance $d_\Sigma^2(\vec{x}, \vec{\mu})$, as defined in Equation 24. From a k-variate Gaussian distribution, we can define an ellipsoidal region as in Equation 25, where $\vec{\mu}$ and $\Sigma$ are the sample mean and the sample covariance matrix, respectively, obtained from the clustered data, and where $c \rightarrow \chi_k^2(p)$, the chi-squared distribution with k degrees of freedom, for covering a proportion p of the population as the population size $N \rightarrow \infty$ [214]. These regions are referred to as statistical tolerance regions [136, 137, 214]. They have been used in assembly tolerancing for specifying the quality of production to be achieved [144].3 The population coverage is thus a parameter for the size of the tolerance area. A closed-form solution was defined for bivariate cases and approximations are available for higher-dimensional distributions [214].

P(\vec{x} \mid \vec{\mu}, \Sigma) = c \quad \text{with } c \in [0; 1] \qquad (23)

d_\Sigma^2(\vec{x}, \vec{\mu}) = (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu}) \qquad (24)

R(\vec{\mu}, \Sigma, c) = \{\vec{x} : d_\Sigma^2(\vec{x}, \vec{\mu}) \leqslant c\} \qquad (25)
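The mapping from a desired population coverage p to the isocontour value c can be illustrated with SciPy; for the bivariate case the chi-square quantile coincides with the closed form −2 ln(1 − p) used later in Equation 30. Function names are illustrative.

```python
# Sketch of choosing the tolerance-region size from a desired population
# coverage p (Equations 23-25): c is the chi-square quantile with k degrees
# of freedom; for k = 2 it reduces to -2 ln(1 - p).
import numpy as np
from scipy.stats import chi2

def isocontour_level(p: float, k: int = 2) -> float:
    """Squared Mahalanobis radius covering a proportion p of the population."""
    return chi2.ppf(p, df=k)

p = 0.95
c = isocontour_level(p)                    # about 5.991 for k = 2
print(np.isclose(c, -2 * np.log(1 - p)))   # True: bivariate closed form
```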

To determine the closed-form equation of the ellipsoid defined in Equation 25, we can recall the definition of the covariance matrix $\Sigma$ in Equation 22, which summarizes the spread of the data. This observation is the basis of popular methods such as PCA, which transform the data into an orthogonal basis set, where the first vector of this basis has the direction of the largest variance of the data (this consists in performing the eigendecomposition of the covariance matrix $\Sigma = VLV^{-1}$, where $L$ is the diagonal matrix of eigenvalues and $V$ the matrix of the respective eigenvectors). This change of coordinates operates under a linear transformation $T$ and consists of a rotation through a matrix $R$ and a scaling of the data points along each axis through a matrix $S$,4 where $T = RS$ [146], as illustrated in Figure 31, and $\Sigma = RSSR^{-1} = TT^T$, with $S = \sqrt{L}$ and $R = V$ [217], the Cholesky decomposition of $\Sigma$. As can be seen in Figure 32, we can bound the ellipse by a box to obtain an approximation of the time intervals during which the events occur for both the head and the body of a rule, given by the sides of the rectangle delimited by the ellipse's extremum points.
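The eigendecomposition above translates directly into the parameters of a tolerance ellipse. The following is a minimal sketch (assuming NumPy; function and variable names are hypothetical) of recovering the semi-axes and rotation angle from a sample covariance matrix and an isocontour level c.

```python
# Sketch of recovering the tolerance ellipse's orientation and semi-axis
# lengths from a sample covariance matrix via its eigendecomposition
# (Sigma = V L V^-1). The level c would come from the chi-square quantile.
import numpy as np

def ellipse_parameters(cov: np.ndarray, c: float):
    """Return (semi_major, semi_minor, angle_rad) of the tolerance ellipse."""
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    semi_major, semi_minor = np.sqrt(c * eigvals)     # a = sqrt(c) * sigma_y1, etc.
    angle = np.arctan2(eigvecs[1, 0], eigvecs[0, 0])  # direction of largest spread
    return semi_major, semi_minor, angle

cov = np.array([[5.0, -2.0],
                [-2.0, 5.0]])                          # the full matrix of Figure 31b
print(ellipse_parameters(cov, c=5.991))
```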

3 These should not be confused with confidence regions (or intervals), which yield the confidence for the sample mean and covariance matrix as the experiment is repeated.
4 The matrix is thus diagonal.

\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad\qquad \Sigma = \begin{pmatrix} 5 & -2 \\ -2 & 5 \end{pmatrix}

(a) The covariance matrix is the identity matrix; the data are contained in a circle. (b) The covariance matrix is a full matrix; notice the rotation and the spread of the data into an ellipse.

Figure 31: Full and diagonal covariance matrices and corresponding data spread

Without loss of generality, if we consider the case of the bivariate Gaussian distribution, we can rewrite the density function as in Equation 27 by taking the covariance matrix $\Sigma$ as in Equation 26. We can compute the change of coordinates as the linear transformation $Y = TX$. Since the new basis is orthogonal, the covariance matrix in that basis is a diagonal matrix, as can be seen in Equation 28, the variables being uncorrelated; it can be solved to obtain the variances $\sigma_{y_1}^2$, $\sigma_{y_2}^2$ and the rotation angle $\theta$ as in Equation 29. The tolerance regions, which we are interested in, are delimited by iso-

contours such that $f(x_1, x_2) = c$, where $c > 0$. Here $a = c\,\sigma_{y_1}$ is the semi-major axis of the ellipse and $b = c\,\sigma_{y_2}$ the semi-minor axis, respectively.

\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \quad \text{with } \rho = \frac{\sigma_{12}}{\sigma_1\sigma_2} \qquad (26)

f(x_1, x_2) = \frac{1}{2\pi\sigma_{x_1}\sigma_{x_2}\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1 - \mu_{x_1}}{\sigma_{x_1}}\right)^2 - 2\rho\,\frac{(x_1 - \mu_{x_1})(x_2 - \mu_{x_2})}{\sigma_{x_1}\sigma_{x_2}} + \left(\frac{x_2 - \mu_{x_2}}{\sigma_{x_2}}\right)^2\right]\right) \qquad (27)

SS = R^{-1}\Sigma R \implies \begin{pmatrix} \sigma_{y_1}^2 & 0 \\ 0 & \sigma_{y_2}^2 \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} \\ \sigma_{x_1 x_2} & \sigma_{x_2}^2 \end{pmatrix} \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \qquad (28)

\theta = \frac{1}{2}\arctan\frac{2\sigma_{x_1 x_2}}{\sigma_{x_1}^2 - \sigma_{x_2}^2}, \qquad \sigma_{y_1}^2 = \frac{\sigma_{x_1}^2 + \sigma_{x_2}^2}{2} + \sqrt{\frac{(\sigma_{x_1}^2 - \sigma_{x_2}^2)^2}{4} + \sigma_{x_1 x_2}^2}, \qquad \sigma_{y_2}^2 = \frac{\sigma_{x_1}^2 + \sigma_{x_2}^2}{2} - \sqrt{\frac{(\sigma_{x_1}^2 - \sigma_{x_2}^2)^2}{4} + \sigma_{x_1 x_2}^2} \qquad (29)

Additionally, the size of the tolerance region also bears a statistical meaning that determines the value c. Indeed, the Mahalanobis distance r to the Gaussian can set the ellipse's size, as in the bivariate case it depends on the cumulative distribution of the Gaussian distribution. A closed-form solution based on the cumulative distribution function, as expressed in Equation 30 and based on the parametrization of the ellipse and its Cholesky decomposition [29], allows us to determine r from the probability that an observation falls within the region delimited by the ellipse defined by the isocontour at value c. Additionally, the new coordinate system has an orthogonal basis, so the variables $y_1$ and $y_2$ are uncorrelated and $\rho = 0$, and the ellipse's equation can be rewritten as $\left(\frac{y_1 - \mu_1}{\sigma_{y_1}}\right)^2 + \left(\frac{y_2 - \mu_2}{\sigma_{y_2}}\right)^2 \leqslant c^2$. Each term $\frac{y_i - \mu_i}{\sigma_{y_i}} \sim Z_i = N(0, 1)$ contributes as an i.i.d. standard normal variable, and the sum is therefore equivalent to a chi-square distribution ($U = \sum_{i=1}^{k} Z_i^2 \sim \chi_k^2$) with k degrees of freedom. The isocontour can thus be computed for a specific proportion p of the population to be covered by the tolerance region as $c^2 = \chi_2^2(p)$ or, equivalently, $c^2 = -2\ln(1-p)$.

F(r) = 1 - \exp\left(-\frac{r^2}{2}\right) = p, \qquad r = F^{-1}(p) = \sqrt{-2\ln(1-p)} \qquad (30)

To define the bounding box of the tolerance ellipse, we use the general form of the parametric equations of an ellipse, as in Equation 31, obtained by rotating the polar coordinates of an ellipse through the rotation matrix R. The bounding box is delimited by lines passing through the extremum points of the ellipse and can thus be obtained by taking the partial derivatives of the general parametric equations, as in Equation 32, to obtain the values of t that should be substituted into the parametric equations.

y_1 = \mu_1 + a\cos(t)\cos(\theta) - b\sin(t)\sin(\theta), \qquad y_2 = \mu_2 + a\cos(t)\sin(\theta) + b\sin(t)\cos(\theta) \qquad (31)

\frac{\partial y_1}{\partial t} = 0 \iff \tan(t) = -\frac{b}{a}\tan(\theta), \qquad \frac{\partial y_2}{\partial t} = 0 \iff \tan(t) = \frac{b}{a}\cot(\theta) \qquad (32)
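Substituting the extremal values of t from Equation 32 back into Equation 31 yields, in closed form, half-widths of sqrt(a² cos²θ + b² sin²θ) and sqrt(a² sin²θ + b² cos²θ) for the axis-aligned bounding box. A minimal sketch under these assumptions (hypothetical names, NumPy):

```python
# Sketch of the axis-aligned bounding box of the rotated tolerance ellipse
# (Equations 31-32). The two sides of the box give the predicted time
# windows for the two parts of the rule.
import numpy as np

def ellipse_bounding_box(mu, a, b, theta):
    """mu: ellipse centre (minutes); a, b: semi-axes; theta: rotation angle [rad]."""
    half_w1 = np.sqrt((a * np.cos(theta)) ** 2 + (b * np.sin(theta)) ** 2)
    half_w2 = np.sqrt((a * np.sin(theta)) ** 2 + (b * np.cos(theta)) ** 2)
    return (mu[0] - half_w1, mu[0] + half_w1), (mu[1] - half_w2, mu[1] + half_w2)

# Example: centre at (10:30, 12:45), semi-axes 90 and 30 minutes, rotated 30 degrees.
window_1, window_2 = ellipse_bounding_box((630, 765), 90, 30, np.radians(30))
```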

Having derived how to obtain the bounding boxes for the time window predictions, we use them in conjunction with a GMM that identifies clusters of data points. If this is successful, a temporal sequential rule as defined in Equation 20 is added to the set of association rules; if not, it is discarded. If no rule can be determined for the current itemset, the node is discarded and is not expanded further. Our platform also includes constraints developed for an optimistic pruning

Figure 32: Ellipse rotation and bounding box

of the frequent sequential itemsets [180], such as duration constraints for each arrangement or an ε time tolerance for the temporal relationships between the intervals representing the different variables (appliances) that are considered. It is also easily adapted to enforce constraints on the time windows for the predictions, such that user-defined times of the day or durations δ, as described in Equation 33, can prune out less relevant rules. The cases described in Equation 33 can be generalized to our general formulation in Equation 20 to perform an exhaustive search for all temporal sequential association rules.

A\{t_1\} \longrightarrow B\{t_2\}, \qquad A\{t_1\} \longrightarrow B[t_2, t_3], \qquad A\{t_1\} \longrightarrow B\{t_1 + \delta\} \qquad (33)

5.4.3 Empirical Evaluations

We use the Dataport dataset, with data ranging from July 2012 until April 2015. The dataset contains 1-minute appliance-level (washing machines, ovens, etc.) and circuit-level (rooms, multiplugs for small appliances in the kitchen, etc.) power data for over 70 types of meters and more than 800 households located mainly in Texas and in California. We select 16 households with large numbers of appliances. The data are cleaned to contain only full days (discarding missing data and daylight saving time days). The measurements are binarized using GMMthresh [37, 38], which distinguishes when an appliance is active, and thus triggered to serve an activity, from when it is idle, being either off or in standby mode. Without loss of generality, we select January data for deriving the temporal association rules, with some households having 1, 2 or 3 months' worth of data for that specific month.

Table 9: Households data details, with the number of monitored appliances per month for each of the 16 household IDs (data from 2013, 2014 and 2015).

We remove appliances that are likely to always be on or that exhibit a periodic behavior due to being controlled by a timer or a thermostat, such as fridges, freezers, furnaces or air conditioning units. The selected households, the number of months of data, and the number of appliances considered can be seen in Table 9. The scheduling and duration of usage of different appliances is expected to vary significantly. We are looking for general rules, and this can be achieved by relaxing the conditions on the time and the duration of different events and instead considering sequences of events. While some activities, such as textile care, would mostly have the washing machine enabled first, followed by the dryer, activities such as cooking are less likely to preserve the order of the different appliances, as cooking relies on very different recipes, involving different subsets of appliances and durations for each instance. Thus, if we are interested in the sequence in which different groups of appliances are used, quantifying their total duration of usage and the exact time window during which they are triggered is likely to fail. Instead, we search for intervals where an appliance is used and the time relationships between these intervals, accounting for some flexibility on the intervals' bounds. However, since some appliances can be used for a very short time, while others are active for longer periods, we take the precaution of downscaling the data to improve the detection of the time relationships between the time intervals where they take place: from the 1-minute data granularity, we construct 15-minute intervals, as sketched below. We mine for the top 1000 rules extracted from the arrangements in each household (which means we can obtain more temporal rules, as their number depends on the number of clusters that are detected); newly found rules replace rules with lower scores.
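A minimal sketch of the downscaling step, assuming pandas and a 0/1 series indexed at 1-minute resolution; the aggregation rule (an interval is active if any of its minutes is active) is an assumption.

```python
# Sketch of downscaling 1-minute binary activity data to 15-minute
# intervals before mining: an interval is marked active if any of its
# minutes is active.
import pandas as pd

def downscale(binary_series: pd.Series, minutes: int = 15) -> pd.Series:
    """binary_series: 0/1 values indexed by a DatetimeIndex at 1-minute resolution."""
    return binary_series.resample(f"{minutes}min").max().astype(int)

# Example usage (hypothetical variable names):
# active_15min = downscale(active_1min)
```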

5.4.3.1 Support and Interestingness Measures

Additional parameters can influence the search for the temporal sequential association rules, such as the support threshold used for filtering the frequent itemsets. The interestingness measures used for determining the association rules, and the minimum thresholds for discarding or accepting them, are quite diverse in the literature. Two well-known measures are the support, defined as $\mathrm{supp}(X) = \frac{|\{t \in D; X \subseteq t\}|}{|D|} = P(X)$, and the confidence, defined as $\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} = P(Y|X)$ [4]. The tolerance for the time relationships is represented by an ε slack on the boundaries of the intervals. Then, for determining the time windows, we choose a threshold for the bivariate histogram as a minimum support for how often each minute should have been marked as occurring; this allows us to discard noise and is similar to the support filtering used when mining for the frequent itemsets.
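For concreteness, the two measures can be computed as below, with each basket being one day's set of appliance-usage events; the set-based representation is an illustrative simplification of the temporal arrangements.

```python
# Sketch of the support and confidence measures used to filter rules.
# Baskets are days; items are the appliance-usage events of each day.

def support(itemset, baskets):
    """Fraction of baskets (days) containing every item of the itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(body, head, baskets):
    """Estimate of P(head | body) over the baskets."""
    return support(body | head, baskets) / support(body, baskets)

# Example with three days of usage:
days = [{"oven", "dishwasher"}, {"oven"}, {"oven", "dishwasher"}]
print(support({"oven", "dishwasher"}, days))        # 2/3
print(confidence({"oven"}, {"dishwasher"}, days))   # 2/3
```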

5.4.3.2 GMM

The type of GMM method (a standard GMM, a GMM based on variational inference (VBGMM), or its infinite GMM counterpart based on a Dirichlet process (DPGMM)) influences the quality of the clustering, while the proportion of the population that should be covered by the tolerance region impacts the size of the ellipse used for the windows' prediction. The DPGMM and the VBGMM rely on a concentration parameter α, as a DP can be described by a Chinese restaurant process where α is proportional to the probability of joining a new table [32, 33]. A larger α will influence the clustering by assigning the data to more clusters. To approach the natural number of clusters in the data, we set α to the proportion α = #days / #datapoints, which usually oscillates between 0.01 and 0.1; as can be seen in Figure 33, the most natural clustering is achieved for α = 0.1.
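A hedged sketch of how the Dirichlet-process variant could be fitted with this data-dependent concentration parameter is shown below, using scikit-learn's BayesianGaussianMixture as a stand-in for the DPGMM/VBGMM used in the thesis; the exact implementation may differ.

```python
# Sketch of fitting a Dirichlet-process GMM with the concentration
# parameter alpha = #days / #datapoints described above. scikit-learn's
# BayesianGaussianMixture stands in for the DPGMM/VBGMM variants.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_dpgmm(points: np.ndarray, n_days: int, max_components: int = 10):
    """points: (n, 2) array of co-occurring minute pairs from the heatmap."""
    alpha = n_days / len(points)              # usually between 0.01 and 0.1
    return BayesianGaussianMixture(
        n_components=max_components,
        covariance_type="diag",
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=0,
    ).fit(points)

# Example usage (hypothetical variable names):
# model = fit_dpgmm(cooccurring_points, n_days=31)
```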


Figure 33: Impact of the selection of the concentration factor α for the DPGMM. In Figure 33a, Figure 33b, Figure 33c, and Figure 33d, α takes the values 0.01, 0.1, 1, and 10, respectively.

As can be seen in Figure 34 and Figure 35, the quality of the predictions depends on how many clusters are detected and how precise the tolerance regions are. The DPGMM and VBGMM clustering methods overgeneralize the clustering by merging smaller clusters into larger ones, often covering areas as large as a whole day. This in turn creates very large time windows. The choice of a full covariance matrix instead of a diagonal matrix also impacts the prediction, as it will

overfit the tolerance regions more and create larger time windows, especially in the case of the DPGMM and VBGMM in Figure 35, where the estimated regions of interest are long tilted lines, although the KDE indicates large zones spawned by different Gaussians. As can be seen in both cases, the KDE assigns similar concentrated regions as the GMM, which instead fits the data better. This is why we will consider the diagonal covariance matrices in more detail. All evaluated parameters can be found in Table 10.

Figure 34: Bivariate histogram and tolerance regions for household 624, where DPGMM overgeneralizes: (a) bivariate histogram (heatmap); (b) KDE for the heatmap; (c)-(e) elliptical tolerance regions with a diagonal covariance matrix Σ for GMM, DPGMM and VBGMM, respectively; (f)-(h) elliptical tolerance regions with a full covariance matrix Σ for GMM, DPGMM and VBGMM, respectively. In (c)-(h), the marker locates the center of each ellipse ($\vec{\mu}$, the Gaussian's mean); the dashed, dash-dotted and dotted lines represent population coverage at 1, 2 and 3 standard deviations σi from the means µi, respectively; the colored tolerance regions show the ellipses and their rectangular bounding boxes.

Figure 35: Bivariate histogram and tolerance regions for household 2974, where DPGMM and VBGMM overgeneralize and fail to capture smaller clusters: (a) bivariate histogram (heatmap); (b) KDE for the heatmap; (c)-(e) elliptical tolerance regions with a diagonal covariance matrix Σ for GMM, DPGMM and VBGMM, respectively; (f)-(h) elliptical tolerance regions with a full covariance matrix Σ for GMM, DPGMM and VBGMM, respectively. The marker locates the center of each ellipse ($\vec{\mu}$, the Gaussian's mean); the dashed, dash-dotted and dotted lines represent population coverage at 1, 2 and 3 standard deviations σi from the means µi, respectively; the colored tolerance regions show the ellipses and their rectangular bounding boxes.

Table 10: Parametrization for the temporal sequential association rules and results. For each clustering method (GMM, VBGMM, DPGMM) and covariance type (diagonal or full), the table reports the frequent-itemset support, the interestingness measure (support or confidence) and its minimum score, the ellipse coverage probability, the time-window support, the total number of rules, and the total number of distinct temporal rules.

5.4.3.3 Results

We summarize in Table 10 the total number of rules for all households. The number of association rules is the same for all methods for a given interestingness measure, as only the arrangements with enough support or confidence are selected, which guarantees enough data for the clustering. The number of distinct temporal rules varies based on the interestingness measure: the confidence creates significantly more rules due to its definition. Additionally, the number of distinct temporal rules also varies across the clustering methods, due to the overgeneralization of DPGMM and VBGMM, which often creates tolerance regions covering the whole day. The threshold for the bivariate histogram should be chosen as a proportion of the dataset instead of an absolute value, to reduce the noise incurred by the variance. Due to the type of appliances and circuits monitored in the dataset, most activities cannot be described in a detailed way.

As for the sequences of appliances or circuits that are aggregated in the frequent itemsets and eventually split into rules, we notice that the number of interesting rules is tied to the number of appliances that are available in each household and not so much to the number of days available for the clustering, as we obtained on average about the same number of discovered rules per number of months of data available per household. The configuration we tested allowed a small tolerance for misalignment of the intervals' relationships, by selecting an ε of 1 (15 minutes); we thus identify itemsets where appliances are used relatively closely in time, which is fitting for cooking, for example. The parameter could be adapted to relax the constraint on time separation and to capture appliance usages that are more disconnected in time. The number of interesting rules also depends on whether or not the household residents were at home and actually interacted with the devices and circuits, as some households that only had one month of data showed significantly more rules than households with three months of data. This is due to the fact that single appliances were aggregated, such as kitchen appliances (which could comprise toasters, coffee makers, etc.), and some houses were more instrumented than others: larger appliances such as ovens, cooking ranges, dishwashers or clothes washers are measured separately, but not in all households.

We observe that appliances that are linked to cooking are identified in rules such as the one in Figure 36. Similar rules link ovens to ranges, or kitchen appliances to microwaves. Side interactions with cooking can be detected, such as activities in bedrooms, bathrooms or living rooms. We also notice that the usage of cooking appliances can be preceded or followed by the usage of a dishwasher for cleaning the dishes.

Figure 36: Kitchen and dishwasher rule, support: 0.548

Additionally, dishwashers or clothes washers are used in conjunction with water heaters, which are triggered for warming the water (as is common for such appliances in the US, where the appliances are connected to external cold and hot water sources). In some cases, we could suppose that the residents were preparing to leave, as interactions with kitchen appliances were followed by activity in the garage; conversely, the arrival of the residents could be detected as well. We verified with the surveys supplied with the Dataport dataset for some of the households that the rules that were mined were correlated with the residents' declarations about the rate of usage of different appliances (1-3 times per week), which influences how many rules can be detected. Additionally, in households where residents mentioned that they sometimes work at home during the week, more rules could be observed. The number of residents per dwelling also influenced the types of rules that could be discovered, with noisier rules due to activities being carried out by different people concurrently. However, the time windows during which rules were discovered are meaningful when corroborated with the survey information and with times when people can be expected to be at home.

5.4.4 Conclusion and Discussion

In this section, we have derived time windows for temporal sequential association rules based on the co-occurrence of time intervals, through machine learning techniques. Our novel method uses the statistical properties of the data to efficiently identify time windows without having to perform an exhaustive search for their occurrence and duration. It is based on the co-occurrence of arrangements of sequential events, which can be seen as a bivariate histogram (or heatmap) and which can be adjusted to guarantee that events arise in a significant enough proportion by applying a support threshold to the co-occurrence matrix. Using a threshold on the co-occurrence frequency allows us to eliminate the noise from the variation of the time intervals and serves the same role as the support filtering in the APRIORI algorithm when computing the frequent itemsets in a transaction. Each zone with strong co-occurrence can be approximated by a trivariate Gaussian distribution. We treat the planar projection of each Gaussian as a tolerance region, where a percentage of the population can be covered. As such, each region is an ellipse, whose area can be adjusted according to the percentage of the population to be covered. The Gaussians are discovered by estimating them with a GMM, whose parameters can be used to determine the ellipses. Events that occur more often are concentrated in different temporal regions; this is captured by the clustering and the spread of the points around the means of the Gaussians. The rules can be refined by adding more constraints on how the frequent itemsets are constructed (relaxing the time relationships), and the search can be parametrized.

Our predictions can be improved by selecting the features before computing the frequent itemsets (by observing the correlation and the auto-correlation with time lags). We also plan on collecting ground truth data and applying our analysis to a dataset with more appliances and activity labels for validation. Additionally, the temporal pattern analysis can be extended to accommodate different granularities; e.g., weekly rules can be mined by changing the data format to weekly data and thus deliver intervals across weeks instead of days. Our method is generalizable and can be applied to different datasets with time series to learn habits (mobility traces, smartphone interactions, etc.) through temporal sequential rules and to use the time windows for scheduling and predictions.

In this chapter, we presented techniques to infer activities from sub-metered data. Since we do not yet have access to smart appliances that could report their internal state, we differentiated active and idle states from the statistical properties of an appliance's power consumption by using GMMthresh. Our algorithm was integrated in a pipeline to determine temporal association rules to detect activities requiring the usage of electrical appliances. We provided a scheme based on the distribution of the temporal intervals where sequences of events occur to extract the time windows during which the association rules are valid in a data-driven manner, instead of having to resort to an exhaustive search.

CONCLUSION 6

In this thesis we examined applications where data analytics can lead to new techniques to gain insights into energy consumption in residential homes. The approaches we presented are aimed at discovering patterns in smart meter datasets to achieve energy efficiency, starting from a coarse view of the customer base and moving to disaggregating the consumption within individual homes to discover activities that consume energy.

In Chapter 3, we examined household-level, aggregated load consumption. We focused on differentiating peak-time consumption and obtained cluster profiles that differ in the time of the day when the peak consumption arises. Due to the large volumes of data that are being collected as more and more smart meters are deployed and record high-frequency data, utility companies need to determine how much information should be stored, and for how long, to avoid scalability issues. We used a two-phase clustering scheme, where historic data is first processed to determine cluster profiles, and then load curves are assigned to clusters by carrying out on-the-fly similarity comparisons with the cluster profiles based on a distance measure that is sensitive to the location of the peaks. This allows us to meet the concrete objectives of identifying hurtful consumption profiles for the utility companies through the usage of popular clustering algorithms, where a discriminative score ensures the assignment of consumption curves to the most similar cluster representative and maintains cluster consistency based on the time-of-day information contained in the peak location.

In Chapter 4, we focused on overcoming the challenges incurred by the lack of energy datasets containing event-based labels. Indeed, due to the prohibitive costs of instrumenting residential homes to collect new datasets, we decided to leverage the wisdom of NILM experts to retrofit labels on existing datasets. We developed and evaluated our CAFED platform, which allows us to annotate power time series by indicating when an appliance should be considered active (in use by a human being) or idle (off or in standby mode). We also provided an evaluation of the platform with non-expert users, by running one experiment with a small group of users in Zurich and another experiment on AMT, and provided insight into the design of crowdsourcing tasks to improve the quality of annotations. We showed the positive effect of offering tutorial videos to coach the workers for improving the annotation quality, and the requirement for few but curated tasks


that are annotated by experts and can be used as challenge questions to quantify the workers' expertise levels.

In Chapter 5, we presented GMMthresh, an algorithm for automatically determining the threshold at which an appliance can be considered active instead of idle. Our method is based on statistical properties of the consumption curve of an appliance, and is thus agnostic of its type, model or brand. We used the manually annotated CAFED ground truth data to validate the algorithm. We then proposed a full pipeline for determining temporal association rules. Based on historic data and using a data-driven method, we can identify sequences of appliances being used in conjunction and the corresponding time windows for these association rules, without having to resort to an exhaustive search, relying instead on the distribution of the time windows of these events. Identifying activities with association rules allows us to investigate the scheduling of activities and to provide predictions for the time windows where such events occur.

6.1 future directions

6.1.1 Customer Segmentation

Segmenting household consumption can be achieved at different granularities by considering either daily, weekly or monthly consumption, depending on the application that best serves the purpose of the utility companies. Once the desired discriminative target profiles are obtained, based on whichever criterion is the most pertinent to their analysis, the utility companies can obtain a clear representation of what their customer base is constituted of, and could either model objectives they want to achieve with different proportions of each cluster profile type, or select the customers with the most potential for saving energy to be contacted for energy conservation campaigns.

The analysis could be expanded by verifying whether an underlying Markov chain could allow us to estimate the likelihood of a household changing their consumption from one week to the other, but also to highlight whether regional factors can impact the shape of the characteristic load profiles. Also, finding out whether socio-demographic characteristics, if they are made available through surveys, can be mapped to the cluster profiles could be used to describe expected features of the population in each cluster, as was envisioned by Albert and Rajagopal [10] and Wijaya, Aberer, and Seetharam [244]. Albert and Rajagopal [9] have used the cluster membership as a sequence of symbols and an HMM to predict the next sequence of symbols, where a symbol is a representation of a segment of consumption by a Gaussian distribution. The modeling could be refined by adding external factors, such as environmental information (weather), the day of the year (holiday, regular day, week-end), the dwelling size or the number of residents. It could also be foreseen that households' consumption is expressed by a mixture of their cluster memberships (the features could express the propensity to adhere to one characteristic consumption segment or another). This would allow the utility companies to select households that tend to vary their consumption similarly, based on the proportion of time they "spend" in different segments. As more data reflecting the residents' lifestyles become available, such as their occupancy profiles, the appliances they own and how they use them, more sophisticated models can be developed to more accurately represent and explain the diversity among the customers and assist the utility companies in achieving their energy reduction goals by steering them towards potential groups of interest.

6.1.2 A Pledge for More Datasets

Learning from the information retrieval and the computer vision communities, we presented CAFED as a modular plugin that is accessible to the community via a web platform and combines design features to facilitate the acquisition of annotation data to enrich existing datasets. We followed in the footsteps of initiatives such as WikiEnergy or NILMTK. Our work on annotating power data can however be generalized to the problem of annotating time series data, which contrasts with the current usage of crowdsourcing platforms to annotate static data types such as pictures or text fragments. Depending on the type of labels that need to be obtained, integrating an underlying algorithm to pre-select and pre-annotate zones of interest would reduce the work of the annotator to correcting the algorithm's output; e.g., using GMMthresh to indicate active sections in the power time series would be similar to delimiting zones in an image by doing the pre-processing with a segmenting algorithm.

There is also an increasing interest among crowdsourcing platform workers in being presented with more diverse tasks. This would however require work on the side of the task designers to develop the necessary backend to host the HIT, if the crowdsourcing platforms do not shift their functionalities to support more complex tasks than text or image categorization (which can be achieved by a drop-down or an option list). The complexity can also refer to the nature of the task, which dictates how the worker will interact with the platform, but also the skills and the background knowledge they need to possess to carry it out successfully. If an experiment is conducted in the presence of its organizers, ambiguous points can be resolved by a clarification. However, in the absence of human supervision, mitigating design shortcomings or interpretation uncertainty on an online platform is difficult and can lead to quality issues, as misunderstanding the task can create a mismatch with the expected outcome. In the

case of labeling energy datasets, we observed that non-expert users often had little to no knowledge about the average consumption of appliances that are generally present in their homes, and this produced lower-quality work. The performance however dramatically improved in some cases if the users were trained to pay attention to certain details when annotating different appliances or circuit-level data. This should, as much as possible, be intertwined with the design of the HITs. In our case, we could not have used AMT as such, so we proceeded with our own back-end, which allowed us to add training material between annotation sessions. This means that for the annotators to complete the task to the best of their ability, the task should be unambiguously defined and the user interface designed to alleviate the effort needed to execute the annotation; integrating an educational or a gamification component could also prolong the user's attention span. For example, one user of the first experiment reportedly provided over 175 annotations (for a total of 3 hours in a row) due to the motivation of acquiring more badges, as had been previously shown for text labeling gamification [79]. The quality of the data could be steered as the badges are allocated. We received positive feedback from AMT workers regarding the usability of the platform, the fact that they learned about their energy consumption, and their interest in participating in similar experiments in the future. This hopefully indicates that the realm of fields that can benefit from crowdsourcing platforms to construct large (annotated) datasets can be extended, even with more demanding tasks where traditionally expert users are required, as long as enough care is put into the design of the task and into overcoming the limitations of the platform.

Mozer [168] suggested that additional information about the activities of the households' residents could improve the Adaptive Home's scheduling of control decisions over time, instead of relying on spot decisions. However, reducing the usage of activities to a classification problem is difficult due to the variability in the sequence and duration of behaviors and sub-actions included in an activity. Comparatively, Li, Wang, and Han [149] and Wang and Prabhala [236] showed that while our assumption is that human beings are creatures of habit and there is a periodicity to their mobility traces, this depends on the time interval considered to aggregate sparse observations and on the available amount of data. Similarly, when looking at activity traces, a recommendation system aiming for actionable recommendations in a home setting should leverage historic pattern collection and allow for short-term context learning. Additionally, a single inhabitant could be running different activities in parallel; therefore, models based on the independence of the household residents' activity choices or the atomicity of the behaviors in the household would not reflect the reality of the energy usage. Also, as soon as more than one inhabitant is present in the dwelling, the overlaying of different objectives by each resident and the resulting concurrent actions render the stream of events to analyze more chaotic.
To improve collective models' performance, more data should be collected that distinguish the household residents, such as ambient sound [228], their location within the home, the consumption patterns of distinct appliances, environmental measurements that would yield contextual information, the social relationships, but above all the activity labels provided by the residents. With wearable devices such as smart watches or smart glasses, in addition to the ubiquitous smartphones, an ecosystem of sensing devices that can track the user's whereabouts and interactions would improve data collection by leveraging the seamless acquisition of contextual data through the embedded sensors, or simply facilitate the annotation through speech recognition or the convenient screen interaction of a watch, which is always available at a person's wrist, or of smart glasses. Additionally, the learning and prediction of energy consumption in multi-resident settings, with varying prediction horizons, could be improved, as the sensors embedded in wearable devices could help develop comfort assessment models by measuring the body's response to changes in the environment. A practical application of the different learning approaches in households that moves beyond traditional eco-feedback systems and anticipates distributed micro-generation scenarios, leading to important changes in energy sustainability and ultimately in the utility business, should provide a combination of (i) actionable recommendations for energy conservation, including those that take advantage of the availability of renewable sources and of new battery technologies, and (ii) novel approaches for in-house automation that could leverage smart appliances and the balancing of grid supply and demand. The opportunity lies in the consolidation of the ecosystem of devices that surround people in their homes and in the fusion of the pieces of information that can be obtained inside the households (a toy sketch of such a fusion step is given below) and calibrated with information emanating from external entities, such as demand response signals from the utility companies, before taking actions and implementing optimal strategies that benefit both the end-users and the providers. This starts with the collection of larger sets of datasets and more testbeds to test and perfect existing algorithms, but also with overcoming the challenges of real-life deployments, as IoT technologies are not yet mature enough to move towards actuation and the ecosystem remains fragmented due to the lack of standards.
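The following is a toy sketch of the kind of stream fusion discussed above, under the assumption that each stream is a numeric feature table indexed by timestamp and that residents provide activity labels; the one-minute grid, the column layout, and the random-forest model are illustrative choices, not a method prescribed by the thesis.

```python
# Toy sketch of fusing several in-home streams into one feature table and
# learning an activity model from resident-provided labels. All inputs are
# assumed to be numeric and indexed by timestamp; names are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def fuse_streams(power: pd.DataFrame, location: pd.DataFrame,
                 sound: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Average each numeric stream on a 1-minute grid and attach the
    forward-filled activity labels supplied by the residents."""
    features = pd.concat(
        [df.resample("1min").mean() for df in (power, location, sound)], axis=1)
    activity = labels.resample("1min").ffill().rename("activity")
    return features.join(activity).dropna()

def train_activity_model(fused: pd.DataFrame):
    """Fit a simple classifier mapping fused features to activity labels."""
    X, y = fused.drop(columns="activity"), fused["activity"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)
```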

Last but not least, due to the sensitivity of the patterns that are learned, which are intimately linked to people's lifestyles, it should be made clear how privacy can be guaranteed: the governance of the utility companies would be limited to providing input signals about the status of the grid, while the ambient intelligence system would run strictly separately within the households, in the same way that an Internet service provider could use aggregated usage traces of its network to improve the resiliency of its infrastructure without tracking the information in its customers' incoming and outgoing traffic.

APPENDIX A: CAFED

This chapter contains the Appendix related to the CAFED system. In particular, we show the entity-relationship model of the underlying PostgreSQL database in Section A.1. Material related to the experiments carried out with regular users can be found in Section A.2 and Section A.3; they showcase the survey that was taken by the experiment participants and the AMT workers' feedback, respectively.

a.1 entity-relationship model

Figure 37: Entity-relationship model for the PostgreSQL database: in purple, the tables containing the original Pecan Street data; in green, the tables for the annotations; and in orange, the tables for the users' management.
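Since only the figure documents the actual schema, the following is a hypothetical reading of its three colored table groups as PostgreSQL DDL executed through psycopg2; all table and column names, as well as the connection string, are assumptions for illustration and do not come from the thesis.

```python
# Hypothetical reading of Figure 37 as PostgreSQL DDL: table and column
# names, as well as the connection string, are assumptions for illustration.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS pecan_readings (      -- purple: original Pecan Street data
    household_id  INTEGER       NOT NULL,
    appliance     TEXT          NOT NULL,
    ts            TIMESTAMPTZ   NOT NULL,
    power_w       REAL          NOT NULL,
    PRIMARY KEY (household_id, appliance, ts)
);
CREATE TABLE IF NOT EXISTS annotators (          -- orange: users' management
    annotator_id  SERIAL PRIMARY KEY,
    display_name  TEXT NOT NULL,
    badges        INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS annotations (         -- green: crowd-provided labels
    annotation_id SERIAL PRIMARY KEY,
    annotator_id  INTEGER REFERENCES annotators,
    household_id  INTEGER NOT NULL,
    appliance     TEXT NOT NULL,
    label         TEXT NOT NULL,
    span_start    TIMESTAMPTZ NOT NULL,
    span_end      TIMESTAMPTZ NOT NULL
);
"""

def create_schema(dsn: str = "dbname=cafed user=cafed") -> None:  # placeholder DSN
    """Create the three table groups in one transaction."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
```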

a.2 cafed survey

We ask you to fill this survey truthfully, without using Google or any other search engine. If you do not know the correct answer, just guess what you think could be the right one. The results of this phase do not impact the other phases.


• How much power in watts does a microwave use on average?
  – 20-100 [W]
  – 80-250 [W]
  – 200-850 [W]
  – 600-1500 [W]
  – 900-2100 [W]

• Do you know how a fridge works? – Yes – No

• If yes, how? Describe shortly (in 3-4 sentences).

• Do you understand the concept of an air-compressor? – Yes – No

• How much power in watts does a washing machine use when it is idle?
  – 0 [W]
  – 10-20 [W]
  – 30-50 [W]
  – 100-200 [W]
  – 250-350 [W]
  – 400-500 [W]

• Do you have an idea of what the daily power consumption curve of a fridge approximately looks like? – Yes – No

• Could you, given a power consumption curve, determine at what time a person has interacted with a fridge (e.g., opened the door)? – Yes – No

• If yes, could you also do it for other devices (e.g., washing ma- chine, dishwasher, air conditioner, light bulb, etc.)? – Yes – No

• What is the average power consumption in watts of an LED light bulb (equivalent to a 60 [W] incandescent light bulb)?

  – 5-10 [W]
  – 10-15 [W]
  – 15-20 [W]
  – 20-25 [W]
  – 25-30 [W]

• How many cycles does a dishwasher program have?
  – 1
  – 2
  – 3
  – 4
  – 5

• How do you rate your knowledge in the field of annotating energy consumption curves?
  – 1 (very poor)
  – 2
  – 3
  – 4
  – 5 (very good)

• Feedback, comments, etc. (optional)

a.3 amt users’ feedback

• “Thank you! I tried.”

• “I think this was one of the most interesting things I’ve done in quite some time!”

• “Thanks sorry for all the trouble it has caused!!!”

• “Thanks for a good HIT.”

• “I really enjoyed this task! Is there anyway you can provide me with feedback to let me know how I did? Thank you!”

• “Good and easy too use tool ! I would like to work for you more please email me for more work available, thanks.”

• “Was fun and interesting to do. Also fun to just learn about annotating.”

• “One or two of the charts had a hard time loading. I’m not sure if it was from an error on the page or my DSL connection, but I figured it was worth mentioning.”

• “This task was interesting to complete.”

• “What a great survey!”

• “Pretty cool!”

• “I would like to do more of these. I can even do at a cheaper cost. Just let me know if I need to improve in any area. My mechanical turk ID is: ***REMOVED***. Thank you for the op- portunity.”

• “Please add direction keys function to move between times- tamps for effective moving between the graph.”

• “Good study, I want more.”

• “Interesting to learn about energy consumption in a house. I however do not use a dishwasher, microwave or a clothes-washer and feel that western society uses too much energy. It would be better if we could do with less.”

• “Thanks, it was really an informative task and an eye opener.”

• “Too long”

• “Thanks for the opportunity.”

• “I learned a lot. Thanks.”

• “It was enjoyable! Thank you”

BIBLIOGRAPHY

[1] Wokje Abrahamse and Linda Steg. “How Do Socio-demographic and Psychological Factors Relate to Households’ Direct and Indirect Energy Use and Savings?” In: Journal of Economic Psy- chology 30.5 (Oct. 2009), 711–720. doi: 10.1016/j.joep.2009. 05.006. [2] Alliance Commision on National Energy Efficiency Policy (AC- NEEP). The History of Energy Efficiency. Tech. rep. Washington, DC, USA: ACNEEP, 2013, 1–45. url: https://www.ase.org/ sites / ase . org / files / resources / Media % 20browser / ee _ commission_history_report_2-1-13.pdf. [3] Jake K. Aggarwal and Michael S. Ryoo. “Human Activity Anal- ysis.” In: ACM Computing Surveys 43.3 (Apr. 2011), 1–43. doi: 10.1145/1922649.1922653. [4] Rakesh Agrawal, Tomasz Imieli´nski,and Arun Swami. “Min- ing Association Rules Between Sets of Items in Large Databases.” In: Proceedings of the 1993 ACM SIGMOD International Confer- ence on Management of Data (SIGMOD ’93). Washington, DC, USA: ACM, May 1993, 207–216. doi: 10.1145/170036.170072. [5] Rakesh Agrawal and Ramakrishnan Srikant. “Mining Sequen- tial Patterns: Generalizations and Performance Improvements.” In: Proceedings of the 11th International Conference on Data Engi- neering (ICDE ’95). Taipei, Taiwan: IEEE, Mar. 1995, 3–14. doi: 10.1109/ICDE.1995.380415. [6] Luis von Ahn. “Games With a Purpose.” In: Computer 39.6 (June 2006), 92–94. doi: 10.1109/MC.2006.196. [7] Adrian Albert, Timnit Gebru, and Jerome Ku. “Drivers of Vari- ability in Energy Consumption.” In: Proceedings of the 1st ECM- L/PKDD International Workshop onData Analytics for Renewable Energy Integration (DARE ’13). Prague, Czech Republic: ECML PKKD, Sept. 2013, 1–12. [8] Adrian Albert and Ram Rajagopal. “Building Dynamic Ther- mal Profiles of Energy Consumption for Individuals and Neigh- borhoods.” In: Proceedings of the 2013 IEEE International Confer- ence on Big Data (BigData ’13). Santa Clara, CA, USA: IEEE, Oct. 2013, 723–728. doi: 10.1109/BigData.2013.6691644. [9] Adrian Albert and Ram Rajagopal. “Smart Meter Driven Seg- mentation: What Your Consumption Says About You.” In: IEEE Transactions on Power Systems 28.4 (Nov. 2013), 4019–4030. doi: 10.1109/TPWRS.2013.2266122.


[10] Adrian Albert and Ram Rajagopal. “Cost-of-service Segmen- tation of Energy Consumers.” In: IEEE Transactions on Power Systems 29.6 (Nov. 2014), 2795–2803. doi: 10.1109/TPWRS.2014. 2312721. [11] Adrian Albert and Ram Rajagopal. “Finding the Right Users for Thermal Demand-response : An Experimental Evaluation.” In: IEEE Transactions on Smart Grid (Apr. 2016), 1–16. issn: 1949- 3053. doi: 10.1109/TSG.2016.2555985. [12] Aníbal de Almeida, Paula Fonseca, Barbara Schlomann, and Nicolai Feilberg. “Characterization of the Household Electric- ity Consumption in the EU, Potential Energy Savings and Spe- cific Policy Recommendations.” In: Energy and Buildings 43.8 (Aug. 2011), 1884–1894. doi: 10 . 1016 / j . enbuild . 2011 . 03 . 027. [13] Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. “Steering User Behavior With Badges.” In: Pro- ceedings of the 22nd International Conference on World Wide Web (WWW ’13). Rio de Janeiro, Brazil: ACM, May 2013, 95–106. doi: 10.1145/2488388.2488398. [14] Kyle Anderson, Adrian Ocneanu, Diego Benitez, Derrick Carl- son, Anthony Rowe, and Mario Berges. “BLUED: A Fully La- beled Public Dataset for Event-based Non-intrusive Load Mon- itoring Research.” In: Proceedings of the 2nd KDD Workshop on Data Mining Applications in Sustainability (SustKDD ’12). Bei- jing, China: ACM, Aug. 2012, 1–5. [15] Judd Antin and Elizabeth F. Churchill. “Badges in Social Me- dia: A Social Psychological Perspective.” In: Proceedings of the 2011 CHI Workshop Gamification (CHI ’11). Vancouver, BC, Canada: ACM, May 2011, 1–4. [16] Kathleen Carrie Armel, Abhay Gupta, Gireesh Shrimali, and Adrian Albert. “Is Disaggregation the Holy Grail of Energy Efficiency? The Case of Electricity.” In: 52.0 (Jan. 2013), 213– 234. doi: 10.1016/j.enpol.2012.08.062. [17] CEDIA Awards. Best Integrated Home (Over £250,000). 2015. url: http://www.cediaawards.org/about/archive/Awards-2015/ best-integrated-home-over-250k-2015 (visited on 05/03/2016). [18] Ling Bao and Stephen S. Intille. “Activity Recognition from User-annotated Acceleration Data.” In: Proceedings of the 2004 International Conference on Pervasive Computing (Pervasive ’04). Linz, Austria: Springer, Apr. 2004, 1–17. doi: 10.1007/b96922. [19] Sean Barker, Sandeep Kalra, David Irwin, and Prashant Shenoy. “PowerPlay: Creating Virtual Power Meters Through Online Load Tracking.” In: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys ’14). Bibliography 121

Memphis, TN, USA: ACM, Nov. 2014, 60–69. doi: 10.1145/ 2674061.2674068. [20] Sean Barker, Aditya Mishra, David Irwin, Emmanuel Cecchet, and Prashant Shenoy. “Smart*: An Open Data Set and Tools for Enabling Research in Sustainable Homes.” In: Proceedings of the 2nd KDD Workshop on Data Mining Applications in Sustain- ability (SustKDD ’12). Beijing, China: ACM, Aug. 2012, 1–6. [21] Sean Barker, Aditya Mishra, David Irwin, Prashant Shenoy, and Jeannie Albrecht. “SmartCap: Flattening Peak Electricity Demand in Smart Homes.” In: Proceedings of the 2012 IEEE In- ternational Conference on Pervasive Computing and Communica- tions (PerCom ’12). Lugano, Switzerland: IEEE, Mar. 2012, 67– 75. doi: 10.1109/PerCom.2012.6199851. [22] Nipun Batra, Manoj Gulati, Amarjeet Singh, and Mani B. Sri- vastava. “It’s Different: Insights Into Home Energy Consump- tion in India.” In: Proceedings of the 5th ACM Workshop on Embed- ded Systems For Energy-Efficient Buildings (BuildSys ’13). Rome, Italy: ACM, Nov. 2013, 1–8. doi: 10.1145/2528282.2528293. [23] Nipun Batra, Jack Kelly, Oliver Parson, Haimonti Dutta, William Knottenbelt, Alex Rogers, Amarjeet Singh, and Mani Srivas- tava. “NILMTK: An Open Source Toolkit for Non-intrusive Load Monitoring.” In: Proceedings of the 5th International Con- ference on Future Energy Systems (e-Energy ’14). Cambridge, UK: ACM, June 2014, 265–276. doi: 10.1145/2602044.2602051. [24] Gerald Bauer, Karl Stockinger, and Paul Lukowicz. “Recog- nizing the Use-mode of Kitchen Appliances From Their Cur- rent Consumption.” In: Proceedings of the 4th European Confer- ence on Smart Sensing and Context (EuroSSC ’09). Guildford, UK: Springer, Sept. 2009, 163–176. doi: 10.1007/978-3-642-04471- 7_13. [25] Christian Beckel. “Scalable and Personalized Energy Efficiency Services with Smart Meter Data.” Ph.D. Dissertation. ETH Zurich, 2015, 1–217. doi: 10.3929/ethz-a-010578740. [26] Christian Beckel, Wilhelm Kleiminger, Thorsten Staake, and Silvia Santini. “The ECO Data Set and the Performance of Non-intrusive Load Monitoring Algorithms.” In: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys ’14). Memphis, TN, USA: ACM, Nov. 2014, 80–89. doi: 10.1145/2674061.2674064. [27] Christian Beckel, Leyna Sadamori, and Silvia Santini. “Towards Automatic Classification of Private Households Using Electric- ity Consumption Data.” In: Proceedings of the 4th ACM Work- shop on Embedded Sensing Systems for Energy-Efficiency in Build- 122 Bibliography

ings (BuildSys ’12). Toronto, Canada: ACM, Nov. 2012, 169–176. doi: 10.1145/2422531.2422562.
[28] Christian Beckel, Leyna Sadamori, and Silvia Santini. “Automatic Socio-economic Classification of Households Using Electricity Consumption Data.” In: Proceedings of the 4th International Conference on Future Energy Systems (e-Energy ’13). Berkeley, CA, USA: ACM, May 2013, 75–86. doi: 10.1145/2487166.2487175.
[29] Michael Bensimhoun. N-dimensional Cumulative Function and Other Useful Facts About Gaussian and Normal Densities. Tech. rep. Jerusalem, Israel, 2009, 1–8.
[30] Georges Berweiler. SMART HOME - Fantasme, Réalités et Enjeux. Lausanne, Switzerland, 2016. url: https://www.electrosuisse.ch/de/verband/fachgesellschaften/itg/itg-rueckblicke/160308-smart-home-lausanne.html.
[31] Julia Blasch, Nilkanth Kumar, and Massimo Filippini. Boundedly Rational Consumers, Energy and Investment Literacy, and the Display of Information on Household Appliances. Tech. rep. ETH Zurich, June 2016, 1–43. doi: 10.3929/ethz-a-010656875.
[32] David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. “The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies.” In: Journal of the ACM 57.2 (Jan. 2010), 1–30. doi: 10.1145/1667053.1667056.
[33] David M. Blei and Michael I. Jordan. “Variational Inference for Dirichlet Process Mixtures.” In: Bayesian Analysis 1.1 (Mar. 2006), 121–143. doi: 10.1214/06-BA104.
[34] Gro Harlem Brundtland. Report of the World Commission on Environment and Development: Our Common Future. Tech. rep. Oslo, Norway: WCED, Mar. 1987, 1–300. url: https://sustainabledevelopment.un.org/content/documents/5987our-common-future.pdf.
[35] Hông-Ân Cao, Christian Beckel, and Thorsten Staake. “Are Domestic Load Profiles Stable Over Time? An Attempt to Identify Target Households for Demand Side Management Campaigns.” In: Proceedings of the 39th Annual Conference of the IEEE Industrial Electronics Society (IECON ’13). Vienna, Austria: IEEE, Nov. 2013, 4733–4738. doi: 10.1109/IECON.2013.6699900.
[36] Hông-Ân Cao, Felix Rauchenstein, Tri Kurniawan, Karl Aberer, and Nuno Nunes. “Leveraging User Expertise in Collaborative Systems for Annotating Energy Datasets.” In: Proceedings of the 2016 Workshop on Smart Grids at the 2016 IEEE International Conference on Big Data (BigData ’16). Washington, DC, USA: IEEE, Dec. 2016, 3087–3096. doi: 10.1109/BigData.2016.7840963.

[37] Hông-Ân Cao, Tri Kurniawan Wijaya, and Karl Aberer. “Es- timating Human Interactions with Electrical Appliances for Activity-based Energy Savings Recommendations.” In: Proceed- ings of the 1st ACM Conference on Embedded Systems for Energy- Efficient Buildings (BuildSys ’14). Memphis, TN, USA: ACM, Nov. 2014, 206–207. doi: 10.1145/2674061.2675037. [38] Hông-Ân Cao, Tri Kurniawan Wijaya, and Karl Aberer. “Es- timating Human Interactions With Electrical Appliances for Activity-based Energy Savings Recommendations.” In: Proceed- ings of the 2016 IEEE International Conference on Big Data (Big- Data ’16). Washington, DC, USA: IEEE, Dec. 2016, 1301–1308. doi: 10.1109/BigData.2016.7840734. [39] Hông-Ân Cao, Tri Kurniawan Wijaya, Karl Aberer, and Nuno Nunes. “A Collaborative Framework for Annotating Energy Datasets.” In: Proceedings of the 2015 Workshop for Sustainable Development at the 2015 IEEE International Conference on Big Data (BigData ’15). Santa Clara, CA, USA: IEEE, Oct. 2015, 2716–2725. doi: 10.1109/BigData.2015.7364072. [40] Hông-Ân Cao, Tri Kurniawan Wijaya, Karl Aberer, and Nuno Nunes. “Temporal Association Rules For Electrical Activity Detection in Residential Homes.” In: Proceedings of the 2016 Workshop on Smart Grids at the 2016 IEEE International Confer- ence on Big Data (BigData ’16). Washington, DC, USA: IEEE, Dec. 2016, 3097–3106. doi: 10.1109/BigData.2016.7840964. [41] Michael Carlowicz. World of Change: Global Temperatures. 2010. url: http://earthobservatory.nasa.gov/Features/WorldOfChange/ decadaltemp.php (visited on 12/19/2016). [42] Andrew Carlson, Justin Betteridge, and Bryan Kisiel. “Toward an Architecture for Never-Ending Language Learning.” In: Pro- ceedings of the 24th Conference on Artificial Intelligence (AAAI ’10). Atlanta, GA, USA: AAAI, July 2010, 1306–1313. [43] Davide Castelvecchi. “Will Tesla’s Battery Change the Energy Market?” In: Nature (May 2015). doi: 10.1038/nature.2015. 17469. [44] Michael M. Cernea. “Social Impacts and Social Risks in Hy- dropower Programs: Preemptive Planning and Counter-risk Measures.” In: Proceedings of the United Nations Symposium on Hydropower and Sustainable Development. Beijing, China: CHIN- COLD, Aug. 2004, 1–22. url: http : / / www . un . org / esa / sustdev/sdissues/energy/op/hydro_cernea_social%20impacts_ backgroundpaper.pdf. [45] Rung-Fang Chang and Chan-Nan Lu. “Load Profiling and Its Applications in Power Market.” In: Proceedings of the 2003 IEEE Power Engineering Society General Meeting (PES ’03). Toronto, 124 Bibliography

ON, Canada: IEEE, July 2003, 974–978. doi: 10.1109/PES.2003. 1270442. [46] Sahar Changuel, Nicolas Labroche, and Bernadette Bouchon- meunier. “A General Learning Method for Automatic Title Extraction from HTML Pages.” In: Proceedings of the 6th In- ternational Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM ’09). Vol. 5632. Leipzig, Germany: Springer, July 2009, 704–718. doi: 10.1007/978-3-642-03070- 3. [47] Chao Chen and Diane J. Cook. “Behavior-based Home Energy Prediction.” In: Proceedings of the 8th International Conference on Intelligent Environments (IE ’12). Guanajuato, Mexico: IEEE, June 2012, 57–63. doi: 10.1109/IE.2012.44. [48] Chao Chen, Barnan Das, and Diane J. Cook. “Energy Predic- tion Based on Resident’s Activity.” In: Proceedings of the 4th In- ternational Workshop on Knowledge Discovery from Sensor Data (SensorKDD ’10). Washington, DC, USA: ACM, July 2010, 1–7. [49] Dong Chen, Sean Barker, Adarsh Subbaswamy, David Irwin, and Prashant Shenoy. “Non-intrusive Occupancy Monitoring Using Smart Meters.” In: Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings (BuildSys’13). Rome, Italy: ACM, Nov. 2013, 1–8. doi: 10 . 1145 / 2528282 . 2528294. [50] Gianfranco Chicco, Roberto Napoli, Petru Postolache, Mircea Scutariu, and Cornel Toader. “Electric Energy Customer Char- acterisation for Developing Dedicated Market Strategies.” In: Proceedings of the 2001 IEEE Porto Power Tech Conference (Pow- erTech ’01). Porto, Portugal: IEEE, Sept. 2001, 6. doi: 10.1109/ PTC.2001.964627. [51] Gianfranco Chicco, Roberto Napoli, Petru Postolache, Mircea Scutariu, and Cornel Toader. “Customer Characterization Op- tions for Improving the Tariff Offer.” In: IEEE Transactions on Power Systems 18.1 (Feb. 2003), 381–387. doi: 10.1109/TPWRS. 2002.807085. [52] David Coady, Ian Parry, Louis Sears, and Baoping Shang. “How Large Are Global Energy Subsidies?” In: IMF Working Papers 15.105 (May 2015), 1. doi: 10.5089/9781513532196.001. [53] Commission for Energy Regulation. Electricity Smart Metering Customer Behaviour Trials (CBT) Findings Report. Tech. rep. Dublin, Ireland: Commission for Energy Regulation, 2011, 1–99. [54] Commission for Energy Regulation. Electricity Smart Metering Technology Trials Findings Report. Tech. rep. Dublin, Ireland: Com- mission for Energy Regulation, 2011, 1–58. Bibliography 125

[55] Commission for Energy Regulation. Smart Metering Informa- tion Paper 4: Results of Electricity Cost-benefit Analysis, Customer Behaviour Trials and Technology Trials. Tech. rep. Dublin, Ireland: Commission for Energy Regulation, 2011, 1–39. [56] European Commission. Energy 2020 - A Strategy for Competitive, Sustainable and Secure Energy Com/2010/0639. 2010. url: https: //ec.europa.eu/energy/en/topics/energy-strategy/2020- energy-strategy (visited on 04/21/2016). [57] Swiss Confederation. Loi sur l’Énergie. Bern, Switzerland, 2017. url: https://www.admin.ch/opc/fr/federal-gazette/2016/ 7469.pdf. [58] Diane J. Cook. “Learning Setting-generalized Activity Models for Smart Spaces.” In: IEEE Intelligent Systems 27.1 (Jan. 2012), 32–38. doi: 10.1109/MIS.2010.112. arXiv: NIHMS150003. [59] Diane J. Cook, Juan C. Augusto, and Vikramaditya R. Jakkula. “Ambient Intelligence: Technologies, Applications, and Oppor- tunities.” In: Pervasive and Mobile Computing 5.4 (Aug. 2009), 277–298. doi: 10.1016/j.pmcj.2009.04.001. [60] Diane J. Cook, Aaron S. Crandall, Brian L. Thomas, and Narayanan C. Krishnan. “CASAS: A Smart Home in a Box.” In: Computer 46.7 (July 2013), 62–69. doi: 10.1109/MC.2012.328. [61] Diane J. Cook and Maureen Schmitter-Edgecombe. “Assessing the Quality of Activities in a Smart Environment.” In: Methods of information in medicine 48.5 (May 2009), 480–485. doi: 10 . 3414/ME0592. [62] Giuseppe T. Costanzo, Anna M. Kosek, Guchuan Zhu, Luca Ferrarini, Miguel F. Anjos, and Gilles Savard. “An Experimen- tal Study on Load-peak Shaving in Smart Homes by Means of Online Admission Control.” In: Proceedings of the 3rd IEEE PES International Conference on Innovative Smart Grid Technolo- gies (ISGT Europe ’12). Berlin, Germany: IEEE, Oct. 2012, 1–8. doi: 10.1109/ISGTEurope.2012.6465658. [63] Mark Costanzo, Dane Archer, Elliot Aronson, and Thomas Pettigrew. “Energy Conservation Behavior: The Difficult Path From Information to Action.” In: American Psychologist 41.5 (May 1986), 521–528. doi: 10.1037/0003-066X.41.5.521. [64] Pietro Cottone, Salvatore Gaglio, Giuseppe Lo Re, and Marco Ortolani. “User Activity Recognition for Energy Saving in Smart Homes.” In: Pervasive and Mobile Computing 16.Part A (Jan. 2015), 156–170. doi: 10.1016/j.pmcj.2014.08.006. 126 Bibliography

[65] Pietro Cottone, Salvatore Gaglio, Giuseppe Lo Re, and Marco Ortolani. “User Activity Recognition for Energy Saving in Smart Homes.” In: Proceedings of the 3rd Conference on Sustainable Internet and ICT for Sustainability (SustainIT ’13). Palermo, Italy: IEEE, Oct. 2013, 1–9. doi: 10.1109/SustainIT.2013.6685196.
[66] Sarah Darby. “Smart Metering: What Potential for Householder Engagement?” In: Building Research & Information 38.5 (Oct. 2010), 442–457. doi: 10.1080/09613218.2010.492660.
[67] Sajal K. Das, Diane J. Cook, Amiya Bhattacharya, Edwin O. Heierman, and Tze-Yun Lin. “The Role of Prediction Algorithms in the MavHome Smart Home Architecture.” In: IEEE Wireless Communications 9.6 (Dec. 2002), 77–84. doi: 10.1109/MWC.2002.1160085.
[68] Alexander Philip Dawid and Allan M. Skene. “Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm.” In: Applied Statistics 28.1 (Mar. 1979), 20–28.
[69] Raymond De Young. “Changing Behavior and Making it Stick: The Conceptualization and Management of Conservation Behavior.” In: Environment and Behavior 25.3 (May 1993), 485–505. doi: 10.1177/0013916593253003. url: http://eab.sagepub.com/cgi/doi/10.1177/0013916593253003.
[70] Sebastian Deterding, Dan Dixon, Rilla Khaled, and Lennart Nacke. “From Game Design Elements to Gamefulness: Defining "Gamification".” In: Proceedings of the 15th International Academic MindTrek Conference on Envisioning Future Media Environments (MindTrek ’11). Tampere, Finland: ACM, Sept. 2011, 9–15. doi: 10.1145/2181037.2181040.
[71] Rich DeVaul, Michael Sung, Jonathan Gips, and Alex Pentland. “MIThril 2003: Applications and Architecture.” In: Proceedings of the 7th IEEE International Symposium on Wearable Computers (ISWC ’03). Sanibel Island, FL, USA: IEEE, Oct. 2003, 4–11. doi: 10.1109/ISWC.2003.1241386. url: http://www.ieeeexplore.ws/document/1241386/.
[72] Anind K. Dey and Alan Newberger. “Support for Context-aware Intelligibility and Control.” In: Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI ’09). Boston, MA, USA: ACM, Apr. 2009, 859–868. doi: 10.1145/1518701.1518832. url: http://dl.acm.org/citation.cfm?doid=1518701.1518832.
[73] Chuong B. Do. “The Multivariate Gaussian Distribution.” Stanford, CA, USA, 2008.

[74] Luc Dufour, Dominique Genoud, Gianluca Rizzo, Antonio J. Jara, Pierre Roduit, Jean Jacques Bezian, and Bruno Ladevie. “Test Set Validation for Home Electrical Signal Disaggrega- tion.” In: Proceedings of the 8th International Conference on Innova- tive Mobile and Internet Services in Ubiquitous Computing (IMIS ’14). Birmingham, UK: IEEE, July 2014, 415–420. doi: 10.1109/ IMIS.2014.56. [75] Julien Eberle. “Energy-efficient Continuous Context Sensing on Mobile Phones.” Ph.D. Dissertation. EPFL, 2015, 1–170. doi: 10.5075/epfl-thesis-6761. [76] Dominik Egarter and Wilfried Elmenreich. “Autonomous Load Disaggregation Approach Based on Active Power Measure- ments.” In: Proceedings of the 1st IEEE Workshop on Pervasive Energy Services (PerEnergy ’15). St. Louis, MO, USA: IEEE, Mar. 2015, 293–298. doi: 10.1109/PERCOMW.2015.7134051. [77] Dominik Egarter, Manfred Pöchacker, and Wilfried Elmenre- ich. “Complexity of Power Draws for Load Disaggregation.” In: CoRR abs/1501.0 (Jan. 2015), 1–26. arXiv: 1501.02954 [cs.OH]. [78] Karen Ehrhardt-Martinez, Kat A. Donnelly, and John A. Lait- ner. Advanced Metering Initiatives and Residential Feedback Pro- grams: A Meta-review for Household Electricity-saving Opportuni- ties. Tech. rep. Washington, DC, USA: ACEEE, 2010, 1–140. [79] Carsten Eickhoff, Christopher G. Harris, Arjen P. de Vries, and Padmini Srinivasan. “Quality through Flow and Immer- sion: Gamifying Crowdsourced Relevance Assessments.” In: Proceedings of the 35th International ACM SIGIR Conference on Re- search and Development in Information Retrieval (SIGIR ’12). Port- land, OR, USA: ACM, 2012, 871–880. doi: 10.1145/2348283. 2348400. [80] Jon Erdman and Quincy Vagell. Lower 48 States Just Experienced the Warmest Winter on Record. 2016. url: https : / / weather . com/news/climate/news/record-warmest-winter-us-2015- 2016 (visited on 12/13/2016). [81] Varick L. Erickson, Miguel A. Carreira-Perpinan, and Alberto E. Cerpa. “OBSERVE: Occupancy-based System for Efficient Reduction of HVAC Energy.” In: Proceedings of the 10th Inter- national Conference on Information Processing in Sensor Networks (IPSN ’11). Chicago, IL, USA: IEEE, Apr. 2011, 258–269. [82] European Commission. Smart Grids and Meters. 2016. url: https: //ec.europa.eu/energy/en/topics/markets-and-consumers/ smart-grids-and-meters (visited on 12/16/2016). 128 Bibliography

[83] Peter Faymonville, Kai Wang, John Miller, and Serge Belongie. “CAPTCHA-based Image Labeling on the Soylent Grid.” In: Proceedings of the ACM SIGKDD Workshop on Human Computa- tion (HCOMP ’09). Paris, France: ACM, June 2009, 46–49. doi: 10.1145/1600150.1600167. [84] John Feminella, Devika Pisharoty, and Kamin Whitehouse. “Pi- loteur: A Lightweight Platform for Pilot Studies of Smart Homes.” In: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys ’14). Memphis, TN, USA: ACM, Nov. 2014, 110–119. doi: 10.1145/2676061.2674076. [85] Anna Fensel, Slobodanka Tomic, Vikash Kumar, Milan Ste- fanovic, Sergey V. Aleshin, and Dmitry O. Novikov. “SESAME- S: Semantic Smart Home System for Energy Efficiency.” In: Informatik-Spektrum 36.1 (Feb. 2013), 46–57. doi: 10.1007/s00287- 012-0665-9. [86] Tim Finin, Will Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. “Annotating Named En- tities in Twitter Data with Crowdsourcing.” In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (CSLDAMT ’10). Vol. 2010. June. Los Angeles, CA, USA: ACL, June 2010, 80–88. [87] Jon Froehlich. “Promoting Energy Efficient Behaviors in the Home Through Feedback: The Role of Human-computer In- teraction.” In: Proceedings of the Human Computer Interaction Consortium Workshop (HCIC ’09). Fraser, CO, USA: HCIC, Feb. 2009, 1–11. [88] Jon Froehlich, Kate Everitt, and James Fogarty. “Sensing Op- portunities for Personalized Feedback Technology to Reduce Consumption.” In: Proceedings of the 2009 CHI Workshop on Defining the Role of HCI in the Challenge of Sustainability (CHI ’09). ACM, Apr. 2009, 1–7. [89] Jon Froehlich, Leah Findlater, and James Landay. “The Design of Eco-feedback Technology.” In: Proceedings of the 2010 ACM SIGCHI Human Factors in Computing Systems Conference (CHI ’10). Atlanta, GA, USA: ACM, Apr. 2010, 1999–2008. doi: 10. 1145/1753326.1753629. [90] Gartner. Gartner Says 6.4 Billion Connected "Things" Will Be in Use in 2016, up 30 Percent From 2015. 2015. url: http://www. gartner.com/newsroom/id/3165317 (visited on 11/22/2016). [91] Birgitta Gatersleben. “Sustainable Household Consumption and Quality of Life: The Acceptability of Sustainable Consumption Patterns and Consumer Policy Strategies.” In: International Jour- nal of Environment and Pollution 15.2 (2001), 200–216. doi: 10. Bibliography 129

1504 / IJEP . 2001 . 000596. url: http : / / www . inderscience . com/link.php?id=596. [92] Mahmoud Ghofrani, Mohammad Hassanzadeh, Mehdi Etezadi- Amoli, and M. Sami Fadali. “Smart Meter Based Short-term Load Forecasting for Residential Customers.” In: Proceedings of the 2011 North American Power Symposium (NAPS ’11). Boston, MA, USA: IEEE, Aug. 2011, 1–5. doi: 10 . 1109 / NAPS . 2011 . 6025124. [93] Lazaros Gkatzikis, Iordanis Koutsopoulos, and Theodoros Sa- lonidis. “The Role of Aggregators in Smart Grid Demand Re- sponse Markets.” In: IEEE Journal on Selected Areas in Commu- nications 31.7 (July 2013), 1247–1257. doi: 10.1109/JSAC.2013. 130708. [94] Gideon Goldin and Adam Darlow. TurkGate (Version 0.4.0) [Soft- ware]. 2013. url: http://gideongoldin.github.com/TurkGate/ (visited on 11/21/2016). [95] Mathieu Guillame-Bert and James L. Crowley. “Learning Tem- poral Association Rules on Symbolic Time Sequences.” In: Pro- ceedings of the 4th Asian Conference on Machine Learning (ACML ’12). Singapore: JMLR, Nov. 2012, 159–174. [96] Mathieu Guillame-Bert and Artur Dubrawski. “Learning Tem- poral Rules to Forecast Events in Multivariate Time Sequences.” In: Proceedings of the 2014 NIPS Workshop on Machine Learning for Clinical Data, Healthcare and Genomics (NIPS ’14). Montreal, QC, Canada: Curran Associates, Dec. 2014, 1–9. [97] Manoj Gulati, Shobha Sundar Ram, and Amarjeet Singh. “An in Depth Study Into Using EMI Signatures for Appliance Iden- tification.” In: Proceedings of the 1st ACM Conference on Embed- ded Systems for Energy-Efficient Buildings (BuildSys ’14). Mem- phis, TN, USA: ACM, Nov. 2014, 70–79. doi: 10.1145/2674061. 2674070. [98] Manu Gupta, Stephen S. Intille, and Kent Larson. “Adding GPS-control to Traditional Thermostats: An Exploration of Po- tential Energy Savings and Design Challenges.” In: Proceedings of the 7th International Conference on Pervasive Computing (Perva- sive ’09). Vol. 3468. Nara, Japan: Springer, May 2009, 95–114. doi: 10.1007/978-3-642-01516-8_8. [99] Sidhant Gupta, Matthew S. Reynolds, and Shwetak N. Patel. “ElectriSense: Single-point Sensing Using EMI for Electrical Event Detection and Classification in the Home.” In: Proceed- ings of the 12th International Conference on Ubiquitous Computing (Ubicomp ’10). Copenhagen, Denmark: ACM, Sept. 2010, 139– 148. doi: 10.1145/1864349.1864375. 130 Bibliography

[100] Stephen Haben, Colin Singleton, and Peter Grindrod. “Analysis and Clustering of Residential Customers Energy Behavioral Demand Using Smart Meter Data.” In: IEEE Transactions on Smart Grid 7.1 (Jan. 2016), 136–144. doi: 10.1109/TSG.2015.2409786.
[101] Stephen Haben, Jonathan Ward, Danica Vukadinovic Greetham, Colin Singleton, and Peter Grindrod. “A New Error Measure for Forecasts of Household-level, High Resolution Electrical Energy Consumption.” In: International Journal of Forecasting 30.2 (Apr. 2014), 246–256. doi: 10.1016/j.ijforecast.2013.08.002.
[102] Son N. Han, Gyu Myoung Lee, and Noel Crespi. “Semantic Context-aware Service Composition for Building Automation System.” In: IEEE Transactions on Industrial Informatics 10.1 (Feb. 2014), 752–761. doi: 10.1109/TII.2013.2252356.
[103] George W. Hart. “Nonintrusive Appliance Load Monitoring.” In: Proceedings of the IEEE 80.12 (Aug. 1992), 1870–1891. doi: 10.1109/5.192069.
[104] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York, NY: Springer, Mar. 2009, 1–745. doi: 10.1007/978-0-387-84858-7.
[105] Tao Hong and Shu Fan. “Probabilistic Electric Load Forecasting: A Tutorial Review.” In: International Journal of Forecasting 32.3 (July 2016), 914–938. doi: 10.1016/j.ijforecast.2015.11.011.
[106] Frank Höppner. “Discovery of Temporal Patterns.” In: Proceedings of the 2001 European Conference on Knowledge Discovery in Databases (PKDD ’01). Freiburg, Germany: Springer, Sept. 2001, 192–203. doi: 10.1007/3-540-44794-6_16.
[107] Frank Höppner and Frank Klawonn. “Finding Informative Rules in Interval Sequences.” In: Proceedings of the 4th International Symposium on Intelligent Data Analysis (IDA ’01). Cascais, Portugal: Springer, Sept. 2001, 125–134. doi: 10.1007/3-540-44816-0_13.
[108] Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek. “Can Shared-neighbor Distances Defeat the Curse of Dimensionality?” In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM ’10). Heidelberg, Germany: Springer, July 2010, 482–500. doi: 10.1007/978-3-642-13818-8_34.

[109] Zhi-Kai Huang and Kwok-Wing Chau. “A New Image Thresh- olding Method Based on Gaussian Mixture Model.” In: Applied and Computation 205.2 (Nov. 2008), 899–907. doi: 10.1016/j.amc.2008.05.130. [110] IEA. Technology Roadmap: Hydropower. Tech. rep. Paris, France: IEA, 2012, 1–68. url: http : / / www . iea . org / publications / freepublications/publication/2012_Hydropower_Roadmap. pdf. [111] Félix Iglesias and Wolfgang Kastner. “Analysis of Similarity Measures in Times Series Clustering for the Discovery of Build- ing Energy Patterns.” In: Energies 6.2 (Jan. 2013), 579–597. doi: 10.3390/en6020579. [112] Stephen S. Intille, Kent Larson, J. S. Beaudin, J. Nawyn, E. Munguia Tapia, and P. Kaushik. “A Living Laboratory for the Design and Evaluation of Ubiquitous Computing Technologies.” In: Proceedings of the 2005 Conference on Human Factors in Comput- ing Systems (CHI ’05). Portland, OR, USA: ACM, Apr. 2005, 1941–1944. doi: 10.1145/1056808.1057062. [113] Constantin Ionescu, Tudor Baracu, Gabriela-Elena Vlad, Ho- ria Necula, and Adrian Badea. “The Historical Evolution of the Energy Efficient Buildings.” In: Renewable and Sustainable Energy Reviews 49 (Sept. 2015), 243–253. doi: 10.1016/j.rser. 2015.04.062. [114] Panagiotis G. Ipeirotis, Foster Provost, and Jing Wang. “Qual- ity management on Amazon Mechanical Turk.” In: Proceed- ings of the 2010 ACM SIGKDD Workshop on Human Computation (HCOMP ’10). Washington, DC, USA: ACM, July 2010, 64. doi: 10.1145/1837885.1837906. [115] Marco Jahn, Marc Jentsch, Christian R. Prause, Ferry Pramu- dianto, Amro Al-Akkad, and Rene Reiners. “The Energy Aware Smart Home.” In: Proceedings of the 5th International Conference on Future Information Technology (FutureTech ’10). Busan, South Korea: IEEE, May 2010, 1–8. doi: 10.1109/FUTURETECH.2010. 5482712. [116] Francois Jammes. “Internet of Things in Energy Efficiency.” In: Ubiquity 2016.February (Feb. 2016), 1–8. doi: 10.1145/2822887. [117] Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Mei-Chun Hsu. “Mining Sequential Patterns by Pattern-growth: The Pre- fixSpan Approach.” In: IEEE Transactions on Knowledge and Data Engineering 16.11 (Nov. 2004), 1424–1440. doi: 10.1109/TKDE. 2004.77. 132 Bibliography

[118] B. M. Johnson. Patterns of Residential Occupancy. Tech. rep. Ot- tawa, ON, Canada: National Research Council Canada. Divi- sion of Building Research, 1981. [119] J. H. Jung, Christoph Schneider, and Joseph Valacich. “Enhanc- ing the Motivational Affordance of Information Systems: The Effects of Real-time Performance Feedback and Goal Setting in Group Collaboration Environments.” In: Management Science 56.4 (Apr. 2010), 724–742. doi: 10.1287/mnsc.1090.1129. [120] Holger Junker, Oliver Amft, Paul Lukowicz, and Gerhard Tröster. “Gesture Spotting with Body-worn Inertial Sensors to Detect User Activities.” In: Pattern Recognition 41.6 (June 2008), 2010– 2024. doi: 10.1016/j.patcog.2007.11.016. [121] Daniel Kahneman. “Maps of Bounded Rationality: Psychology for Behavioral Economics.” In: American Economic Review 93.5 (Nov. 2003), 1449–1475. doi: 10.1257/000282803322655392. [122] Tim van Kasteren, Athanasios Noulas, Gwenn Englebienne, and Ben Kröse. “Accurate Activity Recognition in a Home Setting.” In: Proceedings of the 10th International Conference on Ubiquitous Computing (Ubicomp ’09). Seoul, South Korea: ACM, Sept. 2008, 1–9. doi: 10.1145/1409635.1409637. [123] Isaak Kavasidis, Simone Palazzo, Roberto Di Salvo, Daniela Giordano, and Concetto Spampinato. “An Innovative Web-based Collaborative Platform for Video Annotation.” In: Multimedia Tools and Applications 70.1 (May 2014), 413–432. doi: 10.1007/ s11042-013-1419-7. [124] Gabriella Kazai and Imed Zitouni. “Quality Management in Crowdsourcing Using Gold Judges Behavior.” In: Proceedings of the 9th ACM International Conference on Web Search and Data Mining (WSDM ’16). San Francisco, CA, USA: ACM, Feb. 2016, 267–276. doi: 10.1145/2835776.2835835. [125] Jack Kelly and William Knottenbelt. “The UK-DALE Dataset, Domestic Appliance-level Electricity Demand and Whole-house Demand from Five UK Homes.” In: Scientific Data 2 (Mar. 2015), 1–14. doi: 10.1038/sdata.2015.7. [126] Willett Kempton and Linda L. Layne. “The Consumer’s En- ergy Analysis Environment.” In: Energy Policy 22.10 (Oct. 1994), 857–866. doi: 10.1016/0301-4215(94)90145-7. [127] Willett Kempton and Laura Montgomery. “Folk Quantifica- tion of Energy.” In: Energy 7.10 (Oct. 1982), 817–827. doi: 10. 1016/0360-5442(82)90030-5. Bibliography 133

[128] Eamonn Keogh and Shruti Kasetty. “On the Need for Time Se- ries Data Mining Benchmarks: A Survey and Empirical Demon- stration.” In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’02). Edmonton, AB, Canada: ACM, July 2002, 102–111. doi: 10 . 1145/775047.775062. [129] Cory D. Kidd, Robert Orr, Gregory D. Abowd, Christopher G. Atkeson, Irfan A. Essa, Blair MacIntyre, Elizabeth Mynatt, Thad E. Starner, and Wendy Newstetter. “The Aware Home: A Living Laboratory for Ubiquitous Computing Research.” In: Proceedings of the 2nd International Workshop on Cooperative Buildings, Integrating Information, Organization, and Architecture (CoBuild ’99). Vol. 1670. Pittsburgh, PA, USA: Springer, Oct. 1999, 191–198. doi: 10.1007/10705432_17. [130] Ahhyoun Kim, Minji Kim, and Hyunjoong Kim. “Double-bagging Ensemble Using WAVE.” In: Communications for Statistical Ap- plications and Methods 21.5 (Sept. 2014), 411–422. doi: 10.5351/ CSAM.2014.21.5.411. [131] Hyunjoong Kim, Hyeuk Kim, Hojin Moon, and Hongshik Ahn. “A Weight-adjusted Voting Algorithm for Ensemble of Classi- fiers.” In: Journal of the Korean Statistical Society 40.4 (Dec. 2011), 437–449. doi: 10.1016/j.jkss.2011.03.002. [132] Aniket Kittur, Jeffrey V. Nickerson, Michael Bernstein, Eliza- beth Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John Horton. “The Future of Crowd Work.” In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW ’13). San Antonio, TX, USA: ACM, Feb. 2012, 1301– 1317. doi: 10.1145/2441776.2441923. [133] Wilhelm Kleiminger. “Occupancy Sensing and Prediction for Automated Energy Savings.” Ph.D. Dissertation. ETH Zurich, 2015, 1–207. doi: 10.3929/ethz-a-010450096. [134] Donald E. Knuth. “Computer Programming as an Art.” In: Communications of the ACM 17.12 (Dec. 1974), 667–673. doi: 10.1145/361604.361612. [135] J. Zico Kolter and Matthew J. Johnson. “REDD: A Public Data Set for Energy Disaggregation Research.” In: Proceedings of the 1st KDD Workshop on Data Mining Applications in Sustainability (SustKDD ’11). San Diego, CA, USA: ACM, Aug. 2011, 1–6. [136] Kalimuthu Krishnamoorthy and Thomas Mathew. Statistical Tolerance Regions: Theory, Applications, and Computation. Wiley, 2009, 461. 134 Bibliography

[137] Kalimuthu Krishnamoorthy and Sumona Mondal. “Improved Tolerance Factors for Multivariate Normal Distributions.” In: Communications in Statistics - Simulation and Computation 35.2 (Feb. 2006), 461–478. doi: 10.1080/03610910600591883. [138] Narayanan C. Krishnan and Diane J. Cook. “Activity Recogni- tion on Streaming Sensor Data.” In: Pervasive and Mobile Com- puting 10.PART B (Feb. 2014), 138–154. doi: 10.1016/j.pmcj. 2012.07.003. [139] Jungsuk Kwac, June Flora, and Ram Rajagopal. “Household Energy Consumption Segmentation Using Hourly Data.” In: IEEE Transactions on Smart Grid 5.1 (Jan. 2014), 420–430. doi: 10.1109/TSG.2013.2278477. [140] Jungsuk Kwac, June Flora, and Ram Rajagopal. “Lifestyle Seg- mentation Based on Energy Consumption Data.” In: IEEE Trans- actions on Smart Grid PP.99 (Sept. 2016), 1–16. doi: 10.1109/ TSG.2016.2611600. [141] Jungsuk Kwac and Ram Rajagopal. “Demand Response Tar- geting Using Big Data Analytics.” In: Proceedings of the 2013 IEEE International Conference on Big Data (BigData ’13). Santa Clara, CA, USA: IEEE, Oct. 2013, 683–690. doi: 10.1109/BigData. 2013.6691643. [142] Jungsuk Kwac and Ram Rajagopal. “Data-driven Targeting of Customers for Demand Response.” In: IEEE Transactions on Smart Grid 7.5 (Sept. 2016), 2199–2207. doi: 10.1109/TSG.2015. 2480841. [143] Jungsuk Kwac, Chin-Woo Tan, Nicole Sintov, June Flora, and Ram Rajagopal. “Utility Customer Segmentation Based on Smart Meter Data: Empirical Study.” In: Proceedings of the 2013 IEEE International Conference on Smart Grid Communications (Smart- GridComm ’13). Vancouver, BC, Canada: IEEE, Oct. 2013, 720– 725. doi: 10.1109/SmartGridComm.2013.6688044. [144] Marvin J. Law. “Multivariate Statistical Analysis of Assembly Tolerance Specifications.” Master Thesis. Brigham Young Uni- versity, 1996. [145] Lawrence Berkeley National Laboratory. Standby Power Sum- mary Table. url: #http://standby.lbl.gov/summary- table. html# (visited on 11/21/2016). [146] David C. Lay. Linear Algebra and Its Applications. Pearson, 2012, 576. [147] John Le, Andy Edmonds, Vaughn Hester, and Lukas Biewald. “Ensuring Quality in Crowdsourced Search Relevance Evalu- ation: The Effects of Training Question Distribution.” In: Pro- ceedings of the 33rd International ACM SIGIR Conference on Re- Bibliography 135

search and Development in Information Retrieval (SIGIR ’10). Geneva, Switzerland: ACM, July 2010, 17–20. [148] Hongwei Li, Bo Zhao, and Ariel Fuxman. “The Wisdom of Mi- nority: Discovering and Targeting the Right Group of Work- ers for Crowdsourcing.” In: Proceedings of the 23rd International Conference on World Wide Web (WWW ’14). Seoul, South Korea: ACM, Apr. 2014, 165–175. doi: 10.1145/2566486.2568033. [149] Zhenhui Li, Jingjing Wang, and Jiawei Han. “ePeriodicity: Min- ing Event Periodicity from Incomplete Observations.” In: IEEE Transactions on Knowledge and Data Engineering 27.5 (May 2015), 1219–1232. doi: 10.1109/TKDE.2014.2365801. [150] Brian Y. Lim and Anind K. Dey. “Assessing Demand for In- telligibility in Context-aware Applications.” In: Proceedings of the 11th International Conference on Ubiquitous Computing (Ubi- Comp ’09). Orlando, FL, USA: ACM, Oct. 2009, 195–204. doi: 10.1145/1620545.1620576. [151] Brian Y. Lim, Anind K. Dey, and Daniel Avrahami. “Why and Why Not Explanations Improve the Intelligibility of Context- Aware Intelligent Systems.” In: Proceedings of the SIGCHI Con- ference on Human Factors in Computing Systems (CHI ’09). Boston, MA, USA: ACM, Apr. 2009, 2119–2128. doi: 10.1145/1518701. 1519023. [152] Gu-yuan Lin, Shih-chiang Lee, J.Y.-J. Hsu, and Wan-rong Jih. “Applying Power Meters for Appliance Recognition on the Electric Panel.” In: Proceedings of the 5th IEEE Conference on In- dustrial Electronics and Applications (ICIEA ’10). Taichung, Tai- wan: IEEE, June 2010, 2254–2259. doi: 10.1109/ICIEA.2010. 5515385. [153] David Lindley. “Smart Grids: The Energy Storage Problem.” In: Nature 463.7277 (Jan. 2010), 18–20. doi: 10.1038/463018a. [154] Nick Littlestone and Manfred K. Warmuth. “The Weighted Majority Algorithm.” In: Information and Computation 108.2 (Feb. 1994), 212–261. doi: 10.1006/inco.1994.1009. [155] Hongyan Liu, Zhiyuan Yao, Tomas Eklund, and Barbro Back. “Electricity Consumption Time Series Profiling: A Data Min- ing Application in Energy Industry.” In: Proceedings of the 12th Industrial Conference on Data Mining (ICDM ’12). Berlin, Ger- many: Springer, July 2012, 52–66. doi: 10.1007/978- 3- 642- 31488-9_5. [156] Jiakang Lu, Tamim Sookoor, Vijay Srinivasan, Ge Gao, Brian Holben, John Stankovic, Eric Field, and Kamin Whitehouse. “The Smart Thermostat: Using Occupancy Sensors to Save En- ergy in Homes.” In: Proceedings of the 8th ACM Conference on 136 Bibliography

Embedded Networked Sensor Systems (SenSys ’10). Zurich, Switzer- land: ACM, Nov. 2010, 211–224. doi: 10.1145/1869983.1870005. [157] Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. “Dis- covery of Frequent Episodes in Event Sequences.” In: Data Mining and Knowledge Discovery 1.3 (1997), 259–290. doi: 10 . 1023/A:1009748302351. [158] Andrea Mannini, Stephen S. Intille, Mary Rosenberger, Angelo M. Sabatini, and William Haskell. “Activity Recognition Us- ing a Single Accelerometer Placed at the Wrist or Ankle.” In: Medicine & Science in Sports & Exercise 45.11 (Nov. 2013), 2193– 2203. doi: 10.1249/MSS.0b013e31829736d6. [159] Friedemann Mattern, Thorsten Staake, and Markus Weiss. “ICT for Green - How Computers Can Help Us to Conserve En- ergy.” In: Proceedings of The 1st International Conference on Energy- efficient Computing and Networking (e-Energy ’10). Passau, Ger- many: ACM, Apr. 2010, 1–10. doi: 10.1145/1791314.1791316. [160] Charlotte McDonald. “How Many Earths Do We Need?” In: BBC News (June 2015). url: http : / / www . bbc . com / news / magazine-33133712. [161] Fintan McLoughlin, Aidan Duffy, and Michael Conlon. “Eval- uation of Time Series Techniques to Characterise Domestic Electricity Demand.” In: Energy 50 (Feb. 2013), 120–130. doi: 10.1016/j.energy.2012.11.048. [162] Alan Meier, Cecilia Aragon, Therese Peffer, Daniel Perry, and Marco Pritoni. “Making Energy Savings Easier: Usability Met- rics for Thermostats.” In: Journal of Usability Studies 6.4 (2011), 226–244. [163] Glenn W. Milligan and Martha C. Cooper. “A Study of Stan- dardization of Variables in Cluster Analysis.” In: Journal of Classification 5.2 (Sept. 1988), 181–204. doi: 10.1007/BF01897163. [164] Andrés Molina-Markham, Prashant Shenoy, Kevin Fu, Em- manuel Cecchet, and David Irwin. “Private Memoirs of a Smart Meter.” In: Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building (BuildSys ’10). Toronto, ON, Canada: ACM, Nov. 2010, 61–66. doi: 10.1145/ 1878431.1878446. [165] Fabian Mörchen. “Unsupervised Pattern Mining from Sym- bolic Temporal Data.” In: ACM SIGKDD Explorations Newslet- ter 9.1 (June 2007), 41–55. doi: 10.1145/1294301.1294302. [166] Steven Morris. Welsh Home Installs UK’s First Tesla Powerwall Storage Battery. Feb. 2016. url: http : / / www . theguardian . com/environment/2016/feb/05/welsh-home-installs-uks- first-tesla-powerwall-storage-battery. Bibliography 137

[167] Michael C. Mozer. “The Neural Network House: An Environ- ment That Adapts to Its Inhabitants.” In: Proceedings of the 1998 AAAI Spring Symposium on Intelligent Environments (AAAI ’98). Palo Alto, CA, USA: AAAI, Mar. 1998, 110–114. url: http : //www.aaai.org/Library/Symposia/Spring/1998/ss98-02- 017.php. [168] Michael C. Mozer. “Lessons from an Adaptive Home.” In: Smart Environments. Hoboken, NJ, USA: Wiley, Jan. 2005. Chap. 12, 271–294. doi: 10.1002/047168659X.ch12. [169] Michael C. Mozer, Lucky Vidmar, and Robert H. Dodier. “The Neurothermostat: Predictive Optimal Control of Residential Heating Systems.” In: Advances in Neural Information Processing Systems 9 (NIPS ’97). Vol. 9. Denver, CO, USA: MIT, Dec. 1997, 953–959. [170] Anti Mutanen, Sami Repo, and Pertti Järventausta. Customer Classification and Load Profiling Based on AMR Measurements. Tech. rep. Tampere, Finland: Deapartment of Electrical Energy Engi- neering, Tampere University of Technology, 2010, 1–37. [171] Kazunori Nagasawa, Charles R. Upshaw, Joshua D. Rhodes, Chris L. Holcomb, David A. Walling, and Michael E. Webber. “Data Management for a Large-scale Smart Grid Demonstra- tion Project in Austin, Texas.” In: Proceedings of the ASME 2012 6th International Conference on Energy Sustainability (ES ’12). San Diego, CA, USA: ASME, July 2012, 1027–1031. [172] National Oceanic and Atmospheric Administration (NOAA). Winter Was Record Warm for the Contiguous U.S. 2016. url: http: / / www . noaa . gov / news / winter - was - record - warm - for - contiguous-us (visited on 12/13/2016). [173] United Nations. Kyoto Protocol To the United Nations Framework Kyoto Protocol To the United Nations Framework. New York, NY, USA, 1998. url: http://unfccc.int/resource/docs/convkp/ kpeng.pdf. [174] Bijay Neupane, Torben Bach Pedersen, and Bo Thiesson. “To- wards Flexibility Detection in Device-level Energy Consump- tion.” In: Proceedings of the 2nd ECML/PKDD International Work- shop onData Analytics for Renewable Energy Integration (DARE ’14). Vol. 8817. Nancy, France: Springer, Sept. 2014, 1–16. doi: 10.1007/978-3-319-13290-7_1. [175] Monica J. Nevius and Scott Pigg. “Programmable Thermostats that Go Berserk? Taking a Social Perspective on Space Heating in Wisconsin.” In: Proceedings of the 2000 ACEEE Summer Study on Energy Efficiency in Buildings (ACEEE ’00). Niagara Falls, NY, USA: ACEEE, Aug. 2000, 233–244. 138 Bibliography

[176] Federal Statistical Office. Sustainable Development. Pocket Statis- tics 2016. Tech. rep. Neuchâtel, Switzerland: FSO, 2016, 1–44. url: https://www.bfs.admin.ch/bfs/fr/home/statistiques/ developpement-durable.assetdetail.1101247.html. [177] Leneve Ong and Mario Bergés. “Poster Abstract: Exploring Sequential and Association Rule Mining for Pattern-based En- ergy Demand Characterization.” In: Proceedings of the 5th ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings (BuildSys ’13). Rome, Italy: ACM, Nov. 2013, 1–2. doi: 10.1145/2528282.2528308. [178] Francisco Javier Ordonez, Gwenn Englebienne, Paula de Toledo, Tim van Kasteren, Araceli Sanchis, and Ben Krose. “In-home Activity Recognition: for Hidden Markov Models.” In: IEEE Pervasive Computing 13.3 (July 2014), 67–75. doi: 10.1109/MPRV.2014.52. [179] Peter Palensky and Dietmar Dietrich. “Demand Side Manage- ment: Demand Response, Intelligent Energy Systems, and Smart Loads.” In: IEEE Transactions on Industrial Informatics 7.3 (Aug. 2011), 381–388. doi: 10.1109/TII.2011.2158841. [180] Panagiotis Papapetrou, George Kollios, Stan Sclaroff, and Dim- itrios Gunopulos. “Mining Frequent Arrangements of Tempo- ral Intervals.” In: Knowledge and Information Systems 21.2 (Nov. 2009), 133–171. doi: 10.1007/s10115-009-0196-0. [181] Sara Pasquier and Aurelien Saussay. Progress Implementing the IEA 25 Energy Efficiency Policy Recommendations 2011 Evalua- tion. Tech. rep. Paris, France: IEA, 2012, 1–130. url: https:// www.iea.org/publications/insights/insightpublications/ progress_implementing_25_ee_recommendations.pdf. [182] Marc Pedersen. “Segmenting residential customers: energy and conservation behaviors.” In: Proceedings of the 2008 ACEEE Sum- mer Study on Energy Efficiency in Buildings (ACEEE ’08). Pacific Grove, CA, USA: ACEEE, Aug. 2008, 229–241. [183] Lucas Pereira and Nuno J. Nunes. “Semi-automatic Labeling for Public Non-intrusive Load Monitoring Datasets.” In: Pro- ceedings of the 5th IFIP Conference on Sustainable Internet and ICT for Sustainability (SustainIT ’15). Madrid, Spain: IEEE, Apr. 2015, 1–4. doi: 10.1109/SustainIT.2015.7101378. [184] Lucas Pereira, Filipe Quintal, Mary Barreto, and Nuno J. Nunes. “Understanding the Limitations of Eco-feedback: A One-year Long-term Study.” In: Proceedings of the Workshop on Human- Computer Interaction and Knowledge Discovery and Data Mining (HCI-KDD ’13). Maribor, Slovenia: Springer, July 2013, 237–255. doi: 10.1007/978-3-642-39146-0_21. Bibliography 139

[185] Lucas Pereira, Filipe Quintal, Nuno J. Nunes, and Mario Bergés. “The Design of a Hardware-software Platform for Long-term Energy Eco-feedback Research.” In: Proceedings of the 4th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS ’12). Austin, TX, USA: ACM, May 2012, 221–230. doi: 10.1145/2305484.2305521.
[186] Dennis E. Phillips, Rui Tan, M. Moazzami, Guoliang Xing, Jinzhu Chen, and David K. Y. Yau. “Supero: A Sensor System for Unsupervised Residential Power Usage Monitoring.” In: Proceedings of the 2013 IEEE International Conference on Pervasive Computing and Communications (PerCom ’13). San Diego, CA, USA: IEEE, Mar. 2013, 66–75. doi: 10.1109/PerCom.2013.6526716.
[187] Hannu Pihala. “Non-intrusive Appliance Load Monitoring System Based on a Modern kWh-meter.” Master Thesis. VTT Technical Research Centre of Finland, 1998, 1–71.
[188] Sérgio Ramos, Vera Figueiredo, Fátima Rodrigues, Raul Pinheiro, and Zita Vale. “Knowledge Extraction from Medium Voltage Load Diagrams to Support the Definition of Electrical Tariffs.” In: International Journal of Engineering Intelligent Systems for Electrical Engineering 15.3. CRL, Sept. 2007, 143–150.
[189] Sérgio Ramos and Zita Vale. “Data Mining Techniques to Support the Classification of MV Electricity Customers.” In: Proceedings of the 2008 IEEE Power and Energy Society General Meeting (PES ’08). Pittsburgh, PA, USA: IEEE, July 2008, 1–7. doi: 10.1109/PES.2008.4596669.
[190] Juhi Ranjan, Erin Griffiths, and Kamin Whitehouse. “Discerning Electrical and Water Usage by Individuals in Homes.” In: Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys ’14). Memphis, TN, USA: ACM, Nov. 2014, 20–29. doi: 10.1145/2674061.2674066.
[191] Parisa Rashidi and Diane J. Cook. “Activity Knowledge Transfer in Smart Environments.” In: Pervasive and Mobile Computing 7.3 (June 2011), 331–343. doi: 10.1016/j.pmcj.2011.02.007.
[192] Parisa Rashidi, Diane J. Cook, Lawrence B. Holder, and Maureen Schmitter-Edgecombe. “Discovering Activities to Recognize and Track in a Smart Environment.” In: IEEE Transactions on Knowledge and Data Engineering 23.4 (Sept. 2011), 527–539. doi: 10.1109/TKDE.2010.148.
[193] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. “Collecting Image Annotations Using Amazon’s Mechanical Turk.” In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (CSLDAMT ’10). Los Angeles, CA, USA: ACL, June 2010, 139–147.
[194] J. D. Rennie, L. Shih, J. Teevan, and D. Karger. “Tackling the Poor Assumptions of Naive Bayes Text Classifiers.” In: Proceedings of the 20th International Conference on Machine Learning (ICML ’03). Washington, DC, USA: AAAI, Aug. 2003, 616–623.
[195] Daniela Retelny, Sébastien Robaszkiewicz, Alexandra To, Walter S. Lasecki, Jay Patel, Negar Rahmati, Tulsee Doshi, Melissa Valentine, and Michael S. Bernstein. “Expert Crowdsourcing With Flash Teams.” In: Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (UIST ’14). Honolulu, HI, USA: ACM, Oct. 2014, 75–85. doi: 10.1145/2642918.2647409.
[196] Darren P. Richardson, Enrico Costanza, and Sarvapali D. Ramchurn. “Evaluating Semi-automatic Annotation of Domestic Energy Consumption As a Memory Aid.” In: Proceedings of the 14th International Conference on Ubiquitous Computing (Ubicomp ’12). Pittsburgh, PA, USA: ACM, Sept. 2012, 613–614. doi: 10.1145/2370216.2370330.
[197] Ian Richardson, Murray Thomson, David Infield, and Conor Clifford. “Domestic Electricity Use: A High-resolution Energy Demand Model.” In: Energy and Buildings 42.10 (June 2010), 1878–1887. doi: 10.1016/j.enbuild.2010.05.023.
[198] Yann Riche, Jonathan Dodge, and Ronald A. Metoyer. “Studying Always-on Electricity Feedback in the Home.” In: Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI ’10). Atlanta, GA, USA: ACM, Apr. 2010, 1995–1998. doi: 10.1145/1753326.1753628.
[199] Fátima Rodrigues, Jorge Duarte, Vera Figueiredo, Zita Vale, and M. Cordeiro. “A Comparative Analysis of Clustering Algorithms Applied to Load Profiling.” In: Proceedings of the 3rd International Conference on Machine Learning and Data Mining (MLDM ’03). Vol. 2734. Leipzig, Germany: Springer, July 2003, 73–85. doi: 10.1007/3-540-45065-3_7.
[200] Daniel Roggen, Gerhard Tröster, Paul Lukowicz, Alois Ferscha, José Del R. Millán, and Ricardo Chavarriaga. “Opportunistic Human Activity and Context Recognition.” In: Computer 46.2 (Feb. 2013), 36–45. doi: 10.1109/MC.2012.393.
[201] Sami Rollins and Nilanjan Banerjee. “Using Rule Mining to Understand Appliance Energy Consumption Patterns.” In: Proceedings of the 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom ’14). Budapest, Hungary: IEEE, Mar. 2014, 29–37. doi: 10.1109/PerCom.2014.6813940.

[202] Sami Rollins, Nilanjan Banerjee, Lazeeb Choudhury, and David Lachut. “A System for Collecting Activity Annotations for Home Energy Management.” In: Pervasive and Mobile Computing 15 (Dec. 2014), 153–165. doi: 10.1016/j.pmcj.2014.05.008.
[203] Nirmalya Roy, Abhishek Roy, and Sajal K. Das. “Context-aware Resource Management in Multi-inhabitant Smart Homes: A Nash H-learning Based Approach.” In: Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom ’06). Pisa, Italy: IEEE, Mar. 2006, 148–158. doi: 10.1109/PERCOM.2006.18.
[204] Jason Samenow. America’s Year Without a Winter: The 2015-2016 Season Was the Warmest on Record. Washington, DC, USA, Mar. 2016. url: https://www.washingtonpost.com/news/capital-weather-gang/wp/2016/03/08/americas-year-without-a-winter-the-2015-2016-season-was-the-warmest-on-record/.
[205] Ignacio Benítez Sánchez, Ignacio Delgado, Laura Moreno Sarrión, Alfredo Quijano López, and Isabel Navalón Burgos. “Clients Segmentation According to Their Domestic Energy Consumption by the Use of Self-organizing Maps.” In: Proceedings of the 6th International Conference on the European Energy Market (EEM ’09). Leuven, Belgium: IEEE, May 2009, 1–6. doi: 10.1109/EEM.2009.5207172.
[206] Robert E. Schapire. “The Boosting Approach to Machine Learning: An Overview.” In: Nonlinear Estimation and Classification. Vol. 171. Springer, 2003, 149–171.
[207] Bill Schilit, Norman Adams, and Roy Want. “Context-aware Computing Applications.” In: Proceedings of the 1st Workshop on Mobile Computing Systems and Applications (WMCSA ’94). Santa Cruz, CA, USA: IEEE, Dec. 1994, 85–90. doi: 10.1109/WMCSA.1994.16.
[208] Albrecht Schmidt, Michael Beigl, and Hans W. Gellersen. “There Is More to Context Than Location.” In: Proceedings of the 1998 International Workshop on Interactive Applications of Mobile Computing (IMC ’98). Elsevier, Nov. 1998, 893–901. doi: 10.1016/S0097-8493(99)00120-X.
[209] Wolf-Jürgen Schmidt-Küster. “Einfluss des Verbraucherverhaltens auf den Energiebedarf Privater Haushalte.” In: Einfluss des Verbraucherverhaltens auf den Energiebedarf Privater Haushalte. Vol. 15. Munich, Germany: Springer, 1982, 3–6. doi: 10.1007/978-3-642-95404-7_2.
[210] James Scott, A. J. Bernheim Brush, John Krumm, Brian Meyers, Michael Hazas, Stephen Hodges, and Nicolas Villar. “PreHeat: Controlling Home Heating Using Occupancy Prediction.” In: Proceedings of the 13th International Conference on Ubiquitous Computing (Ubicomp ’11). Beijing, China: ACM, Sept. 2011, 281–290. doi: 10.1145/2030112.2030151.
[211] Kristin Seyboth et al. Renewables 2016 Global Status Report. Tech. rep. Paris, France: REN21, 2016, 1–272. url: http://www.ren21.net/status-of-renewables/global-status-report/.
[212] Pierluigi Siano. “Demand Response and Smart Grids - A Survey.” In: Renewable and Sustainable Energy Reviews 30 (Feb. 2014), 461–478. doi: 10.1016/j.rser.2013.10.022.
[213] Meghna Singh et al. “The Zebrafish GenomeWiki: A Crowdsourcing Approach to Connect the Long Tail for Zebrafish Gene Annotation.” In: Database: The Journal of Biological Databases and Curation 2014 (Jan. 2014), bau011. doi: 10.1093/database/bau011.
[214] Minoru Siotani. “Tolerance Regions for a Multivariate Normal Population.” In: Annals of the Institute of Statistical Mathematics 16.1 (Dec. 1964), 135–153. doi: 10.1007/BF02868568.
[215] Padhraic Smyth, Usama Fayyad, and Michael Burl. “Inferring Ground Truth from Subjective Labelling of Venus Images.” In: Advances in Neural Information Processing Systems 7 (NIPS ’94). Denver, CO, USA: MIT, Dec. 1994, 1085–1092.
[216] Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. “Cheap and Fast - But is it Good? Evaluating Non-expert Annotations for Natural Language Tasks.” In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP ’08). Honolulu, HI, USA: ACL, Oct. 2008, 254–263.
[217] Vincent Spruyt. A Geometric Interpretation of the Covariance Matrix. 2014. url: http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/ (visited on 07/21/2016).
[218] L. Suganthi and Anand A. Samuel. “Energy Models for Demand Forecasting - A Review.” In: Renewable and Sustainable Energy Reviews 16.2 (Feb. 2012), 1223–1240. doi: 10.1016/j.rser.2011.08.014.
[219] Bernadette Suzanne Sutterlin. “Segmentation and Characterization of Energy Consumers: Consumers’ Differences in Energy-related Behaviors and Commonalities in Perceptions of Others’ Behavior.” Ph.D. Dissertation. ETH Zurich, 2012, 1–185. doi: 10.3929/ethz-a-007575448.

[220] Lukas G. Swan and V. Ismet Ugursal. “Modeling of End-use Energy Consumption in the Residential Sector: A Review of Modeling Techniques.” In: Renewable and Sustainable Energy Reviews 13.8 (Oct. 2009), 1819–1835. doi: 10.1016/j.rser.2008.09.033.
[221] Swiss Federal Office of Energy SFOE. Hydropower. 2016. url: http://www.bfe.admin.ch/themen/00490/00491/index.html?lang=en (visited on 04/21/2016).
[222] Swiss Federal Office of Energy SFOE. Statistique Suisse de l’Électricité 2015. Tech. rep. Bern, Switzerland: SFOE, 2016, 1–56.
[223] Emmanuel Munguia Tapia, Stephen S. Intille, and Kent Larson. “Activity Recognition in the Home Using Simple and Ubiquitous Sensors.” In: Proceedings of the 2nd International Conference on Pervasive Computing (Pervasive ’04). Linz, Austria: Springer, Apr. 2004, 158–175. doi: 10.1007/978-3-540-24646-6_10.
[224] Défi Technique. Jewel Box. 2015. url: https://defitechnique.com/en/realisations/villa-individuelle-2/ (visited on 12/14/2016).
[225] Tesla. Powerwall 2. 2016. url: https://www.tesla.com/powerwall (visited on 12/19/2016).
[226] Brian Thomas and Diane J. Cook. “Activity-aware Energy-efficient Automation of Smart Buildings.” In: Energies 9.8 (Aug. 2016), 624–640. doi: 10.3390/en9080624.
[227] Tian Tian and Jun Zhu. “Max-Margin Majority Voting for Learning from Crowds.” In: Advances in Neural Information Processing Systems 28 (NIPS ’15). Montreal, QC, Canada: Curran Associates, Dec. 2015, 1621–1629.
[228] Sebastien Tremblay, Dany Fortin-Simard, Erika Blackburn-Verreault, Sebastien Gaboury, Bruno Bouchard, and Abdenour Bouzouane. “Exploiting Environmental Sounds for Activity Recognition in Smart Homes.” In: Proceedings of the AAAI Workshop on Artificial Intelligence Applied to Assistive Technologies and Smart Environments (AAAI ’15). Austin, TX, USA: AAAI, Jan. 2015, 41–46. url: http://aaai.org/ocs/index.php/WS/AAAIW15/paper/view/9697.
[229] Ngoc Cuong Truong, Long Tran-Thanh, Enrico Costanza, and Sarvapali D. Ramchurn. “Activity Prediction for Agent-based Home Energy Management.” In: Proceedings of the 4th International Workshop on Agent Technologies for Energy Systems (ATES ’13). Saint Paul, MN, USA: AAMAS, May 2013, 1–8.
[230] FCCC United Nations. Adoption of the Paris Agreement. Tech. rep. L9. Paris, France: United Nations, 2015, 1–32. url: http://unfccc.int/resource/docs/2015/cop21/eng/l09r01.pdf.

[231] General Assembly United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development. Tech. rep. New York, NY, USA: United Nations, Aug. 2015, 1–31. url: http://www.un.org/pga/wp-content/uploads/sites/3/2015/08/120815_outcome-document-of-Summit-for-adoption-of-the-post-2015-development-agenda.pdf.
[232] Bryan Urban, Victoria Shmakova, Brian Lim, and Kurt Roth. Energy Consumption of Consumer Electronics in U.S. Homes in 2013. Tech. rep. Boston, MA, USA: Fraunhofer USA Center for Sustainable Energy Systems, 2014, 1–158.
[233] U.S. Department of Energy. 2014 Smart Grid System Report. Tech. rep. Washington, DC, USA: DOE, 2014, 1–35. url: http://energy.gov/oe/downloads/2014-smart-grid-system-report-august-2014.
[234] Roy Villafane, Kien A. Hua, Duc Tran, and Basab Maulik. “Knowledge Discovery from Series of Interval Events.” In: Journal of Intelligent Information Systems 15.1 (2000), 71–89. doi: 10.1023/A:1008781812242.
[235] Carl Vondrick, Donald Patterson, and Deva Ramanan. “Efficiently Scaling up Crowdsourced Video Annotation.” In: International Journal of Computer Vision 101.1 (Sept. 2012), 184–204. doi: 10.1007/s11263-012-0564-1.
[236] Jingjing Wang and Bhaskar Prabhala. “Periodicity Based Next Place Prediction.” In: Proceedings of the Mobile Data Challenge by Nokia Workshop in Conjunction with International Conference on Pervasive Computing (Pervasive ’12). Newcastle, UK: Idiap, June 2012, 1–5.
[237] Yan Wang, Jian Liu, Yingying Chen, Marco Gruteser, Jie Yang, and Hongbo Liu. “E-eyes: Device-free Location-oriented Activity Identification Using Fine-grained WiFi Signatures.” In: Proceedings of the 20th Annual International Conference on Mobile Computing and Networking (MobiCom ’14). Maui, HI, USA: ACM, Sept. 2014, 617–628. doi: 10.1145/2639108.2639143.
[238] Markus Weiss, Adrian Helfenstein, Friedemann Mattern, and Thorsten Staake. “Leveraging Smart Meter Data to Recognize Home Appliances.” In: Proceedings of the 2012 IEEE International Conference on Pervasive Computing and Communications (PerCom ’12). Lugano, Switzerland: IEEE, Mar. 2012, 190–197. doi: 10.1109/PerCom.2012.6199866.
[239] Markus Weiss, Friedemann Mattern, Tobias Graml, Thorsten Staake, and Elgar Fleisch. “Handy Feedback: Connecting Smart Meters With Mobile Phones.” In: Proceedings of the 8th International Conference on Mobile and Ubiquitous Multimedia (MUM ’09). Cambridge, UK: ACM, Nov. 2009, 1–4. doi: 10.1145/1658550.1658565.
[240] Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona. “The Multidimensional Wisdom of Crowds.” In: Advances in Neural Information Processing Systems 23 (NIPS ’10). Vancouver, BC, Canada: Curran Associates, Dec. 2010, 2424–2432.
[241] Rafał Weron. “Electricity Price Forecasting: A Review of the State-of-the-art with a Look into the Future.” In: International Journal of Forecasting 30.4 (Oct. 2014), 1030–1081. doi: 10.1016/j.ijforecast.2014.08.008.
[242] Jacob Whitehill, Paul Ruvolo, Tingfan Wu, Jacob Bergsma, and Javier Movellan. “Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise.” In: Advances in Neural Information Processing Systems 22 (NIPS ’09). Vancouver, BC, Canada: Curran Associates, Dec. 2009, 2035–2043.
[243] Tri Kurniawan Wijaya. “Pervasive Data Analytics for Sustainable Energy Systems.” Ph.D. Dissertation. EPFL, 2015, 1–176. doi: 10.5075/epfl-thesis-6556.
[244] Tri Kurniawan Wijaya, Karl Aberer, and Deva P. Seetharam. “Consumer Segmentation and Knowledge Extraction from Smart Meter and Survey Data.” In: Proceedings of the 2014 SIAM International Conference on Data Mining (SDM ’14). Philadelphia, PA, USA: SIAM, July 2014, 226–234. doi: 10.1137/1.9781611973440.26.
[245] Tri Kurniawan Wijaya, Dipyaman Banerjee, Tanuja Ganu, Dipanjan Chakraborty, Sourav Battacharya, Thanasis Papaioannou, Deva P. Seetharam, and Karl Aberer. “DRSim: A Cyber Physical Simulator for Demand Response Systems.” In: Proceedings of the 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm ’13). Vancouver, BC, Canada: IEEE, Nov. 2013, 217–222. doi: 10.1109/SmartGridComm.2013.6687960.
[246] Tri Kurniawan Wijaya, Matteo Vasirani, Samuel Humeau, and Karl Aberer. “Cluster-based Aggregate Forecasting for Residential Electricity Demand Using Smart Meter Data.” In: Proceedings of the 2015 IEEE International Conference on Big Data (BigData ’15). Santa Clara, CA, USA: IEEE, Oct. 2015, 879–887. doi: 10.1109/BigData.2015.7363836.
[247] Rolf Wüstenhagen, Maarten Wolsink, and Mary Jean Bürer. “Social Acceptance of Renewable Energy Innovation: An Introduction to the Concept.” In: Energy Policy 35.5 (May 2007), 2683–2691. doi: 10.1016/j.enpol.2006.12.001.

[248] Matt Wytock and J. Zico Kolter. “Sparse Gaussian Conditional Random Fields: Algorithms, Theory, and Application to Energy Forecasting.” In: Proceedings of the 2013 International Conference on Machine Learning (ICML ’13). Vol. 28. Atlanta, GA, USA: JMLR, June 2013, 1265–1273. url: http://jmlr.org/proceedings/papers/v28/wytock13.pdf.
[249] Jenny Yuen, Bryan Russell, and Antonio Torralba. “LabelMe Video: Building a Video Database With Human Annotations.” In: Proceedings of the 2009 IEEE 12th International Conference on Computer Vision (ICCV ’09). Kyoto, Japan: IEEE, Sept. 2009, 1451–1458. doi: 10.1109/ICCV.2009.5459289.
[250] Mohammed J. Zaki. “Efficient Enumeration of Frequent Sequences.” In: Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM ’98). Bethesda, MD, USA: ACM, Nov. 1998, 68–75. doi: 10.1145/288627.288643.
[251] Michael Zeifman and Kurt Roth. “Nonintrusive Appliance Load Monitoring: Review and Outlook.” In: IEEE Transactions on Consumer Electronics 57.1 (Feb. 2011), 76–84. doi: 10.1109/TCE.2011.5735484.
[252] Gabe Zichermann and Christopher Cunningham. Gamification by Design: Implementing Game Mechanics in Web and Mobile Apps. O’Reilly, Aug. 2011, 1–210.
[253] Ahmed Zoha, Alexander Gluhak, Muhammad Ali Imran, and Sutharshan Rajasegarar. “Non-intrusive Load Monitoring Approaches for Disaggregated Energy Sensing: A Survey.” In: Sensors 12.12 (Dec. 2012), 16838–16866. doi: 10.3390/s121216838.