Sequence-to-point learning with neural networks for non-intrusive load monitoring

Chaoyun Zhang (1), Mingjun Zhong (2), Zongzuo Wang (1), Nigel Goddard (1), and Charles Sutton (1)
(1) School of Informatics, University of Edinburgh, United Kingdom. [email protected], {ngoddard,csutton}@inf.ed.ac.uk
(2) School of Computer Science, University of Lincoln, United Kingdom. [email protected]

Abstract

Energy disaggregation (a.k.a. non-intrusive load monitoring, NILM), a single-channel blind source separation problem, aims to decompose the mains signal, which records the whole-house electricity consumption, into appliance-wise readings. This problem is difficult because it is inherently unidentifiable. Recent approaches have shown that the identifiability problem can be reduced by introducing domain knowledge into the model. Deep neural networks have been shown to be a promising approach for these problems, but sliding windows are necessary to handle the long sequences which arise in signal processing problems, which raises issues about how to combine predictions from different sliding windows. In this paper, we propose sequence-to-point learning, where the input is a window of the mains and the output is a single point of the target appliance. We use convolutional neural networks to train the model. Interestingly, we systematically show that the convolutional neural networks can inherently learn the signatures of the target appliances, which are automatically added into the model to reduce the identifiability problem. We applied the proposed neural network approaches to real-world household energy data, and show that the methods achieve state-of-the-art performance, improving two standard error measures by 84% and 92%.

Energy disaggregation (Hart 1992) is a single-channel blind source separation (BSS) problem that aims to decompose the whole energy consumption of a dwelling into the energy usage of individual appliances. The purpose is to help households reduce their energy consumption by helping them understand what is causing them to use energy; it has been shown that disaggregated information can help householders reduce energy consumption by as much as 5-15% (Fischer 2008). However, current electricity meters can only report whole-home consumption data. This creates a demand for machine-learning tools that infer appliance-specific consumption.

Energy disaggregation is unidentifiable and thus a difficult prediction problem, because it is a single-channel BSS problem: we want to extract more than one source from a single observation. Additionally, there are many sources of uncertainty in the prediction problem, including noise in the data, lack of knowledge of the true power usage of every appliance in a given household, multiple devices exhibiting similar power consumption, and simultaneous switching on/off of multiple devices. Energy disaggregation has therefore been an active area for the application of artificial intelligence and machine learning techniques. Popular approaches have been based on factorial hidden Markov models (FHMM) (Kolter and Jaakkola 2012; Parson et al. 2012; Zhong, Goddard, and Sutton 2013; 2014; 2015; Lange and Bergés 2016) and signal processing methods (Pattem 2012; Zhao, Stankovic, and Stankovic 2015; 2016; Batra, Singh, and Whitehouse 2016; Tabatabaei, Dick, and Xu 2017).

Recently, it has been shown that single-channel BSS can be modelled using sequence-to-sequence (seq2seq) learning with neural networks (Grais, Sen, and Erdogan 2014; Huang et al. 2014; Du et al. 2016). In particular, it has been applied to energy disaggregation (Kelly and Knottenbelt 2015a), where both convolutional (CNN) and recurrent (RNN) neural networks were employed. The idea of sequence-to-sequence learning is to train a deep network to map between an input sequence, such as the mains power readings in the NILM problem, and an output sequence, such as the power readings of a single appliance.

A difficulty immediately arises when applying seq2seq to signal processing applications such as BSS. In these applications, the input and output sequences can be long; for example, in one of our data sets the sequences are 14,400 time steps. Such long sequences can make training computationally difficult, both because of memory limitations in current graphics processing units (GPUs) and, with RNNs, because of the vanishing gradient problem. A common way to avoid these problems is a sliding window approach, that is, training the network to map a window of the input signal to the corresponding window of the output signal. However, this approach has several difficulties. Each element of the output signal is predicted many times, once for each sliding window that contains it, and the average of these predictions is naturally used, which smooths the edges. Further, we expect that some sliding windows will predict a given element better than others, in particular those windows where the element is near the midpoint rather than the edges, so that the network can make use of all nearby regions of the input signal, past and future. A simple sliding window approach cannot exploit this information.

In this paper, we propose a different architecture, called sequence-to-point learning (seq2point), for single-channel BSS. This uses a sliding window approach, but given a window of the input sequence, the network is trained to predict the output signal only at the midpoint of the window. This makes the prediction problem easier for the network: rather than needing to predict $W(T-W)$ outputs in total as in the seq2seq method, where $T$ is the length of the input signal and $W$ is the size of the sliding window, the seq2point method predicts only $T$ outputs. This allows the neural network to focus its representational power on the midpoint of the window, rather than on the more difficult outputs at the edges, hopefully yielding more accurate predictions.

We provide both an analytical and an empirical analysis of the methods, showing that seq2point has a tighter approximation to the target distribution than seq2seq learning. On two different real-world NILM data sets, UK-DALE (Kelly and Knottenbelt 2015b) and REDD (Kolter and Johnson 2011), we find that sequence-to-point learning performs dramatically better than previous work, with as much as an 83% reduction in error.

Finally, to have confidence in the models, it is vital to interpret the model predictions and understand what information the neural networks for NILM rely on to make their predictions. By visualizing the feature maps learned by our networks, we found that the networks automatically extract useful features of the input signal, such as change points, and typical usage durations and power levels of appliances. Interestingly, these signatures have commonly been incorporated into handcrafted features and architectures for the NILM problem (Kolter and Jaakkola 2012; Parson et al. 2012; Pattem 2012; Zhao, Stankovic, and Stankovic 2015; Zhong, Goddard, and Sutton 2014; 2015; Batra, Singh, and Whitehouse 2016; Tabatabaei, Dick, and Xu 2017), but in our work these features are learned automatically.

Energy disaggregation

The goal of energy disaggregation is to recover the energy consumption of individual appliances from the mains signal, which measures the total electricity consumption. Suppose we have observed the mains $Y = (y_1, y_2, \dots, y_T)$, where $y_t \in \mathbb{R}_+$ is the total power in Watts of a household at time $t$. For each appliance $i$ in the house, its reading is denoted by $X_i = (x_{i1}, x_{i2}, \dots, x_{iT})$, where $x_{it} \in \mathbb{R}_+$. At each time step, $y_t$ is assumed to be the sum of the readings of the individual appliances, possibly plus Gaussian noise $\epsilon_t$ with zero mean and variance $\sigma^2$, such that $y_t = \sum_i x_{it} + \epsilon_t$. Often we are only interested in $I$ appliances, i.e., the ones that use the most energy; the others are regarded as an unknown factor $U = (u_1, \dots, u_T)$. The model can then be represented as $y_t = \sum_{i=1}^{I} x_{it} + u_t + \epsilon_t$.

The additive factorial hidden Markov model (AFHMM) is a natural approach to representing this model (Kolter and Jaakkola 2012; Pattem 2012; Zhong, Goddard, and Sutton 2014). Various inference algorithms can then be employed to infer the appliance signals $\{X_i\}$ (Kolter and Jaakkola 2012; Zhong, Goddard, and Sutton 2014; Shaloudegi et al. 2016). It is well known that the problem remains unidentifiable. To tackle the identifiability problem, various approaches have incorporated domain knowledge into the model. For example, local information, e.g., appliance power levels, ON-OFF state changes, and durations, has been incorporated into the model (Kolter and Jaakkola 2012; Parson et al. 2012; Pattem 2012; Zhao, Stankovic, and Stankovic 2015; Tabatabaei, Dick, and Xu 2017); others have incorporated global information, e.g., the total number of cycles and the total energy consumption (Zhong, Goddard, and Sutton 2014; 2015; Batra, Singh, and Whitehouse 2016). However, the domain knowledge required by these methods has to be extracted manually, which makes them more difficult to use. All these approaches require handcrafted features based on the observation data. Instead, we propose to employ neural networks to extract those features automatically during learning.

Sequence-to-sequence learning

Kelly and Knottenbelt [2015a] have applied deep learning methods to NILM. Their neural networks learn a nonlinear regression between a sequence of mains readings and a sequence of appliance readings with the same time stamps. We refer to this as a sequence-to-sequence approach. Although RNN architectures are most commonly used in sequence-to-sequence learning for text (Sutskever, Vinyals, and Le 2014), for NILM Kelly and Knottenbelt [2015a] employ both CNNs and RNNs. Similar sequence-to-sequence neural network approaches have been applied to single-channel BSS problems in audio and speech (Grais, Sen, and Erdogan 2014; Huang et al. 2014; Du et al. 2016).

Sequence-to-sequence architectures define a neural network $F_s$ that maps sliding windows $Y_{t:t+W-1}$ of the input mains power to corresponding windows $X_{t:t+W-1}$ of the output appliance power; that is, they model $X_{t:t+W-1} = F_s(Y_{t:t+W-1}) + \epsilon$, where $\epsilon$ is $W$-dimensional Gaussian random noise. To train the network on a pair $(X, Y)$ of full sequences, the loss is

$$L_s = \sum_{t=1}^{T-W+1} \log p(X_{t:t+W-1} \mid Y_{t:t+W-1}, \theta_s), \qquad (1)$$

where $\theta_s$ are the parameters of the network $F_s$. In practice, a subset of all possible windows can be used during training to reduce the computational cost.

Since there are multiple predictions for $x_t$ when $2 \le t \le T-1$, one for each sliding window that contains time $t$, the mean of these predicted values is used as the prediction. It has been shown that this neural network approach outperforms AFHMMs for the NILM task.
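To make this averaging step concrete, here is a minimal sketch in Python/NumPy. It is our illustration rather than code from the paper; predict_window is a hypothetical stand-in for the trained seq2seq network $F_s$.

```python
import numpy as np

def seq2seq_disaggregate(predict_window, mains, W):
    """Average the overlapping window predictions of a seq2seq network."""
    T = len(mains)
    totals = np.zeros(T)
    counts = np.zeros(T)
    for t in range(T - W + 1):
        # predict_window maps a length-W mains window to a length-W
        # appliance window (hypothetical stand-in for the trained F_s).
        totals[t:t + W] += predict_window(mains[t:t + W])
        counts[t:t + W] += 1
    # Each x_t is the mean over all sliding windows that contain time t.
    return totals / counts
```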

Figure 1: The architectures for sequence-to-point and sequence-to-sequence neural networks. Both share the same convolutional network: an input window of 599 mains readings; five 1-D convolutional layers (filter sizes 10, 8, 6, 5 and 5, with 30, 30, 40, 50 and 50 filters respectively, all with stride 1 and ReLU activations); and a dense layer of 1,024 units with ReLU activation. The final linear layer has a single output unit for seq2point, and 599 output units (one per window element) for seq2seq.
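The network in Figure 1 translates directly into code. Below is a minimal sketch in tf.keras; the paper states only that the models were implemented in Python with TensorFlow, so this exact API, the "same" padding, and the Adam/MSE training configuration are our assumptions rather than details from the paper.

```python
import tensorflow as tf

WINDOW = 599  # input window length used for every appliance (Table 1)

def build_network(output_units=1, window=WINDOW):
    """The Figure 1 stack: five 1-D conv layers, then two dense layers.

    output_units=1 gives the seq2point network; output_units=window gives
    the seq2seq variant, which predicts the whole appliance window.
    """
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(30, 10, strides=1, padding="same",
                               activation="relu", input_shape=(window, 1)),
        tf.keras.layers.Conv1D(30, 8, strides=1, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(40, 6, strides=1, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(50, 5, strides=1, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(50, 5, strides=1, padding="same", activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(output_units, activation="linear"),
    ])

model = build_network()
# Mean squared error corresponds to the Gaussian noise model in the text.
model.compile(optimizer="adam", loss="mse")
```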

Sequence-to-point learning

Instead of training a network to predict a window of appliance readings, we propose to train a neural network to predict only the midpoint element of that window. The input of the network is a mains window $Y_{t:t+W-1}$, and the output is the midpoint element $x_\tau$ of the corresponding window of the target appliance, where $\tau = t + \lfloor W/2 \rfloor$. We call this a sequence-to-point learning method; such methods are widely applied for modelling the distributions of speech and images (Sainath et al. 2015; van den Oord et al. 2016; van den Oord, Kalchbrenner, and Kavukcuoglu 2016). This method assumes that the midpoint element can be represented as a non-linear regression of the mains window. The intuition behind this assumption is that the state of the appliance at the midpoint should relate to the information in the mains both before and after that midpoint. We show explicitly in the experiments that the change points (or edges) in the mains are among the features that the network uses to infer the states of the appliance.

Instead of mapping sequence to sequence, seq2point architectures define a neural network $F_p$ which maps sliding windows $Y_{t:t+W-1}$ of the input to the midpoint $x_\tau$ of the corresponding windows $X_{t:t+W-1}$ of the output. The model is $x_\tau = F_p(Y_{t:t+W-1}) + \epsilon$. The loss function for training the network is

$$L_p = \sum_{t=1}^{T-W+1} \log p(x_\tau \mid Y_{t:t+W-1}, \theta_p), \qquad (2)$$

where $\theta_p$ are the network parameters. To deal with the endpoints of the sequence, given a full input sequence $Y = (y_1, \dots, y_T)$, we first pad the sequence with $\lceil W/2 \rceil$ zeros at the beginning and the end. The advantage of the seq2point model is that there is a single prediction for every $x_t$, rather than an average over multiple window predictions.
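In contrast to the seq2seq averaging sketch shown earlier, a seq2point pass predicts each point exactly once. A minimal sketch, assuming zero-padding as described above (we pad with $\lfloor W/2 \rfloor$ zeros on each side, which centres window $t$ on $y_t$; predict_midpoint is a hypothetical stand-in for $F_p$):

```python
import numpy as np

def seq2point_disaggregate(predict_midpoint, mains, W):
    """Predict the appliance power at every time step, one window per point."""
    pad = W // 2
    padded = np.pad(np.asarray(mains, dtype=float), pad, mode="constant")
    # Window t of the padded mains is centred on the original reading y_t,
    # so each x_t receives exactly one prediction (no averaging needed).
    return np.array([predict_midpoint(padded[t:t + W])
                     for t in range(len(mains))])
```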

Architectures

Kelly and Knottenbelt [2015a] showed that denoising autoencoders performed better than other architectures for seq2seq learning. Instead, we employ the same convolutional neural network for both seq2seq and seq2point learning, since, as we show in the following section, both approaches can share the same architecture, which is shown in Figure 1. Kelly and Knottenbelt [2015a] generated their training data heuristically. In contrast, we use all the sliding windows for both methods during training, so no heuristic rules are needed for generating training data.

Posterior distribution estimators

In this section we show that both seq2seq and seq2point learning are essentially posterior density estimators. Suppose $T \to \infty$; then we have an infinite number of sliding windows. These inherently form a population distribution $\pi(X|Y)$, which is unobserved, where $X$ and $Y$ are random temporally-aligned vectors of length $W$. Seq2seq tries to find $\theta$ to maximise $p(X|Y,\theta)$ so as to approximate the population posterior $\pi(X|Y)$. This can be achieved by minimizing the Kullback-Leibler (KL) divergence with respect to $\theta$:

$$\min_\theta \mathrm{KL}(\pi \| p) = \min_\theta \int \pi(X|Y) \log \frac{\pi(X|Y)}{p(X|Y,\theta)} \, dX.$$

So we have the standard interpretation that both methods minimize a Monte Carlo approximation to the KL divergence.

Now we characterize the difference between seq2seq and seq2point learning. If we assume a factorizable form such that $p(X|Y,\theta) = \prod_{w=1}^{W} p_w(x_w|Y,\theta)$, the objective function can be represented, up to an additive term that does not depend on $\theta$, as

$$\mathrm{KL}(\pi \| p) = \sum_{w=1}^{W} \mathrm{KL}\big(\pi(x_w|Y) \,\big\|\, p_w(x_w|Y,\theta)\big).$$

Denote $\phi_w(\theta|Y) = \mathrm{KL}(\pi(x_w|Y) \,\|\, p_w(x_w|Y,\theta))$, which is a function of $\theta$ given $Y$.

Seq2seq learning assumes the distribution

$$p(X|Y,\theta) = \mathcal{N}(\mu(\theta), cI) = \prod_{w=1}^{W} \mathcal{N}(\mu_w(\theta), c) = \prod_{w=1}^{W} p_w(x_w|Y,\theta),$$

where $\mu(\theta) = (\mu_1(\theta), \dots, \mu_W(\theta))^T$, $c$ is a constant, and $I$ is the identity matrix. All the distributions $p_w$ partially share the same parameters $\theta$, except for the parameters from the last hidden layer to the outputs; therefore, optimization has to be performed jointly over all the distributions $p_w$ ($w = 1, 2, \dots, W$):

$$\min_\theta \sum_{w=1}^{W} \phi_w(\theta|Y).$$

Denote the optimum by $\tilde{\theta}$. The approximate distribution of the midpoint value obtained by seq2seq is then $p_\tau(x_\tau|Y, \tilde{\theta})$, and the corresponding KL divergence for the midpoint value is $\phi_\tau(\tilde{\theta}|Y) = \mathrm{KL}(\pi(x_\tau|Y) \,\|\, p_\tau(x_\tau|Y, \tilde{\theta}))$.

Seq2point learning directly models the midpoint value, and therefore the optimization over $p(x_\tau|Y,\theta)$ can be performed by solving

$$\min_\theta \phi_\tau(\theta|Y).$$

Denote the optimum by $\theta^*$; the approximate distribution of the midpoint value is then $p_\tau(x_\tau|Y, \theta^*)$. Thus seq2seq and seq2point infer two different approximations to the same posterior distribution of the midpoint value.

The following theorem shows that seq2point learning infers a tighter approximation to the target distribution than seq2seq learning when both use the same architecture.

Theorem 1. Assume all the distributions are well-defined. Suppose both the seq2point and seq2seq learning have the same architecture. Suppose $\theta^*$ is the optimum of the seq2point model, and $\tilde{\theta}$ is the optimum of the seq2seq model. Then $\phi_\tau(\theta^*|Y) \le \phi_\tau(\tilde{\theta}|Y)$.

Proof. Since all the distributions are well-defined, each KL divergence is bounded below by zero, so $\phi_w \ge 0$. Since both learning methods have the same architecture, the functions $\phi_w$ for the two methods are the same. Since $\theta^*$ is the optimum of the problem $\min_\theta \phi_\tau(\theta|Y)$, we have $\phi_\tau(\theta^*|Y) \le \phi_\tau(\theta|Y)$ for any $\theta$. Therefore $\phi_\tau(\theta^*|Y) + \sum_{w=1, w\ne\tau}^{W} \phi_w(\theta|Y) \le \sum_{w=1}^{W} \phi_w(\theta|Y)$ holds for any $\theta$, and in particular $\phi_\tau(\theta^*|Y) + \sum_{w=1, w\ne\tau}^{W} \phi_w(\tilde{\theta}|Y) \le \sum_{w=1}^{W} \phi_w(\tilde{\theta}|Y)$. Consequently, $\phi_\tau(\theta^*|Y) \le \phi_\tau(\tilde{\theta}|Y)$.

This theorem ensures that seq2point learning always provides a tighter approximation to the target distribution than seq2seq learning.

Experiments

We compare four models for the energy disaggregation problem: AFHMM (Kolter and Jaakkola 2012), seq2seq(Kelly) (Kelly and Knottenbelt 2015a), seq2seq, and seq2point. Note that our seq2seq and seq2point use the same architecture (see Figure 1). There are two differences between the seq2seq proposed in this paper and seq2seq(Kelly): 1) our seq2seq uses the same training samples as seq2point, obtained by sliding windows across all the data sequences, whereas seq2seq(Kelly) uses selected windows from the data sequences, including some generated via data augmentation; 2) our seq2seq uses a multilayer CNN architecture, whereas seq2seq(Kelly) uses an autoencoder with a convolutional layer at each end. To assess effectiveness and efficiency, we conduct comprehensive comparisons across several performance metrics. The deep learning models are implemented in Python using TensorFlow. The networks were trained on machines with NVIDIA GTX 970 and NVIDIA GTX TITAN X GPUs.

Data sets

We report results on the UK-DALE (Kelly and Knottenbelt 2015b) and REDD (Kolter and Johnson 2011) data sets, which measure the domestic appliance-level energy consumption and whole-house energy usage of five UK houses and six US houses respectively.

UK-DALE data. All readings were recorded every 6 seconds from November 2012 to January 2015. The data set contains measurements of over 10 types of appliances; in this paper we consider only the kettle, microwave, fridge, dish washer and washing machine, which are popular appliances for evaluating NILM algorithms. We used houses 1, 3, 4, and 5 for training the neural networks, and house 2 as the test data, because only houses 1 and 2 have all of these appliances (Kelly and Knottenbelt 2015a; Zhong, Goddard, and Sutton 2015). Note that we are therefore considering the transfer learning setting, in which we train and test on different households. This setting poses the challenge that the same type of appliance varies in its power demands across houses, but good performance in the transfer learning set-up is vital for practical application of NILM methods.

REDD data. The appliance and mains readings were recorded every 3 seconds and every 1 second respectively. The data set contains measurements from six houses. We used houses 2 to 6 for training, and house 1 for testing, for similar reasons to those in Kelly and Knottenbelt [2015a]. Since there is no kettle data, we considered only the microwave, fridge, dish washer and washing machine.

Data preprocessing

We now describe how the training data were prepared for training the neural networks.

Table 1: The parameters used in this paper for each appliance. The power unit is Watts.

                                   Kettle  Microwave  Fridge  Dish washer  Washing machine
Window length (points)                599        599     599          599              599
Maximum power                        3948       3138    2572         3230             3962
On power threshold                   2000        200      50           10               20
Mean on power                         700        500     200          700              400
Standard deviation of on power       1000        800     400         1000              700
A window of the mains was used as the input sequence; the window length for each appliance is shown in Table 1. The training windows were obtained by sliding the mains (input) and appliance (output) readings one time step at a time; for seq2point, the midpoint values of the corresponding appliance windows were used as the outputs. Both the input windows and the targets were preprocessed by subtracting the mean values and dividing by the standard deviations (see Table 1 for these parameters). These samples were used for training both the seq2seq and seq2point methods. The training samples for seq2seq(Kelly) were obtained by the method described in Kelly and Knottenbelt [2015a].

Table 2: Appliance-level mean absolute error (MAE, in Watts) and signal aggregate error (SAE) on UK-DALE. Seq2seq(Kelly) is the method proposed in (Kelly and Knottenbelt 2015a).

Metric  Method                  Kettle    Microwave  Fridge    Dish w.   Washing m.  Overall
MAE     AFHMM                   47.38     21.18      42.35     199.84    103.24      82.79 ± 64.50
        seq2seq(Kelly)          13.000    14.559     38.451    237.96    163.468     93.488 ± 91.112
        seq2seq (this paper)     9.220    13.619     24.489     32.515    10.153     17.999 ± 9.063
        seq2point (this paper)   7.439     8.661     20.894     27.704    12.663     15.472 ± 7.718
SAE     AFHMM                    1.06      1.04       0.98       4.50      8.28       3.17 ± 2.88
        seq2seq(Kelly)           0.085     1.348      0.502      4.237    13.831      4.001 ± 5.124
        seq2seq (this paper)     0.309     0.205      0.373      0.779     0.453      0.423 ± 0.194
        seq2point (this paper)   0.069     0.486      0.121      0.645     0.284      0.321 ± 0.217
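To illustrate the preprocessing just described, the following sketch builds standardised seq2point training pairs. The mean and standard deviation constants come from Table 1; the function itself is our sketch under these assumptions, not the authors' code.

```python
import numpy as np

def make_seq2point_samples(mains, appliance, W,
                           mains_mean, mains_std, app_mean, app_std):
    """Pair each length-W mains window with the midpoint appliance reading."""
    pad = W // 2
    # Zero-pad so that every appliance reading, including the endpoints,
    # is the midpoint of some mains window.
    padded = np.pad(np.asarray(mains, dtype=float), pad, mode="constant")
    padded = (padded - mains_mean) / mains_std            # standardise inputs
    targets = (np.asarray(appliance, dtype=float) - app_mean) / app_std
    windows = np.stack([padded[t:t + W] for t in range(len(appliance))])
    return windows, targets  # targets[t] is the midpoint of windows[t]
```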

Table 3: Appliance-level mean absolute error (MAE, in Watts) and signal aggregate error (SAE) on REDD.

Metric  Method                  Microwave  Fridge    Dish w.   Washing m.  Overall
MAE     seq2seq (this paper)    33.272     30.630    19.449    22.857      26.552 ± 5.610
        seq2point (this paper)  28.199     28.104    20.048    18.423      23.693 ± 4.494
SAE     seq2seq (this paper)     0.242      0.114     0.557     0.509       0.355 ± 0.183
        seq2point (this paper)   0.059      0.180     0.567     0.277       0.270 ± 0.187

Performance evaluation

We use two metrics to compare these approaches. Denote by $x_t$ the ground truth and by $\hat{x}_t$ the prediction for an appliance at time $t$. When we are interested in the error in power at every time point, we use the mean absolute error

$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} |\hat{x}_t - x_t|,$$

which provides a measure of error that is less affected by outliers, i.e., isolated predictions that are particularly inaccurate. When we are interested in the total error in energy over a period, in this case one day, we use the normalised signal aggregate error

$$\mathrm{SAE} = \frac{|\hat{r} - r|}{r},$$

where $r = \sum_t x_t$ and $\hat{r} = \sum_t \hat{x}_t$ denote the ground-truth and inferred total energy consumption of an appliance. This measure is useful because a method can be accurate enough for reporting daily energy usage even if its per-time-step predictions are less accurate.
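Both metrics are straightforward to compute; a minimal sketch in Python/NumPy (our illustration, not code from the paper):

```python
import numpy as np

def mae(x_true, x_pred):
    """Mean absolute error in Watts: per-time-step accuracy."""
    return np.mean(np.abs(np.asarray(x_pred) - np.asarray(x_true)))

def sae(x_true, x_pred):
    """Normalised signal aggregate error: relative error in total energy."""
    r, r_hat = np.sum(x_true), np.sum(x_pred)
    return np.abs(r_hat - r) / r
```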
there the in have when those signatures to that learnt types shows the midpoint, (f) the 3 at long vation Figure too kettle. set a was for usage appliance the ( of that shows duration (e) the 3 the Figure when kettle. reduced, no manually was was there that kettle suggests network indicates the (d) of 3 amplitude Figure the kettle. when and a duration the to detects both when correspond network because amplitude that reasonable the shows is double, which was (c) kettle kettle the 3 the Figure of kettle. amplitude a the this for in appropriate proposed not methods the are seq2point and 2015a). Knottenbelt seq2seq and Both (Kelly UK-DALE. in on proposed is results Seq2seq(Kelly) disaggregation paper. example Some 2: Figure > 8 iue) h ewr ih ugs twstolong too was it suggest might network the minutes), Washing machine Dish washer Fridge Microwave Kettle Ground truth Mains Ground truth Mains Ground truth Mains Ground truth Mains Ground truth Mains Seq2seq (Kelly) Ground truth Seq2seq (Kelly) Ground truth Seq2seq (Kelly) Ground truth Seq2seq (Kelly) Ground truth Seq2seq (Kelly) Ground truth an,freape ui n speech. and the audio example, to do- for other methods mains, in proposed problems separation the source apply blind single-channel to It are interesting disaggregation. which be energy data, would performing the for net- from neural signatures the features crucial that meaningful shown learn have we works have maps, visualizing We feature By learnt sets. learning. the data sequence-to-sequence using world work previous real outperforms applied learning sequence-to-point to have that shown We schemes disaggregation. proposed neu- energy the with for learning networks sequence-to-point ral a proposed have We Seq2seq Ground truth Seq2seq Ground truth Seq2seq Ground truth Seq2seq Ground truth Seq2seq Ground truth Conclusions Seq2point Ground truth Seq2point Ground truth Seq2point Ground truth Seq2point Ground truth Seq2point Ground truth References U. V.; Guyon, I.; and Garnett, R., eds., Advances in Neural Batra, N.; Singh, A.; and Whitehouse, K. 2016. Gemello: Information Processing Systems 29. Curran Associates, Inc. Creating a detailed energy breakdown from just the monthly 4979–4987. electricity bill. In Proceedings of the 22nd ACM SIGKDD Sutskever, I.; Vinyals, O.; and Le, Q. 2014. Sequence to se- International Conference on Knowledge Discovery and quence learning with neural networks. In Advances in Neu- Data Mining, 431–440. ACM. ral Information Processing Systems (NIPS). Du, J.; Tu, Y.; Dai, L.-R.; and Lee, C.-H. 2016. A regres- Tabatabaei, S. M.; Dick, S.; and Xu, W. 2017. Toward sion approach to single-channel speech separation via high- non-intrusive load monitoring via multi-label classification. resolution deep neural networks. IEEE/ACM Trans. Audio, IEEE Transactions on Smart Grid 8(1):26–40. Speech and Lang. Proc. 24(8):1424–1437. van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Fischer, C. 2008. Feedback on household electricity con- Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; and sumption: a tool for saving energy? Energy efficiency Kavukcuoglu, K. 2016. Wavenet: A generative model for 1(1):79–104. raw audio. arXiv preprint arXiv:1609.03499. Grais, E. M.; Sen, M. U.; and Erdogan, H. 2014. Deep neural van den Oord, A.; Kalchbrenner, N.; and Kavukcuoglu, K. networks for single channel source separation. In 2014 IEEE 2016. Pixel recurrent neural networks. 
In Proceedings of International Conference on Acoustics, Speech and Signal The 33rd International Conference on Machine Learning, Processing (ICASSP), 3734–3738. IEEE. 1747–1756. Hart, G. 1992. Nonintrusive appliance load monitoring. Zhao, B.; Stankovic, L.; and Stankovic, V. 2015. Blind non- Proceedings of the IEEE 80(12):1870 –1891. intrusive appliance load monitoring using graph-based sig- Huang, P.-S.; Kim, M.; Hasegawa-Johnson, M.; and nal processing. In 2015 IEEE Global Conference on Signal Smaragdis, P. 2014. Deep learning for monaural speech sep- and Information Processing (GlobalSIP), 68–72. aration. In 2014 IEEE International Conference on Acous- Zhao, B.; Stankovic, L.; and Stankovic, V. 2016. On a tics, Speech and Signal Processing (ICASSP), 1562–1566. training-less solution for non-intrusive appliance load mon- IEEE. itoring using graph signal processing. IEEE Access 4:1784– Kelly, J., and Knottenbelt, W. 2015a. Neural NILM: 1799. Deep neural networks applied to energy disaggregation. In Zhong, M.; Goddard, N.; and Sutton, C. 2013. Interleaved Proceedings of the 2nd ACM International Conference on factorial non-homogeneous hidden Markov models for en- Embedded Systems for Energy-Efficient Built Environments, ergy disaggregation. In Neural Information Processing Sys- 55–64. ACM. tems, Workshop on Machine Learning for Sustainability. Kelly, J., and Knottenbelt, W. 2015b. The UK-DALE Zhong, M.; Goddard, N.; and Sutton, C. 2014. Signal ag- dataset, domestic appliance-level electricity demand and gregate constraints in additive factorial HMMs, with appli- whole-house demand from five UK homes. Scientific Data cation to energy disaggregation. In Advances in Neural In- 2(150007). formation Processing Systems, 3590–3598. Kolter, Z., and Jaakkola, T. S. 2012. Approximate infer- Zhong, M.; Goddard, N.; and Sutton, C. 2015. Latent ence in additive factorial hmms with application to energy Bayesian melding for integrating individual and population disaggregation. In AISTATS, volume 22, 1472–1482. models. In Advances in Neural Information Processing Sys- Kolter, J. Z., and Johnson, M. J. 2011. Redd: A public data tems, 3618–3626. set for energy disaggregation research. Lange, H., and Berges,´ M. 2016. Efficient inference in dual- emission FHMM for energy disaggregation. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence. Parson, O.; Ghosh, S.; Weal, M.; and Rogers, A. 2012. Non- intrusive load monitoring using prior models of general ap- pliance types. In Proceedings of the Twenty-Sixth Confer- ence on Artificial Intelligence (AAAI-12), 356–362. Pattem, S. 2012. Unsupervised disaggregation for non- intrusive load monitoring. In Machine Learning and Appli- cations (ICMLA), 2012 11th International Conference on, volume 2, 515–520. IEEE. Sainath, T. N.; Kingsbury, B.; Saon, G.; Soltau, H.; rahman Mohamed, A.; Dahl, G.; and Ramabhadran, B. 2015. Deep convolutional neural networks for large-scale speech tasks. Neural Networks 64:39 – 48. Shaloudegi, K.; Gyorgy,¨ A.; Szepesvari, C.; and Xu, W. 2016. SDP relaxation with randomized rounding for en- ergy disaggregation. In Lee, D. D.; Sugiyama, M.; Luxburg, (a) (b) 7000 Input Input 6000 Prediction Prediction 5000 Target Target 4000 3000 2000


Figure 3: Feature maps learnt by the convolutional networks for various types of inputs (the mains). The feature maps contain the signatures of an appliance, which are used to predict the states of the appliance. These plots indicate that the learnt signatures are when an appliance is turned on and off (see the change points of the feature maps), the duration for which an appliance is on, and the power level (yellow indicates a higher power level). (a) The kettle is in the mains; the network detects the change points and power levels. (b) The kettle was manually removed from the mains; compared to (a), the change points and power levels are changed in the middle. (c) The power level of the kettle was set to double the true level; compared to (a), the detected power levels are increased in the middle. (d) The power level of the kettle was set to half the true level; compared to (a), the detected power levels are changed. (e) The duration of the kettle was doubled; compared to (a), the detected duration is changed. (f) The target has no activation at the midpoint; the learnt signatures are similar in type to those in (a).