Toolbox for development and validation of grey-box building models for forecasting and control

De Coninck, Roel; Magnusson, Fredrik; Åkesson, Johan; Helsen, Lieve

Published in: Journal of Building Performance , Taylor & Francis

DOI: 10.1080/19401493.2015.1046933

2016

Document Version: Peer reviewed version (aka post-print) Link to publication

Citation for published version (APA): De Coninck, R., Magnusson, F., Åkesson, J., & Helsen, L. (2016). Toolbox for development and validation of grey-box building models for forecasting and control. Journal of Building Performance Simulation, Taylor & Francis, 9(3), 288-303. https://doi.org/10.1080/19401493.2015.1046933

Total number of authors: 4

General rights Unless other specific re-use rights are stated the following general rights apply: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal

Read more about Creative commons licenses: https://creativecommons.org/licenses/ Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

LUND UNIVERSITY

PO Box 117 221 00 Lund +46 46-222 00 00 Toolbox for development and validation of grey-box building models for forecasting and control

Roel De Conincka,b,c, Fredrik Magnussond, Johan Åkessone, Lieve Helsenb,c a3E nv, 1000 Brussels, Belgium, bKU Leuven, Department of Mechanical Engineering, 3001 Heverlee, Belgium, cEnergyVille, 3600 Waterschei, Belgium, dDepartment of Automatic Control, Lund University, SE-221 00 Lund, Sweden, eModelon AB, Ideon Science Park, SE-223 70 Lund, Sweden

Abstract detection, energy efficiency analysis and model-based building operation. A first step in many of these ap- As automatic sensing and Information and Communi- plications is the creation of a building energy system cation Technology (ICT) get cheaper, building moni- model. toring data becomes easier to obtain. The availability Models can be classified according to the white- of data leads to new opportunities in the context of en- box, grey-box and black-box paradigm. (Bohlin, ergy efficiency in buildings. 1995; Madsen and Holst, 1995; Kristensen et al., 2004; This paper describes the development and valida- Henze and Neumann, 2011). Although the bound- tion of a data-driven grey-box modelling toolbox for aries between these categories are blurry and often buildings. The Python toolbox is based on a Modelica overlapping, this paradigm is useful for understanding library with thermal building and Heating, Ventilation the modelling procedure. White-box modelling bases and Air-Conditioning (HVAC) models and the optimi- the model solely on prior physical knowledge of the sation framework in JModelica.org. The toolchain fa- building. Most building simulation software falls un- cilitates and automates the different steps in the system der this category, like TRNSYS, EnergyPlus and many identification procedure, like data handling, model se- others (Crawley et al., 2008). Black-box modelling lection, parameter estimation and validation. bases the model solely on response data (monitoring To validate the methodology, different grey-box of the building) and a universal model set, including models are identified for a single-family dwelling with e.g. AR and ARMAX. Although physical insight is detailed monitoring data from two experiments. Vali- not required for making a black-box model, a model dated models for forecasting and control can be iden- structure has to be chosen and this often involves mak- tified. However, in one experiment the model perfor- ing assumptions about the system, for example with mance is reduced, likely due to a poor information regard to linearity. Grey-box identification methods content in the identification dataset. and tools cater for the situation where prior knowledge Keywords: grey-box models, parameter estimation, of the object is not comprehensive enough for satisfac- collocation method, validation, Modelica tory white-box modelling and, in addition, purely em- pirical black-box methods do not suffice because the Introduction involved physical processes are too complex. Grey- and black-box models are also called inverse models. The continuous progress in ICT has led to the avail- The difference between white- and grey-box mod- ability of small and low-cost sensors, low-power wire- elling is not in the complexity of the model. A single- less data transfer protocols, cheap and accessible data state model can be a white-box model if all param- storage and powerful servers. Applied to the building eters can be fixed based on physical knowledge only. sector, these technologies can be used to collect large However, when one or more parameters in a white-box amounts of building monitoring data at relatively low model are estimated based on a fitting of the model to costs. The abundance of data gives rise to new oppor- measurement data, the model becomes grey, no mat- tunities and applications in existing buildings like fault ter its complexity. Therefore, the distinction between white and grey cannot be made by only looking at the nich, Germany. The validation is carried out for two model structure: one has to know how the model pa- different experiments on the same building. rameters have been identified. All three model types can be either deterministic Part I or stochastic. A deterministic model cannot explain the differences between the model output and the true Methodology variations of the states (observations). Madsen and Holst(1995) therefore introduced a Wiener process in Overview the system equations to cope with the simplifications of the model and uncertainties in inputs and monitor- A high-level overview of the toolbox is shown in Fig- ing. The obtained model is a stochastic state-space ure1. The toolbox is composed of four major compo- model. nents: For existing buildings with available monitoring data, the grey-box approach is considered to combine 1. the Modelica library FastBuildings with thermal the best of two worlds: physical insight and model zone models, HVAC components and building structure from the white-box paradigm and parameter models; estimation and statistical framework from the black- box paradigm. This paper describes an approach to 2. different .mop files specifying the model compo- grey-box modelling for buildings and the development nents and which parameters to estimate; of a toolbox combining Modelica and Python. The re- 3. JModelica.org as a middle layer for compilation sulting framework will be referred to as the toolbox in of the .mop files as well as formulation and solu- the remainder of this paper and will be validated on tion of the optimisation problem; a single-family dwelling. The toolbox is not publicly available, but can be obtained with an open-source li- 4. Python module greybox.py delivering the user in- cense for research purposes by contacting the authors terface and top-level functionality. of this paper. The toolbox has been developed with two purposes in mind. A first application is model predictive control (MPC). In this context, the grey-box model serves as the control model in a feed-back loop with the build- ing. According to Henze(2013), the process of model identification accounts for 70 % of the effort for imple- menting an MPC controller. Automating this process can therefore reduce the total cost of MPC in build- ings. To validate such a control model, the k-step pre- diction performance is used. A second application is load forecasting for real buildings. The forecast hori- zon is typically one day or one week. In this case, a different metric to validate the model is required: the simulation performance. This is the model deviation from a measured output in an open-loop simulation Figure 1: Overview of the grey-box buildings toolbox. when measured disturbances are applied. It is clear that a model showing a good simulation Modelica library FastBuildings performance will also have a good k-step prediction performance. The opposite is not true. Therefore, we Modelica is an equation-based modelling language for will use the simulation performance as quality crite- cyber-physical systems (Elmqvist, 1997). The object- rion for the model validation. oriented philosophy stimulates model reuse, resulting This paper is split in two parts. The first describes in many available libraries, often open-source and free. the methodology and development of the toolbox. The Modelica is gaining importance in the building simula- second describes the validation results for a well mon- tion community (Wetter, 2011; Wetter and Van Treeck, itored experimental single-family dwelling near Mu- 2013). The choice for Modelica for the construction of the models is based on three major arguments (Wet- Currently, the thermal zone models available in ter, 2009). Firstly, Modelica allows for linear, non- the FastBuildings library are based on a resistor- linear and hybrid model formulations and therefore capacitance (RC) network analogy which is often used it does not limit the model structure as such. Sec- for the modelling of thermal processes. This is how- ondly, Modelica is equation-based, thus allowing effi- ever not required: any model that specifies a relation- cient Newton-type solvers to be used as an alternative ship between the heat flows and temperatures at the to for example genetic algorithms. Thirdly, Modelica interface of a thermal zone can be implemented. Dif- has a connector concept to support component-based ferent examples of models in the library are schemati- modelling. cally presented in Figure5 in PartII. The FastBuildings library targets low-order build- The FastBuildings library largely contains the do- ing modelling. The library has sub-packages for ther- main specific knowledge that is fundamental in grey- mal zone models (including windows), HVAC, user box modelling. Different thermal zone models often behaviour, inputs, buildings and examples. Single encountered in literature are present in the library and and multi-zone building models can be created eas- it is very easy to add more models (Davies, 2004; ily by instantiating one of the predefined templates in Bacher and Madsen, 2011; Sourbron et al., 2013; the Building sub-package and redeclaring the desired Reynders et al., 2014). The FastBuildings library is submodels, like the thermal zone, HVAC or window very dynamic in the sense that it is being extended model. The following design principles are applied with extra building models the more it is applied to throughout the library. different cases. How these models are chosen in a for- ward selection approach is explained in Section Tool- The thermal connectors are HeatPorts from the • box functionality and work flow. The FastBuildings Modelica.Thermal package, which is part of library is distributed with the Modelica license 2 and the Modelica Standard Library (MSL). can be found in the openIDEAS source code repository Thermal resistors and capacitances are not used on Github (KU Leuven and 3E, 2014). • from the MSL. Simplified versions with less aux- iliary variables are implemented. They have ex- actly the same interface and connectors to ensure The JModelica.org platform compatibility with the MSL.

A strict naming convention is used for consis- The toolbox relies heavily on the JModelica.org plat- • tency and to enable the greybox.py toolbox to au- form, which is an open-source tool for simulation and tomate certain tasks. optimisation of dynamic systems described by Mod- elica code (Åkesson et al., 2010). For simulation The library heavily relies on the extends con- purposes, JModelica.org uses the Functional Mockup • struct in order to avoid code duplication. This Interface (Blochwitz et al., 2011). For optimisation is specifically useful for the thermal zone models purposes, JModelica.org offers various algorithms and that have increasing complexity as a function of also supports the Modelica language extension Opti- their order. mica (Åkesson, 2008). Optimica allows for high-level formulation of dynamic optimisation problems of the An inner/outer component simFasBui passes • type presented in SectionI. all inputs like weather data, occupancy etc. from the top level to all sublevels. Every model structure for which the parameters have to be estimated is characterised by a different The models for thermal zones, HVAC and user .mop file. These files are very similar to ordinary Mod- • behaviour have exactly the same interface as their elica (.mo) files, but they can also contain Optimica equivalents in the IDEAS library. IDEAS, de- code. Each .mop file has the same structure and has to veloped by KU Leuven and 3E, is an open- define two models: one model for simulation, called source library for modelling and simulation of Sim, and one for parameter estimation, called Parest. buildings and integrated districts (Baetens et al., By default, the models are based on the FastBuildings 2012). Therefore, it is very easy to replace one or Modelica library, which has been developed in con- more detailed components from an IDEAS-based junction with this toolbox. However, this is not re- model by a low-order equivalent from the Fast- quired for the toolbox to work, as long as some nam- Buildings library. ing conventions are followed. Any parameter present in the model can be estimated, including initial values twice continuously differentiable with respect to all of of the states. its arguments (except the first one). This disables the The toolbox estimates the unknown model parame- use of hybrid models. Initial conditions are given by ters using JModelica.org’s algorithm based on direct specifying the initial state, as given by (1c), where t0 collocation. Collocation is used to discretise time, is the start time. The initial state is usually unknown, which reduces the optimisation problem to a nonlin- in which case some, or all, elements of x0 can also be ear program (NLP), as presented in Section Solution introduced as elements of the vector p. method and described in more detail in Magnusson and The objective (1a) of the optimisation is to minimise Åkesson(2012), where in particular optimal control is the integrated quadratic deviation e of the model out- also treated. JModelica.org utilises third-party NLP put from the corresponding measurement data. The solvers, which require first- and second-order deriva- model output y is typically some of the states, but tives of all expressions in the NLP with respect to all could also be some of the algebraic variables (and also decision variables. CasADi is used to obtain these by inputs, as discussed below). The matrix Q, which typi- algorithmic differentiation (Andersson et al., 2012). cally is diagonal, is used to weigh the different outputs. In this paper we use the NLP solver IPOPT with the The measurement data is assumed to be a function of sparse linear solver MA27 from HSL (Wächter and time, denoted by yM. Since measurements are typi- Biegler, 2006; HSL, 2013). cally discrete in time, they are simply interpolated lin- early to form yM. The output deviation e is then given Problem formulation by

Identification of the unknown model parameters is for- e(t) := y(t) y (t). (2) mulated as a dynamic optimisation problem of the gen- − M eral form The inputs can be treated in two different ways. The Z t f first is to assume that the inputs are known exactly by minimise e(t)T Qe(t)dt, (1a) t0 their measurement data and treat them as fixed values with respect to x(t),w(t),u(t), p, instead of decision variables. The second way is to have an error-in-variables approach where the inputs subject to F(t,x˙(t),x(t),w(t),u(t), p) = 0, are kept as decision variables and treat them as model (1b) output, that is, include them in the vector y and pe- x(t0) = x0, (1c) nalise their deviation from the corresponding measure- t [t ,t ]. ment data. The second way is useful for coping with ∀ ∈ 0 f uncertainties in measurement data. The system dynamics are modelled by a differential- algebraic equation (DAE) system (1b), where t is the time, x(t) is the state, w(t) is the vector-valued al- gebraic variable, u(t) is the vector-valued system in- Solution method put, which includes both control variables and distur- bances, and p is the vector of parameters to be esti- The approach taken to solve the optimisation problem mated. (1) is based on low-order direct collocation as pre- Algebraic variables often occur in Modelica mod- sented by Biegler(2010). The idea is to divide the els. A typical example is the conversion of measured time horizon into a number of elements, ne, of fixed electricity consumption in a radiative and a convective (but possibly distinct) lengths hi and approximate the fraction. The resulting radiative and convective heat time-variant system variablesx ˙,x,w and u by a poly- fluxes are contained in w(t). nomial of time within each element, called a colloca- The DAE system may be implicit, non-linear, time- tion polynomial. These polynomials are determined variant, and high-index. It is the result of the com- by enforcing the dynamic constraints at a certain num- pilation of the FastBuildings model. In the case of ber of points, nc, within each element. These points high-index systems, index reduction is automatically are called collocation points and ti,k is used to denote performed by the JModelica.org compiler. collocation point number k [1..n ], where [1..n ] de- ∈ c c Since a gradient-based method is applied to solve notes the integer interval between 1 and nc, in element the dynamic optimisation problem, F needs to be number i [1..n ]. ∈ e The system variables’ values at these points, de- introduction of x1,0, the initial condition (1c) is tran- noted by scribed into (3c). Finally, we introduce the constraints (3e) to capture (x˙i,k,xi,k,wi,k,ui,k,ei,k) := the dependency between x andx ˙, which is implicit in (x˙(ti,k),x(ti,k),w(ti,k),u(ti,k),e(ti,k)), (1). The state derivativex ˙i,k in a collocation point is approximated by a finite difference of the collocation are then interpolated based on Lagrange interpola- point values of the state in that element. The finite dif- tion polynomials to form the collocation polynomials. ference weights αl,k are related to the butcher tableau There are different schemes for choosing the place- of the Runge-Kutta method that corresponds to the col- ment of collocation points with different numerical location method. properties. In this paper we only consider Radau col- All that remains is to solve the NLP (3) in order to location. obtain an approximate solution to the original problem All collocation methods correspond to special cases (1). We do this numerically using IPOPT, as described of implicit Runge-Kutta methods and thus inherit de- in Section The JModelica.org platform. sirable stability properties making them suitable for stiff systems. Toolbox functionality and work flow This approximation reduces (1), which is of infinite dimension, into a finite-dimensional nonlinear pro- The user interacts with the toolbox through the grey- gram (NLP) of the form box.py Python module. This module defines two classes GreyBox and Case, as shown in Figure1. ne nc ! T The idea is to instantiate the GreyBox class once for min. ∑ hi ∑ ωkei,kQei,k , (3a) i=1 k=1 the system identification of a given building. The

w.r.t.x ˙i,k,xi,l,wi,k,ui,k, p, GreyBox object will contain many different instances of the Case class. Every Case is an attempt (success- s.t. F(ti,k,x˙i,k,xi,k,wi,k,ui,k, p) = 0, (3b) ful or not) to obtain a model for the given building. x = x , 1,0 0 (3c) The Case therefore keeps track of the model struc- x = x , n [1..n 1], (3d) n,nc n+1,0 ∀ ∈ e − ture, identification data, initial guess, solver settings 1 nc and results of a single parameter estimation attempt. x˙i,k = ∑ α j,k xi, j, (3e) The functionality of the toolbox is packed in methods hi j=0 · of the GreyBox class and can be grouped into different i [1..n ], k [1..n ], l [0..n ]. ∀ ∈ e ∀ ∈ c ∀ ∈ c domains, according to the foreseen workflow. This is shown in Figure2. This workflow is discussed in the The NLP objective (3a) is an approximation of the following paragraphs. original objective (1a) based on Gauss-Radau quadra- ture, where the measurement error ei,k in each colloca- tion point is summed and weighted by the correspond- ing element length hi and quadrature weight ωk, which depends on the choice of collocation points. Note that the decision variables are not only the unknown pa- rameters p, but also the discretised system variables x˙i,k,xi,l,wi,k, and ui,k (unless it has been eliminated). The constraint (1b) from the continuous-time model dynamics is transformed into the discrete-time con- straint (3b) by enforcing it only in each of the collo- cation points. Since the states need to be continuous (but not dif- ferentiable) with respect to time, the new continuity constraint (3d) needs to be introduced. Because we Figure 2: Workflow and high-level functionality in the tool- use Radau collocation, where no collocation point ex- box. ists at the start of each element, this also requires the introduction of the new variables xi,0, which represent The methods under data handling are used to load the value of the state at the start of element i. With the the data files, resample the data if desired, create data slices of given lengths (for example one week, but can exist many local minima. To investigate the parame- be any period) and show a plot of any data slice. Typ- ter search space more systematically and increase the ically, one data slice is the training set, and the other chances of finding a global minimum, a Latin hyper- slices can be used for cross-validation. Resampling the cube sampling method has been implemented. This data is an important step because the toolbox automat- method will take a single initial guess as well as lower ically chooses the collocation points to coincide with and upper bounds for each parameter and derive a the measurements. Thus, the (size of) the numerical univariate beta distribution from these three values. problem (3) is strongly dependent on the chosen sam- The distribution can be symmetric or asymmetric, as pling time, and it is often a good strategy to start with shown in Figure3. a large sampling time and refine it in a later stage. When the data has been pre-processed, a model 3.0 structure has to be specified in the 2.5 step. This is accomplished by specifying the path 2.0 to a .mop file. There are two models in the .mop file: a Modelica model for simulation, called Sim, 1.5 and a Modelica + Optimica model for parameter es- 1.0 timation, called Parest. The main difference is that 0.5 in the model Parest, the value of the Optimica at- Probability density function tribute free is set to true for each parameter to be 0.0 0.0 0.2 0.4 0.6 0.8 1.0 estimated. The compilation of both models happens a = 1.5,b = 4.5 a = b = 4 a = 4.5,b = 1.5 automatically by invoking the corresponding JModel- ica.org functionality. Model information (state vector, parameter vector and required inputs) and solver set- Figure 3: Symmetric and asymmetric beta distributions. tings are also obtained in this step. The parameters a and b characterise the probability den- Before the parameter estimation can be attempted, sity function. an initial guess has to be specified for each element in the parameter vector. These can be set by default, by The Latin hypercube sampling will then derive n inheritance, by Latin hypercube sampling or manually. stratified samples from each distribution and combine When the default initial guesses are used, an appro- them randomly to obtain n different initial guesses. priate value is chosen for each parameter, based on its Each of these guesses will be copied to a new case name. For example, the naming convention in Fast- to keep track of the results. Buildings forces all parameter names for thermal re- When a case has an initial guess for the parame- sistances to start with ’R’ (like RWal), for thermal ca- ter vector, the parameter estimation can be started. pacities with ’C’ (like CZon), for fractions with ’fra’ However, the NLP (3) requires good initial guesses for (like fraRad), etc. Based on the first letter(s) of a pa- each of the decision variables (including all collocated rameter to be estimated, a default initial value will be states and algebraic variables). This is handled by sim- set. ulating first with the Sim model and the initial guess of An alternative for obtaining the initial guess is to the parameter vector. The resulting simulation trajec- start from the optimised parameter vector of a previous tories are used as initial guesses for the decision vari- case, the parent case. This is especially useful when ables in (3). Numerical scaling factors for each system a new .mop file is selected that has similarities with variable are also computed as the infinity norm of the a previously processed .mop file. Due to the naming corresponding trajectory. conventions in FastBuildings, the corresponding pa- The solution time and the number of iterations can rameters will have the same name. Therefore, the best vary a lot depending on the initial guess and the abil- initial guess for a similar parameter in the new model ity of the model to represent the measurement data. will be the optimal value from the parent. For new Through the IPOPT interface, the toolbox allows the parameters, the default initial guess method described specification of a maximum solution time and/or a above is used. maximum number of iterations after which it will in- The last automated option to obtain initial guesses terrupt the optimisation. is based on Latin hypercube sampling. Due to the The estimation adds the optimised parameter vector non-convexity of the problem, there can potentially to the case, as well as the IPOPT solver . The validation of the results is always based on a The final step in the system identification is model post-simulation with the Sim model and the optimised acceptance. Model acceptance is needed on two lev- values of the parameter vector. This can be done on the els: for a single model, and between different models. training data (auto-validation) or on any other dataset For a single model, generally a Latin hypercube sam- (cross-validation). There are both visual and quantita- pling is executed and the resulting global optimum is tive validation methods. The visual methods contain, accepted if it is a valid solution. Valid means that: for example, time series plots of the resulting trajec- tories and corresponding residuals, scatter plots of the the parameters do not lie on the specified mini- • residuals with monitoring data and a plot of the au- mum or maximum bounds; tocorrelation function of the residuals. This also im- plies a check on the weights of the matrix Q from (1a) the parameter values are physically reasonable; • in case the error-in-variables method is used. When the confidence intervals are within reasonable a full Latin hypercube sample has been estimated, a • visual check of the different local optima is imple- bounds. mented. This can be used to judge whether the sam- ple was large enough to suppose that the global opti- These criteria are not totally objective and often re- mum has been found. The quantitative methods are quire an expert’s check on the model. If the global op- based on a computation of the root-mean-square error timum is not valid, the local optima are analysed and (RMSE) for each trajectory in the vector e from (2). may contain a valid model. If no valid model is found As the RMSE is computed based on post-simulation within the sample, there are different options. A new with adaptive step length, discretisation errors in the Latin hypercube sample can be generated with differ- collocation method are accounted for in the model val- ent distributions and/or a larger sample size. Some- idation process. times it can also help to change numerical settings for A computation of the confidence interval for each of the solver or to resample the data differently. If none the estimated parameters is implemented. This gives of these solutions leads to a valid model, the selected an indication of the accuracy of the estimation and the model structure cannot be matched to the considered parameter’s influence on the model’s input-output be- identification dataset and a different model structure haviour. The standard deviation of the estimated pa- has to be chosen. rametersp ˆ is computed according to Englezos and A forward selection approach is preferred for inter- Kalogerakis(2000). The standard deviation for pa- model acceptance. This approach starts with a very rameter i is the square root of the diagonal element on simple model, generally a first order single zone model (i,i) in the covariance matrix cov(pˆ) of the estimated with a low number of free parameters. Then, model parameters, which is given by order and complexity are increased until (i) the mod- els cannot be validated or (ii) the RMSE in cross- 2 T 1 cov(pˆ) = σˆ (J J)− , validation cannot be improved anymore. The selection where σˆ 2 is the estimated variance of the output devi- of the candidate models in this procedure is not sys- ation e. J is composed based on the results of a sim- tematic. The procedure can be carried out manually or ulation with the estimated parameters and sensitivities automatically. The manual solution is a trial and error computations activated. J contains the sensitivities of procedure, often accompanied by tailor-made models the model outputs with respect to the estimated param- based on the results of previous identification attempts. eters as shown in Eq. (4). For the automatic solution, the modeller makes a se- lection of models that is passed to the toolbox. The  ∂y1(t) ∂y1(t) ∂y1(t)  toolbox will then sort the models according to the ∂ p ∂ p ... ∂ p 1 2 np number of parameters and start with the identification  ∂y2(t) ∂y2(t) ∂y2(t)   ...   ∂ p1 ∂ p2 ∂ pnp  of the least complex one. Validation tests are specified J =  . . . .  (4)  . . .. .  by the user and the toolbox will automatically select   ∂yn (t) ∂yn (t) ∂yn (t) the best valid model. Which models are passed to the y y ... y ∂ p1 ∂ p2 ∂ pnp toolbox is again a user-specified choice based on ex-

In this equation, y1 ...yny are the ny model outputs pertise and available meta-information. It is of course and p1 ... pnp are the np free parameters. This method possible to pass every model of the FastBuildings li- can only be applied when the model output is equal to brary, but this will often result in an unacceptable one or more states. CPU-time. Even if the forward selection procedure is not as strict as the one presented by Bacher and Mad- mates also without using a-priori knowledge. Also for sen (Bacher and Madsen, 2011), it has several practi- the very first model in the forward selection approach, cal advantages. Firstly, the initial guesses and distri- which is supposed to be very simple, this works reli- butions for the parameters can be inferred from pre- ably. viously identified models. This is enabled by a strict The toolbox uses upper and lower bounds for the pa- naming convention in the FastBuildings library and by rameter estimation. The main reason for these bounds the fact that similar model components like walls, in- is to reduce the feasible region and thus the search filtration, solar gains etc. are repeated in more com- space. Most bounds reflect basic physical laws, for plex models. Secondly, starting from the first identi- example by imposing that resistances, capacities, gA fied model, there is always a reference performance values and fractions have to be positive. For fractions, (RMSE) with which the new results can be compared. an upper bound of one can be imposed, but it can also This approach avoids overfitting of the model, as will be relaxed. This can be used for example to obtain in- be demonstrated in PartII. ternal gains as a fraction of measured electricity con- sumption. Thanks to body heat gains, this estimated Data requirements fraction is allowed to be larger than one. Most param- eters do not have an upper bound because it is impossi- ble to specify them without using meta-information. If The developed grey-box modelling approach is in- a parameter estimation would result in unrealistic high tended for existing buildings. The aim is not to de- values, this is to be detected by either too high confi- velop a detailed emulator model, but to develop a sim- dence intervals, an expert check on the values or bad plified low-order model that works well in an MPC cross-validation. The use of bounds for the starting context or for predicting loads in buildings. For prac- temperatures of the states is illustrated in PartII. tical use in existing buildings, we can not count on the existence of detailed emulator models, hence model Sometimes, meta-information can be replaced by an order reduction approaches are excluded. Therefore, analysis of the available data. For example, a large the aim is to develop a methodology that can cope with window area on a specific orientation can be discov- very little meta-information and a limited amount of ered automatically by a correlation analysis on zone measurement data. temperatures with incident radiation on different ori- With regard to the meta-information, the require- entations. This information is then used to select ments depend on the complexity of the model. For which solar radiation components are used as distur- very simple models, typically single-zone and without bances in the model. HVAC components, there is no need for any a-priori For more complex models however, more meta- knowledge of the building. Only the location of the information is needed. When a multi-zone model is building has to be known if weather data is to be ob- created, information about the position of available tained from a generic weather service. Other build- zone measurements (temperature, humidity, electric- ing properties, like building size, orientation, window ity consumption etc.) is very useful to decide on the area, envelope properties etc. are not required. Nev- zoning strategy. If the model has to contain the HVAC ertheless, this information can be beneficial for fixing system, some information is necessary, in particular initial guesses and for validation of estimated parame- about the presence of specific equipment. This infor- ters. mation is used to adapt the model structure to the in- The more meta-information we want to use, the stalled HVAC system. more manual interventions are needed in the identi- The first requirement for the monitoring dataset is fication procedure. Therefore, this information is op- that the meaning of all variables is clear. The dataset tional. By default, initial guesses are hard coded based has to contain at least the indoor and ambient temper- on naming conventions. This means for example that ature and heating/cooling loads at hourly interval. The all resistances in a model are attributed the same initial ambient temperature (and other weather variables) can value, unless a resistance with the same name has been also be obtained from a weather service if the location estimated in the parent case of the current model. In of the building is known. The availability of more vari- this situation, the initial guess is the optimised value ables is beneficial and will improve the model. Elec- from the parent case. Experience shows that the com- tricity consumption monitoring (with sub-metering for bination of these initial guesses with the Latin hyper- plug power) is strongly recommended. More sub- cube sampling is sufficient to find good parameter esti- meters for electricity always improve the information Setup Exp. 1 Exp. 2 Period Summer Winter Blinds Closed Open Zoning Single zone Two zones (doors closed and sealed) Heating Sequence Different sequence

Table 1: Overview of the differences between both experi- ments.

paper focus on the building only and do not include Figure 4: Front view and ground floor of the experimental components for users or HVAC. house (north is up). For experiment 2, Zone 2 refers to the 3 small rooms in the north. Control versus forecasting content and usability, for example for creating equip- The presented grey-box approach aims at identifying ment scheduling profiles. When the HVAC system is models for forecasting and control. We have argued to be modelled, a measurement of the energy use of that the simulation performance is the correct criterion different components is required. to validate the models, and it is sufficient to validate Occupancy measurements are often not available. models for forecasting. However, a model with a good Mostly, the model does not need the occupancy itself, simulation performance is not necessarily suited for but the internal gains from body heat transfer. In of- optimal control. Some additional criteria are: fices, these are correlated with plug power. Alterna- tively, occupancy can be modelled based on measure- observability: models that are not observable can- • ments of relative humidity or CO2. This is not yet im- not be initiated in the right state with an adapted plemented in the current version of the toolbox. state estimation procedure; complexity: the model has to fit in an optimal • control framework. In particular, non-linear mod- Part II els are difficult to optimise; solver time: even if the requirements above are • Validation fulfilled, solving the OCP may require more time than is available between subsequent control Methodology steps. Experimental setup However, we are confident that the models pre- A detailed experiment was set up by Fraunhofer IBP sented in this paper are suited for MPC. Firstly, state (Holzkirchen, Germany) in order to collect monitoring estimation has been implemented on identical and data from well-known buildings near Munich, the twin very similar models as those presented in this paper houses (Kersken et al., 2014). We use monitoring data (Vande Cavey et al., 2014). Secondly, the presented from one of these houses. A schematic overview of the models are all linear (even if the FastBuildings library building is given in Figure4. and our our optimal control framework JModelica.org Two experiments are performed, resulting in two both allow non-linear models). Thirdly, we have tested datasets of about 40 days each. Each experiment con- those models in MPC and the computation times for sists of consecutive periods of free floating operation, solving the OCP are around one minute or much less a randomly ordered logarithmically distributed binary depending on the initialisation and forecasting hori- sequence (ROLBS) for heat inputs and a temperature zon. controlled operation. The differences between the ex- The ultimate validation of a control model is by as- periments are detailed in Table1. sessing its control performance, but this is subject to There are no users in the experimental house. The future work. We therefore assume that the models con- heating consists of electrical heaters in each of the sidered here are suitable for control if they have a good spaces. By consequence, the models presented in this simulation performance. Simulation performance tion rates etc. In order to validate the grey-box tool- box for its intended use and work flow, none of this The simulation performance is quantified by the sim- meta-information is used in the modelling phase. ulation error SE. It is a weighted average of the root mean square error (RMSE) for a set of n model out- puts: Experiment 1

n Data handling and zoning SE = ∑ q j(RMSE j), (5) j=1 The available dataset is very detailed. There are differ- where ent temperature sensors (in different rooms and also

r m 2 within a single room to measure stratification), heat ∑i=1 (y j(ti) Mj(ti)) RMSE j = − . flux measurements, humidity sensors etc. The sam- m pling period of the data is 10 minutes. We start the The weighting factors q j are the diagonal elements forward selection procedure with a simple single-zone of matrix Q of (1a). For each selected output variable, model lumping all heated spaces, and we neglect the y j(ti) is the model output at time instant ti and Mj(ti) interaction with the boundary spaces (attic and cel- is the corresponding measurement. The model outputs lar). For this thermal zone, an average zone temper- are taken at m time instants, corresponding to the data ature TZon has to be defined. As we know nothing points. These m data points may be the raw measure- about the building (we do not use the available meta- ment data or the result after a downsampling operation. information) we just average all temperatures of the The measurement data M is used to form yM in (2) by heated spaces and sum their heating loads. We also interpolation. However, as the toolbox sets the collo- resample the 10 minute data to hourly values. This re- cation points so that they coincide with the time in- duces the size of the numerical problem because the stants ti defined by the (resampled) measurement data, toolbox automatically chooses the collocation points no interpolation is actually needed. to coincide with the measurements. An overview of all It is important to implement the validation as cross- models that are successfully identified and validated in validation (as opposed to auto-validation). This means the forward selection procedure is shown in Figure5. computing the simulation performance on a section of the dataset that was not used for identification. For val- A first model idation of the control performance, a short validation period of about 1 day is sufficient. However, in order The first single-zone model is model A of Figure5. to validate the load forecasting application, we need a The model has only one state TZon. The free param- much longer dataset. As our experiments contain each eters are the thermal capacity CZon and start temper- 40 days of data, we split them in two equal parts: the ature TZon(0), the total solar transmittance gA of the identification and (cross-)validation subsets. We will windows and a total heat loss coefficient RWal repre- refer to the simulation performance in cross-validation senting all heat losses to the ambient temperature TAmb as prediction performance. (see Table2). For the validation simulation, an initial state vector Model inputs are the ambient temperature TAmb, is required. This state vector can be identified by filter- global horizontal radiation IGlo,Hor, and summed heat- ing techniques from measurement data up to the start ing loads QHea. The use of the global horizontal radia- time of the validation dataset. In this study, the val- tion instead of the radiation on several vertical surfaces idation dataset starts where the identification dataset is an approximation that allows us to estimate only one ends. Therefore, the state vector can be determined as gA value. In subsequent model refinements we will the model state at the end of the identification period. use the radiation components on different orientations As explained in PartI, the forward selection ap- and multiple windows. The identification is based on proach results in a single grey-box model for a given a minimisation of RMSE(Tzon). dataset. We will call this model the accepted model. As for all models that will be discussed below, a It is the model resulting in the lowest SE on the cross- Latin hypercube sample with initial guesses has been validation dataset that is valid. created and the best result of this sample is presented. For the experimental house however, almost all Table2 gives the resulting parameter estimates and meta-information is available: dimensions, construc- their estimated 95% confidence interval for the first or- tion, window positions, material properties, ventila- der model. The RMSEauto is 0.61 K and the RMSEcross Model A Model B

Model C Model D

Model E Model F

Figure 5: Overview of identified valid models for experiment 1. See the nomenclature at the end of this article for the meaning of the variables. Parameter Estimated value 95% conf. still use the global horizontal radiation IGlo,Hor to es- CZon 5.4e7 J/K +/- 2.2e6 timate a single gA value. The incorporation of solar RWal 7.0e-3 K/W +/- 1.2e-4 gains can be refined by adding windows and connect- TZon(0) 30.0 ◦C- ing each window to a different solar radiation. In our gA 2.65 m2 +/- 0.16 m2 attempts, we obtained the best results with two win- dows, connected to the vertical global radiation on Table 2: Overview of free parameters and their estimated East and West respectively. This resulted again in a values for the first order model (model A). large reduction of RMSEauto to 0.10 K and a small re- duction of RMSEcross to 0.71 K. is 2.01 K which is not very accurate. Still, this model is a good starting point and it provides useful initial guesses for the parameters in more detailed models. Model E When analysing the data, we have found a possible cause for the discrepancy between the re- Model refinements sults in auto and cross-validation. In the identifi- cation dataset, the mean attic temperature is higher Different models of increasing complexity have been than in the validation set, leading to overestimation identified. Without discussing all attempts in detail, of temperatures on cross-validation. When we add a we try to give an overview of the model improve- thermal resistance to the attic and estimate its value ments. An overview of the RMSE values for auto and we can indeed improve the prediction performance. cross-validation of all successfully identified models is The obtained model has an RMSEauto of 0.09 K and shown in Figure6 and the model schemes are shown RMSEcross is 0.56 K. The model has 12 parameters, in Figure5. The accepted model is discussed in more and none of the estimated values are physically impos- detail in the next section. sible or are positioned at their minimum or maximum boundary. This is an important validation criterion, it Model B The single state model cannot capture all requires however an expert check. dynamics in the measurement data. More states are All subsequent attempts to improve the model lead needed. A variety of additional states can be sug- to non-physical models. This may seem a non- gested, such as internal mass TInt , inertia of the build- issue since we are dealing with grey-box models ing envelope TWal, inertia of the heat emission system in which the parameters are allowed to represent THea etc. Different two-state models have been identi- lumped characteristics. However, experience shows fied, the best model (on cross-validation) has an addi- that when models have unrealistic values for the phys- tional state for the walls (model B). This model has six ical (lumped) parameters, these are always accompa- free parameters (of which two are initial temperatures nied by extremely large confidence intervals. of the states). The RMSEauto is 0.31 K and RMSEcross is 0.76 K. This is a substantial improvement compared to the single-state model. Model F A different situation occurs when all phys- ical parameters have acceptable values, but the esti- mated initial temperatures for the states are at the im- Model C We can still improve the model by increas- posed boundaries (270 K and 310 K by default). When ing its order to three states. Many attempts lead in- this happens for a state corresponding to a large time deed to lower RMSE values in auto-validation, but not constant (large RC value), the state does not act very in cross-validation. This means these models are over- dynamically and the energy balance of the model is fitted. We found one model however that slightly im- biased. However, it is often possible to obtain a valid proves RMSEcross to 0.74 K. This model is able to re- model by limiting the initial state temperatures to a duce RMSEauto by 50% to 0.15 K but this barely re- narrow bound based on physical insight and analy- sults in a better prediction performance. The model sis of the measurement data. This will never lead has an additional state for the thermal inertia in the to a lower RMSE on auto-validation because we ex- zone and an additional resistance rIn f in parallel with clude the optimal solution from the feasible region. the wall. This leads to 10 free parameters. However, it may result in a lower RMSE on cross- validation and thus a better prediction performance. Model D An analysis of the residuals reveals a cor- What happens is that the numerically (slightly) better relation between model error and solar radiation. We solution is shifted to a more physical solution. 2.5 n=1 2.0 n=2

[K] n=3

Zon 1.5 n=4 T

1.0 1.4 95% confidence RMSE on 0.5 1.2 0.0 0 2 4 6 8 10 12 14 1.0 Number of estimated parameters 0.8

0.6

Figure 6: RMSE values for auto-validation (filled mark- Normalized confidence interval

ers) and cross-validation (hollow markers) for the different 1 2 Int Int In f Zon Bou Bou Wal Wal R gA gA C R R R C C models as a function of the number of estimated parameters C and the model order n.

This is also observed in experiment 1. We try to Figure 7: Normalised confidence interval for the parame- ters in the accepted model for experiment 1. improve the model with 12 parameters by adding a state for the boundary with the attic (model F). This adds two parameters to be estimated, the thermal ca- pacity of the boundary CBou and the initial tempera- ture of this state TBou(0). The optimisation returns TBou(0) = 270 K (-3.15 ◦C). Physically, we know this temperature should lie somewhere between the tem- peratures of the zone and the attic. By increasing the lower bound to 22.9 ◦C we find a valid model with an RMSEauto of 0.09 K and a strongly improved RMSEcross of 0.33 K. This model, with 14 parameters, is the accepted model. It will be discussed in more de- Par. Meaning Value tail below. Further attempts led to non-physical mod- CBou State boundary to attic 8.1e+07 J/K els or did not improve the forecasting performance CInt State internal mass 2.6e+07 J/K while adding unnecessary model complexity. CWal State building envelope 2.3e+08 J/K CZon State zone 3.4e+06 J/K Model validation TBou(0) Initial temperature 22.9 ◦C TInt (0) Initial temperature 29.6 ◦C Table3 gives the resulting parameter estimates for TWal(0) Initial temperature 27.1 ◦C model F. The normalised confidence intervals are TZon(0) Initial temperature 30.3 ◦C shown in Figure7. The toolbox does not compute RBou Resistance to attic 3.4e-2 K/W confidence intervals for the initial temperatures of the RIn f Resistance to ambient 1.5e-2 K/W states. We can see that all confidence intervals are rea- RInt Resistance CZon CInt 1.0e-3 K/W sonably small. Together with the physically meaning- ↔ RWal Resistance envelope 1.8e-2 K/W ful parameters this is an indication of validity for our 2 gA1 gA windows East 0.46 m model. More specifically, this indicates that the model 2 gA2 gA windows West 1.03 m is not overfitted. Figures8 and9 show the measured and simulated Table 3: Overview of estimated parameters for the accepted zone temperatures for the identification and validation model (model F) for experiment 1. datasets respectively. The latter represents the simula- tion performance SE as we have only one model out- put TZon in (5). We can see that, given perfect pre- dictions of the disturbances, the model is able to pre- dict the measured temperature very well, even in an itnttemlzns aho hs oe scom- is zones these of two Each to different zones. leading two thermal maintained experiment, distinct this are in regimes that ex-temperature to is compared 1 differences periment fundamental the of One zoning and handling Data 2 Experiment experiment. of this control in monitored and dwelling forecasting the both for validated is model research. the future of in presence elaborated be the will by This caused attic. dynamics slow the an compensates for if estimation good state be and/or still estimation can online performance control The be poor. would performance Without simulation attic. the the information, of this temperature the dependent of is prediction it a dis- that on is a model However, accepted the of days. advantage 20 over simulation open-loop 1. model experiment accepted for the in (cross-validation) the dataset for temperature validation zone simulated and Measured 9: Figure 1. accepted experiment for the for temperature in model (auto-validation) zone dataset simulated identification and the Measured 8: Figure

rmteersls ecnld httegrey-box the that conclude we results, these From °C °C 16 18 20 22 24 26 22 24 26 28 30

13/09/13 24/08/13

17/09/13 28/08/13

21/09/13 01/09/13 T T T T Zon Zon Zon Zon 25/09/13 05/09/13 (simulated) (measured) (simulated) (measured)

29/09/13 09/09/13 (iha diinlresistance model additional with an fit (with tween good 5 a Figure found of have B we models. zones single-zone both the For for results the discuss briefly model. two-zone useful the the provide of of magnitude also of order will the regarding This indications as zone condition. other boundary the single- of a two temperature identify the to with models try zone first we magnitude of parameters, orders the the to and for dynamics order the on in grip However, a get known. are the conditions zones and weather power both heating we for their so temperature only the when condition simultaneously predict two-zone boundary to a no able for be be will aim will we There 1, model. experiment to contrast In models Single-zone dy- 2. the zone of of estimation will namics the we complicate As will shown). this ten (not later, about 1 see is zone for 2 than zone smaller for times power heating the Also stable. the of that see plot can 10), a we Figure From (see temperatures zone values. measured hourly averaged to data re- the we sample and measurements available all average simply investi- been not have zones two gated. than more with zones els these call will We accord- av- ingly. rooms and individual zones in measurements two ap- the only eraging zoning modelling basic of most consists The proach rooms. different of posed the dataset, identification the set. is validation the data 2. is the half experiment second of full half the both first for for The temperature temperatures ambient (averaged) and zones Measured 10: Figure oe.I sascn-re oe ih8parameters 8 with model second-order a is It zone). °C ihu eciigalsest raetemdl we models the create to steps all describing Without but meta-information available use not do we Again, − 10 10 20 30 40 0 T Zon 11/04/14 n h onaytmeaueo h other the of temperature boundary the and 15/04/14 T Zon 19/04/14 2 T a ifrn oto n svery is and control different a has 23/04/14 Amb

27/04/14

Zon 01/05/14 T Zon and 1

05/05/14 2

09/05/14 Zon

R 13/05/14 .Mod- 2. T Zon Bou

17/05/14 1 be- SE of which two are initial temperatures. Each zone has a single window connected to the radiation on a vertical, south oriented plane (instead of the global horizontal radiation as indicated in the scheme of model B). Parameter Zon1 Zon2 Unit Table4 gives the resulting parameter estimates for CInt 1.7e7 2.7e7 J/K both models. The corresponding RMSE values are CZon 2.6e6 1.2e7 J/K given in Table5. The normalised confidence intervals TInt (0) 26.4 22.7 ◦C are shown in Figure 11. From these results we can see TZon(0) 29.5 22.5 ◦C the following: RBou 6.2e-3 2.9e-2 K/W RInt 1.3e-3 6.0e-4 K/W for all parameters, the order of magnitude is RWal 3.9e-2 4.8e-2 K/W • roughly the same for both zones; gA 3.1 0.28 m2 the thermal resistance of the boundary is a factor • Table 4: Overview of estimated parameters for the accepted 5 higher when estimated from zone 2; single-zone models for experiment 2. zone 2 has a much better RMSE than zone 1, • auto but a higher RMSEcross; the confidence intervals for zone 2 are much • larger, except for RBou and RWal; both zones have higher solar aperture areas than • for experiment 1. This makes sense considering that in experiment 1 the blinds were closed, and in experiment 2 they are open. 1.4 Zone 1, 95% confi dence The low RMSEauto for zone 2 is misleading. Both Zone 2, 95% confi dence the confidence intervals and cross-validation show that 1.2

the model for zone 2 is not very good. We can under- i 1.0 stand this result by analysing the measurement data as shown in Figure 10. The temperature in zone 2 is 0.8 extremely flat during the identification period. There- 0.6 fore, the thermal inertia in this zone is not excited and Normalized conf dence interval 1 consequently it is very hard or even impossible to es- Int Int Zon Bou Wal R gA C R R timate the time constants and other parameters of a C dynamic model. We can conclude that poor datasets (with little excitation of the states) cause difficulties for the identification of dynamic models. Whenever Figure 11: Normalised confidence intervals for the esti- possible, the building control system should cause suf- mated parameters for both single-zone models. ficient excitation of all building components during the identification period. This conclusion has been formu- lated previously in literature, amongst others by Sour- bron et al.(2013) and Žá cekovᡠet al.(2014). We now try to identify a two-zone model by com- bining the two single-zone models. RMSEauto RMSEcross Two-zone model Zone 1 0.27 K 0.51 K Zone 2 0.07 K 0.65 K The two-zone model has two outputs on which the SE 0.34 K 1.16 K simulation error SE will be computed according to (5): TZon1 and TZon2. We take both weight factors w j = 1. Table 5: RMSE of individual zone models and resulting SE. For the model of the boundary between both zones two options are explored: a thermal resistance, or a wall composed of two resistances and a capacity. In principle, the parameter values should not deviate Parameter Zon1 Zon2 Unit CInt 1.9e7 1.1e8 J/K CZon 2.8e6 5.5e8 J/K TInt (0) 26.8 23.9 ◦C TZon(0) 29.2 22.5 ◦C RInt 1.4e-3 4.0e-5 K/W RWal 2.0e-2 5.9e-3 K/W Figure 12: Accepted two-zone model for experiment 2. gA 2.4 18.7 m2 CBou 4.0e9 J/K RBou 1.7e-2 K/W much from the ones in Table4. We also expect the TBou(0) 25.3 ◦C largest parameter deviations for the parameters with the largest confidence intervals. Table 6: Overview of estimated parameters for the accepted Without an additional state in the boundary wall be- two-zone model for experiment 2. tween the zones, the results are not very good: the model has an SEauto of 0.37 K and SEcross of 1.74 K. Moreover, the initial temperature of CInt for zone 2 lies RMSEauto RMSEcross at the boundary of 270 K (-3.1 ◦C) and rises monoton- Zone 1 0.23 K 0.51 K ically during the identification period, thus falsifying Zone 2 0.10 K 1.14 K the energy balance. SE 0.33 K 1.65 K With an additional state CBou as shown in Figure 12, the simulation performance improves. However, anal- Table 7: RMSE and SE of accepted two-zone model. ysis of the estimated parameters reveals again an ini- tial temperature TInt (0) of 270 K. This time however, the capacity C is not very large, and an attempt to Int predicting zone 2 than zone 1, we need to increase the narrow down the feasible region for the initial tem- weighting factor w2 from Eq. (5). perature leads to a valid model. The SEauto becomes 0.335 K and SEcross drops to 1.65 K. We should not be The bad simulation performance of zone 2 also be- surprised that SEauto drops slightly below the level of comes evident when comparing the measured and sim- 0.343 K obtained with the two single-zone models: an ulated zone temperatures. These are shown in Fig- additional degree of freedom is introduced with CBou. ures 13 and 14. For zone 1 however, the prediction per- Further attempts to improve the model were not suc- formance is very good, despite the deviation on zone 2. cessful, the accepted model is discussed in more detail Again, we can stress the importance of online identifi- below. cation and state estimation in order to avoid recurring model bias.

Model validation

34 T (measured) T (measured) Table6 shows the resulting parameter estimates for the Zon1 Zon2 32 TZon1 (simulated) TZon2 (simulated) accepted model presented in Figure 12. All parame- 30 ters have physical values. For zone 1, the parameters 28 barely shift compared to the single-zone model. For °C zone 2 however, most parameters change with a fac- 26 tor of 10. This again indicates that the identification 24 ± dataset is worse for zone 2. 22 When the SE is split in the RMSE values for each zone separately, Table7 is obtained. Zone 1 has a very good performance, also in cross-validation. Zone 2 09/04/14 13/04/14 17/04/14 21/04/14 25/04/14 however has a bad RMSEcross. By comparison of Ta- ble5 and Table7, we see that the single-zone model Figure 13: Measured and simulated temperature of both for Zone 2 has a better prediction performance than zones for the identification dataset (auto-validation) in the the two-zone model. If we were more interested in accepted model for experiment 2. ntecosvldto aae,temdldeviations model the dataset, cross-validation days the prediction 20 on over good simulation very open-loop an a In has performance. that identified house is single-zone building a single-family 1, experiment a In Germany. of Munich, near monitoring detailed the by by purposes authors. research the for contacting obtained license be open-source can an but available, with The publicly not problem. is optimisation toolbox the parameter of the non-convexity to of the related issues sampling local-minima overcomes hypercube ease-of- space search Latin and A robustness to use. paid is attention Specific problems. estimation parameter nu- the efficient of an solution allows merical The method gradient-based operations. a of common use all for the interface wraps high-level that module Python of a functionality as implemented is It the of estimation parameter the multi-zone buildings. and single users, HVAC, zones, thermal Model- for The package candidates. ica model potential many with both in forecasting. applicable a and are MPC in paper which models way, this grey-box automated of obtain largely part to first approach an The presents control buildings. and monitored analysis of for models the low-order for framework of strong creation a as build- considered is the modelling grey-box in specifically More attention community. simulation gaining ing is modelling Inverse Conclusion 2. experiment for ac- both the model in of cepted (cross-validation) dataset temperature validation simulated the for and zones Measured 14: Figure

h olo svldtdo w aaesgenerated datasets two on validated is toolbox The automates largely that presented is toolbox a Next, library building a of creation the is step first The °C 22 24 26 28 30 32 34 36 38 40

29/04/14 FastBuildings

JModelica.org 03/05/14

07/05/14 otislwodrmodels low-order contains n rsnsteue a user the presents and FastBuildings 11/05/14 T T T T Zon Zon Zon Zon 2 2 1 1 (simulated) (measured) (simulated) (measured) 15/05/14 models. Nomenclature options. these explore the will and toolbox developments grey-box the Future of equipment. coef- HVAC transfer and heat ficients in components. encountered formula- model typically model are non-linear These the also allowing in by freedom tion large a grey-box creates the Mod- Secondly, elica load. point, heating the set ex- predict would For temperature model balanced. a is given problem be ample, the can as outputs long and as inputs switched Therefore, information to predefined direction. opposed a model flow with as acausal model equations, an input-output have with an we variables equation- that all is means relating model This grey-box the based. of use Firstly, the from Modelica. come that methodology proposed the data. identification good of need the identifica- out the points clearly in dif- excitation more an weak has dataset, a tion however to zone Due second ficulties. The good K. a 0.51 zone, of an with first achieved the is performance For prediction performance. mixed with an with small very are rses o uprigti oko h ieo Evia 3E of side the on work this of supporting (Region for Innoviris Brussels) thank to wishes Coninck De Roel Acknowledgement Con Wal C Rad Amb Bou f In Emb Int Zon Meaning Subscript I gA Q R Meaning T Symbol ial,w att on u w datgsof advantages two out point to want we Finally, identified is building two-zone a 2, experiment In Radiative Convective (outdoor) Ambient Boundary Infiltration system cooling) or (heating Embedded envelope building / Walls Internal air) denoting (mostly Zone radiation Solar admittance Solar flux Thermal resistance Thermal capacity Thermal Temperature RMSE cross RMSE f11 sotie.This obtained. is K 1.14 of cross FastBuildings fol .3K. 0.33 only of RMSE library cross the ITEA2 project Enerficiency (contract RBC/11 DS The functional mockup interface for tool indepen- 143a) and the European commission for supporting dent exchange of simulation models. In "8th In- his work on behalf of KU Leuven by supporting the ternational Modelica Conference", Dresden, Ger- FP7 project PerformancePlus (contract nb. 308991). many, pp. 20–22. Fredrik Magnusson acknowledges support from the Swedish Research Council through the LCCC Lin- Bohlin, T. (1995). Editorial - Special issue on grey naeus Center and is also a member of the ELLIIT Ex- box modelling. International Journal of Adaptive cellence Center at Lund University. Control and Signal Processing 9, 461–464.

Crawley, D. B., J. W. Hand, M. Kummert, and B. T. References Griffith (2008, April). Contrasting the capabilities of building energy performance simulation pro- Åkesson, J. (2008). Optimica—an extension of mod- grams. Building and Environment 43(4), 661–673. elica supporting dynamic optimization. In Proc. 6th International Modelica Conference 2008. Davies, M. G. (2004). Simple Models for Room Response. In Building Heat Transfer, Chapter 14, Åkesson, J., K.-E. Årzén, M. Gäfvert, T. Bergdahl, pp. 311–334. Chichester: John Wiley & Sons, Ltd. and H. Tummescheit (2010, November). Mod- eling and optimization with Optimica and Elmqvist, H. (1997). Modelica - A unified object- JModelica.org—languages and tools for solving oriented language for physical systems modeling. large-scale dynamic optimization problems. Com- Simulation Practice and Theory 5(6), 32. puters and Chemical Engineering 34(11), 1737– 1749. Englezos, P. and N. Kalogerakis (2000, October). Applied Parameter Estimation for Chemical En- Andersson, J., J. Åkesson, and M. Diehl (2012). gineers, Volume 81 of Chemical Industries. CRC CasADi – A symbolic package for automatic dif- Press. ferentiation and optimal control. In S. Forth, P. Hovland, E. Phipps, J. Utke, and A. Walther Henze, G. P. (2013, May). Model predictive control (Eds.), Recent Advances in Algorithmic Differen- for buildings: a quantum leap? Journal of Building tiation, Lecture Notes in Computational Science Performance Simulation 6(3), 157–158. and Engineering, Berlin. Springer. Henze, G. P. and C. Neumann (2011). Building sim- Bacher, P. and H. Madsen (2011, February). Iden- ulation in building automation systems. In J. L. M. tifying suitable models for the heat dynamics of Hensen and R. Lamberts (Eds.), Building perfor- buildings. Energy and Buildings 43(7), 1511–1522. mance simulation for design and operation, pp. 401–440. Baetens, R., R. De Coninck, J. Van Roy, B. Ver- bruggen, J. Driesen, L. Helsen, and D. Saelens HSL (2013). A collection of Fortran codes for large (2012). Assessing electrical bottlenecks at feeder scale scientific computation. http://www.hsl. level for residential net zero-energy buildings rl.ac.uk. by integrated system simulation. Applied En- ergy ((Special issue on Smart Grids, Renewable Kersken, M., I. Heusler, and P. Strachan (2014, Energy Integration, and Climate Change Mitigation September). Erstellung eines neuen, mess- - Future Electric Energy Systems)). datengestützten validierungs-szenarios für gebäude-simulationsprogramme. In Christoph van Biegler, L. T. (2010). Nonlinear Programming: Con- Treeck, Dirk Müller (eds.) Proceedings of BauSIM cepts, Algorithms, and Applications to Chemical 2014, Aachen, Germany, pp. 144–151. IBPSA. Processes. MOS-SIAM Series on Optimization. Mathematical Optimization Society and the Soci- Kristensen, N. R., H. Madsen, and S. B. Jorgensen ety for Industrial and Applied . (2004, February). Parameter estimation in stochas- tic grey-box models. Automatica 40(2), 225–237. Blochwitz, T., M. Otter, M. Arnold, C. Bausch, C. Clauß, H. Elmqvist, A. Junghanns, J. Mauss, KU Leuven and 3E (2014). open-IDEAS source code M. Monteiro, T. Neidhold, et al. (2011, March). repository. https://github.com/open-ideas. Madsen, H. and J. Holst (1995). Estimation of continuous-time models for the heat dynamics of a building. Energy and Buildings 22, 67–79.

Magnusson, F. and J. Åkesson (2012, September). Collocation methods for optimization in a Mod- elica environment. In 9th International Modelica Conference, Munich, Germany.

Reynders, G., J. Diriken, and D. Saelens (2014, Oc- tober). Quality of grey-box models and identified parameters as function of the accuracy of input and observation signals. Energy and Buildings 82, 263–274.

Sourbron, M., C. Verhelst, and L. Helsen (2013, May). Building models for model predictive con- trol of office buildings with concrete core activa- tion. Journal of Building Performance Simula- tion 6(3), 175–198.

Vande Cavey, M., R. De Coninck, and L. Helsen (2014). Setting up a framework for model predic- tive control with moving horizon state estimation using JModelica. In 10th International Modelica Conference, Lund, Sweden, pp. 1295–1303.

Wetter, M. (2009, June). Modelica-based modelling and simulation to support research and develop- ment in building energy and control systems. Jour- nal of Building Performance Simulation 2(2), 143– 161.

Wetter, M. (2011). A view on future building system modeling and simulation. In J. L. M. Hensen and R. Lamberts (Eds.), Building performance simula- tion for design and operation, pp. 28.

Wetter, M. and C. Van Treeck (2013). IEA EBC An- nex 60 - New generation computational tools for building and community energy systems based on the Modelica and Functional Mockup Interface standards. http://iea-annex60.org/about. html.

Wächter, A. and L. T. Biegler (2006). On the im- plementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear pro- gramming. Mathematical Programming 106(1), 25–57.

Žáceková,ˇ E., Z. Vána,ˇ and J. Cigler (2014, Decem- ber). Towards the real-life implementation of MPC for an office building: Identification issues. Ap- plied Energy 135, 53–62.