Corporate Technology

System Identification, Forecasting, Control with Neural Networks

Hans Georg Zimmermann

Siemens AG, Corporate Technology. Email: [email protected]

Neural Networks @ Siemens: 25 Years of Research, Development, Innovation

[Overview diagram: economic applications (risk analysis, price forecasting, demand forecasting, customer relation management, information specification and diagnosis, decision support); technical applications (process modeling, process surveillance, process control, forecasting for renewables, sensor optimization); mathematics of neural networks (function approximation, feedforward and recurrent neural nets, neuro-fuzzy networks, fuzzy rules, learning algorithms, uncertainty analysis, topology design, optimal control); software environment (graphical user interface, high-performance computing, state space modeling, prior rule modeling, scripted languages).]

Mathematical Neural Networks

Complex Systems

Based on data, identify an input-output relation:

y = W_2 f(W_1 x), with nonlinearity f(z) = tanh(z)

E = Σ_{t=1}^T (y_t − y_t^d)² → min_{W_1, W_2}

Gradient descent gives local, layer-wise learning rules:

ΔW_1 = −η ∂E/∂W_1,   ΔW_2 = −η ∂E/∂W_2

Existence theorem (Hornik, Stinchcombe, White 1989): 3-layer neural networks can approximate any continuous function on a compact domain.

Neural networks imply a correspondence of equations, architectures, and local algorithms.
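To make this correspondence concrete, here is a minimal numerical sketch (not the original Siemens code; all sizes and data are invented) of the three-layer network y = W_2 tanh(W_1 x) trained by the gradient rule above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: T observations of an unknown input-output relation.
T, n_in, n_hid, n_out = 200, 3, 8, 1
X = rng.normal(size=(T, n_in))
Y = np.sin(X @ rng.normal(size=(n_in, n_out)))          # stand-in target y_t^d

W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))
eta = 0.01                                              # learning rate

for epoch in range(500):
    Z = np.tanh(X @ W1.T)                               # hidden layer f(W1 x)
    Y_hat = Z @ W2.T                                    # output y = W2 f(W1 x)
    err = Y_hat - Y                                     # (y_t - y_t^d)
    # Gradients of E = sum_t (y_t - y_t^d)^2, layer by layer:
    dW2 = 2 * err.T @ Z
    dW1 = 2 * ((err @ W2) * (1 - Z**2)).T @ X           # tanh'(z) = 1 - tanh(z)^2
    W2 -= eta * dW2 / T                                 # Delta W = -eta dE/dW
    W1 -= eta * dW1 / T

print("final squared error:", float(np.mean(err**2)))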

Neural Networks are No Black Boxes

Application: modeling of a gas turbine (model output vs. actual NOx).

 Inputs: 35 sensor measurements and control variables of the turbine
 Output: NOx emission of the gas turbine

Sensitivity analysis: compute the first derivatives along the operating trajectory:

∂output/∂input_i > 0   or   ∂output/∂input_i < 0

[Figure: NOx sensitivity of the inputs over time.]

A classification of input-output sensitivities:
 constant over time (= linear relationship)
 monotone (input can be used in 1-dim. control)
 non-monotone (only multi-dim. control possible)
 ~ zero (input useless in modeling and control)
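A hedged sketch of such a sensitivity analysis for the two-layer net above; the trained weights and the operating trajectory below are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 35))        # hypothetical trained weights, 35 inputs
W2 = rng.normal(size=(1, 8))         # one output: NOx emission

def sensitivities(x, W1, W2):
    """First derivatives d(output)/d(input_i) of y = W2 tanh(W1 x).
    By the chain rule: dy/dx = W2 @ diag(1 - tanh(W1 x)^2) @ W1."""
    z = np.tanh(W1 @ x)
    return (W2 * (1 - z**2)) @ W1    # shape (n_out, n_in)

# Evaluate along an operating trajectory and classify each input:
# roughly constant -> linear; fixed sign -> monotone; sign changes ->
# non-monotone; ~0 everywhere -> useless for modeling and control.
traj = rng.normal(size=(100, 35))
S = np.stack([sensitivities(x, W1, W2)[0] for x in traj])   # (time, n_in)
print("inputs with sign changes:", int(np.sum((S.max(0) > 0) & (S.min(0) < 0))))
```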

Neural Forecasting of Wind Power Supply

[Four panels: forecast vs. actual energy supply at lead times 0h, 6h, 12h, and 18h, each split into training and test periods.]

 Task: Forecast the wind energy supply of an entire wind field over the next 24 hours.
 Solution: Deep neural network (DNN) with 10 hidden layers; inputs are wind speed and direction (a hypothetical sketch follows below).
 Results: The DNN (RMSD: 7.20%) outperforms the analytical Jensen model (benchmark, RMSD: 10.22%).
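The slides give no implementation details; purely as an illustration, a deep MLP with 10 hidden layers could be set up with scikit-learn (the data shapes and the toy power curve below are invented):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in data: wind speed and direction as inputs,
# wind-field power output as target.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))                          # [speed, direction]
y = np.abs(X[:, 0])**3 + 0.1 * rng.normal(size=5000)    # toy power curve

# A deep MLP with 10 hidden layers, loosely mirroring the slide's DNN.
model = MLPRegressor(hidden_layer_sizes=(20,) * 10, activation="tanh",
                     max_iter=2000, random_state=0)
model.fit(X[:4000], y[:4000])

pred = model.predict(X[4000:])
rmsd = np.sqrt(np.mean((pred - y[4000:])**2)) / np.mean(y[4000:]) * 100
print(f"relative RMSD on test data: {rmsd:.2f}%")
```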

Forecasting of Solar Power Supply

[Left figure: total PV supply (scaled) over a 24h forecast horizon in minute time buckets on the test day 2012-01-01, comparing the physics model, SENN, and SENN + physics against the measured supply. Right figure: normalized model error measured on test data: physics 5.78%, SENN 5.19%, SENN + physics 4.45%.]

 Forecast the energy supply of a PV plant with a neural network (SENN) based on weather forecasts and/or a physics-based model.

 Data set: 2011-09-08 to 2012-01-23, minute data (200k data points); 40k data points are randomly selected as test data.

 Performance of the purely data-driven model (SENN) is comparable to the physics-based model.

 A hybrid model (SENN & physics) improves the forecast accuracy; one possible coupling is sketched below.
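The slides do not spell out how the two models are combined. One common coupling, shown here as an assumption with invented stand-in data, is to feed the physics-based forecast to the network as an additional input, so the network only has to learn the residual structure the physics model misses:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
weather = rng.normal(size=(2000, 4))                 # e.g. irradiance, temperature, ...
pv_true = np.clip(weather[:, 0], 0, None) + 0.05 * rng.normal(size=2000)
pv_physics = np.clip(weather[:, 0], 0, None) * 0.9   # imperfect physics model

# Hybrid: weather features plus the physics forecast as network inputs.
X_hybrid = np.column_stack([weather, pv_physics])
model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
model.fit(X_hybrid[:1600], pv_true[:1600])
resid = model.predict(X_hybrid[1600:]) - pv_true[1600:]
print("test RMSE of hybrid model:", float(np.sqrt(np.mean(resid**2))))
```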

Optimizing Sensors with Neural Networks

Improve optical smoke detectors (sensor signal vs. target, = 1 if fire, = 0 otherwise, and NN output):
• Smoke-poor fires are hard to detect
• Avoid false alarms caused by, e.g., steam, welding, exhaust fumes

Modeling with neural networks:
• Input: 3 inputs extracted from one sensor over 40 time steps; 50 fire scenarios.
• Network design: feedforward network including local and global modeling.
• Network learning: stochastic learning including cleaning noise.
• Result: the neural network was able to classify all patterns correctly.

[Figure: sensor time series of 2 different scenarios, e.g. water steam vs. plastic fire.]

Modeling of Open Dynamical Systems with Recurrent Neural Networks (RNN)

s_{t+1} = f(s_t, u_t), realized as s_{t+1} = tanh(A s_t + B u_t)   (state transition)
y_t = g(s_t), realized as y_t = C s_t   (output equation)
Σ_{t=1}^T (y_t − y_t^d)² → min_{A, B, C}   (identification)

Finite unfolding in time transforms time into a spatial architecture. We assume that x_t = const in the future.

[Unfolded architecture: states s_{t−2}, …, s_{t+4} propagated by the shared matrix A; inputs u_{t−3}, …, u_t enter via B; outputs y_{t−2}, …, y_{t+4} are read out via C.]

The analysis of open systems by RNNs allows a decomposition into its autonomous and externally driven subsystems. Preprocessing: u_t = x_t − x_{t−1}. Long-term forecasting depends on a strong autonomous subsystem.
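A minimal sketch of the unfolded forward pass (an illustration with invented shapes, not the original SENN implementation):

```python
import numpy as np

def rnn_unfold(u, A, B, C, s0):
    """Finitely unfold s_{t+1} = tanh(A s_t + B u_t), y_t = C s_t.
    Unfolding in time turns the recurrence into a deep spatial
    architecture with shared weights A, B, C."""
    s, outputs = s0, []
    for u_t in u:
        outputs.append(C @ s)            # y_t = C s_t
        s = np.tanh(A @ s + B @ u_t)     # state transition
    return np.array(outputs)

# Preprocessing as on the slide: differences u_t = x_t - x_{t-1}; for the
# future part, x_t is held constant, i.e. u_t = 0, so the forecast is
# carried by the autonomous subsystem A alone.
rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(4, 4))
B, C = rng.normal(size=(4, 2)), rng.normal(size=(1, 4))
x = rng.normal(size=(10, 2))
u = np.vstack([np.diff(x, axis=0), np.zeros((4, 2))])   # past diffs + future zeros
y = rnn_unfold(u, A, B, C, s0=np.zeros(4))
```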

Modeling Dynamical Systems with Error Correction Neural Networks (ECNN)

An error correction considers the forecast error at the present time as the reaction to unknown external information. In order to correct the forecast, this error is used as an additional input, which substitutes the unknown external information.

s_{t+1} = f(s_t, u_t, y_t − y_t^d)
y_t = g(s_t)
Σ_{t=1}^T (y_t − y_t^d)² → min_{f, g}

[ECNN architecture: at each past step an error unit z_τ compares C s_τ with the observation y_τ^d (which enters via −Id) and feeds the next state via D; inputs u_τ enter via B, and states propagate via the shared matrix A. In the future no observations exist, so only the outputs y_{t+1}, y_{t+2} are read out via C.]
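Translating the slide's equations into a sketch (the wiring of B, D, and the −Id error input follows the diagram; all shapes are assumptions):

```python
import numpy as np

def ecnn_step(s, u, y_obs, A, B, C, D):
    """One ECNN transition, following the slide's equations.
    z_t = C s_t - y_t^d is the model error against the observation; it is
    fed back via D as a proxy for the unknown external information."""
    z = C @ s - y_obs                      # error unit (the -Id connection)
    return np.tanh(A @ s + B @ u + D @ z)  # s_{t+1}

def ecnn_forecast(s, A, C, steps):
    """Future steps: no observations, so the error input is absent and the
    state iterates autonomously; outputs are read out via C."""
    ys = []
    for _ in range(steps):
        s = np.tanh(A @ s)
        ys.append(C @ s)
    return np.array(ys)
```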

Combining Variance-Invariance Separation with ECNN

 The bottleneck autoassociator solves the variance-invariance decomposition: separation of dynamics_t into invariants and variants_t, and recombination of invariants and variants_{t+1} into dynamics_{t+1}.
 The Error Correction Neural Network solves the transformed temporal problem (identity and forecast branches).
 The sub-networks are implicitly coupled by shared weights.

[Combined architecture: an encoder E compresses the observations x_τ, an ECNN (shared matrices A, B, C, D with error units z_τ) forecasts in the compressed space, and a decoder F reconstructs x_{t+1}, x_{t+2} from the forecast states.]
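As an illustration of the bottleneck autoassociator alone (a linear toy version trained by gradient descent; the slide's networks E and F are coupled to the ECNN, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                     # observed dynamics x_t
k = 3                                              # number of variants

E = rng.normal(scale=0.1, size=(k, X.shape[1]))    # encoder (slide's E)
F = rng.normal(scale=0.1, size=(X.shape[1], k))    # decoder (slide's F)
eta = 0.01
for _ in range(2000):
    V = X @ E.T                                    # variants_t (bottleneck)
    X_hat = V @ F.T                                # identity reconstruction
    err = X_hat - X
    # Gradient descent on the reconstruction error ||X_hat - X||^2:
    F -= eta * (err.T @ V) / len(X)
    E -= eta * ((err @ F).T @ X) / len(X)

print("reconstruction error:", float(np.mean(err**2)))
```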

Effective Load Forecasting with Recurrent Neural Networks

Load Forecasting for Electrical Energy
Compression of Electrical Load Curves

[Left panel: load forecast vs. actual load over time; accuracy of the load forecast is 97.95% (benchmark: 95%).]

Daily load is represented by a superposition of intraday sub-load curves: the 96 quarter-hour values y_t^1, y_t^2, …, y_t^95, y_t^96 of a day are expressed as a combination of 8 basis curves with coefficients α_1, …, α_8 (logistic output, positivity constraint B > 0); a decomposition sketch follows after the task description below.

 Task: Predict the upcoming energy demand on a 15 min. time grid up to 5 days ahead.

 Difficulty: Incorporate the impact of external influences on the energy demand.
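The slide does not specify the decomposition algorithm; one hedged stand-in for compressing 96-value daily curves into 8 nonnegative coefficients is nonnegative matrix factorization (the load data below is a placeholder):

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical stand-in: each row is one day's load on a 15-min grid
# (96 values); decompose into 8 nonnegative intraday sub-load curves.
rng = np.random.default_rng(0)
days = np.abs(rng.normal(size=(365, 96)))          # placeholder load data

nmf = NMF(n_components=8, init="nndsvd", max_iter=500, random_state=0)
alpha = nmf.fit_transform(days)    # (365, 8): daily coefficients alpha_1..8
basis = nmf.components_            # (8, 96): sub-load curves, all >= 0

# Forecasting then works on the 8 coefficients instead of 96 raw values.
```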

Modeling Closed Dynamical Systems with Recurrent Neural Networks

s_{t+1} = f(s_t), realized as s_{t+1} = A tanh(s_t) with initial state s_0   (state transition)
y_t = g(s_t), realized as y_t = [Id, 0] s_t   (output equation)
Σ_{t=1}^T (y_t − y_t^d)² → min_{A, s_0}   (identification)

We can only observe a fragment of the world …

[Unfolded architecture: from a bias s_0, states s_{t−2}, …, s_{t+2} propagate via tanh and the shared matrix A; outputs are read out via [Id, 0].]

… but to understand the dynamics of the observables, we have to reconstruct at least a part of the hidden states of the world. Forecasting is based on observables and hidden states.
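A sketch of this closed-system recursion (illustrative only; learning A and s_0 by backpropagation through the unfolding is omitted):

```python
import numpy as np

def closed_rnn(A, s0, steps, n_obs):
    """Iterate s_{t+1} = A tanh(s_t), y_t = [Id, 0] s_t: only the first
    n_obs state components are observable; the remaining components are
    hidden states, reconstructed by learning A and s_0 from data."""
    s, ys = s0, []
    for _ in range(steps):
        ys.append(s[:n_obs])           # y_t = [Id, 0] s_t
        s = A @ np.tanh(s)             # autonomous state transition
    return np.array(ys)

# Toy usage with an invented 5-dim state and 2 observables:
y = closed_rnn(A=0.9 * np.eye(5), s0=np.ones(5), steps=10, n_obs=2)
```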

The Identification of Dynamical Systems in Closed Form

Embed the original architecture into a larger architecture that is easier to learn. After the training, the extended architecture has to converge to the original model.

[Extended architecture: at each past step the observation y_τ^d enters via −Id against the network output, with target tar = 0, so the network is forced to make its internal expectation consistent with the data; states propagate from a bias s_0 via tanh and the shared matrix A, and the future outputs y_{t+1}, y_{t+2} are read out via [Id, 0].]

The essential task is NOT to reproduce the past observations, but to identify related hidden variables that make the dynamics of the observables reasonable.

Approaches to Model Uncertainty in Forecasting

1. Measure uncertainty as the volatility (variance) of the target series. The underlying forecast model is a constant, so even sin(ωt) can appear highly uncertain.
2. Build a forecast model and interpret its error as additive noise. The width of the uncertainty channel is constant over time.
3. Describe uncertainty as a diffusion process (random walk). The diffusion channel widens over time, e.g. scaled by the one-step model error.

For large systems, approaches 2 and 3 fail: we learn towards zero error, so the uncertainty channel disappears.

4. One large model does not allow us to analyze forecast uncertainty, but an ensemble forecast shows the characteristics of an uncertainty channel: given a finite set of data, there exist many perfect models of the past data that show different future scenarios, caused by different estimations of the hidden states.

[Figure: ensemble forecasts of the LME copper price over a 20-day forecast horizon, fanning out into an uncertainty channel; training and test periods shown.]
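A sketch of the ensemble idea in approach 4 (purely illustrative; any family of retrained models with different initializations can stand in for the forecasts):

```python
import numpy as np

def ensemble_channel(forecasts):
    """forecasts: (n_models, horizon) array of individual model forecasts.
    The ensemble mean is the point forecast; the spread of the ensemble
    forms the uncertainty channel."""
    center = forecasts.mean(axis=0)
    lo = np.percentile(forecasts, 5, axis=0)     # lower band
    hi = np.percentile(forecasts, 95, axis=0)    # upper band
    return center, lo, hi

# Toy usage: 50 hypothetical models whose forecasts diverge over 20 days.
rng = np.random.default_rng(0)
paths = 100 * (1 + rng.normal(scale=0.02, size=(50, 20)).cumsum(axis=1))
center, lo, hi = ensemble_channel(paths)
```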

Control of Dynamical Systems with Recurrent Neural Networks

Questions to solve:
 System identification
 State estimation
 Controller design

System Identification, State Estimation & Optimal Control with RNNs

Past (system identification): s_{τ+1} = tanh(A s_τ + B u_τ), y_τ = C s_τ, with Σ_{τ≤t} (y_τ − y_τ^d)² → min_{A, B, C}.

Future (controller design): unfold the RNN into the future, given A, B, C, and learn a linear feedback controller u_τ* = u(s_τ) = D s_τ against a normative target: Σ_{τ>t} L(y_τ) → min_D.

[Unfolded architecture: past states s_{t−3}, …, s_t are driven by observed controls u_τ via B; future states s_{t+1}, s_{t+2}, s_{t+3} are driven by the learned controls u_τ* = D s_τ; outputs via C, transitions via the shared matrix A.]
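A sketch of the future part of this architecture (A, B, C are assumed fixed from identification; only D would be trained):

```python
import numpy as np

def rollout_with_controller(s, A, B, C, D, steps):
    """Closed-loop unfolding into the future: with A, B, C fixed from
    system identification, the feedback controller u_tau = D s_tau drives
    the states; only D is trained, e.g. by gradient descent on a loss
    L(y_tau) summed over tau > t."""
    ys = []
    for _ in range(steps):
        u = D @ s                      # learned linear feedback control
        s = np.tanh(A @ s + B @ u)     # identified state transition
        ys.append(C @ s)               # predicted observable
    return np.array(ys)
```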

Reward Maximization with a Learned Controller

Tasks: system identification, state estimation, controller design.

Challenges:
 High data volume (5000 variables/sec)
 Streaming real-time analysis (different algorithms)
 Autonomous learning
 Real-time data analytics using 1000 learned models
 Optimization of emissions
 Optimal turbine operation

[Figure: reward over 1000 time steps of 5 s each; the learned controller clearly outperforms both no operation and random actions.]
