
NORTHEASTERN UNIVERSITY

DOCTORAL THESIS

Networked Dynamic State Estimation with Time-Stamped Multi-Sensor Observations

Author: Bei Yan

Supervisor: Hanoch Lev-Ari

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

in the

Department of Electrical and Computer Engineering

October 2014

Declaration of Authorship

I, Bei Yan, declare that this thesis titled, 'Networked Dynamic State Estimation with Time-Stamped Multi-Sensor Observations' and the work presented in it are my own. I confirm that:

■ This work was done wholly or mainly while in candidature for a research degree at this University.

■ Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

■ Where I have consulted the published work of others, this is always clearly attributed.

■ Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

■ I have acknowledged all main sources of help.

■ Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:

Abstract

In this dissertation the performance of a continuous-discrete Kalman filter using multi-sensor observations is analyzed in the presence of irregular sampling, observation/control delay, bad data and system parameter inaccuracy. Irregular sampling may occur as a result of differences in sampling rates and/or lack of synchrony in a geographically-distributed power system. Alternatively, it may also be caused by intermittency (i.e., packet loss) in the communication link. We show that the average error covariance depends only on system parameters and on the moment generating function of the irregular sampling interval of the multi-sensor sampling pattern. We obtain lower and upper bounds on the average error covariance, as well as a necessary condition for its stability, expressed in terms of the region of convergence of the sampling interval moment generating function. In the discussion of communication delay, we provide an explicit expression for the added effect of delayed time-stamped observations on the steady-state error covariance of our networked Kalman filter. We demonstrate that observation delay always degrades the performance of the (continuous-discrete) Kalman filter, and that the added estimation error depends on the moment generating function of the transmission delay. We propose a delay-mitigation method to cancel the delay in the control path: our delay-mitigation method transmutes delay into increased estimation error, but at the same time, avoids the destabilizing effect of delay in the control path. In the presence of bad data, the performance of a KF dynamic state estimator can be significantly affected. We provide guidelines for customization of the proposed detector, under both a Bayesian and "Fisherian" approach to statistical decision theory. The former is recommended when the statistical distribution of corrupted measurements is available; the latter is more appropriate when little is known about the likelihood of occurrence of bad measurements and their statistics. Finally, we consider the Kalman filter in the presence of parameter inaccuracy. The Kalman filter is the optimal linear mean-square state estimator when the system state-space model is accurate; when this assumption is violated, its performance degrades. Modifications of the discrete-time Kalman filter, designed to minimize its sensitivity to structured parameter inaccuracies, are provided, and the improved performance of our robust dynamic state estimation is illustrated as compared with the standard Kalman filter. We also combine our robust Kalman filter with the delay-mitigation approach, and this results in a significant reduction in state estimation error. In considering resiliency, we use the term Region of Acceptable Performance (RAP) to denote the set of all sensor sampling rate combinations that result in tr(P_{-,av}) ≤ h. Once the boundary of the RAP has been determined we can define the Resilience Margin of a given dynamic state estimator in terms of the level of perturbation (such as packet drops, communication delay, time-stamp inaccuracy, bad data) that can be accommodated without leaving the RAP.

Acknowledgements

It took me a long time to finish this dissertation, and fortunately the support and guidance from many sources during this long time helped me finish it successfully. In my mind, the assistance has been both academic and emotional. To these people I am thankful. First of all I would like to express my gratitude to my advisor, Professor Hanoch Lev-Ari, who has been a tremendous mentor for me. I would like to thank you for encouraging my research. Your guidance and support from the initial to the final level enabled me to make progress on my research and finish the dissertation. It has been my great fortune to work with you, and your patience and motivation helped me to overcome the many challenges I encountered in the process of carrying out my research and writing this dissertation. I am really thankful for the time you spent on my research and dissertation and all the help you offered. Your advice on both research and my career has been priceless.

I would also like to thank my committee members, Professors Aleksandar Stankovic and Ali Abur. It is my honor to work with you under the CURENT program, which extended my research into power systems and gave me a good understanding of the application of networked estimation and control.

A special thanks to my family: words cannot express how grateful I am to my mother and father for all of the sacrifices that you have made on my behalf.

Lastly, I offer my regards and blessings to all my friends who supported me in so many ways during the completion of the dissertation.

Contents

Declaration of Authorship i

Abstract ii

Acknowledgements iii

Contents iv

List of Figures vi

List of Tables viii

1 Introduction 1
  1.1 Networked Estimation and Control Architecture ...... 3
  1.2 Continuous-Discrete State-Space Model and Kalman Filter ...... 4
  1.3 Irregular Sampling Patterns ...... 6
      1.3.1 Single-Sensor Time Pattern ...... 7
      1.3.2 Multi-Sensor Time Pattern ...... 8
  1.4 Communication Delays ...... 9
  1.5 Bad Data ...... 11
  1.6 Modeling Inaccuracies ...... 12
  1.7 Resiliency ...... 13

2 Networked State Estimation with Irregular Sampling Pattern 16
  2.1 Single-Sensor Estimation with Irregular Sampling Pattern ...... 16
  2.2 Multi-Sensor Estimation with Irregular Sampling Pattern ...... 17
      2.2.1 Lower bound on E P_-(t_n) ...... 22
      2.2.2 Upper bound on E P_-(t_n) ...... 24
      2.2.3 Renewal approximation for the MSTP ...... 28
      2.2.4 Effect of sensor differences ...... 28

3 Networked State Estimation with Communication Delay 31
  3.1 Delay in Control Path ...... 32
  3.2 Delay in Observation Path ...... 35
  3.3 Time-Stamp Accuracy ...... 38


4 Networked State Estimation in the Presence of Bad Data 42
  4.1 Model of Bad Data ...... 43
  4.2 Innovation-Based Detection and Suppression of Bad Data ...... 44
      4.2.1 Bayesian detection ...... 44
      4.2.2 Fisher-type detection ...... 50
  4.3 Simulation Results ...... 51

5 Robust Continuous-Discrete Kalman Filter 52
  5.1 Robust Continuous-Discrete Kalman Filter ...... 54
  5.2 Continuous-Discrete Robust Kalman Filter with Known Control Inputs ...... 61
  5.3 Stability of the Continuous-Discrete Robust Kalman Filter ...... 66
  5.4 Robust Delay Mitigating Control ...... 68
  5.5 Robust Delay Mitigation in Observation Path ...... 72

6 Resilience Margins of Networked State Estimation 74
  6.1 Packet Drops and Sensor Placement ...... 76
  6.2 Communication Delay and Time-Stamp Inaccuracy ...... 78
  6.3 Corrupted Sensor Measurements ...... 80

A Continuous-Discrete Kalman Filter 84

B Statistics of Sampling Patterns 87

C Averaged Time Update 88

D Diagonal Scaling Lemma 90

E Lemmas 91

F Fundamentals of Binary Hypothesis Testing [48] 94

G Examples 96

H Islanded Power System Example 98

I Kalman Filter with Parameter Uncertainty in Unstable System 101

Bibliography 104

List of Figures

1.1 Networked Estimation/Control Configuration ...... 4
1.2 Effect of bad data on the performance of a Kalman filter ...... 12

2.1 SEER vs Average Sampling Interval T_av in both sampling patterns with different ξ ...... 18
2.2 SEER vs Average Sampling Interval T_av with different kurtosis ...... 19
2.3 Estimation performance (trace(P_{+,av})) as a function of F_1 and F_2 ...... 20
2.4 Contour plot of Fig. 2.3 ...... 21
2.5 Effect of relative variance on estimation performance ...... 21
2.6 P_{-,av} vs Average Sampling Rate F_av in two-sensor Bernoulli-drop case: two unstable eigenvalues in A ...... 26
2.7 P_{-,av} vs Average Sampling Rate F_av in two-sensor Bernoulli-drop case: one unstable eigenvalue in A ...... 27
2.8 P_{-,av} vs. the average sampling rate F_av: two sensors with uniformly-distributed sampling intervals ...... 29
2.9 Contour plot of P_{-,av} vs packet receiving probability 1 − p, two-different-sensor Bernoulli-drop case ...... 30

3.1 Effect of delay mitigation on SEER, in the presence of observation noise ...... 34
3.2 Effect of delay mitigation on SEER, in the presence of system parameter inaccuracy ...... 34
3.3 P_{-,av}^{(d)} vs. the average sampling rate F_av: two sensors with uniformly-distributed sampling intervals ...... 39
3.4 Robust Kalman Filter with Delay Mitigating Control in Islanded Power System ...... 40
3.5 Effect of time-stamp inaccuracy on delay ...... 41

4.1 Effect of bad data on the performance of a Kalman filter ...... 43
4.2 Average error covariance as a function of the normalized innovation threshold ...... 50
4.3 Performance of a Kalman filter using a detector for bad data ...... 51

5.1 Effect of model parameter inaccuracy on standard Kalman filter performance corresponding to delay ...... 53
5.2 Robust continuous-discrete Kalman filter: stable system ...... 62
5.3 Robust continuous-discrete Kalman filter: "marginally-stable" system ...... 63
5.4 Robust continuous-discrete Kalman filter with constant control inputs: stable system ...... 65
5.5 Robust continuous-discrete Kalman filter with constant control inputs: "marginally-stable" system ...... 65
5.6 Robust continuous-discrete Kalman filter with unstable system ...... 67


5.7 Robust continuous-discrete Kalman filter with unstable system and feedback control ...... 68
5.8 Robust Kalman Filter with Delay Mitigating Control: Unstable System ...... 70
5.9 Robust Kalman Filter with Delay Mitigating Control: "Marginally-Stable" System ...... 71
5.10 Robust Kalman Filter with Delay Mitigating Control: Stable System ...... 71
5.11 Block Diagram of Islanded Power System ...... 72
5.12 Robust Kalman Filter with Delay Mitigating Control in Islanded Power System ...... 73

6.1 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av in regular sampling case: (a) both sensors are observable ...... 75
6.2 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av in two-sensor Bernoulli-drop case: (a) both sensors are observable ...... 77
6.3 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av in two-sensor Bernoulli-drop case: (b) sensor 2 is observable and sensor 1 is not ...... 78
6.4 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av with communication delay ...... 79
6.5 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av with communication delay and time-stamp inaccuracy ...... 80
6.6 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av with corrupted sensor measurements ...... 82
6.7 Contour plot of P_{-,av}^{(i)} vs average sampling rate F_av with corrupted sensor measurements ...... 83

H.1 Block diagram of the simulated power system ...... 98
H.2 Robust delay-mitigated dynamic state estimation in the presence of a 0.2 sec delay and 30% parameter inaccuracy ...... 100

I.1 Structure of Kalman Filter with Parameter Uncertainty ...... 102

List of Tables

4.1 Four decision scenarios and their costs ...... 45

5.1 Robust Continuous-Discrete Kalman Filtering (type 1) Algorithm in Measurement and Time Update Form ...... 59
5.2 Robust Continuous-Discrete Kalman Filtering (type 2) Algorithm in Measurement and Time Update Form ...... 60

6.1 Resilience Margin ...... 77

F.1 Four decision scenarios and their costs ...... 95

Chapter 1

Introduction

Progress in communication technology has opened up the possibility of large scale control systems in which the control task is distributed among several processors, sensors, estimators and controllers interconnected via communication channels. Such control systems may be distributed over long distances and may use a large number of actuators and sensors. Motivated by distributed Electric Energy Systems applications, we consider the problem of networked dynamic state estimation using Kalman filtering.

When data is sent via unreliable communication channels, the effect of communication delays and loss or corruption of information cannot be neglected. In addition, measurements can also be subjected to faults caused by sensor failure, bad connections and data conversion abnormalities. The effects of communication delay, sensor-dependent sampling times and irregular sensor timing patterns have been studied by several authors [1]-[8]. In particular, random packet drops give rise to a Bernoulli pattern, and may result in divergence (i.e., loss of stability) of the associated Kalman filter [9]-[15]. Similarly, sensor failure and/or bad sensor data impact estimation performance and can also lead to loss of stability [16]. Finally, networked estimation in large scale, geographically-dispersed systems must also contend with timing and model parameter inaccuracies ([17], [18]).

In this dissertation we examine the effects of irregular measurement timing patterns, communication delay, bad sensor data, time-stamp inaccuracies and system parameter uncertainties on the performance of a networked Kalman filter estimator. We develop techniques that can counter the effects of delay and bad sensor measurements, even in the presence of system model parameter uncertainty. In addition, we construct performance metrics that quantify the effect of such perturbations on a networked dynamic state estimator and provide guidelines for designing networked estimators that can operate reliably in the presence of sensing and communication perturbations and faults. Since we consider arbitrary observation timing irregularities, we choose to describe networked estimation systems in terms of continuous-time, discrete-measurement, state-space models. We then analyze the steady-state performance of the associated continuous-discrete Kalman filter [19] under the assumption that all transmitted signals are time-stamped, so that both the sensor sampling instant and the communication delay can be determined when an observation packet is received at the estimation/control center. Finally, we also examine the effects of limited time-stamp accuracy on the performance of the proposed dynamic state estimator, which incorporates both a delay mitigation scheme and a bad data detection/suppression scheme.

Our approach accommodates arbitrary timing irregularities, regardless of their cause. We model in Chapter 2 the sequence of time-variant inter-sample intervals for each individual sensor as a renewal process [20]. We also assume that the random sampling time sequences for distinct sensors are statistically independent of each other. This reflects our fundamental modeling assumption that sensors are geographically-dispersed and, in general, need not coordinate their sampling patterns.

In Chapter 3 we provide an explicit expression for the added effect of delayed time-stamped observations on the steady-state error covariance of our networked Kalman filter. We demonstrate that observation delay always degrades the performance of the (continuous-discrete) Kalman filter, and that the added estimation error depends on the moment generating function f_D^o(·) of the transmission delay. We propose a delay-mitigation method to cancel the delay in the control path: our delay-mitigation method transmutes delay into increased estimation error, but at the same time, avoids the destabilizing effect of delay in the control path. We also demonstrate that the effect of inaccurate time-stamping is essentially equivalent to an increase in average delay, resulting in a further increase in state estimation error.

In Chapter 4, we discuss the performance of a KF dynamic state estimator in the presence of bad data. We propose an innovations-based detector for identifying corrupted sensor measurements that can result in a significant performance improvement as compared with a standard Kalman filter. We provide in Chapter 4 guidelines for customization of the proposed detector, under both a Bayesian and "Fisherian" approach to statistical decision theory. The former is recommended when the statistical distribution of corrupted measurements is available; the latter is more appropriate when little is known about the likelihood of occurrence of bad measurements and their statistics.

The Kalman filter is the optimal linear mean square state estimator when the system state-space model is accurate. When this assumption is not satisfied, the Kalman filter is no longer optimal and its performance will degrade [21]-[23]. The degradation of performance directly depends on the level of parameter inaccuracy. Modifications of the discrete-time Kalman filter, designed to minimize its sensitivity to structured parameter inaccuracies, were developed in [24], [25]. We implement in Chapter 5 the robust state-space estimators of [24], [25] in a continuous-discrete setting, and analyze their performance in the presence of communication delay and model parameter inaccuracies.

In Chapter 6, we discuss the resiliency of networked state estimation. Similar to the single-sensor case, the average error covariance P_{-,av} of a continuous-discrete Kalman filter using measurements from multiple, non co-located, sensors is a function of the individual sampling rates of all sensors. We use the term Region of Acceptable Performance (RAP) to denote the set of all sensor sampling rate combinations (F_1, F_2, ..., F_L) that result in tr(P_{-,av}) ≤ h, where the threshold value h can be application-specific. We illustrate the RAP boundary for a two-sensor estimator in the presence of packet drops, communication delay, time-stamp inaccuracy and bad data. Our resilience diagrams quantify the level of perturbation that can be tolerated by an estimator with a given set of sensor sampling rates.

1.1 Networked Estimation and Control Architecture

Networked monitoring and control of large scale systems, such as the power grid, is usually implemented in a hierarchical fashion, with a central monitoring and control facility coordinating the operation of multiple estimation/control hubs [26]. Because of the considerable geographical separation between the hubs and the central facility, resulting in significant communication delays, dynamic state estimation is usually implemented at the hub level. The central facility is responsible, instead, for combining and refining the estimates provided by the various hubs, and translating those into control objectives (e.g. set points) for each hub. Additional (intermediate) levels may be needed in very large scale systems such as an integrated North American smart power grid [26].

Each estimation/control hub acquires measurements from multiple sensors, which are typically not co-located, and may be spread over a wide area. Consequently, the observation paths (i.e., the communication links between sensor and the estimation/control hub) will each have different statistical characteristics and, most likely, be independent of each other. Similarly, the dynamic state estimates that are generated by the hub are transmitted over a control path to one, or more, actuators.

The research strategy described in this dissertation focuses on the operation of a single estimation hub. The networked estimation and control configuration that we consider consists of a single estimation hub that collects observations from several sensors and provides dynamic state estimates to one, or more, actuators. A communication network is used to transmit observations from the sensors to the central estimation center, and to deliver state estimate information from this entity to individual actuators (Fig. 1.1). In order to be able to mitigate the effects of delay in the control path we augment each actuator with a control module, which uses dynamic prediction to compensate for the delay. For this reason we also consider that the estimation hub should transmit state estimates (rather than control signals) to the control module. This also makes it possible to divide the state estimation task between the estimation hub and the control module: the former determines the observation-dependent component of the state estimate, while the latter combines the received information with a locally-computed component that represents the contribution of the actuator signal to the state estimate. Consequently, there is no need to transmit information about the actuator control signal to the estimation hub.

The discussion in the remainder of this dissertation assumes that the celebrated separation principle holds for the configuration described in Fig. 1.1. This makes it possible to use a (continuous-discrete) Kalman filter in the estimation hub to determine the observation-dependent component of the state estimate. We present in Chapter 3 some results from the literature (e.g. [27]) that validate the separation principle assumption.

FIGURE 1.1: Networked Estimation/Control Configuration

1.2 Continuous-Discrete State-Space Model and Kalman Filter

We are mainly interested in state estimation in geographically-distributed Cyber-Physical Systems [28], such as the electric power grid, with their characteristic two-layer structure:

• a physical system that 'lives' in the real world, in continuous time, and

• an estimation and control system that 'lives' in cyberspace, namely is implemented in a computer or digital processor, in discrete time.

Sensors and actuators act as interfaces between the two layers. The need to transmit sensor measurements over a digital communication network dictates the use of discrete-time measurements even when the underlying system evolves in continuous-time.

We consider a continuous-time discrete-measurement dynamic linear system model with control inputs [19], viz.,

\dot{x}(t) = A\,x(t) + B\,u(t) + w(t)    (1.1a)

y(t_n) = C(t_n)\,x(t_n) + \eta_n    (1.1b)

where x(t), w(t) and η_n are the state, process noise and measurement noise, respectively. The processes w(t) and η_n are i.i.d. with E{w(t)w^*(s)} = Q δ(t − s), E{η_n η_m^*} = R_n δ_{n,m}, and E{w(t)η_n^*} = 0 for all t and all n. The process u(t) is a known (control) input. The sequence of measurements y(t_n) is the union of observations from all sensors, ordered according to the observation times {t_n}. The set of sensors that are observed at the time instant t_n can be, in general, any one of the 2^L − 1 combinations of one or more sensors from a total of L available sensors. Consequently, C(t_n) = C_k for some 1 ≤ k ≤ 2^L, and similarly for R(t_n). The sequence of measurements received at the estimation hub, say y_d(·), is a delayed version of the observations acquired by the multiple geographically-distributed sensors, namely y_d(t_n + D_n^o) = y(t_n). Here D_n^o denotes the communication delay experienced by the observation acquired at the time instant t_n.

We assume that all transmitted signals are time-stamped, which means the information packets sent from each sensor to the estimation hub consist of triplets: (a) a label identifying the sensor, (b) a time-stamp specifying when the observation was acquired, and (c) the observation itself. Alternatively, a packet may represent a cluster of co-located sensors (as explained above), in which case the time stamp identifies the entire cluster rather than the individual sensor and the "observation" component is in fact a collection (i.e. vector) of observations from all the sensors in the cluster.
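As a concrete illustration of this packet structure, a minimal sketch is given below; the field names and container type are our own choices, not a format prescribed in this dissertation.

from dataclasses import dataclass
import numpy as np

@dataclass
class ObservationPacket:
    sensor_id: int           # (a) label of the sensor, or of a cluster of co-located sensors
    time_stamp: float        # (b) acquisition instant t_n, stamped at the sensor
    observation: np.ndarray  # (c) measurement y(t_n); a vector when the packet covers a cluster

# On reception at time t_rx, the observation-path delay of the packet is recovered as
#   D_n = t_rx - packet.time_stamp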

The continuous-discrete Kalman filter (see App. A) alternates between a measurement-update step (in discrete time) and a time-update process (in continuous time) [19], viz.,

Measurement update (at time t_n):

\hat{x}_+(t_n) = \hat{x}_-(t_n) + P_-(t_n)\,C^*(t_n)\,\big( C(t_n) P_-(t_n) C^*(t_n) + R(t_n) \big)^{-1} \big( y(t_n) - C(t_n)\,\hat{x}_-(t_n) \big)    (1.2a)

P_+(t_n) = P_-(t_n) - P_-(t_n)\,C^*(t_n)\,\big( C(t_n) P_-(t_n) C^*(t_n) + R(t_n) \big)^{-1} C(t_n)\,P_-(t_n)    (1.2b)

Time update (in the interval t_n ≤ t < t_{n+1}):

\hat{x}(t) = e^{A(t - t_n)}\,\hat{x}_+(t_n) + B\,u(t)    (1.3a)

P(t) = e^{A(t - t_n)}\,P_+(t_n)\,e^{A^*(t - t_n)} + \int_0^{t - t_n} e^{A\tau} Q\, e^{A^*\tau}\, d\tau    (1.3b)

The continuous-discrete Kalman filter, as described in [19], addresses the general problem of estimating the states of a continuous-time system from discrete-time measurements acquired at known arbitrarily-spaced observation times {t_n}. Notice that this filter can be implemented without advance knowledge of {t_n}: time-stamping provides the necessary timing information when it is needed.
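A minimal numerical sketch of the recursion (1.2)-(1.3) is given below, assuming u(t) ≡ 0; the routine names and system matrices are illustrative, and the covariance integral in (1.3b) is evaluated with the standard Van Loan matrix-exponential construction.

import numpy as np
from scipy.linalg import expm

def time_update(x_plus, P_plus, A, Q, dt):
    """Propagate estimate and covariance over an inter-sample interval dt (eq. (1.3), u = 0)."""
    n = A.shape[0]
    # Van Loan construction: one matrix exponential yields both e^{A dt} and
    # the integral of e^{A tau} Q e^{A* tau} over [0, dt].
    E = expm(np.block([[A, Q], [np.zeros((n, n)), -A.conj().T]]) * dt)
    Phi = E[:n, :n]                      # e^{A dt}
    Qd = E[:n, n:] @ Phi.conj().T        # integral_0^dt e^{A tau} Q e^{A* tau} d tau
    return Phi @ x_plus, Phi @ P_plus @ Phi.conj().T + Qd

def measurement_update(x_minus, P_minus, y, C, R):
    """Incorporate one time-stamped measurement y(t_n) = C x(t_n) + noise (eq. (1.2))."""
    S = C @ P_minus @ C.conj().T + R     # innovation covariance
    K = P_minus @ C.conj().T @ np.linalg.inv(S)
    x_plus = x_minus + K @ (y - C @ x_minus)
    P_plus = P_minus - K @ C @ P_minus
    return x_plus, P_plus

# Illustrative 2-state system observed by a single scalar sensor at irregular times
A = np.array([[0.0, 1.0], [-1.0, -0.4]]); Q = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]]);               R = np.array([[0.05]])
x, P = np.zeros(2), np.eye(2)
for dt, y in [(0.10, np.array([0.21])), (0.27, np.array([0.05]))]:  # (interval, observation)
    x, P = time_update(x, P, A, Q, dt)        # runs up to the next time-stamp
    x, P = measurement_update(x, P, y, C, R)  # applied when the packet arrives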

Once the initialization transient has died off, the error covariance P(t) decreases with every measurement update, and increases in between measurement updates. When the sampling pattern is regular, the post-update error covariance P_+(t_n) converges to a steady-state limit, and similarly for the pre-update error covariance P_-(t_n). However, when the sampling pattern is irregular and random, the two sequences {P_+(t_n); n = 1, 2, ...} and {P_-(t_n); n = 1, 2, ...} may oscillate and never converge. This motivates us to introduce the average error covariance P_{±,av} ≜ lim_{n→∞} E P_±(t_n) as a measure of steady-state performance, where E denotes expectation with respect to the probability density function of the inter-sample intervals T_s(n) ≜ t_n − t_{n−1}. Our objective is to obtain a simple characterization of P_{±,av} in terms of the system parameters and the statistics of the sampling pattern.

1.3 Irregular Sampling Patterns

The primary causes for irregularity of the sampling pattern are: (a) occasional loss of a measurement in the communication network (= packet drop), (b) timing inaccuracy (i.e., jitter/drift) of the clock used to determine sampling instants, and (c) lack of sampling synchrony between the individual sensors. A regular sensor sampling pattern is completely characterized in terms of its sampling interval T_s while an irregular one has a time-varying sampling interval T_s(n). We model the sequence of time-variant inter-sample intervals for each individual sensor as an i.i.d. random process, characterized in terms of its probability density function or, equivalently, its moment generating (a.k.a. characteristic) function. This means that each individual sensor inter-sample pattern is a renewal process [20]. In the multi-sensor case, the sequence {t_n} is a superposition [20] of the individual sensor sampling patterns, i.e., an ordered union of these individual renewal processes. We refer to this combined sequence as the multi-sensor timing pattern (MSTP).

1.3.1 Single-Sensor Time Pattern

Here we list the most common irregular sampling patterns:

• Regular sampling with Bernoulli packet drop:
This is a "lattice-type" [20] sampling pattern, since all T_s(n) are multiples of a fundamental time interval Δ. The probability density function (pdf) is

Pr{T_s(n) = iΔ} = (1 − p)\,p^{i−1}, \quad i ≥ 1

where p is the packet drop probability and Δ is the fundamental sampling interval.

• Uniformly distributed random sampling:
The sampling interval is T_s(n) = T_{av}(1 + a\,x), where T_{av} is the average sampling interval, x is uniformly distributed in the interval [−1, 1] and a determines the variance of the fluctuation of T_s(n) (0 ≤ a < 1). When a = 0, T_s(n) = T_{av}, so that uniformly distributed random sampling includes regular sampling as a special case. Uniformly distributed sampling can be used to model clock jitter and similar causes of timing inaccuracy, such as GPS timing errors.

• Gamma distribution sampling:
Gamma distribution sampling is a convenient continuous "non-lattice-type" sampling pattern with enough degrees of freedom to model 1st, 2nd and higher (3rd or 4th) order moments. The probability density function (pdf) is

Pr{T_s(n) = x} = \frac{1}{\Theta^k\,\Gamma(k)}\, x^{k−1} e^{−x/\Theta}, \quad x > 0 \ \text{and} \ k, \Theta > 0

The pdf of the gamma distribution is parameterized in terms of a shape parameter k and a scale parameter Θ. Both k and Θ have positive values.

The statistics of a given sensor time pattern {T_s(n)} can be specified in terms of its pdf, its moment generating function φ_{T_s}(s) = E{e^{s T_s(n)}}, or in terms of its moments. The most useful moment-based metrics of a given sensor time pattern are:

• The mean sampling interval T_{av} ≜ E T_s(n), and the average sampling rate F_{av} ≜ 1/T_{av}

• The variance index ξ = Var T_s(n) / T_{av}^2

• The kurtosis E[T_s(n) − T_{av}]^4 / (Var T_s(n))^2

Explicit expressions for these metrics are provided in App. B.
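These metrics are easy to check by simulation; the sketch below draws inter-sample intervals from the three patterns listed above (all parameter values are illustrative) and reports T_av, F_av, the variance index ξ and the kurtosis.

import numpy as np
rng = np.random.default_rng(0)

def bernoulli_drop_intervals(n, delta, p):
    """Regular sampling with Bernoulli packet drop: Pr{T_s = i*delta} = (1-p) p^(i-1)."""
    return delta * rng.geometric(1.0 - p, size=n)

def uniform_intervals(n, t_av, a):
    """T_s = T_av (1 + a*x), with x ~ Uniform[-1, 1] and 0 <= a < 1."""
    return t_av * (1.0 + a * rng.uniform(-1.0, 1.0, size=n))

def gamma_intervals(n, k, theta):
    """Gamma(k, theta)-distributed inter-sample intervals."""
    return rng.gamma(k, theta, size=n)

def pattern_metrics(ts):
    t_av, var = ts.mean(), ts.var()
    xi = var / t_av**2                          # variance index
    kurt = ((ts - t_av) ** 4).mean() / var**2   # kurtosis
    return t_av, 1.0 / t_av, xi, kurt           # T_av, F_av, xi, kurtosis

for name, ts in [("Bernoulli drop", bernoulli_drop_intervals(100_000, 0.05, 0.3)),
                 ("uniform",        uniform_intervals(100_000, 0.05, 0.5)),
                 ("gamma",          gamma_intervals(100_000, 4.0, 0.0125))]:
    print(name, pattern_metrics(ts))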

1.3.2 Multi-Sensor Time Pattern

When an estimation hub receives observation packets from multiple sensors, the corresponding superposed time pattern is obtained by extracting the time-stamp t_n and the sensor identification label, which we denote by s_n, from each received packet. We shall refer to the pair-sequence {(t_n, s_n); 1 ≤ n < ∞}, assembled in order of increasing t_n values, as the multi-sensor time pattern (MSTP) of the observations received at the estimation hub. Notice that the MSTP is, in fact, a marked point process [20], with the sensor labels {s_n} playing the role of "marks".

The sub-MSTP associated with the k-th sensor cluster (where 1 ≤ k ≤ 2^L − 1) is {t_i^{(k)}} ≜ {t_n; s_n = k, 1 ≤ n < ∞}. As explained earlier, we model the single cluster time pattern t_i^{(k)} − t_{i−1}^{(k)} as an i.i.d. random process, characterized in terms of its probability density function or, equivalently, its moment generating function. This means that each individual cluster inter-sample pattern is a renewal process. We also assume that these individual cluster sampling patterns are mutually independent, since they represent geographically-distributed sensors which communicate with the estimation hub via statistically independent channels. Thus, the combined MSTP is a superposition of independent renewal processes [20]. In general, the combined MSTP is no longer a renewal process, so that the corresponding inter-observation intervals T_s(n) ≜ t_n − t_{n−1} need not be i.i.d., and the marks {s_n} could be correlated with each other, as well as with the point process {t_n}. We say that an MSTP has the i.i.d. property when the inter-observation interval sequence {T_s(n)} and the mark sequence {s_n} are both i.i.d., and are also independent of each other. In particular, this property is exhibited by two important special MSTPs: (a) synchronized multi-sensor Bernoulli packet drop, and (b) multi-sensor Poisson.

To be specific, we now need to distinguish between discrete-time and continuous-time MSTPs. A discrete-time MSTP is a superposition of discrete-time renewal processes, namely the observation time instants t_i^{(k)} must all be multiples of a fundamental time interval Δ. This occurs, for instance, when all sensors use the same uniform sampling time pattern, as proposed for phasor measurement units (PMU) in electric power systems [29]. Because the time instants {t_n} all lie on a grid (with spacing Δ), several distinct (non co-located) sensors may be observed at the same time instant, namely, several distinct sensor labels s_{n_1}, s_{n_2}, ..., s_{n_p} may be associated with a single t_n. This is known, in the terminology of point processes, as a marked point process with multiple occurrences. The best known example of a discrete-time MSTP is the synchronized multi-sensor Bernoulli packet drop process: all sensors are sampled at the time instants t = iΔ (i = 0, 1, 2, 3, ...), but only a subset of all measurements arrives at the estimation hub, due to information packet drops. We shall refer to this case as a synchronized Bernoulli MSTP.

In contrast, a continuous-time MSTP involves no constraints on the time instants {t_n}: it is a superposition of non-lattice continuous-time renewal processes. This occurs, for instance, when each sensor uses its own timing pattern, with no synchronization between individual sensors. In this case, the probability of multiple occurrences (i.e., several information packets being acquired at the same time instant) is zero, so that the MSTP is a marked point process with no multiple occurrences. The best known example of a continuous-time MSTP is the multi-sensor Poisson process: it is obtained by a superposition of several single-sensor Poisson sampling patterns. We shall refer to this case as a Poisson MSTP.

We defined earlier the average observation rate for a single sensor as the inverse of the average inter-observation time T_{av}^{(k)} ≜ E[t_i^{(k)} − t_{i−1}^{(k)}], so for a multi-sensor configuration

F_{av} = \sum_{k=1}^{L} F_{av}^{(k)}, \qquad F_{av}^{(k)} ≜ \frac{1}{T_{av}^{(k)}}    (1.4)

This definition applies to both continuous-time and discrete-time MSTPs. In order to allow a unified analysis of both types of MSTPs, we now redefine the mark s_n as the collection of all sensor labels associated with a single time instant t_n. For a continuous-time MSTP this is simply the sensor label of the single information packet that was acquired at the time instant t_n, i.e., s_n ∈ {1, 2, 3, ..., L}, where L is the total number of distinct sensors. However, a discrete-time MSTP uses marks that identify subsets of {1, 2, 3, ..., L}, so that in this case s_n ∈ {1, 2, 3, ..., 2^L − 1}. Consequently C(t_n) = C_i for some 1 ≤ i ≤ 2^L − 1 and similarly for R_n.
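The superposition described above can be sketched as follows; the per-sensor interval distributions are placeholders chosen only to produce two mutually independent renewal processes.

import numpy as np
rng = np.random.default_rng(1)

def sensor_renewal_times(draw_interval, t_end):
    """Generate one sensor's sampling instants from i.i.d. inter-sample intervals."""
    times, t = [], 0.0
    while True:
        t += draw_interval()
        if t > t_end:
            return np.array(times)
        times.append(t)

def superpose(sensor_times):
    """Merge per-sensor timing patterns into the MSTP: pairs (t_n, s_n) ordered in time."""
    stamps = [(t, k) for k, times in enumerate(sensor_times, start=1) for t in times]
    stamps.sort()                              # order by observation time t_n
    return stamps                              # marked point process, marks = sensor labels

# Two illustrative, mutually independent sensors (Poisson and gamma renewal processes)
s1 = sensor_renewal_times(lambda: rng.exponential(0.05), t_end=2.0)
s2 = sensor_renewal_times(lambda: rng.gamma(4.0, 0.02), t_end=2.0)
mstp = superpose([s1, s2])
intervals = np.diff([t for t, _ in mstp])      # combined inter-observation intervals T_s(n)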

1.4 Communication Delays

Communication delay is one of the unavoidable side-effects of using a computer network to interconnect sensors, actuators and control centers/nodes. The time required to read a sensor measurement and to send a control signal to an actuator through the network depends on network characteristics such as topology, routing scheme and network conditions. Therefore, the overall performance of a network-based control system can be significantly affected by network delays. The severity of the delay problem is aggravated when data loss occurs during a transmission. Moreover, delays not only degrade the performance of a network-based control system, but can also result in instability.

When the delay is unknown, one needs to reformulate the optimum state-estimation problem to include the presence of delay [30]: the standard Kalman filter may not be the optimal mean square error (MSE) estimator in this case. In contrast, using time-stamped information packets facilitates accurate determination of the length of the delay, so that the standard Kalman filter can still be used to obtain the optimal MSE estimator [27], [31], [32], [33] (see Chapter 3 for further details).

In this dissertation, we analyze the effect of delay on time-stamped observations, with a focus on electric power systems as a motivating application. Electric power systems are among the most spatially extended engineered systems. Their stable operation in the presence of disturbances and outages is made possible by real-time control over communication networks. Since we assume all transmitted signals are time-stamped, the estimation hub or the control module can also determine the length of the transmission delay D_n for each information packet by comparing the time of packet arrival (t_n + D_n) with the time-stamp t_n of the packet.

For the purpose of corrective action (a.k.a. "delay mitigation"), one can consider the length of the packet transmission delay as a known time-varying process. However, in order to be able to evaluate the long-term, steady-state, impact of delay on the average error covariance, we shall model the delay process for a single sensor as an i.i.d. sequence, and assume that the delay processes for non-co-located sensors are statistically independent of each other. We provide additional detail about the statistics of the delay process in Chapter 3.

The presence of delay in the control path (Fig. 1.1) may result in loss of stability of the overall system. Since we are sending the state estimate instead of a control signal (recall Sec. 1.1), we assume that each actuator is equipped with an intelligent control module. Thus one objective of our control module is to mitigate the effects of delay in the control path, so as to avoid loss of stability of the closed-loop system. The method that we propose allows us to overcome very large delay (at least 10-20 times higher than the smallest destabilizing delay in the original control loop) at the cost of increased state estimation error. Our approach to delay mitigation, as outlined in [34] consists of three key ideas:

• The estimation center transmits sampled state estimates to a control module attached to the actuator (see Fig. 1.1).

• Each transmitted information packet is time-stamped. This enables the control module to determine the delay experienced by each information packet.

• The control module uses a state-prediction step to transform the delayed (and thus outdated) state estimate it receives into an up-to-date estimate that is then used to generate the desired control signal.

The use of delay mitigation allows us to maintain stability of the closed-loop system even if the original open-loop system is unstable. Consequently, the delay-mitigated state estimation error remains bounded even in the presence of model parameter inaccuracies (see Sec. 5.4).
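A minimal sketch of the control module's state-prediction step is shown below. It assumes the control input was held constant over the delay interval and that A is invertible; the feedback-gain line in the usage comment is an illustrative choice, not the design prescribed later in this dissertation.

import numpy as np
from scipy.linalg import expm

def mitigate_delay(x_hat_stamped, delay, A, B, u_held):
    """Predict a delayed, time-stamped state estimate forward by `delay` seconds.

    Simplifying assumptions for this sketch: the control input was held constant
    at u_held over the delay interval (zero-order hold), and A is invertible.
    """
    Phi = expm(A * delay)                                   # e^{A D}
    n = A.shape[0]
    Bd = np.linalg.solve(A, Phi - np.eye(n)) @ B            # (integral_0^D e^{A s} ds) B
    return Phi @ x_hat_stamped + Bd @ u_held

# At the control module: a packet carrying (t_stamp, x_hat) arrives at time t_rx
#   delay = t_rx - t_stamp
#   x_now = mitigate_delay(x_hat, delay, A, B, u_prev)
#   u_new = -K_fb @ x_now    # K_fb: the module's feedback gain (illustrative)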

1.5 Bad Data

"Bad data" is a collective name for sensor readings that get corrupted, before being received by a Kalman filter (KF) dynamic state estimator, as a result of sensor/communication failures, poor instrument calibration, faulty or reverse wiring, as well as malicious data injections. Our objective is to detect occurrences of bad data and suppress their impact on the quality of dynamic state estimates. This would be applicable, for instance, to resilient estimation of generator state information in cyber-physical electric power systems. In addition, we obtain a statistical characterization of the detection and amelioration process. This allows us to quantify the corresponding resilience margin as a function of frequency and severity of bad measurements (see Sec. 6.3).

The performance of a KF dynamic state estimator can be significantly affected by the presence of bad data. Notice that successful detection and removal of all bad measurements can significantly reduce the average state-estimation error, as compared with the "untreated" case (Fig. 1.2). Motivated by this observation, we propose an innovations-based detector for identifying corrupted sensor measurements. The incorporation of such a detector into a standard Kalman filter can result, under typical operation conditions, in a significant performance improvement. We provide in Chapter 4 guidelines for customization of the proposed detector, under both a Bayesian and "Fisherian" approach to statistical decision theory. The former is recommended when the statistical distribution of corrupted measurements is available; the latter is more appropriate when little is known about the likelihood of occurrence of bad measurements and their statistics.

The most common model assumed for the occurrence of bad data is

y(t_n) = \begin{cases} C(t_n)\,x(t_n) + \eta_n & H_0 \\ b_n & H_1 \end{cases}    (1.5)

where H_1 indicates bad data, while H_0 indicates normal (i.e., un-corrupted) sensor observations. The range of typical b_n values in power system applications is assumed to exceed, by several standard deviations, the range of normal (un-corrupted) sensor measurements [26, 35]. The detector that we propose in Chapter 4 relies on this property to classify each received measurement y(t_n) as either "good" (i.e., H_0) or "bad" (i.e., H_1). Every time the detector rejects the hypothesis H_0, the measurement update step is skipped (= the measurement is dropped). Notice that the KF can be implemented to carry out a (scalar) measurement update for each individual sensor measurement, even when several measurements are received at the same time.

Consequently y(tn) in (4.1) is always a scalar, and the detector needs only to classify scalar (i.e., single-sensor) observations. Chapter 1. Introduction 12
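A minimal sketch of such a scalar, innovation-based measurement update is given below; the fixed threshold gamma is an illustrative stand-in for the Bayesian or Fisher-type thresholds derived in Chapter 4.

import numpy as np

def scalar_update_with_bad_data_test(x_minus, P_minus, y, c_row, r, gamma=3.0):
    """Scalar measurement update that skips measurements flagged as bad data.

    c_row: 1-by-n row of C for this sensor; r: scalar measurement-noise variance.
    The measurement is rejected (H1) when the innovation exceeds gamma standard
    deviations of its predicted spread, in which case the update step is skipped.
    """
    c = np.atleast_2d(c_row)
    s = float(c @ P_minus @ c.T + r)          # innovation variance
    nu = float(y - c @ x_minus)               # innovation
    if abs(nu) > gamma * np.sqrt(s):          # reject H0 -> drop the measurement
        return x_minus, P_minus, False
    k = (P_minus @ c.T / s).ravel()           # Kalman gain (as a 1-D vector)
    x_plus = x_minus + k * nu
    P_plus = P_minus - np.outer(k, c @ P_minus)
    return x_plus, P_plus, True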

[Plot: average error covariance vs. probability of bad data (bad-data shift = 3), comparing a perfect bad-data detector with no detector.]

FIGURE 1.2: Effect of bad data on the performance of a Kalman filter.

1.6 Modeling Inaccuracies

A dynamic state estimator, such as the Kalman filter or the extended Kalman filter relies on accurate knowledge of the various system parameters (e.g. the matrices A, B, Q, C and R in (1.1)). Using inaccurate parameter values to set up the estimator may result in a significantly increased level of estimation error. The presence of communication delay will cause a further degradation in the estimator’s performance [36], and the same is true for occasional occurrence of bad data.

The challenge of obtaining and maintaining accurate, up-to-date system parameter values is particularly evident in large, geographically-distributed systems such as the electric power grid. Components such as generators, transformers, transmission and distribution lines are in constant operation for prolonged periods of time, and thus cannot be easily tested for changes of their parameters (which make up the matrices A, B, Q of (1.1)). In addition, changes in loads have a direct impact on the "known input" signal u(t) of (1.1). For instance, the so-called swing equation, which describes the dynamics of electric machines, such as generators and motors, viz.,

a\,\dot{\omega}(t) + D\,\omega(t) = s\,[P_m(t) - P_e(t)]

uses the difference between the electric power P_e(t) and the mechanical power P_m(t) as a driving signal for the fluctuation of the machine's angular velocity ω(t), where s = 1 for generators and s = −1 for motors [37]. Changes in electrical loads of generators (resp. mechanical loads of motors) result in changes in P_e(t) (resp. P_m(t)), which must be known accurately to the dynamic state estimator. Finally, the parameter matrices C and R depend on the sensors used to acquire measurements: their operating characteristics may change in time as a result of aging and variation in ambient conditions.
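For concreteness, a single-machine swing equation can be cast in the form (1.1) as sketched below; all numerical values are illustrative placeholders, not parameters used in this dissertation.

import numpy as np

# Swing-equation state-space model for one generator (s = +1):
#   a * d(omega)/dt + D * omega = P_m - P_e,   d(delta)/dt = omega
# cast as x_dot = A x + B u + w with state x = [delta, omega] and input u = P_m - P_e.
a_inertia, D_damping = 10.0, 0.8            # illustrative machine constants

A = np.array([[0.0, 1.0],
              [0.0, -D_damping / a_inertia]])
B = np.array([[0.0],
              [1.0 / a_inertia]])
Q = np.diag([0.0, 1e-4])                    # illustrative process-noise intensity
C = np.array([[0.0, 1.0]])                  # e.g. a PMU-type sensor observing omega
R = np.array([[1e-3]])                      # illustrative measurement-noise variance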

The abundance of sources of parameter inaccuracy in large systems underscores the need for robust dynamic state estimation that can maintain an acceptable level of estimation error in the presence of significant system parameter inaccuracies. In this dissertation we focus on robust dynamic state estimation in the presence of errors in the matrix A of (1.1). Our approach is motivated by the observation that such changes often have a global impact, causing system-wide propagation of errors, which may lead to increased component stress and eventually component failure. In contrast, changes in a single sensor, or even a small number of sensors, usually have a minimal effect on the quality of state estimation in a well-designed system. This is so because in steady-state operation the measurement update step combines and averages contributions from many sensors. Thus, if the deployment of sensors ensures a "resilience margin", maintaining observability when a few sensors deteriorate, or even stop working all together, then the average impact on estimation performance can be kept under control. In this dissertation we follow the framework of [24] to set up a model of uncertainty in the dynamic linear system model with parameter inaccuracy in the system matrix A, viz. (recall (1.1)),

\dot{x}(t) = (A + \delta A)\,x(t) + B\,u(t) + w(t)    (1.6)

The parameter error δA is assumed to be completely unknown, except for satisfying the norm magnitude constraint ‖δA‖ ≤ ε, where the value of ε is known in advance to the dynamic state estimator. We assume in the sequel that the matrix B is known with sufficient accuracy, and that we are able to determine the known input u(t) (by measuring the electrical power P_e(t) directly, or calculating it from voltage and current measurements), also with sufficient accuracy.
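For simulation studies, a norm-bounded perturbation consistent with this uncertainty model can be generated as in the sketch below; the nominal matrix and the 30% relative level are illustrative choices of ours.

import numpy as np
rng = np.random.default_rng(2)

def bounded_perturbation(n, eps):
    """Random delta_A with spectral norm exactly eps."""
    G = rng.standard_normal((n, n))
    return eps * G / np.linalg.norm(G, 2)              # scale so that ||delta_A||_2 = eps

A_nominal = np.array([[0.0, 1.0], [0.0, -0.08]])       # model used by the estimator (illustrative)
eps = 0.3 * np.linalg.norm(A_nominal, 2)               # e.g. 30% relative inaccuracy
A_true = A_nominal + bounded_perturbation(2, eps)      # plant actually generating the data
# The robust filters of Chapter 5 know only (A_nominal, eps); A_true is never revealed to them.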

1.7 Resiliency

We have demonstrated in previous work (see [38]) that P_{-,av} obtained using a single sensor depends primarily on the (average) rate F = 1/E T_s, with negligible dependence on higher order moments of the inter-observation interval. Similarly, the average error covariance P_{-,av} of a continuous-discrete Kalman filter using measurements from multiple, non co-located, sensors is a function of the individual sampling rates of all sensors. We shall use the term Region of Acceptable Performance (RAP) to denote the set of all sensor sampling rate combinations (F_1, F_2, ..., F_L) that result in tr(P_{-,av}) ≤ h, where the threshold value h can be application-specific.

Once the boundary of the RAP has been determined (or approximated) we can define the Resilience Margin of a given dynamic state estimator in terms of the level of perturbation (such as packet drops, communication delay, time-stamp inaccuracy, bad data) that can be accommodated without leaving the RAP. For instance, Figs. 6.2 and 6.3 illustrate the RAP boundary for a two-sensor estimator in the presence of packet drops. In this example we chose a threshold value h such that the RAP boundary corresponds to a relative state estimation error of 5%. Since packet drops result in a reduction of the equivalent sensor sampling rate, the RAP boundary shifts away from the origin as the probability p of packet drop occurrence increases. In particular, we notice that a sensor sampling scheme with, say, F_1 = F_2 = 15 samples/sec can tolerate packet drop rates up to p = 0.45, which we propose as a metric of resilience for this type of perturbation. This motivates the name Resilience Diagram for plots such as Figs. 6.2 and 6.3.

In the "2-observable" case (Fig. 6.2), the loss of any single sensor still leaves the system observable. Notice that the RAP boundary in this example is almost straight (for all p values), confirming our earlier finding that the level of estimation error depends primarily on the combined sampling rate F_combined = Σ_i F_i. In contrast, in the "1-observable" case (Fig. 6.3), the RAP boundary is no longer straight, showing a strong nonlinear dependence on F_1 when F_2 approaches zero. This is so because the loss of sensor 2 renders the system unobservable, leading to divergence of the Kalman filter. In general we call a system r-observable if there exists a set of r sensors (but not a smaller one) whose loss will result in a loss of observability. One can also define "r-detectability" in a similar way.

The shape of the RAP boundary in Fig. 6.3 implies that efficient networked estimation systems must be r-observable with r ≥ 2. Thus, sensor placement should aim to increase this degree of observability, perhaps to r = 3 or r = 4, in order to achieve resilience with respect to multiple sensor failures. Since in this case RAP boundaries are almost straight, a drop in sampling rate for one sensor (due, for instance, to packet drops) in an r-observable system for r ≥ 2 can be compensated by increasing the rate of other sensors.

In general, a resilient design of a networked estimation system should be able to accommodate single (and possibly even multiple) faults/perturbations while maintaining an acceptable level of performance. Our resilience diagrams quantify the level of perturbation that can be tolerated by an estimator with a given set of sensor sampling rates [39]. For instance, Fig. 6.5 illustrates the resilience margin with respect to communication delay and time-stamp inaccuracy, while Fig. 6.7 illustrates the resilience margin with respect to occurrences of bad data. Thus, for the example shown in Fig. 6.7 and 5% time-stamp accuracy, the operating point of a dynamic state estimator with sensor sampling rates equal, say, to F_1 = 20 samples/sec = F_2, is well within the RAP for delays up to 0.2 sec. In fact, we can associate with this operating point a resilience metric defined as the ratio between the combined sampling rate (i.e., F_1 + F_2) and the RAP boundary rate (for 0.2 sec delay and 5% time-stamp accuracy), viz.,

\text{resilience margin} = \frac{40\ \text{samples/sec}}{30\ \text{samples/sec}} = 1.33

Chapter 2

Networked State Estimation with Irregular Sampling Pattern

Randomly varying sampling times have been recognized by many authors as one of the main challenges in networked estimation (see, e.g. [8], [30], [33]). Much of the previously published work on networked estimation has been dedicated to obtaining explicit expressions for the average estimation error covariance as a function of the statistics (pdf or moment generating function) of the irregular inter-sample interval. In this chapter, we aim to characterize this relation not only in the single-sensor case but also in the multi-sensor case.

2.1 Single-Sensor Estimation with Irregular Sampling Pattern

In the single-sensor case, most previously published work focused on the effects of random (Bernoulli) packet drops [9]-[15]. Only a few authors ([1], [2], [8] and [40]) considered a more general case where the sampling intervals are random with an arbitrary pdf. In [1], a continuous-time stochastic dynamic system with measurement arrivals modeled as a Poisson process was considered, and the authors gave a lower bound on the sampling rate that makes it possible to keep the estimation error variance below a given threshold with a specified probability. In effect, the results of [1] provide information about the probability distribution of the estimation error, albeit only for a single-state system. In [2], the authors considered a linear discrete-time system with partial observability and proposed a state estimator with exponential stability under natural assumptions. In [8], the Itô-Volterra model is introduced to solve the state estimation problem for a continuous, linear, stochastic, time-varying system. In this model, the observations can be an arbitrary combination of continuous and discrete measurements. In [9]-[15], random (Bernoulli) packet drops in a discrete-time system are considered, and several conclusions are drawn. First, upper and lower bounds on the average error covariance are given and, in particular, for an unstable system, detectability and stabilizability no longer guarantee convergence of the Kalman filter: P_{-,av} becomes infinite when the packet drop probability exceeds a critical value ([10], [12] and [14]). Also, in [13] the authors proved that, under a sufficient condition, the sequence of random prediction error covariance matrices converges weakly to a unique invariant distribution. In my master's thesis [40], I showed how to interpret packet drops as a special case of irregular sampling. We also studied the behavior of estimation performance for a stable system and demonstrated, empirically, that the Signal to Estimation Error Ratio (SEER) is a monotone decreasing, up-convex function of the average sampling interval E[T_s(n)] (Fig. 2.1). We also demonstrated empirically that (for a stable system) the dependence of SEER on higher order moments of T_s(n) is small (practically negligible) as compared with the dependence on E[T_s(n)] (Figs. 2.1, 2.2). In particular, increasing the fluctuation index ξ ≜ Var[T_s(n)]/E[T_s(n)]^2 slightly decreases the SEER, so the best performance of state estimation is always achieved with a regular sampling pattern. Another observation from the simulations, in which we used periodically-variant and uniformly distributed random sampling, is that the estimation performance is not sensitive to the choice of sampling pattern: when two patterns have the same average sampling interval T_av and the same variance index ξ, the corresponding SEER is practically the same.

2.2 Multi-Sensor Estimation with Irregular Sampling Pattern

Networked estimation with multiple, non-co-located, sensors involves additional factors that influence estimation performance, beyond the average sampling rate. These include:

• Some sensors may have a critical role. Setting their sampling rate to zero is equivalent to removing them, which can result in loss of observability/detectability and hence in divergence of the Kalman filter. The results of [9] demonstrate this phenomenon in terms of a boundary curve for the region of stable operation.

• We consider the multi-sensor timing pattern as the superposition of the single-sensor timing patterns. Even if each single-sensor timing pattern consists of i.i.d. intervals (i.e., is a renewal process), the same need not be true in the multi-sensor case: inter-sample intervals need not be mutually independent.

• The number of "control parameters" that affect the performance of the Kalman filter in the multi-sensor case is much higher: at a minimum, performance depends on the average sampling rate of each sensor (see e.g. Fig. 2.3) but possibly also on higher-order moments of the inter-sample interval. For instance, it has been shown in [40] that the relative alignment of the individual timing patterns is an important parameter, especially when the individual sampling rates are all equal. In fact, there is a direct relation between the variance of the inter-sample interval of an MSTP and the error covariance of the associated Kalman filter estimator (see e.g. Fig. 2.5).

[Plots: SEER (dB) vs. average sampling interval T_av for periodically-variant and uniformly random sampling with ξ ∈ {0, 0.02, 0.125, 0.32}; full range and zoomed view.]

FIGURE 2.1: SEER vs Average Sampling Interval T_av in both sampling patterns with different ξ

[Plots: SEER (dB) vs. average sampling interval T_av, comparing kurtosis parameter λ ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5} at ξ = 0.02 with the ξ = 0 and ξ = 0.125 cases; full range and zoomed view.]

FIGURE 2.2: SEER vs Average Sampling Interval T_av with different kurtosis

Nevertheless, the dominant factor impacting estimation performance is still the average sensor sampling rate, so we expect to see that the average error covariance is a monotone decreasing function of each average sampling rate F_{av,i}, and the best results are achieved when the superposed sequence of sampling times is as close to regular as possible. However, the differences between the single-sensor and multi-sensor cases discussed above will also have an impact on the estimation performance.

[Surface plot: tr(P_{+,av}) as a function of the sampling rates of sensor 1 (F_1) and sensor 2 (F_2).]

FIGURE 2.3: Estimation performance (trace(P+,av)) as a function of F1 and F2

The empirical results confirm these basic observations in a two-sensor case: the average error covariance is indeed a monotone decreasing function of the average sampling rates F_1 and F_2 (Fig. 2.3). The same conclusion is also evident from the corresponding contour plot (Fig. 2.4), in which the lines are almost straight, and P_{+,av} is approximately a monotone decreasing function of the linear combination F_1 + 1.2 F_2. This effect is a result of the difference between the two sensors: measurements from sensor 2 have a bigger impact on the value of P_{+,av} than measurements from sensor 1.

Notice that the error covariance P(t) in (1.3b) depends on the MSTP. Since we model {t_n} as a random sequence, P(t) now becomes a conditional covariance, viz.,

P(t) = E{ [x(t) - \hat{x}(t)] [x(t) - \hat{x}(t)]^* | t_{0:∞} }

FIGURE 2.4: Contour plot of Fig. 2.3 (tr(P_{+,av}) vs. the sampling rates of channels 1 and 2)

FIGURE 2.5: Effect of relative variance on estimation performance: trace(P_{+,av}) vs. relative shift δ

In the sequel we shall be concerned only with the dynamics of the ensemble-averaged error covariances E P_±(t_n), where the expectation is with respect to the joint distribution of the random sampling times {t_n}. Thus, we can restrict our discussion of the dynamics of P(t) to the measurement instants {t_n}, and replace (1.3b) by

P_-(t_n) = e^{A T_s(n)} P_+(t_{n-1}) e^{A^* T_s(n)} + ∫_0^{T_s(n)} e^{At} Q e^{A^*t} dt    (2.1)

Again, recall that (1.2b) and (2.1) describe the evolution of P_±(t_n) under the assumption of negligible observation delays.

2.2.1 Lower bound on E P_-(t_n)

From (2.1), we have

P_-(t_n) ≥ ∫_0^{T_s(n)} e^{At} Q e^{A^*t} dt

Using (C.8) from App. C, we find that E P_-(t_n) ≥ P_lower for all n, where

P_lower = T [ \bar{f}_{T_s} ⊙ (T^{-1} Q T^{-*}) ] T^*    (2.2a)

This result holds for every (stationary) sequence T_s(n) of multi-sensor inter-observation intervals, whether they are i.i.d. or not. In deriving this result we assume that the matrix A is diagonalizable, so that A = T Λ T^{-1}, where T is a nonsingular matrix and Λ = diag{λ_i} is a diagonal matrix constructed from the eigenvalues of A (see App. C). Here ⊙ denotes the Schur-Hadamard (i.e., element-wise) matrix product and

\bar{f}_{T_s} ≜ [ ( f_{T_s}(λ_l + λ_m^*) - 1 ) / (λ_l + λ_m^*) ]_{l,m=1}^{M}    (2.2b)

where f_{T_s}(s) = E{ e^{s T_s(n)} } is the moment generating function of the (stationary) multi-sensor inter-observation interval T_s(n).
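The lower bound (2.2) can be evaluated numerically once the eigen-decomposition of A and the moment generating function of the inter-observation interval are available. The following Python sketch is not part of the original development; it assumes NumPy, a hypothetical two-state system, and the Bernoulli-drop moment generating function derived later in this section, and only illustrates how (2.2a)-(2.2b) could be computed.

import numpy as np

def lower_bound_error_cov(A, Q, mgf):
    """Evaluate the lower bound (2.2a) on the averaged error covariance E P_-(t_n).

    A   : diagonalizable system matrix
    Q   : process-noise intensity matrix
    mgf : moment generating function of the inter-observation interval,
          a callable accepting complex arguments, mgf(s) = E{exp(s*Ts)}
    """
    lam, T = np.linalg.eig(A)                      # A = T diag(lam) T^{-1}
    Tinv = np.linalg.inv(T)
    S = lam[:, None] + np.conj(lam)[None, :]       # lambda_l + conj(lambda_m)
    S = np.where(np.abs(S) < 1e-12, 1e-12, S)      # guard against division by zero
    fbar = (np.array([[mgf(s) for s in row] for row in S]) - 1.0) / S   # eq. (2.2b)
    inner = fbar * (Tinv @ Q @ Tinv.conj().T)      # Schur-Hadamard product in (2.2a)
    return np.real(T @ inner @ T.conj().T)

# Hypothetical example: superposed Bernoulli-drop MSTP with spacing D and drop prob. p
A = np.array([[0.5, 1.0], [0.0, -0.2]])
Q = np.eye(2)
D, p = 0.1, 0.12                                   # p = p1*p2 for two sensors
mgf = lambda s: (1 - p) * np.exp(s * D) / (1 - p * np.exp(s * D))
print(lower_bound_error_cov(A, Q, mgf))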

One important conclusion from (2.2a) is that E P_-(t_n) converges only if

Re(λ_i) < s_max / 2    (2.3)

Here λ_i are the eigenvalues of A, and s_max defines the boundary of the ROC of f_{T_s}(s), which is the left-half-plane Re(s) < s_max. When the condition (2.3) is violated, some of the diagonal elements of \bar{f}_{T_s} become infinite, and the same statement holds for the lower bound matrix P_lower, so that (2.3) is a necessary (but not sufficient) condition for the convergence of E P_-(t_n). As an example consider the case of multiple sensors, all sampled at the same time instants nΔ, but with each set of sensor measurements transmitted independently of the other sensors, so that packet losses occur independently for each sensor. In particular, in the multi-sensor Bernoulli case with packet drop probability p_k for sensor k, we have

Pr{ T_s(n) = iΔ } = (1 - p) p^{i-1},   i ≥ 1

where (for L sensors) p = ∏_{k=1}^{L} p_k. The moment generating function of the superposed MSTP becomes

f_{T_s}(s) = (1 - p) e^{sΔ} / (1 - p e^{sΔ})

and its region of convergence is Re(s) < s_max, where s_max = -(1/Δ) ln p. The necessary stability condition (2.3) can be expressed in this case as

p = ∏_{k=1}^{L} p_k < e^{-2Δ max_i Re(λ_i)} ≜ p_c    (2.4)

This special result was also obtained in [12], but under the more restrictive assumption that all sensors are co-located and their observations are all transmitted in a single data packet. This clearly makes the transmission scheme more vulnerable to packet drops, as well as to cyber-security attacks. In particular, the use of multiple synchronized sensors does not change the value of p_c, the critical packet-drop probability, so that stability of the scheme discussed in [12] does not improve when more sensors are used. In contrast, when each sensor's data are transmitted independently, our result (2.4) shows that the critical value p_c decreases as more sensors are deployed: stability is improved because now

s_max = -(1/Δ) ln( ∏_{k=1}^{L} p_k ) = Σ_k s_max^{(k)}    (2.5)

where s_max^{(k)} = -(1/Δ) ln p_k is determined by the region of convergence of the moment generating function of sensor “k”, so the range of feasible Re(λ_i) values becomes wider. The result s_max = Σ_k s_max^{(k)} holds for a wide variety of MSTPs.
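As a quick numerical check of (2.3)-(2.5), the sketch below (a hypothetical example assuming NumPy; the system matrix and drop probabilities are illustrative and not taken from App. G) computes s_max, the critical value p_c, and whether the necessary stability condition holds, showing how independent transmission from a second sensor widens the feasible region.

import numpy as np

def bernoulli_stability_check(A, D, drop_probs):
    """Necessary-stability quantities for independent per-sensor Bernoulli drops.

    Returns (s_max, p_c, ok): the ROC boundary of the superposed MGF (eq. (2.5)),
    the critical drop probability of (2.4), and whether condition (2.3) holds.
    """
    p = np.prod(drop_probs)                        # superposed drop probability
    s_max = -np.log(p) / D                         # equals the sum of s_max^(k)
    max_re = max(np.real(np.linalg.eigvals(A)))
    p_c = np.exp(-2.0 * D * max_re)                # critical value in (2.4)
    return s_max, p_c, bool(max_re < s_max / 2.0)

A = np.array([[0.8, 0.0], [1.0, 0.3]])             # one "fast" unstable eigenvalue
print(bernoulli_stability_check(A, D=0.5, drop_probs=[0.6]))        # condition fails
print(bernoulli_stability_check(A, D=0.5, drop_probs=[0.6, 0.6]))   # condition holds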

In general, the stability condition (2.3) can be violated only when s_max is finite, which happens only if the range of possible T_s(n) is unbounded. For instance, in the case of a T_s(n) that is uniformly distributed in the range [Δ(1 - a), Δ(1 + a)], the moment generating function is

f_{T_s}(s) = e^{Δs} sinh(aΔs) / (aΔs)

for all s, so that s_max = ∞ and the condition (2.3) is always satisfied.

2.2.2 Upper bound on E P_-(t_n)

We start with the simplifying assumption that the MSTP has the i.i.d. property, which allows us to derive an upper bound for E P_-(t_n). Since T_s(n + 1) is independent of P_+(t_n) under the assumed i.i.d. property, we can apply (C.5) and (C.8) to (2.1), resulting in the relation

E P_-(t_{n+1}) = T { W_{T_s} ⊙ [ T^{-1} E P_+(t_n) T^{-*} ] } T^* + P_lower    (2.6a)

where

W_{T_s} = [ f_{T_s}(λ_l + λ_m^*) ]_{l,m=1}^{M}    (2.6b)

Next we turn our attention to the measurement update (1.2b), which we can rewrite as

P_+(t_n) = ( P_-^{-1}(t_n) + C^*(t_n) R^{-1}(t_n) C(t_n) )^{-1}    (2.7)

Notice that the relation (2.1) implies that P_-(t_n) > 0 if P_+(t_{n-1}) > 0, so that P_-^{-1}(t_n) is well defined in (2.7) and, consequently, P_+(t_n) > 0 as well. It follows by mathematical induction that

P_±(t_n) > 0 for all n, provided that P(t_0) > 0. Next we observe that C(t_n) and R(t_n) depend only on the label s_n, while P_-(t_n) depends on {T_s(1), ..., T_s(n)} and {s_1, ..., s_{n-1}}. Thus

E{ P_+(t_n) | T_s(1:n), s_{1:n-1} } = F( P_-(t_n) )

where the matrix function F(X) is

F(X) ≜ Σ_{k=1}^{K} Π_k ( X^{-1} + C_k^* R_k^{-1} C_k )^{-1}    (2.8)

and Π_k is the probability distribution for the i.i.d. sequence {s_n}. Using the matrix inversion lemma and the assumption that X > 0, we can now rewrite F(X) as

F(X) = Σ_{k=1}^{K} Π_k ( X - X C_k^* (C_k X C_k^* + R_k)^{-1} C_k X )

which is a convex combination of concave monotone increasing functions (see Lemmas 2 and 3 in App. E). Consequently F(X) inherits the same properties, and so

E P_+(t_n) = E F( P_-(t_n) ) ≤ F( E P_-(t_n) )

where we used Jensen's inequality. Combining this with the averaged time update (2.6a) we obtain

E P_-(t_{n+1}) ≤ G( E P_-(t_n) )    (2.9a)

where

G(X) ≜ T { W_{T_s} ⊙ [ T^{-1} F(X) T^{-*} ] } T^* + P_lower    (2.9b)

The function G(X) inherits the properties of F(X), i.e., it is monotone increasing and down-convex. It follows, by mathematical induction, that E P_-(t_n) ≤ V_n, where the matrix sequence {V_n; 1 ≤ n < ∞} is generated by the nonlinear recursion V_n = G(V_{n-1}) with V_0 = E P(t_0), which is the non-random matrix used to initialize the evolution of P(t) in the Kalman filter. Assuming that the discrete-time system defined by (1.1) with t_n = nΔ is both detectable and stabilizable, the matrix sequence V_n will converge when the average observation rate F_av exceeds a certain critical value F_c^{upper}. Finding a compact expression for F_c^{upper} is still an open research problem in the general case (see [12] for the synchronized Bernoulli case).
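A direct way to obtain the upper bound numerically is simply to iterate V_n = G(V_{n-1}) until it settles. The sketch below is illustrative only; it assumes NumPy, i.i.d. sampling intervals with a known moment generating function, scalar sensors, and hypothetical parameter values, and implements the recursion (2.6)-(2.9).

import numpy as np

def upper_bound_limit(A, Q, C_list, R_list, probs, mgf, V0, n_iter=500):
    """Iterate V_n = G(V_{n-1}) of (2.9) to approximate the upper bound on E P_-(t_n).

    C_list, R_list : per-sensor (scalar) measurement rows and noise variances
    probs          : probability that a given sample originates from sensor k
    mgf            : MGF of the i.i.d. multi-sensor inter-sample interval
    """
    lam, T = np.linalg.eig(A)
    Tinv = np.linalg.inv(T)
    S = lam[:, None] + np.conj(lam)[None, :]
    W = np.array([[mgf(s) for s in row] for row in S])                # W_Ts, eq. (2.6b)
    Ssafe = np.where(np.abs(S) < 1e-12, 1e-12, S)
    P_lower = np.real(T @ (((W - 1.0) / Ssafe) * (Tinv @ Q @ Tinv.conj().T)) @ T.conj().T)

    def F(X):                                                         # eq. (2.8)
        Xinv = np.linalg.inv(X)
        return sum(pk * np.linalg.inv(Xinv + Ck.conj().T @ Ck / Rk)
                   for pk, Ck, Rk in zip(probs, C_list, R_list))

    V = V0
    for _ in range(n_iter):                                           # V_n = G(V_{n-1})
        V = np.real(T @ (W * (Tinv @ F(V) @ Tinv.conj().T)) @ T.conj().T) + P_lower
    return V

# Hypothetical two-sensor example with Uniform[0, 0.2] inter-sample intervals
A, Q = np.array([[0.5, 1.0], [0.0, 0.3]]), np.eye(2)
C_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
mgf = lambda s: 1.0 if abs(s) < 1e-12 else (np.exp(0.2 * s) - 1.0) / (0.2 * s)
print(np.trace(upper_bound_limit(A, Q, C_list, [0.1, 0.1], [0.5, 0.5], mgf, np.eye(2))))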

In summary, we conclude that in steady state

P_{-,av} ≜ lim_{n→∞} E P_-(t_n) ≤ lim_{n→∞} V_n ≜ P_upper    (2.10)

as well as P_{-,av} ≥ P_lower. We use here the Bernoulli MSTP to illustrate our results (Fig. 2.6). We use two sensors with 5 distinct combinations of p_1 and p_2 values, so that we can control the relative variance ξ = Var T_s(n) / T_av^2 = p of the MSTP. We plot the trace of P_{-,av} ≜ lim_{n→∞} E P_-(t_n), along with the trace of the two matrix bounds P_lower and P_upper = lim_{n→∞} V_n, as a function of the average sampling rate F_av. Since F_av = Σ_{i=1}^{2} (1 - p_i)/Δ (recall (1.4)), we can control the value of F_av by changing Δ. The system parameters we use in this example are (G.3), which can be found in App. G. Notice that we have two unstable eigenvalues of A, and that the system is both controllable and observable. While the two bounds remain close for large values of F_av, they separate from each other as F_av decreases. The lower bound tr(P_lower) and the actual tr(P_{-,av}) increase dramatically as F_av approaches the critical value implied by (2.3), i.e. Δ_c = -ln p / (2 max_i Re(λ_i)) and F_c^{lower} = Σ_{i=1}^{2} (1 - p_i)/Δ_c (which in this example is 1.67), and diverge for F_av ≤ F_c^{lower}. On the other hand, the upper bound diverges for F_av below F_c^{upper} ≈ 3.05. However, for other sets of system parameter values ((G.4) in App. G), both bounds remain close for all F_av > F_c, as shown in Fig. 2.7. Notice that in this case A has two stable and one unstable eigenvalues, and the system is observable and controllable.

Another interesting result we find in both Fig. 2.6 and Fig. 2.7 is that as we decrease the variance index ξ, P_{-,av} decreases to some extent when the average sampling rate is fixed, which means that the variance of the sampling interval sequence still has an effect on estimation performance. However, this effect is much smaller than that of F_av.

The upper bound in (2.10) relies on the assumption that the MSTP {(t_n, s_n); 1 ≤ n < ∞} has the i.i.d. property. In general, however, the superposition of independent renewal processes does not possess the i.i.d. property. This is true, in particular, for all continuous-time MSTPs other than

FIGURE 2.6: P_{-,av} vs. average sampling rate F_av in the two-sensor Bernoulli-drop case (ξ = 0.14, 0.20, 0.27, 0.34): two unstable eigenvalues in A

FIGURE 2.7: P_{-,av} vs. average sampling rate F_av in the two-sensor Bernoulli-drop case (ξ = 0.14, 0.20, 0.27, 0.34): one unstable eigenvalue in A

Poisson [20]. We employ instead a technique to construct an MSTP approximation that has the i.i.d property [41], and thus gives rise to an approximate upper bound.

2.2.3 Renewal approximation for the MSTP

The method presented in [41] produces an i.i.d. approximation \tilde{T}_s(n) to the sequence T_s(n) of mutually-dependent inter-sample intervals, while preserving the first-order probability density function, so that Pr{ \tilde{T}_s(n) ≤ t } = Pr{ T_s(n) ≤ t }. This makes it possible to evaluate the moment generating function f_{\tilde{T}_s}(s) in closed form, so that the corresponding upper bound sequence \tilde{V}_n and its asymptotic limit can be determined. In addition, we attach a randomly selected label \tilde{s}_n to each interval \tilde{T}_s(n) with the same distribution as the (steady state) label s_n, so that Pr{ \tilde{s}_n = k } = Pr{ s_n = k }. The resulting approximate MSTP (\tilde{T}_s(n), \tilde{s}_n) has the i.i.d. property and preserves the first-order probability density functions of (T_s(n), s_n). Thus \tilde{P}_{-,av} ≜ lim_{n→∞} E \tilde{P}_-(t_n) is upper bounded by lim_{n→∞} \tilde{V}_n, and we conjecture that the same upper bound applies also to the original averaged error covariance P_{-,av}.

We illustrate the utility of this approach by the example (G.3) in App. G. The sampling intervals for each one of the two sensors are uniformly distributed in [0, T_max], and the probability density function (pdf) of the superposed MSTP, which we also use as the pdf of our renewal approximation, is

f(T_s) = (3/T_max)(1 - T_s/T_max)^2 = \tilde{f}(T_s)   for 0 ≤ T_s ≤ T_max

The resulting upper bound, as well as the actual values of tr(P_{-,av}) and tr(\tilde{P}_{-,av}), are presented in Fig. 2.8. We notice that tr(\tilde{P}_{-,av}) ≈ tr(P_{-,av}), which motivates our conjecture that the (approximate) upper bound still applies to P_{-,av}.
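One practical way to work with the renewal approximation is to draw i.i.d. intervals directly from the first-order pdf of the superposed MSTP and then use the resulting empirical moment generating function in the bound computations above. The sketch below is illustrative (it assumes NumPy; T_max and the sample size are arbitrary) and uses inverse-CDF sampling for the pdf f(T_s) = (3/T_max)(1 - T_s/T_max)^2 given in the text.

import numpy as np

def empirical_mgf(samples):
    """Empirical MGF s -> mean(exp(s*Ts)) built from i.i.d. draws of the interval."""
    samples = np.asarray(samples, dtype=float)
    return lambda s: np.mean(np.exp(s * samples))

Tmax = 0.3
u = np.random.rand(100_000)
# The CDF of f(Ts) is 1 - (1 - Ts/Tmax)^3, so the inverse transform is:
Ts_iid = Tmax * (1.0 - (1.0 - u) ** (1.0 / 3.0))
mgf_tilde = empirical_mgf(Ts_iid)                  # plug into the bound routines above
print(mgf_tilde(1.0), np.mean(Ts_iid))             # sanity checks (E{Ts} = Tmax/4)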

2.2.4 Effect of sensor differences

In contrast to the single-sensor case, the performance of a continuous-discrete Kalman filter using measurements from multiple, non co-located, sensors depends both on the statistics of the superposed MSTP and on the distinct properties of the individual sensors. Thus the resulting P_av may depend, in a non-symmetric way, on the average sampling intervals of all the individual sensors. In the discussion of resilience in Chapter 6, we find that when the sensors are all observable, the differences are barely visible (Fig. 6.1); however, if one of the sensors is unobservable, the differences become significant (Fig. 6.3). Using the same system parameters (G.3) in App. G, we show in Fig. 2.9 that the same performance is achieved when (1 - p_1 = 0.48, 1 - p_2 = 0) and when (1 - p_1 = 0, 1 - p_2 = 0.52), which means that sensor 1 needs a slightly smaller measurement rate than sensor 2 to achieve the same level of performance. This difference is obvious when the level of performance is high, i.e., when the error is small; for instance, here the curve represents the 10 dB error. However, when the error increases to 90 dB, which can be considered the instability margin, the difference between the sensors is no longer noticeable. In sum, the effect of sensor differences exists only when we compare sensors at the performance level, not at the stability level. We will continue this discussion in Chapter 6.

FIGURE 2.8: P_{-,av} vs. the average sampling rate F_av: two sensors with uniformly-distributed sampling intervals (lower bound, upper bound, actual, and renewal approximation)

FIGURE 2.9: Contour plot of P_{-,av} vs. the packet receiving probabilities 1 - p_1 and 1 - p_2 in the two-sensor Bernoulli-drop case

Chapter 3

Networked State Estimation with Communication Delay

Communication delay has been recognized by many authors as one of the main challenges to networked estimation (see, e.g., [8], [30], [33]). The use of time-stamping allows accurate measurement of information packet delays (up to the available timing accuracy), enabling prediction-based mitigation of the adverse effects of delay. Thus the net effect of (mitigated) delay is an increase in estimation error. Much of the previously published work on networked estimation has been dedicated to obtaining explicit expressions for the average estimation error covariance as a function of the statistics (pdf or moment generating function) of the length of delay.

Several papers have addressed the delay issue in estimation and control problems. Some of them consider the delay to be time-varying and unknown. In this case, the analysis is possible only under an assumption on the delay, such as a known upper bound. In [42], the authors considered the problem of robust H∞ filtering for continuous-time uncertain linear systems with multiple time-varying unknown delays and proposed an LMI-based asymptotically stable linear filter assuring a guaranteed performance, irrespective of the uncertainty and the size of the time delays.

Other papers, such as [8], [27], [43], [44], [31], [32], [33], consider known delay that is either constant or random and time-stamped. Time-stamping makes it possible to use a standard Kalman filter (augmented with sufficient buffering capacity) to obtain the optimal estimate in the presence of delay, as described in Sec. 1.4.

In the discussion of delay, we can consider delay either in the measurement path or in the control path (recall Fig. 1.1). In the literature, most papers investigate the problem of optimal estimation with delay in the measurement path ([8], [43], [44], [31], [32], [33]), while


[34] discusses delay in the control path, and [27] considers both types of delay (i.e., in both the measurement and control paths).

The presence of delay in the control path may result in loss of stability of the overall system. This is true, in particular, when one of the functions of the control loop is to stabilize an unstable open-loop system. Thus the primary objective of our control module is to mitigate the effects of delay in the control path, so as to avoid loss of stability of the closed-loop system. The method that we presented in [34] allows us to overcome very large delay (at least 10-20 times larger than the smallest destabilizing delay in the original control loop) at the cost of increased state estimation error.

Time-varying delay in the measurement channel is also discussed under a variety of assumptions. For instance, in [31], the time-varying delay has a known upper bound. However, the delay durations are allowed to be larger than a single sampling interval, which may cause the delayed measurements to be received out of order. Buffering is needed to allow a reordering of the measurements in the order in which they were originally acquired. On the other hand, [27] restricts the delay to be upper bounded and also shorter than one sampling interval, which simplifies the implementation to a certain extent. Once the measurements have been reordered, a standard Kalman filter is used to construct the optimal MMSE state estimate ([27], [31]). The optimal filter in [31] is time-varying, stochastic, and does not converge in general to a steady state. However, the optimal estimator and controller in [27] is guaranteed to be stable.

3.1 Delay in Control Path

Following the discussion in Sec. 1.4, we restrict our discussion from now on to networked estimation/control systems in which the control module receives state estimates, rather than control signals, from the estimation center (Fig. 1.1). Thus, we follow the formulation in [40] and [34], assuming that the estimation center transmits a sequence of state estimates, say \hat{x}(t_n), to the control module attached to the actuator. The time instants {t_1, t_2, ...} include, at least, all the times when \hat{x}(t) underwent a measurement update. The control module receives, at the time instant θ_n = t_n + D_n^c, the pair {\hat{x}(t_n), t_n}, where D_n^c denotes the delay in the control path (Fig. 1.1). By comparing the time-stamp t_n with the physical time of arrival θ_n, the control module can accurately determine the value of the delay D_n^c = θ_n - t_n.

Notice that the estimate delivered to the control module at time θ_n refers to a delayed version of the state vector (i.e., at time t_n). To remedy this discrepancy, the control module carries out a local time update, viz.,

\hat{x}^{(d)}(θ_n) = e^{A D_n^c} \hat{x}(t_n)    (3.1)


Moreover, for all θ_n ≤ t < θ_{n+1}, the state estimate used by the control unit is

\hat{x}^{(d)}(t) = e^{A(t - θ_n)} \hat{x}^{(d)}(θ_n)    (3.2)

Notice that, following from (3.1) and (3.2), we have

\hat{x}^{(d)}(t) = e^{A(t - t_n)} \hat{x}(t_n)    (3.3)

This local estimate is then used to construct the control signal u(t) = -K \hat{x}^{(d)}(t). The time-update equation (3.1), which maps \hat{x}(t_n) into \hat{x}^{(d)}(θ_n), induces a similar relation between the associated error covariances, viz.,

P^{(d)}(θ_n) = e^{A D_n^c} P_+(t_n) e^{A^* D_n^c} + ∫_0^{D_n^c} e^{As} Q e^{A^*s} ds    (3.4)

Applying ensemble averaging results in a simple relation between P_{+,av} and P_av^{(d)}, namely [34]

T^{-1} P_av^{(d)} T^{-*} = W_{D^c} ⊙ [ T^{-1} P_{+,av} T^{-*} ] + \bar{f}_{D^c} ⊙ [ T^{-1} Q T^{-*} ]    (3.5a)

where

W_{D^c} = [ f_{D^c}(λ_l + λ_m^*) ]_{l,m=1}^{M}
\bar{f}_{D^c} = [ ( f_{D^c}(λ_l + λ_m^*) - 1 ) / (λ_l + λ_m^*) ]_{l,m=1}^{M}    (3.5b)

Here f_{D^c}(s) is the moment generating function associated with the distribution of the delay D_n^c. Thus the delay-mitigation method of [34] results in increased error covariance but, at the same time, avoids the destabilizing effect of delay in the control path.
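The local time update (3.1)-(3.3) is easy to implement at the control module once the estimate arrives with its time-stamp. The sketch below is a hypothetical illustration assuming NumPy/SciPy; the system matrix, feedback gain and numerical values are placeholders and are not those of App. G. It shows the prediction step and how the resulting estimate feeds the control signal u(t) = -K \hat{x}^{(d)}(t).

import numpy as np
from scipy.linalg import expm

def control_side_estimate(A, x_hat, t_stamp, t_now):
    """Local time update (3.1)-(3.3): predict the received, time-stamped estimate
    forward across the measured control-path delay (and up to the current time)."""
    return expm(A * (t_now - t_stamp)) @ x_hat     # eq. (3.3)

A = np.array([[0.0, 1.0], [2.0, 0.0]])             # hypothetical open-loop-unstable plant
K = np.array([[3.0, 2.5]])                         # hypothetical feedback gain
x_hat, t_stamp = np.array([0.10, -0.20]), 4.00     # pair {x_hat(t_n), t_n} from the center
t_now = 4.35                                       # arrival time theta_n: delay D_n^c = 0.35 s
x_d = control_side_estimate(A, x_hat, t_stamp, t_now)
u = -K @ x_d                                       # control signal built from x^(d)(t)
print(u)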

To illustrate the capability of the delay-mitigation method, we consider the unstable system (G.3) in App. G. This system can be stabilized, for instance, by conventional feedback control with K = [17 7], but when delay is present in the control feedback path, the closed-loop system becomes unstable for D^c ≥ 0.18 s ≜ D_cr^c. In contrast, using our delay-mitigation technique, the closed-loop system remains stable with delays as high as D^c ≈ 20 s. However, the level of estimation error increases with the delay, as shown in Fig. 3.1: the SEER decreases when the delay increases. Notice that we still have SEER ≈ 18 dB when D^c = 2 s, a delay value that is 10 times higher than the critical delay D_cr^c = 0.18 s (top curve in Fig. 3.1).

The effect of additive noise at the sensor output or in the communication channel is captured by the lower two curves in Fig. 3.1. In one case (dashed line), white noise is added to the acquired observations y(t); in the other (circle line), to the transmitted estimate \hat{x}(t). The level of noise in each case is adjusted to achieve a local signal-to-noise ratio of 20 dB. All three curves in Fig. 3.1 display the same dependence on the length of delay (D^c), and illustrate the fact that our delay-mitigation technique transmutes delay into increased estimation error.

FIGURE 3.1: Effect of delay mitigation on SEER, in the presence of observation noise

FIGURE 3.2: Effect of delay mitigation on SEER, in the presence of system parameter inaccuracy

Our method for mitigating delay in the control path relies on the availability of the system parameter matrices A and B at the control module, so that (3.1) and (3.2) can be used to evaluate \hat{x}^{(d)}(t). Inaccuracies in the system parameters will result in further degradation of the quality of the estimate \hat{x}^{(d)}(t), as demonstrated in Fig. 3.2. The two lower curves in this figure correspond to element-wise inaccuracy in A and B of 1% (dashed line) and 5% (circle line). Clearly, higher inaccuracy results in a higher estimation error level (equivalently, lower SEER), and this relation is approximately linear for small levels of inaccuracy. This result motivates us to develop a robust Kalman filter that can reduce the sensitivity to parameter inaccuracies, as compared with the standard Kalman filter, when our delay-mitigation method is applied.

Other papers also analyze delay in the control path. In particular, [27] combines random time delays with real-time control and develops a new scheme using time-stamping to handle random delays in both the observation path and the control path. Under the assumptions of independence of delays, and of delay durations shorter than a single sample interval, the authors present an algorithm that provides both optimal state estimation and optimal state feedback. More importantly, the authors prove the separation principle, which allows feedback control based on the optimal state estimate instead of the true state in the presence of delay. In addition, [27] also discusses observation-path delay. In fact, the same dynamic prediction approach that we used to mitigate delay in the control path can also be used to counter the effects of delay in the observation path, as discussed in the following section.

3.2 Delay in Observation Path

While the basic principles of time-stamped delay mitigation apply also to dealing with delay in the observation path, there are two complications that need to be addressed before attempting prediction-based compensation of the delay. First, the measurements from the sensors may arrive out of order due to random time-varying delay, so that evaluation of the state estimate may not be possible using the standard Kalman filter recursion. In addition, while a measurement update can be carried out as soon as a measurement is received, the update equations must use t_n (the time the measurement was acquired), not t_n + D_n^o (the time the measurement was received at the estimation center). These complications suggest that the standard Kalman filter has to be augmented with the capacity to backtrack to an earlier process time (which is not the same as physical time) and then predict forward to the present.

In networked estimation and control systems, the two typical distributions of the delay in a computer network are the exponential distribution and the truncated normal distribution. In particular, the exponential distribution has been shown to be the best way to describe the process of packet delay in a TCP/IP based network [45]. As an example, a phasor measurement unit (PMU), which measures the electrical waves on an electricity grid using a common time source for synchronization, can measure 50/60 Hz AC waveforms (voltages and currents), typically at a rate of 48 samples per cycle (2880 samples per second), which corresponds to a sampling interval of around 0.3 ms. Under this packet arrival rate with typical packet size, the average delay is around 0.05 ms, so the average delay is normally smaller than the sampling interval, and the likelihood that the delay exceeds the sampling interval is small as well.

In the literature, several papers investigate the problem of optimal estimation with delay in the measurement path ([8], [31], [32], [33], [43], [44]). In [31], [43] and [44], the optimal estimation problem for discrete-time systems with time-varying delay in the measurement channel is studied under the assumption that the time-varying delay has a known upper bound. In particular, in [43], the authors first reformulate the delayed measurement as a measurement with multiple state delays; the estimation problem is then transformed into one for systems with multiple measurement channels that contain the same state information as the original measurement, where each channel has a single constant delay. Finally, the authors derive the optimal estimator by adopting a reorganized innovation analysis approach. The estimator is designed by propagating in parallel a number of Riccati difference equations. In [31], the random delay is known from the time-stamp, and the randomly delayed system is reconstructed as an equivalent delay-free one by using a measurement reorganization technique. The authors present both an optimal time-varying linear Kalman filter, as well as a suboptimal filter with constant gain, to achieve a certain level of performance and stability. In summary, both papers rely on measurement reorganization (using stored information) as a preliminary step before applying a standard Kalman filter. An alternative approach is presented in [46], where buffering is avoided by computationally propagating estimates backward in time.

In summary, we need to distinguish between approaches that consider the delay as unknown (such as [42]) and those that use time-stamping (e.g. [8], [31], [32], [33], [43], [44]) to obtain an accurate estimate of the possibly time-varying delay associated with each received measurement. In addition, while relying on the assumption that the delay never exceeds the length of a single inter-sample interval simplifies both the implementation and the associated performance analysis (see e.g. [27]), this assumption may not hold in many specific cases. In the sequel we restrict our discussion to the time-stamped approach, and examine the implications of restricted delay length.

The optimum Minimum Mean Square Error (MMSE) state estimate \hat{x}(t) without transmission delay is \hat{x}(t) ≜ \hat{x}(t | Y(t)), namely the projection of the state x(t) on the measurement subspace

Y(t) ≜ span{ y(t_n); t_n ≤ t }

In the presence of delay in the observation path, the measurement y(t_n) becomes available only at the time instant t_n + D_n^o, where D_n^o (in general time-variant and random) is the transmission delay. Thus, the optimum MMSE state estimate in the presence of known delay is \hat{x}^{(d)}(t) ≜ \hat{x}(t | Y^{(d)}(t)), namely the projection of the state x(t) on the subspace of delayed measurements

Y^{(d)}(t) ≜ span{ y(t_n); t_n + D_n^o ≤ t }

Since Y^{(d)}(t) is a subspace of Y(t), \hat{x}^{(d)}(t) has a larger error covariance than \hat{x}(t), namely P^{(d)}(t) ≥ P(t) for all t. In general, it is difficult to relate Y^{(d)}(t) to Y(t) when the ordering of the time instants {t_n + D_n^o} is different from that of the original sequence {t_n} of sampling times. In such a case, \hat{x}^{(d)}(t) can be efficiently implemented by augmenting a standard Kalman filter with a sufficient amount of memory that stores (and retrieves) information about the several most recent measurement update steps. However, we can make a more detailed statement about the relation between P_±^{(d)}(t_n) and P_±(t_n) if we assume that the delay does not alter the order in which measurements are received at the estimation center, namely that t_{n-1} + D_{n-1}^o < t_n + D_n^o holds for every n ≥ 1. This assumption holds, for instance, when the original sampling is regular (possibly with packet drops) and the duration of the delay is upper-bounded by the length of the original sampling interval, as assumed in [27]. However, this assumption is also met in more general cases, such as regular sampling with arbitrary constant (non-random) delay. The assumption

t_{n-1} + D_{n-1}^o < t_n + D_n^o ,   n ≥ 1    (3.6)

implies that

t_k + D_k^o < t_n + D_n^o   for all k < n    (3.7)

and, consequently, that

Y^{(d)}(t_n + D_n^o) = Y(t_n)    (3.8)

This key relation allows us to relate \hat{x}_+^{(d)}(·) explicitly to \hat{x}_+(·), namely,

\hat{x}_+^{(d)}(t_n + D_n^o) = \hat{x}_+( t_n + D_n^o | Y(t_n) ) = e^{A D_n^o} \hat{x}_+(t_n)    (3.9)

and, consequently,

P_+^{(d)}(t_n + D_n^o) = e^{A D_n^o} P_+(t_n) e^{A^* D_n^o} + ∫_0^{D_n^o} e^{At} Q e^{A^*t} dt    (3.10)

In this case \hat{x}_+^{(d)}(·) can be efficiently determined via a (slightly) modified version of the continuous-discrete Kalman filter (1.2)-(1.3). In particular, the evaluation of \hat{x}_+(t_n) and P_+(t_n) requires backtracking only to t_{n-1}, so that we only need to recover from storage \hat{x}_+(t_{n-1}) and P_+(t_{n-1}). In other words, this measurement-update procedure is identical to the one used in a standard (delay-free) Kalman filter, except for the need to predict from t_n to t_n + D_n^o, using (3.9) and (3.10).
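For the in-order case described above, a single step of the modified continuous-discrete Kalman filter amounts to a standard measurement update at the acquisition time t_n, followed by a prediction across the known delay D_n^o, as in (3.9)-(3.10). A minimal Python sketch (assuming NumPy/SciPy, real-valued matrices, and numerical evaluation of the integrated process-noise term; this is an illustration, not the thesis implementation) is:

import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

def predict(A, Q, x, P, dt):
    """Continuous-time time update over an interval dt."""
    Phi = expm(A * dt)
    Qd, _ = quad_vec(lambda t: expm(A * t) @ Q @ expm(A * t).T, 0.0, dt)
    return Phi @ x, Phi @ P @ Phi.T + Qd

def delayed_kf_step(A, Q, C, R, x_prev, P_prev, t_prev, t_n, D_n, y_n):
    """One step of the modified filter: update at t_n, then predict across D_n^o."""
    x_m, P_m = predict(A, Q, x_prev, P_prev, t_n - t_prev)   # time update to t_n
    S = C @ P_m @ C.T + R                                    # innovation covariance
    K = P_m @ C.T @ np.linalg.inv(S)                         # Kalman gain
    x_p = x_m + K @ (y_n - C @ x_m)                          # measurement update at t_n
    P_p = (np.eye(len(x_m)) - K @ C) @ P_m
    return predict(A, Q, x_p, P_p, D_n)                      # eqs. (3.9)-(3.10)

In a full implementation the stored pair (\hat{x}_+(t_{n-1}), P_+(t_{n-1})) would simply be retrieved before each call, exactly as described above.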

The same relation holds also for P_-^{(d)}(t_n + D_n^o) and P_-(t_n). By analogy with the relation (2.6a), we can apply ensemble averaging to obtain a simple relation between P_{-,av} and P_{-,av}^{(d)} ≜ lim_{n→∞} E P_-^{(d)}(t_n + D_n^o), namely

T^{-1} P_{-,av}^{(d)} T^{-*} = W_D ⊙ [ T^{-1} P_{-,av} T^{-*} ] + \bar{f}_D ⊙ [ T^{-1} Q T^{-*} ]    (3.11a)

where

W_D = [ f_D(λ_l + λ_m^*) ]_{l,m=1}^{M}
\bar{f}_D = [ ( f_D(λ_l + λ_m^*) - 1 ) / (λ_l + λ_m^*) ]_{l,m=1}^{M}    (3.11b)

Here f_D(s) is the moment generating function associated with the distribution of the delay D_n^o, averaged over all sensors. This result is obtained under the assumption that the delays {D_n^o} are mutually independent, and also independent of the sensor sampling times {t_n}. The presence of delay always leads to an increase of the error covariance, similar to the effect of the time-update process in a standard (delay-free) continuous-discrete Kalman filter. This results in an increase in the average error covariance, as shown in our example (Fig. 3.3).

The delay we use in this example is uniformly distributed in [0, D_max], and its impact increases (almost linearly) with D_max. Notice that the effect of delay can usually be compensated by increasing the average observation rate F_av.
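The relation (3.11) can be evaluated in the same way as the earlier Schur-product expressions: given the eigen-decomposition of A and the moment generating function of the delay, the delayed average covariance follows from the delay-free one. A small sketch (assuming NumPy and a uniformly distributed delay on [0, D_max], with illustrative values) is:

import numpy as np

def delayed_average_cov(A, Q, P_minus_av, mgf_delay):
    """Map P_{-,av} into P^{(d)}_{-,av} via (3.11a), given the delay MGF."""
    lam, T = np.linalg.eig(A)
    Tinv = np.linalg.inv(T)
    S = lam[:, None] + np.conj(lam)[None, :]
    W = np.array([[mgf_delay(s) for s in row] for row in S])          # W_D, eq. (3.11b)
    fbar = (W - 1.0) / np.where(np.abs(S) < 1e-12, 1e-12, S)          # f_D-bar
    inner = W * (Tinv @ P_minus_av @ Tinv.conj().T) + fbar * (Tinv @ Q @ Tinv.conj().T)
    return np.real(T @ inner @ T.conj().T)

Dmax = 0.1                                          # delay ~ U(0, Dmax)
mgf_delay = lambda s: 1.0 if abs(s) < 1e-12 else (np.exp(Dmax * s) - 1.0) / (Dmax * s)
# P_d_av = delayed_average_cov(A, Q, P_minus_av, mgf_delay)
# with A, Q and P_minus_av taken from the delay-free analysis of Chapter 2.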

In contrast to Fig. 3.3, which shows the effect of delay on the average error covariance, the example we discuss in App. H (Islanded Power System) illustrates the local effect of delay (Fig. 3.4). We notice that the effect of delay is particularly noticeable at the instants of sharp transitions, which in this example are caused by step changes in the control input (= engine throttle position). Thus \hat{x}^{(d)}(t) ≈ \hat{x}(t) when the control signal is constant, while \hat{x}^{(d)}(t) lags behind \hat{x}(t) in the intervals (t_i^{(c)}, t_i^{(c)} + D), where t_i^{(c)} denotes the instants of step change in the control input, and D (= 0.2 sec) is the delay in this example.

3.3 Time-Stamp Accuracy

As we indicated before, our analysis of the effect of delay is based on using time-stamping, which is assumed to provide an accurate value for t_n, the time instant at which the n-th measurement has been acquired. When the clock used to establish the time-stamp is not perfectly accurate, the time-stamp that is attached to y(t_n) is, in fact, t_n - ν_n, where ν_n indicates the error contributed by the limited accuracy of the sensor sampling clock. Assuming that the clock used at the estimation center is very accurate (as compared with the sensor sampling clock), the delay value determined by our delay-mitigation module will be

FIGURE 3.3: P_{-,av}^{(d)} vs. the average sampling rate F_av: two sensors with uniformly-distributed sampling intervals, without delay and with delay ~ U(0, 0.1) and U(0, 0.5)

(t_n + D_n^o) - (t_n - ν_n) = D_n^o + ν_n

so that the time-stamp error maps directly into an error in our delay estimate. As a result, the expression (3.9) is replaced by

\hat{x}_+^{(d,ν)}(t_n + D_n^o) = e^{A(D_n^o + ν_n)} \hat{x}_+(t_n) = e^{A ν_n} \hat{x}_+^{(d)}(t_n + D_n^o)    (3.12)

so that the delayed state estimate includes a perturbation caused by the timing error ν_n.

As a result, the actual error covariance is given by

P_+^{(d,ν)}(t_n + D_n^o) = e^{A ν_n} P_+^{(d)}(t_n + D_n^o) e^{A^* ν_n} + ∫_0^{ν_n} e^{At} Q e^{A^*t} dt    (3.13)

which can be averaged in time to yield the relation

T^{-1} P_{+,av}^{(d,ν)} T^{-*} = W_ν ⊙ [ T^{-1} P_{+,av}^{(d)} T^{-*} ] + \bar{f}_ν ⊙ [ T^{-1} Q T^{-*} ]

with W_ν, \bar{f}_ν defined, as in (3.11b), in terms of the moment generating function f_ν(s) associated with the probability density function of the timing error process {ν_n}. Here we assumed that the random processes {D_n^o} and {ν_n} are statistically independent of each other, and that {ν_n} has zero mean.

FIGURE 3.4: Robust Kalman filter with delay-mitigating control in the islanded power system (state 1: real state vs. robust and standard state estimates with delay, over 8 sec)

Clearly, we should expect a larger impact on the error covariance as the statistical spread of ν_n values increases. For instance, if {ν_n} is Gaussian with variance σ_ν^2, then f_ν(s) = e^{σ_ν^2 s^2 / 2}, so that W_ν = [ e^{σ_ν^2 (λ_l + λ_m^*)^2 / 2} ]_{l,m=1}^{M} and similarly for \bar{f}_ν: increasing σ_ν^2 causes each element of W_ν and \bar{f}_ν to increase in magnitude, resulting in a higher value for P_{±,av}^{(d,ν)}. This relation is evident in Fig. 3.5, corresponding to a simulation with a zero-mean Gaussian timing inaccuracy {ν_n}. In the simulation, we choose three different values of σ in order to examine the relation between the level of inaccuracy and the level of increased error. The three panels of Fig. 3.5 correspond to σ_1 = 0.01, σ_2 = 0.02 and σ_3 = 0.05. The x axis represents the relative inaccuracy (σ/D^o) and the y axis represents the relative error covariance (the ratio of the increase in error to the error without inaccuracy). As shown in Fig. 3.5, the three panels are almost identical, and the relative error covariance increases as the relative inaccuracy increases. For instance, when σ/D^o = 0.4, the error covariance increases by around 1%, and as σ/D^o increases to 1, it increases by around 14%. From this point of view, we can determine the acceptable relative inaccuracy from the acceptable percentage increase in error covariance.
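For the zero-mean Gaussian timing error, the element-wise growth of W_ν with σ_ν can be checked directly from f_ν(s) = e^{σ_ν^2 s^2 / 2}. The following small sketch is illustrative only (it assumes NumPy and a hypothetical stable system matrix):

import numpy as np

def timestamp_penalty_factors(A, sigma_nu):
    """Element-wise factors W_nu = [exp(sigma_nu^2 (lambda_l + conj(lambda_m))^2 / 2)]."""
    lam = np.linalg.eigvals(A)
    S = lam[:, None] + np.conj(lam)[None, :]
    return np.exp(0.5 * sigma_nu**2 * S**2)

A = np.array([[-0.3, 1.0], [0.0, -0.1]])            # hypothetical stable system
for sigma in (0.01, 0.02, 0.05):                     # the three values used in Fig. 3.5
    print(sigma, np.abs(timestamp_penalty_factors(A, sigma)).max())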

FIGURE 3.5: Effect of time-stamp inaccuracy on delay: relative error covariance vs. relative inaccuracy (σ/delay) for a stable system, with inaccuracy ~ N(0, 0.01), N(0, 0.02) and N(0, 0.05)

Chapter 4

Networked State Estimation in the Presence of Bad Data

“Bad data” is a collective name for sensor readings that get corrupted, before being received by a Kalman filter (KF) dynamic state estimator, as a result of sensor/communication failures, poor instrument calibration, faulty or reverse wiring, as well as malicious data injections. Our objective is to detect occurrences of bad data and suppress their impact on the quality of dynamic state estimates. This would be applicable, for instance, to resilient estimation of generator state information in cyber-physical electric power systems. In addition, we obtain a statistical characterization of the detection and amelioration process. This allows us to quantify the corresponding resilience margin as a function of the frequency and severity of bad measurements (see Sec. 6.3).

The performance of a KF dynamic state estimator can be significantly affected by the presence of bad data. Notice that successful detection and removal of all bad measurements can significantly reduce the average state-estimation error, as compared with the “untreated” case (Fig. 4.1). Motivated by this observation, we propose an innovations-based detector for identifying corrupted sensor measurements. The incorporation of such a detector into a standard Kalman filter can result, under typical operating conditions, in a significant performance improvement. We provide in this chapter guidelines for customization of the proposed detector, under both a Bayesian and a “Fisherian” approach to statistical decision theory. The former is recommended when the statistical distribution of corrupted measurements is available; the latter is more appropriate when little is known about the likelihood of occurrence of bad measurements and their statistics.


FIGURE 4.1: Effect of bad data on the performance of a Kalman filter: error covariance vs. probability of bad data (bad-data shift = 3σ), with a perfect detector and with no detector

4.1 Model of Bad Data

In this chapter, we consider two general scenarios of sensor data corruption: an erasure/replacement corruption model and an additive corruption model.

Erasure/replacement corruption

In this scenario, the normal sensor measurement is replaced by a corrupted one. In other words, we assume that the measurements received by a Kalman filter estimator are given by

y(t_n) = C(t_n) x(t_n) + η_n   under H_0
y(t_n) = b_n                   under H_1    (4.1)

where H_1 indicates bad data, while H_0 indicates normal (i.e., un-corrupted) sensor observations. The range of typical b_n values in power system applications is assumed to exceed, by several standard deviations, the range of normal (un-corrupted) sensor measurements [26, 35].

The detector we propose classifies each received measurement y(tn) as either “good” (i.e.,

H_0) or “bad” (i.e., H_1). Every time the detector rejects the hypothesis H_0, the measurement update step is skipped (= the measurement is dropped). Notice that the KF can be implemented to carry out a (scalar) measurement update for each individual sensor measurement, even when

several measurements are received at the same time. Consequently y(tn) in (4.1) is always a scalar, and the detector needs only to classify scalar (i.e., single-sensor) observations.

Additive corruption

An alternative model for occurrence of bad data was proposed in [47], namely

y(t_n) = C(t_n) x(t_n) + η_n   under H_0
y(t_n) = C(t_n) x(t_n) + b_n   under H_1    (4.2)

where b_n is a zero-mean sequence with (possibly sensor-dependent) variance E|b_n|^2 ≜ ε_n.

4.2 Innovation-Based Detection and Suppression of Bad Data

In principle, the decision whether to keep or drop the measurement y(t_n) should be based on the entire set of causally available measurements {y(t_1), y(t_2), ..., y(t_n)} or, equivalently, the corresponding innovation sequence {ν(t_1), ν(t_2), ..., ν(t_n)}. The innovations are evaluated as part of the measurement update step of the Kalman filter dynamic state estimator (see App. A). In the absence of bad data, and assuming a jointly Gaussian probability distribution for the process noise w(t) and the measurement noise η_n, the innovations are also Gaussian and mutually independent [19]. If the same holds true also in the presence of bad data, namely if

p( ν(t_1), ν(t_2), ..., ν(t_n) | H(t_1), H(t_2), ..., H(t_n) ) = ∏_{k=1}^{n} p( ν(t_k) | H(t_k) )    (4.3)

then the optimal decision rule for classifying y(t_n) involves only ν(t_n) and none of the preceding innovations. Although the relation (4.3) need not hold in the most general case, we adopt here the (possibly suboptimal) approach of using only ν(t_n) to make a decision about

H(tn). This approach has the advantage of low implementation cost, in addition to reducing the average (mean-square) state estimation error almost to the same level that would be achieved with perfect detection of bad data (see Fig. 4.3).

4.2.1 Bayesian detection

The optimal decision rule, from a Bayes criterion perspective, is [48] (see also App. F)

L[ ν(t_n) ]  ≶_{H_1}^{H_0}  ζ_n    (4.4a)

where L(ν) is the likelihood ratio function

L(ν) ≜ p_{ν(t_n)}(ν | H_1) / p_{ν(t_n)}(ν | H_0)    (4.4b)

and ζ_n is the optimal (Bayesian) threshold, given by

ζ_n = ( (c_{01} - c_{00}) Pr(H_0) ) / ( (c_{10} - c_{11}) Pr(H_1) )    (4.4c)

Here Pr(H_i) is the probability of occurrence for hypothesis H_i, and {c_{ij}; i, j = 0, 1} are the costs (= penalties) associated with the four decision scenarios, as defined in Table 4.1.

              D_0 (Decision = H_0)    D_1 (Decision = H_1)
Truth = H_0          c_00                    c_01
Truth = H_1          c_10                    c_11

TABLE 4.1: Four decision scenarios and their costs

Notice that the threshold ζ_n is sensor-dependent and, consequently, time-dependent. This allows for using sensor-dependent costs, reflecting the fact that each sensor causes a different change in the state-error covariance, as well as sensor-dependent prior probabilities Pr(H_0) and Pr(H_1).

Since the innovation ν(t_n) is always scalar, the two decision regions corresponding to the decision rule (4.4) are subsets of the real line. In particular, when the likelihood ratio L(ν) is symmetric with respect to the origin, as well as monotone increasing in [0, ∞), the decision rule (4.4) induces the simple equivalent

|ν(t_n)|  ≶_{H_1}^{H_0}  L^{-1}(ζ_n)

where ν = L^{-1}(ζ) denotes the inverse function of ζ = L(ν). Since the variance of ν(t_n) is often sensor-dependent, it is more convenient to rewrite this decision rule in a normalized form, viz.,

|ν(t_n)| / √Ψ(t_n)  ≶_{H_1}^{H_0}  θ_n    (4.5a)

where Ψ(t_n) ≜ E_{H_0} |ν(t_n)|^2, and

θ_n = L^{-1}(ζ_n) / √Ψ(t_n)    (4.5b)

The normalized innovation |ν(t_n)| / √Ψ(t_n) is also commonly used to assess the quality of dynamic state estimates in the presence of limited numerical precision [19].
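In an implementation, the normalized-innovation test (4.5) simply gates the scalar measurement update of the Kalman filter. A minimal sketch follows (assuming NumPy; the variable names are ours, and the threshold θ is supplied externally, e.g. from (4.5b) or from the Fisher rule of Sec. 4.2.2):

import numpy as np

def guarded_measurement_update(x_m, P_m, y, c, R, theta):
    """Scalar KF measurement update gated by the normalized-innovation test (4.5).

    x_m, P_m : predicted state estimate and covariance at t_n
    y, c, R  : scalar measurement, measurement row (1-D array), noise variance
    theta    : normalized innovation threshold
    """
    nu = y - c @ x_m                                 # innovation nu(t_n)
    Psi = c @ P_m @ c + R                            # innovation variance Psi(t_n)
    if abs(nu) / np.sqrt(Psi) > theta:               # decide H_1: drop the measurement
        return x_m, P_m, False                       # skip the update (decision D_1)
    K = P_m @ c / Psi                                # Kalman gain
    x_p = x_m + K * nu
    P_p = P_m - np.outer(K, K) * Psi                 # P_+ = P_- - K Psi K^*
    return x_p, P_p, True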

Erasure/replacement corruption

The probability density p_{ν(t_n)}(ν | H_0) and the cost difference c_{00} - c_{01} can be determined from the standard measurement update expressions of the Kalman filter, given in App. A. We choose to define the costs {c_{ij}} in terms of the change in the state error covariance caused by each one of the four scenarios in the measurement update step, namely

c_{ij} ≜ tr{ P_+(t_n) - P_-(t_n) | (H_i, D_j) }    (4.6)

In particular, the decision D_1 results in dropping the current measurement, so that P_+(t_n) = P_-(t_n), and thus c_{i1} = 0 under both H_0 (i = 0) and H_1 (i = 1).

In contrast, the decision D0 results in keeping the current measurement, so that the error covariance is modified by the measurement update step. Under H0 we have the standard relation

P_+(t_n) = P_-(t_n) - K(t_n) Ψ(t_n) K^*(t_n)

so that

c_{00} = -tr{ K(t_n) Ψ(t_n) K^*(t_n) } = -||K(t_n)||^2 Ψ(t_n)

and

c_{01} - c_{00} = ||K(t_n)||^2 Ψ(t_n)    (4.7)

where we used the fact that ν(t_n) is a scalar. If we assume, in addition, that under H_0 the process noise and measurement noise are (jointly) Gaussian, then the zero-mean innovation process ν(t_n) is also Gaussian, with variance

Ψ(t_n) = C(t_n) P_-(t_n) C^*(t_n) + R_n

so that p_{ν(t_n)}(ν | H_0) = N(0, Ψ(t_n)). Under normal operating conditions (i.e., under H_0) the innovation variance has a limited range of possible values. In contrast, under (H_1, D_0) we use a bad measurement to carry out a standard measurement update, which results in a significant increase in the magnitude (and variance) of the resulting “innovation.” Notice that the evaluation of the Kalman gain K(·) is always carried out under the H_0 scenario, and does not depend on the actual measurements. This means that the occurrence of an isolated bad measurement results in a transient perturbation of the state estimate, which eventually dies off. In order to simplify our analysis, we will assume that undetected bad measurements occur infrequently. In other words, we assume that almost all previous bad data occurrences have been detected and eliminated, so that the state estimate \hat{x}_-(t_n) is (approximately) based only on un-corrupted observations. We conclude from the preceding discussion that the state estimate \hat{x}_-(t_n) is uncorrelated with y(t_n) = b_n. Since in

n(tn)=bn C(tn)x (tn)(under H1) (4.8a) it follows that b 2 2 EH n(tn) = E bn +C(tn) P(tn) P (tn) C⇤(tn) (4.8b) 1 | | | | h i where we used the fact that the state covariance P(tn) satisfies the relation

P(t_n) ≜ E{ x(t_n) x^*(t_n) } = E{ \hat{x}_-(t_n) \hat{x}_-^*(t_n) } + P_-(t_n)    (4.9)

The innovation variance of (4.8) is significantly larger than Ψ(t_n), the innovation variance under normal (un-corrupted) operating conditions. This is so because, in general, P(t_n) ≫ P_-(t_n) and

ε_n ≜ E|b_n|^2 ≫ R_n

which implies that

E_{H_1} |ν(t_n)|^2 = E|b_n|^2 + C(t_n) [ P(t_n) - P_-(t_n) ] C^*(t_n) ≈ E|b_n|^2 + C(t_n) P(t_n) C^*(t_n)
                  ≫ R_n + C(t_n) P_-(t_n) C^*(t_n) ≡ Ψ(t_n)

This observation provides an intuitive rationale for using the magnitude of ν(t_n) / √Ψ(t_n) as a decision statistic for detection of bad measurements.

Returning to the evaluation of c_{10}, we recall that the measurement update for the state estimate is \hat{x}_+(t_n) = \hat{x}_-(t_n) + K(t_n) ν(t_n) (see App. A), which in our case is

\hat{x}_+(t_n) = \hat{x}_-(t_n) + K(t_n) [ b_n - C(t_n) \hat{x}_-(t_n) ]
            = K(t_n) b_n + [ I - K(t_n) C(t_n) ] \hat{x}_-(t_n)    (4.10)

Recall that the evaluation of the Kalman gain is independent of the actual measurements, and thus reflects only the scenario H_0. Thus the resulting filtered estimate \hat{x}_+(t_n) is no longer the optimum MSE estimate of x(t_n) from the available measurements. On the other hand, the estimate \hat{x}_-(t_n) is (approximately) still the correct predicted estimate under H_0, provided that undetected bad measurements occur infrequently, as we have assumed. This allows us to use the approximation (recall (4.9))

E{ \hat{x}_-(t_n) \hat{x}_-^*(t_n) } ≈ P(t_n) - P_-(t_n)

We can now rewrite (4.10) as

x(t_n) - \hat{x}_+(t_n) = [ x(t_n) - \hat{x}_-(t_n) ] - K(t_n) b_n - [ I - K(t_n) C(t_n) ] \hat{x}_-(t_n)

where the right-hand-side expression consists of three mutually-uncorrelated terms. The resulting error covariance for this scenario (a bad measurement occurs, but is not detected) is

P_+(t_n) = P_-(t_n) + K(t_n) ε_n K^*(t_n) + [ I - K(t_n) C(t_n) ] [ P(t_n) - P_-(t_n) ] [ I - K(t_n) C(t_n) ]^*    (4.11a)

We conclude that

c_{10} - c_{11} = ||K(t_n)||^2 ε_n + tr{ [ I - K(t_n) C(t_n) ] [ P(t_n) - P_-(t_n) ] [ I - K(t_n) C(t_n) ]^* }    (4.11b)

Additive corruption

The only expressions that need to be modified are (4.8) and (4.11). Now,

ν(t_n) = C(t_n) [ x(t_n) - \hat{x}_-(t_n) ] + b_n

so that

E_{H_1} |ν(t_n)|^2 = C(t_n) P_-(t_n) C^*(t_n) + ε_n

which is significantly larger than Ψ(t_n) (the innovations variance under H_0) by virtue of the relation ε_n ≫ R_n. Next introduce the notation \tilde{x}_±(t_n) ≜ x(t_n) - \hat{x}_±(t_n), and observe that

\hat{x}_+(t_n) = \hat{x}_-(t_n) + K(t_n) ν(t_n) = \hat{x}_-(t_n) + K(t_n) b_n + K(t_n) C(t_n) \tilde{x}_-(t_n)

so that

\tilde{x}_+(t_n) = \tilde{x}_-(t_n) - K(t_n) C(t_n) \tilde{x}_-(t_n) - K(t_n) b_n = [ I - K(t_n) C(t_n) ] \tilde{x}_-(t_n) - K(t_n) b_n

Here, too, we will assume that undetected bad measurements occur infrequently, so that the state estimate \hat{x}_-(t_n) is (approximately) based only on un-corrupted observations, and is thus uncorrelated with b_n. Consequently,

P_+(t_n) = K(t_n) ε_n K^*(t_n) + [ I - K(t_n) C(t_n) ] P_-(t_n) [ I - K(t_n) C(t_n) ]^*    (4.12a)

which we recognize as a variant of the so-called “Joseph form” of the measurement update for the error covariance [49] (see also App. A). We conclude that

c_{10} - c_{11} = ||K(t_n)||^2 ε_n + tr{ [ I - K(t_n) C(t_n) ] P_-(t_n) [ I - K(t_n) C(t_n) ]^* - P_-(t_n) }    (4.12b)

We notice that the expression [ I - K(t_n) C(t_n) ] P_-(t_n) [ I - K(t_n) C(t_n) ]^* appears both in (4.11) and (4.12). This expression satisfies the identity (see App. A)

[ I - K(t_n) C(t_n) ] P_-(t_n) [ I - K(t_n) C(t_n) ]^* + K(t_n) R_n K^*(t_n) = P_-(t_n) - K(t_n) Ψ(t_n) K^*(t_n)

which induces the compact representation

tr{ [ I - K(t_n) C(t_n) ] P_-(t_n) [ I - K(t_n) C(t_n) ]^* } = tr{ P_-(t_n) } - ||K(t_n)||^2 ( R_n + Ψ(t_n) )    (4.13)

Optimal selection of θ

The expressions for P_+(t_n) in each one of the four scenarios (H_i, D_j) allow us also to determine the steady-state (average) error covariance and its dependence on the normalized innovation threshold θ (Fig. 4.2). The optimal choice for θ can be determined from such plots, thereby avoiding the need for an explicit solution of the (nonlinear) equation

L(θ) = ( (c_{01} - c_{00}) Pr(H_0) ) / ( (c_{10} - c_{11}) Pr(H_1) )

Notice that the error covariance plot as a function of θ has a single global minimum, and is locally (but not globally) up-convex. Also notice that the optimal θ decreases as the probability Pr(H_1) ≜ p increases, and that the optimal normalized threshold is between 1.6 and 2.6 for 0.1 ≤ p ≤ 0.5 in the example used to obtain Fig. 4.2. In other words, using θ = 2 in this case is a reasonable choice when the information needed to find the optimal θ is not available.

FIGURE 4.2: Average error covariance as a function of the normalized innovation threshold θ (bad data with 3σ shift; p = 0.1, 0.2, 0.5)

4.2.2 Fisher-type detection

When p_{ν(t_n)}(ν | H_1) is not known, typically because H_1 is a composite hypothesis combining many possible scenarios of “bad data,” we can no longer rely on the notion of a likelihood ratio. Nevertheless, we can still express the probability of false alarm in terms of the decision region for H_0 (see App. F). In particular, when p_{ν(t_n)}(ν | H_0) is symmetric with respect to the origin, the false alarm probability P_FA is given by the area in the (one-sided) tail of this probability distribution, so that

P_FA < α   ⟺   |ν(t_n)| / √Ψ(t_n) > θ_n    (4.14)

where θ_n is a monotone decreasing function of α. Notice the similarity of this decision rule to its Bayesian counterpart (4.5): these two decision rules differ only in the method used to determine the threshold θ_n. In the Bayesian approach θ_n is determined by the costs {c_{ij}} and the prior probabilities Pr(H_i). In the Fisher (and also the Neyman-Pearson) approach we select the value of θ_n so as to achieve a prescribed level of P_FA.
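Under H_0 the normalized innovation is standard Gaussian, so the Fisher threshold for a prescribed false-alarm level follows from the Gaussian tail. A one-line sketch (assuming SciPy is available):

from scipy.stats import norm

def fisher_threshold(alpha):
    """Threshold theta_n such that the two-sided test |nu|/sqrt(Psi) > theta has P_FA = alpha."""
    return norm.ppf(1.0 - alpha / 2.0)

print(fisher_threshold(0.05))    # ~1.96, consistent with the theta = 2 rule of thumb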

4.3 Simulation Results

We consider here a stable system ((G.1) in App. G). In particular, we use the additive corruption model (4.2), with the corruption b_n taken to be zero-mean Gaussian with variance ε_n = 1. As we can see in Fig. 4.3, augmenting the standard KF with our Bayesian detector results in a significant performance improvement compared to using a KF without a detector. Moreover, the performance of the Bayesian detector is practically matched by that of a Fisher detector using θ = 2 (which corresponds to a false alarm probability P_FA = 5%). Only when bad data occur infrequently (e.g. p < 0.1) does using a Fisher detector with θ = 2 result in a slight performance degradation. This can of course be remedied by increasing the value of the normalized innovation threshold θ. Nevertheless, in the absence of any information about the statistics of bad data, the Fisher-type detector offers a nearly-optimal capability for detection and elimination of bad data. In contrast, when accurate statistical information about the likelihood of occurrence and the distribution of values of bad measurements can be obtained, the normalized innovation threshold θ can be customized, on a case-by-case basis, by relying on the Bayesian expression (4.4).

FIGURE 4.3: Performance of a Kalman filter using a detector for bad data (error covariance vs. probability of bad data p, for a perfect detector, the Bayesian detector, the Fisher detector with θ = 2, and no detector)

Chapter 5

Robust Continuous-Discrete Kalman Filter

The Kalman filter is the optimal linear mean-square state estimator when the system state-space model is known accurately. However, when this assumption is not satisfied, the Kalman filter is no longer optimal, and its performance will degrade [21]-[23]. The degradation of performance directly depends on the level of parameter uncertainty. In particular, previous work has demonstrated that delay-mitigation approaches are sensitive to model parameter inaccuracy: the level of estimation error increases significantly when the parameter values used in estimation deviate from their true values by as little as 5% (see Fig. 5.1).

Robustness of dynamic state estimation has been discussed from two distinct perspectives:

• H∞ state-space estimation ([50], [51]) aims to construct state estimates that are robust with respect to external perturbations, such as the measurement and process noises.

• State-space estimation subject to model parameter uncertainty ([18]-[25]) aims to construct state estimates that are robust with respect to perturbations and/or inaccurate knowledge of the state-space model parameter values.

In this section we focus on the second perspective and develop continuous-discrete variants of the robust state-space estimators from [24] and [25]. The two primary modifications that we introduce here are:

• We show how to map a continuous-time state equation with unstructured uncertainty into a discrete-time state equation with structured parameter uncertainty, which is necessary for developing the recursive estimates of [24] and [25].


FIGURE 5.1: Effect of model parameter inaccuracy (1%, 5%, 25%) on standard Kalman filter performance as a function of delay (relative error covariance in dB)

• We include in our (continuous-discrete) state equation a known (control) input term, which is absent from the formulation in [24] and [25]. Notice that, due to parameter uncertainties in the state equation, the effect of such a known control signal cannot be accurately predicted.

Our robust Kalman filter (Fig. 5.2, Fig. 5.3) shows an improvement in estimator quality, as compared with the standard KF. We notice that the level of estimation error depends on the level of parameter inaccuracy and on the “distance from instability” of the state equation: the estimation error increases with higher parameter inaccuracy and with larger (less negative) values of Re{p_i}, where {p_i} are the poles of the continuous-time state equation. In Chapter 3 we observed that our delay-mitigation method using the standard Kalman filter is sensitive to model parameter inaccuracy. In Sec. 5.4 we combine our robust Kalman filter with the delay-mitigation approach we proposed in Sec. 3.1: this results in a significant reduction in state estimation error, as compared with the results achieved in Sec. 3.1, in the presence of model parameter inaccuracy. Thus, our time-stamping-based delay-canceling method makes it possible to overcome very large delays in the control path while avoiding adverse effects on the stability of the system, and to do so in the presence of significant model parameter uncertainty.

5.1 Robust Continuous-Discrete Kalman Filter

We consider a continuous-time discrete-measurement dynamic linear system model [19] with unstructured parameter uncertainty in the system matrix A, viz.,

x˙(t)=(A + dA)x(t)+w(t) (5.1a)

y(tn)=C(tn)x(tn)+hn (5.1b) where x(t),w(t) and hn are the state, process noise and measurement noise, respectively. The processes w(t) and h are i.i.d. with E w(t)w (s) = Qd(t s), and E h h = R(t )d . As n { ⇤ } { n m⇤ } n n,m we explained in Sec. 1.6, our decision to consider accuracy in the matrix A, but not in C( ), is · motivated by the observation that changes in A often have a global impact, causing system-wide propagation of errors, which may lead to increased component stress and eventually component failure. In contrast, changes in a single sensor, or even a small number of sensors, usually have a minimal effect on the quality of state estimation in a well-designed system. This is so because in steady-state operation the measurement update step combines and averages contributions from many sensors.

We model the uncertainty dA as a random quantity, possibly time-variant, that satisfies the constraint dA e, where the norm bound e is a known quantity. As we explained in Sec. 1.6, k k we only consider here the uncertain in matrix A because it will have a global impact, causing system-wide propagation of errors, other than sensor uncertainties which usually have a minimal effect on the quality of state estimation in a well-designed system. The sampling instants t { n} can be arbitrarily spaced. We consider arbitrary spacing of the sensor sampling instants t in { n} order to be able to model the effects of transmitting sensor measurements to a central estimator over a communication network. Such effects include occasional loss of a measurement in the communication network (=packet drop), and lack of sampling synchrony between the individual sensors. The performance of a standard (non-robust) continuous-discrete Kalman filter in the presence of irregular sampling has been discussed in [38] and [39].

In [24] and [25] the authors propose several discrete-time robust state-space estimators that were designed to take into account the presence of structured parameter uncertainties in a discrete-time state space model. In this section, we develop continuous-discrete versions of such robust state-space estimators. To bridge the gap between the discrete-time setting used in [24][25] and our continuous-discrete model (1.1) we need to map the unstructured uncertainty dA into structured uncertainties in the matrices F, G used in [24] and [25]. To do so, we need first to rewrite the state equation (5.1a) in a discrete format, with a focus on the sampling instants Chapter 5. Robust Continuous-Discrete Kalman Filter 55

tn , viz., { } Ts(n) (A+dA)Ts(n) (A+dA)t x(tn)=e x(tn 1)+ e w(tn t)dt (5.2) Z0 where Ts(n) , tn tn 1. Thus, the relation (5.2) leads to

(A+dA)Ts(n) F(tn)+dF(tn)=e which results in AT (n) F(tn)=e s (5.3) dF(t )=e(A+dA)Ts(n) eATs(n) n In particular, when dA commutes with A and eT (n) 1, we have the approximation edATs(n) = s ⌧ I + dATs(n), and thus

dF(t ) eATs(n)dAT (n)=M(t )D(t )E (t ) (5.4) n ⇡ s n n f n where

M(tn)=eI dA D(t )= (5.5) n e ATs(n) E f (tn)=Ts(n)e

Notice that as dA varies over the entire ball dA e, the corresponding D(t ) varies over k k n the entire ball D(t ) 1. Thus, (5.4) is fully consistent with the “structured uncertainty” k n k assumption of [24] and [25].

These observations motivate us to define, in the general case.

M(tn)=eI

1 (A+dA)Ts(n) ATs(n) ATs(n) D(tn)= e e e (5.6) x(tn)eTs(n) h i ATs(n) E f (tn)=x(tn)Ts(n)e where x(t ) is a suitable scaling constant, chosen to ensure the norm inequality D(t ) 1. n k n k

One possible value for x(tn) is provided by the “perturbation bound” results of [52]. In par- ticular, [52, Theorem 2] implies that

(A+dA)T (n) AT (n) AT (n) (µ(A) a(A)+ dA )T (n) e s e s e s T (n) dA e k k s k kk k s k k Chapter 5. Robust Continuous-Discrete Kalman Filter 56 where µ(A) max l ( A+A⇤ ) and a(A) max Rel (A) Since dA e, we concluded that , i i 2 , i i k k

ATs(n) K(e ) (µ(A) a(A)+e)Ts(n) D(tn) e k k x(tn) where K( ) denotes the condition number of a matrix, namely ⇤

AT (n) AT (n) AT (n) K(e s )= e s e s k k · k k

This result motivates us to choose

ATs(n) (µ(A) a(A)+e)Ts(n) x(tn)=K(e )e so that D(t ) 1 is ensured. k n k Unfortunately, the perturbation bound approach cannot be used to obtain the structured de- composition of dG(tn) that is a prerequisite for using the results of [24] and [25]. On the other hand, using the special assumptions that dA commutes with A and eT (n) 1, results in the s ⌧ desired decomposition of of dG(tn) as we presently show. Our simulation results indicate that our robust state estimators, which are constructed under special assumption on dA, work well even when these assumptions are violated. Indeed. Fig. 5.2-5.10 are obtained for dA matrices that do not commute with A, and for high levels (up to 30%) of parameter inaccuracy.

The matrix dG(tn) can now be expressed in terms of dA. First of all, we need to discretize the integral term of (5.2), which yields the sum

Ts(n) e(A+dA)t w(t t)dt [eh(A+dA) e2h(A+dA) eL(n)h(A+dA)]z(t ) n ⇡ ··· n Z0 where we divide [0 Ts(n)] into L(n) subintervals of length h each, and introduce zi(tn) , ih (i 1)h w(tn t)dt, so that R z(t ) [z (t ) z (t ) z (t )]T (5.7) n , 1 n 2 n ··· L(n) n

Thus the fully-discretized version of (5.2) is

x(tn)=(F(tn)+dF(tn))x(tn 1)+(G(tn)+dG(tn))z(tn) (5.8) where h(A+dA) 2h(A+dA) L(n)h(A+dA) G(tn)+dG(tn) [e e e ] , ··· Chapter 5. Robust Continuous-Discrete Kalman Filter 57

Consequently

hA 2hA L(n)hA G(tn)=[e e e ] ··· (5.9) h(A+dA) hA 2h(A+dA) 2hA L(n)h(A+dA) L(n)hA dG(tn)=[e e e e e e ] ··· which completes the conversion of our continuous-discrete model (5.2) into a fully-discretized format. In particular, when dA commutes with A and eT (n) 1, we have the approximation of s ⌧ dG(tn) as

hA 2hA L(n)hA dG(tn) [e (hdA) e (2hdA) e (L(n)hdA)] ⇡ ··· , M(tn)D(tn)Eg(tn)

where

M(tn)=eI dA D(t )= n e hA 2hA L(n)hA Eg(tn)=[he 2he L(n)he ] ···

Combing this with the form of uncertain dF(tn) in (5.4), we have

[dF(tn) dG(tn)] = M(tn)D(tn)[E f (tn) Eg(tn)] (5.10)

which is a structured uncertainty of the form assumed in [24] and [25]. We shall refer to the resulting state estimation scheme as a robust continuous-discrete Kalman filter.

Robust Continuous-Discrete Kalman Filter (Type 1)

Our robust continuous-discrete Kalman filter (Type 1) is obtained by suitable modifications of the discrete-time state-space estimator of [24]. The main idea in deriving this robust extension of the standard Kalman filter is to use the deterministic interpretation of the Kalman filter to achieve the best worst-case performance in the presence of structured uncertainty in system parameters. The derivation includes two steps. The first step is admitting a deterministic interpretation as the

solution to a regularized least-squares problem [24]. Thus fix a time instant tn and assume that

the filtered estimate x+(tn) has already been computed with the corresponding error variance

matrix P+(tn). Given a new measurement y(tn+1), we then pose the problem of estimating x(tn) b again, along with z(tn), by solving

2 2 2 min x(tn) x+(tn) 1 + z(tn) 1 + y(tn+1) C(tn+1)x(tn+1) 1 (5.11) P (tn) Q (tn) R (tn+1) x(tn),z(tn) k k + k k k k { }h i b where P+(tn) is a recursively-constructed sequence of covariance-type matrices, x+(tn) denotes the

b Chapter 5. Robust Continuous-Discrete Kalman Filter 58 robust filtered state estimate based on the measurements y(t1), y(t2), y(tn) , and Q(tn)= { ··· } T E[z(tn)z(tn) ] is the covariance matrix of the discrete-time process noise z(tn) of (5.8). The second step is formulating a robust version of (5.11) as follow

2 2 2 min max x(tn) x+(tn) 1 + z(tn) 1 + y(tn+1) C(tn+1)x(tn+1) 1 (5.12) P (tn) Q (tn) R (tn+1) x(tn),z(tn) dA k k + k k k k { } h i b Our robust continuous-discrete Kalman filter is the solution to the recursive regularized least squares problem (5.12). The construction of this solution is based on results from [24]. The corresponding time-recursive estimation algorithm is summarized in Table 5.1. The primary difference, as compared with the standard KF, is the use of modified parameter matrices Q( ), · R( ), P+( ), G( ) and F( ), which incorporate the robustness model (i.e., the matrices E f ( ), · · · · b · Eg( ) and M( )). The modified parameter matrices are then used in the measurement- and time- b · b b· b update steps. Notice that R( ) has to be evaluated before the time-update step, while Q( ), · · P+( ), G( ) and F( ) can only be evaluated after the measurement-update step. The algorithm · · · b b generates the robust estimates x (tn) as well as a number of covariance-type matrices. We b b b ± should notice that the robust Kalman filter does not propagate the true error covariance matrix b P (tn) , E [x(tn) x (tn)][x(tn) x (tn)]⇤ . In contrast, the robust algorithm we describe next ± { ± ± } (type 2) does provide an explicit upper bound on the true error covariance. b b Robust Continuous-Discrete Kalman Filter (Type 2)

Similar to our Type 1 Robust Continuous-Discrete Kalman Filter, we formulate a recursive determinstic regularized least squares problem [25], viz.,

2 2 min x(tn) x (tn) 1 + y(tn) C(tn)x(tn) 1 (5.13) P (tn) R (tn) x(tn) k k k k h i and augment it to accommodateb parameter uncertainty, resulting in a min-max problem, viz.,

2 2 minmax x(tn) x (tn) 1 + y(tn) C(tn)x(tn) 1 (5.14) P (tn) R (tn) x(tn) dA k k k k h i The robust continuous-discrete Kalmanb filter (Type 2) is constructed by applying the structured uncertainty relations (5.10) to the formulation used in [25]: it is the (unique) solution of this min-max problem (5.14).

The corresponding time-recursive estimation algorithm is summarized in Table 5.2. The covariance-like matrix W( ) combines the effect of system parameters, such as C( ), R( ), A, · · · Ts( ), G( ), Q( ) with the structured uncertainty parameters M( ) and E f ( ). The recursion for · · · · · W(tn) is obtained by minimizing an upper bound for the true error covariance P (tn), namely

P (tn) Y(tn) Z(tn) (5.15)  b b Chapter 5. Robust Continuous-Discrete Kalman Filter 59

Initialization

1 Setx ˆ+(t0)=P (t0) C⇤(t0) C(t0) P (t0) C⇤(t0)+R(t0) y(t0) where P (t0)=P0 ⇣ ⌘ Optimization of lˆ (tn)

ˆ ˆ If C(tn+1)M(tn)=0, then set l(tn)=0. Otherwise construct G(l) and determine l(tn) by minimizing G(l) over the interval lˆ (t ) > l (t ) M (t )C (t )R 1(t )C(t )M(t ) n l n ,k ⇤ n ⇤ n+1 n+1 n+1 n k G(l)= x(l) 2 +l E x(l) E 2 + Ax(l) b 2 k kQ k a b k k kW(l) 1 1 1 where Q P (tn) Q (tn), Ea [E f (tn) Eg(tn)], Eb E f (tn)x+(tn), W R (tn+1) , + , , , and A , C(tn+1)[F(tn) G(tn)], b , y(tn+1) C(tn+1)F(tn)x+(tn), H , C(tn+1)M(tn) b † W(l) , W +WH(lI H⇤WH) H⇤W, Q(l) , Q + lEa⇤Ea b 1 x(l) , Q(l)+A⇤W(l)A A⇤W(l)b + lEa⇤Eb h i h i Modified Parameters (R(tn),Q(tn),P+(tn),G(tn),F(tn))

R(t )=R(t ) lˆ 1(t )C(t )M(t )M (t )C (t ) n+1 n+1 n n+1 n ⇤ n ⇤ n+1 1(t )= 1(t )+lˆ (t ) (t )[I + lˆ (t )E (t )P (t )E (t )] 1 (t ) Q n b Q n n Eg⇤ n n f n + n ⇤f n Eg n P (t )=(P 1(t )+lˆ (t )E (t )E (t )) 1 b + n + n n ⇤f n f n ˆ G(tn)=G(tn) l(tn)F(tn)P+(tn)E⇤(tn)Eg(tn) b f ˆ ˆ F(tn)= F(tn) l(tn)G(tn)Q(tn)E⇤(tn)E f (tn) I l(tn)P+(tn)E⇤(tn)E f (tn) b g b f ⇣ ⌘⇣ ⌘ b Signals b b Statisticsb

Time Update (in the interval t t < t ) n  n+1

xˆ (tn+1)=F(tn)xˆ+(tn) P (tn+1)=F(tn) P+(tn) F⇤(tn)+ G(tn)Q(tn) G⇤(tn) b b b b b Measurement Update (at the time instant tn+1 )

n(tn+1)=y(tn+1) C(tn+1) xˆ (tn+1) Y(tn+1)=C(tn+1) P (tn+1) C⇤(tn+1)+R(tn+1) 1 xˆ+(tn+1)=xˆ (tn+1)+K(tn+1) n(tn+1) K(tn+1)=P (tn+1) C⇤(tn+1) Y (tn+1) b P+(tn+1)=P (tn+1) K(tn+1) Y(tn+1)K⇤(tn+1)

TABLE 5.1: Robust Continuous-Discrete Kalman Filtering (type 1) Algorithm in Measurement and Time Update Form Chapter 5. Robust Continuous-Discrete Kalman Filter 60

Evaluation of W(tn)

y(tn) is the maximum singular value of I + B(tn) with B(t )=F(t )W(t )C (t )R 1(t )C(t )(Y(t ) Z(t ))C (t )R 1(t )C(t )W(t )F (t ) n n n 0 n n n n n 0 n n n n 0 n +F(t ) W(t )C (t )R 1(t )C(t )W(t ) W(t )C (t )R 1(t )C(t )(Y(t ) Z(t )) F (t ) n n 0 n n n n b n 0b n n n n n 0 n ⇣ a(t ) satisfies the constraint a(t )I M(t )Y(t )M (t ) > 0 ⌘ n n n n 0 bn b W(t )=(Y(t ) Z(t )) (Y(t ) Z(t ))C (t )R¯ 1(t )C(t )(Y(t ) Z(t )) n n n n n 0 n bn n n n where R¯(t )=R(t )+C(t )(Y(t ) Z(t ))C (t ) b b n b n b n n n 0 n b b and Y(t )=y2(t )I + F(t )Y(t )F (t )+a(t )E (t )E (t )+ (t ) (t ) (t ) n+1 n n n 0 n b n f bn 0f n G n Q n G0 n Z(t )=(y2(t ) 1)I + F(t )Z(t )F (t ) F(t )W(t )C (t )R 1(t )C(t )W(t )F (t ) n+1 b n n b n 0 n n n 0 n n n n 0 n +F(t ) (Y(t ) Z(t ))C (t )R 1(t )C(t )W(t )+W(t )C (t )R 1(t )C(t )(Y(t ) Z(t )) F (t ) nb n n 0 n n b n n n 0 n n n n n 0 n ⇣ F(t )W(t )C (t )R 1(t )C(t )(Y(t ) Z(t ))C (t )R 1(t )C(t )W(t )F (t ) ⌘ b n b n 0 n n n n n 0 n n n b n 0 bn

Measurement Updateb (atb the time instant tn )

n(tn)=y(tn) C(tn) xˆ (tn) 1 K(tn)=W(tn) C0(tn) R (tn)

xˆ+(tn)=xˆ (tn)+K(tn) n(tn) Time Update (in the interval t t < t ) n  n+1

xˆ (tn+1)=F(tn)xˆ+(tn)

TABLE 5.2: Robust Continuous-Discrete Kalman Filtering (type 2) Algorithm in Measurement and Time Update Form

The same recursion also propagates the values of Y( ) and Z( ), so that we can evaluate at every · · iteration step an upper bound on the actual error covariance P (tn) for the state estimatex ˆ (tn) b b that is propagated by this robust algorithm.

Simulation results of Robust Continuous-Discrete Kalman Filter (type 1 and 2)

We use two examples to illustrate the improved performance of our robust dynamic state esti- 0.30.2 mations (type 1 and 2) as compared with the standard Kalman filter. We use dA1 = " 0.20.3 # 0.03 0.02 for example 1 (a stable system, see (G.1)), and dA2 = for example 2 (a marginally- " 0.30.03 # dA stable system, see (G.1)). These perturbations results in k A k 30%. Notice that these dA k k ⇡ Chapter 5. Robust Continuous-Discrete Kalman Filter 61 matrices do not commute with the corresponding A matrices.

The resulting empirical error covariance for example 1 is shown in Fig. 5.2 as a function of time (Ts(n)=0.02sec). Steady state is achieved after 30 measurements have been acquired. We notice that the effect of parameter inaccuracy in the standard Kalman filter is an increase of 2.6dB in the average error covariance. Our type 1 robust estimator reduces this added error by about 0.8dB, while the type 2 robust estimator achieves a 0.7dB error reduction. Even though the state estimates generated by our type 1 and type 2 estimators differ from each other (see Fig. 5.2), the level of mean square estimation error is very similar.

The improvement in performance is even more noticeable in example 2 (Fig. 5.3 and Fig. 5.3). Here the steady-state increase in error covariance caused by parameter inaccuracy is 15dB for ⇡ the standard Kalman filter, but only 3.5dB for our type 1 estimator (and 5dB for our type ⇡ ⇡ 2 estimator). This is also evident from the dynamics of state estimates (Fig. 5.3): the type 1 estimate tracks the actual state (for both states), the type 2 estimate tracks only the first state, while the standard Kalman filter takes a much longer time to converge to a steady-state, and suffers from significant overshoots.

In summary, our type 1 estimator offers a somewhat better error reduction capability than the type 2 estimator, and both can significantly outperform the standard Kalman filter in the presence of medium to large system parameter uncertainty. Since the type 2 estimate has a matching upper bound (dashed line in Fig. 5.2 and Fig. 5.3), the same bound can also be used to approximately predict the error covariance level of our type 1 robust state estimator. This upper bound depends only on system parameters, perturbation model parameters (M(tn), E f (tn)) and the measurement instant t : however it does not depend on the actual measured values. Thus, { n} it can be determined in advance (i.e., offline) when the measurement instants are known in advance, e.g., when the sampling interval is uniform and there are no packet drops.

The analysis of App. I indicates that the contribution to state estimation error due to parameter uncertainty depends also on the power level of the actual state vector. Consequently a marginally state system (e.g. example 2), which tends to have a larger state covariance exhibits a more dramatic increase in the error covariance under uncertainty, as compared with a stable system (e.g. example 1) with the same level of parameter inaccuracy.

5.2 Continuous-Discrete Robust Kalman Filter with Known Con- trol Inputs

The technique that we developed in Sec. 5.1, based on the discrete-time procedures of [24] and [25] does not account for the (possible) presence of known control inputs. Although it is rela- tively easy to incorporate known inputs in the formulation of standard Kalman filter, the same Chapter 5. Robust Continuous-Discrete Kalman Filter 62

comparison of 2 methods of RKF 4 actual 3 non−robust 2 robust 2001 robust 2004 1

0 state −1

−2

−3

−4 0 10 20 30 40 50 60 70 80 90 100 iteration

3 actual 2 non−robust robust 2001 robust 2004 1

0 state

−1

−2

−3 0 10 20 30 40 50 60 70 80 90 100 iteration

comparison of 2 methods of RKF 10

9

8

7

6

non−robust 5 robust 2001 robust 2004 Theoretical Upbound

error covariance in dB actual parameters 4

3

2

1

0 0 20 40 60 80 100 120 140 160 180 200 iteration

FIGURE 5.2: Robust continuous-discrete Kalman filter: stable system Chapter 5. Robust Continuous-Discrete Kalman Filter 63

comparison of 2 methods of RKF 30 actual 20 non−robust robust 2001 robust 2004 10

0 state

−10

−20

−30 0 20 40 60 80 100 120 140 160 180 200 iteration

30 actual non−robust 20 robust 2001 robust 2004 10 state 0

−10

−20 0 20 40 60 80 100 120 140 160 180 200 iteration

comparison of 2 methods of RKF 30 non−robust actual parameters robust 2001 robust 2004 Theoretical Upbound

25

20

15 error covariance in dB

10

5

0 0 20 40 60 80 100 120 140 160 180 200 iteration

FIGURE 5.3: Robust continuous-discrete Kalman filter: “marginally-stable” system Chapter 5. Robust Continuous-Discrete Kalman Filter 64 does not hold for the robust state estimator. This is so because the contribution (to the state) from a known input cannot be accurately evaluated in the presence of an unknown model parameter perturbation. Here we to modify the robust continuous-discrete state estimators formulated in the previous section to accommodate known system inputs, and derive the resulting modified robust state estimators.

When we consider known control inputs, the original system equation (5.1a) becomes

x˙(t)=(A + dA)x(t)+Bu(t)+w(t) (5.16) where u(t) is the known control input. The differences between without inputs version and with inputs version are the time update in Kalman filter. In type 1, the filter can be obtained by Table 5.1 with the modification of state estimate time update equation as below.

Ts(n+1) At x (tn+1)=F(tn)x+(tn)+ e Bu(tn+1 t)dt (5.17) Z0 Similarly in type 2, web only haveb to rewriteb the state estimate time update equation in Table 5.2:

Ts(n+1) At x (tn+1)=F(tn)x+(tn)+ e Bu(tn+1 t)dt (5.18) Z0 Using the same twob examples (stableb and marginally-stable) as in Sec. 5.1, we observe that both type 1 and type 2 estimators work well with the added known input term, resulting in an improved performance, as compared with the standard Kalman filter (with known input). Chapter 5. Robust Continuous-Discrete Kalman Filter 65

comparison of 2 methods of RKF 16

14

12 non−robust robust 2001 robust 2004 Theoretical Upbound 10 Actual parameters

8 error covariance in dB

6

4

2

0 0 10 20 30 40 50 60 70 80 90 100 iteration

FIGURE 5.4: Robust continuous-discrete Kalman filter with constant control inputs: stable system

comparison of 2 methods of RKF 45 non−robust actual parameters robust 2001 robust 2004 40 Theoretical Upbound

35

30

25

20 error covariance in dB

15

10

5

0 0 50 100 150 200 250 300 350 400 iteration

FIGURE 5.5: Robust continuous-discrete Kalman filter with constant control inputs: “marginally-stable” system Chapter 5. Robust Continuous-Discrete Kalman Filter 66

5.3 Stability of the Continuous-Discrete Robust Kalman Filter

A well known result about the standard KF states that the error covariance converges to a steady state value when:

t are uniformly-spaced and there are no packet dropouts • { n} the underlying system is detectable and stabilizable • the initial state covariance is positive definite •

Thus a KF can track the divergent trajectory of an unstable system so long as the three conditions are satisfied.

However, this result no longer holds then the system parameters used to construct the KF differ from those of the actual system: in this case, the estimation error e (tn) , x(tn) xˆ (tn) ± ± will diverge, regardless of the level of inaccuracy. Indeed, we show in App. I that the estimation error e (tn) is the output of a (stable) linear time-invariant filter whose input consists of the various differences/perturbations between the actual system and the system model used by the

KF. One of these perturbations is dF(tn)x(tn): this input will, in general, be unbounded when the actual system is unstable, resulting in an unbounded estimation error e (tn) (and similarly for e+(tn)). In summary, the presence of system parameter inaccuracies, however small, makes it impossible to track the (divergent) state of an unstable system. The same conclusion applies also to our robust KF (see Fig. 5.6).

The analysis of App. I makes it clear that stability of the standard KF (as well as of our robust version KF) depends only on two contribution factors:

the magnitudes of the eigenvalues of the modified system matrix F(I KC) • the boundedness of the perturbation input dF(t )x(t ) • n n

The classical result about convergence of the error covariance recursion guarantees that the modified system matrix is “stable”, i.e., all its eigenvalues are strictly within the unit circle. Thus the KF will produce a bounded estimation error when the actual state trajectory x(t ) { n } is bounded. This can happen, for instance, if stabilizing feedback is added to an open-loop unstable system.

We use example 3 (eq. (G.3) in App. G) to illustrate the improvement offered by our robust KF when applied to a feedback-stabilized system. Even though the original F matrix is unstable, the effect of stabilizing feedback is to produce bounded state trajectories. Consequently, both The standard KF and our robust continuous-discrete version achieve bounded estimation error Chapter 5. Robust Continuous-Discrete Kalman Filter 67

(Fig. 5.7). We can see that the level of error covariance in Fig. 5.7 is close to that of Fig. 5.2 and 5.4 because the behavior of the feedback stabilized system is similar to that of our (stable) example 1. In particular, we notice that the 3dB improvement in steady-state error covariance achieved by the robust KF is of the same level of magnitude as the improvement shown in Fig. 5.2 and 5.4.

140 non−robust actual parameters robust

120

100

80

60 error covariance in dB

40

20

0 0 50 100 150 200 250 300 350 400 iteration

FIGURE 5.6: Robust continuous-discrete Kalman filter with unstable system Chapter 5. Robust Continuous-Discrete Kalman Filter 68

10

9

Non−robust

8 Actual Parameters Robust

7

6 error covariance in dB

5

4

3 0 50 100 150 200 250 300 350 400 iteration

FIGURE 5.7: Robust continuous-discrete Kalman filter with unstable system and feedback control

5.4 Robust Delay Mitigating Control

We can now combine our robust KF with the delay mitigation technique of Sec. 3.1. This will result in a further reduction in error covariance as compared with the results achieved in Sec. 3.1 with a standard KF.

Recall that the presence of delay in the control path (Fig. 1.1) may result in loss of stability of the overall system. This is true, in particular, when one of the functions of the control loop is to stabilize an unstable open loop system. Thus the primary objective of our control module is to mitigate the effects of delay in the control path, so as to avoid loss of stability of the closed loop system. The method that we use allows us to overcome very large delay (at least 10-20 times higher than the smallest destabilizing delay in the original control loop) at the cost of increased state estimation error. Our approach to delay mitigation, as outlined in Sec. 3.1 (also [34]) consists of three key ideas:

The estimation center transmits sampled state estimates to a control module attached to • the actuator (see Fig. 1.1).

Each transmitted information packet is time-stamped. This enables the control module to • determine the delay experienced by each information packet using a highly accurate GPS clock. Chapter 5. Robust Continuous-Discrete Kalman Filter 69

The control module uses a prediction step to transform the delayed (and thus outdated) • state estimate it receives into an up-to-date estimate that is then used to generate the de- sired control signal.

We have shown that this approach transmutes communication delay into increased estimation error [34]. However, the level of this increase is quite sensitive to inaccuracies in model param- eters: for instance a 5% inaccuracy can result in a 10dB error increase when a standard Kalman filter is used to provide the state estimate (see Fig. 5.8). This observation motivates our interest in using a robust version of the Kalman filter to reduce the impact of model inaccuracies in the presence of delay. This means we use our robust Kalman filter in both the estimation center and the control module to generate state estimates that are less sensitive to model parameter inac- curacies. This results in a significant performance improvement in the presence of considerable parameter inaccuracy, as compared with our earlier (non-robust) results of [34].

To be specific, we assume that the estimation center transmits a sequence of state estimates, say x(t ), to the control module attached to the actuator. The time instants t , t include, i { 1 2 ···} at least, all the times when x(t) underwent a measurement update. Since each information b packet is time-stamped, the control module receives, at the time instant qn = tn + dn , the pair b x(t ), t , where d denotes the delay in the control path (Fig. 1.1). By comparing the time- { n n} n stamp tn with the physical time of arrival qn , the control module can accurately determine the b value of the delay d = q t . n n n

Notice that the estimate delivered to the control module at time qn refers to a delayed version of the state vector (i.e., at time tn). To remedy this discrepancy, the control module carries out a local time update by dn , viz.,

x(qn)=F(dn)x(tn) (5.19) where x(tn) is the estimateb receivedb (atb qn) in the information packet transmitted by the estima- tion center. Since the robust predictor (5.19) uses a modified version F(dn) of the original model b Ad parameter matrix F(dn)=e n , we need to include in the control module also a portion of the b robust Kalman filter algorithm, which is needed to determine F(dn) from F(dn), G(dn), E f (dn),

Eg(dn), P+(dn), Q(dn) and the internal control parameter l(dn) (see Table 5.1 and 5.2). This b local estimate is then used to construct the control signal u(t)= Kx(t). b We use here the same feedback-stabilized example (example 3)b used in Sec. 3.1 and Sec. 5.3. Recall that when delay is present in the feedback path, the closed-loop system becomes unstable for d 0.18s d . In contrast, using our delay mitigation technique, the closed- , cr loop system remains stable, with delays as high as d = 20s. However, the level of estimation error increases with the increase of delay, as is evident from Fig. 5.8. Notice the significant improvement achieved when we use the robust Kalman filter, instead of the standard one, in both the state estimation center and in the control module: 5dB for 5% inaccuracy and 10dB Chapter 5. Robust Continuous-Discrete Kalman Filter 70 for 25% inaccuracy. Also notice that we still have a relative error covariance of less than 20dB (with parameter inaccuracy of 1% or less) when d = 1s, a delay value that is 5 times higher than the critical delay dcr = 0.18s. The reduction in error covariance achieved with the robust KF is evident in other examples as well. It is still considerable for our “marginally-stable” system (example 2), i.e., one with poles near the jw axis (Fig. 5.9). Similar to our results in Sec. 5.1 and 5.2, the improvement achieved with the robust KF tends to decrease as the eigenvalues of the matrix A move further away from the jw axis into the left half-plane (see Fig. 5.10)

45 no inaccuracy Robust Kalman filter with 1% inaccuracy 40 Kalman filter with 1% inaccuracy Robust Kalman filter with 5% inaccuracy Kalman filter with 5% inaccuracy 35 Robust Kalman filter with 25% inaccuracy Kalman filter with 25% inaccuracy

30

25

20

15 Relative Error Covariance (dB)

10

5

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Delay (sec)

FIGURE 5.8: Robust Kalman Filter with Delay Mitigating Control: Unstable System Chapter 5. Robust Continuous-Discrete Kalman Filter 71

Delay Mitigation with Robust Kalman Filter: close to unstable system 10

9 no inaccuracy Robust Kalman filter with 1% inaccuracy Kalman filter with 1% inaccuracy 8 Robust Kalman filter with 5% inaccuracy Kalman filter with 5% inaccuracy Robust Kalman filter with 25% inaccuracy Kalman filter with 25% inaccuracy 7

6

5

4 Relative Error Covariance (dB)

3

2

1

0 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Delay (sec)

FIGURE 5.9: Robust Kalman Filter with Delay Mitigating Control: “Marginally-Stable” Sys- tem

Delay Mitigation with Robust Kalman Filter: stable system 6 no inaccuracy Kalman filter with 1% inaccuracy Robust Kalman filter with 1% inaccuracy Kalman filter with 5% inaccuracy Robust Kalman filter with 5% inaccuracy Kalman filter with 25% inaccuracy 5 Robust Kalman filter with 25% inaccuracy

4

3 Relative Error Covariance (dB)

2

1

0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 Delay (sec)

FIGURE 5.10: Robust Kalman Filter with Delay Mitigating Control: Stable System Chapter 5. Robust Continuous-Discrete Kalman Filter 72

5.5 Robust Delay Mitigation in Observation Path

Similar to robust delay mitigation control, the same approach can be used also with delay in observation path which requires special treatment, as explained in Sec. 3.2. To be more specific, We demonstrated an Islanded Power System Example in this section to show the performance of our robust Kalman filter algorithm with delay-mitigation method in the presence of parameters uncertainty.

We simulate a (small) islanded power system, similar to the ones used on ships, consisting of:

A synchronous generator, driven by a turbine (or diesel engine) with an iso-synchronous • speed governor and a voltage regulator

A permanent magnet synchronous motor (PMSM), controlled via a variable frequency • drive (VFD), and connected to a marine propulsion system.

The simulation is carried out in terms of the modules shown in Fig. 5.11, and the (dynamic) state variables of the simulated system are:

The generator torque angle d (t) and its first derivative (=angular speed transient) d˙ (t) . • G G The motor speed transient d˙ (t) • M The quadrature and direct components of the motor’s stator current, namely, i (t) and • q id(t)

' $ ' $ Variable Permanent Synchronous Magnet Frequency Generator Synchronous Drive Motor

& % & %

FIGURE 5.11: Block Diagram of Islanded Power System

We measure the speed and stator current of the PMSM, as well as the VFD input current (since its phase depends on the generator torque angle), using sensors with SNR in the range of 10-20 dB. Notice that all the sensors are located in the vicinity of the propulsion motor. We use these measurements to construct estimates of all five state variables, using our continuous- discrete robust Kalman filter, augmented with delay-mitigation method. We introduce a 0.2sec Chapter 5. Robust Continuous-Discrete Kalman Filter 73 delay in the path between the sensors (all co-located at the PMSM) and our dynamic state es- timator and 30% uncertainty into the system matrix of the model. The results (Fig. 5.12) show that the robust KF state estimate can track the real state better than standard KF, in the presence of both parameter uncertainty and delay. In particular the effect of delay is most noticeable at the sharp state transitions, which are typically caused by the step changes of the input signal (in this case the load input power). The effect of parameter uncertainty can be seen clearly in the steady state: our robust KF achieves around 70% reduction in error covariance compared with the standard KF.

Real State Robust State Estimate with delay Standard State Estimate with delay 0.05

0

−0.05 state 1 −0.1

−0.15

−0.2

−0.25 0 100 200 300 400 500 600 700 800 iteration (8sec)

FIGURE 5.12: Robust Kalman Filter with Delay Mitigating Control in Islanded Power System Chapter 6

Resilience Margins of Networked State Estimation

A networked state estimation system may experience various types of perturbations, which can adversely impact the quality of estimation. The ability to accommodate perturbations without excessive degradation in the quality of service is know as resilience. A resilient design of a net- worked estimation system should be able to accommodate single (and possibly even multiple) faults or perturbations while maintaining an acceptable level of performance. In this chapter we provide analytical and computational tools that allow to quantify the margin that separates a given network estimation system from the “red line” or boundary of unacceptably high estima- tion error. In particular we examine in this chapter the relation between resilience margins and the following metrics of perturbation severity: packet drop probability, number of failed sensors, average transmission delay, time-stamp (rms) accuracy, and probability of bad sensor data.

Our preceding analysis has identified three sets of control variables that impact estimation quality:

the average sampling rate of each sensor (F(i) for sensor i) • av sensor selection and/or placement that determines the coefficient vector C , and • i the threshold used for detection of bad data •

In this chapter we demonstrate how these control parameters can be used to ensure the desired resilience margin for a networked state estimation system.

We shall use the term Region of Acceptable Performance (RAP) to denote the set of all sensor (1) (2) (L) sampling rate combinations (Fav ,Fav ...,Fav ) that result in a normalized average error covari- tr(P ,av) ance that satisfies the inequality h, where P is the asymptotic state covariance, which tr(P)  74 Chapter 6. Resilience Margins of Networked Estimation 75 can be calculated from a Lyapunov equation, and the threshold value h can be application- specific. Our upper bound Pupper of (2.10) provides a convenient approximation for P ,av:

tr(Pupper) tr(P ,av) setting h implies that h, so that the corresponding combination of sensor tr(P)  tr(P)  sampling rates is guaranteed to be within the RAP. Once the boundary of the RAP has been determined (or approximated) we can define the resilience margin of a given dynamic state estimator in terms of the level of perturbation (such as packet drop, bad data, communication delay and time-stamp inaccuracy) that can be accommodated without leaving the RAP.

Estimation performance contour plot in regular sampling 20

18

16

14

12

10

8

F2 (regular sampling rate of sensor 2) 6

4

2

0 0 2 4 6 8 10 12 14 16 18 20 F1 (regular sampling rate of sensor 1)

(i) FIGURE 6.1: Contour plot of P ,av vs average sampling rate Fav in regular sampling case: (a) both of sensors are observable Chapter 6. Resilience Margins of Networked Estimation 76

A typical RAP with no perturbation is shown in Fig. 6.1, for the system of example (G.5). Notice that in this example the RAP boundary (which corresponds to h = 0.05) is almost per- fectly straight, indicating that the level of estimation error covariance depends mainly on the (i) combined sampling rate Fav = SFav . The slight curvature of this boundary away from the origin (1) (2) indicates a mild (“second-order”) nonlinear dependence on Fav and Fav . Notice that acceptable performance can be maintained in this example even if one of the sensors fails completely (so that the corresponding Fav becomes zero). This is so because the system in this example remains observable with a single sensor, namely both (C1,A) and (C2,A) are observable. Similar results have been also presented in [9]. A different configuration of sensors is discussed in Sec. 6.1.

In the remainder of this chapter we demonstrated that selecting the sampling rates within the (1) (2) RAP, such as Fav = Fav = 15samples/sec in this example, results in reliable operation even in the presence of packet drops, communication delays and bad data.

6.1 Packet Drops and Sensor Placement

Loss of information packets results in an equivalent drop of the average sampling rate, since in D 1 p this case Tav = 1 p (see App. B), and thus Fav = D . Consequently, one needs higher values 1 of D (the unperturbed sampling rate) in order to achieve the same level of performance. This is evident from Fig. 6.2: increasing the probability “p” of packet drops shifts the RAP boundary further away from the origin.

(1) (2) Notice that our “reliable operation point” of Fav = Fav = 15samples/sec remains within the RAP for p < 0.45, but results in excessive estimation error for higher values of “p”. Thus, higher sampling rates (say 1 = 1 = 20samples/sec) would be needed to achieve acceptable perfor- D1 D2 mance when p = 0.5. These observations provide the rationale for using the term resilience diagram for plots such as Fig. 6.2: the choice 1 = 1 = 15samples/sec is acceptable as long D1 D2 as p < 0.45, which defines the resilience margin associated with this operation point. There are two equivalent ways to define this resilience margin (see Table 6.1):

Divide the combined rate for the proposed operating point (i.e., 15+15=30) by the com- • bined rate for the p = 0 line (which is t 16.3samples/sec).

1 Form the corresponding ratio of 1 p values, i.e., 1 0.45 . •

Both calculations result (approximately) in the ratio 1.83, which we choose as our resilience margin metric. Chapter 6. Resilience Margins of Networked Estimation 77

Estimation performance contour plot of packet drop in regular sampling 40 packet drop probability p=0.5 packet drop probability p=0.2 35 packet drop probability p=0.1 packet drop probability p=0

30

25

20

15

10 F2 (regular sampling rate of sensor 2)

5

0 0 5 10 15 20 25 30 35 40 F1 (regular sampling rate of sensor 1)

(i) FIGURE 6.2: Contour plot of P ,av vs average sampling rate Fav in two-sensor Bernoulli-drop case: (a) both of sensors are observable

p 0 0.45

Fav 16.3 30

TABLE 6.1: Resilience Margin

The ability to maintain stable operation in the presence of failed sensors is a desirable property of a networked state estimator. If r is the smallest number of failed sensors that results in loss of observability, we shall say that the estimator is r-observable. The example shown in Fig. 6.2 is clearly 2-observable. In contrast, the example shown in Fig. 6.3 is 1-observable because the loss of sensor 2 results in loss of observability leading to divergence of the Kalman filter. Notice that stable operation can still be maintained if sensor 1 alone fails.

The main differences between Fig. 6.3 and Fig. 6.2 are:

The RAP boundary curves are no longer straight, and no longer symmetric with respect • to F1 and F2 Chapter 6. Resilience Margins of Networked Estimation 78

The quality of performance still depends on the combined sampling rate F + F , but only • 1 2 when F2 is not too small. In particular F2 = 0 is always outside the RAP: failure of sensor 2 results in divergence of the dynamic state estimator.

These differences imply that networked estimation systems must be r-observable with r = 2. In fact, sensor placement should aim to increase the degree of observability, perhaps to r = 3 or r = 4, in order to achieve resiliency with respect to multiple isolated sensor failures.

Estimation performance contour plot of packet drop in regular sampling: 1−observable case 40 packet drop probability p=0.5 packet drop probability p=0.2 35 packet drop probability p=0.1 packet drop probability p=0

30

25

20

15

10 F2 (regular sampling rate of sensor 2)

5

0 0 5 10 15 20 25 30 35 40 F1 (regular sampling rate of sensor 1)

(i) FIGURE 6.3: Contour plot of P ,av vs average sampling rate Fav in two-sensor Bernoulli-drop case: (b) sensor 2 is observable and sensor 1 is not

6.2 Communication Delay and Time-Stamp Inaccuracy

We have shown in Chapter. 3 that communication delays have a similar effect to packet drops: the level of estimation error increases as the average communication delay increases (similar to the effect of increased packet drop probability). In particular, when all packets experience the Chapter 6. Resilience Margins of Networked Estimation 79 same distribution of delays, regardless of the sensor that generated the packet, we can parame- terize the RAP boundary in terms of the average delay Dav (Fig. 6.4).

Again we observe that for 2-observable systems the RAP boundaries are practically straight, and that increasing Dav shifts the RAP boundary away from the origin. Similarly, we can - (1) (2) tify the resilience margin of any operating point (say Fav = Fav = 15samples/sec) either in terms of the sampling rate ratio (i.e., (15+15)/16.3 t 1.83), or in terms of the delay that would shift the RAP boundary to this operating point (Dav t 0.25sec in this example).

The analysis in Sec. 3.3 has shown that the impact of time-stamp inaccuracy on estimation quality is equivalent to an increase in the average delay. This is illustrated in Fig. 6.5, which is parameterized both in terms of average delay and relative time-stamp inaccuracy.

Estimation performance contour plot of delay in regular sampling 25 delay ∆=0.1sec delay ∆=0.05sec delay ∆=0.02sec delay ∆=0sec

20

15

10 F2 (regular sampling rate of sensor 2)

5

0 0 5 10 15 20 25 F1 (regular sampling rate of sensor 1)

(i) FIGURE 6.4: Contour plot of P ,av vs average sampling rate Fav with communication delay Chapter 6. Resilience Margins of Networked Estimation 80

Estimation performance contour plot of delay in regular sampling 35 delay ∆=0.2sec delay ∆=0.2sec and 2% relative inaccuracy delay ∆=0.2sec and 5% relative inaccuracy delay ∆=0.1sec delay ∆=0.1sec and 2% relative inaccuracy 30 delay ∆=0.1sec and 5% relative inaccuracy delay ∆=0sec

25

20

15 F2 (regular sampling rate of sensor 2)

10

5

0 0 5 10 15 20 25 30 35 F1 (regular sampling rate of sensor 1)

(i) FIGURE 6.5: Contour plot of P ,av vs average sampling rate Fav with communication delay and time-stamp inaccuracy

6.3 Corrupted Sensor Measurements

The occurrence of bad data, as demonstrated in Chapter 4, increases the level of estimation error. In particular, when the probability of bad data occurrence is the same for all sensors, we can parametrize the RAP boundary in terms of this probability p , Pr(H1) (Fig. 6.6).

We use the same 2-observable example (G.5) that we used in sections 6.1 and 6.2, with a fixed bad data distribution (y+3)2 (y 3)2 1 1 1 p(y H )= e 2 + e 2 1 2 p p  2p 2p Chapter 6. Resilience Margins of Networked Estimation 81

We use a Fisher detector with a normalized threshold of q = 2, which corresponds to a false alarm probability PFA = 5%. Again we observe that in this 2-observable example the RAP boundary is essentially straight, and that increasing p shifts the RAP boundary away from the origin. Resilience margin can be quantified either in terms of sampling rate (as in sections 6.1 and 6.2), or in terms of the value of p that shifts the RAP boundary to the system’s operating point, say F1 = F2 = 15samples/sec, which corresponds to p = 0.4.

In contrast to delay mitigation, the detection and elimination of bad data involves a control parameter (the normalized detection threshold q) that has to be selected by the system designer. We provided in Chapter 4 a framework for optimal selection of q, based on a statistical charac- terization of bad data. Simulation results show that when typical bad data magnitudes exceed those of uncorrupted data by 2 standard deviations (or more), setting 2 q 3 tends to produce   estimation quality that is very close to that achieved with an optimal threshold (Fig. 6.7). In other words, a Fisher detector with 2 q 3 is often as good as a Bayesian detector, as we   have already observed in Chapter 4. In any case, using a Fisher detector with a fixed q produces a conservative RAP boundary, providing a lower bound for our resilience margin metric. Chapter 6. Resilience Margins of Networked Estimation 82

Performance with different bad data probability 40 p=0 p=0.1 (threshold=2) p=0.2 (threshold=2) p=0.5 (threshold=2) 35

30

25

20

F2 (sampling rate of sensor 2) 15

10

5

0 0 5 10 15 20 25 30 35 40 F1 (sampling rate of sensor 1)

(i) FIGURE 6.6: Contour plot of P ,av vs average sampling rate Fav with Corrupted Sensor Mea- surements Chapter 6. Resilience Margins of Networked Estimation 83

Performance with different threshold, fixing shift=3, standard deviation=1 and probability=0.1 25 threshold=1 (false alarm=35%) threshold=1.5 (false alarm=15%) threshold=2 (false alarm=5%) threshold=3 (false alarm=0.2%) threshold=4 (false alarm=0.01%) optimal threshold=2.68 (false alarm=0.6%) 20

15

10 F2 (sampling rate of sensor 2)

5

0 0 5 10 15 20 25 F1 (sampling rate of sensor 1)

(i) FIGURE 6.7: Contour plot of P ,av vs average sampling rate Fav with Corrupted Sensor Mea- surements Appendix A

Continuous-Discrete Kalman Filter

Continuous-Discrete Problem Setting

Determine the linear least-squares estimate of x(t), the state vector of a given continuous-time state-space model, viz.,

x˙(t)=A(t) x(t)+B(t) w(t) (A.1) y(tn)=C(tn) x(tn)+hn in terms of the discrete-time measurements y(t ),y(t ),...,y(t ) , where t is the most recent { 0 1 n } n observation instant that precedes “t”, namely, t t < t . This means we wish to form the n  n+1 (continuous-time) state estimate

n xˆ(t)= hk(t)y(tn k) tn t < tn+1 Â k=0  where we select the (possibly time-variant) coefficients hk(t) to minimize the mean-square estimation error E x(t) xˆ(t) 2. Notice that y(t ) is obtained by sampling a sensor at the k k n time instant tn : the variance of the additive measurement noise hn depends on the bandwidth of the sampling device used (which is typically much larger than the sampling rate), as well as the noise level of the sensor. Also notice that in many applications the (irregular) measurement instants t are not known in advance. Consequently, the evaluation of the Kalman gain { n} K(tn) can be carried out only when a measurement becomes available (i.e., at the time instant tn ).

The (possibly time-variant) matrices A(t),B(t),C(tn) are assumed to be known. The (pos- 1 sibly multichannel) disturbance signals nw(t) and hn areo assumed to be mutually uncorrelated,

1Correlated disturbances can also be handled.

84 Appendix A. Continuous-Discrete Kalman Filter 85 and satisfy the relations

E w(t) w⇤(s) = Q(t) d(t s) , E h h⇤ = R d (A.2a) n m n n,m n o n o for all t,s,n,m, and

E w(t) hn⇤ = 0 , for all t,n (A.2b) n o The Continuous-Discrete Kalman Filter

Signals Statistics

Measurement Update (at the time instant tn )

n(tn)=y(tn) C(tn) xˆ (tn) Y(tn)=C(tn) P (tn) C⇤(tn)+Rn 1 xˆ+(tn)=xˆ (tn)+K(tn) n(tn) K(tn)=P (tn) C⇤(tn) Y (tn)

P+(tn)=P (tn) K(tn) Y(tn)K⇤(tn)

Time Update (in the interval t t < t ) n  n+1

xˆ(t)=F(t,tn)xˆ+(tn) P(t)=F(t,tn) P+(tn) F⇤(t,tn) + t ( , ) ( ) ( ) ( ) ( , ) tn F t t B t Q t B⇤ t F⇤ t t dt R

Herex ˆ (tn) lim xˆ(t) denotes the predicted state estimate, i.e., the estimate just prior to ⌘ t tn " applying a measurement update: it is the L-MMSE estimate of the state-vector x(tn) in terms of the measurements y(t0),y(t1),...,y(tn 1) . Also, n(tn) is the innovations signal associated { } with y(tn), viz.,

n(tn) := y(tn) yˆ (tn) wherey ˆ (tn) denotes the L-MMSE of y(tn) in terms of the same (past) measurements. Simi- larly,x ˆ+(tn) lim xˆ(t) denotes the filtered state estimate, i.e., the estimate just after applying ⌘ t tn # a measurement update: it is the L-MMSE estimate of the state-vector x(tn) in terms of the measurements y(t0),y(t1),...,y(tn 1),y(tn) . { } Appendix A. Continuous-Discrete Kalman Filter 86

The state-transition matrix F(t,t) is the solution of the matrix differential equation

d F(t,t)=A(t)F(t,t)(t t) with the initial condition F(t,t)=I dt

A(t t) When A(t) is a constant matrix, F(t,t)=e . Finally, K(tn) is the Kalman gain, and

Y(tn) := E n(tn)n⇤(tn) n o P (t ) := E x(t ) xˆ (t ) x(t ) xˆ (t ) ⇤ + n n + n n + n nh ih i⇤o P (tn) := E x(tn) xˆ (tn) x(tn) xˆ (tn) nh ih i o Notice that the continuous-time estimatex ˆ(t) in, in fact, a predicted estimate for every time instant t in the interval tn < t < tn+1 . This becomes apparent if we opt to focus solely on the measurement update instants t , resulting in a “discrete” version of the time update, namely, { n}

Time Update (at the time instant tn+1)

xˆ (tn+1)=F(tn+1,tn)xˆ+(tn) P (tn+1)=F(tn+1,tn) P+(tn) F⇤(tn+1,tn) + tn+1 ( , ) ( ) ( ) ( ) ( , ) tn F tn+1 t B t Q t B⇤ t F⇤ tn+1 t dt R

Initialization:

The continuous-discrete Kalman filter recursions start with the initial conditions

xˆ (t0)=0 and D P (t0)=P0 = E x(0)x⇤(0) n o

Numerical Considerations

The Kalman filter is prone to accumulation of (finite precision) roundoff errors. In particular, the subtraction step in the recursion P+(tn)=P (tn) K(tn) Y(tn)K⇤(tn) may result in an invalid P+(tn), i.e., one that is not positive-definite. One way to overcome this difficulty is to use the equivalent recursion step (the so-called “Joseph form”)

⇤ P+(tn)= I K(tn)C(tn) P (tn) I K(tn)C(tn) + K(tn) RnK⇤(tn) h i h i which guarantees that P+(tn) remains positive-definite even under finite precision calculations. Appendix B

Statistics of Sampling Patterns

Sampling Patterns Mean Variance Index Kurtosis Moment Generating Function D (1 p)2 (1 p)esD Bernoulli packet drop 1 p p p + 6 fTs (s)= 1 pesD 1 6 k Gamma distribution kQ k k fTs (s)=(1 sQ) 2 Uniformly distributed random D a 6 f (s)=esD sinh(asD) 3 5 Ts asD

87 Appendix C

Averaged Time Update

1 Let matrix A be diagonalizable, say A = TLT , where T is a nonsingular matrix and L = diag l is a diagonal matrix constructed from the eigenvalues of A. Consequently, for any { i} matrix M and for any scalar t, we have

At A t Lt 1 L t Lt L t e Me ⇤ = Te T MT⇤e ⇤ T ⇤ = T e Me˜ ⇤ T ⇤ (C.1) ⇣ ⌘ 1 where M˜ = T MT⇤ and the asterisk ()⇤ denotes Hermitian transpose. Using the diagonal scaling lemma of App. D, we can rewrite eLt Me˜ L⇤t as

T eLt Me˜ L⇤t = vecd(eLt ) vecd(eL⇤t ) M˜ (C.2) ⇢ h i where denotes a Schur-Hadamard product, and el1t el2t Lt 2 3 vecd(e )= . 6 . 7 6 7 6 7 6 elMt 7 6 7 4 5 is a column vector constructed from the diagonal elements of eLt . Notice that [vecd(eL⇤t )]T = Lt [vecd(e )]⇤, so that

Lt L t T Lt Lt vecd(e )[vecd(e ⇤ )] = vecd(e )[vecd(e )]⇤ M = e(ll +lm⇤ )t (C.3) l,m=1 h i Now, if t is random, then

88 Appendix C. Averaged Time Update 89

At A t Lt L t E e Me ⇤ = TE e Me˜ ⇤ T ⇤

n o n Lt o L t T = TE vecd(e ) vecd(e ⇤ ) M˜ T ⇤ M n (l +l )t ⇥ ⇤ o = TE e l m⇤ M˜ T ⇤ l,m=1 nh i M o = T f (l + l ⇤) M˜ T ⇤ (C.4) t l m l,m=1 ⇣⇥ ⇤ ⌘ where we used the moment generating function ft (s) associated with the random variable t, viz., f (s) E est . t , { } In summary, when A is diagonalizable, we have

At A t 1 E e Me ⇤ = T W (T MT⇤) T ⇤ (C.5) t n o h i M where Wt = ft (ll + lm⇤) . l,m=1 h i t As A⇤s Similarly, applying the diagonal scaling lemma to the integral 0 e Qe ds, we obtain R As A s Ls 1 L s e Qe ⇤ = T e T QT ⇤e ⇤ T ⇤

Lns L s T o 1 = T vecd(e ) vecd(e ⇤ ) (T QT ⇤) T ⇤ M n (l +l )s ⇥ ⇤1 o = T e l m⇤ (T QT ⇤) T ⇤ (C.6) l,m=1 nh i o so that

t eAsQeA⇤sds Z0 t M (l +l )s 1 = T e l m⇤ (T QT ⇤) T ⇤ds l,m=1 Z0 nht i M o (l +l )s 1 = T e l m⇤ ds (T QT ⇤) T ⇤ l,m=1 ⇢✓Z0 ◆ h iM (ll +lm⇤ )t e 1 1 = T (T QT ⇤) T ⇤ (C.7) 8" ll + lm⇤ # 9 < l,m=1 =

Applying ensemble averaging: we obtain ;

t As A⇤s 1 E e Qe ds = T[ft (T QT ⇤)]T ⇤ (C.8) ⇢Z0

M 1 where ft = ft (ll + lm⇤) 1 . ll +lm⇤ l,m=1 h i Appendix D

Diagonal Scaling Lemma

Let G be any m n matrix and D , D any diagonal matrices of size m m and n n, respectively. ⇥ 1 2 ⇥ ⇥ Then we have D GDT = vecd(D )[vecd(D )]T G (D.1) 1 2 1 2 n o where vecd(D1) is a column vector consisting of the diagonal elements of the matrix D1 (same as diag(D ) in Matlab), and denotes the Schur-Hadamard product (same as . in Matlab). 1 ⇤

The main utility of the result is in the case when D1 and D2 are random or time-variant, while

G is fixed. When D1 and D2 are real valued, then we have

E D GDT = R G (D.2) 1 2 D1,D2 ⌦ ↵ where RD1,D2 is the averaged cross covariance of vecd(D1) and vecd(D2), namely

T RD1,D2 = E vecd(D1)[vecd(D2)] D n oE When D1 and D2 are complex valued, we have

E D GD⇤ = R G (D.3) h { 1 2}i D1,D2

where RD1,D2 = E vecd(D1)[vecd(D2)]⇤ ⌦ ↵

90 Appendix E

Lemmas

Lemma 1: Let C be any m n matrix and R any m m matrix. Define the matrix valued function def ⇥ ⇥ y(K,X) = FXF + KRK , where F = I + KC and X is n n while K is n m. We have ⇤ ⇤ ⇥ ⇥ 1 miny(K,X)=y(KX ,X), where KX = XC⇤(CXC⇤ +R) and ymin(K,X)=X XC⇤(CXC⇤ + K 1 R) CX.

Proof:

y(K,X)=(I + KC)X(I + KC)⇤ + KRK⇤

= K(CXC⇤ + R)K⇤ + KCX + XC⇤K⇤ + X

=(K K )(CXC⇤ + R)(K K )⇤ X X 1 +(X XC⇤(CXC⇤ + R) CX)

def 1 where KX = XC⇤(CXC⇤ + R) . Since (K KX )(CXC⇤ + R)(K KX )⇤ 0, the mini- def mum of y(K,X) is achieved when K = KX , so that ymin(K,X)=y(KX ,X) = f (X), where def f (X) = X XC (CXC + R) 1CX. ⇤ ⇤ Lemma 2: f (X) is a monotone increasing matrix function.

Proof: For any n n matrices X and Y such that X Y we have f (X)=y(K ,X) y(K ,X) ⇥  X  Y from the result of Lemma 1, also since y(KY ,X) is a monotone increasing function of X, we have y(K ,X) y(K ,Y)= f (Y). Thus we have f (X) f (Y), and f (X) is a monotone increasing Y  Y  matrix function [53].

Lemma 3: f (X) is a concave (=down convex) matrix function.

91 Appendix E. Lemmas 92

Proof: Let X and Y be any two square (n n) matrices, and let Z be their convex combination, ⇥ i.e, Z = aX +(1 a)Y, where a [0,1]. Then 2

f (Z)= y(KZ,Z)=FZZFZ⇤ + KZRKZ⇤

= aF XF⇤ +(1 a)F YF⇤ + aK RK⇤ +(1 a)K RK⇤ Z Z Z Z Z Z Z Z = ay(K ,X)+(1 a)y(K ,Y) Z Z

Using Lemma 1, we have y(K ,X) y(K ,X) and y(K ,Y) y(K ,Y), so that Z X Z Y

f (Z) ay(K ,X)+(1 a)y(K ,Y) X Y = a f (X)+(1 a) f (Y)

Lemma 4: If f (X) is any monotone increasing and down-convex matrix function (where X is a square positive semi-definite matrix) then g(X)=A f (X) has the same property for any positive definite matrix A.

Proof: For any X1,X2

g(X ) g(X )=A f (X ) A f (X )=A ( f (X ) f (X )) 2 1 2 1 2 1 so that if X > X then g(X ) g(X ) > 0, where we used the facts that: (i) f (X) is mono- 2 1 2 1 tone increasing, and (ii) the Schur-Hadamard product of positive-semidefinite matrices is also positive-semidefinite [53]. Next using the linearly property of the Schur-Hadamard product, we find that

g lX +(1 l)X lg(X )+(1 l)g(X ) 1 2 1 2 ⇣ ⌘ h i = A f lX +(1 l)X lA f (X )+(1 l)A f (X ) 1 2 1 2 ⇣ ⌘ h i = A f lX +(1 l)X l f (X )+(1 l) f (X ) 1 2 1 2 ⇢ ⇣ ⌘ h i Since f (X) is concave, it follows that

f lX +(1 l)X l f (X )+(1 l) f (X ) 0 1 2 1 2 ⇣ ⌘ h i Also since A is positive definite, using the Schur-Hadamard product theorem, we conclude that

g lX +(1 l)X lg(X )+(1 l)g(X ) 0 1 2 1 2 ⇣ ⌘ h i Appendix E. Lemmas 93 which establishes the concavity of g(X) . Appendix F

Fundamentals of Binary Hypothesis Testing [48]

The binary decision problem, also known as binary hypothesis testing involves two hypotheses

(=scenarios), usually denoted as H0 and H1. The objective is to decide which hypothesis is true, given a random measurement X (possibly a random vector). A systematic way to reach a decision is to partition the space of all possible measurements into two decision regions, A0 and A , so that we choose hypothesis H when X A , and similarly for A and H . 1 0 2 0 1 1 Bayesian decision rule:

The Bayes criterion provides an explicit cost function, which leads to the optimal decision rule

H0 L(X) 7 z (F.1a) H1 where L(X) is the likelihood ratio function

def p(X H ) L(X) = | 1 (F.1b) p(X H ) | 0 and z is the optimal (Bayesian) threshold, given by

(c c )Pr(H ) z = 01 00 0 (F.1c) (c c )Pr(H ) 10 11 1 Here Pr(H ) is the probability of occurrence for hypothesis H , and c ; i, j = 0,1 are the i i { ij } costs (= penalties) associated with the four decision scenarios, as defined in Table F.1.

Various other decision rules can be obtained as special cases of the Bayesian decision rule (F.1). In particular, the overall probability of error (i.e., the probability of making the wrong decision) is minimized when we set $c_{00} = 0 = c_{11}$ and $c_{01} = 1 = c_{10}$.

              D0 (Decision = H0)    D1 (Decision = H1)
Truth = H0          c00                   c01
Truth = H1          c10                   c11

TABLE F.1: Four decision scenarios and their costs
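To make the Bayesian rule (F.1) concrete, here is a short Python sketch (our own illustration, not part of the thesis) that applies the likelihood-ratio test to an assumed pair of Gaussian hypotheses, using the minimum-probability-of-error costs $c_{00} = c_{11} = 0$, $c_{01} = c_{10} = 1$; the densities and priors are chosen arbitrarily.

```python
import numpy as np
from scipy.stats import norm

# Assumed example: H0 -> N(0, 1), H1 -> N(2, 1), with priors Pr(H0) = 0.8, Pr(H1) = 0.2.
p0, p1 = 0.8, 0.2
c00, c01, c10, c11 = 0.0, 1.0, 1.0, 0.0   # minimum-probability-of-error costs

# Bayesian threshold (F.1c)
zeta = (c01 - c00) * p0 / ((c10 - c11) * p1)

def decide(x):
    """Likelihood-ratio test (F.1a): choose H1 when L(x) > zeta, H0 otherwise."""
    L = norm.pdf(x, loc=2.0, scale=1.0) / norm.pdf(x, loc=0.0, scale=1.0)
    return 1 if L > zeta else 0

for x in (-0.5, 1.0, 1.5, 3.0):
    print(f"x = {x:4.1f}  ->  decide H{decide(x)}")
```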

Neyman-Pearson decision rule:

When the prior probabilities $\Pr(H_0)$, $\Pr(H_1)$ and the costs $\{c_{ij}\}$ are all unknown, we can still use the decision rule (F.1), but the value of the threshold $\zeta$ has to be determined without reliance on these unknown parameters. The decision rule (F.1a) defines decision regions that depend on the value of the threshold $\zeta$, so that the probabilities

\[
P_D = \int_{A_1(\zeta)} p(X \mid H_1)\, dX, \qquad
P_{FA} = \int_{A_1(\zeta)} p(X \mid H_0)\, dX
\]

are both functions of $\zeta$. The Neyman-Pearson Theorem guarantees that decisions based on (F.1) are always Pareto-optimal, namely, this decision rule outperforms any other choice of decision regions. The threshold value $\zeta$ is usually selected to achieve a specified value for the probability of false alarm $P_{FA}$.
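For the same assumed Gaussian pair as in the sketch above, $P_D$ and $P_{FA}$ can be evaluated explicitly as functions of the threshold $\zeta$, since the decision region $A_1(\zeta)$ reduces to a half-line; a minimal sketch (ours):

```python
import numpy as np
from scipy.stats import norm

# Assumed hypotheses: H0 -> N(0, 1), H1 -> N(2, 1); then L(x) = exp(2x - 2) is monotone in x,
# so A1(zeta) = {x : L(x) > zeta} = {x > x0} with x0 = 1 + 0.5*ln(zeta).
for zeta in (0.5, 1.0, 4.0):
    x0 = 1.0 + 0.5 * np.log(zeta)
    P_FA = norm.sf(x0, loc=0.0, scale=1.0)   # integral of p(x|H0) over A1(zeta)
    P_D  = norm.sf(x0, loc=2.0, scale=1.0)   # integral of p(x|H1) over A1(zeta)
    print(f"zeta = {zeta:3.1f}:  P_FA = {P_FA:.3f},  P_D = {P_D:.3f}")
```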

Fisher decision rule:

The most challenging decision scenario occurs when $p(X \mid H_1)$ is not known, typically because $H_1$ is a composite hypothesis, combining many possible scenarios (e.g., "bad data"). Since only $p(X \mid H_0)$ is available in this case, we can no longer rely on the notion of a likelihood ratio. Nevertheless, we can still express the probability of false alarm in terms of the decision region, viz.,

\[
P_{FA} = \int_{A_1} p(X \mid H_0)\, dX \;\equiv\; 1 - \int_{A_0} p(X \mid H_0)\, dX
\]

The decision region $A_1$ is, therefore, selected so as to achieve a desired value for $P_{FA}$.

In particular, when $X$ is one-dimensional (i.e., a scalar) and $p(X \mid H_0)$ is symmetric with respect to the origin, we find that

\[
P_{FA} = 2 \int_{q}^{\infty} p(X \mid H_0)\, dX
\]

which is the area in the tail of the probability distribution $p(X \mid H_0)$ defined by $q$. For instance, in the (zero-mean) Gaussian case, setting $P_{FA} = 0.05$ corresponds to the (Fisher) decision rule "reject $H_0$ if $|X| > 1.96\,\sigma$," where $\sigma$ is the standard deviation of the random variable $X$. Setting $P_{FA} = 0.01$ changes the decision rule to $|X| > 2.58\,\sigma$.
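The Gaussian tail thresholds quoted above can be reproduced numerically; a minimal sketch (ours), using the inverse survival function of the standard normal distribution:

```python
from scipy.stats import norm

def two_sided_threshold(p_fa: float) -> float:
    """Return q such that Pr(|X| > q * sigma) = p_fa for a zero-mean Gaussian X."""
    return norm.isf(p_fa / 2.0)

for p_fa in (0.05, 0.01):
    print(f"P_FA = {p_fa:4.2f}  ->  reject H0 if |X| > {two_sided_threshold(p_fa):.2f} * sigma")
# Prints approximately 1.96 and 2.58, matching the thresholds cited in the text.
```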

Appendix G

Examples

Stable System

\[
A = \begin{bmatrix} -1 & 0 \\ 1 & -1 \end{bmatrix}, \quad
Q = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad R = 1 \tag{G.1}
\]

“Marginally-Stable” System

\[
A = \begin{bmatrix} -0.1 & 0 \\ 1 & -0.1 \end{bmatrix}, \quad
Q = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad R = 1 \tag{G.2}
\]

Unstable System

\[
A = \begin{bmatrix} 1.25 & 0 \\ 1 & 1.1 \end{bmatrix}, \quad
Q = \begin{bmatrix} 20 & 20 \\ 20 & 20 \end{bmatrix}, \quad
C_1 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 2 & 3 \end{bmatrix}, \quad R = 2.5 \tag{G.3}
\]

\[
A = \begin{bmatrix} 1.25 & 1 & 0 \\ 0 & 0.97 & 3 \\ 0 & 0 & 0.6 \end{bmatrix}, \quad
Q = \begin{bmatrix} 20 & 20 & 20 \\ 20 & 20 & 20 \\ 20 & 20 & 20 \end{bmatrix}, \quad
C_1 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 2 & 3 \end{bmatrix}, \quad R = 2.5 \tag{G.4}
\]


Stable System (two sensors both observable)

\[
A = \begin{bmatrix} -1 & 0 \\ 1 & -1 \end{bmatrix}, \quad
Q = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
C_1 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 2 & 3 \end{bmatrix}, \quad R = 1 \tag{G.5}
\]

Stable System (two sensors one unobservable)

\[
A = \begin{bmatrix} -1 & 0 \\ 1 & -1 \end{bmatrix}, \quad
Q = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
C_1 = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 2 & 3 \end{bmatrix}, \quad R = 1 \tag{G.6}
\]
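For reference, the following Python sketch (ours; the sign convention of the reconstructed matrices above is assumed) instantiates the two-state stable system and checks the observability property that distinguishes (G.5) from (G.6):

```python
import numpy as np

# Two-state stable example (as transcribed above)
A  = np.array([[-1.0, 0.0], [1.0, -1.0]])
Q  = np.array([[1.0, 1.0], [1.0, 1.0]])
C1_g5 = np.array([[1.0, 1.0]])   # sensor 1 in (G.5): observable pair
C1_g6 = np.array([[1.0, 0.0]])   # sensor 1 in (G.6): state 2 is unobservable
C2    = np.array([[2.0, 3.0]])

def observability_rank(A, C):
    """Rank of the observability matrix [C; CA; ...; C A^(n-1)]."""
    n = A.shape[0]
    return np.linalg.matrix_rank(np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)]))

print("eigenvalues of A:            ", np.linalg.eigvals(A))           # both at -1 -> stable
print("rank with C1 = [1 1] (G.5):  ", observability_rank(A, C1_g5))   # 2 -> observable
print("rank with C1 = [1 0] (G.6):  ", observability_rank(A, C1_g6))   # 1 -> unobservable
print("rank with C2 = [2 3]:        ", observability_rank(A, C2))      # 2 -> observable
```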

Appendix H

Islanded Power System Example

Simulation model:

We simulate a (small) islanded power system, similar to the ones used on ships, consisting of:

• A synchronous generator, driven by a turbine (or diesel engine) with an iso-synchronous speed governor

• A permanent magnet synchronous motor (PMSM), controlled via a variable frequency drive (VFD), and connected to a marine propulsion system.

The simulation is carried out in terms of the modules shown in Fig. H.1. Only the steady-state responses of the turbine, the VFD and the propeller are simulated at present; the remaining modules are modeled with their transient dynamics. A voltage regulator is assumed (but not simulated) for the generator, so that the generator voltage $V_G(t)$ has constant amplitude but variable phase, which is determined by the torque angle of the generator. A (short) lossless distribution line connects the generator to the VFD.

[Figure H.1: signal-flow block diagram — Turbine with speed governor → Generator (swing) → Variable-frequency drive → PMSM (stator) → PMSM (swing) → Load (propeller), with signals $V_G(t)$, $V_M(t)$, $P_e(t)$, $P_M(t)$, $\omega_G(t)$, $\omega_M(t)$, $\omega_{load}(t)$.]

FIGURE H.1: Block diagram of the simulated power system

Thus the (dynamic) state variables of the simulated system are:

• The generator torque angle $\delta_G(t)$ and its first derivative (= angular speed transient) $\dot{\delta}_G(t)$.

• The motor speed transient $\dot{\delta}_M(t)$.

• The quadrature and direct components of the motor's stator current, namely, $i_q(t)$ and $i_d(t)$.

We measure the speed and stator current of the PMSM, as well as the VFD input current (since its phase depends on the generator torque angle), using sensors with SNR in the range of 10-20 dB. Notice that all the sensors are located in the vicinity of the propulsion motor.
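As an implementation note (ours, not from the thesis), a sensor SNR specified in dB can be converted into an additive measurement-noise level for the simulated sensors; a minimal sketch, assuming the signal power is estimated from the noiseless waveform:

```python
import numpy as np

def add_sensor_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian noise so that the sample SNR of the result equals snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(scale=np.sqrt(noise_power), size=signal.shape)

# Example: a 10 dB measurement of a stepped (per-unit) speed profile
t = np.linspace(0.0, 8.0, 801)
omega = 0.5 + 0.25 * (t > 2.0) + 0.25 * (t > 4.0)   # illustrative half/three-quarter/full speed steps
omega_meas = add_sensor_noise(omega, snr_db=10.0)
```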

Robust Kalman filter with delay-mitigation:

We use these measurements to construct estimates of all five state variables, using a robust continuous-discrete Kalman filter augmented with delay-mitigation capability. We introduce a (relatively long) delay in the path between the sensors (all co-located at the PMSM) and our dynamic state estimator. The results (Fig. H.2) show the time-evolution of each of the five state variables and its delay-mitigated estimate. We change the PMSM/propeller speed (in sharp steps) from half-speed, through three-quarters-speed, to full-speed, and then back again to half-speed. All variables are expressed in per-unit; in particular, the power delivered to the propeller at full speed equals one unit.

Notice that the electrical transients (in $i_q(t)$ and $i_d(t)$) are much shorter than the electro-mechanical transients. The effect of delay on our dynamic state estimates (dashed lines) is noticeable mainly in the presence of step transitions (such as in the estimate of $i_q(t)$). Although practical delays are expected to be shorter than 50 msec, we opted here to use a longer delay, so as to demonstrate the capability of our delay-mitigation technique (which relies on dynamic state prediction), as well as to make the effect of delay easier to discern. We assumed a sensor SNR of 20 dB for measuring the generator and PMSM currents, and 10 dB for measuring the PMSM speed. This level of measurement noise appears to have no visible effect on the quality of state estimation, as evident from the plots.

We apply here 30% uncertainty to the system matrix, and our robust Kalman filter is used to decrease the effect of this parameter uncertainty. The result (Fig. H.2) shows that the robust KF state estimate can track the true state when both parameter uncertainty and delay are present. The tracking is somewhat less accurate during transients, but in steady state it is as good as in the case without uncertainty.

[Figure H.2: five panels ("state 1" through "state 5") plotted versus iteration (800 iterations, 8 sec), each comparing the actual state, the robust KF state estimate with delay, and the standard KF state estimate with delay.]

FIGURE H.2: Robust delay-mitigated dynamic state estimation in the presence of a 0.2 sec delay and 30% parameter inaccuracy

Appendix I

Kalman Filter with Parameter Uncertainty in Unstable System

A Kalman filter with accurate model parameters is stable whether or not the underlying system is stable; in particular, when the state-space model is unstable the state estimate may diverge as the iterations proceed (since it tracks a diverging state), but the state estimation error covariance still converges [54]. This property holds only when the Kalman filter knows the model parameters exactly. If the parameters are uncertain, the Kalman filter may no longer be stable, especially when the system itself is unstable.

For simplicity, we use a discrete-time state-space model and the corresponding Kalman filter to demonstrate this point. Consider a state-space model with uncertainty in the system matrix $F(t_n)$, the input matrix $G(t_n)$ and the control input $u(t_n)$, viz.,

\[
\begin{aligned}
x(t_{n+1}) &= \bigl(F(t_n) + \delta F(t_n)\bigr)\, x(t_n) + \bigl(G(t_n) + \delta G(t_n)\bigr)\bigl(u(t_n) + \delta u(t_n)\bigr) + w_n \\
y(t_n) &= C(t_n)\, x(t_n) + \eta_n
\end{aligned} \tag{I.1}
\]

Since the Kalman filter has no knowledge of the uncertainty, the only parameters it can use are the nominal ones. Thus, at time instant $t_n$, the measurement update of the state estimate $\hat{x}(t_n)$ is

\[
\hat{x}_+(t_n) = \hat{x}_-(t_n) + K(t_n)\bigl(y(t_n) - C(t_n)\,\hat{x}_-(t_n)\bigr)
\]
and the posterior state estimation error $e_+(t_n) \stackrel{\text{def}}{=} x(t_n) - \hat{x}_+(t_n)$ satisfies the relation

\[
\begin{aligned}
e_+(t_n) &= e_-(t_n) - K(t_n)\bigl(C(t_n)\,e_-(t_n) + \eta_n\bigr) \\
         &= \bigl(I - K(t_n)C(t_n)\bigr)\,e_-(t_n) - K(t_n)\,\eta_n
\end{aligned} \tag{I.2}
\]


Comparing the time update

\[
\hat{x}_-(t_{n+1}) = F(t_n)\,\hat{x}_+(t_n) + G(t_n)\,u(t_n)
\]
to the state equation (I.1), we find that the prior state estimation error $e_-(t_n) \stackrel{\text{def}}{=} x(t_n) - \hat{x}_-(t_n)$ satisfies the relation

\[
e_-(t_{n+1}) = F(t_n)\,e_+(t_n) + \delta F(t_n)\,x(t_n) + G(t_n)\,\delta u(t_n) + \delta G(t_n)\bigl(u(t_n) + \delta u(t_n)\bigr) + w_n \tag{I.3}
\]
Combining (I.2) and (I.3), we obtain the recursion

\[
\begin{aligned}
e_-(t_{n+1}) ={}& F(t_n)\bigl(I - K(t_n)C(t_n)\bigr)\,e_-(t_n) - F(t_n)K(t_n)\,\eta_n \\
&+ \delta F(t_n)\,x(t_n) + G(t_n)\,\delta u(t_n) + \delta G(t_n)\bigl(u(t_n) + \delta u(t_n)\bigr) + w_n
\end{aligned} \tag{I.4}
\]

This relation describes a linear time-variant system with output $e_-(t_n)$ and an input that combines contributions from $\eta_n$, $\delta F(t_n)$, $\delta G(t_n)$, $\delta u(t_n)$ and $w_n$ (Fig. I.1).

FIGURE I.1: Structure of Kalman Filter with Parameter Uncertainty

Our objective is to deduce the asymptotic behavior of $e_-(t_n)$ from the properties of the input signals and the linear system that maps these inputs into $e_-(t_n)$. We restrict our discussion to the case where $F$, $G$, $\delta F$, $\delta G$, $C$ are all constant (i.e., time-invariant). In addition, we assume that $\{F, GQ^{1/2}\}$ is controllable and $\{F, C\}$ is detectable. Under these conditions, the Kalman filter converges to a steady state [54, ch. 14]. In particular, $P_+(t_n)$ converges to a positive definite steady-state covariance, which is also the unique positive definite solution of the (discrete-time) algebraic Riccati equation associated with the system. Consequently, the Kalman gain $K(t_n)$ also converges to a steady-state value $K$, and thus the linear system in Fig. I.1 becomes time-invariant in steady state, with a transfer function $[I - z^{-1}F(I - KC)]^{-1}$. The analysis in [54, App. E] implies that the matrix $F(I - KC)$ has all its eigenvalues strictly within the unit circle (notice that $K_{p,z}$ of [54] is equal to $FK$ in our notation). Thus, in steady state, the linear system in Fig. I.1 is BIBO stable.
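This steady-state argument can be checked numerically; the sketch below (ours, with an arbitrarily chosen detectable pair and an unstable $F$) computes the steady-state gain from the discrete-time algebraic Riccati equation and confirms that $F(I - KC)$ has all eigenvalues strictly inside the unit circle:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed example (not from the thesis): unstable F, detectable (F, C), Q > 0.
F = np.array([[1.02, 0.10], [0.00, 0.95]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# The filtering Riccati equation is the control ARE for the dual pair (F^T, C^T).
P = solve_discrete_are(F.T, C.T, Q, R)                 # steady-state prior covariance
K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)           # steady-state Kalman gain

print("eigenvalues of F:        ", np.linalg.eigvals(F))                        # one outside the unit circle
print("eigenvalues of F(I - KC):", np.linalg.eigvals(F @ (np.eye(2) - K @ C)))  # all strictly inside
```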

We now conclude that the output signal $e_-(t_n)$ will be bounded (and hence will have a bounded steady-state covariance $P_+$) when the combined input in Fig. I.1 is bounded. Now $\eta_n$ and $w_n$ are bounded by our modeling assumptions, since the noise covariances $R$ and $Q$ are bounded. Similarly, we assume that all perturbations $\delta F$, $\delta G$, $\delta u$ are also bounded, and so is the known control input $u(\cdot)$. Thus the only possible cause for divergence of $e_-(t_n)$ is the potential lack of boundedness of the input component $\delta F(t_n)\,x(t_n)$.

When the matrix $F + \delta F$ has all its eigenvalues strictly inside the unit circle, the state-space model (I.1) is stable and generates bounded state trajectories $x(\cdot)$, so that $e_-(t_n)$ remains bounded and the Kalman filter converges even in the presence of model parameter inaccuracies. However, if the matrix $F + \delta F$ is unstable, then the state trajectories $x(\cdot)$ diverge, and so $e_-(t_n)$ diverges as well, regardless of the magnitude of $\delta F$.
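To visualize the boundedness argument above, the following Python sketch (ours, with arbitrarily chosen scalar parameters) runs the mismatched filter of (I.1)-(I.4) with a steady-state gain computed from the nominal model, once with $F + \delta F$ stable and once with it unstable:

```python
import numpy as np

def run_error_recursion(F, dF, C=1.0, Q=1.0, R=1.0, steps=200, seed=0):
    """Scalar simulation of (I.1) with a Kalman filter that only knows the nominal F and C."""
    rng = np.random.default_rng(seed)

    # Steady-state gain of the *nominal* model (iterate the scalar Riccati recursion)
    P = 1.0
    for _ in range(500):
        K = P * C / (C * P * C + R)
        P = F * (P - K * C * P) * F + Q
    K = P * C / (C * P * C + R)

    x, x_hat = 1.0, 0.0          # true state and prior estimate
    errors = []
    for _ in range(steps):
        y = C * x + rng.normal(scale=np.sqrt(R))
        x_hat_post = x_hat + K * (y - C * x_hat)           # measurement update (nominal C)
        x = (F + dF) * x + rng.normal(scale=np.sqrt(Q))    # true (perturbed) dynamics
        x_hat = F * x_hat_post                             # time update with the nominal F
        errors.append(x - x_hat)                           # prior error e_-(t_{n+1})
    return np.array(errors)

e_stable   = run_error_recursion(F=0.90, dF=0.05)   # F + dF = 0.95: error stays bounded
e_unstable = run_error_recursion(F=1.02, dF=0.05)   # F + dF = 1.07: error diverges
print("final |e| (stable case):  ", abs(e_stable[-1]))
print("final |e| (unstable case):", abs(e_unstable[-1]))
```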

In summary, the classical results about convergence of the Kalman filter with detectable $\{F, C\}$ and controllable $\{F, GQ^{1/2}\}$ apply only with perfect knowledge of all model parameters. Since some parameter inaccuracy, however small, is always present, the use of a Kalman filter estimator is, in fact, limited to the case of a stable $F$ matrix.

Bibliography

[1] M. Micheli and M.I. Jordan, “Random Sampling of a Continuous-time Stochastic Dynamical System,” Proc. 15th International Symposium on Mathematical Theory of Networks and Systems, Aug. 2002.

[2] A.S. Matveev and A.V. Savkin, “The Problem of State-Estimation via Asynchronous Communication Channels with Irregular Transmission Times,” IEEE Trans. on Automatic Control, Vol. 48, No. 4, pp. 670-676, Apr. 2003.

[3] V. Gupta, T. Chung, B. Hassibi, and R. M. Murray, “On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage,” Automatica, vol. 42, no. 2, pp. 251-260, Feb. 2006.

[4] Y. Xu and J. Hespanha, “Estimation under controlled and uncontrolled communications in networked control systems,” Proc. IEEE Conf. Decision Control, Seville, Spain, Dec. 2005, pp. 842-847.

[5] J.P. Hespanha, P. Naghshtabrizi and Y. Xu, “A Survey of Recent Results in Networked Control Systems,” Proc. IEEE, 95(1), pp. 138-162, Jan. 2007.

[6] P. Antsaklis, J. Baillieul, “Special Issue on Technology of Networked Control Systems,” IEEE Proc., Vol. 95, No. 1, Jan. 2007.

[7] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla and S. Sastry, “Foundations of Control and Estimation Over Lossy Networks,” Proc. IEEE, 95(1), pp. 163-187, Jan. 2007.

[8] H. Zhang, M.V. Basin and M. Skliar, “Ito-Volterra Optimal State Estimation and Continuous, Multivariate, Randomly Sampled, and Delayed Measurements,” IEEE Trans. on Automatic Control, Vol. 52, No. 3, pp. 401-416, March 2007.

[9] X. Liu and A. Goldsmith, “Kalman filtering with partial observation losses,” Proc. IEEE Conf. Decision Control, Nassau, Bahamas, Dec. 2004, vol. 4, pp. 4180-4186.

[10] K. Plarre and F. Bullo, “On Kalman filtering for detectable systems with intermittent observations,” IEEE Trans. Autom. Control, vol. 54, no. 2, pp. 386-390, Feb. 2009.


[11] A. Censi, “On the performance of Kalman filtering with intermittent observations: A geometric approach with fractals,” Proc. Amer. Control Conf., St. Louis, MO, Jun. 2009, pp. 3806-3812.

[12] B. Sinopoli et al., “Kalman Filtering With Intermittent Observations,” IEEE Trans. on Automatic Control, Vol. 49, No. 9, Sept. 2004, pp. 1453-1464.

[13] S. Kar, B. Sinopoli and J. Moura, “Kalman Filtering With Intermittent Observations: Weak Convergence to a Stationary Distribution,” IEEE Trans. Automat. Contr., Vol. 57, No. 2, pp. 405-420, Feb. 2012.

[14] Y. Mo, B. Sinopoli, “A Characterization of the Critical Value for Kalman Filtering with Intermittent Observations,” IEEE Conference on Decision and Control, Cancun, Mexico, 2008, pp. 2692-2697.

[15] S. Kar, B. Sinopoli and J. Moura, “A Random Dynamical System Approach to Filtering in Large-scale Networks,” Proc. 2010 American Control Conference, Baltimore, June 2010, pp. 1027-1034.

[16] Y. Zhang, J. Zhang, X. , X. Guan, “Sensor/Actuator Faults Detection for Networked Control Systems via Predictive Control,” International Journal of Automation and Computing, pp. 173-180, June 2013.

[17] I.R. Petersen and A.V. Savkin, Robust Kalman Filtering for Signals and Systems with Large Uncertainties, Boston, MA, Birkhauser 1999.

[18] L. Xie, Y.C. Soh and C.E. de Souza “Robust Kalman filtering for Uncertain discrete-time systems”, IEEE Trans. Automat. Contr., Vol. 39, pp. 1310-1314, Jun. 1994.

[19] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Dover, 2007.

[20] D.R. Cox and V. Isham, Point Processes, Chapman and Hall, 1980.

[21] A. Gelb, Applied Optimal Estimation, The MIT Press, 1974.

[22] H. W. Sorenson, Kalman Filtering: Theory and Application, IEEE Press, 1985.

[23] M. S. Grewal, A. P. Andrews, Kalman Filtering: Theory and Application, Prentice Hall, 1993.

[24] A. H. Sayed, “A framework for state-space estimation with uncertain models,” IEEE Transactions on Automatic Control, Vol. 46, No. 7, pp. 998-1013, July 2001.

[25] A. Subramanian and A. H. Sayed, “Regularized robust filters for time-varying uncertain discrete-time systems,” IEEE Transactions on Automatic Control, Vol. 49, No. 6, pp. 970-975, June 2004.

[26] A. Gómez-Expósito, A. Abur, A. de la Villa Jaén and C. Gómez-Quiles, “A multilevel state estimation paradigm for smart grids,” Proc. IEEE, Vol. 99, No. 6, pp. 952-976, June 2011.

[27] J. Nilsson, B. Bernhardsson and B. Wittenmark, “Stochastic analysis and control of real-time systems with random time delays,” Automatica, Vol. 34, No. 1, pp. 57-64, Jan. 1998.

[28] W. Wolf, “Cyber-physical Systems”, IEEE Computer, March 2009, pp. 88-89.

[29] A.G. Phadke and J.S. Thorp, Synchronized Phasor Measurements and Their Applications, Springer, 2008.

[30] A.S. Matveev and A.V. Savkin, “Estimation and Control over Communication Networks,” Birkhauser; 2009

[31] H. Zhang, G. Feng and C. “Linear Estimation for Random Delay Systems”, Confer- ence on Decision and Control (CDC), Atlanta, Dec. 2010.

[32] J. , T. and Z. Lei, “Design Time-Stamp Based State Prediction Controller for Networked Control Systems,” WiCOM, Dalian, China, Oct. 2008.

[33] L. Schenato “Optimal Estimation in Networked Control Systems Subject to Random Delay and Packet Drop”, IEEE Tran. on Auto. Control, Vol. 53, No. 7, pp. 1311-1317, Jun. 2008.

[34] H. Lev-Ari, A.M. Stanković and B. Yan, “Estimation and Control in Electric Energy Systems Based on Time-Stamped Signals,” European Control Conference (ECC), Budapest, Hungary, Aug. 2009.

[35] G.B. Giannakis, V. Kekatos, N. Gatsis, S.-J. Kim, H. Zhu and B.F. Wollenberg, “Monitoring and Optimization for Power Grids – A signal processing perspective,” IEEE Signal Processing Magazine, Sept. 2013, pp. 107-128.

[36] B. Yan, H. Lev-Ari and A.M. Stanković, “Robust Continuous-Discrete Kalman Filter for Time-Stamped Delay Mitigation in Networked Estimation and Control Systems,” North American Power Symposium (NAPS), Pullman, WA, Sep. 2014.

[37] J. D. Glover, M. S. Sarma and T. Overbye, “Power System Analysis and Design,” Cengage Learning, 5th edition, 2011.

[38] B. Yan, H. Lev-Ari and A.M. Stanković, “Multi-Sensor Networked Estimation in Electric Power Grids” (invited), 4th International Workshop on Computational Advances in Multisensor Adaptive Processing, San Juan, PR, Dec. 2011.

[39] B. Yan, H. Lev-Ari and A.M. Stanković, “Multi-Sensor Networked State Estimation with Delayed and Irregularly-Spaced Observations,” IEEE Statistical Signal Processing Workshop, Ann Arbor, MI, Aug. 2012.

[40] B. Yan, “Estimation and Control under Communication Network Constraints,” M.S. Thesis, Department of Electrical and Computer Engineering, Northeastern University, Boston, May 2010.

[41] P. Torab and E.W. Kamen, “On Approximate Renewal Models for the Superposition of Renewal Processes,” IEEE International Conference on Communications, 2001.

[42] C.E. De Souza, R. M. Palhares, and P.L.D. Peres, “Robust H∞ filter design for uncertain linear systems with multiple time-varying state delays,” IEEE Trans. Signal Process., Vol. 49, No. 3, pp. 569-576, March 2001.

[43] H. Zhang, L. Xie and W. Wang “Optimal estimation for systems with time-varying delay”, IET Control Theory and Applications, Vol. 4, Iss. 8, pp. 1391-1398, 2010.

[44] D. Ding, Z. Wang, B. , and H. , “H∞ State Estimation for Discrete-Time Complex Networks With Randomly Occurring Sensor Saturations and Randomly Varying Sensor Delays,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 23, No. 5, pp. 725-736, May 2012.

[45] A. M. Sukhov, N. Kuznetsova, “What type of distribution for packet delay in a global network should be used in the control theory,” Networking and Internet Architecture (cs.NI): arXiv:0907.4468, July 2009.

[46] D. Simon, “Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches ,” Wiley-Interscience; 2006

[47] L. Xie, D.H. Choi, S. Kar, and H.V. Poor, “Bad data detection in smart grid: a distributed approach,” in Smart Grid Communications and Networking, E. Hossain, Z. Han and H.V. Poor (Eds.), Cambridge University Press, 2012.

[48] H.V. Poor, An Introduction to Signal Detection and Estimation, Springer Verlag, 1994.

[49] http://en.wikipedia.org/wiki/Kalman_filter

[50] L. El Ghaoui and G. Calafiore, “Robust Filtering for Discrete-Time Systems with Bounded Noise and Parametric Uncertainty,” IEEE Transactions on Automatic Control, Vol. 46, No. 7, pp. 1084-1089, July 2001.

[51] B. Hassibi, A. H. Sayed, T. Kailath, Indefinite-Quadratic Estimation and Control: A Unified Approach to H2 and H∞ Theories, SIAM, 1999.

[52] C. Van Loan, “The Sensitivity of the Matrix Exponential,” SIAM Journal on Numerical Analysis, Vol. 14, No. 6, Dec. 1977.

[53] D.S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory, Princeton University Press, 2005.

[54] T. Kailath, A. H. Sayed and B. Hassibi, Linear estimation, Prentice Hall, 2000.

[55] B. Sinopoli et al., “Cyber Physical Security of a Smart Grid Infrastructure,” Proceedings of the IEEE, Vol. 100, Issue 1, Jan. 2012, pp. 195 - 209.