
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 2, FEBRUARY 2012 275

Distributed Fault Detection and Isolation of Large-Scale Discrete-Time Nonlinear Systems: An Adaptive Approximation Approach

Riccardo M. G. Ferrari, Member, IEEE, Thomas Parisini, Fellow, IEEE, and Marios M. Polycarpou, Fellow, IEEE

Abstract—This paper deals with the problem of designing a distributed fault detection and isolation methodology for nonlinear uncertain large-scale discrete-time dynamical systems. As a divide et impera approach is used to overcome the scalability issues of a centralized implementation, the large-scale system being monitored is modelled as the interconnection of several subsystems. The subsystems are allowed to overlap, thus sharing some state components. For each subsystem, a Local Fault Diagnoser is designed, based on the measured local state of the subsystem as well as the transmitted variables of neighboring states that define the subsystem interconnections. The local diagnostic decision is made on the basis of the knowledge of the local subsystem dynamic model and of an adaptive approximation of the interconnection with neighboring subsystems. The use of a specially designed consensus-based estimator is proposed in order to improve the detectability and isolability of faults affecting variables shared among overlapping subsystems. Theoretical results are provided to characterize the detection and isolation capabilities of the proposed distributed scheme. Finally, simulation results are reported showing the effectiveness of the proposed methodology.

Index Terms—Adaptive estimation, distributed fault detection and isolation, large-scale system, nonlinear systems.

Manuscript received May 26, 2010; accepted August 05, 2011. Date of publication August 15, 2011; date of current version January 27, 2012. This work was supported by the Italian Ministry for University and Research, by Regione Friuli-Venezia-Giulia, and by the Research Promotion Foundation of Cyprus. Recommended by Associate Editor A. Ferrara.
R. M. G. Ferrari is with Danieli Automation S.p.A., Buttrio, (e-mail: [email protected]).
T. Parisini is with Imperial College London, London, U.K., and also with the DI3 University of Trieste (I), Trieste 34127, Italy (e-mail: t.parisini@paperplaza.net).
M. M. Polycarpou is with the KIOS Research Center for Intelligent Systems and Networks, Department of Electrical and Computer Engineering, University of Cyprus, Nicosia 1678, Cyprus (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAC.2011.2164734
0018-9286/$26.00 © 2011 IEEE

I. INTRODUCTION

THE problem of automated fault diagnosis and accommodation is motivated by the need to develop more autonomous and intelligent systems that operate reliably in the presence of system faults. In dynamical systems, faults are characterized by critical and unpredictable changes in the system dynamics, thus requiring the design of suitable fault diagnosis schemes [1]–[3]. Moreover, with current technological trends several systems of practical interest are large-scale and/or physically distributed, and thus the decomposition and spatial distribution of highly demanding computational tasks is of critical importance.

Recently there has been significant research activity in modeling, control and cooperation methodologies for distributed systems (see, for example, [4], and the references cited therein). This activity is motivated by several applications, especially in complex large-scale systems, such as traffic networks, environmental systems, communication networks, power grid networks, water distribution networks, etc. Such systems, although their dynamics and control objectives may appear to be completely different, have some important common characteristics: their dynamics are complex and spatially distributed and, as a result, it is typically more convenient to decompose the system into smaller subsystems which can be more easily controlled and monitored locally (or regionally). The study of controlling spatially distributed systems is not a new problem. As far back as the 1970s, researchers sought to develop so-called "decentralized control" methods [5]. Since then there have been many advancements in the design and analysis of distributed control schemes. On the other hand, much less research activity has been devoted to the problem of designing fault diagnosis schemes specifically for distributed systems.

Due to the complexity of the problem, in practice it is difficult to achieve robust fault diagnosis in large-scale distributed systems within a centralized implementation, mainly because of scalability issues. In fact, a centralized scheme sooner or later may hit one of the two following constraints on the hardware/software architecture used to implement it: limited available computation power for evaluating the fault decision, and limited communication bandwidth for acquiring all the necessary measurements. While considerable effort was aimed at developing distributed fault diagnosis algorithms suited to discrete event systems (see, for instance, [6]), much less attention was devoted to discrete- or continuous-time systems (see [7] on the problem of designing sensor networks for fault-tolerant estimation; [8], [9] on fault tolerance in distributed systems; [10]–[12], which are focused on decentralized fault detection; and [13] on fault consensus in networks of unmanned vehicles).

In previous works [14]–[16], the authors developed some preliminary results on a quantitative distributed fault detection scheme where a large-scale system was decomposed into a set of disjoint subsystems, and the physical interaction between neighboring subsystems was described by uncertain nonlinear functions. A network of Local Fault Diagnosers (LFD) was developed so that each LFD monitored a single subsystem by making use of the measurement of local variables, as well as the value of some interconnection variables communicated by neighboring LFDs. But, apart from this exchange of measurements, the neighboring LFDs were not involved in the process
of deciding whether a fault occurred in a subsystem. In this paper, the above distributed detection scheme is extended to allow cooperation between neighboring LFDs by using overlapping decompositions [17] of the initial large-scale system. In this way, more than one LFD may monitor a single shared variable and collectively decide on the presence of faults influencing it. This is implemented by means of a specially designed consensus-like estimation scheme that may improve the capability of the LFDs to detect a fault with respect to the consensus-less, non-overlapping case.

The novel contribution of the present paper, in the context of discrete-time nonlinear systems, is the generalization and extension of the distributed fault detection scheme presented in [16] so as to include also local and global fault isolation capabilities, thanks to the introduction of specialized Fault Isolation Estimators and a Global Fault Diagnoser (see [18] for some preliminary results). A rigorous characterization of the fault isolation capability of the proposed scheme is given, while a simulated 11-tank system is used throughout the paper to illustrate the decomposition strategy and the modeling of local and distributed faults and, finally, to show the effectiveness of the proposed methodology.

The paper is organized as follows. In Section II, a problem formulation is developed for fault diagnosis of distributed dynamical systems. The design and analysis of a distributed fault detection and isolation architecture is presented in Section III, followed by the detailed development of its detection part in Section IV, and of its isolation part in Section V. Finally, simulation results illustrating the methodology are given in Section VI, while Section VII provides some concluding remarks.

II. BACKGROUND

Let us consider a nonlinear dynamic system $\mathcal{S}$, referred to as the monolithic system and described by the following discrete-time model:

$$\mathbf{x}(t+1) = \mathbf{f}(\mathbf{x}(t),\mathbf{u}(t)) + \boldsymbol{\eta}(\mathbf{x}(t),\mathbf{u}(t),t) + \beta(t-T_0)\,\boldsymbol{\phi}(\mathbf{x}(t),\mathbf{u}(t)) \tag{1}$$

where $t$ is the discrete-time instant, $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{u} \in \mathbb{R}^m$ denote¹ the state and input vectors, respectively, and $\mathbf{f}$ represents the nominal healthy dynamics. Moreover, the function $\boldsymbol{\eta}$ stands for the uncertainty in the state equation and includes external disturbances as well as modeling errors and possibly discretization errors. From a qualitative viewpoint, the term $\beta(t-T_0)\,\boldsymbol{\phi}$ represents the deviation in the system dynamics due to a fault. The term $\beta(t-T_0)$ characterizes the time profile of a fault that occurs at some unknown discrete-time instant $T_0$, and $\boldsymbol{\phi}$ denotes the nonlinear fault function. This formulation (first introduced in [19]) allows both additive and multiplicative faults (since $\boldsymbol{\phi}$ is a function of $\mathbf{x}$ and $\mathbf{u}$), as well as more general nonlinear faults. The fault time profile $\beta$ models incipient faults characterized by an exponential time profile

$$\beta(t-T_0) = \begin{cases} 0 & \text{if } t < T_0 \\ 1 - b^{-(t-T_0)} & \text{if } t \ge T_0 \end{cases} \tag{2}$$

where $b > 1$ denotes the unknown fault-evolution rate (the "abrupt" fault time profile can be obtained as the limit $b \to \infty$ in (2)).

The problem of detecting and isolating faults in nonlinear uncertain systems described by (1) using adaptive approximation methodologies has been addressed in several works in the literature (see, among others, [20], [21] and the references cited therein). In this paper, we consider the design and analysis of an adaptive approximation methodology for the case of large-scale and distributed nonlinear systems for which a centralized Fault Detection and Isolation (FDI) architecture may not be possible or desirable. We decompose the (possibly large) original FDI problem into a number of smaller problems that are easier to solve with the available hardware/software infrastructure. Namely, we consider the decomposition of system $\mathcal{S}$ into $N$ subsystems² $\mathcal{S}_I$, $I \in \{1,\dots,N\}$, each characterized by a local state vector $x_I$, with a separate monitoring agent designed for each $\mathcal{S}_I$.

In order to introduce system decompositions, the system structure is first defined using graph theory [22].

Definition 1: The structure $\mathcal{E}$ of a dynamical system $\mathcal{S}$ having a state vector $\mathbf{x}$ and an input vector $\mathbf{u}$ is the set of ordered pairs $(a, x^{(j)})$ such that $a$, a component of either $\mathbf{x}$ or $\mathbf{u}$, acts on the state component $x^{(j)}$.

Definition 2: The structural graph [17] of a dynamical system $\mathcal{S}$, having a state vector $\mathbf{x}$ and an input vector $\mathbf{u}$, is the directed graph (digraph) $\mathcal{G}$ having the node set $\{x^{(1)},\dots,x^{(n)},u^{(1)},\dots,u^{(m)}\}$ and the arc set $\mathcal{E}$.

The decomposition of the monolithic system is based on decomposing its structural graph. The idea of graph decomposition has been used in many fields [23]. For example, graph decomposition has been used in numerical methods involving the solution of partial differential equations [24]–[27], in image processing [28], in operations research [29] and, of course, in large-scale system decomposition [17], [30]. To decompose a monolithic system described as in (1) and having a structural graph $\mathcal{G}$, we define $N$ subsystems $\mathcal{S}_I$, with $I \in \{1,\dots,N\}$, each one having a local state vector $x_I \in \mathbb{R}^{n_I}$ and a local input vector $u_I \in \mathbb{R}^{m_I}$. These local vectors are constructed by taking components of the monolithic system vectors $\mathbf{x}$ and $\mathbf{u}$, based on ordered sets of indices, called extraction index sets [15], [16], [31]. These sets can be defined by introducing the following extraction mapping.

Definition 3: For each subsystem $\mathcal{S}_I$, its extraction index set $\mathcal{I}_I$ is obtained by means of an extraction mapping $\mathfrak{J}_I : \{1,\dots,n_I\} \to \{1,\dots,n\}$, so that $\mathcal{I}_I \triangleq (\mathfrak{J}_I(1),\dots,\mathfrak{J}_I(n_I))$.

Definition 4: The local state $x_I$ and the local input $u_I$ of a dynamical subsystem $\mathcal{S}_I$, arising from the decomposition of a monolithic system $\mathcal{S}$, are respectively the vectors

$$x_I \triangleq \mathrm{col}\big(x^{(k)} : k \in \mathcal{I}_I\big), \qquad u_I \triangleq \mathrm{col}\big(u^{(k)} : u^{(k)} \text{ acts on } x^{(j)} \text{ for some } j \in \mathcal{I}_I\big)$$

where $\mathcal{I}_I$ is the extraction index set of the $I$-th subsystem.

¹ Here and in the rest of the paper, the use of bold letters indicates that a given quantity is related to the monolithic system.
² In the paper, a capital-case index denotes a specific subsystem.
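As an illustration of Definitions 3 and 4, the following Python sketch builds an extraction mapping from an extraction index set and uses it to extract the local state and the indices of the local input. It is a minimal sketch under stated assumptions: the dictionary `input_acts_on` (for each input index, the set of state indices that input acts on) and all function names are hypothetical, and the local input components are simply listed in increasing index order.

```python
from typing import Callable, Dict, List, Sequence, Set


def extraction_mapping(index_set: Sequence[int]) -> Callable[[int], int]:
    """J_I : {1, ..., n_I} -> {1, ..., n}, where J_I(k) is the k-th entry of I_I."""
    def mapping(k: int) -> int:
        if not 1 <= k <= len(index_set):
            raise ValueError("k must lie in {1, ..., n_I}")
        return index_set[k - 1]
    return mapping


def local_state(x: Sequence[float], index_set: Sequence[int]) -> List[float]:
    """x_I = col(x^(k) : k in I_I); indices are taken in the order they appear."""
    return [x[k - 1] for k in index_set]   # the text indexes from 1, Python from 0


def local_input_indices(input_acts_on: Dict[int, Set[int]],
                        index_set: Sequence[int]) -> List[int]:
    """Indices of the inputs u^(k) acting on at least one component of x_I."""
    local = set(index_set)
    return sorted(k for k, targets in input_acts_on.items() if targets & local)


x = [10.0, 20.0, 30.0, 40.0, 50.0]       # a monolithic state with n = 5
index_set_1 = (3, 1, 2)                  # a hypothetical extraction index set I_1
J1 = extraction_mapping(index_set_1)
print(J1(1))                             # 3
print(local_state(x, index_set_1))       # [30.0, 10.0, 20.0]
input_acts_on = {1: {1}, 2: {4, 5}}      # u^(1) acts on x^(1); u^(2) on x^(4), x^(5)
print(local_input_indices(input_acts_on, index_set_1))   # [1]
```

Keeping the extraction index set as an ordered tuple, rather than a set, is what preserves the component ordering required by the "col" operation.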

It is worth noting that, when performing the "col" operation in the two previous definitions, the elements of the index set are taken in the order in which they appear. According to Definition 4, the local input contains all the input components that affect at least one component of the local state vector. At this point, the structural graph of the $I$-th subsystem can be easily defined as the subgraph induced on $\mathcal{G}$ by the subset made of all the components of $x_I$ together with those of $u_I$.

Definition 5: A decomposition $\mathcal{D}$ of dimension $N$ of the large-scale system $\mathcal{S}$ is a multiset $\{\mathcal{S}_1,\dots,\mathcal{S}_N\}$ made of $N$ subsystems, defined through a multiset $\{\mathcal{I}_1,\dots,\mathcal{I}_N\}$ of index sets, such that for each $I \in \{1,\dots,N\}$ the following axioms hold:
1) $\mathcal{I}_I \neq \emptyset$;
2) a well-posedness condition on the index set $\mathcal{I}_I$ (see the discussion below);
3) the subdigraph of $\mathcal{G}$ induced by the components of $x_I$ must be weakly connected, that is, each component of $x_I$ must act on, or be acted on by, at least another component of $x_I$;
4) $\bigcup_{I=1}^{N} \mathcal{I}_I = \{1,\dots,n\}$.

Axiom 1 prevents the definition of trivial empty subsystems, Axiom 2 is necessary for well-posedness, Axiom 3 prevents the resulting subsystems from having isolated state components, while Axiom 4 requires that the decomposition cover the whole original monolithic system. It is important to note that the above decomposition does not require that $\mathcal{I}_I \cap \mathcal{I}_J = \emptyset$ for any two subsystems $I \neq J$. This allows a state component of $\mathbf{x}$ to be assigned to more than one subsystem, thus leading to an overlapping decomposition [32]. Such decompositions have been found to be useful tools when addressing problems of stability, control and estimation [33], and fault diagnosis [34] for large-scale linear systems.

As a result of overlaps, some components of $\mathbf{x}$ are assigned to more than one subsystem, thus giving rise to the concepts of shared state variable and overlap index set.

Definition 6: A shared state variable is a component $x^{(s)}$ of $\mathbf{x}$ such that $s \in \mathcal{I}_I \cap \mathcal{I}_J$ for some $I \neq J$, $I, J \in \{1,\dots,N\}$, and a given decomposition of dimension $N$.

Definition 7: The overlap index set of a shared variable $x^{(s)}$ is the set $\mathcal{O}_s \triangleq \{I : s \in \mathcal{I}_I\}$, whose cardinality is $N_s$.

In the following, the notation $s_I$, with $I \in \mathcal{O}_s$, will be used to denote the fact that after the decomposition the $s$-th state component of $\mathbf{x}$ became the $s_I$-th component of $x_I$.

Now, we define the interaction (if any) between different subsystems, through the external variables influencing the dynamics of subsystem $\mathcal{S}_I$.

Definition 8: The interconnection variables vector $z_I$ of the subsystem $\mathcal{S}_I$ is the vector

$$z_I \triangleq \mathrm{col}\big(x^{(k)} : k \notin \mathcal{I}_I,\ x^{(k)} \text{ acts on } x^{(j)} \text{ for some } j \in \mathcal{I}_I\big).$$

The set of subsystems acting on a given subsystem through the interconnection vector is the neighbors index set, a concept that naturally leads to the fundamental graph [17], whose nodes represent subsystems and whose arcs represent their interaction through interconnection variables.

Definition 9: The neighbors index set of a subsystem $\mathcal{S}_I$ is the set $\mathcal{N}_I$ of the indices $J \neq I$ such that $\mathcal{I}_J$ contains the index of at least one component of $z_I$.

Definition 10: The fundamental graph of a decomposition $\mathcal{D}$ is the digraph $\mathcal{G}_{\mathcal{D}}$ having the node set $\{\mathcal{S}_1,\dots,\mathcal{S}_N\}$ and whose arc set contains the pair $(\mathcal{S}_J,\mathcal{S}_I)$ if and only if $J \in \mathcal{N}_I$.

Unlike linear systems, for which powerful model decomposition techniques and descriptions exist (see, for instance, the works published in recent years by D'Andrea et al. [4], [35]), for nonlinear systems the decomposition task is much more difficult, and in general it is not possible to devise an additive decomposition into purely local and purely interconnection terms. Therefore, a general decomposition as in [17] is considered:

$$x_I(t+1) = f_I(x_I(t),u_I(t)) + g_I(x_I(t),z_I(t),u_I(t)) + \beta(t-T_0)\,\phi_I(x_I(t),z_I(t),u_I(t)) \tag{3}$$

where $f_I$ is the local nominal function and $g_I$ represents the interconnection function, in which the effects of the local uncertainty term $\eta_I$ have also been incorporated, with $I \in \{1,\dots,N\}$. Furthermore, $u_I$ is the local input (see Definition 4), $z_I$ is the vector of interconnection variables (see Definition 8), and $\phi_I$ is the local fault function. Let us now introduce two assumptions that will be used in the subsequent analysis.

Assumption 1: The fault function is such that the fundamental graph is not modified by the fault event.

Assumption 1 is introduced to simplify the formal analysis; according to this assumption, in the paper we suppose that the possible fault event does not change the system structure by adding new dependencies between variables belonging to different subsystems, so that the local fault function is a function of local and interconnection variables only³. This also means that the neighbors index set $\mathcal{N}_I$ and the interconnection vector $z_I$ do not change structure due to the occurrence of a fault.

Assumption 2: For each $I \in \{1,\dots,N\}$, the state variables $x_I$ and control variables $u_I$ remain bounded before and after the occurrence of a fault, i.e., there exist some stability regions $\mathcal{R}_{x_I}$ and $\mathcal{R}_{u_I}$ such that $(x_I(t),u_I(t)) \in \mathcal{R}_{x_I} \times \mathcal{R}_{u_I}$ for all $t \ge 0$. Finally, the time profile parameter $b$ is unknown but it is lower bounded by a known constant $\bar b$, that is, $b \ge \bar b$.

As a consequence of Assumption 2, for each subsystem $\mathcal{S}_I$ it is possible to define some stability regions $\mathcal{R}_{z_I}$ for the interconnecting variables $z_I$. Since no fault accommodation is considered in this work, the feedback controller acting on the system must be such that the variables $x_I$ and $u_I$ remain bounded for all $t$. Assumption 2 is required for well-posedness, but does not cause a major loss of generality to the proposed FDI scheme. In fact, from a practical perspective, detecting faulty modes characterized by large or even unbounded "magnitudes" typically turns out to be quite an easy task by resorting to limit-checking techniques.

The interconnection function $g_I$ in the decomposition described by (3) includes the uncertainty represented by the term $\boldsymbol{\eta}$. Therefore, in the sequel the following will be needed.

³ However, it is possible for a fault event to remove some of the interconnections, which can be formally represented by setting some $g_I$ function to zero.
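The decomposition concepts of Definitions 5 through 9 can be checked mechanically on a small example. The sketch below is hypothetical: it encodes the structure of Definition 1 as a dictionary `acts_on`, mapping each state index to the set of state indices acting on it, checks Axioms 1, 3 and 4 (Axiom 2 is left out), and computes the overlap index sets, the local position $s_I$ of a shared variable, the indices of the interconnection vector $z_I$ and the neighbors index sets. Function names and the five-state chain are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical encoding of the structure (Definition 1):
# acts_on[j] = set of state indices k such that x^(k) acts on x^(j).
Structure = Dict[int, Set[int]]
IndexSets = Sequence[Sequence[int]]   # I_1, ..., I_N (subsystems numbered from 1)


def weakly_connected(acts_on: Structure, index_set: Sequence[int]) -> bool:
    """Axiom 3: the induced subdigraph is connected once arc directions are ignored."""
    nodes = set(index_set)
    adj: Dict[int, Set[int]] = {k: set() for k in nodes}
    for j in nodes:
        for k in (acts_on.get(j, set()) & nodes) - {j}:
            adj[j].add(k)
            adj[k].add(j)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


def is_decomposition(acts_on: Structure, index_sets: IndexSets, n: int) -> bool:
    """Check Axioms 1, 3 and 4 of Definition 5 (Axiom 2 is not modeled here)."""
    if any(len(I) == 0 for I in index_sets):                          # Axiom 1
        return False
    if not all(weakly_connected(acts_on, I) for I in index_sets):     # Axiom 3
        return False
    return set().union(*map(set, index_sets)) == set(range(1, n + 1))  # Axiom 4


def overlap_index_sets(index_sets: IndexSets) -> Dict[int, Set[int]]:
    """O_s for every shared variable x^(s) (Definitions 6 and 7)."""
    owners: Dict[int, Set[int]] = {}
    for I, index_set in enumerate(index_sets, start=1):
        for s in index_set:
            owners.setdefault(s, set()).add(I)
    return {s: O for s, O in owners.items() if len(O) > 1}


def local_position(index_sets: IndexSets, s: int, I: int) -> int:
    """s_I: the (1-based) position that x^(s) takes inside x_I."""
    return list(index_sets[I - 1]).index(s) + 1


def interconnection_indices(acts_on: Structure, index_set: Sequence[int]) -> List[int]:
    """Indices of z_I: non-local states acting on some local state (Definition 8)."""
    local = set(index_set)
    return sorted({k for j in local for k in acts_on.get(j, set())} - local)


def neighbors(acts_on: Structure, index_sets: IndexSets, I: int) -> Set[int]:
    """N_I: subsystems J != I that own some component of z_I (Definition 9)."""
    z = set(interconnection_indices(acts_on, index_sets[I - 1]))
    return {J for J, index_set in enumerate(index_sets, start=1)
            if J != I and z & set(index_set)}


# Five-state chain x1 - x2 - x3 - x4 - x5, split into two subsystems sharing x3.
acts_on = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
index_sets = [(1, 2, 3), (3, 4, 5)]
print(is_decomposition(acts_on, index_sets, n=5))   # True
print(overlap_index_sets(index_sets))               # {3: {1, 2}}
print(local_position(index_sets, 3, 2))             # 1
print(interconnection_indices(acts_on, (1, 2, 3)))  # [4]
print(neighbors(acts_on, index_sets, 1))            # {2}
```

In this toy decomposition $x^{(3)}$ is the only shared variable, with $\mathcal{O}_3 = \{1, 2\}$, and each subsystem is the only neighbor of the other.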

Fig. 2. Scheme of the proposed DFDI architecture concerning the same three subsystems of Fig. 1.

Assumption 3: The interconnection function $g_I$ is an unstructured and uncertain nonlinear function, whose $k$-th component is bounded by some known function⁴, i.e.,

$$\big|g_I^{(k)}(t)\big| \le \bar g_I^{(k)}(t), \qquad k \in \{1,\dots,n_I\} \tag{4}$$

where the bounding function $\bar g_I^{(k)}$ is known and bounded for all $(x_I, z_I, u_I) \in \mathcal{R}_{x_I} \times \mathcal{R}_{z_I} \times \mathcal{R}_{u_I}$.

A. Example

To gain more insight into the decomposition approach described above, consider the example depicted in Fig. 1, where a specific decomposition of a system into three overlapping subsystems $\mathcal{S}_1$, $\mathcal{S}_2$ and $\mathcal{S}_3$ is considered. The example of Fig. 1 deals with an 11-tank system, which will be reconsidered in the simulation Section VI. In this decomposition, each subsystem $\mathcal{S}_I$, $I \in \{1,2,3\}$, is described by its local state $x_I$, its local input $u_I$ and its interconnection variables $z_I$; furthermore, two of the state components are shared variables between overlapping subsystems.

Fig. 1. (a) Example of decomposition of a system into three overlapping subsystems $\mathcal{S}_1$, $\mathcal{S}_2$ and $\mathcal{S}_3$, and (b) the corresponding fundamental graph.

III. DISTRIBUTED FAULT DETECTION AND ISOLATION ARCHITECTURE

The backbone of the proposed Distributed Fault Detection and Identification (DFDI) architecture is made of $N$ communicating Local Fault Diagnosers $\mathcal{L}_I$, which are devoted to monitoring each of the $N$ subsystems. The LFDs generate a fault decision regarding the mode of behavior (healthy or one among possible faulty modes) of the corresponding subsystem $\mathcal{S}_I$. These decisions are gathered by a higher-level agent, referred to as the Global Fault Diagnoser (GFD), in order to coordinate the LFDs and formulate a fault decision about the health of the global system $\mathcal{S}$. Fig. 2 shows in pictorial form the structure of the DFDI architecture. The various parts of the architecture are arranged in three layers: the first layer is constituted by the physical subsystems, the second layer is made up of the local fault diagnosers, while the third one contains the global fault diagnoser. The different types of arrows highlight the different interactions between the parts of the architecture: physical interactions in the first layer, consistent information flows between layers one and two and between parts of layer two, while sporadic communication between the second and the third layer is illustrated by dashed arrows.

Following the fault isolation formulation proposed in [21], for isolation purposes we assume that for the global system there exists a global fault set $\mathcal{F}$ containing $N_{\mathcal{F}}$ possible nonlinear fault functions $\boldsymbol{\phi}_l$, $l \in \{1,\dots,N_{\mathcal{F}}\}$. Following the decomposition and based on Assumption 1, the introduction of the global fault set leads to the existence, for each subsystem $\mathcal{S}_I$, of a local fault set $\mathcal{F}_I$ containing $N_{\mathcal{F}_I}$ known types of possible nonlinear fault functions⁵ $\phi_{I,l}$, $l \in \{1,\dots,N_{\mathcal{F}_I}\}$. Thus, each LFD provides a fault decision regarding the health of the corresponding subsystem $\mathcal{S}_I$ by relying on a set of nonlinear adaptive estimators of the local state $x_I$. The first estimator, called Fault Detection Approximation Estimator (FDAE), is based on the nominal model (3) and is used for fault detection. The remaining estimators, called Fault Isolation Estimators (FIEs), make up a bank of estimators used to determine which of the possible faults in the set $\mathcal{F}_I$ has occurred. Under normal operating conditions (that is, from $t = 0$ until a fault is detected) the FDAE is the only estimator that each LFD employs. After a fault is detected by any of the LFDs, the GFD in response triggers the switch of each LFD from the fault detection to the fault isolation operating mode. In the latter mode, each LFD activates its own bank of FIEs in order to try to locally isolate the fault that occurred, by employing a kind of Generalized Observer Scheme (GOS) (see [36], [37]). The local fault decisions of the LFDs are communicated to the GFD, allowing it to determine which one of the faults in the global set $\mathcal{F}$, if any, affects the system (see Section V and Algorithm 1).

In the DFDI scheme, we assume that every LFD takes uncertain measurements of $x_I$ according to $y_I(t) = x_I(t) + \xi_I(t)$, where $\xi_I$ is an unknown term characterizing the measurement error associated with the process of measuring $x_I$ by each LFD (we assume $u_I$ to be perfectly available). Moreover, each LFD communicates with the neighboring LFDs in $\mathcal{N}_I$ in order to fill the interconnection vector (see the example in Fig. 2). Due to the uncertain state measurements, instead of receiving the actual interconnection vector $z_I$, each LFD receives from its neighbors the vector $v_I(t) = z_I(t) + \varsigma_I(t)$, where $\varsigma_I$ is made of the components of the measurement errors $\xi_J$, $J \in \mathcal{N}_I$, affecting the relevant components of the measurements $y_J$.

Assumption 4: The measuring uncertainties represented by the vectors $\xi_I$ and $\varsigma_I$ are unstructured and unknown but, for each $I$ and for each $t$, the components of $\xi_I$ and of $\varsigma_I$ are bounded, respectively, as

$$\big|\xi_I^{(k)}(t)\big| \le \bar\xi_I^{(k)}, \qquad \big|\varsigma_I^{(k)}(t)\big| \le \bar\varsigma_I^{(k)} \tag{5}$$

where $\bar\xi_I^{(k)}$ and $\bar\varsigma_I^{(k)}$ are known positive scalars. Hence, it is possible to define a priori two compact regions of interest $\mathcal{Y}_I$ and $\mathcal{V}_I$ such that $y_I(t) \in \mathcal{Y}_I$ and $v_I(t) \in \mathcal{V}_I$.

Under the assumptions made so far, a shared variable $x^{(s)}$ and the interconnection part of the local model (3) are measured and modeled by distinct LFDs in the overlap set $\mathcal{O}_s$ with distinct uncertainties. Following this consideration, a cooperation mechanism between the LFDs in the overlap set will be devised in the sequel.

IV. DISTRIBUTED FAULT DETECTION

After the DFDI algorithm is initialized at $t = 0$ by turning on each $I$-th LFD, only its FDAE estimator is enabled and monitors the subsystem $\mathcal{S}_I$, providing an estimate $\hat x_I$ of the local state $x_I$. The difference between the measurements $y_I$ and the estimate $\hat x_I$ yields the estimation error $\epsilon_I(t) \triangleq y_I(t) - \hat x_I(t)$, which plays the role of a residual and will be compared, component by component, with a suitable detection threshold⁶ $\bar\epsilon_I$. The following condition:

$$\big|\epsilon_I^{(k)}(t)\big| \le \bar\epsilon_I^{(k)}(t), \qquad \forall k \in \{1,\dots,n_I\} \tag{6}$$

is associated with the fault-free hypothesis

$$\mathcal{H}_I : \text{"the subsystem } \mathcal{S}_I \text{ is healthy"} \tag{7}$$

By this, we mean that (6) is a necessary (but generally not sufficient) condition for (7), so that, should condition (6) be violated at some time instant $t$, the hypothesis $\mathcal{H}_I$ is falsified and the so-called local fault detection signature is generated, thus leading to a local fault detection decision. In qualitative fault diagnosis schemes, such as [38], the fault signature is defined as a symbolic vector that qualitatively describes the behavior of residuals and their derivatives after the occurrence of a fault. In quantitative schemes, such as [1], [2], [36], the fault signature instead represents the pattern of residuals that exhibit abnormal behavior after the occurrence of a fault.

Definition 11: The local detection signature associated with the subsystem $\mathcal{S}_I$ at the discrete-time instant $t$ is the index set

$$S_I(t) \triangleq \big\{k \in \{1,\dots,n_I\} : |\epsilon_I^{(k)}(t)| > \bar\epsilon_I^{(k)}(t)\big\} \tag{8}$$

Definition 12: The fundamental detection signature associated with the system $\mathcal{S}$ at the discrete-time instant $t$ is the index set

$$S(t) \triangleq \big\{I \in \{1,\dots,N\} : S_I(t) \neq \emptyset\big\} \tag{9}$$

Now, the local fault detection logic for the $I$-th LFD can be stated in terms of the local detection signature $S_I(t)$. Specifically, a fault affecting the $I$-th subsystem is detected by its LFD at the first discrete-time instant such that $S_I(t)$ becomes non-empty. This discrete-time instant is called the local fault detection time $T_{d,I}$, formally defined as follows.

Definition 13: The local fault detection time is defined as $T_{d,I} \triangleq \min\{t : S_I(t) \neq \emptyset\}$.

Finally, the fault detection time is simply defined as the earliest among the local detection times.

Definition 14: The fault detection time is defined as $T_d \triangleq \min_{I \in \{1,\dots,N\}} T_{d,I}$.

This formalizes the fact that in the proposed architecture the event of an LFD detecting a fault is immediately relayed to the GFD, which computes the fundamental detection signature $S(t)$ and sets $T_d$ as the earliest discrete-time instant at which it becomes non-empty. Then, the GFD immediately informs every LFD that a fault has been detected and that the isolation mode, introduced in Section III and further described in Section V, should be activated.

Remark 1: The communication between the LFDs and the GFD required to implement the DFDI architecture is event-driven, that is, only events such as the detection or isolation of a fault are communicated through the channels depicted as dashed arrows in Fig. 2. As this kind of exchanged information is limited to simple boolean values, even if the communication between the LFDs and the GFD follows a one-to-all pattern, scalability should not be an issue in practice.

A. Local Fault Detection and Approximation Estimator

The local FDAE is a nonlinear adaptive estimator based on the subsystem model (3), which (as in [16] for the continuous-time case) generalizes to the distributed context the fault diagnosis methodology presented in [21].

First of all, the simpler case of a non-shared state variable is addressed. The estimate of the $k$-th component is computed as

$$\hat x_I^{(k)}(t+1) = \lambda\big(\hat x_I^{(k)}(t) - y_I^{(k)}(t)\big) + f_I^{(k)}\big(y_I(t),u_I(t)\big) + \hat g_I^{(k)}\big(y_I(t),v_I(t),u_I(t),\hat\theta_I(t)\big) \tag{10}$$

where $0 < \lambda < 1$. The term $\hat g_I^{(k)}$ is the $k$-th output of an adaptive approximator designed to learn the unknown interconnection function $g_I$, and $\hat\theta_I \in \Theta_I$ denotes its adjustable parameter vector, with $\Theta_I$ being a compact set⁷. As in [16], in this paper we assume that $\hat g_I$ represents a linear-in-the-parameters (but otherwise nonlinear) multivariable approximation model, such as neural networks, fuzzy logic networks, polynomials, spline functions, wavelet networks, etc.

⁴ In the paper, when there is no risk of ambiguity and for the sake of simplicity, a compact notation such as $g_I(t) \triangleq g_I(x_I(t), z_I(t), u_I(t))$ is used.
⁵ The global and the local fault sets are described in detail in Section V.
⁶ To be defined in (20).
⁷ For the sake of simplicity, we assume $\Theta_I$ to be an origin-centered hypersphere with radius $M_\Theta$ (see [21] for some remarks on this geometrical simplification).
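The detection logic of Section IV can be sketched for a subsystem whose state components are all non-shared. The code below is a minimal sketch that assumes the estimator has the form $\hat x_I(t+1) = \lambda(\hat x_I(t) - y_I(t)) + f_I(y_I(t),u_I(t)) + \hat g_I(\cdot)$ with $0 < \lambda < 1$, that the residual is $\epsilon_I = y_I - \hat x_I$, and that the local signature collects the components violating (6). The adaptive approximator is replaced by a fixed placeholder (no parameter update), a constant threshold stands in for the adaptive threshold (20), and the scalar plant, the fault function and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Vector = List[float]


def fault_time_profile(t: int, t0: int, b: float) -> float:
    """Incipient fault profile (2): 0 for t < T0, 1 - b**(-(t - T0)) for t >= T0 (b > 1)."""
    return 0.0 if t < t0 else 1.0 - b ** (-(t - t0))


def fdae_step(x_hat: Vector, y: Vector, f_val: Vector, g_val: Vector, lam: float) -> Vector:
    """x_hat(t+1) = lam*(x_hat(t) - y(t)) + f_I(y(t), u(t)) + g_hat_I(y(t), v(t), u(t))."""
    return [lam * (xh - yk) + fk + gk for xh, yk, fk, gk in zip(x_hat, y, f_val, g_val)]


def local_signature(eps: Vector, eps_bar: Vector) -> Set[int]:
    """Local detection signature: 1-based components violating condition (6)."""
    return {k + 1 for k, (e, eb) in enumerate(zip(eps, eps_bar)) if abs(e) > eb}


def run_fdae(ys, us, vs, f, g_hat, eps_bar, lam, x_hat0) -> Tuple[List[Vector], Optional[int]]:
    """Run one LFD's FDAE; return the residuals eps_I(t) and T_{d,I} (None if no detection)."""
    x_hat = list(x_hat0)
    residuals: List[Vector] = []
    t_detect: Optional[int] = None
    for t, (y, u, v) in enumerate(zip(ys, us, vs)):
        eps = [yk - xh for yk, xh in zip(y, x_hat)]   # eps_I(t) = y_I(t) - x_hat_I(t)
        residuals.append(eps)
        if t_detect is None and local_signature(eps, eps_bar(t)):
            t_detect = t                                # Definition 13
        x_hat = fdae_step(x_hat, y, f(y, u), g_hat(y, v, u), lam)
    return residuals, t_detect


def fault_detection_time(local_times: Sequence[Optional[int]]) -> Optional[int]:
    """Definition 14: the earliest local detection time among the LFDs."""
    detected = [t for t in local_times if t is not None]
    return min(detected) if detected else None


# Hypothetical scalar subsystem: x(t+1) = 0.5 x(t) + u(t) + beta(t - T0) * phi, with phi = 1.
T0, B, STEPS = 3, 2.0, 8
xs: List[Vector] = [[0.0]]
for k in range(STEPS):
    xs.append([0.5 * xs[-1][0] + 1.0 + fault_time_profile(k, T0, B) * 1.0])
ys = xs                          # noise-free measurements (xi_I = 0)
us = [[1.0] for _ in ys]
vs = [[] for _ in ys]            # no interconnection variables in this toy example

residuals, t_d1 = run_fdae(
    ys, us, vs,
    f=lambda y, u: [0.5 * y[0] + u[0]],   # local nominal model f_I
    g_hat=lambda y, v, u: [0.0],          # placeholder approximator (no adaptation)
    eps_bar=lambda t: [0.2],              # hypothetical constant threshold
    lam=0.5, x_hat0=[0.0])
print(t_d1)                                   # 5
print(fault_detection_time([t_d1, None]))     # 5
```

With $b = 2$ the fault profile (2) is still zero at $t = T_0 = 3$, so the residual first exceeds the threshold at $t = 5$; this illustrates the detection delay that an incipient fault can introduce.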

It is important to emphasize the differences between the present approach and the one described in [21] for the centralized case. Whilst in [21] the adaptive approximator is devoted to learning the fault function after the detection of a fault, in the present case the adaptive approximator starts learning the uncertain interconnection function from the very beginning, in order to facilitate more accurate and faster detection. It is worth noting that, to implement (10), the $I$-th LFD needs only to receive from its neighbors the values of the variables making up the interconnection vector $v_I$. In order for $\hat g_I$ to learn the interconnection function $g_I$, the parameter vector $\hat\theta_I$ is updated according to a learning law built on the gradient matrix $H_I(t)$, a time-varying learning rate $\gamma_I(t)$ and a projection operator $\mathcal{P}_{\Theta_I}$, where $H_I \triangleq \partial \hat g_I / \partial \hat\theta_I$ denotes the gradient matrix of the online approximator with respect to its adjustable parameters. $\mathcal{P}_{\Theta_I}$ is a projection operator [39] restricting $\hat\theta_I$ within $\Theta_I$ according to:

$$\mathcal{P}_{\Theta_I}(\theta) = \begin{cases} \theta & \text{if } |\theta| \le M_\Theta \\ \dfrac{M_\Theta}{|\theta|}\,\theta & \text{if } |\theta| > M_\Theta \end{cases}$$

The learning rate is computed at each step as $\gamma_I(t) \triangleq \mu_I / (c_I + \|H_I(t)\|_F^2)$, where $\|\cdot\|_F$ denotes the Frobenius norm, while $c_I$ and $\mu_I$ are design constants that guarantee the stability of the learning law [39], [40].

The case of a variable of the original centralized system that, after the decomposition, is shared among more than one LFD (see the simple example shown in Fig. 1) is more complicated. Clearly, one option is for each LFD to just implement its own version of (10), by using the measurement $y_I^{(s_I)}$, the local model $f_I^{(s_I)}$ and the $s_I$-th component of the adaptive interconnection approximator. Instead, in order to take advantage of the redundancy introduced by the overlap, and motivated by the encouraging practical results shown in [16], in this paper we use a deterministic consensus scheme between the LFDs in $\mathcal{O}_s$ so that their FDAEs cooperate towards the estimation of the shared state variable $x^{(s)}$. The proposed consensus protocol leads to the following FDAE dynamic equation:

(11)

where the additional terms with respect to (10) appearing in the third line smooth out the differences between the various estimates of the shared variable, and those in the last two lines average the various local functions and approximated interconnection functions. It is important to note that, in order to implement (11), the $I$-th LFD does not need information about the expressions of $f_J$ and of $\hat g_J$, $J \neq I$; instead, it suffices that each LFD $J \in \mathcal{O}_s$ computes locally its own term involving $f_J^{(s_J)}$ and $\hat g_J^{(s_J)}$ and communicates it to the other LFDs, according to a suitable communication graph $\mathcal{G}_s$, alongside its actual state estimate $\hat x_J^{(s_J)}$. Specifically, for the sake of generality, we assume a generic communication graph $\mathcal{G}_s$, which may include all-to-all communication as a special case. Bearing this in mind, (11) can be rewritten in a more compact form in terms of a weight matrix $W^{(s)}$:

(12)

The term $W^{(s)}$ is a weighted adjacency matrix reflecting the way the various LFDs estimating the same shared variable communicate with each other. In this work, only doubly stochastic adjacency matrices are considered [41]. For example, we may consider the Metropolis adjacency matrices [42], [43], defined as

$$W^{(s)}_{I,J} = \begin{cases} \dfrac{1}{1+\max\{d_I, d_J\}} & \text{if } I \neq J \text{ and } I, J \text{ are adjacent in } \mathcal{G}_s \\ 1 - \sum_{K \neq I} W^{(s)}_{I,K} & \text{if } I = J \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

where $d_I$ is the degree of the $I$-th node in $\mathcal{G}_s$.

Remark 2: Requiring the matrix $W^{(s)}$ to be doubly stochastic is a standard assumption in many problems of distributed control and estimation. As previously said, there exist simple weight selection schemes, such as the Metropolis or the Maximum-degree [42] ones, that guarantee double stochasticity (see [44] for further details).

Before the occurrence of a fault (i.e., for $t < T_0$), the dynamics of the LFD estimation error component $\epsilon_I^{(s_I)}$ can be derived from (11). Since the fault term is absent for $t < T_0$, the estimation error dynamics satisfies

(14)

where suitable scalar quantities are introduced; accordingly, the corresponding vectors are defined.
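The Metropolis rule (13) and the doubly stochastic requirement of Remark 2 can be illustrated with a short sketch. It assumes an undirected communication graph $\mathcal{G}_s$, given as symmetric neighbor sets over the LFDs in $\mathcal{O}_s$; the helper names are hypothetical. The last lines apply the weights as a plain averaging iteration, which shows why double stochasticity matters: the average of the exchanged values is preserved while the LFDs converge to it. This is a generic consensus illustration, not the full FDAE of (11).

```python
from typing import Dict, Set

Weights = Dict[int, Dict[int, float]]


def metropolis_weights(neighbors: Dict[int, Set[int]]) -> Weights:
    """Metropolis weights for an undirected communication graph G_s.

    neighbors[I] holds the LFDs J != I that exchange data with LFD I;
    the relation must be symmetric (J in neighbors[I] iff I in neighbors[J]).
    """
    degree = {I: len(Js) for I, Js in neighbors.items()}
    W: Weights = {I: {} for I in neighbors}
    for I, Js in neighbors.items():
        for J in Js:
            W[I][J] = 1.0 / (1.0 + max(degree[I], degree[J]))
        W[I][I] = 1.0 - sum(W[I][J] for J in Js)   # diagonal term
    return W


def is_doubly_stochastic(W: Weights, tol: float = 1e-12) -> bool:
    """Non-negative entries, and every row and every column sums to one."""
    nodes = list(W)
    nonnegative = all(w >= -tol for row in W.values() for w in row.values())
    rows = all(abs(sum(W[I].values()) - 1.0) <= tol for I in nodes)
    cols = all(abs(sum(W[I].get(J, 0.0) for I in nodes) - 1.0) <= tol for J in nodes)
    return nonnegative and rows and cols


def mix(W: Weights, values: Dict[int, float]) -> Dict[int, float]:
    """One consensus step: each LFD replaces its value with a W-weighted average."""
    return {I: sum(w * values[J] for J, w in W[I].items()) for I in W}


# Hypothetical overlap set O_s = {1, 2, 3} with a path-shaped graph 1 - 2 - 3.
W = metropolis_weights({1: {2}, 2: {1, 3}, 3: {2}})
print(is_doubly_stochastic(W))   # True

values = {1: 3.0, 2: 0.0, 3: 0.0}    # the average is 1.0
for _ in range(100):
    values = mix(W, values)
print(values)                        # every entry is now very close to 1.0
```

For the path graph, the end nodes get weight $1/3$ towards the middle node and keep $2/3$ for themselves, while the middle node splits its weight evenly, so every row and column sums to one.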

It is worth noting that, in general, the functions and take on non-zero values due to several factors, including measurement errors on , the measurement errors of neighbouring LFDs, and the uncertainty in the interconnection function itself. Although the aim of the adaptive approximator is to learn the uncertain function , it generally cannot be expected to match the actual term exactly, even if the weights of the adaptive approximator could be optimally selected. This may be formalized by introducing an optimal weight vector [45]

with taking values in their respective domains. This leads to the definition of the Minimum Functional Approximation Error (MFAE) , which describes the least possible approximation error that can be achieved at the discrete-time instant if . By introducing the parameter estimation error and the following function:

it turns out that can be written as . By using (14), the dynamics of the LFD estimation error component before the occurrence of a fault (i.e., ) can be written as

(15)

where we introduced the following total uncertainty term:

In order to analyze the behavior of and define the threshold (see (6)), it is convenient to introduce the following vectors related to the detection estimators of all the LFDs sharing the variable : , , and . The FDAE estimation error dynamics of all the LFDs in can then be written in a more useful and compact form:

(16)

Since and is a doubly stochastic matrix, all its eigenvalues lie within the unit circle. It follows that (16) represents the dynamics of a stable LTI discrete-time system. The solution of (16) is

(17)

so that, component-wise, it becomes

(18)

where is a vector containing the -th row of matrix . Now, a threshold on the estimation error that guarantees no false-positive fault detections for is proposed. The absolute value of the estimation error for can be upper bounded by using the triangle inequality as follows:

(19)

where (upper bound on the total uncertainty term)⁸, with the function being such that⁹ . By taking the absolute value component-wise so that , the inequalities (19) can be written as

Using the Comparison Lemma [46], the absolute value of each component of can be bounded by the corresponding component of , defined as the solution of the following equation:

(20)

with initial conditions . It is worth noting that the adaptive threshold defined in (20) can be easily implemented by any LFD in by means of a linear discrete-time first-order filter driven by a suitable input (see [21] for the continuous-time case).

⁸The notation max is short for max .

⁹As is a compact set, the function can always be defined.

B. Faulty Behavior and Fault Detectability

In this subsection, the behavior of the DFDI algorithm in the presence of a fault and its detection capabilities are investigated. Assume that at the discrete-time instant a fault occurs. Let

(21)

with denoting the component of the fault function affecting the -th state equation of the monolithic system (see (1)). After
the occurrence of the fault, for , the estimation error dynamics for a shared state variable given by (16) becomes

(22)

where is a vector whose components are all equal to . The following theorem gives a sufficient condition for the estimation error to cross its corresponding threshold in finite time, thus allowing the fault to be detected by the -th LFD. Therefore, it characterizes the class of faults that can be detected by the proposed scheme, given the bounds available on the unknown functions.

Fig. 3. (a) Local fault: for $t \geq T_1$ only the local detection signature $S_1(t)$ of the first LFD is non-empty, and the fundamental detection signature is a singleton, $S(t) = \{1\}$, $t \geq T_1$. (b) A distributed fault with non-overlapping signature: for all $t \geq \max\{T_s,\ s = 1, 2, 3\}$ all the local detection signatures $S_s(t)$ of the LFDs are non-empty, and the fundamental detection signature is equal to $S(t) = \{1, 2, 3\}$ (no shared variables appear in any of the local detection signatures). (c) A distributed fault with overlapping signature: for all $t \geq \max\{T_s,\ s = 1, 2, 3\}$ all the local detection signatures $S_s(t)$ of the LFDs are non-empty, and the fundamental detection signature is equal to $S(t) = \{1, 2, 3\}$ (in this case, shared variables may appear in the local detection signatures).

Theorem 1 (Local Fault Detectability): Given a subsystem , if there exists a discrete-time instant such that the fault satisfies the inequality

(23)

for at least one component , then the fault is detected at the discrete-time instant , that is .

Proof: At the discrete-time instant , by using (17) and (21), the estimation error vector can be written as

(24)

By applying the same expansion as in (17) and (18), the solution for the estimation error for the -th component of the -th subsystem can be written as¹⁰

Using the triangle inequality, we obtain

The threshold can be written as . Now, from the definition of the threshold in Section IV-A, it follows that the last inequality is implied by

so that the fault detection condition is implied by the theorem hypothesis.

Remark 3: Theorem 1 provides a (possibly conservative) sufficient condition for fault detectability: if at some discrete-time instant at least one subsystem shows a non-empty local detection signature , then the corresponding LFD alerts the GFD. In qualitative and rough terms, inequality (23) compares the "magnitude" of the effect of the fault, on its left-hand side, with the upper bound on the unknown functions, on its right-hand side.

¹⁰As $W_s$ is doubly stochastic and all the components of the vector are equal, it holds that $(W_s)^h$ leaves that vector unchanged for all $h$.
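The stability claim made for (16) and the property stated in footnote 10 can be checked numerically. The sketch below assumes, purely for illustration, that the homogeneous part of (16) has the form $e(k+1) = \lambda W e(k)$ with $0 < \lambda < 1$ and $W$ nonnegative and doubly stochastic; the exact terms of (16) are not legible in this copy. Since every row of such a $W$ is a weighted average, the largest component magnitude can only shrink by the factor $\lambda$ at each step, and powers of $W$ leave a vector with equal components unchanged. The function names are hypothetical.

```python
from fractions import Fraction


def mat_vec(w, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def inf_norm(v):
    """Largest component magnitude."""
    return max(abs(x) for x in v)


def free_response(lam, w, e0, steps):
    """Iterate the assumed homogeneous error dynamics e(k+1) = lam * W e(k)."""
    traj = [list(e0)]
    for _ in range(steps):
        traj.append([lam * x for x in mat_vec(w, traj[-1])])
    return traj


def power_apply(w, v, h):
    """Compute (W^h) v by repeated multiplication."""
    for _ in range(h):
        v = mat_vec(w, v)
    return v


# Doubly stochastic weights for three LFDs (own estimate weighted 3:1).
W = [[Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)],
     [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)],
     [Fraction(1, 5), Fraction(1, 5), Fraction(3, 5)]]
lam = Fraction(1, 2)  # illustrative pole, assumed to lie in (0, 1)
traj = free_response(lam, W, [Fraction(4), Fraction(-2), Fraction(1)], 10)

# The infinity norm stays below lam**k times its initial value.
bounded = all(inf_norm(e) <= lam ** k * inf_norm(traj[0])
              for k, e in enumerate(traj))
# Footnote 10: powers of W leave a vector with equal components unchanged.
unchanged = power_apply(W, [Fraction(7)] * 3, 5) == [Fraction(7)] * 3
```

Exact `Fraction` arithmetic is used so that the row and column sums equal one exactly and the checks are not affected by rounding.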

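The remark after (20), that the adaptive threshold can be implemented by a linear first-order discrete-time filter, and the detection test behind Theorem 1 can be sketched as follows. This is a hedged illustration under stated assumptions. The filter is taken to be $\bar{\epsilon}(k+1) = \lambda \bar{\epsilon}(k) + \bar{u}(k)$, where $\bar{u}(k)$ is a known bound on the total uncertainty and $\bar{\epsilon}(0)$ bounds the initial error. This is the usual form of such comparison-lemma thresholds, but the exact expression in (20) is not legible in this copy. A component is flagged whenever its residual magnitude exceeds its threshold, and the flagged indices form the local detection signature. All names are hypothetical.

```python
import random


def adaptive_threshold(lam, u_bar, eps0):
    """Assumed first-order filter eps_bar(k+1) = lam * eps_bar(k) + u_bar(k).

    lam in (0, 1) is the estimator pole, u_bar[k] bounds the total
    uncertainty at step k and eps0 bounds the initial estimation error.
    """
    eps = [eps0]
    for u in u_bar:
        eps.append(lam * eps[-1] + u)
    return eps


def detection_signature(residuals, thresholds):
    """Indices of the components whose residual magnitude exceeds its threshold."""
    return {i for i, (r, t) in enumerate(zip(residuals, thresholds))
            if abs(r) > t}


def first_detection(residual_seq, threshold_seq):
    """Return (k, signature) at the first non-empty signature, else None."""
    for k, (r, t) in enumerate(zip(residual_seq, threshold_seq)):
        signature = detection_signature(r, t)
        if signature:
            return k, signature
    return None


# Healthy run: scalar error e(k+1) = lam*e(k) + eta(k), with |eta(k)| < u_bar(k).
rng = random.Random(0)
lam, u_bar = 0.8, [0.1] * 50
threshold = adaptive_threshold(lam, u_bar, eps0=1.0)
error = [0.5]
for u in u_bar:
    error.append(lam * error[-1] + rng.uniform(-0.9 * u, 0.9 * u))
no_false_alarm = all(abs(e) <= t for e, t in zip(error, threshold))
```

In the healthy run the error never crosses the threshold, which mirrors the no-false-positive property claimed for (20); the threshold settles near $\bar{u}/(1-\lambda) = 0.5$.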
V. DISTRIBUTED FAULT ISOLATION

A. Formulation of the Distributed Fault Isolation Problem

For isolation purposes, it is assumed that the fault function may either be unknown or belong to a known global fault set , with . In general, not all the subsystems are affected by a given fault function , but only those in the corresponding fault influence set . For each -th fault, contains the indexes of all the subsystems that, after the decomposition , are assigned at least a global state component for which the fault function is non-zero for at least one discrete-time instant, as defined below.

Definition 15: The fault influence set for the -th fault function is the index set

(25)

For each subsystem , a local fault set (defined below) can be built with the local fault functions obtained by all the global faults such that .

Notice that the local fault functions depend only on the local variables , and (see Assumption (1)). The global index and the local index of a fault are related by a mapping , so that . This means that, for all the subsystems such that , for the generic component of a global fault function it holds that , with .

The concept of the fault influence sets naturally leads to a subdivision of the faults into two categories, depending upon their topology: local faults, whose influence set is a singleton, and distributed faults, whose influence set includes more than one subsystem. These categories are now illustrated in the context of the same simple example of Fig. 1.

1) Local Fault: The simplest situation is exemplified in Fig. 3(a). The structure of the fault is highlighted: dashed arcs represent the part of the healthy dynamics changed by the fault, and filled nodes represent variables affected by the fault. As can be seen, arc 1 is faulty, so that the dynamics of the variables and are affected, thus leading to the fault influence set being . This implies that only the local detection signature (see (8)) may become non-empty, as this fault affects only variables "internal" to subsystem that are not shared by any other subsystem. More precisely, if the first LFD detects a fault at a discrete-time instant , then the local detection signature satisfies . Furthermore, the fundamental detection signature (see (25)) is . These faults are referred to as local faults.

2) Distributed Fault, Non-Overlapping Signature: As shown in Fig. 3(b), a more general situation arises when links and variables in more than one subsystem are affected by the same single fault, , for which it holds . This means that, if all LFDs detect a fault at discrete-time instants , then and . Furthermore, since there are no shared variables, the local detection signatures are such that .

3) Distributed Fault, Overlapping Signature: A different situation is shown in Fig. 3(c), where links and variables in more than one subsystem are affected by the same single fault , with , but, now, shared variables are involved. Specifically, this means that if , 2, 3, are the local fault detection times of all the LFDs, then and , and there may exist such that .

In cases 2) and 3) above, without loss of generality, we considered the situation where all LFDs detect a fault at some finite time. The case where not all LFDs are able to detect a fault can be addressed in an analogous way.

In qualitative terms, in this paper, we assume that the generic -th LFD has access only to the knowledge of the local fault set . Furthermore, the -th LFD is not informed about the fault influence sets of the global faults corresponding to the local fault functions belonging to . As a consequence, the -th LFD may only be able to detect and isolate the "local part" of a fault that influences the subsystem , but it does not have enough information to discern whether the isolated local part corresponds to a local fault or is just caused by a "larger" distributed fault. This ambiguity is overcome by the third layer (see Fig. 2), consisting of the global fault diagnoser , which is assumed to have information about the global fault set and the fault influence sets of all the global fault functions. By exploiting this knowledge and the local fault decisions gathered by all the lower-level LFDs, the GFD may be able to take a correct global fault decision: a successful global isolation of a fault by the GFD may require that all of the fault "local parts" have been locally isolated by the LFDs in its influence set.

B. Local Fault Isolation Logic

After a fault has been detected at discrete-time instant and the GFD informs every LFD to switch from the detection to the isolation mode, the FDAE adaptive approximator of every LFD stops learning the interconnection function, that is, , to prevent the interconnection approximator from also learning the "influence" of the fault function on the interconnection term. At the same time, each LFD enables its bank of Fault Isolation Estimators (FIEs) in order to implement a GOS for the task of fault isolation, such as the one described in [21]. This scheme relies on the generic -th FIE of the -th LFD being matched to the corresponding fault function , belonging to the local fault set . Each fault function in is assumed to be of the form

(26)

where, for , the known functions provide the functional structure of the fault and the unknown parameter vectors provide its "magnitude". For the sake of simplicity and without much loss of generality, the parameter domains are assumed to be origin-centered hyperspheres with radius .

After the generic -th FIE estimator is enabled, with , it monitors its subsystem , providing a local state estimate of the local state , analogously to the FDAE. The difference between the estimate and the measurements yields the estimation error which, again, is used as a residual and compared, component by component, with a suitable detection threshold . The condition

(27)

is associated to the -th fault hypothesis

(28)

with . Should condition (27) be violated at some discrete-time instant , the hypothesis is falsified and a so-called local fault isolation signature is generated.

Definition 16: The -th local isolation signature shown by the subsystem at discrete-time instant is the index set

(29)

As soon as the hypothesis is falsified and the corresponding isolation signature becomes non-empty, the specific FIE stops its operation and the fault is excluded as a possible cause of the detection signature. The first such time instant is the exclusion time .

Definition 17: The -th fault exclusion time is defined as .

Ideally, the goal of the isolation logic is to exclude all faults but one, which may then be said to be isolated. To express this formally, the following definition is introduced.

Definition 18: A fault is locally isolated at discrete-time instant iff . Furthermore, is the local fault isolation time.

Remark 4: Again, we should note that, if a fault has been locally isolated, we can conclude that it actually occurred if we assume a priori that only faults belonging to the set may occur. Otherwise, it can only be concluded that its occurrence cannot be excluded.

C. Local Fault Isolation and Fault Isolation Estimators

Now, the FIEs are described in detail. After the fault has occurred, the state equation of the -th component of the -th subsystem becomes

where .

The -th FIE estimator dynamic equation for the most general case of a distributed fault, with a shared variable, is defined as

(30)

where is the -th component of a linearly-parameterized function that matches the structure of the -th fault function , and the vector has been introduced. Analogously to the FDAE case, the parameter vectors are updated according to the learning law

where and is again a suitable projection operator. The learning rate is computed at each step as , with . The corresponding estimation error dynamic equation is

which implies and .

Now, considering a matched fault (that is, , ), the error equation can be written as

By introducing the parameter estimation errors , the FIE estimation error equation for a matched fault becomes

so that its absolute value can be bounded by a threshold that is the solution of the following equation:

This threshold guarantees by definition that no matched fault is excluded because of uncertainties or the effect of the parameter estimation error . In the case of a non-matched fault (that is, for some and with ), the dynamics of the -th component of the estimation error of the -th FIE of the -th LFD can be written as

and, analogously, the threshold solution is given by

As in Section IV-A, the error and threshold solutions can be conveniently expressed in vector form , so that it holds

As shown before, a convenient way to study the behavior of the estimation error of the LFDs sharing the variable is to consider the vector , given by the dynamic equation

where the following mismatch vector was introduced:

The solution can then be written as

Componentwise, the estimation error is given by

and componentwise is described by
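The FIE parameter update described in Section V-C combines a gradient-type step with a projection onto the origin-centered hypersphere assumed for the parameter domains. The learning law and the learning-rate formula are not legible in this copy, so the sketch below assumes a common normalized choice, $\gamma(k) = \mu / (\varepsilon + \|H(k)\|^2)$ with $\varepsilon > 0$ and $0 < \mu < 2$, together with radial projection onto the ball. It should be read as an illustration of this family of laws, not as the exact law of the paper. The regressor `h` plays the role of the known fault functions in (26), and all names are hypothetical.

```python
import math


def project_to_ball(theta, radius):
    """Radial projection onto the origin-centred hypersphere of given radius."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]


def learning_rate(h, mu=1.0, eps=1e-3):
    """Assumed normalised rate mu / (eps + ||h||^2), with 0 < mu < 2, eps > 0."""
    return mu / (eps + sum(x * x for x in h))


def update(theta, h, residual, radius, mu=1.0, eps=1e-3):
    """One projected step: theta <- P(theta + gamma * h * residual)."""
    gamma = learning_rate(h, mu, eps)
    step = [t + gamma * x * residual for t, x in zip(theta, h)]
    return project_to_ball(step, radius)


# Toy identification run: residual = h . (theta_true - theta_hat), noise-free.
theta_true = [0.3, -0.2]   # hypothetical fault "magnitude", inside the ball
theta_hat = [0.0, 0.0]
for k in range(10):
    h = [1.0, 0.0] if k % 2 == 0 else [0.0, 1.0]  # alternating regressors
    residual = sum(x * (a - b) for x, a, b in zip(h, theta_true, theta_hat))
    theta_hat = update(theta_hat, h, residual, radius=1.0)
```

Normalizing by $\|H\|^2$ keeps the step size bounded regardless of the scale of the regressor, and the projection keeps the estimate inside the assumed parameter domain; because the true vector lies in the ball, projecting cannot move the estimate away from it.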

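The bookkeeping behind Definitions 16–18, together with the global rule that Section V-D formalizes in Definition 19, can be sketched as follows. The sketch makes two assumptions for illustration, since the corresponding formulas are not legible in this copy. First, each local decision has already been mapped from local to global fault indices. Second, the local isolation time is taken to be the latest exclusion time among the other hypotheses. Function names and the toy numbers in the checks are hypothetical.

```python
def exclusion_time(residual_seq, threshold_seq):
    """First k at which some component violates its threshold, else None."""
    for k, (r, t) in enumerate(zip(residual_seq, threshold_seq)):
        if any(abs(ri) > ti for ri, ti in zip(r, t)):
            return k
    return None


def local_isolation(exclusion_times):
    """Return (fault, time) if every hypothesis but one has been excluded.

    `exclusion_times` maps each fault hypothesis of one LFD to its exclusion
    time (None if never excluded). The isolation time is assumed to be the
    latest exclusion among the other hypotheses.
    """
    remaining = [f for f, t in exclusion_times.items() if t is None]
    excluded = [t for t in exclusion_times.values() if t is not None]
    if len(remaining) == 1 and excluded:
        return remaining[0], max(excluded)
    return None


def global_isolation(local_decisions, influence_sets):
    """Return (fault, time) once every LFD in a fault's influence set has
    locally isolated that same fault; the time is the latest local isolation.

    `local_decisions` maps an LFD index to (global fault index, time) or None.
    """
    for fault, lfds in influence_sets.items():
        decisions = [local_decisions.get(s) for s in lfds]
        if lfds and all(d is not None and d[0] == fault for d in decisions):
            return fault, max(d[1] for d in decisions)
    return None
```

As the text notes, a fault outside the known fault set may be excluded by its LFD after having been locally isolated, so a running GFD would re-evaluate these functions whenever a new "excluded" or "isolated" message arrives.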
Now, owing to the introduction of the above fault mismatch vector, the following important sufficient condition for fault isolability can be proved.

Theorem 2 (Local Fault Isolability): Given a fault , if for each there exists some discrete-time instant and some such that

then the -th fault is isolated. Furthermore, the local isolation time is upper-bounded by .

Proof: By using the triangle inequality, the absolute value of the -th component of the estimation error of the -th FIE of the -th LFD is lower-bounded for by

Using the known bounds on and and the fact that the -th fault cannot already be excluded at time because of the way its threshold has been defined, we have

In order for the -th fault to be excluded, the inequality must be satisfied. This translates to the following further inequality:

which is implied by the inequality in the hypothesis of the present theorem. Should the inequality hold for every fault function of but the -th, then this fault is locally isolated in the sense of Definition 18.

D. Global Fault Isolation Logic

As discussed earlier, in the proposed DFDI setting a distinction should be drawn between the ways local and distributed faults are isolated. If a fault is local, then having the corresponding LFD exclude every fault but that one is sufficient for declaring it isolated. However, for distributed faults, isolation requires that all the LFDs in the influence set of that fault¹¹ exclude all other faults. The following definition formalizes the conditions for a fault, local or distributed, to be globally isolated:

Definition 19: A fault is globally isolated if, for each -th LFD in the fault influence set , the corresponding local functions have been isolated, with . Furthermore, is the global fault isolation time.

In practice, the global isolation task is carried out by the GFD, by using the fault influence sets of all the global faults in and the local fault decisions of the LFDs. The GFD isolation logic is detailed in Algorithm 1. In the algorithm, is a boolean variable that is true only when a fault has been successfully globally isolated, while is the global index of the isolated fault. It is assumed that each LFD sends a fault decision message to the GFD both when it excludes and when it isolates a fault, so that two kinds of message are possible: excluded and isolated. Clearly, in the case of a fault not belonging to the a-priori known fault set, a locally isolated fault may still be excluded at a later discrete-time instant by its LFD.

¹¹The fault influence set was introduced in Definition 15.

Algorithm 1 Global fault isolation logic
while do
    wait for a detection message
end while
notify every LFD to stop learning
notify every LFD to start isolation
loop
    wait for a local isolation message corresponding to the fault locally isolated or excluded
    if AND then
    else
        if locally isolated for each such that then
        end if
    end if
end loop

VI. SIMULATION RESULTS

Re-consider the monolithic system depicted in Fig. 1(a) (the square labels refer to the pipe numbers), decomposed into three overlapping subsystems according to the decomposition , with index sets , and . Three pumps are present, feeding the first, seventh and eleventh tanks with the following flows: , and . The nominal tank sections and the interconnecting and drain pipe cross-sections are organized in suitable vectors , , and , respectively (see [47] for the numerical values). All the pipe outflow coefficients are unitary. When building the local models of each LFD, the cross-sections used are affected by random uncertainties no larger than 5% and 8% of the nominal values, for the tanks and for the pipes, respectively. The outflow coefficients are off by no more than 10%. Furthermore, the tank level measurements are affected by measurement uncertainties whose components are upper bounded by positive scalars , , and (see [47]).

In order to learn the interconnection functions of each subsystem, which in this example account for the flows through pipes crossing a subsystem boundary, each LFD is provided with adaptive approximators , implemented by RBF neural networks having 3 neurons along the range of each input dimension. The parameter domains were chosen to be hyperspheres with radii equal to , with being the sampling period. The learning rate auxiliary coefficients for the interconnection adaptive approximators were set to , , , , , , while the filter constants were all set to , and the total uncertainties were bounded by suitable , , and (see [47]). The weighting matrices for shared variables were

This can be interpreted, for instance in the case of tank 5, as each of the sharing LFDs trusting its own estimate and model three times more than the estimates of every other LFD in the overlap set.

Fig. 4. Time behaviors of simulated signals related to tank no. 11 when a leakage is introduced at time 750 s. The fault hypothesis no. 2 is locally rejected at time 825 s.

Three faults were modelled:
1) Actuator fault in pumps 1, 2 and 3: partial or full shutdown of all the pumps, modelled as , where represents the pump flows in the faulty case and .
2) Leakage in tanks 4, 5 and 6: circular hole of unknown radius in the tank bottom, so that the outflow due to the leak is .
3) Breakdown of pipes 3 (tanks ) and 5 (tanks ): partial or complete breakdown of those pipes, so that a relative quota of the water in the pipes is drained out of the tanks instead of flowing between them.

All these cases represent distributed faults, the fault influence sets being , . As can be easily seen, the local fault diagnosers may experience the following local signatures:
• LFD no. 1 can see as local only the breakdown of pump 1, or the leakage in tanks 4 and 5, or the effect on tanks 3 and 4 of the breakdown of pipe 3;
• LFD no. 2 can see as local only the breakdown of pump 2, or the leakage in tanks 4, 5 and 6, or the effect on tanks 4 and 6 of the breakdown of pipe 5;
• LFD no. 3 can see as local only the breakdown of pump 3, or the leakage in tank 5.

Thus, the resulting fault sets can be constructed in a straightforward way, by using standard approximate models for hydraulic systems (see [47] for detailed descriptions of the fault models).

Figs. 4–5 show the results of a simulation where at an incipient fault of the first kind begins to affect the three
pumps, reducing their efficiency by an amount equal, respectively, to 25%, 35% and 20%, with a time constant . For each LFD, the detection and isolation residual components of the three tanks that are directly fed by the pumps are plotted: tank 1 corresponds to the first local component of subsystem 1, tank 7 to the fourth of subsystem 2, and tank 11 to the fifth of subsystem 3.

Fig. 5. Time behaviors of simulated signals related to tank no. 1 when a leakage is introduced at time 750 s. The fault hypotheses no. 2 and 3 are locally rejected shortly after fault detection.

Fig. 6. Time behaviors of simulated signals related to tank no. 7 when a leakage is introduced at time 750 s. The fault hypotheses no. 2 and 3 are locally rejected shortly after fault detection.

TABLE I
TIME SEQUENCE OF FAULT OCCURRENCE, DETECTION AND ISOLATION EVENTS

The sequence of events leading from fault occurrence to fault detection and finally to fault isolation is summarized in Table I. A few seconds after the fault occurrence time, the fault is detected by the FDAE of the second LFD, as shown in Fig. 6(a). This results in the second LFD sending a fault detection message to the GFD, which thus computes a non-empty fundamental detection signature. In response to this event, the GFD forces the remaining two LFDs to stop the detection mode and start the isolation mode of operation. For this reason, even if at later times the detection residuals of LFDs no. 1 and 3 are able to cross their respective thresholds, these events do not correspond to a fault detection, as the fault was already detected earlier by LFD no. 2. During the isolation mode, all the LFDs are eventually able to reject the fault hypotheses no. 2 and 3, but never the fault hypothesis no. 1, which is thus locally isolated. As the GFD receives the local fault isolation messages from the LFDs, it constantly checks whether, for a given fault, all the LFDs in its fault influence set have locally isolated it. In the example presented here, fault no. 1 is locally isolated by the third LFD at time 824 s, thus prompting the GFD to globally isolate fault 1 at that same time.

VII. CONCLUSION

In this paper, a problem formulation and a distributed fault diagnosis architecture for large-scale dynamical systems were presented. The proposed scheme relies on overlapping decompositions of the system into sets of interconnected simpler subsystems. Each subsystem is monitored by a local fault diagnosis unit, which is able to detect the presence of faults for the corresponding subsystem based on its own measurements and information from neighboring subsystems. An adaptive approximation scheme is developed in order to learn the functional uncertainty in the interconnection between neighboring subsystems, before any fault is detected. As overlapping decompositions lead to some state components being shared between two or more subsystems, a specially designed consensus-based estimation scheme was devised in order to allow the distributed diagnosis scheme to reach a common decision about faults affecting such variables. Distributed detectability and isolability results were proved in order to show the potential improvements attainable by this consensus scheme w.r.t. a consensus-less one, and in order to provide a way to check the expected sensitivity of the FDI scheme to faults. To the best of the authors' knowledge, this is the first work addressing a distributed fault isolation scheme for nonlinear, uncertain large-scale discrete-time systems.

Future research effort will be devoted to addressing several interesting open issues, namely: i) inclusion of time-delays in the dynamic model of the distributed system and in the communication links between the local FDI modules; ii) state variables not available for measurement; iii) validation on practically-relevant distributed use-cases, both in simulation and in actual experiments. This latter point will require quite significant efforts in order to address implementation issues of the learning algorithms due to the presence of disturbances and variables with different scales. In this connection, it is worth noting that early experiments on a lab-scale experimental setup have shown promising results (see [47]).

REFERENCES

[1] J. Gertler, Fault Detection and Diagnosis in Engineering Systems. New York: Marcel Dekker, 1998.
[2] M. Blanke, M. Kinnaert, J. Lunze, and M. Staroswiecki, Diagnosis and Fault Tolerant Control. Berlin, Germany: Springer, 2003.
[3] R. Isermann, Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance. Berlin, Germany: Springer, 2006.
[4] C. Langbort, R. Chandra, and R. D'Andrea, "Distributed control design for systems interconnected over an arbitrary graph," IEEE Trans. Autom. Control, vol. 49, no. 9, pp. 1502–1519, Sep. 2004.
[5] N. Sandell, P. Varaiya, M. Athans, and M. Safonov, "Survey of decentralized control methods for large scale systems," IEEE Trans. Autom. Control, vol. AC-23, no. 2, pp. 108–128, Apr. 1978.
[6] P. Baroni, G. Lamperti, P. Pogliano, and M. Zanella, "Diagnosis of large active systems," Artif. Intell., vol. 110, no. 1, pp. 135–189, 1999.
[7] M. Staroswiecki, G. Hoblos, and A. Aitouche, "Sensor network design for fault tolerant estimation," Int. J. Adaptive Control Signal Processing, vol. 18, no. 1, pp. 55–72, 2004.
[8] R. J. Patton, C. Kambhampati, A. Casavola, P. Zhang, S. Ding, and D. Sauter, "A generic strategy for fault-tolerance in control systems distributed over a network," Eur. J. Control, vol. 13, no. 2–3, pp. 280–296, 2007.
[9] S. Klinkhieo and R. J. Patton, "A two-level approach to fault-tolerant control of distributed systems based on the sliding mode," in Preprints 7th IFAC Symp. Fault Detection, Supervision Safety Tech. Processes, Barcelona, Spain, 2009, pp. 1043–1048.
[10] X. G. Yan and C. Edwards, "Robust decentralized actuator fault detection and estimation for large-scale systems using a sliding mode observer," Int. J. Control, vol. 81, no. 4, pp. 591–606, 2008.
[11] W. Li, W. Gui, Y. Xie, and S. Ding, "Decentralized fault detection system design for large-scale interconnected systems," in Preprints 7th IFAC Symp. Fault Detection, Supervision Safety Tech. Processes, Barcelona, Spain, 2009, pp. 816–821.
[12] X. Zhang, M. M. Polycarpou, and T. Parisini, "Decentralized fault detection in a class of large-scale nonlinear uncertain system," in Proc. Joint 48th IEEE Conf. Decision Control, 28th Chinese Control Conf., Shanghai, China, 2009, pp. 6988–6993.
[13] N. Meskin, K. Khorasani, and C. A. Rabbath, "Fault consensus in a network of unmanned vehicles," in Preprints 7th IFAC Symp. Fault Detection, Supervision Safety Tech. Processes, Barcelona, Spain, 2009, pp. 1001–1006.
[14] R. M. G. Ferrari, T. Parisini, and M. M. Polycarpou, "A fault detection scheme for distributed nonlinear uncertain systems," in Proc. IEEE Int. Symp. Intell. Control, Munich, Germany, 2006, pp. 2742–2747.
[15] R. M. G. Ferrari, T. Parisini, and M. M. Polycarpou, "Distributed fault diagnosis with overlapping decompositions and consensus filters," in Proc. Amer. Control Conf., New York, 2007, pp. 693–698.
[16] R. M. G. Ferrari, T. Parisini, and M. M. Polycarpou, "Distributed fault diagnosis with overlapping decompositions: An adaptive approximation approach," IEEE Trans. Autom. Control, vol. 54, no. 4, pp. 794–799, Apr. 2009.
[17] D. Šiljak, Large-Scale Dynamic Systems: Stability and Structure. New York: North Holland, 1978.
[18] R. M. G. Ferrari, T. Parisini, and M. M. Polycarpou, "Distributed fault diagnosis of large-scale discrete-time nonlinear systems: New results on the isolation problem," in Proc. 49th IEEE Conf. Decision Control, Atlanta, GA, 2010, pp. 1619–1626.
[19] M. M. Polycarpou and A. Helmicki, "Automated fault detection and accommodation: A learning systems approach," IEEE Trans. Syst., Man, Cybern., vol. 25, no. 11, pp. 1447–1458, Nov. 1995.
[20] M. M. Polycarpou and A. Trunov, "Learning approach to nonlinear fault diagnosis: Detectability analysis," IEEE Trans. Autom. Control, vol. 45, no. 4, pp. 806–812, Apr. 2000.
[21] X. Zhang, M. M. Polycarpou, and T. Parisini, "A robust detection and isolation scheme for abrupt and incipient faults in nonlinear systems," IEEE Trans. Autom. Control, vol. 47, no. 4, pp. 576–593, Apr. 2002.
[22] G. Chartrand and O. Oellermann, Applied and Algorithmic Graph Theory. Singapore: McGraw-Hill International Editions, 1993.
[23] G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 359–392, 1999.
[24] X. Cai and Y. Saad, "Overlapping domain decomposition algorithms for general sparse matrices," Numer. Linear Algebra Appl., vol. 3, no. 3, pp. 221–237, 1996.
[25] B. F. Smith, P. Bjorstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[26] J. Lagnese and G. Leugering, Domain Decomposition Methods in Optimal Control of Partial Differential Equations. Basel, Switzerland: Birkhäuser, 2004.
[27] H. D. Simon, "Partitioning of unstructured problems for parallel processing," Computing Systems in Engineering, vol. 2, no. 2–3, pp. 135–148, 1991.
[28] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
[29] D. Johnson, C. Aragon, L. McGeoch, and C. Schevon, "Optimization by simulated annealing: An experimental evaluation. Part I: Graph partitioning," Oper. Res., vol. 37, no. 6, pp. 865–892, 1989.
[30] M. Vidyasagar, "Decomposition techniques for large-scale systems with nonadditive interactions: Stability and stabilizability," IEEE Trans. Autom. Control, vol. AC-25, no. 4, pp. 773–779, Aug. 1980.

[31] S. Stanković, M. S. Stanković, and D. M. Stipanović, "Consensus based overlapping decentralized estimator," in Proc. Amer. Control Conf., New York, 2007, pp. 2744–2749.
[32] M. Ikeda and D. Šiljak, "Overlapping decompositions, expansions, and contractions of dynamic systems," Large Scale Syst., vol. 1, no. 1, pp. 29–38, 1980.
[33] M. Hodžić and D. Šiljak, "Decentralized estimation and control with overlapping information sets," IEEE Trans. Autom. Control, vol. AC-31, no. 1, pp. 83–86, Jan. 1986.
[34] M. G. Singh, D. Li, Y. Chen, M. Hassan, and Q. Pan, "New approach to failure detection in large-scale systems," Proc. Inst. Elect. Eng. D, vol. 130, no. 5, pp. 243–249, 1983.
[35] R. D'Andrea and G. E. Dullerud, "Distributed control design for spatially interconnected systems," IEEE Trans. Autom. Control, vol. 48, no. 9, pp. 1478–1495, Sep. 2003.
[36] P. M. Frank, "Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy: A survey and some new results," Automatica, vol. 26, no. 3, pp. 459–474, 1990.
[37] R. Patton, P. Frank, and D. Clark, Fault Diagnosis in Dynamic Systems: Theory and Application. Upper Saddle River, NJ: Prentice Hall, 1989.
[38] P. Mosterman and G. Biswas, "Diagnosis of continuous valued systems in transient operating regions," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 29, no. 6, pp. 554–565, Jun. 1999.
[39] M. M. Polycarpou, "On-line approximators for nonlinear system identification: A unified approach," in Control and Dynamic Systems: Neural Network Systems Techniques and Applications, C. T. Leondes, Ed. New York: Academic, 1998, vol. 7, pp. 191–230.
[40] C. R. Johnson, Lectures on Adaptive Parameter Estimation. Upper Saddle River, NJ: Prentice Hall, 1988.
[41] R. Horn and C. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.
[42] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," in Proc. 4th Int. Symp. Inform. Process. Sensor Netw., Los Angeles, CA, 2005, pp. 63–70.
[43] L. Xiao, S. Boyd, and S. Kim, "Distributed average consensus with least-mean-square deviation," J. Par. Dist. Comp., vol. 67, no. 1, pp. 33–46, 2007.
[44] B. Gharesifard and J. Cortes, "When does a digraph admit a doubly stochastic adjacency matrix?," in Proc. Amer. Control Conf., Baltimore, MD, 2010, pp. 2440–2445.
[45] A. Vemuri and M. M. Polycarpou, "On-line approximation methods for robust fault detection," in Proc. 13th IFAC World Congress, Sydney, Australia, 1996, vol. K, pp. 319–324.
[46] L. Grujic and D. Šiljak, "On stability of discrete composite systems," IEEE Trans. Autom. Control, vol. AC-18, no. 5, pp. 522–524, Oct. 1973.
[47] R. M. G. Ferrari, "Fault diagnosis of distributed large-scale discrete-time nonlinear systems," Univ. Trieste, Tech. Rep., 2010 [Online].

Thomas Parisini (F’11) received the "Laurea" degree (cum laude and printing honours) in electronic engineering and the Ph.D. degree in electronic engineering and computer science from the University of Genoa, Genoa, Italy, in 1988 and 1993, respectively.

He was with the Politecnico di Milano, Milan, Italy, and since 2001 he has been Professor and Danieli Endowed Chair of Automation Engineering with the University of Trieste, Trieste, Italy. Since 2009, he has been Deputy Rector of the University of Trieste, and since 2010 he has also held the Chair of Industrial Control at Imperial College London, London, U.K. He authored or co-authored more than 200 research papers in archival journals, book chapters, and international conference proceedings. He is involved as Project Leader in several projects funded by the European Union and by the Italian Ministry for Research, and he is currently leading consultancy projects with some major process control companies (ABB, Danieli, Duferco, among others). His research interests include neural-network approximations for optimal control problems, fault diagnosis for nonlinear and distributed systems, and nonlinear model predictive control systems.

Dr. Parisini received the 2004 Outstanding Paper Award of the IEEE TRANSACTIONS ON NEURAL NETWORKS and the 2007 IEEE Distinguished Member Award. He is the Editor-in-Chief of the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY. He was the Chair of the IEEE Control Systems Society Conference Editorial Board, a Distinguished Lecturer of the IEEE Control Systems Society, and the Chair of the Technical Committee on Intelligent Control of the IEEE Control Systems Society. He was an elected member of the Board of Governors of the IEEE Control Systems Society and of the European Control Association (EUCA), and a member of the board of evaluators of the 7th Framework ICT Research Program of the European Union. Thomas Parisini is currently serving as an Associate Editor of the International Journal of Control and served as Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, of the IEEE TRANSACTIONS ON NEURAL NETWORKS, of Automatica, and of the International Journal of Robust and Nonlinear Control. He was involved in the program and organizing committees of several international conferences. In particular, he was the Program Chair of the 2008 IEEE Conference on Decision and Control, and he is General Co-Chair of the 2013 IEEE Conference on Decision and Control.

Marios M. Polycarpou (F’06) received the B.A. degree in computer science and the B.Sc. degree in electrical engineering from Rice University, Houston, TX, in 1987, and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California, Los Angeles, CA, in 1989 and 1992, respectively.
Available: http://control.units.it/ferrari He is a Professor of Electrical and Computer Engineering and the Director of the KIOS Research Center for Intelligent Systems and Networks, Uni- versity of Cyprus. In 1992, he joined the University of Cincinnati, Ohio, where he reached the rank of Professor of Electrical and Computer Engineering and Computer Science. In 2001, he was the first faculty to join the newly established Department of Electrical and Computer Engineering, University of Cyprus, where he served as Founding Department Chair from 2001 to 2008. His teaching and research interests are in intelligent systems and control, adaptive and cooperative control systems, computational intelligence, fault diagnosis and distributed agents. He has published more than 200 articles in refereed journals, edited books and refereed conference proceedings, and co-authored the book Adaptive Approximation Based Control (Wiley, 2006). He is also the holder of three patents. Riccardo M. G. Ferrari (M’03) received the Laurea Dr. Polycarpou received the William H. Middendorf Research Excellence degree (with honors) in electronic engineering and Award at the University of Cincinnati (1997) and was nominated by students for control systems and the Ph.D. degree in information the Professor of the Year award (1996). He is currently an IEEE Distinguished science from the University of Trieste, Trieste, Italy, Lecturer in computational intelligence. He served as the Editor-in-Chief of the in 2004 and 2009, respectively. IEEE TRANSACTIONS ON NEURAL NETWORKS between 2004 to 2010. He serves He has authored and co-authored several papers on the Advisory Board of two international journals and is past Associate Ed- published in international journals and conference itor of the IEEE TRANSACTIONS ON NEURAL NETWORKS (1998–2003) and of proceedings. Since 2008, he has been a Junior the IEEE TRANSACTIONS ON AUTOMATIC CONTROL (1999–2002). 
He served as Researcher with the R&D Department, Danieli the Chair of the Technical Committee on Intelligent Control, IEEE Control Sys- Automation S.p.A., Buttrio (UD), Italy. His current tems Society (2003–05) and as Vice President, Conferences, of the IEEE Com- research interests include fault diagnosis for non- putational Intelligence Society (2002–2003). He participated in more than 50 linear centralized and distributed dynamic systems, numerical modeling and research projects/grants, funded by several agencies and industry in the United industrial applications of advanced monitoring and control techniques. States, by the European Commission and by the Research Promotion Founda- Dr. Ferrari received the 2005 Giacomini Award of the Italian Acoustic Society tion of Cyprus. He is currently the President of the IEEE Computational Intel- for the best MsC thesis in acoustics. ligence Society.