Stochastic Differential Contraction in Nonlinear System Analysis

Mardavij Roozbehani, Alexandre Megretski and Munther A Dahleh

Abstract— By adopting on a notion of stochastic differential w2(·). Inequalities of the form (1) inevitably lead to upper contraction, the paper presents new results on the incremental bounds on u(·) that scale like mean squared gain (IMSG) analysis of nonlinear systems with stochastic inputs. The relative power between two stochastic 1 processes is defined as the asymptotic average (over time) of , (2) the second moment of the point-wise distance (in some metric) 1 − µ between the two processes. The IMSG of a system is then µ 1 defined as the relative power between two output trajectories as approaches to , which is identical in nature to the driven by two independent instances of i.i.d. inputs with unit scaling that arises from a worst-case analysis. This worst- relative power. While contracting metrics have been previously case type scaling is particularly evident when the underlying used for analysis of nonlinear systems, the formulation and system is linear, in which case the contraction rate µ is equal analysis method in this paper lead to new conditions that yield to the square of the modulus of the dominant pole. a more accurate upper bound on the system gain. The idea is to introduce a notion of stochastic differential contraction which In contrast to [16] and [17], we define a notion of stochas- does not explicitly embed an exponential rate of contraction. tic differential contraction, i.e., a metric which contracts only This approach is more suitable for analysis of systems with in expectation and does not explicitly embed an exponential stochastic inputs. In particular, and unlike previous approaches, rate of contraction. Therefore, the associated Lyapunov in- the standard H2-norm analysis results for linear systems can equalities do not impose an exponential rate of decay on the be recovered as a special case in this setting. distance between trajectories. Instead, they require that the I.INTRODUCTION underlying Lyapunov function decreases (in expectation) by an amount that does not explicitly depend on the value of Contraction theory, which can be seen as an extension of the function itself (unlike the exponential decay rate case). the classical Lyapunov theory to analysis of the behavior Intuitively, we propose conditions that amount to Lyapunov of system trajectories with respect to each other, instead of inequalities of the form: around nominal equilibria, has a long history in the nonlinear systems literature. According to [1] and [2], the earliest v(t + 1) − v(t) ≤ φ(w1(t), w2(t)) − ψ(x1(t), x2(t)) references on the subject appear to be in the math circles [3]– [7], with further and independent grounding and theoretical where ψ(·, ·) is a suitable function that we wish to bound developments contributed by various authors including [8] from above along the system trajectories. This approach and [9]. These methods have found applications in various allows us to avoid an inverse scaling of the form (2), and engineering problems including analysis of limit cycles [10], obtain sharper bounds than those in the existing literature. system identification [11], [12], synchronization of nonlinear Interestingly, with this approach we recover the standard H2 oscillators [13], process control [14] and general stability analysis results for linear systems [18] as a special case. To analysis [15]. Recently, such methods have been revisited and the best of our knowledge this does not apply to the existing extended to analysis of system gains for nonlinear systems results in the literature, indicating that our results are less with stochastic inputs [16], [17]. conservative. Previous approaches to contraction analysis of nonlinear Finally, our results are presented in terms of implicit non- systems with stochastic inputs [16], [17], rely on an exponen- linear equations (descriptor form) that define the underlying tial rate of contraction, which leads to Lyapunov inequalities dynamics. This is an attractive feature because many engi- of the form: neering systems including networks with optimization in the ¯ loop, systems with algebraic constraints, and parametrically u(t + 1) − µu(t) ≤ φ(w1(t), w2(t)), ∀t ∈ Z+, (1) identified systems from input/output data [12], are naturally where µ < 1 is the contraction rate and u(t) = represented in such implicit forms. ¯ ψ(x1(t), x2(t)) is a functional that needs to be bounded from above. Here, x1(·) and x2(·) are two trajectories of A. Preliminaries the system corresponding to two different inputs w1(·) and Notation 1: The sets of non-negative real numbers and integers are denoted by R+ and Z+ respectively. For a l The authors are with the Laboratory for Information and Decision vector v ∈ R , vk denotes the k-th element of v, and def Systems (LIDS), Massachusetts Institute of Technology (MIT), 2 Pl 2 |v| denotes the standard 2-norm: |v| = i=1 |vi| . Cambridge, MA, U.S.A. emails: {mardavij, ameg, dahleh}@mit.edu. We will use |v|P to denote the standard quadratic norm This work was supported by the National Science Foundation (Grant CPS- (semi-norm) with respect to a positive definite (semidefinie) 2 T l 11358430), the MIT-Skoltech Initiative, and the MIT-Masdar Initiative. symmetric matrix P : |v|P = v P v. The space of R -valued l def T n×n functions u : Z+ 7→ R with finite power covariance matrix W = E[ww ] ∈ R . Furthermore, w T is independent of the initial condition x(0) of Σ. 2 def 1 X 2 kuk = lim sup |u (t)| The following theorem presents our main result. P T P T →∞ t=0 Theorem 1: Consider the system Σ defined in (4)–(5). P is denoted by ` . Suppose that Assumptions 1 and 2 are satisfied. Let 2 g : n 7→ n f : n 7→ n With a slight abuse of notation, we denote discrete-time R R and R R be a pair of continuously stochastic processes using functions defined over the domain differentiable functions such that the equation l of integers. If u : Z+ 7→ R is a stochastic process, then it g (x(t + 1)) = f (x(t)) + w(t) (6) is said to have finite average power if: y(t) = h(x(t)) (7) T n 2 def 1 X 2 is satisfied for every pair of sequences x : Z+ 7→ R and E kukP = lim sup E |u(t)|P < ∞. n T →∞ T w : Z+ 7→ R that satisfy (4). Suppose that there exist t=0 n n positive semidefinite matrices P ∈ S+ and Q ∈ S+ such P that the following inequality holds: The space of all such sequences is denoted by E`2 . For i.i.d. sequences the above definition is equivalent to the underlying 2 2 2 ˙ ˙ distribution having finite second moment.When the standard |g˙(x)∆|P ≥ f(x)∆ + h(x)∆ , ∀x, ∆ (8) P Q 2-norm is applied (i.e., P = Il) we drop P from all the introduced notations. For a differentiable function f : Rn 7→ Then, the (P,Q)-scaled IMSG of Σ is upper bounded by 1: m ˙ ˙ R , we use fx(·) or simply f(·) to denote the Jacobian: ˙ f (x) = Jx(f(x)) = df (x) /dx. Since throughout the paper GP,Q(w → y) ≤ 1. (9) time is discrete, this notation would not be confused with derivative with respect to time. The cone of positive semi- Proof: Let (x1, y1) and (x2, y2) be two solutions of definite (PSD) matrices in n×n is denoted by Sn . Finally, R + Σ corresponding to two different inputs w1(·) and w2(·), Tr(X) denotes the trace of a square matrix X. and initial conditions x1(0) and x2(0). Let (xγ , yγ ) be the Definition 1: Incremental Mean Squared Gain (IMSG): solution corresponding to the input W w(·) Let denote the class of admissible input signals w (t) = γw (t) + (1 − γ)w (t), for a discrete-time dynamical system Σ. Let Y denote the γ 1 2 signal space for the output y(·). Given a pair of positive and the initial condition: semidefinite matrices P and Q of appropriate dimensions, x (0) = γx (0) + (1 − γ)x (0), the (P,Q)-scaled incremental mean squared gain of Σ is γ 1 2 denoted by Then ∆γ (t) = ∂xγ (t)/∂γ and Γγ (t) = ∂yγ (t)/∂γ are well- GP,Q(w → y), defined continuous functions of γ. Thus, considering (6) and (7) with x = x and y = y , and differentiating with respect and is defined to be the minimal γ ≥ 0 such that the γ γ to γ we obtain the following linear dynamics for ∆ (t): inequality γ 2 2 g˙(x (t + 1))∆ (t + 1) = f˙(x (t))∆ (t) (10) inf : γ2E kw − w˜k − E ky − y˜k ≥ 0 (3) γ γ γ γ γ P Q is satisfied for all input/output pairs (w, y) ∈ W × Y and + w1(t) − w2(t) (w, ˜ y˜) ∈ W × Y such that ˙ P Γγ (t) = h(xγ (t))∆γ (t) (11) w − w˜ ∈ E`2 . Define the functional S : 2n 7→ according to: II. MAIN RESULTS R R+ 2 A. Stochastic Differential Contraction Analysis of Implicit S :(x, ∆) 7→ |g˙(x)∆)|P Dynamical Systems For every fixed γ ∈ [0, 1], let Vγ : Z+ 7→ R+ be the function Consider discrete-time dynamical system Σ specified by that is defined as follows: the implicit state-space model: Vγ (t) = S(xγ (t), ∆γ (t)). Σ: ψ (x(t + 1), x(t), w(t)) = 0 (4) It then follows from (10) that: y(t) = h (x(t)) (5) n n m n p 2 where ψ : R × R × R 7→ 0n, and h : R 7→ R are EVγ (t + 1) = E |g˙(xγ (t + 1))∆γ (t + 1)|P (12) continuously differentiable functions. 2 Assumption 1: The function ψ is invertible in the sense ˙ 2 = E f(xγ (t))∆γ (t) + E |w1(t) − w2(t)|P that the equation ψ(a, x, w) = 0 has a unique solution a ∈ P (13) Rn for all x ∈ Rn, and w ∈ Rm. Assumption 2: The class of admissible input signals W is where, the equality between (12) and (13) follows from n restricted to i.i.d. sequences w : Z+ 7→ R with bounded independence of noise from the state, which in turn follows from Assumption 2. It then follows from equations (8) and III.DISCUSSION AND NUMERICALEXAMPLES (11) – (13) that: A. Linear Systems In this section we briefly discuss our results and compare EV (t + 1) − EV (t) ≤ E |w (t) − w (t)|2 γ γ 1 2 P with the existing literature. For the moment let us consider 2 ˙ the simple case where (6) is of the form: − E h(xγ )∆γ (t) (14) Q x(t + 1) = ax(t) + w(t) Integrating with respect to γ and summing up both sides of a ∈ (−1, 1) W = 1 the above inequality yields: where all variable are scalars, , and . With Q = 1, (8) reduces to: 2 T −1 Z 1 2 P − a P ≥ 1 1 X ˙ E h(xγ )∆γ (t) dγ T Q t=0 0 and we have Z 1 2 2 −1 ˆ ˆ 2 1 E kx − x˜k ≤ inf 2Tr(PW ) = 2(1 − a ) . ≤ E w1(t) − w2(t) P + EVγ (0)dγ (15) P T 0 This is in agreement with the bound that can be obtained where tˆ∈ {0,...,T − 1} is an arbitrary time. Since from the results of [17] with µ = a2,D = 1, β = 1 and M = 1. Z 1 2 Next, let us consider a linear system with two states, 2 ˙ |y1(t) − y2(t)|Q = h(xγ )∆γ (t)dγ defined via the following state space model: 0 Q Z 1 2 ≤ h˙ (x )∆ (t) dγ, (16) a 0 γ γ A = ,B = I2,C = I2 D = 02, 0 Q 1 0.9 it follows from (15) and (16) that and W = I2. From Corollary 1, our results can be written as the following optimization problem: T −1 1 X 2 lim sup E |y1(t) − y2(t)|Q T →∞ T min Tr(P ) (20) t=0 P 2 ≤ E w1(tˆ) − w2(tˆ) P subject to: P AT PA + I = 2Tr(PW ) (17) 2 Again, we emphasize that these are the LMI conditions This completes the proof. that arise in the standard H2 analysis of linear systems. The results of [17] can be summarized as the following Remark 1: Note that condition (8) is equivalent to the optimization problem: following matrix inequality constraint: Tr(P ) T ˙ T ˙ ˙ T ˙ min (21) g˙(x) P g˙(x) f(x) P f(x) + h(x) Qh(x), ∀x. P,µ 1 − µ

subject to: µP ≥ AT PA In the following corollary, the standard H2 analysis results for linear systems (see, e.g., [18]) are recovered. Corollary 1: Consider the linear system P ≥ I2 Let γ and γ denote the optimal values of optimization x(t + 1) = Ax(t) + w(t) 1 2 problems (20) and (21) respectively. Figure 1 shows the ratio y(t) = Cx(t) γ2/γ1 as the parameter a in the A matrix approaches to 1. These results show the advantage of the method presented in w(·) where is a zero-mean i.i.d. process. If there exists a the paper and suggest that the contraction analysis approach P PSD matrix such that of [17] can be more conservative than Theorem 1. Remark 2: It has been shown in [16] that the correspond- P AT PA + CT C, (18) ing bounds for stochastic contraction analysis of nonlinear then, systems are optimal in the sense that they can be achieved. T −1 The methods in [17] are essentially the extensions of the 1 X 2 lim sup E |y(t)| ≤ Tr(PW ) (19) results of [16] to state dependent metrics. Note, however, that T T →∞ t=0 our results do not contradict the optimality claims of [16] and Proof: Apply Theorem 1 with g(x) = x, f(x) = Ax, [17], as those bounds are achievable by a scalar dynamical and h(x) = Cx. Then (8) reduces to (18), and (19) follows system [16]. For scalar systems, our bound is equal to that from (17) and the fact that (w(·), x(·), y(·)) = (0, 0, 0) is an of [17]. The proof of the optimality results as presented in admissible solution. [16] does not extend to higher dimensions. 3.8 where −3 3.75 2 2 27e 1 T µ = max{φ1(x) + φ2(x) } = ,L = [1 1] . x ,x 3.7 1 2 2b c

3.65 Feasibility of (27) implies that GP,I (w → y) ≤ 1 for any feasible solution P . By minimizing Tr(PW ) subject to (27) 3.6 we obtain: 3.55 2 performance ratio E ky − y˜k ≤ inf 2Tr(PW ) < 1108. 3.5 P √ 3.45 Thus, the IMSG from w to y is less than 1108/2 ≈ 16.65,

3.4 that is, GI,I (w → y) < 16.65. 0.9 0.92 0.94 0.96 0.98 1 a IV. CONCLUSIONS Fig. 1. The ratio γ /γ as parameter a varies from 0.9 to 0.99. 2 1 We presented new results on contraction analysis of non- B. A Simple Nonlinear Example linear systems with stochastic inputs based on a notion of Consider discrete-time system Σ˜ defined by: stochastic differential contraction which does not explicitly require an exponential rate of decay on the distance be- a − 1 5 1 x (t + 1) = x (t) − x (t) + e−x3(t) + w(t) (22) tween two trajectories. As a special case, we recover the 1 a 1 a 2 c standard H2 analysis results for linear systems. This feature 1 1 −x3(t) is absent in previous approaches to contraction analysis of x2(t + 1) = x1(t) + x2(t) + e + w(t) (23) a c nonlinear systems with stochastic inputs. This suggests that 2 2 −1 the presented method can in general be less conservative and x3(t + 1) = b(x1(t) + x2(t) ) (24) provide an improvement over existing methods. T −x3(t) y(t) = x1(t), x2(t), e (25) REFERENCES where a, b and c are positive constants. In this example we [1] J. Jouffroy, “Some ancestors of contraction analysis,” in Decision and 2 Control, 2005 and 2005 European Control Conference. CDC-ECC ’05. set a = b = c = 100. Also, E[w(t) ] = 1. Rewriting (24) as 44th IEEE Conference on, pp. 5450–5455, 2005. [2] A. Pavlov, et al., “Convergent dynamics, a tribute to Boris P. Demi- 2 2 −1 e−x3(t+1) = e−b(x1(t) +x2(t) ) , (26) dovich,” Systems & Control Letters, vol. 52, no. 3, pp. 257–261, 2004. [3] D. Lewis, “Metric properties of differential equations,” American we obtain a new representation of Σ˜ in the implicit form Journal of Mathematics, vol. 71, no. 2, pp. 294–312, 1949. [4] Z. Opial, “Sur la stabilite´ asymptotique des solutions dun systeme defined by (22), (23), and (26). It can be shown that Equation dequations´ differentielles,”´ Ann. Polinici Math, vol. 7, pp. 259–267, (8) cannot be satisfied if Theorem 1 is directly applied to 1960. (22)–(24). Reformulating Equation (24) in the exponential [5] P. Hartman, “On stability in the large for systems of ordinary differential equations,” Canadian Journal of Mathematics, vol. 13, pp. 480– form (26) serves as a suitable nonlinear coordinate transfor- 492, 1961. mation that facilitates the analysis. In order to apply Theorem [6] B. P. Demidovich, “Dissipativity of a nonlinear system of differ- 1 to (22), (23), and (26), we compute the Jacobian matrices: ential equations,” Vestnik Moscow State University, Ser. Matematika Mekhanika, vol. Part I, no. 6, pp. 19–27 (1961), vol. Part II, no. 1, pp. 3–8, 1962.  1 0 0   a−1 − 5 φ  [7] P. Hartman, Ordinary differential equations. Birkhauser, 1964. a a 3 [8] W. Lohmiller and J.-J. E. Slotine, “On contraction analysis for non- ˙ 1 g˙(x) =  0 1 0  , f(x) =  1 φ3  linear systems,” Automatica, vol. 34, no. 6, pp. 683–696, 1998.    a  [9] D. Angeli, “A Lyapunov approach to incremental stability properties,” 0 0 φ3 φ1 φ2 0 Automatic Control, IEEE Trans. on, vol. 47, no. 3, pp. 410–421, 2002. [10] I. R. Manchester and J.-J. E. Slotine, “Transverse contraction criteria where 1 for existence, stability, and robustness of a limit cycle,” Systems & −x3 Control Letters, vol. 63, pp. 32–38, 2014. φ3 = φ3(x) = − e c [11] M. M. Tobenkin, I. R. Manchester, and A. Megretski, “Stable nonlinear identification from noisy repeated experiments via convex optimiza- and tion,” arXiv preprint arXiv:1303.4175, 2013. ¯ ¯ [12] M. M. Tobenkin, I. R. Manchester, J. Wang, A. Megretski, and φ1 = φ1(x) = x1φ(x1, x2), φ2 = φ2(x) = x2φ(x1, x2) R. Tedrake, “Convex optimization in identification of stable non-linear state space models,” in Decision and Control (CDC), 2010 49th IEEE where −b(x2+x2)−1 Conference on, pp. 7232–7237, IEEE, 2010. e 1 2 φ¯(x , x ) = 2b . [13] W. Wang and J.-J. E. Slotine, “On partial contraction analysis for 1 2 (x2 + x2)2 coupled nonlinear oscillators,” Biological cybernetics, vol. 92, no. 1, 1 2 pp. 38–53, 2005. Note also that h˙ (x) =g ˙(x). It can be shown that with [14] W. Lohmiller and J.-J. E. Slotine, “Nonlinear process control using ¯ contraction theory,” AIChE journal, vol. 46, no. 3, pp. 588–596, 2000. Q = I3, and a block diagonal matrix P = diag(P , p), where [15] K. H. Khalil, Nonlinear Systems. Prentice Hall, 2002. ¯ 2 P ∈ S+, and p > 0, a sufficient condition for (8) is given [16] Q.-C. Pham, N. Tabareau, and J.-J. Slotine, “A contraction theory by the following matrix inequality constraint: approach to stochastic incremental stability,” Automatic Control, IEEE Trans. on, vol. 54, no. 4, pp. 816–820, 2009. [17] Q.-C. Pham and J.-J. Slotine, “Stochastic contraction in riemannian " AT PA¯ − P¯ + pµI AT PL¯ # metrics,” arXiv preprint arXiv:1304.0340, 2013. 2 [18] G. E. Dullerud and F. Paganini, A course in robust control theory, + I3 0. (27) LT PAL¯ T PL¯ − p vol. 6. Springer New York, 2000.