Discretisation of Sparse Linear Systems: An Optimisation Approach∗

Matheus Souza†, José C. Geromel, Patrizio Colaneri, Robert N. Shorten

January 6, 2015

Abstract

This paper addresses the discretisation problem for sparse linear systems. Classical methods usually destroy the sparsity patterns of continuous-time systems. We develop an optimisation procedure that yields the best approximation to the discrete-time dynamical matrix with a prescribed sparsity pattern, subject to stability and other constraints. By formulating this problem in an adequate manner, tools from convex optimisation can then be applied. Error bounds for the approximation are provided for special classes of matrices. Numerical examples are included.

Keywords: Discretisation, Linear Systems, LMI.

1 Introduction

Discretisation of continuous-time linear systems is a well-established procedure, due to its key role in digital control engineering [1] and sampled-data systems [2]. Nevertheless, the requirement for novel discretisation methods is still emerging in several areas. Examples of such areas include networked control systems [3] and large-scale collaborative optimisation problems such as those found in intelligent transportation systems (ITS) applications [4]. The basic objective in these new application areas is that one seeks to preserve a certain property of interest. In this paper, we will consider the problem of realising discretisation algorithms that preserve sparsity constraints.

Large-scale dynamical systems usually present structural characteristics, which are fundamental to describe their behaviour [5]. Indeed, these systems usually derive from the dynamical interaction of several interconnected subsystems, which can model industrial settings [6], automated highway systems [7], structural dynamics [8] and network flow problems [9]. Therefore, these structures arise not only due to physical characteristics of the system being modelled, but also due to communication and cost limitations. For instance, an automated highway system may only allow communication between neighbouring vehicles, which builds up the sparsity pattern of its continuous-time state dynamics.

The sparsity patterns presented by large-scale systems are usually obtained for their continuous-time formulation. However, the discrete-time versions of these models are the ones that will be either implemented or simulated and, as will be further discussed in the sequel, the classical discretisation methods usually destroy this sparsity pattern. To avoid this, discretisation methods based on Euler's forward approximation to the exponential can be adopted [10, 11]. Unfortunately, these approximations are usually good only for small values of the sampling period.

This paper provides novel discretisation techniques for sparse linear systems. We break free from the classical approach of approximating the matrix exponential and recast the problem in the setting of convex optimisation, which can be solved efficiently with existing methods. Error bounds are provided for special classes of sparse matrices that arise in several practical applications.

The notation is standard. Capital letters denote matrices and small letters denote vectors and scalars. For matrices and vectors, (′) denotes transpose and, for a block-structured matrix, (⋆) denotes each of its symmetric blocks. The sets of real, nonnegative real, positive real and natural numbers are indicated as ℝ, ℝ₊, ℝ₊* and ℕ. For symmetric matrices, X > 0 denotes that X is positive definite. For square matrices, tr(·) denotes the trace function and σmax(·) represents the maximum singular value. For a real matrix A, its spectral and Frobenius norms are denoted by ‖A‖₂ = σmax(A) and ‖A‖F = √(tr(A′A)). Finally, for a real function f of one variable, f⁽ⁿ⁾ denotes its n-th order derivative.

∗This work was in part supported by Science Foundation Ireland grant 11/PI/1177; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP/Brazil); and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq/Brazil). M. Souza and R. Shorten are with the Hamilton Institute, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland. M. Souza and J. C. Geromel are with the School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), Campinas, SP, Brazil. P. Colaneri is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy. R. Shorten is also with IBM Research Ireland, Dublin, Ireland. A preliminary version of this paper has been submitted to the 13th European Control Conference. †Corresponding author: M. Souza. Phone: +353 1 708 6100. Email: [email protected]

2 Discretisation of Sparse Linear Systems

2.1 Problem Statement

In this paper, we consider a continuous-time, linear, time-invariant (LTI) autonomous system given by

ẋ(t) = Ax(t),   x(0) = x₀,    (1)

where x : ℝ₊ → ℝⁿ is its state. In the classical discretisation problem, one seeks to determine the discrete-time realisation

x[k+1] = Mx[k],   x[0] = x₀,    (2)

such that x(kh) ≈ x[k] for all k ∈ ℕ, where h ∈ ℝ₊* is the discretisation step or sampling period. It is a well-known fact [12] that, whenever M = e^{hA} = Σ_{k=0}^∞ (hA)ᵏ/k!, the discrete-time LTI system (2) is such that x(kh) = x[k] for all k ∈ ℕ. Hence, whenever this exact approach can be adopted, the discretisation problem is readily solved from the computation of the matrix exponential [13, 14]. However, in some applications, one seeks to determine M that approximates e^{hA} and preserves some specific properties, such as sparsity.

In what follows, we assume that A = (aᵢⱼ) ∈ ℝⁿˣⁿ is a sparse matrix, whose specific sparsity pattern is defined by the set S ⊂ ℝⁿˣⁿ. Formally, one can consider the set I_S ⊂ {1, ..., n}² composed of pairs

(i, j) such that aᵢⱼ is allowed to be nonzero and, therefore, define S as the set that contains all matrices S = (sᵢⱼ) ∈ ℝⁿˣⁿ such that sᵢⱼ = 0 whenever (i, j) ∉ I_S. Due to its definition, note that S is a subspace of ℝⁿˣⁿ. However, it is of interest to observe that A ∈ S does not ensure that e^{hA} ∈ S for some h > 0. In fact, the discretisation procedure A ↦ e^{hA} generally destroys structural properties of the original continuous-time system. This phenomenon, which ensures x(kh) = x[k], ∀k ∈ ℕ, creates direct dependencies between state variables that do not exist in the original continuous-time dynamics. Hence, considering another subspace R ⊂ ℝⁿˣⁿ that defines a sparsity pattern, our main goal is to determine M ∈ R such that M ≈ e^{hA} for some given h > 0. It is often desirable that R = S, but this may be relaxed in some situations, where we will allow R ⊃ S. For example, in ITS applications, local inter-vehicle communication may be possible. Moreover, R also presents a set I_R ⊂ {1, ..., n}² composed of the nonzero positions allowed by its structure and, since R may relax some constraints imposed by S, it follows that I_R ⊃ I_S. This property can be exploited not only to improve the quality of the approximation to the matrix exponential but also to make the optimisation feasible in some situations.

2.2 Mathematical Preliminaries

The following preliminary results and definitions are extensively used throughout. We begin by defining the classical Padé approximants; see [15].

Definition 1 The [λ/µ] order Padé approximant to exp(·) is the rational function R_{λµ}(·) defined by

R_{λµ}(s) = Q_λ(s) Q_µ(−s)⁻¹,    (3)

where the polynomials Q_λ(s) = Σ_{k=0}^λ ℓ_k sᵏ and Q_µ(s) = Σ_{k=0}^µ m_k sᵏ have the coefficients

ℓ_k = λ!(λ+µ−k)! / [(λ+µ)! k! (λ−k)!]   and   m_k = µ!(λ+µ−k)! / [(λ+µ)! k! (µ−k)!].    (4)
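The coefficients (4) and the approximant (3) are straightforward to evaluate numerically. The following sketch (in Python with numpy; the helper names are ours, for illustration only) builds R_{λµ}(A) directly from the definition.

    import numpy as np
    from math import factorial

    def pade_coefficients(lam, mu):
        # Coefficients (4) of the [lam/mu] Pade approximant to exp(.)
        ell = [factorial(lam) * factorial(lam + mu - k)
               / (factorial(lam + mu) * factorial(k) * factorial(lam - k))
               for k in range(lam + 1)]
        m = [factorial(mu) * factorial(lam + mu - k)
             / (factorial(lam + mu) * factorial(k) * factorial(mu - k))
             for k in range(mu + 1)]
        return ell, m

    def pade_approximant(A, lam, mu):
        # R_{lam/mu}(A) = Q_lam(A) @ inv(Q_mu(-A)), as in (3)
        ell, m = pade_coefficients(lam, mu)
        Q_num = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(ell))
        Q_den = sum(c * np.linalg.matrix_power(-A, k) for k, c in enumerate(m))
        return np.linalg.solve(Q_den.T, Q_num.T).T

    # Sanity check: the [1/1] case recovers Tustin's bilinear transformation,
    # i.e. pade_approximant(h*A, 1, 1) equals (I + hA/2) @ inv(I - hA/2).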

Padé approximants are widely adopted in discretisation methods [11, 16] and in numerical algorithms to compute the matrix exponential [13, 14]. There are two classes of Padé approximants that deserve special attention. The first one is composed of the [λ/0] Padé approximants, which are given by the first λ + 1 terms of the Taylor series of the exponential function, centred at the origin. The second one is composed of the [µ/µ] Padé approximants, called µ-order diagonal Padé approximants, which include Tustin's bilinear transformation, important in the control theory setting [12], as a particular case (µ = 1). The following theorem [13] provides an error bound for the approximation of a matrix function.

Theorem 1 If f(·) has the Taylor series representation f(z) = Σ_{k=0}^∞ α_k zᵏ in an open disk containing the eigenvalues of A ∈ ℂⁿˣⁿ, then

‖f(A) − Σ_{k=0}^λ α_k Aᵏ‖₂ ≤ (n‖A‖₂^{λ+1}/(λ+1)!) · max_{0≤s≤1} ‖f^{(λ+1)}(sA)‖₂.    (5)

We are particularly interested in the case f(·) ≡ exp(·), which implies

‖e^A − Σ_{k=0}^λ Aᵏ/k!‖₂ ≤ (n‖A‖₂^{λ+1}/(λ+1)!) · e^{‖A‖₂}.    (6)
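As a quick numerical sanity check of (6), one can compare both sides for a random matrix; a minimal sketch (ours, using numpy and scipy's expm) follows.

    import numpy as np
    from math import factorial
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    n, lam = 5, 4
    A = rng.standard_normal((n, n))

    # [lam/0] Pade approximant: truncated Taylor series with lam + 1 terms
    taylor = sum(np.linalg.matrix_power(A, k) / factorial(k) for k in range(lam + 1))

    lhs = np.linalg.norm(expm(A) - taylor, 2)                    # error in (6)
    a2 = np.linalg.norm(A, 2)
    rhs = n * a2 ** (lam + 1) / factorial(lam + 1) * np.exp(a2)  # bound in (6)
    assert lhs <= rhs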

Additionally, it is also possible to obtain bounds for the Frobenius norm and for any Padé approximant to the exponential; see [13, 15]. Finally, we recall the definition of a metric (see [17]).

Definition 2 A metric on a set X is a function δ : X × X → ℝ₊ such that, for all x, y, z ∈ X,

(i) δ(x, y) = 0 if, and only if, x = y.

(ii) δ(x, y) = δ(y, x).

(iii) δ(x, y) ≤ δ(x, z) + δ(z, y).

From the conditions above, one can conclude that a metric represents the notion of distance between two given elements in a set. Therefore, a simple way to define a metric δ on a set X is to adopt δ : (x, y) ↦ ‖x − y‖, where ‖·‖ is any norm defined on X. In this case, we say that δ is induced by the norm ‖·‖.

2.3 Discretisation as an Optimisation Problem

Now we focus on the main problem stated before, which can be analysed as a projection problem. Indeed, given a continuous-time system with realisation (1) and a step size h ∈ ℝ₊*, we wish to determine M⋆ ∈ R ⊂ ℝⁿˣⁿ such that M⋆ is the "closest" element of R to e^{hA}, with respect to the metric δ. Thus, its general formulation is

M⋆ = arg inf_{M∈R} δ(M, e^{hA}),    (7)

where δ is a metric of interest. One should note that δ measures the distance between the approximation M and the exact discrete-time matrix e^{hA}, for any given h ∈ ℝ₊*. Thus, from the computational viewpoint, it represents the error yielded by the approximation. The following theorem provides a result on the convexity of this problem.

Theorem 2 Let δ : ℝⁿˣⁿ × ℝⁿˣⁿ → ℝ₊ be a metric induced by a norm ‖·‖ : ℝⁿˣⁿ → ℝ₊. The optimisation problem (7) is convex.

Proof: Since the feasible set R is convex, we only have to show the convexity of the cost function δ(·, e^{hA}). To this end, we observe that, for any X, Y ∈ ℝⁿˣⁿ and any α ∈ [0, 1], we have

δ(αX + (1−α)Y, e^{hA}) = ‖αX + (1−α)Y − e^{hA}‖
                       ≤ α‖X − e^{hA}‖ + (1−α)‖Y − e^{hA}‖
                       ≤ αδ(X, e^{hA}) + (1−α)δ(Y, e^{hA}).    (8)

Thus, we conclude that δ(·, e^{hA}) is convex and the statement follows. The proof is complete. ✷

In this paper, we will consider the optimisation problem (7), in which δ is a metric induced by one of two norms widely adopted in approximation problems (see [18]): the spectral norm and the Frobenius norm. To this end, we state the following theorems.

Theorem 3 Let A ∈ S ⊂ ℝⁿˣⁿ and h ∈ ℝ₊* be given. Let M be a matrix in R ⊂ ℝⁿˣⁿ and σ be a scalar in ℝ₊*. The following statements are equivalent:

(i) The error bound ‖M − e^{hA}‖₂ < σ holds.

(ii) The following linear matrix inequality (LMI) holds

[ σ²I          ⋆ ]
[ M − e^{hA}   I ]  > 0.    (9)

Rn×n R∗ Rn×n Theorem 4 Let A ∈S⊂ and h ∈ + be given. Let M be a matrix in R ⊂ and σ be a scalar R∗ in +. The following statements are equivalent:

(i) The error bound ‖M − e^{hA}‖F < σ holds.

(ii) There is W > 0 such that the following LMIs hold

tr(W) < σ²,   [ W            ⋆ ]
              [ M − e^{hA}   I ]  > 0.    (10)

The proof of both theorems involves well-known results of matrix theory and convex analysis. Indeed, the first theorem is based on the equivalence between ‖M − e^{hA}‖₂ < σ and (M − e^{hA})′(M − e^{hA}) < σ²I, whilst the second one considers the equivalent inequalities ‖M − e^{hA}‖F < σ and tr[(M − e^{hA})′(M − e^{hA})] < σ². Moreover, from the inequalities presented in Theorem 3, one can conclude that, whenever δ is induced by ‖·‖₂, the best approximation, in the subspace R, to the matrix exponential can be obtained by solving the convex optimisation problem

(M⋆, σ⋆) = arg inf_{M∈R, σ} {σ : (9)}.    (11)
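For prototyping, problem (11) can be posed in a few lines with an off-the-shelf modelling tool; the sketch below uses cvxpy (our tooling choice, not prescribed by the paper) and encodes M ∈ R by zeroing the entries outside a boolean mask. Minimising ‖M − e^{hA}‖₂ directly is equivalent to minimising σ subject to (9).

    import numpy as np
    import cvxpy as cp
    from scipy.linalg import expm

    def sparse_discretisation(A, h, pattern):
        # Problem (11): minimise the spectral norm of M - expm(h A) over M in R,
        # where R is encoded by the boolean mask 'pattern' (True = entry allowed).
        n = A.shape[0]
        E = expm(h * A)
        M = cp.Variable((n, n))
        constraints = [M[i, j] == 0
                       for i in range(n) for j in range(n) if not pattern[i, j]]
        problem = cp.Problem(cp.Minimize(cp.norm(M - E, 2)), constraints)
        problem.solve()
        return M.value, problem.value      # M*, sigma*

For a tridiagonal pattern, for instance, one may take pattern = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1.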

Similarly, whenever the metric δ is induced by ‖·‖F, the best approximation, in the subspace R, to the matrix exponential can be obtained by solving

(M⋆, σ⋆) = arg inf_{M∈R, σ} {σ : (10)}.    (12)

In both cases, σ⋆ = δ(M⋆, e^{hA}) is the optimal value of the error yielded by the approximation. Moreover, both optimisation problems are convex, as expected. Additionally, the element-wise nature of the Frobenius norm allows us to obtain the optimal solution of problem (12) analytically, as stated in the following theorem.

Theorem 5 Let A ∈ S ⊂ ℝⁿˣⁿ and h ∈ ℝ₊* be given. The solution M⋆ = (m⋆ᵢⱼ) ∈ R to problem (12) is such that

m⋆ᵢⱼ = φᵢⱼ if (i, j) ∈ I_R,   and   m⋆ᵢⱼ = 0 if (i, j) ∉ I_R,    (13)

where e^{hA} = (φᵢⱼ) ∈ ℝⁿˣⁿ. Moreover, the following identity holds

‖M⋆ − e^{hA}‖²F = Σ_{(i,j)∉I_R} φᵢⱼ².    (14)

Proof: The proof follows from the definition of the Frobenius norm for square matrices. Indeed, we recall that the optimisation problem (12) is equivalent to

inf_{M∈R} Σ_{i,j} (mᵢⱼ − φᵢⱼ)² = inf_{mᵢⱼ} Σ_{(i,j)∈I_R} (mᵢⱼ − φᵢⱼ)² + Σ_{(i,j)∉I_R} φᵢⱼ²,    (15)

which clearly implies that M⋆ is given by (13), yielding the optimal error (14). The proof is complete. ✷

Theorem 5 shows that the optimal approximation with respect to the Frobenius-induced metric is very simple to obtain; one simply neglects the elements that are not allowed by the feasibility pattern. This simple technique may not provide good approximations in some cases, since it is completely element-wise and, hence, we will only consider the 2-norm formulation in the sequel. Moreover, when the spectral norm is adopted, the following result ensures the consistency and the order of accuracy of the approximation yielded (see [19] for details).

Theorem 6 Consider the continuous-time dynamical system (1) and the optimisation problem (11). If the [λ/0] Padé approximant R_{λ0}(hA), λ ≥ 1, to the matrix exponential e^{hA} is feasible for (11), then the discrete-time iteration x[k+1] = M⋆x[k], with x[0] = x₀, is consistent with the differential equation (1) and has order of accuracy of at least λ.

Proof: The proof follows from the concept of truncation error for numerical methods for ODEs. At the time instant t_k = kh ∈ [0, T], k ∈ ℕ, for some fixed T > 0, the truncation error τ_k(h) is defined as τ_k(h) = (1/h)(x(t_{k+1}) − x[k+1]); that is, it is related to the error caused by a single iteration, assuming that the true solution at the point t_k is known. Hence, for this case, we have that

‖τ_k(h)‖₂ = (1/h)‖e^{hA}x(t_k) − M⋆x(t_k)‖₂
          ≤ (1/h)‖e^{hA} − M⋆‖₂ ‖x(t_k)‖₂
          ≤ (1/h)‖e^{hA} − R_{λ0}(hA)‖₂ ‖x(t_k)‖₂
          ≤ n h^λ ‖A‖₂^{λ+1} e^{‖hA‖₂} ‖x(t_k)‖₂ / (λ+1)!,    (16)

where we have used the results of Theorem 1 and the fact that R_{λ0}(hA) is feasible for (11). Since the solution of (1) is bounded for any t_k ∈ [0, T], k ∈ ℕ, there exists a constant K such that ‖τ_k(h)‖₂ ≤ Kh^λ for h ∈ ℝ₊* sufficiently small and, hence, the method is consistent and has order of accuracy of at least λ. The proof is complete. ✷

Some final remarks must be made at this point. First, the results stated in Theorem 6 have important theoretical consequences for the discretisation method studied in this paper. Since in almost all situations the matrix R₁₀(hA) = I + hA, h > 0, belongs to R, we guarantee that the discrete-time iteration (2), with M = M⋆ obtained from (7), is consistent and has order of accuracy of at least 1. As will be discussed in the sequel, the feasibility of R_{λ0} for λ ≥ 2 is ensured for special cases of S and R. Moreover, it is important to note that the main difficulty in this problem is to choose R adequately, in order to provide an optimal solution M⋆ such that the optimal error σ⋆ is sufficiently small. This problem is not fully addressed here. For some special classes of matrices, to be analysed in the sequel, some results provide a basis for this choice. Furthermore, it is important to remember that, in many applications, R is specified by, for instance, inter-agent communication patterns.

Remark 1 Whenever the computation of the matrix exponential has to be avoided, an alternative optimisation problem

M̃ = arg inf_{M∈R} δ(M, R_{λµ}(hA))    (17)

can be solved. In this case, for h → 0⁺ or for λ, µ sufficiently large, it is clear that M̃ → M⋆ and the solution of (17) recovers the optimal one yielded by (7). However, this is not true in general and the quality of the approximation obtained by solving (17) often has to be validated.

Remark 2 In the literature, two other methods are widely adopted in the discretisation problem with structural constraints. These are Euler's approximation, which uses M ≈ I + hA, and the recently introduced mixed Euler-ZOH (mE-ZOH) method, discussed in [10, 11]. The latter is based on the approximation M ≈ I + D(h)A, where

D(h) = diag( ∫₀ʰ e^{a₁₁t} dt, ··· , ∫₀ʰ e^{aₙₙt} dt ).    (18)

It is clear that both approximations are less expensive from a computational viewpoint when compared to our approach, since neither requires the explicit evaluation of e^{hA}. However, the approximations provided by these methods are usually effective only for very small values of h ∈ ℝ₊*. Based on these methods, we can impose the structure M = I + DA in (11) or (12) and obtain optimisation problems with the diagonal matrix D as a decision variable.
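For reference, D(h) in (18) admits a closed form entry by entry, since ∫₀ʰ e^{at} dt = (e^{ah} − 1)/a for a ≠ 0 and equals h for a = 0; a small sketch (ours) follows.

    import numpy as np

    def mezoh_D(A, h):
        # D(h) in (18): the i-th diagonal entry is the integral of exp(a_ii t)
        # over [0, h], i.e. (exp(a_ii h) - 1)/a_ii for a_ii != 0, and h otherwise.
        a = np.diag(A)
        safe = np.where(a != 0.0, a, 1.0)                 # avoid division by zero
        d = np.where(a != 0.0, (np.exp(a * h) - 1.0) / safe, h)
        return np.diag(d)

    # mE-ZOH approximation: M = I + D(h) A
    # M = np.eye(A.shape[0]) + mezoh_D(A, h) @ A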

3 Special Sparsity Patterns

In this section, we discuss some special sparsity patterns that arise in many practical applications. These patterns are then exploited to provide error bounds on the approximations yielded by (11). Moreover, we briefly analyse the relaxation of the structure of R to provide smaller values for the error.

3.1 Band Matrix Systems

First, we study band matrix systems, which present a peculiar structure that arises in several applications. For instance, tridiagonal systems are often used to model interaction between species [20], binary distillation systems [21] and some queueing systems [22]. Systems of this class present the following realisation

ẋ₁ = f₁(x₁, x₂),
ẋᵢ = fᵢ(xᵢ₋₁, xᵢ, xᵢ₊₁),   i = 2, ..., n−1,    (19)
ẋₙ = fₙ(xₙ₋₁, xₙ),

evolving from the initial condition x(0) = x₀ ∈ ℝⁿ. Moreover, this structure implies that the linearised model obtained near an equilibrium point has a tridiagonal dynamic matrix, which is a particular case of a band matrix.

Following [13], we say A = (aᵢⱼ) ∈ ℝⁿˣⁿ is a (b_u, b_ℓ)-band matrix if all the nonzero elements of A satisfy the condition −b_u ≤ i − j ≤ b_ℓ. The constants b_u and b_ℓ are usually called the upper bandwidth and lower bandwidth, respectively. Diagonal, tridiagonal, pentadiagonal, Hessenberg and triangular matrices can be seen as special cases of band matrices and, thus, this class of matrices plays an important role in a wide range of applications. The following propositions state important properties of these matrices.

Proposition 7 If A = (aᵢⱼ) ∈ ℝⁿˣⁿ is a (b_u, b_ℓ)-band matrix, then Aᵏ is a (k·b_u, k·b_ℓ)-band matrix, k ∈ ℕ.

Proof: The proof is based on mathematical induction. It is clear that the statement holds for k = 0 or k = 1, since A⁰ = I and A¹ = A. Let us suppose that the statement is true for a given k ∈ ℕ. Then, we adopt the notation Aᵏ = (αᵢⱼ) ∈ ℝⁿˣⁿ and Aᵏ⁺¹ = (βᵢⱼ) ∈ ℝⁿˣⁿ, which implies βᵢⱼ = Σ_{ℓ=1}ⁿ αᵢℓ aℓⱼ. Note that aℓⱼ = 0 unless −b_u ≤ ℓ − j ≤ b_ℓ and, by the induction hypothesis, αᵢℓ = 0 unless −k·b_u ≤ i − ℓ ≤ k·b_ℓ. Therefore, combining both conditions, it follows that βᵢⱼ = 0 whenever i − j lies outside the interval −(k+1)·b_u ≤ i − j ≤ (k+1)·b_ℓ, and the proof is complete. ✷
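Proposition 7 is easy to verify numerically for small instances; a throwaway check (ours) is sketched below.

    import numpy as np

    def bandwidths(X, tol=1e-12):
        # Smallest (bu, bl) such that the nonzeros of X satisfy -bu <= i - j <= bl
        idx = np.argwhere(np.abs(X) > tol)
        d = idx[:, 0] - idx[:, 1]
        return max(0, -d.min()), max(0, d.max())

    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 8))
    A = np.triu(np.tril(X, 1), -2)                   # a (1, 2)-band matrix
    print(bandwidths(A))                             # (1, 2)
    print(bandwidths(A @ A))                         # at most (2, 4)
    print(bandwidths(np.linalg.matrix_power(A, 3)))  # at most (3, 6)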

Proposition 8 Let A ∈ ℝⁿˣⁿ be a (b_u, b_ℓ)-band matrix with b_u, b_ℓ > 0 and let h ∈ ℝ₊* be a given constant. The following statements hold:

(i) The exponential e^{hA} is, in general, a full matrix.

(ii) The [λ/0] order Padé approximant is a (λb_u, λb_ℓ)-band matrix.

(iii) The [λ/µ] order Padé approximant, for µ > 0, is, in general, a full matrix.

Proof: The proof of items (i) and (ii) follows directly from Proposition 7. Now, let us prove statement (iii). Remember that any nonsingular matrix X ∈ ℝⁿˣⁿ such that lim_{n→∞} (X − I)ⁿ = 0 can be expressed in terms of its Neumann series (see [23])

X⁻¹ = I − (X − I) + (X − I)² − (X − I)³ + ···,    (20)

which implies that X⁻¹ may be a full matrix even when X is a band matrix. Thus, it follows that R_{λµ}(hA) = Q_λ(hA) Q_µ(−hA)⁻¹ is, in general, a full matrix. ✷

Some comments concerning the previous results must be made at this point. First, Proposition 8 justifies the "fill-up" effect of discretisation on the structure of a band matrix A. Second, the widely adopted µ-order diagonal Padé approximants to the exponential usually provide full matrices and, therefore, these approximations cannot be considered when we desire to preserve the band sparsity pattern of a given matrix A. Finally, it is remarkable that the truncated Taylor series preserves a band sparsity pattern, but with a larger bandwidth. Moreover, it is clear that, as λ increases, the quality of the approximation improves, but the yielded matrix will have a larger number of nonzero entries.

Now we focus again on the discretisation problem. The results developed in this section altogether allow us to state the following theorem, which provides an important error bound for the approximation yielded by (11).

Theorem 9 Assume A ∈ S ⊂ ℝⁿˣⁿ, where S is the set composed of all (b_u, b_ℓ)-band matrices, and let h ∈ ℝ₊* be given. If R ⊂ ℝⁿˣⁿ is the set composed of all (λb_u, λb_ℓ)-band matrices, λ ≥ 1, then the solution (M⋆, σ⋆) to the convex optimisation problem (11) is such that

σ⋆ ≤ (n(h‖A‖₂)^{λ+1}/(λ+1)!) · e^{h‖A‖₂}.    (21)

Furthermore, the iteration (2), with M = M⋆, is consistent with (1) and has order of accuracy of at least λ.

Proof: First, we observe that the definition of R implies that the [λ/0]-order Padé approximant R_{λ0}(hA) satisfies the constraints of the optimisation problem (11) for some σ ∈ ℝ₊*. Therefore, it is clear that σ⋆ = ‖M⋆ − e^{hA}‖₂ ≤ ‖R_{λ0}(hA) − e^{hA}‖₂. Finally, considering the error bound (6), we conclude that (21) holds. The rest of the statement follows from Theorem 6. The proof is complete. ✷

It is remarkable that the special structure presented by band matrices allows us to obtain error bounds for our approximation. As shown in the proof of Theorem 9, this result relies entirely on the feasibility of the approximant R_{λ0}(hA) with respect to the constraints of the optimisation problem (11). Finally, it is important to note that the approximations yielded by the optimal solution of (11) are always at least as good as the ones provided by [λ/0]-order Padé approximants.

3.2 Arrowhead Matrix Systems

Now we briefly discuss another special class of sparse matrices, known as arrowhead matrices, which may arise in network flow control and optimisation problems [9]. Indeed, in this reference, the authors consider a continuous-time primal-dual update for the optimisation variables associated with a network flow control problem. The decentralised structure of the algorithm must be respected by the discretisation method applied to this problem. In the special case of single-link networks, the dynamic matrix of the linearised model has nonzero elements only on its main diagonal and on its last row and column, which is the structure of an arrowhead matrix. Formally, an arrowhead matrix A ∈ ℝⁿˣⁿ can be written, without any loss of generality, in the form

A = D + uv′ + wu′, (22)

where D ∈ ℝⁿˣⁿ is diagonal, v, w ∈ ℝⁿ are given vectors and u = [0 ··· 0 1]′ ∈ ℝⁿ. As it happens with band matrices, the matrix exponential of an arrowhead matrix is, in general, full. Indeed, we have A² = AD + DA − D² + (uv′ + wu′)² and, due to the last term, A² is filled up and sparsity is lost for e^A. However, Euler's approximation

e^{hA} ≈ I + hA = (I + hD) + h(uv′ + wu′)    (23)

also has the arrowhead structure and, thus, is feasible for the optimisation problem (11) whenever S and R are sets that contain this class of matrices. Hence, whenever M = M⋆ is given by (11), we can ensure that (2) is consistent with (1) with order of accuracy of at least 1. Thus, in this case, considering that (σ⋆, M⋆) is the solution to (11), we have

σ⋆ = ‖M⋆ − e^{hA}‖₂ ≤ (1/2) n h² ‖A‖₂² e^{h‖A‖₂},    (24)

where we used the results of Theorem 1. As before, the structure constraint imposed by R can be relaxed.

4 Stability and Positivity Preservation

In almost all situations, sparsity preservation is just one constraint of many. For example, one is often interested in the quality of the solution and in other qualitative properties, such as stability and positivity. In this section we show how these properties can be incorporated in the discretisation procedure; specifically, we consider the preservation of asymptotic stability and, in the case of positive systems, of positivity. Both properties have already been addressed for special classes of systems in [10, 11].

4.1 Preserving Stability

It is widely known that the mapping A ↦ e^{hA} preserves asymptotic stability for any h ∈ ℝ₊*, as a consequence of the dynamic behaviour of the considered system. Whenever approximations are adopted, one has to be careful with the choice of the step h ∈ ℝ₊*, since only some methods can preserve stability for all h ∈ ℝ₊*. We now indicate how to preserve Lyapunov stability in our optimisation setting. We begin with the following theorem.

Theorem 10 Let the Hurwitz stable¹ matrix A ∈ S ⊂ ℝⁿˣⁿ and the sampling period h ∈ ℝ₊* be given. Let M be a matrix in R ⊂ ℝⁿˣⁿ and σ be a scalar in ℝ₊*. The following statements are equivalent:

(i) M is Schur stable² and the error bound ‖M − e^{hA}‖₂ < σ holds.

(ii) There exist matrices S = S′ ∈ ℝⁿˣⁿ and G ∈ ℝⁿˣⁿ such that

[ σ²I          ⋆ ]            [ G + G′ − G′SG   ⋆ ]
[ M − e^{hA}   I ]  > 0,      [ M               S ]  > 0.    (25)

Proof: From Theorem 3, it is clear that ‖M − e^{hA}‖₂ < σ is equivalent to the first inequality of (25). Thus, we first observe that the Schur stability of M implies that the second inequality of (25) holds for G = S⁻¹ > 0. Conversely, we also observe that the Schur complement applied to the second inequality of (25) yields S⁻¹ ≥ G + G′ − G′SG > M′S⁻¹M, which implies M is Schur stable, completing the proof. ✷

Taking into account the statement of Theorem 10, the best approximation in terms of the metric induced by the spectral norm can be determined by solving

(M⋆, σ⋆, S⋆, G⋆) = arg inf_{M∈R, σ, S>0, G} {σ : (25)},    (26)

which, albeit non-convex, possesses the following remarkable property: whenever G ∈ ℝⁿˣⁿ is fixed, the constraints become LMIs and the problem consequently becomes convex. This property can be exploited to provide a sequential convex optimisation method that calculates a suboptimal solution as a result of a convergent sequence of intermediate solutions with decreasing costs

(M_{ℓ+1}, σ_{ℓ+1}, S_{ℓ+1}) = arg inf_{M∈R, σ, S>0, G_ℓ = S_ℓ⁻¹} {σ : (25)},    (27)

where G_ℓ is fixed as the previous value of S⁻¹. The next proposition states a remarkable property presented by this method.

¹A matrix whose eigenvalues are in {s ∈ ℂ : Re(s) < 0}.
²A matrix whose eigenvalues are in {z ∈ ℂ : |z| < 1}.

Proposition 11 Suppose that (25) is feasible for some fixed G₀ = S₀⁻¹ > 0. The iterative method defined by the update (27) generates a convergent sequence such that σ_{ℓ+1} ≤ σ_ℓ, ∀ℓ ∈ ℕ.

Proof: Convergence follows from the boundedness of the sequence {σ_ℓ}_{ℓ∈ℕ}, which is ensured by the first constraint of (25), and from its monotonicity, whose proof is based on two facts. First, S_ℓ⁻¹ ≥ G + G′ − G′S_ℓG holds independently of G = S_{ℓ−1}⁻¹ > 0. Second, S_ℓ⁻¹ = G + G′ − G′S_ℓG for G = S_ℓ⁻¹ > 0. Thus, at the ℓ-th iteration, the first remark allows us to state that the second inequality of (25) remains valid if we replace its (1, 1) block by S_ℓ⁻¹. Additionally, at the (ℓ+1)-th iteration, the second equality yields the conclusion that the previous solution (M_ℓ, σ_ℓ, S_ℓ) is feasible and, thus, σ_{ℓ+1} ≤ σ_ℓ. Since G₀ ensures feasibility of the first iteration, the claim follows. ✷

Note that the most challenging aspect of this algorithm is the choice of G₀ = S₀⁻¹. Indeed, suppose there is a feasible solution M that is sufficiently close to e^{hA}. We can then adopt S₀ = X⁻¹, where X > 0 is a Lyapunov matrix for e^{hA}, that is, (e^{hA})′X(e^{hA}) − X = −Q < 0, where Q > 0 is given. The algorithm provides good results and, in most cases, absolutely no conservatism is added, since it finds an optimal solution. Furthermore, the number of iterations, empirically speaking, is usually found to be small.
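A prototype of the sequential scheme (27) is sketched below, again with cvxpy and scipy (our tooling choices; names hypothetical). The first inequality of (25) is handled by minimising ‖M − e^{hA}‖₂ directly; each pass fixes G at the previous S⁻¹, solves the resulting SDP, and refreshes G.

    import numpy as np
    import cvxpy as cp
    from scipy.linalg import expm, solve_discrete_lyapunov

    def stable_sparse_discretisation(A, h, pattern, iters=10, eps=1e-8):
        n = A.shape[0]
        E = expm(h * A)
        # Initialisation: X solves E' X E - X = -I, so S0 = X^{-1} and G0 = X
        X = solve_discrete_lyapunov(E.T, np.eye(n))
        G = X                                    # G0 = S0^{-1}
        M_best, sigma = None, np.inf
        for _ in range(iters):
            M = cp.Variable((n, n))
            S = cp.Variable((n, n), symmetric=True)
            cons = [M[i, j] == 0
                    for i in range(n) for j in range(n) if not pattern[i, j]]
            # Second LMI of (25); linear in (M, S) once G is fixed
            Z = cp.bmat([[G + G.T - G.T @ S @ G, M.T],
                         [M, S]])
            cons.append((Z + Z.T) / 2 >> eps * np.eye(2 * n))  # symmetrised, strict
            prob = cp.Problem(cp.Minimize(cp.norm(M - E, 2)), cons)
            prob.solve()
            M_best, sigma = M.value, prob.value
            G = np.linalg.inv(S.value)           # G_{l+1} = S_{l+1}^{-1}
        return M_best, sigma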

4.2 Preserving Positivity

Positivity of LTI systems is a very important property; see [22, 24]. Formally, the continuous-time LTI system (1) is said to be positive if, for any given initial state x(0) = x₀ ≥ 0, we have x(t) ≥ 0, ∀t ∈ ℝ₊. Equivalently, the discrete-time LTI system (2) is positive if, for any initial state x[0] = x₀ ≥ 0, we have x[k] ≥ 0, ∀k ∈ ℕ. Thus, it is unacceptable to provide a discrete-time approximation to a continuous-time positive system that does not preserve this property, due to the physical meaning the state variables may have in real-world applications. Hence, we now discuss the preservation of positivity in our approach.

It is a well-known fact that (1) is positive if, and only if, A is a Metzler matrix, that is, all of its off-diagonal elements are nonnegative. Moreover, (2) is positive if, and only if, M is a nonnegative matrix, that is, all of its elements are nonnegative. Let us denote by ℳ the set of all Metzler matrices and ℳ_d the set of all nonnegative matrices. It is clear that, if A ∈ ℳ, then M = e^{hA} ∈ ℳ_d, ∀h ∈ ℝ₊*. Therefore, we should seek M that approximates the exponential e^{hA}, satisfies the structure constraint imposed by R and is a nonnegative matrix. Additionally, since ℳ_d is a convex cone in ℝⁿˣⁿ, the optimal approximation considering the metric induced by the 2-norm can be obtained from

(M⋆, σ⋆) = arg inf_{M∈R∩ℳ_d, σ} {σ : (9)},    (28)

which is convex, since the new feasible set for M, namely R ∩ ℳ_d, is convex. Clearly, this formulation does not add any conservatism to the previous conditions. Note that (28) can be combined with the sequential procedure developed previously to ensure the Schur stability of M⋆.

Remark 3 Based on the mE-ZOH structure, if we adopt M = I + DA and ensure M ∈ ℳ_d, then M is Schur stable for every diagonal matrix D > 0 whenever A ∈ ℳ is Hurwitz (see [11]). Therefore, in this case, the optimisation problem (28) always provides a stable, nonnegative approximation.

5 Related Problems

In this final section, we address other interesting problems that involve discretisation methods.

5.1 Lyapunov Function Preservation

One of the many properties of Tustin's approximation is the preservation of Lyapunov functions: for any Hurwitz stable matrix A and any h ∈ ℝ₊*, if P > 0 is such that A′P + PA < 0, then R₁₁(hA)′P R₁₁(hA) − P < 0 holds (see [25]). Hence, if one needs to preserve a Lyapunov matrix P > 0 in our discretisation setting, the problem to be solved can be formulated as

inf_{M∈R, P>0} { δ(M, e^{hA}) : A′P + PA < 0, M′PM − P < 0 },    (29)

which is convex whenever δ is induced by a norm and P > 0 such that A′P + PA < 0 is fixed. Nevertheless, if P > 0 is considered as a variable, this problem can be solved by the sequential method presented previously. This problem can be generalised to the context of quadratic stability of switched systems; see [26, 27].

Given a set of N Hurwitz matrices A_c, a matrix P > 0 is a common Lyapunov matrix (CLM) for A_c if A′P + PA < 0 for all A ∈ A_c. Similarly, given a set of N Schur matrices A_d, a matrix P > 0 is a CLM for A_d if M′PM − P < 0 for all M ∈ A_d. These concepts are essential to define quadratic stability (QS) of switched systems (see [26]). As a clear consequence of the LTI case, whenever we consider a set A_c of Hurwitz matrices with an associated CLM P > 0, the first-order diagonal Padé approximant yields a set A_d of Schur matrices with the same CLM.

In our optimisation approach, if one seeks to find a discrete-time set of matrices A_d that presents the same CLM P > 0 as A_c, the problem to be solved is

inf_{M_i∈R, P>0} { Σ_{i=1}^N δ(M_i, e^{hA_i}) : A_i′P + PA_i < 0, M_i′PM_i − P < 0 },    (30)

for i = 1, ..., N, which is only convex if δ is induced by a norm and the CLM P > 0 for A_c is fixed. In the more general case where P > 0 is a variable, the problem is nonconvex and, thus, an adaptation of the sequential procedure detailed before has to be adopted.

5.2 Robust Discretisation

Another interesting problem that can be addressed is a "robust" discretisation problem. In some applications, one cannot assume that the discretisation step h ∈ ℝ₊* is constant, due to uneven data rates present in real-world scenarios. Therefore, one may be interested in the determination of a "robust" discrete-time matrix M that adequately approximates e^{hA} for all h ∈ [h_⋆, h⋆], in which this interval bounds the uncertainty on the step size. This problem can be formulated as

inf_{M∈R} sup_{h∈[h_⋆, h⋆]} δ(M, e^{hA}),    (31)

in which the matrix to be found minimises the maximum error with respect to all values of h ∈ [h_⋆, h⋆]. Considering that δ is induced by the 2-norm, this problem is equivalent to

inf_{M∈R, σ} { σ : ‖M − e^{hA}‖₂ < σ, h ∈ [h_⋆, h⋆] },    (32)

which is similar to the basic problem presented in this paper. It is clear that, due to the nonlinear dependence of e^{hA} on the discretisation step h ∈ [h_⋆, h⋆], this problem is difficult to solve as is. However, the continuity of the constraints with respect to h ∈ ℝ₊* allows one to split the interval [h_⋆, h⋆] with a large enough number N of evenly spaced points h_i, with h₁ = h_⋆ and h_N = h⋆, and to impose the constraints stated in Theorem 3 simultaneously. Hence, the convex optimisation problem to be solved is

(M⋆, σ⋆) = arg inf_{M∈R, σ} { σ : ‖M − e^{h_iA}‖₂ < σ, i = 1, ..., N }.    (33)

Note that the optimal cost provides the bound for the worst-case error obtained by the robust approximation in the considered interval. Moreover, since the problem can be stated in terms of LMIs, it can be readily solved even for large values of N [28].
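Under this gridding, (33) is again a standard convex programme; a cvxpy sketch (ours) follows.

    import numpy as np
    import cvxpy as cp
    from scipy.linalg import expm

    def robust_discretisation(A, h_min, h_max, pattern, N=20):
        # Gridded version (33) of the robust problem (31)-(32)
        n = A.shape[0]
        M = cp.Variable((n, n))
        sigma = cp.Variable()
        cons = [M[i, j] == 0
                for i in range(n) for j in range(n) if not pattern[i, j]]
        for h in np.linspace(h_min, h_max, N):
            cons.append(cp.norm(M - expm(h * A), 2) <= sigma)
        prob = cp.Problem(cp.Minimize(sigma), cons)
        prob.solve()
        return M.value, sigma.value   # worst-case error over the grid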

5.3 Stochastic Matrices and Markov Processes

The final application to be analysed in this paper considers the discretisation problem for Markov processes. Large-scale, sparse, stochastic models arise in queueing theory and in its applications [29, 22, 24] and it is essential that discretisation methods preserve the special characteristics presented by these systems. A finite-dimensional, continuous-time Markov process can be modelled as a linear autonomous system with realisation (1), in which A is a Metzler matrix such that π′A = 0, where π = [1 ··· 1]′. Thus, A is the transpose of the transition rate matrix³ associated with the process. Similarly, in discrete time, a finite-dimensional Markov process (or chain) can be modelled as a linear autonomous system with realisation (2), where M is a nonnegative matrix that satisfies π′M = π′. Therefore, in the discrete-time case, M is the transpose of the stochastic matrix⁴ associated with the Markov chain.

Discretisation methods can be applied to continuous-time Markov processes in order to obtain their corresponding discrete-time chains. It can be shown that, if Q is a transition rate matrix, then P = e^{hQ} is a stochastic matrix for any h ∈ ℝ₊*. Therefore, the approximation to be obtained by our approach has to be in R, has to be nonnegative and has to satisfy π′M = π′. Thus, the problem to be solved is

inf_{M∈R∩ℳ_d} { δ(M, e^{hA}) : π′M = π′ },    (34)

which is clearly convex whenever δ is induced by a norm. Thus, the approximation yielded by (34), applied to a continuous-time Markov process, always yields a well-posed Markov chain, in the sense that M is always the transpose of a stochastic matrix.
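A sketch of (34) in the same style (ours; π taken as the all-ones vector, as in the text) is given below. The nonnegativity and unit-column-sum constraints make M the transpose of a stochastic matrix by construction.

    import numpy as np
    import cvxpy as cp
    from scipy.linalg import expm

    def markov_discretisation(A, h, pattern):
        # Problem (34): sparse, nonnegative M with unit column sums (pi' M = pi')
        n = A.shape[0]
        E = expm(h * A)
        M = cp.Variable((n, n), nonneg=True)           # M in the cone M_d
        cons = [M[i, j] == 0
                for i in range(n) for j in range(n) if not pattern[i, j]]
        cons.append(cp.sum(M, axis=0) == np.ones(n))   # pi' M = pi'
        prob = cp.Problem(cp.Minimize(cp.norm(M - E, 2)), cons)
        prob.solve()
        return M.value, prob.value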

³Q = (qᵢⱼ) ∈ ℝⁿˣⁿ is a transition rate matrix if qᵢⱼ ≥ 0, ∀i ≠ j, and Σⱼ qᵢⱼ = 0.
⁴P = (pᵢⱼ) ∈ ℝⁿˣⁿ is a stochastic matrix if pᵢⱼ ≥ 0, ∀i, j, and Σⱼ pᵢⱼ = 1.

6 Numerical Examples

Example 1 (Queueing System) In this example, we consider a simple queueing system, described in [22]. Under Markovian hypotheses, a system with 2 servers and a queue with capacity of 3 clients can be modelled as (1) with

A = [ −λ     µ      0       0       0      0   ]
    [  λ   −λ−µ    2µ       0       0      0   ]
    [  0     λ   −λ−2µ     2µ       0      0   ]
    [  0     0     λ     −λ−2µ     2µ      0   ]
    [  0     0     0       λ     −λ−2µ    2µ   ]
    [  0     0     0       0       λ     −2µ   ]

where λ, µ ∈ ℝ₊* are parameters related to the rate of arrivals and the time needed to perform the service. At any given instant t ∈ ℝ₊, xᵢ(t) represents the probability of having i users in the system. Thus, this system is a continuous-time Markov process and, in order to be discretised, its characteristics have to be taken into account. For instance, for λ = µ = 1 and h = 0.25 s, we solve (34) with the metric induced by the spectral norm and obtain

M⋆ = [ 0.8221  0.1833    0       0       0       0    ]
     [ 0.1779  0.6622  0.3077    0       0       0    ]
     [   0     0.1545  0.5663  0.2834    0       0    ]
     [   0       0     0.1260  0.5498  0.2899    0    ]
     [   0       0       0     0.1668  0.5494  0.3215 ]
     [   0       0       0       0     0.1607  0.6785 ]

which yields an error of σ⋆ = 0.0895. In this case, Euler's method fails to ensure positivity for all h ∈ ℝ₊* and its modified version, although preserving distribution properties and always providing a nonnegative approximation, does not ensure π′(I + D(h)A) = π′, where π = [1 ··· 1]′, for h ∈ ℝ₊*. Thus, the approximation obtained is adequate.

Example 2 (Reactor-Separator Process) This example considers the linearised model of a reactor-separator process. It was described in [30] and analysed from the discretisation viewpoint in [10]. The dynamic matrix is Hurwitz and presents the block structure

A = [ A₁₁   0    A₁₃ ]
    [ A₂₁  A₂₂    0  ]
    [  0   A₃₂   A₃₃ ]

in which each block is a 4 × 4 real matrix; all of the submatrices are given in [10]. Note that such a structure is rather usual in distributed control applications. Hence, our main goal here is to obtain a discrete-time approximant that preserves stability and presents the same block pattern. To this end, our set R only allows nonzero values for the entries that correspond to nonzero blocks of A. Following [10], we analyse (see Figure 1) the spectral radius of the approximant yielded by the optimal solution to (11) (dashed line) and compare it with the spectral radius of the exact solution e^{hA} (solid line) and with the spectral radius of the approximant obtained by the mE-ZOH method (dot-dashed line), for h ∈ (0, 0.5]. It is clear that, for small h ∈ (0, 0.5], both approximations are good, but the optimal solution to (11) is more adequate when this is not the case.

Figure 1: Spectral radius for h ∈ (0, 0.5].

Example 3 (Network Flow Control) We consider an example related to the network flow control problem studied in [9]. The resource allocation is done with a fair utility function Uᵢ(xᵢ) = log(xᵢ) in a network with a single link (with cost λ and capacity c = 2) and four sources, with flows xᵢ, i = 1, ..., 4. In this case, the routing matrix is R = [1 1 1 1]. For illustration, the feedback gains K = 0.01·diag([1 1 1 1]) and Γ = 10 are adopted. The linearised model for the dynamics of the variables xᵢ and λ, around the optimal solution to this problem (xᵢ⋆ = 0.5, i = 1, ..., 4, λ⋆ = 2), is given by

d/dt [ x ]   [ K U′′(x⋆)   −KR′ ] [ x ]
     [ λ ] = [    ΓR         0  ] [ λ ] ,

where U′′(x⋆) = diag(Uᵢ′′(xᵢ⋆)). The dynamic matrix A of this system has eigenvalues very close to the imaginary axis. We consider R = S for this case. For instance, if we take h = 0.2 s, the optimal solution M⋆ of problem (11) is

M⋆ = [ 0.9881    0       0       0     −0.0020 ]
     [   0     0.9881    0       0     −0.0020 ]
     [   0       0     0.9881    0     −0.0020 ]
     [   0       0       0     0.9881  −0.0020 ]
     [ 1.9867  1.9867  1.9867  1.9867   0.9920 ]

with the associated error σ⋆ = ‖M⋆ − e^{hA}‖₂ = 0.0040. Furthermore, we analyse the behaviour of some methods applied to this example and plot the eigenvalue loci parameterised by h ∈ (0, 1]. Blue dots represent the eigenvalues of e^{hA}, whilst those of the approximations are denoted by green or red circles, the former being used when the approximation yielded is Schur stable and the latter otherwise. These curves are shown in Figure 2, which is organised as follows. Subplot (a) shows the optimal solution obtained by solving (11), which yields stable solutions for h ∈ (0, 0.41]; (b) illustrates the approximation given by the mE-ZOH method, which provides stable solutions for h ∈ (0, 0.09] (if we impose M = I + DA in (11) and optimise D, the results are only slightly better); (c) considers the optimal solution obtained by solving (12), which is stable for h ∈ (0, 0.27]; (d) shows the stable solution of (26) obtained with the iterative procedure with Q = I. This example shows the quality of the approximations obtained.

7 Conclusions

In this paper, we have addressed the discretisation problem for sparse linear systems. Several practical dynamical systems, mainly those that involve the interaction of a large number of independent agents, present some sparsity pattern in their continuous-time formulation that is lost whenever classical discretisation procedures are adopted. Hence, the discretised models yielded by such techniques are not viable for simulation or implementation purposes, due to cost and communication constraints. To preserve the sparsity pattern, we formulate a convex optimisation problem that yields an approximation to the exact discrete-time dynamical matrix that presents a desired structure. Error bounds were provided for band matrix and arrowhead matrix linear systems, which occur in several applications. We have also addressed related problems that arise in different areas of control theory. Academic examples illustrate the developed results. Future studies should focus on extending and adapting the developed theory to consider dynamical systems presenting inputs and outputs.

References

[1] G. F. Franklin, J. D. Powell, and M. L. Workman, Digital Control of Dynamic Systems, 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1997.

[2] T. Chen and B. A. Francis, Optimal Sampled-Data Control Systems. London, UK: Springer-Verlag, 1995.

[3] M. Souza, G. S. Deaecto, J. C. Geromel, and J. Daafouz, “Self-triggered linear quadratic networked control,” Optim. Control Appl. Meth., 2013, doi: 10.1002/oca.2085.

[4] A. Schlote, F. Häusler, T. Hecker, A. Bergmann, E. Crisostomi, I. Radusch, and R. Shorten, "Cooperative Regulation and Trading of Emissions Using Plug-in Hybrid Vehicles," IEEE Trans. Intell. Transport. Syst., doi: 10.1109/TITS.2013.2264754.

[5] J. Lunze, Feedback Control of Large Scale Systems, ser. Systems and Control Engineering. Upper Saddle River, NJ: Prentice Hall, 1992.

[6] F.-Y. Wang and D. Liu, Networked Control Systems: Theory and Applications. London, UK: Springer-Verlag, 2008.

[7] P. Seiler and R. Sengupta, "Analysis of Communication Losses in Vehicle Control Problems," in Proc. of the American Control Conference, Arlington, VA, June 2001.

[8] R. W. Clough and J. Penzien, Dynamics of Structures, 2nd ed. New York, NY: McGraw-Hill, 1993.

[9] J. T. Wen and M. Arcak, "A unifying passivity framework for network flow control," IEEE Trans. on Automat. Contr., vol. 49, no. 2, pp. 162–174, February 2004.

[10] M. Farina, P. Colaneri, and R. Scattolini, "Block-wise discretization accounting for structural constraints," Automatica, vol. 49, no. 11, pp. 3411–3417, November 2013.

[11] P. Colaneri, M. Farina, S. Kirkland, R. Scattolini, and R. Shorten, Hybrid Systems and Constraints. ISTE-Wiley, 2013, ch. Positive linear systems: discretization with positivity and structural constraints, pp. 1–20.

[12] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Mineola, NY: Dover Publications, 2007.

[13] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins University Press, 1996.

[14] C. Moler and C. Van Loan, “Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty- Five Years Later,” SIAM Review, vol. 45, no. 1, pp. 3–49, 2003.

[15] G. A. Baker Jr. and P. Graves-Morris, Padé Approximants, 2nd ed. Cambridge, UK: Cambridge University Press, 1996.

[16] R. N. Shorten, M. Corless, S. Sajja, and S. Solmaz, "On Padé approximations, quadratic stability and discretization of switched linear systems," Systems & Control Letters, vol. 60, no. 9, pp. 683–689, September 2011.

[17] E. Kreyszig, Introductory Functional Analysis with Applications. New York, NY: John Wiley & Sons, 1978.

[18] I. Markovsky, "Structured low-rank approximation and its applications," Automatica, vol. 44, no. 4, pp. 891–909, April 2008.

[19] E. Süli and D. F. Mayers, An Introduction to Numerical Analysis. Cambridge, UK: Cambridge University Press, 2003.

[20] B. Fiedler and T. Gedeon, “A Lyapunov function for tridiagonal competitive-cooperative systems,” SIAM J. Math. Anal., vol. 30, no. 3, pp. 469–478, March 1999.

[21] J. Lévine and P. Rouchon, "Quality control of binary distillation columns via nonlinear aggregate models," Automatica, vol. 27, no. 3, pp. 463–480, May 1991.

[22] L. Farina and S. Rinaldi, Positive Linear Systems: Theory and Applications. New York, NY: John Wiley & Sons, 2000.

[23] C. D. Meyer, Matrix Analysis and Applied Linear Algebra. Philadelphia, PA: SIAM, 2000.

[24] D. G. Luenberger, Introduction to Dynamic Systems: Theory, Models and Applications. New York, NY: John Wiley & Sons, 1979.

[25] S. Barnett, Matrices in Control Theory. Florida, USA: Robert E. Krieger Publishing Company, 1984.

[26] S. Sajja, S. Solmaz, R. Shorten, and M. Corless, "Preservation of Common Quadratic Lyapunov Functions and Padé Approximations," in Proc. of the 49th IEEE Conf. on Dec. and Contr., Atlanta, USA, December 2010, pp. 7334–7338.

[27] T. Mori, T. Nguyen, Y. Mori, and H. Kokame, “Preservation of Lyapunov Functions under Bilinear Mapping,” Automatica, vol. 42, no. 6, pp. 1055–1058, June 2006.

[28] S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory. Philadelphia, PA: SIAM Studies in Applied Mathematics, 1994.

[29] A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 2008.

[30] B. T. Stewart, A. N. Venkat, J. B. Rawlings, S. J. Wright, and G. Pannocchia, "Cooperative distributed model predictive control," Systems & Control Letters, vol. 59, no. 8, pp. 460–469, August 2010.

Figure 2: Eigenvalue loci plot for h ∈ (0, 1] (axes: Im versus Re). Panels: (a) e^{hA} ≈ M⋆ ∈ R (spectral norm); (b) e^{hA} ≈ I + D(h)A ∈ R (mE-ZOH); (c) e^{hA} ≈ M⋆ ∈ R (Frobenius norm); (d) e^{hA} ≈ M⋆ ∈ R (stability preserving).