On Distributed Solution of Ill-Conditioned System of Linear Equations under Communication Delays

Kushal Chakrabarti1, Nirupam Gupta2 and Nikhil Chopra3

Abstract— This paper considers a distributed solution of a system of linear equations. The underlying peer-to-peer communication network is assumed to be undirected; however, the communication links are subject to potentially large but constant delays. We propose an algorithm that solves a distributed least-squares problem, which is equivalent to solving the system of linear equations. Effectively, the proposed algorithm is a pre-conditioned version of the traditional consensus-based distributed gradient descent (DGD) algorithm. We show that the accuracy of the solution obtained by the proposed algorithm is better than that of the DGD algorithm, especially when the system of linear equations is ill-conditioned.

I. INTRODUCTION

Two of the major existing challenges in solving a set of linear equations, Ax = b, are high dimensionality and ill-conditioning. When the dimension of b (or A) is large, the workload of solving this problem can be distributed by splitting it among a number of agents [1], [2], [3], [4]. The focus of existing distributed algorithms is primarily on accuracy and efficiency in terms of required computational capacity and memory.

In this paper we solve the system of equations

    Ax = b,   (1)

where the rows of A ∈ R^{N×n} and the corresponding elements of b ∈ R^N are distributed among m agents. Each agent i knows only a subset (A^i, b^i) of the rows in (A, b), so that A = [(A^1)^T ... (A^m)^T]^T ∈ R^{N×n} and b = [(b^1)^T ... (b^m)^T]^T ∈ R^N, where A^i ∈ R^{n^i×n}, b^i ∈ R^{n^i} and N = Σ_{i=1}^m n^i. The agents can communicate with each other over a peer-to-peer network.

The network is represented as an undirected fixed graph G = (V, E), with m nodes V = {1, ..., m} =: [m] representing the agents and the set of edges E, where an edge (i, j) ∈ E between two nodes i, j ∈ V represents that agent i and agent j are immediate neighbours. It is assumed that there is no self edge from a node i ∈ V to itself. The communication delay from an agent i to an agent j is modeled as τ^{ij}, where τ^{ij} = τ^{ji} > 0 is constant for any edge (i, j) ∈ E.

We consider two major challenges:
• ill-conditioned matrix A,
• delay in the communication links.

The considered problem has also been addressed in prior works [2], [3], [5], [7], [8]. Initially [2] and later on [3] consider directed time-varying networks. [3] proves the global convergence of a projection-based asynchronous algorithm, under the assumptions of repeatedly jointly strongly connected extended neighbour graphs and bounded delays, and provides an upper bound on the convergence rate. Random communications have been considered in [5], [8], and algorithms have been provided with almost sure convergence to a solution of (1). [7] proves convergence of a communication-efficient extension of the algorithm in [2]; in the algorithm proposed in [7], each agent also needs to share the number of its neighbours. Other approaches to the delay robustness problem can be found in [3], [5], [6]. We guarantee convergence in case of a deterministic network topology, without assuming a bound on the constant delays, and with the agents sharing only their estimates with neighbors.

The solution of (1) can also be obtained by solving a least-squares problem, such as in [3], [9], [10], [11]. When (1) is not solvable, the least-squares formulation has the advantage of finding the solution that best fits (1). The least-squares problem can also be solved by general distributed optimization algorithms that have been discussed in the literature. However, ill-conditioning of A poses an additional challenge when there are communication delays between the agents. Existing distributed optimization algorithms [6], [12], [13], [14], [15] fare poorly if A is ill-conditioned. The algorithms in [12], [13], [14] require a decreasing stepsize, which leads to slower convergence. [6] needs additional variables to be shared with neighbouring agents. [14] requires prior information on the delays. The algorithm proposed in [15] globally converges under strict convexity of each agent's cost function. Moreover, when A is ill-conditioned, these gradient-based optimization approaches suffer from a poor convergence rate or may even converge to an undesired point. We follow the same distributed optimization approach; however, our algorithm converges faster without requiring strict convexity of the local costs, and shares only the estimates between neighbours. The key ingredient of our proposed approach is local pre-conditioning, which is obtained by solving an appropriate Lyapunov equation. Additionally, the proposed algorithm does not require any information about the heterogeneous communication delays, other than that they are constant.

*This work is being carried out as a part of the Pipeline System Integrity Management Project, which is supported by the Petroleum Institute, Khalifa University of Science and Technology, Abu Dhabi, UAE.
1 Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA ([email protected]).
2 Department of Computer Science, Georgetown University, Washington, D.C. 20057, USA ([email protected]).
3 Department of Mechanical Engineering, University of Maryland, College Park, MD 20742, USA ([email protected]).

TABLE I: Notations

  Symbol     Meaning
  N^i        neighbor set of node i
  η(A)       nullity of matrix A
  ẋ          first derivative of x w.r.t. time t
  |S|        cardinality of a set S
  A ≻ 0      matrix A is positive definite
  I_n        identity matrix of order n
  L_∞        set of bounded functions
  S^n_+      set of symmetric positive semi-definite matrices of order n
  S^n_{++}   set of symmetric positive definite matrices of order n

A. Summary of Contributions

We solve (1) by solving the least-squares problem

    minimize_{x^i ∈ R^n, ∀i∈[m]}  Σ_{i=1}^m (1/2) ‖A^i x^i − b^i‖^2
    subject to:  x^i = x^j, ∀j ∈ N^i, ∀i ∈ V.   (2)

[16] uses a similar approach, with stricter restrictions on A^i. For now, we make the following assumptions:

Assumption 1: There is no noise in the measured data.
Assumption 2: G is a connected graph.
Assumption 3: η(A) = 0, whereas for each agent i ∈ V, η(A^i) > 0.

Lemma 1: Under Assumption 2, (2) and the following optimization problem are equivalent:

    minimize_{x ∈ R^n} (1/2) ‖Ax − b‖^2.   (3)

Proof: See Appendix A.

Problem (2) can be solved using existing distributed optimization algorithms, such as [17], [18], [19]. This fact, combined with Lemma 1 and the equivalence of problems (1) and (3), enables us to formulate the following distributed solution of (1). Each agent i ∈ V knows only its own cost function, based on (A^i, b^i), and minimizes that cost cooperatively, while sharing only its solution with its neighbour agents, as represented in problem (2).

Remark 1: The aggregate cost (1/2)‖Ax − b‖^2 is convex. If η(A) = 0, then A has full column rank. So the aggregate cost is strictly convex and, hence, has a unique minimum. Thus, Assumption 3 implies that (1) has a unique solution x* satisfying Ax* = b.

Compared to the existing works that address communication delays in distributed optimization, the major contributions of our proposed algorithm are as follows:
1) Higher robustness to ill-conditioning of A, unlike [12], [13], [14], [15].
2) Local agents' costs need not be strictly convex, i.e. A^i need not be full-rank, unlike [12], [15].
3) The communication delays are a priori unknown, unlike [14].
4) Addressing communication delays in solving the least-squares formulation, unlike [2], [3].

II. SHORTCOMING OF DISTRIBUTED GRADIENT METHOD

Our goal is to design a distributed optimization algorithm that solves (2) for an ill-conditioned matrix A and in the presence of constant communication delays in the network. In this section we show that the distributed gradient descent method, which is the basis of gradient-based distributed optimization methods, cannot solve the distributed optimization problem (2) in the presence of communication delay.

Distributed gradient descent (DGD) [19] is a consensus-based iterative algorithm for solving distributed optimization problems. Here each agent i maintains a local estimate of the global minimum, denoted by x^i(t) at iteration t. All the agents update their local estimates using their local costs' gradients and the current local estimates of their neighbors. Theoretically, under Assumption 2, if there is no communication delay in the network, then the agents' estimates converge to the common global minimum. However, in the presence of communication delays an agent receives past (and not the current) local estimates from its neighbors. Consequently, in DGD the agents' local estimates need not converge to the global minimum, or even reach consensus, as shown below. Specifically, considering delay in inter-agent communications, each local agent i updates its estimate of x* according to

    ẋ^i(t) = −δ(t)(A^i)^T (A^i x^i(t) − b^i) + Σ_{j=1}^m w^{ij} (x^j(t − τ^{ji}) − x^i(t)),   (4)

where x^i(t) ∈ R^n is the estimate computed by agent i after t iterations, δ(t) > 0 is a stepsize along the descent direction after t iterations, w^{ij} ∈ R is the weight assigned by agent i to the estimate computed by agent j, and τ^{ii} = 0. The weights w^{ij} satisfy Assumption 1 of [19].

Example 1. Consider the following numerical example: A^1 = [1  0.99], A^2 = [1  1.01], b^1 = b^2 = 1, τ^{ij} = 2, n = m = 2, and the network graph is a cycle. We implement algorithm (4) with parameters δ(t) = 1/t; w^{ij} = 1/2 if j ∈ N^i ∪ {i} and w^{ij} = 0 otherwise. It can be seen from Fig. 1 that DGD fails to estimate x* for the above problem, where A has condition number 200. Also, convergence depends on the initial values of the agent states (inset of Fig. 1).

Fig. 1: Temporal evolution of the error norm ‖x^i(t) − x*‖ of each agent's estimate, for Example 1, under DGD.

III. PROPOSED ALGORITHM

In this section, we propose a distributed optimization algorithm for solving (1) that is robust to communication delays even when the matrix A is ill-conditioned.

The proposed algorithm is a modification of the proportional-integral (PI) consensus-based gradient descent method [15], wherein we (a) remove the integral terms from the algorithm and (b) introduce pre-conditioning at every node. It is evident from the analysis of the proposed method that integral terms are not required for linear problems. The pre-conditioning matrices have been introduced in the algorithm to take care of ill-conditioning of A. Even in the synchronous setting without any communication delay, ill-conditioning of A slows down convergence or, even worse, can make the agents converge to an undesired solution. A proper choice of the pre-conditioning matrices can tolerate ill-conditioning.
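As a baseline for the proposed algorithm, the failure of delayed DGD in Example 1 (Section II) can be reproduced with a short numerical sketch. The forward-Euler integration step h, the horizon, and the discrete stepsize rule δ_k = 1/k are illustrative assumptions (the paper does not specify them); the data A^i, b^i, the weights w^{ij} = 1/2, and the delay τ = 2 are as in Example 1.

```python
import numpy as np

# Example 1: two agents, each holding one row of an ill-conditioned
# 2x2 system with condition number ~200.
A = np.array([[1.0, 0.99], [1.0, 1.01]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)            # unique solution [1, 0]
cond = np.linalg.cond(A)

# Forward-Euler simulation of the delayed DGD dynamics (4);
# h and the horizon T are assumed here for illustration only.
h, delay, T = 1e-2, 2.0, 200.0
d = int(delay / h)                        # delay measured in steps
steps = int(T / h)
w = 0.5                                   # w^{ij} = 1/2 for the single neighbor

x = np.zeros((steps + 1, 2, 2))           # x[k, i] = agent i's estimate at step k
for k in range(1, steps + 1):
    delta = 1.0 / k                       # decreasing stepsize delta(t) = 1/t
    for i in range(2):
        grad = A[i] * (A[i] @ x[k - 1, i] - b[i])
        j = 1 - i
        x_j_delayed = x[max(k - 1 - d, 0), j]   # past estimate of the neighbor
        x[k, i] = x[k - 1, i] + h * (-delta * grad
                                     + w * (x_j_delayed - x[k - 1, i]))

# The estimates remain far from x*, consistent with Fig. 1.
err = max(np.linalg.norm(x[-1, i] - x_star) for i in range(2))
print(f"condition number ~ {cond:.1f}, final error = {err:.3f}")
```

The decaying stepsize contributes a total gradient impulse of only h·Σ 1/k over the horizon, so the ill-conditioned gradient direction is barely explored and the estimates stall away from x*.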

For each agent i ∈ [m], we denote the local gradient as

    φ^i(x^i) := (A^i)^T (A^i x^i − b^i).   (5)

Define the local preconditioner matrix K^i as the solution of the Lyapunov equation [20]

    −N^i K^i − K^i N^i = −2k I_n,   (6)

with k > 0 and N^i = (A^i)^T A^i + |N^i| I_n. Since N^i ≻ 0 and k > 0, (6) has a unique symmetric solution, which is specified later in the convergence analysis of the algorithm. Using the above definitions, we describe Algorithm 1 below.

Algorithm 1
1: for t = 0, 1, 2, ... do
2:   for each agent i ∈ [m] do
3:     receive x^j(t − τ^{ji}), ∀j ∈ N^i
4:     update estimate
           ẋ^i(t) = K^i [ η(ηI_n + K^i)^{−1} Σ_{j∈N^i} (x^j(t − τ^{ji}) − x^i(t)) − φ^i(x^i(t)) ].   (7)
5:     transmit updated estimate to all j ∈ N^i
6:   end for
7: end for

In step 3 of Algorithm 1, each x^j(t − τ^{ji}) can be set to any value for t < τ^{ji}. The choice of the preconditioner matrices K^i addresses ill-conditioning of the matrix A, helping the local estimates to converge quickly to the desired solution.

A. Convergence Analysis

Lemma 2: For every i ∈ [m], K̄^i := (K^i + ηI_n)^{−1}(K^i − ηI_n) is Schur for any η > 0.
Proof: See Appendix B.

In order to establish convergence of Algorithm 1, we develop a framework based on [15] and pre-conditioning. The analysis of our algorithm then easily follows from this framework. Note that the following framework is solely for analysis purposes. Define consensus terms for each agent:

    v^i = Σ_{j∈N^i} v^{ij},   v^{ij} = K^i (r_{ij} − x^i).   (8)

In this framework, r_{ij} is an external input to agent i from its neighbor j ∈ N^i. In order to evaluate these r_{ij}'s, we define the following transformations [15]:

    \vec{s}_{ij} = (1/√(2η)) (−v^{ij} + η r_{ij}),   \overleftarrow{s}_{ij} = (1/√(2η)) (v^{ij} + η r_{ij}),   (9)
    \overleftarrow{s}_{ji} = (1/√(2η)) (v^{ji} + η r_{ji}),   \vec{s}_{ji} = (1/√(2η)) (−v^{ji} + η r_{ji}),   (10)

where η > 0. Information about agent j's estimate is contained in v^{ji}, which is sent to the neighbor agent i ∈ N^j in the form of \vec{s}_{ji}. Considering the delay model defined in Section I, these communication variables are related as

    \overleftarrow{s}_{ji}(t) = \vec{s}_{ij}(t − τ^{ij}),   \overleftarrow{s}_{ij}(t) = \vec{s}_{ji}(t − τ^{ji}).   (11)

Upon receiving \overleftarrow{s}_{ij}, which is the variable \vec{s}_{ji} sent by agent j but delayed in time by τ^{ji}, agent i recovers r_{ij} from it using (9). Hence, r_{ij} contains information about the delayed estimates of its neighbor j. These transformations are commonly known as the scattering transformation [15]. We assume \vec{s}_{ij}(t) = \vec{s}_{ji}(t) = 0 ∀t < 0.

Using the above definitions, we propose the following lemma, which will be useful in proving convergence of Algorithm 1.

Lemma 3: Consider the following dynamics for each agent i ∈ [m]:

    ẋ^i = v^i − K^i φ^i(x^i),   (12)

with the transformations (9)-(10) and delays (11) for j ∈ N^i, ∀i. If Assumptions 1-3 hold, then x^i → x* ∀i ∈ [m] as t → ∞.
Proof: See Appendix C.

The following theorem shows global convergence of Algorithm 1.

Theorem 1: Consider Algorithm 1. If Assumptions 1-3 hold, then x^i → x* ∀i ∈ [m] as t → ∞.
Proof: See Appendix D.

IV. SIMULATION RESULTS

Example 2. Consider the following numerical example. The matrix A is a real symmetric positive definite matrix from the benchmark dataset^1 "bcsstm19", which is part of a suspension bridge problem. The dimension of A is N = n = 817 and its condition number is 2.337333 × 10^5. The rows of A and the corresponding rows of b are distributed among m = 5 agents, so that four of them know 163 such rows each and the remaining 165 rows are known by the fifth agent. The network is assumed to be of ring topology, and τ^{ij} = 5. We set b = Ax*, where x* = [1  1  ...  1]^T is the unique solution.

^1 https://sparse.tamu.edu

Fig. 2: Temporal evolution of the error norm ‖x^i(t) − x*‖ of each agent's estimate, for Example 2, under (a) Algorithm 1 and the projection algorithm [3], (b) Algorithm 1, (c) the algorithm in [15]. (a) x^i(0) = [0, ..., 0]^T ∀i in each of the cases. (b) η = 300, k = 990, x^i(0) randomly initialized within [−20, 20]. (c) x^i(0) = [3, ..., 3]^T ∀i.
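The preconditioner construction in (6) and the Schur property of Lemma 2 can be checked numerically. The local block A^i and its neighbourhood size below are arbitrary illustrative choices; k and η merely mirror the tuning reported for Example 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical rank-deficient local block A^i (3 rows, n = 5 unknowns),
# so (A^i)^T A^i is only positive semi-definite, as in Assumption 3.
n, k, eta = 5, 990.0, 300.0
A_i = rng.standard_normal((3, n))
num_neighbors = 2                                  # |N^i| in a ring topology

# N^i = (A^i)^T A^i + |N^i| I_n is symmetric positive definite.
N_i = A_i.T @ A_i + num_neighbors * np.eye(n)

# The Lyapunov equation (6), -N^i K^i - K^i N^i = -2k I_n, has the
# unique solution K^i = k (N^i)^{-1}.
K_i = k * np.linalg.inv(N_i)
assert np.allclose(N_i @ K_i + K_i @ N_i, 2 * k * np.eye(n))

# Lemma 2: K_bar = (K^i + eta I)^{-1} (K^i - eta I) is Schur for any eta > 0.
K_bar = np.linalg.solve(K_i + eta * np.eye(n), K_i - eta * np.eye(n))
spectral_radius = max(abs(np.linalg.eigvals(K_bar)))
assert spectral_radius < 1.0
print(f"spectral radius of K_bar = {spectral_radius:.4f}")
```

Since K^i is symmetric positive definite, each eigenvalue of K̄^i has the form (λ − η)/(λ + η) with λ > 0, which lies in (−1, 1), matching the numerical check.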

We implement Algorithm 1 on this numerical example. The differential equations governing the x^i's are solved numerically with a small stepsize h = 10^{−3}. Each agent converges to the desired solution x* irrespective of its initial choice of estimate (ref. Fig. 2), and the convergence rate can be changed by tuning the algorithm parameters (ref. Fig. 2a). We compare this result with that of two other algorithms, from [15] and [3]. The algorithm in [15] has a slower convergence rate (ref. Fig. 2c) due to ill-conditioning of the matrix A; however, the projection-based algorithm in [3] converges faster (ref. Fig. 2a). A comparison of the time taken by Algorithm 1 and the projection algorithm is provided in Table II.

TABLE II: Average time taken (in seconds) by Algorithm 1 with (η = 300, k = 900) and the projection algorithm [3] for solving Example 2, over 5 runs.

                                  Algorithm 1    Projection algorithm
  Time needed before iterations:  0.1775         0.4001
  Time needed during iterations:  1.4169         0.6149
  Total time needed:              1.5944         1.0150

V. DISCUSSION

In this paper, we proposed a continuous-time algorithm to solve a system of linear algebraic equations Ax = b distributively when there are delays in the communication links and the matrix A is ill-conditioned. Our algorithm utilizes preconditioners in conjunction with the classical distributed gradient method for optimization. A methodology for appropriately selecting the local preconditioner matrices has been provided so that the distributed gradient method globally converges to the desired solution of a set of equations Ax = b.

To illustrate the effectiveness of the proposed algorithm, we have applied it to a numerical example. It has been shown that the rate of convergence can be controlled by tuning the algorithm parameters, although no bound on the convergence rate has been provided. From simulations, we have seen that the algorithm in [3] is faster than Algorithm 1. However, unlike the former algorithm, Algorithm 1 is guaranteed to converge without assuming an upper bound on the delays τ^{ij}.

REFERENCES

[1] Peng Wang, Shaoshuai Mou, Jianming Lian, and Wei Ren. Solving a system of linear equations: From centralized to distributed algorithms. Annual Reviews in Control, 2019.
[2] Shaoshuai Mou, Ji Liu, and A. Stephen Morse. A distributed algorithm for solving a linear algebraic equation. IEEE Transactions on Automatic Control, 60(11):2863–2878, 2015.
[3] Ji Liu, Shaoshuai Mou, and A. Stephen Morse. Asynchronous distributed algorithms for solving linear algebraic equations. IEEE Transactions on Automatic Control, 63(2):372–385, 2017.
[4] Brian Anderson, Shaoshuai Mou, A. Stephen Morse, and Uwe Helmke. Decentralized gradient algorithm for solution of a linear equation. arXiv preprint arXiv:1509.04538, 2015.
[5] S. Sh. Alaviani and Nicola Elia. A distributed algorithm for solving linear algebraic equations over random networks. In 2018 IEEE Conference on Decision and Control (CDC), pages 83–88. IEEE, 2018.
[6] Tianyu Wu, Kun Yuan, Qing Ling, Wotao Yin, and Ali H. Sayed. Decentralized consensus optimization with asynchrony and delays. IEEE Transactions on Signal and Information Processing over Networks, 4(2):293–307, 2017.
[7] Ji Liu, Xiaobin Gao, and Tamer Başar. A communication-efficient distributed algorithm for solving linear algebraic equations. In 2014 7th International Conference on NETwork Games, COntrol and OPtimization (NetGCoop), pages 62–69. IEEE, 2014.
[8] Xiaobin Gao, Ji Liu, and Tamer Başar. Stochastic communication-efficient distributed algorithms for solving linear algebraic equations. In 2016 IEEE Conference on Control Applications (CCA), pages 380–385. IEEE, 2016.
[9] Xuan Wang, Shaoshuai Mou, and Dengfeng Sun. Improvement of a distributed algorithm for solving linear equations. IEEE Transactions on Industrial Electronics, 64(4):3113–3117, 2016.
[10] Xuan Wang, Jingqiu Zhou, Shaoshuai Mou, and Martin J. Corless. A distributed linear equation solver for least square solutions. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 5955–5960. IEEE, 2017.
[11] Yang Liu, Youcheng Lou, Brian Anderson, and Guodong Shi. Network flows that solve least squares for linear equations. arXiv preprint arXiv:1808.04140, 2018.
[12] John Tsitsiklis, Dimitri Bertsekas, and Michael Athans. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control, 31(9):803–812, 1986.
[13] Konstantinos I. Tsianos and Michael G. Rabbat. Distributed dual averaging for convex optimization under communication delays. In 2012 American Control Conference (ACC), pages 1067–1072. IEEE, 2012.
[14] Håkan Terelius, Ufuk Topcu, and Richard M. Murray. Decentralized multi-agent optimization via dual decomposition. IFAC Proceedings Volumes, 44(1):11245–11251, 2011.
[15] Takeshi Hatanaka, Nikhil Chopra, Takayuki Ishizaki, and Na Li. Passivity-based distributed optimization with communication delays using PI consensus algorithm. IEEE Transactions on Automatic Control, 63(12):4421–4428, 2018.
[16] Tao Yang, Jemin George, Jiahu Qin, Xinlei Yi, and Junfeng Wu. Distributed finite-time least squares solver for network linear equations. arXiv preprint arXiv:1810.00156, 2018.
[17] Vassilis Kekatos and Georgios B. Giannakis. Distributed robust power system state estimation. IEEE Transactions on Power Systems, 28(2):1617–1626, 2012.
[18] Aryan Mokhtari, Qing Ling, and Alejandro Ribeiro. Network Newton distributed optimization methods. IEEE Transactions on Signal Processing, 65(1):146–161, 2016.
[19] Angelia Nedic and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
[20] João P. Hespanha. Linear Systems Theory. Princeton University Press, 2018.

APPENDIX

A. Proof of Lemma 1

It is well known that if G is connected, then (2) is equivalent to the unconstrained optimization problem (ref. [17], [18])

    minimize_{x ∈ R^n} Σ_{i=1}^m (1/2) ‖A^i x − b^i‖^2.   (13)

Now the objective cost in (13) can be written as

    Σ_{i=1}^m ‖A^i x − b^i‖^2 = ‖ [A^1 x − b^1; ...; A^m x − b^m] ‖^2 = ‖Ax − b‖^2.   (14)

Therefore, problems (3) and (13) are equivalent to each other. Hence, by the transitivity property of the equivalence relation, problems (2) and (3) are equivalent to each other.

B. Proof of Lemma 2

For any i ∈ [m],

    (λ, v) is an eigenpair of K̄^i
    ⇔ (K^i + ηI_n)^{−1}(K^i − ηI_n) v = λv
    ⇔ (K^i − ηI_n) v = (K^i + ηI_n) λv   [this implies λ ≠ 1, since otherwise v = 0, which is not an eigenvector of K̄^i]
    ⇔ K^i v = ((1 + λ)/(1 − λ)) η v
    ⇔ (((1 + λ)/(1 − λ)) η, v) is an eigenpair of K^i.

We have K^i = k(N^i)^{−1}, which can easily be verified by substituting this expression into the Lyapunov equation (6), which has a unique solution. Then,

    λ[K^i] = k / λ[N^i] = k / (λ[(A^i)^T A^i] + |N^i|) > 0
    ⇔ ((1 + λ)/(1 − λ)) η > 0 ⇔ |λ| < 1.

Therefore, K̄^i is Schur and the claim follows.

C. Proof of Lemma 3

We begin with the necessary definitions (ref. [15]):

    r̄_{ij} = r_{ij} − x*,   (15)
    x̄^i = x^i − x*,   (16)
    S̄^i = (1/2) ‖x̄^i‖^2,   (17)
    V^{ij}(t) = (1/2) ∫_{t−τ^{ij}}^{t} ‖ \vec{s}_{ij}(y) − (η/√(2η)) x* ‖^2 dy + (1/2) ∫_{t−τ^{ji}}^{t} ‖ \vec{s}_{ji}(y) − (η/√(2η)) x* ‖^2 dy,   (18)
    V = Σ_{i∈[m]} S̄^i + Σ_{(i,j)∈E} V^{ij}.   (19)

As the proof is long, we outline its steps as follows:
1) Define a suitable time-dependent Lyapunov function candidate V (ref. (19)) for the combined agent dynamics, where the overall Lyapunov function is contributed by storage functionals S̄^i for the individual agents and V^{ij} for their links.
2) The time derivative of V is shown to be non-positive, and consensus between the agents is established using an extension of LaSalle's principle.
3) Finally, asymptotic convergence is established by showing that the solution set on which the time derivative of the Lyapunov function V is identically zero consists only of each agent asymptotically reaching the desired solution x*.

From (12) and (16),

    x̄̇^i = ẋ^i = v^i − K^i φ^i(x^i).   (20)

From (17), S̄^i is "positive definite" and

    S̄̇^i = (x̄^i)^T v^i − (x̄^i)^T K^i (A^i)^T A^i (x^i − x*).   (21)

From (8), (15) and (16),

    v^{ij} = K^i (r̄_{ij} − x̄^i),   v^i = Σ_{j∈N^i} K^i (r̄_{ij} − x̄^i).

Substituting these into (21),

    S̄̇^i = (x̄^i)^T K^i Σ_{j∈N^i} (r̄_{ij} − x̄^i) − (x̄^i)^T K^i (A^i)^T A^i x̄^i
        = Σ_{j∈N^i} [ (r̄_{ij})^T v^{ij} − (x̄^i − r̄_{ij})^T K^i (x̄^i − r̄_{ij}) ] − (x̄^i)^T K^i (A^i)^T A^i x̄^i.   (22)

We use the following facts (ref. proof of Lemma 5 in [15]):

    V^{ij}(t) ≥ 0 ∀t   (23)
    and V̇^{ij}(t) = −(v^{ij})^T r̄_{ij} − (v^{ji})^T r̄_{ji}.   (24)

From (22) and (24) it follows that

    V̇ = − Σ_{i∈[m]} Σ_{j∈N^i} (x̄^i − r̄_{ij})^T K^i (x̄^i − r̄_{ij}) − Σ_{i∈[m]} (x̄^i)^T K^i (A^i)^T A^i x̄^i ≤ 0.   (25)

So, x^i ∈ L_∞ ∀i. From (8)-(11),

    r_{ij}(t) = (K̄^i)^2 r_{ij}(t − τ^{ij} − τ^{ji}) + β^{ij}(t),   (26)

where the β^{ij}(t) are linear functions of x^i and x^j at times t, t − τ^{ij}, t − τ^{ji}, t − τ^{ij} − τ^{ji} (ref. [15]). Also, the β^{ij}(t) are bounded, because x^i ∈ L_∞. Lemma 2 states that the eigenvalues of K̄^i ∀i are within the unit circle if η > 0. Then, (26) is a stable difference equation with bounded inputs, which means r_{ij} ∈ L_∞. So, the extension of LaSalle's principle for time-delay systems is applicable. Now,

    V̇ ≡ 0 ⇔ x̄^i = r̄_{ij} ∀i, ∀j ∈ N^i,  Σ_{i∈[m]} (x̄^i)^T K^i (A^i)^T A^i x̄^i = 0.   (27)

Thus, LaSalle's principle implies that

    ∀i, x^i → r_{ij} as t → ∞, ∀j ∈ N^i.   (28)

From (8)-(11) one can see that

    η r_{ij}(t) = √(2η) \vec{s}_{ji}(t − τ^{ji}) − K^i (r_{ij}(t) − x^i(t))
                = −K^j (r_{ji}(t − τ^{ji}) − x^j(t − τ^{ji})) + η r_{ji}(t − τ^{ji}) − K^i (r_{ij}(t) − x^i(t)).   (29)

From (28) and (29), for every i, ∀j ∈ N^i,

    lim_{t→∞} r_{ij}(t) = lim_{t→∞} r_{ji}(t − τ^{ji}) = lim_{t→∞} r_{ji}(t)
    ⟹ lim_{t→∞} x^i(t) = lim_{t→∞} x^j(t).

Thus consensus is achieved asymptotically. Then, from (27) we have

    V̇ ≡ 0 ⟹ lim_{t→∞} x^i(t) = lim_{t→∞} x(t) ∀i,  lim_{t→∞} Σ_{i∈[m]} (x̄(t))^T K^i (A^i)^T A^i x̄(t) = 0,   (30)

for some x : R → R^n and x̄ := x − x*. So, the only thing left to be shown is

    lim_{t→∞} Σ_{i∈[m]} (x̄(t))^T K^i (A^i)^T A^i x̄(t) = 0 ⟹ lim_{t→∞} x̄(t) = 0.

So it is sufficient to show that

    Σ_{i∈[m]} K^i (A^i)^T A^i ≻ 0.   (31)

We have

    K^i ∈ S^n_{++}, (A^i)^T A^i ∈ S^n_+ ∀i ⟹ Σ_{i∈[m]} K^i (A^i)^T A^i ⪰ 0.

For each i, consider the eigendecomposition (A^i)^T A^i = U^i Λ^i (U^i)^T, where U^i is the eigenvector matrix of (A^i)^T A^i and Λ^i is a diagonal matrix with the eigenvalues of (A^i)^T A^i on the diagonal. Then,

    N^i = |N^i| I_n + (A^i)^T A^i = U^i (|N^i| I_n + Λ^i)(U^i)^T.

We have K^i = k(N^i)^{−1}, which can easily be verified by substituting into the Lyapunov equation (6), which has a unique solution. Then,

    K^i (A^i)^T A^i = k U^i diag{ λ_{ik} / (|N^i| + λ_{ik}) }_{k=1}^n (U^i)^T,

where the λ_{ik} are the eigenvalues of (A^i)^T A^i ∈ S^n_+, and λ_{ik} = 0 if the kth column of U^i is in null(A^i). This also means K^i (A^i)^T A^i ∈ S^n_+.

Consider any x ∈ R^n, x ≠ 0. Then ∃ j* ∈ [m] such that x ∉ null(A^{j*}); otherwise x ∈ null(A^i) ∀i, which implies x ∈ null(A), and from Assumption 3, null(A) = {0}, which is a contradiction. Now,

    x^T K^{j*} (A^{j*})^T A^{j*} x = k Σ_{k=1}^n (λ_{j*k} / (|N^{j*}| + λ_{j*k})) ‖x^T u_{j*k}‖^2.   (32)

We consider the nontrivial case where, for each i, A^i is not the zero matrix. Then a positive eigenvalue exists, because (A^{j*})^T A^{j*} ∈ S^n_+ and all of its eigenvalues cannot be zero. Define l = n − η((A^{j*})^T A^{j*}). Then ∃ l ≥ 1 such that λ_{j*k} > 0, k = 1, ..., l (possibly after a rearrangement of indices), the eigenvectors {u_{j*k}}_{k=1}^l ⊄ null((A^{j*})^T A^{j*}), and span{u_{j*k}}_{k=l+1}^n = null((A^{j*})^T A^{j*}). Then,

    x^T K^{j*} (A^{j*})^T A^{j*} x = 0 ⇔ x ⊥ span{u_{j*k}}_{k=1}^l ⇔ x ∈ span{u_{j*k}}_{k=l+1}^n ⇔ x ∈ null((A^{j*})^T A^{j*}).

Therefore, x^T K^{j*} (A^{j*})^T A^{j*} x > 0 and (31) follows.

D. Proof of Theorem 1

For every i ∈ [m], consider the system (12) with the definitions (9)-(11). Then Lemma 3 implies that x^i → x* ∀i as t → ∞. So it is enough to show that there exist \vec{s}_{ij}, ∀j ∈ N^i, satisfying definitions (9)-(11) such that the dynamics (7) is the same as the dynamics (12).

Let

    \vec{s}_{ij}(t) = √(η/2) x^i(t) ∀j ∈ N^i.   (33)

From (11) and the scattering variables chosen as above,

    \overleftarrow{s}_{ij}(t) = \vec{s}_{ji}(t − τ^{ji}) = √(η/2) x^j(t − τ^{ji}).   (34)

From (8), (9) and (34) it can be seen that

    η x^j(t − τ^{ji}) = K^i (r_{ij}(t) − x^i(t)) + η r_{ij}(t)
    ⟹ r_{ij}(t) = (K^i + ηI_n)^{−1} (η x^j(t − τ^{ji}) + K^i x^i(t)).

By the above equation and (8),

    v^{ij}(t) = K^i [ (K^i + ηI_n)^{−1} (η x^j(t − τ^{ji}) + K^i x^i(t)) − x^i(t) ]
             = η K^i (K^i + ηI_n)^{−1} (x^j(t − τ^{ji}) − x^i(t)),   (35)

where the last equality follows from the fact that

    (K^i + ηI_n)^{−1} K^i − I_n = (K^i + ηI_n)^{−1} K^i − (K^i + ηI_n)^{−1}(K^i + ηI_n) = −η (K^i + ηI_n)^{−1}.

Equation (35) substituted into (12) leads to (7), and the choice of variables in (33) satisfies definitions (9)-(11). Therefore, the claim follows from Lemma 3.
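The positive-definiteness argument for (31) and the resolvent identity used in the proof of Theorem 1 can be verified numerically. The local blocks below are hypothetical: each A^i is chosen rank-deficient (so η(A^i) > 0) while the stacked A has full column rank, matching Assumption 3; k and η are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k, eta = 4, 3, 1.0, 2.0

# Hypothetical local blocks: each A^i has fewer rows than columns, so
# eta(A^i) > 0, while the stacked A has full column rank (Assumption 3).
A_blocks = [rng.standard_normal((2, n)) for _ in range(m)]
A = np.vstack(A_blocks)
assert np.linalg.matrix_rank(A) == n           # eta(A) = 0

num_neighbors = 2
Ks, Ms = [], []
for A_i in A_blocks:
    N_i = A_i.T @ A_i + num_neighbors * np.eye(n)
    K_i = k * np.linalg.inv(N_i)               # unique solution of (6)
    Ks.append(K_i)
    # K^i and (A^i)^T A^i share eigenvectors, so the product is symmetric PSD.
    Ms.append(K_i @ A_i.T @ A_i)

# Each K^i (A^i)^T A^i is PSD but singular (A^i is rank-deficient) ...
for M in Ms:
    assert np.allclose(M, M.T)
    assert np.linalg.eigvalsh(M).min() > -1e-10
# ... yet the sum in (31) is positive definite.
sum_min_eig = np.linalg.eigvalsh(sum(Ms)).min()
assert sum_min_eig > 1e-8

# Resolvent identity from Appendix D: (K + eta I)^{-1} K - I = -eta (K + eta I)^{-1}.
K_i = Ks[0]
lhs = np.linalg.solve(K_i + eta * np.eye(n), K_i) - np.eye(n)
rhs = -eta * np.linalg.inv(K_i + eta * np.eye(n))
assert np.allclose(lhs, rhs)
print(f"min eigenvalue of the sum in (31): {sum_min_eig:.4f}")
```

The check mirrors the proof structure: each summand is only positive semi-definite, and it is the full-column-rank assumption on the stacked A that makes the sum positive definite.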