
A Fast Converging Distributed Solver for Linear Systems with Generalised Diagonal Dominance

Qianqian Cai1, Zhaorong Zhang2 and Minyue Fu2,1, Fellow, IEEE

Abstract—This paper proposes a new distributed algorithm for solving linear systems associated with a sparse graph under a generalised diagonal dominance assumption. The algorithm runs iteratively on each node of the graph, with low complexities on local information exchange between neighbouring nodes, local computation and local storage. For an acyclic graph under the condition of diagonal dominance, the algorithm is shown to converge to the correct solution in a finite number of iterations, equalling the diameter of the graph. For a loopy graph, the algorithm is shown to converge to the correct solution asymptotically. Simulations verify that the proposed algorithm significantly outperforms the classical Jacobi method and a recent distributed linear system solver based on average consensus and orthogonal projection.

Index Terms—Distributed algorithm; distributed optimization; distributed estimation; linear systems.

I. INTRODUCTION

Sparse linear systems are of great interest in many disciplines, and many iterative methods exist for solving sparse linear systems; see, e.g., [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. However, for many networked system applications, distributed algorithms are preferred. The main difference is that an iterative method focuses on reducing the total computation by incorporating the sparsity of the system, while allowing a central controller for the coordination of the algorithm. In contrast, a distributed algorithm is designed such that the same algorithm is executed concurrently on every node (or sub-system, or agent) while allowing only neighbouring nodes to share information.
Ideally, no central controller or coordinator is required, and each node should have low complexities in terms of information exchange, computation and storage. For example, a sequential iterative algorithm, where the nodes are ordered and their executions are done sequentially, should not be regarded as a distributed algorithm. For a large networked system such as sensor networks [13], [14], [15], networked control systems [16], [17], [18], network-based state estimation [19], [20], [21], [22], biological networks [23], [24], multi-agent systems [25], [26], [27], [28], multi-agent optimization [29], [30], [31] and so on, it is often desirable to use distributed algorithms rather than iterative methods.

In this paper, we study distributed solutions for sparse linear systems. Our focus is on linear systems which satisfy the so-called generalised diagonal dominance condition. These systems include those with diagonal dominance as a subset (i.e., those with a diagonally dominant A). Diagonally dominant matrices have vast applications in engineering, economics and finance [1], [2]. For example, many matrices that arise in finite element modeling are diagonally dominant [32]; diagonally dominant matrices also arise naturally as covariance matrices of partially correlated random vectors [33]. The generalised diagonal dominance condition relaxes the diagonal dominance requirement using a diagonal scaling matrix, i.e., only a diagonally scaled version of the matrix is required to be diagonally dominant. As we will point out later, the generalised diagonal dominance condition is also equivalent to the so-called walk-summability condition [12], [34].

We propose a new distributed algorithm for solving sparse linear systems with generalised diagonal dominance. The algorithm runs iteratively on each node of the graph associated with the system, and enjoys low complexities on local information exchange between neighbouring nodes, local computation and local storage. For an acyclic graph, the algorithm converges to the exact solution in d iterations, where d is the diameter of the graph. This is also the theoretical minimum number of iterations required to solve a linear system using a distributed algorithm. For a loopy graph, the algorithm is guaranteed to converge to the exact solution asymptotically. We have compared the proposed algorithm with the classical Jacobi method [1], [2], [3] and the distributed linear system solver [16] based on average consensus and orthogonal projection to show that our algorithm has much faster convergence.

The development of the algorithm and its theoretical analysis follow from a similar algorithm known as the Gaussian belief propagation (BP) algorithm [35], which applies to Gauss-Markov random fields involving a symmetric system matrix; see, e.g., [10], [12]. Namely, the convergence properties in [10] are for symmetric matrices with diagonal dominance, and the convergence properties in [12] are for symmetric matrices with generalised diagonal dominance (or walk summability). The main contribution of our work in comparison with those in [10], [12] can be viewed as generalising the Gaussian BP algorithm to solving linear systems with an asymmetric system matrix.

1 School of Automation, Guangdong University of Technology, and Guangdong Key Laboratory of IoT Information Processing, Guangzhou 510006, China.
2 School of Electrical Engineering and Computer Science, The University of Newcastle, University Drive, Callaghan, NSW 2308, Australia.
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61633014 and U1701264). E-mails: [email protected]; [email protected]; [email protected].
II. PROBLEM FORMULATION AND PRELIMINARIES

A. Problem Formulation

Consider a network of nodes (1, 2, ..., n) associated with a state vector x = col{x_1, x_2, ..., x_n} ∈ R^n. The information available at each node i is that x satisfies a linear system:

a_i^T x = b_i,

where a_i = col{a_i1, a_i2, ..., a_in} ∈ R^n is a column vector and b_i is a scalar. Collectively, the common state satisfies

Ax = b  (1)

with A = col{a_1^T, a_2^T, ..., a_n^T} and b = col{b_1, b_2, ..., b_n}. It is assumed throughout the paper that (i) A is invertible; and (ii) a_ii ≠ 0 for all i = 1, 2, ..., n. Note that (ii) can be guaranteed under (i) because nonzero diagonals can always be obtained through proper pivoting (swapping of rows or columns) [2]. Denote the solution of (1) by

x⋆ = A^{-1} b.  (2)

Define the induced graph G = {V, E} with V = {1, 2, ..., n} and E = {(i, j) | a_ij ≠ 0 or a_ji ≠ 0}. We associate each node i with (a_ii, b_i), and each edge (i, j) with a_ij. Note in particular that G is undirected, i.e., (i, j) ∈ E if and only if (j, i) ∈ E. For each node i ∈ V, we define its neighbouring set as N_i = {j | (i, j) ∈ E} and denote its cardinality by |N_i|. We assume in the sequel that |N_i| ≪ n for all i ∈ V. The graph G is said to be connected if for any i, j ∈ V, there exists a (connecting) path (i, i_1), (i_1, i_2), ..., (i_k, j) ∈ E. The distance between two nodes is the minimum number of edges connecting the two nodes. It is obvious that a graph with a finite number of nodes is either connected or composed of a finite number of disjoint subgraphs with each of them being a connected graph. The diameter of a connected graph is defined to be the largest distance between two nodes in the graph. The diameter of a disconnected graph is the largest diameter of its connected subgraphs. By this definition, the diameter of a graph with a finite number of nodes (or a finite graph) is always finite. A loop is defined to be a path starting and ending at node i through a node j ≠ i. A graph is said to be acyclic if it does not contain any loop. A cyclic (or loopy) graph is a graph with at least one loop.

The distributed linear system problem we are interested in is to devise an iterative algorithm for every node i ∈ V to execute so that the computed estimate x̂_i(k) at iteration k = 0, 1, 2, ... will approach x⋆_i as k increases. Note that in this problem formulation, node i is only interested in its local variable x_i, and not in the solution x_j for any other node j ∈ V. This is sharply different from distributed methods in the literature (e.g., [16]) which require each node to compute the whole solution x. For a large network, computing the whole solution x is not only burdensome for each node, but also unnecessary in most applications.

We want the algorithm to be of low complexity and fast convergence. Certain constraints need to be imposed on the algorithm's complexities of communication, computation and storage to call it distributed. In our paper, these include:

C1: Local information exchange: Each node i can exchange information with each j ∈ N_i only once per iteration.
C2: Local computation: Each node i's computational load should be at most O(|N_i|) per iteration.
C3: Local storage: Each node i's storage should be at most O(|N_i|) over all iterations.

Lemma 1: Under the local information exchange constraint above, a connected graph G with diameter d needs a minimum of d iterations to solve the distributed linear system problem with a general A and b.

Proof: Let nodes i and j be such that they are d hops away from each other (such nodes exist by the definition of d). The correct solution for x_i requires the information of all the b_j connected to node i, which needs to propagate to node i. By the local information exchange constraint, this will take at least d iterations.

Remark 1: Although we focus only on linear systems with square and invertible A, under-determined and over-determined linear systems can be treated with slight modifications. For the under-determined case (where the number of rows of A is less than n), we can zero-pad (A, b) to make A square, then solve the regularised problem (A + λI)x = b for some sufficiently small scalar λ > 0. For the over-determined case (where A has full column rank and has more rows than columns), it is standard to solve the least-squares problem min_x (Ax − b)^T (Ax − b) instead, which results in the linear system (A^T A)x = (A^T b) with a square and invertible A^T A.

Remark 2: The well-celebrated Jacobi method (see, e.g., [1], [2], [3]) for solving a linear system with diagonal dominance is an excellent example of a distributed algorithm. Denoting by x̂_i(k) the estimate of the i-th component of x⋆, this method is simply written as

x̂_i(k+1) = (1/a_ii) (b_i − Σ_{j∈N_i} a_ij x̂_j(k))  (3)

with the initial condition x̂_i(0) = b_i/a_ii, i = 1, 2, ..., n. It is straightforward to verify that constraints C1-C3 are all met by this method.
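To make the update (3) concrete, the following is a minimal single-machine simulation sketch in Python/NumPy. The function name jacobi and the dense-matrix representation are our own illustration and not part of the paper; the point is that each component update uses only b_i, row a_i, and the neighbours' previous estimates, which is why C1-C3 hold.

```python
import numpy as np

def jacobi(A, b, num_iters=100):
    """Simulate the Jacobi iteration (3): node i's update needs only b_i,
    its own row of A, and the previous estimates of its neighbours."""
    d = np.diag(A)                 # the a_ii (assumed nonzero)
    x = b / d                      # initial condition x_i(0) = b_i / a_ii
    R_off = A - np.diag(d)         # off-diagonal entries a_ij, j != i
    for _ in range(num_iters):
        x = (b - R_off @ x) / d    # x_i(k+1) = (b_i - sum_j a_ij x_j(k)) / a_ii
    return x
```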
In contrast, consider the well-known enhanced version of the Jacobi method known as the Gauss-Seidel algorithm (also see [1], [2], [3]):

x̂_i(k+1) = (1/a_ii) (b_i − Σ_{j∈N_i, j<i} a_ij x̂_j(k+1) − Σ_{j∈N_i, j>i} a_ij x̂_j(k)).  (4)

This can only be implemented sequentially from i = 1 to n, due to the fact that the information of x̂_j(k+1), j < i, is required for computing x̂_i(k+1). That is, node 2 needs to wait for node 1 to finish its iteration, and so on. Because of this, distributed implementation of the Gauss-Seidel algorithm is not possible. A similar comment applies to the more general successive over-relaxation (SOR) method and the symmetric SOR (SSOR) method (also see [1], [2], [3]).
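For contrast, here is a sketch of the Gauss-Seidel update (4), again our own illustration. The in-place update means node i consumes the new values x̂_j(k+1) of nodes j < i, which is exactly what rules out concurrent execution.

```python
import numpy as np

def gauss_seidel(A, b, num_iters=100):
    """Simulate the Gauss-Seidel iteration (4). Because x is updated in
    place, node i already uses x_j(k+1) for j < i, so the sweep over the
    nodes is inherently sequential."""
    x = b / np.diag(A)
    n = len(b)
    for _ in range(num_iters):
        for i in range(n):                      # must run in order 1, ..., n
            s = A[i] @ x - A[i, i] * x[i]       # mixes x_j(k+1), j < i, with x_j(k), j > i
            x[i] = (b[i] - s) / A[i, i]
    return x
```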

B. Walk Summability and Generalised Diagonal Dominance

The linear systems we are studying in this paper belong to the class of walk-summable systems. This represents a large class of systems which find wide applications in scientific and engineering disciplines, and it includes diagonally dominant systems as a sub-class.

The notion of walk summability is generalised from Gaussian graphical models [12]. Given an n × n matrix R = {r_ij} and its induced graph G = {V, E}, a walk of length ℓ ≥ 0 in G is a sequence w = (w_0, w_1, ..., w_ℓ) of nodes w_k ∈ V with (w_k, w_{k+1}) ∈ G for all k. The weight of the walk is defined to be

φ(w) = Π_{k=1}^{ℓ} r_{w_{k−1}, w_k}.  (5)

For a zero-length "self" walk w = (v) at node v, the convention φ(w) = 1 is used. If w is an empty set, the convention is that φ(w) = 0. For notational simplicity, denote by i ↦^ℓ j a length-ℓ walk from node i to j, and denote by i ↦ j a walk from node i to j without specifying its length. The (i, j)-th element of R^ℓ can be expressed as a sum of weights of walks (or simply called a walk sum) that go from i to j with length ℓ, i.e.,

(R^ℓ)_ij = Σ_{w_1,...,w_{ℓ−1}} r_{i,w_1} r_{w_1,w_2} ... r_{w_{ℓ−1},j} = Σ_{w: i ↦^ℓ j} φ(w).  (6)

Definition 1: A given matrix R is said to be walk-summable if for all i, j ∈ V, the unordered sum over all walks from i to j in the induced graph G = {V, E}, as expressed by Σ_{w: i↦j} φ(w), is well defined. The linear system (1) with a_ii = 1 for all i is said to be walk-summable if R = I − A is walk-summable.

Define the walk sum for a set of walks W to be

φ(W) = Σ_{w∈W} φ(w).

The walk sum enjoys the following basic properties [12]:

P1: For two walk sets U and V, the product set UV = {uv | u ∈ U, v ∈ V} satisfies φ(UV) = φ(U)φ(V). In particular, for the ℓ-fold product U^ℓ = U ... U, φ(U^ℓ) = φ(U)^ℓ.
P2: If W_k are disjoint walk sets for k = 1, 2, ..., then φ(∪_{k=1}^∞ W_k) = Σ_{k=1}^∞ φ(W_k).
P3: If W_k ⊂ W_{k+1} for all k = 1, 2, ..., then φ(∪_{k=1}^∞ W_k) = lim_{k→∞} φ(W_k).

Other properties of walk-summability, its relationship with diagonally dominant systems and its equivalence with the so-called generalised diagonally dominant systems can be found in the Appendix.

A particularly important property of a walk-summable system is that the corresponding matrix R = I − D_A^{-1} A has spectral radius less than 1 (see Lemma 6), where D_A = diag{A}. It follows that [12]

A^{-1} = (I − R)^{-1} D_A^{-1} = (Σ_ℓ R^ℓ) D_A^{-1},
x⋆ = A^{-1} b = Σ_ℓ R^ℓ (D_A^{-1} b).

This leads to the following key technical lemma [12], which reveals the important role of walk-summability in distributed solutions for linear systems.

Lemma 2: Denoting P = {p_ij} = (I − R)^{-1}, it follows that

p_ij = Σ_{w: i↦j} φ(w) = φ({i ↦ j}),  (7)
x⋆_i = Σ_s P_is (b_s / a_ss) = Σ_s φ({s ↦ i}) (b_s / a_ss).  (8)

In the above, {i ↦ j} denotes the set of all walks for R from node i to node j.
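Since Lemma 6 in the Appendix shows that walk-summability of (1) is equivalent to ρ(R̄) < 1 for R = I − D_A^{-1} A and R̄ = {|r_ij|}, the condition can be checked numerically. A small sketch (the helper function is ours, not part of the paper):

```python
import numpy as np

def is_walk_summable(A):
    """Check walk-summability of the system (1): with D_A = diag(A) and
    R = I - D_A^{-1} A, Lemma 6 reduces the condition to rho(|R|) < 1."""
    R = np.eye(len(A)) - A / np.diag(A)[:, None]
    return np.max(np.abs(np.linalg.eigvals(np.abs(R)))) < 1.0
```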
III. DISTRIBUTED SOLVER FOR LINEAR SYSTEMS

Our proposed algorithm is given in Algorithm 1. In each iteration k of the algorithm, each node i computes the variables a_{i→j}(k) and b_{i→j}(k) for each of its neighbouring nodes j ∈ N_i and transmits them to node j. All the nodes execute the same algorithm concurrently.

Algorithm 1 (Distributed Solver for Linear Systems)

• Initialization: For each node i, do: for each j ∈ N_i, set a_{i→j}(0) = a_ii and b_{i→j}(0) = b_i, and transmit them to node j.
• Main loop: At iteration k = 1, 2, ..., for each node i, compute

ã_i(k) = a_ii − Σ_{v∈N_i} a_{vi} a_{iv} / a_{v→i}(k−1),  (9)
b̃_i(k) = b_i − Σ_{v∈N_i} a_{iv} b_{v→i}(k−1) / a_{v→i}(k−1),  (10)
x̂_i(k) = b̃_i(k) / ã_i(k),  (11)

then for each j ∈ N_i, compute

a_{i→j}(k) = ã_i(k) + a_{ji} a_{ij} / a_{j→i}(k−1),  (12)
b_{i→j}(k) = b̃_i(k) + a_{ij} b_{j→i}(k−1) / a_{j→i}(k−1),  (13)

and transmit them to node j.

We now provide two main results about Algorithm 1, one for acyclic graphs (Theorem 1) and one for loopy graphs (Theorem 2). Their proofs will be postponed to the next two sections.

Theorem 1: Suppose the linear system (1) is walk-summable (i.e., generalised diagonally dominant) and the induced graph G is acyclic with diameter d. Then, running Algorithm 1 gives the following:

• a_{i→j}(k) > 0 for all i ∈ V, j ∈ N_i and k = 0, 1, ...;
• a_{i→j}(k) = a_{i→j}(d−1), b_{i→j}(k) = b_{i→j}(d−1), ã_i(k) = ã_i(d) and b̃_i(k) = b̃_i(d) for all i ∈ V, j ∈ N_i and k ≥ d;
• More importantly,

x̂_i(k) = x⋆_i, ∀ k ≥ d, i ∈ V.  (14)

Theorem 2: Suppose the linear system (1) is walk-summable (i.e., generalised diagonally dominant). Then, running Algorithm 1 yields a_{i→j}(k) > 0 for all i ∈ V, j ∈ N_i and k = 0, 1, ..., and

lim_{k→∞} x̂_i(k) = x⋆_i, ∀ i ∈ V.  (15)
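The following is a centralized simulation sketch of Algorithm 1 in Python/NumPy, implementing the updates (9)-(13) literally. The message containers and the function name are our own choices; in a real deployment each node would store only the messages a_{v→i}, b_{v→i} received from its neighbours.

```python
import numpy as np

def distributed_solver(A, b, num_iters=50):
    """Simulation sketch of Algorithm 1. a_msg[i, j] and b_msg[i, j] hold
    the messages a_{i->j}(k-1) and b_{i->j}(k-1) of (9)-(13)."""
    n = len(b)
    nbrs = [np.flatnonzero((A[i] != 0) | (A[:, i] != 0)) for i in range(n)]
    nbrs = [v[v != i] for i, v in enumerate(nbrs)]
    a_msg = {(i, j): A[i, i] for i in range(n) for j in nbrs[i]}  # a_{i->j}(0) = a_ii
    b_msg = {(i, j): b[i] for i in range(n) for j in nbrs[i]}     # b_{i->j}(0) = b_i
    x = np.asarray(b, dtype=float) / np.diag(A)
    for _ in range(num_iters):
        a_new, b_new = {}, {}
        for i in range(n):
            a_t = A[i, i] - sum(A[v, i] * A[i, v] / a_msg[v, i] for v in nbrs[i])   # (9)
            b_t = b[i] - sum(A[i, v] * b_msg[v, i] / a_msg[v, i] for v in nbrs[i])  # (10)
            x[i] = b_t / a_t                                                        # (11)
            for j in nbrs[i]:
                a_new[i, j] = a_t + A[j, i] * A[i, j] / a_msg[j, i]                 # (12)
                b_new[i, j] = b_t + A[i, j] * b_msg[j, i] / a_msg[j, i]             # (13)
        a_msg, b_msg = a_new, b_new
    return x
```

On an acyclic graph, by Theorem 1 the returned estimates are exact once num_iters reaches the diameter d of the graph.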

IV. CONVERGENCE ANALYSIS FOR ACYCLIC GRAPHS

In this section, we consider a tree graph G and proceed to prove Theorem 1. Throughout the analysis, we assume without loss of generality that a_ii = 1 for all i, so that the initialisation of Algorithm 1 gives a_{i→j}(0) = 1; as noted at the beginning of Section V, the case a_ii ≠ 1 only complicates the writing.

Suppose G is a connected graph with diameter d. Take any node i ∈ V and j ∈ N_i; it is obvious that G becomes two disjoint subgraphs if we remove the edge (i, j). Denote by G_{i\j} the subgraph containing i, i.e., the subgraph obtained from G by removing all the nodes and edges connected to i through j (including node j and the edge (i, j)). Denote by G_{i\j}(k) the subgraph of G_{i\j} containing only the nodes which are within k hops away from node i. An example of such a subgraph is G_{2\1}(1) in Fig. 1, which contains the nodes 2, 4 and 5 only. In particular, G_{i\j}(0) is a singleton graph containing only i, and G_{i\j}(k) = G_{i\j}(d−1) = G_{i\j} for all k ≥ d due to the definition of graph diameter. Denote by i ↦ i | G_{i\j}(k) a return walk for node i constrained in G_{i\j}(k). The set of such walks is denoted by {i ↦ i} | G_{i\j}(k). For the subgraph G_{2\1}(1) in Fig. 1, an example of such a walk is w = (2, 4, 2, 5, 2, 4, 2).

Fig. 1. An example of an acyclic graph.

Note that a general walk from node s to node i, s ↦ i, can pass through node i multiple times. To express clearly whether a walk passes through node i or not, we denote by s ↦^{\i} i a walk from node s to node i without passing through node i in the middle. In particular, i ↦^{\i} i is a single-return walk for node i (i.e., it goes from node i to node i without node i in the middle). Denote by (i ↦^{\i} i)^ℓ an ℓ-return walk for node i, i.e., it passes through node i (ℓ − 1) times in the middle.

Lemma 3: For any node i ∈ V, j ∈ N_i and k ≥ 0, we have

a_{i→j}(k) = 1 / φ({i ↦ i} | G_{i\j}(k)) > 0.  (16)

In addition, a_{i→j}(k) = a_{i→j}(d−1) for all k ≥ d, where d is the diameter of G.

Proof: We first consider a connected graph G with diameter d. We proceed by induction. Note that G_{i\j}(k) is a depth-k tree graph rooted at node i. It is clear that a_{i→j}(0) = 1 = 1/φ({i ↦ i} | G_{i\j}(0)) > 0 because G_{i\j}(0) contains node i only, so the only return walk is (i), i.e., φ((i)) = 1. Now assume, for some k ≥ 0, that

a_{i→j}(k₁) = 1 / φ({i ↦ i} | G_{i\j}(k₁)) > 0

for all nodes i ∈ V, j ∈ N_i and all k₁ = 0, 1, ..., k−1. We need to show that (16) also holds for k. To see this, we consider G_{i\j}(k) and construct its subgraphs G_{v\i}(k−1) for all v ∈ N_i\j. Obviously, each such subgraph has depth k−1 at most. Using the notations of i ↦^{\i} i and (i ↦^{\i} i)^ℓ, we have

φ({i ↦ i} | G_{i\j}(k)) = Σ_ℓ φ({(i ↦^{\i} i)^ℓ} | G_{i\j}(k)) = Σ_ℓ φ({i ↦^{\i} i} | G_{i\j}(k))^ℓ = 1 / (1 − φ({i ↦^{\i} i} | G_{i\j}(k))) > 0.

The last equality above and the positivity of the sum follow from the walk-summability assumption. Now, every single-return walk w in G_{i\j}(k) must be of the form w = (i, w_v, i), where w_v is a return walk (not necessarily single-return) in G_{v\i}(k−1) for some v ∈ N_i\j. That is, φ(w) = (−a_{iv})(−a_{vi}) φ(w_v). Hence,

φ({i ↦^{\i} i} | G_{i\j}(k)) = Σ_{v∈N_i\j} a_{iv} a_{vi} φ({v ↦ v} | G_{v\i}(k−1)) = Σ_{v∈N_i\j} a_{iv} a_{vi} / a_{v→i}(k−1).

The last step above used the induction assumption that a_{v→i}(k−1) = 1/φ({v ↦ v} | G_{v\i}(k−1)). Combining the above analysis and using (9) and (12), we obtain

0 < 1 / φ({i ↦ i} | G_{i\j}(k)) = 1 − Σ_{v∈N_i\j} a_{iv} a_{vi} / a_{v→i}(k−1) = a_{i→j}(k),

which is (16). The induction proof is complete. The property that a_{i→j}(k) = a_{i→j}(d−1) for all k ≥ d follows from the fact that G_{i\j}(k−1) has depth d−1 at most, because G has diameter d and j is a neighbour of i.

For the case that G is not connected, it is composed of a finite number of connected subgraphs, each having a diameter smaller than d. It is clear that (16) holds for each subgraph, thus the result in the lemma still holds.
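As a concrete check of (16) against (9) and (12) (a worked example of ours, using the normalisation a_ii = 1): in the subgraph G_{2\1}(1) of Fig. 1, the single-return walks for node 2 are (2, 4, 2) and (2, 5, 2), so

φ({2 ↦^{\2} 2} | G_{2\1}(1)) = a_24 a_42 + a_25 a_52,

and the geometric series in the proof gives

a_{2→1}(1) = 1 − a_24 a_42 − a_25 a_52,

which is exactly what (9) and (12) produce from the initial messages a_{4→2}(0) = a_44 = 1 and a_{5→2}(0) = a_55 = 1.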

Lemma 4: For any node i ∈ V, j ∈ N_i and k ≥ 0, we have

b_{i→j}(k) = Σ_s φ({s ↦^{\i} i} | G_{i\j}(k)) b_s,  (17)

where s ranges over all nodes in G_{i\j}(k). In addition, b_{i→j}(k) = b_{i→j}(d−1) for all k ≥ d, where d is the diameter of G.

Proof: As in the previous result, it is sufficient to consider a connected G. We proceed by induction. Since G_{i\j}(0) contains i only, φ({s ↦^{\i} i} | G_{i\j}(0)) = 1 for s = i (and the walk set is empty otherwise). This means that Σ_s φ({s ↦^{\i} i} | G_{i\j}(0)) b_s = b_i = b_{i→j}(0). That is, (17) holds for k = 0. Now assume that, for some k ≥ 1,

b_{i→j}(k₁) = Σ_s φ({s ↦^{\i} i} | G_{i\j}(k₁)) b_s

for all nodes i ∈ V, j ∈ N_i and k₁ = 0, 1, ..., k−1. We need to show that (17) holds for k. Indeed, take any walk s ↦^{\i} i in G_{i\j}(k). If s = i, this walk is just a singleton (i), for which φ({s ↦^{\i} i} | G_{i\j}(k)) b_s = b_i. For s ≠ i, this walk must return to i through some v ∈ N_i\j, i.e., this walk can be written as the concatenation of a sub-walk s ↦ v in the subgraph G_{v\i}(k−1) and (v, i) for some v ∈ N_i\j. The subgraph has depth k−1 at most because G_{i\j}(k) has depth k at most. It follows that

Σ_s φ({s ↦^{\i} i} | G_{i\j}(k)) b_s = b_i + Σ_{v∈N_i\j} Σ_s φ({s ↦ v} | G_{v\i}(k−1)) (−a_{iv}) b_s.

In the right-hand side above, s obviously ranges over all the nodes in G_{v\i}(k−1). Now, every walk s ↦ v in G_{v\i}(k−1) is a concatenation of subwalks s ↦^{\v} v and v ↦ v in G_{v\i}(k−1). It follows that

φ({s ↦ v} | G_{v\i}(k−1)) = φ({s ↦^{\v} v} | G_{v\i}(k−1)) φ({v ↦ v} | G_{v\i}(k−1)) = φ({s ↦^{\v} v} | G_{v\i}(k−1)) · (1 / a_{v→i}(k−1)).

The last step above used Lemma 3. Combining the analysis above, we get

Σ_s φ({s ↦^{\i} i} | G_{i\j}(k)) b_s = b_i − Σ_{v∈N_i\j} (a_{iv} / a_{v→i}(k−1)) Σ_s φ({s ↦^{\v} v} | G_{v\i}(k−1)) b_s = b_i − Σ_{v∈N_i\j} a_{iv} b_{v→i}(k−1) / a_{v→i}(k−1) = b_{i→j}(k).

The second last step above used the induction assumption, and the last step used (10) and (13). This completes the induction proof. Finally, b_{i→j}(k) = b_{i→j}(d−1) for all k ≥ d follows from the fact that G_{i\j}(k) has depth d−1 at most.

Now we are ready to prove Theorem 1.

Proof: As before, it suffices to consider the case that G is connected. The first two properties in the theorem follow directly from Lemmas 3-4. It remains to show (14).

Consider any node i ∈ V. Add a fictitious node 0 and a fictitious edge (i, 0) to the graph G and denote the augmented graph by G̃, as depicted in Fig. 2 for root node i = 3. Let k ≥ d_i, where d_i is the depth of the graph G when treating node i as the root. It is clear that d_i ≤ d and that G̃_{i\0}(k) = G. We have from (9) and (12),

a_{i→0}(k) = a_ii − Σ_{v∈N_i} a_{vi} a_{iv} / a_{v→i}(k−1) = ã_i(k).

Fig. 2. The acyclic graph of Fig. 1 augmented with a fictitious node 0.

On the other hand, from Lemma 3, we have

a_{i→0}(k) = 1 / φ({i ↦ i} | G̃_{i\0}(k)) = 1 / φ({i ↦ i} | G) = 1 / φ({i ↦ i}).

It follows that

ã_i(k) = 1 / φ({i ↦ i}).

Similarly, from (10) and (13),

b_{i→0}(k) = b_i − Σ_{v∈N_i} a_{iv} b_{v→i}(k−1) / a_{v→i}(k−1) = b̃_i(k).

On the other hand, from Lemma 4, we have

b_{i→0}(k) = Σ_s φ({s ↦^{\i} i} | G̃_{i\0}(k)) b_s = Σ_s φ({s ↦^{\i} i} | G) b_s = Σ_s φ({s ↦^{\i} i}) b_s.

It follows that

b̃_i(k) = Σ_s φ({s ↦^{\i} i}) b_s.

Also note that φ({s ↦ i}) = φ({s ↦^{\i} i}) φ({i ↦ i}). Finally, combining the above with Lemma 2, we get

x⋆_i = Σ_s φ({s ↦ i}) b_s = Σ_s φ({s ↦^{\i} i}) φ({i ↦ i}) b_s = (1/ã_i(k)) Σ_s φ({s ↦^{\i} i}) b_s = b̃_i(k) / ã_i(k) = x̂_i(k).

The last step above comes from (11). Since the above holds for all k ≥ d_i, hence for all k ≥ d, this completes the proof.

Remark 3: We see in the proof above that ã_i(k) and b̃_i(k) actually converge after d_i iterations, where d_i is the depth of the graph G when treating node i as the root. This property will be useful in the proof of Theorem 2.

V. CONVERGENCE ANALYSIS FOR LOOPY GRAPHS

In this section, we consider a general graph G (acyclic or loopy) under the walk-summability assumption to prove Theorem 2. Similar to the proof of Theorem 1, it suffices to consider a connected graph G, which we will assume below. We will also assume that a_ii = 1 for all i, as the case of a_ii ≠ 1 does not alter the proof, but just complicates the writing.

A. Unwrapped Graph

Following the work of [10], we construct an unwrapped tree with depth t > 0 for a loopy graph G [10]. Take node i to be the root and then iterate the following procedure t times:

• Find all leaves of the tree (start with the root);
• For each leaf, find all the nodes in the loopy graph that neighbour this leaf node, except its parent node in the tree, and add all these nodes as the children of this leaf node.
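The two steps above are sketched in code below; the flat representation (each replica identified by an integer index, with a parent pointer and the label of the graph node it copies) is our own choice.

```python
import numpy as np

def unwrapped_tree(A, root, depth):
    """Build the unwrapped tree of the (possibly loopy) graph induced by A.
    Replica t stores parent[t] (tree index of its parent, -1 for the root)
    and labels[t] (the graph node it is a copy of)."""
    n = A.shape[0]
    nbrs = [np.flatnonzero((A[i] != 0) | (A[:, i] != 0)) for i in range(n)]
    nbrs = [v[v != i] for i, v in enumerate(nbrs)]
    labels, parent, leaves = [root], [-1], [0]
    for _ in range(depth):
        next_leaves = []
        for leaf in leaves:
            for v in nbrs[labels[leaf]]:
                if parent[leaf] >= 0 and v == labels[parent[leaf]]:
                    continue                  # skip the leaf's own parent
                labels.append(int(v))
                parent.append(leaf)
                next_leaves.append(len(labels) - 1)
        leaves = next_leaves
    return parent, labels
```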
Fig. 3. Left: A loopy graph. Right: The unwrapped tree for root node 1 with 4 layers (t = 4).

The variables and weights for each node in the unwrapped tree are copied from the corresponding nodes in the loopy graph. It is clear that taking each node as the root node will generate a different unwrapped tree. Fig. 3 shows the unwrapped tree around root node 1 for a loopy graph. Note, for example, that nodes 1', 1'', 1''', 1‘, 1“, 1“‘ all carry the same values b_1 and a_11. Similarly, if node 1' is the parent (or child) of node j' in the unwrapped tree, and node 1' and node j' are unwrapped versions of nodes 1 and j, then a_{1'j'} = a_{1j} (or a_{j'1'} = a_{j1}).

List the nodes in the unwrapped tree in breadth-first order, starting from the root node, followed by the first layer (i.e., the children of the root node), then the second layer, etc. Denote the unwrapped tree as G_i^(t) = {V_i^(t), E_i^(t)} with the associated matrix A_i^(t) = I − R_i^(t). It is obvious that G_i^(t) is connected by construction.

We have the following key result from [12].

Lemma 5: There is a one-to-one correspondence between finite-length walks in G that end at i, and walks in G_i^(∞).

We are now ready to prove Theorem 2.

Proof: Consider any i ∈ V. Recall from (8) that x⋆_i = Σ_s φ({s ↦ i}) b_s. Take any s ∈ V and consider the set of walks {s ↦ i}. From Lemma 5, any finite-length walk w in {s ↦ i} (that starts in s) has a counterpart in G_i^(t) for a sufficiently large t, and this walk starts with s or a replica of it. With some abuse of notation, we denote by W(s ↦ i)^(t) the set of walks in G_i^(t) that end at i but start from either s or a replica of it. Lemma 5 also implies that any walk (finite-length or not) in W(s ↦ i)^(t) (which starts with s or a replica of it) has a counterpart in G that starts in s. This last property also implies that the set of walks W(s ↦ i)^(t) is walk-summable. Due to the nesting G_i^(t) ⊂ G_i^(t+1), we have W(s ↦ i)^(t) ⊂ W(s ↦ i)^(t+1). Therefore, Lemma 5 implies that

φ({s ↦ i}) = φ(∪_t W(s ↦ i)^(t)) = lim_{t→∞} φ(W(s ↦ i)^(t)).  (18)

The last step follows from Property P3 of walk-summability. On the other hand, by defining

x_i^(t) = Σ_s φ(W(s ↦ i)^(t)) b_s,

Lemma 2 says that x_i^(t) is the correct solution of x_i for the graph G_i^(t). Note that G_i^(t) has depth t when viewed from root node i. By applying Theorem 1 to G_i^(t) and noting Remark 3, we know that x_i^(t) is correctly obtained after running Algorithm 1 on G_i^(t) for t iterations. Also, by the construction of the unwrapped tree graph, running Algorithm 1 on G_i^(t) for t iterations is the same as running Algorithm 1 on G for t iterations, and the latter generates x̂_i(t) in (11). Hence, x_i^(t) = x̂_i(t). Combining this with (18), we conclude that

lim_{t→∞} x̂_i(t) = lim_{t→∞} Σ_s φ(W(s ↦ i)^(t)) b_s = Σ_s φ({s ↦ i}) b_s = x⋆_i,

which is (15). The last step above used Lemma 2.

VI. EXAMPLES

The convergence of Algorithm 1 is demonstrated in this section through three examples: one on an acyclic graph, one on a small loopy graph, and one on a large loopy graph. The performance is compared with two methods. The first method is the Jacobi method (3) as explained earlier. Incidentally, the Jacobi method is guaranteed to converge correctly for walk-summable systems because, using A = D_A(I − R), (3) can be rewritten as

x̂(k+1) = R x̂(k) + D_A^{-1} b,  (19)

and we have ρ(R) < 1.

The second method we consider is the distributed algorithm in [16] for solving linear systems. This algorithm is generalised from a consensus-based approach by applying orthogonal projection. Defining x_i(k) as the (vector) estimate of x⋆ by node i at iteration k, the distributed algorithm in [16] deploys the following distributed iteration:

x_i(k+1) = x_i(k) − (1/|N_i|) P_i (|N_i| x_i(k) − Σ_{j∈N_i} x_j(k))  (20)

with the initial condition of x_i(0) being any solution of the i-th row of (1), where P_i is an orthogonal projection matrix for the i-th row. In particular, we take x_i(0) = (b_i/a_ii) e_i with e_i being the vector with i-th component equal to 1 and all other components equal to 0. Note that this algorithm does not satisfy the low complexity constraints C2-C3 because each node i estimates the whole vector x, thus not qualifying as a distributed algorithm in our definition. Nevertheless, this algorithm is used to compare convergence.
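For completeness, here is a simulation sketch of the iteration (20). The explicit projector P_i = I − a_i a_i^T / (a_i^T a_i) is our reading of [16], chosen so that a_i^T x_i(k) = b_i is preserved at every iteration; the function name and containers are ours.

```python
import numpy as np

def consensus_projection(A, b, num_iters=5000):
    """Simulate the distributed iteration (20) from [16]: each node keeps a
    full n-vector estimate, projected so its own equation stays satisfied."""
    n = len(b)
    nbrs = [np.flatnonzero((A[i] != 0) | (A[:, i] != 0)) for i in range(n)]
    nbrs = [v[v != i] for i, v in enumerate(nbrs)]
    P = [np.eye(n) - np.outer(A[i], A[i]) / (A[i] @ A[i]) for i in range(n)]
    X = np.zeros((n, n))                       # row i holds node i's estimate
    for i in range(n):
        X[i, i] = b[i] / A[i, i]               # x_i(0) = (b_i / a_ii) e_i satisfies row i
    for _ in range(num_iters):
        X_new = np.empty_like(X)
        for i in range(n):
            m = len(nbrs[i])
            X_new[i] = X[i] - P[i] @ (m * X[i] - X[nbrs[i]].sum(axis=0)) / m  # (20)
        X = X_new
    return X                                   # every row converges to x*
```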

Fig. 4. Convergence of the proposed Algorithm 1 on the acyclic graph.

Fig. 6. Convergence of the distributed method in [16] on the acyclic graph.

Fig. 5. Convergence of the Jacobi method on the acyclic graph.

Fig. 7. A 13-node loopy graph for Example 2.

A. Example 1: Acyclic Graph

Our first example is an acyclic graph depicted in Fig. 1. The matrix A has a_ii = |N_i| and the non-zero a_ij randomly chosen from (−1, −0.85), and b_i = i. The diameter of the graph is d = 4. The simulated result for Algorithm 1 is shown in Fig. 4. As we can see, it takes only 4 iterations to converge. In comparison, we show in Fig. 5 the simulated result for the same example using the Jacobi method, and in Fig. 6 using the distributed method for linear systems in [16]. We see that the Jacobi method converges in about 60 iterations, with an error of approximately 0.1. For reaching a similar error of 0.1, it takes the method in [16] about 5000 iterations.

B. Example 2: Loopy Graph

Our second example is a loopy graph depicted in Fig. 7. The values of A and b_i = i are chosen in the same way as before. This choice makes A diagonally dominant, hence walk-summable. The simulated result for Algorithm 1 is shown in Fig. 8. After 100 iterations, the error has converged down to approximately 0.8 × 10^{−4}. Fig. 9 shows the simulated result for the Jacobi method, and Fig. 10 for the distributed method in [16]. We see that the Jacobi method converges considerably slower, with an error of approximately 0.02 after 100 iterations. For reaching a similar error, it takes the method in [16] nearly 20,000 iterations.

Fig. 8. Convergence of the proposed Algorithm 1 on the loopy graph.
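A reproduction sketch of the Example 1 setup is given below, reusing the jacobi and distributed_solver sketches above. We use a random tree as a stand-in, since the exact tree of Fig. 1 is not reproduced here; the construction rules (a_ii = |N_i|, off-diagonals in (−1, −0.85), b_i = i) are the paper's.

```python
import numpy as np
rng = np.random.default_rng(0)

def example_system(n=7):
    """Stand-in for the Example 1 setup: a random tree with a_ii = |N_i|,
    non-zero off-diagonal a_ij drawn from (-1, -0.85), and b_i = i."""
    A = np.zeros((n, n))
    for i in range(1, n):
        p = int(rng.integers(0, i))            # attach node i to an earlier node
        A[i, p] = rng.uniform(-1.0, -0.85)
        A[p, i] = rng.uniform(-1.0, -0.85)
    deg = (A != 0).sum(axis=1)
    A[np.arange(n), np.arange(n)] = deg        # a_ii = |N_i|, diagonally dominant
    return A, np.arange(1.0, n + 1.0)          # b_i = i

A, b = example_system()
x_star = np.linalg.solve(A, b)
# Exact once the iteration count reaches the tree diameter (Theorem 1):
print(np.abs(distributed_solver(A, b, num_iters=7) - x_star).max())
# Jacobi converges only asymptotically (cf. Fig. 5):
print(np.abs(jacobi(A, b, num_iters=60) - x_star).max())
```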


Fig. 9. Convergence of the Jacobi method on the loopy graph.

Fig. 11. A 1000-node loopy graph for Example 3.

Fig. 10. Convergence of the distributed method in [16] on the loopy graph.

Fig. 12. Convergence of the Proposed Algorithm 1

C. Example 3: Large Loopy Graph

Our third example involves a randomly connected graph with 1000 nodes, as shown in Fig. 11. The circles indicate the nodes and the curves indicate the edges. The matrix A is chosen by taking the non-zero off-diagonal terms R_ij (the (i, j)-th entry of R) to be random within (−0.05, 0.05), and b_i = i for all i. Fig. 12 plots the convergence of x̂(k).

To better show the convergence of Algorithm 1, we plot the logarithmic error log_10(‖x̂(k) − x⋆‖²/n) versus the iteration number k in Fig. 13. As we can see from the figure, the error decreases exponentially (or the logarithmic error decreases linearly) in the number of iterations. The error floor after 15 iterations or so is due to the accuracy limitation of Matlab. Measuring from the figure, the decreasing slope is roughly equal to −1. In comparison, for this specific realisation of R, the computation shows that log_10(ρ(R)) = −0.9395 and log_10(ρ(R̄)) = −0.5167.

Fig. 13. Convergence error of the proposed Algorithm 1 (legend: R_ij ∼ U(−0.05, 0.05), b_i = i; axes: log_10(‖x̂(k) − x⋆‖²/n) versus iteration number k).
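A sketch of the Example 3 diagnostics follows, reusing the distributed_solver sketch from Section III. The sparsity pattern below (about 8 neighbours per node) is our own stand-in, since the 1000-node graph of Fig. 11 is not reproduced; the entry distribution, b, and the quantities compared are the paper's.

```python
import numpy as np
rng = np.random.default_rng(1)

n = 1000
mask = rng.random((n, n)) < 4.0 / n                # assumed sparsity, not from the paper
mask = (mask | mask.T) & ~np.eye(n, dtype=bool)
R = np.where(mask, rng.uniform(-0.05, 0.05, (n, n)), 0.0)
A = np.eye(n) - R                                  # a_ii = 1, off-diagonals -R_ij
b = np.arange(1.0, n + 1.0)
x_star = np.linalg.solve(A, b)
for k in (5, 10, 15):                              # log-error roughly linear in k
    err = np.linalg.norm(distributed_solver(A, b, num_iters=k) - x_star) ** 2 / n
    print(k, np.log10(err))
print(np.log10(np.max(np.abs(np.linalg.eigvals(R)))),          # cf. log10(rho(R))
      np.log10(np.max(np.abs(np.linalg.eigvals(np.abs(R))))))  # cf. log10(rho(|R|))
```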

Remark 4: To help understand why Algorithm 1 outperforms the Jacobi method, it is straightforward to derive from (19) that

x̂(k) = Σ_{ℓ=0}^{k} R^ℓ (D_A^{-1} b).

This means that in the k-th iteration, the Jacobi estimate x̂_i(k) in (19) contains all the walks (from every node s) to node i up to length k. In other words, the Jacobi estimate x̂_i(k) in (19) misses all the walks of length k + 1 or longer, when compared with its asymptotic value (i.e., the exact value of x⋆_i). In comparison, a closer look at the analysis in Section V reveals that the proposed estimate x̂_i(k) in Algorithm 1 actually contains all the walks in the unwrapped depth-k tree G_i^(k), rooted at node i, including all the walks used by the Jacobi method as a strict subset. This explains why Algorithm 1 outperforms the Jacobi method in general.

VII. CONCLUSION

We take this space to comment on several theoretical features of the proposed distributed algorithm for linear systems.

Firstly, the algorithm can be viewed as an enhanced version of the classical Jacobi method. In the Jacobi method, the information flow from node i to its neighbouring node j in iteration k gets fed back right away in iteration k + 1. That is, x̂_i(k) is used in computing x̂_j(k + 1), which is then used in computing x̂_i(k + 2), as evident in (3). Such direct feedback creates information contamination, and it requires an infinite number of iterations for the contamination to dissipate completely. In the proposed algorithm, the information flow from node i to node j moves on to other nodes without returning to node i directly. This allows the algorithm to converge in only a finite (and minimum) number of iterations for an acyclic graph. Even for a cyclic graph, this feature leads to a much faster convergence because the information can flow back only through loops, resulting in much weaker information contamination. The tradeoff is that a slightly stronger condition, i.e., walk-summability (or ρ(R̄) < 1), is required to guarantee the correct convergence of the proposed algorithm, whereas the Jacobi method requires only ρ(R) < 1. The technical insight for this difference is that unordered sums of walks are used in the proposed algorithm, whereas the Jacobi method uses a specific ordered sum of walks, with the ordering given by (3).

Secondly, the proposed algorithm can be viewed as a distributed version of the Gauss elimination method. The link to Gauss elimination is evident from the proofs of Theorems 1-2. Effectively, every node carries out its own sequence of Gauss eliminations independently by moving the required information around through the nodes in the graph.

Thirdly, the proposed algorithm resembles the well-known Belief Propagation (BP) algorithm for computing the marginal distributions of a multi-variate probability density function, originally proposed by Pearl [35]. Indeed, some ideas are adopted from the BP algorithm to develop the proposed algorithm, and its convergence proof for the loopy graph case is partially based on an idea in the important works of [10] and [12], namely, the use of unwrapped tree graphs and walk-summability analysis.

Finally, it would be interesting and important to generalise the proposed algorithm to block diagonally dominant matrices, i.e., where each x_i is a vector rather than a scalar. Proper development of this generalisation will be carried out in the future. Finding weaker conditions than walk summability for correct convergence will be an important task in the future. Also, finding the exact convergence rate of the proposed algorithm will be very interesting; we conjecture that the convergence rate is theoretically faster than that of the Jacobi method. Generalising our work to the scenario where the system data (A, b) and the graph topology G are time-varying will be very interesting for many real-time applications as well.

APPENDIX: WALK SUMMABILITY PROPERTIES

We have the following result on walk summability, generalised from [12] for symmetric and positive semidefinite matrices to general matrices in our case.

Lemma 6: Given an n × n matrix R = {r_ij} and its induced graph G = {V, E}, define R̄ = {|r_ij|}. Then, the following implications hold:

R is walk-summable
⇔ R̄ is walk-summable
⇔ Σ_ℓ R̄^ℓ converges
⇔ ρ(R̄) < 1
⇒ Σ_ℓ R^ℓ converges
⇔ ρ(R) < 1
⇒ A = D_A(I − R) is nonsingular (where D_A > 0).

In the above, ρ(·) denotes the spectral radius of a matrix.
Proof: By the basic results of analysis [36] (as pointed out in [12]), the unordered sum is well defined if and only if it converges absolutely, i.e., if and only if Σ_{w: i↦j} |φ(w)| converges (i.e., the sum is finite and unique, hence well defined) for all i, j. It is clear that |φ(w)| corresponds to R̄. Hence, R is walk-summable if and only if R̄ is walk-summable. It follows from (6) that R̄ is walk-summable if and only if Σ_ℓ R̄^ℓ converges, which is equivalent to ρ(R̄) < 1 (a basic stability result [2]). Taking any N > 0, we have

|Σ_{ℓ=0}^{N} (R^ℓ)_ij| ≤ Σ_{ℓ=0}^{N} |(R^ℓ)_ij| = Σ_{ℓ=0}^{N} |Σ_{w: i↦^ℓ j} φ(w)| ≤ Σ_{ℓ=0}^{N} Σ_{w: i↦^ℓ j} |φ(w)| = Σ_{ℓ=0}^{N} (R̄^ℓ)_ij

for all i, j. Taking N → ∞, we know that the convergence of Σ_ℓ R̄^ℓ implies that of Σ_ℓ R^ℓ, which is equivalent to ρ(R) < 1 (using [2] again). Finally, ρ(R) < 1 obviously implies that A = D_A(I − R) is nonsingular.

Remark 5: We see in the lemma above a seemingly puzzling phenomenon: the walk-summability of R and that of R̄ are equivalent, yet the convergence of Σ_ℓ R̄^ℓ and that of Σ_ℓ R^ℓ are not equivalent. The reason is that walk-summability requires the unordered sum over all walks from node i to node j to be well defined, whereas Σ_ℓ R^ℓ corresponds to a particular ordered sum (i.e., in the order of ℓ = 1, 2, ...).
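To see that the middle implication in Lemma 6 is indeed one-way, here is a small example we constructed (not from the paper): a zero-diagonal R, as produced by R = I − D_A^{-1} A, with ρ(R) < 1 but ρ(R̄) > 1. For such a system the Jacobi iteration (19) converges even though the system is not walk-summable.

```python
import numpy as np

# R is 0.55 times a skew-symmetric matrix, so its eigenvalues are 0 and
# ±0.55*sqrt(3)*i (modulus ≈ 0.95 < 1), while |R| has spectral radius 1.1.
R = 0.55 * np.array([[ 0.0,  1.0, 1.0],
                     [-1.0,  0.0, 1.0],
                     [-1.0, -1.0, 0.0]])
print(np.max(np.abs(np.linalg.eigvals(R))))          # ≈ 0.953: Jacobi converges
print(np.max(np.abs(np.linalg.eigvals(np.abs(R)))))  # 1.1: not walk-summable
```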

Next, we show that walk-summable systems include diagonally dominant ones.

Lemma 7: If A is diagonally dominant [3], then R = I − D_A^{-1} A is walk-summable.

Proof: Without loss of generality, assume a_ii = 1 for all i. Then, A being diagonally dominant and a_ii = 1 imply that γ_i = Σ_{j≠i} |a_ij| < 1 for every i. The well-known Gershgorin circle theorem [3], applied to R̄ (which has zero diagonal and off-diagonal entries |a_ij|), says that every eigenvalue of R̄ lies in at least one of the Gershgorin discs D(0, γ_i). Hence ρ(R̄) < 1, and then R is walk-summable by Lemma 6.

Finally, we comment on the equivalence between walk-summability and generalised diagonal dominance. Following [34], a matrix A = {a_ij} is said to be generalised diagonally dominant if there exists a positive diagonal matrix D = diag{d_i} such that D^{-1}AD is diagonally dominant, i.e.,

|a_ii| d_i > Σ_{j≠i} |a_ij| d_j, ∀ i.

We will call the system (1) generalised diagonally dominant if A is generalised diagonally dominant.

Lemma 8: The linear system (1) is walk-summable if and only if it is generalised diagonally dominant.

Proof: Consider the matrix A in (1). Without loss of generality, assume a_ii = 1 for all i. Defining the comparison matrix of A by M(A) = {α_ij} with α_ii = |a_ii| and α_ij = −|a_ij| for j ≠ i, it is clear that A is generalised diagonally dominant if and only if M(A) is so. It is known (see, e.g., [34]) that a matrix A is generalised diagonally dominant if and only if A is an H-matrix. Note that a matrix A is called an H-matrix if its comparison matrix M(A) is an M-matrix. Recall that an M-matrix can be written as sI − R̄ with R̄ = {r̄_ij}, r̄_ij ≥ 0, and s ≥ ρ(R̄) [2]. In our case, a_ii = 1 for all i and s = 1. We see that A is generalised diagonally dominant if and only if ρ(R̄) < 1, and this is the same as R being walk-summable, as shown in Lemma 6, i.e., (1) being walk-summable.

We also note that an iterative algorithm for searching for the diagonal matrix D can be found in [34].

REFERENCES

[1] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, 1996.
[2] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
[3] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, 2003.
[4] A. Mahmood, D. J. Lynch and L. D. Philipp, "A fast banded matrix inversion using connectivity of Schur's complements," IEEE Int. Conf. Systems Engineering, 1991.
[5] R.-S. Ran and T.-Z. Huang, "An inversion algorithm for a banded matrix," Computers & Mathematics with Applications, vol. 58, no. 9, pp. 1699-1710, 2009.
[6] P. Rozsa, R. Bevilacqua, F. Romani and P. Favati, "On band matrices and their inverses," Linear Algebra Appl., vol. 150, pp. 287-295, 1991.
[7] A. Abur, "A parallel scheme for the forward/backward substitutions in solving sparse linear equations," IEEE Transactions on Power Systems, vol. 3, no. 4, pp. 1471-1478, Nov. 1988.
[8] U. A. Khan and J. M. F. Moura, "Distributed iterate-collapse inversion (DICI) algorithm for L-banded matrices," IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2008.
[9] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, vol. 53, pp. 65-78, 2004.
[10] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," Neural Computation, vol. 13, no. 10, pp. 2173-2200, 2001.
[11] O. Shental et al., "Gaussian belief propagation solver for systems of linear equations," ISIT 2008, Toronto, Canada, July 6-11, 2008.
[12] D. M. Malioutov, J. K. Johnson and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," Journal of Machine Learning Research, vol. 7, pp. 2031-2064, 2006.
[13] S. Kar, J. M. F. Moura and K. Ramanan, "Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication," IEEE Trans. Information Theory, vol. 58, no. 6, pp. 1-52, Jun. 2012.
[14] Z. Wu, M. Fu, Y. Xu and R. Lu, "A distributed Kalman filtering algorithm with fast finite-time convergence for sensor networks," Automatica, vol. 97, pp. 82-91, Nov. 2018.
[15] K. Xie, Q. Cai and M. Fu, "A fast clock synchronization algorithm for wireless sensor networks," Automatica, vol. 92, pp. 133-142, 2018.
[16] S. Mou, J. Liu and A. S. Morse, "A distributed algorithm for solving a linear algebraic equation," IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2863-2878, 2015.
[17] S. Mou, J. Liu and A. S. Morse, "Asynchronous distributed algorithms for solving linear algebraic equations," IEEE Transactions on Automatic Control, vol. 63, no. 2, pp. 372-385, 2018.
[18] X. Wang and S. Mou, "Improvement of a distributed algorithm for solving linear equations," IEEE Transactions on Industrial Electronics, vol. 64, no. 4, pp. 3113-3117, 2017.
[19] D. Marelli and M. Fu, "Distributed weighted least-squares estimation with fast convergence for large-scale systems," Automatica, vol. 51, pp. 27-39, 2015.
[20] W. J. Russell, D. J. Klein and J. P. Hespanha, "Optimal estimation on the graph cycle space," IEEE Transactions on Signal Processing, vol. 59, no. 6, pp. 2834-2846, 2011.
[21] A. Bertrand and M. Moonen, "Consensus-based distributed total least squares estimation in ad hoc wireless sensor networks," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2320-2330, 2011.
[22] A. Bertrand and M. Moonen, "Low-complexity distributed total least squares estimation in ad hoc sensor networks," IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4321-4333, 2012.
[23] M. Nagy, Z. Akos, D. Biro and T. Vicsek, "Hierarchical dynamics in pigeon flocks," Nature, vol. 464, pp. 890-893, 2010.
[24] T. Nepusz and T. Vicsek, "Controlling edge dynamics in complex networks," Nature Physics, Advance Online Publication, 2012.
[25] N. Tarcai et al., "Patterns, transitions and the role of leaders in the collective dynamics of a simple robotic flock," Journal of Statistical Mechanics: Theory and Experiment, no. 4, P04010, 2011.
[26] Z. Lin, L. Wang, Z. Han and M. Fu, "Distributed formation control of multi-agent systems using complex Laplacian," IEEE Transactions on Automatic Control, vol. 59, no. 7, pp. 1765-1777, 2014.
[27] Z. Lin, L. Wang, Z. Han and M. Fu, "Distributed formation control of multi-agent systems using complex Laplacian," IEEE Trans. Automatic Control, vol. 59, no. 7, pp. 1765-1777, 2014.
[28] Z. Lin, L. Wang, Z. Han and M. Fu, "A graph Laplacian approach to coordinate-free formation stabilization for directed networks," IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1269-1280, 2016.
[29] A. Nedic and A. Ozdaglar, "Distributed sub-gradient methods for multi-agent optimization," IEEE Trans. Automatic Control, vol. 54, no. 1, pp. 48-61, Jan. 2009.
[30] D. Jakovetic, J. M. F. Moura and J. Xavier, "Fast distributed gradient methods," IEEE Trans. Automatic Control, vol. 59, no. 5, pp. 1131-1146, May 2014.
[31] J. Lu and C. Y. Tang, "A distributed algorithm for solving positive definite linear equations over networks with membership dynamics," IEEE Transactions on Control of Networked Systems, vol. 5, no. 1, pp. 215-227, 2018.
[32] J. N. Reddy, An Introduction to the Finite Element Method, 3rd ed., McGraw-Hill, 2006.
[33] Z. Wu, Q. Cai and M. Fu, "Covariance intersection for partially correlated random vectors," IEEE Trans. Automatic Control, vol. 63, no. 3, pp. 619-629, 2018.
[34] B. Li, L. Li, M. Harada, H. Niki and M. J. Tsatsomeros, "An iterative criterion for H-matrices," Linear Algebra and its Applications, vol. 271, pp. 179-190, 1998.
[35] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, 1988.
[36] W. Rudin, Principles of Mathematical Analysis, 3rd ed., McGraw-Hill, 1976.