
Weighted Gossip: Distributed Averaging Using Non-Doubly Stochastic Matrices

Florence Bénézit (ENS-INRIA, France), Vincent Blondel (UCL, Belgium), Patrick Thiran (EPFL, Switzerland), John Tsitsiklis (MIT, USA), Martin Vetterli (EPFL, Switzerland)

Abstract—This paper presents a general class of gossip-based averaging algorithms, which are inspired by Uniform Gossip [1]. While Uniform Gossip works synchronously on complete graphs, weighted gossip algorithms allow asynchronous rounds and converge on any connected, directed or undirected graph. Unlike most previous gossip algorithms [2]–[6], Weighted Gossip admits stochastic update matrices which need not be doubly stochastic. Double stochasticity being very restrictive in a distributed setting [7], this novel degree of freedom is essential, and it opens the perspective of designing a large number of new gossip-based algorithms. To give an example, we present one of these algorithms, which we call One-Way Averaging. It is based on random geographic routing, just like Path Averaging [5], except that routes are one way instead of round trip. Hence, in this example, getting rid of double stochasticity allows us to add robustness to Path Averaging.

I. INTRODUCTION

Gossip algorithms were recently developed to solve the distributed average consensus problem [1]–[6]. Every node i in a network holds a value x_i and wants to learn the average x_ave of all the values in the network in a distributed way. Most gossip algorithms were designed for wireless sensor networks, which are usually modeled as random geometric graphs and sometimes as lattices. Ideally, a distributed averaging algorithm should be efficient in terms of energy and delay, without requiring too much knowledge of the network topology at each node, nor sophisticated coordination between nodes.

The simplest gossip algorithm is Pairwise Gossip, where random pairs of connected nodes iteratively and locally average their values until convergence to the global average [2]. Pairwise local averaging is an easy task, which requires neither global knowledge nor global coordination, thus Pairwise Gossip fulfills the requirements of our distributed problem. However, the convergence speed of Pairwise Gossip suffers from the locality of the updates, and it was shown that averaging along random geographic routes instead of local neighborhoods is an order-optimal communication scheme for gossip. Let n be the number of nodes in the network. On random geometric graphs, Pairwise Gossip requires Θ(n^2) messages, whereas Path Averaging requires only Θ(n log n) messages under some conditions [5].

The latter algorithm gained efficiency at the price of more complex coordination. At every round of Path Averaging, a random node wakes up and generates a random route. Values are aggregated along the route, and the destination node computes the average of the values collected along the route. Then the destination node sends the average back through the same route, so that all the nodes in the route can update their values to the average. Path Averaging is efficient in terms of energy consumption, but it demands some long-distance coordination to make sure that all the values in the route were updated correctly. Routing information back and forth might as well introduce delay issues, because a node that is engaged in a route needs to wait for the update to come back before it can proceed to another round. Furthermore, in a mobile network, or in a highly dynamic network, routing the information back on the same route might even not succeed.

This work started with the goal of designing a unidirectional gossip algorithm fulfilling the following requirements:
• Keep a geographic routing communication scheme, because it is highly diffusive;
• Avoid routing back data: instead of long-distance agreements, only agreements between neighbors are allowed;
• Route crossing is possible at any time, without introducing errors in the algorithm.

As we were designing One-Way Averaging, we happened to prove the correctness of a broad set of gossip-based algorithms, which we present in this paper along with One-Way Averaging. These algorithms can be asynchronous, and they use stochastic diffusion matrices which are not necessarily doubly stochastic, as announced by the title of the paper.

In Section II, we give some background on gossip algorithms, and we explain why Uniform Gossip is a key algorithm to draw inspiration from when building a unidirectional gossip algorithm. In Section III, we present Weighted Gossip, an asynchronous generalization of Uniform Gossip, which was already suggested in [1] but had remained unnamed. We show in Section IV that weighted gossip algorithms converge to x_ave, which is a novel result to the best of our knowledge. In Section V, we describe One-Way Averaging in detail, and we show in simulations that the good diffusivity of geographic routes in Path Averaging persists in One-Way Averaging. Computing the speed of convergence of weighted gossip algorithms remains open and is part of future work.

II. BACKGROUND ON GOSSIP ALGORITHMS

The values to be averaged are gathered in a vector x(0), and at any iteration t, the current estimates of the average x_ave are gathered in x(t). Gossip algorithms update the estimates linearly: at any iteration t, there is a matrix W(t) such that

    x(t)^T = x(t-1)^T W(t).

In gossip algorithms that converge to average consensus, W(t) is doubly stochastic: W(t) 1 = 1 ensures that the global average is conserved, and 1^T W(t) = 1^T guarantees stable consensus. To perform averaging on a one-way route, W(t) should be upper triangular (up to a node index permutation). But the only matrix that is both doubly stochastic and upper triangular is the identity matrix. Thus, unidirectional averaging requires dropping double stochasticity.

Uniform Gossip solves this issue in the following way. Instead of updating one vector x(t) of variables, it updates a vector s(t) of sums and a vector ω(t) of weights. Uniform Gossip initializes s(0) = x(0) and ω(0) = 1. At any time, the vector of estimates is x(t) = s(t)/ω(t), where the division is performed elementwise. The updates are computed with stochastic diffusion matrices {D(t)}_{t>0}:

    s(t)^T = s(t-1)^T D(t),      (1)
    ω(t)^T = ω(t-1)^T D(t).      (2)

Kempe et al. [1] prove that the algorithm converges to a consensus on x_ave (lim_{t→∞} x(t) = x_ave 1) in the special case where, for any node i, D_ii(t) = 1/2 and D_ij(t) = 1/2 for one node j chosen i.i.d. uniformly at random. As a key remark, note that here D(t) is not doubly stochastic. The algorithm is synchronous, and it works on complete graphs without routing, and on other graphs with routing. We show in this paper that the idea works with many more sequences of matrices {D(t)}_{t>0} than just the one used in Uniform Gossip.
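To make the sum-and-weight mechanism concrete, the following Python sketch simulates the synchronous Uniform Gossip rounds just described, on a complete graph. It is our illustration only (the paper specifies no code), with variable names of our choosing; every node keeps half of its sum and weight and pushes the other half to a uniformly chosen node, which is the special case D_ii(t) = D_ij(t) = 1/2 analyzed in [1].

    import numpy as np

    def uniform_gossip_round(s, w, rng):
        # Each node keeps half of its sum and weight and sends the other
        # half to one node chosen uniformly at random (D_ii = D_ij = 1/2).
        n = len(s)
        targets = rng.integers(0, n, size=n)
        s_new, w_new = s / 2.0, w / 2.0
        np.add.at(s_new, targets, s / 2.0)    # deliver the pushed halves
        np.add.at(w_new, targets, w / 2.0)
        return s_new, w_new

    rng = np.random.default_rng(0)
    x0 = rng.random(50)
    s, w = x0.copy(), np.ones_like(x0)        # s(0) = x(0), omega(0) = 1
    for _ in range(200):
        s, w = uniform_gossip_round(s, w, rng)
    print(np.abs(s / w - x0.mean()).max())    # estimates approach x_ave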
III. WEIGHTED GOSSIP

We call Weighted Gossip the class of gossip-based algorithms following the sum and weight structure of Uniform Gossip described above (Eq. (1) and (2)). A weighted gossip algorithm is entirely characterized by the distribution of its diffusion matrices {D(t)}_{t>0}. Let P(s,t) := D(s) D(s+1) ··· D(t), and let P(t) := P(1,t). Then

    s(t)^T = x(0)^T P(t),      (3)
    ω(t)^T = 1^T P(t).         (4)

If a weighted gossip algorithm is asynchronous, then D_ii(t) = 1 and D_ij(t) = 0 (j ≠ i) for the nodes i that do not contribute to iteration t. If D_ij(t) ≠ 0, then node i sends (D_ij(t) s_i(t-1), D_ij(t) ω_i(t-1)) to node j, which adds the received data to its own sum s_j(t-1) and weight ω_j(t-1). At any iteration t, the estimate at node i is x_i(t) = s_i(t)/ω_i(t). Because 1^T D(t) ≠ 1^T, sums and weights do not reach a consensus. However, because D(t) 1 = 1, sums and weights are conserved: at any iteration t,

    Σ_{i=1}^n s_i(t) = Σ_{i=1}^n x_i(0) = n x_ave,      (5)
    Σ_{i=1}^n ω_i(t) = n.                               (6)

This implies that Weighted Gossip is a class of non-biased estimators for the average (even though Σ_{i=1}^n x_i(t) is not conserved through time!):

Theorem 3.1 (Non-biased estimator): If the estimates x(t) = s(t)/ω(t) converge to a consensus, then the consensus value is the average x_ave.

Proof: Let c be the consensus value. For any ε > 0, there is an iteration t_0 after which, for any node i, |x_i(t) - c| < ε. Then, for any t > t_0, |s_i(t) - c ω_i(t)| < ε ω_i(t) (weights are always positive). Hence, summing over i,

    |Σ_i (s_i(t) - c ω_i(t))| ≤ Σ_i |s_i(t) - c ω_i(t)| < ε Σ_i ω_i(t).

Using Eq. (5) and (6), the previous equation can be written as |n x_ave - n c| < n ε, which is equivalent to |x_ave - c| < ε. Hence c = x_ave.

In the next section, we show that, although sums and weights do not reach a consensus, the estimates {x_i(t)}_{1≤i≤n} converge to a consensus under some conditions.
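As an illustration of Eq. (1)-(2) and of the conservation laws (5)-(6), here is a minimal Python sketch (ours, using hypothetical one-edge diffusion matrices) that applies arbitrary row-stochastic updates by right-multiplication and checks that sums and weights are conserved:

    import numpy as np

    def weighted_gossip_step(s, w, D):
        # One Weighted Gossip iteration, Eq. (1)-(2): right-multiply by D(t).
        assert np.allclose(D.sum(axis=1), 1.0)   # D(t) 1 = 1 (row-stochastic)
        return s @ D, w @ D

    rng = np.random.default_rng(1)
    n = 5
    x0 = rng.random(n)
    s, w = x0.copy(), np.ones(n)
    for _ in range(100):
        D = np.eye(n)                             # idle nodes: D_ii = 1
        i, j = rng.choice(n, size=2, replace=False)
        D[i, i], D[i, j] = 0.5, 0.5               # node i shares half with j
        s, w = weighted_gossip_step(s, w, D)
    assert np.isclose(s.sum(), n * x0.mean())     # Eq. (5): sums conserved
    assert np.isclose(w.sum(), n)                 # Eq. (6): weights conserved
    print(s / w)                                  # current estimates of x_ave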
IV. CONVERGENCE

In this section, we prove that Weighted Gossip succeeds in other cases than just Uniform Gossip.

Assumption 1: {D(t)}_{t>0} is a stationary and ergodic sequence of stochastic matrices with positive diagonals, and E[D] is irreducible.

Irreducibility means that the graph formed by the edges (i,j) such that P[D_ij > 0] > 0 is connected, which requires the connectivity of the network. Note that i.i.d. sequences are stationary and ergodic. Stationarity implies that E[D] does not depend on t. Positive diagonals means that each node should always keep part of its sum and weight: ∀ i, t, D_ii(t) > 0.

Theorem 4.1 (Main Theorem): Under Assumption 1, Weighted Gossip using {D(t)}_{t>0} converges to a consensus with probability 1, i.e., lim_{t→∞} x(t) = x_ave 1.

To prove Th. 4.1, we will start by upper bounding the error ‖x(t) - x_ave 1‖_∞ with a non-increasing function f(t) (Lemma 4.1): let η_ji(t) = P_ji(t) - (1/n) Σ_{k=1}^n P_ki(t) = P_ji(t) - ω_i(t)/n; then f is defined as f(t) = max_{1≤i≤n} f_i(t), where f_i(t) = Σ_{j=1}^n |η_ji(t)| / ω_i(t). Then, we will prove that f(t) vanishes to 0 by showing that η_ji(t) vanishes to 0 (weak ergodicity argument of Lemma 4.3) and that ω_i(t) is bounded away from 0 infinitely often (Lemma 4.4).

Lemma 4.1: If {D(t)}_{t>0} is a sequence of stochastic matrices, then the function f(t) is non-increasing. Furthermore,

    ‖x(t) - x_ave 1‖_∞ ≤ ‖x(0)‖_∞ f(t).      (7)

Proof: By Eq. (3), for any node i,

    |x_i(t) - x_ave| = |Σ_{j=1}^n P_ji(t) x_j(0) / ω_i(t) - x_ave|
                     = |Σ_{j=1}^n (ω_i(t)/n + η_ji(t)) x_j(0) / ω_i(t) - x_ave|
                     = |Σ_{j=1}^n η_ji(t) x_j(0) / ω_i(t)|
                     ≤ ‖x(0)‖_∞ Σ_{j=1}^n |η_ji(t)| / ω_i(t)
                     = ‖x(0)‖_∞ f_i(t),

which proves Eq. (7). Next, we need to prove that f(t) is a non-increasing function. For any node i, by Eq. (1) and (2),

    f_i(t) = Σ_{j=1}^n |η_ji(t)| / ω_i(t)
           = Σ_{j=1}^n |Σ_{k=1}^n η_jk(t-1) D_ki(t)| / Σ_{k=1}^n ω_k(t-1) D_ki(t)
           ≤ Σ_{j=1}^n Σ_{k=1}^n |η_jk(t-1)| D_ki(t) / Σ_{k=1}^n ω_k(t-1) D_ki(t)
           = Σ_{k=1}^n (Σ_{j=1}^n |η_jk(t-1)|) D_ki(t) / Σ_{k=1}^n ω_k(t-1) D_ki(t)
           ≤ max_k (Σ_{j=1}^n |η_jk(t-1)|) D_ki(t) / (ω_k(t-1) D_ki(t))      (8)
           = max_k Σ_{j=1}^n |η_jk(t-1)| / ω_k(t-1)
           = max_k f_k(t-1) = f(t-1),

which implies that f(t) ≤ f(t-1). Eq. (8) comes from the following inequality: for any {a_k}_{1≤k≤n} ≥ 0 and {b_k}_{1≤k≤n} > 0,

    Σ_{k=1}^n a_k / Σ_{k=1}^n b_k = Σ_{k=1}^n (b_k / Σ_{j=1}^n b_j)(a_k / b_k) ≤ max_k (a_k / b_k).
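The quantities ω_i(t), η_ji(t) and f(t) are straightforward to compute from the running product P(t). The following numerical check (our illustration, not part of the proof) verifies the monotonicity of f and the bound (7) on a random asynchronous sequence:

    import numpy as np

    def f_of_P(P):
        # f(t) = max_i sum_j |eta_ji(t)| / omega_i(t), with omega(t)^T = 1^T P(t)
        # (Eq. (4)) and eta_ji(t) = P_ji(t) - omega_i(t)/n.
        n = P.shape[0]
        omega = P.sum(axis=0)
        eta = P - omega / n
        return (np.abs(eta).sum(axis=0) / omega).max()

    rng = np.random.default_rng(2)
    n = 6
    x0 = rng.random(n)
    P = np.eye(n)
    f_prev = f_of_P(P)
    for _ in range(50):
        D = np.eye(n)
        i, j = rng.choice(n, size=2, replace=False)
        D[i, i], D[i, j] = 0.5, 0.5
        P = P @ D                                 # P(t) = D(1) D(2) ... D(t)
        f_t = f_of_P(P)
        assert f_t <= f_prev + 1e-12              # Lemma 4.1: f non-increasing
        x = (x0 @ P) / P.sum(axis=0)              # x(t), by Eq. (3)-(4)
        assert np.abs(x - x0.mean()).max() <= np.abs(x0).max() * f_t + 1e-12
        f_prev = f_t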

The following lemma is useful to prove Lemmas 4.3 and 4.4.

Lemma 4.2: Under Assumption 1, there is a deterministic time T and a constant c such that

    P[D(1) D(2) ··· D(T) > c] > 0,

where A > c means that every entry of A is larger than c.

Proof: The proof of this lemma can be found in [8]. For the case where {D(t)}_{t>0} is i.i.d., a simpler proof can be found in [9]. Note that the theorems proven in [8] and [9] are slightly different from our lemma because the authors multiply matrices on the left, whereas we multiply them on the right. However, the multiplication side does not change the proof. For completeness and simplicity, we give the proof in the i.i.d. case. E[D] being irreducible and having a positive diagonal, it is primitive as well: there is an m > 0 such that E[D]^m > 0 (elementwise). {D(t)}_{t≥1} is i.i.d., hence E[D(1) D(2) ··· D(m)] = E[D]^m > 0, and P[(D(1) D(2) ··· D(m))_ij > 0] > 0 for any entry (i,j). For any time t, the diagonal coefficients of D(t) are non-zero; thus, if the (i,j)-th entry of P(k, k+m-1) = D(k) D(k+1) ··· D(k+m-1) is positive, then P_ij(t) > 0 for all t ≥ k+m-1. Now take T = n(n-1)m. The probability that P(T) > 0 is larger than or equal to the joint probability that P_{1,2}(1,m) > 0, P_{1,3}(m+1, 2m) > 0, ..., P_{n,n-1}(T-m+1, T) > 0. By independence of {D(t)}_{t≥1},

    P[P(T) > 0] ≥ P[P_{1,2}(1,m) > 0] P[P_{1,3}(m+1, 2m) > 0] ··· P[P_{n,n-1}(T-m+1, T) > 0] > 0.

Therefore, there is a c > 0 such that P[D(1) D(2) ··· D(T) > c] > 0.

Lemma 4.3 (Weak ergodicity): Under Assumption 1, {D(t)}_{t≥1} is weakly ergodic.

Weak ergodicity means that, as t grows, P(t) tends to have identical rows, which may vary with t. It is weaker than strong ergodicity, where P(t) tends to a matrix 1 π^T, where π does not vary with t. Interestingly, simple computations show that if P(t) has identical rows, then consensus is reached. All we need to know in this paper is that weak ergodicity implies that

    lim_{t→∞} max_{i,j} Σ_{k=1}^n |P_ik(t) - P_jk(t)| = 0,

and we suggest [10] for further reading about weak ergodicity.

Proof: Let Q be a stochastic matrix. The Dobrushin coefficient δ(Q) of a matrix Q is defined as

    δ(Q) = (1/2) max_{i,j} Σ_{k=1}^n |Q_ik - Q_jk|.

One can show [10] that 0 ≤ δ(Q) ≤ 1, and that for any stochastic matrices Q_1 and Q_2,

    δ(Q_1 Q_2) ≤ δ(Q_1) δ(Q_2).      (9)

Another useful fact is that, for any stochastic matrix Q,

    1 - δ(Q) ≥ max_j min_i Q_ij ≥ min_{i,j} Q_ij.      (10)

A block criterion for weak ergodicity [10] is based on Eq. (9): {D(t)}_{t≥1} is weakly ergodic if and only if there is a strictly increasing sequence of integers {k_s}_{s≥1} such that

    Σ_{s=1}^∞ (1 - δ(P(k_s + 1, k_{s+1}))) = ∞.      (11)

We use this criterion with k_s = sT, where T was defined in Lemma 4.2. A joint consequence of Lemma 4.2 and of Birkhoff's ergodic theorem [11], [8] (in the i.i.d. case, one can use the strong law of large numbers instead) is that the event {D(k_s + 1) D(k_s + 2) ··· D(k_{s+1}) > c} happens infinitely often with probability 1. Hence, using Eq. (10), the event {1 - δ(P(k_s + 1, k_{s+1})) > c} happens infinitely often with probability 1. We can thus conclude that the block criterion (11) holds with probability 1 and that {D(t)}_{t≥1} is weakly ergodic.
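The Dobrushin coefficient and the bounds (9) and (10) used in this proof are easy to check numerically. The short sketch below (ours, on random stochastic matrices) does so:

    import numpy as np

    def dobrushin(Q):
        # delta(Q) = (1/2) max_{i,j} sum_k |Q_ik - Q_jk|
        diffs = np.abs(Q[:, None, :] - Q[None, :, :]).sum(axis=2)
        return 0.5 * diffs.max()

    rng = np.random.default_rng(3)
    Q1 = rng.random((5, 5))
    Q1 /= Q1.sum(axis=1, keepdims=True)            # make rows sum to 1
    Q2 = rng.random((5, 5))
    Q2 /= Q2.sum(axis=1, keepdims=True)
    assert dobrushin(Q1 @ Q2) <= dobrushin(Q1) * dobrushin(Q2) + 1e-12  # Eq. (9)
    assert 1.0 - dobrushin(Q1) >= Q1.min(axis=0).max() - 1e-12          # Eq. (10)
    assert Q1.min(axis=0).max() >= Q1.min()                             # Eq. (10)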
The next lemma shows that, although weights can become arbitrarily small, they are uniformly large enough infinitely often.

Lemma 4.4: Under Assumption 1, there is a constant α such that, for any time t, with probability 1, there is a time t_1 ≥ t at which min_i ω_i(t_1) ≥ α.

Proof: As mentioned in the proof of Lemma 4.3, the event {D(k_s + 1) D(k_s + 2) ··· D(k_{s+1}) > c}, where k_s = sT, happens infinitely often with probability 1. Let t_1 be the first time larger than t such that D(t_1 - T + 1) D(t_1 - T + 2) ··· D(t_1) > c. Then the weights at time t_1 satisfy

    ω(t_1)^T = ω(t_1 - T)^T D(t_1 - T + 1) ··· D(t_1) > c ω(t_1 - T)^T 1 1^T,

because weights are always positive. Now, because the sum of the weights is equal to n, ω(t_1 - T)^T 1 = n. Hence ω(t_1)^T > c n 1^T. Taking α = cn concludes the proof.

To prove Theorem 4.1, it remains to show that f(t) converges to 0.

Proof (Theorem 4.1): For any ε > 0, according to Lemma 4.3, there is a time t_0 such that for any t ≥ t_0,

    max_{i,j} Σ_{k=1}^n |P_ik(t) - P_jk(t)| < ε.

As a consequence, |P_ik(t) - P_jk(t)| < ε for any i, j, k. Hence |η_jk(t)| < ε as well. Indeed,

    |η_jk(t)| = |P_jk(t) - Σ_{i=1}^n P_ik(t)/n| = |Σ_{i=1}^n (P_jk(t) - P_ik(t))/n|
              ≤ Σ_{i=1}^n |P_jk(t) - P_ik(t)|/n < Σ_{i=1}^n ε/n = ε.

Therefore, for any t ≥ t_0 and any 1 ≤ i ≤ n,

    f_i(t) < n ε / ω_i(t),

and therefore

    f(t) < n ε / min_i ω_i(t).

Using Lemma 4.4, there is a constant α such that, with probability 1, there is a time t_1 ≥ t_0 at which min_i ω_i(t_1) ≥ α. Then, for any ε′, it suffices to take ε = α ε′ / n to conclude that, with probability 1, there is a time t_1 such that f(t_1) < ε′. Since f is non-increasing (Lemma 4.1), f(t) < ε′ for all t ≥ t_1; in other words, f(t) converges to 0. Using (7) concludes the proof.

Remark: A similar convergence result can be proved without Assumption 1 (stationarity and ergodicity of the matrices D(t)), in a setting where the matrices are chosen in a perhaps adversarial manner. One needs only some minimal connectivity assumptions, which then guarantee that there exists a finite number T such that, for all t, all entries of D(t+1) ··· D(t+T) are bounded below by a positive constant c (see, e.g., Lemma 5.2.1 in [12]).

V. ONE-WAY AVERAGING

In this section, we describe in detail a novel weighted gossip algorithm, which we call One-Way Averaging.

A. Assumptions and Notations

Assume that the network is a random geometric graph on a convex area A, with a connection radius r(n) large enough to enable geographic routing [3]. For every node i, let Z_i be a distribution of points outside of the area A, and let H_i be a distribution of integers larger than 2. Each node has an independent local exponential random clock of rate λ, and initiates an iteration when it rings. Equivalently, time is counted in terms of a global and virtual exponential clock of rate nλ. Each time the global clock rings, a node wakes up independently and uniformly at random. In the analysis, t indicates how many times the global clock rang. A detailed analysis of this time model can be found in [2].

B. Description of One-Way Averaging

Each node i initializes its sum s_i(0) = x_i(0) and its weight ω_i(0) = 1. For any iteration t > 0, let i be the node whose clock rings. Node i draws a target Z according to the distribution Z_i and a number H ≥ 2 of hops according to the distribution H_i. Node i chooses uniformly at random a neighbor which is closer to the target Z than itself. If there is no such neighbor, then the iteration terminates. If such a neighbor j exists, then node i divides its sum s_i(t-1) and its weight ω_i(t-1) by H, and sends (s_i(t-1), ω_i(t-1)) (H-1)/H to node j. It also sends the remaining number H-1 of hops and the target Z. Node j adds the received sum and weight to its sum s_j(t-1) and its weight ω_j(t-1). Then it performs the same operation as node i towards a node that is closer to the target, except that it divides its new sum and weight by H-1 instead of H (formally, H ← H-1). Messages are greedily sent towards the target, H being decremented at each hop. The iteration ends when H = 1 or when a node does not have any neighbor to forward a message to. At any time, the estimate of any node is the ratio between its sum and its weight.

C. Diffusion Matrices

Suppose that at round t, a whole route of H nodes is generated. Then, after re-indexing the nodes starting with the nodes in the route, the diffusion matrix D(t) can be written as:

    [ 1/H   1/H     ...   1/H      1/H      0  ]
    [  0   1/(H-1)  ...  1/(H-1)  1/(H-1)   0  ]
    [  0    0       ...   ...      ...      0  ]
    [  0    0       ...   1/2      1/2      0  ]
    [  0    0       ...    0        1       0  ]
    [  0    0       ...    0        0       Id ]

where Id denotes the identity matrix. If the route stops early and has, for example, only 3 nodes while H = 4, then, after re-indexing the nodes, D(t) can be written as:

    [ 1/4  1/4  1/2   0  ]
    [  0   1/3  2/3   0  ]
    [  0    0    1    0  ]
    [  0    0    0   Id  ]

Note that D(t) is indeed stochastic for all t. It is upper triangular as well: One-Way Averaging does not require routing information backwards along the path. Furthermore, {D(t)}_{t>0} verifies Assumption 1. First, {D(t)}_{t>0} is an i.i.d. sequence. Second, the D(t) have positive diagonals. Third, if the network is connected and if the routes generated by the distributions {Z_i}_{1≤i≤n} and {H_i}_{1≤i≤n} connect the network, then E[D] is irreducible. Therefore, One-Way Averaging is a successful distributed averaging algorithm. Finally, routes can cross each other without corrupting the algorithm (the resulting diffusion matrices are still stochastic).
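As a sanity check on the matrices displayed above, the following sketch (ours) builds the net diffusion matrix of a single One-Way Averaging round for a possibly truncated route and verifies that it is row-stochastic:

    import numpy as np

    def route_diffusion_matrix(n, route, H):
        # Net effect of one round: the k-th visited node (0-indexed) leaves
        # a share 1/(H-k) of its running mass at each subsequent route node,
        # and the last visited node absorbs whatever mass remains.
        D = np.eye(n)
        R = len(route)                            # effective route length <= H
        for k in range(R - 1):
            i = route[k]
            for l in range(k, R - 1):
                D[i, route[l]] = 1.0 / (H - k)
            D[i, route[-1]] = (H - R + 1) / (H - k)   # remainder to last node
        assert np.allclose(D.sum(axis=1), 1.0)        # D(t) 1 = 1
        return D

    print(route_diffusion_matrix(4, route=[0, 1, 2], H=4))
    # reproduces the truncated-route example above:
    # rows [1/4 1/4 1/2 0], [0 1/3 2/3 0], [0 0 1 0], [0 0 0 1]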
D. Simulation

One-Way Averaging and Path Averaging were run (in Matlab) on random geometric graphs on the unit square, using the same routes for a fair comparison. At each iteration t, the number H(t) of hops was generated with H uniform in [⌈1/(√2 r(n))⌉, ⌈√2/r(n)⌉], and the target Z(t) was drawn in the following way: let I be the coordinates of the woken node, and let U be a point drawn uniformly at random in the unit square; then

    Z(t) = I + 3 (U - I) / ‖U - I‖_2.

Let C(t_1, t_2) be the message cost of a given algorithm from iteration t_1 to iteration t_2. For One-Way Averaging, C(t_1, t_2) = Σ_{t=t_1}^{t_2} R(t), where R(t) ≤ H(t) is the effective route length at iteration t. Because Path Averaging routes information back and forth, the cost of one iteration is taken to be equal to twice the route length: C(t_1, t_2) = 2 Σ_{t=t_1}^{t_2} R(t). Let ε(t) = x(t) - x_ave 1. The empirical consensus cost is defined as

    C^emp(t_1, t_2) = C(t_1, t_2) / (log ‖ε(t_1)‖ - log ‖ε(t_2)‖),

so that

    ‖ε(t_2)‖ = ‖ε(t_1)‖ exp(-C(t_1, t_2) / C^emp(t_1, t_2)).

[Figure 1 appears here: C^emp(t_1, t_2) versus network size n, for One-Way Averaging and Path Averaging.]
Fig. 1. Comparison of the consensus cost for One-Way Averaging and Path Averaging in random geometric graphs of increasing sizes n. The connection radius scales as r(n) = √(6 log n / n). Display of C^emp(t_1, t_2) averaged over 15 graphs and 4 simulation runs per graph.

In Fig. 1, we display the empirical consensus cost of both algorithms, with t_1 = 750 and t_2 growing linearly with n. We can see that One-Way Averaging performs better than Path Averaging on this example. Although One-Way Averaging converges more slowly in terms of iterations, spending half as many messages per iteration is sufficient here to outperform Path Averaging.

The speed of convergence depends on the network, but also on {Z_i}_{1≤i≤n} and {H_i}_{1≤i≤n}, which we have not optimized. It would be interesting in further work to compute the speed of convergence of Weighted Gossip, and to derive optimal distributions {Z_i}_{1≤i≤n} and {H_i}_{1≤i≤n} for a given network using One-Way Averaging. As a conclusion, One-Way Averaging seems to have the same diffusive qualities as Path Averaging while being more robust at the same time.
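The empirical consensus cost is a one-line computation from a simulation trace. The sketch below (ours, with hypothetical recorded quantities: a message count and two error vectors) shows the formula in code:

    import numpy as np

    def empirical_consensus_cost(messages, eps_start, eps_end):
        # C_emp(t1,t2) = C(t1,t2) / (log ||eps(t1)|| - log ||eps(t2)||)
        return messages / (np.log(np.linalg.norm(eps_start))
                           - np.log(np.linalg.norm(eps_end)))

    # Hypothetical trace: error norm falls from 1.0 to 1e-3 over 5000 messages.
    eps_t1 = np.array([0.8, -0.6])            # ||eps(t1)||_2 = 1.0
    eps_t2 = 1e-3 * eps_t1
    print(empirical_consensus_cost(5000, eps_t1, eps_t2))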
VI. CONCLUSION

We proved that weighted gossip algorithms converge to average consensus with probability 1 in a very general setting, i.e., in connected networks, with stationary and ergodic iterations, and with a simple stability condition (positive diagonals). We believe that dropping double stochasticity opens great opportunities in designing new distributed averaging algorithms that are more robust and adapted to the specificities of each network. One-Way Averaging, for example, is more robust than Path Averaging, and it surprisingly consumes fewer messages in simulations. Also, double stochasticity is difficult to enforce in a distributed manner in directed graphs using unidirectional communications. With Weighted Gossip, one could easily build averaging algorithms for directed networks that are reliable enough not to require acknowledgements. The next step of this work is to compute analytically the speed of convergence of Weighted Gossip. In classical gossip, double stochasticity would greatly simplify derivations, but this feature disappears in Weighted Gossip, which makes the problem more difficult.

ACKNOWLEDGEMENTS

The work presented in this paper was supported (in part) by the National Competence Center in Research on Mobile Information and Communication Systems (NCCR-MICS), a center supported by the Swiss National Science Foundation under grant number 5005-67322, and by the NSF under grant ECCS-0701623.

REFERENCES

[1] D. Kempe, A. Dobra, and J. Gehrke, "Gossip-based computation of aggregate information," in FOCS, vol. 44. IEEE, 2003, pp. 482–491.
[2] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Gossip algorithms: Design, analysis and applications," in IEEE INFOCOM, 2005.
[3] A. G. Dimakis, A. D. Sarwate, and M. J. Wainwright, "Geographic gossip: efficient aggregation for sensor networks," in ACM/IEEE Symposium on Information Processing in Sensor Networks, 2006.
[4] B. Nazer, A. Dimakis, and M. Gastpar, "Local interference can accelerate gossip algorithms," in Allerton Conference on Communication, Control, and Computing, 2008, pp. 591–598.
[5] F. Bénézit, A. Dimakis, P. Thiran, and M. Vetterli, "Order-optimal consensus through randomized path averaging," submitted for publication.
[6] A. Nedic, A. Olshevsky, A. Ozdaglar, and J. Tsitsiklis, "On distributed averaging algorithms and quantization effects," in IEEE Conference on Decision and Control, 2008, pp. 4825–4830.
[7] B. Gharesifard and J. Cortés, "When does a digraph admit a doubly stochastic adjacency matrix?" submitted to the American Control Conference, 2010.
[8] A. Tahbaz-Salehi and A. Jadbabaie, "Consensus over ergodic stationary graph processes," IEEE Transactions on Automatic Control, 2009.
[9] ——, "Necessary and sufficient conditions for consensus over random independent and identically distributed switching graphs," in IEEE Conference on Decision and Control, 2007, pp. 4209–4214.
[10] P. Brémaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1999.
[11] R. Durrett, Probability: Theory and Examples. Duxbury Press, Belmont, CA, 1996.
[12] J. Tsitsiklis, "Problems in decentralized decision making and computation," Ph.D. dissertation, M.I.T., Dept. of Electrical Engineering and Computer Science, 1984.