Rates of Convergence for Distributed Average Consensus with Probabilistic Quantization

Tuncer C. Aysal, Mark J. Coates, Michael G. Rabbat
Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada
{tuncer.aysal, mark.coates, michael.rabbat}@mcgill.ca

Abstract— Probabilistically quantized distributed averaging (PQDA) is a fully decentralized algorithm for performing average consensus in a network with finite-rate links. At each iteration, nodes exchange quantized messages with their immediate neighbors. Then each node locally computes a weighted average of the messages it received, quantizes this new value using a randomized quantization scheme, and then the whole process is repeated in the next iteration. In our previous work we introduced PQDA and demonstrated that the algorithm almost surely converges to a consensus (i.e., every node converges to the same value). The present article builds upon this work by characterizing the rate of convergence to a consensus. We illustrate that the rate of PQDA is essentially the same as unquantized distributed averaging when the discrepancy among node values is large. When the network has nearly converged and all nodes' values are at one of two neighboring quantization points, then the rate of convergence slows down. We bound the rate of convergence during this final phase by applying lumpability to compress the state space, and then using stochastic comparison methods.

I. INTRODUCTION

A fundamental problem in decentralized networked systems is that of having nodes reach a state of agreement. For example, the nodes in a wireless sensor network must be synchronized in order to communicate using a TDMA scheme or to use time-difference-of-arrival measurements for localization and tracking. Similarly, one would like a host of unmanned aerial vehicles to make coordinated decisions on a surveillance strategy. This paper focuses on a prototypical example of agreement in networked systems, namely, the average consensus problem: each node initially has a scalar value, y_i, and the goal is to compute the average, (1/N) ∑_{i=1}^N y_i, at every node in the network.

Distributed averaging (DA) is a simple iterative distributed algorithm for solving the average consensus problem with many attractive properties. The network state is maintained in a vector x(t) ∈ R^N, where x_i(t) is the value at node i after t iterations, and there are N nodes in the network. Network connectivity is represented by a graph G = (V, E), with vertex set V = {1, ..., N} and edge set E ⊆ V^2 such that (i, j) ∈ E implies that nodes i and j communicate directly with each other. (We assume communication is symmetric.) In the (t+1)-st DA iteration, node i receives values x_j(t) from all the nodes j in its neighborhood and updates its value by the weighted linear combination

    x_i(t+1) = W_{i,i} x_i(t) + ∑_{j:(i,j)∈E} W_{i,j} x_j(t).

For a given initial state, x(0), and reasonable choices of the weights W_{i,j}, it is easy to show that lim_{t→∞} x_i(t) = (1/N) ∑_{i=1}^N x_i(0) ≜ x̄. (A minimal numerical sketch of this update appears at the end of this section.) The DA algorithm was introduced by Tsitsiklis in [17], and has since been pursued in various forms by many other researchers (e.g., [3,6,8,11,14,20]).

Of course, in any practical implementation of this algorithm, communication rates between neighboring nodes will be finite, and thus quantization must be applied to x_i(t) before it can be transmitted. In applications where heavy quantization must be applied (e.g., when executing multiple consensus computations in parallel, so that each packet transmission carries many values), quantization can actually affect the convergence properties of the algorithm. Figure 1 shows the trajectories, x_i(t), for all nodes superimposed on one set of axes. In Fig. 1(a), nodes apply deterministic uniform quantization with ∆ = 0.1 spacing between quantization points. Although the algorithm converges in this example, clearly the result is not a consensus; not all nodes arrive at the same value.

In [1,2] we introduced probabilistically quantized distributed averaging (PQDA). Rather than applying deterministic uniform quantization, nodes independently apply a simple randomized quantization scheme. In this scheme, the random quantized value is equal to the original unquantized value in expectation. Through this use of randomization, we guarantee that PQDA converges almost surely to a consensus. However, since x̄ is not divisible by ∆ in general, the value we converge to is not precisely the average of the initial values. We presented characteristics of the limiting consensus value, in particular, showing that it is equal to x̄ in expectation.

The main contribution of the present paper is to characterize the rates of convergence for PQDA. We show that, when the values x_i(t) lie in an interval which is large relative to the quantizer precision ∆, then PQDA moves at the same rate as regular unquantized consensus. On the other hand, the transition from the point when node values are all within ∆ of each other (i.e., each node is either at k∆ or (k+1)∆, for some k) to an exact consensus is slower than unquantized consensus. We present a scheme for characterizing the time from when PQDA iterates enter this "final bin" until the algorithm is absorbed at a consensus state.
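As a concrete illustration of the DA update above, the following minimal Python sketch runs unquantized DA on a small example network. The ring topology and Metropolis-style weights are illustrative assumptions on our part, not choices made in the paper.

```python
import numpy as np

def metropolis_weights(adj):
    """One standard construction of a symmetric, doubly stochastic W
    (satisfying W1 = 1 and W = W^T) from an adjacency matrix."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Illustrative network: N = 10 nodes on a ring.
N = 10
adj = np.zeros((N, N), dtype=int)
for i in range(N):
    adj[i, (i + 1) % N] = adj[(i + 1) % N, i] = 1
W = metropolis_weights(adj)

x = np.random.rand(N)            # initial values y_i
x_bar = x.mean()                 # the average every node should agree on
for _ in range(200):
    x = W @ x                    # x(t+1) = W x(t), the DA update
print(np.allclose(x, x_bar, atol=1e-6))   # True: every node is near x_bar
```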


Fig. 1. Individual node trajectories (i.e., x_i(t), ∀i) taken by distributed average consensus using (a) deterministic uniform quantization and (b) probabilistic quantization. The number of nodes is N = 50, the nodes' initial average is x̄ = 0.85, and the quantization resolution is set to ∆ = 0.1. The consensus value, in this case, is 0.8.

The remainder of the paper is organized as follows. Section II describes the probabilistic quantization scheme employed by PQDA. Section III formally defines the PQDA algorithm and lists fundamental convergence properties. Section IV explores rates of convergence when the node values are far from a consensus, relative to the quantization precision ∆. Section V presents our technique for characterizing convergence rates when all nodes reach the "final bin" and are within ∆ of each other. We conclude in Section VI. Before proceeding, we briefly review related work.

A. Related Work

While there exists a substantial body of work on average consensus protocols with infinite precision and noise-free peer-to-peer communications, little research has been done on introducing distortions into the message exchange. In [12], Rabani, Sinclair, and Wanka examine distributed averaging as a mechanism for load balancing in distributed computing systems. They provide a bound on the divergence between quantized consensus trajectories, q(t), and the trajectories which would be taken by an unquantized averaging algorithm. The bound reduces to looking at properties of the averaging matrix W.

Recently, Yildiz and Scaglione, in [22], explored the impact of quantization noise through modification of the consensus algorithm proposed by Xiao and Boyd [21]. They note that the noise component in [21] can be considered as the quantization noise, and they develop an algorithm for predicting neighbors' unquantized values in order to correct errors introduced by quantization [22]. Simulation studies for small N indicate that if the increasing correlation among the node states is taken into account, the variance of the quantization noise diminishes and nodes converge to a consensus.

Kashyap et al. examine the effects of quantization in consensus algorithms from a different point of view [7]. They require that the network average, x̄ = (1/N) ∑_{i=1}^N x_i(t), be preserved at every iteration. To do this using quantized transmissions, nodes must carefully account for round-off errors. Suppose we have a network of N nodes and let ∆ denote the "quantization resolution" or distance between two quantization lattice points. If x̄ is not a multiple of ∆, then it is not possible for the network to reach a strict consensus (i.e., lim_{t→∞} max_{i,j} |x_i(t) − x_j(t)| = 0) while also preserving the network average, x̄, since nodes only ever exchange units of ∆. Instead, Kashyap et al. define the notion of a "quantized consensus" to be such that all x_i(t) take on one of two neighboring quantization values while preserving the network average; i.e., x_i(t) ∈ {k∆, (k+1)∆} for all i and some k, with ∑_i x_i(T) = N x̄. They show that, under reasonable conditions, their algorithm will converge to a quantized consensus. However, the quantized consensus is clearly not a strict consensus, i.e., all nodes do not have the same value. Even when the algorithm has converged, as many as half the nodes in the network may have different values. If nodes are strategizing and/or performing actions based on these values (e.g., flight formation), then differing values may lead to inconsistent behavior.

Of note is that the related works discussed above all utilize standard deterministic uniform quantization schemes to quantize the data. In contrast to [22], where quantization noise terms are modeled as independent zero-mean random variables, we explicitly introduce randomization in our quantization procedure. Careful analysis of this randomization allows us to provide concrete theoretical rates of convergence in addition to empirical results. Moreover, the algorithm proposed in this paper converges to a strict consensus, as opposed to the approximate "quantized consensus" achieved in [7], which is clearly not a strict consensus; our algorithm, on the other hand, does not preserve the network average perfectly. In addition to proving that our algorithm converges, we show that the network average is preserved in expectation, and we characterize the limiting mean squared error between the consensus value and the network average.

II. PROBABILISTIC QUANTIZATION

Suppose that the scalar value x_i ∈ [−U, U] ⊂ R lies in a bounded interval. Furthermore, suppose that we wish to obtain a quantized message q_i with length l bits, where l is application dependent. We therefore have L = 2^l quantization points, given by the set τ = {τ_1, τ_2, ..., τ_L}. The points are uniformly spaced such that ∆ = τ_{j+1} − τ_j for j ∈ {1, 2, ..., L−1}. It follows that ∆ = 2U/(2^l − 1). Now suppose x_i ∈ [τ_j, τ_{j+1}) and let q_i ≜ Q(x_i), where Q(·) denotes the PQ operation. Then x_i is quantized in a probabilistic manner:

    Pr{ q_i = τ_{j+1} } = r,   and   Pr{ q_i = τ_j } = 1 − r,   (1)

where r = (x_i − τ_j)/∆. The following lemma, adopted from [19], discusses two important properties of PQ.

Lemma 1 ([19]): Suppose x_i ∈ [τ_j, τ_{j+1}) and let q_i be an l-bit quantization of x_i ∈ [−U, U]. The message q_i is an unbiased representation of x_i, i.e.,

    E{ q_i } = x_i,   and   E{ (q_i − x_i)^2 } ≤ U^2/(2^l − 1)^2 ≡ ∆^2/4.   (2)
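The probabilistic quantizer Q(·) of (1) is straightforward to implement. The sketch below is our own illustration using the uniform lattice of Section II; the Monte Carlo check at the end is consistent with the unbiasedness property of Lemma 1.

```python
import numpy as np

def probabilistic_quantize(x, U=1.0, l=4, rng=np.random.default_rng()):
    """Probabilistic quantization of eq. (1): each x in [-U, U] is mapped to one of
    L = 2^l uniformly spaced points; the upper neighbor tau_{j+1} is chosen with
    probability r = (x - tau_j)/Delta and the lower neighbor tau_j otherwise."""
    L = 2 ** l
    delta = 2.0 * U / (L - 1)                     # Delta = 2U / (2^l - 1)
    tau = -U + delta * np.arange(L)               # quantization lattice
    j = np.clip(np.floor((x - tau[0]) / delta).astype(int), 0, L - 2)
    r = (x - tau[j]) / delta                      # Pr{ q = tau_{j+1} }
    go_up = rng.random(np.shape(x)) < r
    return np.where(go_up, tau[j + 1], tau[j])

# Lemma 1 (unbiasedness): the empirical mean of q approaches x.
x = np.array([0.137])
samples = [probabilistic_quantize(x)[0] for _ in range(100_000)]
print(np.mean(samples))                            # close to 0.137
```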

III. PROBABILISTICALLY QUANTIZED DISTRIBUTED AVERAGING

This section describes the PQDA algorithm introduced in [1,2]. We assume each node begins with the initial condition x_i(0) = y_i, as before. At iteration t, nodes independently quantize their values via

    q_i(t) = Q(x_i(t)).   (3)

These quantized values are exchanged among neighbors, and the usual consensus linear update is performed via

    x(t+1) = W q(t).   (4)

Let J = N^{-1} 1 1^T. Following Xiao and Boyd [20], we assume that W is a symmetric stochastic matrix which satisfies W1 = 1, 1^T W = 1^T, and ρ(W − J) < 1, where ρ(·) denotes the spectral radius. These properties suffice to guarantee that x(t) converges to the average consensus when perfect, unquantized transmissions are used [20].

Due to our use of probabilistic quantization, x(t) is a random process, and it is natural to ask: does x(t) converge, and if so, does it converge to a consensus? In [1,2], we show that x(t) indeed converges almost surely to a consensus.

Theorem 1 ([1,2]): Let x(t) be the sequence of iterates generated by the PQDA algorithm defined in (3) and (4). Then

    Pr{ lim_{t→∞} x(t) = τ* 1 } = 1

for some quantization point τ* ∈ τ. Proofs of all the theorems stated in this section can be found in [2].

The limiting quantization point, τ*, is not equal to x̄ in general, since x̄ is not necessarily divisible by ∆. This compromise is required if we want to guarantee convergence to a consensus in the quantized setting. The theorem above only guarantees convergence and does not say anything about how close τ* will be to x̄. The following results quantify the accuracy of PQDA.

Theorem 2 ([1,2]): For the sequence x(t) of iterates generated by PQDA steps (3) and (4),

    E{ lim_{t→∞} x(t) } = x̄ 1.

Theorem 3 ([2]): For the sequence x(t) of iterates generated by PQDA steps (3) and (4),

    lim_{t→∞} lim_{N→∞} (1/N) E{ ‖q(t) − x̄ 1‖ } ≤ (∆/2) · 1/(1 − ρ(W − J)).

Thus, PQDA converges to x̄ in expectation. The upper bound on the standard deviation given in Theorem 3 contains two terms. First, the ∆/2 worst-case error is due to our use of quantization. The second term, (1 − ρ(W − J))^{-1}, relates to how fast information can diffuse over the network, which is directly a function of the network topology. The more iterations that are required to reach consensus, the more times probabilistic quantization must be applied, and each time we quantize we potentially introduce additional errors.
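Combining (3) and (4) gives the full PQDA recursion. The following sketch reuses probabilistic_quantize, metropolis_weights, N and W from the earlier snippets; it is only an illustration of the iteration, not the exact simulation setup used in the paper.

```python
import numpy as np

def pqda(x0, W, U=1.0, l=4, max_iter=2000, rng=np.random.default_rng(0)):
    """One realization of PQDA: quantize the state (eq. (3)), then average the
    quantized values with W (eq. (4)). Returns the final quantized state and the
    sequence of ranges r_q(t); the loop stops once q(t) reaches a consensus."""
    x = np.asarray(x0, dtype=float)
    ranges = []
    for _ in range(max_iter):
        q = probabilistic_quantize(x, U=U, l=l, rng=rng)   # eq. (3)
        ranges.append(np.ptp(q))                           # r_q(t)
        if ranges[-1] == 0:          # absorbed: all nodes on the same lattice point
            break
        x = W @ q                                          # eq. (4)
    return q, np.array(ranges)

q_final, rq = pqda(np.random.rand(N), W)
print(q_final[0], len(rq))           # consensus value tau* and iterations used
```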
IV. CONVERGENCE ANALYSIS: FAR FROM CONSENSUS

To get a handle on the rate of convergence of PQDA, we begin by studying how spread out the individual node values are over the interval [−U, U]. Let I(t) = [min_i q_i(t), max_i q_i(t)] denote the minimum length quantization range containing the components of q(t). It is easy to show (see [2]) that I(t+1) ⊆ I(t). This follows since, by our constraints on W, each x_i(t+1) is a convex combination of components of q(t). Therefore, after iterating and quantizing, the minimum value can never decrease and the maximum cannot increase. The analogous result also holds if one defines I(t) in terms of the components of x(t). Next, let r_q(t) = max_{i,j} |q_i(t) − q_j(t)| denote the range of quantized network values at time t. Clearly, since x(t) converges to a consensus, eventually r_q(t) = 0. Our first results dealing with rates of convergence for PQDA examine the rate at which r_q(t) tends to zero.

Theorem 4 ([2]): Let r_q(t) be as above, and let r_x(0) = max_{i,j} |x_i(0) − x_j(0)|. Then

    E{ r_q(t) } ≤ √((N−1)/N) ρ^t(W − J) r_x(0) + 2∆.

Ignoring the 2∆ term for a moment, we see that the range of values decays geometrically at a rate proportional to the spectral gap of W, ρ(W − J). In fact, this is the same rate at which standard, unquantized consensus converges, so we see that when r_q(t) is large, errors due to quantization do not significantly hamper the rate of convergence. On the other hand, this result is loose in the sense that it only implies r_q(t) ≤ 2∆, at best. However, we know that PQDA converges almost surely, so eventually r_q(t) = 0. The gap is due to the worst-case analysis employed to obtain the result. Empirically, we observe that r_q(t) does indeed tend to zero, but when convergence is nearly obtained (i.e., when r_q(t) = ∆ or 2∆), the rate of decay is significantly slower than that of unquantized consensus. Figure 2 depicts r_q(t) as a function of iteration. The plot shows our bound and a simulated curve obtained by averaging over 5000 Monte Carlo trials. In this simulation we use a network of N = 50 nodes, with quantizer precision ∆ = 0.2. In the figure we see that around the time when r_q(t) = 2∆ (roughly the first 20 iterations; for reference, the simulated range of standard unquantized DA (SDA) is also plotted), the rate of convergence undergoes a smooth transition and slows down significantly. Next, we look at characterizing the rate of convergence during this final phase, focusing on the final step, without loss of generality, from r_q(t) = ∆ to r_q(t) = 0.

Fig. 2. The range, r_q(t), as a function of iteration number. The solid curve shows our theoretical upper bound; the dashed curve is generated empirically by averaging over 5000 Monte Carlo simulations. Also plotted for reference is the range for standard unquantized DA (SDA), again generated empirically by averaging over 5000 Monte Carlo simulations. For the first 20 iterations, PQDA converges at essentially the same rate as unquantized distributed averaging. However, around 20 iterations, all of the values q_i(t) are within one or two ∆, and convergence slows down considerably.
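Theorem 4 is easy to evaluate alongside a simulated range curve, mirroring Fig. 2 at a much smaller scale. The sketch below continues the previous snippets (it uses N, W and pqda from above); note that a single realization need not stay below the bound, which holds only for the expectation E{r_q(t)}.

```python
import numpy as np

J = np.ones((N, N)) / N
rho = np.max(np.abs(np.linalg.eigvalsh(W - J)))    # rho(W - J); W is symmetric

x0 = np.random.rand(N)
rx0 = np.ptp(x0)                                   # r_x(0)
_, rq = pqda(x0, W)                                # empirical r_q(t), one realization

delta = 2.0 / (2 ** 4 - 1)                         # Delta for U = 1, l = 4
t = np.arange(len(rq))
bound = np.sqrt((N - 1) / N) * rho ** t * rx0 + 2 * delta   # Theorem 4 upper bound
print(bound[:5])
print(rq[:5])
```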

V. CONVERGENCE ANALYSIS: FINAL BIN

Next, we consider the convergence of the probabilistically quantized consensus algorithm after the time step at which all of the node state values are in a single quantization bin of length ∆, i.e., r_q(t) = ∆. Without loss of generality, we assume that x(t′ + t) ∈ [0,1]^N, indicating that q(t) ∈ {0,1}^N, and that t′ = 0.

Let us construct a Markov chain Q with 2^N states, where the chain states are given by Q = {q^1, q^2, ..., q^{2^N}}, with q^i ∈ {0,1}^N for i = 1, 2, ..., 2^N. Hence q(t) ∈ Q for t ≥ 1. The entries P_ij of the transition probability matrix P are given by

    P_ij = Pr{ q(t+1) = q^j | q(t) = q^i }.   (5)

Using the state recursion and utilizing the conditional independence of the v_i(t) samples [19], we arrive at

    P_ij = ∏_{k=1}^{N} ( 1 − |q_k^j − w^k q^i| ),   (6)

where q_k^j and w^k denote the k-th element of the state q^j and the k-th row of the weight matrix, respectively. Recall that the constructed Markov chain is an absorbing one. Renumbering the states in P so that the transient states come first yields the canonical form

    C = [ Q  R ;
          0  I ].   (7)

Here I and 0 are identity and zero matrices with appropriate sizes, respectively. The fundamental matrix of a Markov chain is given by F = (I − Q)^{-1}. The entry F_ij of F gives the expected number of times that the process is in the transient state j if it is started in the transient state i. Let ν_i be the expected number of steps before the chain is absorbed, given that the chain starts in state i, and let ν be the column vector whose i-th entry is ν_i. Then ν = F1. Moreover, let B_ij be the probability that an absorbing chain will be absorbed in the absorbing state j if it starts in the transient state i, and let B be the matrix with entries B_ij. Then B = FR, where R is as in the canonical form.

Unfortunately, the exact computation of the expected number of steps to absorption requires inversion of a matrix of size (2^N − 1) × (2^N − 1), which is a challenging task. To overcome this drawback, in the following we investigate a Markov chain with N/2 + 1 states, requiring the inversion of a matrix of manageable size N/2 × N/2, with the trade-off of yielding only bounds on the expected number of steps to absorption. We employ the reduction techniques outlined in [16] to analyze the convergence of the Markov chain. We need to introduce a modification, since it is not computationally feasible to compute or process the (2^N − 1) × (2^N − 1) probability transition matrix, as required in [16].
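For very small N the absorbing chain of (5)-(7) can be enumerated and solved exactly. The self-contained sketch below uses an illustrative 4-node path graph with Metropolis-style weights (our own choices) and the normalization q(t) ∈ {0,1}^N from above.

```python
import numpy as np
from itertools import product

def metropolis_weights(adj):
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

N = 4                                              # small enough to enumerate 2^N states
adj = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)   # path graph
W = metropolis_weights(adj)

states = np.array(list(product([0, 1], repeat=N)), dtype=float)  # all q-vectors
M = len(states)
P = np.zeros((M, M))
for i, qi in enumerate(states):
    x_next = W @ qi                                # node k moves up w.p. w^k q^i
    for j, qj in enumerate(states):
        P[i, j] = np.prod(1.0 - np.abs(qj - x_next))     # eq. (6)

absorbing = [0, M - 1]                             # all-zeros and all-ones states
transient = [s for s in range(M) if s not in absorbing]
Q = P[np.ix_(transient, transient)]
R = P[np.ix_(transient, absorbing)]
F = np.linalg.inv(np.eye(len(transient)) - Q)      # fundamental matrix F = (I - Q)^{-1}
nu = F @ np.ones(len(transient))                   # nu = F 1, expected steps to absorption
B = F @ R                                          # B = F R, absorption probabilities
print(nu.max(), B.sum(axis=1))                     # each row of B sums to 1
```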
A. Preliminaries

We first introduce some notation and recall some definitions. The definitions are extracted from [16] and [15].

Definition 1 (Strong order): Let U and V be two R^n-valued random variables. U is smaller than V in the sense of the strong order if and only if, for all nondecreasing real functions f on R^n (in the sense of the componentwise ordering on R^n), we have E f(U) ≤ E f(V), provided that the expectations exist. If this condition holds, it is denoted U ≤st V. ≤st is a partial order, i.e., reflexive, antisymmetric and transitive. In the case where U and V are {1,...,m}-valued random variables, U ≤st V if and only if ∀ i ∈ E, ∑_{k≥i} u(k) ≤ ∑_{k≥i} v(k), where u and v are the probability distributions of U and V. This last relation is also denoted u ≤st v.

Definition 2 (≤st comparison of stochastic matrices): Let A and B be two η × η stochastic matrices indexed by elements of E. Then

    A ≤st B  ⟺  ∀ i ∈ E, A(i,·) ≤st B(i,·).   (8)

Definition 3 (≤st-monotonicity): Let A be an η × η stochastic matrix. We say that A is monotone in the sense of ≤st if and only if

    ∀ i < η, A(i,·) ≤st A(i+1,·).   (9)

Definition 4 (≤st-monotone optimal matrix): Let A be a stochastic matrix. M is the ≤st-monotone optimal matrix with respect to A if it is the lowest (the greatest) with respect to the partial order ≤st such that A ≤st M (M ≤st A).
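For finite distributions, the ≤st conditions in Definitions 1-3 reduce to tail-sum comparisons, which makes them easy to check numerically. A small self-contained helper follows; the example vectors and matrix are arbitrary illustrations.

```python
import numpy as np

def st_leq(u, v, tol=1e-12):
    """u <=_st v for distributions on {1,...,m}: every tail sum of u is <= that of v."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    tails_u = np.cumsum(u[::-1])[::-1]      # sum_{k >= i} u(k)
    tails_v = np.cumsum(v[::-1])[::-1]
    return bool(np.all(tails_u <= tails_v + tol))

def st_monotone(A):
    """Definition 3: row i of A is <=_st row i+1 for every i."""
    return all(st_leq(A[i], A[i + 1]) for i in range(A.shape[0] - 1))

u, v = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
print(st_leq(u, v), st_leq(v, u))           # True False: v favors larger states
A = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
print(st_monotone(A))                        # True: A is <=_st-monotone
```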
B. Truffet's Reduction Techniques

Let χ = (π, P) be a discrete-time homogeneous Markov chain with totally ordered state space E = {1,...,η} (with respect to a canonical order ≤ on N). Here π is the initial distribution and P is the associated Markov transition matrix. The random variable χ(t) represents the state of the system at epoch t. This chain is used to represent the progression through the quantization states, so the states are the q-vectors and η = 2^N − 1. Let k(q) = min( ∑_{i=1}^N q_i, N − ∑_{i=1}^N q_i ). This value captures the minimum Manhattan distance of q to an absorbing state. We establish an ordering by requiring that, for any two states labelled i and j such that i < j, the associated q-vectors, denoted q^(i) and q^(j), satisfy k(q^(i)) ≥ k(q^(j)). If equality holds, then states are ordered by considering the q-vectors as binary numbers. (A small sketch of this ordering, and of the induced grouping of states, is given at the end of this subsection.)

Following [16], consider a surjective mapping ǫ : E → Σ = {1,...,L}, which induces the partition E = {E(I) = ǫ^{-1}(I) : I ∈ Σ} of the state space. Given a Markov chain Φ = (α, A) on E, suppose that for every I, J ∈ Σ the quantity ∑_{j∈E(J)} A(i, j) takes the same value for every i ∈ E(I); then the aggregated process ǫ(Φ) is itself a Σ-valued Markov chain with transition matrix

    A_o^ǫ(I, J) = ∑_{j∈E(J)} A(i, j),   i ∈ E(I),   (10)

and initial condition α^ǫ. The Markov chain Φ is said to be ordinary lumpable with respect to E, and the matrix A_o^ǫ is referred to as an ordinary lumped matrix.

The key result from [16] (Lemma 2.1) relies on identifying a ≤st-monotone optimal matrix M and a discrete Markov chain χ̄ = (π̄, P̄) such that (i) P ≤st M ≤st P̄; (ii) P̄ is ordinary lumpable with respect to a partition E; and (iii) π ≤st π̄. Under these conditions, the Σ-valued Markov chain Y_o^ǫ = (π^ǫ, P̄_o^ǫ), with P̄_o^ǫ defined by (10) (with A = P̄), is such that ǫ(χ) ≤st Y_o^ǫ. Note that the result holds if M is merely ≤st-monotone (not optimal), but the resultant bounds on ǫ(χ) are not as tight.

In our case, we can set π̄ = π to satisfy the third condition. Lemma 3.1 from [16] provides a recipe for constructing an optimal ordinary lumped matrix P̄ given the ≤st-monotone matrix M. We first recall the definition of upper optimality in this setting:

Definition 5: Let M be an η × η monotone matrix. O is an optimal upper L × L matrix if and only if, for all I, K ∈ Σ, the quantity ∑_{J=K}^{L} O(I, J) is the smallest quantity such that

    ∀ k ∈ E(I),   ∑_{J=K}^{L} ∑_{j∈E(J)} M(k, j) ≤ ∑_{J=K}^{L} O(I, J).

Lemma 2 ([16], Lemma 3.1): For any arbitrary partition E and ≤st-monotone matrix M, there exists an optimal ordinary lumped matrix P̄_{o,opt}^ǫ, which is defined by

    ∀ I, J ∈ Σ,   P̄_{o,opt}^ǫ(I, J) = ∑_{j∈E(J)} M(b_I, j),   (11)

where b_I denotes the largest state in E(I).
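The total order and the aggregation by k(q) referred to above can be sketched in a few lines. The enumeration for N = 4 is purely illustrative; the grouping shown simply collects states that share a k-value, giving N/2 + 1 classes, which matches the size of the reduced chain mentioned earlier.

```python
import numpy as np
from itertools import product

def k_value(q):
    """Minimum Manhattan distance from q in {0,1}^N to an absorbing state."""
    s = int(np.sum(q))
    return min(s, len(q) - s)

N = 4
states = [np.array(q) for q in product([0, 1], repeat=N)]

# Order the states so that i < j implies k(q^(i)) >= k(q^(j)); ties are broken by
# reading q as a binary number, as described above.
ordered = sorted(states, key=lambda q: (-k_value(q), int("".join(map(str, q)), 2)))
print([(q.tolist(), k_value(q)) for q in ordered[:3]])   # farthest-from-absorption first
print([q.tolist() for q in ordered[-2:]])                # the two absorbing states last

# Grouping states by their k-value gives N/2 + 1 classes.
classes = {k: [q.tolist() for q in states if k_value(q) == k] for k in range(N // 2 + 1)}
print({k: len(v) for k, v in classes.items()})           # class sizes: {0: 2, 1: 8, 2: 6}
```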

D. The Krawtchouk Expansion

Let S_N = X_1 + X_2 + ··· + X_N denote the sum of N independent Bernoulli random variables, X_j, each having success probability

    Pr{ X_j = 1 } = 1 − Pr{ X_j = 0 } = p_j ∈ [0,1]   (14)

for j = 1, 2, ..., N. The distribution Pr{ S_N = n } is commonly referred to as the Poisson–Binomial distribution and is denoted B(N, p), where p denotes the vector of success probabilities; the exact expression for B(N, p) is given by (13). Since B(N, p) has a complicated structure, it is often approximated by other distributions. The most notable are normal approximations and the Edgeworth expansion [10,18], and Poisson approximations and expansions related to Charlier polynomials [5,9]. In the following, we consider the approximation of the distribution of S_N by the Binomial distribution B(N, p) with parameter N and arbitrary success probability p, proposed by Roos [13], due to its ability to provide the point metric bounds required to bound the considered probabilities. The relevant Krawtchouk polynomials are

    K_j(x, N, p) = ∑_{k=0}^{j} ( N−x choose j−k ) ( x choose k ) (−p)^{j−k} q^k.   (19)

Note that we are specifically interested in Krawtchouk polynomials of degree two and three: we find the roots of K_3 and evaluate those roots in K_2. From the theory of orthogonal polynomials, it follows that the zeros of the Krawtchouk polynomials are real, simple and lie in the interval (0, N) [13]. Hence, one can determine the roots of K_3 using numerical techniques. Moreover, closed-form expressions exist for K_2, rendering the evaluation of the obtained roots trivial.
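Equation (19) is simple to evaluate numerically. The sketch below computes K_2 and K_3 on the integer grid and brackets the roots of K_3 by sign changes; this is only a stand-in for the numerical root-finding mentioned above, and the parameters N = 20, p = 0.3 are arbitrary (q is taken to be 1 − p).

```python
import numpy as np
from math import comb

def krawtchouk(j, x, N, p):
    """Krawtchouk polynomial K_j(x, N, p) as in eq. (19), with q = 1 - p."""
    q = 1.0 - p
    return sum(comb(int(N - x), j - k) * comb(int(x), k) * (-p) ** (j - k) * q ** k
               for k in range(j + 1))

N, p = 20, 0.3
xs = np.arange(N + 1)
K3 = np.array([krawtchouk(3, x, N, p) for x in xs])

# Zeros of K_3 are real, simple and lie in (0, N); bracket them by sign changes.
brackets = [(int(x), int(x) + 1) for x in xs[:-1] if K3[x] * K3[x + 1] < 0]
print(brackets)                                    # integer intervals containing the roots
print(krawtchouk(2, brackets[0][0], N, p))         # K_2 evaluated near a root of K_3
```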
E. Example

In this section, we include the results of a convergence analysis for the W matrix used in the simulations described in Section IV. For this matrix, we perform a branch-and-bound search procedure to identify the sets I*. This procedure is a greedy search that initially selects K of the sets with k(q^(j)) = 1, choosing those with the smallest neighbour cardinalities (K is an algorithm constant defining the degree of branching in the search). Subsequently it attempts to grow each of these sets by adding one neighbour. Through this process it creates K′ sets, but at each step it retains only the K with the smallest neighbour cardinalities. The process ends when the examined sets satisfy k(q^(j)) = N/2.

We now apply the procedure of [16] to develop stochastic matrices that provide upper and lower bounds on the expected time to absorption for the original Markov chain. Figure 3 depicts the results, showing the upper and lower bounds (dashed) and the expected time to convergence, estimated empirically by running 5000 simulations for each value of k(q^(i)) from random initial states. The significant number of outliers and the substantial variance indicate the value of the analytical bounds: the state space is very large even with only 50 nodes, and it is very difficult to reliably evaluate convergence times through empirical studies.

Fig. 3. Bounds on the expected number of iterations to convergence after entry into the final bin. The lines with circle and diamond markers show the lower and upper bounds, as derived from the branch-and-bound search combined with the lumped-state Markov chain analysis. The empirical mean, derived from 5000 trials for each initial k-value, is depicted by the line with square markers. Error bars depict the 5th and 95th percentiles; outliers are shown with the "+" marker.

F. An Empirical Observation

We investigate the average convergence time of distributed average consensus using probabilistic quantization for varying ∆ ∈ {0.05, 0.1, 0.2} against the number of nodes in the network, Fig. 4(a) and (b). We also show the average number of iterations taken to reach the final quantization bin. Moreover, Fig. 4(c) and (d) plot the average normalized distance to the closest absorbing state at the first time step when all the quantized node state values are in the final quantization bin. The initial state averages are x(0) = 0.85 and x(0) = 0.90, and the connectivity radius is d = √(4 log(N)/N). Each data point is an ensemble average of 10000 trials.

Note that the convergence time increases with the number of nodes in the network. The plots suggest that the number of iterations taken by the PQDA algorithm to converge to the final quantization bin decreases as ∆ increases. This can be seen by noting that the algorithm has to go through less "averaging" (multiplication with the weight matrix) before arriving at the final bin; it is hence clear that the algorithm needs to run for a smaller number of iterations to arrive at a larger final bin.

On the other hand, the expected number of iterations taken to achieve consensus is dominated by the number of iterations taken to converge to an absorbing state after all the node values are in the final bin; probabilistic quantization is the dominant effect in the final bin. In fact, the time taken to converge to an absorbing state is heavily dependent on the distance to that absorbing state at the first time step when all values enter the final bin. This distance is affected by two factors. First, if more averaging operations occur prior to the instant t_∆ = min{ t : r_x(t) ≤ ∆ }, then there is more uniformity in the values, decreasing the distance from each x_i to x̄. Second, if the initial data average x̄ is close to a quantization point, then, on average, x(t_∆) will be closer to an absorbing state (note that E{ 1^T q(t) } = 1^T x(0)).

These observations explain the results of Fig. 4. Note that the ordering of the convergence times for the x(0) = 0.85 and x(0) = 0.90 cases flips between ∆ = 0.2 and ∆ = 0.1. That is due to the fact that the average distance to an absorbing state when all the node values first enter the final bin is smaller for x(0) = 0.85 when ∆ = 0.2 compared to ∆ = 0.1, and is smaller for x(0) = 0.90 when ∆ = 0.1 compared to ∆ = 0.2. Moreover, note that ∆ = 0.05 yields the smallest distance to an absorbing state for both initial conditions. Although it takes more iterations to reach the final bin, in both cases the PQDA algorithm with ∆ = 0.05 yields the smallest average distance to an absorbing state when all the node values first enter the final bin, and hence the smallest average number of iterations to achieve consensus.

VI. CONCLUSION

This paper presented convergence results for PQDA. We show that when the range of node values is large, PQDA behaves essentially like standard distributed averaging without quantization. When all nodes have values at one of two neighboring quantization points, convergence is slowed. We provide reduction techniques and approximations to bound the rate of convergence in this final stage of the algorithm.


Fig. 4. Average number of iterations taken by the probabilistically quantized distributed average computation to reach the final quantization bin (dashed) and consensus (solid) for ∆ ∈ {0.05, 0.1, 0.2} and varying N, with (a) x(0) = 0.85 and (b) x(0) = 0.90, along with the corresponding average distance to the closest absorbing state at the first time step when all the quantized node state values are in the final quantization bin, for (c) x(0) = 0.85 and (d) x(0) = 0.90.

REFERENCES

[1] T. Aysal, M. Coates, and M. Rabbat. Distributed average consensus using probabilistic quantization. In Proc. IEEE Statistical Signal Processing Workshop, Madison, WI, Aug. 2007.
[2] T. Aysal, M. Coates, and M. Rabbat. Probabilistically quantized distributed averaging. Submitted to IEEE Trans. Signal Processing, Sep. 2007.
[3] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Trans. Information Theory, 52(6):2508–2530, June 2006.
[4] P. Buchholz. Exact and ordinary lumpability in finite Markov chains. J. Appl. Probab., 31(1):59–75, Jan. 1994.
[5] P. Deheuvels and D. Pfeifer. A semigroup approach to Poisson approximation. Annals of Probability, 14:663–676, 1986.
[6] A. Jadbabaie, J. Lin, and A. S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. Automatic Control, 48(6):988–1001, 2003.
[7] A. Kashyap, T. Basar, and R. Srikant. Quantized consensus. Automatica, 43:1192–1203, July 2007.
[8] D. Kempe, A. Dobra, and J. Gehrke. Computing aggregate information using gossip. In Proc. Foundations of Computer Science, Cambridge, MA, October 2003.
[9] L. LeCam. An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 10:1181–1197, 1960.
[10] H. Makabe. On approximations to some limiting distributions with applications to theory of sampling inspections by attributes. Kodai Mathematical Seminar Reports, 16:1–17, 1964.
[11] R. Olfati-Saber and R. Murray. Consensus problems in networks of agents with switching topology and time delays. IEEE Trans. Automatic Control, 49(9):1520–1533, Sept. 2004.
[12] Y. Rabani, A. Sinclair, and R. Wanka. Local divergence of Markov chains and the analysis of iterative load-balancing schemes. In Proc. IEEE Symp. on Foundations of Computer Science, Palo Alto, CA, Nov. 1998.
[13] B. Roos. Binomial approximation to the Poisson binomial distribution: The Krawtchouk expansion. Theory of Probability and its Applications, 45:258–272, 2001.
[14] V. Saligrama, M. Alanyali, and O. Savas. Distributed detection in sensor networks with packet loss and finite capacity links. IEEE Trans. Signal Processing, 54(11):4118–4132, November 2006.
[15] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. John Wiley, New York, 1976.
[16] L. Truffet. Reduction techniques for discrete-time Markov chains on totally ordered state space using stochastic comparisons. J. Appl. Probab., 37(3):795–806, Mar. 2000.
[17] J. Tsitsiklis. Problems in Decentralized Decision Making and Computation. PhD thesis, Dept. of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA, 1984.
[18] J. V. Uspensky. Introduction to Mathematical Probability. McGraw–Hill, New York, 1937.
[19] J.-J. Xiao and Z.-Q. Luo. Decentralized estimation in an inhomogeneous sensing environment. IEEE Trans. Information Theory, 51(10):3564–3575, Oct. 2005.
[20] L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems and Control Letters, 53:65–78, 2004.
[21] L. Xiao, S. Boyd, and S.-J. Kim. Distributed average consensus with least-mean-square deviation. Journal of Parallel and Distributed Computing, 67(1):33–46, Jan. 2007.
[22] M. E. Yildiz and A. Scaglione. Differential nested lattice encoding for consensus problems. In Proc. Information Processing in Sensor Networks, Cambridge, MA, Apr. 2007.