SEQUENCES OF WELL-DISTRIBUTED VERTICES ON GRAPHS AND SPECTRAL BOUNDS ON OPTIMAL TRANSPORT
LOUIS BROWN
Abstract. Given a graph G = (V,E), suppose we are interested in select- n ing a sequence of vertices (xj )j=1 such that {x1, . . . , xk} is ‘well-distributed’ uniformly in k. We describe a greedy algorithm motivated by potential the- ory and corresponding developments in the continuous setting. The algorithm performs nicely on graphs and may be of use for sampling problems. We can interpret the algorithm as trying to greedily minimize a negative Sobolev norm; we explain why this is related to Wasserstein distance by establishing a purely spectral bound on the Wasserstein distance on graphs that mirrors R. Peyre’s estimate in the continuous setting. We illustrate this with many examples and discuss several open problems.
1. Introduction 1.1. Introduction. The purpose of this paper is to discuss a basic problem on finite graphs G = (V,E) that is already difficult on the unit interval [0, 1]. Given a metric space (say, the unit interval, a compact manifold or a finite graph), how does one construct a sequence x1, x2,... of elements in the space such that their distribution is uniformly good—by this we mean that if one takes the first k el- ements {x1, . . . , xk}, then this set is very nearly as evenly distributed on the set as any set of k elements would be. We have not clearly defined what notion of ‘well-distributed’ we mean, this will depend on the actual setting; the question is frequently interesting for several different such notions. There is not even a canon- ical answer on the unit interval [0, 1] but the van der Corput sequence 1 1 3 1 5 3 7 , , , , , , ,... 2 4 4 8 8 8 8 constructed via an inverse binary digit expansion (see [10]), is a good example.
1 1 3 1 5 3 7 arXiv:2008.11296v2 [math.CO] 4 Mar 2021 8 4 8 2 8 4 8
4 2 6 1 5 3 7
Figure 1. The first 7 elements of the van der Corput sequence.
2010 Mathematics Subject Classification. 05C35, 05C85, 05C90, 31B10, 44A99, 49Q20. Key words and phrases. Sampling, Graph, Manifold, Inverse Laplacian, Potential Theory. This paper is part of the author’s PhD thesis at Yale University and has been partially sup- ported by NSF (DMS-1763179). 1 2
We see that the first k elements of the sequence are not as uniformly distributed as k equispaced points but they are always fairly uniformly distributed independently of what k is. There are many different reasons why one could be interested in such sequences: they are natural sampling points for functions (especially for on-line selection and in cases where one does not know in advance in how many points one can sample) but there is also an obvious combinatorial question (‘How well distributed can sequences be? What is the unavoidable degree of irregularity?’).
Figure 2. The first k (here k = 2, 3, 4) elements of the sequence are nearly as evenly distributed as any set of k vertices could be.
Historically, the question has lead to remarkable connections to other fields of math- ematics such as Number Theory, Harmonic Analysis and even Probability Theory (we will discuss several of these connections in §2). We will now state one informal version of the main problem before stating a precise version further below. Main Problem (informal version). Given a finite graph G = (V,E), how would one select a sequence of vertices that are uni- formly good? In what metric would one measure the ‘goodness’ of such a sequence?
8 1
11 4
5 6 9 10 2 3
12 7
Figure 3. The Frucht Graph (see §4.3) and the enumeration ob- tained by the algorithm when starting with vertex 1. 3
Fig. 2 contains a simple such example: taking the Truncated Tetrahedral Graph (see §4.2), in which order should one select the vertices so as to obtain a sequence that is uniformly evenly distributed? Fig. 3 showcases another example, on the Frucht Graph (see §4.3). In Fig. 5, we display an example on the Nauru Graph, a 3-regular bipartite graph with 24 vertices and 36 edges. Even before making the notion of quality precise, we can get some intuition from these simple examples. The enumeration of the vertices in each case was generated by the algorithm discussed below, in §2.2. The organization of this article is as follows: in §1.2 we introduce the Wasserstein distance, and state Kantorovich-Rubinstein duality in the setting of finite graphs. In §2, we introduce a novel algorithm for selecting evenly distributed vertices on graphs, and in §2.3 we provide a Theorem quantitatively demonstrating the even distribution of these vertices. In §3 we explore the connection between optimal transport cost and spectral properties of the Laplacian, providing in §3.2 a Theorem bounding the former in terms of the latter on graphs. In §3.4 and 3.5 we conduct a case study of cycle graphs and torus grid graphs, respectively, in the context of this Theorem. In §4, we provide numerics displaying the performance of the algorithm on a variety of different graphs. In §5, we present proofs of both Theorems. In §6, we list connections to results in the continuous setting.
x2 x1
1/2
x3 x6
1/2
x4 x5
P6 Figure 4. W1((δx1 + δx2 )/2, k=1 δxk /6) = 2/3: We can trans- port 1/6 units of mass from x1 to each of x2 and x6, and similarly with x4 to x3 and x5, incurring a total cost of 4 × 1/6.
1.2. Wasserstein Distance. Wasserstein Distance is a notion of distance on prob- ability distributions introduced by Wasserstein in 1969 [38]. Its simplest instance is W1(µ, ν), also known as Earth Mover’s Distance, which is defined as the minimal amount of cost required to transport one distribution µ to match another distri- bution ν—here, cost is defined as mass × distance: transporting ε units of mass over a distance of δ has a cost of εδ. The notion of Wasserstein distance can be immediately applied to finite graphs: we will work only in the simple case of un- weighted graphs where the distance between two vertices is given as the length of the shortest path connecting them; extensions to the weighted case are conceivable. In short, transporting ε mass over a single edge has a W1 cost of ε (see Fig. 4). More formally, on an abstract metric space X equipped with a metric d, we define Z 1/p p Wp(µ, ν) = inf d(x, y) dγ(x, y) , γ∈Γ(µ,ν) X×X 4 where Γ(µ, ν) denotes the collection of all measures on X × X with marginals µ and ν, respectively (also called the set of all couplings of µ and ν). Note that µ, ν need not be probability measures. We will, throughout this paper, work exclusively with the Earth Mover’s Distance W1 (although extensions to more general Wp are certainly conceivable). The Earth Mover’s Distance is particularly nice to work with: by Kantorovich-Rubinstein duality (see e.g., [39]) Z Z W1(µ, ν) = sup fdµ − fdν : f is 1-Lipschitz . X X 0 Because of this, W1 is translation invariant—for positive measures µ, ν, µ , we have 0 0 W1(µ, ν) = W1(µ + µ , ν + µ ).
Thus, if we have a positive measure µ decomposed into non-positive measures µ1, µ2 as µ = µ1 + µ2, we may use this translation invariance to write + + − − W1(µ, ν) = W1(µ1 + µ2 , ν + µ1 + µ2 ), + − where µi = max{µi, 0} is the positive part of µi and µi = max{−µi, 0} is the negative part. Now that we have defined the Wasserstein distance, we may make precise our original question regarding evenly distributed vertices: Main Problem (formal version). Given a finite graph G = (V,E), how would one select a sequence of vertices such that
k 1 X W δ , dx is small for all k, 1 k xj j=1
where dx is normalized counting measure having weight |V |−1 on each vertex of G. How small one could expect this quantity to be will depend on the particular geometry of the graph. For simplicity of exposition, consider M = Td, the d- dimensional torus with normalized volume measure dx: it is a basic exercise to show that the Earth Mover’s Distance satisfies
n 1 X W δ , dx ≥ c n−1/d 1 n xj d j=1 for all sets of points {x1, . . . , xn} ⊂ M. This clearly shows that the geometry (here: the dimension d) plays a role in what we can expect. We also emphasize that it is almost surely the case that our main question (as asked in §1.1) is of interest also for many other ways of making the notion of distribution quantitative and Wasserstein distance may be one of many (though certainly a rather canonical one). We conclude our short introduction to Wasserstein distance by stating Kantorovich- Rubinstein duality, mentioned above, when X is a finite graph. Proposition (Kantorovich-Rubinstein, see e.g., [17, 27]). Let G = (V,E) be a finite, simple graph, let f : V → R and let W ⊂ V be a subset of vertices. Then ! 1 X 1 X 1 X f(x) − f(x) ≤ W1 δx, dx max |f(xi) − f(xj)|. |V | |W | |W | xi∼xj x∈V x∈W x∈W 5
This shows that our notion of uniform distribution of a subset of vertices has a natural connection to the question of sampling on graphs (i.e., reconstructing the average value of a ‘smooth’ function by sampling in a subset of the vertices). The theory of sampling on graphs is in its infancy but rapidly developing, we refer to [16, 20, 25, 26, 33].
11 20 13 5 4 14 22 7 16 3 1 18 8 24 23 9 17 2 10 21 19 12 6 15
Figure 5. The Nauru Graph on 24 vertices: algorithm starting at 1.
2. The Algorithm 2.1. Setup. We recall that for a finite graph G = (V,E), we can define the adja- cency matrix ( |V | 1 if xi ∼E xj A = (aij) where aij = i,j=1 0 otherwise as well as the degree matrix ( |V | deg(xi) if i = j D = (dij) where dij = i,j=1 0 otherwise.
With these definitions, we can define a notion of a Laplacian via L = D − A.
We denote the eigenvectors of L by φi with corresponding eigenvalues λi, i.e., Lφi = λiφi. Note that L has all its eigenvalues in [0, 2 maxv∈V deg(v)]. Since L’s columns sum to 0, we have for all measures µ that Lµ has no net mass, i.e., is orthogonal to the constant vector, or has mean 0. Since L is symmetric, it is diagonalizable and all pairs of eigenvectors with distinct eigenvalues are orthogonal. The induced ordering of eigenvectors is analogous to the continuous case: small eigenvalue means slow oscillation frequency and the oscillation increases with the eigenvalue—the larger the eigenvalue, the more oscillation there is. In particular, φ1 is constant with λ1 = 0. Since we assume G is connected, there is only one 6 instance of the trivial eigenvalue. As L is diagonalizable, we can also take arbitrary powers and define the fractional Laplacian Lα for α > 0. Setting n = |V |, n α X α L v = hv, φii λi φi. i=1 −α To define L , we need to adjust this definition slightly: since λ1 = 0, we first shift v down by its mean to avoid dividing by 0. (In other words, we simply ignore the component of v in the direction of the constant eigenvector φ1.) That is, n −α X −α L v = hv, φii λi φi. i=2 Of course, multiplying a vector by a matrix can be interpreted as applying an operator to a function, since vectors indexed by vertices are simply functions V → R. Note that AD−1 = I − LD−1 is a diffusion operator: each vertex splits its mass uniformly among its neighbors, and hence mass is preserved. This operator has eigenvalues in [−1, 1]. If we instead apply the transpose D−1A, this corresponds to each vertex taking an equal portion of each of its neighbors masses, which, in general, is not mass-preserving. If G is k-regular (each vertex has equal degree k), 1 then D = kI is scalar and these two notions coincide and equal k A. We refer to [8, 14] for a good introduction to these notions and many references. 2.2. Description of the Algorithm. We present an algorithm, parametrized by 0 < α < 1, for greedily picking well-distributed vertices xk on graphs. (If α 1 the algorithm degenerates and repeatedly selects the same small subset of distant vertices over and over.) First, x1 is chosen arbitrarily. Then, vertices are picked recursively according to the following:
k −α X xk+1 = arg min L δxj (x), x∈V j=1 breaking ties arbitrarily (Figures 2, 3, 5, and 9-11 display applications of this algo- rithm to various graphs). If we write out the algorithm explicitly in terms of the spectrum of the Laplacian operator L = D − A, it becomes n k X X φi(xj) xk+1 = arg min φi(x), x∈V λα i=2 j=1 i where the φi are normalized with kφikL2 = 1, and we skip the constant eigenvector φ1 since it has eigenvalue 0 (note that this choice, while seemingly arbitrary, has no impact on the algorithm as any contribution from the constant vector can be ignored when computing arg min—the algorithm is independent of choice of right inverse of the Laplacian). That is, we add up the projections of the indicator Pk vector of the current vertex set, j=1 δxj , onto each eigenvector, scaling down by the α power of the respective eigenvalue. Consider the case of a cycle graph: if the number of vertices is sufficiently large, this is well approximated by a torus. Setting α = 1/2 and identifying the torus with [0, 2π)/ ∼:= T, we have a simple explicit formula for the inverse Laplacian of a point mass (see §3.4 for the derivation): 1 L−1/2(δ ) = − ln |2 sin((x − x )/2)|. xk π k 7
Note that 2 sin((x − xk)/2) is precisely the Euclidean distance between the points 2πix 2πix at angles xi and x on the unit circle (i.e., |e − e k |). Thus, the algorithm is simply maximizing the product of distances between points on the circle, by setting
k 1 X xk+1 = arg min − ln |2 sin((x − xj)/2)| . x∈ π T j=1 The arising sequence appears to behave on par with provably optimally regular sequences, and Steinerberger recently proved strong results on the regularity of such a sequence in [36], using techniques which are specific to this setting and unlikely to generalize to other graphs. (We explore this example in more detail in §3.4.) Nonetheless, these remarkable results on the torus and cycle graph give us hope that the algorithm may work comparably well on graphs more generally.
2.3. A Theoretical Guarantee. We prove a theoretical guarantee, for any finite graph G, that these sequences do exhibit at least a certain degree of regularity. Theorem 1. Let G be a simple connected graph and let
k 1 X µ = δ , k k xj j=1 where vertices xj are selected as above. Then, for all 1 ≤ k ≤ n,
n 2 X |hµk, φii| −2α 2 −1 ≤ max L δxj 2 k . λ2α j≤k ` i=2 i Remark 1. Observe for the sake of comparison that n X 2 2 2 1 1 |hµ , φ i| = kµ k 2 − |hµ , φ i| = − . k i k ` k 1 k n i=2 Thus, it is natural that the bound in the Theorem should be ∼ k−1. However, 2α λi (and thus λi ) may be arbitrarily close to 0, scaling up the terms in the sum substantially. The only way to prevent this is for µk to be almost orthogonal to low-frequency eigenfunctions (which is cf. Erd˝os-Tur´an[12, 13] a natural way of defining regularity, as it means that µk is concentrated at high frequencies). Remark 2. Note that −2α 2 −2α 2 max L δxj 2 ≤ max L (δx) 2 . j≤k ` x∈V ` The term on the right side is an interesting quantity in itself, and there may be good bounds for it in terms of the geometry of the graph. As can be seen from the expansion into eigenfunctions, this quantity measures, implicitly, how much low- frequency eigenfunctions concentrate in a particular vertex. In vertex-transitive graphs like cycle graphs and torus grid graphs we see that the quantity is actually independent of the vertex x.
3. Spectral Bounds on Transport Distances 3.1. Motivation. The motivation behind the algorithm is two-fold: 8
(1) The greedy algorithm tries to minimize a Sobolev norm
k 1 X δxj . k j=1 H˙ −1 (2) A recent result of R. Peyre [28] shows that, in the continuous setting,
W2(µ, dx) . kµkH˙ −1 . The purpose of this section is to establish a connection between problems of optimal transport and spectral properties of the Laplacian. This is known to hold in the continuous case, we recall the following bound:
Theorem (Carroll, Massaneda, Ortega-Cerda [7]). Let (M, g) be a compact Rie- mannian manifold with normalized volume measure dx and ∂M = ∅. If −∆gφ = λφ on M, then, for some constant C > 0 depending only on (M, g),