2018 IEEE International Symposium on Information Theory (ISIT)

Comparing Massive Networks via Moment Matrices

Hayoung Choi, Yifei Shen, and Yuanming Shi
School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
Email: {hchoi,shenyf,shiym}@shanghaitech.edu.cn

Abstract—In this paper, a novel similarity measure for comparing massive complex networks based on moment matrices is proposed. We consider the adjacency matrix of a graph as a real random variable of an algebraic probability space with a state. It is shown that the spectral distribution of the matrix can be expressed as a unique discrete probability measure. Then we use the geodesic distance between positive definite moment matrices for comparing massive networks. It is proved that this distance is graph invariant and sub-structure invariant. Numerical simulations demonstrate that the proposed method outperforms the state-of-the-art method in collaboration network classification, and its computational cost is extremely low.

I. INTRODUCTION

Networks are one of the most common representations of complex data and play an indispensable role in diverse research areas. Over the past several decades, enormous breakthroughs have been made, while many fundamental problems about networks remain to be solved. Comparing networks is one of the most important such problems, with a very long history [1]. In practice, similarity measures for networks are widely applied in social science, biology, and chemistry. For instance, they can be used to classify ego networks [2], distinguish between neurological disorders [3], and discover molecules with similar properties [4]. In order to measure similarity between networks effectively, several definitions of distance and similarity have been proposed. Graph edit distances are the minimum cost of transforming one network into another by the distortion of nodes and edges [5]. These definitions only pay attention to the similarities of the nodes and edges but lack information about the topological structure of the networks. To address this limitation, frequent subgraph mining algorithms [6], graph kernels [7], and methods based on moments [8] have been proposed. However, these methods are not scalable to massive networks containing millions of edges, which are common in today's applications. As a result, effective and scalable methods for massive network comparison are urgently needed.

In this paper, we propose a novel similarity measure for comparing massive complex networks. We consider the adjacency matrix of the network as a real random variable of an algebraic probability space with the proposed state. We show that the spectral distribution of the matrix can be expressed as a unique discrete probability measure. Then we propose an efficient and scalable method to measure the similarity between massive networks based on the spectral distribution of the corresponding adjacency matrix in the given state. Specifically, we first compute the corresponding positive definite moment matrix whose entries consist of the first few moments of the spectral distribution. Our proposed distance between networks is obtained as the geodesic distance between the moment matrices (called GDMM). We show that this distance is graph invariant and sub-structure invariant. GDMM is scalable to extremely massive networks and highly parallelizable. Numerical simulations demonstrate that GDMM not only has better performance than the competing methods, but also outperforms the state-of-the-art method in collaboration network classification.

This work was partly supported by the National Nature Science Foundation of China under Grant No. 61601290, and the Shanghai Sailing Program under Grant No. 16YF1407700.

II. BACKGROUND AND PRELIMINARIES

Let M(n, C) (resp. M(n, R)) be the set of n × n complex (resp. real) matrices. Denote by N the set of nonnegative integers. In general, we follow the notation and definitions in [9].

A. Graph

Let V be the set of vertices, and let {x, y} denote the edge connecting two points x, y ∈ V. An undirected graph is a pair G = (V, E), where the set V of vertices is finite and the set E of edges is a subset of the set {{x, y} : x, y ∈ V}. We say that two vertices x, y ∈ V are adjacent if {x, y} ∈ E, denoted by x ∼ y. The degree of a vertex x ∈ V is defined by deg(x) = |{y ∈ V : y ∼ x}|. In this paper we consider finite undirected graphs. Two graphs G = (V, E) and G′ = (V′, E′) are isomorphic if there is a bijection f : V → V′ such that any two vertices u, v ∈ V are adjacent in G if and only if f(u) and f(v) are adjacent in G′. For m ∈ N, a finite sequence of vertices x0, x1, …, xm ∈ V is called a walk of length m if x0 ∼ x1 ∼ ⋯ ∼ xm, where some of x0, x1, …, xm may coincide. A graph G = (V, E) is connected if every pair of distinct vertices x, y ∈ V (x ≠ y) is connected by a walk. If there is a walk connecting two distinct vertices x, y ∈ V, the graph distance between x and y, denoted by ∂(x, y), is the minimum length of a walk connecting x and y. If there is no such walk, we define ∂(x, y) = ∞. For x = y we define ∂(x, x) = 0. For graphs Gi = (Vi, Ei), i = 1, 2, with V1 ∩ V2 = ∅, the direct sum of G1 and G2 is defined as G = (V1 ∪ V2, E1 ∪ E2), denoted by G = G1 ⊔ G2. From now on, without loss of generality we assume that V = {1, 2, …, n}. Any graph G = (V, E) is represented by its adjacency matrix A ∈ {0, 1}^{n×n}, where A_{i,j} = 1 if and only if {i, j} ∈ E. Every permutation
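The adjacency-matrix conventions above can be illustrated in a few lines of numpy. This is a minimal sketch, not code from the paper; the graph (the 4-cycle C4) and the permutation are arbitrary choices for illustration:

```python
import numpy as np

# Adjacency matrix of the 4-cycle C4 on vertices {0, 1, 2, 3}.
A = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[u, v] = A[v, u] = 1          # undirected edge {u, v}

deg = A.sum(axis=1)                # deg(x) = |{y : y ~ x}|, row sums of A

# Relabeling the vertices by a permutation matrix P yields P A P^T,
# the adjacency matrix of an isomorphic graph.
perm = [2, 0, 3, 1]                # an arbitrary relabeling
P = np.eye(4, dtype=int)[perm]
B = P @ A @ P.T
```

The relabeled matrix B represents the same graph structure: it is again symmetric, has the same number of edges, and shares the degree sequence and spectrum of A.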


π : {1, 2, …, n} → {1, 2, …, n} is associated with a corresponding permutation matrix P. Given an adjacency matrix A, the graphs corresponding to the adjacency matrices A and PAP^⊤ are isomorphic, i.e., they represent the same graph structure. A property of a graph is called graph invariant if the property does not change under a reordering of the vertices. Note that the adjacency matrix of a graph includes the full information about the graph. For x, y ∈ V and m ∈ N let W_m(x, y) denote the number of m-step walks connecting x and y. Remark that W_0(x, y) = 0 if x ≠ y and W_0(x, y) = 1 if x = y. It is noted that (A^m)_{ij} = W_m(i, j) for all i, j ∈ V and m ∈ N.

Let A(G) be the unital algebra generated by A (the algebra generated by A and the identity matrix I = A^0), i.e., A(G) = {f(A) : f ∈ C[x]}, where C[x] is the set of all polynomials with complex coefficients. Moreover, the involution is defined by (cA^m)^∗ = c̄A^m for c ∈ C. Then A(G) becomes a unital ∗-algebra. We call A(G) the adjacency algebra of G.

Proposition 1. Let s(G) denote the number of distinct eigenvalues of G. For a connected finite graph G we have

    s(G) = dim A(G) ≥ diam(G) + 1.

B. Quantum Probability

Let A be a unital ∗-algebra over the complex number field C with the multiplication unit 1_A. A function ϕ : A → C is called a state on A if

    (i) ϕ is linear; (ii) ϕ(a^∗a) ≥ 0; (iii) ϕ(1_A) = 1.

The pair (A, ϕ) is called an algebraic probability space. Note that a state ϕ on a unital ∗-algebra A is a ∗-map, i.e., ϕ(a^∗) is the complex conjugate of ϕ(a). Let (A, ϕ) be an algebraic probability space. An element a ∈ A is called an algebraic random variable, or a random variable for short. A random variable a ∈ A is called real if a^∗ = a. For a random variable a ∈ A, a quantity of the form

    ϕ(a^{ε1} a^{ε2} ⋯ a^{εm}),  ε1, ε2, …, εm ∈ {1, ∗},

is called a mixed moment of order m. Statistical properties of an algebraic random variable are determined by its mixed moments. For a real random variable a = a^∗ the mixed moments reduce to the moment sequence ϕ(a^k), k ∈ N, where ϕ(a^k) is called the k-th moment of a. By definition ϕ(a^0) = 1. Alternatively, ϕ(a^k) is denoted by m_k. A moment matrix with degree n is defined as

    M_n := ⎡ m_0    m_1      ⋯  m_n     ⎤
           ⎢ m_1    m_2      ⋯  m_{n+1} ⎥
           ⎢  ⋮      ⋮       ⋱   ⋮      ⎥
           ⎣ m_n    m_{n+1}  ⋯  m_{2n}  ⎦ .

Let B(R) denote the set of all probability measures having finite moments of all orders.

Theorem 2. Let (A, ϕ) be an algebraic probability space. For a real random variable a = a^∗ ∈ A there exists a probability measure µ ∈ B(R) such that

    ϕ(a^k) = ∫_R x^k dµ(x)  for all k ∈ N_0.

Such a µ is called the spectral distribution of a in ϕ [9].

It is noted that M(n, C) with the usual operations is a unital ∗-algebra. The following is a typical example of a state. For A = [A_{ij}] ∈ M(n, C), the normalized trace is defined by

    ϕ_tr(A) = (1/n) tr(A) = (1/n) Σ_{i=1}^n A_{ii}.   (II.1)

Note that the normalized trace is a state on M(n, C), so (M(n, C), ϕ_tr) is an algebraic probability space.

III. MAIN RESULTS

Denote the vector of all ones by e ∈ C^n. Define a function ϕ_e : M(n, C) → C by

    ϕ_e(A) = (1/n)⟨e, Ae⟩   (III.2)

for all A ∈ M(n, C). Then it is clear that ϕ_e is a vector state on M(n, C), so (M(n, C), ϕ_e) is an algebraic probability space.

Let G = (V, E) be a graph and let ϕ be a state given on the adjacency algebra A(G). Since the adjacency matrix A ∈ M(n, C) of G can be regarded as a real random variable of the algebraic probability space (M(n, C), ϕ_e), by Theorem 2 it follows that there exists a spectral distribution µ of A in the state ϕ_e such that

    ϕ_e(A^k) = ∫_R x^k dµ(x)  for all k ∈ N.   (III.3)

Note that

    ϕ_e(A^k) = (1/n)⟨e, A^k e⟩ = E[A^k e],

where E(v) = (1/n) Σ_{i=1}^n v_i is the average of the entries of a vector v. Since (A^k)_{ij} = W_k(i, j), and A^k e is the column vector whose i-th entry equals the number of all walks of length k starting from vertex i, ϕ_e(A^k) is the average over vertices of the number of walks of length k starting from each vertex. Let δ_λ denote the Dirac measure at λ (i.e., δ_λ(S) = 1 if λ ∈ S and δ_λ(S) = 0 if λ ∉ S).

Theorem 3. Let (M(n, C), ϕ_e) be the algebraic probability space. For a real random variable A ∈ M(n, C), there exists a unique discrete probability measure µ = Σ_{i=1}^s ω_i δ_{λ_i} such that

    ϕ_e(A^k) = ∫_R x^k dµ(x)  for all k ∈ N.   (III.4)

Conversely, for a discrete probability measure µ = Σ_{i=1}^s ω_i δ_{λ_i}, there exists a real random variable A ∈ (M(n, C), ϕ_e) with s distinct eigenvalues such that ϕ_e(A^k) = ∫_R x^k dµ(x) for all k ∈ N.
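The moments ϕ_e(A^k) in (III.3) can be computed without ever forming a matrix power, using k matrix-vector products. The following is a minimal numpy sketch; the function name and the example graph are our own choices, not the paper's code:

```python
import numpy as np

def vector_state_moments(A, K):
    """m_k = phi_e(A^k) = (1/n) <e, A^k e> for k = 0, ..., K,
    computed with repeated matrix-vector products only."""
    n = A.shape[0]
    e = np.ones(n)
    v = e.copy()
    m = [1.0]                      # m_0 = phi_e(I) = 1
    for _ in range(K):
        v = A @ v                  # now v = A^k e
        m.append(v @ e / n)
    return m

# Example: C4 u K1, a 4-cycle plus one isolated vertex (n = 5).
A = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[u, v] = A[v, u] = 1
m = vector_state_moments(A, 4)
```

Since (A^k)_{ij} = W_k(i, j), each m_k is the average number of length-k walks per vertex; on this example the cycle is 2-regular and the fifth vertex is isolated, so m_k = 4·2^k/5 (m_1 = 1.6, m_2 = 3.2, …).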


Proof. (⇒) Let A ∈ C^{n×n} be a Hermitian matrix. By the Spectral Theorem, A can be diagonalized by a unitary matrix U; that is, A = UDU^∗. Then the k-th moment of A is

    ϕ_e(A^k) = (1/n) e^∗ A^k e = (1/n) v^∗ D^k v = Σ_{i=1}^s ω_i λ_i^k = ∫_R x^k dµ,

where v = U^∗ e. And it holds that

    Σ_i ω_i = (1/n) v^∗ v = (1/n) e^∗ U U^∗ e = 1.

(⇐) Let µ = Σ_{i=1}^s ω_i δ_{λ_i} with ω_i ≥ 0 and λ_i ∈ R for all i, and Σ_i ω_i = 1. Let D be the n × n diagonal matrix whose diagonal entries are λ_1, λ_2, …, λ_s, λ_s, …, λ_s. Let v = [√ω_1 … √ω_s 0 … 0]^⊤. Since v and (1/√n)e are both unit vectors, there exists a unitary matrix U such that Uv = (1/√n)e. Let A = UDU^∗. Then A satisfies the equality (III.4).

Theorem 4. The k-th moment of A, ϕ_e(A^k), is a graph invariant.

Proof. For any given permutation matrix P we have Pe = e, so it holds that

    ϕ_e((PAP^∗)^k) = (1/n) e^∗(P A^k P^∗)e = ϕ_e(A^k).

Hence, we will henceforth denote M_n by M_n(G) if a graph G is given. Remark that the probability measure includes information not only about the eigenvalues of A, but also about the corresponding eigenvectors. To measure the similarity between two large-scale networks, by Theorem 3 it is enough to compare the corresponding probability measures. There are various distances and divergences between two distributions, such as the Kullback–Leibler divergence, the Bhattacharyya distance, etc. (see [10], [11]). However, since a real-world large-scale network has a rich spectrum, reconstructing the spectral distribution is impossible in practice. Instead, we can use moments of the distributions. Mathematically, all moments up to infinity are required to obtain a perfect reconstruction. However, the first few moments are sufficient if the class of functions in which the reconstruction is sought is restricted appropriately. It has been mentioned in the literature that most of the information about the measure is contained in the first few moments, with the higher-order ones providing only little additional information [12]–[14]. Since M_n(G) has sufficient information about the distribution, a distance between moment matrices can be calculated to measure the similarity between two distributions.

Let c = [c_0, c_1, …, c_n]^⊤ be a vector in C^{n+1}. Then

    c^∗ M_n c = Σ_{i,j=0}^n (1/n)(e^∗ A^{i+j} e) c_i^∗ c_j = (1/n) ‖ Σ_{i=0}^n c_i A^i e ‖^2 ≥ 0.

Thus the moment matrix defined in Section II-B is positive semi-definite for all n ∈ N. However, the corresponding moment matrix M_n can possibly be a singular positive semi-definite matrix, which is not a positive definite matrix. We assume that the diameter of a large-scale network is always greater than 5. By Proposition 1, it then follows that s(G) ≥ 7; equivalently, I, A, …, A^6 are linearly independent. Thus M_n is positive definite for n = 1, …, 6, a point on the Riemannian manifold of positive definite matrices (see [15, Theorem 1.1]).

Denote the set of positive definite matrices by ℙ. There are various distances between two positive definite matrices ([16], [17]). The Frobenius norm ‖·‖_2 gives rise to the affine-invariant metric on ℙ given by δ(A, B) = ‖log(A^{−1/2} B A^{−1/2})‖_2 for any A, B ∈ ℙ. Then ℙ is a Cartan–Hadamard manifold, a simply connected complete Riemannian manifold with non-positive sectional curvature. The geodesic curve has the parametrization γ(t) = A^{1/2}(A^{−1/2} B A^{−1/2})^t A^{1/2}, 0 ≤ t ≤ 1, which is the unique geodesic from A to B (see [18]).

For two graphs G and G̃, we propose a new distance between G and G̃ as the geodesic distance between the corresponding moment matrices, i.e.,

    d_n(G, G̃) := δ(M_n(G), M_n(G̃)),  n ∈ N.

Assume n ∈ N is fixed such that the corresponding moment matrices are positive definite. We then write d(G, G̃) instead of d_n(G, G̃).

Theorem 5. For graphs G, G̃, Ĝ,
(i) (Nonnegativity) d(G, G̃) ≥ 0,
(ii) (Identification) d(G, G̃) = 0 if G = G̃,
(iii) (Symmetry) d(G, G̃) = d(G̃, G),
(iv) (Triangle Inequality) d(G, Ĝ) ≤ d(G, G̃) + d(G̃, Ĝ).

Cospectral graphs, also called isospectral graphs, are graphs that share the same graph spectrum. The smallest pair of cospectral graphs is the graph union C4 ∪ K1 and the star graph S5, illustrated in Figure 1. Both have the same graph spectrum, −2, 0, 0, 0, 2. If the adjacency matrices A and Ã are considered as real algebraic random variables in (M(5, C), ϕ_tr), then the two algebraic random variables A and Ã are moment equivalent, since ϕ_tr(A^k) = (1/n) tr(A^k) = (1/n) tr(Ã^k) = ϕ_tr(Ã^k). However, if A and Ã are considered as real algebraic random variables in (M(5, C), ϕ_e), then A and Ã are not moment equivalent. So, using the state ϕ_e allows us to distinguish the two graphs. Indeed, the moment matrices

    M_1(C4 ∪ K1) = ⎡ 1    1.6 ⎤     M_1(S5) = ⎡ 1    1.6 ⎤
                   ⎣ 1.6  3.2 ⎦ ,             ⎣ 1.6  4   ⎦

are different.

Fig. 1: Cospectral graphs: C4 ∪ K1 and S5

Fig. 2 is introduced in [19]. The first three graphs have the same number of vertices and edges. Table I shows distances between the graphs based on the Hamming distance, the graph edit distance, and our proposed distance. As mentioned in [19], a good measure should return a higher distance value between
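The affine-invariant distance δ(A, B) = ‖log(A^{−1/2}BA^{−1/2})‖_F can be evaluated through the generalized eigenvalues of the pencil (B, A), since A^{−1/2}BA^{−1/2} and A^{−1}B share their spectrum. The following is a sketch assuming scipy is available; `geodesic_distance` is our own helper name, applied here to the moment matrices of the cospectral pair above:

```python
import numpy as np
from scipy.linalg import eigvalsh

def geodesic_distance(A, B):
    """delta(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F for positive
    definite A, B, via the generalized eigenproblem B v = lam A v."""
    lam = eigvalsh(B, A)                 # eigenvalues of A^{-1} B
    return np.sqrt(np.sum(np.log(lam) ** 2))

# Moment matrices of the cospectral pair C4 u K1 and S5 from the text.
M_c4k1 = np.array([[1.0, 1.6], [1.6, 3.2]])
M_s5   = np.array([[1.0, 1.6], [1.6, 4.0]])
d1 = geodesic_distance(M_c4k1, M_s5)     # strictly positive
```

Here the generalized eigenvalues are 1 and 2.25, so d1 = log 2.25 ≈ 0.81: the state ϕ_e separates this cospectral pair even though ϕ_tr cannot.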


G1 and G2 than between G1 and G3. The Hamming distance and the graph edit distance do not capture relevant topological differences. However, our proposed measure performs a highly precise comparison. Remark that M_1(G3) is a singular positive semi-definite matrix, so the geodesic distance between M_1(G3) and any positive definite matrix is not finite. Alternatively, the Bures–Wasserstein distance between positive semi-definite matrices allows one to overcome this situation [20].

Fig. 2: The first three different networks with the same number of vertices and edges, as shown in [19].

TABLE I: (i) H: Hamming distance; (ii) GED: graph edit distance; (iii) our proposed method.

    Dissimilarity | H  | GED | Proposed Measure
    d(G1, G2)     | 12 | 6   | 2.7333
    d(G1, G3)     | 12 | 6   | 1.2103
    d(G2, G3)     | 12 | 6   | 1.6815

Theorem 6. Let G1, G2, …, GK be given mutually disjoint graphs. Then

    M_n(G1 ⊔ … ⊔ GK) = Σ_{j=1}^K (c_j / c) M_n(G_j),

where c_j is the number of vertices in G_j and c = Σ_{j=1}^K c_j.

In particular, M_n(G ⊔ … ⊔ G) = M_n(G) for all n ∈ N. If a graph consists of identical subgraphs, then the moment matrix of the graph is equal to that of its subgraph. In other words, a moment matrix of a graph preserves information regardless of repetition of structure; we call this property sub-structure invariance.

IV. COMPLEXITY AND PARALLELISM

Our proposed method, GDMM, has two steps. Consider the moment matrix with degree n, whose size is (n+1) × (n+1). The first step is to obtain the moment matrix M_n, whose entries consist of the moment sequence {m_k}_{k=0}^{2n}. In the second step, we use the geodesic distance between positive definite matrices to compute the distance between the two moment matrices. In the following, we show the time complexity, space complexity, and parallelism of each step and of the overall algorithm.

A. Complexity

We consider comparing two graphs G1 and G2. Let |V1|, |E1| and |V2|, |E2| denote the number of nodes and edges of graphs G1 and G2, respectively. Let |E| = max(|E1|, |E2|) and |V| = max(|V1|, |V2|). The first step of the algorithm can be computed in O(n|E|) time and O(|E|) space using sparse matrix-vector multiplication. The second step mainly involves eigenvalue decomposition, which can be computed in O(n³) time and O(n²) space. The time complexity of the total algorithm is O(n|E| + n³) and the space complexity is O(|E| + n²). However, n is relatively small, say 4 or 5, in practical problems, because most of the information about a distribution is contained in the first few moments [12]–[14]. Thus the time and space complexities of GDMM are O(|E|).

B. Parallelism

As discussed before, the first step is sparse matrix-vector multiplication. This operation can be completely parallelized on a CPU or GPU. As n is small, the second step takes much less time than the first step. As a result, our algorithm can be parallelized efficiently.

V. EXPERIMENTS

A. Classifying Networks

We apply our method to classify networks. We follow the experimental setting of [2]. Specifically, we classify a researcher's area using the graph structure of his or her collaboration network. Because researchers in one area are usually more tightly connected with researchers in the same area than with those in other areas, it is possible to determine to which area a researcher belongs by considering the collaboration network. This information can be used for recommendations, such as job recommendations and citation recommendations. Three datasets from [21] are used: a high energy physics collaboration network (HEP), a condensed matter collaboration network (CM), and an astrophysics collaboration network (ASTRO). In these networks, an undirected edge between u and v means that the authors u and v have co-authored a paper. We use the method from [2] to generate subgraphs and obtain 415 subgraphs for CM and 1000 subgraphs each for HEP and ASTRO. Then we label each subgraph according to the dataset to which it belongs. The tasks are classification between each pair of datasets and among all three datasets. For each task, we first split the dataset into 10 folds of the same size. We then combine 9 of the folds as the training set, with the remaining fold as the test set. We repeat this ten times to compute the average accuracy. In the classification tasks, we use a k-nearest-neighbor (KNN) classifier. We vary the size n of the moment matrix from 2 to 7 and k in KNN from 1 to 10 and choose the best setting. The first three benchmark algorithms are Covariance [2], NCLM [8], and Top-10 eigenvalues (EIGS-10). Specifically, Covariance computes the covariance matrix of the vectors [A^i e / |A^i e|]_{i=1}^5 and uses the Bhattacharyya similarity between two covariance matrices as the distance between the corresponding networks. NCLM first computes the log-moment sequence vector [log(tr(A^i)/n^i)]_{i=2}^7 and uses the Euclidean distance between two moment vectors as the distance between the corresponding networks. EIGS-10 takes the largest 10 eigenvalues as a vector and uses the Euclidean distance between two vectors as the distance between the corresponding networks. In addition, we add the state-of-the-art method in collaboration network classification, Covariance with SVM [2], which employs an SVM as the classifier, as the
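The two GDMM steps described in Section IV can be sketched with scipy sparse matrix-vector products. This is a sketch under the paper's description, not its code: the helper names are ours, and `deg` must be chosen small enough that both moment matrices are positive definite (here deg = 1 suffices for the non-regular example graphs):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigvalsh

def moment_matrix(A, deg):
    """Step 1: the (deg+1) x (deg+1) Hankel moment matrix of a graph,
    built from 2*deg sparse matrix-vector products: O(deg * |E|) time."""
    n = A.shape[0]
    e = np.ones(n)
    v = e.copy()
    m = [1.0]
    for _ in range(2 * deg):
        v = A @ v                          # v = A^k e
        m.append(v @ e / n)                # m_k = phi_e(A^k)
    return np.array([[m[i + j] for j in range(deg + 1)]
                     for i in range(deg + 1)])

def gdmm(A1, A2, deg=3):
    """Step 2: geodesic distance between the two moment matrices
    (assumed positive definite), via generalized eigenvalues."""
    lam = eigvalsh(moment_matrix(A2, deg), moment_matrix(A1, deg))
    return np.sqrt(np.sum(np.log(lam) ** 2))

def adj(n, edges):
    """Sparse symmetric adjacency matrix from an edge list."""
    u, v = zip(*edges)
    A = sp.coo_matrix((np.ones(len(edges)), (u, v)), shape=(n, n))
    return (A + A.T).tocsr()

A_c4k1 = adj(5, [(0, 1), (1, 2), (2, 3), (3, 0)])   # C4 u K1
A_s5   = adj(5, [(0, 1), (0, 2), (0, 3), (0, 4)])   # star S5
```

With deg = 1, `gdmm(A_c4k1, A_s5, deg=1)` reproduces the M_1 example of Section III, and the first step touches each edge only a constant number of times per moment, matching the O(n|E|) claim.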

659 2018 IEEE International Symposium on Information Theory (ISIT)

last benchmark algorithm. The performance of our method and the benchmark algorithms is shown in Table II.

TABLE II: Accuracy for GDMM and other benchmark methods in collaboration network classification. Best results marked in bold.

    Method              | HEP vs CM | HEP vs ASTRO | ASTRO vs CM | Full
    GDMM                | 0.991     | 0.913        | 0.904       | 0.905
    EIGS-10             | 0.981     | 0.879        | 0.861       | 0.820
    NCLM                | 0.982     | 0.850        | 0.865       | 0.804
    Covariance          | 0.976     | 0.857        | 0.861       | 0.819
    Covariance with SVM | 0.987     | 0.889        | 0.887       | 0.849

From the table, we see that with the KNN classifier, Covariance, EIGS-10, and NCLM have similar performance in each task. We also notice that Covariance with SVM performs better than Covariance with KNN, which shows that the SVM classifier is more suitable for the Covariance method. On top of that, our proposed method, GDMM, not only outperforms the various benchmarks with the KNN classifier, but also performs better than Covariance with SVM, the state-of-the-art method in collaboration classification, in every classification task. This demonstrates the effectiveness of GDMM and also shows that a few moments can provide enough information for collaboration classification. Besides, GDMM has a significant improvement over the state-of-the-art method in the three collaboration network classification tasks, which shows that GDMM is suitable for classification tasks on sophisticated networks.

B. Time Comparison

In this section, we show the efficiency of our algorithm by comparing the running time of GDMM and other methods via a set of experiments. Specifically, in each experiment, we generate 100 Erdős–Rényi random graphs [22] with the same number of nodes and edges. Then we employ GDMM and the other methods to obtain pairwise distances among all possible pairs. For each method, we run 10 times and take the average running time. The number of nodes, the number of edges, and the time consumed by the different methods are shown in Table III. Here, we use a 4×4 moment matrix in GDMM, a 4×4 covariance matrix in Covariance, and 6 moments in NCLM. All of these experiments are done in MATLAB on a server with an Intel Xeon 2.80 GHz CPU and 64 GB RAM.

TABLE III: Running time for computing pairwise distances among 100 random networks (in seconds). Fastest method marked in bold.

    |V|   | |E|      | GDMM | Covariance | EIGS-10 | NCLM
    2000  | 2000000  | 7.31 | 7.32       | 18.92   | 39.92
    5000  | 1000000  | 1.38 | 1.48       | 85.88   | 533
    10000 | 2000000  | 3.9  | 4.7        | 353.5   | 27340
    50000 | 15000000 | 50   | 68         | 11687   | N/A

As shown in the table, the time cost of GDMM is lower than that of all the competing methods. For example, it can compute the pairwise distances of 100 random networks with 50000 nodes and 15000000 edges in 50 seconds, a 1.36× speedup over the Covariance method and a 233× speedup over EIGS-10. Besides, from the table, GDMM is almost linear in the number of edges. This demonstrates that GDMM is scalable to massive networks.

VI. CONCLUSION

We considered the adjacency matrix of a network as a random variable and proposed a new network similarity measure based on the geodesic distance between the corresponding positive definite moment matrices. Our proposed method demonstrated state-of-the-art results in collaboration network classification and turned out to be scalable to massive networks.

REFERENCES

[1] L. Zager, Graph similarity and matching. PhD thesis, Massachusetts Institute of Technology, 2005.
[2] A. Shrivastava and P. Li, "A new space for comparing graphs," in Proc. IEEE Int. Conf. Social Networks Anal. Mining (ASONAM), pp. 62–71, Aug. 2014.
[3] A. Calderone, M. Formenti, F. Aprea, M. Papa, L. Alberghina, A. M. Colangelo, and P. Bertolazzi, "Comparing Alzheimer's and Parkinson's diseases networks using graph communities structure," BMC Syst. Biol., vol. 10, pp. 25–34, Mar. 2016.
[4] J. K. Morrow, L. Tian, and S. Zhang, "Molecular networks in drug discovery," Crit. Rev. Biomed. Eng., vol. 38, Nov. 2010.
[5] X. Gao, B. Xiao, D. Tao, and X. Li, "A survey of graph edit distance," IEEE Pattern Anal. Applicat., vol. 13, pp. 113–129, Feb. 2010.
[6] J. Ugander, L. Backstrom, and J. Kleinberg, "Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections," in Proc. Int. Conf. World Wide Web, pp. 1307–1318, May 2013.
[7] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt, "Efficient graphlet kernels for large graph comparison," in Proc. Int. Conf. Artificial Intell. and Stat. (AISTATS), pp. 488–495, Apr. 2009.
[8] S. S. Mukherjee, P. Sarkar, and L. Lin, "On clustering network-valued data," in Adv. Neural Inf. Process. Syst. (NIPS), pp. 7074–7084, Dec. 2017.
[9] N. Obata, Spectral Analysis of Growing Graphs: A Quantum Probability Point of View. Springer, 2017.
[10] J. Chung, P. Kannappan, C. Ng, and P. Sahoo, "Measures of distance between probability distributions," J. Math. Anal. Appl., vol. 138, pp. 280–292, Feb. 1989.
[11] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, Mar. 1951.
[12] D. Fasino and G. Inglese, "Recovering a probability density from a finite number of moments and local a priori information," Rendiconti dell'Istituto di Matematica dell'Università di Trieste, 1996.
[13] J. B. French, "Elementary principles of spectral distributions," in Theory and Applications of Moment Methods in Many Fermion Systems, pp. 1–16, Plenum Press, 1980.
[14] P. Gavriliadis and G. Athanassoulis, "Moment data can be analytically completed," Probabilistic Eng. Mech., vol. 18, pp. 329–338, Oct. 2003.
[15] C. Berg and R. Szwarc, "A determinant characterization of moment sequences with finitely many mass points," Linear Multilinear Algebra, vol. 63, pp. 1568–1576, Sep. 2015.
[16] R. Bhatia and J. Holbrook, "Riemannian geometry and matrix geometric means," Linear Algebra Appl., vol. 413, pp. 594–618, Mar. 2006.
[17] S. Sra, "Positive definite matrices and the S-divergence," Proc. Amer. Math. Soc., vol. 144, pp. 2787–2797, Oct. 2016.
[18] R. Bhatia, Positive Definite Matrices. Princeton University Press, 2007.
[19] T. A. Schieber, L. Carpi, A. Díaz-Guilera, P. M. Pardalos, C. Masoller, and M. G. Ravetti, "Quantification of network structural dissimilarities," Nature Commun., vol. 8, pp. 13928–13937, Jan. 2017.
[20] R. Bhatia, T. Jain, and Y. Lim, "On the Bures-Wasserstein distance between positive definite matrices," arXiv preprint arXiv:1712.01504, 2017.
[21] R. A. Rossi and N. K. Ahmed, "The network data repository with interactive graph analytics and visualization," in Proc. AAAI Conf. Artificial Intell., 2015.
[22] P. Erdős and A. Rényi, "On random graphs I," Publ. Math. Debrecen, vol. 6, pp. 290–297, 1959.
