Assortative-Constrained Stochastic Block Models

Daniel Gribel, Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil, 22451-900. Email: [email protected]
Thibaut Vidal, Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil, 22451-900. Email: [email protected]
Michel Gendreau, Department of Mathematics and Industrial Engineering, Polytechnique Montréal, Montréal, Canada, P.O. Box 6079. Email: [email protected]

arXiv:2004.11890v1 [cs.SI] 21 Apr 2020

Abstract—Stochastic block models (SBMs) are often used to find assortative community structures in networks, such that the probability of connections within communities is higher than between communities. However, classic SBMs are not limited to assortative structures. In this study, we discuss the implications of this model-inherent indifference towards assortativity or disassortativity, and show that this characteristic can lead to undesirable outcomes for networks which are presupposedly assortative but which contain a reduced amount of information. To circumvent this issue, we introduce a constrained SBM that imposes strong assortativity constraints, along with efficient algorithmic approaches to solve it. These constraints significantly boost community recovery capabilities in regimes that are close to the information-theoretic threshold. They also permit the identification of structurally-different communities in networks representing cerebral-cortex activity regions.

I. INTRODUCTION

Community detection methods hold a central place in the analysis of complex networks, with an extensive range of applications related to sociological behavior, protein interactions, image segmentation, and gene expression analysis [1]. In most of these applications, the actual classes of the nodes in the network are unknown, but pairwise relations between nodes are exploited to identify communities.

Fitting the parameters of a stochastic block model (SBM) [2, 3] to a given graph is a prominent way of searching for communities. The canonical SBM assumes that each node belongs to one block (representing a community) and that the expected number of edges between two nodes depends only on the blocks to which they belong. Thus, the model only assumes that nodes within each block are statistically equivalent in their connectivity patterns. Several variations of the standard SBM were also introduced to overcome some of its limitations. The degree-corrected SBM (DC-SBM) introduced by Karrer and Newman [4], in particular, allows non-uniform node degree distributions, making block modeling more representative of real-world networks.

Broadly speaking, a solution for community detection (represented as a partition of the node set into communities) is assortative when connections within communities are more frequent than in between communities, it is disassortative when connections within communities are less frequent than in between communities, and it is non-assortative if no such relation exists among all communities. SBM-based community detection approaches are agnostic to the assortativity of their solutions. They allow searching for solutions with a pre-defined number of communities, and can indifferently model assortative and disassortative structures. This modeling capability can be viewed as an asset but also as a weakness. Indeed, SBMs are often used in contexts in which users expect assortative solutions. In the most dramatic situations, non-assortative solutions might go under the radar and lead to mistakes of interpretation. In other cases, non-assortative solutions with a better likelihood may substitute the assortative solutions which were originally sought (Figure 1). This latter situation is especially prevalent in case studies involving sparse graphs, or with lightly assortative structures which challenge detection algorithms.

Fig. 1. The three best solutions in a small example case with eight nodes and two communities. The two best solutions in terms of maximum likelihood, (a) the optimal solution with log L = −2.7616 and (b) the 2nd best solution with log L = −5.3193, are disassortative, whereas the third (c), with log L = −5.9012, is assortative.

In this work, we propose a variant of the DC-SBM which includes user knowledge about assortativity. We incorporate this information by setting assortativity constraints on the DC-SBM parameter set. Indeed, if the user expects an assortative solution due to the characteristics of the application case, it is plausible to guide the convergence of the model via additional constraints. We show that the resulting constrained likelihood-maximization model can be solved efficiently with an iterative method based on local-search and interior-point algorithms. Our computational experiments show that the assortativity constraints prevent the search from converging towards spurious non-assortative local minima, especially for sparse networks, and that they contribute to identifying different solution structures in application cases related to the analysis of the brain cortex. The key contributions of this work are therefore the following.

1) We introduce a DC-SBM variant which incorporates assortativity constraints to represent prior user knowledge;
2) We propose an efficient solution approach based on local optimization and interior-point algorithms for this model;
3) Through extensive computational experiments, we discuss the practical implications of this constrained model and identify the regimes in which it contributes to improve community detection practice.

II. RELATED WORKS

SBMs are commonly used to extract meaningful information from complex networks. The classical SBM is also a natural modeling choice for community detection [1] and a generalization of modularity maximization [5]. The surveys of Abbe [1] and Lee and Wilkinson [6] discuss key results regarding recovery requirements and solution algorithms. Different types of algorithms can be used to fit SBMs, based on Markov Chain Monte Carlo (MCMC) approaches [3, 7, 8], variational inference [9, 10], belief propagation [11], spectral clustering [12–14], and semidefinite programming [15, 16], among others.

To date, few works have considered the possibility of incorporating prior information on assortativity. Moore et al. [17] studied an SBM in which the edge probabilities within and in between communities follow a Beta prior. The hyper-parameters defining the Beta distributions drive the degree of assortativity in the graph. Yet, according to their experiments, these priors dominate only in small or sparse data sets, otherwise they tend to wash out.

The Assortative Mixed Membership SBM (a-MMSB) introduced by Gopalan et al. [18] considers soft node-to-community assignments and includes a latent parameter describing community strength, representing how tightly nodes are connected within each group. Edges are assumed to be drawn from a Bernoulli distribution centered around the community strength if the nodes belong to the same group. Otherwise, the distribution is centered around a small value. A variational inference approach is used to fit the model. Li et al. [19] pursued this research line by proposing a scalable MCMC method using a stochastic gradient algorithm for posterior inference in the a-MMSB.

Lu and Szymanski [20] finally proposed a regularized variant of the DC-SBM, using a prior to regularize the observed in-degree ratio of each node. In practice, this adaptation turns out to penalize high-degree nodes with many connections to other communities. The new parameter is adjusted to control the assortativity level, and an MCMC algorithm is used to infer the block assignments.

The aforementioned models aim to better fit assortative networks, but they are either dependent on ad-hoc parameters which are difficult to scale [18, 20], or of limited effect for larger graphs [17]. In light of these works, we decided to explore a different approach, which consists in guiding the search towards assortative structures via constraints on the SBM parameters. To fit our model, we propose effective algorithms for the resulting constrained maximum-likelihood optimization problem.

III. BACKGROUND AND NOTATIONS

A. Degree-Corrected Stochastic Block Model (DC-SBM)

In its most fundamental form, the DC-SBM considers N nodes allocated to K groups. We assume that the number of edges between a pair of nodes (i, j) depends only on the groups to which the nodes belong and on their degrees [5]. Finding the latent membership of the nodes corresponds to finding the block-model parameters that best fit the observed graph [1]. For an observed adjacency matrix A ∈ \mathbb{N}^{N,N} representing a graph with m (possibly weighted) edges, the log-likelihood function of the DC-SBM is calculated as [4]:

\log P(A \mid \Omega, Z) = \frac{1}{2} \sum_{rs}^{K} \sum_{ij}^{N} \left[ A_{ij} \log(\omega_{rs}) - \frac{k_i k_j}{2m}\,\omega_{rs} \right] z_{ir} z_{js}.   (1)

In this equation, k_i is the degree of node i, and m is the total number of edges. The variables Z ∈ {0, 1}^{N,K} represent the binary community assignments, in such a way that z_{ir} = 1 indicates that node i is assigned to group r. Ω is a symmetric K × K edge probability matrix. Each element ω_{rs} of Ω corresponds to the expected number of edges between any two points in groups r and s. The expected number of edges between nodes i and j is (k_i k_j / 2m) ω_{rs}, for z_{ir} = 1 and z_{js} = 1.

If we fix the assignment Z, then the (unconstrained) maximum-likelihood estimate of each parameter ω_{rs} can be obtained by differentiation:

\hat{\omega}_{rs} = \frac{2m \cdot m_{rs}}{\kappa_r \kappa_s},   (2)

where m_{rs} = \sum_{ij}^{N} A_{ij} z_{ir} z_{js} is the number of edges between groups r and s, and \kappa_r = \sum_{i}^{N} k_i z_{ir} is the sum of the degrees of the nodes in group r. If we substitute \hat{\omega}_{rs} in Equation (1), we obtain the following log-likelihood function:

\log P(A \mid Z) = \frac{1}{2} \sum_{rs}^{K} \sum_{ij}^{N} A_{ij} \log\!\left( \frac{m_{rs}}{\kappa_r \kappa_s} \right) z_{ir} z_{js},   (3)

in which we dropped the terms that do not involve Z.
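For concreteness, the quantities in Equations (1)–(3) can be computed directly from A and Z. The sketch below is an illustrative reconstruction rather than the authors' implementation; it assumes an undirected graph without self-loops stored as a dense NumPy array, and the function name dc_sbm_profile_loglik is ours.

import numpy as np

def dc_sbm_profile_loglik(A, Z):
    # Profile log-likelihood of the DC-SBM, Eq. (3): Omega is replaced by its
    # unconstrained maximum-likelihood estimate from Eq. (2).
    k = A.sum(axis=1)                                  # node degrees k_i
    m = A.sum() / 2.0                                  # total number of edges
    M = Z.T @ A @ Z                                    # M[r, s] = m_rs
    kappa = Z.T @ k                                    # kappa_r, degree sum of group r
    omega_hat = 2.0 * m * M / np.outer(kappa, kappa)   # Eq. (2)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(M > 0, M * np.log(M / np.outer(kappa, kappa)), 0.0)
    return 0.5 * terms.sum(), omega_hat

# Toy usage: two triangles joined by one edge, with the natural 2-block assignment.
A = np.zeros((6, 6))
A[:3, :3] = 1 - np.eye(3)
A[3:, 3:] = 1 - np.eye(3)
A[2, 3] = A[3, 2] = 1
Z = np.zeros((6, 2)); Z[:3, 0] = 1; Z[3:, 1] = 1
loglik, omega_hat = dc_sbm_profile_loglik(A, Z)
print(loglik, omega_hat)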
B. Planted Partition Model and Modularity

The Planted Partition Model (PPM) is a special case of the standard SBM with only two parameters describing the blocks: ω_{rs} = ω_{IN} if r = s, and ω_{rs} = ω_{OUT} if r ≠ s. Newman [5] shows that maximizing the likelihood of the PPM is equivalent to maximizing modularity. Modularity optimization maximizes the difference between the observed graph and a null model in which edges are reinserted randomly and the degree of each node is preserved. As a consequence, it results in maximizing the number of edges within groups, leading to assortative solutions. However, modularity maximization is also subject to strong limitations: beyond its inability to define the number K of communities, the model assumes that all communities have similar statistical properties [5]. This is a major issue when the distribution of edges between the blocks varies significantly.
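As a point of reference for the equivalence discussed above, the modularity of a partition can be evaluated directly from its usual definition, Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j). The snippet below is a minimal illustration under the same dense-array conventions as the previous sketch; it is not part of the original paper and the function name is ours.

import numpy as np

def modularity(A, labels):
    # Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i == c_j]
    k = A.sum(axis=1)
    two_m = A.sum()
    same_block = labels[:, None] == labels[None, :]   # delta(c_i, c_j)
    B = A - np.outer(k, k) / two_m                    # modularity matrix
    return (B * same_block).sum() / two_m

# Two triangles joined by a single edge: the assortative split scores higher.
A = np.zeros((6, 6))
A[:3, :3] = 1 - np.eye(3)
A[3:, 3:] = 1 - np.eye(3)
A[2, 3] = A[3, 2] = 1
print(modularity(A, np.array([0, 0, 0, 1, 1, 1])))   # approx. 0.36
print(modularity(A, np.array([0, 1, 0, 1, 0, 1])))   # approx. -0.21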
IV. ASSORTATIVE-CONSTRAINED SBM

We now introduce the assortative-constrained degree-corrected SBM (AC-DC-SBM), along with an efficient algorithm to fit it by maximum likelihood. Following Amini et al. [21], two main notions of assortativity can be distinguished for block models:

Strong assortativity. All diagonal terms of Ω are greater than or equal to all off-diagonal terms:

\omega_{qq} \geq \omega_{rs} \quad \forall\, q, r, s \in \{1, \dots, K\},\ r \neq s.   (4)

Weak assortativity. Each diagonal term of Ω is greater than or equal to the other terms in its row:

\omega_{qq} \geq \omega_{qs} \quad \forall\, q, s \in \{1, \dots, K\}.   (5)
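To illustrate the difference between the two notions, consider the following hypothetical matrix for K = 3 (not taken from the paper):

\Omega = \begin{pmatrix} 0.60 & 0.10 & 0.52 \\ 0.10 & 0.50 & 0.20 \\ 0.52 & 0.20 & 0.55 \end{pmatrix}.

Each diagonal entry dominates its own row, so Condition (5) holds; however, ω_{13} = 0.52 > ω_{22} = 0.50, so Condition (4) is violated. This Ω is therefore weakly, but not strongly, assortative.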

Other types of assortativity constraints may be considered with simple adaptations of our algorithm, e.g., imposing a lower bound on the number of blocks satisfying Condition (5). In this study, we use the strongest definition of assortativity, based on Condition (4). With these constraints, the log-likelihood maximization model becomes:

\max_{\Omega, Z, \lambda} \quad \frac{1}{2} \sum_{rs}^{K} \sum_{ij}^{N} \left[ A_{ij} \log(\omega_{rs}) - \frac{k_i k_j}{2m}\,\omega_{rs} \right] z_{ir} z_{js}   (6a)
\text{s.t.} \quad \omega_{qq} \geq \lambda \quad \forall\, q \in \{1, \dots, K\}   (6b)
\qquad\ \ \omega_{rs} \leq \lambda \quad \forall\, r, s \in \{1, \dots, K\},\ r \neq s   (6c)
\qquad\ \ \omega_{rs} \geq 0 \quad \forall\, r, s \in \{1, \dots, K\},   (6d)

where λ represents a continuous variable acting as a threshold. It is important to note that the assortativity constraints only apply to the block-model parameters Ω. This does not completely eliminate the possibility of a disassortative partition as represented by Z, but it strongly penalizes the log-likelihood of such a partition in comparison to assortative solutions.

A. Likelihood Maximization

We introduce an iterative algorithm to solve (6a–6d). This algorithm starts with a random initial solution and proceeds by iteratively evaluating each possible relocation of a node to a different community. Each such relocation is only applied if its application, combined with an optimal update of Ω, results in an improvement of the likelihood. As such, the evaluation of each relocation may require the solution of a small constrained convex optimization subproblem with K² variables and constraints to find an optimal Ω for the new partition. For the classical DC-SBM, the optimal Ω is simply obtained via Equation (2). This is, however, no longer true for the AC-DC-SBM due to the assortativity constraints. As described in Algorithm 1, the overhead associated with this operation can be mitigated by combining two techniques:

(i) an incremental move evaluation approach, using the log-likelihood of the unconstrained subproblem (Lines 9–10) to filter relocation candidates (Line 11), and possibly keeping this solution if it naturally satisfies the assortativity constraints (Lines 12–13);
(ii) an efficient interior point solver for Problem (7a–7d), only used if the relocation candidate was not filtered out due to the previous conditions (Lines 14–19).

Algorithm 1 Likelihood maximization algorithm
1: Input: Adjacency matrix A, number of communities K
2: Output: Network partition Z
3: Initialize a random partition Z
4: Find Ω by solving (7a–7d) for partition Z
5: Evaluate the log-likelihood: L ← log P(A|Ω, Z)
6: repeat
7:   for each i ∈ {1, . . . , N} and r ∈ {1, . . . , K} do
8:     Consider the partition Z^R obtained from Z by relocating node i to community r
9:     Find Ω′ maximizing log P(A|Ω′, Z^R)
10:    Evaluate the log-likelihood: L′ ← log P(A|Ω′, Z^R)
11:    if L′ > L then
12:      if Ω′ satisfies (7b–7d) then
13:        Apply: Ω ← Ω′, Z ← Z^R, L ← L′
14:      else
15:        Find Ω″ by solving (7a–7d) for partition Z^R
16:        L″ ← log P(A|Ω″, Z^R)
17:        if L″ > L then
18:          Apply: Ω ← Ω″, Z ← Z^R, L ← L″
19:        end if
20:      end if
21:    end if
22:  end for
23: until no improving relocation has been identified

We use the interior point algorithm of Domahidi et al. [22] for the solution of each subproblem. When the partition is fixed, the constrained maximization subproblem takes the following form:

\max_{\Omega, \lambda} \quad \frac{1}{2} \sum_{rs}^{K} \left( m_{rs} \log(\omega_{rs}) - T_{rs}\, \omega_{rs} \right)   (7a)
\text{s.t.} \quad \omega_{qq} \geq \lambda \quad \forall\, q \in \{1, \dots, K\}   (7b)
\qquad\ \ \omega_{rs} \leq \lambda \quad \forall\, r, s \in \{1, \dots, K\},\ r \neq s   (7c)
\qquad\ \ \omega_{rs} \geq 0 \quad \forall\, r, s \in \{1, \dots, K\},   (7d)

where m_{rs} represents the number of edges between communities r and s according to the fixed partition, and T_{rs} = \left( \sum_{t} m_{rt} \right) \left( \sum_{t} m_{st} \right) / 2m.
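The subproblem (7a–7d) is a small concave maximization over Ω and λ. The following sketch is an illustrative reimplementation, not the authors' code: it uses SciPy's SLSQP routine as a stand-in for the ECOS interior-point solver [22], and the helper name solve_constrained_subproblem is ours.

import numpy as np
from scipy.optimize import minimize

def solve_constrained_subproblem(A, Z, eps=1e-8):
    # Maximize (7a) subject to (7b)-(7d); decision vector x = [vec(Omega), lambda].
    K = Z.shape[1]
    m = A.sum() / 2.0
    M = Z.T @ A @ Z                            # m_rs for the fixed partition
    kappa = Z.T @ A.sum(axis=1)                # degree sum per group
    T = np.outer(kappa, kappa) / (2.0 * m)     # T_rs = (sum_t m_rt)(sum_t m_st) / 2m
    def neg_objective(x):
        W = np.maximum(x[:-1].reshape(K, K), eps)      # guard the logarithm
        return -0.5 * np.sum(M * np.log(W) - T * W)
    cons = []
    for q in range(K):                         # (7b): omega_qq - lambda >= 0
        cons.append({"type": "ineq", "fun": lambda x, q=q: x[q * K + q] - x[-1]})
    for r in range(K):                         # (7c): lambda - omega_rs >= 0, r != s
        for s in range(K):
            if r != s:
                cons.append({"type": "ineq", "fun": lambda x, r=r, s=s: x[-1] - x[r * K + s]})
    bounds = [(eps, None)] * (K * K) + [(0.0, None)]   # (7d) and lambda >= 0
    x0 = np.concatenate([np.ones(K * K), [1.0]])       # feasible starting point
    res = minimize(neg_objective, x0, method="SLSQP", bounds=bounds, constraints=cons)
    W = res.x[:-1].reshape(K, K)
    return 0.5 * (W + W.T), res.x[-1]                  # symmetrized Omega and lambda

In Algorithm 1, Lines 4 and 15 amount to one such constrained solve for the current candidate partition.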
V. EMPIRICAL STUDIES

We conduct extensive computational experiments on synthetic and real data sets to analyze three aspects of the proposed assortative-constrained DC-SBM (AC-DC-SBM). Firstly, we wish to know under which conditions the assortativity constraints help to converge to desirable partitions. Secondly, we compare the AC-DC-SBM, the standard DC-SBM, and the modularity maximization model in terms of community detection performance. Finally, we apply the AC-DC-SBM to graphs representing brain cortex data, highlighting structures which were not detected before, and discuss the implications of the different models.

A. Networks Generated From a PPM

The standard DC-SBM usually finds assortative solutions for assortative networks with a sufficient amount of information. However, it can be trapped into spurious non-assortative local minima on sparse or lightly assortative networks.

To limit the number of factors, we conduct this first analysis on data sets generated by a simple PPM (Section III-B) with K = 4 blocks, N = 100 nodes, an average degree of 16, and different values of the ratio ω_OUT/ω_IN representing different assortativity levels. Our goal is to evaluate in which regimes the assortativity constraints are meaningful. Figure 2 therefore depicts the performance of the standard DC-SBM and of the proposed AC-DC-SBM in terms of normalized mutual information (NMI) [23]. For each data set and model, we report the results of 100 independent runs from different initial solutions. These results are represented as box plots, where the whiskers extend to 1.5 times the interquartile range.

Fig. 2. Performance of DC-SBM and AC-DC-SBM on networks generated from PPMs with varying degree of assortativity. (Box plots of the NMI of the standard DC-SBM and of the AC-DC-SBM for ω_OUT/ω_IN ∈ {0.1, 0.15, . . . , 0.6}.)

For the data sets of Figure 2, detectability is known to be possible for values of ω_OUT/ω_IN smaller than ≈ 0.4 (see [24]). As expected, as the ratio ω_OUT/ω_IN increases beyond 0.4, both models are unable to recover the communities. In contrast, when this ratio diminishes below 0.4, the performance of both methods improves, highlighting a phase transition towards a regime where partial recovery is possible. As visible in these experiments, the transition of AC-DC-SBM occurs before that of the standard DC-SBM. For example, when ω_OUT/ω_IN = 0.25, AC-DC-SBM achieves an average NMI of 0.55, compared to 0.34 for DC-SBM. Similarly, when ω_OUT/ω_IN = 0.1, AC-DC-SBM achieves near-perfect recovery on a much larger proportion of the runs. As such, it appears that the assortativity constraints are useful to guide likelihood maximization algorithms on challenging data sets located within the phase transition regime.

B. Networks Generated From SBMs

We now repeat the previous experiment on general SBMs, characterized by a larger number of parameters. To compare the results of the DC-SBM and AC-DC-SBM, we generate 50 synthetic data sets with N = 100 nodes and K = 4 blocks. For each data set, the Ω parameters are uniformly sampled in the following intervals:

\omega_{rr} \in [0.45, 0.55] \quad \forall\, r \in \{1, \dots, K\}   (8)
\omega_{rs} \in [0, 0.4] \quad \forall\, r, s \in \{1, \dots, K\},\ r \neq s.   (9)

Each node is allocated to one of the four blocks with equal probability. Then, for each node pair (i, j), a number of edges is generated from a Poisson distribution centered in ω_{rs}, where r and s represent the blocks of i and j.
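The sampling protocol described above can be reproduced along the following lines. This is a hedged sketch of the protocol, not the authors' generator, and the names are ours; with ω_rr = ω_IN and ω_rs = ω_OUT it also covers the PPM instances of Section V-A.

import numpy as np

def sample_sbm_network(N=100, K=4, diag=(0.45, 0.55), offdiag=(0.0, 0.4), seed=0):
    # Draw Omega from the intervals (8)-(9), assign nodes to the K blocks with
    # equal probability, and generate Poisson(omega_rs) edge counts per node pair.
    rng = np.random.default_rng(seed)
    Omega = rng.uniform(*offdiag, size=(K, K))
    Omega = np.triu(Omega, 1); Omega = Omega + Omega.T       # symmetric off-diagonal part
    np.fill_diagonal(Omega, rng.uniform(*diag, size=K))       # within-block rates
    labels = rng.integers(0, K, size=N)                       # equiprobable block assignment
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            A[i, j] = A[j, i] = rng.poisson(Omega[labels[i], labels[j]])
    return A, labels, Omega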

Figure 3 compares the NMI obtained with the standard DC-SBM and the proposed AC-DC-SBM on these networks. For each network and model, we conduct 50 independent runs from different initial solutions and report the results as box plots. The AC-DC-SBM obtains a better or equal median NMI than the DC-SBM on 49 out of the 50 data sets. The DC-SBM appears to be very sensitive to low-quality local minima. This behavior is particularly visible on the first six data sets presented in the figure. A pairwise Wilcoxon test comparing the average NMI of both methods over the 50 data sets confirms the statistical significance of this difference of performance (with p = 3.9 × 10^{-10}).

Fig. 3. Performance of DC-SBM and AC-DC-SBM on networks generated from general SBMs. The results are ordered by median NMI. (Box plots of the NMI per data set; legend: Standard DC-SBM, AC-DC-SBM.)

In a second part of this analysis, we filter the set of solutions produced by the methods to focus on the top 10% in terms of likelihood for each data set. This corresponds to a typical use case in which multiple independent runs are performed to avoid local minima. Figure 4 displays the relative difference between the NMI of the top 10% solutions of the AC-DC-SBM and those of the standard DC-SBM. For the sake of completeness, we repeat the same analysis with the modularity-maximization algorithm. As visible in these results, the best AC-DC-SBM solutions still outperform those of the two other approaches on most data sets. The statistical significance of these observations is confirmed by pairwise Wilcoxon tests (with p = 2.4 × 10^{-5} and p = 5.1 × 10^{-6} for the DC-SBM and modularity maximization, respectively).

Fig. 4. Relative NMI between the AC-DC-SBM and the standard DC-SBM (left) and modularity maximization (right). Analysis based on the top 10% best solutions for each data set. (Left panel: NMI(AC-DC-SBM) − NMI(DC-SBM); right panel: NMI(AC-DC-SBM) − NMI(MOD); relative NMI plotted per data set on a scale from −0.4 to 0.4.)

Figure 5 finally compares the number of assortative communities found by the AC-DC-SBM and the DC-SBM. The standard DC-SBM produces much fewer assortative communities on average (2.43 compared to 3.76). As discussed earlier in this article, the AC-DC-SBM only enforces constraints on the block-model parameters Ω, and therefore does not systematically guarantee assortative partitions. Yet, non-assortative partitions are heavily penalized from a likelihood perspective and therefore generally avoided. Finally, remark that modularity maximization always produces assortative solutions, but its equivalence to the PPM (with only two parameters driving the distribution of the edges) limits its ability to fit more general SBMs. Among these alternatives, the AC-DC-SBM appears to find a trade-off between insufficient and excessive expressiveness.

Fig. 5. Distribution of the number of assortative communities found by AC-DC-SBM and DC-SBM. (Number of solutions per number of assortative groups, from 0 to 4, for the standard DC-SBM and the AC-DC-SBM.)

C. Brain Cortex Networks

Many real-world networks are known to present assortative structures, e.g., in applications to module or community detection in brain cortex networks, protein-protein interactions, and metabolic networks [25–28]. We analyze in this section the case of the "cat cortex network", which is known to have an assortative structure and is divided into four main functional areas: visual, auditory, frontolimbic, and somatosensory-motor duties [29]. The network is obtained from the cortico-cortical connectivity pattern described by [30], based on 1139 cortico-cortical connections and 65 cortical areas. As in most community detection tasks, the ground truth in this network is not available. In fact, there is no unique "correct" partitioning [31], but different algorithms can highlight different underlying structures.

Figure 6 reports the communities found with the standard DC-SBM, the AC-DC-SBM, and the modularity maximization models on this data set. For each model, we performed 100 optimization runs and registered the best solution (in terms of likelihood or modularity).

The best solution obtained with the standard DC-SBM is visibly non-assortative. The minimum value found along the Ω diagonal is 1.5060, whereas the maximum value in the off-diagonal is 1.9050. The size of each group is similar, and one disassortative community acts as a "hub" for the edges that flow between groups. In contrast, the partition produced by the AC-DC-SBM satisfies the strong assortativity conditions. The minimum value of the Ω diagonal is 2.0196, and the maximum value in the off-diagonal is 1.7152. This solution includes communities of different sizes, with edges which are more evenly distributed between groups. Two mutually-disconnected community pairs are also identified (green-yellow and dark blue-yellow). Finally, the modularity-maximization approach leads to the most assortative partitioning of this network. Yet, since the model does not take K into consideration, this partitioning contains only three groups, contrasting with the four functional areas which were originally expected.

Fig. 6. The best among 100 network partitions found by different models in the cat cortex network: (a) standard DC-SBM, (b) AC-DC-SBM, (c) modularity maximization model.

VI. CONCLUSIONS

Assortativity constraints arise as a natural approach to guide maximum-likelihood algorithms away from spurious local minima on networks which have a presupposed assortative structure. In this work, we have shown that these constraints can be effectively handled with tailored local optimization and interior point methods. Our experiments show that the resulting AC-DC-SBM significantly outperforms unconstrained community detection methods on lightly assortative graphs, especially in regimes which are close to the detectability threshold. In these circumstances, the classic SBM has a strong tendency to converge towards non-assortative solutions, while the modularity maximization model does not generalize well to graphs in which the number of edges between groups varies widely. On the practical example of a brain cortex network, the proposed AC-DC-SBM reveals drastically different community structures which were not identified by other algorithms.

The research perspectives related to this work are numerous. We recommend to further evaluate the impact of assortativity constraints on known phase transitions and thresholds. We also recommend to investigate different algorithmic paradigms to improve the solution of this constrained maximum-likelihood formulation, and to pursue the study of the AC-DC-SBM in a wider range of application contexts.

REFERENCES

[1] E. Abbe, "Community detection and stochastic block models: Recent developments," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6446–6531, 2017.
[2] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[3] K. Nowicki and T. A. B. Snijders, "Estimation and prediction for stochastic blockstructures," Journal of the American Statistical Association, vol. 96, no. 455, pp. 1077–1087, 2001.
[4] B. Karrer and M. E. Newman, "Stochastic blockmodels and community structure in networks," Physical Review E, vol. 83, no. 1, p. 016107, 2011.
[5] M. E. Newman, "Equivalence between modularity optimization and maximum likelihood methods for community detection," Physical Review E, vol. 94, no. 5, p. 052315, 2016.
[6] C. Lee and D. J. Wilkinson, "A review of stochastic block models and extensions for graph clustering," Applied Network Science, vol. 4, no. 1, p. 122, 2019.
[7] A. F. McDaid, T. B. Murphy, N. Friel, and N. J. Hurley, "Improved Bayesian inference for the stochastic block model with application to large networks," Computational Statistics & Data Analysis, vol. 60, pp. 12–31, 2013.
[8] T. P. Peixoto, "Bayesian stochastic blockmodeling," Advances in Network Clustering and Blockmodeling, pp. 289–332, 2019.
[9] Y. R. Wang, P. J. Bickel et al., "Likelihood-based model selection for stochastic block models," The Annals of Statistics, vol. 45, no. 2, pp. 500–528, 2017.
[10] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed membership stochastic blockmodels," Journal of Machine Learning Research, vol. 9, no. Sep, pp. 1981–2014, 2008.
[11] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications," Physical Review E, vol. 84, no. 6, p. 066106, 2011.
[12] J. Lei, A. Rinaldo et al., "Consistency of spectral clustering in stochastic block models," The Annals of Statistics, vol. 43, no. 1, pp. 215–237, 2015.
[13] T. Qin and K. Rohe, "Regularized spectral clustering under the degree-corrected stochastic blockmodel," in Advances in Neural Information Processing Systems, 2013, pp. 3120–3128.
[14] K. Rohe, S. Chatterjee, B. Yu et al., "Spectral clustering and the high-dimensional stochastic blockmodel," The Annals of Statistics, vol. 39, no. 4, pp. 1878–1915, 2011.
[15] T. T. Cai, X. Li et al., "Robust and computationally feasible community detection in the presence of arbitrary outlier nodes," The Annals of Statistics, vol. 43, no. 3, pp. 1027–1059, 2015.
[16] Y. Chen, S. Sanghavi, and H. Xu, "Clustering sparse graphs," in Advances in Neural Information Processing Systems, 2012, pp. 2204–2212.
[17] C. Moore, X. Yan, Y. Zhu, J.-B. Rouquier, and T. Lane, "Active learning for node classification in assortative and disassortative networks," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011, pp. 841–849.
[18] P. K. Gopalan, S. Gerrish, M. Freedman, D. M. Blei, and D. M. Mimno, "Scalable inference of overlapping communities," in Advances in Neural Information Processing Systems, 2012, pp. 2249–2257.
[19] W. Li, S. Ahn, and M. Welling, "Scalable MCMC for mixed membership stochastic blockmodels," in Artificial Intelligence and Statistics, 2016, pp. 723–731.
[20] X. Lu and B. K. Szymanski, "Regularized stochastic block model for robust community detection in complex networks," arXiv preprint arXiv:1903.11751, 2019.
[21] A. A. Amini, E. Levina et al., "On semidefinite relaxations for the block model," The Annals of Statistics, vol. 46, no. 1, pp. 149–179, 2018.
[22] A. Domahidi, E. Chu, and S. Boyd, "ECOS: An SOCP solver for embedded systems," in 2013 European Control Conference (ECC). IEEE, 2013, pp. 3071–3076.
[23] T. O. Kvalseth, "Entropy and correlation: Some comments," IEEE Transactions on Systems, Man, and Cybernetics, vol. 17, no. 3, pp. 517–519, 1987.
[24] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Inference and phase transitions in the detection of modules in sparse networks," Physical Review Letters, vol. 107, no. 6, p. 065701, 2011.
[25] Z. J. Chen, Y. He, P. Rosa-Neto, J. Germann, and A. C. Evans, "Revealing modular architecture of human brain structural networks by using cortical thickness from MRI," Cerebral Cortex, vol. 18, no. 10, pp. 2374–2381, 2008.
[26] A. Kreimer, E. Borenstein, U. Gophna, and E. Ruppin, "The evolution of modularity in bacterial metabolic networks," Proceedings of the National Academy of Sciences, vol. 105, no. 19, pp. 6976–6981, 2008.
[27] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[28] M. Huss and P. Holme, "Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks," IET Systems Biology, vol. 1, no. 5, pp. 280–285, 2007.
[29] E. L. Lameu, F. S. Borges, R. R. Borges, K. C. Iarosz, I. L. Caldas, A. M. Batista, R. L. Viana, and J. Kurths, "Suppression of phase synchronisation in network based on cat's brain," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 26, no. 4, p. 043107, 2016.
[30] J. W. Scannell, C. Blakemore, and M. P. Young, "Analysis of connectivity in the cat cerebral cortex," Journal of Neuroscience, vol. 15, no. 2, pp. 1463–1483, 1995.
[31] L. Peel, D. B. Larremore, and A. Clauset, "The ground truth about metadata and community detection in networks," Science Advances, vol. 3, no. 5, p. e1602548, 2017.