Spectral entropies as information-theoretic tools for comparison

Manlio De Domenico1 and Jacob Biamonte2, 3 1Departament d’Enginyeria Inform`atica i Matem`atiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain 2Quantum Complexity Science Initiative Department of Physics, University of Malta, MSD 2080 Malta 3Institute for University of Waterloo, Waterloo, N2L 3G1 Ontario, Canada Any physical system can be viewed from the perspective that information is implicitly represented in its state. However, the quantification of this information when it comes to complex networks has remained largely elusive. In this work, we use techniques inspired by quantum statistical mechanics to define an entropy measure for complex networks and to develop a set of information-theoretic tools, based on network spectral properties, such as R´enyi q−entropy, generalized Kullback-Leibler and Jensen-Shannon divergences, the latter allowing us to define a natural measure between com- plex networks. First we show that by minimizing the Kullback-Leibler divergence between an observed network and a parametric network model, inference of model parameter(s) by means of maximum-likelihood estimation can be achieved and model selection can be performed appropriate information criteria. Second, we show that the information-theoretic metric quantifies the distance between pairs of networks and we can use it, for instance, to cluster the layers of a multilayer system. By applying this framework to networks corresponding to sites of the human microbiome, we perform hierarchical cluster analysis and recover with high accuracy existing community-based associations. Our results imply that spectral based statistical inference in complex net- works results in demonstrably superior performance as well as a conceptual backbone, filling a gap towards a network information theory.

I. INTRODUCTION an output state of a system. This then necessitates a method to quantify this information, its time-dependence in terms of storage and transfer of information, ideally Shannon’s entropy [1] and information-theoretic de- between levels of a multilayer system, which can fur- rived measures have been successfully applied in a range ther be imperfect due to randomness or noise. A sim- of disciplines, from revealing time scale dependence in ilar challenge was addressed by quantum theorists be- neural coding [2,3] to quantifying quantum informa- ginning decades ago when faced with quantifying infor- tion [4–6] and the complexity of genetic sequences [7,8] or matic properties of quantum states [4–6, 34–37]. In- to unravel the mesoscale organization of interconnected deed, the modern theory of , as the systems [9–12], to cite just a few emblematic achieve- name suggests, is fundamentally built upon the quan- ments. However, when it comes to complex networks, an tification of physical information, entropic based quan- appropriate definition of entropy has remained elusive— tifications of non-locality, entanglement [34] and the in- applicability being often limited to the probability distri- herent complexity of the quantum model [35]. These bution of some network descriptor (such as the normal- decades of research have placed entropic measures cen- ized distribution of node degrees). tral to the modern theory, leading to quantum generaliza- Complex dates to the discovery of sev- tions of Shannon’s classical information theory [4–6, 34– eral fundamental features [13, 14] and complex networks 37]. However, for a variety of reasons, these results can’t have been widely used to model and better understand be applied directly to complex networks, although clas- the organization of complex systems and their dynam- sical information-theoretic tools have been successfully ics [15–20], the controllability of their constituents [21] adopted to solve specific problems [9, 38–46]. arXiv:1609.01214v1 [physics.soc-ph] 5 Sep 2016 and their resilience to structural and dynamical pertur- Here we address a similarly motivated challenge: in- bations [22–27]. A recent and ongoing drive in complex spired by how entropy is calculated in quantum systems, networks is to merge ideas with quantum information we define an interconnectivity-based density matrix to science, including entangled networks for communica- calculate the von Neumann entropy of a network. We tion [28–30], developing a theory of node in analytically prove that our definition satisfies the desired quantum walks on graphs [31] and in the detection of additivity properties similar to quantum thermodynamic community structures formed in quantum systems [32]. entropy—though the proof of this differs tremendously In the same spirit as the Church–Turing–Deutsch prin- from the quantum case—which opens the door to avoid ciple [33], like all physical systems it is possible to view the shortcomings, recently pointed out in [47], imposed complex networks in terms of information processing, by sub-additivity failing as found in past approaches. in which information changes in time from an input to We exploit this entropy to develop a set of information- 2 theoretic tools which apply to complex networks, such i.e. it is equal to the Shannon entropy of the eigenvalues as R´enyi q-entropy, generalized Kullback-Leibler and of the density matrix, where by convention 0 log2 0 := 0. Jensen-Shannon divergences, thus providing a backbone In this work we are not limited in having a quantum to an information-theoretic approach to . setup where the matrix ρ can be called a “density ma- Our framework, based on spectral properties of networks, trix” in the physical sense, but instead build on this idea allows one to probe contemporary problems faced in to define a matrix from a network that satisfies the same complex network science. For instance, we show that mathematical properties of a density matrix. Kullback-Leibler minimization can be used to infer the First, such a density matrix should be positive defi- parameter(s) of a model aimed to fit an observed net- nite and symmetric, therefore ρ = Z−1e−βH – with H a work. By exploiting results of classical information the- symmetric matrix with non-negative eigenvalues, Z and ory, we are able to introduce the spectral counterpart of β real numbers – is a suitable candidate. Second, the likelihood maximization and use it in practical applica- eigenvalues of ρ must sum to unity, thus imposing the tions to fit network models and perform model selection constraint Z = Tr e−βH. Here, we use the following den- based on appropriate Akaike and Bayesian information sity matrix, defined by criteria, as well as minimum description length. The e−βL strength of our approach relies on the fact that it uses ρ = ,Z = Tr e−βL, (3) the network as a whole, instead of a subset of network’s Z descriptors, to attack parameter inference and model se- where L = D−A denotes the combinatorial graph Lapla- lection problems. cian, being A the of the network and Another important byproduct of the proposed infor- D the diagonal matrix of node degrees. Eq. (3) resembles mation theoretic framework is the possibility to quan- the Gibbs state of a quantum system where the Hamil- tify the distance between complex networks. This prob- tonian equals the Laplacian matrix, i.e. H = L, where β lem is of high interest in recent applications concerning is a scale parameter. multilayer systems, such as time-varying and multiplex It is worth remarking that this functional form appears networks [48–50]. This type of systems consist of nodes in many problems of physical interest. For instance, if β replicated across several networks that exhibit different parameterizes time, setting Z = 1 and H = L, i.e. the types of relationships or connectivity [51]. The possibil- Laplacian matrix, then e−Lt is the evolution matrix gov- ity to compare layers from an information-theoretic point N of view has been recently explored to aggregate them in erning purely diffusive dynamics and its trace P e−λit order to reduce the structure and the complexity of bio- i=1 is N times the average return probability, i.e. the prob- logical, transportation and social multiplex networks [47]. ability that a walker starting at any node i will return To numerically probe our method, we cluster the layer of to its origin at time t. Another case of interest is when an empirical system and compare it against the existing β parameterizes temperature of a quantum system with classification. Hamiltonian H and Z = Tr e−βH, so that ρ would pro- vide the Gibbs state of the system in equilibrium at finite temperature. Although we are interested mainly in non- II. VON NEUMANN ENTROPY OF A negative values of β, it is interesting to show that when COMPLEX NETWORK β = −1, Z = 1 and H = A we recover the well-known communicability matrix introduced by Estrada [52–57] In , probability distributions are (see Ref. [58] for a thorough review) or a normalized ver- encoded by density matrices. A density matrix ρ is a sion of it, if Z = Tr eA. Hermitian and positive semi-definite matrix, with trace In Ref. [59], a matrix is built in such a way that its equal to unity which is used to represent both mixed and mathematical properties satisfy the requirements of a pure quantum states. A system is in a pure state |ψi, if density matrix. For an undirected complex network, L and only if the bound Tr ρ2 ≤ 1 is saturated. The density is symmetric and positive semi-definite but its eigenval- matrix admits a spectral decomposition as ues do not sum to unity. However, the eigenvalues of the proposed matrix, defined by ρbgs = L/ Tr L, evidently N X do, therefore ρbgs is a mathematically suitable density ρ = λi|φiihφi| (1) matrix [45, 59] and hence, Eq. (2) can in principle be i=1 applied. This approach has been recently generalized to the case of multilayer systems [51], composite networks for an orthonormal basis {|φ i}, where λ are non- i i where units exhibits different types of relationships that negative eigenvalues which sum up to 1. are generally modeled as different layers and used to re- The density matrix allows to define the von Neumann duce their structure [47]. entropy by However, it has been recently found that the von Neu- N mann entropy calculated from the rescaled Laplacian X ρbgs does not satisfy the sub-additivity property in some S(ρ) = − Tr (ρ log2 ρ) = − λi log2 λi, (2) i=1 critical circumstances [47]. For instance, let ρbgs be the 3

III. VON NEUMANN ENTROPY OF A B STANDARD NETWORK MODELS

For simplicity, in the following we will use ρ = ρG where there will be no ambiguity about which density matrix is under consideration. Indeed, we will also use the notation S(G) = S(ρG) to indicate the entropy of network G. From the eigen-decomposition of the Laplacian matrix L = QΛQ−1, where Λ is the diagonal matrix of Lapla- cian’s eigenvalues, it is straightforward to show that

N X Z = e−βλi(L), (4) i=1

where λi(L) indicates the i-th eigenvalue of L. The spec- tral entropy of ρ can be rewritten as

N X S(ρ) = − λi(ρ) log2 λi(ρ). (5) i=1

FIG. 1. Density matrix of a complex network. Network It follows that (top) and corresponding density matrix (bottom) from (A) a −1 −βλi(L) stochastic block-model and (B) Watts-Strogats networks, for λi(ρ) = Z e (6) β = 3.16. Colors in the networks indicate nodes belonging to the same community, whereas colors in the density matrices provides the relationship between the eigenvalues of the indicate the magnitude of the corresponding entry. Networks density and the Laplacian matrices. Using Eq. (6), the with N = 200 nodes are shown. spectral entropy of the network G reduces to

N 1 X S(G) = e−βλi(L)[log Z + βλ (L)] Z log 2 i i=1

∂ log2 Z = log2 Z − β = log2 Z + β Tr [Lρ]. (7) rescaled Laplacian matrix of a network with N nodes ∂β and |E|  1 edges, let σbgs be the rescaled Laplacian A possible interpretation of this entropy will be given matrix of a network with N nodes and just one (undi- later. Here, it is worth discussing some very special rected) edge and let τbgs be the rescaled Laplacian of the cases, where the spectral entropy following from our def- network obtained by summing up, entry wise, the cor- inition provides very different results from BGS (Sbgs = responding adjacency matrices of the previous two net- S(ρbgs)). works. Since S(σbgs) = 0, the sub-additivity property Networks of isolated nodes. First, let us consider the case S(τbgs) ≤ S(ρbgs) + S(σbgs) is not always satisfied be- of a network with no links among nodes. The eigenvalues cause, as we will see later, S(τbgs) ≥ S(ρbgs) very often. of the combinatorial Laplacian are all 0, Z = N and S = log2 N, i.e. the entropy is maximum regardless of While this peculiar behavior can in fact be exploited β—whereas Sbgs is not defined. in certain situations [47], one is interested in preserving sub-additivity, because in general it is expected that the Single-link networks. Let us consider a network with only entropy of a composite system A + B is equal or smaller one link between node i to node j, i.e. A = E(ij)+E(ji) than the sum of the entropy of A and B. In the case being E(ij) the canonical matrix with all zero entries of complex networks, it seems intuitively reasonable but except in i−th-j−th . It is straightforward to has remained to be proven—see AppendixA—that the show that the eigenvalues of the combinatorial Laplacian aggregation of two graphs is expected to have lower en- are all 0 except one whose value is 2. While Sbgs = 0, it tropy than the sum of the entropy of each graph sep- follows that arately. The entropy of the density matrix introduced 2βe−2β S = log Z + (8) to complex networks in this work —that we name spec- 2 Z log 2 tral entropy—presents some interesting features that will be discussed later. A visualization of the density matrix where Z = N − 1 + e−2β and the asymptotic behavior is corresponding to different network models is shown in log2 N for any β. Our spectral entropy scales with the Fig.1. size of the network when only one directed link is present. 4

FIG. 2. von Neumann entropy of complex network models. Spectral entropy as a function of 1/β for Erd¨os-R´enyi networks (left-hand panel), Watts-Strogatz’s small-world networks (central panel) and K-regular lattices (right-hand panel). Color encodes realizations obtained by varying the parameter of the network model, i.e. the probability plink of a link between two nodes, the rewiring probability prew and the number of neighbors K, respectively. In all cases, networks with N = 200 are considered.

Complete networks. In the case of the network, quantum and BGS entropy for an ensemble of Erd¨os- there are N − 1 eigenvalues equal to N, from which R´enyi networks. Similar results are obtained by using ensembles of Watts-Strogatz and K-regular networks. βN(Z − 1) S = log2 Z + (9) As proven in the Appendix, the spectral entropy does Z log 2 not violate the sub-additivity property regardless of the where Z = 1+(N −1)e−βN and the asymptotic behavior value of β, whereas the Sbgs often violates these sensi- ble additivity requirement. See AppendixA for further is log2 N for small β and 0 for large β. details. Complex network models. We show in Fig.2 the von Neu- mann entropy as a function of 1/β for Erd¨os-R´enyi [60], Watts-Strogatz [61] and K-regular networks, for differ- ent values of their parameters, the link probability plink, the rewiring probability prew and the number K of node neighbors, respectively. Intriguingly, in all three network types, when β is high the entropy tends to zero whereas it approaches the theoretical limit log2 N when β is small enough. Sub-additivity of spectral entropy. Let us consider an undirected network G of N nodes changing over time, with adjacency matrix A(0) at time t = 0. At each time step t a pair of two nodes, chosen uniformly randomly FIG. 3. Sub-additivity of spectral entropy. Distribu- and not yet connected, is linked by one undirected link. tion of entropy variation (see the text) for an Erd¨os-R´enyi This is equivalent to having another network G0(t) con- network with plink = 0.01 and N = 100. The results for sisting of just one link and N − 2 isolated nodes, whose the quantum entropy defined in this work (left-hand panel) adjacency matrix A0(t) is summed up to A(t − 1) such and the BGS entropy (right-hand panel) are shown. Quan- that A(t) = A(t − 1) + A0(t). If G(t − 1) and G0(t) have tum entropy never violates sub-additivity, regardless of the no edges in common, the above operation is equivalent β parameter, whereas the BGS entropy significantly violates to the union of the two graphs, another typical approach sub-additivity. to aggregate networks. One immediately asks if the spectral entropy defined in this work and the BGS entropy satisfy the sub-additivity property such that IV. RENYI´ ENTROPY OF COMPLEX NETWORKS S(G(t)) ≤ S(G(t − 1)) + S(G0(t)). (10) We show in Fig.3 the distribution of ∆ S = S(G(t)) − The von Neumann entropy is just a special case of the [S(G(t − 1)) + S(G0(t))] obtained by calculating both more general R´enyi entropy of a quantum system, defined 5 by [36] The quantum R´enyi entropy can be used to define the quantum R´enyi divergence, also known as q−relative N R´enyi entropy, by 1 q 1 X q Sq = log Tr ρ = log λi(ρ) . (11) 1 − q 2 1 − q 2 1 i=1 D (ρ||σ) = log Tr (ρqσ1−q) (13) q q − 1 2 It is widely used in quantum information theory and quantum computing to quantify entanglement [62] and which reduces to the quantum Kullback-Leibler diver- correlations in physical systems [63], it can be expressed gence (or, equivalently, the quantum relative entropy) as families of tensor network contractions [64] and has D (ρ||σ) = Tr [ρ(log ρ − log σ)] (14) been found to share a close relationship to free en- 1 2 2 ergy [65]. for q → 1. By using Eq. (6), the R´enyi spectral entropy of a com- In general such divergences are not symmetric and plex network is therefore given by bounded, making difficult certain comparison uses. An alternative measure is the quantum q−Jensen-Shannon 1 N divergence [67] X −q −qβλi(L) Sq = ln Z e , (12) 1 − q ρ + σ  1 i=1 J (ρ||σ) = S − [S (ρ) + S (σ)], (15) q q 2 2 q q in terms of the eigenvalues of the Laplacian matrix. In fact, this entropy generalizes other entropic measures. ρ+σ where µ = 2 is usually called mixture matrix. It can As q approaches 0, S increasingly weights all eigenval- 1 q be proven that J 2 defines a true metric for 0 ≤ q < ues more equally, approaching the Hartley entropy. In q 2 [67–69] and can be used as a measure of distinguisha- the limit of q → 1 R´enyi entropy approaches the spec- bility [47] or similarity [70]. tral entropy. As q approaches ∞, S becomes dominated q Our formalism allows to use the family of q−quantum by the high-probability events and it converges to the divergences to compare two networks and this opens up min-Entropy. The case with q = 2, recovers the collision a new approach to attack two fundamental problems in entropy—in the case of quantum systems, this is con- network science. nected to the purity of the system. In fact, S2 identically vanishes for ρ2 = ρ, i.e. for pure states. We show in Fig.4 entropy as a function of q at dif- A. Maximum Likelihood Estimation and Model ferent values of β for Erd¨os-R´enyi, Watts-Strogatz and Selection K-regular networks, respectively, for different values of their parameters. It is common to assess that the Watts- For a given model and its corresponding set of param- Strogatz network reduces to a K-regular graph for p = rew eters, the likelihood function measures the probability of 0 and approaches an Erd¨os-R´enyi network for p = 1. res observing the data according to the model parameters. However, the behavior of R´enyi entropy as a function of q Therefore, it is sufficient to perform maximum likelihood shows that there are some significant differences at least estimation to obtain the parameters of the model that in the latter case, where the entropy is considerably larger better reproduce the data, according to the model. for the Watts-Strogatz model than the Erd¨os-R´enyi one. While it is straightforward to define the likelihood function for probability distributions, it is challenging to define a similar concept in the context of density ma- V. GENERALIZED QUANTUM DIVERGENCES trices. In the case of complex networks the challenge BETWEEN TWO COMPLEX NETWORKS is complicated by the lack of an appropriate probability distribution. In fact, one usually designs a model, or a One of the main goal of information theory is to quan- set of models, to reproduce only some salient features of tify the amount of information about a probability dis- an observed complex network, often limiting the compar- tribution – generally obtained from empirical measure- ison between model and data to a few descriptors such ments – provided that one has full information about an- as distribution, degree correlations, clustering or other probability distribution – e.g. the model. This goal path statistics. is achieved by introducing relative entropies or, equiva- Quantum divergences provide a grounded and unifying lently, information divergences. approach to this problem. Let us start from the classi- Similarly, the introduction of divergences (a.k.a. quan- cal problem where an empirical probability distribution tum relative entropy) in quantum information theory is P (x) is obtained from observing the N outcomes {xi} foundational to the quest to understand differences be- (i = 1, 2, ..., N) of a stochastic variable X . Let Q(x; Θ) tween quantum states, quantum and classical informa- be a model to approximate P (x), which depends on one tion, the quantification of the thermodynamic cost of (or more) parameter(s), here indicated by Θ. In this con- communication as well as optimal protocols to transfer text, the Kullback-Leibler divergence measures the infor- information—see e.g. the reviews [6] and [66]. mation gain when the model Q(x; Θ) is used to explain 6

FIG. 4. R´enyi entropy of complex network models. Entropy as a function of q with β = 1/15.8 for Erd¨os-R´enyi networks (left-hand panel), Watts-Strogatz’s small-world networks (central panel) and K-regular lattices (right-hand panel). Color encodes realizations obtained by varying the parameter of the network model, i.e. the probability plink of a link between two nodes, the rewiring probability prew and the number of neighbors K, respectively. In all cases, networks with N = 200 are considered. the observation P (x), and it can be written as Eq. (14), and by arguments similar to the classical case, it is straightforward to show that Z P (x) D(P ||Q) = dxP (x) log2 Q(x; Θ) min{D(ρ||σ)} = max {Tr [ρ log2 σ(Θ)]} . (19) Z Θ Θ = −S(P ) − dxP (x) log2 Q(x; Θ). (16) By comparing the right-hand side of Eq. (18) and Eq. (19), we define the network log-likelihood function If the model Q is plausible, there exist a value Θ? such by that the divergence is minimum and we are interested in finding such a value by minimizing the divergence with log2 L(Θ) = Tr [ρ log2 σ(Θ)], (20) respect to Θ. We notice that the first term in the right- hand side of Eq. (16) does not depend on Θ and therefore where the likelihood function can be calculated by ex- plays no role in the minimization procedure. By noticing ploiting the properties of the matrix exponential as that L(Θ) = eTr [ρ log σ(Θ)] N   1 X ρ log σ(Θ) P (x) = δ(x − x ), = det e . (21) N i i=1 This result allows us to obtain a maximum-likelihood where δ is the Dirac function, the second term in the estimation of parameter(s) Θ = {θ1, θ2, ..., θd} by min- right-hand side of Eq. (16) reduces to imizing the Kullback-Leibler divergence between net- works. The covariance matrix corresponding to this Z N 1 X estimation is the Fisher information matrix (see Ap- dx P (x) log Q(x; Θ) = log Q(xi; Θ), (17) 2 N 2 pendixB), whose classical counterpart is equivalent to i=1 the Hessian of the Kullback-Leibler divergence and it is which is proportional to the negative log-likelihood func- used to assess the quality of the spectral likelihood esti- tion. Here, the pre-factor can be safely neglected dur- mate. If Θ? is an unbiased estimator of Θ, the associated ing the minimization procedure. Therefore, by minimiz- covariance matrix satisfies the Cramer-Rao bound ing the Kullback-Leibler divergence one effectively max- imizes the log-likelihood function: cov(Θ?) ≥ I−1(Θ?). (22)

min{D(P ||Q)} = max{log2 L(x; Θ)}. (18) We can exploit this finding to go beyond model fitting, Θ Θ defining an operative procedure for model selection. We use the proposed framework to achieve a similar In fact, generally more than one model is used to un- result in the case of density matrices. Let ρ be the derstand the data and a fundamental problem in data density matrix of an empirical network and let σ(Θ) a analysis, known as model selection, is to quantify which model for such a network, depending on one (or more) model out of a set of candidates is the best one in re- parameter(s), here indicated by Θ. By starting from producing the data. One solution to this problem has 7 been given by Akaike, who proposed an information cri- ensemble of 100 realizations for each value of the link terion (AIC) for this purpose [71], by showing that the probability and compare each network against the ob- expected value of the relative cross-entropy term in the served one to calculate the average Kullback-Leibler di- Kullback-Leibler divergence equals the log-likelihood of vergence. In principle, the procedure depends on the the model given the data plus a penalizing constant term value of β, therefore we perform the analysis for different which accounts for the number of free parameters. The values of this parameter in order to understand which AIC is given by value (or range of values) provides the best fit. We con- sider specific values of β for this purpose, more specifi- ? AIC = 2k − 2 log2 L(Θ ) (23) cally we calculate the β? such that the entropy normal- ized to its maximum value, i.e. log N, gets a specific real where k is the number of parameters of the model 2 value c(β?) between 0 and 1. We choose values for c(β?) and we plug Eq. (20) into Eq. (23) for applications to ranging between 0.01 and 0.9, and the results show that complex networks. In practice, given a set of mod- the global minima corresponds or are very close to the els M = {M ,M , ..., M }, with number of parameters 1 2 n expected value, for c(β?) ≤ 0.5. The best performance k , k , ..., k and likelihood L , L , ..., L , respectively, 1 2 n 1 2 n is obtained for c(β?) < 0.1. This analysis suggests a the most suitable candidate to explain the data is the rule of thumb to choose a specific value of β for fitting one being a trade-off between having as small as possible purposes: the region close to the critical point – where divergence from the data and as small as possible number entropy changes from 0 to a positive value – provides the of parameters, i.e. the one such that AIC is minimum. most performant range for β. Similarly, other model selection criteria can be ex- tended from information theory to the complex network In Fig.5B we show the Kullback-Leibler divergence of framework. This is the case of Bayesian information cri- Watts-Strogatz’s network model (N = 200) for different values of the parameters K and prew against a Watts- terion (BIC) defined by ? ? Strogatz network with K = 6 and prew = 0.2, assumed ? BIC = k log2 N − 2 log2 L(Θ ) (24) to be the empirical data. The result shows that the most likely region of the parameter space, i.e. the one where and Fisher information approximation (FIA) defined by the model is more informative about the data, is suc- cessfully identified by the Kullback-Leibler minimization k N FIA = log − log L(Θ?) procedure previously described. The result is very in- 2 2 2π 2 Z  teresting because only a single realization of the model, p instead of an ensemble as in the previous case, has been + log2 dΘ det I(Θ) , (25) used for each pair of parameters, suggesting that the pro- cedure is robust against sample size while reducing the where the last term penalizes a model because of its ge- computational cost of the calculation. ometric complexity [72, 73], a typical concept in infor- Fig.5C shows the Kullback-Leibler divergence of a mation geometry [74]. In classical statistical inference, it stochastic block-model [76](N = 200) for different values has been shown [72] that the FIA quantifies the length of of the intra- and inter-community probability parame- the shorted description of the data given the model and, ters, p and p , and number of equally sized blocks as a consequence, its minimization corresponds to finding intra inter against a network obtained from the same model with 8 the minimum description length (MDL) of the network. blocks, p? = 0.5 and p? = 0.05, assumed to be the The MDL principle [75], also known as a formalization of intra inter empirical data. As for the Erd¨os-R´enyi case, we sample Occam’s razor, represents data and models as codes to be an ensemble of 100 realizations for each triad of parame- compressed, where the model better compresses data to ters, to calculate the average Kullback-Leibler divergence provide its best description. This principle is one of the between model and a observation. We show the results most important concepts in information theory and its for different block sizes and the overall minimum is found interpretation, in our context, opens the door for future for 8 blocks, p = 0.6 and p = 0.05, in excellent studies. intra inter agreement with expectation. To probe the power of the proposed method we per- formed several tests on synthetic networks. We generate a network from a specific model with a given value of the parameter(s), assuming it is the observed network, B. Clustering layers of multilayer systems and a set of networks from the same model by vary- ing the value of the parameter(s). Therefore, we per- The second application concerns the more recent prob- form a maximum-likelihood parameter estimation based lem of identifying layers of a multiplex network or snap- on Kullback-Leibler minimization to find the value of the shots of a time-varying network [51] that provide redun- parameter(s) that better fit the observation. dant information about the system and that might be In Fig.5A we consider the case of an Erd¨os-R´enyi net- aggregated to reduce the complexity of the system [47]. work model (N = 200), being link probability plink the In this case, by using appropriate quantum divergences it ? parameter to fit and plink = 0.05 its expected value. By is possible to calculate a distance between layers or snap- scanning over the range allowed for plink, we sample an shots and devise exact or heuristics procedures to cluster 8 A B

C

FIG. 5. Maximum-likelihood parameter estimation based on Kullback-Leibler minimization. (A) Erd¨os-R´enyi ? network model (N = 200) for different values of the link probability plink against an Erd¨os-R´enyi network with plink = 0.05, assumed to be the data to fit. The vertical dashed line indicates the true value of the parameter. Each curve corresponds to a ? ? ? value β of β such that S[ρ(β )]/ log2 N = c(β ), with c a real number between 0 and 1. Global minima are expected around the true value. (B) Watts-Strogatz’s network model (N = 200) for different values of the parameters K and prew against a ? ? Watts-Strogatz network with K = 6 and prew = 0.2, assumed to be the data to fit. The white dot indicates the true value and it falls in the iso-likelihood region with smallest Kullback-Leibler divergence (encoded by color in log scale). (C) Stochastic block-model (N = 200) for different values of the intra- and inter-community probability parameters, pintra and pinter, and ? ? number of equally sized blocks against a network obtained from the same model with 8 blocks, pintra = 0.5 and pinter = 0.05, assumed to be the data to fit. 9 A B C

FIG. 6. Hierarchical clustering of human microbiome sites. The Jensen-Shannon distance matrices with (A) β = 0.1 and (B) β = 10 are shown, together with their (C) tantlegram, i.e., a visual analysis of their differences. Cophenetic distance and Baker’s gamma correlation coefficient, quantifying the correlation between the two dendrograms, are reported. them. With the recent rise of interest in multilayer sys- ing the fact that the result is robust to the choice of β. tems [48–50], this problem became very relevant for prac- tical applications, e.g. to demonstrate the validity of a multilayer approach to modeling complex systems as the human brain [77], and new approaches to cluster and ag- VI. DISCUSSION gregate layers have been proposed more recently [78, 79]. Here, we use the networks built from analysis of the The use of von Neumann entropy is central to modern structure and function of the human microbiome, con- quantum information theory, with many new insights, sisting of 18 layers of a multiplex network, each one cor- uses and interpretations arising often. In the context of responding to a body site. Recently such layers have complex networks, spectral entropy might be considered been partitioned into community types, by using Dirich- as a measure of the information content of the system, let multinomial mixture models, that may be associated although this interpretation is not easy to accept with- with complex diseases [56]. We use the same data and out the appropriate formalization typical of classical and calculate the Jensen-Shannon distance between each pair quantum information theory. of layers for different values of the β parameter. We show Here, we observe that networks of isolated nodes in Fig.6 the resulting hierarchical clustering of the lay- (which challenge the notion of a network itself) and fully- ers, in good agreement with the result reported in [56]. A connected networks, i.e. cliques, have maximal entropy. more quantitative comparison against the results in [56] is Of course, connectivity is not the feature common to difficult because only p−value upper bounds are reported such extremal network scenarios. We investigated sev- in the study for community-type associations. Neverthe- eral alternative explanations, but the most promising one less, our method is able to correctly reproduce the clus- is based on the number of possible configurations ob- tering of sites within certain body regions. For instance, tained by reshuffling the connections among nodes, that the community consisting of hard palate, tongue dorsum, is equivalent to randomly reshuffling the entries of the ad- saliva, palatine tonsils and throat, as well as the commu- jacency matrix. In fact, all possible reshuffled configura- nity made by vaginal introitus, mid vagina and posterior tion of both the network of isolated nodes and the clique fornix and the smaller ones made by sub- and supra- network do not alter the system. In other words, the gingival plaque and left/right retroauricular crease. For graph isomorphism problem becomes trivial, as all pos- ranges of β within one order of magnitude, the differ- sible networks are not distinguishable from each other, ences between hierarchies are small and not significant. thus maximizing the uncertainty about the system and, Here we show the results for β = 0.1 and β = 10, with consequently, the entropy. In the case of K-regular net- cophenetic distance equal to 0.84 and Baker’s gamma works the situation is a bit different, because not all correlation coefficient equal to 0.68. By performing the reshuffled configurations will keep them unaltered: there- same analysis on hierarchies where labels are reshuffled fore such networks are expected to exhibit lower entropy uniformly random, we reject the null hypothesis that the than log2 N. It is interesting the case of an Erd¨os-R´enyi two dendrograms are uncorrelated (P < 0.001), support- network, which is characterized by stochastically homo- 10 geneous connectivity. The entropy for this type of net- aggregated networks. works is almost always zero or log2 N, except for a narrow This definition of entropy has allowed us to define range of β. This is compatible with our interpretation, R´enyi spectral entropy and generalized relative entropies, because on average almost any reshuffled configuration of also known as quantum divergences. By using Kullback- this type of networks will resemble the original. Accord- Leibler divergence and well-known results in classical in- ing to this interpretation, extremely sparse networks and formation theory we have devised a maximum-likelihood extremely dense networks should exhibit almost maxi- approach to model fitting, entirely based on the eigen- mum entropy. We have performed additional numerical value spectra of density matrices. Our numerical experi- experiments and theoretical analysis of toy models (path ments confirmed that by minimizing the Kullback-Leibler graphs, lattices, etc.) and we have verified our expecta- divergence between observed network and models, we ob- tion. tain the maximum spectral likelihood estimation of pa- The proposed framework is entirely based on the com- rameters. This result might be of special interest for ap- putation of eigenvalues of density matrices and their plications devoted to unravel the meso-scale organization products. The computational complexity of such opera- of a network, among others. tion is generally expected to be polynomial and to scale Finally, by using the Jensen-Shannon divergence, as O(nγ ), with 2 < γ < 2.4, making challenging calcu- whose square root provides a metric, we have shown that lations for networks with hundreds of thousands nodes. our framework can be applied to quantify the spectral However, the recent advances in parallel operations on distance between two networks. More specifically, this dense symmetric matrices make possible to solve eigen- result is of interest for applications in multilayer network value problems for matrices with size of the order of research, where the distance among layers can be mea- 105 in less than five minutes on standard computer ma- sured and used to hierarchically cluster them. As a rep- chines [80]. resentative example of the power of our methodology, we have shown that this approach recovers with excellent accuracy the community-based classification of human VII. CONCLUSIONS AND OUTLOOK microbiome sites. This study opens interesting perspectives on future The concept of thermodynamic entropy has been cru- works. For instance, while our inference procedure does cial to understand the structure and the dynamics of not provide the community label to assign to each node, complex systems. The generalization of this concept to as in community-detection methods based on stochastic quantum mechanics by von Neumann was a milestone in block-modeling [81–87], it provides a fast way, based on the field, allowing, among other applications, to charac- network’s spectral properties, to explore the parameter terize the mixedness of quantum states. space in order to identify the most informative region An appropriate definition of entropy has been lacking about the data. An intriguing possibility to explore is in the field of complex networks, mainly because there is how this result can complement existing network infer- no simple way to define a probability distribution able to ence approaches [88]. represent, without loss of information, the network. Pre- A future application of our framework—of great inter- vious attempts to define the entropy of a complex net- est for the community of network scientists—is the quan- work where mainly based on the calculation of Shannon tification of how much information is needed for correctly entropy of the probability distribution of some descrip- learning the parameters of a model, that should not be tor(s), while neglecting other information. confused with the Cramer-Rao bound. This represents a More recently, the idea that a quantum-like entropy crucial issue in network science: connectivity of empiri- might be introduced for complex networks has been ex- cal networks is often sampled through ad hoc algorithms, plored. Here, we have demonstrated that previous at- as in the case of virtual social systems, the Internet or tempts fail in preserving fundamental properties of en- the World Wide Web. Quantifying to which extent it is tropy, like sub-additivity. Motivated by this finding, we possible to learn the parameter(s) of a network model in have proposed a density matrix, inspired by the Gibbs absence of partial connectivity information is a tantaliz- state of a quantum system, which non-trivially depends ing possibility, expected to deepen our understanding of on the combinatorial Laplacian matrix of the network. networked systems. One of the main advantages of this approach, rooted in The information-theoretic quantities introduced in this spectral theory, is that the resulting entropy does not de- work open the intriguing possibility to extend informa- pend on the distribution of a specific network descriptor tion geometry to complex networks. In information ge- but it depends on the network as a whole. ometry parametric statistical models define a Rieman- In this study, we have shown, analytically and numer- nian manifold endowed with Fisher information matrix ically, that our density matrix allows to preserve the ba- as a metric tensor [89, 90]. The importance of informa- sic properties of an entropy. More specifically, we have tion manifolds for statistical inference is well established demonstrated that the entropy of the system obtained and found applications to machine learning [74, 91, 92] by aggregating two different networks must be equal to and phase transitions in quantum systems [93–98]. It will or smaller than the sum of the entropy of the two non- be interesting to explore to which extent the success of 11 information geometry can be ported to statistical infer- By summing up such non-negative terms, the following ence of complex networks, where the metric tensor of the inequality underlying information manifold is given by the spectral Fisher information matrix. D(ρC ||ρA) + D(ρC ||ρB) +

The potential of the proposed methodologies goes be- +β Tr [LAρA] + β Tr [LBρB] + log2 ZC ≥ 0 (A4) yond network science and we envision important contri- butions to quantum physics, especially to the emergent holds. The above inequality can be expanded into field of Hamiltonian learning [99, 100] and inference of quantum complex network models based on entan- − SC + β Tr [LAρC ] + log2 ZA + glement [29, 30]. −SC + β Tr [LBρC ] + log2 ZB +

β Tr [LAρA] + β Tr [LBρB] + log2 ZC ≥ 0. (A5) ACKNOWLEDGMENTS By exploiting the fact that, for a Gibbs-like state, the von Neumann entropy is given by The authors thanks Alex Arenas for useful discussions. MDD acknowledges financial support from the Spanish S(ρX ) = β Tr [LX ρX ] + log2 ZX (A6) program Juan de la Cierva (IJCI-2014-20225). JDB ac- knowledges IARPA and the Foundational Questions In- and LC = LA + LB, it follows that stitute (FQXi) for financial support. SA + SB − 2SC + log2 ZC + β Tr [LC ρC ] ≥ 0, which leads to S + S ≥ S . Appendix A: Sub-additivity of the von Neumann A B C entropy for aggregate networks

Let A and B be the adjacency matrices of two networks Appendix B: Spectral Fisher information matrix GA and GB, respectively, with N nodes. Let C = A + B be the adjacency matrix obtained by their sum, which Let ρ(Θ) the parametric statistical model, depending represents a new network GC obtained by aggregating on the parameter set Θ = {θ1, θ2, ..., θd}, for a complex GA and GB. In the following we will show that S(GC ) ≤ network with density matrix ρ. In quantum physics, the S(GA) + S(GB), i.e. that the sub-additivity property of upper bound to the expected Fisher information is given the entropy is always satisfied. For sake of simplicity, we by the quantum information [93, 101, 102] will use the notation SX = S(GX ).  (α) (β) h (α) (β)i Let us indicate with LA, LB and LC the Laplacian Iαβ(Θ) = E Σ ◦ Σ = Tr ρ Σ ◦ Σ ,(B1) matrices of GA, GB and GC , respectively, and let ρA, ρB and ρC indicate the corresponding density matrices where Σ(α) is the symmetric logarithmic derivative— defined as in Eq. (3). with respect to the α−th parameter (α = 1, 2, ..., d)— The Kullback-Leibler divergence of two density matri- defined by ces is always non-negative, as a consequence of Klein’s inequality with f(X) = X ln X. Therefore ∂ ρ(Θ) = Σ(α) ◦ ρ(Θ), (B2) ∂θα D(ρC ||ρA) = −SC − Tr [ρC log2 ρA] and we have used the symmetric product defined by = −SC + β Tr [LAρC ] + log2 ZA ≥ 0,(A1) 1 and similarly for D(ρC ||ρB). As LA, LB, ρA and ρB are X ◦ Y = (XY + YX). (B3) positive semi-definite, from their Cholesky factorization 2 it is straightforward to show that Let us consider the spectral decomposition of the density † † matrix Tr [LX ρX ] = Tr [(DD )(QQ ]) † † † = Tr [(Q D)(Q D) ] ≥ 0. (A2) ρ(Θ) = Z(Θ)−1Q(Θ)e−βΛ(Θ)Q−1(Θ), (B4)

Moreover, it is possible to show that log2 ZX ≥ 0. In where Λ(Θ) is the diagonal matrix of Laplacian’s eigen- N P −βλi(LX ) values λ (Θ) (i = 1, 2, ..., N) and the columns of the ma- fact, ZX = e , and λ1(LX ) = 0 because of the i i=1 trix Q(Θ) are the corresponding eigenvectors, that we Perron-Frobenius theorem. Therefore indicate by qi(Θ). Note that we do not indicate explic- N itly the dependence on L for sake of simplicity. In the X −βλi(LX ) ZX = 1 + e ≥ 1, (A3) Laplacian eigenbasis, Eq. (B2) reads i=2 > ∂ 1 (α) qi (Θ) ρ(Θ)qj(Θ) = [λi(Θ) + λj(Θ)]Σij from which log2 ZX ≥ 0. ∂θα 2 12 from which we obtain the entries of the symmetric loga- and δij is the Kronecker function. rithmic derivative [103] as

(α) (α) 2Φij (Θ) Σij = , (B5) λi(Θ) + λj(Θ) where

(α) ∂ > ∂ Φij (Θ) = λi(Θ)δij + (λj − λi)qi (Θ) qj(Θ) ∂θα ∂θα

[1] C. Shannon, Bell System Technical Journal 27, 379 [26] F. Radicchi and A. Arenas, Nature Physics 9, 717 (1948). (2013). [2] A. Borst and F. E. Theunissen, Nature neuroscience 2, [27] M. De Domenico, A. Sol´e-Ribalta, S. G´omez, and 947 (1999). A. Arenas, PNAS 111, 8351 (2014). [3] S. P. Strong, R. Koberle, R. R. d. R. van Steveninck, [28] A. Ac´ın,J. I. Cirac, and M. Lewenstein, Nature Physics and W. Bialek, Phys. Rev. Lett. 80, 197 (1998). 3, 256 (2007). [4] A. S. Holevo, Problemy Peredachi Informatsii 9, 3 [29] M. Cuquet and J. Calsamiglia, Physical review letters (1973). 103, 240503 (2009). [5] C. H. Bennett and P. W. Shor, IEEE transactions on [30] S. Perseguers, M. Lewenstein, A. Ac´ın, and J. I. Cirac, information theory 44, 2724 (1998). Nature Physics 6, 539 (2010). [6] V. Vedral, Reviews of Modern Physics 74, 197 (2002). [31] M. Faccin, T. Johnson, J. Biamonte, S. Kais, and [7] R. N. Mantegna, S. V. Buldyrev, A. L. Goldberger, P. Migdal,Phys. Rev. X 3, 041007 (2013). S. Havlin, C.-K. Peng, M. Simons, and H. E. Stanley, [32] M. Faccin, P. Migda l,T. H. Johnson, V. Bergholm, and Phys. Rev. Lett. 73, 3169 (1994). J. D. Biamonte, Phys. Rev. X 4, 041012 (2014). [8] P. Bernaola-Galv´an,I. Grosse, P. Carpena, J. L. Oliver, [33] D. Deutsch, Proc. Roy. Soc. Lond. A 400, 97 (1985). R. Rom´an-Rold´an, and H. E. Stanley, Physical Review [34] R. Horodecki, P. Horodecki, M. Horodecki, and Letters 85, 1342 (2000). K. Horodecki, Rev. Mod. Phys. 81, 865 (2009). [9] M. Rosvall and C. T. Bergstrom, PNAS 105, 1118 [35] M. M. Wilde, Quantum Information Theory (Cam- (2008). bridge University Press, 2013) cambridge Books Online. [10] A. V. Esquivel and M. Rosvall, Phys. Rev. X 1, 021025 [36] A. Wehrl, Reviews of Modern Physics 50, 221 (1978). (2011). [37] H. Araki and E. H. Lieb, Comm. Math. Phys. 18, 160 [11] M. Rosvall, A. V. Esquivel, A. Lancichinetti, J. D. (1970). West, and R. Lambiotte, Nature Communications 5, [38] M. Rosvall and C. T. Bergstrom, PNAS 104, 7327 4630 (2014). (2007). [12] M. De Domenico, A. Lancichinetti, A. Arenas, and [39] M. Dehmer, Applied Artificial Intelligence 22, 684 M. Rosvall, Phys. Rev. X 5, 011027 (2015). (2008). [13] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998). [40] N. L¨udtke, S. Panzeri, M. Brown, D. S. Broomhead, [14] A.-L. Barab´asiand R. Albert, Science 286, 509 (1999). J. Knowles, M. A. Montemurro, and D. B. Kell, Journal [15] R. Guimera and L. A. N. Amaral, Nature 433, 895 of The Royal Society Interface 5, 223 (2008). (2005). [41] G. Bianconi, P. Pin, and M. Marsili, PNAS 106, 11433 [16] G. Palla, I. Der´enyi, I. Farkas, and T. Vicsek, Nature (2009). 435, 814 (2005). [42] N. Eagle, M. Macy, and R. Claxton, Science 328, 1029 [17] C. Song, S. Havlin, and H. A. Makse, Nature 433, 392 (2010). (2005). [43] M. Rosvall and C. T. Bergstrom, PloS one 5, e8694 [18] V. Colizza, A. Flammini, M. A. Serrano, and A. Vespig- (2010). nani, Nature physics 2, 110 (2006). [44] S. Johnson, J. J. Torres, J. Marro, and M. A. Munoz, [19] M. Boguna, D. Krioukov, and K. C. Claffy, Nature Physical Review Letters 104, 108702 (2010). Physics 5, 74 (2009). [45] K. Anand, G. Bianconi, and S. Severini, Physical Re- [20] A. Vespignani, Nature Physics 8, 32 (2012). view E 83, 036109 (2011). [21] Y.-Y. Liu, J.-J. Slotine, and A.-L. Barab´asi, Nature [46] I. A. Kov´acs,R. Mizsei, and P. Csermely, Scientific 473, 167 (2011). Reports 5, 13786 (2015). [22] R. Albert, H. Jeong, and A.-L. Barab´asi,nature 406, [47] M. De Domenico, V. Nicosia, A. Arenas, and V. Latora, 378 (2000). Nature Communications 6, 6864 (2015). [23] D. S. Callaway, M. E. Newman, S. H. Strogatz, and [48] M. Kivel¨a,A. Arenas, M. Barthelemy, J. P. Gleeson, D. J. Watts, Phys. Rev. Lett. 85, 5468 (2000). Y. Moreno, and M. A. Porter, Journal of Complex Net- [24] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, works 2, 203 (2014). and S. Havlin, Nature 464, 1025 (2010). [49] S. Boccaletti, G. Bianconi, R. Criado, C. Del Genio, [25] J. Gao, S. V. Buldyrev, H. E. Stanley, and S. Havlin, J. G´omez-Garde˜nes, M. Romance, I. Sendi˜na-Nadal, Nature physics 8, 40 (2012). Z. Wang, and M. Zanin, Physics Reports 544, 1 (2014). 13

[50] M. De Domenico, C. Granell, M. A. Porter, and [77] M. De Domenico, S. Sasai, and A. Arenas, Frontiers in A. Arenas, Nature Physics (In Press) (2016), Neuroscience 10, 326 (2016). 10.1038/nphys3865. [78] N. Stanley, S. Shai, D. Taylor, and P. Mucha, IEEE [51] M. De Domenico, A. Sol`e-Ribalta,E. Cozzo, M. Kivel¨a, Transactions on Network Science and Engineering 3, 95 Y. Moreno, M. A. Porter, S. G`omez, and A. Arenas, (2016). Phys. Rev. X 3, 041022 (2013). [79] D. Taylor, S. Shai, N. Stanley, and P. J. Mucha, Phys. [52] E. Estrada and N. Hatano, Physical Review E 77, Rev. Lett. 116, 228301 (2016). 036111 (2008). [80] P. Bientinesi, I. S. Dhillon, and R. A. Van De Geijn, [53] E. Estrada, D. J. Higham, and N. Hatano, Physical SIAM Journal on Scientific Computing 27, 43 (2005). Review E 78, 026102 (2008). [81] R. Guimer`aand M. Sales-Pardo, PNAS 106, 22073 [54] E. Estrada and D. J. Higham, SIAM review 52, 696 (2009). (2010). [82] B. Karrer and M. E. Newman, Phys. Rev. E 83, 016107 [55] P. Grindrod, M. C. Parsons, D. J. Higham, and (2011). E. Estrada, Physical Review E 83, 046120 (2011). [83] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborov´a, [56] T. Ding and P. D. Schloss, Nature 509, 357 (2014). Phys. Rev. E 84, 066106 (2011). [57] E. Estrada and J. G´omez-Garde˜nes,Physical Review E [84] T. P. Peixoto and S. Bornholdt, Phys. Rev. Lett. 109, 89, 042819 (2014). 118703 (2012). [58] E. Estrada, N. Hatano, and M. Benzi, Physics reports [85] T. P. Peixoto, Phys. Rev. Lett. 110, 148701 (2013). 514, 89 (2012). [86] T. P. Peixoto, Phys. Rev. X 5, 011033 (2015). [59] S. L. Braunstein, S. Ghosh, and S. Severini, Annals of [87] M. E. J. Newman and A. Clauset, Nature Communica- Combinatorics 10, 291 (2006). tions 7, 11863 (2016). [60] P. Erd¨osand A. R´enyi, Publicationes Mathematicae [88] L. Zdeborov´a and F. Krzakala, arXiv:1511.02476 (Debrecen) 6, 290 (1959). (2015). [61] D. J. Watts and S. H. Strogatz, nature 393, 440 (1998). [89] C. R. Rao, Bull Calcutta. Math. Soc. 37, 81 (1945). [62] J. Eisert, M. Cramer, and M. B. Plenio, Reviews of [90] H. Jeffreys, Proc. Roy. Soc. of London A 186, 453 Modern Physics 82, 277 (2010). (1946). [63] F. Franchini, A. R. Its, and V. E. Korepin, Journal of [91] S.-I. Amari, Neural networks 8, 1379 (1995). Physics A Mathematical General 41, 025302 (2008). [92] S.-I. Amari, IEEE transactions on information theory [64] J. Biamonte, V. Bergholm, and M. Lanzagorta, Journal 47, 1701 (2001). of Physics A Mathematical General 46, 475301 (2013). [93] S. L. Braunstein and C. M. Caves, Physical Review Let- [65] J. C. Baez, ArXiv e-prints (2011), arXiv:1102.2098 ters 72, 3439 (1994). [quant-ph]. [94] P. Zanardi, P. Giorda, and M. Cozzini, Physical Review [66] B. Schumacher and M. D. Westmoreland, Contempo- Letters 99, 100603 (2007). rary Mathematics 305, 265 (2002). [95] A. T. Rezakhani, D. F. Abasto, D. A. Lidar, and P. Za- [67] A. Majtey, P. Lamberti, and D. Prato, Physical Review nardi, Physical Review A 82, 012321 (2010). A 72, 052310 (2005). [96] M. M. Rams and B. Damski, Physical Review Letters [68] P. Lamberti, A. Majtey, A. Borras, M. Casas, and 106, 055701 (2011). A. Plastino, Physical Review A 77, 052311 (2008). [97] H. Strobel, W. Muessel, D. Linnemann, T. Zibold, D. B. [69] J. Bri¨etand P. Harremo¨es,Physical review A 79, 052311 Hume, L. Pezz`e,A. Smerzi, and M. K. Oberthaler, (2009). Science 345, 424 (2014). [70] M. Gerlach, F. Font-Clos, and E. G. Altmann, Phys. [98] P. Hauke, M. Heyl, L. Tagliacozzo, and P. Zoller, Na- Rev. X 6, 021009 (2016). ture Physics 12, 778 (2016). [71] H. Akaike, in Second International Symposium on In- [99] C. E. Granade, C. Ferrie, N. Wiebe, and D. G. Cory, formation Theory, edited by B. N. Petrov and F. Csaki New Journal of Physics 14, 103013 (2012). (Akad´emiaiKiado, Budapest, 1973) pp. 267–281. [100] N. Wiebe, C. Granade, C. Ferrie, and D. Cory, Physical [72] J. J. Rissanen, IEEE transactions on information theory review letters 112, 190501 (2014). 42, 40 (1996). [101] C. Helstrom, Physics letters A 25, 101 (1967). [73] M. A. Pitt, I. J. Myung, and S. Zhang, Psychological [102] A. Monras and F. Illuminati, Physical Review A 81, review 109, 472 (2002). 062326 (2010). [74] S.-i. Amari and H. Nagaoka, Methods of information [103] L. Jing, J. Xiao-Xing, Z. Wei, and W. Xiao-Guang, geometry, Vol. 191 (American Mathematical Soc., 2007). Communications in Theoretical Physics 61, 45 (2014). [75] J. Rissanen, Automatica 14, 465 (1978). [76] P. W. Holland, K. B. Laskey, and S. Leinhardt, Social networks 5, 109 (1983).