EPJ manuscript No. (will be inserted by the editor)

Linear response theory for Google matrix

Klaus M. Frahm and Dima L. Shepelyansky
Laboratoire de Physique Théorique, IRSAMC, Université de Toulouse, CNRS, UPS, 31062 Toulouse, France

August 26, 2019

Abstract. We develop a linear response theory for the Google matrix PageRank algorithm with respect to a general weak perturbation, together with a numerically efficient and accurate algorithm, called the LIRGOMAX algorithm, to compute the linear response of the PageRank to this perturbation. We illustrate its efficiency on the example of the English Wikipedia network with more than 5 million articles (nodes). For a group of initial nodes (or simply a pair of nodes) this algorithm allows one to identify the effective pathway between the initial nodes, thus selecting a particular subset of nodes which are most sensitive to the weak perturbation applied to them (injection or pumping of probability at one node and absorption of probability at another node). The further application of the reduced Google matrix algorithm (REGOMAX) allows one to determine the effective interactions between the nodes of this subset. General linear response theory has already found numerous applications in various areas of science, including statistical and mesoscopic physics. On these grounds we argue that the developed LIRGOMAX algorithm will find broad applications in the analysis of complex directed networks.

PACS. XX.XX.XX No PACS code given

1 Introduction

Linear response theory finds a great variety of applications in statistical physics, stochastic processes, electron transport, current density correlations and dynamical systems (see e.g. [1,2,3,4,5]). In this work we apply the approach of linear response to Google matrices of directed networks with the aim to characterize nontrivial interactions between nodes.

The concept of the Google matrix and the related PageRank algorithm for the World Wide Web (WWW) was proposed by Brin and Page in 1998 [6]. A detailed description of the Google matrix construction and its properties is given in [7]. This approach can be applied to numerous situations and various directed networks [8].

Here we develop the LInear Response algorithm for GOogle MAtriX (LIRGOMAX), which applies to a very general model of a weakly perturbed Google matrix or the related PageRank algorithm. As a particular application we consider a model of injection and absorption at a small number of nodes of the network and test its efficiency on examples of the English Wikipedia network of 2017 [9]. However, the scope of the LIRGOMAX algorithm is more general. Thus, for example, it can also be applied to compute efficiently and accurately the PageRank sensitivity with respect to small modifications of individual elements of the Google matrix or its reduced version [10,11,12,13].

From a physical viewpoint the approach of injection/absorption corresponds to a small pumping of probability at a certain network node (or group of nodes) and absorption of probability at another specific node (or group of nodes). In a certain sense such a procedure is reminiscent of lasing in random media, where laser pumping at a certain frequency generates a response in a complex absorbing medium [14].

More specifically, we select two particular nodes, one for injection and one for absorption, for which we use the LIRGOMAX algorithm to determine a subset of the most sensitive nodes involved in a pathway between these two nodes. Furthermore, we apply to this subset of nodes the REduced GOogle MAtriX (REGOMAX) algorithm developed in [10,11] and obtain in this way an effective Google matrix description between the nodes of the found pathway.

In general the REGOMAX algorithm determines effective interactions between selected nodes of a certain relatively small subset embedded in a global huge network. Its efficiency was recently demonstrated for the Wikipedia networks of politicians [11] and world universities [12], the SIGNOR network of protein-protein interactions [13] and the multiproduct world trade network of UN COMTRADE [15].

In this work our aim is to provide a first illustration of the efficiency of the LIRGOMAX algorithm combined with the reduced Google matrix analysis. For this reason we restrict our considerations to the analytical description of the LIRGOMAX algorithm and the illustration of its application to two cases from the English Wikipedia network of 2017.

The paper is constructed as follows: in Section 2 we provide the analytical description of the LIRGOMAX algorithm, complemented by a brief description of the Google matrix construction and the REGOMAX algorithm; in Section 3 we present certain results for two examples of the Wikipedia network; the discussion of the results is given in Section 4. Additional data are also available at [16].

2 Theory of a weakly perturbed Google matrix

2.1 Google matrix construction

We first briefly recall the general construction of the Google matrix G from a directed network of N nodes. For this one first computes the adjacency matrix A_ij, with elements 1 if node j points to node i and zero otherwise. The matrix elements of G have the usual form G_ij = α S_ij + (1 − α)/N [6,7,8], where S is the matrix of Markov transitions with elements S_ij = A_ij / k_out(j), with k_out(j) = Σ_{i=1}^{N} A_ij ≠ 0 being the out-degree of node j (number of outgoing links), or S_ij = 1/N if j has no outgoing links (dangling node). The parameter 0 < α < 1 is the damping factor, with the usual value α = 0.85 [7] used here. We note that for the range 0.5 ≤ α ≤ 0.95 the results are not sensitive to α [7,8]. This corresponds to a model of a random surfer who, with probability α, follows at random one of the links available from the current node, or jumps with probability (1 − α) to an arbitrary other node of the network.

The right PageRank eigenvector of G is the solution of the equation GP = λP for the unit eigenvalue λ = 1 [6,7]. The PageRank values P(j) represent positive probabilities to find a random surfer on node j (Σ_j P(j) = 1). All nodes can be ordered by decreasing probability P, numbered by the PageRank index K = 1, 2, ..., N, with maximal probability at K = 1 and minimal at K = N. The numerical computation of P(j) is done efficiently with the PageRank iteration algorithm described in [6,7].

It is also useful to consider the original network with inverted direction of links. After inversion the Google matrix G* is constructed via the same procedure (using the transposed adjacency matrix) and its leading eigenvector P*, determined by G*P* = P*, is called CheiRank [17] (see also [8]). Its values P*(j) can again be ordered in decreasing order, resulting in the CheiRank index K*, with the highest value of P* at K* = 1 and the smallest value at K* = N. On average, high values of P (P*) correspond to nodes with many ingoing (outgoing) links [7,8].

2.2 Reduced Google matrix algorithm

The REGOMAX method is described in detail in [10,11,12,13]. For a given relatively small subset of Nr ≪ N nodes it allows one to compute efficiently a "reduced Google matrix" GR of size Nr × Nr that captures the full contributions of direct and indirect pathways, appearing in the full Google matrix G, between the Nr selected nodes of interest. The PageRank vector Pr of GR coincides with the full PageRank vector projected onto the subset of nodes, up to a constant multiplicative factor due to the sum normalization. The mathematical computation of GR provides a decomposition of GR into matrix components that clearly distinguish direct from indirect interactions: GR = Grr + Gpr + Gqr [11]. Here Grr is given by the direct links between the selected Nr nodes in the global G matrix with N nodes. Gpr is a rank-one matrix whose columns are rather close (up to a constant factor) to the reduced PageRank vector Pr. Even though the numerical weight of Gpr is typically quite large, it does not give much new interesting information about the reduced effective network structure.

The most interesting role is played by Gqr, which takes into account all indirect links between the selected nodes happening due to multiple pathways via the N global network nodes (see [10,11]). The matrix Gqr = Gqrd + Gqr^(nd) has diagonal (Gqrd) and non-diagonal (Gqr^(nd)) parts, with Gqr^(nd) describing indirect interactions between the selected nodes. The exact formulas and the numerical algorithm for an efficient computation of all three components of GR are given in [10,11]. It is also useful to compute the weights WR, Wpr, Wrr, Wqr of GR and its 3 matrix components Gpr, Grr, Gqr, given by the sum of all their elements divided by the matrix size Nr. Due to the column sum normalization of GR we obviously have WR = Wrr + Wpr + Wqr = 1.

2.3 General model of linear response

We consider a Google matrix G(ε) (with non-negative matrix elements satisfying the usual column sum normalization) depending on a small parameter ε, and a general stochastic process P(t+1) = G(ε) F(ε, P(t)), where F(ε, P) is a general function of ε and P which does NOT need to be linear in P. (Here P(t) denotes the time dependence of the vector P; below, and for the rest of this paper, we use the notation P(j) for the jth component of the vector P.)

Let E^T = (1, ..., 1) be the usual vector with unit entries. Then the condition of column sum normalization of G(ε) reads E^T G(ε) = E^T. The function F(ε, P) should satisfy the condition E^T F(ε, P) = 1 if E^T P = 1. At ε = 0 we also require that F(0, P) = P, i.e. F(0, ·) is the identity operation on P. We denote by G0 = G(0) the Google matrix at ε = 0 and by P0 its PageRank vector, such that G0 P0 = P0 with E^T P0 = 1. We denote by P the more general, ε-dependent, solution of

    P = G(ε) F(ε, P) ,   E^T P = 1 .    (1)

2.3.1 Pump model

As a first example we present the pump model, which models an injection and absorption scheme. For this we choose for

the Google matrix simply G(ε) = G0 (i.e. no ε-dependence for G) and

    F(ε, P) = (1 + εD)P / E^T[(1 + εD)P] = (1 + εD)P / (1 + ε e(P))    (2)

with e(P) = E^T DP and D being a diagonal matrix with entries Dj which are mostly zero, with a few positive values Dj > 0 for nodes j with injection and a few negative values Dj < 0 for nodes j with absorption. The non-vanishing diagonal entries Dj should be comparable (a global scaling factor can be absorbed in a redefinition of the parameter ε) and we have e(P) = Σ_{Dj ≠ 0} Dj P(j). Physically, we multiply each entry P(j) by the factor 1 + εDj (which is unity for most nodes j) and then we sum-normalize this vector to unity before we apply the Google matrix G0 to it.

2.3.2 PageRank sensitivity

As a second example we consider the PageRank sensitivity. For this we fix a pair (i, j) of indices, multiply the matrix element (G0)_ij by (1 + ε), and then sum-normalize the column j to unity, which provides the ε-dependent Google matrix G(ε). For the function F(ε, P) we simply choose the identity operation: F(ε, P) = P. In a more explicit formula we have:

    G_kl(ε) = (1 + ε δ_ki δ_lj)(G0)_kl / (1 + ε δ_lj (G0)_ij)   for all k, l    (3)

where δ_ki = 1 (or 0) if k = i (or k ≠ i). Note that the denominator is either 1 if l ≠ j, or the modified column sum 1 + ε (G0)_ij of column j if l = j. Then the sensitivity D(j→i)(k) is defined as:

    D(j→i)(k) = [P(k) − P0(k)] / [ε P0(k)]    (4)

where P is the ε-dependent PageRank of G(ε) computed in the usual way. We expect that this quantity has a well defined limit for ε → 0, but equation (4) is numerically not very precise for very small values of ε due to loss of precision. Below we present a method to compute the sensitivity in a precise way in the limit ε → 0.

Examples of the sensitivity analysis, using directly (4), were considered for the reduced Google matrix of sets of Wikipedia and other networks (see e.g. [12,15]).

2.4 Linear response

2.4.1 General scheme

One can directly numerically determine the ε-dependent solution P(ε) of (1) for some small but finite value of ε (by iterating P^(n+1) = G(ε) F(ε, P^(n)) with some suitable initial vector P^(0)) and compute the quantity

    ∆P(ε) = [P(ε) − P(0)] / ε .    (5)

We expect that ∆P(ε) has a finite, well defined limit for ε → 0. However, its direct numerical computation by (5) suffers from a loss of precision if ε is too small. In the following, we present a different scheme to compute ∆P which is numerically more accurate and stable, and which we call the linear response of the Google matrix. For this we expand G(ε) and F(ε, P) up to order ε^1 (neglecting terms ~ ε^2 or higher):

    G(ε) = G0 + ε G1 + ... ,   F(ε, P) = P + ε F1(P) + ...    (6)

Furthermore we write

    P(ε) = P0 + ε P1 + ... .    (7)

The usual sum-normalization conditions for the first order corrections read:

    E^T G1 = 0 ,   E^T F1(P) = 0 ,   E^T P1 = 0    (8)

if E^T P = E^T P0 = 1. These conditions imply that P1 and also F1(P) belong to the subspace "bi-orthogonal" to the PageRank, i.e. orthogonal to the left leading eigenvector of G0, which is just the vector E^T.

Inserting (6) and (7) into (1) we obtain (up to order ε^1):

    P = P0 + ε P1 = G0 P0 + ε [G0 P1 + G1 P0 + G0 F1(P0)] .    (9)

Comparing the terms of order ε^0 one obtains the usual unperturbed PageRank equation P0 = G0 P0. The terms of order ε^1 provide an inhomogeneous PageRank equation of the type:

    P1 = G0 P1 + V0 ,   V0 = G1 P0 + G0 F1(P0) .    (10)

The solution P1 of this equation is just the limit of (5):

    P1 = lim_{ε→0} ∆P(ε) = lim_{ε→0} [P(ε) − P(0)] / ε .    (11)

To solve (10) numerically, we first determine the unperturbed PageRank P0 of G0 in the usual way and compute V0, which depends on P0. Then we iterate the equation:

    P1^(n+1) = G0 P1^(n) + V0    (12)

where for the initial vector we simply choose P1^(0) = 0. This iteration converges to the vector P1 with the same speed as the usual PageRank algorithm, and it is numerically more accurate than the finite difference ∆P(ε) at some finite value of ε.

We remind that E^T V0 = Σ_j V0(j) = 0 and also E^T G0 = E^T. Therefore, if at a given iteration step the vector P1^(n) satisfies the condition E^T P1^(n) = 0, we also have E^T P1^(n+1) = E^T G0 P1^(n) + E^T V0 = E^T P1^(n) = 0. Therefore the conditions (8) are satisfied by the iteration equation (12), at least on a theoretical/mathematical level.
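As an illustration, the Google matrix construction of Section 2.1 and the iteration (12), together with the stabilizing projection discussed below, can be sketched in a few lines of Python (a minimal example on a small artificial network; the adjacency matrix and the vector V0 are arbitrary choices for the sketch, not data from the paper):

```python
import numpy as np

def google_matrix(A, alpha=0.85):
    """G = alpha*S + (1-alpha)/N from an adjacency matrix A (A[i,j]=1 if j -> i)."""
    N = A.shape[0]
    kout = A.sum(axis=0)                                  # out-degrees (column sums)
    S = np.where(kout > 0, A / np.maximum(kout, 1), 1.0 / N)
    return alpha * S + (1.0 - alpha) / N

def pagerank(G, tol=1e-13):
    """Unperturbed PageRank P0 by power iteration P <- G P (eigenvalue lambda = 1)."""
    P = np.full(G.shape[0], 1.0 / G.shape[0])
    while True:
        P_new = G @ P
        if np.abs(P_new - P).sum() < tol:
            return P_new / P_new.sum()
        P = P_new

def linear_response(G0, P0, V0, tol=1e-13):
    """Iteration (12), P1 <- G0 P1 + V0, with the projection Q(X) = X - (E^T X) P0
    applied after each step for numerical stability (see Eq. (13))."""
    P1 = np.zeros_like(P0)
    while True:
        P1_new = G0 @ P1 + V0
        P1_new -= P1_new.sum() * P0                       # remove the P0 component
        if np.abs(P1_new - P1).sum() < tol:
            return P1_new
        P1 = P1_new

# Small artificial directed network (4 nodes)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0]], dtype=float)
G0 = google_matrix(A)
P0 = pagerank(G0)
V0 = G0 @ np.array([1.0, 0.0, -1.0, 0.0])   # example inhomogeneous term with E^T V0 = 0
P1 = linear_response(G0, P0, V0)
```

At convergence the result satisfies the inhomogeneous PageRank equation (10), (1 − G0) P1 = V0, together with the sum condition E^T P1 = 0.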
However, rounding errors may produce slight violations of the conditions (8), and since such numerical errors contain a contribution in the direction of the unperturbed PageRank vector P0, corresponding to the eigenvector of G0 with maximal eigenvalue, they do not disappear during the iteration and might even (slightly) increase with n. Therefore, for purely numerical reasons, it is useful to remove such contributions by projecting the vector P1^(n+1), after each iteration step, onto the subspace bi-orthogonal to the PageRank:

    P1^(n+1) → Q(P1^(n+1)) ,   Q(X) = X − (E^T X) P0    (13)

where Q(X) is the projection operator applied to a vector X. It turns out that such an additional projection step indeed increases the quality and accuracy of the convergence of (12); even without it the iteration (12) converges numerically, but with a less accurate result.

It is interesting to note that one can "formally" solve (12) by:

    P1 = Σ_{n=0}^{∞} G0^n V0 = [1/(1 − G0)] V0    (14)

which can also be found directly from the first equation in (10). Strictly speaking the matrix inverse (1 − G0)^{−1} does not exist, since G0 always has one eigenvalue λ = 1. However, since E^T V0 = 0, the vector V0, when expanded in the basis of (generalized) eigenvectors of G0, does not contain a contribution of P0, which is the eigenvector for λ = 1, such that the expression (14) is actually well defined. From a numerical point of view, a different scheme to compute P1 would be to solve directly the linear system of equations (1 − G0) P1 = V0, where the first (or any other suitable) equation of this system is replaced by the condition E^T P1 = 0, resulting in a linear system with a well defined unique solution. Of course, such a direct computation is limited to modest matrix dimensions N for which a full matrix inversion is possible (typically N being a few multiples of 10^4), while the iterative scheme (12) is possible for rather large matrix dimensions for which the usual PageRank computation by the power method is possible. For example, for the English Wikipedia edition of 2017 with N ≈ 5 × 10^6 the iterative computation of P1 using (12) typically takes 2-5 minutes on a recent single processor core (e.g. an Intel i5-3570K CPU) without any use of parallelization, once the PageRank P0 is known. (The computation of P0 by the usual power method takes roughly the same time.)

2.4.2 Application to the pump model

For the injection and absorption scheme we can compute F1(P) from (2) as:

    F(ε, P) = (1 − ε e(P) + ...)(P + ε DP) = P + ε [DP − (E^T DP) P] + ... .    (15)

Here the term ~ ε is just the projection of DP onto the subspace bi-orthogonal to P. This projection is the manifestation, in first order in ε, of the renormalization used in (2).

Furthermore, since for the injection and absorption scheme we also have G1 = 0, the vector V0 in (10) and (12) becomes:

    V0 = G0 F1(P0) = G0 Q(DP0) = G0 DP0 − (E^T DP0) G0 P0
       = G0 DP0 − (E^T G0 DP0) P0 = Q(G0 DP0)    (16)

with Q being the projector given in (13). Here we have used that E^T G0 = E^T and G0 P0 = P0. This small calculation also shows that the projection operator can be applied before or after multiplying G0 with DP0.

2.4.3 Application to the sensitivity

In this case we have F(ε, P) = P, such that F1(P) = 0, and we have to determine G1 from the expansion G(ε) = G0 + εG1 + ... . Expanding (3) up to first order in ε we obtain:

    (G1)_kl = δ_ki δ_lj (G0)_kl − δ_lj (G0)_ij (G0)_kl   for all k, l    (17)

where (i, j) is the pair of indices for which we want to compute the sensitivity. Using G1 we compute V0 = G1 P0 and solve the inhomogeneous PageRank equation (10) iteratively as described above to obtain P1. Once P1 is known we can compute the sensitivity from:

    D(j→i)(k) = P1(k) / P0(k) .    (18)

We note that equation (18) is numerically accurate and corresponds to the exact limit ε → 0, while (4) is numerically not very precise and requires a finite small value of ε.

2.5 LIRGOMAX combined with REGOMAX

We consider the pump model described above and take two particular nodes: i with injection and j with absorption. For the diagonal matrix D we choose Di = 1/P0(i) and Dj = −1/P0(j), where P0 is the PageRank of the unperturbed network, and all other values Dk = 0. In this way we have e(P0) = E^T DP0 = Di P0(i) + Dj P0(j) = 0. Due to this the renormalization denominator in (2) is simply unity and all excess probability provided by the injection at node i will be exactly absorbed by the absorption at node j. We insist that this is only due to our particular choice of the matrix D; concerning the numerical procedure, one can also choose different values of Di or Dj with e(P0) ≠ 0 (which would result in some global excess probability being artificially injected or absorbed due to the normalization denominator in (2) being different from unity).

Using the above values of Di and Dj we compute the vector V0 = G0 DP0 = G0 W0 (since E^T DP0 = 0), where W0 = DP0 is a vector with only two non-zero components, W0(i) = 1 and W0(j) = −1. Therefore for all k we have V0(k) = (G0)_ki − (G0)_kj.
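This particular choice of D can be sketched as follows (a toy illustration with a hypothetical 3-node column-stochastic matrix; the matrix entries and the node indices i, j are arbitrary examples, not data from the paper):

```python
import numpy as np

# Hypothetical small column-stochastic Google matrix G0 (illustrative only)
G0 = np.array([[0.10, 0.30, 0.45],
               [0.50, 0.20, 0.35],
               [0.40, 0.50, 0.20]])

# Unperturbed PageRank P0 by power iteration
P0 = np.full(3, 1.0 / 3.0)
for _ in range(2000):
    P0 = G0 @ P0
P0 /= P0.sum()

# Injection at node i and absorption at node j, with D_i = 1/P0(i), D_j = -1/P0(j)
i, j = 0, 2
D = np.zeros(3)
D[i], D[j] = 1.0 / P0[i], -1.0 / P0[j]

W0 = D * P0              # W0 = D P0 has only W0(i) = +1 and W0(j) = -1
V0 = G0 @ W0             # V0 = G0 W0, valid since e(P0) = E^T D P0 = 0
```

With this choice e(P0) vanishes, and V0(k) reduces to the column difference (G0)_ki − (G0)_kj, as stated above.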

According to the above theory we know that V0 and W0 are orthogonal to E^T, i.e. E^T V0 = E^T W0 = 0, or more explicitly Σ_k V0(k) = Σ_k W0(k) = 0. For W0 the last equality is obvious, and the first one is due to the column sum normalization of G0, meaning that Σ_k (G0)_ki = Σ_k (G0)_kj = 1. Using the expression V0(k) = (G0)_ki − (G0)_kj, we determine the linear response correction P1 to the PageRank by solving iteratively the inhomogeneous PageRank equation (10) as described above. The vector P1 has real positive and negative entries, also satisfying the condition Σ_k P1(k) = 0. Then we determine the 20 top nodes with strongest negative values of P1 and a further 20 top nodes with strongest positive values of P1, which constitute a subset of 40 nodes that are the most significant nodes participating in the pathway between the pumping node i and the absorbing node j.

Using this subset we then apply the REGOMAX algorithm to compute the reduced Google matrix and its components, which are analyzed in a similar way as in [11]. The advantage of applying LIRGOMAX at the initial stage is that it provides an automatic procedure to determine an interesting subset of nodes related to the pumping between nodes i and j, instead of using an arbitrary heuristic choice for such a subset.

The question arises whether the initial two nodes i and j themselves belong to the subset of nodes with largest P1 entries (in modulus). From a physical point of view we indeed expect that this is generically the case, but there is no simple mathematical argument for this. In particular, for nodes with a low PageRank position and zero or few incoming links this is probably not the case. However, in the two examples which we present in the next section, both initial nodes i and j are indeed present in the selected subset, and even at rather top positions in the ranking (provided by ordering |P1|).

3 LIRGOMAX for Wikipedia network

As a concrete example we illustrate the application of the LIRGOMAX algorithm to the English Wikipedia network of 2017 (network data available at [9]). This network contains N = 5416537 nodes, corresponding to article titles, and 122232932 directed hyperlinks between nodes. Previous applications of the REGOMAX algorithm to the Wikipedia networks of the years 2013 and 2017 are described in [11,12].

3.1 Case of pathway Cambridge - Harvard Universities

As a first example of the application of the combined LIRGOMAX and REGOMAX algorithms we select two articles (nodes) of the Wikipedia network, with pumping at University of Cambridge and absorption at Harvard University. The global PageRank indices of these two nodes are K = 229 (PageRank probability P0(229) = 0.0001078) and K = 129 (PageRank probability P0(129) = 0.0001524). As described above, we choose the diagonal matrix D as D(229) = 1/P0(229) and D(129) = −1/P0(129) (other diagonal entries of D are chosen as zero) and determine the vector V0, used for the computation of P1 (see (12)), by V0 = G0 W0, where the vector W0 = DP0 has the nonzero components W0(229) = 1 and W0(129) = −1. Both W0 and V0 are orthogonal to the left leading eigenvector E^T = (1, ..., 1) of G0, according to the theory described in the last section.

The subset of the 40 most affected nodes, with the 20 strongest negative and the 20 strongest positive values of the linear response correction P1 to the initial PageRank P0, is given in Table 1. We order these 40 nodes by the index i = 1, ..., 20 for the first 20 most negative P1 values and then i = 21, ..., 40 for the most positive P1 values. The index KL is obtained by ordering |P1| for all N ≈ 5 × 10^6 network nodes.

Table 1. Top 20 nodes with strongest negative values of P1 (index number i = 1, ..., 20) and top 20 nodes with strongest positive values of P1 (index number i = 21, ..., 40), with P1 being the linear response of the PageRank of English Wikipedia 2017 with injection (or pumping) at University of Cambridge and absorption at Harvard University; KL is the ranking index obtained by ordering |P1| and K is the usual PageRank index obtained by ordering the PageRank probability P0 of the global network with N nodes.

i   KL   K       Node name
1   1    129     Harvard University
2   2    1608    Cambridge, Massachusetts
3   4    1       United States
4   5    296     Yale University
5   6    4617    Harvard College
6   7    62115   Harvard Yard
7   10   104359  Harvard Museum of Natural History
8   11   415     National Collegiate Athletic Ass.
9   12   52      The New York Times
10  13   75      American Civil War
11  14   7433    Harvard Medical School
12  15   73      American football
13  16   20994   Charles River
14  17   50      Washington, D.C.
15  18   23901   Harvard Divinity School
16  19   436     Massachusetts Institute of Tech.
17  20   88022   President and Fellows of Harvard Col.
18  21   128     Boston
19  22   42608   The Harvard Crimson
20  23   42259   Harvard Square
21  3    229     University of Cambridge
22  8    15      England
23  9    1414    Cambridge
24  69   1842    Trinity College, Cambridge
25  253  6591    St John's College, Cambridge
26  254  7022    King's College, Cambridge
27  256  285     Order of the British Empire
28  257  6       United Kingdom
29  258  33256   Newnham College, Cambridge
30  260  316     Church of England
31  262  238     The Guardian
32  263  21569   Clare College, Cambridge
33  264  4656    Durham University
34  265  191614  Regent House
35  266  3814    Chancellor (education)
36  267  16193   Gonville and Caius Col. Cambridge
37  270  25165   E. M. Forster
38  274  1650    Archbishop of Canterbury
39  277  2076    Fellow
40  278  73538   Colleges of the Univ. of Cambridge
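The selection of this 40-node subset amounts to a few sorting operations. A sketch (with a random zero-sum vector standing in for the actual linear response vector P1 of the Wikipedia network):

```python
import numpy as np

rng = np.random.default_rng(42)
P1 = rng.normal(size=1000)             # stand-in for the linear response vector
P1 -= P1.mean()                        # enforce sum_k P1(k) = 0, as in the theory

# Ranking index KL: nodes ordered by decreasing |P1| (KL = 1 is the largest)
KL_order = np.argsort(-np.abs(P1))

# Subset: the 20 strongest negative and the 20 strongest positive P1 values
neg_top = np.argsort(P1)[:20]          # index i = 1, ..., 20 of Table 1
pos_top = np.argsort(-P1)[:20]         # index i = 21, ..., 40 of Table 1
subset = np.concatenate([neg_top, pos_top])
```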

The table also gives the PageRank index K obtained by ordering P0. The first 4 positions in KL are taken by Harvard University; Cambridge, Massachusetts; University of Cambridge; and United States. Thus, even though the injection is made at University of Cambridge, the strongest response appears for Harvard University and Cambridge, Massachusetts, and only then for University of Cambridge (KL = 1, 2, 3). We attribute this to nontrivial flows existing in the global directed network. This shows that the linear response approach provides rather interesting information about the sensitivity and interactions of nodes in directed networks. We will see below, for other examples, that the top nodes of the linear response vector P1 can have rather unexpected features.

In general the most sensitive nodes of Table 1 are rather natural. They represent countries, cities and other administrative structures related to the two universities. Other types of nodes are Yale University, The New York Times and American Civil War for Harvard U, and Church of England, The Guardian and Durham University for U Cambridge (in addition to many Colleges presented in the list), corresponding to the closest other universities and also newspapers appearing on the pathway between the pair of selected nodes.

Of course, the linear response vector P1 extends over all N nodes of the global network. We show its dependence on the ordering index KL in Figure 1. Here the top panel represents the decay of |P1| with KL and the bottom panel shows the decay of the negative and positive P1 values for KL ≤ 10^3. We note that among the top 100 values of KL there are only 4 nodes related to U Cambridge with positive P1 values, while all other values of P1 are negative, being related to Harvard U. This demonstrates a rather different structural influence of these two universities.

Fig. 1. Linear response vector P1 of PageRank for the English Wikipedia 2017 with injection (or pumping) at University of Cambridge and absorption at Harvard University. Here KL is the ranking index obtained by ordering |P1| from the maximal value at KL = 1 down to its minimal value. The top panel shows |P1| versus KL in a double logarithmic representation for all N nodes. The bottom panel shows a zoom of P1 versus KL for KL ≤ 10^3 in a double logarithmic representation with sign; blue data points correspond to P1 > 0 and red data points to P1 < 0.

After the selection of the 40 most significant nodes of the pathway between both universities (see Table 1) we apply the REGOMAX algorithm, which determines all matrix elements of Markov transitions between these 40 nodes, including all direct and indirect pathways via the huge global Wikipedia network with 5 million nodes.

Fig. 2. Reduced Google matrix components GR, Gpr, Grr and Gqr for the English Wikipedia 2017 network and the subgroup of nodes given in Table 1, corresponding to injection at University of Cambridge and absorption at Harvard University (see text for explanations). The axis labels correspond to the index number i used in Table 1. The relative weights of these components are Wpr = 0.920, Wrr = 0.036, and Wqr = 0.044. Note that elements of Gqr may be negative. The values of the color bar correspond to sgn(g)(|g|/max|g|)^(1/4), where g is the shown matrix element value; the exponent 1/4 amplifies small values of g for better visibility.

The reduced Google matrix GR and its three components Gpr, Grr, Gqr are shown in Figure 2. As discussed above, the weight Wpr = 0.920 of Gpr is close to unity and its matrix structure is rather similar to that of GR, with strong transition lines of matrix elements corresponding to

United States at the top PageRank index K = 1 (i = 3, KL = 4) and United Kingdom at K = 6 (i = 28, KL = 257). The weights Wrr = 0.036 and Wqr = 0.044 of Grr and Gqr are significantly smaller. These values are similar to those obtained in the REGOMAX analysis of politicians and universities in Wikipedia networks [11,12]. Even if the weights of these matrix components are not large, they represent the most interesting and nontrivial direct (Grr) and indirect (Gqr) interactions between the selected 40 nodes. The image of Grr in Figure 2 shows that the direct links between the U Cambridge block of nodes (with index 21 ≤ i ≤ 40 in Table 1) and the Harvard U block of nodes (with index 1 ≤ i ≤ 20 in Table 1) are rather rare and relatively weak, while the links within each block are multiple and relatively strong. This confirms the appropriate selection of nodes in each block provided by the LIRGOMAX algorithm.

Fig. 3. Same as in Fig. 2 but for the matrix Grr + Gqr^(nd), where Gqr^(nd) is obtained from Gqr by putting its diagonal elements to zero; the weight of these two components is Wrr+qrnd = 0.066.

The matrix elements of the sum of the two components Grr + Gqr^(nd) (the component Gqr taken without its diagonal elements) are shown in Figure 3. We note that some elements are negative, which is not forbidden since only the sum of all three components, given by GR, must have positive matrix elements. However, the negative values are rare and relatively small compared to the positive matrix elements. Thus the minimal value is −0.00216, for the transition from Church of England to United States, while other typical negative values are smaller by a factor of 5-10. For comparison, the maximal positive element is 0.1135, from Regent House to University of Cambridge, and there are many other positive elements of the order of 0.03. Thus we consider that the negative elements play no significant role. A similar conclusion was also obtained for the interactions of politicians and universities in [11,12].

From Figure 3 we see that for Grr + Gqr^(nd) the strongest interactions are also inside each university block. However, there are still some significant links between the blocks, the strongest matrix elements being 0.0120 from Fellow to United States and 0.0050 from Harvard University to University of Cambridge (considering both directions between the blocks). The link Fellow to United States is also the strongest indirect link in its off-diagonal sub-block (for Gqr), while the strongest direct link (for Grr) is Fellow to Harvard University. Furthermore, the link Harvard University to University of Cambridge is also the strongest indirect link in its off-diagonal sub-block (for Gqr), while the strongest direct link (for Grr) is Harvard College to University of Cambridge.

Fig. 4. Network of friends for the subgroup of nodes given in Table 1, corresponding to injection at University of Cambridge and absorption at Harvard University, constructed from the matrix Grr + Gqr^(nd) using the 4 top (friend) links per column (see text for explanations). The numbers used as labels for the different nodes correspond to the index i of Table 1.

Using the transition matrix elements of Grr + Gqr^(nd) we construct a network of effective friends, shown in Figure 4. First, we select five initial nodes, which are placed on a (large) circle: the two nodes with injection/absorption (University of Cambridge and Harvard University) and three other nodes with rather top positions in the KL ranking (England, the town of Cambridge, and United States). For each of these five initial nodes we determine four friends by the criterion of largest matrix elements (in modulus) in the same column, i.e. corresponding to the four strongest links from the initial node to the potential friends.
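This friend-selection rule (the four strongest links per column, in modulus) can be sketched as follows (with a random matrix standing in for the actual Grr + Gqr^(nd) of the 40 selected nodes; the initial node positions are the zero-based indices i of the five circle nodes in Table 1):

```python
import numpy as np

def top_friends(M, node, n_friends=4):
    """Indices of the n_friends strongest links from `node`, i.e. the
    largest elements in modulus of column `node`, excluding the self-link."""
    strength = np.abs(M[:, node]).astype(float)
    strength[node] = -1.0                      # exclude the node itself
    return list(np.argsort(-strength)[:n_friends])

# Random 40x40 stand-in for the matrix Grr + Gqr^(nd)
rng = np.random.default_rng(3)
M = rng.normal(scale=0.01, size=(40, 40))

# Zero-based Table 1 indices of the five initial nodes: University of Cambridge,
# Harvard University, England, the town of Cambridge, United States
initial_nodes = [20, 0, 21, 22, 2]
level1 = {v: top_friends(M, v) for v in initial_nodes}                 # thick black arrows
level2 = {f: top_friends(M, f) for fs in level1.values() for f in fs}  # thin red arrows
```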
corresponding to the four and there are many other positive elements of the order strongest links from the initial node to the potential friends. of 0.03. Thus we consider that the negative elements play The friend nodes found in this way are added to the net- no significant role. A similar conclusion was also obtained work and drawn on circles of medium size around their for the interactions of politicians and universities in [11, initial node (if they do not already belong to the initial 12]. set of 5 top nodes). The links from the initial nodes to 8 K.M. Frahm and D.L. Shepelyansky: Linear response theory for Google matrix their friends are drawn as thick black arrows. For each of Table 2. Same as in Table 1 for English Wikipedia 2017 with the newly added nodes (level 1 friends) we continue to de- injection (pumping) at Napoleon and absorption at Alexander termine the four strongest friends (level 2 friends) which I of Russia. are drawn on small circles and added to the network (if there are not already present from a previous level). The corresponding links from level 1 friends to level 2 friends i KL K Node name are drawn as thin red arrows. 1 1 181 Russian Empire 2 2 216 Saint Petersburg Each node is marked by the index i from the first col- 3 3 5822 Alexander I of Russia umn of Table 1. The colors of the nodes are essentially red 4 4 15753 Paul I of Russia 5 5 3409 Catherine the Great for nodes with strong negative values of P1 (corresponding 6 6 158 Moscow to the index i = 1,..., 20) and blue for nodes with strong 7 7 17 Russia 8 8 153 Azerbaijan positive values of P1 (for i = 21,..., 40). Only for three 9 9 9035 Nicholas I of Russia of the initial nodes we choose different colors which are 10 10 203707 Elizabeth Alexeievna (Louise of Baden) olive for US, green for England and cyan for (the town of) 11 11 92 Ottoman Empire 12 12 7 Iran Cambridge. 
13 13 177854 Government reform of Alexander I The network of Figure 4 shows a quite clear separa- 14 14 889 Caucasus 15 15 31784 Russo-Persian War (1804–13) tion of network nodes in two blocks associated to the two 16 16 8213 Alexander II of Russia universities with a rather small number of links between 17 17 475 Prussia the two blocks (e.g. US is a friend of England but not 18 18 5764 Dagestan 19 19 32966 Serfdom in Russia vice-versa). 20 20 131205 Adam Jerzy Czartoryski 21 29 201 Napoleon 22 52 192 French Revolution 23 144 4 France 3.2 Case of pathway Napoleon - Alexander I of Russia 24 149 12 Italy 25 167 10611 French Directory 26 180 24236 Joséphine de Beauharnais We illustrate the application of the LIRGOMAX and RE- 27 188 7361 National Convention 28 189 21727 French campaign in Egypt and Syria GOMAX algorithms on two other nodes of the Wikipedia 29 195 2237 Corsica network with injection (pumping) at Napoleon and ab- 30 198 3166 Louis XVI of France 31 199 7875 Saint Helena sorption at Alexander I of Russia. The global PageRank 32 200 40542 André Masséna indices of these two nodes are K = 201 (PageRank proba- 33 201 3353 French Revolutionary Wars 34 203 11916 Maximilien Robespierre bility P0(201) = 0.0001188) and K = 5822 (PageRank −5 35 204 1241 Louvre probability P0(5822) = 1.389 × 10 ) respectively. In 36 205 69382 Lucien Bonaparte contrast to the the previous example the two PageRank 37 206 21754 Coup of 18 Brumaire 38 207 15926 French Republican Calendar probabilities are rather different. However, this difference 39 208 14931 Jacobin is compensated by our choice of the diagonal matrix D 40 209 7509 Napoleonic Code with D(201) = 1/P0(201) and D(5822) = −1/P0(5822) (other diagonal entries of D begin zero). Again we deter- mine the vector V0 used for the computation of P1 (see (12)) by V0 = G0 W0 where the vector W0 = DP0 has the historical figures related in some manner to Napoleon or nonzero components W0(201) = 1 and W0(5822) = −1. 
Alexander I of Russia. Furthermore, both W0 and V0 are orthogonal to the left The dependence of the linear response vector P1 on the T leading eigenvector E = (1,..., 1) of G0. index KL is shown in Figure 5 (analogous to Figure 1). The top nodes of P1, noted by index i, with 20 strongest The decay of |P1| with KL is shown in the top panel, negative and 20 strongest positive values are presented in being similar to the top panel of Figure 1. The values of Table 2. The ranking of nodes in decreasing order of |P1| P1 with sign are shown in the bottom panel. The difference given by the index KL is shown in the second column of of the |P1| values for Napoleon and Alexander I of Russia Table 2. It is interesting to note that the injection node is not so significant but many nodes (28) from the block Napoleon is only at position KL = 29 with a significantly of Alexander I of Russia have larger |P1| values than |P1| smaller value of |P1| compared to Alexander I of Russia at of Napoleon. KL = 3, Russian Empire at KL = 1 and Saint Petersburg The results for the reduced Google matrix of 40 nodes at KL = 2. But among the positive P1 values Napoleon of Table 2 are shown in Figure 6. The strongest lines of is still at the first position. We attribute this relatively transitions in GR and Gpr correspond to nodes with top small |P1| value of Napoleon compared to the nodes of PageRank positions of the global Wikipedia network being the other block to significant complex directed flows in France at K = 4 (i = 23,KL = 144), Iran at K = 7 (i = the global Wikipedia network. Also Napoleon has a signif- 12,KL = 12), Italy at K = 12 (i = 24,KL = 149) icantly stronger PageRank probability and thus this node and Russia at K = 17 (i = 7,KL = 7). As explained produces a stronger influence on Alexander I of Russia above the structure of transitions appears rather similar than vice-versa. between GR and Gpr. 
The weights of all three components In contrast to the previous case of universities Table 2 Gpr, Grr, Gqr are similar to those of the two universities contains mainly countries, a few towns and islands, and (see caption of Figure 6). K.M. Frahm and D.L. Shepelyansky: Linear response theory for Google matrix 9
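The injection/absorption construction used in both examples (the diagonal matrix D, the vectors W0 = D P0 and V0 = G0 W0, and the resulting linear response vector P1) can be sketched in a few lines of Python. This is a minimal dense-matrix illustration under our reading of the text, not the authors' large-scale implementation; since Eq. (12) is not reproduced in this excerpt, we assume P1 is obtained by summing the geometric series over G0^t V0, which converges because V0 is orthogonal to the left leading eigenvector E^T = (1,...,1) of G0.

```python
import numpy as np

def google_matrix(A, alpha=0.85):
    """Toy Google matrix G0 from an adjacency matrix A: columns are
    normalized to sum to 1, dangling columns are replaced by uniform
    columns, and the damping factor alpha is applied."""
    N = A.shape[0]
    S = A.astype(float).copy()
    for j in range(N):
        c = S[:, j].sum()
        S[:, j] = S[:, j] / c if c > 0 else 1.0 / N
    return alpha * S + (1.0 - alpha) / N

def pagerank(G, tol=1e-12):
    """Power iteration for the PageRank vector P0 satisfying G P0 = P0."""
    p = np.ones(G.shape[0]) / G.shape[0]
    diff = 1.0
    while diff > tol:
        p_new = G @ p
        diff = np.abs(p_new - p).sum()
        p = p_new
    return p

def linear_response(G0, i_inj, i_abs, tol=1e-12):
    """Sketch of the linear response vector P1 for injection at node
    i_inj and absorption at node i_abs.  With D(i_inj) = 1/P0(i_inj)
    and D(i_abs) = -1/P0(i_abs), the vector W0 = D P0 simply has the
    nonzero components +1 and -1; V0 = G0 W0 then sums to zero and
    P1 = sum_t G0^t V0 converges on this zero-sum subspace."""
    W0 = np.zeros(G0.shape[0])
    W0[i_inj] = 1.0   # = D(i_inj) * P0(i_inj)
    W0[i_abs] = -1.0  # = D(i_abs) * P0(i_abs)
    V0 = G0 @ W0      # orthogonal to E^T = (1,...,1) by construction
    P1 = V0.copy()
    term = V0.copy()
    while np.abs(term).sum() > tol:
        term = G0 @ term
        P1 += term
    return P1
```

For the English Wikipedia network with more than 5 million nodes the same steps would of course be carried out with sparse matrix-vector products rather than dense matrices.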

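The reduced Google matrix GR and the component weights quoted in the figure captions (e.g. Wpr = 0.900, Wrr = 0.042 and Wqr = 0.058 in the caption of Figure 6, which add up to unity) can be illustrated as follows. The stochastic-complement form of GR and the weight normalization (the sum of all elements of a component divided by the subset size N_r) follow our reading of the REGOMAX papers [10,11]; the function names are ours and this is a sketch, not the published implementation.

```python
import numpy as np

def reduced_google_matrix(G, subset):
    """Effective Google matrix GR on a subset of nodes r, using the
    stochastic-complement form GR = Grr + Grs (1 - Gss)^{-1} Gsr:
    Grr holds the direct transitions inside the subset, and the second
    term sums all indirect pathways through the complementary
    'scattering' nodes s."""
    r = np.asarray(subset)
    s = np.setdiff1d(np.arange(G.shape[0]), r)
    Grr = G[np.ix_(r, r)]
    Grs = G[np.ix_(r, s)]
    Gsr = G[np.ix_(s, r)]
    Gss = G[np.ix_(s, s)]
    return Grr + Grs @ np.linalg.solve(np.eye(len(s)) - Gss, Gsr)

def component_weight(G_S):
    """Weight W_S of an N_r x N_r component G_S: the sum of all its
    elements divided by N_r.  Since the columns of GR sum to 1, the
    weights of its components satisfy Wpr + Wrr + Wqr = 1."""
    return G_S.sum() / G_S.shape[0]

def without_diagonal(G_S):
    """G_S^(nd): the component with its diagonal set to zero, as used
    for Gqr in the combination Grr + Gqr^(nd) of Figs. 3 and 7."""
    out = G_S.copy()
    np.fill_diagonal(out, 0.0)
    return out
```

The stochastic complement of a column-stochastic matrix is again column-stochastic, so the weight of the full GR equals 1 by construction.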

Fig. 5. Same as in Fig. 1 for the subgroup of Table 2 corresponding to injection at Napoleon and absorption at Alexander I of Russia.

Fig. 6. As Fig. 2 for the subgroup of Table 2 corresponding to injection at Napoleon and absorption at Alexander I of Russia. The relative weights of the different matrix components are Wpr = 0.900, Wrr = 0.042 and Wqr = 0.058.

The components Grr and Gqr, shown in Figure 6, are also dominated by the two diagonal blocks related to the two initial nodes Napoleon and Alexander I of Russia. There are only a few direct links between the two blocks, but the number of indirect links is substantially increased. The sum of these two components, Grr + Gqr^(nd), where the diagonal elements of Gqr are omitted, is shown in Figure 7. The strongest couplings between the two blocks in Grr + Gqr^(nd) are 0.009879 for the link from French campaign in Egypt and Syria to Ottoman Empire and 0.02439 for the link from Elizabeth Alexeievna (Louise of Baden) to Napoleon (for both directions between the diagonal blocks).

Fig. 7. Same as in Fig. 3 for the subgroup of Table 2 corresponding to injection at Napoleon and absorption at Alexander I of Russia. The weight of Grr + Gqr^(nd) is Wrr+qrnd = 0.087.

In analogy to Figure 4 we construct the network of friends for the subset of Table 2, shown in Figure 8. As in Figure 4, we use the four strongest transition matrix elements of Grr + Gqr^(nd) per column to construct links from the five top nodes to level 1 friends (thick black arrows) and from level 1 friends to level 2 friends (thin red arrows).
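The two-level friend construction described above (four strongest links per column, thick level 1 arrows and thin level 2 arrows) can be sketched as follows; this is a simplified illustration with our own function names, not the code used to draw the published figures.

```python
import numpy as np

def top_friends(M, j, n_friends=4):
    """Indices of the n_friends largest elements |M[i, j]| (in modulus)
    of column j, i.e. the strongest links from node j to potential
    friends; self-links are excluded."""
    col = np.abs(M[:, j]).astype(float)
    col[j] = -1.0  # exclude the node itself
    return [int(i) for i in np.argsort(col)[::-1][:n_friends]]

def friend_network(M, initial_nodes, n_friends=4):
    """Two-level friend network built from a matrix such as
    Grr + Gqr^(nd): thick links go from each initial node to its
    level 1 friends, thin links from level 1 friends to their own
    level 2 friends.  Returns (thick_edges, thin_edges) as lists of
    (source, friend) pairs."""
    thick, thin = [], []
    level1 = set()
    for j in initial_nodes:
        for i in top_friends(M, j, n_friends):
            thick.append((j, i))
            if i not in initial_nodes:
                level1.add(i)
    for j in sorted(level1):
        for i in top_friends(M, j, n_friends):
            thin.append((j, i))
    return thick, thin
```

For the figures of the paper the initial nodes would be the five top nodes discussed in the text, with n_friends = 4.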
As the five initial top nodes we choose France (cyan), Russian Empire (olive), Saint Petersburg (green), Napoleon (blue) and Alexander I of Russia (red); all other nodes of the Napoleon block (21 ≤ i ≤ 40 in Table 2) are shown in blue, and all other nodes of the Alexander I of Russia block (1 ≤ i ≤ 20 in Table 2) are shown in red; the numbers inside the points correspond to the index i of Table 2.

Fig. 8. Same as in Fig. 4 for the subgroup of Table 2 corresponding to injection at Napoleon and absorption at Alexander I of Russia.

The network of Figure 8 also shows a clear two-block structure with relatively rare links between the two blocks. The coupling between the two blocks appears due to one link from Alexander I of Russia to Prussia, which, even being red, is more closely related to the blue block of nodes.

For both examples, network figures constructed in the same way using the other matrix components GR, Grr or Gqr^(nd) (instead of Grr + Gqr^(nd)), or using the strongest matrix elements in rows (instead of columns) to determine follower networks, are available at [16]. Friend/follower effective network schemes of this type, constructed from the reduced Google matrix (or one of its components), were already presented in [11] in the context of Wikipedia networks of politicians.

4 Discussion

We introduced here a linear response theory for a very generic model where either the Google matrix or the associated Markov process depends on a small parameter, and we developed the LIRGOMAX algorithm to compute efficiently and accurately the linear response vector P1 of the PageRank P0 with respect to this parameter for large directed networks. As a particular application of this approach, it is possible to identify the most important and sensitive nodes of the pathway connecting two initial groups of nodes (or simply a pair of nodes) with injection or absorption of probability. This group of most sensitive nodes can then be analyzed with the reduced Google matrix approach by the related REGOMAX algorithm, which allows to determine the effective indirect network interactions for this set of nodes. We illustrated the efficiency of the combined LIRGOMAX and REGOMAX algorithms for the English Wikipedia network of 2017 with two very interesting examples. In these examples, we use two initial nodes (articles) for injection/absorption, corresponding either to two important universities or to two related historical figures. As a result we obtain the associated sets of most sensitive Wikipedia articles, given in Tables 1 and 2, with effective friend networks shown in Figures 4 and 8.

As a further independent application, the LIRGOMAX algorithm also allows to compute more accurately the PageRank sensitivity with respect to variations of matrix elements of the (reduced) Google matrix, as already studied in [12,15].

It is known that linear response theory finds a variety of applications in statistical and mesoscopic physics [1,3], current density correlations [4], stochastic processes and dynamical chaotic systems [2,5]. Matrix properties and concepts, like Random Matrix Theory (RMT), find important applications for various physical systems (see e.g. [18]). However, in physics one usually works with unitary or Hermitian matrices, as in RMT. In contrast, Google matrices belong to another class of matrices, rarely appearing in physical systems but very natural for the communication networks developed by modern societies (WWW, Wikipedia, Twitter ...). Thus we hope that the linear response theory for the Google matrix developed here will also find useful applications in the analysis of real directed networks.

Acknowledgments

This work was supported in part by the Programme Investissements d'Avenir ANR-11-IDEX-0002-02, reference ANR-10-LABX-0037-NEXT (project THETRACOM); it was granted access to the HPC resources of CALMIP (Toulouse) under the allocation 2019-P0110.

References

1. R. Kubo, Statistical-mechanical theory of irreversible processes. I. General theory and simple applications to magnetic and conduction problems, J. Phys. Soc. Japan 12(6), 570 (1957)
2. P. Hanggi and H. Thomas, Stochastic processes: time evolution, symmetries and linear response, Phys. Rep. 88, 207 (1982)
3. H.U. Baranger and A.D. Stone, Electrical linear-response theory in an arbitrary magnetic field: a new Fermi-surface formulation, Phys. Rev. B 40, 8159 (1989)
4. G. Vignale and W. Kohn, Current-dependent exchange-correlation potential for dynamical linear response theory, Phys. Rev. Lett. 77, 2037 (1996)
5. D. Ruelle, A review of linear response theory for general differentiable dynamical systems, Nonlinearity 22, 855 (2009)
6. S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems 30, 107 (1998)
7. A.M. Langville and C.D. Meyer, Google's PageRank and beyond: the science of search engine rankings, Princeton University Press, Princeton (2006)
8. L. Ermann, K.M. Frahm and D.L. Shepelyansky, Google matrix analysis of directed networks, Rev. Mod. Phys. 87, 1261 (2015)

9. K.M. Frahm and D.L. Shepelyansky, Wikipedia networks of 24 editions of 2017, Available: http://www.quantware.ups-tlse.fr/QWLIB/24wiki2017/. Accessed August (2019)
10. K.M. Frahm and D.L. Shepelyansky, Reduced Google matrix, arXiv:1602.02394[physics.soc] (2016)
11. K.M. Frahm, K. Jaffres-Runser and D.L. Shepelyansky, Wikipedia mining of hidden links between political leaders, Eur. Phys. J. B 89, 269 (2016)
12. C. Coquide, J. Lages and D.L. Shepelyansky, World influence and interactions of universities from Wikipedia networks, Eur. Phys. J. B 92, 3 (2019)
13. J. Lages, D.L. Shepelyansky and A. Zinovyev, Inferring hidden causal relations between pathway members using reduced Google matrix of directed biological networks, PLoS ONE 13(1), e0190812 (2018)
14. H. Cao, Lasing in random media, Waves Random Media 13, R1 (2003)
15. C. Coquide, L. Ermann, J. Lages and D.L. Shepelyansky, Influence of petroleum and gas trade on EU economies from the reduced Google matrix analysis of UN COMTRADE data, Eur. Phys. J. B 92, 171 (2019)
16. K.M. Frahm and D.L. Shepelyansky, LIRGOMAX algorithm, Available: http://www.quantware.ups-tlse.fr/QWLIB/lirgomax. Accessed August (2019)
17. A.D. Chepelianskii, Towards physical laws for software architecture, arXiv:1003.5455 [cs.SE] (2010)
18. T. Guhr, A. Müller-Groeling and H.A. Weidenmüller, Random-matrix theories in quantum physics: common concepts, Phys. Rep. 299, 189 (1998)