Google Matrix Analysis of Directed Networks

Google matrix analysis of directed networks Leonardo Ermann Departamento de F´ısicaTeórica,GIyA, ComisiónNacional de Energ´ıaAtómica,Buenos Aires, Argentina Klaus M. Frahm and Dima L. Shepelyansky Laboratoire de Physique Théorique du CNRS, IRSAMC, Universitéde Toulouse, UPS, 31062 Toulouse, France (Dated: July 25, 2014; Revised: January 27, 2015) In the past decade modern societies have developed enormous communication and social networks. Their classification and information retrieval processing has become a formidable task for the society. Due to the rapid growth of the World Wide Web, and social and communication networks, new mathematical methods have been invented to characterize the properties of these networks in a more detailed and precise way. Various search engines use extensively such methods. It is highly important to develop new tools to classify and rank massive amount of network information in a way that is adapted to internal network structures and characteristics. This review describes the Google matrix analysis of directed complex networks demonstrating its efficiency using various examples including World Wide Web, Wikipedia, software architectures, world trade, social and citation networks, brain neural networks, DNA sequences and Ulam networks. The analytical and numerical matrix methods used in this analysis originate from the fields of Markov chains, quantum chaos and Random Matrix theory. Keywords: Markov chains, World Wide Web, search engines, complex networks, PageRank, 2DRank, CheiRank \The Library exists ab aeterno." B. Universal emergence of PageRank 18 Jorge Luis Borges The Library of Babel C. Two-dimensional ranking for University networks 20 IX. Wikipedia networks 20 A. Two-dimensional ranking of Wikipedia articles 20 Contents B. Spectral properties of Wikipedia network 21 C. Communities and eigenstates of Google matrix 22 I. Introduction 2 D. Top people of Wikipedia 23 E. Multilingual Wikipedia editions 23 II. Scale-free properties of directed networks 3 F. Networks and entanglement of cultures 26 III. Construction of Google matrix and its properties 4 X. Google matrix of social networks 27 A. Construction rules 4 A. Twitter network 28 B. Markov chains and Perron-Frobenius operators 5 B. Poisson statistics of PageRank probabilities 29 C. Invariant subspaces 5 D. Arnoldi method for numerical diagonalization 6 XI. Google matrix analysis of world trade 30 E. General properties of eigenvalues and eigenstates 7 A. Democratic ranking of countries 30 B. Ranking of countries by trade in products 31 IV. CheiRank versus PageRank 7 C. Ranking time evolution and crises 32 A. Probability decay of PageRank and CheiRank 7 D. Ecological ranking of world trade 32 B. Correlator between PageRank and CheiRank 8 E. Remarks on world trade and banking networks 36 C. PageRank-CheiRank plane 9 D. 2DRank 9 XII. Networks with nilpotent adjacency matrix 37 E. Historical notes on spectral ranking 9 A. General properties 37 V. Complex spectrum and fractal Weyl law 9 B. PageRank of integers 37 C. Citation network of Physical Review 39 arXiv:1409.0428v3 [physics.soc-ph] 13 Jul 2016 VI. Ulam networks 10 A. Ulam method for dynamical maps 11 XIII. Random matrix models of Markov chains 41 B. Chirikov standard map 11 A. Albert-Barabásimodel of directed networks 41 C. Dynamical maps with strange attractors 12 B. Random matrix models of directed networks 42 D. Fractal Weyl law for Perron-Frobenius operators 13 C. Anderson delocalization of PageRank? 43 E. Intermittency maps 14 F. Chirikov typical map 14 XIV. Other examples of directed networks 44 A. Brain neural networks 44 VII. Linux Kernel networks 15 B. Google matrix of DNA sequences 46 A. Ranking of software architecture 15 C. Gene regulation networks 49 B. Fractal dimension of Linux Kernel Networks 16 D. Networks of game go 50 E. Opinion formation on directed networks 50 VIII. WWW networks of UK universities 17 A. Cambridge and Oxford University networks 17 XV. Discussion 52 2 XVI. Acknowledgments 53 culiar features of such matrices in the limit of large matrix size corresponding to many-body quantum systems References 53 (Guhr et al., 1998), quantum computers (Shepelyansky, 2001) and a semiclassical limit of large quantum num- bers in the regime of quantum chaos (Haake, 2010). In I. INTRODUCTION contrast to the Hermitian problem, the Google matrices of directed networks have complex eigenvalues. The only In the past ten years, modern societies have devel- physical systems where similar matrices had been stud- oped enormous communication and social networks. The ied analytically and numerically correspond to models World Wide Web (WWW) alone has about 50 billion in- of quantum chaotic scattering whose spectrum is known dexed web pages, so that their classification and informa- to have such unusual properties as the fractal Weyl law tion retrieval processing becomes a formidable task. Var- (Gaspard, 2014; Nonnenmacher and Zworski, 2007; She- ious search engines have been developed by private com- pelyansky, 2008; Sjöstrand, 1990; Zworski, 1999). panies such as Google, Yahoo! and others which are extensively used by Internet users. In addition, social networks (Facebook, LiveJournal, Twitter, etc) have gained huge popularity in the last few years. In addition, use of social networks has spread beyond their initial purpose, making them important for political or social events. To handle such massive databases, fundamental mathematical tools and algorithms related to centrality mea- sures and network matrix properties are actively being developed. Indeed, the PageRank algorithm, which was initially at the basis of the development of the Google search engine (Brin and Page, 1998; Langville and Meyer, 2006), is directly linked to the mathematical properties of Markov chains (Markov, 1906) and Perron-Frobenius FIG. 1 (Color online) Google matrix of the network operators (Brin and Stuck, 2002; Langville and Meyer, Wikipedia English articles for Aug 2009 in the basis of PageR- 0 2006). Due to its mathematical foundation, this algo- ank index K (and K ). Matrix GKK0 corresponds to x rithm determines a ranking order of nodes that can be (and y) axis with 1 ≤ K; K0 ≤ 200 on panel (a), and with applied to various types of directed networks. However, 1 ≤ K; K0 ≤ N on panel (b); all nodes are ordered by PageR- ank index K of matrix G and thus we have two matrix indexes the recent rapid development of WWW and communi- 0 cation networks requires the creation of new tools and K; K for matrix elements in this basis. Panel (a) shows the first 200 × 200 matrix elements of G matrix (see Sec. III). algorithms to characterize the properties of these net- Panel (b) shows density of all matrix elements coarse-grained works on a more detailed and precise level. For example, on 500×500 cells where its elements, GK;K0 , are written in the such networks contain weakly coupled or secret commu- PageRank basis K(i) with indexes i ! K(i) (in x-axis) and nities which may correspond to very small values of the j ! K0(j) (in a usual matrix representation with K = K0 = 1 PageRank and are hard to detect. It is therefore highly on the top-left corner). Color shows the density of matrix ele- important to have new methods to classify and rank hige ments changing from black for minimum value ((1− α)=N) to amounts of network information in a way adapted to in- white for maximum value via green (gray) and yellow (light ternal network structures and characteristics. gray); here the damping factor is α = 0:85 After (Ermann This review describes matrix tools and algorithms et al., 2012a). which facilitate classification and information retrieval from large networks recently created by human activity. In this review we present an extensive analysis of a The Google matrix, formed by links of the network has, is variety of Google matrices emerging from real networks typically huge (a few tens of billions of webpages). Thus, in various sciences including WWW of UK universities, the analysis of its spectral properties including complex Wikipedia, Physical Review citation network, Linux Ker- eigenvalues and eigenvectors represents a challenge for nel network, world trade network from the UN COM- analytical and numerical methods. It is rather surpris- TRADE database, brain neural networks, networks of ing, but the class of such matrices, which belong to the DNA sequences and many others. As an example, the class of Markov chains and Perron-Frobenius operators, Google matrix of the Wikipedia network of English arti- has been essentially overlooked in physics. Indeed, phys- cles (2009) is shown in Fig. 1. We demonstrate that the ical problems typically belong to the class of Hermitian analysis of the spectrum and eigenstates of a Google ma- or unitary matrices. Their properties have been actively trix of a given network provides a detailed understanding studied in the frame of Random Matrix Theory (RMT) about the information flow and ranking. We also show (Akemann et al., 2011; Guhr et al., 1998; Mehta, 2004) that such types of matrices naturally appear for Ulam and quantum chaos (Haake, 2010). The analytical and networks of dynamical maps (Frahm and Shepelyansky, numerical tools developed in these research fields have 2012b; Shepelyansky and Zhirov, 2010a) in the frame- paved the way for understanding many universal and pe- work of the Ulam method (Ulam, 1960). 3 Currently, Wikipedia, a free online encyclopaedia, II. SCALE-FREE PROPERTIES OF DIRECTED stores more and more information and has become the NETWORKS largest database of human knowledge. In this respect it is similar to the Library of Babel, described by Jorge Luis The distributions of the number of ingoing or outgoing Borges (Borges, 1962). The understanding of hidden re- links per node for directed networks with N nodes and N` lations between various areas of knowledge on the basis of links are well known as indegree and outdegree distribu- Wikipedia can be improved with the help of Google ma- tions in the community of computer science (Caldarelli, trix analysis of directed hyperlink networks of Wikipedia 2003; Donato et al., 2004; Pandurangan et al., 2005).

Google Matrix Analysis of Directed Networks

Google Matrix Analysis of Bi-Functional SIGNOR Network of Protein-Protein Interactions Klaus M

Mathematical Properties and Analysis of Google's Pagerank

Google Matrix of the World Trade Network

The Pagerank Algorithm and Application on Searching of Academic Papers

Reduced Google Matrix 2

Google Pagerank and Reduced-Order Modelling

Linear Response Theory for Google Matrix

A Reordering for the Pagerank Problem

The Google Markov Chain: Convergence Speed and Eigenvalues

Fast Parallel Pagerank: a Linear System Approach

Arxiv:1711.11499V1 [Cs.SI] 30 Nov 2017 Ized by a Power Law [10,11] Which Is Typical for Complex Scale-Free Networks [12]

Google: an Application of Linear Algebra (The Mathematics of a Great Success)