THE GOOGLE SPECTRUM

VARUN IYER

Abstract. Google's PageRank algorithm is vastly superior to its rival search algorithms because of the mathematics behind it. As one of the many results of Perron's Theorem, the algorithm is a discrete-time, finite-state Markov chain that relies upon the spectrum of its transition probability matrix for its convergence. This paper introduces the spectrum, Perron's Theorem and its proof, some of its applications, and basic Markov chain theory, ending with a description of the PageRank algorithm.

Contents

1. Introduction
2. The Spectrum of a Graph
3. Perron's Theorem
4. Some Applications of Perron's Theorem
   Ecological Population Model
   Walrasian Stability of Competitive Markets
5. The Power Method and Markov Chains
6. The Mathematics of Google's PageRank
Acknowledgments
References

1. Introduction

Larry Page and Sergey Brin developed Google's PageRank algorithm in 1998, still during the early stages of the Internet. By 2002, two years before Google's IPO, PageRank was operating on a matrix of order 2.7 billion, cited by Cleve Moler as "The World's Largest Matrix Computation" [4]. Today Google searches over 30 trillion webpages, 100 billion times a month, and those numbers are still growing as new sites come online and new users turn to Google daily. The algorithm, which was far more efficient than its rival '90s search engines at ranking web pages for a given search, is still exceedingly relevant today, with some minor modifications over the years. To understand Google's PageRank is to understand the development and structure of the Internet.

The goal of this paper is to highlight what makes the Google PageRank algorithm so superior to its rival search engines by explaining its core mathematics. To get to this discussion, however, we need to cover several different topics. First, we begin with some spectral graph theory to provide a graphical understanding of the spectrum, a tool that will be used numerous times. Describing some properties of graphs that can be found from the spectrum will provide intuition for when we later discuss an example of a large directed graph, the Internet. We move on to stating and proving Perron's Theorem, an elegant foundation for much practical mathematics. A related theorem shows the convergence properties of iterative processes, which we will use to discuss some of Perron's other applications in ecology and economics. We then define the power method, a central tool of the PageRank algorithm and other Perron applications, and then introduce some Markov chain theory, which is crucial to understanding the intuition behind PageRank. Finally, explaining in detail the rationale behind the Google matrix and the PageRank algorithm, we end with a proof concerning the Google matrix's spectrum, which is fundamentally responsible for the algorithm's efficiency. Hopefully this paper explains the first revolutionary search algorithm in depth, and will perhaps inspire future algorithms modeled on a similar mathematical foundation.

2. The Spectrum of a Graph

Spectral graph theory is the study of the properties of graphs through the eigenvalues and eigenvectors of the graph's associated matrices, including the adjacency matrix. We begin by defining a graph and introducing the graph's spectrum. In later sections, we will prove a useful theorem about the spectrum of a positive matrix, and then use the spectrum to study a particular graph, the Internet.
For now, this introduction to spectral graph theory should suggest how manipulating the matrix of a graph can provide useful information about the graph's structure.

Definition 2.1. A graph is an ordered pair $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges, such that
$$E \subset \{\,(x, y) \mid x, y \in V,\ x \neq y\,\}.$$

The graph defined above is an undirected graph. A directed graph contains edges with directions, where an edge that joins $v_i$ to $v_j$ may not necessarily join $v_j$ to $v_i$. In this section we limit our discussion to undirected graphs, but the concept of directed graphs will become important in a later section. If $v_i$ and $v_j$ are vertices of $G$ and $e = \{v_i, v_j\}$ is an edge that joins the two vertices, we call $v_i$ and $v_j$ adjacent to one another. In other words, two vertices are adjacent in a graph if there is a single connecting edge. The degree $d(v)$ of a vertex $v$ is the number of vertices that are adjacent to $v$. We will now define the adjacency matrix, which will be crucial to understanding the graph's properties.

Definition 2.2. Suppose $G$ is a graph with vertex set $V = \{v_1, v_2, \ldots, v_n\}$. The adjacency matrix of graph $G$ is the $n \times n$ matrix $A$ whose entries $a_{ij}$ are given by
$$a_{ij} = \begin{cases} 1, & \text{if } v_i \text{ and } v_j \text{ are adjacent,} \\ 0, & \text{otherwise.} \end{cases}$$

The adjacency matrix is both real and symmetric, by definition. Since the labeling of a graph's vertices is arbitrary, properties of the matrix that are invariant under permutations of rows and columns are of particular interest. Suppose that $\lambda$ is an eigenvalue of $A$. Since $A$ is real and symmetric, $\lambda$ is also real. This $\lambda$ is a root of the characteristic polynomial $\det(\lambda I - A)$. We now have the tools to define the spectrum of a graph.

Definition 2.3. The spectrum of graph $G$ is the set of eigenvalues of $A$ together with their respective algebraic multiplicities. If the distinct eigenvalues of $A$ are $\lambda_0 > \lambda_1 > \cdots > \lambda_k$, then we write
$$\operatorname{Spec}(G) = \begin{pmatrix} \lambda_0 & \lambda_1 & \cdots & \lambda_k \end{pmatrix}.$$

Applications of the spectrum will come into play in later sections. For now, we will show some properties of graphs that can be derived from $A$. Suppose the characteristic polynomial $\det(\lambda I - A)$ of $G$ is
$$\chi(G; \lambda) = \lambda^n + c_1\lambda^{n-1} + c_2\lambda^{n-2} + c_3\lambda^{n-3} + \cdots + c_n.$$
The coefficients of $\chi$ can be expressed in terms of the principal minors of $A$, where a principal minor is the determinant of a submatrix obtained by taking a subset of the rows and the same subset of the columns. For the following proposition, a triangle is a subgraph consisting of 3 mutually adjacent vertices, and $T$ denotes the number of triangles in $G$.

Proposition 2.4. The coefficients of the characteristic polynomial $\chi$ of graph $G$ have the following properties:
(1) $c_1 = 0$;
(2) $-c_2$ is the number of edges in $G$;
(3) $-c_3$ is twice the number of triangles in $G$.

Proof. For $i \in \{1, 2, \ldots, n\}$, the value of $(-1)^i c_i$ is the sum of the principal minors of $A$ with $i$ rows and $i$ columns.
(1) The diagonal elements of $A$ are all 0, so $c_1 = 0$.
(2) A nonzero principal minor of $A$ with 2 rows and 2 columns must be
$$\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1.$$
For each pair of adjacent vertices in $G$, there is one such minor. Hence $(-1)^2 c_2 = -|E|$.
(3) The nontrivial possibilities for principal minors with three rows and columns are
$$\begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{vmatrix} = 0, \qquad \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{vmatrix} = 0, \qquad \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix} = 2.$$
The only nonzero minor corresponds to 3 mutually adjacent vertices in $G$, that is, a triangle. Thus $(-1)^3 c_3 = 2T$. □
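As a quick numerical illustration of Definitions 2.2 and 2.3 and Proposition 2.4, here is a short sketch in Python with NumPy (not part of the original paper; the example graph, a triangle with a pendant vertex, is our own choice). It builds an adjacency matrix, computes the graph's spectrum, and checks the three coefficient identities against a direct count of edges and triangles.

import numpy as np

# A small hypothetical undirected graph (our own example, not from the paper):
# a triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# The spectrum of G: eigenvalues of the real symmetric adjacency matrix A.
print("spectrum:", np.round(np.linalg.eigvalsh(A), 4))

# Coefficients of det(lambda*I - A); np.poly returns [1, c1, c2, c3, c4].
c = np.poly(A)
c1, c2, c3 = c[1], c[2], c[3]

num_edges = int(A.sum() / 2)                          # each edge is counted twice in A
num_triangles = int(round(np.trace(A @ A @ A) / 6))   # each triangle gives 6 closed 3-walks

print("c1 = 0:   ", np.isclose(c1, 0))                      # Proposition 2.4 (1)
print("-c2 = |E|:", np.isclose(-c2, num_edges))             # Proposition 2.4 (2)
print("-c3 = 2T: ", np.isclose(-c3, 2 * num_triangles))     # Proposition 2.4 (3)

For this graph the check gives $|E| = 4$ and one triangle, consistent with the proposition.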
We end our introduction to spectral graph theory by defining the walk and what it means for a graph to be connected. A walk of length $l$ in $G$, from $v_i$ to $v_j$, is a finite sequence of vertices of $G$,
$$v_i = u_0, u_1, \ldots, u_l = v_j,$$
such that consecutive elements of the sequence are adjacent. The powers of the adjacency matrix of a graph can be used to find the number of walks of length $l$.

Theorem 2.5. The number of walks of length $l$ in graph $G$, from $v_i$ to $v_j$, is the $(i, j)$ entry of the matrix $A^l$.

Proof. We prove the theorem by induction. For walks of lengths 0 and 1, the result is true, since $A^0 = I$ and $A^1 = A$. Suppose the result is true for walks of length $L$. For each walk of length $L + 1$ from $v_i$ to $v_j$, there is a walk of length $L$ from $v_i$ to a vertex $v_k$ adjacent to $v_j$. This bijection implies that the number of walks of length $L + 1$ is
$$\sum_{\{v_k, v_j\} \in E} (A^L)_{ik} = \sum_{k=1}^{n} (A^L)_{ik}\, a_{kj} = (A^{L+1})_{ij}.$$
Hence, the result is true by induction. □

Definition 2.6. A graph is connected if each pair of vertices is joined by a walk. A directed graph is said to be strongly connected if there is a walk of directed edges between any pair of vertices.

For more on spectral and algebraic graph theory, see Biggs [1]. In the next section, we move away from graphs to discuss a property of positive square matrices that is quintessential to many practical applications, stated in Perron's Theorem.

3. Perron's Theorem

In 1907, Oskar Perron proved a theorem that would play a role in numerous fields of practical mathematics. Many measured interactions that take place in the real world can be described by nonnegative or positive matrices, in which every element is nonnegative or positive, and a large number of important models take the form of simple linear iterative processes. Such a process begins with an initial state $x_0$ and evolves recursively as $x_{k+1} = A x_k$, so that $x_k = A^k x_0$. Perron's Theorem tells us when such an iterative process will converge using, interestingly enough, the spectrum of the matrix $A$.
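To make this iterative picture concrete, here is a minimal sketch in Python with NumPy; the $3 \times 3$ positive matrix is our own hypothetical example, not one from the paper. Repeatedly applying $A$ to an arbitrary starting vector and renormalizing drives the iterate toward the direction of the eigenvector belonging to the dominant eigenvalue, which is exactly the behavior Perron's Theorem describes for positive matrices.

import numpy as np

# A hypothetical 3 x 3 positive matrix, chosen only for illustration.
A = np.array([[0.5, 0.2, 0.3],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

x = np.array([1.0, 0.0, 0.0])   # an arbitrary initial state x_0

# Iterate x_{k+1} = A x_k, renormalizing so the iterates neither blow up nor vanish.
for _ in range(50):
    x = A @ x
    x = x / np.linalg.norm(x, 1)

# Compare against the eigenvector of the dominant (Perron) eigenvalue of A.
eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals.real)
perron = np.abs(eigvecs[:, i].real)
perron = perron / perron.sum()

print("normalized iterate:", np.round(x, 6))
print("Perron eigenvector:", np.round(perron, 6))
print("dominant eigenvalue:", round(eigvals[i].real, 6))

The matrix here happens to have columns summing to 1, so its dominant eigenvalue is 1 and the limiting vector is a probability vector, foreshadowing the Markov chain and PageRank discussion in later sections.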