Feature

Google's secret and Linear Algebra

Pablo Fernández Gallardo (Madrid, Spain)

EMS Newsletter, March 2007

1. Introduction

Some months ago newspapers all around the world considered Google's plan to go public. The significance of this piece of news was not only related to the volume of the transaction, the biggest since the dot-com "irrational exuberance" in the 90's, but also to the particular characteristics of the firm. A few decades ago there was a complete revolution in technology and communications (and also a cultural and sociological revolution), namely the generalization of use and access to the Internet. Google's appearance has represented a revolution comparable to the former as it became a tool that brought some order into this universe of information, a universe that was not manageable before.

The design of a web search engine is a problem of mathematical engineering. Notice the adjective. First, a deep knowledge of the context is needed in order to translate it into models, into mathematics. But after this process of abstraction and after the relevant conclusions have been drawn, it is essential to carry out a thorough, detailed and efficient design of the computational aspects inherent in this problem.

2. The Google engine

The engine was designed in 1998 by Sergey Brin and Lawrence Page, two computer science doctorate students at Stanford – two young men, now in their thirties, who have become multimillionaires. The odd name of the firm is a variation of the term googol, the name that somebody¹ invented to refer to the overwhelming number $10^{100}$. This is one of those numbers that mathematicians are comfortable with but that are perhaps bigger than the number of particles in the whole universe.

The scale of the question we are concerned with is also immense. In 1997, when Brin and Page were to start working on Google's design, there were about 100 million web pages. Altavista, the most popular search engine in those days, attended to 20 million daily queries. Today, these figures have been multiplied; Google receives some hundred million daily queries and indexes several billion web pages.

Therefore, the design of a search engine must efficiently solve some computational aspects, namely the way to store that enormous amount of information, how it is updated, how to manage the queries, the way to search the databases, etc. But, although interesting, we are not going to treat these questions here. The point of interest can be formulated in a simple manner. Let us suppose that, after a certain query, we have determined that, say, one hundred web pages enclose information that might, in some sense, be relevant to the user. Now, in which order should they be displayed? The objective, as explicitly posed² by Brin and Page (see [6]), is that in the majority of attempts, at least one of, say, the first ten displayed pages contains useful information for the user.

We now ask the reader (quite possibly a google-maniac himself) to decide, from his own experience, whether Google fulfils this objective or not. We are sure the common response will be affirmative ... and even amazingly affirmative! It seems to be magic³ but it is just mathematics, mathematics requiring no more than the tools of a first year graduate course, as we will soon see.

To tackle our task, we need an ordering criterion. Notice that if we label each web page with symbols $P_1,\dots,P_n$, all we want is to assign each $P_j$ a number $x_j$, its significance. These numbers might range, for example, between 0 and 1. Once the complete list of web pages, along with their significances, is at our disposal, we can use this ordering each time we answer a query; the selected pages will be displayed in the order prescribed by the list.⁴

3. The model

Let us suppose that we have collected all the information about the web: sites, contents, links between pages, etc. The set of web pages, labelled $P_1,\dots,P_n$, and the links between them can be modelled with a (directed) graph $G$. Each web page $P_j$ is a vertex of the graph and there will be an edge between vertices $P_i$ and $P_j$ whenever there is a link from page $P_i$ to page $P_j$. It is a gigantic, overwhelming graph, whose real structure deserves some consideration (see Section 8).

When dealing with graphs, we like to use drawings on paper, in which vertices are points of the plane, while edges are merely arrows joining these points. But, for our purposes, it is helpful to consider an alternative description, with matrices. Let us build an $n \times n$ matrix $M$ with zero-one entries, whose rows and columns are labelled with symbols $P_1,\dots,P_n$. The matrix entry $m_{ij}$ will be 1 whenever there is a link from page $P_j$ to page $P_i$ and 0 otherwise.

The matrix $M$ is, except for a transposition, the adjacency matrix of the graph. Notice that it is not necessarily symmetric because we are dealing with a directed graph. Observe also that the sum of the entries for $P_j$'s column is the number of $P_j$'s outgoing links, while we get the number of ingoing links by summing rows.

We will assume that the significance of a certain page $P_j$ "is related to" the pages linking to it. This sounds reasonable; if there are a lot of pages pointing to $P_j$, its information must have been considered as "advisable" by a lot of web-makers.

The above term "related to" is still rather vague. A first attempt to define it, in perhaps a naïve manner, amounts to supposing that the significance $x_j$ of each $P_j$ is proportional to the number of links to $P_j$. Let us note that, whenever we have the matrix $M$ at our disposal, the computation of each $x_j$ is quite simple; it is just the sum of the entries of the row labelled $P_j$.

This model does not adequately grasp a situation deserving attention, i.e., when a certain page is cited from a few very relevant pages, e.g., from www..com and www..com. The previous algorithm would assign it a low significance and this is not what we want. So we need to enhance our model in such a way that a strong significance is assigned both to highly cited pages and to those that, although not cited so many times, have links from very "significant pages".

Following this line of argument, the second attempt assumes that the significance $x_j$ of each page $P_j$ is proportional to the sum of the significances of the pages linking to $P_j$. This slight variation completely alters the features of the problem.

Suppose, for instance, that page $P_1$ is cited on pages $P_2$, $P_{25}$ and $P_{256}$, that $P_2$ is only cited on pages $P_1$ and $P_{256}$, etc., and that there are links to page $P_n$ from $P_1$, $P_2$, $P_3$, $P_{25}$ and $P_{n-1}$. Following the previous assignment, $x_1$ should be proportional to 3, $x_2$ to 2, etc., while $x_n$ should be proportional to 5. But now, our assignment $x_1,\dots,x_n$ must verify that

\[
\begin{aligned}
x_1 &= K\,(x_2 + x_{25} + x_{256}),\\
x_2 &= K\,(x_1 + x_{256}),\\
&\;\;\vdots\\
x_n &= K\,(x_1 + x_2 + x_3 + x_{25} + x_{n-1}),
\end{aligned}
\]

where $K$ is a certain proportionality constant. In this way, we face an enormous system of linear equations, whose solutions are all the admissible assignments $x_1,\dots,x_n$. Below these lines we write the system of equations in a better way, using matrices (the ones shown sit in the columns labelled $P_1$, $P_2$, $P_3$, $P_{25}$, $P_{256}$ and $P_{n-1}$):

\[
\left(\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right)
= K
\left(\begin{array}{cccccccccccccc}
0 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & & & & & & & & & & \vdots \\
1 & 1 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0
\end{array}\right)
\left(\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right)
\]

Let us call the significance vector $x$. The $n \times n$ matrix of the system is exactly the matrix $M$ associated with the graph. So we can state that the significance assignment is a solution of

\[
Mx = \lambda x .
\]

We have already used the symbol $\lambda$ for the constant of proportionality (here $\lambda = 1/K$). This is so because, as anyone who has been exposed to a linear algebra first course will recognize, the question has become a problem of eigenvalues and eigenvectors; the significance vector $x$ is no more than an eigenvector of the matrix $M$. You might recall that this matrix contains all the information about the web structure, i.e., the vertices and adjacency relations.

Perhaps this is not enough to arouse the reader's enthusiasm yet. Alright, an eigenvector. But which one? There are so many. And also, how could we compute it? The matrix is inconceivably huge. Remember, it is built up of a thousand million rows (or columns). Patience, please. For the time being, it sounds reasonable to demand the entries of our vector (the significance of the web pages) to be non-negative (or, at least, with the same sign). This will be written as $x \ge 0$. We ask the reader to excuse this abuse of notation. But also, since we intend the method to be useful, we need this hypothetical non-negative vector to be unique. If there were more than one, which of them should be chosen?

4. The random surfer

Google's approach to the question follows a slightly different point of view. At the present stage, a page $P_j$ distributes a "1" to every page where there is an outgoing link. This means that pages with many outgoing links have a great influence, which surely is not reasonable. It is fairer to assign each page $P_j$ a "total weight" 1, which is equally distributed among the outgoing links. So we should consider a new matrix instead of $M$ (the matrix of the graph, with entries 0 and 1). Let $N_j$ be the number of $P_j$'s outgoing links (that is, the sum of the entries in the column labelled $P_j$ in $M$). The new matrix $M'$ is built from the original $M$ by replacing each entry $m_{ij}$ by $m'_{ij} = m_{ij}/N_j$. The entries of $M'$ will be non-negative numbers (between 0 and 1) and the sum of the entries for each column will be 1. And now we are interested in the non-negative vector of the corresponding⁵ problem $M'x = \lambda x$. The matrix $M'$ is called a stochastic (or Markovian) matrix.

This new point of view leads us to a nice interpretation. Let us imagine a user surfing the web. At some moment he will reach some page, say $P_1$. But, probably bored with the contents of $P_1$, he will jump to another page, following $P_1$'s outgoing links (suppose there are $N_1$ possibilities). But, to which one? Our brave navigator is a random surfer – and needless to say, also blond and suntanned. So, in order to decide his destination, he is going to use chance, and in the simplest possible way: with a regular (and virtual, we presume) die, which has the same number of faces as the number of outgoing links from $P_1$. In technical terms, the choice of destination follows a (discrete) uniform probability distribution on $\{1,\dots,N_1\}$. Say, for instance, that there are three edges leaving $P_1$ to vertices $P_2$, $P_6$ and $P_8$. Our navigator draws his destination, assigning a probability of 1/3 to each vertex.

Our model is no longer deterministic but probabilistic. We do not know where he will be a moment of time later but we do know what his chances are of being in each admissible destination. And it is a dynamic model as well because the same argument may be applied to the second movement, to the third one, etc. In our example, if the first movement is from $P_1$ to $P_2$ and there are four edges leaving $P_2$, then he is to draw again, now with probability 1/4 for each possible destination. Our surfer is following what is known as a random walk in the graph.

And what about the matrix $M'$? Let us say that the surfer is on page (vertex) $P_k$ at the beginning, that is, in probabilistic terms, he is on page $P_k$ with a probability of 100 %. We represent this initial condition with the vector $(0,\dots,1,\dots,0)$, the 1 being in position $k$. Recall that the surfer draws among the $N_k$ destinations, assigning probability $1/N_k$ to each of them. But when we multiply the matrix $M'$ by this initial vector, we get $(m'_{1k}, m'_{2k},\dots,m'_{nk})$, a vector with entries summing to 1: the $m'_{jk}$ are either 0 or $1/N_k$ and there are exactly $N_k$ non-zero entries. Notice that the vector we get exactly describes the probability of being, one moment later, on each page of the web, assuming he began at $P_k$. More than that, in order to know the probabilities of being on each page of the web after two moments of time, it is enough to repeat the process. That is, to multiply $(M')^2$ by the initial vector. And the same for the third movement, the fourth, etc.


Following the usual terminology, we consider a certain number of states, in our case just being the vertices of the graph $G$. The matrix $M'$ is (appropriately) called the transition matrix of the system; each entry $m'_{ij}$ describes the probability of going from state (vertex) $P_j$ to state (vertex) $P_i$. And the entries of the successive powers of the matrix give us transition probabilities between vertices as time goes by. The well versed reader may have already deduced the relation with the previous ideas: the stationary state of this Markov chain turns out to be precisely the non-negative vector of the problem $M'x = \lambda x$.

It might happen that some pages have no outgoing links at all (with only zeros in the corresponding columns). This would not give a stochastic matrix. We will discuss Google's solution to this problem in Section 8.

5. Qualifying for the playoffs

We will illustrate the ordering algorithm⁶ with the following question. Let us imagine a sports competition in which teams are divided into groups or conferences⁷. Each team plays the same number of games but not the same number of games against each other; it is customary that they play more games against the teams from their own conference. So we may ask the following question. Once the regular season is finished, which teams should qualify for the playoffs? The standard system computes the number of wins to determine the final positions but it is reasonable (see [10]) to wonder whether this is a "fair" system or not. After all, it might happen that a certain team could have achieved many wins just because it was included in a very "weak" conference. What should be worthier: the number of wins or their "quality"? And we again face Google's dichotomy!

Say, for example, that there are six teams, $E_1,\dots,E_6$, divided into two conferences. Each team plays 21 games in all: 6 against each team from its own conference, 3 against the others. These are the results of the competition:

\[
\begin{array}{c|cccccc|c}
 & E_1 & E_2 & E_3 & E_4 & E_5 & E_6 & \\ \hline
E_1 & - & 3/21 & 0/21 & 0/21 & 1/21 & 2/21 & \to 6/21 \\
E_2 & 3/21 & - & 2/21 & 2/21 & 2/21 & 1/21 & \to 10/21 \\
E_3 & 6/21 & 4/21 & - & 2/21 & 1/21 & 1/21 & \to 14/21 \\
E_4 & 3/21 & 1/21 & 1/21 & - & 2/21 & 2/21 & \to 9/21 \\
E_5 & 2/21 & 1/21 & 2/21 & 4/21 & - & 2/21 & \to 11/21 \\
E_6 & 1/21 & 2/21 & 2/21 & 4/21 & 4/21 & - & \to 13/21
\end{array}
\]

To the right of the table, we have written the number of wins of each team. This count suggests the following ordering: $E_3 \to E_6 \to E_5 \to E_2 \to E_4 \to E_1$. But notice, for instance, that the leader team $E_3$ has collected a lot of victories against $E_1$, the worst one.

Let us now assign significances $x = (x_1,\dots,x_6)$ to the teams with the mentioned criterion: $x_j$ is proportional to the number of wins of $E_j$, weighted with the significance of the other teams. If $A$ is the above table, this leads, once more, to $Ax = \lambda x$. And again, we want to find a non-negative eigenvector of $A$ (a unique one, if possible).

Even in such a simple example as this one, we need to use a computer. So we ask some mathematical software to perform the calculations. We find that the moduli of the six eigenvalues are 0.012, 0.475, 0.161, 0.126, 0.139 and 0.161. So $\lambda = 0.475$ is the biggest (in modulus) eigenvalue, its associated eigenvector being

\[
x = (0.509,\ 0.746,\ 0.928,\ 0.690,\ 0.840,\ 1).
\]

And this is the only eigenvector that has real non-negative entries! The components of the vector suggest the following ordering: $E_6 \to E_3 \to E_5 \to E_2 \to E_4 \to E_1$. And now $E_6$ is the best team!

Let us summarize. In this particular matrix with non-negative entries (that might be regarded as a small-scale version of the Internet matrix) we are in the best possible situation; there is a unique non-negative eigenvector, the one we need to solve the ordering question we posed. Did this happen by chance? Or was it just a trick, an artfully chosen matrix to persuade the unwary reader that things work as they should? The reader, far from being unwary, is now urgently demanding a categorical response. And he knows that it is time to welcome a new actor to this performance.

6. Mathematics enters the stage

Let us distil the common essence of all the questions we have been dealing with. Doing so, we discover that the only feature shared by all our matrices (being stochastic or not) is that all their entries are non-negative. Not a lot of information, it seems. They are neither symmetric matrices nor positive definite nor . . . Nevertheless, as shown by Perron⁸ at the beginning of the 20th century, it is enough to obtain interesting results:

Theorem (Perron, 1907). Let $A$ be a square matrix with positive entries, $A > 0$. Then
(a) there exists a (simple) eigenvalue $\lambda > 0$ such that $Av = \lambda v$, where the corresponding eigenvector is $v > 0$;
(b) $\lambda$ is bigger (in modulus) than the other eigenvalues;
(c) any other positive eigenvector of $A$ is a multiple of $v$.

Perron's result points in the direction we are interested in but it is not enough because the matrices we deal with might contain zeros. So we need something else. The following act of this performance was written several years later by Frobenius⁹, who dealt with the general case of non-negative matrices. Frobenius observed that if we only have that $A \ge 0$ then, although there is still a dominant (of maximum modulus) eigenvalue $\lambda > 0$ associated to an eigenvector $v \ge 0$, there might be other eigenvalues of the same "size". Here is his theorem:

Theorem (Frobenius, 1908–1912). Let $A$ be a square matrix with non-negative entries, $A \ge 0$. If $A$ is irreducible,¹⁰ then
(a) there exists a (simple) eigenvalue $\lambda > 0$ such that $Av = \lambda v$, where the corresponding eigenvector is $v > 0$. In addition, $\lambda \ge |\mu|$ for any other eigenvalue $\mu$ of $A$.
(b) Any eigenvector $\ge 0$ is a multiple of $v$.
(c) If there are $k$ eigenvalues of maximum modulus, then they are the solutions of the equation $x^k - \lambda^k = 0$.

Notice firstly that Frobenius' theorem is indeed a generalization of Perron's result, because if $A > 0$, then $A$ is $\ge 0$ and irreducible. Secondly, if $A$ is irreducible then the question is completely solved: there exists a unique (up to scaling) non-negative eigenvector associated to the positive eigenvalue of maximum modulus (a very useful feature, as we will see in a moment).
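The six-team example of Section 5 can be checked against Frobenius' theorem with a short Python sketch. Several choices here are assumptions of the sketch rather than the article's method: the "−" entries on the diagonal of the table are read as 0, the eigenvector is approximated by repeated multiplication and rescaling (the article only says that "some mathematical software" was used), irreducibility is tested through the criterion of note 10, and the function names are hypothetical.

```python
# Wins of team i+1 over team j+1, read from the table; each entry is then divided by 21.
# Assumption of this sketch: the "-" entries on the diagonal are taken as 0.
WINS = [
    [0, 3, 0, 0, 1, 2],
    [3, 0, 2, 2, 2, 1],
    [6, 4, 0, 2, 1, 1],
    [3, 1, 1, 0, 2, 2],
    [2, 1, 2, 4, 0, 2],
    [1, 2, 2, 4, 4, 0],
]
A = [[w / 21 for w in row] for row in WINS]


def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Matrix product A B for square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_irreducible(A):
    """Criterion of note 10: all entries of (I + A)^(n-1) are positive."""
    n = len(A)
    B = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    P = B
    for _ in range(n - 2):
        P = mat_mul(P, B)
    return all(entry > 0 for row in P for entry in row)


def dominant_eigenpair(A, steps=100):
    """Approximate the dominant eigenvalue and its eigenvector by repeated
    multiplication, rescaling so that the largest entry is 1."""
    x = [1.0] * len(A)
    lam = 0.0
    for _ in range(steps):
        y = mat_vec(A, x)
        lam = max(y)               # since max(x) == 1, this tends to lambda
        x = [v / lam for v in y]
    return lam, x


lam, x = dominant_eigenpair(A)
order = ["E%d" % (i + 1) for i in sorted(range(6), key=lambda i: -x[i])]

print(is_irreducible(A))
print(round(lam, 3), [round(v, 3) for v in x])
print(" -> ".join(order))
```

With the table as transcribed above, the computed eigenvalue and the resulting ordering agree with the values quoted in Section 5 ($\lambda \approx 0.475$, with $E_6$ first and $E_1$ last).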


These results, to which we will refer from now on as the Perron–Frobenius Theorem, are widely used in other contexts (see Section 9). Some people even talk about "Perron–Frobenius Theory", this theorem being one of its central results.

The proof is quite complicated and here we will just sketch an argument (in the $3 \times 3$ case) with some of the fundamental ideas. Let us start with a non-negative vector $x \ge 0$. As $A \ge 0$, the vector $Ax$ is also non-negative. In geometric terms, the matrix $A$ maps the positive octant into itself. Let us consider now the mapping $\alpha$ given by $\alpha(x) = Ax/\|Ax\|$. Notice that $\alpha(x)$ is always a unit length vector. The function $\alpha$ maps the set $\{x \in \mathbb{R}^3 : x \ge 0,\ \|x\| = 1\}$ into itself. Now, applying the Brouwer Fixed Point Theorem¹¹, there exists a certain $\tilde{x}$ such that $\alpha(\tilde{x}) = \tilde{x}$. Therefore,

\[
\alpha(\tilde{x}) = \frac{A\tilde{x}}{\|A\tilde{x}\|} = \tilde{x}
\quad\Longrightarrow\quad
A\tilde{x} = \|A\tilde{x}\|\,\tilde{x} .
\]

Summing up, $\tilde{x}$ is an eigenvector of $A$ with non-negative entries associated to an eigenvalue $> 0$. For all other details, such as proving that this eigenvector is (essentially) unique and the other parts of the theorem, we refer the reader to [1], [4], [13] and [14].

7. And what about the computational aspects?

The captious reader will be raising a serious objection: Perron–Frobenius' theorem guarantees the existence of the needed eigenvector for our ordering problem but says nothing about how to compute it. Notice that the proof we sketched is not a constructive one. Thus, we still should not rule out the possibility that these results are not so satisfactory. Recall that Google's matrix is overwhelming. The calculation of our eigenvector could be a cumbersome task!

Let us suppose we are in an ideal situation, i.e., in those conditions¹² that guarantee the existence of a positive eigenvalue $\lambda_1$ strictly bigger (in modulus) than the other eigenvalues. Let $v_1$ be its (positive) eigenvector. We could, of course, compute all the eigenvalues and keep the one of interest but even using efficient methods, the task would be excessive. However, the structure of the problem helps us again and makes the computation easy. It all comes from the fact that the eigenvector is associated to the dominant eigenvalue.

Suppose, to simplify the argument, that $A$ is diagonalizable. We have a basis of $\mathbb{R}^n$ with the eigenvectors $\{v_1,\dots,v_n\}$, the corresponding eigenvalues being ordered by decreasing size: $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|$. We start, say, with a certain $v_0 \ge 0$ that may be written as $v_0 = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$, where the numbers $c_1,\dots,c_n$ are the coordinates of $v_0$ in our basis. Now we multiply vector $v_0$ by matrix $A$ to obtain $Av_0 = c_1\lambda_1 v_1 + c_2\lambda_2 v_2 + \cdots + c_n\lambda_n v_n$ because the vectors $v_1,\dots,v_n$ are eigenvectors of $A$. Let us repeat the operation, say $k$ times: $A^k v_0 = c_1\lambda_1^k v_1 + c_2\lambda_2^k v_2 + \cdots + c_n\lambda_n^k v_n$. Let us suppose that $c_1 \ne 0$. Then,

\[
\frac{1}{\lambda_1^k}\,A^k v_0
= c_1 v_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^{k} v_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^{k} v_n
\;\xrightarrow{\;k\to\infty\;}\; c_1 v_1 ,
\]

since $|\lambda_j/\lambda_1| < 1$ for each $j = 2,\dots,n$ (recall that $\lambda_1$ was the dominant eigenvalue).

Therefore, when repeatedly multiplying the initial vector by the matrix $A$, we determine, more precisely each time, the direction of interest, namely the one given by $v_1$. This numerical method is known as the power method and its rate of convergence depends on the ratio between the first and the second eigenvalue (see in [8] an estimate for Google's matrix).

Our problem is finally solved, at least if we are in the best possible conditions (a non-negative irreducible matrix). The answer does exist, it is unique and we have an efficient method to compute it at our disposal (according to Google's web page, a few hours are needed). But . . .

8. Are we in an ideal situation?

To make things work properly, we need the matrix $M$ associated to the web-graph $G$ to be irreducible. In other words, we need $G$ to be a strongly connected graph¹³. As the reader might suspect, this is not the case. Research developed in 1999 (see [7]) came to the conclusion that, among the 203 million pages under study, 90 % of them lay in a gigantic (weakly connected) component, this in turn having a quite complex internal structure, as can be seen in the following picture, taken from [7].

[Picture from [7]: structure of the web graph, with the SCC, IN and OUT parts and the tendrils.]

This is a quite peculiar structure, which resembles a biological organism, a kind of colossal amoeba. Along with the central part (SCC, Strongly Connected Component), we find two more pieces¹⁴: the IN part is made up of web pages having links to those of SCC and the OUT part is formed by pages pointed to from the pages of SCC. Furthermore, there are sort of tendrils (sometimes turning into tubes) comprising the pages not pointing to SCC's pages nor accessible from them. Notice that the configuration of the web is something dynamic and that it is evolving with time. And it is not clear whether this structure has been essentially preserved or not¹⁵. We refer here to [3].

What Google does in this situation is a standard trick: try to get the best possible situation in a reasonable way¹⁶. For instance, adding a whole series of transition probabilities to all the vertices. That is, considering the following matrix,

\[
M'' = cM' + (1-c)\begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix}(1,\dots,1),
\]

where $p_1,\dots,p_n$ is a certain probability distribution ($p_j \ge 0$, $\sum_j p_j = 1$) and $c$ is a parameter between 0 and 1 (for Google, about 0.85).

As an example, we could choose a uniform distribution, $p_j = 1/n$, for each $j = 1,\dots,n$ (and the matrix would have positive entries). But there are other reasonable choices and this degree of freedom gives us the possibility of making "personalized" searches. In terms of the random surfer, we are giving him the option (with probability $1 - c$) to get "bored" of following the links and to jump to any web page (obeying a certain probability distribution)¹⁷.

9. Non-negative matrices in other contexts

The results on non-negative matrices that we have seen above have a wide range of applications. The following two observations (see [13]) may explain their ubiquity:
◦ In most "real" systems (from physics, economy, biology, technology, etc.) the measured interactions are positive, or at least non-negative. And matrices with non-negative entries are the appropriate way to encode these measurements.
◦ Many models involve linear iterative processes: starting from an initial state $x_0$, the generic one is of the form $x_k = A^k x_0$. The convergence of the method depends upon the size of $A$'s eigenvalues and upon the ratios between their sizes, particularly between the biggest and all the others. And here is where Perron–Frobenius' theorem has something to say, as long as the matrix $A$ is non-negative.

The probabilistic model of Markov chains is widely used in quite diverse contexts. Google's method is a nice example, but it is also used as a model for population migrations, transmission of diseases, rating migrations in finance, etc. But, as mentioned before, Perron–Frobenius' Theory also plays a central role in many other contexts (we refer the reader again to [13]). Let us mention just a pair:

Biological models: a well known population model, in some sense a generalization of the one developed by Fibonacci, is encoded with the so called Leslie matrices. Their entries are non-negative numbers, related to the transition fractions between age classes and survival rates. If $\lambda_1$ is the dominant eigenvalue then the system behaviour (extinction, endless growth or oscillating behaviour) depends upon the precise value of $\lambda_1$ ($\lambda_1 > 1$, $\lambda_1 = 1$ or $\lambda_1 < 1$ being the three cases of interest).

Economic models: in 1973, Leontief was awarded the Nobel Prize for the development of his input-output model. A certain country's economy is divided into sectors and the basic hypothesis is that the $j$th sector's input of the $i$th sector's output is proportional to the $j$th sector's output. In these conditions, the existence of the solution for the system depends upon the value of the dominant eigenvalue of the matrix that encodes the features of the problem.

Finally, there are several extensions of Perron–Frobenius' Theory that the reader might find interesting:

Cones in $\mathbb{R}^n$: the key point of Perron–Frobenius' theorem is that any $n \times n$ matrix with non-negative entries preserves the "positive octant". There is a general version dealing with (proper convex) cones¹⁸ (see [1, 4]).

Banach spaces: those readers versed in functional analysis and spectral theory will be aware of the generalization to Banach spaces known as the Krein–Rutman theorem (see [12] and [5]). And those engaged in partial differential equations will enjoy proving, using the Krein–Rutman Theorem, that the first eigenfunction of the Laplacian in the Dirichlet problem (in an open, connected and bounded set $\Omega \subset \mathbb{R}^n$) is positive (see the details in the appendix to Chapter 8 of [9]).

10. Coda

The design of a web search engine is a formidable technological challenge. But in the end, we discover that the key point is mathematics: a wise application of theorems and a detailed analysis of the algorithm convergence. A new confirmation of the unreasonable effectiveness of mathematics in the natural sciences, as Eugene Wigner used to say – as in so many other fields, we might add. We hope that these pages will encourage the readers to explore for themselves the many problems we have briefly sketched here – and hopefully, they have been a source of good entertainment. And a very fond farewell to Perron–Frobenius' theorem, which plays such a distinguished role in so many questions. Let us bid farewell with a humorous (but regretfully untranslatable¹⁹) coplilla manriqueña:

Un hermoso resultado
que además se nos revela
indiscreto;
y un tanto desvergonzado,
porque de Google desvela
su secreto.

11. To know more

The following book is an excellent and very recent reference:

Google's page rank and beyond: the science of search engine rankings. (Amy N. Langville and Carl D. Meyer. Princeton University Press, 2006).

Other references cited throughout the note:

[1] Bapat, R. B. and Raghavan, T. E. S.: Nonnegative matrices and applications. Cambridge University Press, 1997.
[2] Barabási, A.-L.: The physics of the web. Physics World (July 2001). Available at www.nd.edu/~alb.
[3] Barabási, A.-L.: Linked, the new science of networks. How everything is connected to everything else and what it means. Plume Books, 2003.
[4] Berman, A. and Plemmons, R. J.: Nonnegative matrices in the Mathematical Sciences. Academic Press, 1979.
[5] Brézis, H.: Análisis funcional. Teoría y aplicaciones. Alianza Editorial, Madrid, 1984.
[6] Brin, S. and Page, L.: The anatomy of a large-scale hypertextual web search engine. www-db.stanford.edu/~sergey/
[7] Broder, A. et al.: Graph structure in the web. www9.org/w9cdrom/160/160..
[8] Haveliwala, T. S. and Kamvar, S. D.: The second eigenvalue of the Google matrix. www.stanford.edu/taherh/papers/secondeigenvalue.pdf.
[9] Dautray, R. and Lions, J.-L.: Mathematical analysis and numerical methods for science and technology. Volume 3: Spectral Theory and Applications. Springer-Verlag, Berlin, 1990.
[10] Keener, J. P.: The Perron–Frobenius Theorem and the ranking of football teams. SIAM Review 35 (1993), no. 1, 80–93.
[11] Kendall, M. G.: Further contributions to the theory of paired comparisons. Biometrics 11 (1955), 43–62.


[12] Krein, M. G. and Rutman, M. A.: Linear operators leaving invariant a cone in a Banach space. Uspehi Matem. Nauk (N.S.) 3 (1948), no. 1, 3–95 [Russian]. Translated to English in Amer. Math. Soc. Transl. 10 (1962), 199–325.
[13] MacCluer, C. R.: The many proofs and applications of Perron’s Theorem. SIAM Review 42 (2000), no. 3, 487–498.
[14] Minc, H.: Nonnegative matrices. John Wiley & Sons, New York, 1988.
[15] Wei, T. H.: The algebraic foundations of ranking theory. Cambridge Univ. Press, London, 1952.
[16] Wilf, H. S.: Searching the web with eigenvectors.

Notes

1. The inventor of the name is said to be a nephew of the mathematician Edward Kasner. Kasner also defined the googolplex, its value being 10^googol. Wow!

2. They also intended the search engine to be “resistant” to any kind of manipulation, like commercially-oriented attempts to place certain pages at the top positions on the list. Curiously enough, nowadays a new “sport”, , has become very popular: to try to place a web page in the top positions, usually as only a recreational exercise. Some queries such as “miserable failure” have become classics.

3. Not to mention the incredible capacity of the search engine to “correct” the query terms and suggest the word one indeed had in mind. This leads us to envisage supernatural phenomena... well, let us give it up.

4. Although we will not go into the details, we should mention that there is a pair of elements used by Google, in combination with the general criterion we will explain here, when answering specific queries. On one hand, as is reasonable, Google does not give the same “score” to a term that is in the title of the page, in boldface, in a small font, etc. For combined searches, it will be quite different if, within the document, the terms appear “close” or “distant” to each other.

5. This is indeed a new model. Notice that, in general, matrices M and M′ will not have the same spectral properties.

6. The ideas behind Google’s procedure can be traced back to the algorithms developed by Kendall and Wei [11, 15] in the 1950’s. At the same time that Brin and Page were developing their engine, Jon Kleinberg presented his HITS (Hypertext Induced Topic Search) algorithm, which followed a similar scheme. Kleinberg was awarded the Nevanlinna Prize at the recent ICM 2006.

7. The NBA competition is a good example although the dichotomy of “number of wins” versus their “quality” could also be applied to any competition.

8. The German mathematician Oskar Perron (1880–1975) was a conspicuous example of mathematical longevity and was interested in several fields such as analysis, differential equations, algebra, geometry and number theory, in which he published several text-books that eventually became classics.

9. Ferdinand Georg Frobenius (1849–1917) was one of the outstanding members of the Berlin School, along with distinguished mathematicians such as Kronecker, Kummer and Weierstrass. He is well known for his contributions to group theory. His works on non-negative matrices were done in the last stages of his life.

10. An n × n matrix A is irreducible if all the entries of the matrix (I + A)^(n−1), where I stands for the n × n identity matrix, are positive. If, in that case, A is the adjacency matrix of a graph, then the graph is strongly connected (see Section 8).

11. Notice that the part of the 2-sphere that is situated in the positive orthant is homeomorphic to a 2-disc.

12. A matrix A is said to be primitive if it has a dominant eigenvalue (bigger, in modulus, than the other eigenvalues). This happens, for instance, when, for a certain positive integer k, all the entries of the matrix A^k are positive.

13. Let us consider a directed graph G (a set of vertices and a set of directed edges). G is said to be strongly connected if, given any two vertices u and v, we are able to find a sequence of edges joining one to the other. The same condition, but “erasing” the directions of the edges, leads us to the concept of a weakly connected graph. Needless to say, a strongly connected graph is also a weakly connected graph but not necessarily the reverse.

14. Researchers put forward some explanations: The IN set might be made up of newly created pages with no time to get linked by the central kernel pages. OUT pages might be corporate web pages, including only internal links.

15. A lot of interesting questions come up about the structure of the web graph. For instance, the average number of links per page, the mean distance between two pages and the probability P(k) of a randomly selected page to have exactly k (say ingoing) links. Should the graph be random (in the precise sense of Erdős and Rényi) then we would expect to have a binomial distribution (or a Poisson distribution in the limit). And we would predict that most pages would have a similar number of links. However, empirical studies suggest that the decay of the probability distribution is not exponential but follows a power law, k^(−β), where β is a little bigger than 2 (see, for instance, [2]). This would imply, for example, that most pages have very few links, while a minority (even though very significant) have a lot. More than that, if we consider the web as an evolving system, to which new pages are added in succession, the outcome is that the trend gets reinforced: “the rich get richer”. This is a usual conclusion in competitive systems (as in real life).

16. “Reasonable” means here that it works, as the corresponding ranking vector turns out to be remarkably good at assigning significances.

17. In fact, Google’s procedure involves two steps: firstly, in order to make matrix M stochastic, the entries of the zero columns are replaced by 1/n. This “new” matrix M′ is then transformed into M″ as explained in the text. Notice that the original M′ is a very sparse matrix, a very convenient feature for multiplication. In contrast, M″ is a dense matrix. But, as the reader may check, all the vector-matrix multiplications in the power method are executed on the sparse matrix M′.

18. A set C ⊂ R^n is said to be a cone if ax ∈ C for any x ∈ C and for any number a ≥ 0. It will be a convex cone if λx + µy ∈ C for all x, y ∈ C and λ, µ ≥ 0. A cone is proper if (a) C ∩ (−C) = {0}, (b) int(C) ≠ Ø, and (c) span(C) = R^n.

19. More or less: “a beautiful result, which shows itself as indiscreet and shameless, because it reveals... Google’s secret”.

Pablo Fernández Gallardo [pablo. [email protected]] got his PhD in 1997 from the Universidad Autónoma de Madrid and he is currently working as an assistant professor in the mathematics department there. His interests range from harmonic analysis to discrete mathematics and mathematical finance.
An extended version of this note, both in English and Spanish, can be found at www.uam.es/pablo.fernandez. A previous Spanish version of this article appeared in Boletín de la Sociedad Española de Matemática Aplicada 30 (2004), 115–141; it was awarded the “V Premio SEMA a la Divulgación en Matemática Aplicada” of Sociedad Española de Matemática Aplicada in 2004.
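
As a complement to Note 17, the following is a minimal sketch of how the power method can be run while touching only the sparse link structure. It rests on several assumptions that are not taken from the notes themselves: the link matrix is column-normalized (each page splits its vote equally among its out-links), the dense matrix has the usual form "c times the stochastic matrix plus (1 − c)/n in every entry", and c = 0.85. The function name pagerank_sparse and its parameters are hypothetical, chosen only for illustration.

```python
def pagerank_sparse(links, n, c=0.85, tol=1e-12, max_iter=1000):
    """Power method using only the sparse link structure.

    links: dict mapping page j to the list of pages i that j links to
           (pages with no entry are treated as zero columns).
    n:     number of pages, labelled 0, ..., n-1.
    c:     damping weight (ASSUMPTION: 0.85, uniform teleportation).
    """
    x = [1.0 / n] * n                      # start from the uniform vector
    for _ in range(max_iter):
        y = [0.0] * n
        dangling = 0.0                     # mass sitting on zero columns
        for j in range(n):
            out = links.get(j, [])
            if out:
                share = x[j] / len(out)    # column j is normalized to sum 1
                for i in out:
                    y[i] += share
            else:
                dangling += x[j]
        # Zero columns act as columns filled with 1/n; the dense part of the
        # matrix adds (1 - c)/n times the total mass to every entry.
        total = sum(x)
        new = [c * (y[i] + dangling / n) + (1.0 - c) * total / n
               for i in range(n)]
        s = sum(new)
        new = [v / s for v in new]         # keep the entries summing to 1
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return x


if __name__ == "__main__":
    # Page 0 links to 1 and 2, page 1 links to 2, page 2 has no out-links.
    print(pagerank_sparse({0: [1, 2], 1: [2]}, 3))
```

Each iteration costs a number of operations proportional to the number of links plus n, since the dense matrix is never formed: its contribution reduces to two scalar corrections added to every entry.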

EMS Newsletter March 2007 15