Graph Theory in the Information Age Fan Chung
Total Page:16
File Type:pdf, Size:1020Kb
Graph Theory in the Information Age Fan Chung nthepastdecade,graphtheoryhasgone through a remarkable shift and a profound transformation. The change is in large part due to the humongous amount of informa- tion that we are confronted with. A main way Ito sort through massive data sets is to build and examine the network formed by interrelations. For example, Google’s successful Web search al- gorithms are based on the WWW graph, which contains all Web pages as vertices and hyper- links as edges. There are all sorts of information networks, such as biological networks built from biological databases and social networks formed by email, phone calls, instant messaging, etc., as well as various types of physical networks. Of particular interest to mathematicians is the col- laboration graph, which is based on the data from Mathematical Reviews.Inthecollaborationgraph, every mathematician is a vertex, and two mathe- maticians who wrote a joint paper are connected by an edge. Figure 1 illustrates a portion of the collaboration Figure 1. An induced subgraph of the graph consisting of about 5,000 vertices, repre- collaboration graph. senting mathematicians with Erd˝os number 2 (i.e., mathematicians who wrote a paper with a coauthor of Paul Erd˝os). been used in a wide range of areas. However, Graph theory has two hundred years of history never before have we confronted graphs of not studying the basic mathematical structures called only such tremendous sizes but also extraordinary graphs. A graph G consists of a collection V of richness and complexity, both at a theoretical and vertices and a collection E of edges that connect apracticallevel.Numerouschallengingproblems pairs of vertices. In the past, graph theory has have attracted the attention and imagination of Fan Chung is professor of mathematics at the University researchers from physics, computer science, engi- of California, San Diego. Her email address is fan@ucsd. neering, biology, social science, and mathematics. edu. The new area of “network science” emerged, call- ing for a sound scientific foundation and rigorous ∗This article is based on the Noether Lecture given at the AMS-MAA-SIAM Annual Meeting, January 2009, Wash- analysis for which graph theory is ideally suited. In ington D. C. the other direction, examples of real-world graphs 726 Notices of the AMS Volume 57,Number6 lead to central questions and new directions for graph in G(n, p) has the same expected degree research in graph theory. at every vertex, and therefore G(n, p) does not These real-world networks are massive and capture some of the main behaviors of real-world complex but illustrate amazing coherence. Empir- graphs. Nevertheless, the approaches and methods ically, most real-world graphs have the following in classical random graph theory provide the properties: foundation for the study of random graphs with sparsity—The number of edges is within general degree distributions. • aconstantmultipleofthenumberof Many random graph models have been pro- vertices. posed in the study of information network graphs, small world phenomenon—Any two ver- but there are basically two different approaches. • tices are connected by a short path. Two The “online” model mimics the growth or decay of vertices having a common neighbor are adynamicallychangingnetwork,andthe“offline” more likely to be neighbors. model of random graphs consists of specified fam- power law degree distribution—The degree ilies of graphs as the probability spaces together • of a vertex is the number of its neighbors. with some specified probability distribution. The number of vertices with degree j (or One online model is the so-called preferential β attachment scheme,whichcanbedescribedas having j neighbors) is proportional to j− for some fixed constant β. “the rich get richer”. The preferential attachment scheme has been receiving much attention in the To deal with these information networks, many recent study of complex networks [11, 57], but its basic questions arise: What are basic structures of history can be traced back to Vilfredo Pareto in such large networks? How do they evolve? What 1896, among others. At each tick of the clock (so are the underlying principles that dictate their to speak), a new edge is added, with each of its behavior? How are subgraphs related to the large endpoints chosen with probability proportional to (and often incomplete) host graph? What are the their degrees. It can be proved [15, 31, 57] that the main graph invariants that capture the myriad preferential attachment scheme leads to a power properties of such large graphs? law degree distribution. There are several other To answer these problems, we first delve into the online models, including the duplication model wealth of knowledge from the past, although it is (which seems to be more feasible for biological often not enough. In the past thirty years there has networks, see [35]), as well as many recent exten- been a great deal of progress in combinatorial and sions, such as adding more parameters concerning probabilistic methods, as well as spectral methods. the “talent” or “fitness” of each node [50]. However, traditional probabilistic methods mostly There are two main offline graph models consider the same probability distribution for all for graphs with general degree distribution—the vertices or edges while real graphs are uneven configuration model and random graphs with ex- and clustered. The classical algebraic and ana- pected degree sequences. A random graph in lytic methods are efficient in dealing with highly the configuration model with degree sequences symmetric structures, whereas real-world graphs d1,d2,...,dn is defined by choosing a random are quite the opposite. Guided by examples of matching on !i di “pseudo nodes”, where the real-world graphs, we are compelled to improvise, pseudo nodes are partitioned into parts of sizes extend and create new theory and methods. Here di ,fori 1,...,n.Eachpartisassociatedwitha we will discuss the new developments in several vertex. By= using results of Molloy and Reed [58, 59], topics in graph theory that are rapidly developing. it can be shown [2] that under some mild condi- The topics include a general random graph theory tions, a random power law graph with exponent for any given degree distribution, percolation in β almost surely has no giant component if β β0 general host graphs, PageRank for representing ≥ where β0 is a solution to the equation involving the quantitative correlations among vertices, and the Riemann zeta function ζ(β 2) 2ζ(β 1) 0. game aspects of graphs. The general random graph− model− G(−w) =with expected degree sequence w (w1,w2,...,wn) Random Graph Theory for General Degree follows the spirit of the Erd˝os-Rényi= model. The Distributions probability of having an edge between the ith and The primary subject in the study of random graph jth vertices is defined to be wi wj /Vol (G),where w theory is the classical random graph G(n, p), Vol (G) denotes !i wi .Furthermore,inG( ) each introduced by Erd˝os and Rényi in 1959 [38, 39] edge is chosen independently of the others, and (also independently by Gilbert [44]). In G(n, p), therefore the analysis can be carried out. It was every pair of a set of n vertices is chosen to proved in [28] that if the expected average degree be an edge with probability p.Inaseriesof is strictly greater than 1 in a random graph papers, Erd˝osand Rényi gave an elegant and in G(w),thenthereisagiantcomponent(i.e., comprehensive analysis describing the evolution aconnectedcomponentofvolumeapositive of G(n, p) as p increases. Note that a random fraction of that of the whole graph). Furthermore, June/July 2010 Notices of the AMS 727 the giant component almost surely has volume the host graph and vice versa. It is of interest to δVol (G) O(√n log3.5 n),whereδ is the unique understand the connections between a graph and nonzero root+ of the following equation [29]: its subgraphs. What invariants of the host graph n n canorcannotbetranslatedtoitssubgraphs?Under wi δ (1) " wi e− (1 δ) " wi . what conditions can we predict the behavior of all = − i 1 i 1 or any subgraphs? Can a sparse subgraph have = = Because of the robustness of the G(w) model, very different behavior from its host graph? Here many properties can be derived. For example, a we discuss some of the work in this direction. random graph in G(w) has average distance almost Many information networks or social networks surely equal to (1 o(1)) log n ,andthediameter have very small diameters (in the range of log n), as + log w˜ dictated by the so-called small world phenomenon. is almost surely ( log n ),wherew˜ w 2/ w log w˜ !i i !i i However, in a recent paper by Liben-Nowell and Θ =w provided some mild conditions on are satisfied Kleinberg [52], it was observed that the tree-like [27]. For the range 2 <β<3, where the power law subgraphs derived from some chain-letter data exponents β for numerous real networks reside, seem to have relatively large diameter. In the the power law graph can be roughly described as study of the Erd˝os-Rényi graph model G(n, p),it an “octopus” with a dense subgraph having small was shown [60] that the diameter of a random diameter O(log log n) as the core, while the overall spanning tree is of order √n,incontrastwith diameter is O(log n) and the average distance is the fact that the diameter of the host graph Kn O(log log n) (see [31]). is 1. Aldous [4] proved that in a regular graph For the spectra of power law graphs, there are G with a certain spectral bound σ ,thediameter basically two competing approaches.